A Perceptron in Rust: How One Struct Learns to Say "Yes" and "No"

A straightforward guide to building a perceptron system that learns to perform basic logical operations such as AND and OR.

Machine learning is easy to picture as something vast: layers, tensors, GPUs. Yet at its very core sits a tiny idea that fits entirely into a single Rust struct. That idea is the perceptron — one artificial neuron that looks at a few numbers and answers "yes" or "no". Here we build it as a library crate, train it on the logic gates AND and OR, and then hit XOR — a task this neuron simply cannot solve. That wall is the most valuable lesson of the course.

One Neuron Under Const Generics

A perceptron holds exactly three things: weights (one per input), a bias, and a learning rate. The number of inputs is known up front, so we bake it into the type with a const parameter.

pub struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
    learning_rate: f64,
}

The parameter N is the input dimension, and it is a usize because we use it as the length of the array [f64; N]. The alternative is to store weights in a Vec and check input length at runtime. Const generics remove those checks: a two-input perceptron and a three-element array simply will not compile together. Dimensionality becomes part of the type, and a mismatch is caught at compile time rather than panicking in the middle of training.
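As a rough illustration of that compile-time check, here is a minimal sketch. The `dot` and `dot_demo` functions are hypothetical helpers invented for this example; they are not part of the crate built in the article.

```rust
/// Hypothetical helper: a dot product whose two operands must share the
/// same compile-time length `N`.
pub fn dot<const N: usize>(a: &[f64; N], b: &[f64; N]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Calls `dot` with two matching two-element arrays.
pub fn dot_demo() -> f64 {
    let weights = [0.5, -1.0];
    let inputs = [2.0, 3.0];
    // Both arrays are `[f64; 2]`, so the compiler infers `N = 2`.
    let result = dot(&weights, &inputs);
    // Uncommenting the next line stops compilation with a mismatched-types
    // error, because `[f64; 3]` is a different type from `[f64; 2]`:
    // let _ = dot(&weights, &[1.0, 2.0, 3.0]);
    result
}
```

With a Vec-based signature the commented-out call would compile and the mismatch would only surface at runtime, if it were checked at all.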

The weights field is the influence of each input on the decision. bias shifts the threshold so the decision boundary isn't tied to the origin. learning_rate sets how sharply the neuron corrects itself when it is wrong.

A Random Start Instead of Zeros

We create a perceptron with the associated function new(). Weights and bias are filled not with zeros but with random numbers.

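use rand::Rng; // brings random_range into scope (rand 0.9 naming)
use std::array;

// Inside impl<const N: usize> Perceptron<N>:
pub fn new(learning_rate: f64) -> Self {
    let mut rng = rand::rng();
    // One random weight per input, drawn from the half-open range -1.0..1.0.
    let weights =
        array::from_fn(|_| rng.random_range(-1.0..1.0));
    let bias = rng.random_range(-1.0..1.0);
    Self { weights, bias, learning_rate }
}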

array::from_fn from the standard library builds an array of length N, calling the closure for each index. That gives us [f64; N] without ever spelling out N in the code — the compiler infers the length from the field's type. The random_range method comes from the Rng trait of the rand crate; the range -1.0..1.0 gives small weights on both sides of zero.
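To see the length inference in isolation, the following sketch uses only the standard library. The `Lcg` type and `init_weights` function are assumptions made for this example: a tiny linear congruential generator stands in for the rand crate so the snippet has no dependencies. It is not a substitute for a real random source.

```rust
use std::array;

/// Hypothetical stand-in for a random source: a small linear
/// congruential generator, so the sketch needs no external crate.
pub struct Lcg(u64);

impl Lcg {
    /// Returns the next value, spread over the half-open range -1.0..1.0.
    pub fn next_weight(&mut self) -> f64 {
        // Multiplier and increment from Knuth's MMIX generator.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Keep the top 53 bits and scale them into 0.0..1.0.
        let unit = (self.0 >> 11) as f64 / (1u64 << 53) as f64;
        unit * 2.0 - 1.0
    }
}

/// Builds `N` initial weights; `N` is taken from the return type alone.
pub fn init_weights<const N: usize>(seed: u64) -> [f64; N] {
    let mut rng = Lcg(seed);
    // The closure is called once per index; the index itself is ignored.
    array::from_fn(|_| rng.next_weight())
}
```

A caller writes `let w: [f64; 3] = init_weights(42);` and the annotation on `w` is the only place where the length appears.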

Why not zeros? For a single neuron on a linearly separable task, a zero start would still converge. But random initialization is a general habit that becomes critical for more complex models: there, neurons that start with identical weights receive identical updates and get "stuck", learning in lockstep. We pick up the right habit from the start.

The Forward Pass: Weighted Sum and Threshold

The neuron's decision is the forward() method. It takes an immutable reference to itself, because predicting changes nothing inside: it is a projection of state and inputs onto a single answer.

pub fn forward(&self, inputs: &[f64; N]) -> f64 {
    let weighted_sum = self
        .weights
        .iter()
        .zip(inputs)
        .map(|(w, x)| w * x)
        .sum::<f64>()
        + self.bias;
    if weighted_sum > 0.0 { 1.0 } else { 0.0 }
}

Let's walk the chain. iter() yields an iterator over the weights, zip() pairs each weight with its matching input, and map() multiplies them. sum::<f64>() adds up all the products. The type parameter is needed here because the result feeds straight into + self.bias, which leaves the compiler with no other hint about which type to sum into; had we bound the sum to a variable annotated as f64 first, the annotation would have been enough. To the sum we add bias, and the result is the weighted sum with the threshold shifted.

The last line is the activation function. Zero is chosen as the decision boundary on purpose: the bias has already shifted the threshold where it needs to be, so comparing against zero answers the question "which side of the separating hyperplane does the input lie on?". If the sum is strictly greater than zero, the answer is 1.0, otherwise 0.0. This is a step function: fractional numbers live inside, a clean binary decision comes out.
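A small worked example may make the arithmetic concrete. The free functions `weighted_sum` and `step` below are hypothetical names chosen for this sketch; they split the body of forward() into its two stages, and the numbers are arbitrary.

```rust
/// Weighted sum of the inputs plus the bias: the value fed to the step.
pub fn weighted_sum<const N: usize>(
    weights: &[f64; N],
    bias: f64,
    inputs: &[f64; N],
) -> f64 {
    weights.iter().zip(inputs).map(|(w, x)| w * x).sum::<f64>() + bias
}

/// Step activation: 1.0 strictly above zero, 0.0 otherwise.
pub fn step(sum: f64) -> f64 {
    if sum > 0.0 { 1.0 } else { 0.0 }
}

/// Three sample evaluations with arbitrary, hand-picked numbers.
pub fn forward_examples() -> [f64; 3] {
    [
        // 0.8 * 1.0 + (-0.5) * 0.0 + 0.1 is about 0.9: positive side.
        step(weighted_sum(&[0.8, -0.5], 0.1, &[1.0, 0.0])),
        // 0.8 * 1.0 + (-0.5) * 2.0 + 0.1 is about -0.1: negative side.
        step(weighted_sum(&[0.8, -0.5], 0.1, &[1.0, 2.0])),
        // 1.0 - 1.0 + 0.0 is exactly 0.0: on the boundary, which the
        // strict comparison assigns to the 0.0 answer.
        step(weighted_sum(&[1.0, -1.0], 0.0, &[1.0, 1.0])),
    ]
}
```

The third case shows the consequence of the strict comparison: a point lying exactly on the hyperplane is answered with 0.0.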

The predict() method is the public name for that same computation: it just calls forward(). The split is handy because training uses forward() internally, while outside code calls the readable predict().
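The delegation could look roughly like the sketch below. The `with_params` constructor is a hypothetical addition that fixes the weights so the example is deterministic, and the learning_rate field is left out because prediction never reads it.

```rust
pub struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
}

impl<const N: usize> Perceptron<N> {
    /// Hypothetical constructor with fixed parameters, used here only
    /// to keep the sketch deterministic.
    pub fn with_params(weights: [f64; N], bias: f64) -> Self {
        Self { weights, bias }
    }

    /// Weighted sum plus bias, passed through the step function.
    pub fn forward(&self, inputs: &[f64; N]) -> f64 {
        let weighted_sum = self
            .weights
            .iter()
            .zip(inputs)
            .map(|(w, x)| w * x)
            .sum::<f64>()
            + self.bias;
        if weighted_sum > 0.0 { 1.0 } else { 0.0 }
    }

    /// Readable public name for the same computation.
    pub fn predict(&self, inputs: &[f64; N]) -> f64 {
        self.forward(inputs)
    }
}
```

Both methods take &self, so a trained neuron can be shared by immutable reference among any number of callers.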

The Learning Rule: Error Moves the Weights

Training lives in train(). It takes a mutable reference to itself, a slice of training pairs, and a number of epochs. Each pair is an array of inputs and the expected answer.

pub fn train(
    &mut self,
    data: &[([f64; N], f64)],
    epochs: usize,
) {
    for _epoch in 0..epochs {
        for (inputs, expected) in data {
            let prediction = self.forward(inputs);
            let error = expected - prediction;
            // ... adjust the weights
        }
    }
}

The type &[([f64; N], f64)] reads densely but clearly: a slice of tuples where the first element is the input array and the second is the expected value. For each example we compute prediction with the current neuron and take error as the difference between desired and obtained. The error can be 1.0, 0.0, or -1.0: the sign tells us which way the neuron missed.

The adjustment itself is the heart of the perceptron.

for i in 0..N {
    self.weights[i] +=
        self.learning_rate * error * inputs[i];
}
self.bias += self.learning_rate * error;

Each weight shifts by the product of learning_rate, the error, and its corresponding input. The multiplication by inputs[i] is key: an input equal to 0 doesn't move its weight at all — it took no part in this decision, so there's nothing to blame it for. Active inputs pull their weights toward the correct answer. The bias is adjusted without an input factor, since it shifts the whole threshold, independent of any signal.
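One update can be traced by hand. The sketch below pulls the rule into a hypothetical free function, `update_step`, and applies it to arbitrary starting values; the function name and its return value are assumptions made for this example.

```rust
/// One application of the perceptron rule to a single training example.
/// Returns the error that drove the update.
pub fn update_step<const N: usize>(
    weights: &mut [f64; N],
    bias: &mut f64,
    learning_rate: f64,
    inputs: &[f64; N],
    expected: f64,
) -> f64 {
    let sum = weights.iter().zip(inputs).map(|(w, x)| w * x).sum::<f64>() + *bias;
    let prediction = if sum > 0.0 { 1.0 } else { 0.0 };
    let error = expected - prediction;
    for i in 0..N {
        // An input of 0.0 contributes nothing, so its weight stays put.
        weights[i] += learning_rate * error * inputs[i];
    }
    // The bias moves regardless of the inputs.
    *bias += learning_rate * error;
    error
}
```

Take weights [0.2, -0.4], bias 0.1, learning rate 0.1, inputs [1.0, 0.0] and an expected answer of 0.0. The weighted sum is about 0.3, so the neuron answers 1.0 and the error is -1.0. The first weight drops by 0.1 to about 0.1, the second weight does not move because its input was 0.0, and the bias drops to about 0.0.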

Early Stopping and the Role of learning_rate

Running all epochs to the end is pointless: if a pass had no errors at all, the neuron has already learned everything. Let's add a counter and bail out the moment it hits zero.

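// At the start of each epoch, reset the counter:
let mut errors = 0;
// ... inside the loop over data:
if error.abs() > 1e-10 {
    errors += 1;
    // ... adjust weights and bias
}
// ... after passing over all data, still inside the epoch loop:
if errors == 0 {
    break; // a clean pass: nothing left to learn
}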

We compare not the error itself with zero, but its magnitude against a tiny threshold, 1e-10. With a step activation and targets of exactly 0.0 or 1.0, the error is always exactly -1.0, 0.0, or 1.0, so a plain comparison with zero would also work here. The epsilon, a tolerance within which we treat a value as zero, is a defensive habit for f64 code: once outputs or targets stop being exact, a difference is rarely exactly zero, only very close to it. The abs() method strips the sign so one threshold catches both positive and negative misses. We count those misses in errors, and if a whole epoch had none, break cuts the outer loop short.
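Assembled, the training loop with early stopping might look like the sketch below. Two things here are assumptions made for this example and not taken from the article's crate: the `zeroed` constructor starts from zero weights so the run is reproducible, and train() returns the number of epochs it actually ran.

```rust
pub struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
    learning_rate: f64,
}

impl<const N: usize> Perceptron<N> {
    /// Hypothetical constructor: zero start, for reproducibility only.
    pub fn zeroed(learning_rate: f64) -> Self {
        Self { weights: [0.0; N], bias: 0.0, learning_rate }
    }

    pub fn forward(&self, inputs: &[f64; N]) -> f64 {
        let sum = self.weights.iter().zip(inputs).map(|(w, x)| w * x).sum::<f64>()
            + self.bias;
        if sum > 0.0 { 1.0 } else { 0.0 }
    }

    /// Trains for at most `epochs` passes and returns how many were run.
    pub fn train(&mut self, data: &[([f64; N], f64)], epochs: usize) -> usize {
        for epoch in 0..epochs {
            let mut errors = 0; // reset at the start of every epoch
            for (inputs, expected) in data {
                let error = expected - self.forward(inputs);
                if error.abs() > 1e-10 {
                    errors += 1;
                    for i in 0..N {
                        self.weights[i] += self.learning_rate * error * inputs[i];
                    }
                    self.bias += self.learning_rate * error;
                }
            }
            if errors == 0 {
                return epoch + 1; // clean pass: stop early
            }
        }
        epochs
    }
}
```

Returning the epoch count is one way to make the early exit observable from outside; a caller that does not care can ignore the value.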

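And why have a learning_rate at all? It scales every correction: a miss moves each weight by learning_rate times the error times the input, instead of by the full error. On linearly separable data the perceptron rule reaches a solution for any positive rate, so the choice does not decide whether this neuron converges. What it sets is the size of each step relative to the random starting weights in -1.0..1.0: with a factor like 0.1 the neuron approaches the answer in small steps, and a single miss cannot overwrite most of what it has already learned. In larger models with smooth activations, too large a rate does make the parameters oscillate around the solution, which is why the knob is worth having from the start.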
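One way to probe the role of the rate is to train the same neuron with two different values and compare the outcomes. The `train_and` function below is a hypothetical helper written for this sketch; it assumes a zero start so that both runs begin from the same point.

```rust
/// Trains a zero-initialised two-input neuron on AND and returns
/// (epochs used, final weights, final bias).
pub fn train_and(learning_rate: f64, max_epochs: usize) -> (usize, [f64; 2], f64) {
    let data: [([f64; 2], f64); 4] = [
        ([0.0, 0.0], 0.0),
        ([0.0, 1.0], 0.0),
        ([1.0, 0.0], 0.0),
        ([1.0, 1.0], 1.0),
    ];
    let mut w = [0.0_f64; 2];
    let mut b = 0.0_f64;
    for epoch in 0..max_epochs {
        let mut errors = 0;
        for &(x, expected) in &data {
            let sum = w[0] * x[0] + w[1] * x[1] + b;
            let prediction: f64 = if sum > 0.0 { 1.0 } else { 0.0 };
            let error = expected - prediction;
            if error.abs() > 1e-10 {
                errors += 1;
                w[0] += learning_rate * error * x[0];
                w[1] += learning_rate * error * x[1];
                b += learning_rate * error;
            }
        }
        if errors == 0 {
            return (epoch + 1, w, b);
        }
    }
    (max_epochs, w, b)
}
```

Calling train_and(0.1, 100) and train_and(1.0, 100) should both end in a clean pass well before the epoch limit. From a zero start the rate mostly changes the scale of the final weights, not whether a separating line is found.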

AND, OR — and the XOR Wall

Let's put it all together. We give a two-input perceptron up to a hundred epochs to learn AND; with early stopping it usually needs far fewer.

let mut perceptron = Perceptron::<2>::new(0.1);
let data = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 0.0),
    ([1.0, 0.0], 0.0),
    ([1.0, 1.0], 1.0),
];
perceptron.train(&data, 100);

After training, predict(&[1.0, 1.0]) returns 1.0, and the other three combinations return 0.0. Swap the data table for OR, and the same code gives a different gate: for OR the one comes out everywhere except [0.0, 0.0]. The same neuron, different data, different behavior. That is exactly what learning is.

This works because AND and OR are linearly separable: on the plane of two inputs, the zero and one answers can be split by a single straight line. The perceptron looks for precisely that line — the weights set its slope, the bias moves its position.
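The separating line can also be written down by hand. In the sketch below the parameters are picked manually and not learned, and the names `gate`, `and` and `or` are hypothetical. Both gates share the weights [1.0, 1.0], so their lines have the same slope, and only the bias moves the line.

```rust
/// Output of a two-input neuron with hand-picked parameters.
pub fn gate(weights: [f64; 2], bias: f64, x1: f64, x2: f64) -> f64 {
    let sum = weights[0] * x1 + weights[1] * x2 + bias;
    if sum > 0.0 { 1.0 } else { 0.0 }
}

/// AND: the line x1 + x2 = 1.5 leaves only (1, 1) on the positive side.
pub fn and(x1: f64, x2: f64) -> f64 {
    gate([1.0, 1.0], -1.5, x1, x2)
}

/// OR: the line x1 + x2 = 0.5 leaves only (0, 0) off the positive side.
pub fn or(x1: f64, x2: f64) -> f64 {
    gate([1.0, 1.0], -0.5, x1, x2)
}
```

Training does not have to find these exact numbers: any line that splits the four points the same way is an equally valid answer, which is why two trained neurons can end up with different weights and identical behavior.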

Now XOR. Its output is one when the inputs differ and zero when they match. Let's replace the data table with the XOR rows and give the neuron as many as a thousand epochs.

perceptron.train(&data, 1_000);

And still at least one answer stays wrong. This is neither a bug nor a shortage of epochs: the XOR points cannot be separated by any single line. You would need a broken line, a curve, or two lines. A single perceptron can only draw one straight line, and here its expressive power runs out.
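The wall can be checked directly. The sketch below is a self-contained approximation of the experiment, under assumptions: it starts from zero weights with a rate of 0.1, and the names `XOR`, `output` and `xor_misses_after` are invented for this example.

```rust
/// The XOR truth table: one when the inputs differ, zero when they match.
pub const XOR: [([f64; 2], f64); 4] = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 0.0),
];

/// Step-activated output of a two-input neuron.
fn output(w: &[f64; 2], b: f64, x: &[f64; 2]) -> f64 {
    if w[0] * x[0] + w[1] * x[1] + b > 0.0 { 1.0 } else { 0.0 }
}

/// Trains on XOR for `epochs` passes and returns how many of the four
/// rows the neuron still gets wrong afterwards.
pub fn xor_misses_after(epochs: usize) -> usize {
    let (mut w, mut b, rate) = ([0.0_f64; 2], 0.0_f64, 0.1);
    for _ in 0..epochs {
        for (x, expected) in &XOR {
            let error = expected - output(&w, b, x);
            w[0] += rate * error * x[0];
            w[1] += rate * error * x[1];
            b += rate * error;
        }
    }
    XOR.iter()
        .filter(|(x, expected)| output(&w, b, x) != *expected)
        .count()
}
```

However many epochs are passed in, the returned count never reaches zero, because no choice of two weights and a bias puts (0, 1) and (1, 0) on one side of a line and (0, 0) and (1, 1) on the other.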

Why Build This

The perceptron is valuable precisely because of its ceiling. In two hundred lines you can see how a machine learns: it predicted, measured the error, nudged the weights the right way, repeated. That same loop — forward, error, step back — underlies enormous networks, just with many layers and smooth activation functions.

The XOR wall shows why one layer isn't enough and where multilayer networks came from in the first place: to draw not a straight line but a complex boundary, neurons are stacked into layers. That is a different construction and the topic of a separate course. But the intuition worth writing this tiny crate for will stay with you: learning isn't magic — it's a small, honest loop of corrections.