A Perceptron in Rust: How One Struct Learns to Say "Yes" and "No"
A straightforward guide to building a perceptron that learns basic logic operations such as AND and OR.
Machine learning is easy to picture as something vast: layers,
tensors, GPUs. Yet at its very core sits a tiny idea that
fits entirely into a single Rust struct. That idea is the
perceptron — one artificial neuron that looks at a few numbers
and answers "yes" or "no". Here we build it as a library
crate, train it on the logic gates AND and OR, and then
hit XOR — a task this neuron simply cannot solve. That wall
is the most valuable lesson of the course.
One Neuron Under Const Generics
A perceptron holds exactly three things: weights (one per input), a bias, and a learning rate. The number of inputs is known up front, so we bake it into the type with a const parameter.
pub struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
    learning_rate: f64,
}
The parameter N is the input dimension, and it is a usize
because we use it as the length of the array [f64; N]. The
alternative is to store weights in a Vec and check input
length at runtime. Const generics remove those checks: a
two-input perceptron and a three-element array simply will not
compile together. Dimensionality becomes part of the type, and
a mismatch is caught at compile time rather than panicking in
the middle of training.
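Here is a minimal standalone sketch of what the const parameter buys. The input_len() and weighted_sum() helpers are hypothetical names used only for this illustration, not methods of the crate built here. The commented-out call shows the kind of mismatch the compiler rejects.

```rust
// Hypothetical standalone sketch: the input dimension N lives in the type.
#[allow(dead_code)] // learning_rate is not used in this illustration
pub struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
    learning_rate: f64,
}

impl<const N: usize> Perceptron<N> {
    /// Hypothetical helper: the dimension is read straight off the type.
    pub fn input_len(&self) -> usize {
        N
    }

    /// Hypothetical helper: the signature only accepts exactly N inputs.
    pub fn weighted_sum(&self, inputs: &[f64; N]) -> f64 {
        self.weights.iter().zip(inputs).map(|(w, x)| w * x).sum::<f64>() + self.bias
    }
}

fn main() {
    let p = Perceptron::<2> { weights: [0.5, -0.25], bias: 0.125, learning_rate: 0.1 };
    println!("inputs: {}", p.input_len());
    println!("sum: {}", p.weighted_sum(&[1.0, 2.0]));
    // p.weighted_sum(&[1.0, 2.0, 3.0]); // would not compile: [f64; 2] expected, 3 elements given
}
```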
The weights field stores how strongly each input influences the
decision. bias shifts the threshold so the decision boundary
isn't forced through the origin. learning_rate sets how sharply
the neuron corrects itself when it is wrong.
A Random Start Instead of Zeros
We create a perceptron with the associated function new().
Weights and bias are filled not with zeros but with random
numbers.
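use std::array;
use rand::Rng; // brings random_range() into scope (rand 0.9 API)

impl<const N: usize> Perceptron<N> {
    pub fn new(learning_rate: f64) -> Self {
        let mut rng = rand::rng();
        // One random weight per input, each in [-1.0, 1.0)
        let weights =
            array::from_fn(|_| rng.random_range(-1.0..1.0));
        let bias = rng.random_range(-1.0..1.0);
        Self { weights, bias, learning_rate }
    }
}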
array::from_fn from the standard library builds an array of
length N, calling the closure for each index. That gives us
[f64; N] without ever spelling out N in the code — the
compiler infers the length from the field's type. The
random_range method comes from the Rng trait of the rand
crate; the range -1.0..1.0 gives small weights on both sides
of zero.
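The sketch below shows array::from_fn calling a stateful closure once per index, the same shape as the closure in new() that mutates rng. It avoids dependencies by using a tiny linear congruential generator as a stand-in for rand. The next_unit and random_weights functions are hypothetical. The constants are Knuth's MMIX multiplier and increment. This is fine for a demo but is no substitute for a real RNG.

```rust
use std::array;

// Hypothetical stand-in for rand: a 64-bit LCG with Knuth's MMIX constants.
fn next_unit(state: &mut u64) -> f64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    // Top 53 bits -> a float in [0, 1), then stretched to [-1, 1).
    let unit = (*state >> 11) as f64 / (1u64 << 53) as f64;
    unit * 2.0 - 1.0
}

// Builds N weights; from_fn calls the closure once for each index 0..N.
fn random_weights<const N: usize>(seed: u64) -> [f64; N] {
    let mut state = seed;
    // The closure mutates `state`, just as the closure in new() mutates rng.
    array::from_fn(|_| next_unit(&mut state))
}

fn main() {
    // N = 3 is inferred from the annotated type, never written in the call.
    let w: [f64; 3] = random_weights(42);
    println!("{w:?}");
}
```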
Why not zeros? For this single neuron, zeros would actually work: on a linearly separable task the perceptron converges from any starting point, all-zero weights included. But a random start is a general habit that becomes critical for multilayer networks: there, neurons in the same layer that start with identical weights receive identical updates and stay identical, learning in lockstep. We pick up the right habit from the start.
The Forward Pass: Weighted Sum and Threshold
The neuron's decision is the forward() method. It takes an
immutable reference to itself, because predicting changes
nothing inside: it is a projection of state and inputs onto a
single answer.
pub fn forward(&self, inputs: &[f64; N]) -> f64 {
    let weighted_sum = self
        .weights
        .iter()
        .zip(inputs)
        .map(|(w, x)| w * x)
        .sum::<f64>()
        + self.bias;
    if weighted_sum > 0.0 { 1.0 } else { 0.0 }
}
Let's walk the chain. iter() yields an iterator over the
weights, zip() pairs each weight with its matching input,
and map() multiplies them. sum::<f64>() adds up all the
products. The type parameter is mandatory here because sum()
is generic over its result type, and the compiler cannot infer
that type from the + self.bias that follows. Adding bias to
the sum gives the weighted sum with the threshold shifted.
The last line is the activation function. Zero is chosen as
the decision boundary on purpose: the bias has already shifted
the threshold where it needs to be, so comparing against zero
answers the question "which side of the separating hyperplane
does the input lie on?". If the sum is strictly greater than
zero, the answer is 1.0, otherwise 0.0. This is a step
function: real-valued sums go in, and a clean binary decision
comes out.
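The following check runs the same forward pass on a three-input neuron. The weights are arbitrary, chosen only for illustration. Note the case where the sum is exactly zero: the strict > sends it to 0.0. A small positive bias moves the boundary just enough to flip that answer.

```rust
// Standalone copy of the forward pass with hand-set parameters.
struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
}

impl<const N: usize> Perceptron<N> {
    fn forward(&self, inputs: &[f64; N]) -> f64 {
        let weighted_sum = self
            .weights
            .iter()
            .zip(inputs)
            .map(|(w, x)| w * x)
            .sum::<f64>()
            + self.bias;
        // Step activation: strictly positive -> 1.0, everything else -> 0.0.
        if weighted_sum > 0.0 { 1.0 } else { 0.0 }
    }
}

fn main() {
    let p = Perceptron::<3> { weights: [1.0, -1.0, 0.5], bias: 0.0 };
    // 1.0 - 1.0 + 0.0 = 0.0: not strictly positive, so 0.0
    println!("{}", p.forward(&[1.0, 1.0, 0.0]));
    // 1.0 + 0.0 + 0.0 = 1.0, so 1.0
    println!("{}", p.forward(&[1.0, 0.0, 0.0]));
    // 0.0 - 1.0 + 0.5 = -0.5, so 0.0
    println!("{}", p.forward(&[0.0, 1.0, 1.0]));

    // Same weights, bias 0.25: the zero-sum point now lands on the "yes" side.
    let shifted = Perceptron::<3> { weights: [1.0, -1.0, 0.5], bias: 0.25 };
    println!("{}", shifted.forward(&[1.0, 1.0, 0.0]));
}
```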
The predict() method is the public name for that same
computation: it just calls forward(). The split is handy
because training uses forward() internally, while outside
code calls the readable predict().
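predict() can be as thin as the sketch below. It assumes the forward() described in the previous section, and it repeats the struct, with weights set by hand instead of new(), so the snippet compiles on its own.

```rust
pub struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
}

impl<const N: usize> Perceptron<N> {
    // Internal: used by training and by predict().
    fn forward(&self, inputs: &[f64; N]) -> f64 {
        let s = self.weights.iter().zip(inputs).map(|(w, x)| w * x).sum::<f64>() + self.bias;
        if s > 0.0 { 1.0 } else { 0.0 }
    }

    /// Public, readable entry point: the same computation under a friendlier name.
    pub fn predict(&self, inputs: &[f64; N]) -> f64 {
        self.forward(inputs)
    }
}

fn main() {
    let p = Perceptron::<2> { weights: [2.0, -1.0], bias: 0.0 };
    println!("{}", p.predict(&[1.0, 1.0]));
}
```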
The Learning Rule: Error Moves the Weights
Training lives in train(). It takes a mutable reference to
itself, a slice of training pairs, and a number of epochs.
Each pair is an array of inputs and the expected answer.
pub fn train(
    &mut self,
    data: &[([f64; N], f64)],
    epochs: usize,
) {
    for _epoch in 0..epochs {
        for (inputs, expected) in data {
            let prediction = self.forward(inputs);
            let error = expected - prediction;
            // ... adjust the weights
        }
    }
}
The type &[([f64; N], f64)] reads densely but clearly: a
slice of tuples where the first element is the input array and
the second is the expected value. For each example we compute
prediction with the current neuron and take error as the
difference between desired and obtained. The error can be
1.0, 0.0, or -1.0: the sign tells us which way the
neuron missed.
The adjustment itself is the heart of the perceptron.
for i in 0..N {
    self.weights[i] +=
        self.learning_rate * error * inputs[i];
}
self.bias += self.learning_rate * error;
Each weight shifts by the product of learning_rate, the
error, and its corresponding input. The multiplication by
inputs[i] is key: an input equal to 0 doesn't move its
weight at all — it took no part in this decision, so there's
nothing to blame it for. Active inputs pull their weights
toward the correct answer. The bias is adjusted without an
input factor, since it shifts the whole threshold, independent
of any signal.
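Here is one update step worked by hand. The helper train_step and the concrete numbers are hypothetical. All the values are sums of powers of two, so every f64 operation is exact and the results can be compared with ==. The weight of the input that was 0.0 stays put, while the active input's weight and the bias move toward the correct answer.

```rust
struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
    learning_rate: f64,
}

impl<const N: usize> Perceptron<N> {
    fn forward(&self, inputs: &[f64; N]) -> f64 {
        let s = self.weights.iter().zip(inputs).map(|(w, x)| w * x).sum::<f64>() + self.bias;
        if s > 0.0 { 1.0 } else { 0.0 }
    }

    // Hypothetical helper: one example, one perceptron update. Returns the error.
    fn train_step(&mut self, inputs: &[f64; N], expected: f64) -> f64 {
        let error = expected - self.forward(inputs); // always -1.0, 0.0 or 1.0
        for i in 0..N {
            // A zero input contributes nothing, so its weight is left untouched.
            self.weights[i] += self.learning_rate * error * inputs[i];
        }
        self.bias += self.learning_rate * error;
        error
    }
}

fn main() {
    let mut p = Perceptron::<2> { weights: [0.25, -0.5], bias: 0.125, learning_rate: 0.25 };
    // Sum = -0.5 + 0.125 = -0.375, so the neuron says 0.0 where 1.0 was expected.
    let error = p.train_step(&[0.0, 1.0], 1.0);
    println!("error = {error}, weights = {:?}, bias = {}", p.weights, p.bias);
}
```

After this single step the sum for [0.0, 1.0] is -0.25 + 0.375 = 0.125, so the same example is now classified correctly and a second step changes nothing.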
Early Stopping and the Role of learning_rate
Running all epochs to the end is pointless: if a pass had no
errors at all, the neuron has already learned everything.
Let's add a counter and bail out the moment it hits zero.
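for _epoch in 0..epochs {
    let mut errors = 0; // reset at the start of every epoch
    for (inputs, expected) in data {
        let prediction = self.forward(inputs);
        let error = expected - prediction;
        if error.abs() > 1e-10 {
            errors += 1;
            // ... adjust weights and bias
        }
    }
    // A whole pass without a single miss: nothing left to learn.
    if errors == 0 {
        break;
    }
}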
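We compare not the error itself with zero, but its magnitude
against a tiny threshold 1e-10. Strictly speaking, this
perceptron doesn't need it: forward() returns exactly 0.0 or
1.0, and the targets in our tables are 0.0 or 1.0 too, so the
error is always exactly -1.0, 0.0, or 1.0. But testing f64
values with == is a fragile habit, because the result of a
computation is often only very close to the expected value,
not equal to it. An epsilon is a tolerance within which we
treat a value as zero. The abs() method strips the sign so one
threshold catches both positive and negative misses. We count
those misses in errors, and if a whole epoch had none, break
cuts the outer loop short.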
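The following is a dependency-free sketch of the full training loop with early stopping. It makes two assumptions so that it stays deterministic and runs without rand. First, the hypothetical constructor zeros() starts from all-zero weights; a single neuron on a linearly separable task converges from zeros too. Second, the learning rate is 0.25, a power of two, so every update is exact in f64. Unlike the train() described here, this version also returns the number of epochs it actually ran.

```rust
struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
    learning_rate: f64,
}

impl<const N: usize> Perceptron<N> {
    // Hypothetical deterministic constructor (new() in the text uses rand).
    fn zeros(learning_rate: f64) -> Self {
        Self { weights: [0.0; N], bias: 0.0, learning_rate }
    }

    fn forward(&self, inputs: &[f64; N]) -> f64 {
        let s = self.weights.iter().zip(inputs).map(|(w, x)| w * x).sum::<f64>() + self.bias;
        if s > 0.0 { 1.0 } else { 0.0 }
    }

    // Returns the number of epochs run up to the first clean pass (or `epochs`).
    fn train(&mut self, data: &[([f64; N], f64)], epochs: usize) -> usize {
        for epoch in 0..epochs {
            let mut errors = 0;
            for (inputs, expected) in data {
                let error = expected - self.forward(inputs);
                if error.abs() > 1e-10 {
                    errors += 1;
                    for i in 0..N {
                        self.weights[i] += self.learning_rate * error * inputs[i];
                    }
                    self.bias += self.learning_rate * error;
                }
            }
            if errors == 0 {
                return epoch + 1; // this pass made no mistakes
            }
        }
        epochs
    }
}

fn main() {
    let and_table = [
        ([0.0, 0.0], 0.0),
        ([0.0, 1.0], 0.0),
        ([1.0, 0.0], 0.0),
        ([1.0, 1.0], 1.0),
    ];
    let mut p = Perceptron::<2>::zeros(0.25);
    let used = p.train(&and_table, 100);
    println!("stopped after {used} of 100 epochs");
}
```

Traced by hand from zeros, this run makes its last correction in epoch 5 and stops after the clean sixth pass, ending with weights [0.5, 0.25] and bias -0.5.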
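And why have a learning_rate at all? It scales every
correction. For this single neuron with a step activation, the
perceptron convergence theorem guarantees convergence on
linearly separable data even at a rate of 1.0. Here the rate
mostly sets how large each nudge is compared with the random
initial weights: a factor like 0.1 moves the decision boundary
in small steps instead of making it jump. The rate becomes
critical in models with smooth activations trained by gradient
steps. There, too large a value makes the parameters overshoot
and oscillate around the solution instead of settling onto it.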
AND, OR — and the XOR Wall
Let's put it all together. A two-input perceptron learns AND
in a hundred epochs.
let mut perceptron = Perceptron::<2>::new(0.1);
// AND truth table: the output is 1.0 only when both inputs are 1.0
let data = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 0.0),
    ([1.0, 0.0], 0.0),
    ([1.0, 1.0], 1.0),
];
perceptron.train(&data, 100);
After training, predict(&[1.0, 1.0]) returns 1.0, and the
other three combinations return 0.0. Swap the data table for
OR, and the same code gives a different gate: for OR the
one comes out everywhere except [0.0, 0.0]. The same neuron,
different data, different behavior. That is exactly what
learning is.
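To watch "same neuron, different data" end to end without the rand crate, here is a sketch that trains one fresh neuron per truth table. It assumes a hypothetical zeros() constructor in place of the random new() and a learning rate of 0.25, chosen so every update is exact in f64. predict() here computes the forward pass directly.

```rust
// Hypothetical, dependency-free sketch: zero start instead of rand.
struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
    learning_rate: f64,
}

impl<const N: usize> Perceptron<N> {
    fn zeros(learning_rate: f64) -> Self {
        Self { weights: [0.0; N], bias: 0.0, learning_rate }
    }

    fn predict(&self, inputs: &[f64; N]) -> f64 {
        let s = self.weights.iter().zip(inputs).map(|(w, x)| w * x).sum::<f64>() + self.bias;
        if s > 0.0 { 1.0 } else { 0.0 }
    }

    fn train(&mut self, data: &[([f64; N], f64)], epochs: usize) {
        for _epoch in 0..epochs {
            let mut errors = 0;
            for (inputs, expected) in data {
                let error = expected - self.predict(inputs);
                if error.abs() > 1e-10 {
                    errors += 1;
                    for i in 0..N {
                        self.weights[i] += self.learning_rate * error * inputs[i];
                    }
                    self.bias += self.learning_rate * error;
                }
            }
            if errors == 0 {
                break;
            }
        }
    }
}

fn main() {
    let and_table = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)];
    let or_table = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)];
    // Same code, different data, different gate.
    for (name, table) in [("AND", and_table), ("OR", or_table)] {
        let mut p = Perceptron::<2>::zeros(0.25);
        p.train(&table, 100);
        for (inputs, expected) in &table {
            println!("{name} {inputs:?} -> {} (expected {expected})", p.predict(inputs));
        }
    }
}
```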
This works because AND and OR are linearly separable: on
the plane of two inputs, the zero and one answers can be split
by a single straight line. The perceptron looks for precisely
that line — the weights set its slope, the bias moves its
position.
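To make the geometry concrete, fix both weights at 1.0. The boundary is then the line x1 + x2 = -b. A bias of -1.5 puts that line between (1, 1) and the other three points, which gives AND. A bias of -0.5 slides the same line so that only (0, 0) stays on the "no" side, which gives OR. These hand-picked values are one valid choice among infinitely many, not necessarily what training finds. The step and truth_table functions are hypothetical helpers.

```rust
// Step neuron with explicit, hand-picked weights and bias (not trained).
fn step(weights: [f64; 2], bias: f64, x: [f64; 2]) -> f64 {
    let s = weights[0] * x[0] + weights[1] * x[1] + bias;
    if s > 0.0 { 1.0 } else { 0.0 }
}

// Outputs for the inputs (0,0), (0,1), (1,0), (1,1), in that order.
fn truth_table(weights: [f64; 2], bias: f64) -> [f64; 4] {
    let points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]];
    points.map(|x| step(weights, bias, x))
}

fn main() {
    // Same slope (weights), different position (bias).
    println!("AND: {:?}", truth_table([1.0, 1.0], -1.5));
    println!("OR:  {:?}", truth_table([1.0, 1.0], -0.5));
}
```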
Now XOR. Its table gives one when the inputs differ and zero
when they match, so data becomes ([0.0, 0.0], 0.0),
([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0).
Let's give the neuron a full thousand epochs.
perceptron.train(&data, 1_000);
And still at least one answer stays wrong. This is neither a bug
nor a shortage of epochs: the XOR points cannot be separated
by any single line — you'd need a broken line, a curve, two
lines. A single perceptron can only draw one straight line,
and here its expressive power runs out.
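The impossibility takes four lines to check. Take a step neuron with weights w_1, w_2 and bias b, using the same strict > 0 rule as forward(). Each XOR row then becomes one inequality:

```latex
\begin{aligned}
(0,0) \mapsto 0:&\quad b \le 0 \\
(0,1) \mapsto 1:&\quad w_2 + b > 0 \\
(1,0) \mapsto 1:&\quad w_1 + b > 0 \\
(1,1) \mapsto 0:&\quad w_1 + w_2 + b \le 0
\end{aligned}
```

Adding the two middle rows gives w_1 + w_2 + 2b > 0, so w_1 + w_2 + b > -b >= 0, which contradicts the last row. No choice of weights and bias satisfies all four rows, however many epochs training runs.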
Why Build This
The perceptron is valuable precisely because of its ceiling. In two hundred lines you can see how a machine learns: it predicts, measures the error, nudges the weights the right way, and repeats. That same loop (forward, error, step back) underlies enormous networks, just with many layers and smooth activation functions.
The XOR wall shows why one layer isn't enough and where
multilayer networks came from in the first place: to draw not
a straight line but a complex boundary, neurons are stacked
into layers. That is a different construction and the topic of
a separate course. But the intuition worth writing this tiny
crate for will stay with you: learning isn't magic — it's a
small, honest loop of corrections.