One neuron that learns: building a perceptron from scratch in Rust
Before deep learning, before backpropagation, there was one neuron. In 1957 Frank Rosenblatt's perceptron could learn to classify inputs by nudging a handful of numbers until it stopped making mistakes. It's the smallest thing in machine learning that genuinely learns — and it fits in about forty lines of Rust. We'll build it one small piece at a time, teach it the AND gate, and then watch it fail at XOR in a way that rerouted the history of AI.
The idea, in one sentence
A perceptron multiplies each input by a learned weight, sums everything, adds a bias, and squashes the result into a 0 or a 1:
y = step(w_1 x_1 + w_2 x_2 + \ldots + w_n x_n + b)
The weights and bias are the only state. "Learning" means adjusting them. That's the whole model — everything below is making those numbers move in the right direction.
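To make the formula concrete, here is a minimal sketch of it as a free function with hand-picked numbers. The names step and neuron, and the weights (1.0, 1.0) with bias -1.5, are assumptions chosen for illustration rather than values from the article; they happen to compute AND.

```rust
/// Step activation: 1.0 for a strictly positive sum, 0.0 otherwise.
fn step(sum: f64) -> f64 {
    if sum > 0.0 { 1.0 } else { 0.0 }
}

/// y = step(w_1 x_1 + w_2 x_2 + b) for a two-input neuron.
/// `neuron` is a hypothetical helper, not part of the article's code.
fn neuron(w: [f64; 2], b: f64, x: [f64; 2]) -> f64 {
    step(w[0] * x[0] + w[1] * x[1] + b)
}

fn main() {
    // Hand-picked (not learned) parameters.
    let (w, b) = ([1.0, 1.0], -1.5);
    for x in [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] {
        println!("{:?} -> {}", x, neuron(w, b, x));
    }
}
```

Only the (1, 1) row clears the threshold: 1 + 1 - 1.5 = 0.5, while every other row sums to -0.5 or less. Training is a way to find numbers like these automatically.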
A struct sized at compile time
A perceptron for two inputs and one for three inputs are genuinely different things, and Rust lets us say so with a const generic. We make the input count N part of the type itself:
pub struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
    learning_rate: f64,
}
weights is a fixed-size [f64; N] — no heap allocation, no length to check at runtime. Because N lives in the type, Perceptron<2> and Perceptron<3> are distinct types, so an array meant for a 2-input neuron can never be fed to a 3-input one. A whole class of dimension-mismatch bugs simply can't compile.
The remaining two fields are scalars, not arrays. bias shifts the decision boundary off the origin so the line doesn't have to pass through (0,0), and learning_rate controls how big each correction is during training — a single knob we'll set once and reuse on every update.
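One way to see that N really is part of the type is to ask the compiler for each type's size. This small sketch repeats the struct definition so it compiles on its own; the only thing it relies on is that an f64 occupies 8 bytes.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Perceptron<const N: usize> {
    weights: [f64; N],
    bias: f64,
    learning_rate: f64,
}

fn main() {
    // N weights plus bias and learning_rate, 8 bytes each, all stored inline.
    println!("Perceptron<2>: {} bytes", size_of::<Perceptron<2>>());
    println!("Perceptron<3>: {} bytes", size_of::<Perceptron<3>>());
}
```

The sizes come out as (N + 2) × 8 bytes, so 32 and 40: two different types with two different layouts, and no pointer to a heap buffer in either.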
Random starting weights break symmetry
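A single perceptron would actually train fine from all-zero weights, because each update is scaled by its own input and the weights drift apart on their own. Symmetry only becomes a real problem once neurons are stacked in layers, where identical starting weights make every hidden unit learn the same thing. Seeding with small random values is the habit worth building now. Rust's standard library has no RNG, so we pull in the rand crate and write the whole constructor at once: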
use rand::{rng, Rng}; // rand 0.9+: rng() and random_range()
use std::array;

impl<const N: usize> Perceptron<N> {
    pub fn new(learning_rate: f64) -> Self {
        let mut rng = rng();
        Self {
            weights: array::from_fn(|_| {
                rng.random_range(-1.0..1.0)
            }),
            bias: rng.random_range(-1.0..1.0),
            learning_rate,
        }
    }
}
rng() (from the rand crate, version 0.9 or later; 0.8 spelled these thread_rng() and gen_range()) hands us a thread-local generator, and it's mut because drawing a number advances its internal state. The constructor takes only the learning rate; everything else is randomized, so two freshly built neurons start from different points.
array::from_fn (from std) is the natural partner to const generics: it calls the closure once per index and builds the [f64; N] for us, with no loop and no uninitialized memory. Each call draws a fresh value from the half-open range [-1, 1), which is what -1.0..1.0 means in Rust: -1 can come up, 1 cannot. The range is centered on zero, which keeps the initial weighted sum small and unbiased toward either output class. The bias is drawn from the same range so the starting boundary isn't pinned to the origin, while learning_rate is copied straight in from the argument, the one piece of state the caller chooses rather than chance.
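If array::from_fn is new to you, this dependency-free sketch shows the two properties the constructor relies on: the closure runs once per index, walking forward through the array, and it may mutate captured state, which is why a mutable generator can be used inside it. The helper names ramp and counted are made up for this example, and a plain counter stands in for the random generator.

```rust
use std::array;

/// Builds [0.0, 0.5, 1.0, ...]: the closure receives each index in turn.
fn ramp<const N: usize>() -> [f64; N] {
    array::from_fn(|i| i as f64 * 0.5)
}

/// The closure is FnMut, so it can advance captured state on every call,
/// much as a closure drawing random numbers advances its generator.
fn counted<const N: usize>() -> [u32; N] {
    let mut calls = 0;
    array::from_fn(|_| {
        calls += 1;
        calls
    })
}

fn main() {
    let r: [f64; 4] = ramp();
    let c: [u32; 3] = counted();
    println!("{:?} {:?}", r, c);
}
```

Note that N never appears as a runtime argument: the array length is inferred from the type the caller asks for.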
The forward pass: predict in two steps
Prediction takes inputs and returns a label. It doesn't change anything, so it borrows &self — a weighted sum followed by a threshold:
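pub fn forward(&self, inputs: &[f64; N]) -> f64 {
    // Dot product of weights and inputs, plus the bias.
    let sum: f64 = self.weights.iter()
        .zip(inputs)
        .map(|(w, x)| w * x)
        .sum::<f64>() + self.bias;
    // Step activation: fire only on a strictly positive sum.
    if sum > 0.0 { 1.0 } else { 0.0 }
}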
The zip().map().sum() chain is the idiomatic way to express a dot product in Rust: it reads like the formula, and in optimized builds the compiler typically turns it into the same machine code as a hand-written loop. We pair each weight with its input, multiply, total them, then add the bias once at the end.
That final if is the step activation: a hard threshold with no in-between. There's no "maybe 0.7" here — a single-layer perceptron is a blunt binary classifier, and that bluntness is exactly what we'll run into later.
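To see that the iterator chain is nothing more exotic than a loop, here is a sketch that computes the weighted sum both ways. The function names weighted_sum_iter and weighted_sum_loop and the sample numbers are illustrative assumptions, not part of the article's code.

```rust
/// Weighted sum written as a zip/map/sum iterator chain.
fn weighted_sum_iter<const N: usize>(w: &[f64; N], x: &[f64; N], bias: f64) -> f64 {
    w.iter().zip(x).map(|(w, x)| w * x).sum::<f64>() + bias
}

/// The same computation as an explicit index loop.
fn weighted_sum_loop<const N: usize>(w: &[f64; N], x: &[f64; N], bias: f64) -> f64 {
    let mut total = 0.0;
    for i in 0..N {
        total += w[i] * x[i];
    }
    total + bias
}

fn main() {
    let w = [2.0, -1.0, 0.5];
    let x = [1.0, 3.0, 4.0];
    // 2*1 + (-1)*3 + 0.5*4 = 1, plus a bias of 0.25.
    println!("{}", weighted_sum_iter(&w, &x, 0.25));
    println!("{}", weighted_sum_loop(&w, &x, 0.25));
}
```

Both calls print 1.25. The iterator version has one practical edge: zip stops at the shorter side, so there is no index to get wrong.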
The learning rule is one subtraction
Here's the part that earns the word "learn." For each labeled example we predict, measure the error, and push the weights in the direction that reduces it:
\Delta w_i = \eta \cdot (y_{\text{true}} - y_{\text{pred}}) \cdot x_i
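Here \eta is the learning rate, y_{\text{true}} is the label, and y_{\text{pred}} is what forward returns. One epoch is one sweep over the data, and we repeat for at most a fixed budget of epochs. Here's the whole training loop, with the actual correction left for the next snippet: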
pub fn train(&mut self, data: &[([f64; N], f64)],
             epochs: usize) {
    for _ in 0..epochs {
        let mut errors = 0;
        for (inputs, expected) in data {
            let error =
                expected - self.forward(inputs);
            if error.abs() > 1e-10 {
                errors += 1;
                // … correct weights and bias
            }
        }
        if errors == 0 { break; }
    }
}
Training takes &mut self because, unlike forward, it rewrites the neuron's state. The outer loop runs at most epochs times; the errors counter, reset each epoch, tells us whether a whole pass went by without a single mistake. Read the logic of error: if the prediction is right it's 0 and the guard skips the update entirely. If we said 0 but should have said 1, error is +1; if we overshot, it's -1 — and the 1e-10 comparison is just a safe way to ask "is this float nonzero" without trusting exact equality.
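The early break exploits a remarkable fact, the perceptron convergence theorem: for any linearly separable problem this rule is guaranteed to stop making mistakes after a finite number of updates. A zero-error epoch means no weight changed during that pass, so every later pass would be identical and we can stop early.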
Now the correction itself, which replaces the placeholder comment inside the if:
for (w, x) in self.weights
    .iter_mut().zip(inputs)
{
    *w += self.learning_rate * error * x;
}
self.bias += self.learning_rate * error;
Each weight moves by learning_rate * error * x. The sign of error decides the direction — grow when we undershot, shrink when we overshot — and multiplying by the input x means active features get adjusted more, while a zero input contributes no change at all. The bias updates the same way but without an x, since it has no input to scale by.
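Here is one correction worked through with concrete numbers, as a standalone sketch. The free functions predict and update are hypothetical stand-ins for the article's methods, and the starting values are chosen to be exactly representable in binary so the arithmetic is easy to check by hand.

```rust
/// Weighted sum plus bias, then a hard threshold at zero.
fn predict<const N: usize>(w: &[f64; N], b: f64, x: &[f64; N]) -> f64 {
    let sum = w.iter().zip(x).map(|(w, x)| w * x).sum::<f64>() + b;
    if sum > 0.0 { 1.0 } else { 0.0 }
}

/// One application of the learning rule; returns the error it corrected.
fn update<const N: usize>(
    w: &mut [f64; N],
    b: &mut f64,
    lr: f64,
    x: &[f64; N],
    expected: f64,
) -> f64 {
    let error = expected - predict(w, *b, x);
    for (w, x) in w.iter_mut().zip(x) {
        *w += lr * error * x;
    }
    *b += lr * error;
    error
}

fn main() {
    let (mut w, mut b) = ([0.25, -0.5], 0.0);
    // Sum is 0.25 - 0.5 + 0.0 = -0.25, so the neuron says 0 but the label is 1.
    let error = update(&mut w, &mut b, 0.5, &[1.0, 1.0], 1.0);
    println!("error = {error}, weights = {w:?}, bias = {b}");
}
```

With an error of +1 and a learning rate of 0.5, both weights grow by 0.5 × 1 × 1, giving [0.75, 0.0], and the bias rises to 0.5. The same input now sums to 1.25 and is classified correctly.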
Proof it works: the AND gate
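The honest test of "did it learn" is an integration test in tests/, which can only touch the public API, exactly how a real user would call it. That means Perceptron and its new, train, and forward methods must be declared pub, and the test file has to import Perceptron from your crate. Set up the neuron and truth table, then verify every row: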
#[test]
fn learns_and() {
    let mut p = Perceptron::<2>::new(0.1);
    let data = [
        ([0.0, 0.0], 0.0),
        ([0.0, 1.0], 0.0),
        ([1.0, 0.0], 0.0),
        ([1.0, 1.0], 1.0),
    ];
    p.train(&data, 100);
    for (inputs, expected) in &data {
        assert_eq!(p.forward(inputs), *expected);
    }
}
We build a 2-input neuron with a learning rate of 0.1 and hand it the four rows of AND. The output is 1 only when both inputs are 1 — and crucially, you can draw a single straight line that fences off the (1,1) corner from the other three points.
A hundred epochs is plenty: because AND is linearly separable, the rule converges and the early break likely fires long before the budget runs out. The line it finds is the decision boundary the weights and bias encode. Swap in the OR truth table (three 1s instead of one) and the very same code learns it too — a different boundary, discovered automatically, with not a single line changed.
Where it breaks: XOR and the limit of one line
Now try XOR — output 1 only when the inputs differ. A fresh test feeds it the new truth table and trains hard:
#[test]
fn cannot_learn_xor() {
    let mut p = Perceptron::<2>::new(0.1);
    let data = [
        ([0.0, 0.0], 0.0),
        ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0),
        ([1.0, 1.0], 0.0),
    ];
    p.train(&data, 1000); // 10x the epochs
    // … assert it never gets all four right
}
Plot those four points: the two 1s sit on one diagonal, the two 0s on the other. No straight line separates them. We give training ten times the budget — 1000 epochs — precisely to show that more effort doesn't rescue it; with no separating line, the rule never finds a zero-error pass and just oscillates.
So we assert the failure on purpose, in place of that comment:
let all_correct = data.iter()
    .all(|(inp, exp)| p.forward(inp) == *exp);
assert!(!all_correct,
    "a single line can't separate XOR");
The all adapter is true only if every row predicts correctly, so !all_correct codifies the expectation that at least one row stays wrong. This isn't a bug; it's the boundary of the model. Since our perceptron draws a single line (one weighted sum, one threshold), it cannot represent XOR, whatever the weights. Minsky and Papert's 1969 book Perceptrons made exactly this point, deflating perceptron hype and helping trigger an "AI winter." The cure is to stack neurons into layers so that several lines combine into a non-linear boundary. That is the multi-layer perceptron, a different course entirely.
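To make the cure concrete without training anything, here is a sketch that wires three step neurons together by hand: OR and NAND in a first layer, AND on top. All weights and biases are hand-picked assumptions, and unit and xor are hypothetical helper names; a real multi-layer perceptron would learn such values rather than have them typed in.

```rust
/// A two-input step neuron with fixed parameters.
fn unit(w: [f64; 2], b: f64, x: [f64; 2]) -> f64 {
    let sum = w[0] * x[0] + w[1] * x[1] + b;
    if sum > 0.0 { 1.0 } else { 0.0 }
}

/// XOR(a, b) = AND(OR(a, b), NAND(a, b)), built from three neurons.
fn xor(x: [f64; 2]) -> f64 {
    let or = unit([1.0, 1.0], -0.5, x); // fires unless both inputs are 0
    let nand = unit([-1.0, -1.0], 1.5, x); // fires unless both inputs are 1
    unit([1.0, 1.0], -1.5, [or, nand]) // fires only if both hidden units fire
}

fn main() {
    for x in [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] {
        println!("{:?} -> {}", x, xor(x));
    }
}
```

Each hidden neuron still draws one straight line; the output neuron keeps only the strip between them, which is the region a single line could never isolate.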
Where to go next
This neuron is complete and honest, but a few steps take it further:
- More inputs, real data: Perceptron<4> classifies 4-feature samples with zero code changes; try a simple linearly separable dataset.
- A smooth activation: swapping the hard step for a sigmoid gives gradients, the doorway to gradient descent and, eventually, backprop.
- Track convergence: return the per-epoch error count to watch it learn, and to see XOR refuse to settle.
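The last suggestion can be sketched as a variant of train that returns one error count per epoch. To stay dependency-free and reproducible, this version makes several simplifying assumptions: it is a free function named train_tracked (hypothetical), it starts from zero weights instead of random ones, and the demo uses a learning rate of 1.0 so all arithmetic is exact.

```rust
/// Step-activated prediction for weights `w` and bias `b`.
fn predict<const N: usize>(w: &[f64; N], b: f64, x: &[f64; N]) -> f64 {
    let sum = w.iter().zip(x).map(|(w, x)| w * x).sum::<f64>() + b;
    if sum > 0.0 { 1.0 } else { 0.0 }
}

/// Trains from zero weights and records the mistakes made in each epoch.
/// Stops early after the first epoch with no mistakes.
fn train_tracked<const N: usize>(
    data: &[([f64; N], f64)],
    learning_rate: f64,
    epochs: usize,
) -> Vec<usize> {
    let mut w = [0.0_f64; N];
    let mut b = 0.0_f64;
    let mut history = Vec::new();
    for _ in 0..epochs {
        let mut errors = 0;
        for (x, expected) in data {
            let error = expected - predict(&w, b, x);
            if error.abs() > 1e-10 {
                errors += 1;
                for (w, x) in w.iter_mut().zip(x) {
                    *w += learning_rate * error * x;
                }
                b += learning_rate * error;
            }
        }
        history.push(errors);
        if errors == 0 { break; }
    }
    history
}

fn main() {
    let and = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)];
    let xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)];
    println!("AND: {:?}", train_tracked(&and, 1.0, 100));
    let xor_history = train_tracked(&xor, 1.0, 1000);
    let clean = xor_history.iter().filter(|&&e| e == 0).count();
    println!("XOR: {} epochs, {} of them error-free", xor_history.len(), clean);
}
```

Under these assumptions the AND history is [1, 3, 3, 2, 1, 0]: the error count wobbles, then reaches zero in the sixth epoch and training stops. XOR uses its full 1000-epoch budget without a single clean pass, because a clean pass would require weights that classify all four rows, and no such weights exist.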
Why build it
You could pip install a neural net and never know what's inside. Building one neuron by hand makes the abstractions concrete: weights are just an array, "learning" is just a subtraction in a loop, and a model's limits come straight from its math — a single line can't bend. Hold that in your hands once and every fancier network is just more of the same idea, stacked. It fits in an afternoon.