## Phase 1: Core Data Structures

**`src/model.rs`** - manual parameter management

- `struct Parameters`: holds `w1, b1, w2, b2` as `Tensor`
- `impl Parameters`: initialization with `randn(0.1)` for weights, zeros for biases
- No `nn.Linear`: manual tensors to match the Python exercise

## Phase 2: Forward Pass

**`src/forward.rs`** (or in `model.rs`)

- `fn forward(params: &Parameters, images: Tensor) -> Tensor`
- Cast `uint8` images to `f32`, divide by 255, flatten to `[batch, 784]`
- `hidden = tanh(images @ w1 + b1)`
- `logits = hidden @ w2 + b2`
- Return raw logits (no softmax here)

## Phase 3: Loss Computation

**`src/loss.rs`**

- `fn cross_entropy_loss(logits: Tensor, labels: Tensor) -> Tensor`
- Manual implementation; no `CrossEntropyLoss` module
- `softmax = exp(logits - max) / sum(exp(logits - max))`
- Index `softmax` by the gold labels to get `p_correct`
- `loss = -mean(log(p_correct))`

## Phase 4: Backward Pass & SGD

**`src/train.rs`**

- `fn train_epoch(params: &mut Parameters, dataset: &[MnistItem], args: &Args)`
- For each batch:
  1. `let loss = cross_entropy_loss(forward(&params, images), labels)`
  2. `let grads = loss.backward()` (automatic differentiation)
  3. **Manual SGD**: `param = param - lr * grad` for each parameter
  4. No `Optimizer`; raw gradient descent like the Python version

## Phase 5: Evaluation

**`src/eval.rs`**

- `fn evaluate(params: &Parameters, dataset: &[MnistItem]) -> f64`
- `argmax` on logits, compare to labels, return accuracy

## Phase 6: Main Training Loop

**Update `src/main.rs`**

- Parse args ✓ (done)
- Load data ✓ (done)
- Initialize `Parameters` with seed
- Loop over `args.epochs`: `train_epoch` → `evaluate(dev)` → print
- Final `evaluate(test)`
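As a framework-free sanity check of Phase 1's shapes and initialization, a `Parameters` struct over plain `Vec<f32>` might look like the sketch below. The `hidden` size argument and the stand-in LCG random generator are illustrative assumptions, not the plan's actual `Tensor` API:

```rust
// Hypothetical framework-free sketch: weights as flat row-major Vec<f32>.
// Dimensions follow the plan: 784 -> hidden -> 10.
struct Parameters {
    w1: Vec<f32>, // [784 * hidden]
    b1: Vec<f32>, // [hidden]
    w2: Vec<f32>, // [hidden * 10]
    b2: Vec<f32>, // [10]
}

impl Parameters {
    // Weights: small random values scaled by 0.1; biases: zeros.
    // A simple LCG stands in for a real RNG so the sketch stays dependency-free.
    fn new(hidden: usize, seed: u64) -> Self {
        let mut state = seed;
        let mut rand = move || {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            let u = (state >> 40) as f32 / (1u64 << 24) as f32; // uniform in [0, 1)
            (u * 2.0 - 1.0) * 0.1 // scale into [-0.1, 0.1)
        };
        Parameters {
            w1: (0..784 * hidden).map(|_| rand()).collect(),
            b1: vec![0.0; hidden],
            w2: (0..hidden * 10).map(|_| rand()).collect(),
            b2: vec![0.0; 10],
        }
    }
}
```

Seeding through the constructor keeps runs reproducible, matching the plan's "initialize `Parameters` with seed" step in Phase 6.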
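The manual cross-entropy of Phase 3 can be sketched per example; subtracting the max before exponentiating is the numerical-stability trick the plan's `exp(logits - max)` formula implies. The slice-based signatures here are illustrative stand-ins for the plan's `Tensor` version:

```rust
// -log(softmax(logits)[label]), computed in a numerically stable way:
// log(softmax[y]) = (logits[y] - max) - ln(sum(exp(logits - max))).
fn cross_entropy_loss(logits: &[f32], label: usize) -> f32 {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let sum_exp: f32 = logits.iter().map(|&l| (l - max).exp()).sum();
    -((logits[label] - max) - sum_exp.ln())
}

// loss = -mean(log(p_correct)) over a batch, as in the plan.
fn mean_loss(batch_logits: &[Vec<f32>], labels: &[usize]) -> f32 {
    let total: f32 = batch_logits
        .iter()
        .zip(labels)
        .map(|(row, &y)| cross_entropy_loss(row, y))
        .sum();
    total / labels.len() as f32
}
```

Note that the log of the softmax is taken analytically rather than by materializing `p_correct` and calling `log`, which avoids `log(0)` when a probability underflows.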
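The "manual SGD" update in Phase 4 is just an element-wise `param = param - lr * grad`; gradients themselves come from `loss.backward()` and are not recomputed here. A minimal sketch (the `sgd_step` name is a hypothetical helper, not part of the plan):

```rust
// Raw gradient-descent update, no Optimizer: param <- param - lr * grad.
fn sgd_step(param: &mut [f32], grad: &[f32], lr: f32) {
    for (p, g) in param.iter_mut().zip(grad) {
        *p -= lr * g;
    }
}
```

In the plan's training loop this would be called once per batch for each of `w1`, `b1`, `w2`, `b2` with the corresponding gradient from autodiff.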
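Phase 5's evaluation reduces to an argmax over each row of logits compared against the gold label (again sketched over slices rather than the plan's `Tensor`/`MnistItem` types):

```rust
// Index of the largest logit; ties resolve to the later index via max_by.
fn argmax(row: &[f32]) -> usize {
    row.iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

// Fraction of examples where the predicted class matches the label.
fn accuracy(batch_logits: &[Vec<f32>], labels: &[usize]) -> f64 {
    let correct = batch_logits
        .iter()
        .zip(labels)
        .filter(|(row, &y)| argmax(row) == y)
        .count();
    correct as f64 / labels.len() as f64
}
```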
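The shape of Phase 6's epoch loop, with closures standing in for the pieces built in earlier phases (the `run` wrapper and closure signatures are illustrative, not the plan's `main.rs`):

```rust
// Epoch loop skeleton: train, evaluate on dev, print, collect accuracies.
fn run(
    epochs: usize,
    mut train_epoch: impl FnMut(),
    mut evaluate: impl FnMut() -> f64,
) -> Vec<f64> {
    let mut dev_acc = Vec::new();
    for epoch in 0..epochs {
        train_epoch();
        let acc = evaluate();
        println!("epoch {epoch}: dev accuracy {acc:.4}");
        dev_acc.push(acc);
    }
    dev_acc
}
```

A final `evaluate(test)` call would follow the loop, per the plan.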