Add uv for dependency management and update workshop materials

This commit is contained in:
Eric 2026-03-31 12:03:34 -04:00
commit 7e4f0fb80b
6 changed files with 4122 additions and 53 deletions


@@ -62,7 +62,7 @@ The curve is smooth and nonlinear — $C_p$ increases with temperature as molecu
Our network has three layers:
```
-Input (1 neuron: T) → Hidden (10 neurons) → Output (1 neuron: Cp)
+Input (1 neuron: T) -> Hidden (10 neurons) -> Output (1 neuron: Cp)
```
Here's what happens at each step:
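The per-step walkthrough is elided in this hunk, but the forward pass it describes can be sketched directly from the architecture above. This is a minimal sketch, not the workshop's actual code: the names `W1`, `b1`, `W2`, `b2` match the parameter list later in the document, while the `tanh` hidden activation is an assumption.

```python
import numpy as np

# Hypothetical forward pass for the 1 -> 10 -> 1 network above.
# The tanh hidden activation is an assumption; the output is linear,
# as the document states.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(1, 10))   # input -> hidden weights
b1 = np.zeros(10)               # hidden biases
W2 = rng.normal(size=(10, 1))   # hidden -> output weights
b2 = np.zeros(1)                # output bias

def forward(T):
    h = np.tanh(T @ W1 + b1)    # hidden layer: affine transform + nonlinearity
    return h @ W2 + b2          # linear output: predicted Cp, no activation

T = np.array([[300.0]])         # one temperature sample
print(forward(T).shape)         # (1, 1): one Cp prediction
```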
@@ -84,9 +84,9 @@ This is a linear combination — no activation on the output, since we want to p
### Counting parameters
With 10 hidden neurons:
-- `W1`: 10 weights (input → hidden)
+- `W1`: 10 weights (input -> hidden)
- `b1`: 10 biases (hidden)
-- `W2`: 10 weights (hidden → output)
+- `W2`: 10 weights (hidden -> output)
- `b2`: 1 bias (output)
- **Total: 31 parameters**
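The arithmetic behind that total is worth making explicit: each parameter tensor's size is the product of its dimensions, and the total is their sum.

```python
# Parameter count for the 1 -> 10 -> 1 network, tensor by tensor.
# Shapes follow the list above.
n_params = (1 * 10   # W1: input -> hidden weights
            + 10     # b1: hidden biases
            + 10 * 1 # W2: hidden -> output weights
            + 1)     # b2: output bias
print(n_params)      # 31
```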
@@ -123,7 +123,7 @@ $$w \leftarrow w - \eta \cdot \frac{\partial L}{\partial w}$$
where $\eta$ is the **learning rate** — a small number (0.01 in our code) that controls how big each step is. Too large and training oscillates; too small and it's painfully slow.
-One full pass through these three steps (forward → loss → backward → update) is one **epoch**. We train for 5000 epochs.
+One full pass through these three steps (forward -> loss -> backward -> update) is one **epoch**. We train for 5000 epochs.
In nanoGPT, the training loop in `train.py` does exactly the same thing, but with the AdamW optimizer (a fancier version of gradient descent) and batches of data instead of the full dataset.
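The full loop described above (forward pass, MSE loss, hand-derived backward pass, gradient-descent update with $\eta = 0.01$, 5000 epochs) can be sketched end to end. This is an illustrative sketch only: the variable names, the `tanh` hidden activation, and the toy synthetic $C_p(T)$ data are assumptions, not the workshop's code.

```python
import numpy as np

# Toy data standing in for the Cp(T) curve: smooth and nonlinear.
rng = np.random.default_rng(0)
T = np.linspace(0.0, 1.0, 50).reshape(-1, 1)   # normalized temperatures
Cp = 1.0 + 0.5 * T**2                          # assumed synthetic target

W1 = rng.normal(scale=0.5, size=(1, 10)); b1 = np.zeros(10)
W2 = rng.normal(scale=0.5, size=(10, 1)); b2 = np.zeros(1)
eta = 0.01                                     # learning rate from the text

for epoch in range(5000):
    # forward pass
    z1 = T @ W1 + b1
    h = np.tanh(z1)                            # assumed hidden activation
    pred = h @ W2 + b2                         # linear output
    # loss: mean squared error over the full dataset
    err = pred - Cp
    loss = np.mean(err**2)
    # backward pass: chain rule, by hand
    dpred = 2 * err / len(T)
    dW2 = h.T @ dpred; db2 = dpred.sum(axis=0)
    dh = dpred @ W2.T
    dz1 = dh * (1 - h**2)                      # derivative of tanh
    dW1 = T.T @ dz1; db1 = dz1.sum(axis=0)
    # update: w <- w - eta * dL/dw
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2

print(f"final loss: {loss:.6f}")
```

nanoGPT's `train.py` swaps plain gradient descent for AdamW and the full dataset for mini-batches, but the four steps per iteration are the same.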