llm-workshop/05-neural-networks/nn_workshop.ipynb
Eric 1604671d36 Initial commit: LLM workshop materials
Five modules covering nanoGPT, Ollama, RAG, semantic search, and neural networks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 07:11:01 -04:00

{
"cells": [
{
"cell_type": "markdown",
"id": "xbsmj1hcj1g",
"source": "# Building a Neural Network: $C_p(T)$ for Nitrogen\n\n**CHEG 667-013 — LLMs for Engineers**\n\nIn this notebook we fit the heat capacity of N₂ gas using three approaches:\n1. A polynomial fit (the classical approach)\n2. A neural network built from scratch in numpy\n3. The same network in PyTorch\n\nThis makes the ML concepts behind LLMs — weights, loss, gradient descent, overfitting — concrete and tangible.",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "szrl41l3xbq",
"source": "## 1. Load and plot the data\n\nThe data is from the [NIST Chemistry WebBook](https://webbook.nist.gov/): isobaric heat capacity of N₂ at 1 bar, 3002000 K.",
"metadata": {}
},
{
"cell_type": "code",
"id": "t4lqkcoeyil",
"source": "import numpy as np\nimport matplotlib.pyplot as plt\n\ndata = np.loadtxt(\"data/n2_cp.csv\", delimiter=\",\", skiprows=1)\nT_raw = data[:, 0] # Temperature (K)\nCp_raw = data[:, 1] # Cp (kJ/kg/K)\n\nplt.figure(figsize=(8, 5))\nplt.plot(T_raw, Cp_raw, 'ko', markersize=6)\nplt.xlabel('Temperature (K)')\nplt.ylabel('$C_p$ (kJ/kg/K)')\nplt.title('$C_p(T)$ for N$_2$ at 1 bar — NIST WebBook')\nplt.show()\n\nprint(f\"{len(T_raw)} data points, T range: {T_raw.min():.0f} {T_raw.max():.0f} K\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "1jyrgsvp7op",
"source": "## 2. Polynomial fit (baseline)\n\nTextbooks fit $C_p(T)$ with a polynomial: $C_p = a + bT + cT^2 + dT^3$. This is a **4-parameter** model. Let's fit it with `numpy.polyfit` and see how well it does.",
"metadata": {}
},
{
"cell_type": "code",
"id": "4smvu4z2oro",
"source": "# Fit a cubic polynomial\ncoeffs = np.polyfit(T_raw, Cp_raw, 3)\npoly = np.poly1d(coeffs)\n\nT_fine = np.linspace(T_raw.min(), T_raw.max(), 200)\nCp_poly = poly(T_fine)\n\n# Compute residuals\nCp_poly_at_data = poly(T_raw)\nmse_poly = np.mean((Cp_poly_at_data - Cp_raw) ** 2)\n\nplt.figure(figsize=(8, 5))\nplt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nplt.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Cubic polynomial (4 params)')\nplt.xlabel('Temperature (K)')\nplt.ylabel('$C_p$ (kJ/kg/K)')\nplt.title('Polynomial fit')\nplt.legend()\nplt.show()\n\nprint(f\"Polynomial coefficients: {coeffs}\")\nprint(f\"MSE: {mse_poly:.8f}\")\nprint(f\"Parameters: 4\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "97y7mrcekji",
"source": "## 3. Neural network from scratch (numpy)\n\nNow let's build a one-hidden-layer neural network. The architecture:\n\n```\nInput (1: T) → Hidden (10 neurons, tanh) → Output (1: Cp)\n```\n\nWe need to:\n1. **Normalize** the data to [0, 1] so the network trains efficiently\n2. **Forward pass**: compute predictions from input through each layer\n3. **Loss**: mean squared error between predictions and data\n4. **Backpropagation**: compute gradients of the loss w.r.t. each weight using the chain rule\n5. **Gradient descent**: update weights in the direction that reduces the loss\n\nThis is exactly what nanoGPT's `train.py` does — just at a much larger scale.",
"metadata": {}
},
{
"cell_type": "code",
"id": "365o7bqbwkr",
"source": "# Normalize inputs and outputs to [0, 1]\nT_min, T_max = T_raw.min(), T_raw.max()\nCp_min, Cp_max = Cp_raw.min(), Cp_raw.max()\n\nT = (T_raw - T_min) / (T_max - T_min)\nCp = (Cp_raw - Cp_min) / (Cp_max - Cp_min)\n\nX = T.reshape(-1, 1) # (N, 1) input matrix\nY = Cp.reshape(-1, 1) # (N, 1) target matrix\nN = X.shape[0]\n\n# Network setup\nH = 10 # hidden neurons\n\nnp.random.seed(42)\nW1 = np.random.randn(1, H) * 0.5 # input → hidden weights\nb1 = np.zeros((1, H)) # hidden biases\nW2 = np.random.randn(H, 1) * 0.5 # hidden → output weights\nb2 = np.zeros((1, 1)) # output bias\n\nprint(f\"Parameters: W1({W1.shape}) + b1({b1.shape}) + W2({W2.shape}) + b2({b2.shape})\")\nprint(f\"Total: {W1.size + b1.size + W2.size + b2.size} parameters for {N} data points\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"id": "5w1ezs9t2w6",
"source": "# Training loop\nlearning_rate = 0.01\nepochs = 5000\nlog_interval = 500\nlosses_np = []\n\nfor epoch in range(epochs):\n # Forward pass\n Z1 = X @ W1 + b1 # hidden pre-activation (N, H)\n A1 = np.tanh(Z1) # hidden activation (N, H)\n Y_pred = A1 @ W2 + b2 # output (N, 1)\n\n # Loss (mean squared error)\n error = Y_pred - Y\n loss = np.mean(error ** 2)\n losses_np.append(loss)\n\n # Backpropagation (chain rule, working backward)\n dL_dYpred = 2 * error / N\n dL_dW2 = A1.T @ dL_dYpred\n dL_db2 = np.sum(dL_dYpred, axis=0, keepdims=True)\n dL_dA1 = dL_dYpred @ W2.T\n dL_dZ1 = dL_dA1 * (1 - A1 ** 2) # tanh derivative\n dL_dW1 = X.T @ dL_dZ1\n dL_db1 = np.sum(dL_dZ1, axis=0, keepdims=True)\n\n # Gradient descent update\n W2 -= learning_rate * dL_dW2\n b2 -= learning_rate * dL_db2\n W1 -= learning_rate * dL_dW1\n b1 -= learning_rate * dL_db1\n\n if epoch % log_interval == 0 or epoch == epochs - 1:\n print(f\"Epoch {epoch:5d} Loss: {loss:.6f}\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"id": "onel9r0kjk",
"source": "# Predict on a fine grid and convert back to physical units\nT_fine_norm = np.linspace(0, 1, 200).reshape(-1, 1)\nA1_fine = np.tanh(T_fine_norm @ W1 + b1)\nCp_nn_norm = A1_fine @ W2 + b2\nCp_nn = Cp_nn_norm * (Cp_max - Cp_min) + Cp_min\nT_fine_K = T_fine_norm * (T_max - T_min) + T_min\n\n# MSE in original units for comparison with polynomial\nCp_nn_at_data = np.tanh(X @ W1 + b1) @ W2 + b2\nCp_nn_at_data = Cp_nn_at_data * (Cp_max - Cp_min) + Cp_min\nmse_nn = np.mean((Cp_nn_at_data.flatten() - Cp_raw) ** 2)\n\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n\nax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params, MSE={mse_poly:.2e})')\nax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r-', linewidth=2, label=f'NN numpy (31 params, MSE={mse_nn:.2e})')\nax1.set_xlabel('Temperature (K)')\nax1.set_ylabel('$C_p$ (kJ/kg/K)')\nax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\nax1.legend()\n\nax2.semilogy(losses_np)\nax2.set_xlabel('Epoch')\nax2.set_ylabel('MSE (normalized)')\nax2.set_title('Training loss — numpy NN')\n\nplt.tight_layout()\nplt.show()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "ea9z35qm9u8",
"source": "## 4. Neural network in PyTorch\n\nThe same network, but PyTorch handles backpropagation automatically. Compare the training loop above to the one below — `loss.backward()` replaces all of our manual gradient calculations.\n\nThis is the same API used in nanoGPT's `model.py` — `nn.Linear`, activation functions, `optimizer.step()`.",
"metadata": {}
},
{
"cell_type": "code",
"id": "3qxnrtyxqgz",
"source": "import torch\nimport torch.nn as nn\n\n# Prepare data as PyTorch tensors\nX_t = torch.tensor((T_raw - T_min) / (T_max - T_min), dtype=torch.float32).reshape(-1, 1)\nY_t = torch.tensor((Cp_raw - Cp_min) / (Cp_max - Cp_min), dtype=torch.float32).reshape(-1, 1)\n\n# Define the network\nmodel = nn.Sequential(\n nn.Linear(1, H), # input → hidden (W1, b1)\n nn.Tanh(), # activation\n nn.Linear(H, 1), # hidden → output (W2, b2)\n)\n\nprint(model)\nprint(f\"Total parameters: {sum(p.numel() for p in model.parameters())}\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"id": "ydl3ycnypps",
"source": "# Train\noptimizer = torch.optim.Adam(model.parameters(), lr=0.01)\nloss_fn = nn.MSELoss()\nlosses_torch = []\n\nfor epoch in range(epochs):\n Y_pred_t = model(X_t)\n loss = loss_fn(Y_pred_t, Y_t)\n losses_torch.append(loss.item())\n\n optimizer.zero_grad() # reset gradients\n loss.backward() # automatic differentiation\n optimizer.step() # update weights\n\n if epoch % log_interval == 0 or epoch == epochs - 1:\n print(f\"Epoch {epoch:5d} Loss: {loss.item():.6f}\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "bg0kvnk4ho",
"source": "## 5. Compare all three approaches",
"metadata": {}
},
{
"cell_type": "code",
"id": "h2dfstoh8gd",
"source": "# PyTorch predictions\nT_fine_t = torch.linspace(0, 1, 200).reshape(-1, 1)\nwith torch.no_grad():\n Cp_torch_norm = model(T_fine_t)\nCp_torch = Cp_torch_norm.numpy() * (Cp_max - Cp_min) + Cp_min\n\n# MSE for PyTorch model\nwith torch.no_grad():\n Cp_torch_at_data = model(X_t).numpy() * (Cp_max - Cp_min) + Cp_min\nmse_torch = np.mean((Cp_torch_at_data.flatten() - Cp_raw) ** 2)\n\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n\n# Left: all three fits\nax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params)')\nax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r--', linewidth=2, label=f'NN numpy (31 params)')\nax1.plot(T_fine_K.flatten(), Cp_torch.flatten(), 'g-', linewidth=2, alpha=0.8, label=f'NN PyTorch (31 params)')\nax1.set_xlabel('Temperature (K)')\nax1.set_ylabel('$C_p$ (kJ/kg/K)')\nax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\nax1.legend()\n\n# Right: training loss comparison\nax2.semilogy(losses_np, label='numpy (gradient descent)')\nax2.semilogy(losses_torch, label='PyTorch (Adam)')\nax2.set_xlabel('Epoch')\nax2.set_ylabel('MSE (normalized)')\nax2.set_title('Training loss comparison')\nax2.legend()\n\nplt.tight_layout()\nplt.show()\n\nprint(f\"MSE — Polynomial: {mse_poly:.2e} | NN numpy: {mse_nn:.2e} | NN PyTorch: {mse_torch:.2e}\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "xyw3sr20brn",
"source": "## 6. Extrapolation\n\nHow do the models behave *outside* the training range? This is a key test — and where the differences become stark.",
"metadata": {}
},
{
"cell_type": "code",
"id": "fi3iq2sjh6",
"source": "# Extrapolate beyond the training range\nT_extrap = np.linspace(100, 2500, 300)\nT_extrap_norm = ((T_extrap - T_min) / (T_max - T_min)).reshape(-1, 1)\n\n# Polynomial extrapolation\nCp_poly_extrap = poly(T_extrap)\n\n# Numpy NN extrapolation\nA1_extrap = np.tanh(T_extrap_norm @ W1 + b1)\nCp_nn_extrap = (A1_extrap @ W2 + b2) * (Cp_max - Cp_min) + Cp_min\n\n# PyTorch NN extrapolation\nwith torch.no_grad():\n Cp_torch_extrap = model(torch.tensor(T_extrap_norm, dtype=torch.float32)).numpy()\nCp_torch_extrap = Cp_torch_extrap * (Cp_max - Cp_min) + Cp_min\n\nplt.figure(figsize=(10, 6))\nplt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nplt.plot(T_extrap, Cp_poly_extrap, 'b-', linewidth=2, label='Polynomial')\nplt.plot(T_extrap, Cp_nn_extrap.flatten(), 'r--', linewidth=2, label='NN numpy')\nplt.plot(T_extrap, Cp_torch_extrap.flatten(), 'g-', linewidth=2, alpha=0.8, label='NN PyTorch')\nplt.axvline(T_raw.min(), color='gray', linestyle=':', alpha=0.5, label='Training range')\nplt.axvline(T_raw.max(), color='gray', linestyle=':', alpha=0.5)\nplt.xlabel('Temperature (K)')\nplt.ylabel('$C_p$ (kJ/kg/K)')\nplt.title('Extrapolation beyond training data')\nplt.legend()\nplt.show()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "yb2s18keiw",
"source": "## 7. Exercises\n\nTry these in new cells below:\n\n1. **Change the number of hidden neurons** (`H`). Try 2, 5, 20, 50. How does the fit change? At what point does adding neurons stop helping?\n\n2. **Activation functions**: In the PyTorch model, replace `nn.Tanh()` with `nn.ReLU()` or `nn.Sigmoid()`. How does the fit change?\n\n3. **Optimizer comparison**: Replace `Adam` with `torch.optim.SGD(model.parameters(), lr=0.01)`. How does training speed compare?\n\n4. **Remove normalization**: Use `T_raw` and `Cp_raw` directly (no scaling to [0,1]). What happens? Can you fix it by adjusting the learning rate?\n\n5. **Overfitting**: Set `H = 100` and train for 20,000 epochs. Does it fit the training data well? Look at the extrapolation — is it reasonable?\n\n6. **Higher-order polynomial**: Try `np.polyfit(T_raw, Cp_raw, 10)`. How does it compare to the cubic? How does it extrapolate?",
"metadata": {}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}