
{
"cells": [
{
"cell_type": "markdown",
"id": "xbsmj1hcj1g",
"metadata": {},
"source": [
"# Building a Neural Network: $C_p(T)$ for Nitrogen\n",
"\n",
"**CHEG 667-013 — LLMs for Engineers**\n",
"\n",
"In this notebook we fit the heat capacity of N₂ gas using three approaches:\n",
"1. A polynomial fit (the classical approach)\n",
"2. A neural network built from scratch in numpy\n",
"3. The same network in PyTorch\n",
"\n",
"This makes the ML concepts behind LLMs — weights, loss, gradient descent, overfitting — concrete and tangible."
]
},
{
"cell_type": "markdown",
"id": "szrl41l3xbq",
"metadata": {},
"source": [
"## 1. Load and plot the data\n",
"\n",
"The data is from the [NIST Chemistry WebBook](https://webbook.nist.gov/): isobaric heat capacity of N₂ at 1 bar, 300–2000 K."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "t4lqkcoeyil",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"data = np.loadtxt(\"data/n2_cp.csv\", delimiter=\",\", skiprows=1)\n",
"T_raw = data[:, 0] # Temperature (K)\n",
"Cp_raw = data[:, 1] # Cp (kJ/kg/K)\n",
"\n",
"plt.figure(figsize=(8, 5))\n",
"plt.plot(T_raw, Cp_raw, 'ko', markersize=6)\n",
"plt.xlabel('Temperature (K)')\n",
"plt.ylabel('$C_p$ (kJ/kg/K)')\n",
"plt.title('$C_p(T)$ for N$_2$ at 1 bar — NIST WebBook')\n",
"plt.show()\n",
"\n",
"print(f\"{len(T_raw)} data points, T range: {T_raw.min():.0f}–{T_raw.max():.0f} K\")"
]
},
{
"cell_type": "markdown",
"id": "1jyrgsvp7op",
"metadata": {},
"source": [
"## 2. Polynomial fit (baseline)\n",
"\n",
"Textbooks fit $C_p(T)$ with a polynomial: $C_p = a + bT + cT^2 + dT^3$. This is a **4-parameter** model. Let's fit it with `numpy.polyfit` and see how well it does."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4smvu4z2oro",
"metadata": {},
"outputs": [],
"source": [
"# Fit a cubic polynomial\n",
"coeffs = np.polyfit(T_raw, Cp_raw, 3)\n",
"poly = np.poly1d(coeffs)\n",
"\n",
"T_fine = np.linspace(T_raw.min(), T_raw.max(), 200)\n",
"Cp_poly = poly(T_fine)\n",
"\n",
"# Compute residuals\n",
"Cp_poly_at_data = poly(T_raw)\n",
"mse_poly = np.mean((Cp_poly_at_data - Cp_raw) ** 2)\n",
"\n",
"plt.figure(figsize=(8, 5))\n",
"plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
"plt.plot(T_fine, Cp_poly, 'b-', linewidth=2, label='Cubic polynomial (4 params)')\n",
"plt.xlabel('Temperature (K)')\n",
"plt.ylabel('$C_p$ (kJ/kg/K)')\n",
"plt.title('Polynomial fit')\n",
"plt.legend()\n",
"plt.show()\n",
"\n",
"print(f\"Polynomial coefficients: {coeffs}\")\n",
"print(f\"MSE: {mse_poly:.8f}\")\n",
"print(f\"Parameters: 4\")"
]
},
{
"cell_type": "markdown",
"id": "97y7mrcekji",
"metadata": {},
"source": [
"## 3. Neural network from scratch (numpy)\n",
"\n",
"Now let's build a one-hidden-layer neural network. The architecture:\n",
"\n",
"```\n",
"Input (1: T) -> Hidden (10 neurons, tanh) -> Output (1: Cp)\n",
"```\n",
"\n",
"We need to:\n",
"1. **Normalize** the data to [0, 1] so the network trains efficiently\n",
"2. **Forward pass**: compute predictions from input through each layer\n",
"3. **Loss**: mean squared error between predictions and data\n",
"4. **Backpropagation**: compute gradients of the loss w.r.t. each weight using the chain rule\n",
"5. **Gradient descent**: update weights in the direction that reduces the loss\n",
"\n",
"This is exactly what nanoGPT's `train.py` does — just at a much larger scale."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "365o7bqbwkr",
"metadata": {},
"outputs": [],
"source": [
"# Normalize inputs and outputs to [0, 1]\n",
"T_min, T_max = T_raw.min(), T_raw.max()\n",
"Cp_min, Cp_max = Cp_raw.min(), Cp_raw.max()\n",
"\n",
"T = (T_raw - T_min) / (T_max - T_min)\n",
"Cp = (Cp_raw - Cp_min) / (Cp_max - Cp_min)\n",
"\n",
"X = T.reshape(-1, 1) # (N, 1) input matrix\n",
"Y = Cp.reshape(-1, 1) # (N, 1) target matrix\n",
"N = X.shape[0]\n",
"\n",
"# Network setup\n",
"H = 10 # hidden neurons\n",
"\n",
"np.random.seed(42)\n",
"W1 = np.random.randn(1, H) * 0.5 # input -> hidden weights\n",
"b1 = np.zeros((1, H)) # hidden biases\n",
"W2 = np.random.randn(H, 1) * 0.5 # hidden -> output weights\n",
"b2 = np.zeros((1, 1)) # output bias\n",
"\n",
"print(f\"Parameters: W1({W1.shape}) + b1({b1.shape}) + W2({W2.shape}) + b2({b2.shape})\")\n",
"print(f\"Total: {W1.size + b1.size + W2.size + b2.size} parameters for {N} data points\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5w1ezs9t2w6",
"metadata": {},
"outputs": [],
"source": [
"# Training loop\n",
"learning_rate = 0.01\n",
"epochs = 5000\n",
"log_interval = 500\n",
"losses_np = []\n",
"\n",
"for epoch in range(epochs):\n",
" # Forward pass\n",
" Z1 = X @ W1 + b1 # hidden pre-activation (N, H)\n",
" A1 = np.tanh(Z1) # hidden activation (N, H)\n",
" Y_pred = A1 @ W2 + b2 # output (N, 1)\n",
"\n",
" # Loss (mean squared error)\n",
" error = Y_pred - Y\n",
" loss = np.mean(error ** 2)\n",
" losses_np.append(loss)\n",
"\n",
" # Backpropagation (chain rule, working backward)\n",
" dL_dYpred = 2 * error / N\n",
" dL_dW2 = A1.T @ dL_dYpred\n",
" dL_db2 = np.sum(dL_dYpred, axis=0, keepdims=True)\n",
" dL_dA1 = dL_dYpred @ W2.T\n",
" dL_dZ1 = dL_dA1 * (1 - A1 ** 2) # tanh derivative\n",
" dL_dW1 = X.T @ dL_dZ1\n",
" dL_db1 = np.sum(dL_dZ1, axis=0, keepdims=True)\n",
"\n",
" # Gradient descent update\n",
" W2 -= learning_rate * dL_dW2\n",
" b2 -= learning_rate * dL_db2\n",
" W1 -= learning_rate * dL_dW1\n",
" b1 -= learning_rate * dL_db1\n",
"\n",
" if epoch % log_interval == 0 or epoch == epochs - 1:\n",
" print(f\"Epoch {epoch:5d} Loss: {loss:.6f}\")"
]
},
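{
"cell_type": "markdown",
"id": "kq2gradchkmd",
"metadata": {},
"source": [
"A useful sanity check on hand-derived backpropagation is to compare each analytic gradient with a central finite difference, $\\partial L/\\partial w \\approx [L(w+\\epsilon) - L(w-\\epsilon)]/(2\\epsilon)$. The cell below is a self-contained sketch on a tiny toy network and dataset (deliberately independent of the variables above), checking the gradient of one weight in the output layer."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "m8xgradchkcd",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Gradient check: analytic backprop vs. central finite differences (toy data)\n",
"rng = np.random.default_rng(0)\n",
"Xc = rng.uniform(size=(20, 1))\n",
"Yc = np.sin(3 * Xc)\n",
"W1c = rng.normal(size=(1, 4)) * 0.5\n",
"b1c = np.zeros((1, 4))\n",
"W2c = rng.normal(size=(4, 1)) * 0.5\n",
"b2c = np.zeros((1, 1))\n",
"\n",
"def toy_loss(W2v):\n",
"    A1c = np.tanh(Xc @ W1c + b1c)\n",
"    return np.mean((A1c @ W2v + b2c - Yc) ** 2)\n",
"\n",
"# Analytic gradient of the loss w.r.t. W2 (same chain rule as above)\n",
"A1c = np.tanh(Xc @ W1c + b1c)\n",
"dW2_analytic = A1c.T @ (2 * (A1c @ W2c + b2c - Yc) / len(Xc))\n",
"\n",
"# Central finite difference for the single weight W2[0, 0]\n",
"eps = 1e-6\n",
"W2_plus, W2_minus = W2c.copy(), W2c.copy()\n",
"W2_plus[0, 0] += eps\n",
"W2_minus[0, 0] -= eps\n",
"dW2_numeric = (toy_loss(W2_plus) - toy_loss(W2_minus)) / (2 * eps)\n",
"\n",
"print(f\"analytic: {dW2_analytic[0, 0]:.10f}\")\n",
"print(f\"numeric:  {dW2_numeric:.10f}\")"
]
},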
{
"cell_type": "code",
"execution_count": null,
"id": "onel9r0kjk",
"metadata": {},
"outputs": [],
"source": [
"# Predict on a fine grid and convert back to physical units\n",
"T_fine_norm = np.linspace(0, 1, 200).reshape(-1, 1)\n",
"A1_fine = np.tanh(T_fine_norm @ W1 + b1)\n",
"Cp_nn_norm = A1_fine @ W2 + b2\n",
"Cp_nn = Cp_nn_norm * (Cp_max - Cp_min) + Cp_min\n",
"T_fine_K = T_fine_norm * (T_max - T_min) + T_min\n",
"\n",
"# MSE in original units for comparison with polynomial\n",
"Cp_nn_at_data = np.tanh(X @ W1 + b1) @ W2 + b2\n",
"Cp_nn_at_data = Cp_nn_at_data * (Cp_max - Cp_min) + Cp_min\n",
"mse_nn = np.mean((Cp_nn_at_data.flatten() - Cp_raw) ** 2)\n",
"\n",
"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n",
"\n",
"ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
"ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params, MSE={mse_poly:.2e})')\n",
"ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r-', linewidth=2, label=f'NN numpy (31 params, MSE={mse_nn:.2e})')\n",
"ax1.set_xlabel('Temperature (K)')\n",
"ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n",
"ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n",
"ax1.legend()\n",
"\n",
"ax2.semilogy(losses_np)\n",
"ax2.set_xlabel('Epoch')\n",
"ax2.set_ylabel('MSE (normalized)')\n",
"ax2.set_title('Training loss — numpy NN')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "ea9z35qm9u8",
"metadata": {},
"source": [
"## 4. Neural network in PyTorch\n",
"\n",
"The same network, but PyTorch handles backpropagation automatically. Compare the training loop above to the one below: `loss.backward()` replaces all of our manual gradient calculations, and `optimizer.step()` applies the weight updates.\n",
"\n",
"This is the same API used in nanoGPT's `model.py` — `nn.Linear`, activation functions, `optimizer.step()`."
]
},
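{
"cell_type": "markdown",
"id": "v4autogradmd",
"metadata": {},
"source": [
"To see what `loss.backward()` does on the smallest possible example: for $y = x^2$, autograd fills `x.grad` with $dy/dx = 2x$. (This toy cell is independent of the model below.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "w9autogradcd",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"x = torch.tensor(3.0, requires_grad=True)\n",
"y = x ** 2 # forward pass records the computation graph\n",
"y.backward() # autograd applies the chain rule\n",
"print(x.grad) # dy/dx = 2x = 6"
]
},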
{
"cell_type": "code",
"execution_count": null,
"id": "3qxnrtyxqgz",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"# Prepare data as PyTorch tensors\n",
"X_t = torch.tensor((T_raw - T_min) / (T_max - T_min), dtype=torch.float32).reshape(-1, 1)\n",
"Y_t = torch.tensor((Cp_raw - Cp_min) / (Cp_max - Cp_min), dtype=torch.float32).reshape(-1, 1)\n",
"\n",
"# Define the network\n",
"model = nn.Sequential(\n",
" nn.Linear(1, H), # input -> hidden (W1, b1)\n",
" nn.Tanh(), # activation\n",
" nn.Linear(H, 1), # hidden -> output (W2, b2)\n",
")\n",
"\n",
"print(model)\n",
"print(f\"Total parameters: {sum(p.numel() for p in model.parameters())}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ydl3ycnypps",
"metadata": {},
"outputs": [],
"source": [
"# Train\n",
"optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n",
"loss_fn = nn.MSELoss()\n",
"losses_torch = []\n",
"\n",
"for epoch in range(epochs):\n",
" Y_pred_t = model(X_t)\n",
" loss = loss_fn(Y_pred_t, Y_t)\n",
" losses_torch.append(loss.item())\n",
"\n",
" optimizer.zero_grad() # reset gradients\n",
" loss.backward() # automatic differentiation\n",
" optimizer.step() # update weights\n",
"\n",
" if epoch % log_interval == 0 or epoch == epochs - 1:\n",
" print(f\"Epoch {epoch:5d} Loss: {loss.item():.6f}\")"
]
},
{
"cell_type": "markdown",
"id": "bg0kvnk4ho",
"metadata": {},
"source": [
"## 5. Compare all three approaches"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "h2dfstoh8gd",
"metadata": {},
"outputs": [],
"source": [
"# PyTorch predictions\n",
"T_fine_t = torch.linspace(0, 1, 200).reshape(-1, 1)\n",
"with torch.no_grad():\n",
" Cp_torch_norm = model(T_fine_t)\n",
"Cp_torch = Cp_torch_norm.numpy() * (Cp_max - Cp_min) + Cp_min\n",
"\n",
"# MSE for PyTorch model\n",
"with torch.no_grad():\n",
" Cp_torch_at_data = model(X_t).numpy() * (Cp_max - Cp_min) + Cp_min\n",
"mse_torch = np.mean((Cp_torch_at_data.flatten() - Cp_raw) ** 2)\n",
"\n",
"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n",
"\n",
"# Left: all three fits\n",
"ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
"ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label='Polynomial (4 params)')\n",
"ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r--', linewidth=2, label='NN numpy (31 params)')\n",
"ax1.plot(T_fine_K.flatten(), Cp_torch.flatten(), 'g-', linewidth=2, alpha=0.8, label='NN PyTorch (31 params)')\n",
"ax1.set_xlabel('Temperature (K)')\n",
"ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n",
"ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n",
"ax1.legend()\n",
"\n",
"# Right: training loss comparison\n",
"ax2.semilogy(losses_np, label='numpy (gradient descent)')\n",
"ax2.semilogy(losses_torch, label='PyTorch (Adam)')\n",
"ax2.set_xlabel('Epoch')\n",
"ax2.set_ylabel('MSE (normalized)')\n",
"ax2.set_title('Training loss comparison')\n",
"ax2.legend()\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"print(f\"MSE — Polynomial: {mse_poly:.2e} | NN numpy: {mse_nn:.2e} | NN PyTorch: {mse_torch:.2e}\")"
]
},
{
"cell_type": "markdown",
"id": "xyw3sr20brn",
"metadata": {},
"source": [
"## 6. Extrapolation\n",
"\n",
"How do the models behave *outside* the training range? This is a key test — and where the differences become stark."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fi3iq2sjh6",
"metadata": {},
"outputs": [],
"source": [
"# Extrapolate beyond the training range\n",
"T_extrap = np.linspace(100, 2500, 300)\n",
"T_extrap_norm = ((T_extrap - T_min) / (T_max - T_min)).reshape(-1, 1)\n",
"\n",
"# Polynomial extrapolation\n",
"Cp_poly_extrap = poly(T_extrap)\n",
"\n",
"# Numpy NN extrapolation\n",
"A1_extrap = np.tanh(T_extrap_norm @ W1 + b1)\n",
"Cp_nn_extrap = (A1_extrap @ W2 + b2) * (Cp_max - Cp_min) + Cp_min\n",
"\n",
"# PyTorch NN extrapolation\n",
"with torch.no_grad():\n",
" Cp_torch_extrap = model(torch.tensor(T_extrap_norm, dtype=torch.float32)).numpy()\n",
"Cp_torch_extrap = Cp_torch_extrap * (Cp_max - Cp_min) + Cp_min\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
"plt.plot(T_extrap, Cp_poly_extrap, 'b-', linewidth=2, label='Polynomial')\n",
"plt.plot(T_extrap, Cp_nn_extrap.flatten(), 'r--', linewidth=2, label='NN numpy')\n",
"plt.plot(T_extrap, Cp_torch_extrap.flatten(), 'g-', linewidth=2, alpha=0.8, label='NN PyTorch')\n",
"plt.axvline(T_raw.min(), color='gray', linestyle=':', alpha=0.5, label='Training range')\n",
"plt.axvline(T_raw.max(), color='gray', linestyle=':', alpha=0.5)\n",
"plt.xlabel('Temperature (K)')\n",
"plt.ylabel('$C_p$ (kJ/kg/K)')\n",
"plt.title('Extrapolation beyond training data')\n",
"plt.legend()\n",
"plt.show()"
]
},
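{
"cell_type": "markdown",
"id": "r2satmd",
"metadata": {},
"source": [
"Why do the networks flatten out while the polynomial shoots off? Far outside the training range every hidden $\\tanh$ saturates at $\\pm 1$, so the network output approaches a constant, while the cubic polynomial grows like $T^3$. A self-contained numeric illustration with small fixed toy weights (not the trained parameters):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "s7satcd",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Toy 1 -> 3 -> 1 tanh network with fixed illustrative weights\n",
"W1s = np.array([[1.5, -2.0, 0.7]])\n",
"b1s = np.zeros((1, 3))\n",
"W2s = np.array([[0.8], [-0.3], [1.1]])\n",
"b2s = np.array([[0.2]])\n",
"\n",
"def toy_nn(t):\n",
"    return (np.tanh(np.array([[t]]) @ W1s + b1s) @ W2s + b2s).item()\n",
"\n",
"# Moving further from zero, every tanh saturates and the output plateaus\n",
"print([round(toy_nn(t), 4) for t in (1.0, 5.0, 100.0)])\n",
"\n",
"# Saturated limit: each tanh pinned at sign(W1), plus the output bias\n",
"limit = float((np.sign(W1s) * W2s.T).sum() + b2s.item())\n",
"print(round(limit, 4)) # 2.4"
]
},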
{
"cell_type": "markdown",
"id": "yb2s18keiw",
"metadata": {},
"source": [
"## 7. Exercises\n",
"\n",
"Try these in new cells below:\n",
"\n",
"1. **Change the number of hidden neurons** (`H`). Try 2, 5, 20, 50. How does the fit change? At what point does adding neurons stop helping?\n",
"\n",
"2. **Activation functions**: In the PyTorch model, replace `nn.Tanh()` with `nn.ReLU()` or `nn.Sigmoid()`. How does the fit change?\n",
"\n",
"3. **Optimizer comparison**: Replace `Adam` with `torch.optim.SGD(model.parameters(), lr=0.01)`. How does training speed compare?\n",
"\n",
"4. **Remove normalization**: Use `T_raw` and `Cp_raw` directly (no scaling to [0,1]). What happens? Can you fix it by adjusting the learning rate?\n",
"\n",
"5. **Overfitting**: Set `H = 100` and train for 20,000 epochs. Does it fit the training data well? Look at the extrapolation — is it reasonable?\n",
"\n",
"6. **Higher-order polynomial**: Try `np.polyfit(T_raw, Cp_raw, 10)`. How does it compare to the cubic? How does it extrapolate?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}