{ "cells": [ { "cell_type": "markdown", "id": "xbsmj1hcj1g", "metadata": {}, "source": [ "# Building a Neural Network: $C_p(T)$ for Nitrogen\n", "\n", "**CHEG 667-013 — LLMs for Engineers**\n", "\n", "In this notebook we fit the heat capacity of N₂ gas using three approaches:\n", "1. A polynomial fit (the classical approach)\n", "2. A neural network built from scratch in numpy\n", "3. The same network in PyTorch\n", "\n", "This makes the ML concepts behind LLMs — weights, loss, gradient descent, overfitting — concrete and tangible." ] }, { "cell_type": "markdown", "id": "szrl41l3xbq", "metadata": {}, "source": [ "## 1. Load and plot the data\n", "\n", "The data is from the [NIST Chemistry WebBook](https://webbook.nist.gov/): isobaric heat capacity of N₂ at 1 bar, 300–2000 K." ] }, { "cell_type": "code", "execution_count": null, "id": "t4lqkcoeyil", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "data = np.loadtxt(\"data/n2_cp.csv\", delimiter=\",\", skiprows=1)\n", "T_raw = data[:, 0] # Temperature (K)\n", "Cp_raw = data[:, 1] # Cp (kJ/kg/K)\n", "\n", "plt.figure(figsize=(8, 5))\n", "plt.plot(T_raw, Cp_raw, 'ko', markersize=6)\n", "plt.xlabel('Temperature (K)')\n", "plt.ylabel('$C_p$ (kJ/kg/K)')\n", "plt.title('$C_p(T)$ for N$_2$ at 1 bar — NIST WebBook')\n", "plt.show()\n", "\n", "print(f\"{len(T_raw)} data points, T range: {T_raw.min():.0f} – {T_raw.max():.0f} K\")" ] }, { "cell_type": "markdown", "id": "1jyrgsvp7op", "metadata": {}, "source": [ "## 2. Polynomial fit (baseline)\n", "\n", "Textbooks fit $C_p(T)$ with a polynomial: $C_p = a + bT + cT^2 + dT^3$. This is a **4-parameter** model. Let's fit it with `numpy.polyfit` and see how well it does." 
] }, { "cell_type": "code", "execution_count": null, "id": "4smvu4z2oro", "metadata": {}, "outputs": [], "source": [ "# Fit a cubic polynomial\n", "coeffs = np.polyfit(T_raw, Cp_raw, 3)\n", "poly = np.poly1d(coeffs)\n", "\n", "T_fine = np.linspace(T_raw.min(), T_raw.max(), 200)\n", "Cp_poly = poly(T_fine)\n", "\n", "# Compute residuals\n", "Cp_poly_at_data = poly(T_raw)\n", "mse_poly = np.mean((Cp_poly_at_data - Cp_raw) ** 2)\n", "\n", "plt.figure(figsize=(8, 5))\n", "plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n", "plt.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Cubic polynomial (4 params)')\n", "plt.xlabel('Temperature (K)')\n", "plt.ylabel('$C_p$ (kJ/kg/K)')\n", "plt.title('Polynomial fit')\n", "plt.legend()\n", "plt.show()\n", "\n", "print(f\"Polynomial coefficients: {coeffs}\")\n", "print(f\"MSE: {mse_poly:.8f}\")\n", "print(f\"Parameters: 4\")" ] }, { "cell_type": "markdown", "id": "97y7mrcekji", "metadata": {}, "source": [ "## 3. Neural network from scratch (numpy)\n", "\n", "Now let's build a one-hidden-layer neural network. The architecture:\n", "\n", "```\n", "Input (1: T) -> Hidden (10 neurons, tanh) -> Output (1: Cp)\n", "```\n", "\n", "We need to:\n", "1. **Normalize** the data to [0, 1] so the network trains efficiently\n", "2. **Forward pass**: compute predictions from input through each layer\n", "3. **Loss**: mean squared error between predictions and data\n", "4. **Backpropagation**: compute gradients of the loss w.r.t. each weight using the chain rule\n", "5. **Gradient descent**: update weights in the direction that reduces the loss\n", "\n", "This is exactly what nanoGPT's `train.py` does — just at a much larger scale." 
] }, { "cell_type": "code", "execution_count": null, "id": "365o7bqbwkr", "metadata": {}, "outputs": [], "source": [ "# Normalize inputs and outputs to [0, 1]\n", "T_min, T_max = T_raw.min(), T_raw.max()\n", "Cp_min, Cp_max = Cp_raw.min(), Cp_raw.max()\n", "\n", "T = (T_raw - T_min) / (T_max - T_min)\n", "Cp = (Cp_raw - Cp_min) / (Cp_max - Cp_min)\n", "\n", "X = T.reshape(-1, 1) # (N, 1) input matrix\n", "Y = Cp.reshape(-1, 1) # (N, 1) target matrix\n", "N = X.shape[0]\n", "\n", "# Network setup\n", "H = 10 # hidden neurons\n", "\n", "np.random.seed(42)\n", "W1 = np.random.randn(1, H) * 0.5 # input -> hidden weights\n", "b1 = np.zeros((1, H)) # hidden biases\n", "W2 = np.random.randn(H, 1) * 0.5 # hidden -> output weights\n", "b2 = np.zeros((1, 1)) # output bias\n", "\n", "print(f\"Parameters: W1({W1.shape}) + b1({b1.shape}) + W2({W2.shape}) + b2({b2.shape})\")\n", "print(f\"Total: {W1.size + b1.size + W2.size + b2.size} parameters for {N} data points\")" ] }, { "cell_type": "code", "execution_count": null, "id": "5w1ezs9t2w6", "metadata": {}, "outputs": [], "source": [ "# Training loop\n", "learning_rate = 0.01\n", "epochs = 5000\n", "log_interval = 500\n", "losses_np = []\n", "\n", "for epoch in range(epochs):\n", " # Forward pass\n", " Z1 = X @ W1 + b1 # hidden pre-activation (N, H)\n", " A1 = np.tanh(Z1) # hidden activation (N, H)\n", " Y_pred = A1 @ W2 + b2 # output (N, 1)\n", "\n", " # Loss (mean squared error)\n", " error = Y_pred - Y\n", " loss = np.mean(error ** 2)\n", " losses_np.append(loss)\n", "\n", " # Backpropagation (chain rule, working backward)\n", " dL_dYpred = 2 * error / N\n", " dL_dW2 = A1.T @ dL_dYpred\n", " dL_db2 = np.sum(dL_dYpred, axis=0, keepdims=True)\n", " dL_dA1 = dL_dYpred @ W2.T\n", " dL_dZ1 = dL_dA1 * (1 - A1 ** 2) # tanh derivative\n", " dL_dW1 = X.T @ dL_dZ1\n", " dL_db1 = np.sum(dL_dZ1, axis=0, keepdims=True)\n", "\n", " # Gradient descent update\n", " W2 -= learning_rate * dL_dW2\n", " b2 -= learning_rate * dL_db2\n", " W1 -= 
learning_rate * dL_dW1\n", " b1 -= learning_rate * dL_db1\n", "\n", " if epoch % log_interval == 0 or epoch == epochs - 1:\n", " print(f\"Epoch {epoch:5d} Loss: {loss:.6f}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "onel9r0kjk", "metadata": {}, "outputs": [], "source": [ "# Predict on a fine grid and convert back to physical units\n", "T_fine_norm = np.linspace(0, 1, 200).reshape(-1, 1)\n", "A1_fine = np.tanh(T_fine_norm @ W1 + b1)\n", "Cp_nn_norm = A1_fine @ W2 + b2\n", "Cp_nn = Cp_nn_norm * (Cp_max - Cp_min) + Cp_min\n", "T_fine_K = T_fine_norm * (T_max - T_min) + T_min\n", "\n", "# MSE in original units for comparison with polynomial\n", "Cp_nn_at_data = np.tanh(X @ W1 + b1) @ W2 + b2\n", "Cp_nn_at_data = Cp_nn_at_data * (Cp_max - Cp_min) + Cp_min\n", "mse_nn = np.mean((Cp_nn_at_data.flatten() - Cp_raw) ** 2)\n", "\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n", "\n", "ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n", "ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params, MSE={mse_poly:.2e})')\n", "ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r-', linewidth=2, label=f'NN numpy (31 params, MSE={mse_nn:.2e})')\n", "ax1.set_xlabel('Temperature (K)')\n", "ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n", "ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n", "ax1.legend()\n", "\n", "ax2.semilogy(losses_np)\n", "ax2.set_xlabel('Epoch')\n", "ax2.set_ylabel('MSE (normalized)')\n", "ax2.set_title('Training loss — numpy NN')\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "ea9z35qm9u8", "metadata": {}, "source": [ "## 4. Neural network in PyTorch\n", "\n", "The same network, but PyTorch handles backpropagation automatically. Compare the training loop above to the one below — `loss.backward()` replaces all of our manual gradient calculations.\n", "\n", "This is the same API nanoGPT uses — `nn.Linear` and activation functions in `model.py`, `optimizer.step()` and the training loop in `train.py`."
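, "\n",
"\n",
"A minimal illustration of what `loss.backward()` does (assuming `torch` imports as in the next cell): autograd recovers exactly the $1 - \\tanh^2$ gradient we derived by hand in section 3:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"x = torch.linspace(-2, 2, 5, requires_grad=True)\n",
"y = torch.tanh(x).sum()  # reduce to a scalar so backward() is defined\n",
"y.backward()             # autograd fills in x.grad\n",
"\n",
"analytic = (1 - torch.tanh(x) ** 2).detach()  # the hand-derived tanh gradient\n",
"print(torch.allclose(x.grad, analytic))       # True: autograd matches the chain rule\n",
"```"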
] }, { "cell_type": "code", "execution_count": null, "id": "3qxnrtyxqgz", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "\n", "# Prepare data as PyTorch tensors\n", "X_t = torch.tensor((T_raw - T_min) / (T_max - T_min), dtype=torch.float32).reshape(-1, 1)\n", "Y_t = torch.tensor((Cp_raw - Cp_min) / (Cp_max - Cp_min), dtype=torch.float32).reshape(-1, 1)\n", "\n", "# Define the network\n", "model = nn.Sequential(\n", " nn.Linear(1, H), # input -> hidden (W1, b1)\n", " nn.Tanh(), # activation\n", " nn.Linear(H, 1), # hidden -> output (W2, b2)\n", ")\n", "\n", "print(model)\n", "print(f\"Total parameters: {sum(p.numel() for p in model.parameters())}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "ydl3ycnypps", "metadata": {}, "outputs": [], "source": [ "# Train\n", "optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n", "loss_fn = nn.MSELoss()\n", "losses_torch = []\n", "\n", "for epoch in range(epochs):\n", " Y_pred_t = model(X_t)\n", " loss = loss_fn(Y_pred_t, Y_t)\n", " losses_torch.append(loss.item())\n", "\n", " optimizer.zero_grad() # reset gradients\n", " loss.backward() # automatic differentiation\n", " optimizer.step() # update weights\n", "\n", " if epoch % log_interval == 0 or epoch == epochs - 1:\n", " print(f\"Epoch {epoch:5d} Loss: {loss.item():.6f}\")" ] }, { "cell_type": "markdown", "id": "bg0kvnk4ho", "metadata": {}, "source": [ "## 5. 
Compare all three approaches" ] }, { "cell_type": "code", "execution_count": null, "id": "h2dfstoh8gd", "metadata": {}, "outputs": [], "source": [ "# PyTorch predictions\n", "T_fine_t = torch.linspace(0, 1, 200).reshape(-1, 1)\n", "with torch.no_grad():\n", " Cp_torch_norm = model(T_fine_t)\n", "Cp_torch = Cp_torch_norm.numpy() * (Cp_max - Cp_min) + Cp_min\n", "\n", "# MSE for PyTorch model\n", "with torch.no_grad():\n", " Cp_torch_at_data = model(X_t).numpy() * (Cp_max - Cp_min) + Cp_min\n", "mse_torch = np.mean((Cp_torch_at_data.flatten() - Cp_raw) ** 2)\n", "\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n", "\n", "# Left: all three fits\n", "ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n", "ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label='Polynomial (4 params)')\n", "ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r--', linewidth=2, label='NN numpy (31 params)')\n", "ax1.plot(T_fine_K.flatten(), Cp_torch.flatten(), 'g-', linewidth=2, alpha=0.8, label='NN PyTorch (31 params)')\n", "ax1.set_xlabel('Temperature (K)')\n", "ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n", "ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n", "ax1.legend()\n", "\n", "# Right: training loss comparison\n", "ax2.semilogy(losses_np, label='numpy (gradient descent)')\n", "ax2.semilogy(losses_torch, label='PyTorch (Adam)')\n", "ax2.set_xlabel('Epoch')\n", "ax2.set_ylabel('MSE (normalized)')\n", "ax2.set_title('Training loss comparison')\n", "ax2.legend()\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(f\"MSE — Polynomial: {mse_poly:.2e} | NN numpy: {mse_nn:.2e} | NN PyTorch: {mse_torch:.2e}\")" ] }, { "cell_type": "markdown", "id": "xyw3sr20brn", "metadata": {}, "source": [ "## 6. Extrapolation\n", "\n", "How do the models behave *outside* the training range? This is a key test — and where the differences become stark."
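, "\n",
"\n",
"Some intuition for what to expect: far outside the training range, every $\\tanh$ hidden unit saturates at $\\pm 1$, so the network output flattens toward a constant, while the cubic term in the polynomial grows without bound. A standalone sketch with illustrative numbers:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"z = np.array([5.0, 50.0, 500.0])  # large pre-activations, far beyond the training inputs\n",
"\n",
"print(np.tanh(z))  # saturates: every value within 1e-4 of 1.0\n",
"print(z ** 3)      # a cubic keeps growing: 125, 125000, 1.25e+08\n",
"```"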
] }, { "cell_type": "code", "execution_count": null, "id": "fi3iq2sjh6", "metadata": {}, "outputs": [], "source": [ "# Extrapolate beyond the training range\n", "T_extrap = np.linspace(100, 2500, 300)\n", "T_extrap_norm = ((T_extrap - T_min) / (T_max - T_min)).reshape(-1, 1)\n", "\n", "# Polynomial extrapolation\n", "Cp_poly_extrap = poly(T_extrap)\n", "\n", "# Numpy NN extrapolation\n", "A1_extrap = np.tanh(T_extrap_norm @ W1 + b1)\n", "Cp_nn_extrap = (A1_extrap @ W2 + b2) * (Cp_max - Cp_min) + Cp_min\n", "\n", "# PyTorch NN extrapolation\n", "with torch.no_grad():\n", " Cp_torch_extrap = model(torch.tensor(T_extrap_norm, dtype=torch.float32)).numpy()\n", "Cp_torch_extrap = Cp_torch_extrap * (Cp_max - Cp_min) + Cp_min\n", "\n", "plt.figure(figsize=(10, 6))\n", "plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n", "plt.plot(T_extrap, Cp_poly_extrap, 'b-', linewidth=2, label='Polynomial')\n", "plt.plot(T_extrap, Cp_nn_extrap.flatten(), 'r--', linewidth=2, label='NN numpy')\n", "plt.plot(T_extrap, Cp_torch_extrap.flatten(), 'g-', linewidth=2, alpha=0.8, label='NN PyTorch')\n", "plt.axvline(T_raw.min(), color='gray', linestyle=':', alpha=0.5, label='Training range')\n", "plt.axvline(T_raw.max(), color='gray', linestyle=':', alpha=0.5)\n", "plt.xlabel('Temperature (K)')\n", "plt.ylabel('$C_p$ (kJ/kg/K)')\n", "plt.title('Extrapolation beyond training data')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "yb2s18keiw", "metadata": {}, "source": [ "## 7. Exercises\n", "\n", "Try these in new cells below:\n", "\n", "1. **Change the number of hidden neurons** (`H`). Try 2, 5, 20, 50. How does the fit change? At what point does adding neurons stop helping?\n", "\n", "2. **Activation functions**: In the PyTorch model, replace `nn.Tanh()` with `nn.ReLU()` or `nn.Sigmoid()`. How does the fit change?\n", "\n", "3. **Optimizer comparison**: Replace `Adam` with `torch.optim.SGD(model.parameters(), lr=0.01)`. 
How does training speed compare?\n", "\n", "4. **Remove normalization**: Use `T_raw` and `Cp_raw` directly (no scaling to [0,1]). What happens? Can you fix it by adjusting the learning rate?\n", "\n", "5. **Overfitting**: Set `H = 100` and train for 20,000 epochs. Does it fit the training data well? Look at the extrapolation — is it reasonable?\n", "\n", "6. **Higher-order polynomial**: Try `np.polyfit(T_raw, Cp_raw, 10)`. How does it compare to the cubic? How does it extrapolate?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 5 }