Add uv for dependency management and update workshop materials

Eric 2026-03-31 12:03:34 -04:00
commit 7e4f0fb80b
6 changed files with 4122 additions and 53 deletions

.gitignore vendored

@@ -3,6 +3,7 @@ __pycache__/
*.pyc
.venv/
llm/
llm-workshop/
# Model files and vector stores (too large for git)
*.pt
@@ -29,6 +30,9 @@ store/
*~
*.bak
# Personal notes (not part of the workshop)
*-notes.md
# Legacy directories (not part of the workshop)
handouts/
class_demo/


@@ -62,7 +62,7 @@ The curve is smooth and nonlinear — $C_p$ increases with temperature as molecu
Our network has three layers:
```
Input (1 neuron: T) -> Hidden (10 neurons) -> Output (1 neuron: Cp)
```
Here's what happens at each step:
@@ -84,9 +84,9 @@ This is a linear combination — no activation on the output, since we want to p
### Counting parameters
With 10 hidden neurons:
- `W1`: 10 weights (input -> hidden)
- `b1`: 10 biases (hidden)
- `W2`: 10 weights (hidden -> output)
- `b2`: 1 bias (output)
- **Total: 31 parameters**
@@ -123,7 +123,7 @@ $$w \leftarrow w - \eta \cdot \frac{\partial L}{\partial w}$$
where $\eta$ is the **learning rate** — a small number (0.01 in our code) that controls how big each step is. Too large and training oscillates; too small and it's painfully slow.
One full pass through these three steps (forward -> loss -> backward -> update) is one **epoch**. We train for 5000 epochs.
In nanoGPT, the training loop in `train.py` does exactly the same thing, but with the AdamW optimizer (a fancier version of gradient descent) and batches of data instead of the full dataset.
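The learning-rate tradeoff described above — too large oscillates, too small is painfully slow — shows up on even a one-parameter problem. A minimal sketch, not part of the workshop code, minimizing $L(w) = w^2$ by gradient descent:

```python
# Gradient descent on L(w) = w^2, whose gradient is dL/dw = 2w.
# Each update multiplies w by (1 - 2*eta), so w shrinks toward the minimum
# only when |1 - 2*eta| < 1. Too large a step flips the sign and overshoots.

def descend(eta, w0=1.0, steps=20):
    """Run `steps` gradient-descent updates from w0; return the final w."""
    w = w0
    for _ in range(steps):
        w -= eta * (2 * w)  # w <- w - eta * dL/dw
    return w

print(descend(0.01))  # too small: still well away from 0 after 20 steps
print(descend(0.4))   # well chosen: essentially 0
print(descend(1.1))   # too large: |1 - 2.2| > 1, w grows with every step
```

The same contraction-versus-divergence behavior governs the 31-parameter network, just in 31 dimensions at once.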


@@ -3,122 +3,413 @@
{
"cell_type": "markdown",
"id": "xbsmj1hcj1g",
"metadata": {},
"source": [
"# Building a Neural Network: $C_p(T)$ for Nitrogen\n",
"\n",
"**CHEG 667-013 — LLMs for Engineers**\n",
"\n",
"In this notebook we fit the heat capacity of N₂ gas using three approaches:\n",
"1. A polynomial fit (the classical approach)\n",
"2. A neural network built from scratch in numpy\n",
"3. The same network in PyTorch\n",
"\n",
"This makes the ML concepts behind LLMs — weights, loss, gradient descent, overfitting — concrete and tangible."
]
},
{
"cell_type": "markdown",
"id": "szrl41l3xbq",
"metadata": {},
"source": [
"## 1. Load and plot the data\n",
"\n",
"The data is from the [NIST Chemistry WebBook](https://webbook.nist.gov/): isobaric heat capacity of N₂ at 1 bar, 300–2000 K."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "t4lqkcoeyil",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"data = np.loadtxt(\"data/n2_cp.csv\", delimiter=\",\", skiprows=1)\n",
"T_raw = data[:, 0] # Temperature (K)\n",
"Cp_raw = data[:, 1] # Cp (kJ/kg/K)\n",
"\n",
"plt.figure(figsize=(8, 5))\n",
"plt.plot(T_raw, Cp_raw, 'ko', markersize=6)\n",
"plt.xlabel('Temperature (K)')\n",
"plt.ylabel('$C_p$ (kJ/kg/K)')\n",
"plt.title('$C_p(T)$ for N$_2$ at 1 bar — NIST WebBook')\n",
"plt.show()\n",
"\n",
"print(f\"{len(T_raw)} data points, T range: {T_raw.min():.0f}–{T_raw.max():.0f} K\")"
]
},
{
"cell_type": "markdown",
"id": "1jyrgsvp7op",
"metadata": {},
"source": [
"## 2. Polynomial fit (baseline)\n",
"\n",
"Textbooks fit $C_p(T)$ with a polynomial: $C_p = a + bT + cT^2 + dT^3$. This is a **4-parameter** model. Let's fit it with `numpy.polyfit` and see how well it does."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4smvu4z2oro",
"metadata": {},
"outputs": [],
"source": [
"# Fit a cubic polynomial\n",
"coeffs = np.polyfit(T_raw, Cp_raw, 3)\n",
"poly = np.poly1d(coeffs)\n",
"\n",
"T_fine = np.linspace(T_raw.min(), T_raw.max(), 200)\n",
"Cp_poly = poly(T_fine)\n",
"\n",
"# Compute residuals\n",
"Cp_poly_at_data = poly(T_raw)\n",
"mse_poly = np.mean((Cp_poly_at_data - Cp_raw) ** 2)\n",
"\n",
"plt.figure(figsize=(8, 5))\n",
"plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
"plt.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Cubic polynomial (4 params)')\n",
"plt.xlabel('Temperature (K)')\n",
"plt.ylabel('$C_p$ (kJ/kg/K)')\n",
"plt.title('Polynomial fit')\n",
"plt.legend()\n",
"plt.show()\n",
"\n",
"print(f\"Polynomial coefficients: {coeffs}\")\n",
"print(f\"MSE: {mse_poly:.8f}\")\n",
"print(f\"Parameters: 4\")"
]
},
{
"cell_type": "markdown",
"id": "97y7mrcekji",
"metadata": {},
"source": [
"## 3. Neural network from scratch (numpy)\n",
"\n",
"Now let's build a one-hidden-layer neural network. The architecture:\n",
"\n",
"```\n",
"Input (1: T) -> Hidden (10 neurons, tanh) -> Output (1: Cp)\n",
"```\n",
"\n",
"We need to:\n",
"1. **Normalize** the data to [0, 1] so the network trains efficiently\n",
"2. **Forward pass**: compute predictions from input through each layer\n",
"3. **Loss**: mean squared error between predictions and data\n",
"4. **Backpropagation**: compute gradients of the loss w.r.t. each weight using the chain rule\n",
"5. **Gradient descent**: update weights in the direction that reduces the loss\n",
"\n",
"This is exactly what nanoGPT's `train.py` does — just at a much larger scale."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "365o7bqbwkr",
"metadata": {},
"outputs": [],
"source": [
"# Normalize inputs and outputs to [0, 1]\n",
"T_min, T_max = T_raw.min(), T_raw.max()\n",
"Cp_min, Cp_max = Cp_raw.min(), Cp_raw.max()\n",
"\n",
"T = (T_raw - T_min) / (T_max - T_min)\n",
"Cp = (Cp_raw - Cp_min) / (Cp_max - Cp_min)\n",
"\n",
"X = T.reshape(-1, 1) # (N, 1) input matrix\n",
"Y = Cp.reshape(-1, 1) # (N, 1) target matrix\n",
"N = X.shape[0]\n",
"\n",
"# Network setup\n",
"H = 10 # hidden neurons\n",
"\n",
"np.random.seed(42)\n",
"W1 = np.random.randn(1, H) * 0.5 # input -> hidden weights\n",
"b1 = np.zeros((1, H)) # hidden biases\n",
"W2 = np.random.randn(H, 1) * 0.5 # hidden -> output weights\n",
"b2 = np.zeros((1, 1)) # output bias\n",
"\n",
"print(f\"Parameters: W1({W1.shape}) + b1({b1.shape}) + W2({W2.shape}) + b2({b2.shape})\")\n",
"print(f\"Total: {W1.size + b1.size + W2.size + b2.size} parameters for {N} data points\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5w1ezs9t2w6",
"metadata": {},
"outputs": [],
"source": [
"# Training loop\n",
"learning_rate = 0.01\n",
"epochs = 5000\n",
"log_interval = 500\n",
"losses_np = []\n",
"\n",
"for epoch in range(epochs):\n",
" # Forward pass\n",
" Z1 = X @ W1 + b1 # hidden pre-activation (N, H)\n",
" A1 = np.tanh(Z1) # hidden activation (N, H)\n",
" Y_pred = A1 @ W2 + b2 # output (N, 1)\n",
"\n",
" # Loss (mean squared error)\n",
" error = Y_pred - Y\n",
" loss = np.mean(error ** 2)\n",
" losses_np.append(loss)\n",
"\n",
" # Backpropagation (chain rule, working backward)\n",
" dL_dYpred = 2 * error / N\n",
" dL_dW2 = A1.T @ dL_dYpred\n",
" dL_db2 = np.sum(dL_dYpred, axis=0, keepdims=True)\n",
" dL_dA1 = dL_dYpred @ W2.T\n",
" dL_dZ1 = dL_dA1 * (1 - A1 ** 2) # tanh derivative\n",
" dL_dW1 = X.T @ dL_dZ1\n",
" dL_db1 = np.sum(dL_dZ1, axis=0, keepdims=True)\n",
"\n",
" # Gradient descent update\n",
" W2 -= learning_rate * dL_dW2\n",
" b2 -= learning_rate * dL_db2\n",
" W1 -= learning_rate * dL_dW1\n",
" b1 -= learning_rate * dL_db1\n",
"\n",
" if epoch % log_interval == 0 or epoch == epochs - 1:\n",
" print(f\"Epoch {epoch:5d} Loss: {loss:.6f}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "onel9r0kjk",
"metadata": {},
"outputs": [],
"source": [
"# Predict on a fine grid and convert back to physical units\n",
"T_fine_norm = np.linspace(0, 1, 200).reshape(-1, 1)\n",
"A1_fine = np.tanh(T_fine_norm @ W1 + b1)\n",
"Cp_nn_norm = A1_fine @ W2 + b2\n",
"Cp_nn = Cp_nn_norm * (Cp_max - Cp_min) + Cp_min\n",
"T_fine_K = T_fine_norm * (T_max - T_min) + T_min\n",
"\n",
"# MSE in original units for comparison with polynomial\n",
"Cp_nn_at_data = np.tanh(X @ W1 + b1) @ W2 + b2\n",
"Cp_nn_at_data = Cp_nn_at_data * (Cp_max - Cp_min) + Cp_min\n",
"mse_nn = np.mean((Cp_nn_at_data.flatten() - Cp_raw) ** 2)\n",
"\n",
"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n",
"\n",
"ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
"ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params, MSE={mse_poly:.2e})')\n",
"ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r-', linewidth=2, label=f'NN numpy (31 params, MSE={mse_nn:.2e})')\n",
"ax1.set_xlabel('Temperature (K)')\n",
"ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n",
"ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n",
"ax1.legend()\n",
"\n",
"ax2.semilogy(losses_np)\n",
"ax2.set_xlabel('Epoch')\n",
"ax2.set_ylabel('MSE (normalized)')\n",
"ax2.set_title('Training loss — numpy NN')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "ea9z35qm9u8",
"metadata": {},
"source": [
"## 4. Neural network in PyTorch\n",
"\n",
"The same network, but PyTorch handles backpropagation automatically. Compare the training loop above to the one below — `loss.backward()` replaces all of our manual gradient calculations.\n",
"\n",
"This is the same API used in nanoGPT's `model.py` — `nn.Linear`, activation functions, `optimizer.step()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3qxnrtyxqgz",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"# Prepare data as PyTorch tensors\n",
"X_t = torch.tensor((T_raw - T_min) / (T_max - T_min), dtype=torch.float32).reshape(-1, 1)\n",
"Y_t = torch.tensor((Cp_raw - Cp_min) / (Cp_max - Cp_min), dtype=torch.float32).reshape(-1, 1)\n",
"\n",
"# Define the network\n",
"model = nn.Sequential(\n",
" nn.Linear(1, H), # input -> hidden (W1, b1)\n",
" nn.Tanh(), # activation\n",
" nn.Linear(H, 1), # hidden -> output (W2, b2)\n",
")\n",
"\n",
"print(model)\n",
"print(f\"Total parameters: {sum(p.numel() for p in model.parameters())}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ydl3ycnypps",
"metadata": {},
"outputs": [],
"source": [
"# Train\n",
"optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n",
"loss_fn = nn.MSELoss()\n",
"losses_torch = []\n",
"\n",
"for epoch in range(epochs):\n",
" Y_pred_t = model(X_t)\n",
" loss = loss_fn(Y_pred_t, Y_t)\n",
" losses_torch.append(loss.item())\n",
"\n",
" optimizer.zero_grad() # reset gradients\n",
" loss.backward() # automatic differentiation\n",
" optimizer.step() # update weights\n",
"\n",
" if epoch % log_interval == 0 or epoch == epochs - 1:\n",
" print(f\"Epoch {epoch:5d} Loss: {loss.item():.6f}\")"
]
},
{
"cell_type": "markdown",
"id": "bg0kvnk4ho",
"metadata": {},
"source": [
"## 5. Compare all three approaches"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "h2dfstoh8gd",
"metadata": {},
"outputs": [],
"source": [
"# PyTorch predictions\n",
"T_fine_t = torch.linspace(0, 1, 200).reshape(-1, 1)\n",
"with torch.no_grad():\n",
" Cp_torch_norm = model(T_fine_t)\n",
"Cp_torch = Cp_torch_norm.numpy() * (Cp_max - Cp_min) + Cp_min\n",
"\n",
"# MSE for PyTorch model\n",
"with torch.no_grad():\n",
" Cp_torch_at_data = model(X_t).numpy() * (Cp_max - Cp_min) + Cp_min\n",
"mse_torch = np.mean((Cp_torch_at_data.flatten() - Cp_raw) ** 2)\n",
"\n",
"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n",
"\n",
"# Left: all three fits\n",
"ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
"ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params)')\n",
"ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r--', linewidth=2, label=f'NN numpy (31 params)')\n",
"ax1.plot(T_fine_K.flatten(), Cp_torch.flatten(), 'g-', linewidth=2, alpha=0.8, label=f'NN PyTorch (31 params)')\n",
"ax1.set_xlabel('Temperature (K)')\n",
"ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n",
"ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n",
"ax1.legend()\n",
"\n",
"# Right: training loss comparison\n",
"ax2.semilogy(losses_np, label='numpy (gradient descent)')\n",
"ax2.semilogy(losses_torch, label='PyTorch (Adam)')\n",
"ax2.set_xlabel('Epoch')\n",
"ax2.set_ylabel('MSE (normalized)')\n",
"ax2.set_title('Training loss comparison')\n",
"ax2.legend()\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"print(f\"MSE — Polynomial: {mse_poly:.2e} | NN numpy: {mse_nn:.2e} | NN PyTorch: {mse_torch:.2e}\")"
]
},
{
"cell_type": "markdown",
"id": "xyw3sr20brn",
"metadata": {},
"source": [
"## 6. Extrapolation\n",
"\n",
"How do the models behave *outside* the training range? This is a key test — and where the differences become stark."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fi3iq2sjh6",
"metadata": {},
"outputs": [],
"source": [
"# Extrapolate beyond the training range\n",
"T_extrap = np.linspace(100, 2500, 300)\n",
"T_extrap_norm = ((T_extrap - T_min) / (T_max - T_min)).reshape(-1, 1)\n",
"\n",
"# Polynomial extrapolation\n",
"Cp_poly_extrap = poly(T_extrap)\n",
"\n",
"# Numpy NN extrapolation\n",
"A1_extrap = np.tanh(T_extrap_norm @ W1 + b1)\n",
"Cp_nn_extrap = (A1_extrap @ W2 + b2) * (Cp_max - Cp_min) + Cp_min\n",
"\n",
"# PyTorch NN extrapolation\n",
"with torch.no_grad():\n",
" Cp_torch_extrap = model(torch.tensor(T_extrap_norm, dtype=torch.float32)).numpy()\n",
"Cp_torch_extrap = Cp_torch_extrap * (Cp_max - Cp_min) + Cp_min\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
"plt.plot(T_extrap, Cp_poly_extrap, 'b-', linewidth=2, label='Polynomial')\n",
"plt.plot(T_extrap, Cp_nn_extrap.flatten(), 'r--', linewidth=2, label='NN numpy')\n",
"plt.plot(T_extrap, Cp_torch_extrap.flatten(), 'g-', linewidth=2, alpha=0.8, label='NN PyTorch')\n",
"plt.axvline(T_raw.min(), color='gray', linestyle=':', alpha=0.5, label='Training range')\n",
"plt.axvline(T_raw.max(), color='gray', linestyle=':', alpha=0.5)\n",
"plt.xlabel('Temperature (K)')\n",
"plt.ylabel('$C_p$ (kJ/kg/K)')\n",
"plt.title('Extrapolation beyond training data')\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "yb2s18keiw",
"metadata": {},
"source": [
"## 7. Exercises\n",
"\n",
"Try these in new cells below:\n",
"\n",
"1. **Change the number of hidden neurons** (`H`). Try 2, 5, 20, 50. How does the fit change? At what point does adding neurons stop helping?\n",
"\n",
"2. **Activation functions**: In the PyTorch model, replace `nn.Tanh()` with `nn.ReLU()` or `nn.Sigmoid()`. How does the fit change?\n",
"\n",
"3. **Optimizer comparison**: Replace `Adam` with `torch.optim.SGD(model.parameters(), lr=0.01)`. How does training speed compare?\n",
"\n",
"4. **Remove normalization**: Use `T_raw` and `Cp_raw` directly (no scaling to [0,1]). What happens? Can you fix it by adjusting the learning rate?\n",
"\n",
"5. **Overfitting**: Set `H = 100` and train for 20,000 epochs. Does it fit the training data well? Look at the extrapolation — is it reasonable?\n",
"\n",
"6. **Higher-order polynomial**: Try `np.polyfit(T_raw, Cp_raw, 10)`. How does it compare to the cubic? How does it extrapolate?"
]
}
],
"metadata": {
@@ -134,4 +425,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
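The stark extrapolation differences the notebook's section 6 points to come down to the models' basis functions: a cubic polynomial grows without bound outside the training range, while each tanh unit saturates, so a tanh network flattens to a constant. A standalone sketch of the two behaviors, using made-up coefficients rather than the fitted ones:

```python
import math

# A cubic, like the fitted polynomial, grows without bound past the data.
# The coefficients here are made up for illustration, not the fitted ones.
def cubic(t, a=1e-9, d=1.0):
    return a * t**3 + d

# A single tanh unit saturates: tanh(z) -> +/-1 as |z| grows, so a tanh
# network's output flattens to a constant far outside its training range.
def tanh_net(t, w1=0.01, w2=1.0):
    return w2 * math.tanh(w1 * t)

for t in (2_000, 10_000, 100_000):
    print(f"T={t:>7}  cubic={cubic(t):14.3f}  tanh_net={tanh_net(t):.6f}")
```

Neither behavior tracks the true physics of $C_p(T)$, which is why the notebook treats extrapolation as a cautionary test rather than a capability.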


@@ -37,15 +37,24 @@ Each section has its own `README.md` with a full walkthrough, exercises, and any
### Python environment
Install [uv](https://docs.astral.sh/uv/getting-started/installation/) (a fast Python package manager), then:
```bash
uv sync
```
This creates a `.venv/` virtual environment and installs all dependencies from the lock file. To run scripts:
```bash
uv run python 05-neural-networks/nn_torch.py
```
Or activate the environment directly:
```bash
source .venv/bin/activate
python 05-neural-networks/nn_torch.py
```
## License

pyproject.toml Normal file

@@ -0,0 +1,18 @@
[project]
name = "llm-workshop"
version = "0.1.0"
description = "CHEG 667-013 LLMs for Engineers workshop"
requires-python = ">=3.10"
dependencies = [
"numpy",
"torch",
"matplotlib",
"llama-index-core",
"llama-index-readers-file",
"llama-index-llms-ollama",
"llama-index-embeddings-huggingface",
"llama-index-retrievers-bm25",
"python-dateutil",
"nltk",
"sentence-transformers",
]

uv.lock generated Normal file

File diff suppressed because it is too large