Add uv for dependency management and update workshop materials
parent 4c88157a8e
commit 7e4f0fb80b
6 changed files with 4122 additions and 53 deletions
@@ -62,7 +62,7 @@ The curve is smooth and nonlinear — $C_p$ increases with temperature as molecu
 Our network has three layers:
 
 ```
-Input (1 neuron: T) → Hidden (10 neurons) → Output (1 neuron: Cp)
+Input (1 neuron: T) -> Hidden (10 neurons) -> Output (1 neuron: Cp)
 ```
 
 Here's what happens at each step:
@@ -84,9 +84,9 @@ This is a linear combination — no activation on the output, since we want to p
 ### Counting parameters
 
 With 10 hidden neurons:
-- `W1`: 10 weights (input → hidden)
+- `W1`: 10 weights (input -> hidden)
 - `b1`: 10 biases (hidden)
-- `W2`: 10 weights (hidden → output)
+- `W2`: 10 weights (hidden -> output)
 - `b2`: 1 bias (output)
 - **Total: 31 parameters**
 
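The parameter total in the hunk above generalizes to any layer sizes. A quick sketch (a sanity check only, not code from this commit) recomputes it:

```python
# Parameter count for a fully connected network with one hidden layer:
# input->hidden weights, hidden biases, hidden->output weights, output bias.
def count_params(n_in: int, n_hidden: int, n_out: int) -> int:
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

# 1 input (T), 10 hidden neurons, 1 output (Cp):
print(count_params(1, 10, 1))  # -> 31
```

The same formula explains why widening the hidden layer grows the model linearly here: with `H = 100` the count becomes 301.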
@@ -123,7 +123,7 @@ $$w \leftarrow w - \eta \cdot \frac{\partial L}{\partial w}$$
 
 where $\eta$ is the **learning rate** — a small number (0.01 in our code) that controls how big each step is. Too large and training oscillates; too small and it's painfully slow.
 
-One full pass through these three steps (forward → loss → backward → update) is one **epoch**. We train for 5000 epochs.
+One full pass through these three steps (forward -> loss -> backward -> update) is one **epoch**. We train for 5000 epochs.
 
 In nanoGPT, the training loop in `train.py` does exactly the same thing, but with the AdamW optimizer (a fancier version of gradient descent) and batches of data instead of the full dataset.
 
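The update rule and epoch loop described in the hunk above can be seen in miniature on a one-parameter model. A toy sketch (illustration only, not code from this commit):

```python
import numpy as np

# Fit y = w * x by plain gradient descent on the MSE loss,
# using the same update rule: w <- w - eta * dL/dw.
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x                           # true slope is 2
w, eta = 0.0, 0.01                    # initial weight and learning rate

for epoch in range(5000):             # forward -> loss -> backward -> update
    error = w * x - y                 # forward pass residuals
    dL_dw = np.mean(2.0 * error * x)  # gradient of the mean squared error
    w -= eta * dL_dw                  # gradient descent step

print(f"{w:.4f}")  # converges to ~2.0000
```

With `eta` too large the iterates overshoot and oscillate; too small and thousands more epochs are needed, which is exactly the trade-off the text describes.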
@@ -3,122 +3,413 @@
 {
 "cell_type": "markdown",
 "id": "xbsmj1hcj1g",
-"source": "# Building a Neural Network: $C_p(T)$ for Nitrogen\n\n**CHEG 667-013 — LLMs for Engineers**\n\nIn this notebook we fit the heat capacity of N₂ gas using three approaches:\n1. A polynomial fit (the classical approach)\n2. A neural network built from scratch in numpy\n3. The same network in PyTorch\n\nThis makes the ML concepts behind LLMs — weights, loss, gradient descent, overfitting — concrete and tangible.",
-"metadata": {}
+"metadata": {},
+"source": [
+"# Building a Neural Network: $C_p(T)$ for Nitrogen\n",
+"\n",
+"**CHEG 667-013 — LLMs for Engineers**\n",
+"\n",
+"In this notebook we fit the heat capacity of N₂ gas using three approaches:\n",
+"1. A polynomial fit (the classical approach)\n",
+"2. A neural network built from scratch in numpy\n",
+"3. The same network in PyTorch\n",
+"\n",
+"This makes the ML concepts behind LLMs — weights, loss, gradient descent, overfitting — concrete and tangible."
+]
 },
 {
 "cell_type": "markdown",
 "id": "szrl41l3xbq",
-"source": "## 1. Load and plot the data\n\nThe data is from the [NIST Chemistry WebBook](https://webbook.nist.gov/): isobaric heat capacity of N₂ at 1 bar, 300–2000 K.",
-"metadata": {}
+"metadata": {},
+"source": [
+"## 1. Load and plot the data\n",
+"\n",
+"The data is from the [NIST Chemistry WebBook](https://webbook.nist.gov/): isobaric heat capacity of N₂ at 1 bar, 300–2000 K."
+]
 },
 {
 "cell_type": "code",
-"id": "t4lqkcoeyil",
-"source": "import numpy as np\nimport matplotlib.pyplot as plt\n\ndata = np.loadtxt(\"data/n2_cp.csv\", delimiter=\",\", skiprows=1)\nT_raw = data[:, 0] # Temperature (K)\nCp_raw = data[:, 1] # Cp (kJ/kg/K)\n\nplt.figure(figsize=(8, 5))\nplt.plot(T_raw, Cp_raw, 'ko', markersize=6)\nplt.xlabel('Temperature (K)')\nplt.ylabel('$C_p$ (kJ/kg/K)')\nplt.title('$C_p(T)$ for N$_2$ at 1 bar — NIST WebBook')\nplt.show()\n\nprint(f\"{len(T_raw)} data points, T range: {T_raw.min():.0f} – {T_raw.max():.0f} K\")",
-"metadata": {},
 "execution_count": null,
-"outputs": []
+"id": "t4lqkcoeyil",
+"metadata": {},
+"outputs": [],
+"source": [
+"import numpy as np\n",
+"import matplotlib.pyplot as plt\n",
+"\n",
+"data = np.loadtxt(\"data/n2_cp.csv\", delimiter=\",\", skiprows=1)\n",
+"T_raw = data[:, 0] # Temperature (K)\n",
+"Cp_raw = data[:, 1] # Cp (kJ/kg/K)\n",
+"\n",
+"plt.figure(figsize=(8, 5))\n",
+"plt.plot(T_raw, Cp_raw, 'ko', markersize=6)\n",
+"plt.xlabel('Temperature (K)')\n",
+"plt.ylabel('$C_p$ (kJ/kg/K)')\n",
+"plt.title('$C_p(T)$ for N$_2$ at 1 bar — NIST WebBook')\n",
+"plt.show()\n",
+"\n",
+"print(f\"{len(T_raw)} data points, T range: {T_raw.min():.0f} – {T_raw.max():.0f} K\")"
+]
 },
 {
 "cell_type": "markdown",
 "id": "1jyrgsvp7op",
-"source": "## 2. Polynomial fit (baseline)\n\nTextbooks fit $C_p(T)$ with a polynomial: $C_p = a + bT + cT^2 + dT^3$. This is a **4-parameter** model. Let's fit it with `numpy.polyfit` and see how well it does.",
-"metadata": {}
+"metadata": {},
+"source": [
+"## 2. Polynomial fit (baseline)\n",
+"\n",
+"Textbooks fit $C_p(T)$ with a polynomial: $C_p = a + bT + cT^2 + dT^3$. This is a **4-parameter** model. Let's fit it with `numpy.polyfit` and see how well it does."
+]
 },
 {
 "cell_type": "code",
-"id": "4smvu4z2oro",
-"source": "# Fit a cubic polynomial\ncoeffs = np.polyfit(T_raw, Cp_raw, 3)\npoly = np.poly1d(coeffs)\n\nT_fine = np.linspace(T_raw.min(), T_raw.max(), 200)\nCp_poly = poly(T_fine)\n\n# Compute residuals\nCp_poly_at_data = poly(T_raw)\nmse_poly = np.mean((Cp_poly_at_data - Cp_raw) ** 2)\n\nplt.figure(figsize=(8, 5))\nplt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nplt.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Cubic polynomial (4 params)')\nplt.xlabel('Temperature (K)')\nplt.ylabel('$C_p$ (kJ/kg/K)')\nplt.title('Polynomial fit')\nplt.legend()\nplt.show()\n\nprint(f\"Polynomial coefficients: {coeffs}\")\nprint(f\"MSE: {mse_poly:.8f}\")\nprint(f\"Parameters: 4\")",
-"metadata": {},
 "execution_count": null,
-"outputs": []
+"id": "4smvu4z2oro",
+"metadata": {},
+"outputs": [],
+"source": [
+"# Fit a cubic polynomial\n",
+"coeffs = np.polyfit(T_raw, Cp_raw, 3)\n",
+"poly = np.poly1d(coeffs)\n",
+"\n",
+"T_fine = np.linspace(T_raw.min(), T_raw.max(), 200)\n",
+"Cp_poly = poly(T_fine)\n",
+"\n",
+"# Compute residuals\n",
+"Cp_poly_at_data = poly(T_raw)\n",
+"mse_poly = np.mean((Cp_poly_at_data - Cp_raw) ** 2)\n",
+"\n",
+"plt.figure(figsize=(8, 5))\n",
+"plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
+"plt.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Cubic polynomial (4 params)')\n",
+"plt.xlabel('Temperature (K)')\n",
+"plt.ylabel('$C_p$ (kJ/kg/K)')\n",
+"plt.title('Polynomial fit')\n",
+"plt.legend()\n",
+"plt.show()\n",
+"\n",
+"print(f\"Polynomial coefficients: {coeffs}\")\n",
+"print(f\"MSE: {mse_poly:.8f}\")\n",
+"print(f\"Parameters: 4\")"
+]
 },
 {
 "cell_type": "markdown",
 "id": "97y7mrcekji",
-"source": "## 3. Neural network from scratch (numpy)\n\nNow let's build a one-hidden-layer neural network. The architecture:\n\n```\nInput (1: T) → Hidden (10 neurons, tanh) → Output (1: Cp)\n```\n\nWe need to:\n1. **Normalize** the data to [0, 1] so the network trains efficiently\n2. **Forward pass**: compute predictions from input through each layer\n3. **Loss**: mean squared error between predictions and data\n4. **Backpropagation**: compute gradients of the loss w.r.t. each weight using the chain rule\n5. **Gradient descent**: update weights in the direction that reduces the loss\n\nThis is exactly what nanoGPT's `train.py` does — just at a much larger scale.",
-"metadata": {}
+"metadata": {},
+"source": [
+"## 3. Neural network from scratch (numpy)\n",
+"\n",
+"Now let's build a one-hidden-layer neural network. The architecture:\n",
+"\n",
+"```\n",
+"Input (1: T) -> Hidden (10 neurons, tanh) -> Output (1: Cp)\n",
+"```\n",
+"\n",
+"We need to:\n",
+"1. **Normalize** the data to [0, 1] so the network trains efficiently\n",
+"2. **Forward pass**: compute predictions from input through each layer\n",
+"3. **Loss**: mean squared error between predictions and data\n",
+"4. **Backpropagation**: compute gradients of the loss w.r.t. each weight using the chain rule\n",
+"5. **Gradient descent**: update weights in the direction that reduces the loss\n",
+"\n",
+"This is exactly what nanoGPT's `train.py` does — just at a much larger scale."
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
 "id": "365o7bqbwkr",
-"source": "# Normalize inputs and outputs to [0, 1]\nT_min, T_max = T_raw.min(), T_raw.max()\nCp_min, Cp_max = Cp_raw.min(), Cp_raw.max()\n\nT = (T_raw - T_min) / (T_max - T_min)\nCp = (Cp_raw - Cp_min) / (Cp_max - Cp_min)\n\nX = T.reshape(-1, 1) # (N, 1) input matrix\nY = Cp.reshape(-1, 1) # (N, 1) target matrix\nN = X.shape[0]\n\n# Network setup\nH = 10 # hidden neurons\n\nnp.random.seed(42)\nW1 = np.random.randn(1, H) * 0.5 # input → hidden weights\nb1 = np.zeros((1, H)) # hidden biases\nW2 = np.random.randn(H, 1) * 0.5 # hidden → output weights\nb2 = np.zeros((1, 1)) # output bias\n\nprint(f\"Parameters: W1({W1.shape}) + b1({b1.shape}) + W2({W2.shape}) + b2({b2.shape})\")\nprint(f\"Total: {W1.size + b1.size + W2.size + b2.size} parameters for {N} data points\")",
 "metadata": {},
-"execution_count": null,
-"outputs": []
+"outputs": [],
+"source": [
+"# Normalize inputs and outputs to [0, 1]\n",
+"T_min, T_max = T_raw.min(), T_raw.max()\n",
+"Cp_min, Cp_max = Cp_raw.min(), Cp_raw.max()\n",
+"\n",
+"T = (T_raw - T_min) / (T_max - T_min)\n",
+"Cp = (Cp_raw - Cp_min) / (Cp_max - Cp_min)\n",
+"\n",
+"X = T.reshape(-1, 1) # (N, 1) input matrix\n",
+"Y = Cp.reshape(-1, 1) # (N, 1) target matrix\n",
+"N = X.shape[0]\n",
+"\n",
+"# Network setup\n",
+"H = 10 # hidden neurons\n",
+"\n",
+"np.random.seed(42)\n",
+"W1 = np.random.randn(1, H) * 0.5 # input -> hidden weights\n",
+"b1 = np.zeros((1, H)) # hidden biases\n",
+"W2 = np.random.randn(H, 1) * 0.5 # hidden -> output weights\n",
+"b2 = np.zeros((1, 1)) # output bias\n",
+"\n",
+"print(f\"Parameters: W1({W1.shape}) + b1({b1.shape}) + W2({W2.shape}) + b2({b2.shape})\")\n",
+"print(f\"Total: {W1.size + b1.size + W2.size + b2.size} parameters for {N} data points\")"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
 "id": "5w1ezs9t2w6",
-"source": "# Training loop\nlearning_rate = 0.01\nepochs = 5000\nlog_interval = 500\nlosses_np = []\n\nfor epoch in range(epochs):\n # Forward pass\n Z1 = X @ W1 + b1 # hidden pre-activation (N, H)\n A1 = np.tanh(Z1) # hidden activation (N, H)\n Y_pred = A1 @ W2 + b2 # output (N, 1)\n\n # Loss (mean squared error)\n error = Y_pred - Y\n loss = np.mean(error ** 2)\n losses_np.append(loss)\n\n # Backpropagation (chain rule, working backward)\n dL_dYpred = 2 * error / N\n dL_dW2 = A1.T @ dL_dYpred\n dL_db2 = np.sum(dL_dYpred, axis=0, keepdims=True)\n dL_dA1 = dL_dYpred @ W2.T\n dL_dZ1 = dL_dA1 * (1 - A1 ** 2) # tanh derivative\n dL_dW1 = X.T @ dL_dZ1\n dL_db1 = np.sum(dL_dZ1, axis=0, keepdims=True)\n\n # Gradient descent update\n W2 -= learning_rate * dL_dW2\n b2 -= learning_rate * dL_db2\n W1 -= learning_rate * dL_dW1\n b1 -= learning_rate * dL_db1\n\n if epoch % log_interval == 0 or epoch == epochs - 1:\n print(f\"Epoch {epoch:5d} Loss: {loss:.6f}\")",
 "metadata": {},
-"execution_count": null,
-"outputs": []
+"outputs": [],
+"source": [
+"# Training loop\n",
+"learning_rate = 0.01\n",
+"epochs = 5000\n",
+"log_interval = 500\n",
+"losses_np = []\n",
+"\n",
+"for epoch in range(epochs):\n",
+" # Forward pass\n",
+" Z1 = X @ W1 + b1 # hidden pre-activation (N, H)\n",
+" A1 = np.tanh(Z1) # hidden activation (N, H)\n",
+" Y_pred = A1 @ W2 + b2 # output (N, 1)\n",
+"\n",
+" # Loss (mean squared error)\n",
+" error = Y_pred - Y\n",
+" loss = np.mean(error ** 2)\n",
+" losses_np.append(loss)\n",
+"\n",
+" # Backpropagation (chain rule, working backward)\n",
+" dL_dYpred = 2 * error / N\n",
+" dL_dW2 = A1.T @ dL_dYpred\n",
+" dL_db2 = np.sum(dL_dYpred, axis=0, keepdims=True)\n",
+" dL_dA1 = dL_dYpred @ W2.T\n",
+" dL_dZ1 = dL_dA1 * (1 - A1 ** 2) # tanh derivative\n",
+" dL_dW1 = X.T @ dL_dZ1\n",
+" dL_db1 = np.sum(dL_dZ1, axis=0, keepdims=True)\n",
+"\n",
+" # Gradient descent update\n",
+" W2 -= learning_rate * dL_dW2\n",
+" b2 -= learning_rate * dL_db2\n",
+" W1 -= learning_rate * dL_dW1\n",
+" b1 -= learning_rate * dL_db1\n",
+"\n",
+" if epoch % log_interval == 0 or epoch == epochs - 1:\n",
+" print(f\"Epoch {epoch:5d} Loss: {loss:.6f}\")"
+]
 },
 {
 "cell_type": "code",
-"id": "onel9r0kjk",
-"source": "# Predict on a fine grid and convert back to physical units\nT_fine_norm = np.linspace(0, 1, 200).reshape(-1, 1)\nA1_fine = np.tanh(T_fine_norm @ W1 + b1)\nCp_nn_norm = A1_fine @ W2 + b2\nCp_nn = Cp_nn_norm * (Cp_max - Cp_min) + Cp_min\nT_fine_K = T_fine_norm * (T_max - T_min) + T_min\n\n# MSE in original units for comparison with polynomial\nCp_nn_at_data = np.tanh(X @ W1 + b1) @ W2 + b2\nCp_nn_at_data = Cp_nn_at_data * (Cp_max - Cp_min) + Cp_min\nmse_nn = np.mean((Cp_nn_at_data.flatten() - Cp_raw) ** 2)\n\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n\nax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params, MSE={mse_poly:.2e})')\nax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r-', linewidth=2, label=f'NN numpy (31 params, MSE={mse_nn:.2e})')\nax1.set_xlabel('Temperature (K)')\nax1.set_ylabel('$C_p$ (kJ/kg/K)')\nax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\nax1.legend()\n\nax2.semilogy(losses_np)\nax2.set_xlabel('Epoch')\nax2.set_ylabel('MSE (normalized)')\nax2.set_title('Training loss — numpy NN')\n\nplt.tight_layout()\nplt.show()",
-"metadata": {},
 "execution_count": null,
-"outputs": []
+"id": "onel9r0kjk",
+"metadata": {},
+"outputs": [],
+"source": [
+"# Predict on a fine grid and convert back to physical units\n",
+"T_fine_norm = np.linspace(0, 1, 200).reshape(-1, 1)\n",
+"A1_fine = np.tanh(T_fine_norm @ W1 + b1)\n",
+"Cp_nn_norm = A1_fine @ W2 + b2\n",
+"Cp_nn = Cp_nn_norm * (Cp_max - Cp_min) + Cp_min\n",
+"T_fine_K = T_fine_norm * (T_max - T_min) + T_min\n",
+"\n",
+"# MSE in original units for comparison with polynomial\n",
+"Cp_nn_at_data = np.tanh(X @ W1 + b1) @ W2 + b2\n",
+"Cp_nn_at_data = Cp_nn_at_data * (Cp_max - Cp_min) + Cp_min\n",
+"mse_nn = np.mean((Cp_nn_at_data.flatten() - Cp_raw) ** 2)\n",
+"\n",
+"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n",
+"\n",
+"ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
+"ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params, MSE={mse_poly:.2e})')\n",
+"ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r-', linewidth=2, label=f'NN numpy (31 params, MSE={mse_nn:.2e})')\n",
+"ax1.set_xlabel('Temperature (K)')\n",
+"ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n",
+"ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n",
+"ax1.legend()\n",
+"\n",
+"ax2.semilogy(losses_np)\n",
+"ax2.set_xlabel('Epoch')\n",
+"ax2.set_ylabel('MSE (normalized)')\n",
+"ax2.set_title('Training loss — numpy NN')\n",
+"\n",
+"plt.tight_layout()\n",
+"plt.show()"
+]
 },
 {
 "cell_type": "markdown",
 "id": "ea9z35qm9u8",
-"source": "## 4. Neural network in PyTorch\n\nThe same network, but PyTorch handles backpropagation automatically. Compare the training loop above to the one below — `loss.backward()` replaces all of our manual gradient calculations.\n\nThis is the same API used in nanoGPT's `model.py` — `nn.Linear`, activation functions, `optimizer.step()`.",
-"metadata": {}
+"metadata": {},
+"source": [
+"## 4. Neural network in PyTorch\n",
+"\n",
+"The same network, but PyTorch handles backpropagation automatically. Compare the training loop above to the one below — `loss.backward()` replaces all of our manual gradient calculations.\n",
+"\n",
+"This is the same API used in nanoGPT's `model.py` — `nn.Linear`, activation functions, `optimizer.step()`."
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
 "id": "3qxnrtyxqgz",
-"source": "import torch\nimport torch.nn as nn\n\n# Prepare data as PyTorch tensors\nX_t = torch.tensor((T_raw - T_min) / (T_max - T_min), dtype=torch.float32).reshape(-1, 1)\nY_t = torch.tensor((Cp_raw - Cp_min) / (Cp_max - Cp_min), dtype=torch.float32).reshape(-1, 1)\n\n# Define the network\nmodel = nn.Sequential(\n    nn.Linear(1, H), # input → hidden (W1, b1)\n    nn.Tanh(), # activation\n    nn.Linear(H, 1), # hidden → output (W2, b2)\n)\n\nprint(model)\nprint(f\"Total parameters: {sum(p.numel() for p in model.parameters())}\")",
 "metadata": {},
-"execution_count": null,
-"outputs": []
+"outputs": [],
+"source": [
+"import torch\n",
+"import torch.nn as nn\n",
+"\n",
+"# Prepare data as PyTorch tensors\n",
+"X_t = torch.tensor((T_raw - T_min) / (T_max - T_min), dtype=torch.float32).reshape(-1, 1)\n",
+"Y_t = torch.tensor((Cp_raw - Cp_min) / (Cp_max - Cp_min), dtype=torch.float32).reshape(-1, 1)\n",
+"\n",
+"# Define the network\n",
+"model = nn.Sequential(\n",
+" nn.Linear(1, H), # input -> hidden (W1, b1)\n",
+" nn.Tanh(), # activation\n",
+" nn.Linear(H, 1), # hidden -> output (W2, b2)\n",
+")\n",
+"\n",
+"print(model)\n",
+"print(f\"Total parameters: {sum(p.numel() for p in model.parameters())}\")"
+]
 },
 {
 "cell_type": "code",
-"id": "ydl3ycnypps",
-"source": "# Train\noptimizer = torch.optim.Adam(model.parameters(), lr=0.01)\nloss_fn = nn.MSELoss()\nlosses_torch = []\n\nfor epoch in range(epochs):\n    Y_pred_t = model(X_t)\n    loss = loss_fn(Y_pred_t, Y_t)\n    losses_torch.append(loss.item())\n\n    optimizer.zero_grad() # reset gradients\n    loss.backward() # automatic differentiation\n    optimizer.step() # update weights\n\n    if epoch % log_interval == 0 or epoch == epochs - 1:\n        print(f\"Epoch {epoch:5d} Loss: {loss.item():.6f}\")",
-"metadata": {},
 "execution_count": null,
-"outputs": []
+"id": "ydl3ycnypps",
+"metadata": {},
+"outputs": [],
+"source": [
+"# Train\n",
+"optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n",
+"loss_fn = nn.MSELoss()\n",
+"losses_torch = []\n",
+"\n",
+"for epoch in range(epochs):\n",
+" Y_pred_t = model(X_t)\n",
+" loss = loss_fn(Y_pred_t, Y_t)\n",
+" losses_torch.append(loss.item())\n",
+"\n",
+" optimizer.zero_grad() # reset gradients\n",
+" loss.backward() # automatic differentiation\n",
+" optimizer.step() # update weights\n",
+"\n",
+" if epoch % log_interval == 0 or epoch == epochs - 1:\n",
+" print(f\"Epoch {epoch:5d} Loss: {loss.item():.6f}\")"
+]
 },
 {
 "cell_type": "markdown",
 "id": "bg0kvnk4ho",
-"source": "## 5. Compare all three approaches",
-"metadata": {}
+"metadata": {},
+"source": [
+"## 5. Compare all three approaches"
+]
 },
 {
 "cell_type": "code",
-"id": "h2dfstoh8gd",
-"source": "# PyTorch predictions\nT_fine_t = torch.linspace(0, 1, 200).reshape(-1, 1)\nwith torch.no_grad():\n    Cp_torch_norm = model(T_fine_t)\nCp_torch = Cp_torch_norm.numpy() * (Cp_max - Cp_min) + Cp_min\n\n# MSE for PyTorch model\nwith torch.no_grad():\n    Cp_torch_at_data = model(X_t).numpy() * (Cp_max - Cp_min) + Cp_min\nmse_torch = np.mean((Cp_torch_at_data.flatten() - Cp_raw) ** 2)\n\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n\n# Left: all three fits\nax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params)')\nax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r--', linewidth=2, label=f'NN numpy (31 params)')\nax1.plot(T_fine_K.flatten(), Cp_torch.flatten(), 'g-', linewidth=2, alpha=0.8, label=f'NN PyTorch (31 params)')\nax1.set_xlabel('Temperature (K)')\nax1.set_ylabel('$C_p$ (kJ/kg/K)')\nax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\nax1.legend()\n\n# Right: training loss comparison\nax2.semilogy(losses_np, label='numpy (gradient descent)')\nax2.semilogy(losses_torch, label='PyTorch (Adam)')\nax2.set_xlabel('Epoch')\nax2.set_ylabel('MSE (normalized)')\nax2.set_title('Training loss comparison')\nax2.legend()\n\nplt.tight_layout()\nplt.show()\n\nprint(f\"MSE — Polynomial: {mse_poly:.2e} | NN numpy: {mse_nn:.2e} | NN PyTorch: {mse_torch:.2e}\")",
-"metadata": {},
 "execution_count": null,
-"outputs": []
+"id": "h2dfstoh8gd",
+"metadata": {},
+"outputs": [],
+"source": [
+"# PyTorch predictions\n",
+"T_fine_t = torch.linspace(0, 1, 200).reshape(-1, 1)\n",
+"with torch.no_grad():\n",
+" Cp_torch_norm = model(T_fine_t)\n",
+"Cp_torch = Cp_torch_norm.numpy() * (Cp_max - Cp_min) + Cp_min\n",
+"\n",
+"# MSE for PyTorch model\n",
+"with torch.no_grad():\n",
+" Cp_torch_at_data = model(X_t).numpy() * (Cp_max - Cp_min) + Cp_min\n",
+"mse_torch = np.mean((Cp_torch_at_data.flatten() - Cp_raw) ** 2)\n",
+"\n",
+"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n",
+"\n",
+"# Left: all three fits\n",
+"ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
+"ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params)')\n",
+"ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r--', linewidth=2, label=f'NN numpy (31 params)')\n",
+"ax1.plot(T_fine_K.flatten(), Cp_torch.flatten(), 'g-', linewidth=2, alpha=0.8, label=f'NN PyTorch (31 params)')\n",
+"ax1.set_xlabel('Temperature (K)')\n",
+"ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n",
+"ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n",
+"ax1.legend()\n",
+"\n",
+"# Right: training loss comparison\n",
+"ax2.semilogy(losses_np, label='numpy (gradient descent)')\n",
+"ax2.semilogy(losses_torch, label='PyTorch (Adam)')\n",
+"ax2.set_xlabel('Epoch')\n",
+"ax2.set_ylabel('MSE (normalized)')\n",
+"ax2.set_title('Training loss comparison')\n",
+"ax2.legend()\n",
+"\n",
+"plt.tight_layout()\n",
+"plt.show()\n",
+"\n",
+"print(f\"MSE — Polynomial: {mse_poly:.2e} | NN numpy: {mse_nn:.2e} | NN PyTorch: {mse_torch:.2e}\")"
+]
 },
 {
 "cell_type": "markdown",
 "id": "xyw3sr20brn",
-"source": "## 6. Extrapolation\n\nHow do the models behave *outside* the training range? This is a key test — and where the differences become stark.",
-"metadata": {}
+"metadata": {},
+"source": [
+"## 6. Extrapolation\n",
+"\n",
+"How do the models behave *outside* the training range? This is a key test — and where the differences become stark."
+]
 },
 {
 "cell_type": "code",
-"id": "fi3iq2sjh6",
-"source": "# Extrapolate beyond the training range\nT_extrap = np.linspace(100, 2500, 300)\nT_extrap_norm = ((T_extrap - T_min) / (T_max - T_min)).reshape(-1, 1)\n\n# Polynomial extrapolation\nCp_poly_extrap = poly(T_extrap)\n\n# Numpy NN extrapolation\nA1_extrap = np.tanh(T_extrap_norm @ W1 + b1)\nCp_nn_extrap = (A1_extrap @ W2 + b2) * (Cp_max - Cp_min) + Cp_min\n\n# PyTorch NN extrapolation\nwith torch.no_grad():\n    Cp_torch_extrap = model(torch.tensor(T_extrap_norm, dtype=torch.float32)).numpy()\nCp_torch_extrap = Cp_torch_extrap * (Cp_max - Cp_min) + Cp_min\n\nplt.figure(figsize=(10, 6))\nplt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nplt.plot(T_extrap, Cp_poly_extrap, 'b-', linewidth=2, label='Polynomial')\nplt.plot(T_extrap, Cp_nn_extrap.flatten(), 'r--', linewidth=2, label='NN numpy')\nplt.plot(T_extrap, Cp_torch_extrap.flatten(), 'g-', linewidth=2, alpha=0.8, label='NN PyTorch')\nplt.axvline(T_raw.min(), color='gray', linestyle=':', alpha=0.5, label='Training range')\nplt.axvline(T_raw.max(), color='gray', linestyle=':', alpha=0.5)\nplt.xlabel('Temperature (K)')\nplt.ylabel('$C_p$ (kJ/kg/K)')\nplt.title('Extrapolation beyond training data')\nplt.legend()\nplt.show()",
-"metadata": {},
 "execution_count": null,
-"outputs": []
+"id": "fi3iq2sjh6",
+"metadata": {},
+"outputs": [],
+"source": [
+"# Extrapolate beyond the training range\n",
+"T_extrap = np.linspace(100, 2500, 300)\n",
+"T_extrap_norm = ((T_extrap - T_min) / (T_max - T_min)).reshape(-1, 1)\n",
+"\n",
+"# Polynomial extrapolation\n",
+"Cp_poly_extrap = poly(T_extrap)\n",
+"\n",
+"# Numpy NN extrapolation\n",
+"A1_extrap = np.tanh(T_extrap_norm @ W1 + b1)\n",
+"Cp_nn_extrap = (A1_extrap @ W2 + b2) * (Cp_max - Cp_min) + Cp_min\n",
+"\n",
+"# PyTorch NN extrapolation\n",
+"with torch.no_grad():\n",
+" Cp_torch_extrap = model(torch.tensor(T_extrap_norm, dtype=torch.float32)).numpy()\n",
+"Cp_torch_extrap = Cp_torch_extrap * (Cp_max - Cp_min) + Cp_min\n",
+"\n",
+"plt.figure(figsize=(10, 6))\n",
+"plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
+"plt.plot(T_extrap, Cp_poly_extrap, 'b-', linewidth=2, label='Polynomial')\n",
+"plt.plot(T_extrap, Cp_nn_extrap.flatten(), 'r--', linewidth=2, label='NN numpy')\n",
+"plt.plot(T_extrap, Cp_torch_extrap.flatten(), 'g-', linewidth=2, alpha=0.8, label='NN PyTorch')\n",
+"plt.axvline(T_raw.min(), color='gray', linestyle=':', alpha=0.5, label='Training range')\n",
+"plt.axvline(T_raw.max(), color='gray', linestyle=':', alpha=0.5)\n",
+"plt.xlabel('Temperature (K)')\n",
+"plt.ylabel('$C_p$ (kJ/kg/K)')\n",
+"plt.title('Extrapolation beyond training data')\n",
+"plt.legend()\n",
+"plt.show()"
+]
 },
 {
 "cell_type": "markdown",
 "id": "yb2s18keiw",
-"source": "## 7. Exercises\n\nTry these in new cells below:\n\n1. **Change the number of hidden neurons** (`H`). Try 2, 5, 20, 50. How does the fit change? At what point does adding neurons stop helping?\n\n2. **Activation functions**: In the PyTorch model, replace `nn.Tanh()` with `nn.ReLU()` or `nn.Sigmoid()`. How does the fit change?\n\n3. **Optimizer comparison**: Replace `Adam` with `torch.optim.SGD(model.parameters(), lr=0.01)`. How does training speed compare?\n\n4. **Remove normalization**: Use `T_raw` and `Cp_raw` directly (no scaling to [0,1]). What happens? Can you fix it by adjusting the learning rate?\n\n5. **Overfitting**: Set `H = 100` and train for 20,000 epochs. Does it fit the training data well? Look at the extrapolation — is it reasonable?\n\n6. **Higher-order polynomial**: Try `np.polyfit(T_raw, Cp_raw, 10)`. How does it compare to the cubic? How does it extrapolate?",
-"metadata": {}
+"metadata": {},
+"source": [
+"## 7. Exercises\n",
+"\n",
+"Try these in new cells below:\n",
+"\n",
+"1. **Change the number of hidden neurons** (`H`). Try 2, 5, 20, 50. How does the fit change? At what point does adding neurons stop helping?\n",
+"\n",
+"2. **Activation functions**: In the PyTorch model, replace `nn.Tanh()` with `nn.ReLU()` or `nn.Sigmoid()`. How does the fit change?\n",
+"\n",
+"3. **Optimizer comparison**: Replace `Adam` with `torch.optim.SGD(model.parameters(), lr=0.01)`. How does training speed compare?\n",
+"\n",
+"4. **Remove normalization**: Use `T_raw` and `Cp_raw` directly (no scaling to [0,1]). What happens? Can you fix it by adjusting the learning rate?\n",
+"\n",
+"5. **Overfitting**: Set `H = 100` and train for 20,000 epochs. Does it fit the training data well? Look at the extrapolation — is it reasonable?\n",
+"\n",
+"6. **Higher-order polynomial**: Try `np.polyfit(T_raw, Cp_raw, 10)`. How does it compare to the cubic? How does it extrapolate?"
+]
 }
 ],
 "metadata": {
@@ -134,4 +425,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
+}
|||
Loading…
Add table
Add a link
Reference in a new issue