README updates, textbook polynomial cell, self-contained notebook

Same set of changes as che-computing-dev/LLMs:

- 03/04/05 READMEs: uv add workflow, required model caching
- 05-tool-use: add Setup section, requirements.txt
- 06-neural-networks: textbook cubic polynomial comparison cell
- 06-neural-networks: add nn_workshop_colab.ipynb (self-contained, inline data)
- vocab.md: catch up with terms from 02-05

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

parent a1f9d4d5ed
commit f7d2b48f5a

7 changed files with 534 additions and 23 deletions
03-rag/README.md

@@ -59,20 +59,35 @@ source .venv/bin/activate

### Install the required packages

+Each section has its own `requirements.txt` listing the libraries it needs.
+
+**If you are using `uv` for the workshop** (recommended):

```bash
-pip install llama-index-core llama-index-readers-file \
-    llama-index-llms-ollama llama-index-embeddings-huggingface \
-    python-dateutil
+cd /path/to/llm-workshop
+uv add $(cat 03-rag/requirements.txt)
```

-The `llama-index-*` packages are components of the [LlamaIndex](https://docs.llamaindex.ai/en/stable/) framework, which provides the plumbing for building RAG systems. `python-dateutil` is used by `clean_eml.py` for parsing email dates.
+`uv add` adds the packages to `pyproject.toml`, updates `uv.lock`, and installs them into `.venv/`.

-A `requirements.txt` is provided:
+**If you have a plain venv activated:**

```bash
pip install -r requirements.txt
```

+Either way, the relevant packages are:
+
+```
+llama-index-core
+llama-index-readers-file
+llama-index-llms-ollama
+llama-index-embeddings-huggingface
+python-dateutil
+```
+
+The `llama-index-*` packages are components of the [LlamaIndex](https://docs.llamaindex.ai/en/stable/) framework, which provides the plumbing for building RAG systems. `python-dateutil` is used by `clean_eml.py` for parsing email dates.

### Pull the LLM

We will use the `command-r7b` model, which was fine-tuned for RAG tasks:
@@ -85,22 +100,18 @@ Other models work too — `llama3.1:8B`, `deepseek-r1:8B`, `gemma3:1b` — but `

### Cache the embedding model

-The embedding model converts text into vectors. We use `BAAI/bge-large-en-v1.5`, a sentence transformer hosted on Huggingface. It will download automatically on first use (~1.3 GB), but you can pre-cache it with a short Python script:
+**This step is required, not optional.** The embedding model `BAAI/bge-large-en-v1.5` (~1.3 GB) is downloaded from Hugging Face on first use. The `build.py` and `query.py` scripts run in *offline mode* (`HF_HUB_OFFLINE=1`) so that subsequent runs are fast and deterministic — but that means they cannot download the model on demand. If you skip this step, the scripts will fail with a `LocalEntryNotFoundError`.

-```python
-from llama_index.embeddings.huggingface import HuggingFaceEmbedding
-
-embed_model = HuggingFaceEmbedding(
-    cache_folder="./models",
-    model_name="BAAI/bge-large-en-v1.5"
-)
-```
-
-Save this as `cache_model.py` and run it:
+Run the included `cache_model.py` script first:

```bash
+cd 03-rag
python cache_model.py
```

-(This is also saved in the Github.) Each script that uses the model will set environmental variables to prevent checking for updates. You can manually update either by running `cache_model.py` or editing the scripts themselves.
+This populates `./models/` with the embedding model. After it succeeds, `build.py` and `query.py` will run.
+
+If you ever need to refresh the model or switch to a different one, edit `cache_model.py` (or temporarily set `HF_HUB_OFFLINE=0` in your shell) and re-run.

## 2. The libraries we use
@@ -283,7 +294,7 @@ Our custom prompt in `query.py` is more detailed — it asks for structured outp

> **Exercise 7:** Bring your own documents. Find a collection of text files — research paper abstracts, class notes, or a downloaded text from Project Gutenberg — and build a RAG system over them. What questions can you answer that a plain LLM cannot?

-> **Exercise 8 (optional, sets up Part IV):** Build a larger corpus. Ten emails is small enough that retrieval is barely selective — the system returns most of the corpus on every query. The script `fetch_arxiv.py` pulls 100 recent abstracts from a chosen arXiv category and writes one text file per abstract:
+> **Exercise 8 (optional, sets up Part IV):** Build a larger corpus. Ten emails is small enough that retrieval is barely selective. The system returns most of the corpus on every query. The script `fetch_arxiv.py` pulls 100 recent abstracts from a chosen arXiv category and writes one text file per abstract:
>
> ```bash
> python fetch_arxiv.py --category cs.LG --max 100 --output data_arxiv
> ```
04-semantic-search/README.md

@@ -83,19 +83,39 @@ We use `cross-encoder/ms-marco-MiniLM-L-12-v2` to re-rank the merged candidates

### Prerequisites

-Everything from Part III, plus a few additional packages:
+Everything from Part III, plus a few additional packages. Each section has its own `requirements.txt`.
+
+**If you are using `uv` for the workshop** (recommended):

```bash
-pip install llama-index-retrievers-bm25 nltk
+cd /path/to/llm-workshop
+uv add $(cat 04-semantic-search/requirements.txt)
```

-A `requirements.txt` is provided with the full set of dependencies:
+**With a plain venv:**

```bash
pip install -r requirements.txt
```

-The cross-encoder model (`cross-encoder/ms-marco-MiniLM-L-12-v2`) will download automatically on first use via `sentence-transformers`. It is small (~130 MB).
+The new packages over Part III are `llama-index-retrievers-bm25` (BM25 keyword retrieval) and `nltk` (used by `search_keywords.py` for part-of-speech tagging).
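For orientation, those two retrieval sides look roughly like this in LlamaIndex 0.10-era APIs. A minimal sketch, not the workshop's `build_store.py`; the `data` path is a placeholder:

```python
# Sketch: sparse (BM25) and dense (embedding) retrievers side by side.
# Assumptions: LlamaIndex 0.10-style imports; "data" is a placeholder path.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.retrievers.bm25 import BM25Retriever

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")

# Chunk documents into nodes shared by both retrievers
nodes = SentenceSplitter().get_nodes_from_documents(
    SimpleDirectoryReader("data").load_data()
)

bm25 = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)  # sparse: keyword/BM25
dense = VectorStoreIndex(nodes).as_retriever(similarity_top_k=10)     # dense: embeddings

results = bm25.retrieve("heat capacity of nitrogen")
print(results[0].node.text[:200])
```

The cross-encoder then re-ranks the merged candidates from the two lists, as described at the top of this section.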
+
+### Cache the models
+
+**Required.** This section uses *two* models: the embedding model from Part III (cached if you ran `cache_model.py` already) and a cross-encoder for re-ranking. Both must be cached before `build_store.py` and `query_hybrid.py` will run, since the scripts run in offline mode.
+
+If you have already cached the embedding model in Part III, point this section at it:
+
+```bash
+cd 04-semantic-search
+ln -s ../03-rag/models models
+```
+
+Then pre-cache the cross-encoder (`cross-encoder/ms-marco-MiniLM-L-12-v2`, ~130 MB) by running a similar one-shot script (or temporarily set `HF_HUB_OFFLINE=0` in your shell for the first run of `build_store.py`).
+
+If you have not cached the embedding model yet, run `python ../03-rag/cache_model.py` first.
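The commit does not ship that one-shot script; a sketch of what it might look like, patterned on `03-rag/cache_model.py` (the file name and the `./models` cache location are assumptions, so adjust to wherever `build_store.py` expects the cache):

```python
# cache_cross_encoder.py (hypothetical name): pre-download the re-ranker while online.
import os
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "./models"  # set before the import below

from sentence_transformers import CrossEncoder

# Constructing the model downloads it on first use (~130 MB); after this,
# offline-mode runs can load it from the cache.
CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")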
+
+### Pull the LLM
+
Make sure `ollama` is running and `command-r7b` is available:
05-tool-use/README.md

@@ -18,6 +18,42 @@ The LLM tools you use every day are not bare language models. They are agentic s

---

+## Setup
+
+This section uses the [Ollama Python library](https://github.com/ollama/ollama-python) to call local models programmatically.
+
+### Install the Python library
+
+A `requirements.txt` is provided. **If you are using `uv` for the workshop** (recommended):
+
+```bash
+cd /path/to/llm-workshop
+uv add $(cat 05-tool-use/requirements.txt)
+```
+
+**With a plain venv:**
+
+```bash
+pip install -r requirements.txt
+```
+
+The only direct dependency is `ollama`. If you completed sections 03 or 04, this is likely already installed as a transitive dependency of `llama-index-llms-ollama`.
+
+### Pull the model
+
+Both scripts use `llama3.1:8b`:
+
+```bash
+ollama pull llama3.1:8b
+```
+
+You can substitute another tool-calling model by editing the scripts. See https://ollama.com/search?c=tool for the current list of models that support function calling. Smaller options like `llama3.2:3b` work for the simple examples; larger models tend to handle multi-step tool calls more reliably.
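For a sense of what the scripts do with this model, here is a minimal function-calling sketch with the `ollama` library (illustrative, not one of the workshop scripts; passing plain Python functions as tools requires a recent `ollama-python`):

```python
# Minimal tool-calling sketch. The convert_temperature tool is made up for
# this example, not taken from the workshop scripts.
import ollama

def convert_temperature(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to Kelvin."""
    return celsius + 273.15

response = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "What is 25 C in Kelvin? Use the tool."}],
    tools=[convert_temperature],  # type hints + docstring become the tool schema
)

# The model replies with a structured tool call rather than a final answer;
# the calling code is responsible for executing it.
for call in response.message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```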
+
+### A note on Exercise 8
+
+The optional advanced exercise wires the RAG pipeline from sections 03-04 into a LlamaIndex `ReActAgent`. If you plan to attempt it, you'll need the LlamaIndex packages from those sections already installed and a working RAG store from `03-rag/`.
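A sketch of what that wiring might look like, assuming llama-index-core 0.10-era APIs and a store persisted under `03-rag/storage` (paths, the tool name, and the question are placeholders, not the exercise solution):

```python
# Hypothetical Exercise 8 wiring: expose the section 03 RAG store as a tool.
from llama_index.core import Settings, StorageContext, load_index_from_storage
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

Settings.llm = Ollama(model="llama3.1:8b")
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")

# Load the persisted RAG index (persist_dir is an assumption)
index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="../03-rag/storage")
)

rag_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="email_search",
    description="Answers questions about the section 03 email corpus.",
)

agent = ReActAgent.from_tools([rag_tool], llm=Settings.llm, verbose=True)
print(agent.chat("Summarize what the emails say about scheduling."))
```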

## 1. From LLM to agent: what changed?

### The early days
05-tool-use/requirements.txt (new file, 1 line)

@@ -0,0 +1 @@
+ollama
06-neural-networks notebook (textbook cubic polynomial comparison cell)

@@ -93,6 +93,20 @@
    "print(f\"Parameters: 4\")"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "07673988",
+   "source": "### Comparing to a textbook polynomial\n\nMost thermodynamics textbooks tabulate $C_p$ correlations of the same cubic form, but with **fixed coefficients** drawn from a broader fit and reported in **molar** units, $C_p$ in J/(mol·K). For nitrogen, a typical reference set is:\n\n| Coefficient | Value |\n|-------------|-------|\n| $a$ | 28.883 |\n| $b$ | $-0.157 \\times 10^{-2}$ |\n| $c$ | $0.808 \\times 10^{-5}$ |\n| $d$ | $-2.871 \\times 10^{-9}$ |\n\n**Valid range: 273–1800 K.** Outside this range the textbook polynomial is being extrapolated and is not guaranteed by the original fit.\n\nTo compare with our NIST mass-basis data, we divide by the molar mass of N$_2$ (28.014 g/mol) — convenient because J/(mol·K) ÷ g/mol = kJ/(kg·K).",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "f21c64a3",
+   "source": "# Reference polynomial from textbook (Cp in J/(mol·K))\n# Form: Cp = a + bT + cT^2 + dT^3\n# Valid range: 273-1800 K\n\na_ref = 28.883\nb_ref = -0.157e-2\nc_ref = 0.808e-5\nd_ref = -2.871e-9\nM_N2 = 28.014  # g/mol\n\nT_REF_MIN, T_REF_MAX = 273.0, 1800.0\n\ndef cp_ref_molar(T):\n    \"\"\"Textbook cubic in J/(mol·K).\"\"\"\n    return a_ref + b_ref * T + c_ref * T**2 + d_ref * T**3\n\ndef cp_ref(T):\n    \"\"\"Same fit converted to kJ/(kg·K) to match the NIST data.\"\"\"\n    return cp_ref_molar(T) / M_N2\n\n# Evaluate the textbook polynomial across the full plot range\nCp_ref_full = cp_ref(T_fine)\n\n# MSE on the data points that fall within the textbook's stated valid range\nin_range = (T_raw >= T_REF_MIN) & (T_raw <= T_REF_MAX)\nCp_ref_at_data = cp_ref(T_raw[in_range])\nmse_ref = np.mean((Cp_ref_at_data - Cp_raw[in_range]) ** 2)\n\n# Plot both fits across the full data range; shade the textbook's valid range\nplt.figure(figsize=(8, 5))\nplt.axvspan(T_REF_MIN, T_REF_MAX, alpha=0.07, color='green',\n            label='Textbook valid range (273–1800 K)')\nplt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nplt.plot(T_fine, Cp_ref_full, 'g--', linewidth=2,\n         label=f'Textbook polynomial (4 params, MSE={mse_ref:.2e})')\nplt.plot(T_fine, Cp_poly, 'b-', linewidth=2,\n         label=f'numpy.polyfit (4 params, MSE={mse_poly:.2e})')\nplt.xlabel('Temperature (K)')\nplt.ylabel('$C_p$ (kJ/kg/K)')\nplt.title('Two cubic fits: textbook coefficients vs. fit to these 35 points')\nplt.legend(fontsize=9)\nplt.show()\n\nprint(f\"Textbook polynomial MSE (within 273-1800 K, {in_range.sum()} points): {mse_ref:.6e}\")\nprint(f\"numpy.polyfit MSE (all {len(T_raw)} points): {mse_poly:.6e}\")\nprint()\nprint(\"Note: the textbook polynomial is plotted across the full range, but its\")\nprint(\"authors only claim validity from 273 to 1800 K (shaded band).\")\nprint(\"Beyond 1800 K, the curve is an extrapolation — useful as a teaching point\")\nprint(\"about trusting correlations only over their stated range.\")",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
  {
   "cell_type": "markdown",
   "id": "97y7mrcekji",
06-neural-networks/nn_workshop_colab.ipynb (new file, 415 lines)
@@ -0,0 +1,415 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "xbsmj1hcj1g",
   "metadata": {},
   "source": "# Building a Neural Network: $C_p(T)$ for Nitrogen — self-contained edition\n\n**CHEG 667-013 — LLMs for Engineers**\n\nThis is a **self-contained version** of the notebook. The 35 NIST data points are embedded directly so it runs anywhere without needing the workshop repository. Suitable for Google Colab, JupyterHub, or any environment with `numpy`, `matplotlib`, and `torch`.\n\nIn this notebook we fit the heat capacity of N₂ gas using three approaches:\n1. A polynomial fit (the classical approach)\n2. A neural network built from scratch in numpy\n3. The same network in PyTorch\n\nThis makes the ML concepts behind LLMs — weights, loss, gradient descent, overfitting — concrete and tangible."
  },
  {
   "cell_type": "markdown",
   "id": "szrl41l3xbq",
   "metadata": {},
   "source": [
    "## 1. Load and plot the data\n",
    "\n",
    "The data is from the [NIST Chemistry WebBook](https://webbook.nist.gov/): isobaric heat capacity of N₂ at 1 bar, 300–2000 K."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "t4lqkcoeyil",
   "metadata": {},
   "outputs": [],
   "source": "import numpy as np\nimport matplotlib.pyplot as plt\n\n# NIST WebBook: ideal gas Cp of N2 at 1 bar, 300-2000 K\n# https://webbook.nist.gov/\nnist_data = np.array([\n    [ 300.0, 1.0413], [ 350.0, 1.0423], [ 400.0, 1.0450], [ 450.0, 1.0497],\n    [ 500.0, 1.0564], [ 550.0, 1.0650], [ 600.0, 1.0751], [ 650.0, 1.0863],\n    [ 700.0, 1.0981], [ 750.0, 1.1102], [ 800.0, 1.1223], [ 850.0, 1.1342],\n    [ 900.0, 1.1457], [ 950.0, 1.1568], [1000.0, 1.1674], [1050.0, 1.1774],\n    [1100.0, 1.1868], [1150.0, 1.1957], [1200.0, 1.2040], [1250.0, 1.2118],\n    [1300.0, 1.2191], [1350.0, 1.2260], [1400.0, 1.2323], [1450.0, 1.2383],\n    [1500.0, 1.2439], [1550.0, 1.2491], [1600.0, 1.2540], [1650.0, 1.2586],\n    [1700.0, 1.2630], [1750.0, 1.2670], [1800.0, 1.2708], [1850.0, 1.2744],\n    [1900.0, 1.2778], [1950.0, 1.2810], [2000.0, 1.2841],\n])\n\nT_raw = nist_data[:, 0]   # Temperature (K)\nCp_raw = nist_data[:, 1]  # Cp (kJ/kg/K)\n\nplt.figure(figsize=(8, 5))\nplt.plot(T_raw, Cp_raw, 'ko', markersize=6)\nplt.xlabel('Temperature (K)')\nplt.ylabel('$C_p$ (kJ/kg/K)')\nplt.title('$C_p(T)$ for N$_2$ at 1 bar — NIST WebBook')\nplt.show()\n\nprint(f\"{len(T_raw)} data points, T range: {T_raw.min():.0f} – {T_raw.max():.0f} K\")"
  },
  {
   "cell_type": "markdown",
   "id": "1jyrgsvp7op",
   "metadata": {},
   "source": [
    "## 2. Polynomial fit (baseline)\n",
    "\n",
    "Textbooks fit $C_p(T)$ with a polynomial: $C_p = a + bT + cT^2 + dT^3$. This is a **4-parameter** model. Let's fit it with `numpy.polyfit` and see how well it does."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4smvu4z2oro",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fit a cubic polynomial\n",
    "coeffs = np.polyfit(T_raw, Cp_raw, 3)\n",
    "poly = np.poly1d(coeffs)\n",
    "\n",
    "T_fine = np.linspace(T_raw.min(), T_raw.max(), 200)\n",
    "Cp_poly = poly(T_fine)\n",
    "\n",
    "# Compute residuals\n",
    "Cp_poly_at_data = poly(T_raw)\n",
    "mse_poly = np.mean((Cp_poly_at_data - Cp_raw) ** 2)\n",
    "\n",
    "plt.figure(figsize=(8, 5))\n",
    "plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
    "plt.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Cubic polynomial (4 params)')\n",
    "plt.xlabel('Temperature (K)')\n",
    "plt.ylabel('$C_p$ (kJ/kg/K)')\n",
    "plt.title('Polynomial fit')\n",
    "plt.legend()\n",
    "plt.show()\n",
    "\n",
    "print(f\"Polynomial coefficients: {coeffs}\")\n",
    "print(f\"MSE: {mse_poly:.8f}\")\n",
    "print(f\"Parameters: 4\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07673988",
   "source": "### Comparing to a textbook polynomial\n\nMost thermodynamics textbooks tabulate $C_p$ correlations of the same cubic form, but with **fixed coefficients** drawn from a broader fit and reported in **molar** units, $C_p$ in J/(mol·K). For nitrogen, a typical reference set is:\n\n| Coefficient | Value |\n|-------------|-------|\n| $a$ | 28.883 |\n| $b$ | $-0.157 \\times 10^{-2}$ |\n| $c$ | $0.808 \\times 10^{-5}$ |\n| $d$ | $-2.871 \\times 10^{-9}$ |\n\n**Valid range: 273–1800 K.** Outside this range the textbook polynomial is being extrapolated and is not guaranteed by the original fit.\n\nTo compare with our NIST mass-basis data, we divide by the molar mass of N$_2$ (28.014 g/mol) — convenient because J/(mol·K) ÷ g/mol = kJ/(kg·K).",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "id": "f21c64a3",
   "source": "# Reference polynomial from textbook (Cp in J/(mol·K))\n# Form: Cp = a + bT + cT^2 + dT^3\n# Valid range: 273-1800 K\n\na_ref = 28.883\nb_ref = -0.157e-2\nc_ref = 0.808e-5\nd_ref = -2.871e-9\nM_N2 = 28.014  # g/mol\n\nT_REF_MIN, T_REF_MAX = 273.0, 1800.0\n\ndef cp_ref_molar(T):\n    \"\"\"Textbook cubic in J/(mol·K).\"\"\"\n    return a_ref + b_ref * T + c_ref * T**2 + d_ref * T**3\n\ndef cp_ref(T):\n    \"\"\"Same fit converted to kJ/(kg·K) to match the NIST data.\"\"\"\n    return cp_ref_molar(T) / M_N2\n\n# Evaluate the textbook polynomial across the full plot range\nCp_ref_full = cp_ref(T_fine)\n\n# MSE on the data points that fall within the textbook's stated valid range\nin_range = (T_raw >= T_REF_MIN) & (T_raw <= T_REF_MAX)\nCp_ref_at_data = cp_ref(T_raw[in_range])\nmse_ref = np.mean((Cp_ref_at_data - Cp_raw[in_range]) ** 2)\n\n# Plot both fits across the full data range; shade the textbook's valid range\nplt.figure(figsize=(8, 5))\nplt.axvspan(T_REF_MIN, T_REF_MAX, alpha=0.07, color='green',\n            label='Textbook valid range (273–1800 K)')\nplt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\nplt.plot(T_fine, Cp_ref_full, 'g--', linewidth=2,\n         label=f'Textbook polynomial (4 params, MSE={mse_ref:.2e})')\nplt.plot(T_fine, Cp_poly, 'b-', linewidth=2,\n         label=f'numpy.polyfit (4 params, MSE={mse_poly:.2e})')\nplt.xlabel('Temperature (K)')\nplt.ylabel('$C_p$ (kJ/kg/K)')\nplt.title('Two cubic fits: textbook coefficients vs. fit to these 35 points')\nplt.legend(fontsize=9)\nplt.show()\n\nprint(f\"Textbook polynomial MSE (within 273-1800 K, {in_range.sum()} points): {mse_ref:.6e}\")\nprint(f\"numpy.polyfit MSE (all {len(T_raw)} points): {mse_poly:.6e}\")\nprint()\nprint(\"Note: the textbook polynomial is plotted across the full range, but its\")\nprint(\"authors only claim validity from 273 to 1800 K (shaded band).\")\nprint(\"Beyond 1800 K, the curve is an extrapolation — useful as a teaching point\")\nprint(\"about trusting correlations only over their stated range.\")",
   "metadata": {},
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "id": "97y7mrcekji",
   "metadata": {},
   "source": [
    "## 3. Neural network from scratch (numpy)\n",
    "\n",
    "Now let's build a one-hidden-layer neural network. The architecture:\n",
    "\n",
    "```\n",
    "Input (1: T) -> Hidden (10 neurons, tanh) -> Output (1: Cp)\n",
    "```\n",
    "\n",
    "We need to:\n",
    "1. **Normalize** the data to [0, 1] so the network trains efficiently\n",
    "2. **Forward pass**: compute predictions from input through each layer\n",
    "3. **Loss**: mean squared error between predictions and data\n",
    "4. **Backpropagation**: compute gradients of the loss w.r.t. each weight using the chain rule\n",
    "5. **Gradient descent**: update weights in the direction that reduces the loss\n",
    "\n",
    "This is exactly what nanoGPT's `train.py` does — just at a much larger scale."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "365o7bqbwkr",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Normalize inputs and outputs to [0, 1]\n",
    "T_min, T_max = T_raw.min(), T_raw.max()\n",
    "Cp_min, Cp_max = Cp_raw.min(), Cp_raw.max()\n",
    "\n",
    "T = (T_raw - T_min) / (T_max - T_min)\n",
    "Cp = (Cp_raw - Cp_min) / (Cp_max - Cp_min)\n",
    "\n",
    "X = T.reshape(-1, 1)   # (N, 1) input matrix\n",
    "Y = Cp.reshape(-1, 1)  # (N, 1) target matrix\n",
    "N = X.shape[0]\n",
    "\n",
    "# Network setup\n",
    "H = 10  # hidden neurons\n",
    "\n",
    "np.random.seed(42)\n",
    "W1 = np.random.randn(1, H) * 0.5  # input -> hidden weights\n",
    "b1 = np.zeros((1, H))             # hidden biases\n",
    "W2 = np.random.randn(H, 1) * 0.5  # hidden -> output weights\n",
    "b2 = np.zeros((1, 1))             # output bias\n",
    "\n",
    "print(f\"Parameters: W1({W1.shape}) + b1({b1.shape}) + W2({W2.shape}) + b2({b2.shape})\")\n",
    "print(f\"Total: {W1.size + b1.size + W2.size + b2.size} parameters for {N} data points\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5w1ezs9t2w6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Training loop\n",
    "learning_rate = 0.01\n",
    "epochs = 5000\n",
    "log_interval = 500\n",
    "losses_np = []\n",
    "\n",
    "for epoch in range(epochs):\n",
    "    # Forward pass\n",
    "    Z1 = X @ W1 + b1       # hidden pre-activation (N, H)\n",
    "    A1 = np.tanh(Z1)       # hidden activation (N, H)\n",
    "    Y_pred = A1 @ W2 + b2  # output (N, 1)\n",
    "\n",
    "    # Loss (mean squared error)\n",
    "    error = Y_pred - Y\n",
    "    loss = np.mean(error ** 2)\n",
    "    losses_np.append(loss)\n",
    "\n",
    "    # Backpropagation (chain rule, working backward)\n",
    "    dL_dYpred = 2 * error / N\n",
    "    dL_dW2 = A1.T @ dL_dYpred\n",
    "    dL_db2 = np.sum(dL_dYpred, axis=0, keepdims=True)\n",
    "    dL_dA1 = dL_dYpred @ W2.T\n",
    "    dL_dZ1 = dL_dA1 * (1 - A1 ** 2)  # tanh derivative\n",
    "    dL_dW1 = X.T @ dL_dZ1\n",
    "    dL_db1 = np.sum(dL_dZ1, axis=0, keepdims=True)\n",
    "\n",
    "    # Gradient descent update\n",
    "    W2 -= learning_rate * dL_dW2\n",
    "    b2 -= learning_rate * dL_db2\n",
    "    W1 -= learning_rate * dL_dW1\n",
    "    b1 -= learning_rate * dL_db1\n",
    "\n",
    "    if epoch % log_interval == 0 or epoch == epochs - 1:\n",
    "        print(f\"Epoch {epoch:5d}  Loss: {loss:.6f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "onel9r0kjk",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Predict on a fine grid and convert back to physical units\n",
    "T_fine_norm = np.linspace(0, 1, 200).reshape(-1, 1)\n",
    "A1_fine = np.tanh(T_fine_norm @ W1 + b1)\n",
    "Cp_nn_norm = A1_fine @ W2 + b2\n",
    "Cp_nn = Cp_nn_norm * (Cp_max - Cp_min) + Cp_min\n",
    "T_fine_K = T_fine_norm * (T_max - T_min) + T_min\n",
    "\n",
    "# MSE in original units for comparison with polynomial\n",
    "Cp_nn_at_data = np.tanh(X @ W1 + b1) @ W2 + b2\n",
    "Cp_nn_at_data = Cp_nn_at_data * (Cp_max - Cp_min) + Cp_min\n",
    "mse_nn = np.mean((Cp_nn_at_data.flatten() - Cp_raw) ** 2)\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n",
    "\n",
    "ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
    "ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params, MSE={mse_poly:.2e})')\n",
    "ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r-', linewidth=2, label=f'NN numpy (31 params, MSE={mse_nn:.2e})')\n",
    "ax1.set_xlabel('Temperature (K)')\n",
    "ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n",
    "ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n",
    "ax1.legend()\n",
    "\n",
    "ax2.semilogy(losses_np)\n",
    "ax2.set_xlabel('Epoch')\n",
    "ax2.set_ylabel('MSE (normalized)')\n",
    "ax2.set_title('Training loss — numpy NN')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea9z35qm9u8",
   "metadata": {},
   "source": [
    "## 4. Neural network in PyTorch\n",
    "\n",
    "The same network, but PyTorch handles backpropagation automatically. Compare the training loop above to the one below — `loss.backward()` replaces all of our manual gradient calculations.\n",
    "\n",
    "This is the same API used in nanoGPT's `model.py` — `nn.Linear`, activation functions, `optimizer.step()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3qxnrtyxqgz",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "# Prepare data as PyTorch tensors\n",
    "X_t = torch.tensor((T_raw - T_min) / (T_max - T_min), dtype=torch.float32).reshape(-1, 1)\n",
    "Y_t = torch.tensor((Cp_raw - Cp_min) / (Cp_max - Cp_min), dtype=torch.float32).reshape(-1, 1)\n",
    "\n",
    "# Define the network\n",
    "model = nn.Sequential(\n",
    "    nn.Linear(1, H),  # input -> hidden (W1, b1)\n",
    "    nn.Tanh(),        # activation\n",
    "    nn.Linear(H, 1),  # hidden -> output (W2, b2)\n",
    ")\n",
    "\n",
    "print(model)\n",
    "print(f\"Total parameters: {sum(p.numel() for p in model.parameters())}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ydl3ycnypps",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Train\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n",
    "loss_fn = nn.MSELoss()\n",
    "losses_torch = []\n",
    "\n",
    "for epoch in range(epochs):\n",
    "    Y_pred_t = model(X_t)\n",
    "    loss = loss_fn(Y_pred_t, Y_t)\n",
    "    losses_torch.append(loss.item())\n",
    "\n",
    "    optimizer.zero_grad()  # reset gradients\n",
    "    loss.backward()        # automatic differentiation\n",
    "    optimizer.step()       # update weights\n",
    "\n",
    "    if epoch % log_interval == 0 or epoch == epochs - 1:\n",
    "        print(f\"Epoch {epoch:5d}  Loss: {loss.item():.6f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bg0kvnk4ho",
   "metadata": {},
   "source": [
    "## 5. Compare all three approaches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "h2dfstoh8gd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# PyTorch predictions\n",
    "T_fine_t = torch.linspace(0, 1, 200).reshape(-1, 1)\n",
    "with torch.no_grad():\n",
    "    Cp_torch_norm = model(T_fine_t)\n",
    "Cp_torch = Cp_torch_norm.numpy() * (Cp_max - Cp_min) + Cp_min\n",
    "\n",
    "# MSE for PyTorch model\n",
    "with torch.no_grad():\n",
    "    Cp_torch_at_data = model(X_t).numpy() * (Cp_max - Cp_min) + Cp_min\n",
    "mse_torch = np.mean((Cp_torch_at_data.flatten() - Cp_raw) ** 2)\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))\n",
    "\n",
    "# Left: all three fits\n",
    "ax1.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
    "ax1.plot(T_fine, Cp_poly, 'b-', linewidth=2, label=f'Polynomial (4 params)')\n",
    "ax1.plot(T_fine_K.flatten(), Cp_nn.flatten(), 'r--', linewidth=2, label=f'NN numpy (31 params)')\n",
    "ax1.plot(T_fine_K.flatten(), Cp_torch.flatten(), 'g-', linewidth=2, alpha=0.8, label=f'NN PyTorch (31 params)')\n",
    "ax1.set_xlabel('Temperature (K)')\n",
    "ax1.set_ylabel('$C_p$ (kJ/kg/K)')\n",
    "ax1.set_title('$C_p(T)$ for N$_2$ at 1 bar')\n",
    "ax1.legend()\n",
    "\n",
    "# Right: training loss comparison\n",
    "ax2.semilogy(losses_np, label='numpy (gradient descent)')\n",
    "ax2.semilogy(losses_torch, label='PyTorch (Adam)')\n",
    "ax2.set_xlabel('Epoch')\n",
    "ax2.set_ylabel('MSE (normalized)')\n",
    "ax2.set_title('Training loss comparison')\n",
    "ax2.legend()\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(f\"MSE — Polynomial: {mse_poly:.2e} | NN numpy: {mse_nn:.2e} | NN PyTorch: {mse_torch:.2e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "xyw3sr20brn",
   "metadata": {},
   "source": [
    "## 6. Extrapolation\n",
    "\n",
    "How do the models behave *outside* the training range? This is a key test — and where the differences become stark."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fi3iq2sjh6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extrapolate beyond the training range\n",
    "T_extrap = np.linspace(100, 2500, 300)\n",
    "T_extrap_norm = ((T_extrap - T_min) / (T_max - T_min)).reshape(-1, 1)\n",
    "\n",
    "# Polynomial extrapolation\n",
    "Cp_poly_extrap = poly(T_extrap)\n",
    "\n",
    "# Numpy NN extrapolation\n",
    "A1_extrap = np.tanh(T_extrap_norm @ W1 + b1)\n",
    "Cp_nn_extrap = (A1_extrap @ W2 + b2) * (Cp_max - Cp_min) + Cp_min\n",
    "\n",
    "# PyTorch NN extrapolation\n",
    "with torch.no_grad():\n",
    "    Cp_torch_extrap = model(torch.tensor(T_extrap_norm, dtype=torch.float32)).numpy()\n",
    "Cp_torch_extrap = Cp_torch_extrap * (Cp_max - Cp_min) + Cp_min\n",
    "\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(T_raw, Cp_raw, 'ko', markersize=6, label='NIST data')\n",
    "plt.plot(T_extrap, Cp_poly_extrap, 'b-', linewidth=2, label='Polynomial')\n",
    "plt.plot(T_extrap, Cp_nn_extrap.flatten(), 'r--', linewidth=2, label='NN numpy')\n",
    "plt.plot(T_extrap, Cp_torch_extrap.flatten(), 'g-', linewidth=2, alpha=0.8, label='NN PyTorch')\n",
    "plt.axvline(T_raw.min(), color='gray', linestyle=':', alpha=0.5, label='Training range')\n",
    "plt.axvline(T_raw.max(), color='gray', linestyle=':', alpha=0.5)\n",
    "plt.xlabel('Temperature (K)')\n",
    "plt.ylabel('$C_p$ (kJ/kg/K)')\n",
    "plt.title('Extrapolation beyond training data')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "yb2s18keiw",
   "metadata": {},
   "source": [
    "## 7. Exercises\n",
    "\n",
    "Try these in new cells below:\n",
    "\n",
    "1. **Change the number of hidden neurons** (`H`). Try 2, 5, 20, 50. How does the fit change? At what point does adding neurons stop helping?\n",
    "\n",
    "2. **Activation functions**: In the PyTorch model, replace `nn.Tanh()` with `nn.ReLU()` or `nn.Sigmoid()`. How does the fit change?\n",
    "\n",
    "3. **Optimizer comparison**: Replace `Adam` with `torch.optim.SGD(model.parameters(), lr=0.01)`. How does training speed compare?\n",
    "\n",
    "4. **Remove normalization**: Use `T_raw` and `Cp_raw` directly (no scaling to [0,1]). What happens? Can you fix it by adjusting the learning rate?\n",
    "\n",
    "5. **Overfitting**: Set `H = 100` and train for 20,000 epochs. Does it fit the training data well? Look at the extrapolation — is it reasonable?\n",
    "\n",
    "6. **Higher-order polynomial**: Try `np.polyfit(T_raw, Cp_raw, 10)`. How does it compare to the cubic? How does it extrapolate?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
vocab.md (16 lines changed)
@@ -47,6 +47,10 @@ Key terms organized by the section where they are first introduced.
| **System prompt** | Instructions that shape the model's behavior, role, or constraints. Set in a Modelfile or at runtime. |
| **Modelfile** | A configuration file for Ollama that defines a custom model: base model, parameters, and system prompt. |
| **API** | Application Programming Interface. A defined way for programs to communicate. Ollama provides an API for sending prompts and receiving responses. |
+| **Embedding length** | The dimensionality of a model's internal vector representation of each token. Same idea as `n_embd` in nanoGPT. Larger embedding length captures more meaning at the cost of memory. |
+| **Repeat penalty** | A parameter that discourages the model from repeating tokens it has recently produced. Helps avoid loops. |
+| **Min-p sampling** | A sampling strategy that keeps tokens whose probability is at least `min_p` times the top token's probability. |
+| **Hallucination** | When a model produces confident-looking output that is factually wrong. The base model is doing what it always does (predicting plausible tokens); grounding via retrieval or tool use reduces it. |

## Section 03: RAG

@@ -58,10 +62,15 @@ Key terms organized by the section where they are first introduced.
| **Vector store** | An indexed collection of embedded chunks, searchable by vector similarity. |
| **Cosine similarity** | A measure of similarity between two vectors based on the angle between them. Used to find the most relevant chunks for a query. |
| **Semantic search** | Search based on meaning rather than exact keyword matching, enabled by embeddings. |
-| **LlamaIndex** | A Python framework for building RAG systems: chunking, embedding, indexing, and querying. |
+| **LlamaIndex** | A Python framework for building RAG systems: chunking, embedding, indexing, and querying. Split since v0.10 into `llama-index-core` plus integration packages. |
+| **Settings** | LlamaIndex's global configuration object. Setting `Settings.llm` and `Settings.embed_model` once configures all downstream components. Replaced the deprecated `ServiceContext`. |
| **Node** | In LlamaIndex, a parsed text segment ready for embedding and indexing. |
| **Context** | The retrieved chunks passed to the LLM as background information for answering a query. |
| **Generator** | The LLM component in a RAG system that reads retrieved context and composes a response. |
+| **Embedding model** | A model whose job is to convert text to vectors. Different from the generator (LLM). We use `BAAI/bge-large-en-v1.5`. |
+| **Hugging Face Hub** | A registry of open-source models (embeddings, LLMs, cross-encoders). Models download automatically on first use. |
+| **`sentence-transformers`** | A Python library that loads and runs sentence/embedding models from Hugging Face. Used under the hood by LlamaIndex's `HuggingFaceEmbedding`. |
+| **`HF_HUB_OFFLINE`** | An environment variable that tells Hugging Face libraries not to check the Hub for updates. Set it (along with `TOKENIZERS_PARALLELISM` and `SENTENCE_TRANSFORMERS_HOME`) *before* importing LlamaIndex, because the libraries read the environment at import time. |
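As a sketch, the import-order rule from the `HF_HUB_OFFLINE` entry looks like this in practice (values illustrative):

```python
# Set the environment before any Hugging Face-backed import; these libraries
# read the variables at import time. (Values here are illustrative.)
import os
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "./models"

from llama_index.embeddings.huggingface import HuggingFaceEmbedding  # safe to import now
```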

## Section 04: Semantic Search

@@ -72,8 +81,10 @@ Key terms organized by the section where they are first introduced.
| **Sparse retrieval** | Keyword-based search (like BM25). Good at finding exact names, dates, and technical terms. |
| **BM25** | "Best Matching 25." A classical algorithm that scores documents by term frequency, adjusted for document length. |
| **Cross-encoder** | A model that reads query and document together to produce a relevance score. More accurate than embeddings alone, but slower. |
+| **Bi-encoder** | A model that encodes query and document separately into vectors, then compares them. Embedding models are bi-encoders. Fast at scale; less accurate per pair than a cross-encoder. |
| **Re-ranking** | A second pass that scores a candidate pool more carefully (typically with a cross-encoder) to improve retrieval quality. |
| **Candidate pool** | The initial set of retrieved chunks before re-ranking narrows them down. |
+| **MTEB** | Massive Text Embedding Benchmark. A public leaderboard at https://huggingface.co/spaces/mteb/leaderboard for comparing embedding and re-ranking models. Useful for finding current state-of-the-art. |

## Section 05: Tool Use and Agentic Systems

@@ -85,6 +96,9 @@ Key terms organized by the section where they are first introduced.
| **Memory** | Stored conversation history re-injected into prompts to maintain context across turns. The LLM itself is stateless; memory is managed by the system. |
| **Type hints** | Python annotations specifying parameter and return types. Used by tool-calling systems to understand function signatures. |
| **Docstring** | Documentation inside a Python function describing what it does. Tool-calling systems use docstrings to explain tools to the LLM. |
+| **LLM-as-interface** | The framing that an LLM in a modern agentic system is the natural-language interface to tools and data, not the engine that produces final answers. The LLM interprets requests and orchestrates; the tools do the work. |
+| **Reasoning layer** | The LLM's role in interpreting ambiguous requests, deciding which tool to use, handling unexpected results, and explaining outcomes. Reasoning here is *in language*, not in mathematics. |
+| **ReAct** | "Reasoning + Acting." A pattern where the LLM alternates between reasoning steps (in natural language) and tool actions, observing each result before deciding the next step. The default agent type for local models in LlamaIndex. |

## Section 06: Neural Networks