README updates, textbook polynomial cell, self-contained notebook

Same set of changes as che-computing-dev/LLMs:

- 03/04/05 READMEs: uv add workflow, required model caching
- 05-tool-use: add Setup section, requirements.txt
- 06-neural-networks: textbook cubic polynomial comparison cell
- 06-neural-networks: add nn_workshop_colab.ipynb (self-contained, inline data)
- vocab.md: catch up with terms from 02-05

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in: parent a1f9d4d5ed, commit f7d2b48f5a

7 changed files with 534 additions and 23 deletions

vocab.md (16)
@@ -47,6 +47,10 @@ Key terms organized by the section where they are first introduced.

| **System prompt** | Instructions that shape the model's behavior, role, or constraints. Set in a Modelfile or at runtime. |
| **Modelfile** | A configuration file for Ollama that defines a custom model: base model, parameters, and system prompt. |
| **API** | Application Programming Interface. A defined way for programs to communicate. Ollama provides an API for sending prompts and receiving responses. |
| **Embedding length** | The dimensionality of a model's internal vector representation of each token. Same idea as `n_embd` in nanoGPT. Larger embedding length captures more meaning at the cost of memory. |
| **Repeat penalty** | A parameter that discourages the model from repeating tokens it has recently produced. Helps avoid loops. |
| **Min-p sampling** | A sampling strategy that keeps tokens whose probability is at least `min_p` times the top token's probability. |
| **Hallucination** | When a model produces confident-looking output that is factually wrong. The base model is doing what it always does (predicting plausible tokens); grounding via retrieval or tool use reduces it. |

## Section 03: RAG
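The min-p definition above is easy to state in code. The sketch below is illustrative only (not Ollama's actual sampler); the token strings and probabilities are made-up inputs for demonstration.

```python
def min_p_filter(probs: dict[str, float], min_p: float) -> dict[str, float]:
    """Keep tokens whose probability is at least min_p times the top
    token's probability, then renormalize the survivors."""
    top = max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= min_p * top}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# With min_p=0.2 and a top probability of 0.5, the cutoff is 0.1:
# "zebra" (0.05) and "qua" (0.01) are discarded, the rest renormalized.
probs = {"the": 0.5, "a": 0.3, "zebra": 0.05, "qua": 0.01}
filtered = min_p_filter(probs, min_p=0.2)
```

Unlike a fixed top-k, the cutoff adapts to the shape of the distribution: when the model is confident the pool shrinks, and when it is uncertain more tokens survive.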
@@ -58,10 +62,15 @@ Key terms organized by the section where they are first introduced.

| **Vector store** | An indexed collection of embedded chunks, searchable by vector similarity. |
| **Cosine similarity** | A measure of similarity between two vectors based on the angle between them. Used to find the most relevant chunks for a query. |
| **Semantic search** | Search based on meaning rather than exact keyword matching, enabled by embeddings. |
| **LlamaIndex** | A Python framework for building RAG systems: chunking, embedding, indexing, and querying. Split since v0.10 into `llama-index-core` plus integration packages. |
| **Settings** | LlamaIndex's global configuration object. Setting `Settings.llm` and `Settings.embed_model` once configures all downstream components. Replaced the deprecated `ServiceContext`. |
| **Node** | In LlamaIndex, a parsed text segment ready for embedding and indexing. |
| **Context** | The retrieved chunks passed to the LLM as background information for answering a query. |
| **Generator** | The LLM component in a RAG system that reads retrieved context and composes a response. |
| **Embedding model** | A model whose job is to convert text to vectors. Different from the generator (LLM). We use `BAAI/bge-large-en-v1.5`. |
| **Hugging Face Hub** | A registry of open-source models (embeddings, LLMs, cross-encoders). Models download automatically on first use. |
| **`sentence-transformers`** | A Python library that loads and runs sentence/embedding models from Hugging Face. Used under the hood by LlamaIndex's `HuggingFaceEmbedding`. |
| **`HF_HUB_OFFLINE`** | An environment variable that tells Hugging Face libraries not to check the Hub for updates. Set it (along with `TOKENIZERS_PARALLELISM` and `SENTENCE_TRANSFORMERS_HOME`) *before* importing LlamaIndex, because the libraries read the environment at import time. |

## Section 04: Semantic Search
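Cosine similarity, as defined above, is the dot product of two vectors divided by the product of their norms. A minimal pure-Python sketch (real vector stores use optimized linear algebra, but the math is the same):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Parallel vectors score 1.0; orthogonal vectors score 0.0.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because only the angle matters, two chunks about the same topic score high even if one is much longer than the other, which is why it is the default metric for comparing embeddings.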
@@ -72,8 +81,10 @@ Key terms organized by the section where they are first introduced.

| **Sparse retrieval** | Keyword-based search (like BM25). Good at finding exact names, dates, and technical terms. |
| **BM25** | "Best Matching 25." A classical algorithm that scores documents by term frequency, adjusted for document length. |
| **Cross-encoder** | A model that reads query and document together to produce a relevance score. More accurate than embeddings alone, but slower. |
| **Bi-encoder** | A model that encodes query and document separately into vectors, then compares them. Embedding models are bi-encoders. Fast at scale; less accurate per pair than a cross-encoder. |
| **Re-ranking** | A second pass that scores a candidate pool more carefully (typically with a cross-encoder) to improve retrieval quality. |
| **Candidate pool** | The initial set of retrieved chunks before re-ranking narrows them down. |
| **MTEB** | Massive Text Embedding Benchmark. A public leaderboard at https://huggingface.co/spaces/mteb/leaderboard for comparing embedding and re-ranking models. Useful for finding current state-of-the-art. |

## Section 05: Tool Use and Agentic Systems
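The BM25 entry above can be made concrete with a toy scorer. This is an illustrative sketch, not a production implementation (real systems use tuned libraries such as `rank-bm25` or a search engine); the parameter values `k1=1.5`, `b=0.75` are conventional defaults, and the three-document corpus is invented for the example.

```python
import math


def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against query terms with BM25."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs  # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        tf = doc.count(term)  # term frequency in this document
        # Denominator normalizes by document length; b controls how strongly.
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score


corpus = [
    ["distillation", "column", "design"],
    ["heat", "exchanger", "design"],
    ["reactor", "kinetics"],
]
scores = [bm25_score(["distillation"], d, corpus) for d in corpus]
```

Only the document that actually contains "distillation" gets a nonzero score, which is exactly the exact-term strength (and vocabulary-mismatch weakness) that motivates pairing sparse retrieval with embeddings.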
@@ -85,6 +96,9 @@ Key terms organized by the section where they are first introduced.

| **Memory** | Stored conversation history re-injected into prompts to maintain context across turns. The LLM itself is stateless; memory is managed by the system. |
| **Type hints** | Python annotations specifying parameter and return types. Used by tool-calling systems to understand function signatures. |
| **Docstring** | Documentation inside a Python function describing what it does. Tool-calling systems use docstrings to explain tools to the LLM. |
| **LLM-as-interface** | The framing that an LLM in a modern agentic system is the natural-language interface to tools and data, not the engine that produces final answers. The LLM interprets requests and orchestrates; the tools do the work. |
| **Reasoning layer** | The LLM's role in interpreting ambiguous requests, deciding which tool to use, handling unexpected results, and explaining outcomes. Reasoning here is *in language*, not in mathematics. |
| **ReAct** | "Reasoning + Acting." A pattern where the LLM alternates between reasoning steps (in natural language) and tool actions, observing each result before deciding the next step. The default agent type for local models in LlamaIndex. |

## Section 06: Neural Networks
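The "Type hints" and "Docstring" entries above explain *why* tool-calling systems need them; the sketch below shows *how* a system can read both with the standard library alone. `describe_tool` is a hypothetical helper (LlamaIndex's `FunctionTool` does considerably more), and `antoine_pressure` is an invented example function.

```python
import inspect
from typing import get_type_hints


def describe_tool(fn) -> dict:
    """Build a simple tool description from a function's type hints
    and docstring, as a tool-calling system might."""
    hints = get_type_hints(fn)
    params = {
        name: hints.get(name, object).__name__
        for name in inspect.signature(fn).parameters
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": params,
    }


def antoine_pressure(a: float, b: float, c: float, t: float) -> float:
    """Vapor pressure from the Antoine equation: log10(P) = A - B / (C + T)."""
    return 10 ** (a - b / (c + t))


schema = describe_tool(antoine_pressure)
```

Nothing here is LLM-specific: the signature supplies the parameter names and types, and the docstring supplies the natural-language description the model uses to decide when to call the tool. This is why untyped, undocumented functions make poor tools.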