Sync RAG and semantic-search updates from che-computing

- 03-rag, 04-semantic-search: env-var-before-imports fix in build/query scripts
- 03-rag: new libraries section, fetch_arxiv.py, exercises for larger corpus
  and finding current SOTA models, formal references (Lewis, Booth)
- 04-semantic-search: libraries pointer back to Part III, larger corpus
  subsection, model-update exercise, formal references
- 06-neural-networks: add Nielsen reference (recommended by student)
- README: vocab.md link, agentic systems in description, Ollama prereq for 02-05
- New: vocab.md (glossary organized by section)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eric Furst 2026-04-28 12:05:08 -04:00
commit 59e5f86884
9 changed files with 359 additions and 17 deletions
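The env-var-before-imports fix depends on how Python imports work. A module's body runs once, on its first import. Later imports return the cached module from `sys.modules`, so a constant computed from an environment variable at import time never sees a value set afterwards. The minimal sketch below shows this with a throwaway module. `DEMO_HUB_OFFLINE` and `demo_constants` are hypothetical stand-ins for `HF_HUB_OFFLINE` and `huggingface_hub.constants`. This is not the real library.

```python
import importlib
import os
import sys
import tempfile

ENV_VAR = "DEMO_HUB_OFFLINE"   # hypothetical stand-in for HF_HUB_OFFLINE
MODULE = "demo_constants"       # hypothetical stand-in for huggingface_hub.constants

# Like a constants module, this reads the variable once, when the module body runs.
MODULE_SRC = f'import os\nOFFLINE = os.environ.get("{ENV_VAR}", "0") == "1"\n'


def install_demo_module():
    """Write the demo module to a temp dir and put that dir on sys.path."""
    d = tempfile.mkdtemp()
    with open(os.path.join(d, MODULE + ".py"), "w") as fh:
        fh.write(MODULE_SRC)
    sys.path.insert(0, d)
    importlib.invalidate_caches()


def import_then_set():
    """Old order: import first, then set the variable."""
    os.environ.pop(ENV_VAR, None)
    sys.modules.pop(MODULE, None)          # force a genuinely first import
    mod = importlib.import_module(MODULE)  # body runs now; variable is unset
    os.environ[ENV_VAR] = "1"              # too late
    mod = importlib.import_module(MODULE)  # cached module; body does not rerun
    return mod.OFFLINE


def set_then_import():
    """Fixed order: set the variable, then import."""
    os.environ.pop(ENV_VAR, None)
    sys.modules.pop(MODULE, None)
    os.environ[ENV_VAR] = "1"              # set before the first import
    mod = importlib.import_module(MODULE)  # body runs now and sees "1"
    return mod.OFFLINE


install_demo_module()
print("import then set:", import_then_set())  # import then set: False
print("set then import:", set_then_import())  # set then import: True
```

In the scripts this means the `os.environ[...]` assignments must come before any `llama_index` or Hugging Face import, because those libraries pull in `huggingface_hub.constants` indirectly.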


@@ -5,6 +5,14 @@
# August 2025
# E. M. Furst
# Environment vars must be set before importing huggingface/transformers
# libraries, because huggingface_hub.constants evaluates HF_HUB_OFFLINE
# at import time.
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "./models"
os.environ["HF_HUB_OFFLINE"] = "1"
from llama_index.core import (
load_index_from_storage,
StorageContext,
@@ -13,12 +21,7 @@ from llama_index.core import (
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
import os, time
#
# Globals
#
os.environ["TOKENIZERS_PARALLELISM"] = "false"
import time
# Embedding model used in vector store (this should match the one in build.py)
embed_model = HuggingFaceEmbedding(cache_folder="./models",