RAG pipeline for semantic search over personal archives

Journal and clippings search with LlamaIndex, HuggingFace embeddings,
cross-encoder re-ranking, and local LLM inference via Ollama. Clippings
index uses ChromaDB for persistent vector storage.
Eric Furst 2026-02-22 12:46:29 -05:00
commit 90449f108e
12 changed files with 2031 additions and 0 deletions

.gitignore

@@ -0,0 +1,37 @@
# Python
.venv/
__pycache__/
*.pyc
# HuggingFace cached models (large, ~2 GB)
models/
# Vector stores (large, rebuild with build scripts)
storage_exp/
storage/
storage_clippings/
# Data (symlinks to private files)
data
clippings
# Generated file lists
ocr_needed.txt
# IDE and OS
.DS_Store
.vscode/
.idea/
# Jupyter checkpoints
.ipynb_checkpoints/
# Secrets
.env
API_key_temp
# Query log
query.log
# Duplicate of CLAUDE.md
claude.md