RAG pipeline for semantic search over personal archives

Journal and clippings search with LlamaIndex, HuggingFace embeddings, cross-encoder re-ranking, and local LLM inference via Ollama. Clippings index uses ChromaDB for persistent vector storage.
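
A minimal sketch of the retrieve-then-re-rank flow named above, not the code in this commit. It assumes toy stand-ins throughout: `embed` is a bag-of-words counter rather than a HuggingFace embedding model, `rerank` is a bigram-overlap scorer rather than a cross-encoder, and the document list is an in-memory Python list rather than a ChromaDB collection. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (hypothetical stand-in for an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, top_k=3):
    """Stage 1: cheap similarity search over every document."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]


def rerank(query, candidates, top_n=1):
    """Stage 2: score query and document jointly, on the shortlist only.

    Toy scorer (stand-in for a cross-encoder): counts query bigrams that
    appear in the document in the same order.
    """
    q_tokens = query.lower().split()
    q_bigrams = list(zip(q_tokens, q_tokens[1:]))

    def score(doc):
        d_tokens = doc.lower().split()
        d_bigrams = set(zip(d_tokens, d_tokens[1:]))
        return sum(bg in d_bigrams for bg in q_bigrams)

    return sorted(candidates, key=score, reverse=True)[:top_n]


docs = [
    "notes on garden soil and compost",
    "compost garden notes from spring",
    "recipe for sourdough bread",
]
shortlist = retrieve("garden soil", docs, top_k=2)
best = rerank("garden soil", shortlist, top_n=1)
```

The design point is the split: the first stage scores every document independently and cheaply, and the second stage spends a more expensive joint comparison only on the shortlist, whose top results would then be passed to the LLM as context.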
commit 90449f108e · 2026-02-22 12:46:29 -05:00
12 changed files with 2031 additions and 0 deletions
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,37 @@
+# Python
+.venv/
+__pycache__/
+*.pyc
+
+# HuggingFace cached models (large, ~2 GB)
+models/
+
+# Vector stores (large, rebuild with build scripts)
+storage_exp/
+storage/
+storage_clippings/
+
+# Data (symlinks to private files)
+data
+clippings
+
+# Generated file lists
+ocr_needed.txt
+
+# IDE and OS
+.DS_Store
+.vscode/
+.idea/
+
+# Jupyter checkpoints
+.ipynb_checkpoints/
+
+# Secrets
+.env
+API_key_temp
+
+# Query log
+query.log
+
+# Duplicate of CLAUDE.md
+claude.md