Initial commit: RAG pipeline for semantic search over personal journal archive

Vector search with cross-encoder re-ranking, hybrid BM25+vector retrieval, incremental index updates, and multiple LLM backends (Ollama local, OpenAI API).
commit e9fc99ddc6 · 2026-02-20 06:02:28 -05:00
43 changed files with 7349 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,31 @@
+# Python
+.venv/
+__pycache__/
+*.pyc
+
+# HuggingFace cached models (large, ~2 GB)
+models/
+
+# Vector stores (large, rebuild with build_exp_claude.py)
+storage_exp/
+storage/
+
+# Data (symlink to private journal files)
+data
+
+# IDE and OS
+.DS_Store
+.vscode/
+.idea/
+
+# Jupyter checkpoints
+.ipynb_checkpoints/
+
+# Secrets
+.env
+
+# Query log
+query.log
+
+# Duplicate of CLAUDE.md
+claude.md