Initial commit: RAG pipeline for semantic search over personal journal archive

Vector search with cross-encoder re-ranking, hybrid BM25+vector retrieval,
incremental index updates, and multiple LLM backends (Ollama local, OpenAI API).
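
Sketch of the hybrid BM25+vector step named above. The commit message does not
say how the two result lists are merged, so this assumes reciprocal rank fusion
(RRF), which may not be what the pipeline actually does. The function name, the
constant k=60, and the entry ids are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    k=60 is a commonly used default, assumed here rather than taken from the repo.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers, best match first.
bm25_hits = ["entry_12", "entry_07", "entry_33"]
vector_hits = ["entry_07", "entry_41", "entry_12"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Entries returned by both retrievers rise to the top. In a pipeline like the one
described, a cross-encoder would then re-score only the first few fused results.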
Eric 2026-02-20 06:02:28 -05:00
commit e9fc99ddc6
43 changed files with 7349 additions and 0 deletions

.gitignore (new file, 31 lines)

@@ -0,0 +1,31 @@
# Python
.venv/
__pycache__/
*.pyc
# HuggingFace cached models (large, ~2 GB)
models/
# Vector stores (large, rebuild with build_exp_claude.py)
storage_exp/
storage/
# Data (symlink to private journal files)
data
# IDE and OS
.DS_Store
.vscode/
.idea/
# Jupyter checkpoints
.ipynb_checkpoints/
# Secrets
.env
# Query log
query.log
# Duplicate of CLAUDE.md
claude.md