Initial commit: RAG pipeline for semantic search over personal journal archive
Vector search with cross-encoder re-ranking, hybrid BM25+vector retrieval, incremental index updates, and multiple LLM backends (Ollama local, OpenAI API).
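The commit message does not say how the BM25 and vector result lists are combined. As a minimal sketch of one common option, reciprocal rank fusion (an assumption here, not necessarily what this pipeline does), the snippet below merges two ranked lists by summing `1 / (k + rank)` per document. The function name, the `k=60` constant, and the file names are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank is 1-based); documents found by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retriever outputs, best match first.
bm25_hits = ["2021-03-04.txt", "2019-11-20.txt", "2020-07-01.txt"]
vector_hits = ["2019-11-20.txt", "2022-01-15.txt", "2021-03-04.txt"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

In a pipeline like the one described, the fused list would plausibly be what gets passed to the cross-encoder for re-ranking, though the commit message does not confirm that ordering.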
commit e9fc99ddc6
43 changed files with 7349 additions and 0 deletions
31 .gitignore vendored Normal file
@@ -0,0 +1,31 @@
+# Python
+.venv/
+__pycache__/
+*.pyc
+
+# HuggingFace cached models (large, ~2 GB)
+models/
+
+# Vector stores (large, rebuild with build_exp_claude.py)
+storage_exp/
+storage/
+
+# Data (symlink to private journal files)
+data
+
+# IDE and OS
+.DS_Store
+.vscode/
+.idea/
+
+# Jupyter checkpoints
+.ipynb_checkpoints/
+
+# Secrets
+.env
+
+# Query log
+query.log
+
+# Duplicate of CLAUDE.md
+claude.md