ssearch/NOTES.md
Eric e9fc99ddc6 Initial commit: RAG pipeline for semantic search over personal journal archive
Vector search with cross-encoder re-ranking, hybrid BM25+vector retrieval,
incremental index updates, and multiple LLM backends (Ollama local, OpenAI API).
2026-02-20 06:02:28 -05:00

A simple query in ChatGPT produced the following comparison of similarity metrics:

| Metric | Best For | Type | Notes |
|---|---|---|---|
| Cosine Similarity | L2-normalized vectors | Similarity | Scale-invariant |
| Dot Product | Transformer embeddings | Similarity | Fast, especially on GPUs |
| Euclidean Distance | Raw vectors with meaningful norms | Distance | Sensitive to scale |
| Jaccard | Sparse binary or set-based data | Similarity | Discrete features |
| Soft Cosine | Sparse with semantic overlap | Similarity | Better for text-term overlap |
| Learned Similarity | Fine-tuned deep models | Varies | Best accuracy, slowest retrieval |
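
The first four metrics in the table can be sketched in a few lines of plain Python. This is only an illustration and not code from this repository: the function names and sample vectors are made up for the example, and soft cosine and learned similarity are left out because they need a term-similarity matrix or a trained model. It also shows why the choice matters less once embeddings are L2-normalized: for unit vectors, cosine similarity equals the dot product, and squared Euclidean distance is `2 - 2 * cosine`, so all three rank results the same way.

```python
import math

# Hypothetical helper names, chosen for this sketch only.

def dot(a, b):
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jaccard_similarity(a, b):
    """Overlap of two sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def l2_normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

# Made-up sample vectors: b points the same way as a but is twice as long.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

print(cosine_similarity(a, b))   # direction only, so very close to 1
print(dot(a, b))                 # affected by length
print(euclidean_distance(a, b))  # non-zero even though directions match

# After L2 normalization the three vector metrics agree on ranking.
u = l2_normalize([1.0, 0.0])
v = l2_normalize([1.0, 1.0])
cos_uv = cosine_similarity(u, v)
print(math.isclose(cos_uv, dot(u, v)))
print(math.isclose(euclidean_distance(u, v) ** 2, 2 - 2 * cos_uv))

# Jaccard works on sets of discrete features, such as tokens.
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
```

In practice this means that if the embedding model already emits unit-length vectors, or the index normalizes them on insert, the cheaper dot product can stand in for cosine similarity without changing the result order.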