ssearch/NOTES.md
Eric e9fc99ddc6 Initial commit: RAG pipeline for semantic search over personal journal archive
Vector search with cross-encoder re-ranking, hybrid BM25+vector retrieval,
incremental index updates, and multiple LLM backends (Ollama local, OpenAI API).
2026-02-20 06:02:28 -05:00


A simple query in ChatGPT produced the following comparison of similarity metrics:

Metric | Best For | Type | Notes
-- | -- | -- | --
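Cosine Similarity | L2-normalized vectors | Similarity | Scale-invariant; equals the dot product on unit-length vectors
Dot Product | Transformer embeddings | Similarity | Fast, especially on GPUs; sensitive to vector norms unless inputs are normalized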
Euclidean Distance | Raw vectors with meaningful norms | Distance | Sensitive to scale
Jaccard | Sparse binary or set-based data | Similarity | Discrete features
Soft Cosine | Sparse vectors with semantically related terms | Similarity | Gives credit to related but non-identical terms
Learned Similarity | Fine-tuned deep models | Varies | Best accuracy, slowest retrieval
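
To make the first four rows of the table concrete, here is a minimal sketch in plain Python (standard library only). The function names and the toy vectors are illustrative assumptions, not taken from the ssearch code. It shows that cosine similarity ignores vector scale while the dot product does not, and that the two agree once vectors are L2-normalized.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(dot(a, a))


def l2_normalize(a):
    """Scale a non-zero vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; unaffected by vector length."""
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    """Straight-line distance between a and b; affected by vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_similarity(s, t):
    """Size of the intersection over size of the union of two sets."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(s & t) / len(s | t)


# Hypothetical toy vectors, chosen only so the arithmetic is easy to follow.
a = [3.0, 4.0]
b = [4.0, 3.0]
scaled_b = [10 * x for x in b]  # same direction as b, ten times longer

cos_ab = cosine_similarity(a, b)
cos_a_scaled = cosine_similarity(a, scaled_b)  # same as cos_ab up to rounding
dot_ab = dot(a, b)
dot_a_scaled = dot(a, scaled_b)                # ten times dot_ab

# After L2 normalization the dot product matches the cosine similarity.
ua, ub = l2_normalize(a), l2_normalize(b)
dot_unit = dot(ua, ub)
dist_unit = euclidean_distance(ua, ub)

jac = jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"})
```

On unit vectors the squared Euclidean distance equals `2 - 2 * cosine`, so all three dense metrics produce the same nearest-neighbor ranking once embeddings are normalized. In that case the choice between them mostly comes down to speed and what the vector index supports.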