Updated to Gemma4 models.
This commit is contained in: parent 53404dd396 · commit 3347a242ef
4 changed files with 46 additions and 10 deletions
README.md (10 changes)
@@ -5,7 +5,7 @@ Semantic search over a personal journal archive and a collection of clippings. U
## How it works
```
-Query → Embed (BAAI/bge-large-en-v1.5) → Vector similarity (top-30) → Cross-encoder re-rank (top-15) → LLM synthesis (command-r7b via Ollama, or OpenAI API) → Response + sources
+Query → Embed (BAAI/bge-large-en-v1.5) → Vector similarity (top-30) → Cross-encoder re-rank (top-15) → LLM synthesis (Gemma4 via Ollama, or OpenAI API) → Response + sources
```
1. **Build**: Source files are chunked (256 tokens, 25-token overlap) and embedded into a vector store using LlamaIndex. The journal index uses LlamaIndex's JSON store; the clippings index uses ChromaDB. Both support incremental updates.
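The sliding-window arithmetic of the build step can be sketched in plain Python. This is illustrative only: the actual indexers use LlamaIndex's splitters with a real tokenizer, so the whitespace "tokenizer" and `chunk_tokens` helper here are stand-ins.

```python
# Illustrative sliding-window chunker: 256-token chunks, 25-token overlap.
# The real build scripts use LlamaIndex splitters; text.split() is a
# stand-in tokenizer to show the window arithmetic only.
def chunk_tokens(text: str, chunk_size: int = 256, overlap: int = 25) -> list[list[str]]:
    tokens = text.split()
    step = chunk_size - overlap  # each chunk advances 231 tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window reached the end of the document
    return chunks

chunks = chunk_tokens("word " * 500)
print(len(chunks), len(chunks[0]))  # → 3 256 (last chunk is the short tail)
```

Each chunk shares its first 25 tokens with the tail of the previous one, so a sentence straddling a chunk boundary still appears whole in at least one chunk.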
@@ -35,7 +35,7 @@ ssearch/
## Setup
-**Prerequisites**: Python 3.12, [Ollama](https://ollama.com) with `command-r7b` pulled.
+**Prerequisites**: Python 3.12, [Ollama](https://ollama.com) with `gemma4:e4b` or similar pulled.
```bash
cd ssearch
@@ -90,7 +90,7 @@ The default incremental mode loads the existing index, compares file sizes and m
#### Semantic search with LLM synthesis
-**Requires Ollama running with `command-r7b`.**
+**Requires Ollama running with `gemma4`.**
**Hybrid BM25 + vector** (`query_hybrid.py`): Retrieves top 20 by vector similarity and top 20 by BM25 term frequency, merges and deduplicates, re-ranks the union to top 15, synthesizes. Catches exact name/term matches that vector-only retrieval misses.
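The merge-and-deduplicate step of the hybrid retriever can be sketched as follows. The candidate ids are hypothetical; `query_hybrid.py` operates on real vector and BM25 retrievals, and this only shows the union logic that hands every unique chunk to the cross-encoder.

```python
# Sketch of the hybrid merge step: union the two ranked candidate lists
# (vector top-20 and BM25 top-20), deduplicating by chunk id while
# preserving first-seen order, so the cross-encoder re-ranks each unique
# candidate exactly once. Chunk ids here are illustrative placeholders.
def merge_candidates(vector_ids: list[str], bm25_ids: list[str]) -> list[str]:
    seen: set[str] = set()
    merged: list[str] = []
    for cid in vector_ids + bm25_ids:  # vector hits first, then BM25-only hits
        if cid not in seen:
            seen.add(cid)
            merged.append(cid)
    return merged

union = merge_candidates(["c1", "c2", "c3"], ["c2", "c4"])
print(union)  # → ['c1', 'c2', 'c3', 'c4']
```

Order within the union does not matter much, because the cross-encoder re-scores every candidate from scratch; the union just guarantees that a BM25-only exact-term hit survives to the re-ranking stage.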
```bash
@@ -150,7 +150,7 @@ Key parameters (set in source files):
| Initial retrieval | 30 chunks | query and retrieve scripts |
| Re-rank model | `cross-encoder/ms-marco-MiniLM-L-12-v2` | query and retrieve scripts |
| Re-rank top-n | 15 | query and retrieve scripts |
-| LLM | `command-r7b` (Ollama) or `gpt-4o-mini` (OpenAI API) | `query_hybrid.py` |
+| LLM | `gemma4:e4b` (Ollama) or `gpt-4o-mini` (OpenAI API) | `query_hybrid.py` |
| Temperature | 0.3 | `query_hybrid.py` |
| Context window | 8000 tokens | `query_hybrid.py` |
| Request timeout | 360 seconds | `query_hybrid.py` |
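The LLM rows of that table map onto an Ollama generate request roughly as below. This is a sketch, not the actual request construction in `query_hybrid.py` (which this diff does not show); the prompt placeholder is hypothetical, while `temperature` and `num_ctx` are real option names in Ollama's `/api/generate` endpoint.

```python
import json

# Sketch: how the table's parameters could appear in an Ollama
# /api/generate request body. The prompt content is a placeholder;
# query_hybrid.py's real request code is not part of this diff.
payload = {
    "model": "gemma4:e4b",              # LLM row
    "prompt": "<context + question>",   # placeholder
    "stream": False,
    "options": {
        "temperature": 0.3,             # Temperature row
        "num_ctx": 8000,                # Context window row
    },
}
body = json.dumps(payload)
# An HTTP client would POST this to http://localhost:11434/api/generate
# with a 360-second client-side timeout (Request timeout row).
```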
@@ -171,7 +171,7 @@ Key parameters (set in source files):
- **BAAI/bge-large-en-v1.5 over all-mpnet-base-v2**: Better semantic matching quality for journal text despite slower embedding.
- **256-token chunks**: Tested 512 and 384; 256 with 25-token overlap produced the highest quality matches.
-- **command-r7b over llama3.1:8B**: Sticks closer to provided context with less hallucination at comparable speed.
+- **gemma4:e4b over command-r7b**: Sticks closer to provided context with less hallucination at comparable speed. Earlier, selected **command-r7b over llama3.1:8B** for similar reasons.
- **Cross-encoder re-ranking**: Retrieve top-30 via bi-encoder, re-rank to top-15 with a cross-encoder that scores each (query, chunk) pair jointly. Tested three models; `ms-marco-MiniLM-L-12-v2` selected over `stsb-roberta-base` (wrong task) and `BAAI/bge-reranker-v2-m3` (slower, weak score tail).
- **HyDE query rewriting tested and dropped**: Did not improve results over direct prompt engineering.
- **Hybrid BM25 + vector retrieval**: BM25 nominates candidates with exact term matches that embeddings miss; the cross-encoder decides final relevance.
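The retrieve-then-re-rank pattern from these decisions can be sketched with toy data. Everything here is a stand-in: the real pipeline uses bge-large embeddings and sentence-transformers' `ms-marco-MiniLM-L-12-v2` cross-encoder, whereas `cross_scores` below is a hypothetical dict mocking the cross-encoder's joint (query, chunk) scores.

```python
import math

# Sketch of retrieve-then-re-rank: a cheap bi-encoder similarity nominates
# top-k candidates, then a (mocked) cross-encoder score re-orders them.
# cross_scores stands in for CrossEncoder.predict over (query, chunk) pairs.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve_then_rerank(query_vec, chunks, cross_scores, k_retrieve=30, k_final=15):
    # Stage 1: bi-encoder — rank chunks by embedding similarity to the query.
    by_sim = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    candidates = by_sim[:k_retrieve]
    # Stage 2: cross-encoder — re-score each surviving pair jointly, keep k_final.
    reranked = sorted(candidates, key=lambda c: cross_scores[c["id"]], reverse=True)
    return reranked[:k_final]

chunks = [
    {"id": "a", "vec": [1.0, 0.0]},
    {"id": "b", "vec": [0.9, 0.1]},
    {"id": "c", "vec": [0.0, 1.0]},
]
top = retrieve_then_rerank([1.0, 0.0], chunks, {"a": 0.2, "b": 0.9, "c": 0.1},
                           k_retrieve=2, k_final=1)
print(top[0]["id"])  # → b: nominated by similarity, promoted by the re-ranker
```

The point of the two stages: the bi-encoder is cheap enough to scan the whole store, while the cross-encoder, too slow for that, only has to judge the shortlist.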