Updated to Gemma4 models.
This commit is contained in: parent 53404dd396 · commit 3347a242ef
4 changed files with 46 additions and 10 deletions
README.md (10 changes)
@@ -5,7 +5,7 @@ Semantic search over a personal journal archive and a collection of clippings. U
## How it works
```
-Query → Embed (BAAI/bge-large-en-v1.5) → Vector similarity (top-30) → Cross-encoder re-rank (top-15) → LLM synthesis (command-r7b via Ollama, or OpenAI API) → Response + sources
+Query → Embed (BAAI/bge-large-en-v1.5) → Vector similarity (top-30) → Cross-encoder re-rank (top-15) → LLM synthesis (Gemma4 via Ollama, or OpenAI API) → Response + sources
```
1. **Build**: Source files are chunked (256 tokens, 25-token overlap) and embedded into a vector store using LlamaIndex. The journal index uses LlamaIndex's JSON store; the clippings index uses ChromaDB. Both support incremental updates.
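The sliding-window arithmetic of the build step can be sketched in plain Python. This is illustrative only: the actual indexers use LlamaIndex's splitters with a real tokenizer, so the whitespace "tokenizer" and `chunk_tokens` helper here are stand-ins.

```python
# Illustrative sliding-window chunker: 256-token chunks, 25-token overlap.
# The real build scripts use LlamaIndex splitters; text.split() is a
# stand-in tokenizer to show the window arithmetic only.
def chunk_tokens(text: str, chunk_size: int = 256, overlap: int = 25) -> list[list[str]]:
    tokens = text.split()
    step = chunk_size - overlap  # each chunk advances 231 tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window reached the end of the document
    return chunks

chunks = chunk_tokens("word " * 500)
print(len(chunks), len(chunks[0]))  # → 3 256 (last chunk is the short tail)
```

Each chunk shares its first 25 tokens with the tail of the previous one, so a sentence straddling a chunk boundary still appears whole in at least one chunk.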
@@ -35,7 +35,7 @@ ssearch/
## Setup
-**Prerequisites**: Python 3.12, [Ollama](https://ollama.com) with `command-r7b` pulled.
+**Prerequisites**: Python 3.12, [Ollama](https://ollama.com) with `gemma4:e4b` or similar pulled.
```bash
cd ssearch
@@ -90,7 +90,7 @@ The default incremental mode loads the existing index, compares file sizes and m
#### Semantic search with LLM synthesis
-**Requires Ollama running with `command-r7b`.**
+**Requires Ollama running with `gemma4`.**
**Hybrid BM25 + vector** (`query_hybrid.py`): Retrieves top 20 by vector similarity and top 20 by BM25 term frequency, merges and deduplicates, re-ranks the union to top 15, synthesizes. Catches exact name/term matches that vector-only retrieval misses.
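The merge-and-deduplicate step of the hybrid retriever can be sketched as follows. The candidate ids are hypothetical; `query_hybrid.py` operates on real vector and BM25 retrievals, and this only shows the union logic that hands every unique chunk to the cross-encoder.

```python
# Sketch of the hybrid merge step: union the two ranked candidate lists
# (vector top-20 and BM25 top-20), deduplicating by chunk id while
# preserving first-seen order, so the cross-encoder re-ranks each unique
# candidate exactly once. Chunk ids here are illustrative placeholders.
def merge_candidates(vector_ids: list[str], bm25_ids: list[str]) -> list[str]:
    seen: set[str] = set()
    merged: list[str] = []
    for cid in vector_ids + bm25_ids:  # vector hits first, then BM25-only hits
        if cid not in seen:
            seen.add(cid)
            merged.append(cid)
    return merged

union = merge_candidates(["c1", "c2", "c3"], ["c2", "c4"])
print(union)  # → ['c1', 'c2', 'c3', 'c4']
```

Order within the union does not matter much, because the cross-encoder re-scores every candidate from scratch; the union just guarantees that a BM25-only exact-term hit survives to the re-ranking stage.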
```bash
@@ -150,7 +150,7 @@ Key parameters (set in source files):
| Initial retrieval | 30 chunks | query and retrieve scripts |
| Re-rank model | `cross-encoder/ms-marco-MiniLM-L-12-v2` | query and retrieve scripts |
| Re-rank top-n | 15 | query and retrieve scripts |
-| LLM | `command-r7b` (Ollama) or `gpt-4o-mini` (OpenAI API) | `query_hybrid.py` |
+| LLM | `gemma4:e4b` (Ollama) or `gpt-4o-mini` (OpenAI API) | `query_hybrid.py` |
| Temperature | 0.3 | `query_hybrid.py` |
| Context window | 8000 tokens | `query_hybrid.py` |
| Request timeout | 360 seconds | `query_hybrid.py` |
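The LLM rows of that table map onto an Ollama generate request roughly as below. This is a sketch, not the actual request construction in `query_hybrid.py` (which this diff does not show); the prompt placeholder is hypothetical, while `temperature` and `num_ctx` are real option names in Ollama's `/api/generate` endpoint.

```python
import json

# Sketch: how the table's parameters could appear in an Ollama
# /api/generate request body. The prompt content is a placeholder;
# query_hybrid.py's real request code is not part of this diff.
payload = {
    "model": "gemma4:e4b",              # LLM row
    "prompt": "<context + question>",   # placeholder
    "stream": False,
    "options": {
        "temperature": 0.3,             # Temperature row
        "num_ctx": 8000,                # Context window row
    },
}
body = json.dumps(payload)
# An HTTP client would POST this to http://localhost:11434/api/generate
# with a 360-second client-side timeout (Request timeout row).
```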
@@ -171,7 +171,7 @@ Key parameters (set in source files):
- **BAAI/bge-large-en-v1.5 over all-mpnet-base-v2**: Better semantic matching quality for journal text despite slower embedding.
- **256-token chunks**: Tested 512 and 384; 256 with 25-token overlap produced the highest quality matches.
-- **command-r7b over llama3.1:8B**: Sticks closer to provided context with less hallucination at comparable speed.
+- **gemma4:e4b over command-r7b**: Sticks closer to provided context with less hallucination at comparable speed. Earlier, selected **command-r7b over llama3.1:8B** for similar reasons.
- **Cross-encoder re-ranking**: Retrieve top-30 via bi-encoder, re-rank to top-15 with a cross-encoder that scores each (query, chunk) pair jointly. Tested three models; `ms-marco-MiniLM-L-12-v2` selected over `stsb-roberta-base` (wrong task) and `BAAI/bge-reranker-v2-m3` (slower, weak score tail).
- **HyDE query rewriting tested and dropped**: Did not improve results over direct prompt engineering.
- **Hybrid BM25 + vector retrieval**: BM25 nominates candidates with exact term matches that embeddings miss; the cross-encoder decides final relevance.
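The retrieve-then-re-rank pattern from these decisions can be sketched with toy data. Everything here is a stand-in: the real pipeline uses bge-large embeddings and sentence-transformers' `ms-marco-MiniLM-L-12-v2` cross-encoder, whereas `cross_scores` below is a hypothetical dict mocking the cross-encoder's joint (query, chunk) scores.

```python
import math

# Sketch of retrieve-then-re-rank: a cheap bi-encoder similarity nominates
# top-k candidates, then a (mocked) cross-encoder score re-orders them.
# cross_scores stands in for CrossEncoder.predict over (query, chunk) pairs.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve_then_rerank(query_vec, chunks, cross_scores, k_retrieve=30, k_final=15):
    # Stage 1: bi-encoder — rank chunks by embedding similarity to the query.
    by_sim = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    candidates = by_sim[:k_retrieve]
    # Stage 2: cross-encoder — re-score each surviving pair jointly, keep k_final.
    reranked = sorted(candidates, key=lambda c: cross_scores[c["id"]], reverse=True)
    return reranked[:k_final]

chunks = [
    {"id": "a", "vec": [1.0, 0.0]},
    {"id": "b", "vec": [0.9, 0.1]},
    {"id": "c", "vec": [0.0, 1.0]},
]
top = retrieve_then_rerank([1.0, 0.0], chunks, {"a": 0.2, "b": 0.9, "c": 0.1},
                           k_retrieve=2, k_final=1)
print(top[0]["id"])  # → b: nominated by similarity, promoted by the re-ranker
```

The point of the two stages: the bi-encoder is cheap enough to scan the whole store, while the cross-encoder, too slow for that, only has to judge the shortlist.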