Updated to Gemma4 models.

parent 53404dd396
commit 3347a242ef
4 changed files with 46 additions and 10 deletions

README.md (10 changed lines)
@@ -5,7 +5,7 @@ Semantic search over a personal journal archive and a collection of clippings. U
 ## How it works
 
 ```
-Query → Embed (BAAI/bge-large-en-v1.5) → Vector similarity (top-30) → Cross-encoder re-rank (top-15) → LLM synthesis (command-r7b via Ollama, or OpenAI API) → Response + sources
+Query → Embed (BAAI/bge-large-en-v1.5) → Vector similarity (top-30) → Cross-encoder re-rank (top-15) → LLM synthesis (Gemma4 via Ollama, or OpenAI API) → Response + sources
 ```
 
 1. **Build**: Source files are chunked (256 tokens, 25-token overlap) and embedded into a vector store using LlamaIndex. The journal index uses LlamaIndex's JSON store; the clippings index uses ChromaDB. Both support incremental updates.
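The build step's chunking scheme (256 tokens, 25-token overlap) can be sketched with a simple sliding window. This is an illustrative stand-in for LlamaIndex's splitter, not the project's actual code: `chunk_tokens` is a hypothetical helper, and whitespace-separated words stand in for real tokens.

```python
def chunk_tokens(tokens, size=256, overlap=25):
    """Sliding-window chunker: each chunk shares `overlap` tokens with the previous one."""
    stride = size - overlap  # 231 new tokens per step
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

# A 600-"token" document yields three overlapping chunks.
words = [f"w{i}" for i in range(600)]
chunks = chunk_tokens(words)
```

The overlap means a sentence straddling a chunk boundary still appears whole in at least one chunk, which helps retrieval quality at small chunk sizes.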
@@ -35,7 +35,7 @@ ssearch/
 
 ## Setup
 
-**Prerequisites**: Python 3.12, [Ollama](https://ollama.com) with `command-r7b` pulled.
+**Prerequisites**: Python 3.12, [Ollama](https://ollama.com) with `gemma4:e4b` or similar pulled.
 
 ```bash
 cd ssearch
@@ -90,7 +90,7 @@ The default incremental mode loads the existing index, compares file sizes and m
 
 #### Semantic search with LLM synthesis
 
-**Requires Ollama running with `command-r7b`.**
+**Requires Ollama running with `gemma4`.**
 
 **Hybrid BM25 + vector** (`query_hybrid.py`): Retrieves top 20 by vector similarity and top 20 by BM25 term frequency, merges and deduplicates, re-ranks the union to top 15, synthesizes. Catches exact name/term matches that vector-only retrieval misses.
 ```bash
@@ -150,7 +150,7 @@ Key parameters (set in source files):
 | Initial retrieval | 30 chunks | query and retrieve scripts |
 | Re-rank model | `cross-encoder/ms-marco-MiniLM-L-12-v2` | query and retrieve scripts |
 | Re-rank top-n | 15 | query and retrieve scripts |
-| LLM | `command-r7b` (Ollama) or `gpt-4o-mini` (OpenAI API) | `query_hybrid.py` |
+| LLM | `gemma4:e4b` (Ollama) or `gpt-4o-mini` (OpenAI API) | `query_hybrid.py` |
 | Temperature | 0.3 | `query_hybrid.py` |
 | Context window | 8000 tokens | `query_hybrid.py` |
 | Request timeout | 360 seconds | `query_hybrid.py` |
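The hybrid retriever's merge-and-deduplicate step (top 20 by vector, top 20 by BM25, union deduplicated before re-ranking) can be sketched as follows. `merge_dedupe` and the synthetic `(chunk_id, score)` lists are illustrative, not the script's actual API; in the real pipeline the cross-encoder, not the raw scores, decides the final top 15.

```python
def merge_dedupe(vector_hits, bm25_hits):
    """Union two ranked (chunk_id, score) lists, keeping the best score per chunk."""
    best = {}
    for cid, score in vector_hits + bm25_hits:
        if cid not in best or score > best[cid]:
            best[cid] = score
    return list(best.items())

# Synthetic example: the two retrievers agree on 5 chunks (c15..c19).
vector_hits = [(f"c{i}", 1.0 - i * 0.01) for i in range(20)]    # top 20 by vector
bm25_hits = [(f"c{i}", 1.0 - i * 0.01) for i in range(15, 35)]  # top 20 by BM25
merged = merge_dedupe(vector_hits, bm25_hits)                    # 35 unique chunks
top15 = sorted(merged, key=lambda kv: kv[1], reverse=True)[:15]
```

Deduplicating before re-ranking keeps the cross-encoder's workload bounded at the size of the union rather than the sum of both candidate lists.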
@@ -171,7 +171,7 @@ Key parameters (set in source files):
 
 - **BAAI/bge-large-en-v1.5 over all-mpnet-base-v2**: Better semantic matching quality for journal text despite slower embedding.
 - **256-token chunks**: Tested 512 and 384; 256 with 25-token overlap produced the highest quality matches.
-- **command-r7b over llama3.1:8B**: Sticks closer to provided context with less hallucination at comparable speed.
+- **gemma4:e4b over command-r7b**: Sticks closer to provided context with less hallucination at comparable speed. Earlier, **command-r7b** was selected over **llama3.1:8B** for similar reasons.
 - **Cross-encoder re-ranking**: Retrieve top-30 via bi-encoder, re-rank to top-15 with a cross-encoder that scores each (query, chunk) pair jointly. Tested three models; `ms-marco-MiniLM-L-12-v2` selected over `stsb-roberta-base` (wrong task) and `BAAI/bge-reranker-v2-m3` (slower, weak score tail).
 - **HyDE query rewriting tested and dropped**: Did not improve results over direct prompt engineering.
 - **Hybrid BM25 + vector retrieval**: BM25 nominates candidates with exact term matches that embeddings miss; the cross-encoder decides final relevance.
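The cross-encoder re-ranking decision above can be sketched as follows. The `overlap_score` function is a hypothetical stand-in for `ms-marco-MiniLM-L-12-v2`; the real project scores each (query, chunk) pair jointly with a trained cross-encoder model, which this sketch only mimics in shape.

```python
def rerank(query, chunks, score_fn, top_n=15):
    """Score each (query, chunk) pair jointly and keep the top_n chunks."""
    scored = [(chunk, score_fn(query, chunk)) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def overlap_score(query, chunk):
    """Hypothetical stand-in scorer: fraction of query terms present in the chunk."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

candidates = [f"chunk {i} about journaling habits" for i in range(30)]  # top-30 from bi-encoder
top15 = rerank("journaling habits", candidates, overlap_score)
```

The key property is that the scorer sees query and chunk together, unlike the bi-encoder, which embeds each independently; that joint view is what justifies the two-stage retrieve-then-re-rank design.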
query_hybrid.py

@@ -43,8 +43,10 @@ import sys
 # Embedding model (must match build_store.py)
 EMBED_MODEL = HuggingFaceEmbedding(cache_folder="./models", model_name="BAAI/bge-large-en-v1.5", local_files_only=True)
 
-# LLM model for generation
-LLM_MODEL = "command-r7b"
+# LLM model for generation. Use temperature 0.3.
+#LLM_MODEL = "command-r7b"
+# Testing gemma4:e4b; recommended temperature is 1.0.
+LLM_MODEL = "gemma4:e4b"
 
 # Cross-encoder model for re-ranking (cached in ./models/)
 RERANK_MODEL = "cross-encoder/ms-marco-MiniLM-L-12-v2"
@@ -89,7 +91,8 @@ def main():
     # Note: Ollama temperature defaults to 0.8
     Settings.llm = Ollama(
         model=LLM_MODEL,
-        temperature=0.3,
+        temperature=1.0,
+        thinking=True,  # enable thinking mode
         request_timeout=360.0,
         context_window=8000,
     )
@@ -153,9 +156,20 @@ def main():
     n_bm25_only = len([n for n in bm25_nodes if n.node.node_id not in {v.node.node_id for v in vector_nodes}])
     n_both = len(vector_nodes) + len(bm25_nodes) - len(merged)
 
+    # Estimate context length (prompt + node text)
+    context_text = "\n\n".join(n.get_content() for n in reranked)
+    prompt_text = PROMPT.format(context_str=context_text, query_str=q)
+    n_context_tokens = len(prompt_text.split())  # rough word count; ~1.3 tokens/word
+
     print(f"\nQuery: {q}")
-    print(f"Vector: {len(vector_nodes)}, BM25: {len(bm25_nodes)}, "
+    print(f"Vector: {len(vector_nodes)} ({n_vector_only} unique), "
+          f"BM25: {len(bm25_nodes)} ({n_bm25_only} unique), "
           f"overlap: {n_both}, merged: {len(merged)}, re-ranked to: {len(reranked)}")
+
+    # The token estimate uses a ~1.3 tokens/word ratio, which is a rough approximation.
+    # For an exact count you'd need the model's tokenizer, but this gives a useful ballpark
+    # for gauging how much of the context window we use.
+    print(f"Context: ~{n_context_tokens} words (~{int(n_context_tokens * 1.3)} tokens)")
 
     # Synthesize response with LLM
     synthesizer = get_response_synthesizer(text_qa_template=PROMPT)
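The word-based token estimate added in the hunk above can be isolated as a tiny helper. `estimate_tokens` is a hypothetical name; the ~1.3 tokens/word ratio is the script's own rough heuristic for English text, not an exact tokenizer count.

```python
def estimate_tokens(text, tokens_per_word=1.3):
    """Rough token estimate from whitespace word count (~1.3 tokens/word for English)."""
    n_words = len(text.split())
    return n_words, int(n_words * tokens_per_word)

# 9 words → roughly 11 tokens under the heuristic.
words, tokens = estimate_tokens("the quick brown fox jumps over the lazy dog")
```

A ballpark like this is enough to check whether the assembled prompt fits the 8000-token context window; an exact count would require the model's own tokenizer.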
@@ -169,7 +183,7 @@ def main():
     for node in response.source_nodes:
         meta = getattr(node, "metadata", None) or node.node.metadata
         score = getattr(node, "score", None)
-        print(f"{meta.get('file_name')} {meta.get('file_path')} {score:.3f}")
+        print(f"data/{meta.get('file_name')} {score:.3f}")
 
 
 if __name__ == "__main__":
run_query.sh (11 changed lines)
@@ -7,6 +7,17 @@
 # Usage: ./run_query.sh
 
 QUERY_SCRIPT="query_hybrid.py"
+VENV_DIR=".venv"
+
+# Activate the virtual environment
+if [ -d "$VENV_DIR" ]; then
+    source "$VENV_DIR/bin/activate"
+    echo "Activated virtual environment: $VENV_DIR"
+else
+    echo "Error: Virtual environment not found at '$VENV_DIR'" >&2
+    echo "Create one with: python3 -m venv $VENV_DIR" >&2
+    exit 1
+fi
 
 echo -e "Current query engine is $QUERY_SCRIPT\n"
 
@@ -7,6 +7,17 @@
 # Usage: ./run_query.sh
 
 QUERY_SCRIPT="retrieve.py"
+VENV_DIR=".venv"
+
+# Activate the virtual environment
+if [ -d "$VENV_DIR" ]; then
+    source "$VENV_DIR/bin/activate"
+    echo "Activated virtual environment: $VENV_DIR"
+else
+    echo "Error: Virtual environment not found at '$VENV_DIR'" >&2
+    echo "Create one with: python3 -m venv $VENV_DIR" >&2
+    exit 1
+fi
 
 echo -e "$QUERY_SCRIPT -- retrieve vector store chunks based on similarity + BM25 with reranking.\n"
 