Compare commits


No commits in common. "public" and "eb9997326fa43559aa1d652f3d2477d88e2b4cb0" have entirely different histories.

38 changed files with 7370 additions and 0 deletions

NOTES.md Normal file

@@ -0,0 +1,13 @@
A simple query in ChatGPT produced:
Metric | Best For | Type | Notes
-- | -- | -- | --
Cosine Similarity | L2-normalized vectors | Similarity | Scale-invariant
Dot Product | Transformer embeddings | Similarity | Fast, especially on GPUs
Euclidean Distance | Raw vectors with meaningful norms | Distance | Sensitive to scale
Jaccard | Sparse binary or set-based data | Similarity | Discrete features
Soft Cosine | Sparse with semantic overlap | Similarity | Better for text-term overlap
Learned Similarity | Fine-tuned deep models | Varies | Best accuracy, slowest retrieval
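A quick numeric illustration of the scale-invariance noted in the table (illustrative values, not from the project): cosine similarity ignores vector magnitude, while dot product and Euclidean distance do not.

```python
import numpy as np

# b = 2a: same direction, different norm
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

def cosine(u, v):
    # cosine similarity: dot product of the L2-normalized vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(a, b))                  # 1.0 -- scale-invariant
print(float(np.dot(a, b)))           # 28.0 -- grows with the norms
print(float(np.linalg.norm(a - b)))  # ~3.742 -- nonzero despite same direction
```

This is why cosine is the usual choice for L2-normalized embeddings, and why dot product coincides with cosine once vectors are normalized.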


@@ -30,7 +30,11 @@ ssearch/
├── clippings/        # Symlink to clippings (PDFs, TXT, webarchive, RTF)
├── store/            # Persisted journal vector store
├── models/           # Cached HuggingFace models (offline)
├── archived/         # Superseded script versions
├── saved_output/     # Saved query results and model comparisons
├── requirements.txt  # Python dependencies
├── devlog.md         # Development log and experimental findings
└── *.ipynb           # Jupyter notebooks (HyDE, metrics, sandbox)
```
## Setup
@@ -167,6 +171,16 @@ Key parameters (set in source files):
- **sentence-transformers** -- cross-encoder re-ranking
- **torch** -- ML runtime
## Notebooks
Three Jupyter notebooks document exploration and analysis:
- **`hyde.ipynb`** -- Experiments with HyDE (Hypothetical Document Embeddings) query rewriting. Finding: did not improve retrieval quality over direct prompt engineering.
- **`sandbox.ipynb`** -- Exploratory notebook for learning the LlamaIndex API.
- **`vs_metrics.ipynb`** -- Quantitative analysis of the vector store (embedding distributions, pairwise similarity, clustering, PCA/t-SNE projections).
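As an illustration of the kind of analysis `vs_metrics.ipynb` performs, here is a minimal NumPy-only sketch of pairwise cosine similarity and a 2-D PCA projection. It uses random stand-in vectors; the actual notebook loads embeddings from the persisted store (1024-dimensional for bge-large-en-v1.5).

```python
import numpy as np

# Stand-in embeddings: 20 random 8-d vectors (placeholder for the real store)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))

# Pairwise cosine similarity: normalize rows, then one matrix product
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
sim = Xn @ Xn.T  # (20, 20); diagonal is 1.0

# 2-D PCA projection via SVD of the mean-centered matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:2].T  # (20, 2): coordinates along the top two components

print(sim.shape, proj.shape)
```

The similarity matrix feeds the distribution and clustering plots; `proj` is what gets scattered in the PCA view (t-SNE would replace the SVD step).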
## Design decisions
- **BAAI/bge-large-en-v1.5 over all-mpnet-base-v2**: Better semantic matching quality for journal text despite slower embedding.
@@ -178,3 +192,10 @@ Key parameters (set in source files):
- **ChromaDB for clippings**: Persistent SQLite-backed store. Chosen over the JSON store for its metadata filtering and direct chunk-level operations for incremental updates.
- **PDF validation before indexing**: Pre-check each PDF with pypdf; skip if text extraction yields <100 chars or a low printable ratio. Skipped files written to `ocr_needed.txt`.
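A sketch of that validation heuristic. Only the <100-character cutoff comes from the text above; the printable-ratio threshold and the function name are illustrative assumptions.

```python
import string

MIN_CHARS = 100            # from the README: skip if extraction yields <100 chars
MIN_PRINTABLE_RATIO = 0.8  # assumed threshold for "low printable ratio"

def looks_extractable(text: str) -> bool:
    """Heuristic pre-check before indexing: enough text, mostly printable."""
    if len(text) < MIN_CHARS:
        return False
    printable = sum(ch in string.printable for ch in text)
    return printable / len(text) >= MIN_PRINTABLE_RATIO

# With pypdf, the text would come from something like:
#   from pypdf import PdfReader
#   text = "".join(page.extract_text() or "" for page in PdfReader(path).pages)
# Files failing the check are appended to ocr_needed.txt for later OCR.

print(looks_extractable("x" * 200))     # True: long, fully printable
print(looks_extractable("short"))       # False: under the length cutoff
print(looks_extractable("\x00" * 200))  # False: long but non-printable
```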
## Development history
- **Aug 2025**: Initial implementation -- build pipeline, embedding model comparison, chunk size experiments, HyDE testing.
- **Jan 2026**: Command-line interface, prompt improvements, model comparison (command-r7b selected).
- **Feb 2026**: Cross-encoder re-ranking, hybrid BM25+vector retrieval, LlamaIndex upgrade to 0.14.14, OpenAI API backend, incremental updates, clippings search (ChromaDB), project reorganization.
See `devlog.md` for detailed development notes and experimental findings.

archived/build.py Normal file

@@ -0,0 +1,51 @@
# build.py
#
# Import documents from data, generate embedded vector store
# and save to disk in directory ./storage
#
# August 2025
# E. M. Furst
from llama_index.core import (
SimpleDirectoryReader,
VectorStoreIndex,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
def main():
# Choose your embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")
# Configure global settings for LlamaIndex
Settings.embed_model = embed_model
# Load documents
documents = SimpleDirectoryReader("./data").load_data()
# Create the custom textsplitter
# Set chunk size and overlap (e.g., 256 tokens, 25 tokens overlap)
# see https://docs.llamaindex.ai/en/stable/api_reference/node_parsers/sentence_splitter/#llama_index.core.node_parser.SentenceSplitter
text_splitter = SentenceSplitter(
chunk_size=256,
chunk_overlap=25,
paragraph_separator="\n\n", # use double newlines to separate paragraphs
)
Settings.text_splitter = text_splitter
# Build the index
index = VectorStoreIndex.from_documents(
documents, transformations=[text_splitter],
show_progress=True,
)
# Persist both vector store and index metadata
index.storage_context.persist(persist_dir="./storage")
print("Index built and saved to ./storage")
if __name__ == "__main__":
main()

archived/build_exp.py Normal file

@@ -0,0 +1,68 @@
# build_exp.py
#
# Import document from data, generate embedded vector store
# and save to disk
#
# Experiment to include text chunking with a textsplitter
#
# August 2025
# E. M. Furst
from llama_index.core import (
SimpleDirectoryReader,
VectorStoreIndex,
Settings,
)
from pathlib import Path
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
def main():
# Choose your embedding model
#embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
# embedding is slower with BAAI/bge-large-en-v1.5
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")
# Configure global settings for LlamaIndex
Settings.embed_model = embed_model
# Load documents (capabilities?)
documents = SimpleDirectoryReader(
"./data",
# # p is a string path
# file_metadata=lambda p: {
# "filename": Path(p).name, # just the file name
# "filepath": str(Path(p).resolve()), # absolute path (handy for tracing)
# },
).load_data()
# Create the custom textsplitter
# Set chunk size and overlap (e.g., 256 tokens, 25 tokens overlap)
# see https://docs.llamaindex.ai/en/stable/api_reference/node_parsers/sentence_splitter/#llama_index.core.node_parser.SentenceSplitter
text_splitter = SentenceSplitter(
chunk_size=256,
chunk_overlap=25,
paragraph_separator="\n\n", # use double newlines to separate paragraphs
)
# b/c passing text_splitter in the index build, this may cause problems
# test with it commented out...
# Settings.text_splitter = text_splitter
# Build the index
index = VectorStoreIndex.from_documents(
documents, transformations=[text_splitter],
show_progress=True,
)
# Persist both vector store and index metadata
index.storage_context.persist(persist_dir="./storage_exp")
# storage_context = StorageContext.from_defaults(vector_store=index.vector_store)
# storage_context.persist(persist_dir="./storage")
print("Index built and saved to ./storage_exp")
if __name__ == "__main__":
main()


@@ -0,0 +1,164 @@
# Better HyDE debugging with targeted tests
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core import PromptTemplate
from llama_index.core import Settings
from llama_index.core.base.base_query_engine import BaseQueryEngine
from llama_index.llms.ollama import Ollama
llm="llama3.1:8B"
# Use a local model to generate
Settings.llm = Ollama(
model=llm, # First model tested
request_timeout=360.0,
context_window=8000,
temperature=0.7,
)
# Test queries that should produce very different hypothetical documents
test_queries = [
"What is the capital of France?",
"How do you make chocolate chip cookies?",
"Explain quantum physics",
"Write a love letter",
"Describe symptoms of the common cold"
]
print("=== DEBUGGING HYDE STEP BY STEP ===\n")
# 1. Test the LLM with HyDE-style prompts directly
print("1. Testing LLM directly with HyDE-style prompts:")
print("-" * 50)
for query in test_queries[:2]: # Just test 2 to keep output manageable
direct_prompt = f"""Generate a hypothetical document that would contain the answer to this query.
Query: {query}
Hypothetical document:"""
response = Settings.llm.complete(direct_prompt)
print(f"Query: {query}")
print(f"Direct LLM Response: {response.text[:100]}...")
print()
# 2. Check HyDE internals - let's see what's actually happening
print("\n2. Examining HyDE internal behavior:")
print("-" * 50)
# Create a custom HyDE that shows us everything
class VerboseHyDETransform(HyDEQueryTransform):
    def _get_prompts(self):
        """Show what prompts are being used"""
        prompts = super()._get_prompts()
        print(f"HyDE prompts: {prompts}")
        return prompts

    def _run(self, query_bundle, metadata):
        """Show what the transform receives and returns"""
        # HyDEQueryTransform implements _run (not _run_component)
        print(f"HyDE _run input: {query_bundle}")
        result = super()._run(query_bundle, metadata)
        print(f"HyDE _run result: {result}")
        return result
# Test with verbose HyDE
verbose_hyde = VerboseHyDETransform(llm=Settings.llm)
test_result = verbose_hyde.run("What is machine learning?")
print(f"Final verbose result: {test_result}")
# 3. Try the most basic possible test
print("\n3. Most basic HyDE test:")
print("-" * 50)
basic_hyde = HyDEQueryTransform(llm=Settings.llm)
basic_result = basic_hyde.run("Paris")
print("Input: 'Paris'")
print(f"Output: '{basic_result}'")
# run() returns a QueryBundle; compare its query_str to the input string
print(f"Same as input? {basic_result.query_str.strip() == 'Paris'}")
# 4. Check if it's a version issue - try alternative approach
print("\n4. Alternative HyDE approach:")
print("-" * 50)
try:
# Some versions might need different initialization
from llama_index.core.query_engine import TransformQueryEngine
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
# Try with explicit prompt template
hyde_prompt_template = PromptTemplate(
"Please write a passage to answer the question\n"
"Try to include as many key details as possible\n"
"\n"
"\n"
"Question: {query_str}\n"
"\n"
"\n"
"Passage:"
)
alt_hyde = HyDEQueryTransform(
llm=Settings.llm,
hyde_prompt=hyde_prompt_template
)
alt_result = alt_hyde.run("What causes rain?")
print(f"Alternative approach result: {alt_result}")
except Exception as e:
print(f"Alternative approach failed: {e}")
# 5. Check what happens with different query formats
print("\n5. Testing different input formats:")
print("-" * 50)
from llama_index.core.schema import QueryBundle
# Test with QueryBundle vs string
hyde_test = HyDEQueryTransform(llm=Settings.llm)
string_result = hyde_test.run("test query")
print(f"String input result: '{string_result}'")
query_bundle = QueryBundle(query_str="test query")
bundle_result = hyde_test.run(query_bundle)
print(f"QueryBundle input result: '{bundle_result}'")
# 6. Version and import check
print("\n6. Environment check:")
print("-" * 50)
import llama_index
print(f"LlamaIndex version: {llama_index.__version__}")
# Check what LLM you're actually using
print(f"LLM type: {type(Settings.llm)}")
print(f"LLM model name: {getattr(Settings.llm, 'model', 'Unknown')}")
# 7. Try the nuclear option - completely manual implementation
print("\n7. Manual HyDE implementation:")
print("-" * 50)
def manual_hyde(query: str, llm):
"""Completely manual HyDE to see if the concept works"""
prompt = f"""You are an expert writer. Generate a realistic document excerpt that would contain the answer to this question.
Question: {query}
Document excerpt:"""
response = llm.complete(prompt)
return response.text
manual_result = manual_hyde("What is photosynthesis?", Settings.llm)
print(f"Manual HyDE result: {manual_result[:150]}...")
# 8. Final diagnostic
print("\n8. Final diagnostic questions:")
print("-" * 50)
print("If all the above show the LLM generating proper responses but HyDE still returns original:")
print("- What LLM are you using? (OpenAI, Anthropic, local model, etc.)")
print("- What's your LlamaIndex version?")
print("- Are there any error messages in the logs?")
print("- Does the LLM have any special configuration or wrappers?")

archived/output.png Normal file

Binary file not shown (785 KiB).

archived/query.py Normal file

@@ -0,0 +1,110 @@
# query_topk_prompt.py
# Run a query on a vector store
#
# E. M. Furst August 2025
from llama_index.core import (
load_index_from_storage,
StorageContext,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
import os
#
# Globals
#
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Embedding model used in vector store (this should match the one in build.py or equivalent)
embed_model = HuggingFaceEmbedding(cache_folder="./models",model_name="BAAI/bge-large-en-v1.5")
# LLM model to use in query transform and generation
llm="command-r7b"
#
# Custom prompt for the query engine
#
PROMPT = PromptTemplate(
"""You are an expert research assistant. You are given top-ranked writing excerpts (CONTEXT) and a user's QUERY.
Instructions:
- Base your response *only* on the CONTEXT.
- The snippets are ordered from most to least relevant; prioritize insights from earlier (higher-ranked) snippets.
- Aim to reference *as many distinct* relevant files as possible (up to 10).
- Do not invent or generalize; refer to specific passages or facts only.
- If a passage only loosely matches, deprioritize it.
Format your answer in two parts:
1. **Summary Theme**
Summarize the dominant theme from the relevant context in a few sentences.
2. **Matching Files**
Make a list of 10 matching files. The format for each should be:
<filename> -
<rationale tied to content. Include date or section hints if available.>
CONTEXT:
{context_str}
QUERY:
{query_str}
Now provide the theme and list of matching files."""
)
#
# Main program routine
#
def main():
# Use a local model to generate -- in this case using Ollama
Settings.llm = Ollama(
model=llm, # First model tested
request_timeout=360.0,
context_window=8000
)
# Load embedding model (same as used for vector store)
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage_exp")
index = load_index_from_storage(storage_context)
# Build regular query engine with custom prompt
query_engine = index.as_query_engine(
similarity_top_k=15, # pull wide
#response_mode="compact" # concise synthesis
text_qa_template=PROMPT, # custom prompt
# node_postprocessors=[
# SimilarityPostprocessor(similarity_cutoff=0.75) # keep strong hits; makes result count flexible
# ],
)
# Query
while True:
q = input("\nEnter a search topic or question (or 'exit'): ").strip()
if q.lower() in ("exit", "quit"):
break
print()
# Generate the response by querying the engine
# This performs the similarity search and then applies the prompt
response = query_engine.query(q)
# Return the query response and source documents
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
print(f"{meta.get('file_name')} {meta.get('file_path')} {getattr(node, 'score', None)}")
if __name__ == "__main__":
main()

archived/query_catalog.py Normal file

@@ -0,0 +1,90 @@
# query.py
# Run a query on a vector store
# This version implements a CATALOG prompt
#
# E.M.F. July 2025
# August 2025 - updated for nd ssearch
from llama_index.core import (
    StorageContext,
    load_index_from_storage,
    Settings,  # ServiceContext was removed in modern LlamaIndex and was unused here
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.prompts import PromptTemplate
import logging
logging.basicConfig(level=logging.DEBUG)
CATALOG_PROMPT = PromptTemplate(
"""You are a research assistant. You're given journal snippets (CONTEXT) and a user query.
Your job is NOT to write an essay but to list the best-matching journal files with a 1-2 sentence rationale.
Rules:
- Use only the CONTEXT; do not invent content.
- Prefer precise references to passages over generalities.
- Output exactly:
1) A brief one-line summary of the overall theme you detect.
2) A bulleted list: **filename** - brief rationale. If available in the snippet, include date or section hints.
CONTEXT:
{context_str}
QUERY: {query_str}
Now produce the summary line and the bulleted list of matching files."""
)
# Use a local model to generate
Settings.llm = Ollama(
# model="llama3.1:8B", # First model tested
# model="deepseek-r1:8B", # This model shows its reasoning
model="gemma3:1b",
request_timeout=360.0,
context_window=8000
)
def main():
# Load embedding model (same as used for vector store)
embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine(
similarity_top_k=10, # pull wide (tune to taste)
#response_mode="compact", # concise synthesis
text_qa_template=CATALOG_PROMPT, # <- custom prompt
# node_postprocessors=[
# SimilarityPostprocessor(similarity_cutoff=0.75) # keep strong hits; makes result count flexible
# ],
)
# Query
while True:
q = input("\nEnter your question (or 'exit'): ").strip()
if q.lower() in ("exit", "quit"):
break
print()
response = query_engine.query(q)
# Return the query response and source documents
print(response.response)
print("\nSource documents:")
for sn in response.source_nodes:
meta = getattr(sn, "metadata", None) or sn.node.metadata
print(meta.get("file_name"), "---", meta.get("file_path"), getattr(sn, "score", None))
if __name__ == "__main__":
main()


@@ -0,0 +1,223 @@
#!/usr/bin/env python3
"""
query_topk_prompt_engine.py
Query a vector store with a custom prompt for research assistance.
Uses BAAI/bge-large-en-v1.5 embeddings and Ollama for generation.
E.M.F. January 2026
Using Claude Sonnet 4.5 to suggest changes
"""
import argparse
import os
import sys
from pathlib import Path
from llama_index.core import (
Settings,
StorageContext,
load_index_from_storage,
)
from llama_index.core.prompts import PromptTemplate
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
# Suppress tokenizer parallelism warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Configuration defaults
DEFAULT_LLM = "command-r7b"
DEFAULT_EMBED_MODEL = "BAAI/bge-large-en-v1.5"
DEFAULT_STORAGE_DIR = "./storage_exp"
DEFAULT_TOP_K = 15
DEFAULT_SIMILARITY_CUTOFF = 0.7 # Set to None to disable
def get_prompt_template(max_files: int = 10) -> PromptTemplate:
"""Return the custom prompt template for the query engine."""
return PromptTemplate(
f"""You are an expert research assistant. You are given top-ranked writing excerpts (CONTEXT) and a user's QUERY.
Instructions:
- Base your response *only* on the CONTEXT.
- The snippets are ordered from most to least relevant; prioritize insights from earlier (higher-ranked) snippets.
- Aim to reference *as many distinct* relevant files as possible (up to {max_files}).
- Do not invent or generalize; refer to specific passages or facts only.
- If a passage only loosely matches, deprioritize it.
Format your answer in two parts:
1. **Summary Theme**
Summarize the dominant theme from the relevant context in a few sentences.
2. **Matching Files**
List up to {max_files} matching files. Format each as:
<filename> - <rationale tied to content. Include date or section hints if available.>
CONTEXT:
{{context_str}}
QUERY:
{{query_str}}
Now provide the theme and list of matching files."""
)
def load_models(
llm_name: str = DEFAULT_LLM,
embed_model_name: str = DEFAULT_EMBED_MODEL,
cache_folder: str = "./models",
request_timeout: float = 360.0,
context_window: int = 8000,
):
"""Initialize and configure the LLM and embedding models."""
Settings.llm = Ollama(
model=llm_name,
request_timeout=request_timeout,
context_window=context_window,
)
Settings.embed_model = HuggingFaceEmbedding(
cache_folder=cache_folder,
model_name=embed_model_name,
local_files_only=True,
)
def load_query_engine(
storage_dir: str = DEFAULT_STORAGE_DIR,
top_k: int = DEFAULT_TOP_K,
similarity_cutoff: float | None = DEFAULT_SIMILARITY_CUTOFF,
max_files: int = 10,
):
"""Load the vector store and create a query engine with custom prompt."""
storage_path = Path(storage_dir)
if not storage_path.exists():
raise FileNotFoundError(f"Storage directory not found: {storage_dir}")
storage_context = StorageContext.from_defaults(persist_dir=str(storage_path))
index = load_index_from_storage(storage_context)
# Build postprocessors
postprocessors = []
if similarity_cutoff is not None:
postprocessors.append(SimilarityPostprocessor(similarity_cutoff=similarity_cutoff))
return index.as_query_engine(
similarity_top_k=top_k,
text_qa_template=get_prompt_template(max_files),
node_postprocessors=postprocessors if postprocessors else None,
)
def get_node_metadata(node) -> dict:
"""Safely extract metadata from a source node."""
# Handle different node structures in llamaindex
if hasattr(node, "metadata") and node.metadata:
return node.metadata
if hasattr(node, "node") and hasattr(node.node, "metadata"):
return node.node.metadata
return {}
def print_results(response):
"""Print the query response and source documents."""
print("\n" + "=" * 60)
print("RESPONSE")
print("=" * 60 + "\n")
print(response.response)
print("\n" + "=" * 60)
print("SOURCE DOCUMENTS")
print("=" * 60 + "\n")
for i, node in enumerate(response.source_nodes, 1):
meta = get_node_metadata(node)
score = getattr(node, "score", None)
file_name = meta.get("file_name", "Unknown")
file_path = meta.get("file_path", "Unknown")
score_str = f"{score:.3f}" if score is not None else "N/A"
print(f"{i:2}. [{score_str}] {file_name}")
print(f" Path: {file_path}")
def parse_args():
"""Parse command line arguments."""
parser = argparse.ArgumentParser(
description="Query a vector store with a custom research assistant prompt.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
python query_topk_prompt_engine.py "What themes appear in the documents?"
python query_topk_prompt_engine.py --top-k 20 --llm llama3.1:8B "Find references to machine learning"
""",
)
parser.add_argument("query", nargs="+", help="The query text")
parser.add_argument(
"--llm",
default=DEFAULT_LLM,
help=f"Ollama model to use for generation (default: {DEFAULT_LLM})",
)
parser.add_argument(
"--storage-dir",
default=DEFAULT_STORAGE_DIR,
help=f"Path to the vector store (default: {DEFAULT_STORAGE_DIR})",
)
parser.add_argument(
"--top-k",
type=int,
default=DEFAULT_TOP_K,
help=f"Number of similar documents to retrieve (default: {DEFAULT_TOP_K})",
)
parser.add_argument(
"--similarity-cutoff",
type=float,
default=DEFAULT_SIMILARITY_CUTOFF,
help=f"Minimum similarity score (default: {DEFAULT_SIMILARITY_CUTOFF}, use 0 to disable)",
)
parser.add_argument(
"--max-files",
type=int,
default=10,
help="Maximum files to list in response (default: 10)",
)
return parser.parse_args()
def main():
args = parse_args()
# Handle similarity cutoff of 0 as "disabled"
similarity_cutoff = args.similarity_cutoff if args.similarity_cutoff > 0 else None
try:
print(f"Loading models (LLM: {args.llm})...")
load_models(llm_name=args.llm)
print(f"Loading index from {args.storage_dir}...")
query_engine = load_query_engine(
storage_dir=args.storage_dir,
top_k=args.top_k,
similarity_cutoff=similarity_cutoff,
max_files=args.max_files,
)
query_text = " ".join(args.query)
print(f"Querying: {query_text[:100]}{'...' if len(query_text) > 100 else ''}")
response = query_engine.query(query_text)
print_results(response)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error during query: {e}", file=sys.stderr)
raise
if __name__ == "__main__":
main()

archived/query_exp.py Normal file

@@ -0,0 +1,106 @@
# query_topk.py
# Run a query on a vector store
#
# This version implements a prompt and uses the build_exp.py vector store
# It is based on query_topk.py
# It uses 10 top-k results and a custom prompt
# The next version after this is query_rewrite.py
# build_exp.py modifies the chunk size and overlap from the original build.py
#
# E.M.F. August 2025
from llama_index.core import (
    StorageContext,
    load_index_from_storage,
    Settings,  # ServiceContext was removed in modern LlamaIndex and was unused here
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
# LLM model to use in query transform and generation
llm="llama3.1:8B"
# Other models tried:
# llm="deepseek-r1:8B"
# llm="gemma3:1b"
# Custom prompt for the query engine
PROMPT = PromptTemplate(
"""You are an expert research assistant. You are given top-ranked journal excerpts (CONTEXT) and a user's QUERY.
Instructions:
- Base your response *only* on the CONTEXT.
- The snippets are ordered from most to least relevant; prioritize insights from earlier (higher-ranked) snippets.
- Aim to reference *as many distinct* relevant files as possible (up to 10).
- Do not invent or generalize; refer to specific passages or facts only.
- If a passage only loosely matches, deprioritize it.
Format your answer in two parts:
1. **Summary Theme**
Summarize the dominant theme from the relevant context.
2. **Matching Files**
Make a bullet list of 10. The format for each should be:
**<filename>** - <rationale tied to content. Include date or section hints if available.>
CONTEXT:
{context_str}
QUERY:
{query_str}
Now provide the theme and list of matching files."""
)
#
# Main program routine
#
def main():
# Use a local model to generate
Settings.llm = Ollama(
model=llm, # First model tested
request_timeout=360.0,
context_window=8000
)
# Load embedding model (same as used for vector store)
embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage_exp")
index = load_index_from_storage(storage_context)
# Build regular query engine with custom prompt
query_engine = index.as_query_engine(
similarity_top_k=10, # pull wide
#response_mode="compact" # concise synthesis
text_qa_template=PROMPT, # custom prompt
# node_postprocessors=[
# SimilarityPostprocessor(similarity_cutoff=0.75) # keep strong hits; makes result count flexible
# ],
)
# Query
while True:
q = input("\nEnter your question (or 'exit'): ").strip()
if q.lower() in ("exit", "quit"):
break
print()
response = query_engine.query(q)
# Return the query response and source documents
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
print(meta.get("file_name"), "---", meta.get("file_path"), getattr(node, "score", None))
if __name__ == "__main__":
main()

archived/query_multitool.py Normal file

@@ -0,0 +1,106 @@
"""
This is output generated by ChatGPT to implement a new regex + vector search engine
"""
from __future__ import annotations
from typing import List, Iterable
import json, re
from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import NodeWithScore, QueryBundle
# Note: llama_index.core has no EnsembleRetriever; QueryFusionRetriever is its
# weighted-fusion retriever (Reciprocal Rank Fusion in "reciprocal_rerank" mode)
from llama_index.core.retrievers import BaseRetriever, QueryFusionRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import Document
# 0) Configure your LLM + embeddings up front
# Example: Settings.llm = <your Command-R wrapper> ; Settings.embed_model = <your embeddings>
# (You can also pass an llm explicitly into the retriever if you prefer.)
# Settings.llm.complete("hello") should work in v0.10+
# 1) Prepare nodes once (so regex + vector share the same chunks)
def build_nodes(docs: List[Document], chunk_size: int = 1024, overlap: int = 100):
splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
return splitter.get_nodes_from_documents(docs)
# 2) LLM-guided regex retriever
class RegexRetriever(BaseRetriever):
def __init__(self, nodes: Iterable, llm=None, top_k: int = 5, flags=re.IGNORECASE):
super().__init__()
self._nodes = list(nodes)
self._llm = llm or Settings.llm
self._top_k = top_k
self._flags = flags
def _extract_terms(self, query: str) -> List[str]:
"""Ask the LLM for up to ~6 distinctive keywords/short phrases. Return a list of strings."""
prompt = f"""
You extract search terms for a boolean/regex search.
Query: {query}
Rules:
- Return ONLY a JSON array of strings.
- Use up to 6 concise keywords/short phrases.
- Keep phrases short (<= 3 words).
- Avoid stopwords, punctuation, and generic terms.
- No explanations, no extra text.
"""
raw = self._llm.complete(prompt).text.strip()
try:
terms = json.loads(raw)
# basic sanitize
terms = [t for t in terms if isinstance(t, str) and t.strip()]
except Exception:
# simple fall-back if JSON parse fails
terms = [w for w in re.findall(r"\w+", query) if len(w) > 2][:6]
return terms[:6]
def _compile_patterns(self, terms: List[str]) -> List[re.Pattern]:
pats = []
for t in terms:
# Escape user/LLM output, add word boundaries; allow whitespace inside short phrases
escaped = re.escape(t)
# turn '\ ' (escaped space) back into '\s+' to match any whitespace in phrases
escaped = escaped.replace(r"\ ", r"\s+")
pats.append(re.compile(rf"\b{escaped}\b", self._flags))
return pats
def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
terms = self._extract_terms(query_bundle.query_str)
patterns = self._compile_patterns(terms)
scored: List[tuple] = []
for n in self._nodes:
txt = n.get_content(metadata_mode="all")
hits = 0
for p in patterns:
if p.search(txt):
hits += 1
if hits:
# simple score = number of distinct term hits (you can weight phrase vs single word if you like)
scored.append((n, float(hits)))
scored.sort(key=lambda x: x[1], reverse=True)
return [NodeWithScore(node=n, score=s) for n, s in scored[: self._top_k]]
# 3) Wire it all together
def build_query_engine(docs: List[Document], k_vec=5, k_regex=5, weights=(0.7, 0.3)):
nodes = build_nodes(docs)
# Vector index over the SAME nodes
vindex = VectorStoreIndex(nodes)
vector_ret = vindex.as_retriever(similarity_top_k=k_vec)
regex_ret = RegexRetriever(nodes, top_k=k_regex)
    # ChatGPT's original used an EnsembleRetriever, which does not exist in
    # llama_index.core; QueryFusionRetriever provides the same weighted RRF fusion
    ensemble = QueryFusionRetriever(
        retrievers=[vector_ret, regex_ret],
        retriever_weights=list(weights),  # tune: more recall from regex? bump its weight
        mode="reciprocal_rerank",         # Reciprocal Rank Fusion
        num_queries=1,                    # fuse only; skip LLM query generation
        similarity_top_k=k_vec + k_regex,
    )
return RetrieverQueryEngine(retriever=ensemble)
# 4) Use it
# docs = SimpleDirectoryReader("data").load_data()
# qe = build_query_engine(docs)
# print(qe.query("Find entries with strong feelings of depression."))


@@ -0,0 +1,126 @@
# query_rewrite_hyde.py
# Run a query on a vector store
#
# Latest experiment to include query rewriting using HyDE (Hypothetical Document Embeddings)
# The goal is to reduce the semantic gap between the query and the indexed documents
# This version implements a prompt and uses the build_exp.py vector store
# Based on query_exp.py
#
# E.M.F. July 2025
from llama_index.core import (
    StorageContext,
    load_index_from_storage,
    Settings,  # ServiceContext was removed in modern LlamaIndex and was unused here
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine.transform_query_engine import TransformQueryEngine
import os
# Globals
# Embedding model used in vector store (this should match the one in build_exp.py or equivalent)
# embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
embed_model = HuggingFaceEmbedding(cache_folder="./models",model_name="BAAI/bge-large-en-v1.5")
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# LLM model to use in query transform and generation
llm="llama3.1:8B"
# Other models tried:
# llm="deepseek-r1:8B"
# llm="gemma3:1b"
# Custom prompt for the query engine
PROMPT = PromptTemplate(
"""You are an expert research assistant. You are given top-ranked writing excerpts (CONTEXT) and a user's QUERY.
Instructions:
- Base your response *only* on the CONTEXT.
- The snippets are ordered from most to least relevant; prioritize insights from earlier (higher-ranked) snippets.
- Aim to reference *as many distinct* relevant files as possible (up to 10).
- Do not invent or generalize; refer to specific passages or facts only.
- If a passage only loosely matches, deprioritize it.
Format your answer in two parts:
1. **Summary Theme**
Summarize the dominant theme from the relevant context in a few sentences.
2. **Matching Files**
Make a list of 10 matching files. The format for each should be:
<filename> <rationale tied to content. Include date or section hints if available.>
CONTEXT:
{context_str}
QUERY:
{query_str}
Now provide the theme and list of matching files."""
)
#
# Main program routine
#
def main():
# Use a local model to generate
Settings.llm = Ollama(
model=llm, # First model tested
request_timeout=360.0,
context_window=8000
)
# Load embedding model (same as used for vector store)
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage_exp")
index = load_index_from_storage(storage_context)
# Build regular query engine with custom prompt
base_query_engine = index.as_query_engine(
similarity_top_k=15, # pull wide
#response_mode="compact" # concise synthesis
text_qa_template=PROMPT, # custom prompt
# node_postprocessors=[
# SimilarityPostprocessor(similarity_cutoff=0.75) # keep strong hits; makes result count flexible
# ],
)
# HyDE is "Hypothetical Document Embeddings"
# It generates a hypothetical document based on the query
# and uses that to augment the query
# Here we include the original query as well
# I get better similarity values with include_original=True
hyde_transform = HyDEQueryTransform(llm=Settings.llm,include_original=True)
# Query
while True:
q = input("\nEnter a search topic or question (or 'exit'): ").strip()
if q.lower() in ("exit", "quit"):
break
print()
# The query uses a HyDE transformation to rewrite the query
query_engine = TransformQueryEngine(base_query_engine, query_transform=hyde_transform)
# Generate the response by querying the engine
# This performs the similarity search and then applies the prompt
response = query_engine.query(q)
# Return the query response and source documents
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
print(meta.get("file_name"), "---", meta.get("file_path"), getattr(node, "score", None))
if __name__ == "__main__":
main()
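For reference, the HyDE idea used above can be sketched independently of LlamaIndex: generate a hypothetical answer document with an LLM, embed it (optionally alongside the original query), and search with the combined vector. Both `generate()` and `embed()` below are hypothetical stand-ins for the real Ollama and HuggingFace calls, not this repo's code.

```python
# Conceptual HyDE sketch with stubbed LLM and embedder (stand-ins only).
def generate(prompt):
    # stand-in for an LLM call (e.g. Ollama)
    return "A heavy, persistent sadness made even small tasks feel impossible."

def embed(text):
    # stand-in for an embedding model call
    return [float(len(w)) for w in text.split()]

def hyde_query_embedding(query, include_original=True):
    hypothetical_doc = generate(f"Write a passage answering: {query}")
    texts = [hypothetical_doc] + ([query] if include_original else [])
    vectors = [embed(t) for t in texts]
    # pad to a common length, then average (one simple way to combine)
    dim = max(len(v) for v in vectors)
    padded = [v + [0.0] * (dim - len(v)) for v in vectors]
    return [sum(col) / len(padded) for col in zip(*padded)]

vec = hyde_query_embedding("Find entries with strong feelings of depression.")
```

The hypothetical document usually sits closer in embedding space to real journal entries than the terse query does, which is the whole point of the transform.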

58
archived/query_topk.py Normal file

@ -0,0 +1,58 @@
# query_topk.py
# Run a query on a vector store
#
# E.M.F. July 2025
# August 2025 - updated for nd ssearch
# this version uses top-k similarity
from llama_index.core import (
StorageContext,
load_index_from_storage,
ServiceContext,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
# Use a local model to generate
Settings.llm = Ollama(
model="llama3.1:8B", # First model tested
# model="deepseek-r1:8B", # This model shows its reasoning
# model="gemma3:1b",
request_timeout=360.0,
context_window=8000
)
def main():
# Load embedding model (same as used for vector store)
embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine(similarity_top_k=5)
# Query
while True:
q = input("\nEnter your question (or 'exit'): ").strip()
if q.lower() in ("exit", "quit"):
break
print()
response = query_engine.query(q)
# Return the query response and source documents
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
print(meta.get("file_name"), "---", meta.get("file_path"), getattr(node, "score", None))
if __name__ == "__main__":
main()


@ -0,0 +1,123 @@
# query_topk_prompt.py
# Run a query on a vector store
#
# This version derives from query_rewrite_hyde.py, removing HyDE and using a custom prompt
# It uses the build_exp.py vector store with BAAI/bge-large-en-v1.5
# Based on query_exp.py->query_topk.py->query_rewrite_hyde.py
# The results are as good as with HyDE.
#
# E.M.F. August 2025
from llama_index.core import (
StorageContext,
load_index_from_storage,
ServiceContext,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
import os
#
# Globals
#
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Embedding model used in vector store (this should match the one in build_exp.py or equivalent)
# embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
embed_model = HuggingFaceEmbedding(cache_folder="./models",model_name="BAAI/bge-large-en-v1.5")
# LLM model to use in query transform and generation
# command-r7b generates about as quickly as llama3.1:8B, but provides results that stick better
# to the provided context
llm="command-r7b"
# Other models tried:
#llm="llama3.1:8B"
#llm="deepseek-r1:8B"
#llm="gemma3:1b"
#
# Custom prompt for the query engine
#
PROMPT = PromptTemplate(
"""You are an expert research assistant. You are given top-ranked writing excerpts (CONTEXT) and a user's QUERY.
Instructions:
- Base your response *only* on the CONTEXT.
- The snippets are ordered from most to least relevant; prioritize insights from earlier (higher-ranked) snippets.
- Aim to reference *as many distinct* relevant files as possible (up to 10).
- Do not invent or generalize; refer to specific passages or facts only.
- If a passage only loosely matches, deprioritize it.
Format your answer in two parts:
1. **Summary Theme**
Summarize the dominant theme from the relevant context in a few sentences.
2. **Matching Files**
Make a list of 10 matching files. The format for each should be:
<filename> -
<rationale tied to content. Include date or section hints if available.>
CONTEXT:
{context_str}
QUERY:
{query_str}
Now provide the theme and list of matching files."""
)
#
# Main program routine
#
def main():
# Use a local model to generate -- in this case using Ollama
Settings.llm = Ollama(
model=llm, # First model tested
request_timeout=360.0,
context_window=8000
)
# Load embedding model (same as used for vector store)
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage_exp")
index = load_index_from_storage(storage_context)
# Build regular query engine with custom prompt
query_engine = index.as_query_engine(
similarity_top_k=15, # pull wide
#response_mode="compact" # concise synthesis
text_qa_template=PROMPT, # custom prompt
# node_postprocessors=[
# SimilarityPostprocessor(similarity_cutoff=0.75) # keep strong hits; makes result count flexible
# ],
)
# Query
while True:
q = input("\nEnter a search topic or question (or 'exit'): ").strip()
if q.lower() in ("exit", "quit"):
break
print()
# Generate the response by querying the engine
# This performs the similarity search and then applies the prompt
response = query_engine.query(q)
# Return the query response and source documents
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
print(f"{meta.get('file_name')} {meta.get('file_path')} {getattr(node, 'score', None)}")
if __name__ == "__main__":
main()


@ -0,0 +1,134 @@
# query_topk_prompt_dw.py
# Run a query on a vector store
#
# This version derives from query_rewrite_hyde.py, removing HyDE and using a custom prompt
# It uses the build_exp.py vector store with BAAI/bge-large-en-v1.5
# Based on query_exp.py->query_topk.py->query_rewrite_hyde.py
# The results are as good as with HyDE.
# Modified for terminal output (132 columns)
#
# E.M.F. August 2025
from llama_index.core import (
StorageContext,
load_index_from_storage,
ServiceContext,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
import os
import sys
import textwrap
# Wrap printed output for a wide terminal (131 columns, per the 132-column note above)
class WrapTerminal:
def write(self, text):
for line in text.splitlines():
sys.__stdout__.write(textwrap.fill(line, width=131) + "\n")
def flush(self):
sys.__stdout__.flush()
sys.stdout = WrapTerminal()
#
# Globals
#
# Embedding model used in vector store (this should match the one in build_exp.py or equivalent)
# embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
embed_model = HuggingFaceEmbedding(cache_folder="./models",model_name="BAAI/bge-large-en-v1.5")
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# LLM model to use in query transform and generation
# command-r7b generates about as quickly as llama3.1:8B, but provides results that stick better
# to the provided context
llm="command-r7b"
# Other models tried:
#llm="llama3.1:8B"
# llm="deepseek-r1:8B"
# llm="gemma3:1b"
# Custom prompt for the query engine
PROMPT = PromptTemplate(
"""You are an expert research assistant. You are given top-ranked writing excerpts (CONTEXT) and a user's QUERY.
Instructions:
- Base your response *only* on the CONTEXT.
- The snippets are ordered from most to least relevant; prioritize insights from earlier (higher-ranked) snippets.
- Aim to reference *as many distinct* relevant files as possible (up to 10).
- Do not invent or generalize; refer to specific passages or facts only.
- If a passage only loosely matches, deprioritize it.
Format your answer in two parts:
1. **Summary Theme**
Summarize the dominant theme from the relevant context in a few sentences.
2. **Matching Files**
Make a list of 10 matching files. The format for each should be:
<filename> -
<rationale tied to content. Include date or section hints if available.>
CONTEXT:
{context_str}
QUERY:
{query_str}
Now provide the theme and list of matching files."""
)
#
# Main program routine
#
def main():
# Use a local model to generate
Settings.llm = Ollama(
model=llm, # First model tested
request_timeout=360.0,
context_window=8000
)
# Load embedding model (same as used for vector store)
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage_exp")
index = load_index_from_storage(storage_context)
# Build regular query engine with custom prompt
query_engine = index.as_query_engine(
similarity_top_k=15, # pull wide
#response_mode="compact" # concise synthesis
text_qa_template=PROMPT, # custom prompt
# node_postprocessors=[
# SimilarityPostprocessor(similarity_cutoff=0.75) # keep strong hits; makes result count flexible
# ],
)
# Query
while True:
q = input("\nEnter a search topic or question (or 'exit'): ").strip()
if q.lower() in ("exit", "quit"):
break
print()
# Generate the response by querying the engine
# This performs the similarity search and then applies the prompt
response = query_engine.query(q)
# Return the query response and source documents
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
print(f"{meta.get('file_name')} {meta.get('file_path')} {getattr(node, 'score', None)}", end="")
if __name__ == "__main__":
main()


@ -0,0 +1,123 @@
# query_topk_prompt_engine.py
# Run a query on a vector store
#
# This version is query_topk_prompt.py but the query is passed though the command line.
#
# Implements a prompt and uses the build_exp.py vector store with BAAI/bge-large-en-v1.5
# Based on query_exp.py->query_topk.py
#
# E.M.F. August 2025
from llama_index.core import (
StorageContext,
load_index_from_storage,
ServiceContext,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
import os
import sys
#
# Globals
#
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Embedding model used in vector store (this should match the one in build_exp.py or equivalent)
# embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
embed_model = HuggingFaceEmbedding(cache_folder="./models",model_name="BAAI/bge-large-en-v1.5",local_files_only=True)
# LLM model to use in query transform and generation
# command-r7b generates about as quickly as llama3.1:8B, but provides results that stick better
# to the provided context
llm="command-r7b"
# Other models tried:
#llm="llama3.1:8B"
#llm="deepseek-r1:8B"
#llm="gemma3:1b"
#
# Custom prompt for the query engine
#
PROMPT = PromptTemplate(
"""You are an expert research assistant. You are given top-ranked writing excerpts (CONTEXT) and a user's QUERY.
Instructions:
- Base your response *only* on the CONTEXT.
- The snippets are ordered from most to least relevant; prioritize insights from earlier (higher-ranked) snippets.
- Aim to reference *as many distinct* relevant files as possible (up to 10).
- Do not invent or generalize; refer to specific passages or facts only.
- If a passage only loosely matches, deprioritize it.
Format your answer in two parts:
1. **Summary Theme**
Summarize the dominant theme from the relevant context in a few sentences.
2. **Matching Files**
Make a list of 10 matching files. The format for each should be:
<filename> -
<rationale tied to content. Include date or section hints if available.>
CONTEXT:
{context_str}
QUERY:
{query_str}
Now provide the theme and list of matching files."""
)
#
# Main program routine
#
def main():
# Use a local model to generate -- in this case using Ollama
Settings.llm = Ollama(
model=llm, # First model tested
request_timeout=360.0,
context_window=8000
)
# Load embedding model (same as used for vector store)
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage_exp")
index = load_index_from_storage(storage_context)
# Build regular query engine with custom prompt
query_engine = index.as_query_engine(
similarity_top_k=15, # pull wide
#response_mode="compact" # concise synthesis
text_qa_template=PROMPT, # custom prompt
# node_postprocessors=[
# SimilarityPostprocessor(similarity_cutoff=0.75) # keep strong hits; makes result count flexible
# ],
)
# Query
if len(sys.argv) < 2:
print("Usage: python query.py QUERY_TEXT")
sys.exit(1)
q = " ".join(sys.argv[1:])
# Generate the response by querying the engine
# This performs the similarity search and then applies the prompt
response = query_engine.query(q)
# Return the query response and source documents
print("\nResponse:\n")
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
score = getattr(node, "score", None)
score_str = f"{score:.3f}" if score is not None else "None"
print(f"{meta.get('file_name')} {meta.get('file_path')} {score_str}")
if __name__ == "__main__":
main()


@ -0,0 +1,136 @@
# query_topk_prompt_engine_v3.py
# Run a query on a vector store with cross-encoder re-ranking
#
# Based on v2. Adds a cross-encoder re-ranking step:
# 1. Retrieve top-30 chunks via vector similarity (bi-encoder, fast)
# 2. Re-rank to top-15 using a cross-encoder (slower but more accurate)
# 3. Pass re-ranked chunks to LLM for synthesis
#
# The cross-encoder scores each (query, chunk) pair jointly, which captures
# nuance that bi-encoder dot-product similarity misses.
#
# E.M.F. February 2026
# Environment vars must be set before importing huggingface/transformers
# libraries, because huggingface_hub.constants evaluates HF_HUB_OFFLINE
# at import time.
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "./models"
os.environ["HF_HUB_OFFLINE"] = "1"
from llama_index.core import (
StorageContext,
load_index_from_storage,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
from llama_index.core.postprocessor import SentenceTransformerRerank
import sys
#
# Globals
#
# Embedding model used in vector store (must match build_exp_claude.py)
EMBED_MODEL = HuggingFaceEmbedding(cache_folder="./models", model_name="BAAI/bge-large-en-v1.5", local_files_only=True)
# LLM model for generation
llm = "command-r7b"
# Cross-encoder model for re-ranking (cached in ./models/)
#RERANK_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
RERANK_MODEL = "cross-encoder/ms-marco-MiniLM-L-12-v2"
#RERANK_MODEL = "cross-encoder/stsb-roberta-base"
#RERANK_MODEL = "BAAI/bge-reranker-v2-m3"
RERANK_TOP_N = 15 # keep top 15 after re-ranking
RETRIEVE_TOP_K = 30 # retrieve wider pool for re-ranker to work with
#
# Custom prompt for the query engine - Version 3
#
# Adapted for re-ranked context: every excerpt below has been scored for
# relevance by a cross-encoder, so even lower-ranked ones are worth examining.
# The prompt encourages the LLM to draw from all provided excerpts and to
# note what each distinct file contributes rather than collapsing onto one.
#
PROMPT = PromptTemplate(
"""You are a precise research assistant analyzing excerpts from a personal journal collection.
Every excerpt below has been selected and ranked for relevance to the query.
CONTEXT (ranked by relevance):
{context_str}
QUERY:
{query_str}
Instructions:
- Answer ONLY using information explicitly present in the CONTEXT above
- Examine ALL provided excerpts, not just the top few -- each one was selected for relevance
- Be specific: quote or closely paraphrase key passages and cite their file names
- When multiple files touch on the query, note what each one contributes
- If the context doesn't contain enough information to answer fully, say so
Your response should:
1. Directly answer the query, drawing on as many relevant excerpts as possible
2. Reference specific files and their content (e.g., "In <filename>, ...")
3. End with a list of all files that contributed to your answer, with a brief note on each
If the context is insufficient, explain what's missing."""
)
#
# Main program routine
#
def main():
# Use a local model to generate -- in this case using Ollama
Settings.llm = Ollama(
model=llm,
request_timeout=360.0,
context_window=8000
)
# Load embedding model (same as used for vector store)
Settings.embed_model = EMBED_MODEL
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage_exp")
index = load_index_from_storage(storage_context)
# Cross-encoder re-ranker
reranker = SentenceTransformerRerank(
model=RERANK_MODEL,
top_n=RERANK_TOP_N,
)
# Build query engine: retrieve wide (top-30), re-rank to top-15, then synthesize
query_engine = index.as_query_engine(
similarity_top_k=RETRIEVE_TOP_K,
text_qa_template=PROMPT,
node_postprocessors=[reranker],
)
# Query
if len(sys.argv) < 2:
print("Usage: python query_topk_prompt_engine_v3.py QUERY_TEXT")
sys.exit(1)
q = " ".join(sys.argv[1:])
# Generate the response by querying the engine
response = query_engine.query(q)
# Return the query response and source documents
print("\nResponse:\n")
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
score = getattr(node, "score", None)
score_str = f"{score:.3f}" if score is not None else "None"
print(f"{meta.get('file_name')} {meta.get('file_path')} {score_str}")
if __name__ == "__main__":
main()
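The retrieve-wide-then-re-rank pattern above can be illustrated without any models. In this sketch the cross-encoder is replaced by a dummy word-overlap scorer; a real version would instead call something like `sentence_transformers.CrossEncoder(...).predict` on (query, chunk) pairs, which is an assumption of the sketch, not code from this repo.

```python
# Two-stage sketch: score every (query, candidate) pair, keep the top_n.
def rerank(query, candidates, score_pair, top_n):
    scored = sorted(candidates, key=lambda c: score_pair(query, c), reverse=True)
    return scored[:top_n]

# Dummy pairwise scorer: shared-word count. A real cross-encoder jointly
# encodes the pair and outputs a learned relevance score instead.
def word_overlap(query, chunk):
    return len(set(query.lower().split()) & set(chunk.lower().split()))

pool = [
    "journal entry about a long hike",
    "felt deep depression all week",
    "depression returned, heavy week again",
]
top = rerank("depression this week", pool, word_overlap, top_n=2)
print(top)  # the two depression entries rank above the hike entry
```

The expensive part in practice is that the scorer runs once per (query, chunk) pair, which is why the first-pass retriever trims the pool to RETRIEVE_TOP_K before re-ranking.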

60
archived/query_tree.py Normal file

@ -0,0 +1,60 @@
# query_tree.py
#
# Run a query on a vector store
# This is to test summarization using a tree-summarize response mode
# It doesn't work very well, perhaps because of the structure of the data
#
# E.M.F. August 2025
from llama_index.core import (
StorageContext,
load_index_from_storage,
ServiceContext,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
# Use a local model to generate
Settings.llm = Ollama(
model="llama3.1:8B", # First model tested
# model="deepseek-r1:8B", # This model shows its reasoning
# model="gemma3:1b",
request_timeout=360.0,
context_window=8000
)
def main():
# Load embedding model (same as used for vector store)
embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine(response_mode="tree_summarize")
# Query
while True:
q = input("\nEnter your question (or 'exit'): ").strip()
if q.lower() in ("exit", "quit"):
break
print()
response = query_engine.query(q)
# Return the query response and source documents
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
print(meta.get("file_name"), "---", meta.get("file_path"), getattr(node, "score", None))
if __name__ == "__main__":
main()
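As background on why tree-summarize may struggle here: tree summarization condenses chunks in small groups, then condenses the summaries, recursively, until one remains. Each pass loses detail, which hurts on loosely structured journal data. A toy sketch with a stub summarizer (`summarize()` stands in for the LLM call; it joins rather than compresses, purely for illustration):

```python
# Toy tree-summarize sketch: fold chunks together level by level until
# one summary remains. summarize() is a stand-in for an LLM call.
def summarize(texts):
    return " / ".join(texts)  # stub: a real summarizer would compress

def tree_summarize(chunks, fan_in=2):
    level = list(chunks)
    while len(level) > 1:
        level = [summarize(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

print(tree_summarize(["a", "b", "c"]))  # 'a / b / c' after two passes
```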

97
archived/retrieve_raw.py Normal file

@ -0,0 +1,97 @@
# retrieve_raw.py
# Verbatim chunk retrieval: vector search + cross-encoder re-ranking, no LLM.
#
# Returns the top re-ranked chunks with their full text, file metadata, and
# scores. Useful for browsing source material directly and verifying what
# the RAG pipeline retrieves before LLM synthesis.
#
# Uses the same vector store, embedding model, and re-ranker as
# query_topk_prompt_engine_v3.py, but skips the LLM step entirely.
#
# E.M.F. February 2026
# Environment vars must be set before importing huggingface/transformers
# libraries, because huggingface_hub.constants evaluates HF_HUB_OFFLINE
# at import time.
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "./models"
os.environ["HF_HUB_OFFLINE"] = "1"
from llama_index.core import (
StorageContext,
load_index_from_storage,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.postprocessor import SentenceTransformerRerank
import sys
import textwrap
#
# Globals
#
# Embedding model (must match build_exp_claude.py)
EMBED_MODEL = HuggingFaceEmbedding(cache_folder="./models", model_name="BAAI/bge-large-en-v1.5", local_files_only=True)
# Cross-encoder model for re-ranking (cached in ./models/)
RERANK_MODEL = "cross-encoder/ms-marco-MiniLM-L-12-v2"
RERANK_TOP_N = 15
RETRIEVE_TOP_K = 30
# Output formatting
WRAP_WIDTH = 80
def main():
# No LLM needed -- set embed model only
Settings.embed_model = EMBED_MODEL
# Load persisted vector store
storage_context = StorageContext.from_defaults(persist_dir="./storage_exp")
index = load_index_from_storage(storage_context)
# Build retriever (vector search only, no query engine / LLM)
retriever = index.as_retriever(similarity_top_k=RETRIEVE_TOP_K)
# Cross-encoder re-ranker
reranker = SentenceTransformerRerank(
model=RERANK_MODEL,
top_n=RERANK_TOP_N,
)
# Query
if len(sys.argv) < 2:
print("Usage: python retrieve_raw.py QUERY_TEXT")
sys.exit(1)
q = " ".join(sys.argv[1:])
# Retrieve and re-rank
nodes = retriever.retrieve(q)
reranked = reranker.postprocess_nodes(nodes, query_str=q)
# Output
print(f"\nQuery: {q}")
print(f"Retrieved {len(nodes)} chunks, re-ranked to top {len(reranked)}\n")
for i, node in enumerate(reranked, 1):
meta = getattr(node, "metadata", None) or node.node.metadata
score = getattr(node, "score", None)
file_name = meta.get("file_name", "unknown")
text = node.get_content()
print("="*WRAP_WIDTH)
print(f"=== [{i}] {file_name} (score: {score:.3f}) ")
print("="*WRAP_WIDTH)
# Wrap text for readability
for line in text.splitlines():
if line.strip():
print(textwrap.fill(line, width=WRAP_WIDTH))
else:
print()
print()
if __name__ == "__main__":
main()

27
archived/vs_metrics.py Normal file

@ -0,0 +1,27 @@
# vs_metrics.py
# Quantify vector store properties and performance
#
# E.M.F. August 2025
# Read in vector store
# What are properties of the vector store?
# - number of vectors
# - distribution of distances
# - clustering?
from llama_index.core import (
StorageContext,
load_index_from_storage,
ServiceContext,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# Load embedding model (same as used for vector store)
embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
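One way to start answering the distance-distribution question above, sketched on plain Python lists. How to pull the raw vectors out of the LlamaIndex store depends on the store backend, so that step is left out here.

```python
# Sample pairwise cosine similarities from a set of embedding vectors
# and report simple distribution stats.
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pairwise_cosine_stats(vectors, sample=1000, seed=0):
    rng = random.Random(seed)
    sims = []
    for _ in range(min(sample, len(vectors) * (len(vectors) - 1) // 2)):
        i, j = rng.sample(range(len(vectors)), 2)
        sims.append(cosine(vectors[i], vectors[j]))
    sims.sort()
    return {"min": sims[0], "median": sims[len(sims) // 2], "max": sims[-1]}

demo = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(pairwise_cosine_stats(demo, sample=3))
```

A tight similarity distribution (e.g. everything between 0.6 and 0.8) would suggest the chunks are hard to separate by cosine search alone; a wide one suggests natural clusters.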

100
deploy_public.sh Executable file

@ -0,0 +1,100 @@
#!/bin/bash
# deploy_public.sh — Deploy public files from main to the Forgejo public branch
#
# Usage: ./deploy_public.sh ["optional commit message"]
#
# Checks out the public branch, updates it with public files from main,
# generates a public README (stripping private sections), commits if
# anything changed, and pushes to origin. Then switches back to main.
#
# On first run (no public branch exists), creates an orphan branch.
#
# E.M.F. February 2026
set -e
# --- Configuration ---
# Files to include on the public branch
PUBLIC_FILES=(
build_store.py
query_hybrid.py
retrieve.py
search_keywords.py
run_query.sh
clippings_search/build_clippings.py
clippings_search/retrieve_clippings.py
requirements.txt
.gitignore
LICENSE
)
REMOTE="origin"
BRANCH="public"
COMMIT_MSG="${1:-Update public branch from main}"
# --- Safety checks ---
CURRENT=$(git branch --show-current)
if [ "$CURRENT" != "main" ]; then
echo "Error: must be on main branch (currently on $CURRENT)"
exit 1
fi
if ! git diff --quiet HEAD 2>/dev/null; then
echo "Error: uncommitted changes on main. Commit or stash first."
exit 1
fi
MAIN_HEAD=$(git rev-parse --short HEAD)
# --- Build public branch ---
echo "Deploying main ($MAIN_HEAD) -> $BRANCH..."
# Check out public branch, or create orphan if it doesn't exist yet
if git show-ref --verify --quiet "refs/heads/$BRANCH"; then
git checkout "$BRANCH"
else
echo "No local $BRANCH branch — creating orphan..."
git checkout --orphan "$BRANCH"
git rm -rf . >/dev/null 2>&1 || true
fi
# Copy public files from main
for f in "${PUBLIC_FILES[@]}"; do
git checkout main -- "$f"
done
# Generate public README from main's README:
# - Strip "## Notebooks" section
# - Strip "## Development history" section
# - Remove project-tree lines referencing private files
git checkout main -- README.md
awk '
/^## Notebooks/ { skip = 1; next }
/^## Development hist/ { skip = 1; next }
/^## / { skip = 0 }
skip { next }
/archived\// { next }
/saved_output\// { next }
/devlog\.md/ { next }
/\*\.ipynb/ { next }
{ print }
' README.md > README.tmp && mv README.tmp README.md
# Stage only the public files (not untracked files on disk)
git add "${PUBLIC_FILES[@]}" README.md
# Commit only if there are changes
if git diff --cached --quiet; then
echo "No changes to deploy."
else
git commit -m "$COMMIT_MSG"
git push "$REMOTE" "$BRANCH"
echo ""
echo "Done. Deployed main ($MAIN_HEAD) -> $REMOTE/$BRANCH"
fi
# Switch back to main
git checkout main

1035
devlog.md Normal file

File diff suppressed because it is too large

1189
devlog.txt Normal file

File diff suppressed because it is too large

249
hyde.ipynb Normal file

@ -0,0 +1,249 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "11d5ae50",
"metadata": {},
"source": [
"# Experimenting with HyDE\n",
"\n",
"Using this to explore query rewrites\\\n",
"August 2025"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "813f8b1a",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import (\n",
" StorageContext,\n",
" load_index_from_storage,\n",
" ServiceContext,\n",
" Settings,\n",
")\n",
"from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n",
"from llama_index.llms.ollama import Ollama\n",
"from llama_index.core.prompts import PromptTemplate\n",
"from llama_index.core.indices.query.query_transform import HyDEQueryTransform\n",
"from llama_index.core.query_engine.transform_query_engine import TransformQueryEngine"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "f3d65589",
"metadata": {},
"outputs": [],
"source": [
"llm=\"llama3.1:8B\"\n",
"\n",
"# Use a local model to generate\n",
"Settings.llm = Ollama(\n",
" model=llm, # First model tested\n",
" request_timeout=360.0,\n",
" context_window=8000,\n",
" temperature=0.7,\n",
" )\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "afd593ee",
"metadata": {},
"outputs": [],
"source": [
"# Load embedding model (same as used for vector store)\n",
"embed_model = HuggingFaceEmbedding(model_name=\"all-mpnet-base-v2\")\n",
"Settings.embed_model = embed_model"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "04c702a2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original query: Find entries with strong feelings of depression.\n",
"HyDE-generated query (used for embedding):\n",
" Find entries with strong feelings of depression.\n"
]
}
],
"source": [
"#Initial query\n",
"initial_query = \"Find entries with strong feelings of depression.\"\n",
"\n",
"# Define a custom HyDE prompt (this is fully supported)\n",
"hyde_prompt = PromptTemplate(\n",
" \"You are a helpful assistant. Generate a detailed hypothetical answer to the user query below.\\n\\nQuery: {query_str}\\n\\nAnswer:\"\n",
")\n",
"\n",
"hyde_transform = HyDEQueryTransform(llm=Settings.llm,hyde_prompt=hyde_prompt,include_original=False)\n",
"\n",
"# Run the transform manually\n",
"hyde_query = hyde_transform.run(initial_query)\n",
"\n",
"# Print the result\n",
"print(\"Original query:\", initial_query)\n",
"print(\"HyDE-generated query (used for embedding):\\n\", hyde_query.query_str)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "3b211daf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are many important feelings that people experience in their lives. Here are some examples:\n",
"\n",
"1. **Love**: A strong affection or attachment to someone, which can be romantic, familial, or platonic.\n",
"2. **Happiness**: A positive emotional state characterized by a sense of joy, contentment, and satisfaction.\n",
"3. **Empathy**: The ability to understand and share the feelings of others, which is essential for building strong relationships and fostering compassion.\n",
"4. **Gratitude**: Feeling thankful or appreciative for something or someone in one's life, which can cultivate a positive outlook and well-being.\n",
"5. **Compassion**: A feeling of concern and kindness towards others who are suffering or struggling, which can inspire acts of service and support.\n",
"6. **Confidence**: A sense of self-assurance and faith in one's abilities, which is essential for personal growth and achievement.\n",
"7. **Respect**: Feeling admiration or esteem for someone or something, which is necessary for building strong relationships and social bonds.\n",
"8. **Forgiveness**: The ability to let go of negative emotions and forgive oneself or others for past mistakes or hurtful actions.\n",
"9. **Excitement**: A feeling of enthusiasm and eagerness, often accompanied by a sense of anticipation or adventure.\n",
"10. **Serenity**: A state of calmness and peace, which can be cultivated through mindfulness and self-reflection.\n",
"\n",
"These feelings are essential for human well-being and relationships, and they play important roles in shaping our experiences and interactions with others.\n",
"\n",
"Would you like me to expand on any of these feelings or explore other emotions?\n"
]
}
],
"source": [
"# Check that the LLM is working\n",
"# confirmed that this generates different responses each time\n",
"response = Settings.llm.complete(\"What are several important feelings?\")\n",
"print(response.text)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "9db5c9c2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HyDE output:\n",
" Find entries with strong feelings of depression.\n"
]
}
],
"source": [
    "# Test for silent errors; the printed output confirms the transform is working.\n",
"try:\n",
" hyde_result = hyde_transform.run(initial_query)\n",
" print(\"HyDE output:\\n\", hyde_result)\n",
"except Exception as e:\n",
" print(\"LLM error:\", e)"
]
},
{
"cell_type": "markdown",
"id": "d5add1ed",
"metadata": {},
"source": [
"## Testing HyDE based on llamaindex documentation\n",
"\n",
"https://docs.llamaindex.ai/en/stable/examples/query_transformations/HyDEQueryTransformDemo/#querying-without-transformation-yields-reasonable-answer"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "90381bc2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[\"Here is a passage that includes several key details about depression:\\n\\n**The Descent into Darkness**\\n\\nAs I lay in bed, staring blankly at the ceiling, I felt an overwhelming sense of hopelessness wash over me. The darkness seemed to close in around me, suffocating me with its crushing weight. Every thought felt like a burden, every decision a chore. I couldn't bear the idea of getting out of bed, of facing another day filled with anxiety and despair.\\n\\nI had been struggling with depression for what felt like an eternity. The symptoms had started slowly, a nagging feeling that something was off, but I had tried to brush it aside as mere exhaustion or stress. But as time went on, the feelings intensified, until they became all-consuming. I felt like I was drowning in a sea of sadness, unable to find a lifeline.\\n\\nThe smallest things would set me off - a harsh word from a loved one, a missed deadline at work, even just getting out of bed and facing another day. The world seemed too much for me to handle, and I retreated into my own private hell of despair. I couldn't eat, couldn't sleep, couldn't find any joy in the things that used to bring me happiness.\\n\\nAs I looked back on the past few months, I realized that this wasn't just a passing phase or a normal response to stress. Depression had taken hold, and it was suffocating me. I knew I needed help, but the thought of seeking treatment seemed daunting, even terrifying. What if they couldn't help me? What if I was stuck in this pit forever?\\n\\nI felt like I was losing myself, bit by bit, as depression consumed me. I longed for a glimmer of hope, a spark of light to guide me through the darkness. 
But it seemed elusive, always just out of reach.\\n\\nThis passage includes several key details about depression, including:\\n\\n* **Overwhelming feelings of sadness and hopelessness**: The protagonist feels an intense sense of despair that is difficult to shake.\\n* **Loss of motivation**: They feel like they can't get out of bed or face another day filled with anxiety and despair.\\n* **Withdrawal from activities**: They have lost interest in things that used to bring them joy, and are unable to eat or sleep.\\n* **Social isolation**: They retreat into their own private hell, feeling disconnected from others.\\n* **Loss of identity**: They feel like they are losing themselves as depression consumes them.\\n* **Fear of seeking help**: The protagonist is afraid to seek treatment, fearing that it won't work or that they will be stuck in this state forever.\",\n",
" 'Find entries with strong feelings of depression.']"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hyde = HyDEQueryTransform(llm=Settings.llm,include_original=True)\n",
"query_str = \"Find entries with strong feelings of depression.\"\n",
"query_bundle = hyde(query_str)\n",
"hyde_doc = query_bundle.embedding_strs\n",
"hyde_doc"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "08e7eca4",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"[\"Here is a passage that includes several key details about depression:\\n\\n**The Descent into Darkness**\\n\\nAs I lay in bed, staring blankly at the ceiling, I felt an overwhelming sense of hopelessness wash over me. The darkness seemed to close in around me, suffocating me with its crushing weight. Every thought felt like a burden, every decision a chore. I couldn't bear the idea of getting out of bed, of facing another day filled with anxiety and despair.\\n\\nI had been struggling with depression for what felt like an eternity. The symptoms had started slowly, a nagging feeling that something was off, but I had tried to brush it aside as mere exhaustion or stress. But as time went on, the feelings intensified, until they became all-consuming. I felt like I was drowning in a sea of sadness, unable to find a lifeline.\\n\\nThe smallest things would set me off - a harsh word from a loved one, a missed deadline at work, even just getting out of bed and facing another day. The world seemed too much for me to handle, and I retreated into my own private hell of despair. I couldn't eat, couldn't sleep, couldn't find any joy in the things that used to bring me happiness.\\n\\nAs I looked back on the past few months, I realized that this wasn't just a passing phase or a normal response to stress. Depression had taken hold, and it was suffocating me. I knew I needed help, but the thought of seeking treatment seemed daunting, even terrifying. What if they couldn't help me? What if I was stuck in this pit forever?\\n\\nI felt like I was losing myself, bit by bit, as depression consumed me. I longed for a glimmer of hope, a spark of light to guide me through the darkness. 
But it seemed elusive, always just out of reach.\\n\\nThis passage includes several key details about depression, including:\\n\\n* **Overwhelming feelings of sadness and hopelessness**: The protagonist feels an intense sense of despair that is difficult to shake.\\n* **Loss of motivation**: They feel like they can't get out of bed or face another day filled with anxiety and despair.\\n* **Withdrawal from activities**: They have lost interest in things that used to bring them joy, and are unable to eat or sleep.\\n* **Social isolation**: They retreat into their own private hell, feeling disconnected from others.\\n* **Loss of identity**: They feel like they are losing themselves as depression consumes them.\\n* **Fear of seeking help**: The protagonist is afraid to seek treatment, fearing that it won't work or that they will be stuck in this state forever.\", 'Find entries with strong feelings of depression.']"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from IPython.display import Markdown, display\n",
"display(Markdown(f\"{hyde_doc}\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ca50f9d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
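The HyDE flow the notebook tests can be sketched without LlamaIndex: the LLM writes a hypothetical answer, and that text (optionally alongside the raw query) is what gets embedded for retrieval. A minimal sketch with a stubbed generator; the function and parameter names here are illustrative, not the library's API:

```python
def hyde_embedding_strs(query, generate, include_original=True):
    # generate() stands in for the LLM call that writes a hypothetical answer;
    # the returned strings are what would be embedded for similarity search.
    hypothetical = generate(f"Write a passage answering: {query}")
    return [hypothetical] + ([query] if include_original else [])

docs = hyde_embedding_strs(
    "Find entries with strong feelings of depression.",
    generate=lambda p: "I felt an overwhelming sense of hopelessness ...",
)
```

With `include_original=True` this mirrors the two-element `embedding_strs` list in the cell output above: hypothetical passage first, original query second.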


@ -0,0 +1,125 @@
# query_topk_prompt_engine_v2.py
# Run a query on a vector store
#
# This version uses an improved prompt that is more flexible and query-adaptive
# Based on query_topk_prompt_engine.py
#
# Implements a prompt and uses the build_exp.py vector store with BAAI/bge-large-en-v1.5
#
# E.M.F. January 2026
from llama_index.core import (
StorageContext,
load_index_from_storage,
ServiceContext,
Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
import os
import sys
#
# Globals
#
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Embedding model used in vector store (this should match the one in build_exp.py or equivalent)
# embed_model = HuggingFaceEmbedding(model_name="all-mpnet-base-v2")
embed_model = HuggingFaceEmbedding(cache_folder="./models",model_name="BAAI/bge-large-en-v1.5",local_files_only=True)
# LLM model to use in query transform and generation
# command-r7b generates about as quickly as llama3.1:8B, but provides results that stick better
# to the provided context
llm="command-r7b"
# Other models tried:
#llm="llama3.1:8B"
#llm="deepseek-r1:8B"
#llm="gemma3:1b"
#
# Custom prompt for the query engine - Version 2 (improved)
#
# This prompt is more flexible and query-adaptive than v1:
# - Doesn't force artificial structure (exactly 10 files, mandatory theme)
# - Works for factual questions, exploratory queries, and comparisons
# - Emphasizes precision with explicit citations
# - Allows natural synthesis across sources
# - Honest about limitations when context is insufficient
#
PROMPT = PromptTemplate(
"""You are a precise research assistant analyzing excerpts from a document collection.
CONTEXT (ranked by relevance):
{context_str}
QUERY:
{query_str}
Instructions:
- Answer ONLY using information explicitly present in the CONTEXT above
- Prioritize higher-ranked excerpts but don't ignore lower ones if they contain unique relevant information
- Be specific: cite file names and quote/paraphrase key passages when relevant
- If the context doesn't contain enough information to answer fully, say so
- Synthesize information across multiple sources when appropriate
Your response should:
1. Directly answer the query using the context
2. Reference specific files and their content (e.g., "In <filename>, ...")
3. List all relevant source files at the end with brief relevance notes
If you find relevant information, organize it clearly. If the context is insufficient, explain what's missing."""
)
#
# Main program routine
#
def main():
# Use a local model to generate -- in this case using Ollama
Settings.llm = Ollama(
model=llm, # model name set in the globals above
request_timeout=360.0,
context_window=8000
)
# Load embedding model (same as used for vector store)
Settings.embed_model = embed_model
# Load persisted vector store + metadata
storage_context = StorageContext.from_defaults(persist_dir="./storage_exp")
index = load_index_from_storage(storage_context)
# Build regular query engine with custom prompt
query_engine = index.as_query_engine(
similarity_top_k=15, # pull wide
#response_mode="compact" # concise synthesis
text_qa_template=PROMPT, # custom prompt (v2)
# node_postprocessors=[
# SimilarityPostprocessor(similarity_cutoff=0.75) # keep strong hits; makes result count flexible
# ],
)
# Query
if len(sys.argv) < 2:
print("Usage: python query_topk_prompt_engine_v2.py QUERY_TEXT")
sys.exit(1)
q = " ".join(sys.argv[1:])
# Generate the response by querying the engine
# This performs the similarity search and then applies the prompt
response = query_engine.query(q)
# Return the query response and source documents
print("\nResponse:\n")
print(response.response)
print("\nSource documents:")
for node in response.source_nodes:
meta = getattr(node, "metadata", None) or node.node.metadata
score = getattr(node, "score", None)
score_str = f"{score:.3f}" if score is not None else "n/a"
print(f"{meta.get('file_name')} {meta.get('file_path')} {score_str}")
if __name__ == "__main__":
main()
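Under the hood, `text_qa_template` is filled with the retrieved chunks before the LLM is called. A rough illustration of that substitution step in plain Python (the chunk-joining format is an assumption for illustration; LlamaIndex's actual context assembly differs in detail):

```python
# Sketch of how a text_qa_template gets filled: retrieved chunks become
# context_str, the user's question becomes query_str. Names mirror the
# script above; this is NOT LlamaIndex internals, just the idea.
TEMPLATE = (
    "You are a precise research assistant analyzing excerpts from a document collection.\n"
    "CONTEXT (ranked by relevance):\n{context_str}\n"
    "QUERY:\n{query_str}\n"
)

def fill_prompt(chunks, query):
    # Rank order is preserved: chunks[0] is the strongest similarity hit.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return TEMPLATE.format(context_str=context, query_str=query)

prompt = fill_prompt(
    ["2024-03-01.txt: felt an overwhelming sense of hopelessness ...",
     "2024-04-12.txt: a noticeably better week ..."],
    "Find entries with strong feelings of depression.",
)
```

The filled string is what `similarity_top_k=15` ultimately feeds the LLM: fifteen ranked excerpts in place of `{context_str}`.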

sandbox.ipynb Normal file

@ -0,0 +1,973 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "11d5ae50",
"metadata": {},
"source": [
"# llamaindex sandbox\n",
"\n",
"Using this to explore llamaindex\\\n",
"August 2025"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "813f8b1a",
"metadata": {},
"outputs": [],
"source": [
"import llama_index.core"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "656faffb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['BaseCallbackHandler', 'BasePromptTemplate', 'Callable', 'ChatPromptTemplate', 'ComposableGraph', 'Document', 'DocumentSummaryIndex', 'GPTDocumentSummaryIndex', 'GPTKeywordTableIndex', 'GPTListIndex', 'GPTRAKEKeywordTableIndex', 'GPTSimpleKeywordTableIndex', 'GPTTreeIndex', 'GPTVectorStoreIndex', 'IndexStructType', 'KeywordTableIndex', 'KnowledgeGraphIndex', 'ListIndex', 'MockEmbedding', 'NullHandler', 'Optional', 'Prompt', 'PromptHelper', 'PromptTemplate', 'PropertyGraphIndex', 'QueryBundle', 'RAKEKeywordTableIndex', 'Response', 'SQLContextBuilder', 'SQLDatabase', 'SQLDocumentContextBuilder', 'SelectorPromptTemplate', 'ServiceContext', 'Settings', 'SimpleDirectoryReader', 'SimpleKeywordTableIndex', 'StorageContext', 'SummaryIndex', 'TreeIndex', 'VectorStoreIndex', '__all__', '__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'async_utils', 'base', 'bridge', 'callbacks', 'chat_engine', 'constants', 'data_structs', 'download', 'download_loader', 'embeddings', 'evaluation', 'get_response_synthesizer', 'get_tokenizer', 'global_handler', 'global_tokenizer', 'graph_stores', 'image_retriever', 'indices', 'ingestion', 'instrumentation', 'llama_dataset', 'llms', 'load_graph_from_storage', 'load_index_from_storage', 'load_indices_from_storage', 'logging', 'memory', 'multi_modal_llms', 'node_parser', 'objects', 'output_parsers', 'postprocessor', 'prompts', 'query_engine', 'question_gen', 'readers', 'response', 'response_synthesizers', 'schema', 'selectors', 'service_context', 'set_global_handler', 'set_global_service_context', 'set_global_tokenizer', 'settings', 'storage', 'tools', 'types', 'utilities', 'utils', 'vector_stores', 'workflow']\n"
]
}
],
"source": [
"# List available objects\n",
"print(dir(llama_index.core))"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bea0759d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"BaseCallbackHandler\n",
"BasePromptTemplate\n",
"Callable\n",
"ChatPromptTemplate\n",
"ComposableGraph\n",
"Document\n",
"DocumentSummaryIndex\n",
"GPTDocumentSummaryIndex\n",
"GPTKeywordTableIndex\n",
"GPTListIndex\n",
"GPTRAKEKeywordTableIndex\n",
"GPTSimpleKeywordTableIndex\n",
"GPTTreeIndex\n",
"GPTVectorStoreIndex\n",
"IndexStructType\n",
"KeywordTableIndex\n",
"KnowledgeGraphIndex\n",
"ListIndex\n",
"MockEmbedding\n",
"NullHandler\n",
"Optional\n",
"Prompt\n",
"PromptHelper\n",
"PromptTemplate\n",
"PropertyGraphIndex\n",
"QueryBundle\n",
"RAKEKeywordTableIndex\n",
"Response\n",
"SQLContextBuilder\n",
"SQLDatabase\n",
"SQLDocumentContextBuilder\n",
"SelectorPromptTemplate\n",
"ServiceContext\n",
"Settings\n",
"SimpleDirectoryReader\n",
"SimpleKeywordTableIndex\n",
"StorageContext\n",
"SummaryIndex\n",
"TreeIndex\n",
"VectorStoreIndex\n",
"__all__\n",
"__annotations__\n",
"__builtins__\n",
"__cached__\n",
"__doc__\n",
"__file__\n",
"__loader__\n",
"__name__\n",
"__package__\n",
"__path__\n",
"__spec__\n",
"__version__\n",
"async_utils\n",
"base\n",
"bridge\n",
"callbacks\n",
"chat_engine\n",
"constants\n",
"data_structs\n",
"download\n",
"download_loader\n",
"embeddings\n",
"evaluation\n",
"get_response_synthesizer\n",
"get_tokenizer\n",
"global_handler\n",
"global_tokenizer\n",
"graph_stores\n",
"image_retriever\n",
"indices\n",
"ingestion\n",
"instrumentation\n",
"llama_dataset\n",
"llms\n",
"load_graph_from_storage\n",
"load_index_from_storage\n",
"load_indices_from_storage\n",
"logging\n",
"memory\n",
"multi_modal_llms\n",
"node_parser\n",
"objects\n",
"output_parsers\n",
"postprocessor\n",
"prompts\n",
"query_engine\n",
"question_gen\n",
"readers\n",
"response\n",
"response_synthesizers\n",
"schema\n",
"selectors\n",
"service_context\n",
"set_global_handler\n",
"set_global_service_context\n",
"set_global_tokenizer\n",
"settings\n",
"storage\n",
"tools\n",
"types\n",
"utilities\n",
"utils\n",
"vector_stores\n",
"workflow\n"
]
}
],
"source": [
"# Better formatted output for list of available objects\n",
"objects = dir(llama_index.core)\n",
"for obj in objects:\n",
" print(obj)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "3886a5f0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"list"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# dir returns a list\n",
"type(objects)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "272cb0c9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"104"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# In the case of llamaindex.core, it contains 104 objects\n",
"\n",
"len(objects)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bfffc03f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on class VectorStoreIndex in module llama_index.core.indices.vector_store.base:\n",
"\n",
"class VectorStoreIndex(llama_index.core.indices.base.BaseIndex)\n",
" | VectorStoreIndex(nodes: Optional[Sequence[llama_index.core.schema.BaseNode]] = None, use_async: bool = False, store_nodes_override: bool = False, embed_model: Union[llama_index.core.base.embeddings.base.BaseEmbedding, ForwardRef('LCEmbeddings'), str, NoneType] = None, insert_batch_size: int = 2048, objects: Optional[Sequence[llama_index.core.schema.IndexNode]] = None, index_struct: Optional[llama_index.core.data_structs.data_structs.IndexDict] = None, storage_context: Optional[llama_index.core.storage.storage_context.StorageContext] = None, callback_manager: Optional[llama_index.core.callbacks.base.CallbackManager] = None, transformations: Optional[List[llama_index.core.schema.TransformComponent]] = None, show_progress: bool = False, **kwargs: Any) -> None\n",
" |\n",
" | Vector Store Index.\n",
" |\n",
" | Args:\n",
" | use_async (bool): Whether to use asynchronous calls. Defaults to False.\n",
" | show_progress (bool): Whether to show tqdm progress bars. Defaults to False.\n",
" | store_nodes_override (bool): set to True to always store Node objects in index\n",
" | store and document store even if vector store keeps text. Defaults to False\n",
" |\n",
" | Method resolution order:\n",
" | VectorStoreIndex\n",
" | llama_index.core.indices.base.BaseIndex\n",
" | typing.Generic\n",
" | abc.ABC\n",
" | builtins.object\n",
" |\n",
" | Methods defined here:\n",
" |\n",
" | __init__(self, nodes: Optional[Sequence[llama_index.core.schema.BaseNode]] = None, use_async: bool = False, store_nodes_override: bool = False, embed_model: Union[llama_index.core.base.embeddings.base.BaseEmbedding, ForwardRef('LCEmbeddings'), str, NoneType] = None, insert_batch_size: int = 2048, objects: Optional[Sequence[llama_index.core.schema.IndexNode]] = None, index_struct: Optional[llama_index.core.data_structs.data_structs.IndexDict] = None, storage_context: Optional[llama_index.core.storage.storage_context.StorageContext] = None, callback_manager: Optional[llama_index.core.callbacks.base.CallbackManager] = None, transformations: Optional[List[llama_index.core.schema.TransformComponent]] = None, show_progress: bool = False, **kwargs: Any) -> None\n",
" | Initialize params.\n",
" |\n",
" | async adelete_nodes(self, node_ids: List[str], delete_from_docstore: bool = False, **delete_kwargs: Any) -> None\n",
" | Delete a list of nodes from the index.\n",
" |\n",
" | Args:\n",
" | node_ids (List[str]): A list of node_ids from the nodes to delete\n",
" |\n",
" | async adelete_ref_doc(self, ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) -> None\n",
" | Delete a document and it's nodes by using ref_doc_id.\n",
" |\n",
" | async ainsert_nodes(self, nodes: Sequence[llama_index.core.schema.BaseNode], **insert_kwargs: Any) -> None\n",
" | Insert nodes.\n",
" |\n",
" | NOTE: overrides BaseIndex.ainsert_nodes.\n",
" | VectorStoreIndex only stores nodes in document store\n",
" | if vector store does not store text\n",
" |\n",
" | as_retriever(self, **kwargs: Any) -> llama_index.core.base.base_retriever.BaseRetriever\n",
" |\n",
" | build_index_from_nodes(self, nodes: Sequence[llama_index.core.schema.BaseNode], **insert_kwargs: Any) -> llama_index.core.data_structs.data_structs.IndexDict\n",
" | Build the index from nodes.\n",
" |\n",
" | NOTE: Overrides BaseIndex.build_index_from_nodes.\n",
" | VectorStoreIndex only stores nodes in document store\n",
" | if vector store does not store text\n",
" |\n",
" | delete_nodes(self, node_ids: List[str], delete_from_docstore: bool = False, **delete_kwargs: Any) -> None\n",
" | Delete a list of nodes from the index.\n",
" |\n",
" | Args:\n",
" | node_ids (List[str]): A list of node_ids from the nodes to delete\n",
" |\n",
" | delete_ref_doc(self, ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) -> None\n",
" | Delete a document and it's nodes by using ref_doc_id.\n",
" |\n",
" | insert_nodes(self, nodes: Sequence[llama_index.core.schema.BaseNode], **insert_kwargs: Any) -> None\n",
" | Insert nodes.\n",
" |\n",
" | NOTE: overrides BaseIndex.insert_nodes.\n",
" | VectorStoreIndex only stores nodes in document store\n",
" | if vector store does not store text\n",
" |\n",
" | ----------------------------------------------------------------------\n",
" | Class methods defined here:\n",
" |\n",
" | from_vector_store(vector_store: llama_index.core.vector_stores.types.BasePydanticVectorStore, embed_model: Union[llama_index.core.base.embeddings.base.BaseEmbedding, ForwardRef('LCEmbeddings'), str, NoneType] = None, **kwargs: Any) -> 'VectorStoreIndex'\n",
" |\n",
" | ----------------------------------------------------------------------\n",
" | Readonly properties defined here:\n",
" |\n",
" | ref_doc_info\n",
" | Retrieve a dict mapping of ingested documents and their nodes+metadata.\n",
" |\n",
" | vector_store\n",
" |\n",
" | ----------------------------------------------------------------------\n",
" | Data and other attributes defined here:\n",
" |\n",
" | __abstractmethods__ = frozenset()\n",
" |\n",
" | __annotations__ = {}\n",
" |\n",
" | __orig_bases__ = (llama_index.core.indices.base.BaseIndex[llama_index....\n",
" |\n",
" | __parameters__ = ()\n",
" |\n",
" | index_struct_cls = <class 'llama_index.core.data_structs.data_structs....\n",
" | A simple dictionary of documents.\n",
" |\n",
" |\n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from llama_index.core.indices.base.BaseIndex:\n",
" |\n",
" | async ainsert(self, document: llama_index.core.schema.Document, **insert_kwargs: Any) -> None\n",
" | Asynchronously insert a document.\n",
" |\n",
" | async arefresh_ref_docs(self, documents: Sequence[llama_index.core.schema.Document], **update_kwargs: Any) -> List[bool]\n",
" | Asynchronously refresh an index with documents that have changed.\n",
" |\n",
" | This allows users to save LLM and Embedding model calls, while only\n",
" | updating documents that have any changes in text or metadata. It\n",
" | will also insert any documents that previously were not stored.\n",
" |\n",
" | as_chat_engine(self, chat_mode: llama_index.core.chat_engine.types.ChatMode = <ChatMode.BEST: 'best'>, llm: Union[str, llama_index.core.llms.llm.LLM, ForwardRef('BaseLanguageModel'), NoneType] = None, **kwargs: Any) -> llama_index.core.chat_engine.types.BaseChatEngine\n",
" | Convert the index to a chat engine.\n",
" |\n",
" | Calls `index.as_query_engine(llm=llm, **kwargs)` to get the query engine and then\n",
" | wraps it in a chat engine based on the chat mode.\n",
" |\n",
" | Chat modes:\n",
" | - `ChatMode.BEST` (default): Chat engine that uses an agent (react or openai) with a query engine tool\n",
" | - `ChatMode.CONTEXT`: Chat engine that uses a retriever to get context\n",
" | - `ChatMode.CONDENSE_QUESTION`: Chat engine that condenses questions\n",
" | - `ChatMode.CONDENSE_PLUS_CONTEXT`: Chat engine that condenses questions and uses a retriever to get context\n",
" | - `ChatMode.SIMPLE`: Simple chat engine that uses the LLM directly\n",
" | - `ChatMode.REACT`: Chat engine that uses a react agent with a query engine tool\n",
" | - `ChatMode.OPENAI`: Chat engine that uses an openai agent with a query engine tool\n",
" |\n",
" | as_query_engine(self, llm: Union[str, llama_index.core.llms.llm.LLM, ForwardRef('BaseLanguageModel'), NoneType] = None, **kwargs: Any) -> llama_index.core.base.base_query_engine.BaseQueryEngine\n",
" | Convert the index to a query engine.\n",
" |\n",
" | Calls `index.as_retriever(**kwargs)` to get the retriever and then wraps it in a\n",
" | `RetrieverQueryEngine.from_args(retriever, **kwrags)` call.\n",
" |\n",
" | async aupdate_ref_doc(self, document: llama_index.core.schema.Document, **update_kwargs: Any) -> None\n",
" | Asynchronously update a document and it's corresponding nodes.\n",
" |\n",
" | This is equivalent to deleting the document and then inserting it again.\n",
" |\n",
" | Args:\n",
" | document (Union[BaseDocument, BaseIndex]): document to update\n",
" | insert_kwargs (Dict): kwargs to pass to insert\n",
" | delete_kwargs (Dict): kwargs to pass to delete\n",
" |\n",
" | delete(self, doc_id: str, **delete_kwargs: Any) -> None\n",
" | Delete a document from the index.\n",
" | All nodes in the index related to the index will be deleted.\n",
" |\n",
" | Args:\n",
" | doc_id (str): A doc_id of the ingested document\n",
" |\n",
" | insert(self, document: llama_index.core.schema.Document, **insert_kwargs: Any) -> None\n",
" | Insert a document.\n",
" |\n",
" | refresh(self, documents: Sequence[llama_index.core.schema.Document], **update_kwargs: Any) -> List[bool]\n",
" | Refresh an index with documents that have changed.\n",
" |\n",
" | This allows users to save LLM and Embedding model calls, while only\n",
" | updating documents that have any changes in text or metadata. It\n",
" | will also insert any documents that previously were not stored.\n",
" |\n",
" | refresh_ref_docs(self, documents: Sequence[llama_index.core.schema.Document], **update_kwargs: Any) -> List[bool]\n",
" | Refresh an index with documents that have changed.\n",
" |\n",
" | This allows users to save LLM and Embedding model calls, while only\n",
" | updating documents that have any changes in text or metadata. It\n",
" | will also insert any documents that previously were not stored.\n",
" |\n",
" | set_index_id(self, index_id: str) -> None\n",
" | Set the index id.\n",
" |\n",
" | NOTE: if you decide to set the index_id on the index_struct manually,\n",
" | you will need to explicitly call `add_index_struct` on the `index_store`\n",
" | to update the index store.\n",
" |\n",
" | Args:\n",
" | index_id (str): Index id to set.\n",
" |\n",
" | update(self, document: llama_index.core.schema.Document, **update_kwargs: Any) -> None\n",
" | Update a document and it's corresponding nodes.\n",
" |\n",
" | This is equivalent to deleting the document and then inserting it again.\n",
" |\n",
" | Args:\n",
" | document (Union[BaseDocument, BaseIndex]): document to update\n",
" | insert_kwargs (Dict): kwargs to pass to insert\n",
" | delete_kwargs (Dict): kwargs to pass to delete\n",
" |\n",
" | update_ref_doc(self, document: llama_index.core.schema.Document, **update_kwargs: Any) -> None\n",
" | Update a document and it's corresponding nodes.\n",
" |\n",
" | This is equivalent to deleting the document and then inserting it again.\n",
" |\n",
" | Args:\n",
" | document (Union[BaseDocument, BaseIndex]): document to update\n",
" | insert_kwargs (Dict): kwargs to pass to insert\n",
" | delete_kwargs (Dict): kwargs to pass to delete\n",
" |\n",
" | ----------------------------------------------------------------------\n",
" | Class methods inherited from llama_index.core.indices.base.BaseIndex:\n",
" |\n",
" | from_documents(documents: Sequence[llama_index.core.schema.Document], storage_context: Optional[llama_index.core.storage.storage_context.StorageContext] = None, show_progress: bool = False, callback_manager: Optional[llama_index.core.callbacks.base.CallbackManager] = None, transformations: Optional[List[llama_index.core.schema.TransformComponent]] = None, **kwargs: Any) -> ~IndexType\n",
" | Create index from documents.\n",
" |\n",
" | Args:\n",
" | documents (Sequence[Document]]): List of documents to\n",
" | build the index from.\n",
" |\n",
" | ----------------------------------------------------------------------\n",
" | Readonly properties inherited from llama_index.core.indices.base.BaseIndex:\n",
" |\n",
" | docstore\n",
" | Get the docstore corresponding to the index.\n",
" |\n",
" | index_id\n",
" | Get the index struct.\n",
" |\n",
" | index_struct\n",
" | Get the index struct.\n",
" |\n",
" | storage_context\n",
" |\n",
" | ----------------------------------------------------------------------\n",
" | Data descriptors inherited from llama_index.core.indices.base.BaseIndex:\n",
" |\n",
" | __dict__\n",
" | dictionary for instance variables\n",
" |\n",
" | __weakref__\n",
" | list of weak references to the object\n",
" |\n",
" | summary\n",
" |\n",
" | ----------------------------------------------------------------------\n",
" | Class methods inherited from typing.Generic:\n",
" |\n",
" | __class_getitem__(...)\n",
" | Parameterizes a generic class.\n",
" |\n",
" | At least, parameterizing a generic class is the *main* thing this\n",
" | method does. For example, for some generic class `Foo`, this is called\n",
" | when we do `Foo[int]` - there, with `cls=Foo` and `params=int`.\n",
" |\n",
" | However, note that this method is also called when defining generic\n",
" | classes in the first place with `class Foo[T]: ...`.\n",
" |\n",
" | __init_subclass__(...)\n",
" | Function to initialize subclasses.\n",
"\n"
]
}
],
"source": [
"# Get help on a specific object\n",
"help(llama_index.core.VectorStoreIndex)\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3eb5f1b7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"class VectorStoreIndex(BaseIndex[IndexDict]):\n",
" \"\"\"\n",
" Vector Store Index.\n",
"\n",
" Args:\n",
" use_async (bool): Whether to use asynchronous calls. Defaults to False.\n",
" show_progress (bool): Whether to show tqdm progress bars. Defaults to False.\n",
" store_nodes_override (bool): set to True to always store Node objects in index\n",
" store and document store even if vector store keeps text. Defaults to False\n",
"\n",
" \"\"\"\n",
"\n",
" index_struct_cls = IndexDict\n",
"\n",
" def __init__(\n",
" self,\n",
" nodes: Optional[Sequence[BaseNode]] = None,\n",
" # vector store index params\n",
" use_async: bool = False,\n",
" store_nodes_override: bool = False,\n",
" embed_model: Optional[EmbedType] = None,\n",
" insert_batch_size: int = 2048,\n",
" # parent class params\n",
" objects: Optional[Sequence[IndexNode]] = None,\n",
" index_struct: Optional[IndexDict] = None,\n",
" storage_context: Optional[StorageContext] = None,\n",
" callback_manager: Optional[CallbackManager] = None,\n",
" transformations: Optional[List[TransformComponent]] = None,\n",
" show_progress: bool = False,\n",
" **kwargs: Any,\n",
" ) -> None:\n",
" \"\"\"Initialize params.\"\"\"\n",
" self._use_async = use_async\n",
" self._store_nodes_override = store_nodes_override\n",
" self._embed_model = resolve_embed_model(\n",
" embed_model or Settings.embed_model, callback_manager=callback_manager\n",
" )\n",
"\n",
" self._insert_batch_size = insert_batch_size\n",
" super().__init__(\n",
" nodes=nodes,\n",
" index_struct=index_struct,\n",
" storage_context=storage_context,\n",
" show_progress=show_progress,\n",
" objects=objects,\n",
" callback_manager=callback_manager,\n",
" transformations=transformations,\n",
" **kwargs,\n",
" )\n",
"\n",
" @classmethod\n",
" def from_vector_store(\n",
" cls,\n",
" vector_store: BasePydanticVectorStore,\n",
" embed_model: Optional[EmbedType] = None,\n",
" **kwargs: Any,\n",
" ) -> \"VectorStoreIndex\":\n",
" if not vector_store.stores_text:\n",
" raise ValueError(\n",
" \"Cannot initialize from a vector store that does not store text.\"\n",
" )\n",
"\n",
" kwargs.pop(\"storage_context\", None)\n",
" storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
"\n",
" return cls(\n",
" nodes=[],\n",
" embed_model=embed_model,\n",
" storage_context=storage_context,\n",
" **kwargs,\n",
" )\n",
"\n",
" @property\n",
" def vector_store(self) -> BasePydanticVectorStore:\n",
" return self._vector_store\n",
"\n",
" def as_retriever(self, **kwargs: Any) -> BaseRetriever:\n",
" # NOTE: lazy import\n",
" from llama_index.core.indices.vector_store.retrievers import (\n",
" VectorIndexRetriever,\n",
" )\n",
"\n",
" return VectorIndexRetriever(\n",
" self,\n",
" node_ids=list(self.index_struct.nodes_dict.values()),\n",
" callback_manager=self._callback_manager,\n",
" object_map=self._object_map,\n",
" **kwargs,\n",
" )\n",
"\n",
" def _get_node_with_embedding(\n",
" self,\n",
" nodes: Sequence[BaseNode],\n",
" show_progress: bool = False,\n",
" ) -> List[BaseNode]:\n",
" \"\"\"\n",
" Get tuples of id, node, and embedding.\n",
"\n",
" Allows us to store these nodes in a vector store.\n",
" Embeddings are called in batches.\n",
"\n",
" \"\"\"\n",
" id_to_embed_map = embed_nodes(\n",
" nodes, self._embed_model, show_progress=show_progress\n",
" )\n",
"\n",
" results = []\n",
" for node in nodes:\n",
" embedding = id_to_embed_map[node.node_id]\n",
" result = node.model_copy()\n",
" result.embedding = embedding\n",
" results.append(result)\n",
" return results\n",
"\n",
" async def _aget_node_with_embedding(\n",
" self,\n",
" nodes: Sequence[BaseNode],\n",
" show_progress: bool = False,\n",
" ) -> List[BaseNode]:\n",
" \"\"\"\n",
" Asynchronously get tuples of id, node, and embedding.\n",
"\n",
" Allows us to store these nodes in a vector store.\n",
" Embeddings are called in batches.\n",
"\n",
" \"\"\"\n",
" id_to_embed_map = await async_embed_nodes(\n",
" nodes=nodes,\n",
" embed_model=self._embed_model,\n",
" show_progress=show_progress,\n",
" )\n",
"\n",
" results = []\n",
" for node in nodes:\n",
" embedding = id_to_embed_map[node.node_id]\n",
" result = node.model_copy()\n",
" result.embedding = embedding\n",
" results.append(result)\n",
" return results\n",
"\n",
" async def _async_add_nodes_to_index(\n",
" self,\n",
" index_struct: IndexDict,\n",
" nodes: Sequence[BaseNode],\n",
" show_progress: bool = False,\n",
" **insert_kwargs: Any,\n",
" ) -> None:\n",
" \"\"\"Asynchronously add nodes to index.\"\"\"\n",
" if not nodes:\n",
" return\n",
"\n",
" for nodes_batch in iter_batch(nodes, self._insert_batch_size):\n",
" nodes_batch = await self._aget_node_with_embedding(\n",
" nodes_batch, show_progress\n",
" )\n",
" new_ids = await self._vector_store.async_add(nodes_batch, **insert_kwargs)\n",
"\n",
" # if the vector store doesn't store text, we need to add the nodes to the\n",
" # index struct and document store\n",
" if not self._vector_store.stores_text or self._store_nodes_override:\n",
" for node, new_id in zip(nodes_batch, new_ids):\n",
" # NOTE: remove embedding from node to avoid duplication\n",
" node_without_embedding = node.model_copy()\n",
" node_without_embedding.embedding = None\n",
"\n",
" index_struct.add_node(node_without_embedding, text_id=new_id)\n",
" await self._docstore.async_add_documents(\n",
" [node_without_embedding], allow_update=True\n",
" )\n",
" else:\n",
" # NOTE: if the vector store keeps text,\n",
" # we only need to add image and index nodes\n",
" for node, new_id in zip(nodes_batch, new_ids):\n",
" if isinstance(node, (ImageNode, IndexNode)):\n",
" # NOTE: remove embedding from node to avoid duplication\n",
" node_without_embedding = node.model_copy()\n",
" node_without_embedding.embedding = None\n",
"\n",
" index_struct.add_node(node_without_embedding, text_id=new_id)\n",
" await self._docstore.async_add_documents(\n",
" [node_without_embedding], allow_update=True\n",
" )\n",
"\n",
" def _add_nodes_to_index(\n",
" self,\n",
" index_struct: IndexDict,\n",
" nodes: Sequence[BaseNode],\n",
" show_progress: bool = False,\n",
" **insert_kwargs: Any,\n",
" ) -> None:\n",
" \"\"\"Add document to index.\"\"\"\n",
" if not nodes:\n",
" return\n",
"\n",
" for nodes_batch in iter_batch(nodes, self._insert_batch_size):\n",
" nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)\n",
" new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)\n",
"\n",
" if not self._vector_store.stores_text or self._store_nodes_override:\n",
" # NOTE: if the vector store doesn't store text,\n",
" # we need to add the nodes to the index struct and document store\n",
" for node, new_id in zip(nodes_batch, new_ids):\n",
" # NOTE: remove embedding from node to avoid duplication\n",
" node_without_embedding = node.model_copy()\n",
" node_without_embedding.embedding = None\n",
"\n",
" index_struct.add_node(node_without_embedding, text_id=new_id)\n",
" self._docstore.add_documents(\n",
" [node_without_embedding], allow_update=True\n",
" )\n",
" else:\n",
" # NOTE: if the vector store keeps text,\n",
" # we only need to add image and index nodes\n",
" for node, new_id in zip(nodes_batch, new_ids):\n",
" if isinstance(node, (ImageNode, IndexNode)):\n",
" # NOTE: remove embedding from node to avoid duplication\n",
" node_without_embedding = node.model_copy()\n",
" node_without_embedding.embedding = None\n",
"\n",
" index_struct.add_node(node_without_embedding, text_id=new_id)\n",
" self._docstore.add_documents(\n",
" [node_without_embedding], allow_update=True\n",
" )\n",
"\n",
" def _build_index_from_nodes(\n",
" self,\n",
" nodes: Sequence[BaseNode],\n",
" **insert_kwargs: Any,\n",
" ) -> IndexDict:\n",
" \"\"\"Build index from nodes.\"\"\"\n",
" index_struct = self.index_struct_cls()\n",
" if self._use_async:\n",
" tasks = [\n",
" self._async_add_nodes_to_index(\n",
" index_struct,\n",
" nodes,\n",
" show_progress=self._show_progress,\n",
" **insert_kwargs,\n",
" )\n",
" ]\n",
" run_async_tasks(tasks)\n",
" else:\n",
" self._add_nodes_to_index(\n",
" index_struct,\n",
" nodes,\n",
" show_progress=self._show_progress,\n",
" **insert_kwargs,\n",
" )\n",
" return index_struct\n",
"\n",
" def build_index_from_nodes(\n",
" self,\n",
" nodes: Sequence[BaseNode],\n",
" **insert_kwargs: Any,\n",
" ) -> IndexDict:\n",
" \"\"\"\n",
" Build the index from nodes.\n",
"\n",
" NOTE: Overrides BaseIndex.build_index_from_nodes.\n",
" VectorStoreIndex only stores nodes in document store\n",
" if vector store does not store text\n",
" \"\"\"\n",
" # Filter out the nodes that don't have content\n",
" content_nodes = [\n",
" node\n",
" for node in nodes\n",
" if node.get_content(metadata_mode=MetadataMode.EMBED) != \"\"\n",
" ]\n",
"\n",
" # Report if some nodes are missing content\n",
" if len(content_nodes) != len(nodes):\n",
" print(\"Some nodes are missing content, skipping them...\")\n",
"\n",
" return self._build_index_from_nodes(content_nodes, **insert_kwargs)\n",
"\n",
" def _insert(self, nodes: Sequence[BaseNode], **insert_kwargs: Any) -> None:\n",
" \"\"\"Insert a document.\"\"\"\n",
" self._add_nodes_to_index(self._index_struct, nodes, **insert_kwargs)\n",
"\n",
" def _validate_serializable(self, nodes: Sequence[BaseNode]) -> None:\n",
" \"\"\"Validate that the nodes are serializable.\"\"\"\n",
" for node in nodes:\n",
" if isinstance(node, IndexNode):\n",
" try:\n",
" node.dict()\n",
" except ValueError:\n",
" self._object_map[node.index_id] = node.obj\n",
" node.obj = None\n",
"\n",
" async def ainsert_nodes(\n",
" self, nodes: Sequence[BaseNode], **insert_kwargs: Any\n",
" ) -> None:\n",
" \"\"\"\n",
" Insert nodes.\n",
"\n",
" NOTE: overrides BaseIndex.ainsert_nodes.\n",
" VectorStoreIndex only stores nodes in document store\n",
" if vector store does not store text\n",
" \"\"\"\n",
" self._validate_serializable(nodes)\n",
"\n",
" with self._callback_manager.as_trace(\"insert_nodes\"):\n",
" await self._async_add_nodes_to_index(\n",
" self._index_struct, nodes, **insert_kwargs\n",
" )\n",
" self._storage_context.index_store.add_index_struct(self._index_struct)\n",
"\n",
" def insert_nodes(self, nodes: Sequence[BaseNode], **insert_kwargs: Any) -> None:\n",
" \"\"\"\n",
" Insert nodes.\n",
"\n",
" NOTE: overrides BaseIndex.insert_nodes.\n",
" VectorStoreIndex only stores nodes in document store\n",
" if vector store does not store text\n",
" \"\"\"\n",
" self._validate_serializable(nodes)\n",
"\n",
" with self._callback_manager.as_trace(\"insert_nodes\"):\n",
" self._insert(nodes, **insert_kwargs)\n",
" self._storage_context.index_store.add_index_struct(self._index_struct)\n",
"\n",
" def _delete_node(self, node_id: str, **delete_kwargs: Any) -> None:\n",
" pass\n",
"\n",
" async def adelete_nodes(\n",
" self,\n",
" node_ids: List[str],\n",
" delete_from_docstore: bool = False,\n",
" **delete_kwargs: Any,\n",
" ) -> None:\n",
" \"\"\"\n",
" Delete a list of nodes from the index.\n",
"\n",
" Args:\n",
" node_ids (List[str]): A list of node_ids from the nodes to delete\n",
"\n",
" \"\"\"\n",
" # delete nodes from vector store\n",
" await self._vector_store.adelete_nodes(node_ids, **delete_kwargs)\n",
"\n",
" # delete from docstore only if needed\n",
" if (\n",
" not self._vector_store.stores_text or self._store_nodes_override\n",
" ) and delete_from_docstore:\n",
" for node_id in node_ids:\n",
" self._index_struct.delete(node_id)\n",
" await self._docstore.adelete_document(node_id, raise_error=False)\n",
" self._storage_context.index_store.add_index_struct(self._index_struct)\n",
"\n",
" def delete_nodes(\n",
" self,\n",
" node_ids: List[str],\n",
" delete_from_docstore: bool = False,\n",
" **delete_kwargs: Any,\n",
" ) -> None:\n",
" \"\"\"\n",
" Delete a list of nodes from the index.\n",
"\n",
" Args:\n",
" node_ids (List[str]): A list of node_ids from the nodes to delete\n",
"\n",
" \"\"\"\n",
" # delete nodes from vector store\n",
" self._vector_store.delete_nodes(node_ids, **delete_kwargs)\n",
"\n",
" # delete from docstore only if needed\n",
" if (\n",
" not self._vector_store.stores_text or self._store_nodes_override\n",
" ) and delete_from_docstore:\n",
" for node_id in node_ids:\n",
" self._index_struct.delete(node_id)\n",
" self._docstore.delete_document(node_id, raise_error=False)\n",
" self._storage_context.index_store.add_index_struct(self._index_struct)\n",
"\n",
" def _delete_from_index_struct(self, ref_doc_id: str) -> None:\n",
" # delete from index_struct only if needed\n",
" if not self._vector_store.stores_text or self._store_nodes_override:\n",
" ref_doc_info = self._docstore.get_ref_doc_info(ref_doc_id)\n",
" if ref_doc_info is not None:\n",
" for node_id in ref_doc_info.node_ids:\n",
" self._index_struct.delete(node_id)\n",
" self._vector_store.delete(node_id)\n",
"\n",
" def _delete_from_docstore(self, ref_doc_id: str) -> None:\n",
" # delete from docstore only if needed\n",
" if not self._vector_store.stores_text or self._store_nodes_override:\n",
" self._docstore.delete_ref_doc(ref_doc_id, raise_error=False)\n",
"\n",
" def delete_ref_doc(\n",
" self, ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any\n",
" ) -> None:\n",
" \"\"\"Delete a document and it's nodes by using ref_doc_id.\"\"\"\n",
" self._vector_store.delete(ref_doc_id, **delete_kwargs)\n",
" self._delete_from_index_struct(ref_doc_id)\n",
" if delete_from_docstore:\n",
" self._delete_from_docstore(ref_doc_id)\n",
" self._storage_context.index_store.add_index_struct(self._index_struct)\n",
"\n",
" async def _adelete_from_index_struct(self, ref_doc_id: str) -> None:\n",
" \"\"\"Delete from index_struct only if needed.\"\"\"\n",
" if not self._vector_store.stores_text or self._store_nodes_override:\n",
" ref_doc_info = await self._docstore.aget_ref_doc_info(ref_doc_id)\n",
" if ref_doc_info is not None:\n",
" for node_id in ref_doc_info.node_ids:\n",
" self._index_struct.delete(node_id)\n",
" self._vector_store.delete(node_id)\n",
"\n",
" async def _adelete_from_docstore(self, ref_doc_id: str) -> None:\n",
" \"\"\"Delete from docstore only if needed.\"\"\"\n",
" if not self._vector_store.stores_text or self._store_nodes_override:\n",
" await self._docstore.adelete_ref_doc(ref_doc_id, raise_error=False)\n",
"\n",
" async def adelete_ref_doc(\n",
" self, ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any\n",
" ) -> None:\n",
" \"\"\"Delete a document and it's nodes by using ref_doc_id.\"\"\"\n",
" tasks = [\n",
" self._vector_store.adelete(ref_doc_id, **delete_kwargs),\n",
" self._adelete_from_index_struct(ref_doc_id),\n",
" ]\n",
" if delete_from_docstore:\n",
" tasks.append(self._adelete_from_docstore(ref_doc_id))\n",
"\n",
" await asyncio.gather(*tasks)\n",
"\n",
" self._storage_context.index_store.add_index_struct(self._index_struct)\n",
"\n",
" @property\n",
" def ref_doc_info(self) -> Dict[str, RefDocInfo]:\n",
" \"\"\"Retrieve a dict mapping of ingested documents and their nodes+metadata.\"\"\"\n",
" if not self._vector_store.stores_text or self._store_nodes_override:\n",
" node_doc_ids = list(self.index_struct.nodes_dict.values())\n",
" nodes = self.docstore.get_nodes(node_doc_ids)\n",
"\n",
" all_ref_doc_info = {}\n",
" for node in nodes:\n",
" ref_node = node.source_node\n",
" if not ref_node:\n",
" continue\n",
"\n",
" ref_doc_info = self.docstore.get_ref_doc_info(ref_node.node_id)\n",
" if not ref_doc_info:\n",
" continue\n",
"\n",
" all_ref_doc_info[ref_node.node_id] = ref_doc_info\n",
" return all_ref_doc_info\n",
" else:\n",
" raise NotImplementedError(\n",
" \"Vector store integrations that store text in the vector store are \"\n",
" \"not supported by ref_doc_info yet.\"\n",
" )\n",
"\n"
]
}
],
"source": [
"import inspect\n",
"print(inspect.getsource(llama_index.core.VectorStoreIndex))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8125e2de",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@@ -0,0 +1,52 @@
Enter a search topic or question (or 'exit'):
The mind as a terrible master.
**Summary Theme:**
This collection of excerpts explores the human mind's complex nature, its cognitive processes, perception, memory, and social
interactions. The texts delve into how our thoughts are shaped by external stimuli, our brain's organizational patterns, and the
emergence of consciousness. Additionally, they touch on the mind's tendency to create illusions (like "mass delusion") and the
challenges posed by a distributed brain and decentralized consciousness.
**Matching Files:**
1. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2023-01-23.txt - The excerpt examines what becomes of an
incessant critic in the mind, questioning the role of brain functions in creating negative self-perceptions.
2. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-03-10.txt - The excerpt delves into the journal's
effect on cognition, implying that it can profoundly influence thought processes even if not actively processing itself.
3. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-08-18.txt - This excerpt suggests a comparison between
a mind that is constantly on the move and an overloaded train, emphasizing the lack of central control.
4. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-03-08.txt - The excerpt explores thoughts as
"tendrils" that can hold a person and influence their mood and behavior, mirroring the mind's struggle with self-awareness.
5. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2019-07-03.txt - This excerpt connects the idea of
"impression management" to evolutionary adaptation, suggesting a mind capable of organizing individuals around "fictions."
6. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-03-24.txt - The excerpt emphasizes the modularity of
the mind and preconditioned responses, reflecting on E. Bruce Goldstein's book and its relevance to understanding consciousness.
7. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2023-01-06.txt - A contemplation of memory's role as a
"lump of coal that bears the delicate impression of a leaf," highlighting its complexity and non-linear nature.
8. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2021-05-04.txt - The excerpt reflects on feelings of
skepticism about human endeavors and the mind's tendency to imagine conflict, leading to negative self-judgment.
9. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2023-01-23.txt - The excerpt references David Foster
Wallace's quote about the mind as a "terrible master" and explores how individuals who commit suicide often shoot themselves in the
head, silencing their inner critic.
10. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-03-08.txt - This excerpt discusses the philosophical
implications of a distributed brain and decentralized consciousness, questioning the existence of a singular "self" making
decisions.
Source documents:
2009-08-24.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2009-08-24.txt 0.7286187534682154
2023-01-23.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2023-01-23.txt 0.7174705749042735
2023-03-09.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2023-03-09.txt 0.6905817844827031
2023-01-06.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2023-01-06.txt 0.6872058770669452
2021-05-04.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2021-05-04.txt 0.6866138676376796
2022-04-17.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2022-04-17.txt 0.6837786406828062
2025-03-10.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-03-10.txt 0.6825293816922051
2021-05-02.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2021-05-02.txt 0.6818701242339038
2025-03-08.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-03-08.txt 0.6804468955664654
2024-03-24.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-03-24.txt 0.6798798323221176
2022-02-24.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2022-02-24.txt 0.6779782723066287
2024-08-18.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-08-18.txt 0.676507830756482
2019-07-03.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2019-07-03.txt 0.6754137298987061
2021-12-22.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2021-12-22.txt 0.6747843533262554
2024-03-24.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-03-24.txt 0.6740836055290546

@@ -0,0 +1,65 @@
Enter your query (or type 'exit' to quit): It's a weird refuge to refute nationality; to claim that all is a fraud, anyway. Still, it is the most sane reaction right now. The right to walk away. All else is slavery.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/index_store.json.
Response:
**Summary Theme:**
The texts explore complex issues related to freedom, equality, and justice, particularly within political and social contexts.
They discuss the limitations of current systems in addressing human rights and ethical standards, including prison treatment,
racial discrimination, and the expansion of citizenship rights. The texts also delve into the nature of revolutions, questioning
whether they benefit a select elite or are instigated by them.
**Matching Files:**
1. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2002-05-09.txt - This text discusses the treatment of
prisoners, highlighting a lack of adherence to constitutional standards and due process, as well as addressing issues related to
nationality and citizenship.
2. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-06-04.txt - A blog post discussing the idea of 'human
nature' in politics, with a focus on the importance of freedom and the critique of a political figure's policies.
3. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-10-13.txt - This passage explores the topic of white
male privilege and its implications in business, questioning why it is highlighted while overlooking broader systemic issues.
4. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-01-03.txt - The text delves into the idea of freedom
and imagination in the early 20th century, comparing it to the present day and discussing the constraints of democracy and
capitalism.
5. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2016-11-09.txt - A text addressing civil rights history
and the hate and anger that arise from it, questioning how to respond while avoiding tribalism and degeneration.
6. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-07-28.txt - Discussing legal issues and the
challenges of thinking about human rights in a multicultural context, particularly regarding people who are neither Christians nor
infidels.
7. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-01-29.txt - A personal reflection on current
political events and the author's difficulties in expressing feelings, tying into discussions of history and psychology.
8. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2001-09-13.txt - This text addresses the justification for
violent acts during a perceived war against fundamentalism, emphasizing economic and moral principles over nationhood.
9. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-06-03.txt - A passage on the concept of
incommensurability in politics and how consensus processes aim to reconcile different perspectives in practical situations.
10. file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2002-05-09.txt - Discussing the expansion of citizenship
rights in American history, highlighting key dates and milestones in this process.
Source documents:
2002-05-09.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2002-05-09.txt 0.6722719323340611
2025-06-04.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-06-04.txt 0.6608581763116415
2024-07-14.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-07-14.txt 0.6475284193414396
2025-06-04.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-06-04.txt 0.6468059334833061
2025-06-03.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-06-03.txt 0.6466041920182646
2016-11-09.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2016-11-09.txt 0.6451955555687188
2001-09-13.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2001-09-13.txt 0.6433104875230174
2025-01-04.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-01-04.txt 0.6356563682194852
2024-10-13.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-10-13.txt 0.6347407640363988
2025-01-03.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-01-03.txt 0.6336626187333729
2021-09-16.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2021-09-16.txt 0.6328042502815873
2025-07-28.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-07-28.txt 0.6324342333276086
2025-01-04.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-01-04.txt 0.6317671258192576
2025-01-29.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-01-29.txt 0.6313280704571994
2024-06-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-06-20.txt 0.629663790289146
Query processed in 95 seconds.

saved_output/2025_08_28.txt Normal file
@@ -0,0 +1,174 @@
Enter your query (or type 'exit' to quit): I'm looking for the happiest and most joyful passages.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/index_store.json.
Response:
**Summary Theme:**
The author reflects on moments of joy and happiness in their life, exploring themes such as contentment, love, and the beauty of
everyday experiences. They express a desire to let themselves be happy every day and find pleasure in creative pursuits like
poetry and art appreciation. Despite personal struggles with depression and anxiety, the author emphasizes the importance of
finding happiness in one's daily life.
**Matching Files:**
1. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2015-12-03.txt** — Chloe's smile while praising her
piano playing brings joy, highlighting the author's appreciation for small acts of kindness.
2. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2006-08-11.txt** — The author's day feeding carrots to
horses and making pizza with Matthew is described as "fun times, maybe the best ever," showcasing their ability to find joy in
simple pleasures.
3. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2015-01-23.txt** — Reading poetry and appreciating
simple observations brings positive thoughts, indicating a focus on finding happiness through creative pursuits.
4. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-07-25.txt** — The author reflects on the joys of
their life, such as time with family and the love they experienced with T, despite later experiencing heartache.
5. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2013-02-15.txt** — The passage encourages being joyful,
happy, pleased, and glad, aligning with the author's overall theme of finding happiness in various life experiences.
6. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2019-09-14.txt** — Reflecting on a week of learning,
teaching, and feeling curious leads to the realization that one can find happiness every day, emphasizing the author's ability to
let themselves be happy.
7. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2012-09-20.txt** — This file, titled "Ευδαιμονía,"
contains ancient Greek words related to happiness and well-being, further reinforcing the author's exploration of finding joy in
life.
8. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-07-29.txt** — The scorching weather and being
outside provide a backdrop to the author's ability to find happiness despite potential physical discomfort, demonstrating their
resilient outlook.
9. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-10-09.txt** — The author expresses frustration and
depression due to daily interactions but also acknowledges the importance of finding happiness in life, aligning with their
broader theme.
10. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-12-06.txt** — The passage defines happiness as
contentment and peacefulness, highlighting the author's pursuit of a joyful life through their experiences.
Source documents:
2025-07-29.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-07-29.txt 0.7135682886000794
2008-12-06.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-12-06.txt 0.7099131243276414
2009-06-04.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2009-06-04.txt 0.6973211899243362
2025-08-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-08-20.txt 0.6866097119060084
2013-02-15.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2013-02-15.txt 0.686259123672228
2012-09-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2012-09-20.txt 0.6790148415972938
2015-01-23.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2015-01-23.txt 0.6761073066656899
2015-12-03.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2015-12-03.txt 0.6712531329880593
2006-08-11.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2006-08-11.txt 0.6613670040827223
2024-07-25.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-07-25.txt 0.6570111677987235
2025-08-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-08-20.txt 0.6558116128405127
2019-09-14.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2019-09-14.txt 0.6549423349658567
2024-04-03.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-04-03.txt 0.6546862471469852
2023-07-24.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2023-07-24.txt 0.6544076938168284
2025-08-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-08-20.txt 0.6541587448214657
Query processed in 73 seconds.
---
This was a strange failure!
((.venv) ) ~/Library/CloudStorage/Dropbox/nd/ssearch/$ run_query.sh
Enter your query (or type 'exit' to quit): Find documents that express feelings of gratitude.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/index_store.json.
Response:
**Summary Theme:**
The query is about finding documents expressing feelings of gratitude. However, it seems there was an error in my interpretation
or the context provided, as the dominant themes I identified earlier were related to depression and anxiety rather than gratitude.
Based on the given context, the theme that matches the query is related to personal struggles with mental health, particularly
feelings of sadness and appreciation for connections.
**Matching Files:**
1. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-10-09.txt** — Expressed frustration with joggers on the bike
path but did not mention gratitude.
2. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-08-20.txt** — No direct expressions of gratitude found, but a
reflection on personal struggles and achievements was present.
3. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-05-27.txt** — Focuses on negative emotions like anxiety and
anger, with no clear expressions of gratitude.
4. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2013-05-23.txt** — Mentions the joy of helping others achieve their
goals, which could be interpreted as a form of appreciation or gratitude for their success and recognition.
5. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2015-03-17.txt** — Contains suicidal thoughts and negative
feelings, indicating a lack of gratitude.
6. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2023-07-16.txt** — Describes feelings of loss and the search for
meaning, devoid of expressions of gratitude.
7. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-03-18.txt** — No clear mentions of gratitude found.
8. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2007-05-31.txt** — Focuses on career concerns and negative
emotions, without expressing gratitude.
9. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2020-05-13.txt** — Struggles with recognizing others' efforts due
to internal bad feelings, which contrasts the idea of gratitude.
10. **/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2020-02-01.txt** — Mentions reconnecting with old friendships and
family, but there are no explicit expressions of gratitude.
Source documents:
2025-08-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-08-20.txt 0.6865291287082457
2008-05-27.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-05-27.txt 0.6707430757786356
2023-02-17.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2023-02-17.txt 0.6624994985797085
2025-08-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-08-20.txt 0.6614406157945066
2025-03-18.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-03-18.txt 0.6589271548285772
2025-08-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-08-20.txt 0.6583888795181797
2025-07-28.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-07-28.txt 0.6575634356770015
2012-09-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2012-09-20.txt 0.6564913212073614
2020-05-13.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2020-05-13.txt 0.6563809376620068
2025-08-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-08-20.txt 0.6549296468531686
2013-05-23.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2013-05-23.txt 0.653871795081564
2009-06-04.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2009-06-04.txt 0.6535844277567499
2007-05-31.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2007-05-31.txt 0.6524713123412845
2025-07-29.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-07-29.txt 0.6517446358739963
2020-02-01.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2020-02-01.txt 0.6514433384900066
Query processed in 74 seconds.
---
I implemented a regex that strips the full path:
((.venv) ) ~/Library/CloudStorage/Dropbox/nd/ssearch/$ run_query.sh
Enter your query (or type 'exit' to quit): Entries that discuss testing one's limits, especially emotional and mental.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/index_store.json.
Response:
**Summary Theme:**
The dominant theme in this context appears to be an individual exploring their emotions, particularly their mental and emotional
boundaries, as well as the impact of societal perceptions on feelings. The writer grapples with anxiety, depression, self-worth,
and the fear of inadequacy while also contemplating their own mortality and purpose. They seek to understand and manage their
emotions, often viewing them as data or information that can guide survival and informed decision-making.
**Matching Files:**
1. **file_path: ./data/2023-07-16.txt** — Describes the struggle of wrestling with depression for years, emphasizing the search
for meaning in a world driven by efficiency and optimization.
2. **file_path: ./data/2015-03-17.txt** — Mentions suicidal thoughts and feeling overwhelmed by negative emotions, indicating a
desire to test one's limits emotionally.
3. **file_path: ./data/2019-01-14.txt** — Discusses the struggle with controlling impulses and feelings of stress, anxiety, and
depression while questioning if one is a prisoner of their biology.
4. **file_path: ./data/2025-06-17.txt** — Explores the concept of feeling out personal boundaries and accepting dissonance, which
could be seen as testing emotional limits.
5. **file_path: ./data/2025-08-20.txt** — Mentions the interest in anarchy while being invested in capital markets and holding a
tenured position, indicating a potential exploration of one's limits.
6. **file_path: ./data/2017-12-06.txt** — Expresses suicidal thoughts due to burnout and emotional exhaustion, suggesting an
attempt to test personal boundaries.
7. **file_path: ./data/2017-12-16.txt** — Explores the desire to be a better person and the struggle with balance, potentially
indicating a journey of testing one's limits.
8. **file_path: ./data/2017-04-13.txt** — Focuses on worrying about hypotheticals and imagined fights, suggesting an exploration
of personal boundaries and emotional limits.
9. **file_path: ./data/2024-09-20.txt** — Admitted to having depressive thoughts despite appearing jovial, indicating a discussion
on testing the limits of one's mental health.
10. **file_path: ./data/2025-08-20.txt** — The computer facilitates artistic innovation by freeing the artist from conventional
"mental ready-mades," enabling the production of new assemblages of shapes and colors.
Source documents:
2019-01-28.txt ./data/2019-01-28.txt 0.7091032318236316
2003-03-09.txt ./data/2003-03-09.txt 0.6819464422399241
2025-08-20.txt ./data/2025-08-20.txt 0.6796124657599102
2025-08-20.txt ./data/2025-08-20.txt 0.6785008440538487
2017-04-13.txt ./data/2017-04-13.txt 0.6768340197245936
2022-05-06.txt ./data/2022-05-06.txt 0.6750801120630013
2023-01-27.txt ./data/2023-01-27.txt 0.6703347559624786
2023-03-14.txt ./data/2023-03-14.txt 0.668340287632692
2025-06-17.txt ./data/2025-06-17.txt 0.6656929175939117
2025-08-20.txt ./data/2025-08-20.txt 0.6645024849162311
2023-07-16.txt ./data/2023-07-16.txt 0.6618312766890652
2021-04-15.txt ./data/2021-04-15.txt 0.661171288633267
2025-08-20.txt ./data/2025-08-20.txt 0.6600615010925119
2019-01-14.txt ./data/2019-01-14.txt 0.6563840810491259
2025-05-23.txt ./data/2025-05-23.txt 0.6561484407217757
Query processed in 79 seconds.
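The path-stripping mentioned above can be done with a small substitution; this is a minimal sketch of the idea (the pattern and the `strip_full_path` name are assumptions for illustration, not the actual implementation), collapsing absolute paths under the project root down to the relative `./data/...` form seen in the output:

```python
import re

# Assumed pattern: match the absolute prefix up through the ssearch/
# project root and replace it with "./" so only the relative path remains.
ROOT = re.compile(r"/Users/[^ ]*?/ssearch/")

def strip_full_path(text: str) -> str:
    """Rewrite absolute project paths in `text` to relative form."""
    return ROOT.sub("./", text)
```

Applied to the response text before printing, this turns `/Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-10-09.txt` into `./data/2024-10-09.txt` while leaving non-path text untouched.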
124
saved_output/2025_08_30.txt Normal file
@ -0,0 +1,124 @@
((.venv) ) ~/Library/CloudStorage/Dropbox/nd/ssearch/$ run_query.sh
Enter your query (or type 'exit' to quit): You are a machine and you can program yourself.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/index_store.json.
Response:
### Summary Theme
The dominant theme from the provided context revolves around the concept of human agency and the transformative power of
self-programming, often referred to as "YOU ARE A MACHINE AND YOU CAN PROGRAM YOURSELF". This phrase encapsulates the idea that
individuals have the capability to upgrade their own mental and emotional states through conscious effort and learning. The
context explores this theme through various angles, including philosophical reflections on storytelling and human existence,
discussions of meditation and personal growth, and explorations of information replication and self-replication systems like DNA
and memes.
### Matching Files
1. **file_path: ./data/2024-09-21.txt** - The passage discusses Von Neumann's theory of self-replication, which ties into the idea
that individuals can replicate and "program" themselves through learning and development.
2. **file_path: ./data/2024-08-19.txt** - This snippet features a series of repetitions of "YOU ARE A MACHINE AND YOU CAN PROGRAM
YOURSELF," emphasizing the transformative power of self-programming and personal growth.
3. **file_path: ./data/2024-03-11.txt** - The author contemplates purpose, meaning, and the act of programming oneself through
writing and reflection, aligning with the self-programming theme.
4. **file_path: ./data/2023-07-16.txt** - A robot's inner struggles, including a desire to explore feelings and understand its own
existence, hint at the idea of self-programming and personal development.
5. **file_path: ./data/2024-04-11.txt** - The author's fascination with the materiality of computation suggests a connection to
understanding human existence through self-programming and self-replication.
6. **file_path: ./data/2024-02-13.txt** - The extensive list of tasks and projects the author wants to accomplish reflects a drive
for personal growth, akin to self-programming.
7. **file_path: ./data/2024-08-19.txt** - A discussion of information degradation and the relationship between DNA, memes, and
information replication hints at a deeper understanding through self-programming.
8. **file_path: ./data/2025-01-24.txt** - A detailed account of typing a program onto IBM cards and working with early computers
emphasizes the labor involved in creating and learning from technology, which can be seen as self-programming through the
acquisition of knowledge.
9. **file_path: ./data/2025-02-05.txt** - This passage raises questions about truth and human agency, which are closely tied to
the idea of self-programming and the ability to shape one's own existence through conscious effort.
10. **file_path: ./data/2022-01-22.txt** - The author relates meditation to "operating system updates," illustrating how
self-programming can lead to improved performance and functionality in the mind, much like a computer's software.
Source documents:
2021-02-12.txt ./data/2021-02-12.txt 0.7813853524305151
2021-03-12.txt ./data/2021-03-12.txt 0.7170262020422805
2021-03-22.txt ./data/2021-03-22.txt 0.7080438590471859
2025-02-05.txt ./data/2025-02-05.txt 0.700772619041579
2022-01-22.txt ./data/2022-01-22.txt 0.6946526808142116
2024-08-19.txt ./data/2024-08-19.txt 0.6909295863339957
2024-09-21.txt ./data/2024-09-21.txt 0.6863798746276172
2024-08-19.txt ./data/2024-08-19.txt 0.6811521050296564
2024-03-11.txt ./data/2024-03-11.txt 0.6776553751255855
2023-07-16.txt ./data/2023-07-16.txt 0.6734772841938028
2021-03-01.txt ./data/2021-03-01.txt 0.6703476982962236
2025-05-26.txt ./data/2025-05-26.txt 0.6699061717036373
2024-02-13.txt ./data/2024-02-13.txt 0.6675189407579228
2025-01-24.txt ./data/2025-01-24.txt 0.6661259191158485
2024-04-11.txt ./data/2024-04-11.txt 0.664854046786588
Query processed in 97 seconds.
Enter your query (or type 'exit' to quit): Summarize passages related to questions about truth and human agency.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage_exp/index_store.json.
Response:
**Summary Theme:**
The texts explore the relationship between truth, knowledge, and human agency, highlighting how our understanding of reality is
shaped by interpretation and negotiation rather than an objective standard. They question the nature of self-awareness and
consciousness, suggesting that it arises from independent facts and truths beyond individual control. This perspective challenges
traditional notions of knowledge and ethics, suggesting that shared meaning and identity might be more influential than facts
themselves. The theme also delves into the implications of these ideas for governance, society, and the integration of diverse
epistemic frameworks.
**Matching Files:**
1. ./data/2025-03-08.txt - The passage emphasizes that truth is not fixed but a product of human interpretation, challenging the
idea of absolute knowledge and suggesting that our beliefs are subject to revision and refinement as we learn more about ourselves
and the world.
2. ./data/2025-02-14.txt - This snippet discusses truth as a social construct, prompting questions about ethical epistemology and
the potential role of AI in shaping epistemic environments. It also introduces the idea that knowledge is a product of shared
meaning and identity rather than just facts.
3. ./data/2025-02-14.txt - The text includes a quote suggesting that our concepts are self-created, which can both empower and
limit us, fostering critical thinking and personal responsibility while also potentially leading to disorientation and existential
uncertainty.
4. ./data/2025-03-08.txt - The implications of truth being a social construct are explored further, including the idea that
fact-checking alone doesn't address shared meaning and identity, leading to discussions about ethics, society, governance, and the
role of AI in shaping epistemic environments.
5. ./data/2025-03-08.txt - This file delves into the concept of "we are supplicants to our own fiction," exploring how humans
create meaning systems that can be comforting but potentially misleading or limiting, and emphasizing the importance of
self-awareness for critical thinking and personal growth.
6. ./data/2025-02-14.txt - The text raises questions about the nature of truth, reality, and human agency, inviting contemplation
on whether our stories are mere constructs or reflections of deeper aspects of human existence, and how we can navigate
storytelling to uncover accurate portrayals.
7. ./data/2025-03-08.txt - This passage continues the discussion on the implications of a social construct of truth, including the
potential role of AI in mediating competing epistemic frameworks and reducing polarization.
8. ./data/2010-12-30.txt - A statement that "there is no meaning and no purpose in life" is discussed, reflecting existentialist
philosophies, and raising questions about reconciling the tension between fiction and the search for meaning and purpose.
9. ./data/2025-03-08.txt - This snippet presents a list of resources related to knowledge, power, and institutions, including
Michel Foucault, Donna Haraway, Stefan Lorenz Sorgner, William James, and Noam Chomsky, reflecting on the relationship between
power and knowledge throughout history.
Source documents:
2006-12-27.txt ./data/2006-12-27.txt 0.7103349341421636
2025-02-14.txt ./data/2025-02-14.txt 0.6992770918721224
2025-03-08.txt ./data/2025-03-08.txt 0.686001774445945
2025-02-14.txt ./data/2025-02-14.txt 0.6743349162123844
2025-03-08.txt ./data/2025-03-08.txt 0.6733934128354977
2025-03-08.txt ./data/2025-03-08.txt 0.6706689033144045
2025-02-05.txt ./data/2025-02-05.txt 0.6702486733668184
2025-01-04.txt ./data/2025-01-04.txt 0.6699433363201491
2025-02-14.txt ./data/2025-02-14.txt 0.6691576672622886
2025-03-08.txt ./data/2025-03-08.txt 0.6670311145975771
2008-04-22.txt ./data/2008-04-22.txt 0.665624998848253
2025-02-06.txt ./data/2025-02-06.txt 0.6654464518589284
2010-12-30.txt ./data/2010-12-30.txt 0.663147445474458
2004-02-15.txt ./data/2004-02-15.txt 0.6625948924633361
2025-02-06.txt ./data/2025-02-06.txt 0.6608789240071589
Query processed in 92 seconds.
1
saved_output/README.txt Normal file
@ -0,0 +1 @@
This directory contains collections of interesting output from the nd_ssearch query engine.
@ -0,0 +1,113 @@
# Generated by llama3.1:8B
Enter a search topic or question (or 'exit'):
Simplicity, peace, and acceptance.
**Summary Theme**
The dominant theme that emerges from the provided context is the pursuit of simplicity, peace, and acceptance as a means to find
meaning and contentment in life. The excerpts suggest that individuals often struggle with existential crises, anxiety, and
dissatisfaction, but through various philosophical and spiritual practices, they seek to cultivate a sense of inner peace and
harmony.
**Matching Files**
1. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-09-22.txt** — Suzuki discusses the importance of
letting go of intellectual pursuits and embracing the simplicity of life, feeling the "power of life" and being content with its
evolution.
2. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2015-02-12.txt** — The author reflects on their emptiness
and yearning for something deeper, wondering if they can relish in this feeling and explore it further.
3. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-02-27.txt** — Life is described as pain, but the
author finds solace in God and feels a deep connection to something universal, seeking to do good for others.
4. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2012-09-08.txt** — The text advises living life with
virtue above pleasure and tranquility above happiness, finding contentment in the present moment.
5. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2007-05-30.txt** — The author reflects on their
existential crisis and seeks to find meaning through meditation, exploring the wonder of life and the universe.
6. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-02-05.txt** — Alan Watts' measured pace and calm
tone are mentioned as soothing, inviting listeners to slow down and absorb his philosophical musings.
7. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2003-11-11.txt** — The author experiences a sense of
dissolution of the self and acceptance of mortality through music, finding hope in the peaceful and calm aspects of human
existence.
8. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2011-04-20.txt** — The text presents two contrasting
choices for raising children: to learn how to fight or navigate complex social interactions, leading to exhaustion and missteps.
9. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2003-11-11.txt** — Music is mentioned as a means to
connect with those who have passed and accept one's own mortality, finding hope in acceptance.
10. **file_path: /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-07-09.txt** — The author reflects on their own
flaws and limitations, seeking to cultivate peace and tranquility through meditation and philosophical insights.
These files prioritize the pursuit of simplicity, peace, and acceptance as a means to find meaning and contentment in life.
Source documents:
2008-09-22.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-09-22.txt 0.6942251115364034
2008-12-06.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-12-06.txt 0.69329826339918
2006-01-18.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2006-01-18.txt 0.6928289731862756
2024-07-09.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-07-09.txt 0.6851764077064736
2024-10-21.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-10-21.txt 0.6850957394630768
2021-05-15.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2021-05-15.txt 0.6792933726282656
2025-07-28.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-07-28.txt 0.677805020905822
2008-02-27.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-02-27.txt 0.6758790431127354
2025-02-05.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-02-05.txt 0.6751368515758781
2003-11-11.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2003-11-11.txt 0.6741012107117246
2015-02-12.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2015-02-12.txt 0.6724595748596391
2012-09-08.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2012-09-08.txt 0.6709374012666305
2007-05-30.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2007-05-30.txt 0.6705586893086012
2003-11-11.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2003-11-11.txt 0.6683483221331056
2011-04-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2011-04-20.txt 0.6679376031898634
# Generated by command-r7b
Enter a search topic or question (or 'exit'):
Simplicity, peace, and acceptance.
**Summary Theme:**
The dominant themes from the provided context revolve around personal growth, spirituality, and finding contentment through
simplicity, peace, and acceptance. The text emphasizes the idea that one should let go of intellectual pursuits and anxiety to feel
more content with life's challenges and find inner peace. It explores the concepts of Zen, tranquility, and the power of meditation
as a path to achieving this state of being. Additionally, the texts touch on the impact of personal experiences (like the birth of
a child) in fostering a sense of calm and connection with the divine.
**Matching Files:**
1. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-09-22.txt - D.T. Suzuki's excerpt highlights the importance of
letting go of intellectual pursuits and focusing on the present moment to achieve peace and contentment, aligning with themes of
simplicity and acceptance.
2. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2015-02-12.txt - This file discusses the emptiness and yearning for
deeper meaning, suggesting a journey towards personal peace and content in one's spiritual quest.
3. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-12-06.txt - The excerpt promotes happiness through contentment
and peacefulness, echoing the themes of simplicity and acceptance.
4. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2003-11-11.txt - The text explores the dissolution of the self and
finding peace in spiritual experiences, relating to themes of simplification and acceptance.
5. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-10-21.txt - A reflection on human nature and violence, this
passage emphasizes the importance of letting go of aggression and finding peace through a peaceful mind.
6. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-07-28.txt - This file mentions consensus-based processes and
Quaker values, which align with the themes of acceptance and tranquility in decision-making.
7. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2012-09-08.txt - The excerpt promotes the idea of regulating thoughts
and actions with a view towards mortality, highlighting simplicity and tranquility in preparation for life's end.
8. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2007-05-30.txt - The text discusses existential crises and finding
meaning through meditation and therapy, contributing to the theme of simplification through introspection.
9. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2011-04-20.txt - While more focused on child-rearing, this passage
hints at themes of acceptance and tranquility in navigating complex social interactions.
10. /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2003-11-11.txt - The exploration of prayer and spiritual experiences
leads to the theme of acceptance as one sheds the veneer of life, revealing deeper human existence.
Source documents:
2008-09-22.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-09-22.txt 0.6942251115364034
2008-12-06.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-12-06.txt 0.69329826339918
2006-01-18.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2006-01-18.txt 0.6928289731862756
2024-07-09.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-07-09.txt 0.6851764077064736
2024-10-21.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2024-10-21.txt 0.6850957394630768
2021-05-15.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2021-05-15.txt 0.6792933726282656
2025-07-28.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-07-28.txt 0.677805020905822
2008-02-27.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2008-02-27.txt 0.6758790431127354
2025-02-05.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2025-02-05.txt 0.6751368515758781
2003-11-11.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2003-11-11.txt 0.6741012107117246
2015-02-12.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2015-02-12.txt 0.6724595748596391
2012-09-08.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2012-09-08.txt 0.6709374012666305
2007-05-30.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2007-05-30.txt 0.6705586893086012
2003-11-11.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2003-11-11.txt 0.6683483221331056
2011-04-20.txt /Users/furst/Library/CloudStorage/Dropbox/nd/ssearch/data/2011-04-20.txt 0.6679376031898634
29
tests/README.md Normal file
@ -0,0 +1,29 @@
# LLM Comparison Tests
Query used for all tests: **"Passages that quote Louis Menand."**
Script: `query_hybrid_bm25_v4.py` (hybrid BM25 + vector, cross-encoder re-rank to top 15)
Retrieval is identical across all tests (same 15 chunks, same scores).
Only the LLM synthesis step differs.
File naming: `results_<model>_t<temperature>.txt`
## Results
| File | LLM | Temperature | Files cited | Time | Notes |
|------|-----|-------------|-------------|------|-------|
| `results_gpt4omini_t0.1.txt` | gpt-4o-mini (OpenAI API) | 0.1 | 6 | 44s | Broader coverage, structured numbered list, drew from chunks ranked as low as #14 |
| `results_commandr7b_t0.8.txt` | command-r7b (Ollama local) | 0.8 (default) | 2 | 78s | Focused on top chunks, reproduced exact quotes verbatim |
| `results_gpt4omini_t0.3.txt` | gpt-4o-mini (OpenAI API) | 0.3 | 6 | 45s | Very similar to 0.1 run -- same 6 files, same structure, slightly more interpretive phrasing |
| `results_commandr7b_t0.3.txt` | command-r7b (Ollama local) | 0.3 | 6 | 94s | Major improvement over 0.8 default: cited 6 files (was 2), drew from lower-ranked chunks including 2024-08-03 (#15) |
## Observations
- Lowering command-r7b from 0.8 to 0.3 dramatically improved breadth (2 → 6 files cited).
At 0.8, the model focused narrowly on the top-scored chunks. At 0.3, it used the full
context window much more effectively.
- gpt-4o-mini showed little difference between 0.1 and 0.3. It already used the full
context at 0.1. The API model appears less sensitive to temperature for this task.
- command-r7b at 0.3 took longer (94s vs 78s), likely due to generating more text.
- At temperature=0.3, both models converge on similar quality: 6 files cited, good
coverage of the context window, mix of direct quotes and paraphrases.
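The candidate-merge step the script logs ("Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15") can be sketched as follows. This is an assumed reconstruction for illustration, not the actual `query_hybrid_bm25_v4.py` code; the cross-encoder is stubbed with a placeholder `score_fn`:

```python
def merge_candidates(vector_hits, bm25_hits):
    """Union two (node_id, text) candidate lists, deduplicating by node id
    and preserving first-seen order."""
    seen, merged = set(), []
    for node_id, text in vector_hits + bm25_hits:
        if node_id not in seen:
            seen.add(node_id)
            merged.append((node_id, text))
    return merged

def rerank(candidates, score_fn, top_k=15):
    """Sort merged candidates by a cross-encoder-style score and keep top_k."""
    return sorted(candidates, key=lambda c: score_fn(c[1]), reverse=True)[:top_k]
```

With 20 vector hits and 20 BM25 hits sharing 7 node ids, `merge_candidates` yields 33 unique candidates, and `rerank` trims them to the 15 chunks passed to the LLM, matching the logged counts.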
@ -0,0 +1,54 @@
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15
Response:
The provided excerpts contain several references to Louis Menand's ideas and quotes. Here are the specific passages:
- In file_path: ./data/2025-11-04.txt, you can find a direct quote from Louis Menand: "We created God, and then pretended that God
created us. We hypostatized our own concept and turned it into something “out there” whose commandments (which we made up) we
struggle to understand and obey. We are supplicants to our own fiction."
- In file_path: ./data/2025-02-14.txt, there is a reference to Menand's quote about the human tendency to create and interact with
abstract ideas as if they are tangible realities.
- In file_path: ./data/2022-08-14.txt, another excerpt from Menand's work is mentioned: "We created God, and then we pretended
that God created us."
- In file_path: ./data/2025-07-27.txt, the author discusses Menand's quote about the creation of fictions and the role of
organization in human culture.
- In file_path: ./data/2024-09-06.txt, there is a mention of Menand's writing style, describing him as "witty and serious."
- In file_path: ./data/2025-02-14.txt, the poem takes a wistful tone, hinting at the fragility of human attachment to concepts,
which is reminiscent of existential crises and Menand's ideas on hypostasis.
- In file_path: ./data/2025-07-27.txt, the author further elaborates on Menand's quote, emphasizing the human tendency to create
and interact with fictions.
- In file_path: ./data/2024-08-03.txt, there is a reference to Louis Menand's book "The Free World" and its exploration of art,
literature, and culture in the 20th century.
All these excerpts contribute to understanding Louis Menand's ideas on hypostasis, human creation of fictions, and the complex
relationship between stories, beliefs, and reality.
Files contributing to the answer:
- ./data/2025-11-04.txt
- ./data/2025-02-14.txt
- ./data/2022-08-14.txt
- ./data/2025-07-27.txt
- ./data/2024-09-06.txt
- ./data/2025-02-14.txt
- ./data/2025-07-27.txt
- ./data/2024-08-03.txt
Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 94 seconds.
@ -0,0 +1,58 @@
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15
Response:
The provided excerpts feature multiple references to Louis Menand's quote about hypostasis and our creation of God as a fiction.
Here are the relevant passages:
- **"We created God, and then pretended that God created us."** **(file_path: ./data/2025-11-04.txt)**
This excerpt introduces Menand's quote as the starting point for an exploration of how humans create and interact with abstract
concepts.
- **"We hypostatized our own concept and turned it into something “out there” whose commandments (which we made up) we struggle to
understand and obey."** **(file_path: ./data/2025-11-04.txt)**
This quote delves further into the idea of turning concepts into something tangible, emphasizing the struggle to understand and
follow the rules we create.
- **"We are supplicants to our own fiction."** **(file_path: ./data/2025-11-04.txt)**
The passage concludes by emphasizing the concept of being subservient to our own creations, highlighting the complex relationship
between human stories and reality.
- **"We created God, and then pretended that God created us."** **(file_path: ./data/2022-08-14.txt)**
This excerpt reiterates the core idea from Menand's quote in a slightly different phrasing.
- **"We hypostatized our own concept and turned it into something “out there” whose commandments, which we also made up, we
struggle to understand and obey."** **(file_path: ./data/2022-08-14.txt)**
This quote emphasizes the process of creating rules and struggling to comprehend them while adhering to them.
All these excerpts contribute to exploring Menand's perspective on how humans create, project, and interact with abstract
concepts, highlighting the complex relationship between stories, beliefs, and reality.
**Files contributing:**
* **2025-11-04.txt**
* **2022-08-14.txt**
Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 78 seconds.
@ -0,0 +1,65 @@
run_query.sh
Passages that quote Louis Menand.
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15
Response:
The excerpts provided contain several passages that quote or reference Louis Menand, particularly focusing on his ideas about
hypostasis and the human relationship with concepts such as God and societal structures. Here are the relevant passages:
1. **File: ./data/2025-11-04.txt**
- This excerpt includes the quote: "We created God, and then pretended that God created us. We hypostatized our own concept and
turned it into something 'out there' whose commandments (which we made up) we struggle to understand and obey. We are supplicants
to our own fiction." It discusses the parallels between this idea and other societal constructs like democracy and politics.
2. **File: ./data/2025-02-14.txt**
- The passage reiterates Menand's quote about hypostasis, emphasizing how humans have taken abstract concepts and given them a
separate existence, which allows for worship and obedience. It reflects on the implications of this process on our understanding
of reality.
3. **File: ./data/2022-08-14.txt**
- This excerpt discusses Menand's critique of how people perceive forces as external and inevitable, which are actually human
creations. It includes the same quote about God and hypostasis, highlighting the reification of human concepts.
4. **File: ./data/2025-07-27.txt**
- Here, Menand's quote is referenced in the context of discussing narratives and organizational structures, emphasizing the
fictional nature of societal constructs.
5. **File: ./data/2025-02-05.txt**
- This excerpt paraphrases Menand's idea that we are "supplicants to our own fictions," linking it to the broader theme of
storytelling in human culture.
6. **File: ./data/2024-09-04.txt**
- Although not a direct quote, this excerpt mentions Menand's work in relation to AI and copyright, indicating his relevance in
contemporary discussions.
In summary, the passages collectively illustrate Menand's exploration of how humans create and interact with abstract concepts,
particularly the notion of God, and how these constructs influence our understanding of reality and societal structures.
**Files that contributed to the answer:**
- **2025-11-04.txt**: Discusses the quote in relation to societal constructs.
- **2025-02-14.txt**: Reiterates the quote and its implications on reality.
- **2022-08-14.txt**: Analyzes Menand's critique of perception and includes the quote.
- **2025-07-27.txt**: References the quote in the context of narratives and organization.
- **2025-02-05.txt**: Paraphrases Menand's idea about fictions in human culture.
- **2024-09-04.txt**: Mentions Menand's relevance in discussions about AI.
Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 44 seconds.

View file

@ -0,0 +1,63 @@
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15
Response:
The excerpts contain several passages that quote or reference Louis Menand, particularly focusing on his ideas about hypostasis
and the human relationship with concepts such as God and societal structures. Here are the relevant passages:
1. **File: ./data/2025-11-04.txt**
- This excerpt includes the quote: "We created God, and then pretended that God created us. We hypostatized our own concept and
turned it into something 'out there' whose commandments (which we made up) we struggle to understand and obey. We are supplicants
to our own fiction." It discusses the parallels between this idea and other societal constructs like democracy and politics.
2. **File: ./data/2025-02-14.txt**
- The passage reiterates Menand's quote about hypostasis and emphasizes how humans have created a sense of distance from their
own creations, allowing for worship or rebellion against them. It reflects on the implications of this process for understanding
reality and human experience.
3. **File: ./data/2022-08-14.txt**
- This excerpt discusses Menand's "Free World," specifically pages 335-339, where he critiques how people perceive external
forces as inevitable, which are actually human creations. It includes the same quote about God and hypostasis, highlighting the
idea that these constructs are reified as "the way things are."
4. **File: ./data/2025-07-27.txt**
- This excerpt references Menand's quote in the context of discussing anti-capitalist themes and the narratives that organize
society, emphasizing the fictional nature of these constructs.
5. **File: ./data/2025-02-05.txt**
- Here, Menand's notion that we are "supplicants to our own fictions" is paraphrased, relating it to the broader role of
storytelling in human culture and how we create and live by the narratives we construct.
6. **File: ./data/2024-09-04.txt**
- This excerpt mentions Menand in the context of discussing AI and copyright, though it does not quote him directly. It
highlights his relevance in contemporary discussions about technology.
### Summary of Contributions:
- **2025-11-04.txt**: Introduces Menand's quote and connects it to societal constructs.
- **2025-02-14.txt**: Explores the implications of Menand's ideas on reality and human experience.
- **2022-08-14.txt**: Discusses Menand's critique of how humans perceive societal constructs as external forces.
- **2025-07-27.txt**: Relates Menand's quote to anti-capitalist themes and societal narratives.
- **2025-02-05.txt**: Paraphrases Menand's ideas in the context of storytelling in culture.
- **2024-09-04.txt**: Mentions Menand in discussions about AI, highlighting his ongoing relevance.
Overall, these excerpts collectively illustrate Menand's critical perspective on how humans create and relate to their own
constructs, emphasizing the fictional nature of many societal narratives.
Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 45 seconds.

1065
vs_metrics.ipynb Normal file

File diff suppressed because one or more lines are too long
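The recurring log line `Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15` summarizes the hybrid-retrieval bookkeeping: two ranked candidate lists are unioned, duplicates counted once, and the merged pool truncated after re-ranking. A minimal sketch of that accounting is below; the `merge_candidates` helper and the document IDs are hypothetical stand-ins for the LlamaIndex vector retriever and BM25 retriever that ssearch actually uses, and the cross-encoder re-scoring step is only indicated by a comment.

```python
# Sketch of the hybrid-retrieval bookkeeping behind log lines like
# "Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15".
# Candidate IDs here are synthetic; in ssearch they would come from the
# vector index and the BM25 retriever.

def merge_candidates(vector_hits, bm25_hits, top_k=15):
    """Union two ranked candidate lists, dropping duplicates, and report
    the counts that appear in the query log."""
    overlap = len(set(vector_hits) & set(bm25_hits))
    merged = list(dict.fromkeys(vector_hits + bm25_hits))  # order-preserving dedup
    stats = {
        "vector": len(vector_hits),
        "bm25": len(bm25_hits),
        "overlap": overlap,
        "merged": len(merged),
        "reranked_to": min(top_k, len(merged)),
    }
    # A cross-encoder would re-score `merged` here before truncating to top_k.
    return merged[:top_k], stats

vec = [f"doc{i}" for i in range(20)]        # 20 vector hits
bm25 = [f"doc{i}" for i in range(13, 33)]   # 20 BM25 hits, 7 shared with vec
_, stats = merge_candidates(vec, bm25)
print(stats)  # {'vector': 20, 'bm25': 20, 'overlap': 7, 'merged': 33, 'reranked_to': 15}
```

With 20 hits per retriever and 7 shared documents, the merged pool is 40 − 7 = 33 candidates, which the re-ranker then cuts to the top 15 — exactly the counts in the saved logs above.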