Initial commit: RAG pipeline for semantic search over personal journal archive

Vector search with cross-encoder re-ranking, hybrid BM25+vector retrieval,
incremental index updates, and multiple LLM backends (Ollama local, OpenAI API).
This commit is contained in:
Eric 2026-02-20 06:02:28 -05:00
commit e9fc99ddc6
43 changed files with 7349 additions and 0 deletions

29
tests/README.md Normal file
View file

@ -0,0 +1,29 @@
# LLM Comparison Tests
Query used for all tests: **"Passages that quote Louis Menand."**
Script: `query_hybrid_bm25_v4.py` (hybrid BM25 + vector, cross-encoder re-rank to top 15)
Retrieval is identical across all tests (same 15 chunks, same scores).
Only the LLM synthesis step differs.
File naming: `results_<model>_t<temperature>.txt`
## Results
| File | LLM | Temperature | Files cited | Time | Notes |
|------|-----|-------------|-------------|------|-------|
| `results_gpt4omini_t0.1.txt` | gpt-4o-mini (OpenAI API) | 0.1 | 6 | 44s | Broader coverage, structured numbered list, drew from chunks ranked as low as #14 |
| `results_commandr7b_t0.8.txt` | command-r7b (Ollama local) | 0.8 (default) | 2 | 78s | Focused on top chunks, reproduced exact quotes verbatim |
| `results_gpt4omini_t0.3.txt` | gpt-4o-mini (OpenAI API) | 0.3 | 6 | 45s | Very similar to 0.1 run -- same 6 files, same structure, slightly more interpretive phrasing |
| `results_commandr7b_t0.3.txt` | command-r7b (Ollama local) | 0.3 | 6 | 94s | Major improvement over 0.8 default: cited 6 files (was 2), drew from lower-ranked chunks including 2024-08-03 (#15) |
## Observations
- Lowering command-r7b from 0.8 to 0.3 dramatically improved breadth (2 → 6 files cited).
At 0.8, the model focused narrowly on the top-scored chunks. At 0.3, it used the full
context window much more effectively.
- gpt-4o-mini showed little difference between 0.1 and 0.3. It already used the full
context at 0.1. The API model appears less sensitive to temperature for this task.
- command-r7b at 0.3 took longer (94s vs 78s), likely due to generating more text.
- At temperature=0.3, both models converge on similar quality: 6 files cited, good
coverage of the context window, mix of direct quotes and paraphrases.

View file

@ -0,0 +1,54 @@
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15
Response:
The provided excerpts contain several references to Louis Menand's ideas and quotes. Here are the specific passages:
- In file_path: ./data/2025-11-04.txt, you can find a direct quote from Louis Menand: "We created God, and then pretended that God
created us. We hypostatized our own concept and turned it into something “out there” whose commandments (which we made up) we
struggle to understand and obey. We are supplicants to our own fiction."
- In file_path: ./data/2025-02-14.txt, there is a reference to Menand's quote about the human tendency to create and interact with
abstract ideas as if they are tangible realities.
- In file_path: ./data/2022-08-14.txt, another excerpt from Menand's work is mentioned: "We created God, and then we pretended
that God created us."
- In file_path: ./data/2025-07-27.txt, the author discusses Menand's quote about the creation of fictions and the role of
organization in human culture.
- In file_path: ./data/2024-09-06.txt, there is a mention of Menand's writing style, describing him as "witty and serious."
- In file_path: ./data/2025-02-14.txt, the poem takes a wistful tone, hinting at the fragility of human attachment to concepts,
which is reminiscent of existential crises and Menand's ideas on hypostasis.
- In file_path: ./data/2025-07-27.txt, the author further elaborates on Menand's quote, emphasizing the human tendency to create
and interact with fictions.
- In file_path: ./data/2024-08-03.txt, there is a reference to Louis Menand's book "The Free World" and its exploration of art,
literature, and culture in the 20th century.
All these excerpts contribute to understanding Louis Menand's ideas on hypostasis, human creation of fictions, and the complex
relationship between stories, beliefs, and reality.
Files contributing to the answer:
- ./data/2025-11-04.txt
- ./data/2025-02-14.txt
- ./data/2022-08-14.txt
- ./data/2025-07-27.txt
- ./data/2024-09-06.txt
- ./data/2025-02-14.txt
- ./data/2025-07-27.txt
- ./data/2024-08-03.txt
Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 94 seconds.

View file

@ -0,0 +1,58 @@
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15
Response:
The provided excerpts feature multiple references to Louis Menand's quote about hypostasis and our creation of God as a fiction.
Here are the relevant passages:
- **"We created God, and then pretended that God created us."** **(file_path: ./data/2025-11-04.txt)**
This excerpt introduces Menand's quote as the starting point for an exploration of how humans create and interact with abstract
concepts.
- **"We hypostatized our own concept and turned it into something “out there” whose commandments (which we made up) we struggle to
understand and obey."** **(file_path: ./data/2025-11-04.txt)**
This quote delves further into the idea of turning concepts into something tangible, emphasizing the struggle to understand and
follow the rules we create.
- **"We are supplicants to our own fiction."** **(file_path: ./data/2025-11-04.txt)**
The passage concludes by emphasizing the concept of being subservient to our own creations, highlighting the complex relationship
between human stories and reality.
- **"We created God, and then pretended that God created us."** **(file_path: ./data/2022-08-14.txt)**
This excerpt reiterates the core idea from Menand's quote in a slightly different phrasing.
- **"We hypostatized our own concept and turned it into something “out there” whose commandments, which we also made up, we
struggle to understand and obey."** **(file_path: ./data/2022-08-14.txt)**
This quote emphasizes the process of creating rules and struggling to comprehend them while adhering to them.
All these excerpts contribute to exploring Menand's perspective on how humans create, project, and interact with abstract
concepts, highlighting the complex relationship between stories, beliefs, and reality.
**Files contributing:**
* **2025-11-04.txt**
* **2022-08-14.txt**
Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 78 seconds.

View file

@ -0,0 +1,65 @@
run_query.sh
Passages that quote Louis Menand.
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15
Response:
The excerpts provided contain several passages that quote or reference Louis Menand, particularly focusing on his ideas about
hypostasis and the human relationship with concepts such as God and societal structures. Here are the relevant passages:
1. **File: ./data/2025-11-04.txt**
- This excerpt includes the quote: "We created God, and then pretended that God created us. We hypostatized our own concept and
turned it into something 'out there' whose commandments (which we made up) we struggle to understand and obey. We are supplicants
to our own fiction." It discusses the parallels between this idea and other societal constructs like democracy and politics.
2. **File: ./data/2025-02-14.txt**
- The passage reiterates Menand's quote about hypostasis, emphasizing how humans have taken abstract concepts and given them a
separate existence, which allows for worship and obedience. It reflects on the implications of this process on our understanding
of reality.
3. **File: ./data/2022-08-14.txt**
- This excerpt discusses Menand's critique of how people perceive forces as external and inevitable, which are actually human
creations. It includes the same quote about God and hypostasis, highlighting the reification of human concepts.
4. **File: ./data/2025-07-27.txt**
- Here, Menand's quote is referenced in the context of discussing narratives and organizational structures, emphasizing the
fictional nature of societal constructs.
5. **File: ./data/2025-02-05.txt**
- This excerpt paraphrases Menand's idea that we are "supplicants to our own fictions," linking it to the broader theme of
storytelling in human culture.
6. **File: ./data/2024-09-04.txt**
- Although not a direct quote, this excerpt mentions Menand's work in relation to AI and copyright, indicating his relevance in
contemporary discussions.
In summary, the passages collectively illustrate Menand's exploration of how humans create and interact with abstract concepts,
particularly the notion of God, and how these constructs influence our understanding of reality and societal structures.
**Files that contributed to the answer:**
- **2025-11-04.txt**: Discusses the quote in relation to societal constructs.
- **2025-02-14.txt**: Reiterates the quote and its implications on reality.
- **2022-08-14.txt**: Analyzes Menand's critique of perception and includes the quote.
- **2025-07-27.txt**: References the quote in the context of narratives and organization.
- **2025-02-05.txt**: Paraphrases Menand's idea about fictions in human culture.
- **2024-09-04.txt**: Mentions Menand's relevance in discussions about AI.
Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 44 seconds.

View file

@ -0,0 +1,63 @@
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15
Response:
The excerpts contain several passages that quote or reference Louis Menand, particularly focusing on his ideas about hypostasis
and the human relationship with concepts such as God and societal structures. Here are the relevant passages:
1. **File: ./data/2025-11-04.txt**
- This excerpt includes the quote: "We created God, and then pretended that God created us. We hypostatized our own concept and
turned it into something 'out there' whose commandments (which we made up) we struggle to understand and obey. We are supplicants
to our own fiction." It discusses the parallels between this idea and other societal constructs like democracy and politics.
2. **File: ./data/2025-02-14.txt**
- The passage reiterates Menand's quote about hypostasis and emphasizes how humans have created a sense of distance from their
own creations, allowing for worship or rebellion against them. It reflects on the implications of this process for understanding
reality and human experience.
3. **File: ./data/2022-08-14.txt**
- This excerpt discusses Menand's "Free World," specifically pages 335-339, where he critiques how people perceive external
forces as inevitable, which are actually human creations. It includes the same quote about God and hypostasis, highlighting the
idea that these constructs are reified as "the way things are."
4. **File: ./data/2025-07-27.txt**
- This excerpt references Menand's quote in the context of discussing anti-capitalist themes and the narratives that organize
society, emphasizing the fictional nature of these constructs.
5. **File: ./data/2025-02-05.txt**
- Here, Menand's notion that we are "supplicants to our own fictions" is paraphrased, relating it to the broader role of
storytelling in human culture and how we create and live by the narratives we construct.
6. **File: ./data/2024-09-04.txt**
- This excerpt mentions Menand in the context of discussing AI and copyright, though it does not quote him directly. It
highlights his relevance in contemporary discussions about technology.
### Summary of Contributions:
- **2025-11-04.txt**: Introduces Menand's quote and connects it to societal constructs.
- **2025-02-14.txt**: Explores the implications of Menand's ideas on reality and human experience.
- **2022-08-14.txt**: Discusses Menand's critique of how humans perceive societal constructs as external forces.
- **2025-07-27.txt**: Relates Menand's quote to anti-capitalist themes and societal narratives.
- **2025-02-05.txt**: Paraphrases Menand's ideas in the context of storytelling in culture.
- **2024-09-04.txt**: Mentions Menand in discussions about AI, highlighting his ongoing relevance.
Overall, these excerpts collectively illustrate Menand's critical perspective on how humans create and relate to their own
constructs, emphasizing the fictional nature of many societal narratives.
Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 45 seconds.