Initial commit: RAG pipeline for semantic search over personal journal archive
Vector search with cross-encoder re-ranking, hybrid BM25+vector retrieval, incremental index updates, and multiple LLM backends (Ollama local, OpenAI API).
commit e9fc99ddc6
43 changed files with 7349 additions and 0 deletions
29
tests/README.md
Normal file
@@ -0,0 +1,29 @@
# LLM Comparison Tests

Query used for all tests: **"Passages that quote Louis Menand."**
Script: `query_hybrid_bm25_v4.py` (hybrid BM25 + vector, cross-encoder re-rank to top 15)

Retrieval is identical across all tests (same 15 chunks, same scores).
Only the LLM synthesis step differs.

File naming: `results_<model>_t<temperature>.txt`

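A minimal sketch of how the candidate counts reported in the result logs (`Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15`) fit together, assuming the merge step is a de-duplicating union of the two candidate lists. The function names, the chunk IDs, and the stand-in scoring function are hypothetical; none of them are taken from `query_hybrid_bm25_v4.py`, where the scores come from a cross-encoder.

```python
def merge_candidates(vector_hits, bm25_hits):
    """Union two ranked ID lists, keeping first-seen order and dropping duplicates."""
    merged = list(dict.fromkeys([*vector_hits, *bm25_hits]))
    overlap = len(set(vector_hits) & set(bm25_hits))
    return merged, overlap


def rerank(candidates, score, top_k=15):
    """Sort candidates by a relevance score (highest first) and keep the top_k."""
    return sorted(candidates, key=score, reverse=True)[:top_k]


# Hypothetical chunk IDs: 20 per retriever, 7 of them shared (chunk-13 .. chunk-19).
vector_hits = [f"chunk-{i}" for i in range(20)]
bm25_hits = [f"chunk-{i}" for i in range(13, 33)]

merged, overlap = merge_candidates(vector_hits, bm25_hits)

# Stand-in score for illustration only; a real pipeline would call a cross-encoder here.
top = rerank(merged, score=lambda chunk_id: -int(chunk_id.split("-")[1]))

print(
    f"Vector: {len(vector_hits)}, BM25: {len(bm25_hits)}, "
    f"overlap: {overlap}, merged: {len(merged)}, re-ranked to: {len(top)}"
)
```

With 20 + 20 candidates and 7 shared, the union holds 33 unique chunks, which matches the `merged: 33` figure in the logs.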
## Results

| File | LLM | Temperature | Files cited | Time | Notes |
|------|-----|-------------|-------------|------|-------|
| `results_gpt4omini_t0.1.txt` | gpt-4o-mini (OpenAI API) | 0.1 | 6 | 44s | Broader coverage, structured numbered list, drew from chunks ranked as low as #14 |
| `results_commandr7b_t0.8.txt` | command-r7b (Ollama local) | 0.8 (default) | 2 | 78s | Focused on top chunks, reproduced exact quotes verbatim |
| `results_gpt4omini_t0.3.txt` | gpt-4o-mini (OpenAI API) | 0.3 | 6 | 45s | Very similar to 0.1 run -- same 6 files, same structure, slightly more interpretive phrasing |
| `results_commandr7b_t0.3.txt` | command-r7b (Ollama local) | 0.3 | 6 | 94s | Major improvement over 0.8 default: cited 6 files (was 2), drew from lower-ranked chunks including 2024-08-03 (#15) |

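The `File` column follows the `results_<model>_t<temperature>.txt` naming convention. A small sketch of recovering the model and temperature from such a name, assuming the temperature is always a plain decimal number; the helper name `parse_result_name` is hypothetical and not part of the repository.

```python
import re

# results_<model>_t<temperature>.txt, e.g. results_gpt4omini_t0.1.txt
RESULT_NAME = re.compile(
    r"^results_(?P<model>.+)_t(?P<temperature>\d+(?:\.\d+)?)\.txt$"
)


def parse_result_name(filename):
    """Return (model, temperature) parsed from a result file name."""
    match = RESULT_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a result file name: {filename!r}")
    return match["model"], float(match["temperature"])


for name in [
    "results_gpt4omini_t0.1.txt",
    "results_commandr7b_t0.8.txt",
    "results_gpt4omini_t0.3.txt",
    "results_commandr7b_t0.3.txt",
]:
    print(name, "->", parse_result_name(name))
```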
## Observations

- Lowering command-r7b's temperature from 0.8 to 0.3 tripled the number of files cited
  (2 → 6). At 0.8, the model cited only 2025-11-04 and 2022-08-14, the chunks ranked #1
  and #3. At 0.3, it drew on chunks ranked as low as #15 (2024-08-03).
- gpt-4o-mini showed little difference between 0.1 and 0.3. Both runs cited the same
  6 files, reaching down to #14 (2024-09-04). The API model appears less sensitive to
  temperature for this task.
- command-r7b at 0.3 took longer (94s vs 78s), likely due to generating more text.
- At temperature=0.3, both models cite 6 files and mix direct quotes with paraphrases,
  but they agree on only 4 of the 6: command-r7b adds 2024-09-06 and 2024-08-03, while
  gpt-4o-mini adds 2025-02-05 and 2024-09-04.
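The "files cited" and "ranked as low as" figures above can be recomputed from the result logs. The sketch below hard-codes the `Source documents` ranking and the cited-file lists as they appear in this commit's result files; the helper name `citation_summary` is hypothetical. When a file appears twice in the ranking (2025-02-14 at #2 and #5), its best rank is used, which is an assumption about how to count it.

```python
# Re-ranked source documents, in order, as listed in every result log.
RANKED = [
    "2025-11-04.txt", "2025-02-14.txt", "2022-08-14.txt", "2021-07-04.txt",
    "2025-02-14.txt", "2025-07-27.txt", "2024-07-26.txt", "2024-09-06.txt",
    "2024-04-08.txt", "2025-02-05.txt", "2022-04-17.txt", "2023-07-01.txt",
    "2023-01-06.txt", "2024-09-04.txt", "2024-08-03.txt",
]

# Files each run listed as contributing (duplicates kept as they appear in the log).
CITED = {
    "commandr7b_t0.8": ["2025-11-04.txt", "2022-08-14.txt"],
    "commandr7b_t0.3": [
        "2025-11-04.txt", "2025-02-14.txt", "2022-08-14.txt", "2025-07-27.txt",
        "2024-09-06.txt", "2025-02-14.txt", "2025-07-27.txt", "2024-08-03.txt",
    ],
    "gpt4omini_t0.1": [
        "2025-11-04.txt", "2025-02-14.txt", "2022-08-14.txt",
        "2025-07-27.txt", "2025-02-05.txt", "2024-09-04.txt",
    ],
}


def citation_summary(cited, ranked):
    """Return (number of unique cited files, deepest 1-based rank among them)."""
    unique = list(dict.fromkeys(cited))
    deepest = max(ranked.index(name) + 1 for name in unique)
    return len(unique), deepest


for run, cited in CITED.items():
    count, deepest = citation_summary(cited, RANKED)
    print(f"{run}: {count} files cited, deepest rank #{deepest}")
```

The command-r7b run at 0.3 lists 8 entries, but two are repeats, so the unique count is 6.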
54
tests/results_commandr7b_t0.3.txt
Normal file
@@ -0,0 +1,54 @@
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15

Response:

The provided excerpts contain several references to Louis Menand's ideas and quotes. Here are the specific passages:

- In file_path: ./data/2025-11-04.txt, you can find a direct quote from Louis Menand: "We created God, and then pretended that God
created us. We hypostatized our own concept and turned it into something “out there” whose commandments (which we made up) we
struggle to understand and obey. We are supplicants to our own fiction."
- In file_path: ./data/2025-02-14.txt, there is a reference to Menand's quote about the human tendency to create and interact with
abstract ideas as if they are tangible realities.
- In file_path: ./data/2022-08-14.txt, another excerpt from Menand's work is mentioned: "We created God, and then we pretended
that God created us."
- In file_path: ./data/2025-07-27.txt, the author discusses Menand's quote about the creation of fictions and the role of
organization in human culture.
- In file_path: ./data/2024-09-06.txt, there is a mention of Menand's writing style, describing him as "witty and serious."
- In file_path: ./data/2025-02-14.txt, the poem takes a wistful tone, hinting at the fragility of human attachment to concepts,
which is reminiscent of existential crises and Menand's ideas on hypostasis.
- In file_path: ./data/2025-07-27.txt, the author further elaborates on Menand's quote, emphasizing the human tendency to create
and interact with fictions.
- In file_path: ./data/2024-08-03.txt, there is a reference to Louis Menand's book "The Free World" and its exploration of art,
literature, and culture in the 20th century.

All these excerpts contribute to understanding Louis Menand's ideas on hypostasis, human creation of fictions, and the complex
relationship between stories, beliefs, and reality.

Files contributing to the answer:
- ./data/2025-11-04.txt
- ./data/2025-02-14.txt
- ./data/2022-08-14.txt
- ./data/2025-07-27.txt
- ./data/2024-09-06.txt
- ./data/2025-02-14.txt
- ./data/2025-07-27.txt
- ./data/2024-08-03.txt

Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 94 seconds.
58
tests/results_commandr7b_t0.8.txt
Normal file
@@ -0,0 +1,58 @@
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15

Response:

The provided excerpts feature multiple references to Louis Menand's quote about hypostasis and our creation of God as a fiction.
Here are the relevant passages:

- **"We created God, and then pretended that God created us."** **(file_path: ./data/2025-11-04.txt)**

This excerpt introduces Menand's quote as the starting point for an exploration of how humans create and interact with abstract
concepts.

- **"We hypostatized our own concept and turned it into something “out there” whose commandments (which we made up) we struggle to
understand and obey."** **(file_path: ./data/2025-11-04.txt)**

This quote delves further into the idea of turning concepts into something tangible, emphasizing the struggle to understand and
follow the rules we create.

- **"We are supplicants to our own fiction."** **(file_path: ./data/2025-11-04.txt)**

The passage concludes by emphasizing the concept of being subservient to our own creations, highlighting the complex relationship
between human stories and reality.

- **"We created God, and then pretended that God created us."** **(file_path: ./data/2022-08-14.txt)**

This excerpt reiterates the core idea from Menand's quote in a slightly different phrasing.

- **"We hypostatized our own concept and turned it into something “out there” whose commandments, which we also made up, we
struggle to understand and obey."** **(file_path: ./data/2022-08-14.txt)**

This quote emphasizes the process of creating rules and struggling to comprehend them while adhering to them.

All these excerpts contribute to exploring Menand's perspective on how humans create, project, and interact with abstract
concepts, highlighting the complex relationship between stories, beliefs, and reality.

**Files contributing:**

* **2025-11-04.txt**
* **2022-08-14.txt**

Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 78 seconds.
65
tests/results_gpt4omini_t0.1.txt
Normal file
@@ -0,0 +1,65 @@
run_query.sh
Passages that quote Louis Menand.

Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15

Response:

The excerpts provided contain several passages that quote or reference Louis Menand, particularly focusing on his ideas about
hypostasis and the human relationship with concepts such as God and societal structures. Here are the relevant passages:

1. **File: ./data/2025-11-04.txt**
- This excerpt includes the quote: "We created God, and then pretended that God created us. We hypostatized our own concept and
turned it into something 'out there' whose commandments (which we made up) we struggle to understand and obey. We are supplicants
to our own fiction." It discusses the parallels between this idea and other societal constructs like democracy and politics.

2. **File: ./data/2025-02-14.txt**
- The passage reiterates Menand's quote about hypostasis, emphasizing how humans have taken abstract concepts and given them a
separate existence, which allows for worship and obedience. It reflects on the implications of this process on our understanding
of reality.

3. **File: ./data/2022-08-14.txt**
- This excerpt discusses Menand's critique of how people perceive forces as external and inevitable, which are actually human
creations. It includes the same quote about God and hypostasis, highlighting the reification of human concepts.

4. **File: ./data/2025-07-27.txt**
- Here, Menand's quote is referenced in the context of discussing narratives and organizational structures, emphasizing the
fictional nature of societal constructs.

5. **File: ./data/2025-02-05.txt**
- This excerpt paraphrases Menand's idea that we are "supplicants to our own fictions," linking it to the broader theme of
storytelling in human culture.

6. **File: ./data/2024-09-04.txt**
- Although not a direct quote, this excerpt mentions Menand's work in relation to AI and copyright, indicating his relevance in
contemporary discussions.

In summary, the passages collectively illustrate Menand's exploration of how humans create and interact with abstract concepts,
particularly the notion of God, and how these constructs influence our understanding of reality and societal structures.

**Files that contributed to the answer:**
- **2025-11-04.txt**: Discusses the quote in relation to societal constructs.
- **2025-02-14.txt**: Reiterates the quote and its implications on reality.
- **2022-08-14.txt**: Analyzes Menand's critique of perception and includes the quote.
- **2025-07-27.txt**: References the quote in the context of narratives and organization.
- **2025-02-05.txt**: Paraphrases Menand's idea about fictions in human culture.
- **2024-09-04.txt**: Mentions Menand's relevance in discussions about AI.

Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 44 seconds.
63
tests/results_gpt4omini_t0.3.txt
Normal file
@@ -0,0 +1,63 @@
Query: --query Passages that quote Louis Menand.
Vector: 20, BM25: 20, overlap: 7, merged: 33, re-ranked to: 15

Response:

The excerpts contain several passages that quote or reference Louis Menand, particularly focusing on his ideas about hypostasis
and the human relationship with concepts such as God and societal structures. Here are the relevant passages:

1. **File: ./data/2025-11-04.txt**
- This excerpt includes the quote: "We created God, and then pretended that God created us. We hypostatized our own concept and
turned it into something 'out there' whose commandments (which we made up) we struggle to understand and obey. We are supplicants
to our own fiction." It discusses the parallels between this idea and other societal constructs like democracy and politics.

2. **File: ./data/2025-02-14.txt**
- The passage reiterates Menand's quote about hypostasis and emphasizes how humans have created a sense of distance from their
own creations, allowing for worship or rebellion against them. It reflects on the implications of this process for understanding
reality and human experience.

3. **File: ./data/2022-08-14.txt**
- This excerpt discusses Menand's "Free World," specifically pages 335-339, where he critiques how people perceive external
forces as inevitable, which are actually human creations. It includes the same quote about God and hypostasis, highlighting the
idea that these constructs are reified as "the way things are."

4. **File: ./data/2025-07-27.txt**
- This excerpt references Menand's quote in the context of discussing anti-capitalist themes and the narratives that organize
society, emphasizing the fictional nature of these constructs.

5. **File: ./data/2025-02-05.txt**
- Here, Menand's notion that we are "supplicants to our own fictions" is paraphrased, relating it to the broader role of
storytelling in human culture and how we create and live by the narratives we construct.

6. **File: ./data/2024-09-04.txt**
- This excerpt mentions Menand in the context of discussing AI and copyright, though it does not quote him directly. It
highlights his relevance in contemporary discussions about technology.

### Summary of Contributions:
- **2025-11-04.txt**: Introduces Menand's quote and connects it to societal constructs.
- **2025-02-14.txt**: Explores the implications of Menand's ideas on reality and human experience.
- **2022-08-14.txt**: Discusses Menand's critique of how humans perceive societal constructs as external forces.
- **2025-07-27.txt**: Relates Menand's quote to anti-capitalist themes and societal narratives.
- **2025-02-05.txt**: Paraphrases Menand's ideas in the context of storytelling in culture.
- **2024-09-04.txt**: Mentions Menand in discussions about AI, highlighting his ongoing relevance.

Overall, these excerpts collectively illustrate Menand's critical perspective on how humans create and relate to their own
constructs, emphasizing the fictional nature of many societal narratives.

Source documents:
2025-11-04.txt ./data/2025-11-04.txt 3.303
2025-02-14.txt ./data/2025-02-14.txt 3.144
2022-08-14.txt ./data/2022-08-14.txt 2.183
2021-07-04.txt ./data/2021-07-04.txt 1.595
2025-02-14.txt ./data/2025-02-14.txt 1.504
2025-07-27.txt ./data/2025-07-27.txt 0.171
2024-07-26.txt ./data/2024-07-26.txt -0.969
2024-09-06.txt ./data/2024-09-06.txt -1.661
2024-04-08.txt ./data/2024-04-08.txt -2.017
2025-02-05.txt ./data/2025-02-05.txt -2.027
2022-04-17.txt ./data/2022-04-17.txt -2.054
2023-07-01.txt ./data/2023-07-01.txt -2.284
2023-01-06.txt ./data/2023-01-06.txt -2.315
2024-09-04.txt ./data/2024-09-04.txt -2.379
2024-08-03.txt ./data/2024-08-03.txt -2.391
Query processed in 45 seconds.