Reframe from three modes to two worlds

Restructures section 01 from "web chat / in-editor / agentic" into "web chat vs. tools that live with your code," with the autocomplete / in-project chat / agentic spectrum as a sub-structure of the latter. Inline edits are reduced to a historical note tied to the 2023 instruction-tuned LLM era.

- Rename 01-three-modes -> 01-two-worlds and 03-in-editor-workflow -> 03-autocomplete; section 03 narrows to autocomplete (ghost text habits, the autocomplete-your-verification trap)
- Section 04 reframes in-project chat as the default venue, web chat as a special-case venue; adds "Carrying context across sessions" covering dev-log.md, CLAUDE.md, .cursorrules
- Section 05 reworks intro to contrast against in-project chat instead of "editor extension"; tightens prose and removes em-dashes
- Update cross-references and tool-mode language in 02, 06, 07, and the root README to match the new framing
- Swap the CRDT example in section 04 for finite-volume methods, fitting the CHEG audience
- Minor typo/wording fixes

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 23:01:09 -04:00
commit d2ca02bd90
parent 5780cdf097
10 changed files with 308 additions and 270 deletions
--- a/07-local-models/README.md
+++ b/07-local-models/README.md
@@ -2,14 +2,14 @@

 ## Key idea

-You do not have to use a frontier cloud model to use AI in your work. A "local" model runs entirely on your own hardware: no API, no per-token cost, no data leaving the machine. Local models are not a fourth *mode* on top of chat, editor, and agent — they cut across all three. The same workflow patterns apply; what changes is the tool that hosts the model and what you give up (and gain) by running it yourself.
+You do not have to use a frontier cloud model to use AI in your work. A "local" model runs entirely on your own hardware: no API, no per-token cost, no data leaving the machine. Local models cut across every workflow we've covered — web chat, autocomplete, in-project chat, and agentic — rather than being a separate mode. The same workflow patterns apply; what changes is the tool that hosts the model and what you give up (and gain) by running it yourself.

 This section is about local models as a *user* of AI coding tools. If you want to understand how local models work under the hood, train your own, or build the infrastructure around them, see the [llm-workshop](https://lem.che.udel.edu/git/furst/llm-workshop).

 ## Key goals

 - Understand why you might prefer a local model to a cloud model
- Recognize which tools in each of the three modes support local models
+- Recognize which tools across the autocomplete/chat/agent spectrum support local models
 - Calibrate expectations about capability and latency relative to frontier cloud models
 - Identify the situations where local is the right choice and where cloud still wins

@@ -47,11 +47,11 @@ A rough sense of what runs comfortably where, as of early 2026:
 If you took the time to fill out the spec table in [computing-setup section 01](https://lem.che.udel.edu/git/furst/computing-setup/src/branch/main/01-know-your-machine/), you already know what tier you're in.


-## Local models across the three modes
+## Local models across the workflow

-The three-mode framing from [section 01](../01-three-modes/) still applies — what changes is the host.
+The framing from [section 01](../01-two-worlds/) still applies — what changes is the host. Below, we walk through where local models fit in each kind of work.

-### Local in *chat* mode
+### Local in *web-chat* style

 You can have a private, local ChatGPT-style experience entirely on your laptop.

@@ -62,15 +62,15 @@ You can have a private, local ChatGPT-style experience entirely on your laptop.
 | **Open WebUI** | A self-hosted web UI (like ChatGPT) that talks to Ollama or any OpenAI-compatible backend. Good if you want a familiar chat experience or want to share access on a LAN. |
 | **Jan**, **GPT4All** | Other desktop chat apps with similar goals. |

-The Ollama-powered backends in particular are useful well beyond chat — most of the editor and agentic tools below can connect to an Ollama endpoint, which means setting up Ollama once unlocks every mode.
+The Ollama-powered backends in particular are useful well beyond chat — most of the in-editor and agentic tools below can connect to an Ollama endpoint, which means setting up Ollama once unlocks every other use case.

-### Local in *editor* mode
+### Local for autocomplete and in-project chat

-Several VS Code extensions support local models. Notably, **GitHub Copilot, Microsoft Copilot, and the Claude extension do not** — they require their vendor's cloud service. If you want a local model in your editor, you need a different extension.
+Several VS Code extensions support local models for autocomplete and side-panel chat. Notably, **GitHub Copilot, Microsoft Copilot, and the Claude (legacy) extension do not** — they require their vendor's cloud service. If you want a local model in your editor, you need a different extension.

 | Extension | Notes |
 |---|---|
-| **Continue.dev** | Open-source, the flagship local-friendly extension. Works with Ollama, LM Studio, llama.cpp, and many cloud providers. Supports autocomplete, inline edit, and a chat panel. The first tool to try. |
+| **Continue.dev** | Open-source, the flagship local-friendly extension. Works with Ollama, LM Studio, llama.cpp, and many cloud providers. Supports autocomplete and a chat panel. The first tool to try. |
 | **Cody** (Sourcegraph) | Has a "local context" mode and can use local models via Ollama. Also has a strong cloud product. |
 | **Llama Coder** | Ollama-focused, autocomplete-first. Lightweight. |
 | **Tabby** | A self-hosted code completion server. Heavier setup but good for shared use within a team or lab. |