Reframe from three modes to two worlds

Restructures section 01 from "web chat / in-editor / agentic" into "web
chat vs. tools that live with your code," with the autocomplete /
in-project chat / agentic spectrum as a sub-structure of the latter.
Inline edits are reduced to a historical note tied to the 2023
instruction-tuned LLM era.

- Rename 01-three-modes -> 01-two-worlds and 03-in-editor-workflow ->
  03-autocomplete; section 03 narrows to autocomplete (ghost text habits,
  the autocomplete-your-verification trap)
- Section 04 reframes in-project chat as the default venue, web chat as
  a special-case venue; adds "Carrying context across sessions" covering
  dev-log.md, CLAUDE.md, .cursorrules
- Section 05 reworks intro to contrast against in-project chat instead
  of "editor extension"; tightens prose and removes em-dashes
- Update cross-references and tool-mode language in 02, 06, 07, and
  the root README to match the new framing
- Swap the CRDT example in section 04 for finite-volume methods, fitting
  the CHEG audience
- Minor typo/wording fixes

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Eric Furst 2026-05-28 23:01:09 -04:00
commit d2ca02bd90
10 changed files with 308 additions and 270 deletions

View file

@ -2,14 +2,14 @@
## Key idea
You do not have to use a frontier cloud model to use AI in your work. A "local" model runs entirely on your own hardware: no API, no per-token cost, no data leaving the machine. Local models are not a fourth *mode* on top of chat, editor, and agent — they cut across all three. The same workflow patterns apply; what changes is the tool that hosts the model and what you give up (and gain) by running it yourself.
You do not have to use a frontier cloud model to use AI in your work. A "local" model runs entirely on your own hardware: no API, no per-token cost, no data leaving the machine. Local models cut across every workflow we've covered — web chat, autocomplete, in-project chat, and agentic — rather than being a separate mode. The same workflow patterns apply; what changes is the tool that hosts the model and what you give up (and gain) by running it yourself.
This section is about local models as a *user* of AI coding tools. If you want to understand how local models work under the hood, train your own, or build the infrastructure around them, see the [llm-workshop](https://lem.che.udel.edu/git/furst/llm-workshop).
## Key goals
- Understand why you might prefer a local model to a cloud model
- Recognize which tools in each of the three modes support local models
- Recognize which tools across the autocomplete/chat/agent spectrum support local models
- Calibrate expectations about capability and latency relative to frontier cloud models
- Identify the situations where local is the right choice and where cloud still wins
@ -47,11 +47,11 @@ A rough sense of what runs comfortably where, as of early 2026:
If you took the time to fill out the spec table in [computing-setup section 01](https://lem.che.udel.edu/git/furst/computing-setup/src/branch/main/01-know-your-machine/), you already know what tier you're in.
## Local models across the three modes
## Local models across the workflow
The three-mode framing from [section 01](../01-three-modes/) still applies — what changes is the host.
The framing from [section 01](../01-two-worlds/) still applies — what changes is the host. Below, we walk through where local models fit in each kind of work.
### Local in *chat* mode
### Local in *web-chat* style
You can have a private, local ChatGPT-style experience entirely on your laptop.
@ -62,15 +62,15 @@ You can have a private, local ChatGPT-style experience entirely on your laptop.
| **Open WebUI** | A self-hosted web UI (like ChatGPT) that talks to Ollama or any OpenAI-compatible backend. Good if you want a familiar chat experience or want to share access on a LAN. |
| **Jan**, **GPT4All** | Other desktop chat apps with similar goals. |
The Ollama-powered backends in particular are useful well beyond chat — most of the editor and agentic tools below can connect to an Ollama endpoint, which means setting up Ollama once unlocks every mode.
The Ollama-powered backends in particular are useful well beyond chat — most of the in-editor and agentic tools below can connect to an Ollama endpoint, which means setting up Ollama once unlocks every other use case.
### Local in *editor* mode
### Local for autocomplete and in-project chat
Several VS Code extensions support local models. Notably, **GitHub Copilot, Microsoft Copilot, and the Claude extension do not** — they require their vendor's cloud service. If you want a local model in your editor, you need a different extension.
Several VS Code extensions support local models for autocomplete and side-panel chat. Notably, **GitHub Copilot, Microsoft Copilot, and the Claude (legacy) extension do not** — they require their vendor's cloud service. If you want a local model in your editor, you need a different extension.
| Extension | Notes |
|---|---|
| **Continue.dev** | Open-source, the flagship local-friendly extension. Works with Ollama, LM Studio, llama.cpp, and many cloud providers. Supports autocomplete, inline edit, and a chat panel. The first tool to try. |
| **Continue.dev** | Open-source, the flagship local-friendly extension. Works with Ollama, LM Studio, llama.cpp, and many cloud providers. Supports autocomplete and a chat panel. The first tool to try. |
| **Cody** (Sourcegraph) | Has a "local context" mode and can use local models via Ollama. Also has a strong cloud product. |
| **Llama Coder** | Ollama-focused, autocomplete-first. Lightweight. |
| **Tabby** | A self-hosted code completion server. Heavier setup but good for shared use within a team or lab. |