From d2ca02bd90b1d91afc8277b9b3fcb664c4fba772 Mon Sep 17 00:00:00 2001
From: Eric Furst
Date: Thu, 28 May 2026 23:01:09 -0400
Subject: [PATCH] Reframe from three modes to two worlds

Restructures section 01 from "web chat / in-editor / agentic" into "web chat vs. tools that live with your code," with the autocomplete / in-project chat / agentic spectrum as a sub-structure of the latter. Inline edits are reduced to a historical note tied to the 2023 instruction-tuned LLM era.

- Rename 01-three-modes -> 01-two-worlds and 03-in-editor-workflow -> 03-autocomplete; section 03 narrows to autocomplete (ghost text habits, the autocomplete-your-verification trap)
- Section 04 reframes in-project chat as the default venue, web chat as a special-case venue; adds "Carrying context across sessions" covering dev-log.md, CLAUDE.md, .cursorrules
- Section 05 reworks intro to contrast against in-project chat instead of "editor extension"; tightens prose and removes em-dashes
- Update cross-references and tool-mode language in 02, 06, 07, and the root README to match the new framing
- Swap the CRDT example in section 04 for finite-volume methods, fitting the CHEG audience
- Minor typo/wording fixes

Co-Authored-By: Claude Opus 4.7
---
 01-three-modes/README.md          | 113 ------------------------------
 01-two-worlds/README.md           | 113 ++++++++++++++++++++++++++++++
 02-errors-and-logs/README.md      |   2 +-
 03-autocomplete/README.md         | 100 ++++++++++++++++++++++++++
 03-in-editor-workflow/README.md   |  88 -----------------------
 04-conversations/README.md        |  49 +++++++++----
 05-agentic-workflow/README.md     |  75 ++++++++++----------
 06-verifying-and-citing/README.md |   4 +-
 07-local-models/README.md         |  18 ++---
 README.md                         |  16 ++---
 10 files changed, 308 insertions(+), 270 deletions(-)
 delete mode 100644 01-three-modes/README.md
 create mode 100644 01-two-worlds/README.md
 create mode 100644 03-autocomplete/README.md
 delete mode 100644 03-in-editor-workflow/README.md

diff --git a/01-three-modes/README.md
b/01-three-modes/README.md
deleted file mode 100644
index 1a64a53..0000000
--- a/01-three-modes/README.md
+++ /dev/null
@@ -1,113 +0,0 @@
-# Three Modes
-
-## Key idea
-
-There are three distinct ways to work with AI assistants today, and they suit different problems. Knowing which is which, and which one would actually help for a given problem, is the most important judgment that we would like to develop.
-
-## Key goals
-
-- Recognize the three modes: web chat, in-editor extension, and agentic tool
-- Understand the characteristic strengths and weaknesses of each
-- Develop heuristics for choosing the right mode for a given task
-
----
-
-## The three modes
-
-### 1. Web chat
-
-Web chat is a a browser- or app-based conversation with a model. You type, paste, or drag content in and the model responds in the same window.
-
-**Examples (early 2026):** ChatGPT, Claude.ai, Gemini, Microsoft Copilot (web). Each has a free tier with usage limits, a paid plan that removes those limits, and (often) institutional access through a university or employer agreement. You can also self-host a chat interface against a local model — see [section 07](../07-local-models/).
-
-**What it's good at:**
-
-- One-shot interpretation tasks: *explain this error, what does this log mean, what does this regex match*
-- Multi-turn design discussions: *"I'm choosing between approach A and B, what should I think about?"*
-- Non-code work: drafting documentation, writing commit messages, explaining a concept
-- Working with content you do not want the AI to "live in" — a snippet from a paper, output from a server you don't own, a script from an unfamiliar repo
-
-**Weaknesses:**
-
-- It's disconnected from your project. The model is stateless and has no idea what files exist, what your codebase conventions are, or what you changed five minutes ago, unless you paste it.
-- Round-trip friction: copy from terminal → paste into chat → wait → read → copy answer → paste back.
Fine for one-shot, but it is painful for iterative editing.
-- You have to remember to bring the relevant context with you each time.
-
-
-### 2. In-editor extension
-
-An AI assistant living inside your editor (VS Code, JetBrains, Neovim, etc.) with awareness of your open files and project.
-
-**Examples (early 2026):** GitHub Copilot, Claude (VS Code extension), Codeium, Microsoft Copilot (in VS Code), Cursor and Windsurf (which are VS Code forks with deeper AI baked in).
-
-**What it's good at:**
-
-- Autocomplete while typing — the model suggests the next few lines as you write
-- Inline edits: highlight a block, ask for a refactor or fix, review a *diff* (a side-by-side view of the proposed changes — what is being added, removed, or modified), accept or reject changes
-- "Explain this" on a selection, function, or whole file without leaving the editor
-- Quick rename, extract function, add type hints — the kind of work that ought to be in place
-
-**Weaknesses:**
-
-- Conversation UX is limited compared to web chat — fine for *"do X to this code,"* awkward for *"let's talk through the design."*
-- Easy to accept suggestions you did not fully read (the autocomplete habit).
-- The model's context is whatever the extension chooses to feed it — usually the open file plus some recently viewed files. Larger projects can confuse it.
-
-
-### 3. Agentic tools
-
-An AI that takes multi-step actions on its own: read files, run commands, edit, run tests, read the output, edit again. You set the goal; the agent runs the loop.
-
-**Examples (early 2026):** Claude Code (CLI), Cursor agent mode, Microsoft Copilot agent, Cline (VS Code extension), Aider.
-
-**What it's good at:**
-
-- Multi-file refactors with verification (*"rename this concept everywhere and make the tests pass"*)
-- Investigating an unfamiliar codebase (*"find where X is defined and tell me how it's wired"*)
-- Larger work units where you would otherwise be the one ferrying information between editor, terminal, and chat
-- Repetitive maintenance (*"update all these import paths," "add docstrings to this module"*)
-
-**Weaknesses:**
-
-- The agent will happily do the wrong thing efficiently. Supervision matters!
-- Permissioning matters, too: agents that can run arbitrary shell commands can do real damage if pointed at the wrong directory or given the wrong instructions.
-- Cost can scale faster than you expect — multi-step tasks consume many model calls.
-- For small, well-scoped edits, an agent is overkill compared to a simple inline edit.
-
-
-## How to choose
-
-### Editor vs agentic: what's the actual difference?
-
-The chat/editor split is usually obvious. The editor/agent split trips people up. Two questions clarify it:
-
-- **Who drives the loop?** With the editor extension, *you* do — you make one request, see the proposed change, accept or reject, then make the next request. With an agent, *the model* does — it decides the steps, runs them, and reports back at the end (or at checkpoints you've configured).
-- **How many actions does the task need?** A single targeted edit you can see in front of you is editor work. A task that needs a chain of actions, like reading several files, making changes in multiple places, or running tests, react to the output, is agent work.
-
-In other words, if you can point at the code on screen and say *"do X to this,"* then you want the editor. If the work is *"figure out how this codebase does X and change it consistently,"* then you want an agent.
-
-### Starting heuristic
-
-| If the work is... | Reach for...
| Why |
-|---|---|---|
-| Explain an error, parse a log, interpret some output | Web chat | One-shot interpretation; the answer is words, not code-in-place |
-| A targeted edit you can see in front of you — refactor this function, add types here, rewrite this block | In-editor extension | One action, one diff, one accept/reject decision; you stay in the driver's seat |
-| A task that needs multiple steps — cross-file changes, run-tests-and-fix loops, "explore the project and then change it" | Agentic | The model owns the sequence; you set the goal and review the end state |
-| Deciding between two approaches; talking through a design | Web chat or editor side panel | Conversation UX is what matters; either works (see [section 03](../03-in-editor-workflow/) for venue choice) |
-| Writing a commit message, README, or documentation | Either chat or editor | Both work — chat if standalone, editor if it should live inline |
-
-
-## Two principles underneath the heuristic
-
-**Match the mode to the output target.** If the answer should *be code in a file*, use a tool that can put it there (editor or agent). If the answer should be a conversation or an explanation, use chat. Mode-mismatching is what leads to painful copy-paste loops.
-
-**Match the mode to the iteration speed.** Single-shot interpretation → use a chat. Tight feedback loop on a known file → use an editor. Multi-step plan you would rather not babysit step by step → use an agent.
-
-
-## Exercises
-
-> **Exercise 1:** Think of three recent times you used an AI assistant. For each, classify which mode you used and which mode this guide would suggest. Were any mismatched? If so, what did the mismatch cost you (time, friction, abandoned attempts)?
-
-> **Exercise 2:** Pick one tool from each mode you have access to. Use each one in the next week and keep a one-line note of what you used it for. After a week, look at your notes: are you using each mode for things it is genuinely good at?
-
-
diff --git a/01-two-worlds/README.md b/01-two-worlds/README.md
new file mode 100644
index 0000000..767d708
--- /dev/null
+++ b/01-two-worlds/README.md
@@ -0,0 +1,113 @@
+# Two Worlds
+
+## Key idea
+
+Most people meet AI assistance through web chat: open a browser tab, paste in a problem, copy the answer back. That works for one-shot questions, but it is the wrong tool for actually writing code. The goal of this guide is to move you off the copy-paste habit and onto tools that live with your code.
+
+There are really only two worlds worth distinguishing:
+
+1. **Web chat** — a browser tab. The AI has no awareness of your project.
+2. **In your editor or terminal** — the AI lives where your code does. It can read your files, change them directly, and often run commands too.
+
+The first is useful for explanation, interpretation, and sometimes planning. The second is what you want to use for any actual coding work.
+
+## Key goals
+
+- Distinguish web chat from tools that live with your code, and recognize which one your current task wants
+- Recognize the spectrum *within* world 2: autocomplete, in-project chat, and delegated agentic work
+- Stop treating the browser tab as your default coding workspace
+
+---
+
+## World 1: Web chat
+
+A browser- or app-based conversation with a model. You type, paste, or drag content in and the model responds in the same window.
+
+**Examples (early 2026):** ChatGPT, Claude.ai, Gemini, Microsoft Copilot (web). Each has a free tier with usage limits, a paid plan that removes those limits, and (often) institutional access through a university or employer agreement. You can also self-host a chat interface against a local model. See [section 07](../07-local-models/).
+
+**What it's good at:**
+
+- One-shot interpretation: *explain this error, what does this log mean, what does this regex match*
+- Multi-turn design discussion: *"I'm choosing between approach A and B, what should I think about?"*
+- Non-code work: drafting documentation, writing commit messages, explaining a concept
+- Content you do not want the AI to "live in" — a snippet from a paper, output from a server you don't own, a script from an unfamiliar repo
+
+**Weaknesses:**
+
+- It's disconnected from your project. The model is stateless and has no idea what files exist, what your conventions are, or what you changed five minutes ago, unless you paste it in.
+- Round-trip friction: copy from terminal → paste into chat → wait → read → copy answer → paste back. Fine for one-shot, painful for iterative editing.
+- You have to remember to bring the relevant context with you every time.
+
+**The trap:** because web chat was the first interface most people learned, it becomes their default. If you find yourself pasting code back and forth between a browser tab and your editor more than a couple of times in a session, you are using the wrong tool.
+
+
+## World 2: AI that lives with your code
+
+An AI assistant inside your editor or terminal that can see your files and change them directly. The interface might be a side panel where you chat, a CLI you type into, an autocomplete that fills in code as you type, or some combination, but the common aspect is that the tool *lives where the code lives*.
+
+**Examples (early 2026):** Claude Code (CLI, VS Code, JetBrains), Cursor, Windsurf, GitHub Copilot, Cline, Aider, Microsoft Copilot agent. Most modern tools combine multiple interaction patterns in one package.
+
+**Why this is the world to learn:**
+
+- No copy-paste: the tool reads the file you're working on directly
+- The model can see your project's actual structure and conventions
+- Edits land in the file as a *diff* (a side-by-side view of what's being added, removed, or modified) that you review, not as text in a chat window you have to copy back
+- The same tool can answer a quick question, make one targeted change, or run a multi-step task. You decide which by how or what you ask.
+
+
+### The spectrum within world 2
+
+A single tool that lives with your code can be operated three different ways. Modern tools (Claude Code, Cursor) support all three; older ones (vanilla Copilot, c. 2024) support only the first.
+
+**1. Autocomplete.** Ghost text suggesting the next few lines as you type. You stay in the driver's seat keystroke by keystroke; the model just predicts what comes next. Cheap, fast, and easy to ignore when wrong. This is covered in [section 03](../03-autocomplete/).
+
+**2. In-project chat.** A side panel or terminal session where you have a conversation that's aware of your files. You can ask design questions, request a targeted change, or ask the AI to explain something without leaving your editor. This is what you are doing if you are using Claude Code in a VS Code panel. We cover this in [section 04](../04-conversations/).
+
+**3. Delegated agentic work.** You give the AI a multi-step goal, such as *"find where X is defined and update all the callers, then run the tests,"* and it runs the loop by reading files, making changes, running tests, reading the output, and fixing as it goes. You set the goal, but the agent runs the steps. We will review this in [section 05](../05-agentic-workflow/).
+
+The line between (2) and (3) is fuzzy, and that's fine. A chat with an in-project AI *becomes* agentic the moment you ask it to do something multi-step. They are not separate tools, but are instead different ways of operating the same tool.
The question to ask yourself is not "which mode am I in?" but "what am I asking for right now?"
+
+
+### A historical note: inline edits
+
+A fourth pattern, central to the Copilot/Cursor era of 2023–2024, is the **inline edit**: highlight a block of code, press a hotkey (Cmd+K in Cursor, "Edit with Copilot" in VS Code), type *"make this async"* or *"add error handling,"* and a diff appears in place. There is no chat, no agent loop, just one selection, one instruction, and one diff.
+
+Inline edits emerged in 2023 alongside instruction-tuned LLMs (ChatGPT, GPT-4) — the first models that could reliably take a natural-language instruction and produce a corresponding code transformation. They sat between the earlier completion-only era (autocomplete, powered by Codex and similar) and the agentic loops that followed. As tools became agentic, the pattern has faded: you can do the same thing by asking the in-project chat *"switch the function I have highlighted to async,"* and the result is the same diff. Newer users may skip the inline-edit hotkey entirely. We mention it so you recognize what older tutorials are describing, not as a workflow you need to learn from scratch.
+
+
+## How to choose
+
+Two questions cover almost every case:
+
+1. **Should the answer be words I'll read, or code that should land in a file?**
+   - Words → either world works
+   - Code → use world 2; world 1 forces a copy-paste loop
+2. **Do I want the AI to be able to act on this content?**
+   - Yes, then use world 2
+   - No (sensitive snippet, untrusted code, output from a server you don't own), then world 1 is safer because the tool isn't wired to your filesystem
+
+### Starting heuristic
+
+| If the work is... | Use...
| Why |
+|---|---|---|
+| Explain an error, parse a log, interpret some output | Either world | Words-out, no edit needed; pick whichever is open |
+| Anything that should result in code landing in a file | World 2 | Removes the copy-paste round trip |
+| A targeted edit you can describe in one sentence | In-project chat (world 2) | Fast, low-overhead, you read the diff before accepting |
+| Multi-step work — cross-file changes, run-tests-and-fix loops, "explore the project and then change it consistently" | Delegated agent (world 2) | The model owns the sequence; you set the goal and review the end state |
+| Reviewing or thinking through code you do not want the AI to act on | Web chat (world 1) | The tool can read but not edit |
+| Deciding between two approaches; talking through a design | Either world | Conversation UX is what matters |
+
+
+## Two principles underneath
+
+**Match the mode to the output target.** If the answer should *be code in a file*, use a tool that can put it there. If the answer should be a conversation, either approach works.
+
+**Match the mode to the iteration speed.** One-shot interpretation matches well with a chat. If you need a tight feedback loop on a known file, then in-project chat, no agent loop needed. For a multi-step plan you would rather not babysit step by step, use an agent.
+
+
+## Exercises
+
+> **Exercise 1:** Think of three recent times you used an AI assistant. For each, ask: did the work involve copy-pasting code into or out of a browser tab? If yes, what tool could have done the same job in-place?
+
+> **Exercise 2:** Pick one in-project AI tool you have access to (Claude Code, Cursor, Copilot, etc.). Use it for everything code-related for one week — no browser chat for coding tasks. Keep a one-line note of where it fell short. Those gaps are where web chat is still the right tool.
+
diff --git a/02-errors-and-logs/README.md b/02-errors-and-logs/README.md
index d7981b6..8fe9f45 100644
--- a/02-errors-and-logs/README.md
+++ b/02-errors-and-logs/README.md
@@ -22,7 +22,7 @@ A typical Python traceback is 10–40 lines of mostly-noise with one or two line
 1. **The output is words, not code-in-place.** You are looking for an explanation or at least a pointer, not an edit to a file. Chat is the stronger choice.
 2. **The input is self-contained.** You can paste the whole error and the model can reason about it without needing your project layout, history, or build state.
 
-In-editor extensions can also handle errors (most have "explain this error" features), and that is fine for small ones. But for a long traceback or a multi-page log, chat's room to expand and your ability to copy-paste freely makes it the better tool.
+An in-project chat can handle errors well too — and is often the right tool, because it can see the file that threw the error without you pasting it. For a long traceback or a multi-page log that doesn't fit cleanly in your editor's chat panel, or when the failure involves output from a server or system that isn't part of your project, a web chat's room to expand and your ability to copy-paste freely make it the better venue.
 
 
 ## What to paste
diff --git a/03-autocomplete/README.md b/03-autocomplete/README.md
new file mode 100644
index 0000000..5922252
--- /dev/null
+++ b/03-autocomplete/README.md
@@ -0,0 +1,100 @@
+# Autocomplete
+
+## Key idea
+
+Autocomplete is the lowest-friction way to work with an AI assistant: ghost text appears as you type, you accept with Tab or keep typing to ignore. It is the *one* form of AI assistance that does not require you to write a prompt — the act of typing is the prompt.
+
+That cheapness is its strength and its trap. Because accepting a suggestion is a single keystroke, it is easy to accept code you did not actually read.
The skill of using autocomplete well is almost entirely about *what you accept* and *what you reject*, not about how you invoke it.
+
+## Key goals
+
+- Recognize what autocomplete is good for and what it is not
+- Build the habit of reading a suggestion before accepting it
+- Avoid the autocomplete-your-verification trap
+- Know when to escalate from autocomplete to a chat or an agent
+
+---
+
+## How autocomplete works
+
+As you type, the extension sends a window of context (the current file, usually some recently viewed files) to a model that predicts the next tokens. The prediction appears as faint "ghost text" inline; Tab accepts it, continued typing ignores it.
+
+**Examples (early 2026):** GitHub Copilot, Codeium, Cursor Tab, Continue.dev, Microsoft Copilot in VS Code. Most agentic tools (Claude Code, Cline) do *not* provide ghost-text autocomplete — they're optimized for chat-and-agent interaction. If you want autocomplete and an agentic tool, you generally run two extensions side by side.
+
+The model is small and fast on purpose; the latency budget is the time between your keystrokes, which is short. Don't expect the depth of reasoning you get from a chat-with-a-frontier-model — autocomplete is pattern completion, not analysis.
+
+
+## Where autocomplete shines
+
+- **Boilerplate you would have typed anyway.** Loop scaffolds, function signatures whose shape is obvious, import lines, the body of a getter, repetitive variations of the same pattern.
+- **Filling in a pattern from context.** If you've just written three similar dictionary entries, the fourth will autocomplete correctly.
+- **Local syntactic completions.** Closing brackets, common method names on a known object, the rest of a familiar identifier.
+
+The common thread: the model has all the information it needs *in the few lines around your cursor*, and the answer is mechanical.
+
+
+## Where autocomplete fails
+
+- **Anything that requires understanding a wider context.** If the right answer depends on what a function in another file does, autocomplete will guess — and the guess looks plausible.
+- **Novel logic.** If you are doing something the codebase has not done before, the model will pattern-match to something *similar* and produce confident-looking code that is subtly wrong.
+- **Anything where "correct" is non-obvious from the surface.** Off-by-one indices, edge cases in numerical code, units, sign conventions, the precise contract of an API you are calling.
+
+
+## Habits that matter
+
+### Read the suggestion before accepting it
+
+The cost of accepting wrong code that *looks* right is high. You will find the bug an hour later in a debugger when you could have caught it in 200 milliseconds. If a suggestion is more than a few lines, the right move is to read it, decide, and either accept or rewrite — don't Tab-and-pray.
+
+A useful threshold: if the suggestion is longer than the comment or signature that triggered it, slow down.
+
+### Do not autocomplete your verification
+
+This is the single most damaging autocomplete failure mode in scientific and engineering code.
+
+Whether your verification is a formal unit test, a sanity-check script, a comparison against a known answer, or a hand-checked numerical result, it is supposed to be *your* expression of what the code should do. If the model writes the check based on the code, the check passes by construction and confirms nothing.
+
+Write your check yourself; let the model help with the implementation. If you must use autocomplete in a test file, autocomplete the *boilerplate* (imports, fixtures, the test function signature) and write the assertion yourself.
+
+### Treat repeated rejection as a signal
+
+If you find yourself Tab-rejecting (or ignoring and overwriting) the same kind of suggestion repeatedly for the same task, the model doesn't have the signal it needs.
Stop reaching for autocomplete and either write it yourself or escalate to a chat where you can give the model the context it's missing.
+
+
+## When to escalate
+
+Autocomplete is one rung of a ladder. Step off when:
+
+| Symptom | Escalate to | Why |
+|---|---|---|
+| You want a refactor or change to a block you can describe in words | In-project chat ([section 04](../04-conversations/)) | A one-sentence instruction lands better in a chat panel than as a comment-prompt to autocomplete |
+| The change spans more than one file | Agentic delegation ([section 05](../05-agentic-workflow/)) | Autocomplete is per-cursor; agents handle cross-file work |
+| You need to think through a design decision | Chat ([section 04](../04-conversations/)) | Autocomplete cannot reason; it can only complete |
+| You keep accepting suggestions and then deleting them | Write it yourself | The model isn't helping; you are doing the work in two places |
+
+
+## A historical note: inline edits
+
+Older guides and tutorials (and the 2024-era marketing for Copilot and Cursor) put **inline edit** — highlight a block, press Cmd+K, type *"make this async"* — alongside autocomplete as the second main in-editor interaction. The pattern emerged in 2023 alongside instruction-tuned LLMs (ChatGPT, GPT-4), which were the first models that could reliably turn a natural-language instruction into a code transformation. It sat between the earlier completion-only era (autocomplete, powered by Codex and similar) and the agentic loops that followed. The hotkey still exists in most tools, but the pattern is fading because in-project chat does the same job with better context.
+
+If you have *highlighted code and a one-sentence instruction*, an inline edit and an in-project chat message produce essentially the same diff. The chat panel just makes it easier to follow up, ask why, or refine. We don't teach inline edit as a primary workflow here.
If your tool of choice still leans on it, the same one-sentence-spec discipline applies.
+
+
+## Habits that survive tool changes
+
+The tools will keep changing. These habits do not:
+
+- **Read every accepted suggestion.** Even short ones. Especially short ones in numerical code, where a sign flip looks the same as the right answer.
+- **Keep the cycle tight.** If autocomplete is producing more than ~10 lines at a time for you, you are no longer reviewing in real time — you are reading code the AI wrote, which is a different mode.
+- **Use version control as a safety net.** Commit before a stretch of heavy AI-assisted coding. `git diff` is the last line of defense.
+- **Verify with your own checks.** The check has to come from you, not from the AI that wrote the code.
+- **Be willing to turn it off.** Autocomplete is the right tool sometimes and the wrong tool other times. Toggling it off for a session where you want to think is a real productivity move.
+
+
+## Exercises
+
+> **Exercise 1:** For one work session, keep autocomplete on but make a conscious "accept / reject / rewrite" decision on every suggestion of more than one token. Note how often each happens. The exercise is not "reject more" — it is to make the choice visible.
+
+> **Exercise 2:** Take a small numerical or scientific function you've recently written. Use autocomplete to draft a verification check for it. Then write the same check by hand without looking at the function. Compare what each check actually tests — and notice which one is checking what *you* expected versus what the code *does*.
+
+> **Exercise 3:** Spend a half-day with autocomplete disabled. Note where you missed it (boilerplate, repetition) and where you didn't (anywhere you were actually thinking). The exercise is to feel the difference between the two modes of writing code.
diff --git a/03-in-editor-workflow/README.md b/03-in-editor-workflow/README.md
deleted file mode 100644
index f4f2890..0000000
--- a/03-in-editor-workflow/README.md
+++ /dev/null
@@ -1,88 +0,0 @@
-# In-Editor Workflow
-
-## Key idea
-
-Editor extensions are best used as a tight feedback loop: small suggestions, surgical edits, fast accept/reject decisions. The point is not to "let the AI write code for you." The point is to remove the keystrokes that don't deserve your attention while keeping the keystrokes that do.
-
-## Key goals
-
-- Recognize the four common in-editor patterns: autocomplete, inline edit, side-panel chat, quick actions
-- Use each pattern for the kind of work it suits
-- Develop habits that keep you in control of what lands in your code
-- Know when to escalate to a chat, an agent, or just write it yourself
-
----
-
-## Four patterns
-
-In-editor AI extensions (GitHub Copilot, Claude for VS Code, Codeium, Microsoft Copilot, Cursor's built-in features) vary in keystrokes and naming, but most expose the same four patterns. Learn the patterns; the keystrokes will follow.
-
-### 1. Autocomplete (ghost text)
-
-As you type, the extension proposes the next few tokens or lines as faint "ghost text." You accept with Tab (typically) or keep typing to ignore.
-
-**Best for:** boilerplate you would have typed anyway. Loop scaffolds, function signatures whose shape is obvious, import lines, the body of a getter, repetitive variations of the same pattern.
-
-**Habit to build:** *read the suggestion before accepting it.* The cost of accepting wrong code that looks right is high — you'll find the bug an hour later in a debugger when you could have caught it in 200 milliseconds. If a suggestion is more than a few lines, the right move is usually to read it, decide, and either accept or rewrite. Do not Tab-and-pray.
-
-**Habit to break:** *do not autocomplete your verification.* Whether your verification is a formal unit test, a sanity-check script, or a comparison against a known answer, it is supposed to be *your* expression of what the code should do. If the model writes the check based on the code, the check passes by construction and confirms nothing. Write your check yourself; let the model help with the implementation.
-
-### 2. Inline edit (edit-this-selection)
-
-You highlight a block of code and invoke the AI with a brief instruction: *"rewrite using a list comprehension,"* *"extract the inner loop into a helper,"* *"add type hints."* The extension shows a diff; you accept or reject.
-
-**Best for:** surgical edits with a clear before/after. Refactors, type hint additions, adding docstrings, converting between equivalent forms, applying a stylistic change consistently.
-
-**Habit to build:** *think of the instruction as a one-line spec.* The clearer your instruction, the better the diff. *"Make this better"* is a worse prompt than *"split this into a parse step and a validate step, keeping the same return signature."*
-
-**Habit to break:** *do not invoke inline edit on a block you do not understand.* If you cannot evaluate the diff, you cannot reject a bad one. Skim or read the block first.
-
-### 3. Side-panel chat (with file or project context)
-
-A chat window in your editor where you can ask questions and the extension attaches whatever files or selections you've referenced. Some extensions auto-attach the open file; others require you to add context explicitly.
-
-**Best for:** *"explain this function," "why is this slow," "how would I extend this to do Y,"* — questions where the answer is words, not a direct edit, but where you want the answer informed by your actual code rather than a generic snippet.
Side panels have matured to the point where they also handle **multi-turn design discussions well**, especially when the discussion is anchored in files you have open — the one-click attachment of files and selections is a real advantage over alt-tabbing to a web chat.
-
-**Habit to build:** *be explicit about what context you want included.* If the extension lets you pin specific files or selections to the conversation, use it. The model can only reason about what it sees.
-
-**When to step out to a web chat instead:** the discussion needs to outlive the editor session (a record you want to return to days later), needs to include collaborators who don't share your editor, or pulls in lots of non-code context (papers, third-party docs, screenshots). See [section 04: Conversations](../04-conversations/) for those patterns.
-
-### 4. Quick actions (rename, extract, add types)
-
-Many extensions surface "intelligent" versions of classic IDE operations: rename a symbol across a file or project, extract a selection into a function, add type hints to a function signature.
-
-**Best for:** classic refactors you would otherwise do by hand, with an AI doing the renaming or signature work. These are usually safe because the change set is small and visible.
-
-**Habit to build:** *check the scope.* "Rename across project" can change more than you expect — make sure you reviewed the file list or used a version-controlled state you can roll back from.
-
-
-## When to escalate
-
-In-editor extensions are great inside their lane.
Recognize when to step out of it: - -| Symptom | Escalate to | Why | -|---|---|---| -| The conversation needs to outlive this editor session, be shared with collaborators, or pull in non-code context | Web chat ([section 04](../04-conversations/)) | Web chats persist, share, and accept arbitrary content more easily | -| The edit you want spans many files | Agent ([section 05](../05-agentic-workflow/)) | Inline edit is per-file; agents handle cross-file work | -| You keep tabbing through bad suggestions for the same task | Write it yourself | The model doesn't have enough signal; you are faster | -| The output is large and the result needs verification | Write it yourself or pair with your own checks | Trust-but-verify gets expensive at scale | - - -## Habits that survive tool changes - -The tools will keep changing. These habits do not: - -- **Read every accepted suggestion.** Even autocompletes. Especially autocompletes. -- **Keep the cycle tight.** If the model is producing more than ~20 lines at a time without your review, you are no longer in the loop. -- **Use version control as a safety net.** Commit before any large AI-assisted change. `git diff` is the last line of defense. -- **Verify with your own checks.** Whether that means a formal test, a script that compares against a known answer, a plot you eyeball, or a hand calculation depends on what you are writing. The check has to come from you, not from the AI that wrote the code. -- **Be willing to write code yourself.** The AI is a tool, not a substitute for understanding what you're building. - - -## Exercises - -> **Exercise 1:** For one work session, keep autocomplete on but make a conscious "accept / reject / rewrite" decision on every suggestion of more than one token. Note how often each happens. The exercise is not "reject more" — it is to make the choice visible. 
- -> **Exercise 2:** Take a function in a recent project and use inline edit three times with three different instructions: a vague one (*"make this better"*), a specific one (*"split into parse and validate steps"*), and a constraint one (*"refactor to remove the nested if"*). Compare the diffs. - -> **Exercise 3:** Try a "no AI for one task" experiment: pick a small feature, write it yourself with the extension disabled. Then re-enable and use it for a comparable second feature. Note where the AI saved you time, where it cost you time, and where the difference was negligible. diff --git a/04-conversations/README.md b/04-conversations/README.md index a6656bd..3ea10ec 100644 --- a/04-conversations/README.md +++ b/04-conversations/README.md @@ -2,9 +2,11 @@ ## Key idea -A chat is at its best when you treat it as a *conversation*, not a search bar. Multi-turn discussions, such as design tradeoffs, exploring an unfamiliar library, or talking through a problem that you can't quite articulate, are where conversational interactions are better compared to single-shot edits or fire-and-forget agents. The patterns in this section apply whether you are in a dedicated web chat or your editor's side panel; see [section 03](../03-in-editor-workflow/) for choosing between those venues. The skill is steering the conversation so it stays useful. +A chat is at its best when you treat it as a *conversation*, not a search bar. Multi-turn discussions that include design tradeoffs, exploring an unfamiliar library, or talking through a problem that you can't quite articulate are where conversational interactions are a better approach than single-shot edits or fire-and-forget agents. -Another way to think about a chat with a capable model is as a kind of programming in natural language: you specify what you want, the model executes, you observe the output, and you refine the specification. 
The skills that make a programmer effective, including clarity, decomposition, anticipating ambiguity, and iterating, turn out to be the same skills that make a chat user effective. Until LLMs, natural language was almost never an executable specification, and this is one of the more remarkable shifts the technology has produced. This shift explains why "prompt engineering" became a buzzword. It's not magic words or incantations, it's specification quality. The same reason a vague programming spec produces buggy code, a vague prompt produces vague output. +The patterns in this section apply in two venues: the **in-project chat** panel inside your editor or CLI (Claude Code's panel, Cursor's chat, Continue.dev's panel) and a **web chat** in a browser tab (ChatGPT, Claude.ai, Gemini). Web chat is where most of us get our start with AI tools, but the in-project chat is the better default because it can see your files and edit them. Step out to a web chat only when the discussion does not belong in the project (it includes sensitive content, non-code context, or a record you want to share with collaborators). The skill is steering the conversation so it stays useful. + +Another way to think about a chat with a capable model is as a kind of programming in natural language. In the chat, you specify what you want, the model executes, you observe the output, and you refine the specification. Effective programming skills, including clarity, decomposition, anticipating ambiguity, and iterating, are the same ones that make a chat user effective. Until LLMs, natural language was almost never an executable specification, and this is one of the more remarkable shifts the technology has produced. This very shift explains why "prompt engineering" became a buzzword. It's not magic words or incantations; it's specification quality. 
## Key goals @@ -18,14 +20,26 @@ Another way to think about a chat with a capable model is as a kind of programmi ## When a conversation is the right tool -The mode chart in [section 01](../01-three-modes/) tells you to reach for chat when the answer is words, not code-in-place. Within that bucket, multi-turn conversation is specifically valuable for: +The heuristic in [section 01](../01-two-worlds/) tells you to reach for chat when the answer is words, not code-in-place. Within that framing, multi-turn conversation is specifically valuable for: - **Design tradeoffs.** *"I'm choosing between approach A and B. What should I weigh?"* The model can lay out the dimensions and you can push back on weightings. - **Exploration of unfamiliar territory.** *"I've never used asyncio in Python. Walk me through what the event loop is actually doing."* Each follow-up question sharpens the model's answer. - **Talking through a problem you can't quite name.** *"Something feels wrong about this architecture but I can't put my finger on it. Here's the structure..."* The act of describing it often clarifies your own thinking, and the model's questions back can probe weak spots. -- **Learning a new domain or library.** Conversations let you ask the dumb questions you'd be embarrassed to ask a colleague repeatedly. +- **Learning a new domain or library.** Conversations let you ask the dumb questions you'd be embarrassed to ask a colleague. -If you find yourself wanting the model to *produce a specific edit*, you have drifted out of conversation territory. Switch to the editor or an agent. +If you find yourself wanting the model to *produce a specific edit*, you have drifted out of pure conversation territory. In an in-project chat, though, that drift is basically harmless. The same panel can just go make the edit, and you are now in an agentic interaction mode ([section 05](../05-agentic-workflow/)). 
In a web chat, though, the same drift is relatively expensive because you have to copy-paste the result back. If the conversation looks like it's heading toward edits, select the in-project chat from the start! + + +## In-project chat versus web chat: when to step out + +As we've seen, most coding conversations belong in the in-project chat. There are a few notable exceptions: + +- **The content is sensitive or untrusted.** A snippet from a paper, code from a server you don't own, an error log with credentials in it, anything covered by an NDA or IRB. A web chat reads but does not act, and you can scrub what you paste. (Privacy and security are discussed in [section 06](../06-verifying-and-citing/).) +- **The discussion needs to outlive the editor session and you want to revisit the *conversation itself* later.** Services like ChatGPT, Claude.ai, and Gemini save every conversation to your account history automatically (the same history whether you used the web or the desktop app), and you can mint a public share link with one click if you want to send it to a collaborator. In-project chats typically don't persist this way. If you mainly want the *takeaways* preserved rather than the full transcript, see "Carrying context across sessions" below. +- **The non-code context is large.** Papers, third-party docs, screenshots, long PDFs. Web chats handle these uploads natively, but in-project chats usually don't. +- **You don't want the AI's reasoning anchored to your codebase.** Sometimes you want a fresh-look answer that isn't biased by the file you happen to have open. + +When none of these apply, default to the in-project chat. The time savings from "the model can already see the file" compound fairly rapidly over a session. ## Opening well @@ -49,7 +63,7 @@ The first prompt gets you a generic comparison. The second gets you a specific r ## Managing context across turns -Chat models do not have memory of your project. 
Instead, they have memory of *this conversation*, and that memory is bounded by the model's **context window** — the amount of recent conversation it can attend to at once. Current chat services handle tens to hundreds of thousands of tokens per session, but once a chat exceeds the limit the interface usually truncates or summarizes the oldest turns silently, and even within the limit the model attends more reliably to recent content than to material from many turns ago. The three strategies below work because they keep the most relevant material at the recent end of the window where the model can still see it clearly. As the conversation grows, three things matter: +Chat models do not have memory of your project. Instead, they have memory of *this conversation*, and that memory is bounded by the model's **context window**, which is the amount of recent conversation it can attend to at once. Current chat services handle tens to hundreds of thousands of tokens per session, but once a chat exceeds the limit the interface usually truncates or summarizes the oldest turns silently, and even within the limit the model attends more reliably to recent content than to material from many turns ago. The three strategies below work because they keep the most relevant material at the recent end of the window where the model can still see it clearly. As the conversation grows, three things matter: ### Re-paste changed code rather than referring to "the function I sent earlier" @@ -57,7 +71,7 @@ If you've edited code based on the model's suggestion and want to continue the d ### Summarize your own thinking back to the model occasionally -Especially in longer conversations, a short *"so what I'm taking away is X, Y, Z — am I missing something?"* anchors the conversation and surfaces misunderstandings cheaply. It also forces you to articulate what you've learned, which is half the point. +Especially in longer conversations, a short *"so what I'm taking away is X, Y, Z. 
Am I missing something?"* anchors the conversation and surfaces misunderstandings cheaply. It also forces you to articulate what you've learned, which is half the point. ### Watch for the model drifting @@ -75,14 +89,25 @@ Conversations accumulate context that can both help and hurt. Start a new chat w There is no virtue in keeping a chat going longer than it needs to. Open a new one freely. Conversations are cheap. When you're deciding "should I keep this conversation going or start fresh?" you should be biased toward starting fresh. +## Carrying context across sessions + +Starting fresh is cheap, but it does throw away whatever the previous conversation taught the model about your project. If a project spans multiple sessions, you can carry the *takeaways* forward without carrying the full transcript: + +- **Ask the in-project chat to write a summary file as you go.** A `dev-log.md`, `notes.md`, or similar at the project root can capture decisions, dead ends, and "where I left off" notes. Next session, you (or the chat) read that file and pick up where you stopped. +- **Maintain a project-level instructions file that the tool re-reads on each session.** Claude Code reads a `CLAUDE.md` in your project root automatically on every session, so anything written there (project conventions, library choices, what "done" means for this work) is available without re-pasting. Cursor has a similar mechanism via `.cursorrules`. Other tools have their own variants. +- **Treat these files as the project's memory, not the chat's.** The conversation is ephemeral, but the files are not. When the chat learns something durable, write it to a file. + +The web-chat equivalent is your account history, where the *conversation itself* is the persistent artifact and you scroll back to find what you said before. The in-project equivalent puts the persistent artifact in your repo, where it lives with the code and travels with collaborators. 
+ + +## Patterns that work -- **Compare and contrast.** *"What are the practical differences between pandas `merge` and `join`? When would I reach for each?"* Models are good at structured comparisons. +- **Compare and contrast.** *"What are the practical differences between pandas `merge` and `join`? When would I choose one or the other?"* Models are good at structured comparisons. - **Devil's advocate.** *"I'm planning to use approach X. What would make that a bad choice? What's the strongest argument against it?"* Inverts the default "let me help you do what you said" tendency. -- **Explain to a target audience.** *"Explain CRDTs to me as if I have 10 years of backend experience but no distributed-systems background."* The audience framing tightens the level of abstraction. +- **Explain to a target audience.** *"Explain finite-volume methods to me as if I have a strong finite-difference background but no CFD experience."* The audience framing tightens the level of abstraction: the model can skip discretization basics and focus on what's actually new (flux conservation across control volumes, dealing with unstructured meshes). - **Critique my draft.** *"Here is my approach / commit message / README. What's confusing or weak?"* Models are surprisingly useful as a first-pass reviewer. - **Walk me through.** *"Walk me through what happens when I call `requests.get(...)`. Don't skip the boring parts."* Good for building mental models of libraries you use but don't fully understand. -- **Iterate on the prompt itself.** *"What would I have to add to my question to get a better answer?"* or *"Help me rewrite this prompt to be more specific."* The model is often perceptive about its own failure modes, and the resulting prompt is sharper than what you started with. Especially valuable when you are crafting a prompt you will reuse — a template, a system prompt, or an agent's instruction. 
+- **Iterate on the prompt itself.** *"What would I have to add to my question to get a better answer?"* or *"Help me rewrite this prompt to be more specific."* The model is often perceptive about its own failure modes, and the resulting prompt is sharper than what you started with. Especially valuable when you are crafting a prompt you will reuse, such as a template, a system prompt, or an agent's instruction. ## Patterns that don't @@ -93,7 +118,7 @@ There is no virtue in keeping a chat going longer than it needs to. Open a new o ## When to stop talking and write code -A conversation has done its job when you can clearly articulate the next concrete action. At that point, more conversation is procrastination, and the work should move to whichever execution mode fits — an inline edit for a single function, an agent to draft a multi-file change you will then review, or your own keyboard for the parts that benefit from your hands-on judgment. If you get stuck, come back to the conversation. +A conversation has done its job when you can clearly articulate the next concrete action. At that point, more conversation is procrastination, and the work should move to execution. Ask the in-project chat to make the change (or, for multi-step work, brief an agent), or move to your own keyboard for the parts that benefit from your hands-on judgment. If you get stuck, come back to the conversation. Watch for these stop signals: @@ -110,4 +135,4 @@ The point of the conversation was to get you *to* the work, not to replace it. > **Exercise 2:** Take a conversation you had recently that felt unproductive. Reread it from the model's perspective: what context was missing? Was the question buried? Now imagine the version of the chat you'd have if you started over with what you know now. -> **Exercise 3:** Try the "devil's advocate" pattern on a decision you've already made and feel confident about. 
The discomfort of hearing the strongest argument against is informative — sometimes the decision survives intact (now better justified), sometimes it doesn't. +> **Exercise 3:** Try the "devil's advocate" pattern on a decision you've already made and feel confident about. The discomfort of hearing the strongest argument against is informative. Sometimes the decision survives intact (now better justified), sometimes it doesn't. diff --git a/05-agentic-workflow/README.md b/05-agentic-workflow/README.md index ea04f21..381176b 100644 --- a/05-agentic-workflow/README.md +++ b/05-agentic-workflow/README.md @@ -2,13 +2,13 @@ ## Key idea -An agentic tool is an AI that takes actions on its own, whether it's reading a file, running a command, editing, testing, reading the output, editing again, without you mediating each step. You set the goal and the agent runs the loop. That power is only useful when paired with judgment about *when* to deploy an agent and *how* to supervise it. +An agentic tool is an AI that takes actions on its own (reading files, running commands, editing, testing, observing results, editing again) without you mediating each step. You set the goal and the agent runs the loop. That power is only useful when paired with judgment about *when* to deploy an agent and *how* to supervise it. -This section is about *using* agentic tools as an engineer or scientist solving problems with code — models, data analysis, simulations, coursework — rather than as someone building production software for end users. If you want to understand how tool use works under the hood and how to build a system like this, see the [llm-workshop](https://lem.che.udel.edu/git/furst/llm-workshop) section on tool use and agentic systems. +This section is about *using* agentic tools as an engineer or scientist (for modeling, data analysis, simulations, or coursework), not building production software for end users. 
For how tool use works under the hood, see the [llm-workshop](https://lem.che.udel.edu/git/furst/llm-workshop) section on tool use and agentic systems. ## Key goals -- Recognize agentic mode and what distinguishes it from an editor extension +- Recognize when you have moved from in-project chat into agentic territory, and what changes when you do - Identify the kinds of task where an agent is most useful - Brief an agent in a way that produces good results - Supervise effectively: scope, permissions, review @@ -18,71 +18,72 @@ This section is about *using* agentic tools as an engineer or scientist solving ## What an agent actually does -A typical agentic tool is built on the same underlying model as the chat or editor extension, but wrapped in a *loop* that lets the model take actions in your environment. A simplified version of one step: +An agentic tool wraps a chat model in a *loop* that lets it take actions in your environment. In practice, the same in-project chat panel you use for a one-shot question ([section 04](../04-conversations/)) becomes an agent the moment you give it a multi-step goal. There is no separate "agent app" to launch. One step in the loop: 1. The model receives your goal and the current state (files, terminal output, etc.) 2. The model decides on the next action: read a file, run a command, write an edit 3. The action runs; the result is fed back to the model 4. Repeat until the model believes the goal is met (or it asks you a question) -The two things that make this different from in-editor edits: +What's new compared to a single chat message or autocomplete: -- **Actions are real.** The agent can run `rm`, `git push`, `pip install`, or hit external APIs. Permission models vary by tool, but the capability is the defining feature. -- **The model owns the plan.** You don't write the steps. Instead, the model figures out the steps. +- **Actions are real.** The agent can run `rm`, `git push`, `pip install`, or hit external APIs. 
Permission models vary, but the capability is the defining feature. +- **The model owns the plan.** You don't write the steps; the model figures them out and runs them. **Examples (early 2026):** Claude Code (CLI), Cursor (agent mode and Background Agents), Cline and Windsurf Cascade (VS Code), Microsoft Copilot agent, GitHub Copilot Workspace, Aider, and more autonomous platforms like Devin and Replit Agent. ## Variations on the basic loop -The read-act-observe-act cycle described above is the core, but the agentic landscape has expanded substantially through 2025 and into early 2026, and a working knowledge of the main variations helps you choose the right tool and supervise it well. +The read-act-observe-act cycle is the core, but the landscape expanded substantially through 2025: -- **Sub-agents and parallelism.** A primary agent can spawn sub-agents to handle independent branches of work — searching different parts of a codebase at once, running parallel investigations, or specializing roles such as one agent writing and another reviewing. Claude Code's `Agent` tool and similar features in other platforms enable this. The supervision burden shifts, though. You are no longer watching one loop but several. -- **Plan-then-execute modes.** Many agents now offer a mode where they first produce a written plan, you review and edit it, and only then do they execute. Claude Code's plan mode, Cursor's planning step, and similar features fit this pattern. It sits between "approve every action" (slow) and "let it run" (risky), and is often the right default for a non-trivial task. -- **Async and background agents.** Some agents run while you do other things and report back when finished — Cursor's Background Agents, Devin, Replit Agent, GitHub Copilot Workspace. The trade is real-time visibility for parallelism with your own work, and it changes how you brief the agent because you cannot easily course-correct mid-task. 
-- **MCP and external tools.** The Model Context Protocol, introduced by Anthropic in late 2024 and widely adopted since, lets agents connect to external systems through standardized servers — Slack, Linear, GitHub, databases, monitoring dashboards, file systems on remote machines. "The agent reads files and runs commands" is now a starting point rather than a ceiling; in practice, agents reach into whatever services your team uses. -- **Sandboxed execution.** Some agents run inside isolated virtual machines or containers, which limits what an agent can affect and means destructive actions only impact the sandbox. Devin and some Cursor modes work this way. The downside is reduced access to your real environment, but the upside is genuine freedom to experiment without risking your machine. +- **Plan-then-execute modes.** The agent first produces a written plan; you review and edit it; only then does it execute. Claude Code's plan mode and Cursor's planning step fit here. It sits between "approve every action" (slow) and "let it run" (risky), and is often the right default for a non-trivial task. +- **Sub-agents and parallelism.** A primary agent spawns sub-agents for independent branches of work, such as searching different parts of a codebase at once or specializing roles (one writes, one reviews). The supervision burden shifts: you watch several loops, not one. +- **Async and background agents.** Some agents run while you do other things and report back when finished (Cursor's Background Agents, Devin, Replit Agent, GitHub Copilot Workspace). You trade real-time visibility for parallelism with your own work, and you have to brief more carefully because mid-task course-correction is hard. +- **MCP and external tools.** The Model Context Protocol, introduced by Anthropic in late 2024, lets agents connect to external systems (Slack, Linear, GitHub, databases, dashboards, remote filesystems) through standardized servers. 
"Reads files and runs commands" is now a starting point, not a ceiling. +- **Sandboxed execution.** Some agents run inside isolated VMs or containers, so destructive actions only affect the sandbox. The downside is reduced access to your real environment; the upside is genuine room to experiment. -These variations do not change the supervision principles below — clear briefs, permission control, review the result — but they do change how those principles are applied. A plan-mode agent shifts review from the result to the plan; a sub-agent setup means supervising several flows at once; a sandboxed agent means review can be looser because consequences are contained. +These variations don't change the supervision principles below, only how they're applied: plan mode shifts review from result to plan, sub-agents multiply the loops you watch, sandboxing lets review run looser because consequences are contained. ## When to use an agent -Agents shine on tasks where the *work between steps* is the expensive part for a human: +Agents are best for tasks where the *work between steps* is the expensive part for a human: -- **Multi-file changes that need verification.** *"Rename this concept across the codebase and make sure the tests still pass"* — or, for scientific code without a formal test suite, *"...and make sure my analysis script still reproduces the expected numbers."* The agent reads, edits, re-runs the verification, re-edits if needed. You would do the same thing manually, with much more context-switching. +- **Multi-file changes that need verification.** *"Rename this concept across the codebase and make sure the tests still pass."* For scientific code without a formal test suite: *"...and make sure my analysis script still reproduces the expected numbers."* The agent reads, edits, re-runs verification, re-edits if needed. You would do the same thing manually with much more context-switching. 
- **Exploring an unfamiliar codebase.** *"How is authentication handled in this project? Find the entry point and explain the flow."* The agent grep-walks the project; you read the summary. - **Repetitive maintenance.** *"Update all the imports from `old_lib` to `new_lib` and adjust the calls that changed."* Mechanical, scoped, verifiable. - **End-to-end small features in well-tested code.** *"Add an endpoint that does X, following the patterns in the existing endpoints. Update the tests."* Agents are *less* useful for: -- **A single line you already know how to write.** Inline edit is faster. +- **A single line you already know how to write.** Autocomplete or typing it yourself is faster. +- **A targeted edit you can describe in one sentence.** A single message to the in-project chat ([section 04](../04-conversations/)) is faster than spinning up an agent loop. - **A design discussion.** Use chat. The agent has nowhere to act. -- **Anything where you don't know what "done" looks like.** The agent will reach a state and stop; if you can't tell whether it's the right state, you've shifted the problem rather than solving it. +- **Anything where you don't know what "done" looks like.** The agent will reach a state and stop. If you can't tell whether it's the right state, you've shifted the problem rather than solved it. ## Briefing an agent well -A good agent brief looks more like a task description for a new teammate than a search query. Include: +A good brief looks more like a task description for a new teammate than a search query. Include: - **The goal**, stated as outcome rather than steps. *"Add a `--dry-run` flag to the `migrate` command that prints what would change without writing anything."* - **Constraints** the agent might not infer. *"Use the existing logging helper rather than `print`. Match the style of the other flags."* -- **What "done" means.** *"All existing tests still pass. 
There is a new test verifying the `--dry-run` output for the simple case."* For code without a formal test suite, substitute whichever form of verification you use — sanity-check runs, known-answer comparisons, a regression script's expected output.
+- **What "done" means.** *"All existing tests still pass. There is a new test verifying the `--dry-run` output for the simple case."* For code without a formal test suite, substitute whichever verification you use (sanity-check runs, known-answer comparisons, regression scripts).
 - **What to ask about, not assume.** *"If the migration step has side effects I can't easily reverse, stop and ask before running it."*

-The single biggest predictor of an agent doing the right thing is how well-bounded the task is. *"Improve this code"* is poorly bounded; the agent will improve it in directions you may not want. *"Reduce the duplication between `parse_csv` and `parse_tsv` by extracting a shared helper, preserving the existing return signatures"* is well-bounded.
+The biggest predictor of an agent doing the right thing is how well-bounded the task is. *"Improve this code"* is poorly bounded; the agent will improve it in directions you may not want. *"Reduce the duplication between `parse_csv` and `parse_tsv` by extracting a shared helper, preserving the existing return signatures"* is well-bounded.

 ## Supervision

-Agentic tools work because they take real actions. That means real consequences if they take the wrong ones. Three things to think about before letting an agent loose:
+Agentic tools work because they take real actions, which means real consequences when those actions are wrong. Three things to think about before letting one loose:

 ### Permissions

-Most tools have a permission model: which commands run automatically, which require confirmation, which are blocked outright. Default toward *more* confirmation when you are starting out with a new tool or a new codebase. Speed up later as you learn what the agent does well.
+Most tools have a permission model: which commands run automatically, which require confirmation, which are blocked. Default toward *more* confirmation with a new tool or a new codebase, and speed up as you learn what the agent does well.

-A useful rule of thumb: **destructive or remote-affecting actions deserve confirmation.** Local edits to a project under version control are reversible. `git push --force`, `rm -rf`, `pip uninstall`, and anything that hits an external service or shared system are not.
+Rule of thumb: **destructive or remote-affecting actions deserve confirmation.** Local edits to a project under version control are reversible. `git push --force`, `rm -rf`, `pip uninstall`, and anything that hits an external service or shared system are not.

 ### Working directory and damage control

@@ -94,12 +95,12 @@ An agent pointed at a fresh sandbox can experiment freely. An agent pointed at y

 ### Review

-The agent's report — *"I added the flag, updated the tests, and they pass"* — describes what it intended to do, not necessarily what it did. Always check:
+The agent's report (*"I added the flag, updated the tests, and they pass"*) describes what it intended to do, not necessarily what it did. Always check:

-- `git diff` (or the equivalent) — what actually changed?
-- Any verification — tests, sanity-check scripts, known-answer comparisons — did it actually verify the new behavior, or did it get loosened to pass?
-- Any new files — were they expected?
-- Any commands run — were there surprises in the output?
+- `git diff`: what actually changed?
+- Verification: did the tests or sanity checks actually exercise the new behavior, or did they get loosened to pass?
+- New files: were they expected?
+- Commands run: any surprises in the output?

 Spot-checking is fast. Skipping it is how subtle bugs and security issues land in your codebase.

@@ -108,23 +109,23 @@ Spot-checking is fast. Skipping it is how subtle bugs and security issues land i

 Agentic tools use many model calls per task. A task that takes one back-and-forth in chat can take thirty in an agent. Watch for:

-- **Long-running loops.** If an agent has been working for a long time without progress, it may be stuck in a try-fix-try cycle. Intervening early is cheaper than letting it grind.
-- **Wide context.** Agents that read many files pay for that context on every step. Pointed work in a small subdirectory costs less than open-ended exploration of a large repo.
-- **Wandering.** If the agent has drifted from the original goal, stopping and restarting with a tighter brief is usually cheaper than letting it wander back on its own.
+- **Long-running loops.** If an agent has been working a long time without progress, it may be stuck in a try-fix-try cycle. Intervening early is cheaper than letting it grind.
+- **Wide context.** Agents pay for every file they read on every step. Pointed work in a small subdirectory costs less than open-ended exploration of a large repo.
+- **Wandering.** If the agent has drifted from the original goal, stop and restart with a tighter brief rather than letting it wander back on its own.

 ## Common failure modes

-- **The agent does the wrong thing efficiently.** The brief was ambiguous; the agent picked one interpretation and proceeded fast. Catch this in review and brief better next time.
-- **Checks get loosened rather than the code being fixed.** The agent finds a failing test or a sanity check that doesn't pass, decides the check was wrong, and weakens it rather than fixing what it was checking. Always look at what changed in your verification scripts and test files.
-- **Cascading small edits.** The agent makes a small change, notices a knock-on, fixes that, notices another, fixes that... twenty edits later, half the codebase has been touched. Tight scopes and good initial briefs prevent this.
-- **Confident hallucinations about a library or API.** The agent will use a function that doesn't exist with full confidence, then patch around its own error when the test fails. Pin the agent to documentation or examples when the library is unfamiliar.
+- **The agent does the wrong thing efficiently.** The brief was ambiguous; the agent picked one interpretation and proceeded fast. Catch in review and brief better next time.
+- **Checks get loosened rather than the code being fixed.** The agent finds a failing check, decides the check was wrong, and weakens it. Always look at what changed in your verification scripts and test files.
+- **Cascading small edits.** A small change triggers a knock-on, which triggers another, and twenty edits later half the codebase has been touched. Tight scope and a good brief prevent this.
+- **Confident hallucinations about a library or API.** The agent uses a function that doesn't exist, then patches around its own error when the test fails. Pin the agent to documentation or examples when the library is unfamiliar.
 - **Permissions creep.** "Just this once, allow this command unsupervised" turns into a default. Re-tighten when you change tasks.

 ## Exercises

-> **Exercise 1:** Pick a small, well-scoped task you've been putting off — a refactor, a chore, a small feature — and brief an agent to do it. Write the brief first, before invoking the tool. Note how often you wanted to add a detail you forgot.
+> **Exercise 1:** Pick a small, well-scoped task you've been putting off (a refactor, a chore, a small feature) and brief an agent to do it. Write the brief first, before invoking the tool. Note how often you wanted to add a detail you forgot.

 > **Exercise 2:** Compare an agentic run with a manual run of the same task on a small scale. Time both. Account not just for elapsed time but for the *quality* of the result and the time you spent reviewing.
diff --git a/06-verifying-and-citing/README.md b/06-verifying-and-citing/README.md
index d3d4c98..e3e18e2 100644
--- a/06-verifying-and-citing/README.md
+++ b/06-verifying-and-citing/README.md
@@ -60,7 +60,7 @@ Trust is not "the model is bad at X." It's "the *consequence* of being wrong abo

 ## Part 2: Privacy and IP

-### Baseline: the same risk profile as other cloud services
+### Baseline: similar risk profile as other cloud services

 When you paste content into a chat, that content is sent to the service. Editor extensions and agents do the same with the files they read. This is the same baseline as Gmail, Google Drive, GitHub, OneDrive, Dropbox, or any other cloud service you already use. In these cases, your content lives on someone else's servers, subject to their data-handling policies. For most academic and research work, including coursework, classroom code, public datasets, open-source libraries, drafts of your own writing, the privacy risk is no different from what you accept every time you use those other services.

@@ -68,7 +68,7 @@ When you paste content into a chat, that content is sent to the service. Editor

 The genuine differences are narrower than the general "the cloud is watching" framing suggests, but they are real:

-1. **Training-data inclusion.** Gmail and Drive do not train on your content. AI services historically have, varying by the provider and the plan. WHile defaults have been changing, and most paid and enterprise tiers now opt out, the precedent is real and there is no Gmail equivalent. Check the current setting for the service you use.
+1. **Training-data inclusion.** Gmail and Drive do not train on your content. AI services historically have, varying by the provider and the plan. While defaults have been changing, and most paid and enterprise tiers now opt out, the precedent is real and there is no Gmail equivalent. Check the current setting for the service you use.
 2. **Aggregation richness.** A chat history reveals more than an email archive. What you are working on, what you do not know, what you are puzzling over. These can accumulate in conversations in a way they do not in inboxes. Aggregated chat history is potentially a richer and more sensitive documentation of your work than aggregated email is.
 3. **Routine review of flagged content.** Most AI services explicitly reserve the right to have humans review conversations flagged by their safety systems. Gmail has no equivalent "if our spam filter trips, a person may read this" policy. In practice your conversations are almost certainly not reviewed, but the legal posture is different.

diff --git a/07-local-models/README.md b/07-local-models/README.md
index 41b51fa..9bdae51 100644
--- a/07-local-models/README.md
+++ b/07-local-models/README.md
@@ -2,14 +2,14 @@

 ## Key idea

-You do not have to use a frontier cloud model to use AI in your work. A "local" model runs entirely on your own hardware: no API, no per-token cost, no data leaving the machine. Local models are not a fourth *mode* on top of chat, editor, and agent — they cut across all three. The same workflow patterns apply; what changes is the tool that hosts the model and what you give up (and gain) by running it yourself.
+You do not have to use a frontier cloud model to use AI in your work. A "local" model runs entirely on your own hardware: no API, no per-token cost, no data leaving the machine. Local models cut across every workflow we've covered — web chat, autocomplete, in-project chat, and agentic — rather than being a separate mode. The same workflow patterns apply; what changes is the tool that hosts the model and what you give up (and gain) by running it yourself.

 This section is about local models as a *user* of AI coding tools. If you want to understand how local models work under the hood, train your own, or build the infrastructure around them, see the [llm-workshop](https://lem.che.udel.edu/git/furst/llm-workshop).

 ## Key goals

 - Understand why you might prefer a local model to a cloud model
-- Recognize which tools in each of the three modes support local models
+- Recognize which tools across the autocomplete/chat/agent spectrum support local models
 - Calibrate expectations about capability and latency relative to frontier cloud models
 - Identify the situations where local is the right choice and where cloud still wins

@@ -47,11 +47,11 @@ A rough sense of what runs comfortably where, as of early 2026:

 If you took the time to fill out the spec table in [computing-setup section 01](https://lem.che.udel.edu/git/furst/computing-setup/src/branch/main/01-know-your-machine/), you already know what tier you're in.

-## Local models across the three modes
+## Local models across the workflow

-The three-mode framing from [section 01](../01-three-modes/) still applies — what changes is the host.
+The framing from [section 01](../01-two-worlds/) still applies — what changes is the host. Below, we walk through where local models fit in each kind of work.

-### Local in *chat* mode
+### Local in *web-chat* style

 You can have a private, local ChatGPT-style experience entirely on your laptop.

@@ -62,15 +62,15 @@ You can have a private, local ChatGPT-style experience entirely on your laptop.
 | **Open WebUI** | A self-hosted web UI (like ChatGPT) that talks to Ollama or any OpenAI-compatible backend. Good if you want a familiar chat experience or want to share access on a LAN. |
 | **Jan**, **GPT4All** | Other desktop chat apps with similar goals. |

-The Ollama-powered backends in particular are useful well beyond chat — most of the editor and agentic tools below can connect to an Ollama endpoint, which means setting up Ollama once unlocks every mode.
+The Ollama-powered backends in particular are useful well beyond chat — most of the in-editor and agentic tools below can connect to an Ollama endpoint, which means setting up Ollama once unlocks every other use case.

-### Local in *editor* mode
+### Local for autocomplete and in-project chat

-Several VS Code extensions support local models. Notably, **GitHub Copilot, Microsoft Copilot, and the Claude extension do not** — they require their vendor's cloud service. If you want a local model in your editor, you need a different extension.
+Several VS Code extensions support local models for autocomplete and side-panel chat. Notably, **GitHub Copilot, Microsoft Copilot, and the Claude (legacy) extension do not** — they require their vendor's cloud service. If you want a local model in your editor, you need a different extension.

 | Extension | Notes |
 |---|---|
-| **Continue.dev** | Open-source, the flagship local-friendly extension. Works with Ollama, LM Studio, llama.cpp, and many cloud providers. Supports autocomplete, inline edit, and a chat panel. The first tool to try. |
+| **Continue.dev** | Open-source, the flagship local-friendly extension. Works with Ollama, LM Studio, llama.cpp, and many cloud providers. Supports autocomplete and a chat panel. The first tool to try. |
 | **Cody** (Sourcegraph) | Has a "local context" mode and can use local models via Ollama. Also has a strong cloud product. |
 | **Llama Coder** | Ollama-focused, autocomplete-first. Lightweight. |
 | **Tabby** | A self-hosted code completion server. Heavier setup but good for shared use within a team or lab. |
diff --git a/README.md b/README.md
index 3a5c366..59287bd 100644
--- a/README.md
+++ b/README.md
@@ -1,22 +1,22 @@
 # Coding with AI

-A practical guide to working effectively with AI coding assistants — chat interfaces (ChatGPT, Claude, Gemini, Microsoft Copilot), in-editor extensions, and agentic tools. Our focus is on *workflow* and *judgment*: when to reach for which mode, what to paste, how to prompt, how to verify, and what to cite.
+A practical guide to working effectively with AI coding assistants — web chat interfaces (ChatGPT, Claude, Gemini, Microsoft Copilot), in-project chat panels and CLIs (Claude Code, Cursor), autocomplete, and agentic tools. Our focus is on *workflow* and *judgment*: when to reach for which tool, what to paste, how to prompt, how to verify, and what to cite.

 AI tools change quickly, but the patterns change slowly. This guide aims at the patterns and uses current tools as examples.

-**A note on scope.** This guide is about *coding* — writing, editing, refactoring, and debugging software. Students and engineers also use AI tools heavily for *learning* tasks: explaining concepts, summarizing literature, generating practice problems, study quizzes, mnemonics, working through homework, finding the right vocabulary for a half-remembered idea. The three-mode framework here applies broadly, but the tools, examples, and tradeoffs for learning use cases are different enough to deserve their own guide.
+**A note on scope.** This guide is about *coding* — writing, editing, refactoring, and debugging software. Students and engineers also use AI tools heavily for *learning* tasks: explaining concepts, summarizing literature, generating practice problems, study quizzes, mnemonics, working through homework, finding the right vocabulary for a half-remembered idea. The web-chat-vs-in-project framing here applies broadly, but the tools, examples, and tradeoffs for learning use cases are different enough to deserve their own guide.

 ## Sections

 | # | Topic | Description |
 |---|-------|-------------|
-| [01](01-three-modes/) | **Three modes** | Web chat, in-editor, and agentic. When to use each one and the heuristics for choosing. |
+| [01](01-two-worlds/) | **Two worlds** | Web chat versus tools that live with your code. Why the second is where coding work belongs, and the autocomplete/chat/agent spectrum within it. |
 | [02](02-errors-and-logs/) | **Errors and logs** | The canonical copy-paste case. How to frame what you paste so the assistant can actually help. |
-| [03](03-in-editor-workflow/) | **In-editor workflow** | Autocomplete, inline edit, "explain this," refactor. Patterns that make the editor extension worth its slot. |
-| [04](04-conversations/) | **Conversations** | Multi-turn design discussions, managing context, when to start a fresh chat. |
+| [03](03-autocomplete/) | **Autocomplete** | Ghost-text suggestions as you type. What it's good for, the traps (especially in verification code), and when to escalate. |
+| [04](04-conversations/) | **Conversations** | Multi-turn design discussions in the in-project chat or a web chat, managing context, and when to start a fresh chat. |
 | [05](05-agentic-workflow/) | **Agentic workflow** | What agentic tools (Claude Code, Cursor agent, Microsoft Copilot agent mode) actually do, and how to supervise them. |
 | [06](06-verifying-and-citing/) | **Verifying and citing** | Reviewing AI output for hallucinations and silent errors. Privacy and IP of what you paste. Attribution in academic and professional work. |
-| [07](07-local-models/) | **Using local models** | Local models as a cross-cutting alternative — privacy, cost, offline operation. Which tools support local in each of the three modes, and where the capability gap to cloud still matters. |
+| [07](07-local-models/) | **Using local models** | Local models as a cross-cutting alternative — privacy, cost, offline operation. Which tools support local across the autocomplete/chat/agent spectrum, and where the capability gap to cloud still matters. |

 ## Who this is for

@@ -25,11 +25,11 @@ Students and practicing engineers who are already using AI assistants but want t
 ## Prerequisites

 - A working development setup (editor, terminal, version control). See [computing-setup](https://lem.che.udel.edu/git/furst/computing-setup) and [cli-walkthrough](https://lem.che.udel.edu/git/furst/cli-walkthrough) for the underlying skills.
-- Access to at least one AI tool. The examples use Claude and ChatGPT in chat form, and GitHub Copilot / Claude / Codeium / Microsoft Copilot interchangeably as editor extensions. University-provided access (e.g., Microsoft Copilot or Gemini through institutional agreements) works equally well for nearly everything covered here.
+- Access to at least one AI tool. The examples use Claude and ChatGPT in web chat form, Claude Code or Cursor as in-project chat / agentic tools, and GitHub Copilot / Codeium / Microsoft Copilot interchangeably for autocomplete. University-provided access (e.g., Microsoft Copilot or Gemini through institutional agreements) works equally well for nearly everything covered here.

 ## A note on tools and dates

-Tool capabilities, pricing, and policies change frequently. Where this guide names a specific feature ("Cursor's agent mode," "Claude Code"), the description reflects what those tools did as of the first half of 2026. The underlying patterns, inlcuding copy-paste versus in-editor versus agentic AI, are durable. Remember to treat any tool-specific advice as illustrative.
+Tool capabilities, pricing, and policies change frequently. Where this guide names a specific feature ("Cursor's agent mode," "Claude Code"), the description reflects what those tools did as of the first half of 2026. The underlying patterns — web chat versus tools that live with your code, and the autocomplete/chat/agent spectrum within the latter — are durable. Treat any tool-specific advice as illustrative.

 ## License