diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..9fc9e21 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +CHANGES.md diff --git a/01-two-worlds/README.md b/01-two-worlds/README.md index 767d708..f441292 100644 --- a/01-two-worlds/README.md +++ b/01-two-worlds/README.md @@ -2,12 +2,12 @@ ## Key idea -Most people meet AI assistance through web chat: open a browser tab, paste in a problem, copy the answer back. That works for one-shot questions, but it is the wrong tool for actually writing code. The goal of this guide is to move you off the copy-paste habit and onto tools that live with your code. +Most people meet AI assistance through web chat. They open a browser tab, paste in a problem, copy the answer back. That works for one-shot questions, but it is the wrong tool for actually writing code. The goal of this guide is to move you off the copy-paste habit and onto tools that live with your code. -There are really only two worlds worth distinguishing: +There are only two worlds worth distinguishing: -1. **Web chat** — a browser tab. The AI has no awareness of your project. -2. **In your editor or terminal** — the AI lives where your code does. It can read your files, change them directly, and often run commands too. +1. **Web chat.** A browser tab or standalone app. The AI has no awareness of your project. +2. **In your editor or terminal.** The AI lives where your code does. It can read your files, change them directly, and often run commands too. The first is useful for explanation, interpretation, and sometimes planning. The second is what you want to use for any actual coding work. @@ -30,12 +30,12 @@ A browser- or app-based conversation with a model. 
You type, paste, or drag cont - One-shot interpretation: *explain this error, what does this log mean, what does this regex match* - Multi-turn design discussion: *"I'm choosing between approach A and B, what should I think about?"* - Non-code work: drafting documentation, writing commit messages, explaining a concept -- Content you do not want the AI to "live in" — a snippet from a paper, output from a server you don't own, a script from an unfamiliar repo +- Content you do not want the AI to "live in," such as a snippet from a paper, output from a server you don't own, or a script from an unfamiliar repo **Weaknesses:** - It's disconnected from your project. The model is stateless and has no idea what files exist, what your conventions are, or what you changed five minutes ago, unless you paste it in. -- Round-trip friction: copy from terminal → paste into chat → wait → read → copy answer → paste back. Fine for one-shot, painful for iterative editing. +- Round-trip friction: you copy from the terminal, paste into chat, wait, read, copy the answer, and paste back into your editor. Fine for one-shot, painful for iterative editing. - You have to remember to bring the relevant context with you every time. **The trap:** because web chat was the first interface most people learned, it becomes their default. If you find yourself pasting code back and forth between a browser tab and your editor more than a couple of times in a session, you are using the wrong tool. @@ -72,7 +72,7 @@ The line between (2) and (3) is fuzzy, and that's fine. A chat with an in-projec A fourth pattern, central to the Copilot/Cursor era of 2023–2024, is the **inline edit**: highlight a block of code, press a hotkey (Cmd+K in Cursor, "Edit with Copilot" in VS Code), type *"make this async"* or *"add error handling,"* and a diff appears in place. There is no chat, no agent loop, just one selection, one instruction, and one diff. 
-Inline edits emerged in 2023 alongside instruction-tuned LLMs (ChatGPT, GPT-4) — the first models that could reliably take a natural-language instruction and produce a corresponding code transformation. They sat between the earlier completion-only era (autocomplete, powered by Codex and similar) and the agentic loops that followed. As tools became agentic, the pattern has faded: you can do the same thing by asking the in-project chat *"switch the function I have highlighted to async,"* and the result is the same diff. Newer users may skip the inline-edit hotkey entirely. We mention it so you recognize what older tutorials are describing, not as a workflow you need to learn from scratch. +Inline edits emerged in 2023 alongside instruction-tuned LLMs (ChatGPT, GPT-4), which were the first models that could reliably take a natural-language instruction and produce a corresponding code transformation. They existed between the earlier completion-only era (autocomplete, powered by Codex and similar) and the agentic loops that followed. As tools became agentic, the pattern has faded. You can do the same thing by asking the in-project chat *"switch the function I have highlighted to async,"* and the result is the same. Newer users may skip the inline-edit hotkey entirely. We mention it so you recognize what older tutorials are describing, not as a workflow you need to learn from scratch. ## How to choose @@ -80,8 +80,8 @@ Inline edits emerged in 2023 alongside instruction-tuned LLMs (ChatGPT, GPT-4) Two questions cover almost every case: 1. **Should the answer be words I'll read, or code that should land in a file?** - - Words → either world works - - Code → use world 2; world 1 forces a copy-paste loop + - If the answer is words, either approach works + - If the answer is code, use world 2 (world 1 forces a copy-paste loop) 2. 
**Do I want the AI to be able to act on this content?** - Yes, then use world 2 - No (sensitive snippet, untrusted code, output from a server you don't own), then world 1 is safer because the tool isn't wired to your filesystem @@ -93,12 +93,11 @@ Two questions cover almost every case: | Explain an error, parse a log, interpret some output | Either world | Words-out, no edit needed; pick whichever is open | | Anything that should result in code landing in a file | World 2 | Removes the copy-paste round trip | | A targeted edit you can describe in one sentence | In-project chat (world 2) | Fast, low-overhead, you read the diff before accepting | -| Multi-step work — cross-file changes, run-tests-and-fix loops, "explore the project and then change it consistently" | Delegated agent (world 2) | The model owns the sequence; you set the goal and review the end state | +| Multi-step work, such as cross-file changes, run-tests-and-fix loops, or "explore the project and then change it consistently" | Delegated agent (world 2) | The model owns the sequence; you set the goal and review the end state | | Reviewing or thinking through code you do not want the AI to act on | Web chat (world 1) | The tool can read but not edit | | Deciding between two approaches; talking through a design | Either world | Conversation UX is what matters | - -## Two principles underneath +In summary, remember these two things: **Match the mode to the output target.** If the answer should *be code in a file*, use a tool that can put it there. If the answer should be a conversation, either approach works. diff --git a/02-errors-and-logs/README.md b/02-errors-and-logs/README.md index 8fe9f45..2b2970b 100644 --- a/02-errors-and-logs/README.md +++ b/02-errors-and-logs/README.md @@ -2,14 +2,14 @@ ## Key idea -Errors, stack traces, and log output are *exactly* the kind of thing chat models excel at parsing. A transformer's attention is built for finding the relevant token among noise. Use it! 
+Errors, stack traces, and log output are *exactly* the kind of thing chat models excel at parsing. A transformer's attention is built for finding the relevant token among machine-generated noise. Use it! -Errors and logs are the canonical copy-paste use case. The trick is pasting *enough* context but not *too much*, and being explicit about what you were trying to do. +Errors and logs are the canonical copy-paste use case for world 1. The trick is pasting *enough* context but not *too much*, and being explicit about what you were trying to do. ## Key goals - Recognize when a chat is the right tool for an error or log -- Paste the right amount of context — neither too little nor too much +- Paste the right amount of context (neither too little nor too much) - Frame the paste so the model can give a useful answer, not a guess - Use the answer as a starting point, not gospel @@ -17,16 +17,16 @@ Errors and logs are the canonical copy-paste use case. The trick is pasting *eno ## Why chat is the right mode here -A typical Python traceback is 10–40 lines of mostly-noise with one or two lines of actual signal. A 500-line server log has maybe three lines that matter. Two reasons chat works here: +A typical Python traceback is 10–40 lines of mostly noise with one or two lines of actual signal. A 500-line server log has maybe three lines that matter. Two reasons chat works here: 1. **The output is words, not code-in-place.** You are looking for an explanation or at least a pointer, not an edit to a file. Chat is the stronger choice. 2. **The input is self-contained.** You can paste the whole error and the model can reason about it without needing your project layout, history, or build state. -An in-project chat can handle errors well too — and is often the right tool, because it can see the file that threw the error without you pasting it. 
For a long traceback or a multi-page log that doesn't fit cleanly in your editor's chat panel, or when the failure involves output from a server or system that isn't part of your project, a web chat's room to expand and your ability to copy-paste freely make it the better venue. +An in-project chat can handle errors well too — and is often the right tool, because it can see the file that threw the error without you pasting it. For a long traceback or a multi-page log that doesn't fit cleanly in your editor's chat panel, or when the failure involves output from a server or system that isn't part of your project, a web chat's room to expand and your ability to copy-paste freely make it the better choice. ## What to paste -Three rules: +A good practice is to follow these three simple rules: ### 1. Paste the whole relevant block, not just the last line @@ -51,11 +51,17 @@ ValueError: could not convert string to float: 'N/A' The second version makes the cause (the string `'N/A'` in the data) obvious. The first leaves the model guessing. -### 2. Trim noise that doesn't help +### 2. Trim only when you have to -A 2000-line log with five relevant lines is harder for the model (and you) than 50 lines centered on the relevant region. Use `grep`, `tail`, or your eyes to narrow it down. Include enough surrounding context that the model can see the lead-up to the failure, but cut the parts that are clearly unrelated (startup banners, unrelated services, repeated heartbeat lines). +The advice you'll often hear is "trim aggressively before pasting." In practice, modern chat models are very good at finding the few relevant lines in a noisy log — that's exactly what attention is for. For a typical few-hundred-line log, pasting the whole thing usually works fine, and trimming buys you little. -If you do not know which part is relevant, paste a reasonable chunk and *say so*: "I think the failure is somewhere in here but I'm not sure where to look." 
+Trimming is worth the effort in three specific cases: + +1. **The log is large enough to bump against context limits** (tens of thousands of lines, or a multi-megabyte file). Here you have no choice — use `grep`, `tail`, or your editor to cut it down. +2. **The noise is the *kind* that misleads** — unrelated red `ERROR` lines from a different service, deprecation warnings that look like the failure but aren't, repeated retries that bury the original cause. A model will sometimes latch onto the loudest-looking line rather than the actual one. +3. **You want a faster, cheaper turn.** Less input means less to read and quicker iteration. + +Outside those cases, don't agonize over it. If you do not know which part is relevant, paste a reasonable chunk and *say so*: "I think the failure is somewhere in here but I'm not sure where to look." The model will find it. ### 3. Include the command or code that triggered it @@ -88,18 +94,18 @@ If you've already tried things, say so. *"I tried `pd.read_csv(..., na_values='N ## What to do with the answer -The model's first answer to an error is often *plausible but wrong about the root cause*. It will identify the right neighborhood — "there is a non-numeric value in your column" — but may guess wrong about the specific row or the fix that works in your case. Treat the answer as a **search query**, not a firm result: +The model's first answer to an error is often *plausible but wrong about the root cause*. It will identify the right neighborhood ("there is a non-numeric value in your column") but may guess wrong about the specific row or the fix that works in your case. Treat the answer as a **search query**, not a firm result: - Does the diagnosis match what you see in your data or code? - Is the suggested fix one you can verify quickly (one line, one test)? - If a suggested fix doesn't work, *say so* in the next turn and paste the new output. The second answer is usually much better because the first one ruled out a possibility. 
-The biggest mistake students make with chat-based error help is **treating the first response as authoritative**. The right mental model is: the model gives you a focused hypothesis, but you do the verification. +The biggest mistake we make with chat-based error help is treating the first response as *authoritative*. The right mental model is: the model gives you a focused hypothesis, but you do the verification. ## Common pitfalls - **Pasting only the last line of the traceback.** The model can guess, but it's a guess. Paste the whole traceback. -- **Pasting a 2000-line log unfiltered.** The model wastes attention on irrelevant material and the answer suffers. Trim. +- **Pasting a log so large it doesn't fit, or one whose noise actively misleads.** A few hundred noisy lines is fine; tens of thousands of lines, or a log full of unrelated-but-loud `ERROR` entries, is when trimming earns its keep. - **Pasting code with no error message.** "Why doesn't this work?" without the actual failure makes the model invent failure modes. Always run the code and paste what happened. - **Pasting proprietary code into a public chat.** See [section 06](../06-verifying-and-citing/) — what you paste, the service sees and may log. Match the chat to the sensitivity of the content. - **Not iterating.** If the first answer is wrong, the second is usually better. Treat the conversation as a debugging session, not a single oracle query. diff --git a/03-autocomplete/README.md b/03-autocomplete/README.md index 5922252..7472e7a 100644 --- a/03-autocomplete/README.md +++ b/03-autocomplete/README.md @@ -1,8 +1,10 @@ # Autocomplete +> **Heads up:** Autocomplete is the least important of the three in-editor modes covered in this guide, and its share of day-to-day AI-assisted coding has shrunk as in-project chat and agentic workflows have taken over. 
If you don't currently use ghost-text autocomplete, you can skim this section and move on to [section 04](../04-conversations/) without missing anything that later sections depend on. The traps below (especially around verification) still matter if you *do* use it. + ## Key idea -Autocomplete is the lowest-friction way to work with an AI assistant: ghost text appears as you type, you accept with Tab or keep typing to ignore. It is the *one* form of AI assistance that does not require you to write a prompt — the act of typing is the prompt. +Autocomplete is the lowest-friction way to work with an AI assistant: ghost text appears as you type, you accept with Tab or keep typing to ignore. It is the *one* form of AI assistance that does not require you to write a prompt. The act of typing is the prompt. That cheapness is its strength and its trap. Because accepting a suggestion is a single keystroke, it is easy to accept code you did not actually read. The skill of using autocomplete well is almost entirely about *what you accept* and *what you reject*, not about how you invoke it. @@ -21,7 +23,7 @@ As you type, the extension sends a window of context (the current file, usually **Examples (early 2026):** GitHub Copilot, Codeium, Cursor Tab, Continue.dev, Microsoft Copilot in VS Code. Most agentic tools (Claude Code, Cline) do *not* provide ghost-text autocomplete — they're optimized for chat-and-agent interaction. If you want autocomplete and an agentic tool, you generally run two extensions side by side. -The model is small and fast on purpose; the latency budget is the time between your keystrokes, which is short. Don't expect the depth of reasoning you get from a chat-with-a-frontier-model — autocomplete is pattern completion, not analysis. +The model is small and fast on purpose; the latency budget is the time between your keystrokes, which is short. Don't expect the depth of reasoning you get from a chat-with-a-frontier-model. 
Autocomplete is pattern completion, not analysis. ## Where autocomplete shines @@ -35,7 +37,7 @@ The common thread: the model has all the information it needs *in the few lines ## Where autocomplete fails -- **Anything that requires understanding a wider context.** If the right answer depends on what a function in another file does, autocomplete will guess — and the guess looks plausible. +- **Anything that requires understanding a wider context.** If the right answer depends on what a function in another file does, autocomplete will guess. The guess looks plausible. - **Novel logic.** If you are doing something the codebase has not done before, the model will pattern-match to something *similar* and produce confident-looking code that is subtly wrong. - **Anything where "correct" is non-obvious from the surface.** Off-by-one indices, edge cases in numerical code, units, sign conventions, the precise contract of an API you are calling. @@ -44,13 +46,13 @@ The common thread: the model has all the information it needs *in the few lines ### Read the suggestion before accepting it -The cost of accepting wrong code that *looks* right is high. You will find the bug an hour later in a debugger when you could have caught it in 200 milliseconds. If a suggestion is more than a few lines, the right move is to read it, decide, and either accept or rewrite — don't Tab-and-pray. +The cost of accepting wrong code that *looks* right is high. You will find the bug an hour later in a debugger when you could have caught it in 200 milliseconds. If a suggestion is more than a few lines, the right move is to read it, decide, and either accept or rewrite. Don't Tab-and-pray. A useful threshold: if the suggestion is longer than the comment or signature that triggered it, slow down. ### Do not autocomplete your verification -This is the single most damaging autocomplete failure mode in scientific and engineering code. 
+This is the most damaging autocomplete failure mode in scientific and engineering code. Whether your verification is a formal unit test, a sanity-check script, a comparison against a known answer, or a hand-checked numerical result, it is supposed to be *your* expression of what the code should do. If the model writes the check based on the code, the check passes by construction and confirms nothing. @@ -75,18 +77,18 @@ Autocomplete is one rung of a ladder. Step off when: ## A historical note: inline edits -Older guides and tutorials (and the 2024-era marketing for Copilot and Cursor) put **inline edit** — highlight a block, press Cmd+K, type *"make this async"* — alongside autocomplete as the second main in-editor interaction. The pattern emerged in 2023 alongside instruction-tuned LLMs (ChatGPT, GPT-4), which were the first models that could reliably turn a natural-language instruction into a code transformation. It sat between the earlier completion-only era (autocomplete, powered by Codex and similar) and the agentic loops that followed. The hotkey still exists in most tools, but the pattern is fading because in-project chat does the same job with better context. +Older guides and tutorials (and the 2024-era marketing for Copilot and Cursor) put **inline edit** (highlight a block, press Cmd+K, and type *"make this async"*) alongside autocomplete as the second main in-editor interaction. The pattern emerged in 2023 alongside instruction-tuned LLMs (ChatGPT, GPT-4), which were the first models that could reliably turn a natural-language instruction into a code transformation. It sat between the earlier completion-only era (autocomplete, powered by Codex and similar) and the agentic loops that followed. The hotkey still exists in most tools, but the pattern is fading because in-project chat does the same job with better context. 
-If you have *highlighted code and a one-sentence instruction*, an inline edit and an in-project chat message produce essentially the same diff. The chat panel just makes it easier to follow up, ask why, or refine. We don't teach inline edit as a primary workflow here. If your tool of choice still leans on it, the same one-sentence-spec discipline applies. +If you have *highlighted code and a one-sentence instruction*, an inline edit and an in-project chat message produce essentially the same change. The chat panel just makes it easier to follow up, ask why, or refine it. We don't teach inline edit as a primary workflow here. If your tool of choice still leans on it, the same one-sentence-spec discipline applies. ## Habits that survive tool changes -The tools will keep changing. These habits do not: +The tools will keep changing, but these are still good habits to follow: - **Read every accepted suggestion.** Even short ones. Especially short ones in numerical code, where a sign flip looks the same as the right answer. -- **Keep the cycle tight.** If autocomplete is producing more than ~10 lines at a time for you, you are no longer reviewing in real time — you are reading code the AI wrote, which is a different mode. -- **Use version control as a safety net.** Commit before a stretch of heavy AI-assisted coding. `git diff` is the last line of defense. +- **Keep the cycle tight.** If autocomplete is producing more than ~10 lines at a time for you, you are no longer reviewing in real time. You are reading code the AI wrote, which is a different mode. +- **Use version control as a safety net.** Commit before a stretch of heavy AI-assisted coding. `git diff` is your fallback check. - **Verify with your own checks.** The check has to come from you, not from the AI that wrote the code. - **Be willing to turn it off.** Autocomplete is the right tool sometimes and the wrong tool other times. Toggling it off for a session where you want to think is a real productivity move. 
diff --git a/04-conversations/README.md b/04-conversations/README.md index 3ea10ec..789ba11 100644 --- a/04-conversations/README.md +++ b/04-conversations/README.md @@ -4,9 +4,9 @@ A chat is at its best when you treat it as a *conversation*, not a search bar. Multi-turn discussions that include design tradeoffs, exploring an unfamiliar library, or talking through a problem that you can't quite articulate are where conversational interactions are a better approach than single-shot edits or fire-and-forget agents. -The patterns in this section apply in two areas: the **in-project chat** panel inside your editor or CLI (Claude Code's panel, Cursor's chat, Continue.dev's panel) and a **web chat** in a browser tab (ChatGPT, Claude.ai, Gemini). While web chat is where most of us get our start with AI tools, you should probably default to the in-project chat, since it can see your files and edit them, while stepping out to a web chat only when the discussion does not belong in the project (it includes sensitive content, non-code context, or a record you want to share with collaborators). The skill is steering the conversation so it stays useful. +The patterns in this section apply in two areas: the **in-project chat** panel inside your editor or CLI (Claude Code's panel, Cursor's chat, Continue.dev's panel) and a **web chat** in a browser tab (ChatGPT, Claude.ai, Gemini). While web chat is where most of us get our start with AI tools, you should probably default to the in-project chat, since it can see your files and edit them, while stepping out to a web chat only when the discussion does not belong in the project (it includes sensitive content, non-code context, or a record you want to share with collaborators). The main skill to develop is steering the conversation so it stays useful. -Another way to think about a chat with a capable model is as a kind of programming in natural language. 
In the chat, you specify what you want, the model executes, you observe the output, and you refine the specification. Effective programming skills, including clarity, decomposition, anticipating ambiguity, and iterating, are the same ones that make a chat user effective. Until LLMs, natural language was almost never an executable specification, and this is one of the more remarkable shifts the technology has produced. This very shift explains why "prompt engineering" became a buzzword. It's not magic words or incantations, it's specification quality. +Another way to think about a chat with a capable model is as a kind of programming in *natural language*. In the chat, you specify what you want, the model executes, you observe the output, and you refine the specification. Effective programming skills, including clarity, decomposition, anticipating ambiguity, and iterating, are the same ones that make a chat user effective. Until LLMs, natural language was almost never an executable specification, and this is one of the more remarkable shifts the technology has produced. This very shift explains why "prompt engineering" became a buzzword. It's not magic words or incantations that matter; it's the quality of the specification. ## Key goals @@ -27,7 +27,7 @@ The heuristic in [section 01](../01-two-worlds/) tells you to reach for chat whe - **Talking through a problem you can't quite name.** *"Something feels wrong about this architecture but I can't put my finger on it. Here's the structure..."* The act of describing it often clarifies your own thinking, and the model's questions back can probe weak spots. - **Learning a new domain or library.** Conversations let you ask the dumb questions you'd be embarrassed to ask a colleague. -If you find yourself wanting the model to *produce a specific edit*, you have drifted out of pure conversation territory. In an in-project chat, though, that drift is basically harmless. 
The same panel can just go make the edit, and you are now in an agentic interaction mode ([section 05](../05-agentic-workflow/)). In a web chat, though, the same drift is relatively expensive because you have to copy-paste the result back. If the conversation looks like it's heading toward edits, select the in-project chat from the start! +If you find yourself wanting the model to *produce a specific edit*, you have drifted out of pure conversation territory. In an in-project chat, though, that drift is harmless. The same panel can just go make the edit, and you are now in an agentic interaction mode ([section 05](../05-agentic-workflow/)). In a web chat, though, the same drift is relatively expensive because you have to copy-paste the result back. If the conversation looks like it's heading toward edits, select the in-project chat from the start! ## In-project chat versus web chat: when to step out @@ -86,7 +86,7 @@ Conversations accumulate context that can both help and hurt. Start a new chat w - **The conversation went sideways early.** If the model misunderstood your first message and the next several turns were spent correcting course, the corrected understanding is buried under that wrong understanding. A fresh start with a better first message is often faster. - **The chat has become long enough that important details from early turns are out of recent attention.** Most chat interfaces handle this gracefully, but very long chats can have the model "forget" something you said ten turns ago. Restating it in a fresh chat is sometimes easier than fighting it in the existing one. -There is no virtue in keeping a chat going longer than it needs to. Open a new one freely. Conversations are cheap. When you're deciding "should I keep this conversation going or start fresh?" you should be biased toward starting fresh. +There is no virtue in keeping a chat going longer than it needs to. 
Open a new one freely (in Claude Code, the `/clear` command; in ChatGPT or Claude.ai, the *New chat* button in the sidebar; in Cursor, the *+* at the top of the chat panel). Conversations are cheap. When you're deciding "should I keep this conversation going or start fresh?" you should be biased toward starting fresh. ## Carrying context across sessions @@ -102,14 +102,14 @@ The web-chat equivalent is your account history, where the *conversation itself* ## Patterns that work -- **Compare and contrast.** *"What are the practical differences between pandas `merge` and `join`? When would I one or the other?"* Models are good at structured comparisons. +- **Compare and contrast.** *"What are the practical differences between pandas `merge` and `join`? When would I use one or the other?"* Models are good at structured comparisons. - **Devil's advocate.** *"I'm planning to use approach X. What would make that a bad choice? What's the strongest argument against it?"* Inverts the default "let me help you do what you said" tendency. - **Explain to a target audience.** *"Explain finite-volume methods to me as if I have a strong finite-difference background but no CFD experience."* The audience framing tightens the level of abstraction: the model can skip discretization basics and focus on what's actually new (flux conservation across control volumes, dealing with unstructured meshes). - **Critique my draft.** *"Here is my approach / commit message / README. What's confusing or weak?"* Models are surprisingly useful as a first-pass reviewer. - **Walk me through.** *"Walk me through what happens when I call `requests.get(...)`. Don't skip the boring parts."* Good for building mental models of libraries you use but don't fully understand. 
- **Iterate on the prompt itself.** *"What would I have to add to my question to get a better answer?"* or *"Help me rewrite this prompt to be more specific."* The model is often perceptive about its own failure modes, and the resulting prompt is sharper than what you started with. Especially valuable when you are crafting a prompt you will reuse, such as a template, a system prompt, or an agent's instruction. -## Patterns that don't +## Patterns that don't work well - **Asking for "the best" with no criteria.** *"What's the best Python plotting library?"* gets you a generic matplotlib-vs-seaborn-vs-plotly survey. Add criteria like *"for publication-quality figures with mathematical annotations, where I need fine control over axes and tick formatting"* and the answer becomes more useful. - **Long preamble before the question.** Models read top-down, but the actual question is what they answer. If you bury it in paragraph three, the model may answer paragraph one. diff --git a/05-agentic-workflow/README.md b/05-agentic-workflow/README.md index 381176b..970cd05 100644 --- a/05-agentic-workflow/README.md +++ b/05-agentic-workflow/README.md @@ -6,6 +6,8 @@ An agentic tool is an AI that takes actions on its own (reading files, running c This section is about *using* agentic tools as an engineer or scientist (for modeling, data analysis, simulations, or coursework), not building production software for end users. For how tool use works under the hood, see the [llm-workshop](https://lem.che.udel.edu/git/furst/llm-workshop) section on tool use and agentic systems. +> **A note on overlap with [section 04](../04-conversations/):** the boundary between conversation and agentic work is fuzzy in practice. The same in-project chat panel becomes an agent the moment you give it a multi-step goal it executes on its own. The distinction is a matter of behavior, not a separate tool you launch, and you will routinely cross between the two modes within a single session. 
This section focuses on what changes once the model is taking actions, but expect the framing to apply to many of your existing chat sessions too. + ## Key goals - Recognize when you have moved from in-project chat into agentic territory, and what changes when you do @@ -18,7 +20,7 @@ This section is about *using* agentic tools as an engineer or scientist (for mod ## What an agent actually does -An agentic tool wraps a chat model in a *loop* that lets it take actions in your environment. In practice, the same in-project chat panel you use for a one-shot question ([section 04](../04-conversations/)) becomes an agent the moment you give it a multi-step goal. There is no separate "agent app" to launch. One step in the loop: +An agentic tool wraps a chat model in a *loop* that lets it take actions in your environment. One step in the loop: 1. The model receives your goal and the current state (files, terminal output, etc.) 2. The model decides on the next action: read a file, run a command, write an edit @@ -87,7 +89,7 @@ Rule of thumb: **destructive or remote-affecting actions deserve confirmation.** ### Working directory and damage control -An agent pointed at a fresh sandbox can experiment freely. An agent pointed at your home directory can do real damage. Before starting: +An agent pointed at a fresh sandbox can experiment freely. An agent pointed at your home directory can do real damage! Before starting: - Be sure you are in the right directory - Have a clean git state (or know what's uncommitted) so you can see what the agent changed diff --git a/06-verifying-and-citing/README.md b/06-verifying-and-citing/README.md index e3e18e2..3a33fc4 100644 --- a/06-verifying-and-citing/README.md +++ b/06-verifying-and-citing/README.md @@ -15,14 +15,14 @@ AI assistants are useful because they generate plausible output fast. They are * ## Part 1: Verifying -> **A note on terminology.** This section uses "check" and "test" with different meanings. 
A **unit test** is a specific software-development practice — a small piece of code (often written with a framework like `pytest`) that exercises a function with known inputs and confirms the output matches an expected result. Tests are automated and reusable, and they pay off when code will be edited many times by many people. A **check**, more broadly, is anything that verifies the code does what you intended: running on a known limit case, comparing to a published value, plotting and inspecting the shape, or hand-calculating a small input. Formal unit tests are one form of check, but for scientific code written for a single project they are often not the most natural form. Whenever this guide says "verification" or "check," any of these forms count; "test" appears only where an automated test is the right tool. +> **A note on terminology.** This section uses "check" and "test" with different meanings. A **unit test** is a specific software-development practice, which is a small piece of code (often written with a framework like `pytest`) that exercises a function with known inputs and confirms the output matches an expected result. Tests are automated and reusable, and they pay off when code will be edited many times by many people. A **check**, more broadly, is anything that verifies the code does what you intended: running on a known limit case, comparing to a published value, plotting and inspecting the shape, or hand-calculating a small input. Formal unit tests are one form of check, but for scientific code written for a single project they are often not the most natural form. Whenever this guide says "verification" or "check," any of these forms count; "test" appears only where an automated test is the right tool. ### Why verification matters Hallucinations in AI-assisted coding fall into two broad categories: -1. **Loud hallucinations** — code that fails to compile or run. Easy to catch; the tool tells you. -2. 
**Quiet hallucinations** — code that runs and produces a result, but the result is wrong. These are the dangerous ones. +1. **Loud hallucinations.** Code that fails to compile or run. Easy to catch; the tool tells you. +2. **Quiet hallucinations.** Code that runs and produces a result, but the result is wrong. These are the dangerous ones. There is often a familiar pattern: a function that uses an API method that doesn't exist, a regex that handles all the cases you mentioned but fails silently on an edge case you didn't think to mention, a math expression that is dimensionally inconsistent but produces a number anyway. The output *looks like* an answer, so you accept it. Hours or weeks later, you discover the silent failure. @@ -81,10 +81,10 @@ For most academic work, the cloud-service baseline is the right mental model. Yo - **Restricted research data.** Anything covered by your IRB protocol, your data-use agreement with a collaborator or industrial partner, or institutional policies around HIPAA, FERPA, export controls, or similar regimes. If a category of data is restricted on your computer, it is restricted in your chat too. - **Unpublished work that isn't yours.** Collaborator drafts, manuscripts under review, code from a lab that hasn't been released. You don't own the right to share these regardless of how you happen to be sharing them. - **NDA-covered or proprietary material.** From an industrial collaboration, an internship, an advisor's industry consulting work. Check the specific agreement. -- **Personally identifying information.** Participant data, survey responses, names attached to outcomes — even when "anonymized for internal use." If you need help analyzing it, paste a synthetic example with the same shape rather than the real thing. +- **Personally identifying information.** Participant data, survey responses, names attached to outcomes, even when "anonymized for internal use." 
If you need help analyzing it, paste a synthetic example with the same shape rather than the real thing. - **Credentials, API keys, internal URLs.** Easy to leak by accident when pasting config files or logs. -For most students most of the time who are dealing with coursework, classroom exercises, your own scripts, public datasets, open-source libraries, and drafts of your own writing, the answer is "the chat is fine, same risk as email." Graduate students and undergradute reseearchers working with sensitive research data are the most common case for the categories above. If that's you, take the agreements that govern your data seriously, and when in doubt, ask your advisor or your IRB. +For most students most of the time who are dealing with coursework, classroom exercises, your own scripts, public datasets, open-source libraries, and drafts of your own writing, the answer is "the chat is fine, same risk as email." Graduate students and undergraduate researchers working with sensitive research data are the most common case for the categories above. If that's you, take the agreements that govern your data seriously, and when in doubt, ask your advisor or your IRB. ### A practical checklist @@ -118,7 +118,7 @@ Two complementary reasons: The realistic bar is not "note every Copilot autocomplete." That standard is impossible to meet in practice, and treating it as required is part of why disclosure norms feel unrealistic. A more useful distinction: -- **Background assistance** that shaped *how* you worked, such as autocomplete, syntax help, name suggestions, quick lookups, debugging conversations. Usually no disclosure is needed unless your venue's policy is specific. +- **Background assistance** that shaped *how* you worked, such as autocomplete, syntax help, name suggestions, quick lookups, debugging conversations. Usually no disclosure is needed unless your venue's policy is specific. 
- **Substantive contribution** that shaped *what* you produced, such as AI drafted a section, generated significant chunks of code that you reviewed and accepted, planned the analytical approach, wrote the literature summary, debugged a critical reasoning step. These likely warrant a brief note. - **Substituted work** where AI produced something you submitted as your own without meaningful engagement, including running an assignment through ChatGPT and turning in the output. This is the case policies are most worried about, and it sits closer to academic dishonesty than to the disclosure question. @@ -136,7 +136,7 @@ The form depends on context: A useful pattern when you do disclose is to state three things: -1. **What tool you used** (specific model and version if available — "Claude Opus 4.7," "ChatGPT-4o," "GitHub Copilot") +1. **What tool you used** (specific model and version if available, such as "Claude Opus 4.7," "ChatGPT-4o," "GitHub Copilot") 2. **What you used it for** ("debugging error messages," "drafting the introduction," "generating boilerplate code") 3. **What you did with the output** ("reviewed and edited," "used as a starting point and rewrote," "used as-is after verification") diff --git a/07-local-models/README.md b/07-local-models/README.md index 9bdae51..3e55496 100644 --- a/07-local-models/README.md +++ b/07-local-models/README.md @@ -2,9 +2,9 @@ ## Key idea -You do not have to use a frontier cloud model to use AI in your work. A "local" model runs entirely on your own hardware: no API, no per-token cost, no data leaving the machine. Local models cut across every workflow we've covered — web chat, autocomplete, in-project chat, and agentic — rather than being a separate mode. The same workflow patterns apply; what changes is the tool that hosts the model and what you give up (and gain) by running it yourself. +You do not have to use a frontier cloud model to use AI in your work. 
A "local" model runs entirely on your own hardware with no API, no per-token cost, and no data leaving the machine. Local models cut across every workflow we've covered (web chat, autocomplete, in-project chat, and agentic) rather than being a separate mode. The same workflow patterns we've reviewed in this tutorial apply to local models. What changes is the tool that hosts the model and what you give up (and gain) by running it yourself. -This section is about local models as a *user* of AI coding tools. If you want to understand how local models work under the hood, train your own, or build the infrastructure around them, see the [llm-workshop](https://lem.che.udel.edu/git/furst/llm-workshop). +This section is about local models as a *user* of AI coding tools. If you want to understand how local models work under the hood, train your own, or build the infrastructure around them, see [llm-workshop](https://lem.che.udel.edu/git/furst/llm-workshop). ## Key goals @@ -17,39 +17,49 @@ This section is about local models as a *user* of AI coding tools. If you want t ## Why run a local model? -Five reasons, ordered by how often they matter in practice: +Here are six reasons to run a local model, ordered by how often they matter in practice: 1. **Privacy and IP.** Code, data, and prompts never leave your machine. This is the deciding factor for proprietary work, IRB-constrained research, employer-restricted code, and anything covered by an NDA or government contract. *What you don't send, the service can't see.* -2. **Cost.** No per-token billing. After the hardware cost (which you may already have paid for), inference is effectively free. For heavy use — long agentic sessions, batch processing — the savings add up quickly. +2. **Cost.** No per-token billing. After the hardware cost (which you may already have paid for), inference is effectively free. For heavy use (long agentic sessions, batch processing) the savings add up quickly. 3. 
**Offline operation.** Works on a plane, in a lab without internet, in a SCIF, on a remote field deployment. Cloud models simply don't. -4. **Control and reproducibility.** You pin a specific model version. It doesn't get retired, deprecated, or silently updated under you. Useful for reproducible research and long-lived pipelines. +4. **Control and reproducibility.** You pin a specific model version. It doesn't get retired, deprecated, or silently updated under you. This is useful for reproducible research and long-lived pipelines. 5. **Learning.** Running a model yourself forces you to understand what it is, what it can do, and where it breaks. This is a real benefit for engineers and researchers who plan to work with these systems. +6. **Future-proofing against vendor decisions.** Pricing structures, rate limits, terms of service, available model lineups, and the surrounding tools (CLIs, IDE extensions, SDKs) all change on the vendor's schedule, not yours. A workflow built around a local model is insulated from price hikes, deprecated APIs, retired models, regional availability changes, and the slow drift of vendor lock-in. This matters most for work you expect to maintain for years. -These are *also* the reasons people use cloud models for the opposite of each: convenience, no setup, always-current, no local hardware burden. +These are *also* the reasons people use cloud models for the opposite of each: performance, convenience, no setup, always-current, no local hardware burden, and someone else worrying about keeping the model up to date. ## Hardware reality -Local models are constrained by your hardware in a way cloud models are not. The dominant factor is **memory** — specifically VRAM on a GPU or unified memory on Apple Silicon. +Local models are constrained by your hardware in a way cloud models are not. The dominant factor is **memory**. Specifically, VRAM on a GPU or unified memory on Apple Silicon. 
-A rough sense of what runs comfortably where, as of early 2026:
+A rough sense of what runs comfortably where (snapshot as of early 2026; the specific models below will date within a year, but the size tiers will not):
 | Hardware | Practical model size | Example models |
 |---|---|---|
-| 8 GB RAM/VRAM | 1–3 B parameter models, heavily quantized | Gemma 2 2B, Phi 3 Mini |
-| 16 GB | 7–8 B at moderate quantization | Llama 3.1 8B, Qwen 2.5 Coder 7B |
-| 24–32 GB (high-end laptop GPU or Apple Silicon) | 13–32 B at moderate quantization | Qwen 2.5 Coder 32B, Mistral Small |
-| 48–64 GB (Mac Studio, server GPU) | 70 B class at heavy quantization | Llama 3.3 70B, DeepSeek Coder V2 |
-| 128 GB+ workstation | 70 B at lighter quantization, or multiple models | larger Qwen, Mixtral variants |
+| 8 GB RAM/VRAM | 1–4 B parameter models, heavily quantized | Gemma 4 `e4b`, Phi-4 mini, Qwen3 4B |
+| 16 GB | 7–14 B at moderate quantization | Qwen3 8B, Qwen3.5 9B, Gemma 3 12B |
+| 24–32 GB (high-end laptop GPU or Apple Silicon) | 13–32 B at moderate quantization | Qwen3.6 27B, Qwen3-Coder 30B, Mistral Small 3.2, Phi-4, Gemma 3 27B |
+| 48–64 GB (Mac Studio, server GPU) | 70 B class at heavy quantization, or smaller MoE | Llama 3.3 70B, Qwen3.6 35B, Qwen3-Coder 30B (lighter quantization) |
+| 128 GB+ workstation | 70 B at lighter quantization, MoE models, or multiple models in parallel | Llama 4 Scout (`16x17b`), DeepSeek V4 Flash, GLM-5.1, Qwen3 235B |
-**Quantization** (compressing model weights from 16-bit floats down to 4-bit or 5-bit integers) is what makes large models fit on consumer hardware. You trade a small amount of quality for a large amount of memory savings. Most local-model tools default to a sensible quantization.
+For an up-to-date view of what's available and how it ranks, treat the [Ollama model library](https://ollama.com/library) as the catalog and combine two kinds of signal: + +- **Benchmark aggregators** for the quantitative picture: [Artificial Analysis](https://artificialanalysis.ai/) (composite Intelligence Index), [LMArena](https://lmarena.ai/) (human-preference Elo, including a Code Arena), and [Vellum's open LLM leaderboard](https://www.vellum.ai/llm-leaderboard) (deliberately excludes saturated benchmarks like MMLU). +- **Practitioner signal** from [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/), which is where quantization quality, VRAM-fit reports, tokens/sec on specific GPUs, and reactions to new model drops actually surface. Treat the boards as the baseline and the subreddit as the field report. + +The original Hugging Face Open LLM Leaderboard was retired in 2025; its v1 and v2 result archives remain downloadable but no new models are scored against it. The table above will drift; those live sources track the frontier. + +A note on **licenses**, since "open weights" is not the same as "open source." Models are released under widely varying terms. GLM-5.1 ships under a plain MIT license with no field-of-use restrictions, which is the cleanest of the current frontier-tier open releases. Meta's Llama licenses include a usage-scale cap. Several research releases restrict commercial use. If you plan to deploy a model in a classroom, lab pipeline, or research-group setting, check the license on the model's Hugging Face or Ollama page before you build on it. + +**Quantization** (compressing model weights from 16-bit floats down to 4-bit or 5-bit integers) is what makes large models fit on consumer hardware. We trade a small amount of quality for a large amount of memory savings. Most local-model tools default to a sensible quantization. 
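As a rough back-of-the-envelope illustration of why quantization matters, the weight footprint of a model is approximately its parameter count times the bits stored per weight. The sketch below is an estimate under stated assumptions, not a measurement: the helper name `weight_memory_gb` is hypothetical, and it ignores the KV cache, activations, and the small per-block metadata that real quantization formats add, so the numbers are best read as lower bounds.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in decimal gigabytes.

    Assumption: every weight is stored at the same bit width, with no
    runtime overhead (KV cache, activations, quantization scales).
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Compare unquantized 16-bit weights with a 4-bit quantization.
    for size in (8, 32, 70):
        fp16 = weight_memory_gb(size, 16)
        q4 = weight_memory_gb(size, 4)
        print(f"{size:>3} B params: ~{fp16:.0f} GB at 16-bit, ~{q4:.0f} GB at 4-bit")
```

Under these assumptions a 70 B model needs roughly 140 GB for weights at 16-bit but roughly 35 GB at 4-bit, which is consistent with the 48–64 GB tier in the table once runtime overhead is added on top.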
If you took the time to fill out the spec table in [computing-setup section 01](https://lem.che.udel.edu/git/furst/computing-setup/src/branch/main/01-know-your-machine/), you already know what tier you're in. ## Local models across the workflow -The framing from [section 01](../01-two-worlds/) still applies — what changes is the host. Below, we walk through where local models fit in each kind of work. +The framing from [section 01](../01-two-worlds/) still applies. What changes is the host. Below, we walk through where local models fit in each kind of work. ### Local in *web-chat* style @@ -66,7 +76,7 @@ The Ollama-powered backends in particular are useful well beyond chat — most o ### Local for autocomplete and in-project chat -Several VS Code extensions support local models for autocomplete and side-panel chat. Notably, **GitHub Copilot, Microsoft Copilot, and the Claude (legacy) extension do not** — they require their vendor's cloud service. If you want a local model in your editor, you need a different extension. +Several VS Code extensions support local models for autocomplete and side-panel chat. Notably, **GitHub Copilot, Microsoft Copilot, and the Claude (legacy) extension do not**. They require their vendor's cloud service. If you want a local model in your editor, you need a different extension. | Extension | Notes | |---|---| @@ -94,14 +104,14 @@ Notable exclusions (as of early 2026): **Claude Code, Cursor agent mode, and Mic Frontier cloud models (Claude Opus, GPT-4o, Gemini Pro) are still better than local models at almost every coding task. Pretending otherwise sets you up for disappointment. Some honest framing: -- **For autocomplete and short suggestions**, a good local 7–13 B model (Qwen 2.5 Coder, DeepSeek Coder Lite, Codestral) is genuinely useful and the gap to cloud is small. 
+- **For autocomplete and short suggestions**, a good local 7–13 B model (Qwen3 8B, DeepSeek-Coder-V2 16B, Codestral) is genuinely useful and the gap with cloud models is small. - **For one-shot Q&A and short refactors**, the gap is noticeable but acceptable. You may need a second try where a frontier model would have nailed it the first time. - **For long reasoning chains, multi-file work, or anything subtle**, the gap is large. Frontier cloud models still win clearly. -- **For agentic loops**, the gap compounds: each step has slightly worse output, errors propagate, and you spend more time supervising. Local agents on a 7 B model are frustrating; on a 32–70 B model, they're usable. On a frontier cloud model, they're effective. +- **For agentic loops**, the gap compounds: each step has slightly worse output, errors propagate, and you spend more time supervising. Local agents on a 7 B model are frustrating, and on a 32–70 B model, they're usable. On a frontier cloud model, they can be fairly effective. -The gap is narrowing every few months. The advice above will date faster than most of this guide. +The gap is narrowing every few months, so the advice above will date faster than most of this guide! -There is also a **latency gap**. A frontier cloud model returns a response in a second or two; a 70 B local model on a typical workstation might take fifteen to thirty seconds for the same prompt. For autocomplete this is the difference between "helpful" and "in the way." For longer answers it's the difference between "fluid" and "wait, think about something else, come back." +There is also a **latency gap**. A frontier cloud model returns a response in a second or two, while a 70 B local model on a typical workstation might take fifteen to thirty seconds for the same prompt. For autocomplete this is the difference between "helpful" and "in the way." For longer answers it's the difference between "fluid" and waiting or doing something else in the meantime.
## When local makes the most sense @@ -117,8 +127,8 @@ The clearest cases: The cases where cloud still wins: - **You don't have the hardware.** Frontier cloud is cheaper than buying a workstation if you're not going to use it heavily. -- **You're at the frontier of difficulty** — the hardest reasoning, the longest contexts, the newest capabilities. The cloud has more parameters than your laptop. -- **You use AI occasionally and care more about ease than control.** Cloud is one click; local is one weekend. +- **You're at the frontier of difficulty:** the hardest reasoning, the longest contexts, the newest capabilities. A cloud model has (many) more parameters than your laptop or workstation. +- **You use AI occasionally and care more about ease than control.** Cloud access is one (or two) clicks. ## A practical starting setup @@ -126,10 +136,10 @@ The cases where cloud still wins: If you want to try local models, the lowest-friction path is: 1. Install [Ollama](https://ollama.com/) (`brew install ollama` on macOS; one-liner installer on Linux; native installer on Windows). -2. Pull a model sized for your hardware: - - 8 GB RAM: `ollama pull gemma2:2b` or `ollama pull phi3.5` - - 16 GB: `ollama pull llama3.1:8b` or `ollama pull qwen2.5-coder:7b` - - 24–32 GB+: `ollama pull qwen2.5-coder:32b` or `ollama pull llama3.3:70b` (the 70 B will be tight) +2. Pull a model sized for your hardware (verify current names against the [Ollama library](https://ollama.com/library); these tags drift): + - 8 GB RAM: `ollama pull gemma4:e4b` or `ollama pull phi4-mini` + - 16 GB: `ollama pull qwen3:8b` or `ollama pull gemma3:12b` + - 24–32 GB+: `ollama pull qwen3:32b`, `ollama pull qwen3-coder:30b`, or `ollama pull llama3.3:70b` (the 70 B will be tight) 3. Try it in chat: `ollama run ` in the terminal, or point Open WebUI at it. 4. Try it in your editor: install **Continue.dev**, configure it to use Ollama as the provider, point it at your model. 5. 
Try it agentic: install **Aider** (`pip install aider-chat`), run `aider --model ollama/` in a project directory. diff --git a/README.md b/README.md index 59287bd..c3f0670 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,10 @@ # Coding with AI -A practical guide to working effectively with AI coding assistants — web chat interfaces (ChatGPT, Claude, Gemini, Microsoft Copilot), in-project chat panels and CLIs (Claude Code, Cursor), autocomplete, and agentic tools. Our focus is on *workflow* and *judgment*: when to reach for which tool, what to paste, how to prompt, how to verify, and what to cite. +This is a practical guide to working effectively with AI coding assistants: web chat interfaces (ChatGPT, Claude, Gemini, Microsoft Copilot), in-project chat panels and CLIs (Claude Code, Cursor), autocomplete, and agentic tools. Our focus is on *workflow* and *judgment*: which tool to choose, what to paste, how to prompt, how to verify, and what to cite. -AI tools change quickly, but the patterns change slowly. This guide aims at the patterns and uses current tools as examples. +AI tools change quickly, but the patterns change more slowly. This guide aims at the patterns and uses current tools as examples. -**A note on scope.** This guide is about *coding* — writing, editing, refactoring, and debugging software. Students and engineers also use AI tools heavily for *learning* tasks: explaining concepts, summarizing literature, generating practice problems, study quizzes, mnemonics, working through homework, finding the right vocabulary for a half-remembered idea. The web-chat-vs-in-project framing here applies broadly, but the tools, examples, and tradeoffs for learning use cases are different enough to deserve their own guide. +**A note on scope.** This guide is about *coding*: writing, editing, refactoring, and debugging software. 
Students and engineers also use AI tools heavily for *learning* tasks: explaining concepts, summarizing literature, generating practice problems, study quizzes, mnemonics, working through homework, finding the right vocabulary for a half-remembered idea. The web-chat-vs-in-project framing here applies broadly, but the tools, examples, and tradeoffs for learning use cases are different enough to deserve their own guide. ## Sections @@ -12,15 +12,15 @@ AI tools change quickly, but the patterns change slowly. This guide aims at the |---|-------|-------------| | [01](01-two-worlds/) | **Two worlds** | Web chat versus tools that live with your code. Why the second is where coding work belongs, and the autocomplete/chat/agent spectrum within it. | | [02](02-errors-and-logs/) | **Errors and logs** | The canonical copy-paste case. How to frame what you paste so the assistant can actually help. | -| [03](03-autocomplete/) | **Autocomplete** | Ghost-text suggestions as you type. What it's good for, the traps (especially in verification code), and when to escalate. | +| [03](03-autocomplete/) | **Autocomplete** | Ghost-text suggestions as you type. What it's good for, the traps (especially in verification code), and when to escalate. *Less central than chat or agents; skim or skip if you don't use it.* | | [04](04-conversations/) | **Conversations** | Multi-turn design discussions in the in-project chat or a web chat, managing context, and when to start a fresh chat. | | [05](05-agentic-workflow/) | **Agentic workflow** | What agentic tools (Claude Code, Cursor agent, Microsoft Copilot agent mode) actually do, and how to supervise them. | | [06](06-verifying-and-citing/) | **Verifying and citing** | Reviewing AI output for hallucinations and silent errors. Privacy and IP of what you paste. Attribution in academic and professional work. | -| [07](07-local-models/) | **Using local models** | Local models as a cross-cutting alternative — privacy, cost, offline operation. 
Which tools support local across the autocomplete/chat/agent spectrum, and where the capability gap to cloud still matters. | +| [07](07-local-models/) | **Using local models** | Local models as a cross-cutting alternative (privacy, cost, offline operation). Which tools support local across the autocomplete/chat/agent spectrum, and where the capability gap to cloud still matters. | ## Who this is for -Students and practicing engineers who are already using AI assistants but want to use them more deliberately — including those whose default workflow is "ask ChatGPT, copy the answer back." There is nothing wrong with copy-paste, but our goal is to know *when* it is the right tool and when to use something else. +Students and practicing engineers who are already using AI assistants but want to use them more deliberately, including those whose default workflow is "ask ChatGPT, copy the answer back." There is nothing wrong with copy-paste, but our goal is to know *when* it is the right tool and when to use something else. ## Prerequisites @@ -29,7 +29,7 @@ Students and practicing engineers who are already using AI assistants but want t ## A note on tools and dates -Tool capabilities, pricing, and policies change frequently. Where this guide names a specific feature ("Cursor's agent mode," "Claude Code"), the description reflects what those tools did as of the first half of 2026. The underlying patterns — web chat versus tools that live with your code, and the autocomplete/chat/agent spectrum within the latter — are durable. Treat any tool-specific advice as illustrative. +Tool capabilities, pricing, and policies change frequently. Where this guide names a specific feature ("Cursor's agent mode," "Claude Code"), the description reflects what those tools did as of the first half of 2026. The underlying patterns (web chat versus tools that live with your code, and the autocomplete/chat/agent spectrum within the latter) are durable. 
Treat any tool-specific advice as illustrative. ## License