Reorder: tool use is now 05, neural networks is 06
The LLM arc completes at section 05 (agentic systems), with neural networks as a standalone ML deep-dive in section 06. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent aee8ecd7b8, commit cab2ebfd9d
11 changed files with 384 additions and 4 deletions

05-tool-use/README.md (new file, 200 lines)
# Large Language Models Part V: Tool Use and Agentic Systems

**CHEG 667-013 -- Chemical Engineering with Computers**

Department of Chemical and Biomolecular Engineering, University of Delaware

---

## Key idea

The LLM tools you use every day are not bare language models. They are agentic systems where the LLM serves as a natural-language interface to a collection of tools. In this section, we build one ourselves.

## Key goals

- Understand the difference between a bare LLM and an agentic system
- See how the tools you use daily (ChatGPT, Claude, Copilot) are built from the same pieces we have studied
- Use Ollama's tool-calling API to connect an LLM to Python functions
- Build a simple engineering assistant that can call a solver and return results in natural language

---

## 1. From LLM to agent: what changed?

### The early days

When ChatGPT launched in late 2022, it was essentially a web interface to a language model. You typed a prompt, the model generated text from its weights, and that was the experience. The LLM *was* the product.

### What you are using now

That is no longer what is happening. When you use ChatGPT, Claude, or Copilot today, you are interacting with an *agentic system* -- a program that uses an LLM as one component among several:

- **ChatGPT** can browse the web, run Python in a sandbox, generate images, and read uploaded files.
- **Claude** can read documents, use tools, search the web, and write and execute code.
- **Copilot** integrates with your editor, reads your codebase, and suggests completions in context.

None of these capabilities come from the LLM itself. They are built *around* it.

### The shift in the LLM's role

- **Then**: The LLM as the engine. You ask, it generates. The model's weights are the whole system.
- **Now**: The LLM as the interface and reasoning layer. You ask in natural language, it figures out what needs to happen -- retrieve documents? run code? search the web? call an API? -- orchestrates those actions, and synthesizes the results back into language for you.

The LLM brings a kind of flexible reasoning to the system: it can interpret ambiguous requests, decide which tool to use, handle unexpected results, and explain what happened. But it is reasoning *in language*, not in math or physics. This is why an LLM that can *call* `scipy.optimize.fsolve` is more useful than one that tries to solve equations by generating arithmetic token-by-token.

The reasoning is about orchestration and communication. The computation is done by tools.
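
That division of labor is easy to demonstrate. The snippet below is an illustrative sketch, not part of the course files (the equation is arbitrary): one tool call does a computation the model would otherwise have to grind out token-by-token.

```python
from scipy.optimize import fsolve

def solve_equation(x0: float = 1.0) -> float:
    """Find a root of x**3 - 2*x - 5 = 0 numerically.

    This is the kind of tool an agent exposes to the LLM: the model
    decides *when* to call it; fsolve does the actual math.
    """
    root, = fsolve(lambda x: x**3 - 2*x - 5, x0)
    return float(root)

print(round(solve_equation(), 4))  # -> 2.0946
```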

## 2. The anatomy of an agentic system

Most agentic systems are built from the same set of components. You have already encountered many of them in this course:

| Component | What it does | Where you saw it |
|-----------|-------------|-----------------|
| **LLM** | Generates text given a prompt | nanoGPT (section 01), Ollama (section 02) |
| **System prompt** | Shapes the LLM's behavior and persona | Ollama Modelfiles (section 02) |
| **Retrieval** | Pulls relevant information into the context window | RAG (section 03), semantic search (section 04) |
| **Tool use** | LLM requests a function call; the system executes it | *This section* |
| **Memory** | Stores conversation history and re-injects it into prompts | The `messages` list in Ollama's chat API |
| **Orchestration** | A loop: the LLM decides what to do, the system does it, repeat | *This section* |

The key insight: the LLM does not *do* any of these things itself. It generates text that the surrounding system interprets as instructions. The system does the actual work.

## 3. Tool calling with Ollama

Ollama supports tool use (also called function calling) through its chat API. The flow works like this:

```
1. You define tools (Python functions with descriptions)
2. You send a user message + tool definitions to the model
3. The model responds with a tool call (function name + arguments)
4. Your code executes the function
5. You send the result back to the model
6. The model responds in natural language, incorporating the result
```

The model never executes code. It only *asks* for a tool to be called. Your program does the execution. This is exactly how ChatGPT's code interpreter works -- the model generates Python code, a sandboxed runtime executes it, and the result is fed back to the model.
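
The six steps can be traced end-to-end in a few lines. To stay runnable without an Ollama server, a scripted stub stands in for the model below (the stub and its canned replies are assumptions for illustration; in `tool_demo.py` the call goes to Ollama's chat API instead):

```python
def add(a: int, b: int) -> int:
    """Add two integers."""                       # step 1: define a tool
    return a + b

TOOLS = {"add": add}

def fake_model(messages):
    """Stand-in for the LLM: first asks for a tool call, then, once a
    'tool' message appears in the history, answers in plain language."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if tool_msgs:
        return {"role": "assistant",
                "content": f"The sum is {tool_msgs[-1]['content']}."}
    return {"role": "assistant",
            "tool_calls": [{"function": {"name": "add",
                                         "arguments": {"a": 247, "b": 863}}}]}

messages = [{"role": "user", "content": "What is 247 + 863?"}]   # step 2
reply = fake_model(messages)                                     # step 3
for call in reply.get("tool_calls", []):
    fn = TOOLS[call["function"]["name"]]
    result = fn(**call["function"]["arguments"])                 # step 4
    messages.append({"role": "tool", "content": str(result)})    # step 5
final = fake_model(messages)                                     # step 6
print(final["content"])  # -> The sum is 1110.
```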

### A simple example

Open `tool_demo.py` and read through it. The script defines a single tool (an `add` function), sends a math question to the model, and lets the model call the tool to get the answer. Run it:

```bash
python tool_demo.py
```

```
Model wants to call: add({'a': 247, 'b': 863})
Result: 1110

Model: The sum of 247 and 863 is 1,110.
```

Notice what happened: the model did not try to do arithmetic. It recognized that it had a tool for addition, generated a structured request, and let the tool do the computation. Then it reported the result in natural language. This is the pattern behind every "AI tool" you use.

> **Exercise 1:** Run `tool_demo.py`. Then modify it to add a `multiply` function alongside `add`. Ask the model a question that requires multiplication. Does it choose the right tool?

> **Exercise 2:** Ask the model a question that does not need any tool (e.g., "What is the capital of France?"). What happens? Does it still try to call a tool, or does it respond directly?

## 4. An engineering assistant

Now let's build something relevant to chemical engineering. Open `thermo_assistant.py` and read through it. The script gives an LLM access to two tools:

- `vapor_pressure` -- computes vapor pressure using the Antoine equation for five compounds (water, ethanol, benzene, toluene, acetone)
- `available_compounds` -- lists which compounds are in the database
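
Under the hood, `vapor_pressure` is essentially the Antoine equation, log10(P) = A - B / (C + T). Here is a minimal sketch with a one-compound database (the water constants below are commonly tabulated values for mmHg and degrees C over roughly 1-100 °C; they may differ slightly from the constants in the course file):

```python
# Antoine constants (A, B, C) with P in mmHg and T in degrees C.
# Water values are commonly tabulated; illustrative, not necessarily
# the exact database in thermo_assistant.py.
ANTOINE = {"water": (8.07131, 1730.63, 233.426)}

def vapor_pressure(compound: str, temperature_C: float) -> float:
    """Vapor pressure in mmHg from the Antoine equation."""
    if compound not in ANTOINE:
        # A clear error message doubles as feedback the LLM can relay.
        raise ValueError(f"Unknown compound {compound!r}; "
                         f"available: {sorted(ANTOINE)}")
    A, B, C = ANTOINE[compound]
    return 10 ** (A - B / (C + temperature_C))

print(round(vapor_pressure("water", 100), 1))  # close to 760 mmHg
```

The docstring and type hints are not decoration: they are what the model reads when deciding whether and how to call the tool.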

It also includes a system prompt that tells the model it is a chemical engineering assistant, and an orchestration loop (`ask()`) that handles multi-step tool calls. Run it:

```bash
python thermo_assistant.py
```

```
Thermo Assistant (type "quit" to exit)

You: What is the vapor pressure of water at 100 degrees C?
[tool] vapor_pressure({'compound': 'water', 'temperature_C': 100})
[result] 759.94 mmHg

Assistant: The vapor pressure of water at 100°C is approximately 759.94 mmHg,
which is very close to 1 atmosphere (760 mmHg). This is consistent with the
normal boiling point of water.

You: Compare ethanol and water at 78 degrees C.
[tool] vapor_pressure({'compound': 'ethanol', 'temperature_C': 78})
[result] 752.73 mmHg
[tool] vapor_pressure({'compound': 'water', 'temperature_C': 78})
[result] 327.84 mmHg

Assistant: At 78°C, ethanol has a vapor pressure of about 753 mmHg (close to
1 atm, since 78°C is near its boiling point), while water has a vapor pressure
of only 328 mmHg. Ethanol is much more volatile at this temperature.
```

Notice several things:

- The model called the right tool with the right arguments, without being told the function signature explicitly. It inferred this from the docstring and type hints.
- For the comparison question, the model called the tool *twice* (once for each compound) and then synthesized the results.
- The model added context (the boiling point connection) that came from its training data, not from the tool. The tool provided the numbers; the model provided the interpretation.
- The `while` loop in `ask()` handles multi-step interactions where the model may want to call tools more than once before giving a final answer. This is the orchestration loop -- a minimal version of what ChatGPT and Claude do internally.

> **Exercise 3:** Run `thermo_assistant.py`. Try asking questions in different ways: "What boils first, benzene or toluene?" or "At what temperature is the vapor pressure of acetone equal to 400 mmHg?" How does the model handle questions that require reasoning beyond a single tool call?

> **Exercise 4:** Add a new tool: a function that estimates the normal boiling point by finding the temperature where the vapor pressure equals 760 mmHg. (Hint: use `scipy.optimize.brentq` or a simple bisection.) Does the model use it when asked "What is the boiling point of ethanol?"

> **Exercise 5:** Ask the model about a compound that is not in the database (e.g., "What is the vapor pressure of hexane at 60 C?"). What happens? How does the error message from the tool help the model respond?

## 5. What would you build?

You have now built or studied every major component of an agentic system:

| Component | Where you built it |
|-----------|-------------------|
| LLM (next-token prediction) | nanoGPT (section 01) |
| System prompts and customization | Ollama Modelfiles (section 02) |
| Retrieval-augmented generation | RAG pipeline (sections 03-04) |
| Tool use and orchestration | This section |

The tools you use every day -- ChatGPT, Claude, Copilot -- are these pieces wired together, with the LLM as the natural-language interface to all of them. The "intelligence" you experience is partly the LLM, but substantially the engineering of the system around it.

These are systems we can build, extend, and reason about, not just black boxes we only consume. Increasingly, we can build them as completely independent tools using locally run models, without relying on cloud-based APIs.

> **Exercise 6:** Design (on paper) an agentic system for a problem in your research or coursework. What tools would the LLM need access to? What data would it retrieve? What should the system prompt say? You do not need to build it -- just sketch the architecture.

> **Exercise 7:** Pick one tool from your Exercise 6 design and implement it. Wire it into `thermo_assistant.py` (or a copy of it) and test it. Does the model use it correctly?

> **Exercise 8 (advanced):** In sections 03-04, you built a RAG pipeline with LlamaIndex. LlamaIndex can wrap that pipeline as a *tool* that an agent decides when to call. Using `FunctionTool` and `QueryEngineTool` from `llama_index.core.tools` and `ReActAgent` from `llama_index.core.agent`, create an agent that has access to both your RAG query engine *and* the `vapor_pressure` function from this section. Ask it a question that requires retrieval ("What did the president's email say about research funding?") and one that requires computation ("What is the vapor pressure of ethanol at 60 C?"). Does the agent choose the right tool for each? Set `verbose=True` to see the agent's reasoning trace. See https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/ for details.

## A brief timeline of LLM tools

| Date | Event |
|------|-------|
| Nov 2022 | ChatGPT launches as a chat interface to GPT-3.5. The LLM is the product. |
| Mar 2023 | GPT-4 released. ChatGPT adds plugins (web browsing, code interpreter). The shift toward tool use begins. |
| Mar 2023 | AutoGPT released -- one of the first open-source "autonomous agent" projects. An LLM in a loop that can plan, use tools, and act on its own. Sparks widespread interest in agentic architectures. |
| Jun 2023 | OpenAI introduces function calling in the API. Developers can define tools for GPT to call. |
| Oct 2023 | LangChain and similar orchestration frameworks gain traction, providing standard patterns for building chains of LLM calls, tool use, and retrieval. |
| Mar 2024 | Claude 3 released with tool use support. Anthropic's approach emphasizes structured tool definitions. |
| Mar 2024 | Devin announced as an "AI software engineer" -- an early example of a fully agentic coding system that can plan, write code, debug, and deploy. |
| Jul 2024 | Ollama adds tool calling support (v0.3.0). Local models can now use tools. |
| Oct 2024 | Anthropic introduces "computer use" -- Claude can see and interact with a desktop, clicking, typing, and navigating applications like a human user. |
| Nov 2024 | Ollama Python library v0.4: pass Python functions directly as tools. |
| 2025 | Agentic systems become mainstream products. Claude Code (a coding agent in the terminal), ChatGPT with persistent memory and tool use, GitHub Copilot as an in-editor agent. The LLM is now the interface, not the product. |

## Additional resources and references

### Ollama tool calling

- Ollama tool support announcement: https://ollama.com/blog/tool-support
- Functions as tools (Python): https://ollama.com/blog/functions-as-tools
- Ollama Python library: https://github.com/ollama/ollama-python
- Models with tool support: https://ollama.com/search?c=tool

### Background reading

- Schick et al., "Toolformer: Language Models Can Teach Themselves to Use Tools" (2023): https://arxiv.org/abs/2302.04761
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022): https://arxiv.org/abs/2210.03629