Update module docs: fix arXiv URL, uv setup, nanoGPT clone path

- Use HTTPS for arXiv API (was returning 301 on HTTP)
- Point module 01 preliminaries to root uv sync instead of separate venv
- Clone nanoGPT into 01-nanogpt/ and add to .gitignore
- Add llama3.1:8B to module 02 models table
- Various editorial updates to modules 01 and 02

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
commit e10e411e41
Author: Eric
Date: 2026-04-01 22:25:42 -04:00
3 changed files with 41 additions and 34 deletions


@@ -18,30 +18,22 @@ We will study how Large Language Models (LLMs) work and discuss some of their us
 ---
-Large Language Models (LLMs) have rapidly integrated into our daily lives. Our goal is to learn a bit about how LLMs work. As you have probably become well aware of throughout your studies, engineers often don't take technical solutions for granted. We generally like to "look under the hood" and see how a system, process, or tool does its job — and whether it is giving us accurate and useful solutions. The material we will cover is largely inspired by the rapid adoption of LLMs to help us solve problems in our engineering practice.
+Large Language Models (LLMs) have rapidly become part of our lives. Our goal is to learn a bit about how LLMs work. As you have probably become well aware of throughout your studies, engineers often don't take technical solutions for granted. We generally like to "look under the hood" and see how a system, process, or tool does its job — and whether it is giving us accurate and useful solutions. The material we will cover is largely inspired by the rapid adoption of LLMs to help us solve problems in our engineering practice.
 We will use a code repository published by Andrej Karpathy called nanoGPT. GPT stands for **G**enerative **P**re-trained **T**ransformer. A transformer is a neural network architecture designed to handle sequences of data using self-attention, which allows it to weigh the importance of different words in a context. The neural network's weights and biases are created beforehand using training and validation datasets (these constitute the training and fine-tuning steps, which often require considerable computational effort, depending on the model size). Generative refers to a model's ability to create new content, rather than just analyzing or classifying existing data. When we generate text, we are running an *inference* on the model. Inference requires much less computational effort.
 NanoGPT can replicate the function of the GPT-2 model. Building the model from scratch to that level of performance (which is far lower than the current models) would still require a significant investment in computational effort — Karpathy reports using eight NVIDIA A100 GPUs for four days on the task — or 768 GPU hours. In this introduction, our aspirations will be far lower. We should be able to do simpler work with only a CPU.
-Hoave you wondered why LLMs tend to use GPUs? The math underlying the transformer architecture is largely based on matrix calculations. Originally, GPUs were developed to quickly calculate matrix transformations associated with high-performance graphics applications. (It's all linear algebra!) These processors have since been adapted into general-purpose engines for the parallel computations used in modern AI algorithms.
+Have you wondered why LLMs tend to use GPUs? If you dig deeper into the models, you will find that the math underlying the transformer architecture is largely based on matrix calculations. Originally, GPUs were developed to quickly calculate matrix transformations associated with high-performance graphics applications. (It's all linear algebra!) These processors have since been adapted into general-purpose engines for the parallel computations used in modern AI and machine learning algorithms.
 ## 1. Preliminaries
-Dust off those command line skills! There will be no GUI where we're going. I recommend making a new directory (under WSL if you're using a Windows machine) and setting up a Python virtual environment:
+Dust off those command line skills! There will be no GUI where we're going. Set up the Python environment as described in the main [README](../README.md). If you haven't already:
 ```bash
-python -m venv llm
-source llm/bin/activate
-```
-You will need to install packages like `numpy` and `pytorch`. If you have [uv](https://docs.astral.sh/uv/) installed, you can use it instead:
-```bash
-uv venv llm
-source llm/bin/activate
-uv pip install numpy torch
+uv sync
+source .venv/bin/activate
 ```
@@ -49,9 +41,10 @@ uv pip install numpy torch
 Karpathy's code is at https://github.com/karpathy/nanoGPT
-Download the code using `git`. An alternative is to download a `zip` file from the Github page. (Look for the green `Code` button on the site. Clicking this, you will see `Download ZIP` in the dropdown menu.)
+From the `01-nanogpt/` directory, download the code using `git`. An alternative is to download a `zip` file from the Github page. (Look for the green `Code` button on the site. Clicking this, you will see `Download ZIP` in the dropdown menu.)
 ```bash
+cd 01-nanogpt
 git clone https://github.com/karpathy/nanoGPT
 ```
@@ -59,16 +52,22 @@ You should now have a nanoGPT directory:
 ```bash
 $ ls
-nanoGPT/
+README.md nanoGPT/
 ```
 ## 3. A quick tour
-List the directory contents of `./nanoGPT`. You should see something like:
+Change into the nanoGPT directory — the remaining commands in this module are run from here:
+```bash
+cd nanoGPT
+```
+List the directory contents. You should see something like:
 ```
-$ ls -l nanoGPT
+$ ls -l
 total 696
 -rw-r--r-- 1 furst staff 1072 Apr 17 12:44 LICENSE
 -rw-r--r-- 1 furst staff 13576 Apr 17 12:44 README.md