# PyTorch Troubleshooting
## Which device should I use?
PyTorch can run on CPU, NVIDIA GPU (CUDA), or Apple GPU (MPS). Use this to check what's available on your machine:
```shell
python -c "import torch; print('CUDA:', torch.cuda.is_available()); print('MPS:', torch.backends.mps.is_available())"
```
Then use the appropriate `--device` flag when running nanoGPT:
| Hardware | Flag |
|---|---|
| No GPU / any machine | `--device=cpu` |
| Apple Silicon (M1/M2/M3/M4) | `--device=mps` |
| NVIDIA GPU | `--device=cuda` |
CPU works everywhere but is the slowest. For the exercises in this course, CPU is fine.
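The fallback order implied by the table (CUDA if present, then MPS, then CPU) can be sketched as a small helper. This is an illustrative sketch, not part of nanoGPT; `pick_device` and its arguments are hypothetical names.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Return a --device value: prefer CUDA, then MPS, then fall back to CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"

# In a real script you would pass the results of
# torch.cuda.is_available() and torch.backends.mps.is_available().
print(pick_device(False, True))  # prints: mps
```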
## Apple Silicon (macOS)
The default PyTorch installed by `uv add torch` includes MPS (Metal Performance Shaders) support out of the box. No special installation is needed.
To use it with nanoGPT:
```shell
python train.py config/train_shakespeare_char.py --device=mps --compile=False
python sample.py --out_dir=out-shakespeare-char --device=mps --compile=False
```
Note: `--compile=False` is required on MPS because `torch.compile` does not support the MPS backend.
## Windows with NVIDIA GPU (WSL)
If you have a Windows laptop with an NVIDIA GPU (common on gaming laptops with RTX 3060, 4060, etc.), you can use it through WSL. The key requirement is that the NVIDIA drivers are installed on the Windows side. WSL automatically bridges to the Windows GPU driver, so you do not need to install CUDA or NVIDIA drivers inside WSL itself.
Check that WSL can see your GPU:
```shell
nvidia-smi
```
If this works and shows your GPU, follow the NVIDIA GPU instructions below to install PyTorch with CUDA support.
If `nvidia-smi` is not found, make sure you have the latest NVIDIA drivers installed in Windows (download from https://www.nvidia.com/Download/index.aspx). After installing or updating the driver, restart WSL (`wsl --shutdown` from PowerShell, then reopen Ubuntu).

If you have an Intel or AMD integrated GPU and no NVIDIA card, use `--device=cpu`. CPU mode works fine for all the exercises in this course.
## NVIDIA GPU (Linux / WSL)
### Problem: "NVIDIA driver is too old" or CUDA not found
If you see an error like:
```
RuntimeError: The NVIDIA driver on your system is too old (found version 12020)
```
or if this check fails:
```shell
python -c "import torch; print(torch.version.cuda); print(torch.cuda.is_available())"
# If cuda is None and is_available() is False, you have the CPU-only build
```
The issue is that `uv add torch` installs the CPU-only PyTorch wheel by default. It does not include CUDA support.
### Fix: reinstall PyTorch with CUDA
First, check your NVIDIA driver version:
```shell
nvidia-smi
```
Look for the "CUDA Version" in the top right of the output. Then install the matching PyTorch CUDA wheels. For most systems with CUDA 12.x drivers:
```shell
uv pip uninstall torch torchvision torchaudio
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
For older drivers with CUDA 11.8:
```shell
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
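The driver-to-wheel mapping above can be expressed as a small helper. This is a hedged sketch: `wheel_index` is a hypothetical name, and it encodes only the two cases covered in this guide (CUDA 12.x drivers → cu121 wheels, CUDA 11.8 → cu118).

```python
def wheel_index(cuda_version: str) -> str:
    """Map a driver's reported CUDA version to a PyTorch wheel index URL.

    Only covers the two cases used here; anything else raises so you
    consult the official compatibility matrix instead of guessing.
    """
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    if major >= 12:
        return "https://download.pytorch.org/whl/cu121"
    if (major, minor) == (11, 8):
        return "https://download.pytorch.org/whl/cu118"
    raise ValueError(f"check pytorch.org for CUDA {cuda_version} support")

print(wheel_index("12.2"))  # prints the cu121 index URL
```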
Then verify:
```shell
python -c "import torch; print(torch.cuda.is_available())"
# Should print: True
```
### Notes
- The CUDA 12.1 wheels are compatible with CUDA 12.x drivers (e.g., 12.2, 12.4).
- `uv pip install` operates on the active virtual environment only and does not affect other environments.
- See https://pytorch.org/get-started/locally/ for the full compatibility matrix.
## General tips
- Always check your device before starting a long training run.
- If something is not working, run the diagnostic check at the top of this file first.
- The `--compile=False` flag is needed on CPU and MPS. It is optional (but harmless) on CUDA.
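Putting the tips together, the flag choice can be sketched in one helper. This is an illustrative sketch, assuming the rule above that `--compile=False` is passed everywhere except CUDA; `nanogpt_flags` is a hypothetical name, not part of nanoGPT.

```python
def nanogpt_flags(device: str) -> str:
    """Build the device-related CLI flags for nanoGPT's train.py/sample.py.

    torch.compile is only left enabled on CUDA; CPU and MPS get
    --compile=False, per the tips above.
    """
    flags = [f"--device={device}"]
    if device != "cuda":
        flags.append("--compile=False")
    return " ".join(flags)

print(nanogpt_flags("mps"))  # prints: --device=mps --compile=False
```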