computing-setup/01-know-your-machine/README.md
Eric Furst 0c6e919bdd Initial commit: computing-setup
A two-module standalone guide for setting up a new machine for
scientific computing work:

- 01-know-your-machine: hardware and OS inspection. Reads the
  physical machine first via macOS/Linux terminals or Windows
  PowerShell; a separate section walks through the WSL VM and
  how its allocations differ from the host hardware.
- 02-git-basics: pull-focused git workflow. Install, configure
  identity, clone a public repo, pull updates. Authentication
  and pushing are deferred to a future collaboration module.

Includes top-level WSL.md (copied from cli-walkthrough) for
Windows users who need the Linux environment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:09:13 -04:00

# Know Your Machine
## Key idea
Understand the basic hardware and software of the computer you are working on.
## Key goals
- Identify your operating system, CPU, RAM, storage, and GPU
- Understand what these components do and why they matter for computing tasks
- Learn commands to query your system on macOS, Linux, and Windows
> **Read your *physical* machine first.** Sections 1-6 walk through inspecting the actual hardware you own. On macOS and Linux, the terminal reports directly from the hardware. On Windows, use **PowerShell** (or the Settings GUI); those readings come straight from the real machine.
>
> **Then visit the WSL VM separately (Section 7).** If you are on Windows and have WSL installed, your Linux environment is a *virtual machine* that sees only what it has been allocated. Section 7 covers how to inspect that and why it differs from the physical machine. If you do not yet have WSL installed, see [../WSL.md](../WSL.md).
---
As engineers, we should know our tools. You would not run a reactor without knowing its volume, pressure rating, and materials of construction. The same principle applies to computing: before we write code, train models, or analyze data, we should understand the machine we are working on.
This module is a hands-on survey. Run the commands below on your own machine and record what you find. By the end, you should be able to answer: *What is my computer, and what can it do?*
## 1. Operating system
Your operating system (OS) manages the hardware and provides the environment where all your programs run. The three major OS families are:
- **macOS** -- Apple's OS, based on Unix (Darwin kernel). Runs on Intel and Apple Silicon (M1/M2/M3/M4) hardware. Closely related to iOS, watchOS, and other Apple systems. (These are all, in fact, computers!)
- **Linux** -- Open-source Unix-like OS. Many distributions exist (Ubuntu, Fedora, etc.). Common on servers, clusters, and in WSL.
- **Windows** -- Microsoft's OS. For terminal-based work, we recommend the Windows Subsystem for Linux (WSL) to access a Unix environment.
### Find your OS version
**macOS:**
In the terminal, use the command
```bash
sw_vers
```
**macOS (GUI):** Apple menu > About This Mac. Shows the macOS version, chip (e.g., Apple M3), and memory.
**Linux:**
```bash
cat /etc/os-release
uname -a
```
The `uname -a` command shows the kernel version and architecture. You will see something like `x86_64` (Intel/AMD) or `aarch64`/`arm64` (ARM).
**Windows (PowerShell):**
```powershell
Get-ComputerInfo | Select-Object OsName, OsVersion, OsArchitecture
```
This tells you the Windows version and architecture of the physical machine.
**Windows (GUI):** Settings > System > About. Shows the edition (Home, Pro), version, and processor.
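If Python is already installed, the same question can be asked in one way on all three OS families. The sketch below uses only the standard-library `platform` module; the function name `os_summary` is our own and is not part of any tool mentioned above.

```python
import platform


def os_summary() -> dict:
    """Return the OS family, release, and architecture as Python sees them."""
    return {
        "system": platform.system(),    # 'Darwin' (macOS), 'Linux', or 'Windows'
        "release": platform.release(),  # kernel or OS release string
        "machine": platform.machine(),  # e.g. 'x86_64', 'AMD64', 'arm64', 'aarch64'
    }


if __name__ == "__main__":
    for key, value in os_summary().items():
        print(f"{key}: {value}")
```

Run inside WSL, this reports `Linux` because Python sees the VM, not the Windows host. The native commands above remain the reference reading for each OS.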
> **Exercise 1:** Run the commands above. What OS and version are you running? What architecture?
## 2. CPU (processor)
The CPU (Central Processing Unit) executes your code. Key properties:
- **Architecture**: `x86_64` (Intel/AMD) or `arm64` (Apple Silicon, some Windows laptops). This affects which software binaries you can run.
- **Cores**: Modern CPUs have multiple cores that can work in parallel. More cores help with parallel tasks (compiling, running simulations, some ML training).
- **Clock speed**: Measured in GHz. Higher is faster for single-threaded tasks, but clock speed alone does not tell the whole story.
### Find your CPU
**macOS:**
```bash
sysctl -n machdep.cpu.brand_string
sysctl -n hw.ncpu
```
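The first command shows the CPU model. The second shows the total number of logical cores (performance plus efficiency cores on Apple Silicon). On Intel Macs with Hyper-Threading, this is higher than the physical core count; `sysctl -n hw.physicalcpu` gives the physical number.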
**macOS (GUI):** Apple menu > About This Mac shows the chip (e.g., "Apple M3 Pro"). For core count, open Activity Monitor > CPU tab or run the command above.
**Linux:**
```bash
lscpu
```
This shows the CPU model, architecture, number of cores, and clock speed.
**Windows (PowerShell):**
```powershell
Get-CimInstance Win32_Processor | Select-Object Name, NumberOfCores, NumberOfLogicalProcessors, MaxClockSpeed
```
**Windows (GUI):** Settings > System > About shows the processor name. For more detail, open Task Manager (Ctrl+Shift+Esc) > Performance > CPU. This shows cores, logical processors, and clock speed.
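For a quick cross-platform check from Python, the standard library can report the number of *logical* CPUs. This is a minimal sketch; `logical_cpu_count` is a name chosen here for illustration.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical CPUs the OS exposes to Python (at least 1)."""
    # os.cpu_count() may return None when the count cannot be determined,
    # so fall back to 1 in that case.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Logical CPUs: {logical_cpu_count()}")
```

The standard library does not report physical cores separately, so on machines with simultaneous multithreading this number corresponds to `NumberOfLogicalProcessors` in the PowerShell output rather than `NumberOfCores`.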
### Why it matters
Heavy numerical work (simulations, data processing, training machine-learning models) runs faster with more cores and higher clock speed. Even so, CPUs can be orders of magnitude slower than GPUs for highly parallel tasks like neural network training, which is why we also look at GPUs below.
> **Exercise 2:** What CPU does your machine have? How many cores? What architecture?
## 3. RAM (memory)
RAM (Random Access Memory) is your computer's short-term working space. When you open a program, load a dataset, or run a model, the data lives in RAM. Key things to know:
- RAM is **volatile**: it is erased when you shut down.
- RAM is **fast**: much faster than reading from disk.
- RAM is **limited**: if you run out, the OS will start using disk as overflow ("swap"), which is extremely slow.
### Find your RAM
**macOS:**
```bash
sysctl -n hw.memsize | awk '{printf "%.0f GB\n", $1/1024/1024/1024}'
```
**macOS (GUI):** Apple menu > About This Mac shows memory (e.g., "18 GB"). For current usage, open Activity Monitor > Memory tab.
**Linux:**
```bash
free -h
```
This shows total, used, and available memory. The `-h` flag makes the output human-readable (units such as `Gi` instead of raw kibibyte counts).
**Windows (PowerShell):**
```powershell
[math]::Round((Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB, 1)
```
This returns the **physical** RAM installed on the machine. WSL users: this is the number you want — running `free -h` inside WSL would only show the VM's allocation. See Section 7.
**Windows (GUI):** Settings > System > About shows "Installed RAM". For current usage, open Task Manager (Ctrl+Shift+Esc) > Performance > Memory.
### Why it matters
Loading a large dataset or model weights means everything in active use has to fit in RAM. A modern large language model can be 4-8 GB or more; if you load one on an 8 GB machine alongside your OS, editor, and a browser, you may run out. When that happens the system swaps to disk and *everything* slows down dramatically. Knowing your RAM ceiling helps you plan what is realistic to run.
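The same planning applies to numerical data. As a rough worked example, a dense array of 64-bit floats needs 8 bytes per value, so its footprint can be estimated before loading it. The function names and the 4 GiB reserve for the OS and other programs are assumptions made for this sketch, not measured values.

```python
def array_gib(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Estimated size in GiB of a dense rows x cols array (8 bytes = float64)."""
    return rows * cols * bytes_per_value / 2**30


def fits_in_ram(needed_gib: float, total_gib: float, reserved_gib: float = 4.0) -> bool:
    """True if the data fits after setting aside reserved_gib for everything else.

    reserved_gib is a hypothetical allowance for the OS, editor, and browser.
    """
    return needed_gib <= total_gib - reserved_gib


if __name__ == "__main__":
    small = array_gib(10_000, 10_000)  # about 0.75 GiB
    large = array_gib(50_000, 50_000)  # about 18.6 GiB
    print(f"10k x 10k float64: {small:.2f} GiB, fits in 8 GiB: {fits_in_ram(small, 8)}")
    print(f"50k x 50k float64: {large:.2f} GiB, fits in 16 GiB: {fits_in_ram(large, 16)}")
```

Real programs often need more than this estimate because intermediate copies are made during computation, so treat the result as a lower bound.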
> **Exercise 3:** How much physical RAM does your machine have? Use the appropriate command for your OS above. How much is currently in use?
## 4. Storage (disk)
Storage is where your files, programs, and OS live permanently. Unlike RAM, it persists when you shut down. The two main types:
- **SSD (Solid State Drive)**: Fast, no moving parts. Standard on modern laptops.
- **HDD (Hard Disk Drive)**: Slower, mechanical. Sometimes used for bulk storage.
### Find your storage
**macOS:**
```bash
df -h /
```
**macOS (GUI):** Apple menu > About This Mac > More Info > Storage. Shows total capacity, used space, and a breakdown by category.
**Linux:**
```bash
df -h /
```
**Windows (PowerShell):**
```powershell
Get-Volume | Where-Object DriveLetter -eq 'C' | Select-Object DriveLetter, @{N='Size(GB)';E={[math]::Round($_.Size/1GB)}}, @{N='Free(GB)';E={[math]::Round($_.SizeRemaining/1GB)}}
```
This is the physical C: drive's total size and free space; used space is the difference between the two. WSL users: this is your real disk; the WSL VM has its own virtual disk that we look at in Section 7.
**Windows (GUI):** Settings > System > Storage. Shows total capacity and usage per drive. You can also open File Explorer, right-click the C: drive, and select Properties.
### Why it matters
Software adds up fast. A rough sense of common items:
| Item | Approximate size |
|------|-----------------|
| A Python environment with scientific libraries | 1-3 GB |
| A local large language model | 1-20 GB each |
| A course or project repository | 50-500 MB |
| Datasets | varies widely (MB to TB) |
If you are low on storage, be selective about what you install, and clean up environments and downloaded models you no longer need.
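Before a large install or download, free space can also be checked from Python with the standard-library `shutil.disk_usage`. This is a sketch under our own naming (`disk_summary`); it reports decimal gigabytes, so the numbers can differ slightly from `df -h`, which uses powers of 1024.

```python
import shutil
from pathlib import Path


def disk_summary(path=None) -> dict:
    """Total, used, and free space (decimal GB) for the drive holding `path`.

    Defaults to the drive that contains the user's home directory.
    """
    target = Path(path) if path is not None else Path.home()
    usage = shutil.disk_usage(target)
    gb = 1e9  # decimal gigabyte
    return {
        "total_gb": round(usage.total / gb, 1),
        "used_gb": round(usage.used / gb, 1),
        "free_gb": round(usage.free / gb, 1),
    }


if __name__ == "__main__":
    print(disk_summary())
```

Inside WSL this describes the VM's virtual disk, not the Windows C: drive.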
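> **Exercise 4:** How much total storage does your machine have? How much is free? Is it an SSD or HDD? (On macOS, check Apple menu > About This Mac > More Info. On Linux, `lsblk -d -o NAME,SIZE,ROTA` lists disk devices; `ROTA` is 1 for a spinning disk and 0 for an SSD. On Windows, `Get-PhysicalDisk | Select-Object FriendlyName, MediaType` in PowerShell reports the media type.)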
## 5. GPU (graphics processor)
A GPU (Graphics Processing Unit) was originally designed for rendering graphics, but its architecture (thousands of small cores optimized for parallel math) makes it excellent for machine learning. There are three common situations:
- **NVIDIA GPU (discrete)**: Found in gaming laptops and workstations. Supports CUDA, which PyTorch uses for fast training. This is the best case for ML work.
- **Apple Silicon GPU (integrated)**: The M1/M2/M3/M4 chips include a GPU that PyTorch can use via MPS (Metal Performance Shaders). Faster than CPU, slower than a dedicated NVIDIA GPU.
- **Intel/AMD integrated GPU**: Built into the CPU. Not usable through PyTorch's CUDA or MPS backends, so plan on running on the CPU. Use `--device=cpu`.
### Find your GPU
**macOS (Apple Silicon):**
```bash
system_profiler SPDisplaysDataType
```
If you see "Apple M1" (or M2, M3, M4), you have an integrated GPU that supports MPS.
**macOS (GUI):** Apple menu > About This Mac shows the chip. Apple Silicon chips (M1/M2/M3/M4) all include a GPU.
**Linux (NVIDIA):**
```bash
nvidia-smi
```
If this command works, you have an NVIDIA GPU and the drivers are installed. It shows the GPU model, driver version, and memory. If the command is not found, you either do not have an NVIDIA GPU or the drivers are not installed.
**Windows (PowerShell):**
```powershell
Get-CimInstance Win32_VideoController | Select-Object Name, AdapterRAM, DriverVersion
```
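This lists every GPU Windows sees on the physical machine, which is useful on laptops that have both an integrated GPU (Intel/AMD) and a discrete one (NVIDIA). `AdapterRAM` is reported in bytes as a 32-bit value, so GPUs with more than 4 GB of memory can show a number that is too low; Task Manager or `nvidia-smi` gives a more reliable figure.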
**Windows (GUI):** Task Manager (Ctrl+Shift+Esc) > Performance > GPU. This shows the GPU name (e.g., "NVIDIA GeForce RTX 4060" or "Intel UHD Graphics"), memory, and utilization.
**No GPU or unsure:**
If you have PyTorch installed, you can ask it directly:
```bash
python -c "import torch; print('CUDA:', torch.cuda.is_available()); print('MPS:', torch.backends.mps.is_available())"
```
This tells you what PyTorch can use on your machine.
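The one-liner above fails with an import error if PyTorch is not installed. A slightly more defensive version, sketched below, returns a device name in every case. The function name `pick_device` and the choice to report `cpu` when PyTorch is missing are assumptions made for this example.

```python
import importlib.util


def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch can use."""
    # If PyTorch is not installed, report 'cpu' instead of raising an error.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch releases have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch device: {pick_device()}")
```

The result is the value to record in the "PyTorch device" row of the table in Section 6.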
### Why it matters
Training a small neural network on CPU takes minutes; on a GPU, seconds. The difference grows dramatically with model size — this is why large language models are trained on clusters of thousands of GPUs. For most introductory computing work, a CPU is sufficient. GPU acceleration is a bonus, not a requirement.
> **Exercise 5:** What GPU (if any) does your machine have? Can PyTorch use it? Run the Python check above (if you have PyTorch installed).
## 6. Putting it all together
Fill in this table for your machine:
| Component | Your machine |
|-----------|-------------|
| Operating system | |
| OS version | |
| Architecture (x86_64 / arm64) | |
| CPU model | |
| CPU cores | |
| RAM (total) | |
| Storage (total / free) | |
| GPU | |
| PyTorch device (cpu / mps / cuda) | |
### One-line system summary
**macOS:**
```bash
echo "$(sw_vers -productName) $(sw_vers -productVersion), $(sysctl -n machdep.cpu.brand_string), $(sysctl -n hw.ncpu) cores, $(sysctl -n hw.memsize | awk '{printf "%.0f GB", $1/1024/1024/1024}') RAM"
```
**Linux:**
```bash
echo "$(uname -o) $(uname -r), $(lscpu | grep 'Model name' | sed 's/.*: *//' ), $(nproc) cores, $(free -h | awk '/Mem:/ {print $2}') RAM"
```
> **Exercise 6:** Fill in the table above. If you are working alongside others, compare with a classmate or colleague. How are your machines different? How might those differences affect the kinds of work each of you can do comfortably?
## 7. Inspecting your WSL environment (Windows + WSL users)
If you are on Windows and use the Windows Subsystem for Linux (WSL), your Linux environment runs inside a **virtual machine** managed by Windows. The Linux commands from Sections 1-5 will all work inside WSL, but the answers they give are about *the VM*, not the physical machine you already inspected. Some readings match the host; others are very different. Understanding which is which is the goal of this section.
| Reading | Inside WSL you see... | Notes |
|---------|----------------------|-------|
| OS (`uname -a`, `cat /etc/os-release`) | The Linux distribution and kernel running in the VM | Has nothing to do with your Windows version |
| CPU (`lscpu`) | The host CPU model, architecture, and core count | Passed through from the physical machine — should match what PowerShell told you |
| RAM (`free -h`) | The RAM allocated to the VM | By default, about half your physical RAM (some WSL versions also cap this at 8 GB). Configurable in a `.wslconfig` file; see the [Microsoft docs](https://learn.microsoft.com/en-us/windows/wsl/wsl-config) |
| Disk (`df -h /`) | A virtual disk (`ext4.vhdx`) stored on your Windows drive | Not the same as the C: drive. The VM grows the file on demand up to a configured maximum |
| GPU (`nvidia-smi`) | An NVIDIA GPU, *if* the Windows-side driver supports WSL | Recent NVIDIA Windows drivers include WSL support. No separate Linux driver is installed inside WSL. See [NVIDIA's CUDA on WSL guide](https://docs.nvidia.com/cuda/wsl-user-guide/) |
### Why this matters
When you install Python, run a model, or train something *inside WSL*, you are constrained by the VM's allocation, not the machine's full capacity. An 8 GB RAM cap inside WSL can mean a model loads fine on the Windows side but fails inside WSL. Knowing both numbers — physical and VM — lets you predict what will actually run where.
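Because the two sets of numbers are easy to mix up, it can help for a script to say which environment it is running in. The sketch below relies on a common heuristic, not an official API: WSL kernels usually include "microsoft" in their release string. The function name and the sample strings in the usage note are illustrative.

```python
import platform
from typing import Optional


def looks_like_wsl(release: Optional[str] = None) -> bool:
    """Heuristic: True if the kernel release string suggests a WSL kernel.

    Assumption: WSL kernel releases contain 'microsoft' (in any letter case).
    Pass a string to test a specific value; by default the running kernel is checked.
    """
    if release is None:
        release = platform.uname().release
    return "microsoft" in release.lower()


if __name__ == "__main__":
    where = "inside WSL (VM readings)" if looks_like_wsl() else "not inside WSL"
    print(f"This Python is running {where}.")
```

For example, `looks_like_wsl("5.15.90.1-microsoft-standard-WSL2")` returns `True`, while a stock distribution kernel string such as `"6.5.0-41-generic"` returns `False`.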
> **Exercise 7 (WSL users):** Run `free -h` and `df -h /` inside WSL. Compare the results to the PowerShell readings you recorded in Section 6. How much physical RAM does your VM actually see? How much of your physical disk is the VM using right now?
## 8. Keeping a machine log
Engineers keep logs for lab equipment, process equipment, and instruments. Your computer deserves the same treatment. Create a document called `machine_log` in your personal files and start it with the spec table from section 6. It should be a simple format — a text, rich text, or markdown file.
While you are at it, *give your machine a name* if you have not already. (On macOS: System Settings > General > About > Name. On Linux: `sudo hostnamectl set-hostname yourname`. On Windows: Settings > System > About > Rename this PC.) A named machine is easier to reference in logs, SSH configs, and conversation, especially once you have more than one. Put the name at the top of your log.
After that, add a dated entry whenever you:
- **Install or upgrade the OS** or major software
- **Change system configuration** (environment variables, shell settings, drivers, WSL setup)
- **Encounter a problem and solve it** (the error, what you tried, what worked)
- **Upgrade hardware** (new RAM, new drive, etc.)
Keep entries short. Date, what changed, and the outcome. When something breaks months later, you will be glad you wrote down what you changed and when. This is especially valuable when troubleshooting: knowing what was different before the problem started is often the fastest path to a fix.
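If you keep the log as a Markdown file, a small helper can make dated entries consistent. This is one possible sketch, not a required format: the function name, the heading style, and the "Changed / Outcome" fields are all assumptions chosen to match the advice above.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_log_entry(log_path, change: str, outcome: str,
                     when: Optional[date] = None) -> str:
    """Append a short dated entry to the machine log and return the text written."""
    when = when or date.today()
    entry = (
        f"\n## {when.isoformat()}\n"
        f"- Changed: {change}\n"
        f"- Outcome: {outcome}\n"
    )
    # Mode "a" appends to the file and creates it if it does not exist yet.
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(entry)
    return entry
```

A call such as `append_log_entry("machine_log.md", "Installed htop", "Works as expected")` adds one entry stamped with today's date. Editing the file by hand works just as well; the point is the habit, not the tool.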
> **Exercise 8:** Start your machine log. Put the spec table at the top and add an entry for today.
## Additional resources
- [Crash Course Computer Science](https://www.youtube.com/playlist?list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo) — episodes 1-10 cover hardware fundamentals (transistors, ALU, registers, RAM, CPU, instructions) at a reasonable pace
- J. Clark Scott, *But How Do It Know?* — a short, readable book that builds a computer from logic gates up. Good for understanding what is actually happening inside the machine.
- `top` and `htop` — interactive process viewers that show CPU, memory, and process usage in real time. `top` is the classic Unix tool and ships built-in on macOS and Linux, so it's always available. `htop` is a more modern third-party rewrite: colored CPU/memory bars, a scrollable process list, click-to-sort columns, F-keys (or mouse) to kill/renice/filter processes, an F5 tree view, and the same behavior everywhere (macOS `top` and Linux `top` differ in flags and output; `htop` does not). Install with `brew install htop` (macOS) or `sudo apt install htop` (Linux/WSL). Worth knowing both — `top` for "wherever I land," `htop` for daily use on your own machine.