LLM

Always Under Construction!
99.5% Human Generated Content!
Using em-dash since 1993!

This page collects resources for using large language models (LLMs) in research and teaching -- building RAG pipelines, running local models, performing semantic search, and more.

LLMs have matured enormously since their introduction. Their most promising uses help us solve difficult but adjacent problems in research -- as tools for coding numerical solutions, performing exploratory data analysis, or organizing and cleaning datasets within an open source scientific computing stack built around Python and Jupyter notebooks. In the classroom, promises of individualized tutor chatbots abound, but LLMs can power locally run, teacher-facing analysis tools, too. (See Stan below!)

The transformer neural network architecture underlying LLMs is a transformative technology born of an unexpected convergence of three things: the availability and maturity of highly parallelized processors optimized for matrix calculations (GPUs); the surprising (even to its inventors) performance of the transformer architecture; and the large training datasets made possible by the maturity of the world wide web and broader internet. In many ways, our recent experience with LLMs is analogous to the disruption that came with the introduction of the all-electronic, general programmable computer in the late 1940s. Chemical engineers rapidly adopted that earlier technology to solve challenging modeling problems -- especially partial differential equations, and systems of them, that were intractable or extremely inefficient to solve before the development of machine computing. Just as then, a new interface to computation has arrived: LLMs give us a natural language interface to our computational tools.

One thing that's stayed with me as I've worked with LLMs is that engineers and scientists don't take technical solutions for granted. We generally like to "look under the hood" and see how things work. So, if you are interested in learning more about the technical underpinnings of LLMs, this page collects a few of those resources, too. (Spoiler alert: a Boltzmann-like distribution plays a central role in the architecture of GPTs.)
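That "Boltzmann-like distribution" is the softmax a GPT applies to its output scores (logits), with the sampling temperature playing the role of thermodynamic temperature. A minimal sketch in plain Python, using toy logits rather than output from a real model:

```python
import math

def softmax(logits, temperature=1.0):
    # Same form as a Boltzmann distribution: p_i ~ exp(x_i / T),
    # so the logits play the role of (negative) energies.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

logits = [2.0, 1.0, 0.1]                  # toy next-token scores
cool = softmax(logits, temperature=0.5)   # sharper: favors the top token
hot = softmax(logits, temperature=2.0)    # flatter: more diverse sampling
```

Lowering the temperature concentrates probability on the highest-scoring token; raising it flattens the distribution. That is exactly what the "temperature" knob in an LLM sampling interface controls.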

- Eric Furst

Earlier version of this page (January 2025): LLM2025

Recent work


Stan: An LLM-based thermodynamics course assistant

Emerging AI tools in education are largely student-facing: chatbots that answer questions, tutors that explain concepts, generators that produce practice problems. Instructor-facing tools -- tools that help faculty understand and improve their own teaching -- are far less developed. Stan is an attempt to fill that gap using many of the resources described on this page. Everything runs on local hardware -- a laptop for interactive queries, a GPU workstation for batch processing -- with no cloud APIs, no per-query fees, and full data privacy.

See the arXiv preprint for a full description and the code base on GitHub.

CHEG 667-013 Course Handouts

In Spring 2025, I taught a module on LLMs for an elective course, Chemical Engineering with Computers. These handouts are a practical starting point for anyone wanting to run and experiment with LLMs:

Lecture slides: Attach:cheg667_013_llm_2025.pdf

Talks on LLMs

  • Winter Research Review, January 2025
Presented on January 22, 2025 as the lunch talk for the Department of Chemical and Biomolecular Engineering Winter Research Review. In my talk, I discussed uses of large language models (LLMs), the underlying architecture of a generative pre-trained transformer (GPT), and basic aspects of the mechanics behind training and deploying LLMs.
  • Chemistry Biology Interface program's annual retreat, July 29, 2025
An updated and somewhat complementary talk for UD's Chemistry Biology Interface program's annual retreat.

Hacking with LLMs


"Programming is a skill best acquired by practice and example rather than from books."
- Alan Turing, Programmers' Handbook for Manchester Electronic Computer Mark II, 1951 link

Running LLMs Locally -- Quick Start

Inference -- the text generation we experience with LLMs -- is not terribly computationally expensive. (The main and highly publicized energy use of these models comes from training them.) Capable models in the 7-8 billion parameter range run on consumer-grade laptops and desktops. One of the fastest paths to running a local LLM is Ollama. Once installed, a single command pulls and runs a model:

  ollama run llama3.1:8b

See the full Ollama entry in Access Models and Tools below, and the course handouts for step-by-step walkthroughs.

Build a RAG!

One of the interesting applications of transformers is the ability to catalog and search text semantically. In retrieval augmented generation (RAG), a body of text is used to construct a vector store -- chunks of text encoded in the abstract vector space of a sentence transformer (or more technically, embedded using a sentence transformer). A query, embedded the same way, is then compared against the vector store to find matching text. The search isn't a literal one; it captures the semantic similarity between the query and the stored vectors. The retrieved text is then passed to an LLM to generate the query output. RAG can power such a semantic search or summarize the retrieved documents, and it provides a way to supply an LLM with specific information without fine-tuning the model.
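Stripped to its essentials, the retrieval step is a nearest-neighbor search by cosine similarity. A minimal sketch in plain Python -- the three-dimensional vectors here are hand-made stand-ins; a real pipeline would embed each chunk with a sentence transformer into hundreds of dimensions:

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy vector store mapping text chunks to embeddings (hand-made stand-ins).
store = {
    "Entropy increases in an isolated system.": [0.9, 0.1, 0.0],
    "The Gibbs energy is minimized at equilibrium.": [0.7, 0.4, 0.1],
    "My cat enjoys sitting in boxes.": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    # Rank chunks by similarity to the embedded query; keep the top k.
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]),
                    reverse=True)
    return ranked[:k]

# An embedded query about thermodynamics lands near the first two chunks.
hits = retrieve([0.85, 0.2, 0.05])
```

In a full RAG pipeline, the retrieved chunks are prepended to the user's question in the prompt sent to the LLM.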

LLMs on the CLI

Need to summarize text, but don't want to share it with OpenAI or another business with nebulous and ever-shifting data privacy policies? Local models can handle summarization directly from the command line, letting you combine search and text processing with standard Unix tools on macOS, Linux, etc.
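As one way to wire this up, here is a sketch of a small Python wrapper around the Ollama command line tool. It assumes `ollama` is on your PATH and the model has been pulled; the model name and prompt wording are just examples:

```python
import subprocess
import sys

MODEL = "llama3.1:8b"  # example model; pull it first with `ollama pull`

def build_prompt(text):
    # Keep the instruction and the document together in one prompt string.
    return "Summarize the following text in three sentences:\n\n" + text

def summarize(text, model=MODEL):
    # Pipe the prompt to a local model via the Ollama CLI; no data leaves
    # the machine. Requires the `ollama` binary to be installed.
    result = subprocess.run(
        ["ollama", "run", model],
        input=build_prompt(text),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # Usage: python summarize.py < report.txt
    print(summarize(sys.stdin.read()))
```

Because it reads standard input, the script composes with ordinary shell pipelines, e.g. `cat notes.txt | python summarize.py`.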

Semantic Search

A semantic search over a personal archive and a collection of clippings. It uses vector embeddings and a local LLM to find and synthesize information across more than 1,800 dated text entries spanning 2000-2025, plus a library of PDFs, articles, and web saves. You can run it against the ChatGPT API (or another cloud provider) or with completely local embedding and inference.
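The core of such a tool can be sketched in a few lines: each dated entry carries an embedding, and a query is scored against every entry that passes a date filter. The entries and two-dimensional vectors below are invented stand-ins; a real archive would store sentence-transformer embeddings alongside the text:

```python
from datetime import date

# Invented (date, text, embedding) entries standing in for a real archive.
ARCHIVE = [
    (date(2003, 5, 1), "Notes on colloidal gel rheology.", (0.98, 0.20)),
    (date(2014, 8, 12), "Microrheology data analysis ideas.", (0.90, 0.44)),
    (date(2024, 2, 3), "First run of a local 8B model.", (0.10, 0.99)),
]

def search(query_vec, since=date.min, k=2):
    # With (roughly) unit-normalized vectors, the dot product serves as
    # the cosine similarity; the date filter narrows the candidate set.
    qx, qy = query_vec
    scored = [
        (qx * vx + qy * vy, text)
        for entry_date, text, (vx, vy) in ARCHIVE
        if entry_date >= since
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]
```

The top-k hits (with their dates) are then handed to a local LLM, which synthesizes an answer from the retrieved entries.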

Access Models and Tools


Frameworks and Code

  • Ollama lets you run embedding and text generation with Llama, Phi, Mistral, Gemma, DeepSeek models, and more, locally, from the command line.
  • MIT-licensed library for building LLM agents and workflows with a focus on retrieval augmented generation (RAG).
  • Open LLMs from Meta, available in multiple sizes for local inference. Llama includes multilingual text-only models (1B, 3B), text-image models (11B, 90B), and Llama 3.3 70B.
  • Open-source library for local LLM inference across a wide range of hardware. Includes command line tools and a simple web interface.
  • NanoGPT (Andrej Karpathy) -- Github codebases

Other Resources

  • Online playground for openai/tiktoken -- calculate the number of tokens for a given prompt.
  • ETH Zurich and EPFL released a fully open LLM developed on public infrastructure. Announcement: https://ethz.ch/en/news-and-events/eth-news/news/2025/07/a-language-model-built-for-the-public-good.html

Essential Background


TL;DR

  • Start here: Intro to Large Language Models, by Andrej Karpathy.
  • And here: two clips from Physics, AI, and the Future of Discovery.
  • Or jump off the high board: Dive Into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola.

Key References

  • Attention Is All You Need -- the paper that introduced the transformer architecture.
A. Vaswani et al., Attention Is All You Need, NeurIPS 2017, pp. 6000–6010.
Link: https://dl.acm.org/doi/10.5555/3295222.3295349
  • The concept of attention was first introduced in Bahdanau, Cho, and Bengio, Neural machine translation by jointly learning to align and translate, ICLR 2015.
Link: https://arxiv.org/pdf/1409.0473
  • The generative decoder-only architecture: Liu et al., Generating Wikipedia by Summarizing Long Sequences, ICLR 2018.
Link: https://openreview.net/pdf?id=Hyg0vbWC-
  • GPT-2 -- Radford et al., Language Models are Unsupervised Multitask Learners, OpenAI blog, 2019.
Link: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
  • GPT-3 -- Brown et al., Language Models Are Few-Shot Learners, NeurIPS 2020.
Link: https://dl.acm.org/doi/abs/10.5555/3495724.3495883
  • GPT-4 Technical Report -- OpenAI, 2023. arXiv:2303.08774
  • Karpathy's nanoGPT -- the simplest implementation generates fake Shakespeare. More advanced models can be fine-tuned using GPT-2 checkpoints. It's instructive to contrast the raw generative output with a fine-tuned chatbot like ChatGPT.

Left: the transformer (encoder and decoder) from Attention Is All You Need. Right: Karpathy's nanoGPT visualized with Bycroft's tool.

Related Reading (and Watching)


General interest

  • Gideon Lewis-Kraus, "What Is Claude? Anthropic Doesn’t Know, Either," New Yorker, February 9, 2026.

Ethics of AI

  • Louis Menand, "Is A.I. the Death of I.P.?", New Yorker, January 15, 2024.
  • Grotti, Meg, et al. Summary and Recommendations from Spring 2024 Faculty Interviews in the Making AI Generative for Higher Education Project. University of Delaware, 2024.

Problems and Pitfalls

Melanie Mitchell reminds us that the solutions to alignment problems are not obvious, and that AI literacy is a commonsense first step:

  • Melanie Mitchell, Why AI chatbots lie to us, Science.
Link: https://doi.org/10.1126/science.aea3922

Energy and Resource Use

Artistic and Literary Practices

When we first experiment with GPT-based LLMs, it's fascinating to experience a machine generating text with such high fidelity. But interest in machine-generated or "generative" text dates almost to the beginning of the modern computer era, and many experiments, spanning contexts from AI research to artistic and literary practice, have been shared over the intervening decades.

  • Christopher Strachey's program, often referred to as Love Letters, was written in 1952 for the Manchester Mark I computer. It is considered by many to be the first example of generative computer literature. In 2009, David Link ran Strachey's original code on an emulated Mark I, and Nick Montfort, professor of digital media at MIT, coded a modern recreation of it in 2014. The text output follows the pattern "you are my [adjective] [noun]. my [adjective] [noun] [adverb] [verbs] your [adjective] [noun]," signed by "M.U.C." for the Manchester University Computer. With the vocabulary in the program, there are over 300 billion possible combinations.
To experience the poem in a modern browser, see Nick Montfort's recreation:
https://nickm.com/memslam/love_letters.html
Wikipedia page on Strachey's algorithm: https://en.wikipedia.org/wiki/Strachey_love_letter_algorithm.
Siobhan Roberts' article on Strachey's Love Letters: https://www.newyorker.com/tech/annals-of-technology/christopher-stracheys-nineteen-fifties-love-machine
David Link's Love Letters installation -- https://alpha60.de/art/love_letters/
My remix for the GPU era -- https://ef1j.org/glitched/love-llms/
  • OUTPUT: An Anthology of Computer-Generated Text by Lillian-Yvonne Bertram and Nick Montfort is a timely book covering a wide range of texts, "from research systems, natural-language generation products and services, and artistic and literary programs." (Bertram, Lillian-Yvonne, and Nick Montfort, editors. Output: An Anthology of Computer-Generated Text, 1953–2023. The MIT Press, 2024.)
  • Hallucinate This! by Mark Marino, https://markcmarino.com/chatgpt/

History of Computing

George Dyson's book Turing's Cathedral documents the history of the general, programmable electronic computer, including the explosion of applications that came with the introduction of this radical new technology, especially under the influence of John von Neumann.

  • George Dyson, Turing's Cathedral: The Origins of the Digital Universe. Pantheon Books, 2012.

AI in the Physical Sciences

More Learning Resources

  • Physicist Florian Marquardt's lecture series.
  • Python Computations in Science and Engineering -- online textbook for Mathematical Modeling of Chemical Engineering Processes at CMU.

More to come!