LLM

Always Under Construction!
99.5% Human Generated Content!
Using em-dash since 1993!

This page collects resources for using large language models (LLMs) in research and teaching -- building RAG pipelines, running local models, performing semantic search, and more.

LLMs have matured enormously since their introduction. Their most promising uses help us solve difficult but adjacent problems in research -- as tools for coding numerical solutions, performing exploratory data analysis, or organizing and cleaning datasets within an open source scientific computing stack built around Python and Jupyter notebooks. In the classroom, promises of individualized tutor chatbots abound, but LLMs can power locally run, teacher-facing analysis tools, too. (See Stan below!)

The transformer neural network architecture underlying LLMs is a transformative technology born of an unexpected convergence of three things: the availability and maturity of highly parallelized processors optimized for matrix calculations (GPUs); the surprising (even to its inventors) performance of the transformer architecture; and the large training datasets made possible by the maturity of the world wide web and broader internet. In many ways, our recent experience with LLMs is analogous to the disruption that came with the introduction of the all-electronic, general programmable computer in the late 1940s. Chemical engineers rapidly adopted that earlier technology to solve challenging modeling problems -- especially partial differential equations, and systems of them, that were intractable or extremely inefficient to solve before the development of machine computing. Just as then, a new interface to computation has arrived: LLMs give us a natural language interface to our computational tools.

One thing that's stayed with me as I've worked with LLMs is that engineers and scientists don't take technical solutions for granted. We generally like to "look under the hood" and see how things work. So, if you are interested in learning more about the technical underpinnings of LLMs, this page collects a few of those resources, too. (Spoiler alert: a Boltzmann-like distribution plays a central role in the architecture of GPTs.)
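That "Boltzmann-like distribution" is the softmax a GPT applies to its output scores (logits), with the sampling temperature playing the role of thermodynamic temperature. A minimal sketch in plain Python, using toy logits rather than output from a real model:

```python
import math

def softmax(logits, temperature=1.0):
    # Same form as a Boltzmann distribution: p_i ~ exp(x_i / T),
    # so the logits play the role of (negative) energies.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

logits = [2.0, 1.0, 0.1]                  # toy next-token scores
cool = softmax(logits, temperature=0.5)   # sharper: favors the top token
hot = softmax(logits, temperature=2.0)    # flatter: more diverse sampling
```

Lowering the temperature concentrates probability on the highest-scoring token; raising it flattens the distribution. That is exactly what the "temperature" knob in an LLM sampling interface controls.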

- Eric Furst

Earlier version of this page (January 2025): LLM2025

Recent work


Stan: An LLM-based thermodynamics course assistant

Emerging AI tools in education are largely student-facing: chatbots that answer questions, tutors that explain concepts, generators that produce practice problems. Instructor-facing tools -- tools that help faculty understand and improve their own teaching -- are far less developed. Stan is an attempt to fill that gap using many of the resources described on this page. Everything runs on local hardware -- a laptop for interactive queries, a GPU workstation for batch processing -- with no cloud APIs, no per-query fees, and full data privacy.

See the arXiv preprint for a full description and the code base on GitHub.

CHEG 667-013 Course Handouts

In Spring 2025, I taught a module on LLMs for an elective course, Chemical Engineering with Computers. These handouts are a practical starting point for anyone wanting to run and experiment with LLMs:

Lecture slides: Attach:cheg667_013_llm_2025.pdf

Talks on LLMs

  • Winter Research Review, January 2025
Presented on January 22, 2025 as the lunch talk for the Department of Chemical and Biomolecular Engineering Winter Research Review. In my talk, I discussed uses of large language models (LLMs), the underlying architecture of a generative pre-trained transformer (GPT), and basic aspects of the mechanics behind training and deploying LLMs.
  • Chemistry Biology Interface program's annual retreat, July 29, 2025
An updated and somewhat complementary talk for UD's Chemistry Biology Interface program's annual retreat.

Hacking with LLMs


"Programming is a skill best acquired by practice and example rather than from books."
- Alan Turing, Programmers' Handbook for Manchester Electronic Computer Mark II, 1951 link

Running LLMs Locally -- Quick Start

Inference -- the text generation we experience with LLMs -- is not terribly computationally expensive. (The main and highly publicized energy use of these models comes from training them.) Capable models in the 7-8 billion parameter range run on consumer-grade laptops and desktops. One of the fastest paths to running a local LLM is Ollama. Once installed, a single command pulls and runs a model:

  ollama run llama3.1:8b

See the full Ollama entry in Access Models and Tools below, and the course handouts for step-by-step walkthroughs.

Build a RAG!

One of the interesting applications of transformers is the ability to catalog and search text semantically. In retrieval augmented generation (RAG), a body of text is used to construct a vector store -- chunks of text encoded in the abstract vector space of a sentence transformer (or more technically, embedded using a sentence transformer). A query, embedded the same way, is then compared against the vector store to find matching text. The search isn't a literal one; it captures the semantic similarity between the query and the stored vectors. The retrieved text is then passed to an LLM to generate the query output. RAG can power such a semantic search or summarize the retrieved documents, and it provides a way to supply an LLM with specific information without fine-tuning the model.
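Stripped to its essentials, the retrieval step is a nearest-neighbor search by cosine similarity. A minimal sketch in plain Python -- the three-dimensional vectors here are hand-made stand-ins; a real pipeline would embed each chunk with a sentence transformer into hundreds of dimensions:

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy vector store mapping text chunks to embeddings (hand-made stand-ins).
store = {
    "Entropy increases in an isolated system.": [0.9, 0.1, 0.0],
    "The Gibbs energy is minimized at equilibrium.": [0.7, 0.4, 0.1],
    "My cat enjoys sitting in boxes.": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    # Rank chunks by similarity to the embedded query; keep the top k.
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]),
                    reverse=True)
    return ranked[:k]

# An embedded query about thermodynamics lands near the first two chunks.
hits = retrieve([0.85, 0.2, 0.05])
```

In a full RAG pipeline, the retrieved chunks are prepended to the user's question in the prompt sent to the LLM.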

LLMs on the CLI

Need to summarize text, but don't want to share it with OpenAI or another business with nebulous and ever-shifting data privacy policies? Local models can handle summarization directly from the command line, letting you combine search and text processing with standard Unix tools on macOS, Linux, etc.
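As one way to wire this up, here is a sketch of a small Python wrapper around the Ollama command line tool. It assumes `ollama` is on your PATH and the model has been pulled; the model name and prompt wording are just examples:

```python
import subprocess
import sys

MODEL = "llama3.1:8b"  # example model; pull it first with `ollama pull`

def build_prompt(text):
    # Keep the instruction and the document together in one prompt string.
    return "Summarize the following text in three sentences:\n\n" + text

def summarize(text, model=MODEL):
    # Pipe the prompt to a local model via the Ollama CLI; no data leaves
    # the machine. Requires the `ollama` binary to be installed.
    result = subprocess.run(
        ["ollama", "run", model],
        input=build_prompt(text),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # Usage: python summarize.py < report.txt
    print(summarize(sys.stdin.read()))
```

Because it reads standard input, the script composes with ordinary shell pipelines, e.g. `cat notes.txt | python summarize.py`.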

Semantic Search

A semantic search over a personal archive and a collection of clippings. It uses vector embeddings and a local LLM to find and synthesize information across more than 1,800 dated text entries spanning 2000-2025, plus a library of PDFs, articles, and web saves. You can run it against the ChatGPT API (or another cloud provider) or with completely local embedding and inference.
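The core of such a tool can be sketched in a few lines: each dated entry carries an embedding, and a query is scored against every entry that passes a date filter. The entries and two-dimensional vectors below are invented stand-ins; a real archive would store sentence-transformer embeddings alongside the text:

```python
from datetime import date

# Invented (date, text, embedding) entries standing in for a real archive.
ARCHIVE = [
    (date(2003, 5, 1), "Notes on colloidal gel rheology.", (0.98, 0.20)),
    (date(2014, 8, 12), "Microrheology data analysis ideas.", (0.90, 0.44)),
    (date(2024, 2, 3), "First run of a local 8B model.", (0.10, 0.99)),
]

def search(query_vec, since=date.min, k=2):
    # With (roughly) unit-normalized vectors, the dot product serves as
    # the cosine similarity; the date filter narrows the candidate set.
    qx, qy = query_vec
    scored = [
        (qx * vx + qy * vy, text)
        for entry_date, text, (vx, vy) in ARCHIVE
        if entry_date >= since
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]
```

The top-k hits (with their dates) are then handed to a local LLM, which synthesizes an answer from the retrieved entries.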

Access Models and Tools


Frameworks and Code

  • Ollama lets you run embedding and text generation with Llama, Phi, Mistral, Gemma, DeepSeek models, and more, locally, from the command line.
  • MIT-licensed library for building LLM agents and workflows with a focus on retrieval augmented generation (RAG).
  • Open LLMs from Meta, available in multiple sizes for local inference. Llama includes multilingual text-only models (1B, 3B), text-image models (11B, 90B), and Llama 3.3 70B.
  • Open-source library for local LLM inference across a wide range of hardware. Includes command line tools and a simple web interface.
  • NanoGPT (Andrej Karpathy) -- Github codebases

Other Resources

  • Online playground for openai/tiktoken -- calculate the number of tokens for a given prompt.
  • ETH Zurich and EPFL released a fully open LLM developed on public infrastructure. Announcement: https://ethz.ch/en/news-and-events/eth-news/news/2025/07/a-language-model-built-for-the-public-good.html

Essential Background


TL;DR

  • Start here: Intro to Large Language Models, by Andrej Karpathy.
  • And here: two clips from Physics, AI, and the Future of Discovery.
  • Or jump off the high board: Dive Into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola.

Key References

  • Attention Is All You Need -- the paper that introduced the transformer architecture.
A. Vaswani et al., Attention Is All You Need, NeurIPS 2017, pp. 6000–6010.
Link: https://dl.acm.org/doi/10.5555/3295222.3295349
  • The concept of attention was first introduced in Bahdanau, Cho, and Bengio, Neural machine translation by jointly learning to align and translate, ICLR 2015.
Link: https://arxiv.org/pdf/1409.0473
  • The generative decoder-only architecture: Liu et al., Generating Wikipedia by Summarizing Long Sequences, ICLR 2018.
Link: https://openreview.net/pdf?id=Hyg0vbWC-
  • GPT-2 -- Radford et al., Language Models are Unsupervised Multitask Learners, OpenAI blog, 2019.
Link: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
  • GPT-3 -- Brown et al., Language Models Are Few-Shot Learners, NeurIPS 2020.
Link: https://dl.acm.org/doi/abs/10.5555/3495724.3495883
  • GPT-4 Technical Report -- OpenAI, 2023. arXiv:2303.08774
  • Karpathy's nanoGPT -- the simplest implementation generates fake Shakespeare. More advanced models can be fine-tuned using GPT-2 checkpoints. It's instructive to contrast the raw generative output with a fine-tuned chatbot like ChatGPT.

Left: the transformer (encoder and decoder) from Attention Is All You Need. Right: Karpathy's nanoGPT visualized with Bycroft's tool.

Related Reading (and Watching)


General interest

  • Gideon Lewis-Kraus, "What Is Claude? Anthropic Doesn’t Know, Either," New Yorker, February 9, 2026.

Ethics of AI

  • Louis Menand, "Is A.I. the Death of I.P.?", New Yorker, January 15, 2024.
  • Grotti, Meg, et al. Summary and Recommendations from Spring 2024 Faculty Interviews in the Making AI Generative for Higher Education Project. University of Delaware, 2024.

Problems and Pitfalls

Melanie Mitchell reminds us that the solutions to alignment problems are not obvious, and that AI literacy is a commonsense first step:

  • Melanie Mitchell, Why AI chatbots lie to us, Science.
Link: https://doi.org/10.1126/science.aea3922

Energy and Resource Use

Artistic and Literary Practices

When we first experiment with GPT-based LLMs, it's fascinating to experience a machine generating text with such high fidelity. But interest in machine-generated or "generative" text dates almost to the beginning of the modern computer era, and many experiments, spanning contexts from AI research to artistic and literary practice, have been shared over the intervening decades.

  • Christopher Strachey's program, often referred to as Love Letters, was written in 1952 for the Manchester Mark I computer. It is considered by many to be the first example of generative computer literature. In 2009, David Link ran Strachey's original code on an emulated Mark I, and Nick Montfort, professor of digital media at MIT, coded a modern recreation of it in 2014. The text output follows the pattern "you are my [adjective] [noun]. my [adjective] [noun] [adverb] [verbs] your [adjective] [noun]," signed by "M.U.C." for the Manchester University Computer. With the vocabulary in the program, there are over 300 billion possible combinations.
To experience the poem in a modern browser, see Nick Montfort's recreation:
https://nickm.com/memslam/love_letters.html
Wikipedia page on Strachey's algorithm: https://en.wikipedia.org/wiki/Strachey_love_letter_algorithm.
Siobhan Roberts' article on Strachey's Love Letters: https://www.newyorker.com/tech/annals-of-technology/christopher-stracheys-nineteen-fifties-love-machine
David Link's Love Letters installation -- https://alpha60.de/art/love_letters/
My remix for the GPU era -- https://ef1j.org/glitched/love-llms/
  • OUTPUT: An Anthology of Computer-Generated Text by Lillian-Yvonne Bertram and Nick Montfort is a timely book covering a wide range of texts, "from research systems, natural-language generation products and services, and artistic and literary programs." (Bertram, Lillian-Yvonne, and Nick Montfort, editors. Output: An Anthology of Computer-Generated Text, 1953–2023. The MIT Press, 2024.)
  • Hallucinate This! by Mark Marino, https://markcmarino.com/chatgpt/

History of Computing

George Dyson's book Turing's Cathedral documents the history of the general, programmable electronic computer, including the explosion of applications that came with the introduction of this radical new technology, especially under the influence of John von Neumann.

  • George Dyson, Turing's Cathedral: The Origins of the Digital Universe. Pantheon Books, 2012.

AI in the Physical Sciences

More Learning Resources

  • Physicist Florian Marquardt's lecture series.
  • Python Computations in Science and Engineering -- online textbook for Mathematical Modeling of Chemical Engineering Processes at CMU.

More to come!