LLM2025
2025 Winter Research Review Tech Talk
Under Construction! -- Check back for updates!
Presented on January 22, 2025, as the lunch talk for the Department of Chemical and Biomolecular Engineering Winter Research Review. In my talk, I discussed uses of large language models (LLMs), the underlying architecture of a generative pre-trained transformer (GPT), and basic aspects of the mechanics behind training and deploying LLMs.
This was on my mind: engineers don't take technical solutions for granted. We generally like to "look under the hood" and see how things work. The talk was largely inspired by the rapid adoption of LLMs to help us solve difficult but adjacent problems in our research. So, if you are interested in learning more about the technical underpinnings of LLMs, this page collects a few resources.
In my talk, I didn't get into the details of how one goes from a single attention mechanism to "multi-head" attention, which is an important feature of modern LLMs. I also did not emphasize the fine-tuning step, in which the basic generative text function of a GPT is built up into the powerful chatbots that many of us use. Those are topics worth exploring in greater depth.
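As a starting point for that exploration, here is a minimal sketch of how the single scaled dot-product attention operation is extended to multiple heads, written in PyTorch in the spirit of Karpathy's nanoGPT (linked below). The dimensions, the causal mask, and the class name are illustrative choices for this sketch, not the configuration of any particular model.

```python
# Minimal multi-head causal self-attention (illustrative sketch, not a production model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # One linear layer produces queries, keys, and values for all heads at once.
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)  # mixes the heads back together

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape                        # batch, sequence length, model width
        q, k, v = self.qkv(x).split(C, dim=-1)
        # Reshape so each head attends independently over its own slice of the width.
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention with a causal mask (each token sees only the past).
        scores = (q @ k.transpose(-2, -1)) / (self.d_head ** 0.5)
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        scores = scores.masked_fill(~mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        out = weights @ v                        # (B, n_heads, T, d_head)
        out = out.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

# Example: 2 sequences of 8 tokens, each token a 64-dimensional embedding.
x = torch.randn(2, 8, 64)
print(MultiHeadSelfAttention()(x).shape)         # torch.Size([2, 8, 64])
```

In a full GPT, blocks like this are stacked with feed-forward layers, residual connections, and layer normalization; Karpathy's videos below walk through those remaining pieces step by step.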
Overall, I view GPTs as a transformative technology, in many ways analogous to the disruption that came with the introduction of the all-electronic, general programmable computer in the late 1940s. Chemical engineers rapidly adopted that earlier technology to solve challenging modeling problems -- especially partial differential equations, and systems of these equations, that were intractable or extremely inefficient to solve before the development of machine computing. Likewise, LLMs will give us new ways to use our computational tools through natural language, help us rapidly come up to speed in a new area, and let us quickly develop and analyze models and data with code.
Essential background
TL;DR
References and citations
These are several of the key references and resources that I cited in my talk.
- Attention Is All You Need -- this is the paper that introduced the transformer architecture. It's interesting to go back to the source. The transformer architecture discussed in the paper incorporates both encoder and decoder functions because the authors were testing its performance on machine translation tasks. Applying the architecture to other natural language processing tasks, notably language modeling and text generation via unsupervised pretraining and autoregressive generation (as in GPT), was a major subsequent innovation.
Link: https://dl.acm.org/doi/10.5555/3295222.3295349
- Andrej Karpathy posts videos on YouTube that teach basic implementations of GPTs.
Karpathy's NanoGPT video shows you how to build a GPT, step-by-step: https://www.youtube.com/watch?v=kCc8FmEb1nY
- OpenAI 2023, GPT-4 Technical Report, arXiv:2303.08774
- Grotti, Meg, et al. Summary and Recommendations from Spring 2024 Faculty Interviews in the Making AI Generative for Higher Education Project. University of Delaware, Library, Museums and Press, Center for Teaching & Assessment of Learning, IT-Academic Technology Services, and School of Education, 2024.
Applications in the physical sciences
I recommend to my students that they watch this roundtable discussion hosted by the American Institute of Physics Foundation in April 2024, Physics, AI, and the Future of Discovery. In that event, Prof. Jesse Thaler (MIT) provided some especially insightful (and sometimes funny) remarks on the role of AI in the physical sciences -- including the April Fools' joke, ChatJesseT. Below are links to his segments if you're short on time:
- Hallucinate This! by Mark Marino, https://markcmarino.com/chatgpt/
Models and tools
The ecosystem of LLMs continues to grow. Many of us are familiar with proprietary LLMs through applications like OpenAI's ChatGPT, Anthropic's Claude, and Microsoft's Copilot, but a number of open models are available to download and experiment with. Some models include information about the training dataset. A short sketch of querying a locally hosted model from Python follows the list below.
- NanoGPT (Andrej Karpathy) -- GitHub codebases, e.g., https://github.com/karpathy/nanoGPT
- Llama (Meta) -- https://llama.com
- Ollama -- https://ollama.com
- llama.cpp -- https://github.com/ggerganov/llama.cpp
- Chatbot Leaderboard -- https://lmarena.ai
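To make the invitation to experiment concrete, here is a minimal sketch of prompting a locally hosted open model through Ollama's HTTP interface. It assumes an Ollama server is running on its default port and that a model has already been pulled (the model name "llama3.2" is an illustrative choice); the endpoint and field names follow Ollama's documented API but may vary across versions.

```python
# Query a local model served by Ollama over its HTTP API.
# Assumes `ollama serve` is running and a model has been pulled,
# e.g. with `ollama pull llama3.2` (model name is illustrative).
import requests

reply = requests.post(
    "http://localhost:11434/api/generate",   # Ollama's default local endpoint
    json={
        "model": "llama3.2",
        "prompt": "In two sentences, what is multi-head attention?",
        "stream": False,                      # return one JSON object, not a token stream
    },
    timeout=120,
)
reply.raise_for_status()
print(reply.json()["response"])               # the generated text
```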
Related reading
Ethics of AI
In discussions concerning the ethics of AI, and LLMs in particular, questions around at least two major topics frequently appear: intellectual property and resource use, including electricity and water. (I'd love to have more suggestions here as I work to expand this section.)
- Louis Menand discusses the relationship between AI and intellectual property in "Is A.I. the Death of I.P.?", New Yorker, January 15, 2024.
Many articles in the daily news cite the energy use of LLMs, sometimes with drastic predictions such as the imminent collapse of the electrical grid. But how do LLM training and use compare to other digital activities, like search, streaming, and cryptocurrencies?
- The International Energy Agency's Electricity 2025 covers several relevant topics. For instance, growing electricity demand in the US and other mature economies is driven in part by data centers, but also by new electric vehicles, air conditioners, and heat pumps.
- Lawrence Livermore National Laboratory's Energy Flow Charts is a useful resource for understanding US energy use and waste.
History of computers and computational tools
- George Dyson's book Turing's Cathedral documents the history of the general, programmable electronic computer, including the explosion of applications that came with the introduction of this radical new technology, especially under the influence of John von Neumann.
Artistic and literary practices
When you first experiment with GPT-based LLMs, it's fascinating to watch a machine generate text with such high fidelity! But interest in machine-generated or "generative" text dates almost to the beginning of the modern computer era. Many experiments, spanning contexts from AI research to artistic and literary practice, have been shared over the intervening decades. Mark Marino's book cited above is a recent example in this area.
- Christopher Strachey's program, often referred to as Love Letters, was written in 1952 for the Manchester Mark I computer. It is considered by many to be the first example of generative computer literature. In 2009, David Link ran Strachey's original code on an emulated Mark I, and Nick Montfort, professor of digital media at MIT, coded a modern recreation of it in 2014. The text output follows the pattern "you are my [adjective] [noun]. my [adjective] [noun] [adverb] [verbs] your [adjective] [noun]," signed by "M.U.C." for the Manchester University Computer. With the vocabulary in the program, there are over 300 billion possible combinations. A small code sketch of this template appears at the end of this section.
https://strachey-love-letters.glitch.me
Siobhan Roberts' article on Strachey's Love Letters: https://www.newyorker.com/tech/annals-of-technology/christopher-stracheys-nineteen-fifties-love-machine
David Link's Love Letters installation -- https://alpha60.de/art/love_letters/
- OUTPUT: An Anthology of Computer-Generated Text by Lillian-Yvonne Bertram and Nick Montfort is a timely book covering a wide range of texts, "from research systems, natural-language generation products and services, and artistic and literary programs." (Bertram, Lillian-Yvonne, and Nick Montfort, editors. Output: An Anthology of Computer-Generated Text, 1953–2023. The MIT Press, 2024.)
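Returning to Strachey's template: here is a tiny sketch that fills in the quoted pattern with random word choices. The word lists are invented placeholders, not Strachey's actual vocabulary, so this only illustrates the combinatorial idea behind the roughly 300 billion possible letters.

```python
# Fill a Strachey-style template with random words (illustrative vocabulary only).
import random

ADJECTIVES = ["dear", "tender", "wistful", "beautiful", "darling"]
NOUNS = ["heart", "desire", "affection", "fancy", "longing"]
ADVERBS = ["fondly", "keenly", "tenderly", "eagerly"]
VERBS = ["treasures", "adores", "cherishes", "desires"]

def love_letter() -> str:
    """Return one letter following the pattern quoted above, signed M.U.C."""
    return (
        f"you are my {random.choice(ADJECTIVES)} {random.choice(NOUNS)}. "
        f"my {random.choice(ADJECTIVES)} {random.choice(NOUNS)} "
        f"{random.choice(ADVERBS)} {random.choice(VERBS)} "
        f"your {random.choice(ADJECTIVES)} {random.choice(NOUNS)}.\n"
        "yours, M.U.C."
    )

print(love_letter())
```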
For instructors
LLM performance in technical subjects
- In addition to the OpenAI GPT-4 Technical Report, which evaluates the model's performance on standardized exams, the MIT Teaching + Learning Lab published this evaluation of ChatGPT-4's responses to thermodynamics problems:
https://tll.mit.edu/chatgpt-4-questions-from-a-materials-thermodynamics-course/
Course policies
Here is text that I published on our Fall 2024 CHEG 231 Canvas site: Using AI: tools, tips, and guidelines. Instructors: feel free to download the HTML and use it as a starting point for your own course. Suggestions for how to improve it are welcome!
Here is a diagram of the transformer architecture (encoder and decoder) from Attention Is All You Need.
More to come!