2025 Winter Research Review Tech Talk (beta)
Under Construction! -- Check back for updates!
Presented on January 22, 2025 as the lunch talk for the Department of Chemical and Biomolecular Engineering Winter Research Review. In my talk, I discussed uses of large language models (LLMs), the underlying architecture of a generative pre-trained transformer (GPT), and basic aspects of the mechanics behind training and deploying LLMs.
The talk was largely inspired by the rapid adoption of LLMs to help us solve difficult but adjacent problems in our research. Also on my mind: engineers don't take technical solutions for granted. We generally like to "look under the hood" and see how things work. So, if you are interested in learning more about the technical underpinnings of LLMs, this page collects a few resources.
In my talk, I didn't get into the details of how one goes from the single attention mechanism to "multi-head" attention, an important feature of modern LLMs (a minimal sketch of the computation appears below). I also did not emphasize the fine-tuning step, by which the basic generative-text function of a GPT is built into the powerful chatbots that many of us use. Those are topics worth exploring in greater depth.
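For readers who want to see the mechanics concretely, here is a minimal NumPy sketch of scaled dot-product attention and of how multiple heads are formed by splitting the embedding into subspaces, attending in each, and recombining. The dimensions, head count, and random weight matrices are illustrative placeholders, not values from any real model, and the causal mask used in GPT-style decoders is omitted for brevity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)        # (heads, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)           # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over key positions
    return weights @ V                                      # (heads, seq, d_head)

def multi_head_attention(x, n_heads=4, seed=0):
    """Project x into n_heads subspaces, attend in each, and recombine."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    rng = np.random.default_rng(seed)
    # In a trained model these four projection matrices are learned; random here.
    W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                          for _ in range(4))

    def project(W):
        # (seq, d_model) -> (n_heads, seq, d_head)
        return (x @ W).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = project(W_q), project(W_k), project(W_v)
    heads = scaled_dot_product_attention(Q, K, V)           # each head attends independently
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                      # final output projection

tokens = np.random.default_rng(1).standard_normal((8, 64))  # 8 "tokens", d_model = 64
print(multi_head_attention(tokens).shape)                    # -> (8, 64)
```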
Overall, I view GPTs as a transformative technology, in many ways analogous to the disruption that came with the introduction of the all-electronic, general-purpose programmable computer in the late 1940s. Chemical engineers rapidly adopted that earlier technology to solve challenging modeling problems, especially partial differential equations and systems of them that had been intractable or extremely inefficient to solve before machine computing. Similarly, LLMs will give us new ways to use our computational tools through natural language, help us come up to speed rapidly in a new field or area, and let us quickly develop and analyze models and data with code.
Essential background
TL;DR
References and citations
- Attention Is All You Need -- this is the paper that introduced the transformer architecture.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention Is All You Need, in Proceedings of the 31st International Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY, USA, 2017), pp. 6000–6010.
- Andrej Karpathy posts videos on YouTube that teach basic implementations of GPTs.
Karpathy's NanoGPT video shows you how to build a GPT, step-by-step: https://www.youtube.com/watch?v=kCc8FmEb1nY
- OpenAI 2023, GPT-4 Technical Report, arXiv:2303.08774
I cited Figure 4, GPT performance on academic and professional exams. I find it a useful point of discussion with classes on why LLMs may not yet be ready to answer questions that require specialized domain knowledge, like Chemical Engineering Thermodynamics. This possible limitation is changing rapidly.
- Grotti, Meg, et al. Summary and Recommendations from Spring 2024 Faculty Interviews in the Making AI Generative for Higher Education Project. University of Delaware, Library, Museums and Press, Center for Teaching & Assessment of Learning, IT-Academic Technology Services, and School of Education, 2024.
- Applications in the physical sciences
I recommend that my students watch Physics, AI, and the Future of Discovery, a roundtable discussion hosted by the American Institute of Physics Foundation in April 2024. In that event, Prof. Jesse Thaler (MIT) provided some especially insightful (and sometimes funny) remarks on the role of AI in the physical sciences -- including the April Fools' joke, ChatJesseT. Below are links to his segments if you're short on time:
- Hallucinate This! by Mark Marino, https://markcmarino.com/chatgpt/
Models and tools
The ecosystem of LLMs continues to grow. Many of us are familiar with proprietary LLMs through applications like OpenAI's ChatGPT, Anthropic's Claude, and Microsoft's Copilot, but a number of open models are available to download and experiment with. Some models include information about the training dataset. (A minimal example of querying a locally hosted model appears after this list.)
- NanoGPT (Andrej Karpathy) -- GitHub codebases
- Llama (Meta) -- https://llama.com
- Ollama -- https://ollama.com
- llama.cpp -- https://github.com/ggerganov/llama.cpp
- Chatbot Leaderboard -- https://lmarena.ai
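If you want to experiment with one of these open models locally, a tool like Ollama serves them through a simple HTTP API. The sketch below is a minimal Python example; it assumes Ollama is already running on its default port (11434) and that the model named here has been pulled ("llama3.2" is only a placeholder -- substitute whatever model you have installed).

```python
import json
import urllib.request

def ask_local_model(prompt, model="llama3.2"):
    """Send one prompt to a locally running Ollama server and return its reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        "http://localhost:11434/api/generate",      # Ollama's default generate endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

print(ask_local_model("In one sentence, what does a transformer's attention layer do?"))
```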
Related reading
Ethics of AI
In discussions concerning the ethics of AI, and LLMs in particular, questions around at least two major topics frequently appear: intellectual property and resource use, including electricity and water. (I'd love to have more suggestions here as I work to expand this section.)
- Louis Menand discusses the relationship between AI and intellectual property in "Is A.I. the Death of I.P.?", New Yorker, January 15, 2024.
History of computers and computational tools
- George Dyson's book Turing's Cathedral documents the history of the general-purpose, programmable electronic computer, including the explosion of applications that came with the introduction of this radical new technology, especially under the influence of John von Neumann.
Artistic and literary practices
When first experimenting with GPT-based LLMs, it's fascinating to experience a machine generating text with such high fidelity! But interest in machine-generated, or "generative," text dates almost to the beginning of the modern computer era. Many experiments, spanning contexts from AI research to artistic and literary practice, have been shared over the intervening decades. Mark Marino's book cited above is a recent example in this area.
- Christopher Strachey's program, often referred to as Love Letters, was written in 1952 for the Manchester Mark I computer. It is considered by many to be the first example of generative computer literature. In 2009, David Link ran Strachey's original code on an emulated Mark I, and Nick Montfort, professor of digital media at MIT, coded a modern recreation of it in 2014. The text output follows the pattern "you are my [adjective] [noun]. my [adjective] [noun] [adverb] [verbs] your [adjective] [noun]," signed by "M.U.C." for the Manchester University Computer. With the vocabulary in the program, there are over 300 billion possible combinations. (A toy recreation of this template pattern appears after this list.)
https://strachey-love-letters.glitch.me
Siobhan Roberts' article on Strachey's Love Letters: https://www.newyorker.com/tech/annals-of-technology/christopher-stracheys-nineteen-fifties-love-machine
David Link's Love Letters installation -- https://alpha60.de/art/love_letters/
- OUTPUT: An Anthology of Computer-Generated Text by Lillian-Yvonne Bertram and Nick Montfort is a timely book covering a wide range of texts, "from research systems, natural-language generation products and services, and artistic and literary programs." (Bertram, Lillian-Yvonne, and Nick Montfort, editors. Output: An Anthology of Computer-Generated Text, 1953–2023. The MIT Press, 2024.)
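As a small illustration of how simple Strachey's underlying method is compared with a modern LLM, here is a toy Python recreation of the template pattern described above. The word lists are placeholders chosen for illustration, not Strachey's original vocabulary.

```python
import random

# Placeholder word lists -- not Strachey's original vocabulary.
ADJECTIVES = ["beautiful", "precious", "tender", "darling"]
NOUNS = ["desire", "fancy", "heart", "longing"]
ADVERBS = ["anxiously", "wistfully", "keenly", "fondly"]
VERBS = ["treasures", "clings to", "longs for", "adores"]

def love_letter():
    """Fill the template: you are my [adj] [noun]. my [adj] [noun] [adverb] [verbs] your [adj] [noun]."""
    pick = random.choice
    return (f"You are my {pick(ADJECTIVES)} {pick(NOUNS)}. "
            f"My {pick(ADJECTIVES)} {pick(NOUNS)} {pick(ADVERBS)} {pick(VERBS)} "
            f"your {pick(ADJECTIVES)} {pick(NOUNS)}.\n"
            "Yours, M.U.C.")

print(love_letter())
```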
For instructors
Course policies
Here is text that I published on our Fall 2024 CHEG 231 Canvas site: Using AI: tools, tips, and guidelines. Instructors: feel free to download the HTML and use it as a starting point for your own course.
Here is a picture of the transformer (encoder and decoder) from Attention Is All You Need.
More to come!