LLM2025

2025 Winter Research Review Tech Talk (beta)

Under Construction! -- Check back for updates!

Presented on January 22, 2025 as the lunch talk for the Department of Chemical and Biomolecular Engineering Winter Research Review. In my talk, I discussed uses of large language models (LLMs), the underlying architecture of a generative pre-trained transformer (GPT), and basic aspects of the mechanics behind training and deploying LLMs.

This was on my mind: engineers don't take technical solutions for granted. We generally like to "look under the hood" and see how things work. So, if you are interested in learning more about the technical underpinnings of LLMs, this page collects a few resources. The talk itself was largely inspired by the rapid adoption of LLMs to help us solve difficult but adjacent problems in our research.

In my talk, I didn't get into the details of how one goes from the single attention mechanism to "multi-head" attention, which is an important feature of modern LLMs. I also did not emphasize the step of fine-tuning models and how the basic text-generation function of a GPT is turned into the powerful chatbots that many of us use. Those are topics worthy of exploring in greater depth.
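
If you would like to see the jump from a single attention head to multi-head attention in code, here is a minimal PyTorch sketch of my own (not material from the talk); the tensor shapes, random weight matrices, and the omission of the causal mask are simplifications for illustration only.

```python
import torch
import torch.nn.functional as F

def single_head_attention(q, k, v):
    """Scaled dot-product attention for one head.
    q, k, v have shape (..., seq_len, head_dim).
    (The causal mask used in GPTs is omitted here for brevity.)"""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5       # (..., seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)             # attention weights
    return weights @ v                              # weighted sum of the values

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Run n_heads copies of single-head attention in parallel, then concatenate.
    x: (batch, seq_len, d_model); each w_* is a (d_model, d_model) projection."""
    batch, seq_len, d_model = x.shape
    head_dim = d_model // n_heads

    # Project the input, then split the model dimension into n_heads pieces.
    def split(t):
        return t.view(batch, seq_len, n_heads, head_dim).transpose(1, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)    # (batch, heads, seq, head_dim)
    out = single_head_attention(q, k, v)                        # the same attention, per head
    out = out.transpose(1, 2).reshape(batch, seq_len, d_model)  # concatenate the heads
    return out @ w_o                                            # final output projection

# Tiny smoke test with random weights.
d_model, n_heads = 16, 4
x = torch.randn(2, 5, d_model)
w_q, w_k, w_v, w_o = (torch.randn(d_model, d_model) / d_model**0.5 for _ in range(4))
print(multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads).shape)  # torch.Size([2, 5, 16])
```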

Overall, I view GPTs as a transformative technology, in many ways analogous to the disruption that came with the introduction of the all-electronic general programmable computer in the late 1940s. Chemical engineers rapidly adopted that earlier technology to solve challenging modeling problems, especially partial differential equations and systems of them, which had been intractable or extremely inefficient to solve before machine computing. Similarly, LLMs will give us new ways to use our computational tools through natural language, help us come up to speed rapidly in a new field or area, and let us quickly develop and analyze models and data with code.

Essential background

TL;DR

Start here:
Intro to Large Language Models, by Andrej Karpathy https://www.youtube.com/watch?v=zjkBMFhNj_g
and here -- two clips from Physics, AI, and the Future of Discovery:

References and citations

  • Attention Is All You Need -- this is the paper that introduced the transformer architecture.
    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention Is All You Need, in Proceedings of the 31st International Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY, USA, 2017), pp. 6000–6010.
At the heart of an LLM like ChatGPT is the transformer -- a neural architecture built from self-attention and feed-forward layers. Karpathy's video goes through each module and shows how to implement it in code (a minimal sketch of one transformer block also appears after this reference list). Try it out! See below for the code repositories. The simplest implementation generates fake Shakespeare! More advanced models can be fine-tuned from GPT-2 checkpoints. It is interesting to experience the raw, generative output of a GPT-based LLM and contrast its performance with highly fine-tuned chatbot applications like ChatGPT.
Also see his overview of LLMs, Intro to Large Language Models: https://www.youtube.com/watch?v=zjkBMFhNj_g
  • OpenAI 2023, GPT-4 Technical Report, arXiv:2303.08774
    I cited Figure 4, on GPT performance on academic and professional exams. My thought is that this figure provides a point of discussion with classes on why LLMs may not be ready to answer questions requiring specialized domain knowledge, like Chemical Engineering Thermodynamics. That limitation, however, is changing rapidly.
  • Grotti, Meg, et al. Summary and Recommendations from Spring 2024 Faculty Interviews in the Making AI Generative for Higher Education Project. University of Delaware, Library, Museums and Press, Center for Teaching & Assessment of Learning, IT-Academic Technology Services, and School of Education, 2024.
Results from interviews with 18 UD faculty, conducted in spring 2024, that focused broadly on three topics: (a) the impact of generative AI on teaching and learning, (b) the impact of generative AI on research, and (c) faculty support needs related to generative AI.
  • Applications in the physical sciences
    I recommend to my students that they watch this roundtable discussion hosted by the American Institute of Physics Foundation in April 2024, Physics, AI, and the Future of Discovery. In that event, Prof. Jesse Thaler (MIT) provided some especially insightful (and sometimes funny) remarks on the role of AI in the physical sciences -- including the April Fools' joke, ChatJesseT. Below are links to his segments if you're short on time:
The "Authoritized Autobotography" of ChatGPT. Marino's (?) book is funny and it provides insight into the mechanics of prompting an LLM.

Models and tools

The ecosystem of LLMs continues to grow. Many of us are familiar with proprietary LLMs through applications like OpenAI's ChatGPT, Anthropic's Claude, and Microsoft's Copilot, but a number of open models are available to download and experiment with. Some models include information about the training dataset.

  • NanoGPT (Andrej Karpathy) -- GitHub codebases for the build-a-GPT-from-scratch exercise referenced above.
  • Llama -- a set of open LLMs from Meta, available in different sizes, that can be downloaded and used locally.
    Meta's description: "Llama includes multilingual text-only models (1B, 3B), including quantized versions, text-image models (11B, 90B), and the Llama 3.3 70B model offering similar performance to the Llama 3.1 405B model, allowing developers to achieve greater quality and performance on text-based applications at a fraction of the cost."
  • Ollama -- lets you run Llama 3.3, Phi 4, Mistral, Gemma 2, and other models locally from the command line (see the Python sketch after this list).
  • An open-source software library that performs inference on LLMs locally; it includes command-line tools and a simple web interface.
  • A site that tracks the LLM / chatbot ecosystem and compares model performance using an Elo-like rating methodology.
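
As a quick way to experiment, here is a small Python sketch that queries a locally running Ollama server through its REST endpoint. The port (11434) is Ollama's default, and the model name and prompt are placeholders of my own -- adjust them to whatever you have installed.

```python
import json
import urllib.request

# A minimal sketch: query a locally running Ollama server over its REST API.
# Assumptions: Ollama is installed, listening on its default port (11434),
# and a model has already been pulled, e.g. `ollama pull llama3.2`.
def ask_local_llm(prompt, model="llama3.2"):
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,          # return one JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_local_llm("In one sentence, what is a transformer in machine learning?"))
```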

Related reading

Ethics of AI

In discussions concerning the ethics of AI, and LLMs in particular, questions around at least two major topics frequently appear: intellectual property and resource use, including electricity and water. (I'd love to have more suggestions here as I work to expand this section.)

  • Louis Menand discusses the relationship between AI and intellectual property in "Is A.I. the Death of I.P.?", New Yorker, January 15, 2024.

History of computers and computational tools

  • George Dyson's book Turing's Cathedral documents the history of the general, programmable electronic computer, including the explosion of applications that came with the introduction of this radical new technology, especially under the influence of John von Neumann.
Dyson, George. Turing’s Cathedral: The Origins of the Digital Universe. Pantheon Books, 2012.

Artistic and literary practices

When first experimenting with GPT-based LLMs, it's fascinating to experience a machine generating text with such high fidelity! But interest in machine-generated, or "generative," text dates almost to the beginning of the modern computer era. Many experiments, spanning contexts from AI research to artistic and literary practices, have been shared over the intervening decades. Mark Marino's book cited above is a recent example in this area.

  • Christopher Strachey's program, often referred to as Love Letters, was written in 1952 for the Manchester Mark I computer. It is considered by many to be the first example of generative computer literature. In 2009, David Link ran Strachey's original code on an emulated Mark I, and Nick Montfort, professor of digital media at MIT, coded a modern recreation of it in 2014. The text output follows the pattern "you are my [adjective] [noun]. my [adjective] [noun] [adverb] [verbs] your [adjective] [noun]," signed by "M.U.C." for the Manchester University Computer (a toy Python recreation follows this list). With the vocabulary in the program, there are over 300 billion possible combinations.
To experience the poem in a modern browser, see Nick Montfort's recreation as implemented in Glitch by Mark Sample:
https://strachey-love-letters.glitch.me
Wikipedia page on Strachey's algorithm: https://en.wikipedia.org/wiki/Strachey_love_letter_algorithm.
Siobhan Roberts' article on Strachey's Love Letters: https://www.newyorker.com/tech/annals-of-technology/christopher-stracheys-nineteen-fifties-love-machine
David Link's Love Letters installation -- https://alpha60.de/art/love_letters/
  • OUTPUT: An Anthology of Computer-Generated Text by Lillian-Yvonne Bertram and Nick Montfort is a timely book covering a wide range of texts, "from research systems, natural-language generation products and services, and artistic and literary programs." (Bertram, Lillian-Yvonne, and Nick Montfort, editors. Output: An Anthology of Computer-Generated Text, 1953–2023. The MIT Press, 2024.)
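
To make the Strachey template concrete, here is a toy Python recreation of the pattern described above; the word lists are small placeholders of my own, not Strachey's original vocabulary, so it only illustrates the combinatorial idea.

```python
import random

# Toy recreation of the Strachey love-letter template described above.
# These word lists are small placeholders, not Strachey's original vocabulary.
ADJECTIVES = ["darling", "precious", "tender", "wistful", "loving"]
NOUNS = ["heart", "longing", "affection", "desire", "fancy"]
ADVERBS = ["keenly", "tenderly", "fondly", "ardently"]
VERBS = ["cherishes", "treasures", "yearns for", "adores"]

def love_letter():
    line1 = f"You are my {random.choice(ADJECTIVES)} {random.choice(NOUNS)}."
    line2 = (f"My {random.choice(ADJECTIVES)} {random.choice(NOUNS)} "
             f"{random.choice(ADVERBS)} {random.choice(VERBS)} "
             f"your {random.choice(ADJECTIVES)} {random.choice(NOUNS)}.")
    return "\n".join([line1, line2, "Yours, M.U.C."])

print(love_letter())
```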

For instructors

Course policies

Here is text that I published on our Fall 2024 CHEG 231 Canvas site: Using AI: tools, tips, and guidelines. Instructors: feel free to download the HTML and use it as a starting point for your own course.

Here is a picture of the transformer (encoder and decoder) from Attention Is All You Need.

More to come!