LLM2025

2025 Winter Research Review Tech Talk

Under Construction! -- Check back for updates!

Presented on January 22, 2025 as the lunch talk for the Department of Chemical and Biomolecular Engineering Winter Research Review. In my talk, I discussed uses of large language models (LLMs), the underlying architecture of a generative pre-trained transformer (GPT), and basic aspects of the mechanics behind training and deploying LLMs.

This was on my mind: engineers don't take technical solutions for granted. We generally like to "look under the hood" and see how things work. So, if you are interested in learning more about the technical underpinnings of LLMs, this page collects a few resources. The talk was largely inspired by the rapid adoption of LLMs to help us solve difficult but adjacent problems in our research.

In my talk, I didn't get into the details of how one goes from a single attention mechanism to "multi-head" attention, which is an important feature of modern LLMs. I also did not emphasize the fine-tuning step, in which the basic generative text function of a GPT is built up into the powerful chatbots that many of us use. Those are topics worth exploring in greater depth; a minimal sketch of multi-head attention follows below.
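To make the first point a little more concrete, here is a minimal sketch of multi-head self-attention in PyTorch. The idea is that the embedding is split into several smaller "heads," each head performs scaled dot-product attention in parallel, and the head outputs are concatenated and mixed by a final linear layer. The class name and dimensions are my own choices and dropout is omitted; this is an illustration in the spirit of the nanoGPT code linked below, not a copy of it.

    # Minimal multi-head self-attention in PyTorch (illustrative sketch).
    # Each "head" performs scaled dot-product attention over a slice of the
    # embedding; the heads run in parallel, and their outputs are concatenated
    # and mixed by a final linear projection.
    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadSelfAttention(nn.Module):
        def __init__(self, d_model: int, n_heads: int, block_size: int):
            super().__init__()
            assert d_model % n_heads == 0
            self.n_heads = n_heads
            self.d_head = d_model // n_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)   # queries, keys, values
            self.proj = nn.Linear(d_model, d_model)       # mixes the concatenated heads
            # causal mask: each token may only attend to itself and earlier tokens
            self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            B, T, C = x.shape                             # batch, tokens, embedding dim
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # reshape so each head attends over its own d_head-dimensional slice
            q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            att = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
            att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
            att = F.softmax(att, dim=-1)
            y = att @ v                                   # (B, n_heads, T, d_head)
            y = y.transpose(1, 2).contiguous().view(B, T, C)
            return self.proj(y)

    # Example: 8 heads over a 128-dimensional embedding, sequences up to 32 tokens
    attn = MultiHeadSelfAttention(d_model=128, n_heads=8, block_size=32)
    out = attn(torch.randn(2, 32, 128))                   # output shape: (2, 32, 128)

A single-head version is the same code with n_heads set to 1; the multi-head form simply lets the model attend to different kinds of relationships between tokens at the same time.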

Overall, I view GPTs as a transformative technology, in many ways analogous to the disruption that came with the introduction of the all-electronic, general-purpose programmable computer in the late 1940s. Chemical engineers rapidly adopted that earlier technology to solve challenging modeling problems -- especially partial differential equations and systems of such equations that were intractable or extremely inefficient to solve before the development of machine computing. Similarly, LLMs will give us new ways to use our computational tools through natural language, help us rapidly come up to speed in a new area, and let us quickly develop and analyze models and data with code.

- Eric Furst

Essential background

TL;DR

Start here -- Intro to Large Language Models, by Andrej Karpathy: https://www.youtube.com/watch?v=zjkBMFhNj_g
and here -- two clips from Physics, AI, and the Future of Discovery (see the Applications in the physical sciences item below).

References and citations

These are several of the key references and resources that I cited in my talk.

  • Attention Is All You Need -- this is the paper that introduced the transformer architecture, and it's interesting to go back to the source. The transformer described in the paper incorporates both encoder and decoder stacks because the authors were testing its performance on machine translation tasks. Applying the architecture to other natural language processing tasks, such as language modeling and text generation through unsupervised pretraining and autoregressive generation (as in GPT), was a major subsequent innovation.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention Is All You Need, in Proceedings of the 31st International Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY, USA, 2017), pp. 6000–6010.
Link: https://dl.acm.org/doi/10.5555/3295222.3295349
  • Let's build GPT (video by Andrej Karpathy) -- At the heart of an LLM like ChatGPT is the transformer, a neural architecture built from self-attention and feed-forward layers. Karpathy's video goes through each module and shows how to implement it in code. Try it out! (See below for the code repositories, and a short sampling sketch at the end of this list.) The simplest implementation generates fake Shakespeare! More advanced models can be fine-tuned from GPT-2 checkpoints. It is interesting to experience the raw, generative output of a GPT-based LLM and contrast its behavior with highly fine-tuned chatbot applications like ChatGPT.
Also see his overview of LLMs, Intro to Large Language Models: https://www.youtube.com/watch?v=zjkBMFhNj_g
  • GPT-4 Technical Report (OpenAI) -- I cited Figure 4, GPT performance on academic and professional exams. My thought is that this provides a point of discussion with classes on why LLMs may not be ready to answer questions that require specialized domain knowledge, like Chemical Engineering Thermodynamics (slide 7). This possible limitation could change rapidly. Also see the MIT Teaching + Learning Lab study below.
  • Grotti, Meg, et al. Summary and Recommendations from Spring 2024 Faculty Interviews in the Making AI Generative for Higher Education Project. University of Delaware, Library, Museums and Press, Center for Teaching & Assessment of Learning, IT-Academic Technology Services, and School of Education, 2024.
Results from interviews with 18 UD faculty, conducted in spring 2024, that focused broadly on three topics: (a) the impact of generative AI on teaching and learning, (b) the impact of generative AI on research, and (c) faculty support needs related to generative AI.
  • Applications in the physical sciences
    I recommend that my students watch this roundtable discussion hosted by the American Institute of Physics Foundation in April 2024, Physics, AI, and the Future of Discovery. In that event, Prof. Jesse Thaler (MIT) provided some especially insightful (and sometimes funny) remarks on the role of AI in the physical sciences -- including his April Fools' joke, ChatJesseT. Below are links to his segments if you're short on time:
The "Authoritized Autobotography" of ChatGPT. Marino's (?) book is funny and it provides insight into the mechanics of prompting an LLM.

Models and tools

"Programming is a skill best acquired by practice and example rather than from books."
- Alan Turing, Programmers' Handbook for Manchester Electronic Computer Mark II, 1951

The ecosystem of LLMs continues to grow. Many of us are familiar with proprietary LLMs through applications like OpenAI's ChatGPT, Anthropic's Claude, and Microsoft's Copilot, but a number of open models are available to download and experiment with. Some models include information about the training dataset.

  • nanoGPT (Andrej Karpathy) -- GitHub codebases
  • Llama (Meta) -- Llama is a set of open LLMs from Meta, available in different sizes, that can be downloaded and used locally in inference mode.
Their description: "Llama includes multilingual text-only models (1B, 3B), including quantized versions, text-image models (11B, 90B), and the Llama 3.3 70B model offering similar performance to the Llama 3.1 405B model, allowing developers to achieve greater quality and performance on text-based applications at a fraction of the cost."
  • Ollama -- enables you to run inference on Llama 3.3, Phi 4, Mistral, Gemma 2, and other models locally from the command line. (A minimal Python example of querying a local Ollama server follows this list.)
  • llama.cpp -- an open-source software library that performs inference on LLMs locally, with an emphasis on supporting a wide range of hardware. It includes command-line tools and a simple web interface.
  • Chatbot Arena (LMSYS) -- tracks the LLM and chatbot ecosystem and compares model performance using an Elo-like rating methodology.
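Following up on the Ollama item above: once Ollama is installed and a model has been pulled, it also exposes a local REST API that can be called from Python. The sketch below is a minimal example under those assumptions; the model name llama3.2 is only a placeholder for whatever model you have downloaded, and the endpoint shown is Ollama's default local port.

    # Query a locally running Ollama server (illustrative sketch).
    # Assumes Ollama is installed and serving on its default port, and that the
    # named model has already been pulled.
    import json
    import urllib.request

    def ask_local_llm(prompt, model="llama3.2"):
        payload = json.dumps({
            "model": model,      # any model you have pulled locally
            "prompt": prompt,
            "stream": False,     # return the full response at once
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    print(ask_local_llm("In one sentence, what is a transformer in machine learning?"))

Running everything locally like this keeps your prompts and data on your own machine, which is one reason open models are attractive for research use.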

Related reading

Ethics of AI

In discussions concerning the ethics of AI, and LLMs in particular, questions around at least two major topics frequently appear: intellectual property and resource use, including electricity and water. (I'd love to have more suggestions here as I work to expand this section.)

  • Louis Menand discusses the relationship between AI and intellectual property in "Is A.I. the Death of I.P.?", New Yorker, January 15, 2024.

Many articles in the daily news cite the energy use of LLMs, some with drastic predictions, such as the imminent collapse of the electrical grid. But how do LLM training and use compare to other digital activities, like search, streaming, and cryptocurrencies?

  • The International Energy Agency's Electricity 2025 covers several relevant topics. For instance, growing electricity demand in the US and other mature economies is driven in part by data centers, but also by new electric vehicles, air conditioners, and heat pumps.
  • Lawrence Livermore National Laboratory's Energy Flow Charts is a useful resource for understanding US energy use and waste.

History of computers and computational tools

  • George Dyson's book Turing's Cathedral documents the history of the general, programmable electronic computer, including the explosion of applications that came with the introduction of this radical new technology, especially under the influence of John von Neumann.
Dyson, George. Turing’s Cathedral: The Origins of the Digital Universe. Pantheon Books, 2012.

Artistic and literary practices

When first experimenting with GPT-based LLMs, it's fascinating to experience a machine generating text with such high fidelity! But interest in machine-generated, or "generative," text dates almost to the beginning of the modern computer era. Many experiments, spanning contexts from AI research to artistic and literary practice, have been shared over the intervening decades. Mark Marino's book cited above is a recent example in this area.

  • Christopher Strachey's program, often referred to as Love Letters, was written in 1952 for the Manchester Mark I computer. It is considered by many to be the first example of generative computer literature. In 2009, David Link ran Strachey's original code on an emulated Mark I, and Nick Montfort, professor of digital media at MIT, coded a modern recreation of it in 2014. The text output follows the pattern "you are my [adjective] [noun]. my [adjective] [noun] [adverb] [verbs] your [adjective] [noun]," signed by "M.U.C." for the Manchester University Computer. With the vocabulary in the program, there are over 300 billion possible combinations. (A toy recreation of the template appears at the end of this list.)
To experience the poem in a modern browser, see Nick Montfort's recreation as implemented in Glitch by Mark Sample:
https://strachey-love-letters.glitch.me
Wikipedia page on Strachey's algorithm: https://en.wikipedia.org/wiki/Strachey_love_letter_algorithm.
Siobhan Roberts' article on Strachey's Love Letters: https://www.newyorker.com/tech/annals-of-technology/christopher-stracheys-nineteen-fifties-love-machine
David Link's Love Letters installation -- https://alpha60.de/art/love_letters/
  • OUTPUT: An Anthology of Computer-Generated Text by Lillian-Yvonne Bertram and Nick Montfort is a timely book covering a wide range of texts, "from research systems, natural-language generation products and services, and artistic and literary programs." (Bertram, Lillian-Yvonne, and Nick Montfort, editors. Output: An Anthology of Computer-Generated Text, 1953–2023. The MIT Press, 2024.)
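To make the Love Letters template above concrete, here is a toy recreation in Python. The word lists are short placeholders of my own, not Strachey's original vocabulary (so the number of possible letters is far smaller than the 300 billion combinations quoted above), and the salutation is omitted.

    # Toy recreation of Christopher Strachey's 1952 "Love Letters" template.
    # The word lists are placeholders, not the original vocabulary.
    import random

    ADJECTIVES = ["beautiful", "precious", "tender", "curious", "darling"]
    NOUNS = ["desire", "affection", "heart", "fancy", "longing"]
    ADVERBS = ["keenly", "wistfully", "fervently", "tenderly"]
    VERBS = ["treasures", "cherishes", "adores", "craves"]

    def love_letter(rng):
        # Pattern quoted above: "you are my [adjective] [noun]. my [adjective]
        # [noun] [adverb] [verbs] your [adjective] [noun]", signed M.U.C.
        line1 = f"YOU ARE MY {rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}."
        line2 = (f"MY {rng.choice(ADJECTIVES)} {rng.choice(NOUNS)} "
                 f"{rng.choice(ADVERBS)} {rng.choice(VERBS)} "
                 f"YOUR {rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}.")
        return "\n".join([line1, line2, "YOURS, M.U.C."]).upper()

    print(love_letter(random.Random()))

It makes a useful contrast with the transformer-based examples above: the 1952 program is a fixed template filled with random word choices, while a GPT learns its "template" statistically from data.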

For instructors

LLM performance in technical subjects

This is the MIT Teaching + Learning Lab study referenced above. The take-home is that the LLM does not perform well, but it's interesting to see how it fails. I plan to share this page with my thermodynamics students.

Course policies

Here is text that I published on our Fall 2024 CHEG 231 Canvas site: Using AI: tools, tips, and guidelines. Instructors: feel free to download the HTML and use it as a starting point for your own course. Suggestions for how to improve it are welcome!

Here is a picture of the transformer (encoder and decoder) from Attention Is All You Need.

More to come!