About LLMs and Energy

When I was still studying electronics in college, I remember a project in microcontrollers class where the professor let us choose what we wanted to do. At that time, I was almost obsessed with neural networks and proposed to the professor that I would implement a network on a PIC16F877A (my favorite). It was a very simple implementation, composed of no more than 64 nodes, that I trained to recognize the patterns that I entered manually using an array of jumpers.

Sometime after starting the training, I noticed that the microcontroller began to heat up to the point of losing the serial connection with which I was monitoring it. My partner and I decided to try again with a small fan which, to our surprise, allowed the training to be completed. Against all odds, the experiment worked, the grade was good, and I passed Microcontrollers 2.

This was far from the most impressive project I did during my student days, but I always remember it as an important moment in my training as an engineer. It was the first time I could clearly see that software consumes power: a “for” loop can drain a battery and divert important energy resources. Nowadays it seems obvious, but I think we forget it more often than we should.

A lot of time has passed, and many things have changed since then. Now we have LLMs, a type of artificial intelligence designed to interpret and generate text that looks like it was written by humans. These models can answer questions, hold conversations, tell stories, and more.

The basic structure of LLMs is the same as the code that ran in my little Micros 2 project: they use artificial neural networks, which are inspired by the way natural neurons in our brains process information. Artificial neural networks are made up of layers of neurons that each perform the same simple operation, and the interconnections between neurons allow them to encode and recognize patterns.
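To make the idea of layers concrete, here is a minimal sketch of a forward pass through a tiny network. The layer sizes (8 inputs, 16 hidden neurons, 4 outputs), the random weights, and the sigmoid activation are assumptions chosen for illustration; they are not the details of my original PIC project. The sketch also counts multiply-accumulate operations, since every one of them costs a little energy.

```python
import math
import random

def dense_layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then a sigmoid."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = bias
        for x, w in zip(inputs, neuron_weights):
            total += x * w  # one multiply-accumulate per connection
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs

def random_layer(n_inputs, n_neurons, rng):
    """Random weights and zero biases (illustrative only, not trained)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    return weights, [0.0] * n_neurons

rng = random.Random(0)
layer_sizes = [8, 16, 4]  # hypothetical sizes: 8 inputs, 16 hidden, 4 outputs
layers = [random_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

activations = [1, 0, 1, 1, 0, 0, 1, 0]  # e.g. the state of 8 switches
for weights, biases in layers:
    activations = dense_layer(activations, weights, biases)

# Total multiply-accumulates for one forward pass through the network.
macs = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(f"{len(activations)} outputs, {macs} multiply-accumulates per pass")
```

Each connection between two neurons is one pass through the inner loop, so the work (and the energy) grows with the number of connections, not just the number of neurons.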

In my experiment, the patterns the network learned were configured with an array of 8 switches. On the other hand, to create an LLM a huge amount of text information is used, such as books, articles, websites, etc., which is presented as input to the neural network. The model then learns patterns that allow it to predict the next word in a sentence based on the previous words (sorry to spoil the magic, but that’s what ChatGPT really does: predict the next word). Through this training process, the model develops an apparent ability to understand language, grammar, and knowledge about the presented contexts.
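Predicting the next word can be illustrated with a toy far simpler than an LLM. The sketch below does not use a neural network at all: it only counts which word follows which in a tiny made-up corpus and returns the most frequent follower. The corpus and the function names are hypothetical, and real models learn far richer patterns, but the task has the same shape.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"  # made-up training text
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

An LLM replaces the counting table with billions of learned parameters and looks at many previous words instead of one.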

According to the information provided by OpenAI, GPT-3 is made up of 96 layers of 12,288 neurons each, which adds up to around 1.18 million neurons in a model developed in 2020, certainly far from the 64 nodes I presented to my professor. The complexity of GPT-3 is better reflected in its 175 billion parameters (the name given to the interconnections between its neurons) than in the number of neurons that comprise it. I decided to use the number of neurons to facilitate comparison with the model I developed in class.
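The arithmetic behind these figures is easy to check. The neuron count follows directly from the numbers above. For the parameter count, the sketch uses a common rule of thumb for transformer models, roughly 12 × layers × width², which ignores the embedding tables; that rule is an assumption brought in for illustration, not something stated by OpenAI in the figures quoted here.

```python
N_LAYERS = 96      # layers in GPT-3, as quoted above
WIDTH = 12_288     # "neurons" per layer, as quoted above

# Neuron count, treating the layer width as neurons per layer.
neurons = N_LAYERS * WIDTH

# Rule-of-thumb parameter estimate (assumption: ~12 * layers * width^2,
# embeddings not included).
approx_params = 12 * N_LAYERS * WIDTH ** 2

print(f"neurons:    {neurons:,}")
print(f"parameters: about {approx_params / 1e9:.0f} billion")
print(f"ratio to a 64-node network: {neurons // 64:,}x")
```

The estimate lands close to the published 175 billion, which shows why the number of connections, not the number of neurons, dominates the size of the model.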

Having this context in mind makes it easier to understand how the computational demand of these models drives their intense energy consumption. In the training phase, vast volumes of text are processed using computationally complex algorithms, such as backpropagation, that involve intensive mathematical calculations over billions of parameters. The large size and intricate complexity of LLMs, together with the iterative training phases and real-time processing during the inference stage (for example, when interacting with a user), require particularly powerful hardware in continuous operation, which entails significant energy consumption.

How much energy are we talking about? According to reports based on information leaked from OpenAI, GPT-4 was trained on 25,000 Nvidia A100 GPUs running for more than 90 days. The consumption of this configuration is estimated at around 51,000 to 62,000 MWh. This is roughly equivalent to the energy consumed over 5 to 6 years by 1,000 average American homes, or by 5,700 average Colombian homes. Alex de Vries, a PhD candidate at the University of Amsterdam, estimated that a single average interaction between a user and ChatGPT uses the same amount of energy as an LED light bulb left on for an hour.
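As a sanity check, the figures above can be worked backwards. The sketch below uses only the numbers quoted in the previous paragraph and assumes exactly 90 days of training (the reports say "more than 90", so the per-GPU power it derives is an upper bound). The function names are hypothetical helpers for this post.

```python
HOURS_PER_DAY = 24

def implied_kw_per_gpu(total_mwh, num_gpus, days):
    """Average power per GPU (kW) implied by a total energy estimate."""
    gpu_hours = num_gpus * days * HOURS_PER_DAY
    return total_mwh * 1000 / gpu_hours  # MWh -> kWh, then divide by hours

def implied_mwh_per_home_year(total_mwh, homes, years):
    """Yearly consumption per home (MWh) implied by a home-equivalent claim."""
    return total_mwh / (homes * years)

NUM_GPUS = 25_000
TRAINING_DAYS = 90  # assumption: exactly 90 days

# Pair the low estimate with 5 years and the high estimate with 6 years.
for total_mwh, years in ((51_000, 5), (62_000, 6)):
    kw = implied_kw_per_gpu(total_mwh, NUM_GPUS, TRAINING_DAYS)
    us_home = implied_mwh_per_home_year(total_mwh, 1_000, years)
    co_home = implied_mwh_per_home_year(total_mwh, 5_700, years)
    print(f"{total_mwh:,} MWh: {kw:.2f} kW per GPU, "
          f"{us_home:.1f} MWh/year per US home, "
          f"{co_home:.1f} MWh/year per Colombian home")
```

The result is roughly 0.94 to 1.15 kW per GPU, which suggests the estimate covers the whole system (servers, networking, cooling) rather than the GPU chips alone, and about 10 MWh per year for an American home versus about 1.8 MWh per year for a Colombian one.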

The current trend in the development of LLMs seems to be toward smaller models; some can now even run on phones. Techniques such as pruning and distillation allow the creation of smaller models derived from large ones while retaining many of the characteristics and much of the performance of their parents. Also helping are advances in TPUs, improvements in GPUs, and the increasing adoption of ARM-based architectures, which have shown better energy efficiency than x86 (CISC) designs.
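To give a feel for what pruning means, here is a minimal sketch of one common variant, magnitude pruning, where the weights closest to zero are removed. The weight values and the 50% fraction are made up for illustration, and production systems use far more careful methods; this is only the core idea.

```python
def magnitude_prune(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude ones zeroed.

    `fraction` is the share of weights to remove (0.0 to 1.0).
    """
    n_prune = int(len(weights) * fraction)
    # Indices of the weights with the smallest absolute value.
    smallest = sorted(range(len(weights)),
                      key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]  # hypothetical weights
pruned = magnitude_prune(weights, 0.5)
print(pruned)  # the three weights nearest zero are now 0.0
```

A zeroed weight does not need to be stored or multiplied, which is where the savings in memory, computation, and energy come from.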

The future of this technology is promising, and its adoption will continue to grow, probably faster than it does today. As software developers, we must assume the responsibility of being informed and aware of the energy consumption and efficiency of the solutions we create.

