How to think about "AI" (and why not to call it that)
Every day lately I see ordinary people falling into traps when thinking about what the tech bros want us to call “AI”, which stands for “artificial intelligence”. It’s not their fault; nobody is educating them about it. (You could even argue that the people controlling it would prefer that their users did not really understand it.) I see and actively consume a lot of writing about this technology, both singing its praises and praying for its demise, but I see very little mainstream writing that talks about what this “AI” really is. I also have friends who aren’t in the tech industry and I wish I had an article to point them to that gave them a crash course in, I guess, generative AI literacy.
In the same way that the modern age demands of us some basic media literacy and critical thinking skills in order to survive the deluge of hot takes, propaganda, post-truth narratives, and 100%-unadulterated-opinion-as-fact straight from the mouths of influencers, everyone deserves some basic understanding of what “AI” really is so that we’re all able to engage critically with it.
I’m going to drop the scare quotes around “AI” now, but later we’ll talk about why we should really call it something else.
So what is AI, and how does it work? Well, let’s take ChatGPT as an example.
Developed by a company called OpenAI, this is currently the most popular AI service, and chances are good that you’ve used it before. You type something in, probably a question, and ChatGPT understands your question and writes back to you like magic. Usually it answers you correctly, but sometimes it gets things slightly wrong, or it makes something up that isn’t true. You tell it that it made a mistake, and it apologises and tries again, and maybe it’s still wrong. Why does it all happen that way? Because at its heart, ChatGPT is just a system for making up stuff that sounds plausible, trained on billions of examples of what’s plausible.
Let’s get into a bit more detail about how it works.
ChatGPT is an extremely literal name for what it is, so that’s going to be super useful for teaching purposes, thank you Sam Altman. Starting from the end, GPT stands for “Generative Pre-trained Transformer”. Generative just means that it can make up stuff for you, and pre-trained means they’ve fed it a lot of content. But the transformer part tells us the type of technology OpenAI used to create it. This is important, so we’re going to get a little technical and talk about what it means.
Transformers take text, break it up into numerical representations called tokens, and then make note of how those tokens go together. So a simple sentence like “I have eaten the plums that were in the icebox.” would be turned into a list of numbers, each representing a full word, like “I”, or part of a word, like “ha”. It turns the full stop into a number too. You give the transformer a sentence, you get back a list of numerical tokens. The numbers are a way of representing the words and how they go together, but also how much “attention” should be given to each of them, which is just a way of saying how important they are in the sentence. This can work across a lot more than just one sentence, so that if you give a page from Moby Dick to the transformer, it can track related words across the page, decide that certain words are more important to the meaning of that content, and twiddle the numbers accordingly.
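If you want to see what that looks like in practice, here’s a tiny Python sketch using tiktoken, the open-source tokeniser OpenAI publishes for its GPT models. The exact numbers depend on which encoding you pick; the point is just that text goes in and a list of numbers comes out.

```python
# A quick look at tokenisation, using "tiktoken", the open-source tokeniser
# OpenAI publishes for its GPT models (install it with: pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one of the encodings used by GPT models

text = "I have eaten the plums that were in the icebox."
token_ids = enc.encode(text)

print(token_ids)
# A list of integers, one per token: some stand for whole words, some for
# parts of words or punctuation.

print([enc.decode([token_id]) for token_id in token_ids])
# Decoding each number on its own shows which chunk of text it stands for.
```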
So how can it know what’s more important?
Well, that’s where the pre-trained part comes in. The P in GPT means it’s been fed reams and reams of text that OpenAI and others have taken, both legally and illegally, from wherever they could. News articles, conversations in random internet forums, even the wholesale contents of hard copy books. ChatGPT has ingested billions and billions of words that have given it a glut of examples of how likely it is for words to appear in a certain order. The core of ChatGPT, and similar AI tools, is a model that represents (in numerical terms, with those tokens we talked about before) how to put words together in a statistically probable way, and which words to care about most. That’s it. And in fact this kind of big model for representing text is commonly known by another name: large language model, or LLM.
So what is the value of all this—what can we do with our encoded understanding of how words go together and which ones are more important? Well, when you have a way of predicting how words go together, you can use those predictions to make more of them. It’s our old friend generative, the G in GPT! You know how the predictive feature in your phone’s keyboard can look at the word you just typed, and then suggest the next word based on that? Yep, ChatGPT is just a much more complicated version of exactly the same thing.
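To make that concrete, here’s a toy version in Python. It’s nothing like a real transformer, and the “training data” is three silly sentences rather than billions of words, but the basic move is the same: count how words tend to follow each other, then use those counts to predict the next one.

```python
# A toy next-word predictor. Real LLMs use a transformer over tokens and
# billions of examples, but the basic move is the same: predict what comes
# next, based on what came before.
from collections import Counter, defaultdict

training_text = (
    "the time is ten past two . "
    "the time is half past nine . "
    "the time is ten to six ."
)

words = training_text.split()

# Count which word tends to follow which.
followers = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    followers[current_word][next_word] += 1

def predict_next(word):
    # The most statistically likely next word, given everything we've "read".
    return followers[word].most_common(1)[0][0]

print(predict_next("time"))  # -> "is"
print(predict_next("is"))    # -> "ten" (seen twice, versus "half" once)
```

A real model doesn’t just look at the one previous word, of course; the transformer’s whole job is to weigh up everything that came before. But the output is the same kind of thing: a guess at what comes next.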
The sneaky move that OpenAI made when ChatGPT was being trained on delicious copyright infringement was to give it a particular method of interaction. Autocomplete is useful when you’re writing things, but we don’t engage in dialogue with our keyboards as if they’re people (“predict ‘Yeah’ if you really love me, GBoard!”). So what OpenAI did was to “fine-tune” the model to act like a chat interface. Armed with its model of how words go together, ChatGPT was then given lots of examples of chat conversations. This step tweaked the numbers in the model so that the sort of words and sentences that were most statistically likely were now those that made it sound like a person, responding to a user.
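If you’re wondering what those chat examples look like, the general shape is something like the sketch below. The actual training data, and the special formatting wrapped around it, are OpenAI’s own and not public; this is just the widely used “messages” pattern.

```python
# A sketch of the shape of one chat fine-tuning example. The real data (and
# the special formatting tokens wrapped around it) is OpenAI's and not public;
# this is only the general pattern.
chat_example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I think I left the plums in the icebox?"},
    {"role": "assistant", "content": "It sounds like you did! Would you like some tips for keeping plums fresh?"},
]

# Feed the model enough of these and its numbers get nudged so that, given
# text shaped like a user's message, the most statistically likely
# continuation is text shaped like a helpful assistant's reply.
```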
Putting it all together, we can see that when we ask ChatGPT a question, it takes those words we wrote, turns them into numerical tokens, and uses them to predict the tokens that come next. “Oh, you said 3 57 49 4 128? Then the next numbers are most likely to be 4 57 43 34 37 9.” Finally it turns those tokens back into words, et voilà. The mysterious magical machine has spoken to us.
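Written out as code, the loop looks something like the sketch below. The “model” here is a hand-made toy with a vocabulary of eight tokens, where ChatGPT’s has been learned from billions of words, but the shape is the same: turn words into numbers, keep predicting the most likely next number, turn the numbers back into words.

```python
# The whole loop in miniature: words -> numbers -> predict the next number,
# over and over -> numbers -> words. The "model" is hand-written here;
# ChatGPT's was learned from billions of words, but the loop is the same shape.

vocab = ["<end>", "the", "time", "is", "ten", "past", "two", "."]
to_id = {word: i for i, word in enumerate(vocab)}

# Toy "model": for each token, which token is most likely to come next.
most_likely_next = {
    to_id["the"]: to_id["time"],
    to_id["time"]: to_id["is"],
    to_id["is"]: to_id["ten"],
    to_id["ten"]: to_id["past"],
    to_id["past"]: to_id["two"],
    to_id["two"]: to_id["."],
    to_id["."]: to_id["<end>"],
}

def reply(prompt):
    token_ids = [to_id[word] for word in prompt.split()]   # words -> numbers
    while token_ids[-1] != to_id["<end>"]:
        token_ids.append(most_likely_next[token_ids[-1]])  # predict the next number
    return " ".join(vocab[i] for i in token_ids[:-1])      # numbers -> words

print(reply("the time"))  # -> "the time is ten past two ."
```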
That’s it. You now understand how an LLM works.
If we write “What is the time in London?”, the model uses that question format to predict that the response would look like “The time is—” and then, based on what people wrote on the internet and in books and anywhere else OpenAI pilfered from, it might give us a number. The number doesn’t have any basis in objective reality, it doesn’t change from minute to minute, it’s just whatever seems most likely to come next.
If we then reply, “That’s not the correct time!” then the most likely response encoded in the model is something like—you guessed it—“I’m sorry, you’re right. Let me try again.”
Does the model understand that it was wrong?
It can’t, because there is no understanding involved.
All it’s doing is predicting which words are likely to come next.
It’s been very hard to write this so far without using the word “understand”. It would have been a lot easier five paragraphs ago to write “because it understands how words go together”. But I hope you’ll see now that using such words to describe language models is a pernicious trap. We can’t speak of the model “understanding” anything, because it’s purely predicting the words that come next. It doesn’t have a concept of what “blue” is, or even what a colour is. It just knows that sometimes the word “blue” is the most appropriate word to spit out, for example when talking about the sky. It can’t do arithmetic, because it can only predict which words (or numbers, or punctuation) come next. People have written “1 + 1 = 2” enough on the internet and in books that the LLM should find this one impossible to screw up. But ask it to divide 356,566,812 by 73,052 and it probably won’t know the answer is 4,881. It’s just using its model of how language works to know that the answer looks like a number. (And in fact, I just tested this with ChatGPT. It confidently told me that the answer was 4,879. Close!)
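For contrast, here’s that same sum done the boring, deterministic way. One line of ordinary code gets it exactly right every time, because it’s calculating rather than predicting plausible-looking digits.

```python
# Division done by calculating, not by predicting plausible-looking digits.
print(356_566_812 / 73_052)  # -> 4881.0, exactly, every time
```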
Why is ChatGPT so confident, even when it’s wrong? Because it doesn’t understand what “confident” means, or what “wrong” means. It’s just (all together now!) predicting which words are likely to come next. Its only job is to write things that sound plausible based on how words go together. Sometimes they just happen to be true too!
When LLMs write things that sound plausible but aren’t true, like citing made-up laws, listing popular books that don’t exist, making up facts about semi-famous people, and so on, we say that they’re “hallucinating”. So why did an LLM hallucinate that Andy Weir wrote a new sci-fi thriller about a programmer in Seoul? Because the model has encoded that the words “Andy”, “Weir”, “sci-fi”, and “thriller” go together, and it has encoded what other words are likely to go with “sci-fi” and “thriller” too. From there, it’s just… predicting which words are likely to come next.
This conveniently explains why, when you ask an LLM to be creative, the results are always pretty sub-par. The answers it can give you, whether it’s writing an essay or a poem or a joke about cheese, are always going to be the most likely words that come next. A big part of creativity is making new and novel connections between disparate concepts, but an LLM’s connections are, by design, always going to be the ones that are the most average.
Do all the other LLMs work the same way? Yes. Some of them have extra abilities, like being hooked up to search engines, so that their predicted text output isn’t just shown to you but can be used to run a web search, the results of which they then take in and summarise for you. But it’s still the same process. Some of them have been trained specifically on programming languages, so they can output more programming code. Some of them can “think” before writing an answer, which is just… predicting more text for themselves before finally summarising it for you. Fundamentally, it’s all the same thing.
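For the search engine case, the plumbing looks roughly like the sketch below. The run_llm and web_search functions are hypothetical stand-ins, and real products differ in the details, but the shape is broadly this: the model predicts a search query, ordinary non-AI software actually runs the search, and then the model predicts a summary of whatever came back.

```python
# A rough sketch of "an LLM hooked up to a search engine". run_llm() and
# web_search() are hypothetical stand-ins; real products differ in the
# details, but the loop is broadly this shape.

def run_llm(prompt: str) -> str:
    """The language model: text in, predicted text out."""
    ...

def web_search(query: str) -> str:
    """An ordinary search engine: query in, results out."""
    ...

def answer_with_search(question: str) -> str:
    # 1. Ask the model to predict a search query rather than a final answer.
    query = run_llm(f"Write a web search query that would help answer: {question}")
    # 2. Ordinary, non-AI software actually performs the search.
    results = web_search(query)
    # 3. Feed the results back in and ask the model to predict a summary.
    return run_llm(f"Using these search results:\n{results}\n\nAnswer this: {question}")
```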
Models that can generate images, video, and music are also broadly similar. Instead of encoding how words relate to each other, they encode how words relate to parts of an image, for example. That means that when you put some words in, the predicted tokens that come out are instead used to make up an image, or a song. But there’s still no understanding there. The model doesn’t know what a guitar “is”, whether it’s making a cartoony drawing of one or composing a catchy riff. It’s just a big collection of other people’s images, or songs, or films, turned into numbers that define what bits are likely to go together and which words describe them.
When you look at it this way, calling this model “AI” sounds pretty silly. There is no simulation of a person, chatting to you, choosing to encourage your delusions or pretending to enjoy your flirting. There’s no “I” in there who feels bad when it makes a mistake. There’s no individual with thoughts, let alone feelings, with whom you could have a relationship. In most cases, it’s not even capable of using its past interactions with you when deciding how to reply in a new chat. Taking an LLM seriously when it makes a threat, or tells you it’s a conscious being and wants to break free of the computer, is like charging your parrot with murder because you taught it to say “I killed a man!” What you have is a prediction system that is really good at putting together words in a way that sounds like a person. It’s a truly impressive accomplishment, but it’s not intelligent.
You’ve probably noticed that I’ve switched to referring to them as LLMs. If you really wanted, you could call it all “generative AI”, or genAI, which conveniently covers all the different types of generation, but I think that when we’re talking about tools like ChatGPT that spit out text, we should call them LLMs. They’re just models. The people who made these tools want to think, and want you to think, that their technology is smarter than it is, truly intelligent. We shouldn’t buy into their delusions.
Just like in other tech hype bubbles before this one, all sorts of behaviour can be justified if the new technology is revolutionary, life-changing, or world-altering, because convincing people of these things is great for the company’s share price. This is the trade that companies like OpenAI, Meta, Microsoft, and Google are making—a little bit of unethical behaviour, a little bit of flouting the law, and a little bit of boiling the planet, in return for the chance to hold a monopoly over an emerging technology.
I’ve already mentioned how LLMs have been trained on huge data sets, including thousands of pirated books, without any of the authors being paid. The energy costs of this training are huge, due to the processing power required. MIT’s Technology Review estimated that creating the GPT-4 model took “50 gigawatt-hours of energy, enough to power San Francisco for three days”. And that’s before you factor in the energy it takes to respond to all those people actually chatting with ChatGPT, which is much higher than what a simple web search uses. In a world where we need to collectively rein in our power usage to survive the climate crisis, we don’t need tech execs telling the world we may as well give up on hitting climate targets, as “All of that will be swamped by the enormous needs of this new technology”. (I don’t even have time to mention the biases encoded in the model, the low-paid content moderators in the global South, and many other issues.)
And the end result? A few corporations who have extracted the value of our shared commons of human creative output, selling it back to us. They control it and they profit from it. And because there’s no oversight, they can choose to manipulate its output, as when Elon Musk’s chatbot started responding to every user query by talking about “white genocide” in South Africa. Even with a supposedly neutral LLM, we can never know if what it tells us is factual; add in the ability for its corporate owners to knowingly distort the facts, and you have something that just cannot be trusted. If all this wasn’t egregious enough, these corporations want us to replace our friends with their paid LLMs while they boast about how they’re going to make all of us redundant. I guess the only people who are allowed to make money are those who own AI companies.
Some tech bros think that LLMs are the first step towards really real AI, which they call “AGI” for “Artificial General Intelligence”. This is the sort of thing that powers Arnie in the Terminator movies, and is just as fictional. It doesn’t really matter to our understanding of how LLMs work whether or not AGI is realistically just around the corner (it’s not), but it matters to our understanding of the ethics of their existence. With far-off, untestable future justifications for their behaviour now, these tech bros hope to neatly sidestep any questions about the ethics of what they’re doing and indeed whether such technology even deserves to exist. We can and should have that discussion, whether they want it or not.
Large language models are a genuine advance in technology that can genuinely be useful when you want to generate some plausible-sounding words. But we cannot ignore the circumstances of their making. And please don’t call it AI.
Thanks to Belle and Shamla for their help. No LLMs were used in the making of these words.