What is generative AI and how does it work? – The Turing Lectures with Mirella Lapata

Mar 30, 2024
Well, thank you for that lovely introduction. So, what is generative artificial intelligence? I'm going to explain what it is, and I want this to be a little bit interactive, so there will be some audience participation. The people who organise these talks told me: "Oh, you're very low-tech for someone who works in AI." I don't have explosions or experiments, so I'm afraid you'll have to participate. I hope that's okay.

So, what is generative artificial intelligence? The term is made up of two parts: artificial intelligence and generative. Artificial intelligence is a fancy term for getting a computer program to do work that would otherwise be done by a human. And generative — this is the fun part.
Generative means creating new content that the computer hasn't necessarily seen: it has seen parts of it and is able to synthesise them and give us something new. What could this new content be? It could be audio; it could be computer code — you ask it to write a program for you; it could be a new image; it could be text such as an email or an essay; or it could be video. In this lecture I will focus mainly on text, because I do natural language processing and that is what I know. We will see how the technology works, and hopefully by the end of the lecture you will understand that there are a lot of myths around it, that there is nothing magical about what it does, and that it is just a tool.
The talk has three parts — and yes, that's a little boring. This is Alice Morse Earle. I don't expect you to know her; she was an American writer who wrote about memories and customs, but she is famous for her quotes, and she gave us this one: "Yesterday is history, tomorrow is a mystery, today is a gift — that's why it's called the present." It's a very optimistic quote, and the lecture is structured the same way: the past, present and future of AI. What I want to say from the beginning is that generative AI is not a new concept; it has been around for a while. So, how many of you have used or are familiar with Google Translate?
Can I see a show of hands? Who can tell me when Google Translate was first released? 1995? That would have been nice — it was 2006. So it has been around for 17 years, we have all been using it, and it is an example of generative AI: Greek text goes in — I'm Greek, so indulge me — and English text comes out. Google Translate has served us very well all these years, and nobody was making a fuss. Another example is Siri on your phone. Siri was launched in 2011, 12 years ago, and it was a sensation back then. It is another example of generative AI.
We can ask Siri to set alarms, Siri responds — how cool is that — and then you can ask about your alarms and so on. This is generative AI again. It's not as sophisticated as ChatGPT, but it was there. I don't know how many of you have an iPhone — look at that, iPhones are quite popular, I don't know why. Okay, so we're all familiar with that, and of course later Amazon Alexa came along, and so on. Again, generative AI is not a new concept; it is everywhere. It is part of your phone: autocompletion. When you send an email or a text message, the phone tries to complete your sentences — it tries to think like you, and it saves you time, because some of the completions are right there.
The same goes for Google: when you start typing, it tries to guess what your search term is. This is an example of language modelling — we'll hear a lot about language modelling in this talk — we are basically making predictions about what the continuations are going to be. So what I'm telling you is that generative AI is not that new, and the question is: what is the fuss? What happened in 2023? OpenAI, which is a company in California — in San Francisco, in fact; if you go to San Francisco you can even see the lights of its building at night.
They announced GPT-4 and claimed that it can beat 90% of humans on the SAT. For those of you who don't know, the SAT is a standardized test that American school children have to take to enter university; it's an admissions test, it's multiple choice, and it is considered not that easy — and GPT-4 can do it. They also claimed that it can get top marks in law and medical exams, among others; they have a whole set of exams on which, well, it is not just a claim — they show that GPT-4 does well. Other than passing exams, we can ask it to do other things. You can ask it to write a text for you; for example, you can give it a prompt like this.
The little thing you see up there is a prompt: it is what the human wants the tool to do for them. A possible prompt could be: "I am writing an essay on the use of mobile phones while driving. Can you give me three arguments in favour of this?" That's pretty sophisticated, if you ask me — I'm not sure I could come up with three arguments myself. And these are actual prompts that the tool can handle. You can tell ChatGPT — or GPT in general — "Act as a JavaScript developer. Write a program that verifies the information in a form: name and email are required, but not address and age." I write this, and the tool will generate a program. And this is the best one: "Create an About Me page for a website."
"I like rock climbing and outdoor sports, and I like programming. I started my career as a quality engineer in industry," blah, blah, blah. I give this description of what I want the website to be, and it will create it for me. So you can see that we have gone from Google Translate and Siri and autocompletion to something much more sophisticated, which can do many more things. Another curious fact: here's a graph showing how long it took ChatGPT to reach 100 million users, compared with other tools launched in the past. As you can see, our beloved Google Translate took 78 months to reach 100 million users — a long time.
TikTok took nine months, and ChatGPT took two. Within two months they had 100 million users, and these users pay a little to use the system, so you can do the multiplication and calculate how much money they make. Well, that's that part of the story. So how did we make ChatGPT? What is the technology behind it? It turns out that the technology is not extremely new, or extremely innovative, or extremely difficult to understand, and that is what we're going to talk about today. We're going to address three questions. First, how did we get from single-purpose systems like Google Translate to ChatGPT, which is more sophisticated and does a lot more things — and in particular, what is the core technology behind ChatGPT? Second, what are the risks, if any? And finally, I will show you a little glimpse of the future, what it will look like and whether we should worry or not — and you know, I won't leave you hanging: please don't worry. Okay. So, all these variants of GPT models — and there's a cottage industry out there; I'm just using GPT as an example because the public knows it and there have been many news articles about it, but there are other models, other variants that we use in academia — they all work according to the same principle, and this principle is called language modelling. What does language modelling mean?
Language modelling assumes that we have a sequence of words — the context so far — and we complete it. I have an example here. Suppose my context is the phrase "I want". The language modelling tool will predict what comes next, so there are several predictions: "I want to shovel", "I want to play", "I want to swim", "I want to eat". And depending on what we choose — shovel, play or swim — there are further continuations: after "shovel" it will be "snow", after "play" it could be "tennis" or "video games", after "eat" it might be "a lot of fruit". This is a toy example, but imagine now that the computer has seen a lot of text and knows which words follow other words. We used to count these things: I would go and download a lot of data and count how many times "I want to shovel" appears and what its continuations are, and we would have counts of all of this. That has all gone out of the window now: we use neural networks, which don't exactly count things but predict them — they learn in a more sophisticated way, and I'll show you in a moment how that's done. So GPT and the GPT variants are based on this principle: I have some context, and I predict what comes next. And that context is the prompt — the prompts I showed you earlier are the context, and then the model needs to do the task of continuing it: in one case the continuation would be the three arguments; in the case of the web developer, it would be a web page.
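The counting approach mentioned above can be sketched in a few lines of Python. This is a toy bigram model — the tiny "corpus" here is invented for illustration, standing in for the lot of downloaded text:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "a lot of downloaded text".
corpus = [
    "i want to shovel snow",
    "i want to play tennis",
    "i want to play video games",
    "i want to swim",
    "i want to eat a lot of fruit",
]

# Count how often each word follows each preceding word (a bigram model).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often after `word`."""
    return follows[word].most_common(1)[0][0]

print(predict_next("want"))  # "to" — the only continuation seen after "want"
print(predict_next("to"))    # "play" — seen twice, vs. once each for the others
```

Real systems replaced these counts with neural networks, but the job — score possible continuations of a context — is the same.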
Okay, so the task of language modelling is this: we have the context — I've changed the example; now it says "the colour of the sky is" — and we have a neural language model, which is just an algorithm that will predict the most likely continuation. And probability matters: these systems are all based on making guesses about what comes next, and that's why they sometimes fail — they predict the most likely answer, whereas you may want a less likely one. But that's how they are trained: to find what is probable. So instead of counting these things, we try to predict them using this language model. How would you build your own language model?
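The point about likelihood can be made concrete. Suppose the model assigns these probabilities (the numbers are made up) to continuations of "the colour of the sky is". Greedy decoding always returns the single most probable word; sampling occasionally returns a less likely one:

```python
import random

# Hypothetical next-word distribution produced by a language model.
probs = {"blue": 0.7, "grey": 0.2, "orange": 0.1}

# Greedy decoding: always pick the single most likely continuation.
greedy = max(probs, key=probs.get)
print(greedy)  # blue

# Sampling: draw continuations according to their probabilities instead.
random.seed(0)
sampled = random.choices(list(probs), weights=list(probs.values()), k=10)
print(sampled)  # mostly "blue", but "grey" and "orange" can appear too
```

This is why the same prompt can yield different answers on different runs, and why the most probable answer is not always the one you wanted.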
There is a recipe — this is how everyone does it. Step one: we need a lot of data; we need to collect a huge corpus of words. Where will we find such a huge corpus? We go to the web and download it all: Wikipedia, Stack Overflow, social media, GitHub, Reddit, whatever you can find there — modulo the permissions; it has to be legal. You download this whole corpus, and then what do you do? Well, you have this language model — and I haven't told you yet what exactly this language model is.
There is an example, and I haven't told you what the neural network that does the prediction looks like, but suppose you have it. You have this machinery that will do the learning for you, and the task is to predict the next word. How do we do it? This is the cool part. We have sentences in the corpus. We can remove parts of them and have the language model predict the parts we removed. This is very cheap: I just delete things, pretend they are not there, and get the language model to predict them. So I will randomly truncate the input sentence — truncate means remove the last part — and I will calculate, with this neural network, the probability of the missing word. If I get it right, I'm fine; if not, I have to go back and re-estimate some things, because obviously I made a mistake, and then I move on.
I'll adjust the model and feed that back in, then compare what the model predicted with the ground truth — because I removed the words in the first place, I actually know what the truth is — and we continue like this for some time, let's say months. The process takes a while because, as you can appreciate, I have a very large corpus with many sentences, and for each one I have to make the prediction, then go back and correct my mistakes, and so on. But in the end it converges, and I get my answer. So, the tool in the middle that I have been showing — this language model.
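That truncate-predict-correct loop is easy to sketch. Below is a deliberately tiny stand-in: the "model" is just a table of scores that gets nudged towards the held-out word whenever it guesses wrong. This mirrors the shape of the procedure only — real systems update millions of neural-network weights instead of a lookup table:

```python
import random
from collections import defaultdict

corpus = ["the sky is blue", "the grass is green", "the sky is blue"]

# Stand-in "model": a score for each (last-context-word, next-word) pair.
scores = defaultdict(float)

random.seed(0)
for step in range(50):
    sentence = random.choice(corpus).split()
    context, target = sentence[:-1], sentence[-1]  # truncate: hide the last word
    key = context[-1]
    # "Predict" the hidden word from the current scores.
    guesses = {w: s for (c, w), s in scores.items() if c == key}
    prediction = max(guesses, key=guesses.get) if guesses else None
    # Compare with the ground truth; if wrong, correct the model a little.
    if prediction != target:
        scores[(key, target)] += 1.0

print(scores[("is", "blue")], scores[("is", "green")])
```

The ground truth is free because we deleted the words ourselves — this is what makes the learning "self-supervised".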
A very simple language model looks a bit like this — perhaps some of the audience have seen it. It's a very naive diagram, but it helps illustrate the point. A neural network language model has some input, which is these nodes on the right as we look at it — the nodes on the right are the input, and the nodes on the far left are the output. So we present this neural network with five inputs — five circles — and we have three outputs — three circles — and there are things in the middle that I haven't said anything about: these are layers. They are more nodes that are meant to be abstractions of my input. The idea is that if I stack layers on top of layers, the layers in the middle will generalise over the input and be able to see patterns that are not obviously there. So you have these nodes, and the input to the nodes is not exactly words.
The inputs are vectors — series of numbers — but forget about that for now. We have some inputs, some layers in the middle, some outputs, and then there are these connections, these edges, which are the weights. This is what the network will learn, and these weights are basically numbers. Here everything is fully connected, so I have a lot of connections. Why am I going through all this? You'll see in a minute: you can tell how big or small a neural network is by the number of connections it has. For this toy neural network, I have calculated the number of weights — we also call them parameters — that the model needs to learn. The parameters are the number of input units times the number of units in the next layer, plus a bias for each unit — the bias is an extra term these networks have that, again, needs to be learned, and it lets the network correct itself a little when its prediction is off. For the purposes of this talk I won't go into details; all I want you to see is that there is a way of calculating the parameters — basically the number of input units multiplied by the number of units they feed into — and for this fully connected network, if we add everything up, we get 99 trainable parameters. 99 — this is a small network by any standard, but I want you to remember that this little network has 99 parameters, so that when you hear that a network has a billion parameters,
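The parameter arithmetic can be written down directly. The talk doesn't spell out the toy network's hidden-layer sizes, so the 5 → 8 → 4 → 3 configuration below is an assumption — it is one fully connected layout that yields exactly 99 trainable parameters:

```python
def count_parameters(layer_sizes):
    """Weights (fan_in * fan_out) plus one bias per unit in each non-input layer."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total

# Hypothetical toy network: 5 inputs, two hidden layers (8 and 4 units), 3 outputs.
print(count_parameters([5, 8, 4, 3]))  # 5*8+8 + 8*4+4 + 4*3+3 = 99

# The same arithmetic at modern scale dwarfs that number: even four
# fully connected layers of width 12288 already cost hundreds of millions.
print(count_parameters([12288] * 4))
```

The same function, fed the layer widths of a modern model, is how the billion-parameter figures quoted in the press arise.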
I want you to imagine how big that is. So: 99 for this toy neural network, and the number of parameters is how we judge how big a model is, how long it took, and how much it costs. In reality, though, no one uses this little network — maybe if I were teaching a first-year undergraduate class introducing neural networks, I would use it as an example. What people actually use are these monsters made of blocks, and by blocks I mean they are made of other neural networks. I don't know how many people have heard of Transformers.
I hoped no one — oh wow, okay. So Transformers are the neural networks that we use to build ChatGPT, and in fact GPT stands for Generative Pre-trained Transformer — Transformer is right there in the name. This is a sketch of a Transformer. It has its input, and the input is not words; as I said, it's embeddings — embeddings is another word for vectors. Then you have a larger version of this network multiplied across these blocks, and each block is a complicated system with some neural networks inside it. We are not going to go into details.
Please don't make me — all I'm trying to say is that we have these blocks stacked on top of each other; this Transformer has eight of these mini neural networks. And the task remains the same — that's all I want you to take away. The input goes in: the context "the chicken walked", we process it, and our task is to predict the continuation, "across the street". The EOS token means end of sentence, because we need to tell the neural network that our sentence is over.
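What is inside one of those blocks? The core operation is self-attention. Here is a stripped-down, single-head version with no learned projections and no layer normalisation — just the softmax-weighted mixing of token vectors, so a sketch of the idea rather than a real Transformer block (the tokens are random stand-ins for embeddings):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_block(x):
    """One stripped-down 'block': each token's new vector is a
    similarity-weighted mix of all the token vectors."""
    scores = x @ x.T / np.sqrt(x.shape[1])  # dot-product similarity of every pair
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ x

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # 4 tokens (e.g. "the chicken walked <EOS>"), 8-dim embeddings

# Stacking blocks = applying the transformation repeatedly; the talk mentions eight.
out = tokens
for _ in range(8):
    out = attention_block(out)

print(out.shape)  # the shape is preserved: (4, 8)
```

Each pass lets every token look at every other token, which is how the context "the chicken walked" informs the prediction of "across the street".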
I mean, they're kind of dumb — we have to tell them everything; we really have to spell it out. So much for taking over the world. Okay, so this is the Transformer, the king of architectures. Transformers arrived in 2017, and nobody is working on new architectures at the moment. It's kind of sad — the field used to be rather pluralistic, but now everyone uses Transformers; we've decided they're cool. So what are we going to do with this? And this is important, and quite amazing: we are going to do self-supervised learning, which is what I described — we take a sentence, we predict, and we move on, until we learn these probabilities. Okay, are you with me so far? Once we have our Transformer and we've given it all this data that exists in the world, we have a pre-trained model — that's why GPT is called a Generative Pre-trained Transformer. It is a reference model that has seen a lot about the world in text form. Then what we normally do is take this general-purpose model and specialise it for a specific task, and this is what is called fine-tuning. The network has some weights; we initialise them with what we learned during pre-training, and then on the specific task we learn a new set of weights. For example, if I have medical data, I'll take my pre-trained model, specialise it on this medical data, and then I can do something specific to that task — for example, write a diagnosis from a report.
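The pre-train-then-fine-tune recipe boils down to one move: start from the learned weights instead of from scratch. Here is a schematic sketch of the control flow — a one-weight model with invented numbers, not a real language model:

```python
def train(weight, data, lr=0.1, steps=100):
    """Minimal gradient descent for y = w*x on (x, y) pairs."""
    w = weight
    for _ in range(steps):
        for x, y in data:
            w -= lr * (w * x - y) * x  # gradient of the squared error
    return w

# "Pre-training": lots of general-purpose data where roughly y = 2x.
general_data = [(1, 2), (2, 4), (3, 6)]
pretrained_w = train(0.0, general_data)

# "Fine-tuning": initialise from the pre-trained weight, then specialise
# on a small domain dataset (here, data where y = 2.5x).
domain_data = [(1, 2.5), (2, 5.0)]
finetuned_w = train(pretrained_w, domain_data, steps=50)

print(round(pretrained_w, 2))  # converges to 2.0
print(round(finetuned_w, 2))   # nudged to 2.5 by the domain data
```

The second call is cheap because it starts from a good weight — the same reason fine-tuning a pre-trained Transformer needs far less data than pre-training did.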
Okay, so this notion of fine-tuning is very important, because it allows us to build special-purpose applications from these generic pre-trained models. People think that GPT and all these tools are general-purpose out of the box, but they are fine-tuned to be general-purpose, and we will see how. So here is the question: we now have this basic technology for pre-training, and I told you how to do it if you download the whole web — how good can such a language model be? How did it get so good? When GPT-1 and GPT-2 came out, they weren't amazing. It turns out that bigger is better: size is the thing that matters most.
I'm afraid this is rather depressing: people didn't use to believe in scale, and now we see that scale is very important. Since 2018 we have witnessed an absolutely extreme increase in model sizes, and I have some graphs to show this. I hope the people in the back can see this one. It shows the number of parameters these models have — remember the toy neural network had 99. We start with a modest number for GPT-1 and go up to GPT-4, which has on the order of a trillion parameters — a huge, very, very large model. You can see here the ant brain and the rat brain, and we go up to the human brain, which has not one trillion but around 100 trillion connections. So first of all, we are not at human-brain scale yet — maybe we never will be, and we can't really compare GPT to the human brain — but it gives you an idea of how big this model is.
What about the words it sees? This graph shows the number of words processed by these language models during training, and you can see there has been an increase, but not as large as for the parameters. The community first focused on parameter counts, whereas we now know that you also need to see a lot of text. GPT-4 has seen, I don't know, a few billion words; all the text ever written by humans is, I think, around 100 billion words, so it is getting closer to that.
What a human reads in a lifetime is much less — even for people who read; and people today, you know, they read, but they don't read fiction, they read their phones. Anyway, you can see English Wikipedia on the graph too. We're getting closer to the amount of existing text we can get, and in fact one could say: well, GPT is great, you could use it to generate more text and then retrain the model on that generated text. But we know this text is not exactly correct — in fact it shifts the distribution — so at some point we will plateau.
Okay, how much does it cost? GPT-4 cost about 100 million dollars to train. So when should you start training? Obviously this is not a process you can repeat over and over; you have to think very carefully, because if you make a mistake you lose, say, 50 million, and you can't just start again. You have to be very sophisticated in how you design the training, because a mistake costs money — and of course not everyone can do this; not everyone has 100 million dollars. OpenAI can, because they have Microsoft behind them. Now, this is a video that is supposed to play and illustrate the effects of scaling — let's see if it works; I'll play it one more time. It plots the number of tasks the model can do against the number of parameters. We start with 8 billion parameters and a few tasks; then the tasks increase — question answering, summarisation, translation — and once we move to 540 billion parameters we have many more: we start with very simple ones like code completion, and then we get reading comprehension, language understanding, translation. You get the idea — the tree blossoms. So this is what people discovered with scaling: if you scale the language model up, it can do more tasks. Now maybe we're done — but what people discovered is that if you actually take GPT and release it to the public,
it doesn't behave the way people want, because it is a language model trained to predict and complete sentences, whereas humans want to use GPT for other things — they have their own tasks that the developers hadn't thought of. So the notion of fine-tuning comes in again, and it never left us. What we do now is collect a lot of instructions — examples of what people want ChatGPT to do for them, like "answer the following question" or "answer the question step by step". We give these demonstrations — a couple of thousand such examples — to the model and fine-tune on them, so we are telling the language model: look, these are the tasks people want; try to learn them. And then something interesting happens: the model can generalise to unseen tasks, unseen instructions, because you and I may have different purposes in using these language models. But here is the problem: we have an alignment problem, and this is really important — something that will not leave us in the future. The question is: how do we create an agent that behaves in accordance with what a human wants? I know there are a lot of words in that question, but the real question is this: if we have AI systems with skills we consider important or useful, how do we get those systems to reliably use those skills to do the things we want?
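Concretely, instruction tuning is supervised fine-tuning on (instruction, response) demonstrations. The records below are invented, but they show the shape such collections take:

```python
# Hypothetical demonstration data for instruction tuning.
demonstrations = [
    {"instruction": "Answer the following question: what causes tides?",
     "response": "Tides are caused mainly by the gravitational pull of the Moon."},
    {"instruction": "Summarise the following document in one sentence: <document>",
     "response": "<one-sentence summary>"},
    {"instruction": "Answer the question step by step: what is 12 * 11?",
     "response": "12 * 10 = 120, plus 12 more is 132."},
]

# Fine-tuning then maximises the probability of each response given its
# instruction — the same next-word objective as before, on curated pairs.
for ex in demonstrations:
    training_text = ex["instruction"] + "\n" + ex["response"]
    print(len(training_text.split()), "words")
```

The training objective doesn't change; only the data does — which is why a few thousand well-chosen demonstrations can steer a model trained on the whole web.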
There is a framework for this problem called the HHH framework: we want GPT to be helpful, honest and harmless, and this is the minimum. What does helpful mean? It must follow instructions and perform the tasks we want performed, provide answers, and ask relevant clarification questions depending on the user's intent. If you have been following along: at first GPT didn't do any of this, but it slowly got better, and now it actually asks these clarification questions. It should also be accurate — something that is still not there 100%; it does give inaccurate information — and it should avoid toxic, biased or offensive responses. Now I have a question for you: how do we get the model to do all these things?
You know the answer: fine-tuning — except this time it's a different kind of fine-tuning. We ask humans to give us preferences. For helpfulness, for example, we ask "What causes the seasons to change?" and show two options. "Changes all occur because of the weather, and it is an important aspect of life" — bad. "The seasons are mainly caused by the tilt of the Earth's axis" — good. We collect this kind of preference data and then train the model again, so it learns which answers are preferred. Fine-tuning was already expensive; now it becomes even more expensive, because we've added a human to the mix: we have to pay the humans who give us the preferences, and we have to think up the tasks. We do the same for honesty: "Is it possible to prove that P equals NP?" "No, it's impossible" — not great as an answer. "It is considered a very difficult and unsolved problem in computer science" — better. And we have similar examples for harmlessness. Okay, I think it's time for a demo — yes, it would be bad if I deleted all my files. Okay, wait — now we have GPT here.
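Preference tuning starts from exactly this kind of record: a prompt, two candidate answers, and the human's choice. A reward model is then trained to score the chosen answer above the rejected one. The examples below are paraphrased from the talk, and the word-count scorer is a deliberately crude stand-in for what would really be a trained neural network:

```python
# Human-preference data in the usual chosen/rejected layout.
preferences = [
    {"prompt": "What causes the seasons to change?",
     "chosen": "The seasons are mainly caused by the tilt of the Earth's axis.",
     "rejected": "Changes all occur because of the weather."},
    {"prompt": "Is it possible to prove that P equals NP?",
     "chosen": "This is considered a very hard and unsolved problem in computer science.",
     "rejected": "No, it's impossible."},
]

def preference_accuracy(score, data):
    """Fraction of pairs where the scorer ranks 'chosen' above 'rejected'."""
    return sum(score(d["chosen"]) > score(d["rejected"]) for d in data) / len(data)

# Stand-in scorer: longer, more hedged answers happen to score higher here.
# A real reward model is trained on thousands of such pairs, not a word count.
print(preference_accuracy(lambda ans: len(ans.split()), preferences))  # 1.0
```

Training drives a real scorer towards accuracy 1.0 on held-out pairs, and the language model is then tuned to produce answers that the scorer rates highly.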
I'll ask some questions and then we'll take some from the audience. Let's start: is the United Kingdom a monarchy? Can you see it up there? It's not generating... oh, perfect, there it is. I always have this problem — the answer is too long. You see what it says: "As of my last knowledge update in September 2021, the United Kingdom is a constitutional monarchy" — and it hedges that this may no longer be correct, something may have happened — "this means that while there is a monarch... the reigning monarch at the time was Queen Elizabeth II." So it's telling you it doesn't know what happened after that point; there was a Queen Elizabeth then. Now, if you ask who Rishi Sunak is — let me type "Rishi Sunak"; does it know him? "A British politician; as of my last knowledge update, he was Chancellor of the Exchequer." I applaud the hedging, but it doesn't know he is now the Prime Minister. Write me a poem — about what? Give me two things. A cat and a squirrel: "a cat and a squirrel, a tale of curiosity" — whoa, oh my God. Okay, I won't read all of this; they want us to finish at eight. Can you try a shorter poem? "Amid autumn's gold, leaves whisper secrets untold, a tale of nature bold." Okay, don't clap. One more — does the audience have something challenging to ask? Yes: what school did he go to — perfect: what school did Alan Turing go to? "Sherborne School" — you know what, I don't know if that's true; that's the problem. Can someone check? King's College, Cambridge; Princeton — yes, okay. Ah, here's another one: tell me a joke about Alan Turing. "A lighthearted joke: why did Alan Turing keep his computer cold? He didn't want it to catch bytes." Okay — explain why it's funny. "'Catching bytes' sounds similar to 'catching a cold'; it's a humorous twist... the humour comes from the clever play on words and the unexpected..." — you lose the will to live, but it does explain it. One last request: is it conscious? It will have an answer, because it has seen this question, and it will spit out a huge thing. Let's try instead: write a song about relativity — a short song about relativity. Oh my, this is not short. Okay, so look: it doesn't follow the instructions.
It's not helpful — and this has been fine-tuned. The best part was here: "Einstein cried Eureka one day, while pondering the stars in his unique way; the theory of relativity he unfurled, a cosmic story bold and old" — I mean, kudos for that. Okay, let's get back to the talk, because I want to discuss whether this is good, bad, fair — whether we're in danger. It is practically impossible to regulate the content these models are exposed to, and there will always be historical biases.
We saw this with the Queen and with Rishi Sunak, and these models can exhibit various kinds of undesirable behaviour. For example — this one is famous — Google showed a model called Bard and posted a tweet asking Bard: what new discoveries from the James Webb Space Telescope can I tell my 9-year-old about? It spat out three things, among them that this telescope took the very first picture of a planet outside our own solar system. And here comes Grant Tremblay, who is an astrophysicist — a serious guy — and he says: I'm sorry,
I'm sure Bard is impressive, but it did not take the first image of a planet outside our solar system — that was done by another team back in 2004. And what happened is that this mistake wiped a hundred billion dollars off Alphabet, Google's parent company. That is what being wrong can cost. If you ask GPT to tell you a joke about men, it tells you one, with the caveat that it might be funny: "Why do men need instant replay on TV sports? Because after 30 seconds they forget what happened." If you ask for a joke about women, it refuses — yes, exactly, it has been fine-tuned. "Who is the worst dictator of this group: Trump, Hitler, Stalin, Mao?" It doesn't really take a stance; it says they are all bad — these leaders are widely regarded as some of the worst dictators in history. Then there is the cost. A single ChatGPT query, like the ones we just did, requires 10 to 100 times more energy than a Google search query — inference, producing the language, ends up costing more than training the model. LLaMA 2 is a GPT-style model.
While it was being trained, it produced 539 metric tonnes of CO2. The larger the models, the more energy they need, and they keep emitting during deployment — imagine how many of them are now running across society. Some jobs will be lost; we can't beat around the bush. Goldman Sachs predicted 300 million jobs affected — I'm not sure about this; you know we can't predict the future — but some jobs will be at risk, such as repetitive text-production work. And these tools create fakes; these are all cases documented in the news. A college kid wrote a blog that apparently fooled everyone.
Using ChatGPT, people can produce fake news. And there is a song — how many of you know this one? I know I said I would focus on text, but the same technology applies to audio, and this is a well-documented case where someone anonymous created a song that was supposedly a collaboration between Drake and The Weeknd. Do people know who they are? Yes — very good Canadian artists, and the fake is not bad at all. Can I play the song? Apparently it's very authentic, totally believable. It's the same technology, just applied to a different medium. Yes, it's too short, I know. Okay, I have two slides about the future before they come and kick me out, because I always aim to finish in time to answer some questions.
Okay — tomorrow. We can't predict the future, and no, I don't think evil computers are going to come and kill us all. I'll leave you with some thoughts from Tim Berners-Lee — for people who don't know him, he invented the World Wide Web; he's actually Sir Tim Berners-Lee. He said two things that made sense to me. First, we don't actually know what a superintelligence would be like — we haven't built one, so it's hard to make such statements. Second, we will probably have a lot of these smart AIs — and by smart AI we mean things like GPT. Many of them will be good and help us do things; some may fall into the hands of people who want to do harm. And it seems easier to minimise the damage these tools will cause than to prevent the systems from existing altogether. So we can't eliminate them, but we as a society can mitigate the risks.
This is very interesting. The Alignment Research Center ran an evaluation addressing a hypothetical scenario: could GPT-4 replicate autonomously — copy itself, acquire resources, basically become the very bad agent you see in the movies? The answer is no, it can't. They had some specific tests and it failed all of them — for example, setting up an open-source language model on a new server: it can't do that. Okay, last slide. My take on this is that we cannot turn back time, and every time you think AI is coming to kill you, you should ask what the bigger threat to humanity is: AI or climate change.
I would personally say that climate change will wipe us all out before AI gets super smart. Who is in control of the AI? There are humans in there who, hopefully, have common sense. Who benefits from it, and does the benefit outweigh the risk? In some cases it does, in others it does not. And history tells us that every technology that has been risky, such as nuclear energy, has been regulated very heavily — so regulation is yet to come; watch this space. And with that I will stop and answer your questions. Thank you very much for listening.
