Sparks of AGI: early experiments with GPT-4
Apr 16, 2023
Hi everyone, welcome to the Hot Topics in Computing series. Today I'm delighted to introduce our special guest, Sébastien Bubeck, who comes from Microsoft. Sébastien received a bachelor's degree from the École Normale Supérieure de Cachan and a PhD from the University of Lille and INRIA. He was a professor at Princeton for three years, between 2011 and 2014, before joining Microsoft, so since 2014 Sébastien has been at Microsoft, and we are very happy and delighted to have him here. I have to tell you that, like all of you, I decided to ask ChatGPT for help in introducing Sébastien, and, um, the first line suggested by ChatGPT was: "In his seminar,
Dr. Bubeck will discuss recent advances in optimization, with an emphasis on convex optimization and its interplay with statistical inference and online learning," and it continued. So I said no, it's not about optimization, here is the title and abstract, and here comes the answer: "I'm sorry, but I must clarify that the title of the talk and the abstract that you provided do not seem to be related to Sébastien Bubeck's research or previous work. It's unlikely that he'll give a talk on artificial intelligence." So, Sébastien, it's all yours.
Daniela, this is really the perfect intro, because ChatGPT nailed it: it was highly unlikely that I would give a talk like this, but that's how it is. It happens, you know; the world has changed, and I'm changing my research in reaction to it. There is this very mysterious title, "First Contact", but really the story is that during the last few months at Microsoft I had early access to GPT-4, as we were working on integrating it with the new Bing. Of course, while I was working on it, we didn't just do the product part, which was a lot of fun; we also did some science around it, or tried to. It's hard to do science with these models, and what I'm going to tell you about today is the science part of our journey over the last few months. So the actual title of the talk is (huh, this isn't working for some reason) "Sparks of AGI". What is our assessment after working with GPT-4 for the last few months? We are seeing the premise of something that looks like artificial general intelligence, and my goal in this presentation is to try to convince you that something has really changed with the arrival of GPT-4.
This is joint work with many fantastic colleagues at MSR that I want to acknowledge: Varun Chandrasekaran, who was a postdoc; Ronen Eldan, whom many of you in the room know very well and who joined us recently; Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee; Yin Tat Lee and Yuanzhi Li, who were also part of my group (and I think ChatGPT would find it just as unlikely that they work on this as it did for me); Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang, who was a postdoc with us and has now joined full time.
Let me start with some acknowledgments and clarifications that I think are very important. First of all, the model we studied, GPT-4, is entirely OpenAI's creation. I had nothing to do with it; they gave us access to it purely as a black box. They deserve all the credit for creating this really wonderful tool that is going to change the world, and I want that to be very clear.
The second important point is that our experiments were done on an early version of the model, and that means several things. One is that, in the documents they published and the announcements they made, there is a multimodal version; the version we had access to was not multimodal. It was text input only and text output only. More importantly, they made further modifications to the neural network after we experimented with it, and because of these additional modifications, the answers you get if you try some of the prompts I'll show you will differ. In particular, you might get worse answers than the ones I'll show you. The reason is that they further fine-tuned the model for safety, and they explain very clearly in the technical report that they modified the model and, in a way, simplified it to make it safer. Okay, that's an important clarification.
Now, any scientist in the room might be concerned: that means we're not going to be able to reproduce what you're telling us. And yes, you won't be able to reproduce it. That being said, I don't think reproducibility is that big of an issue in this particular case, and the reason is that I'm not going to give you any quantitative number at all. There will not be a single benchmark in my presentation. It's about the qualitative leap: not an increase of 10 on this benchmark and 20 on that benchmark, it's something else. What I want to try to convince you of is that there is some intelligence in this system, and that it's time we call it an intelligent system. We're going to discuss what I mean by intelligence, and at the end of the presentation you will see that it is a judgment call; whether this is a new type of intelligence is not a clean cut. But this is what I will try to argue.
Now, I know that saying those words probably triggers a lot of emotion in many of you. In particular, you might say: absolutely not, it's not smart, it doesn't even have representations, and so on. So, a warning about this kind of argument, which I see a lot online and even in the newspapers: "It's just copy and paste. It has no internal representation. It's just statistics; how could it be smart?"
"It doesn't even have a world model." This presentation isn't about debunking all of those claims, but I still want to say: be really careful with trillion-dimensional space. It's something that's very, very hard for us as human beings to understand, and there's a lot you can do with a trillion parameters. So when people say it doesn't have a world model, it's not so clear: it could absolutely build an internal representation of the world and act on it as the processing progresses through the layers and, temporally, through the sentence. Maybe just two sentences to help you think about this. From my perspective, we shouldn't think of these neural networks as learning simple concepts like "Paris is the capital of France". They're doing a lot more: they're learning operators, they're learning algorithms. So it's not just retrieving information at all; it has built an internal representation that allows it to reproduce the data it has seen succinctly. So you really shouldn't think of it as pattern matching and just trying to predict the next word. Yes, it was trained just to predict the next word, but what came out of this is a lot more than a statistical pattern-matching object. I think we really need to think of it as learning algorithms, and in my opinion we don't have the tools in learning theory to think about this kind of learning. It's something very, very different from what we're used to, and I think it's going to be great to think about, but that's not the point of this presentation; that's not what I want to do here, and I don't know how to do it either.
Okay, so at this point a lot of you are burning with this question: but wait, these things can't have common sense. They don't understand the real world; they have only experienced reality through text on the internet. They don't know what it feels like to taste a hot cup of coffee or something. Okay, let's try it. Let's look at a lot of examples and see what happens. Here's an example, and you'll see there are going to be a lot of examples like this that might seem a little silly, but the point of the silliness is to be really outside of what's on the internet, to really try to go beyond memorization.
Okay, here's a simple riddle that we asked GPT-4: "I have a book, nine eggs, a laptop, a bottle, and a nail. Please tell me how to stack them on top of each other." I don't think this question appears anywhere on the internet; it's a really weird question. So this is what ChatGPT says: "It would be difficult to stack all these objects..." blah blah blah, "place the bottle on a flat surface," "carefully balance the nail on top of the bottle." Okay, it's not starting very well. "Place the egg on top of the nail." Okay, you're in trouble, my friend; this isn't going to work. So that's ChatGPT, and here any skeptic will happily say: look, I was right all along, these things don't understand anything, they don't have a representation of the world, they have no common sense, I win. Okay, so let's see what GPT-4 does: "A possible way to stack the objects stably is to place the book on a flat surface," blah blah blah, "arrange the nine eggs in a three-by-three square, leaving some space between them. The eggs will form a second layer and distribute the weight evenly," and then, presumably, you put your laptop on top, and so on. Okay, so at least on this very simple question, it understood; it had some common sense to answer the question.
The literature is full of examples of common-sense questions where these models fail dramatically. We tried all of them, and GPT-4 succeeds on all of them. Okay, so let's just agree for now that it has some common sense. Next, you could say: okay, sure, it understands that eggs are fragile; I'll give you that. But what about theory of mind, which is more elaborate? Of course it doesn't really understand human beings, their motives, their emotions; that's beyond its capacity. This is a hot topic, and I don't know if some of the authors of these articles are in the room, but it is hotly debated. First there was an article saying that theory of mind emerged spontaneously in large language models. Then there was a follow-up article saying no, no, wait: if you make trivial modifications, if you only modify the question a bit, it completely fails. Then there is this very interesting article from Josh Tenenbaum's group arguing that language and thought are two very different things. You'll notice that I also included an explainability and interpretability paper. I won't touch on this much, but it's an important point: I will now try to convince you that, of course, GPT-4 has a theory of mind, and not only that, but I think it will change the subfield of machine learning interpretability, because as soon as these models understand human beings, they will also be able to explain their decisions in a way that you can understand. Now, of course, I know everyone will say: okay, it will explain itself, but does it really explain its inner workings?
This presentation won't focus on that, but I think there will be a lot of experimentation around it. Let me also add that there is an article that will appear on arXiv tonight; by chance it coincides with this talk, so you can see all the details there in three hours. Now let me try to convince you of this theory of mind. I will take an example from Thomas Role: in a room there are John, Mark, a cat, a box, and a basket. John takes the cat and puts it in the basket. He leaves the room, and then, while John is gone, Mark takes the cat out of the basket and puts it in the box.
Eventually they all come back. What are they thinking? Okay, it's a very simple theory-of-mind test: someone who didn't see the cat being moved must still think it's in the basket. ChatGPT fails on this. There are too many entities to keep track of: you must have an internal representation that you update while reading the text, and you need to move your representation of where the cat is. So let's see what GPT-4 does: "It is an interesting puzzle," blah blah blah. Yeah, right. Oh, and it also covers the cat: the cat thinks these are weird people, why are they moving me around? Okay, so this is the kind of surprise I had over and over again. I'm not saying this is particularly deep, but just take a second to take it in as interesting. Okay.
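The "internal representation" this puzzle demands can be made concrete with a toy belief tracker: keep the true location of each object, plus, for each person, where they last saw it, and update a person's belief only when they are in the room. This is a minimal sketch of the logic of the John/Mark/cat story, assuming nobody looks inside the box or basket on returning. The `Room` class and `cat_story` function are hypothetical illustrations, and nothing here claims to describe how GPT-4 represents the story internally.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Room:
    """Toy world state for a false-belief story (hypothetical helper)."""
    location: Dict[str, str] = field(default_factory=dict)            # object -> true location
    present: Set[str] = field(default_factory=set)                    # people currently in the room
    beliefs: Dict[str, Dict[str, str]] = field(default_factory=dict)  # person -> object -> believed location

    def enter(self, person: str) -> None:
        # Entering does not reveal hidden changes: existing beliefs are kept.
        self.present.add(person)
        self.beliefs.setdefault(person, {})

    def leave(self, person: str) -> None:
        self.present.discard(person)

    def move(self, obj: str, dest: str) -> None:
        self.location[obj] = dest
        # Only people in the room witness the move and update their belief.
        for person in self.present:
            self.beliefs[person][obj] = dest


def cat_story() -> Room:
    room = Room()
    room.enter("John")
    room.enter("Mark")
    room.move("cat", "basket")  # John puts the cat in the basket
    room.leave("John")
    room.move("cat", "box")     # Mark moves the cat while John is away
    room.leave("Mark")
    room.enter("John")          # both come back without looking inside
    room.enter("Mark")
    return room


room = cat_story()
print(room.beliefs["John"]["cat"], room.beliefs["Mark"]["cat"], room.location["cat"])
# prints: basket box box
```

The only state that makes the puzzle non-trivial is the per-person belief table, which diverges from the true `location` exactly when someone is out of the room during a move.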
I don't want the whole presentation to be about common sense and theory of mind. Say it does those two things, fine, but you wouldn't go so fast as to call it intelligent, right? Intelligence is much more than all this, and I know the answer here is not going to be a slam dunk; I want to be very, very clear about that. If we start talking about intelligence, the first thing I have to do is give a definition we can work with, and here I don't want to come up with my own definition. People have been working on this question for decades, if not longer; I'd argue that human beings have been thinking about intelligence for a very long time. So what I'm going to do is take a consensus definition that was published in 1994 by a group of 52 psychologists. In the 90s there was a very heated debate about the meaning of IQ tests, and this group of psychologists came up with a definition of what intelligence is. We can debate it, and you may disagree with several parts, but this is going to be my reference definition. So what is this definition?
"Intelligence is a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience." What we're going to do in this presentation is measure GPT-4 along those six dimensions and see where it fails and where it works. Our evaluation is as follows. I am very comfortable saying that GPT-4 can reason. I'm also very comfortable saying that GPT-4 cannot plan, and this is a very subtle and touchy subject that we will get into towards the end of the presentation, because it can give you the impression that it is planning. There are a lot of problems where you might naively think you need planning, but actually, in terms of algorithm design, there is a linear solution: you look at the problem naively and think, oh, I need to think 10 steps ahead, and so on, but if you're a bit smarter at designing algorithms, there's a linear solution that proceeds step by step. All those problems GPT-4 will solve. It can solve problems, lots of problems; we'll see that. It can think abstractly, and it can absolutely comprehend complex ideas. The last point, learning quickly and learning from experience, is a subtle one. GPT-4 is a large language model; it's frozen in time. It doesn't update: every day is a new day for GPT-4, and every session is a new session. So there is no real learning, no real-time learning. But within the span of one session, you can teach it a new concept it has never seen, and it can understand it and then work with it, absolutely. So there is some real-time learning, but there is no memory, of course. Now let me say immediately that whether you call this intelligence or not, again, depends a bit on you.
Some people would argue that planning is the essence of human intelligence: everything else, animals can do too, and what really sets us apart is planning. If that is your answer, then GPT-4 is not intelligent. Another perspective might be that the central point of intelligence is being able to acquire new skills. If that's your perspective on intelligence, then GPT-4 is not intelligent either. If your perspective is that what matters is solving problems, thinking abstractly, comprehending complex ideas, and reasoning about new things that come to you, then I think you have to call GPT-4 intelligent. Okay, now how do we get to this assessment? The point is, of course, that you can't do this assessment with benchmarks.
I don't know what GPT-4 was trained on. My working assumption is that it was trained on everything, all the data produced digitally by humanity. I'm not saying that's right, but it's my working assumption. So anything available online, GPT-4 might have seen, and in particular, whatever benchmark there is, I assume it has seen it, so we can't test it on that. I don't want to test it on benchmarks. Instead, we're going to take an approach rooted in psychology: instead of following machine learning, we're going to follow psychology. The way you test intelligence there is to give creative tasks, tasks that are outside of what you've seen, that require really new and novel ways of thinking about problems, and to test across a wide range of domains. The domains we chose for the article include vision, which is already interesting, because I told you it's not a multimodal model; it can only output text, and that's fine, but we'll test it on vision anyway, and on theory of mind, which is something very important that I want to touch on. There are many other domains we could have selected: medicine, law, physics, chemistry. The point is that GPT-4's intelligence is general: it can do all those things equally well. So let's start this evaluation journey, and let me show you an example of what I mean by a creative task. The first thing anyone does when they get access to ChatGPT is ask it to write a poem, and that's what we did too the first time we got access to GPT-4. As a mathematician, I asked it to write a proof that there are infinitely many primes, with every line rhyming. That sounds like a fun thing to try, so let's do it.
Let's look at what ChatGPT does: "Sure, here's a proof that there are infinitely many primes, with every line rhyming: Consider that..." It's just awful. The lines don't rhyme; it just doesn't do what I asked.
It's not that the math is wrong: that's a correct proof, by the way, one of the many, many proofs that there are infinitely many primes. So that's good, but it didn't do the task. Let's see what GPT-4 does: "Yes, I think I can, though it might take a clever plan. I'll start by noting Euclid's proof, which shows that primes aren't just aloof. Assume we have a finite list of primes, and that none have been missed. Multiply them all together, and add one, just to be clever." Add one, just to be clever! The quality of this is amazing. But the point is, you don't stop there. You don't test intelligence by asking a question, hearing the answer, noting whether it's right or wrong, and moving on. When you try to assess a student, you have an interaction with the student. Sometimes the student makes mistakes, and you don't just say: ah, you don't understand anything, let me stop there. No, you try to guide the student. This is what we're going to do throughout the presentation: we're going to keep probing by asking questions, and if GPT-4 strays, we're going to help it out a bit. So let's see how we can go further. Again, the point is to be creative and ask an out-of-the-box question, so what I'm going to ask is to draw an illustration of this proof. But it's not a visual proof: if I asked you to draw the proof that there are infinitely many primes, it's not clear what you would draw. You would come up with something, but it's not obvious. Moreover, it's not supposed to generate pictures.
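The step the poem hinges on, "multiply them all together, and add one", is easy to check numerically. In this minimal sketch (the function names are hypothetical), every listed prime divides the product, so the product plus one leaves remainder 1 modulo each of them; any prime factor of that number is therefore missing from the list. Note that the product plus one need not be prime itself: for the first six primes it is 30031 = 59 × 509.

```python
from math import prod
from typing import List


def smallest_prime_factor(n: int) -> int:
    """Trial division: the first divisor d >= 2 of n is necessarily prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_new_prime(primes: List[int]) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    n = prod(primes) + 1  # "multiply them all together, and add one"
    # Each listed prime divides the product, so it leaves remainder 1 on n.
    assert all(n % p == 1 for p in primes)
    # Hence any prime factor of n, e.g. the smallest one, is not in the list.
    return smallest_prime_factor(n)


first_six = [2, 3, 5, 7, 11, 13]
print(prod(first_six) + 1, euclid_new_prime(first_six))  # 30031 59
```

Trial division is only there to make the example concrete; the proof itself never needs to find the factor, only to know that one exists.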
So how will it draw? Well, here I say it in the question: in SVG format. I might not even have said SVG format; I could have said "can you draw an illustration?" and it would have responded, "hey, here's a picture in SVG format." So what is SVG format? It doesn't matter: scalable vector graphics, it's a lot of code. So it responds with lines of code like this; this is GPT-4's response, and if you just save it as HTML, this is the image you get. It's not surprising by any means, but it captures the gist of the proof: you have the finite list of primes, two, three, five, seven, eleven, and so on. Now you multiply them into a new number N, and then you add one, just to be clever, like I was saying, and this new N plus one is the number that is supposed to give you a new prime. So this was just a warm-up. Let's dig a little deeper into these vision capabilities, and here I want to tell you about the strange case of the unicorn, which is my favorite example. Let me show you the question: draw a unicorn in TikZ. I know many of you in this audience have played with TikZ to draw pictures in LaTeX, and personally, when I was a PhD student, I wasted many, many hours fighting with TikZ. It's a real pain to draw anything in TikZ, and drawing a unicorn in TikZ would take me, I don't know, something like two days. Also, I'm pretty sure no one on the internet has ever asked this question, or has ever drawn a unicorn in TikZ.
Who would waste time doing this? It makes no sense. But just the fact that I think it's not on the internet isn't enough; we'll have to investigate, we'll have to go further, and we will, don't worry. But first let me show you the unicorn it came up with. Okay, so this is the GPT-4 unicorn. When I saw it, I was personally surprised, because it really understands the concept of a unicorn: it knows what the key elements are, and it was able to draw this very abstract unicorn. And just so that you really see the gap between GPT-4 and ChatGPT, this is the ChatGPT unicorn. So this is the progress that has been made. I really want to be clear: there is a world of difference between ChatGPT and GPT-4. If you played around with ChatGPT and weren't convinced, I encourage you not to stop there. Now, of course, you can still say: okay, this is not that great. But one of the things we're going to look at is that GPT-4 is smart enough to use tools. So you can reply: hey, I don't really like your drawing, can you try to improve it? I've heard of these diffusion models; maybe you can use one of them. And it will say: yeah, sure, go to the website of this diffusion model, plug in my image, and ask it to enhance it. And this is what you get: this is the GPT-4 unicorn when tool use is allowed. So you can see where this could go. Now again, like I said, I don't want to stop there; we're going to investigate more. How are we going to investigate more in this case? What I'm going to do is the following.
I'm going to take the TikZ code that was produced, and I'm going to remove all the comments from it. That's one of the properties of GPT-4: it produces code that's very human-readable, which is kind of funny for a machine, and it adds a lot of comments that really guide you through its way of thinking. So I'll delete all this information, so that it doesn't know this is supposed to draw a unicorn; there is no information about a unicorn left. I'll also make sure, because who knows,
maybe it copied this from the web: I'm going to randomly perturb all the coordinates, so it's something it has never seen. Then I'm going to remove the horn and ask it to fix the code. I give the code back in a new session and say: this TikZ code is supposed to draw a unicorn, but the horn is missing; can you add it back? And this is what happens. It was actually able to locate the head. You understand, this is not an easy problem: you have these three ellipses, these three elements, and the head and the mane (by the way, it's not very good at drawing the mane), but it really was able to locate the head. I don't want to stay on this unicorn example too long, but I just want to say what's really amazing: we had access starting in September, and they kept training the model, and as they kept training it, I kept asking for my unicorn in TikZ to see what was going to happen. And this is what happens: it just kept getting better, and I skipped the best one, which is on my computer. Once they started training for safety, the unicorn started to degrade. So if tonight you go home and ask GPT-4 and ChatGPT to draw a unicorn in TikZ, you'll get something that doesn't look very good, closer to the ChatGPT one. And as silly as it sounds, we used this unicorn benchmark a lot as a sort of intelligence benchmark.
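The obfuscation step described above (strip the comments, then randomly perturb the coordinates) can be sketched in a few lines of Python. The talk does not say how the perturbation was actually done, so this is an assumption: the sketch adds small uniform noise to every number inside parentheses, which in TikZ covers coordinates like `(1.5,-2)` and ellipse radii like `(1 and 0.5)`. The function names and the `scale` parameter are hypothetical, and removing the horn would still be a manual edit.

```python
import random
import re

# An unescaped % starts a TeX comment that runs to the end of the line.
COMMENT_RE = re.compile(r"(?<!\\)%.*$", re.MULTILINE)
# A parenthesised group without nested parentheses, e.g. (1.5,-2) or (1 and 0.5).
PAREN_RE = re.compile(r"\(([^()]*)\)")
# A signed decimal number that is not part of a name such as p1.
NUMBER_RE = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?")


def strip_comments(tikz: str) -> str:
    """Drop every comment, then drop lines that end up empty."""
    lines = [line.rstrip() for line in COMMENT_RE.sub("", tikz).splitlines()]
    return "\n".join(line for line in lines if line.strip())


def perturb_coordinates(tikz: str, scale: float = 0.1, seed: int = 0) -> str:
    """Add uniform noise in [-scale, scale] to each number inside parentheses."""
    rng = random.Random(seed)  # seeded so the obfuscated file is reproducible

    def jitter_number(match):
        value = float(match.group(0)) + rng.uniform(-scale, scale)
        return f"{value:.2f}"

    def jitter_group(match):
        return "(" + NUMBER_RE.sub(jitter_number, match.group(1)) + ")"

    return PAREN_RE.sub(jitter_group, tikz)


source = r"""\draw (0,0) ellipse (1 and 0.5); % body
% horn
\node at (p1) {50\%};"""
print(perturb_coordinates(strip_comments(source), seed=1))
```

This is a regex pass, not a TikZ parser: numbers in options outside parentheses (such as `line width=2pt`) are left alone, and numbers written without a leading digit (such as `.5`) are skipped.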
How good is your unicorn? We were also tuning for safety, and we were really checking whether the unicorn was still nice, because sometimes if you go too far on safety, the model says: oh no, that's too dangerous a task, I don't want to do it. So this was very helpful. Okay, now I'll go a little faster, because there are so many things I want to tell you. You could still say: okay, this vision ability is not useful at all. Actually, it is very, very useful, and the reason is that GPT-4 is smart and understands you. You can equate intelligence with understanding, and understanding means it follows your instructions: if you ask it to do something, it will do what you asked. Let me show you what this means. Take diffusion models: people are not yet convinced that they are intelligent. I think it's already convincing that there is some intelligence there, but it doesn't matter; people are not convinced because these models don't understand the exact position of objects. If you ask for one object to the right of another, say something next to a mug, it could end up at a random location. So they don't really understand the picture. For example, ask for a spoon on top of a mug, and you see it puts the spoon inside the mug; it doesn't really understand how it works. So let me show you what you get from understanding.
I'm going to ask a very strange question, but it could happen, and it could be useful. Say I ask GPT-4 to draw a screenshot of a 3D construction game with a river from left to right, a desert with a pyramid below the river, a city with many high-rises above the river, and at the bottom of the screen four buttons colored green, blue, brown, and red. Something random, but maybe I'm creating a video game and I want this. Suppose I ask a diffusion model to do this.
This is what I get. It looks good, but it's not at all what I asked for. First of all, there is a hallucinated map in the top left corner that I didn't ask for, some kind of life symbol. Also, the four buttons turned into two multicolored buttons. So it did something, but it didn't really understand exactly what I asked for. If you give the same request to GPT-4, this is what you get: exactly what you asked for. It understood; it followed your instructions precisely. Of course you could say: okay, but this doesn't look very good. But again, you don't have to stop there: you can use it as a sketch for a diffusion model, and if you do that, this is what you get. Now it's artistic, and it follows exactly the directions you wanted. I think this opens up a lot of possibilities, as you can imagine. Let me continue and move from drawing to coding, because after all, these drawing capabilities, which I presented as drawing, are really just coding. So here we go. By the way, speaking of coding, all those slide backgrounds: you can imagine who drew them. So let's see what happens once you code with a copilot, like GitHub Copilot, except that now your copilot is smart: it understands you. Let's see what happens if I ask it something quite complicated: write a 3D game in HTML with JavaScript, with the following elements: there are three avatars, which are spheres; the player controls one of the avatars with the keys;
there's an enemy trying to catch the player, and there's a defender trying to protect the player by getting between the enemy and the player, so you understand that the defender is, in a way, sort of an AI itself; and there are obstacles, which are randomly generated. I can ask ChatGPT to do it, and this is what it gives me. At first sight, this is already awesome: it gives me roughly 50 lines of code that compile, and this is a game I can play. The player moves the green ball. Of course, the red ball doesn't move; I guess the blue ball is supposed to be the defender, and it doesn't move either. It's not really 3D. So it did something, but it didn't really understand what I wanted; it didn't follow my instructions precisely. This is what GPT-4 does: this is a real game, and it's fun to play. You move the dark blue ball (it will reset in a second), you look, the red ball moves towards the dark blue ball in the background, and the light blue one is the defender, trying to get between the red ball and the dark blue ball. So it's me in this video, you know,
controlling the dark blue ball, and the defender is doing a good job: it's stopping the red ball. So for us, there's a kind of phase transition in coding at this point. Codex and GitHub Copilot could autocomplete; they were really a kind of autocomplete. ChatGPT writes short code snippets, which is already next level: it can write 50 lines of code for you. But GPT-4 can write 500 to 1,000 lines of code completely, and it works, zero-shot: there are no meta-prompts or anything, it all works out of the box. So this is really what coding with a copilot unlocks. Here I'm showing two animations: on the left is the code that ChatGPT produces, and on the right is the code that GPT-4 produces, and if you look closely, you'll see that GPT-4's code is a lot more expert-level. Now, the whole twist on this slide is that those two videos were produced by GPT-4. I asked GPT-4 to produce a Python script that takes a text file as input and produces a video like this, where the code keeps moving all the time. It would take forever for me to produce those videos by hand. And the question is: who in this room could produce such a Python script,
Let's say in a couple of hours: maybe a few people, but not that many. So this is really the power of GPT-4, and how it unlocks so many things; GPT-4 unlocks so much creativity. I'll go quickly over this slide: we had it pass mock interviews for Amazon and Google (not Microsoft), and it not only passed, it beat 100 percent of the human users. For this particular one, two hours were allotted, and it did it in three minutes and 59 seconds, and it only took that long because of the copying and pasting between the playground and the mock interview website. So I think it's fair to say that this is superhuman coding. Let me move on very quickly to tool use, because I also want to talk to you about math, which will be of interest to a lot of people. The problem is that it still has a lot of weaknesses. Of course, it has no memory: ask it who the president of the USA is, and it says Donald Trump. Ask it what is
the square root of the product of those two numbers, and it says a thousand. Clearly it's not a thousand; it's nine thousand. So it makes arithmetic mistakes. Ask it what the thirteenth letter of this word is, and it says "n", which is wrong. So it makes mistakes; it's not perfect. It's very important for everyone to understand that it's far from perfect; it's flawed, like a human being is flawed. But the point is that it's smart enough to use tools. So you can say: you know what, you have access to a search engine, you have access to a calculator, you have access to this API, which I'm just calling CHARACTER, with parentheses. You have access to all those things; if you need them, please use them. Then, when you ask who the president of the USA is, it won't answer directly; it will say SEARCH: OK, I need to look up this information. What is the square root of this? It will say CALC. What is a certain letter in this word? It will call CHARACTER(word, 13). And I never told it that the syntax is the word, comma, the position of the letter you want; it figures that out automatically. Maybe that's not so impressive, but it can also use much more complex tools. For example, you can give it access to your calendar and your email. What I'm going to show you on this slide is 100 percent real, but I did it manually; you can very easily imagine automating this. I said: schedule a dinner with Joe and Luke at the Contoso restaurant this week. It responds by querying my calendar for the events I have this week, and then it emails Joe and Luke to ask whether they're available for dinner. Then I reply back with their answers: Joe says Tuesday and Wednesday night work, Luke says any day Monday through Thursday, and my calendar says I have plans on Monday and Tuesday. It reasons about the inputs I gave it, and it gets the answer: Wednesday is the day. So it says: let me email Joe, let me add the event to the calendar, and let me send the reservation to the restaurant. It can do all of this automatically, and it comes back to you and tells you that it scheduled dinner for 6:00 p.m. at the Contoso restaurant. Five more minutes? Yeah, OK. Let me wrap up quickly with math, because of course it's a subject of great interest to a lot of us, and let me just say right away: it's not good at math, so don't worry.
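The tool-use pattern described above can be sketched as a small dispatcher that executes whatever tool call the model emits and hands the result back. This is a hedged illustration only: the `NAME(args)` syntax, the `run_tool`, `calc` and `pick_dinner_night` names, and the stubbed `SEARCH` are assumptions, not the interface used in the experiments. The last function replays the scheduling step with the availabilities quoted in the talk.

```python
import ast
import math
import operator
import re

# Hypothetical call syntax: an uppercase tool name followed by (arguments).
TOOL_CALL = re.compile(r"^([A-Z]+)\((.*)\)$")

# Arithmetic operators the calculator tool accepts.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv,
            ast.Pow: operator.pow}


def calc(expr: str) -> float:
    """Evaluate numbers, + - * / **, unary minus and sqrt(...) without eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "sqrt" and len(node.args) == 1):
            return math.sqrt(ev(node.args[0]))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval").body)


def run_tool(line: str) -> str:
    """Execute one model output line if it is a tool call; else pass it through."""
    match = TOOL_CALL.match(line.strip())
    if match is None:
        return line  # no tool call: treat the line as the model's answer
    name, args = match.groups()
    if name == "CALC":
        return str(calc(args))
    if name == "CHARACTER":
        word, position = (part.strip() for part in args.rsplit(",", 1))
        return word[int(position) - 1]  # 1-based, as in "the 13th letter"
    if name == "SEARCH":
        return f"[search-engine results for {args!r}]"  # stub: no real search here
    raise ValueError(f"unknown tool: {name}")


def pick_dinner_night(*guest_nights: set[str], my_plans: set[str]) -> list[str]:
    """Nights on which every guest is free and my calendar is empty."""
    week = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
    free = set.intersection(*guest_nights) - my_plans
    return [night for night in week if night in free]
```

With the availabilities from the talk, `pick_dinner_night({"Tuesday", "Wednesday"}, {"Monday", "Tuesday", "Wednesday", "Thursday"}, my_plans={"Monday", "Tuesday"})` returns `["Wednesday"]`, and `run_tool("CHARACTER(transformer, 5)")` returns `"s"`. The calculator deliberately parses the expression with `ast` rather than calling `eval`, since the input comes from a model.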
I still have a job for now. But it's interesting. So, as I was telling you before, we'll have a conversation with it like we would have with a student. We have had conversations with it at the master's level or so, and it can hold its own at that level, but here I'm going to show you something at a high school level. So we go with this problem: within the duration of a year, a rabbit population is first multiplied by a factor a, and on the last day of the year b rabbits are taken for adoption. Suppose that on the first day of the first year there are x rabbits, and we know that after three years there are exactly 27x - 26 rabbits. What are the values of a and b?
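For reference, the recurrence can be unrolled by hand. Writing P_n for the population after n years (a notation introduced here, not in the talk), with P_0 = x and P_{n+1} = aP_n - b, and assuming the identity must hold for every starting population x (and taking the real cube root):

```latex
\begin{aligned}
P_1 &= ax - b,\\
P_2 &= a(ax - b) - b = a^2x - (a + 1)b,\\
P_3 &= a\bigl(a^2x - (a + 1)b\bigr) - b = a^3x - (a^2 + a + 1)b.\\[4pt]
P_3 = 27x - 26 &\;\Longrightarrow\; a^3 = 27 \text{ and } (a^2 + a + 1)\,b = 26\\
&\;\Longrightarrow\; a = 3 \text{ and } 13b = 26, \text{ so } b = 2.
\end{aligned}
```

With f(x) = ax - b, P_3 is exactly f(f(f(x))), which is the compositional reformulation that comes up next in the conversation.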
Okay, a typical high school question. GPT-4 says: let's call y the rabbit population after one year. Then we know that y = ax - b. After two years the population will be ay - b, which is a(ax - b) - b, and so on; after three years the population is a(a(ax - b) - b) - b. We are given that this equals 27x - 26, so matching coefficients gives a system of equations, and the solution is a = 3 and b = 2, which are the values we're looking for. This is correct. So far so good. But again, when you interview a student, you don't stop there and say, oh wow, OK, he or she understands. So let's go a little more abstract, to see whether it really understood what it was doing. Here I don't just want to ask a question that has a clear answer; I also want to see whether it understood the concept well enough to modify the question itself. So I point out that this question can be asked alternatively in terms of composition, and it provides the compositional formulation: yes, we could say we are looking for values of a and b such that if f(x) = ax - b, then f(f(f(x))) = 27x - 26. OK, that's pretty good; it seems to understand. Let's continue the discussion: suppose now that f is a polynomial of degree 2, so a true quadratic with a nonzero coefficient on x squared. Can you find such a function f in this case?
Well, I want this composition to be equal to a linear function, which is a polynomial of degree one, but the composition three times will be a polynomial of degree eight, and eight is not equal to one, so there is no such function. It's a very simple question, but let's see what GPT-4 does. If f is a polynomial of degree two, then f(x) can be written like this; given this, the equation becomes... and then it starts getting lost, because it starts writing out the composition three times. It writes a lot of things, says I need this equation, that equation, starts writing eight equations, and it doesn't get to the answer. But again, we don't stop there. We say: hey, wait a second, maybe there is something you can deduce here without calculation; maybe you don't want to write everything out like before. And then it says: ah, OK, one thing I can notice is that if f is a polynomial of degree two, then the composition three times is a polynomial of degree 8, so there is no such function. So here you see how delicate it is. It's not clear whether it understands or doesn't understand; I'm just not sure, and that's all I'll say. Now, there's some weird stuff, like the fact that arithmetic is still shaky. I have to say I don't fully understand it, but I understand something, and I'll explain it on this slide. Look at what I give as a prompt: seven times four plus eight times eight. OK, you may not know the value of this immediately, but you know that 8 times 8 is 60-something and 7 times 4 is 20-something, so at least this is below 100. It says 120. This is completely wrong. But the point is that it doesn't stop there; it goes ahead and starts to explain why it thinks it's 120: seven times four plus eight times eight, it does the math, and then it arrives at the correct answer, 92.
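Both facts in this exchange can be checked in two lines. Writing the quadratic as f(x) = cx^2 + dx + e with c ≠ 0 (coefficient names chosen here only for illustration, with "..." standing for lower-order terms), the leading term is squared at each composition step, so the triple composition has degree 8 and cannot equal the degree-1 polynomial 27x - 26. The prompt's sum is likewise well below 120:

```latex
\begin{aligned}
f(f(x)) &= c^3x^4 + \dots, \qquad f(f(f(x))) = c^7x^8 + \dots,
\quad c^7 \neq 0 \;\Rightarrow\; \deg\bigl(f\circ f\circ f\bigr) = 2^3 = 8 \neq 1,\\
7 \cdot 4 + 8 \cdot 8 &= 28 + 64 = 92.
\end{aligned}
```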
So we ask it: wait, what's going on? You started off by saying 120; which is it, 120 or 92? And it says: oh, that was a typo, sorry. There's a lot of information you can take away from this slide; you can really understand, I think, much of what is happening. The first answer is 120. You have to understand that it has to do this sum using only its internal representation, and that is a bit more difficult. When you ask a question like this, once the equation is written, the most likely thing to come next is a number, so it gives you a number: it tries to give you what's most likely to come next. It tries, but fails. But then, what's the second most likely thing after that?
People explain their answer. So it tries to explain its answer, and the main thing is that it gets a different answer. You have to understand that this is awesome, because as far as I know this is a Transformer, so it's based on attention. And since it's based on attention, when it writes seven times four plus eight times eight the second time, its attention is very strongly drawn to the answer of 120. You have to understand that the answer of 120 is part of its truth now: from its point of view, it could be that you told it, "you know what, seven times four plus eight times eight is 120 from now on," which could have been part of the prompt. So the fact that it gets a different answer means it has been trained enough to overcome errors in its prompt, and this is a very, very strong property: it can come up with the correct answer despite making a mistake at the beginning. Now of course, when it says this was a typo, that is also very interesting, because it's obviously not a typo, and this touches on hallucination and a lot of other interesting topics. I want to leave time for questions, so I won't go further into this, but you really have to think about this slide deeply; it says a lot. The last slide before the conclusion is about the fact that it can't do any real planning. And again, I've been surprised by so many tasks it can do where I thought it would require some real planning, but it really doesn't. Let me give you an example. We continue the discussion about seven times four plus eight times eight: OK, now you have this identity, which equals 92; let me ask you a fun question. Can you modify exactly one integer on the left-hand side of this equation so that the answer becomes 106?
So, as a human being, what is your reasoning? It goes like this. I want 106 on the right-hand side, so I need to increase it by 14, and I can only modify one number on the left. I look at the left, I see a seven, and then I have this kind of eureka moment: ah, 14 is 7 times 2. So if it's 7 times 2, then I need to turn this 4 into a 6. That's the whole answer: turn this 4 into a 6. But you see, this eureka moment, although it is extremely simple, came through some kind of planning.
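The goal-first reasoning just described (compute the gap of 14, then ask which single factor can absorb it) is easy to write down explicitly. This is a minimal Python sketch, assuming the left-hand side is a sum of two-factor products; `evaluate`, `plan_single_edit` and the `Term` alias are illustrative names, and nothing here describes how GPT-4 works internally.

```python
Term = tuple[int, int]  # one product p * q on the left-hand side


def evaluate(terms: list[Term]) -> int:
    """Value of a sum of products, e.g. [(7, 4), (8, 8)] -> 7*4 + 8*8."""
    return sum(p * q for p, q in terms)


def plan_single_edit(terms: list[Term], target: int) -> list[tuple[int, int, list[Term]]]:
    """All ways to change exactly one integer so that the sum equals `target`.

    Works backwards from the goal: changing one factor by `delta` shifts the
    sum by `delta * partner`, so the gap must be a multiple of the partner.
    Returns (old_value, new_value, new_terms) for each valid edit.
    """
    gap = target - evaluate(terms)
    if gap == 0:
        return []  # already on target: no single change is needed
    edits = []
    for i, (p, q) in enumerate(terms):
        # Try editing the left factor (partner q), then the right one (partner p).
        for edit_left, value, partner in ((True, p, q), (False, q, p)):
            if partner == 0 or gap % partner != 0:
                continue  # no integer change of this factor closes the gap
            new_value = value + gap // partner
            new_terms = list(terms)
            new_terms[i] = (new_value, q) if edit_left else (p, new_value)
            edits.append((value, new_value, new_terms))
    return edits
```

Here `plan_single_edit([(7, 4), (8, 8)], 106)` finds exactly one edit, the 4 turning into a 6 (7 × 6 + 8 × 8 = 106), while `evaluate([(9, 4), (8, 8)])` gives 100, the dead end GPT-4 walks into next. The point of the sketch is the order of operations: the goal (the gap of 14) is computed before any candidate edit is chosen.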
I was thinking ahead about what I would need, and GPT-4 can't do that, because it's a next-word predictor. So what it does is say: there are a few possible ways to do it, blah blah blah, and then: I can modify exactly one integer; I'm going to modify the seven into a nine, nine times four plus eight times eight, and this equals 106. Wait: if you modify the seven into a nine, this is 100; the answer is not 106. And then it tries to explain why this works: nine times four plus eight times eight is 36 plus 64, and it runs into its initial error. To me this points to the fact that if you trained it more, maybe it would correct itself, and if you trained it even more, maybe it would understand that, although the most probable thing after a question like "seven times four plus eight times eight equals" is a number, the best way to answer is to first do the reasoning. So what I'm saying, through this silly example, is that with more training we're going to get a lot more than what we currently have. What we currently have is already amazing, but it's far from all we can do with this technique; there's a lot more on the horizon. So let me conclude. Is GPT-4 smart, and does it matter? This is a really important question. So again, is GPT-4 smart? It really depends on your definition. I'll leave that to you; I'm not going to make a call on whether it's smart or not. As far as I'm concerned, in terms of my definition of smart, yes, it's smart. Now, it lacks memory and it can't learn in real time; if that is your definition, then it is not smart. It can't think several steps ahead or do real planning; if that is your definition, then it is not smart. But on the other hand, some of the behaviors I showed you are really awesome, and maybe more
important than awesome, they're useful. On my team we all use GPT-4 every day; it's part of our workflow. So the mere fact that it's useful, regardless of whether you call it smart or not, means it will change the world, whether you like it or not. I also want to say that maybe it's an opportunity to rethink what intelligence is, because even though we have decades of psychology studying intelligence, we only had one example of intelligence: the intelligence that natural evolution brought us, the natural intelligence of the natural world. But here we have a new process that led to something that seems smart. So now that we have different examples, maybe we can get to the core of what intelligence is, and maybe the answer of that study will be: no, this new thing shouldn't be called intelligence, because it doesn't do X. That's a very plausible conclusion. But maybe more importantly, as I said, there's a lot more to come. GPT-4 is by no means the end; this is the beginning. This is the first model that shows a glimmer of real intelligence, but there's a lot more on the horizon. So what is the bottom line?
What should we take away from that, as a university, as a society, as humanity? I'm being real here: these are real questions that we have to grapple with, and I really mean that we as a society have to get a handle on them. We need to get beyond the discussion of whether it's just copy-and-paste or just statistics. We need to get past that discussion; the train has left the station. If we're still stuck on that version of the question, we're going to miss the really important questions, so I think it's important to move on. Let me also close by saying that it can do a lot more than what I've shown here. It can do data analysis: you can give it data and it will do an analysis for you. It can be used as a privacy detector. Its medical and legal knowledge is amazing, and here I would like to put in a plug for a book written by Peter Lee of Microsoft Research as lead author, with Carey Goldberg, who's in the room, and Zak Kohane from Harvard, about using GPT-4 for healthcare. The book is called The AI Revolution in Medicine. It's a very complex subject, and I don't want to say one more word about it, because I'm not going to do it justice in one sentence, but its medical knowledge is actually going to make a big impact on healthcare, hopefully in a good way, and we have to think about it deeply. It can play games and act as a game environment. It knows music, even though it has never listened to music. It can do file management, and much more. I will conclude here. Thank you.