
Has Generative AI Already Peaked? - Computerphile

May 15, 2024
So we've looked at CLIP embeddings, and we've talked a lot about using generative AI to produce new sentences, produce new images, to understand images, all these kinds of different things. The idea was that if we look at enough pairs of images and text, we will learn to distill what is in an image into that kind of language. So the idea is that you have an image, you have some text, and you can find a representation where they are both the same. The argument goes that it's just a matter of time before we have so many images to train on, and such a large network, that we get some kind of general intelligence, or some kind of extremely effective AI that works across all domains. That's the implication anyway. You see this argument a lot in the tech sector, from some of these big tech companies who, to be fair, want to sell products, right?
If you keep adding more and more data, or larger and larger models, or a combination of both, ultimately you will move beyond simply recognising cats and dogs and be able to do anything. That's the idea: you show it enough dogs and cats, and eventually the elephant is simply implied. But as someone who works in science, we don't just hypothesise about what will happen; we justify it experimentally. So if you're going to tell me that the only trajectory is up, that it's going to be amazing, I would say: go ahead and prove it, and then we'll see. We'll sit here for a couple of years and see what happens. But in the meantime, let's look at this paper that came out recently, because this paper says that's not true, right?

This paper says that the amount of data you're going to need to get that kind of zero-shot general performance, that is, performance on new tasks you've never seen before, is going to be astronomically vast, to the point where we won't be able to do it. That's the idea, so it basically goes against the notion that we can just add more data and bigger models and we'll figure it out. Now, this is just one paper, and of course your mileage may vary if you have a bigger GPU than these people, and so on. But I think these are real numbers, which is what I like, because I want to see tables of data that show a trend that is actually happening or not happening. I think that's a lot more interesting than a blog post from someone saying 'I think this is going to happen'. So let's talk about what this paper does and why it's interesting.
We have CLIP embeddings. So we have an image, we have a big Vision Transformer, and we have a big text encoder, which is another bit of Transformer like you would see in a large language model, and which takes text strings: "my text string today". And we have a shared embedding space, and that embedding space is just a numerical fingerprint for the meaning of these two items. They're trained on many, many image-text pairs, so when you put in an image and the text that describes that image, you get something in the middle that matches. And the idea is that you can then use that for other tasks: you can use it for classification, and you can use it to retrieve images.
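As a concrete illustration, here is a minimal sketch of that matching process using the publicly released CLIP weights via Hugging Face's transformers library. The checkpoint name is the real public one, but the image file is a placeholder, and this is just one plausible way to drive the model, not code from the paper.

```python
# Minimal CLIP sketch: embed an image and some candidate captions into the
# shared space and see which caption matches best. "cat.jpg" is a placeholder.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")
captions = ["a photo of a cat", "a photo of a dog", "a photo of a castle"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-caption similarity scores; a softmax turns them
# into a probability over the candidate captions, i.e. zero-shot classification.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(captions, probs[0].tolist())))
```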
If you use a streaming service like Spotify or Netflix, they have what's called a recommender system: you've watched this show, this show and this show, so what should you watch next? You may have noticed that your mileage varies on how effective those are, but I actually think it's pretty impressive what they manage to do. You could use an embedding like this as a recommender system, because you could say, essentially, which shows embed in the same region of space as all the things I just watched, and recommend them that way; there's a toy sketch of that just below. So there are downstream tasks, like classification and recommendation, that we could build on a system like this. What this paper shows is that you can't effectively apply these downstream tasks to difficult problems without massive amounts of data to properly back them up. So the idea that you can apply this kind of classification to hard things, not just dogs and cats but specific cats and specific dogs, or subspecies of trees, hard problems where the answer is finer-grained than the broad category: there simply isn't enough data on those things to train these models.
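Here is that recommendation toy sketch. The vectors are random stand-ins for real CLIP-style embeddings and the show names are made up; the point is only the mechanism of ranking by similarity in the shared space.

```python
# Toy embedding-based recommender: the user profile is just the mean of the
# embeddings of watched items; candidates are ranked by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

watched = rng.standard_normal((3, 512))      # embeddings of shows already seen
candidates = {"show_a": rng.standard_normal(512),
              "show_b": rng.standard_normal(512)}

profile = watched.mean(axis=0)               # crude user taste vector
ranked = sorted(candidates,
                key=lambda name: cosine(profile, candidates[name]),
                reverse=True)
print(ranked)                                # best match first
```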
I actually have one of those apps that tells you what specific species a tree is. Isn't that just the same kind of thing? No, because those apps are doing conventional classification, or some other targeted approach; they're not using this kind of giant generative AI. And the argument has been: why solve that silly little specific problem when you could solve the general problem and thereby solve everything? Well, the answer is that it didn't work, and that's why we do it the other way. So there are pros and cons to both, right? I'm not going to say that generative AI isn't useful, or that these models aren't incredibly effective at what they do. But perhaps I'm suggesting that it may be unreasonable to expect them to make very difficult medical diagnoses, because you don't have the dataset to back that up. So how does this paper go about it?
What they do is define these core concepts. Some of the concepts are simple, like a cat or a person; some of them are a bit more difficult, like a specific species of cat, or a specific disease in an image, or something like that. They come up with about 4,000 different concepts, and these are simple textual concepts, not complicated philosophical ideas, because I don't know how well it would embed those. And what they do is look at the prevalence of these concepts in the pre-training datasets, then test how well the downstream task, say zero-shot classification, or remember those recommender systems, performs on each of these different concepts, and plot that against the amount of data they had for that specific concept.
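The prevalence half of that analysis is conceptually simple. Here is a rough sketch of the counting step, with placeholder captions and concepts; the paper's actual pipeline is more involved than naive substring matching, so treat this purely as an illustration.

```python
# Rough sketch of concept-frequency counting: how often does each concept
# string appear in the captions of a pre-training set?
from collections import Counter

concepts = ["cat", "siamese cat", "english oak"]   # the paper uses ~4,000
captions = ["a cat on a sofa", "a siamese cat sleeping", "my cat"]

counts = Counter()
for caption in captions:
    for concept in concepts:
        if concept in caption:
            counts[concept] += 1

print(counts)   # Counter({'cat': 3, 'siamese cat': 1})
```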
So let's draw a graph, and that will help me make it clearer. Imagine we have a graph like this. On this axis is the number of examples of a specific concept in our training set, so let's say a cat, a dog, or something more difficult. And on this axis is the performance on the actual downstream task: recommending a show, retrieving an image, or the ability to actually classify it as a cat. Right? I talked about how you could do zero-shot classification just by seeing whether an image embeds in the same place as the text "a cat", that kind of process. So the best-case scenario, if you'd like an all-powerful AI that can solve all the world's problems, is that this line goes up very steeply. This is the exciting case. That's the kind of AI-explosion argument that basically says we're on the cusp of something, whatever it is, where the scale will be such that this can do anything. Then there's a perhaps more reasonable, shall we say pragmatic, interpretation; let's just call it balanced. That's a kind of linear movement: the idea is that we have to add a lot of examples, but we're going to get a decent performance increase for them, so if we keep adding examples we'll keep improving, and that's going to be great. And remember, if we end up at the top here, we have something that could take any image and tell you exactly what's in it under any circumstances. Similarly, for large language models this would be something that could write with incredible precision about many different topics, and for image generation it would be something that could take your prompt and generate a photorealistic image of it with almost no coaxing. That's the goal. Now, this paper has done a very large number of experiments with many of these concepts, across many models and many downstream tasks, so let's call this third line evidence. You could call it pessimistic, but it's also just what the data shows: it's logarithmic. Basically it goes like this; it flattens out. Now, this is just one paper, right? It doesn't necessarily mean the curve will always flatten. But there is an argument here, and it's not one that they necessarily make in the paper; the paper is very measured.
I'm being a little more cavalier than they are. The suggestion is that you can keep adding more examples, and you can keep making your models bigger, but we're soon about to hit a plateau where we don't improve, and it's costing you millions and millions of dollars to train these things. At what point do you say, OK, this is probably about the best we're going to get with this technology? And then the argument is that we need something else: something in place of the Transformer, or some other way of representing data, or some other machine-learning strategy that's better than this in the long run, if we want to reach those upper lines on the graph. That's the argument, and so this is essentially evidence.
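Sketched in code, the three curves we've been drawing look something like this. Every number here is invented purely to show the shapes: steep, linear, and flattening.

```python
# Illustrative shapes only: the "exciting" explosive case, the balanced
# linear case, and the logarithmic flattening the paper's evidence shows.
import numpy as np
import matplotlib.pyplot as plt

n = np.linspace(1, 1e6, 500)                 # examples of a concept
plt.plot(n, (n / 1e6) ** 2, label="exciting (explosive)")
plt.plot(n, n / 1e6, label="balanced (linear)")
plt.plot(n, np.log10(n) / 6, label="evidence (logarithmic)")
plt.xlabel("examples of the concept in training data")
plt.ylabel("downstream performance")
plt.legend()
plt.show()
```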
I would argue this is evidence against the explosion possibility, you know, the idea that you just add a bit more data and we're on the cusp of something. We could come back here in a couple of years (if you'll still allow me on Computerphile after the absolute shame of these claims) and say, OK, actually, performance has improved tremendously. Or we could say: we've doubled the dataset to 10 billion images and gained 1% on classification. Is that good? Is it worth it? I don't know. This is a really interesting paper because it's very, very thorough. There are a lot of curves and they all look exactly the same: it doesn't matter what method you use, it doesn't matter what dataset you train on, it doesn't matter what your downstream task is, the vast majority of them show this kind of flattening. And the other problem is that we don't have a nice uniform distribution of classes and concepts within our datasets. So, for example, cats, as you can imagine, are overrepresented in the dataset by an order of magnitude, while specific planes or specific trees are incredibly underrepresented. I mean, trees are probably less represented than cats anyway, but specific species of trees are very, very underrepresented. So when you ask one of these models what kind of cat this is, or what kind of tree this is, it performs worse than when you ask what animal this is, because that's a much easier problem. And you see the same thing in image generation: if you ask it to draw a picture of something really common, like a castle (castles appear all over the training set), it can draw you a fantastic castle in the style of Monet and all that other stuff. But if you ask it to draw some obscure artefact from a video game that has barely appeared in the training set, suddenly it starts drawing something of rather lower quality. And the same goes for large language models.
This paper isn't about large language models, but you can see the same process already happening when you talk to something like ChatGPT. When you ask it about a really mainstream physics topic, it will usually give you a pretty good explanation of it, because that topic is well covered in the training set. But the question is what happens when you ask it about something more difficult, when you ask it to write code that's actually quite hard to write. It starts to make things up, it starts to hallucinate and it starts to be less precise, and that essentially degrades performance because the topic is underrepresented in the training set. That's the argument, at least.
I'm starting to think that if we want performance on difficult tasks, tasks that are underrepresented in general internet text and images, we have to find some other way of doing it than just collecting more and more data, particularly because it's incredibly inefficient to do it that way. On the other hand, these companies will have many more GPUs than me, right? They're going to train on bigger and bigger corpora and better quality data, and they're going to use human feedback to fine-tune their language models and things like that, so they may find ways to improve this a little bit as we go. But it will be really interesting to see what happens, because either it will plateau, and we'll see ChatGPT 7 or 8 or 9 being pretty much the same as ChatGPT 4, or we'll see another next-generation boost in performance each time, and the trend keeps going up. Either way, it will be exciting to see which way it goes. Now, take a look at this puzzle devised by Jane Street, the sponsor of today's episode.
It's called Bug Byte, inspired by the world of debugging code that we're all very familiar with, where solving one problem can lead to a whole chain of others. We'll link to the puzzle in the video description, so let me know how you get on. And speaking of Jane Street, we're also going to link to some programs they're running right now. These events are all expenses paid, and they'll give you a small taste of the technology and problem-solving used at trading firms like Jane Street. Are you curious? Are you a problem solver? Do you like computers? I think maybe yes.
If that sounds like you, you may be eligible to apply for one of these programs. Check out the links below or visit the Jane Street website and follow the links there; there are a few deadlines coming up that you might want to watch for, and there are always more on the horizon. Our thanks to Jane Street for running fantastic programs like this and also for supporting our channel. Don't forget to check out the Bug Byte puzzle.
