Google IO 2024 Full Breakdown: Google is RELEVANT Again!

May 17, 2024

Google just wrapped up its Google I/O event and, of course, it was all about AI. It turns out they had some interesting things to release as well, including essentially what OpenAI released yesterday: a multimodal model that you can have a conversation with, one that felt incredibly personal and incredibly real. So we're going to look at the most important parts of the Google I/O event together, and I'm going to give my opinion on it. Let's get to the first thing. We're talking about Gemini, and they're putting Gemini in everything. I remember just a couple of months ago they made a lot of similar announcements about Gemini being in Google Docs, Sheets, Gmail, and Workspace in general, and now they're repeating them. Or maybe they're just saying the same thing again, but I feel like I've already heard a lot of this. Let's still go over some of it, because what makes Gemini stand out is its context window of a million tokens, which at the same time maintains incredible quality.

Today they're going to announce a 2 million token context window, which is absolutely crazy, so let's take a look. First, it's designed to be natively multimodal. Let's hear what Sundar has to say about that model. "Built to be natively multimodal from the start, it could reason across text, images, video, code, and more. It's a big step towards turning any input into any output. We introduced Gemini 1.5 Pro, delivering a breakthrough in long context. It can run 1 million tokens in production, consistently, more than any other large-scale foundation model. The most interesting transformations with Gemini have been in Google Search."

So now they're going to give some examples of how they have been using Gemini in Google Search, and they're going to expand that use even further, because they have to protect their golden goose, and that is search. They are trying to incorporate AI into all aspects of search, and, understandably so: OpenAI is catching up with them. It was rumored that OpenAI was going to announce a search product yesterday, but they didn't; they announced the multimodal model instead. Still, I would be very surprised if OpenAI wasn't working on a search product. "People are using it to search in completely new ways and ask new types of questions, longer and more complex queries, even searches with photos." Okay, I'll skip ahead.

I usually find the part where they repeat everything they've done before pretty boring, so I guess you guys find it boring too. Let's skip ahead. All right, one of my favorite products of all time, literally of all time, is Google Photos. It's absolutely amazing. I have hundreds of thousands of photos and videos in Google Photos. I've used it every day and loved the product for years, and now they're adding AI to Google Photos in a really cool way, so let's look at a little bit of that. "Over 6 billion photos and videos are uploaded every day, and people love using Photos to search across their lives."

"With Gemini, we're making that a whole lot easier. Say you're at a parking station ready to pay, but you can't remember your license plate number. Before, you could search Photos by keywords and then scroll through years of photos looking for the right one. Now you can simply ask Photos. It knows the cars that appear often, triangulates which one is yours, and just tells you the license plate number." I mean, that's super impressive. This is a very specific use case, but it's actually one that I've done myself, and not only for that: also for my driver's license and other information. This is a really cool feature, because if you think about it, if you have that many photos and videos in Google Photos and you can ask questions about any of them, the amount and depth of information in those photos and videos is really tremendous. This has the potential to be a great feature. "And Ask Photos can also help you search your memories in a deeper way. For example, you could be reminiscing about your daughter Lucia's first milestones."

"You can ask Photos, 'When did Lucia learn to swim?' You can even follow up with something more complex: 'Show me how Lucia's swimming has progressed.' Here, Gemini goes beyond a simple search by recognizing different contexts, from swimming in the pool to snorkeling in the ocean, to the text and dates on her swimming certificates, and Photos packages it all together in one summary, so you can really take it all in and relive amazing memories once again. We are rolling out Ask Photos this summer, with more capabilities to come." A really cool feature, building on one of my favorite things about Google Photos.

What it does for me right now is, I guess, create photo albums for me. It combines photos in different ways, it's automatic, it delivers them to me, and it's really nice. An example is: hey, this is what you did a year ago. This new feature is a more dynamic way of doing that: you can basically ask it anything, and I think it will be really cool. This is a feature I will definitely use. And of course the 1 million token context window for Gemini is awesome because it really opens up a lot of different use cases that weren't possible before.

The one that comes to mind, and the one I'm most excited about, is coding, because you could fit a pretty big codebase into a million token context window. But now, as I mentioned, they're about to talk about the 2 million token context window: they're expanding the context window to 2 million tokens. Another thing I'll mention is that I heard directly from Google that they actually have 10 million token context windows working internally, but they haven't released them yet. I can't wait until these context windows become essentially infinite and we don't have to worry about chunking up different documents and deciding what we want to put in the prompt versus what we should leave out.

Right now, they're going to talk a little bit about Gemini in Google Workspace, and I want to mention that this is actually the reason why Google has a pretty clear advantage over OpenAI. Of course OpenAI has some really cool products, and they may have the best model, but the fact that Gemini can actually do things on your behalf, and we'll see more of that in these demos, is really the main feature. It has access to your emails, your documents, your presentations; all the information Google has about you could be used, so it's very personal and can actually perform tasks on your behalf. "People are always searching their emails in Gmail. We're working on making it much more powerful with Gemini. Let's see how. As a parent, you want to know everything that is happening with your child's school. Okay, maybe not everything..."

Okay, so I want to point out one thing: the Google presentation lasts almost 2 hours, it is very polished, it is highly scripted, and at times it feels a bit cheesy. This is a drastic difference from what we saw yesterday with OpenAI, which felt much more warm, personal, and improvised. They did everything live in the demo, and I really like that approach. I'm not a big fan of the highly polished, long-form, highly scripted format. Even that joke was so scripted that it landed flat. I'm a much bigger fan of the much more casual approach. "Now we can ask Gemini to summarize all recent emails from the school. In the background, it's identifying relevant emails, even analyzing attachments such as PDF files, and you get a summary of the key points and action items."

Yes, this is another great example of why Google has a distinct advantage: they have every single one of my emails, meaning hundreds of thousands of emails over decades of use, and now I can use Gemini to search through them and potentially take actions on my behalf. Again, this is something I have been wanting for a long time, and I can't wait, because I search through email frequently, and sometimes I find what I want but sometimes not. So I'm really excited about this feature. "Maybe you were traveling this week and couldn't make it to the PTA meeting."

"The recording of the meeting is an hour long. If it's from Google Meet, you can ask Gemini to give you the highlights." Yeah, again, really great; it saves a lot of time. What would be especially cool is if Gemini became more proactive: it would know that you're on a trip and then automatically summarize the meeting, send you the notes, and ask if you'd like to follow up this way or that, with the suggestion based on the context of everything it knows about the meeting, your trip, what you missed, etc. "There's a parents' group looking for volunteers. Are you free that day? Of course, Gemini can compose a response. There are countless other examples of how this can make life easier." Here, I think they went over this very quickly, but you can simply tell it that you want to volunteer for the parent group event, and then the Gemini model will compose a response and could possibly send it as well. But again, I don't want to have to ask it to write that email.

It should figure out what I want to write based on everything it knows and then draft it, so I can take a quick look at it, approve it, and send it off. Nice. Here's a somewhat new product that they're updating. It's called NotebookLM. It allows you to put everything in one place, documents, PDFs, notes, everything, and then you can ask questions about all that knowledge, all those documents. It looks great. I haven't used it yet, but I want to play with it. "...this notebook. I've been using it with my younger son, and I've added some of his science worksheets, some slideshows from his teacher, and even an open source textbook full of charts and diagrams. With 1.5 Pro, it instantly creates this notebook guide with a helpful summary and can generate a study guide, an FAQ, or even quizzes. But my son Jimmy really learns better when he can listen to something, so we have prototyped a new feature with Gemini, and it's called Audio Overviews."

"NotebookLM will take all the material on the left as input and turn it into a lively science discussion customized for him. Let's listen." "So let's dive into physics. What's on deck for today?" "Well, we're starting with the basics of force and motion." "Okay." "And that of course means we have to talk about Sir Isaac Newton and his three laws of motion." "Oh yeah, the basics for understanding how objects move." Okay, so it's nice. I like the idea. I still have to figure out what the use case is for me, and that's always the most important thing.

Technology can be cool, but unless it's highly applicable to something I need, it's useless to me. Still, it's cool. All right, now they're going to start talking about agents, and of course when they said the word agents my ears perked up. That's something that makes me really excited for the future. Let's see what they plan to roll out soon. "...one of the opportunities that we see with AI agents. Let me take a step back and explain what I mean by that. I think of them as intelligent systems that show reasoning, planning, and memory, are able to think several steps ahead, and work across software and systems, all to do something on your behalf and, most importantly, under your supervision. But let me show you the types of use cases that we are working hard to solve."

"Let's start with shopping." It's very annoying that they always go to shopping every time they demo one of these really interesting potential products or features. Why do they always go to shopping? It's such a boring use case to me. Maybe it's more exciting for all of you, but I don't think so. I want to see agents doing tasks, real tasks, for the things I need to get done, whether at work or in real life, not shopping. I don't have a hard time shopping; it's just not something I struggle with. "It's fun to buy shoes and much less fun to return them when they don't fit."

"Imagine if Gemini could do all the steps for you: finding the receipt in your inbox, locating the order number from your email, filling out a return form, and even scheduling a pickup." Yeah, okay, that part is great, but I still don't know about returns. I think they could have come up with a better use case that illustrates the power of agents. "Let's take another example that is a little more complex." Okay, I want to see a more complex example. "Let's say you just moved to Chicago. You can imagine Gemini and Chrome working together to help you do a number of things to get ready: organizing, reasoning, and synthesizing on your behalf. For example, you will want to explore the city and find nearby services, from dry cleaners to dog walkers. You will need to update your new address on dozens of websites." Okay, exploring the city and thinking of things to do is really basic. Now, updating my address on various websites, that's amazing, for sure. That's exactly what you'd expect AI agents to be able to do, so let's continue. "Gemini can work across these tasks and will ask you for more information when necessary, so you're always in control."

"That part is really important as we prototype..." Yes, that's amazing. It's an amazing use case, something I would love to try. Next up is Demis Hassabis, who co-founded Google DeepMind. I knew that this was his first time on the Google I/O stage, so I was very excited to hear what he had to say. Let's see what he is going to present. "Welcome, for the first time on the I/O stage, Sir Demis." "Thank you, Sundar, it's great to be here. Ever since I was a kid playing chess for England..." Okay, I'm going to skip this a little bit.

I just want to see what he has to announce. All right, first they announce Gemini 1.5 Flash, which is a smaller, really fast, and really cheap version of Gemini, so let's see a little about it. "Flash is a lighter model compared to Pro. It is designed to be fast and cost-effective to serve at scale, while still offering multimodal reasoning capabilities and breakthrough long context. Flash is optimized for tasks where low latency and efficiency matter most. Starting today, you can use 1.5 Flash and 1.5 Pro with up to 1 million tokens in Google AI Studio and Vertex AI, and developers can sign up to try 2 million tokens. You'll hear a little more about Flash later from Josh."

"Today we have some new and exciting progress to share about the future of AI assistants, which we're calling Project Astra." Project Astra seems to be their answer to OpenAI's GPT-4o. Now I'm going to play a demo video, and it's incredibly impressive. This is so awesome. It's a lot like yesterday's OpenAI GPT-4o demo. This is a pre-recorded video, and I know Google got a lot of flak because they created a video earlier that was ultimately shown to be heavily edited, but they made it clear that this video is not edited. It's one take and it's at 1x speed, so let's look. "Okay, let's do some tests. Tell me when you see something that makes sound."

"I see a speaker, which makes sound." "What is that part of the speaker called?" "That is the tweeter. It produces high-frequency sounds." "Give me a creative alliteration about these." "Creative crayons color cheerfully. They certainly craft colorful creations." "What does that part of the code do?" "This code defines encryption and decryption functions. It appears to use AES-CBC encryption to encode and decode data based on a key and an initialization vector (IV)." "That's right. What neighborhood do you think I'm in?" "This appears to be the King's Cross area of London. It is known for its train station and transport links."

"Do you remember where you saw my glasses?" "Yes, I do. Your glasses were on the desk near a red apple." Okay, that's awesome, it really is. Before I play the rest of this video: that's what AGI feels like in my mind. You can ask it anything about anything around you. It can range from analyzing code to looking around the neighborhood and explaining things on my desk, and everything is fluid, and everything is through a very natural, personal voice. Now this person is putting on glasses, and I think the glasses have cameras. I'm actually not sure what glasses they are; I'm not familiar with them.

I have the Meta AI Ray-Ban glasses that do the same thing, but let's go ahead and see what it can do now. "What can I add here to make this system faster?" "Adding a cache between the server and..." By the way, does anyone know what those glasses are? If you do, let me know in the comments below. I haven't seen them. I don't think Google has any glasses products, but to be honest, I haven't searched. So here we go, let's keep watching. "...the database could improve speed." "What does this remind you of?" "Schrödinger's cat." "Okay, give me a band name for this duo." "Golden Stripes."

"Nice. Thanks, Gemini." Yes, really impressive. I would love glasses that did that. The Meta AI glasses are almost there, but the back-and-forth dialogue ability isn't quite there. Well, now Demis is going to talk about generative AI art and specifically a Sora competitor that looks pretty cool, so let's take a look. First they announce Imagen 3, which is a competitor to DALL-E. Okay, we've seen some pretty incredible progress in the generative art space, but to be honest, I'm not that excited about this anymore. I want to see the gen AI video. Granted, they also announced generative music, which again is not a personal interest of mine.

I want to see the video, so I'm going to move on to that. "Today, I'm excited to announce our newest, most capable generative video model, called Veo. Veo creates high-quality 1080p videos from text, image, and video prompts. It can capture the details of your instructions in different visual and cinematic styles. You can prompt for things like aerial shots of a landscape or a time lapse, and further edit your videos using additional prompts. You can use Veo in our new experimental tool called VideoFX. We are exploring features like storyboarding and generating longer scenes. Veo gives you unprecedented creative control."

"Techniques for generating still images have come a long way, but generating video is a different challenge altogether. It is not only important to understand where an object or subject should be in space; it is also necessary to maintain this consistency over time, like the car in this video. Veo builds on years of our pioneering work with generative video models, including GQN, Phenaki, WALT, Video..." Yeah, that video was really impressive. Let's watch it one more time. This basically mirrors what we've seen with Sora: all the cars moving along the road in this generative AI video are consistent in space, all the moving objects are consistent in space, and it looks very real. Now they are going to give some updates on AI search, which again is their bread and butter.

I'll skip most of it, but let's see what they have to say. "We're making AI Overviews even more useful for your more complex questions. For example, let's say you've been trying to get into yoga and Pilates. Finding the right studio may require a lot of research; there are so many factors you need to consider. As you can see here, Google gets to work for you, finding the most relevant information and putting it together in your AI Overview. You get some studios with great ratings and their introductory deals." So this is great. Really complex searches are actually something that I've been turning to Llama 3 and OpenAI models for more and more, so I think this is Google's attempt to really beef up their search product.

All right, next they're going to show what's called the Gemini sidebar, I think, and it's a pretty cool feature. It's again a sort of agentic workflow where you task Gemini with performing various steps for you, and it actually does it quite well. Let's take a look at this demo. "...the first one. It has a PDF attachment from a hotel as a receipt, and I see a suggestion in the side panel to help me organize and track my receipts. Let's click on this prompt. The side panel will now show me more details on what that actually means, and as you can see, there are two steps here. Step one: create a Drive folder and put this receipt and 37 others in it."

This turns out to be something I could use right now: I have receipts and need to track them in a Google Sheet. This is literally the exact use case I would love. "Step two: extract the relevant information from those receipts in that folder into a new spreadsheet. Now this sounds useful, why not? I have the option to edit these actions or just press OK, so let's press OK. Gemini will now complete the two steps described above, and this is where it gets even better. Gemini gives you the option to automate this so that this particular workflow runs on all future emails, keeping your Drive folder and expense sheet up to date without any effort on your part." Yes, this is what I want. This is exactly what I want: agents doing things on my behalf.

I explain it once and then they do it continuously, forever. This is really great; this will make people much more productive. Now they're going to talk about a virtual teammate. This is actually an AI that you can have on your team to perform different tasks, and of course this is all inside your Google Workspace account. "As you can see, the teammate has its own account, and we can go ahead and give it a name. We'll do something fun like Chip. Chip has been assigned a specific job role, with a set of descriptions about how to be useful for the team. You can see that here, and some of the jobs are to monitor..."

One thing that I really like about all of Google's AI products is that basically wherever you go to define the prompt or the system message, you can invoke references to other things. Here you can invoke a reference to the KNX project, which maybe is a Google Doc; here's a Gem; and here's Google I/O, which, I don't know, is maybe a search or something. Basically, you can reference these external sources of knowledge very easily from any of these input boxes.

"...and track projects. We've listed a few to organize the information and provide context, and a few more things. Now that we've set up our virtual teammate, let's go ahead and see Chip in action. To do that, I will switch over here to Google Chat. First, when planning an event like I/O..." Does anyone use Google Chat? I don't. I use Slack and Teams, and almost every company I've interacted with uses one or both. I don't know anyone who uses Google Chat. "...we have lots of chat rooms for various purposes. Fortunately for me, Chip is in all of them. To quickly catch up, I could ask a question like, 'Does anyone know if our I/O storyboards are approved?' Because we have instructed Chip to track this project, Chip searches across all the conversations and knows how to respond with an answer. There it is. It's simple but very useful."

So I want to talk about a couple of things real quick. This is really cool and really useful, but as I already mentioned, I don't use Google Chat and I don't know of any other company that does. Here's the key, though: being able to access these different external sources, Google Drive, different documents, presentations, emails, is a key capability for Google right now.

I'm also very bullish on open source, because companies use a variety of different tools, and being locked into one ecosystem is really bad and carries a big platform risk for that business. When you build on top of open source and use open source tools, the goal, hopefully, is that you can plug and play any service you want, and it will be much more resilient and much more flexible. That's the hope, and it's why I'm very optimistic about open source. But open source also has the challenge of how to get all these different sources of information: if Google isn't willing to share them, if Facebook isn't willing to share them, etc., open source tools aren't going to have access to them. So it's kind of a chicken-and-egg question.

Well, those are the most interesting announcements. They also talked about how they're adding a ton of AI to the Google Pixel phones, which seems a bit duplicative; they already had many of these features. But I want to show the outro, where CEO Sundar returns to the stage. Last year at Google I/O, one of the memes was a supercut of him saying "AI" like 37 times over the course of the event. It was really funny and it went viral, and now he addresses it in a really nice, unique, and interesting way. So let's look at that outro, and then I'll come back and do my own outro before I'm done.

"I have a feeling that someone out there might be counting how many times we've mentioned AI today, and since today's big theme has been letting Google do the work for you, we went ahead and counted so you don't have to. That might be a record for how many times someone has said AI. I'm tempted to say it a few more times, but I won't." So that's it. If you enjoyed this video, consider liking and subscribing, and I'll see you in the next one.
