
Actual AI Text-To-Video is Finally Here!

Mar 16, 2024
So far we haven't seen real text-to-video. We've seen some demos from companies like Meta and Google showing that text-to-video is coming, and we've had some really cool tools like Deforum, the PlazmaPunk tool, and the Decoherence tool that blend one image into another and give a cool animation effect, but we haven't really had actual text-to-video in the way that we would use Stable Diffusion or Midjourney to write what we want to see and get a video of that. In fact, here's an interesting demo of what some people have done that I found on Reddit.
You have mountains and water in a Chinese painting, a beautiful painting of a Buddhist temple in a serene landscape, a traditional Chinese landscape painting with a bridge and waterfall; you have fireworks; you have a bonfire at night in a snowy forest with a starry sky in the background; you have a mountain river. So I'm going to go ahead and play this so you can see all these demonstrations, and then we're going to play with them ourselves, because you can use it right now, today. You can see the mountain and the water, you can see all these waterfalls, you can see the fireworks; here is the mountain river, the starry night with the lit fire. We have a clownfish swimming across a coral reef, ducks swimming in a pond, a litter of cubs running through a garden, a panda bear eating bamboo on a rock, a horse chewing, an animation of a knight riding a horse. All of this is made with text-to-video: an orange cat in a leather jacket and sunglasses singing in a metal band on stage, a monkey learning to play the piano, two kangaroos busy preparing dinner in a kitchen. This is from a Reddit post I found on the Stable Diffusion subreddit, announcing the first open-source 1.7-billion-parameter text-to-video diffusion model, which is now available.

You can play with this right now in this Hugging Face Space here called ModelScope Text-To-Video Synthesis. Now, there's a little bit of a hiccup: it looks like a lot of the videos this model was trained on were actually videos that appear to have been taken from Shutterstock, which is why many of the videos it produces have Shutterstock watermarks running through them. For example, Victor M here on Twitter, who is the head of product design at Hugging Face, made this post go somewhat viral where he generated his own little Star Wars clip using AI.
Now, if we look at this demo video they made, you'll notice at the bottom of the video there is some sort of Shutterstock watermark that appears throughout the entire video, which I think proves that much of the model was trained on Shutterstock footage. If you want to play with this yourself, here's what you can do: you can go to the ModelScope text-to-video synthesis Space on Hugging Face. I'll make sure it's linked under this video here, and there are really two ways you can do this. You can do it for free right now in the Hugging Face Space, where we type in a prompt, and it's probably going to take a little bit because a lot of people are playing with this right now. So let's do something like "an alien eating a taco", and if I click run, it says "This app is too busy, keep trying". Let's go ahead and try again.
The app is too busy, keep trying, so you might be able to run it after a while, but a lot of people are playing with this right now because it's recent, it's new, it's the most popular thing right now. However, you can duplicate the Space yourself, but you'll need to have a credit card registered inside Hugging Face, and it'll probably cost you a few cents, probably less than two dollars or something. So for this example I'm going to duplicate the Space so you can see what it does, but keep in mind that if you really want to do it for free, you can keep trying while their servers are really stuck. So let's go ahead and duplicate this Space here, and it will take me a little while to get the Space up and running. So I duplicated the Space, but it shows me that I have a runtime error, and the reason is that I duplicated it on the free tier, which isn't going to be powerful enough to run this model. If I go here to the settings, you can see that it has it on this basic CPU, and we're going to want something a little bit more powerful than that, so I'm actually going to upgrade it to this medium T4 here with 30 gigabytes of RAM, and that should be enough to run this. Let's go ahead and switch to it. I'm going to need to add my payment method here, and I'm going to set a sleep timer of just an hour of inactivity, so if I accidentally walk away it doesn't continue to charge me. We'll go ahead and click "confirm new hardware" here, and now it will try to start this medium T4, and that should get rid of our runtime error. Once everything has booted up, we are now running on our own T4 system, so you should be able to generate whatever you want and not have to wait in any kind of queue. Let's go ahead and try "a green alien eating a taco" and click run; you can see it's actually rendering this time.
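As an aside, if you'd rather not duplicate a Space at all, the same 1.7B ModelScope model can also be run locally through the open-source diffusers library. This is just a hedged sketch, not the exact code the Space runs: it assumes you've installed torch, diffusers, transformers, and accelerate, it assumes a CUDA GPU with enough VRAM, and the `.frames` indexing has changed between diffusers versions, so it may need adjusting.

```python
# Sketch: running the ModelScope 1.7B text-to-video model locally with
# Hugging Face diffusers instead of duplicating the Space.
# Assumes: pip install torch diffusers transformers accelerate
try:
    import torch
    HAVE_TORCH = True
except ImportError:  # torch not installed; skip the heavy part below
    HAVE_TORCH = False


def clip_seconds(num_frames: int, fps: int = 8) -> float:
    """Length of the exported clip: frame count divided by frames per second."""
    return num_frames / fps


# 16 frames exported at 8 fps gives roughly the two-second clips seen here.
print(clip_seconds(16))  # 2.0

if HAVE_TORCH and torch.cuda.is_available():
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    pipe = DiffusionPipeline.from_pretrained(
        "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
    )
    pipe.enable_model_cpu_offload()  # offload submodules to fit in modest VRAM

    # Recent diffusers versions return a batch of frame lists; older ones
    # returned the frames directly, so this indexing may need adjusting.
    frames = pipe("a green alien eating a taco",
                  num_inference_steps=25, num_frames=16).frames[0]
    export_to_video(frames, "alien_taco.mp4", fps=8)
```

On a T4-class card this takes on the order of a minute per clip, which roughly matches the generation times in the duplicated Space.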
I don't get any errors because I don't have to deal with everyone else using the exact same server as me; now I have my own server. And our video is ready. It took about 60 seconds to process, and it's just a two-second clip, and you can see a little watermark running through it, but here's our green alien eating a taco; maybe I can see a taco there. Maybe if we give it more details here, if we do the standard thing you might do in Stable Diffusion: "a detailed green alien standing on a red Mars landscape", and let's add some of those other words like "Unreal Engine", "trending on ArtStation", "realistic" (as realistic as an alien can be, I guess), "HD", "4K". So we've got "a detailed green alien standing on a red Mars landscape eating a crunchy yellow taco" plus those extra keywords; now that we have a little more detail, let's see what happens when we run it this time.
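That keyword-stacking habit carried over from Stable Diffusion prompting can be captured in a tiny helper. This is a hypothetical convenience function of my own, not part of any library or of the ModelScope demo:

```python
# Hypothetical helper: append common "quality" style tags to a base prompt,
# mirroring the Stable Diffusion habit of stacking keywords like "HD, 4K".
DEFAULT_TAGS = ("Unreal Engine", "trending on ArtStation", "realistic", "HD", "4K")


def embellish(prompt: str, tags=DEFAULT_TAGS) -> str:
    """Return the prompt with comma-separated style tags appended."""
    return ", ".join((prompt,) + tuple(tags))


print(embellish("a detailed green alien standing on a red Mars landscape "
                "eating a crunchy yellow taco"))
```

Whether these tags actually help a video model trained on Shutterstock footage is an open question; as you'll see below, they didn't make my taco appear.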
We have a little more detail. I still don't see a taco, but let's go ahead and see what we get. I'm getting my alien on Mars, but I don't really see the taco part. We're still getting that Shutterstock watermark that seems to be in every video, which I think proves that all the training material they used was probably Shutterstock videos. Let's try a different theme here, maybe something a little more realistic: let's make "a penguin kicking a soccer ball". I'm seeing a soccer field here; oh, you see a penguin flash on the screen really fast and then fly away.
I can't get anything as detailed as what we're seeing in some of these videos here, so let's try some prompts that were actually in this demo and see if I can get a similar result, like "a clownfish swimming through a coral reef". Let's see if we can get something similar there. So we have a clownfish swimming through a coral reef; let's go ahead and run that and look at what we get. Okay, this looks a little better; now you can see it's supposed to be a clownfish swimming across a coral reef. Let's try "a monkey on skates". Okay, I see the monkey, not seeing skates... oh, okay, yeah, that's a monkey on skates; towards the end of the video you can see it.
I really wish you could generate more than two seconds, but you know, it is what it is for now; this is obviously very, very early technology. Let's try "a cat learning to play the piano". Okay, let's see our cat playing the piano here; it looks more like a cat sniffing a piano. Now, here is the ModelScope page for this text-to-video generation model, and you can check out some of the examples they shared here: a giraffe underneath a microwave (it looks like a giraffe in a microwave), a goldendoodle playing in a park by a lake, a panda bear driving a car, a teddy bear running in New York City, a drone flying over a fast food restaurant on a dystopian alien planet, a dog wearing a superhero outfit with a red cape flying through the sky.
Now, I have to be honest, I think these are cherry-picked, because when I try things I don't get the best results. I think they probably did a thousand generations and they're showing you the nine best ones they came up with, because so far mine are not that close to what I'm trying to generate. Let's go ahead and try "a dog wearing a superhero outfit with a red cape flying through the sky" and see if I get something similar. Look: when I use the exact same prompt, that's what I get. It's like a dog with maybe a cape wrapped around it, and that's the exact same prompt shown here, "a dog wearing a superhero outfit with a red cape flying through the sky". I'm sure I can use a different seed and get a completely different result, so let's go ahead and change the seed, but they're definitely cherry-picking, so all the stuff you're seeing online is probably the result of hundreds and hundreds, if not thousands, of generations, after which they went: okay,
here are the best ones we came up with. Okay, let's try again... a little closer? No, it looks like a dog running with a cape, and maybe it flies and floats for a second there, but definitely nothing like what we're seeing in this picture here, where it actually looks like a dog flying with a cape. Let's try "a teddy bear running around in New York City" and see if it's anything like the third generation here. Okay, so here's our version of a teddy bear running around New York City. It's not that bad, actually; it's one of the most impressive ones I've seen. Just keep in mind that a lot of the text-to-video clips you're seeing, you know, the ones that I showed you on Reddit here, look really cool, and the ones they're showing here on their ModelScope page are the cherry-picked ones; these are the ones where they probably tried a bunch of different seeds hundreds of times until they finally came up with a video that looked exactly like what they had imagined in their head. And while this is really cool, you'll probably need to do tons and tons of prompts until you finally get one that looks like what you want it to look like. Every time you run one using your own T4 server, like I do, it takes about a minute, so if you're trying to generate 20 different videos to finally get the one you're looking for, it may take you 20 minutes, but you can probably get to the level of quality that we see here.
There will be a lot of trial and error until you do, but I want to remind you of something: this is super early technology. If we look back to when DALL·E 1 first became available towards the end of 2021 / early 2022, these are the types of images it was generating less than a year ago, and this is what we can create today with things like Midjourney version 5. Here are some early DALL·E images from when text-to-image was just getting going, an armchair that looks like an avocado, and these are the kinds of images we're generating today with something like Midjourney v5.
So in less than a year we went from this to this, and if we are able to generate videos like this today, imagine where this technology will be in a year's time. Text-to-video is finally here; there's a version you can play with. Yes, it may seem a little disappointing at the moment, and you may have to try a hundred times to get the exact video you are looking for, but it's here, it's available, and we're basically on day one of having access to this. Once again, I'll make sure the link to the Hugging Face Space is in the comments below so you can go and use it. You'll probably need to duplicate the Space and upgrade your server to use it right now, but if you're one of the lucky ones who manages to get in and play with it when the server isn't completely bogged down, you might be able to generate some videos for free right now. Hopefully you enjoyed this quick look at an emerging technology; this is brand new, the hottest thing right now in the world of AI. If you're like most people and you feel like this AI space is moving super, super fast and you want to stay on top of things, head over to Futuretools.io; this is where I curate all the best tools I can find. In fact, I'm starting to eliminate some tools to make it less overwhelming; if I'm honest, there are some junk tools out there, and I'm getting rid of some of them so just the cream of the crop stays on the site. So check it out at Futuretools.io, and if what's there is still too overwhelming, there are still too many tools to review and you just want the TL;DR of the week, click here to join the free newsletter, and every Friday I'll send you the five coolest tools I found, the TL;DR of the coolest news and videos of the week, as well as a cool way to make money with AI.
I send it out every Friday; all you have to do is go to Futuretools.io. Thank you so much for tuning in. I'll try to keep you up to date with the latest and greatest in emerging technology in the AI space, so if you like this type of video, please give it a thumbs up; that will ensure that you see more videos like this in your feed. If you haven't subscribed to the channel yet, subscribing will ensure that you see more of my videos. I'm so happy that there are so many other people who love learning about this AI technology like I do, and I really appreciate you watching my videos to learn more about it. So thanks again for tuning in; see you next time, bye. Thank you.
