Dojo: Secret Tesla Program for Full Self-Driving
Feb 27, 2020

"There's a main program at Tesla that we don't have enough time to talk about today... I'm not ready to talk about the details of that project yet... a project we call Dojo... a super powerful training computer... a neural network training computer on a chip."

Hi, I'm Warren Redlich. I want to talk a little bit today about full self-driving and Dojo.
There was a presentation with Elon Musk and Andrej Karpathy on Autonomy Day, and Karpathy gave another presentation at the PyTorch conference. Both gave some really interesting hints about what Dojo is, what full self-driving is doing, and what's coming with full self-driving version 4. So in this video I'm going to talk in depth about what Dojo is, what's coming with full self-driving hardware version 4, and what exactly the Tesla software is doing to learn driving. Are you ready? This is going to be fun.

The first thing I think we need to talk about is how many miles Tesla is learning from. In some of the discussion you'll see (I've seen a couple of other people talking about this), they'll talk about how many miles Tesla has driven on Autopilot, and say the software is learning from every mile driven on Autopilot. I think that's wrong. Tesla is learning from every mile you drive, in every Tesla that has been built with full self-driving hardware, whether that's version 2 or version 3. If you bought a Tesla and didn't pay for the full self-driving option, the hardware is in there anyway, and every time you drive, the software is watching you drive.

I think the idea people have is this: Autopilot drives, and when you turn the steering wheel or hit the brake, that's an important thing for the software to look at. The moment you disengage Autopilot because the software isn't working properly, Tesla reanalyzes that footage. That moment is a very important window into difficult cases, but I think that view underestimates the learning opportunities, and I think Tesla knows that. If you listen very carefully, they talk about this in a couple of videos, either Elon or Karpathy: essentially, everyone is training the network all the time. Whether Autopilot is on or off, the network is training from every mile you drive. The software is running whether or not you have Autopilot engaged. If you're driving your Tesla, the Autopilot software is running and predicting a path. It says, in effect, this is the path I would take if I were driving, and it compares that path to the path you choose. If it counts a significant difference, if you brake where it wouldn't have braked, if you turn significantly where it wouldn't have turned, that's another thing it can learn from. It's not just learning from the moments when Autopilot is running and you disengage it; that's not the only thing they're looking at. And there's more.
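As a rough illustration of what that comparison might look like, here is a minimal sketch. This is not Tesla's actual code; the path format and the threshold are assumptions of mine:

```python
import numpy as np

# Sketch of shadow-mode path comparison (illustrative only).
# `predicted_path` is the path the Autopilot software says it would drive;
# `actual_path` is the path the human actually drove. Both are (N, 2)
# arrays of x/y positions sampled over the same time window.

DIVERGENCE_THRESHOLD_M = 1.5  # assumed value, in meters

def paths_diverge(predicted_path: np.ndarray, actual_path: np.ndarray) -> bool:
    """Flag the clip if the human's path ever strays more than the
    threshold from the path the software predicted."""
    deviation = np.linalg.norm(predicted_path - actual_path, axis=1)
    return bool(deviation.max() > DIVERGENCE_THRESHOLD_M)

# If paths_diverge(...) is True, this moment is interesting: braking where
# the software wouldn't have braked, turning where it wouldn't have turned.
```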
Keep in mind that with hardware 3 there are two chips running, and each chip is capable of running the car on its own. So now imagine there's an update they're working on. Tesla downloads the updated version of the self-driving software to the computer in the car you're driving, runs the current version of Autopilot on one chip, and runs the updated version on the other chip. Now, as you drive, they can compare how you drive against both how the existing software would drive and how the unreleased, updated software would drive, and they can look for variations and make comparisons. They test the software in shadow mode, which means they see how the new software compares to the current software by running them side by side in the cars and watching for disagreements. This is a big thing to understand about how Tesla does this: it's not a prediction, it's not a guess. If you look at what Elon and Andrej Karpathy say, this is what they're doing. They're not just looking at Autopilot disengagements; they're looking for any opportunity to learn, and that all feeds into Dojo.
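The two-chip comparison can be sketched the same way. Everything here, the class, the fields, and the disagreement threshold, is a hypothetical stand-in for whatever Tesla actually compares:

```python
from dataclasses import dataclass

@dataclass
class Command:
    steering_deg: float
    brake: bool

STEERING_DISAGREEMENT_DEG = 5.0  # assumed threshold

def commands_disagree(released: Command, candidate: Command) -> bool:
    """True when the candidate software on the second chip would have
    acted noticeably differently from the released software."""
    steering_gap = abs(released.steering_deg - candidate.steering_deg)
    return steering_gap > STEERING_DISAGREEMENT_DEG or released.brake != candidate.brake

# In shadow mode the released stack actually drives, the candidate stack
# only computes commands, and frames where commands_disagree(...) is True
# get flagged for the Autopilot team to study.
```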
Every Tesla with full self-driving hardware is an opportunity for Dojo to learn. That's the biggest thing I think we need to understand. If you bought a Tesla, whether you paid for full self-driving or not, whether you paid for Enhanced Autopilot or not: if it was possible for them to put the full self-driving hardware in the car, it's there, and it's watching you drive. If it's running Autopilot, it's comparing, and it's watching for disengagements. And while Autopilot is running, the second chip may be running a potential upgrade, comparing how the current software handles things to how the potential upgrade handles them. So many learning opportunities, and all of that feeds into how Tesla uses the hardware and software at the head office in Fremont, where the Autopilot team is working and the software is learning from how you drive, from how anyone who drives a Tesla drives. That powers all of this.

And that brings us to another point. If you look at Andrej Karpathy's PyTorch talk and what he said on Autonomy Day, the car is watching you drive, it drives itself, and as you drive it looks for things that are out of the ordinary. On Autonomy Day he talked at length about a bike on the back of a car. Originally the software saw a bike and it saw a car, and it tracked them as two separate objects, and that wasn't the right way for the software to think about it. So they queried the fleet for similar images and used those images to train the software to recognize that when there's a bicycle attached to the back of a car, it's just a car with a bike; you don't need to track two separate objects. As Karpathy put it: "The neural network, actually, when I joined, would create two detections. It would create a car detection and a bike detection. We took this image, and we have a machine learning mechanism by which we can ask the fleet to provide us with examples that look like this. For example, these six images could come from the fleet, and they all contain bikes on the backs of cars."

Karpathy also talked about how this isn't just about image recognition and using neural networks to improve image recognition. It's about observing situations as they present themselves to the car and making decisions, not making decisions from a single still image. He made a point about having a series of frames from eight cameras adding up to 4,096 images: "Maybe we have eight cameras. If we unroll for 16 time steps, with a batch size of, say, 32, then we have 4,096 images in memory, and all their activations, in one forward pass." He stressed how difficult it is to process all of that, but that's what they're up against.
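To make those numbers concrete, here's the back-of-the-envelope arithmetic; the per-frame resolution is my assumption, not a figure from the talk:

```python
# Karpathy's numbers: 8 cameras, unrolled for 16 time steps, batch of 32.
cameras, time_steps, batch_size = 8, 16, 32
images_in_memory = cameras * time_steps * batch_size
print(images_in_memory)  # 4096

# Assuming roughly 1280x960 RGB frames at one byte per channel
# (an assumption on my part), the raw pixels alone come to:
bytes_per_image = 1280 * 960 * 3
print(images_in_memory * bytes_per_image / 1e9)  # ~15.1 GB, before activations
```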
This hardware is designed to process full video from all eight cameras, plus radar, plus sonar, and it uses all of that information to make decisions. Instead of looking at a static moment in time, asking what am I seeing on all eight cameras right now, it's looking at a series of frames together. Maybe it's 12 frames, maybe it's three seconds' worth of frames; we don't actually know how much they can process. And then when there's a disagreement while you're driving, whether Autopilot is driving or just watching, and the software notices a substantial difference between how you drove and how it would have driven, they can query the fleet. So say you're driving on the freeway, someone is in the lane to your left or right, and they cut in front of you into your lane. As Karpathy described it: "Here is a video showing Autopilot detecting that this car is veering into our lane. We ask the fleet to send us data whenever they see a car transition from a right lane to the center lane, or from left to center. And then what we do is rewind time backwards, and we can automatically annotate that, hey, that car is going to turn in 1.3 seconds and cut in front of us, and then we can use that for training."
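That "rewind time backwards" trick deserves a sketch, because it's what makes the labeling automatic: once the clip is recorded, the future is known, so no human annotator is needed. This is my illustrative reconstruction, with invented data shapes:

```python
# Auto-labeling a recorded cut-in (illustrative only).
# `track` is a list of (timestamp_s, lane_id) observations for one
# neighboring car, recovered from a clip a fleet car sent back.

def label_cut_in(track, our_lane_id):
    """Scan forward to the moment the car enters our lane, then walk
    backwards, labeling every earlier frame with its time-to-cut-in."""
    cut_in_time = None
    for t, lane in track:
        if lane == our_lane_id:
            cut_in_time = t
            break
    if cut_in_time is None:
        return []  # never cut in; the clip becomes a negative example
    # "Rewind": earlier frames get labels like "cuts in in 1.3 seconds".
    return [(t, cut_in_time - t) for t, lane in track if t < cut_in_time]
```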
So in their lab, your Tesla uploads data to head office, they get that data, and they say, OK, here's a moment where something happened that we need to study. Then they can query the fleet; they can send a request to all the cars: have you seen something similar to this that we can look at? That way they can compare multiple similar situations and see how the software handles them and how human drivers handle them. There are posts on Reddit where people talk about how many gigabytes their car is uploading. When Tesla queries the fleet, it's looking for certain things, and if it finds something in your car, your car uploads additional data. If your car has a lot of disengagements, if you hit a lot of unusual situations, your car is uploading that data to head office, and head office then queries the fleet for situations similar to what it finds. So there's a lot of data moving in both directions: there are over-the-air updates, and your Tesla is getting new software from the home office, but at the same time the home office is learning from you, learning from every car out there. And as Karpathy said, they may not want to imitate all the drivers; they may just want to imitate the best drivers, and they have technical ways of sourcing that data. By the way, they don't upload all the data, because your car generates a lot of data; as they said, they don't need to upload footage of you, you know, driving down the middle of the lane on a highway.
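Here's what the car-side selection could look like conceptually; the structure is entirely my guess at the mechanism the talks describe:

```python
# Conceptual fleet-query trigger (illustrative only). Head office pushes
# a description of what it wants, such as "cars transitioning into the
# center lane" or "bikes on the backs of cars"; each car checks its own
# recent clips against the trigger and uploads only the matches.

def select_clips_for_upload(clips, trigger_match, had_disengagement):
    """Keep clips that match an active fleet query or surround a
    disengagement; skip routine lane-keeping footage."""
    return [c for c in clips if trigger_match(c) or had_disengagement(c)]

# Example trigger for the cut-in campaign described above:
def cut_in_trigger(clip) -> bool:
    return any(obj["entered_our_lane"] for obj in clip["tracked_objects"])
```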
My understanding is that they're only uploading the data for incidents where the human driver differs significantly from the software, or where the existing software diverges significantly from the software they're testing on the second chip. And what's crazy is that the network is predicting paths you can't even see, with incredibly high accuracy.

The big question I wanted to talk about when I started thinking about making this video is: what is Dojo for? We've heard Elon talk about Dojo, we've heard Andrej Karpathy talk about Dojo, and there's all this mystery about what this main program at Tesla is.
Elon has described it like this: "There's a main program at Tesla that we don't have enough time to talk about today, called Dojo, which is a super powerful training computer. The hardware team is also working on a project which we call Dojo. Dojo is a neural network training computer and a chip, so we're hoping to do the same thing for training that we did for inference, basically improve the efficiency by an order of magnitude at a lower cost, but I'm not ready to talk about more details on that project yet."

So I have some theories as to what Dojo might be. My crazy theory, which I'm pretty sure is wrong (I have some more rational theories to come), is that they've developed a quantum computer to do neural network processing. The idea of quantum computing is that you can test multiple possibilities at once, so being able to test multiple paths the software might take and then compare them to how the human driver does it would be very interesting. Now, I don't think they're using quantum computing; it's a wild theory, and maybe it's something they'll do in the future. One of the things I did was look at Tesla's hiring, at what they hire for, and I looked on LinkedIn at people who work for Tesla. I couldn't find any examples of anyone working for Tesla, or any job Tesla is hiring for, that involves quantum computing. So although I think that would be a really cool idea, and someday someone will come up with some kind of neural network that uses quantum computing and there will be a brilliant breakthrough that makes a huge difference to everything, I don't think we're there yet, and I don't think that's what Tesla is doing.

Another idea of what Tesla might be doing: they might be taking the existing hardware 3 chips, the way there are clusters of GPUs, graphics processing units, that they've been using for training, and using those chips to learn how we drive, how the car drives, how other drivers drive. There's also a reference suggesting Dojo is not just a chip. Elon said, or maybe it was Andrej Karpathy, one of them said, that Dojo is a neural network training computer and a chip. But it's not just a chip. I think what's happening is that there's a group of neural network chips forming a larger neural network that's doing the learning. So the simplest theory would be that they're taking the existing hardware 3 chip used in the current Tesla Model 3, Model S and X, and the future Model Y, and clustering it. Remember the numbers from Autonomy Day: "If we use all 12 CPUs to process that network, we can do a frame and a half per second. If we use the 600-gigaflop GPU on the same network, we would get 17 frames per second. The on-chip neural network accelerators can deliver 2,100 frames per second. We ran it on the old hardware in a loop as fast as possible, delivering 110 frames per second, and for the new FSD computer we can get 2,300 frames per second, a factor of 21. This is perhaps the most significant slide; it's night and day." So maybe they've built some kind of cluster where the neural networks on each chip form one big neural network that can process at a much higher level. I think that's a pretty reasonable theory of where Tesla could go with this, and it fits: it's not easy in terms of software programming, but in hardware terms they're already making all these chips. They're making hundreds of thousands of them; they'd only need to make a thousand more for the cluster.

And that's another interesting question that's hard to answer: how many chips are in the cluster? Because this is supposed to be this super powerful computer. Can they do it with 1,024 chips? Maybe they're doing it with 100,000 chips; who knows, they can use as many as they want. It would mean very high power consumption, but for the purpose they're trying to achieve, it could make sense to use a lot. I don't think we're talking about a million-chip cluster, but there's a very good chance the cluster has a thousand chips, or ten thousand. So that's a theory, and maybe the most sensible one, going from the crazy theory about quantum computing to the sane theory that they just took a bunch of hardware 3 chips.
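To put those cluster sizes in perspective, here's trivial arithmetic using the 2,300 frames per second figure from Autonomy Day. Treating that per-computer figure as if it were per-chip is a simplification on my part:

```python
# Back-of-the-envelope cluster throughput, using the ~2,300 frames per
# second quoted for one FSD computer on Autonomy Day. Illustrative only.
FPS_PER_FSD_COMPUTER = 2_300

for chips in (1_024, 10_000, 100_000):
    print(f"{chips:>7} chips -> ~{chips * FPS_PER_FSD_COMPUTER:,} frames/sec")
# Even the smallest of these would process video at a scale no single
# car's computer could approach.
```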
Now, we know from the videos that Tesla has been developing hardware 4: "We finished this design maybe two years ago, and we started the design of the next generation. We're not talking about the next generation today, but we're about halfway through it. It will be at least, let's say, three times better than the current system." So another theory is that they already have full self-driving hardware version 4 prototypes, and they're using early FSD version 4 chips to create a Dojo pool, and that's what's doing the processing. Note that Elon has said hardware version 4 will be about three times better than hardware version 3 in all important ways. I don't think power consumption was something he thought was important to improve threefold, but the processing speed, the ability to process video at large scale, that makes sense to me.
I'm not saying that's more sensible than the hardware 3 theory, because there is that open question. But if you're developing Dojo, and Dojo is still, as Elon calls it, a major program, it could be that you're using Dojo on the one hand to process more information quickly, and on the other hand to test version 4 of FSD. So those are two, I think, somewhat reasonable theories. I think it's optimistic to say that version 4 is so far along that they could do this, but at the same time, they've been working on it for almost three years now, and in terms of chip development time, that's a long time. It may be ready. So that's another theory.

Whatever Dojo is, it's taking in information at the video level. Now, this is something else I wanted to talk about. If you watch Karpathy's talks, and Elon has also mentioned this, there are these references to vector space. Karpathy uses a video where you're looking at a top-down view of cars going by, from Smart Summon, where you're driving through a parking lot: "We're connecting three cameras simultaneously to a neural network, and the predictions from the network are no longer in image space; they are now in top-down space. Here we're doing predictions on the image, and then we're of course projecting them out and stitching them together in space and time to understand a sort of layout of the scene around us. So here's an example of this occupancy grid, where we show just the edges of the road and how they're projected." There's the high level that we humans operate at, or at least we think we do, where we take what we see and process it directly into action. And there's a view that the Tesla takes the visual information it gets from the cameras and reduces it to a 2D or 3D space, and that every object in that space becomes a vector with properties. There's the Tesla itself: its size, which direction it's going, how fast it's going. This object right here is a stop sign; it's not moving; it's red. There's a traffic light. There's a car moving, and this is the approximate size of that car, this is the direction it's moving, and this is its speed. These are all properties of the objects the Tesla sees, and they can be reduced to some kind of vector space, so the information can be processed at the vector level. That has always made sense to me from an efficiency point of view, since it would be easier to do the processing and decision-making at that reduced level.
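To make "every object becomes a vector with properties" concrete, here's a toy version of what such a representation could hold. The fields are my guesses, not Tesla's schema:

```python
from dataclasses import dataclass

# Toy object-vector representation; the field names are illustrative.
@dataclass
class SceneObject:
    kind: str            # "car", "stop_sign", "traffic_light", ...
    x_m: float           # position relative to our car, in meters
    y_m: float
    length_m: float      # approximate size
    width_m: float
    heading_deg: float   # which direction it's moving
    speed_mps: float     # how fast it's moving (0 for a stop sign)

# A stop sign is just a stationary object in this space:
stop_sign = SceneObject("stop_sign", 22.0, 4.1, 0.6, 0.6, 0.0, 0.0)
# A moving car carries its approximate size, direction, and speed:
lead_car = SceneObject("car", 35.0, 0.0, 4.7, 1.9, 2.0, 24.5)
```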
But the idea of Dojo, going by what Elon has said, is that its goal is to take in huge amounts of data and train at the video level, to do massive unsupervised training on large amounts of video. With Dojo processing all of this information at the video level, the idea is that the software makes decisions based on what it sees, without taking the intermediate step of putting everything into vector space and operating in vector space.

Here's how I relate to this personally. I speak several languages: Spanish and Japanese in particular, and a bit of French, though I'm not that good at French, but I speak Spanish and Japanese quite well. I've lived in Japan, traveled Europe, and lived in Florida, Texas, and California, so I've had many opportunities to speak Spanish. When you're in a country that speaks a language other than your own and you know that language, the first time you hear something in that language, let's say I hear something in Japanese, you mentally translate it into English, you think about how you would respond in English, and then you translate that back into Japanese. But at a certain point you reach a level where you can think in the other language, and it becomes a lot easier. I think that's what real fluency is: being able to think and operate in that language without translating back into your native language. And that analogy makes sense here, because what Elon said about Dojo suggests that instead of the computer taking the information it sees, reducing it to vector space, making decisions in vector space, and sending commands to the car (turn left, speed up, brake, whatever), the computer would be able to process at the level of video, skip the vector-space step, and translate what it sees directly into action. That might make the car more responsive, might reduce reaction time, and could mean it makes better decisions, just as I do when I'm in Japan and actually thinking in Japanese instead of translating.
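In code terms, the contrast might look something like this. It's a deliberately simplified sketch; every function here is an invented stand-in, stubbed so the example runs:

```python
# Two hypothetical driving pipelines, sketched only for contrast.

def detect_objects(frames):        # stand-in for a perception network
    return [{"kind": "car", "speed_mps": 12.0, "heading_deg": 3.0}]

def build_vector_space(objects):   # stand-in for scene assembly
    return {"objects": objects}

def plan_controls(scene):          # stand-in for a planner
    return {"steer_deg": 0.5, "brake": False}

def drive_via_vector_space(frames):
    """Perceive, reduce to object vectors, then plan: the 'translate
    into your native language first' approach."""
    return plan_controls(build_vector_space(detect_objects(frames)))

def drive_end_to_end(frames, video_model):
    """The 'thinking in the language' approach: a network trained on
    massive amounts of video maps pixels to controls directly."""
    return video_model(frames)
```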
That would at least also mean better decision-making. So again, that's a theory. I don't know how else you would interpret what Elon said about processing information at the video level; I think that's the only way it makes sense. So I hope this was helpful. Thanks for watching. I hope you liked this video, that it was informative, and that it gave you food for thought. If you have your own ideas about how Tesla is learning, how Tesla's software is learning self-driving from people driving and from computers watching people drive, and in particular if you have your own thoughts on what Dojo is,
let me know: tell everyone, tell the world, what you think Dojo is and what you think the hardware could be, or any other ideas about how they're learning, or how they decide who the best drivers are that they should watch. That was a very interesting clip earlier in the video, where Karpathy said they're not just sampling all the drivers; they're looking for the best drivers, and they have ways of measuring that. So how are they doing it? Let me know what you think. There's a lot in this video, and I'd love to hear your thoughts. And of course, if you're not already subscribed, please subscribe, check out some of my other videos, and let me know what you think of those too. Thanks again for watching.