
What is Tesla’s Dojo Supercomputer? Chat With Neural Net Expert James Douma #1 (Ep. 180)

Apr 08, 2024
Hello, I'm Dave. Elon Musk has shared that Tesla is building a supercomputer called Dojo, and that Dojo will be an important part of how Tesla trains and improves all of its autonomous driving features. But have you ever wondered what exactly this Dojo supercomputer is, and why Tesla needs to build it themselves instead of using what already exists? In this video we are going to dive deeper into this with machine learning expert James Douma. What follows is an excerpt from a conversation I had with James Douma. He is a software engineer and investor with a deep passion for machine learning, has followed autonomous driving efforts since the early 2000s, and has driven nearly 80,000 miles on Autopilot in the two Teslas he owns.
I first learned of James when he published some fascinating details of Tesla's early neural networks a few years ago at teslamotorsclub.com, and as Tesla's Full Self-Driving beta started rolling out, I wanted to get a clear idea of what's going on under the hood: how quickly Tesla's Full Self-Driving feature is going to improve, why Tesla had to rewrite it, and what the Dojo supercomputer is. If you're interested in watching the full interview, which is three hours long, I'll post that link in the video description, and later this week I'll post another excerpt from a previous conversation I had with James Douma about the possibility of a Tesla robot. In this excerpt, I ask him his opinion on the Dojo training system that Tesla is developing.
You know, Elon mentioned it, people were excited about it, but I don't know if many people really know what this is, you know, what it does. Can you explain what it is, you know, Tesla's Dojo system, why they did it, and how this will help Tesla's full self-driving, you know, autonomous efforts in the future?

So one thing about deep learning neural networks is that they scale very well with computing power and data. A simple way to improve any system, to improve its accuracy, is to make the neural network larger, feed it more data, and train it for longer. Many systems don't scale by simply throwing more computing power at the problem, right? Classic coding projects don't: you know, Microsoft Windows doesn't get better if Microsoft uses twice as many computers to write it, and most of the code in the world is like that. But, you know, there are economic limits to the size of a system you can put in the car and the size of a system you can train with data, so if you want to be able to train these systems cheaply, that means you need to do as many calculations for a reasonable amount of money as you can. So the current hardware situation in the deep learning world is an interesting place.
Deep learning is a new enough technology that when it first came into the world no specialized hardware had been built to do what it does, and in fact the best available hardware wasn't actually a very good match for what deep learning really needs. That's why a company like Tesla can basically get its Full Self-Driving chip right: take a relatively small team of people, give them six months or two years or something, they're not even working on the latest process node, and yet they can create a chip that is dramatically better than any commercial solution that can be put in the car.
I mean, it uses a lot less power and it's going to be more reliable because it fits the task better. It has dramatically greater computational power, and that's because it is designed to do exactly what neural networks need done. Similarly, training hardware has the same kinds of limitations. In today's world you can go out and buy tons of CPUs and tons of GPUs, build an HPC, high-performance computing, cluster, network all these things together, and do training, and it costs a certain amount. Basically, it comes down to how much money you want to spend training your neural network.
Right, and if Tesla wants to continue improving the system, wants to continue increasing the size of the network, wants to continue adding more data, that means it needs to train increasingly larger networks. It also has another limitation, which is that, so far, a very substantial fraction of the data they use to train the network is hand-labeled. So they have this army of people, not Mechanical Turks, right, they are people who have been trained to do labeling using internal tools at Tesla, and they look at this data and evaluate it. You know, they essentially annotate this data to provide training targets for certain parts of the neural network that are highly dependent on it, so the amount of data they can feed into the training system is limited by the number of people they have to do this.
People are expensive. I mean, they might have 500 or 5,000 of these people, but having 50,000 starts to get very expensive. So if you want to scale up 10 or 100 times, you need to find a way to leverage them, and one way to leverage labeling is this technique that you've described: essentially use structure from motion, or some mature geometry-based computational techniques, to take a large amount of data, video from all the cameras in the car at the same time, and use it to build a 3D scene. Have the human taggers label that scene, and then you can propagate all those labels back to the original frames, which increases the productivity of the taggers maybe a hundred times, maybe a thousand times.
I think those are the numbers that have been mentioned, and I think that's realistic. Like, the 3D, um, kind of structure from motion, that's what Andrej Karpathy was showing at Autonomy Day, basically a kind of 3D modeling, and I think in our last conversation you said that doing that on the car side would just be too compute-intensive, it's not really practical, but doing that, say, on the Dojo side will be a lot easier, and that's also where it needs to be, which means they need that kind of structure from motion, or 3D modeling, to power their labeling. So if I can paraphrase, you're saying, hey, a labeler can go through that 3D scene, you know, mark car, cat, person, person, person, big truck, whatever, and that 3D modeled scene is actually made up of thousands of frames, but once they tag it, all those tags go back to the individual frames as labels, and so it leverages your human tagging efforts, but immensely. Correct, so what it involves using is called SLAM, structure from motion, and it's not the only technique, there are other techniques too, but they are techniques that don't fundamentally depend on neural networks; they basically take a lot of frames and use fairly simple geometric processing.
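To make the reprojection idea concrete, here is a rough sketch of how one human-made 3D label could be projected back into every camera frame that observed it. The function names, the pinhole-camera simplification, and the data layout are illustrative assumptions, not Tesla's actual pipeline.

```python
# One labeled 3D cuboid in the reconstructed scene becomes a 2D label in every
# frame whose camera pose the SLAM/structure-from-motion step recovered.
import numpy as np

def project_point(point_world, cam_pose, K):
    """Project a 3D point (world frame) into pixel coordinates for one camera."""
    R, t = cam_pose                  # rotation (3x3) and translation (3,), world -> camera
    p_cam = R @ point_world + t      # point expressed in the camera frame
    if p_cam[2] <= 0:                # behind the camera: not visible in this frame
        return None
    u, v, _ = K @ (p_cam / p_cam[2]) # pinhole projection with intrinsics K
    return np.array([u, v])

def propagate_label(label_3d, frames):
    """One hand-made 3D label -> a 2D label in every frame that sees it."""
    per_frame = []
    for frame in frames:             # each frame carries its recovered pose and intrinsics
        corners = [project_point(c, frame["pose"], frame["K"]) for c in label_3d["corners"]]
        if all(c is not None for c in corners):
            per_frame.append({"frame_id": frame["id"],
                              "class": label_3d["class"],
                              "corners_px": corners})
    return per_frame                 # hundreds of labeled frames from one human action
```

This is the leverage James describes: the expensive human action happens once per scene, and cheap geometry fans it out across all the frames and all the cameras.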
That geometric processing, I don't know if it's fair to call it mature, but it has a geometric basis, so you don't need to train a neural network to do it. You can train neural networks to help, and probably Tesla and other companies will start using neural networks to speed up SLAM, but right now SLAM is compute-intensive. So you get this data stream from a car, you know, a car goes through a scene and you have all the data from all the cameras and all the other sensors, and you want to build the scene that a tagger is going to tag. Some computer is going to grind on that for a long time and then present the scene to a human, so now you have two limitations: how fast my computers can produce these scenes, and how quickly my people can tag them. They're both moving targets, because the tools are going to get better and the computers are going to get faster, but one thing Tesla can do now, or will be able to do with Dojo, and they're probably already doing this with an Amazon cluster or maybe a data center full of GPU boxes or something, I mean they have some kind of infrastructure they're using now, so they can do this to a certain extent, and to the extent that they do, it leverages their taggers. So leveraging their taggers, and being able to train larger networks with more data on the back end, are all very computationally intensive things, right? And looking out along the arc, because we're still not done with this, right, Hardware 4 is coming out, there's going to be a Hardware 5 and a Hardware 6, the networks are going to get bigger and bigger, and within 10 years this technology will advance dramatically from where it is now. We're not going to get to robotaxis and be done, because robotaxis can keep getting better and better. If you look across that arc, it is very important to drive the cost of computing down as low as possible, and right now computing is much more expensive than necessary because most of it is not done in silicon that is specialized for neural networks. So Tesla has a neural network problem they want to solve; it's all about neural networks. So, you know, they could spend 50 million dollars to build their own machine that has a certain performance, or they could spend 50 million dollars and buy a bunch of Nvidia boxes, or buy some Intel boxes, or rent time from AWS, and get less than that. So my read on Dojo is that Tesla is making an investment in infrastructure so that two years and five years from now they get the most computing at the cheapest price, and they also control their own destiny. One of the problems with depending on cloud providers, which is a really attractive alternative way of doing it, is that someone else owns a critical part of your infrastructure. It's like leasing your car factory: you don't want to do it, you want to own your car factory, you don't want to lease it from GM, because GM is maybe a competitor in certain respects. And I think that's exactly right. People in deep learning circles have been saying for some time that we needed to get off GPUs and onto accelerators as quickly as possible.
I was super happy when I saw Google go to TPUs, because that was an existence proof that there is a lot of low-hanging fruit in making custom silicon for neural networks, and, you know, Google has gone to the second and third generation of that kind of thing. You see some startups doing these things, but there is no commercially available part that you can buy today that is really specialized for deep learning. So, you know, that's why Tesla had to make its own silicon for its car: no one else was doing it. And it's strange, but it seems like the economic incentives don't really exist. The people who are in the best position to bring these custom neural network chips into the world are not actually incentivized to do it, because it would compete with other businesses where they are already making a lot of money. So Google is making its own silicon, Amazon has started making its own silicon, Apple makes its own silicon; a big part of the silicon that goes into Apple phones is now a neural network processor that they use to add value to many things the phone does, and they can't buy it from anyone else because no one makes it. So if Tesla wants an HPC infrastructure to train these big neural networks, I think they looked at the world and made the same kind of decision as SpaceX.
We'd be happy to buy it from someone else, but no one else is doing it, so we're going to do the development ourselves. That's what I think Dojo is. Um, you were mentioning in our last chat that this custom silicon for neural networks is not the same chip as Hardware 3 inside the cars, or Hardware 4. So this is custom silicon made basically for data-center-scale training applications, unlike the embedded computing in the car, right? So how long do you think this has been going on? Do they have chips, these custom chips, you know, working right now?
Are they just slowly scaling this up, or do you think that's where they are in the development process, you know, the scaling process? Do you think they're at that point with Dojo? So my read on Dojo is that it's taking longer than they wanted, and I think they probably have silicon back. Relative to getting the first silicon, you know, essentially settling an architecture, getting your first silicon back and making it work, that is a necessary first step, but designing an HPC cluster is a lot of really specialized technology on top of that, and that's not traditionally, like, Tesla's strength, right?
They have to, like, develop a team that can do this, and those people will have to come up to speed, just like they had to develop a silicon team to get the silicon right, and HPC infrastructure involves a lot of very specialized networking, the thermals, these thermal issues that Elon has alluded to are a big problem for HPC structures, and there is a lot of technology and expertise related to that. I think Tesla has to get over that learning curve, and it will take a while before they are able to deploy this at the kind of scale that is meaningful to their development efforts for FSD. Like, having hundreds of these things running in a back room somewhere isn't enough; you know, they're going to need a thousand or ten thousand, and you don't want to do that until you've put the effort into designing the thermal solutions, the power solutions, the networking solutions that are needed to support these things, and that's going to take time. I think that's probably the longest pole for them.
Right now, it's not the silicon. I guess they already have their first-generation silicon in these things. Got it. I want to talk about labeling, because I think most people don't really understand what labeling does, and I'm sure we could go on for a couple of hours about what labeling is and its importance, but instead maybe I can share a little bit of my understanding of labeling and then you can poke some holes in it, or help me understand it a little deeper or clearer. So my understanding of labeling is that you have this neural network, and I'll isolate it to Tesla's perception system, so how do you, you know,
put the images together and perceive the world. These images are coming in, and I guess with this new rewrite you have a fusion, the camera fusion, so it has to decide what the truth is in terms of knowing what's happening, and as it goes in, it goes through a neural network of different nodes and comes out with an output that decides what's in that perceived world: what cars are there, what type of cars, um, what type of people, all the different buildings, you know, etc. But the degree to which this neural network can be accurate, can have accurate results, all depends on the labeling itself. The neural network can't really figure out any of that on its own; maybe it could pick apart the different, you know, colors, textures, edges, shapes and all that, all the objects, but it's not going to tell you this is a cat or this is a person or whatever. That depends entirely on the labeling that is put into the system, which is basically the training of the neural network. So you show the neural network an image of a person, and you show an image of a person, or different
types of people, like a billion times, and the neural network will pick up all the nuances and learn what a person looks like in different situations, and it becomes more accurate over time the more labeling is done, or the more, you know, you are training that neural network. Applying this to the case of Tesla: I think Tesla has trained its neural networks with all this labeling, showing the network what is correct, over time, and it has gotten to the point where it is decently accurate on the main, important things. But there is this whole field of cases where it's just ambiguous, you know, is that really a car under the bridge, or this lighting situation, there's just some ambiguity in terms of perception, and the neural network doesn't have enough training or enough data or enough examples shown and labeled to accurately discern what exactly it is. But to the extent that Tesla over time can increase the training and labeling of, say, that particular ambiguous situation, and not just once, but say they have thousands and thousands of examples from user interventions, you know, etc., where they can pull those images where the neural network wasn't really sure what it was seeing, and through human tagging, let's say, you tag thousands of those similar situations, now the neural network is getting better. It says, oh, now I know that situation, because I've been trained on, you know, what kind of situation it is or what it's called, etcetera.
And so the big challenge for Tesla's self-driving improvement in the future is that it needs to teach the neural network more and more, especially these edge cases in the march of nines, these extremely difficult, ambiguous cases, and the way they can do that is to take the data, especially, let's say, intervention data or situations where they know the neural network has not been performing optimally, and then try to better label those situations so that the neural network can improve over time. But the challenge is, I guess, if you only have 50 labelers or something like that, or 100, you're not going to be able to take advantage of the enormous amount of data, the growing, you know, amount of training that you could achieve if you had an unlimited or scalable amount of tagging. So what Dojo can do is provide a way for taggers to supercharge their tagging.
You know, not just individual frames, but structure-from-motion 3D models, so that your labelers can now review and train the system even faster over time, thus increasing the rate of improvement of autonomous driving. So in that picture of my understanding of labeling, how it applies to, say, full self-driving, am I missing something? Is that the essence, or what else would you add to that understanding? Everything you said is basically correct. So, neural networks, there are different ways to train them, actually there's a whole zoo of ways now, but there's a fundamental way, the most fundamental way, uh.
You know, a neural network is a transfer function: it basically takes an input and predicts an output. In the case of Tesla's networks, they actually generate thousands of outputs; there are thousands of different things they predict. For every frame that comes in, or every set of frames that comes from the cameras, they predict, you know, where can I drive, where can I drive safely, where can't I drive at all because it's a cliff or a wall, um, where the lines are on the pavement, where the crosswalk is, where the signs are, where the traffic lights are, which traffic light is relevant to my lane, what the smoothest path through the available space is from here, and thousands of other things beyond that. Like, they have these cuboids: every object in the environment that the car cares about at all, it draws a little three-dimensional box around it to figure out what its limits are, and those limits define a box you don't want to turn into, right. So, you know, a stop sign has a cuboid because you don't want to drive through a stop sign, an animal on the road has a little cuboid around it, and all the cars have cuboids, right, and they label all that kind of stuff too. So, the simplest way, and Andrej Karpathy has made statements like: yes, we know there are many other ways to train, but this simple supervised training is what we know works, we know how it works, we know how to make it work well, and we are putting almost all of our effort into that.
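As an illustration of the "thousands of outputs per frame" idea James describes, here is a toy schematic of what a multi-task perception output might look like: one backbone, many task heads, each producing a different kind of prediction. The class names, fields, and structure are invented for illustration and are not Tesla's architecture.

```python
# Toy schematic: one input frame in, many kinds of predictions out.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Cuboid:
    cls: str             # e.g. "car", "pedestrian", "stop_sign"
    center_xyz: tuple    # 3D position relative to the ego vehicle
    size_lwh: tuple      # length, width, height of the bounding box
    confidence: float

@dataclass
class FramePrediction:
    drivable_space: np.ndarray        # per-pixel probability that space is drivable
    lane_lines: List[np.ndarray]      # polylines for each detected lane line
    relevant_light_state: str         # traffic light state for the ego lane
    cuboids: List[Cuboid] = field(default_factory=list)

def perceive(frame: np.ndarray) -> FramePrediction:
    """Stand-in for the per-camera network: backbone plus many task heads."""
    raise NotImplementedError("a real backbone and task heads would run here")
```

The point of the sketch is only that a single labeled frame can supervise many outputs at once, which is part of why labeling throughput matters so much.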
They have things that they don't need labeling for, and my favorite example is cut-ins, because Karpathy gave a bunch of examples back on Autonomy Day about exactly how they developed the cut-in predictor. The cut-in predictor basically says, if I'm driving in my lane, it predicts whether a car in an adjacent lane is going to cut in front of you, because you have to take action, and that was a big weak point AP had until they got the cut-in detector. It was a pleasure to see it develop over time and hear from Karpathy how it worked, especially since it didn't really require much labeling, which is in contrast to most of the rest of what they do. So when they started out, you know, the car would take photographs, and a photograph would be put in front of a labeler, and the labeler, you know, would have this photo on the screen and say, this is the center dividing line; they would mark everything in maybe a hundred different categories that appeared in the frame, and they would try to place the boxes around everything very precisely. Then the neural network would have many of these examples, millions of them; it would be incredibly laborious for a group of people to go through all this data and label it.
And the neural network's job is to take an image and tell you all these things about it, and it has to give you the same answer the human tagger did, right, and then you give it a different image and it tries to do the same. Essentially, with a neural network, every time it's wrong, and it's always wrong, it will never be exactly right, it will always be a little off, you can take that error, see everything that contributed to that error in the weights of the neural network, and nudge them a little bit in the direction of correctness. It turns out that if you have enough examples and you do this enough, the neural network becomes really good at giving you the same answers the human gave. Back when they were just using camera networks, and by "camera networks" I mean there's a big neural network connected to each camera that directly processes the frames from that camera, does a bunch of processing, and generates a few hundred different outputs. Some of that data is: these are the lane lines I see and this is where they are; some of it is: these are the vehicles, the pedestrians and the objects of interest, these are the cuboids that surround them, this is the volume of space they take up. It predicts all of that for all of those things. So, you know, at one point they had software that took all this data from all these cameras and then decided what the situation was based on that, and over time more and more of that downstream processing is handled by other neural networks that are trained on even higher-level things, but most of the data, and the most important things, are what come out of the cameras, and that requires a lot of manpower to label. This is what is called supervised training.
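The "take the error and nudge every weight that contributed to it toward correctness" loop James describes is ordinary supervised learning with backpropagation. A minimal sketch in PyTorch, with a toy model and placeholder data, just to make the mechanism concrete:

```python
# Minimal supervised training step: predict, measure the error against the
# human label, and nudge the weights a little toward the correct answer.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(features: torch.Tensor, human_label: torch.Tensor) -> float:
    optimizer.zero_grad()
    prediction = model(features)              # the network's current answer
    loss = loss_fn(prediction, human_label)   # "always a little off"
    loss.backward()                           # attribute the error to each weight
    optimizer.step()                          # small step toward the human's answer
    return loss.item()
```

Repeated over millions of hand-labeled examples, the network's outputs converge toward the answers the taggers gave, which is exactly why labeling throughput is the bottleneck.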
Supervised training is where you have a bunch of examples, you give them to the neural network, and its job is to produce the response you know is correct for each particular input, and they take all this data from the real world. So an important part of the labeling, and something not well understood about the Tesla fleet, or maybe understood but not well appreciated: there's often this misperception that all the Teslas are out there just recording everything and uploading this mass of data, in other words that the fleet works like a video recording device, producing data which is then tagged, curated and fed into the system. That's not actually what it does; it does something much more useful. Once you have this basic system working, say you've built your network, you have all the examples labeled, and it's running, the developers start to see patterns in the mistakes it makes. As a good example, you know, you come to a bridge and it brakes because it thinks it sees a wall, because of a shadow or something, and the way you fix that with supervised training is to look at the places where mistakes were made, take those examples, have a human label them with the correct label, and then train the network on them, and the answers the network gives move away from the wrong ones and toward the right ones. But to quickly capture a lot of examples of the thing you're having trouble with, what you want is a fleet of a million cars out there where you can say, send me examples of this. You send them a request, "I want this," and they go find them and send them to you, and that's what Tesla has. They are not vacuuming up the fleet; it is not mass data collection, it is: look at the world and find examples of this particular thing I'm asking for, because what Tesla needs is not a lot of data, what they need is very specific data.
You know, if you told, say, Waymo, give me examples of office chairs falling off the back of trucks, well, if all you have is a thousand cars in the world, or if you're just collecting all the raw data from all those cars, go figure: you'd have to have a computer churn through all that raw data looking for examples of office chairs falling off the back of trucks, because that's the problem you're trying to solve. What Tesla can do is say, I want office chairs falling off the back of trucks, they send that to the fleet, and every car that sees an office chair fall off a truck takes those images and sends them back to Tesla. So these cars in the field are filtering the world; they're using the processors they already have to filter everything they see while they're driving, looking for examples of the things Tesla has requested, and when they find one, they send it back.
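A hedged sketch of what such a fleet "trigger" might look like on the car side: Tesla pushes a description of what it wants, each car filters its own camera stream locally, and only matching clips are uploaded. The field names, thresholds, and the `make_clip` helper are hypothetical placeholders, not Tesla's actual campaign format.

```python
# The fleet as a distributed filter: almost everything is discarded on the car,
# only clips matching the requested trigger come home.
from dataclasses import dataclass

@dataclass
class Trigger:
    name: str               # e.g. "office_chair_on_roadway"
    target_class: str       # class the on-board network must detect
    min_confidence: float   # how sure the car must be before uploading
    clip_seconds: int = 10

def on_frame(frame, detections, trigger: Trigger, upload_queue):
    """Runs on the car for every processed frame; uploads only rare matches."""
    for det in detections:  # detections produced anyway by the driving network
        if det.cls == trigger.target_class and det.confidence >= trigger.min_confidence:
            upload_queue.put(make_clip(frame, seconds=trigger.clip_seconds))  # hypothetical helper
            break
```

The design point is that the expensive search runs on a million cars in parallel, using compute that is already in the vehicle, instead of on a data center sifting raw recordings.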
So not only do they not function as recording devices, they intelligently filter the world to find the things Tesla has asked for and send them back, and that's really important; it's very different from just having a bunch of recorders out there. I mean, to continue with that thinking: the Tesla neural network is able to pick up certain things very easily, it has no problem with them, and you don't need much more data or labels for the things you're already good at. You need the labeling and data for the things you're not good at, because that's what you're trying to fix. The capability is like a bubble that expands outward, right, and there are things it can't do at all.
You don't want to collect data on things there's no way you can do well yet, and there's no point in collecting things that are already well inside your bubble. You want to collect data that's right on the edge: things you can do, but can't do well, because if you can sort of do it, you can tell the cars in the field what to look for. This was one we actually saw, you know, from the people who hacked the cars. I've been watching Tesla pull this data for a long time, and they were able to take the triggers apart and see what Tesla was asking for, and there were all these great examples. One I really liked was garages, and I think this was because the network had a hard time understanding when it was in a garage versus a parking lot versus, you know, just entering a tunnel or whatever, and it was interesting to see some of what came back, because there are a lot of different types of garages in the world. So Tesla asked the fleet for pictures of garages, and the way this works: if the cars see something, they can say, well, maybe this is a garage, maybe it's not. If they have no idea whether it's a garage, there's no point asking the fleet, because you'll just get random answers. If the fleet sort of knows what a garage is but makes a lot of mistakes, you can ask for all of those, humans can review them, and if the data is 10 percent effective or 50 percent effective, they just throw out the non-garage examples, the rest get labeled as garages, they go into the training system, and two weeks later, suddenly, the system is really good at correctly recognizing garages, and they just work through this list of problems.
The car can't do this? You ask the fleet, you get the data, you tag it, you feed it in, and that's it. This is Karpathy's data engine, right? They say we need this, ask the fleet, the data comes back, people label it. It's almost a go-on-holiday operation for them: the labelers stay on top of things and the system keeps improving, but the engineers could go on vacation and the system keeps improving, because, you know, they look at the next problem, which is whatever the car can't do, and they learn from the interventions. That's why the interventions are really important, because it's the human drivers who are telling you: fix this, fix this, fix this, I had to intervene. They pull that data, they mine it for common phenomena, some human figures out what the common things are, they generate the trigger, they send it to the fleet, they pull the data, train the network, problem solved, move on to the next one. Yeah, it's interesting, because when you talk about pulling the data, the images they need, from the cars, the first thing I wonder is, well, how
can they pull out something the car doesn't even know what it looks like, for example, a garage? But I think your example is that, yes, they will pull it even if the car only recognizes, say, 10 percent of garages; they pull that, then use that data, correctly label the ten percent that were accurate, and put it back into the system so that the neural network knows better what a garage looks like. Let me give you a little more detail about this. The neural networks, all the camera networks, generate something called an embedding, which is a number that basically tries to describe what the camera is seeing in a very high-dimensional space.
You know, the embeddings are usually 2,000 to 8,000-dimensional vectors or something like that, so if you want pictures of garages, what you do is take pictures of like 100 garages, run them through the neural network, and look at the embeddings you get for each one, and those embeddings will form a cluster in this high-dimensional space. In other words, there will be a point in this thousand-dimensional space that is the garage point; that is what a garage is. The network is uncertain, though, so, for example, if I just ask, send me things that match exactly this point, I'll only get things that match it exactly and I won't learn anything. What you do is say, send me anything that is within distance x of this; in other words, if you think it could be a garage along all these dimensions, send me that too, and then a human, the tagger, makes the decision, right? The system doesn't need to understand precisely what a garage is, it just needs a vague, reasonable idea of what a garage might be, and then you tell it how likely something has to be a garage: you know, if the network believes it's 90 percent likely to be a garage, send me all of those, or, if the network thinks there's a five percent chance it's a garage, send me all of those. So they can test these different confidence figures and pull examples from the real world that are rich in exactly the things the network wouldn't have picked up before but should, and those get added to the labeled data.
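A sketch of the embedding-space query James describes: average the embeddings of ~100 known garage images to get a "garage point," then ask for anything within a chosen distance of it. The function names and the use of a simple Euclidean distance are illustrative assumptions, not the actual query mechanism.

```python
# Query-by-embedding: loose matches are exactly the uncertain examples worth labeling.
import numpy as np

def garage_query_point(garage_embeddings):
    """Centroid of known-garage embeddings in the ~2,000-8,000-dim space."""
    return np.mean(np.stack(garage_embeddings), axis=0)

def matches(embedding, query_point, max_distance):
    """Loose match: 'if you think it could be a garage, send me that too'."""
    return np.linalg.norm(embedding - query_point) <= max_distance
```

Widening `max_distance` pulls in more junk but also more genuine garages the network is unsure about; a human tagger then keeps the real garages and discards the rest, which is the point of the exercise.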
That's how it works. That's how all these neural networks are: they're fundamentally probabilistic. When the network finds a lane line, it says, here's a lane line, here's the probabilistic boundary where I think the lane line is; I think there's a 90 percent chance it's here, but a 10 percent chance it's over here instead. What you want to do is develop the networks so they become more confident in the correct answer over time, and so one tool you always have to work with when you're trying to select data or make decisions about this kind of thing is these probabilities. You can say, even if you think there's only a five percent chance that it's a garage, send it to me, or, you know, if you were doing the office-chair-falling-off-the-truck thing,
that's a very difficult query, right? If you said, only send me things you think are 90 percent likely to be an office chair falling off a truck, you'd miss a lot of office chairs falling off trucks, because the network is never that confident. But if you set it to, say, 10 percent, you might get a lot of junk, but at least some office chairs falling off trucks will be in there, so you'll have something to work with, and then you figure out what it is. You can take the real cases of office chairs falling off trucks, label them, and put them back into the system; and you can take a bunch of the ones the network thought might be an office chair falling off a truck, put those in your training set labeled as definitely not an office chair falling off a truck, and it gets smarter that way too. That's right, and then Tesla releases an update and now the cars are much better at detecting office chairs falling off trucks, and it seems to me that if this happens constantly and continuously, you know, the tagging, the capture, the use of the fleet to pull the images, the data to train the networks, it makes sense in terms of how the improvement goes.
It keeps compounding, because the capability of the network to train on, or pull, relevant data keeps growing: now you have garages, and then you can have bicycles coming out of garages, or office chairs falling off trucks, or a bookshelf falling off a truck, or whatever, so all these cases start to accumulate, and your ability to recognize and improve, but also your ability to pull more and more data, grows because you are recognizing more and more things along the way. Yes, exactly. So one of the questions that came up, which is actually a very good question, is how you deal with the march of nines, because essentially once you get really into the weeds you're trying to deal with really weird phenomena, so, you know, how do you deal with that kind of thing?
You know, to be able to recognize a chair that falls off a truck, first you have to know the chair part, then you have to know the truck part, you have to know that I am on a road, you may have to know that it's not sitting on the ground, it's not on the side of the road; I mean, there are all these things you have to understand before you can get to the chair falling off the truck. But the thing is, once you have the truck and the chair and the road down, you know that this is something that shouldn't happen; once you know enough about how things should be,
you can start looking specifically for things that aren't the way they should be, so as the network improves, your ability to find edge cases expands rapidly. It's true that edge cases are an almost unlimited universe, right, but humans in the real world also experience extreme cases they've never seen before. You know, I may be driving and never have seen a tsunami or a flood, but if I'm driving on a highway and I see water across the road, I know it's not supposed to be that way and I can stop. The more things you know, the easier it is to spot things that don't fit, so your leverage for dealing with edge cases actually expands. It never converges; you never get to a point where you perfectly handle every edge case, but as the edge cases become increasingly difficult to detect, if your network is good and you are training it correctly, the network's ability to detect edge cases simultaneously becomes more and more powerful, so you can keep making progress on improving the network even as the edge cases become really strange and exotic. Yes, so I think we are getting closer to the essence of how Tesla is going to solve, you know, autonomous driving: it's this continuous improvement that compounds over time as your data grows and your ability to train your network grows.
I want to take an example of, let's say, how interventions are used, you know, in combination with maybe data mining. So let's take speed bumps. There's a video of the Full Self-Driving beta on Twitter where it didn't catch the speed bump and just went over it very fast, okay? So let's say that in this situation the driver intervenes, you know, presses the brake or something, and it disengages. Now Tesla knows there was a disengagement and it gets uploaded, so they can pull the footage around the disengagement, say 15-20 seconds before and after or something, and they look at it and say, oh man, the network didn't get that this was a speed bump. Then they can label it: okay, this is a speed bump. But now they don't have enough data to really train the neural network on what a speed bump is, so they send a request to all their cars saying, look for speed bumps, and basically the fleet of cars brings back images. And as you're saying, depending on how much data you want, you could make the request more precise or less precise, but it brings back, let's say, a bunch of data, of which a certain percentage are real speed bumps. You have human taggers, you know, tag all these things as speed bumps. Um, in the future with Dojo, could they, let's say, use structure from motion?
They could have them label the speed bump in a structure-from-motion scene, and then all the frames would carry that speed bump label. But for now, say they do human labeling of the speed bump on certain frames, and say they get back a hundred speed bumps, you know, labeled correctly by the human taggers. They put that into the training system, the neural network is retrained, it reweights, you know, the connections between certain nodes, to get results where it now concludes that these are most probably speed bumps, and then they push an update, let's say, to these Full Self-Driving beta cars, and now when the beta cars approach a speed bump, the ability of the neural network to detect the speed bump has improved a lot. Then people say, wow, just a couple of weeks ago it went right over the speed bump, but now it slows down before the speed bump. I mean, is that generally the system working? Yes, I mean, that is one mechanism that is working.
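The cycle Dave just walked through is the data engine loop described earlier. A schematic sketch in plain pseudocode-style Python; every function name here is a placeholder standing in for a whole subsystem, not an actual Tesla API.

```python
# One turn of the data engine, from a discovered weakness to a fleet update.
def data_engine_iteration(weakness, current_model):
    clips = pull_intervention_clips(weakness)        # e.g. disengagements at speed bumps
    trigger = build_fleet_trigger(weakness, clips)   # "look for speed bumps, send them back"
    candidates = collect_from_fleet(trigger)         # cars filter the world for days/weeks
    labeled = human_label(candidates)                # taggers keep and correct the real examples
    new_model = retrain(current_model, labeled)      # reweight the network on the new data
    if passes_regression_tests(new_model):
        push_update_to_fleet(new_model)              # roughly a one-to-two-week cycle end to end
    return new_model
```

The loop itself is simple; the leverage comes from how cheap each stage becomes as the fleet grows, the taggers get better tools, and the training compute (Dojo) gets cheaper.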
You can also do something with just your interventions. You know, you push triggers to the car and say, if there is an intervention, look for these things, and if you see one of them, send it to me. So you can send back something like: speed bump, not a speed bump; you know, there's a dark patch on the road, what's the probability that it's a speed bump? The car computes that, and you tell the car's control system: if there's less than a 50 percent probability, don't stop, or if it's above 90, stop; it has some kind of threshold. So one thing you can do right away, without even retraining the network: you've told the car to stop if it thinks there's a 98 percent chance it's a speed bump, because otherwise maybe it
would stop all the time for shadows and random things on the road, and, you know, people drive it for a while and you find out the correct threshold is actually 95, not 98, so you just push that number to the car, and suddenly the car's behavior changes, because you've learned what the right threshold is for taking that action. So I think a lot of these immediate improvements we're seeing in the FSD cars are probably of that kind, because they haven't really had time otherwise. I mean, the data engine, you know, is a one- or two-week cycle for them to send tag requests, retrieve the data, deliver it to the taggers, have the taggers go through and label the set, feed it in, retrain the network, check the network for serious errors and then push it out to the car. It's like a two-week cycle, but something you can do right away is you
can look and say, well, I had 57 interventions here, and each one of them had a speed bump probability between 95 and 98; we didn't stop, but if the threshold had been 95 instead of 98, we would have. And then you can just lower that number a little and watch how many places you're now slowing down where people press the accelerator, right, because they're telling you, oh, you saw a speed bump and it's not there, and that feeds back from the car too. And it's probably more than just speed bump versus no speed bump, right? I mean, they probably have more categories in there, so you can break the issue down more finely, because there are speed bumps that are like a yellow stripe on the road and others that look like a shadow, and you might have different categories for those. So, I mean, it's interesting, so you're saying that certain variables or settings, for example the percentage, say 95, or the threshold required for the car to stop at a speed bump, um, those things can be rolled out without a big firmware update, basically through software configuration updates. That's right.
What I understand from people who have watched these things go back and forth between Tesla and the cars is that neural network weight updates only happen with relatively large pushes. The packages containing the neural networks are large files, highly interdependent with a bunch of other things, and the packages that go out to the car and update the neural networks, as far as I know, only come with a pretty big push, the kind where, you know, the car has to go to sleep and wake up again and the install takes 30 minutes or whatever. There could be exceptions to that: there are a bunch of neural networks in the thing, and some of them are small, say the windshield wiper network. That's something they struggled with for a long time; for about a year that network kept getting smaller and smaller and getting combined with this and that and moving around while they tried to figure out how to get the wipers to correctly detect rain, but it's a relatively low-risk neural network, and there were times when it was quite small. So it's conceivable, although as far as I know no one saw it happen, that Tesla could have pushed out a new wiper network if they really wanted to try something; it's not beyond possibility. On the other hand, there are these smaller hyperparameter files, right, and those do seem to get updated, so you could have a file with several of these percentages, like slow down if the curve is tighter than this, or stop if you see this with this probability. And, you know, the windshield wiper network was producing a probability of rain; it changed a bunch of times, but for a while it had these different categories of rain, like, you know, it's drizzling or whatever, and it would give you a probability for each of those categories, and then the software that turns on the wipers would look at those probabilities and make a decision, and you could imagine those are numbers that you could definitely change.
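A hedged sketch of the lighter-weight update James is contrasting with a full network push: the network's probability outputs stay fixed, and only a small file of thresholds changes. The keys and numbers here are made up purely to illustrate the mechanism.

```python
# Control code consumes the network's probabilities; behavior can change by
# pushing a new threshold file, with no retraining and no new network weights.
control_params = {
    "speed_bump_stop_threshold": 0.98,    # stop if P(speed bump) exceeds this
    "wipers_on_rain_threshold": 0.15,     # wiper logic reads rain-category probabilities
    "min_curve_radius_slowdown_m": 40,    # slow down if the curve is tighter than this
}

def should_stop_for_speed_bump(p_speed_bump: float, params=control_params) -> bool:
    """Decision made from the network's output; tuning 0.98 -> 0.95 changes
    the car's behavior overnight without touching the network itself."""
    return p_speed_bump >= params["speed_bump_stop_threshold"]
```

This is one plausible reading of why some FSD behavior changes appear faster than the one-to-two-week data engine cycle would allow.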
Yeah, this is great, I think it's definitely helping me understand, and I think it will help others too. I want to ask some questions about this whole idea of the perception engine for Autopilot or Full Self-Driving versus the planning side. As I understand it, you have this perception system where you try to understand the world correctly, what objects are moving, basically interpreting what's happening, but then there's another set of challenges, which is how you navigate that world: what decisions you make, how fast you go, when you turn, how you turn, etc., and it seems like Tesla has these in a separate kind of planning system that determines how to plan. Some people have asked me on Twitter, you know, is this planning system mostly hand-coded, or is there use of neural networks in terms of making planning, or improvements to planning, more automated, or do you foresee this planning system remaining a sort of hand-written code
going into the future? Yeah, so people talk about these systems as perception, planning and action, right, these three big categories of things, and, uh, the big challenge so far has been accurate perception. There are things in planning that are really amenable to a neural network doing them, and other things that are less amenable, and all of these systems have what I call a harness that runs around them: they all have hand-written code that looks for really stupid outputs that are almost certainly a bug and prevents, you know, bad outcomes. The harness is mainly there to catch errors, because neural networks
can also be buggy, right? I mean, you can have something that produces a really strange result; it's very rare, but it happens, and building the harness is relatively simple, you do it once and you don't have to do it again, and everyone who builds robotic systems in the real world has a harness around the thing that watches it. There are a lot of different ways you can examine the output of these neural networks and these planning systems to decide whether it makes sense, and the harness will step in and prevent it if not. It used to be, especially in AP1 and early on, that you would see the harness activate all the time; you could see all these hard thresholds, like the system was doing its thing and then the harness would kick in. For example, AP2 for a long time had a harness on how tightly you could turn on a highway, because one of the failure modes they had was the wheel deciding to turn too hard for the speed you were going, so for a long time the car's main failure mode on highways was slowly drifting out of its lane.
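A sketch of the harness idea: hand-written code that caps what the neural planner is allowed to command, for example limiting steering authority at highway speed. The limit curve below is invented purely to illustrate the mechanism, not anything Tesla ships.

```python
# Hand-written safety harness around a neural planner's steering request.
def harness_clamp_steering(requested_angle_deg: float, speed_mph: float) -> float:
    """Clamp the planner's steering request to a speed-dependent hard limit."""
    max_angle = 900.0 / max(speed_mph, 1.0)   # tighter limit the faster you go (made-up curve)
    return max(-max_angle, min(requested_angle_deg, max_angle))
```

This is safe by construction against violent jerks of the wheel, but it also produces exactly the failure mode James describes next: when the correct steering request on a fast curve is larger than the harness allows, the car drifts toward the lane line.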
You know, you'd be driving, maybe the speed limit is 65 but traffic is going 75, so you're going 75, and you come to a curve in a relatively narrow lane and you see the car slowly drifting toward the line, and you wonder what's going on. I mean, you could look at the lines on the display; it clearly knew where the lane was, because you could see it drawn right there, so why wouldn't it just stay in it? The reason is that they had a hard setting that said, at this speed you can't turn the steering wheel any harder than this, because that was a safety harness, and the operation of the harness is what kept the car from staying in the lane: you happened to be on a road where, for how fast you were going, you were going too fast for how hard the harness would allow you to turn in that kind of situation. And for a while they needed it, because the accuracy of the curve prediction wasn't good enough to avoid occasional really bad situations. Like, you know, there's a hill crest problem they used to have. For a long time, the way it understood where it was on the road was that it looked for lane lines; in the beginning lane lines were really important, now they are much less important, especially with FSD. Humans treat a lane line as just a suggestion, because, you know, you can cross it if it makes sense, if it's safer or faster, you just cross a lane line; but on Autopilot, for a long time, the lane lines were God, they were like concrete barriers, and it
wouldn't cross one of them unless the harness kicked in and said it had to, and that was the only situation where it would. But then you had this problem at the top of a hill, where you're driving down a road and the road goes up and down and wobbles back and forth a little, and every once in a while you're in a curve as you're coming to the top of a crest, and the car can't see the lane lines over the crest, so it doesn't know where the lane is going right then.
If you're on a completely straight road and it comes to the crest, the car would guess, oh, it's probably straight on the other side, right? But if you're in a curve when you get to the top, a lot of times the curve changes right after you cross the crest, or straightens out. Sometimes the car would get scared at the top, because the lane line just disappears and it has no horizon for the road or something, and then, just as you're getting to the top, it can see some markings on the other side, they give it an idea that the lane goes off in this direction, and suddenly it wants to jerk the steering wheel right at the top of the hill. This was a very common failure mode, and if you drove on those types of roads often you would see it all the time; you could predict it as a driver and say, oh, it's going to happen right here, and the wheel would jerk suddenly, and the harness was a way to make sure those jerks were never strong enough to cause an accident. So you have these harnesses there, so nothing is completely without hand coding; your harnesses, these system-wide safety harnesses, watch over the outputs. And perception, uh, perception didn't used to be all neural networks; it seems like perception is almost all neural networks now. With the camera networks in FSD now, there's a radar that they don't use, but before, neural networks processed the cameras, a separate neural network processed the radar somewhere, and then there were other networks that combined that, and some heuristic code that combined it too, but now they are starting to bring more and more sensors together earlier in the process, and those networks, as the system develops, extend further and further up through perception. I have to guess, at this point, about the things I can't see; the things I can see, the things I have access to, are neural networks, so I can look and see what's in the neural networks, but I have to guess what isn't, which I do by observing how the system behaves and, you know, common sense and that kind of thing. Neural networks are accumulating more planning functions, but planning is not going to be entirely a network; I don't think at any reasonable point you'll put the Google map into the neural network and have the neural network decide, you know, how to get there.
There will be heuristics for route planning and all that kind of stuff, and it has to be interconnected at some level with the planning module. Which lane I should be in when I cross this intersection depends a lot on where you want to go, and there will be a heuristic mapping module that decides that kind of thing, and it will be part of that equation for a long time. But all the decisions about how quickly I should move into the left lane so I can turn left, or whether it's not safe and I should just go straight through the light, take the next
left and come back, you know, those are the kinds of trade-off decisions that will eventually be almost entirely in the neural network, but not now. So in planning there's a decent amount that's very amenable to the neural network, and I think the network is already doing some of that; there are a lot of things that are not neural networks now and probably won't be for a reasonable period of time, and that boundary is constantly moving: as a neural network becomes able to do more and more, they hand more over to the neural network, and then what they had before becomes a larger part of the safety harness.
It becomes a backstop to make sure the neural network is behaving. Yes, yes, that makes sense. Um, so in your opinion, the perception problem is the big problem, I mean, and planning would be more secondary? The challenges shift over time. All these systems have aspects that are better or worse suited to heuristic, hand-written code versus what neural networks can do, and that's a moving target, right? Heuristics have been around for a long time, robotics is a mature field, there are a lot of well-understood techniques for handling these things with heuristics, and we have a pretty good understanding of the limits of what they can do. So the right thing to start with is to build them that way, and then, as the capabilities of the neural network in that space exceed the heuristic, you hand it over to the neural network, and once again the heuristic becomes part of the safety harness for the neural network at that point. I hope the conversation was helpful. If it was, please consider liking the video to help spread the word, and subscribe to my channel to stay up to date on new videos.
I actually interviewed James Douma over a week ago for two hours, but after recording I discovered that the first hour was not saved due to lack of storage on my computer, so I asked James if he would be willing to talk again, and he kindly offered his time. The excerpt you just saw was from my second interview with him. The entire interview is over three hours long and we cover a lot more topics, like how much Tesla could charge for Full Self-Driving in the future, who the competition is, and a host of other things. I will link to the full video in the video description below. Also, later this week I will share an excerpt from my first interview with James, about how Tesla can leverage its expertise in artificial intelligence to enter new markets beyond autonomous driving.
If you're on Twitter, I'm active there too at @heydave7. If you listen to podcasts, all of my YouTube videos are available as an audio podcast; just go to your podcast player and search for Dave Lee on Investing. I hope to see you in my next video. Thank you.
