Tesla Autonomy Day, Feb 27, 2020
Hello everyone, I'm sorry I'm late. Welcome to our first Autonomy Day for analysts. I really hope this is something we can do a little more regularly now, to keep you posted on the progress we're making on autonomous driving. About three months ago we were preparing for our fourth-quarter earnings call with Elon and a few other executives, and one of the things I told the group is that of all the conversations I have with investors on a regular basis, the biggest gap between what I see inside the company and the external perception is our self-driving capability. It makes sense, because for the last couple of years we've mostly been talking about the Model 3 ramp, and a lot of the debate has been around the Model 3, but there's actually been a lot going on in the background: we have been working on our new full self-driving chip, we have had a complete overhaul of our neural network for vision recognition, and so on. So now that we're finally starting to produce our full self-driving computer, we think it's a good idea to lift the veil, invite everyone in, and talk about everything we've been doing for the last two years.
About three years ago we wanted to find the best possible chip for full autonomy, and we found that there was no chip that had been designed from the ground up for neural networks, so we invited my colleague Pete Bannon, vice president of silicon engineering, to design such a chip for us.
He has over 35 years of experience in chip design, and worked at a company called PA Semi, which was later acquired by Apple, so he has worked on dozens of different architectures and designs, and was a lead designer, I think, for the Apple iPhone 5 chip right before he joined Tesla. He'll be joined on stage by Elon Musk. Thanks. Actually, I was going to introduce Pete, but that's been done. He's the best chip and systems architect I know in the world, and it's an honor to have you and your team at Tesla. Thanks, Elon. It's a pleasure to be here this morning, and a real pleasure to tell you about all the work my colleagues and I have been doing at Tesla over the last three years. I'll tell you a little bit about how it all started, then I'll introduce the full self-driving computer and tell you a little bit about how it works. We'll dive into the chip itself and go through some of those details, I'll describe how the custom neural network accelerator we designed works, and then I'll show you some results, and hopefully everything will wrap up by then.
I was hired in February 2016. I asked Elon if he was willing to spend all the money it takes to do full custom system design, and he asked, "Well, are we going to win?" I said, "Yeah, of course," so he said, "I'm in." That got us started. We hired a bunch of people and began thinking about what a chip custom-designed for full autonomy would look like. We spent eighteen months doing the design, and in August 2017 we released the design for manufacturing. We got it back in December; it turned on and actually worked great on the first try. We made a few changes and released a Rev B zero in April 2018. In July 2018 the chip was qualified, and we started full production of production-quality parts. In December 2018 we had the autonomous driving stack running on the new hardware, and we were able to start retrofitting employee cars and testing the hardware and software in the real world. Last March we started shipping the new computer in the Model S and X, and in early April we started production on the Model 3. So this whole program, from hiring the first employees to having it in full production in all three of our cars, took only a little over three years, which is probably the fastest systems development schedule I've ever been associated with. It really speaks to the advantages of a tremendous amount of vertical integration, which allows you to do concurrent engineering and speed up deployment. In terms of goals, we focused exclusively on Tesla's requirements, and that makes life so much easier: if you have a single customer, you don't have to worry about anything else. One of those goals was to keep power under 100 watts so we could retrofit the new machine into existing cars. We also wanted a lower part cost so we could afford redundancy for safety.
With a rough thumb-in-the-wind estimate, we predicted it would take at least 50 trillion operations per second of neural network performance to drive a car, so we wanted to get at least that much, and really as much as we possibly could. Batch size is the number of items a machine operates on at the same time; for example, the Google TPU has a batch size of 256, and it has to wait until it has 256 items to process before it can start. We didn't want to do that, so we designed our machine with a batch size of one: as soon as an image appears, we process it immediately to minimize latency, which maximizes safety.
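The batch-size point can be made concrete with a toy latency calculation. All the numbers here are hypothetical (an assumed camera interval and per-frame inference time), but they show why a batch-256 design must queue frames before it can even start, while batch size one processes each frame on arrival:

```python
FRAME_PERIOD_MS = 1000 / 36   # assumed camera frame interval (~36 fps)
COMPUTE_MS = 10.0             # assumed per-frame inference time

def latency_batch_one():
    # The frame is processed the moment it arrives.
    return COMPUTE_MS

def latency_batched(batch=256):
    # The first frame of the batch must wait while the remaining
    # frames are captured before inference can even begin.
    queue_wait_ms = (batch - 1) * FRAME_PERIOD_MS
    return queue_wait_ms + COMPUTE_MS
```

With these assumed numbers, the batched design adds several seconds of queueing delay before the first result, which is exactly the latency a safety-critical system cannot afford.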
We speculated that over time the amount of post-processing on the GPU would decrease as the neural networks got better, and that actually happened, so we took a chance and put a fairly modest GPU in the design, as you'll see, and that turned out to be a good bet. Safety is very important: if the car isn't safe, none of the rest matters, so there was a lot of focus on safety, and of course on security, in doing the chip design. As Elon alluded to earlier, there was no neural network accelerator in existence in 2016; everyone was adding instructions to their CPU, GPU, or DSP to improve inference, but no one was doing it natively, so we set out to do it ourselves. For the other components on the chip, we purchased industry-standard IP for the CPUs and GPU, which allowed us to minimize design time and also risk to the program. Another thing that was a bit unexpected when I first arrived was our ability to leverage existing teams at Tesla. Tesla had wonderful power supply design, signal integrity, package design, system software, firmware, and board layout teams, and a very good system validation program, all of which we were able to leverage to speed up this program. This is what it looks like. On the right you see all the connectors for the video coming from the cameras in the car; you can see the two full self-driving computers in the middle of the board, and on the left are the power supply and some control connections. I really love it when a solution is stripped down to its most basic elements: it has video, compute, and power, and it's direct and simple. Here is the original Hardware 2.5 enclosure that the computer came in, which we have been shipping for the last two years, and here is the new design for the FSD computer. It is basically the same, and of course that's driven by the constraints of having an upgrade program for existing cars.
I would like to point out that this is actually quite a small computer. It fits behind the glove box, between the glove box and the car's firewall; it doesn't take up half the trunk. As I said before, there are two fully independent computers on the board. You can see them highlighted in blue and green. On either side of the big SoC you can see the DRAM chips we use for storage, and on the bottom left you see the flash chips that hold the file system. So these are two independent computers that boot up and run their own operating systems. Yes, and if I may add something, the general principle here is that any part of this can fail and the car will keep driving: cameras can fail, power circuits can fail, one of the Tesla full self-driving computer chips can fail, and the car keeps driving. The probability of this computer failing is substantially lower than the probability of someone losing consciousness; that's the key metric, by at least an order of magnitude. Yes. So one of the things we do to keep the machine running is redundant power supplies in the car: one of the computers runs on one power supply and the other on the second. The cameras are the same way, so half of the cameras are powered by the blue power supply and the other half by the green power supply, and both chips receive all the video and process it independently. In terms of driving the car, the basic sequence is to collect a lot of information from the world around you: not only the cameras, but also GPS, radar, maps, the IMU, the ultrasonic sensors around the car, the wheel ticks, the steering angle, and the intended acceleration and deceleration of the car. All of that is integrated to form a plan. Once we have a plan, the two machines exchange their independent versions of the plan to make sure they are the same, and assuming we agree, we act and drive the car. Now, once you've actuated some new control on the car, you want to validate it, so we check that what we transmitted was what we intended to transmit to the actuators in the car, and then you can use the sensor suite to make sure it actually happens. If you ask the car to accelerate, brake, or steer right or left, you can look at the accelerometers and make sure you're actually doing it. So there's a tremendous amount of redundancy and overlap in both our data acquisition and our data monitoring capabilities.
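The dual-computer agreement step described above can be sketched in a few lines. The plan format, tolerance, and function names here are all hypothetical; the point is only the control flow: each computer forms a plan independently, the plans are compared, and the car actuates only on agreement, otherwise it fails safe.

```python
def plans_agree(plan_a, plan_b, tol=1e-3):
    # Two independently computed plans must match within tolerance.
    if len(plan_a) != len(plan_b):
        return False
    return all(abs(a - b) <= tol for a, b in zip(plan_a, plan_b))

def drive_step(plan_a, plan_b, actuate):
    if plans_agree(plan_a, plan_b):
        actuate(plan_a)          # agreed: command steering/brake/accel
        return True
    return False                 # disagreement: fail safe, do not actuate
```

The subsequent validation loop (reading accelerometers back after a command) would then confirm the actuation took effect, closing the monitoring loop described in the talk.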
Next we'll talk about the full self-driving chip a bit. It's packaged in a 37.5 mm BGA with 1,600 balls, most of them used for power and ground, but also for signals. If you remove the lid it looks like this: you can see the package substrate and the die sitting in the center. If you remove the die and turn it over, it looks like this: there are 13,000 C4 bumps scattered across the top of the die, and underneath them are twelve layers of metal, which obscure all the details of the design. If you strip that away, it looks like this. It's a 14-nanometer FinFET CMOS process, 260 square millimeters in size, which is a modest-sized die. For comparison, a typical cell phone chip is about a hundred square millimeters, so we're a little bigger than that, but a high-end GPU would be more like six hundred to eight hundred square millimeters, so we're in the middle.
I would call it the sweet spot; it's a comfortable size to build. There are 250 million logic gates in there and a total of six billion transistors, which, even though I work on this all the time, is mind-blowing to me. The chip is manufactured and tested to the AEC-Q100 automotive standard. I'd like to walk around the chip and explain all the different pieces, and I'm going to go in the order that a pixel coming from a camera would visit them. At the top left you can see the camera serial interface: we can ingest 2.5 billion pixels per second, which is more than enough to cover all the sensors we know of. We have a network-on-chip that distributes data to the memory system; the pixels travel through the network to the memory controllers on the left and right edges of the chip. We use industry-standard LPDDR4 memory running at 4,266 megabits per second, giving us a peak bandwidth of 68 gigabytes per second, which is a pretty healthy bandwidth but nothing extreme; again, we're trying to stay in the comfortable sweet spot for cost reasons. The image signal processor has a 24-bit internal pipeline that allows us to take full advantage of the HDR sensors we have around the car. It does advanced tone mapping, which helps bring out details in shadows, and advanced noise reduction, which improves the overall quality of the images we feed the neural network. Then there's the neural network accelerator itself. There are two of them on the chip; each has 32 megabytes of SRAM to hold temporary results and minimize the amount of data we have to move on and off the chip, which helps reduce power. Each has a 96-by-96 multiply/add array with in-place accumulation, which allows us to do 9,216 multiply-adds per cycle.
There is dedicated ReLU hardware and dedicated pooling hardware, and each accelerator delivers 36 trillion operations per second, operating at two gigahertz. The two of them together on one die deliver 72 trillion operations per second, so we exceeded our goal of 50 TOPS by a bit. There's also a video encoder; we encode video and use it in a great variety of places in the car, including the backup camera display, optionally the dashcam feature, and the cloud data logging feature, which Stuart and Andrej will talk about later. There's a GPU on the chip. It's modest performance, with support for 32- and 16-bit floating point. Then we have twelve A72 64-bit CPUs for general-purpose processing, running at 2.2 gigahertz, which is about two and a half times the performance available in the previous solution. There's a safety system containing two CPUs working in lockstep. This system is the final arbiter of whether it is safe to operate the actuators in the car, so this is where the two plans come together and we decide whether or not it's safe to proceed. And lastly there's a security system, whose job is basically to ensure that this chip only runs software that has been cryptographically signed by Tesla. If it hasn't been signed by Tesla, the chip doesn't run it.
I've told you a lot of different performance numbers, and I thought it would be helpful to put them in perspective. Throughout this talk I'm going to refer to a neural network from our narrow camera that requires 35 billion operations to process a single frame. If we used the twelve CPUs to process that network, we could do one and a half frames per second, which is very slow, nowhere near good enough to drive the car. If we used the 600-gigaflop GPU on the same network, we would get 17 frames per second, which is still not good enough to drive the car with eight cameras.
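The frame-rate figures quoted above follow directly from the 35-billion-operation cost of the network: a device's frame rate is just its sustained throughput divided by the per-frame operation count. A quick check of the arithmetic (treating quoted peak throughputs as sustained, which is a simplification):

```python
OPS_PER_FRAME = 35e9   # narrow-camera network, operations per frame

def frames_per_second(throughput_ops_per_s):
    return throughput_ops_per_s / OPS_PER_FRAME

gpu_fps = frames_per_second(600e9)   # 600 GFLOP GPU -> about 17 fps
nna_fps = frames_per_second(72e12)   # 72 TOPS of accelerators -> ~2,000 fps
```

This also explains the accelerator numbers later in the talk: dividing 72 TOPS by 35 GOPS per frame lands right in the neighborhood of the measured two-thousand-plus frames per second.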
The neural network accelerators on the chip can deliver 2,100 frames per second, and you can see from the scale that the amount of computation available on the CPU and GPU is basically negligible compared to what is available in the neural network accelerator. It really is night and day. Moving on to the neural network accelerator itself: on the left is a cartoon of a neural network, just to give you an idea of what's going on. The data enters at the top and visits each of the boxes, flowing along the arrows between them. The boxes are typically convolutions or deconvolutions with ReLUs; the green boxes are pooling layers. The important thing is that the data produced by one box is consumed by the next box, and after that you don't need it anymore; you can throw it away. So all that temporary data that gets created and destroyed as it flows through the network never needs to be stored off-chip in DRAM: we keep all of it in SRAM, and I'll explain why that's super important in a few minutes.
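The activation-lifetime point can be illustrated with a toy feed-forward chain. Each intermediate result is consumed exactly once by the next layer, so only the live activation needs fast storage, and its buffer can be recycled immediately; the "layers" below are stand-ins, not real network ops:

```python
def run_chain(x, layers):
    act = x
    for layer in layers:
        # After this call the previous activation is dead; its
        # on-chip buffer can be reused, and nothing spills to DRAM.
        act = layer(act)
    return act

result = run_chain(2, [lambda v: v + 1,       # convolution stand-in
                       lambda v: v * 3,       # another layer
                       lambda v: max(v, 0)])  # ReLU stand-in
```

In a real graph with branches the liveness analysis is more involved, but the principle is the same: peak on-chip memory is set by the largest set of simultaneously live activations, not by the sum of all of them.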
If you look at the right-hand side, you can see that in this network of 35 billion operations almost all of them are convolutions, which are based on dot products; the rest are deconvolutions, also based on the dot product, and then ReLU and pooling, which are relatively simple operations. So if you were designing hardware, you would clearly aim at dot products, which are based on multiply-adds, and really attack that. But imagine you speed them up by a factor of 10,000: what was nearly all of the runtime suddenly becomes almost nothing, and all of a sudden the ReLU and pooling operations become quite significant. So our hardware design includes dedicated resources for processing ReLU and pooling as well. Now, this chip operates in a thermally constrained environment, so we had to be very careful about how we spend power; we wanted to maximize the amount of arithmetic we could do. So we chose integer addition, which takes about nine times less energy than the corresponding floating-point addition, and we chose 8-bit by 8-bit integer multiplication, which takes significantly less power than other multiply operations and provides enough precision to get good results.
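The 8-bit integer choice works because values can be scaled into int8, multiplied and summed in a wide integer accumulator, and rescaled at the end. Here is a minimal sketch of that idea; the scaling scheme is illustrative only, not Tesla's actual quantization:

```python
def quantize(xs, scale):
    # Map real values to int8 range with saturation.
    return [max(-128, min(127, round(x / scale))) for x in xs]

def int8_dot(xs, ws, x_scale, w_scale):
    qx = quantize(xs, x_scale)
    qw = quantize(ws, w_scale)
    acc = sum(a * b for a, b in zip(qx, qw))   # cheap integer multiply-adds
    return acc * x_scale * w_scale             # rescale back to real units

approx = int8_dot([0.5, -0.25, 1.0], [0.1, 0.2, 0.3], 0.01, 0.01)
exact = 0.5 * 0.1 + (-0.25) * 0.2 + 1.0 * 0.3
```

With well-chosen scales the quantized result tracks the floating-point result closely, while each multiply burns a fraction of the energy of a floating-point one.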
We chose to use SRAM as much as possible: going off-chip to DRAM costs about a hundred times more power than using local SRAM, so clearly we want to stay in local SRAM as much as possible. In terms of control, there's a paper by Mark Horowitz at ISSCC where he tallied up how much power it takes to execute a single instruction on a typical integer CPU, and the add operation itself is only about 0.15 percent of the total power; the rest goes to control overhead and bookkeeping. So in our design we basically set out to get rid of all of that as much as possible, because what we're really interested in is the arithmetic. Here's the design we ended up with. You can see it's dominated by the 32 megabytes of SRAM; there are big banks on the left, the right, and the bottom center, and all the computing is done in the top half. Every clock we read 256 bytes of activation data and 128 bytes of weight data out of the SRAM and combine them in a 96-by-96 multiply/add array that performs 9,216 multiply-adds per clock, which at 2 gigahertz is a total of 36.8 trillion operations per second per engine. When a dot product is finished, we unload the results out of the array, passing the data through a dedicated ReLU unit, optionally a pooling unit, and finally into a write buffer where the results are aggregated, and then we write 128 bytes per cycle back to the SRAM. All of this runs continuously: we're doing dot products while we unload previous results, pool, and write back to memory. If you add it all up, at 2 gigahertz you need one terabyte per second of SRAM bandwidth to support all that work, and the SRAM supplies exactly that: one terabyte per second of bandwidth per engine, and there are two engines on the chip, so two terabytes per second. The accelerator has a relatively small instruction set. We have a DMA read operation to fetch data from memory.
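The bandwidth and throughput figures above can be reproduced with a few lines of arithmetic. Treating the 128-byte writeback as steady per-clock traffic is a simplification (it happens while results unload), but the totals come out where the talk puts them:

```python
CLOCK_HZ = 2e9
BYTES_PER_CLOCK = 256 + 128 + 128   # activations in + weights in + writeback

# ~1.02 TB/s of SRAM bandwidth per engine
sram_bw_tb_s = CLOCK_HZ * BYTES_PER_CLOCK / 1e12

MACS_PER_CLOCK = 96 * 96             # 9,216 multiply-adds per clock
# Each multiply-add counts as two operations.
engine_tops = MACS_PER_CLOCK * 2 * CLOCK_HZ / 1e12   # ~36.9 peak TOPS
chip_tops = 2 * engine_tops                          # two engines per chip
```

The quoted 36.8 TOPS per engine and 72 TOPS per chip are these peak numbers rounded down slightly.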
We have a DMA write operation to return results to memory. We have three instructions based on dot products: convolution, deconvolution, and inner product. Then there are two relatively simple ones: a scale operation, which takes one input and produces one output, and an element-wise operation, which takes two inputs and produces one output. And of course there's a stop when it's done. We had to develop a neural network compiler for this: we take the neural network trained by our vision team, as it would be deployed in the older cars, and compile it for use on the new accelerator. The compiler performs layer fusion, which lets us maximize the amount of computation we do each time we read data out of SRAM and write it back.
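The layer fusion (merging) the compiler performs can be illustrated with a toy example: rather than writing the convolution output to memory and reading it back to apply ReLU, the fused version does both in a single pass while the data is still in registers. The 1-D "convolution" below is just a stand-in:

```python
def conv(xs, w, b):
    return [x * w + b for x in xs]

def relu(xs):
    return [max(0.0, x) for x in xs]

def fused_conv_relu(xs, w, b):
    # One traversal: each element is transformed and activated
    # in place, halving the round trips to memory.
    return [max(0.0, x * w + b) for x in xs]

unfused = relu(conv([1.0, -2.0, 3.0], 2.0, -1.0))
fused = fused_conv_relu([1.0, -2.0, 3.0], 2.0, -1.0)
```

Both paths produce identical results; the fused path simply touches the SRAM half as often, which is what matters in a design whose power budget is dominated by data movement.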
It also smooths out the demands on the memory system so they aren't too lumpy. We do channel padding to reduce bank conflicts, and we do bank-aware SRAM allocation; this is one case where we could have put more hardware into the design to handle bank conflicts, but by handling it in software we saved hardware at the cost of some software complexity. We also automatically insert DMA operations into the graph so that data arrives just in time for computation without stalling the machine. At the end we emit all the code and all the weight data, compress it, and add a CRC checksum for reliability. To run a program, the whole neural network description is loaded into SRAM up front and sits there ready to run. To run a network, you program the address of the input buffer, which is presumably a new image that just came in from a camera, set the address of the output buffer, set the pointer to the network weights, and hit go. The machine then sequences through the whole neural network on its own; it will typically run for one or two million cycles, and when it's done you get an interrupt and can post-process the results. In terms of results: we had a goal to stay below 100 watts. This is data measured from cars driving on the full autopilot stack, and we're dissipating 72 watts, which is a bit more power than the previous design, but with the dramatic improvement in performance it's still a pretty good answer. Of that 72 watts, about 15 watts is spent running the neural networks. In terms of cost, the silicon cost of this solution is about 80% of what we were paying before, so we are saving money by switching to this solution. And in terms of performance, we took the narrow camera neural network, the one I've been talking about with 35 billion operations, and ran it on the old hardware in a loop as fast as possible, and we got 110 frames per second. We took the same data and the same network, compiled it for the new FSD computer, and using all four accelerators we can process 2,300 frames per second, a factor of 21. I think this is perhaps the most significant slide; it's night and day.
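The run model described a moment ago (program the input buffer address, the output buffer address, and the weights pointer, then hit go and wait for the interrupt) can be sketched as a tiny driver. The register names and this Accelerator class are entirely hypothetical, standing in for memory-mapped hardware registers:

```python
class Accelerator:
    def __init__(self):
        self.regs = {}
        self.interrupt_raised = False

    def write_reg(self, name, value):
        self.regs[name] = value

    def go(self):
        # The real hardware sequences the whole network autonomously
        # for one or two million cycles; here we just check the setup
        # and signal completion in place of the interrupt.
        assert {"input_buf", "output_buf", "weights"} <= set(self.regs)
        self.interrupt_raised = True

def run_network(acc, image_addr, result_addr, weights_addr):
    acc.write_reg("input_buf", image_addr)    # fresh camera frame
    acc.write_reg("output_buf", result_addr)  # where results land
    acc.write_reg("weights", weights_addr)    # compiled network weights
    acc.go()
    return acc.interrupt_raised
```

The appeal of this design is how little the host CPU does per inference: three pointer writes and a go bit, with no per-layer involvement at all.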
I've never worked on a project where the performance increment was more than a factor of three, so this was a lot of fun. If you compare it to Nvidia's Drive Xavier solution, a single Xavier chip delivers 21 TOPS; our full self-driving computer, with two chips, delivers 144 TOPS. So to conclude: I think we've created a design that delivers outstanding performance, 144 TOPS of neural network processing, with outstanding power efficiency; we managed to fit all that performance into the thermal budget we had; it allows for a fully redundant computing solution at a modest cost; and, what's really important, this FSD computer will enable a new level of safety and autonomy in Tesla vehicles without affecting their cost or range, something I think we are all hoping for. We'll do questions and answers after each segment, so if people have questions about the hardware they can ask right now. The reason I asked Pete to do a detailed, much more detailed dive than perhaps most people would appreciate, into the Tesla full self-driving computer is that at first it seems improbable: how could it be that Tesla, which has never designed a chip before, would design the best chip in the world? But that is objectively what has happened, and not best by a small margin, best by a big margin. It's in the cars right now. All Teslas being produced right now have this computer. We switched over from the Nvidia solution in S and X about a month ago, and we switched over Model 3 about ten days ago. All cars being produced have all the hardware necessary, compute and otherwise, for full self-driving.
I'll say that again: all Tesla cars being produced right now have everything necessary for full self-driving. All you need to do is improve the software. Later today you'll drive the cars with the development version of the improved software and see for yourselves. Let me repeat the question. Trip Chowdhry, Global Equities Research: Very, very impressive in all ways. I took some notes: you are using the ReLU activation function, the rectified linear unit, but deep neural networks are multi-layered, and some algorithms use different activation functions for different hidden layers, like softmax or tanh. Do you have the flexibility to incorporate activation functions other than ReLU on your platform? Yes, we have tanh and sigmoid support, for example. One last question, on nanometers: you mentioned 14 nanometers; I was wondering whether it wouldn't make sense to go lower, maybe 10 nanometers, or maybe seven. At the time we started the design, not all the IP we wanted to buy was available at 10 nanometers, so we had to do the design in 14. It's maybe worth noting that we finished this design about a year and a half ago and then started designing the next generation. We're not talking about the next generation today, but we're about halfway through it, doing all the things that are obvious for a next-generation chip. Hi, you talked about the hardware; you did a great job.
I was impressed. I understood ten percent of what you said, but I trust it's in good hands. Thanks. It looks like you've finished the hardware piece, and that was really hard to do, and now you have to do the software piece, which is maybe outside your expertise. How do you think about that software part? Well, what better introduction could you ask for, for Andrej and Stuart? Are there any more questions for the chip part? The next part of the presentation is neural networks and software, so maybe keep it to the chip side. The last slide was 144 trillion operations per second compared to Nvidia's 21; maybe you can put into context for a finance person why that gap is so important. Well, it's a factor of seven in delta performance, which means you can process seven times as many frames, or run neural networks that are seven times larger and more sophisticated, so that's a very big coin you can spend on a lot of cool things to improve the car. And I think Xavier's power usage is higher than ours; I don't know Xavier's power exactly, but at equal performance the power requirements would go up by at least that same factor of seven, and the costs would go up by a factor of seven as well. Power is a real issue, because it also reduces range: the auxiliary power draw is significant, and then you have to get rid of that power as heat, so the thermal problem becomes really significant too. Thanks a lot. We'll take more questions, and we'll have the demo drives afterwards as well. If anyone needs to go out and do a demo drive a bit sooner, you can do that; we want to make sure we answer your questions. Pradeep Romani from UBS: To some extent, Intel and AMD have started to move toward chiplet-based architectures. I didn't notice a chiplet-based design here; is that something that might be of interest to you from an architecture standpoint going forward? We're not currently considering anything like that. If you wanted to integrate silicon germanium or DRAM technology on the same substrate, it could get pretty interesting, but until die size becomes a problem I wouldn't go there. The strategy here basically started a bit more than three years ago: design and build a computer that is fully optimized and aimed at full self-driving, then write software that is designed to run specifically on that computer and take full advantage of it. The software has been tailored to the hardware; it is a master of one trade, autonomous driving. Nvidia is a great company, but they have a lot of customers, so as they apply their resources they need to build a generalized solution. We care about one thing: self-driving. So the chip was designed to do that incredibly well, the software is designed to run on that hardware incredibly well, and the combination of the software and the hardware, I think, is unbeatable. The chip is designed to process video input; if you were to use, say, lidar, would it be able to process that as well, or is it mainly for video? Lidar is a fool's errand. Anyone relying on lidar is doomed. Doomed. Expensive sensors that are unnecessary; it's like having a bunch of expensive appendices. Having one appendix is bad, and now you have a whole bunch of them. That's ridiculous. You'll see. Two questions, the first just on power consumption: is there a rule of thumb you can give us?
You know, every watt reduces the range by a certain percentage or amount; just so we can get an idea, how much is it on a Model 3? Model 3 consumption is about 250 watt-hours per mile. How many miles a given load costs depends on the nature of the driving; in the city it has a much bigger effect than on the highway. So if you're driving for an hour in a city, and you had a hypothetical solution that drew a kilowatt, you'd lose four miles of range on a Model 3. If you're only going 12 miles an hour, that's something like a 25 percent hit to range in the city. So system power has a massive impact on city range, which is where we think most of the robotaxi market will be; power is extremely important. Thanks. What is the main design goal of the next-generation chip? We don't want to say too much about the next-generation chip, but it will be at least, let's say, three times better than the current system, about two years out. Is the chip manufactured by a third party you hire, and how much cost reduction does doing the design yourself save in the total cost of the vehicle? The 20% cost reduction I quoted was per part, per vehicle; it wasn't a development cost figure, just the real part cost. But does mass-manufacturing it yourself save money? Yes. I mean, most companies don't make their own chips; we don't have a fab, and it's very unusual to design your own.
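One reading of the numbers Elon quotes above can be checked directly: at roughly 250 Wh per mile, a hypothetical 1 kW compute load during an hour of 12 mph city driving costs four miles of range, which is about a 25 percent hit. The formula below (counting lost miles against the total the pack could have delivered) is an interpretation of those quoted figures, not Tesla's stated method:

```python
WH_PER_MILE = 250.0   # approximate Model 3 consumption

def city_range_hit_pct(compute_watts, speed_mph, hours=1.0):
    miles_driven = speed_mph * hours
    # Energy burned by the compute load, expressed as miles of range.
    miles_lost = compute_watts * hours / WH_PER_MILE
    return 100.0 * miles_lost / (miles_driven + miles_lost)

hit = city_range_hit_pct(1000.0, 12.0)   # 1 kW at 12 mph for an hour
```

At highway speeds the same kilowatt is a much smaller fraction of total consumption, which is why the effect matters most in the low-speed city driving where robotaxis would operate.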
Any supply problems with getting the chip mass-produced? No. And do the cost savings pay for the development? Yes; the basic strategy from Elon was that we're going to build this chip and it will reduce costs, and it did. If there are really chip-specific questions we can answer them now; otherwise there will be a Q&A after Andrej speaks and after Stuart speaks, so there will be two more Q&As. This is very chip-specific, so I'll take it, and I'll be here all afternoon, as will the team. Very good. Is this a custom design for Tesla? And as a follow-up, is there a fair amount of opportunity to reduce the footprint as you tweak the design? It's actually quite dense, so in terms of shrinking it, I don't think so; instead we'll greatly improve the functional capabilities in the next generation. Okay, and then the last question: can you share where you are fabbing this part? It's Samsung. And is the design protected from an IP standpoint? I hope you're not giving away a lot of IP for free. We have filed on the order of a dozen patents on this technology. Fundamentally it's linear algebra, which I don't think you can patent. I'm not sure, but I think if somebody started today, and they're really good, they could have something like what we have now in three years. But in two years we'll have something three times better. And when it comes to intellectual property protection, you can have the best intellectual property and some people will just steal it anyway.
I was wondering if we see some interactions with Aurora that companies in the industry believe stole their intellectual property I think the key ingredient you need to protect is the weights that are attached to various parameters do you think your chip can do something to prevent someone from encrypting all the weights so you don't even know what the weights are at the chip level so your IP stays inside it and nobody knows and nobody can feel it, man. I would like to meet the person who could do that because it was. I would hire them in a heart beat yeah so every problem is hard yeah. h I mean we encrypt the it's a hard journey to crack so if they can crack it it's great if they crack it and then they also figure out the software and the neural network system and everything else that they can design it from. scratching like that is that's all our intent is to stop people from stealing all that stuff and if they do we hope it will at least take a long time it will definitely take a long time yeah i mean i felt like we were if it was alcohol to do that , How we do it?
It would be very difficult. But I think a very powerful, sustainable advantage for us is the fleet; nobody has the fleet. Those cars are constantly being updated and improved based on billions of miles driven. Tesla has a hundred times more cars with the full self-driving hardware than everyone else combined. By the end of this quarter we'll have 500,000 cars with the full set of eight cameras and twelve ultrasonics; some will still be on Hardware 2, but they still have the ability to collect data. And a year from now we'll have over a million cars with the full self-driving computer. It's just a massive data advantage. It's similar to how the Google search engine has a huge advantage: people use it, and people effectively program Google with the queries and the results. Yeah, please just use the microphone and rephrase the question so I can address it if appropriate. But you know, when we talk to
Waymo or Nvidia, they speak with equal conviction about their leadership because of their proficiency in simulated miles driven. Can you talk about the advantage of real-world miles versus simulated miles? Because I think they've expressed that, you know, when you drive a million miles they can simulate a billion, and that no Formula One driver, for example, could successfully complete a real-world track without first driving it in a simulator. Can you talk about the advantages you perceive in ingesting data from real-world miles versus simulated miles? Absolutely. We have quite a good simulation too, but it just doesn't capture the long tail of weird things that happen in the real world. If the simulation fully captured the real world, well, that would be proof that we're living in a simulation.
And I don't think it does. Simulations don't capture the real world; the real world is really weird and messy. You need the cars on the road, and you'll actually see that in Andrej's and Stuart's presentations. Okay. That last question was actually a very good segue, because one thing to remember about our full self-driving computer is that it can run much more complex neural networks for much more accurate image recognition. To talk about how we actually get the image data and how we analyze it, I'll hand over to our Senior Director of AI, Andrej Karpathy, who will explain all of that to you.
Andrej has a PhD from Stanford University, where he studied computer science with a focus on image recognition and deep learning. Andrej, why don't you do your own introduction? There are a lot of PhDs from Stanford; that's not important, we don't care. Come on. Andrej started the computer vision class at Stanford; that's a lot more important, that's what matters. So please, talk about your experience, don't be shy. Sure, yeah. I've been training neural networks for what is now basically a decade, and these neural networks were not really used in industry until about five or six years ago, so I was training these networks some time before that, at institutions including Stanford, Google, and OpenAI. I've trained a lot of neural networks, not only for images but also for natural language, and for my PhD I designed architectures that combine those two modalities. As for the computer science class: at Stanford I actually taught the convolutional neural networks class, and I was the lead instructor for that class.
I started the course and designed the entire curriculum. At first it was around 150 students, and it grew to 700 students over the next two or three years, so it's a very popular class, one of the biggest at Stanford right now. Andrej really is one of the best computer vision people in the world, possibly the best. Okay, thanks. Hi everyone. So Pete told you all about the chip we designed that runs neural networks in the car. My team is responsible for training these neural networks, and that includes all the data collection from the fleet, the neural network training, and then part of the deployment. So what exactly do the neural networks do in the car? What we're looking at here is a video sequence from the car; these are the eight cameras sending us video, and the neural networks are looking at that video, processing it, and making predictions about what they're seeing. Some of the things you're seeing in this visualization are lane line markings, other objects, the distances to those objects, what we call drivable space, shown in blue, which is where the car is allowed to go, and many other predictions such as traffic lights, road signs, and so on.
Now, my talk will go in roughly three stages. First I'll give you a short introduction to neural networks, how they work and how they're trained. I need to do this in order to explain, in the second part, why it's so important that we have the fleet and why the fleet is a key enabling factor for actually training these networks and making them work effectively on the roads. And in the third stage I'll talk about vision and lidar and how we can estimate depth from vision alone. So, the core problem these networks solve in the car is visual recognition. For you and me this is a very simple problem: you can look at these four pictures and see that they contain, say, a cello, an iguana, or scissors. This is simple and effortless for us, but it is not the case for computers, and the reason is that to a computer these images are really just massive grids of pixels, and at each pixel you have the brightness value at that point. So instead of just seeing an image, a computer actually gets a million numbers in a grid that tell it the brightness values at all positions; it really is, if you like, the Matrix. So we have to go from that grid of pixels and brightness values to high-level concepts like "iguana". And as you can imagine, this iguana has a certain pattern of brightness values, but iguanas can actually take on many appearances: many different poses, different brightness conditions, different backgrounds, a different crop of that iguana. We have to be robust across all of those conditions, and we have to understand that all of those different brightness patterns actually correspond to iguanas. Now, the reason you and I are so good at this is that we have a massive neural network inside our heads processing those images: light hits the retina and travels to the back of the brain, to the visual cortex.
The visual cortex consists of many neurons that are connected together and that do all the pattern recognition on top of those images. Over roughly the last five years, the latest generation of approaches for processing images with computers has also started to use neural networks, in this case artificial neural networks. These artificial neural networks (and this is just a cartoon diagram) are a very rough mathematical approximation to your visual cortex: we really do have neurons, and they are connected together. Here I'm only showing a handful of neurons in four layers, but a typical neural network will have tens to hundreds of millions of neurons, and each neuron will have on the order of a thousand connections, so these are really big pieces of almost-simulated tissue. What we can do then is take those neural networks and show them images. For example, I can feed my iguana into this neural network, and the network will make predictions about what it sees. Now, at first these neural networks are initialized completely randomly, so the connection strengths between all those neurons are completely random, and therefore the predictions of the network are completely random as well: it might think you're most likely looking at a boat right now and that it's highly unlikely to be an iguana. During training,
what we do is use the fact that we know it's actually an iguana: we have a label. So we basically say that we'd like the iguana probability to go up for this image and the probability of all the other things to go down. There's a mathematical process called backpropagation, with stochastic gradient descent, that lets us propagate that signal backward through those connections and update each connection just a small amount, and once the update is complete, the iguana probability for this image goes up a bit, so it might become 14%, and the probability of the other things goes down. And of course we don't do this for just this one image: we have entire large datasets that are labeled, so we have lots of images, typically millions of images and thousands of labels, and we do forward and backward passes over and over. We show the computer a picture, it has an opinion, then we tell it the correct answer and it tunes itself a little, and we repeat this millions of times. Sometimes we show the same image to the computer hundreds of times as well. Training a network will usually take on the order of a few hours or a few days, depending on the size of the network, and that is the process of training a neural network. Now, there is something very unintuitive about the way neural networks work that I really have to dig into: they require a lot of these examples, and they really start from scratch, knowing nothing, and that's really hard to internalize. As an example, here's a cute dog. You may not know the breed of this dog, but the correct answer is that this is a Japanese Spaniel.
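The training update described a moment ago (label in hand, nudge the correct class's probability up and everything else down via gradient descent) can be sketched in a few lines of plain Python. This is a toy, not Tesla's stack: a single linear layer with a softmax, a fabricated 16-pixel "image", and three arbitrary classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()              # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

# Toy setup: one linear layer mapping a flattened 16-pixel "image" to
# scores for three classes (say boat / iguana / scissors). The weights
# start random, so the initial prediction is essentially arbitrary.
x = rng.random(16)               # the "image"
W = rng.standard_normal((3, 16)) * 0.1
label = 1                        # index of the correct class ("iguana")

def predict(W, x):
    return softmax(W @ x)

p_before = predict(W, x)[label]

# Backpropagation + gradient descent: nudge every connection a little so
# the correct class's probability goes up and the others go down.
for _ in range(100):
    p = predict(W, x)
    grad = np.outer(p - np.eye(3)[label], x)  # dLoss/dW for softmax + cross-entropy
    W -= 0.1 * grad                           # small update, repeated many times

p_after = predict(W, x)[label]
assert p_after > p_before        # the right answer is now more probable
```

For this single image the correct class's probability climbs toward 1 over the updates; real training averages the same kind of update over millions of labeled images.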
We all look at this and think: okay, I get what a Japanese Spaniel looks like, and if I show you a few more pictures of other dogs you can probably pick out the other Japanese Spaniels here. In particular, those three look like Japanese Spaniels and the others don't. You can do this very quickly, and you need just one example, but computers don't work like that: they actually need a ton of Japanese Spaniel data. So this is a grid of Japanese Spaniels, showing them in thousands of examples, in different poses, different brightness conditions, different backgrounds, different crops. You really need to teach the computer, from all these different angles, what this Japanese Spaniel looks like, and it really requires all that data to make it work; otherwise the computer can't pick up that pattern automatically. So what does all this imply for the setting of autonomous driving? Of course we don't care too much about dog breeds; maybe we will at some point, but for now we really care about lane line markings, objects, where they are, where we can drive, and so on. The way we do this is that we don't have labels like "iguana" for these images; instead we have images from the fleet like this one, and we're interested in, say, the lane markings. So a human typically goes into the image and uses a mouse to annotate the lane line markings. Here is an example of an annotation a human could create for this image; it says: this is what you should be seeing in this image, these are the lane line markings. Then we can go to the fleet and ask for more images from the fleet. Now, if you do a naive version of this and just ask for random images, the fleet might respond with images like this: normal driving along some highway. This is what you'd get as a random collection, and if you're not careful and just annotate a random distribution of this data, the network will pick up on that random distribution and work only in that regime. So if you show it a slightly different example,
for example this image, where the road actually curves and it's a slightly more residential neighborhood, then the neural network might make a prediction that is incorrect. It might say: okay, well, I've seen many times that highway lane lines just go straight ahead, so here's a possible prediction, and of course this is wildly wrong. But you can't really blame the neural network: it doesn't know whether the tree on the left matters or not, it doesn't know whether the car on the right matters to the lane line or not, it doesn't know whether the buildings in the background matter or not. It really starts completely from scratch. You and I know the truth, which is that none of those things matter: what really matters is that there are some white lane line markings over there with a vanishing point, and the fact that they curve a little bit should drive the prediction. Except there is no mechanism by which we can just tell the neural network, hey, those lane line markings actually matter. The only tool in the toolbox we have is labeled data, so what we do is take images like this, where the network fails, and label them correctly. In this case we'd say the lane curves to the right, and then we need to feed a lot of images like this to the neural network, and over time the network will accumulate them; it will basically figure out the pattern that those other things don't matter but those lane line markings do, and it will learn to predict the correct lane. So what's really critical is not just the scale of the dataset; we don't just want millions of images, we actually need to do a very good job of covering as much as possible of the space of things the car might encounter on the roads. We need to teach the computer how to handle scenarios with different lighting and wet conditions, and all these variations.
Wet roads give you specular reflections, and as you can imagine, the patterns of brightness in these images will look very different. We have to teach the computer how to deal with shadows, how to deal with forks in the road, how to deal with large objects that might occupy most of the image, how to deal with tunnels, how to deal with construction sites, and in all these cases there is again no explicit mechanism to tell the network what to do; we only have massive amounts of data. We want to source all of those images, annotate the correct lines, and the network will pick up the patterns from them. Large and varied datasets are basically what make these networks work very well, and this is not just a finding for us here at Tesla; it's a ubiquitous finding across the industry. Experiments at Baidu, Facebook, Google Research, Alphabet's DeepMind, all show similar plots: neural networks really love data, and they love scale and variety. As you add more data, these networks start to work better, and you get higher accuracies for free; more data just makes them work better. Now, a number of people have pointed out that we could potentially use simulation to achieve the scale of these datasets: in a simulator you're in charge of a lot of the conditions, and maybe you can achieve some variety as well. This was also mentioned in the questions just before this talk. Now, at Tesla, and this is actually a screenshot of our simulator, we use simulation a lot: we use it to develop and test the software, and we've also used it for training, quite successfully. But really, when it comes to training data for these networks, there is no substitute for real data. Simulations have a lot of trouble modeling the physics of appearance and the behaviors of all the agents around you. Here are a few examples to really drive that point home; the real world really throws a lot of crazy stuff at you.
In this case, for example, we have very complicated environments, with snow, with trees blowing in the wind, various visual artifacts that are difficult to simulate; complicated construction sites; bushes and plastic bags that blow in the wind; construction zones that might feature lots of people, kids, and animals all mixed together. Simulating how those things interact and flow through a construction zone could actually be completely intractable. It's not about the motion of a single pedestrian there; it's about how they respond to each other, how the cars respond to each other, and how they all respond to you driving in that setting, and all of those are really hard to simulate. It's almost like you have to solve the self-driving problem just to simulate the other cars in your simulation, so it's very complicated. We also see dogs, exotic animals, and in some cases it's not even that you can't simulate it, it's that it doesn't even occur to you. For example, I didn't know that you could have truck after truck after truck like that, but in the real world you find this, and you find many other things that are very hard to even imagine.
Really, the variety that I see in the data coming from the fleet is just insane compared to what we have in the simulator, and we have a really good simulator. With simulation, basically, you're grading your own homework: if you know you're going to simulate something, okay, you can definitely solve it, but as Andrej says, you don't know what you don't know. The world is very strange, and it has millions of corner cases, and if somebody could produce a self-driving simulation that precisely matched reality, that in itself would be a monumental achievement of human capability. So, for neural networks to work well you need these three essentials: a large dataset, a varied dataset, and a real dataset. If you have those, you can train a network that works, and make it work very well.
Okay, so why is Tesla in such a unique and interesting position to really get all three of these essentials right? The answer, of course, is the fleet: we can actually source data from it and make our neural network systems work extremely well. Let me show you a concrete example, say making the object detector work well, to give you an idea of how we build these networks, how we iterate on them, and how we make them work over time. Object detection is something we care about a lot: we'd like to put bounding boxes around, say, the cars and objects here, because we need to track them and understand how they might move around. So again, we could ask human annotators to give us annotations for these; humans come in and say, okay, those patterns over there are cars and bicycles and so on, and you can train your neural network on that. But if you're not careful, the neural network will make incorrect predictions in some cases. For example, suppose we come across a car like this one, with a bike on the back. When I joined, the neural network would actually create two detections: a car detection and a bike detection. And that's sort of correct, because I suppose both objects really do exist, but for the purposes of a later planner, you really don't want to deal with the possibility that this bike might go off on its own; the truth is, that bike is attached to that car, so in terms of objects on the road there is a single object, a single car. So what you'd like to do is annotate a lot of images like this as just a single car. The process we go through internally on the team is that we take this image, or a few images showing this pattern, and we have a machine-learning mechanism by which we can ask the fleet to send us examples that look like this, and the fleet might respond with images containing those patterns. For example, these six images might come from the fleet; they all contain bikes on the backs of cars.
We go in and annotate all of those as just one car, and then the performance of that detector really improves: the network internally understands that when the bike is attached to the car, it really is just a single car. It can learn that, given enough examples, and that is how we fixed that problem.
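One plausible way to implement "ask the fleet for images that look like this" is nearest-neighbour search over image embeddings; the talk does not spell out the mechanism, so this sketch is an assumption. The embeddings below are random stand-ins for what a trained network would emit, and all names are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend every fleet image has already been embedded into a 64-d vector.
fleet_embeddings = {f"img_{i}": rng.standard_normal(64) for i in range(1000)}

# The query: an image showing the pattern of interest. Here we fake a
# look-alike by adding a little noise to one stored embedding.
query = fleet_embeddings["img_42"] + 0.05 * rng.standard_normal(64)

def top_k(query, embeddings, k=6):
    # Return the k image names whose embeddings are most similar.
    scored = sorted(embeddings.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

matches = top_k(query, fleet_embeddings)
assert "img_42" in matches   # the look-alike image is retrieved
```

In a real deployment the search would run on-vehicle or over indexed fleet data rather than a Python dict, but the retrieval idea is the same.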
I've talked quite a bit about sourcing data from the fleet, so I just want to make a quick point: we've designed this from the ground up with privacy in mind, and all the data we use for training is anonymized. Now, the fleet doesn't just respond with bikes on the backs of cars; we look for lots of things all the time. For example, we look for boats, and the fleet can respond with boats. We look for construction sites, and the fleet can send us many construction sites from around the world. We look for even rarer cases: for example, finding debris on the road is very important to us, and these are examples of images that the fleet has sent us, showing tires, cones, plastic bags, and things like that. If we can source these at scale, we can annotate them correctly, and the network will learn how to deal with them in the world.
Here is another example: animals. That is of course also a very rare event, but we want the neural network to really understand what's going on here, that these are animals, and to deal with that correctly. So, to summarize, the process by which we iterate on neural network predictions looks like this. We start with a seed dataset that was potentially sourced at random; we annotate that dataset; and then we train a neural network on it and deploy it to the car. Then we have mechanisms by which we notice inaccuracies in the car, cases where the detector may be misbehaving. For example, if we detect that the neural network is uncertain, or if there is a driver intervention, or any of those settings, we have trigger infrastructure that sends us data about those inaccuracies. For example, if we don't perform very well on lane line detection in tunnels, then we can notice that there is a problem in tunnels; those images go into our unit tests, so that over time we can verify that we have actually fixed the problem. But to actually fix the inaccuracy, you need many more examples, so we ask the fleet to send us many more tunnels, we label all those tunnels correctly, we incorporate them into the training set, and we retrain the network and redeploy it, and we iterate this loop over and over. We refer to this iterative process by which we improve the predictions as the data engine: we repeatedly deploy something, potentially in shadow mode, source the inaccuracies, and feed them back into the training set, and we do this for basically all the predictions of these neural networks.
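The data engine loop (train, deploy in shadow, harvest failures, relabel, retrain) can be illustrated with a deliberately tiny stand-in, where the "network" is a one-dimensional threshold classifier and the "fleet" is a pool of scalar observations. Every function and mechanism here is illustrative only, not Tesla's actual pipeline.

```python
import random

random.seed(0)

def true_label(x):
    # Ground truth that a human annotator would supply on demand.
    return x > 0.7

def train(dataset):
    # "Training": place the threshold midway between the two classes seen.
    pos = [x for x, y in dataset if y]
    neg = [x for x, y in dataset if not y]
    return (min(pos) + max(neg)) / 2

def shadow_mode_failures(threshold, fleet):
    # Shadow mode: predict on everything, report only the mispredictions.
    return [x for x in fleet if (x > threshold) != true_label(x)]

dataset = [(0.1, False), (0.9, True)]            # sparse seed data
fleet = [random.random() for _ in range(5000)]   # "miles driven"

for _ in range(5):                               # turns of the crank
    threshold = train(dataset)
    failures = shadow_mode_failures(threshold, fleet)
    if not failures:
        break                                    # happy with the error rate
    dataset += [(x, true_label(x)) for x in failures]  # annotate and fold in

assert len(shadow_mode_failures(threshold, fleet)) == 0
```

Each pass pulls in exactly the examples the current model gets wrong, so the decision boundary converges to the true one (0.7 here) using far fewer labels than annotating the whole pool.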
So far I've talked a lot about explicit labeling: we ask people to annotate the data. That's an expensive process in both time and money, and those annotations can be very expensive to obtain. What I want to talk about now is really utilizing the power of the fleet: you don't want to go through this bottleneck of human annotation, you just want data to stream in and be annotated automatically, and we have multiple mechanisms for doing this. As an example, one project we worked on recently is cut-in detection: you're driving down the road, someone is on your left or right, and they cut in front of you into your lane. Here's a video showing Autopilot detecting that this car is cutting into our lane. Now, of course we'd like to detect a cut-in as early as possible, and the way we approached this problem is that we did not write explicit code along the lines of: if the left blinker is on, if the right blinker is on, track the keypoints over time and see whether the car moves horizontally. We actually used a fleet-learning approach. The way this works is that we ask the fleet to send us data every time it sees a car transition from the right lane into the center lane, or from the left into the center. Then we go back in time, and we can automatically annotate that this car will, say, in 1.3 seconds cut in front of you, and we can use that to train the neural network. The network will automatically pick up on a lot of these patterns: for example, cars typically get a little yawed and then move over like this, maybe the turn signal is on; all of that is learned internally by the neural network, purely from these examples. So we ask the fleet to automatically send us all this data; we can get half a million images or so, all of them automatically annotated for cut-ins, and then we train the network. Then we take this new version of the network and deploy it to the fleet, but we don't turn it on yet: we run it in shadow mode.
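The rewind-and-label trick for cut-ins can be sketched as below. The 1.3-second horizon comes from the talk; the frame rate, the lane encoding, and the function names are assumptions for illustration.

```python
FPS = 10                    # assumed camera frame rate
LOOKBACK = int(1.3 * FPS)   # "go back in time" ~1.3 s before the cut-in

def auto_label_cut_ins(lane_per_frame):
    """lane_per_frame: the logged lane of one tracked car, per frame.
    Returns one boolean label per frame: True = 'about to cut into our lane'.
    No human is involved; the future of the log supplies the label."""
    labels = [False] * len(lane_per_frame)
    for t in range(1, len(lane_per_frame)):
        # A cut-in event: the car enters our ("ego") lane on this frame.
        if lane_per_frame[t] == "ego" and lane_per_frame[t - 1] != "ego":
            for k in range(max(0, t - LOOKBACK), t):
                labels[k] = True   # annotate the run-up to the cut-in
    return labels

# A toy log: the car drives beside us for 3 s, then cuts in.
log = ["right"] * 30 + ["ego"] * 10
labels = auto_label_cut_ins(log)
assert labels[25] is True     # shortly before the cut-in: positive
assert labels[5] is False     # long before: negative
```

The frames labeled True, paired with the camera images at those times, become free training examples for "this car is about to cut in".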
In shadow mode, the network is always making predictions: hey, I think this vehicle is going to cut in; it looks like this vehicle is going to cut in. And then we look for mispredictions. As an example, here is a clip we got from the shadow mode of the cut-in network. It's a little hard to see, but the network thought that the vehicle ahead of us and to the right was going to cut in; you can see it's flirting a little bit with the lane line, trying to encroach a little, and the network got excited and thought that vehicle would end up in our lane. That turned out to be wrong, and the vehicle didn't actually do it. So what we do now is turn the crank of the data engine: this ran in shadow mode, it's making predictions, and it makes some false positives and some false negatives; it got overly excited sometimes, and sometimes it missed a cut-in when one actually happened. All of those create triggers, and that data streams back to us and gets incorporated, for free, with no humans in the loop labeling anything, into our training set. We retrain the network and redeploy it in shadow mode, so we can spin this loop multiple times, always observing the false positives and false negatives coming from the fleet, and once we're happy with the false positive and false negative rates, we actually flip the bit and let the car control to that network. You may have noticed that we actually turned on one of our early builds of this cut-in network about three months ago, so if you've noticed that the car is much better at detecting cut-ins, that's fleet learning operating at scale. Yeah, it actually works quite well. So that's fleet learning: no humans were harmed in the process; it's just a lot of neural network training based on data, and a lot of shadow mode and looking at the results. One other thing to note: all the cars are training the network all the time; that's what matters. Whether Autopilot is on or off, the network is being trained.
Every mile that's driven, easy or difficult, is training the network. Yes. Another interesting way we use fleet learning, the other project I'll talk about, is path prediction. While you're driving the car, what you're actually doing is annotating the data, because you're steering the wheel: you're telling us how to traverse different environments. What we're looking at here is someone in the fleet who made a left turn at an intersection. What we do is we take the full video from all the cameras, and we know the path this person took thanks to the GPS, the inertial measurement unit, the steering wheel angle, and the wheel ticks. We put all of that together and we understand the path this person took through this environment. Then, of course, we can use this for supervision of the network: we just source a lot of these from the fleet, train a neural network on those trajectories, and then the network predicts paths just from that data. So what this really refers to is what's typically called imitation learning: we take real-world human trajectories and just try to mimic how people drive in the real world, and we can apply the same data engine crank to all of this and make it work over time. Here is an example of path prediction through a complicated environment. What you're looking at here is a video, and we're overlaying the network's predictions, so in green is the path that the network would follow. And yeah, maybe the crazy thing is that the network is predicting paths it can't even see, with incredibly high accuracy. It can't see around the corner, but it's saying that the probability of that curve is extremely high, so that's the path, and it nails it. You'll get to see that in the cars today; we're going to turn on the augmented vision, so you can see the lane lines and the path predictions of the cars superimposed on the video. Yes, there is actually a lot more going on
under the hood that might even scare you, and of course there are a lot of details that I'm skipping. For example, you may not want to annotate all the drivers you could annotate; you might want to imitate only the best drivers, and there are many technical ways in which we actually slice that data. But what's interesting here is that this prediction is actually a 3D prediction that we project back onto the image: the path ahead is a 3D thing that we're just rendering in 2D, and we know about the slope of the ground from all of this, which is extremely valuable for driving. This path prediction is actually active in the fleet today, by the way. So if you drive cloverleafs: until about five months ago your car would not be able to do a cloverleaf; now it can. That's path prediction running live on your cars; we shipped it a while back, and today you will get to experience it. The same goes for intersections: a big component of how we go through intersections on your drives today comes from automatically labeled path prediction. So what I've talked about so far is really the three key components of how we iterate on the network's predictions and how we make them work over time: you need a large, varied, and real dataset.
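The imitation-learning idea (regress what the human driver did from what the cameras saw) can be sketched with a linear policy on a fabricated two-feature observation. The feature names and the "human" control law are invented for the example; the real system learns from images and predicts full 3D paths.

```python
import numpy as np

rng = np.random.default_rng(2)

def human_steering(obs):
    # Stand-in for logged human control: the label comes for free from
    # the driver's own steering, no annotator required.
    curvature, offset = obs
    return 2.0 * curvature - 0.5 * offset

# "Fleet" observations: (road curvature, lane offset) pairs, made up.
obs = rng.standard_normal((500, 2))
actions = np.array([human_steering(o) for o in obs])

# Behaviour cloning: fit a linear policy by least squares so that the
# policy's output imitates the demonstrated human actions.
w, *_ = np.linalg.lstsq(obs, actions, rcond=None)

test_obs = np.array([0.3, -0.1])
predicted = float(test_obs @ w)
target = human_steering(test_obs)
assert abs(predicted - target) < 1e-6   # policy mimics the human driver
```

Selecting which trajectories to clone (only the good drivers, only clean maneuvers) is exactly the "slicing the data" step mentioned above.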
We can actually achieve that here at Tesla, and we do it through the scale of the fleet: turning the data engine crank, deploying things in shadow mode, and iterating that loop, and potentially even using fleet learning, where no human annotators are harmed in the process and we just annotate the data automatically, and we can really do this at scale. In the next section of my talk, I'll talk specifically about depth perception using vision alone. You may be familiar with the fact that there are at least two sensors under discussion for the car: one is vision, cameras that just get pixels, and the other is lidar, which a lot of companies also use, and which gives you point measurements of distance around the car. One thing I'd like to point out first is that you all came here, a lot of you drove here, and you used your own neural network and vision; you weren't shooting lasers out of your eyes, and you still ended up here. So it should be clear that the human neural network derives distance, and all its measurements and three-dimensional understanding of the world, from vision alone.
It actually uses several cues to do so, and I'll go over a few of them briefly, just to give you a rough idea of what's going on inside. As an example, we have two eyes pointed forward, so you get two independent measurements at every time step of the world in front of you, and your brain stitches that information together to come up with an estimate of depth, because it can triangulate any point across those two viewpoints. Many animals instead have eyes positioned on the sides of the head, so they have very little overlap in their visual fields; they will typically use structure from motion, and the idea is that they bob their heads, and because of that motion they get multiple observations of the world and can again triangulate depths. And even with one eye closed and completely motionless, you can still get some sense of depth perception: if I did this, I don't think you'd suddenly perceive me as being two meters from you or a hundred miles away, and that's because there are a lot of very strong monocular cues that your brain also takes into account.
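For an idealized rectified two-camera (or two-eye) setup, the triangulation described above reduces to one line of arithmetic: depth equals focal length times baseline divided by disparity. The numbers below are made up for illustration.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    # For a rectified stereo pair: a world point appears shifted between
    # the two images by "disparity" pixels; nearer points shift more.
    return focal_px * baseline_m / disparity_px

# A point imaged 8 px apart by two cameras 0.5 m apart, focal length 800 px:
depth = stereo_depth(focal_px=800, baseline_m=0.5, disparity_px=8)
assert depth == 50.0   # metres
```

Halving the distance doubles the disparity, which is why stereo depth is precise up close and noisy far away.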
This is an example of a fairly common visual illusion, where you know these two blue bars are identical, but the way your brain puts the scene together, it simply expects one of them to be bigger than the other because of the vanishing lines of the image. Your brain does a lot of this automatically, and artificial neural networks can as well. So let me give you three examples of how you can arrive at depth perception from vision alone: one classic approach and two that are based on neural networks. So here's a video.
I think this is San Francisco, from a Tesla. This is what our cameras are detecting; I'm only showing the main camera, but all eight cameras on Autopilot are running. And if you just have this six-second clip, what you can do is stitch together this 3D environment using multi-view stereo techniques. So this is supposed to be a video, there we go: six seconds of that car driving down that road, and you can see that this information is recoverable purely from video, and pretty much it comes through a process of triangulation, as I mentioned, multi-view stereo. We've applied similar techniques, a bit sparser and rougher, in the car as well. So it's remarkable: all of that information is actually there in the sensor, and it's just a matter of extracting it.
The other project I want to talk about briefly is, as I mentioned, about neural networks. Neural networks are very powerful visual recognition engines, and if you want them to predict depth, then you need to, for example, give them depth labels, and then they can do it extremely well. There's nothing limiting networks from predicting this monocular depth except labeled data. So an example project that we've actually looked at internally: we use the forward-facing radar, shown in blue, which is measuring the depths of objects, and we use that radar to annotate what vision is seeing, the bounding boxes coming out of the neural networks. So instead of human annotators telling you, okay, this car in this bounding box is about 25 meters away, you can annotate that data much better using sensors. Sensor annotation: radar is quite good at that distance estimation, so you can annotate the data with it and then train your neural network on it, and if you just have enough data, the neural network becomes very good at predicting those depths. So here's an example of its predictions: the circles show radar objects, and the cuboids come purely out of vision, so the depth of those cuboids is learned through radar sensor annotation. If this works well, you'll see that the circles in the top-down view match the cuboids, and they do, and that's because neural networks are very proficient at predicting depths: they can internally learn the different sizes of vehicles, they know how big those vehicles are, and you can actually derive depth from that pretty accurately.
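A toy version of that sensor-annotation idea: radar supplies the depth labels, and a model learns to predict depth from a vision feature alone. Here the only feature is bounding-box height under a pinhole assumption, and every number is invented; a real system would learn from full images with a deep network.

```python
# Toy "radar annotates vision" training: learn depth from bbox height alone.
# Pinhole model: apparent height h is roughly k / depth, so depth = k / h.
def fit_depth_model(bbox_heights, radar_depths):
    """Closed-form least-squares fit of k in depth = k / h,
    using radar range measurements as the training labels."""
    # Minimizing sum (k/h - d)^2 over k gives k = sum(d/h) / sum(1/h^2).
    num = sum(d / h for h, d in zip(bbox_heights, radar_depths))
    den = sum(1.0 / (h * h) for h in bbox_heights)
    return num / den

def predict_depth(k, bbox_height):
    """Vision-only depth estimate once training is done: no radar needed."""
    return k / bbox_height

# Synthetic fleet data consistent with k = 750 (pixel-meters).
heights = [150.0, 75.0, 50.0, 30.0]   # bbox heights from the vision network
radar = [5.0, 10.0, 15.0, 25.0]       # matched radar depths (the free labels)
k = fit_depth_model(heights, radar)
```

The point mirrors the talk: once the scale constant is learned from radar, depth comes out of vision alone, with the sensor acting only as an annotator during training.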
The last mechanism I'll talk about very briefly is a bit more technical, I suppose, but it's a mechanism that has been published in a few papers, basically over the last year or two. This approach is called self-supervision. What you do in a lot of these papers is just feed raw, unlabeled video into neural networks, and you can still get the networks to learn depth. It's somewhat technical, so I can't go into full detail, but the idea is that the neural network predicts the depth at every frame of that video, and then there are no explicit targets from labels that the network is supposed to regress to; rather, the objective for the network is to be consistent over time. So whatever depths you predict should be consistent over the duration of that video, and the only way to be consistent is to be right. The network automatically predicts the depth of all the pixels, and we've replicated some of these results in house, so it works pretty well too. In short: people drive with vision only, no lasers involved, and it seems to work pretty well. The point I'd like to make is that visual recognition, really powerful visual recognition, is absolutely necessary for autonomy. It's not optional: you need neural networks that actually understand the environment around you, and lidar points are a much less information-rich representation. Vision really understands all the details, while lidar just gives you a few points around the car; there's a lot less information in those. So as an example, on the left here: is that a plastic bag or is it a tire? Lidar might give you a few points on it, but vision can tell you which of the two it is, and that affects your control. Is that person glancing slightly over their shoulder, are they trying to merge into your lane on their bike, or are they just riding along? What about construction zones?
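The self-supervision idea above can be caricatured in a few lines. Suppose the car advances a known distance between frames and a network makes a noisy depth guess for one static point at each frame; the only training signal is that the guesses must be consistent with each other under that motion, with no human label anywhere. The scenario and numbers are invented; real systems use photometric reprojection losses over whole images.

```python
def consistent_depth(per_frame_preds, ego_motion_m):
    """Fuse noisy per-frame depth guesses of one static point.

    If the car advances ego_motion_m each frame, a correct depth d0 at
    frame 0 implies depth d0 - t * ego_motion_m at frame t. The best d0
    is the one whose implied trajectory agrees with all the predictions:
    the least-squares solution of the temporal-consistency objective,
    which works out to the mean of (pred_t + t * ego_motion_m).
    """
    n = len(per_frame_preds)
    return sum(p + t * ego_motion_m for t, p in enumerate(per_frame_preds)) / n

# Noisy single-frame guesses of a sign that is truly 20 m away while the
# car closes 1 m per frame (so the true depths are 20, 19, 18, 17).
preds = [20.4, 18.7, 18.1, 16.8]
d0 = consistent_depth(preds, ego_motion_m=1.0)
```

Enforcing consistency across frames recovers the true depth even though no single frame's guess was correct, which is the sense in which "the only way to be consistent is to be right."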
What do those signs say? How should I behave in this world? The entire infrastructure we've built for our roads, all the signs, all the traffic lights, is designed for human visual consumption; that's where all the information is, so you need that capability. Is that person distracted, on their phone, about to step into your lane? The answers to all of these questions are only found in vision, and they're necessary for level 4 and level 5 autonomy. That is the capability we are developing at Tesla, and it's done through large-scale neural network training through the data engine, making it work over time and using the power of the fleet. In that sense, lidar is really a shortcut. It sidesteps the fundamental problem, the important problem of visual recognition, which is necessary for autonomy, and so it gives a false sense of progress and is ultimately a crutch. It does give really quick demos, though. So if I were to summarize my whole talk in one slide, it would be this: for autonomy you want level 4 and level 5 systems that can handle all possible situations 99.99...% of the time, and chasing some of those last nines is going to be tricky and very difficult, and it's going to require a very powerful visual system. So I'm showing you some images of what you can encounter anywhere along those nines. At first you just have very simple cars going forward, then those cars start to look a little funny, then maybe you see bikes on cars, then cars on cars, then maybe you start to get really weird events like flipped cars or even airborne cars. We see a lot of this coming from the fleet, and we see it at a significantly higher rate than all of our competitors. And so the rate of progress at which you can really address these problems, iterate on the software, and feed the neural networks the right data, that rate of progress is really proportional to how often you encounter these
situations in the wild, and we encounter them significantly more often than anyone else, so we'll do extremely well. Thank you. That was super awesome, thank you very much. How much data, how many images are you collecting on average from each car per time period? And then it looks like the new hardware with the dual redundant computers gives you some really cool opportunities, like running a copy of the neural network in full simulation on one while the other one drives the car, and comparing the results for QA. And then I was also wondering if there are any other opportunities to use the computers for training while they're parked in the garage for the 90% of the time that I'm not driving my Tesla. Thank you very much. Yes, so for the first question: how much data do we get from the fleet?
So it's very important to note that it's not just the scale of the dataset; it's really the variety of that dataset that matters. If you just have a lot of images of something going down the road, at some point the neural network gets it and doesn't need more of that data, so we're really strategic in how we choose, and the trigger infrastructure that we've built is pretty sophisticated at getting just the data we need right now. So it's not a huge amount of data; it's just very well curated data. For the second question, regarding redundancy: absolutely, you can basically run a copy of the network on both chips, and that's how it's designed to achieve a level 4/5 system that's redundant, so that's absolutely the case. And your last question: sorry, but we don't train on the car; it's an inference-optimized computer that we make. We do have a major program at Tesla that we don't have enough time to talk about today, called Dojo: a super powerful training computer that will be able to take in massive amounts of data and train at the video level, doing unsupervised mass training on large amounts of video with the Dojo computer. But that's for another day. Next question: I drive the 405 and the 10, and really complicated, really long merges happen every day, and the one challenge I'm curious how you're going to solve is changing lanes, because every time I try to get into a lane with traffic, everyone cuts you off. Human behavior is very irrational when driving in Los Angeles, and the car just wants to do it safely, and you almost have to do it unsafely, so I was wondering how you're going to solve that problem. Yeah, so one thing I'll point out is that I talked about the data engine as iterating on the neural networks, but we do the exact same thing at the software level, with all the hyperparameters that go into the choices of when we lane change and how aggressive we are.
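The trigger infrastructure described here can be sketched as a small predicate evaluated on-car: upload a clip only when it is likely to be informative, such as a driver intervention, sensor disagreement, or a rare long-tail object. The field names, thresholds, and class list below are invented for illustration.

```python
# Hypothetical rare classes that the fleet should always report.
RARE_CLASSES = {"flipped_car", "animal", "road_debris"}

def should_upload(clip):
    """Decide on-car whether a clip is worth sending back for training."""
    if clip["driver_intervened"]:                     # human overrode the system
        return True
    if clip["camera_radar_disagreement"] > 0.5:       # sensors tell different stories
        return True
    if RARE_CLASSES & set(clip["detected_classes"]):  # long-tail object seen
        return True
    return False                                      # boring highway miles stay local

boring = {"driver_intervened": False, "camera_radar_disagreement": 0.1,
          "detected_classes": ["car", "lane_line"]}
rare = {"driver_intervened": False, "camera_radar_disagreement": 0.1,
        "detected_classes": ["car", "flipped_car"]}
```

Running the predicate on the car is what keeps the uploaded dataset small but well curated: the fleet filters itself instead of streaming everything to the backend.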
We're always changing those, potentially running them in shadow mode and seeing how well they work, and so on, tweaking our heuristics for when it's okay to change lanes. We'd also potentially use the data engine and shadow mode there. Ultimately, designing all the different heuristics for when it's okay to change lanes is actually a bit intractable, I think, in the general case, and ideally you really want to use fleet learning to guide those decisions. When do humans change lanes, in what scenarios, and when do they feel it's unsafe to change lanes? Let's look at a lot of that data and train machine learning classifiers to distinguish when it's safe to do so, and those machine learning classifiers can effectively write much better code than humans, because they have a lot of data backing them up, so they can actually set all the right thresholds and agree with humans about what's safe. Well, we'll probably have a mode that goes beyond Mad Max mode, to LA traffic mode. Yeah, well, you know, Mad Max would have a hard time in LA traffic, I think. So it's really a tradeoff: you don't want to create unsafe situations, but you do want to be assertive, and that little dance of how you make it work as a human being is really very complicated.
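The fleet-learning framing of lane changes can be caricatured in a few lines: collect (gap, did-the-human-change-lanes) pairs from driving data and derive the decision threshold from the data instead of hand-tuning it. A real system would use many features and a proper classifier; this one-feature sketch with invented numbers just shows where the threshold comes from.

```python
def learn_gap_threshold(samples):
    """samples: list of (gap_m, changed_lanes) pairs from human driving.

    Returns the midpoint between the mean gap of accepted and rejected
    lane changes: a crude data-derived stand-in for a hand-tuned rule."""
    accepted = [g for g, changed in samples if changed]
    rejected = [g for g, changed in samples if not changed]
    return (sum(accepted) / len(accepted) + sum(rejected) / len(rejected)) / 2

def safe_to_change(gap_m, threshold_m):
    """Apply the learned threshold at decision time."""
    return gap_m >= threshold_m

# Hypothetical fleet observations: humans took the big gaps, skipped the small ones.
fleet = [(35.0, True), (42.0, True), (28.0, True),
         (8.0, False), (12.0, False), (15.0, False)]
threshold = learn_gap_threshold(fleet)
```

Adjusting how the threshold is derived (say, weighting rejected examples more) is exactly the kind of aggressiveness dial the speakers describe exposing to users later.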
It's very difficult to write it out in code, but the machine learning approach really does seem like the right way to do it, where we look at the many ways people handle this and try to mimic that. We're just being more conservative right now; as we gain higher confidence, we'll allow users to select a more aggressive mode. It will depend on the user, but in the more aggressive modes, when trying to merge into dense traffic, there's a small chance of something like a fender bender, not a bad crash, but basically you'll have a choice about whether you want a nonzero chance of a fender bender in freeway traffic, which unfortunately is the only way to navigate dense traffic. Yes, so you will have more aggressive options over time, and those will be user-specified. Mad Max plus, exactly. Hi, Jed Dorsheimer from Canaccord Genuity; thanks, and congratulations on all you've built. If we look at the AlphaZero project, it was a very defined and limited set of variables in terms of the parameters, which is what allowed the learning curve to be so fast. The risk, or what you're trying to do here, is almost to build awareness in the car through the neural network, so I guess the challenge is how not to create a circular reference in terms of pulling the centralized model from the fleet. At what point does the handover happen, where the car has enough information on its own? Where is that line, I guess, in the learning process? Look, the car can work if it's completely disconnected from the fleet; you just load the trained network. It simply gets better and better as fleet learning gets better and better, so if you disconnected it from the fleet, from then on it would stop improving, but it would still work very well. And earlier you talked about a lot of the benefits of being able to not
store a lot of the images. So in this part, you're talking about the learning that's extracted from the fleet. I guess I'm having a hard time reconciling: if there's a situation where I'm driving uphill, like you showed, and the car is predicting where the road will go, and that intelligence came from all the other fleet data, how am I getting that benefit locally, using just the cameras with the neural network on the car? Maybe it's just me, but I mean, the computing power on the full self-driving computer is amazing. And maybe we should mention that even if the car had never seen that road before, it would still make those predictions, as long as it was a road in the United States. Going back to your slam on lidar, because it's pretty clear you don't like it: in the march of nines, isn't there a case somewhere down the road, at some nine, where lidar can actually be useful? And why not have it as a kind of redundancy or backup, so you can continue to focus on computer vision but keep lidar just as a redundancy? That sets up my first question, and then the second one.
My second question is, if that's true, what about the rest of the industry that is building their autonomy solutions on lidar? They're all going to dump lidar, that's my prediction; mark my words. I should point out that I don't actually hate lidar as much as it may seem. At SpaceX, Dragon uses lidar to navigate to the space station and dock; SpaceX built its own lidar from the ground up to do that, and I personally spearheaded that effort, because in that scenario lidar makes sense. In cars it's freaking stupid: it's expensive and unnecessary, and, as Andrej was saying, once you solve vision it's worthless, so you have expensive hardware that's worthless on the car. We do have a forward radar, which is low cost and useful, especially for occlusion situations: if it's foggy or dusty or snowing, the radar can see through that. If you're going to use active photon generation, don't use the visible wavelength, because with passive optics you've already taken care of everything in the visible range; you want a wavelength that penetrates occlusion, like radar. Lidar is just active photon generation in the visible spectrum, so if you're going to generate photons actively, do it outside the visible spectrum, in the radar spectrum: roughly 4 millimeters versus 400 to 700 nanometers gives much better occlusion penetration. That's why we have a forward radar, and then we also have twelve ultrasonics for near-field information, in addition to the eight cameras. You don't need radar in all four directions, because forward is the only direction you're going very fast. I mean, we've been through this a few times: are we sure we have the right sensor suite, should we add anything more? No. So then, you mentioned that you ask the fleet for the information you're looking for as part of the vision work, and I have two questions about that.
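The wavelength argument above is simple arithmetic: wavelength equals the speed of light divided by frequency, and a 77 GHz automotive radar sits several thousand times above visible light in wavelength, which is why it penetrates fog and dust that blinds any visible-spectrum sensor. A quick check:

```python
C = 299_792_458.0  # speed of light, m/s

def wavelength_m(freq_hz: float) -> float:
    """Free-space wavelength for a given frequency."""
    return C / freq_hz

radar = wavelength_m(77e9)   # 77 GHz automotive radar, roughly 4 mm
visible_red = 700e-9         # upper end of the visible band, 700 nm
ratio = radar / visible_red  # how much longer the radar wavelength is
```

The millimeter-scale radar wavelength is thousands of times longer than visible light, matching the contrast drawn in the talk.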
Well, it looks like the cars are doing some computation to determine what kind of information to send; is that a correct assumption, and are they doing it in real time, or based on stored information? They absolutely do real-time computation on the car. We basically specify the condition that we're interested in, and then the cars do that computation locally; if they didn't, we would have to send all the data back and process it offline in our backend, and we don't want to do that, so all of that computation happens in the car. So based on that, it sounds like you're in a very good position: currently half a million cars, and in the future potentially millions of cars, that are essentially computers, representing almost free data centers for you. Yeah, for computing it's a great future opportunity for the Tesla fleet; it's a real opportunity, and it's not factored into anything yet. That's awesome, thank you. We have four hundred and twenty-five thousand cars with Hardware 2 and up, which means they have all eight cameras, the radar, and the ultrasonics, and they have at least an Nvidia computer, which is enough to figure out what information is important and what is not, compress the important information down to the most important elements, and upload it to the network for training. So it's a massive compression of real-world data. You have this network of, eventually, millions of computers, which is essentially like massive distributed data centers in terms of computational capacity; do you see it being used for things other than autonomous driving in the future?
I guess it could possibly be used for something other than autonomous driving. We'll focus on autonomous driving, and you know, as we get there, maybe there's some other use for what will be millions and then tens of millions of computers with the Hardware 3 self-driving computer. Yeah, maybe there could be kind of an AWS angle here; it's possible. Hello, I'm from Loup Ventures. I have a Model 3 in Minnesota, where it snows a lot. Since the camera and radar cannot see road markings through the snow, what is your technical strategy to solve this challenge?
Is it high-precision GPS? When lane markings are faded, or when there's a lot of rain on them, we still seem to drive relatively well. We haven't specifically gone after snow with our data engine yet, but I actually think this is completely workable, because in a lot of those images, even when there's snow, if you ask a human annotator where the lane lines are, they can actually tell you, and as long as the annotators are consistent in their labels, then once we have the data, the neural network will pick up on those patterns and get it right. So it's really about whether the signal is there for the human annotator: if it is, then the neural network can do it just fine. Actually, there are important visual cues beyond lane lines. Lane lines are one of those things, but one of the most important cues is drivable space: what is drivable space and what is not? What really matters most is the drivable space, more than the lane lines, and the prediction of drivable space is extremely good. I think especially after this coming winter it's going to be amazing; it's going to be like, how could it possibly be that good?
That's crazy. The other thing to point out is that it may not even be just human annotators: as long as you as a human can steer through that environment, then with fleet learning we actually know the path you took, and you obviously used vision to guide you down that road. You didn't just use the lane line markings; you used the entire geometry of the scene. You see roughly how the world curves, you see how the cars are positioned around you, and the network will pick up all of those patterns automatically if you just have enough data from people driving through those environments.
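The fleet-learning idea here, using the paths humans actually drove as free labels for drivable space, can be sketched as accumulating trajectories on a grid. The grid, coordinates, and trajectories below are all invented for illustration.

```python
def label_drivable(paths, grid_size=5):
    """Turn human-driven trajectories into drivable-space labels.

    paths: list of trajectories, each a list of (x, y) grid cells the
    human actually drove through. Any cell a human drove over becomes a
    free 'drivable' label, with no annotator needed, even when snow
    hides the lane lines in the corresponding camera images.
    """
    drivable = [[0] * grid_size for _ in range(grid_size)]
    for path in paths:
        for x, y in path:
            drivable[y][x] = 1
    return drivable

# Two cars took slightly different lines through the same snowy stretch;
# the union of their paths labels the drivable corridor.
labels = label_drivable([[(0, 0), (1, 1), (2, 2)],
                         [(0, 0), (1, 0), (2, 1), (3, 2)]])
```

A vision network trained against labels produced this way learns to predict the drivable corridor directly from images, which is the "fleet as annotator" point being made.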
Yes, and it's actually extremely important that things aren't rigidly tied to GPS, because GPS error can vary quite a bit, and the actual situation on a road can vary quite a bit, so relying on a prebuilt map could lead you astray. If the car uses GPS as primary, that's a really bad situation. It's okay to use GPS for tips and hints: it's like how you can drive your home neighborhood better than a neighborhood in another country or some other part of the country, because you know your own neighborhood well and you use that knowledge to drive more confidently, maybe taking counterintuitive shortcuts and that sort of thing. But GPS and overlay data should only ever be helpful, never primary; if it's ever primary, there's a problem. Next question, here in the back corner.
I just wanted to stay with that for a moment, because several of your competitors in the space over the past few years, as you know, have talked about how they augment all of their perception and route-planning capabilities on the in-car platform with high-definition maps of the areas where they're driving. Do maps play a role in your system? Do you see them adding any value? Are there areas where you would like more data that isn't collected from the fleet but is more of a mapping-style data set? I think high-precision GPS maps and lanes are a very bad idea.
The system becomes extremely brittle: any change to the environment means the system can't adapt, so if you're locked onto GPS and high-precision lane lines and don't allow vision to override them, that's bad. Vision should be what determines everything, and then the lane lines are a guide, but they're not the main thing. We briefly barked up the tree of high-precision lane lines, then realized it was a big mistake and reversed it. So this is very helpful for understanding the annotation of where objects are and how the car drives, but what about the negotiation aspect, for parking and roundabouts and other situations where there are other human-driven cars on the road, where it's more art than science? Yeah, so I'll mention that we're using a lot of machine learning right now for perception, creating an explicit representation of what the world looks like, and then there's an explicit planner and controller on top of that representation, with a lot of heuristics for how to traverse and negotiate, et cetera. There's a long tail, just like in the visual variety of environments: there's a long tail in those negotiations and the little game of chicken that you play with other people, et cetera, and I think we're very confident that eventually there needs to be some machine learning component in how we do that, because writing all those rules by hand is going to break down quickly, I think.
Yeah, we've dealt with this so far with heuristics, and the plan is to gradually enable more aggressive behavior that users can opt into: just go into the settings and say be more aggressive or be less aggressive, you know, drive in chill mode or aggressive mode. Amazing progress, phenomenal. Two questions: first, in terms of platooning, someone asked about when there's snow on the road, but if you have a good platooning function you can just follow the car in front; is your system capable of doing that? And I have two follow-ups. So you're asking about platooning. I think we could absolutely build those features, but again, if you just train your neural networks on imitating humans, the humans already followed the car ahead of them, and the neural network actually incorporates those patterns internally. It just picks up on the correlation between how the car in front of you behaves and the path you're going to take, but that's all done internally in the network, so you just worry about getting enough data, and complicated data, and the neural network training process is actually quite magical: it handles all the rest automatically. You turn all the different problems into one problem: just collect your dataset and run your network training. Yes, there are three steps to self-driving. You know, there's being feature complete; then there's feature complete to the degree that we think the person in the car does not need to pay attention; and then there's a level of reliability at which we have also convinced regulators that that is true. So there are three levels. We expect to be feature complete in self-driving this year, and we hope to have enough confidence, from our point of view, to say that we think people do not need to touch the wheel or look out the window at some point probably around, I don't know, the second quarter of next year, and then we start to expect to get regulatory approval, in at least some jurisdictions,
for that toward the end of next year.
That's roughly the timeline I expect things to proceed on, and probably for trucks, platooning will be approved by regulators before anything else. You could have, maybe if it's long-haul freight, a driver up front and then four semi trucks behind in a platoon, and I think regulators will probably be quicker to approve that than other things; of course, we still have to convince them. Next question. I mean, this is very impressive, what we saw today, and presumably the demo could show even more.
I'm just wondering, what is the maximum dimension of a matrix that you can have in your training or deep learning pipeline? There are many different ways to answer that question, but I'm not sure they'd be helpful. These neural networks, as I mentioned, have around tens to hundreds of millions of neurons, and each of them on average has about a thousand connections to the neurons below, so those are the typical scales used in training. I've actually been very impressed with the rate of improvement in Autopilot over the last year on my Model 3. There were two scenarios last week I wanted your feedback on. In the first, I was in the rightmost lane of the freeway, and there was a freeway on-ramp, and my Model 3 was actually able to spot two cars on the side, slow down, and let one car pass in front of me and one car fall in behind me, and I thought, oh my god, this is crazy; I would never have thought my Model 3 could do that, so that was super awesome. But the same week there was another situation where I was in the right lane again, but my lane was merging into the left lane, and it wasn't an on-ramp, just a normal freeway lane merge, and my Model 3 couldn't really detect that situation; it didn't slow down or speed up, and I had to step in. So from your perspective, can you share the background on how the neural network would handle that, how Tesla could adjust to it, and how that could be improved over time? Yeah, so as I mentioned, we have a very sophisticated trigger infrastructure; if you stepped in, we probably got that clip, and we can analyze it, see what happened, and tune the system. We'd probably put in some metrics on how well we're merging in traffic, look at those numbers, look at the clips, see what's going wrong, and try to fix those
clips and make progress against those benchmarks. So yeah, we'd potentially go through a categorization phase, then look at some of the larger categories that seem semantically related to the same problem, and then develop software against them. Okay, we have one more presentation, which is the software. There's essentially the Autopilot hardware with Pete, the neural network vision with Andrej, and then software engineering at scale, which Stuart will present, and then you'll have a chance to ask questions. So yeah, thanks.
Just wanted to say very briefly: if you have an early flight and want to do a test drive with our latest development software, please speak to my colleague and/or email him, and we can take you for a test ride. And, Stuart, over to you. Okay, that clip is actually from a thirty-plus-minute uninterrupted drive with no interventions, using Navigate on Autopilot on the highway system, and it's in production today on hundreds of thousands of cars. So I'm Stuart, and I'm here to talk about how we build these systems at scale. As a very short introduction to where I'm from: I've been in the software profession for about twelve years across a couple of companies, and what excites me most, what I'm really passionate about, is taking the cutting edge of machine learning and connecting it with customers at massive scale. That's why at Facebook I worked initially within the ads infrastructure to build part of the machine learning platform.
There were really very intelligent people there, and we tried to build a single platform that could scale to every other aspect of the business, from how we rank the newsfeed to how we deliver search results to how we make every recommendation across the platform, and that became the Applied Machine Learning group, which is something I was incredibly proud of. A lot of it wasn't just the core algorithms; some of the really important improvements that happened there were in the engineering practices for building these systems at scale. The same thing was true at my next company, where we were really excited to help monetize the product, but the hard part was that we were using Google's infrastructure at the time, running at a fairly small scale, and we wanted to build that same infrastructure at massive scale, serving billions and then trillions of predictions and auctions every day on something really solid. So when the opportunity came to come to Tesla, it was something I was incredibly excited to do: specifically, taking the amazing things happening on both the hardware side and the computer vision and AI side, and actually packaging them together with all the planning that drives the car, the testing, the kernel patching of the operating system, all of our continuous integration, our simulation, and making it into a product that gets into people's cars, in production, today. So I want to talk about the timeline of how we built Navigate on Autopilot, and how we're now going to take that off the freeway and onto city streets. We're already at 770 million miles, so Navigate on Autopilot is something really cool.
Something worth calling out is that we continue to accelerate, and we continue to learn from this data. As Andrej talked about with the data engine, as it accelerates we actually make more and more assertive lane changes; we're learning from the cases where people intervene, either because we didn't detect a merge correctly or because they wanted the car to be a little more dynamic in different environments, and we just want to keep making progress. So to start all of this off, we began by trying to understand the world around us, and we've talked about the different sensors in the vehicle, but I want to dig a little deeper here. We have eight cameras, but additionally we have twelve ultrasonic sensors, the radar, a GPS, an inertial measurement unit, and one thing we shouldn't forget: the pedal and steering actions. So we can not only see what happens around the vehicle; we can also see how humans choose to interact with that environment, and learn from that.
I'll start with this clip — this is basically showing what's happening in the car today, and we'll continue to push this forward. We start with the individual neural networks doing surround sensing; we then fuse all of that together, from multiple neural networks across multiple cameras, incorporate the other sensors, and turn it into what Elon calls vector space — an understanding of the world around us. As we continue to get better at this, we're moving more and more of this logic into the neural networks themselves, and the obvious endpoint is a neural network that looks across all the cameras, gathers all the information, and ultimately produces a single source of truth for the world around the car. And what you're seeing here is not a hand-rendered visualization — it's the actual output of one of the debugging tools we use on the team every day to understand what the world around us looks like. Another thing I think is really interesting: when I hear about sensors like lidar, a common question is about additional sensory modalities — why not have some redundancy in the vehicle? I want to dig into something that isn't always obvious about neural networks themselves. Take the neural network running on our wide fisheye camera: that network isn't making one prediction about the world, it's making many separate predictions, some of which are actually independent of each other. As a real example, we have the ability to detect a pedestrian — something we've trained very, very carefully, with a lot of work — but we also have the ability to detect generic obstacles on the road. A pedestrian is also an obstacle, and shows up separately as the network saying "there's something here I can't drive through." Those predictions combine to give us a greater understanding of what we can and can't do with the vehicle, and how to plan for it. Then we do this through multiple cameras, because we have
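The redundancy described here — one network emitting separate, partly independent predictions (a dedicated pedestrian head plus a generic obstacle head) that are combined conservatively — can be sketched roughly as follows. This is an illustrative toy, not Tesla's actual API; the head names and thresholds are invented for the example:

```python
# Hypothetical sketch: combining independent prediction heads from one
# camera's neural network into a single conservative "is this region
# drivable?" decision. Head names and thresholds are illustrative only.

def region_is_blocked(head_outputs, thresholds=None):
    """Return True if ANY task head flags the region as undrivable.

    head_outputs: dict mapping head name -> confidence in [0, 1]
    """
    if thresholds is None:
        # The pedestrian head is tuned to be very sensitive; the generic
        # obstacle head catches things the pedestrian head was never
        # trained on. Either one alone is enough to block the region.
        thresholds = {"pedestrian": 0.3, "generic_obstacle": 0.5}
    return any(conf >= thresholds.get(head, 1.0)
               for head, conf in head_outputs.items())

# A confident pedestrian detection blocks the region even if the generic
# obstacle head missed it, and vice versa.
print(region_is_blocked({"pedestrian": 0.9, "generic_obstacle": 0.1}))   # True
print(region_is_blocked({"pedestrian": 0.1, "generic_obstacle": 0.8}))   # True
print(region_is_blocked({"pedestrian": 0.05, "generic_obstacle": 0.1}))  # False
```

The design point is the conservative OR: the heads fail somewhat independently, so combining them reduces the chance of missing something undrivable.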
overlapping fields of view all around the vehicle — and in front of us, a particularly large number of overlapping fields of view. We can use that to build very accurate predictions of how things will continue to unfold in front of us. One example I think is really exciting: we can look at cyclists and pedestrians and ask not just where they are now, but where they're going. That's actually at the heart of our next-generation automatic emergency braking system, which will stop not just for the people in your path, but for the people who are predicted to get into your path. That's running in shadow mode right now, and we're shipping it to the fleet this quarter.
I'll talk about shadow mode in a second. When you want to build a feature like Navigate on Autopilot for the highway, you can start by learning from the data: you can see how humans do things today — what their assertiveness profile is, how they change lanes, what makes them abort a maneuver — and you can see things that aren't immediately obvious, like the fact that merging is rare but very complicated and very important. You can also start to get feedback on different scenarios, like a vehicle overtaking quickly. So here's what we do: once we have some initial algorithms we want to try, we can put them in the fleet and see what they would have done in a real-world scenario — like this car that's passing us very quickly. This is taken from our actual simulation environment, showing the different paths we considered taking and how they compare against the real-world behavior of the driver. Once you've tuned those algorithms and feel good about them — and this really comes back to the neural network putting everything into that vector space, and building and tuning these parameters on top of it, which ultimately I think we can do with more and more machine learning — you go into a controlled deployment, which for us is our early access program. You get it out to a couple thousand people who are really excited to give you very thoughtful, helpful feedback on how the car behaves — not in an open loop, but in a closed loop in the real world. You watch their interventions; when someone takes over, we can actually pull that clip and try to understand what happened. And one thing we can do is replay it in open loop and ask ourselves, as we build our software, whether we're getting closer to or further away from how humans behave in the real world. One thing that's been great with the full self-driving computer is that we're actually building our own racks and infrastructure — you can basically take four full self-driving computers, rack them together, and run this very sophisticated data infrastructure to really understand, over time, as we tune these algorithms, whether they're getting closer and closer to the behavior of humans — and ultimately we can exceed their capabilities. Once we had this working well,
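The shadow-mode loop described above — run a candidate policy silently, compare its proposed action to what the human driver actually did, and pull the clips where they disagree — can be sketched like this. The field names, the toy policy, and the exact comparison are illustrative assumptions, not Tesla's internal format:

```python
# Hypothetical sketch of shadow-mode evaluation: the candidate planner
# runs passively; whenever its proposed action differs from the human
# driver's actual action, the frame is flagged for upload and triage.

def shadow_mode_disagreements(frames, candidate_policy):
    """Yield frames where the silent candidate disagrees with the human."""
    for frame in frames:
        proposed = candidate_policy(frame["observation"])
        if proposed != frame["human_action"]:
            yield {"frame_id": frame["frame_id"],
                   "proposed": proposed,
                   "actual": frame["human_action"]}

# Toy candidate policy: change lanes left whenever the gap to the lead
# vehicle drops below 20 meters.
def toy_policy(obs):
    return "lane_change_left" if obs["lead_gap_m"] < 20 else "keep_lane"

log = [
    {"frame_id": 1, "observation": {"lead_gap_m": 50}, "human_action": "keep_lane"},
    {"frame_id": 2, "observation": {"lead_gap_m": 10}, "human_action": "keep_lane"},
    {"frame_id": 3, "observation": {"lead_gap_m": 15}, "human_action": "lane_change_left"},
]
for d in shadow_mode_disagreements(log, toy_policy):
    print(d["frame_id"], d["proposed"], "vs", d["actual"])
```

Only frame 2 is flagged: the candidate wanted a lane change the human didn't make. In a fleet deployment, those disagreement clips are exactly the data you review and retrain on.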
We wanted to do a wide release, but to start with we actually asked everyone to confirm the car's lane changes through a stalk confirmation. So we started making many, many predictions about how we should navigate the road, and we asked people to tell us whether each one was right or wrong. This is again an opportunity to fire up that data engine, and we spotted some really complicated and interesting long tails — in this case, I think, a really fun example: these very interesting cases of simultaneous merging, where you start to move over and then someone merges behind or in front of you without noticing. What's the appropriate behavior there, and what adjustments do we need to make to the neural networks to be super accurate about it? We worked on those cases, tweaked them in the background, refined them, and over time we've accumulated nine million successfully accepted lane changes — and we use them, again with our continuous integration infrastructure, to really understand how we're doing.
One thing that's also really exciting to me is that, because we own the entire software stack — right down to kernel patches and tuning the image signal processor — we can collect data that is more and more precise, and that lets us get better and better through faster iteration cycles. So earlier this month we decided we were ready to deploy an even smoother version of Navigate on Autopilot, one that doesn't require a stalk confirmation: you can sit there, relax, keep your hand on the wheel, and just monitor what the car is doing. With that, we're now seeing over a hundred thousand automated lane changes every day on the highway system, and it's something we're comfortable deploying at scale. What I'm most excited about in all of this is the life cycle itself — how the data engine spins up faster and faster over time. One thing that's becoming very clear is that with the combination of the infrastructure we've built, the tools we've built on top of it, and the power of the full self-driving computer, we can move even faster as we take Navigate on Autopilot from the highway system to city streets. And with that, I'll hand it back to Elon. Yes — and to my knowledge, all of those lane changes have occurred with zero accidents. That is correct. I review every accident, so that's conservative, obviously — but we're at hundreds of thousands, going to millions, of lane changes with zero accidents.
I think that's a great accomplishment by the team. Thank you. So let's look at a few other things you need in order to have an autonomous car, or a robotaxi: you really need redundancy throughout the vehicle at the hardware level. Starting in October 2016, every car made by Tesla has redundant power steering — dual motors in the power steering — so if one motor fails, the car can still steer. The power lines and data lines are redundant too: you can cut any given power line or any data line and the car will keep operating. Even if the main pack loses power completely, the car is able to steer and brake using the auxiliary power system — you can lose the main pack entirely and the car is still safe. The whole system, from a hardware standpoint, has been designed to be a robotaxi basically since October 2016, when we introduced Autopilot Hardware 2. As for cars made before that, we don't expect to upgrade them: it would actually cost more to upgrade those cars than to make a new car. That gives you an idea of how hard this is to do unless it's designed in from the start — retrofitting just isn't worth it. So we've gone through the future of autonomous driving: it's clearly the hardware, it's the vision, and then there's a lot of software. And the software problem shouldn't be downplayed — there are two massive software problems: how do you manage and train against huge amounts of data, and how do you control the car based on vision?
It's a very difficult software problem. Now, Tesla has obviously made a lot of ambitious "forward-looking statements," as they're called, but let's go over some of the forward-looking statements we've actually delivered on. When we created the company and set out to build the original Tesla Roadster, people said it was impossible — and that even if we built it, no one would buy it. The universal opinion was that building an electric car was extremely foolish and would fail. I actually agreed that the probability of failure was high, but it was important, so we built the Tesla Roadster anyway and shipped that car in 2008 — and it's not just a collector's item. Then we built a more affordable car with the Model S, and we did it again. They told us it was impossible, they called me a fraud and a liar, said it was never going to happen, that it was all fake. Okay — famous
last words. We were in production with the Model S in 2012, and it exceeded all expectations — even now, in 2019, there isn't a car that can compete with the 2012 Model S. It's been seven years. Then a more affordable car — maybe not very affordable, but more affordable — with the Model 3. We built the Model 3, we're in production; I said we'd get to over five thousand cars a week, and right now five thousand cars a week is a walk in the park for us — it's not even hard. Then full-scale solar, which we did through the SolarCity acquisition, and we're now deploying the solar roof, which is going very well — we're on version 3 of the solar tile roof, and hopefully this will be in volume production significantly later this year. I have it on my house and it's great. And the Powerwall and the Powerpack: the Powerpack is now deployed in massive, grid-scale utility systems around the world, including the world's largest operating battery projects, exceeding 100 megawatts, and within the next year — two years at the most — we expect to have a gigawatt-scale battery project completed. So all those things we said we would do, we did. We said we'd do it, and we did it. We're going to do the robotaxi thing too — criticize me if you like, that's fair, and sometimes I'm not on time — but I get it done, and the Tesla team gets it done.
What we're going to do this year is hit a combined output of 10,000 vehicles a week. We feel very confident about that, and we feel very confident about being feature-complete with full self-driving. Next year we'll expand the lineup with the Model Y and the Semi, and we expect to have the first robotaxis in operation next year — with no one in them. It's always hard to wrap one's mind around things improving at an exponential rate, because we're used to extrapolating on a linear basis — but when you have massive amounts of hardware in the field, cumulative data increasing exponentially, and software improving exponentially, I feel very confident predicting autonomous robotaxis for Tesla next year. Not in every state or jurisdiction, because we won't have regulatory approval everywhere, but I'm confident we'll have regulatory approval somewhere, literally next year. Any customer will be able to add or remove their car from the Tesla Network, so expect us to operate it as something like a mix of the Uber and the Airbnb models: if you own the car, you can add it to or remove it from the Tesla Network, and Tesla would take 25 or 30 percent of the revenue. In places where there aren't enough shared customer cars, we'd just have dedicated Tesla vehicles. So when you want to use a car — let me show you our ride-hailing app.
You'll be able to summon the car from the parking lot, get in, and take a ride — it's very simple. It's just the same Tesla app you have today: we'll add the ability to summon a Tesla, or to make your own car part of the fleet, or to remove it from the fleet, all from your phone. So we see the potential for smoothing out the demand curve and having a car operating at a much higher utilization than cars do today. Typical car usage is about 10 to 12 hours a week: most people drive an hour and a half to two hours a day, so roughly 10 to 12 hours a week of total driving. But if you have a car that can operate autonomously, you can probably have that car running for a third of the week or more. There are 168 hours in a week, so you'd get something on the order of 55 to 60 hours a week of operation, maybe a little more — the fundamental utility of the vehicle increases by a factor of five. If you look at this from a macroeconomic standpoint and imagine we were running a giant simulation: if you could update the simulation to increase the utility of the cars by a factor of five, that would be a massive increase in the economic efficiency of the simulation — just gigantic. We'll use the Model 3, the S, and the X as robotaxis, but we've made a major change to our leases: if you lease a Model 3, you do not have the option to purchase it at the end of the lease.
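The utilization arithmetic quoted above can be checked directly — it's just the figures from the talk (168 hours in a week, a third of the week of operation, versus 10 to 12 hours of typical personal use):

```python
# Back-of-the-envelope check of the utilization figures quoted above.
hours_per_week = 24 * 7             # 168 hours in a week
personal_use = 12                   # ~10-12 hours/week of typical driving
robotaxi_use = hours_per_week / 3   # "a third of the week or more"

print(hours_per_week)                          # 168
print(robotaxi_use)                            # 56.0 -> "55 to 60 hours"
print(round(robotaxi_use / personal_use, 1))   # 4.7 -> roughly a factor of five
```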
We want them back. If you buy the car, you keep it; if you lease it, it goes back into the fleet. And as I said, in places where there aren't enough customer cars to share, Tesla will just make its own cars and add them to the network there. The current cost of a Model 3 robotaxi is less than $38,000, and we expect that number to improve over time as we redesign the cars. The cars being built today are designed for one million miles of operation: the drive units are designed for, and validated to, a million miles of operation. The current battery pack is good for about 300 to 500 thousand miles; the new battery pack, which will probably go into production next year, is explicitly designed for a million miles of operation. The entire vehicle is designed to run a million miles with minimal maintenance. We'll also fine-tune things like the tires and eventually optimize the car to be a hyper-efficient robotaxi. At some point you won't need steering wheels or pedals, and as these things become less and less important we'll just delete the parts. Probably two years from now we'll make a car that has no steering wheel or pedals, and if we need to accelerate that timeline we can always remove the parts — that's easy. Long term — say three years out — a robotaxi with the parts removed might end up costing $25,000 or less. And you want a super-efficient car so that electricity usage is very low: we're currently at four and a half miles per kilowatt-hour, and we'll improve that to five and beyond. There really isn't another company with the full stack: we have the vehicle design and manufacturing, the computer hardware is in-house, the software development and artificial intelligence are in-house, and we have by far the largest fleet.
It's extremely difficult — not impossible, perhaps, but extremely difficult — to catch up when Tesla is collecting a hundred times more miles of data per day than everyone else. Here's the cost of running a gasoline car: the average cost of running a car in the USA, taken from AAA, is currently about 62 cents a mile, and at about 13,500 miles per vehicle per year, that adds up to roughly two trillion dollars a year across the US fleet — these figures are literally taken from the AAA website. The cost of ride-sharing, as you see on the left, is two to three dollars per mile. The cost of operating a robotaxi, we think, is less than 18 cents per mile — and falling; that's the current cost, and future costs will be lower. If you ask what the likely gross profit of a single robotaxi would be, we think it's probably on the order of $30,000 per year. We are literally designing these cars the way commercial semi trucks are designed: semi trucks are designed for a million-mile life, and we're designing the cars for a million-mile life too. So nominally that would be, you know, a little over three hundred thousand dollars over an 11-year life, maybe higher. I think these assumptions are actually relatively conservative — this assumes only 50 percent of the miles driven are useful miles. By the middle of next year we'll have over a million Tesla cars on the road with full self-driving hardware, feature-complete, at a reliability level where we'd consider that no one needs to pay attention — meaning you could go to sleep. From our standpoint, if you fast-forward a year — maybe a year and three months, but next year for sure — we will have over a million robotaxis on the road: the fleet wakes up with an over-the-air update, that's all it takes. So what's the net present value of a robotaxi? Probably on the order of a couple hundred thousand dollars — which means
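The revenue figures quoted here check out with simple arithmetic, and the "couple hundred thousand dollars" net-present-value claim follows from discounting that gross profit stream. The 9% discount rate below is our illustrative assumption — the talk did not specify one:

```python
# Rough check of the robotaxi economics quoted above: ~$30,000/year gross
# profit over an 11-year (million-mile) life, then a simple NPV at an
# assumed discount rate.
annual_gross_profit = 30_000
years = 11

undiscounted = annual_gross_profit * years
print(undiscounted)  # 330000 -> "a little over three hundred thousand dollars"

discount_rate = 0.09  # illustrative assumption, not from the talk
npv = sum(annual_gross_profit / (1 + discount_rate) ** t
          for t in range(1, years + 1))
print(round(npv))  # roughly 200,000 -> "on the order of a couple hundred thousand"
```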
that buying a Model 3 is a good deal. In terms of our own fleet, I don't know — I'd guess in the long term we'd probably have on the order of 10 million vehicles. If you look at our compound annual production growth rate since 2013, our first full year of Model S production, we went from about 23,000 vehicles produced in 2013 to about 250,000 produced last year — so over the course of five years we increased production by a factor of 10. Whether something similar happens over the next five or six years for ride-hailing, I don't know, but the nice thing is that essentially the customers are advancing us the money for the cars, which is great. Then a question: one thing I'm curious about is the snake charger, and how you determined the pricing — it seems like you're undercutting the Lyft or Uber average by about 50 percent, so could you talk a little about the pricing strategy? Sure — the solution for the snake charger is pretty straightforward: from a computer-vision standpoint a charge port is a known situation, and any known situation with vision is trivial, so the car will auto-park and auto-connect with no human supervision required. Sorry, what was the other part — the price? Yeah, we just threw some numbers in there, somewhat arbitrarily — okay, maybe a dollar a mile. There are on the order of two billion cars and trucks in the world, so robotaxis will be in extremely high demand for a long time to come, and my observation so far is that the auto industry is very slow to adapt.
I mean, as I said, there's still not a car on the road today that I could buy that's as good as the 2012 Model S, which suggests a pretty slow rate of adaptation in the auto industry — so a dollar a mile is probably conservative for the next 10 years. I also think people don't really appreciate the difficulty of manufacturing. Manufacturing is incredibly hard. A lot of the people I talk to think that if you just get the design right, you can instantly make as much of that thing as the world wants. That's not true — it's extremely difficult to design a new production system for a new technology. Even companies that are extremely good at manufacturing run into major problems — and if they have problems, what about everyone else?
So, you know, there are on the order of two billion cars and trucks in the world, and on the order of a hundred million units per year of vehicle production capacity — but with the old designs, it's going to take a long time to convert all of that into fully autonomous cars. And they really need to be electric, because the operating cost of a gasoline or diesel car is much higher than that of an electric car — a robotaxi that isn't electric just won't be competitive. Hi Elon, Colin Rusch from Oppenheimer over here. Building this fleet sounds like a massive commitment of the organization's capital over time.
Can you talk a bit about what that looks like — what your expectations are in terms of financing over the next three or four years, in order to build this fleet, own it, and monetize it with your existing customer base? Our goal is to be approximately cash-flow neutral during the fleet build-out phase, and then extremely cash-flow positive once the robotaxis are enabled. But I don't want to get too deep into financing — it's hard to talk about financing in this setting — suffice it to say I think we'll make the right moves. Okay, I have a question: if I'm Uber, why would I not just buy all of your cars?
You know — why would I let them put me out of business? Because the cars can only be used on the Tesla Network. Even a private individual who wanted to go out and buy ten Model 3s and run a ride-sharing robotaxi business with them can't — the network is a Tesla business. It's like the App Store: you can only add or remove cars through the Tesla Network, and Tesla takes a share of the revenue. But it's also similar to Airbnb: I have this house — or this car — and I can rent it out for some extra income. So if I have a Model 3, and I aspire to get that Roadster over there when you build it, and I rent out my Model 3 — why would I ever give it back to you? Seems easy. Okay, but to operate a robotaxi now, it sounds like you have to solve certain problems. With Autopilot today, if you turn the wheel too much, it lets you take control. But if it's a ride-sharing product, someone is sitting in the passenger seat — you can't let that person take over the car just by moving the steering wheel, since they may not even be in the driver's seat. So is the hardware already there to make it a robotaxi? And the car could get into situations — like a cop pulling it over — where some human might need to step in, say via a core fleet of remote operators who interact with the car. Is all of that kind of infrastructure already built into each of the cars? Does that make sense?
I think there will be some kind of phone-home behavior: if the car gets stuck, it will just phone home to Tesla and ask for guidance. Things like being pulled over by a police officer — that's easy for us to program, that's not a problem. It will be possible for someone to take over using the steering wheel, at least for a period of time; then, probably, in the future we'll just delete the steering wheel — take the steering wheel off and put a cap where it attaches. If it takes a couple of years, we'd do a hardware mod to the car to enable it: we'd literally just unbolt the steering wheel and put a cap where the steering wheel currently connects. But that's a future car — what about today's cars, where the steering wheel is the mechanism for taking control from Autopilot? If the car is in robotaxi mode, someone could take control just by moving the steering wheel. Yes — I think there will be a transition period where people can take over from the robotaxi, and then, once regulators are comfortable with us not having a steering wheel, we'll just remove it. For cars already in the fleet — obviously with the owner's permission — we'd remove the steering wheel and put a cap where it currently connects. So there could be two phases for the robotaxi: one where the service is provided and you get in and could take over as a driver, and then a future phase where there may be no driver option at all. And in that future, what's the probability that the steering wheel gets removed?
One hundred percent — consumers will demand it. And to be clear, this is not me prescribing a view of the world; this is me predicting what consumers will demand: in the future, people will demand that humans not be allowed to drive these two-ton death machines.
You may not agree with that, and that's fine — but for a Model 3 today to be part of the robotaxi network, when you summon it, would the rider essentially sit in the driver's seat? Yeah, essentially — just to be safe. Okay, that makes sense, thanks. It's like, you know, there were amphibians before things became land creatures — a transitional phase. Hi — sorry, okay. The strategy we've heard from other players in the robotaxi space is to select a certain municipal area and create geofenced autonomous driving, using an HD map in a more confined area for a bit more safety. We haven't heard much today about the importance of HD maps — to what extent is an HD map necessary for you? And second, we also haven't heard much about deploying in specific municipalities, where you work with the municipality, get their input, and operate in a more defined area. So: how important are HD maps, and to what extent are you looking at specific municipalities for deployment?
I think HD maps are a mistake. Either you need HD maps — in which case, if anything changes in the environment, the car breaks down — or you don't need HD maps, in which case why are you wasting time building them? HD maps and lidar are the two main crutches that should not be used, and in retrospect it will be obvious that they were false and foolish. And if you need to geofence an area, you don't have real self-driving. Next question: it sounds like battery supply might be the only remaining bottleneck for this vision — and could you also shed some light on how you make battery packs last a million miles?
I think cell supply will be a constraint, and I think we're actually going to want to push the Standard Range Plus pack more than our Long Range pack, because the energy content of the Long Range pack is 50 percent higher in kilowatt-hours — so essentially you can build about a third more cars if they're all Standard Range Plus instead of Long Range. One is at 50 kilowatt-hours, the other around 75 kilowatt-hours. So we'll probably bias our sales intentionally toward the smaller battery pack in order to have a larger volume: the most sensible thing to do is to maximize the number of autonomous units produced, which ultimately maximizes the size of the autonomous fleet down the road. We're doing a number of things along those lines. As for million-mile life, it's basically about giving the pack the cycle life it needs. The basic calculation is that if you have a 250-mile-range pack, you need about 4,000 cycles to get to a million miles — and we already achieve that with our stationary storage products: the Powerpack ships with a lifespan of about 4,000 cycles. Okay, may I follow up? The take rate is obviously significant — it has very constructive margin implications to the extent you can drive attach rates for the full self-driving option.
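The cycle-count figure quoted here follows directly from the pack's range — a one-line sanity check:

```python
# Sanity check: a 250-mile pack reaching one million miles needs
# 1,000,000 / 250 = 4,000 full charge cycles -- the cycle life the talk
# says is already demonstrated in Powerpack stationary storage.
target_miles = 1_000_000
range_per_cycle = 250
cycles_needed = target_miles // range_per_cycle
print(cycles_needed)  # 4000
```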
I'm just curious if you can establish where you are in terms of those attach rates, and how you hope to educate consumers about the robotaxi scenario so that attach rates improve materially over time. Sorry — it's a little hard to hear your question. Yeah — just curious where you are today in terms of full self-driving attach rates, and the financial implications: it's obviously very beneficial if those attach rates increase materially, given the higher gross-margin dollars flowing in as people sign up for full self-driving. How do you see that increasing — what are attach rates today versus where you expect them to go, and how do you plan to make consumers aware that they should attach FSD to their vehicle purchases? Well — we massively increased them after today. I mean, the fundamental message consumers should take away today is that it's financially insane to buy anything other than a Tesla — it will be like owning a horse in three years. That's fine if you want to own a horse, but you should go in with that expectation. If you buy a car that doesn't have the hardware needed for full self-driving, it's like buying a horse — and the only car with the hardware needed for full self-driving is a Tesla. That's really how people should think about it: buying any car other than a Tesla would be crazy. We need to make that point clear, and we have today. Thank you for bringing the future to the present — there were several informative moments today.
I was wondering why you didn't talk much about the Tesla pickup truck — let me give you some context. I could be wrong, but the way I see it, the Tesla Network's early adopters will serve as something of a test bed, and I think the Tesla pickup truck could be the first phase of putting vehicles on the network, because its utility is quite high — people hauling things, people in the construction trades, odd trips here and there like picking up stuff from Home Depot. So maybe you need a two-stage process, with the pickup as the starting point for the network, and then people like me can buy in later. What's your take on that?
Well, today was really just about autonomy. There are many things we could talk about — cell production, the pickup truck, future vehicles — but today was meant to focus on autonomy. I agree it's important, though, and I'm very excited about the pickup truck we're introducing later this year — it's going to be great. Colin Langan from UBS: just so we understand the definitions — when you refer to the full self-driving feature, it sounds like you're talking about Level 5 with no geofencing. Is that expected by the end of the year? And then the regulatory process — have you talked with regulators about this? This seems like quite an aggressive timeline compared with what others have published. Do you know what the hurdles are and what the timeline is to get approved? Do you need things like California's requirements — tracking miles, having a safety operator behind the wheel — and what will that process look like?
Yes, we talk to regulators around the world all the time. As we introduce this, it will require regulatory approval depending on the jurisdiction, but I think that fundamentally, in my experience, regulators are sold on data. If you have a lot of data showing that autonomy is safe, they listen to you. It can take a while for them to digest the information, so that process can take some time, but they have always come to the right conclusion from what I have seen. I'll take a question over here; sorry, I have lights in my eyes and a pillar in the way.
Okay, I just want to understand some of the work you've done to try to better understand the ride-hailing market. It seems heavily concentrated in the major dense urban centers, so is the way to think about this that robotaxis would probably be deployed more in those areas, and that full self-driving for personally owned vehicles would fall more in the suburban areas?

I think probably yes. Tesla-owned robotaxis would be in the urban areas, along with customer-owned vehicles, and then as you reach medium- and low-density areas, it would tend to be more people owning the car and occasionally lending it out. There are places around the world with challenging urban environments, but we don't expect this to be a major problem. When I say full self-driving, I mean it will work in downtown San Francisco and midtown Manhattan this year.

Do you use different models for, say, route planning and perception, or different types of AI, and how do you split that across the different parts of autonomy?

Well, essentially the neural network right now is used for object recognition. We're still basically just using it on still frames to identify the objects in each frame, and putting them together in a perception and path-planning layer above that. But what's constantly happening is that the neural network is consuming ever more of the software stack, so over time we expect the neural network to do more and more. Now, from a computational-cost standpoint, there are some things that are very simple for a heuristic and very difficult for a neural network, so it probably makes sense to keep some level of heuristics in the system, because they are computationally a thousand times cheaper than a neural network.
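The heuristics-versus-neural-network cost tradeoff described above can be sketched as a hybrid pipeline in which a cheap rule gates calls to a far more expensive model. This is purely an illustration, not Tesla's implementation: the threshold values and the stand-in "network" below are invented for the example.

```python
# Hypothetical hybrid perception loop: a cheap heuristic filters frames so
# the expensive model only runs when it might be useful. All numbers and
# function names here are illustrative assumptions, not Tesla's code.

def cheap_heuristic(frame):
    """O(n) rule: skip frames whose mean intensity is near-black."""
    return sum(frame) / len(frame) > 10  # arbitrary illustrative threshold

def expensive_network(frame):
    """Stand-in for a neural net costing ~1000x the heuristic."""
    return "object" if max(frame) > 200 else "background"

def perceive(frames):
    results = []
    for frame in frames:
        if not cheap_heuristic(frame):
            results.append("skip")           # heuristic short-circuits;
        else:                                # the "network" never runs here
            results.append(expensive_network(frame))
    return results

frames = [[0, 0, 0, 0], [50, 100, 220, 240], [30, 40, 20, 10]]
print(perceive(frames))  # the all-dark frame never reaches the network
```

The design point is the one made in the answer: when a rule is a thousand times cheaper than the network and handles a case reliably, running the network on that case is wasted compute.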
It's like using a cruise missile to swat a fly: if you're trying to swat a fly, use a fly swatter, not a cruise missile. So over time, expect us to actually move to training against video: video in, and steering and pedals out. It's basically video in and lateral and longitudinal acceleration out. That's what we're going to use the Dojo system for, because there's no system currently that can do that.

Maybe over here. Just going back to the sensor suite, the area I'd like to discuss is the side cameras. In a situation where you have an intersection with a stop sign, where maybe there is cross traffic at 35 to 40 miles per hour, do you feel comfortable that the array of side cameras can handle that?

Yes. Essentially, the car is going to do something like what a human would do. A human is basically like a camera on a slow gimbal, and it's pretty remarkable that people can drive the car the way they do, because you can't look in all directions at once. The car can literally look in all directions at once with multiple cameras, while humans are stuck in the driver's seat, only able to look this way or that way. And the cameras in the cars have a better vantage point than a person: they're up on the B-pillar or in front of the rear-view mirror, so they really have a great view. So if you're turning onto a road that has a lot of high-speed traffic, the car can do what a person does, creeping forward gradually, turning a little bit without going all the way into the road, letting the cameras see what's going on, and proceeding if they show no oncoming traffic.
Or, if it starts to go and things look sketchy, it can back off, like a person would. The behaviors start to get remarkably lifelike; it's pretty eerie, actually, watching the car begin to behave like a person.

Over here. Given all the value you're creating in your auto business by wrapping all this technology around it, I'm curious why you're still taking some of your cell capacity and putting it into Powerwall and Powerpack. Wouldn't it make sense to put all the units you can into this part of your business?

They already stole almost all the cell lines from us; cells that were meant for Powerwall and Powerpack went to the Model 3. Last year, to achieve our Model 3 production and not run out of cells, we had to convert all the 2170 lines in the Gigafactory over to cars. Our actual production of stationary storage versus vehicles, in total gigawatt-hours, is an order of magnitude different, and for stationary storage we can basically use a bunch of miscellaneous cells. We can collect cells from multiple vendors around the world, and there's no type-approval or safety issue like you have with cars, so our stationary battery business has basically been feeding on scraps for quite some time. We really think hard about production; there are a lot of constraints in a massive automotive production system. The degree to which manufacturing and the supply chain are underestimated is incredible: there is a whole series of constraints, and what is the constraint one week may not be the constraint another week. It is unbelievably hard to make a car, especially one that is rapidly evolving. So, yeah, I'll just answer a few more questions, and then I think we should break so people can test the cars.

Anyone? Adam Jonas asks about safety: what data can you share with us today on how safe this technology is? That would obviously be important for regulation or insurance.

We publish accidents per mile every quarter, and what we see right now is that Autopilot is about twice as safe as a normal driver on average, and we expect that to increase over time. As I've said, in the future consumers may even want to outlaw human driving.
It's not that I think they will succeed, or that I'm saying I agree with this position, but in the future consumers may want to ban people from driving their own cars because it's not safe. Think about elevators: elevators used to have operators who worked a big lever to move between floors, with a big relay. But periodically the operators would get tired, or drunk, or something, and move the lever at the wrong time and cut somebody in half. So now there are no elevator operators, and it would be quite alarming to get into an elevator that had a big lever that could move it arbitrarily between floors; there are just buttons. In the long run, and again this is not a value judgment about how I want the world to be, I'm saying consumers will probably demand that people not be allowed to drive cars.

A follow-up: can you share with us how much
Tesla spends on Autopilot or autonomous technology, order of magnitude, annually? Thanks.

Basically, my whole question is on the expense structure of Tesla's network economics, just so I understand it. It looks like with a Model 3 lease, the roughly $25,000 on the balance sheet would be an asset, and then you would have a cash flow of about $30,000 a year; is that the way to think about it? Yeah, sort of, yeah.

And then, in financing terms, there was a question before where you mentioned being cash-flow neutral: did you mean cash-flow neutral for the robotaxi program, or for Tesla as a whole? Sorry, cash flow in what terms? You asked a question on the financing of the robotaxis, and it seems to me that they are self-funding, but you mentioned that they would be basically cash-flow neutral; what did you mean? I'm just saying that between now and when robotaxis are fully deployed around the world, the most sensible thing for us to do is to maximize the rate of growth and drive the company toward cash-flow neutral until the robotaxi fleet is active.
Once the fleet is active, I would expect it to be extremely cash-flow positive. One more: if a vehicle on the Tesla robotaxi network has an accident and causes damage, who is responsible, probably Tesla? Yes, probably Tesla; the right thing to do is to make sure there are very few accidents. Okay, thanks, everyone; please enjoy the rides. Thank you.
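The robotaxi unit economics raised in the Q&A, a Model 3 carried as a roughly $25,000 asset producing cash flow of about $30,000 a year, can be sketched as a simple net-present-value calculation. Only the $25,000 and $30,000 figures come from the transcript; the vehicle lifetime and discount rate below are assumptions added purely for illustration.

```python
# Illustrative robotaxi NPV sketch. Transcript figures: ~$25k vehicle asset,
# ~$30k/year cash flow. The 5-year life and 10% discount rate are assumed.

def robotaxi_npv(vehicle_cost, annual_cash_flow, years, discount_rate):
    """Net present value of one robotaxi: discounted cash minus upfront cost."""
    present_value = sum(
        annual_cash_flow / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )
    return present_value - vehicle_cost

npv = robotaxi_npv(25_000, 30_000, years=5, discount_rate=0.10)
print(f"NPV per vehicle: ${npv:,.0f}")
```

Under these assumed parameters the NPV per vehicle works out to roughly $89,000, which is the arithmetic behind the "extremely cash-flow positive" claim: each vehicle would return its cost in under a year of service.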