Tesla Autonomy Day

Feb 27, 2020

Hello everyone. I'm sorry I'm late. Welcome to our first analyst day for autonomy. I really hope that this is something we can do a little more regularly now, to keep you informed about the progress we are making on autonomous driving. About three months ago we were preparing for our fourth-quarter earnings call with Elon and quite a few other executives, and one of the things I told the group is that, of all the conversations I have with investors on a regular basis, the biggest gap I see between what I see inside the company and the external perception is our autonomous driving capability. That makes sense, because for the last few years we've really been talking about the Model 3 ramp, and a lot of the debate has been around Model 3, but there has actually been a lot going on in the background.

We have been working on the new full self-driving chip. We have done a complete overhaul of our neural network for vision recognition, and so on. So now that we are finally starting to produce our full self-driving computer, we thought it was a good idea to open the veil, invite everyone in, and talk about everything we've been doing for the last two years.

About three years ago we wanted to find the best possible chip for full autonomy, and we discovered that there was no chip designed from the ground up for neural networks, so we invited my colleague Pete Bannon, vice president of silicon engineering, to design such a chip.


He has about 35 years of experience building and designing chips. About 12 of those years were at a company called PA Semi, which was later acquired by Apple, so he has worked on dozens of different architectures and designs, and he was the lead designer, I think, for Apple's iPhone 5, just before he joined Tesla. Elon Musk will join him on stage.

Thanks. Actually, I was going to introduce Pete, but that's already been done. He's the best chip and systems architect I know in the world. It's an honor to have you and your team at Tesla. Take it away and just tell them about the work you've done.

Thanks, Elon. It's a pleasure to be here this morning, and a real pleasure to tell you about all the work that my colleagues and I have been doing here at Tesla for the last three years. I'll tell you a little bit about how it all got started, and then I'll introduce you to the full self-driving computer and explain a little about how it works. We'll dive into the chip itself and go over some of those details. I'll describe how the custom neural network accelerator that we designed works, and then I'll show you some results, and hopefully you'll all still be awake by then.

I was hired in February 2016. I asked Elon if he was willing to spend all the money it would take to do a full custom system design, and he asked, "Well, are we going to win?" I said, "Yes, of course," and he said, "I'm in," and that got us started. We hired a bunch of people and started thinking about what a chip designed specifically for full autonomy would look like. We spent eighteen months doing the design, and in August 2017 we released the design for manufacturing. We got it back in December; it powered up and actually worked very, very well on the first try. We made a few changes and released a B0 revision in April 2018. In July 2018 the chip was qualified and we started full production of production-quality parts. In December 2018 we had the autonomous driving stack running on the new hardware and were able to start retrofitting employee cars and testing the hardware and software in the real world. Just this past March we started shipping the new computer in the Model S and X. The whole program took just over three years, and it's probably the fastest systems development program I've ever been associated with. It really speaks to the advantages of having a tremendous amount of vertical integration, which allows you to do concurrent engineering and speed up deployment.

In terms of goals, we were focused exclusively on Tesla's requirements, and that makes life much easier.

If you have only one customer, you don't have to worry about anything else. One of those goals was to keep the power under 100 watts so we could retrofit the new machine into the existing cars. We also wanted a lower part cost so we could enable redundancy for safety. At the time we had a thumb-in-the-wind estimate that it would take at least 50 trillion operations per second of neural network performance to drive a car, so we wanted to get at least that many, and really as many as possible. Batch size is how many items you operate on at the same time. For example, Google's TPU has a batch size of 256, and you have to wait until you have 256 things to process before you can get started.

We didn't want to do that, so we designed our machine with a batch size of one, so that as soon as an image shows up we process it immediately to minimize latency, which maximizes safety. We needed a GPU to do some post-processing. At the time we were doing quite a bit of that, but we speculated that over time the amount of post-processing on the GPU would decline as the neural networks got better and better, and that has actually come to pass. So we took a risk by putting a fairly modest GPU in the design, as you'll see, and it turned out to be a good bet.
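The latency argument for a batch size of one can be made concrete with a minimal sketch. It assumes frames arrive at a fixed interval and that a batched accelerator cannot start until the whole batch is present. The function name and the 36 frames-per-second camera rate are illustrative assumptions, not figures from the talk.

```python
def batching_wait_ms(batch_size: int, frame_interval_ms: float) -> float:
    """Time the first frame of a batch waits for the rest of the batch to arrive."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The first frame has to wait for (batch_size - 1) further arrivals.
    return (batch_size - 1) * frame_interval_ms


if __name__ == "__main__":
    interval_ms = 1000.0 / 36.0  # hypothetical camera rate of 36 frames per second
    for batch in (1, 256):
        print(batch, round(batching_wait_ms(batch, interval_ms), 1), "ms")
```

With a batch size of one the wait is zero by construction, which is the property the design is after.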

Security is very important. If you don't have a secure car, you can't have a safe car, so there is a lot of focus on security and then, of course, on safety. In terms of doing the chip design, as Elon alluded to earlier, there really was no ground-up neural network accelerator in existence in 2016. Everyone was adding instructions to their CPU, GPU, or DSP to improve inference, but no one was really doing it natively, so we set out to do that ourselves. For the other components on the chip, we purchased industry-standard IP for the CPUs and GPU, which allowed us to minimize design time and also risk to the program.

Another thing that was a little unexpected when I first arrived was our ability to leverage existing teams at Tesla. Tesla had wonderful teams for power supply design, signal integrity analysis, package design, system software, firmware, and board design, and a really good system validation program, all of which we were able to leverage to accelerate this program.

Here is what it looks like. On the right you see all the connectors for the video that comes in from the cameras in the car. You can see the two self-driving computers in the middle of the board, and on the left are the power supply and some control connections. I really love it when a solution is stripped down to its most basic elements: you have video, computing, and power, and it's straightforward and simple. Here is the original Hardware 2.5 enclosure that the computer went into and that we have been shipping for the last two years.

Here is the new design for the FSD computer. It's basically the same, and of course that is driven by the constraints of having a retrofit program for the cars. I would like to point out that this is actually a fairly small computer. It fits behind the glove box, between the glove box and the firewall of the car; it doesn't take up half the trunk. As I said before, there are two fully independent computers on the board. You can see them highlighted in blue and green. On either side of each large SoC you can see the DRAM chips that we use for storage, and at the bottom left you see the flash chips that hold the file system. So these are two independent computers that boot up and run their own operating systems.

Yes, if I may add something: the general principle here is that any part of this can fail and the car will keep driving. The cameras can fail, the power circuits can fail, one of the Tesla full self-driving computer chips can fail, and the car keeps driving. The probability of this computer failing is substantially lower than the probability of someone losing consciousness. That's the key metric, by at least an order of magnitude.

Yes. So one of the additional things we do to keep the machine going is to have redundant power supplies in the car, so one machine runs on one power supply and the other machine runs on the other. The cameras are the same: half of the cameras run on the blue power supply and the other half on the green power supply, and both chips receive all of the video and process it independently.

In terms of driving the car, the basic sequence is to gather a lot of information from the world around you. Not only do we have cameras, we also have radar, GPS, maps, the IMU, and ultrasonic sensors around the car. We have wheel ticks and steering angle, and we know what the acceleration and deceleration of the car are supposed to be. All of that comes together to form a plan. Once we have a plan, the two machines exchange their independent versions of the plan to make sure they are the same, and, assuming they agree, we then act and drive the car. Once you have driven the car with some new control, you of course want to validate it, so we validate that what we transmitted was what we intended to transmit to the other actuators in the car, and then you can use the sensor suite to make sure it actually happened. If you ask the car to accelerate, brake, or turn right or left, you can look at the accelerometers and make sure that it is in fact doing so. So there is a tremendous amount of redundancy and overlap in both our data acquisition and our data monitoring capabilities.
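The plan, compare, act, validate sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Plan` fields, the tolerances, and every function name are hypothetical, since the talk does not describe the real plan format or comparison rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Hypothetical control plan; the real plan format is not described in the talk."""
    steering_deg: float
    accel_mps2: float


def plans_agree(a: Plan, b: Plan, tol: float = 1e-3) -> bool:
    """Compare the two independently computed plans (tolerance is an assumption)."""
    return (abs(a.steering_deg - b.steering_deg) <= tol
            and abs(a.accel_mps2 - b.accel_mps2) <= tol)


def arbitrate(a: Plan, b: Plan) -> Optional[Plan]:
    """Return the plan to act on, or None when the two computers disagree."""
    return a if plans_agree(a, b) else None


def command_took_effect(commanded_accel: float, measured_accel: float,
                        tol: float = 0.5) -> bool:
    """Validate a commanded acceleration against an accelerometer reading."""
    return abs(commanded_accel - measured_accel) <= tol
```

What the sketch leaves open, because the talk does too, is what the car does when `arbitrate` returns `None`.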

Moving on to talk a little about the full self-driving chip itself: it is packaged in a 37.5 millimeter BGA with 1,600 balls. Most of those are used for power and ground, but there are plenty for signal as well. If you take the lid off it looks like this: you can see the package substrate and the die sitting in the center. If you take the die off and flip it over, it looks like this: there are 13,000 C4 bumps scattered across the top of the die, and underneath those are twelve metal layers, which obscure all the details of the design. If you strip those off, it looks like this. This is a 14 nanometer FinFET CMOS process. The die is 260 square millimeters, which is a modest die size. For comparison, a typical cell phone chip is about 100 square millimeters, so we're somewhat larger than that, but a high-end GPU would be more like 600 to 800 square millimeters. So we're sort of in the middle; I would call it the sweet spot, a comfortable size to build. There are 250 million logic gates on there and a total of six billion transistors which, even though I work on this all the time, is mind-boggling to me. The chip is manufactured and tested to AEC-Q100 standards, which is the standard automotive qualification.

Next, I'd like to walk around the chip and explain all the different pieces. I'm going to go in the order that a pixel coming in from the camera would visit them. At the top left you can see the camera serial interface.

We can ingest 2.5 billion pixels per second, which is more than enough to cover all the sensors we know of. We have an on-chip network that distributes data from the memory system, so the pixels travel across the network to the memory controllers on the right and left edges of the chip. We use industry-standard LPDDR4 memory running at 4,266 megabits per second per pin, which gives us a peak bandwidth of 68 gigabytes per second. That is a pretty healthy bandwidth, but again, it isn't ridiculous, so we're trying to stay in the comfortable sweet spot for cost reasons.

The image signal processor has a 24-bit internal pipeline that allows us to take full advantage of the HDR sensors we have around the car. It performs advanced tone mapping, which helps bring out details in shadows, and it has advanced noise reduction, which improves the overall quality of the images we are using in the neural network.

Then there is the neural network accelerator itself. There are two on the chip. Each one has 32 megabytes of SRAM to hold temporary results and minimize the amount of data we have to transmit on and off the chip, which helps reduce power.
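The 68 gigabytes per second peak bandwidth quoted above can be reproduced with a short calculation. It assumes LPDDR4-4266, that is 4,266 million transfers per second per pin, and a 128-bit memory interface. The bus width is an assumption chosen because it matches the quoted figure; the talk does not state it.

```python
def peak_bandwidth_gb_per_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Peak DRAM bandwidth in gigabytes per second."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # 128-bit bus is an assumption, not a figure from the talk.
    print(round(peak_bandwidth_gb_per_s(4266, 128), 1), "GB/s")
```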

Each accelerator has a 96-by-96 multiply-add array with in-place accumulation, which allows us to do about 10,000 multiply-adds per cycle. There is dedicated ReLU hardware and dedicated pooling hardware. Each accelerator delivers 36 trillion operations per second and operates at 2 gigahertz; the two together on a die deliver 72 trillion operations per second, so we exceeded our goal of 50 tera-operations by quite a bit.

There is a video encoder. We encode video and use it in a variety of places in the car, including the backup camera display; there is optionally a user feature for the dashcam, and also for clip logging of data to the cloud, which Stuart and Andrej will talk about later. There is a GPU on the chip with modest performance; it supports 32-bit and 16-bit floating point. Then we have twelve A72 64-bit CPUs for general-purpose processing. They operate at 2.2 gigahertz, and this represents about two and a half times the performance available in the current solution.

There is a safety system that contains two CPUs operating in lockstep. This system is the final arbiter of whether it is safe to actually drive the actuators in the car, so this is where the two plans come together and we decide whether or not it is safe to move forward. Lastly, there is a security system. Basically, the job of the security system is to ensure that this chip only runs software that has been cryptographically signed by Tesla. If it hasn't been signed by Tesla, the chip does not operate.
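The throughput figures follow from the array size and clock rate. The sketch below assumes that each multiply-add is counted as two operations (one multiply, one add), which is the usual convention but is not stated in the talk.

```python
def accelerator_tops(rows: int, cols: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak tera-operations per second for a rows x cols multiply-add array."""
    macs_per_cycle = rows * cols
    return macs_per_cycle * ops_per_mac * clock_hz / 1e12


if __name__ == "__main__":
    per_accelerator = accelerator_tops(96, 96, 2e9)
    print("multiply-adds per cycle:", 96 * 96)
    print("per accelerator:", round(per_accelerator, 1), "TOPS")
    print("per chip (two accelerators):", round(2 * per_accelerator, 1), "TOPS")
    print("per computer (two chips):", round(4 * per_accelerator, 1), "TOPS")
```

Under that assumption the exact peak is slightly above the rounded 36 and 72 figures quoted in the talk.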

I've given you a lot of different performance numbers, and I thought it might be helpful to put them into perspective a little. Throughout this talk I'm going to refer to a neural network from our narrow camera. It uses 35 giga-operations, that is, 35 billion operations. If we used all 12 CPUs to process that network, we could do one and a half frames per second, which is super slow, not nearly adequate to drive the car. If we used the 600 gigaflop GPU on the same network, we would get 17 frames per second, which is still not good enough to drive the car with multiple cameras.

The neural network accelerators on the chip can deliver 2,100 frames per second, and you can see from the scaling as we moved along that the amount of computing in the CPU and GPU is basically insignificant compared to what is available in the neural network accelerator. It really is night and day.

So, moving on to talk about the neural network accelerator. We'll just stop for some water. On the left is a cartoon of a neural network, just to give you an idea of what is going on. The data comes in at the top and visits each of the boxes, flowing along the arrows from box to box. The boxes are typically convolutions or deconvolutions with ReLUs; the green boxes are pooling layers. The important thing about this is that the data produced by one box is consumed by the next box, and then you don't need it anymore; you can throw it away. So all of that temporary data that gets created and destroyed as it flows through the network does not need to be stored off chip in DRAM. We keep all of that data in SRAM, and I'll explain why that is super important in a few minutes.
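The frame rates quoted for the GPU and the accelerators are close to what you get by dividing peak throughput by the 35 billion operations the network needs per frame. The sketch below is an upper-bound estimate that assumes perfect utilization; the function name is illustrative.

```python
def max_frames_per_second(ops_per_second: float, ops_per_frame: float) -> float:
    """Upper bound on frame rate, assuming the hardware is fully utilized."""
    return ops_per_second / ops_per_frame


if __name__ == "__main__":
    network_ops = 35e9  # operations per frame for the narrow-camera network
    print("GPU, 600 GFLOPS:", round(max_frames_per_second(600e9, network_ops), 1))
    print("Two accelerators, 72 TOPS:", round(max_frames_per_second(72e12, network_ops)))
```

The GPU estimate comes out at about 17 frames per second and the accelerator estimate at roughly two thousand, in line with the numbers in the talk.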

If you look at the right side, you can see that in this network of 35 billion operations, almost all of them are convolutions, which are based on dot products. The rest are deconvolutions, which are also based on dot products, and then ReLU and pooling, which are relatively simple operations. So if you were designing some hardware, you would clearly target doing dot products, which are based on multiply-adds, and really speed those up. But imagine that you sped them up by a factor of 10,000: 100 percent suddenly turns into 0.01 percent, and suddenly the ReLU and pooling operations are going to be quite significant. So our hardware design includes dedicated resources for processing ReLU and pooling as well.
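The reasoning here is Amdahl's law: once the dot products are accelerated, whatever is left unaccelerated dominates the runtime. A minimal sketch follows; the 99.7 percent dot-product share used in the example is an assumed value for illustration, since this passage only says "almost all".

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of the work is sped up by `factor`."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


if __name__ == "__main__":
    # Assumed split: 99.7% dot products, 0.3% ReLU and pooling left unaccelerated.
    print(round(amdahl_speedup(0.997, 10_000), 1))
```

Even with a 10,000-fold faster dot product, the overall gain under that assumed split is capped near 1 / 0.003, which is why the simple operations get dedicated hardware too.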

The chip operates in a thermally constrained environment, so we had to be very careful about how we burn that power. We want to maximize the amount of arithmetic we can do, so we picked integer add, which is 9 times less power than a corresponding floating-point add, and we picked 8-bit by 8-bit integer multiply, which is significantly less power than other multiply operations and probably has enough accuracy to get good results. In terms of memory, we chose to use SRAM as much as possible. You can see there that going off chip to DRAM is about a hundred times more expensive in terms of energy consumption than using local SRAM, so clearly we want to use local SRAM as much as possible.

In terms of control, this is data that was published in a paper by Mark Horowitz at ISSCC, where he critiqued how much power it takes to execute a single instruction on a regular integer CPU. You can see that the add operation is only 0.15 percent of the total power; all the rest is control overhead and bookkeeping. So in our design we basically set out to get rid of all of that as much as possible, because what we're really interested in is the arithmetic.

So here is the design we ended up with. You can see that it is dominated by the 32 megabytes of SRAM: there are big banks on the left and right and in the center bottom, and all the computing is done in the upper middle. Every clock we read 256 bytes of activation data out of the SRAM array and 128 bytes of weight data out of the SRAM array, and we combine them in a 96-by-96 multiply-add array that performs over 9,000 multiply-adds per clock. At 2 gigahertz, that's a total of 36.8 tera-operations per second.
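The arithmetic each cell of the array performs, an 8-bit integer multiply feeding an integer accumulator, followed by ReLU when the result is unloaded, can be sketched in plain Python. Signed 8-bit operands and the function names are assumptions made for illustration; the talk does not specify signedness or accumulator width.

```python
def int8_dot(activations: list[int], weights: list[int]) -> int:
    """Dot product of 8-bit integers accumulated in a wider integer."""
    if len(activations) != len(weights):
        raise ValueError("operands must have the same length")
    for value in (*activations, *weights):
        if not -128 <= value <= 127:  # assumed signed 8-bit range
            raise ValueError("operand does not fit in signed 8 bits")
    accumulator = 0  # Python ints are unbounded; hardware would use a fixed wide register
    for a, w in zip(activations, weights):
        accumulator += a * w  # one multiply-add per element
    return accumulator


def relu(x: int) -> int:
    """Rectified linear unit: negative values become zero."""
    return max(0, x)
```

For example, `relu(int8_dot([1, 2, 3], [4, 5, 6]))` evaluates 1*4 + 2*5 + 3*6 and leaves the positive result unchanged.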

Now, when we're done with a dot product, we unload the engine, shifting the data out across the dedicated ReLU unit, optionally across a pooling unit, and finally into a write buffer where all the results are aggregated, and then we write 128 bytes per cycle back into the SRAM. All of this runs continuously, so we are doing dot products while we unload previous results, do pooling, and write back into memory. If you add it all up at 2 gigahertz, you need one terabyte per second of SRAM bandwidth to support all that work, so the hardware supplies one terabyte per second of bandwidth per engine. There are two on the chip, so two terabytes per second.

The accelerator has a relatively small instruction set. We have a DMA read operation to bring data in from memory and a DMA write operation to push results back out to memory. We have three dot-product-based instructions: convolution, deconvolution, and inner product. Then there are two relatively simple ones: scale, which is a one-input, one-output operation, and an element-wise operation, which takes two inputs and produces one output. And then, of course, stop when you're done.

We had to develop a neural network compiler for this. We take the neural network that has been trained by our vision team, just as it would be deployed in the older cars, and compile it for use on the new accelerator. The compiler does layer fusion, which allows us to maximize the computing each time we read data out of the SRAM and put it back. It also does some smoothing so that the demands on the memory system aren't too lumpy. We also do channel padding to reduce bank conflicts, and we do bank-aware SRAM allocation. This is a case where we could have put more hardware in the design to handle bank conflicts, but by pushing it into software we save hardware and power at the cost of some software complexity. We also automatically insert DMAs into the graph so that data arrives just in time for computing without having to stall the machine. At the end we generate all the code, generate all the weight data, compress it, and add a CRC checksum for reliability.

To run a program, all the neural network descriptions, our programs, are loaded into SRAM at startup, and then they sit there ready to go all the time. To run a network, you program the address of the input buffer, which is presumably a new image that just arrived from a camera; you set the output buffer address; you set the pointer to the network weights; and then you set go. The machine then goes off and sequences through the entire neural network all by itself, usually running for a million or two million cycles, and when it's done you get an interrupt and can post-process the results.

So, moving on to results: we were aiming to stay under 100 watts.

This is measured data from cars driving around running the full Autopilot stack, and we're dissipating 72 watts, which is a little more power than the previous design, but with the dramatic improvement in performance it is still a pretty good answer. Of those 72 watts, about 15 watts are consumed running the neural networks. In terms of cost, the silicon cost of this solution is about 80% of what we were paying before, so we are saving money by switching to this solution. In terms of performance, we took the narrow-camera neural network I have been talking about, which has 35 billion operations, and ran it on the old hardware in a loop as fast as possible, and it delivered 110 frames per second. We took the same data and the same network, compiled it for the new FSD computer hardware, and using all four accelerators we can process 2,300 frames per second. That is a factor of 21.
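Two figures from this part of the talk can be checked with simple arithmetic: the SRAM bandwidth each engine needs, and the speedup over the previous hardware. The sketch assumes the per-clock traffic is the 256 bytes of activations and 128 bytes of weights read plus the 128 bytes of results written, all at 2 gigahertz; the function names are illustrative.

```python
def sram_bandwidth_tb_per_s(read_bytes_per_clock: int, write_bytes_per_clock: int,
                            clock_hz: float) -> float:
    """SRAM traffic per engine in terabytes per second."""
    return (read_bytes_per_clock + write_bytes_per_clock) * clock_hz / 1e12


def speedup(new_fps: float, old_fps: float) -> float:
    """Ratio of new to old frame rate."""
    return new_fps / old_fps


if __name__ == "__main__":
    per_engine = sram_bandwidth_tb_per_s(256 + 128, 128, 2e9)
    print("SRAM bandwidth per engine:", round(per_engine, 3), "TB/s")
    print("Speedup over previous hardware:", round(speedup(2300, 110), 1))
```

Both results land on the rounded values quoted in the talk: about one terabyte per second per engine and a factor of about 21.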

This is perhaps the most significant slide. It's night and day. I've never worked on a project where the performance increase was more than three, so this was pretty fun. If you compare it to, say, Nvidia's Drive Xavier solution, a single chip delivers 21 TOPS; our full self-driving computer, with two chips, delivers 144 TOPS.

So, to conclude, I think we have created a design that delivers outstanding performance, 144 TOPS for neural network processing. It has outstanding power performance: we managed to fit all of that performance into the thermal budget we had. It enables a fully redundant computing solution, it has a modest cost, and, really importantly, this FSD computer will enable a new level of safety and autonomy in Tesla vehicles without impacting their cost or range, something I think we are all looking forward to.

We'll do Q&A after each segment, so if people have questions about the hardware, they can ask right now. The reason I asked Pete to give a far more detailed explanation, perhaps more detailed than most people would appreciate, of the Tesla full self-driving computer is that at first it seems improbable. How could it be that Tesla, which had never designed a chip before, would design the best chip in the world? But that is objectively what has occurred, and not best by a small margin, best by a huge margin. It is in the cars right now. All Teslas being produced right now have this computer. We switched over from the Nvidia solution for S and X about a month ago, and we switched Model 3 about ten days ago. All cars being produced have all the hardware necessary for full self-driving.

I'll say that again: all Tesla cars being produced right now have everything necessary for full self-driving. All you need to do is improve the software, and later today you will drive the cars with the development version of the improved software and see for yourselves. Questions?

Trip Chowdhry, Global Equities Research. Very, very impressive in every respect. I took some notes. You are using the activation function ReLU, the rectified linear unit, but deep neural networks have multiple layers, and some algorithms use different activation functions for different hidden layers, like softmax or tanh.

Do you have the flexibility to incorporate different activation functions rather than ReLU in your platform? Then I have a follow-up.

Yes, we have support for tanh and sigmoid, for example.

One last question. On the process node, you mentioned 14 nanometers. I was wondering, wouldn't it make sense to go lower, maybe 10 nanometers two years down the road, or maybe seven?

At the time we started the design, not all of the IP we wanted to purchase was available in 10 nanometers, so we had to finish the design in 14.

It's maybe worth pointing out that we finished this design about one and a half to two years ago and began design of the next generation. We're not talking about the next generation today, but we're about halfway through it. All the things that are obvious for a next-generation chip, we're doing.

Hello. You talked about software as a piece. You did a great job; I was impressed. I understood ten percent of what you said, but I trust that it's in good hands. Thanks. So it looks like you've finished the hardware piece, and it was very hard to do. Now you have to do the software piece, which may be outside your expertise. How should we think about that software piece?

I couldn't ask for a better introduction to Andrej and Stuart. Are there any foundational questions on the chip part before the next part of the presentation, which is neural networks and software?

On the chip side: the last slide was 144 trillion operations per second versus 21 for Nvidia. Is that right? And maybe you can contextualize for a finance person why that gap is so important. Thanks.

Well, it's a factor of seven in performance delta, which means you can process seven times as many frames, or you can run neural networks that are seven times larger and more sophisticated. So it's a very big currency that you can spend on lots of interesting things to make the car better. I think Xavier's power usage is higher than ours. I don't know the exact Xavier power, but to the best of my knowledge the power requirements would increase by at least the same factor of seven, and costs would also increase by a factor of seven. So yes, power is a real problem, because it also reduces range. The auxiliary power is very high, and then you have to get rid of that power, so the thermal problem becomes really significant because you have to get rid of all that heat.

Thank you very much. We have a lot of questions, so if you don't mind the day running a bit long, we're going to do the drive demos afterwards. If anybody needs to pop out and do the drive demos a little sooner, you're welcome to do that. I want to make sure we answer your questions.

Yes, Pradeep Romani from UBS. Intel and AMD, to some extent, have started moving toward a chiplet-based architecture. I didn't notice a chiplet-based design here. Looking forward, do you think that is something that might be of interest to you from an architecture standpoint?

A chiplet-based architecture? We're not currently considering anything like that. I think it's most useful when you need to use different styles of technology, so if you want to integrate silicon germanium or DRAM technology on the same silicon substrate, that gets pretty interesting. But until the die size gets obnoxious, I wouldn't go there.

The strategy here, which started a little over three years ago, was to design and build a computer that is fully optimized for, and aimed at, full self-driving, and then write software that is designed to work specifically on that computer and get the most out of it. So the software is tailored to the hardware. It is a master of one trade: self-driving. Nvidia is a great company, but they have many customers, so when they apply their resources they need to make a generalized solution. We care about one thing, self-driving, so the chip was designed to do that incredibly well, and the software is also designed to run on that hardware incredibly well. I think the combination of the software and the hardware is unbeatable.

The chip is designed to process this video input. In case you used, say, lidar, would it be able to process that as well, or is it primarily for video?

As we will explain today, lidar is a fool's errand, and anyone relying on lidar is doomed. Doomed. Expensive sensors that are unnecessary. It's like having a whole bunch of expensive appendices. One appendix is bad; well, now put in a whole bunch of them? That's ridiculous. You'll see.

Just two questions, on power consumption. Is there a rule of thumb you can give us, that each watt reduces range by a certain percentage or a certain amount, just so we can get a sense of how much?

The target consumption of a Model 3 is 250 watt-hours per mile. How many miles it affects depends on the nature of the driving; in the city it would have a much bigger effect than on the highway. If you are driving for an hour in a city and you had a solution that, hypothetically, drew a kilowatt, you would lose four miles on a Model 3. So if you're only averaging, say, 12 miles an hour, that's like a 25 percent impact on range in the city. The power of the system has a massive impact on city range, which is where we think most of the robotaxi market will be, so power is extremely important.
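The rule of thumb in this answer can be worked through explicitly. The sketch below uses only the figures given (250 watt-hours per mile, a hypothetical 1 kilowatt computer, one hour of driving at an average of 12 miles per hour); the function names are illustrative.

```python
def miles_lost(computer_watts: float, hours: float, wh_per_mile: float) -> float:
    """Range consumed by the computer, expressed in miles of driving energy."""
    return computer_watts * hours / wh_per_mile


def range_impact_fraction(computer_watts: float, avg_speed_mph: float,
                          wh_per_mile: float) -> float:
    """Share of the total energy used in one hour that goes to the computer."""
    lost = miles_lost(computer_watts, 1.0, wh_per_mile)
    return lost / (avg_speed_mph + lost)


if __name__ == "__main__":
    print("Miles lost per hour:", miles_lost(1000, 1.0, 250))
    print("City range impact:", range_impact_fraction(1000, 12, 250))
```

At highway speeds the same computer power is spread over many more miles per hour, which is why the effect is much larger in the city.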

Thanks. What is the primary design objective of the next-generation chip?

We don't want to talk too much about the next-generation chip, but it will be at least, let's say, three times better than the current system. It's about two years away.

On the chip itself: you don't manufacture the chip, you contract that out. How much cost reduction does that saving represent in the total cost of the vehicle?

The 20% cost reduction I cited was the part cost reduction per vehicle. That was not a development cost.

Yes, but if you're making them in volume, is it a money saver to do it yourself?

Yes, a little bit.

I mean, most chips are made by contract manufacturers; most people don't make chips in their own fab. That's pretty unusual. We don't see any supply issue with the chip being mass-produced.

And the cost savings pay for the development?

The basic strategy, going to Elon, was that we would build this chip and it would reduce costs, and Elon said: times a million cars a year, deal.

That's correct.

Sorry, if there are really chip-specific questions, we can answer them; otherwise there will be a Q&A opportunity after Andrej's talk and after Stuart's talk, so there will be two more Q&A opportunities. This one is very chip-specific. I'll be here all afternoon.

Yes, and Pete will also be here at the end. Very good.

Thanks. On the die photo you showed, the neural processor takes up quite a bit of the die. I'm curious, is that your own design, or is there some external IP in there?

Yes, that was a custom design by Tesla.

And then I guess the follow-on would be that there's probably a fair amount of opportunity to reduce that footprint as you tweak the design?

It's actually quite dense already, so in terms of reducing it, I don't think so. We will greatly improve the functional capabilities in the next generation.

Okay. And then the last question: can you share where you are fabbing this part?

Oh, it's Samsung.

Thank you.

Tanaka Kapil. I'm just curious how defensible your chip technology and design are from an IP point of view, and I hope you aren't giving away a lot of the IP to the outside for free. Thank you.

We have filed on the order of a dozen patents on this technology. Fundamentally it's linear algebra, which I don't think you can patent. I'm not sure.

I think if somebody started today and they were really good, they might have something like what we have right now in three years, but in two years we'll have something three times better.

Talking about intellectual property protection: you have the best intellectual property, and some people just steal it for the fun of it.

I was wondering, if we look at some of the interactions with Aurora, where people in the industry believe intellectual property was stolen: I think the key ingredient you need to protect is the weights associated with the various parameters. Do you think your chip can do anything to prevent that, perhaps encrypting all the weights so that nobody even knows what the weights are at the chip level, so your intellectual property stays inside and no one can steal it? I would like to meet the person who could do that, because I would hire them in the blink of an eye. Yeah, it's a really hard problem. I mean, we do encrypt it, and it's a difficult thing to crack. If they can crack it, very good. And if they crack it and then also figure out the software and the neural network system and everything else, they could design it from scratch. Our intention is to prevent people from stealing all that stuff, and if they do, we hope that at least it will take them a long time. It will definitely take them a long time, yes.

I mean, I feel that if it were our goal to do that, how would we do it? It would be very difficult. But I think a very powerful, sustainable advantage for us is the fleet. Nobody has the fleet. Those weights are constantly being updated and improved based on billions of miles driven. Tesla has a hundred times more cars with full self-driving hardware than everyone else combined. By the end of this quarter we'll have 500,000 cars with the full eight-camera setup and twelve ultrasonic sensors. Some will still be on Hardware 2, but we still have the ability to collect data from them, and within a year we will have over a million cars with the full self-driving computer hardware, everything. Yeah, it's just a huge data advantage. It's similar to how the Google search engine has a big advantage because people use it, and people are effectively programming Google with their queries and the results. Just to press on that, and please reframe the question I'm asking if it's not appropriate: when we talk to Waymo or Nvidia, they speak with the same conviction about their leadership because of their proficiency in simulating miles driven.

Can you talk about the advantage of having real miles versus simulated miles? I think they have expressed that by the time you drive a million miles, they can simulate a billion, and that no Formula One racing driver, for example, could successfully complete a real-world track without driving in a simulator first. Can you tell us what advantages you perceive in ingesting data from real-world miles versus simulated miles? Absolutely. We have a pretty good simulation too, but it just doesn't capture the long tail of weird things that happen in the real world.

If the simulation fully captured the real world, well, I mean, that would be proof that we are living in a simulation. I don't think so. I wish. But simulations don't capture the real world, they just don't. The real world is really weird and messy. You need the cars on the road, and we actually get into that in Andrej's and Stuart's presentations. Yeah, OK, let's move on. The last question was actually a very good segue, because one thing to remember about our FSD computer is that it can run much more complex neural networks for much more accurate image recognition. To talk to you about how we actually get that image data and how we analyze it, we have our senior director of AI, Andrej Karpathy, who will explain all that to you.

Andrej has a PhD from Stanford University, where he studied computer science focusing on recognition and deep learning. Andrej, why don't you come up and do your own introduction? There are a lot of Stanford PhDs, that's not what's important. Yes, we don't care. Thanks. Andrej started the computer vision class at Stanford; that's much more important, that's what matters. So if you can, please talk about your background in a way that's not shy, just say what you've done. Yes, of course. I think I've been training neural networks for basically a decade now, and these neural networks weren't really used in industry until maybe five or six years ago, so I've been training them for a while. That included institutions like Stanford, OpenAI, and Google, and I really just trained a lot of neural networks, not only for images but also for natural language, and designed architectures that combine those two modalities for my PhD. Oh yeah, and at Stanford I taught the Convolutional Neural Networks class, so I was the lead instructor for that class.

I actually started the course and designed the entire curriculum. At first it was about 150 students, and then it grew to 700 students over the next two or three years, so it's a very popular class; it's one of the largest classes at Stanford right now, so that was also very successful. I mean, Andrej is really one of the best computer vision people in the world, possibly the best. OK, thank you. Hi everyone. Pete told you all about the chip we designed that runs neural networks in the car. My team is responsible for the training of these neural networks, and that includes all of the data collection from the fleet, the neural network training, and then some of the deployment onto that chip.

So what do these neural networks do, and what exactly runs in the car? What we're looking at here is a stream of videos from all around the vehicle. These are eight cameras that send us video, and these neural networks are watching those videos, processing them, and making predictions about what they are seeing. Some of the things that interest us, and that you're seeing in this visualization, are lane line markings, other objects, the distances to those objects, what we call drivable space, shown in blue, which is where the car is allowed to go, and many other predictions like traffic lights, traffic signs, and so on.

Now for my talk, I'm going to go roughly in three stages. First I'm going to give you a brief introduction to neural networks, how they work and how they are trained. I need to do this because I need to explain in the second part why it is so important that we have the fleet, and why it is a key enabling factor for actually training these networks and making them work effectively on the roads. In the third stage I will talk about vision and lidar and how we can estimate depth from vision alone. The core problem these networks are solving in the car is visual recognition.

For you and me, this is a very simple problem. You can look at these four images and see that they contain a cello, an iguana, or a pair of scissors. This is very simple and effortless for us. That is not the case for computers, and the reason is that to a computer these images are really just a massive grid of pixels, and at each pixel you have the brightness value at that point. So instead of just seeing an image, a computer actually gets a million numbers on a grid that tell it the brightness values at all positions. It really is the Matrix, yes. So we have to go from that grid of pixels and brightness values to high-level concepts like iguana and so on. As you can imagine, this iguana has a certain pattern of brightness values, but iguanas can actually take on many appearances: different poses, different brightness conditions, different backgrounds, and different crops of that iguana. We have to be robust across all those conditions, and we have to understand that all those different brightness patterns actually correspond to iguanas.
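
To make the "grid of brightness values" point concrete, here is a minimal Python sketch. The 4x4 image and the helper names (`shift_brightness`, `count_changed_pixels`) are made up for illustration and are not from the talk. It shows that a small lighting change alters every number the computer receives, even though a person would call it the same picture.

```python
# A tiny grayscale "image": each entry is a brightness value from 0 to 255.
# Real camera frames hold on the order of a million such values.
image = [
    [12, 15, 200, 210],
    [10, 18, 205, 220],
    [9, 14, 198, 215],
    [11, 16, 202, 212],
]


def shift_brightness(img, delta):
    """Return a copy of img with every pixel changed by delta, clamped to 0..255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def count_changed_pixels(a, b):
    """Count grid positions where the two images hold different numbers."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Same scene under slightly brighter lighting (hypothetical +30 shift).
brighter = shift_brightness(image, 30)
height, width = len(image), len(image[0])
changed = count_changed_pixels(image, brighter)
print(f"{height}x{width} grid, {changed} of {height * width} numbers changed")
```

A recognizer that compared raw numbers would treat the two grids as unrelated, which is why robustness to pose, lighting, background, and crop has to be learned.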

Now, the reason you and I are very good at this is that we have a massive neural network inside our heads that processes those images. Light hits the retina and travels to the back of your brain, to the visual cortex, and the visual cortex consists of many neurons that are wired together and that do all the pattern recognition on top of those images. Really over the last, I would say, five years or so, the state-of-the-art approaches to processing images with computers have also started to use neural networks, in this case artificial neural networks. These artificial neural networks, and this is just a cartoon diagram, are a very rough mathematical approximation of your visual cortex. We really do have neurons, and they are connected to each other. Here I'm only showing three or four neurons in three or four layers, but a typical neural network will have tens to hundreds of millions of neurons, and each neuron will have a thousand connections, so these are really large pieces of almost simulated tissue. What we can do is take those neural networks and show them images. For example, I can feed my iguana into this neural network, and the network will make predictions about what it sees.

Now, at first, these neural networks are initialized completely randomly, so the connection strengths between all those different neurons are completely random, and therefore the predictions of that network will also be completely random. It might think that you are actually looking at a boat right now, and that it is very unlikely that this is actually an iguana. During the training process, what we are really doing is this: we know that this is actually an iguana, we have a label, so we are basically saying that we'd like the probability of iguana to be higher for this image and the probability of all the other things to go down. Then there is a mathematical process called backpropagation, with stochastic gradient descent, that allows us to propagate that signal back through those connections and update each of those connections just a small amount. Once the update is complete, the probability of iguana for this image will go up a little bit, so it might become 14%, and the probability of the other things will go down. And of course we don't do this just for this single image. We actually have entire large data sets that are labeled, so we have many images.

Typically you might have millions of images and thousands of labels or so, and you are doing forward and backward passes over and over again. You show the computer an image, it has an opinion, then you tell it the correct answer, and it gets tuned a little bit. You repeat this millions of times, and sometimes you show the computer the same image hundreds of times too. Training the network will typically take a few hours or a few days, depending on the size of the network you are training, and that is the process of training a neural network.
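
The training process described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the network that runs in the car: a single linear layer with a softmax stands in for a deep network, and the four-number "images", the class list, and the learning rate are all invented. It shows one update nudging the probability of the labeled class upward, and repeated passes over a labeled set pushing it much higher.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Class probabilities for input x under a single linear layer."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(scores)


def sgd_step(weights, x, label, lr=0.1):
    """One small update: raise the probability of `label`, lower the rest."""
    probs = predict(weights, x)
    for k, row in enumerate(weights):
        # Gradient of the cross-entropy loss with respect to the score of class k.
        grad = probs[k] - (1.0 if k == label else 0.0)
        for i in range(len(row)):
            row[i] -= lr * grad * x[i]


random.seed(0)
classes = ["iguana", "boat", "scissors"]
# Random initialization: the first predictions are essentially arbitrary.
weights = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in classes]

# Made-up labeled examples: (features, index of the correct class).
dataset = [
    ([1.0, 0.2, 0.0, 0.1], 0),
    ([0.1, 1.0, 0.3, 0.0], 1),
    ([0.0, 0.1, 1.0, 0.9], 2),
]

x, label = dataset[0]
before = predict(weights, x)[label]
sgd_step(weights, x, label)
after_one = predict(weights, x)[label]

# Show the same labeled examples to the model many times.
for epoch in range(200):
    for xi, yi in dataset:
        sgd_step(weights, xi, yi)
after_training = predict(weights, x)[label]

print(f"P(iguana): start {before:.2f}, one step {after_one:.2f}, trained {after_training:.2f}")
```

In a deep network the same gradient signal is carried back through every layer by backpropagation; the single layer here only keeps the arithmetic visible.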

Now, there is something very unintuitive about the way neural networks work that I really have to dwell on, and that is that they really do require a lot of these examples. They really start from scratch, they don't know anything, and it's very hard to wrap your head around this. As an example, here's a cute dog, and you probably don't know the breed of this dog, but the correct answer is that it is a Japanese spaniel. Now all of us are looking at this and seeing a Japanese spaniel, and we think, I get it, I kind of understand what this Japanese spaniel looks like.

And if I show you some more pictures of other dogs, you can probably pick out the other Japanese spaniels here. In particular, those three look like a Japanese spaniel and the others don't. You can do this very quickly, and you needed one example. But computers don't work like that. They actually need a ton of data on Japanese spaniels. So this is a grid of Japanese spaniels, thousands of examples showing them in different poses, different brightness conditions, different backgrounds, different crops. You really need to teach the computer from all the different angles what this Japanese spaniel looks like, and it really requires all that data to make it work; otherwise the computer can't pick up on that pattern automatically. So what does this all imply for the driving setting? Of course, we don't care too much about dog breeds. Maybe we will at some point, but for now we really care about lane line markings, objects, where they are, where we can drive, and so on. The way we do this is that we don't have labels like iguana for images, but we have images from the fleet like this, and we are interested in, for example, lane line markings. So a human typically goes into an image and, using a mouse, annotates the lane line markings. Here is an example of an annotation that a human could create as a label for this image, and it says that this is what you should be seeing in this image.

These are the lane line markings. Then what we can do is go to the fleet and ask for more images. If you do a naive job of this and just ask for random images, the fleet might respond with images like these, typically going forward along some highway. This is what you might get as a random collection, and we would annotate all that data. If you are not careful and only annotate a random distribution of this data, your network will pick up on that random distribution and work only in that regime. Now suppose you show it a slightly different example: here is an image where the road actually curves, in a slightly more residential neighborhood.

If you show this image to the neural network, the network might make a wrong prediction. It might say, OK, well, I have seen many times that on highways the lanes just go forward, so here is a possible prediction. Of course this is very wrong, but you can't blame the neural network. It doesn't know whether the tree on the left matters or not, it doesn't know whether the car on the right matters for the lane line or not, and it doesn't know whether the buildings in the background matter or not. It really starts completely from scratch. You and I know that the truth is that none of those things matter. What really matters is that there are some white lane line markings over there, with a vanishing point, and the fact that they curve a little should alter the prediction. Except there is no mechanism by which we can just tell the neural network, hey, those lane line markings actually matter. The only tool in the toolbox we have is labeled data. So what we do is take images like this, where the network fails, and label them correctly.

In this case, we'll mark the lane as turning to the right, and then we need to feed a lot of images like this to the neural network. Over time the neural network will basically pick up on the pattern that those things there don't matter but those lane line markings do, and it will learn to predict the correct lane. So what is really critical is not just the scale of the data set. We don't just want millions of images; we really need to do a very good job of covering the possible space of things that the car might encounter on the roads. We have to teach the computer how to handle scenarios where it's night and wet, where you have all these different specular reflections, and as you can imagine, the brightness patterns in these images will look very different. We have to teach the computer how to deal with shadows, how to deal with forks in the road, how to deal with large objects that might take up most of the image, how to deal with tunnels, or how to deal with construction sites. In all these cases there is again no explicit mechanism to tell the network what to do. We only have massive amounts of data. We want to source all those images and annotate the correct lines, and the network will pick up on the patterns. Large and varied data sets are basically what make these networks work.

All right, this is not just a finding for us here at Tesla; it's a ubiquitous finding across the industry. Experiments and research from Google, from Facebook, from Baidu, and from Alphabet's DeepMind all show similar plots, in which neural networks really love data and they love scale and variety. As you add more data, these neural networks start to perform better and get higher accuracies for free, so more data just makes them work better. Now, several companies have pointed out that we could potentially use simulation to achieve the scale of the data sets, and in a simulator we are in charge of a lot of the conditions, so maybe some variety can be achieved there. That was also brought up in the questions just before this. Now at Tesla, this is actually a screenshot of our own simulator.

We use simulation extensively to develop and evaluate the software. We've even used it for training quite successfully. But really, when it comes to training data for neural networks, there is no substitute for real data. Simulations have a lot of trouble modeling the appearance, the physics, and the behaviors of all the agents around you. Here are some examples to really drive that point home: the real world really throws a lot of crazy stuff at you. In this case, for example, we have very complicated environments with snow, with trees, with wind. We have various visual artifacts that are hard to simulate, potentially. We have complicated construction sites, bushes, and plastic bags that can move around with the wind. We have complicated construction sites that could feature a lot of people, children, and animals all mixed together, and simulating how those things interact and flow through this construction zone might actually be completely intractable. It's not about the movement of any one pedestrian in there; it's about how they respond to each other, how those cars respond to each other, and how they respond to you driving in that environment. All of that is really hard to simulate. It's almost like you have to solve the autonomous driving problem just to simulate the other cars in your simulation, so it's really complicated. We have dogs, exotic animals, and in some cases it's not even that you can't simulate it, it's that you can't even come up with it. For example, I didn't know you could have a truck on a truck like that, but in the real world you find this, and you find a lot of other things that are very hard to come up with. So really, the variety I'm seeing in the data coming from the fleet is crazy compared to what we have in a simulator.

We have a really good simulator, but with simulation you're fundamentally grading your own homework. If you know you're going to simulate something, fine, you can definitely solve for it, but as Andrej says, you don't know what you don't know. The world is very strange and has millions of edge cases, and if someone could produce a self-driving simulation that accurately matches reality, that in itself would be a monumental achievement of human capability. They can't; there's no way. Yeah. So I think the three points I've really tried to make so far are these: to get neural networks to work well, you need three essentials. You need a large data set, a varied data set, and a real data set, and if you have those capabilities, you can train your networks and make them work very well. So why is Tesla in such a unique and interesting position to really get these three essentials right? The answer to that, of course, is the fleet. We can actually source data from it and make our neural network systems work extremely well. So let me walk you through a concrete example, making the object detector work better, to give you an idea of how we develop these networks, how we iterate on them, and how we make them work over time. Object detection is something we care a lot about.

We would like to put bounding boxes around, say, the cars and the objects here, because we need to track them and understand how they might move around. So again we could ask human annotators to give us some annotations for these, and the humans could go in and tell us that those patterns over there are cars and bicycles and so on, and you can train your neural network on this. But if you're not careful, the neural network will make mispredictions in some cases. For example, if we stumble upon a car like this that has a bike on the back of it, then the neural network, when I joined, would actually create two detections.

It would create a car detection and a bicycle detection, and that's actually kind of correct, because I guess both of those objects do exist. But for the purposes of the controller and the planner downstream, you really don't want to deal with the fact that this bicycle goes along with the car. The truth is that the bike is attached to that car, so in terms of objects on the road there is a single object, a single car. What you'd like to do now is annotate a lot of those images as just a single car. So the process we go through internally in the team is that we take this image, or a few images that show this pattern, and we have a machine learning mechanism by which we can ask the fleet to source us examples that look like that. The fleet might respond with images that contain those patterns. As an example, these six images might come from the fleet; they all contain bikes on the backs of cars. We would go in and annotate all of those as just a single car, and then the performance of that detector actually improves, and the network internally understands that, hey, when the bike is attached to the car, that's actually just a single car. It can learn that given enough examples, and that's how we fixed that problem.

I will mention that I've talked quite a bit about sourcing data from the fleet. I just want to briefly point out that we designed this from the beginning with privacy in mind, and all the data we use for training is anonymized. Now, the fleet doesn't just respond with bikes on the backs of cars. We look for all kinds of things, all the time. For example, we look for boats, and the fleet can respond with boats. We look for construction sites, and the fleet can send us many construction sites from around the world. We look for even slightly rarer cases; for example, finding debris on the road is very important to us, so these are examples of images that have streamed to us from the fleet that show tires, cones, plastic bags, and things like that. If we can source these at scale, we can annotate them correctly, and the neural network will learn how to deal with them in the world.

Here's another example, animals, of course. That's also a very rare occurrence, but we want the neural network to really understand what's going on here, that these are animals, and we want to deal with that correctly. So to summarize, the process by which we iterate on neural network predictions looks something like this. We start with a seed data set that was potentially sourced at random. We annotate that data set, then we train the neural network on that data set and put it in the car. Then we have mechanisms by which we notice inaccuracies in the car, when this detector may be misbehaving. For example, if we detect that the neural network might be uncertain, or if there is a driver intervention, or any of those settings, we can create trigger infrastructure that sends us data on those inaccuracies. For example, if we don't perform very well on lane line detection in tunnels, then we can notice that there is a problem in tunnels. That image would enter our unit tests, so we can verify that we have actually fixed the problem over time. But now, to fix this inaccuracy, you need many more examples that look like that. So we ask the fleet to send us many more tunnels, we label all those tunnels correctly, we incorporate them into the training set, we retrain the network, redeploy, and iterate the cycle over and over again. We refer to this iterative process by which we improve these predictions as the data engine: iteratively deploying something, potentially in shadow mode, sourcing inaccuracies, and incorporating them into the training set, over and over again. We do this for basically all the predictions of these neural networks.
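
One turn of the loop described above can be sketched as follows. This is only an illustration of the control flow under assumptions: the `Clip` record, the scenario tags, the confidence score, and the 0.6 threshold are all hypothetical, and the real trigger infrastructure, labeling, and retraining steps are not described in enough detail in the talk to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    clip_id: int
    scenario: str            # hypothetical tag, e.g. "highway" or "tunnel"
    model_confidence: float  # hypothetical score from the deployed network
    driver_intervened: bool


def is_trigger(clip, min_confidence=0.6):
    """Flag clips where the network looks uncertain or the driver took over."""
    return clip.model_confidence < min_confidence or clip.driver_intervened


def data_engine_round(fleet_clips, training_set, regression_tests):
    """One pass of the loop: notice failures, request similar clips, grow the set."""
    flagged = [c for c in fleet_clips if is_trigger(c)]
    weak_scenarios = {c.scenario for c in flagged}
    # Keep the failures so a later model can be checked against them.
    regression_tests.extend(flagged)
    # "Ask the fleet" for more clips of the weak scenarios not already collected.
    known = {c.clip_id for c in training_set}
    requested = [
        c for c in fleet_clips
        if c.scenario in weak_scenarios and c.clip_id not in known
    ]
    # In practice these clips would be labeled before retraining.
    training_set.extend(requested)
    return weak_scenarios


fleet = [
    Clip(1, "highway", 0.95, False),
    Clip(2, "tunnel", 0.40, False),   # network uncertain
    Clip(3, "tunnel", 0.90, True),    # driver intervention
    Clip(4, "tunnel", 0.85, False),   # no failure, but same weak scenario
    Clip(5, "highway", 0.90, False),
]
training_set = [fleet[0]]
regression_tests = []
weak = data_engine_round(fleet, training_set, regression_tests)
print(weak, [c.clip_id for c in training_set])
```

Note that clip 4 is pulled in even though nothing went wrong on it: once tunnels are known to be a weak spot, the point is to gather many more tunnel examples, not only the failures.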

So far I've talked a lot about explicit labeling: as I mentioned, we ask people to annotate data. This is an expensive process in terms of time, and, yeah, it's just an expensive process, so these annotations can of course be very expensive to obtain. What I also want to talk about is really utilizing the power of the fleet. You don't want to go through the human annotation bottleneck; you want to just stream in data and annotate it automatically, and we have multiple mechanisms by which we can do this. One example of a project we worked on recently is cut-in detection. You're driving down the highway, someone is on the left or on the right, and they cut in front of you into your lane. Here's a video that shows Autopilot detecting that this car is intruding into our lane.

Now, of course, we would like to detect a cut-in as quickly as possible. The way we approach this problem is that we don't write explicit code for whether the left blinker is on or the right blinker is on, tracking the car over time and seeing whether it is moving horizontally. We actually use a fleet learning approach. The way this works is that we ask the fleet to send us data every time it sees a car transition from the right lane to the center lane, or from the left to the center. Then we rewind time backwards, and we can automatically annotate that, hey, that car will cut in front of you in 1.3 seconds, and then we can use that to train the neural net.

The neural network will then automatically pick up on many of these patterns. For example, the cars are usually yawed and then moving this way, and maybe the blinker is on. All of that happens internally inside the neural network, just from these examples. So we ask the fleet to automatically send us all this data. We can get half a million or so images, all of them annotated for cut-ins, and then we train the network. Then we take this cut-in network and deploy it to the fleet, but we don't turn it on yet. We run it in shadow mode, and in shadow mode the network is always making predictions.
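
The "rewind time and annotate automatically" step for cut-ins can be sketched like this. The track format, lane names, and the `horizon_s` parameter are assumptions made for the example; the talk only says that a lane transition is detected after the fact and earlier frames are labeled with the time remaining until the cut-in.

```python
def label_cut_ins(track, ego_lane="center", horizon_s=2.0):
    """Label earlier observations with the time remaining until a cut-in.

    track: list of (timestamp_s, lane) observations for one neighboring car.
    Returns a list of (timestamp_s, seconds_until_cut_in or None).
    """
    # Step 1: find the moment the car first moves into our lane.
    cut_in_time = None
    for (_, lane_prev), (t, lane) in zip(track, track[1:]):
        if lane_prev != ego_lane and lane == ego_lane:
            cut_in_time = t
            break

    # Step 2: rewind and label every earlier frame within the horizon.
    labels = []
    for t, _ in track:
        if cut_in_time is not None and 0 < cut_in_time - t <= horizon_s:
            labels.append((t, round(cut_in_time - t, 2)))
        else:
            labels.append((t, None))
    return labels


# Hypothetical track: a car in the right lane that moves into ours at t = 2.0 s.
track = [
    (0.0, "right"), (0.5, "right"), (1.0, "right"),
    (1.5, "right"), (2.0, "center"), (2.5, "center"),
]
labels = label_cut_ins(track, horizon_s=1.5)
for timestamp, seconds_left in labels:
    print(timestamp, seconds_left)
```

No person touches these labels: the outcome that arrives later in the recording supervises the frames that came before it.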

Hey, I think this vehicle is going to cut in; from the way it looks, this vehicle is going to cut in. Then we look for mispredictions. As an example, this is a clip we had from shadow mode of the cut-in network. It's a little hard to see, but the network thought that the vehicle right ahead of us and on the right was going to cut in, and you can see that it's flirting a little with the lane line, trying to encroach a little bit. The network got excited and thought that vehicle was going to cut in and end up in our center lane. That turned out to be incorrect; the vehicle did not actually do that. So what we do now is just turn the data engine. The network ran in shadow mode, it's making predictions, and it generates some false positives and some false negative detections: sometimes we got overexcited, and sometimes we missed a cut-in when it actually happened. All of those create a trigger that streams to us, and that gets incorporated for free, with no humans harmed in the process of labeling this data, into our training set. We retrain the network and redeploy it in shadow mode, and we can spin this a few times, always looking at the false positives and negatives coming from the fleet. Once we're happy with the false positive and false negative ratio, we actually flip a bit and let the car act on that network.
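
The shadow-mode comparison above amounts to checking silent predictions against what actually happened. The sketch below assumes a log of (predicted, actual) boolean pairs and uses invented 5% thresholds for the enable decision; neither the log format nor the thresholds come from the talk.

```python
def shadow_mode_report(events, max_fp_rate=0.05, max_fn_rate=0.05):
    """Compare silent predictions with outcomes.

    events: list of (predicted_cut_in, actual_cut_in) boolean pairs logged
    while the network runs without controlling the car.
    The rate thresholds are illustrative only.
    """
    tp = sum(1 for p, a in events if p and a)
    tn = sum(1 for p, a in events if not p and not a)
    false_positives = [i for i, (p, a) in enumerate(events) if p and not a]
    false_negatives = [i for i, (p, a) in enumerate(events) if not p and a]
    fp_rate = len(false_positives) / max(1, len(false_positives) + tn)
    fn_rate = len(false_negatives) / max(1, len(false_negatives) + tp)
    return {
        # Every disagreement becomes new, automatically labeled training data.
        "retrain_on": sorted(false_positives + false_negatives),
        "fp_rate": fp_rate,
        "fn_rate": fn_rate,
        "ready_to_enable": fp_rate <= max_fp_rate and fn_rate <= max_fn_rate,
    }


# Hypothetical log: index 1 is an overexcited prediction, index 3 a missed cut-in.
events = [
    (True, True), (True, False), (False, False), (False, True),
    (True, True), (False, False), (False, False), (True, True),
]
report = shadow_mode_report(events)
print(report)
```

Here both error rates come out at 25%, so the network would stay in shadow mode and the two disagreements would be added to the training set for the next round.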

So you may have noticed that we shipped one of our first versions of a cut-in detector about three months ago, I think. If you've noticed that the car is much better at detecting cut-ins, that is fleet learning operating at scale. Yes, it actually works quite nicely. So that's fleet learning: no humans were harmed in the process, it's just a lot of neural network training based on data, a lot of shadow mode, and looking at those results. It's essentially as if everyone is training the network all the time. Whether Autopilot is on or off, the network is being trained; every mile driven by a car with Hardware 2 or above trains the network.

Yeah. Another interesting way that we use this in a fleet learning scheme, and the other project I'm going to talk about, is path prediction. While you're driving the car, what you're actually doing is annotating the data, because you are steering the wheel and telling us how to traverse different environments. What we're looking at here is a person in the fleet who took a left through an intersection. We have the full video from all the cameras, and we know the path that this person took thanks to the GPS, the inertial measurement unit, the wheel angle, and the wheel ticks.

We put all that together and understand the path that this person took through this environment, and then of course we can use this as supervision for the network. We just source a lot of this from the fleet and train a neural network on those trajectories, and then the neural network predicts paths just from that data. What this is generally called is imitation learning: we're taking human trajectories from the real world and just trying to imitate how people drive in the real world. We can also apply the same data engine to all of this and make it work over time. Here is an example of path prediction going through a complicated environment. What you are seeing here is a video, and we are overlaying the network's predictions, so this is the path that the network would follow, in green. Yes, maybe the crazy thing is that the network is predicting paths that it can't even see, with incredibly high accuracy. It can't see around the corner, but it says the probability of that curve is extremely high, so that's the path, and that's key.
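
As a rough illustration of how a driven path can be recovered from vehicle motion and used as a free label, here is a dead-reckoning sketch. It is a deliberate simplification and an assumption on my part: the talk mentions combining GPS, the inertial measurement unit, wheel angle, and wheel ticks, whereas this example integrates only speed and yaw rate with simple Euler steps. The function name and sample values are invented.

```python
import math


def integrate_path(samples, dt):
    """Recover a driven path by integrating speed and yaw rate.

    samples: list of (speed_m_per_s, yaw_rate_rad_per_s), one per time step.
    Returns (x, y) points in the frame where the car started, facing +x.
    """
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for speed, yaw_rate in samples:
        # Move along the current heading, then update the heading.
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        heading += yaw_rate * dt
        path.append((x, y))
    return path


# Hypothetical drive: one second straight at 10 m/s, then two seconds turning left.
straight = [(10.0, 0.0)] * 10
left_turn = [(5.0, 0.5)] * 20
path = integrate_path(straight + left_turn, dt=0.1)
print(f"end of straight: {path[10]}, end of turn: {path[-1]}")
```

Paired with the camera frames from the start of the clip, a path like this is the target the network would be trained to imitate, with no manual annotation involved.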

You'll see it in cars today, but we'll turn on augmented vision so you can see the lane lines and the path predictions of the cars overlaid on the video. Yes, there is actually more going on under the hood, which you might even find scary, and of course there are many details I'm skipping over. You might not want to annotate all the drivers; you might want to imitate just the better drivers, and there are a lot of technical ways that we actually slice and dice that data. But the interesting thing here is that this prediction is actually a 3D prediction that we project back onto the image. The path forward is a three-dimensional thing that we're just rendering in 2D, but we know about the slope of the ground from all of this, and that's actually extremely valuable for driving. By the way, this path prediction is actually live in the fleet today.

On a cloverleaf, for example: if you were on a cloverleaf on the highway until about five months ago, your car couldn't do cloverleafs. Now it can; that's path prediction running live on your cars. We shipped this a while ago, and today you get to experience it when going through intersections. A big component of how we traverse intersections in your drives today comes from a prediction trained on automatic labels. What I have talked about so far are really the three key components of how we iterate on network predictions and how we make them work over time: you need a data set that is large, varied, and real.

We can really achieve that here at Tesla, and we do it through the scale of the fleet, the data engine, shipping things in shadow mode, iterating on that loop, and potentially even using fleet learning, where no human annotators are harmed in the process and the data is used automatically. We can really do that at scale. In the next section of my talk I'll talk specifically about depth perception using vision only. You may be familiar with the fact that there are at least two kinds of sensors in a car: one is the vision cameras, which only get pixels, and the other is lidar, which many companies also use and which gives you point measurements of the distance around you.

What I would like to point out first is that all of you came here, many of you drove here, and you used your neural network and your vision. You were not shooting lasers out of your eyes, and you still ended up here. Clearly, the human neural network derives distance and all the measurements in the 3D understanding of the world from vision alone. It actually uses multiple cues to do this, so I'll briefly go over a few of them to give you a rough idea of what's going on inside. As an example, we have two eyes pointing forward, so you get two independent measurements of the world in front of you at each time step, and your brain stitches this information together to come up with a depth estimate, because you can triangulate any point across those two viewpoints.
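For two parallel, forward-facing viewpoints, the triangulation mentioned here reduces to the textbook relation depth = focal length × baseline / disparity. A minimal sketch, assuming an idealized rectified stereo pair; the focal length and the 6.5 cm baseline (roughly the spacing of human eyes) are illustrative values, not figures from the talk.

```python
def stereo_depth(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth (m) of a point seen at column x_left in the left view and
    x_right in the right view of a rectified stereo pair."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("a point in front of the cameras has positive disparity")
    return focal_px * baseline_m / disparity


# The same point appears 13 px further left in the right view.
depth_m = stereo_depth(focal_px=1000.0, baseline_m=0.065,
                       x_left_px=520.0, x_right_px=507.0)
print(depth_m)
```

Because depth is inversely proportional to disparity, distant points produce very small pixel shifts, which is one reason the other cues described next matter.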

Many animals instead have eyes placed on the sides, so they have very little overlap in their visual fields, and they typically use structure from motion. The idea is that they move their heads, and because of the movement they actually get multiple observations of the world and can triangulate depths again. And even with one eye closed and completely still, you still have some sense of depth perception. If you did this, I think you would still notice me coming two meters toward you or going a hundred meters back, and that's because there are many very strong monocular cues that your brain also takes into account.

This is an example of a fairly common visual illusion: you know these two blue bars are identical, but your brain, the way it stitches the scene together, just expects one of them to be bigger than the other because of the vanishing lines in this image. So your brain does a lot of this automatically, and artificial neural networks can as well. Let me give you three examples of how depth perception can be achieved from vision alone: one classical approach and two that are based on neural networks. So here is a video. I think this is San Francisco, from a Tesla, so these are our cameras sensing, and we're looking at everything.

I'm only showing the main camera, but all the cameras are on, all eight cameras of the Autopilot. If you only have this six-second clip, what you can do is stitch this environment together in 3D using multi-view stereo techniques. Oops, this is supposed to be a video. Is it not a video? Ah, here we go. This is the 3D reconstruction of those six seconds of that car driving down that road, and you can see that this information is purely recoverable, and very well so, just from videos. Generally speaking, that's through a process of triangulation and, as I mentioned, multi-view stereo. We've applied similar techniques, a little more sparsely and roughly, in the car as well. So it's remarkable that all that information is actually there in the sensor, and it's just a matter of extracting it. The other project I want to talk about briefly builds on what I mentioned about neural networks.

Neural networks are very powerful visual recognition engines, and if you want them to predict depth, then you need, for example, to source depth labels, and then they can do it extremely well. There's nothing limiting the networks in predicting this monocular depth except the label data. An example project that we've looked at internally uses the forward-facing radar, shown in blue. That radar looks out and measures the depths of objects, and we use it to annotate what the vision sees, the bounding boxes that come out of the neural networks. So instead of human annotators telling you, okay, this car in this bounding box is about 25 meters away, you can annotate that data much better using sensors. Sensor annotation from, for example, radar is pretty good at that distance: you can annotate with it, and then you can train your neural network on it. If you have enough of this data, the neural network becomes very good at predicting those patterns. Here's an example of predictions from that. In circles I'm showing radar objects, and the cuboids that come out here are purely from vision: the cuboids are just coming out of vision, and the depth of those cuboids is learned from radar sensor annotation. If this works very well, you'll see that the circles in the top-down view match the cuboids, and they do. That's because neural networks are very proficient at predicting depths. They can learn the different sizes of vehicles internally, they know how big those vehicles are, and they can actually derive the depth from that quite accurately.
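A rough sketch of what "sensor annotation" could look like: each radar return is projected into the image, and a vision bounding box that contains it inherits the radar range as its depth label. The data classes, the association rule (nearest return inside the box), and the camera parameters are all assumptions made for illustration; the talk does not describe the actual association logic.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Horizontal extent of a vision bounding box, in pixels."""
    left_px: float
    right_px: float


@dataclass
class RadarReturn:
    """One radar target: bearing from straight ahead, and measured range."""
    azimuth_rad: float
    range_m: float


def azimuth_to_pixel(azimuth_rad: float, focal_px: float, cx: float) -> float:
    """Image column where a target at this bearing appears (pinhole model)."""
    return cx + focal_px * math.tan(azimuth_rad)


def label_boxes_with_radar(boxes: List[Box], returns: List[RadarReturn],
                           focal_px: float, cx: float) -> List[Optional[float]]:
    """Depth label per box: the closest radar range that projects inside it,
    or None when no radar return falls inside the box."""
    labels: List[Optional[float]] = []
    for box in boxes:
        inside = [r.range_m for r in returns
                  if box.left_px <= azimuth_to_pixel(r.azimuth_rad, focal_px, cx) <= box.right_px]
        labels.append(min(inside) if inside else None)
    return labels


boxes = [Box(600.0, 700.0), Box(100.0, 200.0)]
returns = [RadarReturn(0.0, 25.0), RadarReturn(0.01, 60.0)]
labels = label_boxes_with_radar(boxes, returns, focal_px=1000.0, cx=640.0)
print(labels)
```

Boxes that end up with a label become training examples for a vision-only depth predictor; boxes with `None` are simply left unlabeled rather than guessed.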

The last mechanism that I will talk about very briefly is a little more sophisticated and a little more technical, but it's a mechanism that has appeared in a number of papers, basically in the last two years. The approach is called self-supervision. What a lot of these papers do is just feed raw videos into neural networks, with no labels of any kind, and you can still get the neural networks to learn depth. It's a bit technical, so I can't go into all the details, but the idea is that the neural network predicts the depth in each frame of that video, and there are no explicit targets with labels that the network is supposed to regress to. Rather, the objective for the network is to be consistent over time: any depth it predicts must be consistent for the duration of that video, and the only way to be consistent is to be right. The network then automatically predicts depth for all pixels, and we've reproduced some of these results internally, so this works pretty well too. In short: people drive with vision alone, there are no lasers involved, and that seems to work pretty well.
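The "consistent over time" objective can be illustrated without any neural network. In the toy below, a textured 1-D scene at a single depth is seen from two camera positions; a depth hypothesis is scored by warping one frame into the other and measuring the pixel mismatch. A brute-force search over candidate depths stands in for gradient-based training, and every constant is invented for the example.

```python
import random


def shift(row, pixels):
    """Circularly shift a 1-D image row to the right by `pixels`."""
    pixels %= len(row)
    return row[-pixels:] + row[:-pixels] if pixels else list(row)


def photometric_error(frame_a, frame_b, depth_m, focal_px, baseline_m):
    """Mean squared pixel mismatch after warping frame_a into frame_b,
    assuming the whole scene lies at `depth_m`."""
    disparity = round(focal_px * baseline_m / depth_m)
    warped = shift(frame_a, disparity)
    return sum((p - q) ** 2 for p, q in zip(warped, frame_b)) / len(frame_a)


def most_consistent_depth(frame_a, frame_b, candidates, focal_px, baseline_m):
    """Pick the depth hypothesis that makes the two frames agree best."""
    return min(candidates,
               key=lambda d: photometric_error(frame_a, frame_b, d, focal_px, baseline_m))


rng = random.Random(0)
TRUE_DEPTH_M, FOCAL_PX, BASELINE_M = 10.0, 100.0, 0.5   # illustrative values

frame_a = [rng.random() for _ in range(200)]             # random texture
frame_b = shift(frame_a, round(FOCAL_PX * BASELINE_M / TRUE_DEPTH_M))

best = most_consistent_depth(frame_a, frame_b, [5.0, 10.0, 25.0, 50.0],
                             FOCAL_PX, BASELINE_M)
print(best)
```

No depth label is ever supplied: the only hypothesis that leaves the two frames consistent is the correct one, which is the sense in which "the only way to be consistent is to be right".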

The point I would like to make is that visual recognition, and very powerful visual recognition, is absolutely necessary for autonomy. It is not a nice-to-have: we must have neural networks that really understand the environment around you. Lidar points are a much less information-rich signal. Vision really understands all the details, while just a few points around you carry much less information. As an example, on the left, is that a plastic bag or is that a tire? Lidar might give you a few points on it, but vision can tell you which of those two is true, and that affects your control.

Is that person who is looking back slightly trying to merge into your lane on their bike, or are they just going forward? In construction sites, what do those signs say? How should I behave in this world? All the infrastructure that we have built for roads is designed for human visual consumption: all the signs, all the traffic lights, everything is designed for vision, so that's where all that information is, and that's why you need that capability. Is that person distracted and on their phone? Are they going to walk into your lane? The answers to all these questions are only found in vision, and they are necessary for level 4 and level 5 autonomy. That is the capability that we are developing at Tesla, and it is done through a combination of large-scale neural network training, through the data engine and making it work over time, and using the power of the fleet. In this sense, lidar is really a shortcut. It sidesteps the fundamental problem, the important problem of visual recognition that is necessary for autonomy, so it gives a false sense of progress and is ultimately a crutch. It does give really fast demos. So if I had to summarize my whole talk in one slide, it would be this one. For autonomy, you want level 4 and level 5 systems that can handle all possible situations in 99.99% of the cases, and you end up chasing some of the last few nines.

It's going to be very tricky and very difficult, and it's going to require a very powerful visual system. So I'm going to show you some pictures of what you might encounter in any one slice of those nines. At first you just have very simple cars going forward. Then those cars start to look a little funny, then maybe you have bikes on cars, then maybe you have cars on cars, and then maybe you start getting into really rare events, like overturned cars or even cars in the air. We see a lot of things coming from the fleet, and we see them at a really good rate compared to all of our competitors. So what matters is the rate of progress at which you can actually address these problems, iterate on the software, and really feed the neural networks with the right data.

That rate of progress is really proportional to how often you encounter these situations in the wild, and we encounter them much more frequently than anyone else, which is why we are going to do extremely well. Thank you.

Everything is super impressive, thank you so much. How much data, how many images, are you collecting on average from each car per period of time? And then it looks like the new hardware, with the dual active computers, gives you some really interesting opportunities to run one copy of the neural network in full simulation while the other one drives the car, and compare the results for quality control. And I was also wondering if there are other opportunities to use the computers for training when they are parked in the garage for the 90% of the time that I am not driving my Tesla.

Thank you very much. Yes, for the first question, how much data do we get from the fleet: it's very important to point out that it's not just the scale of the data set, what really matters is the variety of that data set. If you just have a lot of images of something driving forward on the road, at some point a neural network gets it, and you don't need that data. So we're really strategic in how we pick and choose, and the trigger infrastructure we've built does pretty sophisticated analysis to source only the data we need right now. So it's not a massive amount of data, it's just very well curated data. For the second question, regarding redundancy: absolutely, you can basically run a copy of the network on both, and that's how it's designed, to achieve a level 4/5 system that is redundant, so that's absolutely the case. And your last question, sorry: the car is not for training, it's an inference-optimized computer.

We do have a major program at Tesla that we don't have enough time to talk about today, called Dojo. That's a super powerful training computer. The goal of Dojo will be to be able to take in vast amounts of data and train at the video level, doing massive unsupervised training on vast amounts of video with the Dojo computer. But that's for another day.

I'm a test pilot in a way, because I drive the 405 and the 10, and all these really tricky, long-tail things happen every day. But the one challenge that I'm curious how you're going to solve is changing lanes, because every time I try to get into a lane with traffic, everyone cuts you off. Human behavior is very irrational when you're driving in Los Angeles, and the car just wants to do it the safe way, and you almost have to do it the unsafe way. So I was wondering how you are going to solve that problem.

Yeah, one thing I'll point out is that I talked about the data engine as iterating on neural networks, but we do exactly the same thing at the software level. All the hyperparameters that go into the choices of when we actually lane change and how aggressive we are, we're always changing those, potentially running them in shadow mode and seeing how well they work, and tuning our heuristics around when it's okay to change lanes. We would also potentially use the data engine and shadow mode and so on. Ultimately, designing all the different heuristics for when it's okay to change lanes is actually a bit intractable, I think, in the general case, so the ideal is to use fleet learning to guide those decisions. When do humans change lanes, in what scenarios, and when do they feel it is unsafe to change lanes?

Let's just look at a lot of the data and train machine learning classifiers to distinguish when it's safe to do so. Those machine learning classifiers can write much better code than humans because they have a great amount of data behind them, so they can actually tune all the right thresholds, agree with humans, and do something safe.

We will probably have a mode that goes beyond Mad Max mode, to L.A. traffic mode. Yeah, well, you know, Mad Max would have a hard time in L.A. traffic, I think. It really is a trade-off: you don't want to create unsafe situations, but you want to be assertive, and that little dance of how you make that work as a human is actually very complicated and very difficult to write in code. It really does seem like the machine learning approach is the right way to go about it.
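As a very small illustration of "letting the data tune the thresholds", the sketch below fits a one-feature decision rule to logged human decisions instead of hand-picking a number. The feature (gap to the car in the target lane), the sample data, and the function name are hypothetical; a real classifier would use many more signals.

```python
def fit_gap_threshold(samples):
    """Learn the gap (m) above which humans change lanes.

    samples: list of (gap_m, human_changed_lane) pairs.
    Returns the midpoint threshold that agrees with the most human decisions.
    """
    gaps = sorted({gap for gap, _ in samples})
    candidates = [(a + b) / 2 for a, b in zip(gaps, gaps[1:])]

    def agreement(threshold):
        return sum((gap >= threshold) == changed for gap, changed in samples)

    return max(candidates, key=agreement)


# Hypothetical fleet observations: available gap and what the human did.
fleet_samples = [
    (5, False), (8, False), (12, False),
    (15, True), (20, True), (30, True),
]

threshold_m = fit_gap_threshold(fleet_samples)


def ok_to_change_lanes(gap_m):
    """Decision rule tuned from the data rather than written by hand."""
    return gap_m >= threshold_m


print(threshold_m, ok_to_change_lanes(18))
```

Collecting decisions only from more assertive or more cautious drivers would shift the learned threshold, which is one way the selectable aggressiveness discussed here could be grounded in data.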

We just look at a lot of ways that people do this and try to imitate that. We are more conservative right now, and then as we gain more confidence, we will allow users to select a more aggressive mode. That will be up to the user. But in the more aggressive modes, when trying to merge into traffic, there is, no matter what, a slight chance of a fender bender, not a serious accident. Basically you will have the choice: do you want to have a non-zero chance of a fender bender in freeway traffic, which unfortunately is the only way to navigate L.A. traffic?

Yes, yes. You will have more aggressive options over time, and they will be specified by the user.

Yes, ma'am.

The learning curve is so fast, and what you're trying to do here is almost to develop consciousness in cars through the neural network. So I guess the challenge is how not to create a circular reference in terms of passing from the centralized model of the fleet to that handoff where the car has enough information. Where is that line, in terms of the point in the learning process to hand it off, where there's enough information in the car and it does not have to pull from the fleet?

Look, the car can function if it's completely disconnected from the fleet. It just uploads, and the training gets better and better as the fleet gets better and better. It's that simple: if you disconnected it from the fleet, from then on it would stop getting better, but it would still function great.

And in the previous part you spoke about many of the power benefits of not storing many images, and in this part you are talking about the learning that occurs by pulling from the fleet.

I guess I'm having a hard time reconciling this: if there's a situation where I'm driving uphill, like you showed, and the car is predicting where the road is going to go, that comes from all the other fleet data that led to that intelligence. So how am I getting the benefit of the low power, using the cameras with the neural network? That's where I'm losing the thread. Maybe it's just me.

I guess what I would say is that the computing power in the full self-driving computer is incredible, and maybe we should mention that even if the car had never seen that road before, it would still have made those predictions, provided it was a road in the United States.

In the case of lidar and the march of nines: I don't want to just hammer on lidar, because it's pretty clear you don't like lidar, but isn't there a case where, at some point nine nine nine nine nine down the road, lidar may actually be helpful? Why not have it as a kind of redundancy or backup? That's my first question: you can still have your focus on computer vision, but just have it as a redundancy. My second question is, if that is true, what happens to the rest of the industry that is building its autonomy solutions on lidar?

Everyone is going to get rid of lidar, that's my prediction. Mark my words. I should point out that I don't actually hate lidar as much as it may sound. At SpaceX, the Dragon spacecraft uses lidar to navigate to the space station and dock. SpaceX developed its own lidar from scratch to do that, and I spearheaded that effort personally, because in that scenario lidar makes sense. In cars it's fucking stupid: it's expensive and unnecessary and, like Andrej was saying, once you solve vision, it's worthless. So you have expensive hardware that's worthless on the car. We do have a forward radar, which is low cost and is helpful especially for occlusion situations: if there is fog or dust or snow, the radar can see through that. If you are going to use active photon generation, don't use visible wavelengths, because with passive optics you have already taken care of all the visible wavelength stuff. You want to use a wavelength that is occlusion-penetrating, like radar. Lidar is just active photon generation in the visible spectrum.

If you are going to do active photon generation, do it outside the visible spectrum, in the radar spectrum. At 3.8 millimeters versus 400 to 700 nanometers you get much better occlusion penetration, and that's why we have a forward radar. And then we also have twelve ultrasonics for near-field information, in addition to the eight cameras and the forward radar. You only need the radar in the forward direction, because that's the only direction you're going really fast. I mean, we've gone over this multiple times: are we sure we have the right sensor suite?

Should we add anything more? No.

Hi, right here. You mentioned that you ask the fleet for the information that you're looking for for some of the vision, and I have two questions about that. It sounds like the cars are doing some computation to determine what kind of information to send back to you. Is that a correct assumption? And are they doing that in real time or based on stored information?

They absolutely do that computation in real time on the car. We basically specify the condition that we're interested in, and then those cars do that computation there. If they did not, we would have to send all the data and do it offline in our back end, and we don't want to do that, so all those computations happen on the car.

Based on that, it sounds like you are in a very good position: you currently have half a million cars, and in the future potentially millions of cars, that are essentially computers representing almost free data centers for you to do computation. Is that a big future opportunity for Tesla?

It's a current opportunity, and it hasn't been factored in yet.

That's amazing, thank you.

We have 425,000 cars with hardware two and above, which means they have all eight cameras, the radar, and ultrasonics, and they have at least an Nvidia computer, which is enough to essentially figure out what information is important and what is not, compress the information that is important down to the most salient elements, and upload it to the network for training. It's a massive compression of real-world data.
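The "specify a condition, let the car evaluate it" pattern described in this answer might look roughly like the following. Everything here is a hypothetical sketch: the `Frame` structure, the trigger factory, and the confidence band are invented to show the shape of an on-device filter, not Tesla's actual trigger infrastructure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Frame:
    """One moment of on-car perception output (hypothetical structure)."""
    frame_id: int
    detections: Dict[str, float] = field(default_factory=dict)  # label -> confidence


Trigger = Callable[[Frame], bool]


def low_confidence_trigger(label: str, low: float, high: float) -> Trigger:
    """Fire when the network sees `label` but is unsure about it."""
    def fires(frame: Frame) -> bool:
        score = frame.detections.get(label)
        return score is not None and low <= score <= high
    return fires


def select_for_upload(frames: Iterable[Frame], triggers: List[Trigger]) -> List[int]:
    """Run on the car: keep only the frames that match some requested condition."""
    return [f.frame_id for f in frames if any(t(f) for t in triggers)]


frames = [
    Frame(1, {"car": 0.99}),          # routine, nothing to learn from
    Frame(2, {"bike_on_car": 0.55}),  # rare and uncertain: worth sending
    Frame(3, {}),
]
requested = [low_confidence_trigger("bike_on_car", low=0.3, high=0.7)]
uploaded = select_for_upload(frames, requested)
print(uploaded)
```

Only the matching frame identifiers leave the car, which is the "massive compression of real-world data" point: the filtering cost is paid on the vehicle rather than in the back end.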

You have this kind of network of millions of computers that are essentially like massive distributed data centers of computing capacity. Do you think it will be used for things other than autonomous driving in the future?

I suppose it could possibly be used for something besides autonomous driving. We're going to focus on autonomous driving. As we get that handled, maybe there's some other use for, you know, millions and then tens of millions of computers with hardware three or four. Yeah, maybe there would be, there could be some kind of AWS angle here. It's possible.

Hello, hello, Mat Choice, Loup Ventures.

I have a Model 3 in Minnesota, where it snows a lot. Since the camera and radar can't see the road markings through the snow, what is your technical strategy to solve this challenge? Is it something like high-precision GPS?

Yes, so actually, already today, Autopilot will do a decent job in the snow, even when the lane markings are covered, even when the lane markings are faded, or when it's raining heavily; it still seems to drive relatively well. We're not specifically going after snow with our data engine yet, but I actually think this is completely tractable, because in a lot of those images, even when there's snow, when you ask a human annotator where the lane lines are, they can actually tell you. They are actually remarkably consistent in annotating those lines, and as long as the annotators are consistent in their data, the neural network will pick up on those patterns and do just fine. So it's really just a question of whether the signal is there.

For the human annotator, if the answer is yes, then the neural network can do it well.

There are actually a number of important signals, as noted. Lane lines are one of those things, but one of the most important signals is drivable space: what is drivable space and what is not drivable space. What really matters most is the drivable space, more than the lane lines, and the drivable space prediction is extremely good. I think, especially after this coming winter, it's going to be incredible. It's going to be like, how could it be so good?

That's crazy.

The other thing to point out is that maybe it's not even just human annotators. As long as you as a human can navigate that environment, through fleet learning we actually know the path you took, and you obviously used vision to guide you along that path. You didn't just use the lane line markings, you used the geometry of the whole scene: you see roughly how the world curves, you see how the cars are positioned around you. The network will automatically pick up on all those patterns if you have enough data about people going through those environments.

Yes, it's actually extremely important that things are not rigidly tied to GPS, because GPS error can vary quite a bit, and the actual situation on a road can vary quite a bit: there could be construction, there could be a detour. If the car uses GPS as primary, that is a really bad situation; it's asking for trouble. It's fine to use GPS for tips and tricks. It's like how you can drive your home neighborhood better than a neighborhood in some other country or some other part of the country: you know your neighborhood well, and you use that knowledge to drive more confidently, maybe to take counterintuitive shortcuts and that sort of thing. But the GPS overlay data should only ever be helpful, never primary. If it's ever primary, that's a problem.

So, a question here in the back corner.

I just wanted to follow up partially on that, because several of your competitors in the space in recent years have talked about how they're augmenting all of their path planning and perception capabilities on the vehicle platform with high-definition maps of the areas where they are driving. Does that play a role in your system? Do you see it adding any value? Are there areas where you would like more data that is not collected from the fleet but is more mapping-style data?

I think high-precision GPS maps and lanes are a really bad idea. The system becomes extremely brittle: any change to the environment is something the system can't adapt to. If you're locked onto GPS and high-precision lane lines and don't allow vision to override them, that's a problem. In fact, vision should be the thing that does everything, and then lane lines are a guideline, but they are not the main thing.

We briefly barked up the tree of high-precision lane lines, then realized it was a big mistake and reversed it. Not good.

So this is very helpful for understanding annotation, where the objects are, and how the car drives. But what about the negotiation aspect, for parking lots and roundabouts and other things where there are other cars on the road driven by humans, where it's more art than science?

It's pretty good, actually. With cut-ins and such, it's working very well.

Yeah, so we are using a lot of machine learning right now in terms of prediction, creating an explicit representation of what the world looks like, and then there's an explicit planner and a controller on top of that representation, and there are a lot of heuristics about how to traverse and negotiate and so on. Just as there is a long tail in what the visual environment looks like, there is a long tail in those negotiations and the little game of chicken that you play with other people and so on. So I think we have a lot of confidence that eventually there will need to be some kind of fleet learning component to how that's actually done, because writing all those rules by hand is going to plateau quickly.

I think so. We have dealt with this with cut-ins, and we will gradually allow more aggressive behavior on the part of the user. They can just dial the setting up and say: be more aggressive, be less aggressive. You know, drive easy, relaxed mode, aggressive.

Yes, incredible progress, phenomenal. Two questions. First, in terms of platooning, do you think the system is suited to that? Someone asked about when there is snow on the road, but if you have a platooning feature, you can just follow the car in front. Is your system capable of doing that?

And I have two follow-ups.

So you're asking about platooning. I think we could build those features, but again, if you just train your neural networks, for example, to imitate humans, humans already follow the car ahead, and the neural network actually incorporates those patterns internally. It figures out that there is a correlation between the way the car in front of you is positioned and the path you are going to take, but that is all done internally in the network. So you only have to worry about getting enough data, and enough tricky data.

The neural network training process is actually quite magical: it does all the other stuff automatically. So you turn all the different problems into one problem: just collect your data set and use neural network training.

Yes, there are three steps to self-driving. There's being feature complete. Then there's being feature complete to the point where we think the person in the car doesn't need to pay attention. And then there's being at a reliability level where we've also convinced the regulators that that is true. So there are, like, three levels. We expect to be feature complete in self-driving this year, and we expect to be confident enough from our standpoint to say that we think people don't need to touch the steering wheel.

They can look out the window, at some point probably around the second quarter of next year. And then we expect to start getting regulatory approval, at least in some jurisdictions, for that toward the end of next year. That's roughly the timeline that I expect things to go on. And probably for trucks, platooning will be approved by the regulators before anything else. Maybe, if you're doing long-haul freight, you can have one driver in front and then four semi trucks behind in a platoon manner, and I think the regulators will probably be quicker to approve that than other things.

Of course, you don't have to convince us.

In my opinion, the technology is an answer looking for a question; it's probably dead. I mean, what we saw today is very impressive, and probably the demo could show something more. I'm just wondering what is the maximum dimension of a matrix you can have in your training or deep learning process, just a rough figure.

So you're asking about the matrix multiply operations within the network. There are many different ways to answer that question, but I'm not sure they are useful answers. These neural networks typically have, as I mentioned, between tens and hundreds of millions of neurons. Each of them, on average, has about a thousand connections to neurons below, so these are the typical scales that are used in the industry and that we use as well.

Yes, in fact, I have been very impressed with the rate of improvement of Autopilot over the last year in my Model 3. There were two scenarios from last week that I wanted your comments on. In the first scenario I was in the rightmost lane of the highway, and there was an on-ramp to the freeway, and my Model 3 was actually able to detect two cars on the side, slow down, and let one car go in front of me and one car behind me, and I was like, oh my gosh, this is crazy, I didn't think my Model 3 could do that. So that was super awesome. But the same week there was another scenario where I was in the right lane again, but my right lane was merging into the left lane. It wasn't an on-ramp, just a normal highway lane.

My Model 3 wasn't really able to detect that situation, it didn't slow down or speed up, and I had to intervene. So from your perspective, can you share some background on how the neural network could be tuned for that at Tesla, and how that could be improved over time?

Yes, so as I mentioned, we have a very sophisticated trigger infrastructure. If you have intervened, it is quite likely that we received that clip, and we can analyze it, see what happened, and tune the system. It would probably enter some statistics of, okay, at what rate are we merging in traffic correctly? We look at those numbers, we look at the clips, we see what's wrong, and we try to fix those clips and make progress against those benchmarks. So yes, we would potentially go through a categorization phase, then look at some of the biggest categories that actually seem to be semantically related to a single problem, look at some of those, and then try to develop software against that.

Okay, we have one more presentation, which is essentially the software. There's the Autopilot hardware with Pete, the neural network vision with Andrej, and then there's software engineering at scale, which Stuart is presenting. After that there will be an opportunity to ask questions. So, yeah, thank you.

I just wanted to say very briefly: if you have an early flight and would like to do a test ride with our latest development software, you could speak to my colleague or drop him an email, and we can take you for a test ride. And Stuart, over to you.

So that was actually a clip of an uninterrupted drive of more than 30 minutes, with no interventions, using Navigate on Autopilot on the highway system, which is in production today in hundreds of thousands of cars. I'm Stuart, and I'm here to talk about how we build these systems at scale. First, a really short introduction.

I've been at a couple of companies, and I've been writing software professionally for about twelve years. What I'm most excited about and really passionate about is taking the cutting edge of machine learning and actually connecting it to customers at robust scale. At Facebook I initially worked within our ads infrastructure to build some of the machine learning there, with really very smart people, and we tried to build a single platform that we could then scale to all the other aspects of the business, from how we rank the news feed to how we deliver search results to how we make every recommendation on the platform. That became the applied machine learning group, something I was incredibly proud of. And a lot of it isn't just the core algorithms: some of the really important improvements that happened there, the ones that matter a lot, were actually the engineering practices for building these systems at scale. The same was true at the next place I went, where we were really excited to help monetize the product. The hardest part was that we were using Google at the time, and they were effectively running us at a fairly small scale, and we wanted to build that same infrastructure ourselves.

We wanted to take an understanding of these users, connect that with cutting-edge machine learning, and build that at massive scale, generating billions and then trillions of predictions and auctions every day in a way that is really robust. So when the opportunity arose to come to Tesla, that was something I was incredibly excited to do: specifically, taking the amazing things that are happening on both the hardware side and the computer vision and AI side, packaging that together with all the planning, the driving, the testing, the kernel patching of the operating system, all of our continuous integration and our simulation, and actually building that into a product that we put into people's cars in production today. So I want to talk about the timeline of how we did that with Navigate on Autopilot, and how we will do it as we take Navigate on Autopilot off the highway and onto city streets.

We're already at 770 million miles on Navigate on Autopilot, which is a really cool thing, and one thing worth noting is that we are still accelerating and still learning from this data. As Andrej said about the data engine, as this accelerates we actually make more and more assertive lane changes. We are learning from the cases where people intervene, either because we did not detect the exit correctly or because they wanted the car to be a little more lively in different environments, and we just want to keep making that progress.

To start all of this off, we begin by trying to understand the world around us. We talked about the different sensors in the vehicle, but I want to go a little deeper here. We have eight cameras, but we also have 12 ultrasonic sensors, a radar, GPS, an inertial measurement unit, and then one thing we tend to forget about: the pedal and steering actions. So we can not only observe what is happening around the vehicle, but also how humans choose to interact with that environment.

So I'll talk through this clip. It basically shows what is happening in the car today, and we will continue to push this forward. We start with a single neural network, look at the detections around it, and then build all of that together from multiple neural networks and multiple detections. We bring in the other sensors and convert that into what Elon calls vector space, an understanding of the world around us. As we continue to get better and better at this, we are moving more and more of this logic into the neural networks themselves, and the obvious endgame is that the neural network looks across all the cameras, brings all the information together, and ultimately outputs a source of truth for the world around us. And this is not an artist's rendering; in many ways it is actually the output of one of the debugging tools that we use on the team every day to understand what the world around us looks like. There is another thing that I think is really exciting.

When I hear about sensors like lidar, a common question is about having additional sensory modalities: why not have some redundancy in the vehicle? I want to dig into one thing that is not always obvious about the neural networks themselves. We have a neural network running on our fisheye camera, for example. That neural network does not make one prediction about the world; it makes many separate predictions, some of which actually check each other. As a real example, we have the ability to detect a pedestrian, which is something we train very carefully and work hard on. But we also have the ability to detect obstacles in the roadway, and a pedestrian is an obstacle. It shows up differently to the neural network, which says, "Oh, there's something I can't drive through." Together these combine to give us a better sense of what we can and cannot do in front of the vehicle and how to plan for it.
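As a rough illustration of why overlapping predictions help, here is a minimal Python sketch. It assumes, purely for illustration, that the pedestrian output and the obstacle output can be treated as independent probabilities and combines them with a noisy-OR rule. The function name and the example numbers are hypothetical and are not taken from Tesla's software.

```python
def combined_blocked_probability(p_pedestrian: float, p_obstacle: float) -> float:
    """Noisy-OR combination of two detector outputs.

    Assumption (for illustration only): the two predictions are independent
    estimates that the path ahead is blocked. The path is treated as blocked
    if at least one of them is right.
    """
    for p in (p_pedestrian, p_obstacle):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Probability that both detectors miss, subtracted from one.
    return 1.0 - (1.0 - p_pedestrian) * (1.0 - p_obstacle)


if __name__ == "__main__":
    # Hypothetical scores: a weak pedestrian detection backed up by a
    # strong generic-obstacle detection.
    print(combined_blocked_probability(0.6, 0.9))
```

Under this assumption the combined estimate is never lower than either individual prediction, which is the sense in which a second, differently trained output adds redundancy.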

Then we do this across multiple cameras, because we have overlapping fields of view in many places around the vehicle; in front we have a particularly large number of overlapping fields of view. Finally, we can combine that with things like radar and ultrasonics to understand extremely precisely what is going on in front of the car, and we can use that both to learn future behaviors that are very accurate and to build very accurate predictions of how things will continue to unfold in front of us. One example that I think is really exciting is that we can look at cyclists and people and ask not just where they are now but where they are going. This is actually at the heart of our next-generation automatic emergency braking system, which will stop not only for people who are in your path but for people who are going to be in your path. It is running in shadow mode right now and will go out to the fleet this quarter. I'll talk about shadow mode in a second.

When you want to start a feature like Navigate on Autopilot on the highway system, you can start by learning from data, and you can just look at how humans do things today.

What is their assertiveness profile? How do they change lanes? What makes them abort or change their maneuvers? And you can see things that aren't immediately obvious, like, oh yes, simultaneous merging is rare but very complicated and very important, and you can start to form opinions about different scenarios, such as a fast-overtaking vehicle. So this is what we do: when we initially have some algorithm that we want to try, we can put it in the fleet and see what it would have done in a real-world scenario, like this car that is overtaking us very quickly. This is taken from our actual simulation environment, and it shows the different paths we considered taking and how they overlay on the real-world behavior of a user.

When you have tuned those algorithms and feel good about them (and this really is taking the output of the neural network, putting it into that vector space, and building and tuning these parameters on top of it; ultimately I think we can do that through more and more machine learning), you go into a controlled deployment, which for us is our early access program. This gets it out to a couple of thousand people who are really excited to give highly thoughtful and useful feedback on how it behaves, not in an open-loop way but in a closed-loop way in the real world. You watch their interventions, and, as we talked about, when someone takes over we can get that clip and try to understand what happened. One thing we can really do is replay it in an open-loop way and ask, as we build our software, whether we are getting closer to or further from how humans behave in the real world.
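The open-loop comparison described here can be sketched in a few lines of Python. This is only an illustrative assumption about how such a check might be scored: it measures the mean point-by-point distance between a planned path and the path the human actually drove, and asks whether a new software build is closer to the human than the old one. The function names and sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in metres, sampled at fixed time steps


def mean_path_deviation(planned: List[Point], driven: List[Point]) -> float:
    """Average distance between matching samples of two paths."""
    if not planned or len(planned) != len(driven):
        raise ValueError("paths must be non-empty and the same length")
    return sum(math.dist(p, d) for p, d in zip(planned, driven)) / len(planned)


def is_closer_to_human(old_plan: List[Point], new_plan: List[Point],
                       driven: List[Point]) -> bool:
    """True if the new build's plan tracks the human path more closely."""
    return mean_path_deviation(new_plan, driven) < mean_path_deviation(old_plan, driven)


if __name__ == "__main__":
    human = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]      # recorded clip
    old_build = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]  # replayed with old software
    new_build = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]  # replayed with new software
    print(is_closer_to_human(old_build, new_build, human))
```

A real system would presumably compare many more signals than position, so this should be read as a sketch of the idea rather than a description of the actual tooling.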

One great thing about the full self-driving computer is that we're actually building our own racks and infrastructure, so you can basically take fully built full self-driving computers, build them into our own cluster, and run this very sophisticated data infrastructure to really understand, over time, as we tune these algorithms, whether they are getting closer and closer to how humans behave, and ultimately whether we can exceed human capabilities. Once we had this and felt really good about it, we wanted to do our wide rollout. But to start, we asked everyone to confirm the car's behavior via a stalk confirmation. So we started making lots of predictions about how we should navigate the highway.

We asked people to tell us whether each prediction was right or wrong, and this is again an opportunity to turn that data engine. We detected some really complicated and interesting long-tail cases. In this case, I think a really fun example is these very interesting simultaneous merges, where you start going and then someone moves in behind you or in front of you without noticing you. What is the appropriate behavior here, and what tuning of the neural network do we need to be super precise about the appropriate behavior? We worked on these, tuned them in the background, and improved them, and over time we got to 9 million successfully accepted lane changes. We used those, again with our continuous integration infrastructure, to really understand when we think we are ready.

And this is one area where the full self-driving computer is also very exciting to me. Since we own the entire software stack, straight from the kernel patching all the way to the tuning of the image signal processor, we can start to collect even more data that is even more precise, and that lets us tune better and better with faster iteration cycles. So earlier this month we felt we were ready to deploy an even more seamless version of Navigate on Autopilot on the highway system, and that seamless version does not require a stalk confirmation. You can sit back, relax, keep your hand on the wheel, and just monitor what the car is doing. In this case we are actually seeing over a hundred thousand automated lane changes every day on the highway system, and this is a great thing for us to deploy at scale. What excites me most about all of this is the actual life cycle of it, and how we can turn the data engine faster and faster over time. One thing that is becoming very clear is the power of the combination of the infrastructure we have built, the tools we have built on top of it, and the full self-driving computer.

I think we can do this even faster as we now move Navigate on Autopilot from the highway system onto city streets. So with that, I'll hand it off.

Yeah. I mean, as far as I know, all of those lane changes have occurred with zero accidents. That is correct. Yes, I watch every accident. So it's conservative, obviously, but having hundreds of thousands of people do millions of lane changes with zero accidents is, I think, a huge accomplishment for the team. Yeah, thanks.

So let's see, a few other things that are worth mentioning. To have a self-driving car or a robotaxi, you really need redundancy throughout the vehicle at the hardware level. Since October 2016, all cars made by Tesla have had redundant power steering, so there are redundant motors in the power steering; if a motor fails, the car can still steer.

All power and data lines have redundancy, so you can sever any given power line or any data line and the car will keep driving. The same goes for the auxiliary power system: even if the main pack loses all power, the car is able to steer and brake using the auxiliary power system, so you can completely lose the main pack and the car is still safe. The whole system, from a hardware standpoint, has been designed to be a robotaxi basically since October 2016. When we rolled out the Autopilot version two hardware, we did not expect to upgrade cars made before that; we thought it would actually cost more to upgrade a car than to make a new one. That gives you an idea of how hard it is to do this: unless it's designed in, it's not worth it.

So we've gone through the future of autonomous driving, where clearly there is the hardware, there is the vision, and then there is a lot of software. The software problem here should not be minimized; it is a massive software problem: managing vast amounts of training data, training against that data, and working out how to control the car based on vision.

It's a very difficult software problem. So, going over the Tesla master plan: obviously we have made a lot of forward-looking statements, as they call them, but let's go over some of the forward-looking statements we have made. Back when we created the company, we said we would build the Tesla Roadster. People said it was impossible, and that even if we built it, no one would buy it. The universal opinion was that building an electric car was extremely foolish and would fail. I agreed that the probability of failure was high, but this was important. So we built the Tesla Roadster anyway, got it into production in 2008, and shipped that car; it is now a collector's item. Then we said we would build a more affordable car with the Model S.

We did that, and again we were told it was impossible. They called me a fraud and a liar: it's not going to happen, this is all fake. Okay, famous last words. We went into production with the Model S in 2012, and it exceeded all expectations. Even in 2019, there is still no car that can compete with the 2012 Model S. It's seven years later; still waiting. Then an affordable car, well, maybe not very affordable, but more affordable, with the Model 3. We built the Model 3, and we're in production. I said we would get to more than five thousand cars a week; right now, five thousand cars a week is a walk in the park for us. It's not even hard. Then large-scale solar, which we did through the SolarCity acquisition, and we are developing the solar roof, which is going really well. We're now on version 3 of the solar tile roof, and we expect significant production of the solar tile roof later this year.

I have it on my house, and it's great. Then the Powerwall and Powerpack: we made the Powerwall and Powerpack. In fact, the Powerpack is now deployed in massive grid-scale utility systems all over the world, including the world's largest operating battery project, at more than 100 megawatts. And probably by next year, or the year after at most, we expect to have a gigawatt-scale battery project completed. So all these things I said we would do, we did. We did it, we did it, we did it. We're going to do the robotaxi thing too. The only criticism, and it's a fair one, is that sometimes I'm not on time. But I get it done, and the Tesla team gets it done.

So what we will do this year is reach a combined production of 10,000 cars a week between the S, X and 3. We feel very confident about that, and we feel very confident about being feature-complete with full self-driving. Next year we will expand the product line with the Model Y and the Semi, and we expect to have the first operational robotaxis next year, with no one in them. It is always difficult when things are improving at an exponential rate; it is very hard to wrap your mind around it, because we are used to extrapolating linearly. But when you have massive amounts of hardware on the road, and the accumulated data is increasing exponentially, the software gets better at an exponential rate.

I feel very confident predicting autonomous robotaxis for Tesla next year. Not in all jurisdictions, because we won't have regulatory approval everywhere, but I am confident we will have regulatory approval at least somewhere, literally next year. Any customer will be able to add their car to the Tesla Network or remove it. Expect us to operate it as a sort of combination of the Uber and Airbnb models: if you own the car, you can add it to or subtract it from the Tesla Network, and Tesla would take 25 or 30 percent of the revenue. Then, in places where there aren't enough people sharing their cars, we would just have dedicated Tesla vehicles. To use the car, we'll show you our ride-sharing app: you will just be able to summon the car from the parking lot, get in, and go for a drive. It's really simple. You take the same Tesla app that you currently have; we'll just update the app and add an option to summon a Tesla or to add your car to the fleet. So you can summon a Tesla, or add or subtract your car from the fleet.

You'll be able to do this from your phone. So we see potential to smooth out the demand distribution curve and to have a car operate at much higher utility than a normal car. Typically, a car is used 10 to 12 hours a week: most people drive one and a half to two hours a day, so typically 10 to 12 hours a week of total driving. But if you have a car that can operate autonomously, then most likely you could have that car operate for a third of the week or longer. There are 168 hours in a week, so you probably have something on the order of 55 to 60 hours a week of operation, maybe a bit longer. So the fundamental utility of the vehicle increases by a factor of five. Look at this from a macroeconomic standpoint: if we were operating some large simulation, and you could upgrade your simulation to increase the utility of cars by a factor of five, that would be a massive increase in the economic efficiency of the simulation, just gigantic. So we'll do this with the Model 3, S and X. You can keep the car, but if you lease it, it has to come back onto the network.
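The utilization arithmetic in this passage can be checked with a short Python sketch. The figures (10 to 12 hours of human driving per week, a car operating for about a third of a 168-hour week) come from the talk; the function names are hypothetical and chosen only for this example.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def autonomous_hours(fraction_of_week: float) -> float:
    """Weekly operating hours for a car that runs a given fraction of the week."""
    return HOURS_PER_WEEK * fraction_of_week


def utilization_gain(human_hours_per_week: float, fraction_of_week: float) -> float:
    """How many times more the car is used when it operates autonomously."""
    return autonomous_hours(fraction_of_week) / human_hours_per_week


if __name__ == "__main__":
    one_third = 1.0 / 3.0
    print(f"autonomous hours per week: {autonomous_hours(one_third):.0f}")
    for human_hours in (10.0, 11.0, 12.0):
        gain = utilization_gain(human_hours, one_third)
        print(f"{human_hours:.0f} h/week of human driving -> gain of {gain:.1f}x")
```

With a third of the week in operation, the gain lands between roughly 4.7x and 5.6x depending on the human baseline, which is consistent with the "factor of five" quoted above.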

And as I said, in places where there is not enough supply of shared cars, Tesla will simply make its own cars and add them to the network there. The current cost of a Model 3 robotaxi is less than $38,000, and we expect that number to improve over time as we redesign the cars. The cars being built today are all designed for a million miles of operation: the drive unit design was tested and validated for a million miles of operation. The current battery pack is good for roughly 300 to 500 thousand miles. The new battery pack, which will likely go into production next year, is explicitly designed for a million miles of operation. The entire vehicle, battery pack included, is designed to operate for a million miles with minimal maintenance. We'll actually be adjusting the tire design and really optimizing the car for a hyper-efficient robotaxi. At some point you won't need steering wheels or pedals, and we'll just delete them; as these things become less and less important, we'll just delete the parts, and they won't be there. I'd say probably within two years we will make a car that has no steering wheel or pedals, and if we need to accelerate that timeline, we can always just delete parts. It's easy. In the long run, say three years, a robotaxi with parts deleted might end up costing $25,000 or less. And you want a super-efficient car, so the electricity consumption is very low. We're currently at four and a half miles per kilowatt-hour, but we can improve that to five and beyond.

There is really no other company that has the full integration that we have: vehicle design and manufacturing, in-house computer hardware, in-house software development and artificial intelligence, and we have by far the largest fleet.

It's extremely difficult, not impossible perhaps, but extremely difficult, to catch up when Tesla has a hundred times more miles per day than everyone else combined. This is the current cost of running a gasoline car. The average cost of running a car in the US is taken from AAA: it's currently about 62 cents a mile, which at thirteen and a half thousand miles a year across 250 million vehicles adds up to two trillion dollars a year. These numbers are taken literally from the AAA website. The cost of ride sharing, according to Lyft, is two to three dollars a mile. The cost to run a robotaxi, we think, is less than 18 cents a mile and dropping. This is with current cars; future costs will be lower. If you ask what the probable gross profit from a single robotaxi would be, we think it's probably something on the order of $30,000 per year. And we are literally designing the cars the same way commercial semi trucks are designed.

Commercial semi trucks are designed for a million-mile life, and we're designing the cars for a million-mile life as well. So in nominal dollars, that would be a little over three hundred thousand dollars over the course of 11 years, maybe more. I think these assumptions are actually relatively conservative, and this assumes that 50 percent of the miles driven are empty, with no use, so this is at only 50 percent utility.

By the middle of next year, we will have over a million Tesla cars on the road with full self-driving hardware, feature-complete, at a reliability level at which we would consider that no one needs to pay attention, meaning that from our standpoint you could go to sleep. If you fast-forward a year, maybe a year and three months, but next year for sure, we will have over a million robotaxis on the road. The fleet wakes up with an over-the-air update; that's all it takes. If you ask what the net present value of a robotaxi is, it's probably on the order of a couple of hundred thousand dollars, so buying a Model 3 is a good deal.
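The lifetime figure quoted here follows from the numbers given in the talk, and a small Python sketch makes the arithmetic explicit. The inputs (about $30,000 of gross profit per year, an 11-year period, half of all miles driven empty) are the speaker's; the function names are hypothetical, and the example mileage is an arbitrary illustration rather than a figure from the talk.

```python
def lifetime_gross_profit(annual_gross_profit: float, years: int) -> float:
    """Nominal (undiscounted) gross profit over the vehicle's service life."""
    return annual_gross_profit * years


def paid_miles(total_miles: float, empty_fraction: float) -> float:
    """Miles that carry a paying passenger, given the share driven empty."""
    if not 0.0 <= empty_fraction <= 1.0:
        raise ValueError("empty_fraction must lie in [0, 1]")
    return total_miles * (1.0 - empty_fraction)


if __name__ == "__main__":
    # Figures quoted in the talk.
    print(lifetime_gross_profit(30_000, 11))
    # Hypothetical mileage, used only to show the 50 percent utility assumption.
    print(paid_miles(100_000, 0.5))
```

Because this is a nominal sum, it is larger than the net present value of "a couple of hundred thousand dollars" mentioned in the same passage, which would discount future years.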

Well, I mean, in our own fleet, I don't know. I guess in the long term we would probably have on the order of 10 million vehicles. Talking about our production rates in general, if you look at the compound annual production rate since 2012, which was essentially our first full year of Model S production, we went from 23,000 vehicles produced in 2013 to about 250,000 vehicles produced last year. So over the course of five years we increased output by a factor of 10, and I would expect something similar to happen over the next five or six years. As far as sharing, I don't know, but the nice thing is that customers are essentially fronting us the money for the car, which is great.

One other thing is the snake charger. I'm curious about that, and about how you determined the pricing. It sounds like you're undercutting the average Lyft or Uber ride by about 50 percent, so I'm curious whether you could talk a little bit about the pricing strategy.

Sure. We do expect to solve the snake charger. It's pretty simple from a vision standpoint; it's a known situation, and any kind of known situation with vision, like a charge port, is trivial. So yes, the car would automatically park and automatically plug in. There would be no human supervision required. Sorry, what was the question about pricing?

Yeah, we just threw some numbers on there. I mean, definitely plug in whatever price you think makes sense. We just somewhat arbitrarily said, well, maybe a dollar. There are on the order of two billion cars and trucks in the world, so robotaxis will be in extremely high demand for a very long time. And from my observation so far, the auto industry is very slow to adapt. I mean, as I said, there is still no car on the road that you can buy today that is as good as the Model S was in 2012, which suggests a pretty slow rate of adaptation for the auto industry. So a dollar is probably conservative for the next 10 years. I think people don't give enough recognition to the difficulty of manufacturing. Manufacturing is incredibly difficult, but a lot of people I talk to think that if you just have the right design, you can instantly make as much of that thing as the world wants.

This is not true. It is extremely difficult to design a new manufacturing system for new technology. I mean, even companies that are extremely good at manufacturing are having major problems, and if they are having problems, what about the others? So, you know, there are on the order of two billion cars and trucks in the world and on the order of a hundred million units per year of vehicle production capacity, but only of the old design. It will take a very long time to convert all that to fully autonomous cars. And they really need to be electric, because the cost of running a gasoline or diesel car is much higher than that of an electric car, and any robotaxi that is not electric will simply not be competitive.

Elon, it's Colin from Oppenheimer, over here. Obviously we appreciate that customers are fronting some of the cash to get this fleet built, but it sounds like a massive balance sheet commitment by the organization over time. Can you talk a little bit about what your expectations are in terms of financing over the next three to four years to build this fleet and start monetizing it with your customer base?

You know, our goal is to be approximately cash-flow neutral during the fleet build-up phase, and then I expect to be extremely cash-flow positive once the robotaxis are enabled. But I don't want to talk about financing; it would be difficult to talk about financing rounds in this venue. But I think we'll make the right moves.

I have a question: if I'm Uber, why wouldn't I just buy all your cars? You know, why would I let you put me out of business?

There is a restriction that we put on our cars, I think about three or four years ago: they can only be used on the Tesla Network.

So even a private person who goes out and buys ten Model 3s can't run them on another network as a business. But if I use the Tesla Network, then in theory I could run a car-sharing robotaxi business with my ten Model 3s?

Yes, but it's like the App Store: you can only add them or remove them through the Tesla Network, and Tesla gets a share of the revenue.

But it's similar to Airbnb, in that I have this home, my car, and now I can rent it out, so I can make extra income by having multiple cars and just renting them out. Like, I have a Model 3.

I aspire to get this Roadster when you build it, and then I would just rent out my Model 3. Why would I give it back to you?

You know, I suppose you could operate a fleet of rental cars, but I think that would be very difficult to manage. It seems easy? Okay, try it.

To operate a robotaxi network, it sounds like you have to solve certain problems. For example, with Autopilot today, if you turn the wheel far enough, it lets you take over. But in a ride-sharing product where someone else is sitting in the passenger seat, moving the steering wheel can't let that person take over the car, because they might not even be in the driver's seat. So is the hardware already there for it to be a robotaxi? And you could get into situations, like being pulled over by a police officer, where a human might need to intervene, for example using a central fleet of operators that interact with the cars remotely. Is all of that kind of infrastructure already built into each of the cars? Does that make sense?

I think there will be a sort of phone-home function, where if the car gets stuck, it will just call in to Tesla and ask for a solution. Things like being pulled over by a police officer, that is easy for us to program in; that's not a problem. It will be possible for someone to take control using the steering wheel, at least for some period of time, and then probably down the road we will just cap the steering wheel so there is no steering control. We would just take the steering wheel off and put a cap on. Give it a couple of years.

Is that a hardware modification to the car to enable it?

Yes, we would literally just unscrew the steering wheel and put a cap where the steering wheel currently attaches.

But that's the car of the future that you would bring out. What about today's cars, where the steering wheel is the mechanism for taking over from Autopilot?

So if it's in robotaxi mode, someone could take over just by moving the steering wheel?

Yes, I think there will be a transition period where people will be able to take over from the robotaxi, and should be able to. Then, once regulators are comfortable with us not having a steering wheel, we will just delete it. And for cars that are in the fleet, obviously with the permission of the owner if the car is owned by somebody else, we would just take the steering wheel off and put a cap where the steering wheel currently attaches.

So there could be two phases to the robotaxi: one where the service is provided and you get in as a driver, but you could potentially take over, and then, in the future, one where there might not be a driver option. Is that how you see it as well? And in the future, is there a chance that the steering wheel will be taken away from you?

One hundred percent. People will demand it. To be clear, this is not me prescribing a point of view about the world; this is me predicting what consumers will demand. Consumers will demand in the future that people are not allowed to drive these two-ton death machines.

I don't totally agree with that, but for a Model 3 today to be part of the robotaxi network, when you call it, you would then get into the driver's seat, essentially?

Yes, just to be safe.

Okay, that makes sense. Thank you.

It's like, you know, there were amphibians, and then things eventually became land creatures. There is a bit of an amphibian phase.

Hello. Okay, yes.

The strategy we've heard from other players in the robotaxi space is to select a certain municipal area and create geofenced autonomous driving. That way you are using an HD map to have a more confined area with a bit more safety. We didn't hear much today about the importance of HD maps. To what extent is an HD map necessary for you? And second, we also didn't hear much about deploying this in specific municipalities, where you work with the municipality to get buy-in from them and you also get a more defined area. So what is the importance of HD maps, and to what extent are you looking at specific municipalities for rollout?

I think HD maps are a mistake. We actually had them for a while, and we dropped them. Either you need HD maps, in which case if anything changes about the environment the car will break down, or you don't need HD maps, in which case why are you wasting your time making HD maps? So LIDAR and HD maps are the two main crutches that should not be used and that will, in retrospect, be obviously false and foolish. And if you need a geofence for an area, you don't have real autonomous driving.

It sounds like battery supply might be the only bottleneck left for this vision. Also, could you clarify how you get battery packs to last a million miles?

I think cells will be a constraint. That's a whole separate topic. And I think we're actually going to want to push the Standard Range Plus battery more than our Long Range battery, because the energy content of the Long Range pack is 50% higher in kilowatt-hours, so you can basically make 1/3 more cars if they all have the Standard Range Plus pack instead of the Long Range pack. One is around 50 kilowatt-hours and the other is around 75 kilowatt-hours. So we are probably going to bias our sales intentionally toward the smaller battery pack in order to have a higher volume. The obvious thing to do is to maximize the number of autonomous units made, to maximize output, which will subsequently result in the largest autonomous fleet down the road. So we're doing a number of things along those lines, but that's not for today's meeting.

The million-mile life is basically about getting the cycle life of the pack to where it needs to be. As I said, the basic math is: if you have a 250-mile-range pack, you need four thousand cycles, since a million miles divided by 250 miles per cycle is 4,000 cycles. That's very achievable; we already do that with our stationary storage. Our stationary storage solutions, like the Powerpack, are already capable of a 4,000-cycle life.

If I can ask, sorry: obviously there are significant and very constructive margin implications to the extent that you can drive the attach rate much higher for the full self-driving option. I'm just curious whether you can level-set us on where you are in terms of those attach rates, and how you expect to educate consumers about the robotaxi scenario so that attach rates improve materially over time.

Sorry, it's a little hard to hear your question.

Yeah, I'm just curious where we are today in terms of full self-driving attach rates. In terms of the financial implications, I think it's hugely beneficial if those attach rates increase materially, because of the higher gross margin dollars that flow through as people sign up for FSD. I'm just curious how you see that ramping, and what the attach rates are today compared with where you expect them to be.

How do you hope to educate consumers and make them aware that they should attach FSD to their vehicle purchases?

We'll ramp that up enormously after today. The really fundamental message that consumers should take away today is that it's financially crazy to buy anything other than a Tesla. It will be like owning a horse in three years. I mean, it's fine if you want to own a horse, but you should go into it with that expectation. If you buy a car that doesn't have the hardware necessary for full self-driving, it's like buying a horse, and the only car that has the hardware necessary for full self-driving is a Tesla. People should really think about that before buying any other vehicle. It's basically crazy to buy any car other than a Tesla. Yes, we need to convey that argument clearly, and we will after today.

Thanks for bringing the future to the present. Several very informative moments today. I was wondering, since you didn't talk a lot about the Tesla pickup, and let me give you some context. I could be wrong, but the way I see it, as an early adopter, the Tesla Network will be somewhat of a test bed. I think the Tesla pickup might be the first phase of putting vehicles on the network, because the utility of the Tesla pickup would be for people who are hauling a lot of things, or who are in the construction trade, or who pick up a few odd items here and there, like picking things up from Home Depot.

I would say that maybe there needs to be a two-stage process: pickup trucks exclusively for the Tesla Network as a starting point, and then people like me can buy them later. What do you think about that?

Well, today was really just about autonomy. There are a lot of things we could talk about, like cell production, the pickup truck, and future vehicles, but today was just to focus on autonomy. But I agree it's a big deal. I'm very excited about the introduction of the truck later this year. It will be great.

Colin Langan, UBS. Just so we understand the definitions: when you refer to feature-complete self-driving, it sounds like you're talking about level five, without geofencing. Is that what's expected by the end of the year? And then the regulatory process: have you talked to the regulators about this? It seems like a pretty aggressive timeline compared with what other people have put out. Do you know what the hurdles are and what the timeline is to get approval? And do you need things like in California, where they're tracking miles and there has to be an operator behind the wheel? What is that process going to look like?

Yes, we talk to regulators around the world all the time.

As we introduce additional features such as Navigate on Autopilot, this requires regulatory approval depending on the jurisdiction. But fundamentally, regulators, in my experience, are convinced by data. So if you have a lot of data demonstrating that autonomy is safe, they listen to it. They may take a while to digest the information, so the process may take a little time, but they have always come to the correct conclusion from what I have seen.

I have a question over here. I have lights in my eyes and a pillar in the way. Okay. I just wanted to let you know about some of the work we've done to try to better understand the ride-hailing market.

It seems like it's very concentrated in the major dense urban centers. So is the way to think about this that robotaxis would probably be deployed more in those areas, and the additional full self-driving feature for personally owned vehicles would be in the suburban areas?

I think probably yes. Tesla-owned robotaxis would be in the dense urban areas, along with customer vehicles, and then as you get to medium- and low-density areas, it would tend to be more that people own the car and occasionally lend it out. Yes, there are a lot of edge cases in Manhattan and, say, downtown San Francisco, and there are several cities around the world that have challenging urban environments, but we don't expect this to be a major problem.

When I say feature-complete, I mean it will work in downtown San Francisco and downtown Manhattan this year.

Hello, I have a neural network architecture question. Do you use different models for, say, path planning and perception, or different types of AI? And roughly how do you divide that problem between the different parts of autonomy?

Essentially, right now the AI is really used for object recognition. We're still basically using still frames: identifying objects in still frames and tying them together in a perception and path-planning layer after that. But what is steadily happening is that the neural network is eating more and more of the software base, and over time we expect the neural network to do more and more. Now, from a computational cost standpoint, there are some things that are very simple for a heuristic and very difficult for a neural network, so it probably makes sense to maintain a certain level of heuristics in the system, because they are computationally a thousand times easier than a neural network.

It's like a cruise missile: if you're trying to swat a fly, just use a fly swatter, not a cruise missile. But over time, we expect it to move to really just training against video: video in, and the car's steering and pedals out. Basically video in, lateral and longitudinal acceleration out, almost entirely. That's what we're going to use the Dojo system for, because there is no system that can do that currently.

Maybe just going back to the sensor suite discussion, the area I would like to talk about is the lack of side radars. Take a situation where you have an intersection with a stop sign and cross traffic at maybe 35 to 40 miles per hour. Are you comfortable that the sensor suite, the side cameras, can handle that?

Yes, no problem. Essentially, the car is going to do something like what a human would do. A human is basically like a camera on a slow gimbal, and it's quite remarkable that people can drive the way they do, because you can't look in all directions at once. The car can literally look in all directions at once with multiple cameras. Humans drive by just looking this way and looking that way, and they are stuck in the driver's seat; they can't really get out of the driver's seat. So a human is like a single camera on a gimbal, and yet a conscientious driver is able to drive with very high safety.

The cameras in the cars have a better vantage point than the person: they're up in the B-pillar or in front of the rear-view mirror. They really have a great vantage point. So if you're turning onto a road that has a lot of high-speed traffic, you can do what a person does: go gradually, turn a little, don't go all the way onto the road, and let the cameras see what's going on. If things look good and the rear cameras don't show any oncoming traffic, you go, and if it looks sketchy, you can just back off a bit, like a person. The behaviors are remarkably similar; it starts to become remarkably lifelike. It's quite strange, actually: the car just starts behaving like a person.

Here you go. Right here. Okay. Given all the value that you're creating in your automotive business by wrapping all this technology around it, I guess I'm curious why you're still taking some of your cell capacity and putting it into Powerwall and Powerpack. Wouldn't it make sense to put every single unit you can make into this part of your business?

We already stole almost all of the cell lines that were meant for Powerwall and Powerpack and used them for the Model 3. Last year, in order to make our Model 3 production and not be cell-starved, we had to convert all the 2170 lines at the Gigafactory to car cells. Our actual output in total gigawatt-hours of stationary storage compared with vehicles is an order of magnitude different. And for stationary storage, we can basically use a whole range of available cells, so we can gather cells from multiple suppliers around the world, and you don't have a certification or safety issue like you have with cars. So basically our stationary battery business has been running on scraps for quite some time.

Yes. The way we really think about production is that there are many, many constraints in a mass-production system. I find it surprising how much manufacturing and the supply chain are underestimated.

There are a whole series of constraints, and what the constraint is one week may not be the constraint another week. It's tremendously difficult to make a car, especially one that is evolving rapidly. So, yes. I'll answer a few more questions, and then I think we should wrap up so you can test drive the cars.

Okay. Adam Jonas, with a question about safety: what data can you share with us today?

We publish safety data every quarter, and what we see now is that Autopilot is about twice as safe as a normal driver on average, and we expect that to increase quite a bit over time. As I said, in the future consumers will want to ban it.

I don't think they will be successful, and I'm not saying I agree with this position, but in the future consumers will want to ban people from driving their own cars because it's not safe. Think about elevators. Elevators used to be operated with a big lever, like a big switch, to move up and down between floors, and there were elevator operators. But periodically they would get tired or drunk or something, and then they would move the lever at the wrong time and cut someone in half. So now there are no elevator operators, and it would be quite alarming if you walked into an elevator that had a big lever that could move it between floors arbitrarily. So there are just buttons. In the long run, and again, this is not a value judgment, I'm not saying "I want the world to be like this", I'm saying that consumers will most likely demand that people not be allowed to drive cars.

Can you share with us how much Tesla spends on Autopilot or autonomous technology, by order of magnitude, annually?

That's basically our entire expense structure.

A question regarding the economics of the Tesla Network. Just so I understand: it seems that if you get a Model 3 off lease, $25,000 goes on the balance sheet as an asset, and then it would generate a cash flow of $30,000 a year or so. Is that the way to think about it?

Something like that, yes.

And then just in terms of financing, there's a question. You mentioned before that you would be cash-flow neutral. Is that cash-flow neutral for the robotaxi program or cash-flow neutral for Tesla as a whole?

Sorry, cash flow in what terms?

You were asked a question about financing the robotaxis, and it seems to me that they are self-funding, but you mentioned that you would be basically cash-flow neutral. Is that what you meant?

I'm just saying that between now and when the robotaxis are fully deployed around the world, the sensible thing for us to do is to maximize the production rate and drive the company to about cash-flow neutral. Once the robotaxi fleet is active, I would expect to be extremely cash-flow positive.

So you were talking about production?

Yes, production. Maximize the number of autonomous units manufactured.

Thanks.

Maybe one last question.

Yes. If I add my Tesla to the robotaxi network, who is responsible for an accident?

Is it Tesla or me, if the vehicle has an accident and causes damage?

Probably Tesla. Probably Tesla. The right thing to do is to make sure there are very few accidents.

Okay, thank you all. Please enjoy the rides. Thank you.
