
Elon Musk on Cameras vs LiDAR for Self Driving and Autonomous Cars

Feb 27, 2020
Lidar is nonsense, and anyone relying on lidar is doomed to failure. Expensive sensors that are unnecessary. It's like having a whole bunch of expensive appendixes: one appendix is bad, and now you have a bunch of them. That's ridiculous.

So in the next section of my talk I'm going to focus on depth perception using vision alone. There are at least two sensors in question here: one is vision, cameras that just get pixels, and the other is lidar, which a lot of companies also use; lidar gives you point measurements of the distance to things around you.
One thing I'd like to point out first is that all of you came here; many of you drove here, and you used your neural network and your vision. You were not shooting lasers out of your eyes, and you still ended up here. So clearly the human neural network derives distance, all the measurements, and a three-dimensional understanding of the world from vision alone. It actually uses multiple cues to do this, and I'll briefly go over some of them to give you a rough idea of what's going on. As an example, we have two eyes pointed forward, so you get two independent measurements of the world ahead of you at every time step, and your brain puts that information together to arrive at a depth estimate, because you can triangulate any point seen from those two viewpoints. Many animals instead have eyes placed on the sides of their heads, so they have very little overlap in their visual fields; they typically use structure from motion: the idea is that they move their head, and because of that motion they get multiple observations of the world and can again triangulate depths. And even with one eye closed and completely still, you still have some sense of depth perception.
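As an aside not from the talk, here is a minimal sketch of that two-viewpoint idea: with two horizontally separated, rectified views (a stand-in for two eyes or a stereo camera pair), the shift of a point between the images, the disparity, encodes its depth. The focal length, baseline, and disparity values below are made-up illustrative numbers.

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.
import numpy as np

def depth_from_disparity(disparity_px: np.ndarray,
                         focal_length_px: float,
                         baseline_m: float) -> np.ndarray:
    """disparity_px : per-pixel disparity in pixels (x_left - x_right)
    focal_length_px : focal length expressed in pixels
    baseline_m      : distance between the two camera centers in meters"""
    d = np.asarray(disparity_px, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_length_px * baseline_m / d, np.inf)

# Example: a 35 px disparity with a 1000 px focal length and a 6.5 cm baseline
# (roughly human eye spacing) corresponds to a point about 1.9 m away.
print(depth_from_disparity(np.array([35.0]), 1000.0, 0.065))
```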
If I do this, you don't think I'm suddenly six feet closer to you or a hundred miles back, and that's because there are a lot of very strong monocular cues that your brain also takes into account. A pretty common visual illusion illustrates this: the two blue bars are identical, but the way your brain puts the scene together makes it expect one of them to be larger than the other because of the vanishing lines in the image. So your brain does a lot of this automatically, and artificial neural networks can as well. Let me give you three examples of how you can get to depth from vision alone: a classical approach, and two that rely on neural networks.
Here is a video; I think this is San Francisco from a Tesla. These are our cameras, our sensors. I'm only showing the main camera, but all eight cameras on Autopilot are on. If all you have is this six-second clip, what you can do is build up a 3D environment using multi-view stereo techniques, so this is the 3D reconstruction of those six seconds of that car driving down that road. You can see that this information is recoverable purely from video, roughly through a triangulation process, as I mentioned, with multi-view stereo. We've applied similar techniques, a little more sparsely and roughly, in the car as well. So it's notable that all of that information is actually there in the sensor, and it's just a matter of extracting it.
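As a rough illustration (not Tesla's actual pipeline), the sketch below shows the linear triangulation step at the heart of multi-view stereo: given camera poses for a few frames of a clip, a point tracked across those frames is recovered by intersecting its viewing rays. The intrinsics, poses, and pixel coordinates here are toy values chosen for the example.

```python
import numpy as np

def triangulate_point(proj_mats, pixels):
    """proj_mats: list of 3x4 camera projection matrices (K [R|t]).
    pixels: list of (u, v) observations of the same point in each frame.
    Returns the 3D point that best satisfies all reprojection constraints."""
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        # Each observation contributes two linear constraints on X (homogeneous).
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    x = P @ X
    return x[:2] / x[2]

# Toy check: two cameras one meter apart along x, both looking down +z.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 8.0, 1.0])        # a point 8 m ahead
px = [project(P, X_true) for P in (P0, P1)]
print(triangulate_point([P0, P1], px))          # ~ [0.5, 0.2, 8.0]
```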
The next project I want to talk about briefly: as I mentioned, neural networks are very powerful visual recognition engines, and if you want them to predict depth, you just need to, for example, source depth labels, and they can do that extremely well. There is nothing that limits the networks from predicting this monocular depth except the label data. An example project we have looked at internally uses the forward-facing radar, which is shown in blue. That radar is looking out and measuring the depths of objects, and we use it to annotate what vision sees, the bounding boxes that come out of the neural networks. So instead of human annotators telling you, okay, this car in this bounding box is approximately 25 meters away, you can annotate that data with the sensor.
It's much better to use sensors to annotate other sensors. As an example, radar is pretty good at measuring distance, so you can use it to annotate, and then you can train your neural network on that; if you have enough data, the network becomes very good at predicting those patterns. Here's an example of its predictions: the circles show radar objects, and the cuboids come purely out of vision; the depth of those cuboids is learned from the radar annotations. If this works well, the circles in the top-down view will match the cuboids, and they do, because neural networks are very proficient at predicting depth. They can learn the typical sizes of different vehicles internally, they know how big those vehicles are, and you can actually get depth from that pretty accurately.
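Below is a hedged sketch of that "one sensor annotates another" setup: a small regression head predicts an object's distance from image features and is supervised by the radar range matched to that object's bounding box. The model, feature dimensions, and the random tensors standing in for real detections are illustrative assumptions, not Tesla's implementation.

```python
import torch
import torch.nn as nn

class DepthHead(nn.Module):
    """Predicts a scalar distance (meters) from per-object image features."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Softplus(),  # distances are positive
        )

    def forward(self, object_features: torch.Tensor) -> torch.Tensor:
        return self.mlp(object_features).squeeze(-1)

head = DepthHead()
optim = torch.optim.Adam(head.parameters(), lr=1e-4)
loss_fn = nn.SmoothL1Loss()  # robust to occasional bad radar associations

# One illustrative step: `object_features` would come from the vision backbone
# for each detected bounding box, `radar_range_m` from the radar return matched
# to that box -- the automatic label replacing a human annotation.
object_features = torch.randn(32, 256)        # 32 detected objects (fake)
radar_range_m = torch.rand(32) * 80.0 + 5.0   # fake radar distances, 5-85 m

pred = head(object_features)
loss = loss_fn(pred, radar_range_m)
optim.zero_grad()
loss.backward()
optim.step()
```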
The last mechanism I'll talk about very briefly is a bit more sophisticated and a bit more technical. There have been a number of papers, basically in the last two years, about this approach; it's called self-supervision. What a lot of these papers do is just feed raw video into the neural network, without labels of any kind, and you can still get the network to learn depth. It's a bit technical, so I can't go into all the details, but the idea is that the neural network predicts the depth in each frame of the video, and there are no explicit targets that the network regresses to with labels; instead, the objective of the network is to be consistent over time. Any depth it predicts must be consistent over the duration of the video, and the only way to be consistent is to be correct, so the network ends up predicting the correct depth for all the pixels. We have reproduced some of these results internally, and this works quite well too.
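For the technically inclined, here is a condensed sketch of one common form of that self-supervised objective (in the spirit of the recent papers mentioned, not any specific system): the predicted depth for frame t, together with the camera motion to frame t+1, is used to warp frame t+1 back into frame t, and the only training signal is that the warped image should match frame t. Shapes, intrinsics, and the pose source are assumptions; real systems typically also predict the pose and add robustness terms.

```python
import torch
import torch.nn.functional as F

def photometric_consistency_loss(img_t, img_t1, depth_t, K, T_t_to_t1):
    """img_t, img_t1: (B,3,H,W) consecutive frames.
    depth_t: (B,1,H,W) depth predicted for frame t.
    K: (B,3,3) camera intrinsics.  T_t_to_t1: (B,4,4) relative camera pose."""
    B, _, H, W = img_t.shape
    device = img_t.device

    # Pixel grid in homogeneous coordinates, shape (B, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                            torch.arange(W, device=device, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D using the predicted depth, then move into frame t+1.
    cam_pts = torch.linalg.inv(K) @ pix * depth_t.reshape(B, 1, -1)
    cam_pts_h = torch.cat([cam_pts, torch.ones(B, 1, H * W, device=device)], dim=1)
    cam_pts_t1 = (T_t_to_t1 @ cam_pts_h)[:, :3]

    # Project into frame t+1 and build a sampling grid normalized to [-1, 1].
    proj = K @ cam_pts_t1
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1).reshape(B, H, W, 2)

    # Reconstruct frame t by sampling frame t+1; penalize the photometric error.
    recon_t = F.grid_sample(img_t1, grid, align_corners=True, padding_mode="border")
    return (recon_t - img_t).abs().mean()
```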
In short, people drive with vision alone; there are no lasers involved, and it seems to work quite well. The point I'd like to make is that visual recognition, and very powerful neural networks for it, are absolutely necessary for autonomy; it is not a nice-to-have. We must have neural networks that actually understand the environment around you, and lidar points are a much less information-rich representation of that environment: vision understands the full details of the scene, while a few points here and there carry much less information. As an example, on the left here: is it a plastic bag or a tire? Lidar might give you a few points on it,
but vision can tell you which of those two things it is, and that affects your control. Did that person on a bike glance back slightly? Are they trying to merge into your lane, or are they just going to keep going straight? Construction zones: what do those signs say? How should I behave in this world? All the infrastructure we've built for roads is designed for human visual consumption, all the signs, all the traffic lights, everything is designed for vision, and that's where all the information is, and that's why you need that capability. Is that person distracted, on their phone, about to walk into your lane? The answers to all of these questions are only found in vision, and they are necessary for Level 4 and Level 5 autonomy. In this sense, lidar is really a shortcut: it sidesteps the fundamental problem, the important problem of visual recognition that is necessary for autonomy, and so it gives a false sense of progress and is ultimately a crutch.
I should point out that I don't actually hate lidar as much as it may sound. At SpaceX, Dragon uses lidar to navigate to the space station and dock; in fact, SpaceX developed its own lidar from scratch to do that, and I spearheaded that effort personally, because in that scenario lidar makes sense. In cars it's fucking stupid: it's expensive and unnecessary, and, as I was saying, once you solve vision it's worthless, so you have expensive hardware that adds no value to the car. We have a forward radar that is low cost and is useful, especially in occlusion situations: if there is fog or dust or snow, the radar can see through that. And if you are going to use active photon generation, don't use the visible wavelength, because with passive optics you have already taken care of everything in the visible wavelength; you want a wavelength that penetrates occlusions, like radar.
Lidar is just active photon generation in the visible spectrum. If you are going to do active photon generation, do it outside the visible spectrum, in the radar spectrum: at 3.8 millimeters versus 400 to 700 nanometers, you have much better occlusion penetration. That's why we have a forward radar, and we also have twelve ultrasonic sensors for near-field information, in addition to the eight cameras and the forward radar. You only need radar in the forward direction, because that's the only direction in which you're going very fast. We've gone over this many times: do we have the right sensor suite, should we add anything else? No.
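As a quick numerical aside (not from the talk) on that wavelength comparison: 3.8 mm corresponds to roughly a 79 GHz automotive radar, a wavelength several thousand times longer than visible light, which is why it scatters far less off fog, dust, and snow.

```python
# Back-of-the-envelope check on the 3.8 mm vs. 400-700 nm comparison.
c = 299_792_458.0            # speed of light, m/s

radar_wavelength = 3.8e-3    # meters
visible_min, visible_max = 400e-9, 700e-9

print(f"radar frequency  ~ {c / radar_wavelength / 1e9:.0f} GHz")
print(f"wavelength ratio ~ {radar_wavelength / visible_max:,.0f}x to "
      f"{radar_wavelength / visible_min:,.0f}x longer than visible light")
```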

I just wanted to follow up partially on that: several of your competitors in this space in recent years have talked about how they're augmenting all of the route planning and perception capabilities on the automotive platform with high-definition maps of the areas you're driving in. Does that play a role in your system?
Do you see it adding any value? Are there areas where you would like more data that is not collected from the fleet but is more cartographic-style data?

High-precision GPS maps and lane lines are a very bad idea. The system becomes extremely brittle: with any change, the system can't adapt. If you become reliant on high-precision, fixed GPS lane lines and don't allow vision to override them, that's a problem. Your vision should be what makes everything work, and then the lane lines are a guide, but they are not the main thing. We briefly barked up the tree of high-precision lane lines, then realized it was a big mistake and reversed it. Not good.
