
IROS 2022 Keynote: Safe Learning in Robotics by Prof. Angela Schoellig

Hello everyone, thank you very much for attending my talk. Unfortunately, I fell sick and have to give this talk virtually. I would have loved to meet you all in person and discuss our robotics research with you in Japan, but we will do it again at the next conference. I also have some personal news to share: I recently assumed an Alexander von Humboldt Professorship at the Technical University of Munich in Germany, and my laboratory has been expanding from Toronto, Canada, to Munich, Germany. I would like to take this opportunity to acknowledge the incredibly supportive environment I have experienced at the University of Toronto, and I want to thank one person in particular, Tim Barfoot, who welcomed me to Toronto more than nine years ago and trusted in my abilities as a young researcher.
Thank you. Three researchers from Canada have joined my lab in Munich, and the rest of the team is working with our robot setups in Canada; these are also the people behind all the results that I am showing today. As part of our move, we have renamed ourselves the Learning Systems and Robotics Lab, and you can find us on social media at learnsyslab. I am finally hiring researchers at all levels, as well as engineers and administrative staff, and if you are interested, please contact us. Okay, enough personal updates, let's get started.
Today's talk focuses on safe learning in robotics, a topic that interests me a lot. The motivation for this work is quite simple: the next generation of robots will depend on machine learning in one form or another. We need machine learning to understand complex environments and complex robot dynamics, to perform complex tasks, and to interact with the world. However, when machine learning algorithms or their results are deployed on robots in the real world, their safety is important. To set the stage, I would like to take you to my laboratory. My team works on improving the performance of mobile robots in increasingly complex scenarios; our goal is to achieve high-performance movements for robot teams.
Here you see 25 vehicles coordinating their movements. We have also taken these vehicles outside the laboratory to a model nuclear power plant facility to study how they can be used for monitoring tasks. We have deployed machine learning in other safety-critical applications, such as outdoor flights, where we use vision for localization if GPS fails and where we reached speeds of up to 28 kilometers per hour. We have deployed flying vehicles in mines, and we have done long-term off-road driving experiments. We also drive on the road for the SAE AutoDrive Challenge, which the University of Toronto won five times in a row, and with our mobile manipulator we finally interact directly with the environment. What we have seen repeatedly is that the use of data can improve performance in a typical control system.
We design a baseline controller based on our prior knowledge that achieves good reference tracking, meaning the actual output closely follows the reference signal. However, often we observe something like this: the desired trajectory, in blue, is not followed very accurately, and we see a repetitive error every time we perform the task, shown in the colored lines. This calls for learning; a very simple approach is to iteratively update the reference trajectory. If we do that, we see that the input moves earlier and shows larger amplitudes, and we then get much closer tracking.
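As an aside, the iterative update just described can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (scalar output, a pure-gain stand-in for the real system, a hand-picked learning rate); the names are illustrative, not from the talk:

```python
import numpy as np

def update_reference(reference, desired, actual, learning_rate=0.5):
    """One learning iteration: shift the reference by a fraction of the
    repetitive tracking error observed in the last run."""
    return reference + learning_rate * (desired - actual)

# After each repetition of the task, correct the reference with the
# recorded output so that the next run tracks the desired path closer.
desired = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))  # desired trajectory
reference = desired.copy()                             # start from the desired path
for _ in range(20):
    actual = 0.8 * reference   # stand-in for the real system: it attenuates
    reference = update_reference(reference, desired, actual)
# reference converges to desired / 0.8, so the output converges to desired
```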
We have used this for tasks like fast flight and slalom maneuvers. The disadvantage of this approach is that we have to relearn each task from scratch through repetition, so a slight modification of the architecture lets us instead learn an inverse model, which we do offline from the data we collect; we can then improve the tracking of arbitrary trajectories, as seen here. Students in the lab draw trajectories, shown in red, the path of the quadrotor with learning enabled is shown in green, and we get significant performance improvements of around 60 percent compared to no learning, which is the light gray line.
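The offline inverse-model variant can be sketched in the same spirit: fit a regression model on logged (output, reference) pairs, then pre-filter any new desired trajectory through it before sending it to the controller. A minimal linear-regression illustration on made-up stand-in data:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Offline: from logged data, learn the inverse map from observed output
# back to the reference that produced it (stand-in linear system below).
logged_refs = np.random.randn(1000, 1)        # references sent in past runs
logged_outputs = 0.8 * logged_refs - 0.1      # outputs the system produced

inverse_model = Ridge(alpha=1e-3)
inverse_model.fit(logged_outputs, logged_refs)  # output -> reference

# Online: pre-filter an arbitrary desired trajectory so that the system's
# actual output lands (approximately) on the desired one.
desired = np.sin(np.linspace(0.0, 2.0 * np.pi, 100)).reshape(-1, 1)
compensated_reference = inverse_model.predict(desired)
```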
We have also used this on a mobile manipulation platform where we do not have access to some of the lower-level controllers, but we can still use this approach as an additional module to improve tracking performance. Here you can see that we need to be very precise in space and time to catch balls, and with the learned inverse model we can reduce the base and arm tracking error by about 80 percent, which lets us achieve catch rates of about 85 percent. This is not perfect: sometimes we do not predict the trajectory of the ball accurately, which is a different problem, but sometimes our tracking is still not accurate enough.
These two examples show that with relatively simple methods, and without knowing much about our system, we can use data to improve performance. But what does safety mean in this context? For these two approaches, safety means stability: if the baseline system is stable, as shown by this gray shaded area or bound, then our learning ensures that we stay stable. In general, safety either means stability, like keeping a flying vehicle in the air, or it means constraint satisfaction, like keeping objects balanced on the tray and avoiding obstacles. To summarize, the safe learning control problem aims to use data to improve robust decision making under uncertainty while meeting the given safety constraints.
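In symbols, these two notions of safety can be written as follows; this is a standard formalization, not notation from the talk:

```latex
% Safety as stability: the state stays near, and converges to, an
% equilibrium x_eq (beta is a class-KL comparison function):
\|x(t) - x_{\mathrm{eq}}\| \le \beta\big(\|x(0) - x_{\mathrm{eq}}\|,\, t\big)
\quad \forall t \ge 0.

% Safety as constraint satisfaction: states and inputs never leave
% their safe sets:
x(t) \in \mathcal{X}_{\mathrm{safe}}, \qquad u(t) \in \mathcal{U}_{\mathrm{safe}}
\quad \forall t \ge 0.
```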
I hope I have convinced you that data matters even in simple control cases and that safety matters in real-world robotics applications. In the rest of this talk I want to present to you the state of the art in safe robot learning, the open challenges, and my vision for this field, and, for the benefit of the community, what we need to do to enable progress. This first part summarizes the key findings from a review paper my team has recently completed. What we have seen is that the field of safe learning in robotics is driven by the controls and reinforcement learning communities, often quite separated from each other. With our review article, we hope to bring these works together in one place and unify the language, with the goal of ultimately accelerating research in this area. The controls and reinforcement learning communities really come from two different extremes.
Model-driven approaches, which are generally classical control approaches, consider a set of predefined robot dynamics and environments and design controllers for those conditions, so only a small portion of the world can be accurately modeled with these simple models; think of a system linearized around an operating point. There is usually a clear understanding of what can be accurately modeled and is safe, shown in green, and what cannot be accurately modeled and is not safe, shown in red. In this case it is possible to give guarantees within the specific context, but generalizing to new operating conditions is challenging.
Data-driven approaches, which are typically reinforcement learning approaches, learn a model of the world over time by collecting data; however, there is generally no clear boundary between what can and cannot be accurately modeled. These approaches allow for high generalization, but providing guarantees is often challenging. Eventually, combining the two approaches promises generalization while using models, learning together to improve the model over time with a predefined risk. What we established in our review article is that the safe robot control problem studied in the controls and reinforcement learning communities can generally be described as an optimization problem in which we want to minimize a cost subject to the true robot dynamics and subject to constraints, where all of these components may be unknown: the robot dynamics may be fully or partially unknown, the cost may be partially unknown, and so may the constraints. For example, we may not know the exact mapping from states and inputs to the performance metric of the task we are interested in. The question is then really how we design algorithms that map our prior knowledge, and any data we have collected from the system, to the optimal control policy; this is what we call the safe learning control problem in the literature. A rough formalization is sketched below.
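Written out, the problem takes roughly the following form; the notation is assumed here for illustration:

```latex
\begin{aligned}
\min_{\pi}\quad & J(\pi) \;=\; \mathbb{E}\Big[\textstyle\sum_{k=0}^{N} \ell(x_k, u_k)\Big] \\
\text{s.t.}\quad & x_{k+1} = f(x_k, u_k, w_k), \qquad u_k = \pi(x_k), \\
& c_j(x_k, u_k) \le 0 \quad \forall\, j,\, k,
\end{aligned}
```

where the true dynamics $f$, the stage cost $\ell$, and the constraints $c_j$ may each be only partially known, and the algorithm maps prior knowledge plus collected data to the policy $\pi$.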
In the literature, we found that these approaches target three different safety levels. Safety level 1 encourages minimal violation of the safety constraints, so failures are possible. Safety level 2 guarantees satisfaction of the safety constraints with a predefined probability, so with high probability we will not see any safety violations. Safety level 3 guarantees that there are no violations at any time. With these safety levels defined, we can map the current literature in this field onto a diagram where on the vertical axis we have increasing safety guarantees, from no guarantees to satisfying strict constraints, and on the horizontal axis we have our knowledge about the system, from fully known dynamics to unknown dynamics, where as we go to the right we have increasing unstructuredness and uncertainty in the problem. Classical control approaches fall into the area on the left: we assume a prior model, possibly with bounded uncertainties, but we assume this prior model and the uncertainty bounds are correct, and if they are correct then we can provide strict constraint satisfaction. Reinforcement learning, on the other hand, is at the bottom: there are usually no guarantees, but also no assumptions are made about the dynamics. The perfect example for this is OpenAI's robotic hand solving a Rubik's Cube; those are tremendously complicated dynamics that are very difficult to write down or make assumptions about, but reinforcement learning can accomplish this task.
However, only sometimes: only in 20 percent of the hardest cases is the learned policy successful, and 80 percent of the time the cube is dropped or a timeout is reached, so there are really no guarantees about the performance of the policy. Safe learning approaches have tried to fill in the white space between these extremes. One category we find in the literature is safely learning uncertain dynamics. The key idea here is that you have a nominal dynamics model, but there are uncertain components of your dynamics, and these are often modeled with a Gaussian process, where initially your uncertainty, the blue shaded area, is large, but over time, as you collect more data, the uncertainty reduces. This stochastic model is combined with robust control to ensure safety for all possible models represented by the blue shaded area. We have shown this on an off-road vehicle: with just a simple kinematic model, we hit those pylons and cannot guarantee that we follow the path accurately enough.
If we add the learned additive model, the uncertainty of that model is initially large, and the robot drives slowly to keep the purple envelope within the track bounds for the next few seconds; as we collect more data, the uncertainty is reduced and we can drive faster, while this approach guarantees at all times that we satisfy the path constraints. You can use a similar approach to learn the cost function of a task: here we want to understand how the controller gains, k1 and k2, map to the performance of the task, which is a flying task, and over time we can safely explore the possible sets of gains that could lead to high performance without compromising safety, finally finding the optimal gains that are safe and obtaining an optimized controller.
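A minimal sketch of the dynamics-learning piece: fit a Gaussian process to the residual between the simple prior model and the observed next states, so that the GP's predictive standard deviation plays the role of the shrinking uncertainty envelope described above. This is a toy one-dimensional example with assumed names, not the controller from the experiments:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def prior_model(x, u):
    """Simple nominal model, e.g. from a kinematic derivation."""
    return x + 0.1 * u

# Data collected while driving: (state, input) -> observed next state.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))  # columns: state, input
true_next = X[:, 0] + 0.1 * X[:, 1] + 0.05 * np.sin(3.0 * X[:, 0])
residual = true_next - prior_model(X[:, 0], X[:, 1])

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, residual)

# Prediction: nominal model plus learned residual, with an uncertainty
# band that a robust controller keeps inside the track bounds.
x_query = np.array([[0.2, 0.5]])
mean, std = gp.predict(x_query, return_std=True)
next_state = prior_model(0.2, 0.5) + mean[0]
envelope = 2.0 * std[0]   # ~95% bound on the model error at this point
```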
So these are examples of safely learning uncertain dynamics or uncertain cost functions. A second category is safety certification and safety filters. The main idea is that we can have very complex learning-based controllers, which can even take an image as input, but then a safety certification filter minimally modifies the control input to ensure that the system remains safe in the future. One approach in this category from my group is the Lipschitz network adaptation: think of the earlier inverse-model approach, where we learned the inverse model offline, but here we learn it online as we go. The assumption we make is that the underlying system is stable, and then the following condition ensures that the closed-loop system remains stable: the Lipschitz constant of the neural network must be less than one over the system gain, and the system gain is something relatively easy to identify through an experiment. By using a Lipschitz network architecture in particular, we can ensure that the prescribed Lipschitz constant is met, so we only learn models that keep our inputs certified and safe: we modify the gains of the neural network whenever the condition would be violated. We used this to stabilize a pendulum on a quadrotor, a very basic, commercially available quadrotor, where a standard model-based approach is extremely difficult because there are many non-idealities that we do not know about. With the Lipschitz network approach we can stabilize the system, and it learns and adapts online as we go; as you can see, it is very fast and can even withstand disturbances.
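One common way to realize such a Lipschitz bound is spectral normalization of each layer's weights; the architecture used in the talk is more specialized, so treat this as an illustrative sketch of the small-gain idea (1-Lipschitz activations such as ReLU assumed):

```python
import numpy as np

def spectral_norm(W, n_iters=50):
    """Estimate the largest singular value of W by power iteration."""
    v = np.random.randn(W.shape[1])
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)

def enforce_lipschitz(weights, gamma):
    """Rescale each layer so that the product of the per-layer Lipschitz
    constants is at most gamma (activations assumed 1-Lipschitz)."""
    budget = gamma ** (1.0 / len(weights))   # equal budget per layer
    return [W * min(1.0, budget / spectral_norm(W)) for W in weights]

# Small-gain condition: if the system gain g is identified from an
# experiment, the network's Lipschitz constant must stay below 1/g.
g = 2.0
weights = [np.random.randn(16, 4), np.random.randn(4, 16)]  # toy 2-layer net
safe_weights = enforce_lipschitz(weights, gamma=0.99 / g)
```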
The third category, driven by the reinforcement learning community, encourages safety by including the constraints in the cost function, so that as we minimize the cost we hopefully eventually satisfy the constraints as well; this encourages safety but does not strictly guarantee it. Ultimately, we want to end up at the top right: being able to deal with very complex systems and very expressive models while still providing safety guarantees. So what are the open challenges? One is a broader class of systems: at this point we assume ordinary differential equations that are sufficiently smooth, but the real world is not smooth; we have hybrid dynamics when we make contact with objects.
There are also multi-agent systems and soft robots that cannot be modeled or captured with current safe learning approaches. We also need to work on scalability and on sampling and computational efficiency: even for very simple examples like the inverted pendulum here on the left, it sometimes takes hours or days to train a safe controller, so how do we go from simple tasks like these to something much more complex, like the surface-sampling satellite on the right, where computing power is extremely limited and safety is critical? Another point is that all of these approaches assume you have a state estimate available that is reasonably good (maybe corrupted by white noise, but otherwise reasonably good), but this is not true in practice, where we have complex sensor setups that provide high-dimensional sensor data, and we first have to make sense of that data and extract the state.
So how do we do this if a good state estimate is not readily available? Furthermore, safety guarantees are based on assumptions, even mild ones such as bounded disturbances or Lipschitz dynamics, but how do we verify that these assumptions hold in real-world environments? We can do this because we collect data all the time, but what do we do if they do not hold? And one of my favorites: can we automatically infer what is safe? A robot with a camera can semantically understand the environment and should eventually be able to infer safe behavior, rather than us manually programming those constraints, such as room boundaries and obstacle positions. So how do we move towards these goals?
One thing we noted in our review article is that results have been tested on a wide variety of different systems, from numerical examples to grid worlds to physics-based simulations to real robotic systems, but of the 80 papers, only about 30 provide hardware experiments, suggesting that it is still very difficult to run many of the proposed algorithms in real time in practice while ensuring that all the assumptions are satisfied. We also noted that fewer than 20 of the papers provide open-source implementations, which hurts reproducibility and discourages comparison between different algorithms. In response, my team developed safe-control-gym.
safe-control-gym is an environment suite that lets us benchmark any model-based or learning-based controller, including safe learning-based control. Physics-based simulation is a compromise between realism and accessibility, and it has recently also given us a sim-to-real transfer option. The key features we included, which were not part of any of the popular RL benchmarks, are that prior knowledge, such as the dynamics equations and the safety constraints, can be specified symbolically, and that we can reproducibly inject disturbances, such as input disturbances, process noise, and additive dynamics disturbances. Finally, it is compatible with other environments through the Gym interface.
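A hedged sketch of what interacting with such an environment looks like. The factory call and task names below are assumptions about the package layout (check github.com/utiasDSL/safe-control-gym for the exact API); the reset/step loop itself is the standard Gym one:

```python
# Assumed import path and environment/task names; verify against the repo.
from safe_control_gym.utils.registration import make

env = make("cartpole", task="stabilization")  # names are assumptions

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()        # replace with your controller
    obs, reward, done, info = env.step(action)
    # info can expose constraint evaluations for safety-aware controllers
    if done:
        obs = env.reset()
```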
We started with three systems: the cart-pole and quadrotors moving in 1D and 2D, and now we also have a 3D quadrotor. These systems were chosen because they are the lowest common denominator we found in the RL and controls literature. The two tasks we include at this time, which anyone can extend later, are stabilization and tracking of a predetermined trajectory. This enables model-based and learning-based control as well as model-free reinforcement learning, and one of the great benefits of using this environment is that we have already implemented many baselines that you can use to compare your algorithm against, from classical model-based control approaches such as LQR and model predictive control, and standard reinforcement learning approaches such as PPO and SAC, up to some of the latest safe learning control approaches; then you can get results like this.
Here, the robot needs to stay in the black square while following the red circle as closely as possible. We can now use approaches from the different categories I introduced and compare them, also on performance and data efficiency. The blue line here is a model-based learning approach where the prior model parameters are overestimated by 150 percent; on the vertical axis you see the performance on the given task, and on the horizontal axis the required training data. You can see that this model-based learning approach, even though the model is quite wrong, requires about two orders of magnitude less data than a completely model-free learning approach.
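To make the model-based end of this comparison concrete, here is a minimal discrete-time LQR of the kind used as a baseline; this is a generic textbook implementation on a toy double integrator, not the benchmark's own code:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Infinite-horizon discrete-time LQR feedback gain, u = -K x."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Toy double integrator standing in for a linearized cart-pole/quadrotor.
dt = 0.02
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
K = lqr_gain(A, B, Q=np.diag([10.0, 1.0]), R=np.array([[0.1]]))

x = np.array([1.0, 0.0])    # initial offset from the target state
for _ in range(200):
    u = -K @ x              # feedback from the prior model: no training data
    x = A @ x + B @ u       # state converges toward the origin
```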
We have taken this to the community: here at IROS we held a virtual safe-learning competition where the goal was to design a controller or planner that safely passes through a set of gates and reaches a target while avoiding obstacles. The challenge was that there are uncertainties in the robot dynamics, such as mass and inertia, and in the environment, such as wind and the positions of the gates; here you see the video of the winners. We encouraged participants to explore both control and reinforcement learning approaches, and the best approaches we tested at our flight arena in Toronto; as you see, the sim-to-real transfer worked. This keeps us honest, because we really test on the vehicles, and we noticed, for example, that some approaches are not fast enough to run well on the real vehicle. I hope you will remember our review paper on safe learning in robotics and contribute to the field by making use of our safe-control-gym benchmark suite, and if you are interested in any upcoming events and competitions, go to robotlearning.org.
We also have recordings of some of the previous workshops we have hosted on this topic, and there is a way to sign up for our mailing list. Since it is very difficult to chat in a virtual setting, I thought I would host a Zoom meeting next week, after everyone has returned home; if you are interested in chatting and asking questions on Zoom, fill out the form at tiny.cc slash iros cafe chat. Thank you very much.
