
Keynote talk: Tapio Schneider, Caltech: “Lessons from Arakawa in Fusing Theory, Data, and Computing”

Mar 18, 2024
Tapio earned his PhD in 2001 from Princeton and has pretty much been at Caltech ever since. He is a professor of environmental science and engineering and is also a senior research scientist at JPL. He heads Caltech's climate dynamics group. When I was looking at Tapio's website, I saw that he has a lot of really good videos on there, and he also has some blog posts that are really interesting, especially for people like me who aren't quite up to speed on some of this newer work. He has a video that I love and recommend; it is titled Clouds and Climate Tipping Points.
It's a video he made about three years ago. I really recommend it; it has a lot on the role of CO2 changes in stratocumulus clouds. Anyway, the title of Tapio's talk is "Lessons from Arakawa in Fusing Theory, Data, and Computing."
Thanks, Wayne, and thanks to Dave and everyone who put this together. This has been really interesting for me, especially yesterday, learning about Akio Arakawa in ways I didn't get the chance to by getting to know him personally. I met him; we had brief conversations. I think between his shyness and my shyness and intimidation there wasn't much exchange, but I really enjoyed the few hours we spent together. What I have always enjoyed, and what I admire, is his work, which I am familiar with, and his approach to science.
I think it is exemplary, and it should be exemplary for all of us today, and I want to explain why and how. Let me dive right in. Akio Arakawa was, of course, part of what we call the Charney report, the first climate assessment, where people looked at multi-model ensembles, you know, three models at the time, of climate change predictions and tried to find ways to say how CO2 will affect the climate, and this has been cited many times. The estimate of surface warming was between two and three and a half degrees, which was arrived at by taking into account both the models and some factors for what people thought were uncertainties; that part is well known. I think what's interesting about the report is the care with which it considers the science, and this is already in the summary statement.
"This range reflects both uncertainties in physical understanding and inaccuracies arising from the need to reduce the mathematical problem to one that can be handled by even the fastest computers available." I mean, that's a phrase you could still write today, word for word, in any climate assessment. Of course, here we are today: this is climate sensitivity, and in the CMIP6 models the range is two to almost six degrees, so there are some models that have higher climate sensitivity than before. The very hot models are not considered realistic by many and do not reproduce some aspects of past climate well, but that is not what I want to talk about.
I think the key here is that the uncertainties are still large, and they all have to do with the kinds of problems Arakawa worked on: small-scale processes, and mainly the clouds that we cannot resolve dominate the uncertainties. Dave Randall talked a little bit about this yesterday, and maybe I want to dwell a little on the computational challenges and on what can and cannot be computed, because I think there are a lot of statements out there that are somewhere between confusing and misleading. It is the low clouds that, at least until recently, dominated the uncertainties and probably still do, clouds like those off the coast here, stratocumulus clouds that Wayne has also worked on with mixed-layer models. They have dynamical scales on the order of meters, maybe tens of meters, and we are reaching global models with resolutions of tens of kilometers that are becoming routine, and we can push toward a few kilometers now, but that is still some three orders of magnitude coarser than what you really need to resolve low clouds, and of course that doesn't even get you into microphysics.
Dave mentioned that since his time as a graduate student, computing power, I think you said, has increased by a factor of 10 to the 11. So this is the increase in computing power since the time of the Charney report, 1979. It's not exactly 10 to the 11 since then; it's maybe 10 to the 9, more or less. Computing power has increased exponentially (this is on a logarithmic scale) and is still doubling, which is really surprising given that Moore's law has been predicted to come to an end, and Dennard scaling has come to an end, and yet computer power continues to double. This graph is not completely up to date.
I need to update it; it's already five years old, but it still holds up. The interesting piece you can add to it is this: here are all the climate models published from 1979 to 2017, atmosphere models, atmosphere-ocean models, or Earth system models, and what is plotted on the left axis, on a logarithmic scale, is the horizontal resolution in the atmosphere, or rather its inverse in kilometers. The right axis is scaled so that a factor of 10 on the left corresponds exactly to a factor of 10 to the 4 on the right, because if you want to increase resolution isotropically by a factor of 10, a 3D fluid-dynamics model needs 10 to the 4 times more floating-point operations. And you see that the models follow a shallower line than computer performance, because the complexity has increased: we have gone from atmosphere models to atmosphere-ocean models to Earth system models with more and more processes included, and the like.
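To make that factor of 10 to the 4 concrete, here is the standard cost-scaling argument, written out as a rough sketch (it assumes explicit time stepping with a CFL-limited time step; real models deviate somewhat from this):

\[
\mathrm{cost} \;\propto\; N_x\, N_y\, N_z\, N_t \;\propto\; \Delta^{-4}.
\]

Refining the grid spacing \(\Delta\) by a factor of 10 in each horizontal direction contributes a factor of 10 to the 2, a proportional vertical refinement another factor of 10, and the CFL condition forces the time step down by another factor of 10, for roughly 10 to the 4 more floating-point operations per simulated year.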
So in many ways it was a good evolution. If we had used all of the increase in computing to increase the resolution of the atmosphere models, they would have followed the blue line, and they didn't. We started, as Dave was saying, with 250-mile models, or about 1000-kilometer resolution, in the 70s at the time of the Charney report, and now we are at tens of kilometers; but we have not used all the additional computing power to increase the resolution of the atmosphere models. We could already have higher resolution, and for good reasons we didn't do that. But suppose that from now on you do: from now on there is enough complexity, and we just add resolution to the atmosphere and the ocean. Then in principle we could follow this blue line, as long as the exponential growth continues, which cannot last forever, but let's assume, for the sake of argument, that it will for a while. Then you can ask: when do we resolve what has been talked about as the gray zone for deep convection?
We are reaching that now, at tens of kilometers, 10-kilometer resolution, heading toward kilometers. But low clouds, with dynamical scales of tens of meters, are not even on this graph. If you just extrapolate exponentially, which is completely unreasonable, we wouldn't resolve them before 2060. So kilometer-scale models are useful, and I think they are something we should strive to build, but there is no reasonable hope that kilometer-scale resolution alone resolves the uncertainties we have in climate predictions. You will get weather predictions that are more detailed, rain predictions that are more detailed, but they can be just as wrong as the coarser ones.
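As a rough, hedged version of that extrapolation (my numbers, not the speaker's): resolving low clouds needs grid spacings of order tens of meters, say a factor of about 100 finer than the few-kilometer global runs that are just becoming possible. With the \(\Delta^{-4}\) scaling sketched above, that is about

\[
100^4 = 10^8 \approx 2^{27} \quad\Rightarrow\quad 27 \times (1.5\ \text{to}\ 2\ \mathrm{years\ per\ doubling}) \approx 40\ \text{to}\ 55\ \mathrm{years},
\]

which lands somewhere around 2060 or later, and only if the doubling continues; the point is the order of magnitude, not the exact year.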
Right now we still have a double ITCZ, for example, in many high-resolution simulations. So what has changed since Arakawa's work in the 70s, the development of numerical methods, the cumulus parameterizations that many of us use and love? It's really quite amazing to think about the changes. Computing power since 1979 has increased by more than a factor of 10 to the 9, and if you go back a little further you get to the 10 to the 11 that Dave was mentioning. Dave was also mentioning that Akio Arakawa didn't really have papers where he compared simulations to observations in much detail. That may be true, but I think part of the reason is that there just wasn't that much data. Even when I was in grad school, when I wanted to look at data, someone had to give me a tape with gridded atmospheric-circulation data, something like a 10-by-73 grid, that I used, and that was 20 years after this; it was still quite difficult to work with data.
I was trying to estimate how much data there was in, say, the 1970s, and I'm not quite sure I know. Now we're getting about 50 terabytes of data a day from NASA satellites alone, and it's pretty reasonable to think there wasn't more than a kilobyte per day of radiosondes and the like back then, so it's probably also something like a factor of 10 to the 9 increase in data volume, if not more; I'm not too sure. So by some metric you'd think we're a billion times better off than people were in the 70s, and of course in many ways we're not, in terms of climate model reliability. I would say a key reason is this: from a theoretical point of view there has been a lot of progress, but if you look at the derivation in Arakawa and Schubert, the 150 equations we heard about yesterday, and you go through the first 74 of them, they still feel very modern, and I want to argue why. The key here is that maybe, in terms of communicating the end result, you could have done it in a shorter article, I don't know; but the nice thing about the work is that there is a controlled set of approximations that Arakawa went through to get to the result, and establishing what that controlled set of approximations was takes some space. And now, if you want to do better, you can go back; I would say right now you should stop at about equation 57 (I looked it up yesterday) and move on from there, but that part is still useful. I mean, here is the equation for a thermodynamic variable, and sigma is the area fraction, and, for example, there is a time dependence, a memory term, that we heard about yesterday, and things like that. Those equations are still good; in essence, those equations are still the ones we use, and I'll show you how. It is just some later approximations that you might want to generalize. So computing alone is not going to solve the problems we have in climate modeling, and the dynamical scales that I mentioned are one of the reasons; but even if you could resolve all the dynamics, cloud microphysics remains, and there is simply no way to bridge the range of scales in microphysics that we would need to bridge, using brute-force computing, to get all of these small-scale microphysical processes in clouds right.
Warm-rain processes, ice-phase processes, mixed-phase clouds are a big problem for current climate simulations. There is no way to get from there to what we need, macroscopic effects like albedo, precipitation, and the like, using brute-force computing. You can calculate what happens on microphysical scales in clouds with a lot of precision in domains the size of cubic centimeters; from there to the globe is a long way. So computing alone is not going to achieve this. I think we need more theory, not less, to move forward here, and of course it's not just theory, not just paper and pencil and a wastebasket; you need to combine that with what you can do computationally and what we can do with the data we now have. We have amazing data: for example, we have cloud radar and lidar observations of clouds from space. It's a little over a decade of data, but for the first time we have 3D data, 4D data, on global cloud cover, and we don't use that data much in climate modeling.
There is an expression they use at NASA about data being left on the ground: the data are there, we use them for model evaluation, but to directly inform a model we don't use them much. Of course, the other data that we have, and that we are using a little more, is computationally generated data. This is one of the simulation visualizations Wayne was talking about; it was made by Kyle Pressel. It is a large-eddy simulation; of course there are choices to make about how to do it, and there are other things one can do now, but this is what we did, learning from Arakawa how to discretize, and these simulations are very good because we can compare them with aircraft data. In this particular case it is a Caribbean cumulus situation, the blue is rain, and they are quantitatively quite accurate. There is microphysics, of course, and it is parameterized; the uncertain microphysics still plays a role, for example in controlling the amount of rain in these simulations quite sensitively, but the dynamics of these simulations is well resolved. You just have to accept that the domain is small: this is a few kilometers on a side; we can reach tens of kilometers on a side, maybe 100 kilometers on a side, but that's the biggest you can do with these simulations now, and to get to 100 kilometers on a side you need hundreds of petaflops and big computers. Computing alone is not going to solve the problems. Then there is the next thing that is talked about a lot: given these simulations that we can do, given the data we have, can machine learning solve the problem? I think it's part of a combination, but by itself it can't.
The problem is that what we want in climate models are models that are useful for science. I mean, we all do this because we want to make scientific progress, ultimately understanding how ice ages arise and the like, so you want models that we can interpret, for scientific reasons alone. But maybe most important, ultimately what we do is for the benefit of society. We are funded by public funders in one way or another, and we have a responsibility to deliver things that are useful to the rest of society. The crucial thing is that people can trust a model, and, unlike in weather forecasting, in a climate context we do not have daily verification or falsification of what we do, so the model needs to be interpretable, so that people can take it apart from beginning to end and understand what is happening in it. Interpretability is important, and machine-learning methods are not easily interpretable. You also need models that generalize out of sample, and that is perhaps the most crucial piece: with the data we have now, we need to predict a climate that none of us have seen and that may be very different; for my kids' generation, I don't have data on that. So we need methods that generalize to a climate for which we haven't observed analogues. And again, practical applications, which, you know, didn't play much of a role in the 70s, I think are now central to what we do. Quantifying uncertainty is also important: you don't just want point estimates, the mean change in sea level or sea-ice cover or whatever you take; you want them to come equipped with uncertainties, because in planning you want, say, the probability that storm surge exceeds a certain level. So I think these are three crucial requirements, and deep learning doesn't work well to satisfy them, or deep learning as practiced so far is not working well to meet them. Here is an example; I think it is a very good and useful study, by Pierre Gentine and several other people.
They took superparameterized CAM, shown on the right, that is, CAM with high-resolution convection simulations embedded, where each grid cell contains a high-resolution convection simulation; the top is the heating rate, the bottom is the moistening rate. What you get from that are input-output pairs for convection: the temperature and humidity in a column as input, and the temperature tendency and humidity tendency in the column as output. You can then use standard supervised learning approaches, which require labeled input-output pairs, to train, in this case, a neural network on these state-to-tendency pairs, shown on the left, and reproduce this mapping from states to tendencies. All right, the left and right columns agree pretty well; this works, but it has several problems. The models involved here have over a million parameters.
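As an illustration of what that kind of supervised setup looks like in code, here is a minimal sketch; it is not the actual study, and the synthetic data, network size, and training loop are hypothetical placeholders:

```python
import torch
import torch.nn as nn

# Placeholder labeled data in the spirit of an embedded high-resolution simulation:
# X: (n_samples, 2 * n_levels) stacked temperature and humidity profiles
# Y: (n_samples, 2 * n_levels) stacked heating and moistening tendencies
n_levels = 30
X = torch.randn(1024, 2 * n_levels)   # stand-in for column states
Y = torch.randn(1024, 2 * n_levels)   # stand-in for column tendencies

# A small fully connected network mapping column state -> column tendencies.
model = nn.Sequential(
    nn.Linear(2 * n_levels, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 2 * n_levels),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Plain pointwise regression on state -> tendency pairs.
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    optimizer.step()
```

The point of the sketch is only the structure: labeled pairs and a pointwise loss. Nothing in it enforces conservation, interpretability, uncertainty quantification, or generalization to unseen climates, which is exactly the criticism being made here.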
It's good to know that with a million parameters you can parameterize convection, but it is difficult to interpret, it doesn't generalize very well outside the training sample, and quantifying the uncertainty is essentially impossible. I think it's a useful exercise and approach, but I don't think this is the way to go. Now, if you think about what Arakawa was doing, it was reductionist science, and I loved Wayne's stories about looking at oceanic cumulus clouds and the resulting cartoons, and, in the later papers, the cumulus clusters penetrating through a boundary layer, with the result being those 150 equations that I tried to describe. That is very much a reductionist approach: you take the big picture and try to boil it down to something small that you can turn into equations. The paradigmatic example of reductionist science would be Newton, right? The universal law of gravitation is a one-parameter law with a few variables that generalizes from planets orbiting stars to apples falling from trees. It is extremely interpretable (yes, there is some spooky action at a distance, but you can interpret it), and it lends itself very well to quantifying uncertainty. It is also useful to consider what Newton replaced. This reductionist science began with Bacon and Newton in the late 17th century; before Newton we had Ptolemy. Ptolemy was the big-data guy: diligently compiled observations of planets orbiting stars were used to inform the deep learning model of the time, Ptolemy's epicycles. He used circles as basis functions to describe planetary motion, and you can describe anything with superpositions of circles, epicycles upon epicycles. If something funny happens, you just add another basis function, another layer in the network of basis functions, and you fit the planetary motion extremely well. The price you pay is that you need too many parameters: some circles go to the right, some to the left, and there are several other parameters you have to adjust. These days people may laugh at that, but I think Ptolemy is where we are with deep learning right now. Ptolemy was replaced first by Kepler, who replaced the circular basis function with an elliptical basis function, and things got a lot simpler. Right now we're trying to use things like ReLUs, basically weakly nonlinear, sigmoid-like functions, as basis functions to fit everything, and it works great when you have a lot of data. I would say that in science what we need is a Newton-like reduction to the correct set of basis functions and a smaller set of parameters. Of course, in the reductionist approach to clouds, Arakawa made more progress than almost anyone else, but many of us have tried to find the perfect reductionist description of clouds. Dave Randall spent decades on it and ended up writing articles on breaking the cloud-parameterization deadlock, realizing that this approach is reaching its limits, right? I mean, the systems are complex enough that, with paper and pencil and a wastebasket, you're just not going to solve that problem completely this way. The success of deep learning, instead, is based on overparameterization, if you want to put it in one word, which means you regularize with a number of estimated parameters that is very large, larger than the sample size. It usually requires data-hungry methods, but it leads to very expressive models; you can fit anything, and there are universal approximation theorems, but interpretability, generalization, and UQ are really challenging. So what I would argue, and what we are doing, is trying to combine the best of reductionist science, theory, with what can be learned with data-science methods.
What is required, I think, is progress on three fronts. Advancing the theory is still absolutely essential, and I want to give some examples of what we are doing there; I have to say that in recent years, as we have been pursuing these approaches, if there is one thing that has surprised me day to day more than anything, it is the power of theory, of being careful with the derivations, and how it leads to models that are very predictive. I will show some examples. We want to use data, both observational data and computationally generated data, and I will show how. And of course computing power continues to increase exponentially; it is really surprising. One thing that means is that we should go to the highest resolution possible. I don't know what that means in practice; I'm trying to figure it out. I mean, going to one kilometer, where the throughput is too slow, is probably not useful, but there is a sweet spot where you get a good compromise between reasonable throughput and reasonable accuracy, and I don't think we know where that is right now. The other way to use computing power is to generate data about clouds computationally, for example, and that lends itself very well to computing with accelerators and GPUs because it's a naturally distributed problem. So how does this actually work?
When I started working in this area, I was inspired by the Mori-Zwanzig formalism of statistical mechanics. I don't want to explain what it is in detail; I just want to illustrate what the result is. If you have a dynamical system like the one here, where, say, x is a slow variable and y is a fast variable and each has its own forcing f, you can do this coarse-graining precisely. It's a little complicated, but what comes out in the end is an equation that looks like the equation at the bottom, an equation for the slow modes. The crucial thing is that you still have the forcing for the slow mode, f of x; then there is a term M, which comes out of the averaging and is a renormalized version of the coupling; but most important, there is noise, stochastic noise, and there is memory.
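Schematically, and this is just my shorthand for the generic structure being described rather than the specific equation on the slide, the coarse-grained equation for the slow variable has the form of a generalized Langevin equation:

\[
\frac{dx}{dt} = f\big(x(t)\big) + M\big(x(t)\big) + \int_0^t H(t-s)\, x(s)\, ds + \eta(t),
\]

with f the forcing of the slow mode, M a renormalized coupling term, H a memory kernel convolved against the past history of the slow variable, and eta a stochastic noise term.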
As was also discussed yesterday, memory appears here as a convolution integral: H is a memory kernel, and the slow variable is convolved with that kernel over some time scale tau. That's one way of writing it; an alternative way of writing the same thing, instead of an integro-differential equation, is as a separate differential equation with its own time dependence for the memory. I think it is important to keep this in mind when averaging a system that has no scale separation, and here there is no built-in scale separation. If there is scale separation and the gap in time scales gets big, you can neglect noise and memory and things like that, but we don't have that, especially as we get to 10-kilometer resolution or so in climate models: cumulus convection is no longer separated in scale from the resolved scales, so you need to end up with a parameterization that has these kinds of elements. And you can do that. We started working on this a good number of years ago, and I think we'll hear about some related approaches later; some colleagues at ECMWF at that time pioneered what they call the eddy-diffusivity mass-flux approach. The idea is to take the flow in a grid box and decompose it into coherent parts, what we heard of as updrafts, downdrafts and the like, and more isotropic parts, all the turbulence around them. You can decompose the flow and conditionally average over the two kinds of parts, and what you get are exchange terms between these different parts of the flow, the orange arrows indicating the entrainment and detrainment that were already in Arakawa and Schubert. In a little more detail: you literally take Arakawa and Schubert, stop more or less at equation 57, and take a detour from there. You take the equations, you average over the coherent and the incoherent turbulent parts, and you take moments of the equations; you get a bunch of equations, including higher moments, for turbulent kinetic energy, for scalar variances and the like.
I'm just showing two of them to illustrate: the continuity equation and an equation for some scalar phi, where the scalar can be a thermodynamic variable, specific humidity, or even an updraft velocity. The key is that you generate equations for each subdomain over which you have averaged; these can be updraft columns, downdrafts, and there is one distinguished subdomain, which for us carries the index i equal to zero; that is what we call the environment. And you know, all the terminology comes from Arakawa and Schubert; that's how you talked about it at the time, and we still talk about it in those terms. So there is a turbulent environment that interacts with the coherent structures, with the columns, and the equations you get look like this. On the left-hand side, a_i is the area fraction; it was sigma_i in Arakawa and Schubert.
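To fix ideas, here is a generic sketch, in my own notation rather than a transcription of the slide, of what such a conditionally averaged scalar equation for subdomain i looks like:

\[
\frac{\partial (\rho a_i \phi_i)}{\partial t}
+ \nabla_h \cdot \left( \rho a_i \langle \mathbf{u}_h \rangle \phi_i \right)
+ \frac{\partial (\rho a_i w_i \phi_i)}{\partial z}
= \sum_{j \neq i} \left( E_{ij}\,\phi_j - D_{ij}\,\phi_i \right)
- \frac{\partial \left( \rho a_i \overline{w'\phi'}_i \right)}{\partial z}
+ \rho a_i S_{\phi,i},
\]

where a_i is the subdomain area fraction, phi_i the subdomain mean of the scalar, angle brackets denote the grid mean, E_ij and D_ij are entrainment and detrainment exchanges with the other subdomains, the w'phi' term is the within-subdomain turbulent flux (closed diffusively in the environment), and S_phi collects sources. The exact form in the papers differs in details; this is only meant to show where the entrainment, detrainment, and turbulent-transport terms sit.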
The only thing that differs from how the equations are written in Arakawa and Schubert is that these horizontal terms are still there; you are not averaging over everything horizontally, so there is advection of things like updraft area fraction and the like in the horizontal by the grid-mean flow, where the angle brackets indicate the grid mean. If you do this, in some ways the parameterization becomes part of what you would normally call the dynamical core, right? These terms on the left-hand side are equations that you solve like everything else in the dynamical core; they are of course prognostic equations, so there is memory in the prognostic terms, for example memory in the area fraction, and you solve them in the same way you solve any other dynamical-core equation, and important things like the area fraction, or any other property of convection, are advected along with the resolved flow. The left-hand side is the result of rigorous conditional averaging; there is no approximation, except that right now we essentially make the boundary-layer approximation: we assume that in the updrafts the variations in the vertical are much larger than the variations in the horizontal, and that is the only approximation in this. On the right-hand side you get all these terms, which are what is now the parameterization, if you want. Crucially, there are entrainment and detrainment, which describe the exchange of mass, or the exchange of tracers, between the different updraft columns and the environment and so on, and then there are turbulent transport terms that can be closed diffusively. That is what gives this eddy-diffusivity mass-flux approach the structure that the Mori-Zwanzig formalism suggests a parameterization should have. There is memory; well, there should also be noise somewhere, which would go on the right-hand side, and I think the student working on that was here; he is working on stochastic closures on the right-hand side.
What I'm going to show you has no noise yet; I think it matters, but what I'll show doesn't have it yet. It does have memory, though, and I'll show you why that matters. Everything on the right-hand side involves things we don't know much about. Ignacio talked yesterday about the turbulent-transport closure, and that is another example of what has really impressed me in what Ignacio has been doing: just taking the equations and the balance laws very seriously to derive, essentially, mixing-length formulations, a whole zoo of them, and then, at the end, combining them. That was really important for the results I'm going to show, and it came purely from theory, not machine learning or anything else, and it was extremely successful and extremely generalizable. We ended up with mixing-length formulations with basically no free parameters; there is one parameter you can tune, but even that parameter is not terribly important. But some of these things, entrainment and detrainment, are a really good target for machine learning. So think about Monin and Obukhov, and maybe we'll come back to it; I have worked on Monin-Obukhov similarity myself. There, too, you try to reduce the problem: to the extent that you can make this boundary-layer approximation that we are making, variations in the vertical much larger than in the horizontal, you then ask what dimensionless groups there are in the problem. In the Monin-Obukhov case only one dimensionless group emerges, and then you say: everything else, the structure of the velocity profiles and the like, is just a function of height divided by the Obukhov length. Here we are talking about clouds, turbulence, convection, all in one, and we don't know all the dimensionless groups; we came up with maybe six.
That is, if you don't involve higher-order derivatives, you get six groups that can matter here, and they look something like this: there is a group like z multiplied by the buoyancy, divided by a measure of the vertical turbulent kinetic energy; relative humidity arises naturally, actually as a relative-humidity difference between an updraft and the environment, as one non-dimensional measure; and we have four others that look like this, some a little complicated. The point is that things like entrainment and detrainment hopefully become universal functions of these non-dimensional groups, and you have to introduce a dimensional scale, because fractional entrainment and detrainment have dimensions of one over a length.
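Written out, the ansatz being described is roughly the following (my notation; the actual groups and functional forms are exactly what the calibration later targets):

\[
\epsilon_i = \frac{1}{\ell}\, F_\epsilon\!\left(\Pi_1, \dots, \Pi_6\right), \qquad
\delta_i = \frac{1}{\ell}\, F_\delta\!\left(\Pi_1, \dots, \Pi_6\right),
\]

where epsilon_i and delta_i are fractional entrainment and detrainment for subdomain i, \(\ell\) is some chosen length scale, and the Pi's are the non-dimensional groups just mentioned, for example height times buoyancy over a vertical turbulent kinetic energy, or a relative-humidity difference between updraft and environment. The functions F play the role of the Monin-Obukhov similarity functions.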
So there is some length scale, and in principle you can choose any length scale you want, z or anything else. It will change the functional form of F, but in principle, if you've chosen the right groups, with whatever scale you choose you should get the correct function, as long as a universal function exists. It's important to say that entrainment here arises through the averaging. Some might say, and I think Dave was saying, that you want to use things you can measure, and in principle you can measure entrainment and detrainment, but in practice it is very hard, even in simulations. I think being able to measure them directly is not so crucial. It's a little like quantum field theories: when you renormalize, you end up with renormalized quantities, and people use this language of bare quantities and dressed quantities; bare quantities cannot be measured, and what can be measured are the renormalized things. It is a bit like that here. Entrainment and detrainment in this theory are concepts we can describe; they are difficult to measure in practice, and I think it's not crucial that you can measure them directly, and I'll show you why not. You know, I started working in this area around the time of the Arakawa and Wu paper. I had been thinking about these topics, and talking about them, especially about how to devise parameterizations that are more suitable for the gray zone of deep convection, and then this paper by Arakawa and Wu comes along, and I found it enormously inspiring, I think largely because Arakawa was relatively old at that time, and he was the one innovating and leading the field, and the innovation said: well, this is what we have to do now; what we did back then, you know, 40 years before, was all good, but it no longer fits the computing power we have, the data we have.
I found it really inspiring. What we do here is a little different: that paper was concerned with relaxing the small-area-fraction approximation for updrafts, and we are not making that approximation either, but there are also memory terms and some other approximations that you can give up. But you can ask, I think, why Arakawa came up with something like this so late in his life, and I think it's the 150 equations. I mean, he carefully went through a set of successive approximations, and then people just took the final result, put it into a model, and forgot about the 149 equations that came before. Because he had gone through it carefully, there was a sense of where we are approximating and what we are approximating that was deeply ingrained, and I think a lot of other people later maybe didn't have it; it's important to have that sense of how the phenomena are being approximated. I don't know if that's the reason, but that's how it seemed to me: there was an awareness of the approximations being made, and the users who then used the models maybe didn't have it to the same degree, because it takes a year to work through it, and that makes it memorable. So if you make consistent approximations through this averaging approach I described, you get models that are at least interpretable; there are just some functions we don't know, entrainment and detrainment first among them, and the models conserve mass, momentum, and energy. With respect to what John said about energy problems in climate models: we are trying to avoid that problem in the first place by building models that exactly conserve energy.
Total energy is actually a prognostic variable in both the parameterizations and the dynamical core and the like, so energy conservation is guaranteed; you still have to worry about the discretization, but as long as everything stays consistent, energy in this framework is exactly conserved, and you get a physically consistent interaction between the processes. You can't always take this approach, but I find it works more often than people think. For example, one thing I won't talk about: in working on land models, plant hydraulics does not simply obey Newton's laws, but you can use very similar approaches, and they turn out to be quite successful for getting at evapotranspiration, for example. This gives you equations for all of convection, turbulence, and cloud dynamics, and you can use them for anything from boundary-layer turbulence to deep convection; it works for everything. You need to combine that with a statement of what the clouds do, and the way we do that uses ideas from Sommeria and Deardorff: we carry subgrid-scale distributions of quantities, variances and covariances of thermodynamic variables, and once you have these distributions, you can ask what fraction of the subgrid-scale distribution is above saturation, which should therefore be in a condensed phase, and the like, and you can get things like cloud cover and cloud liquid water out of that.
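As a concrete illustration of that last step, diagnosing cloud properties from an assumed subgrid distribution, here is a minimal sketch in the spirit of the Sommeria-Deardorff idea, assuming a Gaussian distribution of the saturation deficit; the Gaussian choice and the numbers are mine, purely for illustration:

```python
import numpy as np
from scipy.stats import norm

def cloud_fraction_and_condensate(s_mean, s_std):
    """Diagnose cloud fraction and mean condensate from an assumed Gaussian
    subgrid distribution of the saturation deficit s = q_t - q_sat.

    s_mean : grid-mean saturation deficit [kg/kg]
    s_std  : subgrid-scale standard deviation of s [kg/kg]
    """
    # Cloud fraction = probability that s > 0 under the assumed Gaussian.
    cf = 1.0 - norm.cdf(0.0, loc=s_mean, scale=s_std)
    # Grid-mean condensate = E[max(s, 0)] for a Gaussian.
    ql = s_mean * cf + s_std**2 * norm.pdf(0.0, loc=s_mean, scale=s_std)
    return cf, ql

# Example: a grid box that is slightly subsaturated on average but has
# enough subgrid variability to be partially cloudy.
cf, ql = cloud_fraction_and_condensate(s_mean=-2e-4, s_std=5e-4)
print(f"cloud fraction = {cf:.2f}, condensate = {ql:.2e} kg/kg")
```

In the actual scheme the distribution and its moments come from the prognosed subgrid variances and covariances rather than being prescribed, but the logic, integrating the assumed distribution above saturation, is the same.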
There is one more thing we are starting to do, which is, in the same spirit, relaxing equilibrium assumptions. Here we have relaxed quasi-equilibrium assumptions, but there is also thermodynamic equilibrium baked into parameterizations, and in thermodynamic equilibrium you don't have supercooled liquid, while in reality we do, so thermodynamic equilibrium is also something you don't want to build in. I won't talk about it in the interest of time; I just want to say that you can relax the thermodynamic-equilibrium assumptions with assumptions about how the phases relax toward equilibrium. What that gives you is that if you have a fast updraft in a deep convective cloud, the condensate may not freeze at the freezing level; it takes some time to freeze, and if the updraft is fast enough it won't, so you can get a supercooled cloud in a physically consistent way, and you get the asymmetry between updrafts and downdrafts and the like. Ignacio talked a little about what we're doing next, and Joey is not here. So there are these functions that we don't know, the main ones being entrainment and detrainment, plus a few others. Here is the bias in cloud cover of a climate model; it is quite common for climate models to have biases of 50 percent or more in the stratocumulus regions and large biases in the polar regions, where the bias in percent means the cloud cover is underestimated, and we focus on regions with large biases, for example in the tropical Pacific; Ignacio already showed some polar examples that I won't repeat. What we did is generate training data: essentially, run large-eddy simulations, driving them with GCM output from some GCM in the CMIP archive, and Joey has generated about 500 of these simulations by now, in different places at different times of year. Then, and I think this has been enormously important and useful for us, we set up an automated process around this large library of hundreds of simulations that we have right now: you have a physics-based parameterization, and every time you change something in that parameterization, you can automatically test it against that library of simulations and see whether it makes things better or worse on a fairly large sample spanning different conditions. You can automate that and use the machine-learning and calibration tools that I will describe in a second to learn about the functions we don't know. So, I showed those supervised learning approaches; what we do differs from that in the following way. In climate, what matters is statistics, so means, second moments, higher moments, extremes and the like, and what matters is not the tendencies for the next day; tendencies could at most be a crutch to get there.
The problem with learning from tendencies is not only that interpretation is difficult; it is also that once you have a model that captures tendencies very well and you put it in a GCM, it tends to be unstable. There is no guarantee that capturing tendencies well leads to a stable simulation, and often it doesn't. So we focus on climate statistics and on learning from climate statistics, both from these large-eddy simulations and soon also from observations. That means, from any simulation, we take things like the liquid water path, time-averaged profiles of specific humidity and temperature, second moments of various quantities, but time-averaged quantities, that is the key, and use them to learn about the unknown functions, the equivalents of the Monin-Obukhov similarity functions: entrainment and detrainment as functions of the dimensionless groups that we decide on in advance. These statistics can include anything: a measure of extreme precipitation, a covariance between SST and cloud cover, an emergent constraint, if you will, on cloud changes. You can fold that into the learning process; I think it's a good way to use emergent constraints, bringing them into the learning itself. And then we want to use machine learning, we want to use the expressive models that exist, but in an inverse-problem setting. That means there are entrainment and detrainment functions (epsilon and delta) deep inside the convection scheme, which is embedded in a larger model; the larger model produces statistical output, and we can use the mismatch between the simulated statistics and those observed or generated with LES to learn about those functions in an inverse-problem setup. We do not have input-output pairs; the input to entrainment and detrainment would be the state of a column and the output would be the entrainment and detrainment themselves, but, again, you can't measure that in observations and it's very difficult to extract even from simulations, so we're not trying to do supervised learning of entrainment and detrainment; we learn about those functions by solving inverse problems. Since the data we have are indirect (cloud cover depends on entrainment but is not a direct measure of it), that creates some challenges; you obviously throw away some information, but in reality you have always lost some information anyway. Evaluating the loss function means running the climate model, which is expensive, and we have found ways around that to accelerate the Bayesian learning process. I won't go into it here; the crux is that Bayesian learning can be accelerated by something like a factor of a thousand by combining ensemble Kalman inversion ideas with machine-learning ideas, and that makes it feasible even for a climate model, though what I will show you for now is just a single-column model. So we have no input-output pairs and generally no gradients of the functions you want to minimize. We have tried as hard as we can to remove the things that end up being non-differentiable in the parameterization, clipping functions and the like, but it's not totally free of them; there are still non-differentiable aspects, phase transitions for example. So we use ensemble Kalman methods, gradient-free methods that come from the ensemble Kalman filtering used in weather forecasting, in a smoothing and inversion setup, to learn about these closure functions. Here we came up with guesses for the closure functions that made some physical sense; I don't think they rely on the physical structure to a large extent. I mean, some of it makes sense, but it's not a first-principles derivation the way the averaging is, and it actually works pretty well.
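To make the gradient-free, statistics-matching idea concrete, here is a bare-bones sketch of ensemble Kalman inversion for a handful of closure parameters. The forward map is a toy stand-in for "run the single-column model with these parameters and return time-averaged statistics"; every name here is a placeholder, not the actual code used by the group:

```python
import numpy as np

def forward_map(theta):
    """Placeholder for: run the single-column model with closure parameters
    theta and return a vector of time-averaged statistics (e.g. mean cloud
    cover, liquid water path). A toy nonlinear map keeps the sketch runnable."""
    return np.array([theta[0] ** 2 + theta[1], np.sin(theta[1]) + 0.1 * theta[0]])

def eki_update(thetas, y_obs, noise_cov, rng):
    """One ensemble Kalman inversion step: nudge the parameter ensemble toward
    parameters whose predicted statistics match the target statistics."""
    G = np.array([forward_map(t) for t in thetas])        # predicted statistics per member
    dtheta = thetas - thetas.mean(axis=0)
    dG = G - G.mean(axis=0)
    C_tg = dtheta.T @ dG / (len(thetas) - 1)              # parameter-statistics covariance
    C_gg = dG.T @ dG / (len(thetas) - 1) + noise_cov      # statistics covariance + obs noise
    y_pert = y_obs + rng.multivariate_normal(np.zeros(len(y_obs)), noise_cov, len(thetas))
    gain = C_tg @ np.linalg.inv(C_gg)                     # Kalman gain
    return thetas + (y_pert - G) @ gain.T

rng = np.random.default_rng(0)
noise_cov = 0.01 * np.eye(2)
true_theta = np.array([1.2, 0.5])                         # "true" closure parameters (toy)
y_obs = forward_map(true_theta) + rng.multivariate_normal(np.zeros(2), noise_cov)

thetas = rng.normal(size=(100, 2))                        # prior ensemble of parameter vectors
for _ in range(10):                                       # a few EKI iterations
    thetas = eki_update(thetas, y_obs, noise_cov, rng)

print("posterior ensemble mean:", thetas.mean(axis=0))
```

In the real setting the forward map is expensive, which is where the accelerated Bayesian machinery comes in, emulating the parameter-to-statistics map and then sampling; this sketch only shows the basic gradient-free ensemble update.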
In this case there are about a dozen parameters in these functions that we estimate. On the left is the root-mean-square error over this training sample of hundreds of large-eddy simulations; we then also run simulations with the surface temperature increased by 4 Kelvin, so we get validation tests on samples that were not used in training. The key is that as you iterate, and these are ensemble Kalman iterations over mini-batches and epochs on the left, the error decreases. There are different ensemble Kalman inversion methods and different kinds of mini-batching, and I don't need to go into what they are; the error decreases approximately monotonically, with some noise from the batching. The important part is on the right, the validation error: here is a sample that was not seen during training, and these methods validate extremely well. The validation error decreases monotonically, so it generalizes out of sample. I think that is because the physics is rigorous and it works on non-dimensional quantities; in the same way that Monin-Obukhov similarity generalizes from the wheat fields of Kansas to almost everything else, you get the same kind of generalization here. And as Costa, I don't know if he is here, or Ignacio said yesterday, what some others are doing is replacing these functions with neural networks, random-feature models, Fourier neural operators.
We have a whole zoo, and by automating this training process, since the data is there and we run the models in single-column mode, you can experiment very quickly with putting neural networks in and seeing what makes things better or worse. Neural networks improve things a little, but not by much. Again, I think there is a good illustration of the power of theory here: as long as the physics framework is good, it's hard to go wrong, that's what we've learned, whereas it's very easy to go wrong if you get the physics wrong. Just a few examples. Here, in blue, is a large-eddy simulation of a stratocumulus situation, off the coast of California; in gray is output from a GCM. I think it doesn't matter which GCM it is, but it is fairly typical in that cloud cover is underestimated by something like a factor of two. The LES gets cloud cover of about 100 percent, and this new EDMF-type model gets the cloud cover almost right; it has cloud cover of 100 percent and the overall structure right, and the nice thing is that it does that across all kinds of regimes. Ignacio showed some yesterday for stable boundary layers: it is all the same model, in all kinds of different regimes, with the same parameters, and it gives you the correct dynamics. Here is another case, from the DYCOMS field campaign, where we have aircraft observations. The single-column model is the dashed line; it needs resolution that is relatively high at the bottom of the atmosphere, in this case 20 meters, or between 20 and 50 meters, which is what you need, and you can get the cloud cover as well as LES does, in fact better than many LES: cloud cover is quite sensitive to numerics, and this is less sensitive to numerics, and of course it is much faster than any direct simulation. It is what you need for a parameterization; it is a little more complicated to implement in a dynamical core because it has diagnostic terms and prognostic terms, and those lead to some challenges. We are still working on an implicit time step for everything and the like, so it is a little tedious to make it computationally fast. Another example: here is deep convection over the Amazon. On the right is the vertical velocity, on the left is cloud condensate, and there is the mean precipitation as well; this is from one of the ARM sites. I didn't put the observations in because it gets a little complicated.
The main point I want to make, just to get the ball rolling, is that this is a diurnal cycle of deep convection, and it has a smooth evolution from the underlying turbulence to shallow convection to deep convection. There are no switches, no triggers, nothing like that: it starts from boundary-layer turbulence, you start to form a low cloud, it gradually rises, reaches the freezing level, you make ice, it starts to rain, and it is just one parameterization in this case, with only a dozen parameters, no more, for everything, so you cannot switch between turbulence and shallow and deep convection or anything like that. And the diurnal cycle in the simulation captures the observed diurnal cycle very well; I think Ignacio showed that yesterday too. So I think the most fun part of working on this was that we started with these assumptions for entrainment and detrainment that I didn't have a lot of confidence in, though I did have a lot of confidence in the series of approximations to the physical system. In the first few papers we wrote, we were only looking at a handful of case studies, as is often the case with these parameterizations, and I was nervous that once we looked at more data, you know, it wouldn't work anymore, that we were overfitting to what we had. And we weren't: it is still pretty accurate for any new data you look at. Perhaps it is the unreasonable effectiveness of mathematics in the natural sciences coming through here; and we must not forget that the memory terms are crucial for this. You could not get this very smooth evolution without having memory on the subgrid scale. So the next step for us is to integrate this into the GCM, and some of us are working feverishly to hopefully make that happen in a few weeks.
Then we can learn from observations in the same way we did with the high-resolution simulations; there is no fundamental difference, because we focus on statistics that are also available observationally, and we learn from them with the same approach. Using the same approach, we drafted this schematic for all components of a completely new Earth system model; it is, I think, probably the only climate modeling project at a university anywhere in the world where everything is new. You can ask whether everything being new is necessary; for a dynamical core I think it wouldn't necessarily be, though it gives you some advantages, along with new software that exploits accelerator architectures and the like, and you could do a lot of these things with an existing model in the same way. There is a group that works on the ocean at MIT, and on the terrestrial biosphere, and it has been fun for me to learn a little about how the same approach, theory, in fact takes you quite far there too: it takes you to models that are much simpler than those used today and are still more predictive. I think this tripod is the key to progress; we wrote, call it an essay, in Physics Today last year laying out the design of this approach, and I think it is very much the approach Arakawa was taking all along: you are very careful with the theory, and for any question you use the data you have to inform what is left. In the 70s that was little data, some of it computational and much less of it observational, but now we have a lot more data that we can use to learn about closure functions, like, symbolically here, the Reynolds stresses that appear in an averaged conservation equation. So let me leave it at that.
As I said at the beginning, I think Arakawa's is an exemplary approach to science that is crucial for progress in climate science and modeling, and the patience he had, to derive things and keep going until you get to the solution, I think is really important, and it is something that, well, at least I would like my students to learn about science as well. The incentives in today's science do not encourage this very well, with quick publishing and things like that, but I think it is really essential for sustainable progress, and what could be more sustainable than ideas that are still in active use 50 years later. Theory is still essential, and the job of theory is to provide sparsely parameterized, generalizable models, things we can interpret, things we can understand, here in the case of cloud modeling, but the same goes for the general circulation, where the focus would be on actually understanding things. I think the way to use data is to treat machine learning as an inverse problem and combine it with theory: learn, within physical structures, about the functions you don't know much about, as with the Monin-Obukhov similarity functions. Computational capability complements the theory, and you can use it in various ways: go to the highest resolution possible, but I also think it is a good use to generate large libraries of training data for anything you can explicitly simulate, ocean turbulence, cloud turbulence, and the like. And then I showed you some examples of these parsimoniously parameterized, physics-based models that can capture the turbulence and cloud regimes that have plagued climate models for decades. I'll leave it here, and thanks for listening. Yes, we have time for some questions. [Audience question, partly inaudible, about whether the parameters come with distributions and whether the parameter values are universal.] Yes, yes; I mean, the last point is crucial, right?
I'm serious: as soon as you make any physical parameters depend on space, you haven't done your physical modeling job right; the gravitational constant doesn't depend on whether you're on Mars or on Earth. So it starts with this premise: there is nothing explicitly space- or time-dependent about the parameters. And yes, you get distributions of the parameters that you can sample after the fact to get predictions with quantified uncertainties. There is more you can do once you have distributions: you can start to include structural error models as well. I think these equations are good, but they are not free of approximations, and you also want to quantify the approximation error; you can do that in the same way as described here. So yes, the distributions are essential, and after the fact you sample them. I didn't show an example, but you can get climate predictions that sample that posterior density over model uncertainty and present them intelligently. [Audience question] A very important point for everyone here. I want to come back to a minor point you mentioned, about Akio Arakawa staying away from observations; this is a question for David to some extent. While it is possible that he did not do very much of that directly, you had people in this department who did.
Mickey and I spent our careers doing that, and even though the amount of data may have been smaller, there is a lot of information in it; we did really amazing things deducing convection properties from it, and there is a lot of overlap with the methodology. What I don't know, and what David might comment on, is how much interaction there was; I watched them very closely when I was there, the two groups, and I think there was something of teamwork. [Answer] Yes, yes; I mean, it is clear that the theoretical work was inspired not by casual observation but by very diligent observations.
Yes, we have more questions. [Audience question] At the beginning of the talk, when you talked about the 150 equations, you mentioned the environment as something special, and that is a part of the theory that I think maybe we should stay away from. You know, the idea is that you have an updraft here and then you have another updraft over there, so I think it might be best to consider the various i's that represent the subdomains, some of which might sometimes behave as an environment, without specifically designating one as special. [Answer] Yeah, I think that's a good point. What I didn't say is what makes the environment special for us; I mean, it is really just a way to reduce the number of equations to deal with. In the EDMF approach, as the people who pioneered it set it up, the key assumption is that there are fluctuations in the environment but not in the updrafts; the updrafts are top hats. We are making the same assumption here. You don't have to: you can carry the fluctuations in the updrafts too. So maybe the way I would take your comment is that there are no assumptions about the area fraction or anything else for the environment here, and perhaps the way to look at it is to say that you shouldn't treat your updrafts and downdrafts as top hats but carry fluctuations in them as well; then they look exactly like the environment, and you could do that.
We carry second-order equations in the environment; you would have to carry second-order equations in the updrafts and downdrafts too. That is more equations, and, you know, it is definitely doable; it is the usual trade-off between computational cost and accuracy, and we have done extremely well with accuracy so far. And it is pretty easy to change the approximation when necessary, because, again, we know where we made the approximations. [Audience question, partly inaudible, about the Amazon case and the overlap between liquid and ice.] Well, there is a very slight overlap; you see, it is very small in this case. This particular simulation doesn't have the non-equilibrium thermodynamics I mentioned either, but that's right: it is doing what is normally done in climate models, simply making the partition between liquid and ice a function of temperature, all ice below homogeneous freezing and all liquid above the freezing temperature, which physically isn't true, because it treats as an equilibrium process something that is not; but that's what it is doing, and that gives you a little bit of overlap.
[Audience question] It's great to have this. I mean, you know, the bias problem, whether in forecast models or in climate models, is sort of the same thing; the difference is that if you don't get the shallow clouds over the Pacific right in a weather forecast model, your data assimilation corrects the error you would be making if you ran it freely. [Answer] I mean, these ideas first arose at ECMWF, with weather forecast models in mind, and my hope is that you can use exactly the same approach in a weather forecast model. One thing we will have to do, once this works in a global model, is to see to what degree you get grid convergence, to what degree a simulation at 50 kilometers, at 10, at 5 kilometers holds up, and to see where it fails. I think if you can make this scale-aware, resolution-independent, with some degree of grid convergence, you could use it in a higher-resolution forecast model as well. One crucial thing is that the only approximation really built into this is the assumption that the vertical variations are large compared to the horizontal ones; you know, at some scale that will break down as you get to very high resolution, but before we get there, in principle this should work. In practice, we don't know; we haven't been able to look at that yet.
