Metaprogramming in Dotty - Nicolas Stucki

Mar 07, 2024

Hello new class, Sookie, I work on the lamp here at EPFL and I'm working on metal programming for Dotty, so first we're going to focus more on macros than other types of

metaprogramming

in this presentation, so to start this talk where I'm going to quickly say something about Scala to macros, so the implementation was pretty much coupled with the internals of the Scala C compiler, leading to some portability issues even on a different version of Scala to 2x, making it pretty impossible to port to Dotty there were quite a few attempts and unfortunately it was considered impossible, so what we did was redesign the core of the

metaprogramming

infrastructure and we focused on the first, more portable one, taking advantage of the tasty file format we have. now and we also focus on making life simpler and safer for both macro implementers and macro users and we also add some new language features that are sometimes used or can simply be used instead of macros what we have. we would have used it in Scala as well, so we will cover these three topics first inline as a metric programming feature, then we will cover many types to compute new types and then we will move on to macros which is actually user written code that will execute some code to calculate some new program that will be generated in place and there we have two subsections which are the quota joints, which is a simple high level API and then we have the taste inflection which is the equivalent of the abstraction that we have now in Scala for macros, they are a little more restrictive and we'll see exactly what the details of that are later, so first in line, it looks like it's just an optimization.

I'm going to take this code and put it wherever I want to call this method, but then we'll see how we can make this do a lot more, so first let's look at the syntax for having this. new inline keyword that we can put in our death and in this case we have a blocking method that will take a message, print it and then we will calculate a chunk, print the result and then return that result, so having this inline keyword will guarantee us that this method will be inline, it could also be recursive, so we can use log within log.

More Interesting Facts About,

metaprogramming in dotty nicolas stucki...

You also have a way to specialize a return type. I'll go into more detail later and it's also the entry. dot for macros so a macro is just a way to implement this inline method so here we can see the object loader class where we implement this register function method and here we define an idea variable id which is public and then within the implementation. what we do is print the message, increase this id, then calculate the tank torque result and finally decrease the id. Now the important part here is that the fragment should be by name because we don't want to calculate it before. we print the message and below is a snippet where we actually use this register operation that with a power function and we print a message right before that basically tells us what power we are calculating, so when we insert this code it will look something like this so the message is placed in a Val, then we will have the print line just before the identification, well, we are going to prepend it because we are no longer directly inside the object and We will have done the power calculation and we will continue again with the intention of printing the result, so as I said before, it's important to have this distinction between by name and by value, so if you have a parameter by name, what do we normally do is evaluate the argument and then pass the result to the function, but with a parameter by name we simply pass a way to calculate this within the body of the record, so we need to maintain the semantics when we are inline here.

To do that, we simply completely aligned the code that was in the tank directly where it is used. Now let's imagine that if our id variable was private, then wherever we go to align this, we may not actually have access to this id to overcome this. limitation whenever we refer to a private or protected member, we can create accessors and setters so that those variables can be accessed from the outside, although those accesses will not be visible to the user, they will simply be generated later in the process, now what about the recursive tax that overlays some code so here we see a pretty standard power function where if the values if the exponent is zero then we simply return as one if the exponent is odd we will multiply the value one and then do the power of to the power of minus one and if we have an odd power we're going to multiply the price of how are we going to align this code?

Let's see if a pair with power of X and ten we know statically what the value of ten is and therefore when we align this code it will look like this. We see it because we knew statically ten, we could constantly fold it and use it wherever we had before n and then we also know static, we can also constantly fold those conditions and know that the first two are always false, so we know that we will only get the last branch. so the actual result will be this code and then we will again have a call to the power that we can recursively inline until we have no more and we get a code that looks like this, which in no way refers to n now.

What happens if we don't actually statically know the value of double n? Here we have an evil power where n is just a parameter and we don't know it, so if we start inserting we are going to get more and more code and If you see all those conditions inside the evils, if we will never be able to know if they will be true or false, so this will just be an infinite recursion to overcome this problem that we have or one. The way to overcome this problem is to have this parameter inline, so what it does is that wherever we call this power function, we need the argument for that parameter to be a concern, a constant value, an unknown constant value, so So, we're You'll be able to do constant folding later and this works with any primitive values that the compiler knows about some take classes, like option, basically things that the compiler already knows how to optimize and understands their semantics.

You know there's another way to do it. Same thing for this particular case and it says it should be reduced whenever inline, so we add this inline keyword, but this time in the bottom if statement before they have a statement to indicate where I'm going to inline. this if the condition is known reduce it and if not then fail and tell me this cannot be reduced and don't try to continue and this will also guarantee us that we will only take one branch of that so if we know how this should be reduced and that things that variables decrease then we know that eventually we are going to end and the error that we would get is something similar to that where I say that we could not reduce because and we talk we cannot reduce N equals zero because we do not know what n is now, this is not only applies to two if-then-else, but can also be used for pattern matching with a match, so for this let's take this simple piano encoding as an ADT where we have a natural number that has two zero or successor cases from another natural and we will use it to create a net twin method that should result after inserting it only as the value 2 for this particular example because we have the successor of successor 0 so how would we implement this?

We put the line before the match and then this means that one of those branches must match every time we align this code. In this case, we're going to match twice on the successor and then at the end once on zero and then we'll take care of the plus operation or you need to create both and this also ensures that we're only going to take one of those branches, but sometimes we may want to be even more precise. What if we also want to refine the type of what is returned so that we have this new notation where we say that the result type is not a normal result type but is just a subtype of int, which means that after aligning this has an int or a more precise type, like a literal constant type like 1 2 3, so we could just rewrite it using the same implementation, we could just inline it and know that no 2 has actually written the literal 2 and see other.

In an example of this, let's introduce a class hierarchy where we have a Class A and a Class B that extends a and B has a subject that a does not have and then we have an inline method to choose from that will take a boolean value depending on whether it is true or false will return A or B now if true it will return a and if possible it will return B so if you use that in that implementation, for Val a of type uppercase we will choose true it will return us an a but if we do it with fake shoes, we can refine the type to something more precise, which is B.

Typically, if we had written the return type as A, we would have gotten an error there and therefore we can use the choose option. to restrict to know that we can actually call methods mathematical method in case of false false because we have a B that would have been impossible before the last but not least in this category that we imagine if we want to implement the set for the function that will take a parameter of type and it will return us that part of that type and what we want is that if we have an order in the T part we want to use a set of trees and if we don't have an order we just fall Let's go back to a set of hearts, so in Scala 2 we would have had to prioritize the implicit in a rather clumsy way, but here we can just say okay, we have a lot of this implicit, but here we were not giving anything to the match for the value that we are going to take, so when we say case order of T, we will look for that order of T and if we have it we will take that branch and instantiate the set of trees, if we don't have it we just go back to the other set.

So far we have seen how to refine terms and get more precise types from terms, but there is another way to get more precise types and that is to get a type from another type, that is what type matching is for and as a first example let's look at this simple LM matching type so on the right side we see that there is a type in this case the first case will take a string and if it is a string our element of our individual element of a string will be a character if it is an array then we are going to bind a type that comes from that array, if it is an array of int, will be int and if it is an array of booleans, it will be boolean, we use the same notation as in normal pattern matching, where Lowercase names mean type bindings and now let's look at some examples, as I said, the string element would be equal to correct equivalent of character and the element of an array of int would simply be int because we talk, we will go into the second case. and the element of a list of float will be float because we gather case and the element of an of Neal would be nothing because Neal is actually a list of nothing, so that's the nothing that we get there, so to get into one more example complex, let's see We introduce this drop obstruction where we have a mole that has two units of two subtypes which is a kind of tail of the drop and then we have a cons operator with a star column that has a head type and a tail type and We're going to build takedowns by composing those types, but on the bottom left we can see how we would write the type.

On the right we see the same thing, but in the terms now we are going to add some interesting ones. operations on this, the first one is the hub, so we're going to line it up to produce some actual code, but what you can see is that the return type is a constant, which is trivially the definition of the cons, so that there is no problem there, but now, what happens if we go to Khan's cut? So traditionally we wouldn't have a direct way to express that I have my left topo and my right topo and I want the type that corresponds to those two: concatenated caps.

I would code that with implicit machinery and a little more complex, but here we can just say, "Okay, give me those two types" and I'm going to calculate the full graduated type and we do the same thing with the size, but in this case we're going to say that size is an int if I don't know the size of my top oh statically of the type or will it be a literal int if I know the size, let's look at the first example that we already saw in the in Martin's keynote, this is concatenation, it's a great move on, we take the left and right as x and y, then we say that the result of concat will always be a rollover, but maybe something not precise and then we give the actual definition that it will calculate. the type, so let's say that if the right side of the concat isempty, then we just take the left side, if we have a dump that is not empty, we will take the head and we go to concatenate the tail no, we are going to arrange with the concatenation of the queue and why this is quite familiar because we would have implemented it almost in the same way in the lists where and the key difference here is that inside the match and at the top everything is the type for counting, so write the patterns are types and the right side are types and in the normal case in the list concat on we are patching a term and then we have a term as a pattern and we have a right side which is a term.

Now let's look at another type match, so this is the type match for the successor of some particular int, so ideally we should pass it literal literal. int which is 1 2 3 and so on and we want to return the successor of that point, so one way to implement it is like this, which is simple but of course it won't scale, so we actually implement it directly in the compiler as and that can be found in the compile-time academic package s for the successor Type S and now we can use this type to calculate sighs, so we will take the drop type that we know will return at least to the most and now , if the type of demolition is unity, then we know statically that this demolition is empty and if not, if it is not empty, we can say that it is the successor of the size of the total, which would be equivalent to a plus. 1, if it had been done at the term level, so far we've seen how to calculate really precise types, but we haven't seen how to put those types back into terms, so for this we have those two constant methods. value and cons value increased the first one we will take a literal and if it is a literal it will read it will simply return that literal and if it is not then the compilation will fail, on the other hand the second one if it is not literal it will not return any if it is a literal, will return some of that literal, so now we can implement the size of our inline method because we say it will return a size inside, we can just take that size that we calculated, request the value and then we can apply a lot of patterns. on that value and we just optimize it, so if we really know the value, then we just use that value; otherwise we're going to compute a fallback and compute it at runtime, so far we've seen how to generate code leveraging most of the time. the compiler constant folding, but here we are going to introduce two to see what happens if we really want custom logic that should do the constant folding, that's where we introduce macros, which is more the domain of the old skeleton macros, but we Let's go Let's start with a simpler and simpler approach to implement them, so we are going to introduce two different concepts: quotes and splices, so the first quote has this notation, it has a quote and a block and inside a block there is an expression, that expression in that case has type T and what this block means is that I am not going to execute this code right now, I will simply save it as code for later and I will use it later in some program and on the other hand we have the splice where inside of some program and case inside this quote I am going to splice this code fragment and I am going to replace the code inside with this code fragment means that X when we calculate the quote, but we will keep everything between the quotes under the splice.

This is quite similar to the oscillating interpolator. The syntax is that the syntax rules are quite similar, but the difference is that any C directly inside a code is actually a term that can be used inside the splice again. We'll look at some examples of that later so we can do exactly the same thing. The same in types, we can quote a type and then we can join a type in a part of the program. Now, how would you define the macro with that? So let's pick up the same thing in line def power of long and in line int. like before, but instead of implementing it directly we're going to say, "Okay, I have a splice here." I'm going to calculate this expression for unknown earlier I said that we had to insert a splice inside another quote or program, so in this case we don't really have a quote directly visible, but we assume that when aligning where our quote is actually the program where we are aligning this piece of code and this is the exception and the only place where you can use a splice that is not inside another quote, now let's draw a parallel between the above implementation and how we would implement this, this X, this power X, so first we can see that the type signature differs a little bit so before we had an inline M which now becomes just a man so before we wanted to know it to force it to be a constant value and now we just say okay this is a value that I know right now when I'm running this code and on the other hand, the DX that we had before as a normal parameter now we have it as an extra as a long-lived expert, which means that we have some code to refer to it, but in we don't actually know what its value is, so now wherever we had the inline if where we expected it to partially evaluate this condition and then probably delete one of those branches instead, we just have a normal program that will simply run by executing N is equal to zero, it could be any other arbitrary method that the user defines, but now the result will be a quote, which means that we are going to return a piece of code that we are going to insert somewhere.

You can see that there is quite a bit of syntactic overhead, additional synthetic overhead for this, but this comes with the benefit of being able to execute arbitrary code and here we see how to do it if our code fragment in these cases defines the Val law and will refer twice to the value of x that is defined somewhere else and then we're going to use that to calculate the power that type macros can't go wrong. so we have some interesting rules to make this type check correctly and basically the only rule is that for any free variable reference, the number of scopes cited in the number of scopes spliced between the reference and its definition must be equal if we have this so we can ensure that if this program is well written, whatever we are going to generate will also be well written, so for example, here we define the value I and we are using it inside, but you can see that there is first a splice and then a quote between the definition and the use side so that we have equal numbers and we can use it and the same for X and n, but in this case first we have a quote and then wherever we use for n, we can also see that we have references, for example, too long and operation x and other operations are not really free variables because they are known statically and globally, so they do not depend on the current context of this method to be accessible to anything that is defined globally within a quote.

Now what about white box macros? This actually gets pretty trivial because it's basically just a combination of the specialized return type that I and I said before. m through a body to the inline method, so basically if we compute some expression inside that splice, if it's more precise than call site as normal inline with the return type of the subtype, so what if we actually want to inspect inside and see what the value is inside a fact of some expression? So let's take this example where we have a mole. we want we want to swap the elements and this particular implementation if it receives a reference and it will just call the normal what method on the overthrow, but if we receive it explicitly we will just swap the elements and build the overthrow directly here we take this drop by name because in the second case where we exchange the drop we don't want to evaluate the first drop first.

Now implementing this is pretty simple because we can just take this drop that we have as input and just say a lot about it with a quote as a pattern, so here we can see that we are the pattern that we were expecting on a size two pole and we're going to get each one of the references that were the elements and we're going to then build another overthrow and we're going to place those elements but in the reverse order, so an important note here is that this pattern is fully typed and the type of x one is known statically and it will be known statically in the other. side when we were built, so everything is again completely written, this has some limitations because we couldn't ask if I'm calling some method called apply without being a function subtype or something like that now, if you look at the other example, we were basically going to just use the tear down and call it, so lastly I added here, we can see that there is an additional parameter called reflection, this is what allows us to look to decompose the top.

I'm going to go in and describe what it actually does. internally later, but for now, as long as we associate a lot in quotes, we just need to have it, but we don't care what's inside now for another example, in the previous example we were pattern matching in the tree shape, but sometimes we want do to get other properties of the expression, so let's go back to the power function like we had in the previous version, but this is just the one that just calculates the power and then the power X will calculate the code for a power for a known and we will use that in our implementation of this expert 2 it will receive 2 expressions, so we don't statically know either value, but we can pattern match on both with this Const. and the extractor that will return will match if the value inside, if the expression is actually a value and we just return is the value, so if we know the values, we can call whenever we want at compile time to the power function and then take that value. that we have calculated and create an expression from it.

I will insert it into a program if we only know the exponent then we can use the power expert to generate the optimal code for that which is no longer referenced and if we don't know anything then we can just quote the power function which means that We are going to calculate this power function later at runtime and again we need this reflection refraction parameter, so so far we have seen well written macros. now we're going to get into what's more similar to color macros and where we're going to use an IP I directly in the trees to really drill down into all the properties of the expression, so first we have this.

Tasty concept, we have it as a file format, which is a standard encoding for Scala tree type trees and these s3 type ASDs have a source position, they also contain color comments, it is extensible and on the other hand we have things to reflect, which is what. We used before as parameters, so this is just an interface on top of those trees that allows us to inspect and build trees. It also allows us to inspect positions and comments of points and double the stability of this interface is given by stability. of the file format, so that's one of the reasons why we won't have the same portability problem as with Scala to Macross because our interface is not with the compiler itself, but with this table file format, so earlier I showed how to use this Const extractor but now let's see how we actually implement it so that the implementation here receives this reflection, which is our API, that is, in the trees first, if we use it, we will simply import the content of this reflection which it will give us access to all the types of methods and everything that is all the logic that we could do in trees to use that first at the bottom we have an expression and we will want to open it, that means taking an expression and seeing it as a tree. and in this particular case, because it is an expression, it is actually a term that refers to some type of tree that we are going to work on, so we can also match patterns in different internal trees, in this case we do we'll do only if it's a bit then we'll take it if it's a block with empty statements, we'll just ignore the statements and say yes, it's literal, okay, but note that there's a conversion here because we're taking the value of a term.

The term doesn't have the information formation like we had with the expression, so here we lose some assurance from the type system that this will actually succeed, but of course in this case we know that it will becauseWe know we have an expression. of that key, so if it contains a value, it must be a value of that same T, so we already have some pretty advanced prototypes, for example, there is the formless tree which uses a special line, it is inline, try online. matches an implicit match, it doesn't currently use any macros, let's experiment to see what happens if we use them, what else can we do?

So the implementation I showed before the overthrow is actually a simplification of what we actually do. in our standard library compiler to extend the tuples and have rit drops more than 22, we basically generalize the drops to have normal h list properties and for that we just use many types and nothing else inline and then we use some optimizations inside of us. I also implemented help to implement all the macros in the Scala test. It was a lot of work and then we have a couple of other outliers where some string interpolators run in cushioned andhra, so there is an F interpolator that is in the standard. library that uses macros to verify that the content of the three that the F interpolator has is formatted correctly and will not fail in the format parsing at runtime and we also have an XML string interpolator that uses macros that I am specializing in, in In this case the other one who did it only used macros and the XML interpreter is meant to replace XML literals, so thanks.

Is there a question in the center? Hello, how is this connected? if it's a round Scour meter, it doesn't really connect, so we had another one, we actually have another prototype in that, but we don't exactly use macro infrastructure, instead we use the Tastee reflect interface to take the tasty trees and generate the base semantic data that that meta uses, so it's a bit separate, but we can generate one from the other. Thanks, will there be any replacement for two runtime reflections, in particular, reflecting the types at runtime, yes, no, not at this time, not with infrastructure?

Okay, two questions first, one, how is it posed? custom compile errors from inline methods and macros, oh yeah, but we have a couple of other methods in the compile time package for the inline doctor method, so if there is an academic compile time error method that if you call inside an inline method and you get inline and then that branch is not deleted, then it will throw an error with the message that you wrote inside, okay, and if we have quotes or cuts, we have a similar but different mechanism for raising exceptions as well and if we actually use the reflect API then we can even be more precise, for example in string interpolators we could get error messages for the subpart of the string and say that this part of the string is not formatted correctly.

I also wanted to ask about the very typed example. I noticed that you compared a null type to an iterable type, oh yeah, which sort of assumed the subtype. What if you wanted to match a type exactly to a range and this way you didn't want to accept the subtype of a literal? but exactly an iterable of something, I think this was visible in one of the following slides, yes, this is correct, oh, yes, yes. Actually, I'm not sure how we would code, yeah, Martin, so one thing you could do is put both in scrutiny. X and the pattern in a tight black box with no variant, so you say it's a boutonnieres box of It's how you would do it.

I assume there will be quasi quotes or just the other quotes on spices will supplant that, so yes, the Costa quotes and the quotes on spices are quite different because the spices, as I show them, are just expressions, the quasi quotes were countries, so What House codes are a good way to abstract trees, so currently it's not the plan to implement them, but it would be possible to edit them later just as a layer on top of the tree API, but because we already had these nice quotes or splices. We can usually do most of the work directly on that and every time we have to switch to the other one it really isn't too big, so so far it hasn't really been necessary to write shortcode.

Is this feature experimental like macros in Scala 2? or is it really feature color 3, is it so all the first was first with type matches and online there are concrete features, the code set splices will probably be for the Scala tree one and the other under the reflection we are I'll use it internally at least I'm not sure if it will be completely stable for three one, but eventually it should also be part because in this case what we are really reflecting is the tasty format that will be stable. since we want to make sure it's stable from 3 1 onwards so from there we should also be able to have a stable API for that in the solution to overcome type erasure or you'll still need class text for that. inline leverage to take advantage of type erasure I actually showed an example to Dennis, who aspires you are presenting right now, if so, the main idea is that if you have a type parameter and you create an inline method, this method will be online sooner. it is the type that is furious, so whenever we go to specialize we will already know the most precise type information, now we have to find out that it is not trivial in all cases, sometimes we may need to find the correct encoding patterns to achieve everything we want with a razor, but for the simplest things it is trivial.

Is there a general rule for when you should use inline and when you should use macros? The rule is to start with online, if that's the case with your ears then you'll probably stick with it if you have to. more power, then go to macros, but since you saw that the inline definition is exactly the same, you don't need to change the interface that the user will see, so create an inline definition, create our method and then go oh I need more, then just change the implementation to use something more powerful. Thanks for talking about what will happen if I create an infinite loop on a match type.

There are some mechanisms that the compiler will keep track of recursion and if you do too. So far it's just going to tell you how we reach this maximum recursion limit and if it's actually a type, I mean, if you really have a very long type, you can increase those limits manually. Hello, thanks for the talk. Ok, we thought it was fantastic and I learned quite a bit. very much like Dimensions, tasty is meant to not be a compiler, it's specific which I really love, is there anything that would be useful for from an external tools point of view at the moment or is it just the first phase now of something more general? concept Oh, tasty, we are already using tasty for the ID for example, okay, so we get all the information that the edited once from the tasty, so we compile until tasty and then the ID just loads it and looks at the tree in the position it is in. it finds the position, it can find where the references are coming from and all that kind of stuff, we can also document it because we all have all the documents there, so we're really just compiling and then there's something else that will consume. that tasty and generating the documents makes me very happy thank you thank you for the talk about how it's an inner F represented and tasty like completely in light already or it's still like a function or oh yeah, so in tasty we basically keep the Whenever we have a, we'll keep the entire tree inside the file author, so as long as we keep that tree, we'll mark it as inline and massage the body of the inline method a little bit to do that. be inlinable easily in responsible, but then every time you go to align it, what we need to do is just become tasty, you read the tasty, take that tree that is there and take that tree directly and then start bending constantly and do the rest.

Alright, thanks so much for the chat, Nicola, guys, we're really running out of time. I think we need to stop the question and answer session, thank you.

Watch Video & Subscribe

If you have any copyright issue, please Contact