
CppCon 2017: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”

- My talk is called "What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid." Which is the longest and most complicated title I have ever seen. It turns out that titles are very difficult. So why am I here looking at you all? It's definitely a question I'm asking myself right now. (audience laughter) As Jason said in his wonderful introduction, my name has unexpectedly become a noun and a verb, which has led to some extraordinarily surreal conversations this last week. Not only did people come up to me and introduce themselves like we were old friends, while I looked at them smiling, thinking, "Who are you?", but then they'd say, "Thanks for Compiler Explorer," which is awesome.
Or I've been at conferences and people have been talking about Godbolt, and you know how there are those neurons in your brain that are always in the background listening for your own name? I'm just looking around like this. It's been very, very surreal, but thanks for the warm reception. I'm going to talk a little about myself so you know how I came up with this idea for Compiler Explorer. Like a lot of people, I'm sure, I started a long time ago programming on these 8-bit computers, which, as Jason alluded to, is still kind of a hobby of mine; I'm still hacking on the 6502 and the Z80.

After a few years of trying to write my own games, writing things in assembly because that was the only way you could do anything efficiently in those days, I moved to the ARM processor. I stayed in the 8-bit era for too long, and then when the jump came, I went all the way to 32-bit. And in those days, there was no version of GCC for my computer, so I was stuck writing assembly code. I even ended up writing a complete IRC chat client in assembly, complete with windows and scripting and everything, just because that was pretty much the only option available to me.
When I finally encountered C, I considered it a kind of glorified macro assembler. It was a nuisance; it kept me from writing the assembly I wanted to write. But when I got to college and discovered that, oh, wow, it's suddenly very dark in here. Nobody fall asleep, right? When I got to college, I realized that the only way I would be able to write software that ran on more than just the ARM computer I had was to learn C. And from that point, things kept going, and eventually C++ too.
That led me to get a job in the games industry, where I spent 10 happy years, still a little torn between writing assembly code and C code and maybe a bit of C++. After that, I started writing C++ tools, so I have a lot of sympathy for those people who work on C++ tooling. The language is absolutely terrible and the worst thing to parse. So if you see someone from JetBrains or somewhere, give them a nice firm handshake, or anyone from Eclipse. Then I worked at a small internet startup. And then I moved on to DRW, which is the company I'm at now and where I do low-latency stuff.
So you could say that I have always stayed pretty close to the metal. So what's up with this talk? It's an amazing opportunity to talk to so many C++ developers, and I thought: what am I going to do with this time? What do I love, what do I want to tell people? I love assembly, you probably noticed that in the last few slides. So I want to make sure you guys are happy and not afraid of looking at assembly; it's a useful thing and you should do it. Not all the time, and I'm not saying that you should go out and learn to write assembly code or that you should ever write assembly code, but you should be able to look at the output of your compiler and, with some confidence, know what it's doing.
And when you do that, you'll appreciate how much work has gone into the compiler and how smart it is. There are many glamorous features arriving in the language. You know, all these new metaclasses, Herb's amazing metaclass ideas. Reflection is coming, and all the template metaprogramming tricks you can do with the compiler are amazing. But that's all in the front end. It is also very important that the code that comes out the other end, the thing that will actually run on the computer, is, well, firstly, correct, and secondly, as efficient as possible. So that's the outline of this talk, and I'll tell you a little history of how Compiler Explorer came about.
I want to give you a very quick overview of x86 assembly. I want to ask the question: "What has my compiler done for me lately?" Which is just to say that I'm going to look at some of the interesting optimizations that compilers do. Now, of course, these are all going to be slide examples, so they're totally contrived, but hopefully you'll get the idea and it'll whet your appetite. And then I'll talk a little bit about what happens when you type on my website, what happens behind the scenes to show you the results of the code.
So in 2012, a friend and I were at work and we were discussing whether or not it would be possible to start enabling the C++0x flag, or the C++11 flag, in the compiler and start using some of these new features that were coming out, you know, things like range-for, auto variables, things like that. And being a high-frequency trading shop, we can't just tell everyone, "Go ahead, just use whatever cool new features are available," we need to make sure we haven't done any harm. So we came up with an example code snippet like this, which, as you can see, just takes a vector of integers by reference and sums them all up.
Yes, this could just be std::accumulate, I know, but be patient. And we were wondering whether this traditional way of writing it, just a loop over a container using an index into that container, which is how we would have written it before, could be replaced with the much nicer code here, just using, you know, range-for. Other languages had bitten us before, because in some managed languages in particular, those of us who had some experience knew that iterating over a container like that would construct iterator objects, and that would generate garbage, and things would happen behind the scenes. Obviously C++ doesn't have garbage collection, but we just didn't know what was going on behind the scenes.
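A sketch of the two candidates we were weighing up, reconstructed from that description (the exact slide code may have differed):

    #include <cstddef>
    #include <vector>

    // The traditional way: index-based loop
    int sumIndex(const std::vector<int>& v) {
        int result = 0;
        for (std::size_t i = 0; i < v.size(); ++i)
            result += v[i];  // index into the container each time round
        return result;
    }

    // The C++11 way we wanted to adopt: range-for
    int sumRange(const std::vector<int>& v) {
        int result = 0;
        for (auto x : v)     // iterates with begin()/end() under the hood
            result += x;
        return result;
    }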
We'd like to check it out, you know, we should test these things. Which of these is better than the other? That will serve as an introduction while I talk about how Compiler Explorer came to be. And if my clicker works, we'll move on to the next one. Before we get into a bunch of assembly, we'll give you at least enough information to know how to read it, but I'll give you a big warning here. It's really seductive to look at the output of your compiler and start fooling yourself into thinking that you can see what it's doing, and instinctively know that this is better than that, or that this other thing is better, or that there are fewer instructions so it must be faster, things like that.
If any of you were here earlier for Chandler's talk, he gave just a small glimpse of the kind of complicated things that happen inside a modern processor that you can't really predict. While you can obviously develop an intuition about what's going on, you should always measure, and you should definitely use something like Google Benchmark or one of the other benchmarking libraries out there that have gone to great lengths to try to be as robust as possible, although microbenchmarks have their own problems. Or, a quick shout-out to Fred if he's in the audience somewhere, you can use his site, Quick Bench.
Hey. It's an amazing online tool that allows you to take two different snippets and compare their performance, again using Google Benchmark behind the scenes, but interactively online. And I clicked the button down a slide, wow! So let's talk a little about x86. Well, first of all, I guess: how many of you compile for x86 platforms? Raise your hands. Good. I was terribly afraid that suddenly everyone would say, "No, no, we're doing ARM now." Alright, who targets ARM, predominantly ARM? There are some hands. What about 32-bit x86? Is that still a target? A bit. Does anyone have a more exotic processor than that that they usually build for?
Oh yeah, some PDP-11, right? (laughs) Okay, well, that at least gives me some legs; it would be very embarrassing now if I suddenly had to invent another architecture to talk about. But let's talk about x86, and specifically x86-64. We don't need to know too much about it. Everything you need to know: it has registers. These are 64-bit values and there are 16 of them. In the 32-bit version of the chip, there were only eight, with these funny names, ax, bx, cx, dx and so on, and some of them have special meanings too, but for the most part, you can think of them as general-purpose variables.
Fortunately, when AMD decided to extend to the 64-bit version, they added eight more registers, which was useful, and instead of making up new fun names, they just called them the much more sensible r8 to r15. That's where the integers are stored; they're like the integer variables inside the processor. There are a few other registers, the xmm or ymm and zmm registers, depending on the revision of the processor you have. Any floating-point operation will happen there, and any packed operation will also happen in those registers, but in general, and at least during this talk, I'm not going to talk about those registers.
The only other thing you need to know is that there is an ABI, a way that functions talk to each other, and it's by convention. That convention is that when I call a function, the arguments to that function are going to be in rdi, rsi, rdx, and then some other registers. And if you return a value, an integer value that is, you'll leave it in rax. And there are rules about which registers you can overwrite and which registers you have to preserve, but if you're not writing assembly, you don't need to know that.
It's kind of like our ABI for talking between functions. Now, the registers, as I said, are 64 bits. But they have all these different names. I've given rax as an example here, but this is also true for rbx, rcx, all the others, and even, you know, r8 to r15. If we put an r at the beginning, we are referring to the full 64 bits of the register. If we say eax, we mean a 32-bit version of the register, and we only get the lower 32 bits. And for complicated reasons, if you write to eax, it zeroes out the upper 32 bits.
That's not true for ax, ah, and al, which are the names for the bottom 16 bits and then the bottom two bytes, which are kind of legacy. I mean, that's how Intel has maintained backwards compatibility back to, I think, the 8088: by simply expanding the registers and giving them new names each time. Once you have registers, you have to do things with them. That's when we can finally talk instructions. So instructions can take between zero and three operands, and unlike Chandler's talk, I'm using Intel syntax here. Yes (laughs). (applause) God, I did a whole routine for tabs versus spaces and stuff like that, but I didn't realize this was going to get so much applause.
By show of hands, who uses Intel syntax when reading assembly? Oh okay, and who uses AT&T syntax? Oh, oh well, I thought I was going to make some enemies today. This is why Compiler Explorer, hear me out, Compiler Explorer has Intel assembly by default, because it makes sense and because I grew up... (audience laughter) And everyone else is wrong, I'm sorry. No, because I grew up, again, with ARM and the 6502, and they all have the destination on the left-hand side. So if you see something on my slides, the destination of the instruction will be the operand on the left.
So the operations can be things like call, return, add, subtract, exclusive-or, and all those kinds of things. And then, exclusively for x86, well, not exclusively, specific to CISC-type machines like x86, those destination and source operands, the things up there, don't have to be registers. They often are, but they can be (audio cuts out) references to memory. And x86 being as complicated as it is, the memory reference can actually be more than just a memory address. We get this huge number of options at the bottom here: we get a base, which is a constant.
We can choose a register and add that register to it. And then we can also choose another register and add a multiple of one, two, four or eight of it. Obviously this is useful for things like arrays, because you want to have a pointer to the beginning of the array, and then to get the ith element, you put i in one of the registers and multiply by one, two, four or eight to advance that number of bytes. So that's really powerful. But it kind of blurs the line on what a single instruction should be.
I mean, if you can do all of this in one add. And in fact, this is the kind of thing we're talking about here: we have a bunch of instructions there, and I also put the C equivalent on the other side. I think you can see the pattern there. The first one just reads from r14; it says r14 is an address, I'm going to read four bytes from it, interpret it as an integer, and put it in eax. The second one is just adding two registers together. Then we can start to look at these more sophisticated addressing modes, where we can read whatever is at r14 plus a small constant into eax; that would be reading an element of an array, if r14 were an array of integers.
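The slide's examples look roughly like this (a sketch; the exact table may have differed):

    mov eax, DWORD PTR [r14]           ; int eax = *r14;        read four bytes at r14
    add eax, ebx                       ; eax += ebx;            add two registers together
    mov eax, DWORD PTR [r14 + 4]       ; int eax = r14[1];      read at a small constant offset
    sub eax, DWORD PTR [r14 + rbx*4]   ; eax -= r14[rbx];       scaled array indexing
    lea rax, [r14 + rbx*4]             ; int *rax = &r14[rbx];  compute the address, don't read it
    xor edx, edx                       ; edx = 0;               compact way to zero a register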
And then that subtract, sub eax, DWORD PTR blah blah blah, is doing the array-indexing operation I just talked about. Now, it turns out that array indexing is so powerful and so useful that there is an instruction that simply says, hey, compute the address I would otherwise like to read from, and just give me the result of that address computation. That's the penultimate instruction there: lea, load effective address. It's a kind of glorified add; it was designed to effectively do what the C code equivalent shows, like taking the address of some element of an array.
But it's also effectively just a way of adding a bunch of things together in a more sensible way, one could argue, than the actual add instruction. And then it's worth mentioning this down here, that xor edx, edx. It seems a little strange to exclusive-or something with itself. But as you've probably already figured out, if you take any number and xor it with itself, you end up with zero, so this is an elegant way to set edx to zero. Why would you want to do it that way? Well, to encode a move instruction, mov eax, 0, you would need the opcode for the mov and then the four bytes that correspond to the zero constant.
Whereas this exclusive-or takes only two bytes to encode; it's smaller and more compact in terms of code efficiency. There are also some architectural reasons inside the chip that are super exciting and interesting, but I don't have time to get into them. In short: fun 64-bit registers, they have different names; parameters are passed in rdi and rsi first; the result comes out in rax; operations are performed with the destination on the left-hand side; and destination and source can be registers or those memory references. Okay, now everyone knows how to read x86 assembly. Where were we? We were here: we were comparing these two implementations of, effectively, std::accumulate over a vector of integers.
Which one is better? Well, let's take a look. Let's look at the assembly; that's how we look at these things. This is version 0.1 of Compiler Explorer: it's a shell command. I was running g++ on a temporary file, turning on the optimizer, telling it to just generate the assembly instead of assembling it, and writing the output to standard out. And of course, the very important -masm=intel. Pipe it through c++filt, because mangled names mean nothing to me, and then this fun grep here just removes all those funny lines, the dot directives that the compiler needs to be able to talk to the assembler to emit a real program, but that I'm not interested in.
And we get something like this. I don't know if you know the Unix watch command. It's a command that takes another command and just sits there and runs it in a loop, over and over again. And you can even pass --differences, and it will show you a diff against the previous run. So I just split my terminal in half with tmux, running that on one side and running vi on the other. There you have it: Compiler Explorer. We found it very useful. We were able to answer all kinds of questions about how the code was compiling, and once you have a tool like this, you start to think, hmm, I wonder what else we can do.
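Putting the pieces together, that v0.1 pipeline was roughly this (a reconstruction from the description above; the exact flags may have differed):

    watch --differences "g++ -O2 -S -masm=intel -o - /tmp/test.cc | c++filt | grep -vE '^\s+\.'"

(The grep strips the indented dot-directive lines while keeping labels.)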
So I went home that night, and it nagged at me that this was a great solution but it wasn't very pretty. And what do you do when something doesn't look very pretty? You could use a cool graphics toolkit, right? Or you could go to the web. So, my goodness, I'm so glad it's loaded. This is Compiler Explorer; many of you have seen it before. And this is the example, in the embedded view. But I'm not going to use the embedded view because, frankly, it's too small, oh, I'm going to have to... Okay, is it almost readable?
Yes, oh good, and the yellow shows up, hooray. Well. So here is our example. On the left side we have the code. On the right side we have the assembly that comes from it. I'm building with GCC 7.1 here; it's the version I wrote this talk with, and I'm paranoid about changing anything in case everything changes. In the compiler options here, I'm using -O2 and -std=c++1z and -march=haswell. Haswell is an Intel microarchitecture that is about five years old and is found in most servers; in my experience, it's a pretty safe baseline. Just to give a brief introduction — we'll go into a little more detail in a second.
If I mouse over this yellow area here, you can see that there are two yellow areas on the right side that have become a little bolder, although it's hard to see them on that screen. The color coding attempts to match the source lines to the assembly output. So, although we don't really know much about assembly, other than the crash course I just gave you, we can see that the int result = 0 corresponds to xor eax, eax. So we know that xor eax, eax is the same as eax = 0, and we can at least see a parallel between those two things.
Similarly, if I look at this red block here, this is where we're accumulating the result. Now, because we are xoring eax with itself, we can reasonably intuit that the compiler is using eax to hold the value of the result while the loop executes. And in fact, this is supported by the fact that we see add eax, DWORD PTR ..., which says: accumulate into eax whatever rdx points to; and then it moves rdx along. And then you'll notice that the return result line is blank; there's no actual assembly instruction that corresponds to it.
This is because when the loop of adding things up finishes, the result is already in eax, which is where we needed to leave the return value for our caller. So there are genuinely no instructions for it. I mean, I guess you could argue that this red here should be attributed to the return, but it's not perfect how these things line up. Well, before we get too deep into this, I'd like to show you what happens if we turn off the optimizer. So the optimizer is disabled, and I'm not going to go over all of this, but the compiler does pretty much exactly what we ask it to do.
So this is interesting and useful for seeing exactly what you've asked the compiler to do. And we can see that a lot of code has been generated here; all these vector operations have been emitted. And if you had attended the talk on linkers today, you would know that they were marked as weak, and things like that. But at O1... Yes, we have a mov eax, 0 here; it's funny that GCC has decided that it's just not worth replacing mov eax, 0 with xor eax, eax at O1. I don't really know what the reason is. And then you can turn it up to O3, which is the one true setting, although, as we know, that's not really true.
It's amazing. I mean, look at these things. There are pages of this, and I'm sure it's super awesome and uses all these vector instructions, which are cool and all. But we would have to benchmark to make sure it was actually faster than the simple version. I trust compilers a lot, but it's too much to fit on the slide anyway, so I'll settle for O2 for most of this stuff. Alright, anyway, what was our original question? Are we okay to use range-for? Now, I'm going to drag one of those over, paste the code in here, and get rid of the assembly that was hiding there.
The font size is a little too big here, but I'm sure you'll give me the benefit of the doubt for a second. So I'll type for (auto x : v) result += x, and delete those two lines, and nothing happened, because all I did was create an editor window. The user interface is such that now I need to connect a compiler to that window, so I'll take a compiler and put it here. And then, just to make it a fair fight, I'm going to take the same command-line options and paste them in there. Okay, well, I can scroll up and down and see that there are 14 instructions there and 18 instructions there, but there is a better way to see what's going on.
I'm going to pull down a diff view and maximize it, okay. This is all good, here we go. Alright, this is what I wanted to see. These are the two versions side by side with a diff. As you can see, on the left side is the version that uses the zero to v.size() loop, accessing the container with an index, and on the right side is the version that uses range-for. Now, again, with the caveat that you have to measure everything and you can't just look at these things, at least we can take away one thing, and that is that, although there is a little difference at the beginning, this region here, from line 7 to 11 on one side and 10 to 14 on the other, is identical on both sides.
So even though we wrote the code in two completely different ways, the core loop, the part that actually runs through everything and sums it up, is identical in both cases. And the difference at the top is a couple of instructions. We might reasonably expect them to behave very similarly. Oh, wait a second, I was going to do something else. And that is, for the purists among you who say, why don't we use std::accumulate: std::accumulate(begin(v), end(v), 0). Typing under pressure is never easy. Ta-da! Now, if you're eagle-eyed, you'll see that it's absolutely identical, give or take a reordered instruction, to the handwritten for loop.
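In other words, the whole function collapses to a one-liner (a sketch, assuming the usual headers):

    #include <numeric>
    #include <vector>

    int sum(const std::vector<int>& v) {
        // Compiles to effectively the same inner loop as the handwritten versions
        return std::accumulate(begin(v), end(v), 0);
    }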
So use your standard algorithms, I think that's the bottom line. Let's quickly pull that code apart. This is an example of how, if I were reading some code, I would take it apart and try to intuit what is really going on and why the two are different. If you remember, we are taking a vector of integers by reference. Now, there is no such thing as a reference in terms of hardware; they are all pointers as far as the processor is concerned, so effectively what we have done is pass a pointer to a vector of integers to this function. The first and only parameter will be in rdi.
And you can see that these first two instructions read from rdi and rdi+8. Now, it would be easy to fall into the trap of thinking that rdi points to the list of integers, but it doesn't. It points to the vector. And at least in GCC's implementation of the STL, if you dig through all the templates, this is what you ultimately arrive at: a structure that has three pointers. The first pointer points to the beginning of the array. The second pointer points one beyond the end of the array, and the third pointer points to the end of the storage region allocated for that particular vector.
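As a sketch, that layout is the following (the real libstdc++ member names differ):

    // What std::vector<int> boils down to, conceptually:
    struct VectorOfInt {
        int* begin_;        // start of the elements
        int* end_;          // one past the last element
        int* end_of_store_; // one past the end of the allocated storage
    };
    // Note: no size member — size() is computed as end_ - begin_.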
That means we can have more space than we are actually using, and as you know, vectors will grow to fit the amount of space they need, and they will shrink, well, they won't shrink unless you ask them to. Sorry, I'm getting carried away, but the interesting thing here is that the size itself is not stored within that structure; we just have two pointers. This will get interesting. So here's the diff. On the left side, we have the traditional count-based approach, and on the right side, we have the range-based approach. Those first three instructions on the traditional side, sub, mov, and shift right, effectively take the end pointer and subtract the begin pointer from it.
That tells us how many bytes there are between the beginning and the end of our vector. And then that right shift by two is a division by four; you know, if you shift something right once, you divide it by two, and if you shift it right twice, you get a division by four. So what we've effectively done there is the size calculation. The size has been determined, and now we know how many elements there are in our vector that we are going to iterate over. Implicit in that right shift is a comparison with zero.
So if we got a zero result, the equal flag will be set, and that je, jump if equal, will branch off to .L4, which is the "there was nothing to do, just return zero" part of the program. So far, we have: if the size is equal to zero, we are done, great. Now we might reasonably expect it to count a loop iterator up until it reaches that size, but something interesting has happened here: the compiler has realized that we never used that index inside the loop.
We just read one array element after another. So it basically goes in and rewrites everything for us in terms of generating a pointer to the end of the array and then moving a pointer forward through the entire array. That means it now needs to know where the end of the array is, which, as it happens, actually was in rcx at the beginning. Unfortunately, the compiler didn't realize that, so it has to add rdx back to rcx to reconstitute the end pointer that we had at the beginning. And then it sets the result to zero there.
So there's a little bit of extra work done here, and I think there are probably some compiler writers in the room; I don't think I'm speaking out of turn to suggest that this could be optimized by the compiler, I don't know if there is anything, oh, there's a note there (mumbles). (background noise drowns out audience member) Oh, okay. So I think the comment was that Clang was giving the same result, which might actually be a bug. Is that so? We... (background noise drowns out audience member) I see. (background noise drowns out audience member) Okay. It's difficult to communicate.
Yes. The comment was that there was an issue filed, or there may be an issue, about the Clang behavior; if we could talk about it at the end, maybe that would be clarified, because otherwise we'll get derailed forever here. But that's great, there are compiler people here who can help me improve this. On the range-for side, of course, what's really happening is that the range-for is being rewritten as: get the begin, get the end, create an iterator, and loop from begin to end. And that's exactly what the compiler did. So there's none of that messing around calculating how many iterations we need to do and then effectively throwing the count away.
The loop, as we already saw, was identical in both cases, so I don't think it's necessary to go into too much detail. We simply read each integer one at a time and add it to the accumulator. And when the iterator reaches the end, we stop going round and just hit that return statement. I think we can probably conclude that, give or take a couple of instructions at the beginning, the two are identical, which is great, actually, because that means we can tell everyone: go ahead and use the much nicer C++ range-for idiom instead of counting.
That is excellent. Additionally, we learned that optimizer settings make a big difference. And to be fair, I didn't actually compile the two versions with O3 or spend any time determining whether anything else changes if we try count versus range; I'm not sure it would, but it would definitely be worth checking, and of course you would benchmark if you really cared. And we saw that the standard algorithm compiles identically, so we should actually use the standard algorithms. Well, that's an example of the kind of thing you can do if you start sitting down and looking at the code.
And like I say, you start to discover these amazing things that the compiler does for you. So, the first thing I'm going to talk about is multiplication. Again, as I said at the beginning, these are all slide examples, so they are necessarily very small and very simple, but they are great. This is what a multiply-x-by-y routine looks like: passing two things, x and y, edi will have the first parameter, esi will have the second parameter. And we need to leave the multiplied result in eax.
And this is what the compiler outputs. That's great. But can we do better than that? Multiplication, if you remember from grade school, looks like this; this is four-bit multiplication done by hand. I'm sure there are much smarter things inside the processor, but ultimately a lot of additions happen, and this is just four-bit multiplication, so there are four additions involved. So on Haswell, where you'll be doing at least 32-bit multiplications most of the time, it's a miracle they can make that happen in four CPU cycles. Now, a CPU cycle is about a third of a nanosecond, so keep that in your head; of course, we're still talking about something that doesn't take much more than a nanosecond, which is crazy.
But an addition is one cycle. So maybe there is a faster way to do this. Or maybe my clicker works, there we are. What happens if we know that we are multiplying by a constant? We know it happens all the time. You may be looping through an array, or getting the ith element of an array, and you need to multiply by the size of the object. We could reasonably expect the compiler to use a shift, where shifting up by one multiplies by a power of two; so to multiply by two, we'd expect it to shift left by one.
Let's check. Oh, no shift. It's that funny lea instruction. Like I said, it's a glorified add, you should think of it as an add, but the nice thing about lea is that the destination can be different from the source, whereas a shift on x86 operates in place. So it saved an instruction here that would otherwise just move things to the right place, and instead said okay, add rdi to rdi, great, and put the result in eax, we're done, great. What about some other numbers? Surely it will need to do more work for three? Well, no, because we can use lea again.
Okay, now we know that the lea instruction can scale by one, two, four, or eight. So no surprises await us there, and indeed there aren't any. If we go to 16, we see it using those shifts, and there you can see the two instructions it would otherwise have used if it were only using shifts. Now, again, anyone who was at Chandler's talk can see how smart that processor is at moving things around and reordering them. It's very clever, but it's still worthwhile not to have two instructions in your pipeline. But what happens if we do something like this?
Well, it has given up on us. So obviously there is a limitation in the compiler; the compiler writers are not as smart as they seem. I know. (audience laughter) I know... (applause) Oh no, I was wrong, I'm sorry, I need to do this. I know I can build that multiply myself. I know that 65599 is 65536 plus 64 minus 1, right, so I can build it from those operations; I'm going to do that and generate better code than the compiler, because I know better. Oh. (audience laughter) (applause) Well, that's awkward. So yes, it turns out that the compiler is smarter than me.
It's not surprising. Not only is it smart enough to know that a multiplication is faster than all the shifting, adding and subtracting I would otherwise be doing; it turned out that my sequence of shifts and adds is actually equivalent to that multiplication, and it said, well, I'll save you from yourself here. (audience laughter) That, of course, is only true on more modern processors. If we go back to the golden age, let's target the i486. Yes, you see. I'm still channeling Michael Abrash's big book of incredible optimizations, and I can still write better code than the compiler.
Well, not quite. If I convert this to the multiplication it really should have been all along, the compiler actually does something even smarter than my sequence of instructions, although I have no idea what it's doing; I haven't bothered to figure it out. So the answer really is to let the compiler do things for you; it will be smarter than you. And I guess as a secondary thing, telling it what architecture you're targeting is probably important too. I mean, I don't think multiplication has been that slow for a long time, so I think you're safe in this particular case, but always take a look.
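A sketch of the multiply-by-constant rewrites we just walked through, with the codegen I'd expect from a modern x86-64 compiler (your compiler and flags may change the details):

    int mulBy2(int x)     { return x * 2; }     // lea eax, [rdi+rdi]        — an add, no shift
    int mulBy3(int x)     { return x * 3; }     // lea eax, [rdi+rdi*2]      — lea again
    int mulBy16(int x)    { return x * 16; }    // mov eax, edi / shl eax, 4 — back to shifts
    int mulBy65599(int x) { return x * 65599; } // imul eax, edi, 65599      — one multiply beats
                                                //   a hand-rolled shift/add chain on modern chips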
Alright. That's multiplication, and specifically multiplication by a constant, so it's a bit of a special case. What about division? If you hated doing long multiplication as a kid, you probably, like me, hated doing long division. And processors don't like it any more than you do. This is the code that comes out if you do a division or a modulo. It turns out that the circuitry used to do a division gives you the remainder at the same time, so the only difference between these two routines is whether it returns eax or edx.
So it's one of those funny cases where the instruction, the div instruction, effectively produces results in two registers that aren't even named in the instruction; it's crazy. Anyway. A 32-bit division on Haswell takes 22 to 29 cycles. That is an eternity. I mean, it's still only 10 nanoseconds, okay. But it's an eternity compared to all those movs. The other thing I haven't put on the slide here is that there is only one divider unit per core on your Intel chip. And it's not even fully pipelined. Whereas with multiplication, for a start, there can be multiple multiply units on the chip, which means you can have multiple simultaneous multiplications.
They're also pipelined, which means you can start a new multiply every cycle and get the result four cycles later; so unless you have a chain of dependent ones, you can get a multiply result every cycle. The divider? No chance. The divider simply locks up: it will take 20 to 30 cycles to produce the result, and nobody else can use it in the meantime. Can we do better? Click, here we go. Well, obviously, trivially, if we divide by a constant, and that constant is a power of two, we're back in the land of shifts, right?
I switched to using unsigned numbers here just to avoid extra code, but you can also do this trick with signed numbers. Now there's no magic lea-type trick to save us here; it just does the shift, which is great. And you can't even see the code because I zoomed in on everything to try to let you see it, okay? So there it is, x over 2, we're just shifting right by one, and, no surprise, if we do x divided by 16, it shifts by 4, and so on. But how often, I mean, to be fair, how often do you do an integer division at all, and how often do you do an integer division by a power of two?
I guess not that often. So let's try three; what's going on here? Oh. We have lost the division; there is no divide on the right-hand side. We have some movs and a multiply by a funny, scary-looking constant. What's going on here? Well, compiler cleverness again. What's happening is that the 0xaaaaaaab value is actually two-thirds, shifted up by 2 to the 32. So effectively, it's two-thirds in fixed point, where the binary point sits between the upper 32 bits and the lower 32 bits, so it's 32.32 fixed point. So if we multiply by it, we get the answer, shifted up by 32 bits. And if we just discard the lower bits, which is what that mov into eax does, then what we have is the answer, in fixed point, of multiplying by two-thirds.
And then we halve it. So why multiply by two-thirds? Well, it turns out that, to cover the entire range of an integer, at least an unsigned integer, you're one bit short of being able to do one-third directly, so you have to do two-thirds and then shift down. And there are some very well-published algorithms for determining the exact sequence of operations you should perform to cover a particular range of values in a particular way. And I saw some copies of the bit-hacks book, there's a book, I saw some out in the library earlier, oh, that's cool.
Sorry, cancel that. I've been clicking but nothing seems to be happening. Anyway, that's division. And if we've done division, we should see what modulo does. And modulo is effectively just a division, so the division part is grayed out at the top: we do the division, and then we just multiply the result back by three, and you'll see it's using the lea instruction here to multiply by three, by adding the value to itself times two, which is quite clever. And then you subtract that from the original number and you get the remainder.
So why am I talking about modulo, modulus, modding? Well, it's used in hash maps, which are everyone's favorite container, right? So if you have a hash value, you probably have a 64-bit number coming out of whatever crazy operation you do on your string. And now you have your 1,021 buckets in your hash table, into which you need to index to look up the result. And the only way to take that 64-bit value and squash it into the number of buckets you have is to mod by the number of buckets, which is great, but as we just saw, that's the slowest thing you can do.
Now, obviously, if you know in advance that your container will always have exactly 1021 buckets, then you can write the code that does mod 1021, the compiler comes along and says, ah, modulo by a constant, I'll generate that nice code for you, and you're done. But of course, most of the time in general-purpose code, we don't know how big our sets and maps, unordered ones anyway, will be. So the number of buckets I currently have is dynamic, and that means there will be a division in there. And that's unfortunate, because hash maps are meant to be the fastest.
Again, we're only talking about 10 nanoseconds, but these things make a difference. In fact, there is such a difference that libc++, after a certain point, stops using prime numbers, which is the good old way to choose the number of buckets you have, and starts using powers of two. Actually, that's not perfect, but for a large enough number of buckets, it's probably fine. And at that point you can just use an AND to select the bottom few bits instead of dividing. So obviously people are thinking about this, some smart people; I'm looking around, some people are looking at me and nodding their heads right now, so I guess it's them, some people are thinking about this kind of thing.
And in the case of, for example, boost multi_index, there's something even more clever, where they have a fixed set of allowed bucket sizes, so they know that, say, there are 20 different possible bucket counts, each of which is a prime number. And then instead of just modding by the actual size, they write a switch statement on how big it is right now, with a case 1021 that returns the hash mod 1021, and so on. Or the equivalent; I think it's actually done using an ordinal value. Which means that yes, you have a big switch statement, and yes, the compiler will have to dispatch and jump to the right part, but when you get there, you just do a couple of additions and a multiplication.
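A sketch of that idea (the bucket counts here are made up; boost's real list and dispatch mechanism differ):

    #include <cstddef>

    std::size_t bucketIndex(std::size_t hash, std::size_t bucketCount) {
        switch (bucketCount) {
            case 53:   return hash % 53;   // every case mods by a compile-time constant,
            case 193:  return hash % 193;  // so each compiles to the multiply-and-shift
            case 1021: return hash % 1021; // sequence rather than a divide instruction
            default:   return hash % bucketCount; // unknown count: a genuine division
        }
    }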
Obviously it's worth thinking about. Anyway, there are smarter people than me thinking about this. And that approach relies on the fact that the compiler will do this optimization for them, right; you wouldn't write it that way if you didn't know that's what would happen. Who went to Phil Nash's talk earlier? There are some. Great, I thought he'd stolen my thunder on this part. There are a number of data structures for which it is useful to be able to count the number of bits set in an integer. Now, who has ever had to do that? Gosh, actually a lot more than I thought; I mean, Phil has his hand up, of course.
That's amazing; I thought I was going to put up this example and everyone would look at me blankly, like, why would we do that? But anyway. This is one way of counting the number of set bits. We pass a value in as a, and say that as long as there are still bits set in a, we increment the count. And that a &= (a - 1) is one of the oldest bit tricks in the world for clearing the lowest set bit. We go round again, and once we have no bits left, we're done; we return the count. Let's see what our compiler does with this.
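As a sketch, the routine being described looks like this (reconstructed from the description; the slide code may differ slightly):

    int countSetBits(unsigned a) {
        int count = 0;
        while (a) {
            ++count;
            a &= (a - 1); // classic trick: clears the lowest set bit
        }
        return count;
    }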
So this is GCC. GCC has done something pretty clever here. It has a loop; you can see that the yellow bit is the loop there. And the red line, a &= (a - 1), is replaced with a blsr instruction. I had never seen the blsr instruction until preparing for this talk, and it turns out it is tailor-made for this operation: it resets the lowest set bit, and lets you put the result in a different register. That is very beautiful. So it has picked an instruction that does exactly what I want; that's super clever.
Can we do better than that? I imagine there are some people stroking their chins right now. Oh yes, we can do much better. It turns out that there is an x86 instruction whose only reason for being is to count the number of set bits; it is so common, and people wanted it so much, that Intel included it in the instruction set. Now, think about what Clang must have done here. There's nothing in the source that says I am counting the number of set bits. There are intrinsics, like the population-count builtin, that you can use to tell the compiler I want to do this, and the intrinsic will let it emit the code that does it, or the single instruction if the architecture you've chosen supports it.
But I was blown away when this happened; it was incredible. So all the bits of code I have that actually used the popcount intrinsic could be replaced with much more readable, well, I guess it depends on (audio cuts out) bit manipulation and stuff, but you know, more legible, to me, sensible, straight-line code. That is an amazing achievement. The last one I'm going to talk about is, again, a little toy example, so the standard warnings apply. And that's this sum: I wrote a routine here, and I used constexpr because I realized that this has all been C so far, and constexpr makes it C++, right? (audience laughter) So we're going to sum up to the value x, and we'll just do the obvious thing of starting at zero, counting up to it, and adding to our sum, brilliant.
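A sketch of that routine as described (the exact slide code may differ):

    constexpr int sumToX(int x) {
        int sum = 0;
        for (int i = 0; i < x; ++i) // start at zero, count up to x
            sum += i;
        return sum;
    }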
And then we're going to write a little main routine, which will sum up to 20. Then we'll see what the compiler does with it, oh, right, okay. Well, it's constexpr, right; constexpr means it can happen in the compiler. Well, it turns out I'm not even using it in a constexpr context, but let's replace constexpr with static, or just delete it, why not? Shall we? It's still the same, right? The compiler can see the whole thing, the compiler knows the game we're playing, and it says, I'm going to spend some time figuring out whether this reduces to a number.
Now, this is a stupidly trivial example, but generally this is likely to happen to code snippets that you paste into Compiler Explorer, and the trick is to make sure the code depends on something the compiler doesn't know about. So in this case, the compiler doesn't know how many arguments you'll pass to the program, so there we are. Now we can see the loop in its glory. And again, GCC has done something great. It's unrolled, no, no, I lied, sorry, that's next. Well, it's pretty simple here. I won't bore you by going over it, but you can see the shape of it, and you know, if we really want to go to town, we can turn on O3 again; look how cool. Again, I love all this stuff.
I mean, I don't know how many command-line arguments I'd need to pass before that's more efficient than the previous version, I don't know. But let's look at what Clang does again. Oh, that's interesting: there's no loop. There is a test in that blue region, and that test simply says that if x is less than or equal to zero, then the answer is zero. And then there's that yellow block, which apparently corresponds to sum += i; I don't know how the compiler came to that conclusion, but it's just a mov, a funny lea, and a multiply.
What is happening there? Well, as most of you probably know, there is a closed-form solution to this sum. Clang worked out, again, what I was doing, and replaced my code with the closed-form sum; yeah, that's pretty awesome, right? (applause) Even more interesting is that it hasn't actually used the one I've always used, x(x + 1) over 2. It has replaced it with an equivalent form based on x(x - 1) over 2. The reason is: what happens if we pass INT_MAX, right? The loop version would have worked for INT_MAX, but it wouldn't have worked if we had to compute x(x + 1); we would have overflowed, and everything would have gone horribly wrong.
I mean, how to get INT_MAX command-line arguments into a main function is another problem, but in essence, they've also taken care of the overflow case. And it turns out that you can play around with that code, move things around, and the compiler will ultimately give you a closed-form solution every time. Just think about what that's doing: it took a piece of code that I wrote that ran in linear time and turned it into a, sorry, a constant-time operation. There's another hand going up here, hello. (background noise drowns out audience member) Okay, so the observation was that overflowing an integer is undefined behavior, so why does the compiler have to care?
It's because the compiler can't start performing undefined behavior on my behalf; my x was not overflowing. Yeah, so it can't introduce, it can't ever add one to x, because for all it knows, the value was actually INT_MAX beforehand, and nowhere in my code did I overflow the integer, so it can't introduce undefined behavior. We can talk about it later, anyway. Well, these are silly examples, and they're all C examples. The compiler does much more than this. I tried to create a slide that laid out some of my favorite things. I'm an old-school OO guy, an old-fashioned guy.
So I still use the virtual keyword; I'm a bad person, I know. But compilers are getting a lot better at that too. They can perform static devirtualization in all the cases where they can prove that an object is of a particular type, and get rid of the overhead of calling virtual functions. And now they're getting smarter and smarter, like saying: I've looked at your entire program thanks to LTO, and I've determined there's exactly one implementation of this interface, so I'm going to assume that every call to that interface is to that particular implementation. You still have to check, so there's a quick check on the vtable pointer, or on the actual address of the function you're about to call, in case you've dlopened something and loaded another implementation.
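Conceptually, the transformation is something like this (a sketch, not any particular compiler's actual output):

    struct Animal { virtual int legs() const = 0; virtual ~Animal() = default; };
    struct Dog : Animal { int legs() const override { return 4; } }; // the only impl LTO can see

    int countLegs(const Animal& a) {
        // For a.legs(), the compiler conceptually emits:
        //   if (vtable of a == vtable of Dog)  // cheap speculative guard
        //       result = 4;                    // Dog::legs() inlined
        //   else
        //       result = a.legs();             // fall back to the real virtual call
        return a.legs();
    }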
But ultimately the function is effectively inlined into your code, with a quick check at the front. And that's great, but again, it didn't fit on the slide. And of course compilers do range analysis and constant propagation and things I don't understand. I'm not a compiler writer; I'm just the guy at the other end of the pub, looking on in amazement going, wow, that's really clever. And I think about the twenty-something me who always thought the C compiler was basically a glorified macro assembler, and now I realize how wrong I was. The compiler has done a lot for you; you should really be grateful for the hard work the compiler writers put in to make the code efficient and, in most cases, correct too.
Alright, that's my big love letter to compiler writers, and now I'm going to tell you a little bit about how G-Compiler Explorer, I almost said GCC Explorer again, God, how Compiler Explorer works. This is the second talk at this conference where I'm going to talk about JavaScript; I feel very bad. So yes, it is written in Node.js, which is a common JavaScript framework. And I know we sometimes give C++ a bit of a bad rap for having funny defaults and weird edge cases, but just take a look at JavaScript, really. (audience laughter) I mean, honestly, you can Google it and you'll find at least three cases that will make you, I don't know, give up on everything. But it's ubiquitous, it's easy to get up and running quickly, and once you start a project and it becomes really successful, it's a little hard to say: oh, I should really rewrite this in something else.
And the irony is that the only other large open source library I have, large being relative, is a web server written in C++, so you'd think I would have used that, but never mind. So it runs on Amazon's infrastructure. Oh, this is what the Node.js code looks like; it's horrible. But you know, this is not much more than the code a very primitive version of Compiler Explorer needs. It just says: hey, I'll get some requests at this URL, send them to the compiler, see what comes out, maybe do some primitive text processing, and then return it. It's so simple; I wish it were that easy to do something like that in C++, and I'm sure there are people in the room who can tell me which libraries I should use to make it that easy.
Behind the scenes, everything runs on Amazon's infrastructure, namely EC2, which is their Elastic Compute Cloud. I have an edge cache, oh, first it's worth saying that it started out on the free tier, meaning literally the smallest computer in the world that Amazon would give me for a year for free. And as demand has increased, it's gotten more and more sophisticated, meaning the onion layers have grown and grown around it, and now it's kind of a DevOps nightmare. But anyway, it's now fully cached at the edge. Hopefully, when you load the page, it appears fairly quickly.
There is a load balancer behind the edge cache. So when you actually make your post, it is sent to one of the several instances I have in the background. Those instances are virtual machines running Ubuntu. And inside those virtual machines (as I told you, layers of the onion) there are Docker images. Docker is a kind of very lightweight container that allows you to bundle up data and run a process in a namespace, meaning it can only see a subset of the computer, so it's a bit like a VM within a VM, although much lighter weight.
That gives me some protection against all the crazy code you guys put into the site without me vetting it at all. And also, it gives me a way to make sure I can run locally the exact binaries that are going to be deployed to the site, so most of the time, I don't break it when I update the site. And for a long time, I used to build all the compilers into those Docker images. That started to become less fun when you have 40 or 50 gigabytes of compilers and try to build a Node, sorry, Docker image for it, and then you have to try to send it to all the nodes, and you know, it takes 10 to 15 minutes for a node to start up, because it has to pull all this data before it can even begin.
I solved that with the cheesiest trick ever, which is that I just have a big NFS mount that everything lives on, and it seems to work. The compilers themselves, as I said, I initially started installing with apt-get. So the Docker images looked like little virtual machines, and you'd provision them with all these commands, like sudo apt-get install gcc-blah blah blah or whatever. That was great, it was very convenient. But then I discovered that the compilers I had originally put on the site on, say, Ubuntu 11 were deprecated and wouldn't install on Ubuntu 12, 14 or 16, so I was keeping a container around just because it had some old compiler on it that I wanted to keep.
And I want to make sure that the URLs that you share always work, which is another story; you know, URLs are forever, never mind radioactive half-lives, URLs stay around absolutely forever. Now the compilers are built through these Docker images; I've spent quite a bit of time trying to learn good practices for building compilers, and I've codified it all; it's on my website, which I should link at the end. And they're built in Docker images just so that I have reproducible builds, which is something I'm very passionate about. I want anyone else to be able to clone this and build the same compiler that Compiler Explorer is using.
And more to the point, actually, once I've built them, I drop them into S3 and mark them as public. And I keep getting emails from Amazon, because a lot of people recently have found out that they have public data in S3 that they didn't mean to make public. Well, I actually mean it: the reason they're public is that you can run a shell script and, if you have enough hard drive space, you'll get 40 gigs of open source compilers downloaded to your hard drive, which is cool, because then you can run your own local instance of Compiler Explorer if you don't want to send me your code, for example.
Those are the open source ones. Intel has also very kindly provided me with licenses for ICC; those live somewhere else. And Microsoft's compilers currently run through WINE, which is more an admission of my lack of understanding of how to manage Windows than anything else. I'm happy to say that I've met with Andrew and a few other people at Microsoft who are going to help me, and we're hoping to get proper support for Windows compilers in Compiler Explorer, so that's really exciting, coming up. (applause) Thank you, thank you. And, you know, security. Well, I actually got an email this morning from someone who thinks they've hacked my site, and maybe they have, I don't know.
It turns out that a compiler is a gigantic security hole waiting to happen; it is a huge attack vector. Compiler writers are amazing, as I think I've said enough times, but they're not really security experts, and they don't have to be, right; they normally run on a trusted system. But I had no idea how many ways there were to inject code into a compiler. You know, you think a compiler just compiles the code; I'm not actually executing anything, right, I'm just compiling it. But GCC has a plugin architecture: you can provide -fplugin and point it at a dynamic library that it will load.
So if you can get the compiler to fail after it has written a file somewhere, and my cleanup code doesn't run right away, then you can guess the path and do another compile with -fplugin= pointing at it, and then you're in. That only happened once. (audience laughter) Also behind the scenes, GCC uses something called spec files, which I don't pretend to understand, but they seem to describe the many, many piped processes that make up the build, from the preprocessor to cc1plus to the assembler, as, and all that sort of thing; essentially it's a set of shell commands to execute.
And of course you can say, "I'd like you to use my spec file here." And again, you're back in the same world where, as long as you can write a file somewhere on the disk that says run, I don't know, netcat or something, then you know, you're in. I started down the path of trying to harden everything, and eventually came to the conclusion that there is no way to make it completely bulletproof. And that is why I've settled on the principle of "what's the worst that can happen?" And the thing is, you can hack one of my nodes and trash the Docker instance, and even if you don't escape the Docker container, the worst you can do is stop it from running; then something else will come along and kill the process, and it will start up again.
Hopefully, that's not so bad. Even if you escape the chroot of the Docker container, the namespace jail, which I'm told is possible, you're in a virtual machine that has no privileges at all. And I'm saying this as it's being recorded, so someone will email me and say, "Oh, this is a good one," and fair enough, but oh well. Anyway. So Docker somewhat protects me from this. And I experimented for a long time with LD_PRELOAD tricks, which is a Unix-specific thing where you can force-load a dynamic library before running an executable and then override some of the system routines. So I had a bit of code that basically replaces open and read and all that kind of stuff, and just has a blacklist and a whitelist of all the different files that you were allowing the compiler to read, and that was my way of preventing people from doing, like, #include /etc/passwd, ha ha ha.
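For anyone curious what that trick looks like, here's a minimal sketch of an LD_PRELOAD interposer. This is my own illustration of the general technique, not the code Compiler Explorer actually used, and the allow-list policy is made up. You build it as a shared object and force-load it before the compiler:

```cpp
// filter.cpp -- a minimal LD_PRELOAD sketch (illustrative, not the real code).
// Build:  g++ -shared -fPIC -o filter.so filter.cpp -ldl
// Use:    LD_PRELOAD=./filter.so gcc -O2 -S test.c
#include <cerrno>
#include <cstdarg>
#include <cstring>
#include <dlfcn.h>
#include <fcntl.h>

extern "C" int open(const char *path, int flags, ...) {
    // Hypothetical policy: deny everything under /etc except /etc/passwd,
    // which (surprisingly) the compiler legitimately wants to read.
    if (std::strncmp(path, "/etc/", 5) == 0 &&
        std::strcmp(path, "/etc/passwd") != 0) {
        errno = EACCES;
        return -1;
    }
    // Forward everything else to the real open() further down the chain.
    using open_fn = int (*)(const char *, int, ...);
    static open_fn real_open =
        reinterpret_cast<open_fn>(dlsym(RTLD_NEXT, "open"));
    va_list ap;
    va_start(ap, flags);
    mode_t mode = va_arg(ap, mode_t);  // only meaningful with O_CREAT
    va_end(ap);
    return real_open(path, flags, mode);
}
```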
And that #include /etc/passwd is essentially the kind of thing people try first. It turns out that the compiler needs to read /etc/passwd, because it wants to know which user is running it. Fine, we'll allow it. And then it was, oh, now it's trying to read /proc/cpuinfo, and I think, well, that's a little iffy, why should I let it read the proc filesystem? And it's like, well, because people want to write -march=native, and how else is the compiler going to find out what CPU it's running on? And so on, and before I knew it, everything was basically fair game to the compiler, and I just threw up my hands and said, well, screw it. (laughs) So I guess we should talk a little bit about the interface, what you see.
Another big props to Microsoft here. The Visual Studio Code editor component, Monaco, is a separate project, written in JavaScript, that you can just take and drop into your own project, and it's amazing, it's a full-featured editor. And it does everything you could ever want. That diff view, for example, is simply a thing Monaco can do natively. It had been in my backlog for a long time to have a diff view; it was, like, issue number three. And then when I moved from CodeMirror, which was the previous JavaScript library I was using, to Monaco, it had this diff feature built in.
I thought, oh, that's cool, and it was only two lines of code to enable it, it was amazing. And then the fun drag-and-drop stuff that I see other people struggle with, which is more a testament to how bad I am at user-experience design, is GoldenLayout, and I'd like to say thanks to both teams behind those things. It's amazing that they open-sourced them, which made it very easy to paste the website together. So, the code is on GitHub. There are two repositories. The first has all the Node code and all the front-end code, so effectively, if you have Node on your machine, you should be able to clone it and type make, and that's it.
You'll have your own running local version of Compiler Explorer. The second repository is the image one, which has all the Amazon stuff, all the Docker containers, all the compiler scripts, everything. Basically, if I got hit by a bus tomorrow, you should be able to reconstitute Compiler Explorer from what's there. And there will be more information about this in an upcoming C++ Weekly, Jason's weekly video series on YouTube. This is a slide I put in late, so I haven't really thought about what I'm going to say.
But this is inspired by conversations I've had with you folks. I think you have an idea of how I use Compiler Explorer and why it was created, and I've been surprised at how differently everyone else seems to use it. There are some people who like to show off cool optimizations that the compiler does, but it's also become kind of a de facto code pastebin. I think Jason alluded to that in the introduction by saying that, you know, this is how we share things now, which is very exciting to me. And I know some of the compiler teams are using it internally to test things they're doing, and I did a search on the Clang and GCC bug trackers, and there are over 100 bugs that cite Compiler Explorer links.
And people use it to quickly bisect: the dropdown has so many compilers, you just go and move up and down and, oh, that was introduced in 6.2, and that's cool. I've also seen or heard of people using it as a kind of REPL. Which, you know, if you're doing a little bit of template metaprogramming, you can write tests: you can write, you know, constexpr code or static assertions, and just start writing code. I think this shows a deficiency in our tools: it's a nicer experience to write on a website that ships your code over the Internet to someone else's machine, which builds it and ships the result back, than to just do it locally.
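To make that REPL style concrete, here's a small sketch of the kind of thing people paste in; this is my own example, not one from the talk. The static_asserts act as tests that either compile or don't, so you get instant feedback without ever running anything:

```cpp
// A sketch of using Compiler Explorer as a compile-time "REPL":
// the static_asserts are checked the moment the code compiles.
#include <type_traits>

constexpr int factorial(int n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}

static_assert(factorial(5) == 120, "constexpr code can be tested in place");
static_assert(std::is_same_v<decltype(factorial(3)), int>,
              "type-level questions work the same way");
```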
So I'm wondering if people in the audience might have some ideas around that. And I've also seen people using it as a teaching resource, which is also very rewarding. You know, being here at this conference, teaching C++ and making sure that people understand how to write this code, and understand the processes we go through when we write code, is important, and it's good that people feel Compiler Explorer can help with that. A quick sneak peek, sneak preview? I always get those backwards, of things that are coming soon. There's the CFG view, a control-flow-graph viewer that's going to be available very, very soon; in fact, it's on the beta site.
So if you go to godbolt.org/beta, you'll see that there's a new icon on the far right of the assembly pane that you can click and drag, and you'll get a view showing how the different basic blocks interact. I'm also looking to unify all the languages. You may have noticed that, as many people have said, it is inappropriately called gcc.godbolt.org. It should just be godbolt.org; well, actually it should just be compilerexplorer.com, but, you know, no one will let me rename it now. So I would like to have all the different languages together. You know, there are other supported languages: there's D, there's Haskell, there's Swift, there's ISPC.
They're all there, and it would be nice to have them all in one place, so that instead of going to different URLs, you could actually open two editors for two different languages, right? The Go version and the C version of something, and then see how the assembly compares side by side. I'm afraid that will lead to some flame wars, so I don't take any responsibility for that. And then the one thing that everyone always asks me about is code execution. I mean, it's fair to say that there are other online compilers and they're very good and you should check them out too; there's Wandbox and others that escape my mind at the moment.
But they do, some of them, have execution support, and that's awesome, because it's nice to write the code, and it's even better if you can see what it actually does. But you've already seen the hours of my amateur devops and security configuration, so I really have to understand it all before allowing arbitrary code to run. But it's coming soon, and I'm taking advice from people who know a lot more about this than I do. Well. We're at the end and I just want to say thank you. This is not just me. I'm happy to say that Compiler Explorer is now getting a lot of contributions from other people, like Ruben, Gabriel, Simon Brand, Johan, Jared, Chedy.
And all the other people who have raised issues, or have taken the time to contact me directly and tell me about problems they encountered, or have suggestions for the site. You know, it's amazing, it's wonderful to have something that's been a labor of love, where you were scratching your own itch, and then you find out that it's actually helpful to other people, and not only that, they're willing to help you with it. Thanks also to the people on Patreon who help foot the bill. I'm in the embarrassing situation of making a small amount of money right now with this, which is awkward, so I keep increasing the number of instances just to counter it. So if you put more money in, then you get faster compiles. (audience laughter) Thanks to the amazing C++ community, oh, ooh, there we go, we spoiled the ending.
Thanks to the amazing C++ community; the folks on Slack have been a great source of inspiration and idea sharing. And of course, thanks to you for sitting here while I talk about my fun little website. Go read some assembly, thanks. (applause) Thank you. We have time for a few questions until I'm told otherwise, so, yeah, hello. - Can you go back to the slide where you showed the sum, the equation? - Oh, you're going to find a hole in my amateur math, yeah. It's, oh, somewhere in here. Hold on. I think it'll stick out like a sore thumb, right?
There it is. Alright. - Yeah, so the interesting thing is that Clang actually transformed this to use x - 1, because it's worried about overflow, but the multiplication can also overflow, because nothing is allowed to overflow, right? So if you look at the assembly, you'll see it uses some very strange instructions; in fact, inside the compiler it models it as 33-bit. - Yes. - Oh, I see, well, there's someone who really knows what he's talking about. (laughs) - Yes, and I hope MSVC 15.6 will have this too. - I don't know, we have people from MSVC here. Thank you.
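For readers of the transcript, the slide itself isn't reproduced here, but the function being discussed is, as best I can reconstruct it, a naive summation loop along these lines, which Clang rewrites into the closed-form Gauss sum with a widened multiply:

```cpp
// A reconstruction of the kind of example under discussion (not the exact
// slide). With clang -O2 the loop disappears: it becomes roughly
// (n - 1) * n / 2, computed with a widened multiply so the intermediate
// product cannot overflow even though the final result fits in an int.
int sum_to(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += i;
    return total;
}
```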
I guess on this side, hello. - Actually, I had a question about scope. Other tools I've used as a pastebin for code provide external libraries, like Boost or ranges. Any plans, or is that even within the scope of this project? - Let's take a look, wait. Just the other day, a kind soul... oh look, thanks, Twitter. Oh, go away. (laughs) Never do things live. Let's go to the live website. We now have a relatively easy way to add libraries, and I'm happy to say that Abseil is one of the newest ones we've added. - That is awesome. (applause) - Thanks to Google for letting me know about it. Oh, thanks, I guess. Hello. - Hello.
Well, you said you liked it when people use it to share code, right? Is there any way to shorten the links? - Well, the links are shortened. So, one thing: at least for the moment, I'm not storing any data at all. I think I should make this clear: I don't care about your code, my life is too complicated and short to read your code, so I don't do anything with it other than compile it and throw it away. And I don't even want to store it long term right now, until I have a privacy policy and things like that. So the URL actually has everything in it, and when we make the long URL, if I were to share from here...
You've seen it, it's, like, gigantic. So I use the Google link shortener to make it the smaller version, and it's been reliable... what's that, sorry? - It fails at some point. - Yes, it fails at 8,000 characters. So I think at some point I'll have to bite the bullet and store your data to give you the kind of features people are asking for. And we have some ideas about how to do that in a way that means your data is safe and backed up, because I don't want those URLs to ever go away; it's very important to me that people can still go
back to a Stack Overflow answer from five years ago, find an amazing answer to something, and have it still work. - Fine, thanks. - Thank you. Hello. - Regarding that optimization you were talking about, with the hash table where you have a set of fixed options for the length of the hash table... - Right, the Boost multi_index one, yes. - And then you do a switch. Well, those possible fixed options are not next to each other, they are all spread out. So you'd end up having to do a binary search, which is a bunch of comparisons, and that would slow things down. - Yes, yes.
I simplified it for presentation purposes. What I think it really does is switch on an ordinal value. Let's say you have 20 different bucket-size options; then you switch on a number between 0 and 20. And say the ninth possible bucket size is 1021. I know I described it as the 1021 case, but it would actually be: switch case 9, return hash % 1021. Does that make more sense?
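For the transcript, here's a sketch of the trick being described; this is my reconstruction of the idea, not the actual Boost.MultiIndex source, and the prime sizes are illustrative. Because the switch is over a dense range of small indices, it compiles to a jump table, and each case does a modulo by a compile-time constant, which the compiler strength-reduces to multiply-and-shift sequences instead of a slow division:

```cpp
// A reconstruction of the switch-on-bucket-count idea (not the actual
// Boost.MultiIndex source). size_index is dense (0..N), so this becomes
// a jump table; each "hash % constant" is strength-reduced by the compiler.
#include <cstddef>

std::size_t bucket_for(std::size_t hash, int size_index) {
    switch (size_index) {
        case 0: return hash % 53;
        case 1: return hash % 97;
        case 2: return hash % 193;
        // ... one case per entry in the fixed table of prime sizes ...
        case 9: return hash % 1021;
        default: return hash % 1021;  // shouldn't happen in practice
    }
}
```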
- Well, what I don't understand is that if you have a bunch of cases that are very far apart, you can't make a jump... - Right, but they wouldn't be very far apart; like I said, they would just be numbers between 0 and, let's say, 20, for the 20 possible sizes. And then, of course, it's just a jump table, and you jump to the correct piece of code. Yeah, sorry I oversimplified it; I figured otherwise I'd spend ages trying to explain it, and as you've probably realized, I'm not the best at that. I guess you got there first, yeah, go ahead. - Hello. Do you have any plans to support execution? Because I think if it supports WebAssembly, it could be supported. - That's right, yes, using WebAssembly. I mean, it's already been the case that people have asked me to put in JavaScript backend support, so I could compile to JavaScript and, well, asm.js, and then WebAssembly would be something else.
And yes, there is the ability to compile to WebAssembly, send it back to the client and say, well, you can play with it all you want, the only computer you're breaking is your own. That has some appeal, but there are many compilers here that don't have backends for WebAssembly. And so, I mean, I guess one thing I've thought about doing, I mean, if you came to my other talk, you know, I have a kind of weird passion for emulating things. And if you've ever watched any of my YouTube videos, I also love talking about microarchitecture and how the internals of processors work.
My dream job would be to sit down and write a JavaScript emulator for x86 with all the pipelines and everything, and then I could come full circle and have everything together: you could run the code locally and see how all the caches work and everything. It would be amazing. But no, for the moment I think execution will probably stay on the server side. - OK, thanks. - Thank you. Oh yes, go ahead. - I have one more... - Of course, go on. - I've just come here, you know, to a C++ conference, with my team members who work on the Linux kernel. Do you have any ideas for supporting Linux kernel code? - I would suggest taking the code locally and pointing it at your own compiler. There are no restrictions on what files you can select when running locally; it doesn't run in a Docker container when you run it locally, so you can do this.
Point it at /usr/include/linux or whatever you want, and that's what I do. When I use Compiler Explorer at work, I use it locally and pass -I with my project path, and then I can #include, you know, whatever-I'm-interested-in.h, write a small test function, and see how the assembly is generated. So I think you could probably use the same approach there. - Oh, okay, thank you very much. - You are welcome.
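As an aside for readers, the workflow just described looks something like this; the paths, header, and function names below are all hypothetical, just to illustrate the shape of a local-instance scratch file:

```cpp
// Hypothetical scratch file for a locally-running Compiler Explorer,
// with "-I/home/me/myproject/include" added to the compiler options.
#include "myproject/order_book.h"  // made-up project header

// A small test function whose generated assembly we want to inspect.
int probe(const myproject::OrderBook& book) {
    return book.best_bid();  // made-up member function
}
```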
- I appreciate that you don't want to get into this, but how much does it cost? - Oh, I don't mind telling you at all. I'm spending about $120 directly on Amazon costs, the CPU and so on. And then there are the transfer costs and other miscellaneous items on the bill, so it's around 150 for that... - 150 per...? - Per month, sorry. 150 a month. And then there's a bunch of other stuff, so it ends up being about $200 a month. And, you know, I'm hesitant about what to spend the surplus on; my backers currently give around 300, which is fantastic, so I'm net positive, although if you count... - Well, it's going to go up by 10 or so now.
It will just keep going up. - I mean, maybe. Don't make it more awkward than it already is. So yeah, if people have ideas about how to improve it that might involve larger sums of money than that, then I'm willing to add things, particularly things that are a click away. Thank you. Oh, I've lost track of who's next now, so I'll alternate. - Do you think it's possible to support intermediate representations, like LLVM IR, in... - Yes, absolutely. - I think it would help the LLVM folks, especially the people who are improving it, who like... - Can anyone remember the flag? -emit-llvm? (audience chatter) Oh well, there you go, apparently it works.
So there it is; don't clap, don't clap, I can just put command-line options in and it just works, right? It seems... (laughter) (applause) Yeah, so it's supported only in the sense that it will just show whatever the compiler gives me; I'm not being smart here. But we have something. I mean, you can see there's a lot of red here, which is not so good; we're going to implement better support for LLVM IR syntax highlighting, but you can do this now. And it would be nice to filter it too, you know. The filtering that I do has become more and more sophisticated, because I really only want to show the function that you're interested in and not all the noise that appears around it.
I don't know enough about the LLVM intermediate representation to know how to do that, but I think it's something that will come in the future. - Yes, I think it looks very good. - It also supports -E, which is preprocessing, yeah, which is how people do the horrible #include tricks; we have that here, you know, #include /etc/passwd, whatever. Yes. Does that answer your question? - Yes, yes, thank you. - Thank you. Okay, now on this side, hello. - I'll just make this comment because you mentioned it. As far as I know, there is an x86 emulator in JavaScript.
Yes, you can Google it; the one I know of is at bellard.org. And it runs a full Linux, it has a compiler, it runs everything, even reasonably fast. I don't know how, that guy is crazy. - That would be fun. Yeah, okay, thanks, that's a great idea, thanks. I think, from the looks of it, the last question. - How is the association of C++ lines to assembly lines achieved? - Oh, magic. No, I just pass -g. Whatever you type, there's always a -g in there, and luckily the assembly output has the .loc directives, the hints that tell the assembler to emit DWARF debug information, and I parse those.
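For the curious, here's roughly what that mechanism looks like; the exact directives vary by compiler and flags, so treat this as an illustrative sketch rather than exact output:

```cpp
// square.cpp -- compile with "g++ -g -O1 -S square.cpp" and the generated
// assembly listing interleaves .loc directives with the instructions,
// along the lines of (abridged and approximate):
//
//     .loc 1 11 12       # file 1, line 11, column 12
//     movl  %edi, %eax
//     imull %edi, %eax
//
// Compiler Explorer parses those .loc directives to map each assembly
// line back to the source line that produced it.
int square(int n) {
    return n * n;  // the .loc for the multiply points at this line
}
```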
I also support STABS, because there are some older compilers that emit that too, but it's not as sophisticated. And I've filed some bugs, I think, against GCC and Clang about where they decide which line corresponds to what, because there are some obvious, glaring ones where we say, come on guys, I'm sure you could attribute this to the right place. But we'll see. Okay. Oh, is that it? No, I think we're done. Thank you all, thank you very much. (applause)