
Introduction to Memory Management in Linux

May 31, 2021
Welcome. We have a lot of material, so let's get through this; it'll be quick. My name is Matt Porter, I'm with Konsulko Group, and if you're looking for Alan, I'll explain on the next slide why I'm not him. You can probably tell, because if you know Alan, his hair is out to here. So this is an introduction to memory management, and I emphasize the introduction, so if you're an experienced kernel person, you might not get as much out of this. This is the presentation that, when the first users of embedded Linux back around 2001 were coming back from working with our releases, I wish I had sat down and put together, because these are the things that everyone needed to understand to really understand their system. So, real quick.
I'd like to say a little bit about the original author, Alan. He couldn't be here, unfortunately. He's a good friend of mine and a veteran embedded Linux developer. He's a Linux architect at SoftIron; you may have heard of them, they're an ARM server company. He put together all this material. He's a fellow Linux Foundation training instructor, and he also gives the kernel internals class that contains a lot more than this material. He did a very good job on the slide deck, and I've been trusted to present it as well, so anyway, I just want to credit him for this great material. Here we go.

We're going to talk about memory management from start to finish, and it's going to be introductory, like I said. We start with physical memory. Your low-end systems have a single address and memory space. The peripherals share that same space; they're mapped to different parts of that single address space. And all the processes in the operating system, in this type of system, share the same memory space; there is no memory protection as we usually think of it. We'll get into the details of this, but you're running every process in that single address space.
Processes can trample each other because they all share what's there; you have to separate them manually, and your quote-unquote user space, your user application, can trample, say, the real-time executive that you're using to schedule. Examples of these would be an 8086, a Cortex-M, all these low-end microcontrollers, and the old no-MMU processors. So let's take a look. I know many of us are not working on x86, but it serves as a ubiquitous example. If we look at a 32-bit x86 system, there's a lot of legacy, obviously, but it's common ground: we have all these legacy areas, you have hardware mapped between RAM areas, and you can see the PCI area and physical memory-mapped I/O are all at the high end.
Well, that gives you an idea of what x86 looks like physically. Now, what's the limitation of the single address space? You have portable C programs, and each one expects to own everything, right? If you're trying to port multiple C programs into a single space, you have to set up the addresses, this one can live here and that one can live in this segment, so they don't trample each other. It's a little difficult to do that: you need special knowledge of your actual platform, you need to know what your total RAM is, and, like I say, you need to separate those processes yourself to make it all work. And there's no protection, like we said: malicious programs can stomp on all the things. So we get virtual memory, and this is where things get fun. What is it? It's a mapping, a virtual mapping, hence the name: you map a virtual address, a fake address, onto a physical address. Look back at that x86 map, which is the entire physical world; if we can think in virtual addresses, we can have any mapping we want. So we map virtual addresses onto physical RAM, but we also map virtual addresses onto hardware devices: PCI, GPU RAM, IP blocks, everything. So what's the advantage?
I described how in that flat memory model, the single address space, we have the situation where I have to tell this thing to execute at this address and this address and this address until the end of time, and keep a good memory of where everything lives; it's not portable. When we have virtual memory, one process's RAM is inaccessible to the other processes. It's also invisible, so we have built-in memory protection, and kernel RAM is not visible to user space directly. Another good thing is that memory can be moved: the same memory can be made visible to different processes, but the kernel actually has to set up a mapping for that. And instead of a single address space where you have all the memory sitting there and you have to share it manually and segment it, now you can do things like swap memory to disk, because the addresses you're dealing with are just virtual. The other thing you can do with virtual memory is map hardware, as we talked about, into the address space of your process; we need help from the kernel to do that on behalf of user space.
The other thing is that we can take RAM and map it into multiple processes. Let's dig into that: that's the shared memory case, for example a shared library, where you're mapping it into multiple processes. And finally, with virtual memory we have the ability to have read/write/execute permissions placed on those address accesses. So we have two address spaces now. We have the physical addresses; we saw that x86 physical memory map as an example, and that's where DMA peripherals and whatever else live in your physical world. And we have virtual addresses, and those are the ones our real software uses: when we get down to our machine code, whatever the architecture, our load/store instructions access memory directly, and they always use virtual addresses.
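To make that concrete, here's a small experiment of my own, not from the slides: after a fork, parent and child print the same virtual address for a variable but see different values once the child writes to it, because the same virtual address is backed by a different physical frame in each process.

    /* Sketch: one virtual address, two processes, different backing. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int value = 1;

        if (fork() == 0) {
            value = 2;  /* copy-on-write gives the child its own page frame */
            printf("child:  &value = %p, value = %d\n", (void *)&value, value);
            exit(0);
        }
        wait(NULL);
        printf("parent: &value = %p, value = %d\n", (void *)&value, value);
        return 0;
    }

Both lines typically print the same pointer, which only makes sense once you accept that the pointer is a virtual address, not a physical one.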
So to use virtual memory we have to set up a mapping, and this mapping is done in hardware: there's a piece of hardware that helps with these mappings. Once we have them set up, there's no penalty for accessing memory that way, and permissions are handled without penalty too. This is all handled in hardware for us, and we're going to talk about what that hardware is. And of course we use the same CPU instructions, the same loads and stores, whether it's RAM or a piece of peripheral I/O. In normal operation, you're always using virtual addresses. So what does this magic for us? The memory management unit. An MMU sits between the CPU core and the memory. On modern architectures it's part of the physical CPU itself; if you like retro things, you'll find that MMUs used to be sold as a separate discrete part, interconnected as its own piece of the architecture.
One thing to keep in mind is that the RAM controller is a separate piece: you have the MMU, and the DDR controller will be a separate, though tightly coupled, IP block. So what does an MMU do? It just does that magic transparently: it handles the translation of those load and store instructions into physical addresses. So it maps memory accesses onto our system's RAM, that physical address space we talked about, and the same with peripheral hardware; from its point of view there's no difference. It handles permissions, since as we said we have permissions with virtual memory, and if there's an invalid access to something it will raise an exception, and with that exception we can do some interesting things; we'll talk about that in a moment. How does an MMU work? There's an important piece of the MMU called the TLB, the translation lookaside buffer. That's just a hardware buffer which holds a set of mappings, your virtual-to-physical mappings, along with the permissions for each range.
There's a granularity at which these mappings are maintained, and we'll talk about that in a moment. The interesting thing is that TLB design is very architecture specific and very sensitive to the performance of specific parts, so you'll see wide variation in how TLBs are designed, how the mappings are installed, whether that's done by software or hardware, that kind of thing, and also the capacity, how many slots they have. So here's a quick little diagram of what a system looks like, if you're having trouble visualizing where it sits: you'll see the MMU between the memory controller on the right and the CPU, and you'll see that TLB on the side with some entries.
With that TLB, when a virtual address is accessed, the MMU takes a look at the buffer: is there already a mapping in there? It looks it up, and if it doesn't find one, it generates a page fault, trapping the CPU. Now, if the address is in the TLB, but say you're doing a write access and the entry is only configured for read permission, it will also raise an exception, and that will come back into play as we see how we use these things in Linux. So in Linux, a page fault is a CPU exception raised when you access an invalid virtual address, invalid meaning it's not in the TLB. And you have three cases. The first: the virtual address is simply not allocated for the process requesting it.
The second: you don't have the right permissions. And the third: it's a valid virtual address, but it's currently swapped out, which is a software condition. We'll dive into each of them.
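As a hedged illustration of the permissions case, my example rather than the slides': map a page read-only, write to it, and the MMU exception comes back to the process as SIGSEGV.

    /* Sketch: the permissions page-fault case, seen from user space. */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    static void on_segv(int sig)
    {
        (void)sig;
        _Exit(42);              /* async-signal-safe exit from the handler */
    }

    int main(void)
    {
        char *p = mmap(NULL, 4096, PROT_READ,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        signal(SIGSEGV, on_segv);
        p[0] = 'x';             /* write to a read-only page: fault */
        puts("never reached");
        return 0;
    }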
But first we're going to get into the virtual memory side of the kernel. We use virtual addresses in both the kernel and user space, but the way we use them, the way things get allocated, is a little different. In the kernel we have a split in how we treat our virtual addresses: the top of the virtual memory map is for the kernel, and the bottom is for user space. Usually when we teach people about this, 64-bit addresses are harder to think with, so we go back to 32 bits, where the default place the split between user space and kernel space sits is what we affectionately call C bazillion, 0xC0000000, at that 3 gig location. So this is what it looks like: you saw that enormously complex physical memory map of a 32-bit x86 architecture, and lo and behold, here's the virtual memory map. I have 3 gigs for user space, and CONFIG_PAGE_OFFSET controls where that split is set. Each process gets its own 3 GB on that system; it has that full view. Remember, going back to that single address space: if you had multiple processes, you had to link them at all these different places and manage your processes very manually. In this world, when we link applications, they all end up in the same place. And the kernel only has this one gig in our 32-bit case.
As we said, CONFIG_PAGE_OFFSET controls that on many architectures, so if you have specific needs you can play around with it, which sometimes happens in embedded things. In 64-bit we don't have this situation; there's essentially never a need to do that. On arm64 we're up at 8 bazillion, and on x86-64 the split is at a different location, but given the RAM sizes involved, it's effectively unimportant there, whereas on a 32-bit system, where that page offset sits has an effect on how we handle large-memory systems, which we'll talk about in a moment. So, there are three types of virtual addresses in Linux, and LDD3 defines these about as well as anyone; you can look at that. The way we define them, and some people historically use some different terminology: on the kernel side we have kernel logical addresses and kernel virtual addresses, and then we have user space virtual addresses. There are other special cases that most people don't talk about in exactly that way, physical addresses and bus addresses, but you can look at LDD3, the link is in there, for a little more information. So, kernel logical addresses: that's what people consider the normal kernel address space, what you're normally dealing with; what you get from kmalloc is a logical address. Logical addresses sit at a fixed offset from their physical addresses, and when you see a magic number there, that's the PAGE_OFFSET value, so the translation is trivial. Now, that physical address layout is architecture specific and could be completely different from machine to machine.
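To see that fixed-offset property from the kernel side, here's a minimal module sketch of mine, assuming a platform where virt_to_phys() is valid for logical addresses, which is exactly what makes them logical; this is an illustration, not code from the talk.

    /* Sketch: a kmalloc'd buffer is a kernel logical address, sitting
     * a constant offset (PAGE_OFFSET) away from its physical address. */
    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/slab.h>
    #include <asm/io.h>

    static void *buf;

    static int __init logical_init(void)
    {
        buf = kmalloc(4096, GFP_KERNEL);
        if (!buf)
            return -ENOMEM;
        pr_info("logical %px -> physical 0x%llx\n",
                buf, (unsigned long long)virt_to_phys(buf));
        return 0;
    }

    static void __exit logical_exit(void)
    {
        kfree(buf);
    }

    module_init(logical_init);
    module_exit(logical_exit);
    MODULE_LICENSE("GPL");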
Now, shared memory: the virtual addresses in two processes can be completely different, but they're touching the same page frame when they're context switched in. How does it look? We have the shared physical frame down there in green. One process has this virtual address mapped, touching that shared frame, a 4K shared memory region, and a completely different virtual mapping in the other process points to the same frame. Boom, we have shared memory. Now, that was a case with different virtual addresses; with the mmap system call, which you may be familiar with, you can also ask for a specific address at which to share the memory, so that's a different case, and it can fail.
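Here's a minimal sketch of that picture, using an anonymous shared mapping (my code, not the slide's): the child writes through its own mapping and the parent reads the value back through its own, because both virtual mappings point at the same page frame.

    /* Sketch: two processes, two virtual mappings, one shared 4K frame. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int *frame = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (frame == MAP_FAILED) { perror("mmap"); return 1; }

        if (fork() == 0) {
            frame[0] = 42;          /* child writes the shared frame */
            exit(0);
        }
        wait(NULL);
        printf("parent reads %d\n", frame[0]);   /* prints 42 */
        return 0;
    }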
Okay, let's talk about lazy allocation. One of the things you'll notice when you work on a Linux or classic Unix system is that the kernel is not going to allocate memory immediately. Yes, you saw your call return successfully, and you have the virtual memory, but it didn't actually allocate the physical memory, the page frames that back it. That's what we call lazy allocation, and it's an optimization: the kernel will wait until you actually need to use that memory. If you're allocating a 4 megabyte chunk of memory for your database and you haven't touched any of it yet, it really hasn't allocated anything for it; if you never use it, never touch it, it never allocates anything. So how does this work? When we request the memory, the kernel just creates a record of the request in the page tables (we'll talk about page tables in a moment), returns to the process, and you have that virtual memory reserved in the user space process. Once we touch it, our old friend the page fault comes into play. We already learned that we're going to get an exception, because there's no mapping there or it's only configured with read permission, so we go into the page fault handler. The kernel uses the page tables to see that the mapping is valid, in this case a virtual address range set up with lazy allocation but not yet present in the TLB. At that point it allocates a page frame, or a series of them, whatever satisfies the request, updates the TLB with that mapping (how that happens is architecture specific, of course), returns from the exception handler, and the user space program continues.
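You can watch lazy allocation happen with a rough experiment like this, mine rather than the talk's, borrowing the 4 MB figure from the database example: the minor page fault counter barely moves at malloc time and jumps when the pages are first touched.

    /* Sketch: malloc reserves virtual memory; faults happen on first touch. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/resource.h>

    static long minor_faults(void)
    {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
    }

    int main(void)
    {
        long before = minor_faults();
        char *buf = malloc(4 << 20);                  /* 4 MB */
        printf("after malloc: +%ld minor faults\n", minor_faults() - before);

        before = minor_faults();
        memset(buf, 0, 4 << 20);                      /* first touch */
        printf("after touch:  +%ld minor faults\n", minor_faults() - before);
        free(buf);
        return 0;
    }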
So your malloc got you that virtual address space and returned quickly, but when you went to touch the memory, all of this happened behind the scenes the first time you dereferenced that pointer and stored a value. You're not aware of it, but you'll see it if you're running a benchmark and notice that appreciable lag, and you can use trace tools and watch it happening. The other thing, if you have something time sensitive, a fast path: you can go pre-fault that memory using mlock or the mlock family of calls, which will go ahead and preallocate these things so you don't have the lazy allocation situation.
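A sketch of that pre-faulting idea, assuming your RLIMIT_MEMLOCK allows it (again my code, not the slides'):

    /* Sketch: pay the page-fault cost up front with mlock. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1 << 20;
        char *buf = malloc(len);
        if (!buf) return 1;

        /* mlock faults the frames in now and pins them against swapping */
        if (mlock(buf, len) != 0) {
            perror("mlock");        /* often fails if RLIMIT_MEMLOCK is low */
            return 1;
        }
        /* ... time-sensitive fast path touches buf deterministically ... */
        munlock(buf, len);
        free(buf);
        return 0;
    }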
So, as we said, getting into the page tables: TLB entries are a limited resource. We can't just map the entire world of our address space in there, so we have many more mappings in that mm structure for our process than we have TLB entries, and the kernel has to track all of that. So you have a set of data structures we call page tables. You can look at mm_struct and vm_area_struct to see how they're done, but it's essentially a hierarchy that walks you down to the entry for a 4K page: the mapping to a page frame number and the permissions, everything that lines up with what needs to be loaded into the TLB, plus metadata such as whether the entry is valid and some other housekeeping flags. So when we have something in the page table (in teal on the slide), we have a valid mapping, and when the hardware touches it and there's nothing in the TLB yet, that generates the page fault; the CPU, the MMU, has no knowledge of it, only the kernel does. So our page fault handler runs.
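You can't read the page tables themselves from user space, but you can see the kernel's list of your mappings, those vm_area_structs, through /proc. A tiny sketch of mine:

    /* Sketch: dump this process's virtual memory areas. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/self/maps", "r");
        char line[512];

        if (!f) return 1;
        while (fgets(line, sizeof line, f))
            fputs(line, stdout);   /* start-end perms offset dev inode path */
        fclose(f);
        return 0;
    }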
It will walk through these page tables, find the mapping for that virtual address at the right page granularity, select and evict an existing TLB entry, create a new one with our address and the correct permissions and so on, and return to the user space process. Okay: swapping. We're used to this on our desktop and development systems, where we swap to disk a lot when heavy builds run out of RAM, and the MMU is what enables it. You're going to run out of that 16 GB of RAM on these heavy builds; a process gets context switched in and needs more memory, and the kernel will take page frames that were backing something, take their contents, and send them out to your storage. Then, when that data is needed again after a context switch, it reads it back from that slow storage. That's the big picture; the very low-level detail is that it's done frame by frame. It will copy a frame to disk and remove the TLB entry, and then that frame is free to back something else for another process. When we need it again, the CPU raises a page fault, a common theme here, since we removed that entry from the TLB, and in the page fault handler we copy that frame from disk into an unused frame, update the page table entry, and wake the process back up. It's going to be a slow process: we have to go out to block I/O, so we're limited by that bandwidth. Now, when we restore the page into RAM,
we're not necessarily getting the same page frame, so again we have this virtual indirection at work. There's no persistence of, or affinity to, the original physical page frame, so you have to get rid of the notion that physical addresses matter. You will use the same virtual address, though; that's true because those mappings stay the same in user space, and you can't tell the difference. So your code runs while it has the processor, gets swapped and context switched back in, the mapping for the same virtual address is remade, and your code continues at the same virtual address but with completely different backing, as the page frame contents are copied back in and mapped again. This is the low-level
detail of why we said we can't use user space virtual addresses for DMA: we don't have the persistence of physical backing that DMA engines and peripheral hardware need. So what does this look like visually? We have a frame that was selected by the kernel to be swapped out to disk, and we have this wonderful icon that looks like a cylindrical disk here. We copy that frame to the swap medium and invalidate the TLB entry and the page table entry, so it's no longer valid; there's no entry there, the frame is freed back into the allocator pool, but the data persists on disk in your swap partition. Now we get context switched back in, go back to running the same process, try to access the same virtual address we were at when our CPU was so abruptly taken away from us, and we get the page fault. We've been through the page fault dance before, so we just go ahead: we copy from the swap, this simple cylindrical disk, back into the page frame we were assigned and create the TLB entry (I thought it had one more animation, yes), and then we return to user space. Now we can access that virtual address, and we have the same data we had before we were switched out. Okay, I'm actually running on time, so I win; that's 95 slides. So, user space. We've gone through that whole stack, all the major pieces of how everything happens in the background; now let's look at how all of this maps onto the APIs we have in user space. We have several ways to allocate memory. We have the whole malloc family of calls, which I've referenced verbally a couple of times. We can mmap to map pages directly; we often see that used to map some peripheral I/O if we're hacking without making proper kernel drivers.
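For completeness, here's roughly what that no-driver hack looks like. The base address below is a made-up placeholder, it needs root, and many kernels lock /dev/mem down with CONFIG_STRICT_DEVMEM, so treat this strictly as a sketch.

    /* Sketch: map a (hypothetical) peripheral's registers via /dev/mem. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define PERIPH_BASE 0x40000000UL   /* hypothetical device base address */

    int main(void)
    {
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, PERIPH_BASE);
        if (regs == MAP_FAILED) { perror("mmap"); return 1; }

        printf("register 0 reads 0x%08x\n", (unsigned)regs[0]);
        munmap((void *)regs, 4096);
        close(fd);
        return 0;
    }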
We have brk and sbrk, where we can modify the heap size. So first, mmap: mmap is the way we allocate a chunk of user space memory. If you live in the world of running strace on things, you'll see a lot of mmap calls happening when files are opened and so on. If you use MAP_ANONYMOUS, you get a normal memory allocation, and the MAP_SHARED flag allows us to share that memory with other processes. Then brk. Why is it called brk? It sets the program break, the top of the heap; legacy terminology, and effectively you increase the heap size with it, as we said. Now, going back to our whole lazy allocation technique: brk is implemented very much like mmap, so it modifies the page tables, as we talked about, and then waits for a page fault. And the other thing you can do, which we talked about with mlock, is avoid the case where accessing the memory has this relatively long delay while the kernel actually allocates that large number of page frames: you can take that cost up front with mlock and then have relatively deterministic behavior once you're actually accessing the memory. The malloc and calloc implementations are the same underneath: they're going to use brk or mmap depending on how big the allocation is, and that happens behind the scenes. If you're clever, you can modify that behavior with mallopt: you can set the threshold parameter that says where one or the other kicks in, and that's often used in system tuning. Okay, and then finally, the stack. If a process steps just beyond its stack, the CPU will also raise a page fault, and one of the special things the page fault handler does in this case is detect that you have an address just past the stack. It knows where the stack is, so it can allocate a new page: assign another PFN, set up the page table mapping, drop it into the TLB, and remember that PFN could be anywhere, it's not physically contiguous, only virtually contiguous. Execution continues, and it knows to place things in that stack segment. You can see how this works in do_page_fault; that's the arm version on the slide.
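That threshold knob is glibc-specific; a hedged sketch of using it:

    /* Sketch: steer malloc between the brk heap and mmap
     * with glibc's M_MMAP_THRESHOLD tunable. */
    #include <malloc.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* ask glibc to satisfy allocations >= 64 KB with mmap */
        if (mallopt(M_MMAP_THRESHOLD, 64 * 1024) != 1)
            fprintf(stderr, "mallopt failed or unsupported\n");

        char *small = malloc(4096);      /* likely from the brk heap */
        char *large = malloc(1 << 20);   /* likely its own mmap region */
        printf("small %p, large %p\n", (void *)small, (void *)large);
        free(small);
        free(large);
        return 0;
    }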
A quick summary; like I said, it's an introduction, so if you're already a seasoned kernel person you probably know all of that. We went over physical memory; we looked at the stock x86 family memory map. We talked about virtual memory: the three types of addresses, kernel logical, kernel virtual, and user space virtual, which ones are physically contiguous or not, and that we use kernel logical addresses for DMA. We went over user space addressing, how processes will not have contiguous physical memory, and how page faults work to do lazy allocation and so on. Like I say, we covered swapping, and then how those user space APIs map onto all of that. So that's it for the introduction. I have a minute for questions. Yes, in the back. Okay, so, the first part of the question, let me address that. The question was: doesn't the kernel always have the right mappings, that logical mapping, so why do we have to wait for this expensive mapping to user space? To answer that, and I hope I'm answering the right question: if we just used those logical mappings, it would be like that one-address-space system without an MMU. I can tell you there were systems in the '90s that had MMUs, running RTOSes like VxWorks, that mapped only a flat address space with the MMU, because they had to have the MMU enabled for performance reasons. But then you don't have your own process address space; you'd have to link everything at its own addresses. The kernel's logical addresses are nice and linear and easy to think about, but you have to do these remappings so that user space gets that nice world we enjoy, that protected per-process address space, where you just write a program, link it, and it runs in any context. If we only had the kernel logical mapping, it would be just that one address space: you'd have to link your programs at zero and one bazillion and two bazillion and manage them without trampling each other while allocating memory.
I hope that answers the first part. I'm out of time, but we can talk about the second part afterwards, yeah, sorry. 95 slides, then.
