Steven Rostedt - Learning the Linux Kernel with tracing

Jun 03, 2021
So Steven is a Linux kernel developer, and we rarely have people like him here in Bulgaria. We really hoped to have him at this year's OpenFest, but unfortunately he couldn't stay, so the best we could do was invite him to Sofia University to show you some love for the Linux kernel and try to get you excited about it. We discussed what he should talk about here, and for me this talk is the best one he can show you to help you understand why you should join in.

OK, let's make sure this is working. Yes? OK, so let's do it. What I always do first: I have this camera. How many of you know what this is? This is something old, and this is the way you take real selfies. Smile! Perfect. I'll upload it.

As Marina said, I'm Steve Rostedt. I'm one of the Linux kernel developers. I first played with Linux in 1996, probably before some of you were born. In '98 I started playing with the Linux kernel as part of my master's; in fact, I modified the Linux kernel for my master's thesis. In 2001 I got my first job working professionally on the Linux kernel, porting the real-time Linux kernel to various board support packages, you know, different architectures like MIPS and PowerPC, fun stuff like that. Then I became a contractor, and then I got hired by Red Hat.

I'm one of the original developers of the real-time patch that turns Linux into a real-time operating system, so I've been around for a long time. But that's not what I'm here to talk about. Basically I'm here to introduce you to the kernel via tracing, or ftrace. How many people have played with the Linux kernel? OK, so we have some good people here. Some of this I'll break down for people who haven't done much. How many people have programmed? I see most people, fine.
So first I'm going to go over a computer overview. One thing I try to tell people when they get into the lower-level aspects of programming, not just web development or app development but actually coming down to the core: a computer is nothing more than a Turing machine. Anyone who has taken the classes knows the Turing machine: you have an infinite tape and a state machine. Everything a Turing machine can do, a computer can do, and vice versa, so they are the same. I like to tell people that a computer is extremely simple. It just takes simple commands: add, subtract, shift, compare. It will jump to a different location, and it will move that tape, which is what we usually call memory, to something else.
So everyone is used to the green part at the top, the application, which is where you might be programming. You might do something like printf, which calls a library function that does some magic for you, or you open a file, read the file, write the file. The libraries do some of the work for you; they make it easier so you don't have to interact directly with the kernel. But you can bypass the library and talk directly to the kernel, and you can even bypass the kernel and talk directly to the hardware, but that goes beyond the scope. So basically what a lot of people are used to is talking to the library, but I'm not going to talk about that today; that's not this talk. If you want that, gdb will show you your interface into the libraries; everyone uses debuggers like gdb. I'm more interested in talking to the kernel. As I said, I don't want to talk about the hardware either; that's beyond the scope. So, everyone is familiar with this program.
I hope so; it is the most famous program in the world. It's my favorite. I use it all the time, and it's especially useful for talks like this. And GCC, you know, the GNU C compiler, or the GNU Compiler Collection, I forget the exact acronym. You compile your hello.c, which is what I call hello world in C, and when you run it, it says hello world. Very obvious. How many people have done this? OK, a lot of people. How many people haven't? Basically I'm speaking to you.
This is what you see when you do an objdump. The compiler creates what's called an ELF file, Executable and Linkable Format. That's the way it's written; it's just a binary file written to disk. When you run it, from your bash command line or however it is run, you go into the kernel. The kernel reads the header of the file, knows how to parse ELF files, loads parts of the file into several segments of memory, and then jumps to a location. In fact there is a start location that it jumps to; if you run readelf, which is another application we'll actually look at, it will tell you where the starting point is.
But look, there is a lot of code before that start, and it is added by GCC. There is no hello world there. Before I go to the next page, there is the _start function, and there is a call right there that goes into libc. So _start is where your program actually starts. It jumps into libc, which does even more magical things, calls these other functions, and then jumps to main. So let's look for where main is. Here's more, this is the next page; if you're using less and you hit space, you see the next page. This is what you see: a lot more garbage that you don't care about. And finally this is the last page. It's only three pages; hello world is these three pages when we do an objdump. Up here at the top, that's your main program. These are the addresses, this is the machine code, and this is the assembler. So you have offsets, machine code and assembly code.
Now, I often talk about machine code and people get scared. In fact, sometimes I have to know machine code for some of the various things I do in the kernel. That's because I do some weird stuff in the kernel that most kernel developers don't do, so I'm a bit weird among kernel developers. So I need to look at the machine code. In fact, I have to know things like e8 is a function call: when you call a function, it uses opcode e8, which is very important for what I do, and you see it right there.
It's e8 followed by a four-byte offset. That's the machine code for this, so this is what the machine will actually read. That's hexadecimal; I didn't have enough space on my slides to convert it to binary. Did you notice something here? puts. puts is another call like printf, and you see puts, but we didn't write puts, we wrote printf. So why is puts there? Now, I'm not building with optimization turned on, but GCC will do this optimization for you automatically. It looks and says, "hey, there's just a format string with no parameters, there's nothing to parse." It knows this at compile time and says, "hey, you're just printing a string, so why call printf? This is really a puts."
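
The "e8 followed by a four-byte offset" encoding can be sketched in a few lines of Python. This is only an illustration: the helper name `decode_call_rel32` and the sample bytes and address are made up for the example and are not taken from the slides. The displacement is a signed little-endian 32-bit value, counted from the end of the 5-byte instruction.

```python
import struct


def decode_call_rel32(code: bytes, address: int) -> int:
    """Return the target of an x86-64 `call rel32` (opcode e8) instruction.

    `code` holds the instruction bytes and `address` is where the
    instruction itself is located.
    """
    if len(code) < 5 or code[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    # The four bytes after e8 are a signed little-endian displacement.
    (displacement,) = struct.unpack("<i", code[1:5])
    # The displacement is relative to the next instruction (5 bytes later).
    return address + 5 + displacement


# Hypothetical instruction bytes at a hypothetical address.
print(hex(decode_call_rel32(bytes.fromhex("e8e7feffff"), 0x1144)))
```

A negative displacement, as in the sample bytes, means the call jumps backwards in the file, which is typical for a call from main into an earlier stub.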
All I'm doing is writing whatever is in that string, which was hello world, and the compiler is not going to spend computer cycles reading and parsing the format to find parameters to inject, because that's expensive. puts has no logic; it just writes whatever you give it. So GCC optimized it. But I want printf. I don't want puts, I want printf, so I go back to my program and this time I add a parameter. Now I'm going to do something that's not normal. Instead of just passing any old parameter, a variable or something like that, which is boring, you could pass an int or a string, I want to do something different: I want to see at which address this program runs.
Now, what we had was this. I compiled it, did an objdump, and now I got this. I got my printf. So printf is here. It puts the first parameter in RDI; that's the string, and the string is at an offset from the instruction pointer. Then RSI, the register RSI, you have to remember, is the second parameter, so this must be the address of main. In fact it is minus 0xb from the IP, so this is RIP minus 0xb, which is 11; minus 11 brings you back 11 bytes to main. That's main, which is why it goes in RSI, the second argument.
We will return to that later. RDI is the first argument, which is the string, and then it calls printf. So let's look at main here: 1135, remember that. So what do we expect when we compile it and run our hello with main? We'd expect 1135, right? It makes sense, it's obvious, that's the address. This is what I got. You know, on older kernels I would have gotten 1135, but on this kernel I got this crazy mess. The funniest part is that when I ran it multiple times, I got a different result every time. This confused me when I wrote these slides yesterday, so I ran it again.
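
The "RIP minus 0xb" arithmetic can be checked with a small sketch. The displacement (-0xb) and the address of main (1135) come from the talk; the instruction address 0x1139 and its 7-byte length are assumptions about a typical function prologue, not values read off the slide. On x86-64 a RIP-relative operand is computed from the address of the next instruction.

```python
def rip_relative_target(instr_address: int, instr_length: int, displacement: int) -> int:
    """Resolve a RIP-relative operand.

    RIP points at the *next* instruction while the current one executes,
    so the base is the instruction address plus its length.
    """
    return instr_address + instr_length + displacement


# Assumed layout: main starts at 0x1135, a 1-byte push and a 3-byte mov
# come first, so the 7-byte lea would sit at 0x1139 (hypothetical).
main_address = rip_relative_target(0x1139, 7, -0xB)
print(hex(main_address))
```

Under those assumptions the result lands exactly on 0x1135, the start of main, which is the value the program then passes to printf as its second argument.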
I had to go see why I got a different value every time I did this. You will too, as long as you have a new kernel; I ran this on kernel 4.19, by the way. And I looked at this, and one thing I realized is that the ends of all these are the same: 135, the same every time. So I looked at the code and thought, this is a security feature. And I found out that you can echo 0 into a control file in the kernel, /proc/sys/kernel/randomize_va_space. The kernel randomizes the VA space, the virtual address space. By the way, don't do this; it is a security feature.
Basically, every time you load something into the kernel, like when you run something, it places it randomly in the virtual address space, because ELF files are built so they can run anywhere. That's why you had the offset from the instruction pointer and not just a hardcoded address for those parameters, the string and the values. So when I ran it again, it still printed some strange number, but it was exactly the same every time. So I'm sure you have seen these things: this is what a page table looks like inside the kernel.
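
Why the last three hex digits stay at 135 can be shown with a toy model. This is not how the kernel picks addresses; it only assumes that the random load base is a multiple of the 4096-byte page size, so the low 12 bits of any address inside the program are untouched. The function name and the range of page numbers are invented for the illustration.

```python
import random

PAGE_SIZE = 4096  # 4 KiB pages, so the low 12 bits are the in-page offset


def randomized_address(link_offset: int, rng: random.Random) -> int:
    """Toy model: pick a random page-aligned base and add the link-time offset."""
    base = rng.randrange(1 << 20, 1 << 35) * PAGE_SIZE  # hypothetical range
    return base + link_offset


rng = random.Random()
for _ in range(3):
    # 0x1135 is the offset of main from the objdump output in the talk.
    print(hex(randomized_address(0x1135, rng)))
```

Every printed value differs in its upper digits but ends in 135, matching what the talk observed across runs.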
If you ever do something inside the kernel, you have to be very aware of this. This is how the virtual address space is mapped to the physical address space. The physical address space is the one here. I didn't have much space on the slide, so I drew this as 32-bit even though it's 64-bit; pretend it's 64-bit, it was just easier to fit on the slide. This is the entire physical address space, so it's basically the space where you can tell the CPU to look at an address and there's something there. At boot, the BIOS actually assigns things; it maps memory to certain locations, so at 0 there may not even be memory. That's why, with a null pointer, a lot of the time zero isn't mapped to anything, so if you go there it will fault and cause a crash.
So this is how we do the virtual address space. I took that address, the 5 6 4 one, put it here, and divided it into four. I split it up by color because each part is 9 bits, so I had to translate the hexadecimal to binary. These are 9 bits, 9 bits, 9 bits, 9 bits, followed by 12 bits. Each one is an index; 9 bits gives 512 entries. The first one indexes the page global directory.
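
The 9/9/9/9/12 split described here is just shifting and masking. A minimal sketch, assuming the usual four-level x86-64 layout; the short names pgd, pud, pmd and pte follow Linux naming for the four table levels, and the sample address is constructed for the example rather than taken from the slide.

```python
def split_virtual_address(vaddr: int):
    """Split a 48-bit virtual address into four 9-bit indices and a 12-bit offset."""
    offset = vaddr & 0xFFF        # bits 0-11: byte offset inside the 4 KiB page
    pte = (vaddr >> 12) & 0x1FF   # bits 12-20: index into the page table
    pmd = (vaddr >> 21) & 0x1FF   # bits 21-29: index into the page middle directory
    pud = (vaddr >> 30) & 0x1FF   # bits 30-38: index into the page upper directory
    pgd = (vaddr >> 39) & 0x1FF   # bits 39-47: index into the page global directory
    return pgd, pud, pmd, pte, offset


# Build a hypothetical address from known indices so the split is easy to check.
vaddr = (0xAC << 39) | (0x07 << 30) | (0x1A << 21) | (0x03 << 12) | 0x135
print(hex(vaddr), [hex(part) for part in split_virtual_address(vaddr)])
```

Each 9-bit index selects one of 512 entries in its table, and the 12-bit offset selects a byte within the final 4096-byte page.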
By the way, you don't need to worry about this. I'm just telling you things so that you're aware of them. This is not a lecture, there is no test, there is no exam, and I don't expect you to remember all of it. It's simply to spark your interest. If you want to know more, all you have to do is search the web. So I'm just showing you an overview.
This is the page global directory. This is basically the start of the virtual address space for your application, and on x86 it is kept in the CR3 register. There is a special register in the CPU called CR3; when you load something there, the CPU goes and looks it up, and it is a physical address. So what I put in the CR3 register is actually mapped somewhere here, the CPU jumps there and expects this format, and this is a table. If I access a virtual address, it takes the first nine bits; below is the hexadecimal of the first nine bits, which is AC. So it goes to index AC down here, reads the physical address stored there, which is somewhere here, jumps to it, and expects another table. The page upper directory does the same thing: index 007, jump down here, look it up. Then the page middle directory, jump again, and I get to the page table entries, and from the last one it jumps to the real physical page where the address finally maps.
So that's just an overview. You don't really have to worry about this, but know that what you access in the virtual address space won't always map to the same place in the physical address space. In fact, it may not map to anything, and when you access it, that sends a fault to the kernel. The kernel will say, oh, this memory is actually in your swap partition, and it will read the swap partition, load it into real memory, populate the page table entry, return to the application, and you just carry on happily. That's how swap works, if you ever wondered about your swap partition: the page has just been unmapped. All the kernel needs to do is clear the entry here, or set a flag there saying this page is not present, so the access faults, and the kernel says, OK, I remember where I put it, and it might put it back somewhere else. So you can move this stuff wherever you want as long as it's mapped correctly here.
If you have two applications, the same virtual address can map to two different locations. Take the same address in application A and application B: in application A it points here, and in application B that same address points there. Two different locations, same virtual address.
As for the address space, again I'm still using 32 bits because it's easier to draw. From zero up to some number is your user space, and anything greater than that is your kernel space. Your application lives in user space, but the question is what is in the kernel, and that is what we want to find out. User space and kernel space have a special set of flags. In the x86 world there are ring 0, ring 1, ring 2 and ring 3; I think rings 1 and 2 are not used. Ring 3 is user space and ring 0 is kernel space. It's a mode, basically just bits in the CPU, and when you are not in ring 0 you don't have access to anything in the page tables that is marked ring 0 only. If you try to access one of these ring 0 pages, it will kill your process. Although with Spectre and Meltdown you can get around that, but that's another lecture. Anyway, in user space you have your applications, and yes, I know, I put Word and Outlook.
It was the first thing that came up in Google image search and I didn't have much time; I wrote this yesterday. So they each have their own virtual address space, their own page tables, and they don't have access to the kernel. Then say they share a library, like libc, a dynamic shared object. That is one file that is mapped somewhere into the virtual address space of each of these applications. And when these applications need to do something special, like read a file, read from the network, send a packet, show something on the screen, handle interrupts or read your keyboard, they need to talk to the kernel, because the kernel is the service provider for whatever the applications want. They do it via system calls, so everyone here should be familiar with system calls.
System calls are the way to get into the kernel to accomplish something the application normally can't, because it has to access something else: open a file, use the network, access files. And it's an API, an application programming interface, so it doesn't change. How many people have seen or heard of Linus Torvalds yelling at someone? He's a very nice guy now. But one of the things that most easily gets you yelled at is breaking applications. You don't break user space. If you modify one of these system calls, you will break user space. Although, if you break user space and no one notices, did you break it?
The answer is no. Linus actually said that: if no one is complaining, you can change the user-space interface. We've had some things that changed, no one noticed, and nothing broke. So how many are familiar with strace? OK, I should probably ask how many people are unfamiliar with strace. You know, you're too lazy to raise your hand, or you don't want to embarrass yourself or something. Anyway, strace is the system call tracer, and it is a very, very useful tool. It is very slow because it uses ptrace. ptrace is a facility in the kernel for attaching to other applications: you become the application's parent and you can monitor it, continue it, stop it, and single-step it. gdb, your debugger, uses ptrace to stop an application. Everything you do in gdb, where you can read memory, change memory, step through something or run up to a certain point and stop, is all handled by ptrace.
It's a horrible interface. We'd rather never use it, and we are trying to find other ways so we can get rid of ptrace; in fact we are trying to work with perf. If you go to Linux Plumbers, we will have a session in a couple of weeks on getting strace to use the perf infrastructure so that it doesn't need ptrace and can be much faster. Although, they said, the percentage by which strace slows things down compared with not using strace has actually become much smaller. The time to run an application normally and the time to run it under strace are now much closer, and strace never changed. What happened was that the fixes for Meltdown and Spectre slowed down the interface to the kernel a lot, but that didn't affect tracing as much. So, hey, we went from being 30% slower to just 8% slower, without realizing that everything had become much slower.
So I ran strace on this, and it's actually a full screen. This is every system call made by hello world. Hello world does a lot, right? A lot happens when you run hello world. At the beginning, this brk is where it sets up the memory address space for you, and then it goes to the dynamic linker to load things. It looks for a library, and these things don't even exist, so it gets an error back. Oh, we found something: open it, check its status with fstat, map it into memory, OK, close it. This next one, I have no idea what that is. I looked at it in the code; it's some little thing, it's just information. I wondered: what the hell is this?
Anyway, never mind. mprotect protects memory for security reasons. Then fstat, brk, and boom, there is our hello world. We finally got to where we actually write hello world. You know, if I wrote hello world in assembler it would probably be five lines. If I just wrote it with a system call, I wouldn't need all that. You'd still get some of it, like the brk, because I think that's part of what happens just loading the program, but it could probably be done a lot faster.
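
The idea of skipping the library layer and going straight to the write system call can be approximated in Python. This is only a sketch of the final step: `os.write` is a thin wrapper around write(2) that bypasses Python's buffered I/O, but the interpreter itself still makes many system calls while starting up, so it does not reproduce the five-line assembler version the talk has in mind. The function name `hello` is made up for the example.

```python
import os


def hello(fd: int = 1) -> int:
    """Write the greeting to a file descriptor with a single write(2) call.

    File descriptor 1 is standard output. Returns the number of bytes written.
    """
    return os.write(fd, b"hello world\n")


hello()
```

Running the script under strace should show this as one write on file descriptor 1 near the end of the trace, after all the loader and interpreter activity.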
I didn't have time to try it out. So that's strace. What about ftrace? This is the official tracer of the Linux kernel. It's what I developed, I maintain it, this is kind of what I do, and I constantly improve it. The way to access it is here, and it's probably on any laptop or any box you have that runs Linux. I can almost guarantee it. It's been there since 2008 or 2009, and I think that's about when distributions started shipping it, so it's been almost ten years, nine years, and pretty much everyone has it. It actually lives in the kernel debug directory, /sys/kernel/debug/tracing.
I moved it to /sys/kernel/tracing, but since I can't break user space, when you mount debugfs it still mounts tracefs inside it. And you can use this if you ever get into embedded programming and want to use BusyBox. Does anyone know what BusyBox is? OK, if not: it's basically a bare-minimum user-space environment. It's very, very small, about as small as hello world, as you can see, and you don't have much. There are no libraries, everything is linked into one big blob, and everything you do, ls and all that, is just one binary. You can use ftrace with just BusyBox. That's why I wrote it this way: I was an embedded developer.
That was a long time ago, but I've always had a soft spot for embedded development, so I made it so you could do everything using echo and cat. So to get to the ftrace control directory, if you need to mount it, it's this command. I don't remember it by heart, but the slides will be available. You just mount with -t tracefs on /sys/kernel/tracing. /sys/kernel/tracing is simply a pseudo-directory that the kernel creates for you. If ftrace is enabled, you will see that directory, so just look there to see whether ftrace is enabled; newer kernels create the tracing directory there. So I mount it and just do ls, and here are all the things you can do to trace.
I highlighted some of them. Now, one that is really special is the README. I hope you know who Greg Kroah-Hartman is, the stable maintainer. He was very impressed with ftrace because we are the only ones who created a file system with a README file. Thank you. The funniest thing in that file, I removed it, because it was kind of a joke but it takes up memory, so I deleted it: the first thing in the README was how to mount the tracefs directory, which doesn't make sense, because if you're reading it you have already mounted it.
So if you cat trace, you go to that directory and just cat trace, you see a header and nothing else. That's boring; we want to see something, some action. So what we do is echo function into current_tracer. This enables tracing of almost every function in your kernel, and it will keep going forever. There are two files there, trace and trace_pipe. The trace file is an iterator, a non-consuming read, so if you pause tracing you can read the trace file over and over again and it will always be the same. To make that work, I actually stop tracing while you read the trace file, so you can read it over and over. If you want a producer-consumer, where tracing is never paused, that's trace_pipe. There is a trace_pipe file in that directory; reading it is a consuming read, and it doesn't stop tracing. So if you echo function and then cat trace_pipe, it will go on forever and never stop: cat itself constantly causes events in the trace as it reads from the buffer, so it just reads in a loop. To disable tracing, you just echo nop into current_tracer.
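
The echo-and-cat workflow maps directly onto plain file reads and writes. The sketch below assumes tracefs is mounted at /sys/kernel/tracing and that the script runs as root; the helper names are invented for the illustration, while the file names current_tracer, trace and trace_pipe are the ones the talk describes.

```python
from pathlib import Path

TRACEFS = Path("/sys/kernel/tracing")  # assumed mount point of tracefs


def set_tracer(name: str, tracefs: Path = TRACEFS) -> None:
    """Same as `echo NAME > current_tracer`; use "nop" to turn tracing off."""
    (tracefs / "current_tracer").write_text(name + "\n")


def read_trace(tracefs: Path = TRACEFS) -> str:
    """Same as `cat trace`: a non-consuming snapshot of the buffer."""
    return (tracefs / "trace").read_text()


def stream_trace(max_lines: int, tracefs: Path = TRACEFS):
    """Like `cat trace_pipe`, a consuming read, but stop after max_lines lines."""
    with open(tracefs / "trace_pipe") as pipe:
        for _, line in zip(range(max_lines), pipe):
            yield line


# Typical use, as root:
#   set_tracer("function")
#   for line in stream_trace(20):
#       print(line, end="")
#   set_tracer("nop")
```

The line limit in `stream_trace` is a design choice for the sketch: with the function tracer enabled, reading trace_pipe without a limit would never finish, for the reason given above.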
The thing is that now we have the trace-cmd command. I'd like some people to help me develop trace-cmd. It's a command-line utility so you don't have to do all the work by hand. BusyBox is great, and I still make sure everything I do keeps working with echo and BusyBox and all that, but for more normal users like me, I don't really want to go in and type all those things. I could write scripts, but that's not convenient, so I wrote trace-cmd. By the way, I hate the name. There is a historical reason behind it.
I'm not going to explain it here, but there is a historical reason, and I was going to change it. I even ran a poll on Google+, if you remember what that is; it still exists, I think. I asked how I should change the name, and the first thing people chose was to keep it the same: my scripts are using it, you cannot break user space. So I hate the name, but it stays. You have to be root to do anything useful, because tracing is really the fun part.
I mean, it is possible to include a Linux kernel header in a user-space program that receives all that data, basically taking a kernel header and including it directly. I guess that's what BPF tracing does: it takes a header from the kernel and includes it. Now the question is: OK, you take the header file from the kernel and you compile against it. What do you do if you run on a different kernel where the header file is different? Then it doesn't work.
What I would like is what we are working on. By the way, KernelShark and trace-cmd are going to become wrappers around libraries, so all the functionality we have will be in an LGPL library that you can link into any application you want. Once it ships in multiple distributions, you'll be able to write a program that does tracing. You still have to find a way to become root, and we have ways to help with that, but you have to type the password, and then you run the tracing, record the data to a file, and analyze it with any tool you want. It even works with Python tools. But that doesn't answer your question about structures and headers for finding data within the kernel. We want someone to work on a DWARF parser, so that all you need is access to a kernel built with DWARF information and a DWARF parser. That ELF file I talked about earlier: its debugging information is DWARF.
Obviously people are fans of The Lord of the Rings. DWARF is the thing inside the ELF file that tells you where all the variables are, the structure layouts and everything else. So if you have a parser, you could say, "OK, give me that function, give me that second parameter," and DWARF will tell you where they are, how to read them and what registers they're in, and then you could add tracepoints and do all of those things dynamically. We're working on that; that's our goal. Actually, one of my goals is for KernelShark to show a source file, say a file from the source directory of the currently running kernel.
I click on a variable and say I want this variable recorded. It says start and record. It creates a tracepoint using a kprobe in the code, reads the DWARF to find out where the variable is, and when you run your code you actually see that variable appear. So you don't have to figure out where the variable is; you just look at the code, point and click. That's where I want to go. OK, anything else? Yeah, that's not a tracing question. Yeah, I know. OK, so the code of conduct was basically discussed at the maintainers summit.
Now, how many people have heard about the code of conduct and Linus and the whole thing? I'm on the Technical Advisory Board of the Linux Foundation. Maybe we should stop the recording? No, OK, you can record it; basically, this is public knowledge. What was said was basically that a lot of people are upset or, you know, afraid, and right now we're saying let's see how things go. Let's not panic and act on hypothetical problems; let's see what actually happens. If there is a problem, then we will address it when it occurs. Because, seriously, the people who criticize Linus and all that had to go back about two years to find something that would have broken the code of conduct. So what does that mean? It means that for the last two years we have been following the code of conduct without really knowing it, and people never noticed. The code of conduct is basically there to show everyone that we have really changed. We aren't much different from what we were a year ago; I mean, we're actually better, but for the last two years the Linux kernel development community has been pretty tame. We're much more professional.
I think we just got older. We have kids, you know, and we have learned how to deal with things. That's my personal opinion. Next question: with the code of conduct, what can we expect for Torvalds and the developers? We notice that there are fewer commits on GitHub these days, and we hear that committers will pull their work and establish a new project. That's news to me. Rumors are just rumors. I don't see that happening. No, that's the first time I've heard about this. I don't know anyone who is leaving and doing that. I mean, we have an interpretation document that pretty much explains the way we are going, and it's one of those things that no one likes, which probably means it is written correctly. Really it all comes down to this: it is a problem of perception. As I said, the Linux kernel community has been really very good.
We are not changing, but the perception needs to change, and the code of conduct helps change perceptions. It tells people that we have been like this for two years. And it is not that Linus will stop rejecting things. Maybe he won't be saying as many bad words, but he already rejects things; he already says no, you can't do it this way. Yeah, it won't be as colorful, it won't be as quotable, and that's fine. The one thing I would like everyone to know is the problem the Linux kernel community has that no other open source project really has.
We have a celebrity. Linus Torvalds was even invited to the Oscars. So he's a celebrity. People watch everything he says. Every time he posts an email there are at least a hundred reporters reading it, and we are in a glass house. There is no other project where a hundred or so reporters read the project leader's email. So when something happens, it makes the headlines, and that makes us look bad. And the problem is that we are trying to be inclusive. We want people from China, India, the Asian countries, many countries, and we can do a lot without the big abusive comments. So maybe if you see Linus, you know, tell him, hey, insult me, please. Thank you. Anything else? OK, thank you. I have some dice to give away, first come, first served. You want some dice? There is a bug in them, and you have to find out what the bug is. I'll give you one.
