Elements of Programming Style - Brian Kernighan

May 02, 2020

Thank you very much. I can't say what's closer than an AU away. I don't know whether you guys can hear me; let's assume you probably can. No, it looks like we're getting another mic. This one says it's on. If all else fails, check the power. OK, so I'm going to assume that if you're in the back row and can't hear me at all, you'll wave your arms or something, and otherwise we'll assume it's OK. So let's get started.

This title actually has a bit of history. It's obviously quite arrogant, and impolite in a sense. It's actually a title I used in a talk in 1973, which, as you can maybe work out, was a long time ago, before pretty much everyone here was born. I was a lot younger in those days and I had slightly different hair.

The genesis of all of this, and of what I want to talk about today: when I went to Bell Labs in 1969, I was lucky enough to have the office next door to Dick Hamming. Dick Hamming was arrogant and irascible and very, very smart, and he used to spend a lot of time complaining about how people programmed. He used to say to me and many others: "Hey kid, what's the way we teach programming? We give you a dictionary, we give you a grammar, and we say now you're a great writer. That's nonsense. It doesn't work that way."

One day Dick came into my office carrying a book, literally this book, called Computer Applications of Numerical Methods. I won't give you the name of the author. He opened the book, showed me a particular page, and said, "Look at this, this is terrible." I looked at the book and said, "Oh my God, this is terrible." He was looking at the numerical analysis.


I was looking at the code. It was a Fortran numerical analysis book. So I'll show you the code that I saw in that book. In fact, I'll ask you the question: what does it do? It's Fortran. I suspect all of you are at least familiar with Fortran, if not true genius experts. You're not all Fortran experts anymore? What are you? Come on, there's no excuse. This is Fortran as it was in 1958. It's not that complicated. So what is it? Remember that I and J are integers. So, we're getting closer.

It's not a unit vector, but you have the right idea. When I is less than J, that term is zero, correct? When J is less than I, that term is zero. When I is equal to J, however, the term is one. So what this does is create an identity matrix: it's putting ones down the diagonal of the matrix. It's a clever thing, and I have a room full of really bright people here, I mean really bright, and you're not entirely sure what it's doing. So there's something wrong with this code; it's probably too clever to be OK. How could you write it better?
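The slide itself is not reproduced in this transcript. As a rough sketch in C rather than the book's Fortran, the trick described is consistent with multiplying the two integer quotients of the indices, and the plainer alternative zeroes each row and then sets the diagonal. The names `N`, `identity_clever`, and `identity_clear` are hypothetical.

```c
#define N 4

/* "Clever" version: relies on integer division truncating to zero.
   Indices run 1..N, Fortran style, so i/j and j/i never divide by zero.
   (i/j) is 0 when i < j, (j/i) is 0 when j < i, and both are 1 when i == j. */
void identity_clever(double a[N][N])
{
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++)
            a[i - 1][j - 1] = (i / j) * (j / i);
}

/* Clearer version: zero an entire row, then set its diagonal element to one. */
void identity_clear(double a[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            a[i][j] = 0.0;
        a[i][i] = 1.0;
    }
}
```

Both functions fill the same matrix; the second one can be read without working out what integer division does.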

You could write it like this, where you still have the same two nested loops, but now the inner loop clearly sets an entire row to zero and then the outer loop sets the diagonal element back to one. I think that's clearer. In a modern language I would probably do it differently again; in some languages there would already be something called "make me an identity matrix". But at least that's better.

So anyway, the experience with the book that Dick brought, and his constant complaints about how people don't program well and why we can't do something about it, led Bill Plauger and me to write a book called The Elements of Programming Style, and that's the title that Jim and Scott put on the announcement of this meeting. The approach we used in that book was actually copied, both the title and the approach, from another wonderful book, Strunk and White's The Elements of Style, which is a very good, very compact book on how to write well in English. The approach in The Elements of Style, and therefore in The Elements of Programming Style, was to say: here is something that is not very well written, this is what is wrong with it, this is how you can improve it, and here's the rule you could derive from that process of going from bad to good. So what we did in The Elements of Programming Style was take a lot of examples of bad things, improve them, and establish a lot of rules as a result.

Now, that was a long time ago. That book was published in 1974, and the world is quite different. In that book we wrote in Fortran 66 and in PL/I, a language that I assume none of you have ever used. Fortunately, it's dead. Today people write in Fortran 9x primarily, or in C or C++ or Java, or they use other languages, pick your favorite. So the world is very different. The examples in that book, and I think this is one of the things that made it fun, came from programming textbooks like this one. The idea was that people would tell you how to program well and they couldn't do it themselves.

I guess you could argue that this is history repeating itself, but what the hell, I'm arrogant and I'm at the front of the room. So what I'm going to do today is more of the same kind of thing, but with examples taken from a slightly broader collection of sources. Some come from programming textbooks. Some come from real programs I've come across. Some are new; I found them by googling for interesting code that might be suitable for a collection of computational astrophysicists, and some of them are probably code written by computational astrophysicists. Some were written by students in my class. Unfortunately, a couple were written by me, just to show you that no one is perfect.

So in a way, this is kind of old. People have known that you had to program well for a long, long time, even before Bill Plauger and I started writing about this in the '70s, but in fact people still don't necessarily write very well all the time. I think the average standard has improved a lot. The principles of how to write well haven't changed. The details change, the languages are different, we have a lot more computing power to work with, but the principles of writing clearly, in a way that other people can understand, haven't changed one bit in all those years. So let's see some of them. Most of these are pretty small, because they have to be examples that fit on the screen.

Most of the examples, I just went back and looked, are in C, which you might as well think of as C++, but nothing very complicated, or Java, nothing very complicated, and some in various other forms. But as I say, although the details would be different in different languages, the principles are pretty much language independent. So let's look at one.

This uses a C construct which is sometimes good shorthand, but the question mark and colon operator is obviously overused or abused from time to time. So what does it do? I had literally never seen this construct before someone asked me what it does, maybe a year ago. It's not that it's illegal; it's absolutely, perfectly legal. But it's like, what the hell does this do? Let me explain. It says: if armed, in other words if armed is non-zero, whatever that is (this is some kind of device driver, I guess), then if the count is greater than or equal to the threshold, do the dot dot dot. On the other hand, if armed is not true, it's false, equal to zero, then if the count is less than or equal to the threshold, do the dot dot dot.

So the comment that says what this does is wrong, because it says "if it exceeds the threshold", but the code also does something if it doesn't exceed the threshold. Clearly the comment is not in accordance with the code; we'll come back to that kind of thing later. The problem with this is that it's too clever to understand easily. Not that you can't understand it, but you have to work at it, and you shouldn't have to struggle to figure out what a piece of code does when it's that simple. So if you're willing to waste a few more characters, you can get something that's a little better: if it's armed and the count is greater than the threshold, or if it's not armed and the count is less than the threshold, then do the dot dot dot. That's clearer. I don't have to explain it to you; you can read it. There are still things you could argue could be done better.
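The slide is not in the transcript, so the following is only a sketch of the two shapes being contrasted, written as C functions with hypothetical names (`fires_clever`, `fires_clear`, `armed`, `count`, `threshold`). It assumes both versions use the same inclusive comparisons, and the parentheses around each `&&` clause are an addition here; the version described in the talk had none.

```c
#include <stdbool.h>

/* Too clever: the ?: operator chooses which comparison gets evaluated. */
bool fires_clever(bool armed, int count, int threshold)
{
    return armed ? count >= threshold : count <= threshold;
}

/* Clearer: both cases are spelled out and can be read aloud directly. */
bool fires_clear(bool armed, int count, int threshold)
{
    return (armed && count >= threshold) || (!armed && count <= threshold);
}
```

The two functions return the same result for every input; the difference is only in how much work the reader has to do.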

I don't have any parentheses there other than those surrounding the if. What is the relative precedence of "and" and "or"? Well, I know the right answer because I've been doing this for a long time, but you could argue that putting in parentheses would make it a little clearer. It's hard to say, and I won't argue about it. I think of that as a second-order problem; the first-order problem is that the original is just a mess. So don't be too clever. This will be your problem, because you are all very smart people; looking at the mathematics that goes into this stuff, it's clear you're smarter than I am.

On the other hand, it's very possible for smart people to screw it up in the other direction, and you don't want to be too dumb either. You can be smart in one area and, like all of us, dumb in a different area, and this is a good example, I think, of being too dumb. What's going on here? We have a bunch of stuff. The guy is defining an array holding a blank character, and then he has four function calls to various C string comparison and copying functions to do something. What the hell is going on here? It's a big, heavyweight mechanism for doing something. If you look at it closely you realize this is all it's doing: if that first character isn't blank, so if it's a digit, copy it. You don't need any function calls, nothing like that. You don't have to go to the manual and remember what the order of the arguments is for the string comparison and copy functions, and what they mean, and whether they copy the extra byte at the end, and all that kind of stuff. You don't have to think about it. So there's a balance you have to find between being too clever and not being clever enough, being a bit unclear on what you're doing, and sometimes you can find it with a principle that I guess is "keep it simple, stupid". Keep it simple. OK, so here's another example.

This is one I literally found while I was out snooping around for new examples last week, while marooned in Upper Pennsylvania with the slowest internet connection known to man. What does this do? It says "remove all spaces", and it does indeed remove all spaces, but there's a lot going on. It calls strlen, give me the length of the string, several times. It calls strpbrk (I don't know how to pronounce these things; this is not my fault), which basically says find me the next blank in the input. And then it does a memmove that copies things down by one character. How much does it copy? Well, the string length of the pointer plus one. And so on, et cetera, until you're done. There's a lot going on there. How long do you think it took this person to get it right? They must have spent an incredible amount of time getting it right.

The thing is, libraries are good. Library functions are your friend, because you can use code that someone else wrote, and once they got it right you don't have to worry about it anymore. So things like strcpy and strlen and so on are things you should use without hesitation. But there are times when you don't need them, and at those times you probably want to back off a little. This is all that's happening: walk along the string, and if the character is not a blank, copy it. I can describe it to you very easily, that's all you have to do, and the job is done. It's a lot simpler this way. I wrote it with pointers here; you can write it with array subscripts if you're happier with that. The code is almost identical, and both work.

And this is the odd part: it can't matter here, but this code will run a boatload faster, because that other one is somewhere between quadratic and cubic in the number of blanks on the line, because of all that testing and finding the next blank and then moving things around using the length of whatever remains of the string. It's a surprisingly inefficient algorithm.
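Neither version appears in the transcript. The sketch below reconstructs the heavyweight approach from the description (find a blank with `strpbrk`, slide the tail down with `memmove`, repeat) and sets it beside the single-pass pointer loop. Both function names are hypothetical, and the first function is an assumption about the original's shape, not a copy of it.

```c
#include <string.h>

/* Heavyweight version, reconstructed from the description: every blank
   triggers a search from the start plus a move of the whole remaining tail. */
void remove_spaces_slow(char *s)
{
    char *p;
    while ((p = strpbrk(s, " ")) != NULL)
        memmove(p, p + 1, strlen(p + 1) + 1);  /* tail plus its '\0' */
}

/* Single pass: copy each non-blank character down, then terminate. */
void remove_spaces(char *s)
{
    char *dst = s;
    for (const char *src = s; *src != '\0'; src++)
        if (*src != ' ')
            *dst++ = *src;
    *dst = '\0';
}
```

The single-pass version touches each character once, so its cost grows linearly with the length of the string.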

It doesn't matter in that particular context, but I'm told that people who do cosmological simulations or whatever occasionally worry about efficiency. So anyway, part of the problem with these things is that you get the feeling that the person writing the code doesn't really understand the language properly, or their understanding is a little shaky sometimes. Here's a good example. This one actually comes from student code; undergraduate students, not graduate students, since graduate students have learned it all. Do you sense a pattern here? I'll probably say this again, but computers are really good at doing things that have repeating patterns, and so it should occur to you that you could probably do something better. I probably don't even have to show you how to do it, but basically all that index does is convert an integer between 0 and 9 into its ASCII character equivalent, and the way to do that is simply to add the ASCII value of zero. That works fine in the ASCII character set, and there's no reason not to do it that way. So know your language, and if you do, you turn 10 lines of stuff into basically one or two lines that will be a lot easier to work with. OK, here's another one. This is a piece of Java code that popped up, and I drew a little picture so you could get an idea of what it is.
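For the digit conversion described a moment ago, the one-line form looks something like this sketch. The function name `digit_to_char` is hypothetical, and the caller is assumed to pass a value from 0 to 9.

```c
/* Convert an integer 0..9 to its character equivalent by adding the value
   of the character '0'. The C standard guarantees that '0'..'9' are
   contiguous, so this works in ASCII and in any conforming character set. */
char digit_to_char(int n)
{
    return (char)('0' + n);
}
```

This replaces a chain of ten near-identical cases with a single expression.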

This is an example of bit whacking. I think bit whacking is something that's not done a lot in scientific computing, where you basically get bits and you want to manipulate them as bits. A lot of people in tool building, networking, device drivers, all kinds of weird stuff, do bits. I suspect you don't do bits as much as numerical stuff in your particular field, but it's still useful to know how to do it. This one is kind of interesting because the task is: here's a 16-bit number, and I want to set the top eight bits to zero. I just want to set them to zero, so I want to preserve the bottom eight bits.

So the interesting thing is, if I set that task for you, how many of you would jump in and use an exponential to do it? What the hell is going on? This is the power function. What it's doing is: you start at position 15, the one on the left, and it says, let's calculate two to the power of 15, and then if the number I'm trying to reduce is at least 2 to the power of 15, I can subtract 2 to the power of 15, so I've eliminated that bit. And then we keep doing that until we get to the eighth bit, and then we're done. Oh, and I checked, because I couldn't remember the power function for sure. This is in Java, and the power function actually takes doubles as arguments, so in principle this is done in double-precision floating point, although I suspect someone inside said, "Hey, wait a minute, those are integers."

So this is just weird. How should you write it? How about that: take the value and keep the bits you want with the AND operator and, probably, a hexadecimal constant, because those are the easiest way to represent real bits. Those are those bits. OK, not very complicated.

Let me give you one more example of bit whacking, just because I have it here and it's a lead-in to something else I want to talk about in a moment. So let's imagine again that we have a 16-bit quantity.
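For the task of clearing the top eight bits, here is a sketch in C rather than the talk's Java. The roundabout function follows the subtract-powers-of-two description but computes each power with a shift, an integer-only substitute for the floating-point power call. Both function names are hypothetical.

```c
#include <stdint.h>

/* Roundabout: for bits 15 down to 8, subtract that power of two if it is
   still present in x. (The code in the talk computed the power with a
   floating-point library call; a shift is used here instead.) */
uint16_t low_byte_by_subtraction(uint16_t x)
{
    for (int bit = 15; bit >= 8; bit--) {
        uint16_t power = (uint16_t)(1u << bit);
        if (x >= power)
            x = (uint16_t)(x - power);
    }
    return x;
}

/* Direct: AND with a hexadecimal mask that keeps only the bottom eight bits. */
uint16_t low_byte(uint16_t x)
{
    return (uint16_t)(x & 0x00FFu);
}
```

The mask says which bits survive, which is the whole specification of the task.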

You'll have to take it as close enough for astronomical purposes: think of this as 16. What's that old line about 10 to the power of 50 or so? Close enough. Anyway, 16 bits, like this. All this little macro is supposed to do is swap the top half and the bottom half of this 16-bit quantity. So how does it do it? It says: take the whole number and mask off the top part, which is the & 0xFF we just looked at, and shift that eight to the left, so we took the right part and moved it to the left. Then it says: add to that what you get by ANDing the original number with a mask for the bottom part and shifting that eight to the right. So you took the bottom part and moved it up, and you took the top part and moved it down. It looks good, but by now you should know that you shouldn't believe me. What's wrong with it?

Yes, a negative number is one of the things that could happen. In fact, let's talk about that. Some machines do sign extension; that is, arithmetic operations like shifting extend the sign, so if that quantity had its top bit on when you shifted it to the right, that would leave a trail of one bits behind. So the masking was done in the wrong place there: you want to shift these bits down first and then mask. And then you would add to that what you got by shifting the other thing to the left, and when you shift left it fills with zero bits no matter what the machine architecture is, so you don't need the mask there.

But in fact those are not the real problems. You know what the real problem is? What is the relative precedence of plus and the shift operators? Well, I will tell you that the shift operators have lower precedence than the plus operator. So what this is doing is calculating something not very meaningful on the left side, then shifting that left by eight plus whatever the not very meaningful calculation on the right side is, and then shifting everything right by eight. So this doesn't do anything sensible at all. It cannot possibly work, and it comes from an article on how to do machine-independent calculations. I guess you could argue that it is machine independent: it's totally wrong on all machines.

So you can write it like this, and that's actually correct for 16 bits. Notice that when shifting left you don't need to do anything, and when shifting right you have to do the masking in the right place. And notice that I didn't use plus; I used OR, because OR is at the right level of precedence. That's kind of a warning sign, at least if you're looking at C, C++, or Java code: the precedence of the logical operators is very low, so you should be very careful if you come across mixes of logical operators, down here, and arithmetic operators, up here, because without parentheses the precedence, the association, will simply be wrong, guaranteed.

There's something else wrong with this. It's not a real bug in this code snippet, but it's a suspicious construction: it uses macros. The original definition of C had the C preprocessor, a macro processor with kind of strange properties sometimes, and there are dangers in using macros. Macros are something that you probably use in your C and C++ code to some extent, or, if you don't use them yourselves because you are wise, those who came before you did use them and were not as wise.
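For the byte swap discussed above, the corrected form can be sketched as a function instead of a macro. The name `swap_bytes` is hypothetical; the unsigned 16-bit parameter type is an assumption that sidesteps the sign-extension issue, and the mask after the right shift is kept as described in the talk.

```c
#include <stdint.h>

/* Swap the top and bottom bytes of a 16-bit quantity.
   - The left shift fills with zero bits, so it needs no mask.
   - The right shift is masked AFTER shifting, so stray high bits are dropped.
   - The halves are joined with |, which binds more loosely than the shifts.
   By contrast, an expression of the form  a << 8 + b >> 8  parses as
   (a << (8 + b)) >> 8, because + binds tighter than << and >>. */
uint16_t swap_bytes(uint16_t x)
{
    return (uint16_t)((x << 8) | ((x >> 8) & 0xFF));
}
```

The explicit parentheses are redundant to the compiler here, but they spare the reader from recalling the precedence table.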

I mentioned that the function called isdigit takes an ASCII character value and tells you whether it's a digit or not, so this is a possible implementation of that as a macro. That's fine, and you'll find dozens, if not hundreds, of textbooks that say you can do it this way. What's the potential problem with that? Well, one potential problem is that the way macros work is simple textual substitution, so that when someone in the rest of the program, at dot dot dot, says isdigit of something, the something is plugged in in two places in the resulting expression and therefore, depending on how the expression works out, it is evaluated twice. If it has a side effect, then you have done more than you thought. I actually found this in a code snippet on the web last week that I think was processing JPEG images; I don't remember now.

In this context, what it does is this: every time we want to test a piece of the input, it calls isdigit with something that has a side effect. So if the value at the current point is greater than or equal to zero, then it decrements jx a second time, and thus compares something unrelated to nine. So this code is completely wrong, and it's going to fail in a completely mysterious way. And here I speak from bitter experience.
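The double evaluation can be shown with a small sketch. The macro below is the usual textbook form, not the code found on the web, and every name in it is hypothetical. An increment is used as the side effect in place of the decrement mentioned in the talk.

```c
/* Textbook-style macro: the argument text appears twice in the expansion. */
#define ISDIGIT_MACRO(c) ((c) >= '0' && (c) <= '9')

/* Function version: the argument is evaluated exactly once. */
int is_digit(int c)
{
    return c >= '0' && c <= '9';
}

/* Test one character of s with the macro and report how far the index moved. */
int advance_with_macro(const char *s)
{
    int i = 0;
    int d = ISDIGIT_MACRO(s[i++]);  /* expands to two separate uses of s[i++] */
    (void)d;
    return i;
}

/* Same test through the function: the index moves exactly once. */
int advance_with_function(const char *s)
{
    int i = 0;
    int d = is_digit(s[i++]);
    (void)d;
    return i;
}
```

Given the string "7x", the macro version advances the index twice and compares 'x', not '7', against '9', so it both skips a character and gives the wrong answer; the function version advances once.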

I must have spent two or three days of my life, quite a while ago, trying to figure out why a program I wrote produced only about half the output it was supposed to produce. The reason was that I had an isprint, another in this family, and I was invoking it with something that had a side effect, so every other character, most of the time, just disappeared. The only good thing is that it wasn't my fault, because the isprint I was using came from the compiler vendor, who should have known better. But mysterious hardly characterizes it.

One of the reasons people use macros in C, by the way, and it's a reason I don't think has any validity anymore, is that they were more efficient, in the sense that with a macro you could get something that had more or less the semantics of a function call but didn't have any function call overhead. On the machines of my day, you know, like ENIAC, subroutine call overhead was a serious factor. For most purposes today I think you can safely ignore it, or at least it's not a first-order consideration at all.

But people use macros to try to make things go faster, and here's a wonderful example that was found, again, very recently, in the last few days. There's a huge macro called fast memcpy. Memory copy is one of these things that says: here's a block of memory, make a copy over there. So it's basically a loop. And what it says is funny. There is a breakeven point, which is actually a defined constant, and you're supposed to set it to different values for different machines, which isn't exactly the definition of portability. What it says is: if you're above the breakeven point, use the memcpy that came with the system; otherwise, go into this thing where you write the explicit loop yourself, because that way you're saving the function call overhead. So ask yourself what particular value you would choose for the breakeven point today. Two? That's not a bad guess. I think the correct value is probably zero.

But I was actually curious, because I thought maybe I'm just a dinosaur and I have this completely wrong, so I went and did a bunch of measurements. I actually wrote the code, put it in, and tested it, and I couldn't find any measurable difference in any test I did. Now, I was just copying random things of various sizes, but I couldn't find any measurable difference. Modern compilers on modern machines are just smarter than you are, so for the most part let them do what they do well. You shouldn't worry about that. So you don't want to sacrifice clarity for efficiency.

This is another one of those things where someone had to write a bunch of complicated code and had to get it right. It will cost you perhaps more than it will save over the entire lifetime of the program, and you have left a nightmare for the next grad student or postdoc to pick up. So I think macros, in this sense, are possibly one of the features of C that was fine at the time, but their time has passed, and you have alternatives at this point, especially in C++, where you can use inline functions if it really matters, and you have const declarations for numbers, and enums, all these kinds of things that make it better. There are still places where macros are fine, but mostly not as function-like macros.

So I guess there's a general principle that says every language has things that are actually pretty bad, and you shouldn't use them. That's true in any particular language. Actually, Fortran was my second programming language; my first was COBOL, which set a kind of baseline for the level of crap one could put up with. But anyway, I wrote a bunch of Fortran when I was about your age, and this is Fortran as it came from the factory originally, in about 1958 or something, because it has these arithmetic IF statements. Anyone in this group who's willing to admit they've ever written an arithmetic IF statement, raise your hand. Ah, three. OK, we'll deal with you later. The arithmetic IF, for those of you who have never used it, although you've probably seen it, is basically a three-way branch. It says: evaluate the expression, and then if it's negative go to the first of those labels, if it's zero go to the second, and if it's positive go to the third. It's a pretty close match for an instruction on something like the IBM 704, if I remember correctly, a machine that has long since left the earth. Anyway, this does something, but it's a little hard to see what.

One of the things that had appeared by Fortran 66 was that you could write IF statements in a more sensible way; the original Fortran only had the arithmetic IF, if I remember correctly. So you could write this. It's not very good, but at least now you can start to see what it's doing: some kind of greatest common divisor algorithm. And then eventually, and it took until Fortran 90 before someone realized that it was worth putting in the language, there was a while loop, just a loop that goes around and around. What a concept, you know, about 30 years after everyone else discovered it. So now you can write it like that.

I think it's a reasonable thing for languages to have bad and good features. As languages evolve, and languages evolve a lot (Fortran has evolved tremendously since Fortran 77, for example), you have fewer reasons to use the bad features and many more reasons to use the good ones, so you should think that way.

Every language not only has bad features but has pitfalls, where the feature is OK but you have to be careful how you use it. Here's one. This comes from a book that I think is the worst C programming textbook ever written. Almost every example in that book was wrong in one way or another. The advice it contained was at best misleading; usually it was completely wrong. It's hard to believe the book sold. Luckily, I got a copy and managed to keep a lot of the good parts, for uses exactly like this. What this function is supposed to do is basically string concatenation, but it's trying to show you how you could write it yourself. It's called combine, and it just takes those two strings and gives you a new string that's the two of them stuck end to end. And what does it do? Well, it says that r is a char array of 100 bytes, so it will make the output 100 bytes regardless of how large the input is.

Good start, you say. But OK. Then it uses strcpy to copy the first string, and then it calculates the length, which tells it where the second string should be stuck on, and then it writes an explicit loop to put it there instead of just using strcpy again. If you're going to tell people how to do these things, why not just be consistent? But it's OK. And then it says: return a pointer to the internal array. This is just awful. I mean, this isn't bad practice or something; it's a bug, a bug that all of us at one time or another have made when writing C programs: you have an array and you inadvertently return a pointer to it, but it's a local array and therefore it doesn't exist, in any useful sense, once that function has returned. It may sit there residually on the stack until something else happens, but for the most part it's gone, and relying on it is a recipe for disaster. And here's this guy telling you this is the way you do it. Bad move.

So what you're seeing here are examples of things that people do right or wrong in languages, and computer languages are very similar to human languages. There is a kind of formal "this is the way you're supposed to do it", but there are also idioms: the way people actually write code in practice, the way they write standard kinds of operations or tasks. So consider how the elements of an array are set to something in a C program.
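Returning to the concatenation function: one way to avoid both the fixed 100-byte buffer and the pointer to a dead local array is to allocate exactly what is needed and hand ownership to the caller. This is a sketch of that alternative, not the textbook's code; the name `combine` comes from the talk, while the signature and the use of `malloc` are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Concatenate s and t into freshly allocated memory sized to fit.
   Returns NULL if allocation fails; the caller must free() the result.
   (Returning the address of a local char r[100] instead would hand back
   storage that no longer exists once the function has returned.) */
char *combine(const char *s, const char *t)
{
    size_t ls = strlen(s);
    size_t lt = strlen(t);
    char *r = malloc(ls + lt + 1);   /* +1 for the terminating '\0' */
    if (r == NULL)
        return NULL;
    memcpy(r, s, ls);
    memcpy(r + ls, t, lt + 1);       /* copies t's '\0' as well */
    return r;
}
```

A caller would write something like `char *p = combine("foo", "bar");`, check for NULL, use the string, and then call `free(p)`.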

Here are four different ways you could do it. You could go from zero up to n minus 1 with the increment in the loop. You could go up to, but not including, n with the increment in the loop. You could go backwards. But I'm betting that no one writing C here would use those; you would use the fourth one, right? That's the standard way to write it. If you're doing Fortran, it's like I equals 1 to 10, or whatever: it's I = 1, 10. It's the same thing; that's the standard way of writing it.

So the point about idioms is that if I give you a piece of code that I wrote and it has that particular idiom, you can look at it and say, I see what it's doing. You don't have to think about it. Conversely, if you're sitting there saying "I have to write a loop that sets the elements of an array", you know what to do. You write it automatically, you don't make a mistake, it's easy, because it's an idiom, one you use all the time without having to think about it. So it's a standard means of communication between your brain and the machine, or between you and some other programmer.

The flip side is that when you see something that's non-idiomatic, it ought to raise a little red flag that says: what's going on here? Whoever wrote this is doing something different. Why are they doing it differently? It may simply be because they are not native speakers of this particular language, or maybe there is something actively wrong. So if the idiom is not followed, if you see something that is not idiomatic, be suspicious.

Look at this. What is happening? calloc is one of the memory allocation functions. It says: allocate n elements of this particular size and set them to zero. So what this does is allocate an array of n integers; let's say n is 10. And then it goes through a loop that sets n plus one of them. Oops. You allocated a hole this big and you wrote stuff that big. And the way you see it, literally the way I saw it, is that the idiom is wrong. The idiom is i from zero, less than n, and here it is less than or equal to n. Having seen that, you say, OK, there's something wrong, and that's what it is. This is the idiomatic version: allocate the size and then go from zero up to but not including n. If you write it this way you don't have to pay attention to it; it will almost certainly be right.
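The idiomatic allocate-and-fill pattern just described looks roughly like this. The function name `make_filled` and its parameters are hypothetical; the point is only the shape of the loop.

```c
#include <stdlib.h>

/* Allocate an array of n ints (calloc zeroes them) and set each to value.
   Returns NULL if allocation fails; the caller must free() the result. */
int *make_filled(size_t n, int value)
{
    int *a = calloc(n, sizeof *a);
    if (a == NULL)
        return NULL;
    /* The idiom: start at 0, stop before n. Writing i <= n here would
       store to a[n], one element past the end of the allocation. */
    for (size_t i = 0; i < n; i++)
        a[i] = value;
    return a;
}
```

Because the loop has the familiar form, a reader can confirm at a glance that it touches exactly the n elements that were allocated.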

This appears in a variety of this specific appears in a variety of contexts. I'm sure it has affected at least some of you and it certainly affected me quite a few times. Think about arrays in C C++ Java Python Pearl, name your favorite language. They all start at zero. Think about for dran, where do arrays start? Oh my goodness, they start at one, so the cognitive dissonance or whatever of going from zero origin to zero origin or vice versa is just a nightmare, so if you're converting a program from one of those languages to another , you are guaranteed to go wrong somewhere and it will be hard to find, right.

I bet a lot of people have been through that. So it's a place where the idioms in Fortran, let's say, are different than they would be in C. I'm not up to date enough on Fortran 9x to know whether I can set the origin myself. I probably can, but whether people do that, I honestly don't know, and that just creates cognitive dissonance in a different place, luckily not for me. OK, another example. This actually goes back to something we saw a few minutes ago. What this does is say: make a space large enough to hold the string, and then copy it.

How long is the string? Well, the string is three bytes and the length is three. However, think about how strings are implemented in a C or C++ program at the basic level: there is an extra byte at the end, the null byte, which is the terminator. So what you've done is allocate three bytes for a three-byte string, except that the string is actually four bytes. When you do the copy it also copies the null byte, and again you've gone off the end of the array. The idiom, which was actually in one of the previous examples, is strlen plus one, and if you don't see that, something is wrong. It's kind of awkward. It's an idiom that is only appropriate for C character arrays. You don't need it in Fortran or Java or many other languages, because their strings are not null-terminated; instead there is a length operator, and then you have a different set of idioms for manipulating things in those languages.
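The strlen-plus-one idiom can be sketched as follows. This is an illustration rather than the code from the slide, and `copy_string` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of s, or NULL if allocation fails.
 * The caller frees the result. copy_string is a hypothetical name. */
char *copy_string(const char *s)
{
    /* strlen counts the characters but not the terminating '\0',
     * so the buffer needs strlen(s) + 1 bytes. */
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);   /* copies the characters and the terminator */
    return p;
}
```

For the three-character string in the example, `strlen` returns 3 and the allocation is 4 bytes, which leaves room for the terminator that `strcpy` writes.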

Another thing that goes wrong here; this is a specific case of a very general problem that you have to watch for in programming. Get a character, print it out, and then see whether it was the end of the file. End of file is not a real character, at least in the Unix world (maybe it is in the Windows world); it is a state of being. So you have copied something to the output that was not in the input. And the reason you did it is that you wrote a loop with the test at the bottom, and a loop with the test at the bottom is one of those red flags that you look at and say, wait a minute.

Is this appropriate? Because it means you've done something before checking whether you should do it. So it's almost always the case that you want to write loops where the test is at the top. Is there something to do? Then do it. If there is nothing to do, don't do anything. In the original Fortran and going forward (I don't remember whether this was still true in Fortran 77), DO loops had the property that regardless of what the bounds of the loop were, you went through the damn loop once. This was a disaster, and you always had to protect against that kind of thing. The newer versions, if I remember correctly, do it correctly, but I'll leave that to the experts. That was just bad, and it meant that all sorts of programs didn't work correctly when they had these funny cases where the upper limit was less than the lower limit.

In C, this is a standard idiom: get a character, save it, and if it wasn't the end of the file then you can put it out, but otherwise don't do anything. So this is the idiom. The one above is more like the kind of thing people write in languages like Python, where assignments can't be embedded in expressions like that. There you have to read once, to prime the pump, and then you want a while loop with the test still at the top and a second read at the bottom. I should have had that here but I don't. How do you find things like this?
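The C idiom described above, with the test at the top of the loop, can be sketched like this. The helper `copy_stream` is hypothetical; the slide most likely used `getchar` and `putchar` directly, but taking the streams as parameters makes the sketch easier to exercise.

```c
#include <stdio.h>

/* Copy everything from in to out and return the number of characters copied.
 * copy_stream is a hypothetical helper written for this sketch. */
long copy_stream(FILE *in, FILE *out)
{
    int c;          /* int, not char, so it can hold EOF as well as any byte */
    long n = 0;

    /* Test at the top: read, save, and check for EOF before writing. */
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        n++;
    }
    return n;
}
```

On empty input the body never runs, so nothing that was not in the input reaches the output. The "prime the pump" form mentioned in the talk has the same shape: one read before the loop, the EOF test at the top, and a second read at the bottom of the body.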

How do you find what's wrong? In many cases you can find where the error is by testing something at its boundary conditions. I realize that boundary condition is a technical term in many different scientific fields. I think it's also a technical term in testing programs, or at least in running programs. Here's another piece of code, which comes from the same guy who was removing spaces, and he says he wants to remove the trailing asterisk. What caught my attention here was the comment that says the test should be greater than or equal to zero. How would you work that out?
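The code on the slide is not part of this transcript. As a stand-in, here is a hypothetical C loop of the same general shape, written with the test the comment asks for. The function name and the decision to strip every trailing asterisk (not just one) are assumptions made for this sketch.

```c
#include <string.h>

/* Remove every '*' at the end of s, in place.
 * strip_trailing_stars is a hypothetical name used only in this sketch. */
void strip_trailing_stars(char *s)
{
    int i;

    /* The test is i >= 0, not i > 0: with i > 0 the character at
     * position 0 would never be examined. For an empty string i starts
     * at -1 and the body never runs. */
    for (i = (int)strlen(s) - 1; i >= 0 && s[i] == '*'; i--)
        s[i] = '\0';
}
```

The smallest interesting input is the one-character string `"*"`: `strlen` is 1, `i` starts at 0, and only a `>=` test lets the loop look at that character.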

I think the way I would work it out is to say: OK, suppose the input consisted of nothing more than an asterisk. The string is one character long, and it is an asterisk. What happens then? What is strlen? It's one. Subtract one, and i becomes zero. Is i greater than zero? No, so I'm never going to look at that character. So the test had better be greater than or equal to zero. I can reason about that by reasoning about the boundary condition, and I can reason about the boundary condition by finding the simplest possible case I can work with, which is putting that asterisk right at the beginning and nothing more.

So anyway, let's see. What could go wrong? That's a good question to ask about any program. Here's something new that I picked up from the web very recently, like last week. It's part of a little statistical package (I don't know whether it was meant to be real or a toy) that does the mean and the standard deviation along with a variety of other things. What it does is calculate the mean of m elements in an array, then calculate the standard deviation using the squared differences, and then return the square root, appropriately. OK, right?
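The statistics code itself is not reproduced in the transcript. As a hedged stand-in, here is a small C sketch of a mean and a sample variance (the quantity whose square root such a routine would return). It checks m before dividing, which is the boundary case the talk turns to next. The function names, the error-code convention, and the choice of the m - 1 divisor are all assumptions made for this sketch.

```c
#include <stddef.h>

/* Mean of the m values in x, stored in *result.
 * Returns 0 on success, -1 if a pointer is NULL or m is 0. */
int array_mean(const double *x, size_t m, double *result)
{
    if (x == NULL || result == NULL || m == 0)
        return -1;              /* no elements: refuse to divide by zero */

    double sum = 0.0;
    for (size_t i = 0; i < m; i++)
        sum += x[i];
    *result = sum / (double)m;
    return 0;
}

/* Sample variance (divides by m - 1), stored in *result.
 * Returns 0 on success, -1 if the input is unusable or m < 2. */
int sample_variance(const double *x, size_t m, double *result)
{
    double avg;

    if (result == NULL || m < 2 || array_mean(x, m, &avg) != 0)
        return -1;              /* m == 1 would divide by m - 1 == 0 */

    double ss = 0.0;
    for (size_t i = 0; i < m; i++)
        ss += (x[i] - avg) * (x[i] - avg);
    *result = ss / (double)(m - 1);
    return 0;
}
```

The standard deviation is then `sqrt` of the variance, from `<math.h>`. Whether to report an error, as here, or to define a value for the degenerate cases is a design choice; the point is that the routine decides explicitly instead of crashing in the caller's code.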

What could go wrong? Yeah: what if m is one? Now, I'm not a statistician. Someone with better training than me can tell me what the standard deviation of a single element is, but it's not division by zero. That's the obvious thing. And how do you detect that? You test the parameters and think about where the limits are, the boundary cases. m is one, and in fact there is another boundary case: what happens if m is zero? That raises the question of what happens in the mean calculation, and I will tell you that the mean calculation has the same problem: it divides by zero if there are no elements. So here we have a library that is supposed to help you, but if you don't use it correctly it won't help you much; you'll get mysterious crashes in the middle of your code. You want to be watching for that sort of thing in your own code. If you wrote the routine, you have to defend yourself against the stupidity of the people who use it, even if it's you, in some other part of the code.

Actually, if we go back to that for a second, there are a lot of different things that could go wrong. The array could actually be a null pointer, in which case you have a totally different kind of disaster waiting to happen. It raises a very interesting question. You can be paranoid, or super paranoid, or incredibly, unrealistically paranoid, and I don't know where you draw the line. There is no general answer for that. But one of my favorite examples shows that there is a line. Let's say you have a routine that does a binary search. I have a bunch of items, they are sorted, and I want to do a binary search on them. The binary search algorithm only works if they are in the right order. So here is what I could do if I were super paranoid.

Go through and verify that they are in order before performing the binary search. That would wisely replace a logarithmic algorithm with a linear one, so it's probably not the right thing to do. Where do you draw the line? ... once, so it should be there once, as a module. The insert routine says, tell me where I should put my new word, and the check routine says, tell me where I should look to see whether my word is there. Modularization, if you do it that way, means that the complicated calculation is only there once, and that makes it much easier and more likely that you get it right.

And to fix this problem: suppose someone fixes the bug on one side. Who fixes the bug on the other side? I guess once it goes live. Yeah, well, as I was saying before I was so rudely interrupted... I don't actually remember what I was saying when I was interrupted, but it was something about modularization and the two different versions of this program. I actually don't have much else, and I appreciate that many of you have come back, so let's finish up with a modest number of quickies.

Something happened here that you probably didn't notice because it went by so fast. Right at the top of the first of those functions, the dictionary insertion function, the comment says: insert, and return one if the word was already in the dictionary and otherwise return zero. That's what the comment says. What does the code say? The code says it doesn't return anything. So the comment and the code disagree. Which one is right? There is no ambiguity. There is an aphorism in the army that says "believe the terrain, not the map". The code is the terrain and the comments are the map, so the comments could well be wrong. Let's actually look at some comments.
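The dictionary code from the slides is not in the transcript. The two points made about it, that the search should exist once and be shared by insert and check, and that a function's comment should match what it returns, can be illustrated with a hypothetical sketch like the one below. The data layout, the names, and the extra -1 return for a full table are all assumptions.

```c
#include <string.h>

#define DICT_MAX 100

/* A tiny sorted dictionary of borrowed string pointers (hypothetical layout). */
struct dict {
    const char *words[DICT_MAX];
    int count;
};

/* Binary search, written once. Returns the index where word is, or where
 * it would have to be inserted to keep the array sorted; *found says which. */
static int dict_find(const struct dict *d, const char *word, int *found)
{
    int lo = 0, hi = d->count;          /* search the half-open range [lo, hi) */

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(word, d->words[mid]);
        if (cmp == 0) {
            *found = 1;
            return mid;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    *found = 0;
    return lo;
}

/* Returns 1 if word is in the dictionary, 0 otherwise. */
int dict_contains(const struct dict *d, const char *word)
{
    int found;

    dict_find(d, word, &found);
    return found;
}

/* Returns 1 if word was already in the dictionary, 0 if it was added,
 * and -1 if the dictionary is full. */
int dict_insert(struct dict *d, const char *word)
{
    int found;
    int pos = dict_find(d, word, &found);

    if (found)
        return 1;
    if (d->count == DICT_MAX)
        return -1;

    /* Shift the tail up by one slot to open a gap at pos. */
    memmove(&d->words[pos + 1], &d->words[pos],
            (size_t)(d->count - pos) * sizeof d->words[0]);
    d->words[pos] = word;
    d->count++;
    return 0;
}
```

Because both `dict_insert` and `dict_contains` call `dict_find`, a bug in the search only has to be fixed in one place.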

It's funny: one of the things you want to do is make sure your comments actually tell you something. Well-written code doesn't need a lot of comments, but there are certain things that are critical. One is that code, whether well written or not, does not need comments that add nothing to what the code already says. So look at this. It says "ignore all signals" and so on, and then each of those lines basically says ignore this signal, ignore this signal, ignore this signal. Presumably you could do it like this, and that would be much simpler. There are times when you see comments and you think that whoever wrote them was being paid by the line, maybe. That probably doesn't happen in scientific computing, but in business stuff, sometimes you wonder. Like that: well, you know, I make more than double. These are real, by the way. I haven't made them up at all. They all come from real code, not even textbook code. What else? You'd say this is something like COBOL.

Actually, I guess if you changed the comment part, it would almost be COBOL. This one is probably C++, and this one comes complete with big box comments, so it's not very useful. Don't write comments like that. They add nothing except vertical and horizontal space to a program. You don't really want to do that.

A place where you do see a lot of comments is probably a sign that something's not right. This is actually a piece of code that I wrote. It's part of awk, and at one point in the evolution of awk I introduced a horrible, horrible bug, such that a whole class of programs made the awk compiler go into an infinite loop before it could look at anyone's data. While I was trying to figure out what the hell it was doing, I gradually worked it out, in this piece of code, which you are not expected to understand, of course. The comment density here is extremely high, something like one comment per line, and the rest of the code has maybe one comment every 20 or 30 lines at most. So when you see that sudden explosion of comments, it says that something is going wrong. Sometimes what's going wrong is that it's a piece of code that's actively wrong, it's still wrong, and someone was trying to get by by putting in comments. So the deal is, you don't want to document bad code; in other words, "I did something stupid here."

I'd rather rewrite it so I don't have to say I did something stupid. There are other kinds of implicit documentation that appear in programs. One of these is constants of various kinds. If you see 3.14159 or whatever, you know it's going to be pi, though it might be even better to see something that says pi and know that it was defined and calculated correctly somewhere else. Usually, when you see magic numbers, numbers whose meaning is not instantly obvious and that are not constants of nature like pi, something is wrong: someone has missed an opportunity to make the code easier to understand. So here's a piece of code that came from a student in my class a couple of years ago. The comment itself is kind of interesting; this poor kid couldn't figure out how to do something without a manual, something pretty basic. It doesn't really have anything to do with Java; it could just as well be done in C or C++.
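The student's code is not reproduced in the transcript. As a hypothetical C stand-in, the sketch below shows the same kind of character test twice: once with numeric character codes and once with character constants. Both function names are invented, and the "couple of other random characters" in the original are left out.

```c
/* Version with magic numbers: the reader has to know that 65 is 'A',
 * 90 is 'Z', 97 is 'a', 122 is 'z', 48 is '0', and 57 is '9'.
 * These values hold only for ASCII-compatible character sets. */
int is_alnum_magic(int c)
{
    return (c >= 65 && c <= 90) || (c >= 97 && c <= 122) || (c >= 48 && c <= 57);
}

/* Same test with character constants: the code says what it means. */
int is_alnum_named(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}
```

In real code the standard `isalnum` from `<ctype.h>` is usually the better choice, since it does not depend on the letters being contiguous in the character set.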

What are those constants? What is 65? It's 64 plus one; it's capital A. So 90 is capital Z and 97 is little a. So this basically says: do we have uppercase letters or lowercase letters? 48 to 57, those are digits, and there are a couple of other random characters. What they should be is quote-A-quote and so on for all the other characters. That way it's self-documenting: you don't need to figure out that the correct value is 97, and you don't need someone else to figure out what 97 means. So you don't want to do that.

The other thing: I was talking to people outside about how to name things in programs and whether there are naming conventions. The specific conversation was about conventions for things like capitalization or underscores in long names, but before we get to that, you really want to name things like variables in a way that explains what their role is in the program. Local variables like i, the iterator in some trivial loop like for i equals 1 to 10, don't need comments, but things like global variables do, and a function name should be self-explanatory. This is a wonderful example of how not to do it, again from my class, six or seven years ago at this point, I guess. You can probably guess who this kid's roommates and maybe his girlfriend are from the names in the middle, and I've never, ever known who she was.

Grace, that is, but I'm not sure she was properly associated with Earth. Anyway, I've given you some rules, kind of arm-waving. This isn't exactly a summary but a representation of some of the things I talked about along the way. Cleverness is almost always misplaced, but you don't want to be too silly about it. It will almost always be the case that if you write your code clearly, simply, and directly, it will be quite efficient, and at the very least you will be able to work on the parts that are not efficient and not worry about all the other parts, which aren't where the action is.

It helps to know your language, including its idioms, of course. Things like checking boundary conditions in the control flow are something you do while you're writing the code. You think: I just wrote a loop. Will it go through the loop the right number of times, or the wrong number? What happens if the limit is zero? What happens if the limit is the size of the array? Things like that. You can do that kind of boundary condition checking right at the time you type the code, and you can catch a huge number of errors before they ever happen. That's definitely something you want to do. Then defensive programming protects you against yourself or anyone else who is using your code, and that's really important. But the bottom line is that what you really want to do is say what you mean, as simply, clearly, and directly as possible.

You really want to write things that are clear, because that's the only way you're likely to get it right. It's the only way you'll be able to work on it later. It's the only way that, when you stop being graduate students and become full professors, your students will be able to figure out what the hell you did. That's why you all really want to do it that way. Now, you know all this; in a sense I'm preaching to people who already know it. So do it right, and go tell your friends, the ones who didn't stick around for the rest of the talk, that this is what they should do.

There are many reasons people give for not having time to do this. Who cares about style? The program works. Maybe it does, maybe it doesn't, but that's not a very strong argument. You can say it takes too much time to get it right, but how long does it take to fix a broken program? If you do it better the first time, it's more likely to work and you won't have to waste that unpleasant time fixing it. People say that style rules interfere with the programmer's freedom. I don't think that's a serious argument.

You can write really clean code within someone's ruleset without much trouble, and if you go out and work in the real world, let's say not in academia, you'll be forced to use some company's rules anyway, so you have no choice. The rules are arbitrary? Well, they're kind of arbitrary, but not that arbitrary. And one that's actually real: think about people in academic settings, particularly graduate students, postdocs, and even junior faculty. They don't get any credit for making the program cleaner, neater, and nicer, because that doesn't help them get the paper out, and the paper is the thing you have to get out. But if the paper is based on physics that then needs to be rethought or retracted or whatever, then maybe you're not so well off. So even for that kind of environment, I think it's worth thinking about seriously. If you get a program where all the little things are wrong, you can be sure that the important things are probably a little wrong too. Whereas if all the little things are clean, and the program is really clean and simple and you can understand what's going on, then chances are strong that it's also doing the right thing. Anyway, that's the end. Thanks to everyone for coming back after that break.

Thanks, Brian. We're running a little over on time, so we have time for some questions.

OK, I'm going to take questions, but feel free to escape; you've already invested more of your time, so I'll move on. Oh, the question is: when I say, for example, an integer plus quote-zero-quote to convert from an integer value to an ASCII value, is that portable? Yes, because it is a universal character set. Although ASCII is a US standard code, it is also the UTF encoding of Unicode, so the arithmetic for the digits in the ASCII character set works, and similar things for upper and lower case work. I still think that in some ways it's probably better to do something else, in case some other representation looks superficially the same but is different, but I think it's safe, and so I'm comfortable with that.

I wouldn't push it too hard, and as soon as you get outside the standard North American character set, I really wouldn't push it too hard. To print a character, the right thing to do is... Well, first, are you starting with the numerical value, let's say 48, or are you starting with the numerical value zero? Because the deal is, if you have the number zero, a binary pattern that is all zeros, and you want to see what it looks like as a digit, then you need to do the conversion. If you already have something like that inside and you want to do the conversion, then in C programs, or in Java in fact, that's the right way to do it, or even in Python. Was there another question? Yeah.
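The digit conversion discussed in this answer can be sketched in C as follows. The function names and the `'?'` and `-1` error values are arbitrary choices made for illustration.

```c
/* Convert a value 0..9 to the corresponding digit character.
 * Returns '?' for anything outside that range (an arbitrary choice). */
char digit_to_char(int d)
{
    if (d < 0 || d > 9)
        return '?';
    /* The C standard requires '0'..'9' to be consecutive, so this
     * works in every conforming character set, not only ASCII. */
    return (char)('0' + d);
}

/* Convert a digit character back to its value, or -1 if c is not a digit. */
int char_to_digit(char c)
{
    if (c < '0' || c > '9')
        return -1;
    return c - '0';
}
```

The same guarantee does not exist for letters, which fits the caution in the answer above: for case conversion, `toupper` and `tolower` from `<ctype.h>` are safer than arithmetic on character codes.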

Global variables, I think, are in general not the right way to do things. There are times when they just save your bacon; it's much easier to say things are up there where anybody can see them and anybody can modify them. The good news is that anyone can see it; the bad news is that anyone can modify it, and therefore it is potentially not good. There may be certain kinds of configurations where the fact that it is global interferes with something else, because of uncontrolled access. If something is a global variable, there is a real danger in using it in, say, a threaded computation, where two threads might be manipulating it in an inconsistent way and you need to lock it. Whereas if the data were stored with each thread, some of that locking disappears. But I think the trend in recent years has been toward fewer and fewer global variables and more and more discipline, more and more putting things in classes or, in Fortran, whatever the right word is, modules, and so on, so that they are better controlled. And certainly Fortran programmers use, you know, unnamed COMMON or something. No, you do. OK, escape, go shopping. Thank you, it's been a pleasure.
