
Unix50 - Unix Today and Tomorrow: The Languages

Jun 08, 2021
Okay, okay, we have slides, that's good, and I have no idea how this will go, but let's try. Well, I think one of the things that UNIX has done quite well from the beginning is being a vehicle for creating and using languages, some of them conventional programming languages. C is the obvious first in a sense, and C++ is the natural continuation of that. But then there were a bunch of other languages that were not what you'd call conventional, in the sense of "I will take on any particular programming task, no matter what it is, and be good at it"; there were a lot of languages that were somehow smaller and more interesting. So let me give you an example of this, and let's see if this works.
I was once quoted as saying that if I were to be stranded on a desert island with only one programming language, it would have to be C, and I think that's probably still a correct statement. But anyway, let me give you an example where this breaks down. Here's an exploratory data analysis; "exploratory data analysis" is actually a phrase that I think is associated with Bell Labs from the days when John Tukey, the world's leading statistician, was still here. Here's an example that actually came from one of my students at Princeton, who was bad-mouthing his roommate, a geoscientist. The homework that someone in Geosciences had given these poor kids was: okay, we have a long list of volcanic eruptions here, and we want to find all the ones that have magnitude greater than six. And then the question is, how is this done?
Well, at least as my student told it, mostly to his roommate's discredit, the roommate was doing it by hand. Now, no one here would do that. What you would do instead, of course, is write a program in your favorite programming language. Okay, so as I said, I've been quoted saying that C would be the right language, so when I heard this story I decided to try it. I went and wrote this program in C, and here it is. It didn't get any stylish comments from me, but let that pass.
It took me an embarrassingly long time to get this working right. First I messed up fgets's arguments; I never get them right. Mike, it's all your fault, because, you know, does the file pointer go at the beginning or the end? I got it wrong the first time. And then the question of what it returns: does it return -1 or NULL? I got that wrong too. Then I had to parse the lines; the fields are separated by tabs, so I had to use strtok, which has to be the worst function ever created in the C library, and I don't know who to blame for that, although if anyone wants to raise a hand... Everyone recognizes it.
I finally got that part right, and then the scanning. Well, I said %d; I want to scan and print. I got the ampersand right, okay. But unfortunately I forgot that %d is a decimal integer, so I had to say %f. But oops, wrong again: because I was using doubles, it had to be %lf. So because of all this it probably took me twenty minutes to half an hour to get this program working. It works, but it's not solid; if someone gave me a number that was over a thousand digits long it would fail. But overlooking that, it actually works, and runs pretty fast, as you can imagine.
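A minimal sketch of what that C program plausibly looked like, reconstructed from the description above (the tab-separated format and the magnitude being the third field are assumptions; this is not the actual code from the talk):

```c
#include <stdio.h>
#include <string.h>

/* Print each line whose third tab-separated field is greater than 6.
   Fixed-size buffers, so very long lines or thousand-digit numbers
   would break it, exactly as conceded in the talk. */
int main(int argc, char *argv[]) {
    char line[1000], copy[1000];
    FILE *f = (argc > 1) ? fopen(argv[1], "r") : stdin;
    if (f == NULL) {
        fprintf(stderr, "can't open %s\n", argv[1]);
        return 1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {   /* buffer first, FILE* last */
        strcpy(copy, line);                 /* strtok clobbers its argument */
        char *field = strtok(copy, "\t\n"); /* first call takes the string... */
        int n = 1;
        while (field != NULL && n < 3) {
            field = strtok(NULL, "\t\n");   /* ...later calls take NULL */
            n++;
        }
        double mag;
        if (field != NULL && sscanf(field, "%lf", &mag) == 1 && mag > 6)
            fputs(line, stdout);            /* %lf, not %f: mag is a double */
    }
    return 0;
}
```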
So I rewrote it in my second favorite programming language, awk, where the whole thing is essentially the one-liner $3 > 6, and as you can imagine, it probably took me less than a second to write. It worked the first time, it's robust, there's nothing wrong with it. And what this says is that there's a trade-off between the kinds of languages you might use and the tasks you want to perform with them; certain languages are completely appropriate for certain tasks. I wouldn't want to write an operating system in awk, but for other things the right little language is a good option. And there's a saying that goes with this: Benjamin Whorf was a well-known linguist in the first half of the last century, and he had this wonderful phrase about how language shapes the way we think and determines what we can think about.
He was talking primarily about the languages used by Native Americans in the Southwest, but I think it applies even more to the artificial languages we create to tell our computers what we want done. The notation you have at your disposal makes a big difference: the better it fits your model of what you need to do, the more likely you are to write a program that works well and runs fast. There is another quote that goes with this one that I have always liked, from Alan Perlis, who says that a programming language that doesn't change the way you think isn't worth learning. There's probably some contrapositive of that which explains why I can't get into Haskell, but that's fine. Anyway, what I wanted to talk about briefly today is small languages, that is, languages that are more toward the awk end of the spectrum than the C or C++ end. A small language, also sometimes called an application-specific or domain-specific language, is a language that focuses on something small.
They're not trying to take on everything in the world; like I said, they're trying to take on a relatively small part of it, but by virtue of that, maybe they can do a better job of it. They don't have to be procedural or Turing-complete; in other words, with some of these you couldn't, even in principle, write an operating system. But that style of declarative language is often quite useful, as we will see. And the interesting thing, and I think this is the important thing for anyone here who really likes programming and building tools and so on, is that most of these special-purpose languages are comparatively small, small enough that you can imagine building one yourself. Conversely, I can't imagine building even C++ by myself, but many of these other things are much smaller. So here is a list that is by no means complete, just the ones that came to mind as I was thinking about what I would say here today.
Regular expressions, for example, are ubiquitous in UNIX. I suspect that was Ken Thompson's influence originally: they appeared in the QED text editor, which he brought with him from Multics, and then in simplified form in ed on Unix. Then there is the shell itself as a programmable tool; the whole document preparation suite, which several people here have worked on, including Lorinda, Mike, and Jon Bentley; language development tools like yacc, from Steve Johnson (wherever you've gone), and lex, from Mike, and make, from Stu Feldman; awk, which we'll talk about in a moment; S, a statistics language that is now best known through its clone R; and AMPL, the modeling language that Bob Fourer, Dave Gay, and I made. These are all languages that let you do things more conveniently in more limited domains. So let's take a look at one of these in more detail, and it's entirely appropriate because I have the perpetrators very close at hand. Awk is a language that Al, Peter, and I worked on a little more than 40 years ago, when in fact we were in adjacent offices.
Someone asked me earlier how you can have a successful career. One thing that helps enormously is having super good people in the office next door, so I highly recommend it to all of you. Fortunately, Bell Labs has always been full of super good people; there are people like that everywhere here, so it will probably work out. Anyway, awk is a language for pattern scanning and processing, as it says there, and it was intended for really very simple kinds of tasks, like finding all the places where the third field is greater than six. Another example, one we used a long time ago when cards were more prevalent, was this idea: print the lines whose length is greater than 80, in other words, things that wouldn't fit into a card image; and other simple transformations like that.
I guess everyone here has at least heard of it, and many of you, I hope, have used it, but if not, that's okay. The basic idea of awk is something called the pattern-action paradigm, and this is something I learned from Al Aho. The idea is that the structure of a pattern-action program is a sequence of patterns, and for each pattern there is an action; all the program does is read the input and say, hmm, that pattern matches, so let's do the corresponding action. It's a very, very natural way to express certain types of computations:
the kind of thing where something goes in, is processed, and some of it comes out again. Patterns can be regular expressions or numeric or string expressions, and you can mix and match however you want. The basic loop is, in effect, a triple nested loop: it loops over a list of files; for each file it goes through the lines; and for each line it goes through the patterns, and if a pattern matches, it runs the action. So you can practically see how this would work and what the implementation might look like.
I'm not going to give you a syntax lesson, just a couple of very simple examples to show some of the things we did to try to make it easy to use, to make the average awk program just a line or two long. The first example here is a pattern without an action: it prints each line that matches the pattern, and the pattern is that the number of fields in the line, NF, is greater than zero; in other words, if the line is not blank, print it, printing being the default action. The next one is a little more complicated, twice as large, approaching a real program.
What it does is this: the first part is an action without a pattern, so it applies to every line, and it just increments a sum by whatever is in the first field; then END, a kind of pseudo-pattern, says print the resulting sum at the end. So this just adds up the first field of a data set. Again, a lot of it comes for free: you don't have to initialize the variables, the sum starts at zero; you don't have to specify field splitting, the fields are split automatically; and of course you don't have to say anything about the input processing. And the last one is one that Al and I could probably write blindfolded:
it counts the frequency of words in something. You read each line, and for each field on the line you increment an element of an associative array, an array whose subscripts are arbitrary values, usually strings, like a dictionary in Python or a hash in Perl or whatever your language has; and at the end you print them all out. It's a very, very simple example, but it was a kind of touchstone example for us of what you might want to be able to do very, very easily in a language like this.
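The three examples just described, written out as the classic awk programs they correspond to (these are the standard textbook forms, so only their exact spelling here is an assumption):

```awk
# 1. Pattern with no action: print every non-blank line.
NF > 0

# 2. Action with no pattern, plus an END pseudo-pattern:
#    sum the first field of every line, then print the total.
      { sum += $1 }
END   { print sum }

# 3. Word frequency count using an associative array,
#    printed out at the end.
      { for (i = 1; i <= NF; i++) count[$i]++ }
END   { for (w in count) print w, count[w] }
```

These are three separate programs, of course; each runs as awk '...' file.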
Okay, so how do you build one of these things? I didn't know, but fortunately I had people around me who did. One of them was Al Aho, the author of a famous book about compilers, the Dragon Book, and I will not talk about the cover of the Dragon Book. One of the things it tells you is how you make a compiler: there is a lexical part, there is a syntactic part, then there is code generation, and out the other end comes an executable program. Well, we didn't want to do that, because a compiler means you have to generate code for a particular machine; the cheaper way is an interpreter, which has essentially the same picture except that the bottom part is easier, and that's what we did. And when I say we: Al did much of the design of this and Peter did the implementation, so if it doesn't work, it was Peter's fault at the time, not mine. But we're still not home free, because how on earth do you write it?
The screen went blank; I'll probably have to look over here, which means I've talked too long. Anyway, how do you do this? How do you build one of these things if you aren't a compiler expert or a language expert or anything else? It turns out that right around that time, a few years before this, Steve Johnson had created this wonderful program called yacc, which lets you define the grammar of a language and attach semantics to it, so you can say what the structure of programs in the language is, and it then does a lot of the work for you, basically creating a parser. And shortly after that, Mike created this program called lex, which takes care of the lexical analysis part, so you don't have to do that either. And that meant that much of
the hard stuff around building a compiler was taken care of by tools that are themselves languages. So what we have is a language we're designing, called awk, and it's built using tools like yacc and lex, which are languages themselves. Okay, and how is the thing actually built? Well, you could specify the build by just writing a bunch of commands in the shell, but that's a bit tedious, and it gets worse as the build process gets more complicated, because now you have to run your grammar through yacc and your lexical stuff through lex, and then everything goes together into the C compiler. So instead you use this thing that Stu Feldman created called make, a very declarative language that says: this is how you create something from its parts, depending on when those parts were last updated.
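A sketch of what such a makefile looks like (the file names here are hypothetical, not awk's actual source layout; recipe lines must begin with a tab):

```make
# Grammar goes through yacc, lexical rules through lex, and the
# generated C is compiled and linked together with the run-time.
awk: y.tab.o lex.yy.o run.o
	cc -o awk y.tab.o lex.yy.o run.o -ly -ll

y.tab.c y.tab.h: awk.y
	yacc -d awk.y

lex.yy.c: awk.l y.tab.h
	lex awk.l
```

make's implicit rules turn each .c file into a .o, and nothing is rebuilt unless one of its prerequisites is newer.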
Now, this was a time when machines were very, very slow, so you didn't want to do any building that you didn't have to do, and make was a big help because you never had to worry about what was up to date or not; you just said make and you were done. Fine, so far so good: now we have this program, awk. It's defined with a parser generator, yacc; its lexical analysis is done with lex; it's built with make; out comes a.out; you ship it. In some places, you know, you'd run a test
case or two and call yourself done. We decided to do a little better than that, so we started building a test harness; let me show you one part of it. Awk contains regular expressions, and thanks to Al's work on egrep that code went directly into awk, so how do you test it? Regular expressions are themselves a very small language for specifying patterns of text. So what we did was create another language, a very, very focused one, that just says: this regular expression matches this input, this input, this input; it does not match this input, this input. So we have a language for specifying regular expression tests, and now we have to execute those tests. Well, what's the obvious thing?
You run a program over the tests to create a bunch of shell commands that will run them. And what language do you use to generate those test sequences? Awk, obviously, because that's what it's good at. So it creates a sequence of shell command lines that look exactly like this, and then you feed that into the shell; if everything goes well it says nothing, and if something is broken it complains about a particular test. This is one part of a very extensive test suite for awk; there's another one just for expressions, and a lot of other things. But notice how many languages are in play here, all kinds of different languages.
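Here is a hypothetical flavor of such a test specification, and an awk one-liner that turns it into shell commands (the format is invented for illustration; the real awk test suite's format differs):

```awk
# tests.txt: pattern, expected result, input, e.g.
#   a*b   match     aaab
#   a*b   nomatch   ca
#
# Generate one shell command per test and pipe them into sh;
# a passing run prints nothing, a failing test complains.
awk '{ op = ($2 == "match") ? "||" : "&&";
       printf("echo \"%s\" | grep -q \"%s\" %s echo \"fail: %s %s %s\"\n",
              $3, $1, op, $1, $2, $3) }' tests.txt | sh
```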
Okay, so the thing is working: it shipped, it was carefully tested, nothing can go wrong. Then what you want is to get people to use it, and that means documenting it. How do you document anything? Well, it turns out we had tools for that too. You can write manuals, all kinds of documentation; the UNIX Programmer's Manual that Doug and Andrew tended over the years is in here somewhere. And there is a sequence of tools: grap, which Jon Bentley and I used to create graphs, the pic program, tbl, which Mike did, eqn, which Lorinda and I did, and troff from Joe Ossanna, all used to create a finished book-sized document. That had the advantage that all the programs in the book were tested automatically, so in principle, at least, no typos were introduced along the way. So what you can see is a lot of languages working well together, and I think this is one of the things that UNIX in the early days did very, very well.
It was doable for us and it was a lot of fun, and I think there were lots of virtuous circles, or virtuous cycles, depending on how you picture it, where one thing would improve something else, and you could then feed that back in later. It is absolutely true that no programming language does everything for all people; again quoting Alan Perlis, there will always be things we will want to say in our programs that in all known languages can only be said poorly. So think about that when you're designing languages. But meanwhile... oh yes, Bjarne, I invite you up on stage. So, in a sense, I'm a generation after Brian and the people you saw before, so I feel a little young here, and I'm going to talk about the big languages in this game, C and C++. They were among the languages created around here, in Building 2 mostly, and this was a wonderful place to work and learn.
The purpose of the exercise is to build things that people can use. When we talk about languages, or at least when someone talks about languages, they can forget that the purpose of the exercise is to provide useful tools, not just to write really interesting academic articles and see how smart you can be in two-column format. This is what keeps me going. There's a lot of good stuff, directly and indirectly, on this slide, and maybe I should just show this slide and get off the stage, but of course I'm not going to do that. This is, from the C++ perspective, how I see the world. There are two things that really must be done. We must use the hardware well: every cycle, every byte. People have been telling me forever that efficiency doesn't matter and that hardware is infinitely cheap and things like that; I don't buy it. It is our business to use hardware well, and nowadays that also means batteries and latency, so networks and things like that. On the other hand, the world at that level is quite unpleasant: it's cold, it's unforgiving. So we work on abstraction, and you see the top line there: special-purpose abstraction, general-purpose abstraction. When I got to Bell Labs and got a job, they told me to do something interesting. I looked at the names around the doors and realized that "something interesting" around here is not what it is in many other places.
I had to do something interesting, so I married the abstraction stuff to the efficiency stuff. Brian and Dennis were down the hall, so for the low-level stuff I chose C, a great language for that, and from Simula I took classes, with better static type checking and very flexible abstraction mechanisms. That wasn't cheap, so I had to make it cheap, because I wanted to work with systems; I actually wanted to build a distributed system, and if we hadn't been distracted by designing a language, maybe we would have done something interesting. Basically, from these two languages, C and C++, comes much of the world of modern programming.
I'm not going to talk about the whole tree, but look at the side branches to Java and C# and so on; there's a lot there, and there has been some progress. There are two images that illustrate it. For the younger people in the audience, I can point out that that little wire at the bottom is how you plug it into the wall, and it doesn't really work unless it's plugged into the wall. We started with the language as K&R wrote it, which I saw and learned when I came here; it looked like the top example there, and that's how you sort a sequence of integers. Today we've come to something where we can sort almost anything just by saying sort: we can sort integers, we can sort strings, and everything works. The modern code also runs faster. So the progress may not look the most dramatic, but now let's deal with the lower-level stuff.
In my opinion, one of the reasons C and C++ succeeded is that Dennis had a very good idea: a simple model of what the hardware is, a sequence of objects, operations on them, and ways to compose things from things. This is really simple; hardware has never actually been that simple. There were registers and all kinds of complicated things, and today we have FPGAs and GPUs and all that. But what you have on this slide is basically a treaty between the compiler writers, the optimizer writers, and the machine architects, and it has held up well for about 40 years. Every year people tell us it seems too simple, that
we have to do something else; maybe one day we will. But this is actually the basis of a lot of things that have worked, and as hardware has changed, the languages have changed. If you really want to build economical, efficient, reliable systems, you end up with something like this. I'll quote Dennis again: C is a strongly typed, weakly checked language. I took that as a challenge; I wanted it to be at least as strongly typed, but rather less weakly checked. We started from something where you could declare that sqrt and qsort each took arguments, maybe, and then you could call them with anything and hope for the best. The first thing I did when I tried to compile C++ was look at that and say: that's just not good enough. Everyone agreed; the question was what to do about it, and my department head at the time, Sandy Fraser, said: well, just do it right. So I added argument checking and conversion, so that sqrt(2) would no longer fail just because 2 was not a double; the compiler now knew a double was needed and knew how to convert things. And while I was at it, I changed the syntax of declarations so they were more regular and easier to handle. The only thing we had in those days to check that separately compiled parts of a program actually matched was a program called lint, which people tended to forget to run, so I managed to trick the linker into checking these things, so that at least we couldn't run things that didn't match. So the main early features of C++ were classes and templates. We wanted user-defined types, we wanted some form of resource management, and we got object-oriented programming from Simula, while templates appear when you want to parameterize things with types or numbers, giving you generic types and generic operations, known as algorithms. Let me show you how those two things work. We want to raise the abstraction level of raw memory: raw memory is a pointer, there is something at the end of it, and there are a lot of them; that's the representation at the bottom. We want to elevate it to something where you construct and initialize, you can copy and move, you have specific access operations, and you have a destructor that cleans up the mess at the end. That's the starting point of C++, the first thing we get to, and basically class means user-defined type.
I borrowed the name from the Simula guys, so if you ever write in a programming language that calls a type a class, you can thank Kristen Nygaard. And this is parameterized on the element type, so everything is fine and reasonably simple; it's also efficient, there is no fat here. So what you can do is handle your resource management by having types and initializing them with values: there's a vector of doubles, a vector of strings, a vector of fstreams (fstreams are file handles with some associated buffers), or we can have a vector of lists of threads, and you can start them empty and fill them. The point here is that what I showed on the previous slide, constructors that create things and destructors that clean things up, works recursively, so at the end of the block here, anything still owned by these types will be cleaned up by the destructors. That's the basic idea.
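A minimal sketch of that idea in modern C++ (illustrative; not the slide's exact code):

```cpp
#include <fstream>
#include <list>
#include <string>
#include <thread>
#include <vector>

void demo() {
    std::vector<double> vd {1, 2, 3.5};       // constructed and initialized
    std::vector<std::string> vs {"hello", "world"};
    std::vector<std::fstream> vf;             // file handles with buffers
    std::vector<std::list<std::thread>> vt;   // start empty, fill later
    // ... use them ...
}   // destructors run here, recursively: memory freed, files closed
    // (threads must be joined before destruction, of course)
```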
I should point out that some resources are not memory, which means that if you think a garbage collector will help you here, you are wrong: it will forget about the locks you hold, the file handles that should have been released, the threads that are still running. This approach actually works. The other thing we needed for generalization was some way to deal with generic types. There are people who think that was invented much later, but I actually wrote a paper back in '81 saying you needed the object-oriented stuff and you needed the generic stuff, and I conjectured that you could do the generic stuff with macros. I was dead wrong, but sometimes it's a good idea to have the right problem even if you have the wrong solution. We want to parameterize things: types, functions, things like that. We can parameterize containers with the types of their elements, we can parameterize an algorithm with the kind of container it takes, and we can parameterize with the operations we apply; all of that is very general, generic in three dimensions. The examples are containers, threads, locks, and operations like sort or merge, or finding whether something is in a container. Basically the goal of these things was to be extremely general and flexible. My statement at the time was that I wanted something that could do things I couldn't imagine, because I have a healthy respect for my limitations in that regard.
I also wanted zero overhead, because I really wanted to be able to afford things like that vector in direct competition with a C array, so we couldn't afford a lot of indirections. And on top of that I wanted well-specified interfaces on these generic things. Unfortunately, nobody at that time knew how to do all three of those at once, so I took the first two, and the C++ community suffered quite a bit over the next 20 years. But basically, for generic programming, I can say I want to sort anything that is, more or less, a sortable range; the ISO standard defines what that means. A string qualifies: it's a random-access range and its elements have less-than; that's the definition. An array? Yes, I can sort that. So basically I would like generic programming to be like ordinary programming, with nothing particularly complicated about it. And most of the time we have to go a little further: we want to pass in operations. Say I want to find something in some sequence, and the only thing I need to know about the sequence is that I can loop through it; none of this random access, we just want to read one element at a time and apply a predicate that says whether we found what we were looking for. So we can take a list of integers and look for something greater than 7, or a list of strings where the operation I specify is that the string contains some vowels.
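A sketch of those two examples using the standard find_if algorithm (illustrative; the slide's code may have differed):

```cpp
#include <algorithm>
#include <list>
#include <string>
#include <vector>

int main() {
    // One-pass traversal of a list, predicate supplied as a lambda:
    std::list<int> li {3, 9, 7, 2};
    auto p = std::find_if(li.begin(), li.end(),
                          [](int x) { return x > 7; });    // finds 9

    // Same algorithm, different element type, different predicate:
    std::vector<std::string> vs {"zzz", "crwth", "area"};
    auto q = std::find_if(vs.begin(), vs.end(),
                          [](const std::string& s) {
                              return s.find_first_of("aeiou")
                                     != std::string::npos;  // has a vowel
                          });                               // finds "area"
    return (p != li.end() && q != vs.end()) ? 0 : 1;
}
```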
This is reasonably simple and almost optimally efficient. There's no overhead, since it maps directly onto that memory model I showed, Dennis's idea. Now, how do we define interfaces in this world? We do it using compile-time predicates, that is, functions that return boolean values and that we run in the compiler. So an input range is a range whose iterator is an input iterator, which means you can read from it one element at a time; and a range is something with a beginning and an end. It's the same for a sortable range:
random access and less-than, as it says in the standard, so that's how I specify it. Again, this isn't rocket science; it's actually first-order predicate logic, so it's very general and very simple to write. This is something that has been working for a while but becomes official in C++20 next year. C++ is controlled by a fairly large and unwieldy standards committee; you can see some of them there when the committee is in session, and with that many people it's very difficult to get things done, but C++20 is going to be great.
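To make the earlier point concrete, here is roughly how such compile-time predicates look in C++20 (simplified relative to the actual standard library definitions):

```cpp
#include <concepts>
#include <iterator>
#include <ranges>

// An input range is a range (it has a begin and an end) whose
// iterator is an input iterator: readable, one element at a time.
template<typename R>
concept input_range =
    std::ranges::range<R> &&
    std::input_iterator<std::ranges::iterator_t<R>>;

// A sortable range: random access, and elements that can be compared.
template<typename R>
concept sortable_range =
    std::ranges::random_access_range<R> &&
    std::sortable<std::ranges::iterator_t<R>>;

// Declaration only: sort anything that satisfies the concept.
void sort(sortable_range auto&& r);
```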
Everything that started here, in Building 2, is still going on, and it's not just about languages, of course; a lot of this is still going on and we are nowhere near done. So I'll say a word about languages: no language is ideal for everything and everyone. There are so many languages, and so many language fanatics who think they have the solution. It's very possible they have the solution for something; I mean, many of us think we can do well in certain areas. But the idea that there is one language for everything seems strange to me. I want mathematical precision from anyone who builds engineering things for me:
I don't want planes to crash, I don't want my network to go down, and so on. On the other hand, there is ease of use for most people, and it is really difficult to convince people, and the educational establishment, that this distinction is important, and that a language is simply the center of a tool environment, not the whole thing. You never write a program in just one language; you write it in a toolchain and a tooling environment, so the representation of the idea should be something beyond raw source text, something that is easy to manipulate and strongly typed. That is something we got wrong with C and C++, because the source text is the definition, and that is unmanageable, sometimes even unreadable. Anyway, we can do much better, so if anyone tells you that this is the end of language development, or whatever, don't believe it. We can do much better. In my particular world, which is not unusual,
people want a smaller and simpler language. Of course they do; almost all languages are big and complicated. And in addition to a smaller, simpler language, they want about two more features; each of you wants one or two features, and they are not the same ones. And everyone also wants full backwards compatibility: don't break my code while you fix it and simplify it and add to it. And by the way, people use different implementations, different toolchains, so what they want to be compatible with is not the same either. This is difficult, and of course we also want to interoperate with other languages and systems.
This is hard. I mean, language design for a general-purpose language is almost impossible, and by the way, this is where every general-purpose language ends up if it's successful. The only way not to fall into this trap is to fail, and we don't want to fail. So that's basically what I wanted to say. Hello everyone, my name is Tom Van Cutsem, and I am a department head here at Bell Labs and a researcher. If Bjarne was feeling young, I don't really know what to say, other than that it is genuinely humbling and a great honor to present here today. The question I would like to talk to you about today is whether, in the future, we will pair-program with an AI.
Now let me start with a different, more general question: what makes software developers more productive? I found this definition of a programmer on the web: a programmer is an organism that converts caffeine into code. So maybe the answer to my question is coffee, and I'm sure many of you would agree. But more seriously, what makes software developers more productive is their tools, and in fact, every year the ACM gives a Software System Award to a system that has had a great impact, and UNIX received the very first one, in 1983. So I'm going to take a somewhat broader perspective than purely programming languages; think of tools as things like make, things like a compiler
suite, things like integrated development environments, and so on. From that perspective, I'm going to talk about some important trends in the software industry and how they might influence the way we build tools for languages in the future. The first trend is one that I'm sure you've all heard about: the rise of machine learning. To put it in historical perspective: when computers first came on the scene, we used them for numerical computing, and the pioneering language for that was of course Fortran. A few decades later, we discovered that we could use computers for tasks totally different from simulations, so we had algorithmic computing, with languages like Algol and Lisp as the real pioneers, and CPL, one of those influential languages you've never heard of, which had a huge influence on BCPL and then on what came later. Fast-forward to today, and we find ourselves in what some people call the era of cognitive computing, where we figure out how to make computers good at prediction tasks, recognition tasks, classification tasks, and so on. And if you look at the pioneering systems, the next-generation systems in that space, it's interesting to note that they are not completely new programming languages; they are more what some people call embedded domain-specific languages. For deep learning, for example, there are Theano, Torch, TensorFlow: these are all libraries implemented in a host language, in this case all of them in Python, which seems to be the dominant language for machine learning today, and that's a topic that will come up again in a later talk.
Machine learning is all about prediction. Particularly when you're working with textual data, a very important task for a machine is to try to predict the next word in a sentence. For example, if I give you the sentence "I thought I would be on time, but I finished five minutes ..." and ask you to fill in the blank, you probably have an idea of what words would fit in that context. Now, if you train a machine to be good at this task, what you get are the tools we are all familiar with today, such as autocomplete in our chat clients and our email clients. Interestingly, you can apply this same game to code: if I give you this line of C code and ask you to fill in the blank, what should go there?
So, I thought everyone was going to go with i++, but that's okay, ++i certainly fits well too. And of course we don't really know if this is the right answer; maybe we would need more context, but it is certainly a very high-probability answer. More formally, when you play this guessing game, what you are really trying to do is estimate a probability distribution, a conditional distribution: what is the probability of seeing this token, given the tokens that came before. The state-of-the-art tools we have today for estimating that distribution are neural networks, most commonly recurrent neural networks: gadgets into which you feed tokens, and after a bunch of calculations, matrix multiplications, what comes back is a complete probability distribution over all the possible tokens in your language, or in your corpus. And it turns out that code is even more predictable than natural language; recent research confirms this. What is on this graph is the entropy of predicting the next token for a given corpus; the lower the entropy, the more predictable the corpus, and what you can see is that for C, Haskell, Java, Ruby, and Clojure the entropy is consistently lower than for the natural languages English, German, and Spanish, which is interesting. And if you have enough code to train on, and enough computing power to sustain a very large neural network, then you can build tools like this. This is from a company called TabNine, and they created an autocomplete tool: every time a developer presses Tab, this window appears, and the suggestions in that window for code completion were learned by a machine learning algorithm just by reading millions of lines of C++ code. Interestingly, the company did this not only for C++ but for, I think, six or seven different programming languages, and this is the same tool operating on a piece of Python code. And there is an interesting observation here, a general trend: previously we would have implemented, say, completion tools by manually encoding the syntax of Python or C++, and to some extent the semantics; these tools learn the syntax of a language, and to a lesser extent its semantics, purely from data. So the second trend I would like to talk about today is what I would call the Cambrian explosion of software. Today there is an enormous diversity of languages, tools, frameworks, and components that developers can choose from to build systems. Just one data point: if you take the three most used programming languages today, which are in fact JavaScript, Java, and Python, each of them has a package repository where developers can publish ready-to-reuse modules or libraries. For JavaScript, the package manager is called npm, and at the beginning of this year it crossed the threshold of one million packages available in the repository, growing at a rate of more than 750 new packages published there every day; not updates to existing packages, completely new packages. So this raises a new kind of problem: how do we help developers find the most appropriate libraries to do their jobs?
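As a toy aside on the first trend: the next-token game can be sketched with a simple bigram counter (nothing like the neural networks TabNine or the cited research actually use, but it shows the shape of the problem):

```python
from collections import Counter, defaultdict

def train(tokens):
    """Count, for each token, which tokens follow it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, prev):
    """Return P(next token | previous token) as a dictionary."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

toks = "for ( i = 0 ; i < n ; i ++ )".split()
model = train(toks)
print(predict(model, "i"))   # {'=': 0.33..., '<': 0.33..., '++': 0.33...}
```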
To answer that, what we need is a map; not a map in the traditional sense, and if I show a map of Java I don't mean the island in Indonesia: we actually want a map of the Java software ecosystem. This one is from our own research here. What you see in this cloud of points, where each point is an open-source Java project and we have labeled some of the best-known Apache open-source projects, comes from training a machine learning model, using techniques similar to the ones I talked about before, to predict which libraries tend to be used together in a source file. If you do that with a large enough amount of data, the algorithm learns which software libraries are semantically, or contextually, similar. And again, this is a generic technique, so we didn't just do it for Java; we did it for different programming languages, and for each of them you get a map that you can explore to see what the language is used for and what its different niche ecosystems are. If you zoom in on specific libraries, for example numpy, a very popular numerical computing library for Python that actually underpins the entire machine learning stack in Python these days, you see that the model has learned to put numpy very close to scikit-learn, scipy, matplotlib, pandas, and so on.
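A toy sketch of the underlying intuition: libraries imported together in many files are contextually similar (real systems learn dense embeddings; this hypothetical co-occurrence count conveys only the flavor):

```python
import math
from collections import Counter
from itertools import combinations

# Each "file" is the set of libraries it imports (made-up sample data).
files = [
    ["numpy", "scipy", "matplotlib"],
    ["numpy", "pandas", "sklearn"],
    ["numpy", "scipy", "sklearn"],
]

cooc = Counter()
for imports in files:
    for a, b in combinations(sorted(set(imports)), 2):
        cooc[(a, b)] += 1            # count pairwise co-occurrences

def similarity(a, b):
    """Cosine-style similarity from raw co-occurrence counts."""
    pair = tuple(sorted((a, b)))
    weight = lambda lib: sum(v for k, v in cooc.items() if lib in k)
    return cooc[pair] / math.sqrt(weight(a) * weight(b))

print(similarity("numpy", "scipy"))  # higher than for unrelated pairs
```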
Now, these names may not mean much to some of you, but if you show this to a Python data scientist, they will react with enthusiasm: yes, that makes a lot of sense. So this is interesting, but of course the question is how it helps developers be more productive. We don't just train these models; we turn them into actionable tools. This graphic is a screenshot of Microsoft Visual Studio Code, a fairly modern integrated development environment. The developer is writing code and can invoke our extension, called Code Compass. When you open it, Code Compass starts reading the developer's code and figures out which libraries they are using, and then, together with the map it has built of the open-source ecosystem, it can work out which other libraries would be useful to that developer, given their particular software project.
Taking a little step back, I would say this is already an example of an AI pair programmer: we have the developer writing their code and invoking a tool, built with AI, that has access to the developer's code but also to the kind of latent knowledge in the vast amount of open-source software. So, going back to my original question, will we ever pair-program with an AI? I hope I've convinced you that the answer is certainly yes, and at Bell Labs we are actually building the next generation of AI pair programmers. With this I would like to end my talk, and thank you. So, at this point I think it's clear that many of the UNIX concepts we discussed this morning are still relevant today; I would even go so far as to say that they are more relevant today than ever. What we want to do here is take a quick look under the hood of the World Wide Streams technology being developed here, a research project that is still a work in progress, but one that leverages and extends some key UNIX concepts as we speak, and we will try to convince you, or at least argue, that we are in fact reinventing UNIX pipelines for the 21st century.
Now, World Wide Streams as a technology is relevant in contexts where you need to make sense of a huge amount of data. Data is continuously streamed from our environment, sensor data, video, any kind of stream, and it is so massive that you really have to select the relevant pieces from this abundance of available data, which is quite a difficult task. The data is also ephemeral and needs to be processed in a timely manner, before the information simply disappears, which is quite challenging. And if you as a developer want to create these kinds of applications, there are several challenges to address. One of them is clearly, as Tom already mentioned, the Cambrian explosion of frameworks and libraries available for you to choose from: complexity, but also a huge integration effort, because even if you choose the right libraries, you still have a significant integration job left. The second challenge has to do with how functionality has evolved over the last few decades.
You all know that software has evolved from fairly monolithic applications and services to increasingly fine-grained microservices, all of them addressable and connectable over the network, so the number of API endpoints available to develop against is simply huge. Again: complexity and integration effort. And then, as Mark already introduced, there is the wave of latency- and bandwidth-sensitive software applications that have finally managed to escape the confines of the centralized data center. What we see now is a general requirement for software services to be distributed: computing needs to be spread across the core cloud, facilities at the edge of the cloud, and even the device itself. So once again, you as a developer face the challenge of distributing your application's computation across this distributed computer: complexity and integration effort all over again. At Bell Labs we have been working for quite a few years on this World Wide Streams technology, a platform designed to deal with this fragmentation: fragmentation of frameworks, of functionality, and also of location. It allows you, as a developer, to quote-unquote easily create applications that take advantage of all these live streams of data and video, to attach analysis or processing functions to them, and to write all this in a monolithic way yet still deploy it in a distributed way. Now, of course, these are bold statements, so in true UNIX style and philosophy we have to think about which language, or languages, we would need to create these kinds of applications, to connect and orchestrate them; how we do the plumbing across this distributed fabric; and also how we can at all times keep oversight of these distributed, highly complex applications.
Thank you. Four years ago, when Bell Labs began the World Wide Streams research, we created a new programming language called XStream to address the future streaming needs I just mentioned, and in this talk I will position XStream as, as McIlroy might put it, the UNIX pipeline for the 21st century. Once UNIX pipelines were introduced, around the seventies, they became part of the philosophy that everyone adopted. They allowed a lot of developers to write programs that did one thing and did it well, and they then enabled the composition of those programs to create new applications. This really meant that you could put applications together so that data flowed from one program to another, and you no longer had to worry about compiling or linking those programs, or about how they handled communication: you could simply use them without knowing their internal details, no matter what programming language they were written in. Now, what if we could take those ideas and apply them in a new context, to accommodate future needs?
Personally, I see the world as one large reactive system, and it would be a dream to access any remote information stream and use that information to improve our daily lives. Unfortunately, simply pulling a raw video stream onto my laptop and extracting meaningful content from it would already consume multiple terabytes of data each month. This clearly does not scale, and therefore, to optimize bandwidth, we must distribute the logic of an application across a distributed network of processing machines, ranging from the smallest IoT sensor to the largest computing cloud. This is exactly what we aim for with our XStream language: at its core, XStream takes advantage of the UNIX pipeline concept to compose software easily.
The way this works is that each function call you see here is actually a proxy for some functionality; it uses a specification instead of an implementation. In this example, we have written an application that takes a streaming video and filters out the foreground motion before sending it to an output channel, which could be a web page, for example. Each module listed here was implemented separately in C, but composed in a high-level scripting language, and by doing this we have exposed the application's communication patterns to the platform. That is the key to unlocking distributed deployment over a wide area: you can write a single application and still leverage whichever machine is best positioned to run each piece of the software.
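In spirit, the composition looks something like this (the stage and option names here are hypothetical, not the actual XStream API):

```javascript
// Each call is a proxy: a specification of a stage, not its implementation.
// The platform decides where each stage actually runs.
const pipeline =
  videoSource({ camera: "lobby" })          // ingest a live video stream
    .pipe(foregroundMotionFilter())         // keep only the moving parts
    .pipe(output({ channel: "webpage" }));  // publish to an output channel
```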
You can then create any number of optimizers, targeting bandwidth, latency, fault tolerance, GDPR compliance, and so on: anything your application prefers or requires. Now, there are two main strategies for defining a new programming language. You can write an external domain-specific language, with its own syntax rules, a parser, and so on, or you can host your constructs on top of an existing general-purpose language, an approach commonly called an internal DSL. Although external DSLs clearly have a very strong tradition in the UNIX community, internal ones are very common today, with notable examples like TensorFlow, which is embedded in Python, and Spark, which extends Scala. I think this is mainly because an internal DSL lets developers use the tools and syntax they are already familiar with, and it also lets us grow with, and benefit from, evolutions in the host language. We therefore chose to design XStream as an internal DSL on top of JavaScript. JavaScript is today a very popular, agile scripting language, and it gives us access to one of the largest developer communities around. In the form of ECMAScript, JavaScript has evolved quite dramatically in recent years and provides many of the features you would expect from any modern programming language; for example, one of the currently proposed JavaScript extensions is the pipeline operator, and once it is included in an upcoming standard, we can immediately benefit from it. Another recent addition to the thriving JavaScript ecosystem is TypeScript, a superset of JavaScript in which you can optionally type your code to ensure that certain properties hold at compile time, and I would argue that gradual typing of this kind is probably one of the main features required to compose software easily in today's rapidly changing software world.
XStream takes full advantage of this gradual typing and can actually tell developers when a certain composition cannot be made without adding a particular transform. Remember that these transforms don't have to be written in JavaScript; as I mentioned before, XStream uses a software specification as a proxy for an implementation, and this works very well in combination with using the best tool, or programming language, for the job: for example, using the latest C++ features to implement a novel video encoder. Now, when you distribute code across multiple machines, it is very important to ship the code together with the dependencies needed to run it successfully, and we have developed, and open-sourced, a new JavaScript serialization module that allows us to transmit all the required configuration over the network.
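A sketch of what the gradual typing mentioned above buys at a stream boundary (the type and stage names are hypothetical, not the actual XStream API):

```typescript
// A typed stream element:
type Frame = { width: number; height: number; pixels: Uint8Array };

// Because this stage declares its input and output types, the platform
// can check at composition time that upstream stages really produce
// Frame values, and flag the composition if a transform is missing.
function motionFilter(frame: Frame): Frame {
  // ... filtering logic elided ...
  return frame;
}
```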
To conclude: we have not yet reached the end goal of this project. I think we have to keep improving the XStream language, as well as the World Wide Streams platform, so that more people can reap the benefits of programming centrally, without needing to be aware that their program might be using their neighbor's battery. And I maintain that what we really want is simply to communicate massive amounts of information more efficiently. Thank you.
