
ELK Stack Tutorial For Beginners | Elastic Stack Tutorial | DevOps | Intellipaat

Apr 24, 2024
Hello everyone, welcome to this live session on the Intellipaat ELK Stack tutorial. Before starting the session, make sure to subscribe to our channel and press the bell icon so that you never miss any update from us.

Now let's start with the agenda. First of all we will start with the introduction to ELK, where we will discuss what ELK actually is and what its uses are in cloud computing. After that we will discuss the components of ELK, which are Logstash, Kibana and Elasticsearch, and then we will move on to the ELK flow. We will also do a hands-on demo of a single-node installation with Elasticsearch, Kibana, Logstash and Beats, you will learn how to analyze log data, and finally we will discuss some details about filter plugins. So guys, that is the agenda; now let's get started with the session.

So what is ELK? The ELK Stack is a group of open source products from Elastic. Elastic is the company behind them, and the stack is designed to help users take data from any type of data source, in any format, and search, analyze and visualize that data in real time. That is the big difference between the existing products on the market and what the ELK Stack has achieved.
The ELK Stack can help you analyze data in real time, and that is its biggest advantage compared to any kind of relational database system: it is a tool that helps you understand your data as it arrives. The heart of the Elastic Stack is a product called Elasticsearch, which powers everything built on top of it, and it works in a very powerful way that is completely different from what we traditionally run. It is mainly used to search, analyze and visualize data, and it is specifically used for log analysis. ELK is the most widely used log analysis tool in the world and comfortably beats Splunk in market share, because ELK is completely free. That said, the ELK Stack does come with a steep learning curve.

Why is the learning curve so steep? Because you need to configure all the basic components correctly yourself. With Splunk, by contrast, you can go right in and install it in a couple of hours and have it up and running, because you are paying heavy licensing costs and Splunk does most of the setup for you: dashboards and most other things come pre-configured and ready to use right away. Here in the Elastic Stack there is a fairly steep learning curve because ELK is made of three different tools that have to be interconnected, and you need to tune all of them. But as I said, by putting in that additional effort you get rewards that are not available in any other comparable infrastructure software in a company. Elastic as a company started out providing only log analysis, but today it has diversified into so many use cases, and from Facebook to Netflix there are thousands upon thousands of companies actively using the ELK Stack in their businesses: analytics and product search, enterprise search, APM monitoring and more. Elastic was founded in Amsterdam in 2012, and the company was formed to support the development of Elasticsearch and everything related to it on the business and services side. Initially Elasticsearch was the main application, and it remains the core of everything. So the stack started as a set of three tools, and I hope you already know them: Elasticsearch, Logstash and Kibana, with Beats added later. The four main components of the Elastic Stack are therefore Elasticsearch, Logstash, Kibana and Beats.
Elasticsearch is the component that stores the data, so storing the data is its job. Logstash is the one that feeds the data, much like your Splunk forwarder jobs: it is a very important component that reads data from various sources and can write to various outputs, and it will write to Elasticsearch. It does not just pick up the data in its raw format; it can also transform, modify or filter the data and then save it to various outputs, and the most common Logstash output is Elasticsearch, because Logstash collects the information and stores it in Elasticsearch. Then Kibana: Kibana is a graphical user interface, an application that provides the visualization layer, and it matters a lot because you may have a lot of data, but until you get meaningful information out of it, collecting it is a waste. That is why Kibana is also a very important component; it is the visual interface for all the data we are going to collect. Beats is the newer component, though it has been around for three to four years now: these are lightweight data shippers that sit on your different servers and send data to Elasticsearch directly. And Elasticsearch, as I told you, is the heart of the ELK Stack: a very powerful NoSQL database. Beats are small, lightweight, standalone agents without any dependencies that keep sending information to your Elasticsearch, which stores it. We will talk about all of this in much more detail, because understanding the architecture, the components of your Elastic Stack and why each one matters is very, very critical. Now let's talk about a problem statement, so you understand how ELK is going to help. Say your clients are searching for product information on an e-commerce site with a huge product catalog, and you also have a huge customer base, and you are facing long retrieval times for product information; Amazon, for example, has millions of products listed, so coming back with results can be slow. That leads to a bad user experience, and in turn you are going to lose the potential customer, because if your site is slow the customer loses interest in browsing it and may go somewhere else. So what happens?
The delay in search is attributed to the relational database used for the product catalog design: the data is scattered among several tables, and retrieving meaningful information for the user means pulling it from all of them. We all know that almost everyone uses an RDBMS, a relational database management system, and the most popular of all is Oracle. That is no accident: Oracle earns billions of dollars, and it is among the most stable database products; SAP uses it, everyone uses it, because everyone trusts it and it is very reliable. But using this type of relational database system to retrieve your product information slows things down, and it is the main culprit for the slowness in your product search.
Why? Because a relational database works comparatively slowly when you are searching across a large amount of data and fetching search results. How do we get those results? Through a query: the enterprise application sends a query to the database, and at this scale that becomes slow. That is where companies started looking at alternative ways of storing data so that it can be retrieved quickly, and that can be achieved through a NoSQL database instead of an RDBMS. SQL databases are built on the ACID properties, four pillars: atomicity, consistency, isolation and durability. Any structured query language engine that follows these ACID rules is a relational database management system, and Microsoft SQL Server, MySQL and Oracle all follow the ACID properties, because those are the principles of how a database management system should behave, where everything has a proper schema. And remember, whenever we create a proper database we always decide the schema first, because the schema is one of the most important things for your application development.
A lot of people think of it like this: say I am building an e-commerce application. For an e-commerce application there will be three tiers: the database tier, the application tier and the web tier. The database tier is where you store the customer data and the products. The application layer is where we write the actual business logic: how a search should be handled, how the content should be processed; the main business logic of the whole e-commerce application is written in your application layer. And the web tier is how we display the information in the user interface.
Okay, so of these three components, which one do you think is the most complicated and the most important to design? It is the database. Why? Because the Java requests are nothing more than applications that people write to query the database. If you have created a proper database with a well-designed schema, it is easy for your developers to write the Java code that retrieves from it, because literally 99 percent of the time any Java application is dealing with a database: updating information, extracting it, creating it or deleting it.
Most of the time, any request that originates from a Java application is a CRUD operation. Create: if you have bought something, that new data needs to be written to the database. Read: you are just reading your account information, your address, or your purchase history from the last six months. Update: you are changing your personal details, your home address or your mobile number. Delete: you are cancelling the order you placed yesterday. So literally everything goes through these operations, which means the task is to design your schema properly and, around that schema, write a Java application that reads the data and displays it to the client. We have done that for a long time, but now people wanted a new way, because we no longer have only structured data: we have a lot of unstructured data, and that unstructured data keeps increasing.
About seven or eight years ago there was not much unstructured data to process, but today we have petabytes of information that is specifically unstructured, and unstructured data cannot be maintained properly using regular SQL, because SQL needs a proper schema. It means: what are the rows and columns, what is the primary key? Everything must be defined correctly, otherwise you cannot store the data, and every record must have the same number of columns. If the table only accepts four columns and some record has five, you cannot store it; SQL expects a specific number of columns, specific data types, everything. NoSQL does not care, because it is an alternative to SQL and it has no schema. No schema means there is no rule saying the customer ID must be only a number or the name must be only a string; one customer record can have four fields and another can have six, so the shape can change completely. NoSQL is an approach to storing data that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. Columnar means the data is laid out column by column, and a graph database stores the data as this kind of structure, guys: interrelated nodes and the relationships between them, and there are databases built specifically to store data as such graphs.
Graph databases, with their interrelated nodes, have special use cases and are used only in those situations. If you look at a column store, the data is organized as huge columns: instead of being row-oriented it is column-oriented, with more columns and fewer rows, so you use columnar storage when a single customer or user carries a lot of information in each column. All of these kinds of databases fall under NoSQL: they have no fixed schema and they are not required to follow one. Large-scale companies started using them first, but nowadays most companies have use cases for them. You have probably heard the term MongoDB; MongoDB is the most popular NoSQL database, and it has become so popular that it is now around the fourth most popular database overall, so its usage has grown enormously and many people use it.
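To make that schema flexibility concrete, here is a small sketch of two hypothetical customer records stored side by side in a document database such as MongoDB, written in the JSON notation introduced just below; one record has four fields and the other has six, and the store accepts both as they are (all field names and values here are invented for illustration):

```
{ "customer_id": 101, "name": "John",  "city": "Amsterdam", "orders": 3 }

{ "customer_id": 102, "name": "Priya", "city": "Bangalore", "orders": 12,
  "email": "priya@example.com", "loyalty_tier": "gold" }
```

A relational table would force both records into the same fixed set of columns; a document store simply takes each object in whatever shape it arrives.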
So MongoDB is the most used NoSQL database, and in a NoSQL database the data is stored in JSON format. Everyone knows what the JSON format looks like, right? JSON, JavaScript Object Notation, is one of the most important data interchange formats in the world; today we cannot do without it even for a nanosecond. You can think of it as an interchange format. Look, today no two companies use the same programming language. Take your Uber app, for example.
What happens is that the Uber app sends a request to Google to get the map data, because when you open Uber, the maps you see are not Uber's own property: Uber asks Google Maps to render the map and then overlays its own information, like the car symbol, on top of it. So when applications from different companies, or different applications inside your own company written in different programming languages and different stacks, want to talk to each other, people wanted them to speak in a very simplified way, and JSON became that common language: no matter what programming language you are writing in, you send the other side a request in JSON format and it will accept your request.
It will understand your request and it will respond in JSON format as well. JSON is the interchange format: it is human readable, and it is the standard everyone works with today, so whatever your stack, you can generate a JSON request and you can read a JSON response, understand that information and process it. And note that JSON is not a programming language; it is just a small syntax used purely to represent data. JSON data looks like this: it always opens and closes with curly braces, and this here is a single JSON object.
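The slide itself is not reproduced here, but a JSON object of the shape being described, with the same kind of fields mentioned next, looks roughly like this (the values are invented for illustration):

```
{
  "id": 17,
  "type": "disk",
  "name": "data-volume-01",
  "mode": "read-write",
  "description": "primary storage volume for the billing service"
}
```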
Here you can see fields like type, description, mode and name, so this is one record. In SQL the same thing would be a row in a table, where description is a column, mode is a column, name is a column and type is a column; but in JSON the data is represented in this key-value format. This has a clear use case: searching your application data in real time. Elasticsearch is the parent component that handles all of this, because it has the ability to respond to your queries almost in real time. For example, take one terabyte, two terabytes or even a hundred terabytes of data: you run a query, and that query comes back in seconds or even less than a second. That is the capability of Elasticsearch, and it is possible because Elasticsearch does not store the information relationally; it stores it non-relationally, in JSON format. So what happens in a typical SQL setup?
You run a SELECT command, something like SELECT * or SELECT city FROM cities; it is sent to your database server and the database sends the result back. Elasticsearch works differently: it is a document-oriented database designed to store, retrieve and manage document-oriented or semi-structured information in real time, because you store the data in Elasticsearch as JSON documents. Any information that goes into Elasticsearch, whether it is your application logs or your server logs, is converted to JSON, stored as JSON, and then queried. It is schema-less and it uses sensible defaults for indexing, which means it applies basic indexing on its own.
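As a minimal sketch of that idea, assuming a local single-node Elasticsearch listening on port 9200 (the index name and fields below are made up for illustration, and the REST API used here is covered later in this session), storing a JSON document and searching it back looks roughly like this:

```
# store one JSON document in a hypothetical "products" index, with id 1
curl -X PUT "http://localhost:9200/products/_doc/1" \
     -H 'Content-Type: application/json' \
     -d '{ "title": "wireless mouse", "price": 799, "in_stock": true }'

# search it back; newly indexed documents become searchable within about a second
curl -X GET "http://localhost:9200/products/_search?q=title:mouse&pretty"
```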
In the background, Elasticsearch relies on an important component called Apache Lucene: an open source, high-performance, full-featured text search engine library written in Java. Someone wrote this highly efficient search library, and it is the heart of your Elasticsearch; Elasticsearch is built on top of Apache Lucene. Okay, let's dig a little deeper into the architecture. The Elastic Stack has four important components. People used to call it the ELK Stack, but once Beats came along the term changed: until around 2015 people called it the ELK Stack, but after 2016 it was renamed the Elastic Stack, which is something you should remember; going forward you should just call it the Elastic Stack rather than the ELK Stack. Why? Because the whole story started with Elasticsearch.
I told you that Elasticsearch is nothing but an open source, distributed, JSON-based search engine, and it is a very, very powerful one, like Google search. A search engine is simply the ability to search over a large amount of data with low latency. Then Logstash and Kibana were added, so the ELK stack originally had just those three components: Elasticsearch, Logstash and Kibana. Then the community expanded, the use cases grew, and Beats was added as an additional tool. Once Beats was added, the name ELK no longer fit, because there was a fourth product; instead of calling it the ELKB stack or something like that, they called it the Elastic Stack: still ELK, but with more flexibility to do bigger things and to accommodate the new component, Beats. Beats is a small component, which we will discuss in detail, useful for collecting information and sending it on to be stored. So now you should call it the Elastic Stack, not the ELK Stack. In this architecture, the base components are always Beats and Logstash, because those are the ones that actually retrieve the data, process it and send it to Elasticsearch. Elasticsearch is the component that stores and searches, so it is the heart of the whole product: an application that stores information in JSON format and, when you query it, answers in milliseconds with the appropriate information. And Kibana is the tool used to visualize and manage the data of our application.
Through Kibana we can create dashboards and very, very good visualizations; Kibana has excellent visualization options that are missing in most other products on the market. That means there are many different types of charts, dashboards and user interfaces through which you can represent the data in the easiest and most intuitive way, so that business people or non-technical people can consume that information properly. Not every time do you want to search; sometimes you just want to display the data in a meaningful way. But remember, people also use Elasticsearch extensively for searching.
That is why, when you go to Elastic as a company, they describe themselves as a search company: they help you search a large amount of information in a very short period of time. Now let's look at the architecture, the ELK flow. Say you have three servers, and from these three servers you need information, specifically their log information. When you write a Java application, or any application, even a .NET application, application logging is very critical: only through the application logs can we really know whether there was a problem at midnight, or yesterday, or at any point while the application was running. Everyone has faced the situation where production services are not working properly, so once you receive that alert, what happens?
We log into all the servers where the application runs and we always troubleshoot the problem through the logs, because application logs are the first point of troubleshooting in any company. But that application may be running on eight servers, because it is a distributed or high-capacity application; you cannot go to each and every server and search them one by one, it takes far too long, and since the logs are separate you cannot see the full picture the way you could if all the logs were combined and you searched on top of them. That is where one of the most fundamental use cases for ELK comes in: analyzing logs at scale, combined, from a single location. You bring the logs from all eight servers into one place and then you can search and explore them to get meaningful information in a very simple and efficient way. And it is not only historical data: ELK lets you search in real time, because you keep feeding your ELK stack and you can keep querying it as data arrives. So it is not about old data, guys; a lot of tools only deal with data that was collected in the past, but the ELK stack constantly collects, constantly stores and lets you analyze almost in real time. To do that you install a Beats application: a small program responsible for collecting information from a source. It can send the data directly to the Elastic Stack, or, if you need some kind of transformation, meaning the data you collected is not in a usable format and you need to slice or reshape it, you send it to Logstash; Logstash filters and transforms it and then stores it in Elasticsearch. If the information is already in good shape, Beats can send it straight to Elasticsearch. Elasticsearch, as I told you, is a search engine that stores that information, and then comes Kibana.
Kibana is where you visualize it clearly. In Splunk you do not have these separate moving parts: you download the Splunk installer, install it, and then the Splunk license works on data volume. Do you know how you pay for Splunk? If you ingest, say, 15 GB of data per day, and that 15 GB is processed and stored in Splunk's own format, you pay for that volume. That is why Splunk quickly becomes a very expensive tool, and many companies are not interested in spending that kind of money on the log analysis side of their business; that is a big part of why the ELK stack is so incredibly popular. I worked with Splunk a couple of years ago and we used to breach the threshold easily; Splunk generates a notification of how many times you have exceeded your licensed data volume in the last 15 days, that information goes back to Splunk, and Splunk asks you to pay more money. To be fair, Splunk is also very easy to install: once you install it you can decide which storage to use, how your cluster is laid out and where you want to store the data, all from a single UI console.
All the cluster information is there in one place. The ELK side is also easy to install, but to maintain it in production you need to put in a little more effort. So let's talk about these products in detail. Elasticsearch, as I already told you, is based on the open source library called Apache Lucene, a search engine library for full-text data. To create Elasticsearch as an application, they took Apache Lucene and made a lot of modifications on top of it, and that became the core of Elasticsearch; it can also be used as a NoSQL database and analytics engine. But remember, do not try to use Elasticsearch as your primary application datastore: do not keep your customers' system-of-record data there. You can store it, but it is not efficient for that; Elasticsearch is efficient for search and analytics, not for general-purpose storage. If you really want to, you can, but it is not what it is designed for.
For general-purpose storage your SQL database is always a good choice, but for search and analytics activities Elasticsearch is what has become very popular. As I told you, Elasticsearch is basically schema-less, works in near real time and exposes a REST interface. What is a REST interface? REST is again very, very important, and if you know it well you will realize that practically the entire internet runs on REST APIs today. REST is nothing more than an application programming interface style that uses HTTP requests to collect or request information.
Consider two applications that want to communicate with each other. We have RPC, TCP, SSH and so on; there are different protocols available for applications to talk to each other, whether they are in the same location, the same data center or in different geographies. Fundamentally, we need an agreed set of rules for how to communicate; that standard is called a protocol. But as more and more communication moved onto the internet, people wanted a much simpler and more efficient way for applications to talk over a uniform interface, and that is where RESTful communication started.
REST is a kind of API. What does API mean? It is the exact set of rules for how an application can be talked to: if you want to talk to me, I set some ground rules, and those ground rules are the API. How can you communicate, what is the syntax, what is the protocol, what is the port? Everything is defined through the API, and APIs expose your application to internal or external requesters. A REST API is an application programming interface that uses HTTP requests to perform operations, typically through verbs like GET, POST and DELETE. RESTful API, or REST API for short, is based on representational state transfer, an architectural style of communication commonly used in the development of web services; so wherever you see web services, REST tends to be used.
So REST is a set of rules: applications communicate over the HTTP protocol, and the data they exchange is JSON. That is why REST APIs and JSON go hand in hand: REST APIs are web services that let different requesters talk to each other, and the industry settled on HTTP as the transport and JSON as the data format, which is why wherever you see JSON you usually see a REST API. Understanding HTTP is therefore very critical nowadays; HTTP has become the basis of every service talking to every other service, because it is simple to set up, easy to secure by putting SSL on top of it, and easy to understand without any very complex knowledge. That is why even internal communication inside companies is done over HTTP today, and it is the foundation of your microservices and the whole Kubernetes and Docker world.
They all talk over HTTP as well; nowadays any Java application, or any application you write, is expected to expose its interface over HTTP. Say you wrote a small calculator application: if someone wants to ask it a question, they should ask over REST alone, and that question should be in JSON format only. Today that is the industry standard: I do not care what programming language you used, I just expect your application to accept a request over HTTP, through a REST API, with a JSON body. By following that standard, people have drastically simplified communication between systems. REST, APIs, JSON, microservices: that is where the industry has moved, and it is not very complicated, because REST is basically a set of rules you just follow. The same applies to Elasticsearch: it can be talked to over HTTP, it responds over HTTP, and you can ask it questions. That lets us interact with it through multiple interfaces: I can query Elasticsearch from my browser, and I can even do it from the command line as well. You know the curl command, right?
Sometimes as developers you use curl against an HTTP endpoint just to check whether a service is running or not. curl is a small Linux utility that works over the HTTP protocol, so you can send questions through curl to any application that speaks HTTP, and it will come back with a response; we will see that in the demo. And your Elasticsearch is an analytics engine: you can run a query or a search against it over that same interface, retrieve the information, and it stores your data in a highly optimized way.
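For example, a quick check against a local Elasticsearch with curl (assuming the default port 9200 on localhost, as in the single-node demo later) might look like this:

```
# the root endpoint answers with a small JSON body (node name, cluster name, version)
curl http://localhost:9200

# cluster health, also returned as JSON
curl "http://localhost:9200/_cluster/health?pretty"
```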
Elasticsearch, I told you, is the one that stores the data and lets you query it, and when it stores that information it stores it in a highly efficient way so it can be read back quickly; its implementation of what you would call a database is very different, because Elasticsearch is continuously ingesting data. Let's move on to the next component, Kibana. Kibana is entirely a graphical user interface application where you can do many things. When Kibana first came out it was only used to create graphs and histograms; you know histograms, and since you are already developers you know the different ways of displaying information, and a histogram is one of those.
You can use pie charts; you can use what is called a gauge, which looks like a speedometer, and that is another type of representation; histograms are yet another. So Kibana started out with just basic graphs and histograms, but with every update it has evolved, and now it has many important features that make the data easy to understand. If you really want to, Kibana can also work in standalone mode, but it is most effective when it works together with the other two components, Elasticsearch and Logstash. So let's talk about the main features where Kibana is useful for you.
Kibana helps you discover your data by exploring it visually: whatever data is stored in Elasticsearch, I can see it visually. We can analyze the data by applying different metrics in real time and seeing the results, and we can visualize the data by creating different types of graphs; in Kibana terms we call that a dashboard. It can also apply machine learning to the data to spot future trends and anomalies. Yes, there are machine learning features in the ELK stack, but they are a paid feature. Many companies do use the paid tiers, but as I told you, if you do not need those advanced features, Elasticsearch itself is completely free and you can deploy a cluster as large as you want. Some features are paid because Elastic is also a company that has to pay salaries to its employees, so it cannot publish everything for free; machine learning and a few other important features are held back from the open source version, and you get them only if you pay, so that the company can keep running sustainably. You can also monitor your applications through APM, application performance monitoring: when you write a Java application you are not only interested in knowing whether the JVM is running or not, you also want to know what is happening inside the Java code.
All of these things are possible through your Kibana, which is mainly about letting you see the information visually and display it through different graphs. Logstash, as I told you, is the actual data pipeline: it takes data in from multiple sources, filters it and sends output to multiple destinations. Logstash can also be used independently, for example as an input to Kafka or any other datastore; Apache Kafka is another very popular tool in today's big data world, but we will not go into it here. Logstash can work on its own because it is a small piece of software that constantly reads data from an input and writes it to one or more outputs, but it works best when used together with the rest of the ELK stack, and it can also send the data it collects to destinations other than Elasticsearch. Logstash is a very important tool because it is the one doing all the hard work: collecting information from everywhere and feeding it properly into Elasticsearch, so Logstash plays the role of the forwarder, like a Splunk forwarder job.
In Splunk that feeder is the forwarder you install on each machine, and it collects data constantly; here the equivalent has its own separate name, Logstash. We can take any type of data using Logstash: structured, unstructured, anything. It does not insist on receiving the information in one particular structure, because it is not SQL; any data, including unstructured data, is happily ingested by Logstash and stored in your Elasticsearch. Now, your Beats: I told you that once the ELK stack became popular, Logstash turned out to be a heavy application, meaning it was very difficult to install it everywhere because it consumed a lot of RAM and CPU. People wanted a lightweight, efficient, very simple alternative, and that is where Beats came into existence. Logstash can fulfill almost any requirement, but Beats are small, purpose-built applications: there is a beat for files, a beat for network packets, and so on. Coming back to Elasticsearch node roles: you can simply go to the YAML configuration file and set node.master to false to disable the master role on a node.
It means that node will not be a master, and if it is not a master there are three other roles it can take: it can act as a data node, an ingest node or a coordinating node. A data node holds data, meaning your indexed documents; everything we store in the ELK stack is called a document, so here we have slightly different terms than your standard relational database terms, which we will discuss. So here it is called a document, and the data node stores all the data that has been sent by every type of application and handles the operations related to CRUD, search and aggregations. By default the data role is enabled.
You can disable the data role by setting node.data to false in elasticsearch.yml. You need data nodes, and not just one: you should have at least three data nodes so that your data is stored in a highly available way. Keep in mind the ingest node is just for processing information: many sources send you data, so it has to be processed before it is stored. An ingest node is a server running Elasticsearch whose purpose is to process a document through a pipeline before indexing it; its job is to prepare the document so it can be indexed correctly and stored on your data nodes. Then the coordinating node: this one only acts as a coordinator. It routes requests, handles the reduce phase of a search, and distributes the work of bulk indexing.
The coordinating node essentially acts as a router: if many people are constantly querying Elasticsearch, it directs each request to a node that is free, and it also distributes the work of bulk indexing, which means indexing data across multiple locations and gathering the results back together. So those are the roles. When you install Elasticsearch, people install it on several servers and then enable one of these role options on each server, making sure they have the required capacity. This is where, as I told you, you initially need to do some planning: you need at least two or three master-eligible nodes and many data nodes; one coordinating node is usually enough, and two or three ingest nodes are enough. But the most important thing is the data, because you need to store it, guys, and the data also needs to be replicated, meaning at least two copies of it. Otherwise, if one of the servers holding the data fails completely, that data will not be available until the server comes back up; that is why people opt for replication, where the data is copied and stored across servers.
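As a small sketch of that replication idea (the index name and shard counts are illustrative, assuming the same local cluster), you can ask for a replica copy of every shard when you create an index:

```
# each of the 3 primary shards gets 1 replica, so losing a single data node
# does not make the data unavailable
curl -X PUT "http://localhost:9200/orders" \
     -H 'Content-Type: application/json' \
     -d '{ "settings": { "number_of_shards": 3, "number_of_replicas": 1 } }'
```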
Each of these is nothing more than a normal server where you install the Elasticsearch software, and its behaviour is controlled by which of these role options you enable. We will not go into every operational detail here; note that Splunk hides these operational details from you completely, because you are paying a lot of money for it, so it does not expect you to deal with any of this and simplifies everything for you. Here it is open source: you can customize it as much as you want, and that is also a big advantage.
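As an illustration of controlling that behaviour, the older-style role flags mentioned above live in elasticsearch.yml and look roughly like this; note this is a sketch of the legacy syntax described in the session, and recent Elasticsearch versions express the same idea through a single node.roles list:

```
# elasticsearch.yml, legacy role flags
node.master: false   # not master-eligible
node.data: true      # stores documents and serves search/aggregation work
node.ingest: false   # does not run ingest pipelines
# newer releases would instead use: node.roles: [ data ]
```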
Look at Facebook or any of the other big players: they will not use Splunk for their volume of data processing, because they would have to write a blank cheque to the company. I cannot even imagine how much data these big companies process, it could be petabytes or more; Splunk would become the richest company in the world overnight just from collecting license money from Facebook. So most of the big companies, Netflix and all these players, will not touch Splunk at all. Who does use Splunk? Companies that do not want to spend human effort setting everything up themselves; they go with Splunk because they want things working from the first moment, and it can be set up in a couple of days. But once their requirements grow, people move quickly to Elasticsearch, because they get much more control, and for the really big companies anything else is simply impossible. You know Facebook is the busiest website in the world; Facebook has well over two billion users, which means that something like 35 to 40 percent of the world's population has a Facebook account.
What database does Facebook use? Any ideas? It will not use Oracle; at Facebook's scale, running on Oracle would swallow an enormous part of the company's revenue in license costs alone. Facebook uses MySQL. Remember that once you go beyond a certain basic use case, most companies move to completely open source products; they are not interested in paying a dime to any vendor. Sure, Google also has its own database, BigQuery; Google will not use other people's databases. Companies at that scale have the ability to build their own databases, and license cost is a big reason why, because whatever software they adopt has to be installed on the order of 10 to 40 lakh servers.
A typical enterprise might have a few thousand servers, maybe 10,000 at most, but those companies run something like 50 to 70 lakh servers, so the numbers are enormous; if they bought a per-server-licensed tool for this kind of work, even at 100 per server you can imagine the bill, and that is why they will not accept it, because Microsoft or Facebook will have on the order of 40 to 50 lakh servers. Now let's talk about the data itself. We need to understand what is stored inside Elasticsearch, and for that we first need to understand fields. Fields are the smallest single unit of data: when you enter some data, a field is something like the name of the author. We have already seen an example of this in the JSON format above; in that JSON, the id and the title are fields. The smallest unit of information in Elasticsearch is a field, so the id is a field, and that is something we need to understand. Fields, the smallest individual data units in Elasticsearch, are customizable and could include, for example, title, author, date, summary, team, score and so on. Each field has a defined data type that holds a single piece of data: a data type can be a string (a name, for instance), numeric (a mobile number), a date, or a boolean (whether someone is a primary customer, yes or no). There are also complex data types such as objects and nested objects; an address, for example, can be a complex, nested field, because inside the address you again have the house number, street and apartment. And there are geo data types, which are very important for geographic analysis.
Geo types hold latitude and longitude information, and there are also specialised data types such as token counts and ranges; all of these are supported, and fields remain the smallest unit of data in Elasticsearch. Multi-fields are fields that can be indexed in more than one way to produce more matches; values like an id, a name, a user id or an employee id are typically unique, and through them we can uniquely identify a document. Now, since everyone knows the basics of SQL and writes plenty of SQL queries day to day alongside Splunk, let me show you the mapping between the two worlds so you understand it better. Here the index plays the role of the database. The word index is normally used for a very different concept, but in Elasticsearch an index means a database: a grouping of data and the place where it lives. When you install the Oracle database software, it creates a database inside it, which is nothing more than the container holding the information in rows and columns; here that container is called an index, and the field is what we have just been looking at: a field, and a multi-field that can be indexed in multiple ways. The fields are nothing but name, author, customer data, whatever needs to be filled in, and multi-fields are enabled through the multi-fields option. Next, documents. Documents are the JSON objects that are stored inside an Elasticsearch index, and they are considered the base unit of storage. Do not confuse index with indexing; keep remembering that in Elasticsearch an index means a group of data, and each piece of data in it is called a document. In comparison terms, a document is a single unit of information: think of one row in a table, where the row has columns and values, for example the name column in that row holds the value Gautham; that entire row of information is what we call a document here. So documents are JSON objects stored within an Elasticsearch index, where the index is the actual database, and in the relational world a document compares to a row in a table: there a row, here a document, a JSON object, and if you lay them out you simply have document one, document two, document three, each one sent and stored. Now let's understand it with an example. Suppose you are running an e-commerce application: you can have one document per product, or one document per order, where the order document represents everything related to that transaction, for example the customer's name, what order they placed, the delivery date and the price paid.
What is the delivery location it should be shipped to? All of that goes into your document. So you can have one document per product or one document per order; one document per product means that if you order three products, say three books, each book again has its own information. It can be anything, and there is no limit to the number of documents you can store in a particular index, in the same way that a regular database can store even two million rows; databases routinely grow very, very large.
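A hypothetical order document of the kind just described (all field names and values invented for illustration) could look like this:

```
{
  "order_id": "ORD-20240424-001",
  "customer_name": "Gautham",
  "items": [ "ELK in Action", "Learning Kibana", "Logstash Cookbook" ],
  "price_paid": 1499,
  "delivery_date": "2024-04-30",
  "ship_to": "HSR Layout, Bangalore"
}
```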
And we also have no limit: you can store any number of documents in a particular index. The data inside documents is defined with fields, combinations of keys and values. I already told you that JSON is one of the simplest ways to represent data, and here both the key and the value are our own choice. This is also JSON: the key and the value; these are keys, these are values, and that is the simple way to represent your JSON data, for example name, car and age are keys, and John, 30 and null are the values. So you define the data in the document with fields made up of keys and values; the key is the name of the field, and the value can be one of many different types of things, such as a string, a boolean, or an array of values. The document also contains reserved fields that make up the document metadata, like _index: every document carries metadata, because when you store information I need to know which database it was stored in, so it has _index, the index where the document resides, _type, the type that represents the document, _id, the unique identifier for the document, and _source, the original body. So a document here carries its _id, _type, _index and _source along with the data, and the documents are stored inside a container, which is the index. Now, in the demo, you can also create a new index with a PUT; right now it is responding slowly because of a timeout issue, but yes, it is a valid command: you can PUT a catalog index, you can PUT a document into it and send it, or you can just create an index; you can run any type of command here. So let's run a GET _search: the _search command fetches all the available data across each of your indexes, and I think it will take a while.
Let's search based on a specific index instead. I have an index called test, and the test index has this information in it, data related to text and documentation, so that is fine. Right now some of the queries are timing out because there is a lot of data and the resources we gave this machine are not enough, so a few commands hit a timeout, but nothing is actually failing. Besides that index there are also a couple of book-related indexes I created earlier, and if you run a search against those it also comes back fine; that is some data I loaded, so yes, your queries will work. This is the easiest way to run queries against your Elasticsearch directly. You will also notice entries like SIEM, Uptime and APM in the Kibana menu.
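For reference, a search like the ones just run looks roughly like this in the Kibana Dev Tools console (the index and field names here are placeholders, not the exact ones from the demo); each hit in the response carries the _index, _id, _score and _source metadata discussed above:

```
GET books/_search
{
  "query": {
    "match": { "title": "elasticsearch" }
  }
}
```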
SIEM, Uptime and APM are additional features. When you go to Management and then License Management, by default you have a basic license available that never expires, and you can start a 30-day trial to switch on all the advanced features. Click Start Trial and it begins a trial that gives you the Platinum features: machine learning capabilities, JDBC and ODBC connectivity so you can connect downstream tools over SQL, alerting and graph capabilities, and security integrations so you can authenticate against Active Directory or a SAML-based provider, plus auditing. These are extra features they do not enable in the free version; as I told you, the company also needs to sustain itself, so they have locked some features away and give them to you only for a license fee.
If this does not concern you, you can leave those features alone; you do not need to activate them, but you can turn them on at any moment. So this is a single-server setup. For Logstash, again, you can go and install it; it does not need to be on the same server, but yes, we can install it on the same server, so let's install Logstash: it ships as a tar.gz or zip archive, so let's download that. Now, what is the Windows event log? Whatever happens in the operating system ends up in the event log.
You can use Winlogbeat for that. Winlogbeat is a beat; we discussed Beats yesterday, and there is a beat called Winlogbeat. The Beats, as I told you, are the simple and efficient way to handle data collection and send it to your Elasticsearch, which is your datastore. You can download Winlogbeat and install it on each and every Windows server you have, and they will start sending data to your Elasticsearch. With this you can replace many tools in your company: you can remove separate monitoring tools and health check tools and have one combined monitoring solution, and again, Kibana has a big advantage on the user interface side, with plenty of graphical options you can use. Installing Beats is very simple because there is not much complexity: you just download and unzip; these do not need any real installation. So for Winlogbeat, what you should do is just run a .ps1 file: the Winlogbeat installer is nothing more than a simple PowerShell script that you can read and understand. You run it, but before you run it you need to make a basic configuration change, and once that basic setup is done you can run the install command from the extracted folder. Let me open PowerShell, because it is better to see it. Remember, Beats are small, purpose-built pieces of software that already understand the data, convert it on the way into a form Elasticsearch will understand, and send it to Elasticsearch. Before Beats existed, people only had Logstash; you will see the structure of Logstash later when we ship logs through it, since a Logstash pipeline has three parts: one is the input, the next is the transformation and the third is the output, and the input means where the data should be taken from.
Then some transformation can happen, and the output is where you send the data. The input is the log location of your program, wherever Logstash is collecting on the server where the information lives; the output will almost always be your Elasticsearch on port 9200; and in between you do whatever transformation you want. But that became complex, and Logstash needs quite a bit more RAM, so Elastic acquired a company and built all these Beats, small low-footprint shippers, so for every kind of activity there is a Beat. If you go to the Beats page you will see all the types of Beats available, described as lightweight data shippers: Filebeat, Metricbeat, Packetbeat, and the one we are discussing now, Winlogbeat, which is for Windows event logs, because I am on a Windows operating system right now. Auditbeat is for auditing, and Heartbeat is just for uptime monitoring: Heartbeat will only tell you whether the server is up or not, which matters a lot, because when a server is not responding, availability is the most important metric we have to worry about. Packetbeat is network data, network packet metrics, and Metricbeat gives you full operating-system metrics: CPU, RAM, storage, file system, whatever else you want, it will collect it. And the most popular, most in-demand, most used one is Filebeat.
Filebeat means you tell it to grab data from specific files, such as your web server log files or application log files; any file, it will pick it up and ship it to Elasticsearch, so the shipping happens right from the source. Plain and simple, Beats are great for collecting data: they sit on your servers, with your containers, or deploy as functions, and then centralize the data in Elasticsearch, and Beats ship the data conforming to ECS, the Elastic Common Schema. If you want more processing power, you forward it to Logstash for transformation and parsing; by default it does not need much transformation, but if you do want heavier transformation you send it to Logstash, and in Logstash you have grok, which is nothing more than a syntax for doing that kind of parsing, and then it can be forwarded on. So we go with Winlogbeat; this is the location, and it should let us install it. What I did was open PowerShell, tell PowerShell to allow unrestricted script execution, and then run the install ps1 file, so now it is installed.
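Roughly, the PowerShell steps described here look like the following; this is a sketch, and the extraction folder name is an assumption (use wherever you unzipped the Winlogbeat package):

    # run PowerShell as Administrator
    Set-ExecutionPolicy Unrestricted -Scope Process    # allow the install script to run
    cd 'C:\Program Files\winlogbeat'                   # wherever the zip was extracted
    .\install-service-winlogbeat.ps1                   # registers the winlogbeat Windows service

The install script only registers the service; it does not start it, which is why the service shows up as stopped until you start it later.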
Now Winlogbeat is installed, and it sits under C:\Program Files\winlogbeat, so let me go there and look at the files. This is your Beat configuration file: you can specify what type of event logs you want to monitor, and also how far back you want to collect, and the most important thing here is the outputs. Here you will see output.elasticsearch; right now it is localhost:9200, which is perfect because I installed everything on localhost, but in real life you have to configure it with the actual Elasticsearch server name, whichever server your Elasticsearch is on, because Winlogbeat will usually be installed on remote servers. In this scenario, since we are running everything on a single server, this works fine, because my Elasticsearch is reachable on localhost. That is the output, and if you wanted to send it to Logstash instead, you would point it at the Logstash server, but here I am not sending it to Logstash; whatever information is collected is already in the ECS standard, the Elastic Common Schema. If you scroll up a little you can also see where this should reach Kibana, the Kibana host setting. And then the event log section: you can tell it to ignore event logs older than 72 hours, and it lists which types of event logs you want to collect.
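The relevant parts of winlogbeat.yml being pointed at look roughly like this; the host values are just the single-server defaults used in this demo, so treat them as placeholders:

    winlogbeat.event_logs:
      - name: Application
        ignore_older: 72h
      - name: System
      - name: Security

    setup.kibana:
      host: "localhost:5601"

    output.elasticsearch:
      hosts: ["localhost:9200"]

On a real deployment you would replace localhost with the Elasticsearch and Kibana server names, or swap output.elasticsearch for output.logstash if events should pass through Logstash first.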
The event log channels I want to monitor are listed here, so let's take the defaults, Application, Security, System and so on; I am going to keep the defaults, and you can also compare against the winlogbeat.reference.yml file in the Winlogbeat documentation and tune things up or down from there. The reference file shows every available option, essentially the full default file. Just a quick piece of information, guys: if you want to make a career in cloud and DevOps, Intellipaat provides an advanced certification in cloud computing and DevOps from E&ICT Academy, IIT Roorkee, taught by IIT Roorkee faculty and industry experts, designed to upskill you and land you your dream job. Now let's continue with the session. Most of the reference file is commented out, and if you want to learn more about processors or other features you can start uncommenting sections; right now the only things uncommented are the generic ones, the event log channel names and security. If you scroll down you will start to see the rest: the maximum number of events it can buffer, the refresh interval, meaning after how many minutes it should pull the newest data, how old an event can be before it is ignored, which channels and event types to read, then the processors section, which can add CPU and host fields to each event and can also monitor containers if you have any, then proxy settings if you are not connected directly, certificates, and the logging output. The Beat can even monitor itself, but I do not want it to monitor itself, so let me go and start my Winlogbeat service, because until I start it, it just sits in a stopped state; in the Services list you will see it as stopped because we have only installed it, so let's start it.
Start it now; you can start it from the Services panel or run it as a service from the command line. Now let me go to my Kibana, and in Kibana let's open index management: there are two indices there. You can also create index patterns through Kibana directly; this is nothing more than telling Kibana which Elasticsearch indices to read, so create an index pattern for the Winlogbeat data.
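Starting the service and checking that data is actually arriving can be done with something like this; it is only a sketch, and the service name winlogbeat is the default created by the install script, so adjust if yours differs:

    Start-Service winlogbeat          # or: net start winlogbeat
    Get-Service winlogbeat            # should now show Running

    # then, in the Kibana Dev Tools console, confirm a winlogbeat index exists:
    GET /_cat/indices/winlogbeat-*?v

Once an index like winlogbeat-7.5.2-&lt;date&gt; shows up, the index pattern you create in Kibana will have data behind it.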
Say a user wants to know what happened yesterday at midnight; that is where problem solving begins: you check the application's log file to see what message was printed at that time, and usually you have stack traces in the log file through which you can easily find the root cause. Logs also help you understand the behavior of your system: an application or system is normally a black box, meaning what it is doing internally is very difficult for us to see, so to investigate or understand what is happening inside we always have to rely on the logs. For example, if you log the time taken by various blocks of code to execute, you can easily tune your application based on those timings. Then obviously there is audit: many organizations have requirements where they need to store or retain logs for a certain period of time because their industry mandates it. And one more: with machine learning and data mining, the analytics trend has been big recently, where people try to use current events to predict future events. From all of this we can conclude that logs are very important, but that is not the problem.
The problem is that logs come with their own kind of challenges. What kind of challenges do you face with logs? There is no common, consistent format: every application logs differently. Oracle database logs are different, your Apache web server logs are different, Tomcat logs are different, Microsoft logs are different, so each system generates logs in its own format, and as an administrator or end user you would need expertise in understanding each format and then using it to search different logs, so that is a well-known challenge. Logs are also decentralized: if your Java application is running on four servers as part of a cluster, logs are stored locally on each server, which means we always have to go to the specific machine, log in, open the file with Notepad or some other tool and then troubleshoot. That is a big reason for centralized log management, where the logs are always stored in a central location, and it is a big challenge. Again, there is no consistent time format; the time format differs per system or application, so it becomes very difficult to identify the exact time an event occurred because the timestamp format is not consistent. And another important thing: the data is not structured. Log data is unstructured and becomes difficult to analyze, because sometimes you have 10 fields, sometimes 12, and it is a mix of structured and unstructured data. All of these are problems you will face while trying to do log analysis in your company. Yes, we know logs are important and carry a lot of information, but these are some of the challenges, and we are going to solve this collection and analysis problem through a tool called Logstash.
Logstash is a key component of the Elastic Stack and is primarily used as an ETL engine: extract, transform and load. It is the main way to collect information from all your different devices into the ELK stack; Logstash is how we will collect the information, process it and then store it in Elasticsearch, which is our main database. Logstash is an open-source, real-time data collection engine with pipelining capabilities, meaning that in real time we can keep sending data, and Logstash lets us easily build a pipeline that can collect data from a wide variety of input sources and helps us parse, enrich, unify and store it in a wide range of destinations. Logstash is not only famous within the ELK stack; it can also send data to many other destinations, so some people use Logstash on its own, standalone, because it can take information from almost anywhere and send it almost anywhere, including Elasticsearch. But that is not our concern here; our concern is how Logstash collects data and, most importantly, how it performs the transformation. Logstash provides a set of plugins known as input, filter and output plugins that are easy to use and pluggable in nature, so it becomes a simple process to unify and normalize a huge volume of data. Logstash has an open plugin architecture: there are many plugins on the input side and many on the output side, and by mixing and matching plugins we control the behavior of Logstash.
What are some of the important features of Logstash? It has a pluggable data pipeline architecture with more than 200 different plugins, developed by Elastic and the open-source community, that can be mixed, matched and orchestrated across inputs, filters and outputs, so Logstash can take data from many kinds of sources, perform different kinds of operations on it, and send it on. It is extensible: Logstash is written in JRuby, a Java-based Ruby, and if you ask me, Ruby is a very good language for text manipulation, which is what most of this work is, and you can create your own custom plugin for your needs if you do not find an existing one. It offers centralized processing, meaning data from disparate sources can be easily extracted using the various input plugins, enriched, and sent to multiple destinations, so it acts as a central point to process your information. Then there is variety and volume: Logstash can handle all types of logs, Apache, nginx, syslog, Windows event logs, it can collect metrics from a wide range of applications, it can collect over the TCP and UDP protocols and over HTTP, it can consume events from tools like Jira and GitHub, and it can also pull data from relational databases through plugins. And it is highly compatible, meaning it integrates cleanly with Elasticsearch and Kibana. How can we install Logstash? Logstash does not bundle its own JDK, which means you have to handle Java explicitly: Elasticsearch, when you download it, ships a JDK inside, but Logstash does not, so you need to have Java installed, set JAVA_HOME as an environment variable, and check your Java version by running the java -version command; once you run it you should see the Java version printed. I am using a Windows machine so you can recreate this; this is my Windows server, it just needs a bit more resources. So the first thing you should do is satisfy that prerequisite.
What I did was not install Java separately; instead I went to my environment variables: go to Settings, search for environment variables, click Edit the system environment variables, and you get to the environment variables dialog. There I created a system variable called JAVA_HOME, and instead of installing Java, since we already have a JDK available with the ELK stack, I pointed JAVA_HOME at the JDK that ships with Elasticsearch. As mentioned in the document, on Windows you create an elk folder in the C: drive and inside it the Elasticsearch directory where you unzip the download, because Elasticsearch also comes as a zip file and the unzipped content sits in that Elasticsearch directory, which contains a jdk folder. Point JAVA_HOME at that folder and you automatically have Java; you do not need to install it separately, and it will not install any extra components either.
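Setting that variable from the command line would look roughly like this; the path reflects the demo layout described above, so treat the folder name as an assumption:

    :: cmd, run as Administrator; point JAVA_HOME at the JDK bundled with Elasticsearch
    setx JAVA_HOME "C:\elk\elasticsearch-7.5.2\jdk" /M

    :: open a new terminal and verify
    echo %JAVA_HOME%
    "%JAVA_HOME%\bin\java" -version

Using the bundled JDK keeps all three tools on a Java version that the stack was tested with, which is the reason for pointing the variable there instead of at a separately installed JDK.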
The only things Elasticsearch will create are log files and a data directory in that location, and Elasticsearch has a proper data location because, at the end of the day, it is nothing more than a database, so you should set the data path correctly, and I have set it in my install location. You need to update a couple of things in its .yml configuration file; Kibana, Logstash and Elasticsearch each have .yml config files to update. What I have done on this machine is go to the C: drive and create the elk directory, a simple folder I created myself, and put the Elasticsearch download in there: I downloaded Elasticsearch, unzipped the contents and placed it in my elk folder. So go into that folder, go to the bin directory, open a command prompt at that location, type elasticsearch.bat and press Enter; it takes some time and then it prints its output, ending with a simple message saying it started. Give it enough time, because this runs as a process in that window; if you want, you can register it as a Windows service and manage it from the Services panel. So this is the basic message you should expect whenever you want to confirm whether your Elasticsearch is running or not.
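In command form, those steps are roughly the following; the folder name is assumed from the demo layout:

    cd C:\elk\elasticsearch-7.5.2\bin
    elasticsearch.bat

    :: in another terminal (or a browser), confirm it is up
    curl http://localhost:9200

A JSON response containing the cluster name and version number is the quickest confirmation that the node is running.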
In the startup output you will see messages like license mode is basic and security is disabled; seeing those messages means it really is running. This build is the free basic one, but there are paid tiers you can move to as well. You can then register it as a service, but note that only Elasticsearch ships its own helper for registering as a Windows service; the others do not, so for them you use something else. You can use a Microsoft-provided mechanism, or I think you can use NSSM, that is fine too, or there is a third option where you create a small wrapper that defines what should happen on start and what should happen on stop, and then register it at the command prompt with sc create. sc create is a basic command already in your operating system, and you can follow a simple document for it; most of the time it works, whichever tool you are setting up. For Elasticsearch on Windows you do not need any of that, because it already has a service helper you can register with. And remember, the preferred way for companies to run Elasticsearch is Linux; nobody prefers Windows in production. As a developer you can use Windows if you want, but a company will not run it there, and on Linux you have systemd, so systemctl manages the service for you and you do not need much else. So here I have fulfilled my requirement by creating the JAVA_HOME variable and starting my Elasticsearch; now, to start Kibana, I can go back to its bin directory and do the same thing from cmd. I do not actually need Kibana running today for our Logstash practice, because today we are going to run Logstash mostly standalone, but I am starting it just to show you that we can.
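Going back to the service options mentioned a moment ago, they look roughly like this; the service names and paths here are assumptions for the demo layout, not exact commands from the video:

    :: 1. Elasticsearch ships its own service wrapper
    C:\elk\elasticsearch-7.5.2\bin\elasticsearch-service.bat install
    C:\elk\elasticsearch-7.5.2\bin\elasticsearch-service.bat start

    :: 2. NSSM can wrap a plain .bat file as a service (easiest for Kibana/Logstash)
    nssm install kibana "C:\elk\kibana-7.5.2-windows-x86_64\bin\kibana.bat"

    :: 3. sc create also works, but it expects a proper service executable,
    ::    so it is normally combined with a wrapper rather than a raw .bat
    sc create kibana binPath= "C:\path\to\service-wrapper.exe" start= auto

The first option is the only one that ships in the box, which is why the video calls Elasticsearch the exception; the other components need one of the wrapper approaches.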
You will also get this message: Kibana is running on local port 5601. That is how you start Kibana, and you can copy that address into your browser and check whether it is up. When you see "Kibana is loading", it means Kibana is initializing. Once you get into Kibana you can go to Stack Monitoring and start watching your Elasticsearch: click to set up monitoring, either with Metricbeat or with self-monitoring, and once you enable it, it starts collecting information about your Elasticsearch. Kibana does not need any other dependencies and does not even need administrative privileges, so if it is not starting or not running, something else is blocking it, not the application itself; you do not need to reinstall it, you can simply run Kibana directly from the command prompt. On Windows there is still no installer, no .exe to install it as a proper program, which also means there is no built-in way to register it as a service; on Linux you can install it as a system service, but on Windows you have to use something like NSSM to wrap it, and there is no other option until they ship a proper installer, which is not available yet.
So the only option on Windows is to run it yourself and, if you hit an error, re-run it. Now let's go get Logstash. Go to the elastic.co downloads section, scroll down a little and you will see Logstash; click the download button and it takes you to the download page, where you will see a zip file, a tar.gz file, a deb file and an rpm file. The rpm file is for installing Logstash on a Red Hat system, the deb file is for installing on Ubuntu, and the zip and tar.gz are plain archives you download and unzip, so I downloaded the zip file and I already have it in my Downloads directory. And remember one important thing, guys: whenever you install the ELK stack, make sure all three products are the same version; do not mix versions. If one is 7.5.2, make sure the other two components, Kibana, Logstash and Elasticsearch, are also 7.5.2.
So here you see I downloaded Logstash 7.5.2, which is about 160 MB, unzipped the contents, got the directory and again picked it up and placed it in my C:\elk directory. Okay, open the Logstash folder and notice there is no jdk directory inside.
That means the Java setup has to be handled by us explicitly: the JDK installation is our responsibility. Go into the directory; note that Logstash does not need to be running all the time, you run it whenever you want to ingest data, so let me open cmd here. What we downloaded is the full distribution, and, as on the Kibana side, there are paid features; those we will use later. If you go to Management and then License Management you will see an active basic license, and you can start a 30-day trial, which gives you the Platinum features.
That is where you get the additional features, but this basic license is unlimited, it will never expire, so we always use the basic license across all three products, whether it is Kibana, Logstash or Elasticsearch. We will enable the trial and use some of those features later. Now let's get back to work: to check that Logstash runs, you can run logstash -h or --help to see the output of the Logstash command, but first let us look at the directory structure. Logstash has a config directory, and inside the config directory you have the JVM settings, so open the jvm.options file.
You see the two important things here: the heap is 1 GB of RAM by default, and if you want you can increase it according to your needs; I will increase it to 2g, 2 gigs, because I have plenty of RAM on this machine. All the settings below that are considered expert settings, so do not mess with them. Right now Logstash is only fully supported on the Java 1.8 line, not on newer Java versions, so if you are setting something up, try to provide Java 1.8; beyond 1.8 some things are not supported properly, and while it will not necessarily throw an error, as a best practice stick to a 1.8 JDK, and you can set the Java home location through these properties. There is also the heap dump setting: when Logstash runs out of memory it creates a dump file that we can open to see why the heap blew up and what was happening at runtime, so finding a heap dump means Logstash had memory or performance problems. The file is created in the configured location, and seeing that file there is not a good sign for your application; there are tools that will load that file and give you an analysis. So that is the Java side; I will save and exit, and there are two more files: logstash.yml, which is the main configuration file, and log4j2.properties, where we configure the logging properties.
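The two heap lines being changed in config/jvm.options look like this; 2g is just the value chosen in this demo, size it to your machine:

    # config/jvm.options
    -Xms2g    # initial heap size (default is 1g)
    -Xmx2g    # maximum heap size (keep both values equal)

Keeping the initial and maximum heap identical avoids the JVM resizing the heap at runtime, which is the usual recommendation for Logstash.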
The log4j2.properties file controls where Logstash writes its own log files and how; you will see it creates a log file in the logs location, and logging here means the logs that Logstash itself produces. We have talked a lot about logs, and Log4j is the standard way to define logging behaviour: in the programming world, Log4j is a library through which you control the log format. These are all the logging options: how large a file can get before it rolls over, the rollover policy, for example roll after 100 MB, the rollover strategy, keeping at most 30 old files, so this is the log rotation policy, plus the timestamp format you want, the file format, the file name, and what content gets written. If you want to change any of it you can: how many files to keep, how many old files, the time format, the file name, the content, everything about the logs of your Logstash application can be configured here, because Logstash, after all, is also just software.
It produces its own logs too, and that logging is controlled here. Then logstash.yml is the main configuration file of your Logstash; it is a .yml file and for the most part it deals with the data pipeline: the pipeline settings, the batch size, meaning how much it will process in each iteration, the data path where its data will live, and the pipeline configuration, such as the pipeline id (the default id is main) and how many workers you want. By default there are two workers, which relates to the host's CPU cores; I have four in total, and if you have more, say six or eight cores, you can increase the number of workers, which means more parallel threads that can run, so with eight cores you might give it four. Remember that in a real company Logstash will never run on your Kibana or Elasticsearch server; Logstash runs independently somewhere else, and each component is installed separately.
Here, of course, we are running everything on the same server just for practice. Based on that you can make your changes, and almost everything in this file relates to your data pipeline; a pipeline is nothing more than what we will create shortly, where we provide the input, the transformation and the output, and that whole thing is called a pipeline. The pipeline batch size is 125, there is a pipeline batch delay, and so on; once we build a pipeline we will revisit this file, but the most common things you need to know right from the start are your data location, your logs location, and a couple more settings.
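The handful of logstash.yml settings being discussed look roughly like this; the values shown are the defaults mentioned in the video, and the Windows paths are assumptions for this demo layout:

    # config/logstash.yml
    pipeline.id: main
    pipeline.workers: 2            # roughly one per CPU core you want to dedicate
    pipeline.batch.size: 125       # events pulled per worker per iteration
    pipeline.batch.delay: 50       # ms to wait before dispatching an undersized batch
    path.data: C:/elk/logstash-7.5.2/data
    path.logs: C:/elk/logstash-7.5.2/logs

Raising pipeline.workers is the usual first tuning step on a machine with more cores, since each worker runs the filter and output stages in parallel.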
It is all related to your data, so let's come back to this file later. Whatever we do with Logstash first, we will do on the command line. The first thing I told you is that Logstash is about taking information from somewhere and delivering it somewhere, so how can I create a very simple Logstash pipeline? I want to create one on the command line itself, without even writing a configuration file; Logstash has configuration files where we can define pipelines, but here I will define a very simple one inline using the -e flag, with stdin as the input and stdout as the output. What are stdin and stdout, guys? Stdin is nothing more than what you type on your command line, and what you get back as a result on your command line is called stdout; they belong to your shell, whether it is a Linux shell or a Windows shell, it is the same. Once I press Enter, Logstash will create a runtime pipeline for me, and whatever I type, it will take that information.
So let's press Enter and see whether it starts. There are a couple of warnings here because I gave it a newer Java than it asked for: it wants 1.8 but my JAVA_HOME right now points at Java 13. The first thing to look for is the message that Logstash is starting, with the version number printed, then a reflections step that takes some time, and then it tells you about your pipeline: the pipeline has started and the pipeline id is main; if you want to change the pipeline id, that is done in logstash.yml. The pipeline batch size is 125, which we saw declared in the configuration, and the pipeline batch delay is 50 milliseconds; providing an input and an output is all it takes for this thing called a pipeline to exist. It reports that the pipeline is running and that the Logstash API endpoint started successfully; Logstash always listens on port 9600 for that.
Now let's type something; it is waiting for our message. This is a simple, basic, interactive way of providing information and having it converted into a proper Logstash event. I will just type "welcome to logstash", and it prints the event with a timestamp; every time you provide any information to Logstash, it adds some fields to it: the timestamp, which host it was generated on, what the message is and what the version is. Those things get added to your data. So this is interactive, a simple interactive pipeline, but we do not normally do this kind of basic thing.
What we do instead is write a file in which we define everything, and then run Logstash with that file as the pipeline definition. So here press Ctrl+C; it asks whether you want to terminate the batch job, press Y, and you are back at the command prompt.
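The whole interactive demo above boils down to one command; this is a minimal sketch, with the output abbreviated and the hostname and timestamp shown as placeholders:

    C:\elk\logstash-7.5.2\bin> logstash -e "input { stdin { } } output { stdout { } }"
    ...
    [INFO ] Pipeline started {"pipeline.id"=>"main"}
    welcome to logstash
    {
           "message" => "welcome to logstash",
        "@timestamp" => 2024-04-24T10:15:30.000Z,
              "host" => "WIN-SERVER01",
          "@version" => "1"
    }

The @timestamp, host and @version fields are the ones Logstash adds automatically on top of the message you typed.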
Now let's talk about this pipeline properly. Installing Logstash, just like the other Elasticsearch components, is quite simple and straightforward: navigate to the download location, download the zip or tar file and follow the screenshots. Now, the Logstash architecture: the core of Logstash is its event processing pipeline, and it always has three stages: inputs, filters and outputs. A Logstash pipeline always has two required elements, the input and the output, and the filter that sits in the middle is optional. The data source can be anything, a log file or whatever, and the destination of the data, for us, will always be Elasticsearch; in the ELK scenario Elasticsearch is the only destination I care about. Logstash itself can also write files on the fly, meaning once it has processed the data you can ask it to simply write the events out to a text file, but I am not interested in that; I just want Logstash to collect information and send it straight to Elasticsearch. So the filter is optional, but input and output are mandatory in your Logstash pipeline. Inputs are what create the events, filters are what modify the events, and outputs are what ship them to the destination, and inputs and outputs support codecs.
A codec handles format conversion: it lets you encode or decode the data as it enters or leaves the pipeline without a separate filter, for example JSON to plain text or text to JSON; you just say which codec you want and it is handled for you.
So, using these plugins, how does Logstash actually process information? Take an example: there is a file called apache.log on your Linux server, it is constantly generating data, and you have created a Logstash pipeline to continuously monitor this file, pick up the information, perform some transformation and send it to Elasticsearch. There is also a file output plugin: you give it a file path and it stores the output in that file, and it has a few options, you can gzip the output, set the path, the flush interval, even the file mode, meaning the file permissions. We will see the output side again when we write the actual Logstash configuration; there is not much else we need, because our intention is to load the information into Elasticsearch, and once it is stored there, Kibana immediately has access to the data and can start displaying it. We will talk more about Kibana later, but the destination is mainly Elasticsearch, constantly sending and storing the data correctly so that we can later perform any kind of operation on it.
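Written as a pipeline configuration file, the apache.log example would look roughly like this; the paths and the index name are assumptions for illustration:

    # apache-pipeline.conf  (run with: bin/logstash -f apache-pipeline.conf)
    input {
      file {
        path => "/var/log/apache2/apache.log"   # file to tail continuously
        start_position => "beginning"
      }
    }
    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "apache-logs"
      }
      # optionally also write the processed events to a local file
      file {
        path => "/tmp/processed-apache.log"
      }
    }

A filter block would normally sit between the input and output sections; it is omitted here because, as said above, it is optional.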
See, Logstash has huge value on its own; that is why you see all these output options even though we only care about Elasticsearch. Why does it offer so many destinations? Because Logstash was independent first; Logstash was not originally an Elastic Stack product. It was developed by one person, a normal sysadmin working at a company, who created Logstash; then Logstash became so popular that this person joined Elasticsearch and collaborated with them, and Logstash became part of ELK. So ELK was not all original: only the Elasticsearch database and Kibana were created by the Elastic people, while Logstash was from the beginning a standalone piece of software on the market, and because it was so popular the Elastic company brought its creator in and made it part of their toolkit. That is why you see this long list of outputs; if the Elastic company had built it themselves, it would probably only have had two or three outputs, nothing this elaborate. That is also why people still use Logstash standalone in situations outside the ELK stack; it started as an independent open-source project and was later adopted by Elastic. Normally you do not see a company adding extras like this, because they do not want people to use their product with someone else's stack, but this was not their product originally; it was created by one person.
Now let's talk about the filter plugins, and this is where most of your headache will be. Why a headache? Because of transformation: the data, whenever you collect it, will not be in the proper format, and the filter plugins are what perform transformations on the data you collect. Transformation means any data manipulation: converting from one format to another, cutting it, slicing it, whatever you need. You can combine one or more filter plugins, and the order of the plugins defines the order in which the data is transformed, so data can pass through multiple plugins one after another. The filter section acts as the middle section inside your pipeline configuration file and it is optional. There is a whole list of available filter plugins, and csv is a very popular one that parses the input data as CSV, so if you use the csv filter plugin, Logstash automatically understands:
okay, this is a CSV file, let me process it according to the CSV format. So how is it defined? Like this: a filter block with csv inside. And not only parsing: you can also convert the data, so while you collect it you can convert column 1 to the integer type and column 2 to the boolean type. For example, a customer record might have an age, which can be an integer, and a loyal-customer flag, yes or no, which would be boolean, so while we are ingesting the data we can also assign data types. That is why filtering is not just basic parsing; it can convert plain text values into specific data types. The csv filter takes an event field containing CSV data, parses it, and stores it as individual fields, so a value like "a,b,c" comes in as one field and becomes separate fields a, b and c after passing through this plugin. And you can also add fields, which is again a very important thing.
For example, your CSV has three fields and you can add a fourth. Say the three fields are age, name and number, something like 20, kumar, 1234; when you pass this through the csv plugin, the plugin can add information, meaning a fourth field. You give the field a name and a value, for example "hello world from" plus the host, because some fields are automatically available to you, such as host, since, as I told you, Logstash always knows where the data was collected from. If you write that add_field logic, then after the transformation the printed data will have your three original values plus the host name, for example something like sba.com, whatever your server name is, so three fields went in and four fields come out, one field was added. Remember that all the magic happens in this filter stage; while you are collecting the data it is your duty to transform it and send it properly, so do not push unstructured information into your Elasticsearch.
The more structured the information you send to Elasticsearch, the more benefit you get; otherwise, yes, you can send unstructured data, but then you cannot properly use all the transformations and search operations on top of it, because there will be no consistency in the data. So most of the heavy lifting has to be done right here, and each of these filter plugins has enough options to do all sorts of things, so there is a lot of emphasis on the filter plugins and you should know them properly. With add_field you can add one or two new fields, it is up to you, and you can also give the filter a unique id. You can also remove a field: for example, you get four fields but you do not need the second one, it is unnecessary, so you can remove it, and you can remove multiple fields too, because dropping data you do not need is itself a very important thing, and you can also remove tags. You can tell it to skip empty columns, so if the fifth column comes in blank unnecessarily, you can skip it.
You can also skip empty rows; skipping empty rows and columns is very important, otherwise blank data gets loaded into Elasticsearch. And you will see that by default the csv separator is a comma, because CSV means comma-separated values, but sometimes the files you get use a semicolon or a colon as the separator, so you can define what the separator is, and you should get it right, because the csv filter is blind: it just looks for the separator between the values, and if the separator is not what it expects it cannot parse the row.
So you can choose the separator: the default is a comma, but you can set other characters, so all those options are there. Now for grok: take a web-server log line that records, say, the client, the request method, the request itself, the bytes returned and the total time needed to process it. After the grok filter has run, the data comes out as client equals the IP, method equals GET, request equals the path, bytes equals a number, duration equals a number, so you can see how nicely it has been structured, and now this data can be stored as a JSON document, and tomorrow you can write simple queries on it: where client equals something, where method equals GET. Right away you can start running query-like searches, not just clicking around, on top of your Elasticsearch. That is why, as I told you, a lot of your effort initially, during your Elastic setup, is supposed to go into doing all this transformation, so that people actually get the benefit of the speed of the stack. If you do not write it properly and unstructured data goes in, it becomes our headache to break it apart later, and ultimately the whole ELK stack, all the money spent on buying and configuring everything, is wasted if you do not load structured data. So you, or whichever team does the loading, should think about it: put your own thought into it, or ask the other team what data is coming and what they need.
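Coming back to that grok example for a second, it is essentially the classic one from the Logstash documentation; a sketch:

    # input line:  55.3.244.1 GET /index.html 15824 0.043
    filter {
      grok {
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
      }
    }
    # resulting fields: client=55.3.244.1, method=GET, request=/index.html,
    #                   bytes=15824, duration=0.043

Each %{PATTERN:fieldname} piece names the field that the matched text is stored under, which is exactly the client/method/request/bytes/duration breakdown described above.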
Sometimes you may not get any help, so you need to figure it out yourself; sometimes the other team can help you and tell you, yes, this is the type of data we want. Not everyone will know about Elasticsearch; they will just want an analysis to be done without knowing the details, so sometimes it is your duty to understand, go to them, ask simple questions like what do you expect from this, and based on that you just need to find a simple solution and implement it. It will be your responsibility.
