
Python Machine Learning Tutorial (Data Science)

May 30, 2021
If you are looking for a machine learning tutorial with Python and Jupyter Notebook, this tutorial is for you. You will learn how to solve a real-world problem using machine learning and Python. We'll start with a brief introduction to machine learning, then talk about the tools you need, and then jump right into the problem we're going to solve. You will learn how to build a model that can learn and predict the type of music people like. By the end of this one-hour tutorial you will have a good understanding of the basics of machine learning and will be able to move on to more intermediate and advanced-level concepts.
You don't need any prior knowledge of machine learning, but you do need to know Python pretty well. If you don't know it, I have a couple of tutorials for you here on my channel; the links are below this video. I'm very excited to be your instructor. On this channel I have tons of programming tutorials that you may find useful, so be sure to subscribe, as I upload new tutorials every week. Now, let's jump in and get started. In this section, you will learn about machine learning, which is a subset of AI, or artificial intelligence. It is one of the trending topics in the world these days and will have many applications in the future.


Here's an example: imagine I ask you to write a program to scan an image and tell whether it's a cat or a dog. If you wanted to build this program using traditional programming techniques, it would become overly complex. You would have to come up with many rules to look for specific curves, edges, and colors in an image to tell if it's a cat or a dog. But if I give you a black-and-white photo, your rules may not work; they may break, and then you would have to rewrite them. Or I could give you a photo of a cat or a dog from a different angle that you did not anticipate, so solving this problem with traditional programming techniques becomes too complex, or sometimes impossible. Now, to make matters worse, what if in the future I ask you to extend this program to support three kinds of animals: cats, dogs, and horses? Once again you would have to rewrite everything; those rules are not going to work. Machine learning is a technique to solve these kinds of problems, and this is how it works: we build a model, or an engine, and give it a lot of data. For example, we give it thousands or tens of thousands of pictures of cats and dogs. Our model will find and learn patterns in the input data, so we can give it a new picture of a cat it has not seen before and ask whether it's a cat, a dog, or a horse, and it will tell us with a certain level of accuracy. The more input data we give it, the more accurate our model will be. That was a very basic example, but machine learning has other applications in self-driving cars, robotics, language processing, vision processing, forecasting things like stock market trends and the weather, games, and so on. That's the basic idea of machine learning.
Next we will see machine learning in action. A machine learning project involves a number of steps. The first step is to import our data, which often comes in the form of a CSV file. You may have a database with a large amount of data; we can simply export that data and store it in a CSV file for our machine learning project. So we import our data, then we need to clean it, and this involves tasks like removing duplicated data. If we have duplicates in the data, we don't want to feed them to our model, because otherwise our model will learn wrong patterns in the data and produce wrong results, so we need to make sure our input data is in good shape and clean. If there is data that is irrelevant, we should remove it; if it is duplicated or incomplete, we can remove or modify it. If our data is text-based, like the names of countries, genres of music, or cats and dogs, we need to convert it to numeric values. So this step really depends on the kind of data we are working with.
Every project is different. Now that we have a clean data set, we need to split it into two segments: one to train our model and the other to test it and make sure our model produces the right result. For example, if we have a thousand pictures of dogs and cats, we can reserve 80 percent for training and the other 20 percent for testing. The next step is to create a model, and this involves selecting an algorithm to analyze the data. There are so many different machine learning algorithms out there, such as decision trees, neural networks, and so on. Each algorithm has pros and cons in terms of accuracy and performance, so the algorithm you choose depends on the kind of problem you are trying to solve and your input data. Now, the good news is that we don't have to explicitly program an algorithm; there are libraries that provide these algorithms, and one of the most popular ones, which we will see in this tutorial, is scikit-learn. So, we build a model using an algorithm.
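The 80/20 split described here can be sketched in a few lines of plain Python; the file names below are made up purely for illustration:

```python
# A hypothetical list of 1000 labeled photos (the names are made up)
photos = [f"photo_{i}.jpg" for i in range(1000)]

# Reserve 80 percent for training, the rest for testing
split_point = int(len(photos) * 0.8)
train_set = photos[:split_point]
test_set = photos[split_point:]

print(len(train_set))  # 800
print(len(test_set))   # 200
```

In a real project we would shuffle the data before splitting, which is exactly what scikit-learn's train_test_split does for us later in this tutorial.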
Next, we need to train our model, so we feed it our training data. Our model will then look for patterns in the data, so we can ask it to make predictions. Back to our example of cats and dogs: we can ask our model whether a new picture is a cat or a dog, and it will make a prediction. Now, the prediction is not always accurate; in fact, when you start out, it is very likely that your predictions will be inaccurate, so we need to evaluate the predictions and measure their accuracy. Then we either go back to our model and select a different algorithm that produces a more accurate result for the kind of problem we are facing,
or we tune the parameters of our model; each algorithm has parameters that we can modify to optimize accuracy. So these are the high-level steps that we follow in a machine learning project. Next, we'll look at libraries and tools for machine learning. In this lecture we'll look at the popular Python libraries that we use in machine learning projects. The first is NumPy, which provides a multidimensional array; it's a very, very popular library. The second is pandas, which is a data analysis library that provides a concept called a data frame. A data frame is a two-dimensional data structure similar to an Excel spreadsheet, so we have rows and columns; we can select data in a row or a column, or in a range of rows and columns. Again, it's very popular in machine learning and data science projects.
The third library is Matplotlib, which is a two-dimensional plotting library for creating graphs and plots. The next library is scikit-learn, which is one of the most popular machine learning libraries and provides all the common algorithms, like decision trees, neural networks, and so on. When working with machine learning projects, we typically use an environment called Jupyter to write our code. We could still use VS Code or any other code editor, but these editors are not ideal for machine learning projects, because we frequently need to inspect the data, and that is very difficult in environments like VS Code and the terminal. If you are working with a table of 10 or 20 columns, viewing this data in a terminal window is really difficult and clunky. That's why we use Jupyter: it makes it really easy to inspect our data.
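As a quick taste of the first two libraries, here is a minimal sketch (the numbers are made up) of a NumPy array wrapped in a pandas data frame:

```python
import numpy as np
import pandas as pd

# A small two-dimensional NumPy array: each row is [age, gender]
samples = np.array([[20, 1], [23, 1], [26, 0]])

# Wrapping it in a pandas DataFrame gives us labeled rows and columns
df = pd.DataFrame(samples, columns=["age", "gender"])

print(df.shape)          # (3, 2): three rows, two columns
print(df["age"].mean())  # 23.0
```

This labeled, spreadsheet-like view is what makes pandas so convenient for inspecting data in Jupyter.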
Now, to install Jupyter, we are going to use a platform called Anaconda, so head over to anaconda.com/download. On this page you can download the Anaconda distribution for your operating system; there are distributions for Windows, Mac, and Linux, so let's go ahead and install Anaconda for Python 3.7. Here is Anaconda downloaded on my machine; let's double-click it. First, a program will run to determine whether the software can be installed, so let's continue, and once again continue; it's pretty easy. Continue once again, agree to the license agreement, and use the default install location, so don't worry about it; just click install and wait a few seconds.
The beautiful thing about Anaconda is that it will install Jupyter as well as all those popular data science libraries like NumPy, pandas, and so on, so we don't have to install them manually using pip. Now, as part of the next step, Anaconda suggests installing Microsoft VS Code. We already have this on our machine, so we don't have to install it; we can go ahead and close the installer, and finally move it to the trash because we won't need it in the future. Now open a terminal window and type jupyter, a space, and notebook. This will start the notebook server on your machine; you can see these default messages here, don't worry about them. It automatically opens a browser window pointing to localhost, port 8888. This is what we call the Jupyter dashboard. In this dashboard we have a few tabs: the first tab is the Files tab, and by default it points to your home directory, so every user on your machine has a home directory.
This is my home directory on a Mac. You can see here that we have a Desktop folder as well as Documents, Downloads, and so on. On your machine you will see different folders. Next, we need to create a Jupyter notebook. I'm going to go to the Desktop; here's my Desktop, I don't have anything here. Then I click New and create a notebook for Python 3. In this notebook we can write Python code and execute it line by line, and we can easily visualize our data, as you will see in the upcoming videos, so let's move forward with this.
Here is our first notebook; you can see that by default it's called Untitled, so let's change that to HelloWorld. This will be the hello world of our machine learning project. Now, if you look at your desktop, you can see this file, HelloWorld.ipynb. This is a Jupyter notebook; it's kind of similar to our .py files where we write our Python code, but it includes additional data that Jupyter uses to run our code. So let's go back to our notebook, print "Hello world", and then click this Run button, and here is the result, printed right in Jupyter, so we don't have to navigate back and forth to the terminal window; we can see all the results here.
Next, I'll show you how to load a data set from a CSV file in Jupyter. All right, in this lecture we are going to download a data set from a very popular website called kaggle.com. Kaggle is basically a place to do data science projects, so the first thing you need to do is create an account. You can sign up with Facebook, Google, or a custom email and password. Once you register, come back to kaggle.com, and in the search bar search for "video game sales"; this is the name of a very popular data set that we will use in this lecture. Here in this list you can see the first item with this kind of reddish icon, so let's go with that. As you can see, this data set includes sales data for more than 16,000 video games.
On this page you can see the description of the various columns in this data set: we have rank, name, platform, year, and so on. Here is our data source; it is a CSV file called vgsales.csv. As you can see, there are more than 16,000 rows and 11 columns in this data set. Just below, you can see the first few records of this data set. So here is our first record: the rank of this game is 1, it is the Wii Sports game, Wii is the platform, and it was released in the year 2006. Now, what I want you to do is go ahead and download this data set and, as I told you,
first you need to log in before you can download it. This will give you a zip file; as you can see, here is our CSV file. Now I want you to place this right next to your Jupyter notebook. On my machine that's the Desktop, so I'm going to drag and drop this into the Desktop folder. Now, if you look at the Desktop, you can see here is my Jupyter HelloWorld notebook, and right next to it we have vgsales.csv. With that, we go back to our Jupyter notebook; let's remove the first line and instead import pandas as pd.
With this, we import the pandas module and rename it to pd so that we don't have to type "pandas" multiple times in this code. Now let's type pd.read_csv and pass the name of our CSV file, which is vgsales.csv. Because this CSV file is in the current folder, right next to our Jupyter notebook, we can easily load it; otherwise we would have to provide the full path to this file. This returns a data frame object, which is like an Excel spreadsheet. Let me show you: we store it in a variable, df, and then we can just type df to inspect it. So, once again, let's run this program. Here is our data frame, with all these rows and columns: rank, name, platform, and so on. Now, this data frame object has many attributes and methods that we are not going to cover in this tutorial, as they are really beyond the scope of what we are going to do, so I will leave it to you to read the pandas documentation or follow other tutorials to learn about pandas data frames. But in this lecture I will show you some of the most useful methods and attributes.
The first one is shape, so let's run this one more time. Here is the shape of this data set: altogether we have over 16,000 records and 11 columns, so technically this is a two-dimensional array of 16,598 by 11. Okay, now you can see we have another cell to write code in, so we don't have to write all the code in the first cell. Here in the second cell we can call one of the data frame methods, df.describe. Now, when we run this program, we can see the output of each cell right next to it, so here is our first cell.
Here we have these three lines, and this is the output of the last line. Below it we have our second cell; here we are calling the describe method, and just below we have the output of this cell. This is the beauty of Jupyter: we can easily visualize our data. Doing this with VS Code and terminal windows is really tedious and clunky. So what does this describe method return? Basically, it returns some basic statistics about each column in this data set. As you saw earlier, we have columns like rank, year, and so on; these are the columns with numeric values.
Now, for each column we have the count, which is the number of records in that column. You can see our rank column has 16,598 records, while the year column has 16,327 records, so this tells us that some of our records don't have a value for the year column. In a real data science or machine learning project, we would have to use some techniques to clean our data set. One option is to remove the records that don't have a value for the year column, or we can assign them a default value; that really depends on the project. Another attribute for each column is the mean. In the case of the rank column, this value doesn't really matter, but look at year: the mean year for all the video games in our data set is 2006, and this could be important
in the problem we are trying to solve. We also have the standard deviation, which is a measure that quantifies the amount of variation in a set of values. Below that we have min; as an example, the minimum value for the year column is 1980. Very often, when working with a new data set, we call the describe method to get some basic statistics about our data. Let me show you another useful attribute: in the next cell, let's write df.values and run it. As you can see, it returns a two-dimensional array; this bracket indicates the outer array, and the second bracket represents an inner array, so the first element of our outer array is an array itself.
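A minimal, self-contained sketch of these methods and attributes, using a tiny made-up stand-in for vgsales.csv instead of the real download:

```python
import io
import pandas as pd

# Two hypothetical rows with a subset of the vgsales.csv columns
csv_text = io.StringIO(
    "Rank,Name,Platform,Year\n"
    "1,Wii Sports,Wii,2006\n"
    "2,Super Mario Bros.,NES,1985\n"
)
df = pd.read_csv(csv_text)

print(df.shape)       # (2, 4): two rows, four columns
print(df.describe())  # count, mean, std, min, ... for the numeric columns
print(df.values[0])   # the first row as an array
```

With the real vgsales.csv sitting next to the notebook, the only change would be pd.read_csv("vgsales.csv").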
The values in this inner array basically represent the first row of our data set: the video game with rank 1, which is called Wii Sports. So that was a basic overview of pandas data frames. Next, I will show you some of the most useful shortcuts in Jupyter. Now, the first thing I want you to pay attention to is this green bar on the left; it indicates that this cell is currently in edit mode, so we can write code here. Now, if we press the Escape key, the green turns blue, and that means this cell is currently in command mode. So basically the selected cell can be in edit mode or command mode, and depending on the mode we have different shortcuts. Here we are currently in command mode; if we press h,
we can see the list of all keyboard shortcuts. Just above this list you can see the Mac OS modifier keys; these are the extra keys we have on a Mac keyboard. If you are a Windows user, you won't see this. For example, here is the shape of the command key, this is control, this is option, and so on; with this guide you can easily work out the shortcut associated with each command. Here we have all the commands available when a cell is in command mode. For example, we have this command, "open the command palette"; this is exactly the same as the command palette we have in VS Code, and here is the shortcut to execute it, which is Command+Shift+F. Okay, here we have many shortcuts.
We won't use them all all the time, but it's good to take a quick look to see what's available to you; with these shortcuts you can write code much faster. So let me show you some of the most useful ones. I'm going to close this. Now, with our first cell in command mode, I'm going to press b, and this inserts a new cell below the current cell. We can also go back to our first cell and press Escape, so the cell is in command mode; we can insert an empty cell above it by pressing a. So a and b: a for above and b for below. Okay, now, if you don't want this cell, you can press d twice to delete it. Next, in this cell I'm going to print a hello world message, so print("hello world"). To run the code in this cell we can click the Run button here; here is our print function, and just below you can see its output. But note that when you run one cell, this will only execute the code in that cell; in other words, the code in other cells will not be executed. Let me show you what I mean: the cell below this one still contains the call to the describe method. Now I place the cursor back in the cell where we print the hello world message and run it, so you can see that hello world is displayed here, but the cell below still shows the describe table; we don't see any changes there. To solve this problem, we can go to the Cell menu at the top and run all the cells together.
This can work for small projects, but sometimes you're working with a large data set, and if you run all these cells together it will take a long time. That's why Jupyter saves the output of each cell, so we don't have to re-run code that hasn't changed. So this notebook file we have here includes our source code, organized into cells, as well as the output of each cell; that's why it's different from a regular .py file, where we only have the source code. We also have autocompletion and IntelliSense here: in a cell, type df followed by a dot, and if we press Tab we can see all the attributes and methods on this object, so let's call describe. Now, with the cursor on the method name, we can press Shift+Tab to see a tooltip that describes what this method does and what parameters it takes. At the top you can see the signature of the describe method, these are the parameters and their default values, and just below you can see the description of what this method does; in this case, it generates descriptive statistics summarizing the central tendency, and so on. Similar to VS Code, we can also turn a line into a comment by pressing Command+/ on Mac or Ctrl+/ on Windows; now this line is a comment, and we can press the same shortcut once again to remove the comment. So these were some of the most useful shortcuts in Jupyter. Over the next few lectures we are going to work on a real machine learning project, but before we get there, let's delete all the cells here so we start with a single empty cell. Here, in this cell, I'm going to press the Escape key first.
Now the cell is blue, so we are in command mode and we can delete the cell by pressing d twice. There you have it, now the next cell is activated and is in command mode, so let's delete this as well, we have two more cells to delete. there you have it and the last one like this so now we have an empty notebook with a single cell Hi guys, I just wanted to inform you that I have an online coding school at cordwindmarch.com where you can find many courses on web and mobile development, In fact, I have a comprehensive Python course that teaches you everything about Python, from the basics to more advanced concepts, so after watching this tutorial, if you want to learn more, you might want to check out my Python course, which comes with a duration of 30 days. money back guarantee and a certificate of completion that you can add to your resume in case you are interested, the link is below this video during the next lectures we are going to work on a real machine learning project imagine we have a store online music When our users register we ask them their age and gender and based on their profile we recommend several music albums that they are likely to buy, so in this project we want to use machine learning to increase sales, so we want build a model that feeds this. model with some sample data based on existing users, our model will learn the patterns in our data, so we can ask it to make predictions when a user signs up, we tell our model, we have a new user with this profile , what is the type of music that this user is interested in in our model it will say jazz or hip hop or whatever and based on that we can make suggestions to the user, so this is the problem that we are going to solve now, Let's go back to the list of steps in a machine learning. 
project. First we need to import our data; then we need to prepare or clean it; then we select a machine learning algorithm and build a model; we train our model and ask it to make predictions; and finally we evaluate our algorithm to see its accuracy. If it is not accurate,
we either fine-tune our model or select a different algorithm. So, let's focus on the first step: download the CSV file below this video. This is a very basic CSV that I created for this project; it's just some random, made-up data, it's not real. We have a table with three columns: age, gender, and genre. Gender can be 1, representing a male, or 0, representing a female. I'm making some assumptions here: I assume that men between 20 and 25 like hip hop, men between 26 and 30 like jazz, and after 30 they like classical music. For women, I assume that if they are between 20 and 25 they like dance music, if they are between 26 and 30 they like acoustic music, and, just like the men, after 30 they like classical music. Once again, this is a made-up pattern; it is not a representation of reality. So let's go ahead and download this CSV: click the download icon here. In my Downloads folder,
I have this music.csv; I'm going to drag and drop it onto the Desktop, because that's where I saved the HelloWorld notebook, so I want you to put the CSV file right next to your Jupyter notebook. Now let's go back to our notebook and read the CSV file. As before, first we need to import the pandas module, so: import pandas as pd, and then we call pd.read_csv with our file name, music.csv. As you saw earlier, this returns a data frame, which is a two-dimensional structure similar to an Excel spreadsheet, so let's call the result music_data. Now let's inspect this music_data to make sure we loaded everything correctly; run it, and here is our data frame. Next we need to prepare or clean the data, and that is the topic of the next lecture. The second step in a machine learning project is to clean or prepare the data, and that involves tasks like removing duplicates, null values, and so on. Now, in this particular data set we don't have to do any kind of cleanup, because we don't have duplicates and, as you can see, all rows have values for all columns, so we don't have null values. But there is one thing we need to do: we need to split this data set into two separate data sets, one with the first two columns, which we refer to as the input set, and the other with the last column, which we refer to as the output set. When we train a model, we give it these two separate sets of data. The output set, which in this case is the genre column, contains the answers: we are telling our model that if we have a user who is 20 years old and male, he likes hip hop. Once we train our model, we give it a new input set; for example, we say: we have a new user who is 21 years old and male, what is the genre of music this user probably likes?
As you can see from our input set, we do not have a sample for a 21-year-old male user, so we are going to ask our model to predict that; this is why we need to split this data set into separate input and output sets. Going back to our code, this data frame object has a method called drop. Now, if we place the cursor on the method name and press Shift+Tab, we can see a tooltip; this is the signature of the drop method, and these are the parameters we can pass here.
The parameter we will use in this lecture is columns, which is set to None by default. With this parameter we can specify the columns we want to remove, so in this case we set columns to an array with a single string, "genre". Now, this method doesn't actually modify the original data set; instead, it creates a new data set, but without this column. By convention we use a capital X to represent the input data set, so: capital X equals this expression. Now let's inspect X, and you can see our input set, or X, includes these two columns, age and gender; it does not include the output, or the predictions. Next we need to create our output set, so once again we start with our data frame, music_data; using square brackets we can get all the values in a given column, in this case genre. Once more, this returns a new data set; by convention,
we use a lowercase y to represent the output set. Let's inspect that as well; in this data set we only have the predictions, or the answers. So we have prepared our data; next, we need to build a model using a machine learning algorithm. There are so many algorithms out there, and each one has its pros and cons in terms of performance and accuracy. In this lecture we will use a very simple algorithm called a decision tree. Now, the good news is that we don't have to explicitly program these algorithms; they are already implemented for us in a library called scikit-learn. So, at the top: from sklearn.tree import DecisionTreeClassifier. sklearn is the package that comes with the scikit-learn library, which is the most popular machine learning library in Python; in this package we have a module called tree, and in this module we have a class called DecisionTreeClassifier, which implements the decision tree algorithm. Now we need to create a new instance of this class, so at the end we create an object called model and set it to a new DecisionTreeClassifier instance. Now that we have a model, we need to train it so it learns the patterns in the data, and that is pretty easy: we call model.fit. This method takes two data sets, the input set and the output set, so that's capital X and y.
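Putting the steps so far together, here is a minimal sketch; the data frame is a made-up inline stand-in for music.csv (following the assumptions above), so the snippet runs without the file:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for music.csv, following the made-up pattern above
music_data = pd.DataFrame({
    "age":    [20, 23, 26, 29, 31, 20, 22, 26, 28, 31],
    "gender": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "genre":  ["HipHop", "HipHop", "Jazz", "Jazz", "Classical",
               "Dance", "Dance", "Acoustic", "Acoustic", "Classical"],
})

# Input set: everything except the genre column (drop returns a NEW frame)
X = music_data.drop(columns=["genre"])
# Output set: the genre column only
y = music_data["genre"]

model = DecisionTreeClassifier()
model.fit(X, y)  # learn the patterns linking age/gender to genre
```

With the real file, the first three lines of data preparation would just be music_data = pd.read_csv("music.csv").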
Now we finally need to ask our model to make a prediction, so we can ask it what kind of music a 21-year-old male likes. Before we do that, let's temporarily inspect our initial data set, music_data, and look at what we have here. As I told you before, I have assumed that men between 20 and 25 like hip hop, but here we only have three samples, for men who are 20, 23, and 25 years old; we do not have a sample for a 21-year-old man. So if we ask our model to predict the kind of music a 21-year-old man likes, we expect it to say hip hop.
Similarly, I've assumed that women between 20 and 25 like dance music, but we don't have a sample for a 22-year-old woman, so once again, if we ask our model to predict the kind of music a 22-year-old woman likes, we expect it to say dance. With these assumptions, let's move forward and ask our model to make predictions. Let's remove the last line and instead call model.predict. This method takes a two-dimensional array, so here is the outer array, and each element in it is an array itself. So I will pass another array here, and in this inner array I'm going to pass a new input set for a 21-year-old male, so [21, 1], which is like a new record in this table. Okay, so this is one input set; let's pass another input set for a 22-year-old woman, so in another inner array we add [22, 0]. We are asking our model to make two predictions at the same time; we get the result and store it in a variable called predictions, and finally let's inspect that in our notebook. Run it, and look what we got.
Our model says a 21-year-old man likes hip hop and a 22-year-old woman likes dance music, so our model could successfully make predictions here. But wait a second: building a model that makes accurate predictions is not always that easy. As I told you before, after we build a model, we need to measure its accuracy, and if it's not accurate enough, we need to either fine-tune it or build a model using a different algorithm. So next, I'll show you how to measure the accuracy of a model.
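The prediction step looks like this; once again the data set is a made-up inline stand-in for music.csv, so the exact predictions depend on that assumed data:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for music.csv
music_data = pd.DataFrame({
    "age":    [20, 23, 26, 29, 31, 20, 22, 26, 28, 31],
    "gender": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "genre":  ["HipHop", "HipHop", "Jazz", "Jazz", "Classical",
               "Dance", "Dance", "Acoustic", "Acoustic", "Classical"],
})
X = music_data.drop(columns=["genre"])
y = music_data["genre"]

model = DecisionTreeClassifier()
model.fit(X, y)

# Two predictions at once: a 21-year-old man and a 22-year-old woman
new_users = pd.DataFrame([[21, 1], [22, 0]], columns=["age", "gender"])
predictions = model.predict(new_users)
print(predictions)
```

Passing a labeled DataFrame (rather than a bare nested list) keeps the feature names consistent with the training data, which newer versions of scikit-learn warn about otherwise.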
To do this, we first need to split our data set into two sets, one for training and the other for testing, because right now we're passing the entire data set for training, and we are using only two samples for making predictions, which is not enough to calculate the accuracy of a model. A general rule of thumb is to allocate 70 to 80 percent of our data for training and the other 20 to 30 percent for testing. Then, instead of passing just two samples for prediction, we can pass the data set we reserved for testing. We'll get the predictions, and then we can compare these predictions with the actual values in the test set; based on that, we can calculate the accuracy. This is really easy: all we have to do is import a couple of functions and call them in this code.
Let me show you. First, at the top, from sklearn's model_selection module we import a function called train_test_split. With this function we can easily split our data set into two sets, for training and testing. Right after defining the X and y sets, we call train_test_split and give it three arguments: X, y, and the keyword argument test_size=0.2, so we're allocating 20 percent of our data for testing. Now, this function returns a tuple, which we can unpack into four variables right here: X_train, X_test, y_train, and y_test. The first two variables are the input sets for training and testing, and the other two are the output sets for training and testing.
Now, when we train our model, instead of passing the entire data set, we want to pass only the training data set, so X_train and y_train. Also, when making predictions, instead of passing those two samples, we pass X_test, which is the data set containing the input values for testing. Now we get the predictions; to calculate the accuracy, we simply have to compare these predictions with the actual values we have in the output set for testing. That's very easy: first, at the top, we need to import another function, so from sklearn.metrics import accuracy_score. Now, at the end, we call accuracy_score and give it two arguments: y_test, which contains the expected values, and predictions, which contains the actual predictions.
Now, this function returns an accuracy score between zero and one, so we store it in a variable and display it in the console. Let's go ahead and run this program. Here the accuracy score is 1, or 100 percent, but if we run it one more time we will see a different result, because every time we split our data set into training and testing sets we get different sets: this function randomly selects the data for training and testing. Let me show you. Place your cursor on the cell; now you can see that this cell is activated.
Note that if you click this Run button here, it runs the cell and also inserts a new cell below it. Let me show you: if I go to the second cell and press the Escape key, we are now in command mode; press D twice and the cell is removed. Okay, now if we click the Run button, you can see that the code has been run and we have a new cell below. So if we want to run our first cell multiple times, each time we would have to click it, run it, click it again, and run it again, which is a little tedious. Let me show you a shortcut: activate the first cell and press Ctrl+Enter. This runs the current cell without adding a new cell below it. So let's go back here and run it a few times.
Okay, now look: the accuracy dropped to 0.75, which is still good. The accuracy score here stays between 75 and 100 percent. But let me show you something. If I change the test size from 0.2 to 0.8, we are essentially using only 20 percent of our data to train this model and the other 80 percent for testing. Now let's see what happens when we run this cell multiple times with Ctrl+Enter. Look, the accuracy immediately dropped to 0.4; run it once again and it's 46 percent, then 26 percent, which is actually very bad. The reason this happens is that we are using very little data to train the model. This is one of the key concepts in machine learning: the more data we give our model, and the cleaner that data is, the better the result. If we have irrelevant data, duplicates, or incomplete values, our model will learn bad patterns from our data. That's why it is very important to clean our data before training our model.
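The effect of the split size can be demonstrated directly by training with both values of test_size in one cell. This sketch again uses a small hypothetical stand-in for music.csv, since the real file isn't reproduced here.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for music.csv (same shape as the tutorial's data)
music_data = pd.DataFrame({
    'age':    [20, 23, 25, 26, 29, 30, 31, 33, 37,
               20, 21, 25, 26, 27, 30, 31, 34, 35],
    'gender': [1] * 9 + [0] * 9,
    'genre':  ['HipHop'] * 3 + ['Jazz'] * 3 + ['Classical'] * 3 +
              ['Dance'] * 3 + ['Acoustic'] * 3 + ['Classical'] * 3,
})
X = music_data.drop(columns=['genre'])
y = music_data['genre']

scores = {}
for test_size in (0.2, 0.8):
    # With test_size=0.8 only about 20% of the samples are left for
    # training, so the model usually learns much weaker patterns.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size)
    model = DecisionTreeClassifier()
    model.fit(X_train, y_train)
    scores[test_size] = accuracy_score(y_test, model.predict(X_test))

print(scores)  # the exact numbers vary run to run because the split is random
```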
Now let's change this back to 0.2 and run it once again. Okay, now the accuracy is 75 percent; run it again and we're down to 50. The reason this happens is that we don't have enough data. Some machine learning problems require thousands or even millions of samples to train a model; the more complex the problem, the more data we need. For example, here we are only dealing with a three-column table, but if we wanted to build a model to tell whether an image is a cat, a dog, a horse, or a lion, we would need millions of images; the more animals we want to support, the more images we need. In the next lecture we will talk about model persistence.
So far this is a very basic implementation of building and training a model to make predictions. Now, to simplify things, I have removed all the code we wrote in the last lesson to calculate accuracy, because in this lesson we are going to focus on a different topic. We basically import our data set, create a model, train it, and then ask it to make predictions. The code snippet you see here is not what we want to run every time we have a new user or every time we want to make recommendations to an existing user, because training a model can sometimes take a long time. In this example we are dealing with a very small data set of only 20 records, but in real applications we may have a data set with thousands or millions of samples, and training a model on that can take seconds, minutes, or even hours. That's why model persistence is important: we build and train our model once, then save it to a file. The next time we want to make predictions, we simply load the model from the file and ask it to make predictions. That model is already trained, so there is no need to retrain it; it's like a smart person who already knows the answers. Let me show you how to do this; it's very, very easy. At the top, from sklearn.externals we import joblib (note that in recent versions of scikit-learn, joblib is a standalone package that you import directly with import joblib). This joblib object has methods for saving and loading models, so after training our model we just call joblib.dump and give it two arguments: our model and the name of the file in which we want to store it.
Let's call it music-recommender.joblib; that's all we have to do. Now, temporarily, I'm going to comment out the prediction lines: we don't want to make any predictions, we just want to store our trained model in a file. Let's run this cell with Ctrl+Enter. Okay, look: in the output we have an array containing the name of our model file; this is the return value of the dump method. Now, back on the desktop, right next to our Jupyter notebook file, you can see the music-recommender.joblib file. This is where our model is stored.
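A minimal sketch of the full save-and-load cycle looks like this. It assumes a modern scikit-learn where joblib is imported directly, and it uses a tiny hypothetical stand-in for the training data.

```python
import joblib  # in older scikit-learn versions: from sklearn.externals import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in training data: age, gender (1 = male, 0 = female) -> genre
X = pd.DataFrame({'age': [22, 35, 22, 35], 'gender': [1, 1, 0, 0]})
y = ['HipHop', 'Classical', 'Dance', 'Classical']

model = DecisionTreeClassifier()
model.fit(X, y)

# Persist the trained model to a file...
joblib.dump(model, 'music-recommender.joblib')

# ...and later, load it back and predict without retraining
loaded_model = joblib.load('music-recommender.joblib')
predictions = loaded_model.predict(pd.DataFrame({'age': [21], 'gender': [1]}))
print(predictions)
```

In a real application, the dump step and the load step would live in separate scripts: one trains and saves, the other only loads and predicts.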
It's just a binary file. Now, back in our Jupyter notebook: as I told you before, in a real application we don't want to train a model every time, so let's comment out these few lines. I selected them; on Mac we can press Cmd+/ and on Windows Ctrl+/. Okay, these lines are commented out. This time, instead of dumping our model, we are going to load it, so we call the load method. We don't have a model object to pass; we just pass the name of our model file, and this returns our trained model. Now, with these two lines, we can simply make predictions. Earlier we assumed that men between 20 and 25 years old like hip-hop music, so let's print the predictions and see if our model behaves correctly. Ctrl+Enter, and there you have it. This is how we persist and load models.
Earlier in this section, I told you that decision trees are the easiest models to understand, and that is why we started machine learning with decision trees. In this lecture we are going to export our model in a visual format so you can see exactly how it makes predictions, which is really cool. Let me show you. Once again I have simplified this code: we simply import our data set, create the input and output sets, create a model, and train it. That's all we are doing now. I want you to follow along and type everything exactly as I show you in this lecture; don't worry about what it all means, we'll come back to it shortly. At the top, from sklearn we import tree. This object has a method for exporting our decision tree in a graphical format, so after training our model we call tree.export_graphviz. There are several arguments we need to pass. The first argument is our model. The second is the name of the output file, and here we use keyword arguments, because this method takes many parameters and we want to set them selectively without worrying about their order. So the parameter we set is out_file; let's set it to music-recommender.dot. This is the DOT format, a graph description language that you'll see shortly. The other parameter we want to set is feature_names.
We set it to an array of two strings, age and gender. These are the features, or columns, of our data set, so they are the properties or characteristics of our data. The next parameter is class_names; we need to set this to the list of classes, or labels, that we have in our output data set, such as hip hop, classical, jazz, and so on. Our y data set includes all the genres, or classes, but they are repeated multiple times, so here we call y.unique(), which returns the unique list of classes.
Now we need to sort this alphabetically, so we call the sorted function and pass it the result of y.unique(). The next parameter is label; we set it to the string 'all'. Once again, don't worry about the details of these parameters; we'll come back to them shortly. So set label to 'all', then rounded to True, and finally filled to True. This is the final result, so let's run this cell with Ctrl+Enter. Okay, here we have a new file, music-recommender.dot. We want to open this file in VS Code, so drag and drop it into a VS Code window. Okay, here is the DOT format: a textual language for describing graphs. Now, to visualize this graph, we need to install an extension in VS Code, so on the left side click the Extensions panel and search for "dot". Look at the second extension here, "Graphviz (dot) language support" by stephanvs. Go ahead and install this extension and then reload VS Code. Once you do, you can visualize this DOT file. Let me close this tab and look at the DOT file again. Here on the right side, click this button; a new menu opens, so select "Open Preview to the Side". All good, here is the visualization of our decision tree. Let's close the DOT file. There you have it: this is exactly how our model makes predictions. We have a binary tree, which means each node can have a maximum of two children. On top of each node we have a condition: if the condition is true, we go to the child node on the left side; otherwise, we go to the child node on the right side. So let's see what happens here.
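Collecting the arguments above, the full export cell can be sketched like this, using a tiny hypothetical stand-in for the training data since the real music.csv isn't shown here:

```python
import pandas as pd
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Tiny hypothetical stand-in for the training data
music_data = pd.DataFrame({
    'age':    [20, 26, 31, 20, 26, 31],
    'gender': [1, 1, 1, 0, 0, 0],
    'genre':  ['HipHop', 'Jazz', 'Classical', 'Dance', 'Acoustic', 'Classical'],
})
X = music_data.drop(columns=['genre'])
y = music_data['genre']

model = DecisionTreeClassifier()
model.fit(X, y)

# Export the trained tree in Graphviz DOT format
tree.export_graphviz(
    model,
    out_file='music-recommender.dot',
    feature_names=['age', 'gender'],
    class_names=sorted(y.unique()),
    label='all',       # show labels on every node
    rounded=True,      # rounded box corners
    filled=True,       # fill each node with a color
)
```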
The first condition is age <= 30.5. If this condition is false, the user is older than 30, so the genre of music they are interested in is classical. Here we are classifying people based on their profile; that is the reason we have the word "class" in each node. So a user older than 30 belongs to the class of classical music, that is, people who like classical music. Now what happens if this condition is true? That means the user is 30 or younger, so now we check the gender. If it is less than 0.5, which basically means it equals 0, then we are dealing with a female, so let's go to the child node here.
Now, once again, we have another condition, because we are dealing with a female who is 30 or younger. We need to check her age once more: if age <= 25.5, that user likes dance music; otherwise she likes acoustic music. So this is the decision tree that our model uses to make predictions. If you're wondering why we have these floating-point numbers like 25.5, these are basically the rules our model generates based on the patterns it finds in our data set. As we give our model more data, these rules will change, so they are not always the same.
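The rules read off the visualized tree can be written as plain Python, which makes the walkthrough above concrete. Note that the male-under-30 sub-tree isn't spelled out in this walkthrough; the 'HipHop' branch below is a placeholder based on the earlier assumption in this tutorial that men aged 20 to 25 like hip hop.

```python
def predict_genre(age: float, gender: int) -> str:
    # Thresholds as read off the visualized tree (gender: 1 = male, 0 = female).
    if age <= 30.5:
        if gender <= 0.5:                 # female, 30 or younger
            return 'Dance' if age <= 25.5 else 'Acoustic'
        # Male, 30 or younger: this sub-tree is not described in the
        # walkthrough; 'HipHop' is a hypothetical placeholder taken from
        # the earlier assumption about men aged 20 to 25.
        return 'HipHop'
    return 'Classical'                    # older than 30

print(predict_genre(31, 1))   # Classical
print(predict_genre(24, 0))   # Dance
print(predict_genre(28, 0))   # Acoustic
```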
Additionally, the more columns or features we have, the more complex our decision tree becomes; currently we have only two features, age and gender. Now let's go back to our code so I can quickly explain the meaning of all these parameters. We set filled to True so that each box, or node, is filled with a color. We set rounded to True so that the boxes have rounded corners. We set label to 'all' so that every node has labels we can read. We set class_names to the sorted unique list of genres so that each node shows its class, and we set feature_names to age and gender so that we can see the rules in our nodes. Thank you for watching my tutorial. I hope you learned a lot and are excited to learn more. If you enjoyed this tutorial, please like it and share it with others, and make sure to subscribe to my channel, as I upload new videos every week. Once again, thank you, and I wish you all the best.
