Twitter Sentiment Analysis Using Python

Jun 05, 2021

Hello everyone and welcome to this video about the Python programming language and machine learning. In this video we will do a Twitter sentiment analysis. I'm currently on Google's website called Colab (colab.research.google.com), because it makes it really easy to get started programming in Python, and that means you don't have to install Python on your computer. You can simply go to this website, sign in with your Google account, and start writing your Python code. So let's go ahead and start writing our program. The first thing you're going to want to do is click on File and then click on New Python 3 notebook, which opens a new tab with a new cell.

Inside this cell, I like to put a description of the program before I write any logic, so let's do it in the comments. My description says that this is a sentiment analysis program that analyzes tweets obtained from Twitter using Python.

Okay, now let's create a new cell by clicking the Code button. In this cell I'm going to import the libraries that we're going to use throughout this program: I'm going to import tweepy; from textblob I'm going to import TextBlob; from wordcloud I'm going to import WordCloud; then I'm going to import pandas as pd, numpy as np, and re for regular expressions; and last but not least, I'm going to import matplotlib.pyplot as plt. We'll also style our charts, so type plt.style.use and we'll use the style 'fivethirtyeight'.


We want to run this cell by clicking the button here on the left; we're just checking to make sure there are no errors importing these libraries, and there don't seem to be any. So let's move this up and create a new cell. The idea of this program is to take some tweets and capture the sentiment of those tweets, but before you can do that, you're going to need a Twitter account and a Twitter app. I already have both, and I'll put a link in the description that shows how to set them up. I've put my Twitter app keys in a file, and we need these keys to authenticate to Twitter, so I'm going to upload that file and then extract the keys, since I'm working on Google's Colab website.

I need to use the Google Colab library to load the data, and that's exactly what we're going to do here. I'll add a comment, "Load the data," then type from google.colab import files and create a variable called uploaded, which is set equal to files.upload() to start the file upload. Let's run this cell (I had to run it one more time), click on Choose Files, and pick the file that has my four keys, which is called login.csv. Then I'm going to create a new cell to get the data. I was about to create a variable called consumerKey, but wait, before I do that, let me load the data itself.

Whoa, I'm getting ahead of myself. I need to store the CSV file in some variable, so I'll create a variable called log and set it equal to pd.read_csv, reading the login.csv file that I uploaded. Let's run this. Now that I'm not getting ahead of myself, I'm going to get the Twitter API credentials, so I'll create that variable called consumerKey and set it equal to log['key'][0], that is, the column called key at position 0.

Next, I'm going to get the consumer secret. It's basically the same except at position 1, so I'll copy this using Ctrl+C, paste it using Ctrl+V, and change it to position 1, where that key is stored. Then I create a variable called accessToken and set it equal to the same thing but at position 2, and a variable called accessTokenSecret at position 3. Let's run this cell and create a new cell. Now we need to authenticate and create an API object. First, the authentication object: I'm going to create a variable called authenticate and set it equal to tweepy.OAuthHandler, which takes the consumer key and the consumer secret that we stored earlier.
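The credential-extraction step described above can be sketched like this. The file layout (a single column named key, with the consumer key, consumer secret, access token, and access token secret in rows 0 to 3) is my reading of the video, the camelCase variable names are assumed, and the values are dummy placeholders; io.StringIO stands in for the uploaded login.csv so the sketch runs anywhere.

```python
import io

import pandas as pd

# Hypothetical stand-in for the uploaded login.csv: a single column named
# "key" holding the four credentials in rows 0-3 (dummy values, not real keys).
csv_text = """key
CONSUMER_KEY_PLACEHOLDER
CONSUMER_SECRET_PLACEHOLDER
ACCESS_TOKEN_PLACEHOLDER
ACCESS_TOKEN_SECRET_PLACEHOLDER
"""

# In Colab this would be pd.read_csv("login.csv") after files.upload().
log = pd.read_csv(io.StringIO(csv_text))

# Pull out each credential by row position in the "key" column.
consumerKey = log["key"].iloc[0]
consumerSecret = log["key"].iloc[1]
accessToken = log["key"].iloc[2]
accessTokenSecret = log["key"].iloc[3]

print(consumerKey, accessTokenSecret)
```

Using .iloc makes the "by position" intent explicit; with the default 0-based index, log['key'][0] returns the same value.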

Alright, next I'm going to set the access token and the access token secret. To do that, I just type authenticate.set_access_token, passing the access token and the access token secret as the two parameters, in that order. Last but not least, I'm going to create the API object while passing in the authentication information, so I'll create a variable called api and set it equal to tweepy.API, passing in authenticate and wait_on_rate_limit=True, so that tweepy waits instead of failing when we hit Twitter's rate limit. Now let's run this cell and create a new cell.

I'm going to scroll up. Now what I want to do is extract 100 tweets from a Twitter user, and I'll use the account of Bill Gates, the co-founder of Microsoft. I chose this account because Bill Gates is trying to have a positive impact on the world: in 2000, together with Melinda Gates, he founded an organization called the Bill & Melinda Gates Foundation. The foundation is guided by the belief that every life has equal value, and it works to help all people live healthy and productive lives, so my guess is that the tweets from Bill Gates' account will be mostly positive.

In this cell, let's pull 100 tweets from this user. I'll also put a link to the Bill & Melinda Gates Foundation in the description below so you can check it out. Anyway, let's create a variable called posts and set it equal to api.user_timeline, passing the user's screen name, which is BillGates; count=100, because we want 100 tweets from this account; lang="en" for English; and tweet_mode="extended", so that we get the full text of each tweet instead of a truncated version. That looks good. Now let's print the five most recent tweets from the account. I'll write a print statement that says "Show the 5 recent tweets" followed by a new line, and then loop over the tweets: for tweet in posts[0:5] (which doesn't include index 5), we print tweet.full_text, adding a new line after each one. Let's run this, and it looks pretty good.

We can see a tweet about the Roger Federer Foundation, which partners with local NGOs in six southern African countries to improve early learning and basic education; the tweet calls it an awesome organization and encourages you to consider supporting its work. That's the most recent tweet. I want to make the output look a little better, so I'm going to create a variable i equal to 1, convert it to a string, and add a right parenthesis after it. Oh no, that's not right, sorry: we need to put a space there and then append tweet.full_text. Let's try that. Now I see that the number 1 is printed several times, so we need to make sure the counter gets incremented by setting i = i + 1. Let's run it again, and now we can see the five most recent tweets, numbered.
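The numbered printout described above can be sketched without a Twitter connection. Here types.SimpleNamespace objects are hypothetical stand-ins for the Status objects that api.user_timeline returns, and the helper name recent_tweet_lines is my own; enumerate(..., start=1) replaces the manual i = i + 1 counter.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the Status objects returned by
# api.user_timeline(..., tweet_mode="extended"); only full_text is used.
posts = [SimpleNamespace(full_text=f"Sample tweet number {n}") for n in range(1, 8)]


def recent_tweet_lines(posts, n=5):
    """Return the first n tweets as numbered lines like '1) text'."""
    # posts[:n] excludes index n, matching posts[0:5] in the video;
    # enumerate(start=1) supplies the running number.
    return [f"{i}) {tweet.full_text}" for i, tweet in enumerate(posts[:n], start=1)]


print("Show the 5 recent tweets:\n")
for line in recent_tweet_lines(posts):
    print(line + "\n")
```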

Okay, let's create a new cell, and in this cell I'm going to create a data frame with a column called Tweets. I'll create a variable called df, short for data frame, and set it equal to pd.DataFrame, built from the full text of each tweet (tweet.full_text for tweet in posts), with the column name Tweets. Next, let's show the first five rows of data: just type df.head(), with the parentheses, and run the cell. Now we have stored these tweets in this new data frame. Let's create a new cell here, because we need to clean the text: if you look here, we can see @mentions here, here, and here, and we can see some URLs, so let's see what else we have.
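Building the data frame described above looks roughly like this. The two sample tweets are hypothetical stand-ins for posts, again modeled with types.SimpleNamespace so the sketch runs without tweepy.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-ins for the tweepy Status objects in `posts`.
posts = [
    SimpleNamespace(full_text="@someone Great progress on #health https://t.co/abc"),
    SimpleNamespace(full_text="RT @other: A new report on the world"),
]

# One row per tweet, stored in a single column named "Tweets".
df = pd.DataFrame([tweet.full_text for tweet in posts], columns=["Tweets"])

# Show the first five rows (here there are only two).
print(df.head())
```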

I'm sure we're going to have some hashtags too, so let's go ahead and clean the data. To do this I'm going to create a function called cleanTxt that takes some text and removes these words and symbols. For each kind of clutter, I set text equal to re.sub(...), which replaces any match of a pattern with the empty string. First, the @mentions: I want to remove them, so with regular expressions the pattern is an @ sign followed by any uppercase letter A to Z, lowercase letter a to z, or digit 0 to 9, and then a plus sign for one or more of those characters. I'm going to put an r in front of the pattern, which tells Python that it's a raw string, so backslashes are passed to the regular expression as-is. Once we find that pattern, we replace it with the empty string, because I just want to delete it, and of course we also pass in the text we want to search. So this line removes the @mentions. Next, I set text equal to re.sub again, with an r string, to get rid of any hash symbols (number signs) in the text, again replacing them with the empty string and passing in the text. Here we're only removing the hashtag symbol itself, not the word that follows it.

Next, let's remove RT, the retweet marker: text = re.sub(r'RT[\s]+', '', text) removes RT followed by one or more whitespace characters. Then let's remove hyperlinks. The pattern is http, then s followed by a question mark (so the s may appear 0 or 1 times), then a colon and two forward slashes, and then one or more non-whitespace characters, written as a backslash with a capital S followed by a plus sign. Again, we replace it with the empty string and pass in the text, so this line removes the hyperlinks. When we're done, we return the text, and the function looks good.
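Put together, the cleaning function described above can be sketched as follows. The four patterns follow the video's description (the RT pattern is my reading of the transcript); the forward slashes in the URL pattern don't need escaping in Python, so they are written plainly here.

```python
import re


def cleanTxt(text):
    """Strip @mentions, '#' symbols, the RT marker, and links from a tweet."""
    text = re.sub(r"@[A-Za-z0-9]+", "", text)  # remove @mentions
    text = re.sub(r"#", "", text)              # remove the '#' symbol, keep the word
    text = re.sub(r"RT[\s]+", "", text)        # remove 'RT' followed by whitespace
    text = re.sub(r"https?://\S+", "", text)   # remove hyperlinks
    return text


print(repr(cleanTxt("RT @BillGates: Great #progress https://t.co/abc")))
# prints ': Great progress '
```

Two limitations worth knowing: [A-Za-z0-9] stops at underscores, so a handle like @Bill_Gates would leave _Gates behind (adding _ to the character class fixes that), and RT[\s]+ also matches the end of words such as "START ", so a stricter variant would anchor it with \b or ^.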

I'm going to bring this up a little. Now we can apply this function: I write df['Tweets'] = df['Tweets'].apply(cleanTxt) (the column name Tweets has to be a string), which applies the function to every tweet. Then let's show the cleaned text by typing df and running the cell. We can see that all of the @mentions have disappeared from our tweets, and the links are gone too: where a tweet previously ended with a link, the link has been removed. So it looks good; we've cleaned up the text. We could probably do a little more cleaning, but for now I think it's good enough. I'll create a new cell and bring this up.
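df['Tweets'].apply(cleanTxt) works fine. As an alternative (my suggestion, not something from the video), pandas' Series.str.replace with regex=True can apply each of the same four patterns to the whole column at once. The sample tweets are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the df built earlier.
df = pd.DataFrame({"Tweets": [
    "RT @someone: Great #progress https://t.co/abc",
    "Thanks @partner for the #health work",
]})

# Same four patterns as cleanTxt, applied to the whole column in turn.
patterns = [r"@[A-Za-z0-9]+", r"#", r"RT[\s]+", r"https?://\S+"]
cleaned = df["Tweets"]
for pattern in patterns:
    cleaned = cleaned.str.replace(pattern, "", regex=True)

df["Tweets"] = cleaned
print(df)
```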

Now, in this cell, I'm going to create functions to get something called subjectivity and polarity, and I'm going to create two more columns to store all the subjectivity and polarity scores. First, a function to get the subjectivity, which indicates how subjective or opinionated the text is. I'm going to call this function getSubjectivity; it takes some text and returns TextBlob(text).sentiment.subjectivity. We also want a function to get the polarity, which tells how positive or negative the text is. I'm going to call it getPolarity; it also takes some text and returns TextBlob(text).sentiment.polarity. (In TextBlob, subjectivity runs from 0.0, very objective, to 1.0, very subjective, and polarity runs from -1.0, most negative, to 1.0, most positive.) Now let's create the two new columns, one called Subjectivity and the other called Polarity: type df['Subjectivity'] and set it equal to df['Tweets'].apply(getSubjectivity), and do basically the same thing for the polarity with df['Polarity'] = df['Tweets'].apply(getPolarity). Now I want to display the new data frame with the new columns, so I just type df and run this cell. That looks pretty good: for each tweet, we can now see its subjectivity and its polarity.

Next I'll create a new cell, because I want to see how the sentiments are distributed. A good way to do that is to understand the common words by plotting something called a word cloud. A word cloud, also known as a tag cloud or text cloud, is a visualization where the more a specific word appears in the text, the larger and bolder it appears in the cloud. So let's visualize all the words in the data with a word cloud. I'll add a comment, "Plot the word cloud," and create a variable called allWords, set equal to a string containing a single space, joined over all the tweets in our Tweets column: ' '.join([twts for twts in df['Tweets']]). Then I'm going to create a variable called wordCloud and set it equal to WordCloud with a width of 500, a height of 300, random_state=21, and max_font_size=110. That looks good, so let's generate it by calling .generate(allWords).

To show it, I type plt.imshow, passing the word cloud with interpolation="bilinear". Let's remove the axes with plt.axis("off") and then call plt.show(). Let's run this. I got an error saying that the word cloud name is not defined. Ah, I see why: I hadn't capitalized the class name correctly. It should be WordCloud, with uppercase letters, and now it works.
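To see what the word cloud is measuring, here is a rough stand-in using only the standard library: the same ' '.join step followed by a word count with collections.Counter. The sample tweets are hypothetical, and this is not how the wordcloud package works internally; WordCloud does its own tokenizing and by default drops common English stopwords such as "the", so its sizes won't match these raw counts exactly.

```python
import re
from collections import Counter

# Hypothetical cleaned tweets standing in for df["Tweets"].
tweets = [
    "Progress on health around the world",
    "A new report on world health",
    "The world is making progress",
]

# Same join step as in the video: one long string, tweets separated by spaces.
allWords = " ".join(tweets)

# Count how often each lowercase word occurs; a word cloud draws frequent
# words larger and bolder.
counts = Counter(re.findall(r"[a-z']+", allWords.lower()))
print(counts.most_common(3))
```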

Okay, now we have our word cloud, and we can see that "world" is quite big, which means the word "world" appeared a lot in the text. "New" is another word that appeared a lot, along with "work," "health," "year," "progress," "disease," and "foundation." It's interesting that many of these seem to be about helping and working toward a new world. I think these are good words. I'm not sure about "disease," but I guess it depends on how it's used. Anyway, that's the word cloud. Now let's create a new cell, and in this one I want to create a function to compute the negative, neutral, and positive analysis. I'm going to call this function getAnalysis, and it takes a score: if the score is less than zero, then the tweet is negative.

In that case we return the string 'Negative'; else if the score is equal to zero, we return 'Neutral'; and if it's neither negative nor neutral, then it's positive, so we return 'Positive'. That looks good. Then I'm going to create a new column called Analysis and set it equal to df['Polarity'].apply(getAnalysis), because the polarity is the score, and getAnalysis is the function we just created. Now I'm going to show the data frame, so I just type df and run this. Now we can easily see whether each tweet is positive or negative: here's our tweet, and the analysis says it's positive. We've got positive, positive, positive, here's a negative, and it looks like we have a neutral here, out of the hundred tweets we extracted. Okay, let's create a new cell, and in this cell I want to print all the positive tweets. I'm going to create a variable called j and set it equal to 1, and a variable called sortedDF, short for sorted data frame, set equal to df.sort_values(by=['Polarity']), so the values are sorted by polarity. Then, for i in range(0, sortedDF.shape[0]), where shape[0] is the number of rows in our data frame, I want to print sortedDF['Tweets'][i]. I'll use j as the number, converted to a string and followed by a right parenthesis and a space, so the output looks decent. The idea is to go through the tweets ordered by their polarity. Then let's print a new line and set j equal to j + 1, because it needs to be incremented, and run this.
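The labeling step described above (getAnalysis applied to the Polarity column) can be sketched as follows. The polarity values are hypothetical stand-ins for the TextBlob scores, which range from -1.0 to 1.0.

```python
import pandas as pd


def getAnalysis(score):
    """Map a polarity score (-1.0 to 1.0) to a sentiment label."""
    if score < 0:
        return "Negative"
    elif score == 0:
        return "Neutral"
    else:
        return "Positive"


# Hypothetical polarity values standing in for df["Polarity"].
df = pd.DataFrame({"Polarity": [0.35, 0.0, -0.2, 0.8]})
df["Analysis"] = df["Polarity"].apply(getAnalysis)
print(df)
```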

We're not done yet, right? But I ran it anyway, and now we'll continue. I need an if statement to check whether the tweet is negative, positive, or neutral: if sortedDF['Analysis'][i] == 'Positive', and everything we wrote before goes inside that if block, so let me indent it. Now everything looks good. Sorry for the confusion. Basically, what I did here was create a variable called sortedDF, which sorts the data frame by polarity; then I loop through the rows of the sorted data frame, and if the analysis for row i is positive, we print that tweet, followed by a new line.

Let's go ahead and run this. Here we have the Roger Federer Foundation tweet about partnering with local NGOs in six southern African countries to improve early learning and basic education, which is the first tweet in our list of positive tweets. That's quite interesting, and we can easily see that there are eighty-one positive tweets, so 81 out of 100 tweets are positive. That suggests that Bill Gates, the author, has expressed mostly positive sentiment in these tweets, at least in the last 100, 81 percent of which are positive. Now let's create a new cell. We could just copy, paste, and change a few things, but I'll write it again, this time printing the negative tweets. Let's create a variable called j and set it equal to 1, and create the sortedDF variable again. Could we reuse the old one? No, because this time I'm going to sort the tweets in descending order, so sortedDF = df.sort_values(by=['Polarity'], ascending=False). I didn't pass ascending in the previous cell because its default value is True. Then, for i in range(0, sortedDF.shape[0]), the number of rows, I'm going to print sortedDF['Tweets'][i], but only if sortedDF['Analysis'][i] is 'Negative'. To make it look good, I convert j to a string and add the right parenthesis and a space before the tweet, then print a new line and set j equal to j + 1.
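One caveat on the two loops above, which is my own observation rather than something from the video: sort_values keeps each row's original index label, and sortedDF['Tweets'][i] looks a row up by label, not by position. With the default 0 to 99 index, the loops therefore print tweets in their original timeline order, not in polarity order. The sketch below, using hypothetical data and a helper name of my own (list_tweets), filters first and then iterates over the sorted values, so the strongest sentiment really does come first.

```python
import pandas as pd

# Hypothetical data standing in for the df built above.
df = pd.DataFrame({
    "Tweets": ["Good news", "Bad news", "Great news", "Terrible news"],
    "Polarity": [0.7, -0.7, 0.8, -1.0],
    "Analysis": ["Positive", "Negative", "Positive", "Negative"],
})


def list_tweets(df, label):
    """Return numbered tweets with the given label, strongest sentiment first."""
    subset = df[df["Analysis"] == label]
    # Most positive first for "Positive"; most negative first for "Negative".
    subset = subset.sort_values(by="Polarity", ascending=(label == "Negative"))
    # Iterating over the column's values follows the sorted order.
    return [f"{j}) {tweet}" for j, tweet in enumerate(subset["Tweets"], start=1)]


for line in list_tweets(df, "Positive"):
    print(line + "\n")
```

If you prefer to keep the original loop, replacing sortedDF['Tweets'][i] with sortedDF['Tweets'].iloc[i] (and likewise for Analysis) makes the lookup positional.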

Alright, let's go ahead and run this. I got an error: I misspelled sortedDF here, so let's fix that and run it again. Now it seems the first negative post is the one about the game. I'm not really sure why that one is classified as negative. Let's take another look. One tweet says it's encouraging to see these results, which seems somewhat positive. Another says "I was deeply moved," okay. Another says it's exciting to see world leaders and climate advocates come together at the UN Climate Action Summit to make progress, and that working together we can prevent the worst effects of climate change. So maybe the analyzer saw the word "worst" and decided that's a bad word. Here we have "terrible," which must also read as a bad or negative feeling. Here we have "mortality," so I can see how these were classified as negative, and here we have "injustice," which is a pretty negative word. I'm still not sure about the game tweet, but anyway, we have nine negative tweets from this Twitter user.

Okay, let's create a new cell, because I want to plot polarity and subjectivity. To do this we simply write plt.figure with figsize=(8, 6). Then, for i in range(0, df.shape[0]), which again is the number of rows, we plot the data as a scatter plot: the x-coordinate is df['Polarity'][i], the y-coordinate is df['Subjectivity'][i], and we add a little color to the data points, so I'll make them blue. Then we give our chart a title by typing plt.title('Sentiment Analysis').

Okay, next we're going to give our graph its axis labels, plt.xlabel('Polarity') and plt.ylabel('Subjectivity'), and then show the plot with plt.show(). Let's run this. Now we can visually see the polarity of Bill Gates' tweets. The line in the middle, at a polarity of 0, is neutral, and we can see that most of the posts are on the right side of that line, with only a few on the left. There should be nine on the left, so let's count: one, two, three, four, five, six, seven, eight, nine. Okay, let's create a new cell, and in this cell I'll get the percentage of positive tweets, which we already know.

It's 81, right? Let me scroll down... yes, it was 81. But I'll show you another way, because I really like to show different ways of doing things. First we get all the positive tweets: I'm going to create a variable called ptweets and set it equal to the rows of our data frame where the analysis is positive, df[df.Analysis == 'Positive']. Yes, you could do this instead of the for loop and just print ptweets, but I didn't, so I'm just showing you another way. Anyway, let's set ptweets equal to ptweets['Tweets'], which keeps only the Tweets column, and then print ptweets.

Let's go ahead and run this. We can see that we got all the positive tweets back. Don't be confused by the numbers on the left: they're the data frame's index, not the numbers I created earlier when I printed the positive tweets, but these are the same posts we printed the first time. Now, to get the percentage, I just need the number of rows in this data set, which is ptweets.shape[0] (or, of course, you could use len), divided by the total number of rows in our data frame, df.shape[0], and multiplied by 100. Then I'll round the result, so let's use the round function.

I'll cut this expression using Ctrl+X, paste it back inside the round call, and run this again. Now we see that we got 81 percent back, which is what we expected. Let's create a new cell and do the same thing for the negative tweets, to get the percentage of negative tweets. I'll create a variable called ntweets and set it equal to df[df.Analysis == 'Negative'], then keep only the Tweets column by setting ntweets equal to ntweets['Tweets'], and finally round ntweets.shape[0] divided by df.shape[0].

This time I multiply by 100 and round to one decimal place. Let's run this, and it gives us 9 percent, which of course we already knew. So this just shows you another way to get the positive and negative tweets; you can use it if you want, and if not, it doesn't really matter. You could also keep a count in the for loop to get the number and percentage of positive and negative tweets. It also means that 10 of the tweets are neutral; I won't bother printing the neutral tweets or showing their percentage here. Let's go ahead and create a new cell.
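The percentage calculation described above can be sketched like this. The labels are hypothetical and only mirror the 81/9/10 split reported in the video; value_counts(normalize=True) is shown as a one-call alternative that returns all three shares.

```python
import pandas as pd

# Hypothetical labels mirroring the reported split (81 positive, 9 negative, 10 neutral).
df = pd.DataFrame({"Analysis": ["Positive"] * 81 + ["Negative"] * 9 + ["Neutral"] * 10})

ptweets = df[df["Analysis"] == "Positive"]
ntweets = df[df["Analysis"] == "Negative"]

# Share of all rows, as a percentage rounded to one decimal place.
pos_pct = round(ptweets.shape[0] / df.shape[0] * 100, 1)
neg_pct = round(ntweets.shape[0] / df.shape[0] * 100, 1)
print(pos_pct, neg_pct)

# value_counts(normalize=True) gives the fraction for every label at once.
shares = (df["Analysis"].value_counts(normalize=True) * 100).round(1)
print(shares)
```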

I think this video is long enough, but I want to show the value counts, so I just type df['Analysis'].value_counts(). Then let's plot and visualize the counts. I'm going to give the plot a title, plt.title('Sentiment Analysis'), and then label the axes, using plt.ylabel('Counts') for the counts. We're going to plot this as a bar chart, so type df['Analysis'].value_counts().plot(kind='bar'), and then show it by typing plt.show(). Let's run this. Oh, I see, this should be plt, with an l. Okay, this is the plot. Of course, we could have plotted it many different ways; I could have created a bar chart by looping through the values, but it doesn't matter. This is what we get: the positive bar shows 81, the neutral bar shows 10, just as it should, and the negative bar shows 9, a little lower than neutral. So we can see that Bill Gates' past hundred tweets seem to be mostly positive and not very negative, which is pretty interesting.

I hope the video wasn't too confusing. I know it's very long, so thank you all for sticking with me. I hope you learned a lot about sentiment analysis with Twitter. Leave any questions you have in the comments section; if I don't answer them, maybe other people will. Please leave a like on the video if you enjoyed it, and if you think the video was helpful, share it. As always, I'll see you all in the next video. Thanks for watching!
