In this article, we will learn how to make predictions using sentiment analysis. We will not train any machine learning model; instead, we will analyze people's sentiment towards the candidates.
We will use tweets posted by people about each candidate and, based on the number of positive and negative tweets, try to predict the winner of the 2020 US Presidential Election.
Dataset:
The data for this was collected from Twitter using the official Twitter handles of Donald Trump and Joe Biden, and comes as two datasets, Trump.csv and Biden.csv. Both can be downloaded from the following link:
https://drive.google.com/drive/folders/1VZOKZZECmQ1HAm_uImimUtypF4eLEUfe?usp=sharing
Necessary Installations:
To install the libraries needed for this project:
First, open a Terminal/Command Line.
Run the following command:
pip3 install notebook numpy pandas textblob plotly
The installation should now be complete.
To verify the installation and/or view information about the installed packages, run the following command:
pip3 show notebook numpy pandas textblob plotly
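If you prefer to check from inside Python (for example, in the first cell of the notebook), the standard-library importlib.metadata module can report installed versions. This is a minimal sketch, assuming Python 3.8 or later; the helper name installed_versions is our own and not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip3 in the installation step
REQUIRED = ["notebook", "numpy", "pandas", "textblob", "plotly"]

def installed_versions(packages):
    """Map each package name to its installed version string, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed again with the pip3 command shown earlier.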
Sentiment Analysis:
First, make the necessary imports:
import pandas as pd
import numpy as np
from textblob import TextBlob
import plotly.graph_objects as go
Read the 'Trump.csv' and 'Biden.csv' datasets into pandas DataFrames:
trump_tweets = pd.read_csv("Trump.csv")
biden_tweets = pd.read_csv("Biden.csv")
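The sentiment step that follows passes every value of the 'text' column to TextBlob, which only accepts strings, so a missing tweet text (read by pandas as NaN) would raise a TypeError. We have not checked whether Trump.csv or Biden.csv contain such rows, so treat this as an optional safeguard. It is a sketch with a hypothetical load_tweets helper, demonstrated on a small made-up in-memory sample instead of the real files.

```python
import io

import pandas as pd

def load_tweets(source, text_col="text"):
    """Read a tweets CSV and drop rows whose text is missing.

    TextBlob only accepts strings, so a NaN in the text column would make
    a later .apply(find_pol) call fail with a TypeError.
    """
    df = pd.read_csv(source)
    if text_col not in df.columns:
        raise KeyError(f"expected a {text_col!r} column, found {list(df.columns)}")
    # Reset the index so positional lookups like df["text"][10] still work
    return df.dropna(subset=[text_col]).reset_index(drop=True)

# Hypothetical sample standing in for Trump.csv / Biden.csv (bob's text is empty)
sample = io.StringIO("user,text\nalice,Great rally today\nbob,\ncarol,Terrible debate\n")
tweets = load_tweets(sample)
print(tweets)
```

With the real files this would be called as load_tweets("Trump.csv") in place of pd.read_csv("Trump.csv").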
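Now we get to the sentiment analysis itself. We will use the TextBlob class from the textblob module, starting with one sample tweet from each dataset; its sentiment property returns a polarity and a subjectivity score.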
textblob1 = TextBlob(trump_tweets["text"][10])
print("Trump :",textblob1.sentiment)
textblob2 = TextBlob(biden_tweets["text"][500])
print("Biden :",textblob2.sentiment)
The output of this cell looks like this:
Trump : Sentiment(polarity=0.15, subjectivity=0.3125)
Biden : Sentiment(polarity=0.6, subjectivity=0.9)
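We will use the polarity for our analysis. Polarity ranges from -1.0 to 1.0: a positive polarity means a positive sentiment, a negative polarity means a negative sentiment, and a polarity of exactly 0 means a neutral sentiment. Subjectivity, which ranges from 0.0 (objective) to 1.0 (subjective), is not used here.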
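The three-way rule can be written as a small plain-Python function. polarity_to_label is a hypothetical helper name, applied here to the two sample scores from the output above.

```python
def polarity_to_label(polarity):
    """Map a TextBlob polarity score (-1.0 to 1.0) to a sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

# The two sample polarity scores printed above
print(polarity_to_label(0.15))  # Trump sample tweet -> positive
print(polarity_to_label(0.6))   # Biden sample tweet -> positive
```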
Next, we add a 'Sentiment Polarity' column to each DataFrame containing the polarity of every tweet.
def find_pol(review):
    # Polarity score of one tweet, from -1.0 (negative) to 1.0 (positive)
    return TextBlob(review).sentiment.polarity
trump_tweets["Sentiment Polarity"] = trump_tweets["text"].apply(find_pol)
print(trump_tweets.head())
biden_tweets["Sentiment Polarity"] = biden_tweets["text"].apply(find_pol)
print(biden_tweets.head())
Now we add an 'Expression Label' column to each DataFrame, labeling every tweet as 'positive', 'negative' or 'neutral' based on its 'Sentiment Polarity'.
# Positive polarity -> 'positive', everything else -> 'negative' for now
trump_tweets["Expression Label"] = np.where(trump_tweets["Sentiment Polarity"] > 0, "positive", "negative")
# Relabel zero-polarity rows as 'neutral'; .loc avoids chained assignment, which may not update the DataFrame
trump_tweets.loc[trump_tweets["Sentiment Polarity"] == 0, "Expression Label"] = "neutral"
print(trump_tweets.head())
biden_tweets["Expression Label"] = np.where(biden_tweets["Sentiment Polarity"] > 0, "positive", "negative")
biden_tweets.loc[biden_tweets["Sentiment Polarity"] == 0, "Expression Label"] = "neutral"
print(biden_tweets.head())
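The two labeling steps above (np.where, then overwriting the zero-polarity rows) can also be done in a single call with NumPy's np.select, which picks the first condition that matches and falls back to a default. This is an alternative sketch on a toy DataFrame with made-up polarity values, not the article's data; expression_labels is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def expression_labels(polarity):
    """Label each polarity as 'positive' (> 0), 'negative' (< 0) or 'neutral' (== 0)."""
    conditions = [polarity > 0, polarity < 0]
    choices = ["positive", "negative"]
    # Rows matching neither condition (polarity exactly 0) get the default
    return np.select(conditions, choices, default="neutral")

# Made-up polarity values standing in for a tweets DataFrame
df = pd.DataFrame({"Sentiment Polarity": [0.15, 0.0, -0.3, 0.6]})
df["Expression Label"] = expression_labels(df["Sentiment Polarity"])
print(df)
```

With the article's data this would be trump_tweets["Expression Label"] = expression_labels(trump_tweets["Sentiment Polarity"]).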
We can now drop the rows whose sentiment is 'neutral', i.e. whose polarity is exactly 0.
# Drop every row whose polarity is exactly 0 (the 'neutral' tweets)
trump_tweets.drop(trump_tweets[trump_tweets['Sentiment Polarity'] == 0].index, inplace=True)
print(trump_tweets.head())
biden_tweets.drop(biden_tweets[biden_tweets['Sentiment Polarity'] == 0].index, inplace=True)
print(biden_tweets.head())
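Dropping the neutral rows also shrinks the denominator used for the percentages computed next, so it can be worth recording how many rows were removed. A minimal sketch, using a hypothetical drop_neutral helper and made-up polarity values:

```python
import pandas as pd

def drop_neutral(df, col="Sentiment Polarity"):
    """Return the rows with non-zero polarity and the number of rows removed."""
    keep = df[col] != 0
    return df[keep], int((~keep).sum())

# Made-up polarity values standing in for a tweets DataFrame
df = pd.DataFrame({"Sentiment Polarity": [0.15, 0.0, -0.3, 0.0, 0.6]})
filtered, n_dropped = drop_neutral(df)
print(f"Dropped {n_dropped} neutral rows, {len(filtered)} remain")  # Dropped 2 neutral rows, 3 remain
```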
Finally, we print the number of positive and negative tweets for each candidate and draw a grouped bar chart showing the percentage of positive and negative tweets for each of them.
count_1 = trump_tweets.groupby('Expression Label').count()
print('Donald Trump:')
print(count_1['user'])
# Look the counts up by label: positional [] access on a string index is deprecated in pandas
negative_per1 = (count_1['Sentiment Polarity']['negative']/trump_tweets.shape[0])*100
positive_per1 = (count_1['Sentiment Polarity']['positive']/trump_tweets.shape[0])*100
count_2 = biden_tweets.groupby('Expression Label').count()
print('\nJoe Biden:')
print(count_2['user'])
negative_per2 = (count_2['Sentiment Polarity']['negative']/biden_tweets.shape[0])*100
positive_per2 = (count_2['Sentiment Polarity']['positive']/biden_tweets.shape[0])*100
Politicians = ['Donald Trump', 'Joe Biden']
lis_pos = [positive_per1, positive_per2]
lis_neg = [negative_per1, negative_per2]
fig = go.Figure(data=[
go.Bar(name='Positive', x=Politicians, y=lis_pos),
go.Bar(name='Negative', x=Politicians, y=lis_neg)
])
fig.update_layout(barmode='group')
fig.show()
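The percentages above divide each label's count by the number of remaining rows. pandas can do the same in one step with value_counts(normalize=True), which returns proportions that we scale to percentages. A minimal sketch on a made-up Series of labels; label_percentages is a hypothetical helper, and it filters out 'neutral' itself so it gives the same result whether or not the neutral rows were already dropped.

```python
import pandas as pd

def label_percentages(labels):
    """Percentage of 'positive' and 'negative' labels, ignoring 'neutral' ones."""
    kept = labels[labels != "neutral"]
    return kept.value_counts(normalize=True) * 100

# Made-up labels: 3 positive, 1 negative, 1 neutral
labels = pd.Series(["positive", "negative", "positive", "neutral", "positive"])
pct = label_percentages(labels)
print(pct)  # positive 75.0, negative 25.0
```

With the article's data this would be called as label_percentages(trump_tweets['Expression Label']).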
The visualization above shows how close the two candidates are, and how unpredictable this year's election is going to be.
Both candidates are very popular, but the tweets about Joe Biden were slightly more positive than those about Donald Trump. So, we can say that Joe Biden has the upper hand and a slightly better chance of winning.
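The prediction rule used here, picking the candidate whose tweets have the larger share of positive sentiment, amounts to a simple comparison. A minimal sketch; predict_winner is a hypothetical helper and the percentages are placeholders, not the results of this analysis.

```python
def predict_winner(positive_pct):
    """Return the candidate with the highest percentage of positive tweets."""
    return max(positive_pct, key=positive_pct.get)

# Placeholder percentages for illustration only
example = {"Candidate A": 58.0, "Candidate B": 61.5}
print(predict_winner(example))  # Candidate B
```

With the variables computed earlier, the call would be predict_winner({'Donald Trump': positive_per1, 'Joe Biden': positive_per2}).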