Twitter Sentiment Analysis Project Using Python

What is Sentiment Analysis?

Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral. It is a form of contextual text mining that identifies and extracts subjective information from the source material.
A basic task in sentiment analysis is classifying the polarity of a given text, that is, deciding whether it is positive, negative or neutral.
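
The three-way split can be sketched as a small function. This is only an illustration: it assumes a polarity score between -1.0 and 1.0 (the range reported by TextBlob, the library used later in this article), and the name classify_polarity is hypothetical, not part of any library.

```python
def classify_polarity(polarity):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label."""
    if polarity > 0:
        return 'positive'
    if polarity < 0:
        return 'negative'
    return 'neutral'


# Made-up scores, one for each class.
for score in (0.8, -0.35, 0.0):
    print(score, classify_polarity(score))
```

A score of exactly 0 is treated as neutral here, which matches the rule used in the program below.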


Uses of Sentiment Analysis:

  • Business: In marketing, a company's strategy is strongly influenced by this data. It shows what customers think of the company's products (whether people like a particular product or not), which in turn shapes decision making.
  • Politics: India is a democratic country, and democracy guarantees the right to free speech, which is why social media platforms such as Twitter, Facebook and Instagram are the most popular places for expressing public opinion. Political parties, for example, need to monitor the tweets that are in their favour as well as those against them.

Prerequisites and libraries that need to be installed:

  • Tweepy: Tweepy is an easy-to-use Python library for accessing the Twitter API. You have to install this library to run the program. You can install it using the following command.

pip install tweepy


  • TextBlob: TextBlob is a Python library for processing textual data. It provides a simple API for diving into common NLP tasks. You can install it using the following command.

pip install textblob

Note that if we want to determine the polarity of a tweet (positive, negative or neutral) from a dataset of our own, we have to supply a training dataset with content (A) and corresponding sentiment (B) fields.
It is easier, however, to download the NLTK corpora (a large, structured collection of texts) that TextBlob relies on. Download them using:

python -m textblob.download_corpora
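
Before moving on, it can help to confirm that both libraries are importable. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(['tweepy', 'textblob'])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

This only checks that the packages can be found; it does not verify that the corpora were downloaded.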

Authentication needed:

Since this implementation fetches tweets through the Twitter API, you need an active Twitter account. Follow the steps below to obtain the credentials needed to run the program.
Step 1: Open Twitter and log in, then find the Create new App button and click it.
Step 2: Fill in the application details. You may leave the callback URL field empty.
Step 3: Once the app is created successfully, you will be redirected to the app page.
Step 4: Open the 'Keys and Access Tokens' tab.
Step 5: Copy the Consumer Key, Consumer Secret, Access Token and Access Token Secret, which you have to provide in the implementation.
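
One way to avoid pasting these four values directly into the source file is to read them from environment variables. The sketch below is an optional variation, not what the program in this article does. The variable names and the helper load_credentials are assumptions made for illustration, and the demo uses dummy values instead of real keys.

```python
import os

# Hypothetical environment variable names; any names would work.
CREDENTIAL_VARS = (
    'TWITTER_CONSUMER_KEY',
    'TWITTER_CONSUMER_SECRET',
    'TWITTER_ACCESS_TOKEN',
    'TWITTER_ACCESS_TOKEN_SECRET',
)


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_VARS}


# Demo with dummy values so the example runs without real keys.
demo_environ = {name: 'dummy-value' for name in CREDENTIAL_VARS}
credentials = load_credentials(demo_environ)
print(sorted(credentials))
```

In a real run you would call load_credentials() with no argument and pass the four values to the authentication code.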

CODE:

import re 
import tweepy 
from tweepy import OAuthHandler 
from textblob import TextBlob 

class TwitterClient(object): 
    def __init__(self): 
        ConsumerKey = '***********'
        ConsumerSecret = '************'
        AccessToken = '****************'
        AccessTokenSecret = '***************'
        
        try:  
            self.auth = OAuthHandler(ConsumerKey, ConsumerSecret)  
            self.auth.set_access_token(AccessToken, AccessTokenSecret) 
            self.api = tweepy.API(self.auth) 
        except Exception: 
            print("Error: Authentication Failed") 

    def clean_tweet(self, tweet): 
        # Replace @mentions, links and every character that is not a letter,
        # digit, space or tab with a space, then collapse repeated whitespace.
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+://\S+)", " ", tweet).split()) 

    def get_tweet_sentiment(self, tweet): 
                               
        tweet_sample = TextBlob(self.clean_tweet(tweet))  
        if tweet_sample.sentiment.polarity > 0: 
            return 'positive'
        elif tweet_sample.sentiment.polarity < 0: 
            return 'negative'
        else: 
            return 'neutral'

    def get_tweets(self, topic, count = 10):  
        tweets = [] 

        try:  
            # Tweepy 3.x call; Tweepy 4.x renamed it to self.api.search_tweets().
            fetched_tweets = self.api.search(q = topic, count = count)  
            for each_tweet in fetched_tweets: 
                parsed_tweet = {} 
                parsed_tweet['text'] = each_tweet.text 
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(each_tweet.text)  
                # A retweeted tweet can appear several times, so add it only once.
                if each_tweet.retweet_count > 0: 
                    if parsed_tweet not in tweets: 
                        tweets.append(parsed_tweet) 
                else: 
                    tweets.append(parsed_tweet) 

        except tweepy.TweepError as e:  # Tweepy 4.x: tweepy.TweepyException
            print("Error : " + str(e)) 

        # Always return a list, even if the request failed.
        return tweets 

def sentimentHelper(): 
    Tweet_api = TwitterClient()  
    tweets = Tweet_api.get_tweets(topic = 'cppsecrets.com', count = 50) 

    # Nothing to report (and nothing to divide by) if no tweets came back.
    if not tweets:
        print("No tweets were fetched.")
        return
    
    positive_tweets = [each_tweet for each_tweet in tweets if each_tweet['sentiment'] == 'positive'] 
    print(f"The Overall Positive tweets percentage are {(100*len(positive_tweets)/len(tweets))} %") 
    neutral_tweets = [each_tweet for each_tweet in tweets if each_tweet['sentiment'] == 'neutral']
    print(f"The Overall Neutral tweets percentage are {(100*len(neutral_tweets)/len(tweets))} %") 
    negative_tweets = [each_tweet for each_tweet in tweets if each_tweet['sentiment'] == 'negative']
    print(f"The Overall Negative tweets percentage are {(100*len(negative_tweets)/len(tweets))} %")
    
    # Please Uncomment the following lines if you want to print your 
    # positive , negative or neutral tweets.
                                                           
    # print("\n\nPositive tweets:")
    # if (len(positive_tweets) < 10):
    #     for tweet in positive_tweets: 
    #         print(tweet['text'])
    # else:
    #     for tweet in positive_tweets[:10]: 
    #         print(tweet['text'])
   
    # print("\n\nNegative tweets:")
    # if (len(negative_tweets) < 10):
    #     for tweet in negative_tweets: 
    #         print(tweet['text'])
    # else:
    #     for tweet in negative_tweets[:10]: 
    #         print(tweet['text'])
    
    # print("\n\nNeutral tweets:")
    # if (len(neutral_tweets) < 10):
    #     for tweet in neutral_tweets: 
    #         print(tweet['text'])
    # else:
    #     for tweet in neutral_tweets[:10]:
    #         print(tweet['text'])
            
if __name__ == "__main__": 
    sentimentHelper()

SAMPLE OUTPUT:

The Overall Positive tweets percentage are 30.0 %
The Overall Neutral tweets percentage are 50.0 %
The Overall Negative tweets percentage are 20.0 %

Note that this data depends on your account and on the topic you search for, so the figures will vary in your case. You can also print the positive, negative and neutral tweets by uncommenting the relevant part of the program above.
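
The percentage arithmetic can also be tried without calling the Twitter API at all. The sketch below runs on a hand-made list in the same {'text', 'sentiment'} shape that get_tweets() returns. The sample tweets are invented for illustration and the helper name sentiment_percentages is hypothetical.

```python
from collections import Counter


def sentiment_percentages(tweets):
    """Return {label: percentage} for a list of {'text', 'sentiment'} dicts."""
    if not tweets:
        return {}
    counts = Counter(tweet['sentiment'] for tweet in tweets)
    return {label: 100 * counts[label] / len(tweets)
            for label in ('positive', 'neutral', 'negative')}


# Invented sample data, already labelled.
sample = [
    {'text': 'great site', 'sentiment': 'positive'},
    {'text': 'really helpful articles', 'sentiment': 'positive'},
    {'text': 'new post is up', 'sentiment': 'neutral'},
    {'text': 'terrible layout', 'sentiment': 'negative'},
]
print(sentiment_percentages(sample))
# {'positive': 50.0, 'neutral': 25.0, 'negative': 25.0}
```

Counting each label directly means the three percentages are computed the same way, and an empty result is handled before any division takes place.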

EXPLANATION:


We will go through the code piece by piece.
The program is broadly divided into three major subtasks:

  • Authenticate the Twitter API client.

The constructor __init__ authenticates the Twitter client: the consumer key, consumer secret, access token and access token secret are handed to Tweepy automatically whenever an object of the TwitterClient class is created.
 
  • Make a GET request to the Twitter API to fetch tweets for the query we provide.

The get_tweets() method is where the call to the Twitter API is actually made:

fetched_tweets = self.api.search(q = topic, count = count)  
   
Note that count has a default value of 10, so if the function is called without a count argument it requests 10 tweets.

  • Parse the tweets and classify them as positive, negative or neutral on the basis of their polarity.

This step has several substeps.
First we call the clean_tweet() method, a utility function that cleans the tweet text by removing links, mentions and special characters with a simple regular expression. The cleaned tweet is passed to a TextBlob object, and we check whether its sentiment polarity is greater than, less than or equal to 0, which is the basis for classifying the tweet as positive, negative or neutral. Behind the scenes TextBlob tokenizes the text (splits it into words) and scores it from the words that carry sentiment, mainly adjectives and adverbs; words that carry no sentiment (such as I, we, you) do not affect the score. Note that the sentiment property used here relies on TextBlob's default pattern-based analyzer, which looks words up in a polarity lexicon. TextBlob also offers a NaiveBayesAnalyzer, trained on a movie reviews corpus, for cases where a trained Naive Bayes classifier is preferred.
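
The cleaning step can be tried on its own, without Twitter or TextBlob. The sketch below applies the same regular expression as clean_tweet() as a standalone function. The sample tweet is invented and the constant name TWEET_NOISE is hypothetical.

```python
import re

# Matches @mentions, any character that is not a letter, digit, space or tab,
# and links of the form scheme://...
TWEET_NOISE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+://\S+)")


def clean_tweet(tweet):
    """Replace noise with spaces, then collapse repeated whitespace."""
    return ' '.join(TWEET_NOISE.sub(' ', tweet).split())


print(clean_tweet("@user Loving the new tutorials!!! https://example.com/post #python"))
# Loving the new tutorials python
```

Note that the word of a hashtag survives (only the # is removed), so hashtags still contribute to the sentiment score.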

Finally, each parsed tweet is returned together with the label derived from its sentiment.polarity property, which gives us the desired result.

You can do various kinds of statistical analysis on tweets in the same way. That is all for this Tweet Sentiment Analysis program. I hope you understood the basic idea of this project.
                
Thank You !!


