Categorizing and Tagging Sentences using NLTK in Python

Tagging Sentences

To access NLTK's POS tagger, we need to import it. By convention, import statements go at the beginning of the script, so let's put this new import under our other import statement.


from nltk.corpus import twitter_samples
from nltk.tag import pos_tag_sents

tweets = twitter_samples.strings('positive_tweets.json')
tweets_tokens = twitter_samples.tokenized('positive_tweets.json')

Now we can tag each of our tokens. NLTK lets us do this all at once with pos_tag_sents(). We will create a new variable, tweets_tagged, to store the tagged lists. This new line can go directly at the end of our current script:

tweets_tagged = pos_tag_sents(tweets_tokens)

To get an idea of what tagged tokens look like, here is what the first element in our tweets_tagged list looks like:

[(u'#FollowFriday', 'JJ'), (u'@France_Inte', 'NNP'), (u'@PKuchly57', 'NNP'), (u'@Milipol_Paris', 'NNP'), (u'for', 'IN'), (u'being', 'VBG'), (u'top', 'JJ'), (u'engaged', 'VBN'), (u'members', 'NNS'), (u'in', 'IN'), (u'my', 'PRP$'), (u'community', 'NN'), (u'this', 'DT'), (u'week', 'NN'), (u':)', 'NN')]
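To make the structure concrete, here is a minimal sketch that works on a hard-coded copy of the tagged tweet shown above, so it runs without NLTK or its data. The name first_tweet is only an illustration and is not part of the script; the u'' prefixes are dropped because they are a Python 2 display detail.

```python
# First tagged tweet, copied from the output above (u'' prefixes dropped).
first_tweet = [
    ('#FollowFriday', 'JJ'), ('@France_Inte', 'NNP'), ('@PKuchly57', 'NNP'),
    ('@Milipol_Paris', 'NNP'), ('for', 'IN'), ('being', 'VBG'), ('top', 'JJ'),
    ('engaged', 'VBN'), ('members', 'NNS'), ('in', 'IN'), ('my', 'PRP$'),
    ('community', 'NN'), ('this', 'DT'), ('week', 'NN'), (':)', 'NN'),
]

# Each element is a (token, tag) tuple, so it can be unpacked directly.
for token, tag in first_tweet:
    print(f"{token:<16}{tag}")

# Equivalent index-based access: pair[0] is the token, pair[1] is the tag.
tags = [pair[1] for pair in first_tweet]
```

Each tweet in tweets_tagged is a list of such pairs, which is why the counting code below needs one loop over tweets and a second loop over the pairs inside each tweet.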


Counting POS Tags

Now we will keep track of how many times JJ and NN appear using accumulator (count) variables, which we add to every time we find a matching tag. First, let's create our counts at the bottom of our script and set them to zero.

from nltk.corpus import twitter_samples
from nltk.tag import pos_tag_sents

tweets = twitter_samples.strings('positive_tweets.json')
tweets_tokens = twitter_samples.tokenized('positive_tweets.json')

tweets_tagged = pos_tag_sents(tweets_tokens)

JJ_count = 0
NN_count = 0

After we create the variables, we'll create two for loops. The first loop iterates through each tweet in the list. The second loop iterates through each token/tag pair in each tweet. For each pair, we look up the tag at index 1 of the tuple.

We then use conditional statements to check whether the tag matches the string 'JJ' or 'NN'. If it does, we add 1 (+= 1) to the appropriate accumulator.

from nltk.corpus import twitter_samples
from nltk.tag import pos_tag_sents

tweets = twitter_samples.strings('positive_tweets.json')
tweets_tokens = twitter_samples.tokenized('positive_tweets.json')

tweets_tagged = pos_tag_sents(tweets_tokens)

JJ_count = 0
NN_count = 0

for tweet in tweets_tagged:
    for pair in tweet:
        tag = pair[1]
        if tag == 'JJ':
            JJ_count += 1
        elif tag == 'NN':
            NN_count += 1
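As an aside, the same tally can be written with collections.Counter from the standard library. This is a minimal sketch rather than the approach this article uses: sample_tagged is a hypothetical stand-in for tweets_tagged that holds only the one tagged tweet shown earlier, so the snippet runs without NLTK.

```python
from collections import Counter

# Hypothetical stand-in for tweets_tagged: a list with one tagged tweet.
sample_tagged = [[
    ('#FollowFriday', 'JJ'), ('@France_Inte', 'NNP'), ('@PKuchly57', 'NNP'),
    ('@Milipol_Paris', 'NNP'), ('for', 'IN'), ('being', 'VBG'), ('top', 'JJ'),
    ('engaged', 'VBN'), ('members', 'NNS'), ('in', 'IN'), ('my', 'PRP$'),
    ('community', 'NN'), ('this', 'DT'), ('week', 'NN'), (':)', 'NN'),
]]

# Count every tag in one pass over all tweets and all (token, tag) pairs.
tag_counts = Counter(tag for tweet in sample_tagged for _token, tag in tweet)

# A Counter returns 0 for tags it never saw, so no initialisation is needed.
JJ_count = tag_counts['JJ']
NN_count = tag_counts['NN']
print('JJ:', JJ_count, 'NN:', NN_count)
```

The explicit loops keep the logic visible for a tutorial; a Counter is convenient when you later want totals for more than two tags.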

After the two loops are complete, we should have the total count for adjectives and nouns in our corpus. To see how many adjectives and nouns our script found, we'll add print statements to the end of the script.

for tweet in tweets_tagged:
    for pair in tweet:
        tag = pair[1]
        if tag == 'JJ':
            JJ_count += 1
        elif tag == 'NN':
            NN_count += 1

print('Total number of adjectives = ', JJ_count)
print('Total number of nouns = ', NN_count)

At this point, our program will be able to output the number of adjectives and nouns that were found in the corpus.


Save your .py file and run it to see how many adjectives and nouns we find. Be patient; it might take a few seconds for the script to run. If all went well, we should get the following output:

Output
Total number of adjectives = 6094
Total number of nouns = 13180

Finished Code

For our finished code, we should add some comments to make it easier for others to follow. Our script looks like this:

# Import data and tagger
from nltk.corpus import twitter_samples
from nltk.tag import pos_tag_sents

# Load tokenized tweets
tweets_tokens = twitter_samples.tokenized('positive_tweets.json')

# Tag tokenized tweets
tweets_tagged = pos_tag_sents(tweets_tokens)

# Set accumulators
JJ_count = 0
NN_count = 0

# Loop through list of tweets
for tweet in tweets_tagged:
    for pair in tweet:
        tag = pair[1]
        if tag == 'JJ':
            JJ_count += 1
        elif tag == 'NN':
            NN_count += 1

# Print total numbers of adjectives and nouns
print('Total number of adjectives = ', JJ_count)
print('Total number of nouns = ', NN_count)

Output
Total number of adjectives = 6094
Total number of nouns = 13180
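
One caveat about these totals: comparing against 'JJ' and 'NN' exactly counts only base-form adjectives and singular common nouns. In the Penn Treebank tag set, comparative and superlative adjectives are tagged JJR and JJS, and plural and proper nouns are tagged NNS, NNP and NNPS. If you want the broader categories, one possible variation is to match on the tag prefix. The sketch below assumes that is the goal; sample_tagged, adjective_count and noun_count are illustrative names, and the data is a few pairs copied from the tagged tweet shown earlier.

```python
# Hypothetical stand-in for tweets_tagged, using a few pairs from the
# tagged tweet shown earlier in the article.
sample_tagged = [[
    ('top', 'JJ'), ('members', 'NNS'), ('community', 'NN'),
    ('@PKuchly57', 'NNP'), ('for', 'IN'),
]]

adjective_count = 0
noun_count = 0

for tweet in sample_tagged:
    for _token, tag in tweet:
        # 'JJ' prefix covers JJ, JJR and JJS.
        if tag.startswith('JJ'):
            adjective_count += 1
        # 'NN' prefix covers NN, NNS, NNP and NNPS.
        elif tag.startswith('NN'):
            noun_count += 1

print('Adjectives (all JJ* tags):', adjective_count)
print('Nouns (all NN* tags):', noun_count)
```

With prefix matching the totals for the full corpus would differ from the figures above, which count exact 'JJ' and 'NN' tags only.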


Author: Khushboo Singh