Categorizing and Tagging Sentences using NLTK in Python



  Tagging Sentences

To access NLTK's POS tagger, we'll need to import it. By convention, import statements go at the beginning of the script, so let's put this new import under our other import statement.


    from nltk.corpus import twitter_samples
    from nltk.tag import pos_tag_sents

    tweets = twitter_samples.strings('positive_tweets.json')
    tweets_tokens = twitter_samples.tokenized('positive_tweets.json')

Now we can tag each of our tokens. NLTK lets us do it all at once with pos_tag_sents(). We'll create a new variable, tweets_tagged, to store the tagged lists. This new line can go directly at the end of our current script:

    tweets_tagged = pos_tag_sents(tweets_tokens)

To get an idea of what tagged tokens look like, here is what the first element in our tweets_tagged list looks like:

[(u'#FollowFriday', 'JJ'), (u'@France_Inte', 'NNP'), (u'@PKuchly57', 'NNP'), (u'@Milipol_Paris', 'NNP'), (u'for', 'IN'), (u'being', 'VBG'), (u'top', 'JJ'), (u'engaged', 'VBN'), (u'members', 'NNS'), (u'in', 'IN'), (u'my', 'PRP$'), (u'community', 'NN'), (u'this', 'DT'), (u'week', 'NN'), (u':)', 'NN')]
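Since each element is a (token, tag) tuple, it can help to see how the pairs are read before counting them. The following is a minimal sketch that hard-codes the sample output above instead of calling NLTK, so it runs without the corpus; the names tagged_tweet, first_pair, and adjectives are illustrative and not part of the original script.

```python
# First tagged tweet, copied from the sample output above (u'' prefixes dropped).
tagged_tweet = [
    ('#FollowFriday', 'JJ'), ('@France_Inte', 'NNP'), ('@PKuchly57', 'NNP'),
    ('@Milipol_Paris', 'NNP'), ('for', 'IN'), ('being', 'VBG'), ('top', 'JJ'),
    ('engaged', 'VBN'), ('members', 'NNS'), ('in', 'IN'), ('my', 'PRP$'),
    ('community', 'NN'), ('this', 'DT'), ('week', 'NN'), (':)', 'NN'),
]

# Each element is a (token, tag) tuple: index 0 is the token, index 1 is the tag.
first_pair = tagged_tweet[0]
print(first_pair[0], first_pair[1])

# Tuple unpacking is an alternative to indexing with pair[0] and pair[1].
adjectives = [token for token, tag in tagged_tweet if tag == 'JJ']
print(adjectives)
```

Note that the tagger labels the hashtag '#FollowFriday' as an adjective and the emoticon ':)' as a noun, so counts taken from tweets should be read as approximate.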


Counting POS Tags

Now we will keep track of how many times JJ and NN appear using accumulator (count) variables, which we add to every time we find a matching tag. First, let's create our counts at the bottom of our script and set them to zero.

    from nltk.corpus import twitter_samples
    from nltk.tag import pos_tag_sents

    tweets = twitter_samples.strings('positive_tweets.json')
    tweets_tokens = twitter_samples.tokenized('positive_tweets.json')
    tweets_tagged = pos_tag_sents(tweets_tokens)

    JJ_count = 0
    NN_count = 0

After we create the variables, we'll create two for loops. The first loop will iterate through each tweet in the list. The second loop will iterate through each token/tag pair in each tweet. For each pair, we will look up the tag using the appropriate tuple index.

We will then check to see if the tag matches either the string 'JJ' or 'NN' by using conditional statements. If the tag is a match we will add (+= 1) to the appropriate accumulator.

    from nltk.corpus import twitter_samples
    from nltk.tag import pos_tag_sents

    tweets = twitter_samples.strings('positive_tweets.json')
    tweets_tokens = twitter_samples.tokenized('positive_tweets.json')
    tweets_tagged = pos_tag_sents(tweets_tokens)

    JJ_count = 0
    NN_count = 0

    for tweet in tweets_tagged:
        for pair in tweet:
            tag = pair[1]
            if tag == 'JJ':
                JJ_count += 1
            elif tag == 'NN':
                NN_count += 1

After the two loops are complete, we should have the total count for adjectives and nouns in our corpus. To see how many adjectives and nouns our script found, we'll add print statements to the end of the script.

    for tweet in tweets_tagged:
        for pair in tweet:
            tag = pair[1]
            if tag == 'JJ':
                JJ_count += 1
            elif tag == 'NN':
                NN_count += 1

    print('Total number of adjectives =', JJ_count)
    print('Total number of nouns =', NN_count)

At this point, our program will be able to output the number of adjectives and nouns that were found in the corpus.
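As an alternative to hand-written accumulators, the same tally can be sketched with collections.Counter from the standard library, which counts every tag in one pass. This is not the approach the script uses; it is shown only for comparison, and sample_tagged is a small hypothetical stand-in for tweets_tagged so the example runs without NLTK.

```python
from collections import Counter

# Hypothetical stand-in for tweets_tagged: a list of tweets,
# each tweet being a list of (token, tag) pairs.
sample_tagged = [
    [('top', 'JJ'), ('members', 'NNS'), ('community', 'NN'), ('week', 'NN')],
    [('great', 'JJ'), ('day', 'NN')],
]

# Count every tag across all tweets in a single pass.
tag_counts = Counter(tag for tweet in sample_tagged for _, tag in tweet)

# A Counter returns 0 for tags it never saw, so no initialization is needed.
print('Total number of adjectives =', tag_counts['JJ'])
print('Total number of nouns =', tag_counts['NN'])
```

With the real data, replacing sample_tagged with tweets_tagged should give the same JJ and NN totals as the loops, and the counts for all other tags come for free.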


Save your .py file and run it to see how many adjectives and nouns we find. Be patient; it might take a few seconds for the script to run. If all went well, we should get the following output:

Output
    Total number of adjectives = 6094
    Total number of nouns = 13180

Finished Code

For our finished code, we should add some comments to make it easier for others to follow. Our script looks like this:

    # Import data and tagger
    from nltk.corpus import twitter_samples
    from nltk.tag import pos_tag_sents

    # Load tokenized tweets
    tweets_tokens = twitter_samples.tokenized('positive_tweets.json')

    # Tag tokenized tweets
    tweets_tagged = pos_tag_sents(tweets_tokens)

    # Set accumulators
    JJ_count = 0
    NN_count = 0

    # Loop through list of tweets
    for tweet in tweets_tagged:
        for pair in tweet:
            tag = pair[1]
            if tag == 'JJ':
                JJ_count += 1
            elif tag == 'NN':
                NN_count += 1

    # Print total numbers of adjectives and nouns
    print('Total number of adjectives =', JJ_count)
    print('Total number of nouns =', NN_count)

Output
    Total number of adjectives = 6094
    Total number of nouns = 13180
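These totals cover only the exact tags 'JJ' and 'NN'. In the Penn Treebank tagset, comparative and superlative adjectives are tagged JJR and JJS, and plural and proper nouns are tagged NNS, NNP, and NNPS, so they are not included. If a broader count is wanted, one possible sketch is to match on the tag prefix instead. The helper count_by_prefix and the sample data below are hypothetical and not part of the original script.

```python
def count_by_prefix(tagged_sents, prefix):
    """Count tokens whose POS tag starts with the given prefix."""
    return sum(
        1
        for sent in tagged_sents
        for _, tag in sent
        if tag.startswith(prefix)
    )

# Hypothetical stand-in for tweets_tagged.
sample_tagged = [
    [('top', 'JJ'), ('better', 'JJR'), ('members', 'NNS')],
    [('community', 'NN'), ('@PKuchly57', 'NNP'), ('for', 'IN')],
]

# 'JJ' matches JJ, JJR, JJS; 'NN' matches NN, NNS, NNP, NNPS.
all_adjectives = count_by_prefix(sample_tagged, 'JJ')
all_nouns = count_by_prefix(sample_tagged, 'NN')
print('Adjectives of any kind =', all_adjectives)
print('Nouns of any kind =', all_nouns)
```

Whether proper nouns such as @-mentions should count as nouns depends on the question being asked, so the exact-match version may still be the better fit.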


