Categorizing and Tagging Sentences using NLTK in Python

Tagging Sentences

To access NLTK's POS tagger, we need to import it. By convention, import statements go at the top of the script, so let's put this new import directly under our existing one:


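# Requires the twitter_samples corpus and NLTK's POS tagger model
# (install them with nltk.download() if they are missing)
from nltk.corpus import twitter_samples
from nltk.tag import pos_tag_sents

tweets = twitter_samples.strings('positive_tweets.json')
tweets_tokens = twitter_samples.tokenized('positive_tweets.json')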

Now we can tag each of our tokens. NLTK lets us tag every tokenized tweet at once with pos_tag_sents(). We'll store the result in a new variable, tweets_tagged, which will hold one list of tagged tokens per tweet. Add this line at the end of the current script:
tweets_tagged = pos_tag_sents(tweets_tokens)

To get an idea of what tagged tokens look like, here is the first element of our tweets_tagged list:

[(u'#FollowFriday', 'JJ'), (u'@France_Inte', 'NNP'), (u'@PKuchly57', 'NNP'), (u'@Milipol_Paris', 'NNP'), (u'for', 'IN'), (u'being', 'VBG'), (u'top', 'JJ'), (u'engaged', 'VBN'), (u'members', 'NNS'), (u'in', 'IN'), (u'my', 'PRP$'), (u'community', 'NN'), (u'this', 'DT'), (u'week', 'NN'), (u':)', 'NN')]
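Because each tagged tweet is an ordinary list of (token, tag) tuples, the token sits at index 0 and the tag at index 1. The sketch below copies the first tagged tweet shown above by hand (written as Python 3 strings, without the u'' prefixes) so it runs without downloading the corpus, and reads the tags both by index and by tuple unpacking. The name first_tweet is only for illustration.

```python
# The first tagged tweet from the output above, copied by hand
# (Python 3 strings, so no u'' prefixes are needed).
first_tweet = [
    ('#FollowFriday', 'JJ'), ('@France_Inte', 'NNP'), ('@PKuchly57', 'NNP'),
    ('@Milipol_Paris', 'NNP'), ('for', 'IN'), ('being', 'VBG'), ('top', 'JJ'),
    ('engaged', 'VBN'), ('members', 'NNS'), ('in', 'IN'), ('my', 'PRP$'),
    ('community', 'NN'), ('this', 'DT'), ('week', 'NN'), (':)', 'NN'),
]

# Index access: pair[0] is the token, pair[1] is the tag.
first_pair = first_tweet[0]
print(first_pair[0], first_pair[1])  # #FollowFriday JJ

# Tuple unpacking does the same thing and often reads more clearly in a loop.
for token, tag in first_tweet[:3]:
    print(token, '->', tag)
```

The counting loop later in this article uses the index form (pair[1]); either style works.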


Counting POS Tags

Next, we will keep track of how many times the tags JJ (adjective) and NN (singular or mass noun) appear, using an accumulator (count) variable for each tag that we increment every time we find it. First, let's create the two counters at the bottom of our script and set them to zero:

from nltk.corpus import twitter_samples
from nltk.tag import pos_tag_sents

tweets = twitter_samples.strings('positive_tweets.json')
tweets_tokens = twitter_samples.tokenized('positive_tweets.json')
tweets_tagged = pos_tag_sents(tweets_tokens)

JJ_count = 0
NN_count = 0

After we create the variables, we'll add two nested for loops. The outer loop iterates through each tweet in the list; the inner loop iterates through each (token, tag) pair in that tweet. For each pair, we read the tag at tuple index 1.

We then use a conditional statement to check whether the tag matches the string 'JJ' or 'NN'. If it does, we add 1 (+= 1) to the matching accumulator.

from nltk.corpus import twitter_samples
from nltk.tag import pos_tag_sents

tweets = twitter_samples.strings('positive_tweets.json')
tweets_tokens = twitter_samples.tokenized('positive_tweets.json')
tweets_tagged = pos_tag_sents(tweets_tokens)

JJ_count = 0
NN_count = 0

for tweet in tweets_tagged:
    for pair in tweet:
        tag = pair[1]
        if tag == 'JJ':
            JJ_count += 1
        elif tag == 'NN':
            NN_count += 1

After the two loops are complete, we should have the total count for adjectives and nouns in our corpus. To see how many adjectives and nouns our script found, we'll add print statements to the end of the script.

for tweet in tweets_tagged:
    for pair in tweet:
        tag = pair[1]
        if tag == 'JJ':
            JJ_count += 1
        elif tag == 'NN':
            NN_count += 1

print('Total number of adjectives =', JJ_count)
print('Total number of nouns =', NN_count)

At this point, our program will be able to output the number of adjectives and nouns that were found in the corpus.
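To check the loop logic without tagging the whole corpus, you can run the same nested loop on a small hand-built stand-in for tweets_tagged. This sketch assumes a list holding only the first tagged tweet shown earlier; sample_tagged is an illustrative name, not part of the script.

```python
# Hand-copied stand-in for tweets_tagged: a list containing one tagged tweet.
# The real tweets_tagged list holds one such list per tweet in the corpus.
sample_tagged = [[
    ('#FollowFriday', 'JJ'), ('@France_Inte', 'NNP'), ('@PKuchly57', 'NNP'),
    ('@Milipol_Paris', 'NNP'), ('for', 'IN'), ('being', 'VBG'), ('top', 'JJ'),
    ('engaged', 'VBN'), ('members', 'NNS'), ('in', 'IN'), ('my', 'PRP$'),
    ('community', 'NN'), ('this', 'DT'), ('week', 'NN'), (':)', 'NN'),
]]

JJ_count = 0
NN_count = 0

for tweet in sample_tagged:      # outer loop: one tagged tweet at a time
    for pair in tweet:           # inner loop: one (token, tag) pair at a time
        tag = pair[1]            # the tag is the second element of the tuple
        if tag == 'JJ':
            JJ_count += 1
        elif tag == 'NN':
            NN_count += 1

print('Total number of adjectives =', JJ_count)  # Total number of adjectives = 2
print('Total number of nouns =', NN_count)       # Total number of nouns = 3
```

Only exact matches are counted: tokens tagged NNP (proper noun) or NNS (plural noun), such as @France_Inte and members, are not included in NN_count.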


Save your .py file and run it to see how many adjectives and nouns we find. Be patient: tagging the whole corpus may take a few seconds. If all went well, you should get the following output:

Output
Total number of adjectives = 6094
Total number of nouns = 13180

Finished Code

For our finished code, we should add some comments to make it easier for others to follow. Our script looks like this:

# Import data and tagger
from nltk.corpus import twitter_samples
from nltk.tag import pos_tag_sents

# Load tokenized tweets
tweets_tokens = twitter_samples.tokenized('positive_tweets.json')

# Tag tokenized tweets
tweets_tagged = pos_tag_sents(tweets_tokens)

# Set accumulators
JJ_count = 0
NN_count = 0

# Loop through list of tweets
for tweet in tweets_tagged:
    for pair in tweet:
        tag = pair[1]
        if tag == 'JJ':
            JJ_count += 1
        elif tag == 'NN':
            NN_count += 1

# Print the totals for adjectives and nouns
print('Total number of adjectives =', JJ_count)
print('Total number of nouns =', NN_count)

Output
Total number of adjectives = 6094
Total number of nouns = 13180
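Two accumulators are fine when you only care about two tags. If you want a count for every tag in one pass, the standard library's collections.Counter is a common alternative; note that this swaps in a different technique from the loop used above. A minimal sketch on the same hand-copied sample tweet (sample_tagged is again an illustrative name):

```python
from collections import Counter

# Hand-copied stand-in for tweets_tagged: one tagged tweet from the corpus.
sample_tagged = [[
    ('#FollowFriday', 'JJ'), ('@France_Inte', 'NNP'), ('@PKuchly57', 'NNP'),
    ('@Milipol_Paris', 'NNP'), ('for', 'IN'), ('being', 'VBG'), ('top', 'JJ'),
    ('engaged', 'VBN'), ('members', 'NNS'), ('in', 'IN'), ('my', 'PRP$'),
    ('community', 'NN'), ('this', 'DT'), ('week', 'NN'), (':)', 'NN'),
]]

# Count every tag across all tweets in a single pass.
tag_counts = Counter(tag for tweet in sample_tagged for _, tag in tweet)

print(tag_counts['JJ'])  # 2
print(tag_counts['NN'])  # 3
print(tag_counts['RB'])  # 0 (a Counter returns 0 for tags it never saw)

# The loop above counts only the exact tag 'NN'. To include every noun tag
# (NN, NNS, NNP, NNPS), sum the counts of tags that start with 'NN'.
all_nouns = sum(count for tag, count in tag_counts.items() if tag.startswith('NN'))
print(all_nouns)  # 7
```

On the full corpus you would pass tweets_tagged instead of sample_tagged; tag_counts['JJ'] and tag_counts['NN'] should then equal JJ_count and NN_count from the finished script.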


Written by Khushboo Singh.