Python NLTK Parts of speech tagging with stop_words

















































PART OF SPEECH TAGGING WITH STOP WORDS USING NLTK IN PYTHON


One of the more powerful aspects of the NLTK module is the Part of Speech tagging that it can do for you.

This means labeling each word in a sentence as a noun, adjective, verb, and so on. The tags also capture finer distinctions, such as verb tense and singular versus plural nouns.

Here's a list of the tags, what they mean, and some examples:

CC coordinating conjunction

CD cardinal number

DT determiner

EX existential there (like: "there is" ... think of it like "there exists")

FW foreign word

IN preposition/subordinating conjunction


JJ adjective 'big'

JJR adjective, comparative 'bigger'

JJS adjective, superlative 'biggest'

LS list marker 1)

MD modal could, will

NN noun, singular 'desk'

NNS noun plural 'desks'

NNP proper noun, singular 'Harrison'

NNPS proper noun, plural 'Americans'

PDT predeterminer 'all the kids'

POS possessive ending parent's

PRP personal pronoun I, he, she

PRP$ possessive pronoun my, his, hers

RB adverb very, silently

RBR adverb, comparative better

RBS adverb, superlative best

RP particle give up

TO to go 'to' the store


UH interjection errrrrrrrm

VB verb, base form take

VBD verb, past tense took

VBG verb, gerund/present participle taking

VBN verb, past participle taken

VBP verb, sing. present, non-3rd person take

VBZ verb, 3rd person sing. present takes

WDT wh-determiner which

WP wh-pronoun who, what

WP$ possessive wh-pronoun whose

WRB wh-adverb where, when
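
Related tags in the list above share a prefix: every noun tag starts with NN, every verb tag with VB, every adjective tag with JJ, and every adverb tag with RB. A minimal sketch of how a tag could be collapsed into a coarse word class is shown here; the function name coarse_class and the class labels are illustrative assumptions, not part of NLTK.

```python
# Prefixes taken from the tag list above; the labels are illustrative.
COARSE_PREFIXES = (
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
)


def coarse_class(tag):
    """Collapse a tag such as 'NNS' or 'VBD' into a coarse word class."""
    for prefix, label in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return label
    return "other"


# (word, tag) pairs in the shape a part-of-speech tagger returns.
tagged = [("desks", "NNS"), ("took", "VBD"), ("bigger", "JJR"), ("the", "DT")]
for word, tag in tagged:
    print(word, tag, coarse_class(tag))
```

Tags without one of these prefixes, such as DT or WRB, fall into "other" in this sketch.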


Text usually contains stop words such as 'the', 'is', and 'are'. These very common words can be filtered out of the text before it is processed.

There is no universal list of stop words in NLP research; however, the nltk module ships with its own list of stop words.
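
Filtering comes down to a membership test against a set of stop words. The sketch below uses a tiny hand-written set as a stand-in (an assumption for the demo, not NLTK's actual list). The helper nltk_english_stop_words is a hypothetical wrapper around NLTK's stopwords corpus, which has to be downloaded once with nltk.download('stopwords').

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]


def nltk_english_stop_words():
    """Load NLTK's English stop word list as a set.

    Requires NLTK and a one-time nltk.download('stopwords').
    """
    # Imported lazily so the demo below also runs without NLTK installed.
    from nltk.corpus import stopwords
    return set(stopwords.words("english"))


# Tiny stand-in set for the demo; an assumption, not NLTK's list.
demo_stop_words = {"the", "is", "are", "a"}
tokens = ["The", "desk", "is", "big"]
print(remove_stop_words(tokens, demo_stop_words))
```

Comparing the lowercase form means that 'The' at the start of a sentence is filtered as well.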


How might we use this?

While we're at it, we're going to cover a new sentence tokenizer called the PunktSentenceTokenizer.

This tokenizer is trained with an unsupervised algorithm, so you can train it on any body of text you use, without needing labeled sentence boundaries.


Program for Parts of Speech tagging with stop words using NLTK in Python:
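
A minimal sketch of such a program, under stated assumptions: the function names split_sentences, tag_without_stop_words, and download_resources are illustrative, and the training and sample texts are supplied by the caller. The resource names passed to nltk.download differ between NLTK versions, so the helper simply tries the ones it knows.

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import PunktSentenceTokenizer, word_tokenize


def download_resources():
    """Fetch the data used below; run once, needs network access."""
    # The tagger resource name depends on the NLTK version, so both
    # known names are tried; an unknown name only reports an error.
    for name in ("stopwords",
                 "averaged_perceptron_tagger",
                 "averaged_perceptron_tagger_eng"):
        nltk.download(name, quiet=True)


def split_sentences(train_text, text):
    """Train Punkt on train_text (unsupervised), then split text."""
    tokenizer = PunktSentenceTokenizer(train_text)
    return tokenizer.tokenize(text)


def tag_without_stop_words(train_text, text):
    """Return one list of (word, tag) pairs per sentence, stop words removed."""
    stop_words = set(stopwords.words("english"))
    tagged_sentences = []
    for sentence in split_sentences(train_text, text):
        # preserve_line=True: the sentence is already split off,
        # so only word-level tokenization is applied here.
        words = word_tokenize(sentence, preserve_line=True)
        kept = [w for w in words if w.lower() not in stop_words]
        tagged_sentences.append(nltk.pos_tag(kept))
    return tagged_sentences
```

To use it, call download_resources() once, then tag_without_stop_words(train_text, sample_text) with two strings of your own. This sketch removes stop words before tagging; tagging first and filtering the (word, tag) pairs afterwards is an alternative that gives the tagger the full sentence as context.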


Output:


Conclusion:

Using part-of-speech tagging, we labeled the words in each sentence as nouns, adjectives, verbs, and so on, after filtering out the stop words.


 

