PROGRAM FOR STEMMING WORDS AND SENTENCES WITH NLTK
Stemming is the process of reducing the morphological variants of a word to a common root/base form, called the stem.
Stemming is a form of normalization. Many variations of a word carry the same meaning and differ only in tense or other inflection.
One of the most popular stemming algorithms is the Porter stemmer, which has been around since 1979.
Why do we use stemming?
We use stems to shorten lookups and to normalize sentences.
Consider an example:
Sentence 1: I was taking a ride in the car.
Sentence 2: I was riding in the car.
Both sentences mean the same thing. "In the car" is the same. "I was" is the same.
Both describe a past activity, so is it truly necessary to differentiate between
"ride" and "riding" when we are only trying to figure out what that activity was?
Stemming reduces both words to the same stem so that they can be treated as one term.
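To make this concrete, here is a minimal sketch that assumes NLTK is installed and uses its PorterStemmer class. The variable names are illustrative only. It checks whether "ride" and "riding" end up with the same stem:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Stem both inflected forms of the same word.
ride_stem = stemmer.stem("ride")
riding_stem = stemmer.stem("riding")

print(ride_stem, riding_stem)
# True when the two forms are normalized to a single term.
print(ride_stem == riding_stem)
```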
Errors in Stemming:
There are mainly two kinds of errors in stemming:
1. Over-stemming: Over-stemming occurs when two words that have different stems are reduced to the same root (a false positive).
2. Under-stemming: Under-stemming occurs when two words that should share the same root are reduced to different stems (a false negative).
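Both kinds of error can be observed with a small experiment. The sketch below assumes NLTK's PorterStemmer; the helper name `distinct_stems` and the sample words are hypothetical choices made for illustration, not taken from the text above. It counts how many distinct stems each group of words produces:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def distinct_stems(words):
    """Return the set of distinct stems produced for a list of words."""
    return {stemmer.stem(word) for word in words}

# Over-stemming candidate: words with different meanings.
# A single stem for all three means they were conflated.
over = distinct_stems(["universal", "university", "universe"])

# Under-stemming candidate: two forms of the same word.
# More than one stem means they were not conflated.
under = distinct_stems(["alumnus", "alumni"])

print("over-stemming group:", over)
print("under-stemming group:", under)
```

A group of unrelated words that yields one stem points to over-stemming, while a group of related forms that yields several stems points to under-stemming.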
Applications of stemming are:
1. Stemming is used in information retrieval systems like search engines.
2. It is used to determine domain vocabularies in domain analysis.
Program for stemming words using NLTK:
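The following is a minimal sketch rather than a definitive implementation. It assumes NLTK is installed and uses its PorterStemmer class; the word list is an illustrative sample chosen here, not part of the original text.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Sample words chosen for illustration.
words = ["program", "programs", "programmer", "programming", "programmers"]

# Map each word to its stem.
stemmed_words = {word: stemmer.stem(word) for word in words}

for word, stem in stemmed_words.items():
    print(f"{word} : {stem}")
```

Note that a stem is not always a dictionary word; it only needs to be the same for the variants that should be grouped together.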
Program for stemming a sentence using NLTK:
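A stemmer works on one word at a time, so a sentence has to be split into tokens first. The sketch below assumes NLTK's PorterStemmer and its regex-based wordpunct_tokenize, chosen here because it needs no extra data download (word_tokenize is a common alternative but requires the punkt data). The function name `stem_sentence` is hypothetical.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

stemmer = PorterStemmer()

def stem_sentence(sentence):
    """Tokenize a sentence, stem every token, and rejoin with spaces."""
    tokens = wordpunct_tokenize(sentence)
    return " ".join(stemmer.stem(token) for token in tokens)

print(stem_sentence("I was riding in the car."))
```

By default the Porter stemmer in NLTK also lowercases its input, and punctuation becomes a separate token, so the output will not match the original sentence character for character.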
We can use stems to shorten lookups and to normalize strings.