PROGRAMS FOR REMOVING STOP WORDS WITH NLTK
The process of converting data into something a computer can understand is referred to as pre-processing. One of the major forms of pre-processing is filtering out useless data. In natural language processing, such useless words are referred to as stop words.
What are stop words?
A stop word is a commonly used word (such as "the", "a", "an", "in") that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query.
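To make the idea concrete, here is a minimal sketch in plain Python, without NLTK. The four-word stop list is only the example words from the definition above, not the list any real search engine or NLTK uses, and splitting on whitespace is a simplifying assumption.

```python
# Toy stop list built from the example words above (not NLTK's list).
STOP_WORDS = {"the", "a", "an", "in"}


def content_words(text):
    """Lower-case the text, split on whitespace and drop the stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


print(content_words("The cat sat in a box"))  # ['cat', 'sat', 'box']
```

Only the words that carry meaning ("cat", "sat", "box") are kept for indexing.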
Why do we remove stop words from a string or file?
We would not want these words to take up space in our database or to use up valuable processing time. We can remove them easily by storing a list of words that we consider to be stop words. NLTK (Natural Language Toolkit) in Python has stop-word lists stored for a number of languages (16 when this was written; the exact count depends on the version of the NLTK data you install).
To check the list of stop words, you can load and print it from the Python shell.
After running the above, you will see all of the English stop words printed as a list.
Program for removing stop words from a string:
Program for removing stop words from a file:
First, create a file named text.txt in the same directory as the Python script you will write next, and write any text you like into that file.
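If you would rather create the file from Python, a small sketch follows. The sample text is hypothetical (any text will do), and the file is written to the current working directory, so run it from the folder that will hold your script.

```python
from pathlib import Path

# Hypothetical sample text; replace it with anything you like.
SAMPLE_TEXT = (
    "Natural language processing helps computers understand human language.\n"
    "Stop words such as the, a and in carry little meaning on their own.\n"
)

# Write text.txt into the current working directory.
Path("text.txt").write_text(SAMPLE_TEXT, encoding="utf-8")
```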
Then write your Python script to remove the stop words from the text.txt file.
After running the above code, we get a file named filteredtext.txt that contains the original text with the stop words removed.
We remove stop words from files to save processing time.