Python NLP Introduction



NLP: Introduction

NLP stands for Natural Language Processing, the field of AI concerned with enabling computers to understand and generate human language. It is one of the most active areas of AI, with substantial potential for research and development.

Some of the common applications of NLP include:
  1. Text-to-speech and speech-to-text, as in Apple's Siri.
  2. Language translation, as in Google Translate.

These technologies are now so integrated into our lives that we can hardly imagine living without them. In a nutshell, NLP is about teaching a computer system human language and how it is expressed in different contexts.

NLP is an important tool because, when integrated into a computer system, it gives the user a sense that the machine is "alive". It also simplifies certain tasks: we no longer need to read or type everything ourselves, since the machine can read, write, or speak for us.

Because NLP is such an important field, a number of libraries and frameworks are available to help anyone who wants to learn NLP or build NLP-powered systems.

Some of the top available libraries for NLP are:
  1. Natural Language Toolkit (NLTK)

    NLTK is a leading platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text-processing libraries for classification, tokenization, stemming, and more, making it a practical introduction to programming for language processing. NLTK has been called "a wonderful tool for teaching and working in computational linguistics using Python" and "an amazing library to play with natural language." A short usage sketch appears after this list.

  2. Gensim

    Gensim is a Python library for topic modelling, document indexing, and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community. Key features include algorithms that are memory-independent with respect to corpus size, intuitive interfaces, efficient multicore implementations of popular algorithms, and support for distributed computing. A short usage sketch appears after this list.

  3. polyglot

    Polyglot is a natural language pipeline that supports massively multilingual applications. Its features include tokenization, language detection, named entity recognition, part-of-speech tagging, sentiment analysis, word embeddings, and more.

  4. TextBlob

    TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, WordNet integration, parsing, and word inflection, and it allows new models or languages to be added through extensions. A short usage sketch appears after this list.

  5. CoreNLP

    Stanford CoreNLP provides a set of human language technology tools. Its goal is to make it very easy to apply a bundle of linguistic analysis tools to a piece of text. Stanford CoreNLP integrates many of Stanford's NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, sentiment analysis, bootstrapped pattern learning, and the open information extraction tools. These tools variously use rule-based, probabilistic machine learning, and deep learning components.
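
To make the NLTK entry above concrete, here is a minimal sketch (not taken from any official documentation) of tokenization and part-of-speech tagging. It assumes NLTK is installed (pip install nltk); the data package names used here may differ slightly between NLTK versions.

    # Minimal NLTK sketch: tokenize a sentence and tag each token with
    # its part of speech. Assumes NLTK can download the data it needs.
    import nltk

    nltk.download("punkt", quiet=True)                       # word/sentence tokenizer models
    nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger model

    text = "NLTK makes it easy to experiment with human language data."
    tokens = nltk.word_tokenize(text)   # split the sentence into word tokens
    tags = nltk.pos_tag(tokens)         # attach a part-of-speech tag to each token

    print(tokens)
    print(tags)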

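For Gensim, the sketch below builds a bag-of-words representation of a toy four-document corpus (the documents are made up for illustration) and fits a small LDA topic model, assuming Gensim is installed (pip install gensim).

    # Minimal Gensim sketch: build a dictionary and bag-of-words corpus
    # from a toy document collection, then fit a two-topic LDA model.
    from gensim import corpora, models

    documents = [
        "human machine interface for lab computer applications",
        "a survey of user opinion of computer system response time",
        "the generation of random binary unordered trees",
        "the intersection graph of paths in trees",
    ]
    texts = [doc.lower().split() for doc in documents]      # naive whitespace tokenization

    dictionary = corpora.Dictionary(texts)                  # map each token to an integer id
    corpus = [dictionary.doc2bow(text) for text in texts]   # bag-of-words vectors

    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
    for topic_id, topic in lda.print_topics():
        print(topic_id, topic)
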

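For TextBlob, a minimal sketch of POS tags, noun phrases, and sentiment is shown below. It assumes the library and its corpora have been installed (pip install textblob, then python -m textblob.download_corpora).

    # Minimal TextBlob sketch: POS tags, noun phrases and sentiment
    # for a single sentence.
    from textblob import TextBlob

    blob = TextBlob("TextBlob makes common NLP tasks pleasantly simple.")

    print(blob.tags)          # list of (word, POS tag) pairs
    print(blob.noun_phrases)  # extracted noun phrases
    print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)
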
In the next few articles, we will focus mainly on the NLTK library.


