Natural Language Toolkit (NLTK)

Natural Language Processing:

What separates human beings from other animals is their ability to express themselves through language and to understand that language. Natural language processing (NLP) is the sub-field of computer science that works on giving computers a similar ability to understand human language. Like every other program, an NLP system requires input and output; here, both can take the form of:

  1. Speech
  2. Written Text

Fig. A: Human Interaction With Computer

Natural language toolkit (NLTK):

A plethora of open-source libraries is available for NLP, and the Natural Language Toolkit (NLTK) is among the best known. It is a Python library, and Python is an easy-to-understand, high-level programming language that is, most importantly, very efficient to work in.

A Few Common Terms in NLTK:

Tokenization:

There are two kinds of tokenization:

  1. Tokenization of words: Here we split a text into individual words, each of which becomes a token.
  2. Tokenization of sentences: Here we split a text into individual sentences, each of which becomes a token.
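
To make the two kinds of tokenization concrete, here is a minimal sketch that uses only Python's standard re module. It is a deliberate simplification and not NLTK's own algorithm: the function names split_sentences and split_words are hypothetical, and the patterns assume that a sentence ends in ".", "!" or "?" followed by whitespace. NLTK's tokenizers deal with abbreviations and other edge cases that this sketch ignores.

```python
import re

def split_sentences(text):
    """Sentence tokenization: split after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(sentence):
    """Word tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Life is beautiful. NLTK makes text processing easier!"
sentences = split_sentences(text)   # one token per sentence
words = split_words(sentences[0])   # one token per word or punctuation mark
print(sentences)
print(words)
```

Here the first call yields two sentence tokens, and the second yields the word tokens of the first sentence, with the full stop as a token of its own.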

Lexicon and Corpora:

Corpus (plural: corpora): A body of text. Ex: the text “Life is beautiful”, or a collection of texts on medical science.

Lexicon: The words of a language or domain and their meanings.

E.g.: the same word can mean different things in investor-speak and in regular English-speak.

Investor-speak ‘bull’: Someone who is very positive about the market.

Regular English-speak ‘bull’: A scary animal.
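
One way to picture a lexicon is as a lookup table from a word to its meaning within a domain. The sketch below is purely illustrative: the dictionary layout, the domain keys "investor" and "regular", and the helper name meaning are assumptions made for this example, not an NLTK structure.

```python
# Hypothetical mini-lexicon: the same word mapped to one meaning per domain.
lexicon = {
    "investor": {"bull": "someone who is very positive about the market"},
    "regular": {"bull": "a scary animal"},
}

def meaning(word, domain):
    """Look up a word's meaning within one domain; return None if unknown."""
    return lexicon.get(domain, {}).get(word.lower())

print(meaning("bull", "investor"))
print(meaning("bull", "regular"))
```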

Practical:

NLTK is easy to follow and understand, as it is an open-source library written in Python. The Natural Language Toolkit is interesting to work with and really powerful. In this article we are going to look at the following topics:

  1. How to install NLTK
  2. Some simple uses of NLTK
  3. Tokenization of the text
  4. Synonyms in WordNet
  5. Antonyms in WordNet
  6. Conclusion

How to install NLTK

To install the Natural Language Toolkit on your computer, you should first have Python installed.

You can then use Python's pip to install NLTK.

code: 1)
code:2)

After the above command finishes, the nltk package is installed. To use the package in our program, we need to import it in our program file, as written below.

code: 3)

Finally, to install the data packages that NLTK provides, we can use the following command.

code: 4)
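
The installation steps above can be sketched in Python as follows. This is a hedged illustration and not the original listings: the helper names nltk_is_installed and download_nltk_data are hypothetical, and the data package names passed to nltk.download (such as "punkt" or "stopwords") depend on the NLTK version in use. The download step needs network access, so the sketch defines it without calling it.

```python
import importlib.util

def nltk_is_installed():
    """Return True if the nltk package can be imported in this environment."""
    return importlib.util.find_spec("nltk") is not None

def download_nltk_data(packages):
    """Download the named NLTK data packages (needs network access)."""
    import nltk  # imported here so this file still loads without NLTK
    return {name: nltk.download(name) for name in packages}

if nltk_is_installed():
    print("NLTK is available; call download_nltk_data([...]) to fetch data.")
else:
    print("NLTK not found; install it with: pip install nltk")
```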

Now we can proceed towards actual practical use of NLTK.

Some simple uses of NLTK

code:5)
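
As a minimal sketch of a tokenizing call (not the original listing), the example below uses wordpunct_tokenize from nltk.tokenize, chosen here because it works without any extra data download. The wrapper name tokenize is hypothetical. NLTK's word_tokenize is the more commonly shown function, but it additionally requires the Punkt data package.

```python
def tokenize(sentence):
    """Split a sentence into a list of word and punctuation tokens."""
    # Imported here so the file still loads where NLTK is not installed.
    from nltk.tokenize import wordpunct_tokenize
    return wordpunct_tokenize(sentence)

try:
    print(tokenize("NLTK is really powerful."))
except ImportError:
    print("NLTK is not installed; run: pip install nltk")
```

The result is an ordinary Python list, so it can be indexed, counted and filtered like any other list.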

In code 5) we are simply converting the sentence into tokens; the call returns the sentence as a list of tokens.

code:6)
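
A minimal sketch of a part-of-speech tagging call (again, not the original listing) is given below. It assumes that NLTK and its tagger data package are installed; the wrapper name tag_tokens is hypothetical, and the exact tags returned depend on the tagger model, so no output is shown.

```python
def tag_tokens(tokens):
    """Return a list of (word, tag) tuples for an already tokenized sentence."""
    import nltk  # imported here so the file still loads without NLTK
    return nltk.pos_tag(tokens)

try:
    print(tag_tokens(["NLTK", "is", "really", "powerful"]))
except ImportError:
    print("NLTK is not installed; run: pip install nltk")
except LookupError:
    print("Tagger data is missing; fetch it with nltk.download(...)")
```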

In code 6) we are doing tagging. Tagging is the process in which we convert a sentence into a list of tuples. The output comes in the form (word, tag), where the tag is a part of speech such as noun, pronoun, verb, etc.

In Which Methods NLTK Is Used

Let us take an example: an Amazon food review dataset in which each review has a rating and feedback in the form of text.

To extract the useful data and refine it, the following steps are used:
  • Handling missing data and duplicates: Missing values can simply be dropped; when we don’t want to lose the information, we use an imputation technique to fill them in. In our case there are no missing values for content (text) or rating (score), so we can go to the next step and drop the duplicates.
  • Punctuation removal: All punctuation marks and digits can be removed, for example when we wish to compare the review to a list of English words.
  • Lowercase conversion: Converting a review to lowercase makes it easier to compare with an English dictionary.
  • Stop-words removal: Stop words are a class of extremely common words that add no information to a text. Examples include “a”, “an”, “the”, “he”, “she”, “by”, “on”. Because they are commonly used in all classes of text, they carry little signal and can be removed, for example when computing the prior sentiment polarity of words in a review according to their frequency of occurrence in different classes. Working through these steps lets us understand our data before we apply any modelling to it.
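
The cleaning steps above can be sketched with the Python standard library alone. Everything here is an assumption made for illustration: the sample reviews are invented, the function names are hypothetical, missing values are dropped rather than imputed, and STOP_WORDS holds only the examples listed above rather than NLTK's full stop-word list.

```python
import string

# Tiny illustrative stop-word set (an assumption, not NLTK's list).
STOP_WORDS = {"a", "an", "the", "he", "she", "by", "on"}

def drop_missing_and_duplicates(reviews):
    """Drop rows with missing text or score, then exact duplicates, keeping order."""
    seen, kept = set(), []
    for text, score in reviews:
        if text is None or score is None or (text, score) in seen:
            continue
        seen.add((text, score))
        kept.append((text, score))
    return kept

def clean_text(text):
    """Remove punctuation and digits, lowercase, and drop stop words."""
    table = str.maketrans("", "", string.punctuation + string.digits)
    words = text.translate(table).lower().split()
    return [w for w in words if w not in STOP_WORDS]

# Invented sample data in the form (review text, rating).
reviews = [
    ("The coffee is GREAT, 10/10!", 5),
    ("The coffee is GREAT, 10/10!", 5),  # duplicate row
    (None, 3),                           # missing text
    ("She left the box on a shelf.", 2),
]
cleaned = [(clean_text(t), s) for t, s in drop_missing_and_duplicates(reviews)]
print(cleaned)
```

With NLTK installed, the hand-written stop-word set could be swapped for the list in NLTK's stopwords corpus, and the split() call for one of NLTK's tokenizers.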

Application of NLTK for NLP and Python

Other Python libraries can also be used to clean data and produce analytics, but with the Natural Language Toolkit this becomes very easy and saves time. The most important uses of NLTK are data cleaning and analytics.

Written By: Triveni Kohale

Reviewed By: Vikas Bhardwaj
