NLTK, or Natural Language Toolkit - Pianalytix

NLTK, which stands for Natural Language Toolkit, is one of the most widely used libraries for NLP (Natural Language Processing). It is used to build programs that process and analyze human language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania.

It is written in the Python programming language.

A typical NLTK workflow involves the following steps:

Tokenization in NLTK

Tokenization is the first step in the process. It splits a text into individual units, called tokens, such as words or sentences.

Example:-
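A minimal sketch of word tokenization. It uses NLTK's wordpunct_tokenize because that tokenizer is purely regex-based and needs no extra data download; the sample sentence is made up for illustration, and the fallback branch is an assumption for environments where NLTK is not installed.

```python
import re

try:
    # Regex-based tokenizer shipped with NLTK; needs no nltk.download() call.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback (assumption: NLTK is missing): the same pattern that NLTK's
    # WordPunctTokenizer is built on.
    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)

text = "NLTK is a powerful NLP library."  # illustrative sentence
tokens = wordpunct_tokenize(text)
print(tokens)
# ['NLTK', 'is', 'a', 'powerful', 'NLP', 'library', '.']
```

Each word becomes one token, and the final full stop is kept as a token of its own.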

Unigram, Bigrams & Trigrams

Humans can easily understand linguistic structure and meaning, but machines cannot, so we break the text into n-grams (sequences of n consecutive tokens) such as unigrams, bigrams and trigrams, which preserve some of the local word order.

Unigram: a single word.

Bigram: two consecutive words in a sentence.

Trigram: three consecutive words in a sentence.

Example:-
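The sketch below builds n-grams in plain Python so the mechanics are visible. The helper name make_ngrams and the sample sentence are hypothetical; NLTK offers nltk.util.ngrams(tokens, n) for the same job.

```python
def make_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # Zip the token list against copies of itself shifted by 1, 2, ..., n-1.
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "I love natural language processing".split()  # illustrative sentence

unigrams = make_ngrams(tokens, 1)
bigrams = make_ngrams(tokens, 2)
trigrams = make_ngrams(tokens, 3)

print(bigrams)
# [('I', 'love'), ('love', 'natural'), ('natural', 'language'), ('language', 'processing')]
print(trigrams)
# [('I', 'love', 'natural'), ('love', 'natural', 'language'), ('natural', 'language', 'processing')]
```

A sentence of k tokens yields k - n + 1 n-grams, so the five tokens here give four bigrams and three trigrams.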

Stemming In NLTK

Stemming is the process of reducing a word to its stem by cutting off its suffix. The result is not always a real dictionary word.

# Stemming
from nltk.stem import PorterStemmer
pst = PorterStemmer()
print(pst.stem("winning"), pst.stem("giving"))

O/P:- win give
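To make the idea of "cutting" a word concrete, here is a toy suffix stripper. It is only a sketch and not the Porter algorithm: the suffix list, the naive_stem name and the minimum stem length are all assumptions chosen for illustration. NLTK's PorterStemmer applies a much longer sequence of rules, including clean-up steps that this toy version lacks.

```python
SUFFIXES = ("ing", "ed", "ly", "s")  # hypothetical, deliberately tiny suffix list

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix if at least min_stem_len letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

for word in ["winning", "giving", "played", "sing"]:
    print(word, "->", naive_stem(word))
# winning -> winn
# giving -> giv
# played -> play
# sing -> sing
```

The results "winn" and "giv" show why real stemmers need extra rules, for example to undo a doubled consonant or to restore a trailing "e".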

Lemmatization In NLTK

Lemmatization is similar to stemming. It is the process of reducing a word to its lemma, or dictionary form.

# Lemmatization
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download("wordnet")  # one-time download of the WordNet data
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("went", pos="v"))  # pos="v" treats the word as a verb

O/P:- go

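Note:- Unlike a stemmer, the lemmatizer returns a valid dictionary word, but only when it is given the right part of speech. WordNetLemmatizer treats every word as a noun by default, so "went" is reduced to "go" only when it is lemmatized as a verb.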

Part of Speech Tagging

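Part-of-speech (POS) tagging classifies each word and labels it with its part of speech. NLTK's default tagger reports fine-grained Penn Treebank tags (for example NNP for a proper noun and VBZ for a third-person singular verb). Traditionally, words are categorized into 8 parts of speech: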

1). Noun

2). Pronoun

3). Verb

4). Adverb

5). Adjective

6). Conjunction

7). Preposition

8). Interjection

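# POS
# Requires NLTK's tokenizer and tagger data, fetched once with nltk.download()
import nltk
from nltk.tokenize import word_tokenize
sentence = "Delhi is the capital of India. It is situated on the banks of the river Yamuna."
sen_token = word_tokenize(sentence)
print(nltk.pos_tag(sen_token))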

O/P:- [('Delhi', 'NNP'), ('is', 'VBZ'), ('the', 'DT'), ('capital', 'NN'), ('of', 'IN'), ('India', 'NNP'), ('.', '.'), ('It', 'PRP'), ('is', 'VBZ'), ('situated', 'VBN'), ('on', 'IN'), ('the', 'DT'), ('banks', 'NNS'), ('of', 'IN'), ('the', 'DT'), ('river', 'NN'), ('Yamuna', 'NNP'), ('.', '.')]
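The tags in the output are Penn Treebank tags, which are finer-grained than the eight categories listed earlier. The sketch below shows one way to collapse them back into those categories. The prefix table and the coarse_category name are assumptions made for illustration, not part of NLTK, and the mapping is deliberately simplified.

```python
# Tagged tokens copied from the first sentence of the output above.
tagged = [("Delhi", "NNP"), ("is", "VBZ"), ("the", "DT"), ("capital", "NN"),
          ("of", "IN"), ("India", "NNP"), (".", ".")]

# Assumed, simplified mapping from Penn Treebank tag prefixes to the 8 categories.
# Note: the IN tag also covers subordinating conjunctions, not only prepositions.
PREFIX_TO_CATEGORY = [
    ("NN", "Noun"), ("PRP", "Pronoun"), ("WP", "Pronoun"), ("VB", "Verb"),
    ("RB", "Adverb"), ("JJ", "Adjective"), ("CC", "Conjunction"),
    ("IN", "Preposition"), ("UH", "Interjection"),
]

def coarse_category(tag):
    """Map a Penn Treebank tag to one of the 8 categories, or 'Other'."""
    for prefix, category in PREFIX_TO_CATEGORY:
        if tag.startswith(prefix):
            return category
    return "Other"  # determiners, punctuation, numbers and so on

for word, tag in tagged:
    print(f"{word:8} {tag:4} {coarse_category(tag)}")
```

Tags such as DT (determiner) fall outside the traditional eight categories, which is why the sketch needs an "Other" bucket.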

Written by: Chandra Shekhar Tiwari

Reviewed by: Shivani Yadav

