NLTK (Natural Language Toolkit) is one of the most powerful NLP (Natural Language Processing) libraries. It is used to build programs that understand human language and respond to it. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania.
It is written in the Python programming language.
A typical NLP workflow with NLTK involves the following steps:
Tokenization in NLTK
Tokenization is the first step in the process. It splits a text into individual units, such as words or sentences, which are known as tokens.
Example:-
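A minimal sketch of word-level tokenization, assuming NLTK is installed. It uses wordpunct_tokenize, a regex-based tokenizer that needs no extra data download; the word_tokenize function used later in this article relies on tokenizer data fetched with nltk.download. The sample sentence is chosen only for illustration.

```python
from nltk.tokenize import wordpunct_tokenize

# Illustrative input; any short English sentence works.
text = "Delhi is the capital of India."

# Split the text into word and punctuation tokens.
tokens = wordpunct_tokenize(text)
print(tokens)
```

Each word becomes its own token, and the final full stop is returned as a separate token as well.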
Unigrams, Bigrams & Trigrams
Humans can easily understand linguistic structure and meaning, but machines cannot, so we convert the text into unigrams, bigrams, trigrams and, more generally, n-grams.
- Unigram: A unigram is a single word in the sentence.
- Bigram: A bigram is two consecutive words in the sentence.
- Trigram: A trigram is three consecutive words in the sentence.
Example:-
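A small sketch of how n-grams can be built with the ngrams helper from nltk.util. The sentence is illustrative, and the tokens come from a plain whitespace split to keep the example free of data downloads; in practice they would usually come from a tokenizer.

```python
from nltk.util import ngrams

# Illustrative sentence, split on whitespace for simplicity.
tokens = "I love natural language processing".split()

# ngrams returns a generator of tuples, so wrap it in list() to inspect it.
bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))

print(bigrams)
print(trigrams)
```

With five tokens this yields four bigrams, starting with ("I", "love"), and three trigrams, starting with ("I", "love", "natural").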
Stemming in NLTK
Stemming is the process of reducing a word to its stem by cutting off its suffixes.
# Stemming
from nltk.stem import PorterStemmer

pst = PorterStemmer()
print(pst.stem("winning"), pst.stem("giving"))
O/P:- win give
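To see how stems compare with dictionary words, the sketch below runs the Porter stemmer over a few more words. The word list is chosen only for illustration.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Illustrative words; the stemmer strips suffixes by rule, without a dictionary.
words = ["running", "studies", "winning"]
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(word, "->", stem)
```

The stem of "studies" comes out as "studi", which is not a dictionary word. Stemming is fast because it only applies suffix rules, but the result is not guaranteed to be a valid word.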
Lemmatization in NLTK
Lemmatization is similar to stemming. It is the process of reducing words to their lemma, or dictionary form.
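# Lemmatization (requires the WordNet data: nltk.download("wordnet"))
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# pos="v" treats "went" as a verb; the default is noun, which returns "went" unchanged
print(lemmatizer.lemmatize("went", pos="v"))
O/P:- go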
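Note:- The lemmatizer returns a valid dictionary word, but it needs to know the word's part of speech to find the right one. By default it treats every word as a noun, so pass pos="v" for verbs.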
Part of Speech Tagging
Part of Speech tagging classifies each word and labels it according to its part of speech. Words are traditionally grouped into eight parts of speech:
1). Noun
2). Pronoun
3). Verb
4). Adverb
5). Adjective
6). Conjunction
7). Preposition
8). Interjection
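# POS tagging (needs tokenizer and tagger data, e.g. nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger"); newer NLTK releases name these
# "punkt_tab" and "averaged_perceptron_tagger_eng")
import nltk
from nltk.tokenize import word_tokenize

sentence = "Delhi is the capital of India. It is situated on the banks of the river Yamuna."
sen_token = word_tokenize(sentence)
print(nltk.pos_tag(sen_token))
O/P:- [('Delhi', 'NNP'), ('is', 'VBZ'), ('the', 'DT'), ('capital', 'NN'), ('of', 'IN'), ('India', 'NNP'), ('.', '.'), ('It', 'PRP'), ('is', 'VBZ'), ('situated', 'VBN'), ('on', 'IN'), ('the', 'DT'), ('banks', 'NNS'), ('of', 'IN'), ('the', 'DT'), ('river', 'NN'), ('Yamuna', 'NNP'), ('.', '.')]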
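The tags in the output (NNP, VBZ, IN and so on) are Penn Treebank tags, which are finer-grained than the eight parts of speech listed earlier. As a rough sketch, they can be folded back into those eight classes by prefix. The COARSE_POS table and the coarse_pos function are hypothetical helpers written for this illustration, not part of NLTK, and the mapping is a simplification (for example, IN also covers subordinating conjunctions).

```python
# Hypothetical mapping from Penn Treebank tag prefixes to the eight classes.
COARSE_POS = {
    "NN": "Noun",
    "PRP": "Pronoun",
    "VB": "Verb",
    "RB": "Adverb",
    "JJ": "Adjective",
    "CC": "Conjunction",
    "IN": "Preposition",
    "UH": "Interjection",
}


def coarse_pos(tag):
    """Return the coarse class for a Penn Treebank tag, or "Other"."""
    for prefix, label in COARSE_POS.items():
        if tag.startswith(prefix):
            return label
    return "Other"


# First six (word, tag) pairs copied from the tagger output above.
tagged = [("Delhi", "NNP"), ("is", "VBZ"), ("the", "DT"),
          ("capital", "NN"), ("of", "IN"), ("India", "NNP")]

coarse = [(word, coarse_pos(tag)) for word, tag in tagged]
print(coarse)
```

Tags outside the eight classes, such as DT for the determiner "the", fall into "Other" in this sketch.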
Written by: Chandra Shekhar Tiwari
Reviewed by: Shivani Yadav