Autocorrect Using NLP

We use machine learning every day without even realising it, for example in the autocorrect system in our keypads. We type ‘heyy’ and it asks whether we meant ‘hey’, or we type ‘byi’ and it asks whether we meant ‘bye’.

Let’s see how it’s done and which parameters and conditions we need to keep in mind.


We have to keep in mind the following steps:

  1. Identify the misspelt word
  2. Find candidate strings and calculate their edit distance from it
  3. Filter candidates
  4. Calculate the probability of using that word

Let’s go through each step.

Identification using NLP

  • We first need to preprocess the data. By this I mean splitting the text into individual words.
  • After splitting the text using NLP, we go through the words one by one and find those that aren’t in the English dictionary.
  • These form a new set of misspelt words.
Calling Required Libraries
Pre-Processing Of Data
Finding Each Word And Their Count In The Whole Set
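
The identification steps above can be sketched roughly as follows. This is a minimal illustration, not the original code: the function names are hypothetical, and the vocabulary of a tiny sample sentence stands in for a real English dictionary.

```python
import re
from collections import Counter


def process_text(text):
    """Lowercase the text and split it into words (runs of a-z only)."""
    return re.findall(r"[a-z]+", text.lower())


def get_count(words):
    """Count how many times each word occurs in the list."""
    return Counter(words)


def find_misspelt(words, dictionary):
    """Return the set of words that are not in the dictionary."""
    return {w for w in words if w not in dictionary}


# Assumption: the vocabulary of this sample sentence plays the role
# of the English dictionary.
corpus_words = process_text("I am happy because I am learning")
word_counts = get_count(corpus_words)
dictionary = set(corpus_words)

misspelt = find_misspelt(process_text("I am hapy"), dictionary)
print(word_counts["am"], misspelt)  # 2 {'hapy'}
```

Note that the simple regular expression drops digits and apostrophes, so a word like "don’t" is split in two; a real pipeline would need a more careful tokenizer.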

Calculation of Edit Distance

  • An edit can be one of the following:
    • Remove a letter
    • Add a letter
    • Replace a letter

We assign points to each operation.

  • Each addition or removal of a letter counts 1 point, and each replacement counts 2 points (since it is a removal followed by an addition).
  • If we use a Minimum Edit Distance algorithm, we create a table with the misspelt word on one axis and, one at a time, each remaining word from the set of all words in the passage on the other axis. Each cell holds the edit distance for the letters compared so far, and the final cell gives the total cost of changing the misspelt word into that word.
  • Here we’ll take a simpler route: generate the words one or two edits away and rank them with a Bayes-style probability.
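
For reference, the Minimum Edit Distance table mentioned above can be sketched as a small dynamic-programming routine. This is an illustrative sketch under the point scheme given above (1 for addition or removal, 2 for replacement); the function and parameter names are hypothetical.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, rep_cost=2):
    """Return the minimum cost of turning `source` into `target`."""
    m, n = len(source), len(target)
    # table[i][j] = cost of turning source[:i] into target[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]

    # First column: delete every letter of the source prefix.
    for i in range(1, m + 1):
        table[i][0] = table[i - 1][0] + del_cost
    # First row: insert every letter of the target prefix.
    for j in range(1, n + 1):
        table[0][j] = table[0][j - 1] + ins_cost

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Matching letters cost nothing; otherwise pay for a replacement.
            r = 0 if source[i - 1] == target[j - 1] else rep_cost
            table[i][j] = min(
                table[i - 1][j] + del_cost,   # remove a letter
                table[i][j - 1] + ins_cost,   # add a letter
                table[i - 1][j - 1] + r,      # replace (or keep) a letter
            )
    return table[m][n]


print(min_edit_distance("eet", "eat"))   # 2 (one replacement)
print(min_edit_distance("heyy", "hey"))  # 1 (one removal)
```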
Checking Combinations By Deleting One Letter
Checking Combinations By Replacing One Letter
Checking Combinations By Inserting One Letter
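
The three single-letter edits could look something like this. It is a minimal sketch with hypothetical function names, assuming lowercase English letters only.

```python
import string

LETTERS = string.ascii_lowercase


def delete_letter(word):
    """All strings formed by removing one letter from `word`."""
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def replace_letter(word):
    """All strings formed by replacing one letter with a different one."""
    return [
        word[:i] + c + word[i + 1:]
        for i in range(len(word))
        for c in LETTERS
        if c != word[i]  # skip the replacement that leaves the word unchanged
    ]


def insert_letter(word):
    """All strings formed by inserting one letter at any position."""
    return [
        word[:i] + c + word[i:]
        for i in range(len(word) + 1)
        for c in LETTERS
    ]


print(delete_letter("eet"))       # ['et', 'et', 'ee']
print(len(replace_letter("eet"))) # 75
print(len(insert_letter("eet")))  # 104
```

The lists may contain duplicates (for example ‘et’ appears twice above); converting them to a set removes these if needed.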

Filtering Candidates

At this point we have, for each misspelt word, all the combinations one edit away: every string formed by adding one letter at each position, removing one letter, or replacing one letter.

E.g.

‘eet’ (for ‘eat’):

1. ADDITION (1 distance away)-

[aeet, beet, ceet, deet, eeet,……….., zeet], 

[eaet, ebet, ecet, edet, eeet,……….., ezet],

[eeat, eebt, eect, eedt, eeet,……….., eezt],

[eeta, eetb, eetc, eetd, eete,……….., eetz]

The words that exist in the English dictionary are then filtered from these, e.g. [beet, feet, etc.]

2. SUBTRACTION (1 Distance away)-

[et, et, ee]

The words that exist in the English dictionary are filtered from these [none here]

3. REPLACE (2 Distance away)- 

[aet, bet, cet, det, eet,……….., zet],

[eat, ebt, ect, edt, eet,……….., ezt],

[eea, eeb, eec, eed, eet,……….., eez],

The words that exist in the English dictionary are filtered from these, e.g. [bet, met, eat, get, etc.]

From all these sets of strings, we keep only the words that belong to the English dictionary and compute the probability of occurrence of each.

Probability Of Occurrence Of Each Word In The List Of Actual Words
Finding The Words One Edit Away (One Letter Replaced, Inserted Or Deleted)
Finding The Words Two Edits Away (Two Letters Replaced, Inserted Or Deleted)
Ordering Words With Probabilities In Descending Order, Best To Worst
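
The candidate filtering and ordering described above can be sketched as follows. This is an illustrative sketch, not the original code: the function names are hypothetical and the probabilities in the example are made up. Note also that, when generating candidates, this sketch treats a replacement as a single edit, even though the scoring described earlier counts it as 2 points.

```python
import string

LETTERS = string.ascii_lowercase


def edit_one(word):
    """Set of strings one edit (delete, replace, insert) away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    replaces = {l + c + r[1:] for l, r in splits if r for c in LETTERS}
    inserts = {l + c + r for l, r in splits for c in LETTERS}
    return (deletes | replaces | inserts) - {word}


def edit_two(word):
    """Set of strings reachable by applying two edits to `word`."""
    return {e2 for e1 in edit_one(word) for e2 in edit_one(e1)}


def get_corrections(word, probs, n=2):
    """Return up to `n` (word, probability) pairs, best first."""
    vocab = set(probs)
    # Prefer the word itself, then one-edit words, then two-edit words.
    candidates = (
        ({word} & vocab)
        or (edit_one(word) & vocab)
        or (edit_two(word) & vocab)
    )
    ranked = sorted(candidates, key=lambda w: probs[w], reverse=True)
    return [(w, probs[w]) for w in ranked[:n]]


# Made-up probabilities, for illustration only.
toy_probs = {"eat": 0.5, "feet": 0.3, "beet": 0.2}
print(get_corrections("eet", toy_probs))  # [('eat', 0.5), ('feet', 0.3)]
```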

Filtering Candidates using NLP

The last step is to find the probability of occurrence of each word in the final set.

P(w) = C(w) / V

w: a word

P(w): Probability of the word

C(w): Number of times the word occurs

V: Total number of words, including repetitions

Eg: “I am happy because I am learning”

Word      Count
I         2
am        2
happy     1
because   1
learning  1

V = 7 (total)

Here, P(happy) = 1/7 ≈ 0.143
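
The same calculation can be checked with a few lines of code. This is a minimal sketch with a hypothetical function name, using the example sentence above as the whole corpus.

```python
from collections import Counter


def get_probs(word_counts):
    """P(w) = C(w) / V, where V is the total number of words."""
    total = sum(word_counts.values())
    return {w: c / total for w, c in word_counts.items()}


words = "I am happy because I am learning".lower().split()
counts = Counter(words)
probs = get_probs(counts)

print(sum(counts.values()))        # 7
print(f"{probs['happy']:.3f}")     # 0.143
```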

Implementing the final code on the word ‘dys’

In this code, we tried the word ‘dys’.

We got ‘dye’ with probability 0.000019 and ‘days’ with probability 0.000410.

We could also use the Minimum Edit Distance algorithm mentioned above, which calculates the edit distance even when more than one or two letters are misplaced.

We could also use Deep Learning to perform this task, but that requires a heavier implementation and knowledge of neural networks.

The above code is inspired by the NLP assignment by deeplearning.ai on Coursera.

THANK YOU.

written by: Sparsh Nagpal

reviewed by: Savya Sachi
