Need Spelling Suggestions? Head North

The true North for orthography

Correct spelling signals good language skills. We all misspell words from time to time and need a way to fix them, and an automatic tool helps with this often-overlooked task. North, a trie-based spelling suggester, helps you with some of your misspellings.

The types of misspellings

Misspellings broadly fall into three types:

  • Errors in the first half of the word.
  • Errors in the second half of the word.
  • Transposed letters within the word.

Approach

You can approach this problem with a data structure called a trie. A trie is a k-ary tree in which each edge represents one letter, so a node can have up to k children, one per letter of the alphabet. The first level holds single letters, and each deeper level extends them by one more letter until a path from the root spells out a whole word.
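
To make the structure concrete, here is a minimal trie sketch in Python. It is an illustration only, not North's internal implementation; the names `TrieNode`, `Trie`, and `words_with_prefix` are made up for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a letter to the child node
        self.is_word = False  # True when the path from the root spells a stored word


class Trie:
    """Hypothetical minimal trie; not the class used inside North."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for letter in word:
            # Reuse the existing branch for this letter or create a new one
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True

    def words_with_prefix(self, prefix):
        """Return every stored word that starts with prefix, sorted."""
        node = self.root
        for letter in prefix:
            node = node.children.get(letter)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:
            current, spelled = stack.pop()
            if current.is_word:
                found.append(spelled)
            for letter, child in current.children.items():
                stack.append((child, spelled + letter))
        return sorted(found)


trie = Trie(["india", "indianapolis", "blue", "bengaluru"])
print(trie.words_with_prefix("ind"))  # ['india', 'indianapolis']
```

Words that share a prefix share a path from the root, which is why a partially correct spelling can quickly narrow down the candidates.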

We first check the word for misspellings with a forward trie. Then we check it in reverse with a reverse trie. While checking, we also do a ‘tail-swap’ to generate transposition candidates.
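
The three checks can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not North's actual code: the reverse trie is assumed to hold the reversed words, a check is assumed to follow the longest matching path and collect every word below it, and ‘tail-swap’ is assumed to mean swapping the last two letters. All function names here are hypothetical.

```python
def build_trie(words):
    """Build a nested-dict trie; the key "$" stores the word that ends at a node."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = word
    return root


def collect(node):
    """Return every word stored at or below node."""
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                found.append(child)
            else:
                stack.append(child)
    return found


def longest_match(trie, text):
    """Follow text through the trie as far as possible; return the words below that point."""
    node, depth = trie, 0
    for letter in text:
        if letter not in node:
            break
        node = node[letter]
        depth += 1
    return collect(node) if depth > 0 else []


def contains(trie, text):
    """Return True when text is stored in the trie as a complete word."""
    node = trie
    for letter in text:
        if letter not in node:
            return False
        node = node[letter]
    return "$" in node


def candidates(word, forward_trie, reverse_trie):
    # Forward check: matches words that share the beginning of the input
    found = set(longest_match(forward_trie, word))
    # Reverse check: matches words that share the ending of the input
    found.update(w[::-1] for w in longest_match(reverse_trie, word[::-1]))
    # Tail swap (assumed meaning): swap the last two letters and look the result up
    if len(word) >= 2:
        swapped = word[:-2] + word[-1] + word[-2]
        if contains(forward_trie, swapped):
            found.add(swapped)
    return sorted(found)


words = ["india", "indianapolis", "blue", "bengaluru", "bangalore"]
forward_trie = build_trie(words)
reverse_trie = build_trie(w[::-1] for w in words)

print(candidates("indiia", forward_trie, reverse_trie))  # ['india', 'indianapolis']
print(candidates("xlue", forward_trie, reverse_trie))    # ['blue']
print(candidates("bleu", forward_trie, reverse_trie))    # ['bengaluru', 'blue']
```

The forward check alone finds nothing for "xlue" because the error sits in the first letter, while the reverse check recovers "blue" from the matching ending. Loose matches such as "bengaluru" for "bleu" are acceptable at this stage, since the candidates are ranked by distance afterwards.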

We measure the similarity between the misspelled word and each word in the correct word list with the Damerau-Levenshtein (DL) distance. Unlike plain Levenshtein distance, DL distance also counts a transposition of two adjacent letters as a single edit. The candidate spellings are ranked in increasing order of DL distance and returned as the spelling suggestions.
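
As an illustration of the ranking step, the sketch below implements the optimal string alignment variant of DL distance (the common restricted form, which may differ from the variant North uses) and sorts candidates by it. The function names `osa_distance` and `rank` are hypothetical, the lowercasing before comparison is an assumption, and `topn` simply mirrors the parameter name used in the example further down.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i letters of a
    for j in range(cols):
        d[0][j] = j  # insert all j letters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent letters counts as one edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def rank(misspelled, candidates, topn=2):
    """Return up to topn candidates, closest first (ties broken alphabetically)."""
    def key(candidate):
        # Assumption: compare case-insensitively
        return (osa_distance(misspelled.lower(), candidate.lower()), candidate)
    return tuple(sorted(candidates, key=key)[:topn])


print(osa_distance("bleu", "blue"))                            # 1
print(rank("Bengalure", ["Bangalore", "Bengaluru", "Blue"]))   # ('Bengaluru', 'Bangalore')
```

"Bengalure" is one substitution away from "Bengaluru" and two away from "Bangalore", so "Bengaluru" ranks first.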

North is a library you can import and use. Here is an example of importing and using North in your program.

from north import North

# Words the suggester treats as correctly spelled
correct_word_list = ["Hayagreeva","Rama","Red","Violet","Blue","Green","Hanuman","Sugreeva",
                     "Bangalore","Bengaluru","India","Indianapolis","New Delhi"]

# Misspelled inputs to look up
misspelled_word_list = ["Hanugreeva","hanman","Indiia","Bengalure","New Deli","Viola","Bloo"]

# Build the suggester from the correct word list
my_spelling_suggester = North(correct_word_list)

# Print at most two suggestions for each misspelled word
for misspelled_word in misspelled_word_list:
    print(misspelled_word, my_spelling_suggester(misspelled_word, topn=2))

Here are the spelling suggestions after running the program:

Hanugreeva ('Sugreeva', 'Hanuman')
hanman ('Hanuman',)
Indiia ('India', 'Indianapolis')
Bengalure ('Bengaluru', 'Bangalore')
New Deli ('New Delhi',)
Viola ('Violet', 'Rama')
Bloo ('Blue',)

Need spelling suggestions? Head North!