Spell Checking for African Language Systems by Moses E. Ekpenyong & Duru G. Chinanuekpere
The development of spell checkers for African language systems faces numerous challenges, including inconsistent orthographic systems, complex morphology, interactions with tone structure, and limited interest in the languages themselves. This contribution proposes a spell checker to advance local language resource-based research, teaching and learning. It adopts a machine-readable format, the Speech Assessment Methods Phonetic Alphabet (SAMPA), as a standard for dealing with the complex orthographic nature of our language systems, and relies on factors close to the text itself to ensure interoperability and portability across language domains and operating system platforms. The system architecture has three sections and was implemented using a hybrid methodology. Three levels of evaluation, namely the lexicon, error detection and correction, and similarity measures, were carried out to judge the performance of the proposed system. It was observed that the ratio of unique valid words to out-of-vocabulary (OOV) words was 13:9, which suggests that spell checking systems need a sufficient knowledge base or a very large, well-trained corpus. Also, about 10% of the words were most likely to be the correct replacements for the words in error, as these words had a similarity measure of at least 0.5. Although 0.6% of the words generated very few similarity measures, our algorithm could still perform optimally in the presence of sparse data.
Keywords: NLP, N-gram analysis, similarity measure, text processing
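To make the notion of a similarity measure with a 0.5 cut-off concrete, the following is a minimal sketch of N-gram based candidate ranking. It is an illustration only: the abstract does not state which similarity formula the system uses, so the Dice coefficient over character bigrams is an assumption, as are the function names (char_ngrams, dice_similarity, suggest), the "#" boundary padding, and the placeholder lexicon entries. Only the 0.5 threshold is taken from the text.

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, padded with '#' at both ends."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over character n-gram multisets (assumed measure)."""
    grams_a = Counter(char_ngrams(a, n))
    grams_b = Counter(char_ngrams(b, n))
    overlap = sum((grams_a & grams_b).values())  # shared n-grams
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * overlap / total if total else 0.0


def suggest(word, lexicon, threshold=0.5):
    """Return lexicon entries scoring at least `threshold`, best match first."""
    scored = [(dice_similarity(word, entry), entry) for entry in lexicon]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entry for score, entry in scored if score >= threshold]


if __name__ == "__main__":
    # Placeholder strings, not real lexicon entries from the system.
    lexicon = ["abc", "abcd", "xyz"]
    print(suggest("abd", lexicon))
```

Under this sketch, a misspelled token is compared against every lexicon entry, and only entries whose score reaches the threshold are offered as replacements. A real system would transcribe the text to SAMPA before comparison and would likely restrict the candidate set rather than scanning the whole lexicon.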