Title |
A Framework for Spelling Correction in Persian Language Using Noisy Channel Model |
Authors |
Mohammad Hoseyn Sheykholeslam, Behrouz Minaei-Bidgoli and Hossein Juzi |
Abstract |
Several methods have been proposed for spelling correction in Farsi (Persian). Unfortunately, no powerful framework has been implemented, because no large Farsi training set has been available to build an accurate model. We obtained a training set of erroneous strings paired with their corrections from a large number of books, each of which was typed twice at the Computer Research Center of Islamic Sciences. We trained our error model on this large set. In the testing phase, after finding erroneous words in a sample text, our program proposes candidate corrections. This paper focuses on the method for ranking those candidates, a customized version of Noisy Channel Spelling Correction for Farsi. The ranking method attempts to find the intended correction c for a typo t that maximizes P(c) P(t | c). Different methods are also described and analyzed to give a broad overview of the field. Our evaluation results show that the Noisy Channel Model, using our corpus and training set within this framework, works more accurately and efficiently than the other methods. |
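The ranking rule stated in the abstract, choosing the correction c that maximizes P(c) P(t | c) for a typo t, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `rank_candidates`, the toy word counts, and the toy channel probabilities are all hypothetical, and Latin-script words stand in for Farsi only to keep the example short.

```python
def rank_candidates(typo, candidates, word_counts, channel_probs):
    """Rank candidate corrections c for a typo t by P(c) * P(t | c).

    word_counts   : dict mapping word -> corpus frequency (hypothetical language model)
    channel_probs : dict mapping (typo, candidate) -> P(t | c) (hypothetical error model)
    """
    total = sum(word_counts.values())
    scored = []
    for c in candidates:
        # P(c): relative frequency of the candidate in the corpus
        prior = word_counts.get(c, 0) / total if total else 0.0
        # P(t | c): probability that c was mistyped as the observed typo
        likelihood = channel_probs.get((typo, c), 0.0)
        scored.append((c, prior * likelihood))
    # Highest-scoring candidate first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented purely for illustration
word_counts = {"their": 60, "there": 30, "three": 10}
channel_probs = {
    ("thier", "their"): 0.5,
    ("thier", "there"): 0.1,
    ("thier", "three"): 0.05,
}

ranked = rank_candidates("thier", ["there", "three", "their"], word_counts, channel_probs)
best_correction = ranked[0][0]
```

In a real system the prior would come from a corpus language model and the channel probabilities from an error model trained on typo/correction pairs such as the double-typed books described above. The dictionaries here only stand in for those components.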
Topics |
Tools, systems, applications, Lexicon, lexical database, Text mining |
Full paper |
A Framework for Spelling Correction in Persian Language Using Noisy Channel Model |
Bibtex |
@InProceedings{SHEYKHOLESLAM12.384,
  author = {Mohammad Hoseyn Sheykholeslam and Behrouz Minaei-Bidgoli and Hossein Juzi},
  title = {A Framework for Spelling Correction in Persian Language Using Noisy Channel Model},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |