Summary of the paper

Title All, and only, the Errors: more Complete and Consistent Spelling and OCR-Error Correction Evaluation
Authors Martin Reynaert
Abstract Some time in the future, some spelling error correction system will correct all the errors, and only the errors. We need evaluation metrics that will tell us when this has been achieved and that can help guide us there. We survey the current practice in the form of the evaluation scheme of the latest major publication on spelling correction in a leading journal. We are forced to conclude that while the metric used there can tell us exactly when the ultimate goal of spelling correction research has been achieved, it offers little in the way of directions to be followed to eventually get there. We propose to consistently use the well-known metrics Recall and Precision, as combined in the F score, on 5 possible levels of measurement that should guide us more informedly along that path. We describe briefly what is then measured or measurable at these levels and propose a framework that should allow for concisely stating what it is one performs in one’s evaluations. We finally contrast our preferred metrics to Accuracy, which is widely used in this field to this day and to the Area-Under-the-Curve, which is increasingly finding acceptance in other fields.
Language Language-independent
Topics Authoring tools, proofing, Evaluation methodologies, Standards for LRs
Bibtex @InProceedings{REYNAERT08.477,
  author = {Martin Reynaert},
  title = {All, and only, the Errors: more Complete and Consistent Spelling and OCR-Error Correction Evaluation},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }
