| Title | Evaluating Lemmatization Models for Machine-Assisted Corpus-Dictionary Linkage | 
  
  | Authors | Kevin Black, Eric Ringger, Paul Felt, Kevin Seppi, Kristian Heal and Deryle Lonsdale | 
  
  | Abstract | The task of corpus-dictionary linkage (CDL) is to annotate each word in a corpus with a link to an appropriate dictionary entry that documents the sense and usage of the word. Corpus-dictionary linked resources include concordances, dictionaries with word usage examples, and corpora annotated with lemmas or word senses. Such CDL resources are essential for language learning, linguistic research, translation, and philology. Lemmatization is a common approximation to automated corpus-dictionary linkage, in which lemmas are treated as dictionary entry headwords. We intend to use data-driven lemmatization models to provide machine assistance to human annotators in the form of pre-annotations, and thereby reduce the cost of CDL annotation. In this work we adapt the discriminative string transducer DirecTL+ to perform lemmatization for classical Syriac, a low-resource language, and compare its accuracy with that of the Morfette discriminative lemmatizer. DirecTL+ achieves 96.92% overall accuracy, but its margin over Morfette is only 0.86%, and it takes longer to train. Error analysis of both models provides guidance on how to apply them in a machine-assistance setting for corpus-dictionary linkage. | 
  
  | Topics | Collaborative Resource Construction, Linked Data | 
  
  | Bibtex | @InProceedings{BLACK14.1203, author =  {Kevin Black and Eric Ringger and Paul Felt and Kevin Seppi and Kristian Heal and Deryle Lonsdale},
 title =  {Evaluating Lemmatization Models for Machine-Assisted Corpus-Dictionary Linkage},
 booktitle =  {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
 year =  {2014},
 month =  {may},
 date =  {26-31},
 address =  {Reykjavik, Iceland},
 editor =  {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
 publisher =  {European Language Resources Association (ELRA)},
 isbn =  {978-2-9517408-8-4},
 language =  {english}
 }
 |