Title |
Expanding the Lexicon for a Resource-Poor Language Using a Morphological Analyzer and a Web Crawler |
Authors |
Michael Gasser |
Abstract |
Resource-poor languages may suffer from a lack of any of the basic resources that are fundamental to computational linguistics, including an adequate digital lexicon. Given the relatively small corpus of texts that exists for such languages, extending the lexicon presents a challenge. Languages with complex morphology present a special case, however, because individual words in these languages provide a great deal of information about the grammatical properties of the roots that they are based on. Given a morphological analyzer, it is even possible to extract novel roots from words. In this paper, we look at the case of Tigrinya, a Semitic language with limited lexical resources for which a morphological analyzer is available. It is shown that this analyzer, applied to a list of more than 200,000 Tigrinya words extracted by a web crawler, can extend the lexicon in two ways: by adding new roots and by inferring some of the derivational constraints that apply to known roots. |
Topics |
Lexicon, lexical database, Morphology, Grammar and Syntax |
Full paper |
Expanding the Lexicon for a Resource-Poor Language Using a Morphological Analyzer and a Web Crawler |
Bibtex |
@InProceedings{GASSER10.926,
  author = {Michael Gasser},
  title = {Expanding the Lexicon for a Resource-Poor Language Using a Morphological Analyzer and a Web Crawler},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |