Title |
How Specialized are Specialized Corpora? Behavioral Evaluation of Corpus Representativeness for Maltese. |
Authors |
Jerid Francom, Amy LaCross and Adam Ussishkin |
Abstract |
In this paper we bring to light a novel intersection between corpus linguistics and behavioral data that can be employed as an evaluation metric for resources for low-density languages, drawing on well-established psycholinguistic factors. Using the low-density language Maltese as a test case, we highlight the challenges that face researchers developing resources for languages with sparsely available data and identify a key empirical link between corpus and psycholinguistic research as a tool to evaluate corpus resources. Specifically, we compare two robust variables identified in the psycholinguistic literature: word frequency (as measured in a corpus) and word familiarity (as measured in a rating task). We then apply statistical methods to evaluate the extent to which familiarity ratings predict corpus frequency for verbs in the Maltese corpus from three angles: 1) token frequency, 2) frequency distributions and 3) morpho-syntactic type (binyan). This research provides a multidisciplinary approach to corpus development and evaluation, in particular for less-resourced languages that lack wide access to diverse language data. |
Topics |
Validation of LRs, Cognitive methods, Corpus (creation, annotation, etc.) |
Bibtex |
@InProceedings{FRANCOM10.666,
  author = {Jerid Francom and Amy LaCross and Adam Ussishkin},
  title = {How Specialized are Specialized Corpora? Behavioral Evaluation of Corpus Representativeness for Maltese.},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |