Title
Creating and Curating a Cross-Language Person-Entity Linking Collection
Authors
Dawn Lawrie, James Mayfield, Paul McNamee and Douglas Oard
Abstract
To stimulate research in cross-language entity linking, we present a new test collection for evaluating cross-language entity linking accuracy in twenty-one languages. This paper describes an efficient way to create and curate such a collection, judiciously exploiting existing language resources. Queries are created in three steps: person names are semi-automatically identified on the English side of a parallel corpus; judgments obtained through crowdsourcing identify the entity each name refers to; and the English name is projected onto the corresponding non-English document using word alignments. The name projections are then curated, again through crowdsourcing. This technique resulted in the first publicly available multilingual cross-language entity linking collection. The collection includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.
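The name-projection step described in the abstract can be illustrated with a minimal sketch. It assumes word alignments are available as (English index, foreign index) pairs in the Pharaoh-style "i-j" text format, and it takes the projected name to be the smallest contiguous foreign span covering every token aligned to the English name. The function names `project_span` and `parse_pharaoh` and the example sentence pair are hypothetical; this is not the authors' implementation.

```python
from typing import Iterable, Optional


def parse_pharaoh(line: str) -> list[tuple[int, int]]:
    """Parse alignments written as space-separated 'i-j' pairs (assumed format)."""
    pairs = []
    for token in line.split():
        e, f = token.split("-")
        pairs.append((int(e), int(f)))
    return pairs


def project_span(
    en_start: int,
    en_end: int,
    alignment: Iterable[tuple[int, int]],
) -> Optional[tuple[int, int]]:
    """Project the English token span [en_start, en_end) onto the foreign side.

    Assumption: the projection is the smallest contiguous foreign span that
    covers all foreign tokens aligned to the name. Returns None when no token
    of the name is aligned, i.e. the projection fails.
    """
    targets = [f for e, f in alignment if en_start <= e < en_end]
    if not targets:
        return None
    return min(targets), max(targets) + 1


if __name__ == "__main__":
    # Hypothetical English/Spanish sentence pair and alignment.
    english = "yesterday Barack Obama visited Paris".split()
    foreign = "ayer visitó Barack Obama París".split()
    links = parse_pharaoh("0-0 1-2 2-3 3-1 4-4")
    span = project_span(1, 3, links)  # English tokens 1..2: "Barack Obama"
    if span is not None:
        print(" ".join(foreign[span[0]:span[1]]))  # prints "Barack Obama"
```

Taking the covering span tolerates reordering inside a multi-token name, but it can also absorb unaligned or misaligned intervening words, which is one reason a curation pass over the projections, such as the crowdsourced one the abstract describes, is worthwhile.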
Topics
Corpus (creation, annotation, etc.), Person Identification, Information Extraction, Information Retrieval
BibTeX
@InProceedings{LAWRIE12.655,
  author    = {Dawn Lawrie and James Mayfield and Paul McNamee and Douglas Oard},
  title     = {Creating and Curating a Cross-Language Person-Entity Linking Collection},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year      = {2012},
  month     = {may},
  date      = {23-25},
  address   = {Istanbul, Turkey},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-7-7},
  language  = {english}
}