Title
A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons
Authors
Satoshi Sato and Sayoko Kaide
Abstract
This paper proposes a simple and fast person-name filter, which plays an important role in the automatic compilation of a large bilingual person-name lexicon. The filter is based on pn_score, which is the sum of two component scores: the score of the first name and that of the last name. Each component score is calculated from two term sets: one is a dense set, most of whose members are person names; the other is a baseline set that contains fewer person names. The pn_score takes one of five values, {+2, +1, 0, -1, -2}, which correspond to strong positive, positive, undecidable, negative, and strong negative, respectively. The pn_score is easily extended to a bilingual pn_score that takes one of nine values by summing the scores of the two languages. Experimental results show that our method works well for monolingual person names in English and Japanese, with F-scores of 0.929 and 0.939, respectively. The bilingual person-name filter performs better still, with an F-score of 0.955.
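The following is a minimal sketch of how the scores described in the abstract combine: two component scores summed into one of five monolingual values, and two monolingual scores summed into one of nine bilingual values. The abstract does not say how a component score is derived from the dense and baseline sets, so the rule used here (+1 if the name part occurs in the dense set, -1 if it occurs only in the baseline set, 0 otherwise) is an assumption. The function names, the toy term sets, and the acceptance threshold in `keep_pair` are likewise hypothetical and are not taken from the paper.

```python
from typing import AbstractSet

# The five monolingual pn_score values and their meanings, as listed in the abstract.
LABELS = {
    2: "strong positive",
    1: "positive",
    0: "undecidable",
    -1: "negative",
    -2: "strong negative",
}


def component_score(term: str, dense: AbstractSet[str], baseline: AbstractSet[str]) -> int:
    """Hypothetical score for one name part (first or last name).

    ASSUMPTION: +1 if the term occurs in the dense set, -1 if it occurs
    only in the baseline set, 0 if it occurs in neither set.
    """
    if term in dense:
        return 1
    if term in baseline:
        return -1
    return 0


def pn_score(first: str, last: str, dense: AbstractSet[str], baseline: AbstractSet[str]) -> int:
    """Monolingual pn_score: first-name score plus last-name score, in -2..+2."""
    return component_score(first, dense, baseline) + component_score(last, dense, baseline)


def bilingual_pn_score(score_en: int, score_ja: int) -> int:
    """Bilingual pn_score: sum of the two monolingual scores, in -4..+4."""
    return score_en + score_ja


def keep_pair(score: int, threshold: int = 1) -> bool:
    """Hypothetical filter decision: keep a candidate if its score reaches the threshold."""
    return score >= threshold


# Toy term sets invented for illustration; not the paper's data.
EN_DENSE = {"John", "Smith", "Mary"}
EN_BASELINE = {"John", "Apple", "Street"}
JA_DENSE = {"ジョン", "スミス", "メアリー"}
JA_BASELINE = {"アップル", "ストリート"}

if __name__ == "__main__":
    en = pn_score("John", "Smith", EN_DENSE, EN_BASELINE)
    ja = pn_score("ジョン", "スミス", JA_DENSE, JA_BASELINE)
    both = bilingual_pn_score(en, ja)
    print(en, LABELS[en])         # 2 strong positive
    print(both, keep_pair(both))  # 4 True

    mixed = pn_score("Mary", "Apple", EN_DENSE, EN_BASELINE)
    print(mixed, LABELS[mixed])   # 0 undecidable
```

Because each component score is assumed to lie in {-1, 0, +1}, the monolingual sum can only take the five values listed in the abstract, and summing two such scores yields exactly the nine integers from -4 to +4.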
Topics
Tools, systems, applications, Lexicon, lexical database, Named Entity recognition
Bibtex
@InProceedings{SATO10.343,
  author = {Satoshi Sato and Sayoko Kaide},
  title = {A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}