Title
Enrichment of Bilingual Dictionary through News Stream Data
Authors
Ajay Dubey, Parth Gupta, Vasudeva Varma and Paolo Rosso
Abstract
Bilingual dictionaries are a key component of cross-lingual similarity estimation methods. Such dictionaries are usually built either manually or automatically. Automatic approaches exploit parallel or comparable data to derive dictionary entries, and they require large amounts of bilingual data to produce a good-quality dictionary. Often, however, a language pair lacks large comparable corpora, and in such cases the best automatically generated dictionary is upper-bounded by the quality and coverage of the corpora that are available. In this work, we propose a method that exploits continuous quasi-comparable corpora to derive term-level associations for enriching such a limited dictionary. Although our experiments are on English and Hindi, the approach can easily be extended to other languages. We evaluated the dictionary by manually computing its precision. Our experiments show that the approach is able to derive interesting term-level associations across languages.
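The abstract does not describe how the term-level associations are computed. The following Python sketch is for illustration only. It shows one common, generic way to propose new entries from comparable text when only a small seed dictionary exists: build co-occurrence context vectors in each language, map the source-side vectors into the target vocabulary through the seed dictionary, and rank untranslated target terms by cosine similarity. This is a swapped-in textbook technique, not the authors' method. The function names (`context_vectors`, `project`, `cosine`, `propose_entries`), the window size, and the romanized toy Hindi tokens are all hypothetical. Treating same-day news documents as the comparable input is also an assumption.

```python
from collections import Counter
from math import sqrt

# Hypothetical sketch: generic seed-dictionary + context-vector lexicon
# induction, NOT the method of Dubey et al. (2014). Inputs are assumed to be
# tokenized, roughly time-aligned news documents (e.g. from the same day).


def context_vectors(docs, window=2):
    """Count how often each term co-occurs with neighbours within `window` tokens."""
    vectors = {}
    for tokens in docs:
        for i, term in enumerate(tokens):
            vec = vectors.setdefault(term, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vectors


def project(vec, seed):
    """Map a source-language context vector into target words via the seed dictionary."""
    out = Counter()
    for word, count in vec.items():
        for translation in seed.get(word, ()):
            out[translation] += count
    return out


def cosine(a, b):
    """Cosine similarity of two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def propose_entries(src_docs, tgt_docs, seed, top_k=1, window=2):
    """Propose target translations for source terms the seed dictionary lacks."""
    src_vecs = context_vectors(src_docs, window)
    tgt_vecs = context_vectors(tgt_docs, window)
    # Target words already used as seed translations are not offered again.
    known_targets = {t for translations in seed.values() for t in translations}
    proposals = {}
    for term, vec in src_vecs.items():
        if term in seed:
            continue  # already covered by the seed dictionary
        projected = project(vec, seed)
        scored = sorted(
            ((cosine(projected, tvec), cand)
             for cand, tvec in tgt_vecs.items() if cand not in known_targets),
            reverse=True,
        )
        proposals[term] = [cand for score, cand in scored[:top_k] if score > 0]
    return proposals


if __name__ == "__main__":
    # Toy English and romanized-Hindi "same-day" documents (invented for illustration).
    english = [["election", "results", "announced", "today"],
               ["cricket", "match", "today"]]
    hindi = [["chunav", "natije", "ghoshit", "aaj"],
             ["kriket", "maich", "aaj"]]
    seed = {"results": ["natije"], "announced": ["ghoshit"],
            "today": ["aaj"], "match": ["maich"]}
    print(propose_entries(english, hindi, seed))
    # {'election': ['chunav'], 'cricket': ['kriket']}
```

The sketch proposes translations only for source terms missing from the seed dictionary and only among target terms not already used as seed translations. This matches the goal of enriching a limited dictionary rather than re-deriving it. Under a manual evaluation like the one described in the abstract, precision would be the fraction of proposed pairs that human judges accept as correct.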
Topics
Information Extraction, Information Retrieval, Text Mining
Bibtex
@InProceedings{DUBEY14.1105,
  author = {Ajay Dubey and Parth Gupta and Vasudeva Varma and Paolo Rosso},
  title = {Enrichment of Bilingual Dictionary through News Stream Data},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
}