Title |
Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them |
Authors |
Bruno Laranjeira, Viviane Moreira, Aline Villavicencio, Carlos Ramisch and Maria José Finatto |
Abstract |
Comparable corpora have been used as an alternative to parallel corpora as resources for computational tasks that involve domain-specific natural language processing. One way to gather documents related to a specific topic of interest is to traverse a portion of the web graph in a targeted way, using focused crawling algorithms. In this paper, we compare several focused crawling algorithms by using them to collect comparable corpora in a specific domain. We then compare the evaluation of the focused crawling algorithms with the performance of linguistic processes trained on the corresponding generated corpora. We also propose a novel approach to focused crawling that exploits the expressive power of multiword expressions. |
Topics |
Evaluation Methodologies, Machine Translation, SpeechToSpeech Translation |
Full paper |
Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them |
Bibtex |
@InProceedings{LARANJEIRA14.1095,
  author = {Bruno Laranjeira and Viviane Moreira and Aline Villavicencio and Carlos Ramisch and Maria José Finatto},
  title = {Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
} |