| Title | Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them | 
  
  | Authors | Bruno Laranjeira, Viviane Moreira, Aline Villavicencio, Carlos Ramisch and Maria José Finatto | 
  
  | Abstract | Comparable corpora have been used as an alternative to parallel corpora as resources for computational tasks that involve domain-specific natural language processing. One way to gather documents related to a specific topic of interest is to traverse a portion of the web graph in a targeted way, using focused crawling algorithms. In this paper, we compare several focused crawling algorithms by using them to collect comparable corpora on a specific domain. We then compare the evaluation of the focused crawling algorithms with the performance of linguistic processes trained on the corresponding generated corpora. We also propose a novel approach to focused crawling that exploits the expressive power of multiword expressions. | 
  
  | Topics | Evaluation Methodologies, Machine Translation, Speech-to-Speech Translation | 
  
  | Full paper  | Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them | 
  
  | Bibtex | @InProceedings{LARANJEIRA14.1095, author =  {Bruno Laranjeira and Viviane Moreira and Aline Villavicencio and Carlos Ramisch and Maria José Finatto},
 title =  {Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them},
 booktitle =  {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
 year =  {2014},
 month =  {may},
 date =  {26-31},
 address =  {Reykjavik, Iceland},
 editor =  {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
 publisher =  {European Language Resources Association (ELRA)},
 isbn =  {978-2-9517408-8-4},
 language =  {english}
 }
 |