LREC 2008 Proceedings

Summary of the paper

Title	Sensitivity of Automated MT Evaluation Metrics on Higher Quality MT Output: BLEU vs Task-Based Evaluation Methods
Authors	Bogdan Babych and Anthony Hartley
Abstract	We report the results of our experiment on assessing the ability of automated MT evaluation metrics to remain sensitive to variations in MT quality as the average quality of the compared systems goes up. We compare two groups of metrics: those, which measure the proximity of MT output to some reference translation, and those which evaluate the performance of some automated process on degraded MT output. The experiment shows that proximity-based metrics (such as BLEU) loose sensitivity as the scores go up, but performance-based metrics (e.g., Named Entity recognition from MT output) remain sensitive across the scale. We suggest a model for explaining this result, which attributes stable sensitivity of performance-based metrics to measuring cumulative functional effect of different language levels, while proximity-based metrics measure structural matches on a lexical level and therefore miss higher-level errors that are more typical for better MT systems. Development of new automated metrics should take into account possible decline in sensitivity on higher-quality MT, which should be tested as part of meta-evaluation of the metrics.
Language	Language-independent
Topics	Machine Translation, SpeechToSpeech Translation, Evaluation methodologies
Full paper	Sensitivity of Automated MT Evaluation Metrics on Higher Quality MT Output: BLEU vs Task-Based Evaluation Methods
Slides	Sensitivity of Automated MT Evaluation Metrics on Higher Quality MT Output: BLEU vs Task-Based Evaluation Methods
Bibtex	@InProceedings{BABYCH08.542, author = {Bogdan Babych and Anthony Hartley}, title = {Sensitivity of Automated MT Evaluation Metrics on Higher Quality MT Output: BLEU vs Task-Based Evaluation Methods}, booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)}, year = {2008}, month = {may}, date = {28-30}, address = {Marrakech, Morocco}, editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-4-0}, note = {http://www.lrec-conf.org/proceedings/lrec2008/}, language = {english} }