Title | Comparison of some automatic and manual methods for summary evaluation based on the Text Summarization Challenge 2
Author(s) | Hidetsugu Nanba (1), Manabu Okumura (2); (1) Hiroshima City University, (2) Tokyo Institute of Technology
Session | O29-EMSW
Abstract | In this paper, we compare several automatic and manual methods for summary evaluation. An essential point in evaluating summaries is how well the evaluation measure recognizes slight differences in the quality of computer-produced summaries. From this viewpoint, we examined 'evaluation by revision' using the data of the Text Summarization Challenge 2 (TSC2). Evaluation by revision is a manual method first used in TSC2, and its effectiveness had not yet been tested. First, we compared evaluation by revision with ranking evaluation, a manual method used in both TSC1 and TSC2, by examining the gaps in edit distance from 0 to 1 at intervals of 0.1. To investigate the effectiveness of evaluation by revision, we also tested three automatic methods, content-based evaluation, BLEU, and RED, and compared their results with those of evaluation by revision for reference. As a result, we found that evaluation by revision is effective in recognizing slight differences between computer-produced summaries. Second, we assessed content-based evaluation, BLEU, and RED against evaluation by revision, and compared the effectiveness of the three automatic methods. We found that RED is superior to the other two in some of the examinations.
Keyword(s) | automatic summarization, automatic and manual evaluation, Text Summarization Challenge 2
Language(s) | Japanese |
Full Paper |
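
The 'evaluation by revision' described in the abstract scores a system summary by how much editing is needed to make it acceptable, expressed as an edit distance normalized to the range 0 to 1. Below is a minimal sketch of that kind of measure, assuming word-level Levenshtein operations (insertion, deletion, replacement) normalized by the length of the longer text; the actual TSC2 revision protocol and its normalization may differ.

```python
# Sketch of a normalized edit distance in [0, 1], as a rough analogue of
# the score range the abstract refers to. Word-level operations and
# max-length normalization are assumptions, not the paper's definition.

def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (insert/delete/replace, cost 1 each)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all remaining words of a
    for j in range(n + 1):
        dp[0][j] = j  # insert all remaining words of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # replacement (or match)
    return dp[m][n]

def normalized_edit_distance(system: str, revised: str) -> float:
    """0 = no revision needed, 1 = completely rewritten."""
    s, r = system.split(), revised.split()
    longer = max(len(s), len(r))
    if longer == 0:
        return 0.0
    return edit_distance(s, r) / longer

print(normalized_edit_distance(
    "the cat sat on the mat",
    "the cat was sitting on a mat"))  # 3 edits / 7 words = ~0.43
```

Comparing such scores at 0.1-wide gaps, as the abstract describes, amounts to asking whether a measure can reliably separate summary pairs whose normalized revision costs differ by a given margin.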