Hansard transcripts provide access to the opinions of MPs on many important issues, but they are difficult for people to process effectively. Existing corpora for sentiment analysis of Hansard debates use speakers' votes as sentiment polarity labels, but these votes are known to be constrained by speakers' party affiliations. Over two rounds of manual annotation, we develop an annotation scheme and create a novel corpus for evaluating automatic sentiment analysis systems, in which each speech carries both an automatically applied and a manually applied sentiment polarity label. Having observed that the sentiment polarity of a debate motion (proposal) affects the sentiment of the speeches made in response to it, we also apply sentiment labels to the debate motions. We find that human annotators reach high agreement in identifying sentiment polarity in these debates, and that the manually applied and automatically retrieved labels differ somewhat, suggesting that speech content does not always reflect the voting behaviour of Members of Parliament.
@InProceedings{ABERCROMBIE18.9,
  author    = {Gavin Abercrombie and Riza Theresa Batista-Navarro},
  title     = {A Sentiment-labelled Corpus of Hansard Parliamentary Debate Speeches},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year      = {2018},
  month     = {may},
  date      = {7-12},
  location  = {Miyazaki, Japan},
  editor    = {Darja Fišer and Maria Eskevich and Franciska de Jong},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Paris, France},
  isbn      = {979-10-95546-02-3},
  language  = {english}
}