W25 2018 Proceedings

Summary of the paper

Title	A 20% Jump in Duplicate Question Detection Accuracy? Replicating IBM team’s experiment and finding problems in its data preparation
Authors	João Silva, João António Rodrigues, Vladislav Maraev, Chakaveh Saedi and António Branco
Abstract	Validation of experimental results through their replication is central to the scientific process. In the current paper we report on our efforts to replicate the central result in the Bogdanova et al. (2015) paper, Detecting Semantically Equivalent Questions in Online User Forums, which achieved results far surpassing the state-of-the-art for the task of duplicate question detection, and how that effort allowed us to find a flaw in data preprocessing in the original paper that casts doubt on the validity of the results reported there.
Full paper	A 20% Jump in Duplicate Question Detection Accuracy? Replicating IBM team’s experiment and finding problems in its data preparation
Bibtex	@InProceedings{SILVA18.7, author = {João Silva and João António Rodrigues and Vladislav Maraev and Chakaveh Saedi and António Branco}, title = {A 20\% Jump in Duplicate Question Detection Accuracy? Replicating IBM team’s experiment and finding problems in its data preparation}, booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, year = {2018}, month = {may}, date = {7-12}, location = {Miyazaki, Japan}, editor = {António Branco and Nicoletta Calzolari and Khalid Choukri}, publisher = {European Language Resources Association (ELRA)}, address = {Paris, France}, isbn = {979-10-95546-21-4}, language = {english} }