Summary of the paper

Title Multilingual Semantic Networks for Data-driven Interlingua Seq2Seq Systems
Authors Cristina España-Bonet and Josef Van Genabith
Abstract Neural machine translation (NMT) systems are state-of-the-art for most language pairs even though they are relatively recent, and their youth suggests there is still room for further improvement. Here, we explore whether, and if so to what extent, semantic networks can help improve NMT. In particular, we (i) study the contribution of the nodes of the semantic network, synsets, as factors in multilingual neural translation engines. We show that they improve a state-of-the-art baseline and that they facilitate translation from languages that have not been seen at all in training (beyond zero-shot translation). Taking this idea to an extreme, we (ii) use synsets as the basic unit to encode the input and turn the source language into a data-driven interlingua. This transformation boosts the performance of the neural system for unseen languages, achieving improvements of 4.85 and 8.24 BLEU points for fr2en and es2en, respectively, when no French or Spanish corpora have been used in training. In (i), the improvement arises because cross-language synsets help cluster words from different languages by meaning and map the unknown words of a new language into these multilingual clusters. In (ii), it arises because, with the data-driven interlingua, no language is unknown as long as it is covered by the semantic network. However, non-content words are not represented in the semantic network, and a higher level of abstraction is still needed to go a step further and, for example, train these systems with monolingual corpora only.
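The two ways the abstract uses synsets, (i) as an extra input factor and (ii) as a replacement for the words themselves, can be illustrated on a toy example. This is a minimal sketch, not the authors' pipeline: the lexicon and its synset IDs (`SYN_CAT`, `SYN_SLEEP`) are hypothetical stand-ins for BabelNet lookups, and the `word|factor` token format, the `-` placeholder for uncovered words, and keeping uncovered (non-content) words verbatim in setting (ii) are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: (language, word) -> synset ID.
# The IDs are placeholders standing in for BabelNet synsets, not real BabelNet IDs.
TOY_LEXICON: Dict[Tuple[str, str], str] = {
    ("en", "cat"): "SYN_CAT",
    ("fr", "chat"): "SYN_CAT",
    ("es", "gato"): "SYN_CAT",
    ("en", "sleeps"): "SYN_SLEEP",
    ("fr", "dort"): "SYN_SLEEP",
    ("es", "duerme"): "SYN_SLEEP",
}

# Assumed placeholder factor for words the network does not cover.
NO_SYNSET = "-"


def lookup(lang: str, word: str) -> Optional[str]:
    """Return the synset ID of a word, or None if the toy lexicon lacks it."""
    return TOY_LEXICON.get((lang, word.lower()))


def factored_input(lang: str, tokens: List[str]) -> str:
    """Setting (i): keep every word and attach its synset as an extra factor."""
    return " ".join(f"{tok}|{lookup(lang, tok) or NO_SYNSET}" for tok in tokens)


def interlingua_input(lang: str, tokens: List[str]) -> str:
    """Setting (ii): replace covered words by their synset; keep the rest verbatim."""
    return " ".join(lookup(lang, tok) or tok for tok in tokens)


if __name__ == "__main__":
    print(factored_input("fr", ["le", "chat", "dort"]))      # le|- chat|SYN_CAT dort|SYN_SLEEP
    print(interlingua_input("fr", ["le", "chat", "dort"]))   # le SYN_CAT SYN_SLEEP
    print(interlingua_input("es", ["el", "gato", "duerme"]))  # el SYN_CAT SYN_SLEEP
```

In setting (ii) the French and Spanish content words collapse to the same synset sequence, which is why a language never seen in training can still be read if the semantic network covers it. The articles `le` and `el` still differ, mirroring the abstract's point that non-content words are not represented in the network.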
Topics Multilingual Neural Machine Translation, BabelNet, Semantic Networks
Full paper Multilingual Semantic Networks for Data-driven Interlingua Seq2Seq Systems
BibTeX @InProceedings{ESPAÑA-BONET18.8,
  author = {Cristina España-Bonet and Josef Van Genabith},
  title = {Multilingual Semantic Networks for Data-driven Interlingua Seq2Seq Systems},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Jinhua Du and Mihael Arcan and Qun Liu and Hitoshi Isahara},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-15-3},
  language = {english}
  }
Powered by ELDA © 2018 ELDA/ELRA