Summary of the paper

Title Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French
Authors Jean-Philippe Goldman, Yves Scherrer, Julie Glikman, Mathieu Avanzi, Christophe Benzitoun and Philippe Boula de Mareüil
Abstract We present the crowdsourcing platform Donnez Votre Français à la Science (DFS, or “Give your French to Science”), which aims to collect linguistic data and document language use, with a special focus on regional variation in European French. The activities not only gather data that is useful for scientific studies, but they also provide feedback to the general public; this is important in order to reward participants, to encourage them to follow future surveys, and to foster interaction with the scientific community. The two main activities described here are 1) a linguistic survey on lexical variation with immediate feedback and 2) a speaker geolocalisation system; i.e., a quiz that guesses the linguistic origin of the participant by comparing their answers with previously gathered linguistic data. For the geolocalisation activity, we set up a simulation framework to optimise predictions. Three classification algorithms are compared: the first one uses clustering and shibboleth detection, whereas the other two rely on feature elimination techniques with Support Vector Machines and Maximum Entropy models as underlying base classifiers. The best-performing system uses a selection of 17 questions and reaches a localisation accuracy of 66%, extending the prediction from the one-best area (one among 109 base areas) to its first-order and second-order neighbouring areas.
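The abstract reports accuracy after extending the predicted area to its first-order and second-order neighbours, but it does not spell out how that evaluation is computed. The following is a minimal sketch of one plausible reading: a prediction counts as correct if the speaker's true area lies within a given number of hops of the predicted area in an adjacency graph. The function names, the adjacency dictionary, and the four-area toy map are all hypothetical illustrations, not taken from the paper.

```python
from collections import deque


def areas_within(adjacency, start, max_order):
    """Return the set of areas reachable from `start` in at most `max_order` hops.

    `adjacency` maps each area to the areas it shares a border with
    (an assumed representation; the paper's data format is not given).
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        area, dist = queue.popleft()
        if dist == max_order:
            continue  # do not expand beyond the requested neighbour order
        for neighbour in adjacency.get(area, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return seen


def extended_accuracy(adjacency, predicted, actual, max_order):
    """Share of predictions whose true area is within `max_order` hops."""
    hits = sum(
        true_area in areas_within(adjacency, pred_area, max_order)
        for pred_area, true_area in zip(predicted, actual)
    )
    return hits / len(predicted)


# Hypothetical toy map: four areas in a chain A - B - C - D.
TOY_ADJACENCY = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
predicted = ["A", "A", "A", "A"]
actual = ["A", "B", "C", "D"]

for order in (0, 1, 2):
    print(order, extended_accuracy(TOY_ADJACENCY, predicted, actual, order))
```

With `max_order=0` only exact matches count (the one-best area); orders 1 and 2 correspond to the first-order and second-order neighbour extensions mentioned in the abstract.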
Topics Crowdsourcing, Language Identification, Lexicon, Lexical Database
Bibtex
@InProceedings{GOLDMAN18.517,
  author = {Jean-Philippe Goldman and Yves Scherrer and Julie Glikman and Mathieu Avanzi and Christophe Benzitoun and Philippe Boula de Mareüil},
  title = "{Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
  }