LREC COLING 2024 Proceedings


The 17th Workshop on Building and Using Comparable Corpora (BUCC)



PROGRAM

Monday, 20 May, 2024

 9:00–10:30 Session 1
9:00–9:30 On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Alignment of Embeddings
Guillem Ramírez, Rumen Dangovski, Preslav Nakov and Marin Soljacic
9:30–10:00 Modeling Diachronic Change in English Scientific Writing over 300+ Years with Transformer-based Language Model Surprisal
Julius Steuer, Marie-Pauline Krielke, Stefan Fischer, Stefania Degaetano-Ortlieb, Marius Mosbach and Dietrich Klakow
10:00–10:30 PORTULAN ExtraGLUE Datasets and Models: Kick-starting a Benchmark for the Neural Processing of Portuguese
Tomás Freitas Osório, Bernardo Leite, Henrique Lopes Cardoso, Luís Gomes, João Rodrigues, Rodrigo Santos and António Branco
 10:30–11:00 Coffee break
 11:00–13:00 Session 2
11:00–12:00 Invited Talk: The Way Towards Massively Multilingual Language Models
François Yvon
12:00–12:30 Quality and Quantity of Machine Translation References for Automatic Metrics
Vilém Zouhar and Ondřej Bojar
12:30–13:00 Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long, ZhenHao Tang, Xianghua Fu, Jian Chen, Shilong Hou and Jinze Lyu
 13:00–14:00 Lunch break
 14:00–16:00 Session 3
14:00–14:30 Exploring the Potential of Large Language Models in Adaptive Machine Translation for Generic Text and Subtitles
Abdelhadi Soudi, Mohamed Hannani, Kristof Van Laerhoven and Eleftherios Avramidis
14:30–15:00 INCLURE: a Dataset and Toolkit for Inclusive French Translation
Paul Lerner and Cyril Grouin
15:00–15:30 BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation
Sourav Saha, Zeshan Ahmed Nobin, Mufassir Ahmad Chowdhury, Md. Shakirul Hasan Khan Mobin, Mohammad Ruhul Amin and Sudipta Kar
15:30–16:00 Booster presentations
poster authors
 16:00–16:30 Coffee break
 16:30–18:00 Poster session
 Creating Clustered Comparable Corpora from Wikipedia with Different Fuzziness Levels and Language Representativity
Anna Laskina, Eric Gaussier and Gaelle Calvary
 EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research
Marc Kupietz, Piotr Banski, Nils Diewald, Beata Trawinski and Andreas Witt
 Building Annotated Parallel Corpora Using the ATIS Dataset: Two UD-style treebanks in English and Turkish
Neslihan Cesur, Aslı Kuzgun, Mehmet Kose and Olcay Taner Yıldız
 Bootstrapping the Annotation of UD Learner Treebanks
Arianna Masciolini
 SweDiagnostics: A Diagnostics Natural Language Inference Dataset for Swedish
Felix Morger
 Multiple Discourse Relations in English TED Talks and Their Translation into Lithuanian, Portuguese and Turkish
Deniz Zeyrek, Giedrė Valūnaitė Oleškevičienė and Amalia Mendes
 mini-CIEP+ : A Shareable Parallel Corpus of Prose
Annemarie Verkerk and Luigi Talamo