LREC-COLING 2024 Proceedings

Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC)

ISBN: 978-2-493814-31-9
EAN: 9782493814319

List of Papers


On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Alignment of Embeddings
Guillem Ramírez, Rumen Dangovski, Preslav Nakov and Marin Soljacic
pp. 1‑11
Modeling Diachronic Change in English Scientific Writing over 300+ Years with Transformer-based Language Model Surprisal
Julius Steuer, Marie-Pauline Krielke, Stefan Fischer, Stefania Degaetano-Ortlieb, Marius Mosbach and Dietrich Klakow
pp. 12‑23
PORTULAN ExtraGLUE Datasets and Models: Kick-starting a Benchmark for the Neural Processing of Portuguese
Tomás Freitas Osório, Bernardo Leite, Henrique Lopes Cardoso, Luís Gomes, João Rodrigues, Rodrigo Santos and António Branco
pp. 24‑34
Invited Talk: The Way Towards Massively Multilingual Language Models
François Yvon
p. 35
Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long, ZhenHao Tang, Xianghua Fu, Jian Chen, Shilong Hou and Jinze Lyu
pp. 36‑50
Exploring the Potential of Large Language Models in Adaptive Machine Translation for Generic Text and Subtitles
Abdelhadi Soudi, Mohamed Hannani, Kristof Van Laerhoven and Eleftherios Avramidis
pp. 51‑58
INCLURE: a Dataset and Toolkit for Inclusive French Translation
Paul Lerner and Cyril Grouin
pp. 59‑68
BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation
Sourav Saha, Zeshan Ahmed Nobin, Mufassir Ahmad Chowdhury, Md. Shakirul Hasan Khan Mobin, Mohammad Ruhul Amin and Sudipta Kar
pp. 69‑84
Creating Clustered Comparable Corpora from Wikipedia with Different Fuzziness Levels and Language Representativity
Anna Laskina, Eric Gaussier and Gaelle Calvary
pp. 85‑93
EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research
Marc Kupietz, Piotr Banski, Nils Diewald, Beata Trawinski and Andreas Witt
pp. 94‑103
Building Annotated Parallel Corpora Using the ATIS Dataset: Two UD-style treebanks in English and Turkish
Neslihan Cesur, Aslı Kuzgun, Mehmet Kose and Olcay Taner Yıldız
pp. 104‑110
Bootstrapping the Annotation of UD Learner Treebanks
Arianna Masciolini
pp. 111‑117
SweDiagnostics: A Diagnostics Natural Language Inference Dataset for Swedish
Felix Morger
pp. 118‑124
Multiple Discourse Relations in English TED Talks and Their Translation into Lithuanian, Portuguese and Turkish
Deniz Zeyrek, Giedrė Valūnaitė Oleškevičienė and Amalia Mendes
pp. 125‑134
mini-CIEP+: A Shareable Parallel Corpus of Prose
Annemarie Verkerk and Luigi Talamo
pp. 135‑143

© 2024 ELRA