LREC COLING 2024 Proceedings

Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages

ISBN: 978-2-493814-29-6
EAN: 9782493814296

List of Papers







A Bit of a Problem: Measurement Disparities in Dataset Sizes across Languages
Catherine Arnett, Tyler A. Chang and Benjamin Bergen
pp. 1‑9
A Novel Corpus for Automated Sexism Identification on Social Media
Lutfiye Seda Mut Altin and Horacio Saggion
pp. 10‑15
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Rodrigo Santos, João Ricardo Silva, Luís Gomes, João Rodrigues and António Branco
pp. 16‑26
Assessing Pre-Built Speaker Recognition Models for Endangered Language Data
Gina-Anne Levow
pp. 27‑32
BERTbek: A Pretrained Language Model for Uzbek
Elmurod Kuriyozov, David Vilares and Carlos Gómez-Rodríguez
pp. 33‑44
Beyond Error Categories: A Contextual Approach of Evaluating Emerging Spell and Grammar Checkers
Þórunn Arnardóttir, Svanhvít Lilja Ingólfsdóttir, Haukur Barri Símonarson, Hafsteinn Einarsson, Anton Karl Ingason and Vilhjálmur Þorsteinsson
pp. 45‑52
Bidirectional English-Nepali Machine Translation (MT) System for Legal Domain
Shabdapurush Poudel, Bal Krishna Bal and Praveen Acharya
pp. 53‑58
BK3AT: Bangsamoro K-3 Children’s Speech Corpus for Developing Assessment Tools in the Bangsamoro Languages
Kiel D. Gonzales, Jazzmin R. Maranan, Francis Paolo D. Santelices, Edsel Jedd M. Renovalles, Nissan D. Macale, Nicole Anne A. Palafox and Jose Marie A. Mendoza
pp. 59‑65
CorpusArièja: Building an Annotated Corpus with Variation in Occitan
Clamenca Poujade, Myriam Bras and Assaf Urieli
pp. 66‑71
Developing Infrastructure for Low-Resource Language Corpus Building
Hedwig G. Sekeres, Wilbert Heeringa, Wietse de Vries, Oscar Yde Zwagers, Martijn Wieling and Goffe Th. Jensma
pp. 72‑78
Evaluating Icelandic Sentiment Analysis Models Trained on Translated Data
Ólafur A. Jóhannsson, Birkir H. Arndal, Eysteinn Ö. Jónsson, Stefan Olafsson and Hrafn Loftsson
pp. 79‑89
Exploring Text Classification for Enhancing Digital Game-Based Language Learning for Irish
Leona Mc Cahill, Thomas Baltazar, Sally Bruen, Liang Xu, Monica Ward, Elaine Uí Dhonnchadha and Jennifer Foster
pp. 90‑96
Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish
Fred Philippy, Shohreh Haddadan and Siwen Guo
pp. 97‑104
Fostering the Ecosystem of Open Neural Encoders for Portuguese with Albertina PT* Family
Rodrigo Santos, João Rodrigues, Luís Gomes, João Ricardo Silva, António Branco, Henrique Lopes Cardoso, Tomás Freitas Osório and Bernardo Leite
pp. 105‑114
Improving Language Coverage on HeLI-OTS
Tommi Jauhiainen and Krister Lindén
pp. 115‑125
Improving Legal Judgement Prediction in Romanian with Long Text Encoders
Mihai Masala, Traian Rebedea and Horia Velicu
pp. 126‑132
Improving Noisy Student Training for Low-resource Languages in End-to-End ASR Using CycleGAN and Inter-domain Losses
Chia-Yu Li and Ngoc Thang Vu
pp. 133‑142
Indonesian-English Code-Switching Speech Recognition Using the Machine Speech Chain Based Semi-Supervised Learning
Rais Vaza Man Tazakka, Dessi Lestari, Ayu Purwarianti, Dipta Tanaya, Kurniawati Azizah and Sakriani Sakti
pp. 143‑148
Inter-language Transfer Learning for Visual Speech Recognition toward Under-resourced Environments
Fumiya Kondo and Satoshi Tamura
pp. 149‑154
Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study
Wan-hua Her and Udo Kruschwitz
pp. 155‑167
Italian-Ligurian Machine Translation in Its Cultural Context
Christopher R. Haberland, Jean Maillard and Stefano Lusito
pp. 168‑176
Labadain-30k+: A Monolingual Tetun Document-Level Audited Dataset
Gabriel de Jesus and Sérgio Nunes
pp. 177‑188
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining
Nikola Ljubešić, Vít Suchomel, Peter Rupnik, Taja Kuzman and Rik van Noord
pp. 189‑203
Man or Machine: Evaluating Spelling Error Detection in Danish Newspaper Corpora
Eckhard Bick, Jonas Nygaard Blom, Marianne Rathje and Jørgen Schack
pp. 204‑211
Managing Fine-grained Metadata for Text Bases in Extremely Low Resource Languages: The Cases of Two Regional Languages of France
Marianne Vergez-Couret, Delphine Bernhard, Michael Nauge, Myriam Bras, Pablo Ruiz Fabo and Carole Werner
pp. 212‑221
Mixat: A Data Set of Bilingual Emirati-English Speech
Maryam Khalifa Al Ali and Hanan Aldarmaki
pp. 222‑226
Multi-dialectal ASR of Armenian from Naturalistic and Read Speech
Malajyan Arthur, Victoria Khurshudyan, Karen Avetisyan, Hossep Dolatian and Damien Nouvel
pp. 227‑236
Multilingual Self-supervised Visually Grounded Speech Models
Huynh Phuong Thanh Nguyen and Sakriani Sakti
pp. 237‑243
Nepal Script Text Recognition Using CRNN CTC Architecture
Swornim Nakarmi, Sarin Sthapit, Arya Shakya, Rajani Chulyadyo and Bal Krishna Bal
pp. 244‑251
NLP for Arbëresh: How an Endangered Language Learns to Write in the 21st Century
Giulio Cusenza and Çağrı Çöltekin
pp. 252‑256
PersianEmo: Enhancing Farsi-Dari Emotion Analysis with a Hybrid Transformer and Recurrent Neural Network Model
Mohammad Ali Hussiny, Mohammad Arif Payenda and Lilja Øvrelid
pp. 257‑263
Philippine Languages Database: A Multilingual Speech Corpora for Developing Systems for Low-Resource Languages
Rowena Cristina L. Guevara, Rhandley D. Cajote, Michael Gringo Angelo R. Bayona and Crisron Rudolf G. Lucas
pp. 264‑271
Prompting towards Alleviating Code-Switched Data Scarcity in Under-Resourced Languages with GPT as a Pivot
Michelle Terblanche, Kayode Olaleye and Vukosi Marivate
pp. 272‑282
Quantifying the Ethical Dilemma of Using Culturally Toxic Training Data in AI Tools for Indigenous Languages
Pedro Henrique Domingues, Claudio Santos Pinhanez, Paulo Cavalin and Julio Nogima
pp. 283‑293
Residual Dropout: A Simple Approach to Improve Transformer’s Data Efficiency
Carlos Escolano, Francesca De Luca Fornaciari and Maite Melero
pp. 294‑299
Resource Acquisition for Understudied Languages: Extracting Wordlists from Dictionaries for Computer-assisted Language Comparison
Frederic Blum, Johannes Englisch, Alba Hermida Rodriguez, Rik van Gijn and Johann-Mattis List
pp. 300‑306
Robust Guidance for Unsupervised Data Selection: Capturing Perplexing Named Entities for Domain-Specific Machine Translation
Seunghyun Ji, Hagai Raja Sinulingga and Darongsae Kwon
pp. 307‑317
Seeding Alignment between Language Technology and Indigenous Methodologies: A Decolonizing Framework for Endangered Language Revitalization
Craig John Carpenter, John lyon, Miles Thorogood and Jeannette C. Armstrong
pp. 318‑324
Solving Failure Modes in the Creation of Trustworthy Language Technologies
Gianna Leoni, Lee Steven, Tūreiti Keith, Keoni Mahelona, Peter-Lucas Jones and Suzanne Duncan
pp. 325‑330
Tandem Long-Short Duration-based Modeling for Automatic Speech Recognition
Dalai Mengke, Yan Meng and Peter Mihajlik
pp. 331‑336
TELP – Text Extraction with Linguistic Patterns
João Cordeiro, Purificação Moura Silvano, António Leal and Sebastião Pais
pp. 337‑344
The First Parallel Corpus and Neural Machine Translation Model of Western Armenian and English
Ari Nubar Boyacıoğlu and Jan Niehues
pp. 345‑356
Tracing Linguistic Heritage: Constructing a Somali-Italian Terminological Resource through Explorers’ Notebooks and Contemporary Corpus Analysis
Silvia Piccini, Giuliana Elizabeth Vilela Ruiz, Andrea Bellandi and Enrico Carniani
pp. 357‑362
Uncovering Social Changes of the Basque Speaking Twitter Community During COVID-19 Pandemic
Joseba Fernandez de Landa, Iker García-Ferrero, Ander Salaberria and Jon Ander Campos
pp. 363‑371
UniDive: A COST Action on Universality, Diversity and Idiosyncrasy in Language Technology
Agata Savary, Daniel Zeman, Verginica Barbu Mititelu, Anabela Barreiro, Olesea Caftanatov, Marie-Catherine de Marneffe, Kaja Dobrovoljc, Gülşen Eryiğit, Voula Giouli, Bruno Guillaume, Stella Markantonatou, Nurit Melnik, Joakim Nivre, Atul Kr. Ojha, Carlos Ramisch, Abigail Walsh, Beata Wójtowicz and Alina Wróblewska
pp. 372‑382
Unsupervised Outlier Detection for Language-Independent Text Quality Filtering
Jón Daðason and Hrafn Loftsson
pp. 383‑393
UzABSA: Aspect-Based Sentiment Analysis for the Uzbek Language
Sanatbek Gayratovich Matlatipov, Jaloliddin Rajabov, Elmurod Kuriyozov and Mersaid Aripov
pp. 394‑403
ViHealthNLI: A Dataset for Vietnamese Natural Language Inference in Healthcare
Huyen Nguyen, Quyen The Ngo, Thanh-Ha Do and Tuan-Anh Hoang
pp. 404‑409
Why the Unexpected? Dissecting the Political and Economic Bias in Persian Small and Large Language Models
Ehsan Barkhordar, Surendrabikram Thapa, Ashwarya Maratha and Usman Naseem
pp. 410‑420
Work in Progress: Text-to-speech on Edge Devices for Te Reo Māori and ‘Ōlelo Hawaiʻi
Tūreiti Keith
pp. 421‑426

© 2024 ELRA