Despite rapid progress in Natural Language Processing (NLP), the benefits of recent advances, especially large language models (LLMs), remain unevenly distributed. While high-resource languages like English, French, and Chinese have seen significant performance gains, low-resource languages continue to face substantial challenges across core NLP tasks such as machine translation, sentiment analysis, named entity recognition (NER), and part-of-speech tagging.
These disparities arise from a combination of factors: the scarcity of high-quality training data, limited linguistic resources, and a lack of community involvement in data collection and model development. As a result, many languages, particularly African, Indigenous, and minority languages, remain underrepresented in both academic research and deployed NLP systems.
LowResNLP is a workshop dedicated to addressing these challenges by fostering research, collaboration, and discussion around methods, resources, and evaluation practices designed for low-resource languages. We invite submissions that address the unique challenges and opportunities of working with these languages.
For any questions, please email the organizers at lowresnlp-2025-organizers@googlegroups.com.
Stay tuned for updates as we approach the workshop date!
We look forward to your participation in Varna!
Schedule (September 13, 2025)
08:45–09:00 — Arrival
09:00–10:00 — Keynote Speech, Jesujoba Oluwadara Alabi
10:00–10:30 — Low-Resource Machine Translation for Moroccan Arabic,
Alexei Rosca, Abderrahmane Issam and Gerasimos Spanakis
10:30–11:30 — Coffee break and poster session 1
11:30–12:00 — Building a Lightweight Classifier to Distinguish Closely Related Language Varieties with Limited Supervision: The Case of Catalan vs Valencian,
Raúl García Cerdá, María Miró Maestre and Miquel Canal
12:00–12:30 — Automatic Fact-checking in English and Telugu,
Ravi Kiran Chikkala, Tatiana Anikina, Natalia Skachkova, Ivan Vykopal, Rodrigo Agerri and Josef van Genabith
12:30–13:00 — Bridging the Gap: Leveraging Cherokee to Improve Language Identification for Endangered Iroquoian Languages,
Liam Enzo Eggleston, Michael P. Cacioli, Jatin Sarabu, Ivory Yang and Kevin Zhu
13:00–15:00 — Lunch break
15:00–15:30 — IfGPT: A Dataset in Bulgarian for Large Language Models,
Svetla Peneva Koeva, Ivelina Stoyanova and Jordan Konstantinov Kralev
15:30–16:00 — Modular Training of Deep Neural Networks for Text Classification in Guarani,
Jose Luis Vazquez, Carlos Ulises Valdez, Marvin Matías Agüero-Torales, Julio César Mello-Román, Jose Domingo Colbes and Sebastian Alberto Grillo
16:00–16:30 — Coffee break and poster session 2
16:30–17:00 — Slur and Emoji Aware Models for Hate and Sentiment Detection in Roman Urdu Transgender Discourse,
Muhammad Owais Raza, Aqsa Umar and Mehrub Awan
17:00–17:30 — A Multi-Task Learning Approach to Dialectal Arabic Identification and Translation to Modern Standard Arabic,
Abdullah Khered, Youcef Benkhedda and Riza Batista-Navarro
17:30–17:45 — Closing
This workshop was supported by the European Union under Horizon Europe projects GAIN: “Georgian Artificial Intelligence Networking and Twinning Initiative” (GA #101078950), DisAI: “Improving Scientific Excellence and Creativity in Combating Disinformation with Artificial Intelligence and Language Technologies” (GA #101079164), and LorAI: “Low Resource Artificial Intelligence” (GA #101136646), and by the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) through the project “The limits and future of data-driven approaches: A comparative study of deep learning, knowledge-based and rule-based models and methods in Natural Language Processing” (CIDEXG/2023/13).