Despite rapid progress in Natural Language Processing (NLP), the benefits of recent advances, especially large language models (LLMs), remain unevenly distributed. While high-resource languages like English, French, and Chinese have seen significant performance gains, low-resource languages continue to face substantial challenges across core NLP tasks such as machine translation, sentiment analysis, named entity recognition (NER), and part-of-speech tagging.

These disparities arise from a combination of factors: the scarcity of high-quality training data, limited linguistic resources, and a lack of community involvement in data collection and model development. As a result, many languages, particularly African, Indigenous, and minority languages, remain underrepresented in both academic research and deployed NLP systems.

LowResNLP is a workshop dedicated to addressing these challenges by fostering research, collaboration, and discussion around methods, resources, and evaluation practices designed for low-resource languages. It invites submissions that address the unique challenges and opportunities of working with these languages.

For any questions, please email lowresnlp-2025-organizers@googlegroups.com.

Stay tuned for updates as we approach the workshop date!

We look forward to your participation in Varna!

Schedule (September 13, 2025)

08:45–09:00 — Arrival

09:00–10:00 — Keynote Speech, Jesujoba Oluwadara Alabi

10:00–10:30 — Low-Resource Machine Translation for Moroccan Arabic,
Alexei Rosca, Abderrahmane Issam and Gerasimos Spanakis

10:30–11:30 — Coffee break and poster session 1

11:30–12:00 — Building a Lightweight Classifier to Distinguish Closely Related Language Varieties with Limited Supervision: The Case of Catalan vs Valencian,
Raúl García Cerdá, María Miró Maestre and Miquel Canal

12:00–12:30 — Automatic Fact-checking in English and Telugu,
Ravi Kiran Chikkala, Tatiana Anikina, Natalia Skachkova, Ivan Vykopal, Rodrigo Agerri and Josef van Genabith

12:30–13:00 — Bridging the Gap: Leveraging Cherokee to Improve Language Identification for Endangered Iroquoian Languages,
Liam Enzo Eggleston, Michael P. Cacioli, Jatin Sarabu, Ivory Yang and Kevin Zhu

13:00–15:00 — Lunch break

15:00–15:30 — IfGPT: A Dataset in Bulgarian for Large Language Models,
Svetla Peneva Koeva, Ivelina Stoyanova and Jordan Konstantinov Kralev

15:30–16:00 — Modular Training of Deep Neural Networks for Text Classification in Guarani,
Jose Luis Vazquez, Carlos Ulises Valdez, Marvin Matías Agüero-Torales, Julio César Mello-Román, Jose Domingo Colbes and Sebastian Alberto Grillo

16:00–16:30 — Coffee break and poster session 2

16:30–17:00 — Slur and Emoji Aware Models for Hate and Sentiment Detection in Roman Urdu Transgender Discourse,
Muhammad Owais Raza, Aqsa Umar and Mehrub Awan

17:00–17:30 — A Multi-Task Learning Approach to Dialectal Arabic Identification and Translation to Modern Standard Arabic,
Abdullah Khered, Youcef Benkhedda and Riza Batista-Navarro

17:30–17:45 — Closing

This workshop was supported by the European Union under Horizon Europe projects GAIN: “Georgian Artificial Intelligence Networking and Twinning Initiative” (GA #101078950), DisAI: “Improving Scientific Excellence and Creativity in Combating Disinformation with Artificial Intelligence and Language Technologies” (GA #101079164), and LorAI: “Low Resource Artificial Intelligence” (GA #101136646), and by the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) through the project “The limits and future of data-driven approaches: A comparative study of deep learning, knowledge-based and rule-based models and methods in Natural Language Processing” (CIDEXG/2023/13).
