KeBoNa: Knowledge-driven Bootstrapping of computational language resources for Niger-Congo B languages

National Research Foundation CPRR grant (2024-2026), Grant number 23040389063


Project summary

Natural language processing is ubiquitous, especially for well-resourced languages. Yet speakers of low-resourced languages, such as those of the Niger-Congo B (NCB, ‘Bantu’) language family, also want tools in their first language, such as spelling and grammar checkers, chatbots, and customised patient discharge notes. Because language data for these languages is insufficient, such technologies require a more laborious knowledge-based approach. It is therefore imperative to bootstrap a new resource in one language from an existing one in a related language, so that resources are reused efficiently. Insight into bootstrapping for NCB languages is sporadic, however, and is hindered by a lack of options for meaningful annotations across the languages. Ontology-mediated annotation systems, such as GOLD and OLiA, do not include linguistically important NCB-specific elements, nor are those resources harmonised and aligned with foundational ontology principles.
The main aim of the project is to investigate bootstrapping strategies with a novel, enhanced knowledge-mediated approach, so as to eventually be able to state, in an informed way, which tasks can bootstrap well from which language resources, and why. In one strand of research, we will carry out an ontological investigation and design an integrative ontology or knowledge graph module. This module is to complement data-driven strategies with meaning, incorporate the specifics of NCB languages for resource annotation and comparison, and be compatible with extant ontology ecosystems. The other strand of research concerns devising NCB-relevant metrics that compute bootstrapping effects and similarity among languages, availing of the knowledge resources, in order to quantify the potential for, and benefits of, bootstrapping computational resources. The two strands will inform each other, and we will evaluate the theory with concrete existing and novel computational tasks for NCB languages, developing new computational resources in the process.

Outputs


Members and collaborators