MoRe NL: foundations of a Modular Realisation Engine for Nguni Languages
NRF CPRR grant (2020-2022), Grant number 120852
Project summary
A multitude of socio-economic and political factors cause language barriers to persist in healthcare and in other areas, such as weather forecasts, for the vast majority of people in South Africa. Computer applications may alleviate these issues by translating existing text or by generating the required contextually relevant text from structured input. The latter is addressed by Natural Language Generation (NLG). NLG for the Nguni languages, one of the two main groups of languages indigenous to South Africa, is still at an exploratory stage, and that exploration has exposed a clear set of problems that need to be resolved. As templates are generally inapplicable, once-off patterns were defined, but there is no NLG pattern specification language. The algorithms for the few knowledge-to-text sentences supported are ad hoc, rather than systematic and modular for flexible reuse across application scenarios. Further, looking beyond isiZulu to related languages, there is no theory, tool, or even an approach for easily reusing and adapting (that is, bootstrapping) the resources for those other languages, which are also widely spoken.
The aim of this project is to carry out the research needed to build a generic framework for an NLG realisation engine for at least the Nguni language group, including an entirely novel NLG pattern specification language with an annotation model. The framework will be modular and domain-independent, so that one can 'mix and match' word fragments, clitics, and concords as needed for the task. It will be computationally tractable and usable with popular NLP tools and knowledge representation systems, such as NLTK, RDF, and OWL. This will enable designers to generate sentences in the Nguni languages and in related Bantu languages for a range of applications. Further, in aiming for generalisability of such a realisation engine, the project seeks a way to devise computationally usable measures with predictive power for bootstrapping across related Bantu languages.
Outputs
None yet (it's just starting up now)
Members and collaborators
- A/Prof. Maria Keet, UCT; PI
- A/Prof. Langa Khumalo, Linguistics and ULPDO, UKZN; research associate
- Dr. Zubeida Khan, CSIR; research associate
- Mr. Zola Mahlaza, PhD student, UCT; research associate
- MSc student - TBA
- scientific programmer - TBA