Language is more than a tool for conveying information; it is used in all aspects of our lives. Yet only a small number of the 7,000 languages worldwide are highly resourced by human language technologies (HLT). Although there are over 2,000 African languages, only a few of them are highly resourced, meaning that a considerable amount of parallel digital data exists for them.
This thesis describes the work carried out to create a Bambara-French machine translation (MT) system, including data discovery, data preparation, model hyper-parameter tuning, the development of a crowdsourcing platform for humans in the loop, vocabulary sizing, and segmentation. We present a novel approach to MT for under-resourced languages that improves the quality of the model using a paradigm called "humans in the loop." We achieved a BLEU (bilingual evaluation understudy) score of 17.5. The results confirm that MT for Bambara is viable despite our small data set. This work has the potential to contribute to the reduction of language barriers between the people of Sub-Saharan Africa and the rest of the world.
Library of Congress Subject Headings
Bambara language--Translation into French; Translators (Computer programs); Translating and interpreting--Data processing; Computational linguistics; Corpora (Linguistics); Human-computer interaction
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Christopher M. Homan
Tapo, Allahsera Auguste, "Machine-assisted translation by Human-in-the-loop Crowdsourcing for Bambara" (2020). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus