Building a Yoruba text-to-speech engine for automatic reading machines: A concatenative approach
DOI: https://doi.org/10.5281/zenodo.18109021
Keywords: Yoruba TTS, Concatenative Synthesis, Automatic Reading Machine, Digital Accessibility, Speech Synthesis
Abstract
This research develops a concatenative Text-to-Speech (TTS) system for automatic reading machines (ARMs). A TTS system is a major component of an ARM: it converts written text to synthetic speech. Corpus concatenation has been the most effective and widely used TTS approach because it efficiently produces natural and intelligible speech for applications such as reading aids for the visually impaired and persons with dyslexia, and language learning tools. The poor performance of existing Yoruba TTS systems makes it difficult for Yoruba speakers to access digital content. This study developed a comprehensive Yoruba speech corpus and implemented a concatenative text-to-speech framework incorporating a Yoruba optical character recognition (YOCR) system, Unicode mapping, syllable segmentation, and speech quality optimization using windowing and pre-emphasis filtering. The developed system achieved Mean Opinion Scores (MOS) of 4.86 for two-syllable words, 4.67 for five-syllable words, and at least 4.37 for sentences. The Mel Cepstral Distortion (MCD) showed a maximum mean of 1.58 for concatenated words. These MOS and MCD results demonstrate the system's potential for integration into ARMs and for improving digital content accessibility for Yoruba speakers.
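The abstract names pre-emphasis filtering and windowing as the speech quality steps but gives no parameters. The following is a minimal sketch of how such steps are commonly applied when joining syllable units, not the authors' implementation. The function names, the pre-emphasis coefficient of 0.97, the half-Hann cross-fade, and the overlap length are all assumptions chosen for illustration.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a conventional default, assumed here.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def hann_fades(overlap):
    """Return (fade_out, fade_in) half-Hann ramps that sum to 1 at each sample."""
    fade_in = [
        0.5 * (1.0 - math.cos(math.pi * (i + 0.5) / overlap))
        for i in range(overlap)
    ]
    fade_out = [1.0 - w for w in fade_in]
    return fade_out, fade_in


def concatenate_units(units, overlap=4):
    """Join syllable waveforms, cross-fading `overlap` samples at each boundary.

    `units` is a list of sample lists (hypothetical recorded syllables).
    """
    if not units:
        return []
    result = list(units[0])
    fade_out, fade_in = hann_fades(overlap)
    for unit in units[1:]:
        if len(result) < overlap or len(unit) < overlap:
            raise ValueError("each unit must be at least `overlap` samples long")
        tail = result[-overlap:]
        # Windowed blend of the previous unit's tail with the next unit's head.
        blended = [
            tail[i] * fade_out[i] + unit[i] * fade_in[i] for i in range(overlap)
        ]
        result[-overlap:] = blended
        result.extend(unit[overlap:])
    return result
```

Cross-fading with complementary windows avoids abrupt amplitude jumps at syllable boundaries, which is one common source of audible clicks in concatenated speech. In practice the overlap would be set in milliseconds and converted to samples using the corpus sampling rate.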
License
Copyright (c) 2025 Technoscience Journal for Community Development in Africa

This work is licensed under a Creative Commons Attribution 4.0 International License.