SALAMA (Scalable African LAnguage Multimodal AI) Framework: an open-source speech-to-speech system built to empower African languages in voice-driven applications, starting with Swahili.
Bridge the African linguistic digital divide by providing high-quality, culturally aligned speech and language intelligence systems. SALAMA enables seamless end-to-end voice interaction by converting speech → text → intelligent response → speech.
Three modular components working together for end-to-end voice interaction.
Robust transcription for African languages using Whisper Small, fine-tuned for Swahili speech patterns and noisy environments.
Context-aware reasoning and response generation using UlizaLlama, instruction-tuned for natural Swahili text generation.
Natural, expressive voice synthesis using Facebook MMS (VITS-based), fine-tuned for Swahili prosody and tone.
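The three components above can be pictured as a simple chain. The following is a minimal sketch, not SALAMA's actual API: the class name, the stage signatures, and the stand-in functions are all assumptions made for illustration, and the stand-ins replace the real Whisper, UlizaLlama, and MMS models so the example runs without downloading anything.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; SALAMA's real interfaces may differ.
Transcriber = Callable[[bytes], str]   # audio -> text (STT)
Responder = Callable[[str], str]       # text -> reply text (LLM)
Synthesizer = Callable[[str], bytes]   # text -> audio (TTS)


@dataclass
class SpeechToSpeechPipeline:
    """Chains the three stages: speech -> text -> response -> speech."""
    stt: Transcriber
    llm: Responder
    tts: Synthesizer

    def run(self, audio: bytes) -> bytes:
        text = self.stt(audio)     # speech -> text
        reply = self.llm(text)     # text -> response
        return self.tts(reply)     # response -> speech


# Stand-in stages (assumptions): they treat "audio" as UTF-8 bytes.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"Jibu: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechToSpeechPipeline(fake_stt, fake_llm, fake_tts)
result = pipeline.run("Habari".encode("utf-8"))
```

Because each stage is just a callable, a real model wrapper could be passed in place of any stand-in without touching the other two.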
Designed for extensibility, performance, and real-world impact.
STT, LLM, and TTS can be swapped, upgraded, or fine-tuned independently.
Models fine-tuned for African speech patterns and dialectal variations.
Framework designed to support additional modalities such as vision.
Optimized for low-latency voice interaction in conversational agents.
Simple configuration for integrating new languages or tasks.
Strong performance across all modules, validated on real-world Swahili data.
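As an illustration of independently swappable modules and per-language configuration, here is a small sketch under stated assumptions. The `LanguageConfig` class and its field names are hypothetical and not taken from the SALAMA codebase. The Swahili entries reuse the base model IDs listed in the models section, and `example/other-tts` is a placeholder rather than a real checkpoint.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LanguageConfig:
    """Hypothetical per-language configuration: one model ID per stage."""
    language: str
    stt_model: str
    llm_model: str
    tts_model: str


# Base model IDs named in this document; fine-tuned checkpoints may differ.
swahili = LanguageConfig(
    language="sw",
    stt_model="openai/whisper-small",
    llm_model="Jacaranda/UlizaLlama",
    tts_model="facebook/mms-tts-swh",
)

# Swap only the TTS stage; the STT and LLM entries stay untouched.
# "example/other-tts" is a placeholder, not a real checkpoint.
swapped = replace(swahili, tts_model="example/other-tts")
```

Keeping the configuration immutable and deriving variants with `replace` makes it easy to see which stage changed between two setups.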
SALAMA supports six flexible modes for voice and text interaction.
Voice → LLM → Voice. Full end-to-end voice conversation.
Text → LLM → Text. Standard chat interaction.
Voice input, text response via LLM processing.
Text input, voice response with natural synthesis.
Transcription only — no LLM processing.
Synthesis only — no LLM processing.
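One way to read the six modes is as combinations of the three stages. The sketch below assumes that reading. The mode names, the `run` function, and the stand-in stages are hypothetical labels for illustration, not SALAMA's actual identifiers.

```python
from enum import Enum
from typing import Callable, Union

Payload = Union[bytes, str]


class Mode(Enum):
    # Each value is (uses_stt, uses_llm, uses_tts); names are hypothetical.
    VOICE_TO_VOICE = (True, True, True)      # Voice -> LLM -> Voice
    TEXT_TO_TEXT = (False, True, False)      # Text -> LLM -> Text
    VOICE_TO_TEXT = (True, True, False)      # Voice in, text response
    TEXT_TO_VOICE = (False, True, True)      # Text in, voice response
    TRANSCRIBE_ONLY = (True, False, False)   # STT only, no LLM
    SYNTHESIZE_ONLY = (False, False, True)   # TTS only, no LLM


def run(mode: Mode, payload: Payload,
        stt: Callable[[bytes], str],
        llm: Callable[[str], str],
        tts: Callable[[str], bytes]) -> Payload:
    """Apply only the stages that the selected mode enables, in order."""
    uses_stt, uses_llm, uses_tts = mode.value
    data = payload
    if uses_stt:
        data = stt(data)
    if uses_llm:
        data = llm(data)
    if uses_tts:
        data = tts(data)
    return data


# Stand-in stages (assumptions) so the sketch runs without any model.
def stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def llm(text: str) -> str:
    return text.upper()


def tts(text: str) -> bytes:
    return text.encode("utf-8")


voice_reply = run(Mode.VOICE_TO_VOICE, b"habari", stt, llm, tts)
transcript = run(Mode.TRANSCRIBE_ONLY, b"habari", stt, llm, tts)
```

Encoding each mode as three flags keeps the dispatch logic in one place, so adding a stage or a mode does not require a new code path per combination.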
All SALAMA models are open-source and available on Hugging Face.
Swahili Whisper ASR: fine-tuned speech recognition with 95.4% accuracy.
openai/whisper-small
Swahili instruction-tuned language model for reasoning, Q&A, and dialogue.
Jacaranda/UlizaLlama
Swahili text-to-speech with natural prosody (MOS 4.05/5.0).
facebook/mms-tts-swh
SALAMA is open-source under the MIT License. Clone the repository, install dependencies, and start building voice-powered African language applications.