Lisan is an open, Amharic-first speech and language stack covering speech-to-text, text-to-speech and language understanding for five East African languages. Together they are spoken by more than 150 million people, yet they remain largely missing from production AI.
Lisan is already proven in production through Dewul, our live multilingual voice AI.
Voice assistants, transcription, translation and chatbots have transformed how the world works, but they were never built for Amharic, Tigrinya, Afaan Oromo, Somali or Swahili. Tens of millions of people, businesses and institutions are locked out of modern AI simply because of the language they speak. Lisan exists to close that gap with open, reusable building blocks.
Five reusable components, released openly so any builder, researcher or institution can deploy AI that works in these languages.
Open speech recognition fine-tuned for each language and hardened for noisy, real-world telephony audio.
Natural, expressive voices in each language, so machines can speak back the way people actually talk.
Intent and meaning extraction for task-oriented dialogue: booking, answering, routing and more.
A reproducible, public leaderboard so the whole community can measure and push progress on these languages.
A simple way for any developer to deploy an in-language voice agent on their own infrastructure.
Datasets, models, benchmarks and tooling published with model cards and datasheets. Public goods, not black boxes.
We start with Amharic and extend across language families and scripts, so the methods we prove here transfer to many more African languages.
We start from existing open corpora and strong open base models rather than reinventing them, and collect targeted new data only where real gaps exist.
We adapt those base models through transfer learning, with Ge'ez-script normalization and morphology handling for the Semitic languages (Amharic and Tigrinya), and tune each model for the way its language is actually written and spoken.
Every model is deployed in Dewul and measured on real customer calls, not just held-out test sets, so quality reflects the real world.
Models, datasets, the benchmark and the toolkit are published openly, so the African NLP community can build on top of them.
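To make the Ge'ez-script normalization step mentioned above more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Lisan's actual pipeline: the function name `normalize_amharic` and the choice of which letter series to merge are ours, based on the common practice of folding Amharic homophone letters (for example ሐ and ኀ into ሀ) before training or scoring. Some of these letters represent distinct sounds in Tigrinya, so this mapping is assumed to apply to Amharic only.

```python
import unicodedata

# Assumption: homophone series often merged when normalizing *Amharic* text.
# Keys and values are the base code points of each Ethiopic letter series.
HOMOPHONE_SERIES = {
    0x1210: 0x1200,  # ሐ-series -> ሀ-series
    0x1280: 0x1200,  # ኀ-series -> ሀ-series
    0x1220: 0x1230,  # ሠ-series -> ሰ-series
    0x12D0: 0x12A0,  # ዐ-series -> አ-series
    0x1340: 0x1338,  # ፀ-series -> ጸ-series
}

# Only the seven basic vowel orders are remapped; the eighth (labialized)
# forms are left untouched because they do not line up across series.
VOWEL_ORDERS = 7


def build_table() -> dict:
    """Build a code-point translation table that preserves the vowel order."""
    table = {}
    for src_base, dst_base in HOMOPHONE_SERIES.items():
        for order in range(VOWEL_ORDERS):
            table[src_base + order] = dst_base + order
    return table


_TABLE = build_table()


def normalize_amharic(text: str) -> str:
    """Fold homophone letters and collapse whitespace (hypothetical helper)."""
    text = unicodedata.normalize("NFC", text)  # canonical Unicode form first
    text = text.translate(_TABLE)              # merge homophone series
    return " ".join(text.split())              # collapse runs of whitespace


if __name__ == "__main__":
    # Two spellings of the same greeting normalize to one form.
    print(normalize_amharic("ሠላም") == normalize_amharic("ሰላም"))
```

Folding spelling variants like this means a transcript is not penalized for choosing one valid spelling over another, which matters when comparing recognition output against reference text.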
Dewul, our multilingual AI receptionist, answers live business calls in Amharic today. Lisan hardens and opens the speech and language stack underneath it, turning a working commercial product into public goods for everyone.
We're inviting researchers, native-speaker communities, institutions and funders to co-build Lisan as equals, with co-authorship on the open datasets, models and benchmark and a shared stake in the outcome.