
Ex-Goldman Sachs and Meta founders raise $3M to build the voice AI infrastructure global tech giants overlooked

AethexAI founders
Image credits: AethexAI
  • London-based AethexAI emerges from stealth with a $3M pre-seed round led by 4DX Ventures
  • The startup is already handling up to 15,000 production calls daily for enterprise customers across West Africa
  • Its proprietary voice stack cuts costs to as little as $0.03 per minute while optimising for low-bandwidth, high-latency markets across Africa and the Middle East

The world’s most sophisticated voice AI models break down precisely where the next billion internet users actually live. Deployed across Africa and the Middle East, Western voice systems buckle under patchy telecom infrastructure, high-latency networks, and the code-switching dialects of local markets. They fail not in demos but in production, where it counts.

London and San Francisco-based AethexAI, emerging from stealth today, has raised $3 million in a pre-seed round led by 4DX Ventures to fix that. The round included participation from Enza Capital, Dorm Room Fund, Mojo Ventures, the Stanford GSB 26 Fund (an alumni investment vehicle from Stanford Graduate School of Business), and strategic angels including Stanford faculty, telecom executives, and AI researchers from Anthropic.

The founders

AethexAI was founded in 2025 by Mariama Diallo and Ayooluwa Odemuyiwa. Diallo previously worked in investment banking at Goldman Sachs before joining YC-backed Model ML as its first product and growth hire, working closely with large enterprise clients. Odemuyiwa trained as a computer scientist at Caltech, built systems across aerospace and at Meta, then attended Stanford Graduate School of Business — a connection that explains the Stanford GSB 26 Fund’s involvement in this round.

The pair spent time on the ground with businesses across Africa and the Middle East, where they repeatedly encountered the same problem: voice AI products that worked in demos but collapsed in production. They left their respective roles in San Francisco and relocated to London to build a purpose-designed voice infrastructure platform from scratch. AethexAI currently has a team of 10 and expects to double headcount by the end of 2026.

Why existing voice AI breaks here

Most voice AI platforms rely on large language models hosted on high-end GPU infrastructure in North America or Europe. For users in Africa and the Middle East, that geographic distance introduces latency and jitter that make automated calls unreliable. Existing tools also struggle with code-switching, regional dialects of English, French, and Arabic, and low-bitrate audio on real telecom networks. The result: companies across the region have been unable to automate large parts of their customer interactions, leaving efficiency gains and revenue untapped.

AethexAI rebuilt the entire stack rather than adapting Western tools. Instead of relying on existing orchestration tools like Vapi or LiveKit, the company built its own small model and orchestration layer from scratch. The platform combines self-hosted speech models, fully managed telephony, agent orchestration, and enterprise deployment tools in a single stack, delivered through both a no-code interface and APIs. To train its models, AethexAI used anonymised recordings from call centre partners, shipped hard drives to radio stations across Africa to collect audio data, and built a contributor network of university students to annotate data and pronounce local names, a data collection strategy that most Western labs would never consider.

Kora 1 — the model stack

At the heart of the platform is Kora 1, AethexAI’s proprietary family of speech models. The models range from 300 million to 1.7 billion parameters, a fraction of the size of mainstream LLMs, which is precisely the point: smaller models run on local infrastructure with low latency, rather than routing calls to distant GPU clusters. Kora 1 is trained on licensed datasets from call centres, radio networks, and content platforms across the region, and is optimised for noisy environments, multiple accents, and code-switching between languages. Pricing starts at $0.03 per minute, compared with $0.10 or more per minute from global providers before additional costs.

The company is already supporting production deployments of up to 15,000 calls daily for enterprise customers, including major call centre operators in West Africa. Primary use cases include debt collection, customer activation, and KYC verification for banks and telecoms. According to 4DX Ventures, enterprises in Africa and the Middle East process roughly three times the call volume of their Western counterparts — making voice AI not a nice-to-have but a core operational necessity.
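To put the quoted per-minute rates in context, the arithmetic can be sketched roughly as follows. The two rates and the 15,000-call daily peak come from the figures above; the average call length is not stated anywhere in the announcement, so the three-minute value below is purely an illustrative assumption, as are the function and variable names.

```python
# Rates quoted in the article, in USD per minute.
AETHEX_RATE = 0.03
GLOBAL_RATE = 0.10  # the article says "$0.10 or more", so treat this as a floor

# Peak daily production volume quoted in the article.
CALLS_PER_DAY = 15_000

# ASSUMPTION: average call length is not given in the article.
# Three minutes is a placeholder chosen only to make the sums concrete.
ASSUMED_MINUTES_PER_CALL = 3.0


def daily_cost(rate_per_minute: float, calls: int, minutes_per_call: float) -> float:
    """Total daily spend in USD for a given per-minute rate and call volume."""
    return rate_per_minute * calls * minutes_per_call


def saving_fraction(low_rate: float, high_rate: float) -> float:
    """Fraction of per-minute cost saved by paying low_rate instead of high_rate."""
    return 1 - low_rate / high_rate


aethex_daily = daily_cost(AETHEX_RATE, CALLS_PER_DAY, ASSUMED_MINUTES_PER_CALL)
global_daily = daily_cost(GLOBAL_RATE, CALLS_PER_DAY, ASSUMED_MINUTES_PER_CALL)

print(f"AethexAI starting rate: ${aethex_daily:,.0f} per day")
print(f"Global provider floor:  ${global_daily:,.0f} per day")
print(f"Per-minute saving:      {saving_fraction(AETHEX_RATE, GLOBAL_RATE):.0%}")
```

The saving fraction depends only on the two quoted rates, so it holds whatever the real call length turns out to be; the daily dollar figures scale linearly with the assumed duration and should not be read as reported numbers.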

In their own words

“Voice is already how businesses operate across emerging markets, but the technology behind it hasn’t kept up. We kept hearing the same thing from customers: that existing tools simply didn’t work in their environments. That’s why we built our own model stack and infrastructure from the ground up, designed for how these markets actually operate.” — Mariama Diallo, co-founder, AethexAI

“Voice AI failed in these markets at every layer of the stack. Latency, cost, poor handling of code switching, and weak performance under packet loss, jitter, and low-bitrate audio in real telecom networks led these systems to break in production. The fix was not incremental. It required redesigning the entire stack. Kora 1 is our family of speech models, specialised by dialect and fully self-hosted.” — Ayooluwa Odemuyiwa, co-founder, AethexAI.

“Voice AI adoption in emerging markets has been constrained, less so due to demand, but rather by infrastructure that was never designed for these environments. AethexAI has taken a fundamentally different approach, rebuilding the stack from the ground up for how these markets actually operate. With real production deployments already at scale, the AethexAI team is building what we believe will become the defining voice infrastructure layer for the next billion users.” — Walter Badoo, Co-Founder and Managing Partner, 4DX Ventures.

The competitive landscape

ElevenLabs raised $500 million at an $11 billion valuation in February 2026 in a Series D led by Sequoia, making it one of the best-funded voice AI companies in the world. Its focus spans speech synthesis, dubbing, and conversational AI — primarily targeting Western enterprise markets with premium pricing to match. Retell AI, which grew to $50 million ARR in 2025 on just $14 million in total funding, focuses on English-language call centre automation for Western enterprises and does not operate in emerging market telephony environments. What differentiates AethexAI is not just pricing but architecture: a fully self-hosted stack designed for low-bandwidth, high-latency telecom networks that global providers have never optimised for.

Market context

According to Grand View Research, the global AI agents market was valued at $7.63 billion in 2025 and is projected to reach $182.97 billion by 2033, growing at a CAGR of 49.6%. Africa’s digital economy is growing faster than any other region: internet users on the continent are projected to reach 1.1 billion by 2030, and mobile-first voice interactions already dominate enterprise customer engagement across the region.

The question AethexAI forces the market to answer is not whether voice AI will reach Africa and the Middle East. Capital is already flowing there. The question is whether a purpose-built emerging market stack, trained on hard drives shipped to radio stations across the continent, can hold its ground when the $11 billion incumbents eventually decide these 1.5 billion users are worth optimising for.
