
OpenAI voice chatbot founder backs pyannoteAI’s $9M round to transform AI speech models with speaker intelligence

pyannoteAI founders
Picture credits: pyannoteAI

Most current voice AI systems focus heavily on converting speech to text. While transcription is a critical component, it often overlooks deeper layers of communication, especially who is speaking, how they are speaking, and the context surrounding their speech. pyannoteAI addresses this gap by introducing Speaker Intelligence, a groundbreaking technology designed to identify and differentiate speakers accurately, regardless of the language spoken or the acoustic conditions.

A notable player in Speaker Intelligence AI, the French startup has snapped up $9 million in seed funding. The round was led by Crane Venture Partners and Serena, with participation from notable angel investors Julien Chaumond, CTO of HuggingFace, and Alexis Conneau, co-founder of WaveForms AI and formerly of Meta and OpenAI.

With the newly secured funding, pyannoteAI is preparing to expand beyond the open-source ecosystem. The company plans to launch enterprise-level solutions that meet the specific needs of businesses looking to implement speaker-aware AI at scale. These solutions will be aimed at organisations that process large volumes of conversational audio and require accurate speaker recognition in real time.

Solves a long-standing challenge in conversational AI

Hervé Bredin, Vincent Molina, and Juan Coria founded pyannoteAI in 2024. The company’s mission is to empower global teams with world-class products through advanced conversational speech AI that bridges the gap between transcription and full conversational understanding.

Speaker Intelligence is particularly important in environments where multiple voices are involved, such as meetings, customer service calls, or medical consultations. In these scenarios, understanding not just what is said but who said it and how it was delivered is vital. pyannoteAI’s technology ensures that voice data retains its context and becomes a richer, more reliable source of actionable insights for organisations.

One of the key challenges in voice AI is dealing with spontaneous, unscripted speech. Variations in tone, accent, pace, and emotion add complexity that traditional transcription tools are not equipped to handle. This is where pyannoteAI sets itself apart. Its platform begins by identifying and separating different speakers with a high degree of accuracy, forming a basis for more nuanced conversational analysis.

This layer of speaker differentiation is essential for various industries. In customer support, for example, it helps distinguish between agent and customer inputs. In media and entertainment, it supports accurate dubbing and subtitling. In healthcare, it allows voice data to be tied to individual practitioners or patients for more precise record-keeping.

Rapid growth in adoption  

pyannoteAI’s growth has been fueled in part by its open-source foundation. Its tools are already used by over 100,000 developers worldwide and achieve approximately 45 million downloads each month on HuggingFace. This strong community support has validated the demand for accurate speaker diarisation and has helped the technology mature rapidly.
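
For developers curious about what speaker diarisation looks like in practice, the sketch below uses the open-source pyannote.audio library distributed via HuggingFace. It is an illustrative example under assumptions, not pyannoteAI’s premium or enterprise offering: the model name refers to the publicly documented pyannote/speaker-diarization-3.1 pipeline, and the access token and audio file path are hypothetical placeholders.

from pyannote.audio import Pipeline

# Load the public pretrained diarisation pipeline from HuggingFace.
# This requires accepting the model's terms and supplying an access token
# ("YOUR_HF_TOKEN" below is a hypothetical placeholder).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",
)

# Run diarisation on a local recording ("meeting.wav" is a hypothetical path).
diarization = pipeline("meeting.wav")

# Print who spoke when: each turn carries a start time, an end time, and an
# anonymous speaker label such as SPEAKER_00 or SPEAKER_01.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s to {turn.end:.1f}s: {speaker}")

Downstream applications can then map these anonymous labels to known identities (for example, agent versus customer in a support call) and attach the result to the transcript.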

The company’s premium model delivers world-class accuracy, outperforming state-of-the-art solutions by 20%, while processing audio twice as fast as its open-source counterpart. This performance advantage makes speaker diarisation more accessible to businesses of all sizes by significantly reducing computational costs.

Enables next-gen voice applications

By integrating Speaker Intelligence at the core of its offering, pyannoteAI is setting the stage for a new generation of voice-enabled applications. Its technology has the potential to enhance everything from virtual assistants and transcription services to content moderation and compliance monitoring.

Instead of treating voice solely as a medium for transcribed text, pyannoteAI encourages developers and companies to treat it as a multi-layered source of contextual information. Understanding who speaks and how they express themselves opens up new dimensions in how machines interpret human interaction.

The technology is already being deployed across diverse use cases, such as live-streaming applications that enable instant speaker tracking for localization or simultaneous translation during events, a capability critical for globalized industries like media production and international business operations.

What’s next for the company? 

With its seed round, pyannoteAI is positioned to expand its impact across industries that rely on accurate, context-aware voice data. By focusing on Speaker Intelligence, the company is filling a crucial gap in Voice AI, shifting the focus from mere word recognition to full conversational understanding. This approach not only improves the reliability of voice technologies but also paves the way for more human-like AI interactions in the future.

“Speech technology has advanced significantly, yet it still falls short of capturing the full picture. Voice is more than just words,” said Hervé Bredin, co-founder of pyannoteAI and former research scientist at CNRS. “For a decade, pyannote technology has been leading the way in distinguishing speakers and voices in real-world conversations—especially in high-stakes environments where every voice must be heard.”

“We’re bringing enterprise-grade Speaker Intelligence AI to businesses that depend on voice data,” said Vincent Molina, co-founder of pyannoteAI. “Our goal is to make speaker-aware AI as seamless and universal as speech itself.”

“As the old saying goes, ‘it’s not what you say, it’s how you say it’—and in the world of Voice AI, that distinction has never been more important. pyannoteAI’s groundbreaking approach to Speaker Intelligence AI is setting a new standard for how businesses process and extract value from spoken data. We, at Crane, are thrilled to back a team that is redefining the fundamental layer of voice technology,” said Morgane Zerath, Investor at Crane Venture Partners. 

“pyannoteAI is redefining the way businesses harness voice data, turning raw speech into actionable intelligence. The team’s expertise in speaker diarisation is unparalleled, and their transition from open-source leadership to enterprise-grade AI solutions marks a pivotal shift in the Voice AI landscape. We, at Serena, are excited to back their journey in making Speaker Intelligence AI a fundamental layer of modern voice technology,” added Matthieu Lavergne, Partner at Serena. 

