AI development has hit a critical roadblock: a shortage of high-quality training data. Public datasets are largely exhausted, and the real-world data that next-generation models need remains fragmented and hard to access legally. Protege aims to solve this with a trusted data exchange platform.
It licenses private datasets from hundreds of providers across healthcare, media, and other sectors, then curates and optimises them into AI-ready formats. AI companies access this data through streamlined workflows, while providers earn a share of the revenue.
Today, the US startup announced a $30 million Series A extension led by Andreessen Horowitz (a16z), with participation from returning investors including Footwork, CRV, and Bloomberg Beta. The extension follows an initial $25 million Series A in August 2025 and brings Protege's total funding since 2024 to $65 million.
Making real-world data the fuel for reliable AI
Bobby Samuels and Travis May launched Protege in 2024 after seeing firsthand how data bottlenecks stall AI progress. May, former CEO of Datavant and LiveRamp, brings deep experience with the complexities of health data exchange, while Samuels brings operational expertise in scaling data marketplaces.
“Across industries, we’re seeing demand for real-world data grow faster than the market’s ability to supply it responsibly. At the same time, data is highly fragmented, and neither data holders nor AI builders are set up to operationalise it at scale. Protege serves as a trusted source of curated, AI-ready data while unlocking new revenue streams for data providers,” said Bobby Samuels, CEO and co-founder of Protege.
Protege’s platform aggregates licensed datasets, including de-identified health records, audio, imaging, and media, then applies curation tools that clean, anonymise, and format the data for model training and evaluation. Other features include coverage across multiple verticals, revenue sharing for data providers, and delivery of the curated datasets to AI developers.
Unlike Scale AI, Snorkel AI, and Labelbox, which centre on labelling and annotating data, Protege focuses on sourcing the underlying data itself through its provider network.
What’s next?
Protege plans to use the new capital to expand into new data domains, grow its partner network, accelerate product development, and build out its team.
“The next era of AI will be shaped by who can responsibly unlock access to the world’s most valuable data. Protege has built a platform that respects the complexity of real-world data across industries while making it usable for modern AI development. Their momentum reflects a broader shift in the market,” said Daisy Wolf, Partner at Andreessen Horowitz.