Most AI benchmarks are built behind closed doors, designed by researchers for researchers. San Francisco-based LMArena flipped that logic early. Instead of synthetic tests or narrowly defined scores, it lets everyday users judge models on real prompts, real tasks, and real expectations. A user types a question, two anonymous models respond, and the user's single vote decides which one performed better.
LMArena has now raised $150 million in a funding round that values the company at $1.7 billion, nearly triple its valuation after its May 2025 seed round. The round was led by Felicis and UC Investments, with participation from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners, and Laude Ventures.
The capital will support platform operations, technical hiring, and deeper research. More importantly, it reinforces LMArena’s core bet: in an industry crowded with claims of intelligence, progress is best measured by how models perform in the hands of people who actually use them.
Why leaderboard rankings mattered
As the number of large language models exploded, LMArena’s rankings became a proxy for credibility. A rise or fall on the leaderboard could shape developer adoption, media narratives, and enterprise interest. Model makers began watching the results closely, sometimes obsessively. That influence also brought scrutiny.
When LMArena partnered with select companies, including OpenAI, Google, and Anthropic, to make flagship models available for evaluation, critics questioned whether access created bias. In April, a group of competitors published a paper alleging that the setup allowed certain players to game the benchmarks. LMArena strongly denied the claims, but the episode highlighted how consequential its platform had become.
Turning community judgment into a business
What started as an open research experiment at UC Berkeley, led by Anastasios Angelopoulos and Wei-Lin Chiang, evolved into a massive public signal of model quality. Today, more than five million monthly users across 150 countries generate around 60 million head-to-head comparisons each month. Those judgments power LMArena’s leaderboards, which rank models across text, web development, vision, text-to-image, and other practical tasks.
In September, LMArena took a decisive step by launching AI Evaluations, a commercial service that allows enterprises, developers, and model labs to commission evaluations using its community-driven framework. Within four months, the service reached an annualised consumption rate of $30 million, showing how much the market values independent, human-in-the-loop assessment.
“We cannot deploy AI responsibly without knowing how it delivers value to humans,” said Anastasios Angelopoulos, co-founder and CEO of LMArena. “To measure the real utility of AI, we need to put it in the hands of real users. LMArena does exactly this, leveraging feedback from tens of millions of consumers and professionals to set the North Star of the AI industry. Our evaluations use a transparent, open-source methodology to make these insights public for everyone. This funding accelerates the scientific work and community insights that make live evaluation from real users the gold standard for assessing AI in practice.”
“Without a trustworthy way to measure performance, AI can’t be safely scaled,” said Jagdeep Singh Bachher, the University of California’s chief investment officer. “LMArena delivers clarity and confidence for researchers, developers and businesses. As AI adoption accelerates, LMArena’s tools are becoming critical infrastructure.”