Artificial intelligence has taken a significant leap forward with the launch of Meta Llama 3.2, the company's latest large language model (LLM), which can understand both images and text. The release marks Meta's entry into the world of vision models, placing it in direct competition with AI heavyweights like OpenAI and Anthropic.
AI agents with vision: Llama 3.2 launch is a new frontier for Meta
Announced at Meta Connect, Llama 3.2 includes medium-sized models with 11B and 90B parameters, along with lightweight, text-only models (1B and 3B parameters) designed for mobile and edge devices. These models are primed to power personalised AI agents, giving them the ability to understand and interact with visual data.
“This is our first open-source multimodal model,” said Meta CEO Mark Zuckerberg in his keynote. “It’s going to enable a lot of applications that will require visual understanding.”
For industries that depend heavily on visual information — from retail to healthcare — Llama 3.2 opens the door to innovations previously limited to high-end proprietary AI. Imagine an AI agent that not only reads but also comprehends graphs, captions images, and pinpoints objects based on natural language commands. Whether you’re looking for the best sales month from a graph or asking for a detailed description of an image, Llama 3.2’s multimodal capabilities take AI interaction to the next level.
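For developers who want a feel for what that looks like in practice, the sketch below shows one plausible way to ask the 11B vision model about a chart using Hugging Face transformers. It is illustrative only, not Meta's reference code: it assumes access to the gated meta-llama/Llama-3.2-11B-Vision-Instruct checkpoint, a recent transformers release with Llama 3.2 vision support, and a placeholder local image file named chart.png.

```python
# Illustrative sketch: asking the 11B vision model about a sales chart.
# Assumes the gated "meta-llama/Llama-3.2-11B-Vision-Instruct" checkpoint,
# a recent transformers release, and a placeholder image "chart.png".
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chart.png")  # placeholder: any local chart image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Which month has the highest sales in this chart?"},
    ]}
]

# Build the chat prompt, pair it with the image, and generate an answer.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```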
Open-source AI: The Linux of artificial intelligence?
Meta has consistently championed open-source AI, and Llama 3.2 is no exception. Alongside the model's new visual understanding abilities, Meta is sharing Llama Stack distributions for the first time, allowing developers to run the models across a range of environments, from on-premises deployments to cloud setups.
Zuckerberg’s vision for Llama as “the Linux of AI” reflects Meta’s commitment to open-source as the future of cost-effective, customisable, and high-performing AI systems. “Open source is going to be — already is — the most cost-effective, customisable, trustworthy and performant option out there,” Zuckerberg said. “We’ve reached an inflection point in the industry. It’s starting to become an industry standard.”
Rivalling OpenAI’s GPT-4o and Anthropic’s Claude 3 Haiku
The Meta Llama 3.2 launch arrives not long after Meta released Llama 3.1, a model whose adoption the company says has grown tenfold since launch. With the 11B and 90B models supporting image recognition, Meta is positioning Llama 3.2 as a serious competitor to leading models like OpenAI’s GPT-4o and Anthropic’s Claude 3 Haiku.
The lightweight text-only versions, meanwhile, are designed for building highly personalised AI applications. These models can summarise recent messages, send calendar invites, and perform other tasks in a private setting, without needing large computational resources.
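As a rough illustration of that kind of on-device task, the snippet below sketches summarising a short message thread with the 3B instruct model through the Hugging Face transformers pipeline. It assumes the gated meta-llama/Llama-3.2-3B-Instruct checkpoint and a recent transformers release; the thread text is invented for the example.

```python
# Illustrative sketch: summarising a short message thread with the 3B model.
# Assumes the gated "meta-llama/Llama-3.2-3B-Instruct" checkpoint and a
# recent transformers release that accepts chat-style message lists.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Summarise the user's messages in one sentence."},
    {"role": "user", "content": "Anna: Lunch moved to 1pm. Ben: Works for me. Anna: Same place as last week."},
]

# The pipeline appends the assistant's reply as the final message.
result = generator(messages, max_new_tokens=60)
print(result[0]["generated_text"][-1]["content"])
```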
Meta claims that the lightweight Llama 3.2 models outperform rivals such as Google’s Gemma and Microsoft’s Phi-3.5-mini in key areas such as instruction following, tool use, summarisation, and prompt rewriting.
Voice: A new era of celebrity-powered AI
In addition to its new visual capabilities, Meta is taking a bold step by giving Llama 3.2 “a voice” — quite literally. Meta AI can now talk back, and in the voices of celebrities like Dame Judi Dench, John Cena, and Kristen Bell. Available across platforms like WhatsApp, Messenger, Facebook, and Instagram, this feature allows users to interact with Meta AI using either text or voice commands.
“I think that voice is going to be a way more natural way of interacting with AI than text,” Zuckerberg remarked during his keynote. “It is just a lot better.”
Meta’s AI can also respond to shared photos in chats, modify images by adding or removing elements, and even change backgrounds. Looking ahead, Meta is experimenting with translation, video dubbing, and lip-syncing tools, aiming to further enhance the interactivity of its AI offerings.
Meta AI’s business expansion: A million advertisers strong
Meta’s AI vision doesn’t end with consumer applications. The company is expanding its business AI capabilities, allowing enterprises to use click-to-message ads on platforms like WhatsApp and Messenger. Businesses can build agents that answer common queries, discuss product details, and finalise purchases, all powered by generative AI.
Meta revealed that over one million advertisers have already used its AI tools, creating fifteen million ads in the past month alone. The company reported a significant impact, with ad campaigns using Meta’s generative AI tools experiencing an 11% higher click-through rate and a 7.6% increase in conversions compared to those that didn’t.
The future of AI assistants
With Llama 3.2, Meta is pushing the boundaries of what’s possible with AI agents. From multimodal understanding to celebrity-powered voices, Meta AI is setting its sights on becoming the world’s most-used assistant — a goal that Zuckerberg believes is already within reach. “It’s probably already there,” he said.
Llama 3.2 models are available for download on platforms such as Hugging Face and Meta’s own website, llama.com, enabling developers to build on the vision of truly multimodal, accessible AI.
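Fetching one of the lightweight checkpoints can be as simple as the sketch below, assuming you have accepted the Llama 3.2 licence on the model page and authenticated with the Hugging Face CLI; the repository id shown is the 1B instruct variant listed on the hub.

```python
# Minimal sketch: downloading the 1B instruct checkpoint from Hugging Face.
# Assumes the Llama 3.2 licence has been accepted on the model page and the
# user is logged in (e.g. via `huggingface-cli login`).
from huggingface_hub import snapshot_download

local_dir = snapshot_download("meta-llama/Llama-3.2-1B-Instruct")
print(f"Model files downloaded to {local_dir}")
```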