Google’s Gemini era promises a revolution in how we think and search: Everything you should know

Image credit: Google

Sundar Pichai, CEO of Google, recently announced the company’s full immersion in the “Gemini era” at Google I/O 2024. This keynote speech marked a turning point, highlighting a decade of AI investment and unveiling the next generation of Google’s technology.

The Gemini era signifies an important shift towards “multimodal” AI models. Unlike previous models, Gemini can understand and integrate various information formats, including text, images, code, and even audio and video. This allows for a new level of human-computer interaction, paving the way for groundbreaking advancements.

A year ago, Google introduced the first Gemini models, showcasing their state-of-the-art performance. Since then, Google has released Gemini 1.5 Pro, boasting a significant breakthrough in “long context understanding.” This model can process a staggering 1 million tokens, enabling it to analyse vast amounts of information for more comprehensive results.

Late last year, we also published a feature titled “Google Gemini: 10 not-so-much-understood depths of the latest trending AI product,” which you can read for more background.

Making AI accessible: Gemini for everyone

Google is committed to democratising AI. Over 1.5 million developers already use Gemini models within Google’s suite of tools, a sign of their widespread adoption. These models are integrated across various Google products, including Search, Photos, Workspace, and the Android operating system.

Search, a core Google product, has undergone a major transformation with the introduction of AI Overviews. This feature leverages Gemini’s capabilities to deliver a more comprehensive and user-friendly search experience. AI Overviews allows users to ask complex, long-form questions and even search using photos, providing the most relevant results from the web. This feature is currently being rolled out in the U.S. with plans for international expansion.

Introducing Ask Photos: A new way to search your memories

Another exciting application of Gemini is “Ask Photos.” This feature lets users search their vast photo and video libraries with natural language queries. Gone are the days of scrolling through years of photos; Ask Photos can identify objects, recognise faces, and even interpret text within images.

For instance, users can ask “When did Lucia learn to swim?” and then follow up with “Show me how Lucia’s swimming has progressed.” Here, Gemini goes beyond a simple search, understanding the context and chronology within the photos. Ask Photos is rolling out this summer, with more capabilities promised later.

Unveiling knowledge with multimodality and long context

The ability to understand and utilise various information formats is what makes Gemini truly revolutionary. This “multimodal” approach combined with “long context” processing reveals entirely new possibilities. With a million-token context window, Gemini 1.5 Pro can analyse massive amounts of data, be it hundreds of pages of text, hours of video, or even entire code repositories. This extended context allows for a deeper understanding of user queries and more insightful responses.
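To get a feel for what a million-token window means in practice, here is a minimal Python sketch that estimates whether a set of documents would fit. It is not Google's tokenizer: the figure of about 4 characters per token is only a common rule of thumb for English text, and the names `estimate_tokens` and `fits_in_context` are hypothetical helpers made up for this illustration.

```python
from __future__ import annotations

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers vary by language and content, so treat this as a ballpark.
CHARS_PER_TOKEN = 4

# The 1 million token window cited for Gemini 1.5 Pro in this article.
CONTEXT_WINDOW_TOKENS = 1_000_000


def estimate_tokens(text: str) -> int:
    """Estimate a token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(texts: list[str], window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the estimated total token count fits within the window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total <= window
```

Under that assumption, a million tokens corresponds to roughly four million characters of English text. An exact count would need the model's own tokenizer.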

Gemini 1.5 Pro with 2 million token context

While 1 million tokens represent a significant leap forward, Google is constantly pushing boundaries. The company is currently offering a private preview of Gemini 1.5 Pro with a 2 million token context window for developers. This signifies Google’s commitment to achieving “infinite context” in the future.

Gemini in Workspace

The power of Gemini is evident within Google Workspace. Imagine searching your emails with unparalleled efficiency. Gemini can analyse emails and attachments, including PDFs, to provide summaries of key points and action items. It can even highlight essential information from hour-long Google Meet recordings, making it easier to catch up if you missed a meeting.

Audio outputs and the future of AI agents

The future of AI goes beyond just text-based interactions. A glimpse of this future is showcased in “Audio Overviews” within NotebookLM. This feature utilises Gemini 1.5 Pro to generate personalised and interactive audio conversations based on source materials. This signifies the potential for a truly multimodal user experience, allowing users to interact with AI through various input and output formats.

AI agents represent the next stage of AI development. These intelligent systems can reason, plan, and leverage memory to complete tasks on a user’s behalf. Imagine an AI agent that can handle all aspects of returning a poorly fitting pair of shoes, from locating the receipt to scheduling a UPS pickup.

The importance of responsible AI

As AI becomes more powerful, responsible development becomes paramount. At the event, Pichai said Google is committed to building trustworthy AI through initiatives like “AI-assisted red teaming” and “SynthID,” a watermarking tool that helps identify AI-generated content.

What infrastructure is powering the Gemini Era

Training and running cutting-edge AI models like Gemini requires immense computational power. Google emphasised its commitment to building the necessary infrastructure for the AI era. Here are some key takeaways:

  • Trillium TPUs: Google unveiled Trillium, the sixth generation of its Tensor Processing Units (TPUs). It delivers a 4.7x improvement in compute performance per chip over the previous generation, making it Google’s most powerful and efficient TPU to date. Trillium will be available to cloud customers by late 2024.
  • A comprehensive hardware ecosystem: Beyond TPUs, Google offers a complete hardware suite, including custom Arm-based CPUs (Axion processors) and Nvidia’s cutting-edge Blackwell GPUs. This ensures Google Cloud can cater to diverse workloads requiring different processing strengths.
  • AI Hypercomputer: Google showcased its AI Hypercomputer, a supercomputer architecture that integrates performance-optimised hardware with open software and flexible consumption models. This allows businesses and developers to tackle complex challenges more efficiently than they could with raw hardware alone.
  • Liquid cooling: Google highlighted its significant lead in data centre liquid cooling technology. This approach boasts superior energy efficiency compared to traditional air cooling methods, allowing Google to scale its infrastructure for the demands of AI.
  • Network reach: Google’s network spans over 2 million miles of terrestrial and subsea fibre, more than ten times the reach of the next leading cloud provider. This supports the transfer of the massive datasets required for training and running advanced AI models.

The future of search in the Gemini era

Search, a cornerstone of Google’s offerings, is undergoing a radical transformation powered by Gemini. Here’s what to expect:

  • Generative search at scale: Google Search leverages Gemini’s capabilities to become a “generative AI” tool. This signifies a shift from simply returning links to providing users with comprehensive summaries, completing tasks, and offering a more intuitive search experience.
  • A new chapter for search: Sundar Pichai emphasised that the Gemini Era marks the most exciting chapter for Google Search yet. Users can anticipate a more dynamic and user-friendly search experience that caters to complex queries and leverages various input formats.

Enhancing user experiences (UX) with Gemini

Beyond Search, Google is integrating Gemini across various products to provide more intelligent and personalised user experiences:

  • Gemini Live: A new experience called “Live” allows users to have in-depth, voice-based conversations with Gemini. This signifies a more natural and conversational way to interact with AI.
  • 2 million token support in Gemini Advanced: Later in 2024, Gemini Advanced will support a 2 million token context window, letting users upload and analyse even denser files such as long code files or high-resolution videos. This caters to users who work with especially large inputs.
  • Gemini on Android: Billions of Android users will benefit from a deeper integration of Gemini. This includes “Gemini Nano with Multimodality,” an on-device AI model that can process various information formats while keeping user data private.

What we think about the new announcements

The unveiling of the Gemini era signals Google’s intent to lead a new wave of AI innovation. With breakthroughs in multimodal AI, long-context processing, and a robust AI infrastructure, Google is laying the foundation for the future of human-computer interaction.

The focus on responsible AI development through initiatives like “AI-assisted red teaming” and “SynthID” also shows Google’s commitment to building trustworthy and ethical AI tools. As Google collaborates with developers and users, the possibilities opened by the Gemini era are vast, promising to transform how we search for information and interact with technology.
