What Does The Future Of Google Gemini 1.5 Pro Hold?

Yesterday was OpenAI's shining moment; today is Google's, with the Google I/O conference


In Today’s Issue

  • OpenAI GPT-4o Big Reveal Yesterday

  • Google Gemini 1.5 Pro and Google I/O Conference Today

  • GPT-4o Now Available Through API

  • AI Tool Report

BREAKING NEWS

OpenAI Unveils GPT-4o: A Leap Towards True Multimodal AI

May 13, 2024, marks a significant milestone in AI development with OpenAI's announcement of GPT-4o, a model that integrates text, audio, and vision capabilities in real-time.

Dubbed "omni" for its all-encompassing input and output modalities, GPT-4o promises to revolutionize human-computer interaction.

A New Era of Interaction

GPT-4o can process and respond to audio inputs in as little as 232 milliseconds, rivaling human conversational speed. It matches GPT-4 Turbo's performance in English text and coding, excels in non-English languages, and is 50% cheaper and faster in the API.

Its vision and audio understanding surpass existing models, setting a new benchmark.

Unified Model Architecture

Unlike its predecessors, GPT-4o uses a single neural network to handle all inputs and outputs, preserving contextual nuances like tone and background noise.

This integrated approach allows for more natural interactions, including laughter, singing, and emotional expression.

Capabilities and Applications

From real-time translation and meeting summaries to creative tasks like character design and poetic typography, GPT-4o showcases a broad range of applications.

It also excels in traditional benchmarks, achieving high scores in multilingual and visual perception tests.

Safety and Limitations

OpenAI emphasizes safety, incorporating robust measures to filter training data and refine model behavior. Extensive testing, including external red teaming, ensures GPT-4o operates within acceptable risk levels.

However, audio outputs are initially limited to preset voices to mitigate risks.

Availability and Future Plans

GPT-4o's text and image capabilities are rolling out today in ChatGPT, available to free and Plus users. Developers can access the model via API, with audio and video capabilities launching soon for trusted partners.

OpenAI's iterative rollout aims to balance innovation with safety, setting the stage for broader adoption.

GPT-4o represents a significant leap in AI, pushing the boundaries of what multimodal models can achieve.

As OpenAI continues to refine and expand its capabilities, the potential for more natural and intuitive human-computer interactions grows, heralding a new era in AI technology.

OTHER NEWS

Google Teases Gemini 1.5 Pro: The Next Leap in AI

Google's AI lab, DeepMind, has unveiled Gemini 1.5 Pro, an upgraded version of the model family behind its recently rebranded Bard chatbot, now known as Gemini.

This announcement comes just days after the release of Gemini 1.0 Ultra, touted as Google's most advanced AI model to date.

Gemini 1.5 Pro isn't just an incremental update. It can process video, images, audio, and text to answer questions, offering significant improvements over its predecessors.

However, access is currently limited to developers and enterprise customers, with a broader rollout planned via a waitlist.

Oriol Vinyals, VP of Research at Google DeepMind and co-lead of Gemini, described this as a "research release" aimed at those who deeply understand the technology.

"When you create a new model, especially with new capabilities, it makes sense to see what creative minds can do with it," Vinyals said.

Performance and Capabilities

Gemini 1.5 Pro boasts an 87% win rate against Gemini 1.0 Pro and 55% against 1.0 Ultra, making it the most capable model yet. Vinyals emphasized the model's efficiency, thanks to a Mixture-of-Experts architecture that routes each request to specialized "expert" sub-networks instead of activating the entire model.

The model also features a long context window, capable of ingesting up to 1 million tokens—equivalent to an hour of video or 11 hours of audio.

Practical Applications

Users can feed Gemini 1.5 Pro extensive documents like the Apollo 11 transcript and ask it to find specific moments. It can also identify scenes in silent films or translate languages with minimal data. However, the model isn't without flaws.

"The model will sometimes fail, and it's a work in progress for the whole community to improve these models," Vinyals noted.

What About Ultra 1.0?

Despite the new release, Vinyals assured users that Gemini 1.0 Ultra remains relevant. "We are some time away from getting 1.5 Pro out," he said, indicating that Ultra 1.0 will still be valuable for users willing to pay for premium access.

The Bigger Picture

The release of Gemini 1.5 Pro comes amid a surge in AI advancements. OpenAI has launched its GPT-4 Turbo model, and Microsoft plans to integrate its AI tool, Copilot, into Windows 11.

The AI sector is projected to reach $1.3 trillion in revenue by 2032, making these developments crucial for the industry's future.

Gemini 1.5 Pro represents a significant leap in AI capabilities, but its full potential will only be realized as it becomes more widely available.

For now, developers and enterprise customers will be the first to explore its possibilities.

SOCIAL MEDIA

GPT-4o Now Available In The API As Well…

I switched most of my automations over to GPT-4o last night and I'm seeing really great results, with less "fluffy" language being used.

Make.com has not yet updated to V2 of the Assistants API, so GPT-4o can't be used in Assistant completions there yet, but hopefully an update to the headers being passed will ship in the next day or so.
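If you're making the same switch directly against the OpenAI API rather than through Make.com, the change is essentially just the model name. Here is a minimal sketch with the official openai Python package (v1+); the prompt and temperature are illustrative placeholders, not settings from my automations.

```python
# Minimal sketch: calling GPT-4o through the OpenAI Chat Completions API.
# Requires the official `openai` Python package (v1+) and OPENAI_API_KEY
# set in the environment; the prompt below is purely illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # drop-in replacement for "gpt-4-turbo" in existing calls
    messages=[
        {"role": "system", "content": "Answer concisely, without filler."},
        {"role": "user", "content": "Summarize today's AI announcements in two sentences."},
    ],
    temperature=0.3,
)
print(response.choices[0].message.content)
```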

Recommended Newsletter

Learn AI in 5 Minutes a Day

AI Tool Report is one of the fastest-growing and most respected newsletters in the world, with over 550,000 readers from companies like OpenAI, Nvidia, Meta, Microsoft, and more.

Our research team spends hundreds of hours a week summarizing the latest news and finding you the best opportunities to save time and earn more using AI.

FEEDBACK LOOP

Sincerely, How Did We Do With This Issue?

I would really appreciate your feedback to make this newsletter better...


LIKE IT, SHARE IT

That’s all for today.