Google's New AI Beast

Google's Gemini just made GPT-4 look like a baby’s toy?

Estimated read time: 1:20

    Summary

    Google has unveiled its highly anticipated Gemini model, a multimodal large language model that outperforms GPT-4 on almost every benchmark. Gemini can process text, sound, images, and video, demonstrating impressive capabilities such as recognizing activities in a live video feed and responding in real time. It performs complex tasks like generating images and music and tackling logic-based questions. Gemini comes in three sizes—Nano, Pro, and Ultra (jokingly dubbed Tall, Grande, and Venti in the video)—with Ultra the most advanced, though not yet publicly available. Despite its impressive benchmarks, Gemini Ultra underperforms GPT-4 on common-sense reasoning (the HellaSwag benchmark). The model is currently available in the Bard chatbot using the Gemini Pro version, and further releases are expected after more safety testing.

      Highlights

      • Google's Gemini beats GPT-4 in nearly every benchmark. 🚀
      • Gemini can track objects in a video, like locating a ball under shuffled cups. 🎥🥤
      • It generates images and audio from text and visuals, much like a creative factory. 🏭
      • Gemini is available now in the Bard chatbot, with more models to follow. 💬
      • Despite its prowess, it struggles with the HellaSwag benchmark, hinting at areas for improvement. ⚠️

      Key Takeaways

      • Gemini can recognize and respond to ongoing video feeds in real-time. 🕶️
      • It supports multiple languages and can process multimodal inputs like images and sound. 🌐
      • Creates content on the fly, from music to blueprints. 🎶🖼️
      • Excels at tasks involving logic and spatial reasoning. 📐
      • Gemini Ultra outperforms GPT-4, except in common sense reasoning. 🤔
      • It was trained on version 5 tensor processing units, which can dynamically reconfigure into 3D torus topologies to cut inter-chip latency. 🔄
      • The Nano and Pro models will be available soon, while Ultra awaits further testing. ⏳
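The takeaway above about tensor processing units refers to TPU pods that, per the transcript, can reconfigure into ring-like (3D torus) topologies. A toy hop-count calculation shows why wraparound links help; this is a simplified illustration of torus distances in general, not Google's actual interconnect logic:

```python
# Toy illustration of why a torus (a mesh with wraparound links) lowers
# worst-case hop counts between chips compared with a plain line of chips.
# This is a generic sketch, not Google's actual TPU interconnect.

def mesh_hops(a: int, b: int) -> int:
    """Hops between positions a and b on a line with no wraparound."""
    return abs(a - b)

def torus_hops(a: int, b: int, size: int) -> int:
    """Hops on a ring of `size` nodes: traffic may wrap around the ends."""
    direct = abs(a - b)
    return min(direct, size - direct)

def torus3d_hops(p: tuple, q: tuple, dims: tuple) -> int:
    """Distance on a 3D torus: sum the per-dimension ring hops."""
    return sum(torus_hops(a, b, n) for a, b, n in zip(p, q, dims))

# Worst case on a 16-chip line: chip 0 to chip 15 takes 15 hops...
print(mesh_hops(0, 15))                                      # 15
# ...but on a ring the message wraps around in a single hop.
print(torus_hops(0, 15, 16))                                 # 1
# In 3D, the saving compounds across all three dimensions.
print(torus3d_hops((0, 0, 0), (15, 15, 15), (16, 16, 16)))   # 3
```

The same wraparound argument is what "shape-shifting into donuts" buys: the farthest corners of the pod become near neighbors.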

      Overview

      Google's new AI model, Gemini, is turning heads by outperforming OpenAI's GPT-4 (the model behind Microsoft's Bing) on several benchmarks. This futuristic model isn't just about text—it integrates text with images, sounds, and video, offering a taste of what AI can do across different media. It feels like the future is here, with models now recognizing video feeds in real time and generating everything from music to structural blueprints on command!

        The battle of the AI titans heats up as Gemini introduces intriguing features like multimodal outputs and sophisticated logic capabilities. The model's prowess spans multiple languages, real-time video tracking, and generating original art forms, highlighting just how far AI has come. What's more, Google's leveraging of ultra-efficient tensor processing units means we're seeing previously unimaginable processing speeds and capabilities.

          Even though most are in awe of Gemini's abilities, there are some bumps to iron out. Its performance on benchmarks of common-sense reasoning is still a work in progress. And while the Gemini Pro model is already available in the Bard chatbot, the full potential of Gemini Ultra will be realized once additional safety tests are concluded. Google Cloud users can expect the Nano and Pro models soon, with Ultra following next year. Until then, the AI community watches eagerly to see how this game-changer evolves.

            Chapters

            • 00:00 - 00:30: Introduction and Context The chapter delves into the fierce competition between Google and Microsoft in the realm of artificial intelligence as of December 7th, 2023. Following Microsoft's integration of OpenAI's GPT-4 into Bing, which positioned Bing favorably among users, Google responded by launching its own AI model, Gemini. This model surpasses GPT-4 on several benchmarks. Gemini, a multimodal large language model, is celebrated for its ability to process and understand text, sound, images, and video simultaneously. It offers real-time responses to video feeds and excels in multilingual communication and logic tasks.
            • 00:30 - 01:30: Google's Comeback with Gemini In this chapter, the focus is on Google's efforts to counteract Microsoft's dominance in the AI space following the release of GPT-4. Google's new AI model, Gemini, is introduced as a multimodal large language model, promising to outperform GPT-4 across various benchmarks. Gemini's capabilities include real-time video recognition, multilingual processing, and multimodal outputs like generating images and audio from text and video inputs. It demonstrates advanced functions such as spatial reasoning and logic, capable of tracking objects in dynamic scenarios and assisting engineers with design tasks.
            • 01:30 - 02:30: AlphaCode 2 Unveiled Google's battle against Microsoft in the AI industry has intensified with the release of Gemini, a new AI model that surpasses GPT-4 in many benchmarks. The model is multimodal, handling text, sound, images, and video, with advanced real-time recognition and response capabilities. It can understand ongoing video feeds, perform tasks like finding the ball under a shuffled cup, and generate multimodal outputs (e.g., turning images into audio). Its practical applications include assisting engineers by creating blueprints from simple images, indicating a potential shift across multiple engineering fields.
            • 02:30 - 03:30: Comparing Gemini and GPT-4 In this section of the discussion, the capabilities of Gemini and GPT-4 are compared, particularly focusing on their performance in various benchmarks. Gemini Ultra, while surpassing GPT-4 in many categories, oddly underperforms in the HellaSwag Benchmark, which assesses common sense and natural language comprehension. Notably, GPT-4 shows a more human-like understanding in this area despite its other limitations.
            • 03:30 - 04:30: Performance and Benchmarks of Gemini Bard running Gemini Pro, although still very fast, is not quite as good as GPT-4. Gemini Ultra, however, is noted as a serious competitor, one that seems to cause some unease in GPT-4's responses.
            • 04:30 - 05:00: Availability and Conclusion The chapter starts with a comparison between GPT-4 and Gemini Ultra, highlighting that while GPT-4 seems nervous about Gemini Ultra, Gemini Ultra outperforms GPT-4 on several benchmarks, especially massive multitask language understanding. However, it falls short on the HellaSwag benchmark, which evaluates common-sense natural language tasks.
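The HellaSwag benchmark mentioned in the chapters above gives a model a context and several candidate endings, and scores it on picking the most plausible one. Here's a minimal sketch of that kind of multiple-choice evaluation loop; `score_ending` is a toy stand-in (a real harness would use the model's log-likelihood for each ending):

```python
# Minimal sketch of a HellaSwag-style multiple-choice evaluation loop.
# The scorer below is a toy stand-in, not how GPT-4 or Gemini are scored:
# a real harness asks the model for the likelihood of each ending.

def score_ending(context: str, ending: str) -> float:
    # Toy heuristic: favor endings that reuse words from the context.
    ctx_words = set(context.lower().split())
    end_words = ending.lower().split()
    overlap = sum(1 for w in end_words if w in ctx_words)
    return overlap / max(len(end_words), 1)

def pick_ending(context: str, endings: list) -> int:
    """Return the index of the highest-scoring candidate ending."""
    scores = [score_ending(context, e) for e in endings]
    return scores.index(max(scores))

# One hypothetical item in the benchmark's format (context, endings, label).
item = {
    "context": "A man watches a coding video and afterwards feels",
    "endings": ["like the video taught him about coding",
                "hungry for soup", "the ocean rising", "purple"],
    "label": 0,
}
choice = pick_ending(item["context"], item["endings"])
accuracy = 1.0 if choice == item["label"] else 0.0
```

Benchmark accuracy is just this check averaged over thousands of such items, which is why a gap on HellaSwag translates directly into less human-like sentence completion.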

            Google's Gemini just made GPT-4 look like a baby’s toy? Transcription

            • Segment 1: 00:00 - 02:30 Make no mistake: Google got obliterated by Microsoft's blitzkrieg attack in the great AI war of 2023. GPT-4 captured the zeitgeist of the artificial-intelligence age we just entered, and things got so bad for Google that people unironically started using Bing. But the war is just getting started, and just yesterday Google unleashed its highly anticipated Gemini model, which beats GPT-4 on nearly every benchmark. It is December 7th, 2023, and you're watching The Code Report. Gemini first became known to the public earlier this year at Google I/O, when Sundar explained it like this: you've been applying AI to make AI, rigorously tested AI with AI. Gemini is a multimodal large language model that will replace LaMDA and PaLM 2. Like GPT-4, it's multimodal, which means it's not only trained on text but also sound, images, and video. Google's demo is absolutely insane. It can recognize what's going on in a video feed and respond in real time: this guy draws a duck, then the AI tells him it's a duck. It is a duck. And it can do that in multiple languages. What's really crazy, though, is that it can keep track of things in an ongoing video feed: it plays the game of find-the-ball-under-the-cup, and even after the cups are scrambled up it still knows where the ball is. It can even do connect-the-dots, which makes my five-year-old obsolete. It also does multimodal outputs: it can generate images on the fly like Stable Diffusion, and can even generate music based on a prompt, and not just text-to-audio but image-to-audio. How about some '80s hair metal? It's an anything-to-anything model. It's also good at logic and spatial reasoning: using these two pictures, it's able to tell you which car will go faster based on the aerodynamics of the vehicle. In the future, a civil engineer will be able to just take a picture of some land, then the AI can instantly generate some blueprints for a bridge, so software engineers aren't the only type
            • Segment 2: 00:00 - 02:30 of engineers becoming obsolete, although I do, of course, have some more bad news for programmers. Google also unveiled AlphaCode 2, which performs better than 90% of competitive programmers, and we're talking about programmers solving highly complex, abstract problems like you might find in Codeforces competitions. Like any good programmer, AlphaCode 2 can break down problems into smaller subproblems using techniques like dynamic programming. Now, all these demos look really amazing at first glance, but is this all just marketing sleight of hand from Google? Well, currently Gemini comes in three sizes: tall, grande, and venti. The smallest version is designed to be embedded on devices like Android phones, the Pro version is your more general-purpose model, while Ultra is like the Magnum XL of the Gemini family and the one that's blowing everybody's minds. If you're in the United States, you can actually use Gemini right now in the Bard chatbot; however, it's using Gemini Pro, the mid-range version. Bard is way better than it was six months ago, and it's
            • Segment 3: 02:30 - 05:00 still extremely fast, but after using it for a few minutes it's pretty obvious that it's not quite as good as GPT-4. But GPT-4 is nervous about Gemini Ultra: when I asked about it, it started throwing mad shade at itself, and then, before it finished, Sam Altman pulled the plug, giving me this network error. When it comes to benchmarks, Gemini Pro underperforms GPT-4 in most situations, but Gemini Ultra outperforms it on almost every single category. Most notably, it's the first model ever to outperform human experts on massive multitask language understanding, which is typically a multiple-choice test over a wide array of subjects, kind of like the SATs but for AI. What's hella surprising, though, is that Gemini Ultra underperforms GPT-4 on the HellaSwag benchmark. It's designed to evaluate common-sense natural language by having the AI finish a sentence that's often vague and ambiguous, for example: a man watches a Fireship video and afterwards feels blank. It's a job that's really easy for humans to do, and a very important benchmark, because when an AI can't do this well, it doesn't feel very human-like. In GPT-4, I can write a vague prompt with typos, and somehow it almost always seems to know what I'm talking about. The fact that GPT-4 is doing so much better on HellaSwag is hella concerning, to say the least. But another interesting thing to note from the technical paper is how they trained this beast: they used their newly unveiled version 5 tensor processing units, which are deployed in SuperPods of 4,096 chips. Each SuperPod has a dedicated optical switch, which allows data to transfer quickly between the pods to train in parallel; then they can dynamically reconfigure into 3D torus topologies. In other words, they can shape-shift into donuts to reduce the latency between chips. And the scale of Gemini Ultra is so large that they had to communicate between multiple data centers. The paper also describes the
            • Segment 4: 02:30 - 05:00 training data set, which basically includes everything you can find on the internet, including web pages and YouTube videos, as well as scientific papers and books. They filter it for quality, then use reinforcement learning from human feedback to fine-tune the quality and avoid hallucinations. Overall, Gemini looks amazing on paper, but prepare to be disappointed: the Nano and Pro models will be available on Google Cloud on December 13th, but the Gemini Ultra Pro Max won't be available until next year, until additional safety tests are done and it reaches 100% on the hella woke benchmark. This has been The Code Report. Thanks for watching, and I will see you in the next one.
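The transcript mentions AlphaCode 2 breaking problems into smaller subproblems using dynamic programming. As an illustration of that general technique (generic DP, not AlphaCode 2's actual code), here is a classic competitive-programming example, minimum coins to make change:

```python
# Classic dynamic-programming example of the "break the problem into
# smaller subproblems" technique the transcript attributes to AlphaCode 2.
# (A generic illustration of DP, not AlphaCode 2's actual approach.)

def min_coins(coins: list, amount: int) -> int:
    """Fewest coins summing to `amount`, or -1 if it can't be made."""
    INF = float("inf")
    # best[a] = fewest coins needed to make amount a (the subproblem).
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1  # reuse the smaller subproblem
    return best[amount] if best[amount] != INF else -1

print(min_coins([1, 5, 11], 15))  # 3 (5+5+5 beats the greedy 11+1+1+1+1)
```

Note that a greedy strategy fails here (taking the 11 first needs five coins), which is exactly why solving and caching the smaller subproblems matters on Codeforces-style tasks.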