Fast, cost-efficient third-generation Gemini model with a 1 million token context window. Optimised for high-throughput applications requiring real-time multimodal responses.
Model updates (news + changelog), refreshed May 20, 2026
Recent launch, pricing, benchmark, and API signals linked to this model or its provider.
Launches: Google, today
For centuries, the scientific method has been our best tool for progress. But today, there’s so much data out there that it’s impossible for any one researcher to connect all the dots. We want to fix that: Introducing Gemini for Science, a collection of science tools and https://t.co/knRWV2JJsR
We partnered with artists, designers, and builders to create new AI tools that solve real problems in their creative workflows. Here’s what’s new: — Introducing Google Pics in @GoogleWorkspace: A brand-new image creation & editing tool. Move and resize objects, add text, and https://t.co/e5nJrAfUHP
We were able to sit down with the @GoogleDeepmind team behind the new Gemini Omni Flash model to hear all of their behind-the-scenes stories, memorable moments, and many, many (occasionally embarrassing) video generations. Watch the full Release Notes episode here: https://t.co/cA911hq2IL
By now, you've probably heard about Gemini Omni, our new model designed to create anything from any input, starting with video. But... what's the big deal? Let’s break it down 🧵👇 https://t.co/QbxMNZa2Wx
New upgrades to the @GeminiApp are helping you get more done: ✨Gemini Spark is your 24/7 personal AI agent that can take action on your behalf, under your direction. It seamlessly integrates with @Gmail, @GoogleDocs, and Slides to automate your workflows and, best of all, https://t.co/pMCS05HAhB
A few weeks ago, we asked our community to use @GoogleAIStudio or Canvas in @GeminiApp to help us create the Google I/O countdown. Thanks SO much to everyone who submitted, and special shoutout to the creators whose submissions helped us set the right ~vibes~ on the stage today: https://t.co/A1zMExmEVM
We want to help scientists discover their next breakthrough with AI. Gemini for Science is our new suite of experimental tools to help them explore more hypotheses, validate work at scale, unpack literature with ease, and more 🧵 https://t.co/RyHvlZCS7u
Google Flow 🤝 Gemini Omni Create more cinematic stories with our latest model, which brings batch editing, improved character consistency and more. Here’s what else is new for @FlowbyGoogle → https://t.co/corY3RwY7t #GoogleIO https://t.co/usg0Sudiv9
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (RLVR). However, SFT introduces distributional drift that neither preserves the model's original capabilities nor faithfully matches the supervision distribution. This problem is further amplified in multimodal reasoning, where perception errors and reasoning failures follow distinct drift patterns that compound during subsequent RL. We introduce PRISM, a three-stage pipeline that mitigates this drift by inserting an explicit distribution-alignment stage between SFT and RLVR. Building on the principle of on-policy distillation (OPD), PRISM casts alignment as a black-box, response-level adversarial game between the policy and a Mixture-of-Experts (MoE) discriminator with dedicated perception and reasoning experts, providing disentangled corrective signals that steer the policy toward the supervision distribution without requiring access to teacher logits. While 1.26M public demonstrations suffice for broad SFT initialization, distribution alignment demands higher-fidelity supervision; we therefore curate 113K additional demonstrations from Gemini 3 Flash, featuring dense visual grounding and step-by-step reasoning on the hardest unsolved problems. Experiments on Qwen3-VL show that PRISM consistently improves downstream RLVR performance across multiple RL algorithms (GRPO, DAPO, GSPO) and diverse multimodal benchmarks, improving average accuracy by +4.4 and +6.0 points over the SFT-to-RLVR baseline on 4B and 8B, respectively. Our code, data, and model checkpoints are publicly available at https://github.com/XIAO4579/PRISM.
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
We introduce PerceptionComp, a manually annotated benchmark for complex, long-horizon, perception-centric video reasoning. PerceptionComp is designed so that no single moment is sufficient: answering each question requires multiple temporally separated pieces of visual evidence and compositional constraints under conjunctive and sequential logic, spanning perceptual subtasks such as objects, attributes, relations, locations, actions, and events, and requiring skills including semantic recognition, visual correspondence, temporal reasoning, and spatial reasoning. The benchmark contains 1,114 highly complex questions on 279 videos from diverse domains including city walk tours, indoor villa tours, video games, and extreme outdoor sports, with 100% manual annotation. Human studies show that PerceptionComp requires substantial test-time thinking and repeated perception steps: participants take much longer than on prior benchmarks, and accuracy drops to near chance (18.97%) when rewatching is disallowed. State-of-the-art MLLMs also perform substantially worse on PerceptionComp than on existing benchmarks: the best model in our evaluation, Gemini-3-Flash, reaches only 45.96% accuracy in the five-choice setting, while open-source models remain below 40%. These results suggest that perception-centric long-horizon video reasoning remains a major bottleneck, and we hope PerceptionComp will help drive progress in perceptual reasoning.
Gemini 3 Flash is now available through local Ollama runtime and Ollama Cloud. 1M context window listed. Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.