Second-generation open-weights model from Google. Offers competitive performance in a compact form factor; available in 2B, 9B, and 27B parameter variants.
Running this yourself: the smaller variants can likely run on your own machine.
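Whether a given variant fits on a local machine depends largely on how much memory its weights need. The sketch below is back-of-the-envelope arithmetic only. It assumes the nominal parameter counts from the listing (2B, 9B, 27B) and common storage precisions, and it ignores activations, the KV cache, and runtime overhead. The names `NOMINAL_PARAMS` and `BYTES_PER_PARAM` are illustrative, not taken from any Google tooling.

```python
# Rough weight-memory estimate for the three listed sizes.
# Assumption: nominal parameter counts; real checkpoints differ slightly.
NOMINAL_PARAMS = {"2B": 2e9, "9B": 9e9, "27B": 27e9}

# Assumed bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def memory_table() -> dict:
    """Map each variant to its estimated weight size per precision."""
    return {
        name: {p: weight_memory_gb(n, b) for p, b in BYTES_PER_PARAM.items()}
        for name, n in NOMINAL_PARAMS.items()
    }


for name, row in memory_table().items():
    cells = ", ".join(f"{p}: {gb:.1f} GB" for p, gb in row.items())
    print(f"Gemma 2 {name} -> {cells}")
```

Under these assumptions the 2B variant needs about 4 GB at 16-bit precision, while the 27B variant needs about 54 GB at 16-bit and about 13.5 GB at 4-bit, so the larger sizes typically call for quantization or a high-memory GPU.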
Model updates | refreshed 2h ago | May 30, 2026 | news + changelog
Recent launch, pricing, benchmark, and API signals linked to this model or its provider.
Launches | Google | Today
Hear the architects of Gemini reflect on their journey to continue pushing the frontier of AI, on this episode of Release Notes. @JeffDean, @koraykv, @OriolVinyalsML, and @NoamShazeer sit down on camera together to share a behind-the-scenes look at the people behind the model, https://t.co/wO1mD8Wwmc
Gemma — Google DeepMind
Look back at last week’s I/O announcements with @NotebookLM. You can listen to an audio overview, watch the video recap, and even check out our detailed slide deck summarizing all of the biggest news and launches. Check it out here: https://t.co/AIhdaw05b9
Look back at last week’s I/O announcements with @NotebookLM. You can listen to an audio overview, watch the video recap, and even check out our detailed slide deck summarizing all of the biggest news and launches. Check it out here: https://t.co/Zc69abPReX https://t.co/cXEu6AhnGV
X/Twitter | @GoogleDeepMind | Google | announcement | general | 3d ago
SynthID has already watermarked over 100 billion pieces of content, but transparency is a team sport. That’s why we’re partnering with @OpenAI, @ElevenLabs and Kakao to add SynthID watermarking to their models – accelerating the industry-wide momentum we started with @NVIDIA. https://t.co/QnshYx3EfE
#benchmark #open-llm-leaderboard #🤝 base merges and moerges
Colored Noise Diffusion Sampling
Diffusion models achieve state-of-the-art image synthesis, with their generative trajectories fundamentally exhibiting a spectral bias, resolving low-frequency global structures early and high-frequency fine details later. Conventional stochastic differential equation (SDE) solvers fail to account for this dynamic, naively injecting uniform white noise throughout the entire process and misusing the finite energy budget. In this work, we establish a mathematical framework that reconsiders SDE inference as a targeted, frequency-decoupled energy transfer. Leveraging this framework, we introduce Colored Noise Sampling (CNS), a novel, training-free stochastic solver. Rather than injecting uniform white noise, CNS utilizes a dynamic, timestep- and frequency-dependent schedule that more efficiently allocates injected energy toward structurally unresolved frequency bands. By actively exploiting the model's inherent spectral bias, CNS systematically steers the generated distribution toward the true data manifold. Extensive experiments demonstrate that CNS significantly outperforms standard ODE and SDE baselines as a strictly plug-and-play, inference-time sampler substitution across diverse architectures (SiT, JiT, FLUX). Compared to standard sampling on ImageNet-256, CNS achieves substantial unguided FID reductions, improving from 8.26 to 6.27 on SiT-XL/2, 32.39 to 26.69 on JiT-B/16, and 11.88 to 8.31 on JiT-H/16, while yielding consistent relative FID improvements with Classifier-Free Guidance. Project page is available at https://hadardavidson.github.io/CNS/.
JLT: Clean-Latent Prediction in Latent Diffusion Transformers
Flow matching with clean-data prediction has shown that regressing the clean point can exploit low-dimensional structure more effectively than predicting an ambient noised quantity. We ask whether this principle remains useful after images are mapped into a learned latent space, where compression has already removed much of the raw pixel variability. We introduce JLT, a 130M latent diffusion Transformer over frozen FLUX.2 VAE codes, and compare clean-latent prediction with a matched velocity-prediction DiT under the same representation, backbone, and training settings. Although the three variables x, epsilon, and v are linearly convertible for a fixed corruption time, a local Gaussian analysis shows that velocity regression inherits an isotropic target-covariance floor and amplifies low-variance latent directions, while clean prediction damps them. On ImageNet 256 x 256, JLT-B/1 obtains FID-50K 2.50 with classifier-free guidance, with a large matched-target gap over velocity prediction. These results suggest that prediction targets in latent diffusion are representation-dependent geometric choices, rather than interchangeable algebraic parameterizations.
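The statement that x, epsilon, and v are linearly convertible at a fixed corruption time can be checked numerically. The sketch below assumes one common flow-matching convention, z_t = t * x + (1 - t) * eps with velocity v = x - eps. The abstract does not state which path JLT uses, so the convention and all function names here are illustrative assumptions.

```python
# Assumed corruption path (not stated in the abstract):
#   z_t = t * x + (1 - t) * eps,  v = x - eps
# Vectors are plain Python lists to keep the sketch dependency-free.

def corrupt(x, eps, t):
    """Interpolate between noise (t = 0) and the clean latent (t = 1)."""
    return [t * xi + (1.0 - t) * ei for xi, ei in zip(x, eps)]


def velocity(x, eps):
    """Velocity target for the linear path above."""
    return [xi - ei for xi, ei in zip(x, eps)]


def x_from_v(z_t, v, t):
    """Recover the clean latent from a velocity: x = z_t + (1 - t) * v."""
    return [zi + (1.0 - t) * vi for zi, vi in zip(z_t, v)]


def eps_from_v(z_t, v, t):
    """Recover the noise from a velocity: eps = z_t - t * v."""
    return [zi - t * vi for zi, vi in zip(z_t, v)]


def v_from_x(z_t, x, t):
    """Recover the velocity from a clean latent: v = (x - z_t) / (1 - t), t < 1."""
    return [(xi - zi) / (1.0 - t) for xi, zi in zip(x, z_t)]
```

Each conversion is a linear map whose coefficients depend on t, which is why the targets are algebraically interchangeable even though, as the abstract argues, they are not equivalent as regression targets.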
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini
We introduce Gemini Embedding 2, a native multimodal embedding model that allows embedding video, audio, image, and text modalities in a unified representation space. We leverage the multimodal capabilities of Gemini to produce embeddings for arbitrary combinations of interleaved inputs across all these modalities that generalize well across a wide variety of tasks. Applying large-scale contrastive learning in a multi-task multi-stage training setup, we achieve state-of-the-art performance on key embedding benchmarks including unimodal, cross-modal, and multimodal retrieval spanning a diverse set of tasks. We show that our embedding model demonstrates strong performance (with a score of 62.9 R@1 on MSCOCO, 68.8 NDCG@10 on Vatex, 69.9 on MTEB multilingual and 84.0 on MTEB Code) across a variety of tasks surpassing the performance of specialized models. These unified capabilities make Gemini Embedding 2 a promising candidate for downstream use cases such as RAG, recommendation and search. Furthermore, its robust zero-shot performance across distinct fields - from astronomy and bioscience to fine arts and the culinary arts - establishes it as a highly reliable, out-of-the-box representation even for specialized domains.
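Retrieval scores such as R@1 measure how often the correct item is ranked first when candidates are sorted by embedding similarity. The sketch below is a simplified illustration with toy vectors and a single gold candidate per query. It is not the MSCOCO or Vatex evaluation protocol, and it does not call any Gemini API. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(queries, candidates, gold, k=1):
    """Fraction of queries whose gold candidate index is in the top-k by cosine."""
    hits = 0
    for query, gold_index in zip(queries, gold):
        ranked = sorted(
            range(len(candidates)),
            key=lambda i: cosine(query, candidates[i]),
            reverse=True,
        )
        if gold_index in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy embeddings standing in for model outputs (assumed values).
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.9]]
gold = [0, 1, 0]
print(recall_at_k(queries, candidates, gold, k=1))
```

In this toy setup the third query is closest to the third candidate rather than its gold item, so two of three queries are hits at k = 1 and all three are hits at k = 2.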
Modern GANs often introduce adversarial supervision on intermediate generator outputs and interpret the resulting multi-stage synthesis as coarse-to-fine hierarchical generation. In this work, we challenge this interpretation. We argue that standard scale-wise adversarial supervision does not construct a proper coarse-to-fine hierarchy: each intermediate image is independently pushed toward the real distribution at its own resolution, but this scale-wise realism does not ensure that outputs across stages represent the identical generated sample. Moreover, the scale-specific image produced at each stage is not used as an explicit refinement target for the subsequent stage. Therefore, its adversarial loss can improve a scale-specific output without constraining later stages to preserve the same sample trajectory, allowing them to move toward a different sample rather than refine the previous output. We refer to this problem as a cross-scale trajectory misalignment problem. To resolve it, we propose CAT, a Cross-scale Aligned Transformer for multi-scale adversarial generation. CAT keeps the discriminator scale-wise, so each intermediate output is evaluated at its own resolution, while adding a simple generator-side consistency regularization that aligns intermediate outputs with the final output. On class-conditional ImageNet-256, CAT-H/2 achieves an FID-50K of 1.56 with one-step inference after only 60 training epochs, outperforming strong one-step GAN and diffusion/flow baselines.
How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
Weight pruning is a standard technique for compressing large language models, yet its effect on learned internal representations remains poorly understood. We present the first systematic study of how unstructured pruning reshapes the feature geometry of language models, using Sparse Autoencoders (SAEs) as interpretability probes. Across three model families (Gemma 3 1B, Gemma 2 2B, Llama 3.2 1B), two pruning methods (magnitude and Wanda), and six sparsity levels (0 to 60%), we investigate five research questions spanning seed stability, feature survival, SAE transferability, feature fragility, and causal relevance. Our most striking finding is that rare SAE features (those with low firing rates) survive pruning far better than frequent ones, with within-condition Spearman correlations of rho = -1.0 in 11 of 17 experimental conditions. This counter-intuitive result suggests that pruning acts as implicit feature selection, preferentially destroying high-frequency generic features while preserving specialized rare ones. We further show that Wanda pruning preserves feature structure up to 3.7x better than magnitude pruning, that pre-trained SAEs remain viable on Wanda-pruned models up to 50% sparsity, and that geometric feature survival does not predict causal importance, a dissociation with implications for interpretability under compression.
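The two pruning methods named in the abstract differ in how they score a weight. The sketch below contrasts them on a toy matrix. It assumes magnitude pruning ranks all entries of a matrix by |w|, and that Wanda scores each weight by |w| times the L2 norm of its input activation and compares scores within each output row, as the method is commonly described. The paper's exact setup (per-layer grouping, calibration data) is not given in the abstract, and the helper names are illustrative.

```python
def _zero_lowest(values, scores, k):
    """Return a copy of `values` with the k lowest-scoring entries set to 0."""
    order = sorted(range(len(values)), key=lambda i: scores[i])
    drop = set(order[:k])
    return [0.0 if i in drop else v for i, v in enumerate(values)]


def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning: rank every entry of the matrix by |w|."""
    cols = len(weights[0])
    flat = [w for row in weights for w in row]
    k = int(len(flat) * sparsity)  # number of entries to zero
    pruned = _zero_lowest(flat, [abs(w) for w in flat], k)
    return [pruned[r * cols:(r + 1) * cols] for r in range(len(weights))]


def wanda_prune(weights, input_norms, sparsity):
    """Wanda-style pruning: score |w_ij| * ||x_j||, ranked within each output row."""
    k = int(len(input_norms) * sparsity)  # entries to zero per row
    pruned = []
    for row in weights:
        scores = [abs(w) * n for w, n in zip(row, input_norms)]
        pruned.append(_zero_lowest(row, scores, k))
    return pruned


# Toy weights (rows = outputs, columns = inputs) and assumed activation norms.
W = [[0.1, -2.0, 0.5, 1.0], [3.0, -0.2, 0.05, -1.5]]
norms = [10.0, 0.1, 1.0, 1.0]
print(magnitude_prune(W, 0.5))
print(wanda_prune(W, norms, 0.5))
```

In the first row, magnitude pruning drops the small weight 0.1 and keeps -2.0, while the Wanda-style score keeps 0.1 because its input has a large activation norm and drops -2.0 because its input is nearly silent.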
Gemma 2 is now available through the local Ollama runtime, with an 8K context window listed. Google Gemma 2 is a high-performing and efficient model available in three sizes: 2B, 9B, and 27B.
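A minimal sketch of querying a locally running Ollama server over its HTTP API follows. It assumes Ollama is listening on its default port (11434) and that the model has already been pulled under the tag `gemma2`. Tag names for the individual sizes may differ on your install, so treat the tag as an assumption and check your local model list.

```python
import json
import urllib.request

# Assumptions: default local Ollama endpoint and the "gemma2" model tag.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "gemma2") -> dict:
    """Build a non-streaming generation request body."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "gemma2", timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server returns one JSON object whose
    # "response" field holds the full completion.
    return body["response"]
```

Calling `generate("Explain what a context window is in one sentence.")` returns the completion as a string once the server is running. Prompts plus output beyond the listed 8K-token context will not fit in a single request.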