Name: Gemini 2.5 Flash
Price: 20 USD
Availability: InStock
Author: Google

Gemini 2.5 Flash by Google | AI Market Cap

HF Papers · Google · research · 1w ago

JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence

Many moments in the real world do not wait for a user to ask. A fire starts on a security monitor, an expression flickers across a video call, or a product a viewer wants flashes by in a livestream. Yet today's large models remain mostly turn-based by design: they answer only when addressed, and even video-call apps that appear interactive still operate as question-answer systems, reacting only when polled or prompted. We argue for a different paradigm: a model that is present in the world like a person. It continuously watches what is happening now, decides on its own whether to speak or stay silent, interacts in real time, and delegates to a background model when the problem is hard. To advance interaction models and their adoption across domains, we make two fully open-sourced contributions. First, we release JoyAI-VL-Interaction, an 8B-scale, vision-first VL-interaction model. The model makes the response decision internally, choosing each second to stay silent, respond, or delegate to a background model, and it excels at vision-triggered responsiveness and time awareness. We pair it with a transferable training recipe from which capabilities we never trained for emerge, such as guiding a shopper through changing app screens or improvising a lecture from a slide deck. Second, we release a complete, deployable system built around that model. The system streams any ongoing video into the model, making it genuinely present in the world. All other components are pluggable, including ASR/TTS modules, memory, visualization UI, and a background brain that can connect to any API or agent. Across six real-world scenarios, human raters prefer JoyAI-VL-Interaction over the in-app video-call assistants of Doubao and Gemini by a wide margin. To our knowledge, this is the first open, vision-driven interaction model released together with its training recipe, data, and complete deployable system.


#huggingface #daily-papers
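The JoyAI-VL-Interaction abstract describes a per-second choice among staying silent, responding, or delegating to a background model. A minimal sketch of that control loop follows, assuming each one-second tick yields a hypothetical (silent, respond, delegate) score triple; the model in the paper makes this decision internally, so `decide`, `run`, and the scores are illustrative names and values, not the released system's API.

```python
from enum import Enum


class Action(Enum):
    SILENT = 0
    RESPOND = 1
    DELEGATE = 2


def decide(scores):
    """Pick the highest-scoring action for one 1-second tick.

    `scores` is a hypothetical (silent, respond, delegate) triple. Python's
    max() keeps the first maximum, so ties resolve to the lower index
    (SILENT wins a tie).
    """
    return Action(max(range(3), key=lambda i: scores[i]))


def run(ticks):
    """Route each tick's decision over a stream of per-second scores.

    Returns (spoken, delegated): the tick indices at which the model would
    answer directly or hand off to the (hypothetical) background model.
    """
    spoken, delegated = [], []
    for sec, scores in enumerate(ticks):
        action = decide(scores)
        if action is Action.RESPOND:
            spoken.append(sec)
        elif action is Action.DELEGATE:
            delegated.append(sec)
        # Action.SILENT: emit nothing and keep watching the stream.
    return spoken, delegated


print(run([(0.9, 0.05, 0.05), (0.1, 0.8, 0.1), (0.2, 0.1, 0.7), (0.6, 0.3, 0.1)]))
# → ([1], [2])
```

Silence is treated as an explicit action rather than the absence of output, which mirrors the abstract's framing that staying quiet is itself a decision made every second.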

HF Papers · Google · research · 3w ago

OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification

On-Policy Distillation (OPD) trains a student model on its own generative trajectories under dense token-level feedback from a stronger teacher, mitigating both the off-policy distribution shift of Supervised Fine-Tuning (SFT) and the sparse credit assignment of Reinforcement Learning (RL). However, standard OPD faces two coupled limitations. First, it requires direct access to the teacher's token-level logits, excluding a broad class of capable proprietary models from serving as teachers. Second, the token-level logit signal itself is brittle, depending on a narrow overlap of plausible next tokens between teacher and student, and prone to amplifying degenerate patterns such as repetition loops. In this paper, we introduce OmniOPD, a novel framework that addresses both limitations through a logit-free, chunk-level supervision signal. OmniOPD replaces deterministic logit matching with Monte Carlo rollouts that approximate the teacher's local preferences through a continuous semantic similarity metric over multi-token chunks, and concentrates this supervision via a peak-entropy scheduler that audits the student only at its high-uncertainty reasoning forks. A Dirichlet-Multinomial Bayesian prior and a base-model KL anchor further bound the variance of discrete sampling and prevent policy collapse across unaudited tokens. Across competitive benchmarks, OmniOPD surpasses the standard OPD approach by up to +28.64% on math, confirming that chunk-level semantic verification extracts a more reliable learning signal than token-level logit matching, whose high information density is offset by significant noise and brittleness. Furthermore, when paired with stronger black-box teachers such as Claude-4.5-Haiku and Gemini-2.5-Flash, OmniOPD achieves an additional +9.54% relative on math over its open-weight teacher counterpart, advancing the student past the performance of self-exploratory RL.


#huggingface #daily-papers
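Two ideas in the OmniOPD abstract can be sketched concretely: auditing the student only at its highest-entropy steps, and scoring a multi-token chunk by its average similarity to several sampled teacher continuations. The sketch below is a rough illustration under stated assumptions, not the paper's implementation. `peak_entropy_positions` is a simple top-k over per-step Shannon entropy, and `jaccard` is a crude token-overlap stand-in for the continuous semantic similarity metric the paper uses. The Dirichlet-Multinomial prior and KL anchor are omitted. All function names are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def peak_entropy_positions(step_probs, k):
    """Indices of the k highest-entropy generation steps, in sequence order.

    These play the role of the student's high-uncertainty "reasoning forks",
    where this sketch would spend teacher verification.
    """
    ents = [token_entropy(p) for p in step_probs]
    top = sorted(range(len(ents)), key=lambda i: ents[i], reverse=True)[:k]
    return sorted(top)


def jaccard(a, b):
    """Token-set overlap; a crude stand-in for a semantic similarity metric."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def chunk_reward(student_chunk, teacher_rollouts, sim=jaccard):
    """Monte Carlo estimate: mean similarity of the student's chunk to several
    teacher continuations sampled from the same prefix (no logits needed)."""
    return sum(sim(student_chunk, t) for t in teacher_rollouts) / len(teacher_rollouts)


steps = [[1.0], [0.5, 0.5], [0.9, 0.1], [0.25] * 4]
print(peak_entropy_positions(steps, k=2))  # → [1, 3]
print(chunk_reward(["x", "=", "2"], [["x", "=", "2"], ["x", "=", "3"]]))  # → 0.75
```

Because `chunk_reward` only compares sampled text, it works with any teacher that returns generations, which is the property that lets black-box models serve as teachers.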

HF Papers · Google · research · 1mo ago

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Recent progress in multimodal large language models (MLLMs) has moved AI capabilities from static offline data processing to real-time streaming interaction, yet these models remain far from human-level multimodal interaction. The key bottlenecks are no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, preventing models from incorporating new inputs for timely adjustment during generation. Second, most current models remain reactive, responding only to explicit user requests instead of acting proactively in the evolving multimodal environment. We present MiniCPM-o 4.5, our latest effort towards human-like multimodal interaction, which mitigates these gaps through real-time full-duplex omni-modal interaction. It can see, listen, and speak simultaneously in real time, while also exhibiting proactive behaviors such as issuing reminders or comments based on its continuous understanding of the live scene. The key technique behind MiniCPM-o 4.5 is Omni-Flow, a unified streaming framework that aligns omni-modal inputs and outputs along a shared temporal axis. This formulation converts conventional turn-based interaction into a full-duplex, time-aligned process, enabling simultaneous perception and response and allowing proactive behavior to arise within the same framework. With a total of 9B parameters, MiniCPM-o 4.5 approaches Gemini 2.5 Flash in vision-language capabilities, delivering state-of-the-art open-source performance at its scale. It also surpasses Qwen3-Omni-30B-A3B in omni-modal understanding and delivers better speech generation, with significantly higher computational efficiency. Driven by its efficient architecture design and inference optimization, the model can perform real-time full-duplex omni-modal interaction on edge devices using less than 12 GB of RAM.


#huggingface #daily-papers
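The MiniCPM-o 4.5 abstract says Omni-Flow aligns inputs and outputs along a shared temporal axis. One way to picture that is to bucket timestamped events from every stream into fixed-length time slots, so that perception and speech output for the same moment sit side by side instead of in alternating turns. The sketch below assumes a 1-second slot, a hypothetical fixed modality order within each slot, and hypothetical names (`Event`, `align`). It illustrates time alignment in general and is not Omni-Flow's actual tokenization.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t: float        # timestamp in seconds on the shared clock (assumed >= 0)
    modality: str   # one of MODALITY_ORDER
    payload: str


# Hypothetical fixed ordering of streams inside each time slot.
MODALITY_ORDER = ("video", "audio_in", "speech_out")


def align(events, slot_s=1.0):
    """Bucket events into consecutive slot_s-second slots on one time axis.

    Empty slots are kept as [] so the timeline stays contiguous; within a
    slot, events are ordered by modality, then by timestamp.
    """
    slots = defaultdict(list)
    for e in events:
        slots[int(e.t // slot_s)].append(e)
    if not slots:
        return []
    timeline = []
    for i in range(max(slots) + 1):
        bucket = sorted(slots.get(i, []),
                        key=lambda e: (MODALITY_ORDER.index(e.modality), e.t))
        timeline.append([e.payload for e in bucket])
    return timeline


events = [
    Event(0.2, "audio_in", "hi"),
    Event(0.0, "video", "frame0"),
    Event(1.1, "video", "frame1"),
    Event(1.5, "speech_out", "hello"),
    Event(3.0, "video", "frame3"),
]
print(align(events))  # → [['frame0', 'hi'], ['frame1', 'hello'], [], ['frame3']]
```

Keeping empty slots makes silence visible on the timeline, which is what lets a time-aligned model decide to speak at a given moment rather than only after a user turn ends.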

Gemini 2.5 Flash


Google DeepMind 🤝 @A24 We’re launching a research partnership with A24 to ensure the tools of the future are shaped by the creators who use them. Find out more → https://t.co/KN3HdGVjGS https://t.co/

Social & Blog Posts (5)

Benchmarks & Rankings (13)

Gemini 2.5 Flash Benchmark Update

Research Papers (8)

Other

gemini-2.5-flash - Arena-Hard-Auto

Our Robotics Accelerator has launched with 15 startups helping shape the future of physical AI in Europe. 🤖 This three-month program will connect them with access to our AI stack, Gemini Robotics mod

When millions of AI agents interact with each other, new collective behaviors can emerge. 🌐 Together with @schmidtsciences, @coop_ai, @ARIA_research and supported by @GoogleOrg, we’re launching a $10

gemini-2.5 - Arena-Hard-Auto

In Sierra Leone, a surging student population is outpacing available teachers. Our latest research explores how AI can act as a partner to support educators in these environments – amplifying their re

DiffusionGemma is our new experimental open model with up to 4x faster output on dedicated GPUs. Instead of predicting word-by-word, it generates entire blocks of text simultaneously. This lets the mo

The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation

Physics-IQ Verified

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence

τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems

OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

gemini-2.5-flash - SWE-Bench Verified

Gemini-2.5-Flash-04-17 - LiveCodeBench