Name: Gemini 2
Price: 20 USD
Availability: InStock
Rating: 52.2 (1 review)
Author: Google

Gemini 2 by Google | AI Market Cap

HF Papers | Google | research | 1mo ago

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs offer distinct advantages across diverse domains, yet current frameworks fail to exploit the complementary strengths of models and skills, thereby limiting their performance on downstream tasks. In this paper, we present Maestro (Multimodal Agent for Expert-Skill Targeted Reinforced Orchestration), a Reinforcement Learning (RL)-driven orchestration framework that reframes heterogeneous multimodal tasks as a sequential decision-making process over a hierarchical model-skill registry. Rather than consolidating all knowledge into a single model, Maestro trains a lightweight policy to dynamically compose ensembles of frozen expert models and a two-tier skill library, deciding at each step whether to invoke an external expert, which model-skill pair to select, and when to terminate. The policy is optimized via outcome-based RL, requiring no step-level supervision. We evaluate Maestro across ten representative multimodal benchmarks spanning mathematical reasoning, chart understanding, high-resolution perception, and domain-specific analysis. With only a 4B orchestrator, Maestro achieves an average accuracy of 70.1%, surpassing both GPT-5 (69.3%) and Gemini-2.5-Pro (68.7%). Crucially, the learned coordination policy generalizes to unseen models and skills without retraining: augmenting the registry with out-of-domain experts yields a 59.5% average on four challenging benchmarks, outperforming all closed-source baselines. Maestro further maintains high computational efficiency with low latency. The source code is available at https://github.com/jinyangwu/Maestro.
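The decision loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Registry`, `run_episode`, `scripted_policy`, and `outcome_reward` are hypothetical, the "experts" are plain Python callables rather than frozen models, and the policy is a hand-written stub where Maestro would use a learned 4B orchestrator. The sketch only shows the shape of the process: at each step the policy either picks a model-skill pair from the registry or terminates, and a single reward is computed from the final answer with no step-level supervision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical action type: a (model, skill) pair, or None meaning "terminate".
Action = Optional[Tuple[str, str]]


@dataclass
class Registry:
    """Toy stand-in for a model-skill registry (assumption: experts are callables)."""
    experts: Dict[Tuple[str, str], Callable[[str], str]] = field(default_factory=dict)

    def register(self, model: str, skill: str, fn: Callable[[str], str]) -> None:
        # New experts can be added at any time without touching the policy.
        self.experts[(model, skill)] = fn

    def actions(self) -> List[Action]:
        # The action space is "terminate" plus every registered model-skill pair.
        return [None] + sorted(self.experts)


def run_episode(task: str, registry: Registry,
                policy: Callable[[List[str], List[Action]], Action],
                max_steps: int = 4) -> Tuple[str, List[Action]]:
    """Sequential decision loop: invoke an expert or stop, one action per step."""
    context = [task]
    trajectory: List[Action] = []
    for _ in range(max_steps):
        action = policy(context, registry.actions())
        trajectory.append(action)
        if action is None:  # the policy chose to terminate
            break
        context.append(registry.experts[action](context[-1]))
    return context[-1], trajectory


def outcome_reward(answer: str, gold: str) -> float:
    """Outcome-based signal: one scalar for the whole trajectory."""
    return 1.0 if answer == gold else 0.0


def scripted_policy(context: List[str], actions: List[Action]) -> Action:
    # Hypothetical fixed policy for demonstration only; a trained policy
    # would score the available actions given the context instead.
    return ("toy-model", "upper") if len(context) < 2 else None


if __name__ == "__main__":
    reg = Registry()
    reg.register("toy-model", "upper", str.upper)
    answer, traj = run_episode("abc", reg, scripted_policy)
    print(answer, traj, outcome_reward(answer, "ABC"))
```

In an actual training setup, the scalar from `outcome_reward` would be used to update the policy over many sampled trajectories; that optimization step is omitted here because the abstract does not specify the algorithm.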


#huggingface #daily-papers

Gemini 2

Similar Models

As generative AI tools continue to evolve, we believe it's more important than ever to know what's AI-generated and what isn't. That’s why @GoogleDeepMind launched SynthID in 2023—a technology that ad

Social & Blog Posts (2)

Research Papers (18)

Other

gemini-2.5-flash - Arena-Hard-Auto

Google DeepMind 🤝 @A24 We’re launching a research partnership with A24 to ensure the tools of the future are shaped by the creators who use them. Find out more → https://t.co/KN3HdGVjGS https://t.co/

gemini-2.5-flash - SWE-Bench Verified

Gemini-2.5-Flash-04-17 - LiveCodeBench

Representation Distribution Matching for One-Step Visual Generation

From SRA to Self-Flow: Data Augmentation or Self-Supervision?

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

GEAR: Guided End-to-End AutoRegression for Image Synthesis

PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents

Towards Automating Scientific Review with Google's Paper Assistant Tool

Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation

Confidence-Aware Tool Orchestration for Robust Video Understanding

MIMFlow: Integrating Masked Image Modeling with Normalizing Flows for End-to-End Image Generation

DiffusionBench: On Holistic Evaluation of Diffusion Transformers

SDR: Set-Distance Rewards for Radiology Report Generation

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

VoxMind: An End-to-End Agentic Spoken Dialogue System

ROSE: Retrieval-Oriented Segmentation Enhancement

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
