Name: Qwen3 14B
Price: 0.1 USD
Availability: InStock
Rating: 33.5 (1 review)
Author: Qwen

Research · DeepSeek · 2w ago

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research shows that scaling the number of environments improves RL performance, existing methods that build environments manually or one at a time are limited to linear scaling, which hinders scalable reasoning generalization. This paper introduces RACES (Recursive Automated Composition for Environment Scaling), a framework that treats verifiable environments as composable building blocks that can be recursively assembled. The key insight is that when the codomain (output type) of one environment matches the domain (input type) of another, the two can be automatically fused into a new verifiable environment, enabling recursive composition. RACES is implemented with 300 individual environments and defines a set of composition operators (SEQUENTIAL, PARALLEL, SORT, and SELECT) that induce diverse reasoning patterns. Extensive experiments show that RL training on these composite environments consistently improves reasoning generalization. Specifically, on six benchmarks that were unseen during construction of the training environments, RACES improves DeepSeek-R1-Distill-Qwen-14B by an average of 3.1 points (from 48.2 to 51.3) and raises Qwen3-14B from 58.8 to 61.1. Moreover, with only 50 base environments, RACES achieves performance comparable to training on 300 individual environments, demonstrating significant efficiency in environment utilization.
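The composition rule described above (an environment whose codomain matches another's domain can be fused with it into a new verifiable environment) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Env` class, the `sequential` and `parallel` helpers, and the toy base environments are illustrative names, not the RACES implementation. Only SEQUENTIAL and PARALLEL are modelled, because the abstract does not specify how SORT and SELECT behave.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Env:
    """Hypothetical verifiable environment: samples a problem of type `domain`
    and checks an answer of type `codomain` against a reference solver."""
    name: str
    domain: type
    codomain: type
    sample: Callable[[random.Random], Any]  # draws a problem instance
    solve: Callable[[Any], Any]             # reference solution (ground truth)

    def verify(self, problem: Any, answer: Any) -> bool:
        # Rule-based reward: correct iff the answer matches the reference.
        return answer == self.solve(problem)


def sequential(first: Env, second: Env) -> Env:
    """Assumed SEQUENTIAL: feed first's output into second (types must match)."""
    if first.codomain != second.domain:
        raise TypeError(
            f"cannot chain {first.name} -> {second.name}: "
            f"{first.codomain.__name__} != {second.domain.__name__}"
        )
    return Env(
        name=f"SEQ({first.name}, {second.name})",
        domain=first.domain,
        codomain=second.codomain,
        sample=first.sample,
        solve=lambda x: second.solve(first.solve(x)),
    )


def parallel(*envs: Env) -> Env:
    """Assumed PARALLEL: pose independent sub-problems together; the answer is
    a tuple. This sketch records only `tuple`, not the element types."""
    return Env(
        name="PAR(" + ", ".join(e.name for e in envs) + ")",
        domain=tuple,
        codomain=tuple,
        sample=lambda rng: tuple(e.sample(rng) for e in envs),
        solve=lambda xs: tuple(e.solve(x) for e, x in zip(envs, xs)),
    )


# Three toy base environments (illustrative only).
count_vowels = Env("count_vowels", str, int,
                   sample=lambda rng: "".join(rng.choice("abcdeiou") for _ in range(8)),
                   solve=lambda s: sum(c in "aeiou" for c in s))
square = Env("square", int, int,
             sample=lambda rng: rng.randint(0, 20),
             solve=lambda n: n * n)
is_even = Env("is_even", int, bool,
              sample=lambda rng: rng.randint(0, 100),
              solve=lambda n: n % 2 == 0)

# Recursive composition: each composite is itself an Env and can be reused.
level1 = sequential(count_vowels, square)  # str -> int
level2 = sequential(level1, is_even)       # str -> bool
both = parallel(level2, square)            # (str, int) -> (bool, int)

if __name__ == "__main__":
    rng = random.Random(0)
    problem = level2.sample(rng)
    print(level2.name, repr(problem), level2.solve(problem))
```

Because every composite carries its own sampler and verifier, it can be passed back into `sequential` or `parallel` without writing a new checker by hand, while a type mismatch such as `sequential(is_even, square)` (bool into int) is rejected. This mirrors, under the assumptions above, the recursive property the abstract uses to grow the environment pool from a fixed set of base environments.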


Qwen3 14B

Similar Models

https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct - SWE-Bench Verified

Qwen3 - GAIA

Research Papers (9)

Other

Qwen3 14B is now available on Ollama

Qwen3 - GAIA


Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization


TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

ECHO: Terminal Agents Learn World Models for Free

From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

NGM: A Plug-and-Play Training-Free Memory Module for LLMs

TEMPO: Scaling Test-time Training for Large Reasoning Models

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

Fusian: Multi-LoRA Fusion for Fine-Grained Continuous MBTI Personality Control in Large Language Models
