Step 3.5 Flash Benchmark Update
Quality: 25.5/100 | Price: $0.15/M tokens | Output: 201.967 tok/s | HumanEval: 0.404%
Source: StepFun
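The listed price and output speed can be turned into a rough per-request estimate. This is a minimal sketch with two assumptions. First, the $0.15/M rate applies flatly to every token; the page does not say whether input and output tokens are priced separately. Second, only decode time is counted, so time-to-first-token and network overhead are ignored. `estimate_request` is a hypothetical helper, not part of any provider API.

```python
from dataclasses import dataclass

# Figures from the summary line above; the flat per-token rate is an assumption.
PRICE_PER_M_TOKENS_USD = 0.15
OUTPUT_TOKENS_PER_S = 201.967


@dataclass
class Estimate:
    cost_usd: float
    decode_seconds: float


def estimate_request(prompt_tokens: int, output_tokens: int) -> Estimate:
    """Hypothetical helper: flat-rate cost plus pure decode time for one request."""
    total_tokens = prompt_tokens + output_tokens
    cost = total_tokens / 1_000_000 * PRICE_PER_M_TOKENS_USD
    # Only generated tokens are paced by output throughput; prefill time is ignored.
    seconds = output_tokens / OUTPUT_TOKENS_PER_S
    return Estimate(cost_usd=cost, decode_seconds=seconds)


if __name__ == "__main__":
    est = estimate_request(prompt_tokens=8_000, output_tokens=2_000)
    print(f"cost ~ ${est.cost_usd:.4f}, decode ~ {est.decode_seconds:.1f} s")
```

With 8,000 prompt tokens and 2,000 output tokens, this gives about $0.0015 and roughly 9.9 s of decode time. Real latency will be higher once prefill and queuing are included.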
Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token....
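Sparse activation can be illustrated with generic top-k gating. A router scores every expert, but only the k best-scoring experts actually run for a given token. This is a toy sketch, not StepFun's router. The expert count, the value of k, and the softmax-over-selected-experts normalisation are all assumptions, and the "experts" here are simple scalers. The sketch also computes the active fraction implied by the description: 11B of 196B is about 5.6% of parameters per token.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]

# From the description: 11B active parameters out of 196B total.
ACTIVE_PARAMS_B = 11
TOTAL_PARAMS_B = 196


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalise their gates."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    weights = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(weights)
    return [(i, w / total) for i, w in zip(chosen, weights)]


def moe_forward(x: List[float], experts: Sequence[Expert],
                logits: Sequence[float], k: int) -> List[float]:
    """Gate-weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for idx, gate in top_k_route(logits, k):
        y = experts[idx](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out


def make_scaler(scale: float) -> Expert:
    """Toy expert that multiplies its input by a constant."""
    return lambda v: [scale * a for a in v]


if __name__ == "__main__":
    experts = [make_scaler(s) for s in (1.0, 2.0, 3.0, 4.0)]
    # Router logits for one token: experts 0 and 1 win, experts 2 and 3 stay idle.
    print(moe_forward([1.0, 2.0], experts, [0.0, 0.0, -10.0, -10.0], k=2))  # [1.5, 3.0]
    print(f"active fraction ~ {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%}")  # 5.6%
```

Per-token compute scales with the experts that run, not with the total parameter count. This is why a 196B-parameter MoE can be served at a cost closer to that of a much smaller dense model.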
This model is still tracked for research and discovery, but it is excluded from default public rankings until it returns to active status.
---
Quality Score
Arena ELO: 1146
Parameters: 11B active (196B total)
Context: 262K tokens
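Before sending a long request, a common check is whether the prompt plus the requested output fits in the context window. This is a minimal sketch with two assumptions: "262K" is read as 262,144 tokens (2^18), and the prompt and output share a single window. The helper names `fits_in_context` and `max_output_budget` are hypothetical.

```python
# Assumption: "262K" is read as 2**18 = 262,144 tokens shared by prompt and output.
CONTEXT_TOKENS = 2**18


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Hypothetical helper: True if the prompt plus the reserved output fits in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_tokens


def max_output_budget(prompt_tokens: int, context_tokens: int = CONTEXT_TOKENS) -> int:
    """Largest output reservation that still fits after the prompt (0 if none)."""
    return max(0, context_tokens - prompt_tokens)
```

For example, a 250,000-token prompt leaves room for at most 12,144 output tokens under this reading. If the provider means exactly 262,000, the budget shrinks by 144 tokens.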
Released: Jan 2026
Benchmarks: 19 | Research: 1
Recent launch, pricing, benchmark, and API signals linked to this model or its provider.
Every entry below lists Price: $0.15/M tokens and HumanEval: 0.404%.
Quality | Output
25.5/100 | 201.967 tok/s
25.5/100 | 201.967 tok/s
25.5/100 | 201.967 tok/s
25.5/100 | 194.446 tok/s
25.5/100 | 194.446 tok/s
25.5/100 | 197.655 tok/s
25.5/100 | 201.967 tok/s
25.5/100 | 194.446 tok/s
25.5/100 | 194.446 tok/s
25.5/100 | 197.655 tok/s
25.5/100 | 212.801 tok/s
25.5/100 | 210.122 tok/s
25.5/100 | 189.483 tok/s
37.8/100 | 177.909 tok/s
37.8/100 | 176.358 tok/s
37.8/100 | 180.424 tok/s
37.8/100 | 197.434 tok/s
37.8/100 | 215.94 tok/s
37.8/100 | 206.606 tok/s
37.8/100 | 171.835 tok/s
37.8/100 | 164.267 tok/s
37.8/100 | 177.967 tok/s
37.8/100 | 190.615 tok/s
37.8/100 | 215.92 tok/s
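The Output figures vary from update to update, so a summary tells you more than any single value. This sketch copies the listed throughputs in their listed order and reports each group's range, mean, and median. It treats every listed entry as one observation, although repeated values may be the same measurement shown more than once. Grouping by the quality score shown is only for convenience.

```python
import statistics

# Output throughput (tok/s) per listed entry, in order, grouped by the quality score shown.
THROUGHPUT = {
    "25.5/100": [201.967, 201.967, 201.967, 194.446, 194.446, 197.655, 201.967,
                 194.446, 194.446, 197.655, 212.801, 210.122, 189.483],
    "37.8/100": [177.909, 176.358, 180.424, 197.434, 215.94, 206.606, 171.835,
                 164.267, 177.967, 190.615, 215.92],
}


def summarize(values):
    """Count, range, mean, and median of a list of throughput readings."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


if __name__ == "__main__":
    everything = [v for vals in THROUGHPUT.values() for v in vals]
    for label, vals in THROUGHPUT.items():
        print(label, {k: round(v, 1) for k, v in summarize(vals).items()})
    print("all", {k: round(v, 1) for k, v in summarize(everything).items()})
```

The median is less sensitive than the mean to the occasional fast or slow reading, so it is the more robust single figure to quote.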
Memory has become a standard substrate for self-evolving agents, yet retaining experience is not the same as learning how to evolve through it. Existing memory agents can store trajectories, retrieve reflections, or accumulate skills, but often lack the holistic competence to select useful experience, act on it, write reusable knowledge, and maintain a growing repository. We introduce OPD-Evolver, a slow-fast co-evolution framework that cultivates such an agent evolver through on-policy self-distillation. In the fast loop, OPD-Evolver interacts with a four-level memory hierarchy to read, use, write, and maintain experience for rapid test-time evolution. In the slow loop, outcome-calibrated memory attribution and privileged hindsight distill these four abilities into the deployable policy. Across multi-domain benchmarks, OPD-Evolver surpasses memory systems such as ReasoningBank by up to 11.5%, and training-based methods such as Skill0 by ~5.8%. Further analysis shows that OPD-Evolver internalizes high-value experience and memory management, enabling OPD-Evolver-9B to challenge giant counterparts such as Qwen3.5-397B-A17B and Step-3.5-Flash, pointing beyond memory-augmented agents toward genuinely qualified agent evolvers.
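The abstract names four memory abilities (read, use, write, maintain) and outcome-based attribution. The toy below is only a generic illustration of those operations on a single flat store. It is not OPD-Evolver: the paper's four-level hierarchy, attribution method, and distillation step are not described here in enough detail to reproduce. The keyword-overlap retrieval score, the moving-average value update, and the lowest-value eviction rule are all hypothetical choices. "Use" is left to the caller, which acts on whatever `read` returns.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    text: str
    value: float = 0.0  # running estimate of how helpful this entry has been


@dataclass
class ToyMemory:
    """Hypothetical flat memory store; not the OPD-Evolver hierarchy."""
    capacity: int = 3
    entries: Dict[str, Entry] = field(default_factory=dict)

    def read(self, query: str, k: int = 2) -> List[str]:
        """Rank entries by keyword overlap with the query plus their learned value."""
        words = set(query.lower().split())

        def score(key: str) -> float:
            entry = self.entries[key]
            return len(words & set(entry.text.lower().split())) + entry.value

        return sorted(self.entries, key=score, reverse=True)[:k]

    def write(self, key: str, text: str) -> None:
        """Store a new piece of experience, keeping any existing entry under that key."""
        self.entries.setdefault(key, Entry(text))

    def attribute(self, used_keys: List[str], success: bool, rate: float = 0.5) -> None:
        """Outcome-based credit: nudge each used entry's value toward +1 or -1."""
        reward = 1.0 if success else -1.0
        for key in used_keys:
            entry = self.entries[key]
            entry.value += rate * (reward - entry.value)

    def maintain(self) -> None:
        """Evict the lowest-value entries once the store exceeds its capacity."""
        if len(self.entries) > self.capacity:
            keep = sorted(self.entries, key=lambda k: self.entries[k].value,
                          reverse=True)[: self.capacity]
            self.entries = {k: self.entries[k] for k in keep}


if __name__ == "__main__":
    mem = ToyMemory(capacity=2)
    mem.write("a", "retry failed http request with backoff")
    mem.write("b", "parse json config before startup")
    mem.write("c", "retry database connection on timeout")
    hits = mem.read("retry http request")  # ['a', 'c']
    mem.attribute(["a"], success=True)      # the caller "used" a and succeeded
    mem.attribute(["c"], success=False)     # c was retrieved but did not help
    mem.maintain()
    print(hits, sorted(mem.entries))        # ['a', 'c'] ['a', 'b']
```

Entry c was retrieved and then credited with a failure, so it is the one evicted. Feedback on outcomes, not just storage, decides what the store keeps.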