Command A+ Benchmark Update
Quality: 29.3/100 | Price: $0/M tokens | Output: 177.843 tok/s | HumanEval: 0.378%
Cohere
Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary...
Running this yourself: can likely run on your own machine.
Quality Score: 46.3
Arena ELO: 1327
Parameters: 111B
Context: 256K
Downloads: 0
Likes: 0
Released: Mar 2025
Benchmarks: 19
Open Source: 1
Research: 4
General: 6
Recent launch, pricing, benchmark, and API signals linked to this model or its provider.
Quality: 29.3/100 | Price: $0/M tokens | Output: 183.116 tok/s | HumanEval: 0.378%
Quality: 7.7/100 | Price: $4.375/M tokens | Output: 74.994 tok/s | MMLU: 0.712% | HumanEval: 0.287%
Quality: 7.7/100 | Price: $4.375/M tokens | Output: 74.007 tok/s | MMLU: 0.712% | HumanEval: 0.287%
When you use Cohere, there are no staggered releases. No sudden disablements. We trust you completely: "[The customer] is in full control. We can't see in, we can't switch it off" - CEO @aidangomez https://t.co/sNof7pjEUJ
Quality: 7.7/100 | Price: $4.375/M tokens | Output: 73.566 tok/s | MMLU: 0.712% | HumanEval: 0.287%
Quality: 7.7/100 | Price: $4.375/M tokens | Output: 75.14 tok/s | MMLU: 0.712% | HumanEval: 0.287%
Quality: 29.3/100 | Price: $0/M tokens | Output: 190.772 tok/s | HumanEval: 0.378%
Quality: 29.3/100 | Price: $0/M tokens | Output: 201.491 tok/s | HumanEval: 0.378%
Quality: 29.3/100 | Price: $0/M tokens | Output: 204.713 tok/s | HumanEval: 0.378%
Quality: 7.7/100 | Price: $4.375/M tokens | Output: 72.904 tok/s | MMLU: 0.712% | HumanEval: 0.287%
Quality: 13.5/100 | Price: $4.375/M tokens | Output: 76.334 tok/s | MMLU: 0.712% | HumanEval: 0.287%
Quality: 37.2/100 | Price: $0/M tokens | Output: 197.762 tok/s | HumanEval: 0.378%
Quality: 37.2/100 | Price: $0/M tokens | Output: 199.45 tok/s | HumanEval: 0.378%
Quality: 13.5/100 | Price: $4.375/M tokens | Output: 73.819 tok/s | MMLU: 0.712% | HumanEval: 0.287%
Quality: 37.2/100 | Price: $0/M tokens | Output: 178.173 tok/s | HumanEval: 0.378%
Quality: 13.5/100 | Price: $4.375/M tokens | Output: 71.329 tok/s | MMLU: 0.712% | HumanEval: 0.287%
Quality: 13.5/100 | Price: $4.375/M tokens | Output: 70.833 tok/s | MMLU: 0.712% | HumanEval: 0.287%
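The signal lines above all share one pipe-separated "Name: value" layout, so they can be compared programmatically. The following is a minimal sketch, not part of the original listing: the helper names `parse_signal` and `mean_output_by_price` are hypothetical, and grouping by price is only a convenient way to separate the two kinds of entries that appear in the feed.

```python
import re
from statistics import mean

# Three sample lines copied verbatim from the feed above.
LINES = [
    "Quality: 29.3/100 | Price: $0/M tokens | Output: 177.843 tok/s | HumanEval: 0.378%",
    "Quality: 7.7/100 | Price: $4.375/M tokens | Output: 74.994 tok/s | MMLU: 0.712% | HumanEval: 0.287%",
    "Quality: 37.2/100 | Price: $0/M tokens | Output: 197.762 tok/s | HumanEval: 0.378%",
]

# Matches the first integer or decimal number in a field value.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_signal(line):
    """Turn one feed line into a {field name: float} dict (hypothetical helper)."""
    record = {}
    for part in line.split("|"):
        name, _, value = part.partition(":")
        # Some scraped lines carry a fused "View source" link label; drop it.
        name = name.replace("View source", "").strip()
        match = NUMBER.search(value)
        if name and match:
            record[name] = float(match.group())
    return record


def mean_output_by_price(lines):
    """Average the Output (tok/s) field per distinct Price value."""
    groups = {}
    for line in lines:
        rec = parse_signal(line)
        groups.setdefault(rec["Price"], []).append(rec["Output"])
    return {price: mean(values) for price, values in groups.items()}


if __name__ == "__main__":
    for price, tok_s in sorted(mean_output_by_price(LINES).items()):
        print(f"${price}/M tokens: mean output {tok_s:.3f} tok/s")
```

Only the first number in each field is kept, so "29.3/100" is read as 29.3 and the percent signs are dropped; the values are taken exactly as listed, without reinterpreting their units.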
Video diffusion models have enabled remarkable progress in video generation and editing. However, content preservation remains a core challenge: existing methods regenerate every pixel and often alter elements that should remain unchanged, such as characters or background scenes. We introduce Vera, a layered diffusion framework for content-preserving video editing. Instead of regenerating the entire video, Vera generates an edit layer along with an alpha matte for compositing with the source video, separating creative editing from content preservation by design. To encourage coherent composition with the source video, we extend the text-to-video DiT into a Mixture-of-Transformers (MoT) architecture, with separate DiTs for each layer that interact through joint self-attention. To support the training of Vera, we further construct a high-quality layered dataset with accurate alpha mattes, diverse scenes and dynamics, and visual effects. Across our quantitative benchmark and human preference study, Vera outperforms leading open-source video editing models in content preservation while remaining competitive in edit quality, using 486K frames of layered training data.
Reconstructing dynamic non-rigid objects from monocular video requires integrating visual cues from direct observations with data-driven priors over geometry and appearance. Prior approaches either learn to directly predict 4D representations from visual input or initialize a 3D representation that is subsequently deformed and refined based on video evidence. However, the former are constrained by the scarcity of 4D training data, while the latter leverage priors only for the initial reconstruction and rely solely on video supervision thereafter; neither handles complex in-the-wild scenarios with large deformations and occlusions well. We present Lift4D, a test-time optimization framework that addresses both limitations. First, we adapt an existing single-view 3D reconstruction model to yield temporally consistent per-frame predictions via causal latent conditioning, providing a coherent initialization for a deformable 3D Gaussian Splatting representation. We then "sculpt" this representation to match the input video through an occlusion-aware optimization that faithfully recovers visible surface details while completing unobserved regions using a view-conditioned diffusion prior. We demonstrate that Lift4D clearly improves over prior 4D reconstruction methods, particularly on challenging in-the-wild sequences with severe occlusions and non-rigid motion.
Cross-Chart Retrieval-Augmented Generation (RAG) is critical for complex multi-modal analytical tasks in scientific, business, and political domains. However, existing benchmarks either focus on tables, which are well-structured and textualized, or generate cross-chart questions by simply extracting key points, which often induces lexical overlap between queries and evidence and yields logically inconsistent reasoning chains. To address this, we introduce ChartWalker, a novel framework for constructing challenging cross-chart RAG tasks. ChartWalker features a hierarchical knowledge graph construction method tailored to charts, which organizes entities and relations by granularity to preserve analytical structure. We then propose a structure-aware sampling algorithm that synthesizes semantically coherent, multi-hop reasoning paths, enabling explicit control over query difficulty and granularity for QA generation. Built with this framework, we release ChartWalker-Bench, a comprehensive benchmark spanning diverse domains and cross-chart query types. Extensive evaluations across major RAG paradigms reveal significant performance gaps, underscoring the benchmark's difficulty and utility. Furthermore, we provide ChartWalker-Agent, an agentic baseline to facilitate analysis and inspire future system design.
Command A is now available through the local Ollama runtime, with a 16K context window listed. It is a 111-billion-parameter model optimized for demanding enterprises that require fast, secure, and high-quality AI.