Name: command-a-plus-05-2026-fp8
Author: Cohere

Open Source · Cohere · Today

Models Overview: Explore and compare our open source models

Research · Cohere · Today

Aya: A family of multilingual research models covering 70+ languages

Research · Cohere · 2d ago

Xetrieval: Mechanistically Explaining Dense Retrieval

Explaining why dense retrievers assign high relevance scores remains challenging because retrieval decisions are made through opaque high-dimensional embeddings. Existing explanations often focus on surface signals, such as lexical matches, token alignments, or post-hoc textual rationales, and thus provide limited insight into the latent factors that shape dense retrieval behavior at the embedding level. We propose Xetrieval, an embedding-level mechanistic framework for explaining dense retrieval. Xetrieval first introduces a lightweight reasoning internalizer that approximates Chain-of-Thought reasoning directly in the embedding space with a single forward pass, enriching sentence embeddings with reasoning-oriented information while avoiding expensive autoregressive generation. It then decomposes these reasoning-enhanced embeddings into sparse, human-interpretable features, each associated with a coherent natural language description. By aggregating sparse feature overlaps across multiple document-side views, Xetrieval provides feature-level explanations of individual retrieval decisions. Experiments on diverse retrievers and benchmarks show that Xetrieval uncovers coherent interpretable features, yields stronger pair-level intervention effects, and supports task-level feature steering. The project page and source code are available at https://hihiczx.github.io/Xetrieval.

Research · Cohere · 2d ago

Native Audio-Visual Alignment for Generation

Joint audio-video generation aims to synthesize temporally synchronized and semantically coherent visual-acoustic content. However, existing open-source methods mainly rely on either dual-tower designs with posterior alignment or fully unified tri-modal designs that mix textual context, audio and video in one shared space. The former weakens fine-grained audio-video co-evolution, while the latter couples semantic conditioning with low-level synchronization. To address these limitations, we propose NAVA, a Native Audio-Visual Alignment framework for joint audio-video generation. NAVA is built upon context-conditioned native audio-visual alignment: it first establishes audio-video correspondence in a dedicated interaction space, and then uses external context to condition the joint denoising process. Specifically, NAVA is instantiated with an Align-then-Fuse MMDiT architecture, which transitions from modality-aware audio-video alignment to modality-shared joint denoising. Furthermore, we introduce Timbre-in-Context Conditioning to associate reference timbre cues with corresponding speech spans to achieve controllable speech timbre. Experiments on Verse-Bench and Seed-TTS, together with a user study, demonstrate that NAVA achieves superior video quality, precise audio-visual synchronization, competitive audio quality, and stronger reference-timbre controllability using only 6.3B parameters.

Command A+ sets a new high for Cohere's machine translation capabilities, opening a clear gap over open source peers Mistral Medium 3.5, DeepSeek, and OpenAI's gpt-oss, as well as Claude Opus 4.6. A+ al

Model Vault: Your dedicated, secure model inference platform, managed by Cohere

Rerank: A powerful model that provides a semantic boost to search quality

Aya: A family of multilingual research models covering 70+ languages

Transcribe (NEW): A speech recognition model for generating highly accurate audio transcripts

Command (NEW): High-performance models for agentic, multimodal, multilingual AI

Towards Consistent Video Geometry Estimation