Mistral AI
Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...
Running this yourself: can likely run on your own machine.
56.8
Quality Score
1163
Arena ELO
128B
Parameters
262K
Context
Sign in to join the discussion
0
Downloads
0
Likes
Apr 2026
Released
Launches
6
Benchmarks
11
Open Source
2
Research
2
General
2
Recent launch, pricing, benchmark, and API signals linked to this model or its provider.
Quality: 39.2/100 | Price: $3/M tokens | Output: 161.524 tok/s | HumanEval: 0.396%
🆕 Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI. 🌎 Enterprise teams have capable models. What they don't have is a way to run them reliably in production. That's the gap Workflows fills. It takes AI-powered business processes https://t.co/ETMYDI9Isg
View source🔊Introducing Voxtral TTS: our new frontier open-weight model for natural, expressive, and ultra-fast text-to-speech 🎭Realistic, emotionally expressive speech. 🌍Supports 9 languages and accurately captures diverse dialects. ⚡Very low latency for time-to-first-audio. 🔄Easily https://t.co/Q2mdo8UBVo
View sourceMistral AI made the TIME100 Most Influential Companies list for 2026 — and the top 10 for AI. Why we're proud: customers run frontier models in production on their own terms, on their own infrastructure. Thank you to our customers for their trust and for joining us on the
🆕 Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI. 🌎 Enterprise teams have capable models. What they don't have is a way to run them reliably in production. That's the gap Workflows fills. It takes AI-powered business processes https://t.co/ETMYDI9Isg

Mistral’s AI Now Summit is coming to Paris on May 28 and tickets are live! What you’ll hear: 📷 Technical deep dives to help you build and deploy AI. 📷 Mistral’s founders on AI-driven transformation in large organizations, company trajectory, and upcoming releases. One day to https://t.co/8VcmrBer4A
🔊Introducing Voxtral TTS: our new frontier open-weight model for natural, expressive, and ultra-fast text-to-speech 🎭Realistic, emotionally expressive speech. 🌍Supports 9 languages and accurately captures diverse dialects. ⚡Very low latency for time-to-first-audio. 🔄Easily https://t.co/Q2mdo8UBVo
Today, we’re introducing Forge, a system for enterprises to build frontier-grade AI models grounded in their proprietary knowledge. 🌎 Forge bridges the gap between generic AI and enterprise-specific needs. Instead of relying on broad, public data, organizations can train models https://t.co/4YQ3ADvixr
Quality: 39.2/100 | Price: $3/M tokens | Output: 151.68 tok/s | HumanEval: 0.396%
Quality: 39.2/100 | Price: $3/M tokens | Output: 149.233 tok/s | HumanEval: 0.396%
Quality: 39.2/100 | Price: $3/M tokens | Output: 151.84 tok/s | HumanEval: 0.396%
Quality: 39.2/100 | Price: $3/M tokens | Output: 154.338 tok/s | HumanEval: 0.396%
Quality: 39.2/100 | Price: $3/M tokens | Output: 150.797 tok/s | HumanEval: 0.396%
Quality: 39.2/100 | Price: $3/M tokens | Output: 167.325 tok/s | HumanEval: 0.396%
Quality: 39.2/100 | Price: $3/M tokens | Output: 171.376 tok/s | HumanEval: 0.396%
Quality: 39.2/100 | Price: $3/M tokens | Output: 159.681 tok/s | HumanEval: 0.396%
Quality: 39.2/100 | Price: $3/M tokens | Output: 174.208 tok/s | HumanEval: 0.396%
Quality: 39.2/100 | Price: $3/M tokens | Output: 170.89 tok/s | HumanEval: 0.396%
The development of the Bielik v3 PL series, encompassing both the 7B and 11B parameter variants, represents a significant milestone in the field of language-specific large language model (LLM) optimization. While general-purpose models often demonstrate impressive multilingual capabilities, they frequently suffer from a fundamental architectural inefficiency: the use of universal tokenizers. These tokenizers, typically designed to cover a broad spectrum of languages, often fail to capture the morphological nuances of specific languages like Polish, leading to higher fertility ratios, increased inference costs, and restricted effective context windows. This report details the transition from the universal Mistral-based tokenization to a dedicated Polish-optimized vocabulary for the Bielik v3 models, exploring the FOCUS-based embedding initialization, the multi-stage pretraining curriculum, and the subsequent post-training alignment involving Supervised Fine-Tuning, Direct Preference Optimization, and Reinforcement Learning through Group Relative Policy Optimization with verifiable rewards.
Getting a real cybersecurity risk assessment for a small organization is expensive -- a NIST CSF-aligned engagement runs $15,000 on the low end, takes weeks, and depends on practitioners who are genuinely scarce. Most small companies skip it entirely. We built a six-agent AI system where each agent handles one analytical stage: profiling the organization, mapping assets, analyzing threats, evaluating controls, scoring risks, and generating recommendations. Agents share a persistent context that grows as the assessment proceeds, so later agents build on what earlier ones concluded -- the mechanism that distinguishes this from standard sequential agent pipelines. We tested it on a 15-person HIPAA-covered healthcare company and compared outputs to independent assessments by three CISSP practitioners -- the system agreed with them 85% of the time on severity classifications, covered 92% of identified risks, and finished in under 15 minutes. We then ran 30 repeated single-agent assessments across five synthetic but sector-realistic organizational profiles in healthcare, fintech, manufacturing, retail, and SaaS, comparing a general-purpose Mistral-7B against a domain fine-tuned model. Both completed every run. The fine-tuned model flagged threats the baseline could not see at all: PHI exposure in healthcare, OT/IIoT vulnerabilities in manufacturing, platform-specific risks in retail. The full multi-agent pipeline, however, failed every one of 30 attempts on a Tesla T4 with its 4,096-token default context window -- context capacity, not model quality, turned out to be the binding constraint.
Mistral Medium 3.5 is now available through local Ollama runtime. 256K context window listed. Mistral Medium 3.5 is the first flagship model of Mistral AI that merged instruction-following, reasoning, and coding in a single set of 128B weights.