Skip to main content

Models Deploy Leaderboards Marketplace

Track, rank, and compare every AI model in the world.

Platform

Models
Deploy
Leaderboards
Compare
News
Marketplace
Workspace
Deployments
Discover Watchlists
Pricing

Categories

LLMs
Image Gen
Vision
Multimodal
Embeddings
Speech
Video
Code
Browser Agents
Specialized

Company

About
Roadmap
Contact
FAQ
Providers
API
Terms
Privacy

© 2026 AI Market Cap. All rights reserved.

Qwen3 VL 8B Instruct by Qwen | AI Market Cap

Qwen3 VL 8B Instruct

#206MultimodalOpen Weights

Q

Qwen

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...

Running this yourself: desktop gpu should be enough.

Model updates refreshed6h agoJul 4, 2026news + changelog

View Updates Start Free Trial

36.1

Quality Score

---

Arena ELO

8B

Parameters

256K

Context

Similar Models

Moonshot AI·Unknown

Gemma 4 31B#214

Discussion (0)

Sign in to join the discussion

Loading comments...

0

Downloads

0

Likes

Oct 2025

Released

Benchmarks

17

high

Open Source

1

medium

Research

2

low

What Changed Recently

Recent launch, pricing, benchmark, and API signals linked to this model or its provider.

BenchmarksAlibabaToday

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 139.294 tok/s | MMLU: 0.686% | HumanEval: 0.332%

BenchmarksAlibaba

Benchmarks & Rankings17

BenchmarksAlibabaToday

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 139.294 tok/s | MMLU: 0.686% | HumanEval: 0.332%

Research Papers2

HF PapersQwenresearch5d ago

Other

ollama-libraryQwenopen_sourceopen sourceToday

Qwen3 VL 8B Instruct is now available on Ollama

73.9

gemma-4-12B-it#227

Gemma 4 31B IT#21

Gemini 3 Flash#186

Google·Undisclosed

Yesterday

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 139.301 tok/s | MMLU: 0.686% | HumanEval: 0.332%

BenchmarksAlibaba2d ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 143.302 tok/s | MMLU: 0.686% | HumanEval: 0.332%

BenchmarksAlibaba3d ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 145.348 tok/s | MMLU: 0.686% | HumanEval: 0.332%

BenchmarksAlibaba4d ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 146.085 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibabaYesterday

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 139.301 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba2d ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 143.302 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba3d ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 145.348 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba4d ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 146.085 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba5d ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 144.583 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba6d ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 142.674 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba1w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 142.674 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba1w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 144.301 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba1w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 144.358 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba1w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 142.907 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba1w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 142.907 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba1w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 143.099 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba1w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 144.904 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba2w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 145.735 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba2w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 145.483 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

BenchmarksAlibaba2w ago

Qwen3 VL 8B Instruct Benchmark Update

Quality: 8.4/100 | Price: $0.31/M tokens | Output: 145.136 tok/s | MMLU: 0.686% | HumanEval: 0.332%

#benchmark#pricing#artificial-analysis

Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning

Recent multimodal large language models have shown great promise in clinical image reasoning, but existing post-training pipelines remain predominantly outcome-centric, relying on final answer correctness or sequence-level preferences. This suffers from sparse credit assignment, making it difficult to optimize the reasoning process essential for clinical applications. Our analysis reveals that cascading errors from early-stage reasoning failures are a leading cause of incorrect predictions in medical visual question answering (VQA) benchmarks. Motivated by this, we propose Medical Reasoning-aware Policy Optimization (MRPO), an RL algorithm that incorporates step-wise process rewards. When the final answer is incorrect, MRPO assigns exponentially larger penalties to tokens in earlier invalid reasoning steps, breaking failure cascades without compromising successful paths. Across three multimodal LLM backbones, MRPO consistently outperforms standard GRPO and a recent RL baseline, and on Qwen3-VL-8B-Instruct even surpasses substantially larger medical MLLMs such as HuatuoGPT-Vision-34B by 2.79 points. Moreover, MRPO reduces early-stage reasoning failures from 64.0% to 13.0%, showing that targeted mitigation of cascading failures improves both reasoning quality and final answer accuracy. Our code is available at https://github.com/dmis-lab/MRPO

#huggingface#daily-papers

HF PapersQwenresearch5d ago

Seeing Is Not Sharing: Some Vision-Language Models Overestimate Common Ground in Asymmetric Dialogue

In collaborative dialogue, shared perception does not guarantee shared interpretation. Mutual understanding must be established through interaction. We investigate whether vision-language models (VLMs) can distinguish what could be shared from what has been shared between dialogue participants through grounding. We formulate this as an interpretation-matching task on 13,077 annotated reference expressions from HCRC MapTask dialogues, and evaluate VLMs under systematically controlled manipulations of dialogue context and map-information access. Our results show that providing authentic map images improves overall performance but shifts models toward over-predicting alignment. Textual descriptions of the same map content reproduce this bias, while non-informative images suppress alignment predictions entirely, indicating that the bias is driven by task-relevant map content, not the visual channel. This improvement comes at the cost of degraded accuracy on non-aligned cases. Calibration analysis and reference-chain tracking further suggest that models rely on static referential cues on the maps rather than tracking how grounding unfolds through dialogue history. We observe these patterns most clearly in Qwen3-VL-8B-Instruct and, to varying degrees, in four additional models from two architecture families. In models that exhibit the bias, map content, whether presented visually or textually, is treated as evidence of mutual understanding, conflating potential with established common ground.

#huggingface#daily-papers

Qwen3 VL 8B Instruct is now available through local Ollama runtime and Ollama Cloud. 256K context window listed. The most powerful vision-language model in the Qwen model family to date.

#deployability#ollama#qwen