Name: Qwen2.5-7B-Instruct
Rating: 42.4 (1244 reviews)
Author: Qwen

Benchmarks3mo ago

https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct - SWE-Bench Verified

SWE-Bench Verified resolved rate 69.6

View source

Benchmarks3mo ago

https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct - SWE-Bench Verified

SWE-Bench Verified resolved rate 69.6

View source

Benchmarks3mo ago

Qwen 2.5 - SWE-Bench Verified

SWE-Bench Verified resolved rate 40.2

View source

Open SourceQwenToday

Qwen2.5-7B-Instruct is now available on Ollama

Qwen2.5-7B-Instruct is now available through local Ollama runtime. 32K context window listed. Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The model supports up to 128K tokens and has multilingual support.

View source

ResearchQwen3mo ago

Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA

Miscalibrated confidence scores are a practical obstacle to deploying AI in clinical settings. A model that is always overconfident offers no useful signal for deferral. We present a multi-agent framework that combines domain-specific specialist agents with Two-Phase Verification and S-Score Weighted Fusion to improve both calibration and discrimination in medical multiple-choice question answering. Four specialist agents (respiratory, cardiology, neurology, gastroenterology) generate independent diagnoses using Qwen2.5-7B-Instruct. Each diagnosis is then subjected to a two-phase self-verification process that measures internal consistency and produces a Specialist Confidence Score (S-score). The S-scores drive a weighted fusion strategy that selects the final answer and calibrates the reported confidence. We evaluate across four experimental settings, covering 100-question and 250-question high-disagreement subsets of both MedQA-USMLE and MedMCQA. Calibration improvement is the central finding, with ECE reduced by 49-74% across all four settings, including the harder MedMCQA benchmark where these gains persist even when absolute accuracy is constrained by knowledge-intensive recall demands. On MedQA-250, the full system achieves ECE = 0.091 (74.4% reduction over the single-specialist baseline) and AUROC = 0.630 (+0.056) at 59.2% accuracy. Ablation analysis identifies Two-Phase Verification as the primary calibration driver and multi-agent reasoning as the primary accuracy driver. These results establish that consistency-based verification produces more reliable uncertainty estimates across diverse medical question types, providing a practical confidence signal for deferral in safety-critical clinical AI applications.

View source

Qwen2.5-7B-Instruct

Similar Models

https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct - SWE-Bench Verified

https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct - SWE-Bench Verified

Research Papers3

Other

Qwen2.5-7B-Instruct is now available on Ollama

Qwen 2.5 - SWE-Bench Verified

Qwen2.5-7B-Instruct is now available on Ollama

Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA

Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA

Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought

Hindsight Credit Assignment for Long-Horizon LLM Agents

https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct - SWE-Bench Verified

https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct - SWE-Bench Verified

Qwen 2.5 - SWE-Bench Verified