https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct - SWE-Bench Verified
SWE-Bench Verified resolved rate 69.6
Qwen
Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and...
Running this yourself: a consumer GPU should be enough.
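As a rough sanity check on the consumer-GPU claim, the sketch below estimates the memory needed for the model weights alone. It assumes the nominal 7B parameter count listed on this page and typical bytes-per-parameter figures for common precisions. The helper name `weight_memory_gb` is hypothetical, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Weights-only memory estimate; a sketch, not a measurement.
# Assumption: bytes per parameter for common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Assumption: nominal parameter count taken from the "7B" figure on this page.
NOMINAL_PARAMS = 7e9


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(NOMINAL_PARAMS, precision)
    print(f"{precision}: ~{gb:.1f} GB for weights")
```

Under these assumptions, half-precision weights need about 14 GB and 4-bit quantized weights about 3.5 GB, which is why a quantized build is the more likely fit for a typical consumer card.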
Quality Score: 29.7
Arena ELO: ---
Parameters: 7B
Context: 131K
Downloads: 0
Likes: 0
Released: Oct 2024
Benchmarks: 5
Open Source: 1
Research: 1
Recent launch, pricing, benchmark, and API signals linked to this model or its provider.
SWE-Bench Verified resolved rate 69.6
Complete: 46.1 | Instruct: 37.6 | 7B params
GAIA score 5.3 from PurpleNightmare-ppo-qwen2.5-7b
GAIA score 4.7 from rft-2
GAIA score 4.7 from rft-2
Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source small language models (SLMs) -- Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Llama-3.2-3B-Instruct, Qwen2.5-1.5B-Instruct, and Gemma-3-1B-it -- for binary extraction of 13 clinical features from 1,221 anonymized Persian transcripts collected at a cancer palliative care call center. Using a few-shot prompting strategy without fine-tuning, models were assessed on macro-averaged F1-score, Matthews Correlation Coefficient (MCC), sensitivity, and specificity to account for class imbalance. Qwen2.5-7B-Instruct achieved the highest overall performance (median macro-F1: 0.899; MCC: 0.797), while Gemma-3-1B-it showed the weakest results. Larger models (7B--8B parameters) consistently outperformed smaller counterparts in sensitivity and MCC. A bilingual analysis of Aya-expanse-8B revealed that translating Persian transcripts to English improved sensitivity, reduced missing outputs, and boosted metrics robust to class imbalance, though at the cost of slightly lower specificity and precision. Feature-level results showed reliable extraction of physiological symptoms across most models, whereas psychological complaints, administrative requests, and complex somatic features remained challenging. These findings establish a practical, privacy-preserving blueprint for deploying open-source SLMs in multilingual clinical NLP settings with limited infrastructure and annotation resources, and highlight the importance of jointly optimizing model scale and input language strategy for sensitive healthcare applications.
Qwen2.5 7B Instruct is now available through the local Ollama runtime, which lists a 32K context window. Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The model itself supports up to 128K tokens and has multilingual support.
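For readers who want to try the local Ollama route, here is a minimal sketch of a non-streaming request to Ollama's HTTP generate endpoint. It assumes Ollama is running on its default local port and that the model was pulled under the tag `qwen2.5:7b`; both are assumptions to verify on your own setup (for example with `ollama list`). The helper names `build_request` and `generate` are illustrative, and `num_ctx` is set to 32768 only to mirror the 32K window listed above.

```python
import json
import urllib.request

# Assumption: default local Ollama endpoint; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: model tag as pulled locally; confirm with `ollama list`.
MODEL_TAG = "qwen2.5:7b"


def build_request(prompt: str, num_ctx: int = 32768) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,                 # ask for a single JSON response
        "options": {"num_ctx": num_ctx}, # context window requested from the runtime
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]

# Example call, to be run only with Ollama running and the model pulled:
# print(generate("Summarize what a context window is in one sentence."))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.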