GLM-4.5-Air Benchmark Update
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 71.832 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Source: Z.ai
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter...
Quality Score: 25.4
Arena ELO: not listed
Parameters: Undisclosed
Context: 131K
Downloads: 0
Likes: 0
Released: Jul 2025
Benchmarks: 20
General: 3
Recent launch, pricing, benchmark, and API signals linked to this model or its provider.
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 75.222 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 85.319 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 80.254 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 78.163 tok/s | MMLU: 81.5% | HumanEval: 68.4%
As models, contexts, and workloads grow, hidden assumptions in inference infrastructure can surface as output anomalies. Reliability requires more than throughput, latency, and availability. It also requires preserving the correctness of model state behind every generation.

After fixing correctness issues, we turned to the next bottleneck: prefill throughput and GPU memory pressure in long-context coding-agent serving. To address this, we introduced LayerSplit, a layer-wise KV cache storage scheme. Instead of duplicating all layers on every GPU, https://t.co/OGptVovbtf
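The post above is cut off, so the actual LayerSplit design is not described here. Purely as an illustration of the general idea of storing KV cache by layer rather than replicating every layer on every GPU, the sketch below compares per-GPU KV cache memory under the two layouts. All dimensions (layer count, KV heads, head size, GPU count, 2-byte elements, 131,072-token context) are hypothetical and are not GLM-4.5-Air's real configuration.

```python
def kv_bytes_per_layer(seq_len, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV cache size of one layer for one sequence.

    Keys and values each hold seq_len * n_kv_heads * head_dim elements.
    """
    return 2 * seq_len * n_kv_heads * head_dim * bytes_per_elem


def per_gpu_kv_bytes(n_layers, n_gpus, per_layer_bytes, layer_split):
    """KV cache bytes on the most loaded GPU.

    layer_split=False: every GPU keeps the cache of all layers (assumed baseline).
    layer_split=True:  layers are divided across GPUs, so the busiest GPU
                       holds ceil(n_layers / n_gpus) layers.
    """
    if layer_split:
        layers_on_gpu = -(-n_layers // n_gpus)  # ceiling division
    else:
        layers_on_gpu = n_layers
    return layers_on_gpu * per_layer_bytes


# Hypothetical model and cluster shape, chosen only to make the arithmetic concrete.
N_LAYERS, N_GPUS = 48, 8
per_layer = kv_bytes_per_layer(seq_len=131_072, n_kv_heads=8, head_dim=128)
replicated = per_gpu_kv_bytes(N_LAYERS, N_GPUS, per_layer, layer_split=False)
split = per_gpu_kv_bytes(N_LAYERS, N_GPUS, per_layer, layer_split=True)

GIB = 1024 ** 3
print(f"per layer:  {per_layer / GIB:.2f} GiB")
print(f"replicated: {replicated / GIB:.2f} GiB per GPU")
print(f"split:      {split / GIB:.2f} GiB per GPU")
```

Under these assumed numbers the per-GPU footprint shrinks by the number of GPUs. A real scheme would also have to pay for moving activations or cache blocks between GPUs, which this arithmetic ignores.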
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 79.832 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 81.302 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 77.401 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 75.287 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 77.5 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 83.275 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 84.07 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 85.046 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 87.456 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 85.028 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 84.112 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 78.077 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 77.236 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 74.676 tok/s | MMLU: 81.5% | HumanEval: 68.4%
Quality: 16.5/100 | Price: $0.372/M tokens | Output: 74.71 tok/s | MMLU: 81.5% | HumanEval: 68.4%
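Since only the Output figure changes between these entries, it may be easier to read them as one series. The sketch below summarizes the distinct Output readings listed on this page and turns the listed price into a rough time and cost estimate. The helper names and the 50,000-token workload are hypothetical, and the page does not say whether $0.372/M is an input, output, or blended price, so the cost figure is only an approximation.

```python
from statistics import mean

# Output-speed readings (tok/s) copied from the signal lines on this page,
# with repeated entries listed once.
READINGS_TOK_S = [
    71.832, 75.222, 85.319, 80.254, 78.163, 79.832, 81.302, 77.401,
    75.287, 77.5, 83.275, 84.07, 85.046, 87.456, 85.028, 84.112,
    78.077, 77.236, 74.676, 74.71,
]

# Listed price in USD per million tokens (token type not specified on the page).
PRICE_PER_M_TOKENS_USD = 0.372


def summarize(readings):
    """Return the minimum, mean, and maximum of the throughput readings."""
    return {"min": min(readings), "mean": mean(readings), "max": max(readings)}


def estimate(tokens, tok_per_s, price_per_m=PRICE_PER_M_TOKENS_USD):
    """Return (seconds, usd) for generating `tokens` tokens at a steady rate."""
    seconds = tokens / tok_per_s
    usd = tokens / 1_000_000 * price_per_m
    return seconds, usd


summary = summarize(READINGS_TOK_S)
seconds, usd = estimate(50_000, summary["mean"])  # hypothetical workload size

print(f"min/mean/max tok/s: {summary['min']:.3f} / {summary['mean']:.3f} / {summary['max']:.3f}")
print(f"50,000 tokens at the mean rate: about {seconds:.0f} s, about ${usd:.4f}")
```

The estimate assumes a constant generation rate and ignores time to first token, so it is a lower bound on wall-clock time rather than a prediction.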