Name: GLM-5.2
Price: 10 USD
Availability: InStock
Rating: 60.9 (2124 reviews)
Author: Z.ai

Benchmarks · Z.ai · 6d ago

zai-org/GLM-5.2 · Hugging Face

Z.ai published benchmark or leaderboard evidence for GLM-5.2.

General · Z.ai · 1mo ago

As models, contexts, and workloads grow, hidden assumptions in inference infrastructure can surface as output anomalies. Reliability requires more than throughput, latency, and availability. It also requires preserving the correctness of model state behind every generation.

General · Z.ai · 1mo ago

After fixing correctness issues, we turned to the next bottleneck: Prefill throughput and GPU memory pressure in long-context Coding Agent serving. To address this, we introduced LayerSplit, a layer-wise KV Cache storage scheme. Instead of duplicating all layers on every GPU, https://t.co/OGptVovbtf
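
The post is cut off before it explains how LayerSplit places layers, so the following is only a rough sketch of the general idea of layer-wise KV cache storage, not Z.ai's implementation. It assumes a hypothetical round-robin placement (layer i stored on GPU i mod N) and compares the per-GPU KV cache footprint against keeping every layer's cache on every GPU. The function names and the model dimensions in the usage note are illustrative assumptions.

```python
# Illustrative sketch only. Round-robin placement is an assumption made for
# this example; the truncated post does not describe LayerSplit's actual policy.

def kv_bytes_per_layer(tokens: int, kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes needed for one layer's K and V tensors over `tokens` cached tokens."""
    return 2 * tokens * kv_heads * head_dim * dtype_bytes


def assign_layers(num_layers: int, num_gpus: int) -> dict[int, list[int]]:
    """Hypothetical placement: layer i is stored on GPU i % num_gpus."""
    placement: dict[int, list[int]] = {gpu: [] for gpu in range(num_gpus)}
    for layer in range(num_layers):
        placement[layer % num_gpus].append(layer)
    return placement


def per_gpu_kv_bytes(num_layers: int, num_gpus: int, tokens: int,
                     kv_heads: int, head_dim: int, split: bool) -> int:
    """Worst-case KV cache bytes held by a single GPU.

    split=False: every GPU holds the cache for all layers (duplication).
    split=True:  each GPU holds only the layers assigned to it.
    """
    per_layer = kv_bytes_per_layer(tokens, kv_heads, head_dim)
    if not split:
        return num_layers * per_layer
    placement = assign_layers(num_layers, num_gpus)
    return max(len(layers) for layers in placement.values()) * per_layer
```

For example, with 8 layers, 4 GPUs, 1000 cached tokens, 8 KV heads, and a head dimension of 128 in 16-bit precision (all made-up numbers), `per_gpu_kv_bytes(..., split=False)` gives 32,768,000 bytes per GPU, while `split=True` gives 8,192,000. The sketch ignores the cross-GPU transfers that any such scheme would need during attention.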

General · Z.ai · 2mo ago

Fantastic to see GLM being applied to such fresh, dynamic scenarios.

GLM-5.2

Similar Models

zai-org/GLM-5.2 · Hugging Face

Social & Blog Posts (3)

Other

GLM-5.2 is now available on Ollama Cloud
