Name: deepseek-r1
Author: DeepSeek

deepseek-r1 by DeepSeek | AI Market Cap

HF Papers | DeepSeek | research | 1w ago

Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents

Long-horizon agents depend on context management: systems compress, summarize, and evict old tokens so tasks can continue beyond finite windows. That is safe only when dropped information is no longer needed or has been internalized. Plans are the stress case: they are written early, used for many steps, and first to be evicted. We introduce replay pairing, a diagnostic that runs the same trajectory with and without the plan in history and measures hidden-state cosine distance. On Llama-3.1-70B, plan signal spikes to 0.453 one step after the plan, then falls 4.1x in a single action-observation step; HotpotQA falls 12.4x. This is evidence that standard LLM agents do not carry plans forward as persistent state, and instead depend on the plan remaining in context. A layer-L32 probe detects this decay as a diagnostic, not as proof that it reads plan content itself. Reasoning models add a measurement confound: their `<think>` traces re-derive plan content, so standard stripping leaves plan evidence in the stripped condition. We name this the reasoning-trace confound and fix it with strict stripping, which removes prior `<think>` blocks from the stripped run only. It recovers +163% of the step+1 signal in-sample and +153% held out, while not meaningfully changing non-reasoning Llama (+4.8%). On DeepSeek-R1-Distill-Llama-70B, a Llama-trained probe transfers at AUROC 0.748 (p=6e-4), while R1-specific probes reach 1.000, suggesting R1 encodes plan signal in a different hidden-state direction. Finally, a compression stress test shows the practical cost: naive plan eviction cuts ALFWorld success by 34.7pp, while probe-gated re-surfacing does not recover it. The contribution is a measurement and stress-test framework showing that agent-critical information can be context-resident rather than persistent. Context management is load bearing, but plan protection alone is not enough.
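The two measurement ideas in this abstract, comparing hidden states by cosine distance and strictly stripping prior `<think>` blocks from the plan-free run, can be sketched as follows. This is a minimal illustration, not the paper's code: the function names, the list-of-strings history format, and the `plan_index` parameter are all hypothetical, and hidden states are stood in for by plain lists of floats.

```python
import math
import re

# Assumption: reasoning traces are delimited by literal <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def strip_plan(history, plan_index, strict=False):
    """Build the plan-free history for the stripped run (hypothetical helper).

    history    : list of message strings, oldest first
    plan_index : position of the plan message to drop
    strict     : if True, also remove <think> blocks from the remaining
                 messages, so re-derived plan content does not leak through
    """
    stripped = []
    for i, message in enumerate(history):
        if i == plan_index:
            continue  # drop the plan itself
        if strict:
            message = THINK_BLOCK.sub("", message).strip()
        stripped.append(message)
    return stripped


history = [
    "plan: go to the kitchen, then open the fridge",
    "<think>the plan says kitchen first</think>go to kitchen",
    "observation: you are in the kitchen",
]
standard = strip_plan(history, plan_index=0)
strict = strip_plan(history, plan_index=0, strict=True)
```

In this toy history, standard stripping still leaves the sentence "the plan says kitchen first" inside the `<think>` block, which is the kind of residual plan evidence the abstract calls the reasoning-trace confound. The replay-pairing distance would then be `cosine_distance` applied to hidden states from the full run and the stripped run at the same step.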

#huggingface #daily-papers

HF Papers | DeepSeek | research | 2mo ago

WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic multi-page websites remain highly challenging. Existing works are often limited to single-page static websites, while agentic frameworks typically rely on multi-turn execution with proprietary models, leading to substantial token costs, high latency, and brittle integration. Training a small LLM end-to-end with reinforcement learning (RL) is a promising alternative, yet it faces a critical bottleneck in designing reliable and computationally feasible rewards for website generation. Unlike single-file coding tasks that can be verified by unit tests, website generation requires evaluating inherently subjective aesthetics, cross-page interactions, and functional correctness. To this end, we propose WebGen-R1, an end-to-end RL framework tailored for project-level website generation. We first introduce a scaffold-driven structured generation paradigm that constrains the large open-ended action space and preserves architectural integrity. We then design a novel cascaded multimodal reward that seamlessly couples structural guarantees with execution-grounded functional feedback and vision-based aesthetic supervision. Extensive experiments demonstrate that our WebGen-R1 substantially transforms a 7B base model from generating nearly nonfunctional websites into producing deployable, aesthetically aligned multi-page websites. Remarkably, our WebGen-R1 not only consistently outperforms heavily scaled open-source models (up to 72B), but also rivals the state-of-the-art DeepSeek-R1 (671B) in functional success, while substantially exceeding it in valid rendering and aesthetic alignment. These results position WebGen-R1 as a viable path for scaling small open models from function-level code generation to project-level web application generation.

#huggingface #daily-papers

arXiv | DeepSeek | ai | 3mo ago

SliderQuant: Accurate Post-Training Quantization for LLMs

In this paper, we address post-training quantization (PTQ) for large language models (LLMs) from an overlooked perspective: given a pre-trained high-precision LLM, the predominant sequential quantization framework treats different layers equally, but this may not be optimal in challenging bit-width settings. We empirically study the quantization impact of different layers on model accuracy, and observe that: (1) shallow/deep layers are usually more sensitive to quantization than intermediate layers; (2) among shallow/deep layers, the most sensitive one is the first/last layer, which exhibits significantly larger quantization error than others. These empirical observations imply that the quantization design for different layers of LLMs is required on multiple levels instead of a single level shared by all layers. Motivated by this, we propose a new PTQ framework termed Sliding-layer Quantization (SliderQuant) that relies on a simple adaptive sliding quantization concept facilitated by a few learnable parameters. The base component of SliderQuant is called inter-layer sliding quantization, which incorporates three types of novel sliding window designs tailored for addressing the varying quantization sensitivity of shallow, intermediate and deep layers. The other component is called intra-layer sliding quantization that leverages an incremental strategy to quantize each window. As a result, SliderQuant has a strong ability to reduce quantization errors across layers. Extensive experiments on basic language generation, zero-shot commonsense reasoning and challenging math and code tasks with various LLMs, including Llama/Llama2/Llama3/Qwen2.5 model families, DeepSeek-R1 distilled models and large MoE models, show that our method outperforms existing PTQ methods (including the latest PTQ methods using rotation transformations) for both weight-only quantization and weight-activation quantization.
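To make "quantization error" concrete, the sketch below measures the mean squared error introduced by plain symmetric round-to-nearest quantization of a weight list at a given bit-width. This is a generic baseline chosen for illustration, not SliderQuant's sliding-window method; the function names and the toy weights are hypothetical.

```python
def quantize_rtn(weights, bits):
    """Symmetric uniform round-to-nearest quantization, then dequantization.

    Baseline illustration only; this is NOT the SliderQuant procedure.
    """
    qmax = 2 ** (bits - 1) - 1              # largest positive integer level
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                # nothing to quantize
    scale = max_abs / qmax                  # one scale for the whole tensor
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def quantization_mse(weights, bits):
    """Mean squared error between the weights and their quantized version."""
    dequantized = quantize_rtn(weights, bits)
    return sum((w - q) ** 2 for w, q in zip(weights, dequantized)) / len(weights)


# Toy stand-in for one layer's weights (hypothetical values).
layer_weights = [0.4, 1.0, -0.7, 0.05]
error_2bit = quantization_mse(layer_weights, bits=2)
error_8bit = quantization_mse(layer_weights, bits=8)
```

Running `quantization_mse` on each layer's weights at a fixed bit-width is one simple way to compare per-layer sensitivity of the kind the abstract describes, under the assumption that weight reconstruction error is an acceptable proxy for accuracy impact.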

#cs.AI

deepseek-r1

Similar Models

DeepSeek-R1 Release 2025/01/20

Social & Blog Posts (7)

Research Papers (16)

Other

deepseek-r1 is now available on Ollama

DeepSeek-R1-0528 Release 2025/05/28

Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀 🧠 Hybrid inference: Think & Non-Think — one model, two modes ⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs

Models & Pricing

⚡️ Efficiency Gains 🤖 DSA achieves fine-grained sparse attention with minimal impact on output quality — boosting long-context performance & reducing compute cost. 📊 Benchmarks show V3.2-Exp perform

The Temperature Parameter

Context Caching is Available 2024/08/02

Information-Aware KV Cache Compression for Long Reasoning

Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL

Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models

SliderQuant: Accurate Post-Training Quantization for LLMs

Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?

Agentic AI and the next intelligence explosion

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

deepseek-r1 - Arena-Hard-Auto

DeepSeek-R1-0528 - LiveCodeBench
