Name: GPT-5.4
Price: 20 USD
Availability: InStock
Rating: 61.2 (1 review)
Author: OpenAI

GPT-5.4 by OpenAI | AI Market Cap

HF Papers | OpenAI | research | 2mo ago

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. We propose a novel privacy-themed ToM challenge, ToM for Steering Beliefs (ToM-SB), in which a defender must act as a Double Agent to steer the beliefs of an attacker with partial prior knowledge within a shared universe. To succeed on ToM-SB, the defender must engage with and form a ToM of the attacker, with a goal of fooling the attacker into believing they have succeeded in extracting sensitive information. We find that strong frontier models like Gemini3-Pro and GPT-5.4 struggle on ToM-SB, often failing to fool attackers in hard scenarios with partial attacker prior knowledge, even when prompted to reason about the attacker's beliefs (ToM prompting). To close this gap, we train models on ToM-SB to act as AI Double Agents using reinforcement learning, testing both fooling and ToM rewards. Notably, we find a bidirectionally emergent relationship between ToM and attacker-fooling: rewarding fooling success alone improves ToM, and rewarding ToM alone improves fooling. Across four attackers with different strengths, six defender methods, and both in-distribution and out-of-distribution (OOD) evaluation, we find that gains in ToM and attacker-fooling are well-correlated, highlighting belief modeling as a key driver of success on ToM-SB. AI Double Agents that combine both ToM and fooling rewards yield the strongest fooling and ToM performance, outperforming Gemini3-Pro and GPT-5.4 with ToM prompting on hard scenarios. We also show that ToM-SB and AI Double Agents can be extended to stronger attackers, demonstrating generalization to OOD settings and the upgradability of our task.

#huggingface #daily-papers

GPT-5.4

Similar Models

Introducing LifeSciBench

Social & Blog Posts (10)

Research Papers (7)

Other

Introducing GPT-5.4

We’re sharing new research on a method for anticipating how models may behave in real-world use before release: simulating deployment with recent, de-identified user requests and studying candidate mo

Predicting model behavior before release by simulating deployment

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one fr

Introducing GPT-5.4

OpenAI DevDay 2026 applications are now open! Our biggest developer event gets even bigger. 📍 San Francisco 📅 September 29 Apply by July 10: https://t.co/BJyK2EbKuu

Samsung Electronics brings ChatGPT and Codex to employees

As AI takes on longer, higher-stakes tasks, we want models to carry beneficial and safe behavior into new domains beyond their training—and maintain it under pressure. That’s the idea behind our new r

Improving health intelligence in ChatGPT

Introducing LifeSciBench, a benchmark for measuring and improving how well AI supports real-world life science research. Developed with 173 scientists from biotechnology and pharmaceutical research, L

Introducing LifeSciBench

Let’s talk about evals. We’re always looking for better ways to measure and forecast model progress, especially as benchmarks get saturated or gamed. @tejalpatwardhan, who leads our frontier evals tea

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Streaming Communication in Multi-Agent Reasoning

CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves

IntentGrasp: A Comprehensive Benchmark for Intent Understanding

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

Introducing GPT‑5 for developers

GPT5.4 - GAIA