GPT-5.4 Drops With a 1 Million Token Context Window and Built-In Computer Use
OpenAI just shipped GPT-5.4, and it's not just another incremental upgrade. Released March 5 via the Chat Completions and Responses API, this model introduces three capabilities that fundamentally change what you can build with a language model: a 1.05 million token context window, built-in computer use, and a new feature called tool search that lets the model discover and load tools on demand. If GPT-5.3 was about making models faster, GPT-5.4 is about making them more capable agents.
The Million-Token Context Window
Let's start with the headline number. GPT-5.4 supports a 1.05 million token context window — roughly 3,000 pages of text, or about 15 average-length novels. That's not just a benchmark bragging point. It's the threshold where AI agents can realistically work with entire codebases, complete legal discovery documents, or full research paper collections without needing to chunk and summarize.
To put this in perspective, Claude Opus 4.6 currently offers 200K tokens. Google's Gemini 3 models reach 1M but with quality degradation at the extremes. GPT-5.4's million-token window comes with a pricing twist: prompts exceeding 272K input tokens are charged at 2x input and 1.5x output rates for the full session. OpenAI is essentially saying "yes, you can use the whole window, but we're going to charge you for the privilege." At $2.50 per million input tokens (standard rate), that's still remarkably affordable for what you're getting.
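To see how the long-context premium plays out, here's a back-of-envelope cost calculator using the rates and multipliers quoted above. The assumption that the 2x input / 1.5x output multipliers apply to the entire request once input crosses 272K tokens follows the description in this article; treat it as a sketch, not billing documentation.

```python
# Sketch of the long-context pricing described above (rates in $ per 1M tokens).
# Assumes the 2x input / 1.5x output multipliers apply to the whole request
# once input exceeds 272K tokens, as described in this article.

INPUT_RATE = 2.50      # standard input rate
OUTPUT_RATE = 15.00    # standard output rate
LONG_CONTEXT_THRESHOLD = 272_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate dollar cost for a single uncached request."""
    in_mult, out_mult = (2.0, 1.5) if input_tokens > LONG_CONTEXT_THRESHOLD else (1.0, 1.0)
    cost = (input_tokens / 1e6) * INPUT_RATE * in_mult \
         + (output_tokens / 1e6) * OUTPUT_RATE * out_mult
    return round(cost, 4)

# A 200K-token prompt stays at standard rates...
print(request_cost(200_000, 4_000))   # 0.56
# ...while an 800K-token prompt pays the long-context premium.
print(request_cost(800_000, 4_000))   # 4.09
```

Even at the premium rate, an 800K-token request comes in around $4 — which is what makes "use the whole window" a realistic option rather than a stunt.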
But the real innovation isn't just the window size — it's how OpenAI handles what happens when you fill it up.
Compaction: The Memory That Shrinks Itself
Here's the problem with massive context windows: eventually, you fill them. In long-running agent workflows — think a coding assistant that's been debugging for hours, or a research agent that's been reading papers all day — the conversation history grows until it hits the limit. Previously, you'd either truncate (losing important context) or start a new session (losing everything).
GPT-5.4 introduces compaction, a native feature that automatically compresses context when it crosses a configurable threshold. Think of it like your brain's ability to forget the specific words someone used in a conversation while remembering the key points. When triggered, compaction produces an encrypted, opaque item that carries forward the essential state and reasoning from the prior context using far fewer tokens.
The implementation is elegant. You set a compact_threshold in your API call, and when the rendered token count crosses that line, the server runs a compaction pass mid-stream. The compacted context item appears in your response, and you carry it forward to the next turn. No separate API call needed. For developers building long-running agents, this is the difference between agents that can work for minutes and agents that can work for hours or days.
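In code, the loop looks roughly like this. The `compact_threshold` parameter and the opaque compacted item come from the description above; the exact JSON layout and the `"compaction"` type name are assumptions for illustration, not a documented schema.

```python
# Hypothetical request/response shapes for compaction, based on the description
# above. "compact_threshold" and the opaque compacted item follow the article;
# the exact field layout is an assumed sketch, not a documented schema.

def build_request(model_input: list, compact_threshold: int) -> dict:
    """Assemble a Responses-API-style payload with a compaction threshold."""
    return {
        "model": "gpt-5.4",
        "input": model_input,
        "compact_threshold": compact_threshold,  # server compacts past this point
    }

def carry_forward(prior_output: list, new_user_turn: dict) -> list:
    """Keep any compacted context item from the prior turn, append the new input."""
    compacted = [item for item in prior_output if item.get("type") == "compaction"]
    return compacted + [new_user_turn]

# First turn: a long debugging session that may trigger compaction mid-stream.
turn_1 = build_request([{"role": "user", "content": "Debug this repo..."}], 400_000)

# Simulated response output containing an opaque, encrypted compacted item.
response_output = [
    {"type": "compaction", "encrypted_content": "..."},
    {"type": "message", "role": "assistant", "content": "Found the bug."},
]

# The next turn carries only the compacted item plus the new message.
turn_2_input = carry_forward(response_output, {"role": "user", "content": "Now fix it."})
print(len(turn_2_input))  # 2
```

The key design point: the client never interprets the compacted item, it just passes it back. The compressed state stays server-side and opaque.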
Tool Search: The End of Tool Overload
This might be the most underappreciated feature in the release. If you've built an AI agent with more than a dozen tools, you know the pain: every tool definition goes into the system prompt, eating tokens, slowing responses, and confusing the model when it has to pick from 50+ options. It's like handing someone a 200-page manual when they just need to know how to use a screwdriver.
Tool search flips this on its head. Instead of loading every tool definition upfront, you mark tools with defer_loading: true and add a tool_search entry to your tools array. The model sees only the name and description of deferred tools at first. When it needs a specific tool, it searches for it, loads the full definition (including parameter schemas) into context, and uses it — all within the same API call.
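Here's what a deferred tools array might look like. The `defer_loading` flag and the `tool_search` entry are the pieces named above; the surrounding function-tool schema is assumed to match the usual JSON-schema shape, and the tool names are made up for illustration.

```python
# A sketch of a tools array using tool search, per the description above.
# "defer_loading" and the "tool_search" entry come from the article; the
# function-tool schema shape and tool names are illustrative assumptions.

def deferred_tool(name: str, description: str, parameters: dict) -> dict:
    """A function tool whose full schema loads only when the model searches for it."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": parameters,
        "defer_loading": True,  # only name + description enter context up front
    }

tools = [
    {"type": "tool_search"},  # lets the model discover deferred tools on demand
    deferred_tool(
        "billing.refund_order",
        "Issue a refund for a customer order.",
        {"type": "object", "properties": {"order_id": {"type": "string"}}},
    ),
    deferred_tool(
        "billing.list_invoices",
        "List invoices for a customer account.",
        {"type": "object", "properties": {"customer_id": {"type": "string"}}},
    ),
]

# Only the search entry plus names/descriptions hit the context window initially.
deferred = [t for t in tools if t.get("defer_loading")]
print(len(deferred))  # 2
```

Note the `billing.` prefix on the tool names: grouping related functions under a shared namespace is what makes the model's search step effective.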
The practical impact is enormous. An enterprise agent with 200 internal tools no longer needs to stuff all 200 definitions into every request. The model discovers what it needs on demand, preserving cache performance and reducing token usage dramatically. OpenAI recommends grouping tools into namespaces of fewer than 10 functions each, with clear descriptions so the model can search effectively.
This is a capability that only GPT-5.4 and later models support, and it's a strong signal about where OpenAI sees the future: agents that interact with large, complex tool ecosystems without human curation of which tools to include.
Built-In Computer Use
GPT-5.4 also gets native computer use through the Responses API — the ability to interact with desktop applications via screenshots, clicks, and keyboard input. This puts it in direct competition with Anthropic's Claude computer use, which has been available since late 2024. While computer use isn't new conceptually, having it built into OpenAI's flagship model with a dedicated API surface makes it significantly more accessible for developers building automation workflows.
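The control flow for computer use is a simple loop: screenshot in, action out, repeat until done. Everything in this sketch — the function names, the action vocabulary, the "done" signal — is an illustrative assumption about the general pattern, not the documented API surface.

```python
# A minimal shape for a computer-use agent loop, as described above. All
# function names and the action vocabulary here are illustrative assumptions
# about the screenshot -> action pattern, not the documented API surface.

def run_computer_use_turn(goal: str, take_screenshot, execute_action, call_model):
    """Drive one task: screenshot -> model -> action, until the model says done."""
    history = [{"role": "user", "content": goal}]
    while True:
        screenshot = take_screenshot()            # PNG bytes of the desktop
        action = call_model(history, screenshot)  # e.g. {"type": "click", "x": 10, "y": 20}
        if action["type"] == "done":
            return action.get("summary", "")
        execute_action(action)                    # click / type / scroll, etc.
        history.append({"role": "assistant", "content": str(action)})

# Stub environment for illustration: the "model" clicks once, then finishes.
script = iter([{"type": "click", "x": 100, "y": 200},
               {"type": "done", "summary": "Form submitted."}])
result = run_computer_use_turn(
    "Submit the form",
    take_screenshot=lambda: b"",
    execute_action=lambda a: None,
    call_model=lambda h, s: next(script),
)
print(result)  # Form submitted.
```

The loop structure is the same one Anthropic's computer use established; what changes with GPT-5.4 is that the model side of `call_model` is now a first-class surface in the Responses API rather than a bolt-on.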
The Pricing Picture
At $2.50 per million input tokens and $15.00 per million output tokens, GPT-5.4 sits at a premium over GPT-5.2 ($1.75 input) but well below the GPT-5.4 Pro tier. Cached inputs drop to $0.25 per million — a 10x discount that makes tool search's cache-preserving design even more valuable. There's also a GPT-5.4 Pro variant for "tougher problems that benefit from more compute," available through the Responses API.
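The cache discount compounds quickly for agents that resend a large, stable prompt (system instructions, tool definitions) on every turn. A quick sketch using the rates quoted above:

```python
# Back-of-envelope savings from the cached-input discount quoted above:
# $2.50/M for fresh input tokens vs $0.25/M when served from cache.

FRESH_RATE = 2.50    # $ per 1M uncached input tokens
CACHED_RATE = 0.25   # $ per 1M cached input tokens

def input_cost(fresh_tokens: int, cached_tokens: int) -> float:
    """Dollar cost of the input side of one request."""
    return round(fresh_tokens / 1e6 * FRESH_RATE + cached_tokens / 1e6 * CACHED_RATE, 4)

# A 100K-token request, fully fresh vs 90% served from cache:
print(input_cost(100_000, 0))        # 0.25
print(input_cost(10_000, 90_000))    # 0.0475
```

This is why tool search's cache-preserving design matters: keeping the deferred tool prompt stable across turns keeps most of it in the cheap cached tier.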
A million tokens of context means nothing if you can't manage it. Compaction is the quiet revolution that makes massive context windows actually usable.
Key Takeaways
- GPT-5.4 offers a 1.05M token context window — roughly 3,000 pages of text in a single session
- Compaction automatically compresses context mid-conversation, enabling agents that run for hours without losing critical state
- Tool search lets models discover and load tools on demand, eliminating the need to front-load hundreds of tool definitions
- Built-in computer use via the Responses API puts GPT-5.4 in direct competition with Claude's computer use capabilities
- Pricing starts at $2.50/M input tokens with 10x discounts for cached inputs, though long-context sessions (>272K tokens) incur premium rates
Our Take
The three features in GPT-5.4 aren't just independent upgrades — they're a coherent vision for what AI agents should look like. Tool search solves the problem of agents that need access to large tool ecosystems. The million-token context window gives agents the memory to work on complex, multi-hour tasks. And compaction ensures that memory doesn't eventually choke the system. Together, they describe an agent that can sit down at a computer, discover what tools are available, work on a problem for hours, and maintain coherent reasoning throughout — without a human babysitting the context window. That's a meaningful step beyond what was possible even two months ago.

The competitive implications are significant. Anthropic's Claude has had computer use longer, but OpenAI's tool search feature has no direct equivalent in the market. Google's Gemini matches on context length but lacks native compaction. If you're building production AI agents, GPT-5.4's combination of features makes it arguably the most complete foundation available right now — not the best at any single thing, but the most cohesive package for agentic workloads.