What’s it about?
OpenAI has released a new AI model called GPT-5.4 that marks a significant advance in the development of autonomous systems. For the first time, core capabilities such as logical reasoning, programming, and direct control of computer interfaces are bundled in a single model. The AI can operate desktop environments on its own, executing mouse and keyboard actions and interpreting screen content.
In desktop-navigation tests, the model achieves a success rate of 75 percent on the OSWorld-Verified benchmark, surpassing both earlier AI systems and human baselines. At the same time, GPT-5.4 operates more efficiently than its predecessors, requiring fewer computing resources while delivering a lower error rate.
Background & Context
The capability for native computer control represents a paradigm shift. While previous AI models were mostly oriented toward specific tasks, GPT-5.4 combines multiple competency areas. The model can coordinate and execute complex, multi-step workflows across different software applications — a development that could enable new forms of digital knowledge work.
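The sources describe this capability only at a high level and publish no API details. Purely as an illustration, an agent of this kind can be pictured as a loop of observe, decide, act: capture the screen, ask the model for the next action, and dispatch it to the desktop. Everything below (the `Desktop` stub, the action strings, the scripted model) is a hypothetical sketch, not OpenAI's actual interface.

```python
# Minimal sketch of a computer-use agent loop (observe -> decide -> act).
# All names and action formats here are hypothetical illustrations,
# not the real GPT-5.4 API.

from dataclasses import dataclass, field

@dataclass
class Desktop:
    """Stub desktop that records the actions an agent performs."""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; we return a text placeholder.
        return f"screen after {len(self.log)} actions"

    def click(self, x: int, y: int):
        self.log.append(("click", x, y))

    def type_text(self, text: str):
        self.log.append(("type", text))

def scripted_model(observation: str, step: int) -> str:
    """Stand-in for the model: returns one action string per step."""
    plan = ["CLICK 120 340", "TYPE hello world", "DONE"]
    return plan[min(step, len(plan) - 1)]

def run_agent(desktop: Desktop, max_steps: int = 10) -> list:
    for step in range(max_steps):
        obs = desktop.screenshot()          # observe
        action = scripted_model(obs, step)  # decide
        verb, _, args = action.partition(" ")
        if verb == "DONE":                  # act (or stop)
            break
        elif verb == "CLICK":
            x, y = map(int, args.split())
            desktop.click(x, y)
        elif verb == "TYPE":
            desktop.type_text(args)
    return desktop.log
```

The loop structure, not the stubbed model, is the point: each iteration feeds fresh screen state back into the decision step, which is what lets such a system carry a workflow across different applications.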
The technical improvements are substantial: the model supports context lengths of up to one million tokens, enabling detailed, long-horizon planning. The rate of factual errors has been reduced by around one third. Dynamic tool search has been improved, allowing the system to identify suitable tools for a given task more efficiently. Web-research functions have also been optimized to consolidate information from multiple sources.
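The sources describe dynamic tool search only at a high level. One simple way to picture the idea is to score each available tool's description against the task and pick the best match. The tool catalog and the word-overlap scoring below are a hypothetical toy, not GPT-5.4's actual mechanism.

```python
# Toy illustration of dynamic tool search: score each tool's description
# against the task by word overlap and pick the best match. The tools and
# the scoring scheme are made up for this sketch.

TOOLS = {
    "web_search": "search the web for pages and consolidate information",
    "code_interpreter": "run python code and inspect program output",
    "file_browser": "open read and navigate files on the local desktop",
}

def select_tool(task: str, tools: dict = TOOLS) -> str:
    """Return the tool whose description shares the most words with the task."""
    task_words = set(task.lower().split())
    def overlap(item):
        name, description = item
        return len(task_words & set(description.split()))
    return max(tools.items(), key=overlap)[0]
```

A production system would use embeddings or a learned ranker rather than word overlap, but the shape is the same: the tool set is searched per task instead of being fixed in the prompt.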
With this release, OpenAI positions itself in direct competition with other providers such as Anthropic, which are also working on agent functionality. For professional users, the integration of these capabilities into a unified model could mean no longer having to switch between different specialized variants.
What does this mean?
- Professional users can automate complex workflows that previously required manual interaction with various programs
- The consolidation of reasoning, coding, and computer control in one model simplifies use for developers and companies
- Higher efficiency with lower token consumption could reduce costs and enable faster response times
- Desktop navigation capability opens up new deployment scenarios for AI agents in enterprise environments
- The extended context length of one million tokens makes larger projects and longer work sessions feasible
Sources
GPT-5.4: OpenAI combines reasoning and coding with computer control (Heise)
OpenAI launches AI models GPT-5.4: Coding, reasoning, and computer control in one model (The Decoder)
Introducing GPT-5.4 (OpenAI)
OpenAI GPT-5.4: New AI model attacks Anthropic with mouse agents and dynamic tool search (Notebookcheck)
OpenAI’s GPT-5.4 model release focuses on AI agents (The Verge)
This article was created with AI and is based on the cited sources and the language model’s training data.
