What’s it about?
OpenAI has released a new AI model called GPT-5.4 that marks a significant advance in the development of autonomous systems. For the first time, core capabilities such as logical reasoning, programming, and direct control of computer interfaces are bundled in a single model. The AI can operate desktop environments on its own, executing mouse and keyboard actions and interpreting screen content.
In desktop-navigation tests, the model achieves a success rate of 75 percent on the OSWorld-Verified benchmark, surpassing both earlier AI systems and human baselines. At the same time, GPT-5.4 operates more efficiently than its predecessors, requiring fewer computing resources while delivering a lower error rate.
Background & Context
The capability for native computer control represents a paradigm shift. While previous AI models were mostly oriented toward specific tasks, GPT-5.4 combines multiple competency areas. The model can coordinate and execute complex, multi-step workflows across different software applications — a development that could enable new forms of digital knowledge work.
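The sources describe this capability only at a high level and publish no API details. Purely as an illustration, an agent of this kind can be pictured as a loop of observe, decide, act: capture the screen, ask the model for the next action, and dispatch it to the desktop. Everything below (the `Desktop` stub, the action strings, the scripted model) is a hypothetical sketch, not OpenAI's actual interface.

```python
# Minimal sketch of a computer-use agent loop (observe -> decide -> act).
# All names and action formats here are hypothetical illustrations,
# not the real GPT-5.4 API.

from dataclasses import dataclass, field

@dataclass
class Desktop:
    """Stub desktop that records the actions an agent performs."""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; we return a text placeholder.
        return f"screen after {len(self.log)} actions"

    def click(self, x: int, y: int):
        self.log.append(("click", x, y))

    def type_text(self, text: str):
        self.log.append(("type", text))

def scripted_model(observation: str, step: int) -> str:
    """Stand-in for the model: returns one action string per step."""
    plan = ["CLICK 120 340", "TYPE hello world", "DONE"]
    return plan[min(step, len(plan) - 1)]

def run_agent(desktop: Desktop, max_steps: int = 10) -> list:
    for step in range(max_steps):
        obs = desktop.screenshot()          # observe
        action = scripted_model(obs, step)  # decide
        verb, _, args = action.partition(" ")
        if verb == "DONE":                  # act (or stop)
            break
        elif verb == "CLICK":
            x, y = map(int, args.split())
            desktop.click(x, y)
        elif verb == "TYPE":
            desktop.type_text(args)
    return desktop.log
```

The loop structure, not the stubbed model, is the point: each iteration feeds fresh screen state back into the decision step, which is what lets such a system carry a workflow across different applications.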
The technical improvements are substantial: the model supports context lengths of up to one million tokens, enabling detailed, long-horizon planning. The rate of factual errors has been reduced by around one third. Dynamic tool search has been improved, allowing the system to identify suitable tools for a given task more efficiently. Web-research functions have also been optimized to consolidate information from multiple sources.
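The sources describe dynamic tool search only at a high level. One simple way to picture the idea is to score each available tool's description against the task and pick the best match. The tool catalog and the word-overlap scoring below are a hypothetical toy, not GPT-5.4's actual mechanism.

```python
# Toy illustration of dynamic tool search: score each tool's description
# against the task by word overlap and pick the best match. The tools and
# the scoring scheme are made up for this sketch.

TOOLS = {
    "web_search": "search the web for pages and consolidate information",
    "code_interpreter": "run python code and inspect program output",
    "file_browser": "open read and navigate files on the local desktop",
}

def select_tool(task: str, tools: dict = TOOLS) -> str:
    """Return the tool whose description shares the most words with the task."""
    task_words = set(task.lower().split())
    def overlap(item):
        name, description = item
        return len(task_words & set(description.split()))
    return max(tools.items(), key=overlap)[0]
```

A production system would use embeddings or a learned ranker rather than word overlap, but the shape is the same: the tool set is searched per task instead of being fixed in the prompt.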
With this release, OpenAI positions itself in direct competition with other providers such as Anthropic, which are also working on agent functionality. For professional users, the integration of these capabilities into a unified model could mean no longer having to switch between different specialized variants.
What does this mean?
- Professional users can automate complex workflows that previously required manual interaction with various programs
- The consolidation of reasoning, coding, and computer control in one model simplifies use for developers and companies
- Higher efficiency with lower token consumption could reduce costs and enable faster response times
- Desktop navigation capability opens up new deployment scenarios for AI agents in enterprise environments
- The extended context length of one million tokens makes larger projects and longer work sessions feasible
Sources
GPT-5.4: OpenAI combines reasoning and coding with computer control (Heise)
OpenAI launches AI models GPT-5.4: Coding, reasoning, and computer control in one model (The Decoder)
Introducing GPT-5.4 (OpenAI)
OpenAI GPT-5.4: New AI model attacks Anthropic with mouse agents and dynamic tool search (Notebookcheck)
OpenAI’s GPT-5.4 model release focuses on AI agents (The Verge)
This article was created with AI and is based on the cited sources and the language model’s training data.
