OpenAI rolls out ChatGPT Agent with reasoning, planning, and API integration

OpenAI has launched a ChatGPT agent, its most advanced system yet, that can plan, reason, and perform multi-step tasks across the web and apps using integrated tools. The agent adds a layer of AI automation with browsing, terminal access, file editing, and API integration, all executed inside a sandboxed virtual computer.

What Is the ChatGPT Agent?

This agentic system builds on ChatGPT’s existing tools by allowing the AI to think, decide, and act independently. It uses a sandboxed virtual machine to complete tasks safely, switching between tools like a web browser, Python code interpreter, terminal, and file system.

The system is designed to automate workflows across work, research, and productivity, while keeping users in control at every step. OpenAI said the agent doesn’t just respond to prompts — it can break down complex instructions, plan the steps needed, execute each part autonomously, and notify users when finished.

What Can It Do?

According to OpenAI, the ChatGPT agent is built for real-world use cases and productivity scenarios. Key capabilities include:

Planning and Task Execution – Splits complex instructions into steps, executes them one by one, and adapts based on results
Tool Integration – Switches between browser, terminal, code interpreter, and file tools during execution
Web Browsing with Actions – Searches the web, logs into websites (with approval), downloads files, and extracts or summarizes information
API and App Access – Interacts with services like Gmail, Google Calendar, Notion, and GitHub to automate tasks
Multi-Step Reports – Can create slide decks, fill spreadsheets, analyze CSVs, and generate end-to-end reports
File Editing & Terminal Use – Navigates directories, reads files, writes scripts, runs commands, and edits documents
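
OpenAI has not published the agent's internals, so the following is only a rough sketch of the plan, execute, and adapt pattern described above. Every name in it (Step, run_plan, the tool functions, the fallback table) is hypothetical and chosen for illustration; none of it is an OpenAI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str                  # hypothetical tool name, e.g. "browser" or "terminal"
    argument: str              # what the tool is asked to do
    is_fallback: bool = False  # True when this step was queued after a failure


def run_plan(plan, tools, fallbacks=None):
    """Run steps one by one; on failure, retry once with a fallback tool."""
    fallbacks = fallbacks or {}
    history = []               # (step, result) pairs, in execution order
    queue = list(plan)
    while queue:
        step = queue.pop(0)
        try:
            result = tools[step.tool](step.argument)
            succeeded = True
        except Exception as exc:
            result = f"failed: {exc}"
            succeeded = False
        history.append((step, result))
        # Adapt based on the result: a failed step is retried with another
        # tool, but a fallback step is never retried again (avoids loops).
        if not succeeded and not step.is_fallback and step.tool in fallbacks:
            retry = Step(fallbacks[step.tool], step.argument, is_fallback=True)
            queue.insert(0, retry)
    return history


# Stand-in tools for the demo; real tools would do actual work.
def flaky_browser(query):
    raise RuntimeError("page timed out")


def terminal(command):
    return f"ran {command}"


history = run_plan(
    [Step("browser", "fetch report"), Step("terminal", "ls")],
    tools={"browser": flaky_browser, "terminal": terminal},
    fallbacks={"browser": "terminal"},
)
for step, result in history:
    print(step.tool, "->", result)
```

In this toy run the browser step fails, the same request is retried through the terminal stand-in, and the remaining step then runs as planned. A production agent would presumably re-plan with a model rather than a fixed fallback table.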

Safety, Privacy, and User Controls

OpenAI emphasized that safety is built into every layer of the agent’s operation. Key safeguards include:

User review required for all irreversible actions (e.g., submitting forms, making purchases)
Virtual browsing sandbox prevents direct access to passwords, banking info, or sensitive credentials
Session control – users can view, pause, or cancel the agent’s activity at any point
Built-in resistance to misuse in sensitive domains like chemistry, biology, or cybersecurity
Auditability – full task history is available for transparency

OpenAI said users remain in full control throughout each session and can disable agent access at any time.
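
The review requirement and task history described above can be pictured as a simple approval gate. This is a minimal sketch under stated assumptions, not OpenAI's implementation: Action, perform, and the approve callback are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    irreversible: bool  # e.g. submitting a form or making a purchase


def perform(action, approve, audit_log):
    """Run an action, asking the user first if it cannot be undone."""
    if action.irreversible and not approve(action):
        audit_log.append(f"blocked: {action.name}")
        return False
    # A real agent would carry out the action here; the sketch only logs it.
    audit_log.append(f"done: {action.name}")
    return True


audit_log = []


def deny_all(action):
    # Stand-in for a user who declines every confirmation prompt.
    return False


perform(Action("read pricing page", irreversible=False), deny_all, audit_log)
perform(Action("submit order form", irreversible=True), deny_all, audit_log)
print(audit_log)
```

Reversible actions pass straight through, while irreversible ones wait on the user's answer, and both outcomes land in the log so the session can be audited afterward.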

Benchmarks and Performance

The ChatGPT agent sets new benchmark records across multiple test suites:

HumanEval – 41.6% pass@1, up from 35.7% previously
FrontierMath – 27.4% accuracy, the highest among public models
AutoDemos, SimulEval, and Multi-hop Retrieval – led performance on data and spreadsheet tasks
Multi-step planning, coding, and reasoning – improved results across tests

OpenAI said these performance gains reflect the agent’s ability to handle professional-grade workflows across data, research, and coding.

Limitations and What’s Coming Next

Despite its advanced capabilities, the agent has several current limitations:

Slide decks are basic – layout and formatting are not polished
Spreadsheets may contain errors – issues with formatting and data consistency
Clarification may be needed – agent may misunderstand ambiguous or layered instructions
Tool switching can lag – depending on complexity, execution may slow between steps
Decisions may be incorrect – the agent can make wrong choices during a task
Full automation is limited – critical actions always require user input

OpenAI expects upcoming updates will improve:

Slide and visual formatting
Spreadsheet accuracy and formula handling
Tool coordination fluidity
Reliability of complex, multi-step tasks

The company said it also plans to expand support to the EEA and Switzerland, and further strengthen memory, reasoning, and multi-modal integration.

Availability and Access

OpenAI will roll out the ChatGPT agent in phases beginning July 17, 2025.

ChatGPT Pro users get 400 messages per month with agent access
Plus and Team users get 40 messages per month; top-ups are available
Enterprise and Education customers will gain access in upcoming updates

The system replaces the earlier Operator beta and integrates with Deep Research, which now runs directly within ChatGPT.

Users can manage the agent through settings and are notified when it’s active or has completed a task.
