OpenAI rolls out ChatGPT Agent with reasoning, planning, and API integration


OpenAI has launched its most advanced system yet — a ChatGPT agent that can plan, reason, and perform multi-step tasks across the web and apps using integrated tools. The agent introduces a new layer of AI automation with features like browsing, terminal access, file editing, and API integration — all executed securely through a virtual computer.

What Is the ChatGPT Agent?

This agentic system builds on ChatGPT’s existing tools by allowing the AI to think, decide, and act independently. It uses a sandboxed virtual machine to complete tasks safely, switching between tools like a web browser, Python code interpreter, terminal, and file system.

The system is designed to automate workflows across work, research, and productivity, while keeping users in control at every step. OpenAI said the agent doesn’t just respond to prompts — it can break down complex instructions, plan the steps needed, execute each part autonomously, and notify users when finished.

What Can It Do?

According to OpenAI, the ChatGPT agent is built for real-world use cases and productivity scenarios. Key capabilities include:

  • Planning and Task Execution – Splits complex instructions into steps, executes them one by one, and adapts based on results
  • Tool Integration – Switches between browser, terminal, code interpreter, and file tools during execution
  • Web Browsing with Actions – Searches the web, logs into websites (with approval), downloads files, and extracts or summarizes information
  • API and App Access – Interacts with services like Gmail, Google Calendar, Notion, and GitHub to automate tasks
  • Multi-Step Reports – Can create slide decks, fill spreadsheets, analyze CSVs, and generate end-to-end reports
  • File Editing & Terminal Use – Navigates directories, reads files, writes scripts, runs commands, and edits documents
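The plan-then-execute behaviour described above can be illustrated with a short sketch. This is not how OpenAI implements the agent. The tool names, the `Agent` and `Step` classes, and the fixed plan are all hypothetical stand-ins, assumed only to show the shape of the loop: decompose a task, switch tools per step, adapt when a step fails, and notify the user at the end.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tools: plain functions standing in for the browser, terminal,
# and file tools. These are NOT OpenAI's actual tool interfaces.
def browser(arg: str) -> str:
    return f"fetched results for '{arg}'"

def terminal(arg: str) -> str:
    return f"ran '{arg}'"

def files(arg: str) -> str:
    return f"saved '{arg}'"

TOOLS: dict[str, Callable[[str], str]] = {
    "browser": browser,
    "terminal": terminal,
    "files": files,
}

@dataclass
class Step:
    tool: str  # which tool to switch to for this step
    arg: str   # input passed to that tool

@dataclass
class Agent:
    history: list[str] = field(default_factory=list)

    def plan(self, task: str) -> list[Step]:
        # Stand-in planner with a fixed decomposition; a real agent would
        # ask the model to break the task into steps.
        return [
            Step("browser", f"search: {task}"),
            Step("spreadsheet", "summary.xlsx"),  # deliberately unregistered tool
            Step("terminal", "python analyze.py"),
            Step("files", "report.md"),
        ]

    def run(self, task: str) -> list[str]:
        queue = self.plan(task)
        while queue:
            step = queue.pop(0)
            tool = TOOLS.get(step.tool)
            if tool is None:
                # Adapt to the failure: retry the step with the file tool instead.
                self.history.append(f"no '{step.tool}' tool; falling back to files")
                queue.insert(0, Step("files", step.arg))
                continue
            self.history.append(f"{step.tool}: {tool(step.arg)}")
        # Notify the user once every step has run.
        self.history.append("done: notifying user")
        return self.history
```

Calling `Agent().run("sales data")` returns six history entries: the browser result, the fallback notice, the saved `summary.xlsx`, the terminal run, the saved `report.md`, and the final notification. The history list doubles as a simple task log.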

Safety, Privacy, and User Controls

OpenAI emphasized that safety is built into every layer of the agent’s operation. Key safeguards include:

  • User review required for all irreversible actions (e.g., submitting forms, making purchases)
  • Virtual browsing sandbox prevents direct access to passwords, banking info, or sensitive credentials
  • Session control – users can view, pause, or cancel the agent’s activity at any point
  • Built-in resistance to misuse in sensitive domains like chemistry, biology, or cybersecurity
  • Auditability – full task history is available for transparency

OpenAI said users remain in full control throughout each session and can disable agent access at any time.
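The approval and cancellation safeguards can be pictured as a gate in front of every action. The sketch below is a hypothetical illustration, not OpenAI's code. `Action`, `Session`, and the `approve` callback (standing in for the confirmation prompt a user would see) are assumed names. Irreversible actions run only after approval, a cancelled session blocks everything, and each decision is written to an audit log.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Action:
    name: str
    irreversible: bool  # e.g. submitting a form or making a purchase

@dataclass
class Session:
    # Hypothetical callback standing in for the user's confirmation prompt.
    approve: Callable[[Action], bool]
    cancelled: bool = False
    audit_log: list[str] = field(default_factory=list)

    def execute(self, action: Action) -> bool:
        if self.cancelled:
            self.audit_log.append(f"blocked (session cancelled): {action.name}")
            return False
        if action.irreversible and not self.approve(action):
            self.audit_log.append(f"declined by user: {action.name}")
            return False
        self.audit_log.append(f"executed: {action.name}")
        return True

    def cancel(self) -> None:
        # The user can stop the session at any point.
        self.cancelled = True
        self.audit_log.append("session cancelled by user")
```

For example, with `Session(approve=lambda a: False)`, reading an inbox succeeds, submitting an order is declined, and after `cancel()` even read-only actions are blocked. All of these outcomes remain visible in `audit_log`.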

Benchmarks and Performance

The ChatGPT agent sets new benchmark records across multiple test suites:

  • Humanity's Last Exam: 41.6% pass@1, up from 35.7% previously
  • FrontierMath: 27.4% accuracy, the highest among public models
  • Led performance in AutoDemos, SimulEval, and Multi-hop Retrieval for data and spreadsheet tasks
  • Improved results in multi-step planning, coding, and reasoning tests

OpenAI said these performance gains reflect the agent’s ability to handle professional-grade workflows across data, research, and coding.
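For readers unfamiliar with the metric, pass@1 is the share of problems answered correctly on a single attempt. When several attempts are sampled per problem, a widely used unbiased estimator of pass@k (introduced alongside the HumanEval benchmark by Chen et al., 2021) is 1 - C(n-c, k) / C(n, k), where n is the number of samples and c is the number of correct ones. The sketch below is illustrative only. It does not reproduce OpenAI's evaluation setup, and the example data are made up.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct.

    Equals 1 - C(n - c, k) / C(n, k), i.e. one minus the chance that k samples
    picked without replacement from the n contain no correct one.
    """
    if n - c < k:
        return 1.0  # every size-k subset must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n, so a toy benchmark of four single-attempt problems with two solved, `benchmark_score([(1, 1), (1, 0), (1, 0), (1, 1)])`, scores 0.5.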

Limitations and What’s Coming Next

Despite its advanced capabilities, the agent has several current limitations:

  • Slide decks are basic – layout and formatting are not polished
  • Spreadsheets may contain errors – issues with formatting and data consistency
  • Clarification may be needed – agent may misunderstand ambiguous or layered instructions
  • Tool switching can lag – depending on complexity, execution may slow between steps
  • Decisions may be wrong – the agent can still make mistakes
  • Full automation is limited – critical actions always require user input

OpenAI expects upcoming updates to improve:

  • Slide and visual formatting
  • Spreadsheet accuracy and formula handling
  • Tool coordination fluidity
  • Reliability of complex, multi-step tasks

The company said it also plans to expand support to the EEA and Switzerland, and further strengthen memory, reasoning, and multi-modal integration.

Availability and Access

OpenAI will roll out the ChatGPT agent in phases beginning July 17, 2025.

  • ChatGPT Pro users get 400 agent messages per month
  • Plus and Team users get 40 messages per month, with top-ups available
  • Enterprise and Education customers will gain access in upcoming updates
  • The system replaces the earlier Operator beta and integrates with Deep Research, which now runs directly within ChatGPT

Users can manage the agent through settings and are notified when it’s active or has completed a task.
