OpenAI launches GPT‑5.3‑Codex‑Spark for real-time coding with 128k context


OpenAI has released a research preview of GPT‑5.3‑Codex‑Spark, a smaller and faster version of GPT‑5.3‑Codex, designed for real-time coding tasks. The model is optimized for ultra-low latency hardware in partnership with Cerebras, delivering more than 1,000 tokens per second while maintaining full coding capability.

Codex-Spark is intended for interactive coding, enabling targeted edits, logic updates, or interface adjustments with immediate results. It also complements longer-running tasks handled by other Codex models. At launch, the model supports a 128k context window and is text-only. Usage during the research preview follows separate rate limits and may be temporarily limited under high demand.

Features: GPT‑5.3‑Codex‑Spark
  • Interactive Performance: Optimized for low-latency editing; by default, it performs minimal edits and does not run tests unless requested.
  • Coding Benchmarks: Evaluated on SWE-Bench Pro and Terminal-Bench 2.0, achieving high accuracy and completing tasks faster than GPT‑5.3‑Codex.

  • Latency Improvements: End-to-end optimizations reduce client-server roundtrip overhead by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The model uses a persistent WebSocket connection by default.
  • Hardware Integration: Runs on Cerebras Wafer Scale Engine 3, providing low-latency inference. GPUs continue to support large-scale, cost-efficient workloads, while Cerebras accelerators enhance interactive coding responsiveness.
  • Dual Coding Modes: Supports both real-time, interactive coding and longer-horizon tasks.
  • Safety and Evaluation: Includes standard safety training, including cyber-relevant guidance. Codex-Spark was assessed under OpenAI’s deployment framework and is not expected to reach high-risk thresholds for cybersecurity or biological capabilities.
What’s Next

The company says Codex-Spark is the first step toward a Codex with two complementary modes—longer-horizon reasoning and real-time collaboration—and expects that over time these modes will blend, keeping users in a tight interactive loop while delegating longer-running tasks to sub-agents or multiple models in parallel.

The company notes that as models become more capable, interaction speed becomes a bottleneck, and ultra-fast inference tightens the loop, making Codex feel more natural to use and expanding what the company expects is possible for anyone turning an idea into working software.

Availability
  • Research Preview: Available today for ChatGPT Pro users via the Codex app, CLI, and VS Code extension.
  • API Access: Limited availability for design partners; broader access will be rolled out in stages.
  • Context and Input: Currently supports text-only input with a 128k context window. Future releases may include multimodal input and larger models.