
ZDNET’s key takeaways
- OpenAI targets “conversational” coding, not slow batch-style agents.
- Big latency wins: 80% lower roundtrip overhead, 50% faster time-to-first-token.
- Runs on Cerebras WSE-3 chips for a latency-first Codex serving tier.
The Codex team at OpenAI is on fire. Less than two weeks after releasing a dedicated agent-based Codex app for Macs, and only a week after releasing the faster and more steerable GPT-5.3-Codex language model, OpenAI is counting on lightning striking a third time.
Also: OpenAI’s new GPT-5.3-Codex is 25% faster and goes way beyond coding now – what’s new
Today, the company announced a research preview of GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex built for real-time coding in Codex. The company reports that it generates code 15 times faster while “remaining highly capable for real-world coding tasks.” There’s a catch, and I’ll talk about that in a minute.
Also: OpenAI’s Codex just got its own Mac app – and anyone can try it for free now
Codex-Spark will initially be available only to $200/month Pro tier users, with separate rate limits during the preview period. If it follows OpenAI’s usual release strategy for Codex launches, Plus users will be next, with other tiers gaining access fairly quickly.
(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Expanding the Codex family for real-time collaboration
OpenAI says Codex-Spark is its “first model designed specifically for working with Codex in real-time — making targeted edits, reshaping logic, or refining interfaces and seeing results immediately.”
Let’s deconstruct this briefly. Most agentic AI programming tools take a while to respond to instructions. In my programming work, I can give an instruction (and this applies to both Codex and Claude Code) and go off and work on something else for a while. Sometimes it’s just a few minutes. Other times, it can be long enough to get lunch.
Also: I got 4 years of product development done in 4 days for $200, and I’m still shocked
Codex-Spark is apparently able to respond much faster, allowing for quick, continuous work. This could speed up development considerably, especially for simpler prompts and queries.
I know I’ve often been frustrated when I’ve asked an AI a super simple question that should have generated an immediate response, but instead I still had to wait five minutes for an answer.
By making responsiveness a core feature, the model supports more fluid, conversational coding. Sometimes, using coding agents feels more like old-school batch-style coding. This is designed to overcome that feeling.
GPT-5.3-Codex-Spark isn’t meant to replace the base GPT-5.3-Codex. Instead, Spark was designed to complement high-performance AI models built for long-running, autonomous tasks lasting hours, days, or weeks.
Performance
The Codex-Spark model is meant for work where responsiveness matters as much as intelligence. It supports interruption and redirection mid-task, enabling tight iteration loops.
That’s something that appeals to me, because I always think of something more to tell the AI ten seconds after I’ve given it an assignment.
Also: I used Claude Code to vibe code a Mac app in 8 hours, but it was more work than magic
The Spark model defaults to lightweight, targeted edits, making quick tweaks rather than taking big swings. It also doesn’t automatically run tests unless asked.
OpenAI has been able to reduce latency (faster turnaround) across the entire request-response pipeline. It says that overhead per client/server roundtrip has been reduced by 80%, per-token overhead has been reduced by 30%, and time-to-first-token has been cut by 50% through session initialization and streaming optimizations.
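To get a feel for how those three reductions compose, here’s some back-of-the-envelope arithmetic. Only the percentages come from OpenAI’s announcement; the millisecond baselines below are invented for illustration, and the model’s separate 15x generation speedup isn’t modeled at all.

```python
# Illustrative only: the percentage reductions are OpenAI's stated figures,
# but the baseline millisecond numbers are made up for this example.
roundtrip_overhead_ms = 200.0   # hypothetical per-request roundtrip overhead
per_token_overhead_ms = 10.0    # hypothetical per-token overhead
time_to_first_token_ms = 800.0  # hypothetical time-to-first-token
tokens = 300                    # hypothetical response length

before = (time_to_first_token_ms
          + roundtrip_overhead_ms
          + tokens * per_token_overhead_ms)
after = (time_to_first_token_ms * 0.50             # TTFT cut by 50%
         + roundtrip_overhead_ms * 0.20            # roundtrip overhead cut by 80%
         + tokens * per_token_overhead_ms * 0.70)  # per-token overhead cut by 30%

print(f"before: {before:.0f} ms, after: {after:.0f} ms")
# before: 4000 ms, after: 2540 ms
```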
Another mechanism that improves responsiveness during iteration is the introduction of a persistent WebSocket connection, so the connection doesn’t have to be repeatedly renegotiated.
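For a sense of what that buys you, here’s a minimal sketch of the persistent-connection pattern. To be clear, this is my illustration, not OpenAI’s actual protocol: the endpoint URL and message format are hypothetical, and the third-party websockets package stands in for whatever client Codex really uses. The point is that the TCP/TLS handshake happens once, and every follow-up prompt rides the already-open socket.

```python
# Sketch of the persistent-connection pattern -- hypothetical endpoint
# and message format, not OpenAI's actual Codex protocol.
import asyncio

import websockets  # third-party: pip install websockets


async def interactive_session(url: str, prompts: list[str]) -> None:
    # Pay the TCP/TLS/WebSocket handshake cost exactly once...
    async with websockets.connect(url) as ws:
        for prompt in prompts:
            # ...then each follow-up is just a frame on the open socket,
            # with no per-request connection renegotiation.
            await ws.send(prompt)
            reply = await ws.recv()
            print(reply)


if __name__ == "__main__":
    asyncio.run(interactive_session(
        "wss://example.invalid/codex",  # placeholder URL, not a real endpoint
        ["rename this function", "now add a docstring"],
    ))
```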
Powered by Cerebras AI chips
In January, OpenAI announced a partnership with AI chipmaker Cerebras. We’ve been covering Cerebras for a while. We’ve covered its inference service, its work with DeepSeek, its work boosting the performance of Meta’s Llama model, and Cerebras’ announcement of a really big AI chip, intended to double LLM performance.
GPT-5.3-Codex-Spark is the first milestone for the OpenAI/Cerebras partnership announced last month. The Spark model runs on Cerebras’ Wafer Scale Engine 3, a high-performance AI chip architecture that boosts speed by putting all the compute resources on a single wafer-scale processor the size of a pancake.
Also: 7 ChatGPT settings tweaks that I can’t work without – and I’m a power user
Usually, a semiconductor wafer contains a whole bunch of processors, which later in the manufacturing process get cut apart and put into their own packaging. The Cerebras wafer contains just one chip, making it a very, very big processor with very, very closely coupled connections.
According to Sean Lie, CTO and co-founder of Cerebras, “What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible — new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning.”
The gotchas
Now, here are the gotchas.
First, OpenAI says that “when demand is high, you may see slower access or temporary queuing as we balance reliability across users.” So, fast, unless too many people want to go fast.
Here’s the kicker. The company says, “On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks evaluating agentic software engineering capability, GPT-5.3-Codex-Spark underperforms GPT-5.3-Codex, but can accomplish the tasks in a fraction of the time.”
Last week, in the GPT-5.3-Codex announcement, OpenAI said that GPT-5.3-Codex was the first model it classifies as “high capability” for cybersecurity, according to its published Preparedness Framework. On the other hand, the company admitted that GPT-5.3-Codex-Spark “does not have a plausible chance of reaching our Preparedness Framework threshold for high capability in cybersecurity.”
Also: I stopped using ChatGPT for everything: These AI models beat it at research, coding, and more
Think on these statements, dear reader. This AI isn’t as smart, but it does do those not-as-smart things a lot faster. 15x speed is certainly nothing to sneeze at. But do you really want an AI to make coding mistakes 15 times faster and produce code that’s less secure?
Let me tell you this. “Eh, it’s good enough” isn’t really good enough when you have thousands of frustrated users coming at you with torches and pitchforks because you suddenly broke their software with a new release. Ask me how I know.
Last week, we learned that OpenAI uses Codex to write Codex. We also know that it uses it to build code much faster. So the company clearly has a use case for something that’s way faster, but not as smart. As I get a better handle on what that is and where Spark fits, I’ll let you know.
What’s next?
OpenAI shared that it’s working toward dual modes of reasoning and real-time work for its Codex models.
The company says, “Codex-Spark is the first step toward a Codex with two complementary modes: longer-horizon reasoning and execution, and real-time collaboration for rapid iteration. Over time, the modes will blend.”
The workflow model it envisions is fascinating. According to OpenAI, the intent is that eventually “Codex can keep you in a tight interactive loop while delegating longer-running work to sub-agents in the background, or fanning out tasks to many models in parallel when you want breadth and speed, so you don’t have to choose a single mode up front.”
Also: I tried a Claude Code rival that’s local, open source, and completely free – how it went
Essentially, it’s working toward the best of both worlds. But for now, you have to choose fast or accurate. That’s a tough choice. But the accurate is getting more accurate, and now, at least, you can opt for fast when you want it (as long as you keep the trade-offs in mind and you’re paying for the Pro tier).
What about you? Would you trade some intelligence and security capability for 15x faster coding responses? Does the idea of a real-time, interruptible AI collaborator appeal to you, or do you prefer a more deliberate, higher-accuracy model for serious development work?
How concerned are you about the cybersecurity difference between Codex-Spark and the full GPT-5.3-Codex model? And if you’re a Pro user, do you see yourself switching between “fast” and “smart” modes depending on the task? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.


























