OpenAI Unveils GPT-5.2 to Counter Google's Gemini 3 Dominance, Claims Strongest Agent Coding Capabilities

OpenAI on Thursday launched GPT-5.2, its most advanced artificial intelligence ( AI ) model, firing the first shot in its battle against Google's Gemini 3. The new model positions OpenAI to reclaim leadership in AI development after weeks of losing ground to rivals.

AI Generated Image

The company released GPT-5.2 on Thursday in three tiers across ChatGPT and its API platform. Paid ChatGPT users on Plus, Pro, Go, Business and Enterprise plans gain immediate access, while subscribers can continue using GPT-5.1 for three additional months before the legacy model sunsets. API developers received instant availability of GPT-5.2 Thinking at $1.75 per million input tokens and $14 per million output tokens, with a 90% discount on cached inputs.

The launch comes weeks after CEO Sam Altman declared a "code red" in an internal memo, redirecting resources to ChatGPT improvements as the company faced declining traffic and market share losses to Google. OpenAI says GPT-5.2 surpasses human experts at professional tasks, setting new benchmarks in coding, mathematical reasoning and scientific research.

The competitive stakes have intensified as Google's Gemini 3 topped LMArena leaderboards and earned widespread praise for reasoning capabilities, threatening OpenAI's first-mover advantage at a time when the startup has committed over $1 trillion to AI infrastructure development.

Professional Performance Reaches Expert Level

GPT-5.2 represents OpenAI's first model to match or exceed human expert performance on GDPval, a benchmark measuring well-specified knowledge work across 44 occupations. The model beats or ties top industry professionals on 70.9% of comparisons, according to expert judges evaluating tasks spanning presentations, spreadsheets and other professional deliverables.

The model delivers these results at 11 times the speed and less than 1% the cost of expert professionals, OpenAI said. On internal benchmarks testing junior investment banking analyst tasks, GPT-5.2 Thinking scored 68.4%, a 9.3 percentage point improvement over GPT-5.1's 59.1%.

One GDPval judge reviewing outputs commented the work "appears to have been done by a professional company with staff, and has a surprisingly well designed layout and advice."

Coding Capabilities Target Developer Market

GPT-5.2 Thinking achieved 55.6% on SWE-Bench Pro, a rigorous evaluation testing real-world software engineering across four programming languages. The model reached 80% on SWE-bench Verified, OpenAI's new high.

Coding platforms reported measurable improvements. Jeff Wang, CEO of Windsurf, said GPT-5.2 "represents the biggest leap for GPT models in agentic coding since GPT-5" and enabled his company to collapse fragile multi-agent systems into single mega-agents with 20-plus tools. Cognition, Warp, Charlie Labs, JetBrains and Augment Code reported state-of-the-art agentic coding performance.

Research lead Adain Clark told reporters that stronger mathematical reasoning translates across workloads. "These are all properties that really matter across a wide range of different workloads," Clark said, citing financial modeling, forecasting and data analysis as key applications.

Product lead Max Schwarzer said GPT-5.2 Thinking responses contain 38% fewer errors than its predecessor, making the model more dependable for daily decision-making and research.

Scientific Research and Mathematical Breakthroughs

OpenAI positions GPT-5.2 Pro and Thinking as the world's best models for accelerating scientific research. GPT-5.2 Pro scored 93.2% on GPQA Diamond, a graduate-level benchmark testing science knowledge, while GPT-5.2 Thinking achieved 92.4%.

On FrontierMath expert-level mathematics problems, GPT-5.2 Thinking solved 40.3% of Tier 1-3 challenges, setting a new state of the art. The model became the first to cross 90% on ARC-AGI-1, improving from o3-preview's 87% while reducing costs by approximately 390 times.

In recent research, GPT-5.2 Pro helped researchers explore an open question in statistical learning theory, proposing a proof subsequently verified by authors and external experts. The company said this demonstrates how frontier models can assist mathematical research under human oversight.

Strategic Response to Competitive Pressure

Fidji Simo, CEO of applications at OpenAI, told CNBC that GPT-5.2 development spanned many months, predating the recent code red directive. "While we are proud that we are able to have a cadence of releasing models fast, this particular integration has been in the works for a while," Simo said.

Altman told CNBC on Thursday that "Gemini 3 has had less of an impact on our metrics than maybe we feared." He said he expects OpenAI to exit code red mode by January "in a very strong position."

The company has committed more than $1 trillion to AI infrastructure alongside partners NVIDIA and Microsoft. Azure data centers and NVIDIA GPUs, including H100, H200 and GB200-NVL72, underpin OpenAI's training infrastructure.

However, the focus on compute-intensive reasoning models presents financial challenges. GPT-5.2's Thinking and Pro modes consume significantly more computing resources than standard chatbots, potentially creating pressure as OpenAI already spends more on inference compute than previously disclosed, according to recent reports.

New Safety Features and Product Roadmap

OpenAI announced it has begun rolling out age prediction software to apply content protections for users under 18. Simo said the company plans to launch "adult mode" in the first quarter of 2025, allowing uses such as "erotica for verified adults."

The company strengthened responses to sensitive conversations, with improvements in handling prompts indicating suicide risk, self-harm, mental health distress or emotional reliance on the model. Details appear in the updated GPT-5.2 System Card.

OpenAI has no current plans to deprecate GPT-5.1, GPT-5 or GPT-4.1 in the API and will provide advance notice of any future deprecations. The company expects to release a Codex-optimized version of GPT-5.2 in coming weeks.

Enterprise partners including Notion, Box, Shopify, Harvey, Zoom, Databricks, Hex and Triple Whale reported state-of-the-art performance for long-horizon reasoning, tool-calling, data science and document analysis tasks.

宙世代

一起剪

相关标签