TMTPost 09-22
Huawei’s Ultimate Weapon is not AI Chips, Says Huawei’s Rotating Chair

TMTPOST – Xu Zhijun, the helmsman of Huawei's artificial intelligence strategy, has finally spoken the words he had been holding in for six years.

At the 2025 Huawei Connect Conference in Shanghai, as the lights in the venue gradually dimmed and the word "Ascend" appeared on the big screen, there was no thunderous applause or dramatic cheering as one might expect. Some people held their breath; others had tears in their eyes. Everyone knew that one day Ascend would return publicly, but when that moment truly came, the overwhelming emotion was not excitement — it was deep reflection.

Last Thursday, Huawei unveiled a comprehensive roadmap for its AI chips over the coming years — a moment that came more than 2,000 days and nights after the release of the Ascend 310 chip in 2018 and the Ascend 910 chip in 2019.

In the spring of 2019, U.S. sanctions pushed Huawei's supply chain to its limit almost overnight. At the time, Huawei remained cautiously optimistic, believing the impact wouldn't last. At the 2019 Huawei Connect Conference, the company continued with the commercial release of the Ascend 910 chip as scheduled, still maintaining an air of calm confidence.

But the pressure had already crept into every corner. Xu recalled, "Given the limited inventory of Ascend 910 chips at the time, we didn't dare sell them to internet customers — only to customers in key national industries and public services."

The sanctions were like a sudden storm, abruptly halting Huawei's upward momentum. From glory to isolation, from applause to doubt — Huawei's chip journey was, in the eyes of many, pronounced dead.

What it truly cost to overcome the greatest challenge in its history — no one but Huawei will ever know.

To the outside world, Huawei was represented by the "comeback" Mate 60 smartphone, HarmonyOS, and enterprise tools like MetaERP, GaussDB, and other internal middleware that kept the company's operations running.

But behind the scenes, many Huawei employees were lying low and quietly preparing for a comeback. Teams across HiSilicon, cloud computing, data centers, and optical communications were all eagerly waiting for their moment to return to the front line. AI computing power — this was the battlefield Huawei was truly aiming for.

In March this year, Huawei officially launched the Atlas 900 SuperNode, which can be seen as a preview of Huawei's AI strategy. Fully configured with 384 Ascend 910C chips, it can operate like a single computer and delivers peak computing power of 300 PFLOPS. As of now, the Atlas 900 remains the world's most powerful supernode in terms of computing power.

The CloudMatrix 384 supernode is a cloud service instance built by Huawei Cloud based on the Atlas 900 supernode, and it is already being widely used for training and inference of large AI models.

Independent analytics firm SemiAnalysis published an article titled "Huawei AI CloudMatrix 384 – China's Answer to Nvidia GB200 NVL72", concluding that while Huawei's chip technology is one generation behind, its independently developed cloud-based supercomputing solution, CloudMatrix 384, is actually a generation ahead of current commercial products from Nvidia and AMD. It directly benchmarks Nvidia's GB200 NVL72 system and shows technical advantages over Nvidia's rack-scale solutions in several key metrics. "This solution competes directly with the GB200 NVL72, and in some metrics is more advanced than Nvidia's rack-scale solution. The engineering advantage is at the system level, not just at the chip level, with innovation at the networking, optics, and software layers," says the article.

"In the past, Intel allowed us to use their CPU chip interconnect protocols, but later that was also banned. From optical components to optical modules, from interconnect protocols to interconnect chips — we had to redefine and redesign everything ourselves to make it work. Some overseas companies have been trying to replicate our supernode system, researching how we managed to build it," said Xu in his first media interview since the 2019 U.S. sanctions, speaking with AsianFin and a few other outlets.

Xu delivered a speech at a recent conference in Shanghai

"Compared to the chips themselves, overseas companies are now more interested in Huawei's supernode architecture, because while they may be able to build better individual chips, they still cannot build a supernode like Huawei's," he explained.

During the interview, Xu delivered a clear message: chips are not the whole story when it comes to Huawei's AI computing power. Huawei's core strategy in the AI field is the "Supernode + Supercluster" computing solution, and the UnifiedBus interconnect protocol represents a new paradigm in computing architecture.

Chips are important — but not that important

"Chips are the foundation of computing power. Ascend chips are the cornerstone of Huawei's AI computing strategy," said Xu Zhijun.

Huawei has laid out plans for three major chip series to be rolled out by 2028: the Ascend 950 series (including the Ascend 950PR and 950DT), the Ascend 960, and the Ascend 970. More specific chips are also in planning.

Huawei aims to double computing performance nearly every year, while simultaneously evolving in directions such as improved usability, support for more data formats, and higher interconnect bandwidth, to continuously meet the growing demand for AI computing power.

Compared to the Ascend 910B/910C series, key upgrades beginning with the Ascend 950 include:

- A new heterogeneous SIMD/SIMT architecture to make programming easier;
- Support for richer data formats, including FP32 / HF32 / FP16 / BF16 / FP8 / MXFP8 / HiF8 / MXFP4 / HiF4;
- Greater interconnect bandwidth: up to 2 TB/s for the 950 series and up to 4 TB/s for the 970 series;
- Significantly higher computing performance;
- In-house developed HBM (High Bandwidth Memory), with memory capacity doubling progressively and memory bandwidth quadrupling.

Beyond the chip itself, the ecosystem is a focal point for developers.

"Whether domestic AI companies use Ascend to train large models depends on whether they're willing to try it. It's like dating — if you don't try, how will you know the other person's strengths or weaknesses, whether you're compatible or not? You have to try it, use it. If problems arise in use, solve them. If company A can use it, why can't company B? It's all about whether you're willing to use it," said Xu. "Of course, our ecosystem and toolchain still lag behind Nvidia's. Many engineers were already proficient with Nvidia's tools and are reluctant to switch — this is an engineer's habit issue, not a top-level issue," he added.

Many chip vendors in the industry have chosen to stay compatible with Nvidia's CUDA ecosystem, a safer path aligned with current AI development practices. But Huawei has chosen a different direction.

"We don't support the CUDA ecosystem. We insist on building our own CANN ecosystem and the MindSpore framework — this is a long-term strategic decision. If we invest heavily in being compatible with CUDA — especially older versions of CUDA — what happens if one day it becomes incompatible or unavailable? So we pushed forward with MindSpore, even though many experts opposed it at the time. Now, our entire AI stack — from the Da Vinci architecture, to Ascend chips, to all related software and hardware — does not rely on any Western ecosystem or supply chain. For the long term, we had no choice but to build our own ecosystem," Xu said.

Had the story ended here, Huawei could say it survived — and that would be an achievement. But for Huawei, just surviving is not good enough.

From the very beginning, Ascend wasn't designed as a "backup plan." The Ascend 910 was released with the goal of achieving top-tier computing power. However, due to lagging chip fabrication and manufacturing processes, Huawei's Ascend chips — at least in the short term — will continue to play catch-up.

However, many people haven't realized this yet: what enabled Nvidia to thrive in the large model era may soon enable Huawei to rise next.

In the early stages of large models, Nvidia benefited from the performance of individual GPU cards and the CUDA ecosystem. But as AI continues to evolve, the advantage will shift — and Huawei's strength lies in its "Supernode + Cluster" architecture.

This approach has already gained recognition in top-tier large model circles, though the general public is still largely unaware.

"Supernode + Supercluster": Solution to Computing Power Shortage in China

In 2022, Nvidia launched its DGX H100 NVL256 "Ranger" platform, but it was never mass-produced due to excessively high costs, massive power consumption, and reliability issues (stemming from an excessive number of optical transceivers and a complex dual-layer network architecture). By March 2024, Nvidia pivoted and released the GB200 NVL72 supernode, based on its new Blackwell GPU — but at significantly reduced scale.

Looking back now, Nvidia's supernode roadmap has essentially vanished. Nvidia did prove that supernodes represent the future of computing power, but it also inadvertently demonstrated how difficult they are to implement.

Huawei has now taken the baton as the next leader in AI computing.

At this year's Huawei Connect Conference, the company unveiled its latest supernode products: the Atlas 950 SuperPoD and Atlas 960 SuperPoD, which support 8,192 and 15,488 Ascend cards, respectively. In terms of number of cards, total compute power, memory capacity, and interconnect bandwidth, Huawei's offerings are industry-leading and are expected to remain the world's most powerful supernodes for years to come.

Based on these supernodes, Huawei also launched the world's most powerful supernode clusters: the Atlas 950 SuperCluster and Atlas 960 SuperCluster.

These reach compute scales of over 500,000 cards and up to 1 million cards, respectively — undisputedly the most powerful compute clusters in the world.

Xu commented: "Aside from the fact that a single chip's computing power is slightly lower, and its power consumption is a bit higher, than Nvidia's, we have advantages across the board. Because AI is all about parallel computing, our solution is to use supernodes. You use five chips? I can use ten. We can use 384, 8,192, even 15,488 chips — and that's still not the limit."
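Xu's "you use five chips, I use ten" logic is simple arithmetic: a supernode's peak compute is the number of chips times each chip's compute, so a weaker chip can be offset by scaling out the node. The sketch below illustrates this with made-up per-chip PFLOPS figures — they are not Huawei's or Nvidia's actual specifications.

```python
# Illustrative only: the per-chip PFLOPS values below are hypothetical,
# not real Ascend or Nvidia specs. The point is the arithmetic: total
# node compute = number of chips * per-chip compute.

def supernode_pflops(num_chips: int, per_chip_pflops: float) -> float:
    """Peak aggregate compute of a supernode, in PFLOPS."""
    return num_chips * per_chip_pflops

# A 72-chip rack with stronger chips vs. a 384-chip supernode with weaker ones.
rival = supernode_pflops(72, 2.0)    # hypothetical stronger chip
ours = supernode_pflops(384, 0.8)    # hypothetical weaker chip, more of them
print(f"rival: {rival:.1f} PFLOPS, ours: {ours:.1f} PFLOPS")
```

With these assumed numbers the 384-chip node delivers more than double the aggregate compute despite each chip being less than half as fast — which is the system-level trade Xu describes.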

He further explained: "We are not a large model company, nor an application company. As an ICT infrastructure and smart device provider, Huawei fully leverages its advantages to build solid infrastructure — and we make money from that infrastructure. We build supernodes, build clusters. Internally, the company has reached a consensus: we will commercialize Ascend hardware and achieve success through infrastructure."

The supernode is a path Huawei was forced to take, but it's also a path that unifies all of Huawei's strengths and maximizes its advantages. More importantly, it is the key to turning Huawei's disadvantage in single-chip performance into a system-level advantage — surpassing Nvidia and achieving the strongest computing power.

"What is a supernode?" Xu explained. "Although it's physically composed of multiple racks and thousands of cards (8,192 or 15,488), they can work, learn, think, and reason as a single computer. A cluster is when multiple supernodes are connected via a network — much like cloud services. It's like connecting multiple servers together and then orchestrating them via software."

He said that Huawei's core strategy is "Supernode + Cluster": "Only with this architecture can we bypass limitations in China's chip manufacturing capabilities and ensure a steady, scalable supply of AI computing power."

"Innovation is sometimes forced, not something we wanted," said Xu. "In response to sanctions, we used 'non-Moore's Law' to compensate for Moore's Law, and math to compensate for physics. It's not some grand feat; it was necessity. In the past, HiSilicon was one generation ahead of others in chips. Now, we're one or two generations behind — and who knows how many generations behind we'll be in the future. So we had to find another way. And that other path is right here. The limitations of chip manufacturing forced us to innovate — and break through."

UnifiedBus, and Huawei's Own Path

At the end of Xu's keynote at the Huawei Connect 2025 conference, he did not conclude his speech with chips. "We hope to work with the industry to use the pioneering UnifiedBus supernode interconnect technology to lead a new paradigm for AI infrastructure. With supernodes and clusters based on UnifiedBus, we aim to continuously meet the rapidly growing demand for computing power, drive the continued development of artificial intelligence, and create greater value."

According to industry experts, the revolutionary impact of UnifiedBus may be comparable to reinventing AI infrastructure itself. Huawei's success with the "supernode + cluster" model depends heavily on it. If lithography machines are what continually push the performance of a single chip, then UnifiedBus is what connects tens of thousands of chips into one.

In 2021, Huawei laid out three company-level strategic initiatives. One was the HarmonyOS operating system. Another was UnifiedBus — a clear signal of its strategic significance.

Nvidia and other chip companies excel at chip design, but supernodes are not built by simply stacking more chips. Take large model training as an example: at first, increasing the number of chips leads to linear increases in compute power. But after a certain point, performance hits a bottleneck, and further additions yield diminishing returns.
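The diminishing-returns effect described above can be sketched with a toy scaling model: each training step, every chip computes, then pays a synchronization cost that grows with the number of chips. The cost model below (a log-growing communication term) and all of its constants are illustrative assumptions, not measurements of any real system.

```python
# Toy scaling model (illustrative assumptions, not measured data):
# per step, each chip computes for `compute_time`, then all chips
# synchronize, and the sync cost grows with log2(n) for n chips.
# Speedup over one chip therefore climbs almost linearly at first,
# while efficiency (speedup / n) steadily erodes as n grows.
import math

def effective_speedup(n: int, compute_time: float = 1.0,
                      comm_cost_per_hop: float = 0.05) -> float:
    """Speedup over a single chip under a log-growing sync cost."""
    if n <= 1:
        return 1.0
    step_time = compute_time + comm_cost_per_hop * math.log2(n)
    return n * compute_time / step_time

for n in (8, 64, 512, 4096):
    s = effective_speedup(n)
    print(f"{n:5d} chips -> speedup {s:8.1f}, efficiency {s / n:.0%}")
```

Under these assumed constants, adding chips always helps in absolute terms, but each additional chip contributes less — which is why interconnects like UnifiedBus, which attack the communication term directly, matter as much as the chips themselves.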

Large-scale compute clusters tailored for large model training need massive, high-speed data transfers.

Human history has never seen such demanding data flows, where data floods forward at full bandwidth, then reverses at full speed. This requires extremely low latency and high throughput — and in the future, compute interconnects won't just link AI chips to AI chips, but also AI compute to general-purpose compute, and general-purpose compute to general-purpose compute.

As the IT industry has evolved, protocols like PCIe, InfiniBand, and RoCE have developed in parallel.

Nvidia's NVLink maximizes its GPU performance through such interconnects. But UnifiedBus is not just a replacement; it's a redefinition of AI compute interconnect standards. Through the UnifiedBus interconnect protocol, tens of thousands of compute cards can be linked together to function as a single supernode.

Unlike NVLink, which is a closed protocol, Huawei has announced that it will open-source the UnifiedBus 2.0 technical specifications. Why invest so much and then open it up? The reason is simple: Huawei's philosophy is to monetize hardware. If UnifiedBus remains exclusive to Huawei, it will never grow into a real ecosystem. But if more companies adopt UnifiedBus to build their own compute clusters, the industry snowball will keep getting bigger.

"Our path is definitely not Nvidia's path," Xu said. "Right now, everyone is looking at us through Nvidia's lens — that's unfair. But we ourselves can't afford to be naive. I'd rather suffer in the short term and be free of pain in the long term."

Huawei has forged its own path in the field of AI computing power — a path built on a system of many integrated capabilities. Take optical communication technology as an example. Nvidia's supernodes rely entirely on copper-based communication. The advantage of this is technical maturity and lower cost; the downside is that it can only be deployed within a 2-meter range, beyond which performance degrades significantly. As a result, the number of chips that can be interconnected is limited.

Huawei, on the other hand, adopted a much more aggressive strategy with optical communication. Optical modules offer the benefits of high bandwidth and high data rates, with low signal loss — making them ideal for long-distance transmission. This allows Huawei to interconnect more chips with greater flexibility in deployment.

However, before Huawei, no other company dared to use optical modules to build a supernode. The high failure rate and cost of optical modules made the viability of such a solution uncertain. But Huawei leveraged its years of accumulated expertise in communications to develop a unique, end-to-end solution — spanning optical chips, connection technologies, and fault recovery — which made building a supernode not only possible but successful.

Huawei's victory is a systems-level victory — one that belongs to all Huawei employees and the broader Chinese computing industry. Xu stated: "By using a supernode architecture, and the UnifiedBus interconnect protocol that supports it, we aim to build supernodes and clusters that can meet the nation's boundless demand for computing power. This is our internal goal, our commitment to the industry, and our promise to the country."

He continued: "By blazing this trail, and driving China's industrial chain forward, this path becomes a real road. It may not be a 'new paradigm' — it's a paradigm born out of necessity, a greatness that was forced upon us. Who really wants to do what others have already done? Of course we want to pioneer the future."
