TMTPost 09-22
Huawei’s Ultimate Weapon is not AI Chips, Says Huawei’s Rotating Chair

TMTPOST – Xu Zhijun, the helmsman of Huawei's artificial intelligence strategy, has finally spoken the words he had been holding in for six years.

At the 2025 Huawei Connect Conference in Shanghai, as the lights in the venue gradually dimmed and the word "Ascend" appeared on the big screen, there was no thunderous applause or dramatic cheering as one might expect. Some people held their breath; others had tears in their eyes. Everyone knew that one day Ascend would return publicly, but when that moment truly came, the overwhelming emotion was not excitement — it was deep reflection.

Last Thursday, Huawei unveiled a comprehensive roadmap for its AI chips over the coming years — a moment that came more than 2,000 days and nights after the release of the Ascend 310 chip in 2018 and the Ascend 910 chip in 2019.

In the spring of 2019, U.S. sanctions pushed Huawei's supply chain to its limit almost overnight. At the time, Huawei remained cautiously optimistic, believing the impact wouldn't last. At the 2019 Huawei Connect Conference, the company continued with the commercial release of the Ascend 910 chip as scheduled, still maintaining an air of calm confidence.

But the pressure had already crept into every corner. Xu recalled, "Given the limited inventory of Ascend 910 chips at the time, we didn't dare sell them to internet customers — only to customers in key national industries and public services."

The sanctions were like a sudden storm, abruptly halting Huawei's upward momentum. From glory to isolation, from applause to doubt — Huawei's chip journey was, in the eyes of many, pronounced dead.

What it truly cost to overcome the greatest challenge in its history — no one but Huawei will ever know.

To the outside world, Huawei was represented by the "comeback" Mate 60 smartphone, HarmonyOS, and enterprise tools like MetaERP, GaussDB, and other internal middleware that kept the company's operations running.

But behind the scenes, many Huawei employees were lying low and quietly preparing for a comeback. Teams across HiSilicon, cloud computing, data centers, and optical communications were all eagerly waiting for their moment to return to the front line. AI computing power — this was the battlefield Huawei was truly aiming for.

In March this year, Huawei officially launched the Atlas 900 SuperNode, which can be seen as a preview of Huawei's AI strategy. Fully configured with 384 Ascend 910C chips, it can operate like a single computer and delivers peak computing power of 300 PFLOPS. As of now, the Atlas 900 remains the world's most powerful supernode in terms of computing power.

The CloudMatrix 384 supernode is a cloud service instance built by Huawei Cloud based on the Atlas 900 supernode, and it is already being widely used for training and inference of large AI models.

Independent analytics firm SemiAnalysis published an article titled "Huawei AI CloudMatrix 384 – China's Answer to Nvidia GB200 NVL72", concluding that while Huawei's chip technology is one generation behind, its independently developed cloud-based supercomputing solution, CloudMatrix 384, is actually a generation ahead of current commercial products from Nvidia and AMD. It directly benchmarks Nvidia's GB200 NVL72 system and shows technical advantages over Nvidia's rack-scale solutions in several key metrics. "This solution competes directly with the GB200 NVL72, and in some metrics is more advanced than Nvidia's rack-scale solution. The engineering advantage is at the system level, not just at the chip level, with innovation at the networking, optics, and software layers," says the article.

"In the past, Intel allowed us to use their CPU chip interconnect protocols, but later that was also banned. From optical components to optical modules, from interconnect protocols to interconnect chips — we had to redefine and redesign everything ourselves to make it work. Some overseas companies have been trying to replicate our supernode system, researching how we managed to build it," said Xu in his first media interview since the 2019 U.S. sanctions, speaking with AsianFin and a few other outlets.

Xu delivered a speech at a recent conference in Shanghai

"Compared to the chips themselves, overseas companies are now more interested in Huawei's supernode architecture, because while they may be able to build better individual chips, they still cannot build a supernode like Huawei's," he explained.

During the interview, Xu delivered a clear message: chips are not the whole story when it comes to Huawei's AI computing power. Huawei's core strategy in the AI field is the "Supernode + Supercluster" computing solution, and the UnifiedBus interconnect protocol represents a new paradigm in computing architecture.

Chips are important — but not that important

"Chips are the foundation of computing power. Ascend chips are the cornerstone of Huawei's AI computing strategy," said Xu Zhijun.

Huawei has laid out plans for three major chip series to be rolled out by 2028: the Ascend 950 series (including the Ascend 950PR and 950DT), the Ascend 960, and the Ascend 970. More specific chips are also in planning.

Huawei aims to double computing performance nearly every year, while simultaneously evolving in directions such as improved usability, support for more data formats, and higher interconnect bandwidth, to continuously meet the growing demand for AI computing power.

Compared to the Ascend 910B/910C series, key upgrades beginning with the Ascend 950 include:

- A new heterogeneous SIMD/SIMT architecture to make programming easier;
- Support for richer data formats, including FP32 / HF32 / FP16 / BF16 / FP8 / MXFP8 / HiF8 / MXFP4 / HiF4;
- Greater interconnect bandwidth: up to 2 TB/s for the 950 series and up to 4 TB/s for the 970 series;
- Significantly higher computing performance;
- In-house developed HBM (High Bandwidth Memory), with memory capacity doubling progressively and memory bandwidth quadrupling.

Beyond the chip itself, the ecosystem is a focal point for developers.

"Whether domestic AI companies use Ascend to train large models depends on whether they're willing to try it. It's like dating — if you don't try, how will you know the other person's strengths or weaknesses, whether you're compatible or not? You have to try it, use it. If problems arise in use, solve them. If company A can use it, why can't company B? It's all about whether you're willing to use it," said Xu. "Of course, our ecosystem and toolchain still lag behind Nvidia's. Many engineers were already proficient with Nvidia's tools and are reluctant to switch — this is an engineer's habit issue, not a top-level issue," he added.

Many chip vendors in the industry have chosen to stay compatible with Nvidia's CUDA ecosystem, a safer path aligned with current AI development practices. But Huawei has chosen a different direction.

"We don't support the CUDA ecosystem. We insist on building our own CANN ecosystem and the MindSpore framework — this is a long-term strategic decision. If we invest heavily in being compatible with CUDA — especially older versions of CUDA — what happens if one day it becomes incompatible or unavailable? So we pushed forward with MindSpore, even though many experts opposed it at the time. Now, our entire AI stack — from the Da Vinci architecture, to Ascend chips, to all related software and hardware — does not rely on any Western ecosystem or supply chain. For the long term, we had no choice but to build our own ecosystem," Xu said.

Had the story ended here, Huawei could say it survived — and that would be an achievement. But for Huawei, just surviving is not good enough.

From the very beginning, Ascend wasn't designed as a "backup plan." The Ascend 910 was released with the goal of achieving top-tier computing power. However, due to lagging chip fabrication and manufacturing processes, Huawei's Ascend chips — at least in the short term — will continue to play catch-up.

However, many people haven't realized this yet: what enabled Nvidia to thrive in the large model era may soon enable Huawei to rise next.

In the early stages of large models, Nvidia benefited from the performance of individual GPU cards and the CUDA ecosystem. But as AI continues to evolve, the advantage will shift — and Huawei's strength lies in its "Supernode + Cluster" architecture.

This approach has already gained recognition in top-tier large model circles, though the general public is still largely unaware.

"Supernode + Supercluster": Solution to Computing Power Shortage in China

In 2022, Nvidia launched its DGX H100 NVL256 "Ranger" platform, but it was never mass-produced due to excessively high costs, massive power consumption, and reliability issues (stemming from an excessive number of optical transceivers and a complex dual-layer network architecture). By March 2024, Nvidia pivoted and released the GB200 NVL72 supernode, based on its new Blackwell GPU — but at significantly reduced scale.

Looking back now, Nvidia's supernode roadmap has essentially vanished. Nvidia did prove that supernodes represent the future of computing power, but it also inadvertently demonstrated how difficult they are to implement.

Huawei has now taken the baton as the next leader in AI computing.

At this year's Huawei Connect Conference, the company unveiled its latest supernode products: the Atlas 950 SuperPoD and Atlas 960 SuperPoD, which support 8,192 and 15,488 Ascend cards, respectively. In terms of number of cards, total compute power, memory capacity, and interconnect bandwidth, Huawei's offerings are industry-leading and are expected to remain the world's most powerful supernodes for years to come.

Based on these supernodes, Huawei also launched the world's most powerful supernode clusters: the Atlas 950 SuperCluster and Atlas 960 SuperCluster.

These reach compute scales of over 500,000 cards and up to 1 million cards, respectively — undisputedly the most powerful compute clusters in the world.

Xu commented: "Aside from the fact that a single chip's computing power is slightly lower, and its power consumption is a bit higher, than Nvidia's, we have advantages across the board. Because AI is all about parallel computing, our solution is to use supernodes. You use five chips? I can use ten. We can use 384, 8,192, even 15,488 chips — and that's still not the limit."
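Xu's "you use five chips, I use ten" logic is simple arithmetic: a supernode's peak compute is the number of chips times each chip's compute, so a weaker chip can be offset by scaling out the node. The sketch below illustrates this with made-up per-chip PFLOPS figures — they are not Huawei's or Nvidia's actual specifications.

```python
# Illustrative only: the per-chip PFLOPS values below are hypothetical,
# not real Ascend or Nvidia specs. The point is the arithmetic: total
# node compute = number of chips * per-chip compute.

def supernode_pflops(num_chips: int, per_chip_pflops: float) -> float:
    """Peak aggregate compute of a supernode, in PFLOPS."""
    return num_chips * per_chip_pflops

# A 72-chip rack with stronger chips vs. a 384-chip supernode with weaker ones.
rival = supernode_pflops(72, 2.0)    # hypothetical stronger chip
ours = supernode_pflops(384, 0.8)    # hypothetical weaker chip, more of them
print(f"rival: {rival:.1f} PFLOPS, ours: {ours:.1f} PFLOPS")
```

With these assumed numbers the 384-chip node delivers more than double the aggregate compute despite each chip being less than half as fast — which is the system-level trade Xu describes.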

He further explained: "We are not a large model company, nor an application company. As an ICT infrastructure and smart device provider, Huawei fully leverages its advantages to build solid infrastructure — and we make money from that infrastructure. We build supernodes, build clusters. Internally, the company has reached a consensus: we will commercialize Ascend hardware and achieve success through infrastructure."

The supernode is a path Huawei was forced to take, but it's also a path that unifies all of Huawei's strengths and maximizes its advantages. More importantly, it is the key to turning Huawei's disadvantage in single-chip performance into a system-level advantage — surpassing Nvidia and achieving the strongest computing power.

"What is a supernode?" Xu explained. "Although it's physically composed of multiple racks and thousands of cards (8,192 or 15,488), they can work, learn, think, and reason as a single computer. A cluster is when multiple supernodes are connected via a network — much like cloud services. It's like connecting multiple servers together and then orchestrating them via software."

He said that Huawei's core strategy is "Supernode + Cluster": "Only with this architecture can we bypass limitations in China's chip manufacturing capabilities and ensure a steady, scalable supply of AI computing power."

"Innovation is sometimes forced, not something we wanted," said Xu. "In response to sanctions, we used 'non-Moore's Law' to compensate for Moore's Law, and math to compensate for physics. It's not some grand feat; it was necessity. In the past, HiSilicon was one generation ahead of others in chips. Now, we're one or two generations behind — and who knows how many generations behind we'll be in the future. So we had to find another way. And that other path is right here. The limitations of chip manufacturing forced us to innovate — and break through."

UnifiedBus, and Huawei's Own Path

At the end of Xu's keynote at the Huawei Connect 2025 conference, he did not conclude his speech with chips. "We hope to work with the industry to use the pioneering UnifiedBus supernode interconnect technology to lead a new paradigm for AI infrastructure. With supernodes and clusters based on UnifiedBus, we aim to continuously meet the rapidly growing demand for computing power, drive the continued development of artificial intelligence, and create greater value."

According to industry experts, the revolutionary impact of UnifiedBus may be comparable to reinventing AI infrastructure itself. Huawei's success with the "supernode + cluster" model depends heavily on it. If lithography machines are what continually push the performance of a single chip, then UnifiedBus is what connects tens of thousands of chips into one.

In 2021, Huawei laid out three company-level strategic initiatives. One was the HarmonyOS operating system. Another was UnifiedBus — a clear signal of its strategic significance.

Nvidia and other chip companies excel at chip design, but supernodes are not built by simply stacking more chips. Take large model training as an example: at first, increasing the number of chips leads to linear increases in compute power. But after a certain point, performance hits a bottleneck, and further additions yield diminishing returns.
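The diminishing-returns effect described above can be sketched with a toy scaling model: each training step, every chip computes, then pays a synchronization cost that grows with the number of chips. The cost model below (a log-growing communication term) and all of its constants are illustrative assumptions, not measurements of any real system.

```python
# Toy scaling model (illustrative assumptions, not measured data):
# per step, each chip computes for `compute_time`, then all chips
# synchronize, and the sync cost grows with log2(n) for n chips.
# Speedup over one chip therefore climbs almost linearly at first,
# while efficiency (speedup / n) steadily erodes as n grows.
import math

def effective_speedup(n: int, compute_time: float = 1.0,
                      comm_cost_per_hop: float = 0.05) -> float:
    """Speedup over a single chip under a log-growing sync cost."""
    if n <= 1:
        return 1.0
    step_time = compute_time + comm_cost_per_hop * math.log2(n)
    return n * compute_time / step_time

for n in (8, 64, 512, 4096):
    s = effective_speedup(n)
    print(f"{n:5d} chips -> speedup {s:8.1f}, efficiency {s / n:.0%}")
```

Under these assumed constants, adding chips always helps in absolute terms, but each additional chip contributes less — which is why interconnects like UnifiedBus, which attack the communication term directly, matter as much as the chips themselves.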

Large-scale compute clusters tailored for large model training need massive, high-speed data transfers.

Human history has never seen such demanding data flows, where data floods forward at full bandwidth, then reverses at full speed. This requires extremely low latency and high throughput — and in the future, compute interconnects won't just link AI chips to AI chips, but also AI compute to general-purpose compute, and general-purpose compute to general-purpose compute.

As the IT industry has evolved, protocols like PCIe, InfiniBand, and RoCE have developed in parallel.

Nvidia's NVLink maximizes its GPU performance through such interconnects. But UnifiedBus is not just a replacement; it's a redefinition of AI compute interconnect standards. Through the UnifiedBus interconnect protocol, tens of thousands of compute cards can be linked together to function as a single supernode.

Unlike NVLink, which is a closed protocol, Huawei has announced that it will open-source the UnifiedBus 2.0 technical specifications. Why invest so much and then open it up? The reason is simple: Huawei's philosophy is to monetize hardware. If UnifiedBus remains exclusive to Huawei, it will never grow into a real ecosystem. But if more companies adopt UnifiedBus to build their own compute clusters, the industry snowball will keep getting bigger.

"Our path is definitely not Nvidia's path," Xu said. "Right now, everyone is looking at us through Nvidia's lens — that's unfair. But we ourselves can't afford to be naive. I'd rather suffer in the short term and be free of pain in the long term."

Huawei has forged its own path in the field of AI computing power — a path built on a system of many integrated capabilities. Take optical communication technology as an example. Nvidia's supernodes rely entirely on copper-based communication. The advantage of this is technical maturity and lower cost; the downside is that it can only be deployed within a 2-meter range, beyond which performance degrades significantly. As a result, the number of chips that can be interconnected is limited.

Huawei, on the other hand, adopted a much more aggressive strategy with optical communication. Optical modules offer the benefits of high bandwidth and high data rates, with low signal loss — making them ideal for long-distance transmission. This allows Huawei to interconnect more chips with greater flexibility in deployment.

However, before Huawei, no other company dared to use optical modules to build a supernode. The high failure rate and cost of optical modules made the viability of such a solution uncertain. But Huawei leveraged its years of accumulated expertise in communications to develop a unique, end-to-end solution — spanning optical chips, connection technologies, and fault recovery — which made building a supernode not only possible but successful.

Huawei's victory is a systems-level victory — one that belongs to all Huawei employees and the broader Chinese computing industry. Xu stated: "By using a supernode architecture, and the UnifiedBus interconnect protocol that supports it, we aim to build supernodes and clusters that can meet the nation's boundless demand for computing power. This is our internal goal, our commitment to the industry, and our promise to the country."

He continued: "By blazing this trail, and driving China's industrial chain forward, this path becomes a real road. It may not be a 'new paradigm' — it's a paradigm born out of necessity, a greatness that was forced upon us. Who really wants to do what others have already done? Of course we want to pioneer the future."
