Exploring Huawei’s AI Journey: The Ascend NPU Roadmap
Huawei has unveiled a plan to scale up its AI capabilities with its Ascend neural processing units (NPUs). At the Huawei Connect 2025 event, the company presented its first AI cluster capable of 1 ZettaFLOPS of FP4 performance. What drew more attention, however, was the detailed roadmap for its future NPUs. Huawei projects that it will reach 4 ZettaFLOPS of FP4 performance by 2028, despite manufacturing constraints and restricted access to advanced technologies from leading global suppliers.
Overcoming Limitations: The Path Ahead
One of the significant hurdles Huawei faces is that it cannot tap into the leading-edge technologies provided by TSMC and other industry leaders. Because of restrictions, the company also cannot use high-end memory options such as HBM4 and GDDR7. This limitation compels Huawei to develop its own architecture and memory types, starting with the upcoming Ascend 950 series. By the end of the decade, Huawei aims for systems built from its NPUs to deliver multi-ZettaFLOPS performance, a feat that would put the company at the forefront of AI processing.
Existing Performance Landscape
Huawei’s Ascend 910 series has seen only minimal updates in recent years. The dual-chip Ascend 910C delivers up to 800 TFLOPS of FP16 performance, similar to Nvidia’s H100, and features 128 GB of HBM with 3.2 TB/s of bandwidth. While competitive, this performance is beginning to lag behind Nvidia’s latest offerings. To close the gap, Huawei is developing a new family of NPUs (Ascend 950PR, 950DT, 960, and 970) designed with a fresh instruction set to meet the increasing demands of next-generation AI workloads.
Unveiling the Ascend 950 Series
The Ascend 950 series is a critical milestone in Huawei’s roadmap. It includes two variants: the Ascend 950PR for prefill and recommendation workloads, and the Ascend 950DT for decode and training. Both versions use a new SIMD+SIMT architecture that combines vector-based processing with thread-level parallelism. The design is intended to raise compute utilization while making memory access more efficient.
These new units will also introduce support for newer low-precision data formats, which trade numerical precision for throughput. The 950PR will come with 128 GB of Huawei’s HiBL 1.0 memory and 1.6 TB/s of bandwidth, targeting low-cost, memory-light tasks. The 950DT, slated for launch in Q4 2026, will feature 144 GB of HiZQ 2.0 memory with 4.0 TB/s of bandwidth.
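One way to compare the two memory configurations is to ask how long each chip would need to read its entire memory once at peak bandwidth, which gives a rough lower bound on a step of any bandwidth-bound workload. The following is a back-of-the-envelope sketch based only on the capacity and bandwidth figures quoted above. It assumes decimal units (1 TB = 1000 GB) and peak rather than sustained bandwidth, and the function name is illustrative, not part of any Huawei tooling.

```python
# Capacity and bandwidth figures as quoted in the article.
SPECS = {
    "Ascend 950PR": {"memory_gb": 128, "bandwidth_tbs": 1.6},
    "Ascend 950DT": {"memory_gb": 144, "bandwidth_tbs": 4.0},
}


def full_sweep_ms(memory_gb: float, bandwidth_tbs: float) -> float:
    """Milliseconds needed to read the whole memory once at peak bandwidth.

    Assumes decimal units (1 TB = 1000 GB); sustained bandwidth on real
    hardware would be lower, so treat the result as an optimistic bound.
    """
    seconds = memory_gb / (bandwidth_tbs * 1000)
    return seconds * 1000


if __name__ == "__main__":
    for name, spec in SPECS.items():
        print(f"{name}: {full_sweep_ms(**spec):.0f} ms per full memory sweep")
```

Under these assumptions the 950PR needs about 80 ms per full sweep and the 950DT about 36 ms, so the 950DT can cycle through a slightly larger memory in less than half the time.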
The Ascend 960 and 970
Following the 950, Huawei plans to release the Ascend 960 in Q4 2027. This unit will bring significant upgrades, including support for a new 4-bit data format and roughly double the performance, memory capacity, and bandwidth of its predecessor. It targets 2 PFLOPS of FP8 and 4 PFLOPS of FP4 performance, backed by 288 GB of memory and 9.6 TB/s of bandwidth.
Then comes the Ascend 970, which is set to launch in late 2028. This unit targets 4 PFLOPS of FP8 and 8 PFLOPS of FP4 performance, keeping the same 288 GB of memory but raising bandwidth to 14.4 TB/s. It is designed to support models with up to 10 trillion parameters.
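To put the 10-trillion-parameter figure next to the 288 GB memory capacity, the sketch below estimates the raw weight footprint of such a model and the minimum number of NPUs needed just to hold it. This is illustrative arithmetic, not Huawei's sizing method: it counts weights only (no activations, optimizer state, or caches), assumes decimal units, and uses hypothetical helper names.

```python
import math

MEMORY_GB = 288          # per-NPU capacity quoted for the Ascend 960 and 970
PARAMS = 10e12           # "up to 10 trillion parameters"


def weights_tb(params: float, bits_per_param: int) -> float:
    """Raw weight size in decimal terabytes (weights only)."""
    return params * bits_per_param / 8 / 1e12


def min_npus_for_weights(params: float, bits_per_param: int, memory_gb: float) -> int:
    """Smallest NPU count whose combined memory can hold the weights alone."""
    return math.ceil(weights_tb(params, bits_per_param) * 1000 / memory_gb)


if __name__ == "__main__":
    for label, bits in (("FP4", 4), ("FP8", 8)):
        size = weights_tb(PARAMS, bits)
        count = min_npus_for_weights(PARAMS, bits, MEMORY_GB)
        print(f"{label}: {size:.0f} TB of weights, at least {count} NPUs")
```

Under these assumptions the weights alone occupy 5 TB at FP4 (at least 18 NPUs) or 10 TB at FP8 (at least 35 NPUs), so a model of that size would necessarily be spread across many NPUs rather than served from a single chip.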
Challenges of Scale: A Look Ahead
Despite these innovations, Huawei’s position remains precarious because U.S. restrictions limit its access to advanced manufacturing technologies. This has led the company to pivot away from traditional per-chip performance scaling, which tracks Moore’s Law, toward building massive AI clusters. It plans to deploy SuperPods with up to 15,488 NPUs and SuperClusters of more than a million NPUs.
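The cluster sizes can be related to the headline ZettaFLOPS figures with simple multiplication. The sketch below assumes every NPU delivers the 4 PFLOPS of FP4 quoted for the Ascend 960 and that throughput scales perfectly with NPU count, which real systems will not achieve; the article does not state which NPU each cluster uses, so the per-NPU figure here is an assumption, and the function name is hypothetical.

```python
SUPERPOD_NPUS = 15_488          # largest SuperPod size quoted in the article
SUPERCLUSTER_NPUS = 1_000_000   # "over a million"; used here as a lower bound
FP4_PFLOPS_PER_NPU = 4          # assumption: Ascend 960 figure for every NPU


def aggregate_fp4(npus: int, pflops_per_npu: float) -> tuple[float, float]:
    """Peak aggregate FP4 throughput as (exaFLOPS, zettaFLOPS).

    Assumes perfect linear scaling, so this is an upper bound on peak
    throughput rather than an estimate of delivered performance.
    """
    total_pflops = npus * pflops_per_npu
    return total_pflops / 1e3, total_pflops / 1e6


if __name__ == "__main__":
    for name, npus in (("SuperPod", SUPERPOD_NPUS), ("SuperCluster", SUPERCLUSTER_NPUS)):
        exa, zetta = aggregate_fp4(npus, FP4_PFLOPS_PER_NPU)
        print(f"{name}: {exa:,.1f} EFLOPS ({zetta:.3f} ZFLOPS) peak FP4")
```

With these inputs a full SuperPod peaks at roughly 62 EFLOPS of FP4, and a million-NPU SuperCluster at 4 ZFLOPS, which is the same order as the 4 ZettaFLOPS target for 2028.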
However, this approach isn’t without complications. Integrating hundreds of thousands of accelerators necessitates significant advancements in software architecture and engineering. Huawei’s large, monolithic systems face unique challenges compared to Nvidia’s more modular design. The latter benefits from a well-established ecosystem, making it easier for developers to program and optimize their applications.
The Future is Bright, Yet Complex
Huawei’s roadmap for its Ascend NPUs outlines a promising but challenging path. With a focus on in-house memory solutions and higher per-chip performance, the company aims to compete with the market leaders in AI. To realize that potential, however, it must handle demanding software requirements and the complexity of integrating very large systems.
In summary, Huawei’s AI ambitions are credible on paper, but delivering on them will require both technological ingenuity and careful strategic planning.
Original Text – https://www.tomshardware.com/tech-industry/artificial-intelligence/huawei-ascend-npu-roadmap-examined-company-targets-4-zettaflops-fp4-performance-by-2028-amid-manufacturing-constraints