Huawei Debuts CloudMatrix 384 Supernode to Challenge NVIDIA’s NVL72

The CloudMatrix 384 is already operational within Huawei’s cloud, and several Chinese firms are conducting trials

At the World Artificial Intelligence Conference in Shanghai, Huawei unveiled its CloudMatrix 384, an AI cluster built around 384 Ascend 910C chips linked via proprietary “supernode” architecture.

The system delivers up to 300 petaflops of BF16 compute, nearly double the 180 petaflops of NVIDIA's GB200 NVL72.

While each Ascend chip lags behind NVIDIA’s GPUs in individual performance, Huawei compensates through scale: the cluster boasts 3.6× the memory capacity and 2.1× more bandwidth, enabled by ultra‑fast optical interconnects across its 16-rack design.

Despite its impressive throughput, CloudMatrix 384 consumes roughly four times the power of an NVL72 setup—around 559 kW versus 145 kW—and is significantly less power-efficient per FLOP. However, abundant electricity and a largely self-contained AI ecosystem make this trade-off more viable in China.
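The efficiency gap follows directly from the figures quoted above; a quick back-of-envelope check (a sketch using only the numbers in this article, treating the reported power draws as comparable system-level figures):

```python
# Back-of-envelope comparison using the figures quoted in the article.
# "pflops" = BF16 petaflops; kW values are the reported power draws.

cloudmatrix_pflops = 300   # Huawei CloudMatrix 384
cloudmatrix_kw = 559

nvl72_pflops = 180         # NVIDIA GB200 NVL72
nvl72_kw = 145

# Power drawn per petaflop of BF16 compute (lower is better)
cm_kw_per_pflop = cloudmatrix_kw / cloudmatrix_pflops   # ~1.86 kW/PF
nv_kw_per_pflop = nvl72_kw / nvl72_pflops               # ~0.81 kW/PF

power_ratio = cloudmatrix_kw / nvl72_kw                 # ~3.9x total power
efficiency_gap = cm_kw_per_pflop / nv_kw_per_pflop      # ~2.3x worse per FLOP

print(f"CloudMatrix draws {power_ratio:.1f}x the power of NVL72")
print(f"and is {efficiency_gap:.1f}x less power-efficient per FLOP")
```

So "roughly four times the power" holds at the system level, while the per-FLOP penalty is smaller (about 2.3×) because the extra power also buys nearly double the aggregate compute.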

The CloudMatrix 384 is already operational within Huawei’s cloud, and several Chinese firms are conducting trials under close engineering oversight to address thermal, logistical, and software integration challenges.

According to reports, Huawei has initiated efforts to export its Ascend 910B AI chips to markets in the Middle East and Southeast Asia, aiming to challenge U.S. leader Nvidia.

The company has approached prospective buyers in the United Arab Emirates (UAE), Saudi Arabia, Thailand, and Malaysia, offering chips in modest batches—although no deals have yet been finalised.

Huawei founder Ren Zhengfei acknowledged that Huawei's Ascend chips remain "one generation behind" U.S. competitors. Nonetheless, he asserted that cluster computing and mathematical optimization are enabling performance that rivals globally competitive systems despite U.S. export restrictions.