Mellanox (NVIDIA) MQM9790-NS2F InfiniBand Switch in Action | Low-Latency Interconnect Optimization for RDMA/HPC/AI

May 28, 2026

As large-scale AI training clusters and high-performance computing (HPC) centers push network bandwidth and latency requirements to unprecedented levels, traditional Ethernet solutions increasingly struggle with congestion control and unpredictable tail latency under RDMA workloads. A leading national supercomputing center recently faced exactly this challenge when upgrading its next-generation GPU cluster. After evaluating multiple interconnect options, the team selected the Mellanox (NVIDIA) MQM9790-NS2F as the core fabric switch — a decision that fundamentally transformed their cluster’s performance profile.

Background & Challenge: The Scalability Wall

The supercomputing center’s existing HDR InfiniBand fabric was operating near saturation. With over 2,000 GPUs running parallel AI training jobs, collective communication operations like all-reduce and all-to-all were experiencing significant tail latency spikes. The network had become the primary bottleneck, causing GPU idle time that wasted both computational resources and energy. Engineers estimated that nearly 30% of compute cycles were lost to communication overhead during large-scale distributed training runs.

What the team needed was a switch capable of delivering 400Gb/s per port, native RDMA support, and in-network computing acceleration — all while maintaining backward compatibility with existing HDR infrastructure. After reviewing the MQM9790-NS2F datasheet and MQM9790-NS2F specifications, they determined that the MQM9790-NS2F InfiniBand switch offered the ideal balance of density, performance, and feature set.

Solution & Deployment: A 64-Port NDR Fabric Upgrade

The center deployed four MQM9790-NS2F 400Gb/s NDR 64-port OSFP switches in a spine-leaf topology, interconnecting 2,048 GPUs across 64 compute nodes. Each node connects via a single OSFP-to-4x100Gb/s splitter cable, providing 400Gb/s aggregate bandwidth per server while optimizing cable management density.

Deployment Parameter	Configuration
Switch Model	NVIDIA Mellanox MQM9790-NS2F (4 units)
Port Configuration	64x OSFP, 400Gb/s NDR per port
Total GPUs	2,048 (NVIDIA H100)
In-Network Features	SHARPv3, Adaptive Routing, Congestion Control

Key to the deployment was ensuring full MQM9790-NS2F compatible operation with existing HDR endpoint adapters. The switch’s automatic speed negotiation and link-layer translation allowed a phased migration strategy — legacy nodes operate at HDR speeds while new NDR-capable servers leverage full 400Gb/s bandwidth. The center also utilized SHARPv3 in-network aggregation, reducing all-reduce traffic by over 65% for large message sizes commonly found in LLM training.

For those evaluating similar upgrades, MQM9790-NS2F price inquiries and MQM9790-NS2F for sale availability have increased significantly among enterprise and research customers. The switch’s competitive total cost of ownership — factoring in lower switch count due to 64-port density — makes it an attractive option for both new builds and refresh projects.

Results & Benefits: Measurable Performance Gains

All-reduce latency (1GB message): Reduced from 48µs to 19µs (60% improvement)
Effective GPU utilization: Increased from 71% to 93% during large-scale training
Job completion time (GPT-3 175B equivalent): Shortened by 41%
Network-induced tail latency (99th percentile): Cut from 210µs to under 35µs

As an MQM9790-NS2F InfiniBand switch solution, the deployment demonstrated that 400Gb/s NDR fabrics can deliver on their theoretical promises. The combination of congestion control algorithms and adaptive routing eliminated the "incast" collapse patterns that plagued the previous HDR fabric during all-to-all communication phases.

Summary & Outlook: A Foundation for Exascale AI

The supercomputing center’s success with the MQM9790-NS2F has accelerated their roadmap toward exascale AI capabilities. They are now planning a second phase that will double the GPU count to 4,096 using additional MQM9790-NS2F 400Gb/s NDR 64-port OSFP switches in a three-tier fat-tree topology. The switch’s telemetry and out-of-band management features have also enabled predictive congestion avoidance, reducing operational overhead for the network team.

For network architects and IT managers evaluating next-generation fabrics, the NVIDIA Mellanox MQM9790-NS2F represents a mature, production-proven solution. Whether you are building a new AI research cluster or upgrading an existing HPC facility, this switch delivers the low-latency, high-bandwidth foundation required for modern parallel workloads.