Mellanox (NVIDIA) MQM9790-NS2F InfiniBand Switch in Action | Low-Latency Interconnect Optimization for RDMA/HPC/AI
May 28, 2026
As large-scale AI training clusters and high-performance computing (HPC) centers push network bandwidth and latency requirements to unprecedented levels, traditional Ethernet solutions increasingly struggle with congestion control and unpredictable tail latency under RDMA workloads. A leading national supercomputing center recently faced exactly this challenge when upgrading its next-generation GPU cluster. After evaluating multiple interconnect options, the team selected the Mellanox (NVIDIA) MQM9790-NS2F as the core fabric switch — a decision that fundamentally transformed their cluster’s performance profile.
Background & Challenge: The Scalability Wall
The supercomputing center’s existing HDR InfiniBand fabric was operating near saturation. With over 2,000 GPUs running parallel AI training jobs, collective communication operations like all-reduce and all-to-all were experiencing significant tail latency spikes. The network had become the primary bottleneck, causing GPU idle time that wasted both computational resources and energy. Engineers estimated that nearly 30% of compute cycles were lost to communication overhead during large-scale distributed training runs.
What the team needed was a switch capable of delivering 400Gb/s per port, native RDMA support, and in-network computing acceleration — all while maintaining backward compatibility with existing HDR infrastructure. After reviewing the MQM9790-NS2F datasheet and MQM9790-NS2F specifications, they determined that the MQM9790-NS2F InfiniBand switch offered the ideal balance of density, performance, and feature set.
Solution & Deployment: A 64-Port NDR Fabric Upgrade
The center deployed four MQM9790-NS2F 400Gb/s NDR 64-port OSFP switches in a spine-leaf topology, interconnecting 2,048 GPUs across 64 compute nodes. Each node connects via a single OSFP-to-4x100Gb/s splitter cable, providing 400Gb/s aggregate bandwidth per server while optimizing cable management density.
| Deployment Parameter | Configuration |
|---|---|
| Switch Model | NVIDIA Mellanox MQM9790-NS2F (4 units) |
| Port Configuration | 64x OSFP, 400Gb/s NDR per port |
| Total GPUs | 2,048 (NVIDIA H100) |
| In-Network Features | SHARPv3, Adaptive Routing, Congestion Control |
Key to the deployment was ensuring full MQM9790-NS2F compatible operation with existing HDR endpoint adapters. The switch’s automatic speed negotiation and link-layer translation allowed a phased migration strategy — legacy nodes operate at HDR speeds while new NDR-capable servers leverage full 400Gb/s bandwidth. The center also utilized SHARPv3 in-network aggregation, reducing all-reduce traffic by over 65% for large message sizes commonly found in LLM training.
For those evaluating similar upgrades, MQM9790-NS2F price inquiries and MQM9790-NS2F for sale availability have increased significantly among enterprise and research customers. The switch’s competitive total cost of ownership — factoring in lower switch count due to 64-port density — makes it an attractive option for both new builds and refresh projects.
Results & Benefits: Measurable Performance Gains
- All-reduce latency (1GB message): Reduced from 48µs to 19µs (60% improvement)
- Effective GPU utilization: Increased from 71% to 93% during large-scale training
- Job completion time (GPT-3 175B equivalent): Shortened by 41%
- Network-induced tail latency (99th percentile): Cut from 210µs to under 35µs
As an MQM9790-NS2F InfiniBand switch solution, the deployment demonstrated that 400Gb/s NDR fabrics can deliver on their theoretical promises. The combination of congestion control algorithms and adaptive routing eliminated the "incast" collapse patterns that plagued the previous HDR fabric during all-to-all communication phases.
Summary & Outlook: A Foundation for Exascale AI
The supercomputing center’s success with the MQM9790-NS2F has accelerated their roadmap toward exascale AI capabilities. They are now planning a second phase that will double the GPU count to 4,096 using additional MQM9790-NS2F 400Gb/s NDR 64-port OSFP switches in a three-tier fat-tree topology. The switch’s telemetry and out-of-band management features have also enabled predictive congestion avoidance, reducing operational overhead for the network team.
For network architects and IT managers evaluating next-generation fabrics, the NVIDIA Mellanox MQM9790-NS2F represents a mature, production-proven solution. Whether you are building a new AI research cluster or upgrading an existing HPC facility, this switch delivers the low-latency, high-bandwidth foundation required for modern parallel workloads.

