Nexus/Network

==Overview==
The [[Nexus]] cluster runs on a [https://en.wikipedia.org/wiki/Hierarchical_internetworking_model hierarchical] [https://en.wikipedia.org/wiki/Ethernet Ethernet]-based network with node-level speeds ranging from [https://en.wikipedia.org/wiki/Gigabit_Ethernet 1GbE] to [https://en.wikipedia.org/wiki/100_Gigabit_Ethernet 100GbE]. Generally speaking, more recently purchased compute nodes come with hardware capable of faster speeds and therefore use them, though not always. Faster speeds require more expensive network switches and cables, so some labs/centers have opted to stay with slower speeds.
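
To check what speed a given node's network interface is actually linked at, you can read the speed the kernel reports. This is a minimal sketch; the interface name <code>eno1</code> is a placeholder and will vary by node.

<pre>
# Link speed in Mb/s (e.g. 1000, 10000, 100000); "eno1" is a
# placeholder interface name and will differ from node to node.
cat /sys/class/net/eno1/speed

# If ethtool is installed, it also reports the negotiated speed.
ethtool eno1 | grep Speed
</pre>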
 
If you are running multi-node jobs in [[SLURM]], or simply want the best performance for a single-node job given the filesystem path(s) your job uses, it helps to know the basics of the cluster's network architecture.
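
As a concrete example, a minimal multi-node batch script might look like the sketch below. The partition name is a placeholder; see the [[Nexus#Partitions | partition list]] for real values.

<pre>
#!/bin/bash
# Minimal multi-node job sketch; the partition name below is a
# placeholder and should be replaced with a real Nexus partition.
#SBATCH --job-name=multinode-test
#SBATCH --partition=somepartition
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00

# Print which nodes the job was allocated.
srun hostname
</pre>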


In the future, SLURM's [https://slurm.schedmd.com/topology.html topology-aware resource allocation support] may be implemented on the cluster, but it is not currently.
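
For reference, SLURM's topology plugin is configured via a <code>topology.conf</code> file that describes the switch hierarchy. The sketch below uses invented switch and node names purely to illustrate the format; no such configuration currently exists on Nexus.

<pre>
# Hypothetical topology.conf sketch; all switch and node names
# here are invented for illustration only.
SwitchName=leaf1 Nodes=examplenode[01-10]
SwitchName=leaf2 Nodes=examplenode[11-20]
SwitchName=core Switches=leaf1,leaf2
</pre>
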
==Network Core==
The network core for Nexus is the same network core used by all UMIACS-supported systems. It consists of two network switches connected to each other via 40GbE for redundancy. Communication between nodes in the same [[Nexus#Partitions | partition]] rarely needs to traverse the network core.
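
If you want to check whether traffic between two nodes is being routed (and thus potentially crossing the core), a hop trace can give a hint. This is a sketch; the hostname is a placeholder. A single direct hop suggests the two nodes share a subnet and their traffic is switched locally, while an intermediate hop indicates routing, which on a hierarchical network typically happens at the core.

<pre>
# Trace the IP path from the current node to another node;
# "examplenode02" is a placeholder hostname. Note that Ethernet
# switches are invisible to traceroute, so this only reveals
# routed (layer 3) hops.
traceroute examplenode02
</pre>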
==Network Access==
stub
