Nexus/Tron
Revision as of 17:10, 2 December 2024
The Tron partition is a subset of resources available in the Nexus. It was purchased using college-level funding for UMIACS and CSD faculty.
Compute Nodes
The partition contains 69 compute nodes with specs as detailed below.
| Nodenames | Type | Quantity | CPU cores per node | Memory per node | GPUs per node |
|---|---|---|---|---|---|
| tron[00-05] | A6000 GPU Node | 6 | 32 | 256GB | 8 |
| tron[06-44] | A4000 GPU Node | 39 | 16 | 128GB | 4 |
| tron[46-61] | A5000 GPU Node | 16 | 48 | 256GB | 8 |
| tron[62-69] | RTX 2080 Ti GPU Node | 8 | 32 | 384GB | 8 |
| tron[00-44,46-69] | Total | 69 | 1840 | 13282GB | 396 |
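The bracketed node names above use Slurm-style hostlist notation; for example, tron[00-44,46-69] expands to 69 hostnames and skips tron45, which does not exist. As a sanity check on the Quantity column, here is a minimal expansion sketch (an illustration only, not Slurm's own `scontrol show hostnames` parser, which handles far more general expressions):

```python
import re

def expand_hostlist(hostlist: str) -> list[str]:
    """Expand a simple Slurm-style hostlist like 'tron[00-44,46-69]'.

    Handles a single bracketed group of comma-separated, zero-padded
    numeric ranges; Slurm's real parser supports more general forms.
    """
    m = re.fullmatch(r"([^\[]+)\[([^\]]+)\]", hostlist)
    if not m:
        return [hostlist]  # plain hostname, nothing to expand
    prefix, ranges = m.groups()
    hosts = []
    for part in ranges.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            width = len(lo)  # preserve zero-padding, e.g. '00' -> width 2
            hosts += [f"{prefix}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]
        else:
            hosts.append(prefix + part)
    return hosts

# The partition's 69 nodes; note tron45 is absent from the hostlist:
nodes = expand_hostlist("tron[00-44,46-69]")
print(len(nodes))  # 69
```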
Network
The network infrastructure supporting the Tron partition consists of:
- One pair of network switches connected to each other via dual 100GbE links for redundancy, serving the following compute nodes:
  - tron[00-05]: Two 100GbE links per node, one to each switch in the pair (redundancy).
  - tron[06-44]: Two 50GbE links per node, one to each switch in the pair (redundancy).
  - tron[46-61]: One 100GbE link per node. Half of the overall links for this set of nodes go to one switch in the pair, and the other half go to the other switch in the pair.
- One switch connected to the above pair of network switches via two 100GbE links, one to each switch in the pair for redundancy, serving the following compute nodes:
  - tron[62-69]: Two 10GbE links to the switch per node (increased bandwidth).
The fileserver hosting all Nexus scratch, faculty, project, and dataset allocations also connects to the same pair of switches supporting tron[00-44,46-61] via four 100GbE links, two to each switch in the pair (redundancy and increased bandwidth).
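Putting the link counts and speeds above together, the aggregate per-node uplink bandwidth for each node group can be tallied with a short sketch (numbers taken directly from the list and paragraph above; purely illustrative):

```python
# (links per node, speed per link in Gb/s) for each node group,
# as described in the network list above
uplinks = {
    "tron[00-05]": (2, 100),  # dual 100GbE, one to each switch in the pair
    "tron[06-44]": (2, 50),   # dual 50GbE, one to each switch in the pair
    "tron[46-61]": (1, 100),  # single 100GbE, split across the switch pair
    "tron[62-69]": (2, 10),   # dual 10GbE to the downstream switch
}

for group, (links, speed) in uplinks.items():
    print(f"{group}: {links * speed} Gb/s aggregate per node")

# The fileserver connects via four 100GbE links, two to each switch:
print(f"fileserver: {4 * 100} Gb/s aggregate")
```

Note that the single-link tron[46-61] nodes trade switch-level redundancy for simplicity: losing one switch takes half of that group's links offline, whereas the dual-linked groups keep every node reachable.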
For a broader overview of the network infrastructure supporting the Nexus cluster, please see Nexus/Network.