Nexus/Tron

The Tron partition is a subset of resources available in the [[Nexus]]. It was purchased using college-level funding for UMIACS and CSD faculty.

= Compute Nodes =
The partition contains 69 compute nodes with specs as detailed below.

{| class="wikitable sortable"
! Nodenames !! Type !! Quantity !! CPU cores per node !! Memory per node !! GPUs per node
|-
| tron[00-05] || A6000 GPU Node || 6 || 32 || 256GB || 8
|-
| tron[06-44] || A4000 GPU Node || 39 || 16 || 128GB || 4
|-
| tron[46-61] || A5000 GPU Node || 16 || 48 || 256GB || 8
|-
| tron[62-69] || RTX 2080 Ti GPU Node || 8 || 32 || 384GB || 8
|-
| tron[00-44,46-69] || Total || 69 || 1840 || 13282GB || 396
|}
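
The Total row is just the sum of the per-type figures. As a minimal illustration, the sketch below re-derives the aggregate node, CPU core, and GPU counts directly from the table (all numbers are copied from the table itself; nothing is queried from the cluster):

<syntaxhighlight lang="python">
# Per-type figures copied from the table above:
# (nodenames, quantity, CPU cores per node, GPUs per node)
node_types = [
    ("tron[00-05]", 6, 32, 8),   # A6000 GPU Node
    ("tron[06-44]", 39, 16, 4),  # A4000 GPU Node
    ("tron[46-61]", 16, 48, 8),  # A5000 GPU Node
    ("tron[62-69]", 8, 32, 8),   # RTX 2080 Ti GPU Node
]

total_nodes = sum(qty for _, qty, _, _ in node_types)
total_cores = sum(qty * cores for _, qty, cores, _ in node_types)
total_gpus = sum(qty * gpus for _, qty, _, gpus in node_types)

print(total_nodes, total_cores, total_gpus)  # 69 1840 396
</syntaxhighlight>
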
= Network =
The network infrastructure supporting the Tron partition consists of the following (the per-node uplink bandwidth implied by these links is tallied in the sketch after the list):
# One pair of network switches connected to each other via dual 100GbE links for redundancy, serving the following compute nodes:
#* tron[00-05]: Two 100GbE links per node, one to each switch in the pair (redundancy).
#* tron[06-44]: Two 50GbE links per node, one to each switch in the pair (redundancy).
#* tron[46-61]: One 100GbE link per node. Half of these links go to one switch in the pair and the other half go to the other switch. These nodes do not have redundant links because the switches are currently at port capacity.
# One switch connected to the above pair of network switches via two 100GbE links, one to each switch in the pair for redundancy, serving the following compute nodes:
#* tron[62-69]: Two 10GbE links per node to this switch (increased bandwidth).
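
As referenced above, the per-node uplink bandwidth for each node group follows directly from the link counts and speeds listed (a minimal sketch; the figures are taken from the list above, not queried from the switches):

<syntaxhighlight lang="python">
# Link count and per-link speed (Gb/s) for each node group, from the list above.
uplinks = {
    "tron[00-05]": (2, 100),  # two 100GbE links, one to each switch in the pair
    "tron[06-44]": (2, 50),   # two 50GbE links, one to each switch in the pair
    "tron[46-61]": (1, 100),  # single 100GbE link, no redundant path
    "tron[62-69]": (2, 10),   # two 10GbE links to the downstream switch
}

for group, (links, speed_gbps) in uplinks.items():
    # e.g. "tron[00-05]: 2 x 100GbE = 200 Gb/s per node"
    print(f"{group}: {links} x {speed_gbps}GbE = {links * speed_gbps} Gb/s per node")
</syntaxhighlight>
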
The fileserver hosting all Nexus [[Nexus#Scratch_Directories | scratch]], [[Nexus#Faculty_Allocations | faculty]], [[Nexus#Project_Allocations | project]], and [[Nexus#Datasets | dataset]] allocations first connects to a pair of intermediary switches before reaching the compute nodes. The last hop, from this intermediary pair to the first pair of switches described above, uses four 100GbE links, one for each switch-to-switch combination across the two pairs, for redundancy and increased bandwidth.

For a broader overview of the network infrastructure supporting the Nexus cluster, please see [[Nexus/Network]].
