Vulcan/Storage


The Vulcan cluster has the following storage available. Please also review the UMIACS Local Data Storage policies, which cover any volume that is labeled as scratch.

Home Directory

Your local home directory is for storing your configuration files and code. It is available as /cfarhomes/$username. Your home directory is limited to 10GB if it was created on or after 04/09/2021, or 20GB if it was created before 04/09/2021. Both snapshots and backups are available if you need to recover data.
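To see how close a home directory is to its limit, a script can total the file sizes under it. The following is a minimal sketch, not a UMIACS tool. The function names directory_size_bytes and quota_report are hypothetical, the limit is assumed to be counted in decimal gigabytes (10**9 bytes), and summed file sizes only approximate what the file server actually charges against the limit.

```python
import os

GB = 10**9  # assumption: the limit is counted in decimal gigabytes


def directory_size_bytes(root):
    """Sum the apparent sizes of regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def quota_report(used_bytes, limit_gb):
    """Return (fraction_used, over_limit) for a usage figure and a limit in GB."""
    limit_bytes = limit_gb * GB
    return used_bytes / limit_bytes, used_bytes > limit_bytes


# Example usage (pass limit_gb=20 for accounts created before 04/09/2021):
#   used = directory_size_bytes(os.path.expanduser("~"))
#   fraction, over = quota_report(used, limit_gb=10)
```

Walking a large home directory over the network can take a while, so a check like this is better run occasionally than inside every job.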

Scratch Directories

Scratch data has no data protection: there are no snapshots, and the data is not backed up. There are two types of scratch directories in the Vulcan compute infrastructure:

  • Network scratch directory
  • Local scratch directories

Network Scratch Directory

You are allocated 300GB of scratch space via NFS from /vulcanscratch/$username. It is not backed up or protected in any way. This directory is automounted, so you will need to cd into it or specify a fully qualified file path to access it.
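Because the directory is automounted, it may not appear in a listing of its parent until something accesses it by its full path. A minimal sketch of doing that from a script follows. The helper name ensure_mounted is hypothetical, and the sketch assumes that listing the fully qualified path is enough to trigger the automounter.

```python
import os


def ensure_mounted(path):
    """Access path directly so an automounter can mount it.

    Returns True if the path could be listed as a directory, False otherwise.
    """
    try:
        # Touching the fully qualified path is what triggers the mount.
        with os.scandir(path) as entries:
            next(entries, None)
        return True
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        return False


# Example usage (assumes your login name matches $username):
#   import getpass
#   scratch = os.path.join("/vulcanscratch", getpass.getuser())
#   print(scratch, "available:", ensure_mounted(scratch))
```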

You may request a temporary increase of up to 500GB total space for a maximum of 120 days without any faculty approval by contacting staff@umiacs.umd.edu. Once the temporary increase period is over, you will be contacted and given a one-week window of opportunity to clean and secure your data before staff will forcibly remove data to get your space back under 300GB. If you need space beyond 500GB or for longer than 120 days, you will need faculty approval and/or a project directory.

This file system is available on all submission, data management, and computational nodes within the cluster.

Local Scratch Directories

Each computational node that you can schedule compute jobs on has one or more local scratch directories. These are always named /scratch0, /scratch1, etc. These are almost always more performant than any other storage available to the job. However, you must stage your data in within the confines of your job and stage it out before the end of your job.
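The stage-in, compute, stage-out pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than a cluster-provided utility: the function name run_with_local_scratch and its parameters are hypothetical, /scratch0 is used as the default only because it is the first directory named above, and work stands for whatever your job does with the local copy.

```python
import os
import shutil
import tempfile


def run_with_local_scratch(input_dir, output_dir, work, scratch_root="/scratch0"):
    """Copy input_dir to a private directory under scratch_root, run work, copy results out.

    work is called as work(local_input_dir, local_output_dir).
    """
    job_dir = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    try:
        local_in = os.path.join(job_dir, "input")
        local_out = os.path.join(job_dir, "output")
        shutil.copytree(input_dir, local_in)  # stage in
        os.makedirs(local_out)
        work(local_in, local_out)  # compute against the local disk
        # Stage out before the job ends.
        shutil.copytree(local_out, output_dir, dirs_exist_ok=True)
    finally:
        # Remove the private directory whether or not the job succeeded.
        shutil.rmtree(job_dir, ignore_errors=True)
```

Note that in this sketch results are staged out only if work returns normally. If partial results matter, move the stage-out copy into the finally block ahead of the cleanup.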

These local scratch directories are cleaned by a tmpwatch job that deletes data that has not been accessed in 90 days. It is scheduled via maintenance jobs to run once a month at 1am. Different nodes will run the maintenance jobs on different days of the month to ensure the cluster is still highly available at all times. Please make sure you secure any data you write to these directories at the end of your job.
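To get a rough idea of which files are at risk, you can list those whose last access time is older than 90 days. The sketch below assumes that the cleanup decision is based on each file's access time (atime); the exact tmpwatch options used on the nodes are not stated here, so treat the result as an estimate. The function name stale_files is hypothetical.

```python
import os
import time

MAX_AGE_DAYS = 90
SECONDS_PER_DAY = 86400


def stale_files(root, max_age_days=MAX_AGE_DAYS, now=None):
    """Return sorted paths under root whose last access is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks and does not update atime.
                if os.lstat(path).st_atime < cutoff:
                    stale.append(path)
            except OSError:
                continue
    return sorted(stale)


# Example usage (hypothetical per-user directory on a node):
#   for path in stale_files("/scratch0/your-directory"):
#       print(path)
```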

Datasets

We have read-only dataset storage available at /fs/vulcan-datasets. If there are datasets that you would like to see curated and available, please see this page.

The following is the list of datasets available:

Dataset Path
3D-FRONT /fs/vulcan-datasets/3d-front
3D-FUTURE /fs/vulcan-datasets/3d-future
Action Genome /fs/vulcan-datasets/AG
ActivityNet /fs/vulcan-datasets/ActivityNet
CATER /fs/vulcan-datasets/CATER
COVID-DA /fs/vulcan-datasets/COVID-DA
CelebA /fs/vulcan-datasets/CelebA
CelebA-HQ /fs/vulcan-datasets/CelebA-HQ
CelebAMask-HQ /fs/vulcan-datasets/CelebAMask-HQ
Charades /fs/vulcan-datasets/Charades
CharadesEgo /fs/vulcan-datasets/CharadesEgo
CIFAR10 /fs/vulcan-datasets/cifar-10-python
CIFAR100 /fs/vulcan-datasets/cifar-100-python
CityScapes /fs/vulcan-datasets/cityscapes
COCO /fs/vulcan-datasets/coco
Conceptual Captions /fs/vulcan-datasets/conceptual_captions
CUB /fs/vulcan-datasets/CUB
DeepFashion /fs/vulcan-datasets/DeepFashion
Digits /fs/vulcan-datasets/digits_full
Edges2handbags /fs/vulcan/datasets/edges2handbags
Edges2shoes /fs/vulcan/datasets/edges2shoes
EGTEA /fs/vulcan/datasets/EGTEA
emnist /fs/vulcan-datasets/emnist
EPIC Kitchens 2018 /fs/vulcan-datasets/Epics-kitchen-2018
EPIC Kitchens 2020 /fs/vulcan-datasets/EPIC-Kitchens-2020
Facades /fs/vulcan/datasets/facades
from_games (GTA5) /fs/vulcan-datasets/from_games
FFHQ /fs/vulcan-datasets/ffhq-dataset
FineGym /fs/vulcan-datasets/FineGym
Google Landmarks Dataset v2 /fs/vulcan-datasets/google-landmark-v2
HAA500 /fs/vulcan-datasets/haa500
HICO /fs/vulcan-datasets/HICO
HMDB51 /fs/vulcan-datasets/HMDB51
Honda_100h /fs/vulcan-datasets/honda_100h
HPatches /fs/vulcan-datasets/HPatches
Human3.6M /fs/vulcan-datasets/human3.6
IM2GPS (test only) /fs/vulcan-datasets/im2gps
ImageNet /fs/vulcan-datasets/imagenet
iNaturalist Dataset 2021 /fs/vulcan-datasets/inat_comp_2021
InteriorNet /fs/vulcan-datasets/InteriorNet
Kinetics-400 /fs/vulcan-datasets/Kinetics-400
Labelled Faces in the Wild /fs/vulcan-datasets/lfw
LibriSpeech /fs/vulcan-datasets/LibriSpeech
LSUN /fs/vulcan-datasets/LSUN
LVIS /fs/vulcan-datasets/LVIS
Maps /fs/vulcan-datasets/maps
Matterport3D /fs/vulcan-datasets/Matterport3D
MegaDepth /fs/vulcan-datasets/MegaDepth
MineRL /fs/vulcan-datasets/MineRL
Mini-ImageNet /fs/vulcan-datasets/miniImagenet
MIT Indoor /fs/vulcan-datasets/mit_indoor
MIT Places /fs/vulcan-datasets/mit_places
Multi-PIE Face /fs/vulcan-datasets/multipie
Night2day /fs/vulcan-datasets/night2day
ObjectNet3D /fs/vulcan-datasets/ObjectNet3D
Occluded Video Instance Segmentation /fs/vulcan-datasets/ovis-2021
Office /fs/vulcan-datasets/office
Office-Home /fs/vulcan-datasets/office_home
omniglot /fs/vulcan-datasets/omniglot
OOPS /fs/vulcan-datasets/OOPS
OpenImagesv4 /fs/vulcan-datasets/OpenImagesv4
PartNet /fs/vulcan-datasets/PartNet
Pascal VOC /fs/vulcan-datasets/pascal_voc
PIC (HOI-A) /fs/vulcan-datasets/PIC
PubLayNet /fs/vulcan-datasets/PubLayNet
Replica /fs/vulcan-datasets/Replica
ScanNet /fs/vulcan-datasets/ScanNet
ShapeNetCore.v2 /fs/vulcan-datasets/ShapeNetCore.v2
Something-Something-V1 /fs/vulcan-datasets/SomethingV1
Something-Something-V2 /fs/vulcan-datasets/SomethingV2
SYNTHIA-RAND-CITYSCAPES /fs/vulcan-datasets/SYNTHIA-RAND-CITYSCAPES
TAPOS /fs/vulcan-datasets/TAPOS
Tiny ImageNet /fs/vulcan-datasets/tiny_imagenet
Tumblr GIF Description /fs/vulcan-datasets/TGIF
Thingi10K /fs/vulcan-datasets/Thingi10K
UCF101 /fs/vulcan-datasets/UCF101
VirtualHomes /fs/vulcan-datasets/VirtualHomes
visda17 /fs/vulcan-datasets/visda17
visda17_openset /fs/vulcan-datasets/VISDA
visda19 /fs/vulcan-datasets/visda
Visual Genome /fs/vulcan-datasets/VG
Visual Relationship Detection /fs/vulcan-datasets/VRD
VOCdevkit /fs/vulcan-datasets/VOCdevkit
VoxCeleb2 /fs/vulcan-datasets/VoxCeleb2
WILDS /fs/vulcan-datasets/WILDS
xView2 /fs/vulcan-datasets/xView2
YCB Object Models /fs/vulcan-datasets/YCB
YouTube8M /fs/vulcan-datasets/YouTube8M
YouTubeVIS-2019 /fs/vulcan-datasets/YouTubeVIS-2019
YouTubeVIS-2021 /fs/vulcan-datasets/YouTubeVIS-2021
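Scripts that use several datasets can keep the names and paths in one place instead of hard-coding paths throughout. A minimal sketch follows, with a handful of entries copied from the table above. The DATASETS mapping and the dataset_path helper are hypothetical conveniences, not something provided on the cluster.

```python
import os

DATASET_ROOT = "/fs/vulcan-datasets"

# A few entries copied from the table above; extend as needed.
DATASETS = {
    "CIFAR10": "cifar-10-python",
    "CIFAR100": "cifar-100-python",
    "COCO": "coco",
    "ImageNet": "imagenet",
    "UCF101": "UCF101",
}


def dataset_path(name, root=DATASET_ROOT):
    """Return the directory for a dataset name, matching the name case-insensitively."""
    lookup = {key.lower(): value for key, value in DATASETS.items()}
    try:
        return os.path.join(root, lookup[name.lower()])
    except KeyError:
        raise KeyError(f"unknown dataset: {name!r}") from None


# Example usage:
#   imagenet_dir = dataset_path("ImageNet")
#   if not os.path.isdir(imagenet_dir):
#       raise SystemExit(f"{imagenet_dir} is not available on this node")
```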

Project Storage

Users within the Vulcan compute infrastructure can request project-based allocations of up to 10TB for up to 180 days from staff@umiacs.umd.edu with approval from the Vulcan faculty manager (Dr. Shrivastava). These allocations will be under a name that you provide when you request the allocation. Staff will provide the file system path in the response to the request. Once the allocation period is over, you will be contacted and given a 14-day window of opportunity to clean and secure your data before staff will remove the allocation.
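For planning purposes, the key dates of an allocation can be worked out ahead of time. The sketch below assumes the full 180 days are granted and that the 14-day window starts on the day the allocation ends; in practice the window starts when staff contact you, so the second date is an estimate. The function name allocation_deadlines is hypothetical.

```python
from datetime import date, timedelta


def allocation_deadlines(start, allocation_days=180, grace_days=14):
    """Return (allocation_end, cleanup_deadline) for an allocation starting on start."""
    end = start + timedelta(days=allocation_days)
    return end, end + timedelta(days=grace_days)


# Example: an allocation granted on 2024-01-01
end, cleanup = allocation_deadlines(date(2024, 1, 1))
print(end, cleanup)  # 2024-06-29 2024-07-13
```

The same arithmetic applies to a temporary network scratch increase by passing allocation_days=120 and grace_days=7.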

By default, this data is backed up nightly and has a limited snapshot schedule (1 daily snapshot). Upon request to staff@umiacs.umd.edu, staff can exclude the data from backups and/or disable snapshots on the project storage volume. We currently have 100TB total to support these projects, which includes the snapshot data for this volume.

Object Storage

All Vulcan users can request project allocations in the UMIACS Object Store. Please email staff@umiacs.umd.edu with a short project name and the amount of storage you will need to get started.

An example of how to use the umobj command line utilities can be found here. A full set of documentation for the utilities can be found on the umobj GitLab page.