FilesystemDataStorage: Difference between revisions

From UMIACS
(10 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Local (data) storage refers to all data that is stored physically at UMIACS, i.e., on hard drives either in a datacenter managed by [[HelpDesk | UMIACS staff]], or on a UMIACS-supported workstation. The opposite of this is [[CloudDataStorage | cloud storage]] which is stored on third-party providers' data hosting platforms.
Filesystem [[Data Storage | (data) storage]] refers to all data that is stored physically at UMIACS, i.e., on hard drives either in servers in datacenters managed by [[HelpDesk | UMIACS staff]], or in UMIACS-supported workstations. The opposite of this is [[CloudDataStorage | cloud storage]] which is stored on third-party providers' data hosting platforms.


The below sections outline the different categories of local storage available at UMIACS.
The below sections outline the different categories of filesystem storage available at UMIACS. Although technically filesystem storage by the above definition, UMIACS also hosts an [[OBJ | Object Store]] that is documented outside the scope of this page.


==Network Home Directory Local Storage==
==Network Home Directory Filesystem Storage==
We provide network home directory local storage to each of our users through [[NFShomes]] home directories.
We provide network home directory filesystem storage to each of our users through [[NFShomes]] home directories.


This home directory can be accessed via either
This home directory can be accessed via <code>/nfshomes/<username></code> on supported UNIX machines or by using [[WinSCP]] on Windows machines.


    /nfshomes/<username> (NFS)
Each home directory is backed up nightly using the Institute's [[TSM]] backup system. It also has [[Snapshots]] enabled for easy user restores.


or
Users are given a 30GB, non-expandable [[Quota]]. You will need to use either platform-specific filesystem storage, directly-attached filesystem storage, or other network-attached filesystem storage for increased space.
 
    \\isilondata.umiacs.umd.edu\nfshomes\<username> (CIFS)
 
and is backed up nightly using the Institute's [[TSM]] backup system. It also has [[Snapshots]] enabled for easy user restores.
 
Users are given a 30GB, non-expandable [[Quota]]. You will need to use either platform-specific local storage, directly-attached local storage, or other network-attached local storage for increased space.


On user account closure, the account's NFShomes home directory goes into our [[Archives]].
On user account closure, the account's NFShomes home directory goes into our [[Archives]].


==UNIX Local Storage==
==UNIX Filesystem Storage==
UNIX hosts use redundant, backed-up network file shares for user directories ([[#Network Home Directory Local Storage |above section]]). Research data storage ([[#Network-Attached Local Storage |below section]]) is also stored on redundant, possibly-backed-up network file shares and is generally available under /fs/
UNIX hosts use redundant, backed-up network file shares for user home directories ([[#Network Home Directory Filesystem Storage |above section]]). Research data storage ([[#Network-Attached Filesystem Storage |below section]]) is also stored on redundant, possibly-backed-up network file shares and is generally available under /fs/


All UNIX hosts also have local storage available for transitory use. These directories may be used to store temporary, local '''''COPIES''''' of data that is permanently stored elsewhere or as a staging point for output.
All UNIX hosts also have filesystem storage available for transitory use. These directories may be used to store temporary '''''COPIES''''' of data that is permanently stored elsewhere or as a staging point for output.


These directories may not, '''''under any circumstances''''', be used as permanent storage for unique, important data. They are not backed up or archived by UMIACS. UMIACS staff cannot recover damaged or deleted data from these directories and will not be responsible for data loss if they are misused. Additionally, on our [[SLURM]] compute clusters, these volumes may have an automated cleanup routine that will delete unmodified data after some number of days. You can check the page for the specific cluster you are using for more information.
These directories may not, '''''under any circumstances''''', be used as permanent storage for unique, important data. They are not backed up or archived by UMIACS. UMIACS staff cannot recover damaged or deleted data from these directories and will not be responsible for data loss if they are misused. Additionally, on our [[SLURM]] compute clusters, these volumes may have an automated cleanup routine that will delete unmodified data after some number of days. You can check the page for the specific cluster you are using for more information.
Line 35: Line 29:
   - any directory named in whole or in part "tmp", "temp", or "scratch".
   - any directory named in whole or in part "tmp", "temp", or "scratch".


==Windows and macOS Local Storage==
==Windows and macOS Filesystem Storage==
Windows and macOS hosts at UMIACS store user directories on their primary internal drives (<tt>C:\Users</tt> for Windows, <tt>/Users</tt> for macOS). Supported, UMIACS-managed hosts automatically back up user data on these drives nightly using the Institute's [[TSM]] backup system. If you have a supported, UMIACS-managed host that has other internal or external hard drives attached to it, or other partitions on its primary internal drive, please be aware that these drives/partitions '''are not''' backed up. Laptops and non-standard hosts are not automatically backed up and should be manually backed up by their users.
Windows and macOS hosts at UMIACS store user directories on their primary internal drives (<tt>C:\Users</tt> for Windows, <tt>/Users</tt> for macOS). Supported, UMIACS-managed hosts automatically back up user data on these drives nightly using the Institute's [[TSM]] backup system. If you have a supported, UMIACS-managed host that has other internal or external hard drives attached to it, or other partitions on its primary internal drive, please be aware that these drives/partitions '''are not''' backed up. Laptops and non-standard hosts are not automatically backed up and should be manually backed up by their users.


On host decommission, user directories go into our [[Archives]].
On host decommission, user directories go into our [[Archives]].


==Direct-Attached Local Storage==
==Direct-Attached Filesystem Storage==
Direct-attached local storage refers to devices like USB flash drives and USB hard drives, which are very popular for easily expanding storage capacity on a host. However, these devices are significantly more vulnerable to data loss or theft than internal or networked data storage. In general, UMIACS discourages the use of direct-attached local storage when any other option is available. Please note that these devices are prone to high rates of failure and additional steps should be taken to ensure that the data is backed up and that critical or confidential data is not lost or stolen.
Direct-attached filesystem storage refers to devices like USB flash drives and USB hard drives, which are very popular for easily expanding storage capacity on a host. However, these devices are significantly more vulnerable to data loss or theft than internal or networked data storage. In general, UMIACS discourages the use of direct-attached filesystem storage when any other option is available. Please note that these devices are prone to high rates of failure and additional steps should be taken to ensure that the data is backed up and that critical or confidential data is not lost or stolen.


Direct-attached local storage is not backed up or archived by UMIACS.
Direct-attached filesystem storage is not backed up or archived by UMIACS.


==Network-Attached Local Storage==
==Network-Attached Filesystem Storage==
Some labs have network-attached local storage space dedicated for datasets, models, and project storage. These shares are typically named in the form <tt>/fs/<lab>-<purpose></tt> (i.e., <tt>/fs/cml-models</tt> or <tt>/fs/vulcan-projects</tt>).
Some labs have network-attached filesystem storage space dedicated for datasets, models, and project storage. These shares are typically named in the form <tt>/fs/<lab>-<purpose></tt> (i.e., <tt>/fs/cml-models</tt> or <tt>/fs/vulcan-projects</tt>).


Network-attached local storage may or may not be backed up and/or archived by UMIACS. Details of a specific share's retention policy should be stated along with the documentation of the share's access / usage policy. If you find an  documentation network-attached local storage space in this wiki that does not state its retention policy, please [[HelpDesk | contact staff]].
Network-attached filesystem storage may or may not be backed up and/or archived by UMIACS. Details of a specific share's retention policy should be stated along with the documentation of the share's access / usage policy. If you find documentation for a network-attached filesystem storage space in this wiki that does not state its retention policy, please [[HelpDesk | contact staff]].


===Network-Attached Local Scratch Storage===
===Network-Attached Filesystem Scratch Storage===
One specific sub-category of network-attached local storage is network-attached local scratch storage. These shares are named similarly to local scratch or temporary storage, but with the lab's name included (i.e., <tt>/fs/cbcb-scratch</tt> or <tt>/gammascratch</tt>), are intended for scratch/temporary storage, and are subject to the same policies as local scratch/tmp directories, discussed above.
One specific sub-category of network-attached filesystem storage is network-attached filesystem scratch storage. These shares are named similarly to UNIX filesystem storage, but with the lab's name included (i.e., <tt>/fs/cbcb-scratch</tt> or <tt>/gammascratch</tt>), are intended for scratch/temporary storage, and are subject to the same policies as filesystem scratch/tmp directories, discussed above.


Network-attached local scratch storage is not backed up or archived by UMIACS.
Network-attached filesystem scratch storage is not backed up or archived by UMIACS.


==UNIX Local Storage Commands==
==UNIX Filesystem Storage Commands==
Below are a few different CLI commands that may prove useful for monitoring your local storage usage and performance. For additional information, run <code>[command] --help</code> or <code>man [command]</code>
Below are a few different CLI commands that may prove useful for monitoring your filesystem storage usage and performance. For additional information, run <code>[command] --help</code> or <code>man [command]</code>


df - Shows descriptive file system information
df - Shows descriptive file system information
Line 64: Line 58:
or all file systems by default.
or all file systems by default.
</pre>
</pre>
For example, to check how much space is available at a directory:
<pre>df -h ./</pre>


du - Shows disk usage of specific files. Use the -d flag for better depth control.
du - Shows disk usage of specific files. Use the -d flag for better depth control.
Line 71: Line 67:
Summarize disk usage of each FILE, recursively for directories.
Summarize disk usage of each FILE, recursively for directories.
</pre>
</pre>
For example, to check how much space each file in a directory takes up:
<pre>du -ah -d 1 ./</pre>


free - Shows current memory(RAM) usage. Use the -h flag for a human readable format.
free - Shows current memory(RAM) usage. Use the -h flag for a human readable format.

Latest revision as of 19:28, 24 September 2025

Filesystem (data) storage refers to all data that is stored physically at UMIACS, i.e., on hard drives either in servers in datacenters managed by UMIACS staff or in UMIACS-supported workstations. The opposite of this is cloud storage, which is stored on third-party providers' data hosting platforms.

The sections below outline the different categories of filesystem storage available at UMIACS. UMIACS also hosts an Object Store which, although technically filesystem storage by the above definition, is documented separately and is outside the scope of this page.

Network Home Directory Filesystem Storage

We provide network home directory filesystem storage to each of our users through NFShomes home directories.

This home directory can be accessed via /nfshomes/<username> on supported UNIX machines or by using WinSCP on Windows machines.

Each home directory is backed up nightly using the Institute's TSM backup system. It also has Snapshots enabled for easy user restores.

Each user is given a 30GB, non-expandable Quota. If you need more space, you will need to use platform-specific filesystem storage, direct-attached filesystem storage, or other network-attached filesystem storage.
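As a rough illustration of checking usage against this limit, the sketch below totals a directory tree with du and expresses it as a percentage of the quota. The function names dir_usage_kb and percent_of_quota are hypothetical, and reading 30GB as 30 * 1024 * 1024 KiB is an assumption. The quota command listed under UNIX Filesystem Storage Commands remains the authoritative source for your actual limit and usage.

```shell
#!/bin/sh
# Sketch only: estimate how much of a quota a directory tree is using.
# QUOTA_KB is an assumption (30GB read as 30 * 1024 * 1024 KiB).
QUOTA_KB=$((30 * 1024 * 1024))

# Print the total size of a directory tree in KiB.
dir_usage_kb() {
    du -sk "$1" 2>/dev/null | awk '{print $1}'
}

# Print used/quota as a whole-number percentage (rounded down).
percent_of_quota() {
    used_kb=$1
    quota_kb=$2
    echo $(( used_kb * 100 / quota_kb ))
}

# Example (not run here):
#   percent_of_quota "$(dir_usage_kb "/nfshomes/$USER")" "$QUOTA_KB"
```

Because du walks every file, this can be slow on a large home directory, and its total may differ somewhat from what the quota system itself reports.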

On user account closure, the account's NFShomes home directory goes into our Archives.

UNIX Filesystem Storage

UNIX hosts use redundant, backed-up network file shares for user home directories (above section). Research data storage (below section) is also stored on redundant, possibly-backed-up network file shares and is generally available under /fs/.

All UNIX hosts also have filesystem storage available for transitory use. These directories may be used to store temporary COPIES of data that is permanently stored elsewhere or as a staging point for output.

These directories may not, under any circumstances, be used as permanent storage for unique, important data. They are not backed up or archived by UMIACS. UMIACS staff cannot recover damaged or deleted data from these directories and will not be responsible for data loss if they are misused. Additionally, on our SLURM compute clusters, these volumes may have an automated cleanup routine that will delete unmodified data after some number of days. You can check the page for the specific cluster you are using for more information.
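To get a rough idea of which files a cleanup routine of this kind might target, the sketch below lists files that have not been modified in more than a given number of days. The function name stale_files and the 30-day figure in the example are illustrative assumptions. The actual retention period, if any, is stated on the page for the specific cluster.

```shell
#!/bin/sh
# Sketch only: list regular files under a directory whose last
# modification is more than N days old (find counts whole days).
stale_files() {
    dir=$1
    days=$2
    find "$dir" -type f -mtime +"$days" -print
}

# Example (not run here; /scratch0 and 30 days are assumptions):
#   stale_files /scratch0 30
```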

Please note that /tmp in particular is at risk of data loss or corruption, as that directory is regularly used by system processes and services for temporary storage.

These directories include:

 - /tmp
 - /scratch0, /scratch1, ... (/scratch#)
 - any directory named in whole or in part "tmp", "temp", or "scratch".
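The intended workflow for these directories (work on a temporary copy, then move results back to permanent storage) can be sketched as follows. The function names stage_in and stage_out and all paths in the example are hypothetical. This is one possible approach under those assumptions, not a UMIACS-provided tool.

```shell
#!/bin/sh
# Sketch only: stage a copy of data into scratch, then copy results back.

# Copy a dataset from permanent storage into a private working directory
# under the given scratch root, and print the working directory's path.
stage_in() {
    src=$1
    scratch_root=$2
    work=$(mktemp -d "$scratch_root/stage.XXXXXX") || return 1
    cp -R "$src" "$work/" || return 1
    echo "$work"
}

# Copy a results directory back to permanent storage and, only if the
# copy succeeded, remove the scratch working directory.
stage_out() {
    work=$1
    results=$2
    dest=$3
    mkdir -p "$dest" && cp -R "$work/$results" "$dest/" && rm -rf "$work"
}

# Example (not run here; all paths are assumptions):
#   work=$(stage_in /fs/mylab-datasets/input /scratch0)
#   ... run a job that writes to "$work/out" ...
#   stage_out "$work" out /fs/mylab-projects/run1
```

The original data is never moved, only copied, so losing the scratch copy costs nothing more than the time needed to stage it again.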

Windows and macOS Filesystem Storage

Windows and macOS hosts at UMIACS store user directories on their primary internal drives (C:\Users for Windows, /Users for macOS). Supported, UMIACS-managed hosts automatically back up user data on these drives nightly using the Institute's TSM backup system. If you have a supported, UMIACS-managed host that has other internal or external hard drives attached to it, or other partitions on its primary internal drive, please be aware that these drives/partitions are not backed up. Laptops and non-standard hosts are not automatically backed up and should be manually backed up by their users.

On host decommission, user directories go into our Archives.

Direct-Attached Filesystem Storage

Direct-attached filesystem storage refers to devices like USB flash drives and USB hard drives, which are very popular for easily expanding storage capacity on a host. However, these devices are significantly more vulnerable to data loss or theft than internal or networked data storage. In general, UMIACS discourages the use of direct-attached filesystem storage when any other option is available. Please note that these devices are prone to high rates of failure and additional steps should be taken to ensure that the data is backed up and that critical or confidential data is not lost or stolen.

Direct-attached filesystem storage is not backed up or archived by UMIACS.

Network-Attached Filesystem Storage

Some labs have network-attached filesystem storage space dedicated for datasets, models, and project storage. These shares are typically named in the form /fs/<lab>-<purpose> (e.g., /fs/cml-models or /fs/vulcan-projects).

Network-attached filesystem storage may or may not be backed up and/or archived by UMIACS. Details of a specific share's retention policy should be stated along with the documentation of the share's access / usage policy. If you find documentation for a network-attached filesystem storage space in this wiki that does not state its retention policy, please contact staff.

Network-Attached Filesystem Scratch Storage

One specific sub-category of network-attached filesystem storage is network-attached filesystem scratch storage. These shares are named similarly to the UNIX scratch and temporary directories, but with the lab's name included (e.g., /fs/cbcb-scratch or /gammascratch). They are intended for scratch/temporary storage and are subject to the same policies as the scratch/tmp directories discussed above.

Network-attached filesystem scratch storage is not backed up or archived by UMIACS.

UNIX Filesystem Storage Commands

Below are a few CLI commands that may prove useful for monitoring your filesystem storage usage and performance. For additional information, run [command] --help or man [command].

df - Shows descriptive file system information

Usage: df [OPTION]... [FILE]...
Show information about the file system on which each FILE resides,
or all file systems by default.

For example, to check how much space is available at a directory:

df -h ./
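In a script, the same check can be made against a threshold, for instance before staging a large dataset. The sketch below uses df's POSIX output format (-P) with 1024-byte blocks (-k), where the fourth column is the available space. The function names avail_kb and has_space are hypothetical.

```shell
#!/bin/sh
# Sketch only: check free space before writing to a path.

# Print the space available on the filesystem holding a path, in KiB.
# Assumes the filesystem name in the first column contains no spaces.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# Succeed only if at least need_kb KiB are available at the given path.
has_space() {
    path=$1
    need_kb=$2
    [ "$(avail_kb "$path")" -ge "$need_kb" ]
}

# Example (not run here; the 10 GiB figure is an assumption):
#   has_space ./ $((10 * 1024 * 1024)) || echo "not enough space" >&2
```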

du - Shows disk usage of specific files. Use the -d flag for better depth control.

Usage: du [OPTION]... [FILE]...
  or:  du [OPTION]... --files0-from=F
Summarize disk usage of each FILE, recursively for directories.

For example, to check how much space each file in a directory takes up:

du -ah -d 1 ./
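When a directory is close to its limit, it often helps to rank its entries by size. The sketch below summarizes each entry in KiB and sorts numerically, largest first. The function name largest_entries is hypothetical, and the default of 10 entries is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: show the largest entries directly inside a directory.
# Sizes are in KiB so that sort -n orders them correctly.
# Note: the * glob skips hidden (dot) files and directories.
largest_entries() {
    dir=$1
    count=${2:-10}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example (not run here):
#   largest_entries ./ 5
```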

free - Shows current memory (RAM) usage. Use the -h flag for a human-readable format.

Usage:
 free [options]

quota - Shows quota information. This is useful for viewing per-filesystem limits in places such as a home directory.

quota: Usage: quota [-guqvswim] [-l | [-Q | -A]] [-F quotaformat]
	quota [-qvswim] [-l | [-Q | -A]] [-F quotaformat] -u username ...
	quota [-qvswim] [-l | [-Q | -A]] [-F quotaformat] -g groupname ...
	quota [-qvswugQm] [-F quotaformat] -f filesystem ...

iostat - Shows drive utilization, as well as other statistics such as CPU utilization. Pair this with the watch command for regular updates.

Usage: iostat [ options ] [ <interval> [ <count> ] ]
Options are:
[ -c ] [ -d ] [ -h ] [ -k | -m ] [ -N ] [ -t ] [ -V ] [ -x ] [ -y ] [ -z ]
[ -j { ID | LABEL | PATH | UUID | ... } ]
[ [ -T ] -g <group_name> ] [ -p [ <device> [,...] | ALL ] ]
[ <device> [...] | ALL ]