Data Storage

From UMIACS
Revision as of 16:29, 6 November 2024 by Ridge (talk | contribs)

This is a landing page for all topics related to data storage that are available at UMIACS. It is under active development.

Where can I store my data?

There is no single answer. Different types of data are used differently and have different optimal storage strategies. Picking a one-size-fits-all option will lead to a sub-optimal result overall.

Before choosing where to store any of your data, consider how you may need to interact with that data in the short-term and in the long-term, and who you might need to share the data with. You can copy data between the different types of data storage listed below, but it may be unnecessarily cumbersome if you don't choose the right place for the "master copy" of your data.

  • Local data storage is best suited for data that you are actively working on. Some examples are code that you are developing or using (for instance, to run computational jobs in SLURM), results from computational jobs that have already run but have not yet been published in a paper, or files that need to be processed by another desktop-based or server-based application.
    Data on UMIACS' local storage can be moved between different UMIACS-supported hosts and shared with other UMIACS account holders using common methods such as cp, File Explorer, Finder, and more.
  • UMIACS' Object Store is best suited for data that is going to remain static but needs to remain accessible for very long periods of time, such as data referenced by published papers. It is also suitable for transferring specific versions of data in and out of UMIACS as single large files, such as archive (tar/zip/etc.) files.
    Data in UMIACS' Object Store can be moved in and out of it, to local storage (whether UMIACS' or otherwise), via its built-in web interface or one of many compatible clients. Data can be shared publicly via simple download links or static websites that can visualize the data in a more accessible way. UMIACS account holders can sponsor Collaborator accounts for external collaborators that may need to upload and download new versions of data without needing direct access to UMIACS-supported hosts.
  • Cloud data storage is best suited for collaborative data, such as simple or rich text documents, PDF forms, spreadsheets, presentations, pictures, videos, and more. Many cloud storage service providers also provide web-based apps attached to their storage service that can be used to edit these types of data without ever having to download it to a specific device.
    The methods that can be used to move data in and out of cloud storage to local storage vary based on the specific service provider, but often involve web-based or mobile app-based upload and download. Data can be shared with others across the web via accounts associated with that service provider, and often publicly via simple download links.
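The Object Store item above mentions moving a specific version of a data set as one large archive file. A minimal sketch of that packing step is shown here, using Python's standard tarfile module. The function name make_archive and the example paths are hypothetical, and uploading the resulting file is left to the Object Store's web interface or a compatible client.

```python
import tarfile
from pathlib import Path


def make_archive(source_dir: str, archive_path: str) -> list[str]:
    """Pack source_dir into one gzip-compressed tar file and return its member names.

    Hypothetical helper for illustration; it is not part of any UMIACS tooling.
    archive_path should sit outside source_dir so the archive does not include itself.
    """
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(source_dir)

    # arcname stores paths relative to the directory name, so extracting the
    # archive elsewhere does not recreate the original absolute path.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source, arcname=source.name)

    # Reopen read-only to confirm the archive is readable before uploading it.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

For example, make_archive("results_v1", "results_v1.tar.gz") would produce a single file that can then be uploaded as one object.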

How can I transfer my data?

  • Local data storage is typically best transferred using commands or programs available in the operating system on which the data is stored or from which it is most commonly accessed. A number of these commands are covered here.
  • Cloud data storage transfer methods vary based on the service provider. Most major providers offer simple upload/download functionality, which works well for individual files or relatively small numbers of files and folders. For bulk transfers, Rclone is one program that UMIACS staff often use, as it is compatible with many major providers.
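As an illustration of the bulk-transfer case, the sketch below builds an rclone copy command from Python and defaults to Rclone's --dry-run flag, so nothing is transferred until the plan has been reviewed. It assumes rclone is installed and that a remote has already been configured. The remote name myremote and the paths are placeholders, and the helper names are hypothetical.

```python
import shlex
import subprocess


def build_rclone_copy(source: str, destination: str, dry_run: bool = True) -> list[str]:
    """Build an `rclone copy` command line as an argument list.

    dry_run=True adds --dry-run, which makes rclone report what it would
    copy without transferring anything.
    """
    cmd = ["rclone", "copy", source, destination, "--progress"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run(cmd: list[str]) -> int:
    """Print and execute the command, returning rclone's exit code."""
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


# "myremote" is a placeholder for a remote set up beforehand with `rclone config`.
command = build_rclone_copy("/path/to/local/data", "myremote:backup/data")
```

Calling run(command) would perform the dry run. Rebuilding the command with dry_run=False performs the actual copy.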

How is my data retained?

  • Local data storage is retained by UMIACS staff in a number of different ways, namely:
    Snapshots: Point-in-time copies of specific file systems, easily accessible for quick restores. Taken more often than daily and retained for up to a week (see page for more details).
    NightlyBackups: Daily copies of specific file systems, sent to a backup server managed by staff. Taken daily and retained for up to 90 days.
    Archives: Final copies of specific types of data, stored on an archive server managed by staff. Taken when specific types of data are decommissioned and retained for up to 5 years (see page for more details).
  • Cloud data storage retention policies vary based on the service provider. You can often buy additional storage or storage protection for a set price per renewal period. There is also sometimes a Trash folder that stores accidentally deleted data for some period of time, often 30 days.
    For services that UMD accounts specifically provide access to, the Division of IT (DIT) documents policies here.
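The local retention windows listed above can be turned into a quick check of which mechanisms might still hold a copy taken on a given date. This is only a rough sketch: the figures are the "up to" upper bounds from the list, so actual retention may be shorter. Five years is approximated as 5 × 365 days, archives apply only to data that has been decommissioned, and the function name is hypothetical.

```python
from datetime import date, timedelta

# Upper bounds from the retention list above; "up to" means a copy may
# expire sooner. Archives only exist for decommissioned data.
MAX_RETENTION = {
    "snapshot": timedelta(days=7),
    "nightly_backup": timedelta(days=90),
    "archive": timedelta(days=5 * 365),  # approximation of "5 years"
}


def possibly_restorable(copy_taken: date, today: date) -> list[str]:
    """Return the mechanisms whose maximum window still covers a copy from copy_taken."""
    age = today - copy_taken
    return [
        name
        for name, limit in MAX_RETENTION.items()
        if timedelta(0) <= age <= limit
    ]
```

For a copy taken 36 days ago, the function no longer lists snapshots but still lists nightly backups. Staff remain the authority on whether a specific restore is actually possible.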

What are some data storage best practices?

  1. Store data that should persist indefinitely, whether local, Object Store, or cloud, in either a faculty member's storage allocation or a shared storage allocation. This reduces the likelihood that the data is lost when any one individual who has access to it has their account decommissioned due to an expiring appointment, inactivity, bad behavior, or otherwise. This is detailed further on this page, specifically as it pertains to publication, and by the Division of IT here, specifically as it pertains to Google Drive.
    • As an extension of this, be aware of who the actual owner of a file is in cloud data storage services. Files not owned by you in cloud storage can appear to be present in shared storage allocations or in your own personal allocation, but may in fact just be links to files owned by other individuals. If those individuals have their accounts decommissioned, the links to those files will no longer work, as the actual files will have been removed.
  2. Be aware of the retention policy for data in cloud data storage services before deciding to store your data there. It is much harder to negotiate restoring data with cloud storage providers than it is with UMIACS staff, so make sure you are willing to accept that risk.