Data Storage

From UMIACS
Latest revision as of 18:39, 18 November 2024

This is a landing page for all topics related to data storage that are available at UMIACS.

Where can I store my data?

There is no single answer. Different types of data are used differently and call for different storage strategies, so a one-size-fits-all choice will be sub-optimal overall.

Before choosing where to store any of your data, consider how you may need to interact with that data in the short term and in the long term, and who you might need to share it with. You can copy data between the different types of data storage listed below, but doing so may be unnecessarily cumbersome if you don't choose the right place for the "master copy" of your data.

  • Filesystem data storage is best suited for data that you are actively working on. Examples include code that you are developing or using (perhaps to run computational jobs in SLURM), results from computational jobs that have run but have not yet been published in a paper, and files that need to be processed by another desktop-based or server-based application.
    Data on UMIACS' filesystem storage can be moved between different UMIACS-supported hosts and shared with other UMIACS account holders using common methods such as cp, File Explorer, Finder, and more.
  • UMIACS' Object Store is best suited for data that is going to remain static but needs to remain accessible for very long periods of time, such as data referenced by published papers. It is also suitable for transferring specific versions of data in and out of UMIACS as large single files, such as archive (tar/zip/etc.) files.
    Data can be moved between UMIACS' Object Store and filesystem storage (whether UMIACS' or otherwise) via its built-in web interface (https://obj.umiacs.umd.edu/obj) or one of many compatible clients. Data can be shared publicly via simple download links or static websites that can visualize the data in a more accessible way. UMIACS account holders can sponsor Collaborator accounts for external collaborators who may need to upload and download new versions of data without needing direct access to UMIACS-supported hosts.
  • Cloud data storage is best suited for collaborative data, such as simple or rich text documents, PDF forms, spreadsheets, presentations, pictures, videos, and more. Many cloud storage service providers also offer web-based apps attached to their storage service that can be used to edit these types of data without ever having to download the data to a specific device.
    The methods that can be used to move data between cloud storage and filesystem storage vary based on the specific service provider, but often involve web-based or mobile app-based upload and download. Data can be shared with others across the web via accounts associated with that service provider, and often publicly via simple download links.

If you need help deciding where best to store your data, please contact staff.
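
The guidance above can be loosely summarized as a small decision helper. This is only an illustrative sketch: the function name, its parameters, and the order in which the questions are asked are assumptions made for the example, not UMIACS policy, and real data often fits more than one category.

```python
def suggest_storage(actively_worked_on: bool,
                    static_long_term: bool,
                    collaborative_documents: bool) -> str:
    """Suggest a storage type from three yes/no questions.

    Hypothetical helper: the precedence of the checks is an assumption
    made for this sketch, not a rule stated on this page.
    """
    # Documents, spreadsheets, presentations, etc. edited by several people
    # are described above as a good fit for cloud data storage.
    if collaborative_documents:
        return "cloud data storage"
    # Static data that must stay accessible for a long time (for example,
    # data referenced by a published paper) fits the Object Store.
    if static_long_term and not actively_worked_on:
        return "UMIACS' Object Store"
    # Everything else, notably data being actively worked on, defaults
    # to filesystem data storage.
    return "filesystem data storage"


if __name__ == "__main__":
    # Code and unpublished job results that are still changing.
    print(suggest_storage(True, False, False))
    # A frozen dataset referenced by a published paper.
    print(suggest_storage(False, True, False))
```

If the answers conflict or none of them clearly applies, contacting staff remains the better option than relying on a rule of thumb like this one.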

How can I transfer my data?

  • Filesystem data storage is typically best transferred using commands or programs available in the operating system on which the data is stored or from which it is most commonly accessed. A number of these commands are covered on the LocalDataTransfer page.
  • UMIACS' Object Store has a built-in web interface (https://obj.umiacs.umd.edu/obj) for transferring data in and out of it, which is usually the best option for individual files/folders or relatively small numbers of them. For bulk transfers, one of many compatible clients can be used.
  • Cloud data storage transfer methods vary based on the service provider. Most major providers offer simple upload/download functionality, which works well for individual files/folders or relatively small numbers of them. For bulk transfers, Rclone is one program that UMIACS staff often use, as it is compatible with many major providers.
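
Since the Object Store is described above as suitable for moving specific versions of data as single archive files, one way to prepare such a file is sketched below using Python's standard `tarfile` module. The function name `bundle_directory` and the example paths are hypothetical, and the sketch only builds the archive on filesystem storage; uploading it would still be done through the web interface or a compatible client.

```python
import tarfile
from pathlib import Path


def bundle_directory(source_dir: str, archive_path: str) -> list[str]:
    """Pack source_dir into a gzip-compressed tar file and list its members.

    Hypothetical helper for illustration. archive_path should be located
    outside source_dir so the archive does not try to include itself.
    """
    source = Path(source_dir)
    # Write the directory into the archive under its own name, so that
    # extracting recreates a single top-level folder.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(str(source), arcname=source.name)
    # Re-open the archive read-only to confirm what was actually stored.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    # Example paths are placeholders; substitute your own.
    for member in bundle_directory("results", "results-v1.tar.gz"):
        print(member)
```

Putting a version label in the archive name (as in the placeholder `results-v1.tar.gz`) keeps each uploaded object tied to one specific version of the data.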

How is my data retained?

  • Filesystem data storage is retained by UMIACS staff in a number of different ways, namely:
    Snapshots: Point-in-time copies of specific file systems, easily accessible for quick restores. Taken more often than daily and retained for up to a week (see page for more details).
    NightlyBackups: Daily copies of specific file systems, sent to a backup server managed by staff. Taken daily and retained for up to 90 days.
    Archives: Final copies of specific types of data, stored on an archive server managed by staff. Taken when specific types of data are decommissioned and retained for up to 5 years (see page for more details).
  • UMIACS' Object Store is not retained in any way, but decommissioned data is sent to Archives.
  • Cloud data storage retention policies vary based on the service provider. You can often buy additional storage or storage protection for a set price per renewal period. There is also sometimes a Trash folder that stores accidentally deleted data for some period of time, often 30 days.
    For services that UMD accounts specifically provide access to, the Division of IT (DIT) documents policies at https://itsupport.umd.edu/itsupport/en?id=kb_article_view&sysparm_article=KB0012739.
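
To make the filesystem retention windows above concrete, the sketch below checks which mechanisms could still hold a copy of a file, given the date on which the file was last known to be good. It treats the stated figures (up to a week for Snapshots, up to 90 days for NightlyBackups) as upper bounds only; the names used are hypothetical, and actual retention for a given file system may be shorter. Archives are left out because they are taken when data is decommissioned rather than on a schedule.

```python
from datetime import date, timedelta

# Upper bounds taken from the retention descriptions above. Real retention
# for a specific file system may be shorter than these limits.
RETENTION_UPPER_BOUNDS = {
    "Snapshots": timedelta(days=7),
    "NightlyBackups": timedelta(days=90),
}


def possible_restore_sources(last_good: date, today: date) -> list[str]:
    """List mechanisms whose retention window could still cover last_good.

    Hypothetical helper: a name appearing in the result does not guarantee
    that a copy exists, only that the window has not yet elapsed.
    """
    age = today - last_good
    return [name for name, limit in RETENTION_UPPER_BOUNDS.items()
            if age <= limit]


if __name__ == "__main__":
    today = date(2024, 11, 18)
    # Three days old: within both windows.
    print(possible_restore_sources(date(2024, 11, 15), today))
    # Thirty days old: only nightly backups could still apply.
    print(possible_restore_sources(date(2024, 10, 19), today))
```

The practical takeaway is that the longer a problem goes unnoticed, the fewer restore options remain, so it is worth contacting staff promptly after an accidental deletion.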

What are some data storage best practices?

  1. Store data that should persist indefinitely, whether on filesystem, Object Store, or cloud storage, in either a faculty member's storage allocation or a shared storage allocation. This reduces the likelihood that the data is lost when any one individual who has access to it has their account decommissioned due to an expiring appointment, inactivity, bad behavior, or otherwise. This is detailed further on the Publishing Data page, specifically as it pertains to publication, and by the Division of IT at https://itsupport.umd.edu/itsupport?id=kb_article_view&sysparm_article=KB0018225#mcetoc_1hntftl5o23, specifically as it pertains to Google Drive.
    • As an extension of this, be aware of who the actual owner of a file is in cloud data storage services. Files not owned by you in cloud storage can appear to be present in shared storage allocations or in your own personal allocation, but may in fact just be links to files owned by other individuals. If those individuals have their accounts decommissioned, the links to those files will no longer work, as the actual files will have been removed.
  2. Be aware of what the retention policy is for data in cloud data storage services before deciding to store your data there. It is much harder to negotiate restoring data with cloud storage providers than it is with UMIACS staff, so ensure you are willing to accept the risk.