LocalDataTransfer

From UMIACS
Revision as of 16:23, 25 October 2024 by Mbaney (talk | contribs)

Which command you should use depends on how much data you are transferring. The classic cp command has a number of edge cases in which it will not copy everything that is expected, so use cp only when the copy can be easily verified, such as when you are copying a single file. For larger transfers, tar (or gtar) is the better choice. After transferring files or directories, you can use rsync to check that everything was copied correctly and to perform a final sync of any changed files.

Transfer a single file

If you want to copy a single file into another directory, you can use the cp command. The format for the cp command is: cp Source Destination. Both arguments are required; if the destination is an existing directory, the file is copied into it under the same name.

  • To copy the file foo.txt into your home directory's Documents directory
cp foo.txt ~/Documents
  • To transfer the file foo.txt from your current directory to another host use scp (secure copy) with <USERNAME>@<FQDN>:<PATH>
scp foo.txt USERNAME@example1.umiacs.umd.edu:~/Documents

This command will copy the file foo.txt from your current directory to the Documents directory in your home directory on the host example1.

  • To transfer the file foo.txt back from the host example1 to a directory on your current host
scp USERNAME@example1.umiacs.umd.edu:~/Documents/foo.txt ~/Videos

Important: The cp command should only be used for small file transfers. If you try to transfer a large amount of data, cp may not copy all of the files properly. To learn more about the cp command, type man cp in the terminal.
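Since cp is recommended only when the copy can be verified, here is a minimal sketch of one way to check a single-file copy with cmp. The temporary directory and the file contents are made up for illustration; in practice you would compare your real source and destination files.

```shell
#!/bin/sh
set -eu

# Hypothetical sandbox so the example does not touch real files
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/Documents"
printf 'example contents\n' > "$work/foo.txt"

# Copy the file, then compare source and copy byte for byte
cp "$work/foo.txt" "$work/Documents/"
if cmp -s "$work/foo.txt" "$work/Documents/foo.txt"; then
    result="identical"
else
    result="different"
fi
echo "copy is $result"
```

cmp -s prints nothing and reports the outcome only through its exit status, which makes it convenient in scripts.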

Transfer a directory or large amounts of data to another location

If you want to transfer a whole directory or a large amount of data to another location you can use the gtar command.

  • To combine two files in your current directory into an archive:
    gtar -cpvf archive.tar file1 file2
  • To extract the archive into a different directory, for example Documents in your home directory:
    gtar -C ~/Documents/ -xpvf archive.tar

To copy data without creating an intermediate archive file, pipe one gtar command into another. The pipeline archives all files (including those in subdirectories) within the current directory and re-creates them in the directory that you specify.

The format for the gtar command for data transfer is:

gtar -cpf - . | gtar -C /dir -xpvf -

In this command, the archive named by -f in the first gtar is a dash '-', which stands for standard output, and the source is a period '.', which means everything in your current directory. The pipe sends that output to the second gtar, whose archive is also a dash '-'; there gtar (not the shell) interprets it as standard input. The second gtar extracts into /dir, the target directory given with -C.

  • To transfer all files from your Documents directory to a directory in your home directory called foo (foo must already exist):
cd ~/Documents
gtar -cpf - . | gtar -C ~/foo -xpvf -

Because of the -v option, the second gtar lists each file as it is extracted.

This command leaves Documents unchanged and creates a full copy of all of its files and directories in foo.

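Note: With the -p option, this command preserves the permissions of the transferred files, and modification times are kept by default. File ownership is restored only when the extraction is run as root.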
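The pipeline above can be tried safely in a throwaway directory. The following is a minimal sketch, assuming a POSIX shell; the directory and file names are hypothetical, and the script falls back to tar when gtar is not installed.

```shell
#!/bin/sh
set -eu

# Use gtar if present, otherwise tar (assumption: either accepts these options)
TAR=$(command -v gtar || command -v tar)

# Hypothetical sandbox with a nested file and a non-default permission
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/sub" "$work/dst"
echo "top" > "$work/src/a.txt"
echo "nested" > "$work/src/sub/b.txt"
chmod 640 "$work/src/a.txt"

# Same shape as: cd ~/Documents; gtar -cpf - . | gtar -C ~/foo -xpvf -
# (-v is omitted here to keep the output short)
(cd "$work/src" && "$TAR" -cpf - .) | "$TAR" -C "$work/dst" -xpf -

# Compare the permission strings of the original and the copy
src_mode=$(ls -l "$work/src/a.txt" | cut -c1-10)
dst_mode=$(ls -l "$work/dst/a.txt" | cut -c1-10)
echo "source: $src_mode  copy: $dst_mode"
```

Running the first gtar inside a subshell keeps the cd from changing the working directory of the rest of the script.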

Transfer between two different hosts

  • To transfer files from the tmp directory on example1.umiacs.umd.edu to the foo directory on your current host:
ssh USERNAME@example1.umiacs.umd.edu gtar -cpf - /tmp | gtar -C /foo -xpvf -
  • To transfer files from the foo directory on your current host to the tmp directory on example1.umiacs.umd.edu:
gtar -cpf - /foo/ | ssh USERNAME@example1.umiacs.umd.edu gtar -C /tmp -xpvf -
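One detail worth knowing about the commands above: when gtar archives an absolute path such as /tmp, it drops the leading slash and stores names like tmp/file, so extracting with -C /foo places the files under /foo/tmp. The sketch below shows the difference locally by listing archive member names with -t; it does not use ssh, and the sandbox paths are hypothetical.

```shell
#!/bin/sh
set -eu

TAR=$(command -v gtar || command -v tar)

# Hypothetical sandbox; mktemp -d is assumed to return an absolute path
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"
echo "payload" > "$work/data/file.txt"

# Archive by absolute path: the leading "/" is removed from member names
# (the warning about this goes to stderr and is discarded here)
abs_names=$("$TAR" -cpf - "$work/data" 2>/dev/null | "$TAR" -tf -)

# Archive relative to the directory with -C: names start at "./"
rel_names=$("$TAR" -C "$work/data" -cpf - . | "$TAR" -tf -)

printf 'absolute path:\n%s\n\nwith -C:\n%s\n' "$abs_names" "$rel_names"
```

If you want the contents of the remote /tmp to land directly in /foo, a variant such as ssh USERNAME@example1.umiacs.umd.edu gtar -C /tmp -cpf - . | gtar -C /foo -xpvf - should behave like the second form; treat it as a suggestion to test on a small directory first.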

If your data transfer is interrupted, you can use the rsync command described below to copy the remaining files without re-copying files that have already been transferred.

rsync can also be used for the initial transfer if you expect it to be interrupted. Otherwise, prefer gtar, because rsync's overhead costs more time and memory.

The format of the rsync command for copying files is the same as the one given below under "Verifying transfer."

Verifying transfer

To verify that your transfer copied everything, you can use the rsync command, which compares the contents of the two directories and updates any files that differ.

The format for the rsync command is:

rsync -aH /source/ /target

To ensure that the files are the same in Documents and foo from the previous example:

rsync -aH ~/Documents/ ~/foo

This command will compare the files and directories within Documents to the files and directories within foo. If there are files within Documents and its subdirectories that do not appear in foo, this command will copy the missing files from Documents to foo.
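If you would rather see what rsync intends to copy before it changes anything, the -n (dry run) and -i (itemize changes) options can be added. The sketch below simulates an interrupted transfer in a hypothetical sandbox, previews the pending work, and then completes the sync; the directory names only mirror the example above.

```shell
#!/bin/sh
set -eu

# Skip quietly where rsync is unavailable
command -v rsync >/dev/null || { echo "rsync is not installed"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/Documents/sub" "$work/foo"
echo "one" > "$work/Documents/a.txt"
echo "two" > "$work/Documents/sub/b.txt"

# Simulate a partial transfer: a.txt arrived, sub/b.txt did not
cp -p "$work/Documents/a.txt" "$work/foo/"

# Dry run: report what would be transferred without copying anything
pending=$(rsync -aHni "$work/Documents/" "$work/foo")
echo "would transfer:"
echo "$pending"

# Real run, followed by another dry run that should report nothing left
rsync -aH "$work/Documents/" "$work/foo"
remaining=$(rsync -aHni "$work/Documents/" "$work/foo")
echo "left to transfer: ${remaining:-nothing}"
```

An empty second dry run is a reasonable sign that the target matches the source by rsync's default size and modification-time check.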

Important: Make sure to include the slash after the name of the source directory. If you do not include it, rsync will copy the source directory itself into the target (for example, creating foo/Documents) rather than only its contents.
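The effect of the trailing slash is easy to confirm in a scratch directory. This is a small sketch with hypothetical directory names (with_slash and without_slash are invented for the comparison).

```shell
#!/bin/sh
set -eu

# Skip quietly where rsync is unavailable
command -v rsync >/dev/null || { echo "rsync is not installed"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/Documents" "$work/with_slash" "$work/without_slash"
echo "x" > "$work/Documents/a.txt"

# Trailing slash on the source: copy the contents of Documents
rsync -aH "$work/Documents/" "$work/with_slash"

# No trailing slash: copy the Documents directory itself
rsync -aH "$work/Documents" "$work/without_slash"

# Show where a.txt ended up in each case
(cd "$work" && find with_slash without_slash -name a.txt)
```

With the slash, a.txt sits directly in with_slash; without it, the file is one level deeper, in without_slash/Documents.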