Data Transfer

From UMIACS
Revision as of 15:10, 14 March 2022

Which command you should use depends on how much data you are transferring. The classic cp command has a number of edge cases in which it will not copy everything that is expected, so the only time one should use cp is when the copy can be verified, such as when you are moving a single file. For anything larger, tar or gtar is the better choice. After transferring files or directories, you can use rsync to check that everything moved correctly; it will also update any files that have changed.

Transfer a single file

If you want to transfer a single file into another directory you can use the cp command. The format for the cp command is: cp Source Destination. If the destination is an existing directory, cp places a copy of the file(s) inside it; if it is a new file name, cp creates the copy under that name. Both arguments are required.

  • To copy the file foo.txt from your home directory into Documents
cp foo.txt ~/Documents
  • To transfer the file foo.txt from your current directory to another host, use scp (secure copy) with a destination of the form USERNAME@FQDN:PATH
scp foo.txt USERNAME@example1.umiacs.umd.edu:~/Documents

This command will copy the file foo.txt from your current directory to the Documents folder in your home directory on the host example1.

  • To transfer the file foo.txt back from the host example1 to another directory on your current host
scp USERNAME@example1.umiacs.umd.edu:~/Documents/foo.txt ~/Videos

Important: The cp command should only be used for small file transfers. If you try to transfer a large amount of data, it is possible that cp will not copy all the files over properly. To learn more about the cp command, type man cp in the terminal.
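Since a cp copy should only be trusted when it can be verified, one simple check for a single file is a byte-for-byte comparison with cmp. The following is a minimal sketch rather than a site-specific procedure: the scratch directory created with mktemp and the file contents are assumptions made so the example can run without touching your real home directory.

```shell
# Work in a scratch directory (an assumption for this sketch) so that
# nothing in your real home directory is touched.
work=$(mktemp -d)
mkdir "$work/Documents"
printf 'example contents\n' > "$work/foo.txt"

# Copy the file into the directory; -p also keeps its mode and timestamps.
cp -p "$work/foo.txt" "$work/Documents/"

# cmp -s prints nothing and exits 0 only when both files are identical.
if cmp -s "$work/foo.txt" "$work/Documents/foo.txt"; then
    echo "copy verified"
else
    echo "copy differs from the original" >&2
fi
```

The same cmp check works after an scp transfer once both copies are reachable from one host.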

Transfer a directory or large amounts of data to another location

If you want to transfer a whole directory or a large amount of data to another location you can use the gtar command. Even though we use the gtar command to copy and transfer a directory, the more common use of gtar is to make tarballs (archives of specified directories or files).

  • To combine two files in your current directory in an archive:
gtar -cpvf archive.tar file1 file2

To extract the archive into a different directory, for example Documents in your home directory:

gtar -C ~/Documents/ -xpvf archive.tar

This command re-creates the files stored in archive.tar (including any subdirectories) in the directory that you specify with -C.
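The archive-and-extract round trip can be tried safely in a scratch directory. This is a sketch under a few assumptions: the TAR variable falls back to plain tar on systems where gtar is not installed, and the scratch directory and file contents are made up for illustration. The -v flag is dropped here to keep the output short.

```shell
# Use gtar if it exists, otherwise fall back to tar (assumption: one of
# them is GNU tar or accepts the same basic flags).
TAR=$(command -v gtar || command -v tar)

# Scratch area standing in for your home directory.
work=$(mktemp -d)
mkdir "$work/Documents"
cd "$work"
printf 'one\n' > file1
printf 'two\n' > file2

# Create the archive: -c create, -p keep permissions, -f archive name.
"$TAR" -cpf archive.tar file1 file2

# List the archive contents without extracting anything.
"$TAR" -tf archive.tar

# Extract into Documents: -C changes directory before extracting.
"$TAR" -C "$work/Documents" -xpf archive.tar
ls "$work/Documents"
```

Listing with -t before extracting is a cheap way to confirm that the archive holds what you expect.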

The format for the gtar command for data transfer is:

gtar -cpf -  . | gtar -C /dir -xpvf -

In this command, the first gtar writes its archive to a dash '-', which stands for standard output, and its source is a period '.', which means everything in your current directory. The pipe passes that output to the second gtar, which reads its archive from a dash '-' (which gtar, not the shell, interprets as standard input) and extracts it into the target directory given after -C, here /dir.

  • To transfer all files from Documents to a folder in your home directory called foo:
cd ~/Documents

gtar -cpf - . | gtar -C ~/foo -xpvf -

When you use this command it will display a list of the files it has transferred.

This command will leave Documents the same, but create a full copy of all files and folders from Documents in foo.

Note: This command will preserve the permissions, attributes, and metadata of all files transferred.
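The piped form can be exercised on a small directory tree and then checked with diff -r. This is a minimal sketch: the scratch directory, the file names, and the TAR fallback to plain tar are assumptions for illustration. The cd runs inside a subshell so your own working directory does not change.

```shell
# Use gtar if it exists, otherwise fall back to tar.
TAR=$(command -v gtar || command -v tar)

# Scratch tree standing in for ~/Documents and ~/foo.
work=$(mktemp -d)
mkdir -p "$work/Documents/notes" "$work/foo"
printf 'a\n' > "$work/Documents/a.txt"
printf 'b\n' > "$work/Documents/notes/b.txt"

# First tar writes the archive to stdout; second tar reads it from stdin
# and extracts into foo. The parentheses keep the cd local to the subshell.
(cd "$work/Documents" && "$TAR" -cpf - .) | "$TAR" -C "$work/foo" -xpf -

# diff -r exits 0 only if both trees have the same files and contents.
if diff -r "$work/Documents" "$work/foo"; then
    echo "trees match"
fi
```

diff -r compares file contents only; it does not check permissions or timestamps.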

Transfer between two different hosts

  • To transfer files from the tmp directory on example1.umiacs.umd.edu to the foo directory on your current host:
ssh USERNAME@example1.umiacs.umd.edu gtar -cpf - /tmp | gtar -C /foo -xpvf -
  • To transfer files from the foo directory on your current host to the tmp directory on example1.umiacs.umd.edu:
gtar -cpf - /foo/ | ssh USERNAME@example1.umiacs.umd.edu gtar -C /tmp -xpvf -

If your data transfer is interrupted, you can use the rsync command listed below to copy the rest of the files without duplicating files that have already been transferred.

rsync can also be used for the initial transfer of data if you expect the transfer to be interrupted. Otherwise, this method should not be used as it takes more time and memory.
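One detail of the host-to-host commands is worth trying locally first. When an absolute path such as /tmp is archived, GNU tar stores member names with the leading slash removed (tmp/...), so the files land in a tmp subdirectory of the target. Changing into the source directory and archiving '.' avoids the extra level. The sketch below illustrates this without a network connection: run_remote is a hypothetical stand-in that runs the command in a local shell, and the scratch paths are assumptions.

```shell
# Use gtar if it exists, otherwise fall back to tar.
TAR=$(command -v gtar || command -v tar)

# Scratch directories: remote_tmp plays the remote source, foo the target.
work=$(mktemp -d)
mkdir -p "$work/remote_tmp" "$work/foo"
printf 'data\n' > "$work/remote_tmp/results.txt"

# Hypothetical stand-in for the remote side: runs the command locally.
# For a real transfer the body would instead be:
#   ssh USERNAME@example1.umiacs.umd.edu "$1"
run_remote() {
    sh -c "$1"
}

# The "remote" side changes into the source directory and archives '.',
# so member names are relative and extract directly into foo.
run_remote "cd '$work/remote_tmp' && '$TAR' -cpf - ." | "$TAR" -C "$work/foo" -xpf -

ls -l "$work/foo"
```

Quoting the whole remote command as one string means the cd and the tar both run on the remote side once ssh is substituted in.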

Verifying transfer

To verify that your transfer copied everything, you can use the rsync command, which compares the contents of the two directories and updates any files that differ.

The format for the rsync command is:

rsync -aH /source/ /target

To ensure that the files are the same in Documents and foo from the previous example:

rsync -aH ~/Documents/ ~/foo

This command will compare the files and directories within Documents to the files and directories within foo. If there are files within Documents and its subdirectories that do not appear in foo, this command will copy the missing files from Documents to foo.

Important: Make sure to include the slash after the name of the source directory. If you leave it out, rsync will copy the directory itself into the target, rather than only its contents.
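Two rsync behaviours described above can be observed on a scratch tree: a dry run that reports differences without changing anything, and the effect of the trailing slash. This is a sketch with assumed directory names. The -n (--dry-run) and -i (--itemize-changes) options are standard rsync flags. By default rsync decides whether a file differs from its size and modification time; adding -c (--checksum) compares file contents instead, at the cost of reading every file.

```shell
# Scratch tree: Documents is the source, foo and bar are targets.
work=$(mktemp -d)
mkdir -p "$work/Documents" "$work/foo" "$work/bar"
printf 'a\n' > "$work/Documents/a.txt"

# Dry run: -n makes no changes, -i itemizes what would be transferred.
rsync -aHni "$work/Documents/" "$work/foo"

# Real run with the trailing slash: the contents of Documents go into foo.
rsync -aH "$work/Documents/" "$work/foo"

# Without the trailing slash the directory itself is copied, so the file
# ends up at bar/Documents/a.txt.
rsync -aH "$work/Documents" "$work/bar"

ls -R "$work/foo" "$work/bar"
```

Running the dry run again after the real transfer is a quick way to see whether rsync still considers anything out of date.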