Moving Files In Linux
The Low Security Family
Let's face it, most of us are in a rut when it comes to moving our files around. We learned how to use a simple FTP client years ago, and maybe even updated to a GUI FTP client when we were feeling particularly adventurous. There is actually a wealth of tools available for transferring files, and some of them perform automation functions that can easily assist your business in building site mirrors, synchronizing directory contents, and more. Keep in mind that for many of the tools covered here, there's only really room to skim through their features. Some, such as wget and rsync, are full of useful capabilities for those brave enough to read their man pages and experiment.

I typically try to de-emphasize low security solutions, but there are times when they're perfectly acceptable. Rare times, but they have their uses, such as on a limited network that isn't connected to the outside world, and where you're not worried about someone having installed a packet sniffer. Another time these can be useful is when you really don't care if someone is listening: for example, when setting up a mirror for a publicly accessible web or FTP site, or keeping a directory's contents synchronized with such a site.

These lower security tools include lftp, rcp, rsync, and wget. Many of these programs can handle more than just FTP connections, and some even have their own shell-like syntax for sophisticated use. Let's take a look at how to use each of these, and also at when they are most useful and when you want to hold out for a more secure solution.
lftp
The lftp tool can handle six different methods of file transfer (see the man page for the full list), including FTP and HTTP as well as the option of OpenSSL-based secure methods if they were included when the program was compiled. You can feed this tool instructions through a single command line, interactively, or even through a script file.

For an example, we'll teach lftp to grab the whole set of HOWTOs from the Linux Documentation Project's server. Since we're not psychic (or at least I'm not), we'll walk through by hand before trying to write a script or do a nice long command line, so start by typing:

    lftp www.tldp.org

As a result, you'll get the prompt:

    lftp www.tldp.org:~>

lftp isn't actually sitting there connected to the server. There's no
need to connect until you actually send a request. Now, if we type
Rather than going through the tedious step-by-step of figuring out which
directories to go to (we've all been through it with FTP), I've already
looked to see where I want to go. I'll skip straight to the directory
/pub/Linux/docs/HOWTO by typing:

    cd /pub/Linux/docs/HOWTO

Now, I don't want every HOWTO individually. I just want the .tar.bz2
file for the full HOWTO set, and after this I can have the cron program
use lftp to see if there's a newer version available, and grab it. If I
type ls *bz2, I see:

    lftp www.tldp.org:/pub/Linux/docs/HOWTO> ls *bz2
    -rw-rw-r-- 1 gferg linux 5699858 Feb 27 05:00 Linux-HOWTOs-20030227.tar.bz2
    lftp www.tldp.org:/pub/Linux/docs/HOWTO>

I want to download this file to my ~/Downloads directory, so I
type:

    lftp www.tldp.org:/pub/Linux/docs/HOWTO> lcd ~/Downloads
    lcd ok, local cwd=/home/dee/Downloads
    lftp www.tldp.org:/pub/Linux/docs/HOWTO>

Now I type the command to download the file.

This is where lftp is at its most useful: when you don't care if someone sniffs the content because you're just doing an anonymous FTP login and grabbing publicly accessible files. The moment you start needing to worry about passwords or file contents being sniffed, lftp isn't a good choice, unless you're using a version with OpenSSL compiled in and make sure to use it only for FTPS and HTTPS (secure) connections.
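The interactive walkthrough above can be collapsed into one command line, which is the form a cron job would need. What follows is only a minimal sketch: it assumes lftp's -c option (run the given commands, then exit) and a plain get for the download, and it reuses the server path and file name from the session. The run helper, the DRY_RUN variable, and the fetch_howtos function are hypothetical names introduced for illustration; by default the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default; run it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fetch the HOWTO archive into the local directory given as $1.
# The lftp commands mirror the interactive steps: remote cd, local lcd, then get.
fetch_howtos() {
    run lftp -c "open www.tldp.org; cd /pub/Linux/docs/HOWTO; lcd $1; get Linux-HOWTOs-20030227.tar.bz2"
}

fetch_howtos "$HOME/Downloads"
```

Setting DRY_RUN=0 makes the script perform the transfer instead of printing it. The file name is hard-coded here, so a real scheduled job would still need some way to notice when a newer archive appears.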
rcp
rcp is a member of the old "r" (remote) command family, which includes both rlogin and rsh. Generally speaking, it's best to avoid this group of programs like the potential security plague they are. To use any of the "r" tools, you set up a file containing a list of machines and users who can access this machine without having to log in. There's no security implemented, no tunneling to hide data passing, no passwords needed to use it.

The biggest problem with this collection of tools, and the reason I'm not going to cover how to use it, is that once someone breaks into a single account set up to use the "r" tools, they've just gained access to the accounts and machines that this account can get to without needing passwords. This is a great way to give an intruder access all over your network. Please don't.
rsync
rsync comes to us from the Samba project, at http://rsync.samba.org/. This underutilized but valuable tool is excellent for keeping Web and FTP site mirrors up to date, not to mention for keeping the contents of local directories within your network in sync. You can also use it for private "secure" purposes such as data backup, as long as you are sure to utilize rsync within an ssh connection.
rsync is a client/server application, and like FTP, you can use it for
both anonymous and login-required transfers. For the client end, you can
learn more from the rsync man page.

Say that I'm using Mandrake Linux 9.1 and want to grab the latest packages available for this version without using Mandrake Update. I first go to http://www.mandrakesecure.net/en/ftp.php and select the mirror: I'll use the one at my alma mater, Penn State (carroll.cac.psu.edu), for this example. I begin by finding out if there are any rsync servers running on this server. The command I use for this is:

    rsync carroll.cac.psu.edu::

The response is, at the time of this writing (without the PSU banner):

    Apache          Apache
    caldera         Caldera Linux distribution
    caldera-iso     Caldera Linux distribution ISO images
    collegelinux    Collegelinux Linux distribution
    cpan            Comprehensive Perl Archive Network
    ctan            Comprehensive Tex Archive Network
    cygwin          Cygwin
    debian          Debian Linux distribution
    debian-cd       Debian Linux distribution CD images
    freebsd         FreeBSD
    gentoo          Gentoo Linux distribution
    gnome           The GNOME ftp site
    gnu             GNU repository
    kde             The KDE ftp site
    kernel          Kernel.org
    mandrake        Mandrake Linux distribution
    mandrake-devel  Mandrake development tree
    mandrake-iso    Mandrake development tree ISOs
    mandrake-old    Mandrake old releases
    netbsd          NetBSD
    openbsd         OpenBSD
    opencd          OpenCD Windows Distribution
    redhat-redhat   Red Hat, Inc. -- Red Hat FTP Site, RedHat Area
    redhat-ftp      Red Hat, Inc. -- Red Hat FTP Site
    redhat-beta     Red Hat, Inc. -- Red Hat Linux beta releases
    redhat-contrib  Red Hat, Inc. -- Contrib FTP Site
    redhat-rawhide  Red Hat, Inc. -- Rawhide FTP Site
    redhat-updates  Red Hat, Inc. -- Updates FTP Site
    sgifreeware     freeware.sgi.com
    slackware       Slackware Linux distribution
    sorcerer        Sorceror Linux distribution
    splack          Splack Linux distribution
    sunfreeware     ftp ftp.sunfreeware.com
    suse            SuSE Linux distribution
    xfree86         XFree86
    ximian          Ximian GNOME
    yellowdog       YellowDog Linux distribution

Since what I'm interested in is Mandrake updates, I now type the following to find the contents of the mandrake section:

    rsync carroll.cac.psu.edu::mandrake

The results, minus the PSU banner, are:

    drwxr-xr-x        4096 2003/04/05 16:30:04 .
    drwxr-xr-x        4096 2003/03/25 07:19:02 9.0
    drwxr-sr-x        4096 2003/03/25 07:46:53 9.1
    lrwxrwxr-x           3 2003/03/25 08:30:05 current
    drwxr-xr-x        4096 2003/03/25 13:40:45 iso
    -rw-r--r--      287053 2003/04/05 05:00:03 ls-lR.gz
    drwxrwsr-x        4096 2003/03/11 12:03:39 updates

Since it's updates I'm looking for, I now try:

    rsync carroll.cac.psu.edu::mandrake/updates

This gives me the following:

    drwxrwsr-x        4096 2003/03/11 12:03:39 updates

What this tells me is that this is as deep as I can go with rsync without recursively listing all files and directories. I'll do this by adding the -r flag:

    rsync -r carroll.cac.psu.edu::mandrake/updates | more

The output is too long to list here, but what it shows me is that there are subdirectories for each Mandrake version. Using:

    rsync -r carroll.cac.psu.edu::mandrake/updates/9.1 | more

shows me that I want the RPMS subdirectory, and:

    rsync -r carroll.cac.psu.edu::mandrake/updates/9.1/RPMS | more

shows me that I've finally found the directory containing the files themselves. Ideally, I would now build a script that checked to see if I had the package installed before bothering to download an item, but for now I'm content to download everything in the updates directory that I don't already have.
To accomplish this, I'll use (this line of code may show wrapped for readability, but it's meant to be all one line):

    rsync -uv carroll.cac.psu.edu::mandrake/updates/9.1/RPMS/* /home/dee/Updates/Mandrake

The -u flag tells rsync to skip any file whose copy in the destination is already newer than the one on the server, so I only grab what I don't already have. The -v tells rsync to be verbose and show me the name of each file as it's grabbing them, rather than just showing me the server's banner and then sitting there silently while it does its work. The path at the end (/home/dee/Updates/Mandrake) tells rsync where I want the files to go. If I were using this tool in a way that needed security, I could add the flag and option:

    -e ssh

to tell rsync to tunnel through the secure shell to do its work.

rsync is a powerful, flexible tool. It can also be rather confusing, and digging around for examples on the Web is the best way I've found to get a handle on this program's many features.
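To make the -e ssh option more concrete, here is a hedged sketch of a transfer tunneled through ssh. It is not the mirror command from the walkthrough: public mirrors are reached with the double-colon daemon syntax, while an ssh transfer uses a single colon and a login on the remote machine. The remote host and path, the run helper, the DRY_RUN variable, and the sync_over_ssh function are all assumptions made for illustration, and -r is added so rsync descends into the source directory. By default the script only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default; run it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Copy from a remote directory ($1, in user@host:path/ form) into a local
# directory ($2), tunneling through ssh.
#   -r  recurse into the source directory
#   -u  skip files that are newer in the destination
#   -v  list each file as it is transferred
sync_over_ssh() {
    run rsync -ruv -e ssh "$1" "$2"
}

# Hypothetical remote path; substitute a host where ssh is already set up.
sync_over_ssh "dee@egg.example2.com:Updates/Mandrake/" "/home/dee/Updates/Mandrake"
```

The trailing slash on the source asks rsync to copy the directory's contents rather than the directory itself.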
wget
GNU's wget utility is a non-interactive download tool, meaning that it has no interactive command line to match lftp's functionality. You have access to FTP, HTTP, HTTPS, and proxied HTTP files using this program, but you have to know ahead of time what file you're trying to download, and where it lives on the server. This command expects the file's location and path in a URL format.

Say that I want to write a script that pulls data out of a Web page. I can easily grab that page using wget so I can work with its source. For example, I can download the default page for the Canadian Broadcasting Corporation (CBC) with:

    wget http://www.cbc.ca/

If there is a list of pages, files, and so on that I want to grab for the script, I can list one URL-formatted item per line within a file. For example, if the file was ~/bin/data/getme, I would use:

    wget -i ~/bin/data/getme

I could even tell wget to grab all of the URLs listed in a particular HTML file. If the default file I downloaded from the CBC was index.html.1 and it was saved in my home directory, then I would use the following to have wget grab every URL referenced in this file:

    wget -i index.html.1 -F

Notice the need to keep the flag's option with the flag: the file name has to come
directly after -i, and the command will not work if anything else, such as -F, sits between them.

wget has a number of useful features, including separate sets of options for FTP and HTTP connections. Taking the time to get more familiar with this tool is well worth the effort.
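The URL-list approach lends itself to scripting. The sketch below writes one URL per line to a temporary file and hands that file to wget -i. The make_url_list function, the run helper, the DRY_RUN variable, and the choice of a temporary file are assumptions for illustration; the two URLs are simply sites already mentioned in this article. By default the wget command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default; run it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Write each remaining argument to the file named by $1, one URL per line,
# which is the layout wget -i expects.
make_url_list() {
    list_file="$1"
    shift
    : > "$list_file"
    for url in "$@"; do
        printf '%s\n' "$url" >> "$list_file"
    done
}

list=$(mktemp)
make_url_list "$list" http://www.cbc.ca/ http://www.tldp.org/
run wget -i "$list"
rm -f "$list"
```

Keeping the list in a file of its own means the set of URLs can change without touching the script that calls wget.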
The High Security Family
While many of the tools we've already discussed have security features, there are some that are built specifically for secure use. These are the tools you'll want to develop a habit of using when transferring information you need to keep private, or even just accessing accounts whose passwords you don't want to broadcast to the world. Still, don't let these tools lull you into complacency. There is no such thing as a fully secured setup.

You'll need to have ssh configured between the two machines in question before you can use either of these tools. Setting up ssh is beyond the scope of this article, so see the ssh man page and your distribution's documentation for more information.
scp
scp is a secure version of the old rcp tool, which uses ssh to prevent people from sniffing out what you're transferring. How you utilize this tool depends on whether you're using the same account name on both machines, or different account names. Otherwise, if you have your ssh set up properly, this is a pretty straightforward program.

To copy the file sample1 to the recipient host example2 using scp, from the account bob on the local machine to the account bob on the remote machine, you would type:

    scp sample1 example2:

However, if you wanted to copy the same file from bob on the local machine to jane on the remote machine, you would need to use a user@host format, such as:

    scp sample1 jane@example2:

I highly recommend that you leave some form of password challenge in place. If you bypass all password challenges, then suddenly you've made it simple for someone to break into other machines on your network once they've broken into this one. What I do is use scp but still require that the remote machine user account's login password be entered before the copy is completed.
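The same-account versus different-account rule is easy to capture in a small script. This is only an illustrative sketch: scp_target, run, and DRY_RUN are hypothetical names, and the host, user, and file are the ones from the examples above. By default the scp commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default; run it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Build the scp destination for host $1 and optional remote user $2.
# With no user, scp logs in under the same account name as the local one.
scp_target() {
    if [ -n "${2:-}" ]; then
        printf '%s@%s:\n' "$2" "$1"
    else
        printf '%s:\n' "$1"
    fi
}

# bob on the local machine to bob on example2
run scp sample1 "$(scp_target example2)"
# bob on the local machine to jane on example2
run scp sample1 "$(scp_target example2 jane)"
```

The trailing colon matters in both cases: without it, scp treats the last argument as a local file name and makes a local copy.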
sftp
sftp is an interactive tool that works over an ssh connection, mirroring the ftp program's functionality, and is in fact a nice front end to scp. You won't be using this client for anonymous downloads, but if you need to move data or other confidential information between machines, this is an excellent tool to choose. Once you've mastered the scp command you'll find sftp simple to use, or vice versa. They share many of the same flags and mostly the same syntax.

Let's say that I want to copy some vital trade secrets from my office machine (user d.leblanc at work1.example.com) to my home machine (user dee at egg.example2.com), without someone being able to sniff out the data being passed along or any passwords along the way. (Okay, so let's not discuss why I'd want to be crazy enough to risk this in the first place.) Once I have ssh set up for a proper connection on both machines, I can open the connection using:

    sftp dee@egg.example2.com

After being challenged for the password for user "dee," which is transmitted in a tunnel through the secure shell, I'm in and have the sftp prompt. Now I can use any of the commands in the sftp man page's INTERACTIVE COMMANDS section. Such a session might look like the following:

    sftp> mkdir StealMe
    sftp> cd StealMe
    sftp> put formulas/secret/*
    Uploading formulas/secret/worldpeace to /home/dee/StealMe/worldpeace
    Uploading formulas/secret/baldcure to /home/dee/StealMe/baldcure
    Uploading formulas/secret/slimmer to /home/dee/StealMe/slimmer
    Uploading formulas/secret/senseofhumor to /home/dee/StealMe/senseofhumor
    sftp>

So think of sftp as the more advanced, interactive cousin of scp.
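The same three commands can also be fed to sftp from a file instead of being typed at the prompt. The sketch below assumes sftp's -b option (batch mode, which reads commands from a file); write_batch, run, and DRY_RUN are hypothetical names, and the temporary batch file is an illustration choice. Batch mode cannot stop to ask for a password, so it depends on non-interactive ssh authentication, which is a trade-off against the password challenge recommended earlier. By default the sftp command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default; run it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Write the commands from the interactive session into the file named by $1.
write_batch() {
    cat > "$1" <<'EOF'
mkdir StealMe
cd StealMe
put formulas/secret/*
EOF
}

batch=$(mktemp)
write_batch "$batch"
run sftp -b "$batch" dee@egg.example2.com
rm -f "$batch"
```

In batch mode sftp stops at the first command that fails, so a second run would halt at mkdir once the directory already exists.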
Wrapping Up
Linux has more ways to move data around than most people would care to keep track of. This bounty of tools doesn't need to be confusing. The main thing is to know what options you have available, and then choose the few that you want to really get to know in depth. For the rest, just keep them in the back of your mind in case you come upon a scenario where learning a new tool might be the most efficient answer to solving your file moving problems.

Dee-Ann LeBlanc is an award-winning technical author with 11 books and over seventy articles in print. Along with writing, Dee-Ann teaches, develops courses, and also consults when time allows. Learn more at http://www.Dee-AnnLeBlanc.com/.