Monday, April 23, 2007

Linux Command Line Tips

Command line skills are something you pick up over time. When something needs to be done, you work out how, and from then on you know how to do it. Few take the time to systematically learn the ins and outs of the tools at their disposal, however, and so may not be aware of all the possibilities of even the most basic utilities. In this tip we'll be looking at one of the staples of any shell toolbox: the find utility.

As the name suggests, find is a program for searching your disk for files and directories that match a given set of criteria. By default it starts in the current directory and works its way down through every subdirectory below it. Find can search on a number of different file attributes, and it can also perform actions on the results, usually running some program for each match.

Let's take a look at a few examples. First, to find all HTML files in the current directory or below, you would use:

find -name "*.html" -type f

This command has two tests. The first, "-name", checks each file's name against the pattern; the quotes stop the shell from expanding "*.html" itself before find sees it. If you need the match to be case insensitive, use "-iname" instead. The second test, "-type", specifies the kind of thing you are interested in: "f" means regular files, but we could have used "d" for directories or "l" for symbolic links, for example. A full list of tests can be found on the find manpage.
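To see the difference between these tests for yourself, here is a small sketch that builds a throwaway directory with mktemp and counts what each command matches. The file names are made up for the demo, and it assumes GNU find and coreutils as found on most Linux systems.

```shell
# Build a scratch tree; index.html and Guide.HTML are hypothetical names
demo=$(mktemp -d)
mkdir -p "$demo/site/docs"
touch "$demo/site/index.html" "$demo/site/docs/Guide.HTML"

# -name is case sensitive, so only index.html matches "*.html"
by_name=$(find "$demo" -name "*.html" -type f | wc -l)

# -iname ignores case, so Guide.HTML matches as well
by_iname=$(find "$demo" -iname "*.html" -type f | wc -l)

# -type d lists only directories: the scratch root, site and docs
dirs=$(find "$demo" -type d | wc -l)

echo "name: $by_name  iname: $by_iname  dirs: $dirs"
rm -rf "$demo"
```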

In this example we didn't specify a location, so find searched the current directory. (GNU find, the version on Linux systems, assumes "." when no location is given; other Unix versions of find require one.) You can start the search in any directory, or several directories, on your system. For example, if we know that the HTML files will be in one of two places, we can make the search quicker and more accurate by specifying those start points:

find /var/www /home/nickg/public_html -name "*.html" -type f

This now searches the Web server root, my personal HTML directory, and all of their subdirectories. With luck we get what we're looking for without stray matches such as pages in the Web browser cache or HTML help files.
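Here is a self-contained sketch of the same idea. The scratch directories www, public_html and cache are hypothetical stand-ins for /var/www, ~/public_html and a browser cache; only the first two are passed to find as start points.

```shell
# Scratch layout standing in for real locations (names are illustrative)
demo=$(mktemp -d)
mkdir -p "$demo/www" "$demo/public_html/blog" "$demo/cache"
touch "$demo/www/index.html" "$demo/public_html/blog/post.html" "$demo/cache/page.html"

# Both start points are searched recursively; cache/ is never visited
hits=$(find "$demo/www" "$demo/public_html" -name "*.html" -type f)
count=$(printf '%s\n' "$hits" | wc -l)

printf '%s\n' "$hits"
rm -rf "$demo"
```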

Find descends into all subdirectories by default, but you can limit this by specifying a maximum depth. If, in the previous example, you wanted to search only those two directories and go no further, you could add "-maxdepth 1" to the command (put it before the other tests, or GNU find prints a warning). A maximum depth of 0 means that only the starting points given on the command line are tested. Similarly, "-mindepth" sets a minimum depth, so "-mindepth 2" skips files sitting directly in the starting directories.
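The depth rules are easiest to see on a tiny tree. This sketch assumes GNU find; the directory and file names are hypothetical.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/top.txt" "$demo/a/mid.txt" "$demo/a/b/deep.txt"

# Depth 0 is the start point, depth 1 its direct contents.
# -maxdepth goes before the tests; GNU find warns if it comes after them.
shallow=$(find "$demo" -maxdepth 1 -type f | wc -l)    # top.txt only

# -maxdepth 0: only the start point itself is tested
start_only=$(find "$demo" -maxdepth 0 | wc -l)          # just $demo

# -mindepth 2 skips everything sitting directly in $demo
deeper=$(find "$demo" -mindepth 2 -type f | wc -l)      # mid.txt and deep.txt

echo "shallow: $shallow  start only: $start_only  deeper: $deeper"
rm -rf "$demo"
```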

Another use for find is to search for files belonging to a given user. To search for all files on my system that belong to me, I use:

find / -user nickg

The same can be done for groups with the "-group" test.
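Searching all of / can take a while, and as an ordinary user you will see "Permission denied" for directories you cannot read. This sketch runs the same tests against a scratch directory instead, and uses id to look up the current user and group IDs rather than assuming the name nickg. GNU find accepts numeric IDs for both tests; the file names are hypothetical.

```shell
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/report.html"

# Look up the current user and group instead of hard-coding a name;
# -user and -group also accept numeric IDs like these
me=$(id -u)
mygroup=$(id -g)

mine=$(find "$demo" -type f -user "$me" | wc -l)          # both files
ours=$(find "$demo" -type f -group "$mygroup" | wc -l)    # normally both files as well

echo "owned by me: $mine  in my group: $ours"
rm -rf "$demo"
```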

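The next category of tests relates to time. You can search for files based on when their status last changed, when they were last accessed, or when their contents were last modified, using "-ctime", "-atime" and "-mtime" respectively. Note that the "c" stands for "change", not "creation": traditional Unix filesystems don't record when a file was created, and the change time is updated whenever a file's contents, permissions or ownership change. For example, to find files changed in the past day: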

find -ctime -1

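Note the "-" before the 1: it means "less than", so this matches files changed less than one day (24 hours) ago. A "+" means "more than", and a bare number means exactly that many days; find ignores any fractional part of a day when counting. If you need more precision you can use the per-minute variants, "-cmin", "-amin" and "-mmin". If you've just made a mistake and you're not sure which files may have been affected, it's easy to narrow things down with find; this lists files modified in the last five minutes: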

find -mmin -5
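The sketch below backdates one file with GNU touch -d so the time tests have something to tell apart; the file names are hypothetical. It uses -mtime and -mmin because touch can set a file's modification time but not its change time. Touching a file counts as a status change, so both files show up under -ctime -1.

```shell
demo=$(mktemp -d)
touch "$demo/fresh.txt"
touch -d "3 days ago" "$demo/stale.txt"   # GNU touch: creates the file with mtime/atime 3 days back

# -mtime -1: modified less than 24 hours ago
recent=$(find "$demo" -type f -mtime -1 | wc -l)     # fresh.txt
# -mtime +1: modified at least two full days ago
older=$(find "$demo" -type f -mtime +1 | wc -l)      # stale.txt
# -mmin -5: modified in the last five minutes
last5=$(find "$demo" -type f -mmin -5 | wc -l)       # fresh.txt
# -ctime -1: both files had a status change just now, stale.txt included
changed=$(find "$demo" -type f -ctime -1 | wc -l)    # both

echo "recent: $recent  older: $older  last 5 min: $last5  changed: $changed"
rm -rf "$demo"
```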

By default, find's only action is to print each filename, which is why, if you've been following along, you've ended up with long lists of filenames. That is perfect if you want to feed the results into another filter, but if you need a little more information about each match, the "-ls" action gives you the same sort of output as "ls -l":

find -user nickg -iname "*.html" -ls

This shows each file's permissions, owner, group, size and modification time alongside its name.
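Here is a runnable sketch of the same command against a scratch directory. It substitutes the current user's numeric ID from id for nickg and sets a fixed mode with chmod so the permission column is predictable; both are assumptions for the demo, as are the file names.

```shell
demo=$(mktemp -d)
touch "$demo/index.html" "$demo/About.HTML"
chmod 644 "$demo/index.html" "$demo/About.HTML"   # fixed mode so the listing is predictable

# -ls prints inode, blocks, permissions, links, owner, group, size, mtime and path
listing=$(find "$demo" -user "$(id -u)" -iname "*.html" -ls)
printf '%s\n' "$listing"
rm -rf "$demo"
```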

Lastly, you can get find to run any program on each result by using the "-exec" action. The following command deletes every file ending in ".tmp" in your home directory and its subdirectories:

find ~ -name "*.tmp" -type f -exec rm {} \;

The two braces ("{}") are replaced by each matching filename, and the semicolon, escaped so that the shell passes it through, tells find where the command ends. Find is often combined with chmod to change permissions quickly across a large set of files, or with grep and sed to search or modify text using regular expressions. This is just the tip of the iceberg: by using find as the input to a script, you can automate time-consuming tasks such as cleaning up files that have not been accessed in a year, or backing up modified files. This kind of power is why find remains one of the best tools a Linux user has at their disposal.
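Here is a sandboxed sketch of the "-exec" patterns described above. It works in a scratch directory rather than your real home directory, and the file names, the chmod mode and the TODO search string are illustrative choices only.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/project/src"
touch "$demo/project/build.tmp" "$demo/project/src/cache.tmp"
printf 'TODO: tidy up\n' > "$demo/project/src/main.c"
printf 'all done\n' > "$demo/project/src/util.c"

# Delete every *.tmp file; {} becomes each path, \; ends the command
find "$demo" -name "*.tmp" -type f -exec rm {} \;
left=$(find "$demo" -name "*.tmp" | wc -l)              # 0

# Make every directory group-readable and searchable, then
# count directories still missing either bit (octal 050 = g+rx)
find "$demo" -type d -exec chmod g+rx {} \;
closed=$(find "$demo" -type d ! -perm -050 | wc -l)     # 0

# List the .c files that contain TODO (grep -l prints matching file names)
todo=$(find "$demo" -name "*.c" -type f -exec grep -l "TODO" {} \;)

echo "tmp files left: $left  closed dirs: $closed"
echo "files with TODO: $todo"
rm -rf "$demo"
```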
