The POSIX Shell and Command-Line Utilities

The POSIX shell is a descendant of the KornShell, which was itself a descendant of the Bourne shell. The basic syntax has remained the same, and Bourne shell scripts will usually run successfully in a POSIX shell. Not all of the KornShell features are included in POSIX (there are no arrays, for example), but the most important ones are; for me, those are string manipulation and arithmetic.
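To give a taste of those two features, here is a brief interactive illustration (the file name and the numbers are arbitrary examples, not taken from any script in the book):

$ file=/usr/local/bin/crafty.sh
$ echo "${file##*/}"          ## parameter expansion: strip the directory
crafty.sh
$ echo "${file%.sh}"          ## parameter expansion: strip the suffix
/usr/local/bin/crafty
$ echo $(( (3 + 4) * 12 ))    ## integer arithmetic, no external command
84

Both constructs are covered in detail later in this chapter, under "Parameter Expansion" and "Shell Arithmetic."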

The scripts in this book make extensive use of features of the POSIX shell, and keep external commands to a minimum. This chapter presents an overview of the features of the POSIX shell and the external Unix commands used in this book, without going into great detail. Further information is available in the documentation for the shell and the various commands as well as on many web pages, a number of which are listed in the Appendix. I will also explain some of the idiosyncrasies used in the scripts, and present a library of functions that are used by many scripts.

Shell Commands

These descriptions are brief overviews of the built-in commands; for a complete description, see your shell's man page. 1 echo The echo command prints its arguments separated by single spaces followed by a newline. If an unquoted variable contains characters present in $IFS (see "Parameters and Variables" later in this chapter), then the variable will be split on those characters: $ list="a b c d e f g h" $ echo $list a b c d e f g h If the variable is quoted, all internal characters will be preserved: $ echo "$list" a b c d e f g h In the early days of Unix, two different versions of echo appeared. One version converted escape sequences, such as \t and \n, into the characters they represent in the C language; \c suppressed the newline, and discarded any further characters. The other used the -n option to suppress the trailing newline and did not convert escape sequences. The POSIX standard for echo says that "Implementations shall not support any options" and "If the first operand is -n, or if any of the operands contain a backslash ( '\' ) character, the results are implementation-defined." In other words, you cannot rely on echo's behavior being one or the other. It is best not to use echo unless you know exactly what echo is going to print, and you know that it will not contain any problem characters. The preferred command is printf. 2 printf This command may be built into the shell itself, or it may be an external command. Like the C-language function on which it is based, printf takes a format operand that describes how the remaining arguments are to be printed, and any number of optional arguments. The format string may contain literal characters, escape sequences, and conversion specifiers. Escape sequences (the most common ones being \n for newline, \t for tab, and \r for carriage return) in format will be converted to their respective characters. Conversion specifiers, %s, %b, %d, %x, and %o, are replaced by the corresponding argument on the command line. Some implementations support other specifiers, but they are not used in this book. When there are more arguments than specifiers, the format string is reused until all the arguments have been consumed. The %s specifier interprets its argument as a string and prints it literally: $ printf "%s\n" "qwer\ty" 1234+5678 qwer\ty 1234+5678 The %b specifier is like %s, but converts escape sequences in the argument: $ printf "%b\n" "qwer\ty" "asdf\nghj" qwer y asdf ghj The %d, %x, and %o specifiers print their arguments as decimal, hexadecimal, and octal numbers, respectively. $ printf "%d %x %o\n" 15 15 15 15 f 17 The conversion specifiers may be preceded by flags for width specification, optionally preceded by a minus sign indicating that the conversion is to be printed flush left, instead of flush right, in the specified number of columns: $ printf "%7d:\n%7s:\n%-7s:\n" 23 Cord Auburn 23: Cord: Auburn : In a numeric field, a 0 before the width flag indicates padding with zeroes: $ printf "%07d\n" 13 0000013 3 set In the Oxford English Dictionary, the longest entry is for the word set-thirty- two pages in my Compact Edition. In the Unix shell, the set command is really three commands in one. Without any arguments, it prints the names and values of all shell variables (including functions). With one or more option arguments, it alters the shell's behavior. Any non-option arguments are placed in the positional parameters. Only three options to set are used in this book: * -v: Print shell input lines as they are read. 
* -x: Print commands and their arguments as they are executed. * -f: Disable file name generation (globbing). Given this script, which I call xx.sh: echo "Number of positional parameters: $#" echo "First parameter: ${1:-EMPTY}" shift $(( $# - 1 )) echo "Last parameter: ${1:-EMPTY}" Its output is: $ xx.sh the quick brown fox Number of positional parameters: 4 First parameter: the Last parameter: fox If set -v is added to the top of the script, and the standard output redirected to oblivion, the script itself is printed: $ xx.sh the quick brown fox >/dev/null echo "Number of positional parameters: $#" echo "First parameter: ${1:-EMPTY}" shift $(( $# - 1 )) echo "Last parameter: ${1:-EMPTY}" If set -v is replaced with set -x, variables and arithmetic expressions are replaced by their values when the lines are printed; this is a useful debugging tool: $ xx.sh the quick brown fox >/dev/null ++ echo 'Number of positional parameters: 4' ++ echo 'First parameter: the' ++ shift 3 ++ echo 'Last parameter: fox' ++ exit To demonstrate the set -f option, and the use of + to reverse the operation, I ran the following script in an empty directory: ## Create a number of files using brace expansion (bash, ksh) touch {a,b,c,d}${RANDOM}_{e,r,g,h}${RANDOM} ## Turn off filename expansion set -f printf "%-22s%-22s%-22s\n" * ; echo ## Display asterisk printf "%-22s%-22s%-22s\n" *h* ; echo ## Display "*h*" ## Turn filename expansion back on set +f printf "%-22s%-22s%-22s\n" * ; echo ## Print all filenames printf "%-22s%-22s%-22s\n" *h* ; echo ## Print filenames containing "h" When the script is run, this is the output: $ xx.sh * *h* a12603_e6243 a28923_h23375 a29140_r28413 a5760_g7221 b17774_r4121 b18259_g11343 b18881_e10656 b660_h32228 c22841_r19358 c26906_h14133 c29993_g6498 c6576_e25837 d11453_h12972 d25162_e3276 d7984_r25591 d8972_g31551 a28923_h23375 b660_h32228 c26906_h14133 d11453_h12972 You can use set to split strings into pieces by changing the value of $IFS. . For example, to split a date, which could be 2005-03-01 or 2003/09/29 or 2001.01.01, $IFS can be set to all the possible characters that could be used as separators. The shell will perform word splitting on any character contained in $IFS: $ IFS=' -/.' $ set 2005-03-01 $ printf "%s\n" "$@" 2005 03 01 When the value to be set is contained in a variable, a double dash should be used to ensure that the value of the variable is not taken to be an option: $ var="-f -x -o" $ set -- $var 4 shift The leading positional parameters are removed, and the remaining parameters are moved up. By default, one parameter is removed, but an argument may specify more: $ set 1 2 3 4 5 6 7 8 $ echo "$* ($#)" 1 2 3 4 5 6 7 8 (8) $ shift $ echo "$* ($#)" 2 3 4 5 6 7 8 (7) $ shift 3 $ echo "$* ($#)" 5 6 7 8 (4) Some shells will complain if the argument to shift is larger than the number of positional parameters. 5 type The POSIX standard says type "shall indicate how each argument would be interpreted if used as a command name." Its return status may be used to determine whether a command is available: if type stat > /dev/null 2>&1 ## discard the output then stat "$file" fi If the command is an executable file, type prints the path to the file; otherwise, it prints the type of command that will be invoked: function, alias, or shell builtin. Its output is not standard across different shells, and therefore cannot be used reliably in a shell script. 
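Since only the exit status is portable, a throwaway wrapper along these lines (a sketch, not one of the functions used later in the book) is enough to test for a command before using it:

have()
{   ## Succeed quietly if the command is available
    type "$1" > /dev/null 2>&1
}

if have stat
then
    stat "$file"
else
    ls -l "$file"    ## fall back when stat is not available
fi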
The four arguments to type in the following example represent an executable file, a function, a nonexistent command, and an alias. $ type ls pr1 greb ecoh ls is hashed (/bin/ls) pr1 is a function pr1 () { case $1 in -w) pr_w= ;; *) pr_w=-.${COLUMNS:-80} ;; esac; printf "%${pr_w}s\n" "$@" } bash: type: greb: not found ecoh is aliased to `echo' Unlike most shells, bash will print the definition of a function. 6 getopts The command getopts parses the positional parameters according to a string of acceptable options. If an option is followed by a colon, an argument is expected for that option, and will be stored in $OPTARG. This example accepts -a, -b, and -c, with -b expecting an argument: while getopts ab:c opt do case $opt in a) echo "Option -a found" ;; b) echo "Option -b found with argument $OPTARG" ;; c) echo "Option -c found" ;; *) echo "Invalid option: $opt"; exit 5 ;; esac done 7 case A workhorse among the shell's built-in commands, case allows multiple branches, and is the ideal tool, rather than grep, for determining whether a string contains a pattern or multiple patterns. The format is: case STRING in PATTERN [| PATTERN ...]) [list] ;; [PATTERN [| PATTERN ...]) [list] ;; ...] esac The PATTERN is a pathname expansion pattern, not a regular expression, and the list of commands following the first PATTERN that matches is executed. (See the "Patterns" section further on for an explanation of the two types of pattern matching.) 8 eval The command eval causes the shell to evaluate the rest of the line, then execute the result. In other words, it makes two passes at the command line. For example, given the command: eval "echo \${$#}" The first pass will generate echo ${4} (assuming that there are 4 positional parameters). This will then print the value of the last positional parameter, $4. 9 local The local command is used in functions; it takes one or more variables as arguments and makes those local to the function and its children. Though not part of the POSIX standard, it is built into many shells; bash and the ash family have it, and pdksh has it as a standard alias for typeset (which is also not included in POSIX). In KornShell 93 (generally referred to as ksh93), if a function is defined in the portable manner (as used throughout this book), there is no way to make a variable local to a function. In this book, local is used only in the few scripts that are written specifically for bash, most often for setting $IFS without having to restore it to its original value: local IFS=$NL Parameters and Variables Parameters are names used to represent information; there are three classes of parameters: Positional parameters are the command-line arguments, and are numbered beginning with $1; variables are parameters denoted by a name that contains only letters, numbers and underscores, and that begins with a letter or an underscore; and special parameters that are represented by non-alphanumeric characters. 1 Positional Parameters Positional parameters are the command-line arguments passed to a script or a function, and are numbered beginning with 1. Parameters greater then 9 must be enclosed in braces: ${12}. This is to preserve compatibility with the Bourne shell, which could only access the first nine positional parameters; $12 represents the contents of $1, followed by the number 2. The positional parameters can be assigned new values, with the set command. 
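A quick illustration of both points, using set to load a dozen arbitrary values:

$ set a b c d e f g h i j k l
$ echo "$12"
a2
$ echo "${12}"
l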
(See the example under "Special Parameters.") 2 Special Parameters The parameters $* and $@ expand to all the positional parameters, and # represents the number of positional parameters. This function demonstrates the features of these parameters: demo() { printf "Number of parameters: %d\n" $# printf " The first parameter: %s\n" "$1" printf "The second parameter: %s\n" "$2" printf "\nAll the parameters, each on a separate line:\n" printf "\t%s\n" "$@" printf "\nAll the parameters, on one line:\n" printf "\t%s\n" "$*" printf "\nEach word in the parameters on its own line:\n" printf "\t%s\n" $* } Here, the demo function is run with three arguments: $ demo The "quick brown" fox Number of parameters: 3 The first parameter: The The second parameter: quick brown All the parameters, each on a separate line: The quick brown fox All the parameters, on one line: The quick brown fox Each word in the parameters on its own line: The quick brown fox The decimal exit code of the previous command executed (0 for success, non- zero for failure) is stored in $?: $ true; echo $? 0 $ false; echo $? 1 The shell's current option flags are stored in $-; the shell's process ID is in $$; $! is the process ID of the most recently executed background command, and $0 is the name of the current shell or script: $ sleep 4 & [1] 12725 $ printf "PID: %d\nBackground command PID: %d\n" $$ $! PID: 12532 Background command PID: 12725 $ printf "Currently executing %s with options: %s\n" "$0" "$-" Currently executing bash with options: fhimBH 3 Shell Variables These are the variables that are assigned values at the command line or in a script. The system or the shell itself also set a number of variables; those used in this book are: * $HOME: The path name of user's home directory (e.g., /home/chris). * $IFS: A list of characters used as internal field separators for word splitting by the shell. The default characters are space, tab, and newline. Strings of characters can be broken up by changing the value of $IFS: $ IFS=-; date=2005-04-11; printf "%s\n" $date 2005 04 11 * $PATH: This colon-separated list of directories tells the shell which directories to search for a command. To execute a command in other directories, including the current working directory, an explicit path must be given (/home/chris/demo_script or ./demo_script, not just demo_script). * $PWD: This is set by the shell to the pathname of the current working directory: $ cd $HOME && echo $PWD /home/chris $ cd "$puzzles" && echo $PWD /data/cryptics 2 standard-vars-A Collection of Useful Variables My standard-vars file begins with these lines: NL=' ' CR=' ' TAB=' ' You might be able to guess that these three variables represent newline, carriage return, and tab, but it's not clear, and cannot be cut and pasted from a web site or newsgroup posting. Once those variables are successfully assigned, however, they can be used, without ambiguity, to represent those characters. The standard-vars file is read by the shell and executed in the current environment (known as sourcing, it is described later in the chapter) in most of my shell scripts, usually via standard-funcs, which appears later in this chapter. I created the file with the following script, then added other variables as I found them useful: printf "%b\n" \ "NL=\"\n\"" \ "CR=\"\r\"" \ "TAB=\"\t\"" \ "ESC=\"\e\"" \ "SPC=\"\040\" \ "export NL CR TAB ESC SPC" > $HOME/scripts/standard-vars-sh The -sh extension is part of the system I use for working on scripts without contaminating their production versions. 
It is explained in Chapter 20. Patterns Two types of patterns are used in shell scripts: pathname expansion and regular expressions. Pathname expansion is also known as globbing, and is done by the shell; regular expressions are more powerful (and much more complicated), and are used in external commands such as sed, awk, and grep. 1 Pathname Expansion Three special characters tell the shell to interpret an unquoted string as a pattern: *: Matches any string, including an empty one. By itself, an asterisk matches all files in the current directory, except those that begin with a dot. ?: Matches any single character. By itself, a question mark matches all files in the current directory whose name is a single character, other than a dot. [: When matched with a closing bracket, ], matches any of the characters enclosed. These may be individual characters, a range of characters, or a mixture of the two. These patterns can be combined to form complex patterns for matching strings in case statements, and for building lists of files. Here are a few examples executed in a directory containing these files: a b c d ee ef eg eh fe ff fg fh ge gf gg gh he hf hg hh i_158_d i_261_e i_502_f i_532_b i_661_c i_846_g i_942_a j_114_b j_155_f j_248_e j_326_d j_655_c j_723_g j_925_a k_182_a k_271_c k_286_e k_292_f k_294_g To display all files with single-character names: $ echo ? a b c d The next example prints all files whose names end with f: $ echo *f ef ff gf hf i_502_f j_155_f k_292_f All files containing a number whose first digit is in the range 3 to 6 can be shown with: $ echo *_[3-6]* i_502_f i_532_b i_661_c j_326_d j_655_c 2 Regular Expressions When I started writing shell scripts, I had problems with grep. I used the asterisk as a wildcard, expecting it to match any string. Most of the time, all was well, but occasionally grep would print a line I didn't want. For instance, when I wanted lines that contained call, I might get calculate as well, because I used 'call*' as the search pattern. At some point, it dawned on me that the patterns used by grep were not the wildcards I had been using for years to match files, but regular expressions, in which * stood for "zero or more occurrences of the preceding character or range of characters". To match any string, the pattern is .*, as the period matches any character, and the combination matches "zero or more occurrences of any character." As with pathname expansion, [...] matches any of the characters enclosed in the brackets. To match non-empty lines, search for any single character; that is, a dot: $ printf "%s\n" January February March "" May June July | grep . January February March May June July To print lines containing a b or a c, brackets are used: $ printf "%s\n" January February March " " May June July | grep '[bc]' February March In addition, the caret, ^, matches the expression only at the beginning of a line, and the dollar sign, $, matches only at the end of a line. Combining the two, ^...$, matches only the entire line. By anchoring the match to the beginning of the line, we can match lines with a as the second letter (the first letter can be anything): $ printf "%s\n" January February March " " May June July | grep '^.a' January March May Using both the caret and the dollar sign, we can match lines beginning with J and ending with y: $ printf "%s\n" January February March " " May June July | grep '^J.*y' January July There are various flavors of regular expressions, including basic (BREs) and extended (EREs). 
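For example, the question mark is an ordinary character in a BRE but means "zero or one of the preceding item" in an ERE; grep uses BREs, and grep -E uses EREs (the sample words are arbitrary):

$ printf "%s\n" color colour | grep -E 'colou?r'   ## ERE: the u is optional
color
colour
$ printf "%s\n" color colour | grep 'colou?r'      ## BRE: looks for a literal "u?", so nothing is printed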
The Perl language has its own set (which has been incorporated into Python), but the basics are common to all versions. Regular expressions can be very complex (the example in the "Notes" to the printat function in Chapter 12 is daunting at first glance, but actually fairly simple), and are sometimes described as "write only"; once a regex (or regexp, the common abbreviations for regular expression) is written, it can be very hard to read it and understand how it works. A.M. Kuchling put it well in his Regular Expression HOWTO[1] (replace Python with whatever language you are using): "There are also tasks that can be done with regular expressions, but the expressions turn out to be very complicated. In these cases, you may be better off writing Python code to do the processing; while Python code will be slower than an elaborate regular expression, it will also probably be more understandable." If you want to delve deeper into regular expressions, the classic book from O'Reilly, sed & awk, has a very good section, and they are covered comprehensively in the Apress book, Regular Expression Recipes: A Problem- Solution Approach . There are also some links in the Appendix to online resources. In this book, you will find very few regular expressions, and none that cannot be easily understood. Parameter Expansion At its most basic, parameter expansion substitutes the value of the variable when it is preceded by a dollar sign ($). The variable may be enclosed in braces (${var}), and if the variable is a positional parameter greater than 9, the braces must be used. You can use three other forms of expansion within the braces: Bourne, POSIX, and shell specific. The original Bourne shell parameter expansions tested whether the variable was set or empty, and acted on the results of that test. The KornShell added expansions to return the length of the variable's contents, and to remove the beginning or end of the value if it matched a pattern; these have been incorporated into the POSIX standard. Korn Shell 93 (ksh93) added the search-and- replace and substring capabilities that have also been included in bash. 1 The Bourne Shell Expansions The original Bourne shell expansions have two forms. With a colon, they test whether a variable is null or unset; without the colon, the test is only whether the variable is unset. 1 ${var:-DEFAULT} If $var is unset or null, the expression expands to DEFAULT; otherwise, it expands to the contents of the variable: $ var= $ echo ${var:-y} y $ var=x $ echo ${var:-y} x Without the colon, the variable must be unset, not just null, for DEFAULT to be used (the result of the variable expansion is surrounded by slashes): $ var= $ echo /${var-y}/ // $ unset var $ echo /${var-y}//y/ 2 ${var:=DEFAULT} The only difference between this and the previous expansion, is that this also assigns a value to var: $ var= $ echo "${var:=q}" q $ echo "${var:=z}" q 3 ${var:+VALUE} This expansion (which was not in the very first Bourne shell) is the opposite of the previous two. If var is not null (or, without the colon, if it is set), VALUE is used. In the first example, var is unset, so the variable expands to an empty string, with or without the colon: $ unset var $ echo /${var:+X}/ // $ echo /${var+X}/ // In the next example, var is set but null. With the colon, the test is for a non-null string, so X is not printed. Without it, X is printed, because the test is for whether the variable is set. 
$ var= $ echo /${var:+X}/ // $ echo /${var+X}//X/ Finally, when the variable is set and not null, VALUE is used, with or without the colon: $ var=A $ echo /${var:+X}//X/ $ echo /${var+X}//X/ A common use for this type of expansion is when building a list in which a separator character is wanted between items. If we just used concatenation, we'd end up with the separator at the beginning where it is not wanted: $ for f in a b c d e > do > list=$list,$f >done $ echo $list ,a,b,c,d,e With this expansion, we can insert the separator only if $list is not empty: list=${list:+$list,}$f This is equivalent to: if [ -n "$list" ] then list=$list,$f else list=$f fi Using this expansion in place of the simple variable in the preceding example, there is no initial comma: $ for f in a b c d e > do > list=${list:+$list,},$f >done $ echo $list a,b,c,d,e 4 ${var:?MESSAGE} If var is unset (or, with the colon, null), an error or MESSAGE will be printed. If the shell is not interactive (as in the case of a script), it will exit. $ unset var $ echo ${var?} bash: var: parameter null or not set $ echo ${1?No value supplied} bash: 1: No value supplied 2 POSIX Parameter Expansions The expansions introduced by ksh, and adopted by POSIX, perform string manipulations that were once the province of the expr command. In these expansions, PATTERN is a file-globbing pattern, not a regular expression. 1 ${#var}-Length of Variable's Contents This expansion returns the length of the expanded value of the variable: $ var=LENGTH $ echo ${#var} 6 2 ${var%PATTERN}-Remove the Shortest Match from the End The variable is expanded, and the shortest string that matches PATTERN is removed from the end of the expanded value: $ var=usr/local/bin/crafty $ echo "${var%/*}" usr/local/bin 3 ${var%%PATTERN}-Remove the Longest Match from the End The variable is expanded, and the longest string that matches PATTERN from the end of the expanded value is removed: $ var=usr/local/bin/crafty $ echo "${var%%/*}" usr 4 ${var#PATTERN}-Remove the Shortest Match from the Beginning The variable is expanded, and the shortest string that matches PATTERN is removed from the beginning of the expanded value: $ var=usr/local/bin/crafty $ echo "${var#*/}" local/bin/crafty 5 ${var##PATTERN}-Remove the Longest Match from the Beginning The variable is expanded, and the longest string that matches PATTERN is removed from the beginning of the expanded value: $ var=usr/local/bin/crafty $ echo "${var##*/}" crafty 6 Combining Expansions The result of one expansion can be used as the PATTERN in another expansion to get, for example, the first or last character of a string: $ var=abcdef $ echo ${var%${var#?}} a $ echo ${var#${var%?}} f 3 Shell-Specific Expansions, bash2, and ksh93 I use two shell-specific parameter expansions in this book, either in the bash/ksh93 versions of functions (for example, substr in Chapter 3), or in bash- only scripts. 1 ${var//PATTERN/STRING}-Replace All Instances of PATTERN with STRING Because the question mark matches any single character, this example converts all the characters to tildes to use as an underline: $ var="Chapter 1" $ printf "%s\n" "$var" "${var//?/~}" Chapter 1 ~~~~~~~~~ This expansion can also be used with a single slash, which means to replace only the first instance of PATTERN. 2 ${var:OFFSET:LENGTH}-Return a Substring of $var A substring of $var starting at OFFSET is returned. If LENGTH is specified, that number of characters is substituted; otherwise, the rest of the string is returned. 
The first character is at offset 0: $ var=abcdefgh $ echo "${var:3:2}" de $ echo "${var:3}" defgh Shell Arithmetic In the Bourne shell, all arithmetic had to be done by an external command. For integer arithmetic, this was usually expr. The KornShell incorporated integer arithmetic into the shell itself, and it has been incorporated into the POSIX standard. The form is $(( expression )), and the standard arithmetic operators are supported: +, -, *, /, and %, for addition, subtraction, multiplication, division, and modulus (or remainder). There are other operators, but they are not used in this book; your shell's documentation will have all the details. The standard order of operator precedence that we remember from high school algebra applies here; multiplication and division are performed before addition and subtraction, unless the latter are grouped by parentheses: $ a=3 $ echo $(( $a + 4 * 12 )) 51 $ echo $(( ($a + 4) * 12 )) 84 The POSIX specification allows variables in arithmetic expressions to be used without a leading dollar sign, like this: echo $(( a + 4 )) instead of echo $(( $a + 4 )). This was not clear from early versions of the standard, and a major group of otherwise POSIX-compliant shells (ash, dash, and sh on BSD systems) did not implement it. In order for the scripts in this book to work in those shells, the dollar sign is always used. Aliases Aliases are the simple replacement of a typed command with another. In a POSIX shell, they can only take arguments after the command. Their use in scripts and on the command line) can be replaced entirely by functions; there are no aliases in this book. Sourcing a File When a script is executed, it can obtain the values of variables that have been placed in the environment with export, but any changes it makes to those or other variables will not be visible to the script that called it. Functions defined or changes to the current directory also will not affect the calling environment. For these to affect the calling environment, the script must be sourced. By using the dot command, the file is executed in the current shell's environment: . filename This technique is used throughout the book, most often to define a library of functions. Functions Functions group one or more commands under a single name. Functions are called in the same way as any other command, complete with arguments. They differ from ordinary commands in that they are executed in the current shell environment. This means they can see all the variables of the calling shell; they do not need to be exported. Variables set or changed in a function are also set or changed in the calling shell. And, most of all, a function that does all its work with shell commands and syntax is faster than an external command. 1 Functions Are Fast In Chapter 6, the basename and dirname functions replace the external commands of the same name, and do the job in a fraction of the time. Even a function more than 70 lines long can execute much faster than an external command. In Chapter 5, the _fpmul function is faster than the calc function, which uses awk, unless there are dozens of operands. Under normal circumstances, I wouldn't think of writing a shell function for floating-point multiplication; I'd let awk do it. I wrote _fpmul as a challenge, just to show that it could be done. Now that it's done, and it has proved to be faster than other methods, I do use it in scripts. A single line is all that's needed to make the function available: . 
math-funcs Other operations on decimal fractions are more complicated, and therefore aren't worth writing unless there's a specific need to do so. 2 Command Substitution Is Slow When I discovered that using command substitution to store the results of a function in a variable was so slow (in all shells except ksh93) that it severely reduced the advantage of using functions, I started looking for ways to mitigate the phenomenon. For a while I tried using a variable to tell a function whether to print the result: [ ${SILENT_FUNCS:-0} = 1 ] || echo "${_FPMUL}" This worked, but I found it ugly and cumbersome; when I didn't want a function to print anything, I had to set SILENT_FUNCS to 1 usually by preceding the call with SILENT_FUNCS=1. Occasionally, I could set it at the beginning of a section of code and have it in force for all subsequent function calls. I was well into writing this book when the solution occurred to me, and I had to backtrack and rewrite parts of earlier chapters to incorporate it. Whenever a function returns a value (other than an exit status), I now write two functions. One has the expected behavior of printing the result; the other, which begins with an underscore, sets a variable that is the function's name (including the underscore) converted to uppercase. To illustrate, here is a pair of functions to multiply two integers: _mul() { _MUL=$(( "$1" * "$2" )) } mul() { _mul "$@" && printf "%s\n" "$_MUL" } I can now print the result of the multiplication with $ mul 12 13 156 Or, I can store the result in a variable with $ _mul 12 13 $ product=$_MUL The extra few milliseconds it takes to use command substitution... $ time mul 123 456 56088 Real: 0.000 User: 0.000 System: 0.000 $ time { q=$(mul 123 456); } Real: 0.005 User: 0.001 System: 0.003 ...may not seem significant, but scripts often loop hundreds or even thousands of times, and may perform several such substitutions inside a loop. The result is a sluggish program. 3 Using the Functions in This Book I use functions in three ways: at the command line, as commands in scripts, and as a reference. For use at the command line, I source some of the function libraries in my shell startup file; others I load at the command line when I need them. In scripts, I usually source a single function library, and it will load any other libraries it needs. At other times, I use the function library as a reference, and I copy the code, sometimes modifying it, into the body of another script or function. The functions in this book are mostly stored in libraries of related functions. You may find a different structure more suited to your coding style. If so, go ahead and reorganize them. I would recommend that you avoid having the same function in more than one library, unless the subsequent versions offer additional features that are not usually needed. The first library in this book is a collection of functions that I use in many scripts. All are used in at least one script in the book. standard-funcs: A Collection of Useful Commands The functions in this library encapsulate commonly used tasks in a consistent interface that makes using them easy. When I need to get a keystroke from the user, I call get_key; when I want to display a date, I use show_date; when I want to exit from a script because something has failed, I use die. With menu1, I can display anything from a one-line to full-screen menu and execute a command based on the user's response. 
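A typical script that uses the library might begin like this (only a sketch; it assumes standard-funcs is somewhere the dot command can find it, via $PATH or a full path):

#!/bin/sh
. standard-funcs    ## Load the library into the current environment

## Its functions are then called like any other command:
checkdirs "$HOME/scripts" || die 13 "Could not create $HOME/scripts"
show_date "2005-03-01"    ## prints: 1 Mar 2005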
After defining the functions, the library loads standard_vars shown earlier in this chapter: . standard-vars 1 1.1 get_key-Get a Single Keystroke from the User In some circumstances, such as when asking a user to select from a menu, only a single key needs to be pressed. The shell read command requires a newline before it exits. Is there a way to read a single key? 1 How It Works The bash read command has an option, -n, to read only a specified number of characters, but that is lacking in most other shells. The portable way uses stty to turn off the terminal's buffering, and to set the minimum number of characters required for a complete read. A single character can then be read by dd. 1 Usage get_key [VAR] If a variable is specified, the key will be read into that variable. If not, it will be in $_KEY. 2 The Script get_key() { [ -t 0 ] && { ## Check whether input is coming from a terminal [ -z "$_STTY" ] && { _STTY=$(stty -g) ## Store the current settings for later restoration } ## By default, the minimum number of keys that needs to be entered is 1 ## This can be changed by setting the dd_min variable ## If the TMOUT variable is set greater than 0, the time-out is set to ## $TMOUT seconds if [ ${TMOUT:--1} -ge 0 ] then _TMOUT=$TMOUT stty -echo -icanon time $(( $_TMOUT * 10 )) min ${dd_min:-1} else stty -echo -icanon min ${dd_min:-1} fi } ## Read a key from the keyboard, using dd with a block size (bs) of 1. ## A period is appended, or command substitution will swallow a newline _KEY=$(dd bs=1 count=1 2>/dev/null; echo .) _KEY=${_KEY%?} ## Remove the period ## If a variable has been given on the command line, assign the result to it [ -n "$1" ] && ## Due to quoting, either ' or " needs special treatment; I chose ' case $_KEY in "'") eval "$1=\"'\"" ;; *) eval "$1='$_KEY'" ;; esac [ -t 0 ] && stty "$_STTY" ## reset terminal [ -n "$_KEY" ] ## Succeed if a key has been entered (not timed out) } 2 Notes The get_key function is often redefined in other scripts to allow entry of cursor and function keys-and even mouse clicks. For an example, which is too long to include in this book, see the mouse_demo script on my web site.[2] 2 1.2 getline-Prompt User to Enter a Line For interactive scripts, I like the editing capabilities that bash's readline library offers, but I still want the script to work in other POSIX shells. I want the best of both worlds! 1 How It Works The getline function checks for the existence of the $BASH_VERSION variable, and uses the readline library if it is set. If not, a POSIX read command is used. 1 Usage getline "PROMPT" [VAR] If no VAR is given, the line is read into the _GETLINE variable. If the variable name used is password, the keystrokes will not be echoed to the terminal. 2 The Script _getline() { ## Check that the parameter given is a valid variable name case $2 in [!a-zA-Z_]* | *[!a-zA-Z0-9_]* ) die 2 "Invalid variable name: $2" ;; *) var=${2:-_GETLINE} ;; esac ## If the variable name is "password" do not turn on echoing [ -t 0 ] && [ "$2" != "password" ] && stty echo case ${BASH_VERSION%%.*} in [2-9]|[1-9][0-9]) read ${TMOUT:+-t$TMOUT} -ep "$1: " -- $var ;; *) printf "%s: " "$1" >&2 IFS= read -r $var ;; esac [ -t 0 ] && stty -echo ## turn echoing back off } 3 1.3 press_any_key-Prompt for a Single Keypress Despite the now-trite comment, "But my keyboard doesn't have an ANY key," it is often desirable to pause execution of a script until the user presses a key with the message "PRESS ANY KEY TO CONTINUE." 
1 How It Works The get_key function (shown two functions previously) provides the mechanism to read a single keypress, and printf and carriage returns display and erase the message. 1 Usage press_any_key At one time, this script accepted an argument: the name of a variable in which to store the key. I never used it, so I removed it. If you do want the keystroke, you can get it from the $_KEY variable set by get_key. 2 The Script press_any_key() { printf "\r " ## Display the message get_key ## Get the keystroke printf "\r \r" ## Erase the message } 4 1.4 menu1-Print a Menu and Execute a Selected Command Over the years, I have written many menu scripts, but when I needed a simple, easily adaptable menu, none fit the bill. All I wanted was a simple script that could display a menu, and execute a command based on the user's selection. 1 How It Works The menu1 (so called to prevent conflict with an existing menu program) function displays the $menu variable, which should be set by the calling script. This can be anything from a one-line menu to a full screen. The user selects by number, and the corresponding argument to menu1 is executed with eval. A 0, q, or a newline exits the menu. 1 Usage menu="MENU" menu1 CMD1 CMD2 ... Two optional variables control the behavior of the function. If $_MENU1 is not null, the function will exit after a single successful loop. If $pause_after is not null, the script will pause, and the user will be prompted to press_any_key. Here is a simple session; the characters in bold are the user's input, which do not actually appear when the script is run. $ menu=" 1. Yes 2. No ?" $ menu1 "echo YES" "echo NO" 1. Yes 2. No ? 3 bash: Invalid entry: 3 1. Yes 2. No ? 2 NO 1. Yes 2. No ? 1 YES 1. Yes 2. No ? q This is a slightly fancier version, with a two-line menu, that exits after one command: $ menu="$NL Do you want to see today's date?$NL 1. Yes 2. No ?" $ menu1 "date" ":" Do you want to see today's date? 1. Yes 2. No ? y bash: Invalid entry: y Do you want to see today's date? 1. Yes 2. No ? Mon Feb 7 08:55:26 EST 2005 For more elaborate menus using this function, see the conversion script in Chapter 5. 2 The Script menu1() { m1_items=$# ## Number of commands (i.e., arguments) while : ## Endless loop do printf "%s " "$menu" ## Display menu get_key Q ## Get user's choice case $Q in 0|q|""|"$NL" ) printf "\n"; break ;; ## Break out [1-$m1_items]) ## Valid number printf "\n" ( eval "\$$Q" ) ## Execute command case $pause_after in ## Pause if set *?) press_any_key ;; esac ;; *) printf "\n\t\a%sInvalid entry: %s\n" "${progname:+$progname: }" "$Q" continue ;; esac [ -n "$_MENU1" ] && break ## If set, exit after one successful selection done } 5 1.5 arg-Prompt for Required Argument If None Supplied I have a number of scripts that take an argument. I want a simple method to check whether an argument is present, and prompt the user for one if there isn't. 1 How It Works The arguments to the script are passed using "$@" to arg; if there is an argument, it is stored in $_arg; if not, the user is prompted (with the value of $prompt) to enter an appropriate value. 1 Usage arg "$@" The best illustration of this function comes from Chapter 5, where the conversion functions all use it. Here, for example, is the script to convert from ounces to grams (the actual calculation is done by _oz2g): oz2g() { units=grams prompt=Ounces arg "$@" _oz2g $arg printf "%s%s\n" "$_OZ2G" "${units:+ $units}" } When run without an argument, arg prompts for input. 
If the $units variable has been set by the calling script, it is printed: $ oz2g Ounces? 23 652.05 grams And with one, there is no prompt, and no units are printed because arg empties $units when there is an argument: $ oz2g 2 56.7 2 The Script arg() { case $1 in "") printf "%s? " "$prompt" >&2 ## Prompt for input stty echo ## Display characters entered read arg < /dev/tty ## Get user's input ;; *) arg="$*" ## Use existing arguments units= ## For use with the conversion script, Chapter 5 ;; esac } 6 1.6 die-Print Error Message and Exit with Error Status When a fatal error occurs, the usual action is to print a message and exit with an error status. I would like to make this exit routine uniform across all my scripts. 1 How It Works The die function takes an error number and a message as arguments. The first argument is stored in $result, and the rest are printed to the standard error. 1 Usage die NUM [MESSAGE ...] This function is usually used when an important command fails. For example, if a script needs certain directories, it may use checkdirs (introduced later in this chapter) and call die when one cannot be created: checkdirs "$HOME/scripts" "$HOME/bin" || die 13 "Could not create $dir" 2 The Script die() { result=$1 shift printf "%s\n" "$*" >&2 exit $result } 7 1.7 show_date-Display Date in D[D] MMM YYYY Format Much of the time, the display_date function from the date-funcs library in Chapter 8 is overkill, and all I want is a simple function to convert a date, usually in ISO format, into a more human-readable form. 1 How It Works By expanding the IFS variable to include the hyphen, period, and slash, as well as a space, most popular date formats can be split into their separate components. By default, the elements are expected in year-month-day order, but, by setting the DATE_FORMAT variable to dmy or mdy, the other common formats can also be accommodated. 1 Usage _show_date YYYY-MM-DD ## Result is stored in $_SHOW_DATE show_date YYYY-MM-DD ## Result is printed So long as you remember that the year 05 is two millennia in the past, you should have little trouble with this function. Here are a few examples: $ show_date 2005.04.01 1 Apr 2005 $ DATE_FORMAT=dmy show_date 04-01-2005 4 Jan 2005 $ DATE_FORMAT=mdy show_date 04/01/2005 1 Apr 2005 2 The Script _show_date() { oldIFS=$IFS ## Save old value of IFS IFS=" -./" ## Allow splitting on hyphen, period, slash and space set -- $* ## Re-split arguments IFS=$oldIFS ## Restore old IFS ## If there are less than 3 arguments, use today's date [ $# -ne 3 ] && { date_vars ## Populate date variables (see the next function) _SHOW_DATE="$_DAY $MonthAbbrev $YEAR" return } case $DATE_FORMAT in dmy) _y=$3 ## day-month-year format _m=${2#0} _d=${1#0} ;; mdy) _y=$3 ## month-day-year format _m=${1#0} _d=${2#0} ;; *) _y=$1 ## most sensible format _m=${2#0} _d=${3#0} ;; esac ## Translate number of month into abbreviated name set Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec eval _m=\${$_m} _SHOW_DATE="$_d $_m $_y" } show_date() { _show_date "$@" && printf "%s\n" "$_SHOW_DATE" } 8 1.8 date_vars-Set Date and Time Variables I need to set variables with elements of the current date and time. 
It's possible to do it like this, but it is very inefficient, and if it's run at the wrong instant (such as just before midnight), the results would be incorrect if the date changed between setting, for example, DAY and HOUR, because $HOUR and $MINUTE would refer to the day after $DAY: YEAR=$(date +%Y) MONTH=$(date +%m) DAY=$(date +%d) HOUR=$(date +%H) MINUTE=$(date +%M) SECOND=$(date +%S) 1 How It Works Combining the shell command, eval, and the date utility's format string, it can all be done in a single command; date is called with a format string that produces shell code to set all the variables, and eval executes it. 1 Usage date_vars [DATE OPTIONS] The output of the date command in the date_vars function looks like this: DATE=2005-02-05 YEAR=2005 MONTH=02 DAY=05 TIME=22:26:04 HOUR=22 MINUTE=26 SECOND=04 datestamp=2005-02-05_22.26.04 DayOfWeek=Sat DayOfYear=036 DayNum=6 MonthAbbrev=Feb This is interpreted as a shell command by eval to set the variables. While date_vars is usually called without any arguments, it can be used with whatever options your date utility provides. For example, if you have GNU date, you can use the -d option to use a date other than the current one: date_vars -d yesterday 2 The Script date_vars() { eval $(date "$@" "+DATE=%Y-%m-%d YEAR=%Y MONTH=%m DAY=%d TIME=%H:%M:%S HOUR=%H MINUTE=%M SECOND=%S datestamp=%Y-%m-%d_%H.%M.%S DayOfWeek=%a DayOfYear=%j DayNum=%w MonthAbbrev=%b") ## Remove leading zeroes for use in arithmetic expressions _MONTH=${MONTH#0} _DAY=${DAY#0} _HOUR=${HOUR#0} _MINUTE=${MINUTE#0} _SECOND=${SECOND#0} ## Sometimes the variable, TODAY, is more appropriate in the context of a ## particular script, so it is created as a synonym for $DATE TODAY=$DATE export DATE YEAR MONTH DAY TODAY TIME HOUR MINUTE SECOND export datestamp MonthAbbrev DayOfWeek DayNum } 9 1.9 is_num-Is This a Positive Integer? When a script needs an integer, and the method of input is not under the scripter's control (as when a user is asked for input), the value should be verified, and rejected if it is not what is required. 1 How It Works This function itself is limited to verifying positive integers, and checks for any non-digit characters. Negative integers can be accepted by calling the function without a leading negative sign on the argument. 1 Usage is_num INT The verification is registered entirely by the function's return code: $ is_num 33 && echo OK || echo NO GO OK $ var=-33 $ is_num "$var" && echo OK || echo NO GO NO GO A negative integer may be allowed by the caller by stripping any leading minus sign: $ is_num "${var#-}" && echo OK || echo NO GO OK 2 The Script is_num() { case $1 in *[!0-9]*) return 5 ;; ## Fail is any character is not a digit from 0 to 9 esac } 2 Notes I once used a more elaborate version that would accept negative integers as well as a verbose option, -v, which would print the results. Here it is; use whichever you prefer. is_num() { case ${1#-} in -v) isnum_verbose=1 shift ;; *) isnum_verbose=0 ;; esac case $1 in *[!0-9]*) case $isnum_verbose in 1) printf "Not a number: %s\n" $1 >&2 ;; esac return 5 ;; esac } 10 1.10 abbrev_num-Abbreviate Large Numbers When printing numbers in a confined space, such as a listing of mailbox contents in Chapter 10, printing them in a shortened form allows more room for other matter. I would like to convert 123456 to 123K, and 456789123 to 456M. 1 How It Works The suffix, K, M, or G, is determined by the length of the number, and the number is divided by 1000, 1000000, or 1000000000 as required. 
The result is always a maximum of four characters. 1 Usage _abbrev_num NUM ## Result is stored in $_ABBREV_NUM abbrev_num NUM ## Result is printed Four-digit numbers are not converted, as they are no longer than the maximum: $ abbrev_num 4321 4321 Longer number are converted (and rounded to the nearest unit): $ abbrev_num 123456 123K $ abbrev_num 234567890 235M 2 The Script The numbers are rounded up by adding 500, 500000, or 500000000 before dividing. _abbrev_num() { case ${#1} in 1|2|3|4) _ABBREV_NUM=$1 ;; 5|6) _ABBREV_NUM=$(( ($1 + 500) / 1000 ))K ;; 7|8|9) _ABBREV_NUM=$(( ($1 + 500000) / 1000000 ))M ;; 10|11|12) _ABBREV_NUM=$(( ($1 + 500000000) / 1000000000 ))G ;; *) _ABBREV_NUM="HUGE" ;; esac } abbrev_num() { _abbrev_num "$@" && printf "%s\n" "$_ABBREV_NUM" } 2 Notes These abbreviated numbers are sometimes referred to as "human-readable," but I find them less useful than the full number when it comes to comparing relative sizes. The abbreviated form cannot be sorted, and the length of a number gives a graphic representation of its size. That representation is helped when thousands separators are used, and for that we need a function: commas. 11 1.11 commas-Add Thousands Separators to a Number Large numbers are easier to understand when thousands separators are inserted. Can I do that in a script? 1 How It Works POSIX parameter expansion can give the length of a variable and remove three digits; this, plus concatenating the parts interspersed with commas, provides all that is necessary. 1 Usage _commas NNN ## Result is stored in $_COMMAS commas NNN [NNN ...] ## Result[s] is/are printed The underscore version of the function accepts only a single argument and stores it in _COMMAS, but the other accepts multiple arguments and prints the results one to a line. $ commas $(( 2345 * 43626 )) 3.14159265 299792458 102,302,970 3.14159265 299,792,458 2 The Script _commas() { _COMMAS= ## Clear the variable for the result _DECPOINT=. ## Character for decimal point; adjust for other locales _TH_SEP=, ## Character for separator; adjust for other locales case $1 in "$_DECPOINT"*) _COMMAS=$1 ## Number begins with dot; no action needed return ;; *"$_DECPOINT") ## Number ends with dot; store it in $c_decimal c_num=${1%"$_DECPOINT"} c_decimal=. ;; *"$_DECPOINT"*) c_num=${1%"$_DECPOINT"*} ## Separate integer and fraction c_decimal=$_DECPOINT${1#*"$_DECPOINT"} ;; *) c_num=$1 ## No dot, therefore no decimal c_decimal= ;; esac while : do case $c_num in ## Three, two or one digits [left] in $num; ## add them to the front of _COMMAS and exit from loop ???|??|?) _COMMAS=${c_num}${_COMMAS:+"$_TH_SEP"$_COMMAS} break ;; *) ## More than three numbers in $num left=${c_num%???} ## All but the last three digits ## Prepend the last three digits and a comma _COMMAS=${c_num#${left}}${_COMMAS:+"$_TH_SEP"$_COMMAS} c_num=$left ## Remove last three digits ;; esac done ## Replace decimal fraction, if any _COMMAS=${_COMMAS}${c_decimal} } commas() { for n do _commas "$n" && printf "%s\n" "$_COMMAS" done } 12 1.12 pr1-Print Arguments, One to a Line When I want to pipe a list of words to a command one at a time, it means I must use a for loop or pipe the items to tr -s ' ' '\n'. Is there a better way? 1 How It Works If there are more arguments than format specifiers in its format string, printf will reuse that string until all the arguments have been processed. If the format string contains one format specifier and a newline, each argument will be printed on a new line. 1 Usage pr1 [-w] ITEM ... 
With pr1, you can print a list of directories beginning with s, one to a line, without calling ls, and pipe it through awk to get the names without the full path. This is accomplished by setting the field separator to a slash, and printing the penultimate (NF-1) field:

$ pr1 ~/work/s*/ | awk -F/ '{print $(NF-1)}'
screenshots
sounds
src
stocks
sun
sus
sysadmin
system

Even using the external printf, if your shell does not have it built in, pr1 is considerably faster than echoing the list to ls -1 or xargs -n1. By default, arguments will be truncated to the width of the screen. To prevent that, use the -w option.

2 The Script

pr1()
{
    case $1 in
        -w) pr_w= ;;
        *) pr_w=-.${COLUMNS:-80} ;;
    esac
    printf "%${pr_w}s\n" "$@"
}

13 1.13 checkdirs-Check for Directories; Create If Necessary

I often need to know whether all the directories a script uses exist, and create them if they do not.

1 How It Works

The checkdirs function checks whether each of its arguments is an existing directory, and attempts to create it if it doesn't; it prints an error message if a directory does not exist and cannot be created. If any directory could not be created, it returns an error status.

1 Usage

checkdirs DIRPATH ...

Only the existence of a directory is checked for, not the ability to read it or write into it. In this example, /bin exists, but it is unlikely that an ordinary user has the permission to write to a file inside it.

$ checkdirs /qwe /bin uio /.autofsck || echo Failed
mkdir: cannot create directory `/qwe': Permission denied
mkdir: `/.autofsck' exists but is not a directory
Failed

2 The Script

checkdirs()
{
    checkdirs=0            ## Return status: success unless a check fails
    for dir                ## Loop through the directories on the command line
    do
        [ -d "$dir" ] &&       ## Check for the directory
          continue ||          ## If it exists, proceed to the next one
          mkdir -p "$dir" ||   ## Attempt to create it
          checkdirs=1          ## Set error status if $dir couldn't be created
    done
    return $checkdirs      ## Return error status
}

14 1.14 checkfiles-Check That a Directory Contains Certain Files

With bash or ksh93, I can use brace expansion to combine multiple filenames with a directory and check for the existence of each one:

$ for file in /bin/{bash1,bash2,bash3,bash4,bash5}
> do
>   [ -f "$file" ] || return 5
> done

Other POSIX shells do not have this syntax, so I need another method of combining directory and filename and checking them.

1 How It Works

The directory given as the first argument is prepended to each of the other arguments in turn. The resulting path is checked, and the function returns with an error as soon as a given file does not exist.

1 Usage

checkfiles DIR FILE ...

The function fails if a file does not exist, and the name of the first nonexistent file will be in $_CHECKFILE.

$ checkfiles /bin bash1 bash2 bash3 bash4 bash5 ||
>    printf "%s\n" "$_CHECKFILE does not exist"
bash4 does not exist

2 The Script

checkfiles()
{
    checkdict=$1                      ## Directory must be first argument
    [ -d "$checkdict" ] || return 13  ## Fail if directory does not exist
    shift                             ## Remove dir from positional parameters
    for _CHECKFILE                    ## Loop through files on command line
    do
        [ -f "$checkdict/$_CHECKFILE" ] || return 5
    done
}

15 1.15 zpad-Pad a Number with Leading Zeroes

When a one-digit number needs to be zero-padded to two digits, such as when building an ISO date, it's easy to do with parameter expansion. For example, month may be a single digit or two digits, and may or may not have a leading zero. This expansion will ensure it is two digits, with a leading 0 if necessary: month=0${month#0}.
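For instance (a throwaway example, not from any script):

$ month=7
$ month=0${month#0}; echo "$month"
07
$ month=07
$ month=0${month#0}; echo "$month"
07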
When more than one digit may have to be prepended and the number's length is variable, a more flexible method is needed. 1 How It Works The obvious method is to use $(printf "%0%${width}d" "$NUM"), but command substitution takes a long time in all shells except ksh93. The brute-force method of adding a zero in a loop until the desired size is reached is faster unless the amount of padding is large. The definition of large may vary from shell to shell and system to system; on my system with bash it is approximately 50. 1 Usage _zpad NUM PADDING [CHAR] ## Result is stored in $_ZPAD zpad NUM PADDING [CHAR] ## Result is printed NUM is the number to be padded; PADDING is the desired number of digits in the result: $ zpad 5 4 0005 The optional CHAR is a character to used for the padding instead of 0. $ zpad 23 5 x xxx23 2 The Script _zpad() { _ZPAD=$1 while [ ${#_ZPAD} -lt $2 ] do _ZPAD=${3:-0}$_ZPAD done } zpad() { _zpad "$@" && printf "%s\n" "$_ZPAD" } 16 1.16 cleanup-Remove Temporary Files and Reset Terminal on Exit My scripts often do unfriendly things to the terminal. These are all right so long as the script is running, but when control returns to the command line, it may be almost unusable. When the scripts exits, this terminal needs to be reset. In addition, the scripts may create temporary files that shouldn't be left lying around when it is finished. These need to be removed. 1 How It Works The trap command tells the shell to execute certain commands when a signal is received. If a trap is set on the EXIT signal, the specified commands will be executed whenever the script finishes, whether by a normal completion, and error, or by the user pressing Control+C. The cleanup function is designed for that purpose. 1 Usage trap cleanup [SIGNALS] The signal will usually be EXIT, and I have the standard-funcs library set this trap when it is sourced. 2 The Script cleanup() { ## delete temporary directory and files [ -n "$tempfile_dir" ] && rm -rf "$tempfile_dir" ## Reset the terminal characteristics [ -t 0 ] && { [ -n "$_STTY" ] && stty $_STTY || stty sane } exit } trap cleanup EXIT ## Set trap to call cleanup on exit The Unix Utilities With a few exceptions, the external commands I use in this book are standard on all Unix systems. Here I provide a quick look at most of the utilities you'll see in this book. For more complete information, see their man pages or the POSIX specification (see the Appendix for the web site, or Chapter 19 for a script to view the POSIX man pages). 1 cat: Concatenate Files to the Standard Output It is common to see novice shell scripters use cat to pipe a single file to a command that can take a filename as an argument. Randal Schwartz, best known for his books on Perl, used to give an "Unnecessary Use Of Cat" (UUOC) award to posters on the comp.unix.shell newsgroup. These four examples all qualify for the award: cat "$1" | sed 's/abc/ABC/g' cat /etc/passwd | grep "$USER" cat "$@" | awk '{print $2}' { cat file1; cat file2 | uuencode; } | mail xxx@yyy.invalid The first three could have been written without cat: sed 's/abc/ABC/g' "$1" grep "$USER" /etc/passwd awk '{print $2}' "$@" The last example is based on one posted, I am sad to say, by this author. When I duly won the UUOC award, I snapped back rather rudely to the effect that cat wasn't unnecessary at all. 
I had egg on my face when someone else pointed out that, while the first cat was indeed necessary, the second wasn't, and the command should have been this: { cat file1; uuencode file2; } | mail xxx@yyy.invalid In a script, valid uses for cat include these two tasks: * To provide more than one filename as input to a command that can only take a single file on the command line, or can only read the standard input. This often appears as cat "$@" | CMD, which will concatenate any files given as arguments, and read the standard input if no files are given. * To send an unmodified file, along with the output of one or more commands, to the input of another command. 2 sed: A Text Stream Editor I rarely use sed for more than relatively simple tasks. It is a very powerful, non-interactive text editor, but it uses cryptic, single-letter commands. The scripts, which also use the inherently hard-to-read regular expressions, can easily become difficult to maintain. My main use for sed is for search and replace. To change all lines that start with /bin to /usr/bin in the file script.sh and store the result in newscript.sh, use this command: sed 's|^/bin|/usr/bin|' script.sh > newscript.sh The usual delimiter after the s (substitute) command is the slash, but any character can be used. Since the slash is in the strings being used, it either must be escaped, or another character must be used; I used the pipe symbol. A command in a sed script can be limited to certain lines or certain ranges of lines. To define this address space, either a number (to match a line number) or a regular expression may be used. To limit the changes to the first ten lines, and changing all instances of John to Jack: sed '1,10 s/John/Jack/g' To change the first instance of one only on lines containing a number of at least two digits: sed '/[0-9][0-9]/ s/one/1/' When no file is supplied on the command line, the standard input is used. All output is sent to the standard output. 3 awk: Pattern Scanning and Processing Language The awk programming language can do many of the same things as sed, but it uses a much more legible syntax. Like sed, it processes a file line by line, operating on lines that match a pattern. When no pattern is associated with an action, it is performed on all lines; when no action is associated with a pattern, the default action is to print the line. To print only non-blank lines, a regular expression consisting of a single dot matches those lines which contain any character (pr1, from the standard-funcs library earlier in this chapter, prints each argument on a separate line): $ pr1 a b c "" e f | awk '/./' a b c e f To print the last field lines that begin with a number (NF is an awk variable that contains the number of fields on the current line; the dollar sign specifies a field on the line), use this: awk '/^[0-9]/ {print $NF}' The fields are, by default, separated by whitespace characters. The field separator can be changed either on the command line with the -F CHAR option, or in the script in a BEGIN block: $ awk 'BEGIN { FS = ":" } /chris|cfaj/ {print $1, $NF}' /etc/passwd chris /bin/bash cfaj /usr/local/bin/pdksh 4 grep: Print Lines Matching a Regular Expression Like awk and sed, grep reads the standard input if no files are specified, and can accept multiple files on the command line. The regular expression can be as simple as a string (i.e., a regular expression that matches only itself): $ grep '^c' /etc/passwd chris:x:501:501:Chris F.A. 
6 tr: A Character Translation Utility

I often see posts in the newsgroups asking why tr cannot change one string to another; the reason is that it is a character translation utility; use sed or awk to change strings. The tr utility replaces characters in the first string with the corresponding characters in the second string:

$ echo abcdefgascdbf | tr abc 123
123defg1s3d2f

With the -d option, it deletes characters:

$ echo abcdefgascdbf | tr -d abc
defgsdf

See the man page for full details and other options.

7 wc: Count Characters, Words, and Lines in a File

By default, wc prints the number of lines, words, and characters in a file, followed by the filename. The information printed can be selected with the -l, -w, and -c options. If any option is given, only the information requested is printed: wc -l prints only the number of lines, and wc -cw prints the number of characters and the number of words. When no filename is given, wc reads the standard input, and, naturally, prints no filename. To store information from wc in a variable, use input redirection instead of giving a filename:

$ wc -l FILE
3419 FILE
$ wc -l < FILE
3419

Some versions of wc print spaces before the size. These can cause problems when the result is used as a number in a calculation. There are various ways to remove the leading spaces. I usually use shell arithmetic:

var=$(( $(wc -l < FILE) ))

8 file: Determine the File Type

The file command uses "magic" files that contain rules for classifying files.

$ file /etc/passwd /bin/ls ./ean13.ps
/etc/passwd: ASCII text
/bin/ls:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), stripped
./ean13.ps:  ASCII English text, with CRLF line terminators

9 ls: Sort and Provide Details About Files

If ls is called with arguments, it lists the files, or the contents of directories, supplied on the command line. Without any arguments, it lists the names of files and directories in the current directory. This example lists files and directories in the current directory that begin with letters in the range a to m; the -d option tells it just to print the names of directories, not their contents, and -F prints a classification character after the names of certain types of files (in this example, slashes indicate directories).

$ ls -d -F [a-m]*
actions    ean13.ps      file2                 liblcms1_1.12-2ubuntu1_i386/
acute      enzyme.dat    firefox-installer/    max/
bin        enzymes.tar   gle-log               mc-chris/
ch8.doc    file1         lfy/                  menulog

To me, the most important options are -l, which prints details about the files instead of just the names, and -t, which prints the most recently modified files first. Many more options are available; as usual, see the man page for details.
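Those two options are often combined; for instance, a quick way to see what has changed most recently in a directory (this is just an illustration, not one of this book's scripts):

$ ls -lt | head    ## long listing, newest files first, limited to the first ten lines of output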
10 uniq: Remove Consecutive Duplicate Lines

For uniq to remove all duplicate lines from a file, the file must first be sorted; otherwise, only consecutive duplicate lines are removed (see unique in Chapter 19 for a script to remove duplicate lines from an unsorted file). Since the sort command has a -u option that removes duplicate lines, uniq is more often used with -c for counting the number of instances of each line (the {a..g} construction was introduced in bash3, and expands to a b c d e f g):

$ pr1 $(( $RANDOM % 2 )){a..g} $(( $RANDOM % 2 )){a..g} | sort | uniq -c
      2 0a
      1 0b
      1 0c
      2 0d
      1 0g
      1 1b
      1 1c
      2 1e
      2 1f
      1 1g

11 sudo: Execute Commands as the Superuser

The file /etc/sudoers contains rules that give users permission to run commands as other users, usually as root. Permission can be given to selected users to run only certain commands, or all commands, with or without a password being required; the user runs such a command by prefacing it with sudo.

12 split: Divide a File into Equal-Sized Pieces

By default, split divides a file into separate files of 1,000 lines each, using xaa, xab, and so on to name the files. The user can choose to split a file by any number of bytes or number of lines. The output filename prefix and the length of the suffix can also be set with command-line options.

13 which: Show the Full Path to a Command

I have generally avoided which, partly because Bourne-type shells have the built-in type command, and partly because historically it was a csh script that did not work in a Bourne-type shell. In recent years, however, which is a compiled command on most systems and works with all shells. The advantage it has over type is that it prints only the path to the command; type's output is not standard, and usually includes more than just the path.

14 gs, gv: Render, Convert, or View PostScript and PDF Files

Ghostscript is used by several commands to render PostScript programs; I rarely use it directly. I use gv to view PS and PDF files, and ImageMagick's convert to convert from one format to another.

Summary

There's more to the shell than I have described in this chapter, which is only intended to explain some of the commands and concepts used in this book's scripts, not to be a tutorial on how to write shell scripts. Throughout the book, the scripts are commented to describe what is going on, but they do not always explain the syntax itself. This chapter has provided those explanations.