Welcome to Tuesday's Tips for shell scripting.
These tips come from my own scripting as well as answers I have provided to queries in various Usenet newsgroups (e.g., comp.unix.shell and comp.os.linux.misc).
The series ran from April to September, 2004, at which time I began work on a book of shell scripts. Due to the demands that project made on my time, I was unable to continue the series.
To format one's PATH variable for easy viewing, try this function:
path()
{
oldIFS=$IFS
IFS=:
printf "%s\n" $PATH
IFS=$oldIFS
}
A typical run of the function:
$ path
/bin
/usr/bin
/usr/bin/X11
/usr/X11R6/bin
/usr/local/bin
/home/chris/bin
/usr/games
/home/chris/scripts
Most recent systems have a locate command (which
may be a link to slocate). It uses a database to
look up file names, but the search is done on the entire path to
the file. In other words, locate qwe finds a lot of
files in the /usr/lib/kbd/keymaps/ directory as
well as the /home/chris/qwe file.
To look up a file name with locate, you need to use a wild card:
locate "*/qwe"
I have this in a command called flocate, which will
look up multiple files given on the command line:
for p
do
locate "*/$p"
done
To toggle a variable between two values, I use this
var_toggle function:
var_toggle()
{
eval "_VAR_TOGGLE=\$$1"
[ "${_VAR_TOGGLE:-0}" = "${3:-0}" ] &&
_VAR_TOGGLE=${2:-1} ||
_VAR_TOGGLE=${3:-0}
eval "$1=\$_VAR_TOGGLE"
}
The first argument is the name of the variable to be toggled. Successive calls to var_toggle alternate the value of the variable between two values.
If no other arguments are given, the variable is toggled between 1 and 0.
$ var=1
$ var_toggle var; echo $var
0
$ var_toggle var; echo $var
1
$ var_toggle var; echo $var
0
$ var_toggle var; echo $var
1
If one other argument is given, the value alternates between that value and 0.
$ var_toggle var 13; echo $var
0
$ var_toggle var 13; echo $var
13
$ var_toggle var 13; echo $var
0
$ var_toggle var 13; echo $var
13
If two more arguments are given, the value is toggled between those two.
$ var_toggle var 13 5; echo $var
5
$ var_toggle var 13 5; echo $var
13
$ var_toggle var 13 5; echo $var
5
$ var_toggle var 13 5; echo $var
13
By default, each Unix process has 3 file descriptors (FDs) assigned to it, 0, 1 and 2; these are known as stdin, stdout, and stderr respectively. They are normally connected to your terminal: stdin is the keyboard; stdout and stderr are your screen.
Each of these represents a stream that can be redirected to other places, such as files or pipes. The output streams, stdout and stderr, can be combined and sent to the same place, or directed to different locations.
If you redirect stdout (FD1) to a file, stderr (FD2) will still go to your screen.
$ ls -ld /home /qwerty
ls: /qwerty: No such file or directory
drwxr-xr-x 11 root root 4096 Jul 11 02:58 /home
If we redirect stdout to the bit bucket, the stderr will still be sent to the screen:
$ ls -ld /home /qwerty 1>/dev/null
ls: /qwerty: No such file or directory
(NOTE: the 1 can be left off; >xxxx
implies 1>xxxx.)
Similarly, if we redirect stderr to the bit bucket, the stdout will still be sent to the screen:
$ ls -ld /home /qwerty 2>/dev/null
drwxr-xr-x 11 root root 4096 Jul 11 02:58 /home
Or we can redirect both stderr and stdout to the bit bucket:
$ ls -ld /home /qwerty >/dev/null 2>/dev/null
This works with /dev/null, which isn't a real file, but redirecting stdout and stderr individually to the same regular file will not work reliably: each redirection opens (and truncates) the file separately, so the two streams overwrite each other's output:
$ ls -ld /home /qwerty >/tmp/xxx 2>/tmp/xxx
(The full explanation is too technical for this tip, but it is akin to redirecting output to the same file as the input.)
To redirect both stdout and stderr to the same file, we redirect one stream to the file, then redirect the other to the first stream:
$ ls -ld /home /qwerty >/tmp/xxx 2>&1
The order of the redirections is important. This will send stderr to the terminal and stdout to the file /tmp/xxx:
$ ls -ld /home /qwerty 2>&1 >/tmp/xxx
Sending stderr to stdout attaches the stream to wherever stdout is pointing at the time of the redirection. It is as if stdout and stderr are variables; you are doing the equivalent of:
## the defaults
stdout=screen
stderr=screen
## redirect stderr
stderr=$stdout
## redirect stdout
stdout=/dev/null
Now, stdout=/dev/null, and stderr=screen.
If you change the order, the result is different:
## the defaults
stdout=screen
stderr=screen
## redirect stdout
stdout=/dev/null
## redirect stderr
stderr=$stdout
Now, stdout and stderr are both pointing to /dev/null.
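The effect of the ordering can be checked with a small experiment. This is only a sketch: the function name both is made up for the illustration, and a command substitution stands in for the screen so that whatever still reaches the original stdout can be captured in a variable.

```shell
# "both" is a hypothetical helper: it writes one line to each output stream
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# 2>&1 first: stderr is attached to the current stdout (here, the
# command substitution), then stdout alone is sent to /dev/null
first=$( both 2>&1 >/dev/null )

# >/dev/null first: stdout goes to /dev/null, then stderr follows it there
second=$( both >/dev/null 2>&1 )

printf 'first=[%s]\nsecond=[%s]\n' "$first" "$second"
```

The first capture holds the stderr line; the second is empty, because both streams ended up in /dev/null.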
To see a list of subdirectories of the current directory:
printf "%s\n" */
With a Bourne shell:
echo */.
Taking the Heads Up tip from 2
weeks ago a little further, let's look at some more things
the read command can do. The basic read
command does more than just read a line of input. It strips
leading and trailing whitespace, and it processes escape
sequences introduced by a backslash.
The primary function of this is to allow lines to be continued by ending a line with a backslash. If a file contains:
This is the first line \
this is a continuation \
and so is this
Using read, all three lines will be concatenated
into a single line with one command:
$ read x < $HOME/txt
$ echo "$x"
This is the first line this is a continuation and so is this
To prevent backslashes being interpreted, raw mode
(-r) is used:
$ read -r x < $HOME/txt
$ echo "$x"
This is the first line \
If more than one variable is given as an argument to
read, the line will be broken up (using the
characters in IFS as separators) and assigned to each variable
in turn. If there are more words than variables, the last
variable will contain the remainder of the line:
$ read -r a b c d < $HOME/txt
$ printf "%s\n" "a=$a" "b=$b" "c=$c" "d=$d"
a=This
b=is
c=the
d=first line \
Or, without using raw mode:
$ read a b c d < $HOME/txt
$ printf "%s\n" "a=$a" "b=$b" "c=$c" "d=$d"
a=This
b=is
c=the
d=first line this is a continuation and so is this
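Because read splits the line on the characters in IFS, setting IFS for just that one command turns read into a simple field parser. The following is a sketch under assumptions: the sample record and the variable names are invented for the illustration, and a here-document supplies the input so that no file is needed.

```shell
# sample record in the style of /etc/passwd (made up for the illustration)
line='chris:x:1000:1000:Chris J:/home/chris:/bin/sh'

# IFS=: applies only to this one read command; -r keeps backslashes literal
IFS=: read -r user pass uid gid name home shell <<EOF
$line
EOF

printf '%s\n' "user=$user" "uid=$uid" "name=$name" "shell=$shell"
```

Since the space is no longer a separator, the fifth field keeps its embedded space.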
The recently released Bash 3.0 has added a form of brace expansion that can replace, and do more than, the external command, seq, found on GNU/Linux systems.
In version 3 of Bash, brace expansion can be used to expand a range of numbers — or letters.
$ echo {1..13}
1 2 3 4 5 6 7 8 9 10 11 12 13
$ echo {h..o}
h i j k l m n o
If the first number or letter is higher than the second, the range is expanded in descending order:
$ echo {z..o}
z y x w v u t s r q p o
$ echo {9..3}
9 8 7 6 5 4 3
A common way of getting the first line from a file:
var=`head -1 FILE`
But why use an external command?
If you want the first line of the file, why not just read it?
read var < FILE
If you want 2 lines:
{
read var1
read var2
} < FILE
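As the tip on the read command notes, a plain read strips leading and trailing whitespace and interprets backslashes. When a line is wanted exactly as it appears in the file, something along these lines may be preferable. It is a sketch only; the temporary file and its contents are invented for the example.

```shell
# a throwaway file with awkward contents (invented for the illustration)
tmpfile=$(mktemp) || exit 1
printf '%s\n' '  indented \ line  ' 'second line' > "$tmpfile"

{
    IFS= read -r var1    # IFS= keeps the surrounding spaces, -r the backslash
    read var2            # plain read, as in the tip above
} < "$tmpfile"
rm -f "$tmpfile"

printf '[%s]\n' "$var1" "$var2"
```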
Some modern shells have a $RANDOM variable that generates a different random integer between 0 and 32767 each time it is referenced.
This can be used to select a random string from a list. The
randstr function selects one of its arguments at
random and puts it in the variable $_RETVAL:
randstr() {
[ $# -eq 0 ] && return 1
n=$(( ($RANDOM % $#) + 1 ))
eval _RETVAL=\${$n}
}
For example, to pick a card at random:
randstr diamonds hearts clubs spades
suit=$_RETVAL
randstr Ace 2 3 4 5 6 7 8 9 10 Jack Queen King
card="$_RETVAL of $suit"
echo $card
In bash2 or ksh93, you can use an array, populated through brace expansion:
deck=( {A,2,3,4,5,6,7,8,9,10,J,Q,K}_of_{Diamonds,Hearts,Clubs,Spades} )
randstr "${deck[@]}"
echo $_RETVAL
You can roll dice:
randstr 1 2 3 4 5 6
echo $_RETVAL
...with as many sides as you like; e.g. a 12-sided die:
randstr 1 2 3 4 5 6 7 8 9 10 11 12
echo $_RETVAL
Of course, the dice could be more efficiently implemented with:
roll() ## USAGE: roll [N] -- N = number of sides on die; default 6
{
roll_sides=${1:-6}
_RETVAL=$(( $RANDOM % $roll_sides + 1 ))
}
I have some variables that I include in almost all my shell scripts.
I keep them in /usr/local/bin/standard-vars, and just source them at the top of each script:
. standard-vars
I put problematic characters, such as newline (NL) and escape (ESC) into variables:
## bash and ksh93 specific;
## in other shells, replace the $'\X' with a literal character
##           DEC OCT   HEX
NL=$'\n'  ## 10, \012, 0x0a, a literal newline
CR=$'\r'  ## 13, \015, 0x0d, carriage return
TAB=$'\t' ##  9, \011, 0x09, tab
ESC=$'\e' ## 27, \033, 0x1b, escape
For some scripts, I would insert the above into the script itself.
The code used for those values is specific to bash and ksh93. In other shells one has to replace the escape sequence with the literal character:
NL='
'
CR=' '
Your web browser is probably unable to show that the character assigned to CR is a carriage return; that's one reason I like to use the $'\X' syntax in scripts posted on websites or to Usenet newsgroups.
There are more variables in my standard-vars script, mostly dealing with manipulating the terminal display. I'll look at them in the near future.
The current incarnation of the standard-vars script is at http://cfaj.freeshell.org/src/scripts/standard-vars-sh.
This function will centre text on a line of a given length:
centre() ## USAGE: centre width text...
{
c_width=$1
shift
c_text="$*"
c_width=$(( ($c_width + ${#c_text}) / 2 ))
printf "%${c_width}.${c_width}s\n" "$c_text"
}
Sample usage:
centre 45 this is centered on 45 characters
centre $COLUMNS this is centred across the entire window
By using eval, it is possible to set more than one
variable with a single command.
This is, perhaps, best illustrated by the date
command, which is often used to set multiple variables, for
example, YEAR, MONTH and DAY.
Too often, I see it used this way:
YEAR=`date +%Y`
MONTH=`date +%m`
DAY=`date +%d`
Not only can it be done with a single call to date, but multiple calls can give the wrong results, if the date crosses a boundary between the calls. This is most likely to happen when using minutes and seconds.
Doing it the correct way ensures that all the variables are set using the same date and time:
eval "`date "+DATE=%Y-%m-%d
YEAR=%Y
MONTH=%m
DAY=%d
TIME=%H:%M:%S
HOUR=%H
MINUTE=%M
SECOND=%S
datestamp=%Y-%m-%d_%H.%M.%S
DayOfWeek=%a
MonthAbbrev=%b"`"
If you are using bash or ksh93, the quickest way to add numbers in a file is with parameter substitution and the shell's built-in arithmetic. The file must contain nothing but numbers. In bash the numbers must all be integers; in ksh93, they can include decimal fractions:
set -- `< $1` ## for multiple files use: set -- `cat "$@"`
q=$*
printf "%s\n" $(( ${q// / + } ))
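In a POSIX shell that lacks the ${q// / + } substitution, a similar effect can be had by letting "$*" join the numbers with a plus sign. This is a sketch of a different technique, not the method shown above: the function name sum_file and the sf_ variables are invented here, it handles integers only, and it stores the result in $_RETVAL in the style of the other functions in these tips.

```shell
# hypothetical helper: sum_file FILE -- add the integers in FILE
sum_file() {
    # word-split the file's contents into the positional parameters
    set -- $(cat "$1")
    sf_oldIFS=$IFS
    IFS=+                 # "$*" joins the parameters with the first character of IFS
    sf_expr="$*"          # e.g. 10+20+12
    IFS=$sf_oldIFS
    _RETVAL=$(( ${sf_expr:-0} ))   # an empty file gives 0
}
```

Note that shell arithmetic treats a number with a leading zero as octal, so a file containing values such as 010 will not give the decimal total.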
All it takes to determine how many days in any month is a simple look-up table, with a reference to last week's tip if the month is February.
days_in_month() { ## USAGE: days_in_month [month [year]]
if [ -n "$1" ]
then
dim_m=$1
dim_y=$2
else
eval `date "+dim_m=%m dim_y=%Y"`
fi
case $dim_m in
*9|*4|*6|11)
_DAYS_IN_MONTH=30
;; ## 30 days hath September...
1|01|3|03|*5|*7|*8|10|12)
_DAYS_IN_MONTH=31 ;;
2|02)
is_leap_year ${dim_y:-`date +%Y`} &&
_DAYS_IN_MONTH=29 ||
_DAYS_IN_MONTH=28 ;;
esac
}
The result is stored in $_DAYS_IN_MONTH.
Is this (or any given year) a leap year?
These two functions, is_leap_yr() and is_leap_year(), both do the job. Both use the same syntax, using the current year if one is not supplied on the command line.
The first one uses arithmetic to determine whether the year is a leap year, and will only work in a POSIX shell (such as ksh or bash). The second uses pattern matching, and will work in any Bourne-type shell.
is_leap_yr() { ## USAGE: is_leap_yr [year]
ily_year=${1:-`date +%Y`}
[ $(( $ily_year % 400)) -eq 0 -o \
\( $(( $ily_year % 4)) -eq 0 -a \
$(( $ily_year % 100)) -ne 0 \) ] && {
_IS_LEAP_YEAR=1
return 0
} || {
_IS_LEAP_YEAR=0
return 1
}
}
is_leap_year() { ## USAGE: is_leap_year [year]
ily_year=${1:-`date +%Y`}
case $ily_year in
*0[48] |\
*[2468][048] |\
*[13579][26] |\
*[13579][26]0|\
*[2468][048]00 |\
*[13579][26]00 ) _IS_LEAP_YEAR=1
return 0 ;;
*) _IS_LEAP_YEAR=0
return 1 ;;
esac
}
By examining either the exit status of the function or the value of $_IS_LEAP_YEAR, one can determine whether any year (in the Gregorian calendar) is a leap year:
year=1999
if is_leap_year $year
then
echo $year is a leap year
else
echo $year is not a leap year
fi
Or test the variable $_IS_LEAP_YEAR:
year=1999
is_leap_year $year
if [ $_IS_LEAP_YEAR -eq 1 ]
then
echo $year is a leap year
else
echo $year is not a leap year
fi
To search a man page for a specific term, I use this function:
sman() { ## usage: sman command search_term
PAGER=less
export PAGER
LESS="$LESS${2:+ +/$2}" man $1
}
Examples:
sman bash EXPANSION
sman find printf
sman grep "REGULAR EXPRESSIONS"
Subsequent occurrences of the search term can be found by pressing "n", previous ones with "N".
The uniq command will "Discard all but one of successive identical lines" from a file or input stream.
In order to remove non-consecutive duplicate lines, use awk:
awk '!x[$0]++' FILE
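The one-liner works because an awk array element that has never been set evaluates to 0. The first time a line is seen, the post-increment returns 0, the negation makes the pattern true, and awk prints the line; on later occurrences the count is already non-zero, so the line is skipped. A small sketch, in which the wrapper name dedupe and the array name seen are my own choices:

```shell
# hypothetical wrapper around the one-liner above; "seen" replaces "x"
dedupe() {
    awk '!seen[$0]++' "$@"
}

# prints apple, banana and cherry, one per line, in order of first appearance
printf '%s\n' apple banana apple cherry banana | dedupe
```

Unlike sort -u, this keeps the lines in their original order, at the cost of holding every distinct line in memory.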
Sometimes one wants to truncate a line to the width of the screen or window. While it's possible to do it by actually shortening the string in a variable, the easiest way to do it is with printf.
Bash2 (and later), Korn Shell 93, and the BSD shell (ash or dash on GNU/Linux) have printf built in, and it is generally installed as a command on modern *nix systems (it's required by the POSIX standard).
long_string="Bash2 and KornShell93 have printf built in, and it is generally installed as a command on modern *nix systems (it's required by the POSIX standard)."
printf "%${COLUMNS}.${COLUMNS}s" "$long_string"
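The command above both truncates and pads, since the field width and the precision are the same. If only truncation is wanted, the precision can be used on its own. The following is a sketch; the function name fit is invented for the illustration, and some printf implementations count bytes rather than characters, so multibyte text may be cut at a different point.

```shell
# hypothetical helper: fit WIDTH TEXT... -- print TEXT cut to at most WIDTH characters
fit() {
    fit_width=$1
    shift
    # precision alone (%.Ns) truncates long strings without padding short ones
    printf "%.${fit_width}s\n" "$*"
}

fit 10 "This line is longer than ten characters"
fit 10 short
```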
Modern shells automatically set the COLUMNS and LINES variables, but not all of them update these variables when a window's size is changed. Bash has an option to do so:
shopt -s checkwinsize
In other shells, tput or stty can provide the information:
set -- `stty size`
LINES=$1
COLUMNS=$2
Or:
LINES=`tput lines`
COLUMNS=`tput cols`
Far too often, I see scripts like this:
string="123,456,789"
v1=`echo $string | cut -d, -f1`
v2=`echo $string | cut -d, -f2`
v3=`echo $string | cut -d, -f3`
It uses three calls to an external command (cut) where none is necessary.
External commands are rarely needed to parse a string in a POSIX shell, and even using a Bourne shell they can often be avoided.
The shell splits strings into words using the value of the Internal Field Separator (IFS) variable as the delimiter, so the same thing can be accomplished this way:
string="123,456,789"
oldIFS=$IFS
IFS=,
set -- $string
v1=$1
v2=$2
v3=$3
IFS=$oldIFS
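The same splitting can be wrapped in a function, which keeps the caller's own positional parameters intact. This is a sketch under assumptions: the name split_csv3 and the sc_ variable are invented here, and set -f is added so that a field such as * is not expanded as a filename pattern (it unconditionally turns globbing back on afterwards).

```shell
# hypothetical helper: split_csv3 "a,b,c" sets v1, v2 and v3
split_csv3() {
    sc_oldIFS=$IFS
    IFS=,
    set -f              # no filename expansion while the string is split
    set -- $1           # inside a function this changes only the function's own parameters
    set +f
    IFS=$sc_oldIFS
    v1=$1 v2=$2 v3=$3
}

split_csv3 "123,456,789"
printf '%s\n' "$v1" "$v2" "$v3"
```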
The Korn Shell (ksh) and the Bourne-Again Shell (bash) from
version 2 on support one-dimensional arrays which can be assigned
by array[N]=something_or_other and
referenced by ${array[N]}, where N is a
number from 0 up (ksh has an upper limit of 4095 [recent versions
of Korn Shell 93 have increased this], bash is limited only by
available memory).
When adding consecutive elements to an array, there is no need to maintain an index into the array; instead of:
n=0
for c in red green blue white
do
array[$n]=$c
n=$(( $n + 1 ))
done
just use:
for c in red green blue white
do
array[${#array[@]}]=$c
done
Since array elements start at index 0, the number of elements in
the array, ${#array[@]}, is also the
number of the next empty element.
More elements can be added at any time using the same syntax.
Of course, it doesn't work on a sparse array, that is, one with unset elements.