Shell Tools and Environments

Shell Scripting

Overview

Teaching: 25 min
Exercises: min

Questions

How to use control flow in shell?

Objectives

The next step in shell usage is scripting

Learn how to make use of special variables in scripting

Shell globbing helps you expand string expressions

Shell Scripting

So far we have seen how to execute commands in the shell and pipe them together. However, in many scenarios you will want to perform a series of commands and make use of control flow expressions like conditionals or loops.

Shell scripts are the next step in complexity. Most shells have their own scripting language with variables, control flow and its own syntax. What makes shell scripting different from other scripting programming language is that it is optimized for performing shell-related tasks. Thus, creating command pipelines, saving results into files, and reading from standard input are primitives in shell scripting, which makes it easier to use than general purpose scripting languages.

To assign variables in bash, use the syntax foo=bar and access the value of the variable with $foo. Note that foo = bar will not work since it is interpreted as calling the foo program with arguments = and bar. In general, in shell scripts the space character will perform argument splitting. This behavior can be confusing to use at first, so always check for that.

Strings in bash can be defined with ' and " delimiters, but they are not equivalent. Strings delimited with ' are literal strings and will not substitute variable values whereas " delimited strings will.

foo=bar
echo "$foo"
# prints bar
echo '$foo'
# prints $foo

As with most programming languages, bash supports control flow techniques including if, case, while and for. Similarly, bash has functions that take arguments and can operate with them. Here is an example of a function that creates a directory and cds into it.

mcd () {
    mkdir -p "$1"
    cd "$1"
}

Here $1 is the first argument to the script/function. Unlike other scripting languages, bash uses a variety of special variables to refer to arguments, error codes, and other relevant variables. Below is a list of some of them. A more comprehensive list can be found here.

$0 - Name of the script
$1 to $9 - Arguments to the script. $1 is the first argument and so on.
$@ - All the arguments
$# - Number of arguments
$? - Return code of the previous command
$$ - Process identification number (PID) for the current script
!! - Entire last command, including arguments. A common pattern is to execute a command only for it to fail due to missing permissions; you can quickly re-execute the command with sudo by doing sudo !!
$_ - Last argument from the last command. If you are in an interactive shell, you can also quickly get this value by typing Esc followed by .

Commands will often return output using STDOUT, errors through STDERR, and a Return Code to report errors in a more script-friendly manner. The return code or exit status is the way scripts/commands have to communicate how execution went. A value of 0 usually means everything went OK; anything different from 0 means an error occurred.

Exit codes can be used to conditionally execute commands using && (and operator) and || (or operator), both of which are short-circuiting operators. Commands can also be separated within the same line using a semicolon ;. The true program will always have a 0 return code and the false command will always have a 1 return code. Let’s see some examples:

false || echo "Oops, fail"
# Oops, fail

true || echo "Will not be printed"
#

true && echo "Things went well"
# Things went well

false && echo "Will not be printed"
#

true ; echo "This will always run"
# This will always run

false ; echo "This will always run"
# This will always run

Another common pattern is wanting to get the output of a command as a variable. This can be done with command substitution. Whenever you place $( CMD ) it will execute CMD, get the output of the command and substitute it in place. For example, if you do for file in $(ls), the shell will first call ls and then iterate over those values. A lesser known similar feature is process substitution, <( CMD ) will execute CMD and place the output in a temporary file and substitute the <() with that file’s name. This is useful when commands expect values to be passed by file instead of by STDIN. For example, diff <(ls foo) <(ls bar) will show differences between files in directories foo and bar.

Since that was a lot of information, here’s an example for filtering gene expression data downloaded from the Alan Brain Atlas:

#!/bin/bash

echo "Starting program at $(date)" # Date will be substituted

echo "Running program $0 with $# arguments with pid $$"

for file in "$@"; do
    grep PAX6 "$file" > results.csv 2> /dev/null
    # When pattern is not found, grep has exit status 1
    # Redirect STDERR to a null register
    if [[ $? -ne 0 ]]; then
        echo "File $file does not have our gene of interest"
    fi
done

In the comparison we tested whether $? was not equal to 0. Bash implements many comparisons of this sort - you can find a detailed list in the man page for test. When performing comparisons in bash, try to use double brackets [[ ]] in favor of simple brackets [ ]. Chances of making mistakes are lower although it won’t be portable to sh. A more detailed explanation can be found here.

When launching scripts, you will often want to provide arguments that are similar. Bash has ways of making this easier, expanding expressions by carrying out filename expansion. These techniques are often referred to as shell globbing.

Wildcards - Whenever you want to perform some sort of wildcard matching, you can use ? and * to match one or any amount of characters respectively. For instance, given files foo, foo1, foo2, foo10 and bar, the command rm foo? will delete foo1 and foo2 whereas rm foo* will delete all but bar.
Curly braces {} - Whenever you have a common substring in a series of commands, you can use curly braces for bash to expand this automatically. This comes in very handy when moving or converting files.

convert image.{png,jpg}
# Will expand to
convert image.png image.jpg

cp /path/to/project/{foo,bar,baz}.sh /newpath
# Will expand to
cp /path/to/project/foo.sh /path/to/project/bar.sh /path/to/project/baz.sh /newpath

# Globbing techniques can also be combined
mv *{.py,.sh} folder
# Will move all *.py and *.sh files


mkdir foo bar
# This creates files foo/a, foo/b, ... foo/h, bar/a, bar/b, ... bar/h
touch {foo,bar}/{a..h}
touch foo/x bar/y
# Show differences between files in foo and bar
diff <(ls foo) <(ls bar)
# Outputs
# < x
# ---
# > y

Writing bash scripts can be tricky and unintuitive. There are tools like shellcheck that will help you find errors in your sh/bash scripts.

Note that scripts need not necessarily be written in bash to be called from the terminal. For instance, here’s a simple Python script that outputs its arguments in reversed order:

#!/usr/local/bin/python
import sys
for arg in reversed(sys.argv[1:]):
    print(arg)

The kernel knows to execute this script with a python interpreter instead of a shell command because we included a shebang line at the top of the script. It is good practice to write shebang lines using the env command that will resolve to wherever the command lives in the system, increasing the portability of your scripts. To resolve the location, env will make use of the PATH environment variable we introduced in the first lecture. For this example the shebang line would look like #!/usr/bin/env python.

Key Points

Assign variables with foo=bar syntax

STDIN, STDOUT, and STDERR

Shell define strings differently with single and double quotation marks

Globbing with wildcards and curly braces

Shell Tools and Environments

Shell Scripting

Overview

Shell Scripting

Key Points

lesson home

next episode