GEF4530 - Shell commands

For more information see the Software Carpentry shell lesson (we have taken a small subset here).

Loops

We can use a loop to do some operation once for each thing in a list. Here’s a simple example that displays the first three lines of each namelist file in turn. Run it on norStore (cruncher.norstore.uio.no):


cd /projects/NS1000K/GEF4530/outputs/$USER/runs/f2000.T31T31.test/run
for filename in atm_modelio.nml ice_modelio.nml
do
    head -n 3 $filename
done

When the shell sees the keyword for, it knows it is supposed to repeat a command (or group of commands) once for each thing in a list. In this case, the list is the two filenames. Each time through the loop, the name of the thing currently being operated on is assigned to the variable called filename. Inside the loop, we get the variable’s value by putting $ in front of it: $filename is atm_modelio.nml the first time through the loop and ice_modelio.nml the second.

By using the dollar sign we are telling the shell interpreter to treat filename as a variable name and substitute its value in its place, rather than treating it as literal text or as an external command. When using variables it is also possible to put the name in curly braces to clearly delimit it: $filename is equivalent to ${filename}, but is different from ${file}name. You may find this notation in other people’s programs.
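The difference between the two brace forms is easy to check at the prompt. The following is a minimal sketch; the variable file and its value ice are hypothetical and were chosen only for illustration.

```shell
# Hypothetical values, used only to illustrate the notation.
filename=atm_modelio.nml
file=ice

echo "$filename"     # value of the variable filename
echo "${filename}"   # same variable, with the name delimited by braces
echo "${file}name"   # value of the variable file, followed by the literal text "name"
```

The first two commands print atm_modelio.nml, while the third prints icename because the braces end the variable name after file.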

Finally, the command that’s actually being run is "head", so this loop prints out the first three lines of each data file in turn.

Remark: The shell prompt changes from $ to > and back again as we type in our loop. The second prompt, >, is different to remind us that we haven’t finished typing a complete command yet. A semicolon, ;, can be used to separate two commands written on a single line.
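As a small sketch of the semicolon form, the same kind of loop can be typed on one line. It uses echo instead of head (an assumption made here so that it runs even where the namelist files are not present).

```shell
# One-line form: semicolons take the place of the line breaks
# before "do" and before "done".
for filename in atm_modelio.nml ice_modelio.nml; do echo "$filename"; done
```

Because the whole command is complete when Enter is pressed, the > prompt never appears.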

We have called the variable in this loop filename in order to make its purpose clearer to human readers. The shell itself doesn’t care what the variable is called; if we wrote this loop as:


for x in atm_modelio.nml ice_modelio.nml
do
    head -n 3 $x
done

or:


for temperature in atm_modelio.nml ice_modelio.nml
do
    head -n 3 $temperature
done

it would work exactly the same way. Don’t do this. Programs are only useful if people can understand them, so meaningless names (like x) or misleading names (like temperature) increase the odds that the program won’t do what its readers think it does.

Here’s a slightly more complicated loop:


for filename in *.nml
do
    echo $filename
    head -n 100 $filename | tail -n 20
done
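To see which lines the head and tail combination in this loop keeps, it can be tried on input whose content is its own line number. This is only a sketch; it assumes the seq command is available, as it is on most Linux systems.

```shell
# seq 1 200 prints the numbers 1 to 200, one per line.
# head keeps the first 100 lines, then tail keeps the last 20 of those.
seq 1 200 | head -n 100 | tail -n 20
```

This prints the numbers 81 to 100. If the input has fewer than 100 lines, head passes all of it through and tail simply returns the last 20 lines that exist.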

The shell starts by expanding *.nml to create the list of files it will process. The loop body then executes two commands for each of those files. The first, echo, just prints its command-line parameters to standard output. For example:


echo hello there

prints:


hello there

In this case, since the shell expands $filename to be the name of a file, echo $filename just prints the name of the file. Note that we can’t write this as:


for filename in *.nml
do
    $filename
    head -n 100 $filename | tail -n 20
done

because then the first time through the loop, when $filename expanded to atm_modelio.nml, the shell would try to run atm_modelio.nml as a program. Finally, the head and tail combination selects lines 81-100 from whatever file is being processed.

Spaces in Names

Filename expansion in loops is another reason you should not use spaces in filenames. Suppose our data files are named:


atm_modelio.nml
cpl user.nml
glc_modelio.nml

If we try to process them using:


for filename in *.nml
do
    head -n 100 $filename | tail -n 20
done

then the shell will expand *.nml into three filenames:


atm_modelio.nml
cpl user.nml
glc_modelio.nml

Each name, including cpl user.nml, is assigned to filename intact. The trouble comes inside the loop: because $filename is not quoted, the shell splits its value at the space, so on the second pass head is asked to read two files:


cpl
user.nml

That’s a problem: head can’t read files called cpl and user.nml because they don’t exist, and it is never asked to read the file cpl user.nml.

We can make our script a little bit more robust by quoting our use of the variable:


for filename in *.nml
do
    head -n 100 "$filename" | tail -n 20
done

but it’s simpler just to avoid using spaces (or other special characters) in filenames.

Remark: instead of the head command, you can use any other command, such as ncl or a Python script.
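The effect of quoting can be checked safely in a throwaway directory. The sketch below assumes that mktemp -d is available; the file content is made up for the test.

```shell
# Create a temporary directory holding one file whose name contains a space.
workdir=$(mktemp -d)
printf 'first line\n' > "$workdir/cpl user.nml"

for filename in "$workdir"/*.nml
do
    # Unquoted: the value is split at the space, so head is given two
    # nonexistent paths and fails (its error messages are hidden here).
    head -n 1 $filename 2>/dev/null || echo "unquoted expansion failed"
    # Quoted: head is given the single path ending in "cpl user.nml".
    head -n 1 "$filename"
done

# Clean up the temporary directory.
rm -r "$workdir"
```

The unquoted call fails, while the quoted call prints the first line of the file.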

Shell scripts

We are finally ready to see what makes the shell such a powerful programming environment. We are going to take the commands we repeat frequently and save them in files so that we can re-run all those operations again later by typing a single command. For historical reasons, a bunch of commands saved in a file is usually called a shell script, but make no mistake: these are actually small programs.
Let’s start by going back to /projects/NS1000K/GEF4530/outputs/$USER/runs/f2000.T31T31.test/run and putting the following lines in the file process_nml.sh:


cd /projects/NS1000K/GEF4530/outputs/$USER/runs/f2000.T31T31.test/run
# Quote EOF so that $filename is written to the file literally instead of being expanded now
cat > process_nml.sh << 'EOF'
for filename in *.nml
do
    head -n 100 "$filename" | tail -n 20
done
EOF

This creates a file called process_nml.sh containing a shell loop over the namelist files (*.nml).
We can then ask the shell to execute the commands it contains. Our shell is called bash, so we run the following command:


bash process_nml.sh

Sure enough, our script’s output is exactly what we would get if we ran that loop directly.
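The << construct used to create the script is called a here-document. Whether its delimiter is quoted decides if variables are expanded while the file is being written. The sketch below shows the difference in a temporary directory; the file names unquoted.sh and quoted.sh and the value leftover_value are hypothetical.

```shell
workdir=$(mktemp -d)
filename=leftover_value   # hypothetical value left over from an earlier loop

# Unquoted delimiter: $filename is expanded while the file is written.
cat > "$workdir/unquoted.sh" << EOF
echo "$filename"
EOF

# Quoted delimiter: the text is written exactly as typed.
cat > "$workdir/quoted.sh" << 'EOF'
echo "$filename"
EOF

cat "$workdir/unquoted.sh"   # the variable has already been replaced by its value
cat "$workdir/quoted.sh"     # the variable reference is still intact
```

The first file contains echo "leftover_value" and the second contains echo "$filename". A script that should use its own loop variable when it runs therefore needs the quoted form.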

Finding Things

See the Software Carpentry lesson on Finding Things.