For more information see the Software Carpentry shell lesson (we have taken a small subset here).
Loops
We can use a loop to do some operation once for each thing in a list. Here’s a simple example that displays the first three lines of each namelist file in turn:
On norStore (cruncher.norstore.uio.no)
cd /projects/NS1000K/GEF4530/outputs/$USER/runs/f2000.T31T31.test/run
for filename in atm_modelio.nml ice_modelio.nml
do
head -n 3 $filename
done
When the shell sees the keyword for, it knows it is supposed to repeat a command (or group of commands) once for each thing in a list. In this case, the list is the two filenames. Each time through the loop, the name of the thing currently being operated on is assigned to the variable called filename. Inside the loop, we get the variable’s value by putting $ in front of it: $filename is atm_modelio.nml the first time through the loop and ice_modelio.nml the second.
By using the dollar sign, we tell the shell to treat filename as a variable name and substitute its value in its place, rather than treating it as literal text or as an external command. When using variables, it is also possible to put the name in curly braces to delimit it clearly: $filename is equivalent to ${filename}, but is different from ${file}name, which expands a variable called file and then appends the text name. You may find this notation in other people’s programs.
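The difference between these notations can be checked with a few assignments. This is only a sketch: the value given to filename and the suffix _old are illustrative choices, and we assume no variable called file is set.

```shell
# Illustrative value; any filename would do.
filename=atm_modelio.nml
# Make sure there is no variable called "file" (assumption of this sketch).
unset file

with_dollar="$filename"        # value of the variable filename
with_braces="${filename}"      # exactly the same thing
different="${file}name"        # value of "file" (empty) followed by the text "name"
with_suffix="${filename}_old"  # braces show where the variable name ends

printf '%s\n' "$with_dollar" "$with_braces" "$different" "$with_suffix"
```

The braces become necessary as soon as text follows the variable directly: without them, $filename_old would be read as a variable called filename_old.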
Finally, the command that’s actually being run is head, so this loop prints out the first three lines of each namelist file in turn.
Remark: The shell prompt changes from $ to > and back again as we type in our loop. The second prompt, >, is different to remind us that we haven’t finished typing a complete command yet. A semicolon, ;, can be used to separate two commands written on a single line.
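As a sketch of the semicolon form, here is the same loop written on one line. So that it can be tried anywhere, it runs in a scratch directory on two small stand-in files whose contents are made up; on norStore you would run the last line in the run directory instead.

```shell
# Scratch directory with two tiny stand-in files (their contents are made up).
workdir=$(mktemp -d)
printf 'atm 1\natm 2\natm 3\natm 4\n' > "$workdir/atm_modelio.nml"
printf 'ice 1\nice 2\nice 3\nice 4\n' > "$workdir/ice_modelio.nml"
cd "$workdir" || exit 1

# The loop on a single line: a semicolon replaces each line break,
# except directly after "do", where none is needed.
for filename in atm_modelio.nml ice_modelio.nml; do head -n 3 $filename; done
```

Typed this way, the shell never shows the > prompt, because the command is already complete when we press Enter.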
We have called the variable in this loop filename in order to make its purpose clearer to human readers. The shell itself doesn’t care what the variable is called; if we wrote this loop as:
for x in atm_modelio.nml ice_modelio.nml
do
head -n 3 $x
done
or:
for temperature in atm_modelio.nml ice_modelio.nml
do
head -n 3 $temperature
done
it would work exactly the same way. Don’t do this. Programs are only useful if people can understand them, so meaningless names (like x) or misleading names (like temperature) increase the odds that the program won’t do what its readers think it does.
Here’s a slightly more complicated loop:
for filename in *.nml
do
echo $filename
head -n 100 $filename | tail -n 20
done
The shell starts by expanding *.nml to create the list of files it will process. The loop body then executes two commands for each of those files. The first, echo, just prints its command-line parameters to standard output. For example:
echo hello there
prints:
hello there
In this case, since the shell expands $filename to be the name of a file, echo $filename just prints the name of the file. Note that we can’t write this as:
for filename in *.nml
do
$filename
head -n 100 $filename | tail -n 20
done
because then the first time through the loop, when $filename expanded to atm_modelio.nml, the shell would try to run atm_modelio.nml as a program. Finally, the head and tail combination selects lines 81-100 from whatever file is being processed.
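To see which lines the head and tail combination really selects, we can try it on files whose line N just contains the number N. The two temporary files below are stand-ins created with seq, not part of the model output.

```shell
# Stand-in files: line N of each simply contains the number N.
long_file=$(mktemp)
short_file=$(mktemp)
seq 1 120 > "$long_file"
seq 1 50 > "$short_file"

# head keeps lines 1-100; tail then keeps the last 20 of those.
from_long=$(head -n 100 "$long_file" | tail -n 20)
# With only 50 lines, head passes all of them on, so tail keeps lines 31-50.
from_short=$(head -n 100 "$short_file" | tail -n 20)

echo "long file:  $(echo "$from_long" | head -n 1) to $(echo "$from_long" | tail -n 1)"
echo "short file: $(echo "$from_short" | head -n 1) to $(echo "$from_short" | tail -n 1)"

rm -f "$long_file" "$short_file"
```

So lines 81-100 are selected only when the file has at least 100 lines; for a shorter file we get its last 20 lines (or the whole file if it has fewer than 20).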
Spaces in Names
Filename expansion in loops is another reason you should not use spaces in filenames. Suppose our data files are named:
atm_modelio.nml
cpl user.nml
glc_modelio.nml
If we try to process them using:
for filename in *.nml
do
head -n 100 $filename | tail -n 20
done
then the shell will expand *.nml into a list of three names:
atm_modelio.nml
cpl user.nml
glc_modelio.nml
Each name is assigned to filename intact, spaces included. The trouble starts in the loop body: because $filename is not quoted there, the shell splits its value at the space, so on the second pass head is given two separate arguments:
cpl
user.nml
That’s a problem: head can’t read files called cpl and user.nml because they don’t exist, and it is never asked to read the file cpl user.nml.
We can make our script a little bit more robust by quoting our use of the variable:
for filename in *.nml
do
head -n 100 "$filename" | tail -n 20
done
but it’s simpler just to avoid using spaces (or other special characters) in filenames.
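The effect of the quotes can be tried safely in a scratch directory. The sketch below creates three small stand-in files (their contents are made up) and counts how often head fails with and without quotes; it assumes bash or another POSIX-style shell.

```shell
# Scratch directory with three stand-in files, one with a space in its name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'atm\n' > atm_modelio.nml
printf 'cpl\n' > 'cpl user.nml'
printf 'glc\n' > glc_modelio.nml

# Without quotes: count the loop passes in which head reports an error.
unquoted_failures=0
for filename in *.nml
do
    head -n 1 $filename > /dev/null 2>&1 || unquoted_failures=$((unquoted_failures + 1))
done

# With quotes: the name reaches head as a single argument.
quoted_failures=0
for filename in *.nml
do
    head -n 1 "$filename" > /dev/null 2>&1 || quoted_failures=$((quoted_failures + 1))
done

echo "unquoted: $unquoted_failures failed, quoted: $quoted_failures failed"
```

Only the unquoted loop reports a failure, and it does so for the one file whose name contains a space.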
Remark: Instead of the head command, you can use any other command in the loop body, such as ncl or a Python script.
Shell scripts
We are finally ready to see what makes the shell such a powerful programming environment. We are going to take the commands we repeat frequently and save them in files so that we can re-run all those operations again later by typing a single command. For historical reasons, a bunch of commands saved in a file is usually called a shell script, but make no mistake: these are actually small programs.
Let’s start by going back to /projects/NS1000K/GEF4530/outputs/$USER/runs/f2000.T31T31.test/run and putting the following lines in the file process_nml.sh:
cd /projects/NS1000K/GEF4530/outputs/$USER/runs/f2000.T31T31.test/run
cat > process_nml.sh << 'EOF'
for filename in *.nml
do
head -n 100 "$filename" | tail -n 20
done
EOF
This creates a file called process_nml.sh containing a shell loop over the namelist (.nml) files.
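When a script is written with cat and a here-document like this, it matters whether the EOF delimiter is quoted: with a plain EOF the shell substitutes variables while it writes the file, whereas with 'EOF' the text is copied literally. Here is a small sketch in a scratch directory; the file names unquoted.sh and quoted.sh are made up for the comparison.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
unset filename    # start with no value, as in a fresh shell

# Plain delimiter: $filename is expanded now, while the file is written.
cat > unquoted.sh << EOF
head -n 100 "$filename" | tail -n 20
EOF

# Quoted delimiter: the text is written exactly as typed.
cat > quoted.sh << 'EOF'
head -n 100 "$filename" | tail -n 20
EOF

cat unquoted.sh   # the variable has already been replaced by its (empty) value
cat quoted.sh     # $filename is kept for the script to expand when it runs
```

It is worth displaying a freshly written script with cat before running it, to check that $filename is still there.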
We can ask the shell to execute the commands it contains. Our shell is called bash, so we run the following command:
bash process_nml.sh
Sure enough, our script’s output is exactly what we would get if we ran that loop directly.
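This claim can be checked rather than taken on trust. The sketch below works in a scratch directory with two stand-in .nml files filled with numbers by seq (not real namelists), writes the script, and compares its output with that of the loop typed directly.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Stand-in files of 120 lines each, so that lines 81-100 exist.
seq 1 120 > atm_modelio.nml
seq 201 320 > ice_modelio.nml

# Write the script; the quoted delimiter keeps $filename literal.
cat > process_nml.sh << 'EOF'
for filename in *.nml
do
    head -n 100 "$filename" | tail -n 20
done
EOF

# Capture the output of the loop typed directly and of the script.
direct=$(for filename in *.nml; do head -n 100 "$filename" | tail -n 20; done)
from_script=$(bash process_nml.sh)

if [ "$direct" = "$from_script" ]
then
    echo "same output"
else
    echo "different output"
fi
```

Note that process_nml.sh itself is not processed by the loop, because its name does not match *.nml.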
Finding Things
See the Software Carpentry lesson on Finding Things.