\[ \newcommand\inv{^{-1}}\newcommand\invt{^{-t}} \newcommand\bbP{\mathbb{P}} \newcommand\bbR{\mathbb{R}} \newcommand\defined{ \mathrel{\lower 5pt \hbox{${\equiv\atop\mathrm{\scriptstyle D}}$}}} \]

23 Unix intro

Unix is an \indexacf{OS}, that is, a layer of software between the user or a user program and the hardware. It takes care of files and screen output, and it makes sure that many processes can exist side by side on one system. However, it is not immediately visible to the user. Most of the time that you use Unix, you are typing commands which are executed by an interpreter called the shell . The shell makes the actual OS calls. There are a few possible Unix shells available, but in this tutorial we will assume that you are using the sh or bash shell, although many commands are common to the various shells in existence.

This short tutorial will get you going; if you want to learn more about Unix and shell scripting, see for instance \url{http://www.tldp.org/guides.html}. Most of this tutorial will work on any Unix-like platform, including Cygwin on Windows. However, there is not just one Unix:

23.1 Files and such


In this section you will learn about the Unix file system, which consists of \indexterm{directories} that store \indexterm{files} . You will learn about \indexterm{executable} files and commands for displaying data files.

23.1.1 Looking at files


In this section you will learn commands for displaying file contents.

The \indextermunix{ls} command gives you a listing of files that are in your present location.

\practical {Type ls. Does anything show up?} {If there are files in your directory, they will be listed; if there are none, no output will be given. This is standard Unix behaviour: no output does not mean that something went wrong, it only means that there is nothing to report.}{}

The \indextermunix{cat} command is often used to display files, but it can also be used to create some simple content.

\practical {Type cat > newfilename (where you can pick any filename) and type some text. Conclude with Control-d on a line by itself\footnote {Press the \texttt{Control} key and hold it while you press the \texttt{d} key.}. Now use \n{cat} to view the contents of that file: cat newfilename.} {In the first use of \n{cat}, text was concatenated from the terminal to a file; in the second the file was cat'ed to the terminal output. You should see on your screen precisely what you typed into the file.} {Be sure to type Control-d as the first thing on the last line of input. If you really get stuck, Control-c will usually get you out. Try this: start creating a file with \n{cat > filename} and hit Control-c in the middle of a line. What are the contents of your file?}


Instead of \n{Control-d} you will often see the notation~\n{^D}. The capital letter is for historic reasons: you use the control key and the lowercase letter.

Above you used ls to get a directory listing. You can also use the ls command on specific files:

\practical {Do ls newfilename with the file that you created above; also do ls nosuchfile with a file name that does not exist.} {For an existing file you get the file name on your screen; for a non-existing file you get an error message.} {}

The ls command can give you all sorts of information.

\practical {Read the man page of the \n{ls} command: man ls. Find out the size and the time/date of the last change to some files, for instance the file you just created.} {Did you find the \n{ls -s} and ls -l options? The first one lists the size of each file, usually in kilobytes, the other gives all sorts of information about a file, including things you will learn about later.} {The man command puts you in a mode where you can view long text documents. This viewer is common on Unix systems (it is available as the more or less system command), so memorize the following ways of navigating: Use the space bar to go forward and the u key to go back up. Use \n{g} to go to the beginning of the text, and G for the end. Use \n{q} to exit the viewer. If you really get stuck, Control-c will get you out.}


There are several dates associated with a file, corresponding to changes in content, changes in permissions, and access of any sort. The \indextermunix{stat} command gives all of them.


If you already know what command you're looking for, you can use man to get online information about it. If you forget the name of a command, \indextermunix{man}~\n{-k keyword} can help you find it.

The \indextermunix{touch} command creates an empty file, or updates the timestamp of a file if it already exists. Use ls -l to confirm this behaviour.

Three more useful commands for files are: \indextermunix{cp} for copying, \indextermunix{mv} (short for `move') for renaming, and \indextermunix{rm} (`remove') for deleting. Experiment with them.
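As a starting point for such experiments, here is a minimal sketch of the three commands in sequence. The file names \n{original.txt}, \n{copy.txt} and \n{renamed.txt} are made up for this illustration; pick names that do not clash with files you care about, since \n{rm} deletes without asking.

```shell
# Create a small file to experiment with (hypothetical file names throughout).
echo "some text" > original.txt

cp original.txt copy.txt      # two files with identical contents now exist
mv copy.txt renamed.txt       # copy.txt is gone; renamed.txt takes its place
rm original.txt               # delete the first file; there is no undo

ls renamed.txt                # the renamed copy is still there
cat renamed.txt               # and still holds the original contents
```

Note that \n{mv} does not duplicate anything: after the second command only one copy exists, under its new name.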

There are more commands for displaying a file, parts of a file, or information about a file.

\practical{Do ls /usr/share/words or \n{ls /usr/share/dict/words} to confirm that a file with words exists on your system. Now experiment with the commands \n{head}, tail, more, and wc using that file.} { \indextermunix{head} displays the first couple of lines of a file, \indextermunix{tail} the last, and \indextermunix{more} uses the same viewer that is used for man pages. Read the man pages for these commands and experiment with increasing and decreasing the amount of output. The \indextermunix{wc} (`word count') command reports the number of words, characters, and lines in a file.}{}

Another useful command is \indextermunix{file}: it tells you what type of file you are dealing with. See what it tells you about one of the text files you just created.

23.1.2 Directories


Here you will learn about the Unix directory tree, how to manipulate it and how to move around in it.

A Unix file system is a tree of directories, where a directory is a container for files or more directories. We will display directories as follows:

\dirdisplay{.1 /\DTcomment{The root of the directory tree}. .2 bin\DTcomment{Binary programs}. .2 home\DTcomment{Location of users directories}. }

The root of the Unix directory tree is indicated with a slash. Do ls / to see what files and directories there are in the root. Note that the root is not the location where you start when you reboot your personal machine, or when you log in to a server.

\practical {The command to find out your current working directory is \indextermunix{pwd}. Your home directory is your working directory immediately when you log in. Find out your home directory.} {You will typically see something like /home/yourname or /Users/yourname. This is system dependent.}{}

Do ls to see the contents of the working directory. In the displays in this section, directory names will be followed by a slash:~dir/ but this character is not part of their name. You can get this output by using ls -F, and you can tell your shell to use this output consistently by stating alias ls='ls -F' at the start of your session. Example:

\dirdisplay{.1 /home/you/. .2 adirectory/. .2 afile. }

The command for making a new directory is \indextermunix{mkdir}.

\practical{Make a new directory with \indextermunix{mkdir}~newdir and view the current directory with ls} {You should see this structure: \dirdisplay{.1 /home/you/. .2 newdir/\DTcomment{the new directory}. } }{}

The command for going into another directory, that is, making it your working directory, is \indextermunix{cd} (`change directory'). It can be used in the following ways:

  • \n{cd} Without any arguments, cd takes you to your home directory.
  • \n{cd <absolute path>} An absolute path starts at the root of the directory tree, that is, starts with~\n{/}. The cd command takes you to that location.
  • \n{cd <relative path>} A relative path is one that does not start at the root. This form of the cd command takes you to that path as seen from your current working directory.

\practical{Do cd newdir and find out where you are in the directory tree with pwd. Confirm with ls that the directory is empty. How would you get to this location using an absolute path?} {\n{pwd} should tell you \n{/home/you/newdir}, and ls then has no output, meaning there is nothing to list. The absolute path is /home/you/newdir.}{}

\practical{Let's quickly create a file in this directory: \indextermunix{touch} \n{onefile}, and another directory: mkdir otherdir. Do ls and confirm that there are a new file and directory.} {You should now have: \dirdisplay{.1 /home/you/. .2 newdir/\DTcomment{you are here}. .3 onefile. .3 otherdir/. }}{}

The \n{ls} command has a very useful option: with ls -a you see your regular files and hidden files, which have a name that starts with a dot. Doing ls -a in your new directory should tell you that there are the following files:

\dirdisplay{.1 /home/you/. .2 newdir/\DTcomment{you are here}. .3 .. .3 ... .3 onefile. .3 otherdir/. }

The single dot is the current directory, and the double dot is the directory one level back.

\practical{Predict where you will be after cd ./otherdir/.. and check to see if you were right.} {The single dot sends you to the current directory, so that does not change anything. The otherdir part makes that subdirectory your current working directory. Finally, .. goes one level back. In other words, this command puts you right back where you started.}{}

Since your home directory is a special place, there are shortcuts for \n{cd}'ing to it: \n{cd} without arguments, \n{cd ~}, and cd \$HOME all get you back to your home.

Go to your home directory, and from there do ls newdir to check the contents of the first directory you created, without having to go there.

\practical {What does ls .. do?} {Recall that .. denotes the directory one level up in the tree: you should see your own home directory, plus the directories of any other users.}{}

\practical {Can you use ls to see the contents of someone else's home directory? In the previous exercise you saw whether other users exist on your system. If so, do \n{ls ../thatotheruser}.} {If this is your private computer, you can probably view the contents of the other user's directory. If this is a shared machine such as a university computer, the other directory may very well be protected -- permissions are discussed in the next section -- and you get ls: ../thatotheruser: Permission denied.}{}

Make an attempt to move into someone else's home directory with cd. Does it work?

You can make copies of a directory with cp, but you need to add a flag to indicate that you recursively copy the contents: \n{cp -r}. Make another directory somedir in your home so that you have

\dirdisplay{.1 /home/you/. .2 newdir/\DTcomment{you have been working in this one}. .2 somedir/\DTcomment{you just created this one}. }

What is the difference between \n{cp -r newdir somedir} and \n{cp -r newdir thirddir} where thirddir is not an existing directory name?

23.1.3 Permissions


In this section you will learn about how to give various users on your system permission to do (or not to do) various things with your files.

Unix files, including directories, have permissions, indicating `who can do what with this file'. Actions that can be performed on a file fall into three categories:

  • reading r: any access to a file (displaying, getting information on it) that does not change the file;
  • writing w: access to a file that changes its content, or even its metadata such as `date modified';
  • executing x: if the file is executable, to run it; if it is a directory, to enter it.

The people who can potentially access a file are divided into three classes too:

  • the user u: the person owning the file;
  • the group g: a group of users to which the owner belongs;
  • other o: everyone else.

These nine permissions are rendered in sequence \begin{equation} \begin{array} {|c|c|c|} \hline user&group&other\\ \hline rwx&rwx&rwx \\ \hline \end{array} \end{equation} For instance rw-r--r-- means that the owner can read and write a file, the owner's group and everyone else can only read.

Permissions are also rendered numerically in groups of three bits, by letting $\n{r}=4$, $w=2$, $x=1$: \begin{equation} \begin{array} {|c|} \hline rwx\\ \hline 421 \\ \hline \end{array} \end{equation} Common codes are $7=\n{rwx}$ and $6=rw$. You will find many files that have permissions $755$ which stands for an executable that everyone can run, but only the owner can change, or $644$ which stands for a data file that everyone can see but again only the owner can alter. You can set permissions by the \indextermunix{chmod} command:

  chmod <permissions> file         # just one file
  chmod -R <permissions> directory # directory, recursively
  chmod 766 file  # set to rwxrw-rw-
  chmod g+w file  # give group write permission
  chmod g=rx file # set group permissions
  chmod o-w file  # take away write permission from others
  chmod o=  file  # take away all permissions from others.
  chmod g+r,o-x file # give group read permission
                     # remove other execute permission
The man page gives all options.
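To see the numeric and symbolic forms side by side, you could try something like the following sketch. The file name \n{demo.txt} is an assumption made for the illustration, and \n{cut -c 1-10} (explained later in this tutorial) is only used to trim the \n{ls -l} output down to the permission string.

```shell
# Hypothetical scratch file; any file you own will do.
touch demo.txt

chmod 644 demo.txt                # numeric form: 6=rw- for user, 4=r-- for group and other
ls -l demo.txt | cut -c 1-10      # prints -rw-r--r--

chmod u+x,go-r demo.txt           # symbolic form: add x for user, remove r for group and other
ls -l demo.txt | cut -c 1-10      # prints -rwx------
```

The numeric form always sets all nine permission bits at once, while the symbolic form changes only the bits you name and leaves the others as they were.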

\practical {Make a file \n{foo} and do chmod u-r foo. Can you now inspect its contents? Make the file readable again, this time using a numeric code. Now make the file readable to your classmates. Check by having one of them read the contents.} {When you've made the file `unreadable' by yourself, you can still \n{ls} it, but not cat it: that will give a `permission denied' message.}{}

Make a file com with the following contents:

#!/bin/sh
echo "Hello world!"

This is a legitimate shell script. What happens when you type ./com? Can you make the script executable?

In the three permission categories it is clear who `you' and `others' refer to. How about `group'? The command \indextermunix{groups} tells you all the groups you are in, and ls -l tells you what group a file belongs to. Analogous to chmod, you can use \indextermunix{chgrp} to change the group to which a file belongs, to share it with a user who is also in that group. Adding a user to a group sometimes needs system privileges.

23.1.4 Wildcards

You already saw that ls filename gives you information about that one file, and ls gives you all files in the current directory. To see files with certain conditions on their names, the wildcard mechanism exists. The following wildcards exist:

  • [*] any number of characters.
  • [?] any character.


[] ls
s       sk      ski     skiing  skill
[] ls ski*
ski     skiing  skill
The second command lists all files whose name starts with `ski, followed by any number of other characters'; below you will see that in different contexts \n{ski*} means `sk followed by any number of i characters'. Confusing, but that's the way it is.

23.2 Text searching and regular expressions


In this section you will learn how to search for text in files.

For this section you need at least one file that contains some amount of text. You can for instance get random text from \url{http://www.lipsum.com/feed/html}.

The \indextermunix{grep} command can be used to search for a text expression in a file.

\practical{Search for the letter q in your text file with \n{grep q yourfile} and search for it in all files in your directory with grep q *. Try some other searches.} {In the first case, you get a listing of all lines that contain a~\n{q}; in the second case, grep also reports what file name the match was found in: qfile:this line has q in it.} {If the string you are looking for does not occur, grep will simply not output anything. Remember that this is standard behaviour for Unix commands if there is nothing to report.}

In addition to searching for literal strings, you can look for more general expressions.

\begin{tabular} {|l|l|} \hline \verb+^+&the beginning of the line\\ \verb+$+&the end of the line\\ \verb+.+&any character\\ \verb+*+&any number of repetitions \\ \verb+[xyz]+&any of the characters xyz\\ \hline \end{tabular}

This looks like the wildcard mechanism you just saw (section  23.1.4 ) but it's subtly different. Compare the example above with:

In the second case you search for a string consisting of sk and any number of i characters, including zero of them.
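To make this difference concrete, you could try a sketch along the following lines. The file \n{words.txt} and its contents are invented for the illustration; they are not part of any standard system.

```shell
# Hypothetical test file with one word per line.
printf 'sk\nski\nskill\nskiing\nsnow\n' > words.txt

# As a regular expression, ski* means: s, k, then zero or more i characters.
# Every line that contains "sk" matches, including the line that is just "sk".
grep "ski*" words.txt

# To insist on at least one i, write one i explicitly before the starred one.
grep "skii*" words.txt
```

The first search reports four lines (everything except \n{snow}); the second no longer reports the line \n{sk}. As a file name wildcard, by contrast, \n{ski*} would never match a file called \n{sk}.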

Some more examples: you can find

  • All lines that contain the letter `q' with \verb+grep q yourfile+;

  • All lines that start with an `a' with \verb+grep "^a" yourfile+ (if your search string contains special characters, it is a good idea to use quote marks to enclose it);

  • All lines that end with a digit with \verb+grep "[0-9]$" yourfile+.

\practical{Construct the search strings for finding

  • lines that start with an uppercase character, and

  • lines that contain exactly one character.

}{For the first, use the range characters [], for the second use the period to match any character.}{}

\practical{Add a few lines x = 1, \n{x {} = 2}, \n{x {} {} = 3} (that is, have different numbers of spaces between x and the equals sign) to your test file, and make grep commands to search for all assignments to~x.}{}{}

The characters in the table above have special meanings. If you want to search for that actual character, you have to escape it.

\practical{Make a test file that has both \n{abc} and a.c in it, on separate lines. Try the commands \n{grep "a.c" file}, grep a\\.c file, grep "a\\.c" file.} {You will see that the period needs to be escaped, and the search string needs to be quoted. In the absence of either, you will see that \n{grep} also finds the abc string.}{}

23.2.1 Stream editing with \texttt{sed}


Unix has various tools for processing text files on a line-by-line basis. The stream editor \indextermunix{sed} is one example. If you have used the vi editor, you are probably used to a syntax like \verb+s/foo/bar/+ for making changes. With sed, you can do this on the commandline. For instance

sed 's/foo/bar/' myfile > mynewfile
will apply the substitute command s/foo/bar/ to every line of myfile. The output is shown on your screen so you should capture it in a new file; see section  23.3.2 for more on output redirection .
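One detail that is easy to miss: without further flags, the substitute command only replaces the first match on each line. The following sketch shows the difference the \n{g} (`global') flag makes; the file contents and the name \n{mynewfile\_g} are assumptions made for the illustration.

```shell
# Hypothetical input: the first line contains foo twice.
printf 'foo and foo\nno match here\n' > myfile

sed 's/foo/bar/'  myfile > mynewfile      # first occurrence on each line only
sed 's/foo/bar/g' myfile > mynewfile_g    # g flag: every occurrence on each line

cat mynewfile      # first line reads: bar and foo
cat mynewfile_g    # first line reads: bar and bar
```

In both cases \n{myfile} itself is left untouched, because sed writes its result to standard output.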

23.2.2 Cutting up lines with \texttt{cut}


Another tool for editing lines is \indextermunix{cut}, which will cut up a line and display certain parts of it. For instance,

cut -c 2-5 myfile
will display the characters in position 2--5 of every line of myfile. Make a test file and verify this example.

Maybe more useful, you can give cut a delimiter character and have it split a line on occurrences of that delimiter. For instance, your system will most likely have a file /etc/passwd that contains user information\footnote{This is traditionally the case; on Mac OS information about users is kept elsewhere and this file only contains system services.}, with every line consisting of fields separated by colons. For instance:

daemon:*:1:1:System Services:/var/root:/usr/bin/false
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
The seventh and last field is the login shell of the user; a value such as /usr/bin/false (or /bin/false) indicates that the user is unable to log in.

You can display users and their login shells with:

cut -d ":" -f 1,7 /etc/passwd
This tells cut to use the colon as delimiter, and to print fields 1 and 7.
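If you would rather not experiment on the real password file, a small stand-in works just as well. In this sketch the file name \n{passwd\_sample} and the two accounts in it are invented.

```shell
# Hypothetical sample in the same colon-separated layout as /etc/passwd.
printf 'alice:*:501:20:Alice Example:/home/alice:/bin/bash\n' >  passwd_sample
printf 'svc:*:99:99:Service:/var/empty:/usr/bin/false\n'      >> passwd_sample

cut -d ":" -f 1,7 passwd_sample   # user name and login shell, still joined by a colon
cut -d ":" -f 6   passwd_sample   # only the home directory
```

When several fields are selected, cut prints them separated by the same delimiter it used for splitting, so the first command gives lines such as \n{alice:/bin/bash}.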

23.3 Command execution


23.3.1 Search paths


In this section you will learn how Unix determines what to do when you type a command name.

If you type a command such as ls, the shell does not just rely on a list of commands: it will actually go searching for a program by the name ls. This means that you can have multiple different commands with the same name, and which one gets executed depends on which one is found first.

\practical{What you may think of as `Unix commands' are often just executable files in a system directory. Do \indextermunix{which} ls, and do an ls -l on the result} {The location of \n{ls} is something like /bin/ls. If you ls that, you will see that it is probably owned by root. Its executable bits are probably set for all users.}{}

The list of locations where unix searches for commands is the `search path', which is stored in the environment variable PATH (for more details see below).

\practical{Do \n{echo \$PATH}. Can you find the location of cd? Are there other commands in the same location? Is the current directory `.' in the path? If not, do \n{export PATH=".:\$PATH"}. Now create an executable file cd in the current directory (see above for the basics), and do cd.} {The path will be a list of colon-separated directories,\\ for instance /usr/bin:/usr/local/bin:/usr/X11R6/bin. If the working directory is in the path, it will probably be at the end: /usr/X11R6/bin:. but most likely it will not be there. If you put `.' at the start of the path, unix will find the local cd command before the system one.}{}

Some people consider having the working directory in the path a security risk. If your directory is writable, someone could put a malicious script named cd (or any other system command) in your directory, and you would execute it unwittingly.

It is possible to define your own commands as aliases of existing commands.

\practical{Do alias chdir=cd and convince yourself that now \n{chdir} works just like \n{cd}. Do alias rm='rm -i'; look up the meaning of this in the man pages. Some people find this alias a good idea; can you see why?} {The \n{-i} `interactive' option for rm makes the command ask for confirmation before each delete. Since unix does not have a trashcan that needs to be emptied explicitly (as on Windows or the Mac OS), this can be a good idea.}{}

23.3.2 Redirection


In this section you will learn how to feed one command into another, and how to connect commands to input and output files.

So far, the unix commands you have used have taken their input from your keyboard, or from a file named on the command line; their output went to your screen. There are other possibilities for providing input from a file, or for storing the output in a file.

Input redirection

The grep command had two arguments, the second being a file name. You can also write grep string < yourfile, where the less-than sign means that the input will come from the named file, \n{yourfile}. This is known as input redirection.

Output redirection

Conversely, grep string yourfile > outfile will take what normally goes to the terminal, and redirect the output to outfile. The output file is created if it didn't already exist, otherwise it is overwritten. (To append, use grep text yourfile >> outfile.)

\practical{Take one of the grep commands from the previous section, and send its output to a file. Check that the contents of the file are identical to what appeared on your screen before. Search for a string that does not appear in the file and send the output to a file. What does this mean for the output file?} {Searching for a string that does not occur in a file gives no terminal output. If you redirect the output of this grep to a file, it gives a zero size file. Check this with \n{ls} and wc.}{}

Standard files

Unix has three standard files that handle input and output:

  • [{\tt stdin}\ ] is the file that provides input for processes.

  • [{\tt stdout}] is the file where the output of a process is written.

  • [{\tt stderr}] is the file where error output is written.

In an interactive session, all three files are connected to the user terminal. Using input or output redirection then means that the input is taken or the output sent to a different file than the terminal.
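The plain \n{>} redirection only affects standard output; error messages still go to the terminal. The sketch below shows how the two output streams can be sent to separate files with \n{2>}, or merged with \n{2>\&1}. The file names are made up, and \n{nosuchfile} is assumed not to exist.

```shell
# One file that exists and one that does not (hypothetical names).
touch existing_file
rm -f nosuchfile

# stdout goes to out.txt, stderr goes to err.txt.
# ls exits with a nonzero status here, which the || branch reports.
ls existing_file nosuchfile > out.txt 2> err.txt || echo "ls reported a problem"

cat out.txt    # the name of the file that exists
cat err.txt    # an error message about nosuchfile; the wording differs between systems

# 2>&1 sends stderr to wherever stdout is going at that point, here both.txt.
ls existing_file nosuchfile > both.txt 2>&1 || echo "ls reported a problem"
```

The order matters in the last command: the \n{2>\&1} has to come after the redirection of standard output, since it refers to where standard output points at that moment.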

23.3.3 Command sequencing


There are various ways of having multiple commands on a single commandline.

Simple sequencing

First of all, you can type

command1 ; command2
This is convenient if you repeat the same two commands a number of times: you only need to up-arrow once to repeat them both.

There is a problem: if you type

cc -o myprog myprog.c ; ./myprog
and the compilation fails, the program will still be executed, using an old version of the executable if that exists. This is very confusing.

A better way is:

cc -o myprog myprog.c && ./myprog
which only executes the second command if the first one was successful.

Pipelining

Instead of taking input from a file, or sending output to a file, it is possible to connect two commands together, so that the second takes the output of the first as input. The syntax for this is \verb+cmdone | cmdtwo+; this is called a pipeline. For instance, \verb+grep a yourfile | grep b+ finds all lines that contain both an \n{a} and a~b.

\practical{Construct a pipeline that counts how many lines there are in your file that contain the string \n{th}. Use the wc command (see above) to do the counting.} {}{}

Backquoting

There are a few more ways to combine commands. Suppose you want to present the result of wc a bit nicely. Type the following command

echo The line count is wc -l foo
where foo is the name of an existing file. The way to get the actual line count echoed is by the backquote:
echo The line count is `wc -l foo`
Anything in between backquotes is executed before the rest of the command line is evaluated.
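The shells assumed in this tutorial, sh and bash, accept a second notation for the same mechanism, \verb+$(command)+, which is easier to read when substitutions are nested. The following sketch compares the two; the file \n{countme.txt} is invented, and \n{grep -c ""} is used here merely as one way of counting lines.

```shell
# Hypothetical three-line file.
printf 'one\ntwo\nthree\n' > countme.txt

# Both forms run the inner command first and paste its output into the line.
lines_bq=`grep -c "" countme.txt`
lines_dp=$(grep -c "" countme.txt)
echo "The line count is $lines_bq"
echo "The line count is $lines_dp"

# Nesting: the innermost substitution is evaluated first.
upper=$(echo "$(head -n 1 countme.txt)" | tr a-z A-Z)
echo "First line in capitals: $upper"
```

With backquotes the nested case would need the inner backquotes to be escaped, which is why many people prefer the dollar-parenthesis form in scripts.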

Exercise The way wc is used here, it prints the file name. Can you find a way to prevent that from happening?

Grouping in a subshell

Suppose you want to apply output redirection to a couple of commands in a row:

  configure ; make ; make install > installation.log 2>&1
This only catches the last command. You could for instance group the three commands in a subshell and catch the output of that:
  ( configure ; make ; make install ) > installation.log 2>&1
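You can convince yourself of the difference without building any software. This sketch replaces the three build steps by \n{echo} commands; the log file names and the directory \n{subshell\_demo} are assumptions made for the illustration.

```shell
# Without grouping, the redirection applies to the last command only.
echo "step one" ; echo "step two" ; echo "step three" > last_only.log

# With grouping, the output of all three commands is captured.
( echo "step one" ; echo "step two" ; echo "step three" ) > steps.log 2>&1

cat last_only.log   # contains only: step three
cat steps.log       # contains all three lines

# A subshell is a separate process: a cd inside it does not affect your own shell.
mkdir -p subshell_demo
( cd subshell_demo ; pwd )   # prints a path ending in subshell_demo
pwd                          # you are still where you were
```

The last two commands illustrate a side effect of the parentheses that is sometimes useful in its own right: changes to the working directory made inside the subshell are gone when it finishes.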

23.3.4 Exit status


Commands can fail. If you type a single command on the command line, you see the error, and you act accordingly when you type the next command. When that failing command happens in a script, you have to tell the script how to act accordingly. For this, you use the exit status of the command: this is a value (zero for success, nonzero otherwise) that is stored in an internal variable, and that you can access with \verb+$?+.

Example. Suppose we have a directory that is not writable

[testing] ls -ld nowrite/
dr-xr-xr-x  2 eijkhout  506  68 May 19 12:32 nowrite//
[testing] cd nowrite/
and try to create a file there:
[nowrite] cat ../newfile 
touch $1
echo "Created file: $1"
[nowrite] newfile myfile
bash: newfile: command not found
[nowrite] ../newfile myfile
touch: myfile: Permission denied
Created file: myfile
[nowrite] ls
The script reports that the file was created even though it wasn't.

Improved script:

[nowrite] cat ../betterfile
touch $1
if [ $? -eq 0 ] ; then
    echo "Created file: $1"
else
    echo "Problem creating file: $1"
fi

[nowrite] ../betterfile myfile
touch: myfile: Permission denied
Problem creating file: myfile
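Since \n{if} simply looks at the exit status of whatever command follows it, the test on \verb+$?+ can also be folded into the \n{if} line itself. The sketch below shows that variant; the path in \n{target} is a deliberately nonexistent location chosen for the illustration, so that \n{touch} is bound to fail.

```shell
# Hypothetical location that does not exist, so creating the file fails.
target=/nonexistent_dir_for_demo/myfile

# 'if' runs the command and takes the 'then' branch when its exit status is zero.
# The error message of touch is discarded, since we print our own.
if touch "$target" 2>/dev/null ; then
    result="Created file: $target"
else
    # Inside the else branch, $? still holds the nonzero status of touch.
    result="Problem creating file: $target (exit status $?)"
fi
echo "$result"
```

This form avoids a common mistake with the two-step version: any command inserted between \n{touch} and the test would overwrite \verb+$?+.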

23.3.5 Processes


The Unix operating system can run many programs at the same time, by rotating through the list and giving each only a fraction of a second to run each time. The command \indextermunix{ps} can tell you everything that is currently running.

\practical{Type ps. How many programs are currently running? By default ps gives you only programs that you explicitly started. Do ps guwax for a detailed list of everything that is running. How many programs are running? How many belong to the root user, how many to you?} {To count the programs belonging to a user, pipe the ps command through an appropriate \n{grep}, which can then be piped to wc.} {}

In this long listing of ps, the second column contains the process numbers. Sometimes it is useful to have those. The cut command explained above can cut certain positions from a line: type ps guwax | cut -c 10-14.

To get dynamic information about all running processes, use the top command. Read the man page to find out how to sort the output by CPU usage.

When you type a command and hit return, that command becomes, for the duration of its run, the \indexterm{foreground process}. Everything else that is running at the same time is a background process .

Make an executable file hello with the following contents:

#!/bin/sh
while [ 1 ] ; do
  sleep 2
  date
done

and type ./hello.

\practical{Type Control-z. This suspends the foreground process. It will give you a number like \n{[1]} or [2] indicating that it is the first or second program that has been suspended or put in the background. Now type bg to put this process in the background. Confirm that there is no foreground process by hitting return, and doing an ls.} {After you put a process in the background, the terminal is available again to accept foreground commands. If you hit return, you should see the command prompt. However, the background process still keeps generating output.}{}

\practical{Type jobs to see the processes in the current session. If the process you just put in the background was number 1, type fg \%1. Confirm that it is a foreground process again.} {If a shell is executing a program in the foreground, it will not accept command input, so hitting return should only produce blank lines.}{}

\practical{When you have made the hello script a foreground process again, you can kill it with Control-c. Try this. Start the script up again, this time as ./hello \& which immediately puts it in the background. You should also get output along the lines of [1] 12345 which tells you that it is the first job you put in the background, and that 12345 is its process ID. Kill the script with kill \%1. Start it up again, and kill it by using the process number.} {The command kill 12345 using the process number is usually enough to kill a running program. Sometimes it is necessary to use \n{kill -9 12345}.}{}
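In a script you cannot read the process ID off the screen, but the shell keeps it for you in the variable \verb+$!+. The following sketch uses a harmless \n{sleep} as a stand-in for a long-running program; the variable names \n{pid} and \n{still\_running} are chosen for the illustration.

```shell
sleep 30 &            # start a long-running command in the background
pid=$!                # $! is the process ID of the most recent background command

# Signal 0 does not kill anything; it only tests whether the process exists.
kill -0 "$pid" && echo "process $pid is running"

kill "$pid"                        # ask the process to terminate
wait "$pid" 2>/dev/null || true    # wait for it to be gone; its status is nonzero

if kill -0 "$pid" 2>/dev/null ; then
    still_running=yes
else
    still_running=no
fi
echo "still running: $still_running"
```

The \n{wait} command makes the shell pause until the background process has actually ended, so that the final check does not race against the termination.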

23.3.6 Shell customization

Above it was mentioned that ls -F is an easy way to see which files are regular, executable, or directories; by typing \n{alias ls='ls -F'} the ls command will automatically be expanded to \n{ls -F} every time it is invoked. If you would like this behaviour in every login session, you can add the alias command to your \n{.profile} file. Other shells than \n{sh}/bash have other files for such customizations.

23.4 Scripting


The unix shells are also programming environments. You will learn more about this aspect of unix in this section.

23.4.1 Shell environment variables

Above you encountered PATH, which is an example of a shell, or environment, variable. These are variables that are known to the shell and that can be used by all programs run by the shell. You can see the full list of all variables known to the shell by typing env.

You can get the value of a shell variable by prefixing it with a dollar sign. Type the following two commands and compare the output:

echo PATH
echo $PATH

\practical{Check on the value of the HOME variable by typing \n{echo \$HOME}. Also find the value of \n{HOME} by piping env through grep.}{}{}

Environment variables can be set in a number of ways. The simplest is by an assignment as in other programming languages.

\practical{Type a=5 on the commandline. This defines a variable \n{a}; check on its value by using the echo command.}{The shell will respond by typing the value~5.}{Be careful not to have spaces around the equals sign; also be sure to use the dollar sign to print the value.}

A variable set this way will be known to all subsequent commands you issue in this shell, but not to commands in new shells you start up. For that you need the \indextermtt{export} command. Reproduce the following session (the square brackets form the command prompt):

[] a=20
[] echo $a
[] /bin/bash
[] echo $a

[] exit
[] export a=21
[] /bin/bash
[] echo $a
[] exit
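The same experiment can be sketched non-interactively. This is an illustration rather than part of the session above: it assumes that \n{bash -c} is used to start the child shell, it borrows the \n{\$\{a:-unset\}} form (see the section on expansion) to print a placeholder when the variable is not defined, and the names seen_before and seen_after are made up for the example.

```shell
unset a                                        # start from a clean slate
a=20                                           # ordinary shell variable, not exported
seen_before=$(bash -c 'echo "${a:-unset}"')    # what a child shell sees
echo "child shell before export: $seen_before"

export a=21                                    # now part of the environment
seen_after=$(bash -c 'echo "${a:-unset}"')
echo "child shell after export: $seen_after"
```

The first child shell reports the placeholder, the second one reports the exported value.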

You can also temporarily set a variable. Replay this scenario: \begin{enumerate}

  • Find an environment variable that does not have a value:

    [] echo $b

  • Write a short shell script to print this variable:

    [] cat > echob
    echo $b
    and of course make it executable: chmod +x echob.

  • Now call the script, preceding it with a setting of the variable~b:

    [] b=5 ./echob
    5

    The syntax where you set the value, as a prefix without using a separate command, sets the value just for that one command.

  • Show that the variable is still undefined:

    [] echo $b
    That is, you defined the variable just for the execution of a single command. \end{enumerate}
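The scenario can also be condensed into a few lines. This is only a sketch: instead of the echob script it assumes a child shell started with \n{bash -c}, and the variable names during and after are made up for the example.

```shell
unset b                                        # make sure b has no value
during=$(b=5 bash -c 'echo "${b:-unset}"')     # b exists only for this one command
after=${b:-unset}                              # the current shell never received b
echo "during the command: $during"
echo "afterwards: $after"
```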

    In section  23.4.2 you will see that the for construct also defines a variable; section  23.4.3 shows some more built-in variables that apply in shell scripts.

    23.4.2 Control structures

    Like any good programming system, the shell has some control structures. Their syntax takes a bit of getting used to. (Different shells have different syntax; in this tutorial we only discuss the bash shell.)

    In the bash shell, control structures can be written over several lines:

    if [ "$PATH" = "" ] ; then
      echo "Error: path is empty"
    fi
    or on a single line:
    if [ `wc -l < file` -gt 100 ] ; then echo "file too long" ; fi
    There are a number of tests defined, for instance -f somefile tests for the existence of a file. Change your script so that it will report -1 if the file does not exist.
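As an illustration of such tests, here is a minimal sketch that combines -f with the related -d test (true for a directory) in an if/elif/else chain. The function name classify and the scratch directory are hypothetical, chosen only for this example.

```shell
# Hypothetical helper: report what kind of thing a path is.
classify() {
  if [ -f "$1" ] ; then
    echo "regular file"
  elif [ -d "$1" ] ; then
    echo "directory"
  else
    echo "neither"
  fi
}

scratch=$(mktemp -d)             # temporary directory for the demonstration
touch "$scratch/data.txt"
classify "$scratch/data.txt"     # a regular file
classify "$scratch"              # a directory
classify "$scratch/nothing"      # does not exist
rm -rf "$scratch"
```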

    There are also loops. A~for loop looks like

    for var in listofitems ; do
      something with $var
    done

    This does the following:

    • for each item in \n{listofitems}, the variable var is set to the item, and

    • the loop body is executed.

    As a simple example:
    [] for x in a b c ; do echo $x ; done
    In a more meaningful example, here is how you would make backups of all your~.c files:
    for cfile in *.c ; do
      cp "$cfile" "$cfile.bak"
    done
    Shell variables can be manipulated in a number of ways. Execute the following commands to see that you can remove trailing characters from a variable:
    [] a=b.c
    [] echo ${a%.c}
    With this as a hint, write a loop that renames all your .c files to~.x files.

    23.4.3 Scripting

    It is possible to write programs of Unix shell commands. First you need to know how to put a program in a file and have it be executed. Make a file script1 containing the following two lines:

    #!/bin/bash
    echo "hello world"
    and type ./script1 on the command line. Result? Make the file executable and try again.

    You can give your script command line arguments. If you want to be able to call

    ./script1 foo bar
    you can use variables \verb+$1+,\verb+$2+ et cetera in the script:
    echo "The first argument is $1"
    echo "There were $# arguments in all"

    Write a script that takes as input a file name argument, and reports how many lines are in that file.

    Edit your script to test whether the file has less than 10 lines (use the \n{foo -lt bar} test), and if it does, cat the file. Hint: you need to use backquotes inside the test.

    The number of command line arguments is available as \verb+$#+. Add a test to your script so that it will give a helpful message if you call it without any arguments.
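Positional parameters behave the same way inside a shell function, which makes them easy to try out without creating a file. The sketch below is an illustration only: the function name show_args is made up, and it uses \verb+"$@"+, the standard way to refer to all arguments at once, which is not discussed in the text above.

```shell
# Hypothetical helper: report the arguments it was called with.
show_args() {
  echo "There were $# arguments in all"
  for arg in "$@" ; do           # "$@" keeps each argument intact, spaces included
    echo "argument: $arg"
  done
}

show_args foo "bar baz"
```

Because of the quotes, \n{bar baz} counts as a single argument, so the function reports two arguments.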

    23.5 Expansion

    The shell performs various kinds of expansion on a command line, that is, replacing part of the commandline with different text.

    Brace expansion:

    [] echo a{b,cc,ddd}e
    abe acce addde
    This can for instance be used to delete all files that share a base name but have different extensions:
    [] rm tmp.{c,s,o}  # delete tmp.c tmp.s tmp.o
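Since brace expansion is purely textual, it is safe to experiment with echo before using it with rm. The following sketch assumes bash (brace expansion is not available in every sh); the file names are hypothetical and need not exist.

```shell
#!/bin/bash
# Brace expansion happens before any file name lookup.
backup_names=$(echo report.{txt,pdf}.bak)
echo "$backup_names"          # report.txt.bak report.pdf.bak

# Two brace groups combine into every pairing.
grid=$(echo {a,b}{1,2})
echo "$grid"                  # a1 a2 b1 b2
```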

    Tilde expansion gives your own, or someone else's home directory:

    [] echo ~
    [] echo ~eijkhout

    Parameter expansion gives the value of shell variables:

    [] x=5
    [] echo $x
    Undefined variables do not give an error message:
    [] echo $y
    There are many variations on parameter expansion. Above you already saw that you can strip trailing characters:
    [] a=b.c
    [] echo ${a%.c}
    Here is how you can deal with undefined variables:
    [] echo ${y:-0}
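A few more variations are sketched below as an illustration. The forms with \verb+##+, \verb+#+ and \verb+:=+ are standard shell parameter expansions that are not covered in the text above; the path and the variable names are hypothetical.

```shell
path=/home/user/src/main.c      # hypothetical file name

echo "${path%.c}"               # strip a trailing .c:       /home/user/src/main
echo "${path##*/}"              # strip the longest */ prefix: main.c
echo "${#path}"                 # length of the value in characters

unset count
echo "${count:-0}"              # prints 0, count stays undefined
echo "${count:=10}"             # prints 10 and also assigns it
echo "$count"                   # prints 10
```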

    The backquote mechanism (section  23.3.2 above) is known as command substitution. It allows you to evaluate part of a command and use it as input for another. For example, if you want to ask what type of file the command ls is, do

    [] file `which ls`
    This first evaluates \n{which ls}, giving /bin/ls, and then evaluates file /bin/ls. As another example, here we backquote a whole pipeline, and do a test on the result:
    [] echo 123 > w
    [] cat w
    [] wc -c w
           4 w
    [] if [ `cat w | wc -c` -eq 4 ] ; then echo four ; fi
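Command substitution can also be written as \verb+$(command)+, which is equivalent to backquotes and easier to nest. The sketch below is an illustration; uname, dirname and \n{command -v} are standard utilities that the text above does not use, and the variable names are made up.

```shell
kernel=$(uname -s)                         # same effect as `uname -s`
echo "running on $kernel"

# The $( ) form nests without extra escaping:
shell_dir=$(dirname "$(command -v sh)")    # directory that holds the sh executable
echo "sh lives in $shell_dir"
```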

    Unix shell programming is very much oriented towards text manipulation, but it is possible to do arithmetic. Arithmetic substitution (called arithmetic expansion in the bash manual) tells the shell to treat the expansion of a parameter as a number:

    [] x=1
    [] echo $((x*2))

    Integer ranges can be used as follows:

    [] for i in {1..10} ; do echo $i ; done
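Ranges and arithmetic combine naturally. The following sketch assumes bash; the C-style loop in the second half is a bash-specific alternative that is not discussed above, shown here because a brace range cannot take a variable as its bound.

```shell
#!/bin/bash
sum=0
for i in {1..10} ; do
  sum=$((sum + i))        # arithmetic on the loop variable
done
echo "$sum"               # 55

# Brace ranges are expanded before variables, so {1..$n} does not work;
# a C-style loop handles a variable bound.
n=4
product=1
for (( i=1 ; i<=n ; i++ )) ; do
  product=$((product * i))
done
echo "$product"           # 24
```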

    23.6 Startup files

    In this tutorial you have seen several mechanisms for customizing the behaviour of your shell. For instance, by setting the PATH variable you can extend the locations where the shell looks for executables. You can also introduce other environment variables (section  23.4.1 ) for your own purposes. Many of these customizations need to apply to every session, so you can have shell startup files that will be read at the start of any session.

    Unfortunately, there are several startup files, and which one gets read is a complicated function of circumstances. Here is a good common sense guideline\footnote{Many thanks to Robert McLay for figuring this out.}:

    • Have a \n{.profile} that does nothing but read the .bashrc:

      # ~/.profile
      if [ -f ~/.bashrc ]; then
        source ~/.bashrc
      fi

    • Your .bashrc does the actual customizations:

      # ~/.bashrc
      # make sure your path is updated
      if [ -z "$MYPATH" ]; then
        export MYPATH=1
        export PATH=$HOME/bin:$PATH
      fi
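The point of the MYPATH guard is that the file can be read more than once without PATH growing each time. The sketch below tries this out on a throwaway copy instead of your real startup file; the scratch directory and the names once and twice are hypothetical, and the dot command is used as the portable spelling of source.

```shell
scratch=$(mktemp -d)
cat > "$scratch/rc" <<'EOF'
if [ -z "$MYPATH" ]; then
  export MYPATH=1
  export PATH=$HOME/bin:$PATH
fi
EOF

unset MYPATH
original_path=$PATH
. "$scratch/rc"                       # first read: PATH is extended
once=$PATH
. "$scratch/rc"                       # second read: the guard is set, nothing changes
twice=$PATH

PATH=$original_path ; unset MYPATH    # undo the experiment
rm -rf "$scratch"
echo "$once"
echo "$twice"
```

Both echo commands print the same value, with \n{\$HOME/bin} prepended exactly once.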

    23.7 Shell interaction

    Interactive use of Unix, in contrast to script writing (section  23.4 ), is a complicated conversation between the user and the shell. You, the user, type a line, hit return, and the shell tries to interpret it. There are several cases.

    • Your line contains one full command, such as ls foo: the shell will execute this command.

    • You can put more than one command on a line, separated by semicolons: mkdir foo; cd foo. The shell will execute these commands in sequence.

    • Your input line is not a full command, for instance \n{while [ 1 ]}. The shell will recognize that there is more to come, and use a different prompt to show you that it is waiting for the remainder of the command.

    • Your input line would be a legitimate command, but you want to type more on a second line. In that case you can end your input line with a backslash character, and the shell will recognize that it needs to hold off on executing your command. In effect, the backslash will hide ( escape ) the return.
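Two of these cases also apply inside scripts, where they are easy to try out. A small sketch, with made-up variable names:

```shell
# Backslash at the end of a line: one command spread over two lines.
joined=$(echo one two \
              three)
echo "$joined"

# Semicolon: two commands on one line, executed in sequence.
sequence=$(echo first ; echo second)
echo "$sequence"
```

The first echo receives three arguments and prints them on one line; the second substitution captures the output of both commands, one per line.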

    When the shell has collected a command line to execute, by using one or more of your input lines or only part of one, as described just now, it will apply expansion to the command line (section  23.5 ). It will then interpret the commandline as a command and arguments, and proceed to invoke that command with the arguments as found.

    There are some subtleties here. If you type ls *.c, then the shell will recognize the wildcard character and expand it to a command line, for instance \n{ls foo.c bar.c}. Then it will invoke the ls command with the argument list \n{foo.c bar.c}. Note that ls does not receive *.c as argument! In cases where you do want the Unix command to receive an argument with a wildcard, you need to escape it so that the shell will not expand it. For instance, \n{find . -name \\*.c} will make the shell invoke find with arguments \n{. -name *.c}.
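You can observe what a command actually receives by counting its arguments. The sketch below is an illustration in a scratch directory; the helper count_args is hypothetical, and the quoted variant is an alternative to the backslash that is not mentioned above.

```shell
count_args() { echo $# ; }      # hypothetical helper: prints how many arguments it got

scratch=$(mktemp -d)
old_dir=$PWD
cd "$scratch" || exit 1
touch foo.c bar.c

expanded=$(count_args *.c)      # the shell expands the wildcard: two arguments
escaped=$(count_args \*.c)      # the backslash hides it: one literal argument
quoted=$(count_args '*.c')      # quotes have the same effect
echo "$expanded $escaped $quoted"

cd "$old_dir" || exit 1
rm -rf "$scratch"
```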

    23.8 The system and other users

    Unix is a multi-user operating system. Thus, even if you use it on your own personal machine, you are a user with an account and you may occasionally have to type in your username and password.

    If you are on your personal machine, you may be the only user logged in. On university machines or other servers, there will often be other users. Here are some commands relating to them.

    • [ \indextermunix{whoami}] show your login name.

    • [ \indextermunix{who}] show the users currently logged in.

    • [ \indextermunix{finger} {\tt otheruser}] get information about another user; you can specify a user's login name here, or their real name, or other identifying information the system knows about.

    • [ \indextermunix{top}] which processes are running on the system; use top -u to get this sorted by the amount of cpu time they are currently taking. (On Linux, try also the vmstat command.)

    • [ \indextermunix{uptime}] how long has it been since your last reboot?

    23.9 The {\tt sed} and {\tt awk} tools

    Apart from fairly small utilities such as \n{tr} and cut, Unix has some more powerful ones. In this section you will see two tools for line-by-line transformations on text files. Of course this tutorial merely scratches the surface of these tools; for more information see  [AWK:awk,OReilly:sedawk] .

    23.9.1 \tt sed

    The stream editor sed is like an editor by remote control, doing simple line edits with a commandline interface. Most of the time you will use sed as follows:

    cat somefile | sed 's/abc/def/g' > newfile
    (The use of cat here is not strictly necessary.) The \n{s/abc/def/} part has the effect of replacing \n{abc} by def in every line; the g modifier applies it to every instance in every line rather than just the first.
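The effect of the g modifier is easiest to see on a line with several matches. A small sketch, with sample text and variable names made up for the illustration:

```shell
first_only=$(echo "abc abc abc" | sed 's/abc/def/')     # only the first match per line
every_one=$(echo "abc abc abc" | sed 's/abc/def/g')     # all matches on the line
echo "$first_only"     # def abc abc
echo "$every_one"      # def def def
```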

    • If you have more than one edit, you can specify them with

      sed -e 's/one/two/' -e 's/three/four/'

    • If an edit needs to be done only on certain lines, you can specify that by prefixing the edit with the match string. For instance

      sed '/^a/s/b/c/'
      only applies the edit on lines that start with an~a. (See section  23.2 for regular expressions.)

    • Traditionally, sed could only function in a stream, so the output file was always different from the input. The GNU version, which is standard on Linux systems, has a flag -i which edits `in place':

      sed -e 's/ab/cd/' -e 's/ef/gh/' -i thefile
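The first two variants can be combined. The sketch below feeds two made-up lines through sed with one edit restricted to lines starting with an~a and one unrestricted edit; it reads from a pipe, so it does not depend on the -i flag.

```shell
result=$(printf 'ab one\nba three\n' | sed -e '/^a/s/b/c/' -e 's/three/four/')
echo "$result"
# ac one      <- starts with a, so b became c
# ba four     <- does not start with a, only the second edit applied
```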

    23.9.2 \tt awk

    The awk utility also operates on each line, but it can be described as having a memory. An awk program consists of a sequence of pairs, where each pair consists of a match string and an action. The simplest awk program is

    cat somefile | awk '{ print }'
    where the match string is omitted, meaning that all lines match, and the action is to print the line. Awk breaks each line into fields separated by whitespace. A common application of awk is to print a certain field:
    awk '{print $2}' file
    prints the second field of each line.
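Here is a short sketch of field selection on made-up input. The built-in variable NF (the number of fields on the current line) and the -F option (a different field separator) are standard awk features that are not introduced in the text above.

```shell
second=$(echo "alpha beta gamma" | awk '{print $2}')
echo "$second"                       # beta

# NF is the number of fields, so $NF is the last field.
last=$(echo "alpha beta gamma" | awk '{print $NF}')
echo "$last"                         # gamma

# -F sets the field separator, here a colon.
shell_field=$(echo "user:x:1000:/bin/bash" | awk -F: '{print $4}')
echo "$shell_field"                  # /bin/bash
```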

    Suppose you want to print all subroutines in a Fortran program; this can be accomplished with

    awk '/subroutine/ {print}' yourfile.f


    Build a command pipeline that prints only the subroutine name from each subroutine header. For this you first use sed to replace the parentheses by spaces, then awk to print the subroutine name field.

    Awk has variables with which it can remember things. For instance, instead of just printing the second field of every line, you can make a list of them and print that later:

    cat myfile | awk 'BEGIN {v="Fields:"} {v=v " " $2} END {print v}'
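Variables can hold numbers as well as strings. The following sketch, on made-up input, accumulates a column total and counts matching lines; awk variables start out empty, which counts as zero in arithmetic.

```shell
# Sum the second field over all lines.
total=$(printf 'a 1\nb 2\nc 3\n' | awk '{sum += $2} END {print sum}')
echo "$total"        # 6

# Count the lines that start with an a.
matching=$(printf 'a 1\nb 2\na 3\n' | awk '/^a/ {n++} END {print n}')
echo "$matching"     # 2
```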

    As another example of the use of variables, here is how you would print all lines in between a \n{BEGIN} and END line:

    cat myfile | awk '/END/ {p=0} p==1 {print} /BEGIN/ {p=1} '

    Exercise The placement of the match with \n{BEGIN} and END may seem strange. Rearrange the awk program, test it out, and explain the results you get.

    23.10 Review questions


    Exercise Devise a pipeline that counts how many users are logged onto the system, whose name starts with a vowel and ends with a consonant.



    Write a shell script for making backups. When you call this script as \n{./backup somefile} it should test whether somefile.bak exists, and give a warning if it does. In either case, it should copy the original file to a backup.
