
30 Good coding practices

Sooner or later, and probably sooner rather than later, every programmer is confronted with code not behaving as intended. In this section you will learn some techniques for dealing with this problem. First we will look at a number of techniques for preventing errors; in the next chapter we will discuss debugging, the process of finding the inevitable errors in a program once they have occurred.

30.1 Defensive programming


In this section we will discuss a number of techniques aimed at decreasing the likelihood of programming errors, or increasing the likelihood of them being found at runtime. We call this defensive programming.

Scientific codes are often large and involved, so it is good practice to code knowing that you are going to make mistakes, and to prepare for them. Another good coding practice is the use of tools: there is no point in reinventing the wheel if someone has already done it for you. Some of these tools are described in other sections.

First we will have a look at runtime sanity checks, where you test for things that cannot or should not happen.

30.1.1 Assertions


In the things that can go wrong with a program we can distinguish between errors and bugs. Errors are things that legitimately happen but that should not. File systems are common sources of errors: a program wants to open a file but the file doesn't exist because the user mistyped the name, or the program writes to a file but the disk is full. Other errors can come from arithmetic, such as overflow errors.

On the other hand, a bug in a program is an occurrence that cannot legitimately occur. Of course, `legitimately' here means `according to the programmer's intentions'. Bugs can often be described as `the computer always does what you ask, not necessarily what you want'.

Assertions serve to detect bugs in your program: an assertion is a predicate that should be true at a certain point in your program. Thus, an assertion failing means that you didn't code what you intended to code. An assertion is typically a statement in your programming language, or a preprocessor macro; upon failure of the assertion, your program will take some abortive action.

Some examples of assertions:

Assertions are often disabled in a program once it's sufficiently tested. The reason for this is that assertions can be expensive to execute. For instance, if you have a complicated data structure, you could write a complicated integrity test, and perform that test in an assertion, which you put after every access to the data structure.

Because assertions are often disabled in the `production' version of a code, they should not affect any stored data. If they do, your code may behave differently when you're testing it with assertions versus running it in practice without them. This is also formulated as `assertions should not have side-effects'.

30.1.1.1 The C assert macro

The C standard library has a header assert.h which provides an assert() macro. Inserting assert(foo) has the following effect: if foo is zero (false), a diagnostic message

Assertion failed: foo, file filename, line line-number

is printed on standard error, which includes the literal text of the expression, the file name, and the line number; the program is subsequently aborted. Here is an example:
#include<assert.h>


void open_record(char *record_name)
{
  assert(record_name!=NULL);
  /* Rest of code */
}


int main(void)
{
  open_record(NULL);
}
The assert macro can be disabled by defining the NDEBUG macro.

30.1.1.2 An assert macro for Fortran


(Thanks to Robert Mclay for this code.)

#if (defined( __GFORTRAN__ ) || defined( __G95__ ) || defined ( PGI) )
# define MKSTR(x) "x"
#else
# define MKSTR(x) #x
#endif
#ifndef NDEBUG
# define ASSERT(x, msg) if (.not. (x) ) \
    call assert( __FILE__ , __LINE__ , MKSTR(x), msg)
#else
# define ASSERT(x, msg)
#endif

subroutine assert(file, ln, testStr, msgIn)
  implicit none
  character(*) :: file, testStr, msgIn
  integer :: ln
  print *, "Assert: ",trim(testStr)," Failed at ",trim(file),":",ln
  print *, "Msg:", trim(msgIn)
  stop
end subroutine assert

which is used as
ASSERT(nItemsSet.le.arraySize,"Too many elements set")

30.1.2 Try-catch in C++

30.1.3 Use of error codes


In some software libraries (for instance MPI or PETSc) every subprogram returns an error indication, either as the function result or through a parameter, to signal success or failure of the routine. It is good programming practice to check these error parameters, even if you think that nothing can possibly go wrong.

It is also a good idea to write your own subprograms in such a way that they always have an error parameter. Let us consider the case of a function that performs some numerical computation.

float compute(float val)
{
  float result;
  result = ... /* some computation */
  return result;
}

float value,result;
result = compute(value);

Looks good? What if the computation can fail, for instance:
  result = ... sqrt(val) ... /* some computation */
How do we handle the case where the user passes a negative number?
float compute(float val)
{
  float result;
  if (val<0) { /* then what? */ 
  } else 
    result = ... sqrt(val) ... /* some computation */
  return result;
}
We could print an error message and deliver some result, but the message may go unnoticed, and the calling environment does not really receive any notification that something has gone wrong.

The following approach is more flexible:

int compute(float val,float *result)
{
  if (val<0) {
    return -1;
  } else {
    *result = ... sqrt(val) ... /* some computation */
  }
  return 0;
}


float value,result; int ierr;
ierr = compute(value,&result);
if (ierr!=0) { /* take appropriate action */
}
You can save yourself a lot of typing by writing
#define CHECK_FOR_ERROR(ierr) \
  if (ierr!=0) { \
    printf("Error %d detected\n",ierr); \
    return -1 ; }
....
ierr = compute(value,&result); CHECK_FOR_ERROR(ierr);
Using some cpp macros you can even define
#define CHECK_FOR_ERROR(ierr) \
  if (ierr!=0) { \
    printf("Error %d detected in line %d of file %s\n",\
           ierr,__LINE__,__FILE__); \
    return -1 ; }
Note that this macro not only prints an error message, but also does a further return. This means that, if you adopt this use of error codes systematically, you will get a full backtrace of the calling tree if an error occurs. (In the Python language this is precisely the wrong approach since the backtrace is built-in.)

30.2 Guarding against memory errors


In scientific computing it goes pretty much without saying that you will be working with large amounts of data. Some programming languages make managing data easy, others, one might say, make making errors with data easy.

The following are some examples of memory violations: writing outside the bounds of an allocated array, reading memory that is uninitialized or has already been freed, and losing track of allocated memory (a memory leak).

This section contains some techniques to prevent errors in dealing with memory that you have reserved for your data.

30.2.1 Array bound checking and other memory techniques


In parallel codes, memory errors will often show up by a crash in an MPI routine. This is hardly ever an MPI problem or a problem with your cluster.

Compilers for Fortran often have support for array bound checking (for instance the -fcheck=bounds option of gfortran). Since this makes your code much slower, you would only enable it during the development phase of your code.

30.2.2 Memory leaks


We say that a program has a memory leak if it allocates memory and subsequently loses track of it. The operating system then thinks the memory is in use while it is not, and as a result the computer's memory can get filled up with allocated memory that serves no useful purpose.

In this example data is allocated inside a lexical scope:

for (i=.... ) {
  double *block = malloc( /* large number of bytes */ );
  /* do something with that block of memory */
  /* and forget to call "free" on that block */
}
The block of memory is allocated in each iteration, but the allocation of one iteration is no longer available in the next. A similar example can be made with allocating inside a conditional.

It should be noted that this problem is far less serious in Fortran, where memory is deallocated automatically as a variable goes out of scope.

There are various tools for detecting memory errors: Valgrind, DMALLOC, Electric Fence. For Valgrind, see section 31.3.

30.2.3 Roll-your-own malloc


Many programming errors arise from improper use of dynamically allocated memory: the program writes beyond the bounds, or writes to memory that has not been allocated yet, or that has already been freed. While some compilers can do bound checking at runtime, this slows down your program. A better strategy is to write your own memory management. Some libraries such as PETSc already supply an enhanced malloc; if this is available you should certainly make use of it. (The gcc compiler has a function mcheck, defined in mcheck.h, that has similar functionality.)

If you write in C, you will probably know the malloc and free calls:

int *ip;
ip = (int*) malloc(500*sizeof(int));
if (ip==0) {/* could not allocate memory */}
/* ..... do stuff with ip ..... */
free(ip);

You can save yourself some typing by
#define MYMALLOC(a,b,c) \
  a = (c*)malloc(b*sizeof(c)); \
  if (a==0) {/* error message and appropriate action */}


int *ip;
MYMALLOC(ip,500,int);

Runtime checks on memory usage (either by compiler-generated bounds checking, or through tools like valgrind or Rational Purify) are expensive, but you can catch many problems by adding some functionality to your malloc. What we will do here is to detect memory corruption after the fact.

We allocate a few integers to the left and right of the allocated object (line 1 in the code below), and put a recognizable value in them (lines 2 and 3), as well as the size of the object (line 2). We then return the pointer to the actually requested memory area (line 4).

#define MEMCOOKIE 137
#define MYMALLOC(a,b,c) { \
  char *aa; int *ii; \
  aa = malloc(b*sizeof(c)+3*sizeof(int)); /* 1 */ \
  ii = (int*)aa; ii[0] = b*sizeof(c); \
          ii[1] = MEMCOOKIE;              /* 2 */ \
  aa = (char*)(ii+2); a = (c*)aa ;        /* 4 */ \
  aa = aa+b*sizeof(c); ii = (int*)aa; \
          ii[0] = MEMCOOKIE;              /* 3 */ \
  }
Now you can write your own free, which checks that the guard values around the object have not been overwritten.
#define MYFREE(a) { \
  char *aa; int *ii,n; ii = (int*)a; \
  if (*(--ii)!=MEMCOOKIE) printf("object corrupted\n"); \
  n = *(--ii); aa = (char*)a+n; ii = (int*)aa; \
  if (*ii!=MEMCOOKIE)  printf("object corrupted\n"); \
  }
You can extend this idea: in every allocated object, also store two pointers, so that the allocated memory areas become a doubly linked list. You can then write a macro CHECKMEMORY which tests all your allocated objects for corruption.

Such solutions to the memory corruption problem are fairly easy to write, and they carry little overhead. There is a memory overhead of at most 5 integers per object, and there is practically no performance penalty.

(Instead of writing a wrapper for malloc, on some systems you can influence the behaviour of the system routine. On Linux, malloc calls hooks that can be replaced with your own routines; see \url{http://www.gnu.org/s/libc/manual/html_node/Hooks-for-Malloc.html}.)

30.2.4 Specific techniques: Fortran

Use implicit none.

Put all subprograms in modules so that the compiler can check for missing arguments and type mismatches. It also allows for automatic dependency building with fdepend.

Use the C preprocessor for conditional compilation and such.

30.3 Testing


There are various philosophies for testing the correctness of a code.

With parallel codes we run into a new category of difficulties with testing. Many algorithms, when executed in parallel, will execute operations in a slightly different order, leading to different roundoff behaviour. For instance, the parallel computation of a vector sum will use partial sums. Some algorithms have an inherent damping of numerical errors, for instance stationary iterative methods, but others have no such built-in error correction (nonstationary methods). As a result, the same iterative process can take different numbers of iterations depending on how many processors are used.

30.3.1 Test-driven design and development


In test-driven design there is a strong emphasis on the code always being testable: tests are written before or together with the code they exercise, and changes are kept small enough that the code can be re-tested after each one.
