Parallel I/O

Experimental html version of downloadable textbook, see http://www.tacc.utexas.edu/~eijkhout/istc/istc.html

47 Parallel I/O

Parallel I/O is a tricky subject. You can try to let all processors jointly write one file, or to write a file per process and combine them later. With the standard mechanisms of your programming language there are the following considerations:

• On clusters where the processes have individual file systems, the only way to write a single file is to let it be generated by a single processor.
• Writing one file per process is easy to do, but
  • you need a post-processing script;
  • if the files are not on a shared file system (such as Lustre), it takes additional effort to bring them together;
  • if the files are on a shared file system, writing many files may be a burden on the metadata server.
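• On a shared file system it is possible for all processes to open the same file and set the file pointer individually. This can be difficult if the amount of data per process is not uniform, since each process then needs to know how much data the lower-numbered processes write before it can compute its own position.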

Illustrating the last point:

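// pseek.c
// Every process writes one fixed-width record at its own offset.
// The file must already exist: mode "w" would truncate it and
// discard records that other processes have written.
#define RECLEN 8 // bytes per record, including the newline
FILE *pfile;
pfile = fopen("pseek.dat","r+");
fseek(pfile,(long)procid*RECLEN,SEEK_SET);
fprintf(pfile,"%*d\n",RECLEN-1,procid);
fclose(pfile);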
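
When the amount of data per process is not uniform, each process needs the total size of the records that precede it in the file. The following is a minimal sketch of that idea, not a definitive implementation. The function names exclusive_offsets and write_text_at are hypothetical, and the use of the POSIX calls open and pwrite (rather than the standard C library) is an assumption made so that the file is never truncated. In an MPI program the offsets would more likely be computed collectively, for instance with MPI_Exscan, rather than from an array that every process holds.

```c
/* Sketch: records of different sizes in one shared file (POSIX). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: exclusive prefix sum of the record sizes.
   offsets[p] = counts[0] + ... + counts[p-1], so offsets[0] is 0.
   Returns the total number of bytes in the file. */
long exclusive_offsets(const long *counts, int nprocs, long *offsets) {
  assert(counts != NULL && offsets != NULL && nprocs >= 0);
  long total = 0;
  for (int p = 0; p < nprocs; p++) {
    assert(counts[p] >= 0);
    offsets[p] = total;
    total += counts[p];
  }
  return total;
}

/* Hypothetical helper: write the string `text` (without its
   terminating zero) at byte position `offset` of the file `path`.
   The file is created if needed but never truncated, so records
   written earlier by other processes are preserved.
   Returns 0 on success, -1 on error. */
int write_text_at(const char *path, const char *text, long offset) {
  assert(path != NULL && text != NULL && offset >= 0);
  size_t len = strlen(text);
  int fd = open(path, O_WRONLY | O_CREAT, 0644);
  if (fd < 0) return -1;
  ssize_t written = pwrite(fd, text, len, (off_t)offset);
  int status = (written == (ssize_t)len) ? 0 : -1;
  if (close(fd) != 0) status = -1;
  return status;
}
```

Each process would call write_text_at with its own record and its own entry of the offsets array; the order in which the processes write does not matter. Whether concurrent writes to disjoint regions behave correctly still depends on the file system, so this should be checked on the target machine.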


MPI also has its own portable I/O: MPI I/O, for which see chapter MPI topic: File I/O.

Alternatively, one could use a library such as hdf5.