GPFS vs. BeeGFS (fhgfs) performance issues


Hi everyone,

Not sure if this is news to many of you, but I thought I'd share this here in case anyone has experienced similar problems, and to help others avoid getting stuck on the same issue:

On our cluster, we observed that if your data - i.e. reference files and, most importantly, the particle stack - are stored on a BeeGFS (fhgfs) filesystem, FREALIGN v9.11 runs very slowly. Individual processes on the computing nodes typically get only ~5% CPU usage, which pointed to an I/O issue. We tried changing the launching scripts to copy the references to the local scratch disks of the nodes* (see the sketch below), and it didn't help much. Furthermore, it is impractical to do the same with the particle stacks, as they are often very large files.
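For reference, the staging change we tried looked roughly like this (a minimal sketch; the paths, reference filename, SLURM-style job variable, and invocation are placeholder assumptions, not our actual setup):

    # Stage the reference files to the node-local scratch disk before
    # launching FREALIGN, so reads of the references stay off BeeGFS.
    LOCAL_SCRATCH=/scratch/local/$SLURM_JOB_ID    # hypothetical per-job scratch dir
    mkdir -p "$LOCAL_SCRATCH"
    cp /scicore/pfs/myproject/reference_volume.mrc "$LOCAL_SCRATCH/"
    cd "$LOCAL_SCRATCH"
    frealign_v9.exe < frealign.card > frealign.log    # placeholder invocation

As noted above, this barely helped, presumably because the dominant I/O is on the particle stack, which is too large to stage the same way.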

However, if the data is stored in a GPFS filesystem, it runs normally and we can launch hundreds of refinement/reconstruction jobs simultaneously without problems (i.e. extremely fast).

Below is a brief explanation by our cluster admin:

We have effectively two different parallel file systems on the cluster: GPFS
from IBM and BeeGFS from the Fraunhofer Institute (fhgfs).
We use GPFS for the home and group folders and BeeGFS for /scicore/scratch
and /scicore/pfs.
I had a look at the "pfs" file system and currently we have a huge load
there (throughput up to 4 GBytes/second). Your job will be slow if you
run it under this load.
Each file system has its strengths: GPFS has a local cache, so if you
open the same file many times it will be much faster than with BeeGFS.
In your case, I suggest that you run the FREALIGN application in your
HOME. This doesn't concern the RELION application, which should still use
BeeGFS.

*Suggested in this thread: http://grigoriefflab.janelia.org/node/5244

Thanks. One job that accesses files many times during a run is the Frealign control script, which "tails" the output files of all jobs. It may therefore be possible to speed up processing on a BeeGFS file system by modifying the control and job scripts so that the control script instead checks for a file that is created only when a job finishes. This way, files that are generated while the jobs are running will not be accessed by multiple processes.
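A minimal sketch of that idea, assuming hypothetical script and file names (not the actual Frealign control/job scripts):

    # In each job script: write a marker file only after the job has finished,
    # so the output log is never read while it is still being written.
    frealign_v9.exe < card_$JOB_ID.in > job_$JOB_ID.log
    touch job_$JOB_ID.done

    # In the control script: poll for the marker files instead of tailing the
    # logs of running jobs; only files of completed jobs are ever read.
    while [ "$(ls job_*.done 2>/dev/null | wc -l)" -lt "$NJOBS" ]; do
        sleep 30
    done

On a file system without a local cache, such as BeeGFS, this avoids repeated reads of files that are being written concurrently by many processes.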