threading with frealign_run_refine


I am running frealign 9.09 on a MOAB/TORQUE cluster. frealign_run_refine is working well for me. However, there is a feature that is either not working or that I'm using incorrectly. My understanding is that, in the mparameters file, nprocessor_rec is the number of independent reconstructions to perform and mp_cpus is the number of threads to use in each reconstruction job. However, when I run a job with mp_cpus > 1, the submitted job only ever uses 1 processor. Am I mistaken about how to set up the mparameters file, or is there something else going on here?

Thanks,
Scott

The number of reconstructions is determined by nclasses. For parallelization of refinement and reconstruction, the number of CPUs to be used must be provided by setting nprocessor_ref and nprocessor_rec. In some (rare) cases it makes sense to also change the value of mp_cpus, for example if the queuing system would otherwise assign too many jobs to one node and the node would then run out of memory. Unless this is the case, mp_cpus should be left unchanged, i.e. set to 1.

In reply to niko

I think I wasn't clear when I said independent reconstructions. What I meant was that the reconstruction jobs for a given volume were being run independently (no threading) on separate processors. I have the case that you mention, where many jobs are being run on one node; these are big volumes (432^3) which is why I want to do threading. However, when I have set mp_cpus > 1, the jobs are only using 1 core. They do not seem to be using threading. I know threading works because I have independently created reconstruction jobs with export NCPUS=8 and qsub -l nodes=1:ppn=8, and the job uses threading as expected.
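For reference, the manual test described above can be sketched roughly as follows. This is only an illustration: the helper name build_threaded_qsub and the job script name reconstruct_job.sh are made up, and passing NCPUS to the job with qsub -v (rather than exporting it inside the job script) is an assumption. The function only prints the command, it does not submit anything.

```shell
#!/bin/bash
# Sketch only: builds (does not submit) a TORQUE qsub command for one
# threaded reconstruction job. build_threaded_qsub and reconstruct_job.sh
# are hypothetical names, not part of frealign.

build_threaded_qsub() {
    local ncpus="${1:-1}"                    # threads wanted for the job
    local script="${2:-reconstruct_job.sh}"  # hypothetical job script
    # Ask for all cores on a single node and hand NCPUS to the job
    # environment (assumption: -v is used instead of "export NCPUS=...").
    echo "qsub -l nodes=1:ppn=${ncpus} -v NCPUS=${ncpus} ${script}"
}

# Print the command for an 8-thread job; run the printed line to submit it.
build_threaded_qsub 8
```

Requesting ppn equal to the thread count keeps the scheduler from packing other jobs onto the cores the threaded job is expected to use.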

Thanks,
Scott

In reply to sstagg

It is possible that threading does not work properly on clusters other than SGE since we do not have access to other clusters. If you check the mult_reconstruct.com script for the cluster option you are using, maybe you can see where the _mp option needs to be inserted to make it work, then send us the debugged code. However, I believe that in most cases it is more efficient to increase the number of parallel jobs and keep threading to one CPU.
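As a starting point for that check, something like the following could list the lines of the script that already mention the _mp option, so the SGE handling can be compared with the branch for another cluster type. It is a minimal sketch: find_mp_lines is a made-up helper, and it assumes the script is in the current directory (or that its path is passed in) and that the threaded variant is referred to by the literal string "_mp".

```shell
#!/bin/bash
# Sketch only: find_mp_lines is a hypothetical helper, not part of frealign.
# It prints every line (with its line number) that contains "_mp" in the
# given script, defaulting to mult_reconstruct.com in the current directory.

find_mp_lines() {
    local script="${1:-mult_reconstruct.com}"
    # grep exits non-zero when nothing matches; report that case explicitly.
    grep -n "_mp" "$script" || echo "no _mp references in ${script}"
}

# Usage (path is an assumption; adjust to where the script actually lives):
#   find_mp_lines /path/to/mult_reconstruct.com
```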