launching jobs on a SLURM cluster

Hi everybody,

Our HPC at FSU just migrated from PBS to the SLURM resource manager. Unfortunately, the frealign_run_refine script no longer works for us. The computer section of my mparameters file looks like this:
# Computer-specific setting
cluster_type SLURM ! Set to "sge", "lsf", "slurm", "pbs" or "condor" when running on an SGE, LSF, SLURM, PBS or CONDOR cluster, otherwise set to "none".
nprocessor_ref 4 ! Number of CPUs to use during refinement.
nprocessor_rec 4 ! Number of CPUs to use during reconstruction.
mem_per_cpu 2048 ! Memory available per CPU (in MB).

frealign.log looks like this:
[sstagg@hpc-login-40 frealign]$ more frealign.log
Starting refinement...
Frealign run script crashed Tue Jul 21 16:12:31 EDT 2015
Terminating...

I noticed that there was an error reported in mult_refine.log, but I don't know how to diagnose it.
[sstagg@hpc-login-40 frealign]$ more scratch/mult_refine.log
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
@: Expression Syntax.

Any help getting this running again would be greatly appreciated.

Best regards,
Scott

This might be a bit more difficult to fix. The current SLURM implementation may be specific to the STAMPEDE cluster. How is a job submitted on your system? Can you submit single-CPU jobs, or do you always have to use a whole node?

To diagnose the script error further, please add an x to the first line of the mult_refine.com script:

#!/bin/csh -fx

and run the job again. Then send me the mult_refine.log file.

In reply to by niko

Here is the output:

[sstagg@hpc-login-40 scratch]$ more mult_refine.log
set working_directory = `pwd`
pwd
set SCRATCH = `grep scratch_dir mparameters | awk '{print $2}'`
grep scratch_dir mparameters
if ( 0 || == ) then
set SCRATCH = /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch
endif
if ( ! -d /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch ) then
cp mparameters /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( != ) then
set start = `grep start_process $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep start_process /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
endif
set end = `grep end_process $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep end_process /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set first = `grep first_particle $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep first_particle /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set last = `grep last_particle $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep last_particle /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set data_input = `grep data_input $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep data_input /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set occ_helical = `grep occ_helical $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep occ_helical /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( 0 || == ) then
set occ_helical = F
endif
set bin_dir = `grep frealign_bin_dir $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep frealign_bin_dir /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( 0 || == ) then
set bin_dir = `which frealign_v9.exe`
which frealign_v9.exe
set bin_dir = /panfs/storage.local/imb/stagg/software/rhel61/usr/local/grigorieff/frealign_v9.09/bin
endif
set cluster_type = `cat $SCRATCH/cluster_type.log`
cat /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/cluster_type.log
if ( 0 || SLURM == ) then
set mem_per_cpu = `grep mem_per_cpu $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep mem_per_cpu /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( 0 || 2048 == ) then
set night_queue = `grep night_queue $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep night_queue /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set reschedule = `grep reschedule_if_qw $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep reschedule_if_qw /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set stn = `grep qsub_string_ref mparameters | awk -F\" '{print $2}'`
awk -F" {print $2}
grep qsub_string_ref mparameters
set no_delete = `grep delete_scratch $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep delete_scratch /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( T == F ) then
set no_delete = 0
endif
set raw_images = `grep raw_images_high $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep raw_images_high /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( == ) set raw_images = `grep raw_images $SCRATCH/mparameters_run | awk '{print $2}'`
set raw_images = `grep raw_images $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep raw_images /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set raw_images = `echo ${raw_images:r}`
echo /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start
set extension = `ls $raw_images.* | head -1`
head -1
ls /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start.mrc
if ( /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start.mrc == ) then
set extension = `echo ${extension:e}`
echo mrc
set nx = `${bin_dir}/fheader.exe ${raw_images}.${extension} | grep --binary-files=text NX | awk '{print $4}'`
/panfs/storage.local/imb/stagg/software/rhel61/usr/local/grigorieff/frealign_v9.09/bin/fheader.exe /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start.mrc
awk {print $4}
grep --binary-files=text NX
set mem_big = `echo $nx | awk '{print int(10 * $1^3 * 4 * 66 /1024^3 + 1)/10}'`
awk {print int(10 * $1^3 * 4 * 66 /1024^3 + 1)/10}
echo 96
if ( `echo $mem_big | awk '{print int(1024 * $1)}'` > 2048 ) then
awk {print int(1024 * $1)}
echo 0.3
set mem_big = `echo $mem_big | awk '{if ($1 < 1) {print 1} else {print $1} }'`
awk {if ($1 < 1) {print 1} else {print $1} }
echo 0.3
set mem_small = `echo $nx | awk '{print int(10 * $1^3 * 4 * 3 /1024^3 + 1)/10}'`
awk {print int(10 * $1^3 * 4 * 3 /1024^3 + 1)/10}
echo 96
set mem_small = `echo $mem_small | awk '{if ($1 < 1) {print 1} else {print $1} }'`
awk {if ($1 < 1) {print 1} else {print $1} }
echo 0.1
if ( `echo ${cluster_type} | tr '[a-z]' '[A-Z]'` == LSF ) then
echo SLURM
if ( == || == ) then
set first = 1
set last = `${bin_dir}/fheader.exe ${raw_images}.${extension} | grep --binary-files=text NX | awk '{print $6}'`
/panfs/storage.local/imb/stagg/software/rhel61/usr/local/grigorieff/frealign_v9.09/bin/fheader.exe /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start.mrc
awk {print $6}
grep --binary-files=text NX
endif
sleep 2
echo monitor refine
set pix = `grep pix_size $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep pix_size /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set dstep = `grep dstep $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep dstep /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set sym = `grep Symmetry $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep Symmetry /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
@ prev = 11 - 1
set nc = `grep --binary-files=text -v C ${working_directory}/${data_input}_${prev}_r1.par | head -1 | wc | awk '{print $2}'`
head -1
grep --binary-files=text -v C /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/betagalb_10_r1.par
awk {print $2}
wc
if ( 16 == 4 ) then
set nc = `grep --binary-files=text -v C ${working_directory}/${data_input}_${prev}_r1.par | head -1 | wc | awk '{print $2}'`
grep --binary-files=text -v C /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/betagalb_10_r1.par
head -1
awk {print $2}
wc
if ( 16 < 12 ) then
mainloop:
set restart = `grep restart_after_crash $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep restart_after_crash /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set nclass = `grep nclasses $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep nclasses /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set pmask = `grep parameter_mask $SCRATCH/mparameters_run | awk -F\" '{print $2}'`
awk -F" {print $2}
grep parameter_mask /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set pmask = `echo $pmask "1 1 1 1 1" | awk '{print $1,$2,$3,$4,$5}'`
awk {print $1,$2,$3,$4,$5}
set pshift = `echo $pmask | awk '{print 0,0,0,$4,$5}'`
awk {print 0,0,0,$4,$5}
echo 1 1 1 1 1
set pangle = `echo $pmask | awk '{print $1,$2,$3,0,0}'`
awk {print $1,$2,$3,0,0}
echo 1 1 1 1 1
@ prev = 11 - 1
set nc = `ls ${data_input}_${prev}_r[0-9].par ${data_input}_${prev}_r[0-9][0-9].par ${data_input}_${prev}_r[0-9][0-9][0-9].par | wc -l`
wc -l
ls betagalb_10_r1.par betagalb_10_r2.par
if ( 2 != 2 ) then
grep kill /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid.log
if ( ! 1 ) exit
endif
set nc = `ls ${data_input}_${prev}_r[0-9].par ${data_input}_${prev}_r[0-9][0-9].par ${data_input}_${prev}_r[0-9][0-9][0-9].par | wc -l`
wc -l
ls betagalb_10_r1.par betagalb_10_r2.par
set i = `ls ${data_input}_${prev}_r[0-9].${extension} ${data_input}_${prev}_r[0-9][0-9].${extension} ${data_input}_${prev}_r[0-9][0-9][0-9].${extension} | wc -l`
wc -l
ls betagalb_10_r1.mrc betagalb_10_r2.mrc
if ( 2 != 2 ) then
cp mparameters /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set nproc = `grep nprocessor_ref $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep nprocessor_ref /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set itmax = `grep ITMAX $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep ITMAX /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set refineshiftinc = `grep refineshiftinc $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep refineshiftinc /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set refineangleinc = `grep refineangleinc $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep refineangleinc /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set mode = `grep MODE $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep MODE /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set sym = `grep Symmetry $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep Symmetry /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set shiftiter = `expr ${start} % ${refineshiftinc}`
expr 11 % 4
set angleiter = `expr ${start} % ${refineangleinc}`
expr 11 % 4
if ( 2 < 2 ) then
if ( `echo ${cluster_type} | tr '[a-z]' '[A-Z]'` == SLURM ) then
tr [a-z] [A-Z]
set nproc = `echo ${nproc} | awk '{print int($1/16+0.5)*16}'`
awk {print int($1/16+0.5)*16}
endif
@ incr = 5393 + 1 - 1
set incr = `echo ${incr} ${nclass} ${nproc} | awk '{print int($1*$2/$3+1)}'`
awk {print int($1*$2/$3+1)}
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
set h = `date +%k`
date +%k
set nq =
if ( 17 > 18 ) set nq = -l night=true
if ( 17 < 6 ) set nq = -l night=true
if ( F != T ) set nq =
set nq =
rm /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid_temp.log
if ( `echo ${cluster_type} | tr '[a-z]' '[A-Z]'` == SLURM ) then
echo SLURM
echo #!/bin/csh -f
endif
if ( `echo ${cluster_type} | tr '[a-z]' '[A-Z]'` == CONDOR ) then
tr [a-z] [A-Z]
subloop:
set firstn = 1
@ lastn = 1 + - 1
if ( 0 > = 5393 ) set lastn = 5393
while ( 0 < = 5393 )
grep kill /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid.log
if ( ! 1 ) exit
endif
set nc = 1
while ( 1 < = 2 )
set fst = 1
if ( -e /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_11_r1.par_1_0 && F == T ) then
if ( 0 > = 1 ) then
@ nc++
end
while ( 2 < = 2 )
set fst = 1
if ( -e /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_11_r2.par_1_0 && F == T ) then
if ( 0 > = 1 ) then
@ nc++
end
while ( 3 < = 2 )
if ( 0 == 5393 ) then
@ firstn = 1 +
@: Expression Syntax.

In reply to by sstagg

It looks like this is an older version of Frealign and scripts. Please download the latest version and try again. This bug may have been fixed already.

In reply to by niko

Hi Niko,

Thanks for your help. I updated to v9.10 and still have the same error. Here is the output from mult_refine.log after adding -x:
[sstagg@hpc-login-38 scratch]$ more mult_refine.log
set working_directory = `pwd`
pwd
set SCRATCH = `grep scratch_dir mparameters | awk '{print $2}'`
grep scratch_dir mparameters
if ( 0 || == ) then
set SCRATCH = /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch
endif
if ( ! -d /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch ) then
cp mparameters /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( != ) then
set start = `grep start_process $SCRATCH/mparameters_run | awk '{print $2}'`
grep start_process /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
endif
set end = `grep end_process $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep end_process /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set first = `grep first_particle $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep first_particle /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set last = `grep last_particle $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep last_particle /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set data_input = `grep data_input $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep data_input /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set occ_helical = `grep occ_helical $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep occ_helical /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( 0 || == ) then
set occ_helical = F
endif
set bin_dir = `grep frealign_bin_dir $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep frealign_bin_dir /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( 0 || == ) then
set bin_dir = `which frealign_v9.exe`
which frealign_v9.exe
set bin_dir = /panfs/storage.local/imb/stagg/software/rhel61/usr/local/grigorieff/frealign_v9.10/bin
endif
set cluster_type = `cat $SCRATCH/cluster_type.log`
cat /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/cluster_type.log
if ( 0 || SLURM == ) then
set mem_per_cpu = `grep mem_per_cpu $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep mem_per_cpu /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( 0 || 2048 == ) then
set stn = `grep qsub_string_ref mparameters | awk -F\" '{print $2}'`
awk -F" {print $2}
grep qsub_string_ref mparameters
set no_delete = `grep delete_scratch $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep delete_scratch /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( T == F ) then
set no_delete = 0
endif
set raw_images = `grep raw_images_high $SCRATCH/mparameters_run | awk '{print $2}'`
grep raw_images_high /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( == ) set raw_images = `grep raw_images $SCRATCH/mparameters_run | awk '{print $2}'`
set raw_images = `grep raw_images $SCRATCH/mparameters_run | awk '{print $2}'`
grep raw_images /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set raw_images = `echo ${raw_images:r}`
echo /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start
set extension = `ls $raw_images.* | head -1`
head -1
ls /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start.mrc
if ( /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start.mrc == ) then
set extension = `echo ${extension:e}`
echo mrc
set nx = `${bin_dir}/fheader.exe ${raw_images}.${extension} | grep --binary-files=text NX | awk '{print $4}'`
/panfs/storage.local/imb/stagg/software/rhel61/usr/local/grigorieff/frealign_v9.10/bin/fheader.exe /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start.mrc
awk {print $4}
grep --binary-files=text NX
set mem_big = `echo $nx | awk '{print int(10 * $1^3 * 4 * 66 /1024^3 + 1)/10}'`
echo 96
if ( `echo $mem_big | awk '{print int(1024 * $1)}'` > 2048 ) then
echo 0.3
set mem_big = `echo $mem_big | awk '{if ($1 < 1) {print 1} else {print $1} }'`
awk {if ($1 < 1) {print 1} else {print $1} }
set mem_small = `echo $nx | awk '{print int(10 * $1^3 * 4 * 3 /1024^3 + 1)/10}'`
awk {print int(10 * $1^3 * 4 * 3 /1024^3 + 1)/10}
echo 96
set mem_small = `echo $mem_small | awk '{if ($1 < 1) {print 1} else {print $1} }'`
echo 0.1
if ( `echo ${cluster_type} | tr '[a-z]' '[A-Z]'` == LSF ) then
echo SLURM
if ( == || == ) then
set first = 1
set last = `${bin_dir}/fheader.exe ${raw_images}.${extension} | grep --binary-files=text NX | awk '{print $6}'`
/panfs/storage.local/imb/stagg/software/rhel61/usr/local/grigorieff/frealign_v9.10/bin/fheader.exe /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start.mrc
awk {print $6}
grep --binary-files=text NX
endif
sleep 2
echo monitor refine
set pix = `grep pix_size $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep pix_size /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set dstep = `grep dstep $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep dstep /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set sym = `grep Symmetry $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep Symmetry /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
@ prev = 11 - 1
set nc = `grep --binary-files=text -v C ${working_directory}/${data_input}_${prev}_r1.par | head -1 | wc | awk '{print $2}'`
grep --binary-files=text -v C /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/betagalb_10_r1.par
head -1
awk {print $2}
wc
if ( 16 == 4 ) then
set nc = `grep --binary-files=text -v C ${working_directory}/${data_input}_${prev}_r1.par | head -1 | wc | awk '{print $2}'`
grep --binary-files=text -v C /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/betagalb_10_r1.par
head -1
awk {print $2}
wc
if ( 16 < 12 ) then
mainloop:
set restart = `grep restart_after_crash $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep restart_after_crash /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set nclass = `grep nclasses $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep nclasses /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set pmask = `grep parameter_mask $SCRATCH/mparameters_run | awk -F\" '{print $2}'`
awk -F" {print $2}
grep parameter_mask /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set pmask = `echo $pmask "1 1 1 1 1" | awk '{print $1,$2,$3,$4,$5}'`
echo 1 1 1 1 1 1 1 1 1 1
set pshift = `echo $pmask | awk '{print 0,0,0,$4,$5}'`
echo 1 1 1 1 1
set pangle = `echo $pmask | awk '{print $1,$2,$3,0,0}'`
awk {print $1,$2,$3,0,0}
@ prev = 11 - 1
set nc = `ls ${data_input}_${prev}_r[0-9].par ${data_input}_${prev}_r[0-9][0-9].par ${data_input}_${prev}_r[0-9][0-9][0-9].par | wc -l`
wc -l
ls betagalb_10_r1.par betagalb_10_r2.par
if ( 2 != 2 ) then
grep kill /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid.log
if ( ! 1 ) exit
set nc = `ls ${data_input}_${prev}_r[0-9].par ${data_input}_${prev}_r[0-9][0-9].par ${data_input}_${prev}_r[0-9][0-9][0-9].par | wc -l`
wc -l
ls betagalb_10_r1.par betagalb_10_r2.par
set i = `ls ${data_input}_${prev}_r[0-9].${extension} ${data_input}_${prev}_r[0-9][0-9].${extension} ${data_input}_${prev}_r[0-9][0-9][0-9].${extension} | wc -l`
wc -l
ls betagalb_10_r1.mrc betagalb_10_r2.mrc
if ( 2 != 2 ) then
cp mparameters /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set nproc = `grep nprocessor_ref $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep nprocessor_ref /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set itmax = `grep ITMAX $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep ITMAX /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set refineshiftinc = `grep refineshiftinc $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep refineshiftinc /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set refineangleinc = `grep refineangleinc $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep refineangleinc /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set mode = `grep MODE $SCRATCH/mparameters_run | awk '{print $2}'`
awk {print $2}
grep MODE /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set sym = `grep Symmetry $SCRATCH/mparameters_run | awk '{print $2}'`
grep Symmetry /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
set shiftiter = `expr ${start} % ${refineshiftinc}`
expr 11 % 4
set angleiter = `expr ${start} % ${refineangleinc}`
expr 11 % 4
if ( 2 < 2 ) then
if ( `echo ${cluster_type} | tr '[a-z]' '[A-Z]'` == SSH ) then
echo SLURM
if ( `echo ${cluster_type} | tr '[a-z]' '[A-Z]'` == SLURM ) then
echo SLURM
set nproc = `echo ${nproc} | awk '{print int($1/16+0.5)*16}'`
echo 4
endif
@ incr = 5393 + 1 - 1
set incr = `echo ${incr} ${nclass} ${nproc} | awk '{print int($1*$2/$3+1)}'`
echo 5393 2 0
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
rm /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid_temp.log
rm /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid_ssh.log
set form = `${bin_dir}/fheader.exe ${raw_images}.${extension} | grep --binary-files=text Opening | awk '{print $2}'`
/panfs/storage.local/imb/stagg/software/rhel61/usr/local/grigorieff/frealign_v9.10/bin/fheader.exe /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/start.mrc
awk {print $2}
grep --binary-files=text Opening
set fm = M
if ( MRC/CCP4 == SPIDER ) set fm = S
if ( MRC/CCP4 == IMAGIC ) set fm = I
set mask_file = `grep mask_file $SCRATCH/mparameters_run | awk '{print $2}'`
grep mask_file /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/mparameters_run
if ( 0 || == ) then
set nc = 1
while ( 1 < = 2 )
rm /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_10_r1.mrc
ln -s /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/betagalb_10_r1.mrc /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_10_r1.mrc
if ( MRC/CCP4 == IMAGIC ) ln -s /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/betagalb_10_r1.mrc /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_10_r1.mrc
@ nc++
end
while ( 2 < = 2 )
rm /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_10_r2.mrc
ln -s /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/betagalb_10_r2.mrc /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_10_r2.mrc
if ( MRC/CCP4 == IMAGIC ) ln -s /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/betagalb_10_r2.mrc /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_10_r2.mrc
@ nc++
end
while ( 3 < = 2 )
else
rm /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid_temp.log
rm /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid_ssh.log
if ( `echo ${cluster_type} | tr '[a-z]' '[A-Z]'` == SLURM ) then
echo SLURM
echo #!/bin/csh -f
endif
if ( `echo ${cluster_type} | tr '[a-z]' '[A-Z]'` == CONDOR ) then
echo SLURM
subloop:
set firstn = 1
@ lastn = 1 + - 1
if ( 0 > = 5393 ) set lastn = 5393
while ( 0 < = 5393 )
grep kill /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid.log
if ( ! 1 ) exit
set nc = 1
while ( 1 < = 2 )
set fst = 1
if ( -e /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_11_r1.par_1_0 && F == T ) then
if ( 0 > = 1 ) then
@ nc++
end
while ( 2 < = 2 )
set fst = 1
if ( -e /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/betagalb_11_r2.par_1_0 && F == T ) then
if ( 0 > = 1 ) then
@ nc++
end
while ( 3 < = 2 )
if ( 0 == 5393 ) then
@ firstn = 1 +
@: Expression Syntax.

I noticed there is also an odd message in monitor_frealign.log:
[sstagg@hpc-login-38 scratch]$ more monitor_frealign.log
cat: /panfs/storage.local/imb/stagg/sstagg/reliontest/frealign/scratch/pid_temp.log: No such file or directory

Is there any way to get the scripts to write out the individual Frealign control files they are submitting, and to show the SLURM command they use for submission?

Thanks,
Scott

In reply to by sstagg

After looking through the code, it appears that the control script is supposed to create a file called slurm.com containing the script for frealign.exe and then submit slurm.com with sbatch. Something seems to be failing: slurm.com is created, but it contains only this line:
#!/bin/csh -f

Any ideas?

In reply to by sstagg

The current implementation for SLURM clusters has been programmed specifically for the STAMPEDE cluster in Texas and will likely have to be modified for your cluster. We need to figure out what the syntax is on your cluster to submit single-CPU jobs. Please contact me offline.
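
The division by zero in the traces above can be reproduced in isolation. The sketch below reruns the two awk expressions that appear in mult_refine.log (the rounding of nproc to a multiple of 16, and the calculation of incr) with the values from the trace. The function names round_nproc and compute_incr and the zero guard are illustrative only and are not part of the Frealign scripts.

```sh
#!/bin/sh
# Rounding step from the trace: nproc is rounded to a multiple of 16.
round_nproc() {
    echo "$1" | awk '{print int($1/16+0.5)*16}'
}

# incr calculation from the trace, with an added guard (not in the
# original script) so that a zero nproc is reported instead of crashing awk.
compute_incr() {
    echo "$1 $2 $3" | awk '{
        if ($3 == 0) {
            print "nproc is 0, cannot compute incr" > "/dev/stderr"
            exit 1
        }
        print int($1*$2/$3+1)
    }'
}

# Show what the rounding does to a few nprocessor_ref values.
for n in 4 8 16 24; do
    echo "nprocessor_ref=$n -> nproc=$(round_nproc "$n")"
done

# Values from the trace: 5393 particles, 2 classes, nprocessor_ref 4.
nproc=$(round_nproc 4)
compute_incr 5393 2 "$nproc" || echo "incr not computed (nproc=$nproc)"
```

With nprocessor_ref 4 the rounded value is 0, which matches the "echo 5393 2 0" line in the second trace, and the empty incr then leads to the "@: Expression Syntax." error further down. Values of 8 or more round to a nonzero multiple of 16; whether requesting that many CPUs is appropriate depends on how the cluster allocates nodes.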

In reply to by niko

Hi everyone,

I am also trying to run frealign on a slurm cluster and it actually seems to work, however it only uses one cpu per node. Is there a way to tell frealign to use threads?

Best,

David

In reply to by David_H

There appear to be different implementations of SLURM clusters. It sounds like yours automatically dedicates an entire node to each submitted job, while others dedicate individual CPUs to a job. One solution would be to run multi-threaded jobs on each node. However, this is not how Frealign's parallelization has been implemented. Instead of multi-threading, Frealign runs many single-CPU jobs, which is significantly more efficient than threading. Please consult your cluster administrator to find out if a queue for single-CPU jobs can be established. If so, an additional string specifying the queue can be added to mparameters using the keywords qsub_string_ref and qsub_string_rec.
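
The trace earlier in this thread shows how the run script reads such a string: it greps qsub_string_ref from mparameters and takes the text between the first pair of double quotes. Below is a minimal sketch of that extraction, using a temporary file in place of a real mparameters. The partition name single_cpu is hypothetical, qsub_string_rec is assumed to be read the same way, and how the string is passed on to the scheduler is not shown in this thread.

```sh
#!/bin/sh
# Write a three-line stand-in for mparameters (values are placeholders).
params=$(mktemp)
cat > "$params" <<'EOF'
cluster_type         slurm
qsub_string_ref      "--partition=single_cpu"   ! hypothetical queue name
qsub_string_rec      "--partition=single_cpu"   ! hypothetical queue name
EOF

# Same grep/awk pipeline as in the mult_refine.log trace:
# split the matching line on double quotes and print the second field.
read_quoted() {
    grep "$1" "$params" | awk -F\" '{print $2}'
}

stn_ref=$(read_quoted qsub_string_ref)
stn_rec=$(read_quoted qsub_string_rec)
echo "refinement:     $stn_ref"
echo "reconstruction: $stn_rec"
rm -f "$params"
```

Because the value is taken from between the quotes, the string has to be quoted in mparameters; an unquoted value would come back empty from this pipeline.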

Please note that there is a more specialized set of Frealign run scripts that is designed for a SLURM cluster called "STAMPEDE" (a resource in Texas, USA). These scripts will use all 16 CPUs per node and they may also run on your SLURM cluster. You can access these scripts by entering "stampede" as the cluster_type in mparameters.

In reply to by niko

Hi all,

Excuse reopening the thread here but I'm also trying to run on a slurm queuing system. I have the following error when trying to run a mode 4 search with ~4500 particles.

WARNING: job may be blocked by backfill chunking (1 < 16 )
Message[0] job violates class configuration 'wclimit too high for qos 'lifesci' (31536000 > 172800)'

Has anybody encountered this or found a fix? How is the wc calculated for submission to the queue? I have tried changing the processor_ref and the number of cycles but the wclimit is always exceeded as reported in the error by the same amount.

Thanks!
Kyle

In reply to by kylelmorris

This seems to be a question for the people running the cluster. The "(1

Please note that there is a more specialized set of Frealign run scripts that is designed for a SLURM cluster called "STAMPEDE" (a resource in Texas, USA). These scripts will use all 16 CPUs per node and they may also run on your SLURM cluster. You can access these scripts by entering "stampede" as the cluster_type in mparameters.
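
For reference, the two numbers in the wclimit message quoted above can be converted to more readable units, assuming both are in seconds. The variable names in this sketch are illustrative.

```sh
#!/bin/sh
requested=31536000   # wclimit requested by the job, from the error message
allowed=172800       # maximum for qos 'lifesci', from the error message

# Integer conversion, assuming both values are in seconds.
requested_days=$((requested / 86400))
allowed_hours=$((allowed / 3600))

echo "requested: ${requested_days} days"
echo "allowed:   ${allowed_hours} hours"
```

A request of exactly 365 days against a 48-hour limit may mean that no wall-clock limit was passed with the job and a site default was applied, although that is only a guess. It would also be consistent with the observation that changing the number of processors or cycles does not alter the excess. The cluster administrators can confirm.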

In reply to by niko

Hi Niko
I am trying to run this script with the new frealign_v9.11. The same script, excluding the following parameter, seems to be running without problems on our small cluster:
#!/bin/sh
#SBATCH --job-name=mparameters
#SBATCH --partition=parallel
#SBATCH --nodes=24
#SBATCH --ntasks=440
#SBATCH --mem-per-cpu=2048
#SBATCH -o Fre_refine.out
#SBATCH -e Fre_refine.err
#SBATCH --mail-user=......
#SBATCH --mail-type=ALL
#SBATCH --no-requeue
#SBATCH --time=126:00:00

However, it has been running for a couple of days now, cycle 1 has not finished, and there is no output file after the first one. In addition, to make it work I introduced an exit in the script for monitor_frealign.log, as recommended elsewhere on this forum.

Therefore I transferred the whole project to our bigger cluster, and when I run it there I get this message:
“set: Variable name must contain alphanumeric characters.”
Below is the complete script, together with the parameters in mparameters; note that I have changed delete_scratch from T to F here.
Any suggestions would be helpful, as I am trying to run Frealign with the slurm cluster type:

#!/bin/sh
#SBATCH --job-name=frealignref.sh
#SBATCH --partition=parallel
#SBATCH --nodes=24
#SBATCH --ntasks=440
#SBATCH --mem-per-cpu=2048
#SBATCH -o Fre_refine.out
#SBATCH -e Fre_refine.err
#SBATCH --mail-user=........
#SBATCH --mail-type=ALL
#SBATCH --no-requeue
#SBATCH --time=126:00:00

export mparameters_PATH="/scratch/....frealign/mparameters:$PATH"
export PATH="/home/.../frealign_v9.11/bin:$PATH"
export OMP_NUM_THREADS=1

mpirun -np 440 /home/.../frealign_v9.11/bin/mult_refine.com

Control parameter file to run Frealign
======================================

This file must be kept in the project working directory from which the refinement scripts are launched.

Note: Please make sure that project and scratch directories (if specified) are accessible by all sub-processes that are run on cluster nodes.

# Computer-specific setting
cluster_type slurm ! Set to "sge", "lsf", "slurm", "stampede", "pbs" or "condor" when running on a cluster, otherwise set to "none".
nprocessor_ref 24 ! Number of CPUs to use during refinement.
nprocessor_rec 24 ! Number of CPUs to use during reconstruction.
mem_per_cpu 2048 ! Memory available per CPU (in MB).

# Refinement-specific parameters
MODE 1 ! 1, 2, 3 or 4. Refinement mode, normally 1. Set to 2 for additional search.
start_process 1 ! First cycle to execute. Output files from previous cycle (n-1) required.
end_process 20 ! Last cycle to execute.
res_high_refinement 6.0 ! High-resolution limit for particle alignment.
res_high_class 8.0 ! High-resolution limit to calculate class membership (OCC).
nclasses 1 ! Number of classes to use.
DANG 5.0 ! Mode 3 and 4: Angular step for orientational search.
ITMAX 200 ! Mode 2 and 4: Number of repetitions of grid search with random starting angles.

# Dataset-specific parameters
data_input Get1G1 ! Root name for parameter and map files.
raw_images Get1G1_stack_0_r1.mrc
image_contrast P ! N or P. Set to N if particles are dark on bright background, otherwise set to P.
outer_radius 45.0 ! Outer radius of spherical particle mask in Angstrom.
inner_radius 0.0 ! Inner radius of spherical particle mask in Angstrom
mol_mass 154.0 ! Molecular mass in kDa of particle or helical segment.
Symmetry C1 ! Symmetry of particle.
pix_size 1.05 ! Pixel size of particle in Angstrom.
dstep 5.0 ! Pixel size of detector in micrometer.
Aberration 2.7 ! Spherical aberration coefficient in millimeter.
Voltage 300.0 ! Beam acceleration voltage in kilovolt.
Amp_contrast 0.07 ! Amplitude contrast.

# Expert parameters (for expert users)
XSTD 0.0 ! Tighter masking of 3D map (XSTD > 0) or particles (XSTD < 0).
PBC 2.0 ! Discriminate particles with different scores during reconstruction. Small values (5 - 10) discriminate more than large values (50 - 100).
parameter_mask "1 1 1 1 1" ! Five flags of 0 or 1 (e.g. 1 1 1 1 1). Determines which parameters are refined (PSI, THETA, PHI, SHX, SHY).
refineangleinc 4 ! When larger than 1: Alternate between refinement of OCC and OCC + angles.
refineshiftinc 4 ! When larger than 1: Alternate between refinement of OCC and OCC + angles + shifts.
res_reconstruction 0.0 ! High-resolution limit of reconstruction. Normally set to Nyquist limit.
res_low_refinement 0.0 ! Low-resolution limit for particle alignment. Set to particle dimension or larger.
thresh_reconst 0.0 ! Particles with scores below this value will not be included in the reconstruction.
thresh_refine 50.0 ! Mode 4: Score threshold above which search will not be performed.
nbootstrap 1000 ! Number of bootstrap volumes to calculate real-space variance map.
FMAG F ! T or F. Set to T to refine particle magnification. Not recommended in most cases.
FDEF F ! T or F. Set to T to refine defocus per micrograph. Not recommended in most cases.
FASTIG F ! T or F. Set to T to refine astigmatism. Not recommended in most cases.
FPART F ! T or F. Set to T to refine defocus for each particle. Not recommended in most cases.
FFILT T ! T or F. Set to T to apply optimal filter to reconstruction. Recommended in most cases.
FMATCH F ! T or F. Set to T to output matching projections. Only needed for diagnostics.
FBEAUT F ! T or F. Set to T to apply symmetry also in real space. Not needed in most cases.
FBOOST F ! T or F. Set to T to allow potential overfitting during refinement. Not recommended in most cases.
RBfactor 0.0 ! B-factor sharpening (when < 0) applied during refinement. Not recommended in most cases.
beam_tilt_x 0.0 ! Beam tilt in mrad along X-axis.
beam_tilt_y 0.0 ! Beam tilt in mrad along y-axis.
mp_cpus 16 ! Number of CPUs to use for each reconstruction job.
restart_after_crash F ! T or F. Set to T to restart job after a crash.
delete_scratch F ! Delete intermediate files in scratch directory.
qsub_string_ref "" ! String to add to cluster jobs submitted for refinement (only for sge, lsf, slurm and pbs clusters).
qsub_string_rec "" ! String to add to cluster jobs submitted for reconstruction (only for sge, lsf, slurm and pbs clusters).
first_particle
last_particle
frealign_bin_dir
scratch_dir /scratch/tomography/nzigou/relion14_project/frealign/

# Masking parameters (for expert users)
mask_file
mask_edge 5 ! Width of cosine edge in pixels to add around mask. Set to 0 to leave mask unchanged.
mask_outside_weight 0.0 ! Factor to downweight density outside of mask (normally 0.0 - 1.0).
mask_filt_res 0.0 ! Filter radius (in A) to low-pass filter density outside density. Set to 0.0 to skip filtering.
mask_filt_edge 5 ! Width of cosine edge in reciprocal pixels to add to filter function.

focus_mask "" ! Four numbers (in Angstroms) describing a spherical mask (X, Y, Z for mask center and R for mask radius).

Thank you for help.

Jean

In reply to by jean-aymard

I see that you are using mpirun to use 440 CPUs for your job. Frealign does not use MPI. Parallelization is achieved through the scripts by launching multiple independent jobs.

It is quite possible that the scripts do not support your flavor of a SLURM cluster. Please try running your job again using the frealign_run_refine command and setting the cluster type either to SLURM or to STAMPEDE. Alternatively, if the compute nodes are accessible via SSH, you can try SSH as the cluster type and provide a list with the node names and the number of jobs to run on each in a file called hosts in the working directory. Please note that jobs run this way will not be registered with the queuing system and may interfere with other jobs running on the cluster. Please consult your system administrator.
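
To illustrate what "multiple independent jobs" means here, the following is a minimal sketch of how a particle stack could be divided into contiguous ranges, one per job. It is not taken from the Frealign scripts: the helper name split_particles is hypothetical, and ceiling division is assumed because it reproduces the ranges that appear in the frealign.log output quoted in this thread (190144 particles, 24 jobs).

```shell
#!/bin/bash
# split_particles: hypothetical helper, NOT part of Frealign.
# Prints "first last" for each job when TOTAL particles are divided
# among NJOBS independent jobs (assumption: ceiling division).
split_particles() {
    local total=$1 njobs=$2
    local incr=$(( (total + njobs - 1) / njobs ))   # particles per job, rounded up
    local first=1 last
    while [ "$first" -le "$total" ]; do
        last=$(( first + incr - 1 ))
        if [ "$last" -gt "$total" ]; then
            last=$total                              # last job takes the remainder
        fi
        echo "$first $last"
        first=$(( last + 1 ))
    done
}

# Values taken from the log in this thread: 190144 particles, 24 CPUs.
split_particles 190144 24
```

With these inputs the first range printed is 1 to 7923 and the last is 182230 to 190144. Each range would correspond to one job handed to the scheduler, so wrapping the run script in mpirun should not be needed.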

In reply to by niko

Thank you very much, Niko. Everything seems to run; however, I have an issue with the parameter file, although I used the conversion tool (http://grigoriefflab.janelia.org/frealign_conversion_scripts) with the star file from Relion.
I will start the conversion process over.

ERROR: Something wrong with parameter file.
If input file contained film#, defocus 1, defocus 2, astig. angle,
please make sure that numbers are separated by spaces, not commas.
Cycle 1: refining particles 1 to 7923, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 7924 to 15846, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 15847 to 23769, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 23770 to 31692, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 31693 to 39615, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 39616 to 47538, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 47539 to 55461, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 55462 to 63384, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 63385 to 71307, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 71308 to 79230, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 79231 to 87153, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 87154 to 95076, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 95077 to 102999, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 103000 to 110922, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 110923 to 118845, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 118846 to 126768, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 126769 to 134691, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 134692 to 142614, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 142615 to 150537, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 150538 to 158460, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 158461 to 166383, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 166384 to 174306, class 1 Mo 10. Okt 16:08:21 CEST 2016
Cycle 1: refining particles 174307 to 182229, class 1 Mo 10. Okt 16:08:22 CEST 2016
Cycle 1: refining particles 182230 to 190144, class 1 Mo 10. Okt 16:08:22 CEST 2016

Regards
Jean

In reply to by jean-aymard

Please make sure you are using the latest conversion script and also check that the converted file has all the columns and numbers in the right place. The Relion format may change and the script may stop working.

In reply to by niko

Dear Niko,
I downloaded the latest conversion script again, and I now get the following error messages in frealign.log (1) and mult_refine.log (2):
frealign.log (1)
Starting refinement...
ERROR: Something wrong with parameter file.
If input file contained film#, defocus 1, defocus 2, astig. angle,
please make sure that numbers are separated by spaces, not commas.
Cycle 1: refining particles 1 to 8961, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 8962 to 17922, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 17923 to 26883, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 26884 to 35844, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 35845 to 44805, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 44806 to 53766, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 53767 to 62727, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 62728 to 71688, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 71689 to 80649, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 80650 to 89610, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 89611 to 98571, class 1 Di 11. Okt 00:36:27 CEST 2016
Cycle 1: refining particles 98572 to 107532, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 107533 to 116493, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 116494 to 125454, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 125455 to 134415, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 134416 to 143376, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 143377 to 152337, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 152338 to 161298, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 161299 to 170259, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 170260 to 179220, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 179221 to 188181, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 188182 to 197142, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 197143 to 206103, class 1 Di 11. Okt 00:36:28 CEST 2016
Cycle 1: refining particles 206104 to 215063, class 1 Di 11. Okt 00:36:28 CEST 2016
Normal termination of frealign run Di 11. Okt 00:36:35 CEST 2016

mult_refine.log (2)
cat mult_refine.log
pwd: invalid option -- L
Try `pwd --help' for more information.
grep: /Get1G1_0_r1.par: No such file or directory
grep: /Get1G1_0_r1.par: No such file or directory
ls: No match.
pwd: invalid option -- L
Try `pwd --help' for more information.
Warning: ieee_inexact is signaling
FORTRAN STOP

RSAMPLE 1.01 - 24.07.15

Copyright 2013 Howard Hughes Medical Institute.
All rights reserved.
Use is subject to Janelia Farm Research Campus Software Copyright 1.1
license terms ( http://license.janelia.org/license/jfrc_copyright_1_1.html )

INPUT PARAMETER FILE ?
Get1G1_0.par
PIXEL SIZE [A] ?
# OF BOOTSTRAP VOLUMES ?
ERROR: Something wrong in parameter file
10 72,00 65,00 70,00 -3,15 2,10 47619 10 26677,0 26148,0 -15,00 100,00 -500 1,0000 20,00 0,00
0.000u 0.002s 0:00.00 0.0% 0+0k 0+0io 3pf+0w
/mult_reconstruct.com: Command not found.
Di 11. Okt 00:36:27 CEST 2016
ls: No match.
ls: No match.
sbatch: error: Unable to open file n1_1
sbatch: error: Unable to open file n1_8962
sbatch: error: Unable to open file n1_17923
sbatch: error: Unable to open file n1_26884
sbatch: error: Unable to open file n1_35845
sbatch: error: Unable to open file n1_44806
sbatch: error: Unable to open file n1_53767
sbatch: error: Unable to open file n1_62728
sbatch: error: Unable to open file n1_71689
sbatch: error: Unable to open file n1_80650
sbatch: error: Unable to open file n1_89611
sbatch: error: Unable to open file n1_98572
sbatch: error: Unable to open file n1_107533
sbatch: error: Unable to open file n1_116494
sbatch: error: Unable to open file n1_125455
sbatch: error: Unable to open file n1_134416
sbatch: error: Unable to open file n1_143377
sbatch: error: Unable to open file n1_152338
sbatch: error: Unable to open file n1_161299
sbatch: error: Unable to open file n1_170260
sbatch: error: Unable to open file n1_179221
sbatch: error: Unable to open file n1_188182
sbatch: error: Unable to open file n1_197143
sbatch: error: Unable to open file n1_206104

Besides, my input files were renamed from xxx_0_r1 to just xxx_0.
The parameter file seemed fine before I started the run, though.

Here is an excerpt of the parameter file:

1 152,00 124,00 -121,00 -2,10 4,20 47619 1 24279,0 23622,0 50,00 100,00 -500 1,0000 20,00 0,00
2 -80,00 124,00 -121,00 1,05 -0,00 47619 2 27533,0 26784,0 46,00 100,00 -500 1,0000 20,00 0,00

----------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------

215062 -19,00 46,00 -159,00 -0,00 1,05 47619 243 31035,0 30469,0 40,00 100,00 -500 1,0000 20,00 0,00
215063 -146,00 82,00 8,00 -0,00 -0,00 47619 95 22311,0 21934,0 -87,00 100,00 -500 1,0000 20,00 0,00
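
The rows above use a comma as the decimal separator (for example 152,00), which matches the complaint in the error message about numbers and commas. A minimal sketch of one way to repair such a file is shown below. fix_decimal_commas is a hypothetical helper, not part of Frealign or the conversion scripts, and it assumes that commas only ever occur between two digits. Decimal commas usually come from a locale setting, so rerunning the conversion with LC_NUMERIC=C might avoid the problem at the source, although that was not tested in this thread.

```shell
#!/bin/bash
# fix_decimal_commas: hypothetical helper, NOT part of Frealign.
# Replaces every comma that sits between two digits with a period.
# Reads the named files, or standard input if no file is given.
fix_decimal_commas() {
    sed -E 's/([0-9]),([0-9])/\1.\2/g' "$@"
}

# Demonstration on the first row quoted in this thread:
echo "1 152,00 124,00 -121,00 -2,10 4,20 47619 1 24279,0 23622,0 50,00 100,00 -500 1,0000 20,00 0,00" \
    | fix_decimal_commas
```

For a whole file, the call could look like fix_decimal_commas Get1G1_0.par > Get1G1_0_fixed.par, where the output name is only an example. It is worth checking afterwards that the columns still line up as Frealign expects.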

In reply to by niko

Thank you, Niko, the comma change works.

Now it seems that the command mult_reconstruct is not recognized and that I must submit the job with sbatch rather than with just the frealign_run_refine command.
However, I did not include exit in the monitor_frealign.log script this time.

monitor_frealign.log mparameters_run mult_refine.log pid.log pid_temp.log slurm.com
[................]$ cat mult_refine.log
ls: No match.

RSAMPLE 1.00 - 02.06.14

Copyright 2013 Howard Hughes Medical Institute.
All rights reserved.
Use is subject to Janelia Farm Research Campus Software Copyright 1.1
license terms ( http://license.janelia.org/license/jfrc_copyright_1_1.html )

INPUT PARAMETER FILE ?
Get1G1_0.par
PIXEL SIZE [A] ?
# OF BOOTSTRAP VOLUMES ?
N = 215063
OUTPUT PARAMETER FILE (*.par)?
Get1G1_0_r.par
Get1G1_0_r1.par

Normal termination of rsample
3.478u 0.058s 0:03.61 97.5% 0+0k 0+0io 0pf+0w
/scratch/agmisc/....../workfile/frealign/: Command not found.
Di 11. Okt 16:38:43 CEST 2016
ls: No match.
ls: No match.
sbatch: error: Batch job submission failed: Invalid partition name specified
grep: /scratch/agmisc/...../workfile/frealign/scratch/Get1G1_1_r1.par_1_6721: No such file or directory
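
The "Invalid partition name specified" message comes from sbatch itself, so it may help to list the partitions the cluster actually offers. The sketch below rests on several assumptions: default_partition is a hypothetical helper that reads the output of sinfo -h -o "%P" (where SLURM marks the default partition with a trailing asterisk) and prints the default partition name. Whether the Frealign scripts on a given installation pass a partition option through qsub_string_ref and qsub_string_rec is also an assumption and should be checked with the system administrator.

```shell
#!/bin/bash
# default_partition: hypothetical helper, NOT part of Frealign or SLURM.
# Reads partition names (one per line) on standard input and prints the
# first one ending in "*", which is how sinfo flags the default partition.
default_partition() {
    awk '!found && /\*$/ { sub(/\*$/, ""); print; found = 1 }'
}

# Intended use on a login node (assumes sinfo is available there):
#   sinfo -h -o "%P" | default_partition
# The resulting name could then be tried in mparameters, for example
# (NAME is a placeholder, and this use of qsub_string_ref is an assumption):
#   qsub_string_ref "--partition=NAME"
```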

In reply to by jean-aymard

I have now submitted a batch job following the online NIH procedure; I will have to wait and see whether it works that way:

(https://hpc.nih.gov/apps/Frealign.html#sbatch)

Batch job on Biowulf

Single Node Job

There are two ways of running a batch job on Biowulf. If your commands involve subtasks that run very quickly (for example reconstruction steps that last a few minutes each), it is much more efficient to run on a single node, using the local scratch disk. This is done by editing the mparameters file. In this case, because the path to the local scratch disk space is unknown prior to the job submission, the mparameters file contains a dummy tag "XXXXXXXX".
cluster_type none
nprocessor_ref 16
nprocessor_rec 16
scratch_dir /lscratch/XXXXXXXX
Create a batch input file (e.g. frealign.sh) which cds to the directory containing the mparameters file and substitutes the actual path to local disk into the dummy tag. For example:
#!/bin/bash
module load Frealign
cd /path/to/directory/where/mparameters/lives/
sed -i "s#/lscratch/XXXXXXXX#/lscratch/$SLURM_JOB_ID#" mparameters
frealign_run_refine
Submit this job using the Slurm sbatch command, allocating as many cpus as indicated in the nprocessor_ref and/or nprocessor_rec values, as well as local scratch space:
sbatch --cpus-per-task=16 --gres=lscratch:50 frealign.sh
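
Because the CPU count requested from sbatch should correspond to the values in mparameters, one option is to read them from the file rather than typing them twice. The following is a minimal sketch under that assumption. max_nprocessor is a hypothetical helper, not part of Frealign or the Biowulf setup, and it assumes the "keyword value" line layout shown in the mparameters listing in this thread.

```shell
#!/bin/bash
# max_nprocessor: hypothetical helper, NOT part of Frealign or Biowulf.
# Prints the larger of nprocessor_ref and nprocessor_rec found in the
# given mparameters file (prints 0 if neither keyword is present).
max_nprocessor() {
    awk '$1 == "nprocessor_ref" || $1 == "nprocessor_rec" {
             if ($2 + 0 > max) max = $2 + 0
         }
         END { print max + 0 }' "$1"
}

# Possible use, mirroring the sbatch line quoted above:
#   sbatch --cpus-per-task="$(max_nprocessor mparameters)" --gres=lscratch:50 frealign.sh
```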

Multiple Node Job

In cases where the subtasks of Frealign are expected to run for a significantly long time (more than 10 minutes), it may be better to distribute the subtasks across multiple nodes. Again, edit the mparameters file to indicate 'slurm' as the cluster type, and set nprocessor_{ref,rec} to no more than 50. In this case, you must use a shared scratch space, so either leave scratch_dir blank to indicate the current working directory, or set the value to a specific directory. The value for mp_cpus can be safely set to 2, because each sbatch job will allocate a minimum of 2 cpus.
cluster_type slurm
nprocessor_ref 50
nprocessor_rec 50
mp_cpus 2

Create a batch script (for example frealign.sh):