Refinement threads fail in Mode 1
Hello-
I am attempting to process a (true) helical filament (symmetry H) with Frealign v9.11, running locally on a workstation.
After initial alignment of the segments in RELION, I was able to port my data to Frealign and generate a believable 3.5 Å reconstruction (based on comparison to known structures of the protomer), using the RELION alignment parameters with mode 0 and the provided frealign_calc_reconstructions script.
I can then get through a single round of "priming" refinement with parameter_mask set to "0 0 0 0 0", running frealign_run_refine. However, when I then attempt an actual refinement, setting parameter_mask to "1 1 1 1 1", the refinement step fails in inconsistent ways. The refinement commences, but individual frealign processes begin to fail, generally after writing the header but before writing any alignment parameters to their respective temporary parameter files in the scratch directory. Sometimes they chug along for a little while but then die, with no obvious pattern to where they fail. Oddly, the frealign_run_refine script sometimes seems aware that this has happened, writing a "Job XXXXX crashed" note in the frealign.log file; in that case it does not kill the other threads, and frealign_kill will not kill them either. Other times the threads begin to die off without the script noticing, and then frealign_kill does work.
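For reference, the relevant mparameters setting (the five flags correspond, if I understand correctly, to PSI, THETA, PHI, SHX and SHY) looks like:

parameter_mask       "1 1 1 1 1"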
When a "Job XXXXX crashed" note does appear, it generally points to a specific mult_refine log file in the scratch directory. However, I can find no difference between these files and those from successful threads, other than the absence of any scores.
I can't think of any obvious differences between this job and others we routinely run successfully, other than that the dataset is somewhat larger: ~160,000 segments vs. ~70,000. We are using a 512-pixel box, so the stack is fairly large. We have been using MRC stacks: does this become an issue with large stacks? Would a different format potentially be better? It is a bit confusing, though, that the failures only begin once parameter_mask is set to "1 1 1 1 1"; I would expect input-file read issues to manifest systematically...
Apologies for the long post, and thanks very much in advance for any help!
-Greg
This sounds odd. The larger
This sounds odd. The larger file size of the particle stack should not matter. I assume you have already eliminated the obvious, such as limited disk space or limited RAM on your compute nodes when running many jobs in parallel. The problem may have to do with the parameter file. Make sure there is no obvious difference in format between the present parameter file and a previous one that worked.
There is another potential problem: for helical symmetry, Frealign tries to apply restraints to the angles (in addition to the shift restraints, which are always applied). The restraints are based on the standard deviations of the PSI and THETA angles, which should be similar to each other within a filament. If the imported PSI and THETA parameters are identical within a given filament, the standard deviations are zero, and this could cause a problem with the restraints.
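A quick sanity check along these lines (a rough sketch, assuming whitespace-separated columns in the usual v9 .par order: particle number, PSI, THETA, PHI, SHX, SHY, MAG, FILM, ...) would flag filaments with zero angular spread:

from collections import defaultdict

def check_angle_spread(par_path):
    # Collect PSI/THETA per filament, keyed by the FILM column.
    angles = defaultdict(list)
    with open(par_path) as fh:
        for line in fh:
            if line.startswith("C"):      # skip comment/header lines
                continue
            f = line.split()
            angles[int(f[7])].append((float(f[1]), float(f[2])))
    for film, vals in angles.items():
        psis, thetas = zip(*vals)
        # Identical angles give zero standard deviation, which would
        # make a restraint weight proportional to 1/sigma^2 blow up.
        if len(set(psis)) == 1 or len(set(thetas)) == 1:
            print(f"filament {film}: zero PSI/THETA spread")

check_angle_spread("particles_1_r1.par")   # hypothetical file name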
Hi Niko- Thanks for the quick
In reply to This sounds odd. The larger by niko
Hi Niko-
Thanks for the quick reply. Yes, the machine has 256 GB of RAM and plenty of free disk space, and the refinement is running off a two-disk SSD RAID 0, so I/O should be decent.
I do notice one difference between the parameter files: the FILM column, which we use to track the filaments, goes above 100000 in the larger dataset. At that point it directly abuts the MAG column, with no space in between. Checking a bit more thoroughly, it does seem that the particles from later in the dataset, where this value exceeds 100000, are the ones that fail. I tried starting a refinement with C1 symmetry, and it fails in the same manner, suggesting it is not a problem of H symmetry per se...
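To illustrate what I think is happening (with assumed field widths, not necessarily Frealign's exact FORMAT statement): once a six-character FILM field is completely filled, nothing separates it from the preceding MAG field, and a whitespace-based read sees a single fused token:

mag = 10000.0
for film in (99999, 100000):
    line = f"{mag:8.0f}{film:6d}"   # hypothetical "%8.0f%6d"-style fields
    print(repr(line), "->", line.split())

# '   10000 99999' -> ['10000', '99999']
# '   10000100000' -> ['10000100000']   (MAG and FILM fused into one token)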
We generally set stiffness to 0.0, such that (I believe) there should not be restraints between neighboring segments. I can try setting FILM to the same value for all the segments: do you think this could cause other problems? I assume simply inserting spaces is not going to work...
Yes, the FILM column could
In reply to Hi Niko- Thanks for the quick by galushin
Yes, the FILM column could cause the problem. Try changing it and run again.
Yes this seems to be the
In reply to Yes, the FILM column could by niko
Yes, this seems to be the problem. In summary, the workaround was to set FILM to "1" for all segments, which works as long as stiffness is set to "0.0" in mparameters. Note that this workaround will not help anyone with more than 100000 filaments who does want to use the stiffness restraint option.
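In case it helps anyone else, this is roughly the script I used (a quick sketch; the character offsets assume FILM is a six-wide integer field ending at column 65, as in our .par files, so check your own layout first):

def reset_film(par_in, par_out, film_start=59, film_end=65):
    # Overwrite the FILM field with 1 on every data line, leaving
    # comment/header lines (which start with "C") untouched.
    with open(par_in) as fin, open(par_out, "w") as fout:
        for line in fin:
            if line.startswith("C"):
                fout.write(line)
            else:
                fout.write(line[:film_start] + f"{1:6d}" + line[film_end:])

reset_film("particles_1_r1.par", "particles_1_r1_film1.par")   # hypothetical file names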
Thanks again.