CTFFIND4 does not produce output on certain micrographs

I have been running ctffind4 in parallel on a number of micrographs, and have noticed that a large number of them have no output from ctffind4, indicating that it either did not finish or did not produce output.

When I do this on several hundred micrographs at once, it complains about a temporary .CTFFIND file having already been deleted in some instances.

CTFFIND finishes normally on more of the micrographs when I execute fewer jobs, but appears to always fail on a subset.

The subset on which it fails appears to be random, and resubmitting the jobs usually leads to the resubmitted jobs finishing (though it can sometimes take 3 or 4 submissions to get all of them to run).
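For anyone hitting the same thing, here is a rough sketch of how the micrographs that still need resubmitting could be found after each round. The `_ctf.txt` suffix and the helper name `missing_outputs` are assumptions for illustration only; substitute whatever output naming your own jobs use.

```python
from pathlib import Path


def missing_outputs(micrographs, suffix="_ctf.txt"):
    """Return the micrographs whose expected output file is absent or empty.

    `suffix` is a hypothetical naming convention, not something ctffind4
    guarantees: the output is assumed to sit next to the micrograph and to be
    named <micrograph stem> + suffix.
    """
    missing = []
    for mic in micrographs:
        mic = Path(mic)
        expected = mic.with_name(mic.stem + suffix)
        # Treat a zero-byte file as a failure too, since a job may have
        # created the file without writing any results into it.
        if not expected.is_file() or expected.stat().st_size == 0:
            missing.append(mic)
    return missing
```

The returned list can be fed straight back into whatever submits the jobs, repeating until it comes back empty.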

Finally, if I use the old-school input method, many (or all) of the jobs will output the CTF parameters to the log file, but do not generate the extra .txt output file in which the resolution to which the data is good is printed, indicating the problem might be in the output step, not in the fitting itself.

I am using the most recent version of CTFFIND4 (4.0.16).

Sorry if this bug report is a bit confusing.

Hi Axel,

Thanks for reporting this. I had thought that the bugs related to temporary files were fixed in 4.0.15 and later versions, so I'm sorry to hear you're still getting problems. Does your version of ctffind4 happen to print out any more information when it crashes? Like a stack trace perhaps? I may have to completely rethink the temporary file feature (it allows the program to remember what you answered the last time you ran it), but I'm afraid I won't have time to get to this for a while.

The only sure-fire workaround for this is to run ctffind in a separate folder for each input micrograph. That way, there can be no cross-talk between temporary files created by different instances of the program (an additional benefit of this is that some filesystems perform much better when the number of files per directory is small).
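A minimal sketch of that one-folder-per-micrograph workaround, in Python. The helper name `run_isolated`, the `work_root` layout and the `run.log` file are illustrative assumptions, and the command line and any answers piped to standard input are left to the caller, since they depend on how you invoke the program.

```python
import subprocess
from pathlib import Path


def run_isolated(micrograph, command, work_root, stdin_text=""):
    """Run `command` in a working directory dedicated to one micrograph.

    Assumes micrograph file names (stems) are unique; two inputs with the
    same stem would end up sharing a folder.
    """
    micrograph = Path(micrograph).resolve()
    workdir = Path(work_root) / micrograph.stem
    workdir.mkdir(parents=True, exist_ok=True)
    # cwd=workdir: anything the program writes to its current directory
    # (such as hidden temporary files) stays in this folder only.
    # Paths inside `command` or `stdin_text` should be absolute, because the
    # working directory is no longer the one the caller started in.
    result = subprocess.run(
        command,
        cwd=workdir,
        input=stdin_text,
        text=True,
        capture_output=True,
    )
    # Keep the program's messages next to its other files for later inspection.
    (workdir / "run.log").write_text(result.stdout + result.stderr)
    return workdir, result.returncode
```

Calling this once per micrograph (from a job array or a process pool) gives every instance its own directory, at the cost of having to collect the results from many folders afterwards.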

Regarding the other potential bug with old-school input, do you have a reproducible error, or is it also random? I can't think why this would occur, unless you are giving all your output files the same filename perhaps...

Cheers,
Alexis

In reply to Alexis

Definitely not giving the output files the same filename (I am using the relion mpi feature to run it in old-school mode).

It also appears to be random. Re-running the jobs appears to fix the problem.

I'll reproduce the bug, and let you know what the error printout is.

Axel

In reply to Axel

There are 4 general behaviors:

1. Jobs finished cleanly and output parameters to the log file, but did not write the .txt file

2. Jobs crashed with this output:

2015-10-06 20:16:48: Fatal error (UsefulFunctions::FileCopy): File exists but cannot be opened: .CTFFind_5Q7vNKg9GKogGI9k ; file not found, unit 20, file /net/em-stor1/abrilot/tf30/SerialEMData/group2_refflip/512/.CTFFind_5Q7vNKg9GKogGI9k

3. Other jobs failed with this output:

2015-10-06 18:51:10: Fatal error (UserInput::UpdateDefaults): UpdateDefaults can only be called once

4. Other jobs failed with this output:

**warning(FileDelete): error 28 when trying to delete .CTFFind_3RRwUCYQpvCcsJTg
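In case it helps with counting how often each case occurs, the three error messages above can be tallied across job logs with something like the following. This is only a sketch: the patterns are copied from the messages quoted in this post, the labels are made up, and a log matching none of them may be either a clean run or behavior 1, which leaves no error message to search for.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted above; the labels are arbitrary.
PATTERNS = {
    "file_copy": re.compile(r"Fatal error \(UsefulFunctions::FileCopy\)"),
    "update_defaults": re.compile(r"Fatal error \(UserInput::UpdateDefaults\)"),
    "file_delete": re.compile(r"warning\(FileDelete\)"),
}


def classify_log(text):
    """Return the label of the first known message found in one log's text."""
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            return label
    return "no_known_error"


def tally(log_texts):
    """Count how many logs fall into each category."""
    return Counter(classify_log(text) for text in log_texts)
```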

May I suggest adding a batch mode, where no temporary files are used, and an interactive mode, where a temp file is created?