Temporary file I/O issues

Hello,

I'm running 32 unblur movie alignments on a single machine (128 vCPUs on an AWS x1.32xlarge instance) and I keep running into seemingly random errors related to the temporary files that unblur writes out.

An example of the error message:

2017-10-10 15:28:51: Fatal error (UsefulFunctions::FileCopy): File exists but cannot be opened: .UnBlur_S6XFugQHyDU7jmnv ; file not found, unit 20, file /data/.UnBlur_S6XFugQHyDU7jmnv

Is it correct to assume that the issue is related to input/output on the storage drive? It is a RAID0 SSD, so I wouldn't expect I/O problems of this kind.

What do you think is the best way to get around this problem? Can I recompile without temporary files being written? Or would a short wait time help?

Thanks!
Mike

Hi Mike,

I assume you are piping input into the program? If you want to try messing around with the source, you could try changing two lines in core/user_input.f90:

line 68 - change "if (this_program%running_interactively) then" to "if (this_program%reading_from_terminal) then"
line 889 - also change "if (this_program%running_interactively) then" to "if (this_program%reading_from_terminal) then"

This may solve your problem, although I haven't really tested it.

Alternatively, you could run your job threaded, instead of running many instances.

Let me know if neither of these solves your problem!

Tim

In reply to timgrant

Hi Tim,

Thanks for your reply. I will try editing the source code to see if that can help.

Otherwise, do you think a wait time between job launches would help? Or just putting every job into its own directory?

Thanks,
Mike
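
For anyone wanting to try the two workarounds asked about above, here is a minimal Python sketch that starts each job in its own working directory with an optional delay between launches. It assumes the temporary file is written to the current working directory, which the error message suggests but nothing in this thread confirms. The function name, the job_NNN directory naming, and the idea of passing each job as a ready-made command are all hypothetical. Since unblur takes its input from a pipe, each command would in practice be a wrapper script that feeds it, and any input paths would need to be absolute.

```python
import subprocess
import time
from pathlib import Path


def launch_in_own_directories(commands, base_dir, stagger_seconds=0.0):
    """Start each command in a separate working directory.

    commands        -- list of argument lists, one per job (hypothetical
                       wrapper scripts that feed unblur its piped input)
    base_dir        -- parent directory under which job_000, job_001, ...
                       are created (naming is an assumption of this sketch)
    stagger_seconds -- optional pause between launches

    Returns the exit code of every job, in launch order.
    """
    procs = []
    for i, cmd in enumerate(commands):
        workdir = Path(base_dir) / f"job_{i:03d}"
        workdir.mkdir(parents=True, exist_ok=True)
        # cwd isolates anything the job writes relative to its working directory
        procs.append(subprocess.Popen(cmd, cwd=workdir))
        if stagger_seconds > 0 and i < len(commands) - 1:
            time.sleep(stagger_seconds)
    # Wait for all jobs to finish and collect their exit codes
    return [p.wait() for p in procs]
```

All jobs still run concurrently; the delay only spaces out their start times, so it would help only if the collisions happen at startup.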

In reply to timgrant

Ok thanks!

One more question - if I were going to thread unblur, what is the optimal number of threads to use? Keep in mind I can pick any configuration since it will be on AWS, which means from 1 to 128 hyperthreads for a given job.

Mike

In reply to mcianfrocco

You may have to play around with the number a little to find the optimum, but something that divides evenly into the number of frames is a good choice, and certainly no more than the number of frames.

Tim
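
The rule of thumb above can be turned into a short list of thread counts worth benchmarking. This is only a sketch of that heuristic in Python: the function name and the max_threads parameter (the number of hardware threads you are willing to give one job) are made up for illustration, and the actual optimum still has to be found by timing real runs.

```python
def candidate_thread_counts(n_frames, max_threads):
    """List thread counts that divide n_frames evenly, largest first.

    Candidates never exceed n_frames (extra threads would have no frame
    to work on) or max_threads (the hardware threads available per job).
    """
    if n_frames < 1 or max_threads < 1:
        raise ValueError("n_frames and max_threads must be positive")
    limit = min(n_frames, max_threads)
    return [t for t in range(limit, 0, -1) if n_frames % t == 0]
```

For a 40-frame movie on a machine with 128 hardware threads, candidate_thread_counts(40, 128) returns [40, 20, 10, 8, 5, 4, 2, 1]. Capping a job at 16 threads narrows that to [10, 8, 5, 4, 2, 1].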