ctftilt segmentation fault and suggested bug fix

The latest version of ctftilt (ctf_100930.tar.gz) is producing segmentation faults. After a little poking around and debugging, I was able to trace the problem to an uninitialized variable, RMSMIN.

The main program sets the value of RMSMIN at line 256:

      DO 71 I=J-1,1,-1
        IF (BINS(I).LT.CMAX/10.0) THEN
          RMSMIN=(I-1)*(MAX-MIN)/(NBIN-1)+MIN
          GOTO 72
        ENDIF
71    CONTINUE

RMSMIN is then passed to subroutine FIND_TAXIS and from there to FIND_TAXIS_S. If the condition at line 255 is never satisfied, RMSMIN is left uninitialized and picks up whatever value happens to be at that memory location. Minor changes to the code (e.g. added print statements) result in wildly different values for RMSMIN (1.55E-19, 5.08E20). If RMSMIN happens to get a very small value, everything seems to work fine. If it gets a very large value, FIND_TAXIS_S returns NaNs in the array VARP2 and the program crashes.

Giving RMSMIN a default value does the trick. I'm not terribly familiar with the inner workings of the code, but would initializing to zero be reasonable?
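A minimal sketch of the proposed fix, using made-up histogram data rather than anything from ctftilt. The program name RMSDEF, the values in BINS, and the variables VMIN and VMAX (standing in for MIN and MAX in the loop above) are assumptions for illustration only. Whether zero is the right default for ctftilt itself is still the open question. The layout is chosen so the file should compile as either fixed-form or free-form Fortran.

```fortran
      PROGRAM RMSDEF
!     Sketch only: NBIN, BINS, VMIN and VMAX are made-up test
!     data, not values taken from ctftilt.
      IMPLICIT NONE
      INTEGER NBIN
      PARAMETER (NBIN=8)
      INTEGER I,J
      REAL BINS(NBIN),CMAX,VMIN,VMAX,RMSMIN
      DATA BINS /5.0,6.0,7.0,9.0,8.0,7.0,6.0,5.0/
      VMIN=0.0
      VMAX=7.0
!     Locate the histogram peak: bin J holds the maximum CMAX.
      J=1
      CMAX=BINS(1)
      DO I=2,NBIN
        IF (BINS(I).GT.CMAX) THEN
          CMAX=BINS(I)
          J=I
        ENDIF
      END DO
!     Proposed fix: define RMSMIN before the search, so it has
!     a known value even if the IF below never fires.
      RMSMIN=0.0
      DO 71 I=J-1,1,-1
        IF (BINS(I).LT.CMAX/10.0) THEN
          RMSMIN=(I-1)*(VMAX-VMIN)/(NBIN-1)+VMIN
          GOTO 72
        ENDIF
71    CONTINUE
72    CONTINUE
!     With this data no bin below the peak is under CMAX/10,
!     so RMSMIN keeps its default instead of being undefined.
      PRINT *,'RMSMIN =',RMSMIN
      END
```

With the default in place, the value handed on to FIND_TAXIS is the same on every run, regardless of what happens to be in memory.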

Cheers,
Bob Sinkovits

Yes, this looks like a bug. I will correct this and post an updated version over the weekend. Thanks!

In reply to niko

Hi Niko - thanks for taking care of this. You may want to wait, though, before posting the new version. ctftilt now runs fine in single-processor mode, but I'm still seeing segmentation faults when I compile with -fopenmp. Note that ctffind3 runs just fine in parallel mode, so it's not a general problem with multi-threaded apps.

In reply to niko

Once I disable parallelization of the following loop, everything runs fine:

!$OMP PARALLEL DO
      DO 100 I=1,179,2
        CALL FIND_TAXIS_S(AIN,NXYZ,RMSMIN,RMSMAX,
     +                    JXYZ,POWER,KXYZ,NR,VARP2,I,0)
100   CONTINUE

My guess is that multiple threads are trying to write to the same block of memory, but I can't seem to track down where this is happening. Since this loop accounts for only a small fraction of the run time, it wouldn't be much of a performance hit to disable parallelization until the root of the problem has been identified and fixed.
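To illustrate the kind of problem guessed at above, here is a generic sketch of a loop with the same shape. It is not a diagnosis of ctftilt: WORKER is a hypothetical stand-in for FIND_TAXIS_S, and SCR is a hypothetical scratch variable. In OpenMP, variables in the enclosing scope are shared by default, as are COMMON blocks and SAVEd locals in called routines, so any scratch storage written by every iteration needs to be made private. The layout should compile as either fixed-form or free-form Fortran.

```fortran
      PROGRAM OMPSKT
!     Illustration only: WORKER is a hypothetical stand-in for
!     FIND_TAXIS_S and SCR is a hypothetical scratch variable.
      IMPLICIT NONE
      INTEGER N
      PARAMETER (N=179)
      INTEGER I
      REAL VARP2(N),SCR
      DO I=1,N
        VARP2(I)=0.0
      END DO
!     Without PRIVATE(SCR) all threads would write to the same
!     SCR at once, which is a data race. With it, each thread
!     gets its own copy.
!$OMP PARALLEL DO PRIVATE(SCR)
      DO I=1,N,2
        CALL WORKER(I,SCR,VARP2(I))
      END DO
!$OMP END PARALLEL DO
      PRINT *,'VARP2(1), VARP2(N) =',VARP2(1),VARP2(N)
      END

      SUBROUTINE WORKER(I,SCR,RES)
!     Each call writes only the scratch value it was given and
!     its own output element, so iterations do not interfere.
      IMPLICIT NONE
      INTEGER I
      REAL SCR,RES
      SCR=REAL(I)
      RES=2.0*SCR
      END
```

If the real routine writes to shared work arrays, COMMON blocks, or SAVEd locals, a PRIVATE clause on the caller's variables alone would not be enough, and leaving the loop serial remains the simple workaround.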

Cheers,
Bob