ainterpo3ds.f segmentation fault

Hi,

I am getting a segmentation fault originating in the function AINTERPO3DS (Frealign v9.11, 151031). The backtrace gives:

Final lines:

frealign_v9.exe 0000000001ABC9DF Unknown Unknown Unknown
frealign_v9.exe 0000000001AC03FD Unknown Unknown Unknown
frealign_v9.exe 0000000001B731C0 Unknown Unknown Unknown
frealign_v9.exe 0000000000451A76 ainterpo3ds_ 88 ainterpo3ds.f
frealign_v9.exe 00000000004461F9 presb_ 104 presb.f
frealign_v9.exe 000000000043BAB2 lmain_ 812 lmain.f
frealign_v9.exe 000000000040643E MAIN__ 693 frealign_v9.f
frealign_v9.exe 00000000004005BE Unknown Unknown Unknown
frealign_v9.exe 0000000001B74A3B Unknown Unknown Unknown
frealign_v9.exe 0000000000400429 Unknown Unknown Unknown

Line 88 of ainterpo3ds.f is:

CBUF = A3DF(ID)

But sometimes the segfault originates in line 83:

Final lines:

frealign_v9.exe 0000000001B05346 Unknown Unknown Unknown
frealign_v9.exe 0000000001B087A6 Unknown Unknown Unknown
frealign_v9.exe 0000000001BD1020 Unknown Unknown Unknown
frealign_v9.exe 000000000044A6A8 ainterpo3ds_ 83 ainterpo3ds.f
frealign_v9.exe 000000000043F240 presb_ 104 presb.f
frealign_v9.exe 000000000043601B lmain_ 812 lmain.f
frealign_v9.exe 0000000000407513 MAIN__ 693 frealign_v9.f
frealign_v9.exe 00000000004005EE Unknown Unknown Unknown
frealign_v9.exe 0000000001BD289B Unknown Unknown Unknown
frealign_v9.exe 00000000004004A9 Unknown Unknown Unknown

Line 83 of ainterpo3ds.f is:

RBUF = A1 * A2 * A3

Some additional info that I could gather:

- It happens only for specific particles, in specific cycles, with specific references. The particles themselves are fine: refinement and reconstruction of this dataset work under other conditions, and previous cycles of the same refinement ran without problems.

- It happens for the same particles in the same cycles regardless of whether I am refining the alignment parameters or only doing classification (i.e. PMASK is 0 0 0 0 0).

- If I change the alignment and classification resolution limits slightly (0.1 A higher or lower, or 1 resolution shell higher or lower in this case), the error does not happen, at least not up to the cycles I observed. (Was I just unlucky in my choice of resolution limits?)

- If I compile with -O0 the error seems to go away, but it happens with -O1, -O2 and -O3.

- Finally, the error does not happen with the distributed binaries, at least not up to the cycles I observed. (By the way, which Makefile is used to generate them?)

I am compiling the code myself because I modified other parts of it for my project, but nothing related to ainterpo3ds.f. I also tried compiling the original source code without any modification and still got the error. I tried both the Intel and the GNU makefiles.

Could this be a bug? Any other hint?

Thanks for helping.

Thanks for your careful testing and documentation! Yes, this looks like a bug. Maybe some variable is not initialized correctly and then, depending on the compiler, ends up with unreasonable values. Alternatively, rounding errors may sometimes lead to out-of-bounds addresses. To test whether an out-of-bounds error occurs, you could check that the index ID is larger than 0 and never exceeds NSAM*IPAD/2*NSAM*IPAD*(NSAM*IPAD+2). If it is outside this range, skip CBUF = A3DF(ID) and set AINTERPO3DS = 0.0 before returning from the AINTERPO3DS subroutine.
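
The range check described above could be sketched roughly as follows. This is only a minimal, self-contained illustration and not the actual Frealign code: the helper function SAFEGET and the demo program GDEMO are hypothetical names, and A3DF is assumed here to be a one-dimensional COMPLEX array indexed by ID. The upper bound is copied exactly as written above.

```fortran
C     Sketch of the suggested index guard. SAFEGET and GDEMO are
C     hypothetical and are not part of Frealign.
      PROGRAM GDEMO
      IMPLICIT NONE
      INTEGER NSAM,IPAD,NMAX,ID
C     Small example sizes, chosen only for this demo
      PARAMETER (NSAM=4,IPAD=1)
C     Upper bound for ID, written as in the reply above
      PARAMETER (NMAX=NSAM*IPAD/2*NSAM*IPAD*(NSAM*IPAD+2))
      COMPLEX A3DF(NMAX),SAFEGET
C     Fill the array with a known non-zero value
      DO ID=1,NMAX
        A3DF(ID)=CMPLX(1.0,0.0)
      ENDDO
C     In-range index: the stored value is returned
      PRINT *,SAFEGET(A3DF,NMAX,1)
C     Out-of-range indices: zero is returned, A3DF is not read
      PRINT *,SAFEGET(A3DF,NMAX,0)
      PRINT *,SAFEGET(A3DF,NMAX,NMAX+1)
      END

      COMPLEX FUNCTION SAFEGET(A3DF,NMAX,ID)
      IMPLICIT NONE
      INTEGER NMAX,ID
      COMPLEX A3DF(*)
C     Skip the array access when ID is outside 1..NMAX
      IF (ID.LT.1.OR.ID.GT.NMAX) THEN
        SAFEGET=CMPLX(0.0,0.0)
      ELSE
        SAFEGET=A3DF(ID)
      ENDIF
      RETURN
      END
```

In ainterpo3ds.f itself the same test would sit directly before CBUF = A3DF(ID). Compiling with run-time bounds checking (for example -fcheck=bounds with gfortran) may also help confirm whether ID leaves the valid range, provided the array is declared with explicit bounds.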

The makefiles used to compile the binaries are Makefile_linux_amd64_pgi_static and Makefile_linux_amd64_pgi_mp_static.

In reply to niko

Thanks a lot for the quick and detailed reply, Niko.
I will first check whether the error persists with the PGI compiler, to compare the behavior of different compilers, and then implement the additional checks. I will post the results here.

In reply to rdrighetto

I confirm that the PGI compiler is somehow able to prevent this segmentation fault.

For the other compilers (I tested only Intel), I tried to patch ainterpo3ds.f as suggested, but this resulted in .par files with all alignment parameters equal to 0.0, as well as many ***** and NaN values.
The patched code can be downloaded here:
https://drive.switch.ch/index.php/s/f6xWrRS8FhVragY
Maybe I did something wrong.