ctffind 4.1.14 segfault

 

Hi Alexis,

I'm running gcc version 9.3.0 on Linux 5.6.11-arch1-1. So far I have been unable to run standalone ctffind on any micrograph; the only binary that works for me is the one bundled with cisTEM (4.1.8, I believe).

1) the linux64 binary from the website segfaults straight away when I execute it.

2) I downloaded the source and ran ./configure --with-wx-config=/usr/bin/wx-config --enable-openmp --enable-debugmode, then make. Running the resulting binary produced the following output:

       **   Welcome to Ctffind   **

           Version : 4.1.14
          Compiled : May 21 2020
              Mode : Interactive

Input image file name [input.mrc]                  : 006.mrc
Output diagnostic image file name
[diagnostic_output.mrc]                            :  
Pixel size [1.0]                                   : 7.08
Acceleration voltage [300.0]                       :  
Spherical aberration [2.70]                        :  
Amplitude contrast [0.07]                          :  
Size of amplitude spectrum to compute [512]        :  
Minimum resolution [30.0]                          :  
Maximum resolution [5.0]                           :  
Minimum defocus [5000.0]                           :  
Maximum defocus [50000.0]                          :  
Defocus search step [100.0]                        :  
Do you know what astigmatism is present? [no]      :  
Slower, more exhaustive search? [no]               :  
Use a restraint on astigmatism? [no]               :  
Find additional phase shift? [no]                  :  
Do you want to set expert options? [no]            :  
File name: 006.mrc
File type: MRC
Dimensions: X = 1024 Y = 1024 Z = 1
Number of slices: 1
Working on micrograph 1 of 1
     DFMID1      DFMID2      ANGAST          CC
   25399.96    25399.96        0.00    -0.86073

AverageRank is NaN for bin 9981

Failed Assert at src/programs/ctffind/ctffind.cpp:2854
bool ComputeRotationalAverageOfPowerSpectrum(Image*, CTF*, Image*, Image*, int, double*, double*, double*, double*, float*, float*)
Aborted (core dumped)

 

The output MRC is an empty file, and the output txt contains only the header, with no numbers. The reported defocus nevertheless seems correct (compared with the cisTEM results).

I have number_of_bins = 363, but the counter on line 2852 still goes out of range:

(gdb) print(counter)
$20 = 362
(gdb) print(average_rank[counter])
$21 = -0.60797163347403205
(gdb) print(average[counter])      
$22 = -0.60797163347403205
(gdb) next
2854                    MyDebugAssertFalse(std::isnan(average_rank[counter]),"AverageRank is NaN for bin %i\n",counter);
(gdb)  
620       { return __builtin_isnan(__x); }
(gdb)  
2854                    MyDebugAssertFalse(std::isnan(average_rank[counter]),"AverageRank is NaN for bin %i\n",counter);
(gdb)  
2853                    MyDebugAssertFalse(std::isnan(average[counter]),"Average is NaN for bin %i\n",counter);
(gdb)  
620       { return __builtin_isnan(__x); }
(gdb) print(counter)               
$23 = 363
(gdb) next
2854                    MyDebugAssertFalse(std::isnan(average_rank[counter]),"AverageRank is NaN for bin %i\n",counter);
(gdb)  
620       { return __builtin_isnan(__x); }
(gdb)  
2854                    MyDebugAssertFalse(std::isnan(average_rank[counter]),"AverageRank is NaN for bin %i\n",counter);
(gdb)  
2853                    MyDebugAssertFalse(std::isnan(average[counter]),"Average is NaN for bin %i\n",counter);
(gdb)  
620       { return __builtin_isnan(__x); }
(gdb) print(counter)
$24 = 364
(gdb) print(average[counter])
$25 = 4.6355706165673953e-310
(gdb) print(average_rank[counter])
$26 = 0.069999992847442627

 

 

Hi Grigory,

Thanks for taking time to investigate this - it does look like a bug. Could you try to reproduce with 4.1.14, and, assuming it reproduces, could you please send the input image and runtime parameters so that I can investigate further? I believe if you start a new forum topic you'll have the option to attach files to the initial post in the topic.

Thanks!
Alexis

 

Hi Grigory,

Thanks for this. Unfortunately, I am unable to reproduce this with the released binaries. 

First thing to try would be to disable debugmode. Could you try to recompile without this option?

Separately from this, I'm sorry that you can't run the pre-compiled binaries. This is going to be a bit tricky for me to investigate; I would have to create a virtual machine matching your OS exactly, and to be honest I'm not sure when I'll be able to do that. So let's focus on making it work for you when you build ctffind yourself for now.

Cheers,
Alexis

In reply to Alexis

I tried the ctffind build from https://grigoriefflab.umassmed.edu/node/6323 on the TIFF I posted in the forum topic I started, and it still gives a segmentation fault, as shown below.

I know Alexis can get ctffind 4.1.14 to work with my TIFF on Alexis's own computer, but I hope it can be made to work on every computer (or every Linux).

Smith

 

 

 

[root@localhost bin]# ./ctffind

** Welcome to Ctffind **

Version : 4.1.14
Compiled : May 27 2020
Mode : Interactive

Input image file name [input.mrc] : /root/MotionCorr/job019/relionrelated/tutorialdata/relion30_tutorial/Movies/20170629_00049_frameImage.mrc
Output diagnostic image file name
[diagnostic_output.mrc] : fan20200527.mrc
Pixel size [1.0] : 0.885
Acceleration voltage [300.0] : 200
Spherical aberration [2.70] : 1.4
Amplitude contrast [0.07] : 0.1
Size of amplitude spectrum to compute [512] : 512
Minimum resolution [30.0] : 30
Maximum resolution [5.0] : 5
Minimum defocus [5000.0] : 5000
Maximum defocus [50000.0] : 50000
Defocus search step [100.0] : 100
Do you know what astigmatism is present? [no] : no
Slower, more exhaustive search? [no] : no
Use a restraint on astigmatism? [no] : no
Find additional phase shift? [no] : no
Do you want to set expert options? [no] : no
File name: 20170629_00049_frameImage.mrc
File type: MRC
Dimensions: X = 3710 Y = 3838 Z = 1
Number of slices: 1
Working on micrograph 1 of 1
OpenMP is not available - will not use parallel threads.

DFMID1 DFMID2 ANGAST CC
13000.02 13000.02 -42.50 0.14053
Segmentation fault (core dumped)

In reply to Smith_Lee

Hello,

I have been trying to reproduce this error, but to no avail.

So far, I have managed to build a Docker container with Arch Linux and GCC 10.1.0, though neither is exactly the version reported by Grigory. I have not yet figured out how to get exactly the same versions of Arch and GCC.

In any case, in this virtual container, I could finally reproduce the immediate seg fault when launching the released 4.1.14 binary. However, when I compiled the sources I shared at https://grigoriefflab.umassmed.edu/node/6323 (using "configure --enable-debugmode --enable-openmp" and then "make"), the resulting binary ran fine, to completion, without error, on both Smith's and Grigory's test input images.

To help me debug this further, could you please specify the operating system version and GCC version you are using? I will then try to match those conditions more closely so that I can reproduce the behavior.

Thanks for your help

Alexis

In reply to Smith_Lee

Thanks for this information. I set up a Docker container with exactly that version of CentOS (8) and gcc. I believe I'm now using the same environment as you, and yet I am unable to reproduce the crashes using the latest version I shared with you a few days ago.

I had to do the following before being able to build ctffind:

yum -y install sudo epel-release
yum -y install fftw fftw-devel wxGTK3 wxGTK3-devel libtiff libtiff-devel cmake make gcc git which diffutils gcc-c++ libjpeg-turbo-devel

Then I did:

../configure --enable-debugmode --enable-openmp

make -j2

This completed without error (starting from the ctffind-4.1.14_200525.tar.gz archive I shared the other day).

I was then able to run the resulting binary executable on both your and Grigory's test images, with no crashes or errors.

For reference, here are more details about the container I built and the toolchain, as well as the libraries the binary links against:

[developer@2a6e5732349b 200512_ctffind_dbg]$ ldd /mnt/ext_home/work/tmp/centos_ctffind/ctffind-4.1.14/build/ctffind
    linux-vdso.so.1 (0x00007fff9331d000)
    libwx_baseu-3.0.so.0 => /lib64/libwx_baseu-3.0.so.0 (0x00007f5b091a3000)
    libwx_baseu_net-3.0.so.0 => /lib64/libwx_baseu_net-3.0.so.0 (0x00007f5b08f57000)
    libtiff.so.5 => /lib64/libtiff.so.5 (0x00007f5b08cde000)
    libjpeg.so.62 => /lib64/libjpeg.so.62 (0x00007f5b08a75000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f5b0885e000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f5b0865a000)
    libfftw3f.so.3 => /lib64/libfftw3f.so.3 (0x00007f5b08243000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f5b07eae000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f5b07b2c000)
    libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f5b078f4000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f5b076dc000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f5b074bc000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f5b070f9000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f5b0964c000)
    libjbig.so.2.1 => /lib64/libjbig.so.2.1 (0x00007f5b06eed000)
[developer@2a6e5732349b 200512_ctffind_dbg]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC)
[developer@2a6e5732349b 200512_ctffind_dbg]$ uname -a
Linux 2a6e5732349b 4.19.76-linuxkit #1 SMP Tue May 26 11:42:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[developer@2a6e5732349b 200512_ctffind_dbg]$ rpm -q centos-release
centos-release-8.1-1.1911.0.8.el8.x86_64

 

 

In reply to Alexis

Thanks, Alexis.

I have a purely computer-related question.

I have some concerns about "yum -y install fftw fftw-devel wxGTK3 wxGTK3-devel libtiff libtiff-devel cmake make gcc git which diffutils gcc-c++ libjpeg-turbo-devel". I have already installed other software, such as relion and eman2, whose installation and operation may depend on packages such as fftw, fftw-devel, libtiff, libtiff-devel, cmake, make, gcc, gcc-c++, etc.
If I run that command as you advise in order to install the test version of ctffind 4.1.14 from https://grigoriefflab.umassmed.edu/node/6323, is there any possibility that relion and eman2 will stop working because the versions of the packages they depend on have changed?

Best regards.

Smith

 

In reply to Smith_Lee

[ TLDR: edit lines 270, 274, 2422 and 2765 of ctffind-4.1.14/src/programs/ctffind/ctffind.cpp and replace the bool at the beginning of those lines with void, then recompile. ]

I might have found the issue here - I at least had a similar experience and managed to fix it for our installation.

I recently built ctffind with a somewhat similar gcc version (8.3.0) and also experienced crashes. I believe the reason for my crashes was undefined behavior in ctffind-4.1.14/src/programs/ctffind/ctffind.cpp:

The function ComputeRotationalAverageOfPowerSpectrum() (starting at line 2765) is defined as returning bool (matching the declaration in line 270), but it never returns anything, which is undefined behavior. Going by its purpose, it doesn't have to return anything: it is only called once, in line 1757, and the return value is not used there. Judging by the comment in line 2824 of the function, the plan was to make it return something at that point (which should then also happen at the end of the function) and to evaluate that return value in the caller.

The same is true for RescaleSpectrumAndRotationalAverage().

All of this applies not only to ctffind 4.1.14 but also to 4.1.13.  In version 4.1.10 the two functions are correctly declared as void.

Here is a simple program illustrating the issue:

#include <iostream>

// Declared to return bool, but control reaches the end without a return: undefined behavior.
bool f() {
    std::cout << "test" << std::endl;
}

int main()
{
    f();
    return 0;
}

Compiling this with "g++ -O1 -o test test.cpp" and running it fails in "random" ways: on x86-64 it causes an "infinite loop" (well, until the stack is exhausted) and on ppc64 it gives an "illegal instruction". It appears to work if I compile with -O0. Unfortunately, gcc only warns about this but compiles it nevertheless.

I hope this helps some people who are having problems (segfaults or other crashes) with 4.1.14.

In reply to hgutch

Hi Harold,

Wow - thanks for finding this fix. This is great. 

As it turns out, I had already made such changes in my dev version. I do not recall exactly why (perhaps compiler warnings?). In any case, the dev version shouldn't suffer from this. You can get it from our github repository at https://github.com/ngrigorieff/cisTEM.

I shall prepare a new release of ctffind, but because the process is not trivial, I may not get to it very soon.

Cheers,
Alexis

In reply to hgutch

Same discovery here.

Changing the return types of ComputeRotationalAverageOfPowerSpectrum and RescaleSpectrumAndRotationalAverage from "bool" to "void" (or adding a "return true;" at their end) in ctffind.cpp fixes the segfault when compiled with gcc-9.2.0.

My guess is that GCC 9 needs to have everything aligned perfectly in memory for its optimizations to work.

What points to this problem is that the assert "AverageRank is NaN for bin", which fails in ComputeRotationalAverageOfPowerSpectrum (after the call to Renormalize1DSpectrumForFRC), does not fail when it is placed at the end of Renormalize1DSpectrumForFRC instead.

Cheers,
Rafael N.

In reply to Alexis

 

Hi all,

I just wanted to share our experience with this. We ran into the same issue on CentOS 8 with Ctffind 4.1.10 up to 4.1.14 all exiting with a segmentation fault (Ctffind 4.1.9 was running fine). v4.1.10 previously ran without issues on CentOS 7. It was always run in c shell.

Our IT department did some detective work and found out that it was glibc causing the problem, specifically the UTF-32.so library. We had glibc 2.17 on CentOS 7 and 2.28 on CentOS 8. Since it was a UTF library causing the problem, IT looked into the LANG environment variable, which for us was set to en_GB.UTF-8. If you set that variable to anything else prior to running Ctffind, e.g. setenv LANG, Ctffind seems to finish successfully (i.e. defoci between CentOS 7 and CentOS 8 on the same image are identical).

I don't know if our seg fault applies to the other seg faults mentioned here, but changing the LANG environment variable solved ours.

Cheers,

Michael 

 

Dr. Michael Saur
Senior Research Associate
Astex Pharmaceuticals

 

In reply to msaur

Thank you for the hint. Setting LC_ALL=C solves the problem for me.

I'm on CentOS 8 and ctffind 4.1.14 exits with Segmentation Fault unless I change the locale as above.

Best,
Andreas

In reply to msaur

Wow, thanks; it fixes the segfault with the binary distribution of CTFFIND on CentOS 8.

Because our new cluster is AMD, I should benchmark which executable is faster: the Intel-optimized binary or the one compiled locally with GCC 9.

Edit: The GCC 9 version seems a tiny bit faster (5 seconds difference for 100 mics).