GeFREALIGN 8.06-20110514 with CUDA 4.2

I built GeFREALIGN 8.06-20110514 from the tarball on this website with the CUDA 4.2 toolkit on a 64-bit CentOS 5.8 machine without any problems. The two binaries compiled and linked without issue. Unfortunately the refinement binary is crashing when I run it. Here's a snippet of the logfile:


Fourier-transform reference volume....
3D WEIGHTS FILE FOR OUTPUT?
3D RECONSTRUCTION HALFSET 1 FOR OUTPUT ?
3D RECONSTRUCTION HALFSET 2 FOR OUTPUT ?
3D PHASE RESIDUAL FILE FOR OUTPUT ?
3D POINT SPREAD FUNCTION FOR OUTPUT ?
pthread #0 (tid=1104578880) is created for GPU device #0.
error: Set CUDA Device #0 failed!

This is my CUDA build line from the Makefile:

CUDA = nvcc -gencode arch=compute_20,code=sm_20 -gencode arch=compute_13,code=sm_13 -lpthread -Xptxas -dlcm=ca

Any ideas on how to proceed? Thanks.
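
A note on the build line above: as far as I understand nvcc, each `-gencode arch=compute_XX,code=sm_XX` entry embeds machine code for that one target only (no PTX for later JIT compilation), and such machine code runs only on devices with the same major compute capability and an equal or higher minor version. The sketch below encodes just that rule for the two targets on the Makefile line. The names `sassRunsOn` and `buildCoversDevice` are made up for illustration and are not part of GeFREALIGN.

```cuda
// Hypothetical helper: true if machine code built for sm_<builtMajor><builtMinor>
// can execute on a device of compute capability devMajor.devMinor.
// Assumed rule: same major revision, device minor >= built minor.
inline bool sassRunsOn(int builtMajor, int builtMinor, int devMajor, int devMinor)
{
    return builtMajor == devMajor && devMinor >= builtMinor;
}

// True if either target from the Makefile line (sm_13 or sm_20) covers the device.
inline bool buildCoversDevice(int devMajor, int devMinor)
{
    return sassRunsOn(1, 3, devMajor, devMinor) ||
           sassRunsOn(2, 0, devMajor, devMinor);
}
```

Under that rule a compute capability 2.0 or 2.1 card is served by the sm_20 entry, while a 1.1 card is served by neither entry, so kernels from this build could not launch on it.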

Besides the computing GPU, do you have a separate display GPU in that computer? It looks like the GPU failed to initialize.

Another possibility is the CUDA version. In CUDA 4, some of the hardware-assignment functions were changed, so I am not sure whether the CUDA 3 functions are compatible with CUDA 4. I suggest you try CUDA 3.2. I haven't tried the newest version, 4.2, but I could not compile GeFREALIGN with CUDA 4.0 or 4.1.
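
Before switching toolkits it may help to find out why the device cannot be set. The log only says "Set CUDA Device #0 failed!", but the runtime can report the underlying reason through `cudaGetErrorString`, and a driver older than the toolkit is one common cause. The sketch below prints both versions and probes a single device. It is a diagnostic written for this thread, not GeFREALIGN code, and the helper names `splitCudaVersion`, `printCudaVersions` and `probeDevice` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000*major + 10*minor (4.2 -> 4020).
void splitCudaVersion(int encoded, int* major, int* minor)
{
    *major = encoded / 1000;
    *minor = (encoded % 1000) / 10;
}

// Print the CUDA version the driver supports and the runtime version in use.
void printCudaVersions()
{
    int driverVer = 0, runtimeVer = 0, major = 0, minor = 0;
    cudaDriverGetVersion(&driverVer);
    cudaRuntimeGetVersion(&runtimeVer);
    splitCudaVersion(driverVer, &major, &minor);
    printf("driver supports CUDA %d.%d\n", major, minor);
    splitCudaVersion(runtimeVer, &major, &minor);
    printf("runtime is CUDA %d.%d\n", major, minor);
}

// Try to select device `dev` and create a context on it.
// Returns true on success; otherwise prints the runtime's own error text.
bool probeDevice(int dev)
{
    cudaError_t err = cudaSetDevice(dev);
    if (err == cudaSuccess)
        err = cudaFree(0);  // harmless call that forces context creation
    if (err != cudaSuccess) {
        fprintf(stderr, "device %d unusable: %s\n", dev, cudaGetErrorString(err));
        cudaGetLastError();  // reset the error state for later calls
        return false;
    }
    return true;
}
```

Calling `printCudaVersions()` followed by `probeDevice(0)` from a small test program built with the same nvcc should show whether the failure is a version mismatch or something specific to that card.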

In reply to xueming

Yes, there are two cards in the machine: a GeForce GTX 550 Ti and a GeForce 8400 GS, which is connected to the monitor. I just got a second GTX 550 so I could test reconstruction, so I'll be replacing the 8400 GS shortly.

$ nvidia-smi -L
GPU 0: GeForce GTX 550 Ti (UUID: N/A)
GPU 1: GeForce 8400 GS (UUID: N/A)

I rebuilt the binaries with the 3.2.16 CUDA toolkit, and I'm still getting a similar error:

pthread #0 (tid=1096096064) is created for GPU device #0.
error: Get CUDA Device Pointer for mapped b3dv failed!

I could post the entire log file if that would be useful.

Thanks for your assistance.

In reply to bene

This is a bit odd. In my workstation, I have two cards as well:

$ nvidia-smi -L
GPU 0: Quadro FX 380 (UUID: N/A)
GPU 1: GeForce GT 430 (UUID: N/A)

The Quadro is connected to the monitors. When I run the GeFREALIGN refinement binary that I compiled with the 3.2.16 toolkit, it runs:

pthread #0 (tid=47030415001344) is created for GPU device #0.

But it's not actually running on GPU #0. It's clearly running on the GT 430 based on the output from 'nvidia-smi -l'.

I just added the 2nd GTX 550 card to the other machine, and I'm still getting the same error. Could this be related to the compute levels for each card? The GTX 550 is a 2.1 card, while the FX 380 is a 1.1 card.
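
On the numbering question: the device numbers the program prints come from the CUDA runtime, which does not necessarily enumerate cards in the same order as nvidia-smi. My understanding is that the runtime tends to put the most capable card first while nvidia-smi follows PCI order, but treat that as an assumption. A small listing routine like the one below shows the runtime's own numbering along with each card's compute capability and memory. `listCudaDevices` is a hypothetical name and the routine is only a diagnostic sketch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print every device in the order the CUDA runtime numbers them.
// Returns the device count, or -1 if the runtime could not enumerate devices.
int listCudaDevices()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return -1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
            continue;
        // A kernel run time limit usually means a display is attached to the card.
        printf("CUDA device %d: %s, compute %d.%d, %llu MB, PCI bus %d, "
               "kernel time limit: %s\n",
               i, prop.name, prop.major, prop.minor,
               (unsigned long long)(prop.totalGlobalMem >> 20),
               prop.pciBusID,
               prop.kernelExecTimeoutEnabled ? "yes" : "no");
    }
    return count;
}
```

Comparing the PCI bus numbers printed here with the nvidia-smi listing should show which physical card the runtime calls device #0.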

In reply to bene

I think I know the reason. Your display card was included in the computation and caused the errors. I usually use the hardware version to distinguish the display card from the computing cards, but your display card may have the same hardware version.

Could you find the loop (around line 488) in frealign_v8.cu that looks like this:

for(i=0;i<GPUNum;i++)
{
cudaDeviceProp prop;
if(cudaGetDeviceProperties(&prop, i) == cudaSuccess)
...
}

And then, change

for(i=0;i<GPUNum;i++)

to

for(i=1;i<GPUNum;i++)

If this works, you need to make the same change to frealign_v8.cu in both the src_ref and src_rec folders.

Also, make sure the GT 430 has enough memory to do the computation.
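
Skipping index 0 only helps when the display card happens to be device #0. An alternative, sketched below, is to keep the loop over all devices and filter on the reported properties instead, including the memory size. The thresholds and the names `isComputeCandidate` and `pickComputeDevices` are assumptions for illustration. The actual selection logic in frealign_v8.cu is elided above and may differ.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical filter: does this device look usable for the computation?
bool isComputeCandidate(const cudaDeviceProp& p, int minMajor, size_t minBytes)
{
    return p.major >= minMajor           // e.g. 2 for Fermi-class cards
        && p.canMapHostMemory != 0       // needed for mapped host buffers
        && p.totalGlobalMem >= minBytes; // enough on-board memory
}

// Return the runtime indices of all devices that pass the filter.
std::vector<int> pickComputeDevices(int minMajor, size_t minBytes)
{
    std::vector<int> picked;
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return picked;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess &&
            isComputeCandidate(prop, minMajor, minBytes))
            picked.push_back(i);
    }
    return picked;
}
```

For example, `pickComputeDevices(2, (size_t)512 << 20)` would keep only compute capability 2.x or newer cards with at least 512 MB, whatever index they have.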

In reply to bene

I just realized that you have two different GPUs.
GeFREALIGN must be run on at least two GPUs, and all GPUs involved in the computation should be the same. You have two very different GPUs, so I don't think it will work. If you change the code to exclude the first GPU as in my last post, GeFREALIGN will not work, because only one GPU would be left.

In reply to xueming

I have two machines that I am testing on:

scoop: CentOS 5.8/64-bit

GPU 0: GeForce GTX 550 Ti (UUID: N/A)
GPU 1: GeForce GTX 550 Ti (UUID: N/A)

wrench: Ubuntu 12.04/64-bit

GPU 0: Quadro FX 380 (UUID: N/A)
GPU 1: GeForce GT 430 (UUID: N/A)

None of the binaries I have built using CUDA 3.2.16/4.1, g77 and GCC 3.4/4.1 have worked on scoop. The refinement binary I built with CUDA 3.2.16/GCC 4.1 worked on wrench, but claimed it was running on GPU 0 when it was actually running on GPU 1.

scoop is a dedicated build/test machine and doesn't have a monitor connected to it. wrench is my workstation and has monitors connected to the Quadro/GPU 0.

What OS did you use to compile your binaries? Can you provide binaries to see if I can get them to run on my test machines?

I also have access to 8 cluster nodes with single Tesla M2050s. If I can get the refinement binary running properly on those nodes, I can push the research computing group to double up the GPUs so we can run reconstructions as well.

Thanks for your help.

In reply to bene

I used both CentOS and Fedora. All our GPU clusters use CentOS. Your OS isn't a problem.
All CUDA programs depend strongly on the OS configuration, so my compiled binaries may not work on your computer. You can already compile GeFREALIGN, so that part should be fine.

I think your problem is just a hardware issue. Even if you get the GTX 550 Ti, FX 380 and GT 430 to work, they will be very slow.

Tesla M2050s should work, but you need at least two.

In reply to bene

The error "error: Get CUDA Device Pointer for mapped b3dv failed!" may mean that device #0, the GTX 550 Ti, doesn't support memory mapping.
What is the memory size of the graphics card and of the host machine?
Both the GTX 550 Ti and the GeForce 8400 GS are entry-level graphics cards. They may not support, or may only weakly support, some of the CUDA functions I used.
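
Whether a card supports mapped host memory can be checked directly rather than guessed from its market segment. The sketch below assumes that b3dv is allocated as mapped pinned host memory, which the error text suggests but which is not confirmed in this thread. `canUseMappedBuffer` is a hypothetical test routine, not GeFREALIGN code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Try to allocate `bytes` of mapped pinned host memory and obtain a device
// pointer for it on device `dev`. Returns true if every step succeeds.
bool canUseMappedBuffer(int dev, size_t bytes)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
        cudaGetLastError();  // reset the error state
        return false;
    }
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "%s cannot map host memory\n", prop.name);
        return false;
    }

    // Select the device and request host-memory mapping before any
    // allocation or kernel launch happens on it.
    err = cudaSetDevice(dev);
    if (err == cudaSuccess)
        err = cudaSetDeviceFlags(cudaDeviceMapHost);

    void* hostPtr = NULL;
    void* devPtr = NULL;
    if (err == cudaSuccess)
        err = cudaHostAlloc(&hostPtr, bytes, cudaHostAllocMapped);
    if (err == cudaSuccess)
        err = cudaHostGetDevicePointer(&devPtr, hostPtr, 0);

    if (hostPtr != NULL)
        cudaFreeHost(hostPtr);
    if (err != cudaSuccess) {
        fprintf(stderr, "mapped buffer test failed: %s\n", cudaGetErrorString(err));
        cudaGetLastError();  // reset the error state
        return false;
    }
    return true;
}
```

Running this once per device with a buffer size close to the real volume would separate "the card cannot map host memory at all" from "the host could not pin that much memory".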