CUDA Real_t data type yields wrong result

Question

I'm writing CUDA code in MMA, because I learned from this webpage that CUDA in MMA can run at the maximum precision the device supports.
When I run the code below with the float type, it yields the correct result. (Note that width = 1024 and k = 23.)
Needs["CUDALink`"];
KernelCode = "
  __global__ void kernel(float* kXList, int width, int k) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    float delta = 2.0f / width;
    if ((row < k) && (col < width)) {
        kXList[row * width + col] = -1.0f + (col * k + row + 1.0f) * delta / k;
    }
  }
  ";

kernel = CUDAFunctionLoad[KernelCode, 
   "kernel", { {"Float"}, _Integer, _Integer}, {32, 32}];
W = 1024; k = 23;
kX = CUDAMemoryAllocate["Float", k*W];
kernel[kX, W, k];

test = CUDAMemoryGet[kX][[;; k*W]];
CUDAMemoryUnload[kX];
MatrixForm[ArrayReshape[test, {k, W}]]

First column of result:
{
 {-0.999915},
 {-0.99983},
 {-0.999745},
 {-0.99966},
 {-0.999575},
 {-0.99949},
 {-0.999406},
 {-0.999321},
 {-0.999236},
 {-0.999151},
 {-0.999066},
 {-0.998981},
 {-0.998896},
 {-0.998811},
 {-0.998726},
 {-0.998641},
 {-0.998556},
 {-0.998471},
 {-0.998387},
 {-0.998302},
 {-0.998217},
 {-0.998132},
 {-0.998047}
}

But when I run the code below with the Real_t type, as the tutorial suggests, the result is wrong.
Needs["CUDALink`"];
KernelCode = "
  __global__ void kernel(Real_t* kXList, int width,int k) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    Real_t delta = 2.0 / width;
    if ((row < k) && (col < width)) {
        kXList[row * width + col] = -1.0 + (col * k + row + 1.0) * delta / k;
    }
  }
  ";

kernel = CUDAFunctionLoad[KernelCode, 
   "kernel", { {"Float"}, _Integer, _Integer}, {32, 32}];
W = 1024; k = 23;
kX = CUDAMemoryAllocate["Float", k*W];
kernel[kX, W, k];

test = CUDAMemoryGet[kX][[;; k*W]];
CUDAMemoryUnload[kX];
MatrixForm[ArrayReshape[test, {k, W}]]

First column of result:
{
 {-1.58819*10^-23},
 {-1.58715*10^-23},
 {4.17233*10^-8},
 {-1.58715*10^-23},
 {-1.07374*10^8},
 {4.17349*10^-8},
 {2.72008*10^23},
 {-1.58845*10^-23},
 {0.},
 {0.},
 {-1.58819*10^-23},
 {4.17203*10^-8},
 {4.17233*10^-8},
 {2.71979*10^23},
 {-1.07374*10^8},
 {-1.58845*10^-23},
 {2.72008*10^23},
 {-1.07381*10^8},
 {0.},
 {0.},
 {-1.58819*10^-23},
 {-1.58812*10^-23},
 {4.17233*10^-8}
}

PalvinWang · Accepted Answer

On the MMA side, a Real_t argument should be declared as {_Real, _, "Input"} or {_Real, _, "Output"}, not {"Float"}.
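A minimal sketch of this fix applied to the question's code is shown below. It assumes W = 1024 and k = 23 as stated in the question, and that the kernel is called with exactly its three declared arguments. Allocating the buffer as Real instead of "Float" is an additional assumption on my part: Real_t is presumably double on a device with double-precision support, so a "Float" buffer would be half the size the kernel writes into, which would be consistent with the garbage values above.

```mathematica
Needs["CUDALink`"];

(* Same kernel as in the question, written against Real_t *)
KernelCode = "
  __global__ void kernel(Real_t* kXList, int width, int k) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    Real_t delta = 2.0 / width;
    if ((row < k) && (col < width)) {
        kXList[row * width + col] = -1.0 + (col * k + row + 1.0) * delta / k;
    }
  }
  ";

W = 1024; k = 23;  (* values given in the question *)

(* Declare the buffer as a Real output argument instead of {"Float"} *)
kernel = CUDAFunctionLoad[KernelCode,
   "kernel", {{_Real, _, "Output"}, _Integer, _Integer}, {32, 32}];

(* Assumption: allocate with the matching Real type so the element
   size on the device agrees with Real_t *)
kX = CUDAMemoryAllocate[Real, k*W];
kernel[kX, W, k];

test = CUDAMemoryGet[kX];
CUDAMemoryUnload[kX];
MatrixForm[ArrayReshape[test, {k, W}]]
```

If the type declaration was the only problem, the first column should again start near -0.999915 and increase toward -0.998047, as in the float version.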
