Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm trying to run a simple example of dynamic parallelism in CUDA. The code of the .cu file is:

__global__ void child_launch(int *data) {
   data[threadIdx.x] = data[threadIdx.x]+1;
}

__global__ void parent_launch(int *data) {
   data[threadIdx.x] = threadIdx.x;

   __syncthreads();

   if (threadIdx.x == 0) {
       child_launch<<< 1, 256 >>>(data);
       cudaDeviceSynchronize();
   }

   __syncthreads();
}

Here parent_launch is the kernel I want MATLAB to run, and each thread of parent_launch can launch a grid of blocks running the kernel child_launch (in this example, only thread 0 actually launches such a grid).

I tried to run it by compiling the .cu file into a .ptx file and then executing the following commands in MATLAB:

   k = parallel.gpu.CUDAKernel('file_name.ptx', 'file_name.cu');
   k.GridSize = [1,256];
   k.ThreadBlockSize = [1,256];
   r1 = feval(k, data); % data is an array of ints on the GPU

The problem is that when I tried to compile the .cu file, I got the following error:

error: kernel launch from __device__ or __global__ functions requires separate compilation mode

Does anyone know how to fix it?



1 Answer

Waiting for an expert to answer.
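
The error message refers to nvcc's default whole-program compilation mode: a kernel launch inside a `__global__` function is only accepted when relocatable device code (separate compilation) is enabled, and the result then has to be linked against the device runtime library. Below is a minimal sketch, assuming a standalone executable rather than the MATLAB PTX workflow from the question. The file name `dynpar.cu`, the `-arch` value, the helper `run_parent_launch`, and the `main` function are illustrative assumptions and not part of the original post. The sketch also leaves out the device-side `cudaDeviceSynchronize()`, which newer CUDA toolkits have deprecated and then removed from device code, and relies on host-side synchronization instead, since a parent grid is not considered finished until its child grids have finished.

```cuda
// dynpar.cu (hypothetical file name)
// Assumed build command; adjust -arch to your GPU
// (dynamic parallelism needs compute capability 3.5 or newer):
//   nvcc -arch=sm_70 -rdc=true dynpar.cu -o dynpar -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

#define N 256

__global__ void child_launch(int *data) {
    data[threadIdx.x] = data[threadIdx.x] + 1;
}

__global__ void parent_launch(int *data) {
    data[threadIdx.x] = threadIdx.x;

    // Make every parent thread's write visible before the child grid starts.
    __syncthreads();

    if (threadIdx.x == 0) {
        // Device-side launch: this is the statement that requires -rdc=true.
        child_launch<<<1, N>>>(data);
    }
    // No device-side synchronization here; the host waits for the parent
    // grid, which completes only after the child grid has completed.
}

// Hypothetical host helper: runs the parent kernel and copies N ints back.
cudaError_t run_parent_launch(int *host_out) {
    int *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, N * sizeof(int));
    if (err != cudaSuccess) return err;

    parent_launch<<<1, N>>>(dev);
    err = cudaGetLastError();                 // launch configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // runtime errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host_out, dev, N * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}

int main() {
    int out[N];
    cudaError_t err = run_parent_launch(out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Each element should hold its index plus one after the child grid ran.
    for (int i = 0; i < N; ++i) {
        if (out[i] != i + 1) {
            fprintf(stderr, "mismatch at %d: %d\n", i, out[i]);
            return 1;
        }
    }
    printf("all %d elements match\n", N);
    return 0;
}
```

Whether a PTX file produced with `-rdc=true` can then be loaded through `parallel.gpu.CUDAKernel` is a separate question, because PTX output stops before the device link step against `cudadevrt`; this sketch does not cover that case.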
