I am trying to write OpenCL code that performs element-wise operations on multi-dimensional arrays.
I know that OpenCL buffers are flattened, which makes indexing a bit tricky. It works for 2-dimensional arrays, but for arrays with 3 or more dimensions I get either indexing errors or the wrong result.
This is all the more surprising because I use the same indexing formula as in the 2D case.
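As a host-side sanity check, the flattening formula can be sketched in plain Python. This is only a minimal sketch, not kernel code: `flat_index` and the example shape are hypothetical names and values chosen for illustration. It assumes the first index varies fastest, which is what `i + dim1 * j + dim1 * dim2 * k` implies.

```python
def flat_index(idx, shape):
    """Flatten a multi-index, with the first index varying fastest.

    For three dimensions this evaluates i + dim1 * j + dim1 * dim2 * k,
    i.e. the same formula as in the kernels (hypothetical helper).
    """
    flat = 0
    stride = 1
    for i, n in zip(idx, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        flat += i * stride
        stride *= n  # the stride of the next dimension is the product of the previous sizes
    return flat


# Example shape (dim1, dim2, dim3), chosen arbitrarily for illustration.
shape = (2, 3, 4)

# Visit every (i, j, k) once and collect the flat positions that get touched.
seen = sorted(
    flat_index((i, j, k), shape)
    for k in range(shape[2])
    for j in range(shape[1])
    for i in range(shape[0])
)

# If the formula is consistent with the shape, every position 0..23 is hit exactly once.
print(seen == list(range(2 * 3 * 4)))
```

Note that NumPy arrays are C-ordered by default, meaning the last index varies fastest, so this formula does not map `(i, j, k)` to the same element as `array[i, j, k]`. For a purely element-wise operation that should not matter, as long as every flat position is visited exactly once.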
2D case:
__kernel void test1(__global int* a, __global int* b, __global int* c, const int height) {
    int i = get_global_id(0);
    int j = get_global_id(1);
    c[i + height * j] = a[i + height * j] + b[i + height * j];
}
This gives the correct result.
3D case:
__kernel void test1(__global int* a, __global int* b, __global int* c, const int dim1, const int dim2) {
    int i = get_global_id(0);
    int j = get_global_id(1);
    int k = get_global_id(2);
    int idx = i + dim1 * j + dim1 * dim2 * k;
    c[idx] = a[idx] + b[idx];
}
This gives the wrong result (usually an output buffer filled with values very close to 0).
4D case:
__kernel void test1(__global int* a, __global int* b, __global int* c, const int dim1, const int dim2, const int dim3) {
    int i = get_global_id(0);
    int j = get_global_id(1);
    int k = get_global_id(2);
    int l = get_global_id(3);
    int idx = i + dim1 * j + dim1 * dim2 * k + l * dim1 * dim2 * dim3;
    c[idx] = a[idx] + b[idx];
}
Here the kernel launch itself fails with an error: enqueue_knl_test1 pyopencl._cl.LogicError: clEnqueueNDRangeKernel failed: INVALID_WORK_DIMENSION
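For context on the 4D error: `clEnqueueNDRangeKernel` only accepts a work dimension between 1 and `CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS`, which the OpenCL specification only guarantees to be at least 3, so a 4-dimensional global size is typically rejected. One possible workaround, sketched below in plain Python rather than OpenCL, is to launch over a single flat range of `dim1 * dim2 * dim3 * dim4` work items and, only if the individual indices are needed, recover them from the flat id. The helper `unflatten`, the shape and the list contents are hypothetical and chosen for illustration, and the loop over `gid` merely stands in for `get_global_id(0)`.

```python
def unflatten(flat, shape):
    """Recover a multi-index from a flat index, first index varying fastest.

    Inverse of i + dim1 * j + dim1 * dim2 * k + ... (hypothetical helper).
    """
    idx = []
    for n in shape:
        flat, r = divmod(flat, n)  # r is the index along this dimension
        idx.append(r)
    return tuple(idx)


# Example 4D shape, chosen arbitrarily for illustration.
shape = (2, 3, 4, 5)
total = 1
for n in shape:
    total *= n

# Flattened input buffers, stand-ins for the OpenCL buffers a and b.
a = list(range(total))
b = [10 * x for x in range(total)]
c = [0] * total

# Emulates a 1D launch with global size (total,): gid plays the role of get_global_id(0).
for gid in range(total):
    c[gid] = a[gid] + b[gid]

# The multi-index is only needed if the operation depends on position.
print(unflatten(total - 1, shape))
```

For a purely element-wise sum the multi-index is never used, so a one-dimensional launch over all elements would be enough regardless of how many dimensions the array has.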