Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

For the purpose of testing printf() call on device, I wrote a simple program which copies an array of moderate size to device and print the value of device array to screen. Although the array is correctly copied to device, the printf() function does not work correctly, which lost the first several hundred numbers. The array size in the code is 4096. Is this a bug or I'm not using this function properly? Thanks in adavnce.

EDIT: My gpu is GeForce GTX 550i, with compute capability 2.1

My code:

#include<stdio.h>
#include<stdlib.h>
#define N 4096

__global__ void Printcell(float *d_Array , int n){
    int k = 0;

    printf("
=========== data of d_Array on device==============
");
    for( k = 0; k < n; k++ ){
        printf("%f  ", d_Array[k]);
        if((k+1)%6 == 0) printf("
");
    }
    printf("

Totally %d elements has been printed", k);
}

int main(){

    int i =0;

    float Array[N] = {0}, rArray[N] = {0};
    float *d_Array;
    for(i=0;i<N;i++)
        Array[i] = i;


    cudaMalloc((void**)&d_Array, N*sizeof(float));
    cudaMemcpy(d_Array, Array, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();
    Printcell<<<1,1>>>(d_Array, N);    //Print the device array by a kernel
    cudaDeviceSynchronize();

    /* Copy the device array back to host to see if it was correctly copied */   
    cudaMemcpy(rArray, d_Array, N*sizeof(float), cudaMemcpyDeviceToHost);

    printf("

");

    for(i=0;i<N;i++){
        printf("%f  ", rArray[i]);
        if((i+1)%6 == 0) printf("
");
    }
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
141 views
Welcome To Ask or Share your Answers For Others

1 Answer

printf from the device has a limited queue. It's intended for small scale debug-style output, not large scale output.

referring to the programmer's guide:

The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten.

Your in-kernel printf output overran the buffer, and so the first printed elements were lost (overwritten) before the buffer was dumped into the standard I/O queue.

The linked documentation indicates that the buffer size can be increased, also.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...