If an SSE/AVX register's value is such that all its bytes are either 0 or 1, is there any way to efficiently get the indices of all non zero elements?
For example, if xmm value is | r0=0 | r1=1 | r2=0 | r3=1 | r4=0 | r5=1 | r6=0 |...| r14=0 | r15=1 | the result should be something like (1, 3, 5, ... , 15). The result should be placed in another _m128i variable or char[16] array.
If it helps, we can assume that register's value is such that all bytes are either 0 or some constant nonzero value (not necessary 1).
I am pretty much wondering if there is an instruction for that or preferably C/C++ intrinsic. In any SSE or AVX set of instructions.
EDIT 1:
It was correctly observed by @zx485 that original question was not clear enough. I was looking for any "consecutive" solution.
The example 0 1 0 1 0 1 0 1...
above should result in either of the following:
- If we assume that indices start from 1, then
0
would be a termination byte and the result might be
002 004 006 008 010 012 014 016 000 000 000 000 000 000 000 000
- If we assume that negative byte is a termination byte the result might be
001 003 005 007 009 011 013 015 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
- Anything, that gives as a consecutive bytes which we can interpret as indices of non-zero elements in the original value
EDIT 2:
Indeed, as @harold and @Peter Cordes suggest in the comments to the original post, one of the possible solutions is to create a mask first (e.g. with pmovmskb
) and check non zero indices there. But that will lead to a loop.