Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I want to compare two floating point variables using intrinsics. If the comparison is true, do something else do something. I want to do this as a normal if..else condition. Is there any way using intrinsics?

//normal code
vector<float> v1, v2;
for(int i = 0; i < v1.size(); ++i)
if(v1[i]<v2[i])
{
    //do something
}
else
{
    //do something
)

How to do this using SSE2 or AVX?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
89 views
Welcome To Ask or Share your Answers For Others

1 Answer

If you expect that v1[i] < v2[i] is almost never true, almost always true, or usually stays the same for a long run (even if overall there might be no particular bias), then an other technique is also applicable which offers "true conditionality" (ie not "do both, discard one result"), a price of course, but you also get to actually skip work instead of just ignoring some results.

That technique is fairly simple, do the comparison (vectorized), gather the comparison mask with _mm_movemask_ps, and then you have 3 cases:

  • All comparisons went the same way and they were all false, execute the appropriate "do something" code that is now maybe easier to vectorize since the condition is gone.
  • All comparisons went the same way and they were all true, same.
  • Mixed, use more complicated logic. Depending on what you need, you could check all bits separately (falling back to scalar code, but now just 1 FP compare for the whole lot), or use one of the "iterate only over (un)set bits" tricks (combines well with bitscan to recover the actual index), or sometimes you can fall back to doing masking and merging as usual.

Not all 3 cases are always relevant, usually you're applying this because the predicate almost always goes the same way, making one of the "all the same" cases so rare that you can just lump it in with "mixed".

This technique is definitely not always useful. The "mixed" case is complicated and slow. The fast-path has to be common and fast enough to be worth testing whether you're can take it.

But it can be useful, maybe one of the sides is very slow and annoying, while the other side of the branch is nice simple vectorizable code that doesn't take all that long in comparison. For example, maybe the slow side has to do argument reduction for an otherwise fast approximated transcendental function, or maybe it has to normalize some vectors before taking their dot product, or orthogonalize a matrix, maybe even get data from disk..

Or, maybe neither side is exactly slow, but they evict each others data from cache (maybe both sides are a loop over an array that fits in cache, but the arrays don't fit in it together) so doing them unconditionally slows both of them down. This is probably a real thing, but I haven't seen it in the wild (yet).

Or, maybe one side cannot be executed unconditionally, doing some generally destructive things, maybe even some IO. For example if you're checking for error conditions and logging them.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...