Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
292 views
in Technique[技术] by (71.8m points)

c++ - The indices of non-zero bytes of an SSE/AVX register

If an SSE/AVX register's value is such that all its bytes are either 0 or 1, is there any way to efficiently get the indices of all non zero elements?

For example, if xmm value is | r0=0 | r1=1 | r2=0 | r3=1 | r4=0 | r5=1 | r6=0 |...| r14=0 | r15=1 | the result should be something like (1, 3, 5, ... , 15). The result should be placed in another _m128i variable or char[16] array.

If it helps, we can assume that register's value is such that all bytes are either 0 or some constant nonzero value (not necessary 1).

I am pretty much wondering if there is an instruction for that or preferably C/C++ intrinsic. In any SSE or AVX set of instructions.

EDIT 1:

It was correctly observed by @zx485 that original question was not clear enough. I was looking for any "consecutive" solution.

The example 0 1 0 1 0 1 0 1... above should result in either of the following:

  • If we assume that indices start from 1, then 0 would be a termination byte and the result might be

002 004 006 008 010 012 014 016 000 000 000 000 000 000 000 000

  • If we assume that negative byte is a termination byte the result might be

001 003 005 007 009 011 013 015 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF

  • Anything, that gives as a consecutive bytes which we can interpret as indices of non-zero elements in the original value

EDIT 2:

Indeed, as @harold and @Peter Cordes suggest in the comments to the original post, one of the possible solutions is to create a mask first (e.g. with pmovmskb) and check non zero indices there. But that will lead to a loop.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your question was unclear regarding the aspect if you want the result array to be "compressed". What I mean by "compressed" is, that the result should be consecutive. So, for example for 0 1 0 1 0 1 0 1..., there are two possibilities:

Non-consecutive:

XMM0: 000 001 000 003 000 005 000 007 000 009 000 011 000 013 000 015

Consecutive:

XMM0: 001 003 005 007 009 011 013 015 000 000 000 000 000 000 000 000

One problem of the consecutive approach is: how do you decide if it's index 0 or a termination value?

I'm offering a simple solution to the first, non-consecutive approach, which should be quite fast:

.data
  ddqZeroToFifteen              db 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
  ddqTestValue:                 db 0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1
.code
  movdqa xmm0, xmmword ptr [ddqTestValue]
  pxor xmm1, xmm1                             ; zero XMM1
  pcmpeqb xmm0, xmm1                          ; set to -1 for all matching
  pandn xmm0, xmmword ptr [ddqZeroToFifteen]  ; invert and apply indices

Just for the sake of completeness: the second, the consecutive approach, is not covered in this answer.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...