Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


x86 - Convert _mm_shuffle_epi32 to C expression for the permutation?

I'm working on a port of SSE2 code to NEON. The port is at an early stage and it's producing incorrect results. Part of the reason for the incorrect results is _mm_shuffle_epi32 and the NEON instructions I selected to replace it.

The documentation for _mm_shuffle_epi32 is on the lean side from Microsoft. The Intel documentation is better, but it's not clear to me what some of the pseudo-code is doing.

SELECT4(src, control)
{
    CASE(control[1:0])
        0: tmp[31:0] := src[31:0]
        1: tmp[31:0] := src[63:32]
        2: tmp[31:0] := src[95:64]
        3: tmp[31:0] := src[127:96]
    ESAC
    RETURN tmp[31:0]
}

dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])

I need help envisioning what _mm_shuffle_epi32 does or, more precisely, the permutation that the immediate applies to the value. I guess I need to see it as basic C with ANDs and ORs.

Given C statements and macros like:

v2 = _mm_shuffle_epi32(v1, _MM_SHUFFLE(i1,i2,i3,i4));

What does the resulting C expression look like when it's unrolled into basic C statements?


1 Reply


There's no AND/OR going on, unless you need to unpack the 8-bit integer that holds the four 2-bit indices.

Make your own definition for _MM_SHUFFLE that expands to four args, instead of packing them.

It's something like

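// dst = _mm_shuffle_epi32(src, _MM_SHUFFLE(d,c,b,a))
// Each of a..d is an element index in the range 0..3.
// dst and src must not overlap, because src is read after dst is written.
void pshufd(int dst[4], const int src[4], int d, int c, int b, int a)
{   // note that the _MM_SHUFFLE args are high-element-first order
    dst[0] = src[a];   // low element, selected by imm8[1:0]
    dst[1] = src[b];   // selected by imm8[3:2]
    dst[2] = src[c];   // selected by imm8[5:4]
    dst[3] = src[d];   // high element, selected by imm8[7:6]
}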
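
If you do keep the packed immediate, the unpacking is where the shifts and ANDs come in. Here is a minimal scalar sketch of that, following the SELECT4 pseudo-code from the question. The names SHUFFLE_IMM8 and pshufd_imm8 are made up for this illustration (SHUFFLE_IMM8 is meant to pack its arguments the same way _MM_SHUFFLE does), and the temporary copy is only there so that dst and src may be the same array.

```c
#include <string.h>

/* Hypothetical stand-in for _MM_SHUFFLE: packs four 2-bit indices,
   highest element first, into one 8-bit immediate. */
#define SHUFFLE_IMM8(d, c, b, a) \
    (((d) << 6) | ((c) << 4) | ((b) << 2) | (a))

/* Scalar model of dst = _mm_shuffle_epi32(src, imm8).
   Each 2-bit field of imm8 selects which source element
   lands in the corresponding destination element. */
static void pshufd_imm8(int dst[4], const int src[4], unsigned imm8)
{
    int tmp[4];
    memcpy(tmp, src, sizeof tmp);      /* copy first so dst may alias src */

    dst[0] = tmp[(imm8 >> 0) & 3];     /* imm8[1:0] */
    dst[1] = tmp[(imm8 >> 2) & 3];     /* imm8[3:2] */
    dst[2] = tmp[(imm8 >> 4) & 3];     /* imm8[5:4] */
    dst[3] = tmp[(imm8 >> 6) & 3];     /* imm8[7:6] */
}
```

For example, with src = { 10, 11, 12, 13 }, SHUFFLE_IMM8(0, 1, 2, 3) is 0x1B and reverses the elements to { 13, 12, 11, 10 }, while SHUFFLE_IMM8(3, 2, 1, 0) is 0xE4 and leaves them unchanged.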

Vectors are indexed from low element = 0. The low element is the one that stores into memory at the lowest address, but when values are in registers you should think about them as [ 3 2 1 0 ]. In this notation, vector right-shifts (like psrldq) actually shift to the right.

This is why _mm_set_epi32(3, 2, 1, 0) takes its args in reverse order from int foo[] = { 0, 1, 2, 3 };.
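
To make the two orderings concrete, here is a small scalar model in plain C. Both helper names are hypothetical and exist only for this illustration: set_epi32_model mimics the argument order of _mm_set_epi32, and shift_right_elems_model mimics a psrldq by 4*n bytes, assuming n is between 0 and 4. The array index is the element number, so v[0] is the low element.

```c
/* Model of v = _mm_set_epi32(e3, e2, e1, e0): the arguments are written
   high element first, matching the [ 3 2 1 0 ] register picture. */
static void set_epi32_model(int v[4], int e3, int e2, int e1, int e0)
{
    v[0] = e0;   /* low element, lowest address when stored */
    v[1] = e1;
    v[2] = e2;
    v[3] = e3;   /* high element */
}

/* Model of a byte shift right by 4*n bytes (n whole elements, 0 <= n <= 4):
   every element moves n places toward element 0 and zeros enter at the top. */
static void shift_right_elems_model(int v[4], int n)
{
    for (int i = 0; i < 4; i++)
        v[i] = (i + n < 4) ? v[i + n] : 0;
}
```

After set_epi32_model(v, 3, 2, 1, 0) the array holds { 0, 1, 2, 3 } in memory order. Shifting by one element then gives { 1, 2, 3, 0 } in memory order, which is [ 0 3 2 1 ] in register notation: everything moved one place to the right.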

