Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
560 views
in Technique[技术] by (71.8m points)

c - AVX/SSE version of xorshift128+

I am trying to make the fastest possible high quality RNG. Having read http://xorshift.di.unimi.it/ , xorshift128+ seems like a good option. The C code is

#include <stdint.h>
uint64_t s[ 2 ];

uint64_t next(void) { 
    uint64_t s1 = s[ 0 ];
    const uint64_t s0 = s[ 1 ];
    s[ 0 ] = s0;
    s1 ^= s1 << 23; // a
    return ( s[ 1 ] = ( s1 ^ s0 ^ ( s1 >> 17 ) ^ ( s0 >> 26 ) ) ) + s0; // b, c
}

I am not an SSE/AVX expert sadly but my CPU supports SSE4.1 / SSE4.2 / AVX / F16C / FMA3 / XOP instructions. How could you use these to speed up this code (assuming you want to make billions of such random numbers) and what is the expected limit to this speedup in practice?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

XorShift is indeed a good choice. It is so good, so fast and requires so little state that I'm surprised to see so little adoption. It should be the standard generator on all platforms. I have implemented it myself 8 years ago and even then it could generate 800MB/s of random bytes.

You cannot use vector instructions to speed up generating a single random number. There is too little instruction-level parallelism in those few instructions.

But you can easily speed up generating N numbers where N is the vector size of your target instruction set. Just run N generators in parallel. Keep state for N generators and generate N numbers at the same time.

If client code demands numbers one at a time you could keep a buffer of N (or more) numbers. If the buffer is empty you fill it using vector instructions. If the buffer is not empty you just return the next number.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...