Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
404 views
in Technique[技术] by (71.8m points)

sql - MySQL: Alternatives to ORDER BY RAND()

I've read about a few alternatives to MySQL's ORDER BY RAND() function, but most of the alternatives apply only to where on a single random result is needed.

Does anyone have any idea how to optimize a query that returns multiple random results, such as this:

   SELECT u.id, 
          p.photo 
     FROM users u, profiles p 
    WHERE p.memberid = u.id 
      AND p.photo != '' 
      AND (u.ownership=1 OR u.stamp=1) 
 ORDER BY RAND() 
    LIMIT 18 
Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

UPDATE 2016

This solution works best using an indexed column.

Here is a simple example of and optimized query bench marked with 100,000 rows.

OPTIMIZED: 300ms

SELECT 
    g.*
FROM
    table g
        JOIN
    (SELECT 
        id
    FROM
        table
    WHERE
        RAND() < (SELECT 
                ((4 / COUNT(*)) * 10)
            FROM
                table)
    ORDER BY RAND()
    LIMIT 4) AS z ON z.id= g.id

note about limit ammount: limit 4 and 4/count(*). The 4s need to be the same number. Changing how many you return doesn't effect the speed that much. Benchmark at limit 4 and limit 1000 are the same. Limit 10,000 took it up to 600ms

note about join: Randomizing just the id is faster than randomizing a whole row. Since it has to copy the entire row into memory then randomize it. The join can be any table that is linked to the subquery Its to prevent tablescans.

note where clause: The where count limits down the ammount of results that are being randomized. It takes a percentage of the results and sorts them rather than the whole table.

note sub query: The if doing joins and extra where clause conditions you need to put them both in the subquery and the subsubquery. To have an accurate count and pull back correct data.

UNOPTIMIZED: 1200ms

SELECT 
    g.*
FROM
    table g
ORDER BY RAND()
LIMIT 4

PROS

4x faster than order by rand(). This solution can work with any table with a indexed column.

CONS

It is a bit complex with complex queries. Need to maintain 2 code bases in the subqueries


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...