Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have a database table with about 160 million rows in it.

The table has two columns: id and listing.

I simply need to used PHP to display 1000 random rows from the listing column and put them into <span> tags. Like this:

<span>Row 1</span>
<span>Row 2</span>
<span>Row 3</span>

I've been trying to do it with ORDER BY RAND() but that takes so long to load on such a large database and I haven't been able to find any other solutions.

I'm hoping that there is a fast/easy way to do this. I can't imagine that it'd be impossible to simply echo 1000 random rows... Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
153 views
Welcome To Ask or Share your Answers For Others

1 Answer

Two solutions presented here. Both of these proposed solutions are mysql-only and can be used by any programming language as the consumer. PHP would be wildly too slow for this, but it could be the consumer of it.

Faster Solution: I can bring 1000 random rows from a table of 19 million rows in about 2 tenths of a second with more advanced programming techniques.

Slower Solution: It takes about 15 seconds with non-power programming techniques.

By the way both use the data generation seen HERE that I wrote. So that is my little schema. I use that, continue with TWO more self-inserts seen over there, until I have 19M rows. So I am not going to show that again. But to get those 19M rows, go see that, and do 2 more of those inserts, and you have 19M rows.

Slower version first

First, the slower method.

select id,thing from ratings order by rand() limit 1000;

That returns 1000 rows in 15 seconds.


For anyone new to mysql, don't even read the following.

Faster solution

This is a little more complicated to describe. The gist of it is that you pre-compute your random numbers and generate an in clause ending of random numbers, separated by commas, and wrapped with a pair of parentheses.

It will look like (1,2,3,4) but it will have 1000 numbers in it.

And you store them, and use them once. Like a one time pad for cryptography. Ok, not a great analogy, but you get the point I hope.

Think of it as an ending for an in clause, and stored in a TEXT column (like a blob).

Why in the world would one want to do this? Because RNG (random number generators) are prohibitively slow. But to generate them with a few machines may be able to crank out thousands relatively quickly. By the way (and you will see this in the structure of my so called appendices, I capture how long it takes to generate one row. About 1 second with mysql. But C#, PHP, Java, anything can put that together. The point is not how you put it together, rather, that you have it when you want it.

This strategy, the long and short of it is, when this is combined with fetching a row that has not been used as a random list, marking it as used, and issuing a call such as

select id,thing from ratings where id in (a,b,c,d,e, ... )

and the in clause has 1000 numbers in it, the results are available in less than half a second. Effective employing the mysql CBO (cost based optimizer) than treats it like a join on a PK index.

I leave this in summary form, because it is a bit complicated in practice, but includes the following particles potentially

  • a table holding the precomputed random numbers (Appendix A)
  • a mysql create event strategy (Appendix B)
  • a stored procedure that employees a Prepared Statement (Appendix C)
  • a mysql-only stored proc to demonstrate RNG in clause for kicks (Appendix D)

Appendix A

A table holding the precomputed random numbers

create table randomsToUse
(   -- create a table of 1000 random numbers to use
    -- format will be like a long "(a,b,c,d,e, ...)" string

    -- pre-computed random numbers, fetched upon needed for use

    id int auto_increment primary key,
    used int not null,  -- 0 = not used yet, 1= used
    dtStartCreate datetime not null, -- next two lines to eyeball time spent generating this row
    dtEndCreate datetime not null,
    dtUsed datetime null, -- when was it used
    txtInString text not null -- here is your in clause ending like (a,b,c,d,e, ... )
    -- this may only have about 5000 rows and garbage cleaned
    -- so maybe choose one or two more indexes, such as composites
);

Appendix B

In the interest of not turning this into a book, see my answer HERE for a mechanism for running a recurring mysql Event. It will drive the maintenance of the table seen in Appendix A using techniques seen in Appendix D and other thoughts you want to dream up. Such as re-use of rows, archiving, deleting, whatever.

Appendix C

stored procedure to simply get me 1000 random rows.

DROP PROCEDURE if exists showARandomChunk;
DELIMITER $$
CREATE PROCEDURE showARandomChunk
(
)
BEGIN
  DECLARE i int;
  DECLARE txtInClause text;

  -- select now() into dtBegin;

  select id,txtInString into i,txtInClause from randomsToUse where used=0 order by id limit 1;
  -- select txtInClause as sOut; -- used for debugging

  -- if I run this following statement, it is 19.9 seconds on my Dell laptop
  -- with 19M rows
  -- select * from ratings order by rand() limit 1000; -- 19 seconds

  -- however, if I run the following "Prepared Statement", if takes 2 tenths of a second
  -- for 1000 rows

  set @s1=concat("select * from ratings where id in ",txtInClause);

  PREPARE stmt1 FROM @s1;
  EXECUTE stmt1; -- execute the puppy and give me 1000 rows
  DEALLOCATE PREPARE stmt1;
END
$$
DELIMITER ;

Appendix D

Can be intertwined with Appendix B concept. However you want to do it. But it leaves you with something to see how mysql could do it all by itself on the RNG side of things. By the way, for parameters 1 and 2 being 1000 and 19M respectively, it takes 800 ms on my machine.

This routine could be written in any language as mentioned in the beginning.

drop procedure if exists createARandomInString;
DELIMITER $$
create procedure createARandomInString
(   nHowMany int, -- how many numbers to you want
    nMaxNum int -- max of any one number
)
BEGIN
    DECLARE dtBegin datetime;
    DECLARE dtEnd datetime;
    DECLARE i int;
    DECLARE txtInClause text;
    select now() into dtBegin;

    set i=1;
    set txtInClause="(";
    WHILE i<nHowMany DO
        set txtInClause=concat(txtInClause,floor(rand()*nMaxNum)+1,", "); -- extra space good due to viewing in text editor
        set i=i+1;
    END WHILE;
    set txtInClause=concat(txtInClause,floor(rand()*nMaxNum)+1,")");
    -- select txtInClause as myOutput; -- used for debugging
    select now() into dtEnd;

    -- insert a row, that has not been used yet
    insert randomsToUse(used,dtStartCreate,dtEndCreate,dtUsed,txtInString) values 
       (0,dtBegin,dtEnd,null,txtInClause);
END
$$
DELIMITER ;

How to call the above stored proc:

call createARandomInString(1000,18000000);

That generates and saves 1 row, of 1000 numbers wrapped as described above. Big numbers, 1 to 18M

As a quick illustration, if one were to modify the stored proc, un-rem the line near the bottom that says "used for debugging", and have that as the last line, in the stored proc that runs, and run this:

call createARandomInString(4,18000000);

... to generate 4 random numbers up to 18M, the results might look like

+-------------------------------------+
| myOutput                            |
+-------------------------------------+
| (2857561,5076608,16810360,14821977) |
+-------------------------------------+

Appendix E

Reality check. These are somewhat advanced techniques and I can't tutor anyone on them. But I wanted to share them anyway. But I can't teach it. Over and out.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...