sql - Select random row from a PostgreSQL table with weighted row probabilities

Question

Ask a Question

Welcome To Ask or Share your Answers For Others

sql - Select random row from a PostgreSQL table with weighted row probabilities

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

Example input:

SELECT * FROM test;
 id | percent   
----+----------
  1 | 50 
  2 | 35   
  3 | 15   
(3 rows)

How would you write such query, that on average 50% of time i could get the row with id=1, 35% of time row with id=2, and 15% of time row with id=3?

I tried something like SELECT id FROM test ORDER BY p * random() DESC LIMIT 1, but it gives wrong results. After 10,000 runs I get a distribution like: {1=6293, 2=3302, 3=405}, but I expected the distribution to be nearly: {1=5000, 2=3500, 3=1500}.

Any ideas?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

536 views

1 Answer

深蓝 · Answer 1 · 2021-10-17T01:13:32+0000

This should do the trick:

WITH CTE AS (
    SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) R
)
SELECT *
FROM (
    SELECT id, SUM(percent) OVER (ORDER BY id) S, R
    FROM YOUR_TABLE CROSS JOIN CTE
) Q
WHERE S >= R
ORDER BY id
LIMIT 1;

The sub-query Q gives the following result:

1  50
2  85
3  100

We then simply generate a random number in range [0, 100) and pick the first row that is at or beyond that number (the WHERE clause). We use common table expression (WITH) to ensure the random number is calculated only once.

BTW, the SELECT SUM(percent) FROM YOUR_TABLE allows you to have any weights in percent - they don't strictly need to be percentages (i.e. add-up to 100).

[SQL Fiddle]

Categories

sql - Select random row from a PostgreSQL table with weighted row probabilities

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags