java - Choose elements based on odds without going through the entire list to compute the probability
Let's say we're given an array of a number of objects, each one having a weight representing its ratio of being chosen relative to the other elements. So, for instance, if we have one object of weight 10, one of weight 30, and one of weight 35, the ratio is 10:30:35. The probability of the first object being chosen should be 10/75 = 13.3%, the probability of the second being chosen should be 30/75 = 40%, and the probability of the third being chosen should be 35/75 = 46.7%.
Now, let's say we have a function that's given the array and has to return a randomly chosen object based on weight. The brute-force way is to go over each object in a loop and add its weight to a total, then go over the loop again and see whether each object should be chosen according to a random probability function:
    int totalweight = 0;
    for (Object o : array) {
        totalweight += o.weight;
    }
    // now we have the total weight
    for (Object o : array) {
        /* randomprob(double x) returns true x percent of the time */
        // cast to double so the ratio is not truncated by integer division
        if (randomprob((double) o.weight / totalweight)) {
            return o;
        }
    }
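Note that a loop which tests each object independently can finish without returning anything, and its frequencies are not exactly in the ratio of the weights. A minimal sketch of the usual two-pass alternative is below: draw one uniform number in [0, total) and walk the cumulative weights until it is used up. The class name `TwoPassWeightedChoice`, the method `chooseIndex`, and the use of a plain `double[]` of weights are assumptions made for illustration; they are not from the question.

```java
import java.util.Random;

final class TwoPassWeightedChoice {
    // Returns an index i with probability weights[i] / sum(weights).
    // Assumes weights is non-empty and every weight is positive.
    static int chooseIndex(double[] weights, Random rng) {
        double totalWeight = 0.0;
        for (double w : weights) {                  // first pass: total weight
            totalWeight += w;
        }
        double r = rng.nextDouble() * totalWeight;  // uniform in [0, totalWeight)
        for (int i = 0; i < weights.length; i++) {  // second pass: find the slot r falls in
            r -= weights[i];
            if (r < 0.0) {
                return i;
            }
        }
        return weights.length - 1;                  // guard against floating-point round-off
    }
}
```

This uses a single random draw per call, which is the baseline the one-pass approach has to compete with.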
But let's say we're dealing with thousands of inputs, and iterating over each one twice is cost-intensive. Is there a simpler algorithm for determining which object to return without having to go over the loop once to find the total weight, and then go over it again for each element?
Yes, you can do it in one pass, but you need to sample n - 1 uniform random variables, where n is the length of the list. I can't say whether that's better in your case without knowing a lot more details about your application.
I guess switching to a one-pass algorithm won't offer a speed benefit, but it is useful if the list is too long to fit in memory.
The algorithm (note I'm using 1-relative indexing): suppose the list elements are x[1], x[2], ..., x[n] and the positive weights are w[1], w[2], ..., w[n].
Pseudocode (sorry, it's not in Java, but you should be able to convert it):
    x = x[1]
    w_total = w[1]
    for i in 2..n
        w_total = w_total + w[i]
        if (sampleuniform(0, 1) < w[i] / w_total)
            x = x[i]
I'm assuming sampleuniform(0, 1) samples a continuous uniform (0, 1) random variable. After the for loop completes, x is a sample from the list x[1], x[2], ..., x[n] with the desired property:
probability(x = x[i]) = w[i] / w_total
where
w_total = sum(j = 1..n) w[j].
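The one-pass loop can be sketched in Java roughly as follows. This is only an illustration under assumptions: the class name `OnePassWeightedChoice`, the method `chooseIndex`, and the `double[]` input are made up for the example, indexing is 0-based rather than 1-relative, and `Random.nextDouble()` (uniform on [0, 1)) stands in for sampleuniform(0, 1).

```java
import java.util.Random;

final class OnePassWeightedChoice {
    // Returns an index i with probability weights[i] / sum(weights),
    // reading each weight exactly once.
    // Assumes weights is non-empty and every weight is positive.
    static int chooseIndex(double[] weights, Random rng) {
        int chosen = 0;                 // x = x[1] in the 1-relative notation
        double wTotal = weights[0];     // w_total = w[1]
        for (int i = 1; i < weights.length; i++) {
            wTotal += weights[i];       // running total including element i
            // replace the current choice with probability w[i] / w_total
            if (rng.nextDouble() < weights[i] / wTotal) {
                chosen = i;
            }
        }
        return chosen;
    }
}
```

Because the loop only needs the current element and the running total, the same idea should carry over to an iterator or a stream of elements that does not fit in memory.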
It's relatively simple to see why this works by examining what happens inside the for loop for an arbitrary i and using proof by induction. Suppose the current x = x[j] has been sampled from the list x[1], ..., x[i - 1] with probability w[j] / sum(k = 1..i - 1) w[k] (we'll deal with this assumption in the next section). The new w_total is equal to sum(k = 1..i) w[k], and it should be clear that after sampling the uniform random variable we have
probability(x = x[i]) = w[i] / w_total
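and, since x stays equal to x[j] only when the draw does not replace it, which happens with probability 1 - w[i] / w_total = sum(k = 1..i - 1) w[k] / w_total,
probability(x = x[j]) = w[j] / (sum(k = 1..i - 1) w[k]) * (sum(k = 1..i - 1) w[k]) / w_total = w[j] / w_total.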
Since both j and i are arbitrary, this holds after each step of the loop, so x will satisfy the property
probability(x = x[k]) = w[k] / w_total for k in 1 ... i.
It should be easy to see that this approach is correct for a list of a single element (this addresses the above 'suppose' statement). Hence, by induction, after the loop
probability(x = x[i]) = w[i] / w_total
where
w_total = sum(j = 1..n) w[j].
(so the algorithm works).
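The induction step can also be checked numerically without any randomness by tracking the exact selection probabilities through the loop. The sketch below is an illustration only; the class `InductionCheck` and the method `selectionProbabilities` are hypothetical names, and indexing is 0-based.

```java
final class InductionCheck {
    // Returns p, where p[j] is the exact probability that the one-pass loop
    // finishes with element j selected.
    // Assumes weights is non-empty and every weight is positive.
    static double[] selectionProbabilities(double[] weights) {
        double[] p = new double[weights.length];
        p[0] = 1.0;                       // base case: a single element is always chosen
        double wTotal = weights[0];
        for (int i = 1; i < weights.length; i++) {
            wTotal += weights[i];
            double replace = weights[i] / wTotal;  // chance that element i takes over
            for (int j = 0; j < i; j++) {
                p[j] *= (1.0 - replace);           // earlier choices survive otherwise
            }
            p[i] = replace;
        }
        return p;
    }
}
```

For the weights 10, 30, 35 from the question, this gives approximately 0.1333, 0.4 and 0.4667, matching 10/75, 30/75 and 35/75.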
All that being said, I don't know if this is a good idea (speed-wise) for lists that fit in memory. I doubt it, as sampling n - 1 uniform random numbers may take more time than looping through the list a second time (you can run speed tests to check this).
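A very rough way to run such a speed test is sketched below. Everything here is an assumption for illustration: the class `SamplingTimingSketch` and its methods are hypothetical, weights are held in a `double[]`, and plain `System.nanoTime()` timing is noisy on the JVM, so the numbers should be treated as indicative at best.

```java
import java.util.Random;

final class SamplingTimingSketch {
    // Two passes: sum the weights, then walk the cumulative weights.
    static int twoPass(double[] w, Random rng) {
        double total = 0.0;
        for (double wi : w) total += wi;
        double r = rng.nextDouble() * total;
        for (int i = 0; i < w.length; i++) {
            r -= w[i];
            if (r < 0.0) return i;
        }
        return w.length - 1; // round-off guard
    }

    // One pass: one uniform draw per element after the first.
    static int onePass(double[] w, Random rng) {
        int chosen = 0;
        double total = w[0];
        for (int i = 1; i < w.length; i++) {
            total += w[i];
            if (rng.nextDouble() < w[i] / total) chosen = i;
        }
        return chosen;
    }

    // Returns the elapsed nanoseconds for `repeats` calls of the chosen method.
    static long timeNanos(boolean useOnePass, double[] w, int repeats, Random rng) {
        long sink = 0; // accumulate results so the calls are less likely to be optimized away
        long start = System.nanoTime();
        for (int r = 0; r < repeats; r++) {
            sink += useOnePass ? onePass(w, rng) : twoPass(w, rng);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) throw new IllegalStateException("unreachable");
        return elapsed;
    }
}
```

Calling `timeNanos(false, w, 1000, rng)` and `timeNanos(true, w, 1000, rng)` on the same weights array gives two numbers to compare. Doing a warm-up run first, or using a benchmark harness such as JMH, would give more trustworthy results.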