java - Choose elements based on odds without going through the entire list to compute the probability
Let's say we're given an array of objects, each one having a weight that represents its ratio of being chosen relative to the other elements. So, for instance, if we have one object of weight 10, one of weight 30, and one of weight 35, the ratio is 10:30:35. The probability of the first object being chosen should be 10/75 = 13.3%, the probability of the second being chosen should be 30/75 = 40%, and the probability of the third being chosen should be 35/75 = 46.7%.
Now, let's say we have a function that's given the array and has to return a randomly chosen object based on weight. The brute-force way is to go over each object in a loop and add its weight to a total, then go over the loop again and see whether each object should be chosen according to a random probability function:
    int totalweight = 0;
    for (object o : array) {
        totalweight += o.weight;
    }
    // now we have the total weight
    for (object o : array) {
        /* randomprob(double x) is a function that returns true with probability x */
        if (randomprob((double) o.weight / totalweight)) {
            return o;
        }
    }

But let's say we're dealing with thousands of inputs, and iterating over each one twice is cost-intensive. Is there a simpler algorithm for determining which object to return without having to go over the loop once to find the total weight, and then go over it again for each element?
Yes, you can do it in one pass, but you need to sample n - 1 uniform random variables, where n is the length of the list. I can't say whether that's better in your case or not without knowing a lot more details about your application.
My guess is that switching to a one-pass algorithm won't offer a speed benefit, but it is useful if the list is too long to fit in memory.
The algorithm (note that I'm using 1-relative indexing): given list elements x[1], x[2], ..., x[n], with positive weights w[1], w[2], ..., w[n].
Pseudocode (sorry, it's not in Java, but you should be able to convert it):
    X = x[1]
    w_total = w[1]
    for i in 2..n
        w_total = w_total + w[i]
        if (sampleuniform(0, 1) < w[i] / w_total)
            X = x[i]

I'm assuming sampleuniform(0, 1) samples a continuous uniform (0, 1) random variable. After the for loop completes, X is a sample from the list x[1], x[2], ..., x[n] with the desired property.
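The pseudocode above could be converted to Java roughly as follows. This is only a sketch under some assumptions: the Weighted class, the OnePassSampler class, and the pick method are names made up for illustration, and the input is read from an Iterator so that it would also work for data that does not fit in memory. To keep the loop simple it draws a random number for the first element as well (n draws instead of n - 1); that first draw always succeeds because the ratio is 1.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical item type for this sketch: a label plus a positive weight.
class Weighted {
    final String name;
    final double weight;

    Weighted(String name, double weight) {
        this.name = name;
        this.weight = weight;
    }
}

class OnePassSampler {
    // Returns one item, chosen with probability weight / (sum of all weights),
    // while reading the input only once. Returns null for an empty input.
    // Assumes every weight is positive.
    static Weighted pick(Iterator<Weighted> items, Random rng) {
        Weighted chosen = null;
        double wTotal = 0.0;
        while (items.hasNext()) {
            Weighted item = items.next();
            wTotal += item.weight;
            // Replace the current choice with probability item.weight / wTotal.
            // For the first item the ratio is 1, so it is always taken.
            if (rng.nextDouble() < item.weight / wTotal) {
                chosen = item;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Weighted> items = Arrays.asList(
            new Weighted("a", 10), new Weighted("b", 30), new Weighted("c", 35));
        // Prints "a", "b" or "c"; the result varies from run to run.
        System.out.println(pick(items.iterator(), new Random()).name);
    }
}
```

Over many calls with the weights 10, 30 and 35, the three items should come up roughly 13.3%, 40% and 46.7% of the time.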
That property is

    probability(X = x[i]) = w[i] / w_total

where

    w_total = sum(j = 1..n) w[j].

It's relatively simple to see why this works by examining what's happening inside the for loop for an arbitrary i and using a proof by induction. Suppose the current X = x[j] has been sampled from the list x[1], ..., x[i - 1] with probability w[j] / sum(k = 1..i - 1) w[k] (we'll deal with this assumption below). The new w_total is equal to sum(k = 1..i) w[k], and X keeps its previous value with probability 1 - w[i] / w_total = (sum(k = 1..i - 1) w[k]) / w_total. It should be clear that after sampling the uniform random variable we have

    probability(X = x[i]) = w[i] / w_total

and

    probability(X = x[j]) = w[j] / (sum(k = 1..i - 1) w[k]) * (sum(k = 1..i - 1) w[k]) / w_total = w[j] / w_total.

Since both j and i are arbitrary, this holds after each step in the loop, so X satisfies the property

    probability(X = x[k]) = w[k] / w_total    for k in 1..i.

It should be easy to see that this approach is correct for a list with a single element (this addresses the 'suppose' statement above). Hence, by induction, after the loop

    probability(X = x[i]) = w[i] / w_total

where

    w_total = sum(j = 1..n) w[j].

(So the algorithm works.)
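If you'd rather check the induction argument numerically than take it on faith, here is a small sketch that applies the loop's update rule to exact probabilities instead of random samples. The class name InductionCheck and the method selectionProbabilities are made up for this example, and it uses 0-based arrays rather than the 1-relative indexing above.

```java
import java.util.Arrays;

class InductionCheck {
    // Computes the probability that each index ends up as the final choice
    // by following the one-pass loop step by step. Assumes positive weights.
    static double[] selectionProbabilities(double[] w) {
        double[] p = new double[w.length];
        double wTotal = 0.0;
        for (int i = 0; i < w.length; i++) {
            wTotal += w[i];
            double replace = w[i] / wTotal;   // chance that element i takes over
            for (int j = 0; j < i; j++) {
                p[j] *= 1.0 - replace;        // chance an earlier choice survives
            }
            p[i] = replace;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] p = selectionProbabilities(new double[] {10, 30, 35});
        System.out.println(Arrays.toString(p));
    }
}
```

For the weights 10, 30 and 35 from the question, the printed values should agree with 10/75, 30/75 and 35/75 up to floating-point rounding.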
All that being said, I don't know if this is a good idea (speed-wise) for lists that fit in memory. I doubt it, since sampling n - 1 uniform random numbers may take more time than looping through the list a second time (you can run speed tests to check this).
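As a baseline for such a speed test, here is a sketch of a common two-pass alternative. Note that this is cumulative-weight (roulette-wheel) selection, not the exact loop from the question: it sums the weights once, draws a single random number, and then walks the list until the running total passes that number. The names TwoPassSampler and pickIndex are made up for the example, and it assumes a non-empty array of positive weights.

```java
import java.util.Random;

class TwoPassSampler {
    // Returns index i with probability w[i] / sum(w), using one random draw.
    // Assumes w is non-empty and all weights are positive.
    static int pickIndex(double[] w, Random rng) {
        double total = 0.0;
        for (double wi : w) {                      // first pass: total weight
            total += wi;
        }
        double target = rng.nextDouble() * total;  // uniform in [0, total)
        double running = 0.0;
        for (int i = 0; i < w.length; i++) {       // second pass: find the bucket
            running += w[i];
            if (target < running) {
                return i;
            }
        }
        return w.length - 1; // guard against floating-point round-off
    }

    public static void main(String[] args) {
        // Prints 0, 1 or 2; the result varies from run to run.
        System.out.println(pickIndex(new double[] {10, 30, 35}, new Random()));
    }
}
```

It reads the list twice but needs only one random number, which is the trade-off to measure against the one-pass version.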