java - Choose elements based on odds without going through the entire list to compute the probability


Let's say we're given an array of objects, each one having a weight that represents its chance of being chosen relative to the other elements. So, for instance, if I have one object of weight 10, one of weight 30, and one of weight 35, the ratio is 10:30:35. The probability of the first object being chosen should be 10/75 = 13.3%, the probability of the second being chosen should be 30/75 = 40%, and the probability of the third being chosen should be 35/75 = 46.7%.
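As a quick check of the arithmetic above, each object's probability is its weight divided by the sum of all weights. A minimal Java sketch (the class `WeightShares` and method `shares` are illustrative names, not from the question):

```java
import java.util.Locale;

final class WeightShares {
    // Returns weights[i] / (sum of all weights) for each i.
    static double[] shares(int[] weights) {
        long total = 0;
        for (int w : weights) {
            total += w;
        }
        double[] p = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            p[i] = (double) weights[i] / total; // cast avoids integer division
        }
        return p;
    }

    public static void main(String[] args) {
        double[] p = shares(new int[] {10, 30, 35});
        for (double x : p) {
            // Locale.ROOT keeps '.' as the decimal separator
            System.out.printf(Locale.ROOT, "%.1f%%%n", 100 * x);
        }
    }
}
```

Running it prints 13.3%, 40.0% and 46.7%, one per line.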

Now, let's say I have a function that's given the array and has to return a randomly chosen object based on weight. The brute-force way is to loop over each object once, adding its weight to a total, and then loop over them again to find which object a random draw lands on:

```java
// Item stands in for the element type (assumed to have an int `weight` field);
// `random` is assumed to be a java.util.Random instance.
int totalWeight = 0;
for (Item o : array) {
    totalWeight += o.weight;
}
// now we have the total weight; draw r uniformly from 0 .. totalWeight - 1
int r = random.nextInt(totalWeight);
for (Item o : array) {
    // walk the running sum of weights until it passes r
    r -= o.weight;
    if (r < 0) {
        return o;
    }
}
```

But let's say we're dealing with thousands of inputs, so iterating over each one twice is cost-intensive. Is there a simpler algorithm for determining which object to return, without having to go over the loop once to find the total weight and then go over each element again?

Yes, you can do it in one pass, but you need to sample n - 1 uniform random variables, where n is the length of the list. I can't say whether that's better in your case without knowing a lot more details about the application.

I'd guess that switching to a one-pass algorithm won't offer a speed benefit, but it would be useful if the list is too long to fit in memory.

The algorithm (note that I'm using 1-based indexing): the list elements are x[1], x[2], ..., x[n], with positive weights w[1], w[2], ..., w[n].

Pseudocode (sorry, it's not in Java, but you should be able to convert it):

```
X = x[1]
w_total = w[1]
for i in 2..n
    w_total = w_total + w[i]
    if (sampleUniform(0, 1) < w[i] / w_total)
        X = x[i]
```
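One possible Java translation of the pseudocode, offered as a sketch. The names `OnePassSampler`, `pick` and the weight-function parameter are hypothetical. It reads the input through an `Iterator`, so it also works on data streamed from somewhere too large to hold in memory. It uses `Random.nextDouble()`, which is uniform on [0, 1) rather than (0, 1); the difference only matters for an exact draw of 0.0, which is vanishingly unlikely.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

final class OnePassSampler {
    // Picks one element with probability weight(e) / (sum of all weights),
    // reading the input exactly once. Weights are assumed to be positive.
    static <T> T pick(Iterator<T> items, ToDoubleFunction<T> weight, Random rng) {
        if (!items.hasNext()) {
            throw new IllegalArgumentException("empty input");
        }
        T chosen = items.next();                      // X = x[1]
        double total = weight.applyAsDouble(chosen);  // w_total = w[1]
        while (items.hasNext()) {                     // for i in 2..n
            T candidate = items.next();
            double w = weight.applyAsDouble(candidate);
            total += w;                               // w_total = w_total + w[i]
            if (rng.nextDouble() < w / total) {       // switch with probability w[i] / w_total
                chosen = candidate;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        int[] weights = {10, 30, 35};
        List<Integer> indices = List.of(0, 1, 2);
        Random rng = new Random();
        int[] counts = new int[weights.length];
        for (int trial = 0; trial < 75_000; trial++) {
            counts[pick(indices.iterator(), i -> weights[i], rng)]++;
        }
        // counts should be close to 10000, 30000 and 35000
        System.out.println(Arrays.toString(counts));
    }
}
```

The weight is passed as a function rather than read from a field so that the sketch does not assume any particular element class.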

I'm assuming sampleUniform(0, 1) samples a continuous Uniform(0, 1) random variable. After the for loop completes, X is a sample from the list x[1], x[2], ..., x[n] with the desired property:

$$\Pr(X = x_i) = \frac{w_i}{w_{\text{total}}}$$

where

$$w_{\text{total}} = \sum_{j=1}^{n} w_j.$$

It's relatively simple to see why this works by examining what happens inside the for loop for an arbitrary i, and using proof by induction. Suppose the current X = x[j] has been sampled from the list x[1], ..., x[i - 1] with probability w[j] / sum(k = 1 to i - 1) w[k] (we'll justify this assumption below). The new w_total is equal to sum(k = 1 to i) w[k], and it should be clear that after sampling the uniform random variable we have

$$\Pr(X = x_i) = \frac{w_i}{w_{\text{total}}}$$

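and, since X keeps its previous value x[j] only when the draw does not switch to x[i], which happens with probability 1 - w[i] / w_total = sum(k = 1 to i - 1) w[k] / w_total,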

$$\Pr(X = x_j) = \frac{w_j}{\sum_{k=1}^{i-1} w_k} \cdot \frac{\sum_{k=1}^{i-1} w_k}{w_{\text{total}}} = \frac{w_j}{w_{\text{total}}}.$$

Since both j and i were arbitrary and this holds after each step of the loop, X satisfies the property

$$\Pr(X = x_k) = \frac{w_k}{w_{\text{total}}} \quad \text{for all } k \in \{1, \dots, i\}.$$

It should be easy to see that the approach is correct for a list with a single element (this addresses the 'suppose' statement above). Hence, by induction, after the loop

$$\Pr(X = x_i) = \frac{w_i}{w_{\text{total}}}$$

where

$$w_{\text{total}} = \sum_{j=1}^{n} w_j.$$

(So the algorithm works.)
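The induction argument can also be checked numerically without any randomness: apply the loop's update rule to the exact probability vector. At step i, x[i] gets probability w[i] / w_total and every earlier probability is scaled by 1 - w[i] / w_total, the chance of not switching. A small Java sketch (the class name `InductionCheck` is hypothetical; indices are 0-based in the code):

```java
import java.util.Locale;

final class InductionCheck {
    // Exact distribution of X produced by the one-pass loop.
    // Assumes at least one weight, all positive.
    static double[] distribution(double[] w) {
        double[] p = new double[w.length];
        p[0] = 1.0;                      // base case: a one-element list always yields x[1]
        double total = w[0];
        for (int i = 1; i < w.length; i++) {
            total += w[i];
            double switchProb = w[i] / total;
            for (int j = 0; j < i; j++) {
                p[j] *= 1 - switchProb;  // X kept its old value x[j]
            }
            p[i] = switchProb;           // X switched to x[i]
        }
        return p;
    }

    public static void main(String[] args) {
        double[] w = {10, 30, 35};
        double total = 0;
        for (double x : w) {
            total += x;
        }
        double[] p = distribution(w);
        for (int k = 0; k < w.length; k++) {
            // the two columns should agree, as the induction argument claims
            System.out.printf(Locale.ROOT, "x[%d]: loop gives %.4f, w / w_total = %.4f%n",
                    k + 1, p[k], w[k] / total);
        }
    }
}
```

For the weights 10, 30 and 35, both columns show 0.1333, 0.4000 and 0.4667.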

All that being said, I don't know whether this is a good idea (speed-wise) for lists that fit in memory. I doubt it, because sampling n - 1 uniform random numbers may take more time than looping through the list a second time (you can run speed tests to check this).
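If you want to run such a speed test, here is a rough harness. It is only a sketch: the class and method names are hypothetical, the weights are random integers stored as doubles, and timing with `System.nanoTime` is crude (JIT warm-up and dead-code elimination can distort it); a dedicated harness such as JMH is the proper tool for real measurements. `twoPass` is a cumulative-sum version of the question's two-loop approach, and `onePass` is the algorithm above.

```java
import java.util.Random;

final class RoughTiming {
    // Two passes: sum the weights, then walk the running sum until it passes r.
    static int twoPass(double[] w, Random rng) {
        double total = 0;
        for (double x : w) {
            total += x;
        }
        double r = rng.nextDouble() * total;     // uniform on [0, total)
        for (int i = 0; i < w.length; i++) {
            r -= w[i];
            if (r < 0) {
                return i;
            }
        }
        return w.length - 1;                     // guards against floating-point round-off
    }

    // One pass: one uniform draw for each element after the first.
    static int onePass(double[] w, Random rng) {
        int chosen = 0;
        double total = w[0];
        for (int i = 1; i < w.length; i++) {
            total += w[i];
            if (rng.nextDouble() < w[i] / total) {
                chosen = i;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        double[] w = new double[1_000_000];
        for (int i = 0; i < w.length; i++) {
            w[i] = 1 + rng.nextInt(100);
        }
        long sink = 0;                           // use results so the calls are not optimized away
        for (int round = 0; round < 5; round++) { // early rounds double as JIT warm-up
            long t0 = System.nanoTime();
            sink += twoPass(w, rng);
            long t1 = System.nanoTime();
            sink += onePass(w, rng);
            long t2 = System.nanoTime();
            System.out.printf("two-pass %d us, one-pass %d us%n", (t1 - t0) / 1000, (t2 - t1) / 1000);
        }
        System.out.println(sink);
    }
}
```

The numbers will vary by machine and JVM, so treat any single run as indicative only.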

