java - Choose elements based on odds without going through the entire list to compute the probability


Let's say we're given an array of objects, each one having a weight representing its ratio of being chosen relative to the other elements. So, for instance, if we have one object of weight 10, one of weight 30, and one of weight 35, the ratio is 10:30:35. The probability of the first object being chosen should be 10/75 = 13.3%, the probability of the second being chosen should be 30/75 = 40%, and the probability of the third being chosen should be 35/75 = 46.7%.

Now, let's say we have a function that's given this array and has to return a randomly chosen object based on weight. The brute-force way is to go over each object in a loop and add its weight to a total, then go over the loop again and see whether each object should be chosen according to a random probability function:

    int totalWeight = 0;
    for (Object o : array) {
        totalWeight += o.weight;
    }

    // now we have the total weight
    for (Object o : array) {
        /* randomProb(double x) is a function that returns true x percent of the time */
        // cast to double so the ratio is not truncated by integer division
        if (randomProb((double) o.weight / totalWeight)) {
            return o;
        }
    }
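For comparison, here is a minimal sketch of how a two-pass version is commonly written: draw a single random number in [0, total) and walk the cumulative weights until it is passed. This is a swapped-in technique, not the loop from the question (testing each object independently does not by itself give the stated probabilities, and it can fall through without returning anything). The class name `TwoPassPick`, the method `pickIndex`, and the use of a plain `int[]` of weights are assumptions for illustration; it also assumes a non-empty array of positive weights.

```java
import java.util.Random;

// Hypothetical helper for illustration; not part of the original question.
final class TwoPassPick {
    /**
     * Returns index i with probability weights[i] / sum(weights).
     * Assumes weights is non-empty and every weight is positive.
     */
    static int pickIndex(int[] weights, Random rng) {
        long total = 0;
        for (int w : weights) {
            total += w;                         // first pass: total weight
        }
        // nextDouble() is uniform in [0, 1), so target is in [0, total).
        double target = rng.nextDouble() * total;
        double cumulative = 0;
        for (int i = 0; i < weights.length; i++) {  // second pass
            cumulative += weights[i];
            if (target < cumulative) {
                return i;
            }
        }
        return weights.length - 1;              // guard against rounding
    }
}
```

Calling `TwoPassPick.pickIndex(new int[] {10, 30, 35}, new Random())` many times should return index 0, 1 and 2 in roughly the 10:30:35 ratio.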

But let's say we're dealing with thousands of inputs, and iterating over each one twice is cost-intensive. Is there a simpler algorithm for determining which object to return without having to go over the loop once to find the total weight, and then go over it again for each element?

Yes, you can do it in one pass, but you need to sample n - 1 uniform random variables, where n is the length of the list. I can't say whether that's better in your case without knowing a lot more details about your application.

I guess that switching to a one-pass algorithm won't offer a speed benefit, but it is useful if the list is too long to fit in memory.

The algorithm (note that I'm using 1-relative indexing) takes list elements x[1], x[2], ..., x[n] with positive weights w[1], w[2], ..., w[n].

Pseudocode (sorry, not in Java, but you should be able to convert it):

    x = x[1]
    w_total = w[1]
    for i in 2..n
        w_total = w_total + w[i]
        if (sampleuniform(0, 1) < w[i] / w_total)
            x = x[i]

I'm assuming sampleuniform(0, 1) samples a continuous uniform (0, 1) random variable. After the for loop completes, x is a sample from the list x[1], x[2], ..., x[n] with the desired property:

probability(x = x[i]) = w[i] / w_total 

where

w_total = sum(j = 1..n) w[j]. 
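A possible Java conversion of the pseudocode, as a sketch under a few assumptions: weights come as a non-empty `int[]` of positive values, indexing is 0-based, and `Random.nextDouble()` (uniform on [0, 1)) stands in for sampleuniform(0, 1). The names `OnePassPick` and `pickIndex` are made up for this example.

```java
import java.util.Random;

// Hypothetical helper for illustration; names are not from the original answer.
final class OnePassPick {
    /**
     * Single pass over the weights; returns index i with probability
     * weights[i] / sum(weights). Assumes a non-empty array of positive weights.
     */
    static int pickIndex(int[] weights, Random rng) {
        int chosen = 0;                 // x = x[1]
        long total = weights[0];        // w_total = w[1]
        for (int i = 1; i < weights.length; i++) {
            total += weights[i];        // w_total = w_total + w[i]
            // Replace the current choice with probability w[i] / w_total.
            if (rng.nextDouble() < (double) weights[i] / total) {
                chosen = i;             // x = x[i]
            }
        }
        return chosen;
    }
}
```

Because the loop only needs the running total and the current choice, the same idea carries over to an iterator or a stream that is read once, which is the case where the list does not fit in memory.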

It's relatively simple to see why this works by examining what's happening inside the for loop for an arbitrary i, and using a proof by induction. Suppose the current x = x[j] has been sampled from the list x[1], ..., x[i - 1] with probability w[j] / sum(k = 1..i - 1) w[k] (we'll deal with this supposition below). The new w_total is equal to sum(k = 1..i) w[k], and it should be clear that after sampling the uniform random variable we have

probability(x = x[i]) = w[i] / w_total 

and

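probability(x = x[j]) = w[j] / (sum(k = 1..i - 1) w[k]) *
                        (sum(k = 1..i - 1) w[k]) / w_total
                      = w[j] / w_total.

Here the second factor, (sum(k = 1..i - 1) w[k]) / w_total = 1 - w[i] / w_total, is the probability that x is not replaced at step i. Since both j and i are arbitrary, this holds after each step in the loop, and x will satisfy the property

probability(x = x[k]) = w[k] / w_total for k in 1..i.

It should be easy to see that this approach is correct for a list of a single element (this addresses the 'suppose' statement above). Hence, by induction, after the loop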

probability(x = x[i]) = w[i] / w_total 

where

w_total = sum(j = 1..n) w[j]. 

(so the algorithm works).

All that being said, I don't know if this is a good idea (speed-wise) for lists that fit in memory. I doubt it, since sampling n - 1 uniform random numbers may take more time than looping through the list a second time (you can run speed tests to check this).
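The induction can also be checked numerically without any sampling. The sketch below (class and method names are assumptions for illustration) applies the update rule from the proof directly: at step i every earlier probability is multiplied by 1 - w[i] / w_total, and element i gets w[i] / w_total.

```java
// Hypothetical helper for illustration; it follows the induction step by step.
final class OnePassExact {
    /**
     * Returns P(x = x[k]) for each k after the loop finishes, computed from
     * the update rule rather than by drawing random numbers.
     * Assumes a non-empty array of positive weights.
     */
    static double[] finalProbabilities(double[] w) {
        double[] p = new double[w.length];
        p[0] = 1.0;                     // single-element base case
        double total = w[0];
        for (int i = 1; i < w.length; i++) {
            total += w[i];
            double replace = w[i] / total;   // chance that x becomes x[i]
            for (int j = 0; j < i; j++) {
                p[j] *= 1.0 - replace;       // x[j] survives this step
            }
            p[i] = replace;
        }
        return p;
    }
}
```

For the weights 10, 30, 35 from the question, the first element ends at (1 - 30/40) * (1 - 35/75) = 10/75, the second at (30/40) * (1 - 35/75) = 30/75, and the third at 35/75, up to floating-point rounding.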
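A rough way to run such a speed test is sketched below. Everything here is an assumption for illustration: the class name `PickSpeedCheck`, the array size, the number of rounds, and the fixed seed. A hand-rolled `System.nanoTime()` loop like this only gives a coarse indication; a benchmark harness such as JMH is more reliable for serious measurements.

```java
import java.util.Random;
import java.util.function.IntSupplier;

// Hypothetical timing sketch; sizes and round counts are arbitrary choices.
final class PickSpeedCheck {
    static int twoPass(int[] w, Random rng) {
        long total = 0;
        for (int v : w) total += v;
        double target = rng.nextDouble() * total;
        double cumulative = 0;
        for (int i = 0; i < w.length; i++) {
            cumulative += w[i];
            if (target < cumulative) return i;
        }
        return w.length - 1;
    }

    static int onePass(int[] w, Random rng) {
        int chosen = 0;
        long total = w[0];
        for (int i = 1; i < w.length; i++) {
            total += w[i];
            if (rng.nextDouble() < (double) w[i] / total) chosen = i;
        }
        return chosen;
    }

    /** Runs the picker the given number of times and returns elapsed nanoseconds. */
    static long timeNanos(IntSupplier picker, int rounds) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) sink += picker.getAsInt();
        long elapsed = System.nanoTime() - start;
        // Use the results so the loop is not trivially dead code.
        if (sink < 0) throw new IllegalStateException("unexpected negative index sum");
        return elapsed;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int[] w = new int[10_000];
        for (int i = 0; i < w.length; i++) w[i] = 1 + rng.nextInt(100);
        timeNanos(() -> twoPass(w, rng), 2_000);   // warm-up
        timeNanos(() -> onePass(w, rng), 2_000);   // warm-up
        System.out.println("two-pass ns: " + timeNanos(() -> twoPass(w, rng), 2_000));
        System.out.println("one-pass ns: " + timeNanos(() -> onePass(w, rng), 2_000));
    }
}
```

The numbers will vary by machine and JVM, so they are best read as a relative comparison on your own data sizes.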

