Select according to fitness

When programming an evolutionary simulation, often the parents of offspring have to be selected according to the parent's fitness.
I have used several algorithms to choose a parent, and will show 2 here, and compare performance between the two.

The idea of the algorithm is to draw a random individual, proportional to the fitness of the individual. Hence individuals with higher (relative) fitness
tend to be drawn more often. It is assumed that you have a vector containing all relative fitnesses, where the index of the fitness relates to the index
of the individual in the population (this can be customized to the user's preferences if necessary).

We draw a random number between 0 and 1, since the fitnesses are relative, they sum up to 1 (fitness = fitness / (sum of all fitnesses)). We now want to know to which individual the random number corresponds. We do that as the following:

int discount(const std::vector< double>& F) { double r = 1.0 * std::rand() / RAND_MAX; for(int i = 0; i < F.size(); ++i) { r-= F[i]; if(r <=0) return i; } return F.size() - 1; //in case we have exactly drawn r = 1.0 //the for loop might not find a match }

This algorithm is quite efficient, but is relatively slow, especially as the size of vector F increases. Therefore it might be more efficient to use a cumulative distribution:

Instead of just using the relative fitness, we will now use the cumulative fitness:

double c_number = 0.0; std::vectorcumulative_Fitness; for(std::vector< double>::iterator f = relative_fitness.begin(); f != relative_fitness.end(); ++f) { c_number += (*f); cumulative_Fitness.push_back(c_number); }

There is no need to sort the relative fitnesses first (I think). Since we now no longer have a "wheel of fortune" with parts proportional to the relative fitness, but rather have mapped the exact locations of all fitness proportions, we can use a binary search algorithm to find the index of the individual:

int search(const std::vector< double>& cumulative_F) { double r = 1.0 * std::rand() / RAND_MAX; int min = 0; int max = cumulative_F.size() - 1; int med = (int)((max+min)*0.5); while((max-min) > 1) { if(cumulative_F[med] >= r) max = med; else min = med; med = (int)((max+min)*0.5); } if(cumulative_F[med] < r) return max; else return med; }

Advantages of using the cumulative distribution is that the amount of "calculation" steps scales logarithmically with the size of vector cumulative_F, this in contrast to the discounting method, that tends to scale linearly with vector size.

Here I have compared the performance of both algorithms, calculation time is mean time to perform 1e4 searches.

It is clear that the cumulative algorithm outcompetes the discounting method by far (note that the axes are logarithmic).

Web Site (C) Thijs Janzen 2011-2014

Lake Tanganyika pictures courtesy Jen Reynolds