Introduce entropy for fastestmirror option
Traditionally, the fastestmirror option has taken the mirrorlist
provided by the repo CDN and measured the latency to each mirror by
timing how long libcurl takes to open a socket. This has resulted in
several undesirable properties for the fastestmirror option:

1. Socket open latency is not always a good proxy for download bandwidth
   performance, so if the nearest mirror to a user happens to be
   particularly slow, they will always experience a very slow download
   despite having fastestmirror enabled.
2. Mirror operators don't appreciate fastestmirror being used by
   high-density deployments (e.g. data centers), where thousands to
   millions of hosts with fastestmirror enabled all dogpile the single
   lowest-latency mirror even though several other mirrors are almost
   as close to the site.

This change introduces entropy into the mirror selection process by
taking all mirrors whose measured latency is less than twice that of the
nearest mirror and shuffling them uniformly at random, then appending
all other mirrors to the end of the list in strict latency order.
PhirePhly authored and evan-goode committed Oct 15, 2024
1 parent f8b0b59 commit c987eda
Showing 1 changed file with 15 additions and 1 deletion.
16 changes: 15 additions & 1 deletion librepo/fastestmirror.c
@@ -694,10 +694,24 @@ lr_fastestmirror(LrHandle *handle,
     }

     // Sort the mirrors by the connection time
+    //
+    // Note that always picking the single best mirror has undesirable properties like
+    // forcing the user to always pick the single lowest latency mirror regardless of its
+    // actual bandwidth performance, and high density install bases all using fastestmirror
+    // tend to dogpile the single mirror nearest to all of them
+    //
+    // Instead of using a strict sorted list, shuffle all mirrors with lower latency than 2x
+    // the best mirror, to introduce enough entropy to spread the load across nearby mirrors.
+    double bestMirrorLatency = ((LrFastestMirror *)lrfastestmirrors->data)->plain_connect_time;
+
     for (GSList *elem = lrfastestmirrors; elem; elem = g_slist_next(elem)) {
         LrFastestMirror *mirror = elem->data;
         g_debug("%s: %3.6f : %s", __func__, mirror->plain_connect_time, mirror->url);
-        new_list = g_slist_append(new_list, mirror->url);
+        if (mirror->plain_connect_time < (2.0 * bestMirrorLatency) ) { // Shuffle nearby mirrors
+            new_list = g_slist_insert(new_list, mirror->url, g_random_int_range(0, g_slist_length(new_list)+1));
+        } else { // Far away mirrors appended as backup options
+            new_list = g_slist_append(new_list, mirror->url);
+        }
     }

     g_slist_free_full(lrfastestmirrors, (GDestroyNotify)lr_lrfastestmirror_free);
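
For readers who want to try the selection rule outside of librepo, the
hunk above can be approximated by the following standalone sketch. It is
not the librepo implementation: the Mirror struct and order_mirrors()
are hypothetical names invented for illustration, plain arrays replace
GSList, and rand() replaces g_random_int_range(), so it carries a slight
modulo bias that the GLib call does not. It assumes, as the hunk does,
that the input is already sorted by ascending connect time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for LrFastestMirror; only the fields used here. */
typedef struct {
    const char *url;
    double connect_time; /* seconds */
} Mirror;

/*
 * Fill `out` (capacity >= n) with the URLs of `sorted`, which must
 * already be ordered by ascending connect_time.
 *
 * Mirrors with connect_time < 2x the best latency are inserted at a
 * random index among the entries placed so far. Once the first slower
 * mirror is reached, it and every later mirror are appended, so the
 * tail keeps strict latency order.
 *
 * Returns how many mirrors fell into the shuffled "nearby" group.
 * The caller is expected to seed rand() beforehand.
 */
static size_t order_mirrors(const Mirror *sorted, size_t n, const char **out)
{
    if (n == 0)
        return 0; /* nothing to order; avoids reading sorted[0] */

    assert(sorted != NULL && out != NULL);

    double best = sorted[0].connect_time;
    size_t len = 0;    /* entries placed in `out` so far */
    size_t nearby = 0; /* entries that went through the shuffle branch */

    for (size_t i = 0; i < n; i++) {
        if (sorted[i].connect_time < 2.0 * best) {
            /* Pick a slot in [0, len] and shift the tail right by one. */
            size_t pos = (size_t)rand() % (len + 1);
            memmove(&out[pos + 1], &out[pos], (len - pos) * sizeof(*out));
            out[pos] = sorted[i].url;
            nearby++;
        } else {
            /* Far-away mirror: keep as a backup at the end. */
            out[len] = sorted[i].url;
        }
        len++;
    }
    return nearby;
}
```

Because the input is sorted, every nearby mirror is placed before the
first far one, so the random inserts only ever permute the nearby group.
One edge case the sketch shares with the strict `<` comparison in the
hunk: if the best connect time is exactly 0, no mirror counts as nearby
and the list stays in plain latency order.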