GRPC Connection Backoff Protocol
================================

When a connection attempt to a backend fails, it is typically desirable not to
retry immediately (to avoid flooding the network or the server with requests)
and instead to use some form of exponential backoff.

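The algorithm has several parameters:
 1. INITIAL_BACKOFF (how long to wait after the first failure before retrying)
 2. MULTIPLIER (factor by which to multiply the backoff after a failed retry)
 3. MAX_BACKOFF (upper bound on the backoff)
 4. MIN_CONNECT_TIMEOUT (minimum time to allow a connection attempt to complete)
 5. JITTER (fraction of the current backoff by which each delay is randomized)
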
## Proposed Backoff Algorithm

Exponentially back off the start time of connection attempts up to a limit of
MAX_BACKOFF, with jitter.

```
ConnectWithBackoff()
  current_backoff = INITIAL_BACKOFF
  current_deadline = now() + INITIAL_BACKOFF
  // Each attempt is given at least MIN_CONNECT_TIMEOUT to complete.
  while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT))
         != SUCCESS)
    SleepUntil(current_deadline)
    current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
    // Randomize the next deadline by up to +/- JITTER * current_backoff.
    current_deadline = now() + current_backoff +
      UniformRandom(-JITTER * current_backoff, JITTER * current_backoff)
```

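The pseudocode above can be sketched in Python as follows. This is a minimal illustration and not a reference implementation: `connect_with_backoff`, the `try_connect` callback (which takes an absolute deadline and returns `True` on success), and the injectable `now`, `sleep`, and `rng` arguments are all hypothetical names introduced here so the loop can be exercised without real network I/O.

```python
import random
import time
from typing import Callable, Optional

INITIAL_BACKOFF = 1.0        # seconds
MULTIPLIER = 1.6
MAX_BACKOFF = 120.0          # seconds
JITTER = 0.2
MIN_CONNECT_TIMEOUT = 20.0   # seconds


def connect_with_backoff(
    try_connect: Callable[[float], bool],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> int:
    """Call try_connect until it succeeds; return the number of attempts.

    try_connect (hypothetical) receives an absolute deadline measured on
    the same clock as now() and returns True on success.
    """
    rng = rng or random.Random()
    attempts = 0
    current_backoff = INITIAL_BACKOFF
    current_deadline = now() + INITIAL_BACKOFF
    while True:
        attempts += 1
        # Every attempt gets at least MIN_CONNECT_TIMEOUT to complete.
        if try_connect(max(current_deadline, now() + MIN_CONNECT_TIMEOUT)):
            return attempts
        # SleepUntil(current_deadline): no-op if the deadline has passed.
        sleep(max(0.0, current_deadline - now()))
        current_backoff = min(current_backoff * MULTIPLIER, MAX_BACKOFF)
        current_deadline = now() + current_backoff + rng.uniform(
            -JITTER * current_backoff, JITTER * current_backoff
        )
```

Passing the clock and sleep function as arguments is a design choice of this sketch only; it lets a test substitute a fake clock instead of waiting in real time.
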
The proposed algorithm uses the following specific parameter values:
MIN_CONNECT_TIMEOUT = 20 seconds
INITIAL_BACKOFF = 1 second
MULTIPLIER = 1.6
MAX_BACKOFF = 120 seconds
JITTER = 0.2

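To see what these values imply, the short sketch below computes the nominal (un-jittered) backoff sequence. The helper name `nominal_backoffs` and its `count` argument are assumptions made for illustration; only the parameter values come from the text.

```python
from typing import List


def nominal_backoffs(
    initial: float = 1.0,
    multiplier: float = 1.6,
    max_backoff: float = 120.0,
    count: int = 13,
) -> List[float]:
    """Return the first `count` backoff values, in seconds, before jitter."""
    backoffs = []
    current = initial
    for _ in range(count):
        backoffs.append(current)
        # Grow geometrically, but never beyond the cap.
        current = min(current * multiplier, max_backoff)
    return backoffs


schedule = nominal_backoffs()
```

With the values above, the nominal backoff starts at 1 second, grows to 1.6, 2.56, and 4.096 seconds, and reaches the 120-second cap at the twelfth value. With JITTER = 0.2, each jittered delay falls within 20% of its nominal value.
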
Implementations with pressing concerns (such as minimizing the number of wakeups
on a mobile phone) may wish to use a different algorithm, and in particular
different jitter logic.

Alternate implementations must ensure that connection backoffs started at the
same time disperse, and must not attempt connections substantially more often
than the above algorithm.
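
One way to check the dispersion requirement is to simulate several clients that start at the same instant and look at how far apart their attempts drift. The sketch below does this for the proposed algorithm, under the simplifying assumption that every attempt fails immediately. The helper names `retry_start_times` and `spread_per_attempt` are hypothetical, and the sketch is not a conformance test.

```python
import random
from typing import List


def retry_start_times(
    rng: random.Random,
    attempts: int,
    initial: float = 1.0,
    multiplier: float = 1.6,
    max_backoff: float = 120.0,
    jitter: float = 0.2,
) -> List[float]:
    """Start times (seconds from t=0) of successive connection attempts,
    assuming each attempt fails instantly."""
    times = [0.0]
    t = 0.0
    backoff = initial
    delay = initial  # the first delay is INITIAL_BACKOFF, without jitter
    for _ in range(attempts - 1):
        t += delay
        times.append(t)
        backoff = min(backoff * multiplier, max_backoff)
        delay = backoff + rng.uniform(-jitter * backoff, jitter * backoff)
    return times


def spread_per_attempt(clients: List[List[float]]) -> List[float]:
    """Difference between the latest and earliest client for each attempt."""
    return [max(column) - min(column) for column in zip(*clients)]


# Five simulated clients, each with its own random stream.
clients = [retry_start_times(random.Random(seed), attempts=6) for seed in range(5)]
spreads = spread_per_attempt(clients)
```

In this simulation the first two attempts coincide across clients, because the first delay is exactly INITIAL_BACKOFF; the spread becomes nonzero from the third attempt onward. An alternate algorithm could be run through the same kind of check, and its attempt counts over a fixed window compared against this one.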