GRPC Connection Backoff Protocol
================================

When a connection attempt to a backend fails, it is typically desirable not to
retry immediately (to avoid flooding the network or the server with requests)
and instead to apply some form of exponential backoff.

We have several parameters:
 1. INITIAL_BACKOFF (how long to wait after the first failure before retrying)
 2. MULTIPLIER (factor by which to multiply the backoff after a failed retry)
 3. MAX_BACKOFF (upper bound on the backoff)
 4. MIN_CONNECT_TIMEOUT (minimum time to allow a connection attempt to complete)
 5. JITTER (fraction of the current backoff by which each delay is randomized)

## Proposed Backoff Algorithm

Exponentially back off the start time of connection attempts up to a limit of
MAX_BACKOFF, with jitter.

```
ConnectWithBackoff()
  current_backoff = INITIAL_BACKOFF
  current_deadline = now() + INITIAL_BACKOFF
  while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT))
         != SUCCESS)
    SleepUntil(current_deadline)
    current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
    current_deadline = now() + current_backoff +
      UniformRandom(-JITTER * current_backoff, JITTER * current_backoff)
```

The proposed parameter values are:

* INITIAL_BACKOFF = 20 seconds
* MULTIPLIER = 1.6
* MAX_BACKOFF = 120 seconds
* JITTER = 0.2

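The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the gRPC implementation: `try_connect` is a hypothetical callable that makes one attempt and gives up at an absolute deadline, the `now`, `sleep`, and `uniform` hooks exist only to make the loop testable, `max_attempts` is an added convenience, and the value of `MIN_CONNECT_TIMEOUT` is an assumption because this document does not specify one.

```python
import random
import time

INITIAL_BACKOFF = 20.0      # seconds
MULTIPLIER = 1.6
MAX_BACKOFF = 120.0         # seconds
JITTER = 0.2
MIN_CONNECT_TIMEOUT = 20.0  # seconds; assumed value, not given in this document


def connect_with_backoff(try_connect, now=time.monotonic, sleep=time.sleep,
                         uniform=random.uniform, max_attempts=None):
    """Call try_connect(deadline) until it returns True.

    try_connect is a hypothetical callable that attempts one connection and
    gives up at the absolute time `deadline`. Returns the number of attempts
    made, or None if max_attempts was exhausted.
    """
    current_backoff = INITIAL_BACKOFF
    current_deadline = now() + INITIAL_BACKOFF
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        # Give every attempt at least MIN_CONNECT_TIMEOUT to complete.
        if try_connect(max(current_deadline, now() + MIN_CONNECT_TIMEOUT)):
            return attempts
        # SleepUntil(current_deadline): wait out the rest of the backoff.
        remaining = current_deadline - now()
        if remaining > 0:
            sleep(remaining)
        current_backoff = min(current_backoff * MULTIPLIER, MAX_BACKOFF)
        current_deadline = (now() + current_backoff +
                            uniform(-JITTER * current_backoff,
                                    JITTER * current_backoff))
    return None
```

With jitter ignored, the nominal delays between attempt start times under these parameters are 20, 32, 51.2, 81.92, and then 120 seconds, since the next step (131.072) exceeds MAX_BACKOFF.
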
Implementations with pressing concerns (such as minimizing the number of wakeups
on a mobile phone) may wish to use a different algorithm, and in particular
different jitter logic.

Alternate implementations must ensure that connection backoffs started at the
same time disperse, and must not attempt connections substantially more often
than the above algorithm.

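The dispersal requirement can be illustrated with a small simulation. The sketch below assumes the uniform jitter formula from the proposed algorithm; the function names, the seed, and the client count are illustrative and not part of the protocol. It draws the next retry delay for many clients that failed at the same instant with the same current backoff and reports how far apart those delays land.

```python
import random

JITTER = 0.2


def jittered_delay(backoff, rng):
    """Return backoff randomized uniformly within +/- JITTER * backoff."""
    return backoff + rng.uniform(-JITTER * backoff, JITTER * backoff)


def retry_spread(backoff, clients, seed=0):
    """Return (earliest, latest) retry delay across simulated clients that
    all failed at the same instant with the same current backoff."""
    rng = random.Random(seed)
    delays = [jittered_delay(backoff, rng) for _ in range(clients)]
    return min(delays), max(delays)


earliest, latest = retry_spread(backoff=32.0, clients=1000)
# With a 32 second backoff, every delay lies between 25.6 and 38.4 seconds,
# so retries that started together are spread over a window of up to 12.8 s.
```

An alternate jitter scheme could be checked the same way by swapping out `jittered_delay` and confirming that the spread stays nonzero.
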
## Historical Algorithm in Stubby

Exponentially increase the intervals between connection attempts, up to a limit
of MAX_BACKOFF. This is what Stubby 2 uses, and it is equivalent to the proposed
algorithm (jitter aside) if TryConnect() fails instantly.

```
LegacyConnectWithBackoff()
  current_backoff = INITIAL_BACKOFF
  while (TryConnect(MIN_CONNECT_TIMEOUT) != SUCCESS)
    SleepFor(current_backoff)
    current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
```

The gRPC C implementation currently uses this approach with an initial backoff
of 1 second, a multiplier of 2, and a maximum backoff of 120 seconds. (This will
change.)

Stubby, or at least rpc2, uses exactly this algorithm with an initial backoff
of 1 second, a multiplier of 1.2, and a maximum backoff of 120 seconds.

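The difference between the two algorithms shows up once a failed attempt takes time. The following sketch simulates attempt start times on a virtual clock. It is a simplification under stated assumptions: jitter is omitted, every attempt fails after a fixed `connect_duration` that is shorter than its timeout, and the function names and default parameter values are chosen here for illustration.

```python
def proposed_start_times(attempts, connect_duration, initial=20.0,
                         multiplier=1.6, max_backoff=120.0):
    """Attempt start times when the backoff spaces the *starts* of attempts."""
    now = 0.0
    backoff = initial
    deadline = now + initial
    starts = []
    for _ in range(attempts):
        starts.append(now)
        now += connect_duration   # the attempt fails after connect_duration
        now = max(now, deadline)  # SleepUntil(current_deadline)
        backoff = min(backoff * multiplier, max_backoff)
        deadline = now + backoff  # jitter omitted for clarity
    return starts


def legacy_start_times(attempts, connect_duration, initial=20.0,
                       multiplier=1.6, max_backoff=120.0):
    """Attempt start times when the backoff is slept *after* each failure."""
    now = 0.0
    backoff = initial
    starts = []
    for _ in range(attempts):
        starts.append(now)
        now += connect_duration   # the attempt fails after connect_duration
        now += backoff            # SleepFor(current_backoff)
        backoff = min(backoff * multiplier, max_backoff)
    return starts
```

When `connect_duration` is 0 both functions return the same schedule. With a 5 second failure time, the proposed schedule still starts attempts at 0, 20, 52, and 103.2 seconds, while the legacy schedule drifts to 0, 25, 62, and 118.2 seconds because each failed attempt's duration is added on top of the backoff.
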
## Use Cases to Consider

* Client tries to connect to a server which is down for multiple hours, e.g. for
  maintenance
* Client tries to connect to a server which is overloaded
* User is bringing up both a client and a server at the same time
    * In particular, we would like to avoid a large unnecessary delay if the
      client connects to a server which is about to come up
* Client/server are misconfigured such that connection attempts always fail
    * We want to make sure these don't put too much load on the server by
      default.
* Server is overloaded and wants to transiently make clients back off
* Application has an out-of-band reason to believe a server is back
    * We should consider an out-of-band mechanism for the client to hint that
      we should short-circuit the backoff.