Diff - f9bae84e5faa86ce0cb6ffdfbeb04aec0c09631e^! - platform/system/netd

commit	f9bae84e5faa86ce0cb6ffdfbeb04aec0c09631e	[log] [tgz]
author	Lorenzo Colitti <lorenzo@google.com>	Mon Aug 28 11:59:51 2017 +0900
committer	Lorenzo Colitti <lorenzo@google.com>	Mon Aug 28 14:12:44 2017 +0900
tree	87796365558ef2a84c12eb9deb5d6e0c733b6029
parent	43f1c3c5e9d5b76e4c776bce3e8a80d8f1bd370d [diff] [blame]

Don't start iptables and ip6tables on parallel threads.

This still ensures that the child processes initialize in
parallel (which is the slow part of starting them, since
forkAndExec itself is sub-millisecond), but it does not fork them
in parallel.

Forking them in parallel is incorrect because forkAndExec first
creates pipes, then forks, then closes the pipes. If this is done
in parallel on two threads, the following can happen:

1. Create pipes to subprocess A.
2. Create pipes to subprocess B.
3. Fork process A.
4. Fork process B.
5. Close child end of pipes to subprocess A.
6. Close child end of pipes to subprocess B.

If this happens, then the subprocess B will inherit all the fds
of the pipes to subprocess A in addition to its own. This
includes the read end of the pipe that netd uses to send commands
to subprocess A, and the write ends of the pipes that netd uses
to receive subprocess A's stdout and stderr. Subprocess B never
reads or writes to these fds because it never dup2()s them to
stdin/stdout/stderr before calling exec(). However, it also never
closes them. Therefore, if process A dies (e.g., due to an
invalid command), then netd will never receive a POLLHUP on the
read ends or a POLLERR on the write end of these pipes.

This means that:

1. The invalid command to subprocess A will time out (imposing a
   5-second delay) instead of failing fast.
2. Even though the timeout causes us to call stop() on subprocess
   A, and stop() causes processTerminated to be set to true,
   sendCommand does not check processTerminated, it only checks
   outputReady. Therefore, when we get told to send another
   command to subprocess A, if subprocess B is still around,
   outputReady will return true. The result is that we do not
   fork a new subprocess to replace subprocess A.

These effects caused flakiness of TestRestartOnMalformedCommand
and a latency hit to the first invalid command sent after boot.

Bug: 28362720
Bug: 64687442
Test: bullhead builds, boots
Test: netd_{unit,integration}_test pass
Test: TestRestartOnMalformedCommand passes 100 times in a row
Change-Id: I117359b9ca9c82dd676608100fed6957c9a93c92

diff --git a/server/IptablesRestoreController.cpp b/server/IptablesRestoreController.cpp
index a90224a..37b94bf 100644
--- a/server/IptablesRestoreController.cpp
+++ b/server/IptablesRestoreController.cpp

@@ -124,11 +124,12 @@
 }
 
 void IptablesRestoreController::Init() {
-    // Start the IPv4 and IPv6 processes in parallel, since each one takes 20-30ms.
-    std::thread v4([this] () { mIpRestore.reset(forkAndExec(IPTABLES_PROCESS)); });
-    std::thread v6([this] () { mIp6Restore.reset(forkAndExec(IP6TABLES_PROCESS)); });
-    v4.join();
-    v6.join();
+    // We cannot fork these in parallel or a child process could inherit the pipe fds intended for
+    // use by the other child process. see https://android-review.googlesource.com/469559 for what
+    // breaks. This does not cause a latency hit, because the parent only has to wait for
+    // forkAndExec, which is sub-millisecond, and the child processes then call exec() in parallel.
+    mIpRestore.reset(forkAndExec(IPTABLES_PROCESS));
+    mIp6Restore.reset(forkAndExec(IP6TABLES_PROCESS));
 }
 
 /* static */