main: remove EPOLLHUP epoll item from host kernel synchronously

control_sockets.swap_remove() could cause host kernel to invoke
ep_remove() to remove the epoll item.

But it's called from the task work, and it could be deferred after
next poll_ctx.wait() which could unexpectedly pick up epoll events
from the already closed fd.

BUG=chromium:1019986
TEST=launch Crosvm guest from heavy loaded Linux host

Change-Id: I474a7a47a484e3acfae4383d61601e1553bd674f
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1917495
Reviewed-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
diff --git a/src/linux.rs b/src/linux.rs
index 24ef5ec..1177377 100644
--- a/src/linux.rs
+++ b/src/linux.rs
@@ -1845,10 +1845,26 @@
         }
 
         // Sort in reverse so the highest indexes are removed first. This removal algorithm
-        // preserved correct indexes as each element is removed.
+        // preserves correct indexes as each element is removed.
         vm_control_indices_to_remove.sort_unstable_by(|a, b| b.cmp(a));
         vm_control_indices_to_remove.dedup();
         for index in vm_control_indices_to_remove {
+            // Delete the socket from the `poll_ctx` synchronously. Otherwise, the kernel will do
+            // this automatically when the FD inserted into the `poll_ctx` is closed after this
+            // if-block, but this removal can be deferred unpredictably. In some instances where the
+            // system is under heavy load, we can even get events returned by `poll_ctx` for an FD
+            // that has already been closed. Because the token associated with that spurious event
+            // now belongs to a different socket, the control loop will start to interact with
+            // sockets that might not be ready to use. This can cause incorrect hangup detection or
+            // blocking on a socket that will never be ready. See also: crbug.com/1019986
+            if let Some(socket) = control_sockets.get(index) {
+                poll_ctx.delete(socket).map_err(Error::PollContextDelete)?;
+            }
+
+            // This line implicitly drops the socket at `index` when it gets returned by
+            // `swap_remove`. After this line, the socket at `index` is not the one from
+            // `vm_control_indices_to_remove`. Because of this socket's change in index, we need to
+            // use `poll_ctx.modify` to change the associated index in its `Token::VmControl`.
             control_sockets.swap_remove(index);
             if let Some(socket) = control_sockets.get(index) {
                 poll_ctx