MainLoop: improve error handling
Improve error handling when reading from the log socket:
- On transient errors: sleep, to allow the transient condition
to pass (and to avoid a tight CPU loop, if the condition is
not so transient).
- On all other errors: abort the daemon. In the case of a
non-transient error, we have two choices: continue running,
or abort (and hope the new process comes up in a good state).
We choose to abort, for two reasons:
- we assume that newer logs are more important than older ones
- even if we kept running, we might not be able to dump the logs
that we'd captured so far, short of a core dump. (The request
to dump logs would, itself, need to be read from the log
socket.)
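The policy above can be sketched in isolation as follows. This is a minimal illustration, not the code in this change: ClassifyReceiveError, HandleReceiveError, and ErrorAction are hypothetical names, and the sketch calls POSIX nanosleep() and std::abort() directly where the daemon goes through its Os wrapper and LOG(FATAL). The transient set (EINTR, ENOMEM) and the 100 usec sleep are taken from the diff below.

```cpp
#include <time.h>  // POSIX nanosleep()

#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical names, used only for this sketch.
enum class ErrorAction { kSleepAndRetry, kAbort };

constexpr long kTransientErrorSleepTimeNsec = 100 * 1000;  // 100 usec

// EINTR and ENOMEM are treated as transient; everything else is fatal.
constexpr ErrorAction ClassifyReceiveError(int err) {
  return (err == EINTR || err == ENOMEM) ? ErrorAction::kSleepAndRetry
                                         : ErrorAction::kAbort;
}

void HandleReceiveError(int err) {
  if (ClassifyReceiveError(err) == ErrorAction::kSleepAndRetry) {
    // Sleep briefly so that a persistent "transient" error does not
    // turn the read loop into a tight CPU loop.
    timespec ts{};
    ts.tv_sec = 0;
    ts.tv_nsec = kTransientErrorSleepTimeNsec;
    nanosleep(&ts, nullptr);
    return;
  }

  // Assumed non-recoverable: abort, and rely on the restarted process
  // to come up in a good state.
  std::fprintf(stderr, "Unexpected error: %s\n", std::strerror(err));
  std::abort();
}
```

Keeping the classification separate from the side effects (sleep, abort) lets the policy be checked without killing the test process.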
Bug: 32481888
Test: ./runtests.sh (on angler)
Change-Id: I68619d192f7682436c9f78b062d1b6070cdabcdf
diff --git a/main_loop.cpp b/main_loop.cpp
index 62bed31..8ad030a 100644
--- a/main_loop.cpp
+++ b/main_loop.cpp
@@ -30,6 +30,8 @@
namespace {
constexpr auto kMainBufferSizeBytes = 128 * 1024;
+// TODO(b/32840641): Tune the sleep time.
+constexpr auto kTransientErrorSleepTimeNsec = 100 * 1000; // 100 usec
}
MainLoop::MainLoop(const std::string& socket_name)
@@ -53,8 +55,7 @@
std::tie(datagram_len, err) =
os_->ReceiveDatagram(sock_fd_, input_buf.data(), input_buf.size());
if (err) {
- // TODO(b/32098735): Increment stats counter.
- // TODO(b/32481888): Improve error handling.
+ ProcessError(err);
return;
}
@@ -67,5 +68,20 @@
Os::kInvalidFd);
}
+// Private methods below.
+
+void MainLoop::ProcessError(Os::Errno err) {
+ if (err == EINTR || err == ENOMEM) {
+ // TODO(b/32098735): Increment stats counter.
+ os_->Nanosleep(kTransientErrorSleepTimeNsec);
+ return;
+ }
+
+ // Any other error is unexpected, and assumed to be non-recoverable.
+ // (If, e.g., our socket is in a bad state, then we won't be able to receive
+ // any new log messages.)
+ LOG(FATAL) << "Unexpected error: " << std::strerror(err);
+}
+
} // namespace wifilogd
} // namespace android