commit | 22e244a36ad19ef21dd0f6e59caa6df90f29ac47 | |
---|---|---|
author | Julius Werner <jwerner@chromium.org> | Tue Nov 13 18:25:19 2012 -0800 |
committer | Gerrit <chrome-bot@google.com> | Wed Nov 14 17:16:47 2012 -0800 |
tree | db572fa00d5acbaeef02be4f90ba2dd89093e991 | |
parent | fe34be0a35761150a1d085a55b8c925218c0e8b8 | |
shill: Prevent silent discarding of GLib IO channel errors

Shill uses GLib's IO channels to poll some of its file descriptors, such as the RTNL socket. On detecting an error condition on the socket, the callbacks just return FALSE (causing GLib to unregister the descriptor) without any further action or output. The process keeps running without receiving new netlink messages, and the upper layers are never informed of the condition.

For the descriptors currently managed through GLib, error conditions on poll() or read() usually indicate a system-wide failure or bug that should not happen under normal conditions. One such current issue is in the kernel's NETLINK_ROUTE subsystem, which can occasionally produce errors in a rare and hard-to-reproduce manner. The condition does not seem to recur immediately when a new RTNL socket is opened.

This patch changes the affected GLib callback handlers to always log an error message and die on these kinds of error conditions. This seems to be the most practical answer to these unexpected and rare occurrences: it forces a full restart of shill through upstart, which is probably our best bet for recovering from them. The callback handlers no longer return FALSE under any circumstances.

BUG=chromium-os:36328
TEST=Hack the code to purposefully trigger one of the LOG(FATAL) code paths after some time. Watch shill die (with a slightly misformatted backtrace) and get restarted immediately.

Change-Id: I74e47ab41a029d0b4fc509c525cb5cb86a871a2b
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://gerrit.chromium.org/gerrit/37980
Reviewed-by: mukesh agrawal <quiche@chromium.org>
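For illustration only, here is a minimal sketch of the callback policy described above. This is not shill's actual code and does not call GLib. It assumes a hypothetical handler `OnIOReady` and helper `ClassifyCondition` (both names invented here) that receive poll(2)-style event bits. GLib's `G_IO_ERR`, `G_IO_HUP` and `G_IO_NVAL` conditions correspond to `POLLERR`, `POLLHUP` and `POLLNVAL`. Treating all three as fatal is an assumption, and `std::abort()` stands in for shill's `LOG(FATAL)`. In GLib, a `GIOFunc` that returns FALSE has its watch removed. The sketch never returns false and terminates on error instead.

```cpp
#include <poll.h>

#include <cstdio>
#include <cstdlib>

// Hypothetical sketch, not shill's actual code. POLLERR/POLLHUP/POLLNVAL are
// the poll(2) flags that GLib's G_IO_ERR/G_IO_HUP/G_IO_NVAL correspond to.
enum class WatchAction { kKeepWatching, kFatal };

// Assumption: any error-type bit in the returned events is treated as fatal.
constexpr WatchAction ClassifyCondition(int revents) {
  return (revents & (POLLERR | POLLHUP | POLLNVAL)) != 0
             ? WatchAction::kFatal
             : WatchAction::kKeepWatching;
}

// Stand-in for an IO watch callback. Returning false here would mirror a GLib
// GIOFunc returning FALSE, which silently removes the watch. Instead, error
// conditions terminate the process so a supervisor (upstart) can restart it.
bool OnIOReady(int fd, int revents) {
  if (ClassifyCondition(revents) == WatchAction::kFatal) {
    std::fprintf(stderr, "Unexpected IO condition 0x%x on fd %d\n",
                 static_cast<unsigned>(revents), fd);
    std::abort();  // Plays the role of LOG(FATAL).
  }
  // Reading and dispatching the available data would happen here.
  return true;  // Never false: the watch stays registered.
}
```

A GLib-based version would register such a handler with `g_io_add_watch()`. The design choice mirrors the commit: failing loudly trades a brief restart for never running in a silently deaf state.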