26c7e2841bd473611c324130ab329c37d13775a1 - platform/external/autotest

commit	26c7e2841bd473611c324130ab329c37d13775a1
author	Simran Basi <sbasi@chromium.org>	Wed Aug 21 17:42:00 2013 -0700
committer	ChromeBot <chrome-bot@google.com>	Thu Aug 22 16:03:08 2013 -0700
tree	0d15d9b01cb347ab10f257ea0600be1bfb5717d9
parent	a39f97038dfa865c7bfcea2625d516d098207066

RPM Servers: Fix stuck threads issue.

This CL fixes the issue of the RPM servers running out of file descriptors
due to stuck sockets.

Debugged it as the following problem:
* RPM dispatcher kicks off a thread for each rpm_controller.
* The thread for an rpm_controller calls set_power, something bad happens,
  and an exception is thrown. The thread is stuck.
* RPM dispatcher gets a new request for that RPM and sends it to the stuck
  thread.
* Client times out, job logs an error and continues.
* The drone -> frontend connection goes into CLOSE_WAIT and is stuck. The
  client closed it, but the frontend thread is still waiting to hear from
  the dispatcher. This uses up a file descriptor.
* The frontend -> dispatcher connection is stuck as ESTABLISHED. Another
  file descriptor is used up.
* Over time (roughly a few weeks to a month) we run out of file descriptors
  and can no longer open new sockets, causing all calls to the
  infrastructure to fail until it is manually restarted.

A side effect of this is that all calls to this RPM via our infrastructure
will fail. Sadly, this occurs silently and can only be seen in the logs of
the autoserv jobs that timed out when calling the RPM infrastructure.

In order to address this, I added a catch of all exceptions that occur when
trying to change power state. The exception will be caught and emailed out
to the team.

Also updated the error emails to go to chromeos-lab-errors@google.com

BUG=chromium:243567
TEST=Put an explicit raise of an exception in set_power_state, which
recreated the conditions we see on the live server. Then applied my fix and
verified we don't get stuck or use up file descriptors as before.

Change-Id: I69bf68564fcfbda6c387faa74202c7c4b9bbcdef
Reviewed-on: https://gerrit.chromium.org/gerrit/66608
Reviewed-by: Scott Zawalski <scottz@chromium.org>
Commit-Queue: Simran Basi <sbasi@chromium.org>
Reviewed-by: Simran Basi <sbasi@chromium.org>
Tested-by: Simran Basi <sbasi@chromium.org>
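
The catch-all described in the message can be illustrated with a minimal sketch. This is not the autotest code from this change: it is written for Python 3, and the request tuple layout, the `worker` and `run_demo` functions, the `report_error` callback (standing in for the error email), and the body of `set_power_state` are all assumptions made for illustration. It only shows the general idea that a per-controller thread which catches every exception keeps serving requests and always answers its caller.

```python
import queue
import threading
import traceback


def set_power_state(hostname, new_state):
    """Hypothetical stand-in for the real power change; fails for one host."""
    if hostname == 'bad-host':
        raise RuntimeError('simulated RPM failure')
    return True


def worker(requests, report_error):
    """Serve power requests until a None sentinel arrives."""
    while True:
        request = requests.get()
        if request is None:
            break
        hostname, new_state, result_queue = request
        try:
            result = set_power_state(hostname, new_state)
        except Exception:
            # Catch everything so the thread survives, report the failure
            # (an email in the commit message; a callback here), and mark
            # the request as failed.
            report_error(traceback.format_exc())
            result = False
        # Always answer, so the caller never waits on a dead thread.
        result_queue.put(result)


def run_demo():
    """Send one failing and one working request through the same thread."""
    errors = []
    requests = queue.Queue()
    thread = threading.Thread(target=worker, args=(requests, errors.append))
    thread.start()
    results = []
    for hostname in ('bad-host', 'good-host'):
        result_queue = queue.Queue()
        requests.put((hostname, 'ON', result_queue))
        results.append(result_queue.get(timeout=5))
    requests.put(None)  # Sentinel: ask the worker to exit.
    thread.join(timeout=5)
    return results, errors, thread.is_alive()
```

In this sketch the request that follows the failure is still served by the same thread, which is the behavior the TEST line describes verifying. Without the `except` block the thread would die on the first failure and the second `result_queue.get` would time out.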