Abort child process groups with SIGKILL.

Currently we unnecessarily sleep for 12 seconds every time we kill a process.
Here's how it works:
  - Send SIGCONT to all processes.
  - If any are still running, wait 6 seconds.
  - Send SIGTERM to all processes.
  - If any are still running, wait 6 seconds.
  - Send SIGKILL to all processes.

There are several problems with the above algorithm:
  - SIGCONT doesn't cause processes to exit, so waiting 6 seconds after
    that is pointless.
  - After sending SIGTERM, we immediately check whether any of the
    processes are still present. This doesn't give children enough
    time to actually clean up.
  - We sleep for 6 seconds unconditionally without considering the fact
    that the processes might exit early.

Instead of doing this, I've updated it to send SIGKILL to the entire
process group, with no sleep statements.
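
For reference, a minimal standalone sketch of the mechanism this relies
on: passing a negative pid to kill(2) signals every process in that
process group. This is an illustration only, not the scheduler code.
It assumes Python 3 and a POSIX system, and kill_process_group is a
hypothetical helper name, not something in site_utils.py.

```python
import os
import signal
import subprocess


def kill_process_group(pgid):
    """Send SIGKILL to every process in group `pgid` (hypothetical helper).

    Returns True if the signal was sent, False if no such group exists.
    """
    try:
        # A negative pid tells kill(2) to signal the whole process group.
        os.kill(-pgid, signal.SIGKILL)
    except ProcessLookupError:
        return False
    return True


# Start a shell in its own session, so its pid is also its process group
# id. The shell backgrounds a long-running grandchild in the same group.
child = subprocess.Popen(['sh', '-c', 'sleep 60 & wait'],
                         start_new_session=True)
delivered = kill_process_group(child.pid)
child.wait()  # Reap the shell; no sleep or polling loop is needed.
```

Since SIGKILL cannot be caught or ignored, there is nothing to wait
for between signals, which is why the sleeps can go away.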

BUG=chromium:432191
DEPLOY=scheduler
TEST=Set up local scheduler, verified timeouts still work.

Change-Id: Ie41d10d0605851df61dd07789d0d4afffd9eef01
Reviewed-on: https://chromium-review.googlesource.com/230323
Reviewed-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Prashanth B <beeps@chromium.org>
Commit-Queue: David James <davidjames@chromium.org>
Tested-by: David James <davidjames@chromium.org>
diff --git a/client/common_lib/site_utils.py b/client/common_lib/site_utils.py
index fe612b2..352ee43 100644
--- a/client/common_lib/site_utils.py
+++ b/client/common_lib/site_utils.py
@@ -290,13 +290,13 @@
                 # The process may have died from a previous signal before we
                 # could kill it.
                 pass
+        if sig == signal.SIGKILL:
+            return sig_count
         pid_list = [pid for pid in pid_list if base_utils.pid_is_alive(pid)]
         if not pid_list:
             break
         time.sleep(CHECK_PID_IS_ALIVE_TIMEOUT)
     failed_list = []
-    if signal.SIGKILL in signal_queue:
-        return sig_count
     for pid in pid_list:
         if base_utils.pid_is_alive(pid):
             failed_list.append('Could not kill %d for process name: %s.' % pid,
diff --git a/scheduler/drone_utility.py b/scheduler/drone_utility.py
index 489c358..dfb4905 100755
--- a/scheduler/drone_utility.py
+++ b/scheduler/drone_utility.py
@@ -226,12 +226,11 @@
         kill_proc_key = 'kill_processes'
         stats.Gauge(_STATS_KEY).send('%s.%s' % (kill_proc_key, 'net'),
                                      len(process_list))
-        signal_queue = (signal.SIGCONT, signal.SIGTERM, signal.SIGKILL)
         try:
             logging.info('List of process to be killed: %s', process_list)
             sig_counts = utils.nuke_pids(
-                            [process.pid for process in process_list],
-                            signal_queue=signal_queue)
+                            [-process.pid for process in process_list],
+                            signal_queue=(signal.SIGKILL,))
             for name, count in sig_counts.iteritems():
                 stats.Gauge(_STATS_KEY).send('%s.%s' % (kill_proc_key, name),
                                              count)