#!/bin/bash

# Force a repair special task for any host that hasn't seen activity in
# the past day.
#
# Various scripts/cron jobs look for DUTs that aren't working. To be
# conservative, those scripts assume that a DUT that hasn't run any jobs
# within a reasonable time interval isn't working, since some of the
# ways a DUT may be unavailable manifest as inactivity.
#
# In some cases, we'd like to be more certain as to a DUT's status.
# This script scans the unlocked hosts in the AFE hosts table that
# belong to managed pools, identifies the ones that would otherwise be
# flagged as "not working due to lack of activity", and forces a repair
# task on each of them.
#
# We use a repair task (as opposed to verify) for several reasons:
#   + If a DUT is working, repair and verify perform the same checks,
#     and generally take about the same time.
#   + If a DUT is broken, a verify task will fail and invoke repair,
#     which takes longer than running repair alone.
#   + A repair task that passes updates the DUT's labels; without
#     this, labels could become out-of-date simply because a DUT is idle.
#
# Locked hosts are skipped because they can't run jobs, and because we
# want them to show up as suspicious anyway.


cd "$(dirname "$0")/.." || exit 1

# Gather all the unlocked hosts under the supervision of the lab techs.
# Basically, that's any host in any managed pool.

GET_HOSTS='
/pool:(suites|bvt|cq|continuous|cts|arc-presubmit|crosperf|performance)/ {
    print $1
}
'
HOSTS=( $(cli/atest host list --unlocked | awk "$GET_HOSTS") )
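
# Illustration only, with a hypothetical input line (the real
# `atest host list` column layout may differ): the filter prints the
# first field of any line that mentions one of the managed pools.
#   $ echo 'host1  Ready  False  board:x, pool:bvt' | awk "$GET_HOSTS"
#   host1
# The pattern is an unanchored substring match, so a label such as
# the hypothetical 'pool:bvt2' would match as well.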


# Go through the gathered hosts, and use dut_status to find the
# ones with unknown state (anything without a positive "OK" or
# "NO" diagnosis).

NEED_CHECK='
/OK/ || /NO/ { next }
/^chromeos/ { print $1 }
'
CHECK=( $(site_utils/dut_status.py -d 19 "${HOSTS[@]}" | awk "$NEED_CHECK") )
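
# Illustration only, with hypothetical input lines (not verbatim
# dut_status.py output; '??' stands in for any diagnosis other than
# OK or NO): only the host without a positive diagnosis is printed.
#   $ printf '%s\n' 'chromeos1-row1-rack1-host1 OK' \
#         'chromeos1-row1-rack1-host2 ??' | awk "$NEED_CHECK"
#   chromeos1-row1-rack1-host2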

# Skip the call entirely when no host needs checking.
if [ "${#CHECK[@]}" -gt 0 ]; then
    contrib/repair_hosts "${CHECK[@]}"
fi