#!/bin/bash

# Force a repair special task for any host that hasn't seen activity in
# the past day.
#
# Various scripts/cron jobs look for DUTs that aren't working. To be
# conservative, those scripts assume that a DUT that hasn't run any jobs
# within a reasonable time interval isn't working, since some of the
# ways a DUT may be unavailable manifest as inactivity.
#
# In some cases, we'd like to be more certain as to a DUT's status.
# This script goes through the entire AFE hosts table, and identifies
# unlocked hosts that would otherwise be flagged as "not working due to
# lack of activity", and forces a repair task.
#
# We use a repair task (as opposed to verify) for various reasons:
#   + If a DUT is working, repair and verify perform the same checks,
#     and generally run in the same time.
#   + If a DUT is broken, a verify task will fail and invoke repair,
#     which will take longer than just repair alone.
#   + Repair tasks that pass update labels; without this, labels could
#     become out-of-date simply because a DUT is idle.
#
# Locked hosts are skipped because they can't run jobs and because we
# want them to show up as suspicious anyway.


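# Run from the parent of the directory holding this script; quote the
# path so it survives spaces, and stop if the directory change fails.
cd "$(dirname "$0")/.." || exit 1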

# Gather all the hosts under supervision of the lab techs.
# Basically, that's any host in any managed pool.

GET_HOSTS='
  /pool:(suites|bvt|cq|continuous|cts|arc-presubmit|crosperf|performance)/ {
    print $1
  }
'
HOSTS=( $(cli/atest host list --unlocked | awk "$GET_HOSTS") )

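# Illustrative sketch only, not part of the original script: the pool
# filter can be exercised offline on canned text. The helper name
# filter_managed_pool_hosts is hypothetical, and the sample line layout
# in the example is an assumption; real "atest host list" output may
# differ. The function is defined here but never called by this script.
filter_managed_pool_hosts() {
  # Read host-list lines on stdin and print the first field of every
  # line that mentions one of the managed pools.
  awk '
    /pool:(suites|bvt|cq|continuous|cts|arc-presubmit|crosperf|performance)/ {
      print $1
    }
  '
}
# Example (prints only the first hostname, since pool:faft is not in
# the managed list):
#   printf '%s\n' 'chromeos1-row1-rack1-host1 Ready pool:bvt' \
#                 'chromeos1-row1-rack1-host2 Ready pool:faft' |
#     filter_managed_pool_hosts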

# Go through the gathered hosts, and use dut_status to find the
# ones with unknown state (anything without a positive "OK" or
# "NO" diagnosis).

NEED_CHECK='
  /OK/ || /NO/ { next }
  /^chromeos/ { print $1 }
'
CHECK=( $(site_utils/dut_status.py -d 19 "${HOSTS[@]}" | awk "$NEED_CHECK") )

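# Illustrative sketch only, not part of the original script: the same
# two-rule filter as NEED_CHECK, wrapped so it can be tried on canned
# text. The helper name filter_unknown_status is hypothetical, and the
# sample status lines in the example are assumptions about what
# dut_status.py prints. The function is defined but never called here.
filter_unknown_status() {
  # Rule 1 drops any line containing OK or NO (a positive diagnosis).
  # Rule 2 prints the hostname of each remaining line that starts
  # with "chromeos", which also skips header lines.
  awk '
    /OK/ || /NO/ { next }
    /^chromeos/ { print $1 }
  '
}
# Example (only the third host survives the filter):
#   printf '%s\n' 'hostname S last checked' \
#                 'chromeos1-row1-rack1-host1 OK 2018-06-15' \
#                 'chromeos1-row1-rack1-host2 NO 2018-06-15' \
#                 'chromeos1-row1-rack1-host3 -- --' |
#     filter_unknown_status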

contrib/repair_hosts "${CHECK[@]}"