[autotest] Add a provision special task.
We now insert a special task which calls |autoserv --provision| with the
host that the HQE is about to run on to provision the machine correctly
before the test runs. If provisioning fails, the HQE is also marked as
failed. No provision special task is queued if the host needs no
provisioning before the job runs.
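As a rough illustration of the decision described above (not the code in
this CL), the check can be sketched as a label comparison. The function
names, the task-name strings, and the example version strings are all
hypothetical, and the sketch assumes that Reset is always queued first and
that only cros-version:* labels are provisionable.

```python
# Hypothetical sketch; none of these names are autotest APIs.
PROVISION_PREFIX = 'cros-version:'


def labels_needing_provision(job_labels, host_labels):
    """Return the job's provisionable labels that the host lacks."""
    present = set(host_labels)
    return [label for label in job_labels
            if label.startswith(PROVISION_PREFIX) and label not in present]


def special_tasks_for(job_labels, host_labels):
    """List the pre-job special tasks, in the order they would run."""
    # Assumption: Reset always runs, and Provision runs after it.
    tasks = ['Reset']
    if labels_needing_provision(job_labels, host_labels):
        tasks.append('Provision')
    return tasks


# Non-matching label: a provision task is queued after Reset.
print(special_tasks_for(['cros-version:A'], ['cros-version:B']))
# Matching label: no provision task is queued.
print(special_tasks_for(['cros-version:A'], ['cros-version:A']))
```

The two calls mirror the first two TEST cases below: a host with a
non-matching cros-version:* label gets a provision task, and a host with a
matching label does not.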
With *just* this CL, no provisioning tasks should actually get
scheduled, because the part of the scheduler that maps HQEs to hosts
hasn't been taught about provisioning yet. That will come in a later
CL.
Once this CL goes in, it should not be reverted. The scheduler will
become very unhappy if it sees special tasks in its database, but can't
find a corresponding AgentTask definition for them. One would need to
do manual database cleanup to revert this CL. However, since one can
disable provisioning by reverting the (future) scheduling change CL,
this shouldn't be an issue.
BUG=chromium:249437
DEPLOY=scheduler
TEST=lots:
* Ran a job on a host with a non-matching cros-version:* label, and
a provision special task was correctly created. It ran after Reset,
and correctly kicked off the HQE after it finished.
* Ran a job on a host with a matching cros-version:* label, and no
provision special task was created.
* Ran a job on a host with a non-matching cros-version:* label, and
modified Reset so that it would fail. When Reset failed, it canceled
the provision task, and the HQE was still rescheduled.
* Ran a job on a host with a non-matching cros-version:* label, and
modified the cros-version provisioning test to throw an exception.
The provision special task aborted the HQE with the desired semantics
(see comments in the ProvisionTask class in monitor_db), and scheduled
a repair to run after its failure.
The provision failures were all deduped against each other when bug
filing was enabled. See
https://code.google.com/p/autotest-bug-filing-test/issues/detail?id=1678
* Successfully debugged an autoupdate/devserver issue from provision
logs, thus proving that sufficient information is collected for debugging.
Change-Id: I96dbfc7b001b90e7dc09e1196c0901adf35ba4d8
Reviewed-on: https://gerrit.chromium.org/gerrit/58385
Reviewed-by: Prashanth Balasubramanian <beeps@chromium.org>
Tested-by: Alex Miller <milleral@chromium.org>
Commit-Queue: Prashanth Balasubramanian <beeps@chromium.org>
diff --git a/frontend/afe/rpc_interface.py b/frontend/afe/rpc_interface.py
index e910a61..1789ab2 100644
--- a/frontend/afe/rpc_interface.py
+++ b/frontend/afe/rpc_interface.py
@@ -903,6 +903,7 @@
result['status_dictionary'] = {"Aborted": "Aborted",
"Verifying": "Verifying Host",
+ "Provisioning": "Provisioning Host",
"Pending": "Waiting on other hosts",
"Running": "Running autoserv",
"Completed": "Autoserv completed",