[autotest] Make provision failures bubble up the reason they failed.
This is relying on some very weird semantics to get the SERVER_JOB
ignored, but this is the best that I can figure out how to do without
massive code changes. Proper solutions would be to either:
1) Add some way for a job.run_test() to return the reason
2) Change the scheduler to decide success/failure for a special task
by running the parser on the status.log.
The second would likely be cleaner, but also would be a non-trivial
adjustment to the scheduler, would impact run times, and would nearly
add a Parsing state in between all of the normal state transitions.
status.log now looks like:
START ---- provision timestamp=1386286285 localtime=Dec 05 15:31:25
START provision_AutoUpdate provision_AutoUpdate timestamp=1386286285 localtime=Dec 05 15:31:25
FAIL provision_AutoUpdate provision_AutoUpdate timestamp=1386286310 localtime=Dec 05 15:31:50 A large potato fell on the DUT
END FAIL provision_AutoUpdate provision_AutoUpdate timestamp=1386286310 localtime=Dec 05 15:31:50
END FAIL ---- provision timestamp=1386286310 localtime=Dec 05 15:31:50
INFO ---- ---- timestamp=1386286310 job_abort_reason= localtime=Dec 05 15:31:50
run_suite output of provision_AutoUpdate raising an exception:
Suite prep [ PASSED ]
provision [ FAILED ]
provision FAIL: A large potato fell on the DUT
BUG=chromium:317719
TEST=Modified provision_AutoUpdate to raise an exception, and ran a
suite that would cause provisioning. Test results came back with the
exception message that was raised from within provision_AutoUpdate.
Change-Id: If1a1b1a9a854f327c94f8ca7b3d1bc759599fedf
Reviewed-on: https://chromium-review.googlesource.com/179010
Reviewed-by: Alex Miller <milleral@chromium.org>
Tested-by: Alex Miller <milleral@chromium.org>
Commit-Queue: Alex Miller <milleral@chromium.org>
diff --git a/server/control_segments/provision b/server/control_segments/provision
index 9da202c..6cef6f3 100644
--- a/server/control_segments/provision
+++ b/server/control_segments/provision
@@ -37,21 +37,35 @@
# we might as well just turn it off.
success = job.run_test(test, host=host, value=value)
if not success:
- raise Exception('Provisioning %s:%s failed on %s' %
- (name, value, machine))
+ raise Exception('This exception will be immediately discarded,'
+ ' and exists only to force all errors to the'
+ ' same error handling codepath below.')
except Exception as e:
- job.record('END FAIL', None, 'provision', str(e))
- # (Re)raising the exception serves two purposes here:
- # 1. The scheduler only looks at the return code of autoserv to see if
- # the special task failed. Raising an exception here will get autoserv
- # to exit with a non-zero exit code because of an unhandled exception.
- # This then triggers the failure condition in ProvisionTask's epilog,
- # which leads us into...
- # 2. This exception ends up triggering server_job to write an INFO line
- # with job_abort_reason equal to str(e), which is how str(e)
- # appears as the reason field for the job when the status.log we
- # generate is parsed as the job's results.
- raise
+ job.record('END FAIL', None, 'provision')
+ # Raising a blank exception is done here because any message we can
+ # give here would be less useful than whatever the failing test left as
+ # its own exception message.
+ #
+ # The gory details of how raising a blank exception accomplishes this
+ # is as follows:
+ #
+ # The scheduler only looks at the return code of autoserv to see if
+ # the special task failed. Therefore we need python to exit because
+ # of an unhandled exception or because someone called sys.exit(1).
+ #
+ # We can't call sys.exit, since there's post-job-running logic (like
+ # cleanup) that we'd be skipping out on. So therefore, we need to
+ # raise an exception. However, if we raise an exception, this
+ # exception ends up triggering server_job to write an INFO line with
+ # job_abort_reason equal to str(e), which the tko parser then picks
+ # up as the reason field for the job when the status.log we generate is
+ # parsed as the job's results.
+ #
+ # So therefore, we raise a blank exception, which then generates an
+ # empty job_abort_reason which the tko parser ignores just inserts as
+ # a SERVER_JOB failure with no reason, which we then ignore at suite
+ # results reporting time.
+ raise Exception('')
else:
# If we finish successfully, nothing in autotest ever looks at the
# status.log, so it's purely for human consumption and tracability.