Change the way unexpected output is reported: rather than stopping at
the first difference, let the test run till completion, then gather
all the output and compare it to the expected output using difflib.

XXX Still to do: produce diff output that only shows the sections that
differ; currently it produces ndiff-style output because that's the
easiest to produce with difflib, but this becomes a liability when the
output is voluminous and there are only a few differences.
2 files changed