Writing Python Regression Tests
-------------------------------
Skip Montanaro
(skip@mojam.com)


Introduction

If you add a new module to Python or modify the functionality of an existing
module, you should write one or more test cases to exercise that new
functionality. There are different ways to do this within the regression
testing facility provided with Python; any particular test should use only
one of these options. Each option requires writing a test module using the
conventions of the selected option:

    - PyUnit based tests
    - doctest based tests
    - "traditional" Python test modules

Regardless of the mechanics of the testing approach you choose,
you will be writing unit tests (isolated tests of functions and objects
defined by the module) using white box techniques. Unlike black box
testing, where you only have the external interfaces to guide your test case
writing, in white box testing you can see the code being tested and tailor
your test cases to exercise it more completely. In particular, you will be
able to refer to the C and Python code in the CVS repository when writing
your regression test cases.


PyUnit based tests

The PyUnit framework is based on the ideas of unit testing as espoused
by Kent Beck and the Extreme Programming (XP) movement. The specific
interface provided by the framework is tightly based on the JUnit
Java implementation of Beck's original Smalltalk test framework. Please
see the documentation of the unittest module for detailed information on
the interface and general guidelines on writing PyUnit based tests.

The test_support helper module provides two functions for use by
PyUnit based tests in the Python regression testing framework:
run_unittest() takes a unittest.TestCase derived class as a parameter
and runs the tests defined in that class, and run_suite() takes a
populated TestSuite instance and runs the tests. All test methods in
the Python regression framework have names that start with "test_" and
use lower-case names with words separated with underscores.

All PyUnit-based tests in the Python test suite use boilerplate that
looks like this:

    import unittest
    import test_support

    class MyTestCase(unittest.TestCase):
        # define test methods here...
        pass

    def test_main():
        test_support.run_unittest(MyTestCase)

    if __name__ == "__main__":
        test_main()

This has the advantage that it allows the unittest module to be used
as a script to run individual tests, while also working well with the
regrtest framework.
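
As a concrete illustration of the boilerplate above, here is a minimal
sketch of a PyUnit-style test module. The function word_count() and the
class WordCountTestCase are hypothetical names invented for this example,
and the sketch drives the tests with unittest's own loader and runner
rather than test_support.run_unittest(), so that it can run outside the
Python source tree:

```python
import unittest


def word_count(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


class WordCountTestCase(unittest.TestCase):
    # Test method names start with "test_" and use lower-case words
    # separated by underscores, as the regression framework expects.

    def test_empty_string(self):
        self.assertEqual(word_count(""), 0)

    def test_multiple_words(self):
        self.assertEqual(word_count("spam and eggs"), 3)


def test_main():
    # Stand-in for test_support.run_unittest(WordCountTestCase).
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTestCase)
    return unittest.TextTestRunner().run(suite)


if __name__ == "__main__":
    test_main()
```

Returning the result object from test_main() is a choice made for this
sketch only; it makes the outcome easy to inspect from other code.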


doctest based tests

Tests written to use doctest are actually part of the docstrings for
the module being tested. Each test is written as a display of an
interactive session, including the Python prompts, statements that would
be typed by the user, and the output of those statements (including
tracebacks, although only the exception message needs to be retained then).
The module in the test package is simply a wrapper that causes doctest
to run over the tests in the module. The test for the difflib module
provides a convenient example:

    import difflib, test_support
    test_support.run_doctest(difflib)

If the test is successful, nothing is written to stdout (so you should not
create a corresponding output/test_difflib file), but running regrtest
with -v will give a detailed report, the same as if passing -v to doctest.

A second argument can be passed to run_doctest to tell doctest to search
sys.argv for -v instead of using test_support's idea of verbosity. This
is useful for writing doctest-based tests that aren't simply running a
doctest'ed Lib module, but contain the doctests themselves. Then at
times you may want to run such a test directly as a doctest, independent
of the regrtest framework. The tail end of test_descrtut.py is a good
example:

    def test_main(verbose=None):
        import test_support, test.test_descrtut
        test_support.run_doctest(test.test_descrtut, verbose)

    if __name__ == "__main__":
        test_main(1)

If run via regrtest, test_main() is called (by regrtest) without specifying
verbose, and then test_support's idea of verbosity is used. But when
run directly, test_main(1) is called, and then doctest's idea of verbosity
is used.

See the documentation for the doctest module for information on
writing tests using the doctest framework.
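
To make the idea of docstring-embedded tests concrete, the following
sketch defines a hypothetical function, double(), whose docstring holds a
short interactive session, and checks it with the doctest module directly.
It does not use test_support.run_doctest(); the helper name
check_docstring_examples() is an assumption made for this example:

```python
import doctest


def double(x):
    """Return twice an integer argument.

    >>> double(2)
    4
    >>> double("ab")
    Traceback (most recent call last):
        ...
    TypeError: double() requires an integer
    """
    if not isinstance(x, int):
        raise TypeError("double() requires an integer")
    return 2 * x


def check_docstring_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


failed, attempted = check_docstring_examples(double)
```

Note how the traceback in the docstring keeps only the header line and
the final exception line; the stack in between is replaced by "...".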


"traditional" Python test modules

The mechanics of how the "traditional" test system operates are fairly
straightforward. When a test case is run, the output is compared with the
expected output that is stored in .../Lib/test/output. If the test runs to
completion and the actual and expected outputs match, the test succeeds; if
not, it fails. If an ImportError or test_support.TestSkipped error is
raised, the test is skipped.
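
The comparison step can be pictured with the following sketch. It is not
regrtest's actual implementation; the function outputs_match() and the
sample test body are hypothetical, and the expected output is held in a
string rather than read from a file under .../Lib/test/output:

```python
import contextlib
import io


def outputs_match(test_body, expected_output):
    """Run test_body(), capture what it prints, and compare with expected."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        test_body()
    return captured.getvalue() == expected_output


def sample_test():
    # A "traditional" test simply prints its results.
    print("test_spam")
    print(2 + 2)


print(outputs_match(sample_test, "test_spam\n4\n"))   # True
print(outputs_match(sample_test, "test_spam\n5\n"))   # False
```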


Executing Test Cases

If you are writing test cases for module spam, you need to create a file
in .../Lib/test named test_spam.py. In addition, if the tests are expected
to write to stdout during a successful run, you also need to create an
expected output file in .../Lib/test/output named test_spam ("..."
represents the top-level directory in the Python source tree, the directory
containing the configure script). If needed, generate the initial version
of the test output file by executing:

    ./python Lib/test/regrtest.py -g test_spam.py

from the top-level directory.

Any time you modify test_spam.py you need to generate a new expected
output file. Don't forget to desk check the generated output to make sure
it's really what you expected to find! All in all it's usually better
not to have an expected-output file (note that doctest- and unittest-based
tests do not).

To run a single test after modifying a module, simply run regrtest.py
without the -g flag:

    ./python Lib/test/regrtest.py test_spam.py

While debugging a regression test, you can of course execute it
independently of the regression testing framework and see what it prints:

    ./python Lib/test/test_spam.py

To run the entire test suite:

[UNIX, + other platforms where "make" works] Make the "test" target at the
top level:

    make test

[WINDOWS] Run rt.bat from your PCBuild directory. Read the comments at
the top of rt.bat for the use of special -d, -O and -q options processed
by rt.bat.

[OTHER] You can simply execute the two runs of regrtest (optimized and
non-optimized) directly:

    ./python Lib/test/regrtest.py
    ./python -O Lib/test/regrtest.py

Note, however, that this approach picks up whatever .pyc and .pyo files
happen to be around. The makefile and rt.bat approaches run the tests
twice, the first time removing all .pyc and .pyo files from the subtree
rooted at Lib/.

Test cases generate output based upon values computed by the test code.
When executed, regrtest.py compares the actual output generated by executing
the test case with the expected output and reports success or failure. It
stands to reason that if the actual and expected outputs are to match, they
must not contain any machine dependencies. This means your test cases
should not print out absolute machine addresses (e.g. the return value of
the id() builtin function) or floating point numbers with large numbers of
significant digits (unless you understand what you are doing!).
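
A short sketch of the difference follows. The helper stable_float() is a
hypothetical name; the point is only that limiting the number of printed
digits keeps the output stable, whereas id() values and full-precision
floats can vary between runs and machines:

```python
def stable_float(value, digits=6):
    """Format a float with a fixed, modest number of decimal places."""
    return "%.*f" % (digits, value)


total = 0.1 + 0.2

# Risky in an expected-output file: the full repr exposes rounding noise,
# and id() returns a value that differs between runs and machines.
#     print(repr(total))
#     print(id(total))

# Safer: print only as many digits as the test actually needs.
print(stable_float(total))      # prints 0.300000
```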


Test Case Writing Tips

Writing good test cases is a skilled task and is too complex to discuss in
detail in this short document. Many books have been written on the subject.
I'll show my age by suggesting that Glenford Myers' "The Art of Software
Testing", published in 1979, is still the best introduction to the subject
available. It is short (177 pages), easy to read, and discusses the major
elements of software testing, though its publication predates the
object-oriented software revolution, so doesn't cover that subject at all.
Unfortunately, it is very expensive (about $100 new). If you can borrow it
or find it used (around $20), I strongly urge you to pick up a copy.

The most important goal when writing test cases is to break things. A test
case that doesn't uncover a bug is much less valuable than one that does.
In designing test cases you should pay attention to the following:

    * Your test cases should exercise all the functions and objects defined
      in the module, not just the ones meant to be called by users of your
      module. This may require you to write test code that uses the module
      in ways you don't expect (explicitly calling internal functions, for
      example - see test_atexit.py).

    * You should consider any boundary values that may tickle exceptional
      conditions (e.g. if you were writing regression tests for division,
      you might well want to generate tests with numerators and denominators
      at the limits of floating point and integer numbers on the machine
      performing the tests as well as a denominator of zero).

    * You should exercise as many paths through the code as possible. This
      may not always be possible, but is a goal to strive for. In
      particular, when considering if statements (or their equivalent), you
      want to create test cases that exercise both the true and false
      branches. For loops, you should create test cases that exercise the
      loop zero, one and multiple times.

    * You should test with obviously invalid input. If you know that a
      function requires an integer input, try calling it with other types of
      objects to see how it responds.

    * You should test with obviously out-of-range input. If the domain of a
      function is only defined for positive integers, try calling it with a
      negative integer.

    * If you are going to fix a bug that wasn't uncovered by an existing
      test, try to write a test case that exposes the bug (preferably before
      fixing it).

    * If you need to create a temporary file, you can use the filename in
      test_support.TESTFN to do so. It is important to remove the file
      when done; other tests should be able to use the name without cleaning
      up after your test.
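
Several of these tips can be sketched in a single PyUnit-style test case.
The function average() below is hypothetical and exists only to give the
tests something to exercise; the test names are likewise illustrative:

```python
import unittest


def average(values):
    """Hypothetical function under test: arithmetic mean of a sequence."""
    total = 0
    for value in values:
        total += value
    return total / len(values)


class AverageTestCase(unittest.TestCase):

    def test_loop_zero_times(self):
        # Boundary value: an empty sequence leads to division by zero.
        self.assertRaises(ZeroDivisionError, average, [])

    def test_loop_one_time(self):
        self.assertEqual(average([4]), 4)

    def test_loop_multiple_times(self):
        self.assertEqual(average([1, 2, 3]), 2)

    def test_invalid_input_type(self):
        # Obviously invalid input: strings cannot be added to an integer.
        self.assertRaises(TypeError, average, ["a", "b"])
```

The three loop tests cover the zero, one and multiple iteration cases;
the remaining test feeds the function a type it was never meant to accept.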


Regression Test Writing Rules

Each test case is different. There is no "standard" form for a Python
regression test case, though there are some general rules (note that
these mostly apply only to the "classic" tests; unittest- and doctest-
based tests should follow the conventions natural to those frameworks):

    * If your test case detects a failure, raise TestFailed (found in
      test_support).

    * Import everything you'll need as early as possible.

    * If you'll be importing objects from a module that is at least
      partially platform-dependent, only import those objects you need for
      the current test case to avoid spurious ImportError exceptions that
      prevent the test from running to completion.

    * Print all your test case results using the print statement. For
      non-fatal errors, print an error message (or omit a successful
      completion print) to indicate the failure, but proceed instead of
      raising TestFailed.

    * Use "assert" sparingly, if at all. It's usually better to just print
      what you got, and rely on regrtest's got-vs-expected comparison to
      catch deviations from what you expect. assert statements aren't
      executed at all when regrtest is run in -O mode; and, because they
      cause the test to stop immediately, can lead to a long & tedious
      test-fix, test-fix, test-fix, ... cycle when things are badly broken
      (and note that "badly broken" often includes running the test suite
      for the first time on new platforms or under new implementations of
      the language).
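
The first and last rules can be sketched as follows. TestFailed here is a
local stand-in for test_support.TestFailed, and expect_equal() is a
hypothetical helper; neither is taken from the test package itself:

```python
class TestFailed(Exception):
    """Local stand-in for test_support.TestFailed (an assumption)."""


def expect_equal(actual, expected, label):
    """Raise TestFailed on mismatch; unlike assert, this also runs under -O."""
    if actual != expected:
        raise TestFailed("%s: expected %r, got %r" % (label, expected, actual))


# Passing check: execution continues silently.
expect_equal(divmod(7, 2), (3, 1), "divmod(7, 2)")

# Failing check: the exception carries a readable message.
try:
    expect_equal(abs(-3), 4, "abs(-3)")
except TestFailed as err:
    message = str(err)
    print(message)    # prints: abs(-3): expected 4, got 3
```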


Miscellaneous

There is a test_support module you can import from your test case. It
provides the following useful objects:

    * TestFailed - raise this exception when your regression test detects a
      failure.

    * TestSkipped - raise this if the test could not be run because the
      platform doesn't offer all the required facilities (like large
      file support), even if all the required modules are available.

    * verbose - you can use this variable to control print output. Many
      modules use it. Search for "verbose" in the test_*.py files to see
      lots of examples.

    * verify(condition, reason='test failed'). Use this instead of

          assert condition[, reason]

      verify() has two advantages over assert: it works even in -O mode,
      and it raises TestFailed on failure instead of AssertionError.

    * TESTFN - a string that should always be used as the filename when you
      need to create a temp file. Also use try/finally to ensure that your
      temp files are deleted before your test completes. Note that you
      cannot unlink an open file on all operating systems, so also be sure
      to close temp files before trying to unlink them.

    * sortdict(dict) - acts like repr(dict.items()), but sorts the items
      first. This is important when printing a dict value, because the
      order of items produced by dict.items() is not defined by the
      language.

    * findfile(file) - you can call this function to locate a file somewhere
      along sys.path or in the Lib/test tree - see test_linuxaudiodev.py for
      an example of its use.

    * use_large_resources - true iff tests requiring large time or space
      should be run.

    * fcmp(x,y) - you can call this function to compare two floating point
      numbers when you expect them to only be approximately equal within a
      fuzz factor (test_support.FUZZ, which defaults to 1e-6).
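
The temp-file advice for TESTFN can be sketched as below. Because
test_support is not assumed to be importable here, TESTFN is defined
locally as a stand-in (in a real test the value comes from test_support),
and roundtrip() is a hypothetical helper:

```python
import os
import tempfile

# Stand-in for test_support.TESTFN (an assumption for this sketch).
TESTFN = os.path.join(tempfile.gettempdir(), "@test_%d_tmp" % os.getpid())


def roundtrip(data):
    """Write data to TESTFN, read it back, and always remove the file."""
    try:
        with open(TESTFN, "w") as f:
            f.write(data)
        # Each file is closed on leaving its "with" block, so nothing is
        # still open when the cleanup below unlinks the name.
        with open(TESTFN) as f:
            return f.read()
    finally:
        if os.path.exists(TESTFN):
            os.unlink(TESTFN)


print(repr(roundtrip("spam\n")))
```

The outer try/finally guarantees the file is removed even if the write or
read raises, so later tests can reuse the name.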

NOTE: Always import something from test_support like so:

    from test_support import verbose

or like so:

    import test_support
    ... use test_support.verbose in the code ...

Never import anything from test_support like this:

    from test.test_support import verbose

"test" is a package already, so you can refer to modules it contains without
"test." qualification. If you do an explicit "test.xxx" qualification, that
can fool Python into believing test.xxx is a module distinct from the xxx
in the current package, and you can end up importing two distinct copies of
xxx. This is especially bad if xxx=test_support, as regrtest.py can (and
routinely does) overwrite its "verbose" and "use_large_resources"
attributes: if you get a second copy of test_support loaded, it may not
have the same values for those as regrtest intended.
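
The duplicate-import hazard can be reproduced with a throwaway package.
Everything named in this sketch (demo_pkg, demo_support, the temporary
directory) is hypothetical, and it triggers the effect through sys.path
entries rather than through the test package itself. It only demonstrates
that a module reachable under two names is loaded twice, so that a change
made through one name is invisible through the other:

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build demo_pkg/__init__.py and demo_pkg/demo_support.py.
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w"):
        pass
    with open(os.path.join(pkg_dir, "demo_support.py"), "w") as f:
        f.write("verbose = 0\n")

    # Make the module importable both with and without the package prefix.
    sys.path[:0] = [root, pkg_dir]
    importlib.invalidate_caches()
    try:
        plain = importlib.import_module("demo_support")
        qualified = importlib.import_module("demo_pkg.demo_support")
    finally:
        del sys.path[:2]

# Two distinct module objects were created from the same file.
print(plain is qualified)       # False

# Setting an attribute on one copy does not affect the other.
plain.verbose = 1
print(qualified.verbose)        # 0
```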


Python and C statement coverage results are currently available at

    http://www.musi-cal.com/~skip/python/Python/dist/src/

As of this writing (July, 2000) these results are being generated nightly.
You can refer to the summaries and the test coverage output files to see
where coverage is adequate or lacking and write test cases to beef up the
coverage.


Some Non-Obvious regrtest Features

    * Automagic test detection: When you create a new test file
      test_spam.py, you do not need to modify regrtest (or anything else)
      to advertise its existence. regrtest searches for and runs all
      modules in the test directory with names of the form test_xxx.py.

    * Miranda output: If, when running test_spam.py, regrtest does not
      find an expected-output file test/output/test_spam, regrtest
      pretends that it did find one, containing the single line

          test_spam

      This allows new tests that don't expect to print anything to stdout
      to not bother creating expected-output files.

    * Two-stage testing: To run test_spam.py, regrtest imports test_spam
      as a module. Most tests run to completion as a side-effect of
      getting imported. After importing test_spam, regrtest also executes
      test_spam.test_main(), if test_spam has a "test_main" attribute.
      This is rarely required with the "traditional" Python tests, and
      you shouldn't create a module global with name test_main unless
      you're specifically exploiting this gimmick. This usage does
      prove useful with PyUnit-based tests as well, however; defining
      a test_main() which is run by regrtest and a script-stub in the
      test module ("if __name__ == '__main__': test_main()") allows
      the test to be used like any other Python test and also work
      with the unittest.py-as-a-script approach, allowing a developer
      to run specific tests from the command line.
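
The two-stage behaviour can be sketched as follows. This is not regrtest's
code: run_test_module() is a hypothetical function, and the module
test_spam_demo is fabricated in memory so that the example is
self-contained:

```python
import importlib
import sys
import types


def run_test_module(name):
    """Import the named test module, then call its test_main() if present."""
    module = importlib.import_module(name)        # stage 1: import side effects
    test_main = getattr(module, "test_main", None)
    if test_main is not None:
        test_main()                               # stage 2: explicit entry point
    return module


# Fabricate a tiny test module that records when test_main() runs.
calls = []
demo = types.ModuleType("test_spam_demo")
demo.test_main = lambda: calls.append("test_main")
sys.modules["test_spam_demo"] = demo

run_test_module("test_spam_demo")
print(calls)        # ['test_main']
```

A module without a test_main attribute would simply be imported, which
matches the behaviour described for most "traditional" tests.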