\section{\module{doctest} ---
         Test docstrings represent reality}

\declaremodule{standard}{doctest}
\moduleauthor{Tim Peters}{tim_one@users.sourceforge.net}
\sectionauthor{Tim Peters}{tim_one@users.sourceforge.net}
\sectionauthor{Moshe Zadka}{moshez@debian.org}

\modulesynopsis{A framework for verifying examples in docstrings.}

The \module{doctest} module searches a module's docstrings for text that looks
like an interactive Python session, then executes all such sessions to verify
they still work exactly as shown.  Here's a complete but small example:

\begin{verbatim}
"""
This is module example.

Example supplies one function, factorial.  For example,

>>> factorial(5)
120
"""

def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    If the result is small enough to fit in an int, return an int.
    Else return a long.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> [factorial(long(n)) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(30)
    265252859812191058636308480000000L
    >>> factorial(30L)
    265252859812191058636308480000000L
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    >>> factorial(30.1)
    Traceback (most recent call last):
        ...
    ValueError: n must be exact integer
    >>> factorial(30.0)
    265252859812191058636308480000000L

    It must also not be ridiculously large:
    >>> factorial(1e100)
    Traceback (most recent call last):
        ...
    OverflowError: n too large
    """

\end{verbatim}
% allow LaTeX to break here.
\begin{verbatim}

    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # e.g., 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        try:
            result *= factor
        except OverflowError:
            result *= long(factor)
        factor += 1
    return result

def _test():
    import doctest, example
    return doctest.testmod(example)

if __name__ == "__main__":
    _test()
\end{verbatim}

If you run \file{example.py} directly from the command line, doctest works
its magic:

\begin{verbatim}
$ python example.py
$
\end{verbatim}

There's no output!  That's normal, and it means all the examples worked.
Pass \code{-v} to the script, and doctest prints a detailed log of what it's
trying, and prints a summary at the end:

\begin{verbatim}
$ python example.py -v
Running example.__doc__
Trying: factorial(5)
Expecting: 120
ok
0 of 1 examples failed in example.__doc__
Running example.factorial.__doc__
Trying: [factorial(n) for n in range(6)]
Expecting: [1, 1, 2, 6, 24, 120]
ok
Trying: [factorial(long(n)) for n in range(6)]
Expecting: [1, 1, 2, 6, 24, 120]
ok
Trying: factorial(30)
Expecting: 265252859812191058636308480000000L
ok
\end{verbatim}

And so on, eventually ending with:

\begin{verbatim}
Trying: factorial(1e100)
Expecting:
Traceback (most recent call last):
    ...
OverflowError: n too large
ok
0 of 8 examples failed in example.factorial.__doc__
2 items passed all tests:
   1 tests in example
   8 tests in example.factorial
9 tests in 2 items.
9 passed and 0 failed.
Test passed.
$
\end{verbatim}

That's all you need to know to start making productive use of doctest!  Jump
in.  The docstrings in \file{doctest.py} contain detailed information about
all aspects of doctest; this section covers only the more important points.

\subsection{Normal Usage}

In normal use, end each module \module{M} with:

\begin{verbatim}
def _test():
    import doctest, M           # replace M with your module's name
    return doctest.testmod(M)   # ditto

if __name__ == "__main__":
    _test()
\end{verbatim}

Then running the module as a script causes the examples in the docstrings
to get executed and verified:

\begin{verbatim}
python M.py
\end{verbatim}

This won't display anything unless an example fails, in which case the
failing example(s) and the cause(s) of the failure(s) are printed to stdout,
and the final line of output is \code{"Test failed."}.

Run it with the \code{-v} switch instead:

\begin{verbatim}
python M.py -v
\end{verbatim}

and a detailed report of all examples tried is printed to stdout,
along with assorted summaries at the end.

You can force verbose mode by passing \code{verbose=1} to
\function{testmod()}, or prohibit it by passing \code{verbose=0}.  In either
of those cases, \var{sys.argv} is not examined by \function{testmod()}.

In any case, \function{testmod()} returns a 2-tuple of ints \var{(f, t)},
where \var{f} is the number of docstring examples that failed and \var{t} is
the total number of docstring examples attempted.
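
As a rough illustration of that return value, the sketch below assumes a
current Python 3 interpreter (which postdates this text) and builds a
throwaway module in memory.  The module name \code{demo}, its \code{double}
function, and the deliberately wrong expectation are all hypothetical.

```python
import doctest
import types

# Hypothetical module source; the second example is wrong on purpose.
SOURCE = '''
def double(x):
    """Return twice x.

    >>> double(2)
    4
    >>> double(3)
    7
    """
    return 2 * x
'''

# Build the module in memory so the sketch needs no file on disk.
demo = types.ModuleType("demo")
exec(SOURCE, demo.__dict__)

# Passing verbose explicitly means sys.argv is not consulted.
# The failure report for double(3) is still printed to stdout.
failed, attempted = doctest.testmod(demo, verbose=False)
print("failed:", failed, "attempted:", attempted)
```

A script could pass \code{failed} on as its exit status, so that a broken
example fails a build.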

\subsection{Which Docstrings Are Examined?}

See \file{doctest.py} for all the details.  They're unsurprising: the
module docstring, and all function, class and method docstrings are
searched, with the exception of docstrings attached to objects with private
names.

In addition, if \var{M.__test__} exists and ``is true'', it must be a dict,
and each entry maps a (string) name to a function object, class object, or
string.  Function and class object docstrings found from \var{M.__test__}
are searched even if the name is private, and strings are searched directly
as if they were docstrings.  In output, a key \var{K} in \var{M.__test__}
appears with name

\begin{verbatim}
      <name of M>.__test__.K
\end{verbatim}

Any classes found are recursively searched similarly, to test docstrings in
their contained methods and nested classes.  While private names reached
from \module{M}'s globals are skipped, all names reached from
\var{M.__test__} are searched.
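
A minimal sketch of a \var{__test__} dict follows.  It assumes a current
Python 3 interpreter, whose \class{doctest.DocTestFinder} class postdates
this text; the in-memory module \code{demo}, the function \code{square} and
the key \code{edge_cases} are made up for illustration.

```python
import doctest
import types

# Hypothetical module source with one docstring and one __test__ entry.
SOURCE = '''
def square(x):
    """Return x squared.

    >>> square(3)
    9
    """
    return x * x

# Extra checks kept out of the public docstrings.
__test__ = {
    "edge_cases": """
        >>> square(0)
        0
        >>> square(-4)
        16
    """,
}
'''

demo = types.ModuleType("demo")
exec(SOURCE, demo.__dict__)

# List the names under which the examples are reported.
names = sorted(t.name for t in doctest.DocTestFinder().find(demo))
print(names)

results = doctest.testmod(demo, verbose=False)
print(results)
```

The string stored under \code{edge_cases} is searched as if it were a
docstring, and it is reported under the name \code{demo.__test__.edge_cases}.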

\subsection{What's the Execution Context?}

By default, each time \function{testmod()} finds a docstring to test, it
uses a {\em copy} of \module{M}'s globals, so that running tests on a module
doesn't change the module's real globals, and so that one test in
\module{M} can't leave behind crumbs that accidentally allow another test
to work.  This means examples can freely use any names defined at top-level
in \module{M}, and names defined earlier in the docstring being run.  It
also means that sloppy imports (see below) can cause examples in external
docstrings to use globals inappropriate for them.

You can force use of your own dict as the execution context by passing
\code{globs=your_dict} to \function{testmod()} instead.  Presumably this
would be a copy of \var{M.__dict__} merged with the globals from other
imported modules.
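
The following sketch shows one way to pass such a dict, assuming a current
Python 3 interpreter.  The in-memory module \code{demo}, the function
\code{area} and the extra name \code{EXPECTED_AREA} are hypothetical.

```python
import doctest
import types

# Hypothetical module whose example relies on a name it does not define.
SOURCE = '''
def area(width, height):
    """Return the area of a rectangle.

    >>> area(2, 3) == EXPECTED_AREA
    True
    """
    return width * height
'''

demo = types.ModuleType("demo")
exec(SOURCE, demo.__dict__)

# Start from a copy of the module's globals, then add the extra name.
context = dict(demo.__dict__)
context["EXPECTED_AREA"] = 6  # hypothetical value used only by the examples

failed, attempted = doctest.testmod(demo, globs=context, verbose=False)
print("failed:", failed, "attempted:", attempted)

# The module's real globals are untouched.
print("EXPECTED_AREA" in demo.__dict__)
```

Because \code{context} starts as a copy of the module's own dict, the
examples can still see \code{area} itself.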

\subsection{What About Exceptions?}

No problem, as long as the only output generated by the example is the
traceback itself.  For example:

\begin{verbatim}
    >>> [1, 2, 3].remove(42)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ValueError: list.remove(x): x not in list
    >>>
\end{verbatim}

Note that only the exception type and value are compared (specifically,
only the last line in the traceback).  The various \code{"File"} lines in
between can be left out (unless they add significantly to the documentation
value of the example).
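
To see that the lines between the header and the last line are not
compared, the sketch below runs the same failing statement twice, once with
a \code{"File"} line and once with the stack elided.  It assumes a current
Python 3 interpreter and uses \class{doctest.DocTestParser} and
\class{doctest.DocTestRunner}, which postdate this text; the test name
\code{exception_demo} is arbitrary.

```python
import doctest

# Two versions of the same example: one keeps a "File" line, one elides it.
TEXT = '''
>>> raise ValueError("n must be >= 0")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: n must be >= 0
>>> raise ValueError("n must be >= 0")
Traceback (most recent call last):
    ...
ValueError: n must be >= 0
'''

# Parse the text into a test, using an empty dict as the globals.
test = doctest.DocTestParser().get_doctest(TEXT, {}, "exception_demo", None, 0)

# Run it quietly; both examples are expected to pass.
failed, attempted = doctest.DocTestRunner(verbose=False).run(test)
print("failed:", failed, "attempted:", attempted)
```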

\subsection{Advanced Usage}

\function{testmod()} actually creates a local instance of class
\class{doctest.Tester}, runs appropriate methods of that class, and merges
the results into global \class{Tester} instance \var{doctest.master}.

You can create your own instances of \class{doctest.Tester}, and so build
your own policies, or even run methods of \var{doctest.master} directly.
See \var{doctest.Tester.__doc__} for details.

\subsection{How are Docstring Examples Recognized?}

In most cases a copy-and-paste of an interactive console session works fine;
just make sure the leading whitespace is rigidly consistent (you can mix
tabs and spaces if you're too lazy to do it right, but doctest is not in
the business of guessing what you think a tab means).

\begin{verbatim}
    >>> # comments are ignored
    >>> x = 12
    >>> x
    12
    >>> if x == 13:
    ...     print "yes"
    ... else:
    ...     print "no"
    ...     print "NO"
    ...     print "NO!!!"
    ...
    no
    NO
    NO!!!
    >>>
\end{verbatim}

Any expected output must immediately follow the final \code{">>>"} or
\code{"..."} line containing the code, and the expected output (if any)
extends to the next \code{">>>"} or all-whitespace line.  That's it.
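
The split between code and expected output can be inspected directly.  This
is a sketch assuming a current Python 3 interpreter, whose
\class{doctest.DocTestParser} class postdates this text; the sample session
is adapted to Python 3's \function{print()}.

```python
import doctest

# A uniformly indented session, as it might appear inside a docstring.
TEXT = """
    >>> x = 12
    >>> x
    12
    >>> if x == 13:
    ...     print("yes")
    ... else:
    ...     print("no")
    ...
    no
"""

# Each Example pairs the source code with the output expected from it.
examples = doctest.DocTestParser().get_examples(TEXT)
for ex in examples:
    print(repr(ex.source), "expects", repr(ex.want))
```

The assignment has empty expected output, while the other two examples
carry the text up to the next prompt or blank line.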

The fine print:

\begin{itemize}

\item Expected output cannot contain an all-whitespace line, since such a
  line is taken to signal the end of expected output.

\item Output to stdout is captured, but not output to stderr (exception
  tracebacks are captured via a different means).

\item If you continue a line via backslashing in an interactive session, or
  for any other reason use a backslash, you need to double the backslash in
  the docstring version.  This is simply because you're in a string, and so
  the backslash must be escaped for it to survive intact.  Like:

\begin{verbatim}
>>> if "yes" == \\
...     "y" +   \\
...     "es":
...     print 'yes'
yes
\end{verbatim}

\item The starting column doesn't matter:

\begin{verbatim}
>>> assert "Easy!"
     >>> import math
            >>> math.floor(1.9)
            1.0
\end{verbatim}

  and as many leading whitespace characters are stripped from the expected
  output as appeared in the initial \code{">>>"} line that triggered it.

\end{itemize}
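
The backslash rule can be checked with a small sketch, assuming a current
Python 3 interpreter.  The function \code{total} is hypothetical, and the
parser and runner classes used here postdate this text.

```python
import doctest

def total():
    """Add two numbers across a continued line.

    The backslash is doubled because this is an ordinary string literal.

    >>> value = 1 + \\
    ...     2
    >>> value
    3
    """
    return 3

# After string escaping, the docstring holds a single backslash.
doc = total.__doc__
print(repr(doc.splitlines()[4]))

# Run the docstring's examples with an empty dict as the globals.
test = doctest.DocTestParser().get_doctest(doc, {}, "total", None, 0)
failed, attempted = doctest.DocTestRunner(verbose=False).run(test)
print("failed:", failed, "attempted:", attempted)
```

A raw string literal would avoid the doubling, at the cost of disabling all
other escapes in the docstring.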

\subsection{Warnings}

\begin{enumerate}

\item Sloppy imports can cause trouble; e.g., if you do

\begin{verbatim}
from XYZ import XYZclass
\end{verbatim}

  then \class{XYZclass} is a name in \var{M.__dict__} too, and doctest has no
  way to know that \class{XYZclass} wasn't {\em defined} in \module{M}.  So it
  may try to execute the examples in \class{XYZclass}'s docstring, and those
  in turn may require a different set of globals to work correctly.  I prefer
  to do \code{import *}-friendly imports, a la

\begin{verbatim}
from XYZ import XYZclass as _XYZclass
\end{verbatim}

  and then the leading underscore makes \class{_XYZclass} a private name so
  \function{testmod()} skips it by default.  Other approaches are described
  in \file{doctest.py}.

\item \module{doctest} is serious about requiring exact matches in expected
  output.  If even a single character doesn't match, the test fails.  This
  will probably surprise you a few times, as you learn exactly what Python
  does and doesn't guarantee about output.  For example, when printing a
  dict, Python doesn't guarantee that the key-value pairs will be printed
  in any particular order, so a test like

% Hey! What happened to Monty Python examples?
\begin{verbatim}
    >>> foo()
    {"Hermione": "hippogryph", "Harry": "broomstick"}
    >>>
\end{verbatim}

  is vulnerable!  One workaround is to do

\begin{verbatim}
    >>> foo() == {"Hermione": "hippogryph", "Harry": "broomstick"}
    1
    >>>
\end{verbatim}

  instead.  Another is to do

\begin{verbatim}
    >>> d = foo().items()
    >>> d.sort()
    >>> d
    [('Harry', 'broomstick'), ('Hermione', 'hippogryph')]
\end{verbatim}

  There are others, but you get the idea.

  Another bad idea is to print things that embed an object address, like

\begin{verbatim}
    >>> id(1.0) # certain to fail some of the time
    7948648
    >>>
\end{verbatim}

  Floating-point numbers are also subject to small output variations across
  platforms, because Python defers to the platform C library for float
  formatting, and C libraries vary widely in quality here.

\begin{verbatim}
    >>> 1./7  # risky
    0.14285714285714285
    >>> print 1./7 # safer
    0.142857142857
    >>> print round(1./7, 6) # much safer
    0.142857
\end{verbatim}

  Numbers of the form \code{I/2.**J} are safe across all platforms, and I
  often contrive doctest examples to produce numbers of that form:

\begin{verbatim}
    >>> 3./4  # utterly safe
    0.75
\end{verbatim}

  Simple fractions are also easier for people to understand, and that makes
  for better documentation.

\end{enumerate}
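
The workarounds above carry over to a current Python 3 interpreter with
small spelling changes, as in this sketch.  The function \code{inventory}
stands in for \code{foo} and is hypothetical; \code{True} replaces
\code{1}, and \function{sorted()} replaces the in-place sort.

```python
import doctest

def inventory():
    # Hypothetical stand-in for foo(): returns a small dict.
    return {"Hermione": "hippogryph", "Harry": "broomstick"}

# Examples written so the expected output does not depend on dict order
# or on platform-specific float formatting.
TEXT = '''
>>> inventory() == {"Harry": "broomstick", "Hermione": "hippogryph"}
True
>>> sorted(inventory().items())
[('Harry', 'broomstick'), ('Hermione', 'hippogryph')]
>>> 3 / 4
0.75
>>> round(1 / 7, 6)
0.142857
'''

test = doctest.DocTestParser().get_doctest(
    TEXT, {"inventory": inventory}, "robust_output", None, 0)
failed, attempted = doctest.DocTestRunner(verbose=False).run(test)
print("failed:", failed, "attempted:", attempted)
```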

\subsection{Soapbox}

The first word in doctest is ``doc'', and that's why the author wrote
doctest:  to keep documentation up to date.  It so happens that doctest
makes a pleasant unit testing environment, but that's not its primary
purpose.

Choose docstring examples with care.  There's an art to this that needs to
be learned; it may not be natural at first.  Examples should add genuine
value to the documentation.  A good example can often be worth many words.
If possible, show just a few normal cases, show endcases, show interesting
subtle cases, and show an example of each kind of exception that can be
raised.  You're probably testing for endcases and subtle cases anyway in an
interactive shell:  doctest wants to make it as easy as possible to capture
those sessions, and will verify they continue to work as designed forever
after.

If done with care, the examples will be invaluable for your users, and will
pay back the time it takes to collect them many times over as the years go
by and ``things change''.  I'm still amazed at how often one of my doctest
examples stops working after a ``harmless'' change.

For exhaustive testing, or testing boring cases that add no value to the
docs, define a \var{__test__} dict instead.  That's what it's for.