blob: 2f691708b1dfa22a2f5aad2d1c47d383e3bfe69b [file] [log] [blame]
Guido van Rossumdf804f81995-03-02 12:38:39 +00001\chapter{The Python Profiler}
2\stmodindex{profile}
3\stmodindex{pstats}
4
Fred Drake4b3f0311996-12-13 22:04:31 +00005Copyright \copyright{} 1994, by InfoSeek Corporation, all rights reserved.
Guido van Rossumdf804f81995-03-02 12:38:39 +00006
7Written by James Roskind%
8\footnote{
Guido van Rossum6c4f0031995-03-07 10:14:09 +00009Updated and converted to \LaTeX\ by Guido van Rossum. The references to
Guido van Rossumdf804f81995-03-02 12:38:39 +000010the old profiler are left in the text, although it no longer exists.
11}
12
13Permission to use, copy, modify, and distribute this Python software
14and its associated documentation for any purpose (subject to the
15restriction in the following sentence) without fee is hereby granted,
16provided that the above copyright notice appears in all copies, and
17that both that copyright notice and this permission notice appear in
18supporting documentation, and that the name of InfoSeek not be used in
19advertising or publicity pertaining to distribution of the software
20without specific, written prior permission. This permission is
21explicitly restricted to the copying and modification of the software
22to remain in Python, compiled Python, or other languages (such as C)
23wherein the modified or derived code is exclusively imported into a
24Python module.
25
26INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
27SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
28FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY
29SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
30RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
31CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
32CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
33
34
35The profiler was written after only programming in Python for 3 weeks.
36As a result, it is probably clumsy code, but I don't know for sure yet
37'cause I'm a beginner :-). I did work hard to make the code run fast,
38so that profiling would be a reasonable thing to do. I tried not to
39repeat code fragments, but I'm sure I did some stuff in really awkward
40ways at times. Please send suggestions for improvements to:
Guido van Rossum789742b1996-02-12 23:17:40 +000041\code{jar@netscape.com}. I won't promise \emph{any} support. ...but
Guido van Rossumdf804f81995-03-02 12:38:39 +000042I'd appreciate the feedback.
43
44
Guido van Rossum470be141995-03-17 16:07:09 +000045\section{Introduction to the profiler}
Guido van Rossum86cb0921995-03-20 12:59:56 +000046\nodename{Profiler Introduction}
Guido van Rossumdf804f81995-03-02 12:38:39 +000047
48A \dfn{profiler} is a program that describes the run time performance
49of a program, providing a variety of statistics. This documentation
50describes the profiler functionality provided in the modules
51\code{profile} and \code{pstats.} This profiler provides
52\dfn{deterministic profiling} of any Python programs. It also
53provides a series of report generation tools to allow users to rapidly
54examine the results of a profile operation.
55
56
57\section{How Is This Profiler Different From The Old Profiler?}
Guido van Rossum86cb0921995-03-20 12:59:56 +000058\nodename{Profiler Changes}
Guido van Rossumdf804f81995-03-02 12:38:39 +000059
60The big changes from old profiling module are that you get more
61information, and you pay less CPU time. It's not a trade-off, it's a
62trade-up.
63
64To be specific:
65
66\begin{description}
67
68\item[Bugs removed:]
69Local stack frame is no longer molested, execution time is now charged
70to correct functions.
71
72\item[Accuracy increased:]
73Profiler execution time is no longer charged to user's code,
74calibration for platform is supported, file reads are not done \emph{by}
75profiler \emph{during} profiling (and charged to user's code!).
76
77\item[Speed increased:]
78Overhead CPU cost was reduced by more than a factor of two (perhaps a
79factor of five), lightweight profiler module is all that must be
80loaded, and the report generating module (\code{pstats}) is not needed
81during profiling.
82
83\item[Recursive functions support:]
84Cumulative times in recursive functions are correctly calculated;
85recursive entries are counted.
86
87\item[Large growth in report generating UI:]
88Distinct profiles runs can be added together forming a comprehensive
89report; functions that import statistics take arbitrary lists of
90files; sorting criteria is now based on keywords (instead of 4 integer
91options); reports shows what functions were profiled as well as what
92profile file was referenced; output format has been improved.
93
94\end{description}
95
96
97\section{Instant Users Manual}
98
99This section is provided for users that ``don't want to read the
100manual.'' It provides a very brief overview, and allows a user to
101rapidly perform profiling on an existing application.
102
103To profile an application with a main entry point of \samp{foo()}, you
104would add the following to your module:
105
Guido van Rossume47da0a1997-07-17 16:34:52 +0000106\bcode\begin{verbatim}
107import profile
108profile.run("foo()")
109\end{verbatim}\ecode
110%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000111The above action would cause \samp{foo()} to be run, and a series of
112informative lines (the profile) to be printed. The above approach is
113most useful when working with the interpreter. If you would like to
114save the results of a profile into a file for later examination, you
115can supply a file name as the second argument to the \code{run()}
116function:
117
Guido van Rossume47da0a1997-07-17 16:34:52 +0000118\bcode\begin{verbatim}
119import profile
120profile.run("foo()", 'fooprof')
121\end{verbatim}\ecode
122%
Guido van Rossumbac80021997-06-02 17:29:12 +0000123\code{profile.py} can also be invoked as
124a script to profile another script. For example:
125\code{python /usr/local/lib/python1.4/profile.py myscript.py}
126
Guido van Rossumdf804f81995-03-02 12:38:39 +0000127When you wish to review the profile, you should use the methods in the
128\code{pstats} module. Typically you would load the statistics data as
129follows:
130
Guido van Rossume47da0a1997-07-17 16:34:52 +0000131\bcode\begin{verbatim}
132import pstats
133p = pstats.Stats('fooprof')
134\end{verbatim}\ecode
135%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000136The class \code{Stats} (the above code just created an instance of
137this class) has a variety of methods for manipulating and printing the
138data that was just read into \samp{p}. When you ran
139\code{profile.run()} above, what was printed was the result of three
140method calls:
141
Guido van Rossume47da0a1997-07-17 16:34:52 +0000142\bcode\begin{verbatim}
143p.strip_dirs().sort_stats(-1).print_stats()
144\end{verbatim}\ecode
145%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000146The first method removed the extraneous path from all the module
147names. The second method sorted all the entries according to the
148standard module/line/name string that is printed (this is to comply
149with the semantics of the old profiler). The third method printed out
150all the statistics. You might try the following sort calls:
151
Guido van Rossume47da0a1997-07-17 16:34:52 +0000152\bcode\begin{verbatim}
153p.sort_stats('name')
154p.print_stats()
155\end{verbatim}\ecode
156%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000157The first call will actually sort the list by function name, and the
158second call will print out the statistics. The following are some
159interesting calls to experiment with:
160
Guido van Rossume47da0a1997-07-17 16:34:52 +0000161\bcode\begin{verbatim}
162p.sort_stats('cumulative').print_stats(10)
163\end{verbatim}\ecode
164%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000165This sorts the profile by cumulative time in a function, and then only
166prints the ten most significant lines. If you want to understand what
167algorithms are taking time, the above line is what you would use.
168
169If you were looking to see what functions were looping a lot, and
170taking a lot of time, you would do:
171
Guido van Rossume47da0a1997-07-17 16:34:52 +0000172\bcode\begin{verbatim}
173p.sort_stats('time').print_stats(10)
174\end{verbatim}\ecode
175%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000176to sort according to time spent within each function, and then print
177the statistics for the top ten functions.
178
179You might also try:
180
Guido van Rossume47da0a1997-07-17 16:34:52 +0000181\bcode\begin{verbatim}
182p.sort_stats('file').print_stats('__init__')
183\end{verbatim}\ecode
184%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000185This will sort all the statistics by file name, and then print out
186statistics for only the class init methods ('cause they are spelled
187with \code{__init__} in them). As one final example, you could try:
188
Guido van Rossume47da0a1997-07-17 16:34:52 +0000189\bcode\begin{verbatim}
190p.sort_stats('time', 'cum').print_stats(.5, 'init')
191\end{verbatim}\ecode
192%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000193This line sorts statistics with a primary key of time, and a secondary
194key of cumulative time, and then prints out some of the statistics.
195To be specific, the list is first culled down to 50\% (re: \samp{.5})
196of its original size, then only lines containing \code{init} are
197maintained, and that sub-sub-list is printed.
198
199If you wondered what functions called the above functions, you could
200now (\samp{p} is still sorted according to the last criteria) do:
201
Guido van Rossume47da0a1997-07-17 16:34:52 +0000202\bcode\begin{verbatim}
203p.print_callers(.5, 'init')
204\end{verbatim}\ecode
205%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000206and you would get a list of callers for each of the listed functions.
207
208If you want more functionality, you're going to have to read the
209manual, or guess what the following functions do:
210
Guido van Rossume47da0a1997-07-17 16:34:52 +0000211\bcode\begin{verbatim}
212p.print_callees()
213p.add('fooprof')
214\end{verbatim}\ecode
215%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000216\section{What Is Deterministic Profiling?}
Guido van Rossum86cb0921995-03-20 12:59:56 +0000217\nodename{Deterministic Profiling}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000218
219\dfn{Deterministic profiling} is meant to reflect the fact that all
220\dfn{function call}, \dfn{function return}, and \dfn{exception} events
221are monitored, and precise timings are made for the intervals between
222these events (during which time the user's code is executing). In
223contrast, \dfn{statistical profiling} (which is not done by this
224module) randomly samples the effective instruction pointer, and
225deduces where time is being spent. The latter technique traditionally
226involves less overhead (as the code does not need to be instrumented),
227but provides only relative indications of where time is being spent.
228
229In Python, since there is an interpreter active during execution, the
230presence of instrumented code is not required to do deterministic
231profiling. Python automatically provides a \dfn{hook} (optional
232callback) for each event. In addition, the interpreted nature of
233Python tends to add so much overhead to execution, that deterministic
234profiling tends to only add small processing overhead in typical
235applications. The result is that deterministic profiling is not that
236expensive, yet provides extensive run time statistics about the
237execution of a Python program.
238
239Call count statistics can be used to identify bugs in code (surprising
240counts), and to identify possible inline-expansion points (high call
241counts). Internal time statistics can be used to identify ``hot
242loops'' that should be carefully optimized. Cumulative time
243statistics should be used to identify high level errors in the
244selection of algorithms. Note that the unusual handling of cumulative
245times in this profiler allows statistics for recursive implementations
246of algorithms to be directly compared to iterative implementations.
247
248
249\section{Reference Manual}
250
Guido van Rossum470be141995-03-17 16:07:09 +0000251\renewcommand{\indexsubitem}{(profiler function)}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000252
253The primary entry point for the profiler is the global function
254\code{profile.run()}. It is typically used to create any profile
255information. The reports are formatted and printed using methods of
256the class \code{pstats.Stats}. The following is a description of all
257of these standard entry points and functions. For a more in-depth
258view of some of the code, consider reading the later section on
259Profiler Extensions, which includes discussion of how to derive
260``better'' profilers from the classes presented, or reading the source
261code for these modules.
262
Guido van Rossum470be141995-03-17 16:07:09 +0000263\begin{funcdesc}{profile.run}{string\optional{\, filename\optional{\, ...}}}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000264
265This function takes a single argument that has can be passed to the
266\code{exec} statement, and an optional file name. In all cases this
267routine attempts to \code{exec} its first argument, and gather profiling
268statistics from the execution. If no file name is present, then this
269function automatically prints a simple profiling report, sorted by the
270standard name string (file/line/function-name) that is presented in
271each line. The following is a typical output from such a call:
272
Guido van Rossum96628a91995-04-10 11:34:00 +0000273\small{
Guido van Rossume47da0a1997-07-17 16:34:52 +0000274\bcode\begin{verbatim}
Guido van Rossum96628a91995-04-10 11:34:00 +0000275 main()
276 2706 function calls (2004 primitive calls) in 4.504 CPU seconds
Guido van Rossumdf804f81995-03-02 12:38:39 +0000277
Guido van Rossum96628a91995-04-10 11:34:00 +0000278Ordered by: standard name
Guido van Rossumdf804f81995-03-02 12:38:39 +0000279
Guido van Rossum96628a91995-04-10 11:34:00 +0000280ncalls tottime percall cumtime percall filename:lineno(function)
281 2 0.006 0.003 0.953 0.477 pobject.py:75(save_objects)
282 43/3 0.533 0.012 0.749 0.250 pobject.py:99(evaluate)
283 ...
Guido van Rossume47da0a1997-07-17 16:34:52 +0000284\end{verbatim}\ecode
Guido van Rossum96628a91995-04-10 11:34:00 +0000285}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000286
287The first line indicates that this profile was generated by the call:\\
288\code{profile.run('main()')}, and hence the exec'ed string is
289\code{'main()'}. The second line indicates that 2706 calls were
290monitored. Of those calls, 2004 were \dfn{primitive}. We define
291\dfn{primitive} to mean that the call was not induced via recursion.
292The next line: \code{Ordered by:\ standard name}, indicates that
293the text string in the far right column was used to sort the output.
294The column headings include:
295
296\begin{description}
297
298\item[ncalls ]
299for the number of calls,
300
301\item[tottime ]
302for the total time spent in the given function (and excluding time
303made in calls to sub-functions),
304
305\item[percall ]
306is the quotient of \code{tottime} divided by \code{ncalls}
307
308\item[cumtime ]
309is the total time spent in this and all subfunctions (i.e., from
310invocation till exit). This figure is accurate \emph{even} for recursive
311functions.
312
313\item[percall ]
314is the quotient of \code{cumtime} divided by primitive calls
315
316\item[filename:lineno(function) ]
317provides the respective data of each function
318
319\end{description}
320
321When there are two numbers in the first column (e.g.: \samp{43/3}),
322then the latter is the number of primitive calls, and the former is
323the actual number of calls. Note that when the function does not
324recurse, these two values are the same, and only the single figure is
325printed.
Guido van Rossum96628a91995-04-10 11:34:00 +0000326
Guido van Rossumdf804f81995-03-02 12:38:39 +0000327\end{funcdesc}
328
329\begin{funcdesc}{pstats.Stats}{filename\optional{\, ...}}
330This class constructor creates an instance of a ``statistics object''
331from a \var{filename} (or set of filenames). \code{Stats} objects are
332manipulated by methods, in order to print useful reports.
333
334The file selected by the above constructor must have been created by
335the corresponding version of \code{profile}. To be specific, there is
336\emph{NO} file compatibility guaranteed with future versions of this
337profiler, and there is no compatibility with files produced by other
338profilers (e.g., the old system profiler).
339
340If several files are provided, all the statistics for identical
341functions will be coalesced, so that an overall view of several
342processes can be considered in a single report. If additional files
343need to be combined with data in an existing \code{Stats} object, the
344\code{add()} method can be used.
345\end{funcdesc}
346
347
Guido van Rossum470be141995-03-17 16:07:09 +0000348\subsection{The \sectcode{Stats} Class}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000349
350\renewcommand{\indexsubitem}{(Stats method)}
351
352\begin{funcdesc}{strip_dirs}{}
Guido van Rossum470be141995-03-17 16:07:09 +0000353This method for the \code{Stats} class removes all leading path information
Guido van Rossumdf804f81995-03-02 12:38:39 +0000354from file names. It is very useful in reducing the size of the
355printout to fit within (close to) 80 columns. This method modifies
356the object, and the stripped information is lost. After performing a
357strip operation, the object is considered to have its entries in a
358``random'' order, as it was just after object initialization and
359loading. If \code{strip_dirs()} causes two function names to be
360indistinguishable (i.e., they are on the same line of the same
361filename, and have the same function name), then the statistics for
362these two entries are accumulated into a single entry.
363\end{funcdesc}
364
365
366\begin{funcdesc}{add}{filename\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000367This method of the \code{Stats} class accumulates additional profiling
Guido van Rossumdf804f81995-03-02 12:38:39 +0000368information into the current profiling object. Its arguments should
369refer to filenames created by the corresponding version of
370\code{profile.run()}. Statistics for identically named (re: file,
371line, name) functions are automatically accumulated into single
372function statistics.
373\end{funcdesc}
374
375\begin{funcdesc}{sort_stats}{key\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000376This method modifies the \code{Stats} object by sorting it according to the
Guido van Rossumdf804f81995-03-02 12:38:39 +0000377supplied criteria. The argument is typically a string identifying the
378basis of a sort (example: \code{"time"} or \code{"name"}).
379
380When more than one key is provided, then additional keys are used as
381secondary criteria when the there is equality in all keys selected
382before them. For example, sort_stats('name', 'file') will sort all
383the entries according to their function name, and resolve all ties
384(identical function names) by sorting by file name.
385
386Abbreviations can be used for any key names, as long as the
387abbreviation is unambiguous. The following are the keys currently
388defined:
389
390\begin{tableii}{|l|l|}{code}{Valid Arg}{Meaning}
391\lineii{"calls"}{call count}
392\lineii{"cumulative"}{cumulative time}
393\lineii{"file"}{file name}
394\lineii{"module"}{file name}
395\lineii{"pcalls"}{primitive call count}
396\lineii{"line"}{line number}
397\lineii{"name"}{function name}
398\lineii{"nfl"}{name/file/line}
399\lineii{"stdname"}{standard name}
400\lineii{"time"}{internal time}
401\end{tableii}
402
403Note that all sorts on statistics are in descending order (placing
404most time consuming items first), where as name, file, and line number
405searches are in ascending order (i.e., alphabetical). The subtle
406distinction between \code{"nfl"} and \code{"stdname"} is that the
407standard name is a sort of the name as printed, which means that the
408embedded line numbers get compared in an odd way. For example, lines
4093, 20, and 40 would (if the file names were the same) appear in the
410string order 20, 3 and 40. In contrast, \code{"nfl"} does a numeric
411compare of the line numbers. In fact, \code{sort_stats("nfl")} is the
412same as \code{sort_stats("name", "file", "line")}.
413
414For compatibility with the old profiler, the numeric arguments
415\samp{-1}, \samp{0}, \samp{1}, and \samp{2} are permitted. They are
416interpreted as \code{"stdname"}, \code{"calls"}, \code{"time"}, and
417\code{"cumulative"} respectively. If this old style format (numeric)
418is used, only one sort key (the numeric key) will be used, and
419additional arguments will be silently ignored.
420\end{funcdesc}
421
422
423\begin{funcdesc}{reverse_order}{}
Guido van Rossum470be141995-03-17 16:07:09 +0000424This method for the \code{Stats} class reverses the ordering of the basic
Guido van Rossumdf804f81995-03-02 12:38:39 +0000425list within the object. This method is provided primarily for
426compatibility with the old profiler. Its utility is questionable
427now that ascending vs descending order is properly selected based on
428the sort key of choice.
429\end{funcdesc}
430
431\begin{funcdesc}{print_stats}{restriction\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000432This method for the \code{Stats} class prints out a report as described
Guido van Rossumdf804f81995-03-02 12:38:39 +0000433in the \code{profile.run()} definition.
434
435The order of the printing is based on the last \code{sort_stats()}
436operation done on the object (subject to caveats in \code{add()} and
437\code{strip_dirs())}.
438
439The arguments provided (if any) can be used to limit the list down to
440the significant entries. Initially, the list is taken to be the
441complete set of profiled functions. Each restriction is either an
442integer (to select a count of lines), or a decimal fraction between
4430.0 and 1.0 inclusive (to select a percentage of lines), or a regular
444expression (to pattern match the standard name that is printed). If
445several restrictions are provided, then they are applied sequentially.
446For example:
447
Guido van Rossume47da0a1997-07-17 16:34:52 +0000448\bcode\begin{verbatim}
449print_stats(.1, "foo:")
450\end{verbatim}\ecode
451%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000452would first limit the printing to first 10\% of list, and then only
453print functions that were part of filename \samp{.*foo:}. In
454contrast, the command:
455
Guido van Rossume47da0a1997-07-17 16:34:52 +0000456\bcode\begin{verbatim}
457print_stats("foo:", .1)
458\end{verbatim}\ecode
459%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000460would limit the list to all functions having file names \samp{.*foo:},
461and then proceed to only print the first 10\% of them.
462\end{funcdesc}
463
464
465\begin{funcdesc}{print_callers}{restrictions\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000466This method for the \code{Stats} class prints a list of all functions
Guido van Rossumdf804f81995-03-02 12:38:39 +0000467that called each function in the profiled database. The ordering is
468identical to that provided by \code{print_stats()}, and the definition
469of the restricting argument is also identical. For convenience, a
470number is shown in parentheses after each caller to show how many
471times this specific call was made. A second non-parenthesized number
472is the cumulative time spent in the function at the right.
473\end{funcdesc}
474
475\begin{funcdesc}{print_callees}{restrictions\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000476This method for the \code{Stats} class prints a list of all function
Guido van Rossumdf804f81995-03-02 12:38:39 +0000477that were called by the indicated function. Aside from this reversal
478of direction of calls (re: called vs was called by), the arguments and
479ordering are identical to the \code{print_callers()} method.
480\end{funcdesc}
481
482\begin{funcdesc}{ignore}{}
Guido van Rossum470be141995-03-17 16:07:09 +0000483This method of the \code{Stats} class is used to dispose of the value
Guido van Rossumdf804f81995-03-02 12:38:39 +0000484returned by earlier methods. All standard methods in this class
485return the instance that is being processed, so that the commands can
486be strung together. For example:
487
Guido van Rossume47da0a1997-07-17 16:34:52 +0000488\bcode\begin{verbatim}
Guido van Rossum96628a91995-04-10 11:34:00 +0000489pstats.Stats('foofile').strip_dirs().sort_stats('cum') \
490 .print_stats().ignore()
Guido van Rossume47da0a1997-07-17 16:34:52 +0000491\end{verbatim}\ecode
492%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000493would perform all the indicated functions, but it would not return
Guido van Rossum470be141995-03-17 16:07:09 +0000494the final reference to the \code{Stats} instance.%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000495\footnote{
496This was once necessary, when Python would print any unused expression
497result that was not \code{None}. The method is still defined for
498backward compatibility.
499}
500\end{funcdesc}
501
502
503\section{Limitations}
504
505There are two fundamental limitations on this profiler. The first is
506that it relies on the Python interpreter to dispatch \dfn{call},
507\dfn{return}, and \dfn{exception} events. Compiled C code does not
508get interpreted, and hence is ``invisible'' to the profiler. All time
509spent in C code (including builtin functions) will be charged to the
Guido van Rossumcca8d2b1995-03-22 15:48:46 +0000510Python function that invoked the C code. If the C code calls out
Guido van Rossumdf804f81995-03-02 12:38:39 +0000511to some native Python code, then those calls will be profiled
512properly.
513
514The second limitation has to do with accuracy of timing information.
515There is a fundamental problem with deterministic profilers involving
516accuracy. The most obvious restriction is that the underlying ``clock''
517is only ticking at a rate (typically) of about .001 seconds. Hence no
518measurements will be more accurate that that underlying clock. If
519enough measurements are taken, then the ``error'' will tend to average
520out. Unfortunately, removing this first error induces a second source
521of error...
522
523The second problem is that it ``takes a while'' from when an event is
524dispatched until the profiler's call to get the time actually
525\emph{gets} the state of the clock. Similarly, there is a certain lag
526when exiting the profiler event handler from the time that the clock's
527value was obtained (and then squirreled away), until the user's code
528is once again executing. As a result, functions that are called many
529times, or call many functions, will typically accumulate this error.
530The error that accumulates in this fashion is typically less than the
531accuracy of the clock (i.e., less than one clock tick), but it
532\emph{can} accumulate and become very significant. This profiler
533provides a means of calibrating itself for a given platform so that
534this error can be probabilistically (i.e., on the average) removed.
535After the profiler is calibrated, it will be more accurate (in a least
536square sense), but it will sometimes produce negative numbers (when
537call counts are exceptionally low, and the gods of probability work
538against you :-). ) Do \emph{NOT} be alarmed by negative numbers in
539the profile. They should \emph{only} appear if you have calibrated
540your profiler, and the results are actually better than without
541calibration.
542
543
544\section{Calibration}
545
546The profiler class has a hard coded constant that is added to each
547event handling time to compensate for the overhead of calling the time
548function, and socking away the results. The following procedure can
549be used to obtain this constant for a given platform (see discussion
550in section Limitations above).
551
Guido van Rossume47da0a1997-07-17 16:34:52 +0000552\bcode\begin{verbatim}
553import profile
554pr = profile.Profile()
555pr.calibrate(100)
556pr.calibrate(100)
557pr.calibrate(100)
558\end{verbatim}\ecode
559%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000560The argument to calibrate() is the number of times to try to do the
561sample calls to get the CPU times. If your computer is \emph{very}
562fast, you might have to do:
563
Guido van Rossume47da0a1997-07-17 16:34:52 +0000564\bcode\begin{verbatim}
565pr.calibrate(1000)
566\end{verbatim}\ecode
567%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000568or even:
569
Guido van Rossume47da0a1997-07-17 16:34:52 +0000570\bcode\begin{verbatim}
571pr.calibrate(10000)
572\end{verbatim}\ecode
573%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000574The object of this exercise is to get a fairly consistent result.
575When you have a consistent answer, you are ready to use that number in
576the source code. For a Sun Sparcstation 1000 running Solaris 2.3, the
577magical number is about .00053. If you have a choice, you are better
578off with a smaller constant, and your results will ``less often'' show
579up as negative in profile statistics.
580
581The following shows how the trace_dispatch() method in the Profile
582class should be modified to install the calibration constant on a Sun
583Sparcstation 1000:
584
Guido van Rossume47da0a1997-07-17 16:34:52 +0000585\bcode\begin{verbatim}
586def trace_dispatch(self, frame, event, arg):
587 t = self.timer()
588 t = t[0] + t[1] - self.t - .00053 # Calibration constant
589
590 if self.dispatch[event](frame,t):
Guido van Rossumdf804f81995-03-02 12:38:39 +0000591 t = self.timer()
Guido van Rossume47da0a1997-07-17 16:34:52 +0000592 self.t = t[0] + t[1]
593 else:
594 r = self.timer()
595 self.t = r[0] + r[1] - t # put back unrecorded delta
596 return
597\end{verbatim}\ecode
598%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000599Note that if there is no calibration constant, then the line
600containing the callibration constant should simply say:
601
Guido van Rossume47da0a1997-07-17 16:34:52 +0000602\bcode\begin{verbatim}
603t = t[0] + t[1] - self.t # no calibration constant
604\end{verbatim}\ecode
605%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000606You can also achieve the same results using a derived class (and the
607profiler will actually run equally fast!!), but the above method is
608the simplest to use. I could have made the profiler ``self
609calibrating'', but it would have made the initialization of the
610profiler class slower, and would have required some \emph{very} fancy
611coding, or else the use of a variable where the constant \samp{.00053}
612was placed in the code shown. This is a \strong{VERY} critical
613performance section, and there is no reason to use a variable lookup
614at this point, when a constant can be used.
615
616
Guido van Rossum86cb0921995-03-20 12:59:56 +0000617\section{Extensions --- Deriving Better Profilers}
618\nodename{Profiler Extensions}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000619
620The \code{Profile} class of module \code{profile} was written so that
621derived classes could be developed to extend the profiler. Rather
622than describing all the details of such an effort, I'll just present
623the following two examples of derived classes that can be used to do
624profiling. If the reader is an avid Python programmer, then it should
625be possible to use these as a model and create similar (and perchance
626better) profile classes.
627
628If all you want to do is change how the timer is called, or which
629timer function is used, then the basic class has an option for that in
630the constructor for the class. Consider passing the name of a
631function to call into the constructor:
632
Guido van Rossume47da0a1997-07-17 16:34:52 +0000633\bcode\begin{verbatim}
634pr = profile.Profile(your_time_func)
635\end{verbatim}\ecode
636%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000637The resulting profiler will call \code{your_time_func()} instead of
638\code{os.times()}. The function should return either a single number
639or a list of numbers (like what \code{os.times()} returns). If the
640function returns a single time number, or the list of returned numbers
641has length 2, then you will get an especially fast version of the
642dispatch routine.
643
644Be warned that you \emph{should} calibrate the profiler class for the
645timer function that you choose. For most machines, a timer that
646returns a lone integer value will provide the best results in terms of
647low overhead during profiling. (os.times is \emph{pretty} bad, 'cause
648it returns a tuple of floating point values, so all arithmetic is
649floating point in the profiler!). If you want to substitute a
650better timer in the cleanest fashion, you should derive a class, and
651simply put in the replacement dispatch method that better handles your
652timer call, along with the appropriate calibration constant :-).
653
654
655\subsection{OldProfile Class}
656
657The following derived profiler simulates the old style profiler,
658providing errant results on recursive functions. The reason for the
659usefulness of this profiler is that it runs faster (i.e., less
660overhead) than the old profiler. It still creates all the caller
661stats, and is quite useful when there is \emph{no} recursion in the
662user's code. It is also a lot more accurate than the old profiler, as
663it does not charge all its overhead time to the user's code.
664
Guido van Rossume47da0a1997-07-17 16:34:52 +0000665\bcode\begin{verbatim}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000666class OldProfile(Profile):
667
668 def trace_dispatch_exception(self, frame, t):
669 rt, rtt, rct, rfn, rframe, rcur = self.cur
670 if rcur and not rframe is frame:
671 return self.trace_dispatch_return(rframe, t)
672 return 0
673
674 def trace_dispatch_call(self, frame, t):
675 fn = `frame.f_code`
676
677 self.cur = (t, 0, 0, fn, frame, self.cur)
678 if self.timings.has_key(fn):
679 tt, ct, callers = self.timings[fn]
680 self.timings[fn] = tt, ct, callers
681 else:
682 self.timings[fn] = 0, 0, {}
683 return 1
684
685 def trace_dispatch_return(self, frame, t):
686 rt, rtt, rct, rfn, frame, rcur = self.cur
687 rtt = rtt + t
688 sft = rtt + rct
689
690 pt, ptt, pct, pfn, pframe, pcur = rcur
691 self.cur = pt, ptt+rt, pct+sft, pfn, pframe, pcur
692
693 tt, ct, callers = self.timings[rfn]
694 if callers.has_key(pfn):
695 callers[pfn] = callers[pfn] + 1
696 else:
697 callers[pfn] = 1
698 self.timings[rfn] = tt+rtt, ct + sft, callers
699
700 return 1
701
702
703 def snapshot_stats(self):
704 self.stats = {}
705 for func in self.timings.keys():
706 tt, ct, callers = self.timings[func]
707 nor_func = self.func_normalize(func)
708 nor_callers = {}
709 nc = 0
710 for func_caller in callers.keys():
711 nor_callers[self.func_normalize(func_caller)]=\
712 callers[func_caller]
713 nc = nc + callers[func_caller]
714 self.stats[nor_func] = nc, nc, tt, ct, nor_callers
Guido van Rossume47da0a1997-07-17 16:34:52 +0000715\end{verbatim}\ecode
716%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000717\subsection{HotProfile Class}
718
719This profiler is the fastest derived profile example. It does not
720calculate caller-callee relationships, and does not calculate
721cumulative time under a function. It only calculates time spent in a
722function, so it runs very quickly (re: very low overhead). In truth,
723the basic profiler is so fast, that is probably not worth the savings
724to give up the data, but this class still provides a nice example.
725
Guido van Rossume47da0a1997-07-17 16:34:52 +0000726\bcode\begin{verbatim}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000727class HotProfile(Profile):
728
729 def trace_dispatch_exception(self, frame, t):
730 rt, rtt, rfn, rframe, rcur = self.cur
731 if rcur and not rframe is frame:
732 return self.trace_dispatch_return(rframe, t)
733 return 0
734
735 def trace_dispatch_call(self, frame, t):
736 self.cur = (t, 0, frame, self.cur)
737 return 1
738
739 def trace_dispatch_return(self, frame, t):
740 rt, rtt, frame, rcur = self.cur
741
742 rfn = `frame.f_code`
743
744 pt, ptt, pframe, pcur = rcur
745 self.cur = pt, ptt+rt, pframe, pcur
746
747 if self.timings.has_key(rfn):
748 nc, tt = self.timings[rfn]
749 self.timings[rfn] = nc + 1, rt + rtt + tt
750 else:
751 self.timings[rfn] = 1, rt + rtt
752
753 return 1
754
755
756 def snapshot_stats(self):
757 self.stats = {}
758 for func in self.timings.keys():
759 nc, tt = self.timings[func]
760 nor_func = self.func_normalize(func)
761 self.stats[nor_func] = nc, nc, tt, 0, {}
Guido van Rossume47da0a1997-07-17 16:34:52 +0000762\end{verbatim}\ecode