blob: ecf41c36dd4bbba0069e2a75d2fb4649c811e760 [file] [log] [blame]
Guido van Rossumdf804f81995-03-02 12:38:39 +00001\chapter{The Python Profiler}
2\stmodindex{profile}
3\stmodindex{pstats}
4
Guido van Rossum470be141995-03-17 16:07:09 +00005Copyright \copyright\ 1994, by InfoSeek Corporation, all rights reserved.
Guido van Rossumdf804f81995-03-02 12:38:39 +00006
7Written by James Roskind%
8\footnote{
Guido van Rossum6c4f0031995-03-07 10:14:09 +00009Updated and converted to \LaTeX\ by Guido van Rossum. The references to
Guido van Rossumdf804f81995-03-02 12:38:39 +000010the old profiler are left in the text, although it no longer exists.
11}
12
13Permission to use, copy, modify, and distribute this Python software
14and its associated documentation for any purpose (subject to the
15restriction in the following sentence) without fee is hereby granted,
16provided that the above copyright notice appears in all copies, and
17that both that copyright notice and this permission notice appear in
18supporting documentation, and that the name of InfoSeek not be used in
19advertising or publicity pertaining to distribution of the software
20without specific, written prior permission. This permission is
21explicitly restricted to the copying and modification of the software
22to remain in Python, compiled Python, or other languages (such as C)
23wherein the modified or derived code is exclusively imported into a
24Python module.
25
26INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
27SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
28FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY
29SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
30RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
31CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
32CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
33
34
35The profiler was written after only programming in Python for 3 weeks.
36As a result, it is probably clumsy code, but I don't know for sure yet
37'cause I'm a beginner :-). I did work hard to make the code run fast,
38so that profiling would be a reasonable thing to do. I tried not to
39repeat code fragments, but I'm sure I did some stuff in really awkward
40ways at times. Please send suggestions for improvements to:
41\code{jar@infoseek.com}. I won't promise \emph{any} support. ...but
42I'd appreciate the feedback.
43
44
Guido van Rossum470be141995-03-17 16:07:09 +000045\section{Introduction to the profiler}
Guido van Rossum86cb0921995-03-20 12:59:56 +000046\nodename{Profiler Introduction}
Guido van Rossumdf804f81995-03-02 12:38:39 +000047
48A \dfn{profiler} is a program that describes the run time performance
49of a program, providing a variety of statistics. This documentation
50describes the profiler functionality provided in the modules
51\code{profile} and \code{pstats.} This profiler provides
52\dfn{deterministic profiling} of any Python programs. It also
53provides a series of report generation tools to allow users to rapidly
54examine the results of a profile operation.
55
56
57\section{How Is This Profiler Different From The Old Profiler?}
Guido van Rossum86cb0921995-03-20 12:59:56 +000058\nodename{Profiler Changes}
Guido van Rossumdf804f81995-03-02 12:38:39 +000059
60The big changes from old profiling module are that you get more
61information, and you pay less CPU time. It's not a trade-off, it's a
62trade-up.
63
64To be specific:
65
66\begin{description}
67
68\item[Bugs removed:]
69Local stack frame is no longer molested, execution time is now charged
70to correct functions.
71
72\item[Accuracy increased:]
73Profiler execution time is no longer charged to user's code,
74calibration for platform is supported, file reads are not done \emph{by}
75profiler \emph{during} profiling (and charged to user's code!).
76
77\item[Speed increased:]
78Overhead CPU cost was reduced by more than a factor of two (perhaps a
79factor of five), lightweight profiler module is all that must be
80loaded, and the report generating module (\code{pstats}) is not needed
81during profiling.
82
83\item[Recursive functions support:]
84Cumulative times in recursive functions are correctly calculated;
85recursive entries are counted.
86
87\item[Large growth in report generating UI:]
88Distinct profiles runs can be added together forming a comprehensive
89report; functions that import statistics take arbitrary lists of
90files; sorting criteria is now based on keywords (instead of 4 integer
91options); reports shows what functions were profiled as well as what
92profile file was referenced; output format has been improved.
93
94\end{description}
95
96
97\section{Instant Users Manual}
98
99This section is provided for users that ``don't want to read the
100manual.'' It provides a very brief overview, and allows a user to
101rapidly perform profiling on an existing application.
102
103To profile an application with a main entry point of \samp{foo()}, you
104would add the following to your module:
105
106\begin{verbatim}
107 import profile
108 profile.run("foo()")
109\end{verbatim}
110
111The above action would cause \samp{foo()} to be run, and a series of
112informative lines (the profile) to be printed. The above approach is
113most useful when working with the interpreter. If you would like to
114save the results of a profile into a file for later examination, you
115can supply a file name as the second argument to the \code{run()}
116function:
117
118\begin{verbatim}
119 import profile
120 profile.run("foo()", 'fooprof')
121\end{verbatim}
122
123When you wish to review the profile, you should use the methods in the
124\code{pstats} module. Typically you would load the statistics data as
125follows:
126
127\begin{verbatim}
128 import pstats
129 p = pstats.Stats('fooprof')
130\end{verbatim}
131
132The class \code{Stats} (the above code just created an instance of
133this class) has a variety of methods for manipulating and printing the
134data that was just read into \samp{p}. When you ran
135\code{profile.run()} above, what was printed was the result of three
136method calls:
137
138\begin{verbatim}
139 p.strip_dirs().sort_stats(-1).print_stats()
140\end{verbatim}
141
142The first method removed the extraneous path from all the module
143names. The second method sorted all the entries according to the
144standard module/line/name string that is printed (this is to comply
145with the semantics of the old profiler). The third method printed out
146all the statistics. You might try the following sort calls:
147
148\begin{verbatim}
149 p.sort_stats('name')
150 p.print_stats()
151\end{verbatim}
152
153The first call will actually sort the list by function name, and the
154second call will print out the statistics. The following are some
155interesting calls to experiment with:
156
157\begin{verbatim}
158 p.sort_stats('cumulative').print_stats(10)
159\end{verbatim}
160
161This sorts the profile by cumulative time in a function, and then only
162prints the ten most significant lines. If you want to understand what
163algorithms are taking time, the above line is what you would use.
164
165If you were looking to see what functions were looping a lot, and
166taking a lot of time, you would do:
167
168\begin{verbatim}
169 p.sort_stats('time').print_stats(10)
170\end{verbatim}
171
172to sort according to time spent within each function, and then print
173the statistics for the top ten functions.
174
175You might also try:
176
177\begin{verbatim}
178 p.sort_stats('file').print_stats('__init__')
179\end{verbatim}
180
181This will sort all the statistics by file name, and then print out
182statistics for only the class init methods ('cause they are spelled
183with \code{__init__} in them). As one final example, you could try:
184
185\begin{verbatim}
186 p.sort_stats('time', 'cum').print_stats(.5, 'init')
187\end{verbatim}
188
189This line sorts statistics with a primary key of time, and a secondary
190key of cumulative time, and then prints out some of the statistics.
191To be specific, the list is first culled down to 50\% (re: \samp{.5})
192of its original size, then only lines containing \code{init} are
193maintained, and that sub-sub-list is printed.
194
195If you wondered what functions called the above functions, you could
196now (\samp{p} is still sorted according to the last criteria) do:
197
198\begin{verbatim}
199 p.print_callers(.5, 'init')
200\end{verbatim}
201
202and you would get a list of callers for each of the listed functions.
203
204If you want more functionality, you're going to have to read the
205manual, or guess what the following functions do:
206
207\begin{verbatim}
208 p.print_callees()
209 p.add('fooprof')
210\end{verbatim}
211
212
213\section{What Is Deterministic Profiling?}
Guido van Rossum86cb0921995-03-20 12:59:56 +0000214\nodename{Deterministic Profiling}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000215
216\dfn{Deterministic profiling} is meant to reflect the fact that all
217\dfn{function call}, \dfn{function return}, and \dfn{exception} events
218are monitored, and precise timings are made for the intervals between
219these events (during which time the user's code is executing). In
220contrast, \dfn{statistical profiling} (which is not done by this
221module) randomly samples the effective instruction pointer, and
222deduces where time is being spent. The latter technique traditionally
223involves less overhead (as the code does not need to be instrumented),
224but provides only relative indications of where time is being spent.
225
226In Python, since there is an interpreter active during execution, the
227presence of instrumented code is not required to do deterministic
228profiling. Python automatically provides a \dfn{hook} (optional
229callback) for each event. In addition, the interpreted nature of
230Python tends to add so much overhead to execution, that deterministic
231profiling tends to only add small processing overhead in typical
232applications. The result is that deterministic profiling is not that
233expensive, yet provides extensive run time statistics about the
234execution of a Python program.
235
236Call count statistics can be used to identify bugs in code (surprising
237counts), and to identify possible inline-expansion points (high call
238counts). Internal time statistics can be used to identify ``hot
239loops'' that should be carefully optimized. Cumulative time
240statistics should be used to identify high level errors in the
241selection of algorithms. Note that the unusual handling of cumulative
242times in this profiler allows statistics for recursive implementations
243of algorithms to be directly compared to iterative implementations.
244
245
246\section{Reference Manual}
247
Guido van Rossum470be141995-03-17 16:07:09 +0000248\renewcommand{\indexsubitem}{(profiler function)}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000249
250The primary entry point for the profiler is the global function
251\code{profile.run()}. It is typically used to create any profile
252information. The reports are formatted and printed using methods of
253the class \code{pstats.Stats}. The following is a description of all
254of these standard entry points and functions. For a more in-depth
255view of some of the code, consider reading the later section on
256Profiler Extensions, which includes discussion of how to derive
257``better'' profilers from the classes presented, or reading the source
258code for these modules.
259
Guido van Rossum470be141995-03-17 16:07:09 +0000260\begin{funcdesc}{profile.run}{string\optional{\, filename\optional{\, ...}}}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000261
262This function takes a single argument that has can be passed to the
263\code{exec} statement, and an optional file name. In all cases this
264routine attempts to \code{exec} its first argument, and gather profiling
265statistics from the execution. If no file name is present, then this
266function automatically prints a simple profiling report, sorted by the
267standard name string (file/line/function-name) that is presented in
268each line. The following is a typical output from such a call:
269
270\begin{verbatim}
271 main()
272 2706 function calls (2004 primitive calls) in 4.504 CPU seconds
273
274 Ordered by: standard name
275
276 ncalls tottime percall cumtime percall filename:lineno(function)
277 2 0.006 0.003 0.953 0.477 pobject.py:75(save_objects)
278 43/3 0.533 0.012 0.749 0.250 pobject.py:99(evaluate)
279 ...
280\end{verbatim}
281
282The first line indicates that this profile was generated by the call:\\
283\code{profile.run('main()')}, and hence the exec'ed string is
284\code{'main()'}. The second line indicates that 2706 calls were
285monitored. Of those calls, 2004 were \dfn{primitive}. We define
286\dfn{primitive} to mean that the call was not induced via recursion.
287The next line: \code{Ordered by:\ standard name}, indicates that
288the text string in the far right column was used to sort the output.
289The column headings include:
290
291\begin{description}
292
293\item[ncalls ]
294for the number of calls,
295
296\item[tottime ]
297for the total time spent in the given function (and excluding time
298made in calls to sub-functions),
299
300\item[percall ]
301is the quotient of \code{tottime} divided by \code{ncalls}
302
303\item[cumtime ]
304is the total time spent in this and all subfunctions (i.e., from
305invocation till exit). This figure is accurate \emph{even} for recursive
306functions.
307
308\item[percall ]
309is the quotient of \code{cumtime} divided by primitive calls
310
311\item[filename:lineno(function) ]
312provides the respective data of each function
313
314\end{description}
315
316When there are two numbers in the first column (e.g.: \samp{43/3}),
317then the latter is the number of primitive calls, and the former is
318the actual number of calls. Note that when the function does not
319recurse, these two values are the same, and only the single figure is
320printed.
321\end{funcdesc}
322
323\begin{funcdesc}{pstats.Stats}{filename\optional{\, ...}}
324This class constructor creates an instance of a ``statistics object''
325from a \var{filename} (or set of filenames). \code{Stats} objects are
326manipulated by methods, in order to print useful reports.
327
328The file selected by the above constructor must have been created by
329the corresponding version of \code{profile}. To be specific, there is
330\emph{NO} file compatibility guaranteed with future versions of this
331profiler, and there is no compatibility with files produced by other
332profilers (e.g., the old system profiler).
333
334If several files are provided, all the statistics for identical
335functions will be coalesced, so that an overall view of several
336processes can be considered in a single report. If additional files
337need to be combined with data in an existing \code{Stats} object, the
338\code{add()} method can be used.
339\end{funcdesc}
340
341
Guido van Rossum470be141995-03-17 16:07:09 +0000342\subsection{The \sectcode{Stats} Class}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000343
344\renewcommand{\indexsubitem}{(Stats method)}
345
346\begin{funcdesc}{strip_dirs}{}
Guido van Rossum470be141995-03-17 16:07:09 +0000347This method for the \code{Stats} class removes all leading path information
Guido van Rossumdf804f81995-03-02 12:38:39 +0000348from file names. It is very useful in reducing the size of the
349printout to fit within (close to) 80 columns. This method modifies
350the object, and the stripped information is lost. After performing a
351strip operation, the object is considered to have its entries in a
352``random'' order, as it was just after object initialization and
353loading. If \code{strip_dirs()} causes two function names to be
354indistinguishable (i.e., they are on the same line of the same
355filename, and have the same function name), then the statistics for
356these two entries are accumulated into a single entry.
357\end{funcdesc}
358
359
360\begin{funcdesc}{add}{filename\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000361This method of the \code{Stats} class accumulates additional profiling
Guido van Rossumdf804f81995-03-02 12:38:39 +0000362information into the current profiling object. Its arguments should
363refer to filenames created by the corresponding version of
364\code{profile.run()}. Statistics for identically named (re: file,
365line, name) functions are automatically accumulated into single
366function statistics.
367\end{funcdesc}
368
369\begin{funcdesc}{sort_stats}{key\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000370This method modifies the \code{Stats} object by sorting it according to the
Guido van Rossumdf804f81995-03-02 12:38:39 +0000371supplied criteria. The argument is typically a string identifying the
372basis of a sort (example: \code{"time"} or \code{"name"}).
373
374When more than one key is provided, then additional keys are used as
375secondary criteria when the there is equality in all keys selected
376before them. For example, sort_stats('name', 'file') will sort all
377the entries according to their function name, and resolve all ties
378(identical function names) by sorting by file name.
379
380Abbreviations can be used for any key names, as long as the
381abbreviation is unambiguous. The following are the keys currently
382defined:
383
384\begin{tableii}{|l|l|}{code}{Valid Arg}{Meaning}
385\lineii{"calls"}{call count}
386\lineii{"cumulative"}{cumulative time}
387\lineii{"file"}{file name}
388\lineii{"module"}{file name}
389\lineii{"pcalls"}{primitive call count}
390\lineii{"line"}{line number}
391\lineii{"name"}{function name}
392\lineii{"nfl"}{name/file/line}
393\lineii{"stdname"}{standard name}
394\lineii{"time"}{internal time}
395\end{tableii}
396
397Note that all sorts on statistics are in descending order (placing
398most time consuming items first), where as name, file, and line number
399searches are in ascending order (i.e., alphabetical). The subtle
400distinction between \code{"nfl"} and \code{"stdname"} is that the
401standard name is a sort of the name as printed, which means that the
402embedded line numbers get compared in an odd way. For example, lines
4033, 20, and 40 would (if the file names were the same) appear in the
404string order 20, 3 and 40. In contrast, \code{"nfl"} does a numeric
405compare of the line numbers. In fact, \code{sort_stats("nfl")} is the
406same as \code{sort_stats("name", "file", "line")}.
407
408For compatibility with the old profiler, the numeric arguments
409\samp{-1}, \samp{0}, \samp{1}, and \samp{2} are permitted. They are
410interpreted as \code{"stdname"}, \code{"calls"}, \code{"time"}, and
411\code{"cumulative"} respectively. If this old style format (numeric)
412is used, only one sort key (the numeric key) will be used, and
413additional arguments will be silently ignored.
414\end{funcdesc}
415
416
417\begin{funcdesc}{reverse_order}{}
Guido van Rossum470be141995-03-17 16:07:09 +0000418This method for the \code{Stats} class reverses the ordering of the basic
Guido van Rossumdf804f81995-03-02 12:38:39 +0000419list within the object. This method is provided primarily for
420compatibility with the old profiler. Its utility is questionable
421now that ascending vs descending order is properly selected based on
422the sort key of choice.
423\end{funcdesc}
424
425\begin{funcdesc}{print_stats}{restriction\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000426This method for the \code{Stats} class prints out a report as described
Guido van Rossumdf804f81995-03-02 12:38:39 +0000427in the \code{profile.run()} definition.
428
429The order of the printing is based on the last \code{sort_stats()}
430operation done on the object (subject to caveats in \code{add()} and
431\code{strip_dirs())}.
432
433The arguments provided (if any) can be used to limit the list down to
434the significant entries. Initially, the list is taken to be the
435complete set of profiled functions. Each restriction is either an
436integer (to select a count of lines), or a decimal fraction between
4370.0 and 1.0 inclusive (to select a percentage of lines), or a regular
438expression (to pattern match the standard name that is printed). If
439several restrictions are provided, then they are applied sequentially.
440For example:
441
442\begin{verbatim}
443 print_stats(.1, "foo:")
444\end{verbatim}
445
446would first limit the printing to first 10\% of list, and then only
447print functions that were part of filename \samp{.*foo:}. In
448contrast, the command:
449
450\begin{verbatim}
451 print_stats("foo:", .1)
452\end{verbatim}
453
454would limit the list to all functions having file names \samp{.*foo:},
455and then proceed to only print the first 10\% of them.
456\end{funcdesc}
457
458
459\begin{funcdesc}{print_callers}{restrictions\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000460This method for the \code{Stats} class prints a list of all functions
Guido van Rossumdf804f81995-03-02 12:38:39 +0000461that called each function in the profiled database. The ordering is
462identical to that provided by \code{print_stats()}, and the definition
463of the restricting argument is also identical. For convenience, a
464number is shown in parentheses after each caller to show how many
465times this specific call was made. A second non-parenthesized number
466is the cumulative time spent in the function at the right.
467\end{funcdesc}
468
469\begin{funcdesc}{print_callees}{restrictions\optional{\, ...}}
Guido van Rossum470be141995-03-17 16:07:09 +0000470This method for the \code{Stats} class prints a list of all function
Guido van Rossumdf804f81995-03-02 12:38:39 +0000471that were called by the indicated function. Aside from this reversal
472of direction of calls (re: called vs was called by), the arguments and
473ordering are identical to the \code{print_callers()} method.
474\end{funcdesc}
475
476\begin{funcdesc}{ignore}{}
Guido van Rossum470be141995-03-17 16:07:09 +0000477This method of the \code{Stats} class is used to dispose of the value
Guido van Rossumdf804f81995-03-02 12:38:39 +0000478returned by earlier methods. All standard methods in this class
479return the instance that is being processed, so that the commands can
480be strung together. For example:
481
482\begin{verbatim}
483pstats.Stats('foofile').strip_dirs().sort_stats('cum').print_stats().ignore()
484\end{verbatim}
485
486would perform all the indicated functions, but it would not return
Guido van Rossum470be141995-03-17 16:07:09 +0000487the final reference to the \code{Stats} instance.%
Guido van Rossumdf804f81995-03-02 12:38:39 +0000488\footnote{
489This was once necessary, when Python would print any unused expression
490result that was not \code{None}. The method is still defined for
491backward compatibility.
492}
493\end{funcdesc}
494
495
496\section{Limitations}
497
498There are two fundamental limitations on this profiler. The first is
499that it relies on the Python interpreter to dispatch \dfn{call},
500\dfn{return}, and \dfn{exception} events. Compiled C code does not
501get interpreted, and hence is ``invisible'' to the profiler. All time
502spent in C code (including builtin functions) will be charged to the
503Python function that was invoked the C code. If the C code calls out
504to some native Python code, then those calls will be profiled
505properly.
506
507The second limitation has to do with accuracy of timing information.
508There is a fundamental problem with deterministic profilers involving
509accuracy. The most obvious restriction is that the underlying ``clock''
510is only ticking at a rate (typically) of about .001 seconds. Hence no
511measurements will be more accurate that that underlying clock. If
512enough measurements are taken, then the ``error'' will tend to average
513out. Unfortunately, removing this first error induces a second source
514of error...
515
516The second problem is that it ``takes a while'' from when an event is
517dispatched until the profiler's call to get the time actually
518\emph{gets} the state of the clock. Similarly, there is a certain lag
519when exiting the profiler event handler from the time that the clock's
520value was obtained (and then squirreled away), until the user's code
521is once again executing. As a result, functions that are called many
522times, or call many functions, will typically accumulate this error.
523The error that accumulates in this fashion is typically less than the
524accuracy of the clock (i.e., less than one clock tick), but it
525\emph{can} accumulate and become very significant. This profiler
526provides a means of calibrating itself for a given platform so that
527this error can be probabilistically (i.e., on the average) removed.
528After the profiler is calibrated, it will be more accurate (in a least
529square sense), but it will sometimes produce negative numbers (when
530call counts are exceptionally low, and the gods of probability work
531against you :-). ) Do \emph{NOT} be alarmed by negative numbers in
532the profile. They should \emph{only} appear if you have calibrated
533your profiler, and the results are actually better than without
534calibration.
535
536
537\section{Calibration}
538
539The profiler class has a hard coded constant that is added to each
540event handling time to compensate for the overhead of calling the time
541function, and socking away the results. The following procedure can
542be used to obtain this constant for a given platform (see discussion
543in section Limitations above).
544
545\begin{verbatim}
546 import profile
547 pr = profile.Profile()
548 pr.calibrate(100)
549 pr.calibrate(100)
550 pr.calibrate(100)
551\end{verbatim}
552
553The argument to calibrate() is the number of times to try to do the
554sample calls to get the CPU times. If your computer is \emph{very}
555fast, you might have to do:
556
557\begin{verbatim}
558 pr.calibrate(1000)
559\end{verbatim}
560
561or even:
562
563\begin{verbatim}
564 pr.calibrate(10000)
565\end{verbatim}
566
567The object of this exercise is to get a fairly consistent result.
568When you have a consistent answer, you are ready to use that number in
569the source code. For a Sun Sparcstation 1000 running Solaris 2.3, the
570magical number is about .00053. If you have a choice, you are better
571off with a smaller constant, and your results will ``less often'' show
572up as negative in profile statistics.
573
574The following shows how the trace_dispatch() method in the Profile
575class should be modified to install the calibration constant on a Sun
576Sparcstation 1000:
577
578\begin{verbatim}
579 def trace_dispatch(self, frame, event, arg):
580 t = self.timer()
581 t = t[0] + t[1] - self.t - .00053 # Calibration constant
582
583 if self.dispatch[event](frame,t):
584 t = self.timer()
585 self.t = t[0] + t[1]
586 else:
587 r = self.timer()
588 self.t = r[0] + r[1] - t # put back unrecorded delta
589 return
590\end{verbatim}
591
592Note that if there is no calibration constant, then the line
593containing the callibration constant should simply say:
594
595\begin{verbatim}
596 t = t[0] + t[1] - self.t # no calibration constant
597\end{verbatim}
598
599You can also achieve the same results using a derived class (and the
600profiler will actually run equally fast!!), but the above method is
601the simplest to use. I could have made the profiler ``self
602calibrating'', but it would have made the initialization of the
603profiler class slower, and would have required some \emph{very} fancy
604coding, or else the use of a variable where the constant \samp{.00053}
605was placed in the code shown. This is a \strong{VERY} critical
606performance section, and there is no reason to use a variable lookup
607at this point, when a constant can be used.
608
609
Guido van Rossum86cb0921995-03-20 12:59:56 +0000610\section{Extensions --- Deriving Better Profilers}
611\nodename{Profiler Extensions}
Guido van Rossumdf804f81995-03-02 12:38:39 +0000612
613The \code{Profile} class of module \code{profile} was written so that
614derived classes could be developed to extend the profiler. Rather
615than describing all the details of such an effort, I'll just present
616the following two examples of derived classes that can be used to do
617profiling. If the reader is an avid Python programmer, then it should
618be possible to use these as a model and create similar (and perchance
619better) profile classes.
620
621If all you want to do is change how the timer is called, or which
622timer function is used, then the basic class has an option for that in
623the constructor for the class. Consider passing the name of a
624function to call into the constructor:
625
626\begin{verbatim}
627 pr = profile.Profile(your_time_func)
628\end{verbatim}
629
630The resulting profiler will call \code{your_time_func()} instead of
631\code{os.times()}. The function should return either a single number
632or a list of numbers (like what \code{os.times()} returns). If the
633function returns a single time number, or the list of returned numbers
634has length 2, then you will get an especially fast version of the
635dispatch routine.
636
637Be warned that you \emph{should} calibrate the profiler class for the
638timer function that you choose. For most machines, a timer that
639returns a lone integer value will provide the best results in terms of
640low overhead during profiling. (os.times is \emph{pretty} bad, 'cause
641it returns a tuple of floating point values, so all arithmetic is
642floating point in the profiler!). If you want to substitute a
643better timer in the cleanest fashion, you should derive a class, and
644simply put in the replacement dispatch method that better handles your
645timer call, along with the appropriate calibration constant :-).
646
647
648\subsection{OldProfile Class}
649
650The following derived profiler simulates the old style profiler,
651providing errant results on recursive functions. The reason for the
652usefulness of this profiler is that it runs faster (i.e., less
653overhead) than the old profiler. It still creates all the caller
654stats, and is quite useful when there is \emph{no} recursion in the
655user's code. It is also a lot more accurate than the old profiler, as
656it does not charge all its overhead time to the user's code.
657
658\begin{verbatim}
659class OldProfile(Profile):
660
661 def trace_dispatch_exception(self, frame, t):
662 rt, rtt, rct, rfn, rframe, rcur = self.cur
663 if rcur and not rframe is frame:
664 return self.trace_dispatch_return(rframe, t)
665 return 0
666
667 def trace_dispatch_call(self, frame, t):
668 fn = `frame.f_code`
669
670 self.cur = (t, 0, 0, fn, frame, self.cur)
671 if self.timings.has_key(fn):
672 tt, ct, callers = self.timings[fn]
673 self.timings[fn] = tt, ct, callers
674 else:
675 self.timings[fn] = 0, 0, {}
676 return 1
677
678 def trace_dispatch_return(self, frame, t):
679 rt, rtt, rct, rfn, frame, rcur = self.cur
680 rtt = rtt + t
681 sft = rtt + rct
682
683 pt, ptt, pct, pfn, pframe, pcur = rcur
684 self.cur = pt, ptt+rt, pct+sft, pfn, pframe, pcur
685
686 tt, ct, callers = self.timings[rfn]
687 if callers.has_key(pfn):
688 callers[pfn] = callers[pfn] + 1
689 else:
690 callers[pfn] = 1
691 self.timings[rfn] = tt+rtt, ct + sft, callers
692
693 return 1
694
695
696 def snapshot_stats(self):
697 self.stats = {}
698 for func in self.timings.keys():
699 tt, ct, callers = self.timings[func]
700 nor_func = self.func_normalize(func)
701 nor_callers = {}
702 nc = 0
703 for func_caller in callers.keys():
704 nor_callers[self.func_normalize(func_caller)]=\
705 callers[func_caller]
706 nc = nc + callers[func_caller]
707 self.stats[nor_func] = nc, nc, tt, ct, nor_callers
708\end{verbatim}
709
710
711\subsection{HotProfile Class}
712
713This profiler is the fastest derived profile example. It does not
714calculate caller-callee relationships, and does not calculate
715cumulative time under a function. It only calculates time spent in a
716function, so it runs very quickly (re: very low overhead). In truth,
717the basic profiler is so fast, that is probably not worth the savings
718to give up the data, but this class still provides a nice example.
719
720\begin{verbatim}
721class HotProfile(Profile):
722
723 def trace_dispatch_exception(self, frame, t):
724 rt, rtt, rfn, rframe, rcur = self.cur
725 if rcur and not rframe is frame:
726 return self.trace_dispatch_return(rframe, t)
727 return 0
728
729 def trace_dispatch_call(self, frame, t):
730 self.cur = (t, 0, frame, self.cur)
731 return 1
732
733 def trace_dispatch_return(self, frame, t):
734 rt, rtt, frame, rcur = self.cur
735
736 rfn = `frame.f_code`
737
738 pt, ptt, pframe, pcur = rcur
739 self.cur = pt, ptt+rt, pframe, pcur
740
741 if self.timings.has_key(rfn):
742 nc, tt = self.timings[rfn]
743 self.timings[rfn] = nc + 1, rt + rtt + tt
744 else:
745 self.timings[rfn] = 1, rt + rtt
746
747 return 1
748
749
750 def snapshot_stats(self):
751 self.stats = {}
752 for func in self.timings.keys():
753 nc, tt = self.timings[func]
754 nor_func = self.func_normalize(func)
755 self.stats[nor_func] = nc, nc, tt, 0, {}
756\end{verbatim}