\chapter{The Python Profiler}
\stmodindex{profile}
\stmodindex{pstats}

Copyright \copyright\ 1994, by InfoSeek Corporation, all rights reserved.

Written by James Roskind%
\footnote{
Updated and converted to \LaTeX\ by Guido van Rossum. The references to
the old profiler are left in the text, although it no longer exists.
}

Permission to use, copy, modify, and distribute this Python software
and its associated documentation for any purpose (subject to the
restriction in the following sentence) without fee is hereby granted,
provided that the above copyright notice appears in all copies, and
that both that copyright notice and this permission notice appear in
supporting documentation, and that the name of InfoSeek not be used in
advertising or publicity pertaining to distribution of the software
without specific, written prior permission. This permission is
explicitly restricted to the copying and modification of the software
to remain in Python, compiled Python, or other languages (such as C)
wherein the modified or derived code is exclusively imported into a
Python module.

INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.


The profiler was written after only programming in Python for 3 weeks.
As a result, it is probably clumsy code, but I don't know for sure yet
'cause I'm a beginner :-). I did work hard to make the code run fast,
so that profiling would be a reasonable thing to do. I tried not to
repeat code fragments, but I'm sure I did some stuff in really awkward
ways at times. Please send suggestions for improvements to:
\code{jar@infoseek.com}. I won't promise \emph{any} support. ...but
I'd appreciate the feedback.


\section{Introduction to the profiler}

A \dfn{profiler} is a program that describes the run time performance
of a program, providing a variety of statistics. This documentation
describes the profiler functionality provided in the modules
\code{profile} and \code{pstats}. This profiler provides
\dfn{deterministic profiling} of any Python program. It also
provides a series of report generation tools to allow users to rapidly
examine the results of a profile operation.


\section{How Is This Profiler Different From The Old Profiler?}

The big changes from the old profiling module are that you get more
information, and you pay less CPU time. It's not a trade-off, it's a
trade-up.

To be specific:

\begin{description}

\item[Bugs removed:]
Local stack frame is no longer molested, execution time is now charged
to correct functions.

\item[Accuracy increased:]
Profiler execution time is no longer charged to user's code,
calibration for platform is supported, file reads are not done \emph{by}
profiler \emph{during} profiling (and charged to user's code!).

\item[Speed increased:]
Overhead CPU cost was reduced by more than a factor of two (perhaps a
factor of five), lightweight profiler module is all that must be
loaded, and the report generating module (\code{pstats}) is not needed
during profiling.

\item[Recursive functions support:]
Cumulative times in recursive functions are correctly calculated;
recursive entries are counted.

\item[Large growth in report generating UI:]
Distinct profile runs can be added together forming a comprehensive
report; functions that import statistics take arbitrary lists of
files; sorting criteria are now based on keywords (instead of 4 integer
options); reports show what functions were profiled as well as what
profile file was referenced; output format has been improved.

\end{description}


\section{Instant User's Manual}

This section is provided for users who ``don't want to read the
manual.'' It provides a very brief overview, and allows a user to
rapidly perform profiling on an existing application.

To profile an application with a main entry point of \samp{foo()}, you
would add the following to your module:

\begin{verbatim}
    import profile
    profile.run("foo()")
\end{verbatim}

The above action would cause \samp{foo()} to be run, and a series of
informative lines (the profile) to be printed. The above approach is
most useful when working with the interpreter. If you would like to
save the results of a profile into a file for later examination, you
can supply a file name as the second argument to the \code{run()}
function:

\begin{verbatim}
    import profile
    profile.run("foo()", 'fooprof')
\end{verbatim}

When you wish to review the profile, you should use the methods in the
\code{pstats} module. Typically you would load the statistics data as
follows:

\begin{verbatim}
    import pstats
    p = pstats.Stats('fooprof')
\end{verbatim}

The class \code{Stats} (the above code just created an instance of
this class) has a variety of methods for manipulating and printing the
data that was just read into \samp{p}. When you ran
\code{profile.run()} above, what was printed was the result of three
method calls:

\begin{verbatim}
    p.strip_dirs().sort_stats(-1).print_stats()
\end{verbatim}

The first method removed the extraneous path from all the module
names. The second method sorted all the entries according to the
standard module/line/name string that is printed (this is to comply
with the semantics of the old profiler). The third method printed out
all the statistics. You might try the following sort calls:

\begin{verbatim}
    p.sort_stats('name')
    p.print_stats()
\end{verbatim}

The first call will actually sort the list by function name, and the
second call will print out the statistics. The following are some
interesting calls to experiment with:

\begin{verbatim}
    p.sort_stats('cumulative').print_stats(10)
\end{verbatim}

This sorts the profile by cumulative time in a function, and then only
prints the ten most significant lines. If you want to understand what
algorithms are taking time, the above line is what you would use.

If you were looking to see what functions were looping a lot, and
taking a lot of time, you would do:

\begin{verbatim}
    p.sort_stats('time').print_stats(10)
\end{verbatim}

to sort according to time spent within each function, and then print
the statistics for the top ten functions.

You might also try:

\begin{verbatim}
    p.sort_stats('file').print_stats('__init__')
\end{verbatim}

This will sort all the statistics by file name, and then print out
statistics for only the class init methods ('cause they are spelled
with \code{__init__} in them). As one final example, you could try:

\begin{verbatim}
    p.sort_stats('time', 'cum').print_stats(.5, 'init')
\end{verbatim}

This line sorts statistics with a primary key of time, and a secondary
key of cumulative time, and then prints out some of the statistics.
To be specific, the list is first culled down to 50\% (re: \samp{.5})
of its original size, then only lines containing \code{init} are
maintained, and that sub-sub-list is printed.

If you wondered what functions called the above functions, you could
now (\samp{p} is still sorted according to the last criteria) do:

\begin{verbatim}
    p.print_callers(.5, 'init')
\end{verbatim}

and you would get a list of callers for each of the listed functions.

If you want more functionality, you're going to have to read the
manual, or guess what the following functions do:

\begin{verbatim}
    p.print_callees()
    p.add('fooprof')
\end{verbatim}
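
The steps in this section can be sketched end to end as follows. This is
only a sketch, and it assumes a current Python 3 interpreter rather than
the interpreter this text was written for. \code{foo()} is a hypothetical
workload, and \code{runcall()} plus \code{dump_stats()} stand in for
\code{profile.run("foo()", 'fooprof')} so that the sketch does not depend
on \code{foo} being defined in \code{__main__}.

```python
import io
import os
import profile
import pstats
import tempfile


def foo():
    """Hypothetical workload standing in for the application's entry point."""
    return sum(i * i for i in range(2000))


# Collect the statistics and save them to a file for later examination.
pr = profile.Profile()
pr.runcall(foo)
stats_path = os.path.join(tempfile.mkdtemp(), "fooprof")
pr.dump_stats(stats_path)

# Load the saved statistics and print the ten entries with the largest
# cumulative time. The report is captured in a string instead of stdout.
out = io.StringIO()
p = pstats.Stats(stats_path, stream=out)
p.strip_dirs().sort_stats("cumulative").print_stats(10)
report = out.getvalue()
```

Printing \code{report} shows the same kind of table that
\code{profile.run()} prints when no file name is given.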


\section{What Is Deterministic Profiling?}

\dfn{Deterministic profiling} is meant to reflect the fact that all
\dfn{function call}, \dfn{function return}, and \dfn{exception} events
are monitored, and precise timings are made for the intervals between
these events (during which time the user's code is executing). In
contrast, \dfn{statistical profiling} (which is not done by this
module) randomly samples the effective instruction pointer, and
deduces where time is being spent. The latter technique traditionally
involves less overhead (as the code does not need to be instrumented),
but provides only relative indications of where time is being spent.

In Python, since there is an interpreter active during execution, the
presence of instrumented code is not required to do deterministic
profiling. Python automatically provides a \dfn{hook} (optional
callback) for each event. In addition, the interpreted nature of
Python tends to add so much overhead to execution that deterministic
profiling tends to add only a small processing overhead in typical
applications. The result is that deterministic profiling is not that
expensive, yet provides extensive run time statistics about the
execution of a Python program.
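
As a rough illustration of the hook mechanism, the sketch below counts
call events with \code{sys.setprofile()}, which is the name of the hook in
current Python 3 interpreters (an assumption; the text above does not name
it). The helper \code{count\_calls()} and the function \code{fib()} are
hypothetical names used only for this example, and the sketch is not how
the \code{profile} module itself is written.

```python
import sys


def count_calls(func, *args):
    """Run func(*args) and count Python-level call events per function name."""
    counts = {}

    def hook(frame, event, arg):
        # The interpreter invokes this callback for every profiling event;
        # only calls to Python functions are counted here.
        if event == "call":
            name = frame.f_code.co_name
            counts[name] = counts.get(name, 0) + 1

    sys.setprofile(hook)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)   # always remove the hook again
    return result, counts


def fib(n):
    """Naive recursive Fibonacci, used as a call-heavy workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


value, call_counts = count_calls(fib, 5)
print(value, call_counts["fib"])   # 5 15
```

Every call is observed, which is what makes the resulting counts exact
rather than sampled.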

Call count statistics can be used to identify bugs in code (surprising
counts), and to identify possible inline-expansion points (high call
counts). Internal time statistics can be used to identify ``hot
loops'' that should be carefully optimized. Cumulative time
statistics should be used to identify high level errors in the
selection of algorithms. Note that the unusual handling of cumulative
times in this profiler allows statistics for recursive implementations
of algorithms to be directly compared to iterative implementations.


\section{Reference Manual}

\renewcommand{\indexsubitem}{(profiler function)}

The primary entry point for the profiler is the global function
\code{profile.run()}. It is typically used to create any profile
information. The reports are formatted and printed using methods of
the class \code{pstats.Stats}. The following is a description of all
of these standard entry points and functions. For a more in-depth
view of some of the code, consider reading the later section on
Profiler Extensions, which includes discussion of how to derive
``better'' profilers from the classes presented, or reading the source
code for these modules.

\begin{funcdesc}{profile.run}{string\optional{\, filename\optional{\, ...}}}

This function takes a single argument that can be passed to the
\code{exec} statement, and an optional file name. In all cases this
routine attempts to \code{exec} its first argument, and gather profiling
statistics from the execution. If no file name is present, then this
function automatically prints a simple profiling report, sorted by the
standard name string (file/line/function-name) that is presented in
each line. The following is a typical output from such a call:

\begin{verbatim}
      main()
         2706 function calls (2004 primitive calls) in 4.504 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.006    0.003    0.953    0.477 pobject.py:75(save_objects)
     43/3    0.533    0.012    0.749    0.250 pobject.py:99(evaluate)
    ...
\end{verbatim}

The first line indicates that this profile was generated by the call:\\
\code{profile.run('main()')}, and hence the exec'ed string is
\code{'main()'}. The second line indicates that 2706 calls were
monitored. Of those calls, 2004 were \dfn{primitive}. We define
\dfn{primitive} to mean that the call was not induced via recursion.
The next line: \code{Ordered by:\ standard name}, indicates that
the text string in the far right column was used to sort the output.
The column headings include:

\begin{description}

\item[ncalls ]
for the number of calls,

\item[tottime ]
for the total time spent in the given function (and excluding time
made in calls to sub-functions),

\item[percall ]
is the quotient of \code{tottime} divided by \code{ncalls}.

\item[cumtime ]
is the total time spent in this and all subfunctions (i.e., from
invocation till exit). This figure is accurate \emph{even} for recursive
functions.

\item[percall ]
is the quotient of \code{cumtime} divided by primitive calls.

\item[filename:lineno(function) ]
provides the respective data of each function.

\end{description}

When there are two numbers in the first column (e.g.: \samp{43/3}),
then the latter is the number of primitive calls, and the former is
the actual number of calls. Note that when the function does not
recurse, these two values are the same, and only the single figure is
printed.
\end{funcdesc}
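
The two-number form of the \code{ncalls} column can be reproduced with a
small recursive function. The following is a sketch that assumes a current
Python 3 interpreter; \code{fact()} is a hypothetical example function, and
\code{runcall()} is used instead of \code{profile.run()} so that the sketch
does not depend on the \code{__main__} namespace.

```python
import io
import profile
import pstats


def fact(n):
    """Recursive factorial: fact(4) makes 5 calls, only 1 of them primitive."""
    return 1 if n == 0 else n * fact(n - 1)


pr = profile.Profile()
value = pr.runcall(fact, 4)

# Capture the report in a string so that it can be inspected.
out = io.StringIO()
pstats.Stats(pr, stream=out).strip_dirs().sort_stats("stdname").print_stats()
report = out.getvalue()
```

In the captured report the line for \code{fact} shows \samp{5/1} in the
first column: five actual calls, of which one was not induced by recursion.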

\begin{funcdesc}{pstats.Stats}{filename\optional{\, ...}}
This class constructor creates an instance of a ``statistics object''
from a \var{filename} (or set of filenames). \code{Stats} objects are
manipulated by methods, in order to print useful reports.

The file selected by the above constructor must have been created by
the corresponding version of \code{profile}. To be specific, there is
\emph{NO} file compatibility guaranteed with future versions of this
profiler, and there is no compatibility with files produced by other
profilers (e.g., the old system profiler).

If several files are provided, all the statistics for identical
functions will be coalesced, so that an overall view of several
processes can be considered in a single report. If additional files
need to be combined with data in an existing \code{Stats} object, the
\code{add()} method can be used.
\end{funcdesc}
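
A minimal sketch of coalescing several statistics files, assuming a current
Python 3 interpreter. The workload \code{work()} and the file names are
hypothetical, and \code{total\_calls} is an attribute of the CPython
\code{Stats} implementation that is read here only to check the merge.

```python
import os
import profile
import pstats
import tempfile


def work():
    """Hypothetical workload profiled in two separate runs."""
    return sorted(range(100), reverse=True)


# Produce two statistics files from two independent profile runs.
tmpdir = tempfile.mkdtemp()
paths = []
for name in ("run1.prof", "run2.prof"):
    pr = profile.Profile()
    pr.runcall(work)
    path = os.path.join(tmpdir, name)
    pr.dump_stats(path)
    paths.append(path)

# Coalesce both files in the constructor...
merged = pstats.Stats(*paths)
# ...or load one file and accumulate the other with add().
added = pstats.Stats(paths[0]).add(paths[1])

# Per-file totals, loaded separately for comparison.
first = pstats.Stats(paths[0])
second = pstats.Stats(paths[1])
```

Both routes yield one object whose call counts are the sums of the counts
in the individual files.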


\subsection{The \sectcode{Stats} Class}

\renewcommand{\indexsubitem}{(Stats method)}

\begin{funcdesc}{strip_dirs}{}
This method for the \code{Stats} class removes all leading path information
from file names. It is very useful in reducing the size of the
printout to fit within (close to) 80 columns. This method modifies
the object, and the stripped information is lost. After performing a
strip operation, the object is considered to have its entries in a
``random'' order, as it was just after object initialization and
loading. If \code{strip_dirs()} causes two function names to be
indistinguishable (i.e., they are on the same line of the same
filename, and have the same function name), then the statistics for
these two entries are accumulated into a single entry.
\end{funcdesc}


\begin{funcdesc}{add}{filename\optional{\, ...}}
This method of the \code{Stats} class accumulates additional profiling
information into the current profiling object. Its arguments should
refer to filenames created by the corresponding version of
\code{profile.run()}. Statistics for identically named (re: file,
line, name) functions are automatically accumulated into single
function statistics.
\end{funcdesc}

\begin{funcdesc}{sort_stats}{key\optional{\, ...}}
This method modifies the \code{Stats} object by sorting it according to the
supplied criteria. The argument is typically a string identifying the
basis of a sort (example: \code{"time"} or \code{"name"}).

When more than one key is provided, then additional keys are used as
secondary criteria when there is equality in all keys selected
before them. For example, \code{sort_stats('name', 'file')} will sort all
the entries according to their function name, and resolve all ties
(identical function names) by sorting by file name.

Abbreviations can be used for any key names, as long as the
abbreviation is unambiguous. The following are the keys currently
defined:

\begin{tableii}{|l|l|}{code}{Valid Arg}{Meaning}
\lineii{"calls"}{call count}
\lineii{"cumulative"}{cumulative time}
\lineii{"file"}{file name}
\lineii{"module"}{file name}
\lineii{"pcalls"}{primitive call count}
\lineii{"line"}{line number}
\lineii{"name"}{function name}
\lineii{"nfl"}{name/file/line}
\lineii{"stdname"}{standard name}
\lineii{"time"}{internal time}
\end{tableii}

Note that all sorts on statistics are in descending order (placing
most time consuming items first), whereas name, file, and line number
sorts are in ascending order (i.e., alphabetical). The subtle
distinction between \code{"nfl"} and \code{"stdname"} is that the
standard name is a sort of the name as printed, which means that the
embedded line numbers get compared in an odd way. For example, lines
3, 20, and 40 would (if the file names were the same) appear in the
string order 20, 3 and 40. In contrast, \code{"nfl"} does a numeric
compare of the line numbers. In fact, \code{sort_stats("nfl")} is the
same as \code{sort_stats("name", "file", "line")}.

For compatibility with the old profiler, the numeric arguments
\samp{-1}, \samp{0}, \samp{1}, and \samp{2} are permitted. They are
interpreted as \code{"stdname"}, \code{"calls"}, \code{"time"}, and
\code{"cumulative"} respectively. If this old style format (numeric)
is used, only one sort key (the numeric key) will be used, and
additional arguments will be silently ignored.
\end{funcdesc}
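
The difference between comparing line numbers as text and as numbers can
be seen with plain Python sorting. This sketch only mirrors the comparison
described above for \code{"stdname"} and \code{"nfl"}; it is not taken
from the \code{pstats} source.

```python
line_numbers = [3, 20, 40]

# "stdname" compares the printed text, so the digits are compared as
# characters and "20" sorts before "3".
string_order = sorted(str(n) for n in line_numbers)

# "nfl" compares the line numbers numerically.
numeric_order = sorted(line_numbers)

print(string_order)    # ['20', '3', '40']
print(numeric_order)   # [3, 20, 40]
```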


\begin{funcdesc}{reverse_order}{}
This method for the \code{Stats} class reverses the ordering of the basic
list within the object. This method is provided primarily for
compatibility with the old profiler. Its utility is questionable
now that ascending vs descending order is properly selected based on
the sort key of choice.
\end{funcdesc}

\begin{funcdesc}{print_stats}{restriction\optional{\, ...}}
This method for the \code{Stats} class prints out a report as described
in the \code{profile.run()} definition.

The order of the printing is based on the last \code{sort_stats()}
operation done on the object (subject to caveats in \code{add()} and
\code{strip_dirs()}).

The arguments provided (if any) can be used to limit the list down to
the significant entries. Initially, the list is taken to be the
complete set of profiled functions. Each restriction is either an
integer (to select a count of lines), or a decimal fraction between
0.0 and 1.0 inclusive (to select a percentage of lines), or a regular
expression (to pattern match the standard name that is printed). If
several restrictions are provided, then they are applied sequentially.
For example:

\begin{verbatim}
    print_stats(.1, "foo:")
\end{verbatim}

would first limit the printing to the first 10\% of the list, and then only
print functions that were part of filename \samp{.*foo:}. In
contrast, the command:

\begin{verbatim}
    print_stats("foo:", .1)
\end{verbatim}

would limit the list to all functions having file names \samp{.*foo:},
and then proceed to only print the first 10\% of them.
\end{funcdesc}
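
The effect of the order of restrictions can be checked with a small
experiment. The sketch below assumes a current Python 3 interpreter. All
function names in it are hypothetical, and an integer count is used in
place of a fraction so that the outcome does not depend on timing.

```python
import io
import profile
import pstats


def aardvark():
    return 0


def alpha_a():
    return 1


def alpha_b():
    return 2


def main():
    """Hypothetical entry point calling three trivial functions."""
    return aardvark() + alpha_a() + alpha_b()


pr = profile.Profile()
pr.runcall(main)

# Sort by function name so that the listing order is predictable.
out = io.StringIO()
stats = pstats.Stats(pr, stream=out).strip_dirs().sort_stats("name")


def report(*restrictions):
    """Return the text printed by print_stats() for the given restrictions."""
    out.seek(0)
    out.truncate()
    stats.print_stats(*restrictions)
    return out.getvalue()


pattern = r"\(alpha_"                   # matches "(alpha_a)" and "(alpha_b)"
regex_then_count = report(pattern, 1)   # keep alpha_* entries, then the first
count_then_regex = report(1, pattern)   # keep the first entry, then filter it
```

With the list sorted by name, \code{regex\_then\_count} contains the line
for \code{alpha\_a}, while \code{count\_then\_regex} lists no function at
all, because the single entry kept by the count does not match the pattern.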


\begin{funcdesc}{print_callers}{restrictions\optional{\, ...}}
This method for the \code{Stats} class prints a list of all functions
that called each function in the profiled database. The ordering is
identical to that provided by \code{print_stats()}, and the definition
of the restricting argument is also identical. For convenience, a
number is shown in parentheses after each caller to show how many
times this specific call was made. A second non-parenthesized number
is the cumulative time spent in the function at the right.
\end{funcdesc}

\begin{funcdesc}{print_callees}{restrictions\optional{\, ...}}
This method for the \code{Stats} class prints a list of all functions
that were called by the indicated function. Aside from this reversal
of direction of calls (re: called vs was called by), the arguments and
ordering are identical to the \code{print_callers()} method.
\end{funcdesc}
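
The two directions can be compared on a tiny call graph. This sketch
assumes a current Python 3 interpreter; \code{main()} and \code{helper()}
are hypothetical functions, and the restrictions are regular expressions
matched against the printed standard name.

```python
import io
import profile
import pstats


def helper():
    return 42


def main():
    """Hypothetical caller that invokes helper() twice."""
    return helper() + helper()


pr = profile.Profile()
pr.runcall(main)

out = io.StringIO()
stats = pstats.Stats(pr, stream=out).strip_dirs().sort_stats("name")

stats.print_callers(r"\(helper\)")   # who called helper()?
callers_report = out.getvalue()

# Reset the capture buffer before printing the second report.
out.seek(0)
out.truncate()
stats.print_callees(r"\(main\)")     # what did main() call?
callees_report = out.getvalue()
```

The first report lists \code{main} as a caller of \code{helper}; the second
lists \code{helper} among the functions called by \code{main}.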

\begin{funcdesc}{ignore}{}
This method of the \code{Stats} class is used to dispose of the value
returned by earlier methods. All standard methods in this class
return the instance that is being processed, so that the commands can
be strung together. For example:

\begin{verbatim}
pstats.Stats('foofile').strip_dirs().sort_stats('cum').print_stats().ignore()
\end{verbatim}

would perform all the indicated functions, but it would not return
the final reference to the \code{Stats} instance.%
\footnote{
This was once necessary, when Python would print any unused expression
result that was not \code{None}. The method is still defined for
backward compatibility.
}
\end{funcdesc}


\section{Limitations}

There are two fundamental limitations on this profiler. The first is
that it relies on the Python interpreter to dispatch \dfn{call},
\dfn{return}, and \dfn{exception} events. Compiled C code does not
get interpreted, and hence is ``invisible'' to the profiler. All time
spent in C code (including builtin functions) will be charged to the
Python function that invoked the C code. If the C code calls out
to some native Python code, then those calls will be profiled
properly.

The second limitation has to do with accuracy of timing information.
There is a fundamental problem with deterministic profilers involving
accuracy. The most obvious restriction is that the underlying ``clock''
is only ticking at a rate (typically) of about .001 seconds. Hence no
measurements will be more accurate than that underlying clock. If
enough measurements are taken, then the ``error'' will tend to average
out. Unfortunately, removing this first error induces a second source
of error...

The second problem is that it ``takes a while'' from when an event is
dispatched until the profiler's call to get the time actually
\emph{gets} the state of the clock. Similarly, there is a certain lag
when exiting the profiler event handler from the time that the clock's
value was obtained (and then squirreled away), until the user's code
is once again executing. As a result, functions that are called many
times, or call many functions, will typically accumulate this error.
The error that accumulates in this fashion is typically less than the
accuracy of the clock (i.e., less than one clock tick), but it
\emph{can} accumulate and become very significant. This profiler
provides a means of calibrating itself for a given platform so that
this error can be probabilistically (i.e., on the average) removed.
After the profiler is calibrated, it will be more accurate (in a least
square sense), but it will sometimes produce negative numbers (when
call counts are exceptionally low, and the gods of probability work
against you :-).) Do \emph{NOT} be alarmed by negative numbers in
the profile. They should \emph{only} appear if you have calibrated
your profiler, and the results are actually better than without
calibration.



\section{Calibration}

The profiler class has a hard coded constant that is subtracted from each
event handling time to compensate for the overhead of calling the time
function, and socking away the results. The following procedure can
be used to obtain this constant for a given platform (see discussion
in section Limitations above).

\begin{verbatim}
    import profile
    pr = profile.Profile()
    pr.calibrate(100)
    pr.calibrate(100)
    pr.calibrate(100)
\end{verbatim}

The argument to \code{calibrate()} is the number of times to try to do the
sample calls to get the CPU times. If your computer is \emph{very}
fast, you might have to do:

\begin{verbatim}
    pr.calibrate(1000)
\end{verbatim}

or even:

\begin{verbatim}
    pr.calibrate(10000)
\end{verbatim}

The object of this exercise is to get a fairly consistent result.
When you have a consistent answer, you are ready to use that number in
the source code. For a Sun Sparcstation 1000 running Solaris 2.3, the
magical number is about .00053. If you have a choice, you are better
off with a smaller constant, and your results will ``less often'' show
up as negative in profile statistics.
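
For readers on a current Python 3 interpreter (an assumption; it is not
the version this text describes), the same procedure can be sketched
without editing the source, because \code{calibrate()} there returns the
measured constant and the \code{Profile} constructor accepts it as a
\code{bias} argument. The repetition count of 200 and the choice of the
smallest sample are illustrative only.

```python
import profile

# Take a few calibration samples; each call returns the measured
# per-event overhead in seconds.
pr = profile.Profile()
samples = [pr.calibrate(200) for _ in range(3)]

# The text advises preferring a smaller constant, so use the minimum.
bias = min(samples)

# Build a profiler that compensates for that overhead on each event.
calibrated = profile.Profile(bias=bias)
```

If the samples differ widely, a larger repetition count, as suggested
above, should give a more consistent figure.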

The following shows how the \code{trace_dispatch()} method in the
\code{Profile} class should be modified to install the calibration
constant on a Sun Sparcstation 1000:

\begin{verbatim}
    def trace_dispatch(self, frame, event, arg):
        t = self.timer()
        t = t[0] + t[1] - self.t - .00053 # Calibration constant

        if self.dispatch[event](frame,t):
            t = self.timer()
            self.t = t[0] + t[1]
        else:
            r = self.timer()
            self.t = r[0] + r[1] - t # put back unrecorded delta
        return
\end{verbatim}

Note that if there is no calibration constant, then the line
containing the calibration constant should simply say:

\begin{verbatim}
        t = t[0] + t[1] - self.t  # no calibration constant
\end{verbatim}

You can also achieve the same results using a derived class (and the
profiler will actually run equally fast!!), but the above method is
the simplest to use. I could have made the profiler ``self
calibrating'', but it would have made the initialization of the
profiler class slower, and would have required some \emph{very} fancy
coding, or else the use of a variable where the constant \samp{.00053}
was placed in the code shown. This is a \strong{VERY} critical
performance section, and there is no reason to use a variable lookup
at this point, when a constant can be used.


\section{Extensions - Deriving Better Profilers}

The \code{Profile} class of module \code{profile} was written so that
derived classes could be developed to extend the profiler. Rather
than describing all the details of such an effort, I'll just present
the following two examples of derived classes that can be used to do
profiling. If the reader is an avid Python programmer, then it should
be possible to use these as a model and create similar (and perchance
better) profile classes.

If all you want to do is change how the timer is called, or which
timer function is used, then the basic class has an option for that in
the constructor for the class. Consider passing the name of a
function to call into the constructor:

\begin{verbatim}
    pr = profile.Profile(your_time_func)
\end{verbatim}

The resulting profiler will call \code{your_time_func()} instead of
\code{os.times()}. The function should return either a single number
or a list of numbers (like what \code{os.times()} returns). If the
function returns a single time number, or the list of returned numbers
has length 2, then you will get an especially fast version of the
dispatch routine.
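
A minimal sketch of substituting a timer, assuming a current Python 3
interpreter. \code{your\_time\_func()} here simply wraps
\code{time.perf\_counter()}, which returns a single floating point number;
that choice and the workload \code{work()} are assumptions made for
illustration.

```python
import io
import profile
import pstats
import time


def your_time_func():
    """Timer returning a single number (seconds as a float)."""
    return time.perf_counter()


def work():
    """Hypothetical workload."""
    return sum(range(1000))


# Pass the timer function itself (not its result) to the constructor.
pr = profile.Profile(your_time_func)
result = pr.runcall(work)

# Capture the report, sorted by internal time.
out = io.StringIO()
pstats.Stats(pr, stream=out).strip_dirs().sort_stats("time").print_stats()
report = out.getvalue()
```

The times in \code{report} are then measured with the substituted timer,
so any calibration constant should be measured with that same timer.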

Be warned that you \emph{should} calibrate the profiler class for the
timer function that you choose. For most machines, a timer that
returns a lone integer value will provide the best results in terms of
low overhead during profiling. (\code{os.times()} is \emph{pretty} bad,
'cause it returns a tuple of floating point values, so all arithmetic is
floating point in the profiler!) If you want to substitute a
better timer in the cleanest fashion, you should derive a class, and
simply put in the replacement dispatch method that better handles your
timer call, along with the appropriate calibration constant :-).


\subsection{OldProfile Class}

The following derived profiler simulates the old style profiler,
providing errant results on recursive functions. The reason for the
usefulness of this profiler is that it runs faster (i.e., less
overhead) than the old profiler. It still creates all the caller
stats, and is quite useful when there is \emph{no} recursion in the
user's code. It is also a lot more accurate than the old profiler, as
it does not charge all its overhead time to the user's code.

\begin{verbatim}
class OldProfile(Profile):

    def trace_dispatch_exception(self, frame, t):
        rt, rtt, rct, rfn, rframe, rcur = self.cur
        if rcur and not rframe is frame:
            return self.trace_dispatch_return(rframe, t)
        return 0

    def trace_dispatch_call(self, frame, t):
        fn = `frame.f_code`

        self.cur = (t, 0, 0, fn, frame, self.cur)
        if self.timings.has_key(fn):
            tt, ct, callers = self.timings[fn]
            self.timings[fn] = tt, ct, callers
        else:
            self.timings[fn] = 0, 0, {}
        return 1

    def trace_dispatch_return(self, frame, t):
        rt, rtt, rct, rfn, frame, rcur = self.cur
        rtt = rtt + t
        sft = rtt + rct

        pt, ptt, pct, pfn, pframe, pcur = rcur
        self.cur = pt, ptt+rt, pct+sft, pfn, pframe, pcur

        tt, ct, callers = self.timings[rfn]
        if callers.has_key(pfn):
            callers[pfn] = callers[pfn] + 1
        else:
            callers[pfn] = 1
        self.timings[rfn] = tt+rtt, ct + sft, callers

        return 1


    def snapshot_stats(self):
        self.stats = {}
        for func in self.timings.keys():
            tt, ct, callers = self.timings[func]
            nor_func = self.func_normalize(func)
            nor_callers = {}
            nc = 0
            for func_caller in callers.keys():
                nor_callers[self.func_normalize(func_caller)]=\
                      callers[func_caller]
                nc = nc + callers[func_caller]
            self.stats[nor_func] = nc, nc, tt, ct, nor_callers
\end{verbatim}


\subsection{HotProfile Class}

This profiler is the fastest derived profile example. It does not
calculate caller-callee relationships, and does not calculate
cumulative time under a function. It only calculates time spent in a
function, so it runs very quickly (re: very low overhead). In truth,
the basic profiler is so fast that it is probably not worth the savings
to give up the data, but this class still provides a nice example.

\begin{verbatim}
class HotProfile(Profile):

    def trace_dispatch_exception(self, frame, t):
        # self.cur is (t, tt, frame, parent), as set in trace_dispatch_call
        rt, rtt, rframe, rcur = self.cur
        if rcur and not rframe is frame:
            return self.trace_dispatch_return(rframe, t)
        return 0

    def trace_dispatch_call(self, frame, t):
        self.cur = (t, 0, frame, self.cur)
        return 1

    def trace_dispatch_return(self, frame, t):
        rt, rtt, frame, rcur = self.cur

        rfn = `frame.f_code`

        pt, ptt, pframe, pcur = rcur
        self.cur = pt, ptt+rt, pframe, pcur

        if self.timings.has_key(rfn):
            nc, tt = self.timings[rfn]
            self.timings[rfn] = nc + 1, rt + rtt + tt
        else:
            self.timings[rfn] = 1, rt + rtt

        return 1


    def snapshot_stats(self):
        self.stats = {}
        for func in self.timings.keys():
            nc, tt = self.timings[func]
            nor_func = self.func_normalize(func)
            self.stats[nor_func] = nc, nc, tt, 0, {}
\end{verbatim}