\chapter{The Python Profiler}
\stmodindex{profile}
\stmodindex{pstats}

Copyright \copyright\ 1994, by InfoSeek Corporation, all rights reserved.

Written by James Roskind%
\footnote{
Updated and converted to \LaTeX\ by Guido van Rossum.  The references to
the old profiler are left in the text, although it no longer exists.
}

Permission to use, copy, modify, and distribute this Python software
and its associated documentation for any purpose (subject to the
restriction in the following sentence) without fee is hereby granted,
provided that the above copyright notice appears in all copies, and
that both that copyright notice and this permission notice appear in
supporting documentation, and that the name of InfoSeek not be used in
advertising or publicity pertaining to distribution of the software
without specific, written prior permission.  This permission is
explicitly restricted to the copying and modification of the software
to remain in Python, compiled Python, or other languages (such as C)
wherein the modified or derived code is exclusively imported into a
Python module.

INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.


The profiler was written after I had been programming in Python for
only 3 weeks.  As a result, it is probably clumsy code, but I don't
know for sure yet 'cause I'm a beginner :-).  I did work hard to make
the code run fast, so that profiling would be a reasonable thing to
do.  I tried not to repeat code fragments, but I'm sure I did some
stuff in really awkward ways at times.  Please send suggestions for
improvements to: \code{jar@infoseek.com}.  I won't promise \emph{any}
support.  ...but I'd appreciate the feedback.


\section{Introduction to the profiler}

A \dfn{profiler} is a program that describes the run-time performance
of a program, providing a variety of statistics.  This documentation
describes the profiler functionality provided in the modules
\code{profile} and \code{pstats}.  This profiler provides
\dfn{deterministic profiling} of any Python program.  It also
provides a series of report generation tools to allow users to rapidly
examine the results of a profile operation.


\section{How Is This Profiler Different From The Old Profiler?}

The big changes from the old profiling module are that you get more
information, and you pay less CPU time.  It's not a trade-off, it's a
trade-up.

To be specific:

\begin{description}

\item[Bugs removed:]
The local stack frame is no longer molested, and execution time is now
charged to the correct functions.

\item[Accuracy increased:]
Profiler execution time is no longer charged to the user's code;
calibration for the platform is supported; file reads are not done
\emph{by} the profiler \emph{during} profiling (and charged to the
user's code!).

\item[Speed increased:]
Overhead CPU cost was reduced by more than a factor of two (perhaps a
factor of five); a lightweight profiler module is all that must be
loaded, and the report generating module (\code{pstats}) is not needed
during profiling.

\item[Recursive functions support:]
Cumulative times in recursive functions are correctly calculated;
recursive entries are counted.

\item[Large growth in report generating UI:]
Distinct profile runs can be added together to form a comprehensive
report; functions that import statistics take arbitrary lists of
files; sorting criteria are now based on keywords (instead of 4 integer
options); reports show what functions were profiled as well as what
profile file was referenced; output format has been improved.

\end{description}


\section{Instant User's Manual}

This section is provided for users who ``don't want to read the
manual.''  It provides a very brief overview, and allows a user to
rapidly perform profiling on an existing application.

To profile an application with a main entry point of \samp{foo()}, you
would add the following to your module:

\begin{verbatim}
import profile
profile.run("foo()")
\end{verbatim}

The above action would cause \samp{foo()} to be run, and a series of
informative lines (the profile) to be printed.  The above approach is
most useful when working with the interpreter.  If you would like to
save the results of a profile into a file for later examination, you
can supply a file name as the second argument to the \code{run()}
function:

\begin{verbatim}
import profile
profile.run("foo()", 'fooprof')
\end{verbatim}

When you wish to review the profile, you should use the methods in the
\code{pstats} module.  Typically you would load the statistics data as
follows:

\begin{verbatim}
import pstats
p = pstats.Stats('fooprof')
\end{verbatim}

The class \code{Stats} (the above code just created an instance of
this class) has a variety of methods for manipulating and printing the
data that was just read into \samp{p}.  When you ran
\code{profile.run()} above, what was printed was the result of three
method calls:

\begin{verbatim}
p.strip_dirs().sort_stats(-1).print_stats()
\end{verbatim}

The first method removed the extraneous path from all the module
names.  The second method sorted all the entries according to the
standard module/line/name string that is printed (this is to comply
with the semantics of the old profiler).  The third method printed out
all the statistics.  You might try the following sort calls:

\begin{verbatim}
p.sort_stats('name')
p.print_stats()
\end{verbatim}

The first call will actually sort the list by function name, and the
second call will print out the statistics.  The following are some
interesting calls to experiment with:

\begin{verbatim}
p.sort_stats('cumulative').print_stats(10)
\end{verbatim}

This sorts the profile by cumulative time in a function, and then only
prints the ten most significant lines.  If you want to understand what
algorithms are taking time, the above line is what you would use.

If you were looking to see what functions were looping a lot, and
taking a lot of time, you would do:

\begin{verbatim}
p.sort_stats('time').print_stats(10)
\end{verbatim}

to sort according to time spent within each function, and then print
the statistics for the top ten functions.

You might also try:

\begin{verbatim}
p.sort_stats('file').print_stats('__init__')
\end{verbatim}

This will sort all the statistics by file name, and then print out
statistics for only the class init methods ('cause they are spelled
with \code{__init__} in them).  As one final example, you could try:

\begin{verbatim}
p.sort_stats('time', 'cum').print_stats(.5, 'init')
\end{verbatim}

This line sorts statistics with a primary key of time, and a secondary
key of cumulative time, and then prints out some of the statistics.
To be specific, the list is first culled down to 50\% (re: \samp{.5})
of its original size, then only lines containing \code{init} are
retained, and that sub-sub-list is printed.

If you wondered what functions called the above functions, you could
now (\samp{p} is still sorted according to the last criteria) do:

\begin{verbatim}
p.print_callers(.5, 'init')
\end{verbatim}

and you would get a list of callers for each of the listed functions.

If you want more functionality, you're going to have to read the
manual, or guess what the following functions do:

\begin{verbatim}
p.print_callees()
p.add('fooprof')
\end{verbatim}
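
The steps of this section can be strung together into one script.  The
following is only a sketch, assuming a current Python release: the
function \code{foo()} is a hypothetical stand-in for an application
entry point, and \code{runcall()}, \code{dump_stats()} and the
\code{stream} argument belong to later versions of these modules
rather than to the interface described in this chapter.

```python
import io
import os
import profile
import pstats
import tempfile


def foo():
    """Hypothetical entry point standing in for the application's foo()."""
    return sum(i * i for i in range(1000))


# Profile foo() and save the raw statistics to a file.
# runcall() is used instead of run("foo()") so that the sketch does not
# depend on foo being defined in the __main__ module.
stats_path = os.path.join(tempfile.mkdtemp(), "fooprof")
profiler = profile.Profile()
profiler.runcall(foo)
profiler.dump_stats(stats_path)

# Load the statistics and write the ten entries with the largest
# cumulative time into a string buffer instead of standard output.
report = io.StringIO()
p = pstats.Stats(stats_path, stream=report)
p.strip_dirs().sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

Writing the report to a buffer is a convenience for the sketch; leaving
out \code{stream} prints the report directly, as in the examples above.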


\section{What Is Deterministic Profiling?}

\dfn{Deterministic profiling} is meant to reflect the fact that all
\dfn{function call}, \dfn{function return}, and \dfn{exception} events
are monitored, and precise timings are made for the intervals between
these events (during which time the user's code is executing).  In
contrast, \dfn{statistical profiling} (which is not done by this
module) randomly samples the effective instruction pointer, and
deduces where time is being spent.  The latter technique traditionally
involves less overhead (as the code does not need to be instrumented),
but provides only relative indications of where time is being spent.

In Python, since there is an interpreter active during execution, the
presence of instrumented code is not required to do deterministic
profiling.  Python automatically provides a \dfn{hook} (optional
callback) for each event.  In addition, the interpreted nature of
Python tends to add so much overhead to execution that deterministic
profiling tends to add only a small processing overhead in typical
applications.  The result is that deterministic profiling is not that
expensive, yet provides extensive run-time statistics about the
execution of a Python program.

Call count statistics can be used to identify bugs in code (surprising
counts), and to identify possible inline-expansion points (high call
counts).  Internal time statistics can be used to identify ``hot
loops'' that should be carefully optimized.  Cumulative time
statistics should be used to identify high level errors in the
selection of algorithms.  Note that the unusual handling of cumulative
times in this profiler allows statistics for recursive implementations
of algorithms to be directly compared to iterative implementations.
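
To make the idea of the interpreter hook concrete, here is a minimal
sketch that counts events instead of timing them.  It assumes the
\code{sys.setprofile()} interface of current Python releases; the
helper \code{count_events()} and the function \code{fib()} are
hypothetical names introduced only for this illustration, and the
sketch is not how the \code{profile} module itself is written.

```python
import sys
from collections import Counter


def count_events(func, *args):
    """Run func(*args) with a profiling hook that tallies event names."""
    counts = Counter()

    def hook(frame, event, arg):
        # Called by the interpreter for every call/return style event.
        counts[event] += 1

    sys.setprofile(hook)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return result, counts


def fib(n):
    """Naive recursive Fibonacci, used only to generate many calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result, counts = count_events(fib, 5)
print(result, dict(counts))
```

A real profiler would read a clock inside the hook and charge the
elapsed interval to the function that was running, which is where the
timing errors discussed later in this chapter come from.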


\section{Reference Manual}

\renewcommand{\indexsubitem}{(profiler function)}

The primary entry point for the profiler is the global function
\code{profile.run()}.  It is typically used to create any profile
information.  The reports are formatted and printed using methods of
the class \code{pstats.Stats}.  The following is a description of all
of these standard entry points and functions.  For a more in-depth
view of some of the code, consider reading the later section on
Profiler Extensions, which includes discussion of how to derive
``better'' profilers from the classes presented, or reading the source
code for these modules.

\begin{funcdesc}{profile.run}{string\optional{\, filename\optional{\, ...}}}

This function takes a single argument that can be passed to the
\code{exec} statement, and an optional file name.  In all cases this
routine attempts to \code{exec} its first argument, and gather profiling
statistics from the execution.  If no file name is present, then this
function automatically prints a simple profiling report, sorted by the
standard name string (file/line/function-name) that is presented in
each line.  The following is a typical output from such a call:

\begin{verbatim}
      main()
      2706 function calls (2004 primitive calls) in 4.504 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2    0.006    0.003    0.953    0.477 pobject.py:75(save_objects)
  43/3    0.533    0.012    0.749    0.250 pobject.py:99(evaluate)
 ...
\end{verbatim}

The first line indicates that this profile was generated by the call:\\
\code{profile.run('main()')}, and hence the exec'ed string is
\code{'main()'}.  The second line indicates that 2706 calls were
monitored.  Of those calls, 2004 were \dfn{primitive}.  We define
\dfn{primitive} to mean that the call was not induced via recursion.
The next line: \code{Ordered by:\ standard name}, indicates that
the text string in the far right column was used to sort the output.
The column headings include:

\begin{description}

\item[ncalls ]
is the number of calls.

\item[tottime ]
is the total time spent in the given function (excluding time spent
in calls to sub-functions).

\item[percall ]
is the quotient of \code{tottime} divided by \code{ncalls}.

\item[cumtime ]
is the total time spent in this and all subfunctions (i.e., from
invocation till exit).  This figure is accurate \emph{even} for recursive
functions.

\item[percall ]
is the quotient of \code{cumtime} divided by primitive calls.

\item[filename:lineno(function) ]
provides the respective data of each function.

\end{description}

When there are two numbers in the first column (e.g.: \samp{43/3}),
then the latter is the number of primitive calls, and the former is
the actual number of calls.  Note that when the function does not
recurse, these two values are the same, and only the single figure is
printed.
\end{funcdesc}
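
The difference between the two numbers in the \code{ncalls} column can
be seen with a small recursive function.  The sketch below assumes a
current Python release; \code{fib()} is a hypothetical example
function, and reading the \code{stats} dictionary of a \code{Stats}
object relies on an implementation detail of \code{pstats} rather than
on anything described in this chapter.

```python
import profile
import pstats


def fib(n):
    """Naive recursive Fibonacci, used only to generate recursive calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


profiler = profile.Profile()
profiler.runcall(fib, 5)
stats = pstats.Stats(profiler)

# Stats.stats maps (filename, lineno, funcname) to a tuple whose first two
# items are the primitive call count and the total call count.
for (filename, lineno, funcname), entry in stats.stats.items():
    if funcname == "fib":
        primitive_calls, total_calls = entry[0], entry[1]
        # Same "total/primitive" form as the ncalls column.
        print("fib: %d/%d" % (total_calls, primitive_calls))
```

Only the outermost invocation of \code{fib()} is primitive here; every
other call was induced by recursion.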

\begin{funcdesc}{pstats.Stats}{filename\optional{\, ...}}
This class constructor creates an instance of a ``statistics object''
from a \var{filename} (or set of filenames).  \code{Stats} objects are
manipulated by methods, in order to print useful reports.

The file selected by the above constructor must have been created by
the corresponding version of \code{profile}.  To be specific, there is
\emph{NO} file compatibility guaranteed with future versions of this
profiler, and there is no compatibility with files produced by other
profilers (e.g., the old system profiler).

If several files are provided, all the statistics for identical
functions will be coalesced, so that an overall view of several
processes can be considered in a single report.  If additional files
need to be combined with data in an existing \code{Stats} object, the
\code{add()} method can be used.
\end{funcdesc}


\subsection{The \sectcode{Stats} Class}

\renewcommand{\indexsubitem}{(Stats method)}

\begin{funcdesc}{strip_dirs}{}
This method for the \code{Stats} class removes all leading path information
from file names.  It is very useful in reducing the size of the
printout to fit within (close to) 80 columns.  This method modifies
the object, and the stripped information is lost.  After performing a
strip operation, the object is considered to have its entries in a
``random'' order, as it was just after object initialization and
loading.  If \code{strip_dirs()} causes two function names to be
indistinguishable (i.e., they are on the same line of the same
filename, and have the same function name), then the statistics for
these two entries are accumulated into a single entry.
\end{funcdesc}


\begin{funcdesc}{add}{filename\optional{\, ...}}
This method of the \code{Stats} class accumulates additional profiling
information into the current profiling object.  Its arguments should
refer to filenames created by the corresponding version of
\code{profile.run()}.  Statistics for identically named (re: file,
line, name) functions are automatically accumulated into single
function statistics.
\end{funcdesc}
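
As an illustration of how \code{add()} accumulates statistics, the
following sketch profiles the same function twice and merges the two
files.  It assumes a current Python release; \code{fib()},
\code{profile_to_file()} and \code{total_calls()} are hypothetical
helpers written for this example, and \code{dump_stats()} and the
\code{stats} dictionary are not part of the interface described here.

```python
import os
import profile
import pstats
import tempfile


def fib(n):
    """Naive recursive Fibonacci, used only as a workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def profile_to_file(path, n):
    """Profile fib(n) and dump the raw statistics to path."""
    profiler = profile.Profile()
    profiler.runcall(fib, n)
    profiler.dump_stats(path)


def total_calls(stats, funcname):
    """Sum the total call counts of all entries named funcname."""
    return sum(entry[1] for key, entry in stats.stats.items()
               if key[2] == funcname)


workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "run1.prof")
second = os.path.join(workdir, "run2.prof")
profile_to_file(first, 5)
profile_to_file(second, 6)

combined = pstats.Stats(first)
calls_before = total_calls(combined, "fib")
combined.add(second)  # same (file, line, name) key, so counts accumulate
calls_after = total_calls(combined, "fib")
print(calls_before, calls_after)
```

Passing both file names to the constructor, as in
\code{pstats.Stats(first, second)}, should give the same combined
counts.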

\begin{funcdesc}{sort_stats}{key\optional{\, ...}}
This method modifies the \code{Stats} object by sorting it according to the
supplied criteria.  The argument is typically a string identifying the
basis of a sort (example: \code{"time"} or \code{"name"}).

When more than one key is provided, then additional keys are used as
secondary criteria when there is equality in all keys selected
before them.  For example, \code{sort_stats('name', 'file')} will sort
all the entries according to their function name, and resolve all ties
(identical function names) by sorting by file name.

Abbreviations can be used for any key names, as long as the
abbreviation is unambiguous.  The following are the keys currently
defined:

\begin{tableii}{\|l\|l\|}{code}{Valid Arg}{Meaning}
\lineii{"calls"}{call count}
\lineii{"cumulative"}{cumulative time}
\lineii{"file"}{file name}
\lineii{"module"}{file name}
\lineii{"pcalls"}{primitive call count}
\lineii{"line"}{line number}
\lineii{"name"}{function name}
\lineii{"nfl"}{name/file/line}
\lineii{"stdname"}{standard name}
\lineii{"time"}{internal time}
\end{tableii}

Note that all sorts on statistics are in descending order (placing
most time consuming items first), whereas name, file, and line number
sorts are in ascending order (i.e., alphabetical).  The subtle
distinction between \code{"nfl"} and \code{"stdname"} is that the
standard name is a sort of the name as printed, which means that the
embedded line numbers get compared in an odd way.  For example, lines
3, 20, and 40 would (if the file names were the same) appear in the
string order 20, 3 and 40.  In contrast, \code{"nfl"} does a numeric
compare of the line numbers.  In fact, \code{sort_stats("nfl")} is the
same as \code{sort_stats("name", "file", "line")}.

For compatibility with the old profiler, the numeric arguments
\samp{-1}, \samp{0}, \samp{1}, and \samp{2} are permitted.  They are
interpreted as \code{"stdname"}, \code{"calls"}, \code{"time"}, and
\code{"cumulative"} respectively.  If this old style format (numeric)
is used, only one sort key (the numeric key) will be used, and
additional arguments will be silently ignored.
\end{funcdesc}
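
The odd ordering produced by \code{"stdname"} is simply string
comparison at work.  The sketch below imitates the two orderings with
plain Python sorting; the names in \code{std\_names} and the helper
\code{nfl\_key()} are made up for the illustration and are not taken
from \code{pstats}.

```python
# Hypothetical standard names for one function name in one file.
std_names = ["mod.py:3(f)", "mod.py:20(f)", "mod.py:40(f)"]

# "stdname": plain string comparison of the printed name, so the
# character "2" sorts before "3" and line 20 comes first.
by_stdname = sorted(std_names)


def nfl_key(std_name):
    """Build a (name, file, line) key with the line number as an integer."""
    filename, rest = std_name.split(":")
    lineno, funcname = rest.rstrip(")").split("(")
    return (funcname, filename, int(lineno))


# "nfl": numeric comparison of the line numbers.
by_nfl = sorted(std_names, key=nfl_key)

print(by_stdname)
print(by_nfl)
```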


\begin{funcdesc}{reverse_order}{}
This method for the \code{Stats} class reverses the ordering of the basic
list within the object.  This method is provided primarily for
compatibility with the old profiler.  Its utility is questionable
now that ascending vs descending order is properly selected based on
the sort key of choice.
\end{funcdesc}

\begin{funcdesc}{print_stats}{restriction\optional{\, ...}}
This method for the \code{Stats} class prints out a report as described
in the \code{profile.run()} definition.

The order of the printing is based on the last \code{sort_stats()}
operation done on the object (subject to caveats in \code{add()} and
\code{strip_dirs()}).

The arguments provided (if any) can be used to limit the list down to
the significant entries.  Initially, the list is taken to be the
complete set of profiled functions.  Each restriction is either an
integer (to select a count of lines), or a decimal fraction between
0.0 and 1.0 inclusive (to select a percentage of lines), or a regular
expression (to pattern match the standard name that is printed).  If
several restrictions are provided, then they are applied sequentially.
For example:

\begin{verbatim}
print_stats(.1, "foo:")
\end{verbatim}

would first limit the printing to the first 10\% of the list, and then
only print functions that were part of filename \samp{.*foo:}.  In
contrast, the command:

\begin{verbatim}
print_stats("foo:", .1)
\end{verbatim}

would limit the list to all functions having file names \samp{.*foo:},
and then proceed to only print the first 10\% of them.
\end{funcdesc}
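
Because restrictions are applied one after another, their order
matters.  The following is a rough sketch of that sequential
filtering, not the code used by \code{pstats}: the function
\code{apply\_restrictions()}, the sample names, and the rounding of
the fraction are all assumptions made for the illustration.

```python
import re


def apply_restrictions(names, *restrictions):
    """Apply each restriction in turn to an already sorted list of names."""
    for restriction in restrictions:
        if isinstance(restriction, str):
            # Regular expression: keep only the matching names.
            pattern = re.compile(restriction)
            names = [name for name in names if pattern.search(name)]
        elif isinstance(restriction, float) and 0.0 <= restriction <= 1.0:
            # Fraction: keep that share of the list (rounded to nearest).
            names = names[:int(len(names) * restriction + 0.5)]
        elif isinstance(restriction, int):
            # Integer: keep that many lines.
            names = names[:restriction]
    return names


names = ["a.py:1(f)", "b.py:2(g)", "foo.py:3(h)", "foo.py:4(i)"]
fraction_then_pattern = apply_restrictions(names, 0.5, r"foo\.py:")
pattern_then_fraction = apply_restrictions(names, r"foo\.py:", 0.5)
print(fraction_then_pattern)
print(pattern_then_fraction)
```

With this sample list, taking the first half before matching leaves
nothing to print, while matching first and then halving keeps one
entry.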


\begin{funcdesc}{print_callers}{restrictions\optional{\, ...}}
This method for the \code{Stats} class prints a list of all functions
that called each function in the profiled database.  The ordering is
identical to that provided by \code{print_stats()}, and the definition
of the restricting argument is also identical.  For convenience, a
number is shown in parentheses after each caller to show how many
times this specific call was made.  A second non-parenthesized number
is the cumulative time spent in the function at the right.
\end{funcdesc}

\begin{funcdesc}{print_callees}{restrictions\optional{\, ...}}
This method for the \code{Stats} class prints a list of all functions
that were called by the indicated function.  Aside from this reversal
of direction of calls (re: called vs was called by), the arguments and
ordering are identical to the \code{print_callers()} method.
\end{funcdesc}

\begin{funcdesc}{ignore}{}
This method of the \code{Stats} class is used to dispose of the value
returned by earlier methods.  All standard methods in this class
return the instance that is being processed, so that the commands can
be strung together.  For example:

\begin{verbatim}
pstats.Stats('foofile').strip_dirs().sort_stats('cum').print_stats().ignore()
\end{verbatim}

would perform all the indicated functions, but it would not return
the final reference to the \code{Stats} instance.%
\footnote{
This was once necessary, when Python would print any unused expression
result that was not \code{None}.  The method is still defined for
backward compatibility.
}
\end{funcdesc}


\section{Limitations}

There are two fundamental limitations on this profiler.  The first is
that it relies on the Python interpreter to dispatch \dfn{call},
\dfn{return}, and \dfn{exception} events.  Compiled C code does not
get interpreted, and hence is ``invisible'' to the profiler.  All time
spent in C code (including builtin functions) will be charged to the
Python function that invoked the C code.  If the C code calls out
to some native Python code, then those calls will be profiled
properly.

The second limitation has to do with accuracy of timing information.
There is a fundamental problem with deterministic profilers involving
accuracy.  The most obvious restriction is that the underlying ``clock''
is only ticking at a rate (typically) of about .001 seconds.  Hence no
measurements will be more accurate than that underlying clock.  If
enough measurements are taken, then the ``error'' will tend to average
out.  Unfortunately, removing this first error induces a second source
of error...

The second problem is that it ``takes a while'' from when an event is
dispatched until the profiler's call to get the time actually
\emph{gets} the state of the clock.  Similarly, there is a certain lag
when exiting the profiler event handler from the time that the clock's
value was obtained (and then squirreled away), until the user's code
is once again executing.  As a result, functions that are called many
times, or call many functions, will typically accumulate this error.
The error that accumulates in this fashion is typically less than the
accuracy of the clock (i.e., less than one clock tick), but it
\emph{can} accumulate and become very significant.  This profiler
provides a means of calibrating itself for a given platform so that
this error can be probabilistically (i.e., on the average) removed.
After the profiler is calibrated, it will be more accurate (in a least
squares sense), but it will sometimes produce negative numbers (when
call counts are exceptionally low, and the gods of probability work
against you :-).)  Do \emph{NOT} be alarmed by negative numbers in
the profile.  They should \emph{only} appear if you have calibrated
your profiler, and the results are actually better than without
calibration.


\section{Calibration}

The profiler class has a hard coded constant that is added to each
event handling time to compensate for the overhead of calling the time
function, and socking away the results.  The following procedure can
be used to obtain this constant for a given platform (see discussion
in section Limitations above).

\begin{verbatim}
import profile
pr = profile.Profile()
pr.calibrate(100)
pr.calibrate(100)
pr.calibrate(100)
\end{verbatim}

The argument to \code{calibrate()} is the number of times to try to do
the sample calls to get the CPU times.  If your computer is \emph{very}
fast, you might have to do:

\begin{verbatim}
pr.calibrate(1000)
\end{verbatim}

or even:

\begin{verbatim}
pr.calibrate(10000)
\end{verbatim}

The object of this exercise is to get a fairly consistent result.
When you have a consistent answer, you are ready to use that number in
the source code.  For a Sun Sparcstation 1000 running Solaris 2.3, the
magical number is about .00053.  If you have a choice, you are better
off with a smaller constant, and your results will ``less often'' show
up as negative in profile statistics.

The following shows how the \code{trace_dispatch()} method in the
\code{Profile} class should be modified to install the calibration
constant on a Sun Sparcstation 1000:

\begin{verbatim}
def trace_dispatch(self, frame, event, arg):
    t = self.timer()
    t = t[0] + t[1] - self.t - .00053 # Calibration constant

    if self.dispatch[event](frame,t):
        t = self.timer()
        self.t = t[0] + t[1]
    else:
        r = self.timer()
        self.t = r[0] + r[1] - t # put back unrecorded delta
    return
\end{verbatim}

Note that if there is no calibration constant, then the line
containing the calibration constant should simply say:

\begin{verbatim}
    t = t[0] + t[1] - self.t # no calibration constant
\end{verbatim}

You can also achieve the same results using a derived class (and the
profiler will actually run equally fast!!), but the above method is
the simplest to use.  I could have made the profiler ``self
calibrating'', but it would have made the initialization of the
profiler class slower, and would have required some \emph{very} fancy
coding, or else the use of a variable where the constant \samp{.00053}
was placed in the code shown.  This is a \strong{VERY} critical
performance section, and there is no reason to use a variable lookup
at this point, when a constant can be used.
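
For readers on a current Python release, the sketch below shows the
same calibration idea without editing the source.  It assumes the
later \code{profile} interface, in which \code{calibrate()} returns
the measured constant and the \code{Profile} constructor accepts a
\code{bias} argument; neither is part of the version described in this
chapter, and the iteration count is an arbitrary choice to keep the
example quick.

```python
import profile

pr = profile.Profile()

# Repeat the calibration a few times and look for a stable value.
# 2000 iterations keeps the sketch fast; a larger count gives steadier
# results on fast machines.
estimates = [pr.calibrate(2000) for _ in range(3)]

# Following the advice above, prefer the smaller constant.
bias = min(estimates)

# Pass the chosen constant to the constructor instead of editing the source.
calibrated = profile.Profile(bias=bias)
calibrated.runcall(sum, range(1000))
print(estimates, bias)
```

This trades the hard coded constant for an attribute lookup, which is
the variable lookup that the text above deliberately avoided.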
				605
				606
Guido van Rossum	470be14	1995-03-17 16:07:09 +0000	[diff] [blame^]	607	\section{Extensions - Deriving Better Profilers}
Guido van Rossum	df804f8	1995-03-02 12:38:39 +0000	[diff] [blame]	608
				609	The \code{Profile} class of module \code{profile} was written so that
				610	derived classes could be developed to extend the profiler. Rather
				611	than describing all the details of such an effort, I'll just present
				612	the following two examples of derived classes that can be used to do
				613	profiling. If the reader is an avid Python programmer, then it should
				614	be possible to use these as a model and create similar (and perchance
				615	better) profile classes.
				616
If all you want to do is change how the timer is called, or which
timer function is used, then the basic class has an option for that in
the constructor for the class.  Consider passing the name of a
function to call into the constructor:

\begin{verbatim}
pr = profile.Profile(your_time_func)
\end{verbatim}

The resulting profiler will call \code{your_time_func()} instead of
\code{os.times()}.  The function should return either a single number
or a list of numbers (like what \code{os.times()} returns).  If the
function returns a single time number, or the list of returned numbers
has length 2, then you will get an especially fast version of the
dispatch routine.
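As a minimal sketch of passing a timer into the constructor, the
following assumes a present-day standard library in which
\code{profile.Profile} offers \code{runcall()} and
\code{create_stats()}; the names \code{your_time_func} and
\code{work} are placeholders invented for this illustration.

```python
import profile
import time


def your_time_func():
    # Placeholder timer: returns a single number (seconds as a float).
    return time.perf_counter()


def work(n):
    # Small workload so the profiler has something to record.
    total = 0
    for i in range(n):
        total = total + i * i
    return total


# Hand the timer function itself (not its result) to the constructor.
pr = profile.Profile(your_time_func)
result = pr.runcall(work, 1000)

# Build the stats dictionary; keys are (filename, line number, function name).
pr.create_stats()
profiled_names = [key[2] for key in pr.stats]
```

Because the timer returns a single number rather than a list, the
profiler is free to use its simpler dispatch routine.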
				632
Be warned that you \emph{should} calibrate the profiler class for the
timer function that you choose.  For most machines, a timer that
returns a lone integer value will provide the best results in terms of
low overhead during profiling.  (\code{os.times()} is \emph{pretty}
bad, because it returns a tuple of floating point values, so all
arithmetic in the profiler is floating point!)  If you want to
substitute a better timer in the cleanest fashion, you should derive a
class, and simply put in the replacement dispatch method that better
handles your timer call, along with the appropriate calibration
constant :-).
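To illustrate a timer that returns a lone integer, here is a hedged
sketch that again assumes a present-day \code{profile} module and uses
\code{time.perf_counter_ns()} as the integer clock.  The names
\code{integer_timer} and \code{busy} are made up for the example, and
no calibration constant is supplied, so the recorded times are raw
nanosecond counts.

```python
import profile
import time


def integer_timer():
    # Placeholder timer: one integer, counted in nanoseconds.
    return time.perf_counter_ns()


def busy(n):
    total = 0
    for i in range(n):
        total += i
    return total


pr = profile.Profile(integer_timer)
pr.runcall(busy, 10000)
pr.create_stats()

# Each stats value is
# (primitive calls, total calls, internal time, cumulative time, callers).
busy_key = [key for key in pr.stats if key[2] == "busy"][0]
cc, nc, tt, ct, callers = pr.stats[busy_key]
```

With no calibration constant given, the internal and cumulative times
stay integers, which is the low overhead arithmetic the paragraph
above has in mind.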
				642
				643
\subsection{OldProfile Class}

The following derived profiler simulates the old style profiler,
providing erroneous results on recursive functions.  This profiler is
useful because it runs faster (i.e., with less overhead) than the old
profiler.  It still creates all the caller stats, and is quite useful
when there is \emph{no} recursion in the user's code.  It is also a
lot more accurate than the old profiler, as it does not charge all its
overhead time to the user's code.
				653
\begin{verbatim}
class OldProfile(Profile):

    def trace_dispatch_exception(self, frame, t):
        rt, rtt, rct, rfn, rframe, rcur = self.cur
        if rcur and not rframe is frame:
            return self.trace_dispatch_return(rframe, t)
        return 0

    def trace_dispatch_call(self, frame, t):
        fn = `frame.f_code`

        self.cur = (t, 0, 0, fn, frame, self.cur)
        if self.timings.has_key(fn):
            tt, ct, callers = self.timings[fn]
            self.timings[fn] = tt, ct, callers
        else:
            self.timings[fn] = 0, 0, {}
        return 1

    def trace_dispatch_return(self, frame, t):
        rt, rtt, rct, rfn, frame, rcur = self.cur
        rtt = rtt + t
        sft = rtt + rct

        pt, ptt, pct, pfn, pframe, pcur = rcur
        self.cur = pt, ptt+rt, pct+sft, pfn, pframe, pcur

        tt, ct, callers = self.timings[rfn]
        if callers.has_key(pfn):
            callers[pfn] = callers[pfn] + 1
        else:
            callers[pfn] = 1
        self.timings[rfn] = tt+rtt, ct + sft, callers

        return 1


    def snapshot_stats(self):
        self.stats = {}
        for func in self.timings.keys():
            tt, ct, callers = self.timings[func]
            nor_func = self.func_normalize(func)
            nor_callers = {}
            nc = 0
            for func_caller in callers.keys():
                nor_callers[self.func_normalize(func_caller)]=\
                      callers[func_caller]
                nc = nc + callers[func_caller]
            self.stats[nor_func] = nc, nc, tt, ct, nor_callers
\end{verbatim}
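To see why old style accounting goes wrong on recursive functions,
consider the following toy model.  It is not code from the
\code{profile} module: the function \code{cumulative\_times} and the
hand-written event list are invented for illustration, and the clock
values are made up.  Like \code{OldProfile}, the unguarded mode adds
every returning frame's inclusive time to the function's cumulative
total, so nested invocations of the same function are counted more
than once.

```python
def cumulative_times(events, guard_recursion):
    """Sum inclusive time per function from (kind, name, clock) events."""
    stack = []    # (name, clock at call) for each live frame
    depth = {}    # name -> number of live frames of that function
    totals = {}   # name -> cumulative time charged so far
    for kind, name, clock in events:
        if kind == "call":
            stack.append((name, clock))
            depth[name] = depth.get(name, 0) + 1
        else:
            started_name, start = stack.pop()
            depth[started_name] -= 1
            if guard_recursion and depth[started_name] > 0:
                # An outer invocation is still live; it will account
                # for this time when it returns.
                continue
            totals[started_name] = totals.get(started_name, 0) + (clock - start)
    return totals


# Three nested calls of one recursive function; 5 clock ticks elapse overall.
events = [
    ("call", "fact", 0), ("call", "fact", 1), ("call", "fact", 2),
    ("return", "fact", 3), ("return", "fact", 4), ("return", "fact", 5),
]

naive = cumulative_times(events, False)    # old style: every frame charged
guarded = cumulative_times(events, True)   # only the outermost frame charged
```

In this model the unguarded total for \code{fact} is 9 ticks even
though only 5 ticks elapsed, while the guarded total is 5.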
				705
				706
\subsection{HotProfile Class}

This profiler is the fastest derived profile example.  It does not
calculate caller-callee relationships, and does not calculate
cumulative time under a function.  It only calculates time spent in a
function, so it runs very quickly (i.e., with very low overhead).  In
truth, the basic profiler is so fast that it is probably not worth
giving up the data for the savings, but this class still provides a
nice example.
				715
\begin{verbatim}
class HotProfile(Profile):

    def trace_dispatch_exception(self, frame, t):
        # self.cur holds four items in this class (see
        # trace_dispatch_call), so unpack four, not five.
        rt, rtt, rframe, rcur = self.cur
        if rcur and not rframe is frame:
            return self.trace_dispatch_return(rframe, t)
        return 0

    def trace_dispatch_call(self, frame, t):
        self.cur = (t, 0, frame, self.cur)
        return 1

    def trace_dispatch_return(self, frame, t):
        rt, rtt, frame, rcur = self.cur

        rfn = `frame.f_code`

        pt, ptt, pframe, pcur = rcur
        self.cur = pt, ptt+rt, pframe, pcur

        if self.timings.has_key(rfn):
            nc, tt = self.timings[rfn]
            self.timings[rfn] = nc + 1, rt + rtt + tt
        else:
            self.timings[rfn] = 1, rt + rtt

        return 1


    def snapshot_stats(self):
        self.stats = {}
        for func in self.timings.keys():
            nc, tt = self.timings[func]
            nor_func = self.func_normalize(func)
            self.stats[nor_func] = nc, nc, tt, 0, {}
\end{verbatim}
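For readers who want to experiment with the same bookkeeping outside
the \code{Profile} class, here is a self-contained sketch built
directly on \code{sys.setprofile()}.  It is an assumption-laden
illustration, not part of the \code{profile} module: the class
\code{TinyHotProfile} and its methods are hypothetical.  Like
\code{HotProfile}, it records only a call count and the time spent in
each function itself, with no caller data and no cumulative time.

```python
import sys
import time


class TinyHotProfile:
    """Hypothetical stand-in: call count and self time per function."""

    def __init__(self, timer=time.perf_counter):
        self.timer = timer
        self.timings = {}   # function name -> (call count, self time)
        self.stack = []     # [function name, time charged so far] per live frame
        self.last = None    # clock reading at the end of the previous event

    def dispatch(self, frame, event, arg):
        now = self.timer()
        if self.stack:
            # Time since the previous event belongs to the running function.
            self.stack[-1][1] += now - self.last
        if event == "call":
            self.stack.append([frame.f_code.co_name, 0.0])
        elif event == "return" and self.stack:
            name, spent = self.stack.pop()
            nc, tt = self.timings.get(name, (0, 0.0))
            self.timings[name] = (nc + 1, tt + spent)
        # Restart the clock so the dispatcher's own work is not charged.
        self.last = self.timer()

    def runcall(self, func, *args):
        sys.setprofile(self.dispatch)
        try:
            return func(*args)
        finally:
            sys.setprofile(None)


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


hot = TinyHotProfile()
answer = hot.runcall(fib, 5)
calls, self_time = hot.timings["fib"]
```

Since every return is counted separately, recursion needs no special
handling here: only self time is recorded, so nothing is charged twice.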