Blame - Objects/lnotab_notes.txt - platform/external/python/cpython3

blob: f482310a34dd3a999dde1b0958a109417596072a [file] [log] [blame]

Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	1	Description of the internal format of the line number table
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	2
Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	3	Conceptually, the line number table consists of a sequence of triples:
				4	start-offset (inclusive), end-offset (exclusive), line-number.
				5
Skip Montanaro	7cb033c	2021-03-19 18:10:54 -0500	[diff] [blame]	6	Note that not all byte codes have a line number so we need handle `None` for the line-number.
Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	7
				8	However, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry.
				9
Skip Montanaro	7cb033c	2021-03-19 18:10:54 -0500	[diff] [blame]	10	First, note that the end of one entry is the same as the start of the next, so we can overlap entries.
				11	Second, we don't really need arbitrary access to the sequence, so we can store deltas.
Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	12
				13	We just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero.
				14
Skip Montanaro	7cb033c	2021-03-19 18:10:54 -0500	[diff] [blame]	15	Third, most deltas are small, so we can use a single byte for each value, as long we allow several entries for the same line.
Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	16
				17	Consider the following table
				18	Start End Line
				19	0 6 1
				20	6 50 2
				21	50 350 7
				22	350 360 No line number
				23	360 376 8
				24	376 380 208
				25
				26	Stripping the redundant ends gives:
				27
				28	End-Start Line-delta
				29	6 +1
				30	44 +1
				31	300 +5
				32	10 No line number
				33	16 +1
				34	4 +200
				35
				36
				37	Note that the end - start value is always positive.
				38
Skip Montanaro	7cb033c	2021-03-19 18:10:54 -0500	[diff] [blame]	39	Finally, in order to fit into a single byte we need to convert start deltas to the range 0 <= delta <= 254,
Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	40	and line deltas to the range -127 <= delta <= 127.
				41	A line delta of -128 is used to indicate no line number.
Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	42	Also note that a delta of zero indicates that there are no bytecodes in the given range,
Skip Montanaro	7cb033c	2021-03-19 18:10:54 -0500	[diff] [blame]	43	which means we can use an invalid line number for that range.
Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	44
				45	Final form:
				46
				47	Start delta Line delta
				48	6 +1
				49	44 +1
				50	254 +5
				51	46 0
				52	10 -128 (No line number, treated as a delta of zero)
				53	16 +1
				54	0 +127 (line 135, but the range is empty as no bytecodes are at line 135)
				55	4 +73
Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	56
				57	Iterating over the table.
				58	-------------------------
				59
				60	For the `co_lines` attribute we want to emit the full form, omitting the (350, 360, No line number) and empty entries.
				61
				62	The code is as follows:
				63
				64	def co_lines(code):
				65	line = code.co_firstlineno
				66	end = 0
				67	table_iter = iter(code.internal_line_table):
				68	for sdelta, ldelta in table_iter:
Mark Shannon	877df85	2020-11-12 09:43:29 +0000	[diff] [blame]	69	if ldelta == 0: # No change to line number, just accumulate changes to end
				70	end += odelta
				71	continue
				72	start = end
				73	end = start + sdelta
				74	if ldelta == -128: # No valid line number -- skip entry
				75	continue
				76	line += ldelta
				77	if end == start: # Empty range, omit.
				78	continue
				79	yield start, end, line
				80
				81
				82
				83
				84	The historical co_lnotab format
				85	-------------------------------
				86
				87	prior to 3.10 code objects stored a field named co_lnotab.
				88	This was an array of unsigned bytes disguised as a Python bytes object.
				89
				90	The old co_lnotab did not account for the presence of bytecodes without a line number,
				91	nor was it well suited to tracing as a number of workarounds were required.
				92
				93	The old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format.
				94
				95	Below is the description of the old co_lnotab format:
				96
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	97
				98	The array is conceptually a compressed list of
				99	(bytecode offset increment, line number increment)
				100	pairs. The details are important and delicate, best illustrated by example:
				101
				102	byte code offset source code line number
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	103	0 1
				104	6 2
				105	50 7
Victor Stinner	f3914eb	2016-01-20 12:16:21 +0100	[diff] [blame]	106	350 207
				107	361 208
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	108
				109	Instead of storing these numbers literally, we compress the list by storing only
Victor Stinner	f3914eb	2016-01-20 12:16:21 +0100	[diff] [blame]	110	the difference from one row to the next. Conceptually, the stored list might
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	111	look like:
				112
Victor Stinner	f3914eb	2016-01-20 12:16:21 +0100	[diff] [blame]	113	0, 1, 6, 1, 44, 5, 300, 200, 11, 1
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	114
Victor Stinner	f3914eb	2016-01-20 12:16:21 +0100	[diff] [blame]	115	The above doesn't really work, but it's a start. An unsigned byte (byte code
Victor Stinner	9f78939	2016-01-21 18:12:29 +0100	[diff] [blame]	116	offset) can't hold negative values, or values larger than 255, a signed byte
Victor Stinner	f3914eb	2016-01-20 12:16:21 +0100	[diff] [blame]	117	(line number) can't hold values larger than 127 or less than -128, and the
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	118	above example contains two such values. (Note that before 3.6, line number
				119	was also encoded by an unsigned byte.) So we make two tweaks:
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	120
Victor Stinner	f3914eb	2016-01-20 12:16:21 +0100	[diff] [blame]	121	(a) there's a deep assumption that byte code offsets increase monotonically,
				122	and
				123	(b) if byte code offset jumps by more than 255 from one row to the next, or if
				124	source code line number jumps by more than 127 or less than -128 from one row
				125	to the next, more than one pair is written to the table. In case #b,
				126	there's no way to know from looking at the table later how many were written.
				127	That's the delicate part. A user of co_lnotab desiring to find the source
				128	line number corresponding to a bytecode address A should do something like
				129	this:
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	130
				131	lineno = addr = 0
				132	for addr_incr, line_incr in co_lnotab:
				133	addr += addr_incr
				134	if addr > A:
				135	return lineno
Victor Stinner	f3914eb	2016-01-20 12:16:21 +0100	[diff] [blame]	136	if line_incr >= 0x80:
				137	line_incr -= 0x100
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	138	lineno += line_incr
				139
				140	(In C, this is implemented by PyCode_Addr2Line().) In order for this to work,
				141	when the addr field increments by more than 255, the line # increment in each
				142	pair generated must be 0 until the remaining addr increment is < 256. So, in
				143	the example above, assemble_lnotab in compile.c should not (as was actually done
Victor Stinner	f3914eb	2016-01-20 12:16:21 +0100	[diff] [blame]	144	until 2.2) expand 300, 200 to
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	145	255, 255, 45, 45,
				146	but to
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	147	255, 0, 45, 127, 0, 73.
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	148
				149	The above is sufficient to reconstruct line numbers for tracebacks, but not for
				150	line tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
				151	and maybe_call_line_trace() in ceval.c.
				152
				153	* Tracing *
				154
				155	To a first approximation, we want to call the tracing function when the line
				156	number of the current instruction changes. Re-computing the current line for
				157	every instruction is a little slow, though, so each time we compute the line
				158	number we save the bytecode indices where it's valid:
				159
				160	instr_lb <= frame->f_lasti < instr_ub
				161
				162	is true so long as execution does not change lines. That is, *instr_lb holds
				163	the first bytecode index of the current line, and *instr_ub holds the first
				164	bytecode index of the next line. As long as the above expression is true,
				165	maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note
				166	that the same line may appear multiple times in the lnotab, either because the
				167	bytecode jumped more than 255 indices between line number changes or because
				168	the compiler inserted the same line twice. Even in that case, *instr_ub holds
				169	the first index of the next line.
				170
				171	However, we don't always want to call the line trace function when the above
				172	test fails.
				173
				174	Consider this code:
				175
				176	1: def f(a):
				177	2: while a:
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	178	3: print(1)
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	179	4: break
				180	5: else:
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	181	6: print(2)
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	182
				183	which compiles to this:
				184
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	185	2 0 SETUP_LOOP 26 (to 28)
				186	>> 2 LOAD_FAST 0 (a)
				187	4 POP_JUMP_IF_FALSE 18
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	188
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	189	3 6 LOAD_GLOBAL 0 (print)
				190	8 LOAD_CONST 1 (1)
				191	10 CALL_FUNCTION 1
				192	12 POP_TOP
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	193
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	194	4 14 BREAK_LOOP
				195	16 JUMP_ABSOLUTE 2
				196	>> 18 POP_BLOCK
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	197
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	198	6 20 LOAD_GLOBAL 0 (print)
				199	22 LOAD_CONST 2 (2)
				200	24 CALL_FUNCTION 1
				201	26 POP_TOP
				202	>> 28 LOAD_CONST 0 (None)
				203	30 RETURN_VALUE
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	204
Ivan Levkivskyi	9135275	2017-03-14 20:42:09 +0100	[diff] [blame]	205	If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18
Alexandre Vassalotti	7b82b40	2009-07-21 04:30:03 +0000	[diff] [blame]	206	and the co_lnotab will claim that execution has moved to line 4, which is wrong.
				207	In this case, we could instead associate the POP_BLOCK with line 5, but that
				208	would break jumps around loops without else clauses.
				209
				210	We fix this by only calling the line trace function for a forward jump if the
				211	co_lnotab indicates we have jumped to the start of a line, i.e. if the current
				212	instruction offset matches the offset given for the start of a line by the
				213	co_lnotab. For backward jumps, however, we always call the line trace function,
				214	which lets a debugger stop on every evaluation of a loop guard (which usually
				215	won't be the first opcode in a line).
				216
				217	Why do we set f_lineno when tracing, and only just before calling the trace
				218	function? Well, consider the code above when 'a' is true. If stepping through
				219	this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
				220	line events on lines 2, 3, and 4, then a "return" type event -- but because the
				221	code for the return actually falls in the range of the "line 6" opcodes, you
				222	would be shown line 6 during this event. This is a change from the behaviour in
				223	2.2 and before, and I've found it confusing in practice. By setting and using
				224	f_lineno when tracing, one can report a line number different from that
				225	suggested by f_lasti on this one occasion where it's desirable.