blob: 59c65257082ccbb67f50e05af46b231a3a085b96 [file] [log] [blame]
Mark Shannon877df852020-11-12 09:43:29 +00001Description of the internal format of the line number table
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +00002
Mark Shannon877df852020-11-12 09:43:29 +00003Conceptually, the line number table consists of a sequence of triples:
4 start-offset (inclusive), end-offset (exclusive), line-number.
5
Skip Montanaro7cb033c2021-03-19 18:10:54 -05006Note that not all byte codes have a line number so we need handle `None` for the line-number.
Mark Shannon877df852020-11-12 09:43:29 +00007
8However, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry.
9
Skip Montanaro7cb033c2021-03-19 18:10:54 -050010First, note that the end of one entry is the same as the start of the next, so we can overlap entries.
11Second, we don't really need arbitrary access to the sequence, so we can store deltas.
Mark Shannon877df852020-11-12 09:43:29 +000012
13We just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero.
14
Skip Montanaro7cb033c2021-03-19 18:10:54 -050015Third, most deltas are small, so we can use a single byte for each value, as long we allow several entries for the same line.
Mark Shannon877df852020-11-12 09:43:29 +000016
17Consider the following table
18 Start End Line
19 0 6 1
20 6 50 2
21 50 350 7
22 350 360 No line number
23 360 376 8
24 376 380 208
25
26Stripping the redundant ends gives:
27
28 End-Start Line-delta
29 6 +1
30 44 +1
31 300 +5
32 10 No line number
33 16 +1
34 4 +200
35
36
37Note that the end - start value is always positive.
38
Skip Montanaro7cb033c2021-03-19 18:10:54 -050039Finally, in order to fit into a single byte we need to convert start deltas to the range 0 <= delta <= 254,
Mark Shannon877df852020-11-12 09:43:29 +000040and line deltas to the range -127 <= delta <= 127.
41A line delta of -128 is used to indicate no line number.
42A start delta of 255 is used as a sentinel to mark the end of the table.
43Also note that a delta of zero indicates that there are no bytecodes in the given range,
Skip Montanaro7cb033c2021-03-19 18:10:54 -050044which means we can use an invalid line number for that range.
Mark Shannon877df852020-11-12 09:43:29 +000045
46Final form:
47
48 Start delta Line delta
49 6 +1
50 44 +1
51 254 +5
52 46 0
53 10 -128 (No line number, treated as a delta of zero)
54 16 +1
55 0 +127 (line 135, but the range is empty as no bytecodes are at line 135)
56 4 +73
57 255 (end mark) ---
58
59Iterating over the table.
60-------------------------
61
62For the `co_lines` attribute we want to emit the full form, omitting the (350, 360, No line number) and empty entries.
63
64The code is as follows:
65
66def co_lines(code):
67 line = code.co_firstlineno
68 end = 0
69 table_iter = iter(code.internal_line_table):
70 for sdelta, ldelta in table_iter:
71 if sdelta == 255:
72 break
73 if ldelta == 0: # No change to line number, just accumulate changes to end
74 end += odelta
75 continue
76 start = end
77 end = start + sdelta
78 if ldelta == -128: # No valid line number -- skip entry
79 continue
80 line += ldelta
81 if end == start: # Empty range, omit.
82 continue
83 yield start, end, line
84
85
86
87
88The historical co_lnotab format
89-------------------------------
90
91prior to 3.10 code objects stored a field named co_lnotab.
92This was an array of unsigned bytes disguised as a Python bytes object.
93
94The old co_lnotab did not account for the presence of bytecodes without a line number,
95nor was it well suited to tracing as a number of workarounds were required.
96
97The old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format.
98
99Below is the description of the old co_lnotab format:
100
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000101
102The array is conceptually a compressed list of
103 (bytecode offset increment, line number increment)
104pairs. The details are important and delicate, best illustrated by example:
105
106 byte code offset source code line number
Ivan Levkivskyi91352752017-03-14 20:42:09 +0100107 0 1
108 6 2
109 50 7
Victor Stinnerf3914eb2016-01-20 12:16:21 +0100110 350 207
111 361 208
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000112
113Instead of storing these numbers literally, we compress the list by storing only
Victor Stinnerf3914eb2016-01-20 12:16:21 +0100114the difference from one row to the next. Conceptually, the stored list might
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000115look like:
116
Victor Stinnerf3914eb2016-01-20 12:16:21 +0100117 0, 1, 6, 1, 44, 5, 300, 200, 11, 1
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000118
Victor Stinnerf3914eb2016-01-20 12:16:21 +0100119The above doesn't really work, but it's a start. An unsigned byte (byte code
Victor Stinner9f789392016-01-21 18:12:29 +0100120offset) can't hold negative values, or values larger than 255, a signed byte
Victor Stinnerf3914eb2016-01-20 12:16:21 +0100121(line number) can't hold values larger than 127 or less than -128, and the
Ivan Levkivskyi91352752017-03-14 20:42:09 +0100122above example contains two such values. (Note that before 3.6, line number
123was also encoded by an unsigned byte.) So we make two tweaks:
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000124
Victor Stinnerf3914eb2016-01-20 12:16:21 +0100125 (a) there's a deep assumption that byte code offsets increase monotonically,
126 and
127 (b) if byte code offset jumps by more than 255 from one row to the next, or if
128 source code line number jumps by more than 127 or less than -128 from one row
129 to the next, more than one pair is written to the table. In case #b,
130 there's no way to know from looking at the table later how many were written.
131 That's the delicate part. A user of co_lnotab desiring to find the source
132 line number corresponding to a bytecode address A should do something like
133 this:
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000134
135 lineno = addr = 0
136 for addr_incr, line_incr in co_lnotab:
137 addr += addr_incr
138 if addr > A:
139 return lineno
Victor Stinnerf3914eb2016-01-20 12:16:21 +0100140 if line_incr >= 0x80:
141 line_incr -= 0x100
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000142 lineno += line_incr
143
144(In C, this is implemented by PyCode_Addr2Line().) In order for this to work,
145when the addr field increments by more than 255, the line # increment in each
146pair generated must be 0 until the remaining addr increment is < 256. So, in
147the example above, assemble_lnotab in compile.c should not (as was actually done
Victor Stinnerf3914eb2016-01-20 12:16:21 +0100148until 2.2) expand 300, 200 to
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000149 255, 255, 45, 45,
150but to
Ivan Levkivskyi91352752017-03-14 20:42:09 +0100151 255, 0, 45, 127, 0, 73.
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000152
153The above is sufficient to reconstruct line numbers for tracebacks, but not for
154line tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
155and maybe_call_line_trace() in ceval.c.
156
157*** Tracing ***
158
159To a first approximation, we want to call the tracing function when the line
160number of the current instruction changes. Re-computing the current line for
161every instruction is a little slow, though, so each time we compute the line
162number we save the bytecode indices where it's valid:
163
164 *instr_lb <= frame->f_lasti < *instr_ub
165
166is true so long as execution does not change lines. That is, *instr_lb holds
167the first bytecode index of the current line, and *instr_ub holds the first
168bytecode index of the next line. As long as the above expression is true,
169maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note
170that the same line may appear multiple times in the lnotab, either because the
171bytecode jumped more than 255 indices between line number changes or because
172the compiler inserted the same line twice. Even in that case, *instr_ub holds
173the first index of the next line.
174
175However, we don't *always* want to call the line trace function when the above
176test fails.
177
178Consider this code:
179
1801: def f(a):
1812: while a:
Ivan Levkivskyi91352752017-03-14 20:42:09 +01001823: print(1)
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +00001834: break
1845: else:
Ivan Levkivskyi91352752017-03-14 20:42:09 +01001856: print(2)
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000186
187which compiles to this:
188
Ivan Levkivskyi91352752017-03-14 20:42:09 +0100189 2 0 SETUP_LOOP 26 (to 28)
190 >> 2 LOAD_FAST 0 (a)
191 4 POP_JUMP_IF_FALSE 18
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000192
Ivan Levkivskyi91352752017-03-14 20:42:09 +0100193 3 6 LOAD_GLOBAL 0 (print)
194 8 LOAD_CONST 1 (1)
195 10 CALL_FUNCTION 1
196 12 POP_TOP
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000197
Ivan Levkivskyi91352752017-03-14 20:42:09 +0100198 4 14 BREAK_LOOP
199 16 JUMP_ABSOLUTE 2
200 >> 18 POP_BLOCK
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000201
Ivan Levkivskyi91352752017-03-14 20:42:09 +0100202 6 20 LOAD_GLOBAL 0 (print)
203 22 LOAD_CONST 2 (2)
204 24 CALL_FUNCTION 1
205 26 POP_TOP
206 >> 28 LOAD_CONST 0 (None)
207 30 RETURN_VALUE
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000208
Ivan Levkivskyi91352752017-03-14 20:42:09 +0100209If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18
Alexandre Vassalotti7b82b402009-07-21 04:30:03 +0000210and the co_lnotab will claim that execution has moved to line 4, which is wrong.
211In this case, we could instead associate the POP_BLOCK with line 5, but that
212would break jumps around loops without else clauses.
213
214We fix this by only calling the line trace function for a forward jump if the
215co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
216instruction offset matches the offset given for the start of a line by the
217co_lnotab. For backward jumps, however, we always call the line trace function,
218which lets a debugger stop on every evaluation of a loop guard (which usually
219won't be the first opcode in a line).
220
221Why do we set f_lineno when tracing, and only just before calling the trace
222function? Well, consider the code above when 'a' is true. If stepping through
223this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
224line events on lines 2, 3, and 4, then a "return" type event -- but because the
225code for the return actually falls in the range of the "line 6" opcodes, you
226would be shown line 6 during this event. This is a change from the behaviour in
2272.2 and before, and I've found it confusing in practice. By setting and using
228f_lineno when tracing, one can report a line number different from that
229suggested by f_lasti on this one occasion where it's desirable.