All about co_lnotab, the line number table.

Code objects store a field named co_lnotab. This is an array of unsigned bytes
disguised as a Python string. It is used to map bytecode offsets to source code
line #s for tracebacks and to identify line number boundaries for line tracing.

The array is conceptually a compressed list of
    (bytecode offset increment, line number increment)
pairs. The details are important and delicate, best illustrated by example:

    byte code offset    source code line number
        0                   1
        6                   2
       50                   7
      350                 207
      361                 208

Instead of storing these numbers literally, we compress the list by storing only
the difference from one row to the next. Conceptually, the stored list might
look like:

    0, 1,  6, 1,  44, 5,  300, 200,  11, 1
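
As a minimal sketch of that subtraction, the increments can be computed from
the example table in a few lines of Python. The names to_deltas and ROWS are
illustrative only and do not appear in CPython; the starting point of
(offset 0, line 0) is an assumption chosen to reproduce the list shown here.

```python
def to_deltas(rows):
    """Flatten absolute (offset, line) rows into a list of increments."""
    deltas = []
    prev_offset = prev_line = 0          # assumed starting point
    for offset, line in rows:
        deltas += [offset - prev_offset, line - prev_line]
        prev_offset, prev_line = offset, line
    return deltas


# The example table: (byte code offset, source code line number).
ROWS = [(0, 1), (6, 2), (50, 7), (350, 207), (361, 208)]

print(to_deltas(ROWS))   # [0, 1, 6, 1, 44, 5, 300, 200, 11, 1]
```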

That simple list of differences doesn't really work, but it's a start. An
unsigned byte (byte code offset) can't hold negative values or values larger
than 255; a signed byte (line number) can't hold values larger than 127 or less
than -128; and the example table contains two such increments (300 and 200).
So we make two tweaks:

 (a) there's a deep assumption that byte code offsets increase monotonically,
 and
 (b) if byte code offset jumps by more than 255 from one row to the next, or if
     source code line number jumps by more than 127 or less than -128 from one
     row to the next, more than one pair is written to the table. In case #b,
     there's no way to know from looking at the table later how many were
     written. That's the delicate part. A user of co_lnotab desiring to find
     the source line number corresponding to a bytecode address A should do
     something like this:

    lineno = addr = 0
    for addr_incr, line_incr in co_lnotab:
        addr += addr_incr
        if addr > A:
            return lineno
        if line_incr >= 0x80:
            line_incr -= 0x100
        lineno += line_incr
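
For experimentation, the same lookup can be written as runnable Python. This is
a sketch rather than the CPython implementation: the name addr_to_line and the
first_line parameter are assumptions made here, and the table is taken to be a
Python 3 bytes object whose even positions hold offset increments and whose odd
positions hold line increments.

```python
def addr_to_line(lnotab, target, first_line=0):
    """Return the line number for bytecode offset `target`.

    `lnotab` holds (offset increment, line increment) byte pairs.
    `first_line` is a hypothetical starting line; 0 matches the loop above.
    """
    lineno = first_line
    addr = 0
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        addr += addr_incr
        if addr > target:
            return lineno
        if line_incr >= 0x80:        # reinterpret the byte as signed
            line_incr -= 0x100
        lineno += line_incr
    return lineno                    # target is at or past the last entry


# The example table, with 300, 200 split so that every value fits in a byte.
TABLE = bytes([0, 1, 6, 1, 44, 5, 255, 0, 45, 127, 0, 73, 11, 1])

for offset in (0, 6, 49, 50, 349, 350, 361):
    print(offset, addr_to_line(TABLE, offset))
# prints lines 1, 2, 2, 7, 7, 207, 208 for those offsets
```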

(In C, this lookup is implemented by PyCode_Addr2Line().) In order for this to
work, when the addr field increments by more than 255, the line # increment in
each pair generated must be 0 until the remaining addr increment is < 256.
Likewise, a line increment of more than 127 must be split into pieces that each
fit in a signed byte. So, in the example above, assemble_lnotab in compile.c
should not (as was actually done until 2.2) expand 300, 200 to
    255, 255, 45, 45,
but to
    255, 0, 45, 127, 0, 73.
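
The splitting rule can be sketched in Python as follows. This is not the code
of assemble_lnotab; encode_pair is a hypothetical helper written for these
notes, and it only assumes what is stated above: offset increments are emitted
in chunks of at most 255 with a line increment of 0, and line increments are
emitted in chunks that fit in a signed byte (127 being the largest such value).

```python
def encode_pair(addr_incr, line_incr):
    """Split one (offset increment, line increment) row into byte pairs."""
    out = []
    # Emit the oversized part of the offset first, with no line change.
    while addr_incr > 255:
        out += [255, 0]
        addr_incr -= 255
    # Then emit the line change in pieces that fit in a signed byte.
    while line_incr > 127:
        out += [addr_incr, 127]
        line_incr -= 127
        addr_incr = 0
    while line_incr < -128:
        out += [addr_incr, 0x80]     # 0x80 is -128 as a signed byte
        line_incr += 128
        addr_incr = 0
    out += [addr_incr, line_incr & 0xFF]
    return out


print(encode_pair(300, 200))   # [255, 0, 45, 127, 0, 73]
print(encode_pair(6, 1))       # [6, 1]
```

Feeding these pairs back through the lookup loop recovers the original totals:
the offset increments sum to 300 and the signed line increments sum to 200.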

The scheme described so far is sufficient to reconstruct line numbers for
tracebacks, but not for line tracing. Tracing is handled by
PyCode_CheckLineNumber() in codeobject.c and maybe_call_line_trace() in ceval.c.

*** Tracing ***

To a first approximation, we want to call the tracing function when the line
number of the current instruction changes. Re-computing the current line for
every instruction is a little slow, though, so each time we compute the line
number we save the bytecode indices where it's valid:

     *instr_lb <= frame->f_lasti < *instr_ub

is true so long as execution does not change lines. That is, *instr_lb holds
the first bytecode index of the current line, and *instr_ub holds the first
bytecode index of the next line. As long as the above expression is true,
maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note
that the same line may appear multiple times in the lnotab, either because the
bytecode jumped more than 255 indices between line number changes or because
the compiler inserted the same line twice. Even in that case, *instr_ub holds
the first index of the next line.
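
To make the bounds concrete, here is a Python sketch of how a lower and upper
bound could be derived from the table. It is loosely modelled on the behaviour
described for PyCode_CheckLineNumber() and is not a transcription of it; the
name line_bounds, the first_line parameter and the use of sys.maxsize as an
"unbounded" upper limit are assumptions made for this example.

```python
import sys


def line_bounds(lnotab, lasti, first_line=0):
    """Return (line, lower, upper) for the bytecode index `lasti`.

    `lower` is the first index of the current line and `upper` the first
    index of the next line, skipping entries that do not change the line.
    """
    pairs = list(zip(lnotab[0::2], lnotab[1::2]))
    addr, line, lower = 0, first_line, 0
    i = 0
    # Walk forward while the next entry still starts at or before lasti.
    while i < len(pairs) and addr + pairs[i][0] <= lasti:
        addr_incr, line_incr = pairs[i]
        addr += addr_incr
        if line_incr >= 0x80:            # reinterpret the byte as signed
            line_incr -= 0x100
        if line_incr != 0:               # only a real line change moves lower
            lower = addr
        line += line_incr
        i += 1
    if i == len(pairs):
        return line, lower, sys.maxsize  # no later line in the table
    # Skip entries with a zero line increment to find the next line's start.
    for addr_incr, line_incr in pairs[i:]:
        addr += addr_incr
        if line_incr != 0:
            break
    return line, lower, addr


TABLE = bytes([0, 1, 6, 1, 44, 5, 255, 0, 45, 127, 0, 73, 11, 1])
print(line_bounds(TABLE, 100))   # (7, 50, 350)
print(line_bounds(TABLE, 320))   # (7, 50, 350)
print(line_bounds(TABLE, 350))   # (207, 350, 361)
```

Indices 100 and 320 fall on either side of the 255, 0 filler entry, yet both
report the same bounds, which is the property the paragraph above relies on.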

However, we don't *always* want to call the line trace function when the
*instr_lb <= frame->f_lasti < *instr_ub test fails.

Consider this code:

1: def f(a):
2:    while a:
3:       print 1,
4:       break
5:    else:
6:       print 2,

which compiles to this:

  2           0 SETUP_LOOP              19 (to 22)
        >>    3 LOAD_FAST                0 (a)
              6 POP_JUMP_IF_FALSE       17

  3           9 LOAD_CONST               1 (1)
             12 PRINT_ITEM

  4          13 BREAK_LOOP
             14 JUMP_ABSOLUTE            3
        >>   17 POP_BLOCK

  6          18 LOAD_CONST               2 (2)
             21 PRINT_ITEM
        >>   22 LOAD_CONST               0 (None)
             25 RETURN_VALUE

If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 17
and the co_lnotab will claim that execution has moved to line 4, which is wrong.
In this case, we could instead associate the POP_BLOCK with line 5, but that
would break jumps around loops without else clauses.

We fix this by only calling the line trace function for a forward jump if the
co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
instruction offset matches the offset given for the start of a line by the
co_lnotab. For backward jumps, however, we always call the line trace function,
which lets a debugger stop on every evaluation of a loop guard (which usually
won't be the first opcode in a line).
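
The rule can be tried out against the disassembly above with a small Python
sketch. It only models the decision, not maybe_call_line_trace() itself; the
names LINE_STARTS, line_start and should_trace are hypothetical, and the list
of line-start offsets is read off the disassembly by hand.

```python
import bisect

# First bytecode offset of each source line in f: lines 2, 3, 4 and 6.
LINE_STARTS = [0, 9, 13, 18]


def line_start(offset):
    """Return the first offset of the line that contains `offset`."""
    return LINE_STARTS[bisect.bisect_right(LINE_STARTS, offset) - 1]


def should_trace(prev, cur):
    """Decide whether moving from offset `prev` to `cur` is a line event."""
    at_line_start = cur == line_start(cur)
    backward_jump = cur < prev
    return at_line_start or backward_jump


print(should_trace(6, 17))    # False: forward jump into the middle of line 4
print(should_trace(17, 18))   # True: offset 18 is the start of line 6
print(should_trace(14, 3))    # True: backward jump to the loop guard
```

With 'a' false, the jump from offset 6 to the POP_BLOCK at 17 produces no line
event, so line 4 is never reported, while the next instruction at offset 18
correctly reports line 6.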

Why do we set f_lineno when tracing, and only just before calling the trace
function? Well, consider the function f above when 'a' is true. If stepping
through this with 'n' in pdb, you would stop at line 1 with a "call" type
event, then line events on lines 2, 3, and 4, then a "return" type event -- but
because the code for the return actually falls in the range of the "line 6"
opcodes, you would be shown line 6 during this event. This is a change from
the behaviour in 2.2 and before, and I've found it confusing in practice. By
setting and using f_lineno when tracing, one can report a line number different
from that suggested by f_lasti on this one occasion where it's desirable.
129suggested by f_lasti on this one occasion where it's desirable.