Objects/lnotab_notes.txt - platform/external/python/cpython3 - Gitiles

 All about co_lnotab, the line number table.

 Code objects store a field named co_lnotab.  This is an array of unsigned bytes
 disguised as a Python bytes object.  It is used to map bytecode offsets to
 source code line #s for tracebacks and to identify line number boundaries for
 line tracing.

 The array is conceptually a compressed list of
     (bytecode offset increment, line number increment)
 pairs.  The details are important and delicate, best illustrated by example:

     byte code offset    source code line number
         0                   1
         6                   2
        50                   7
       350                 207
       361                 208

 Instead of storing these numbers literally, we compress the list by storing only
 the difference from one row to the next.  Conceptually, the stored list might
 look like:

     0, 1,  6, 1,  44, 5,  300, 200,  11, 1

 The above doesn't really work, but it's a start. An unsigned byte (byte code
 offset) can't hold negative values, or values larger than 255, a signed byte
 (line number) can't hold values larger than 127 or less than -128, and the
 above example contains two such values.  (Note that before 3.6, line number
 was also encoded by an unsigned byte.)  So we make two tweaks:

  (a) there's a deep assumption that byte code offsets increase monotonically,
  and
  (b) if byte code offset jumps by more than 255 from one row to the next, or if
  source code line number jumps by more than 127 or less than -128 from one row
  to the next, more than one pair is written to the table. In case #b,
  there's no way to know from looking at the table later how many were written.
  That's the delicate part.  A user of co_lnotab desiring to find the source
  line number corresponding to a bytecode address A should do something like
  this:

     lineno = addr = 0
     for addr_incr, line_incr in co_lnotab:
         addr += addr_incr
         if addr > A:
             return lineno
         if line_incr >= 0x80:
             line_incr -= 0x100
         lineno += line_incr

 (In C, this is implemented by PyCode_Addr2Line().)  In order for this to work,
 when the addr field increments by more than 255, the line # increment in each
 pair generated must be 0 until the remaining addr increment is < 256.  So, in
 the example above, assemble_lnotab in compile.c should not (as was actually done
 until 2.2) expand 300, 200 to
     255, 255, 45, 45,
 but to
     255, 0, 45, 127, 0, 73.

 The above is sufficient to reconstruct line numbers for tracebacks, but not for
 line tracing.  Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
 and maybe_call_line_trace() in ceval.c.

 *** Tracing ***

 To a first approximation, we want to call the tracing function when the line
 number of the current instruction changes.  Re-computing the current line for
 every instruction is a little slow, though, so each time we compute the line
 number we save the bytecode indices where it's valid:

      *instr_lb <= frame->f_lasti < *instr_ub

 is true so long as execution does not change lines.  That is, *instr_lb holds
 the first bytecode index of the current line, and *instr_ub holds the first
 bytecode index of the next line.  As long as the above expression is true,
 maybe_call_line_trace() does not need to call PyCode_CheckLineNumber().  Note
 that the same line may appear multiple times in the lnotab, either because the
 bytecode jumped more than 255 indices between line number changes or because
 the compiler inserted the same line twice.  Even in that case, *instr_ub holds
 the first index of the next line.

 However, we don't *always* want to call the line trace function when the above
 test fails.

 Consider this code:

 1: def f(a):
 2:    while a:
 3:       print(1)
 4:       break
 5:    else:
 6:       print(2)

 which compiles to this:

   2           0 SETUP_LOOP              26 (to 28)
         >>    2 LOAD_FAST                0 (a)
               4 POP_JUMP_IF_FALSE       18

   3           6 LOAD_GLOBAL              0 (print)
               8 LOAD_CONST               1 (1)
              10 CALL_FUNCTION            1
              12 POP_TOP

   4          14 BREAK_LOOP
              16 JUMP_ABSOLUTE            2
         >>   18 POP_BLOCK

   6          20 LOAD_GLOBAL              0 (print)
              22 LOAD_CONST               2 (2)
              24 CALL_FUNCTION            1
              26 POP_TOP
         >>   28 LOAD_CONST               0 (None)
              30 RETURN_VALUE

 If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18
 and the co_lnotab will claim that execution has moved to line 4, which is wrong.
 In this case, we could instead associate the POP_BLOCK with line 5, but that
 would break jumps around loops without else clauses.

 We fix this by only calling the line trace function for a forward jump if the
 co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
 instruction offset matches the offset given for the start of a line by the
 co_lnotab.  For backward jumps, however, we always call the line trace function,
 which lets a debugger stop on every evaluation of a loop guard (which usually
 won't be the first opcode in a line).

 Why do we set f_lineno when tracing, and only just before calling the trace
 function?  Well, consider the code above when 'a' is true.  If stepping through
 this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
 line events on lines 2, 3, and 4, then a "return" type event -- but because the
 code for the return actually falls in the range of the "line 6" opcodes, you
 would be shown line 6 during this event.  This is a change from the behaviour in
 2.2 and before, and I've found it confusing in practice.  By setting and using
 f_lineno when tracing, one can report a line number different from that
 suggested by f_lasti on this one occasion where it's desirable.
	All about co_lnotab, the line number table.

	Code objects store a field named co_lnotab. This is an array of unsigned bytes
	disguised as a Python bytes object. It is used to map bytecode offsets to
	source code line #s for tracebacks and to identify line number boundaries for
	line tracing.

	The array is conceptually a compressed list of
	(bytecode offset increment, line number increment)
	pairs. The details are important and delicate, best illustrated by example:

	byte code offset source code line number
	0 1
	6 2
	50 7
	350 207
	361 208

	Instead of storing these numbers literally, we compress the list by storing only
	the difference from one row to the next. Conceptually, the stored list might
	look like:

	0, 1, 6, 1, 44, 5, 300, 200, 11, 1

	The above doesn't really work, but it's a start. An unsigned byte (byte code
	offset) can't hold negative values, or values larger than 255, a signed byte
	(line number) can't hold values larger than 127 or less than -128, and the
	above example contains two such values. (Note that before 3.6, line number
	was also encoded by an unsigned byte.) So we make two tweaks:

	(a) there's a deep assumption that byte code offsets increase monotonically,
	and
	(b) if byte code offset jumps by more than 255 from one row to the next, or if
	source code line number jumps by more than 127 or less than -128 from one row
	to the next, more than one pair is written to the table. In case #b,
	there's no way to know from looking at the table later how many were written.
	That's the delicate part. A user of co_lnotab desiring to find the source
	line number corresponding to a bytecode address A should do something like
	this:

	lineno = addr = 0
	for addr_incr, line_incr in co_lnotab:
	addr += addr_incr
	if addr > A:
	return lineno
	if line_incr >= 0x80:
	line_incr -= 0x100
	lineno += line_incr

	(In C, this is implemented by PyCode_Addr2Line().) In order for this to work,
	when the addr field increments by more than 255, the line # increment in each
	pair generated must be 0 until the remaining addr increment is < 256. So, in
	the example above, assemble_lnotab in compile.c should not (as was actually done
	until 2.2) expand 300, 200 to
	255, 255, 45, 45,
	but to
	255, 0, 45, 127, 0, 73.

	The above is sufficient to reconstruct line numbers for tracebacks, but not for
	line tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
	and maybe_call_line_trace() in ceval.c.

	* Tracing *

	To a first approximation, we want to call the tracing function when the line
	number of the current instruction changes. Re-computing the current line for
	every instruction is a little slow, though, so each time we compute the line
	number we save the bytecode indices where it's valid:

	instr_lb <= frame->f_lasti < instr_ub

	is true so long as execution does not change lines. That is, *instr_lb holds
	the first bytecode index of the current line, and *instr_ub holds the first
	bytecode index of the next line. As long as the above expression is true,
	maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note
	that the same line may appear multiple times in the lnotab, either because the
	bytecode jumped more than 255 indices between line number changes or because
	the compiler inserted the same line twice. Even in that case, *instr_ub holds
	the first index of the next line.

	However, we don't always want to call the line trace function when the above
	test fails.

	Consider this code:

	1: def f(a):
	2: while a:
	3: print(1)
	4: break
	5: else:
	6: print(2)

	which compiles to this:

	2 0 SETUP_LOOP 26 (to 28)
	>> 2 LOAD_FAST 0 (a)
	4 POP_JUMP_IF_FALSE 18

	3 6 LOAD_GLOBAL 0 (print)
	8 LOAD_CONST 1 (1)
	10 CALL_FUNCTION 1
	12 POP_TOP

	4 14 BREAK_LOOP
	16 JUMP_ABSOLUTE 2
	>> 18 POP_BLOCK

	6 20 LOAD_GLOBAL 0 (print)
	22 LOAD_CONST 2 (2)
	24 CALL_FUNCTION 1
	26 POP_TOP
	>> 28 LOAD_CONST 0 (None)
	30 RETURN_VALUE

	If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18
	and the co_lnotab will claim that execution has moved to line 4, which is wrong.
	In this case, we could instead associate the POP_BLOCK with line 5, but that
	would break jumps around loops without else clauses.

	We fix this by only calling the line trace function for a forward jump if the
	co_lnotab indicates we have jumped to the start of a line, i.e. if the current
	instruction offset matches the offset given for the start of a line by the
	co_lnotab. For backward jumps, however, we always call the line trace function,
	which lets a debugger stop on every evaluation of a loop guard (which usually
	won't be the first opcode in a line).

	Why do we set f_lineno when tracing, and only just before calling the trace
	function? Well, consider the code above when 'a' is true. If stepping through
	this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
	line events on lines 2, 3, and 4, then a "return" type event -- but because the
	code for the return actually falls in the range of the "line 6" opcodes, you
	would be shown line 6 during this event. This is a change from the behaviour in
	2.2 and before, and I've found it confusing in practice. By setting and using
	f_lineno when tracing, one can report a line number different from that
	suggested by f_lasti on this one occasion where it's desirable.