njn | 76896d7 | 2005-07-19 03:30:31 +0000 | [diff] [blame] | 1 | |
| 2 | This file describes in detail how Calltree accurately tracks function |
| 3 | entry/exit, one of those harder-than-you'd-think things. |
| 4 | |
| 5 | ----------------------------------------------------------------------------- |
| 6 | Josef's description |
| 7 | ----------------------------------------------------------------------------- |
| 8 | From: Josef Weidendorfer <Josef.Weidendorfer@gmx.de> |
| 9 | To: Nicholas Nethercote <njn25@cam.ac.uk> |
| 10 | Cc: valgrind-developers@lists.sourceforge.net |
| 11 | Subject: [Valgrind-developers] Re: Tracking function entry/exit |
| 12 | |
| 13 | On Sunday 25 January 2004 16:53, Nicholas Nethercote wrote: |
| 14 | > Josef, |
| 15 | > |
| 16 | > The topic of tracking function entry/exit has come up a few times on the |
| 17 | > mailing lists recently. My usual answer is that it's difficult to do |
| 18 | > correctly. However, you seem to do it with Calltree. I looked at the |
| 19 | > source code a bit, and it looks like you are doing some reasonably |
| 20 | > complicated things to get it right, eg. unwinding the stack. How robust |
| 21 | > is your approach? Can you briefly explain how it works? |
| 22 | |
| 23 | A note before describing the mechanism: I need to have a helper call at start |
| 24 | of every BB anyway, so I use this helper to do the tracking. This of course |
| 25 | has some overhead, and perhaps can be avoided, but it seems to add to the |
| 26 | robustness. I have a bug fix here for reentrent entering of a signal handler |
| 27 | (2 bug reports). Otherwise I have no bug reports, so I assume that the |
| 28 | mechanism to be quite robust. |
| 29 | |
| 30 | I have a shadow call stack for every thread. For signal handlers of a thread, |
| 31 | I first PUSH a separation marker on the shadow stack, and use the stack as |
| 32 | normal. The marker is used for unwinding when leaving the signal handler. |
| 33 | This is fine as there is no scheduling among signal handlers of one thread. |
| 34 | |
| 35 | Instrumentation of calltree: |
| 36 | * Store at the end of each basic block the jmpkind into a tool-global, static |
| 37 | variable. |
| 38 | * At the start of every BB, jump to a helper function. |
| 39 | |
| 40 | The helper function does the following regarding function call tracking: |
| 41 | - for a control transfer to another ELF object/ELF section, override jmpkind |
| 42 | with a CALL (*1) |
| 43 | - for a control transfer to the 1st basic block of a function, override |
| 44 | jmpkind with a CALL (*2) |
| 45 | - do unwinding if needed (i.e, POPs of the shadow call stack) |
| 46 | - if jmpkind is RET and there was no unwinding/POP: |
| 47 | - if our call stack is empty, simulate a CALL lasting from beginning |
| 48 | (with Valgrind 2.1.x, this is not needed any more, as we run on |
| 49 | simulated CPU from first client instruction) |
| 50 | - otherwise this is a JMP using a RET instruction (typically used in |
| 51 | the runtime linker). Do a POP, setting previous BB address to call |
| 52 | site and override jmpkind with a CALL. By this, you get 2 function |
| 53 | calls from a calling site. |
| 54 | - when jmpkind is a CALL, push new function call from previous BB to current |
| 55 | BB on shadow call stack. |
| 56 | - Save current BB address to be available for call to handler in next BB. |
| 57 | |
| 58 | Special care is needed at thread switches and enter/leave of signal handlers, |
| 59 | as we need separate shadow call stacks. |
| 60 | |
| 61 | Known bug: We should check for the need of unwinding when ESP is explicitly |
| 62 | written to. I hope this doesn't create too much overhead. |
| 63 | |
| 64 | Remarks: |
| 65 | (*1) Jumps between ELF objects are function calls to a shared library. This is |
| 66 | mainly done to catch the JMP from PLT code. |
| 67 | (*2) This is what your function tracking skin/tool does. It is needed here |
| 68 | mainly to catch tail recursion. In general, for functions doing a |
| 69 | "return otherfunction()", GCC produces JMPs with -O2. |
| 70 | |
| 71 | Additional points: |
| 72 | - If I need a name for a function, but there is no debug info, I use the |
| 73 | instruction address minus the load offset of the corresponding ELF object |
| 74 | (if there is one) to get a relative address for that ELF object. This |
| 75 | offset can be used with objdump later in postprocessing tools (e.g. |
| 76 | objdump). I would suggest this change even for cachegrind instead of a |
| 77 | "???". |
| 78 | - I introduced the ability to specify functions to be "skipped". This means |
| 79 | that execution of these functions is attributed to the calling function. |
| 80 | The default is to skip all functions located in PLT sections. Thus, in |
| 81 | effect, costs of PLT functions are attributed to callers, and the call to |
| 82 | a shared library function starts directly with code in the other ELF |
| 83 | object. |
| 84 | - As Vg 2.1.x does pointerchecking, the instrumentation can't write to |
| 85 | memory space of Valgrind any longer. Currently, my tool needs |
| 86 | "--pointercheck=no" to be able to run. Jeremy and me already agreed on |
| 87 | replacing current LD/ST with a CLD/CST (Client Load/Store) with pointer |
| 88 | check and keep original LD/ST for tool usage without pointerchecking. |
| 89 | |
| 90 | Looking at these things, it seems possible to do function tracking at end of a |
| 91 | basic block instead of the beginning of the next BB. This way, we can perhaps |
| 92 | avoid calls to helpers at every BB. |
| 93 | |
| 94 | From my point of view, it would be great to integrate optional function |
| 95 | tracking into Valgrind core with some hooks. |
| 96 | |
| 97 | Josef |
| 98 | |
| 99 | |
| 100 | ----------------------------------------------------------------------------- |
| 101 | Josef's clarification of Nick's summary of Josef's description |
| 102 | ----------------------------------------------------------------------------- |
| 103 | On Monday 21 June 2004 12:15, Nicholas Nethercote wrote: |
| 104 | |
| 105 | > I've paraphrased your description to help me understand it better, but I'm |
| 106 | > still not quite clear on some points. I looked at the code, but found it |
| 107 | > hard to understand. Could you help me? I've written my questions in |
| 108 | > square brackets. Here's the description. |
| 109 | > |
| 110 | > -------- |
| 111 | > |
| 112 | > Data structures: |
| 113 | > |
| 114 | > - have a shadow call stack for every thread |
| 115 | > [not sure exactly what goes on this] |
| 116 | |
| 117 | That's the resizable array of struct _call_entry's. |
| 118 | Probably most important for call tracking is the %ESP value |
| 119 | directly after a CALL, and a pointer to some struct storing information |
| 120 | about the call arc or the called function. |
| 121 | |
| 122 | The esp value is needed to be able to robustly unwind correctly at %esp |
| 123 | changes with %esp > stored esp on shadow stack. |
| 124 | |
| 125 | > Action at BB start -- depends on jmp_kind from previous BB: |
| 126 | > |
| 127 | > - If jmp_kind is neither JmpCall nor JmpRet (ie. is JmpNone, JmpBoring, |
| 128 | > JmpCond or JmpSyscall) and we transferred from one ELF object/section to |
| 129 | > another, it must be a function call to a shared library -- treat as a |
| 130 | > call. This catches jmps from PLT code. |
| 131 | > |
| 132 | > - If this is the first BB of a function, treat as a call. This catches |
| 133 | > tail calls (which gcc uses for "return f()" with -O2). |
| 134 | > [What if a function had a 'goto' back to its beginning? Would that be |
| 135 | > interpreted as a call?] |
| 136 | |
| 137 | Yes. IMHO, there is no way to distinguish between optimized tail recursion |
| 138 | using a jump and regular jumping. But as most functions need parameters on |
| 139 | the stack, a normal jump will rarely jump to the first BB of a function, |
| 140 | wouldn't it? |
| 141 | |
| 142 | > - Unwind the shadow call stack if necessary. |
| 143 | > [when is "necessary"? If the real %esp > the shadow stack %esp?] |
| 144 | |
| 145 | Yes. Currently I do this at every BB boundary, but perhaps it should be |
| 146 | checked at every %esp change. Then, OTOH, it would look strange to attribute |
| 147 | instructions of one BB to different functions? |
| 148 | |
| 149 | > - If this is a function return and there was no shadow stack unwinding, |
| 150 | > this must be a RET control transfer (typically used in the runtime |
| 151 | > linker). Pop the shadow call stack, setting the previous BB address to |
| 152 | > call site and override jmpkind with a CALL. By this, you get 2 function |
| 153 | > calls from a calling site. |
| 154 | > [I don't understand this... What is a "RET control transfer"? Why do |
| 155 | > you end up with 2 function calls -- is that a bad thing?] |
| 156 | |
| 157 | If there is a RET instruction, this usually should unwind (i.e. leave a |
| 158 | function) at least one entry of the shadow call stack. But this doesn't need |
| 159 | to be the case, i.e. even after a RET, %esp could be lower or equal to the |
| 160 | one on the shadow stack. E.g. suppose |
| 161 | |
| 162 | PUSH addr |
| 163 | RET |
| 164 | |
| 165 | This is only another way of saying "JMP addr", and doesn't add/remove any |
| 166 | stack frame at all. |
| 167 | Now, if addr is (according to debug information) inside of another function, |
| 168 | this is a JMP between functions, let's say from B to C. Suppose B was called |
| 169 | from A, I generate a RETURN event to A and a CALL event from A to C in this |
| 170 | case. |
| 171 | |
| 172 | > - If we're treating the control transfer as a call, push new function call |
| 173 | > from previous BB to current BB on shadow call stack. |
| 174 | > [when is this information used?] |
| 175 | |
| 176 | I meant: Append a struct call_entry to the shadow stack (together with the |
| 177 | current %esp value). As I said before, the shadow stack is used for robust |
| 178 | unwinding. |
| 179 | |
| 180 | > - Save current BB address to be available for call to handler in next BB. |
| 181 | > |
| 182 | > |
| 183 | > Other actions: |
| 184 | > |
| 185 | > When entering a signal handler, first push a separation marker on the |
| 186 | > thread's shadow stack, then use it as normal. The marker is used for |
| 187 | > unwinding when leaving the signal handler. This is fine as there is no |
| 188 | > scheduling among signal handlers of one thread. |
| 189 | > |
| 190 | > Special care is needed at thread switches and enter/leave of signal |
| 191 | > handlers, as we need separate shadow call stacks. |
| 192 | > [Do you mean "separate shadow call stacks for each thread"?] |
| 193 | |
| 194 | Yes. |
| 195 | |
| 196 | > What about stack switching -- does it cope with that? (Not that Valgrind |
| 197 | > in general does...) |
| 198 | |
| 199 | No. |
| 200 | If you could give me a hint how to do it, I would be pleased. The problem here |
| 201 | IMHO is: How to distinguish among a stack switch and allocating a huge array |
| 202 | on the stack? |
| 203 | |
| 204 | Josef |
| 205 | |