| |
| This file describes in detail how Calltree accurately tracks function |
| entry/exit, one of those harder-than-you'd-think things. |
| |
| ----------------------------------------------------------------------------- |
| Josef's description |
| ----------------------------------------------------------------------------- |
| From: Josef Weidendorfer <Josef.Weidendorfer@gmx.de> |
| To: Nicholas Nethercote <njn25@cam.ac.uk> |
| Cc: valgrind-developers@lists.sourceforge.net |
| Subject: [Valgrind-developers] Re: Tracking function entry/exit |
| |
| On Sunday 25 January 2004 16:53, Nicholas Nethercote wrote: |
| > Josef, |
| > |
| > The topic of tracking function entry/exit has come up a few times on the |
| > mailing lists recently. My usual answer is that it's difficult to do |
| > correctly. However, you seem to do it with Calltree. I looked at the |
| > source code a bit, and it looks like you are doing some reasonably |
| > complicated things to get it right, eg. unwinding the stack. How robust |
| > is your approach? Can you briefly explain how it works? |
| |
| A note before describing the mechanism: I need to have a helper call at start |
| of every BB anyway, so I use this helper to do the tracking. This of course |
| has some overhead, and perhaps can be avoided, but it seems to add to the |
| robustness. I have a bug fix here for reentrent entering of a signal handler |
| (2 bug reports). Otherwise I have no bug reports, so I assume that the |
| mechanism to be quite robust. |
| |
| I have a shadow call stack for every thread. For signal handlers of a thread, |
| I first PUSH a separation marker on the shadow stack, and use the stack as |
| normal. The marker is used for unwinding when leaving the signal handler. |
| This is fine as there is no scheduling among signal handlers of one thread. |
| |
| Instrumentation of calltree: |
| * Store at the end of each basic block the jmpkind into a tool-global, static |
| variable. |
| * At the start of every BB, jump to a helper function. |
| |
| The helper function does the following regarding function call tracking: |
| - for a control transfer to another ELF object/ELF section, override jmpkind |
| with a CALL (*1) |
| - for a control transfer to the 1st basic block of a function, override |
| jmpkind with a CALL (*2) |
| - do unwinding if needed (i.e, POPs of the shadow call stack) |
| - if jmpkind is RET and there was no unwinding/POP: |
| - if our call stack is empty, simulate a CALL lasting from beginning |
| (with Valgrind 2.1.x, this is not needed any more, as we run on |
| simulated CPU from first client instruction) |
| - otherwise this is a JMP using a RET instruction (typically used in |
| the runtime linker). Do a POP, setting previous BB address to call |
| site and override jmpkind with a CALL. By this, you get 2 function |
| calls from a calling site. |
| - when jmpkind is a CALL, push new function call from previous BB to current |
| BB on shadow call stack. |
| - Save current BB address to be available for call to handler in next BB. |
| |
| Special care is needed at thread switches and enter/leave of signal handlers, |
| as we need separate shadow call stacks. |
| |
| Known bug: We should check for the need of unwinding when ESP is explicitly |
| written to. I hope this doesn't create too much overhead. |
| |
| Remarks: |
| (*1) Jumps between ELF objects are function calls to a shared library. This is |
| mainly done to catch the JMP from PLT code. |
| (*2) This is what your function tracking skin/tool does. It is needed here |
| mainly to catch tail recursion. In general, for functions doing a |
| "return otherfunction()", GCC produces JMPs with -O2. |
| |
| Additional points: |
| - If I need a name for a function, but there is no debug info, I use the |
| instruction address minus the load offset of the corresponding ELF object |
| (if there is one) to get a relative address for that ELF object. This |
| offset can be used with objdump later in postprocessing tools (e.g. |
| objdump). I would suggest this change even for cachegrind instead of a |
| "???". |
| - I introduced the ability to specify functions to be "skipped". This means |
| that execution of these functions is attributed to the calling function. |
| The default is to skip all functions located in PLT sections. Thus, in |
| effect, costs of PLT functions are attributed to callers, and the call to |
| a shared library function starts directly with code in the other ELF |
| object. |
| - As Vg 2.1.x does pointerchecking, the instrumentation can't write to |
| memory space of Valgrind any longer. Currently, my tool needs |
| "--pointercheck=no" to be able to run. Jeremy and me already agreed on |
| replacing current LD/ST with a CLD/CST (Client Load/Store) with pointer |
| check and keep original LD/ST for tool usage without pointerchecking. |
| |
| Looking at these things, it seems possible to do function tracking at end of a |
| basic block instead of the beginning of the next BB. This way, we can perhaps |
| avoid calls to helpers at every BB. |
| |
| From my point of view, it would be great to integrate optional function |
| tracking into Valgrind core with some hooks. |
| |
| Josef |
| |
| |
| ----------------------------------------------------------------------------- |
| Josef's clarification of Nick's summary of Josef's description |
| ----------------------------------------------------------------------------- |
| On Monday 21 June 2004 12:15, Nicholas Nethercote wrote: |
| |
| > I've paraphrased your description to help me understand it better, but I'm |
| > still not quite clear on some points. I looked at the code, but found it |
| > hard to understand. Could you help me? I've written my questions in |
| > square brackets. Here's the description. |
| > |
| > -------- |
| > |
| > Data structures: |
| > |
| > - have a shadow call stack for every thread |
| > [not sure exactly what goes on this] |
| |
| That's the resizable array of struct _call_entry's. |
| Probably most important for call tracking is the %ESP value |
| directly after a CALL, and a pointer to some struct storing information |
| about the call arc or the called function. |
| |
| The esp value is needed to be able to robustly unwind correctly at %esp |
| changes with %esp > stored esp on shadow stack. |
| |
| > Action at BB start -- depends on jmp_kind from previous BB: |
| > |
| > - If jmp_kind is neither JmpCall nor JmpRet (ie. is JmpNone, JmpBoring, |
| > JmpCond or JmpSyscall) and we transferred from one ELF object/section to |
| > another, it must be a function call to a shared library -- treat as a |
| > call. This catches jmps from PLT code. |
| > |
| > - If this is the first BB of a function, treat as a call. This catches |
| > tail calls (which gcc uses for "return f()" with -O2). |
| > [What if a function had a 'goto' back to its beginning? Would that be |
| > interpreted as a call?] |
| |
| Yes. IMHO, there is no way to distinguish between optimized tail recursion |
| using a jump and regular jumping. But as most functions need parameters on |
| the stack, a normal jump will rarely jump to the first BB of a function, |
| wouldn't it? |
| |
| > - Unwind the shadow call stack if necessary. |
| > [when is "necessary"? If the real %esp > the shadow stack %esp?] |
| |
| Yes. Currently I do this at every BB boundary, but perhaps it should be |
| checked at every %esp change. Then, OTOH, it would look strange to attribute |
| instructions of one BB to different functions? |
| |
| > - If this is a function return and there was no shadow stack unwinding, |
| > this must be a RET control transfer (typically used in the runtime |
| > linker). Pop the shadow call stack, setting the previous BB address to |
| > call site and override jmpkind with a CALL. By this, you get 2 function |
| > calls from a calling site. |
| > [I don't understand this... What is a "RET control transfer"? Why do |
| > you end up with 2 function calls -- is that a bad thing?] |
| |
| If there is a RET instruction, this usually should unwind (i.e. leave a |
| function) at least one entry of the shadow call stack. But this doesn't need |
| to be the case, i.e. even after a RET, %esp could be lower or equal to the |
| one on the shadow stack. E.g. suppose |
| |
| PUSH addr |
| RET |
| |
| This is only another way of saying "JMP addr", and doesn't add/remove any |
| stack frame at all. |
| Now, if addr is (according to debug information) inside of another function, |
| this is a JMP between functions, let's say from B to C. Suppose B was called |
| from A, I generate a RETURN event to A and a CALL event from A to C in this |
| case. |
| |
| > - If we're treating the control transfer as a call, push new function call |
| > from previous BB to current BB on shadow call stack. |
| > [when is this information used?] |
| |
| I meant: Append a struct call_entry to the shadow stack (together with the |
| current %esp value). As I said before, the shadow stack is used for robust |
| unwinding. |
| |
| > - Save current BB address to be available for call to handler in next BB. |
| > |
| > |
| > Other actions: |
| > |
| > When entering a signal handler, first push a separation marker on the |
| > thread's shadow stack, then use it as normal. The marker is used for |
| > unwinding when leaving the signal handler. This is fine as there is no |
| > scheduling among signal handlers of one thread. |
| > |
| > Special care is needed at thread switches and enter/leave of signal |
| > handlers, as we need separate shadow call stacks. |
| > [Do you mean "separate shadow call stacks for each thread"?] |
| |
| Yes. |
| |
| > What about stack switching -- does it cope with that? (Not that Valgrind |
| > in general does...) |
| |
| No. |
| If you could give me a hint how to do it, I would be pleased. The problem here |
| IMHO is: How to distinguish among a stack switch and allocating a huge array |
| on the stack? |
| |
| Josef |
| |