This file contains various notes, ideas and history related
to gdbserver in valgrind.

How to use Valgrind gdbserver ?
-------------------------------
This is described in the Valgrind user manual.
Read the user manual before reading the notes below.

What is gdbserver ?
-------------------
The gdb debugger is typically used to debug a process running
on the same machine: gdb uses system calls (such as ptrace)
to fetch data from the process being debugged,
to change data in the process,
to interrupt the process,
and so on.

gdb can also debug processes running on a different computer
(e.g. it can debug a process running on a small real-time
board).

gdb does this by sending commands (e.g. using tcp/ip) to a piece
of code running on the remote computer. This piece of code (called a
gdb stub on small boards, or gdbserver when the remote computer runs
an OS such as GNU/Linux) provides a set of commands allowing gdb
to remotely debug the process. Examples of commands are: "get the
registers", "get the list of running threads", "read xxx bytes at
address yyyyyyyy", etc. The definition of all these commands and the
associated replies is the gdb remote serial protocol, which is
documented in Appendix D of the gdb user manual.
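
As an illustration of what such a command looks like on the wire, the
sketch below frames and parses a packet in the remote serial protocol
layout "$<payload>#<checksum>", where the checksum is the sum of the
payload bytes modulo 256 written as two hex digits. The helper names are
made up for this note; escaping, run-length encoding and the "+"/"-"
acknowledgements of the real protocol are deliberately left out.

```python
def checksum(payload: str) -> str:
    """Sum of the payload bytes modulo 256, as two lowercase hex digits."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def frame_packet(payload: str) -> str:
    """Wrap a payload as $<payload>#<checksum> (no escaping handled)."""
    return f"${payload}#{checksum(payload)}"


def parse_packet(packet: str) -> str:
    """Return the payload of a framed packet, verifying its checksum."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload, received = packet[1:-3], packet[-2:]
    if checksum(payload) != received.lower():
        raise ValueError("bad checksum")
    return payload


# "g" is the request to read the general registers.
print(frame_packet("g"))  # prints $g#67
```

A stub such as the one embedded in valgrind would read bytes until it has
a complete packet of this shape, check it, and then act on the payload.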

The standard gdb distribution has a standalone gdbserver (a small
executable) which implements this protocol and the system calls needed
to allow gdb to remotely debug processes running on Linux, MacOS,
Solaris, ...

Activation of gdbserver code inside valgrind
--------------------------------------------
The gdbserver code (from gdb 6.6, GPL2+) has been modified so as to
link it with valgrind and allow the valgrind guest process to be
debugged by a gdb speaking to this gdbserver embedded in valgrind.
The ptrace system calls inside gdbserver have been replaced by reading
the state of the guest.

The gdbserver functionality is activated with valgrind command line
options. If gdbserver is not enabled (--vgdb=no), the impact on
valgrind runtime is minimal: valgrind checks the command line option
at startup to see that there is nothing to do for gdbserver, and
there is an "if gdbserver is active" check in the translate
function of translate.c and an "if" in the valgrind scheduler.
If the valgrind gdbserver is activated (--vgdb=yes), the impact
is also minimal (from time to time, the valgrind scheduler checks a
counter in memory). Option --vgdb-poll=yyyyy controls how often the
scheduler does a (somewhat) heavier check to see if gdbserver needs to
stop execution of the guest to allow debugging.
If valgrind gdbserver is activated with --vgdb=full, then
each instruction is instrumented with an additional call to a dirty
helper.

How does gdbserver code interact with valgrind ?
------------------------------------------------
When an error is reported, the gdbserver code is called. It reads
commands from gdb using the read system call on a FIFO (e.g. a command
such as "get the registers"). It executes the command (e.g. fetches
the registers from the guest state) and writes the reply (e.g. a
packet containing the register data). When gdb instructs gdbserver to
"continue", control is returned to valgrind, which then continues
to execute guest code. The FIFOs used for communication between
valgrind and gdb are created at startup if gdbserver is activated
according to the --vgdb=no/yes/full command line option.

How are signals "handled" ?
---------------------------
When a signal is to be given to the guest, valgrind core first calls
gdbserver (if a gdb is currently connected to valgrind; otherwise the
signal is delivered immediately). If gdb instructs gdbserver to give
the signal to the process, the signal is delivered to the guest.
Otherwise, the signal is ignored (not given to the guest). The user can
further decide in gdb whether to pass the signal or not.
Note that some (fatal) signals cannot be ignored.

How are "break/step/stepi/next/..." implemented ?
-------------------------------------------------
When a break is put by gdb on an instruction, a command is sent to the
gdbserver in valgrind. This causes the basic block of this instruction
to be discarded and then re-instrumented so as to insert calls to a
dirty helper which calls the gdbserver code. When a block is
instrumented for gdbserver, all the "jump targets" of this block are
invalidated, so as to allow step/stepi/next to work properly: these
blocks will themselves automatically be re-instrumented for gdbserver
if they are jumped to.
The valgrind gdbserver remembers which blocks have been instrumented
due to this "lazy 'jump targets' debugging instrumentation" so as to
discard these "debugging translations" when gdb instructs it to continue
the execution normally.
The blocks in which an explicit break has been put by the user
are kept instrumented for gdbserver
(but note that by default, gdb removes all breaks when the
process is stopped, and re-inserts all breaks when the process
is continued). This behaviour can be changed using the gdb
command 'set breakpoint always-inserted'.

How are watchpoints implemented ?
---------------------------------
Watchpoints require support from the tool to detect that
a location is read and/or written. Currently, only memcheck
supports this: when a watchpoint is placed, memcheck changes
the addressability bits of the watched memory zone to make it
inaccessible. Before an access, memcheck then detects an error, but
sees that this error is due to a watchpoint and gives control back
to gdb.
Stopping on the exact instruction for a write watchpoint requires
--vgdb=full. This is because the error is detected by memcheck
before the value is modified. gdb checks that the value has not changed
and so "does not believe" the information that the write watchpoint
was triggered, and continues the execution. At the next watchpoint
occurrence, gdb sees that the value has changed. But the watchpoints
are all reported "off by one". To avoid this, Valgrind gdbserver must
complete the current instruction before reporting the write watchpoint.
Completing precisely the current instruction requires all the
instructions of the block to be instrumented for gdbserver, even
if there is no break in this block. This is ensured by --vgdb=full.
See Bool VG_(is_watched) in m_gdbserver.c, where watchpoint handling
is implemented.
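
The "off by one" effect can be reproduced with a toy model. The sketch
below is not valgrind or gdb code: it only assumes, as described above,
that gdb compares the watched value with the value it saw at the
previous stop and silently continues when nothing changed. The function
name and the tuple layout are invented for this illustration.

```python
def watch_reports(writes, report_after_instruction):
    """Return (write index, old value, new value) for each stop gdb reports.

    writes: successive values stored into the watched location.
    report_after_instruction: True models --vgdb=full (the writing
    instruction completes before the stop is reported), False models a
    stop reported before the write has happened.
    """
    value = 0       # current content of the watched location
    last_seen = 0   # value gdb remembered at the previous stop
    reports = []
    for index, new_value in enumerate(writes):
        if report_after_instruction:
            value = new_value            # instruction completes first
        # gdbserver signals the watchpoint; gdb compares the values.
        if value != last_seen:
            reports.append((index, last_seen, value))
            last_seen = value
        if not report_after_instruction:
            value = new_value            # the write happens after the stop
    return reports


print(watch_reports([1, 2, 3], report_after_instruction=False))
print(watch_reports([1, 2, 3], report_after_instruction=True))
```

In the first call the change made by write 0 is only noticed at write 1,
and the last write is never reported; in the second call each write is
reported at the instruction that performed it.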

How is the Valgrind gdbserver receiving commands/packets from gdb ?
-------------------------------------------------------------------
The embedded gdbserver reads gdb commands on a named pipe having
(by default) the name /tmp/vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST
where PID, USER, and HOST will be replaced by the actual pid, the user id,
and the host name, respectively.
The embedded gdbserver will reply to gdb commands on a named pipe
/tmp/vgdb-pipe-to-vgdb-from-PID-by-USER-on-HOST

gdb does not speak directly with gdbserver in valgrind: a relay application
called vgdb is needed between gdb and the valgrind-ified process.
gdb writes commands on the stdin of vgdb. vgdb reads these
commands and writes them on FIFO /tmp/vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST.
vgdb reads replies on FIFO /tmp/vgdb-pipe-to-vgdb-from-PID-by-USER-on-HOST
and writes them on its stdout.
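
The naming scheme above can be written down as a small helper. This is
only a sketch of the default naming pattern given in this file; the
function name is hypothetical, and the real code in valgrind and vgdb may
build the names differently.

```python
def vgdb_fifo_names(pid, user, host, prefix="/tmp/vgdb-pipe"):
    """Return (gdb-to-valgrind FIFO, valgrind-to-gdb FIFO) path names.

    prefix defaults to the /tmp/vgdb-pipe prefix mentioned in this file.
    """
    suffix = f"{pid}-by-{user}-on-{host}"
    commands = f"{prefix}-from-vgdb-to-{suffix}"   # written by vgdb
    replies = f"{prefix}-to-vgdb-from-{suffix}"    # read by vgdb
    return commands, replies


# Example values (pid, user and host are made up).
for name in vgdb_fifo_names(1234, "alice", "devbox"):
    print(name)
```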

Note: named pipes were preferred to tcp/ip connections because
they allow discovering which valgrind-ified processes are ready to accept
commands by looking at files starting with the /tmp/vgdb-pipe- prefix
(changeable by a command line option).
Also, the usual unix protections protect
the valgrind process against other users sending commands.
The relay process also takes care of waking up the valgrind
process in case all threads are blocked in a system call.
The relay process can also be used in a shell to send commands
without a gdb (this provides a standard mechanism to control
valgrind tools from the command line, rather than specialized mechanisms
such as the one in callgrind).
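
The discovery idea can be sketched as follows: list a directory and keep
the entries that match the default naming pattern. This is an
illustration only, not the logic used by vgdb; the function name is
hypothetical, and the pattern assumes user and host names that do not
themselves contain "-on-".

```python
import os
import re
import tempfile


def discover(directory="/tmp", prefix="vgdb-pipe"):
    """Return (pid, user, host) for each command FIFO name in directory."""
    pattern = re.compile(
        "^" + re.escape(prefix) + r"-from-vgdb-to-(\d+)-by-(.+)-on-(.+)$"
    )
    found = []
    for name in sorted(os.listdir(directory)):
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), match.group(2), match.group(3)))
    return found


# Demonstration with ordinary files standing in for the FIFOs.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in (
        "vgdb-pipe-from-vgdb-to-1234-by-alice-on-devbox",
        "vgdb-pipe-to-vgdb-from-1234-by-alice-on-devbox",
        "unrelated.txt",
    ):
        open(os.path.join(demo_dir, name), "w").close()
    print(discover(demo_dir))
```

Only the "from-vgdb-to" name is matched, so each process is listed once
even though it owns two pipes.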

How is gdbserver activated if all Valgrind threads are blocked in a syscall ?
-----------------------------------------------------------------------------
vgdb relays characters from gdb to valgrind. The scheduler checks from
time to time whether gdbserver has to handle incoming characters
(the check is efficient, i.e. most of the time it consists in checking
a counter in (shared) memory).

However, it might be that all the threads in the valgrind process are
blocked in a system call. In such a case, no polling will be done by
the valgrind scheduler (as no activity takes place). By default, vgdb
checks after 100ms whether the characters it has written have been read
by valgrind. If not, vgdb forces the invocation of the gdbserver
code inside the valgrind process.

On Linux, this forced invocation is implemented using the ptrace system call:
using ptrace, vgdb causes the valgrind process to call the
gdbserver code.

This wake up is *not* done using signals, as this would require
implementing a syscall restart logic in valgrind for all system
calls. When using ptrace as above, the Linux kernel is responsible for
restarting the system call.

This wake up is also *not* implemented by having a "system thread"
started by valgrind, as this would transform all non-threaded programs
into threaded programs when running under valgrind. Also, such a 'system
thread' for gdbserver was tried by Greg Parker in the early MacOS
port, and was unreliable.

So, the ptrace based solution was chosen instead.

There used to be some bugs in the kernel when using ptrace on
a process blocked in a system call: the symptom is that the system
call fails with an unknown errno 512. This typically happens
with a 64-bit vgdb ptrace-ing a 32-bit process.
A bypass for old kernels has been integrated in vgdb.c (sign extend
register rax).
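
The arithmetic behind "sign extend register rax" can be illustrated as
follows. The value 512 matches the kernel-internal restart code
ERESTARTSYS, which is normally never seen by user space. The sketch
assumes that the 32-bit process holds -512 in its 32-bit register and
that the 64-bit view of that register loses the sign; it shows only the
arithmetic and does not reproduce the code in vgdb.c. Function names are
invented.

```python
def sign_extend_32_to_64(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    low32 = value & 0xFFFFFFFF
    if low32 & 0x80000000:          # bit 31 set: the number is negative
        return low32 - (1 << 32)
    return low32


def as_u64(value):
    """Two's complement bit pattern of value in a 64-bit register."""
    return value & 0xFFFFFFFFFFFFFFFF


raw = 0xFFFFFE00                     # -512 as seen in a 32-bit register
print(raw)                           # zero-extended view: a large positive number
print(sign_extend_32_to_64(raw))     # sign-extended view: -512
print(hex(as_u64(sign_extend_32_to_64(raw))))
```

With the zero-extended value the register no longer looks like a negative
restart code; after sign extension it is again -512 in 64 bits.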

At least on Fedora Core 12 (kernel 2.6.32), syscall restart of read
and select works ok, and on Red Hat 5.3 (an old kernel), everything
works properly.

Need to investigate if Darwin can similarly do syscall
restart with ptrace.

The vgdb argument --max-invoke-ms=xxx controls the number of
milliseconds after which vgdb will force the invocation of gdbserver
code. If xxx is 0, this disables the forced invocation.
Also, disabling this ptrace mechanism is necessary in case you are
debugging the valgrind code at the same time as debugging the guest
process using gdbserver.
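
The decision described in this section can be summarised by a small
predicate. This is a simplified model with hypothetical names, not the
vgdb source: it assumes vgdb can compare how many characters it has
written with how many valgrind has read, and it uses the 100ms default
mentioned above.

```python
def should_force_invoke(written, read, elapsed_ms, max_invoke_ms=100):
    """Decide whether vgdb should force the gdbserver invocation.

    written/read: character counts written by vgdb and read by valgrind.
    elapsed_ms: time since vgdb wrote the pending characters.
    max_invoke_ms: value of --max-invoke-ms; 0 disables forced invocation.
    """
    if max_invoke_ms == 0:
        return False                 # mechanism disabled by the user
    if read >= written:
        return False                 # valgrind already consumed the input
    return elapsed_ms >= max_invoke_ms


print(should_force_invoke(written=10, read=0, elapsed_ms=150))
print(should_force_invoke(written=10, read=0, elapsed_ms=150, max_invoke_ms=0))
```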

Do not kill -9 vgdb while it has interrupted the valgrind process,
otherwise the valgrind process will very probably stay stopped or die.

On Solaris, this forced invocation is implemented via an agent thread.
The process is first stopped (all the threads at once), and a special agent
thread is created which forces the gdbserver invocation. After its
work is done, the agent thread is destroyed and the process is resumed.
Agent thread functionality is a Solaris OS feature, also used by debuggers.
Therefore the vgdb-invoker-solaris implementation is really small.

Implementation is based on the gdbserver code from gdb 6.6
----------------------------------------------------------
The gdbserver implementation is derived from the gdbserver included
in the gdb distribution.
The files originating from gdb are: inferiors.c, regcache.[ch],
regdef.h, remote-utils.c, server.[ch], signals.c, target.[ch], utils.c,
version.c.
The valgrind-low-* files are inspired by gdb files.

This code had to be changed to integrate properly within valgrind
(e.g. no libc usage). Some of these changes have been made by
using the preprocessor to replace calls by their valgrind equivalents,
e.g. #define strcmp(...) VG_(strcmp) (...).

Some "control flow" changes are due to the fact that gdbserver inside
valgrind must return control to valgrind when the 'debugged'
process has to run, while in a classical gdbserver usage, the
gdbserver process waits for a debugged process to stop on a break or
similar. This required some variables to remember the
state of gdbserver before returning to valgrind (search for
resume_packet_needed in server.c) and a "goto" to the place where
gdbserver expects a stopped process to return control to gdbserver.

How does a tool need to be changed to be "debuggable" ?
-------------------------------------------------------
There is no need to modify a tool to make it "debuggable" via
gdbserver: e.g. reports of errors, breaks, etc. will work "out of the
box". If interactive usage of tool client requests or similar is
desired for a tool, then simple code can be written for that via the
specific client request code VG_USERREQ__GDB_MONITOR_COMMAND. The tool
function "handle_client_request" must then parse the string received
as argument and call the expected valgrind or tool code. See
e.g. massif ms_handle_client_request as an example.


Automatic regression tests:
---------------------------
Automatic Valgrind gdbserver tests are in the directory
$(top_srcdir)/gdbserver_tests.
Read $(top_srcdir)/gdbserver_tests/README_DEVELOPERS for more
info about testing.

How to integrate support for a new architecture xxx?
----------------------------------------------------
Let's imagine a new architecture hal9000 has to be supported.

Mandatory:
The main thing to do is to make a file valgrind-low-hal9000.c.
Start from an existing file (e.g. valgrind-low-x86.c).
The data structures 'struct reg regs'
and 'const char *expedite_regs' are built from files
in the gdb sources, e.g. for a new arch hal9000
   cd gdb/regformats
   sh ./regdat.sh reg-hal9000.dat hal9000

From the generated file hal9000, copy/paste into
valgrind-low-hal9000.c the two needed data structures and change their
names to 'regs' and 'expedite_regs'.

Then adapt the set of functions needed to initialize the structure
'static struct valgrind_target_ops low_target'.

Optional but heavily recommended:
To have a proper wake up of a Valgrind process with all threads
blocked in a system call, some architecture specific code
has to be written in vgdb-invoker-*.c.
Typically, for a Linux system supporting ptrace, you have to modify
vgdb-invoker-ptrace.c.

For Linux based platforms, all the ptrace calls in vgdb-invoker-ptrace.c
should be ok.
The only thing needed is the code to "push a dummy call" on the stack,
i.e. assign the relevant registers in the struct user_regs_struct, and push
values on the stack according to the ABI.

For other platforms (e.g. MacOS), more work is needed as the ptrace calls
on MacOS are different and/or incomplete (and so, 'Mach' specific
things are needed e.g. to attach to threads etc).
A courageous Mac aficionado is welcome on this aspect.

For Solaris, only architecture specific functionality in vgdb-invoker-solaris.c
needs to be implemented, similar to Linux above.

Optional:
To let gdb see the Valgrind shadow registers, xml description
files have to be provided, and valgrind-low-hal9000.c has
to give the top xml file.
Start from the xml files found in the gdb distribution directory
gdb/features. You need to duplicate and modify these files to provide
the shadow1 and shadow2 register set descriptions.

Modify coregrind/Makefile.am:
    add valgrind-low-hal9000.c
    If you have target xml descriptions, also add them to GDBSERVER_XML_FILES
sewardj3b290482011-05-06 21:02:55 +0000310
311
sewardj3b290482011-05-06 21:02:55 +0000312TODO and/or additional nice things to have
313------------------------------------------
314* many options can be changed on-line without problems.
sewardj30b3eca2011-06-28 08:20:39 +0000315 => would be nice to have a v.option command that would evaluate
sewardj3b290482011-05-06 21:02:55 +0000316 its arguments like the startup options of m_main.c and tool clo processing.
317
sewardj30b3eca2011-06-28 08:20:39 +0000318* have a memcheck monitor command
philippea22f59d2012-01-26 23:13:52 +0000319 show_dangling_pointers [last_n_recently_released_blocks]
320 showing which of the n last recently released blocks are still
321 referenced. These references are (potential) dangling pointers.
sewardj3b290482011-05-06 21:02:55 +0000322
323* some GDBTD in the code
324
325(GDBTD = GDB To Do = something still to look at and/or a question)
326
327* All architectures and platforms are done.
328 But there are still some "GDBTD" to convert between gdb registers
329 and VEX registers :
330 e.g. some registers in x86 or amd64 that I could not
331 translate to VEX registers. Someone with a good knowledge
332 of these architectures might complete this
333 (see the GDBTD in valgrind-low-*.c)
334
sewardj3b290482011-05-06 21:02:55 +0000335* Currently, at least on recent linux kernel, vgdb can properly wake
336 up a valgrind process which is blocked in system calls. Maybe we
337 need to see till which kernel version the ptrace + syscall restart
338 is broken, and put the default value of --max-invoke-ms to 0 in this
339 case.
340
341* more client requests can be programmed in various tools. Currently,
342 there are only a few standard valgrind or memcheck client requests
343 implemented.
sewardj30b3eca2011-06-28 08:20:39 +0000344 v.suppression [generate|add|delete] might be an interesting command:
sewardj3b290482011-05-06 21:02:55 +0000345 generate would output a suppression, add/delete would add a suppression
346 in memory for the last (or selected?) error.
sewardj30b3eca2011-06-28 08:20:39 +0000347 v.break on fn calls/entry/exit + commands associated to it
sewardj3b290482011-05-06 21:02:55 +0000348 (such as search leaks)?
349
350
sewardj3b290482011-05-06 21:02:55 +0000351* currently jump(s) and inferior call(s) are somewhat dangerous
352 when called from a block not yet instrumented : instead
353 of continuing till the next Imark, where there will be a
354 debugger call that can properly jump at an instruction boundary,
355 the jump/call will quit the "middle" of an instruction.
356 We could detect if the current block is instrumented by a trick
357 like this:
358 /* Each time helperc_CallDebugger is called, we will store
359 the address from which is it called and the nr of bbs_done
360 when called. This allows to detect that gdbserver is called
361 from a block which is instrumented. */
362 static HWord CallDebugger_addr;
363 static ULong CallDebugger_bbs_done;
364
365 Bool VG_(gdbserver_current_IP_instrumented) (ThreadId tid)
366 {
367 if (VG_(get_IP) (tid) != CallDebugger_addr
368 || CallDebugger_bbs_done != VG_(bbs_done)())
369 return False;
370 return True;
371 }
372
373 Alternatively, we ensure we can re-instrument the current
374 block for gdbserver while executing it.
375 Something like:
376 keep current block till the end of the current instruction, then
377 go back to scheduler.
378 Unsure if and how this is do-able.
379
380
381* ensure that all non static symbols of gdbserver files are #define
382 xxxxx VG_(xxxxx) ???? Is this really needed ? I have tried to put in
383 a test program variables and functions with the same name as valgrind
384 stuff, and everything seems to be ok.
385 I see that all exported symbols in valgrind have a unique prefix
386 created with VG_ or MC_ or ...
387 This is not done for the "gdb gdbserver code", where I have kept
388 the original names. Is this a problem ? I could not create
389 a "symbol" collision between the user symbol and the valgrind
390 core gdbserver symbol.
391
392* currently, gdbserver can only stop/continue the whole process. It
393 might be interesting to have a fine-grained thread control (vCont
394 packet) maybe for tools such as helgrind, drd. This would allow the
395 user to stop/resume specific threads. Also, maybe this would solve
396 the following problem: wait for a breakpoint to be encountered,
397 switch thread, next. This sometimes causes an internal error in gdb,
398 probably because gdb believes the current thread will be continued ?
399
400* would be nice to have some more tests.
401
402* better valgrind target support in gdb (see comments of Tom Tromey).


-------- description of how gdb invokes a function in the inferior
to call a function in the inferior (below is for x86):
gdb writes ESP and EBP to have some more stack space
pushes a return address equal to 0x8048390 <_start>
puts a break at 0x8048390
puts the address of the function to call in EIP (e.g. hello_world at 0x8048444)
continue
break encountered at 0x8048391 (90 after decrement)
 => report stop to gdb
 => gdb restores esp/ebp/eip to what they were (e.g. 0x804848C)
 => gdb "s" => causes the EIP to go to the new EIP (i.e. 0x804848C)
    gdbserver tells "resuming from 0x804848c"
                    "stop pc is 0x8048491" => informed gdb of this
418