This file contains various notes, ideas, history, etc. related
to gdbserver in valgrind.

How to use Valgrind gdbserver?
------------------------------
This is described in the Valgrind user manual.
You should read the user manual before reading the notes below.

What is gdbserver?
------------------
The gdb debugger is typically used to debug a process running
on the same machine: gdb uses system calls (such as ptrace)
to fetch data from the process being debugged,
to change data in the process,
to interrupt the process,
etc.

gdb can also debug processes running on a different computer
(e.g. it can debug a process running on a small real-time
board).

gdb does this by sending commands (e.g. over tcp/ip) to a piece
of code running on the remote computer. This piece of code (called a
gdb stub on small boards, or gdbserver when the remote computer runs
an OS such as GNU/Linux) provides a set of commands allowing gdb
to remotely debug the process. Examples of commands are: "get the
registers", "get the list of running threads", "read xxx bytes at
address yyyyyyyy", etc. The definition of all these commands and the
associated replies is the gdb remote serial protocol, which is
documented in Appendix D of the gdb user manual.

The standard gdb distribution has a standalone gdbserver (a small
executable) which implements this protocol and the needed system calls
to allow gdb to remotely debug a process running on Linux, MacOS,
Solaris, ...

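To make the packet exchange concrete, here is a minimal sketch of how a remote serial protocol packet is framed: the payload is sent as `$payload#cc`, where `cc` is the sum of the payload bytes modulo 256, written as two hex digits. The function name `rsp_frame` is made up for this example, and the sketch deliberately ignores the escaping the protocol requires for special characters such as `$`, `#` and `}`.

```c
#include <stdio.h>
#include <string.h>

/* Frame 'payload' as a gdb remote serial protocol packet:
   "$" payload "#" followed by the checksum, i.e. the sum of the payload
   bytes modulo 256 printed as two lowercase hex digits.
   Returns the packet length, or -1 if 'out' is too small.
   Escaping of special characters is not handled in this sketch. */
static int rsp_frame(const char *payload, char *out, size_t out_size)
{
    unsigned char sum = 0;

    for (const char *p = payload; *p != '\0'; p++)
        sum = (unsigned char)(sum + (unsigned char)*p);

    int n = snprintf(out, out_size, "$%s#%02x", payload, (unsigned)sum);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

For example, the "get the registers" command `g` is sent as `$g#67`, and the "why did the process stop" query `?` as `$?#3f`.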
Activation of gdbserver code inside valgrind
--------------------------------------------
The gdbserver code (from gdb 6.6, GPL2+) has been modified so that it
links with valgrind and allows the valgrind guest process to be
debugged by a gdb speaking to this gdbserver embedded in valgrind.
The ptrace system calls inside gdbserver have been replaced by code
that reads the state of the guest.

The gdbserver functionality is activated with valgrind command line
options. If gdbserver is not enabled, the impact on valgrind
runtime is minimal: at startup, valgrind checks the command line
options and sees that gdbserver has nothing to do; afterwards, the only
costs are an "if gdbserver is active" check in the translate
function of translate.c and an "if" in the valgrind scheduler.
If the valgrind gdbserver is activated (--vgdb=yes), the impact
is still small: from time to time, the valgrind scheduler checks a counter
in memory. Option --vgdb-poll=yyyyy controls how often the scheduler
does a (somewhat) heavier check to see if gdbserver needs to
stop execution of the guest to allow debugging.
If the valgrind gdbserver is activated with --vgdb=full, then
each instruction is instrumented with an additional call to a dirty
helper.

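The cost model above (a cheap comparison most of the time, a heavier check only every so often) can be sketched as follows. All names are hypothetical: `poll_interval` merely plays the role of `--vgdb-poll`, `input_pending` stands for "gdb has sent something", and this is not the actual scheduler code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scheduler state for this sketch only. */
typedef struct {
    uint64_t bbs_done;        /* basic blocks executed so far */
    uint64_t next_poll;       /* bbs_done value at which to poll next */
    uint64_t poll_interval;   /* plays the role of --vgdb-poll */
    bool     input_pending;   /* e.g. vgdb wrote characters for gdbserver */
    unsigned heavy_checks;    /* how many heavier checks were done */
    unsigned gdbserver_calls; /* how many times gdbserver was entered */
} SchedState;

/* The (somewhat) heavier check: does gdbserver need to stop the guest? */
static bool gdbserver_needs_attention(SchedState *s)
{
    s->heavy_checks++;
    return s->input_pending;
}

/* Called after each basic block. The common path is one comparison. */
static void scheduler_after_block(SchedState *s)
{
    s->bbs_done++;
    if (s->bbs_done < s->next_poll)       /* cheap path */
        return;
    s->next_poll = s->bbs_done + s->poll_interval;
    if (gdbserver_needs_attention(s)) {   /* heavier path */
        s->input_pending = false;
        s->gdbserver_calls++;             /* would call gdbserver here */
    }
}
```

With `poll_interval` set to 5000, running 20000 blocks performs only 4 heavier checks; everything else costs one comparison per block.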
How does the gdbserver code interact with valgrind?
---------------------------------------------------
When an error is reported, the gdbserver code is called. It reads
commands from gdb using the read system call on a FIFO (e.g. a command
such as "get the registers"). It executes the command (e.g. fetches
the registers from the guest state) and writes the reply (e.g. a
packet containing the register data). When gdb instructs gdbserver to
"continue", control is returned to valgrind, which then continues
to execute guest code. The FIFOs used for communication between
valgrind and gdb are created at startup if gdbserver is activated
by the --vgdb=yes or --vgdb=full command line option.

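The request/reply cycle described above can be pictured as the loop below. This is a minimal sketch under simplifying assumptions: packets arrive as strings in an array rather than through the FIFO, writing a reply is reduced to counting it, and only the first character of a packet selects the command. The enum and function names are hypothetical; the real packet handling is in the gdb-derived code (e.g. server.c).

```c
/* Outcome of one (hypothetical) gdbserver invocation. */
typedef enum {
    GDB_CONTINUE,      /* gdb said "continue": return to valgrind */
    GDB_DETACH,        /* gdb detached */
    GDB_NO_MORE_INPUT  /* sketch only: the real code would block on read */
} GdbOutcome;

/* Read a packet, execute it, reply, until gdb asks to continue. */
static GdbOutcome gdbserver_loop(const char *const *packets, int n,
                                 int *replies_sent)
{
    for (int i = 0; i < n; i++) {
        switch (packets[i][0]) {
        case 'g':   /* read registers: fetch them from the guest state */
        case 'm':   /* read memory */
        case '?':   /* report why the guest stopped */
            (*replies_sent)++;   /* would write the reply packet */
            break;
        case 'c':   /* continue: control goes back to valgrind */
            return GDB_CONTINUE;
        case 'D':   /* detach: acknowledge, then stop debugging */
            (*replies_sent)++;
            return GDB_DETACH;
        default:    /* unsupported: the protocol expects an empty reply */
            (*replies_sent)++;
            break;
        }
    }
    return GDB_NO_MORE_INPUT;
}
```

Note that "continue" gets no immediate reply: gdb only hears back once the guest stops again.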
How are signals "handled"?
--------------------------
When a signal is to be given to the guest, the valgrind core first calls
gdbserver if a gdb is currently connected to valgrind (otherwise the
signal is delivered immediately). If gdb instructs gdbserver to give the signal
to the process, the signal is delivered to the guest. Otherwise, the
signal is ignored (not given to the guest). With gdb, the user can
thus decide whether or not to pass the signal.
Note that some (fatal) signals cannot be ignored.

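The decision above boils down to a few conditions. The sketch below is a hypothetical summary, not valgrind's code: `cannot_be_ignored` stands for the (fatal) signals gdb is not able to suppress, without claiming which signals those are.

```c
#include <stdbool.h>

/* Decide whether a signal is given to the guest (hypothetical names). */
static bool deliver_to_guest(bool gdb_connected, bool gdb_says_pass,
                             bool cannot_be_ignored)
{
    if (!gdb_connected)
        return true;         /* no gdb: deliver immediately */
    if (cannot_be_ignored)
        return true;         /* gdb is told, but cannot suppress it */
    return gdb_says_pass;    /* gdb decides (e.g. via "handle ... pass") */
}
```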
How are "break/step/stepi/next/..." implemented?
------------------------------------------------
When gdb puts a break on an instruction, a command is sent to the
gdbserver in valgrind. This causes the basic block of this instruction
to be discarded and then re-instrumented so as to insert calls to a
dirty helper which calls the gdbserver code. When a block is
instrumented for gdbserver, all the "jump targets" of this block are
invalidated, so that step/stepi/next work properly: these
blocks will themselves automatically be re-instrumented for gdbserver
if they are jumped to.
The valgrind gdbserver remembers which blocks have been instrumented
due to this "lazy 'jump targets' debugging instrumentation" so as to
discard these "debugging translations" when gdb instructs it to continue
the execution normally.
The blocks in which the user has put an explicit break
are kept instrumented for gdbserver
(but note that by default, gdb removes all breaks when the
process is stopped, and re-inserts all breaks when the process
is continued). This behaviour can be changed using the gdb
command 'set breakpoint always-inserted'.

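The bookkeeping described above (explicit-break blocks kept, lazily instrumented jump targets discarded on "continue") can be sketched with a small table. All names are hypothetical, and a fixed-size array is used only for simplicity; the real code may use a different data structure.

```c
#include <stdbool.h>
#include <stddef.h>

/* One block re-instrumented for gdbserver (hypothetical layout). */
typedef struct {
    unsigned long addr;   /* guest address of the block */
    bool explicit_break;  /* true: user break; false: lazy jump target */
    bool in_use;
} GsBlock;

#define MAX_GS_BLOCKS 64
static GsBlock gs_blocks[MAX_GS_BLOCKS];

/* Record that the block at 'addr' was instrumented for gdbserver.
   Returns false if the table is full. */
static bool gs_remember(unsigned long addr, bool explicit_break)
{
    for (size_t i = 0; i < MAX_GS_BLOCKS; i++) {
        if (gs_blocks[i].in_use && gs_blocks[i].addr == addr) {
            gs_blocks[i].explicit_break |= explicit_break;
            return true;
        }
    }
    for (size_t i = 0; i < MAX_GS_BLOCKS; i++) {
        if (!gs_blocks[i].in_use) {
            gs_blocks[i] = (GsBlock){ addr, explicit_break, true };
            return true;
        }
    }
    return false;
}

/* On "continue": forget the lazily instrumented blocks (valgrind would
   discard their translations) and keep the explicit-break ones.
   Returns how many blocks were dropped. */
static size_t gs_on_continue(void)
{
    size_t discarded = 0;
    for (size_t i = 0; i < MAX_GS_BLOCKS; i++) {
        if (gs_blocks[i].in_use && !gs_blocks[i].explicit_break) {
            gs_blocks[i].in_use = false;
            discarded++;
        }
    }
    return discarded;
}
```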
How are watchpoints implemented?
--------------------------------
Watchpoints require support from the tool to detect that
a location is read and/or written. Currently, only memcheck
supports this: when a watchpoint is placed, memcheck marks
the watched memory zone as inaccessible in its addressability bits.
Before an access, memcheck then detects an error, but sees that this error
is due to a watchpoint and gives control back to gdb.
Stopping on the exact instruction for a write watchpoint requires
--vgdb=full. This is because memcheck detects the error
before the value is modified. gdb checks that the value has not changed
and so "does not believe" that the write watchpoint
was triggered, and continues the execution. At the next watchpoint
occurrence, gdb sees that the value has changed, but all the watchpoints are
then reported "off by one". To avoid this, the Valgrind gdbserver must
complete the current instruction before reporting the write watchpoint.
Completing precisely the current instruction requires
all the instructions of the block to be instrumented for gdbserver, even
if there is no break in this block. This is ensured by --vgdb=full.
See Bool VG_(is_watched) in m_gdbserver.c, where watchpoint handling
is implemented.

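The core of the watchpoint test is a range-overlap check combined with the access kind. The sketch below is a simplified, hypothetical stand-in for what VG_(is_watched) has to decide; the types and names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified watchpoint table for this sketch. */
typedef enum { WP_WRITE, WP_READ, WP_ACCESS } WpKind;

typedef struct {
    unsigned long addr;
    size_t len;
    WpKind kind;
} Watch;

/* Return true if an access of 'size' bytes at 'addr' hits a watchpoint.
   Ranges [a, a+n) and [b, b+m) overlap iff a < b+m && b < a+n. */
static bool is_watched(const Watch *w, size_t n_watch,
                       unsigned long addr, size_t size, bool is_write)
{
    for (size_t i = 0; i < n_watch; i++) {
        bool kind_ok = w[i].kind == WP_ACCESS
                    || (is_write ? w[i].kind == WP_WRITE
                                 : w[i].kind == WP_READ);
        if (kind_ok && addr < w[i].addr + w[i].len
                    && w[i].addr < addr + size)
            return true;
    }
    return false;
}
```

When memcheck finds an access to an "inaccessible" zone, a check like this tells it whether to give control to gdb instead of reporting an error.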
How does the Valgrind gdbserver receive commands/packets from gdb?
------------------------------------------------------------------
The embedded gdbserver reads gdb commands from a named pipe whose name
is (by default) /tmp/vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST,
where PID, USER, and HOST are replaced by the actual pid, the user id,
and the host name, respectively.
The embedded gdbserver replies to gdb commands on the named pipe
/tmp/vgdb-pipe-to-vgdb-from-PID-by-USER-on-HOST.

gdb does not speak directly with the gdbserver in valgrind: a relay application
called vgdb is needed between gdb and the valgrind-ified process.
gdb writes commands to the stdin of vgdb. vgdb reads these
commands and writes them to the FIFO /tmp/vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST.
vgdb reads replies from the FIFO /tmp/vgdb-pipe-to-vgdb-from-PID-by-USER-on-HOST
and writes them to its stdout.

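A sketch of how the two FIFO names can be composed from the pattern given above, plus the prefix test that makes discovery possible. The function names are hypothetical, and `prefix` stands for the default /tmp/vgdb-pipe, which a command line option can change.

```c
#include <stdio.h>
#include <string.h>

/* Build both FIFO names for a given pid, user and host.
   Returns 0 on success, -1 if a buffer is too small. */
static int vgdb_fifo_names(const char *prefix, int pid,
                           const char *user, const char *host,
                           char *from_vgdb, size_t from_size,
                           char *to_vgdb, size_t to_size)
{
    int n1 = snprintf(from_vgdb, from_size, "%s-from-vgdb-to-%d-by-%s-on-%s",
                      prefix, pid, user, host);
    int n2 = snprintf(to_vgdb, to_size, "%s-to-vgdb-from-%d-by-%s-on-%s",
                      prefix, pid, user, host);
    if (n1 < 0 || (size_t)n1 >= from_size || n2 < 0 || (size_t)n2 >= to_size)
        return -1;
    return 0;
}

/* Discovery: a file belongs to vgdb if its name starts with prefix + '-'. */
static int has_vgdb_prefix(const char *path, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(path, prefix, n) == 0 && path[n] == '-';
}
```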
Note: named pipes were preferred to tcp/ip connections because
they allow discovering which valgrind-ified processes are ready to accept
commands, by looking at files starting with the /tmp/vgdb-pipe- prefix
(changeable by a command line option).
Also, the usual unix file protections protect
the valgrind process against other users sending commands.
The relay process also takes care of waking up the valgrind
process in case all its threads are blocked in a system call.
The relay process can also be used in a shell to send commands
without a gdb (this provides a standard mechanism to control
valgrind tools from the command line, rather than specialized mechanisms
such as the one in callgrind).

How is gdbserver activated if all Valgrind threads are blocked in a syscall?
----------------------------------------------------------------------------
vgdb relays characters from gdb to valgrind. From time to time, the scheduler
checks whether gdbserver has to handle incoming characters
(the check is efficient: most of the time, it consists of checking
a counter in (shared) memory).

However, all the threads in the valgrind process might be
blocked in a system call. In such a case, the valgrind scheduler
does no polling (as no activity takes place). By default, vgdb
checks after 100ms whether the characters it has written have been read
by valgrind. If not, vgdb forces the invocation of the gdbserver
code inside the valgrind process.

On Linux, this forced invocation is implemented using the ptrace system call:
using ptrace, vgdb causes the valgrind process to call the
gdbserver code.

This wake up is not done using signals, as this would require
implementing syscall restart logic in valgrind for all system
calls. When using ptrace as above, the Linux kernel is responsible for
restarting the system call.

This wake up is also not implemented by having a "system thread"
started by valgrind, as this would transform all non-threaded programs
into threaded programs when running under valgrind. Also, such a 'system
thread' for gdbserver was tried by Greg Parker in the early MacOS
port, and was unreliable.

So, the ptrace based solution was chosen instead.

There used to be some bugs in the kernel when using ptrace on
a process blocked in a system call: the symptom is that the system
call fails with an unknown errno 512. This typically happens
with a 64-bit vgdb ptrace-ing a 32-bit process.
A workaround for old kernels has been integrated in vgdb.c (sign extend
register rax).

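The "sign extend register rax" workaround can be illustrated with the arithmetic below. errno 512 is, to our knowledge, the kernel-internal ERESTARTSYS code; the assumption here is that a 32-bit syscall result of -512 can end up zero-extended in the 64-bit rax (0x00000000fffffe00), so the kernel no longer recognises it as a restart request. This only shows the arithmetic, not the code in vgdb.c, and the function name is hypothetical.

```c
#include <stdint.h>

/* Sign extend the low 32 bits of a 64-bit register value, without
   relying on implementation-defined signed conversions. */
static uint64_t sign_extend_low32(uint64_t rax)
{
    uint64_t low = rax & 0xffffffffULL;
    return (low & 0x80000000ULL) ? (low | 0xffffffff00000000ULL) : low;
}
```

For example, 0x00000000fffffe00 becomes 0xfffffffffffffe00, i.e. -512 as a 64-bit value, while non-negative 32-bit values are unchanged.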
At least on Fedora Core 12 (kernel 2.6.32), syscall restart of read
and select works correctly, and on Red Hat 5.3 (an old kernel), everything
works properly.

We still need to investigate whether Darwin can similarly do syscall
restart with ptrace.

The vgdb argument --max-invoke-ms=xxx controls the number of
milliseconds after which vgdb forces the invocation of the gdbserver
code. If xxx is 0, the forced invocation is disabled.
Disabling this ptrace mechanism is also necessary if you are
debugging the valgrind code at the same time as debugging the guest
process using gdbserver.

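The vgdb side of this mechanism amounts to a small decision, sketched below. The names are hypothetical: `written` and `seen` stand for a count of characters vgdb has written and a count valgrind has consumed (for instance, two counters in shared memory), and `max_invoke_ms` mirrors --max-invoke-ms (default 100, 0 disables).

```c
#include <stdbool.h>

/* Should vgdb force the invocation of gdbserver now? */
static bool vgdb_should_force_invoke(unsigned long written,
                                     unsigned long seen,
                                     unsigned elapsed_ms,
                                     unsigned max_invoke_ms)
{
    if (max_invoke_ms == 0)
        return false;       /* --max-invoke-ms=0: never force */
    if (seen == written)
        return false;       /* valgrind polled and read everything */
    return elapsed_ms >= max_invoke_ms;  /* unread input for too long */
}
```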
Do not kill -9 vgdb while it has interrupted the valgrind process;
otherwise the valgrind process will very probably stay stopped or die.

On Solaris, this forced invocation is implemented via an agent thread.
The process is first stopped (all the threads at once), and a special agent
thread is created which forces the gdbserver invocation. After its
work is done, the agent thread is destroyed and the process is resumed.
The agent thread functionality is a Solaris OS feature, also used by debuggers.
Therefore the vgdb-invoker-solaris implementation is really small.

Implementation is based on the gdbserver code from gdb 6.6
----------------------------------------------------------
The gdbserver implementation is derived from the gdbserver included
in the gdb distribution.
The files originating from gdb are: inferiors.c, regcache.[ch],
regdef.h, remote-utils.c, server.[ch], signals.c, target.[ch], utils.c,
version.c.
The valgrind-low-* files are inspired by gdb files.

This code had to be changed to integrate properly within valgrind
(e.g. no libc usage). Some of these changes have been done by
using the preprocessor to replace calls by their valgrind equivalents,
e.g. #define strcmp(...) VG_(strcmp) (...).

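A self-contained sketch of the preprocessor redirection just mentioned: unmodified "gdb" code keeps calling `strcmp`, but the call lands in a valgrind-style replacement. The `vgPlain_` expansion of `VG_()` and the call counter are assumptions made for this example; no libc header is included, mirroring the "no libc usage" constraint.

```c
/* Assumed expansion of the VG_() naming macro for this sketch. */
#define VG_(str) vgPlain_##str

static int vg_strcmp_calls;   /* counts calls, to show the redirection */

/* Valgrind-style replacement for strcmp. */
int VG_(strcmp)(const char *s1, const char *s2)
{
    vg_strcmp_calls++;
    while (*s1 != '\0' && *s1 == *s2) {
        s1++;
        s2++;
    }
    return (int)(unsigned char)*s1 - (int)(unsigned char)*s2;
}

/* Redirect the libc name to the valgrind equivalent. */
#undef strcmp
#define strcmp(...) VG_(strcmp) (__VA_ARGS__)

/* "Unmodified" gdb-style code: this call now reaches vgPlain_strcmp. */
static int packet_is(const char *pkt, const char *name)
{
    return strcmp(pkt, name) == 0;
}
```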
Some "control flow" changes are due to the fact that gdbserver inside
valgrind must return control to valgrind when the 'debugged'
process has to run, while in classical gdbserver usage, the
gdbserver process waits for the debugged process to stop on a break or
similar. This required some variables to remember the
state of gdbserver before returning to valgrind (search for
resume_packet_needed in server.c), and a "goto" to the place where gdbserver
expects a stopped process to return control to gdbserver.

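One way to picture this inverted control flow is the hypothetical sketch below (it is not the server.c logic). Since gdbserver cannot block waiting for the guest to stop, on "continue" it records that a stop reply is still owed and returns to valgrind; on the next invocation (a break is hit, an error is reported, ...), it first sends that reply, then reads commands again.

```c
#include <stdbool.h>

static bool stop_reply_pending;  /* plays the role of resume_packet_needed */
static int  stop_replies_sent;

/* Called by valgrind each time the guest stops. Returns true when gdb
   asked to continue, i.e. control must go back to valgrind. */
static bool gdbserver_invoke(const char *const *cmds, int n)
{
    if (stop_reply_pending) {
        /* The guest has stopped: answer the earlier "continue" now,
           e.g. with a "T05..." stop reply packet. */
        stop_replies_sent++;
        stop_reply_pending = false;
    }
    for (int i = 0; i < n; i++) {
        if (cmds[i][0] == 'c') {
            stop_reply_pending = true;  /* reply once the guest stops */
            return true;
        }
        /* any other command: execute it and reply immediately */
    }
    return false;   /* sketch only: the real code would wait for input */
}
```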
How does a tool need to be changed to be "debuggable"?
------------------------------------------------------
There is no need to modify a tool to make it "debuggable" via
gdbserver: e.g. error reports, breaks, etc. work "out of the
box". If interactive usage of tool client requests or similar is
desired for a tool, then simple code can be written for that via the
specific client request code VG_USERREQ__GDB_MONITOR_COMMAND. The tool
function "handle_client_request" must then parse the string received
as argument and call the expected valgrind or tool code. See
e.g. massif's ms_handle_client_request as an example.

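A minimal sketch of such a monitor-command parser, which a tool's handle_client_request could call when it receives VG_USERREQ__GDB_MONITOR_COMMAND. The command names, the counter and the function name are invented for illustration, and libc is used to keep the example standalone; a real tool would use valgrind's own string and printing functions instead.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static unsigned long event_count = 42;   /* hypothetical tool state */

/* Returns true if the command was recognised by this tool. */
static bool handle_monitor_command(const char *req)
{
    char word[32];
    size_t n = strcspn(req, " ");        /* length of the first word */

    if (n == 0 || n >= sizeof word)
        return false;
    memcpy(word, req, n);
    word[n] = '\0';

    if (strcmp(word, "help") == 0) {
        printf("reset_counters : set the event counter to 0\n");
        return true;
    }
    if (strcmp(word, "reset_counters") == 0) {
        event_count = 0;
        return true;
    }
    return false;   /* not a command of this tool */
}
```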
Automatic regression tests:
---------------------------
Automatic Valgrind gdbserver tests are in the directory
$(top_srcdir)/gdbserver_tests.
Read $(top_srcdir)/gdbserver_tests/README_DEVELOPERS for more
info about testing.

How to integrate support for a new architecture xxx?
----------------------------------------------------
Let's imagine a new architecture hal9000 has to be supported.

Mandatory:
The main thing to do is to create a file valgrind-low-hal9000.c.
Start from an existing file (e.g. valgrind-low-x86.c).
The data structures 'struct reg regs'
and 'const char *expedite_regs' are built from files
in the gdb sources, e.g. for a new arch hal9000:
   cd gdb/regformats
   sh ./regdat.sh reg-hal9000.dat hal9000

From the generated file hal9000, copy/paste into
valgrind-low-hal9000.c the two needed data structures and change their
names to 'regs' and 'expedite_regs'.

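For orientation, the two data structures have roughly the following shape. The `struct reg` layout is modelled on gdb's regdef.h, where, as far as we know, offset and size are expressed in bits; the hal9000 registers are invented, and a real table must come from regdat.sh. The helper functions are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct reg {
    const char *name;
    int offset;   /* bit offset in the register buffer */
    int size;     /* size in bits */
};

/* Invented hal9000 register set. */
static struct reg regs[] = {
    { "r0",  0,   32 },
    { "r1",  32,  32 },
    { "sp",  64,  32 },
    { "pc",  96,  32 },
    { "f0",  128, 64 },
};

/* Registers sent with each stop reply ("expedited"), NULL terminated. */
static const char *expedite_regs[] = { "sp", "pc", NULL };

/* Index of the register named 'name', or -1 if unknown. */
static int find_reg(const char *name)
{
    for (size_t i = 0; i < sizeof regs / sizeof regs[0]; i++)
        if (strcmp(regs[i].name, name) == 0)
            return (int)i;
    return -1;
}

/* Total size of the register buffer, in bytes. */
static int regs_total_bytes(void)
{
    size_t n = sizeof regs / sizeof regs[0];
    return (regs[n - 1].offset + regs[n - 1].size) / 8;
}
```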
Then adapt the set of functions needed to initialize the structure
'static struct valgrind_target_ops low_target'.

Optional but highly recommended:
To properly wake up a Valgrind process with all threads
blocked in a system call, some architecture-specific code
has to be written in vgdb-invoker-*.c.
Typically, for a Linux system supporting ptrace, you have to modify
vgdb-invoker-ptrace.c.

For Linux based platforms, all the ptrace calls in vgdb-invoker-ptrace.c
should be ok.
The only code needed is the code to "push a dummy call" on the stack,
i.e. assign the relevant registers in the struct user_regs_struct, and push
values on the stack according to the ABI.

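The "push a dummy call" idea can be simulated on a fake register set and stack: make the stopped thread look as if it had just called a function, returning to a chosen address. Everything here is hypothetical (a 32-bit register set with an invented argument register, a 16-byte alignment rule); a real invoker writes the registers with ptrace and must follow the actual platform ABI.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sp;
    uint32_t pc;
    uint32_t arg0;   /* invented "first argument" register */
} Regs32;

typedef struct {
    uint8_t  mem[256];
    uint32_t base;   /* guest address of mem[0] */
} Stack;

/* Push a 32-bit value (no bounds checking in this sketch). */
static void push_u32(Stack *st, Regs32 *r, uint32_t v)
{
    r->sp -= 4;
    memcpy(&st->mem[r->sp - st->base], &v, sizeof v);
}

static void push_dummy_call(Stack *st, Regs32 *r, uint32_t fn,
                            uint32_t arg, uint32_t ret_addr)
{
    r->sp &= ~(uint32_t)15;     /* align the stack (ABI dependent) */
    r->arg0 = arg;              /* pass the argument in a register */
    push_u32(st, r, ret_addr);  /* return address, as a call would */
    r->pc = fn;                 /* resume execution in 'fn' */
}
```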
For other platforms (i.e. MacOS), more work is needed, as the ptrace calls
on MacOS are different and/or incomplete (and so 'Mach' specific
code is needed, e.g. to attach to threads).
A courageous Mac aficionado is welcome on this aspect.

For Solaris, only the architecture specific functionality in vgdb-invoker-solaris.c
needs to be implemented, similarly to Linux above.

Optional:
To let gdb see the Valgrind shadow registers, xml description
files have to be provided, and valgrind-low-hal9000.c has
to give the top xml file.
Start from the xml files found in the gdb distribution directory
gdb/features. You need to duplicate and modify these files to provide
the shadow1 and shadow2 register set descriptions.

Modify coregrind/Makefile.am:
add valgrind-low-hal9000.c.
If you have target xml description files, also add them to GDBSERVER_XML_FILES.

TODO and/or additional nice things to have
------------------------------------------
* many options can be changed on-line without problems.
  => it would be nice to have a v.option command that would evaluate
  its arguments like the startup options of m_main.c and tool clo processing.

* have a memcheck monitor command
  show_dangling_pointers [last_n_recently_released_blocks]
  showing which of the n most recently released blocks are still
  referenced. These references are (potential) dangling pointers.

* some GDBTD in the code
  (GDBTD = GDB To Do = something still to look at and/or a question)

* All architectures and platforms are done.
  But there are still some "GDBTD" to convert between gdb registers
  and VEX registers:
  e.g. some registers in x86 or amd64 that I could not
  translate to VEX registers. Someone with a good knowledge
  of these architectures might complete this
  (see the GDBTD in valgrind-low-*.c).

* Currently, at least on recent Linux kernels, vgdb can properly wake
  up a valgrind process which is blocked in system calls. Maybe we
  need to find out up to which kernel version the ptrace + syscall restart
  is broken, and set the default value of --max-invoke-ms to 0 in this
  case.

* more client requests can be programmed in various tools. Currently,
  only a few standard valgrind or memcheck client requests are
  implemented.
  v.suppression [generate|add|delete] might be an interesting command:
  generate would output a suppression, add/delete would add or delete
  a suppression in memory for the last (or selected?) error.
  v.break on fn calls/entry/exit + commands associated with it
  (such as a leak search)?

* currently, jump(s) and inferior call(s) are somewhat dangerous
  when called from a block not yet instrumented: instead
  of continuing until the next IMark, where there will be a
  debugger call that can properly jump at an instruction boundary,
  the jump/call will happen in the "middle" of an instruction.
  We could detect if the current block is instrumented by a trick
  like this:

    /* Each time helperc_CallDebugger is called, we store
       the address from which it is called and the number of bbs_done
       at that time. This allows detecting that gdbserver is called
       from a block which is instrumented. */
    static HWord CallDebugger_addr;
    static ULong CallDebugger_bbs_done;

    /* True if the current IP is the one recorded by the last call of
       helperc_CallDebugger, in the same basic block execution. */
    Bool VG_(gdbserver_current_IP_instrumented) (ThreadId tid)
    {
       if (VG_(get_IP) (tid) != CallDebugger_addr
           || CallDebugger_bbs_done != VG_(bbs_done)())
          return False;
       return True;
    }

  Alternatively, we could ensure we can re-instrument the current
  block for gdbserver while executing it.
  Something like:
  keep the current block until the end of the current instruction, then
  go back to the scheduler.
  Unsure if and how this is doable.

* ensure that all non static symbols of gdbserver files are #define'd
  xxxxx VG_(xxxxx)? Is this really needed? I have tried to put
  variables and functions with the same names as valgrind stuff
  in a test program, and everything seems to be ok.
  I see that all exported symbols in valgrind have a unique prefix
  created with VG_ or MC_ or ...
  This is not done for the "gdb gdbserver code", where I have kept
  the original names. Is this a problem? I could not create
  a "symbol" collision between the user symbol and the valgrind
  core gdbserver symbol.

* currently, gdbserver can only stop/continue the whole process. It
  might be interesting to have fine-grained thread control (vCont
  packet), maybe for tools such as helgrind or drd. This would allow the
  user to stop/resume specific threads. Also, maybe this would solve
  the following problem: wait for a breakpoint to be encountered,
  switch thread, next. This sometimes causes an internal error in gdb,
  probably because gdb believes the current thread will be continued?

* it would be nice to have some more tests.

* better valgrind target support in gdb (see comments of Tom Tromey).

-------- description of how gdb invokes a function in the inferior
To call a function in the inferior (the example below is for x86),
gdb modifies ESP and EBP to have some more stack space,
pushes a return address equal to 0x8048390 <_start>,
puts a break at 0x8048390,
puts the address of the function to call in EIP (e.g. hello_world, 0x8048444),
and continues.
The break is encountered at 0x8048391 (0x8048390 after decrement)
=> the stop is reported to gdb
=> gdb restores esp/ebp/eip to what they were (e.g. EIP 0x804848C)
=> gdb "s" => causes EIP to go to the new EIP (i.e. 0x804848C)
gdbserver tells "resuming from 0x804848c"
"stop pc is 0x8048491" => gdb is informed of this