/* Make a thread the running thread.  The thread must previously have
   been sleeping, and not holding the CPU semaphore.  This will set the
   thread state to VgTs_Runnable, and the thread will attempt to take
   the CPU semaphore.  By the time it returns, tid will be the running
   thread. */
extern void VG_(set_running) ( ThreadId tid );

/* Set a thread into a sleeping state.  Before the call, the thread
   must be runnable, and holding the CPU semaphore.  When this call
   returns, the thread will be set to the specified sleeping state,
   and will not be holding the CPU semaphore.  Note that another
   thread could be running by the time this call returns, so the
   caller must be careful not to touch any shared state.  It is also
   the caller's responsibility to actually block until the thread is
   ready to run again. */
extern void VG_(set_sleeping) ( ThreadId tid, ThreadStatus state );


The master semaphore is run_sema in vg_scheduler.c.
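
A minimal sketch of the pipe-as-semaphore idea (not the actual
vg_scheduler.c code; the single-token-byte scheme and the sema_* names
are illustrative only):

#include <unistd.h>

static int sema_fds[2];              /* [0] = read end, [1] = write end */

static void sema_init(void)
{
   char token = 'T';
   pipe(sema_fds);
   write(sema_fds[1], &token, 1);    /* semaphore starts out available */
}

/* cf. VG_(set_running): take the CPU by reading the token. */
static void sema_down(void)
{
   char token;
   while (read(sema_fds[0], &token, 1) != 1)
      ;                              /* retry if interrupted by a signal */
}

/* cf. VG_(set_sleeping): give up the CPU by writing the token back. */
static void sema_up(void)
{
   char token = 'T';
   write(sema_fds[1], &token, 1);
}

Only one thread can hold the token at a time, so read() blocks until the
current holder writes it back.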

(what happens at a fork?)

VG_(scheduler_init) registers sched_fork_cleanup as a child atfork
handler.  sched_fork_cleanup, among other things, reinitializes the
semaphore with a new pipe, so the child process has its own.
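
A sketch of the same idea using plain pthread_atfork() (Valgrind has its
own atfork machinery, so this is illustrative only; it reuses the
sema_fds/sema_init names from the sketch above):

#include <pthread.h>
#include <unistd.h>

static void sched_fork_cleanup_sketch(void)
{
   /* Runs in the child only: drop the pipe shared with the parent and
      create a fresh one, with the token available to the child. */
   close(sema_fds[0]);
   close(sema_fds[1]);
   sema_init();
}

static void register_fork_handler(void)
{
   pthread_atfork(NULL /* prepare */, NULL /* parent */,
                  sched_fork_cleanup_sketch /* child */);
}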

--------------------------------------------------------------------

Re: New World signal handling
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Julian Seward <jseward@acm.org>
Date: Mon Mar 14 09:03:51 2005

Well, the big-picture things to be clear about are:

   1. signal handlers are process-wide global state
   2. signal masks are per-thread (there's no notion of a process-wide
      signal mask)
   3. a signal can be targeted to either
         1. the whole process (any eligible thread is picked for
            delivery), or
         2. a specific thread

1 is why it is always a bug to temporarily reset a signal handler (say,
for SIGSEGV), because if any other thread happens to be sent one in that
window it will cause havoc (I think there's still one instance of this
in the symtab stuff).
2 is the meat of your questions; more below.
3 is responsible for some of the nitty-gritty detail in the signal stuff,
so it's worth bearing in mind to understand it all.  (Note that even if a
signal is targeting the whole process, it's only ever delivered to one
particular thread; there's no such thing as a broadcast signal.)

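To make point 3 concrete, this is roughly how the two kinds of targeting
look from user space (a hypothetical example, nothing to do with
Valgrind's own code):

#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

void send_examples(pid_t some_tid)
{
   /* Process-directed: the kernel picks any thread whose mask allows
      delivery. */
   kill(getpid(), SIGUSR1);

   /* Thread-directed: only the thread with kernel thread id some_tid
      can receive it. */
   syscall(SYS_tgkill, getpid(), some_tid, SIGUSR1);
}
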
While a thread is running core code or generated code, it has almost
all of its signals blocked (all but the fault signals: SEGV, BUS, ILL, etc).

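The mask in question looks something like this (a sketch, not the actual
core code; the exact set of fault signals left unblocked is illustrative):

#include <pthread.h>
#include <signal.h>

void block_all_but_faults(void)
{
   sigset_t mask;
   sigfillset(&mask);
   sigdelset(&mask, SIGSEGV);
   sigdelset(&mask, SIGBUS);
   sigdelset(&mask, SIGILL);
   sigdelset(&mask, SIGFPE);
   sigdelset(&mask, SIGTRAP);
   pthread_sigmask(SIG_SETMASK, &mask, NULL);
}
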
Every N basic blocks, each thread calls VG_(poll_signals) to see what
signals are pending for it.  poll_signals grabs the next pending signal
which the client signal mask doesn't block, and sets it up for delivery;
it uses the sigtimedwait() syscall to fetch blocked pending signals
rather than have them delivered to a signal handler.  This means that
we avoid the complexity of having signals delivered asynchronously via
the signal handlers; we can just poll for them synchronously when
they're easy to deal with.

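The core of that polling looks roughly like this (a sketch assuming
client_blocked holds the client's current mask; the real
VG_(poll_signals) also consults the per-thread queues described below):

#include <errno.h>
#include <signal.h>
#include <time.h>

int poll_one_signal(const sigset_t *client_blocked, siginfo_t *info)
{
   sigset_t wanted;
   struct timespec zero = { 0, 0 };   /* zero timeout: just poll */
   int s, signo;

   /* Ask for exactly those signals the client would accept.
      64 covers Linux's signal range. */
   sigfillset(&wanted);
   for (s = 1; s <= 64; s++)
      if (sigismember(client_blocked, s))
         sigdelset(&wanted, s);

   signo = sigtimedwait(&wanted, info, &zero);
   if (signo < 0 && errno == EAGAIN)
      return 0;                       /* nothing pending right now */
   return signo;                      /* caller sets up delivery */
}
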
Fault signals, being caused by a specific instruction, are the exception
because they can't be held off; if they're blocked when an instruction
raises one, the kernel will just summarily kill the process.  Therefore,
they need to be always unblocked, and the signal handler is called when
an instruction raises one of these exceptions.  (It's also necessary to
call poll_signals after any syscall which may raise a signal, since
signal-raising syscalls are considered to be synchronous with respect to
their signal; ie, calling kill(getpid(), SIGUSR1) will call the handler
for SIGUSR1 before kill is seen to complete.)

The one time when the thread's real signal mask actually matches the
client's requested signal mask is while running a blocking syscall.  We
have to set things up to accept signals during a syscall so that we get
the right signal-interrupts-syscall semantics.  The tricky part about
this is that there's no general atomic
set-signal-mask-and-block-in-syscall mechanism, so we need to fake it
with the stuff in VGA_(_client_syscall)/VGA_(interrupted_syscall).
These two basically form an explicit state machine, where the state
variable is the instruction pointer, which allows it to determine what
point the syscall got to when the async signal happens.  By keeping the
window where signals are actually unblocked very narrow, the number of
possible states is pretty small.

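The underlying problem can be sketched like this (purely illustrative;
do_the_syscall and the two mask arguments are stand-ins, and the real
code lives in VGA_(_client_syscall)/VGA_(interrupted_syscall)):

#include <pthread.h>
#include <signal.h>

extern long do_the_syscall(void);   /* stand-in for the real dispatcher */

long syscall_with_client_mask(const sigset_t *client_mask,
                              const sigset_t *valgrind_mask)
{
   long res;

   pthread_sigmask(SIG_SETMASK, client_mask, NULL);
   /* window A: an async signal may arrive before the syscall starts   */
   res = do_the_syscall();
   /* window B: an async signal may arrive after the syscall completes */
   pthread_sigmask(SIG_SETMASK, valgrind_mask, NULL);

   return res;
}

The signal handler works out which of the windows (or the syscall
itself) was interrupted by looking at the saved instruction pointer;
because the windows are tiny, only a handful of states are possible.
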
This is all quite nice because the kernel does almost all the work of
determining which thread should get a signal, what the correct action is
for a syscall when it has been interrupted, etc.  Particularly nice
is that we don't need to worry about all the queuing semantics, and the
per-signal special cases (which is, roughly, signals 1-32 are not queued
except when they are, and signals 33-64 are queued except when they aren't).

BUT, there's another complexity: because the Unix signal mechanism has
been overloaded to deal with two separate kinds of events (asynchronous
signals raised by kill(), and synchronous faults raised by an
instruction), we can't block a signal for one form and not the other.
That is, because we have to leave SIGSEGV unblocked for faulting
instructions, it also leaves us open to getting an async SIGSEGV sent
with kill(pid, SIGSEGV).

To handle this case, there's a small per-thread signal queue (I'm using
tid 0's queue for "signals sent to the whole process" - a hack, I'll
admit).  If an async SIGSEGV (etc) signal appears, then it is pushed
onto the appropriate queue.  VG_(poll_signals) also checks these queues
for pending signals to decide what signal to deliver next.  These queues
are only manipulated with *all* signals blocked, so there's no risk of
two concurrent async signal handlers modifying the queues at once.
Also, because the likelihood of actually being sent an async SIGSEGV is
pretty low, the queues are only allocated on demand.
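
In outline (illustrative names and sizes only; Valgrind blocks all
signals via the handler's installation mask rather than an explicit call
here, and uses its own allocator):

#include <pthread.h>
#include <signal.h>
#include <stdlib.h>

#define QUEUE_LEN   8
#define MAX_THREADS 64

typedef struct {
   siginfo_t infos[QUEUE_LEN];
   int       count;
} SigQueue;

static SigQueue *queues[MAX_THREADS];  /* slot 0: "whole process" queue */

static void queue_push(int tid, const siginfo_t *si)   /* tid < MAX_THREADS */
{
   sigset_t all, saved;

   /* Only ever touch a queue with every signal blocked, so two handlers
      can never race on it. */
   sigfillset(&all);
   pthread_sigmask(SIG_SETMASK, &all, &saved);

   if (queues[tid] == NULL)
      queues[tid] = calloc(1, sizeof(SigQueue));   /* on-demand alloc */
   if (queues[tid] != NULL && queues[tid]->count < QUEUE_LEN)
      queues[tid]->infos[queues[tid]->count++] = *si;

   pthread_sigmask(SIG_SETMASK, &saved, NULL);
}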


There are two mechanisms to prevent disaster if multiple threads get
signals concurrently.  One is that a signal handler is set up to block a
set of signals while the signal is being delivered.  Valgrind's handlers
block all signals, so there's no risk of a new signal being delivered to
the same thread until the old handler has finished.
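
The first mechanism is just the blocking mask supplied when the handler
is installed; something like this (a sketch, not the core's actual
installation code):

#include <signal.h>
#include <string.h>

extern void async_signal_handler(int sig, siginfo_t *info, void *uc);

void install_async_handler(int sig)
{
   struct sigaction sa;

   memset(&sa, 0, sizeof(sa));
   sa.sa_sigaction = async_signal_handler;
   sa.sa_flags     = SA_SIGINFO;
   sigfillset(&sa.sa_mask);     /* block *everything* while it runs */
   sigaction(sig, &sa, NULL);
}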

The other is that if the thread which receives the signal is not running
(ie, doesn't hold the run_sema, which implies it must be waiting for a
syscall to complete), then the signal handler will grab the run_sema
before making any global state changes.  Since the only time we can get
an async signal asynchronously is during a blocking syscall, this should
be all the time.  (And since synchronous signals are always the result of
running an instruction, we should already be holding run_sema.)


Valgrind will occasionally generate signals for itself.  These are always
synchronous faults, a result of an instruction fetch or of something an
instruction did.  The two mechanisms are the synth_fault_* functions,
which are used to signal a problem while fetching an instruction, or
getting generated code to call a helper which contains a fault-raising
instruction (used to deal with illegal/unimplemented instructions and
for instructions whose only job is to raise exceptions).

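The helper in the second mechanism is conceptually nothing more than
this (a sketch; __builtin_trap() is a GCC builtin which emits an
illegal/trapping instruction, ud2 on x86):

void helper_raise_fault(void)
{
   /* Never returns: the CPU faults here and the kernel delivers the
      corresponding fault signal, which then follows the normal
      delivery path described below. */
   __builtin_trap();
}
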
That all explains how signals come in, but the second part is how they
get delivered.

The main function for this is VG_(deliver_signal).  There are three cases:

   1. the process is ignoring the signal (SIG_IGN)
   2. the process is using the default handler (SIG_DFL)
   3. the process has a handler for the signal

In general, VG_(deliver_signal) shouldn't be called for ignored signals;
if it has been called, it assumes the ignore is being overridden (if an
instruction gets a SEGV etc, SIG_IGN is ignored and treated as SIG_DFL).

VG_(deliver_signal) handles the default handler case, and the
client-specified signal handler case.

The default handler case is relatively easy: the signal's default action
is either Terminate, or Ignore.  We can ignore Ignore.

Terminate always kills the entire process; there's no such thing as a
thread-specific signal death.  Terminate comes in two forms: with
coredump, or without.  vg_default_action() will write a core file, and
then will tell all the threads to start terminating; it then longjmps
back to the current thread's scheduler loop.  The scheduler loop will
terminate immediately, and the master_tid thread will wait for all the
others to exit before shutting down the process (this is the same
mechanism as exit_group).

Delivering a signal to a client-side handler modifies the thread state so
that there's a signal frame on the stack, and the instruction pointer is
pointing to the handler.  The fiddly bit is that there are two
completely different signal frame formats: old and RT.  While in theory
the exact shape of these frames on the stack is abstracted, there are real
programs which know exactly where various parts of the structures are on
the stack (most notably, g++'s exception throwing code), which is why it
has to have two separate pieces of code for each frame format.  Another
tricky case is dealing with the client stack running out/overflowing
while setting up the signal frame.

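Which format is needed is driven by how the client registered its
handler; the two client-visible flavours look like this (hypothetical
handlers, shown only to make the old-vs-RT distinction concrete):

#include <signal.h>
#include <string.h>

static void old_style(int sig)                        /* old frame format */
{
   (void)sig;
}

static void rt_style(int sig, siginfo_t *si, void *ucontext) /* RT frame */
{
   (void)sig; (void)si; (void)ucontext;
}

void register_both(void)
{
   struct sigaction sa;

   memset(&sa, 0, sizeof(sa));
   sa.sa_handler = old_style;        /* no SA_SIGINFO: old format,
                                        resumed with sigreturn       */
   sigaction(SIGUSR1, &sa, NULL);

   memset(&sa, 0, sizeof(sa));
   sa.sa_sigaction = rt_style;
   sa.sa_flags     = SA_SIGINFO;     /* RT format, rt_sigreturn      */
   sigaction(SIGUSR2, &sa, NULL);
}
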
Signal return is also interesting.  There are two syscalls, sigreturn
and rt_sigreturn, which a signal handler will use to resume execution.
The client will call the right one for the frame it was passed, so the
core doesn't need to track that state.  The tricky part is moving the
frame's register state back into the thread's state, particularly all
the FPU state reformatting gunk.  Also, *sigreturn checks for new
pending signals after the old frame has been cleaned up, since there's a
requirement that all deliverable pending signals are delivered before
the mainline code makes progress.  This means that a program could
live-lock on signals, but that's what would happen running natively...

Another thing to watch for: programs which unwind the stack (like gdb,
or exception throwers) recognize the existence of a signal frame by
looking at the code the return address points to: if it is one of the
two specific signal return sequences, it knows it's a signal frame.
That's why the signal handler return address must point to a very
specific set of instructions.


What else.  Ah, the two internal signals.

SIGVGKILL is pretty straightforward: it's just used to dislodge a thread
from being blocked in a syscall, so that we can get the thread to
terminate in a timely fashion.

SIGVGCHLD is used by a thread to tell the master_tid that it has
exited.  However, the only time the master_tid cares about this is when
it has already exited, and it's waiting for everyone else to exit.  If
the master_tid hasn't exited, then this signal is ignored.  It isn't
enough to simply block it, because that will cause a pile of queued
SIGVGCHLDs to build up, eventually clogging the kernel's signal delivery
mechanism.  If it's unblocked and ignored, it doesn't interrupt syscalls
and it doesn't accumulate.

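The unblocked-but-ignored trick is just the normal SIG_IGN disposition
(sketch; SIGVGCHLD is an internal Valgrind name, so a real-time signal
is used below purely as a stand-in):

#include <signal.h>
#include <string.h>

void ignore_internal_signal(void)
{
   int sig_vgchld = SIGRTMIN + 1;   /* stand-in, not Valgrind's choice */
   struct sigaction sa;

   memset(&sa, 0, sizeof(sa));
   sa.sa_handler = SIG_IGN;
   /* Discarded on arrival: never queued, never interrupts a syscall. */
   sigaction(sig_vgchld, &sa, NULL);
}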

I hope that helps clarify things.  And explains why there's so much stuff
in there: it's tracking a very complex and arcane underlying set of
machinery.

J

--------------------------------------------------------------------

>I've been seeing references to 'master thread' around the place.
>What distinguishes the master thread from the rest?  Where does
>the requirement to have a master thread come from?
>
It used to be tid 1, but I had to generalize it.

The master_tid isn't very special; its main job is at process shutdown.
It waits for all the other threads to exit, and then produces all the
final reports.  Until it exits, it's just a normal thread, with no other
responsibilities.

The alternative to having a master thread would be to make whichever
thread exits last be responsible for emitting all the output.  That
would work, but it would make the results a bit asynchronous (that is,
if the main thread exits and the others hang around for a while, anyone
waiting on the process would see it as having exited, but no results
would have been produced).

VG_(master_tid) is a variable to handle the case where a threaded program
forks.  In the first process, the master_tid will be 1.  If that program
creates a few threads, and then, say, thread 3 forks, the child process
will have a single thread in it.  In the child, master_tid will be 3.
It was easier to make the master thread a variable than to try to work
out how to rename thread 3 to 1 after a fork.

J

--------------------------------------------------------------------

Re: Fwd: Documentation of kernel's signal routing ?
From: David Woodhouse <...>
To: Julian Seward <jseward@acm.org>

> Regarding sys_clone created threads.  I have a vague idea that
> there is a notion of 'thread group'.  I further understand that if
> one thread in a group calls sys_exit_group then all threads in that
> group exit.  Whereas if a thread calls sys_exit then just that
> thread exits.
>
> I'm pretty hazy on this:

Hmm, so am I :)

> * Is the above correct?

Yes, I believe so.

> * How is thread-group membership defined/changed?

By specifying CLONE_THREAD in the flags to clone(), you remain part of
the same thread group as the parent.  In a single-threaded process, the
thread group id (tgid) is the same as the pid.

Linux just has tasks, which sometimes happen to share VM -- and now with
NPTL we also share other stuff like signals, etc.  The 'pid' in Linux is
what POSIX would call the 'thread id', and the 'tgid' in Linux is
equivalent to the POSIX 'pid'.

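That terminology can be seen directly from user space (a quick
illustration, not part of the original mail):

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
   pid_t tgid = getpid();               /* POSIX pid == Linux tgid      */
   pid_t tid  = syscall(SYS_gettid);    /* POSIX thread id == Linux pid */

   /* In the initial thread the two are equal; in any other thread of
      the same process, tgid stays the same and tid differs. */
   printf("tgid=%d tid=%d\n", (int)tgid, (int)tid);
   return 0;
}
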
> * Do you know offhand how LinuxThreads and NPTL use thread groups?

I believe that LT doesn't use the kernel's concept of thread groups at
all.  LT predates the kernel's support for proper POSIX-like sharing of
anything much but memory, so uses only the CLONE_VM (and possibly
CLONE_FILES) flags.  I don't _think_ it uses CLONE_SIGHAND -- it does
most of its work by propagating signals manually between threads.

NPTL uses thread groups as generated by the CLONE_THREAD flag, which is
what invokes the POSIX-related thread semantics.

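Roughly, the two flag sets contrast like this (a sketch; the exact flags
each library passes are from memory and not exhaustive):

#define _GNU_SOURCE
#include <sched.h>

/* LinuxThreads-era thread: shares memory (and perhaps files) but is its
   own thread group, i.e. its own "process" as far as the kernel cares. */
static const int linuxthreads_flags = CLONE_VM | CLONE_FILES;

/* NPTL-style thread: same thread group, shared signal handlers, files,
   filesystem info and SysV semaphore undo state. */
static const int nptl_flags = CLONE_VM | CLONE_FS | CLONE_FILES |
                              CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM;
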
> Is it the case that each LinuxThreads thread is in its own
> group whereas all NPTL threads [in a process] are in a single
> group?

Yes, that's my understanding.

--
dwmw2