Change the way thread termination is handled.  Until now, there has
been a concept of a 'master thread'.  This is the first thread in the
process.  There was special logic which kept the master thread alive
artificially should it attempt to exit before its children.  So the
master would wait for all children to exit and then exit itself,
emitting the final summary of errors, leaks, etc. as it did so.

This has the advantage that any process waiting on this one will see
the final summaries appearing before its sys_wait call returns.  In
other words, the final summary output is synchronous with the
master-thread exiting.

Unfortunately the master-thread idea has a serious drawback, namely
that it can and sometimes does cause threaded programs to deadlock at
exit.  It introduces an artificial dependency: the master thread
cannot really exit until all its children have exited.  If --
by any means at all -- the children are waiting for the master to exit
before exiting themselves, deadlock results.  There are now two known
examples of such deadlocks.
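To make the dependency concrete, the deadlock can be modelled as a
cycle in a wait-for graph.  The old scheme adds an edge from the master
to every child, so any child that waits (directly or indirectly) for
the master closes a cycle.  This is a toy model, not Valgrind code;
MAX_THREADS, has_deadlock and the adjacency-matrix representation are
all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not Valgrind code.  waits_for[i][j] is true
   when thread i cannot exit until thread j has exited.  Thread 0 plays
   the master.  A deadlock exists exactly when this graph has a cycle. */
#define MAX_THREADS 8

enum { WHITE, GREY, BLACK };  /* DFS colours: unvisited, on stack, done */

/* Depth-first search from thread t; returns true on reaching a thread
   that is still on the current DFS path (a back edge, i.e. a cycle). */
static bool visit(int n, bool waits_for[][MAX_THREADS], int colour[], int t)
{
    colour[t] = GREY;
    for (int u = 0; u < n; u++) {
        if (!waits_for[t][u])
            continue;
        if (colour[u] == GREY)
            return true;
        if (colour[u] == WHITE && visit(n, waits_for, colour, u))
            return true;
    }
    colour[t] = BLACK;
    return false;
}

/* Returns true if the first n threads can never all exit. */
static bool has_deadlock(int n, bool waits_for[][MAX_THREADS])
{
    assert(n > 0 && n <= MAX_THREADS);
    int colour[MAX_THREADS] = { WHITE };
    for (int t = 0; t < n; t++)
        if (colour[t] == WHITE && visit(n, waits_for, colour, t))
            return true;
    return false;
}
```

With the master waiting on children 1 and 2, there is no cycle until
child 2 also waits for the master.  Removing the master's edges, as this
commit does, breaks the cycle again even though child 2 still waits on
the initial thread.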

This commit removes the master thread concept and lets threads exit in
the order in which they would have exited without Valgrind's involvement.
The last thread to exit prints the final summaries.  This has the
disadvantage that final output may appear arbitrarily later relative
to the exit of the initial thread.  Whether this is a problem in
practice remains to be seen.
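One way to express "the last thread to exit prints the final summaries"
is a shared count of live threads, decremented atomically as each
thread exits; whichever thread takes the count from 1 to 0 does the
final reporting.  This is a minimal sketch assuming C11 atomics.  The
hook names and the counter are hypothetical, and this is not a
description of how Valgrind's scheduler actually tracks live threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical sketch, not Valgrind code: number of threads that have
   been created and have not yet exited. */
static atomic_int live_threads = 0;
static int summaries_printed = 0;   /* demonstration counter only */

static void print_final_summary(void)
{
    summaries_printed++;
    printf("final summary: errors, leaks, etc.\n");
}

/* Called when a thread is created (by a thread that is itself live). */
static void on_thread_create(void)
{
    atomic_fetch_add(&live_threads, 1);
}

/* Called by each thread as it exits, in whatever order that happens. */
static void on_thread_exit(void)
{
    /* atomic_fetch_sub returns the value held *before* the decrement,
       so exactly one caller sees 1: the last thread out. */
    int before = atomic_fetch_sub(&live_threads, 1);
    assert(before > 0);
    if (before == 1)
        print_final_summary();
}
```

Because the decision is made by whichever thread happens to exit last,
no thread ever blocks waiting for another; the initial thread may be
the first to go.  The price is the one described above: the summary
appears only when that last thread exits.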

As a minor side effect of this change, some functions have had
_NORETURN appended to their names.  Such functions never return: the
thread in which they execute is guaranteed to exit from within them.
This makes the logic somewhat easier to follow.
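In C11 the same promise can also be stated to the compiler with the
_Noreturn function specifier, which lets it warn about a path that
returns.  The sketch below is a process-level analogue built around a
hypothetical shutdown_NORETURN; a thread-level version would end the
calling thread instead (for example with pthread_exit).  None of these
names come from Valgrind.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical function following the _NORETURN naming convention.
   _Noreturn tells the compiler the same thing the name tells readers. */
static _Noreturn void shutdown_NORETURN(int exit_code)
{
    printf("final summary would be printed here\n");
    fflush(stdout);
    _exit(exit_code);   /* the calling process ends inside this call */
}

/* Runs shutdown_NORETURN in a forked child and returns the child's exit
   status, or -1 if the child did not terminate normally. */
static int exit_status_of_shutdown(int exit_code)
{
    fflush(stdout);     /* avoid the child re-flushing buffered output */
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0)
        shutdown_NORETURN(exit_code);   /* control never comes back */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller therefore never needs code after a call to a _NORETURN
function; anything placed there is unreachable.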

amd64 compilation is now broken.  I will fix it shortly.




git-svn-id: svn://svn.valgrind.org/valgrind/trunk@3816 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/coregrind/core.h b/coregrind/core.h
index 0c7d2ce..40ba161 100644
--- a/coregrind/core.h
+++ b/coregrind/core.h
@@ -345,7 +345,10 @@
 
 // Do everything which needs doing before the process finally ends,
 // like printing reports, etc
-extern void VG_(shutdown_actions)(ThreadId tid);
+extern void VG_(shutdown_actions_NORETURN) (
+               ThreadId tid, 
+               VgSchedReturnCode tids_schedretcode 
+            );
 
 extern void VG_(scheduler_init) ( void );
 
@@ -524,12 +527,6 @@
 Char* VG_(build_child_VALGRINDCLO) ( Char* exename );
 Char* VG_(build_child_exename)     ( void );
 
-/* The master thread the one which will be responsible for mopping
-   everything up at exit.  Normally it is tid 1, since that's the
-   first thread created, but it may be something else after a
-   fork(). */
-extern ThreadId VG_(master_tid);
-
 /* Called when some unhandleable client behaviour is detected.
    Prints a msg and aborts. */
 extern void VG_(unimplemented) ( Char* msg )
@@ -608,21 +605,23 @@
                                  /*MOD*/ ThreadArchState* arch );
 
 // OS/Platform-specific thread clear (after thread exit)
-extern void VGA_(os_state_clear)(ThreadState *);
+extern void VGO_(os_state_clear)(ThreadState *);
 
 // OS/Platform-specific thread init (at scheduler init time)
-extern void VGA_(os_state_init)(ThreadState *);
+extern void VGO_(os_state_init)(ThreadState *);
 
-// Run a thread from beginning to end.  Does not return if tid == VG_(master_tid).
-void VGA_(thread_wrapper)(Word /*ThreadId*/ tid);
+// Run a thread from beginning to end. 
+extern VgSchedReturnCode VGO_(thread_wrapper)(Word /*ThreadId*/ tid);
 
-// Like VGA_(thread_wrapper), but it allocates a stack before calling
-// to VGA_(thread_wrapper) on that stack, as if it had been set up by
-// clone()
-void VGA_(main_thread_wrapper)(ThreadId tid) __attribute__ ((__noreturn__));
+// Call here to exit the entire Valgrind system.
+extern void VGO_(terminate_NORETURN)(ThreadId tid, VgSchedReturnCode src);
+
+// Allocates a stack for the first thread, then runs it,
+// as if the thread had been set up by clone()
+extern void VGP_(main_thread_wrapper_NORETURN)(ThreadId tid);
 
 // Return how many bytes of a thread's Valgrind stack are unused
-SSizeT VGA_(stack_unused)(ThreadId tid);
+extern SSizeT VGA_(stack_unused)(ThreadId tid);
 
 // wait until all other threads are dead
 extern void VGA_(reap_threads)(ThreadId self);