always initialize thread pointer at program start

this is the first step in an overhaul aimed at greatly simplifying and
optimizing everything dealing with thread-local state.

previously, the thread pointer was initialized lazily on first access,
or at program startup if stack protector was in use, or at various
ad hoc places where inconsistent state could otherwise be reached if
it were not initialized early. while believed to be fully correct, the
logic was fragile and non-obvious.

in the first phase of the thread pointer overhaul, support is retained
(and in some cases improved) for systems/situations where loading the
thread pointer fails, e.g. old kernels.

some notes on specific changes:

- the confusing use of libc.main_thread as an indicator that the
  thread pointer is initialized is eliminated in favor of an explicit
  has_thread_pointer predicate.

- sigaction no longer needs to ensure that the thread pointer is
  initialized before installing a signal handler (this was needed to
  prevent a situation where the signal handler caused the thread
  pointer to be initialized and the subsequent sigreturn cleared it
  again) but it still needs to ensure that implementation-internal
  thread-related signals are not blocked.

- pthread tsd initialization for the main thread is deferred in a new
  manner to minimize bloat in the static-linked __init_tp code.

- pthread_setcancelstate no longer needs special handling for the
  situation before the thread pointer is initialized. it simply fails
  on systems that cannot support a thread pointer, which are
  non-conforming anyway.

- pthread_cleanup_push/pop now check for missing thread pointer and
  nop themselves out in this case, so stdio no longer needs to avoid
  the cancellable path when the thread pointer is not available.
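
as a rough illustration of the cleanup push/pop guard described in the
last item, here is a minimal standalone sketch. all names in it
(struct cleanup_cb, struct thread, has_thread_pointer as a plain
global, self) are hypothetical stand-ins chosen for the example, not
the real libc internals; the actual change is in the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical stand-ins for libc internals, for illustration only */
struct cleanup_cb {
	void (*fn)(void *);
	void *arg;
	struct cleanup_cb *next;
};

struct thread {
	struct cleanup_cb *cancelbuf;   /* head of the cleanup handler list */
};

static struct thread main_thread;

/* assumed to be set once at program start if the thread pointer loaded */
static int has_thread_pointer;

/* stand-in for reading the thread pointer; only valid when it exists */
static struct thread *self(void)
{
	return &main_thread;
}

void cleanup_push(struct cleanup_cb *cb)
{
	assert(cb);
	/* without a thread pointer there is no list to link into: no-op */
	if (!has_thread_pointer) return;
	cb->next = self()->cancelbuf;
	self()->cancelbuf = cb;
}

void cleanup_pop(struct cleanup_cb *cb)
{
	assert(cb);
	/* must mirror the push guard so push/pop stay balanced */
	if (!has_thread_pointer) return;
	self()->cancelbuf = cb->next;
}
```

because both functions apply the same check, a caller such as stdio
can use the same push/pop pairing whether or not a thread pointer is
available, and needs no separate non-cancellable path.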

a number of cases remain where certain interfaces may crash if the
system does not support a thread pointer. at this point, these should
be limited to pthread interfaces, and the number of such cases should
be fewer than before.
diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index ee6c31c..08c7411 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -77,6 +77,7 @@
 
 void __do_cleanup_push(struct __ptcb *cb)
 {
+	if (!libc.has_thread_pointer) return;
 	struct pthread *self = pthread_self();
 	cb->__next = self->cancelbuf;
 	self->cancelbuf = cb;
@@ -84,6 +85,7 @@
 
 void __do_cleanup_pop(struct __ptcb *cb)
 {
+	if (!libc.has_thread_pointer) return;
 	__pthread_self()->cancelbuf = cb->__next;
 }
 
@@ -110,6 +112,8 @@
 /* pthread_key_create.c overrides this */
 static const size_t dummy = 0;
 weak_alias(dummy, __pthread_tsd_size);
+static const void *dummy_tsd[1] = { 0 };
+weak_alias(dummy_tsd, __pthread_tsd_main);
 
 static FILE *const dummy_file = 0;
 weak_alias(dummy_file, __stdin_used);
@@ -127,7 +131,7 @@
 {
 	int ret;
 	size_t size, guard;
-	struct pthread *self = pthread_self(), *new;
+	struct pthread *self, *new;
 	unsigned char *map = 0, *stack = 0, *tsd = 0, *stack_limit;
 	unsigned flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND
 		| CLONE_THREAD | CLONE_SYSVSEM | CLONE_SETTLS
@@ -135,13 +139,16 @@
 	int do_sched = 0;
 	pthread_attr_t attr = {0};
 
-	if (!self) return ENOSYS;
+	if (!libc.can_do_threads) return ENOSYS;
+	self = __pthread_self();
 	if (!libc.threaded) {
 		for (FILE *f=libc.ofl_head; f; f=f->next)
 			init_file_lock(f);
 		init_file_lock(__stdin_used);
 		init_file_lock(__stdout_used);
 		init_file_lock(__stderr_used);
+		__syscall(SYS_rt_sigprocmask, SIG_UNBLOCK, SIGPT_SET, 0, _NSIG/8);
+		self->tsd = __pthread_tsd_main;
 		libc.threaded = 1;
 	}
 	if (attrp) attr = *attrp;