add missing memory barrier to pthread_join

POSIX requires pthread_join to synchronize memory on success.  The
futex wait inside __timedwait_cp cannot be relied on for this, because
it is not called in all cases: if tid is already zero on entry, the
wait loop never runs.  Also, after a spurious wake, tid can become
zero between the wake and the joining thread's next check of it, so
the loop exits without a further futex wait.
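
For illustration only, here is a minimal sketch of the pattern being
fixed.  It is not the libc code: struct fake_thread, worker, fake_join
and join_demo are hypothetical names, C11 atomics stand in for the
internal a_barrier(), and a yield loop stands in for the futex wait.
The point is that the joiner's read of the result is ordered only by
the explicit fence after the loop, not by anything inside the loop.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a thread descriptor: only the two
 * fields that matter on the join path. */
struct fake_thread {
	atomic_int tid;   /* nonzero while the thread is still running */
	void *result;     /* written with a plain store before exit */
};

static int payload = 42;

/* Exiting side: publish the result, then clear tid. */
static void *worker(void *arg)
{
	struct fake_thread *t = arg;
	t->result = &payload;
	atomic_store_explicit(&t->tid, 0, memory_order_release);
	return 0;
}

/* Joining side: the loop body may run zero times, so it cannot be
 * what orders the read of t->result.  The fence after the loop does
 * that on every path (assumed analogue of a_barrier()). */
static void *fake_join(struct fake_thread *t)
{
	while (atomic_load_explicit(&t->tid, memory_order_relaxed))
		sched_yield(); /* real code would sleep on a futex here */
	atomic_thread_fence(memory_order_seq_cst);
	return t->result;
}

/* Runs one worker and returns the value it published. */
int join_demo(void)
{
	struct fake_thread t;
	pthread_t th;
	int *p;

	atomic_init(&t.tid, 1);
	t.result = 0;
	if (pthread_create(&th, 0, worker, &t)) return -1;
	p = fake_join(&t);
	pthread_join(th, 0); /* reap the real thread before t goes away */
	return *p;
}
```

Calling join_demo() from a test program (built with -pthread where the
toolchain needs it) should return the worker's payload.  Without the
fence, the relaxed load of tid alone would not order the later read of
result.
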
diff --git a/src/thread/pthread_join.c b/src/thread/pthread_join.c
index 966b4ab..694d377 100644
--- a/src/thread/pthread_join.c
+++ b/src/thread/pthread_join.c
@@ -13,6 +13,7 @@
 	if (cs == PTHREAD_CANCEL_ENABLE) __pthread_setcancelstate(cs, 0);
 	while ((tmp = t->tid)) __timedwait_cp(&t->tid, tmp, 0, 0, 0);
 	__pthread_setcancelstate(cs, 0);
+	a_barrier();
 	if (res) *res = t->result;
 	if (t->map_base) __munmap(t->map_base, t->map_size);
 	return 0;