Improve callgrind performance by 4 to 8% using UNLIKELY
Performance improves by 4 to 8% on amd64 on the perf tests by:
1. using UNLIKELY inside the tracing macros
2. not calling CLG_(switch_thread)(tid) on the hot path in setup_bbcc
   unless tid differs from CLG_(current_tid).
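
The second change can be sketched in isolation as follows. This is only an
illustration under stated assumptions, not the callgrind code: the UNLIKELY
definition assumes a GCC/Clang-style __builtin_expect, and ThreadId,
current_tid, switch_calls, switch_thread and setup_hot_path are hypothetical
stand-ins for the real Valgrind types and functions.

```c
#include <assert.h>

/* Assumption: a GCC/Clang-style branch hint; other compilers get a no-op.
 * Valgrind's own UNLIKELY macro may be defined differently. */
#if defined(__GNUC__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define UNLIKELY(x) (x)
#endif

/* Hypothetical stand-ins for the Valgrind thread id type and
 * for CLG_(current_tid). */
typedef unsigned int ThreadId;
static ThreadId current_tid = 1;

/* Counts how often the out-of-line switch function is entered. */
static unsigned long switch_calls = 0;

/* Hypothetical stand-in for CLG_(switch_thread): a no-op when tid is
 * already the current thread, but still costs a function call. */
static void switch_thread(ThreadId tid)
{
    switch_calls++;
    if (tid == current_tid)
        return;
    current_tid = tid;
}

/* Hot-path entry point: only pay for the call when the running thread
 * differs from the current one, and tell the compiler this is rare. */
static void setup_hot_path(ThreadId running_tid)
{
    if (UNLIKELY(running_tid != current_tid))
        switch_thread(running_tid);
    assert(current_tid == running_tid);
}
```

With this guard, repeated calls for the same thread never enter
switch_thread; only an actual thread change does.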



git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12939 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/callgrind/bbcc.c b/callgrind/bbcc.c
index 22dc16f..ad8a76d 100644
--- a/callgrind/bbcc.c
+++ b/callgrind/bbcc.c
@@ -571,7 +571,12 @@
    */
   tid = VG_(get_running_tid)();
 #if 1
-  CLG_(switch_thread)(tid);
+  /* CLG_(switch_thread) is a no-op when tid is equal to CLG_(current_tid).
+   * As this is on the hot path, we only call CLG_(switch_thread)(tid)
+   * if tid differs from the CLG_(current_tid).
+   */
+  if (UNLIKELY(tid != CLG_(current_tid)))
+     CLG_(switch_thread)(tid);
 #else
   CLG_ASSERT(VG_(get_running_tid)() == CLG_(current_tid));
 #endif