libcorkscrew native stacks, mutex ranking, and better ScopedThreadListLock.

This change uses libcorkscrew to show native stacks for threads in kNative or,
unlike dalvikvm, kVmWait. Working on the runtime directly, I've found it
somewhat useful to be able to see _which_ internal resource we're waiting on.
We can always take that back out (or make it oatexecd-only) if it turns out to
be too noisy/confusing for app developers.

This change also lets us rank mutexes and enforce, in oatexecd, that you take
locks in a specific order.
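
As a rough illustration of the ranking idea (not the code in this change), a
minimal sketch might look like the following. The class name RankedMutex, the
rank names and their order, and the IsSafeToLock helper are all hypothetical;
the sketch assumes locks must be taken in increasing rank order and tracks the
ranks each thread holds in a thread-local list. A real runtime would likely
compile the check into debug builds only.

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <vector>

// Hypothetical ranks for illustration only. In this sketch a thread must
// acquire locks in strictly increasing rank order.
enum MutexRank {
  kHeapLockRank = 0,
  kThreadListLockRank = 1,
  kLoggingLockRank = 2,
};

class RankedMutex {
 public:
  RankedMutex(const char* name, MutexRank rank) : name_(name), rank_(rank) {}

  // Returns true if the calling thread holds no mutex of equal or higher rank.
  bool IsSafeToLock() const {
    for (MutexRank held : HeldRanks()) {
      if (held >= rank_) {
        return false;
      }
    }
    return true;
  }

  // Aborts on a lock order violation instead of risking a deadlock later.
  void Lock() {
    if (!IsSafeToLock()) {
      std::cerr << "lock order violation while acquiring " << name_ << "\n";
      std::abort();
    }
    mu_.lock();
    HeldRanks().push_back(rank_);
  }

  void Unlock() {
    // Ranks held by one thread are unique here, so erasing by value is enough.
    std::vector<MutexRank>& held = HeldRanks();
    held.erase(std::remove(held.begin(), held.end(), rank_), held.end());
    mu_.unlock();
  }

 private:
  // Per-thread record of the ranks currently held by the calling thread.
  static std::vector<MutexRank>& HeldRanks() {
    thread_local std::vector<MutexRank> held;
    return held;
  }

  const char* name_;
  MutexRank rank_;
  std::mutex mu_;
};
```

With these hypothetical ranks, taking the heap lock and then the thread list
lock passes the check, while trying to take the heap lock with the thread list
lock already held aborts immediately.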

Both of these helped me test the third novelty: removing the heap locking from
ScopedThreadListLock. I've manually inspected all the callers and added a
ScopedHeapLock where I think one is necessary. In manual testing, this makes
jdb a lot less prone to locking us up. There still seems to be a problem with
the JDWP VirtualMachine.Resume command, but I'll look at that separately. This
is a big enough and potentially disruptive enough change already.

Change-Id: Iad974358919d0e00674662dc8a69cc65878cfb5c
diff --git a/src/trace.cc b/src/trace.cc
index f5d118d..b481b74 100644
--- a/src/trace.cc
+++ b/src/trace.cc
@@ -45,17 +45,17 @@
   return (method | traceEvent);
 }
 
-bool UseThreadCpuClock() {
+static bool UseThreadCpuClock() {
   // TODO: Allow control over which clock is used
   return true;
 }
 
-bool UseWallClock() {
+static bool UseWallClock() {
   // TODO: Allow control over which clock is used
   return true;
 }
 
-void MeasureClockOverhead() {
+static void MeasureClockOverhead() {
   if (UseThreadCpuClock()) {
     ThreadCpuMicroTime();
   }
@@ -64,7 +64,7 @@
   }
 }
 
-uint32_t GetClockOverhead() {
+static uint32_t GetClockOverhead() {
   uint64_t start = ThreadCpuMicroTime();
 
   for (int i = 4000; i > 0; i--) {
@@ -82,19 +82,22 @@
   return uint32_t (elapsed / 32);
 }
 
-void Append2LE(uint8_t* buf, uint16_t val) {
+// TODO: put this somewhere with the big-endian equivalent used by JDWP.
+static void Append2LE(uint8_t* buf, uint16_t val) {
   *buf++ = (uint8_t) val;
   *buf++ = (uint8_t) (val >> 8);
 }
 
-void Append4LE(uint8_t* buf, uint32_t val) {
+// TODO: put this somewhere with the big-endian equivalent used by JDWP.
+static void Append4LE(uint8_t* buf, uint32_t val) {
   *buf++ = (uint8_t) val;
   *buf++ = (uint8_t) (val >> 8);
   *buf++ = (uint8_t) (val >> 16);
   *buf++ = (uint8_t) (val >> 24);
 }
 
-void Append8LE(uint8_t* buf, uint64_t val) {
+// TODO: put this somewhere with the big-endian equivalent used by JDWP.
+static void Append8LE(uint8_t* buf, uint64_t val) {
   *buf++ = (uint8_t) val;
   *buf++ = (uint8_t) (val >> 8);
   *buf++ = (uint8_t) (val >> 16);
@@ -391,8 +394,10 @@
 }
 
 static void DumpThread(Thread* t, void* arg) {
-  std::ostream* os = reinterpret_cast<std::ostream*>(arg);
-  *os << StringPrintf("%d\t%s\n", t->GetTid(), t->GetThreadName()->ToModifiedUtf8().c_str());
+  std::ostream& os = *reinterpret_cast<std::ostream*>(arg);
+  std::string name;
+  t->GetThreadName(name);
+  os << t->GetTid() << "\t" << name << "\n";
 }
 
 void Trace::DumpThreadList(std::ostream& os) {