DM: make GPU tasks multithreaded again.  Big refactor.

Most of the change is in SkThreadPool.  We can now give SkThreadPool a type
that each worker thread creates and destroys on its own stack.  That gives us
thread-local state without going through SkTLS.
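
As a rough illustration of that idea (this is not the actual SkThreadPool
code), here is a minimal sketch using the C++ standard library.  LocalPool,
Scratch, and runTasks are hypothetical names invented for this example, and
Scratch merely stands in for per-thread state such as a GrContextFactory.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread pool parameterized on per-thread state.
template <typename Local>
class LocalPool {
public:
    using Work = std::function<void(Local*)>;

    explicit LocalPool(int threads) {
        for (int i = 0; i < threads; i++) {
            fThreads.emplace_back([this] { this->loop(); });
        }
    }

    // Lets the workers drain any queued work, then joins them.
    ~LocalPool() {
        {
            std::lock_guard<std::mutex> lock(fMutex);
            fDone = true;
        }
        fCond.notify_all();
        for (std::thread& t : fThreads) { t.join(); }
    }

    void add(Work work) {
        {
            std::lock_guard<std::mutex> lock(fMutex);
            fQueue.push(std::move(work));
        }
        fCond.notify_one();
    }

private:
    void loop() {
        Local local;  // Lives on this thread's stack; destroyed when loop() returns.
        for (;;) {
            Work work;
            {
                std::unique_lock<std::mutex> lock(fMutex);
                fCond.wait(lock, [this] { return fDone || !fQueue.empty(); });
                if (fQueue.empty()) { return; }  // Shutting down and nothing left to run.
                work = std::move(fQueue.front());
                fQueue.pop();
            }
            work(&local);  // Run outside the lock, with this thread's own state.
        }
    }

    std::mutex fMutex;
    std::condition_variable fCond;
    std::queue<Work> fQueue;
    bool fDone = false;
    std::vector<std::thread> fThreads;
};

// Hypothetical per-thread state.
struct Scratch {
    int tasksRun = 0;
};

// Runs `tasks` trivial tasks on `threads` workers and returns how many ran.
int runTasks(int threads, int tasks) {
    std::atomic<int> total{0};
    {
        LocalPool<Scratch> pool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.add([&total](Scratch* s) {
                s->tasksRun++;  // No lock needed: s belongs to this thread alone.
                total++;        // Shared across threads, so it is atomic.
            });
        }
    }  // The pool's destructor drains the queue and joins the workers here.
    return total.load();
}
```

The point of the design is that tasks receive a pointer to state that no
other thread can see, so they need neither a lock nor a TLS lookup to use it.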

I've split the DM tasks into CpuTasks, which run on threads with no
thread-local state, and GpuTasks, which run on threads that each have a
thread-local GrContextFactory.

The old CpuTask and GpuTask have been renamed to CpuGMTask and GpuGMTask.

Upshot: a default run of out/Debug/dm goes from ~45 seconds to ~20 seconds.

BUG=skia:
R=bsalomon@google.com, mtklein@google.com, reed@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/179233005

git-svn-id: http://skia.googlecode.com/svn/trunk@13632 2bbb7eff-a529-9590-31e7-b0007b416f81
diff --git a/dm/DMExpectationsTask.cpp b/dm/DMExpectationsTask.cpp
index cb92486..e29257a 100644
--- a/dm/DMExpectationsTask.cpp
+++ b/dm/DMExpectationsTask.cpp
@@ -6,7 +6,7 @@
 ExpectationsTask::ExpectationsTask(const Task& parent,
                                    const Expectations& expectations,
                                    SkBitmap bitmap)
-    : Task(parent)
+    : CpuTask(parent)
     , fName(parent.name())  // Masquerade as parent so failures are attributed to it.
     , fExpectations(expectations)
     , fBitmap(bitmap)