DM: make GPU tasks multithreaded again.  Big refactor.

Most of the change is in SkThreadPool.  We can now give SkThreadPool a type
that each thread creates and destroys on its own stack.  That gives us
thread-local state without going through SkTLS.
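
A minimal sketch of the per-thread-object idea, in standard C++ rather than
Skia's own threading primitives.  LocalPool, FakeContextFactory and runDemo
are hypothetical names made up for illustration; this is not the SkThreadPool
API, only one way the pattern could look.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical pool: each worker constructs one T on its own stack and
// passes a pointer to it to every job that worker runs.
template <typename T>
class LocalPool {
public:
    using Job = std::function<void(T*)>;

    explicit LocalPool(int threads) {
        for (int i = 0; i < threads; i++) {
            fWorkers.emplace_back([this] { this->loop(); });
        }
    }

    // Drains the remaining jobs, then joins every worker.
    ~LocalPool() {
        {
            std::lock_guard<std::mutex> lock(fMutex);
            fDone = true;
        }
        fCond.notify_all();
        for (std::thread& t : fWorkers) { t.join(); }
    }

    void add(Job job) {
        {
            std::lock_guard<std::mutex> lock(fMutex);
            fJobs.push(std::move(job));
        }
        fCond.notify_one();
    }

private:
    void loop() {
        T local;  // Lives on this thread's stack; destroyed when loop() returns.
        for (;;) {
            Job job;
            {
                std::unique_lock<std::mutex> lock(fMutex);
                fCond.wait(lock, [this] { return fDone || !fJobs.empty(); });
                if (fJobs.empty()) { return; }  // Shutting down and queue is drained.
                job = std::move(fJobs.front());
                fJobs.pop();
            }
            job(&local);  // No lock held: only this thread ever touches `local`.
        }
    }

    std::mutex fMutex;
    std::condition_variable fCond;
    std::queue<Job> fJobs;
    std::vector<std::thread> fWorkers;
    bool fDone = false;
};

// Hypothetical stand-in for a per-thread resource such as a context factory.
struct FakeContextFactory {
    int uses = 0;
};

// Runs 100 jobs on 4 threads and returns how many jobs actually ran.
int runDemo() {
    std::atomic<int> ran{0};
    {
        LocalPool<FakeContextFactory> pool(4);
        for (int i = 0; i < 100; i++) {
            pool.add([&ran](FakeContextFactory* factory) {
                factory->uses++;  // Unsynchronized on purpose: the object is per-thread.
                ran++;
            });
        }
    }  // The pool's destructor waits here for all queued jobs to finish.
    return ran.load();
}
```

Because each FakeContextFactory is owned by exactly one worker, jobs can use
it without locking; only the shared job queue needs a mutex.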

I've split the DM tasks into CpuTasks, which run on threads with no TLS, and
GpuTasks, which run on threads with a thread-local GrContextFactory.

The old CpuTask and GpuTask have been renamed to CpuGMTask and GpuGMTask.

Upshot: default run of out/Debug/dm goes from ~45 seconds to ~20 seconds.

BUG=skia:
R=bsalomon@google.com, mtklein@google.com, reed@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/179233005

git-svn-id: http://skia.googlecode.com/svn/trunk@13632 2bbb7eff-a529-9590-31e7-b0007b416f81
diff --git a/dm/DMWriteTask.h b/dm/DMWriteTask.h
index 49a5c74..839abd7 100644
--- a/dm/DMWriteTask.h
+++ b/dm/DMWriteTask.h
@@ -12,14 +12,13 @@
 
 namespace DM {
 
-class WriteTask : public Task {
+class WriteTask : public CpuTask {
 
 public:
     WriteTask(const Task& parent,  // WriteTask must be a child Task.  Pass its parent here.
               SkBitmap bitmap);    // Bitmap to write.
 
     virtual void draw() SK_OVERRIDE;
-    virtual bool usesGpu() const SK_OVERRIDE { return false; }
     virtual bool shouldSkip() const SK_OVERRIDE;
     virtual SkString name() const SK_OVERRIDE;