DM: run child tasks that are already on the CPU threadpool serially

These tasks tend to do similar things with similar-sized bitmaps, so running
them serially means we tend to hold 2x bitmaps at a time (golden and
comparison) instead of (1+k)x bitmaps (golden and k concurrent comparisons).

We still migrate GPU tasks' children over to the main CPU threadpool,
because they'll run faster there and free up capacity on the GPU thread.

Before
  Debug: 54s, 2.9G peak
  Release: 13s, 2.4G peak

After
  Debug: 48s, 1.5G peak
  Release: 15s, 2.0G peak

BUG=skia:2478
R=borenet@google.com, mtklein@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/261593008

git-svn-id: http://skia.googlecode.com/svn/trunk@14486 2bbb7eff-a529-9590-31e7-b0007b416f81
diff --git a/dm/DMTask.cpp b/dm/DMTask.cpp
index 8b6a94e..83df849 100644
--- a/dm/DMTask.cpp
+++ b/dm/DMTask.cpp
@@ -37,7 +37,7 @@
     fReporter->finish(this->name(), SkTime::GetMSecs() - fStart);
 }
 
-void Task::spawnChild(CpuTask* task) {
+void Task::reallySpawnChild(CpuTask* task) {
     fTaskRunner->add(task);
 }
 
@@ -53,6 +53,11 @@
     SkDELETE(this);
 }
 
+void CpuTask::spawnChild(CpuTask* task) {
+    // Run children serially on this (CPU) thread.  This tends to save RAM and is usually no slower.
+    task->run();
+}
+
 GpuTask::GpuTask(Reporter* reporter, TaskRunner* taskRunner) : Task(reporter, taskRunner) {}
 
 void GpuTask::run(GrContextFactory& factory) {
@@ -64,6 +69,9 @@
     SkDELETE(this);
 }
 
-
+void GpuTask::spawnChild(CpuTask* task) {
+    // Really spawn a new task so it runs on the CPU threadpool instead of the GPU one we're on now.
+    this->reallySpawnChild(task);
+}
 
 }  // namespace DM