Threaded generation of software paths

Re-land of: https://skia-review.googlesource.com/36560

All information needed by the thread is captured by the prepare
callback object. The lambda captures a pointer to that object and does
the mask render. Once it's done, it signals the semaphore (also owned by
the callback). The callback defers the semaphore wait even longer (into
the ASAP upload), so the odds of waiting for the thread are REALLY low.
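
A minimal, standalone sketch of the pattern described above, using only
the C++ standard library. It is not the Skia implementation: the names
Semaphore, MaskPrepareCallback, render, upload, and drawPathThreaded are
hypothetical stand-ins, and std::thread takes the place of whatever
executor the real code uses.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counting semaphore, standing in for the one owned by the callback.
class Semaphore {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> lock(fMutex);
            ++fCount;
        }
        fCond.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(fMutex);
        fCond.wait(lock, [this] { return fCount > 0; });
        --fCount;
    }

private:
    std::mutex fMutex;
    std::condition_variable fCond;
    int fCount = 0;
};

// Hypothetical "prepare callback" object: it owns everything the worker
// thread needs (inputs, output mask, and the semaphore).
struct MaskPrepareCallback {
    int fWidth;
    int fHeight;
    std::vector<uint8_t> fMask;
    Semaphore fDone;

    MaskPrepareCallback(int w, int h) : fWidth(w), fHeight(h) {}

    // Stand-in for the software mask render: fill the mask with full coverage.
    void render() { fMask.assign(static_cast<std::size_t>(fWidth) * fHeight, 0xFF); }

    // Stand-in for the ASAP upload. This is the only place that blocks on
    // the worker, so the wait happens as late as possible.
    std::size_t upload() {
        fDone.wait();
        assert(!fMask.empty());
        return fMask.size();  // number of mask bytes that would be uploaded
    }
};

std::size_t drawPathThreaded(int width, int height) {
    MaskPrepareCallback callback(width, height);
    MaskPrepareCallback* cb = &callback;

    // The lambda captures only a pointer to the callback object, renders
    // the mask, then signals the semaphore.
    std::thread worker([cb] {
        cb->render();
        cb->fDone.signal();
    });

    // Other flush work would run here, overlapping with the render.

    std::size_t uploaded = callback.upload();  // deferred wait
    worker.join();  // the callback must outlive the worker thread
    return uploaded;
}
```

In this sketch, calling drawPathThreaded(4, 4) renders a 16-byte mask on
the worker and blocks only inside upload(), mirroring the deferred wait.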

Also did a bunch of cleanup along the way, and put in some trace markers
so we can monitor how well this is working.

Traces of a GM that includes GPU and SW path rendering (path-reverse):

Original:
    https://screenshot.googleplex.com/f5BG3901tQg.png
Threaded, with wait in the callback (notice the pre-flush callback blocking):
    https://screenshot.googleplex.com/htOSZFE2s04.png
Current version, with wait deferred to ASAP upload function:
    https://screenshot.googleplex.com/GHjD0U3C34q.png

Bug: skia:
Change-Id: Idb92f385590749f41328a9aec65b2a93f4775079
Reviewed-on: https://skia-review.googlesource.com/40775
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
diff --git a/dm/DMSrcSink.h b/dm/DMSrcSink.h
index 1175149..8b6ee0e 100644
--- a/dm/DMSrcSink.h
+++ b/dm/DMSrcSink.h
@@ -310,6 +310,9 @@
             bool threaded, const GrContextOptions& grCtxOptions);
 
     Error draw(const Src&, SkBitmap*, SkWStream*, SkString*) const override;
+    Error onDraw(const Src&, SkBitmap*, SkWStream*, SkString*,
+                 const GrContextOptions& baseOptions) const;
+
     bool serial() const override { return !fThreaded; }
     const char* fileExtension() const override { return "png"; }
     SinkFlags flags() const override {
@@ -317,6 +320,8 @@
                                                       : SinkFlags::kNotMultisampled;
         return SinkFlags{ SinkFlags::kGPU, SinkFlags::kDirect, ms };
     }
+    const GrContextOptions& baseContextOptions() const { return fBaseContextOptions; }
+
 private:
     sk_gpu_test::GrContextFactory::ContextType        fContextType;
     sk_gpu_test::GrContextFactory::ContextOverrides   fContextOverrides;
@@ -329,6 +334,22 @@
     GrContextOptions                                  fBaseContextOptions;
 };
 
+class GPUThreadTestingSink : public GPUSink {
+public:
+    GPUThreadTestingSink(sk_gpu_test::GrContextFactory::ContextType,
+                         sk_gpu_test::GrContextFactory::ContextOverrides, int samples, bool diText,
+                         SkColorType colorType, SkAlphaType alphaType,
+                         sk_sp<SkColorSpace> colorSpace, bool threaded,
+                         const GrContextOptions& grCtxOptions);
+
+    Error draw(const Src&, SkBitmap*, SkWStream*, SkString*) const override;
+
+private:
+    std::unique_ptr<SkExecutor> fExecutor;
+
+    typedef GPUSink INHERITED;
+};
+
 class PDFSink : public Sink {
 public:
     PDFSink(bool pdfa, SkScalar rasterDpi) : fPDFA(pdfa), fRasterDpi(rasterDpi) {}