jumper, rework callback a bit, use it for color_lookup_table

Looks like the color-space images have this well tested (even without
lab_to_xyz), and the image diffs look like rounding/FMA differences.

The old plan to keep loads and stores outside the callback was:
  1) awkward, with too many pointers and pointers to pointers to track
  2) misguided... load and store stages march ahead by x,
     working at ptr+0, ptr+8, ptr+16, etc., while the callback
     always wants to be working at the same spot in the buffer.

I spent a frustrating day in lldb to understand 2).  :/

So now the stage always store4's its pixels to a buffer in the context
before the callback, and when the callback returns it load4's them back
from a pointer in the context, defaulting to that same buffer.

Instead of passing a void* into the callback, we pass the context
itself.  This lets us subclass the context and add our own data...
C-compatible object-oriented programming.
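
A rough standalone sketch of the pattern, not the actual SkJumper code.
Ctx, kMaxStride = 8, ScaleCtx, init_scale_ctx() and run_callback_stage()
are made-up stand-ins for illustration; only the field names fn, rgba and
read_from mirror the ones used in the diff below.

```cpp
#include <cassert>

// Hypothetical stand-in for SkJumper_kMaxStride; the real value may differ.
static const int kMaxStride = 8;

// Hypothetical stand-in for SkJumper_CallbackCtx.
struct Ctx {
    void (*fn)(Ctx* self, int active_pixels) = nullptr;
    float  rgba[4 * kMaxStride] = {};  // the stage stores pixels here before calling fn
    float* read_from = rgba;           // the stage loads pixels from here afterwards
};

// Stand-in for the stage itself: store4, call back, load4.
// It always works at the same spot (the context), never at a marching ptr+x.
static void run_callback_stage(Ctx* ctx, const float* in, float* out, int active_pixels) {
    assert(active_pixels <= kMaxStride);
    for (int i = 0; i < 4 * active_pixels; i++) { ctx->rgba[i] = in[i]; }
    ctx->fn(ctx, active_pixels);
    for (int i = 0; i < 4 * active_pixels; i++) { out[i] = ctx->read_from[i]; }
}

// "Subclass" the context to carry our own data plus a second results buffer.
struct ScaleCtx : Ctx {
    float scale = 1.0f;
    float results[4 * kMaxStride] = {};
};

static void init_scale_ctx(ScaleCtx* c, float scale) {
    c->scale     = scale;
    c->read_from = c->results;  // have the stage load from results, not rgba
    c->fn = [](Ctx* ctx, int active_pixels) {
        auto self = static_cast<ScaleCtx*>(ctx);  // we know the dynamic type
        for (int i = 0; i < active_pixels; i++) {
            for (int ch = 0; ch < 3; ch++) {
                self->results[4*i + ch] = self->rgba[4*i + ch] * self->scale;
            }
            self->results[4*i + 3] = self->rgba[4*i + 3];  // preserve alpha
        }
    };
}
```

The captureless lambda converts to a plain function pointer, so the base
struct stays usable from C-style code, and the downcast is safe only
because whoever sets fn also allocated the derived context.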

Change-Id: I7a03439b3abd2efb000a6973631a9336452e9a43
Reviewed-on: https://skia-review.googlesource.com/13985
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
diff --git a/src/core/SkColorSpaceXform_A2B.cpp b/src/core/SkColorSpaceXform_A2B.cpp
index a97d60b..19115d8 100644
--- a/src/core/SkColorSpaceXform_A2B.cpp
+++ b/src/core/SkColorSpaceXform_A2B.cpp
@@ -16,6 +16,7 @@
 #include "SkNx.h"
 #include "SkSRGB.h"
 #include "SkTypes.h"
+#include "../jumper/SkJumper.h"
 
 bool SkColorSpaceXform_A2B::onApply(ColorFormat dstFormat, void* dst, ColorFormat srcFormat,
                                     const void* src, int count, SkAlphaType alphaType) const {
@@ -183,8 +184,27 @@
             case SkColorSpace_A2B::Element::Type::kCLUT: {
                 SkCSXformPrintf("CLUT (%d -> %d) stage added\n", e.colorLUT().inputChannels(),
                                                                  e.colorLUT().outputChannels());
-                auto clut = this->copy(sk_ref_sp(&e.colorLUT()));
-                fElementsPipeline.append(SkRasterPipeline::color_lookup_table, clut->get());
+                struct CallbackCtx : SkJumper_CallbackCtx {
+                    sk_sp<const SkColorLookUpTable> clut;
+                    // clut->interp() can't always safely alias its arguments,
+                    // so we allocate a second buffer to hold our results.
+                    float results[4*SkJumper_kMaxStride];
+                };
+                auto cb = fAlloc.make<CallbackCtx>();
+                cb->clut      = sk_ref_sp(&e.colorLUT());
+                cb->read_from = cb->results;
+                cb->fn        = [](SkJumper_CallbackCtx* ctx, int active_pixels) {
+                    auto c = (CallbackCtx*)ctx;
+                    for (int i = 0; i < active_pixels; i++) {
+                        // Look up red, green, and blue for this pixel using 3-4 values from rgba.
+                        c->clut->interp(c->results+4*i, c->rgba+4*i);
+
+                        // If we used 3 inputs (rgb) preserve the fourth as alpha.
+                        // If we used 4 inputs (cmyk) force alpha to 1.
+                        c->results[4*i+3] = (3 == c->clut->inputChannels()) ? c->rgba[4*i+3] : 1.0f;
+                    }
+                };
+                fElementsPipeline.append(SkRasterPipeline::callback, cb);
                 break;
             }
             case SkColorSpace_A2B::Element::Type::kMatrix:
diff --git a/src/core/SkRasterPipeline.h b/src/core/SkRasterPipeline.h
index 29c560d..66b4c3a 100644
--- a/src/core/SkRasterPipeline.h
+++ b/src/core/SkRasterPipeline.h
@@ -87,7 +87,7 @@
     M(parametric_r) M(parametric_g) M(parametric_b)              \
     M(parametric_a)                                              \
     M(table_r) M(table_g) M(table_b) M(table_a)                  \
-    M(color_lookup_table) M(lab_to_xyz)                          \
+    M(lab_to_xyz)                                                \
     M(clamp_x) M(mirror_x) M(repeat_x)                           \
     M(clamp_y) M(mirror_y) M(repeat_y)                           \
     M(gather_a8) M(gather_g8) M(gather_i8)                       \