jumper, rework callback a bit, use it for color_lookup_table

Looks like the color-space images have this well tested (even without
lab_to_xyz) and the diffs look like rounding/FMA.

The old plan to keep loads and stores outside the callback was:
  1) awkward, with too many pointers and pointers to pointers to track
  2) misguided... load and store stages march ahead by x,
     working at ptr+0, ptr+8, ptr+16, etc. while callback
     always wants to be working at the same spot in the buffer.

I spent a frustrating day in lldb to understand 2).  :/

So now the stage always store4's its pixels to a buffer in the context
before the callback, and when the callback returns it load4's them back
from a pointer in the context, defaulting to that same buffer.

Instead of passing a void* into the callback, we pass the context
itself.  This lets us subclass the context and add our own data...
C-compatible object-oriented programming.
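
A rough sketch of the shape described above, not the actual jumper code.
All names here (CallbackCtx, ScaleCtx, run_callback_stage, demo) are
made up for illustration, and the plain copy loops only stand in for
the real store4/load4:

```cpp
#include <cassert>

// Hypothetical context: the stage stores pixels into rgba before the
// callback and loads them back from read_from afterwards.
struct CallbackCtx {
    void (*fn)(CallbackCtx* self, int active_pixels);
    float  rgba[4 * 8];          // room for up to 8 RGBA pixels
    float* read_from = rgba;     // defaults to that same buffer
};

// "Subclass" the context to carry extra data into the callback.
struct ScaleCtx : CallbackCtx {
    float scale;
};

// The callback receives the context itself, not a void*.
static void scale_pixels(CallbackCtx* self, int active_pixels) {
    ScaleCtx* ctx = static_cast<ScaleCtx*>(self);
    for (int i = 0; i < 4 * active_pixels; i++) {
        ctx->rgba[i] *= ctx->scale;
    }
}

// Stand-in for the stage: store, call back, load.
static void run_callback_stage(CallbackCtx* ctx, float* pixels, int n) {
    assert(n <= 8);
    for (int i = 0; i < 4 * n; i++) { ctx->rgba[i] = pixels[i]; }          // "store4"
    ctx->fn(ctx, n);
    for (int i = 0; i < 4 * n; i++) { pixels[i] = ctx->read_from[i]; }     // "load4"
}

// Usage: halve two pixels and return the first pixel's green channel.
static float demo() {
    ScaleCtx ctx;
    ctx.fn    = scale_pixels;
    ctx.scale = 0.5f;
    float pixels[8] = {2, 4, 6, 8, 2, 4, 6, 8};
    run_callback_stage(&ctx, pixels, 2);
    return pixels[1];
}
```

The callback always works at the same spot (ctx->rgba) no matter where
the surrounding load and store stages are in x, which is the point of 2)
above.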

Change-Id: I7a03439b3abd2efb000a6973631a9336452e9a43
Reviewed-on: https://skia-review.googlesource.com/13985
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
diff --git a/src/core/SkRasterPipeline.h b/src/core/SkRasterPipeline.h
index 29c560d..66b4c3a 100644
--- a/src/core/SkRasterPipeline.h
+++ b/src/core/SkRasterPipeline.h
@@ -87,7 +87,7 @@
     M(parametric_r) M(parametric_g) M(parametric_b)              \
     M(parametric_a)                                              \
     M(table_r) M(table_g) M(table_b) M(table_a)                  \
-    M(color_lookup_table) M(lab_to_xyz)                          \
+    M(lab_to_xyz)                                                \
     M(clamp_x) M(mirror_x) M(repeat_x)                           \
     M(clamp_y) M(mirror_y) M(repeat_y)                           \
     M(gather_a8) M(gather_g8) M(gather_i8)                       \