Replace interp() with clut_{3,4}D stages.

As a start, I tried to follow exactly the same strategy as interp().
(Though I did fix the off-by-one dimensions.)
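For illustration only, here is a minimal sketch of what a 3D CLUT stage computes. It assumes that "off-by-one dimensions" refers to the common mistake of scaling a [0,1] input by the grid size N instead of N-1, since N grid points span only N-1 intervals. Clut3D and clut3D() are hypothetical names, not Skia's API, and the sketch uses a plain scalar trilinear blend rather than the actual pipeline stages.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 3D color lookup table (not Skia's type).
// table holds grid[0]*grid[1]*grid[2] RGB entries, last axis varying fastest.
struct Clut3D {
    int grid[3];               // number of grid points along each input axis
    std::vector<float> table;  // 3 floats per grid point
};

// Trilinear lookup: find the 8 lattice points surrounding (r,g,b) and blend them.
std::array<float, 3> clut3D(const Clut3D& clut, float r, float g, float b) {
    const float in[3] = {r, g, b};
    int lo[3], hi[3];
    float t[3];
    for (int d = 0; d < 3; d++) {
        // Scale by (grid - 1), not grid: N points span N-1 intervals,
        // so an input of 1.0 lands exactly on the last grid point.
        float x = std::clamp(in[d], 0.0f, 1.0f) * static_cast<float>(clut.grid[d] - 1);
        lo[d] = static_cast<int>(std::floor(x));
        hi[d] = std::min(lo[d] + 1, clut.grid[d] - 1);  // stay inside the table
        t[d]  = x - static_cast<float>(lo[d]);          // fractional position
    }

    std::array<float, 3> out = {0.0f, 0.0f, 0.0f};
    for (int corner = 0; corner < 8; corner++) {
        int idx[3];
        float w = 1.0f;
        for (int d = 0; d < 3; d++) {
            bool upper = (corner >> d) & 1;
            idx[d] = upper ? hi[d] : lo[d];
            w *= upper ? t[d] : 1.0f - t[d];
        }
        std::size_t base =
            ((static_cast<std::size_t>(idx[0]) * clut.grid[1] + idx[1]) * clut.grid[2] + idx[2]) * 3;
        for (int c = 0; c < 3; c++) {
            out[c] += w * clut.table[base + c];
        }
    }
    return out;
}
```

A 4D stage would follow the same pattern with a fourth axis and 16 corners instead of 8.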

Now that I've looked at the call sites, it does look like
we only need 3D and 4D.

Looks like about a 20% speedup.

Change-Id: I8b1af64750ad1750716ee1ab0767e64591c7206a
Reviewed-on: https://skia-review.googlesource.com/32842
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
9 files changed