simplify BlendTest.cpp

  - streamline the testing down to just byte multiplies
    (that's always where the blend algorithms vary)
  - add another approximate multiply (x*y+255)>>8
  - add another variant of the perfect multiply, ((x*y+128)*257)>>16

I've realized ((x*y+128)*257)>>16 might be just as fast in SSE/NEON
as our current (x*y+x)>>8 approximation.  Good to be testing it here.

BUG=skia:

Review URL: https://codereview.chromium.org/1453043005
1 file changed