Streamline x86 u8 -> fixed15 math.

We can use SSE's 16-bit mul-hi to get a very close approximation of the
ideal u8 -> fixed15 multiplier.  This lets us trim several instructions.
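
As a rough scalar sketch of the arithmetic (not the code in this change):
assuming fixed15 represents 1.0 as 0x8000, and assuming the byte is shifted
into the high half of its 16-bit lane before the multiply, the ideal scale
of 32768/255 works out to a mul-hi constant of about 32896.5, so 0x8081
(32897) is the nearest 16-bit value.  The names mulhi_u16, u8_to_fixed15,
u8_to_fixed15_ref and max_error below are illustrative only.

```c
#include <stdint.h>

/* Scalar model of one lane of an unsigned 16-bit multiply-high:
   keeps the top 16 bits of the 32-bit product a*b. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Hypothetical helper: byte -> fixed15, with 0x8000 standing for 1.0.
   Assumption: the byte is moved to the high 8 bits of the lane first. */
static uint16_t u8_to_fixed15(uint8_t x) {
    return mulhi_u16((uint16_t)(x << 8), 0x8081);
}

/* Reference conversion: x * 32768 / 255, rounded to nearest. */
static uint16_t u8_to_fixed15_ref(uint8_t x) {
    return (uint16_t)((x * 32768u + 127u) / 255u);
}

/* Largest absolute difference between the two over all 256 inputs. */
static int max_error(void) {
    int worst = 0;
    for (int x = 0; x < 256; x++) {
        int d = (int)u8_to_fixed15((uint8_t)x)
              - (int)u8_to_fixed15_ref((uint8_t)x);
        if (d < 0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

In this model 0 maps to 0 and 255 maps to exactly 0x8000.  Intermediate
values can land one step below the rounded reference because mul-hi
truncates rather than rounds.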

This removes the need for the constant 0x0001 and instead uses 0x8081.
I've reordered the constants so that 0x8000 comes first, which helps
trim an instruction here and there on ARM.

Change-Id: I3d490c802df39a89424230c4cfc491f52210c275
Reviewed-on: https://skia-review.googlesource.com/7282
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
4 files changed