lowp bilerp_clamp_8888

Use the scaling multiply instructions, _mm_mulhrs_epi16 (Intel) and
vqrdmulhq_s16 (ARM), to implement bilerp_clamp_8888.

This CL improves samplingoptions_filter_1_mipmap_0 from 756usec to
590usec on Intel, and from 1180usec to 897usec on ARM.

This CL introduces scaled_mult which takes fixed-point numbers on the
interval [-1, 1) and returns a result which is rescaled to the
interval [-1, 1).
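
For illustration only, a scalar sketch of what one 16-bit lane of such a
scaled multiply computes, assuming Q15 fixed point (int16_t mapped onto
[-1, 1)). The name scaled_mult_q15 is hypothetical and this is not the
code in the CL. The rounding follows the documented behavior of
_mm_mulhrs_epi16. The two instructions differ only for -1 * -1, where
the Intel one wraps and the ARM one saturates, so the sketch excludes
that input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one lane of a rounding scaled multiply.
// Inputs and output are Q15: int16_t value v represents v / 32768.
inline int16_t scaled_mult_q15(int16_t a, int16_t b) {
    // -1 * -1 = +1 is not representable on [-1, 1); the Intel and ARM
    // instructions disagree on that one case, so rule it out here.
    assert(!(a == INT16_MIN && b == INT16_MIN));

    // Q15 * Q15 gives a Q30 product, which always fits in 32 bits.
    int32_t product = int32_t(a) * int32_t(b);

    // Rescale Q30 -> Q15 with round-to-nearest: drop 14 bits, add the
    // rounding bit, drop one more. Relies on arithmetic right shift of
    // negative values (guaranteed since C++20, universal in practice).
    int32_t rounded = ((product >> 14) + 1) >> 1;
    return int16_t(rounded);
}
```

For example, 0.5 * 0.5 is scaled_mult_q15(16384, 16384), which yields
8192, i.e. 0.25 in Q15.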

It also introduces the notion of constrained_add(I16, U16), where the
result is guaranteed to be U16. This avoids moving to a 32-bit integer
during the computation.
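
For illustration only, a scalar sketch of the idea, assuming the caller
knows from the value ranges that the sum fits in 16 unsigned bits. The
name constrained_add_model is hypothetical and this is not the code in
the CL. The 32-bit arithmetic appears only in the debug check; the
returned value comes from a plain 16-bit modular add.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of adding a signed 16-bit value to an
// unsigned 16-bit value when the true sum is known to lie in
// [0, UINT16_MAX].
inline uint16_t constrained_add_model(int16_t a, uint16_t b) {
    // Debug-only check of the caller's promise, done in a wider type.
    int32_t wide = int32_t(a) + int32_t(b);
    assert(wide >= 0 && wide <= UINT16_MAX);
    (void)wide;  // silence unused-variable warnings when NDEBUG is set

    // Reinterpret a as unsigned (two's complement) and add modulo 2^16.
    // Because the true sum is in range, the wrapped result equals it.
    return uint16_t(uint16_t(a) + b);
}
```

For example, constrained_add_model(-100, 300) yields 200 even though
the signed operand is negative.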

Change-Id: I410e494364039df63e5976f433f7e68355e9cfbf
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/443896
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Herb Derby <herb@google.com>
1 file changed