ARM Skia NEON patches - 22 - S32_D565_Blend

BlitRow565: new NEON version of S32_D565_Blend

This new implementation is faster in most of the benchmarked cases
(it is slower for count = 1 on both cores and for count = 8 on
Cortex-A15) and gives exact results, which removes one mismatch in gm.
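
For readers unfamiliar with this blit row, the operation being vectorized
can be sketched in scalar form as below. This is only an illustrative
reference under stated assumptions, not the NEON code from this patch:
the function and helper names are hypothetical, the 32-bit channel layout
(A:24, R:16, G:8, B:0) is assumed, and the blend formula
dst + ((src - dst) * (alpha + 1) >> 8) is assumed to match the portable
C loop.

```cpp
#include <cstdint>

// Hypothetical helper: interpolate one channel with a 1..256 scale.
// Relies on >> being an arithmetic shift for negative values, which is
// what mainstream compilers do (and what C++20 guarantees).
static inline int blend_channel(int src, int dst, int scale256) {
    return dst + (((src - dst) * scale256) >> 8);
}

// Hypothetical scalar reference: blend `count` 32-bit source pixels into
// RGB565 destination pixels using a global alpha in 0..255.
void s32_d565_blend_ref(uint16_t* dst, const uint32_t* src, int count,
                        unsigned alpha) {
    const int scale = static_cast<int>(alpha) + 1;  // map 0..255 to 1..256
    for (int i = 0; i < count; ++i) {
        const uint32_t c = src[i];
        const uint16_t d = dst[i];

        // Reduce the 8-bit source channels to 5/6/5 bits (assumed layout).
        const int sr = (c >> 19) & 0x1F;
        const int sg = (c >> 10) & 0x3F;
        const int sb = (c >> 3) & 0x1F;

        // Unpack the RGB565 destination pixel.
        const int dr = (d >> 11) & 0x1F;
        const int dg = (d >> 5) & 0x3F;
        const int db = d & 0x1F;

        dst[i] = static_cast<uint16_t>((blend_channel(sr, dr, scale) << 11) |
                                       (blend_channel(sg, dg, scale) << 5) |
                                       blend_channel(sb, db, scale));
    }
}
```

A vector version would be expected to produce, for every pixel, the same
value as a loop of this kind, which is what "exact results" is taken to
mean here.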

Here are the benchmark results (speedup vs. existing S32_D565_Blend):

+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1     | -26.7%    | -27.5%     |
+-------+-----------+------------+
| 2     | 0%        | +53%       |
+-------+-----------+------------+
| 4     | +38.3%    | +26.5%     |
+-------+-----------+------------+
| 8     | +10.9%    | -4.5%      |
+-------+-----------+------------+
| 16    | +18.2%    | +1.6%      |
+-------+-----------+------------+
| 64    | +22.3%    | +8.75%     |
+-------+-----------+------------+
| 256   | +12.3%    | +11.2%     |
+-------+-----------+------------+
| 1024  | +79.2%    | +10.9%     |
+-------+-----------+------------+

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/181523002

git-svn-id: http://skia.googlecode.com/svn/trunk@14103 2bbb7eff-a529-9590-31e7-b0007b416f81
2 files changed