036e1831e05ae3a6ec9bcd30cb24f6b1a49a3541 - platform/external/skia

commit	036e1831e05ae3a6ec9bcd30cb24f6b1a49a3541
author	mtklein <mtklein@chromium.org>	Fri Jul 15 07:45:53 2016 -0700
committer	Commit bot <commit-bot@chromium.org>	Fri Jul 15 07:45:53 2016 -0700
tree	81efe17768f56658fc48fc7a694e352809da3072
parent	58e389b0518b46bbe58ba01c23443cf23c18435c

Add a bench to measure the best way to pack from int to uint16_t with SSE.

I measured relative runtimes on my laptop:

   pack_int_uint16_t_ss…
   1036  …e41 1x  …se3 1.01x  …e2_b 3.01x  …e2_a 3.02x

I think I've run into Clang problems with the actual _mm_packus_epi32 instruction (SSE4.1),
so I'm going to exercise a little cowardice and leave that option disabled for now.

The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.
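
As a rough illustration of the mask cost, here is a minimal sketch of an SSSE3
byte-shuffle pack, assuming the ssse3 variant is built on _mm_shuffle_epi8. The
function name, the mask layout, and the TARGET_SSSE3 macro are assumptions for
illustration, not code taken from this change. The mask is built once before the
loop, which mirrors the hoisting described above; a one-off caller would pay for
that load on every call.

```cpp
#include <cstddef>
#include <cstdint>
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8

// Assumption: GCC/Clang. This lets the sketch build without -mssse3.
#if defined(__GNUC__)
#define TARGET_SSSE3 __attribute__((target("ssse3")))
#else
#define TARGET_SSSE3
#endif

// Hypothetical helper: packs n ints (n a multiple of 4) into uint16_t by
// keeping the low 16 bits of each int. Assumes a little-endian x86 target.
TARGET_SSSE3
static inline void pack_ints_to_u16_ssse3(const int32_t* in, uint16_t* out, size_t n) {
    // An index byte with the high bit set makes _mm_shuffle_epi8 write zero.
    const char z = static_cast<char>(0x80);
    // Select bytes 0,1 of each 32-bit lane; built once, outside the loop.
    const __m128i mask = _mm_setr_epi8(0,1, 4,5, 8,9, 12,13, z,z,z,z,z,z,z,z);
    for (size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        v = _mm_shuffle_epi8(v, mask);
        // The low 64 bits now hold the four packed 16-bit values.
        _mm_storel_epi64(reinterpret_cast<__m128i*>(out + i), v);
    }
}
```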

The two sse2 variants are close enough in speed that I'm breaking the tie on other
concerns: the <<16, >>16 version needs no scratch registers and loads no constants,
so it wins.
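
For readers unfamiliar with the trick, here is a minimal sketch of what a
<<16, >>16 pack can look like in SSE2, assuming the shift pair is followed by
_mm_packs_epi32. The helper name is hypothetical and this is not necessarily the
exact code in src/opts/SkNx_sse.h. The shifts sign-extend the low 16 bits of each
lane, so the signed-saturating pack passes those bits through unchanged, using
only immediate shift counts and no extra constants.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper (not from the commit): packs four 32-bit ints into four
// uint16_t values by keeping the low 16 bits of each lane.
static inline void pack4_int_to_u16_sse2(const int32_t in[4], uint16_t out[4]) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    // <<16 then arithmetic >>16 sign-extends the low 16 bits of each lane, so
    // every lane now fits in int16_t and signed saturation cannot clamp it.
    v = _mm_srai_epi32(_mm_slli_epi32(v, 16), 16);
    // _mm_packs_epi32 narrows 32-bit lanes to 16 bits with signed saturation;
    // the low 64 bits of the result hold the four packed values.
    v = _mm_packs_epi32(v, v);
    _mm_storel_epi64(reinterpret_cast<__m128i*>(out), v);
}
```

Values above 32767 become negative after the shifts, but their 16-bit patterns
are preserved, so reading the result as uint16_t gives back the original value
for any input in [0, 65535].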

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot

Review-Url: https://codereview.chromium.org/2150343002

bench/pack_int_uint16_t_Bench.cpp (added)
src/opts/SkNx_sse.h

2 files changed