same 16->8 bit packing trick for SSE2/SSE4.1

It's funny how now that I'm on a machine that doesn't support AVX2,
it's suddenly important to me that pack() is optimized for SSE!

This is basically the same change as this morning's AVX2 one, without
any of AVX2's weird within-lane pack ordering issues.  It replaces
something like

    movdqa     2300(%rip), %xmm0
    pshufb     %xmm0, %xmm3
    pshufb     %xmm0, %xmm2
    punpcklqdq %xmm3, %xmm2
    (This is SSE4.1; the SSE2 version is worse.)

with

    psrlw    $8, %xmm3
    psrlw    $8, %xmm2
    packuswb %xmm3, %xmm2
    (SSE2 and SSE4.1 both.)
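
For the record, here's roughly what the replaced SSE4.1 sequence looks
like as intrinsics.  This is just a sketch: the function name is made
up, and I'm assuming the byte we want is the high byte of each 16-bit
lane (that's what the psrlw $8 above implies).

    #include <tmmintrin.h>  // SSSE3, part of the SSE4.1 build: _mm_shuffle_epi8

    // Old approach: a mask loaded from memory gathers the high byte of
    // each 16-bit lane into the low 8 bytes, then punpcklqdq glues the
    // two 8-byte halves together.
    static inline __m128i pack_shuffle(__m128i lo, __m128i hi) {
        const __m128i mask = _mm_setr_epi8(1,3,5,7,9,11,13,15,
                                           -1,-1,-1,-1,-1,-1,-1,-1);
        lo = _mm_shuffle_epi8(lo, mask);    // pshufb
        hi = _mm_shuffle_epi8(hi, mask);    // pshufb
        return _mm_unpacklo_epi64(lo, hi);  // punpcklqdq
    }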

It's always nice to not need to load a shuffle mask out of memory.
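
In intrinsics, the mask-free version is just the shift and the pack.
Same caveats as the sketch above (made-up name, assumed high-byte
layout):

    #include <emmintrin.h>  // SSE2: _mm_srli_epi16, _mm_packus_epi16

    // New approach: shift the high byte of each 16-bit lane down into
    // [0,255], then let the saturating pack do the 16->8 squeeze.  The
    // shift count is an immediate, so there's no constant to load.
    static inline __m128i pack_shift(__m128i lo, __m128i hi) {
        lo = _mm_srli_epi16(lo, 8);         // psrlw $8
        hi = _mm_srli_epi16(hi, 8);         // psrlw $8
        return _mm_packus_epi16(lo, hi);    // packuswb
    }

After the shift every lane is already in [0,255], so packuswb's
saturation never actually kicks in; it's just a convenient way to
narrow 16-bit lanes to 8-bit ones.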

Change-Id: I56fb30b31fcedc0ee84a4a71c483a597c8dc1622
Reviewed-on: https://skia-review.googlesource.com/30583
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>