i965/tiled_memcpy: Optimize RGBA -> BGRA swizzle.

Replace the four byte loads and four byte stores per pixel with a
32-bit load, bswap, rotate, and store; or with a movbe, rotate, and
store.
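
A minimal sketch of the idea, not the code from this patch: the names
ror32, bswap32 and swizzle_rgba_to_bgra are hypothetical, and a
little-endian host is assumed. Each pixel is loaded as one 32-bit word,
byte-swapped, and rotated right by 8 bits, which exchanges the R and B
bytes and leaves G and A in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit value right by d bits (valid for 0 < d < 32). */
static inline uint32_t ror32(uint32_t n, unsigned d)
{
    return (n >> d) | (n << (32 - d));
}

/* Portable byte swap; compilers commonly lower this to a single bswap. */
static inline uint32_t bswap32(uint32_t n)
{
    return (n >> 24) | ((n >> 8) & 0x0000ff00u) |
           ((n << 8) & 0x00ff0000u) | (n << 24);
}

/*
 * Hypothetical helper: copy `bytes` bytes of RGBA pixels from src to dst
 * as BGRA. Assumes a little-endian host and a multiple of 4 bytes.
 *
 * In-memory bytes R,G,B,A load as 0xAABBGGRR; bswap gives 0xRRGGBBAA;
 * rotating right by 8 gives 0xAARRGGBB, which stores as B,G,R,A.
 */
static void swizzle_rgba_to_bgra(void *dst, const void *src, size_t bytes)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(bytes % 4 == 0);
    for (size_t i = 0; i < bytes; i += 4) {
        uint32_t px;
        memcpy(&px, s + i, 4);        /* one 32-bit load */
        px = ror32(bswap32(px), 8);   /* bswap + rotate */
        memcpy(d + i, &px, 4);        /* one 32-bit store */
    }
}
```

The memcpy calls are only there to keep the sketch free of alignment
and aliasing assumptions; a compiler would typically turn each into a
plain load or store, and may fuse the load and bswap into a movbe on
CPUs that support it.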

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
1 file changed