Improve x86 instrinsic implementation.
* Splits lpc_x86intrin.c to lpc_intrin_sse.c and lpc_intrin_sse2.c
* Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
function to lpc_intrin_sse2.c
* Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
(useful for 24-bit en-/decoding)
* Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
disables precompute_partition_info_sums_32bit_asm_ia32_().
SSE2 version uses 4 SSE2 instructions instead of 1 SSSE3 instruction
PABSD so it is slightly slower.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
16 files changed