Improve x86 instrinsic implementation.

* Splits lpc_x86intrin.c to lpc_intrin_sse.c and lpc_intrin_sse2.c
* Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
  function to lpc_intrin_sse2.c
* Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
  (useful for 24-bit en-/decoding)
* Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
  disables precompute_partition_info_sums_32bit_asm_ia32_().
  SSE2 version uses 4 SSE2 instructions instead of 1 SSSE3 instruction
  PABSD so it is slightly slower.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
16 files changed