libFLAC : SSE optimisations.

Add new function:

    FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse41()

and rewrite function:

    FLAC__lpc_compute_residual_from_qlp_coefficients_16_intrin_sse2()

Testing shows a noticeable speed increase on Intel Core i3/5/7 (up to
30% for -8 mode), AMD Athlon64, Phenom and Bulldozer/Piledriver, but no
increase, or even a very small decrease (~2% for -8 mode), on Intel Core2.
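
For reference, a minimal scalar sketch of the computation that the two
routines above vectorise. It is illustrative only: the name
compute_residual_scalar() is hypothetical and not part of libFLAC, and
the sketch assumes that the `order` samples preceding data[0] are
readable warm-up history and that the sums fit in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the libFLAC source.
 * residual[i] = data[i] - (prediction >> lp_quantization), where the
 * prediction is the dot product of the quantised LPC coefficients with
 * the `order` previous samples.
 * Assumption: data[-order .. -1] is valid memory (warm-up samples).
 * Assumption: >> on a negative int32_t is an arithmetic shift, which
 * holds on mainstream compilers but is implementation-defined in C. */
static void compute_residual_scalar(const int32_t *data, size_t data_len,
                                    const int32_t *qlp_coeff, unsigned order,
                                    int lp_quantization, int32_t *residual)
{
    for (size_t i = 0; i < data_len; i++) {
        int32_t sum = 0;
        for (unsigned j = 0; j < order; j++) {
            /* qlp_coeff[0] pairs with the most recent sample. */
            sum += qlp_coeff[j] * data[(ptrdiff_t)i - (ptrdiff_t)j - 1];
        }
        residual[i] = data[i] - (sum >> lp_quantization);
    }
}
```

A SIMD version evaluates several values of i per iteration; since each
residual depends only on input samples, the outer loop has no
loop-carried dependency.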

Patch-from: lvqcl <lvqcl.mail@gmail.com>
4 files changed