Adds Neon assembly for correlation/convolution

Optimizing celt_pitch_xcorr()/xcorr_kernel() which also speeds up
FIRs, IIRs and auto-correlations

Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
diff --git a/celt/celt.h b/celt/celt.h
index 1d41f5f..5deea1f 100644
--- a/celt/celt.h
+++ b/celt/celt.h
@@ -122,7 +122,8 @@
 
 int celt_encode_with_ec(OpusCustomEncoder * OPUS_RESTRICT st, const opus_val16 * pcm, int frame_size, unsigned char *compressed, int nbCompressedBytes, ec_enc *enc);
 
-int celt_encoder_init(CELTEncoder *st, opus_int32 sampling_rate, int channels);
+int celt_encoder_init(CELTEncoder *st, opus_int32 sampling_rate, int channels,
+                      int arch);