Allow gemmlowp to run multi-threaded.

  - The most recent gemmlowp rebase changed the default value of
    MaxNumThreads from 0 to 1, so gemmlowp now runs single-threaded
    by default.
  - This change resets MaxNumThreads to 0 for the BNNM intrinsic.

Exempt-From-Owner-Approval: The only current owner is a mailing list,
android-renderscript-dev+review@google.com,
and mailing lists aren't currently supported.

Test: mm
Test: Verified a performance increase of about 2-3x on sailfish.
Test: All BLAS CTS tests pass
Change-Id: I01dda1915f4d427547dbd907c4533771b7669593
diff --git a/cpu_ref/rsCpuIntrinsicBLAS.cpp b/cpu_ref/rsCpuIntrinsicBLAS.cpp
index 4b08634..d60a3b9 100644
--- a/cpu_ref/rsCpuIntrinsicBLAS.cpp
+++ b/cpu_ref/rsCpuIntrinsicBLAS.cpp
@@ -877,6 +877,10 @@
 #endif
 
     // Using gemmlowp to calculate the low precision 8 bit GEMM.
+    // Set MaxNumThreads to 0. The value 0 lets the implementation query
+    // the system to determine the number of hardware threads.
+    gemmlowp::eight_bit_int_gemm::SetMaxNumThreads(0);
+
     bool transpose_a = true;
     bool transpose_b = false;
     bool transpose_c = true;