Enable NEON DOT QS8 [I]GEMM microkernels on ARM64

- Add 1xNR versions of the microkernels (1x8c4 and 1x16c4, for both GEMM and IGEMM)
- Enable NEON DOT microkernels on CPUs that support the ARMv8.2 dot-product extension
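
For reviewers unfamiliar with the "c4" kernels, here is a minimal scalar sketch of what a 1x8c4 QS8 GEMM row computes. It assumes that "1x8c4" means MR = 1 row, NR = 8 output columns, and K consumed in blocks of 4 int8 values, which is the granularity of the SDOT instruction. The function name `qs8_gemm_1x8c4_ref` and the packed weight layout `w[kb][n][4]` are hypothetical choices for illustration, not XNNPACK's actual packing or API. Bias and requantization are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical scalar reference for a 1x8c4 int8 GEMM row (MR = 1, NR = 8,
// K consumed in blocks of 4). The weight layout w[kb][n][4] is an assumption
// for this sketch only. Bias addition and requantization to int8 are omitted.
static void qs8_gemm_1x8c4_ref(size_t kc, const int8_t* a, const int8_t* w,
                               int32_t acc[8]) {
  assert(kc % 4 == 0);  // caller is assumed to zero-pad K to a multiple of 4
  for (size_t n = 0; n < 8; n++) {
    acc[n] = 0;
  }
  for (size_t kb = 0; kb < kc / 4; kb++) {
    const int8_t* a4 = a + 4 * kb;  // 4 consecutive K values of the single A row
    for (size_t n = 0; n < 8; n++) {
      const int8_t* w4 = w + (kb * 8 + n) * 4;  // 4 K values for column n
      // A 4-way int8 dot product accumulated into int32: this is what one
      // 32-bit lane of SDOT computes, so 8 columns correspond to two
      // 4-lane SDOTs per K block.
      int32_t sum = 0;
      for (size_t j = 0; j < 4; j++) {
        sum += (int32_t) a4[j] * (int32_t) w4[j];
      }
      acc[n] += sum;
    }
  }
}
```

The 1xNR variants added here cover the single-row case (for example, the last row of a tile, or a batch of 1), where the wider 4/8/12-row kernels would otherwise do wasted work.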

Performance improvement on Samsung Galaxy S10 (Exynos):
- MobileNet v1: 46 ms -> 12 ms
- MobileNet v2: 27 ms -> 10 ms

PiperOrigin-RevId: 330874920
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 0235c0f..14dfa9f 100755
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1233,13 +1233,16 @@
   src/f16-spmm/gen/32x1-minmax-neonfp16arith-unroll2.c)
 
 SET(XNNPACK_NEONDOT_MICROKERNEL_SRCS
+  src/qs8-gemm/gen/1x8c4-minmax-neondot.c
   src/qs8-gemm/gen/8x8c4-minmax-neondot.c
   src/qs8-gemm/gen/12x8c4-minmax-neondot.c
+  src/qs8-gemm/gen/1x16c4-minmax-neondot.c
   src/qs8-gemm/gen/4x16c4-minmax-neondot.c
+  src/qs8-igemm/gen/1x8c4-minmax-neondot.c
   src/qs8-igemm/gen/8x8c4-minmax-neondot.c
   src/qs8-igemm/gen/12x8c4-minmax-neondot.c
-  src/qs8-igemm/gen/4x16c4-minmax-neondot.c
-)
+  src/qs8-igemm/gen/1x16c4-minmax-neondot.c
+  src/qs8-igemm/gen/4x16c4-minmax-neondot.c)
 
 SET(XNNPACK_SSE_MICROKERNEL_SRCS
   src/f32-avgpool/9p8x-minmax-sse-c4.c