Scalar RAddStoreExpMinusMax micro-kernels

- Building blocks for the SoftArgMax operator on WAsm
- P5 and LUT64+P2 implementations
- The scalar_p5_x4_acc2 version is the fastest on both ARM64 and x86-64

PiperOrigin-RevId: 290780293
diff --git a/test/f32-raddstoreexpminusmax.yaml b/test/f32-raddstoreexpminusmax.yaml
index 77bf12e..5d4490e 100644
--- a/test/f32-raddstoreexpminusmax.yaml
+++ b/test/f32-raddstoreexpminusmax.yaml
@@ -26,3 +26,15 @@
 - name: xnn_f32_raddstoreexpminusmax_ukernel__avx512f_p5_scalef_x192_acc2
 - name: xnn_f32_raddstoreexpminusmax_ukernel__avx512f_p5_scalef_x192_acc3
 - name: xnn_f32_raddstoreexpminusmax_ukernel__avx512f_p5_scalef_x192_acc6
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_p5_x1
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_p5_x2
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_p5_x2_acc2
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_p5_x4
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_p5_x4_acc2
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_p5_x4_acc4
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_lut64_p2_x1
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_lut64_p2_x2
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_lut64_p2_x2_acc2
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_lut64_p2_x4
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_lut64_p2_x4_acc2
+- name: xnn_f32_raddstoreexpminusmax_ukernel__scalar_lut64_p2_x4_acc4