math: new log2

from https://github.com/ARM-software/optimized-routines,
commit 04884bd04eac4b251da4026900010ea7d8850edc

code size change: +2458 bytes (+1524 bytes with fma).
benchmark on x86_64 before, after, speedup:

-Os:
  log2 rthruput:  16.08 ns/call 10.49 ns/call 1.53x
   log2 latency:  44.54 ns/call 25.55 ns/call 1.74x
-O3:
  log2 rthruput:  15.92 ns/call 10.11 ns/call 1.58x
   log2 latency:  44.66 ns/call 26.16 ns/call 1.71x
3 files changed