util: implement F16C using inline assembly on x86_64

F16C is the x86 extension for converting between half-precision and
single-precision floats: https://en.wikipedia.org/wiki/F16C

This also happens to fix bptc-float-modes on llvmpipe.
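
For readers unfamiliar with the approach, here is a minimal sketch of a
half-to-float conversion done with F16C through inline assembly. It is
not the code from this change. All function and type names
(half_to_float, half_to_float_f16c, half_to_float_sw, cpu_has_f16c, v4f,
v2i) are hypothetical, and it assumes GCC or Clang. It issues vcvtph2ps
only when CPUID and XCR0 report support, and otherwise uses a plain C
fallback.

```c
#include <stdint.h>
#include <string.h>

/* Portable fallback: decode the IEEE 754 binary16 fields by hand. */
static float half_to_float_sw(uint16_t h)
{
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: the value is man * 2^-24, exact in float. */
        f = (float)man * (1.0f / 16777216.0f);
        return (h & 0x8000u) ? -f : f;
    }
    if (exp == 0x1Fu)
        bits = 0x7F800000u | (man << 13);           /* Inf or NaN */
    else
        bits = ((exp + 112u) << 23) | (man << 13);  /* rebias 15 -> 127 */
    bits |= (uint32_t)(h & 0x8000u) << 16;          /* sign bit */
    memcpy(&f, &bits, sizeof f);
    return f;
}

#if defined(__x86_64__)
#include <cpuid.h>

/* GCC/Clang vector types so the asm operands can live in XMM registers. */
typedef float     v4f __attribute__((vector_size(16)));
typedef long long v2i __attribute__((vector_size(16)));

/* Returns 1 if both the CPU and the OS allow VEX-encoded F16C. */
static int cpu_has_f16c(void)
{
    unsigned eax, ebx, ecx, edx, lo, hi;
    /* CPUID.1:ECX bit 27 = OSXSAVE, bit 28 = AVX, bit 29 = F16C. */
    const unsigned need = (1u << 27) | (1u << 28) | (1u << 29);

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if ((ecx & need) != need)
        return 0;
    /* XCR0 bits 1 and 2: the OS saves XMM and YMM state. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (lo & 0x6u) == 0x6u;
}

/* Inline asm avoids building the whole file with -mf16c. */
static float half_to_float_f16c(uint16_t h)
{
    v2i in = { h, 0 };
    v4f out;
    __asm__("vcvtph2ps %1, %0" : "=x"(out) : "x"(in));
    return out[0];
}
#endif

/* Dispatch once at runtime, then reuse the cached answer. */
float half_to_float(uint16_t h)
{
#if defined(__x86_64__)
    static int has_f16c = -1;
    if (has_f16c < 0)
        has_f16c = cpu_has_f16c();
    if (has_f16c)
        return half_to_float_f16c(h);
#endif
    return half_to_float_sw(h);
}
```

Calling half_to_float(0x3C00) yields 1.0f on either path. The cached
flag in the sketch is not thread-safe; a real implementation would
more likely read a CPU capability flag detected once at startup.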

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6987>
6 files changed