commit | 17ffb651173b85cc0156d6a5c403e12077a6e3a1 | |
---|---|---|
author | Jeff Hammond <jeff.r.hammond@intel.com> | Tue Sep 22 00:29:46 2020 -0700 |
committer | GitHub <noreply@github.com> | Tue Sep 22 07:29:46 2020 +0000 |
tree | 7aabfc189fa06f1ee3ccd6e2d7fb924319229c35 | |
parent | 76dafc7e3b197c532ba1951d13f0b0e73858b530 | |
detect AVX-512 FMA count (#125)

* add Ice Lake Server and Sapphire Rapids models

  The information contained in this commit was obtained from "Intel® Architecture Instruction Set Extensions and Future Features Programming Reference" document 319433-040 from https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

  Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
* Tiger Lake; Ice Lake NNP-I; SPR string

  Signed-off-by: Hammond, Jeff R <jeff.r.hammond@intel.com>
* second FMA features - incomplete and wrong

  Signed-off-by: Hammond, Jeff R <jeff.r.hammond@intel.com>
* oops: use T/F not 2/1

  Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
* implement SKX lookup

  Signed-off-by: Hammond, Jeff R <jeff.r.hammond@intel.com>
* add Intel copyright
* cleanup AVX512 second FMA code

  1) remove debug stuff
  2) remove ICX - will add details when available

  Signed-off-by: Hammond, Jeff R <jeff.r.hammond@intel.com>
* fix CPX detection

  Signed-off-by: Hammond, Jeff R <jeff.r.hammond@intel.com>
* remove elses

  Signed-off-by: Hammond, Jeff R <jeff.r.hammond@intel.com>
* remove curly braces from single-line conditional bodies

  Signed-off-by: Hammond, Jeff R <jeff.r.hammond@intel.com>
* apply clang-format

  Signed-off-by: Hammond, Jeff R <jeff.r.hammond@intel.com>

Fixes #120
A cross-platform C library to retrieve CPU features (such as available instructions) at runtime.
`cpuid` is unavailable. This is useful when running integration tests in hermetic environments.

`malloc`, `memcpy`, and `memcmp`.

Note: For C++ code, the library functions are defined in the `CpuFeatures` namespace.
Here's a simple example that executes a codepath if the CPU supports both the AES and the SSE4.2 instruction sets:
```c
#include "cpuinfo_x86.h"

// For C++, add `using namespace CpuFeatures;`
static const X86Features features = GetX86Info().features;

void Compute(void) {
  if (features.aes && features.sse4_2) {
    // Run optimized code.
  } else {
    // Run standard code.
  }
}
```
If you wish, you can read all the features at once into a global variable, and then query for the specific features you care about. Below, we store all the ARM features and then check whether AES and NEON are supported.
```c
#include <stdbool.h>
#include "cpuinfo_arm.h"

// For C++, add `using namespace CpuFeatures;`
static const ArmFeatures features = GetArmInfo().features;
static const bool has_aes_and_neon = features.aes && features.neon;

// use has_aes_and_neon.
```
This is a good approach to take if you're checking for combinations of features when using a compiler that is slow to extract individual bits from bit-packed structures.
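The caching idea can be sketched without the library itself. In the minimal example below, `DemoFeatures`, `DemoDispatch`, and `demo_make_dispatch` are hypothetical names introduced only for illustration (they are not part of cpu_features); the point is that the bit extraction happens once and the hot path reads a plain `bool`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bit-packed feature struct: one bit per flag. */
typedef struct {
  unsigned aes : 1;
  unsigned neon : 1;
  unsigned crc32 : 1;
} DemoFeatures;

/* Hypothetical cache of the combinations the program cares about. */
typedef struct {
  bool has_aes_and_neon;
} DemoDispatch;

/* Extract the individual bits once and store the combined result. */
static DemoDispatch demo_make_dispatch(const DemoFeatures* features) {
  assert(features != NULL);
  DemoDispatch dispatch;
  dispatch.has_aes_and_neon = features->aes && features->neon;
  return dispatch;
}
```

Code that runs often would then test `dispatch.has_aes_and_neon` instead of re-reading both bit-fields each time.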
The following code sets `has_avx` to true if either the compiler was told to use the AVX instruction set (e.g., `g++ -mavx`) or the CPU reports AVX support at runtime.
```c
#include <stdbool.h>
#include "cpuinfo_x86.h"

// For C++, add `using namespace CpuFeatures;`
static const X86Features features = GetX86Info().features;
static const bool has_avx = CPU_FEATURES_COMPILED_X86_AVX || features.avx;

// use has_avx.
```
`CPU_FEATURES_COMPILED_X86_AVX` is set to 1 if the compiler was instructed to use AVX and 0 otherwise, combining compile time and runtime knowledge.
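As an illustration of how a compile-time constant and a runtime flag can be combined, here is a minimal sketch that does not depend on the library. `DEMO_COMPILED_AVX`, `DemoRuntimeFeatures`, and `demo_has_avx` are hypothetical names; deriving the constant from the compiler's predefined `__AVX__` macro is an assumption made for this sketch, and the library's own definition may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirror the compiler's predefined __AVX__ macro (set by GCC and
   Clang when AVX code generation is enabled, e.g. with -mavx) into a 0/1
   constant. */
#if defined(__AVX__)
#define DEMO_COMPILED_AVX 1
#else
#define DEMO_COMPILED_AVX 0
#endif

/* Hypothetical stand-in for the result of a runtime probe. */
typedef struct {
  unsigned avx : 1;
} DemoRuntimeFeatures;

static bool demo_has_avx(const DemoRuntimeFeatures* runtime) {
  assert(runtime != NULL);
  /* When DEMO_COMPILED_AVX is 1 the left operand is a constant true, so the
     runtime flag is never consulted. */
  return DEMO_COMPILED_AVX || runtime->avx;
}
```

When the file is built with AVX enabled, the result is true regardless of the runtime flag; otherwise it simply reflects what the runtime probe reported.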
On x86, the first incarnation of a feature in a microarchitecture might not be the most efficient (e.g. AVX on Sandy Bridge). We provide a function to retrieve the underlying microarchitecture so you can decide whether to use it.
Below, `has_fast_avx` is set to 1 if the CPU supports the AVX instruction set, but only if it's not Sandy Bridge.
```c
#include <stdbool.h>
#include "cpuinfo_x86.h"

// For C++, add `using namespace CpuFeatures;`
static const X86Info info = GetX86Info();
static const X86Microarchitecture uarch = GetX86Microarchitecture(&info);
static const bool has_fast_avx = info.features.avx && uarch != INTEL_SNB;

// use has_fast_avx.
```
This feature is currently available only for x86 microarchitectures.
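To show how a family/model pair can drive this kind of decision, here is a small self-contained sketch. The `DemoCpu` struct, the `DemoUarch` enum, and both functions are hypothetical and are not the library's lookup table; the sketch only covers the Sandy Bridge models (0x2A, 0x2D) and Ivy Bridge models (0x3A, 0x3E) of family 6 and reports everything else as unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, deliberately tiny microarchitecture enum. */
typedef enum { DEMO_UARCH_UNKNOWN, DEMO_UARCH_SNB, DEMO_UARCH_IVB } DemoUarch;

/* Hypothetical stand-in for the CPU description a runtime probe returns. */
typedef struct {
  int family;
  int model;
  bool avx;
} DemoCpu;

/* Map (family, model) to a microarchitecture for a handful of known models. */
static DemoUarch demo_uarch(const DemoCpu* cpu) {
  assert(cpu != NULL);
  if (cpu->family != 6) return DEMO_UARCH_UNKNOWN;
  switch (cpu->model) {
    case 0x2A:
    case 0x2D:
      return DEMO_UARCH_SNB; /* Sandy Bridge */
    case 0x3A:
    case 0x3E:
      return DEMO_UARCH_IVB; /* Ivy Bridge */
    default:
      return DEMO_UARCH_UNKNOWN;
  }
}

/* AVX is treated as "fast" only when present and not on Sandy Bridge. */
static bool demo_has_fast_avx(const DemoCpu* cpu) {
  return cpu->avx && demo_uarch(cpu) != DEMO_UARCH_SNB;
}
```

A CPU with family 6 and model 45 (0x2D) maps to Sandy Bridge here, so `demo_has_fast_avx` returns false for it even though AVX is present.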
Building `cpu_features` (check quickstart below) brings a small executable to test the library.
```console
% ./build/list_cpu_features
arch : x86
brand : Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz
family : 6 (0x06)
model : 45 (0x2D)
stepping : 7 (0x07)
uarch : INTEL_SNB
flags : aes,avx,cx16,smx,sse4_1,sse4_2,ssse3
```
```console
% ./build/list_cpu_features --json
{"arch":"x86","brand":" Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz","family":6,"model":45,"stepping":7,"uarch":"INTEL_SNB","flags":["aes","avx","cx16","smx","sse4_1","sse4_2","ssse3"]}
```
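If a script or test harness consumes the comma-separated `flags` value from the text output, a naive substring search is not enough: `sse4` would match inside `sse4_1`. The sketch below is one possible whole-token check; `demo_has_flag` is a hypothetical helper, not something the library or the tool provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` appears as a whole comma-separated token in `flags`,
   e.g. in "aes,avx,cx16". Partial matches such as "sse4" in "sse4_1" fail. */
static bool demo_has_flag(const char* flags, const char* name) {
  assert(flags != NULL && name != NULL);
  const size_t name_len = strlen(name);
  if (name_len == 0) return false;
  const char* token = flags;
  while (*token != '\0') {
    /* Length of the current token, up to the next comma or end of string. */
    const size_t token_len = strcspn(token, ",");
    if (token_len == name_len && memcmp(token, name, name_len) == 0) {
      return true;
    }
    token += token_len;
    if (*token == ',') ++token; /* skip the separator */
  }
  return false;
}
```

With the flag list from the sample output, `demo_has_flag(flags, "avx")` is true while `demo_has_flag(flags, "sse4")` and `demo_has_flag(flags, "avx2")` are false.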
OS | x86³ | ARM | AArch64 | MIPS⁴ | POWER |
---|---|---|---|---|---|
Android | yes² | yes¹ | yes¹ | yes¹ | N/A |
iOS | N/A | not yet | not yet | N/A | N/A |
Linux | yes² | yes¹ | yes¹ | yes¹ | yes¹ |
macOS | yes² | N/A | not yet | N/A | no |
Windows | yes² | not yet | not yet | N/A | N/A |
`/proc/self/auxv`
`/proc/cpuinfo`
`cpuid` instruction.

cpu_features now officially supports Android and offers a drop-in replacement for the NDK's `cpu-features.h`; see the `ndk_compat` folder for details.
The cpu_features library is licensed under the terms of the Apache license. See LICENSE for more information.
Please check the CMake build instructions.
Quickstart with `Ninja`: build and run `list_cpu_features`:

```shell
cmake -B/tmp/cpu_features -H. -GNinja -DCMAKE_BUILD_TYPE=Release
ninja -C/tmp/cpu_features
/tmp/cpu_features/list_cpu_features --json
```

Run the tests:

```shell
cmake -B/tmp/cpu_features -H. -GNinja -DBUILD_TESTING=ON
ninja -C/tmp/cpu_features
ninja -C/tmp/cpu_features test
```