NDK Programmer's Guide
x86

Introduction

Android NDK r6 added support for the 'x86' ABI, which allows native code to run on Android-based devices with CPUs that support the IA-32 instruction set.

The Android x86 ABI itself is fully specified in docs/CPU-ARCH-ABIS.html.

Overview

Generating x86 machine code is simple: just add 'x86' to your APP_ABI definition in your Application.mk file, for example:

    APP_ABI := armeabi armeabi-v7a x86

Alternatively, since NDK r7, you can use:

    APP_ABI := all

This setting generates machine code for all ABIs supported by this NDK, which ensures that your application package contains libraries for all target ABIs. Note that this increases package size, since each ABI gets its own set of native libraries built from the same sources.

If your project does not specify APP_ABI, the default ABI is still 'armeabi'.

As you would expect, generated libraries will go into $PROJECT/libs/x86/, and will be embedded into your .apk under /lib/x86/.

As with other ABIs, the Android package manager automatically extracts these libraries at install time on a compatible x86-based device and places them under <dataPath>/lib, where <dataPath> is the application's private data directory.

Similarly, the Google Play server is capable of filtering applications based on the native libraries they embed and your device's target CPU.

Debugging with ndk-gdb should work exactly as described under docs/NDK-GDB.html.

ARM NEON intrinsics support

The solution takes the form of a C/C++ header with the same name as the standard ARM NEON intrinsics header, "arm_neon.h", and is available in all NDK x86 toolchains. It translates NEON intrinsics to native x86 SSE ones.

By default, the translation from ARM NEON to Intel SSE uses SSE instruction sets up to SSSE3.

By default, the current solution covers about 93% of NEON functions (1869 of 2009).

The solution:

  • Redefines ARM NEON 128-bit vectors as the corresponding x86 SIMD data types.
  • Redefines some ARM NEON functions as Intel SSE functions where a 1:1 correspondence exists.
  • Implements some ARM NEON functions using Intel SIMD where a performance-effective implementation is possible.
  • Implements some of the remaining NEON functions serially and issues the corresponding "low performance" compiler warning.
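
The two ends of that list can be illustrated with a minimal sketch. It is not the header's actual code: every sketch_ name below is hypothetical, and it assumes an x86 target where <xmmintrin.h> (SSE) is available. The first function stands in for a 1:1 redefinition, the second for a serial fallback that spills the vector to memory, processes each lane, and reloads it.

```c
#include <xmmintrin.h>  /* SSE intrinsics; x86 targets only */

/* Hypothetical stand-in for a 128-bit NEON vector of four floats,
   mapped onto the x86 SSE register type. */
typedef __m128 sketch_float32x4_t;

/* 1:1 case: a lane-wise float addition has a direct SSE counterpart. */
static inline sketch_float32x4_t sketch_vaddq_f32(sketch_float32x4_t a,
                                                  sketch_float32x4_t b) {
    return _mm_add_ps(a, b);
}

/* Hypothetical per-lane operation used to exercise the serial path. */
static float sketch_halve(float x) {
    return x * 0.5f;
}

/* Serial case: store the register to memory, process each lane in
   scalar code, then load the result back into a register. */
static inline sketch_float32x4_t sketch_serial_op(sketch_float32x4_t a,
                                                  float (*op)(float)) {
    float lanes[4];
    int i;
    _mm_storeu_ps(lanes, a);
    for (i = 0; i < 4; ++i) {
        lanes[i] = op(lanes[i]);
    }
    return _mm_loadu_ps(lanes);
}
```

The store, scalar loop, and reload in the serial path are the extra work that the "low performance" warning points at.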

Known differences from the ARM version:

There are a few corner cases where the x86 implementation produces results that differ from native execution on ARM. They were found by running NEON tests whose total pass rate is close to 100%. Although these cases are expected to be rare, the complete list of such incompatibilities follows:

  • VRECPS/VRECPSQ
    If one of the operands is +/- infinity and the second is +/- 0.0 then
    • On ARM CPUs result element equal to 2.0 will be returned. To learn more about it, see here.
    • On x86 CPUs QNaN Indefinite will be returned. To learn more about it, see Volume 1 Appendix E chapter E.4.2.2 here.
  • VRSQRTS/VRSQRTSQ
    If one of the operands is +/- infinity and the second is +/- 0.0 then
    • On ARM CPUs result element equal to 1.5 will be returned. To learn more about it, see here.
    • On x86 CPUs QNaN Indefinite will be returned. To learn more about it, see Volume 1 Appendix E chapter E.4.2.2 here.
  • VMAX/VMAXQ
    If one of the operands is NaN or both are +/- 0.0 then
    • On ARM CPUs floating-point maximum works as follows:
      • max(+0.0, -0.0) = +0.0.
      • If any input is a NaN, the corresponding result element is the default NaN.
      To learn more about it, see here.
    • On x86 CPUs floating-point maximum works as follows:
      • If one of the source operands is NaN, then return the second source operand.
      • If both source operands are equal to 0, then return the second source operand.
      To learn more about it, see Volume 1 Appendix E chapter E.4.2.3 and Volume 2 at page 3-488 here.
  • VMIN/VMINQ
    If one of the operands is NaN or both are +/- 0.0 then
    • On ARM CPUs floating-point minimum works as follows:
      • min(+0.0, -0.0) = -0.0.
      • If any input is a NaN, the corresponding result element is the default NaN.
      To learn more about it, see here.
    • On x86 CPUs floating-point minimum works as follows:
      • If one of the source operands is NaN, then return the second source operand.
      • If both source operands are equal to 0, then return the second source operand.
      To learn more about it, see Volume 1 Appendix E chapter E.4.2.3 and Volume 2 at page 3-497 here.
  • VRECPE/VRECPEQ
    Different accuracy on ARM and x86 CPUs. To learn more about it, see ARM article "How do I use VRECPE/VRECPEQ for reciprocal estimate?" here and Volume 2 at page 4-281 here.
  • VRSQRTE/VRSQRTEQ
    • Different accuracy on ARM and x86 CPUs. To learn more about it, see here and Volume 2 at page 4-325 here.
    • If the operand is negative or -infinity then
      • On ARM CPUs function will return default NaN (sign is set to positive). To learn more about it, see here.
      • On x86 CPUs function will return the QNaN floating-point Indefinite (sign is set to negative). To learn more about it, see Volume 1 Appendix E chapter E.4.2.3 here.
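
Some of the corner cases listed above can be reproduced with a scalar model. The following is a minimal sketch, not the header's implementation: each function models a single vector lane, all function names are hypothetical, and the reciprocal step is assumed to be computed as 2 - a*b.

```c
#include <math.h>

/* One lane of an x86-style maximum: the second operand is returned
   when either input is NaN or when both inputs are zero. */
static float x86_style_max(float a, float b) {
    return (a > b) ? a : b;
}

/* One lane of an ARM-style maximum: any NaN input gives NaN, and
   max(+0.0, -0.0) is +0.0. */
static float arm_style_max(float a, float b) {
    if (isnan(a) || isnan(b)) {
        return NAN;
    }
    if (a == 0.0f && b == 0.0f) {
        return (signbit(a) && signbit(b)) ? -0.0f : 0.0f;
    }
    return (a > b) ? a : b;
}

/* Reciprocal step computed directly as 2 - a*b: infinity times zero
   is NaN, so the result is NaN as well. */
static float direct_recps(float a, float b) {
    return 2.0f - a * b;
}

/* ARM-style reciprocal step: infinity times zero is special-cased so
   that the result is 2.0. */
static float arm_style_recps(float a, float b) {
    if ((isinf(a) && b == 0.0f) || (isinf(b) && a == 0.0f)) {
        return 2.0f;
    }
    return 2.0f - a * b;
}
```

If code depends on the ARM results for these inputs, one option is to handle NaN, infinity, and signed-zero inputs explicitly before calling the affected intrinsics, so that both platforms take the same path.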

Performance:

In most cases, the performance gain of vectorized over serial code is expected to be similar to that of native ARM NEON.

Porting considerations and best known methods are:

  • Use 16-byte data alignment for faster loads and stores.
  • Avoid NEON functions that work with constants, because loading the constants carries a performance penalty. If constants are necessary, try to move their initialization out of hotspot loops and, if applicable, replace them with logical and compare operations.
  • Try to avoid functions marked as "serially implemented", because they need to store data from registers to memory, process it serially, and load it again. You may be able to change the data type or algorithm so that the whole port is vectorized rather than serial.

To learn more about it, see here.
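
As an illustration of the alignment advice, here is a minimal sketch of one way to obtain and check 16-byte alignment. The buffer and helper names are hypothetical, and the aligned attribute is a GCC/Clang extension rather than standard C.

```c
#include <stdint.h>

/* Hypothetical sample buffer placed on a 16-byte boundary.
   The aligned attribute is a GCC/Clang extension. */
static float samples[256] __attribute__((aligned(16)));

/* Returns 1 when p sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned_16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}
```

A check such as is_aligned_16(samples) can be placed in a debug assertion ahead of a vectorized loop to catch buffers that were allocated without the required alignment.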

Sample code:

In your project, add 'x86' to the APP_ABI definition and make sure the "arm_neon.h" header is included. Your code will be ported to x86 with no other changes necessary.

Look at the "hello-neon" sample in the NDK for an example of how porting ARM NEON to x86 SSE works.

Standalone-toolchain

It is possible to use the x86 toolchain with NDK r6 in stand-alone mode. See docs/STANDALONE-TOOLCHAIN.html for more details. In brief, you can now run:

    $NDK/build/tools/make-standalone-toolchain.sh --arch=x86 --install-dir=<path>

The toolchain binaries have the i686-linux-android- prefix.

Compatibility

The minimum native API level provided by official Android x86 platform builds is 9, which corresponds to all the native APIs provided by Android 2.3, i.e. Gingerbread (note also that no new native APIs were introduced by Honeycomb).

You won't have to change anything in your project files if you target an older API level: the NDK build script will automatically select the right set of native platform headers/libraries for you.