bionic: Do not use <sys/atomics.h> for platform code.

We're going to modify the __atomic_xxx implementation to provide
full memory barriers, to avoid problems for NDK machine code that
link to these functions.

First step is to remove their usage from our platform code.
We now use inlined versions of the same functions for a slight
performance boost.

+ remove obsolete atomics_x86.c (was never compiled)

NOTE: This improvement was benchmarked on various devices.
      Comparing a pthread mutex lock + atomic increment + unlock
      we get:

  - ARMv7 emulator, running on a 2.4 GHz Xeon:
       before: 396 ns    after: 288 ns

  - x86 emulator in KVM mode on same machine:
       before: 27 ns     after: 27 ns

  - Google Nexus S, in ARMv7 mode (single-core):
       before: 82 ns     after: 76 ns

  - Motorola Xoom, in ARMv7 mode (multi-core):
       before: 121 ns    after: 120 ns

The code has also been rebuilt in ARMv5TE mode for correctness.

Change-Id: Ic1dc72b173d59b2e7af901dd70d6a72fb2f64b17
8 files changed