X86: Use short forward jumps if possible

The optimizing compiler uses 32 bit relative jumps for all forward
jumps, just in case the offset is too large to fit in one byte.  Some of
the generated code knows that the jumps will in fact fit.

Use the 'NearLabel' class for the code generator and intrinsics.

Use the jecxz/jrcxz instruction for string intrinsics.

Unfortunately, conditional jumps to basic blocks don't know enough to
use this, as we don't know how much code will be generated.

This saves a whopping 0.24% for core.oat and boot.oat sizes, but
every little bit helps, and it reduces icache footprint slightly.

Change-Id: I633fe3b2e0e810b4ce12fdad8c02135644b63506
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
4 files changed