[optimizing] Improve x86 parallel moves/swaps
Add a new constructor to ScratchRegisterScope that will supply a
register if there is a free one, but not spill to force one. Use this
to generated alternate code that doesn't use a temporary, as the
spill/restore of a register generates extra instructions that aren't
necessary on x86.
Here is the benefit for a 32 bit memory-to-memory exchange with no
free registers:
< 50 push eax
< 53 push ebx
< 8B44244C mov eax, [esp + 76]
< 8B5C246C mov ebx, [esp + 108]
< 8944246C mov [esp + 108], eax
< 895C244C mov [esp + 76], ebx
< 5B pop ebx
< 58 pop eax
---
> FF742444 push [esp + 68]
> FF742468 push [esp + 104]
> 8F44244C pop [esp + 72]
> 8F442468 pop [esp + 100]
Avoid using xchg instruction, as it is slow on smaller processors.
Change-Id: Id29ee3abd998577baaee552d55d23e60ae0c7871
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
diff --git a/compiler/optimizing/parallel_move_resolver.h b/compiler/optimizing/parallel_move_resolver.h
index 3fa1b37..173cffc 100644
--- a/compiler/optimizing/parallel_move_resolver.h
+++ b/compiler/optimizing/parallel_move_resolver.h
@@ -42,10 +42,15 @@
protected:
class ScratchRegisterScope : public ValueObject {
public:
+ // Spill a scratch register if no regs are free.
ScratchRegisterScope(ParallelMoveResolver* resolver,
int blocked,
int if_scratch,
int number_of_registers);
+ // Grab a scratch register only if available.
+ ScratchRegisterScope(ParallelMoveResolver* resolver,
+ int blocked,
+ int number_of_registers);
~ScratchRegisterScope();
int GetRegister() const { return reg_; }
@@ -62,6 +67,8 @@
// Allocate a scratch register for performing a move. The method will try to use
// a register that is the destination of a move, but that move has not been emitted yet.
int AllocateScratchRegister(int blocked, int if_scratch, int register_count, bool* spilled);
+ // As above, but return -1 if no free register.
+ int AllocateScratchRegister(int blocked, int register_count);
// Emit a move.
virtual void EmitMove(size_t index) = 0;