Optimize instruction data fetch in interpreter.

The computed goto implementation prevents the compiler from detecting that we
load the first 16 bits of an instruction twice: once to get the opcode and
once to fetch the instruction's first operand(s), such as vA and vB.

We now load these 16 bits into a local variable and decode the opcode and
operands from that variable. The switch-based implementation does the same
for consistency.
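
As a rough illustration of the idea (not the actual interpreter code), the
sketch below reads the first 16-bit code unit once and decodes everything from
that local value. It assumes the Dalvik 12x layout, where the low 8 bits hold
the opcode, bits 8-11 hold vA and bits 12-15 hold vB. The helper names
DecodeOpcode, DecodeVRegA, DecodeVRegB and Decode12x are hypothetical and were
chosen for this example only.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not the real ART Instruction API.
// Assumed layout (Dalvik format 12x): vB (4 bits) | vA (4 bits) | opcode (8 bits).
inline uint8_t DecodeOpcode(uint16_t inst_data) { return inst_data & 0xFF; }
inline uint8_t DecodeVRegA(uint16_t inst_data) { return (inst_data >> 8) & 0x0F; }
inline uint8_t DecodeVRegB(uint16_t inst_data) { return inst_data >> 12; }

struct Decoded12x {
  uint8_t opcode;
  uint8_t vA;
  uint8_t vB;
};

// Loads the first code unit exactly once, then decodes the opcode and both
// register operands from the local copy instead of re-reading memory.
inline Decoded12x Decode12x(const uint16_t* insns) {
  const uint16_t inst_data = insns[0];  // the single 16-bit load
  return {DecodeOpcode(inst_data), DecodeVRegA(inst_data), DecodeVRegB(inst_data)};
}
```

For example, the code unit 0x2101 decodes to opcode 0x01 with vA = 1 and
vB = 2.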

The performance improvement is 5% on average across the benchmark application
suite.

Also remove the unused "Thread* self" parameter from DoIGetQuick and
DoIPutQuick.

Bug: 10703860
Change-Id: I83026ed6e78f642ac3dcdc6edbb6056fe012005f
5 files changed