Make RSKernelExpand use memory rather than registers.

The RSKernelExpand pass generates a loop around the main kernel body.
This patch changes it to emit memory operations (loads and stores)
rather than generating SSA values directly. This is required in order
to attach debugging information to the memory location associated with
the loop iteration index variable, which enables the debugger to
inspect the current thread coordinate. The plain SSA form of LLVM IR
doesn't allow this to be done efficiently.
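For illustration only, the memory-based loop shape might look like the
following LLVM IR sketch. It assumes a one-dimensional kernel, an i32
coordinate and opaque pointers. @expanded_kernel and @kernel_body are
hypothetical names, not the identifiers RSKernelExpand actually emits,
and the debug metadata is omitted.

```llvm
; Hypothetical stand-in for the kernel body, called once per coordinate.
declare void @kernel_body(i32)

; Hypothetical expanded loop over x in [start, end).
define void @expanded_kernel(i32 %start, i32 %end) {
entry:
  ; The loop index lives in a stack slot; debug info (for example an
  ; llvm.dbg.declare describing %x.addr) would be attached here.
  %x.addr = alloca i32
  store i32 %start, ptr %x.addr
  br label %loop.header

loop.header:
  %x = load i32, ptr %x.addr
  %cond = icmp ult i32 %x, %end
  br i1 %cond, label %loop.body, label %exit

loop.body:
  call void @kernel_body(i32 %x)
  %x.next = add i32 %x, 1
  store i32 %x.next, ptr %x.addr     ; write the index back to memory
  br label %loop.header

exit:
  ret void
}
```

Because the index has a single addressable home, the debugger can read
the current coordinate from %x.addr at any point in the loop.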

If optimizations are enabled, subsequent passes promote these memory
accesses back to registers, avoiding potential performance regressions.
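As a rough sketch of that promotion, assuming a hypothetical
one-dimensional loop whose index is kept in an alloca, running a pass
such as mem2reg (e.g. `opt -passes=mem2reg`) replaces the loads and
stores with a phi node along these lines. The function and value names
are illustrative and will differ in practice.

```llvm
; Hypothetical stand-in for the kernel body.
declare void @kernel_body(i32)

; Hypothetical loop after promotion: no alloca, loads or stores remain.
define void @expanded_kernel(i32 %start, i32 %end) {
entry:
  br label %loop.header

loop.header:
  ; The index is now an SSA value merged from the two predecessors.
  %x = phi i32 [ %start, %entry ], [ %x.next, %loop.body ]
  %cond = icmp ult i32 %x, %end
  br i1 %cond, label %loop.body, label %exit

loop.body:
  call void @kernel_body(i32 %x)
  %x.next = add i32 %x, 1
  br label %loop.header

exit:
  ret void
}
```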

(cherry picked from commit 4165d29822fc7caf81e435995ff6189608fc0323)

Change-Id: Ic394c2876e72a4b3c7e1fe888f2369510fdd3d33
2 files changed