Lower insertelement and extractelement.

Use instructions that do the operations in registers and that are
available in SSE2. Spill to memory to perform the operation in the
absence of any other reasonable options (v16i8 and v16i1).

Unfortunately there is no natural class of SSE2 instructions that
insertelement / extractelement can get lowered
to for all vector types (though pinsr[bwd] and pextr[bwd] are
available in SSE4.1). There are in some cases a large number of
choices available for lowering and I have not looked into which
choices are the best yet, besides using LLVM output as a guide.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/401523003
15 files changed