049d33a7175882fcf2fdc56f4465af5629a7e353 - toolchain/llvm-project

commit	049d33a7175882fcf2fdc56f4465af5629a7e353
author	Chris Lattner <sabre@nondot.org>	Sat Nov 13 20:48:57 2004 +0000
committer	Chris Lattner <sabre@nondot.org>	Sat Nov 13 20:48:57 2004 +0000
tree	5094b3b0a14ecc9ad353b4a4270dfb414875c6e7
parent	ef6bd92a8c78db2b1edbc444570c9fb2931f4e2f

shld is a very high-latency operation. Instead of emitting it for 64-bit
left shifts by two or three, open-code the equivalent operation, which is
faster on Athlon and P4 (by a substantial margin).

For example, instead of compiling this:

long long X2(long long Y) { return Y << 2; }

to:

X2:
        movl 4(%esp), %eax
        movl 8(%esp), %edx
        shldl $2, %eax, %edx
        shll $2, %eax
        ret

Compile it to:

X2:
        movl 4(%esp), %eax
        movl 8(%esp), %ecx
        movl %eax, %edx
        shrl $30, %edx
        leal (%edx,%ecx,4), %edx
        shll $2, %eax
        ret

Likewise, for << 3, compile to:

X3:
        movl 4(%esp), %eax
        movl 8(%esp), %ecx
        movl %eax, %edx
        shrl $29, %edx
        leal (%edx,%ecx,8), %edx
        shll $3, %eax
        ret
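
The open-coded sequences above can be checked against a plain 64-bit shift
with a small C sketch. This is only an illustration, not code from the
commit: the function names are hypothetical, and it uses uint64_t rather
than long long so that the shift is well defined for every input. The leal
computes a sum, not an OR, but the two agree here because the carried-over
bits occupy only the low 2 (or 3) bits, which are zero in the shifted high
word.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: let the compiler do the 64-bit left shift directly. */
static uint64_t shl64_reference(uint64_t y, unsigned n) {
    return y << n;
}

/* Open-coded form mirroring the shrl/leal/shll sequence.
   lo and hi play the roles of the values loaded into %eax and %ecx. */
static uint64_t shl64_open_coded(uint64_t y, unsigned n) {
    assert(n == 2 || n == 3);                /* matches LEA scale 4 or 8 */
    uint32_t lo = (uint32_t)y;
    uint32_t hi = (uint32_t)(y >> 32);
    uint32_t carry = lo >> (32 - n);         /* shrl $30 or $29 */
    uint32_t new_hi = carry + (uint32_t)(hi << n); /* leal (%edx,%ecx,scale) */
    uint32_t new_lo = (uint32_t)(lo << n);   /* shll $2 or $3 */
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* Hypothetical self-check: returns 1 if both forms agree on a few inputs. */
static int shl64_forms_agree(void) {
    static const uint64_t samples[] = {
        0u, 1u, 0xC0000000u, 0xFFFFFFFFu,
        0x123456789ABCDEF0u, 0xFFFFFFFFFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        for (unsigned n = 2; n <= 3; ++n)
            if (shl64_open_coded(samples[i], n) != shl64_reference(samples[i], n))
                return 0;
    return 1;
}
```

Calling shl64_forms_agree() from a test harness is one way to confirm the
two forms agree on the sampled inputs.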

This matches icc, except that icc open codes the shifts as adds on the P4.
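
As a rough illustration of what "shifts as adds" means: adding a register to
itself doubles it, which is a left shift by one, so a shift by two or three
can be written as two or three such adds. The helper below is a hypothetical
sketch of that identity, not icc's actual output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: left shift by n using only additions.
   Each x += x corresponds to an "addl %eax, %eax" (doubling mod 2^32). */
static uint32_t shl_by_adds(uint32_t x, unsigned n) {
    assert(n < 32);
    for (unsigned i = 0; i < n; ++i)
        x += x;
    return x;
}
```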

llvm-svn: 17707

llvm/lib/Target/X86/X86ISelSimple.cpp

1 file changed
