radeons: use dp4 for position invariant vertex programs

Fixes #22181. R200 requires this since DP4 is used in hw tnl mode.
R300 prefers it (should be faster due to no instruction dependencies), but
both methods should be correct (when sw tcl is used though, MUL/MAD might
be faster). Probably doesn't make much difference for R100 since vertex progs
are executed in software anyway, but let's just keep it the same there too.
3 files changed