llvmpipe: execute shaders on 4x4 blocks instead of 8x2

This matches the convention used by the recursive rasterizer.
Also fixed assorted typos, comments, etc.
Now tri-z.c, gears.c, etc. look basically right, but there are still some
cracks in triangle rasterization.
4 files changed
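
For illustration only, a minimal sketch of walking a tile in fixed-size
blocks, showing that 4x4 and 8x2 blocks both cover 16 pixels and so tile
the same area with the same number of shader invocations. The function
name count_blocks and the 64-pixel tile size are assumptions made for
this example; they are not taken from the llvmpipe sources.

```c
/* Hypothetical helper, not part of llvmpipe: visit every block_w x block_h
 * block of a square tile and return how many blocks were visited.
 * Assumes tile_size is a multiple of both block_w and block_h. */
static int count_blocks(int tile_size, int block_w, int block_h)
{
   int n = 0;
   for (int y = 0; y < tile_size; y += block_h) {
      for (int x = 0; x < tile_size; x += block_w) {
         /* A real driver would run the fragment shader on the block
          * whose top-left pixel is (x, y) here. */
         n++;
      }
   }
   return n;
}
```

With an assumed 64x64 tile, count_blocks(64, 4, 4) and
count_blocks(64, 8, 2) visit the same number of blocks; only the shape of
each block differs.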