i965: Implement ARB_stencil_texturing on Gen8+.

On earlier hardware, we had to implement math in the shader to translate
Y-tiled or untiled coordinates to W-tiled coordinates (which is what
BLORP does today in order to texture from stencil buffers).

On Broadwell, we can simply state that it's W-tiled in SURFACE_STATE,
and adjust the pitch.  This is much easier.
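
As a rough illustration of the idea (not the code in this patch), the
stencil special case could look like the sketch below.  The enum, struct
and function names are hypothetical stand-ins rather than the driver's
real types, and the factor of two on the pitch is an assumption modeled
on how the stencil buffer packet expects its pitch to be programmed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the driver's real enum values or types. */
enum sketch_tiling {
   SKETCH_TILING_NONE,
   SKETCH_TILING_X,
   SKETCH_TILING_Y,
   SKETCH_TILING_W,
};

struct sketch_surface {
   enum sketch_tiling tiling;
   uint32_t pitch; /* in bytes */
};

/* Pick the tiling mode and pitch to program when sampling from a
 * surface.  Stencil surfaces are declared W-tiled and get an adjusted
 * pitch; every other surface passes through unchanged.
 */
static struct sketch_surface
sketch_surface_for_sampling(bool is_stencil, enum sketch_tiling tiling,
                            uint32_t pitch)
{
   struct sketch_surface s = { tiling, pitch };

   if (is_stencil) {
      s.tiling = SKETCH_TILING_W;
      s.pitch = 2 * pitch; /* assumed adjustment; see the note above */
   }

   return s;
}
```

With this shape, no shader-side coordinate math is needed: the sampler
does the W-tiled address calculation itself.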

In the surface state code, I chose to handle the "should we sample depth
or stencil?" question separately from the setup for sampling from
stencil.  Keeping the two apart should let stencil sampling work with
the BindRenderbufferTexImage hook as well, and hopefully makes the setup
reusable for GL_ARB_texture_stencil8 someday.

v2: Update docs/GL3.txt (caught by Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
diff --git a/docs/GL3.txt b/docs/GL3.txt
index f0e95c5..432a056 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -158,7 +158,7 @@
   GL_ARB_robust_buffer_access_behavior                 not started
   GL_ARB_shader_image_size                             not started
   GL_ARB_shader_storage_buffer_object                  not started
-  GL_ARB_stencil_texturing                             API exists, no drivers
+  GL_ARB_stencil_texturing                             DONE (i965/gen8+)
   GL_ARB_texture_buffer_range                          DONE (nv50, nvc0, i965, r600, radeonsi)
   GL_ARB_texture_query_levels                          DONE (i965)
   GL_ARB_texture_storage_multisample                   DONE (all drivers that support GL_ARB_texture_multisample)