i965: Stop lying about cpp and height of a stencil buffer.

When using a separate stencil buffer, i965 requires that the pitch of
the buffer (in the 3DSTATE_STENCIL_BUFFER command) be specified as 2x
the actual pitch.

Previously this was accomplished by doubling the "cpp" and "pitch"
values stored in the intel_region data structure, and halving the
height.  However, this was confusing, and it led to a subtle (but
benign) bug: since a stencil buffer is W-tiled, its true height must
be aligned to a multiple of 64; we were accidentally aligning its faux
height to a multiple of 64, causing memory to be wasted.

Note that for window system stencil buffers, the DDX also doubles the
cpp and pitch values.  To facilitate fixing this DDX server bug in the
future, we fix the cpp and pitch values we receive from the X server
only if cpp has the "incorrect" value of 2.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

v2: Clarify comments about the DDX.
diff --git a/src/mesa/drivers/dri/intel/intel_screen.c b/src/mesa/drivers/dri/intel/intel_screen.c
index 49e208c..3fc1d3c 100644
--- a/src/mesa/drivers/dri/intel/intel_screen.c
+++ b/src/mesa/drivers/dri/intel/intel_screen.c
@@ -956,22 +956,28 @@
       return NULL;
 
    if (attachment == __DRI_BUFFER_STENCIL) {
-      /* The stencil buffer has quirky pitch requirements.  From Vol 2a,
-       * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
-       *    The pitch must be set to 2x the value computed based on width, as
-       *    the stencil buffer is stored with two rows interleaved.
-       * To accomplish this, we resort to the nasty hack of doubling the
-       * region's cpp and halving its height.
+      /* Stencil buffers use W tiling, a tiling format that the DRM functions
+       * don't properly account for.  Therefore, when we allocate a stencil
+       * buffer that is private to Mesa (see intel_miptree_create), we round
+       * the height and width up to the next multiple of the tile size (64x64)
+       * and then ask DRM to allocate an untiled buffer.  Consequently, the
+       * height and the width stored in the stencil buffer's region structure
+       * are always multiples of 64, even if the stencil buffer itself is
+       * smaller.
+       *
+       * To avoid inconsistencies between how we represent private buffers and
+       * buffers shared with the window system, round up the height and width
+       * for window system buffers too.
        */
       region_width = ALIGN(width, 64);
-      region_height = ALIGN(ALIGN(height, 2) / 2, 64);
-      region_cpp = format / 4;
+      region_height = ALIGN(height, 64);
    } else {
       region_width = width;
       region_height = height;
-      region_cpp = format / 8;
    }
 
+   region_cpp = format / 8;
+
    intelBuffer->region = intel_region_alloc(intelScreen,
                                             tiling,
                                             region_cpp,