drm/i915: ban badly behaving contexts Now when we have mechanism in place to track which context was guilty of hanging the gpu, it is possible to punish for bad behaviour. If context has recently submitted a faulty batchbuffers guilty of gpu hang and submits another batch which hangs gpu in quick succession, ban it permanently. If ctx is banned, no more batchbuffers will be queued for execution. There is no need for global wedge machinery anymore and it would be unwise to wedge the whole gpu if we have multiple hanging batches queued for execution. Instead just ban the guilty ones and carry on. v2: Store guilty ban status bool in gpu_error instead of pointers that might become danling before hang is declared. v3: Use return value for banned status instead of stashing state into gpu_error (Chris Wilson) v4: - rebase on top of fixed hang stats api - add define for ban period - rename commit and improve commit msg v5: - rely context banning instead of wedging the gpu - beautification and fix for ban calculation (Chris) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

commit: be62acb4cce1389a28296852737e3917d9cc5b25 [log] [tgz]
author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Fri Aug 30 16:19:28 2013 +0300
committer: Daniel Vetter <daniel.vetter@ffwll.ch> Fri Sep 06 17:55:50 2013 +0200
tree: 7675ef0ebe4cad2d72f1ec94124f73fa97a42472
parent: bf13e81b904a37d94d83dd6c3b53a147719a3ead [diff] [blame]
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 72e2be7..ec690ca 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c

@@ -719,24 +719,19 @@
 
 	simulated = dev_priv->gpu_error.stop_rings != 0;
 
-	if (!simulated && get_seconds() - dev_priv->gpu_error.last_reset < 5) {
-		DRM_ERROR("GPU hanging too fast, declaring wedged!\n");
-		ret = -ENODEV;
-	} else {
-		ret = intel_gpu_reset(dev);
+	ret = intel_gpu_reset(dev);
 
-		/* Also reset the gpu hangman. */
-		if (simulated) {
-			DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
-			dev_priv->gpu_error.stop_rings = 0;
-			if (ret == -ENODEV) {
-				DRM_ERROR("Reset not implemented, but ignoring "
-					  "error for simulated gpu hangs\n");
-				ret = 0;
-			}
-		} else
-			dev_priv->gpu_error.last_reset = get_seconds();
+	/* Also reset the gpu hangman. */
+	if (simulated) {
+		DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
+		dev_priv->gpu_error.stop_rings = 0;
+		if (ret == -ENODEV) {
+			DRM_ERROR("Reset not implemented, but ignoring "
+				  "error for simulated gpu hangs\n");
+			ret = 0;
+		}
 	}
+
 	if (ret) {
 		DRM_ERROR("Failed to reset chip.\n");
 		mutex_unlock(&dev->struct_mutex);
commit	be62acb4cce1389a28296852737e3917d9cc5b25	[log] [tgz]
author	Mika Kuoppala <mika.kuoppala@linux.intel.com>	Fri Aug 30 16:19:28 2013 +0300
committer	Daniel Vetter <daniel.vetter@ffwll.ch>	Fri Sep 06 17:55:50 2013 +0200
tree	7675ef0ebe4cad2d72f1ec94124f73fa97a42472
parent	bf13e81b904a37d94d83dd6c3b53a147719a3ead [diff] [blame]