drm/i915: clear up wedged transitions
We have two important transitions of the wedged state in the current
code:
- 0 -> 1: This means a hang has been detected, and signals to everyone
to please get off any locks, so that the reset work item can
do its job.
- 1 -> 0: The reset handler has completed.
Now the last transition mixes up two states: "Reset completed and
successful" and "Reset failed". To distinguish these two we do some
tricks with the reset completion, but I simply could not convince
myself that this doesn't race under odd circumstances.
Hence split this up, and add a new terminal state indicating that the
hw is gone for good.
Also add explicit #defines for both states, update comments.
v2: Split out the reset handling bugfix for the throttle ioctl.
v3: s/tmp/wedged/ suggested by Chris Wilson. Also fix up a rebase
error which prevented this patch from actually compiling.
v4: To unify the wedged state with the reset counter, keep the
reset-in-progress state just as a flag. The terminally-wedged state is
now denoted with a big number.
v5: Add a comment to the reset_counter special values explaining that
WEDGED & RESET_IN_PROGRESS needs to be true for the code to be
correct.
v6: Fixup logic errors introduced with the wedged+reset_counter
unification. Since WEDGED implies reset-in-progress (in a way we're
terminally stuck in the dead-but-reset-not-completed state), we need
to ensure that we check for this everywhere. The specific bug was in
wait_for_error, which would simply have timed out.
v7: Extract an inline i915_reset_in_progress helper to make the code
more readable. Also annotate the reset-in-progress case with an
unlikely, to help the compiler optimize the fastpath. Do the same for
the terminally wedged case with i915_terminally_wedged.
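For illustration only, the resulting state encoding can be sketched in
userspace C roughly as below. The numeric values (flag in bit 0, wedged
as all ones), the struct, and every *_sketch name are assumptions made
for this example and are not taken from the driver; the only property
that matters is the one noted in v5, namely that the wedged value has
the reset-in-progress bit set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed values: only "WEDGED has the in-progress bit set" is required. */
#define RESET_IN_PROGRESS_FLAG_SKETCH 1u
#define WEDGED_SKETCH 0xffffffffu

/* Hypothetical stand-in for the reset_counter field in gpu_error. */
struct gpu_error_sketch {
	atomic_uint reset_counter;
};

/* True while a reset is pending, and also once terminally wedged. */
static inline bool reset_in_progress_sketch(struct gpu_error_sketch *e)
{
	return (atomic_load(&e->reset_counter) &
		RESET_IN_PROGRESS_FLAG_SKETCH) != 0;
}

/* True only in the terminal "hw is gone for good" state. */
static inline bool terminally_wedged_sketch(struct gpu_error_sketch *e)
{
	return atomic_load(&e->reset_counter) == WEDGED_SKETCH;
}

/* 0 -> in progress: a hang has been detected. */
static inline void mark_hang_sketch(struct gpu_error_sketch *e)
{
	atomic_store(&e->reset_counter, RESET_IN_PROGRESS_FLAG_SKETCH);
}

/* In progress -> 0 on success, or -> wedged if the reset failed. */
static inline void finish_reset_sketch(struct gpu_error_sketch *e, bool ok)
{
	atomic_store(&e->reset_counter, ok ? 0u : WEDGED_SKETCH);
}

/* The invariant from v5: wedged must imply reset-in-progress. */
static inline void check_invariant_sketch(void)
{
	assert((WEDGED_SKETCH & RESET_IN_PROGRESS_FLAG_SKETCH) != 0);
}
```

With this encoding a waiter that only checks the in-progress bit still
notices the terminal state, which is why wait paths have to test for
the wedged value explicitly instead of waiting for the bit to clear.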
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f2c0016..4562c54 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -862,8 +862,10 @@
*/
static void i915_error_work_func(struct work_struct *work)
{
- drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
- gpu_error.work);
+ struct i915_gpu_error *error = container_of(work, struct i915_gpu_error,
+ work);
+ drm_i915_private_t *dev_priv = container_of(error, drm_i915_private_t,
+ gpu_error);
struct drm_device *dev = dev_priv->dev;
char *error_event[] = { "ERROR=1", NULL };
char *reset_event[] = { "RESET=1", NULL };
@@ -871,14 +873,18 @@
kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
- if (atomic_read(&dev_priv->gpu_error.wedged)) {
+ if (i915_reset_in_progress(error)) {
DRM_DEBUG_DRIVER("resetting chip\n");
kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, reset_event);
+
if (!i915_reset(dev)) {
- atomic_set(&dev_priv->gpu_error.wedged, 0);
+ atomic_set(&error->reset_counter, 0);
kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, reset_done_event);
+ } else {
+ atomic_set(&error->reset_counter, I915_WEDGED);
}
- complete_all(&dev_priv->gpu_error.completion);
+
+ wake_up_all(&dev_priv->gpu_error.reset_queue);
}
}
@@ -1482,11 +1488,12 @@
i915_report_and_clear_eir(dev);
if (wedged) {
- INIT_COMPLETION(dev_priv->gpu_error.completion);
- atomic_set(&dev_priv->gpu_error.wedged, 1);
+ atomic_set(&dev_priv->gpu_error.reset_counter,
+ I915_RESET_IN_PROGRESS_FLAG);
/*
- * Wakeup waiting processes so they don't hang
+ * Wakeup waiting processes so that the reset work item
+ * doesn't deadlock trying to grab various locks.
*/
for_each_ring(ring, dev_priv, i)
wake_up_all(&ring->irq_queue);