drm/i915: Perform object clflushing asynchronously
Flushing the cachelines for an object is slow, can be as much as 100ms
for a large framebuffer. We currently do this under the struct_mutex BKL
on execution or on pageflip. But now with the ability to add fences to
obj->resv for both flips and execbuf (and we naturally wait on the fence
before CPU access), we can move the clflush operation to a workqueue and
signal a fence for completion, thereby doing the work asynchronously and
not blocking the driver or its clients.
v2: Introduce i915_gem_clflush.h and use a new name, split out some
extras into separate patches.
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-5-chris@chris-wilson.co.uk
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 6fb2983..35d2cb9 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -35,6 +35,7 @@
#include <drm/i915_drm.h>
#include "i915_drv.h"
+#include "i915_gem_clflush.h"
#include "i915_trace.h"
#include "intel_drv.h"
#include "intel_frontbuffer.h"
@@ -1114,13 +1115,15 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
if (vma->exec_entry->flags & EXEC_OBJECT_ASYNC)
continue;
+ if (obj->base.write_domain & I915_GEM_DOMAIN_CPU) {
+ i915_gem_clflush_object(obj, 0);
+ obj->base.write_domain = 0;
+ }
+
ret = i915_gem_request_await_object
(req, obj, obj->base.pending_write_domain);
if (ret)
return ret;
-
- if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
- i915_gem_clflush_object(obj, false);
}
/* Unconditionally flush any chipset caches (for streaming writes). */