drm/i915: Wait for writes through the GTT to land before reading back If we quickly switch from writing through the GTT to a read of the physical page directly with the CPU (e.g. performing relocations through the GTT and then running the command parser), we can observe that the writes are not visible to the CPU. It is not a coherency problem, as extensive investigations with clflush have demonstrated, but a mere timing issue - we have to wait for the GTT to complete it's write before we start our read from the CPU. The issue can be illustrated in userspace with: gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE); cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE); gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT); for (i = 0; i < OBJECT_SIZE / 64; i++) { int x = 16*i + (i%16); gtt[x] = i; clflush(&cpu[x], sizeof(cpu[x])); assert(cpu[x] == i); } Experimenting with that shows that this behaviour is indeed limited to recent Atom-class hardware. Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk

commit: 3b5724d702ef24ee41ca008a1fab1cf94f3d31b5 [log] [tgz]
author: Chris Wilson <chris@chris-wilson.co.uk> Thu Aug 18 17:16:49 2016 +0100
committer: Chris Wilson <chris@chris-wilson.co.uk> Thu Aug 18 22:36:45 2016 +0100
tree: 5fa3688f162b6aabd1413702b9bebdcc45d2530e
parent: a314d5cb4ac3722b9a673656e2499f4d92ee5e6f [diff] [blame]
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c3f1a98..4c9ecab 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c

@@ -3173,20 +3173,30 @@
 static void
 i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 {
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	uint32_t old_write_domain;
 
 	if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
 		return;
 
 	/* No actual flushing is required for the GTT write domain.  Writes
-	 * to it immediately go to main memory as far as we know, so there's
+	 * to it "immediately" go to main memory as far as we know, so there's
 	 * no chipset flush.  It also doesn't land in render cache.
 	 *
 	 * However, we do have to enforce the order so that all writes through
 	 * the GTT land before any writes to the device, such as updates to
 	 * the GATT itself.
+	 *
+	 * We also have to wait a bit for the writes to land from the GTT.
+	 * An uncached read (i.e. mmio) seems to be ideal for the round-trip
+	 * timing. This issue has only been observed when switching quickly
+	 * between GTT writes and CPU reads from inside the kernel on recent hw,
+	 * and it appears to only affect discrete GTT blocks (i.e. on LLC
+	 * system agents we cannot reproduce this behaviour).
 	 */
 	wmb();
+	if (INTEL_GEN(dev_priv) >= 6 && !HAS_LLC(dev_priv))
+		POSTING_READ(RING_ACTHD(dev_priv->engine[RCS].mmio_base));
 
 	old_write_domain = obj->base.write_domain;
 	obj->base.write_domain = 0;
commit	3b5724d702ef24ee41ca008a1fab1cf94f3d31b5	[log] [tgz]
author	Chris Wilson <chris@chris-wilson.co.uk>	Thu Aug 18 17:16:49 2016 +0100
committer	Chris Wilson <chris@chris-wilson.co.uk>	Thu Aug 18 22:36:45 2016 +0100
tree	5fa3688f162b6aabd1413702b9bebdcc45d2530e
parent	a314d5cb4ac3722b9a673656e2499f4d92ee5e6f [diff] [blame]