drm/i915: Use simplest form for flushing the single cacheline in the HWS

Rather than call a function to compute the matching cachelines and
clflush them, just call the clflush *instruction* directly. We also know
that we can use the unpatched plain clflush rather than the clflushopt
alternative.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-4-git-send-email-chris@chris-wilson.co.uk
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 29c54cc..9d7b7bf 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -385,8 +385,9 @@
 static inline void
 intel_flush_status_page(struct intel_engine_cs *engine, int reg)
 {
-	drm_clflush_virt_range(&engine->status_page.page_addr[reg],
-			       sizeof(uint32_t));
+	mb();
+	clflush(&engine->status_page.page_addr[reg]);
+	mb();
 }
 
 static inline u32