drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()

As the clflush operates on cache lines, and we can flush any byte
address, in order to flush all bytes given in the range we issue an
extra clflush on the last byte to ensure the last cacheline is flushed.
We can can the iteration to be over the actual cache lines to avoid this
double clflush on the last byte.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 9a62d7a..6743ff7 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -130,11 +130,12 @@
 {
 #if defined(CONFIG_X86)
 	if (cpu_has_clflush) {
+		const int size = boot_cpu_data.x86_clflush_size;
 		void *end = addr + length;
+		addr = (void *)(((unsigned long)addr) & -size);
 		mb();
-		for (; addr < end; addr += boot_cpu_data.x86_clflush_size)
+		for (; addr < end; addr += size)
 			clflushopt(addr);
-		clflushopt(end - 1);
 		mb();
 		return;
 	}