ndelay(): switch to C function to avoid 64-bit division

We should be able to do ndelay(some_u64), but on 32-bit architectures
that can cause a call to the libgcc helper __divdi3() to be emitted,
because the ndelay() macro does a divide.
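
For illustration, a minimal sketch of the failure mode with the old
macro, assuming a 32-bit target (the caller is hypothetical; the macro
body is the one removed below):

    #define ndelay(x)	udelay(((x)+999)/1000)

    void poll_device(u64 nsecs)	/* hypothetical caller */
    {
    	/*
    	 * (nsecs + 999) / 1000 is a 64-bit division, so gcc emits a
    	 * call to the libgcc helper __divdi3(), which 32-bit kernel
    	 * builds do not link in (hence the m68k build breakage).
    	 */
    	ndelay(nsecs);
    }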

Fix it by switching to a static inline function, which forces the u64
arg to be implicitly converted to the unsigned long that udelay()
takes.  The `#define ndelay(x) ndelay(x)` line keeps ndelay visible to
the preprocessor, so later `#ifdef ndelay` tests still work.
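
With the inline version the conversion happens at the call boundary
instead; a sketch with the same hypothetical caller:

    void poll_device(u64 nsecs)
    {
    	/*
    	 * nsecs is implicitly converted to unsigned long by the
    	 * function call, so DIV_ROUND_UP(x, 1000) divides a native
    	 * word and no libgcc helper is needed.
    	 */
    	ndelay(nsecs);
    }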

[bunk@kernel.org: reported m68k build breakage]
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Martin Michlmayr <tbm@cyrius.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/include/linux/delay.h b/include/linux/delay.h
index 17ddb55..54552d2 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -7,6 +7,8 @@
  * Delay routines, using a pre-computed "loops_per_jiffy" value.
  */
 
+#include <linux/kernel.h>
+
 extern unsigned long loops_per_jiffy;
 
 #include <asm/delay.h>
@@ -32,7 +34,11 @@
 #endif
 
 #ifndef ndelay
-#define ndelay(x)	udelay(((x)+999)/1000)
+static inline void ndelay(unsigned long x)
+{
+	udelay(DIV_ROUND_UP(x, 1000));
+}
+#define ndelay(x) ndelay(x)
 #endif
 
 void calibrate_delay(void);