Use num_possible_cpus() instead of NR_CPUS for timer distribution

To avoid lock contention, we distribute the sched_timer calls across the
cpus so they do not trigger at the same instant.  However, I used NR_CPUS,
which can cause needless grouping on small smp systems depending on your
kernel config.  This patch converts to using num_possible_cpus() so we
spread it as evenly as possible on every machine.

Briefly tested w/ NR_CPUS=255 and verified reduced contention.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8c3fef1..ce89ffb 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -570,7 +570,7 @@
 	/* Get the next period (per cpu) */
 	ts->sched_timer.expires = tick_init_jiffy_update();
 	offset = ktime_to_ns(tick_period) >> 1;
-	do_div(offset, NR_CPUS);
+	do_div(offset, num_possible_cpus());
 	offset *= smp_processor_id();
 	ts->sched_timer.expires = ktime_add_ns(ts->sched_timer.expires, offset);