perf bench: Improve builtin-bench.c for more friendly output

This patch makes the output of perf bench friendlier.
Currently, perf bench keeps the user waiting with no output
and then prints everything at once when the benchmark finishes,
which may confuse users.

With this patch, perf bench announces which benchmark it is
running as soon as it is invoked:

 | % perf bench sched messaging
 | # Running sched/messaging benchmark...  <- printed right after invocation
 | # 20 sender and receiver processes per group
 | # 10 groups == 400 processes run
 |
 |      Total time: 1.476 [sec]
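
For illustration, here is a minimal, self-contained sketch of the
lookup-then-announce flow that the hunk below adds. Only the field
names used in the patch (name, suites, fn) and the bench_format /
BENCH_FORMAT_DEFAULT check come from it. The struct layouts, the
fake_messaging() benchmark and run_bench() are hypothetical stand-ins,
not perf's actual definitions. The fflush(stdout) call is also an
assumption of this sketch rather than part of the patch: stdout is
fully buffered when it is not a terminal, so without a flush the
header could still show up late when output is piped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the tables in builtin-bench.c. */
enum bench_format { BENCH_FORMAT_DEFAULT, BENCH_FORMAT_SIMPLE };

struct bench_suite {
	const char *name;
	int (*fn)(int argc, const char **argv, const char *prefix);
};

struct bench_subsys {
	const char *name;
	const struct bench_suite *suites; /* terminated by a NULL name */
};

static enum bench_format bench_format = BENCH_FORMAT_DEFAULT;

/* Hypothetical benchmark body: does no real work, just reports. */
static int fake_messaging(int argc, const char **argv, const char *prefix)
{
	(void)argc;
	(void)argv;
	(void)prefix;
	printf("\n     Total time: 0.000 [sec]\n");
	return 0;
}

static const struct bench_suite sched_suites[] = {
	{ "messaging", fake_messaging },
	{ NULL, NULL },
};

static const struct bench_subsys subsystems[] = {
	{ "sched", sched_suites },
	{ NULL, NULL },
};

/* Look up subsys/suite by name, announce it, then run it.
 * Returns the benchmark's status, or -1 if no such benchmark exists. */
static int run_bench(const char *subsys_name, const char *suite_name,
		     int argc, const char **argv, const char *prefix)
{
	assert(subsys_name != NULL && suite_name != NULL);

	for (int i = 0; subsystems[i].name; i++) {
		if (strcmp(subsystems[i].name, subsys_name))
			continue;
		for (int j = 0; subsystems[i].suites[j].name; j++) {
			const struct bench_suite *s = &subsystems[i].suites[j];

			if (strcmp(s->name, suite_name))
				continue;
			if (bench_format == BENCH_FORMAT_DEFAULT) {
				/* Print the header before the (possibly long) run. */
				printf("# Running %s/%s benchmark...\n",
				       subsystems[i].name, s->name);
				/* Assumption of this sketch: flush so the header is
				 * visible immediately even when stdout is a pipe. */
				fflush(stdout);
			}
			return s->fn(argc, argv, prefix);
		}
	}
	return -1;
}
```

Calling run_bench("sched", "messaging", 0, NULL, NULL) prints the
"# Running sched/messaging benchmark..." line before the benchmark
body produces any output, which is the ordering this patch is after.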

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1257865442-20252-2-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index c7505ea..90c39ba 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -156,6 +156,10 @@
 			if (strcmp(subsystems[i].suites[j].name, argv[1]))
 				continue;
 
+			if (bench_format == BENCH_FORMAT_DEFAULT)
+				printf("# Running %s/%s benchmark...\n",
+				       subsystems[i].name,
+				       subsystems[i].suites[j].name);
 			status = subsystems[i].suites[j].fn(argc - 1,
 							    argv + 1, prefix);
 			goto end;