[PATCH] files: fix rcu initializers

First of a number of files_lock scaability patches.

 Here are the x86 numbers -

 tiobench on a 4(8)-way (HT) P4 system on ramdisk :

                                         (lockfree)
 Test            2.6.10-vanilla  Stdev   2.6.10-fd       Stdev
 -------------------------------------------------------------
 Seqread         1400.8          11.52   1465.4          34.27
 Randread        1594            8.86    2397.2          29.21
 Seqwrite        242.72          3.47    238.46          6.53
 Randwrite       445.74          9.15    446.4           9.75

 The performance improvement is very significant.
 We are getting killed by the cacheline bouncing of the files_struct
 lock here. Writes on ramdisk (ext2) seems to vary just too
 much to get any meaningful number.

 Also, With Tridge's thread_perf test on a 4(8)-way (HT) P4 xeon system :

 2.6.12-rc5-vanilla :

 Running test 'readwrite' with 8 tasks
 Threads     0.34 +/- 0.01 seconds
 Processes   0.16 +/- 0.00 seconds

 2.6.12-rc5-fd :

 Running test 'readwrite' with 8 tasks
 Threads     0.17 +/- 0.02 seconds
 Processes   0.17 +/- 0.02 seconds

 I repeated the measurements on ramfs (as opposed to ext2 on ramdisk in
 the earlier measurement) and I got more consistent results from tiobench :

 4(8) way xeon P4
 -----------------
                                         (lock-free)
 Test            2.6.12-rc5      Stdev   2.6.12-rc5-fd   Stdev
 -------------------------------------------------------------
 Seqread         1282            18.59   1343.6          26.37
 Randread        1517            7       2415            34.27
 Seqwrite        702.2           5.27    709.46           5.9
 Randwrite       846.86          15.15   919.68          21.4

 4-way ppc64
 ------------
                                         (lock-free)
 Test            2.6.12-rc5      Stdev   2.6.12-rc5-fd   Stdev
 -------------------------------------------------------------
 Seqread         1549            91.16   1569.6          47.2
 Randread        1473.6          25.11   1585.4          69.99
 Seqwrite        1096.8          20.03   1136            29.61
 Randwrite       1189.6           4.04   1275.2          32.96

 Also running Tridge's thread_perf test on ppc64 :

 2.6.12-rc5-vanilla
 --------------------
 Running test 'readwrite' with 4 tasks
 Threads     0.20 +/- 0.02 seconds
 Processes   0.16 +/- 0.01 seconds

 2.6.12-rc5-fd
 --------------------
 Running test 'readwrite' with 4 tasks
 Threads     0.18 +/- 0.04 seconds
 Processes   0.16 +/- 0.01 seconds

 The benefits are huge (upto ~60%) in some cases on x86 primarily
 due to the atomic operations during acquisition of ->file_lock
 and cache line bouncing in fast path. ppc64 benefits are modest
 due to LL/SC based locking, but still statistically significant.

This patch:

RCU head initilizer no longer needs the head varible name since we don't use
list.h lists anymore.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1 file changed