Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 1 | Generic Mutex Subsystem |
| 2 | |
| 3 | started by Ingo Molnar <mingo@redhat.com> |
| 4 | |
| 5 | "Why on earth do we need a new mutex subsystem, and what's wrong |
| 6 | with semaphores?" |
| 7 | |
| 8 | firstly, there's nothing wrong with semaphores. But if the simpler |
| 9 | mutex semantics are sufficient for your code, then there are a couple |
| 10 | of advantages of mutexes: |
| 11 | |
| 12 | - 'struct mutex' is smaller on most architectures: .e.g on x86, |
| 13 | 'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes. |
| 14 | A smaller structure size means less RAM footprint, and better |
| 15 | CPU-cache utilization. |
| 16 | |
| 17 | - tighter code. On x86 i get the following .text sizes when |
| 18 | switching all mutex-alike semaphores in the kernel to the mutex |
| 19 | subsystem: |
| 20 | |
| 21 | text data bss dec hex filename |
| 22 | 3280380 868188 396860 4545428 455b94 vmlinux-semaphore |
| 23 | 3255329 865296 396732 4517357 44eded vmlinux-mutex |
| 24 | |
| 25 | that's 25051 bytes of code saved, or a 0.76% win - off the hottest |
| 26 | codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%) |
| 27 | Smaller code means better icache footprint, which is one of the |
| 28 | major optimization goals in the Linux kernel currently. |
| 29 | |
| 30 | - the mutex subsystem is slightly faster and has better scalability for |
| 31 | contended workloads. On an 8-way x86 system, running a mutex-based |
| 32 | kernel and testing creat+unlink+close (of separate, per-task files) |
| 33 | in /tmp with 16 parallel tasks, the average number of ops/sec is: |
| 34 | |
| 35 | Semaphores: Mutexes: |
| 36 | |
| 37 | $ ./test-mutex V 16 10 $ ./test-mutex V 16 10 |
| 38 | 8 CPUs, running 16 tasks. 8 CPUs, running 16 tasks. |
| 39 | checking VFS performance. checking VFS performance. |
| 40 | avg loops/sec: 34713 avg loops/sec: 84153 |
| 41 | CPU utilization: 63% CPU utilization: 22% |
| 42 | |
| 43 | i.e. in this workload, the mutex based kernel was 2.4 times faster |
| 44 | than the semaphore based kernel, _and_ it also had 2.8 times less CPU |
| 45 | utilization. (In terms of 'ops per CPU cycle', the semaphore kernel |
| 46 | performed 551 ops/sec per 1% of CPU time used, while the mutex kernel |
| 47 | performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times |
| 48 | more efficient.) |
| 49 | |
| 50 | the scalability difference is visible even on a 2-way P4 HT box: |
| 51 | |
| 52 | Semaphores: Mutexes: |
| 53 | |
| 54 | $ ./test-mutex V 16 10 $ ./test-mutex V 16 10 |
| 55 | 4 CPUs, running 16 tasks. 8 CPUs, running 16 tasks. |
| 56 | checking VFS performance. checking VFS performance. |
| 57 | avg loops/sec: 127659 avg loops/sec: 181082 |
| 58 | CPU utilization: 100% CPU utilization: 34% |
| 59 | |
| 60 | (the straight performance advantage of mutexes is 41%, the per-cycle |
| 61 | efficiency of mutexes is 4.1 times better.) |
| 62 | |
| 63 | - there are no fastpath tradeoffs, the mutex fastpath is just as tight |
| 64 | as the semaphore fastpath. On x86, the locking fastpath is 2 |
| 65 | instructions: |
| 66 | |
| 67 | c0377ccb <mutex_lock>: |
| 68 | c0377ccb: f0 ff 08 lock decl (%eax) |
Denys Vlasenko | 75ddb0e | 2010-02-20 01:03:48 +0100 | [diff] [blame] | 69 | c0377cce: 78 0e js c0377cde <.text..lock.mutex> |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 70 | c0377cd0: c3 ret |
| 71 | |
| 72 | the unlocking fastpath is equally tight: |
| 73 | |
| 74 | c0377cd1 <mutex_unlock>: |
| 75 | c0377cd1: f0 ff 00 lock incl (%eax) |
Denys Vlasenko | 75ddb0e | 2010-02-20 01:03:48 +0100 | [diff] [blame] | 76 | c0377cd4: 7e 0f jle c0377ce5 <.text..lock.mutex+0x7> |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 77 | c0377cd6: c3 ret |
| 78 | |
| 79 | - 'struct mutex' semantics are well-defined and are enforced if |
| 80 | CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have |
| 81 | virtually no debugging code or instrumentation. The mutex subsystem |
| 82 | checks and enforces the following rules: |
| 83 | |
| 84 | * - only one task can hold the mutex at a time |
| 85 | * - only the owner can unlock the mutex |
| 86 | * - multiple unlocks are not permitted |
| 87 | * - recursive locking is not permitted |
| 88 | * - a mutex object must be initialized via the API |
| 89 | * - a mutex object must not be initialized via memset or copying |
| 90 | * - task may not exit with mutex held |
| 91 | * - memory areas where held locks reside must not be freed |
| 92 | * - held mutexes must not be reinitialized |
Matti Linnanvuori | f20fda4 | 2007-10-16 23:29:41 -0700 | [diff] [blame] | 93 | * - mutexes may not be used in hardware or software interrupt |
| 94 | * contexts such as tasklets and timers |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 95 | |
| 96 | furthermore, there are also convenience features in the debugging |
| 97 | code: |
| 98 | |
| 99 | * - uses symbolic names of mutexes, whenever they are printed in debug output |
| 100 | * - point-of-acquire tracking, symbolic lookup of function names |
| 101 | * - list of all locks held in the system, printout of them |
| 102 | * - owner tracking |
| 103 | * - detects self-recursing locks and prints out all relevant info |
| 104 | * - detects multi-task circular deadlocks and prints out all affected |
| 105 | * locks and tasks (and only those tasks) |
| 106 | |
| 107 | Disadvantages |
| 108 | ------------- |
| 109 | |
| 110 | The stricter mutex API means you cannot use mutexes the same way you |
| 111 | can use semaphores: e.g. they cannot be used from an interrupt context, |
| 112 | nor can they be unlocked from a different context that which acquired |
| 113 | it. [ I'm not aware of any other (e.g. performance) disadvantages from |
| 114 | using mutexes at the moment, please let me know if you find any. ] |
| 115 | |
| 116 | Implementation of mutexes |
| 117 | ------------------------- |
| 118 | |
| 119 | 'struct mutex' is the new mutex type, defined in include/linux/mutex.h |
| 120 | and implemented in kernel/mutex.c. It is a counter-based mutex with a |
| 121 | spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", |
| 122 | 0 for "locked" and negative numbers (usually -1) for "locked, potential |
| 123 | waiters queued". |
| 124 | |
| 125 | the APIs of 'struct mutex' have been streamlined: |
| 126 | |
| 127 | DEFINE_MUTEX(name); |
| 128 | |
| 129 | mutex_init(mutex); |
| 130 | |
| 131 | void mutex_lock(struct mutex *lock); |
| 132 | int mutex_lock_interruptible(struct mutex *lock); |
| 133 | int mutex_trylock(struct mutex *lock); |
| 134 | void mutex_unlock(struct mutex *lock); |
| 135 | int mutex_is_locked(struct mutex *lock); |
Robert P. J. Day | 343b901 | 2007-10-20 00:15:26 +0200 | [diff] [blame] | 136 | void mutex_lock_nested(struct mutex *lock, unsigned int subclass); |
| 137 | int mutex_lock_interruptible_nested(struct mutex *lock, |
| 138 | unsigned int subclass); |