Btrfs: switch the btrfs tree locks to reader/writer

The btrfs metadata btree is the source of significant
lock contention, especially in the root node. This
commit changes our locking to use a reader/writer
lock.

The lock is built on top of rw spinlocks, and it
extends the lock tracking to remember if we have a
read lock or a write lock when we go to blocking. Atomics
count the number of blocking readers or writers at any
given time.

It removes all of the adaptive spinning from the old code
and uses only the spinning/blocking hints inside of btrfs
to decide when it should continue spinning.

In read heavy workloads this is dramatically faster. In write
heavy workloads we're still faster because of less contention
on the root node lock.

We suffer slightly in dbench because we schedule more often
during write locks, but all other benchmarks so far are improved.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
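
The spinning-to-blocking handoff described in the message can be modeled in userspace. The sketch below is an illustration under stated assumptions, not the kernel implementation: the `eb_model` struct and every `eb_*` function name are hypothetical, a C11 atomic integer stands in for the kernel `rwlock_t`, only try-lock variants are shown so the model never sleeps, and the wait queues are omitted. It shows how a holder that goes to blocking records itself in an atomic counter before dropping the spinlock, and why a new locker has to recheck those counters after it acquires the spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not the kernel code.  `spin` stands in
 * for the rwlock_t: 0 = free, -1 = one spinning writer, n > 0 = n
 * spinning readers.
 */
struct eb_model {
	atomic_int spin;
	atomic_int blocking_writers;
	atomic_int blocking_readers;
};

static void eb_model_init(struct eb_model *eb)
{
	atomic_init(&eb->spin, 0);
	atomic_init(&eb->blocking_writers, 0);
	atomic_init(&eb->blocking_readers, 0);
}

/* Readers are excluded by any writer, spinning or blocking. */
static bool eb_try_read_lock(struct eb_model *eb)
{
	int cur;

	if (atomic_load(&eb->blocking_writers) != 0)
		return false;
	cur = atomic_load(&eb->spin);
	while (cur >= 0) {
		if (!atomic_compare_exchange_weak(&eb->spin, &cur, cur + 1))
			continue;	/* cur was reloaded, retry */
		/* a writer may have gone blocking just before we got in */
		if (atomic_load(&eb->blocking_writers) != 0) {
			atomic_fetch_sub(&eb->spin, 1);
			return false;
		}
		return true;
	}
	return false;
}

/* Writers are excluded by every other holder, spinning or blocking. */
static bool eb_try_write_lock(struct eb_model *eb)
{
	int expected = 0;

	if (atomic_load(&eb->blocking_readers) != 0 ||
	    atomic_load(&eb->blocking_writers) != 0)
		return false;
	if (!atomic_compare_exchange_strong(&eb->spin, &expected, -1))
		return false;
	/* recheck: a holder may have gone blocking before we got the lock */
	if (atomic_load(&eb->blocking_readers) != 0 ||
	    atomic_load(&eb->blocking_writers) != 0) {
		atomic_store(&eb->spin, 0);
		return false;
	}
	return true;
}

/* Going to blocking: record the holder in a counter, drop the spinlock. */
static void eb_set_write_blocking(struct eb_model *eb)
{
	assert(atomic_load(&eb->spin) == -1);
	atomic_fetch_add(&eb->blocking_writers, 1);
	atomic_store(&eb->spin, 0);
}

static void eb_set_read_blocking(struct eb_model *eb)
{
	assert(atomic_load(&eb->spin) > 0);
	atomic_fetch_add(&eb->blocking_readers, 1);
	atomic_fetch_sub(&eb->spin, 1);
}

static void eb_read_unlock(struct eb_model *eb)
{
	assert(atomic_load(&eb->spin) > 0);
	atomic_fetch_sub(&eb->spin, 1);
}

/* The kernel version would also wake waiters on the wait queues here. */
static void eb_write_unlock_blocking(struct eb_model *eb)
{
	assert(atomic_load(&eb->blocking_writers) == 1);
	atomic_fetch_sub(&eb->blocking_writers, 1);
}

static void eb_read_unlock_blocking(struct eb_model *eb)
{
	assert(atomic_load(&eb->blocking_readers) > 0);
	atomic_fetch_sub(&eb->blocking_readers, 1);
}
```

In this model a writer that calls `eb_set_write_blocking()` still excludes readers even though the spinlock is free again, and a single blocking reader is enough to make `eb_try_write_lock()` fail. In the kernel, a failed attempt would instead sleep on the matching wait queue until the counter drops to zero.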

commit: bd681513fa6f2ff29aa391f01e413a2d1c59fd77
author: Chris Mason <chris.mason@oracle.com> Sat Jul 16 15:23:14 2011 -0400
committer: Chris Mason <chris.mason@oracle.com> Wed Jul 27 12:46:46 2011 -0400
tree: bb10ec6ef876b4d7a553cbe54976ec49a0d10b21
parent: 81317fdeddcef259b6ecf7b5c0d04caa167c6b54
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index b5f120c..21a7ca9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -128,14 +128,26 @@
 	struct rcu_head rcu_head;
 	atomic_t refs;
 
-	/* the spinlock is used to protect most operations */
-	spinlock_t lock;
+	/* count of read lock holders on the extent buffer */
+	atomic_t write_locks;
+	atomic_t read_locks;
+	atomic_t blocking_writers;
+	atomic_t blocking_readers;
+	atomic_t spinning_readers;
+	atomic_t spinning_writers;
 
-	/*
-	 * when we keep the lock held while blocking, waiters go onto
-	 * the wq
+	/* protects write locks */
+	rwlock_t lock;
+
+	/* readers use lock_wq while they wait for the write
+	 * lock holders to unlock
 	 */
-	wait_queue_head_t lock_wq;
+	wait_queue_head_t write_lock_wq;
+
+	/* writers use read_lock_wq while they wait for readers
+	 * to unlock
+	 */
+	wait_queue_head_t read_lock_wq;
 };
 
 static inline void extent_set_compress_type(unsigned long *bio_flags,