Btrfs: Reduce contention on the root node

This calls unlock_up sooner in btrfs_search_slot in order to decrease the
amount of work done with the higher level tree locks held.

Also, it changes btrfs_tree_lock to spin for a bit on the page lock (up to
512 retries) before falling back to lock_page and scheduling.  This makes a
big difference in context switch rate under highly contended workloads.

Longer term, a better locking structure is needed than the page lock.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 80813a3..058a506 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -27,6 +27,16 @@
 
 int btrfs_tree_lock(struct extent_buffer *eb)
 {
+	int i;
+
+	if (!TestSetPageLocked(eb->first_page))
+		return 0;
+	for (i = 0; i < 512; i++) {
+		cpu_relax();
+		if (!TestSetPageLocked(eb->first_page))
+			return 0;
+	}
+	cpu_relax();
 	lock_page(eb->first_page);
 	return 0;
 }