ext4: attempt to fix race in bigalloc code path Currently, there exists a race between delayed allocated writes and the writeback when bigalloc feature is in use. The race was because we wanted to determine what blocks in a cluster are under delayed allocation and we were using buffer_delayed(bh) check for it. But, the writeback codepath clears this bit without any synchronization which resulted in a race and an ext4 warning similar to: EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0 reserved data blocks The race existed in two places. (1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from writeback code path. (2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where buffer_delayed(bh) is set. To fix (1), this patch introduces a new buffer_head state bit - BH_Da_Mapped. This bit is set under the protection of EXT4_I(inode)->i_data_sem when we have actually mapped the delayed allocated blocks during the writeout time. We can now reliably check for this bit inside ext4_find_delalloc_range() to determine whether the reservation for the blocks have already been claimed or not. To fix (2), it was necessary to set buffer_delay(bh) under the protection of i_data_sem. So, I extracted the very beginning of ext4_map_blocks into a new function - ext4_da_map_blocks() - and performed the required setting of bh_delay bit and the quota reservation under the protection of i_data_sem. These two fixes makes the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent, thus removing the race. Tested: I was able to reproduce the problem by running 'dd' and 'fsync' in parallel. Also, xfstests sometimes used to reproduce this race. After the fix both my test and xfstests were successful and no race (warning message) was observed. Google-Bug-Id: 4997027 Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

commit: 5356f2615cd558c57a1f7d7528d1ad4de3640d96 [log] [tgz]
author: Aditya Kali <adityakali@google.com> Fri Sep 09 19:20:51 2011 -0400
committer: Theodore Ts'o <tytso@mit.edu> Fri Sep 09 19:20:51 2011 -0400
tree: e3590bf14d9a21c4eb365105886382bfb1131b95
parent: d8990240d8c911064447f8aa5a440f9345a6d692 [diff] [blame]
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 9b11930..ad39627c 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c

@@ -3296,28 +3296,9 @@
 
 	while ((i >= lblk_start) && (i <= lblk_end)) {
 		page = find_get_page(mapping, index);
-		if (!page || !PageDirty(page))
+		if (!page)
 			goto nextpage;
 
-		if (PageWriteback(page)) {
-			/*
-			 * This might be a race with allocation and writeout. In
-			 * this case we just assume that the rest of the range
-			 * will eventually be written and there wont be any
-			 * delalloc blocks left.
-			 * TODO: the above assumption is troublesome, but might
-			 * work better in practice. other option could be note
-			 * somewhere that the cluster is getting written out and
-			 * detect that here.
-			 */
-			page_cache_release(page);
-			trace_ext4_find_delalloc_range(inode,
-					lblk_start, lblk_end,
-					search_hint_reverse,
-					0, i);
-			return 0;
-		}
-
 		if (!page_has_buffers(page))
 			goto nextpage;
 
@@ -3340,7 +3321,11 @@
 				continue;
 			}
 
-			if (buffer_delay(bh)) {
+			/* Check if the buffer is delayed allocated and that it
+			 * is not yet mapped. (when da-buffers are mapped during
+			 * their writeout, their da_mapped bit is set.)
+			 */
+			if (buffer_delay(bh) && !buffer_da_mapped(bh)) {
 				page_cache_release(page);
 				trace_ext4_find_delalloc_range(inode,
 						lblk_start, lblk_end,
@@ -4106,6 +4091,7 @@
 			ext4_da_update_reserve_space(inode, allocated_clusters,
 							1);
 			if (reserved_clusters < allocated_clusters) {
+				struct ext4_inode_info *ei = EXT4_I(inode);
 				int reservation = allocated_clusters -
 						  reserved_clusters;
 				/*
@@ -4148,11 +4134,11 @@
 				 *   remaining blocks finally gets written, we
 				 *   could claim them.
 				 */
-				while (reservation) {
-					ext4_da_reserve_space(inode,
-							      map->m_lblk);
-					reservation--;
-				}
+				dquot_reserve_block(inode,
+						EXT4_C2B(sbi, reservation));
+				spin_lock(&ei->i_block_reservation_lock);
+				ei->i_reserved_data_blocks += reservation;
+				spin_unlock(&ei->i_block_reservation_lock);
 			}
 		}
 	}
commit	5356f2615cd558c57a1f7d7528d1ad4de3640d96	[log] [tgz]
author	Aditya Kali <adityakali@google.com>	Fri Sep 09 19:20:51 2011 -0400
committer	Theodore Ts'o <tytso@mit.edu>	Fri Sep 09 19:20:51 2011 -0400
tree	e3590bf14d9a21c4eb365105886382bfb1131b95
parent	d8990240d8c911064447f8aa5a440f9345a6d692 [diff] [blame]