jbd2: issue cache flush after checkpointing even with internal journal
When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
checkpointed buffers are on a stable storage - especially if buffers were
written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
caches. Thus when we update journal superblock effectively removing old
transaction from journal, this write of superblock can get to stable storage
before those checkpointed buffers which can result in filesystem corruption
after a crash. Thus we must unconditionally issue a cache flush before we
update journal superblock in these cases.
A similar problem can also occur if journal superblock is written only in
disk's caches, other transaction starts reusing space of the transaction
cleaned from the log and power failure happens. Subsequent journal replay would
still try to replay the old transaction but some of it's blocks may be already
overwritten by the new transaction. For this reason we must use WRITE_FUA when
updating log tail and we must first write new log tail to disk and update
in-memory information only after that.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 19dcd0b..7f7ee5b 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -478,79 +478,28 @@
int jbd2_cleanup_journal_tail(journal_t *journal)
{
- transaction_t * transaction;
tid_t first_tid;
- unsigned long blocknr, freed;
+ unsigned long blocknr;
if (is_journal_aborted(journal))
return 1;
- /* OK, work out the oldest transaction remaining in the log, and
- * the log block it starts at.
- *
- * If the log is now empty, we need to work out which is the
- * next transaction ID we will write, and where it will
- * start. */
-
- write_lock(&journal->j_state_lock);
- spin_lock(&journal->j_list_lock);
- transaction = journal->j_checkpoint_transactions;
- if (transaction) {
- first_tid = transaction->t_tid;
- blocknr = transaction->t_log_start;
- } else if ((transaction = journal->j_committing_transaction) != NULL) {
- first_tid = transaction->t_tid;
- blocknr = transaction->t_log_start;
- } else if ((transaction = journal->j_running_transaction) != NULL) {
- first_tid = transaction->t_tid;
- blocknr = journal->j_head;
- } else {
- first_tid = journal->j_transaction_sequence;
- blocknr = journal->j_head;
- }
- spin_unlock(&journal->j_list_lock);
+ if (!jbd2_journal_get_log_tail(journal, &first_tid, &blocknr))
+ return 1;
J_ASSERT(blocknr != 0);
- /* If the oldest pinned transaction is at the tail of the log
- already then there's not much we can do right now. */
- if (journal->j_tail_sequence == first_tid) {
- write_unlock(&journal->j_state_lock);
- return 1;
- }
-
- /* OK, update the superblock to recover the freed space.
- * Physical blocks come first: have we wrapped beyond the end of
- * the log? */
- freed = blocknr - journal->j_tail;
- if (blocknr < journal->j_tail)
- freed = freed + journal->j_last - journal->j_first;
-
- trace_jbd2_cleanup_journal_tail(journal, first_tid, blocknr, freed);
- jbd_debug(1,
- "Cleaning journal tail from %d to %d (offset %lu), "
- "freeing %lu\n",
- journal->j_tail_sequence, first_tid, blocknr, freed);
-
- journal->j_free += freed;
- journal->j_tail_sequence = first_tid;
- journal->j_tail = blocknr;
- write_unlock(&journal->j_state_lock);
-
/*
- * If there is an external journal, we need to make sure that
- * any data blocks that were recently written out --- perhaps
- * by jbd2_log_do_checkpoint() --- are flushed out before we
- * drop the transactions from the external journal. It's
- * unlikely this will be necessary, especially with a
- * appropriately sized journal, but we need this to guarantee
- * correctness. Fortunately jbd2_cleanup_journal_tail()
- * doesn't get called all that often.
+ * We need to make sure that any blocks that were recently written out
+ * --- perhaps by jbd2_log_do_checkpoint() --- are flushed out before
+ * we drop the transactions from the journal. It's unlikely this will
+ * be necessary, especially with an appropriately sized journal, but we
+ * need this to guarantee correctness. Fortunately
+ * jbd2_cleanup_journal_tail() doesn't get called all that often.
*/
- if ((journal->j_fs_dev != journal->j_dev) &&
- (journal->j_flags & JBD2_BARRIER))
+ if (journal->j_flags & JBD2_BARRIER)
blkdev_issue_flush(journal->j_fs_dev, GFP_KERNEL, NULL);
- if (!(journal->j_flags & JBD2_ABORT))
- jbd2_journal_update_sb_log_tail(journal);
+
+ __jbd2_update_log_tail(journal, first_tid, blocknr);
return 0;
}