xfs: log ticket reservation underestimates the number of iclogs
When allocation a ticket for a transaction, the ticket is initialised with the
worst case log space usage based on the number of bytes the transaction may
consume. Part of this calculation is the number of log headers required for the
iclog space used up by the transaction.
This calculation makes an undocumented assumption that if the transaction uses
the log header space reservation on an iclog, then it consumes either the
entire iclog or it completes. That is - the transaction that is first in an
iclog is the transaction that the log header reservation is accounted to. If
the transaction is larger than the iclog, then it will use the entire iclog
itself. Document this assumption.
Further, the current calculation uses the rule that we can fit iclog_size bytes
of transaction data into an iclog. This is in correct - the amount of space
available in an iclog for transaction data is the size of the iclog minus the
space used for log record headers. This means that the calculation is out by
512 bytes per 32k of log space the transaction can consume. This is rarely an
issue because maximally sized transactions are extremely uncommon, and for 4k
block size filesystems maximal transaction reservations are about 400kb. Hence
the error in this case is less than the size of an iclog, so that makes it even
harder to hit.
However, anyone using larger directory blocks (16k directory blocks push the
maximum transaction size to approx. 900k on a 4k block size filesystem) or
larger block size (e.g. 64k blocks push transactions to the 3-4MB size) could
see the error grow to more than an iclog and at this point the transaction is
guaranteed to get a reservation underrun and shutdown the filesystem.
Fix this by adjusting the calculation to calculate the correct number of iclogs
required and account for them all up front.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 8556c59..81323d7 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -664,7 +664,10 @@
/*
* Write region vectors to log. The write happens using the space reservation
* of the ticket (tic). It is not a requirement that all writes for a given
- * transaction occur with one call to xfs_log_write().
+ * transaction occur with one call to xfs_log_write(). However, it is important
+ * to note that the transaction reservation code makes an assumption about the
+ * number of log headers a transaction requires that may be violated if you
+ * don't pass all the transaction vectors in one call....
*/
int
xfs_log_write(
@@ -3170,14 +3173,16 @@
* Allocate and initialise a new log ticket.
*/
STATIC xlog_ticket_t *
-xlog_ticket_alloc(xlog_t *log,
- int unit_bytes,
- int cnt,
- char client,
- uint xflags)
+xlog_ticket_alloc(
+ struct log *log,
+ int unit_bytes,
+ int cnt,
+ char client,
+ uint xflags)
{
- xlog_ticket_t *tic;
+ struct xlog_ticket *tic;
uint num_headers;
+ int iclog_space;
tic = kmem_zone_zalloc(xfs_log_ticket_zone, KM_SLEEP|KM_MAYFAIL);
if (!tic)
@@ -3221,16 +3226,40 @@
/* for start-rec */
unit_bytes += sizeof(xlog_op_header_t);
- /* for LR headers */
- num_headers = ((unit_bytes + log->l_iclog_size-1) >> log->l_iclog_size_log);
+ /*
+ * for LR headers - the space for data in an iclog is the size minus
+ * the space used for the headers. If we use the iclog size, then we
+ * undercalculate the number of headers required.
+ *
+ * Furthermore - the addition of op headers for split-recs might
+ * increase the space required enough to require more log and op
+ * headers, so take that into account too.
+ *
+ * IMPORTANT: This reservation makes the assumption that if this
+ * transaction is the first in an iclog and hence has the LR headers
+ * accounted to it, then the remaining space in the iclog is
+ * exclusively for this transaction. i.e. if the transaction is larger
+ * than the iclog, it will be the only thing in that iclog.
+ * Fundamentally, this means we must pass the entire log vector to
+ * xlog_write to guarantee this.
+ */
+ iclog_space = log->l_iclog_size - log->l_iclog_hsize;
+ num_headers = howmany(unit_bytes, iclog_space);
+
+ /* for split-recs - ophdrs added when data split over LRs */
+ unit_bytes += sizeof(xlog_op_header_t) * num_headers;
+
+ /* add extra header reservations if we overrun */
+ while (!num_headers ||
+ howmany(unit_bytes, iclog_space) > num_headers) {
+ unit_bytes += sizeof(xlog_op_header_t);
+ num_headers++;
+ }
unit_bytes += log->l_iclog_hsize * num_headers;
/* for commit-rec LR header - note: padding will subsume the ophdr */
unit_bytes += log->l_iclog_hsize;
- /* for split-recs - ophdrs added when data split over LRs */
- unit_bytes += sizeof(xlog_op_header_t) * num_headers;
-
/* for roundoff padding for transaction data and one for commit record */
if (xfs_sb_version_haslogv2(&log->l_mp->m_sb) &&
log->l_mp->m_sb.sb_logsunit > 1) {
@@ -3252,7 +3281,7 @@
tic->t_trans_type = 0;
if (xflags & XFS_LOG_PERM_RESERV)
tic->t_flags |= XLOG_TIC_PERM_RESERV;
- sv_init(&(tic->t_wait), SV_DEFAULT, "logtick");
+ sv_init(&tic->t_wait, SV_DEFAULT, "logtick");
xlog_tic_reset_res(tic);