From: Zhang Yi <[email protected]>
v1->v2:
- Drop the last patch in [1].
- Merge journal_clean_one_cp_list() into journal_shrink_one_cp_list().
- Fix the race issues through trying to hold buffer lock and check
dirty state under the lock.
- Append a cleanup patch, remove __journal_try_to_free_buffer().
Hello,
The first two patches are from [1] and are not changed, appending
another four (it depends on the first three) to fix another three race
issues in the checkpoint procedure which could also lead to inconsistent
results.
[1] https://lore.kernel.org/linux-ext4/[email protected]/
Thanks,
Yi.
Zhang Yi (5):
jbd2: recheck chechpointing non-dirty buffer
jbd2: remove t_checkpoint_io_list
jbd2: remove journal_clean_one_cp_list()
jbd2: fix a race when checking checkpoint buffer busy
jbd2: remove __journal_try_to_free_buffer()
Zhihao Cheng (1):
jbd2: Fix wrongly judgement for buffer head removing while doing
checkpoint
fs/jbd2/checkpoint.c | 276 ++++++++++++------------------------
fs/jbd2/commit.c | 3 +-
fs/jbd2/transaction.c | 40 ++----
include/linux/jbd2.h | 7 +-
include/trace/events/jbd2.h | 12 +-
5 files changed, 107 insertions(+), 231 deletions(-)
--
2.31.1
From: Zhang Yi <[email protected]>
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: [email protected]
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
Tested-by: Zhihao Cheng <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/jbd2/checkpoint.c | 102 ++++++++++++-------------------------------
1 file changed, 29 insertions(+), 73 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
}
}
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);
--
2.31.1
From: Zhang Yi <[email protected]>
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: [email protected]
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
---
fs/jbd2/checkpoint.c | 38 +++++++++++++++++++++++++++++++++++---
fs/jbd2/transaction.c | 17 +++++------------
include/linux/jbd2.h | 1 +
3 files changed, 41 insertions(+), 15 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 32f86bfbca69..36300f7eb32a 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -375,11 +375,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -615,6 +619,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we hold the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 91a2cf4bc575..c212da35a052 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
--
2.31.1
From: Zhang Yi <[email protected]>
Since t_checkpoint_io_list was stop using in jbd2_log_do_checkpoint()
now, it's time to remove the whole t_checkpoint_io_list logic.
Signed-off-by: Zhang Yi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/jbd2/checkpoint.c | 42 ++----------------------------------------
fs/jbd2/commit.c | 3 +--
include/linux/jbd2.h | 6 ------
3 files changed, 3 insertions(+), 48 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 25e3c20eb19f..55d6efdbea64 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -27,7 +27,7 @@
*
* Called with j_list_lock held.
*/
-static inline void __buffer_unlink_first(struct journal_head *jh)
+static inline void __buffer_unlink(struct journal_head *jh)
{
transaction_t *transaction = jh->b_cp_transaction;
@@ -40,23 +40,6 @@ static inline void __buffer_unlink_first(struct journal_head *jh)
}
}
-/*
- * Unlink a buffer from a transaction checkpoint(io) list.
- *
- * Called with j_list_lock held.
- */
-static inline void __buffer_unlink(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
- if (transaction->t_checkpoint_io_list == jh) {
- transaction->t_checkpoint_io_list = jh->b_cpnext;
- if (transaction->t_checkpoint_io_list == jh)
- transaction->t_checkpoint_io_list = NULL;
- }
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -503,15 +486,6 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal,
break;
if (need_resched() || spin_needbreak(&journal->j_list_lock))
break;
- if (released)
- continue;
-
- nr_freed += journal_shrink_one_cp_list(transaction->t_checkpoint_io_list,
- nr_to_scan, &released);
- if (*nr_to_scan == 0)
- break;
- if (need_resched() || spin_needbreak(&journal->j_list_lock))
- break;
} while (transaction != last_transaction);
if (transaction != last_transaction) {
@@ -566,17 +540,6 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy)
*/
if (need_resched())
return;
- if (ret)
- continue;
- /*
- * It is essential that we are as careful as in the case of
- * t_checkpoint_list with removing the buffer from the list as
- * we can possibly see not yet submitted buffers on io_list
- */
- ret = journal_clean_one_cp_list(transaction->
- t_checkpoint_io_list, destroy);
- if (need_resched())
- return;
/*
* Stop scanning if we couldn't free the transaction. This
* avoids pointless scanning of transactions which still
@@ -661,7 +624,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
jbd2_journal_put_journal_head(jh);
/* Is this transaction empty? */
- if (transaction->t_checkpoint_list || transaction->t_checkpoint_io_list)
+ if (transaction->t_checkpoint_list)
return 0;
/*
@@ -753,7 +716,6 @@ void __jbd2_journal_drop_transaction(journal_t *journal, transaction_t *transact
J_ASSERT(transaction->t_forget == NULL);
J_ASSERT(transaction->t_shadow_list == NULL);
J_ASSERT(transaction->t_checkpoint_list == NULL);
- J_ASSERT(transaction->t_checkpoint_io_list == NULL);
J_ASSERT(atomic_read(&transaction->t_updates) == 0);
J_ASSERT(journal->j_committing_transaction != transaction);
J_ASSERT(journal->j_running_transaction != transaction);
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index b33155dd7001..1073259902a6 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -1141,8 +1141,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
spin_lock(&journal->j_list_lock);
commit_transaction->t_state = T_FINISHED;
/* Check if the transaction can be dropped now that we are finished */
- if (commit_transaction->t_checkpoint_list == NULL &&
- commit_transaction->t_checkpoint_io_list == NULL) {
+ if (commit_transaction->t_checkpoint_list == NULL) {
__jbd2_journal_drop_transaction(journal, commit_transaction);
jbd2_journal_free_transaction(commit_transaction);
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f619bae1dcc5..91a2cf4bc575 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -622,12 +622,6 @@ struct transaction_s
*/
struct journal_head *t_checkpoint_list;
- /*
- * Doubly-linked circular list of all buffers submitted for IO while
- * checkpointing. [j_list_lock]
- */
- struct journal_head *t_checkpoint_io_list;
-
/*
* Doubly-linked circular list of metadata buffers being
* shadowed by log IO. The IO buffers on the iobuf list and
--
2.31.1
From: Zhang Yi <[email protected]>
journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost
the same, so merge them into journal_shrink_one_cp_list(), remove the
nr_to_scan parameter, always scan and try to free the whole checkpoint
list.
Signed-off-by: Zhang Yi <[email protected]>
---
fs/jbd2/checkpoint.c | 74 ++++++++-----------------------------
include/trace/events/jbd2.h | 12 ++----
2 files changed, 20 insertions(+), 66 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 55d6efdbea64..3eb5b01a7e84 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -347,50 +347,10 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
/* Checkpoint list management */
-/*
- * journal_clean_one_cp_list
- *
- * Find all the written-back checkpoint buffers in the given list and
- * release them. If 'destroy' is set, clean all buffers unconditionally.
- *
- * Called with j_list_lock held.
- * Returns 1 if we freed the transaction, 0 otherwise.
- */
-static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
-{
- struct journal_head *last_jh;
- struct journal_head *next_jh = jh;
-
- if (!jh)
- return 0;
-
- last_jh = jh->b_cpprev;
- do {
- jh = next_jh;
- next_jh = jh->b_cpnext;
-
- if (!destroy && __cp_buffer_busy(jh))
- return 0;
-
- if (__jbd2_journal_remove_checkpoint(jh))
- return 1;
- /*
- * This function only frees up some memory
- * if possible so we dont have an obligation
- * to finish processing. Bail out if preemption
- * requested:
- */
- if (need_resched())
- return 0;
- } while (jh != last_jh);
-
- return 0;
-}
-
/*
* journal_shrink_one_cp_list
*
- * Find 'nr_to_scan' written-back checkpoint buffers in the given list
+ * Find all the written-back checkpoint buffers in the given list
* and try to release them. If the whole transaction is released, set
* the 'released' parameter. Return the number of released checkpointed
* buffers.
@@ -398,15 +358,14 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
* Called with j_list_lock held.
*/
static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
- unsigned long *nr_to_scan,
- bool *released)
+ bool destroy, bool *released)
{
struct journal_head *last_jh;
struct journal_head *next_jh = jh;
unsigned long nr_freed = 0;
int ret;
- if (!jh || *nr_to_scan == 0)
+ if (!jh)
return 0;
last_jh = jh->b_cpprev;
@@ -414,8 +373,7 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- (*nr_to_scan)--;
- if (__cp_buffer_busy(jh))
+ if (!destroy && __cp_buffer_busy(jh))
continue;
nr_freed++;
@@ -427,7 +385,7 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
if (need_resched())
break;
- } while (jh != last_jh && *nr_to_scan);
+ } while (jh != last_jh);
return nr_freed;
}
@@ -445,11 +403,11 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal,
unsigned long *nr_to_scan)
{
transaction_t *transaction, *last_transaction, *next_transaction;
- bool released;
+ bool __maybe_unused released;
tid_t first_tid = 0, last_tid = 0, next_tid = 0;
tid_t tid = 0;
unsigned long nr_freed = 0;
- unsigned long nr_scanned = *nr_to_scan;
+ unsigned long freed;
again:
spin_lock(&journal->j_list_lock);
@@ -478,10 +436,11 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal,
transaction = next_transaction;
next_transaction = transaction->t_cpnext;
tid = transaction->t_tid;
- released = false;
- nr_freed += journal_shrink_one_cp_list(transaction->t_checkpoint_list,
- nr_to_scan, &released);
+ freed = journal_shrink_one_cp_list(transaction->t_checkpoint_list,
+ false, &released);
+ nr_freed += freed;
+ (*nr_to_scan) -= min(*nr_to_scan, freed);
if (*nr_to_scan == 0)
break;
if (need_resched() || spin_needbreak(&journal->j_list_lock))
@@ -502,9 +461,8 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal,
if (*nr_to_scan && next_tid)
goto again;
out:
- nr_scanned -= *nr_to_scan;
trace_jbd2_shrink_checkpoint_list(journal, first_tid, tid, last_tid,
- nr_freed, nr_scanned, next_tid);
+ nr_freed, next_tid);
return nr_freed;
}
@@ -520,7 +478,7 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal,
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy)
{
transaction_t *transaction, *last_transaction, *next_transaction;
- int ret;
+ bool released = false;
transaction = journal->j_checkpoint_transactions;
if (!transaction)
@@ -531,8 +489,8 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy)
do {
transaction = next_transaction;
next_transaction = transaction->t_cpnext;
- ret = journal_clean_one_cp_list(transaction->t_checkpoint_list,
- destroy);
+ journal_shrink_one_cp_list(transaction->t_checkpoint_list,
+ destroy, &released);
/*
* This function only frees up some memory if possible so we
* dont have an obligation to finish processing. Bail out if
@@ -545,7 +503,7 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy)
* avoids pointless scanning of transactions which still
* weren't checkpointed.
*/
- if (!ret)
+ if (!released)
return;
} while (transaction != last_transaction);
}
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 8f5ee380d309..5646ae15a957 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -462,11 +462,9 @@ TRACE_EVENT(jbd2_shrink_scan_exit,
TRACE_EVENT(jbd2_shrink_checkpoint_list,
TP_PROTO(journal_t *journal, tid_t first_tid, tid_t tid, tid_t last_tid,
- unsigned long nr_freed, unsigned long nr_scanned,
- tid_t next_tid),
+ unsigned long nr_freed, tid_t next_tid),
- TP_ARGS(journal, first_tid, tid, last_tid, nr_freed,
- nr_scanned, next_tid),
+ TP_ARGS(journal, first_tid, tid, last_tid, nr_freed, next_tid),
TP_STRUCT__entry(
__field(dev_t, dev)
@@ -474,7 +472,6 @@ TRACE_EVENT(jbd2_shrink_checkpoint_list,
__field(tid_t, tid)
__field(tid_t, last_tid)
__field(unsigned long, nr_freed)
- __field(unsigned long, nr_scanned)
__field(tid_t, next_tid)
),
@@ -484,15 +481,14 @@ TRACE_EVENT(jbd2_shrink_checkpoint_list,
__entry->tid = tid;
__entry->last_tid = last_tid;
__entry->nr_freed = nr_freed;
- __entry->nr_scanned = nr_scanned;
__entry->next_tid = next_tid;
),
TP_printk("dev %d,%d shrink transaction %u-%u(%u) freed %lu "
- "scanned %lu next transaction %u",
+ "next transaction %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->first_tid, __entry->tid, __entry->last_tid,
- __entry->nr_freed, __entry->nr_scanned, __entry->next_tid)
+ __entry->nr_freed, __entry->next_tid)
);
#endif /* _TRACE_JBD2_H */
--
2.31.1
On Tue 06-06-23 14:14:44, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost
> the same, so merge them into journal_shrink_one_cp_list(), remove the
> nr_to_scan parameter, always scan and try to free the whole checkpoint
> list.
>
> Signed-off-by: Zhang Yi <[email protected]>
Looks good. Just one nit below:
> @@ -398,15 +358,14 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
> * Called with j_list_lock held.
> */
> static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
> - unsigned long *nr_to_scan,
> - bool *released)
> + bool destroy, bool *released)
> {
> struct journal_head *last_jh;
> struct journal_head *next_jh = jh;
> unsigned long nr_freed = 0;
> int ret;
When changing this function, I think it will be more robust calling
convention to unconditionally set *released = false at the beginning of
this function. Then we can remove the initialization from
__jbd2_journal_clean_checkpoint_list().
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Tue 06-06-23 14:14:46, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> Before removing checkpoint buffer from the t_checkpoint_list, we have to
> check both BH_Dirty and BH_Lock bits together to distinguish buffers
> have not been or were being written back. But __cp_buffer_busy() checks
> them separately, it first check lock state and then check dirty, the
> window between these two checks could be raced by writing back
> procedure, which locks buffer and clears buffer dirty before I/O
> completes. So it cannot guarantee checkpointing buffers been written
> back to disk if some error happens later. Finally, it may clean
> checkpoint transactions and lead to inconsistent filesystem.
>
> jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
> same problem (journal_unmap_buffer() escape from this issue since it's
> running under the buffer lock), so fix them through introducing a new
> helper to try holding the buffer lock and remove really clean buffer.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
> Cc: [email protected]
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Zhang Yi <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Just a type correction below:
> @@ -615,6 +619,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
> return 1;
> }
>
> +/*
> + * Check the checkpoint buffer and try to remove it from the checkpoint
> + * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
> + * it frees the transaction, 0 otherwise.
> + *
> + * This function is called with j_list_lock held.
> + */
> +int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
> +{
> + struct buffer_head *bh = jh2bh(jh);
> +
> + if (!trylock_buffer(bh))
> + return -EBUSY;
> + if (buffer_dirty(bh)) {
> + unlock_buffer(bh);
> + return -EBUSY;
> + }
> + unlock_buffer(bh);
> +
> + /*
> + * Buffer is clean and the IO has finished (we hold the buffer
^^^ held
> + * lock) so the checkpoint is done. We can safely remove the
> + * buffer from this transaction.
> + */
> + JBUFFER_TRACE(jh, "remove from checkpoint list");
> + return __jbd2_journal_remove_checkpoint(jh);
> +}
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 2023/6/6 15:46, Jan Kara wrote:
> On Tue 06-06-23 14:14:44, Zhang Yi wrote:
>> From: Zhang Yi <[email protected]>
>>
>> journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost
>> the same, so merge them into journal_shrink_one_cp_list(), remove the
>> nr_to_scan parameter, always scan and try to free the whole checkpoint
>> list.
>>
>> Signed-off-by: Zhang Yi <[email protected]>
>
> Looks good. Just one nit below:
>
>> @@ -398,15 +358,14 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
>> * Called with j_list_lock held.
>> */
>> static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
>> - unsigned long *nr_to_scan,
>> - bool *released)
>> + bool destroy, bool *released)
>> {
>> struct journal_head *last_jh;
>> struct journal_head *next_jh = jh;
>> unsigned long nr_freed = 0;
>> int ret;
>
> When changing this function, I think it will be more robust calling
> convention to unconditionally set *released = false at the beginning of
> this function. Then we can remove the initialization from
> __jbd2_journal_clean_checkpoint_list().
>
Thanks for the review, will change it in my next iteration.
Yi.