The __log_wait_for_space function sits in a loop checkpointing transactions
until there is sufficient space free in the journal. However, if there are
no transactions to be processed (e.g. because the free space calculation is
wrong due to a corrupted filesystem) it will never progress.

Check for space being required when no transactions are outstanding and
abort the journal instead of endlessly looping.

This patch fixes the bug reported by Sami Liedes at:
http://bugzilla.kernel.org/show_bug.cgi?id=10976

Signed-off-by: Duane Griffin <[email protected]>
---
diff --git a/fs/jbd/checkpoint.c b/fs/jbd/checkpoint.c
index a5432bb..af2b554 100644
--- a/fs/jbd/checkpoint.c
+++ b/fs/jbd/checkpoint.c
@@ -126,13 +126,24 @@ void __log_wait_for_space(journal_t *journal)
/*
* Test again, another process may have checkpointed while we
- * were waiting for the checkpoint lock
+ * were waiting for the checkpoint lock. If there are no
+ * outstanding transactions there is nothing to checkpoint and
+ * we can't make progress. Abort the journal in this case.
*/
spin_lock(&journal->j_state_lock);
nblocks = jbd_space_needed(journal);
if (__log_space_left(journal) < nblocks) {
+ int chkpt = journal->j_checkpoint_transactions != NULL;
+
spin_unlock(&journal->j_state_lock);
- log_do_checkpoint(journal);
+ if (chkpt) {
+ log_do_checkpoint(journal);
+ } else {
+ printk(KERN_ERR "%s: no transactions\n",
+ __func__);
+ journal_abort(journal, 0);
+ }
+
spin_lock(&journal->j_state_lock);
}
mutex_unlock(&journal->j_checkpoint_mutex);
The __jbd2_log_wait_for_space function sits in a loop checkpointing
transactions until there is sufficient space free in the journal. However,
if there are no transactions to be processed (e.g. because the free space
calculation is wrong due to a corrupted filesystem) it will never progress.

Check for space being required when no transactions are outstanding and
abort the journal instead of endlessly looping.

This patch fixes the bug reported by Sami Liedes at:
http://bugzilla.kernel.org/show_bug.cgi?id=10976

Signed-off-by: Duane Griffin <[email protected]>
---
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 91389c8..3c075c4 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -126,13 +126,24 @@ void __jbd2_log_wait_for_space(journal_t *journal)
/*
* Test again, another process may have checkpointed while we
- * were waiting for the checkpoint lock
+ * were waiting for the checkpoint lock. If there are no
+ * outstanding transactions there is nothing to checkpoint and
+ * we can't make progress. Abort the journal in this case.
*/
spin_lock(&journal->j_state_lock);
nblocks = jbd_space_needed(journal);
if (__jbd2_log_space_left(journal) < nblocks) {
+ int chkpt = journal->j_checkpoint_transactions != NULL;
+
spin_unlock(&journal->j_state_lock);
- jbd2_log_do_checkpoint(journal);
+ if (chkpt) {
+ jbd2_log_do_checkpoint(journal);
+ } else {
+ printk(KERN_ERR "%s: no transactions\n",
+ __func__);
+ jbd2_journal_abort(journal, 0);
+ }
+
spin_lock(&journal->j_state_lock);
}
mutex_unlock(&journal->j_checkpoint_mutex);
On Tue, 5 Aug 2008 02:05:20 +0100 "Duane Griffin" <[email protected]> wrote:
> The __log_wait_for_space function sits in a loop checkpointing transactions
> until there is sufficient space free in the journal. However, if there are
> no transactions to be processed (e.g. because the free space calculation is
> wrong due to a corrupted filesystem) it will never progress.
>
> Check for space being required when no transactions are outstanding and
> abort the journal instead of endlessly looping.
>
> This patch fixes the bug reported by Sami Liedes at:
> http://bugzilla.kernel.org/show_bug.cgi?id=10976
>
> Signed-off-by: Duane Griffin <[email protected]>
> ---
> diff --git a/fs/jbd/checkpoint.c b/fs/jbd/checkpoint.c
> index a5432bb..af2b554 100644
> --- a/fs/jbd/checkpoint.c
> +++ b/fs/jbd/checkpoint.c
> @@ -126,13 +126,24 @@ void __log_wait_for_space(journal_t *journal)
>
> /*
> * Test again, another process may have checkpointed while we
> - * were waiting for the checkpoint lock
> + * were waiting for the checkpoint lock. If there are no
> + * outstanding transactions there is nothing to checkpoint and
> + * we can't make progress. Abort the journal in this case.
> */
> spin_lock(&journal->j_state_lock);
> nblocks = jbd_space_needed(journal);
> if (__log_space_left(journal) < nblocks) {
> + int chkpt = journal->j_checkpoint_transactions != NULL;
> +
> spin_unlock(&journal->j_state_lock);
> - log_do_checkpoint(journal);
> + if (chkpt) {
> + log_do_checkpoint(journal);
> + } else {
> + printk(KERN_ERR "%s: no transactions\n",
> + __func__);
> + journal_abort(journal, 0);
> + }
> +
> spin_lock(&journal->j_state_lock);
> }
> mutex_unlock(&journal->j_checkpoint_mutex);
umm, OK, but...

There's not really a lot of point in testing
journal->j_checkpoint_transactions inside j_state_lock, is there?
Hence local variable chkpt isn't really needed.

But log_do_checkpoint() already checks to see if there are any
checkpointing transactions upon which to operate, so rather than doing
log_do_checkpoint()'s work for it, perhaps it would be cleaner to teach
log_do_checkpoint() to tell the caller whether it managed to do any
work?

The nice thing about that is that even if
journal->j_checkpoint_transactions is NULL, log_do_checkpoint() might
still be able to do some useful work in cleanup_journal_tail().
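
A rough user-space sketch of that alternative, for illustration only.
Everything named toy_* here is a hypothetical stand-in, not the kernel's
journal_t or the real log_do_checkpoint(); locking is omitted and
cleanup_journal_tail() is not modelled. It only shows the shape of a
checkpoint routine that reports progress and a wait loop that aborts
when none is possible:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the journal; not the kernel's journal_t. */
struct toy_journal {
	int checkpoint_transactions;	/* transactions awaiting checkpoint */
	int space_left;			/* free journal blocks */
	int aborted;			/* set once the journal is aborted */
};

/*
 * Toy checkpoint routine that tells the caller whether it did any work:
 * returns 1 if a transaction was checkpointed, 0 if there was nothing to do.
 */
static int toy_log_do_checkpoint(struct toy_journal *j)
{
	assert(j != NULL);
	if (j->checkpoint_transactions == 0)
		return 0;
	j->checkpoint_transactions--;
	j->space_left += 4;	/* assumption: each transaction frees 4 blocks */
	return 1;
}

/*
 * Toy wait loop: keep checkpointing until nblocks are free.
 * Returns 0 on success, or -1 after aborting because no progress
 * could be made.
 */
static int toy_wait_for_space(struct toy_journal *j, int nblocks)
{
	assert(j != NULL);
	while (j->space_left < nblocks) {
		if (!toy_log_do_checkpoint(j)) {
			fprintf(stderr, "%s: no transactions\n", __func__);
			j->aborted = 1;
			return -1;
		}
	}
	return 0;
}
```

With two queued transactions and a request for 8 blocks the loop
finishes normally; with none queued it aborts on the first pass instead
of spinning.
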

otoh, two existing callers of log_do_checkpoint() already test
journal->j_checkpoint_transactions before calling log_do_checkpoint(),
so maybe that's pretty pointless.

otoh2, those existing callers do the seemingly-unneeded
spin_lock(j_list_lock). hrm. So if we're playing match-the-existing
code, we should go with your first patches.

ho hum, I guess I'll do "otoh2".