LinuxLists.cc - [PATCH 0/4] ext4/jbd2: possible filesystem corruption fixes

2008-10-10 08:47:19

Subject: [PATCH 0/4] ext4/jbd2: possible filesystem corruption fixes

Hi,

This patch set fixes ext4 fs corruption problems caused by
transient I/O error. The ext3/jbd version is already in the
current -mm.

Here is the ext3/jbd version:
http://kerneltrap.org/mailarchive/linux-kernel/2008/7/24/2661724

This patch set consists of following 4 patches:

[PATCH 1/4] jbd2: abort when failed to log metadata buffers
[PATCH 2/4] jbd2: fix error handling for checkpoint io
[PATCH 3/4] ext4: add checks for errors from jbd2
[PATCH 4/4] jbd2: don't dirty original metadata buffer on abort

In the ext3/jbd version, PATCH 2/4 had a problem and I fixed it
with an additional patch later. This time PATCH 2/4 includes
that fix.

Regards,
--
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center

2008-10-10 08:58:49

by Hidehiro Kawai

[permalink] [raw]

Subject: [PATCH 2/4] jbd2: fix error handling for checkpoint io

When a checkpointing IO fails, current JBD2 code doesn't check the
error and continue journaling. This means latest metadata can be
lost from both the journal and filesystem.

This patch leaves the failed metadata blocks in the journal space
and aborts journaling in the case of jbd2_log_do_checkpoint().
To achieve this, we need to do:

1. don't remove the failed buffer from the checkpoint list where in
the case of __try_to_free_cp_buf() because it may be released or
overwritten by a later transaction
2. jbd2_log_do_checkpoint() is the last chance, remove the failed
buffer from the checkpoint list and abort the journal
3. when checkpointing fails, don't update the journal super block to
prevent the journaled contents from being cleaned. For safety,
don't update j_tail and j_tail_sequence either
4. when checkpointing fails, notify this error to the ext4 layer so
that ext4 don't clear the needs_recovery flag, otherwise the
journaled contents are ignored and cleaned in the recovery phase
5. if the recovery fails, keep the needs_recovery flag
6. prevent jbd2_cleanup_journal_tail() from being called between
__jbd2_journal_drop_transaction() and jbd2_journal_abort()
(a possible race issue between jbd2_log_do_checkpoint()s called by
jbd2_journal_flush() and __jbd2_log_wait_for_space())

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/jbd2/checkpoint.c | 49 ++++++++++++++++++++++++++++++-----------
fs/jbd2/journal.c | 28 ++++++++++++++++++-----
fs/jbd2/recovery.c | 7 ++++-
include/linux/jbd2.h | 2 -
4 files changed, 65 insertions(+), 21 deletions(-)

Index: linux-2.6.27-rc9-ex4-1/fs/jbd2/checkpoint.c
===================================================================
--- linux-2.6.27-rc9-ex4-1.orig/fs/jbd2/checkpoint.c
+++ linux-2.6.27-rc9-ex4-1/fs/jbd2/checkpoint.c
@@ -94,7 +94,8 @@ static int __try_to_free_cp_buf(struct j
int ret = 0;
struct buffer_head *bh = jh2bh(jh);

- if (jh->b_jlist == BJ_None && !buffer_locked(bh) && !buffer_dirty(bh)) {
+ if (jh->b_jlist == BJ_None && !buffer_locked(bh) &&
+ !buffer_dirty(bh) && !buffer_write_io_error(bh)) {
JBUFFER_TRACE(jh, "remove from checkpoint list");
ret = __jbd2_journal_remove_checkpoint(jh) + 1;
jbd_unlock_bh_state(bh);
@@ -176,21 +177,25 @@ static void jbd_sync_bh(journal_t *journ
* buffers. Note that we take the buffers in the opposite ordering
* from the one in which they were submitted for IO.
*
+ * Return 0 on success, and return <0 if some buffers have failed
+ * to be written out.
+ *
* Called with j_list_lock held.
*/
-static void __wait_cp_io(journal_t *journal, transaction_t *transaction)
+static int __wait_cp_io(journal_t *journal, transaction_t *transaction)
{
struct journal_head *jh;
struct buffer_head *bh;
tid_t this_tid;
int released = 0;
+ int ret = 0;

this_tid = transaction->t_tid;
restart:
/* Did somebody clean up the transaction in the meanwhile? */
if (journal->j_checkpoint_transactions != transaction ||
transaction->t_tid != this_tid)
- return;
+ return ret;
while (!released && transaction->t_checkpoint_io_list) {
jh = transaction->t_checkpoint_io_list;
bh = jh2bh(jh);
@@ -210,6 +215,9 @@ restart:
spin_lock(&journal->j_list_lock);
goto restart;
}
+ if (unlikely(buffer_write_io_error(bh)))
+ ret = -EIO;
+
/*
* Now in whatever state the buffer currently is, we know that
* it has been written out and so we can drop it from the list
@@ -219,6 +227,8 @@ restart:
jbd2_journal_remove_journal_head(bh);
__brelse(bh);
}
+
+ return ret;
}

#define NR_BATCH 64
@@ -242,7 +252,8 @@ __flush_batch(journal_t *journal, struct
* Try to flush one buffer from the checkpoint list to disk.
*
* Return 1 if something happened which requires us to abort the current
- * scan of the checkpoint list.
+ * scan of the checkpoint list. Return <0 if the buffer has failed to
+ * be written out.
*
* Called with j_list_lock held and drops it if 1 is returned
* Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
@@ -274,6 +285,9 @@ static int __process_buffer(journal_t *j
jbd2_log_wait_commit(journal, tid);
ret = 1;
} else if (!buffer_dirty(bh)) {
+ ret = 1;
+ if (unlikely(buffer_write_io_error(bh)))
+ ret = -EIO;
J_ASSERT_JH(jh, !buffer_jbddirty(bh));
BUFFER_TRACE(bh, "remove from checkpoint");
__jbd2_journal_remove_checkpoint(jh);
@@ -281,7 +295,6 @@ static int __process_buffer(journal_t *j
jbd_unlock_bh_state(bh);
jbd2_journal_remove_journal_head(bh);
__brelse(bh);
- ret = 1;
} else {
/*
* Important: we are about to write the buffer, and
@@ -314,6 +327,7 @@ static int __process_buffer(journal_t *j
* to disk. We submit larger chunks of data at once.
*
* The journal should be locked before calling this function.
+ * Called with j_checkpoint_mutex held.
*/
int jbd2_log_do_checkpoint(journal_t *journal)
{
@@ -339,6 +353,7 @@ int jbd2_log_do_checkpoint(journal_t *jo
* OK, we need to start writing disk blocks. Take one transaction
* and write it.
*/
+ result = 0;
spin_lock(&journal->j_list_lock);
if (!journal->j_checkpoint_transactions)
goto out;
@@ -357,7 +372,7 @@ restart:
int batch_count = 0;
struct buffer_head *bhs[NR_BATCH];
struct journal_head *jh;
- int retry = 0;
+ int retry = 0, err;

while (!retry && transaction->t_checkpoint_list) {
struct buffer_head *bh;
@@ -371,6 +386,8 @@ restart:
}
retry = __process_buffer(journal, jh, bhs, &batch_count,
transaction);
+ if (retry < 0 && !result)
+ result = retry;
if (!retry && (need_resched() ||
spin_needbreak(&journal->j_list_lock))) {
spin_unlock(&journal->j_list_lock);
@@ -395,14 +412,18 @@ restart:
* Now we have cleaned up the first transaction's checkpoint
* list. Let's clean up the second one
*/
- __wait_cp_io(journal, transaction);
+ err = __wait_cp_io(journal, transaction);
+ if (!result)
+ result = err;
}
out:
spin_unlock(&journal->j_list_lock);
- result = jbd2_cleanup_journal_tail(journal);
if (result < 0)
- return result;
- return 0;
+ jbd2_journal_abort(journal, result);
+ else
+ result = jbd2_cleanup_journal_tail(journal);
+
+ return (result < 0) ? result : 0;
}

/*
@@ -418,8 +439,9 @@ out:
* This is the only part of the journaling code which really needs to be
* aware of transaction aborts. Checkpointing involves writing to the
* main filesystem area rather than to the journal, so it can proceed
- * even in abort state, but we must not update the journal superblock if
- * we have an abort error outstanding.
+ * even in abort state, but we must not update the super block if
+ * checkpointing may have failed. Otherwise, we would lose some metadata
+ * buffers which should be written-back to the filesystem.
*/

int jbd2_cleanup_journal_tail(journal_t *journal)
@@ -428,6 +450,9 @@ int jbd2_cleanup_journal_tail(journal_t
tid_t first_tid;
unsigned long blocknr, freed;

+ if (is_journal_aborted(journal))
+ return 1;
+
/* OK, work out the oldest transaction remaining in the log, and
* the log block it starts at.
*
Index: linux-2.6.27-rc9-ex4-1/fs/jbd2/journal.c
===================================================================
--- linux-2.6.27-rc9-ex4-1.orig/fs/jbd2/journal.c
+++ linux-2.6.27-rc9-ex4-1/fs/jbd2/journal.c
@@ -1451,9 +1451,12 @@ recovery_error:
*
* Release a journal_t structure once it is no longer in use by the
* journaled object.
+ * Return <0 if we couldn't clean up the journal.
*/
-void jbd2_journal_destroy(journal_t *journal)
+int jbd2_journal_destroy(journal_t *journal)
{
+ int err = 0;
+
/* Wait for the commit thread to wake up and die. */
journal_kill_thread(journal);

@@ -1476,11 +1479,16 @@ void jbd2_journal_destroy(journal_t *jou
J_ASSERT(journal->j_checkpoint_transactions == NULL);
spin_unlock(&journal->j_list_lock);

- /* We can now mark the journal as empty. */
- journal->j_tail = 0;
- journal->j_tail_sequence = ++journal->j_transaction_sequence;
if (journal->j_sb_buffer) {
- jbd2_journal_update_superblock(journal, 1);
+ if (!is_journal_aborted(journal)) {
+ /* We can now mark the journal as empty. */
+ journal->j_tail = 0;
+ journal->j_tail_sequence =
+ ++journal->j_transaction_sequence;
+ jbd2_journal_update_superblock(journal, 1);
+ } else {
+ err = -EIO;
+ }
brelse(journal->j_sb_buffer);
}

@@ -1492,6 +1500,8 @@ void jbd2_journal_destroy(journal_t *jou
jbd2_journal_destroy_revoke(journal);
kfree(journal->j_wbuf);
kfree(journal);
+
+ return err;
}

@@ -1717,10 +1727,16 @@ int jbd2_journal_flush(journal_t *journa
spin_lock(&journal->j_list_lock);
while (!err && journal->j_checkpoint_transactions != NULL) {
spin_unlock(&journal->j_list_lock);
+ mutex_lock(&journal->j_checkpoint_mutex);
err = jbd2_log_do_checkpoint(journal);
+ mutex_unlock(&journal->j_checkpoint_mutex);
spin_lock(&journal->j_list_lock);
}
spin_unlock(&journal->j_list_lock);
+
+ if (is_journal_aborted(journal))
+ return -EIO;
+
jbd2_cleanup_journal_tail(journal);

/* Finally, mark the journal as really needing no recovery.
@@ -1742,7 +1758,7 @@ int jbd2_journal_flush(journal_t *journa
J_ASSERT(journal->j_head == journal->j_tail);
J_ASSERT(journal->j_tail_sequence == journal->j_transaction_sequence);
spin_unlock(&journal->j_state_lock);
- return err;
+ return 0;
}

/**
Index: linux-2.6.27-rc9-ex4-1/fs/jbd2/recovery.c
===================================================================
--- linux-2.6.27-rc9-ex4-1.orig/fs/jbd2/recovery.c
+++ linux-2.6.27-rc9-ex4-1/fs/jbd2/recovery.c
@@ -225,7 +225,7 @@ do { \
*/
int jbd2_journal_recover(journal_t *journal)
{
- int err;
+ int err, err2;
journal_superblock_t * sb;

struct recovery_info info;
@@ -263,7 +263,10 @@ int jbd2_journal_recover(journal_t *jour
journal->j_transaction_sequence = ++info.end_transaction;

jbd2_journal_clear_revoke(journal);
- sync_blockdev(journal->j_fs_dev);
+ err2 = sync_blockdev(journal->j_fs_dev);
+ if (!err)
+ err = err2;
+
return err;
}

Index: linux-2.6.27-rc9-ex4-1/include/linux/jbd2.h
===================================================================
--- linux-2.6.27-rc9-ex4-1.orig/include/linux/jbd2.h
+++ linux-2.6.27-rc9-ex4-1/include/linux/jbd2.h
@@ -1061,7 +1061,7 @@ extern void jbd2_journal_clear_featur
(journal_t *, unsigned long, unsigned long, unsigned long);
extern int jbd2_journal_create (journal_t *);
extern int jbd2_journal_load (journal_t *journal);
-extern void jbd2_journal_destroy (journal_t *);
+extern int jbd2_journal_destroy (journal_t *);
extern int jbd2_journal_recover (journal_t *journal);
extern int jbd2_journal_wipe (journal_t *, int);
extern int jbd2_journal_skip_recovery (journal_t *);

2008-10-10 09:00:37

by Hidehiro Kawai

[permalink] [raw]

Subject: [PATCH 3/4] ext4: add checks for errors from jbd2

If the journal has aborted due to a checkpointing failure, we
have to keep the contents of the journal space. Otherwise, the
filesystem will lose uncheckpointed metadata completely and
become inconsistent. To avoid this, we need to keep needs_recovery
flag if checkpoint has failed.

With this patch, ext4_put_super() detects a checkpointing failure
from the return value of journal_destroy(), then it invokes
ext4_abort() to make the filesystem read only and keep
needs_recovery flag. Errors from jbd2_journal_flush() are also
handled by this patch in some places.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/ext4/ioctl.c | 12 ++++++++----
fs/ext4/super.c | 23 +++++++++++++++++++----
2 files changed, 27 insertions(+), 8 deletions(-)

Index: linux-2.6.27-rc9-ex4-1/fs/ext4/ioctl.c
===================================================================
--- linux-2.6.27-rc9-ex4-1.orig/fs/ext4/ioctl.c
+++ linux-2.6.27-rc9-ex4-1/fs/ext4/ioctl.c
@@ -192,7 +192,7 @@ setversion_out:
case EXT4_IOC_GROUP_EXTEND: {
ext4_fsblk_t n_blocks_count;
struct super_block *sb = inode->i_sb;
- int err;
+ int err, err2;

if (!capable(CAP_SYS_RESOURCE))
return -EPERM;
@@ -206,8 +206,10 @@ setversion_out:

err = ext4_group_extend(sb, EXT4_SB(sb)->s_es, n_blocks_count);
jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
- jbd2_journal_flush(EXT4_SB(sb)->s_journal);
+ err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal);
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
+ if (err == 0)
+ err = err2;
mnt_drop_write(filp->f_path.mnt);

return err;
@@ -220,7 +222,7 @@ setversion_out:
case EXT4_IOC_GROUP_ADD: {
struct ext4_new_group_data input;
struct super_block *sb = inode->i_sb;
- int err;
+ int err, err2;

if (!capable(CAP_SYS_RESOURCE))
return -EPERM;
@@ -235,8 +237,10 @@ setversion_out:

err = ext4_group_add(sb, &input);
jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
- jbd2_journal_flush(EXT4_SB(sb)->s_journal);
+ err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal);
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
+ if (err == 0)
+ err = err2;
mnt_drop_write(filp->f_path.mnt);

return err;
Index: linux-2.6.27-rc9-ex4-1/fs/ext4/super.c
===================================================================
--- linux-2.6.27-rc9-ex4-1.orig/fs/ext4/super.c
+++ linux-2.6.27-rc9-ex4-1/fs/ext4/super.c
@@ -507,7 +507,8 @@ static void ext4_put_super(struct super_
ext4_mb_release(sb);
ext4_ext_release(sb);
ext4_xattr_put_super(sb);
- jbd2_journal_destroy(sbi->s_journal);
+ if (jbd2_journal_destroy(sbi->s_journal) < 0)
+ ext4_abort(sb, __func__, "Couldn't clean up the journal");
sbi->s_journal = NULL;
if (!(sb->s_flags & MS_RDONLY)) {
EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
@@ -2863,7 +2864,9 @@ static void ext4_mark_recovery_complete(
journal_t *journal = EXT4_SB(sb)->s_journal;

jbd2_journal_lock_updates(journal);
- jbd2_journal_flush(journal);
+ if (jbd2_journal_flush(journal) < 0)
+ goto out;
+
lock_super(sb);
if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER) &&
sb->s_flags & MS_RDONLY) {
@@ -2872,6 +2875,8 @@ static void ext4_mark_recovery_complete(
ext4_commit_super(sb, es, 1);
}
unlock_super(sb);
+
+out:
jbd2_journal_unlock_updates(journal);
}

@@ -2972,7 +2977,13 @@ static void ext4_write_super_lockfs(stru

/* Now we set up the journal barrier. */
jbd2_journal_lock_updates(journal);
- jbd2_journal_flush(journal);
+
+ /*
+ * We don't want to clear needs_recovery flag when we failed
+ * to flush the journal.
+ */
+ if (jbd2_journal_flush(journal) < 0)
+ return;

/* Journal blocked and flushed, clear needs_recovery flag. */
EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
@@ -3412,8 +3423,12 @@ static int ext4_quota_on(struct super_bl
* otherwise be livelocked...
*/
jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
- jbd2_journal_flush(EXT4_SB(sb)->s_journal);
+ err = jbd2_journal_flush(EXT4_SB(sb)->s_journal);
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
+ if (err) {
+ path_put(&nd.path);
+ return err;
+ }
}

err = vfs_quota_on_path(sb, type, format_id, &nd.path);

2008-10-10 08:57:48

by Hidehiro Kawai

[permalink] [raw]

Subject: [PATCH 1/4] jbd2: abort when failed to log metadata buffers

If we failed to write metadata buffers to the journal space and
succeeded to write the commit record, stale data can be written
back to the filesystem as metadata in the recovery phase.

To avoid this, when we failed to write out metadata buffers,
abort the journal before writing the commit record.

We can also avoid this kind of corruption by using check-summing
feature because it can detect invalid metadata blocks in the
journal and avoid them from being replayed. So we don't need to
care about asynchronous commit record writeout with a check sum.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/jbd2/commit.c | 3 +++
1 file changed, 3 insertions(+)

Index: linux-2.6.27-rc9-ex4-1/fs/jbd2/commit.c
===================================================================
--- linux-2.6.27-rc9-ex4-1.orig/fs/jbd2/commit.c
+++ linux-2.6.27-rc9-ex4-1/fs/jbd2/commit.c
@@ -783,6 +783,9 @@ wait_for_iobuf:
/* AKPM: bforget here */
}

+ if (err)
+ jbd2_journal_abort(journal, err);
+
jbd_debug(3, "JBD: commit phase 5\n");

if (!JBD2_HAS_INCOMPAT_FEATURE(journal,

2008-10-10 09:02:22

by Hidehiro Kawai

[permalink] [raw]

Subject: [PATCH 4/4] jbd2: don't dirty original metadata buffer on abort

Currently, original metadata buffers are dirtied when they are
unfiled whether the journal has aborted or not. Eventually these
buffers will be written-back to the filesystem by pdflush. This
means some metadata buffers are written to the filesystem without
journaling if the journal aborts. So if both journal abort and
system crash happen at the same time, the filesystem would become
inconsistent state. Additionally, replaying journaled metadata
can overwrite the latest metadata on the filesystem partly.
Because, if the journal gets aborted, journaled metadata are
preserved and replayed during the next mount not to lose
uncheckpointed metadata. This would also break the consistency
of the filesystem.

This patch prevents original metadata buffers from being dirtied
on abort by clearing BH_JBDDirty flag from those buffers. Thus,
no metadata buffers are written to the filesystem without journaling.

Signed-off-by: Hidehiro Kawai <[email protected]>
---
fs/jbd2/commit.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6.27-rc9-ex4-1/fs/jbd2/commit.c
===================================================================
--- linux-2.6.27-rc9-ex4-1.orig/fs/jbd2/commit.c
+++ linux-2.6.27-rc9-ex4-1/fs/jbd2/commit.c
@@ -504,9 +504,10 @@ void jbd2_journal_commit_transaction(jou
jh = commit_transaction->t_buffers;

/* If we're in abort mode, we just un-journal the buffer and
- release it for background writing. */
+ release it. */

if (is_journal_aborted(journal)) {
+ clear_buffer_jbddirty(jh2bh(jh));
JBUFFER_TRACE(jh, "journal is aborting: refile");
jbd2_journal_refile_buffer(journal, jh);
/* If that was the last one, we need to clean up
@@ -884,6 +885,8 @@ restart_loop:
if (buffer_jbddirty(bh)) {
JBUFFER_TRACE(jh, "add to new checkpointing trans");
__jbd2_journal_insert_checkpoint(jh, commit_transaction);
+ if (is_journal_aborted(journal))
+ clear_buffer_jbddirty(bh);
JBUFFER_TRACE(jh, "refile for checkpoint writeback");
__jbd2_journal_refile_buffer(jh);
jbd_unlock_bh_state(bh);

2008-10-11 04:25:23

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH 0/4] ext4/jbd2: possible filesystem corruption fixes

On Fri, Oct 10, 2008 at 05:47:06PM +0900, Hidehiro Kawai wrote:
> Hi,
>
> This patch set fixes ext4 fs corruption problems caused by
> transient I/O error. The ext3/jbd version is already in the
> current -mm.

Hidehiro-san,

Thanks for the patches. I've added them to the ext4 patch queue.
They weren't pushed to Linus in the first set of patches, because I
haven't yet had time to examine them closely, but I am in the process
of testing them.

Regards,

- Ted

2008-10-13 20:38:00

by Joel Becker

[permalink] [raw]

Subject: Re: [PATCH 4/4] jbd2: don't dirty original metadata buffer on abort

On Fri, Oct 10, 2008 at 06:02:09PM +0900, Hidehiro Kawai wrote:
> Currently, original metadata buffers are dirtied when they are
> unfiled whether the journal has aborted or not. Eventually these
> buffers will be written-back to the filesystem by pdflush. This
> means some metadata buffers are written to the filesystem without
> journaling if the journal aborts. So if both journal abort and
> system crash happen at the same time, the filesystem would become
> inconsistent state. Additionally, replaying journaled metadata
> can overwrite the latest metadata on the filesystem partly.
> Because, if the journal gets aborted, journaled metadata are
> preserved and replayed during the next mount not to lose
> uncheckpointed metadata. This would also break the consistency
> of the filesystem.
>
> This patch prevents original metadata buffers from being dirtied
> on abort by clearing BH_JBDDirty flag from those buffers. Thus,
> no metadata buffers are written to the filesystem without journaling.

It's not my place to Ack such patches, but I noticed this bug
during Plumbers, and Eric and Andreas pointed me to this patch, which
fixes it quite nicely. Just $0.02 :-)

Joel

--

Life's Little Instruction Book #94

"Make it a habit to do nice things for people who
will never find out."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-10-14 02:58:20

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH 4/4] jbd2: don't dirty original metadata buffer on abort

On Mon, Oct 13, 2008 at 01:36:42PM -0700, Joel Becker wrote:
> > This patch prevents original metadata buffers from being dirtied
> > on abort by clearing BH_JBDDirty flag from those buffers. Thus,
> > no metadata buffers are written to the filesystem without journaling.
>
> It's not my place to Ack such patches, but I noticed this bug
> during Plumbers, and Eric and Andreas pointed me to this patch, which
> fixes it quite nicely. Just $0.02 :-)

Already pushed to Linus, and in mainline. :-)

- Ted