2008-03-15 18:49:50

by Duane Griffin

[permalink] [raw]
Subject: [PATCH 0/2] jbd[2]: fix long-standing data corruption bug

It seems we have a very long-standing typo in the journal replay magic number
unescaping code that basically means it will cause data corruption if we ever
actually use it. I think this is a candidate for the stable tree.

The following script, run on an FS mounted with data=journal, may be helpful in
reproducing and testing this bug:

#!/bin/bash

rm -rf test-dir && mkdir test-dir

for (( i=0; i != 100; i++ )); do
printf "\xc0\x3b\x39\x98" > test-dir/$RANDOM
done

while /bin/true; do
printf "\xc0\x3b\x39\x98" > test-dir/$RANDOM
rm test-dir/`ls test-dir | sort -R | head -n1`
done


2008-03-15 18:49:51

by Duane Griffin

[permalink] [raw]
Subject: [PATCH] jbd: correctly unescape journal data blocks

Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping. At the moment the wrong buffer head's data
is being unescaped.

To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device. Without this patch the files will contain zeros
instead of the correct data after recovery.

Signed-off-by: Duane Griffin <[email protected]>
---
fs/jbd/recovery.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/jbd/recovery.c b/fs/jbd/recovery.c
index 2b8edf4..43bc5e5 100644
--- a/fs/jbd/recovery.c
+++ b/fs/jbd/recovery.c
@@ -478,7 +478,7 @@ static int do_one_pass(journal_t *journal,
memcpy(nbh->b_data, obh->b_data,
journal->j_blocksize);
if (flags & JFS_FLAG_ESCAPE) {
- *((__be32 *)bh->b_data) =
+ *((__be32 *)nbh->b_data) =
cpu_to_be32(JFS_MAGIC_NUMBER);
}

--
1.5.3.7


2008-03-15 18:49:53

by Duane Griffin

[permalink] [raw]
Subject: [PATCH] jbd2: correctly unescape journal data blocks

Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping. At the moment the wrong buffer head's data
is being unescaped.

To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device. Without this patch the files will contain zeros
instead of the correct data after recovery.

Signed-off-by: Duane Griffin <[email protected]>
---
fs/jbd2/recovery.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 1464113..5d0405a 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -535,7 +535,7 @@ static int do_one_pass(journal_t *journal,
memcpy(nbh->b_data, obh->b_data,
journal->j_blocksize);
if (flags & JBD2_FLAG_ESCAPE) {
- *((__be32 *)bh->b_data) =
+ *((__be32 *)nbh->b_data) =
cpu_to_be32(JBD2_MAGIC_NUMBER);
}

--
1.5.3.7


2008-03-16 00:29:54

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] jbd: correctly unescape journal data blocks

On Mar 15, 2008 18:49 +0000, Duane Griffin wrote:
> Fix a long-standing typo (predating git) that will cause data corruption
> if a journal data block needs unescaping. At the moment the wrong buffer
> head's data is being unescaped.
>
> To test this case mount a filesystem with data=journal, start creating
> and deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998),
> then pull the plug on the device. Without this patch the files will contain
> zeros instead of the correct data after recovery.
>
> Signed-off-by: Duane Griffin <[email protected]>
> ---
> fs/jbd/recovery.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/jbd/recovery.c b/fs/jbd/recovery.c
> index 2b8edf4..43bc5e5 100644
> --- a/fs/jbd/recovery.c
> +++ b/fs/jbd/recovery.c
> @@ -478,7 +478,7 @@ static int do_one_pass(journal_t *journal,
> memcpy(nbh->b_data, obh->b_data,
> journal->j_blocksize);
> if (flags & JFS_FLAG_ESCAPE) {
> - *((__be32 *)bh->b_data) =
> + *((__be32 *)nbh->b_data) =
> cpu_to_be32(JFS_MAGIC_NUMBER);
> }

Note that this would also affect filesystems larger than ~12TB, where
JFS_MAGIC_NUMBER might be the first block number in an ext2/3 indirect
block. That would cause the indirect block to be zapped and file data
in the whole block would suddenly become zero. Even worse if this is
the first block in a double-indirect or triple-indirect block, where
4MB or 16GB of the file data would suddenly become a hole. Unlikely,
but with enough monkeys it would be hit.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-03-17 15:38:33

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] jbd: correctly unescape journal data blocks

> Fix a long-standing typo (predating git) that will cause data corruption if a
> journal data block needs unescaping. At the moment the wrong buffer head's data
> is being unescaped.
>
> To test this case mount a filesystem with data=journal, start creating and
> deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then
> pull the plug on the device. Without this patch the files will contain zeros
> instead of the correct data after recovery.
>
> Signed-off-by: Duane Griffin <[email protected]>
> ---
> fs/jbd/recovery.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/jbd/recovery.c b/fs/jbd/recovery.c
> index 2b8edf4..43bc5e5 100644
> --- a/fs/jbd/recovery.c
> +++ b/fs/jbd/recovery.c
> @@ -478,7 +478,7 @@ static int do_one_pass(journal_t *journal,
> memcpy(nbh->b_data, obh->b_data,
> journal->j_blocksize);
> if (flags & JFS_FLAG_ESCAPE) {
> - *((__be32 *)bh->b_data) =
> + *((__be32 *)nbh->b_data) =
> cpu_to_be32(JFS_MAGIC_NUMBER);
> }
Good catch. Acked-by: Jan Kara <[email protected]>

Honza

--
Jan Kara <[email protected]>
SuSE CR Labs

2008-03-17 15:38:55

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] jbd2: correctly unescape journal data blocks

> Fix a long-standing typo (predating git) that will cause data corruption if a
> journal data block needs unescaping. At the moment the wrong buffer head's data
> is being unescaped.
>
> To test this case mount a filesystem with data=journal, start creating and
> deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
> pull the plug on the device. Without this patch the files will contain zeros
> instead of the correct data after recovery.
>
> Signed-off-by: Duane Griffin <[email protected]>
> ---
> fs/jbd2/recovery.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index 1464113..5d0405a 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -535,7 +535,7 @@ static int do_one_pass(journal_t *journal,
> memcpy(nbh->b_data, obh->b_data,
> journal->j_blocksize);
> if (flags & JBD2_FLAG_ESCAPE) {
> - *((__be32 *)bh->b_data) =
> + *((__be32 *)nbh->b_data) =
> cpu_to_be32(JBD2_MAGIC_NUMBER);
> }
Acked-by: Jan Kara <[email protected]>

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2008-03-17 23:56:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] jbd2: correctly unescape journal data blocks

On Sat, 15 Mar 2008 18:49:45 +0000
"Duane Griffin" <[email protected]> wrote:

> Fix a long-standing typo (predating git) that will cause data corruption if a
> journal data block needs unescaping. At the moment the wrong buffer head's data
> is being unescaped.
>
> To test this case mount a filesystem with data=journal, start creating and
> deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
> pull the plug on the device. Without this patch the files will contain zeros
> instead of the correct data after recovery.
>
> Signed-off-by: Duane Griffin <[email protected]>
> ---
> fs/jbd2/recovery.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index 1464113..5d0405a 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -535,7 +535,7 @@ static int do_one_pass(journal_t *journal,
> memcpy(nbh->b_data, obh->b_data,
> journal->j_blocksize);
> if (flags & JBD2_FLAG_ESCAPE) {
> - *((__be32 *)bh->b_data) =
> + *((__be32 *)nbh->b_data) =
> cpu_to_be32(JBD2_MAGIC_NUMBER);
> }
>

Thanks. Ted, I'll scoot this into Linus and [email protected] tomorrowish.