2011-02-01 17:51:30

by Curt Wohlgemuth

[permalink] [raw]
Subject: ext4: Fix data corruption with multi-block writepages support

This fixes a corruption problem with the multi-block
writepages submittal change for ext4, from commit
bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc ("ext4: use bio
layer instead of buffer layer in mpage_da_submit_io").

(Note that this corruption is not present in 2.6.37 on ext4,
because the multi-block writepages function is only enabled
by a non-default mount option. With the patch below, we can
eventually remove this mount option.)

The ext4 code path to bundle multiple pages for writeback in
ext4_bio_write_page() had a bug: we should be clearing
buffer head dirty flags *before* we submit the bio, not in
the completion routine.

The patch below was tested on 2.6.37 under KVM with the
postgresql script specified by Ted Tso in
1449032be17abb69116dbc393f67ceb8bd034f92 -- without this
commit, and hence by default using the multiblock writepages
submittal code.

Without the patch, I'd hit the corruption problem about
50-70% of the time. With the patch, I executed the script
> 100 times with no corruption seen.

Also don't dereference the bio in ext4_end_bio() after the
bio_put() call.

(I also added a blank line in ext4_bio_write_page(), 'cause
my eyes constantly confused the complex 'for' conditions
from the first statement in the block.)

Signed-off-by: Curt Wohlgemuth <[email protected]>

---

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 6e0e99b..c3d490f 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -193,6 +193,7 @@ static void ext4_end_bio(struct bio *bio, int error)
struct inode *inode;
unsigned long flags;
int i;
+ sector_t bi_sector = bio->bi_sector;

BUG_ON(!io_end);
bio->bi_private = NULL;
@@ -210,9 +211,7 @@ static void ext4_end_bio(struct bio *bio, int error)
if (error)
SetPageError(page);
BUG_ON(!head);
- if (head->b_size == PAGE_CACHE_SIZE)
- clear_buffer_dirty(head);
- else {
+ if (head->b_size != PAGE_CACHE_SIZE) {
loff_t offset;
loff_t io_end_offset = io_end->offset + io_end->size;

@@ -224,7 +223,6 @@ static void ext4_end_bio(struct bio *bio, int error)
if (error)
buffer_io_error(bh);

- clear_buffer_dirty(bh);
}
if (buffer_delay(bh))
partial_write = 1;
@@ -260,7 +258,7 @@ static void ext4_end_bio(struct bio *bio, int error)
(unsigned long long) io_end->offset,
(long) io_end->size,
(unsigned long long)
- bio->bi_sector >> (inode->i_blkbits - 9));
+ bi_sector >> (inode->i_blkbits - 9));
}

/* Add the io_end to per-inode completed io list*/
@@ -383,6 +381,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,

blocksize = 1 << inode->i_blkbits;

+ BUG_ON(!PageLocked(page));
BUG_ON(PageWriteback(page));
set_page_writeback(page);
ClearPageError(page);
@@ -400,12 +399,14 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
for (bh = head = page_buffers(page), block_start = 0;
bh != head || !block_start;
block_start = block_end, bh = bh->b_this_page) {
+
block_end = block_start + blocksize;
if (block_start >= len) {
clear_buffer_dirty(bh);
set_buffer_uptodate(bh);
continue;
}
+ clear_buffer_dirty(bh);
ret = io_submit_add_bh(io, io_page, inode, wbc, bh);
if (ret) {
/*


2011-02-03 18:39:00

by Theodore Ts'o

[permalink] [raw]
Subject: Re: ext4: Fix data corruption with multi-block writepages support

Thanks, added to the ext4 patch queue.

I modified the commit description slightly to give credit to Jon
Nelson, who reported the bug and really helped by devising a
reproduceable test case. Many thanks, Jon!!

- Ted

ext4: Fix data corruption with multi-block writepages support

From: Curt Wohlgemuth <[email protected]>

This fixes a corruption problem with the multi-block
writepages submittal change for ext4, from commit
bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc ("ext4: use bio
layer instead of buffer layer in mpage_da_submit_io").

(Note that this corruption is not present in 2.6.37 on
ext4, because the corruption was detected after the
feature was merged in 2.6.37-rc1, and so it was turned
off by adding a non-default mount option,
mblk_io_submit. With this commit, which hopefully
fixes the last of the bugs with this feature, we'll be
able to turn on this performance feature by default in
2.6.38, and remove the mblk_io_submit option.)

The ext4 code path to bundle multiple pages for
writeback in ext4_bio_write_page() had a bug: we should
be clearing buffer head dirty flags *before* we submit
the bio, not in the completion routine.

The patch below was tested on 2.6.37 under KVM with the
postgresql script which was submitted by Jon Nelson as
documented in commit 1449032be1.

Without the patch, I'd hit the corruption problem about
50-70% of the time. With the patch, I executed the
script > 100 times with no corruption seen.

I also fixed a bug to make sure ext4_end_bio() doesn't
dereference the bio after the bio_put() call.

Reported-by: Jon Nelson <[email protected]>
Signed-off-by: Curt Wohlgemuth <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>

2011-02-04 22:40:49

by Matt

[permalink] [raw]
Subject: Re: ext4: Fix data corruption with multi-block writepages support

>Thanks, added to the ext4 patch queue.

>I modified the commit description slightly to give credit to Jon
>Nelson, who reported the bug and really helped by devising a
>reproduceable test case. Many thanks, Jon!!

> - Ted

So that means that the file-corruption which existed until 2.6.37-rc6
and got triggered (for me) more easily via "dm crypt: scale to
multiple CPUs"
is fixed now ?

That should give ext4 a nice speedup for >=2.6.38 :)

Could you also please add an ?

Reported-by: Matthias Bayer <jackdachef <at> gmail <dot> com >

I mainly found it through testing with the mentioned dm-crypt scaling
patch and >=2.6.36-git*

Thanks & Regards !

Matt

2011-02-07 17:45:55

by Theodore Ts'o

[permalink] [raw]
Subject: Re: ext4: Fix data corruption with multi-block writepages support

On Fri, Feb 04, 2011 at 10:40:47PM +0000, Matt wrote:
>
> So that means that the file-corruption which existed until 2.6.37-rc6
> and got triggered (for me) more easily via "dm crypt: scale to
> multiple CPUs"
> is fixed now ?

Well, a patch exists for it that will be merged into 2.6.38.

> That should give ext4 a nice speedup for >=2.6.38 :)

I'm not going to make it be the default for 2.6.38, since it's fairly
late in the -rc features. People who want it can explicitly enable it
using the mount option mblk_io_submit, though. (And let me know your
success stories! :-) I will be enabling it as the default in
2.6.39-rc1.

> Reported-by: Matthias Bayer <jackdachef <at> gmail <dot> com >

Sure!

- Ted

2011-02-07 18:29:26

by Milan Broz

[permalink] [raw]
Subject: Re: ext4: Fix data corruption with multi-block writepages support

On 02/07/2011 06:45 PM, Ted Ts'o wrote:
> On Fri, Feb 04, 2011 at 10:40:47PM +0000, Matt wrote:
>>
>> So that means that the file-corruption which existed until 2.6.37-rc6
>> and got triggered (for me) more easily via "dm crypt: scale to
>> multiple CPUs"
>> is fixed now ?
>
> Well, a patch exists for it that will be merged into 2.6.38.
>
>> That should give ext4 a nice speedup for >=2.6.38 :)
>
> I'm not going to make it be the default for 2.6.38, since it's fairly
> late in the -rc features. People who want it can explicitly enable it
> using the mount option mblk_io_submit, though. (And let me know your
> success stories! :-) I will be enabling it as the default in
> 2.6.39-rc1.

So it was ext4 only bug in ext4_end_bio(),
dm-crypt per-cpu code was just trigger here, right?

Milan

2011-02-07 18:44:14

by Matt

[permalink] [raw]
Subject: Re: ext4: Fix data corruption with multi-block writepages support

On Mon, Feb 7, 2011 at 6:29 PM, Milan Broz <[email protected]> wrote:
> On 02/07/2011 06:45 PM, Ted Ts'o wrote:
>> On Fri, Feb 04, 2011 at 10:40:47PM +0000, Matt wrote:
>>>
>>> So that means that the file-corruption which existed until 2.6.37-rc6
>>> and got triggered (for me) more easily via "dm crypt: scale to
>>> multiple CPUs"
>>> is fixed now ?
>>
>> Well, a patch exists for it that will be merged into 2.6.38.
>>
>>> That should give ext4 a nice speedup for >=2.6.38 :)
>>
>> I'm not going to make it be the default for 2.6.38, since it's fairly
>> late in the -rc features. ?People who want it can explicitly enable it
>> using the mount option mblk_io_submit, though. ?(And let me know your
>> success stories! ?:-) I will be enabling it as the default in
>> 2.6.39-rc1.
>
> So it was ext4 only bug in ext4_end_bio(),
> dm-crypt per-cpu code was just trigger here, right?
>
> Milan
>

Hi Milan,

Well, that was at least the experience that I made

ext4: after Ted had disabled support for multiple page-io submission

I observed no data-corruption anymore (it had only appeared on the
system-partition, /home - where ext4 also is used or on my backup
partitions there was also no problem as far as I can tell)

XFS: no corruption observed

reiserfs: I can't say for sure since I'm only using it on my /boot partition :P

for other filesystems I can't say anything - I didn't use additional
ones at that time

Regards

Matt

2011-02-07 18:56:30

by Matt

[permalink] [raw]
Subject: Re: ext4: Fix data corruption with multi-block writepages support

On Mon, Feb 7, 2011 at 5:45 PM, Ted Ts'o <[email protected]> wrote:
> On Fri, Feb 04, 2011 at 10:40:47PM +0000, Matt wrote:
>>
>> So that means that the file-corruption which existed until 2.6.37-rc6
>> and got triggered (for me) more easily via "dm crypt: scale to
>> multiple CPUs"
>> is fixed now ?
>
> Well, a patch exists for it that will be merged into 2.6.38.
>
>> That should give ext4 a nice speedup for >=2.6.38 :)
>
> I'm not going to make it be the default for 2.6.38, since it's fairly
> late in the -rc features. ?People who want it can explicitly enable it
> using the mount option mblk_io_submit, though. ?(And let me know your
> success stories! ?:-) I will be enabling it as the default in
> 2.6.39-rc1.
>

Hi Ted,

I guess it should be save to enable it with 2.6.37, dm-crypt multi-cpu
patch and the following patch ?

"ext4: Fix data corruption with multi-block writepages support"
(of course that's the minimum - it would be better to pull in the ext4
changes for 2.6.38)


For a short time I had it activated (via additional) mblk_io_submit
mount-command on my portage-partition (where the portage-ball of my
Gentoo system is).
I was curious to see what messages I would get and wondered why there
was nothing about mballoc mentioned

If I recall correctly there were always messages in the past, like:

EXT4-fs: delayed allocation enabled
EXT4-fs: file extents enabled
EXT4-fs: mballoc enabled

these are from 2.6.28 -

I'm only getting:

EXT4-fs (dm-3): mounted filesystem with ordered data mode.

or

EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts:
commit=60,barrier=1

(I like to set the barriers / flushes explicitly).

Sorry if I didn't follow development but these messages were kind of
more and more silenced ?


Thanks !


>> Reported-by: Matthias Bayer <jackdachef <at> gmail <dot> com >
>
> Sure!

Thanks !


>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
>

Regards

Matt

2011-02-07 20:51:42

by Milan Broz

[permalink] [raw]
Subject: Re: ext4: Fix data corruption with multi-block writepages support

On 02/07/2011 09:44 PM, Ted Ts'o wrote:
> So the fact that we found and fixed an ext4 bug that was triggered by
> dm_crypt should not be taken as a statement (one way or the other)
> that dm_crypt is Bug-Free(tm). :-)

Really? Sigh. ;-)

(There is a rule that if dm-crypt+XFS bug appears, the problem
is always in dm-crypt. So I am quite surprised that this time there
was NO bug in dm-crypt... yet :-)

Anyway, I would like to know if still some problem remains...

Thanks,
Milan

2011-02-07 20:44:24

by Theodore Ts'o

[permalink] [raw]
Subject: Re: ext4: Fix data corruption with multi-block writepages support

On Mon, Feb 07, 2011 at 07:29:26PM +0100, Milan Broz wrote:
>
> So it was ext4 only bug in ext4_end_bio(),
> dm-crypt per-cpu code was just trigger here, right?

There appeared to be two bugs that people were discussing on that
particular dm_crypt mail thread. Some people were complaining about
issues with dm_crypt even when ext4 was not involved.

So I think it's fair to say that there was definitely _a_ ext4 bug
which was most easily seen when dm_crypt was in play, but which was
definitely not dm_crypt specific (it was possible to see it on an
hdd-only system, but the workload was much more severe). In any case,
as soon as the problem was found, we disabled the ext4 optimization
in 2.6.37-rc5.

So the fact that we found and fixed an ext4 bug that was triggered by
dm_crypt should not be taken as a statement (one way or the other)
that dm_crypt is Bug-Free(tm). :-)

- Ted