2008-04-08 12:11:52

by Jan Kara

[permalink] [raw]
Subject: Ordered mode rewrite patch

Hello,

attached is a jumbo patch that reverses locking order of transaction
start and page lock in ext3 and rewrites handling of ordered data mode in
JBD and ext3. Note that the patch will break compilation of ext4 and OCFS2.
The patch survives LTP run on my test machine so it shouldn't eat your data
immediately but bugs are of course possible...
I'm very interested in any results (both positive and negative) you could
get with it :). Thanks for testing it.

Honza

PS: CCing also linux-ext4 list in case there are some other interested
testers. Next on my todo list is to port this for ext4...
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (657.00 B)
ext3+jbd-2.6.25-ordered_mode.diff (55.50 kB)
Download all attachments

2008-04-08 12:19:41

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

On Tue, Apr 08, 2008 at 02:11:49PM +0200, Jan Kara wrote:
> - } else if (!buffer_mapped(bh) && buffer_dirty(bh)) {
> + } else if (!buffer_mapped(bh) && buffer_dirty(bh)
> + && !wbc->skip_unmapped) {

As mentioned last time unmapped buffer shouldn't happen anymore with
filesystem updated to have a ->page_mkwrite operations. As you've
included the ext3 patch in yours the above condition shoujldn't git
anymore.


2008-04-08 15:59:23

by Josef Bacik

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

On Tue, Apr 08, 2008 at 02:11:49PM +0200, Jan Kara wrote:
> Hello,
>
> attached is a jumbo patch that reverses locking order of transaction
> start and page lock in ext3 and rewrites handling of ordered data mode in
> JBD and ext3. Note that the patch will break compilation of ext4 and OCFS2.
> The patch survives LTP run on my test machine so it shouldn't eat your data
> immediately but bugs are of course possible...
> I'm very interested in any results (both positive and negative) you could
> get with it :). Thanks for testing it.
>
> Honza
>
> PS: CCing also linux-ext4 list in case there are some other interested
> testers. Next on my todo list is to port this for ext4...
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

Just glancing over it everything looks to be ok, but I still can't compile with
ext3/jbd being modules :). I had to EXPORT_SYMBOL do_writepages,
__filemap_fdatawrite_range and all the jbd things you added to make it build
properly. I will do some testing with it while I'm in class.

Josef

2008-04-08 20:21:29

by Nathan Grennan

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

Jan Kara wrote:
> Hello,
>
> attached is a jumbo patch that reverses locking order of transaction
> start and page lock in ext3 and rewrites handling of ordered data mode in
> JBD and ext3. Note that the patch will break compilation of ext4 and OCFS2.
> The patch survives LTP run on my test machine so it shouldn't eat your data
> immediately but bugs are of course possible...
> I'm very interested in any results (both positive and negative) you could
> get with it :). Thanks for testing it.
>
> Honza
>
> PS: CCing also linux-ext4 list in case there are some other interested
> testers. Next on my todo list is to port this for ext4...
>
I am trying to apply the patch to Fedora 8's kernel-2.6.24.4-64, and it
fails on all the files.

+ '[' '!' -f
/home/builder/rpmbuild/SOURCES/kernel-2.6.24.4/ext3-jbd-2.6.25-ordered_mode.patch
']'
+ case "$patch" in
+ patch -p1 -F1 -s
1 out of 26 hunks FAILED -- saving rejects to file fs/ext3/inode.c.rej
1 out of 11 hunks FAILED -- saving rejects to file fs/jbd/commit.c.rej
1 out of 12 hunks FAILED -- saving rejects to file fs/jbd/transaction.c.rej
1 out of 1 hunk FAILED -- saving rejects to file
include/linux/writeback.h.rej


2008-04-08 21:10:53

by Jan Kara

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

On Tue 08-04-08 11:47:04, Josef Bacik wrote:
> On Tue, Apr 08, 2008 at 02:11:49PM +0200, Jan Kara wrote:
> > Hello,
> >
> > attached is a jumbo patch that reverses locking order of transaction
> > start and page lock in ext3 and rewrites handling of ordered data mode in
> > JBD and ext3. Note that the patch will break compilation of ext4 and OCFS2.
> > The patch survives LTP run on my test machine so it shouldn't eat your data
> > immediately but bugs are of course possible...
> > I'm very interested in any results (both positive and negative) you could
> > get with it :). Thanks for testing it.
> >
> > Honza
> >
> > PS: CCing also linux-ext4 list in case there are some other interested
> > testers. Next on my todo list is to port this for ext4...
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR
>
> Just glancing over it everything looks to be ok, but I still can't compile with
> ext3/jbd being modules :). I had to EXPORT_SYMBOL do_writepages,
> __filemap_fdatawrite_range and all the jbd things you added to make it build
> properly. I will do some testing with it while I'm in class.
Strange, I've already been fixing these. Aww, I thought diff contained
all the changes and obviously the branch in git contained some further
fixes :(. Hopefully I've at least committed changes and will be able to
find fixes among git objects...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-04-08 21:16:42

by Jan Kara

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

On Tue 08-04-08 13:12:20, Nathan Grennan wrote:
> Jan Kara wrote:
>> Hello,
>>
>> attached is a jumbo patch that reverses locking order of transaction
>> start and page lock in ext3 and rewrites handling of ordered data mode in
>> JBD and ext3. Note that the patch will break compilation of ext4 and
>> OCFS2.
>> The patch survives LTP run on my test machine so it shouldn't eat your
>> data
>> immediately but bugs are of course possible...
>> I'm very interested in any results (both positive and negative) you
>> could
>> get with it :). Thanks for testing it.
>>
>> Honza
>>
>> PS: CCing also linux-ext4 list in case there are some other interested
>> testers. Next on my todo list is to port this for ext4...
>>
> I am trying to apply the patch to Fedora 8's kernel-2.6.24.4-64, and it
> fails on all the files.
>
> + '[' '!' -f
> /home/builder/rpmbuild/SOURCES/kernel-2.6.24.4/ext3-jbd-2.6.25-ordered_mode.patch
> ']'
> + case "$patch" in
> + patch -p1 -F1 -s
> 1 out of 26 hunks FAILED -- saving rejects to file fs/ext3/inode.c.rej
> 1 out of 11 hunks FAILED -- saving rejects to file fs/jbd/commit.c.rej
> 1 out of 12 hunks FAILED -- saving rejects to file fs/jbd/transaction.c.rej
> 1 out of 1 hunk FAILED -- saving rejects to file
> include/linux/writeback.h.rej
Hmm, strange so many chunks failed but anyway, the patch is against
recent Linus's git - I think 2.6.25-rc7 will do just fine.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-04-09 14:48:32

by Jan Kara

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

On Tue 08-04-08 08:19:37, Christoph Hellwig wrote:
> On Tue, Apr 08, 2008 at 02:11:49PM +0200, Jan Kara wrote:
> > - } else if (!buffer_mapped(bh) && buffer_dirty(bh)) {
> > + } else if (!buffer_mapped(bh) && buffer_dirty(bh)
> > + && !wbc->skip_unmapped) {
>
> As mentioned last time unmapped buffer shouldn't happen anymore with
> filesystem updated to have a ->page_mkwrite operations. As you've
> included the ext3 patch in yours the above condition shoujldn't git
> anymore.
True. I've used wbc->skip_unmapped to detect inside
ext3_ordered_writepage() that we are committing a transaction and therefore
we should not add inode into the running transaction's list. But later
I've realized that we never need to add inode into transaction's list
from ext3_ordered_writepage() once we have ext3_page_mkwrite(). So now I
don't need wbc->skip_unmapped at all and I've compeletely removed it.
Thanks for pointing this out.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-04-11 18:51:10

by Josef Bacik

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

On Tue, Apr 08, 2008 at 02:11:49PM +0200, Jan Kara wrote:
> Hello,
>
> attached is a jumbo patch that reverses locking order of transaction
> start and page lock in ext3 and rewrites handling of ordered data mode in
> JBD and ext3. Note that the patch will break compilation of ext4 and OCFS2.
> The patch survives LTP run on my test machine so it shouldn't eat your data
> immediately but bugs are of course possible...
> I'm very interested in any results (both positive and negative) you could
> get with it :). Thanks for testing it.
>
> Honza
>

Hey Jan,

I just hit a problem with your patch. In journal_destroy() we do a
iput(journal->j_inode) and then kfree the journal, so when the iput comes back
into journal_release_jbd_inode we are doing a use after free which in my case
resulted in a panic. I was going to fix it but I figure since this is still in
transit you would have to just rewrite it so I'm not going to attach one, just
giving you a heads up. Let me know if my explanation wasn't clear enough.
Thanks much,

Josef

2008-04-14 17:47:24

by Jan Kara

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

Hi Josef,

On Fri 11-04-08 14:42:58, Josef Bacik wrote:
> On Tue, Apr 08, 2008 at 02:11:49PM +0200, Jan Kara wrote:
> > Hello,
> >
> > attached is a jumbo patch that reverses locking order of transaction
> > start and page lock in ext3 and rewrites handling of ordered data mode in
> > JBD and ext3. Note that the patch will break compilation of ext4 and OCFS2.
> > The patch survives LTP run on my test machine so it shouldn't eat your data
> > immediately but bugs are of course possible...
> > I'm very interested in any results (both positive and negative) you could
> > get with it :). Thanks for testing it.
> >
> > Honza
> I just hit a problem with your patch. In journal_destroy() we do a
> iput(journal->j_inode) and then kfree the journal, so when the iput comes back
> into journal_release_jbd_inode we are doing a use after free which in my case
> resulted in a panic. I was going to fix it but I figure since this is still in
> transit you would have to just rewrite it so I'm not going to attach one, just
> giving you a heads up. Let me know if my explanation wasn't clear enough.
Thanks for letting me know. But I don't quite get it:
journal_destroy() does:
if (journal->j_inode)
iput(journal->j_inode);
if (journal->j_revoke)
journal_destroy_revoke(journal);
kfree(journal->j_wbuf);
kfree(journal);
}

iput() will eventually call generic_forget_inode() which calls
clear_inode() in which we do journal_release_jbd_inode(). But at that
moment journal still exists to I don't see any use after free... But
possibly there could be some other inode in the inode list but which one is
it?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-04-14 17:59:13

by Josef Bacik

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

On Mon, Apr 14, 2008 at 07:47:22PM +0200, Jan Kara wrote:
> Hi Josef,
>
> On Fri 11-04-08 14:42:58, Josef Bacik wrote:
> > On Tue, Apr 08, 2008 at 02:11:49PM +0200, Jan Kara wrote:
> > > Hello,
> > >
> > > attached is a jumbo patch that reverses locking order of transaction
> > > start and page lock in ext3 and rewrites handling of ordered data mode in
> > > JBD and ext3. Note that the patch will break compilation of ext4 and OCFS2.
> > > The patch survives LTP run on my test machine so it shouldn't eat your data
> > > immediately but bugs are of course possible...
> > > I'm very interested in any results (both positive and negative) you could
> > > get with it :). Thanks for testing it.
> > >
> > > Honza
> > I just hit a problem with your patch. In journal_destroy() we do a
> > iput(journal->j_inode) and then kfree the journal, so when the iput comes back
> > into journal_release_jbd_inode we are doing a use after free which in my case
> > resulted in a panic. I was going to fix it but I figure since this is still in
> > transit you would have to just rewrite it so I'm not going to attach one, just
> > giving you a heads up. Let me know if my explanation wasn't clear enough.
> Thanks for letting me know. But I don't quite get it:
> journal_destroy() does:
> if (journal->j_inode)
> iput(journal->j_inode);
> if (journal->j_revoke)
> journal_destroy_revoke(journal);
> kfree(journal->j_wbuf);
> kfree(journal);
> }
>
> iput() will eventually call generic_forget_inode() which calls
> clear_inode() in which we do journal_release_jbd_inode(). But at that
> moment journal still exists to I don't see any use after free... But
> possibly there could be some other inode in the inode list but which one is
> it?
>

Hmm sorry in my pretend world iput goes off and does its thing and everything
keeps going... You are right it cant be that inode, but I definitely got a
panic so there must be some other inode in the inode list that is getting freed
after the journal is freed. Thanks much,

Josef

2008-04-25 14:20:30

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: Ordered mode rewrite patch

Hi Jan,

On Tue, Apr 08, 2008 at 02:11:49PM +0200, Jan Kara wrote:
> Hello,
>
> attached is a jumbo patch that reverses locking order of transaction
> start and page lock in ext3 and rewrites handling of ordered data mode in
> JBD and ext3. Note that the patch will break compilation of ext4 and OCFS2.
> The patch survives LTP run on my test machine so it shouldn't eat your data
> immediately but bugs are of course possible...
> I'm very interested in any results (both positive and negative) you could
> get with it :). Thanks for testing it.
>
> Honza
>
> PS: CCing also linux-ext4 list in case there are some other interested
> testers. Next on my todo list is to port this for ext4...

No luck for my testcase, similar fsync behaviour taking several seconds:

vim D 0000000000000000 0 4088 4036
ffff81017a143da8 0000000000000086 0000000000000000 ffffffff80236808
ffff81022f272830 ffff81022f1ca730 ffff81022f272a70 0000000600000001
00000000ffffffff 0000000000000003 0000000000000000 0000000000000000
Call Trace:
[<ffffffff80236808>] lock_timer_base+0x26/0x4b
[<ffffffff802f8c00>] log_wait_commit+0x9f/0xed
[<ffffffff8023f7b1>] autoremove_wake_function+0x0/0x2e
[<ffffffff802f4dbf>] journal_stop+0x165/0x18d
[<ffffffff8029a7d9>] __writeback_single_inode+0x17f/0x29d
[<ffffffff8023f7b1>] autoremove_wake_function+0x0/0x2e
[<ffffffff8029b10a>] sync_inode+0x24/0x32
[<ffffffff802e5602>] ext3_sync_file+0x8a/0x9c
[<ffffffff8029d532>] do_fsync+0x52/0xa4
[<ffffffff8029d5a7>] __do_fsync+0x23/0x36
[<ffffffff8020b0ab>] system_call_after_swapgs+0x7b/0x80