2009-08-25 17:44:09

by Frank Mayhar

Subject: Problem with ext4_sync_file in no-journal mode.

Our powerfail testing turned up an odd regression when using fsync() in
no-journal mode to force data to the device. We saw loss rates (both
file and data) that were much higher than the same test using ext2 (60+%
loss versus <10%). We've done some investigation and one thing that
stood out was that in the no-journal case, ext4_sync_file() was just
calling sync_inode() (and nothing else), while ext2_sync_file(), for
comparison, was also calling sync_mapping_buffers() to actually push the
data out.

I therefore hacked ext4_sync_file() to call sync_mapping_buffers() in
the no-journal case; when we reran the test we saw that the loss rate
dropped from 60+% to around 50%. While it's clear that we have more
work to do in this area, this is a significant improvement. It appears
that this was just missed when we did the no-journal work. Do you guys
concur?
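
For illustration only (the actual test harness is not shown in this thread): a minimal userspace sketch in C of the write-then-fsync() step that this kind of powerfail test relies on. The helper name write_and_fsync() and its return convention are assumptions made for this example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path, then fsync() before closing.
 * Returns 0 on success, -1 on any failure.
 */
static int write_and_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {           /* treat errors and zero-length writes as failure */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Ask the kernel to push this file's data and metadata to the device. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A test of this shape would presumably cut power some time after the call returns, then check after reboot whether the file and its contents survived.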

The other interesting bit of this is that ext4 no-journal without using
fsync() has, apparently, basically the same loss rate as ext2 with
fsync().
--
Frank Mayhar <[email protected]>
Google, Inc.



2009-08-26 16:27:36

by Jan Kara

Subject: Re: Problem with ext4_sync_file in no-journal mode.

> Our powerfail testing turned up an odd regression when using fsync() in
> no-journal mode to force data to the device. We saw loss rates (both
> file and data) that were much higher than the same test using ext2 (60+%
> loss versus <10%). We've done some investigation and one thing that
> stood out was that in the no-journal case, ext4_sync_file() was just
> calling sync_inode() (and nothing else), while ext2_sync_file(), for
> comparison, was also calling sync_mapping_buffers() to actually push the
> data out.
>
> I therefore hacked ext4_sync_file() to call sync_mapping_buffers() in
> the no-journal case; when we reran the test we saw that the loss rate
> dropped from 60+% to around 50%. While it's clear that we have more
> work to do in this area, this is a significant improvement. It appears
> that this was just missed when we did the no-journal work. Do you guys
> concur?
Well, I'm surprised sync_mapping_buffers() did anything - I believe
it's rather an error in testing. The thing is: sync_mapping_buffers()
writes buffers on private_list of mapping. In ext2, it contains all the
buffers used for indirect blocks. In ext4, there are no buffers there -
you have to call mark_buffer_dirty_inode() to put a buffer on this list
and ext4 does not do that with any buffer. So to make fsync work, you
have to call mark_buffer_dirty_inode() in __ext4_handle_dirty_metadata()
if an inode is provided. Then sync_mapping_buffers() will actually do
something.
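
The private_list behaviour described above can be modelled with a toy userspace sketch in C. The toy_* names are hypothetical stand-ins, not the kernel's structures or functions; the model only shows why a sync pass that walks the list finds nothing unless a buffer was first attached to it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; these types are NOT the kernel's. */
struct toy_buffer {
    bool dirty;
    bool on_list;                   /* already linked on a private list? */
    struct toy_buffer *next_assoc;  /* link on the mapping's private list */
};

struct toy_mapping {
    struct toy_buffer *private_list;
};

/* Stand-in for plain dirtying: marks the buffer dirty, attaches it nowhere. */
static void toy_mark_dirty(struct toy_buffer *bh)
{
    bh->dirty = true;
}

/* Stand-in for mark_buffer_dirty_inode(): dirty the buffer AND link it. */
static void toy_mark_dirty_inode(struct toy_buffer *bh, struct toy_mapping *m)
{
    bh->dirty = true;
    if (!bh->on_list) {
        bh->next_assoc = m->private_list;
        m->private_list = bh;
        bh->on_list = true;
    }
}

/*
 * Stand-in for sync_mapping_buffers(): only buffers on the private list
 * are visited. Returns how many dirty buffers were "written".
 */
static int toy_sync_mapping_buffers(struct toy_mapping *m)
{
    int written = 0;
    while (m->private_list) {
        struct toy_buffer *bh = m->private_list;
        m->private_list = bh->next_assoc;
        bh->next_assoc = NULL;
        bh->on_list = false;
        if (bh->dirty) {
            bh->dirty = false;      /* pretend it reached the device */
            written++;
        }
    }
    return written;
}
```

In this model a buffer dirtied with toy_mark_dirty() stays dirty across toy_sync_mapping_buffers(), while one dirtied with toy_mark_dirty_inode() is written out.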
BTW: the syncing code in ext4_handle_dirty_metadata() looks
suboptimal. Why do you sync each and every metadata buffer? It might be
the easiest way for directories but for regular files this is really
superfluous. There you shouldn't need anything since VFS does the syncing
for you.

> The other interesting bit of this is that ext4 no-journal without using
> fsync() has, apparently, basically the same loss rate as ext2 with
> fsync().
Isn't this the other way around? I suppose ext4 without fsync isn't
better than ext4 with fsync ;).

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2009-08-26 16:41:34

by Frank Mayhar

Subject: Re: Problem with ext4_sync_file in no-journal mode.

On Wed, 2009-08-26 at 18:27 +0200, Jan Kara wrote:
> > Our powerfail testing turned up an odd regression when using fsync() in
> > no-journal mode to force data to the device. We saw loss rates (both
> > file and data) that were much higher than the same test using ext2 (60+%
> > loss versus <10%). We've done some investigation and one thing that
> > stood out was that in the no-journal case, ext4_sync_file() was just
> > calling sync_inode() (and nothing else), while ext2_sync_file(), for
> > comparison, was also calling sync_mapping_buffers() to actually push the
> > data out.
> >
> > I therefore hacked ext4_sync_file() to call sync_mapping_buffers() in
> > the no-journal case; when we reran the test we saw that the loss rate
> > dropped from 60+% to around 50%. While it's clear that we have more
> > work to do in this area, this is a significant improvement. It appears
> > that this was just missed when we did the no-journal work. Do you guys
> > concur?
> Well, I'm surprised sync_mapping_buffers() did anything - I believe
> it's rather an error in testing. The thing is: sync_mapping_buffers()
> writes buffers on private_list of mapping. In ext2, it contains all the
> buffers used for indirect blocks. In ext4, there are no buffers there -
> you have to call mark_buffer_dirty_inode() to put a buffer on this list
> and ext4 does not do that with any buffer. So to make fsync work, you
> have to call mark_buffer_dirty_inode() in __ext4_handle_dirty_metadata()
> if an inode is provided. Then sync_mapping_buffers() will actually do
> something.

Yeah, after digging further I realized that, but be that as it may, it
did indeed make a 10% improvement overall. Why? No idea. In any event
I'll keep digging as the basic problem is still there.

> BTW: the syncing code in ext4_handle_dirty_metadata() looks
> suboptimal. Why do you sync each and every metadata buffer? It might be
> the easiest way for directories but for regular files this is really
> superfluous. There you shouldn't need anything since VFS does the syncing
> for you.

Ah, you say "VFS" but what you really mean is "generic_file_xxx_write,"
correct? Basically, at the moment it's just doing in this case what
ext2 does; it does sound like there's optimization that could be done
here, however.

> > The other interesting bit of this is that ext4 no-journal without using
> > fsync() has, apparently, basically the same loss rate as ext2 with
> > fsync().
> Isn't this the other way around? I suppose ext4 without fsync isn't
> better than ext4 with fsync ;).

That's what you would think, isn't it? However, you (and we) would be
wrong. In our testing, ext4+fsync was significantly worse than ext4
without fsync. Like, six times worse. Yes, this is a nonintuitive
result and no, I can't yet explain it.
--
Frank Mayhar <[email protected]>
Google, Inc.


2009-08-26 22:31:15

by Michael Rubin

Subject: Re: Problem with ext4_sync_file in no-journal mode.

On Wed, Aug 26, 2009 at 9:41 AM, Frank Mayhar<[email protected]> wrote:
> That's what you would think, isn't it? However, you (and we) would be
> wrong. In our testing, ext4+fsync was significantly worse than ext4
> without fsync. Like, six times worse. Yes, this is a nonintuitive
> result and no, I can't yet explain it.

Frank is referring to (ext4 with no_journal)+fsync compared to ext4
with no journal and no fsync.

With the journal everything is working as reliably as expected.

We will be publishing data for all the permutations of crashes and
power cycles we have tested as soon as we are confident in all the
data.

mrubin