LinuxLists.cc - [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

"Darrick J. Wong" <[email protected]> writes:

> To assess the performance impact of stable page writes, I moved to a disk that
> doesn't have DIF support so that I could measure just the impact of waiting for
> writeback. I first ran wac with 64 threads madly scribbling on a 64k file and
> saw about a 12 percent performance decrease. I then reran the wac program with
> 64 threads and a 64MB file and saw about the same performance numbers. As I
> suspected, the patchset only seems to impact workloads that rewrite the same
> memory page frequently.
>
> I am still chasing down what exactly is broken in ext3. data=writeback mode
> passes with no failures. data=ordered, however, does not pass; my current
> suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> page_mkclean to kick the userspace programs off the page before writing it.
>
> Per various comments regarding v3 of this patchset, I've integrated his
> suggestions, reworked the patch descriptions to make it clearer which ones
> touch all the filesystems and which ones are to fix remaining holes in specific
> filesystems, and expanded the scope of filesystems that got fixed.
>
> As always, questions and comments are welcome; and thank you to all the
> previous reviewers of this patchset. I am also soliciting people's opinions on
> whether or not these patches could go upstream for .40.

I'd like to know what state those patches are in. Waiting on pages under
writeback makes things slower, as you mentioned (I guess it would be more
noticeable on slower devices, like the ones FAT is typically used on). And
I think currently it doesn't help anything other than the blk-integrity
stuff (without some other technique, it doesn't help FS consistency)?

So, why is this locking always enabled? I think it would be better to
enable it only if blk-integrity is enabled.

If it were something more sophisticated, even if more complex (e.g. using
a copy-on-write technique), I would agree to always enabling it, though.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-05-10 12:38:33

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Tue 10-05-11 10:59:15, OGAWA Hirofumi wrote:
> "Darrick J. Wong" <[email protected]> writes:
>
> > To assess the performance impact of stable page writes, I moved to a disk that
> > doesn't have DIF support so that I could measure just the impact of waiting for
> > writeback. I first ran wac with 64 threads madly scribbling on a 64k file and
> > saw about a 12 percent performance decrease. I then reran the wac program with
> > 64 threads and a 64MB file and saw about the same performance numbers. As I
> > suspected, the patchset only seems to impact workloads that rewrite the same
> > memory page frequently.
> >
> > I am still chasing down what exactly is broken in ext3. data=writeback mode
> > passes with no failures. data=ordered, however, does not pass; my current
> > suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> > page_mkclean to kick the userspace programs off the page before writing it.
> >
> > Per various comments regarding v3 of this patchset, I've integrated his
> > suggestions, reworked the patch descriptions to make it clearer which ones
> > touch all the filesystems and which ones are to fix remaining holes in specific
> > filesystems, and expanded the scope of filesystems that got fixed.
> >
> > As always, questions and comments are welcome; and thank you to all the
> > previous reviewers of this patchset. I am also soliciting people's opinions on
> > whether or not these patches could go upstream for .40.
>
> I'd like to know those patches are on what state. Waiting in writeback
> page makes slower, like you mentioned it (I guess it would more
> noticeable if device was slower that like FAT uses). And I think
> currently it doesn't help anything others for blk-integrity stuff
> (without other technic, it doesn't help FS consistency)?
>
> So, why is this locking stuff enabled always? I think it would be better
> to enable only if blk-integrity stuff was enabled.
>
> If it was more sophisticate but more complex stuff (e.g. use
> copy-on-write technic for it), I would agree always enable though.
Well, also software RAID generally needs this feature (so that parity
information / mirror can be properly kept in sync). Not that I'd advocate
that this feature must be always enabled, it's just that there are also
other users besides blk-integrity.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
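
To illustrate the software RAID point above, here is a small userspace sketch (not md code) of how XOR parity can go stale when a data block changes after the parity was computed but before the data reaches the disk. The names `xor_parity`, `parity_consistent`, `raid_write` and `BLOCK_SIZE` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8  /* tiny block size, for illustration only */

/* XOR two data blocks into a parity block, RAID-5 style. */
static void xor_parity(const unsigned char *d0, const unsigned char *d1,
                       unsigned char *parity)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        parity[i] = d0[i] ^ d1[i];
}

/* Return 1 if parity still matches the two data blocks, 0 otherwise. */
static int parity_consistent(const unsigned char *d0, const unsigned char *d1,
                             const unsigned char *parity)
{
    unsigned char expect[BLOCK_SIZE];

    xor_parity(d0, d1, expect);
    return memcmp(expect, parity, BLOCK_SIZE) == 0;
}

/*
 * Model one stripe write. If modify_in_flight is nonzero, the first data
 * block is changed after parity was computed, which is what an unstable
 * page allows. Returns 1 if data and parity on "disk" agree, 0 if not.
 */
static int raid_write(int modify_in_flight)
{
    unsigned char d0[BLOCK_SIZE] = "AAAAAAA";
    unsigned char d1[BLOCK_SIZE] = "BBBBBBB";
    unsigned char parity[BLOCK_SIZE];

    xor_parity(d0, d1, parity);      /* RAID layer computes parity */
    if (modify_in_flight)
        d0[0] = 'Z';                 /* userspace scribbles on the page */
    return parity_consistent(d0, d1, parity);
}
```

With a stable page, `raid_write(0)` leaves data and parity in agreement; with the in-flight modification, `raid_write(1)` does not, so a later rebuild from parity would reconstruct the wrong bytes.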

2011-05-10 12:41:25

Subject: Re: [PATCH 2/7] fs: block_page_mkwrite should wait for writeback to finish

On Mon 09-05-11 16:03:34, Darrick J. Wong wrote:
> For filesystems such as nilfs2 and xfs that use block_page_mkwrite, modify that
> function to wait for pending writeback before allowing the page to become
> writable. This is needed to stabilize pages during writeback for those two
> filesystems.
>
> Signed-off-by: Darrick J. Wong <[email protected]>
> ---
> fs/buffer.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index a08bb8e..cf9a795 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2361,6 +2361,7 @@ block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
> if (!ret)
> ret = block_commit_write(page, 0, end);
>
> + wait_on_page_writeback(page);
Not that it matters much but it would seem more logical to me if we
waited only in not-error case (i.e. after the error handling below).

Honza
> if (unlikely(ret)) {
> unlock_page(page);
> if (ret == -ENOMEM)
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-05-10 12:51:29

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Mon 09-05-11 16:03:18, Darrick J. Wong wrote:
> I am still chasing down what exactly is broken in ext3. data=writeback mode
> passes with no failures. data=ordered, however, does not pass; my current
> suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> page_mkclean to kick the userspace programs off the page before writing it.
Yes, ext3 in data=ordered mode writes pages from
journal_commit_transaction() via submit_bh() without clearing page dirty
bits thus page_mkclean() is not called for these pages. Frankly, do you
really want to bother with adding support for ext2 and ext3? People can use
ext4 as a fs driver when they want to start using blk-integrity support.
Especially ext2 patch looks really painful and just from a quick look I can
see code e.g. in fs/ext2/namei.c which isn't handled by your patch yet.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-05-10 13:13:08

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Jan Kara <[email protected]> writes:

>> I'd like to know those patches are on what state. Waiting in writeback
>> page makes slower, like you mentioned it (I guess it would more
>> noticeable if device was slower that like FAT uses). And I think
>> currently it doesn't help anything others for blk-integrity stuff
>> (without other technic, it doesn't help FS consistency)?
>>
>> So, why is this locking stuff enabled always? I think it would be better
>> to enable only if blk-integrity stuff was enabled.
>>
>> If it was more sophisticate but more complex stuff (e.g. use
>> copy-on-write technic for it), I would agree always enable though.
> Well, also software RAID generally needs this feature (so that parity
> information / mirror can be properly kept in sync). Not that I'd advocate
> that this feature must be always enabled, it's just that there are also
> other users besides blk-integrity.

I see. So a lot of block layer code sounds broken in corner cases? If so,
I feel even more strongly that this approach should be a temporary
workaround, and that we should move to another, less-blocking approach.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-05-10 13:29:59

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Tue 10-05-11 22:12:54, OGAWA Hirofumi wrote:
> Jan Kara <[email protected]> writes:
>
> >> I'd like to know those patches are on what state. Waiting in writeback
> >> page makes slower, like you mentioned it (I guess it would more
> >> noticeable if device was slower that like FAT uses). And I think
> >> currently it doesn't help anything others for blk-integrity stuff
> >> (without other technic, it doesn't help FS consistency)?
> >>
> >> So, why is this locking stuff enabled always? I think it would be better
> >> to enable only if blk-integrity stuff was enabled.
> >>
> >> If it was more sophisticate but more complex stuff (e.g. use
> >> copy-on-write technic for it), I would agree always enable though.
> > Well, also software RAID generally needs this feature (so that parity
> > information / mirror can be properly kept in sync). Not that I'd advocate
> > that this feature must be always enabled, it's just that there are also
> > other users besides blk-integrity.
>
> I see. So many block layer stuff sounds like broken on corner case? If
> so, I more feel this approach should be temporary workaround, and should
> use another less-blocking approach.
Not many but some... The less-blocking alternative is to do
copy-out before a page is submitted for IO (or various middle ground
alternatives of doing sometimes copyout, sometimes blocking...). That costs
some performance as well. We talked about it at LSF and the approach
Darrick is implementing was considered the least intrusive. There's really
no way to fix these corner cases and keep performance. But indeed a plain
SATA drive or a USB stick doesn't need stable pages, so it wouldn't need to
pay the cost. So it would be beneficial if the underlying block device
propagated whether it needs stable writes or not, and the filesystem could
turn on stable pages accordingly.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
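
The idea of letting the block device say whether it needs stable writes could look roughly like the userspace model below. This is only a sketch under assumptions: `struct fake_bdev`, `struct fake_page`, the `needs_stable_writes` flag and both helper functions are hypothetical and do not correspond to anything in the patchset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are real kernel structures. */
struct fake_bdev {
    bool needs_stable_writes;   /* e.g. set for DIF or software RAID */
};

struct fake_page {
    bool under_writeback;
    unsigned int waits;         /* how often a writer had to block */
};

/* Model of waiting for I/O completion: the page leaves writeback. */
static void fake_wait_on_writeback(struct fake_page *page)
{
    if (page->under_writeback) {
        page->waits++;
        page->under_writeback = false;
    }
}

/* Block only when the backing device asked for stable pages. */
static void fake_wait_for_stable_page(const struct fake_bdev *bdev,
                                      struct fake_page *page)
{
    if (bdev->needs_stable_writes)
        fake_wait_on_writeback(page);
}
```

In this model a writer to a page on a plain disk never blocks, while the same write on a device with the flag set waits once for the in-flight I/O.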

2011-05-10 13:36:22

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Tue, May 10, 2011 at 10:59:15AM +0900, OGAWA Hirofumi wrote:
> I'd like to know those patches are on what state. Waiting in writeback
> page makes slower, like you mentioned it (I guess it would more
> noticeable if device was slower that like FAT uses). And I think
> currently it doesn't help anything others for blk-integrity stuff
> (without other technic, it doesn't help FS consistency)?

It only makes things slower if we rewrite a region in a file that
is currently undergoing writeback. I'd be interested to know about
real life applications doing that, and if they really are badly affected
we should help them to work around that in userspace, e.g. by adding
an fadvise "will rewrite" call that might be used to never write back
those regions without an explicit fsync call.

2011-05-10 13:46:23

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Jan Kara <[email protected]> writes:

>> I see. So many block layer stuff sounds like broken on corner case? If
>> so, I more feel this approach should be temporary workaround, and should
>> use another less-blocking approach.
> Not many but some... The alternative to less blocking approach is to do
> copy-out before a page is submitted for IO (or various middle ground
> alternatives of doing sometimes copyout, sometimes blocking...). That costs
> some performance as well. We talked about it at LSF and the approach
> Darrick is implementing was considered the least intrusive. There's really
> no way to fix these corner cases and keep performance.

You already considered, to copy only if page was writeback (like
copy-on-write). I.e. if page is on I/O, copy, then switch the page for
writing new data.

Yes, it is complex. But I think blocking and overhead is minimum, and
this can be used as infrastructure for copy-on-write FS.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-05-10 13:52:19

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Christoph Hellwig <[email protected]> writes:

> On Tue, May 10, 2011 at 10:59:15AM +0900, OGAWA Hirofumi wrote:
>> I'd like to know those patches are on what state. Waiting in writeback
>> page makes slower, like you mentioned it (I guess it would more
>> noticeable if device was slower that like FAT uses). And I think
>> currently it doesn't help anything others for blk-integrity stuff
>> (without other technic, it doesn't help FS consistency)?
>
> It only makes things slower if we rewrite a region in a file that is
> currently undergoing writeback. I'd be interested to know about real
> life applications doing that, and if they really are badly affect we
> should help them to work around that in userspace, e.g. by adding a
> fadvice will rewrite call that might be used to never write back that
> regions without an explicit fsync call.

Doesn't it also affect reallocated blocks, and metadata too?

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-05-10 14:05:48

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

OGAWA Hirofumi <[email protected]> writes:

> Jan Kara <[email protected]> writes:
>
>>> I see. So many block layer stuff sounds like broken on corner case? If
>>> so, I more feel this approach should be temporary workaround, and should
>>> use another less-blocking approach.
>> Not many but some... The alternative to less blocking approach is to do
>> copy-out before a page is submitted for IO (or various middle ground
>> alternatives of doing sometimes copyout, sometimes blocking...). That costs
>> some performance as well. We talked about it at LSF and the approach
>> Darrick is implementing was considered the least intrusive. There's really
>> no way to fix these corner cases and keep performance.
>
> You already considered, to copy only if page was writeback (like
> copy-on-write). I.e. if page is on I/O, copy, then switch the page for
> writing new data.

missed question mark in here.

Did you already consider, to copy only if page was writeback (like
copy-on-write)? I.e. if page is on I/O, copy, then switch the page for
writing new data.

> Yes, it is complex. But I think blocking and overhead is minimum, and
> this can be used as infrastructure for copy-on-write FS.
--
OGAWA Hirofumi <[email protected]>

2011-05-10 14:49:49

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Tue 10-05-11 22:52:10, OGAWA Hirofumi wrote:
> Christoph Hellwig <[email protected]> writes:
>
> > On Tue, May 10, 2011 at 10:59:15AM +0900, OGAWA Hirofumi wrote:
> >> I'd like to know those patches are on what state. Waiting in writeback
> >> page makes slower, like you mentioned it (I guess it would more
> >> noticeable if device was slower that like FAT uses). And I think
> >> currently it doesn't help anything others for blk-integrity stuff
> >> (without other technic, it doesn't help FS consistency)?
> >
> > It only makes things slower if we rewrite a region in a file that is
> > currently undergoing writeback. I'd be interested to know about real
> > life applications doing that, and if they really are badly affect we
> > should help them to work around that in userspace, e.g. by adding a
> > fadvice will rewrite call that might be used to never write back that
> > regions without an explicit fsync call.
>
> Isn't it reallocated blocks too, and metadata too?
Reallocated blocks - not really. For a block to be freed it cannot be
under writeback and when it's freed no writeback is started. For metadata -
yes. But ext3, ext4, xfs, btrfs have to avoid modifying metadata under
writeback anyway (because of journalling / COW constraints) and thus they
don't care. For ext2 or vfat it's a different story. But as I wrote to
Darrick, I'm not sure about vfat but for ext2 and similar legacy
filesystems, I'd rather let them live with their unstable pages under IO ;)
because I see a limited use for that.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-05-10 14:54:25

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Tue 10-05-11 23:05:41, OGAWA Hirofumi wrote:
> OGAWA Hirofumi <[email protected]> writes:
>
> > Jan Kara <[email protected]> writes:
> >
> >>> I see. So many block layer stuff sounds like broken on corner case? If
> >>> so, I more feel this approach should be temporary workaround, and should
> >>> use another less-blocking approach.
> >> Not many but some... The alternative to less blocking approach is to do
> >> copy-out before a page is submitted for IO (or various middle ground
> >> alternatives of doing sometimes copyout, sometimes blocking...). That costs
> >> some performance as well. We talked about it at LSF and the approach
> >> Darrick is implementing was considered the least intrusive. There's really
> >> no way to fix these corner cases and keep performance.
> >
> > You already considered, to copy only if page was writeback (like
> > copy-on-write). I.e. if page is on I/O, copy, then switch the page for
> > writing new data.
>
> missed question mark in here.
>
> Did you already consider, to copy only if page was writeback (like
> copy-on-write)? I.e. if page is on I/O, copy, then switch the page for
> writing new data.
Yes, that was considered as well. We'd have to essentially migrate the
page that is under writeback and should be written to. You are going to pay
the cost of page allocation, copy, increased memory & cache pressure.
Depending on your backing storage and workload this may or may not be better
than waiting for IO...

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR
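
The copy-instead-of-block idea discussed above can be modelled in userspace as follows. This is a sketch, not kernel code: `struct fake_slot`, `fake_write_byte_cow` and `FAKE_PAGE_SIZE` are invented for illustration, and real page migration involves far more (mappings, reference counts, locking). It does show the costs mentioned: one allocation, one copy, and two buffers alive at the same time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_PAGE_SIZE 16   /* tiny, for illustration only */

/* Hypothetical model of a page-cache slot; not a kernel structure. */
struct fake_slot {
    unsigned char *data;        /* buffer currently visible to writers */
    unsigned char *in_flight;   /* buffer the "device" is reading, or NULL */
};

/*
 * Write one byte. If the visible buffer is under I/O, leave it untouched
 * and switch writers to a fresh copy instead of blocking.
 * Returns 0 on success, -1 on a bad offset or failed allocation.
 */
static int fake_write_byte_cow(struct fake_slot *slot, size_t off,
                               unsigned char val)
{
    if (off >= FAKE_PAGE_SIZE)
        return -1;

    if (slot->in_flight == slot->data) {
        unsigned char *copy = malloc(FAKE_PAGE_SIZE);

        if (!copy)
            return -1;
        memcpy(copy, slot->data, FAKE_PAGE_SIZE);
        slot->data = copy;      /* new writes land in the copy */
    }
    slot->data[off] = val;
    return 0;
}
```

The buffer the device is reading never changes, so the I/O stays consistent; the price is the extra memory held until that I/O completes.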

2011-05-10 15:25:09

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Jan Kara <[email protected]> writes:

>> Isn't it reallocated blocks too, and metadata too?
> Reallocated blocks - not really. For a block to be freed it cannot be
> under writeback and when it's freed no writeback is started.

Sure, for the data -> data reallocation case. The metadata -> data/metadata
case is still there.

> For metadata - yes. But ext3, ext4, xfs, btrfs have to avoid modifying
> metadata under writeback anyway (because of journalling / COW
> constraints) and thus they don't care.

Yes. Those use a better way than just blocking.

> For ext2 or vfat it's a different story. But as I wrote to Darrick,
> I'm not sure about vfat but for ext2 and similar legacy filesystems,
> I'd rather let them live with their unstable pages under IO ;) because
> I see a limited use for that.

If these patches are not going to tackle it, I have no argument here ;)
It would then simply be an FS-specific approach/fix, like journaling.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-05-10 16:12:21

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Jan Kara <[email protected]> writes:

>> Did you already consider, to copy only if page was writeback (like
>> copy-on-write)? I.e. if page is on I/O, copy, then switch the page for
>> writing new data.
> Yes, that was considered as well. We'd have to essentially migrate the
> page that is under writeback and should be written to. You are going to pay
> the cost of page allocation, copy, increased memory & cache pressure.
> Depending on your backing storage and workload this may or may not be better
> than waiting for IO...

Maybe, but do you really think that just blocking is better in the usual
case?

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-05-10 16:18:44

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Wed 11-05-11 00:24:58, OGAWA Hirofumi wrote:
> Jan Kara <[email protected]> writes:
>
> >> Isn't it reallocated blocks too, and metadata too?
> > Reallocated blocks - not really. For a block to be freed it cannot be
> > under writeback and when it's freed no writeback is started.
>
> Sure for data -> data reallocated case. metadata -> data/metadata is
> still there.
Unless you properly use bforget() (which you should I think). But I have
not really checked this in detail for a while.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-05-10 16:22:41

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Wed 11-05-11 01:12:13, OGAWA Hirofumi wrote:
> Jan Kara <[email protected]> writes:
>
> >> Did you already consider, to copy only if page was writeback (like
> >> copy-on-write)? I.e. if page is on I/O, copy, then switch the page for
> >> writing new data.
> > Yes, that was considered as well. We'd have to essentially migrate the
> > page that is under writeback and should be written to. You are going to pay
> > the cost of page allocation, copy, increased memory & cache pressure.
> > Depending on your backing storage and workload this may or may not be better
> > than waiting for IO...
>
> Maybe possible, but you really think on usual case just blocking is
> better?
Define usual case... As Christoph noted, we don't currently have a real
practical case where blocking would matter (since frequent rewrites are
rather rare). So defining what is usual when we don't have a single real
case is kind of tough ;)

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-05-10 16:25:56

by Chris Mason

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Excerpts from Jan Kara's message of 2011-05-10 08:51:24 -0400:
> On Mon 09-05-11 16:03:18, Darrick J. Wong wrote:
> > I am still chasing down what exactly is broken in ext3. data=writeback mode
> > passes with no failures. data=ordered, however, does not pass; my current
> > suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> > page_mkclean to kick the userspace programs off the page before writing it.
> Yes, ext3 in data=ordered mode writes pages from
> journal_commit_transaction() via submit_bh() without clearing page dirty
> bits thus page_mkclean() is not called for these pages. Frankly, do you
> really want to bother with adding support for ext2 and ext3? People can use
> ext4 as a fs driver when they want to start using blk-integrity support.
> Especially ext2 patch looks really painful and just from a quick look I can
> see code e.g. in fs/ext2/namei.c which isn't handled by your patch yet.

I think ext2/3 are going to need pretty big changes; we're best off just
going with ext4.

-chris

2011-05-10 16:28:42

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Jan Kara <[email protected]> writes:

>> Maybe possible, but you really think on usual case just blocking is
>> better?
> Define usual case... As Christoph noted, we don't currently have a real
> practical case where blocking would matter (since frequent rewrites are
> rather rare). So defining what is usual when we don't have a single real
> case is kind of tough ;)

OK. E.g. a usual desktop workload, but on an FS like ext2/fat.
--
OGAWA Hirofumi <[email protected]>

2011-05-10 16:29:32

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Jan Kara <[email protected]> writes:

> On Wed 11-05-11 00:24:58, OGAWA Hirofumi wrote:
>> Jan Kara <[email protected]> writes:
>>
>> >> Isn't it reallocated blocks too, and metadata too?
>> > Reallocated blocks - not really. For a block to be freed it cannot be
>> > under writeback and when it's freed no writeback is started.
>>
>> Sure for data -> data reallocated case. metadata -> data/metadata is
>> still there.
> Unless you properly use bforget() (which you should I think). But I have
> not really checked this in detail for a while.

bforget() doesn't wait for IO, right?
--
OGAWA Hirofumi <[email protected]>

2011-05-10 17:03:16

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Wed 11-05-11 01:29:28, OGAWA Hirofumi wrote:
> Jan Kara <[email protected]> writes:
>
> > On Wed 11-05-11 00:24:58, OGAWA Hirofumi wrote:
> >> Jan Kara <[email protected]> writes:
> >>
> >> >> Isn't it reallocated blocks too, and metadata too?
> >> > Reallocated blocks - not really. For a block to be freed it cannot be
> >> > under writeback and when it's freed no writeback is started.
> >>
> >> Sure for data -> data reallocated case. metadata -> data/metadata is
> >> still there.
> > Unless you properly use bforget() (which you should I think). But I have
> > not really checked this in detail for a while.
>
> bforget() doesn't wait IO, right?
Right, my fault. Sorry. So you were right, reallocated blocks will be hit
as well.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-05-10 17:03:51

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Wed, May 11, 2011 at 12:24:58AM +0900, OGAWA Hirofumi wrote:
> > under writeback and when it's freed no writeback is started.
>
> Sure for data -> data reallocated case. metadata -> data/metadata is
> still there.

That's usually handled differently. For XFS take a look at the
xfs_alloc_busy_* functions. For 2.6.40 they've been mostly rewritten
to rarely wait for the reuse but instead avoid busy blocks. But that's
a real data integrity issue even without stable pages for I/O.

2011-05-10 17:13:18

by djwong

Subject: Re: [PATCH 2/7] fs: block_page_mkwrite should wait for writeback to finish

For filesystems such as nilfs2 and xfs that use block_page_mkwrite, modify that
function to wait for pending writeback before allowing the page to become
writable. This is needed to stabilize pages during writeback for those two
filesystems.

Slight rework based on Jan Kara's suggestion.

Signed-off-by: Darrick J. Wong <[email protected]>
---

fs/buffer.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index a08bb8e..0e7fa16 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2367,8 +2367,10 @@ block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 			ret = VM_FAULT_OOM;
 		else /* -ENOSPC, -EIO, etc */
 			ret = VM_FAULT_SIGBUS;
-	} else
+	} else {
+		wait_on_page_writeback(page);
 		ret = VM_FAULT_LOCKED;
+	}

 out:
 	return ret;
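
For readers who want to see the control flow of the reworked hunk in isolation, here is a userspace model of it. It is a sketch only: the `fake_` types, the fault codes and their values are stand-ins invented for this example, and the real function does much more before reaching this point.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in fault codes; the values are arbitrary, not the kernel's. */
enum fake_fault { FAKE_FAULT_OOM, FAKE_FAULT_SIGBUS, FAKE_FAULT_LOCKED };

struct fake_page {
    int locked;
    int under_writeback;
};

static void fake_wait_on_page_writeback(struct fake_page *page)
{
    page->under_writeback = 0;  /* pretend the I/O completed */
}

/* Models the tail of the function in the patch: wait only on success. */
static enum fake_fault fake_mkwrite_tail(struct fake_page *page, int ret)
{
    if (ret) {
        page->locked = 0;       /* error path unlocks and never waits */
        return ret == -ENOMEM ? FAKE_FAULT_OOM : FAKE_FAULT_SIGBUS;
    }
    fake_wait_on_page_writeback(page);
    return FAKE_FAULT_LOCKED;   /* page stays locked for the caller */
}
```

The point of the rework is visible in the error branch: a failed fault returns without waiting for writeback that the caller no longer cares about.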

2011-05-10 20:50:19

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Christoph Hellwig <[email protected]> writes:

> On Wed, May 11, 2011 at 12:24:58AM +0900, OGAWA Hirofumi wrote:
>> > under writeback and when it's freed no writeback is started.
>>
>> Sure for data -> data reallocated case. metadata -> data/metadata is
>> still there.
>
> That's usually handled differently. For XFS take a look at the
> xfs_alloc_busy_* function. For 2.6.40 they've been mostly rewritten
> to rarely wait for the reuse but instead avoid busy blocks. But that's
> a real data integrity issue even without stable pages for I/O.

Sounds good. So... Are you suggesting this series should use a better
approach than just blocking?

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-05-11 16:16:02

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Wed, May 11, 2011 at 05:50:07AM +0900, OGAWA Hirofumi wrote:
> Sounds good. So... Are you suggesting this series should use better
> approach than just blocking?

No, block reuse is a problem independent of stable pages.

2011-05-11 16:33:38

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

Christoph Hellwig <[email protected]> writes:

> On Wed, May 11, 2011 at 05:50:07AM +0900, OGAWA Hirofumi wrote:
>> Sounds good. So... Are you suggesting this series should use better
>> approach than just blocking?
>
> No, block reuse is a problem independent of stable pages.

OK. So it sounds like we are talking about different points. I was talking
about the general issue (the whole patchset). You were talking about only
some of the patches (I guess only the data page ones).
--
OGAWA Hirofumi <[email protected]>

2011-05-11 18:19:08

by djwong

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Tue, May 10, 2011 at 02:51:24PM +0200, Jan Kara wrote:
> On Mon 09-05-11 16:03:18, Darrick J. Wong wrote:
> > I am still chasing down what exactly is broken in ext3. data=writeback mode
> > passes with no failures. data=ordered, however, does not pass; my current
> > suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> > page_mkclean to kick the userspace programs off the page before writing it.
> Yes, ext3 in data=ordered mode writes pages from
> journal_commit_transaction() via submit_bh() without clearing page dirty
> bits thus page_mkclean() is not called for these pages. Frankly, do you
> really want to bother with adding support for ext2 and ext3? People can use
> ext4 as a fs driver when they want to start using blk-integrity support.
> Especially ext2 patch looks really painful and just from a quick look I can
> see code e.g. in fs/ext2/namei.c which isn't handled by your patch yet.

Yeah, I agree that ext2 is ugly and ext3/jbd might be more painful. Is there
any other code that wants stable pages that's already running with ext3? In
this months-long discussion I've heard that encryption and raid also like
stable pages during writes. Have those users been broken this whole time, or
have they been stabilizing pages themselves?

I suppose we can cross the "ext3 fails horribly on DIF" bridge when someone
complains about it. Possibly we could try to steer them to btrfs.

--D

2011-05-12 09:43:03

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Wed 11-05-11 11:19:01, Darrick J. Wong wrote:
> On Tue, May 10, 2011 at 02:51:24PM +0200, Jan Kara wrote:
> > On Mon 09-05-11 16:03:18, Darrick J. Wong wrote:
> > > I am still chasing down what exactly is broken in ext3. data=writeback mode
> > > passes with no failures. data=ordered, however, does not pass; my current
> > > suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> > > page_mkclean to kick the userspace programs off the page before writing it.
> > Yes, ext3 in data=ordered mode writes pages from
> > journal_commit_transaction() via submit_bh() without clearing page dirty
> > bits thus page_mkclean() is not called for these pages. Frankly, do you
> > really want to bother with adding support for ext2 and ext3? People can use
> > ext4 as a fs driver when they want to start using blk-integrity support.
> > Especially ext2 patch looks really painful and just from a quick look I can
> > see code e.g. in fs/ext2/namei.c which isn't handled by your patch yet.
>
> Yeah, I agree that ext2 is ugly and ext3/jbd might be more painful. Are there
> any other code that wants stable pages that's already running with ext3? In
> this months-long discussion I've heard that encryption and raid also like
> stable pages during writes. Have those users been broken this whole time, or
> have they been stabilizing pages themselves?
I believe some of them have been broken (e.g. raid) and some of them do
copy-out, so they were OK.

> I suppose we can cross the "ext3 fails horribly on DIF" bridge when someone
> complains about it. Possibly we could try to steer them to btrfs.
Well, btrfs might be a bit too adventurous for production servers but
ext4 would be definitely viable for them.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-05-16 18:47:43

by djwong

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Wed, May 11, 2011 at 01:28:32AM +0900, OGAWA Hirofumi wrote:
> Jan Kara <[email protected]> writes:
>
> >> Maybe possible, but you really think on usual case just blocking is
> >> better?
> > Define usual case... As Christoph noted, we don't currently have a real
> > practical case where blocking would matter (since frequent rewrites are
> > rather rare). So defining what is usual when we don't have a single real
> > case is kind of tough ;)
>
> OK. E.g. usual workload on desktop, but FS like ext2/fat.

In the frequent rewrite case, here's what you get:

Regular disk: (possibly garbage) write, followed by a second write to make the
disk reflect memory contents.

RAID w/ shadow pages: two writes, both consistent. Higher memory consumption.

T10 DIF disk: disk error any time the CPU modifies a page that the disk
controller is DMA'ing out of memory. I suppose one could simply retry the
operation if the page is dirty, but supposing memory writes are happening fast
enough that the retries also produce disk errors, _nothing_ ever gets written.

With the new stable-page-writes patchset, the garbage write/disk error symptoms
go away since the processes block instead of creating this window where it's
not clear whether the disk's copy of the data is consistent. I could turn the
wait_on_page_writeback calls into some sort of page migration if the
performance turns out to be terrible, though I'm still working on quantifying
the impact. Some people pointed out that sqlite tends to write the same blocks
frequently, though I wonder if sqlite actually tries to write memory pages
while syncing them?
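
As a rough userspace illustration of the DIF failure mode (not kernel code):
the sketch below uses zlib.crc32 as a stand-in for the DIF guard tag, and
submit_write/device_verify are made-up names. It only shows that a checksum
taken at submission time stops matching if the page changes before the device
reads it, and that holding the writer back until the I/O is done avoids that.

```python
import zlib

PAGE_SIZE = 4096


def submit_write(page: bytearray) -> int:
    """Compute the guard checksum when the write is submitted (stand-in for DIF)."""
    return zlib.crc32(bytes(page))


def device_verify(page: bytearray, guard: int) -> bool:
    """The device checks the bytes it actually reads against the guard it was given."""
    return zlib.crc32(bytes(page)) == guard


# Unstable page: the CPU scribbles on the page after the checksum was
# computed but before the device has read it.
page = bytearray(b"A" * PAGE_SIZE)
guard = submit_write(page)
page[0] ^= 0xFF
unstable_ok = device_verify(page, guard)

# Stable page: the modification is held back until the I/O has completed.
page2 = bytearray(b"A" * PAGE_SIZE)
guard2 = submit_write(page2)
stable_ok = device_verify(page2, guard2)
page2[0] ^= 0xFF  # the writer proceeds only after verification

print("unstable page verified:", unstable_ok)
print("stable page verified:", stable_ok)
```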

One use case where I could see a serious performance hit happening is the case
where some app writes a bunch of memory pages, calls sync to force the dirty
pages to disk, and /must/ resume writing those memory pages before the sync
completes. The page migration would of course help there, provided a memory
page can be found in less time than an I/O operation.
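
For comparison, the shadow-page (copy-out) approach avoids blocking by
checksumming and writing a private snapshot, at the cost of extra memory. A
minimal sketch, with zlib.crc32 standing in for the real guard tag and a
hypothetical helper name; note this is the copy-out alternative, not the page
migration idea:

```python
import zlib

PAGE_SIZE = 4096


def write_with_copy_out(page: bytearray):
    """Snapshot the page, then checksum and 'write' the snapshot only."""
    shadow = bytes(page)          # private copy: costs one extra page of memory
    guard = zlib.crc32(shadow)    # checksum covers the snapshot, not the live page
    return shadow, guard


page = bytearray(PAGE_SIZE)
shadow, guard = write_with_copy_out(page)

# The writer does not have to wait; it keeps modifying the live page.
page[0] = 0xFF

# The device sees the snapshot, so the checksum still matches.
copy_ok = zlib.crc32(shadow) == guard
print("copy-out write verified:", copy_ok)
```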

Someone commented on the LWN article about this topic, claiming that he had a
program that couldn't afford to block on writes to mlock()'d memory. I'm not
sure how to fix that program, because if memory writes never coordinate with
disk writes and the other threads are always writing memory, I wonder how the
copy on disk isn't always indeterminate.

--D

2011-05-16 18:49:37

by djwong

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Thu, May 12, 2011 at 11:42:55AM +0200, Jan Kara wrote:
> On Wed 11-05-11 11:19:01, Darrick J. Wong wrote:
> > On Tue, May 10, 2011 at 02:51:24PM +0200, Jan Kara wrote:
> > > On Mon 09-05-11 16:03:18, Darrick J. Wong wrote:
> > > > I am still chasing down what exactly is broken in ext3. data=writeback mode
> > > > passes with no failures. data=ordered, however, does not pass; my current
> > > > suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> > > > page_mkclean to kick the userspace programs off the page before writing it.
> > > Yes, ext3 in data=ordered mode writes pages from
> > > journal_commit_transaction() via submit_bh() without clearing page dirty
> > > bits thus page_mkclean() is not called for these pages. Frankly, do you
> > > really want to bother with adding support for ext2 and ext3? People can use
> > > ext4 as a fs driver when they want to start using blk-integrity support.
> > > Especially ext2 patch looks really painful and just from a quick look I can
> > > see code e.g. in fs/ext2/namei.c which isn't handled by your patch yet.
> >
> > Yeah, I agree that ext2 is ugly and ext3/jbd might be more painful. Are there
> > any other code that wants stable pages that's already running with ext3? In
> > this months-long discussion I've heard that encryption and raid also like
> > stable pages during writes. Have those users been broken this whole time, or
> > have they been stabilizing pages themselves?
> I believe some of them have been broken (e.g. raid) and some of them do
> copy-out, so they were OK.

A future step might be to undo all these homegrown copy-outs?

> > I suppose we can cross the "ext3 fails horribly on DIF" bridge when someone
> > complains about it. Possibly we could try to steer them to btrfs.
> Well, btrfs might be a bit too adventurous for production servers but
> ext4 would be definitely viable for them.

Are there any distros that are going straight from ext3 to btrfs?

--D

2011-05-16 19:00:17

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Mon 16-05-11 11:49:27, Darrick J. Wong wrote:
> On Thu, May 12, 2011 at 11:42:55AM +0200, Jan Kara wrote:
> > On Wed 11-05-11 11:19:01, Darrick J. Wong wrote:
> > > On Tue, May 10, 2011 at 02:51:24PM +0200, Jan Kara wrote:
> > > > On Mon 09-05-11 16:03:18, Darrick J. Wong wrote:
> > > > > I am still chasing down what exactly is broken in ext3. data=writeback mode
> > > > > passes with no failures. data=ordered, however, does not pass; my current
> > > > > suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> > > > > page_mkclean to kick the userspace programs off the page before writing it.
> > > > Yes, ext3 in data=ordered mode writes pages from
> > > > journal_commit_transaction() via submit_bh() without clearing page dirty
> > > > bits thus page_mkclean() is not called for these pages. Frankly, do you
> > > > really want to bother with adding support for ext2 and ext3? People can use
> > > > ext4 as a fs driver when they want to start using blk-integrity support.
> > > > Especially ext2 patch looks really painful and just from a quick look I can
> > > > see code e.g. in fs/ext2/namei.c which isn't handled by your patch yet.
> > >
> > > Yeah, I agree that ext2 is ugly and ext3/jbd might be more painful. Are there
> > > any other code that wants stable pages that's already running with ext3? In
> > > this months-long discussion I've heard that encryption and raid also like
> > > stable pages during writes. Have those users been broken this whole time, or
> > > have they been stabilizing pages themselves?
> > I believe some of them have been broken (e.g. raid) and some of them do
> > copy-out, so they were OK.
>
> A future step might be to undo all these homegrown copy-outs?
Sure but I'm not the right one to tell you where these are ;).

> > > I suppose we can cross the "ext3 fails horribly on DIF" bridge when someone
> > > complains about it. Possibly we could try to steer them to btrfs.
> > Well, btrfs might be a bit too adventurous for production servers but
> > ext4 would be definitely viable for them.
>
> Are there any distros that are going straight from ext3 to btrfs?
Most distros currently offer users a choice of xfs, ext3, ext4, btrfs
with ext4 being the default. I'm not sure if that's what you are asking
about...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-05-16 19:06:13

by djwong

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Mon, May 09, 2011 at 04:03:18PM -0700, Darrick J. Wong wrote:
> Hi all,
>
> This is v3.1 of the stable-page-writes patchset for ext4/3/2, xfs, and fat.
> The purpose of this patchset is to prohibit processes from writing on memory
> pages that are currently being written to disk because certain storage setups
> (e.g. SCSI disks with DIF integrity checksums) will fail a write if the page
> contents don't match the checksum. btrfs already guarantees page stability, so
> it does not use these changes.
>
> The technique used is fairly simple -- whenever a page is about to become
> writable (either because of a write fault to a mapped page, or a buffered write
> is in progress), wait for the page writeback flag to be clear, indicating that
> the page is not being written to disk. This means that it is necessary (1) to
> add wait for writeback code to grab_cache_page_write_begin to take care of
> buffered writes, and (2) all filesystems must have a page_mkwrite that locks a
> page, waits for writeback, and returns the locked page. For filesystems that
> piggyback on the generic block_page_mkwrite, the patchset adds the writeback
> wait to that function; for filesystems that do not use the page_mkwrite hook at
> all, the patchset provides a stub page_mkwrite.
>
> I ran my write-after-checksum ("wac") reproducer program to try to create the
> DIF checksum errors by madly rewriting the same memory pages. In fact, I tried
> the following combinations against ext2/3/4, xfs, btrfs, and vfat:
>
> a. 64 write() threads + sync_file_range
> b. 64 mmap write threads + msync
> c. 32 write() threads + sync_file_range + 32 mmap write threads + msync
> d. Same as C, but with all threads in directio mode
> e. Same as A, but with all threads in directio mode
> f. Same as B, but with all threads in directio mode
>
> After running profiles A-F for 30 minutes each on 6 different machines, ext2,
> ext4, xfs, and vfat reported no errors. ext3 still has a lingering failure
> case (which I will touch on briefly later) and btrfs eventually reports -ENOSPC
> and fails the test, though it does that even without any of the patches applied.
>
> To assess the performance impact of stable page writes, I moved to a disk that
> doesn't have DIF support so that I could measure just the impact of waiting for
> writeback. I first ran wac with 64 threads madly scribbling on a 64k file and
> saw about a 12 percent performance decrease. I then reran the wac program with
> 64 threads and a 64MB file and saw about the same performance numbers. As I
> suspected, the patchset only seems to impact workloads that rewrite the same
> memory page frequently.
>
> I am still chasing down what exactly is broken in ext3. data=writeback mode
> passes with no failures. data=ordered, however, does not pass; my current
> suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> page_mkclean to kick the userspace programs off the page before writing it.
>
> Per various comments regarding v3 of this patchset, I've integrated his
> suggestions, reworked the patch descriptions to make it clearer which ones
> touch all the filesystems and which ones are to fix remaining holes in specific
> filesystems, and expanded the scope of filesystems that got fixed.
>
> As always, questions and comments are welcome; and thank you to all the
> previous reviewers of this patchset. I am also soliciting people's opinions on
> whether or not these patches could go upstream for .40.
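
The wait-for-writeback rule described in the cover letter quoted above can be
modeled in userspace roughly as follows. This is only a toy sketch: the Page
class and its methods are hypothetical stand-ins for the page writeback flag
and wait_on_page_writeback, not the actual patch code.

```python
import threading


class Page:
    """Toy model of a page with a writeback flag (hypothetical, not kernel code)."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self._writeback = False
        self._cond = threading.Condition()

    def start_writeback(self):
        # The page is handed to the "disk"; writers must now stay off it.
        with self._cond:
            self._writeback = True

    def end_writeback(self):
        # I/O is done; wake up anyone waiting to modify the page.
        with self._cond:
            self._writeback = False
            self._cond.notify_all()

    def write(self, offset, payload):
        # Before the page becomes writable, wait for writeback to clear.
        with self._cond:
            self._cond.wait_for(lambda: not self._writeback)
            self.data[offset:offset + len(payload)] = payload


page = Page()
page.start_writeback()

writer = threading.Thread(target=page.write, args=(0, b"new"))
writer.start()

# While writeback is in flight the writer is blocked, so the page is unchanged.
during_io = bytes(page.data[:3])

page.end_writeback()
writer.join()
after_io = bytes(page.data[:3])

print("during I/O:", during_io)
print("after I/O:", after_io)
```

The cost is visible in the model: the writer thread makes no progress until
end_writeback() runs, which is the blocking that only shows up when the same
page is rewritten while it is being written out.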

[adding Andrew Morton to cc]

Ted, Mingming, and I were discussing how we might get this patchset pushed
upstream on today's ext4 community call. The ext2 patch can be dropped since
it really only was there as a proof that the generic mm/fs fixes actually
worked. I'm unsure of the vfat maintainer's feelings on the patchset, though
he seems concerned about the performance of apps that rewrite pages frequently.
Ted seemed agreeable with the ext4 changes, though I don't know if he's
reviewed them thoroughly yet.

Ted asked for clarification as to which patches are needed to fix ext4.
Patches 1 and 5 are the two that are required for ext4, and patch 4 cleans up
some ext4 code.

Patches 1 and 2 are needed to fix xfs and nilfs.

Patches 1 and 3 (and fs-specific patches such as 6 & 7) are needed to fix the
rest.

Unfortunately, Ted and I weren't sure who (if anyone) is in charge of pushing
mm and generic fs patches upstream. Ted suggested Andrew Morton for the mm
patches (1 & 3) and we weren't sure if Al Viro or Christoph (or someone else)
is in charge of generic vfs patches (patch #2).

So, who do I actually ask to take the mm and vfs patches? Andrew, can I send
you patches?

--D

2011-05-16 19:10:12

by djwong

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Mon, May 16, 2011 at 08:59:47PM +0200, Jan Kara wrote:
> On Mon 16-05-11 11:49:27, Darrick J. Wong wrote:
> > On Thu, May 12, 2011 at 11:42:55AM +0200, Jan Kara wrote:
> > > On Wed 11-05-11 11:19:01, Darrick J. Wong wrote:
> > > > On Tue, May 10, 2011 at 02:51:24PM +0200, Jan Kara wrote:
> > > > > On Mon 09-05-11 16:03:18, Darrick J. Wong wrote:
> > > > > > I am still chasing down what exactly is broken in ext3. data=writeback mode
> > > > > > passes with no failures. data=ordered, however, does not pass; my current
> > > > > > suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> > > > > > page_mkclean to kick the userspace programs off the page before writing it.
> > > > > Yes, ext3 in data=ordered mode writes pages from
> > > > > journal_commit_transaction() via submit_bh() without clearing page dirty
> > > > > bits thus page_mkclean() is not called for these pages. Frankly, do you
> > > > > really want to bother with adding support for ext2 and ext3? People can use
> > > > > ext4 as a fs driver when they want to start using blk-integrity support.
> > > > > Especially ext2 patch looks really painful and just from a quick look I can
> > > > > see code e.g. in fs/ext2/namei.c which isn't handled by your patch yet.
> > > >
> > > > Yeah, I agree that ext2 is ugly and ext3/jbd might be more painful. Are there
> > > > any other code that wants stable pages that's already running with ext3? In
> > > > this months-long discussion I've heard that encryption and raid also like
> > > > stable pages during writes. Have those users been broken this whole time, or
> > > > have they been stabilizing pages themselves?
> > > I believe some of them have been broken (e.g. raid) and some of them do
> > > copy-out, so they were OK.
> >
> > A future step might be to undo all these homegrown copy-outs?
> Sure but I'm not the right one to tell you where these are ;).

Yes, I've found a couple just by digging through the source tree. But maybe
I'll get this small set upstream before writing more patches.

> > > > I suppose we can cross the "ext3 fails horribly on DIF" bridge when someone
> > > > complains about it. Possibly we could try to steer them to btrfs.
> > > Well, btrfs might be a bit too adventurous for production servers but
> > > ext4 would be definitely viable for them.
> >
> > Are there any distros that are going straight from ext3 to btrfs?
> Most distros currently offer users a choice of xfs, ext3, ext4, btrfs
> with ext4 being the default. I'm not sure if that's what you are asking
> about...

Yep. I was primarily concerned that there might be some customer that would be
ok with deploying DIF hardware and rolling forward to ext4, but not to btrfs,
only to find that some distro refused to ship ext4. Looks like SLES/RHEL both
do, however. :)

--D

2011-05-16 19:31:54

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

"Darrick J. Wong" <[email protected]> writes:

>> OK. E.g. usual workload on desktop, but FS like ext2/fat.
>
> In the frequent rewrite case, here's what you get:
>
> Regular disk: (possibly garbage) write, followed by a second write to make the
> disk reflect memory contents.
>
> RAID w/ shadow pages: two writes, both consistent. Higher memory consumption.
>
> T10 DIF disk: disk error any time the CPU modifies a page that the disk
> controller is DMA'ing out of memory. I suppose one could simply retry the
> operation if the page is dirty, but supposing memory writes are happening fast
> enough that the retries also produce disk errors, _nothing_ ever gets written.
>
> With the new stable-page-writes patchset, the garbage write/disk error symptoms
> go away since the processes block instead of creating this window where it's
> not clear whether the disk's copy of the data is consistent. I could turn the
> wait_on_page_writeback calls into some sort of page migration if the
> performance turns out to be terrible, though I'm still working on quantifying
> the impact. Some people pointed out that sqlite tends to write the same blocks
> frequently, though I wonder if sqlite actually tries to write memory pages
> while syncing them?
>
> One use case where I could see a serious performance hit happening is the case
> where some app writes a bunch of memory pages, calls sync to force the dirty
> pages to disk, and /must/ resume writing those memory pages before the sync
> completes. The page migration would of course help there, provided a memory
> page can be found in less time than an I/O operation.
>
> Someone commented on the LWN article about this topic, claiming that he had a
> program that couldn't afford to block on writes to mlock()'d memory. I'm not
> sure how to fix that program, because if memory writes never coordinate with
> disk writes and the other threads are always writing memory, I wonder how the
> copy on disk isn't always indeterminate.

I don't think data pages are a special case for this (at least
logically). In other words, if you are talking only about data pages, you
shouldn't send patches for metadata along with them.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-05-16 20:27:42

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

What about just sending the VFS patches to Al instead of talking about
it on a totally irrelevant call that doesn't include the important
stakeholders? FS-specific patches can go through the fs maintainers
independently.

2011-05-16 20:56:16

by djwong

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Mon, May 16, 2011 at 04:27:10PM -0400, Christoph Hellwig wrote:
> What about just sending the VFS patches to Al

Al was in the To: list of all 7 patches.

> instead of talking about it on a totally irrelevant call that doesn't include
> the important stakeholders? FS-specific patches can go through the fs
> maintainers independently.

The maintainers (ext4/ext2/vfat) were also in the To: list.

Trouble is, MAINTAINERS says this:

MEMORY MANAGEMENT
L: [email protected]
W: http://www.linux-mm.org
S: Maintained
F: include/linux/mm.h
F: mm/

There's a list, but no specific contact person. That's why I had to start
asking around about who actually pushes mm changes to Linus.

As for Al Viro, he's still listed as the VFS maintainer; isn't he resting?
I guess he did nominate you for the holding off of morons (like me). :)

--D

2011-05-17 01:23:20

by djwong

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

On Tue, May 17, 2011 at 04:31:37AM +0900, OGAWA Hirofumi wrote:
> "Darrick J. Wong" <[email protected]> writes:
>
> >> OK. E.g. usual workload on desktop, but FS like ext2/fat.
> >
> > In the frequent rewrite case, here's what you get:
> >
> > Regular disk: (possibly garbage) write, followed by a second write to make the
> > disk reflect memory contents.
> >
> > RAID w/ shadow pages: two writes, both consistent. Higher memory consumption.
> >
> > T10 DIF disk: disk error any time the CPU modifies a page that the disk
> > controller is DMA'ing out of memory. I suppose one could simply retry the
> > operation if the page is dirty, but supposing memory writes are happening fast
> > enough that the retries also produce disk errors, _nothing_ ever gets written.
> >
> > With the new stable-page-writes patchset, the garbage write/disk error symptoms
> > go away since the processes block instead of creating this window where it's
> > not clear whether the disk's copy of the data is consistent. I could turn the
> > wait_on_page_writeback calls into some sort of page migration if the
> > performance turns out to be terrible, though I'm still working on quantifying
> > the impact. Some people pointed out that sqlite tends to write the same blocks
> > frequently, though I wonder if sqlite actually tries to write memory pages
> > while syncing them?
> >
> > One use case where I could see a serious performance hit happening is the case
> > where some app writes a bunch of memory pages, calls sync to force the dirty
> > pages to disk, and /must/ resume writing those memory pages before the sync
> > completes. The page migration would of course help there, provided a memory
> > page can be found in less time than an I/O operation.
> >
> > Someone commented on the LWN article about this topic, claiming that he had a
> > program that couldn't afford to block on writes to mlock()'d memory. I'm not
> > sure how to fix that program, because if memory writes never coordinate with
> > disk writes and the other threads are always writing memory, I wonder how the
> > copy on disk isn't always indeterminate.
>
> I don't think data pages are a special case for this (at least
> logically). In other words, if you are talking only about data pages, you
> shouldn't send patches for metadata along with them.

Patch 7, which is the only patch that touches code under fs/fat/, only fixes
the case where the filesystem tries to modify its own metadata while writing
out the same metadata.

Patches 2 and 3, which affect only files under mm/ and fs/, prevent data pages from
being modified while the same data pages are being written out. It is not
necessary to modify any fs/fat/ code to fix the data page case, fortunately.

That said, the intent of the patch set is to prevent writes to any memory page,
regardless of type (data or metadata), while the same page is being written out.

--D

2011-05-17 03:30:46

Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses

"Darrick J. Wong" <[email protected]> writes:

>> > With the new stable-page-writes patchset, the garbage write/disk error symptoms
>> > go away since the processes block instead of creating this window where it's
>> > not clear whether the disk's copy of the data is consistent. I could turn the
>> > wait_on_page_writeback calls into some sort of page migration if the
>> > performance turns out to be terrible, though I'm still working on quantifying
>> > the impact. Some people pointed out that sqlite tends to write the same blocks
>> > frequently, though I wonder if sqlite actually tries to write memory pages
>> > while syncing them?
>> >
>> > One use case where I could see a serious performance hit happening is the case
>> > where some app writes a bunch of memory pages, calls sync to force the dirty
>> > pages to disk, and /must/ resume writing those memory pages before the sync
>> > completes. The page migration would of course help there, provided a memory
>> > page can be found in less time than an I/O operation.

[...]

>> I don't think data pages are a special case for this (at least
>> logically). In other words, if you are talking only about data pages, you
>> shouldn't send patches for metadata along with them.
>
> Patch 7, which is the only patch that touches code under fs/fat/, only fixes
> the case where the filesystem tries to modify its own metadata while writing
> out the same metadata.
>
> Patches 2 and 3, which affect only files under mm/ and fs/, prevent data pages from
> being modified while the same data pages are being written out. It is not
> necessary to modify any fs/fat/ code to fix the data page case, fortunately.
>
> That said, the intent of the patch set is to prevent writes to any memory page,
> regardless of type (data or metadata), while the same page is being written out.

Yes, I understand that, but the performance analysis (and, it seems, the
approach) can be quite different from the data page case. Metadata
behavior depends on the FS, and the hit can be much larger depending on
the FS state; it is not as simple as the data page case.

And if you changed only the overwrite case (as I already mentioned),
which patch makes sure to prevent the reallocated metadata case for FAT?

What about the performance impact on metadata operations? Is the impact
still small even when the FS is low on free blocks? And a read can cause
an atime update; doesn't that matter (although relatime is the default
now)? Although it is FAT-specific, what about the fragmented case (i.e.
modifying multiple FAT table blocks even for a sequential write)?

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-05-17 14:02:20