2011-11-08 22:59:53

by Curt Wohlgemuth

[permalink] [raw]
Subject: Bug with "fix partial page writes"

It appears that there's a bug with this patch:

-------------------------------------------
commit 02fac1297eb3f471a27368271aadd285548297b0
Author: Allison Henderson <[email protected]>
Date: Tue Sep 6 21:53:01 2011 -0400

ext4: fix partial page writes
...
-------------------------------------------

Hugh Dickens found a bug with some nasty testing and lockdep that
crashed in ext4_da_write_end(), and after looking at the code with
him, it appears that the call to
ext4_discard_partial_page_buffers_no_lock() in this routine is
manipulating an unlocked, and possibly non-existent page:


-------------------------------------------
...
ret2 = generic_write_end(file, mapping, pos, len, copied,
page, fsdata);

page_len = PAGE_CACHE_SIZE -
((pos + copied - 1) & (PAGE_CACHE_SIZE - 1));

if (page_len > 0) {
ret = ext4_discard_partial_page_buffers_no_lock(handle,
inode, page, pos + copied - 1, page_len,
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED);
}
...
-------------------------------------------

Note that generic_write_end() will unlock and release the page before
it returns.

I've no good answer for how to fix this properly, but I wanted to let
Allison know about this, if she hadn't already. I looked but didn't
see any related email on the linux-ext4 list for this problem.

Thanks,
Curt


2011-11-20 20:59:24

by Hugh Dickins

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes"

We've seen no response to this, so Cc'ing Ted and linux-kernel,
and I'll fill in some more detail below.

On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
> It appears that there's a bug with this patch:
>
> -------------------------------------------
> commit 02fac1297eb3f471a27368271aadd285548297b0
> Author: Allison Henderson <[email protected]>
> Date: Tue Sep 6 21:53:01 2011 -0400
>
> ext4: fix partial page writes
> ...
> -------------------------------------------
>
> Hugh Dickins found a bug with some nasty testing and lockdep that

It's the tmpfs swapping test that I've been running, with variations,
for years. System booted with mem=700M and 1.5G swap, two repetitious
make -j20 kernel builds (of a 2.6.24 kernel: I stuck with that because
the balance of built to unbuilt source grows smaller with later kernels),
one directly in a tmpfs (irrelevant in this case, except for the added
pressure it generates), the other in a 1k-block ext2 (that I drive with
ext4's CONFIG_EXT4_USE_FOR_EXT23) on /dev/loop0 on a 450MB tmpfs file.

The first oops I got was indeed down in lockdep, but I've since seen
crashes from the same cause without lockdep configured in. I've not
bothered to write down the stacks, beyond noting ext4_da_write_end()'s
call to ext4_discard_partial_page_buffers_no_lock() in them, since the
code there is clearly at fault as Curt describes.

> crashed in ext4_da_write_end(), and after looking at the code with
> him, it appears that the call to
> ext4_discard_partial_page_buffers_no_lock() in this routine is
> manipulating an unlocked, and possibly non-existent page:
>
>
> -------------------------------------------
> ...
> ret2 = generic_write_end(file, mapping, pos, len, copied,
> page, fsdata);
>
> page_len = PAGE_CACHE_SIZE -
> ((pos + copied - 1) & (PAGE_CACHE_SIZE - 1));
>
> if (page_len > 0) {
> ret = ext4_discard_partial_page_buffers_no_lock(handle,
> inode, page, pos + copied - 1, page_len,
> EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED);
> }
> ...
> -------------------------------------------
>
> Note that generic_write_end() will unlock and release the page before
> it returns.

Exactly. And clearly the loop-on-tmpfs aspect of the test is
irrelevant, except in generating more pressure to trigger it.

>
> I've no good answer for how to fix this properly, but I wanted to let
> Allison know about this, if she hadn't already. I looked but didn't
> see any related email on the linux-ext4 list for this problem.

There was a second problem I was seeing, more elusive and much harder
to attribute: occasionally the build on ext2 would fail with errors
from ld (almost always of the kind "In function `no symbol': multiple
definition of `no symbol'" and "Warning: size of symbol `' changed":
I don't know if there's anything to be deduced from that). I took
these to indicate an error in filesystem or loop or tmpfs or swap.

First suspect was loop changes from hch in 3.2-rc1, but backing those
out made no difference. I thought I was facing a week's bisection
(since it would need at least a day to conclude any stage good), but
took a gamble on backing out *both* parts of 02fac1297eb3: page_len
additions to ext4_da_write_begin() as well as page_len additions to
ext4_da_write_end().

That gamble paid off: the test then showed no problems in several
days running on two machines. So, both parts of 02fac1297eb3 are
bad, but it's not so easy to see what's wrong with the write_begin.

My *guess* is that the partial page fixes have interfered with the
subtle page-dirty buffer-dirty protocol in some way, which manifests
only under memory pressure.

It's conceivable that loop and tmpfs and swap play a part in this
further error, but I don't think so: I have no evidence for that,
and no such problem was seen before 3.2-rc1.

---

I wanted to find you an easier way to reproduce the problem, so I
tried fsx (I'm still using a pretty old fsx, no fallocate or punch
hole), run in ext2 on a kernel booted with mem=700M. Sorry, I did
this a week ago, then didn't find time to write it up, and failed to
note when my ext2 was in /dev/loop0 and when it was directly on disk.

fsx foo -q -c 100 -l 100000000 &
while :
do # memory hog mmaps and touches each page of 800MB private area
swapout 800
done

I did not reproduce either problem above with that. Instead I found
that backing out 02fac1297eb3 made fsx on 3.2-rc1 fail in a few minutes.
But leaving 02fac1297eb3 in, fsx still failed in 20 minutes or an hour.
On 3.1, fsx failed in a few minutes. On 3.0, fsx failed in half an hour.
On 2.6.39, fsx failed in a few minutes. I had to go back to 2.6.38 for
fsx to run successfully under memory pressure for more than two hours.

It looks as if ext4 testing has not been running fsx under memory
pressure recently. And although I didn't reproduce my main problems
that way, it could well be that getting fsx to run reliably again
under memory pressure will be the way to fix those problems.

Thanks,
Hugh

2011-11-21 01:59:56

by Yongqiang Yang

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes"

Hi,

I am curious about the reason we need this operation in write_begin
functions. I had a look at the commit log just now. The commit
log explains the intention is to handle writes on a hole and writes on
EOF. Two cases can be handled successfully by block_write_begin.


Yongqiang.

On Mon, Nov 21, 2011 at 4:59 AM, Hugh Dickins <[email protected]> wrote:
> We've seen no response to this, so Cc'ing Ted and linux-kernel,
> and I'll fill in some more detail below.
>
> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>> It appears that there's a bug with this patch:
>>
>> -------------------------------------------
>> commit 02fac1297eb3f471a27368271aadd285548297b0
>> Author: Allison Henderson <[email protected]>
>> Date: ? Tue Sep 6 21:53:01 2011 -0400
>>
>> ? ? ext4: fix partial page writes
>> ...
>> -------------------------------------------
>>
>> Hugh Dickins found a bug with some nasty testing and lockdep that
>
> It's the tmpfs swapping test that I've been running, with variations,
> for years. ?System booted with mem=700M and 1.5G swap, two repetitious
> make -j20 kernel builds (of a 2.6.24 kernel: I stuck with that because
> the balance of built to unbuilt source grows smaller with later kernels),
> one directly in a tmpfs (irrelevant in this case, except for the added
> pressure it generates), the other in a 1k-block ext2 (that I drive with
> ext4's CONFIG_EXT4_USE_FOR_EXT23) on /dev/loop0 on a 450MB tmpfs file.
>
> The first oops I got was indeed down in lockdep, but I've since seen
> crashes from the same cause without lockdep configured in. ?I've not
> bothered to write down the stacks, beyond noting ext4_da_write_end()'s
> call to ext4_discard_partial_page_buffers_no_lock() in them, since the
> code there is clearly at fault as Curt describes.
>
>> crashed in ext4_da_write_end(), and after looking at the code with
>> him, it appears that the call to
>> ext4_discard_partial_page_buffers_no_lock() in this routine is
>> manipulating an unlocked, and possibly non-existent page:
>>
>>
>> -------------------------------------------
>> ...
>> ? ? ? ret2 = generic_write_end(file, mapping, pos, len, copied,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? page, fsdata);
>>
>> ? ? ? page_len = PAGE_CACHE_SIZE -
>> ? ? ? ? ? ? ? ? ? ? ? ((pos + copied - 1) & (PAGE_CACHE_SIZE - 1));
>>
>> ? ? ? if (page_len > 0) {
>> ? ? ? ? ? ? ? ret = ext4_discard_partial_page_buffers_no_lock(handle,
>> ? ? ? ? ? ? ? ? ? ? ? inode, page, pos + copied - 1, page_len,
>> ? ? ? ? ? ? ? ? ? ? ? EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED);
>> ? ? ? }
>> ...
>> -------------------------------------------
>>
>> Note that generic_write_end() will unlock and release the page before
>> it returns.
>
> Exactly. ?And clearly the loop-on-tmpfs aspect of the test is
> irrelevant, except in generating more pressure to trigger it.
>
>>
>> I've no good answer for how to fix this properly, but I wanted to let
>> Allison know about this, if she hadn't already. ?I looked but didn't
>> see any related email on the linux-ext4 list for this problem.
>
> There was a second problem I was seeing, more elusive and much harder
> to attribute: occasionally the build on ext2 would fail with errors
> from ld (almost always of the kind "In function `no symbol': multiple
> definition of `no symbol'" and "Warning: size of symbol `' changed":
> I don't know if there's anything to be deduced from that). ?I took
> these to indicate an error in filesystem or loop or tmpfs or swap.
>
> First suspect was loop changes from hch in 3.2-rc1, but backing those
> out made no difference. ?I thought I was facing a week's bisection
> (since it would need at least a day to conclude any stage good), but
> took a gamble on backing out *both* parts of 02fac1297eb3: page_len
> additions to ext4_da_write_begin() as well as page_len additions to
> ext4_da_write_end().
>
> That gamble paid off: the test then showed no problems in several
> days running on two machines. ?So, both parts of 02fac1297eb3 are
> bad, but it's not so easy to see what's wrong with the write_begin.
>
> My *guess* is that the partial page fixes have interfered with the
> subtle page-dirty buffer-dirty protocol in some way, which manifests
> only under memory pressure.
>
> It's conceivable that loop and tmpfs and swap play a part in this
> further error, but I don't think so: I have no evidence for that,
> and no such problem was seen before 3.2-rc1.
>
> ---
>
> I wanted to find you an easier way to reproduce the problem, so I
> tried fsx (I'm still using a pretty old fsx, no fallocate or punch
> hole), run in ext2 on a kernel booted with mem=700M. ?Sorry, I did
> this a week ago, then didn't find time to write it up, and failed to
> note when my ext2 was in /dev/loop0 and when it was directly on disk.
>
> fsx foo -q -c 100 -l 100000000 &
> while :
> do ? ? ?# memory hog mmaps and touches each page of 800MB private area
> ? ? ? ?swapout 800
> done
>
> I did not reproduce either problem above with that. ?Instead I found
> that backing out 02fac1297eb3 made fsx on 3.2-rc1 fail in a few minutes.
> But leaving 02fac1297eb3 in, fsx still failed in 20 minutes or an hour.
> On 3.1, fsx failed in a few minutes. ?On 3.0, fsx failed in half an hour.
> On 2.6.39, fsx failed in a few minutes. ?I had to go back to 2.6.38 for
> fsx to run successfully under memory pressure for more than two hours.
>
> It looks as if ext4 testing has not been running fsx under memory
> pressure recently. ?And although I didn't reproduce my main problems
> that way, it could well be that getting fsx to run reliably again
> under memory pressure will be the way to fix those problems.
>
> Thanks,
> Hugh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>



--
Best Wishes
Yongqiang Yang

2011-11-21 16:56:31

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes"

On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
> We've seen no response to this, so Cc'ing Ted and linux-kernel,
> and I'll fill in some more detail below.

Hugh,

Thanks for reminding us about this. Unfortunately bugzilla is still
down, so we'll have to track this via e-mail.

I mentioned this issue on the weekly ext4 call, and though there will
be a delay due to the Thanksgiving break, Allison said she would try
to take a look at this. Hopefully other folks will as well.

> I did not reproduce either problem above with that. Instead I found
> that backing out 02fac1297eb3 made fsx on 3.2-rc1 fail in a few minutes.
> But leaving 02fac1297eb3 in, fsx still failed in 20 minutes or an hour.
> On 3.1, fsx failed in a few minutes. On 3.0, fsx failed in half an hour.
> On 2.6.39, fsx failed in a few minutes. I had to go back to 2.6.38 for
> fsx to run successfully under memory pressure for more than two hours.
>
> It looks as if ext4 testing has not been running fsx under memory
> pressure recently. And although I didn't reproduce my main problems
> that way, it could well be that getting fsx to run reliably again
> under memory pressure will be the way to fix those problems.

Yes, I think we've been relying mostly on xfstests, and not
necessarily under extreme memory pressures. Out of curiosity, what
sort of configuration were you using when you did the above tests?
(memory, swap, fs bock size, etc.) Was it the same as you did with
your make -j20 kernel stress test? And where you using any special
fsx options?

I agree that we should add better memory pressure testing to ext4.

Regards,

- Ted

2011-11-21 17:39:51

by Allison Henderson

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes"

On 11/20/2011 06:59 PM, Yongqiang Yang wrote:
> Hi,
>
> I am curious about the reason we need this operation in write_begin
> functions. I had a look at the commit log just now. The commit
> log explains the intention is to handle writes on a hole and writes on
> EOF. Two cases can be handled successfully by block_write_begin.
>
>
> Yongqiang.

Hi all,

Sorry I missed the first note that came through. I have not been able
to look at this in depth yet, but will do so when I get back from the
holiday break next Thurs. Basically this patch was addressing a bug I
found when I was trying to get the punch hole patch through an overnight
run of fsx.

With out this patch, fsx fails (after about 6 or so hours, with punch
hole enabled). The failure is triggered when a region of the test file
that is supposed to contain zeros, ends up containing garbage. In this
case what I found was that a write operation that starts/ends in a hole
or runs off the edge of the file, is supposed to zero out the part of
the page that is still in the hole or beyond EOF. But instead of
zeroing to the end of the page, it would only zero to the edge of the
block. So it would only appear to work when blocksize = pagesize, but
if blocksize < pagesize, we end up with extra garbage in the page.

ext4_discard_partial_page_buffers_no_lock() and
ext4_discard_partial_page_buffers(), were modeled off of the original
ext4_block_zero_page_range routine, but modified to handle multiple
blocks for a page. ext4_discard_partial_page_buffers is simply a
wrapper that locks the page before passing it to
ext4_discard_partial_page_buffers_no_lock. In most cases I found that
the page needs to be locked, but for ext4_da_write_end and
ext4_da_write_begin I ran into deadlocks, so I added the wrapper for
optional locking. I will look more into it when I get back, but perhaps
all we need here is some more logic to figure out if the page is present
and needs locking.

Allison Henderson

>
> On Mon, Nov 21, 2011 at 4:59 AM, Hugh Dickins<[email protected]> wrote:
>> We've seen no response to this, so Cc'ing Ted and linux-kernel,
>> and I'll fill in some more detail below.
>>
>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>> It appears that there's a bug with this patch:
>>>
>>> -------------------------------------------
>>> commit 02fac1297eb3f471a27368271aadd285548297b0
>>> Author: Allison Henderson<[email protected]>
>>> Date: Tue Sep 6 21:53:01 2011 -0400
>>>
>>> ext4: fix partial page writes
>>> ...
>>> -------------------------------------------
>>>
>>> Hugh Dickins found a bug with some nasty testing and lockdep that
>>
>> It's the tmpfs swapping test that I've been running, with variations,
>> for years. System booted with mem=700M and 1.5G swap, two repetitious
>> make -j20 kernel builds (of a 2.6.24 kernel: I stuck with that because
>> the balance of built to unbuilt source grows smaller with later kernels),
>> one directly in a tmpfs (irrelevant in this case, except for the added
>> pressure it generates), the other in a 1k-block ext2 (that I drive with
>> ext4's CONFIG_EXT4_USE_FOR_EXT23) on /dev/loop0 on a 450MB tmpfs file.
>>
>> The first oops I got was indeed down in lockdep, but I've since seen
>> crashes from the same cause without lockdep configured in. I've not
>> bothered to write down the stacks, beyond noting ext4_da_write_end()'s
>> call to ext4_discard_partial_page_buffers_no_lock() in them, since the
>> code there is clearly at fault as Curt describes.
>>
>>> crashed in ext4_da_write_end(), and after looking at the code with
>>> him, it appears that the call to
>>> ext4_discard_partial_page_buffers_no_lock() in this routine is
>>> manipulating an unlocked, and possibly non-existent page:
>>>
>>>
>>> -------------------------------------------
>>> ...
>>> ret2 = generic_write_end(file, mapping, pos, len, copied,
>>> page, fsdata);
>>>
>>> page_len = PAGE_CACHE_SIZE -
>>> ((pos + copied - 1)& (PAGE_CACHE_SIZE - 1));
>>>
>>> if (page_len> 0) {
>>> ret = ext4_discard_partial_page_buffers_no_lock(handle,
>>> inode, page, pos + copied - 1, page_len,
>>> EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED);
>>> }
>>> ...
>>> -------------------------------------------
>>>
>>> Note that generic_write_end() will unlock and release the page before
>>> it returns.
>>
>> Exactly. And clearly the loop-on-tmpfs aspect of the test is
>> irrelevant, except in generating more pressure to trigger it.
>>
>>>
>>> I've no good answer for how to fix this properly, but I wanted to let
>>> Allison know about this, if she hadn't already. I looked but didn't
>>> see any related email on the linux-ext4 list for this problem.
>>
>> There was a second problem I was seeing, more elusive and much harder
>> to attribute: occasionally the build on ext2 would fail with errors
>> from ld (almost always of the kind "In function `no symbol': multiple
>> definition of `no symbol'" and "Warning: size of symbol `' changed":
>> I don't know if there's anything to be deduced from that). I took
>> these to indicate an error in filesystem or loop or tmpfs or swap.
>>
>> First suspect was loop changes from hch in 3.2-rc1, but backing those
>> out made no difference. I thought I was facing a week's bisection
>> (since it would need at least a day to conclude any stage good), but
>> took a gamble on backing out *both* parts of 02fac1297eb3: page_len
>> additions to ext4_da_write_begin() as well as page_len additions to
>> ext4_da_write_end().
>>
>> That gamble paid off: the test then showed no problems in several
>> days running on two machines. So, both parts of 02fac1297eb3 are
>> bad, but it's not so easy to see what's wrong with the write_begin.
>>
>> My *guess* is that the partial page fixes have interfered with the
>> subtle page-dirty buffer-dirty protocol in some way, which manifests
>> only under memory pressure.
>>
>> It's conceivable that loop and tmpfs and swap play a part in this
>> further error, but I don't think so: I have no evidence for that,
>> and no such problem was seen before 3.2-rc1.
>>
>> ---
>>
>> I wanted to find you an easier way to reproduce the problem, so I
>> tried fsx (I'm still using a pretty old fsx, no fallocate or punch
>> hole), run in ext2 on a kernel booted with mem=700M. Sorry, I did
>> this a week ago, then didn't find time to write it up, and failed to
>> note when my ext2 was in /dev/loop0 and when it was directly on disk.
>>
>> fsx foo -q -c 100 -l 100000000&
>> while :
>> do # memory hog mmaps and touches each page of 800MB private area
>> swapout 800
>> done
>>
>> I did not reproduce either problem above with that. Instead I found
>> that backing out 02fac1297eb3 made fsx on 3.2-rc1 fail in a few minutes.
>> But leaving 02fac1297eb3 in, fsx still failed in 20 minutes or an hour.
>> On 3.1, fsx failed in a few minutes. On 3.0, fsx failed in half an hour.
>> On 2.6.39, fsx failed in a few minutes. I had to go back to 2.6.38 for
>> fsx to run successfully under memory pressure for more than two hours.
>>
>> It looks as if ext4 testing has not been running fsx under memory
>> pressure recently. And although I didn't reproduce my main problems
>> that way, it could well be that getting fsx to run reliably again
>> under memory pressure will be the way to fix those problems.
>>
>> Thanks,
>> Hugh
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
>


2011-11-21 22:04:35

by Hugh Dickins

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes"

On Mon, 21 Nov 2011, Ted Ts'o wrote:
> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>
> > I did not reproduce either problem above with that. Instead I found
> > that backing out 02fac1297eb3 made fsx on 3.2-rc1 fail in a few minutes.
> > But leaving 02fac1297eb3 in, fsx still failed in 20 minutes or an hour.
> > On 3.1, fsx failed in a few minutes. On 3.0, fsx failed in half an hour.
> > On 2.6.39, fsx failed in a few minutes. I had to go back to 2.6.38 for
> > fsx to run successfully under memory pressure for more than two hours.
> >
> > It looks as if ext4 testing has not been running fsx under memory
> > pressure recently. And although I didn't reproduce my main problems
> > that way, it could well be that getting fsx to run reliably again
> > under memory pressure will be the way to fix those problems.
>
> Yes, I think we've been relying mostly on xfstests, and not
> necessarily under extreme memory pressures. Out of curiosity, what
> sort of configuration were you using when you did the above tests?
> (memory, swap, fs bock size, etc.) Was it the same as you did with
> your make -j20 kernel stress test? And where you using any special
> fsx options?

Thanks for your rapid replies.

x86_64 kernel booted with "mem=700M cgroup_disable=memory" (latter to
rule out any memcg effects), swap was 1.5G, ext2 block size was 1024,

CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT23=y
CONFIG_EXT4_FS_XATTR=y
CONFIG_EXT4_FS_POSIX_ACL=y
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set

fsx options as below, no fallocation or holepunching:

fsx foo -q -c 100 -l 100000000 &
while :
do # memory hog mmaps and touches each page of 800MB private area
swapout 800
done

You might well wonder about the provenance of my fsx, and I'm not
certain where it came from originally (perhaps akpm's toolbox about
nine years ago). So I got an xfstests.git tree

HEAD e219e1cb59660b010ae8c1e22d41d319bb1e10c7
Date: Tue Nov 8 11:41:45 2011 +0800

built the latest version of fsx there, and ran the script with that,
on 3.2-rc2+ kernel pulled yesterday.

This time I used a disk partition, to rule out any loop or tmpfs
effects; but sized the partition to 460800 blocks to match what
I'd been using with loop and tmpfs (after growing impatient when
the first run on the whole 1.5G partition ran for half an hour -
but in retrospect perhaps it was about to blow when I stopped it).

It ran for 28 minutes before fsx failed with
READ BAD DATA: offset = 0x97f50, size = 0xe1bf, fname = foo
OFFSET GOOD BAD RANGE
0xa6000 0x537e 0x0000 0x 0
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
...
0xa600f 0x4453 0x0000 0x f
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (433525 total operations):
followed by the usual trace of operations leading up to this point.

Hugh

2011-11-22 01:44:51

by Yongqiang Yang

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes"

On Tue, Nov 22, 2011 at 1:38 AM, Allison Henderson
<[email protected]> wrote:
> On 11/20/2011 06:59 PM, Yongqiang Yang wrote:
>>
>> Hi,
>>
>> I am curious about the reason we need this operation in write_begin
>> functions. ? ?I had a look at the commit log just now. ? The commit
>> log explains the intention is to handle writes on a hole and writes on
>> EOF. ? Two cases can be handled successfully by block_write_begin.
>>
>>
>> Yongqiang.
>
> Hi all,
>
> Sorry I missed the first note that came through. ?I have not been able to
> look at this in depth yet, but will do so when I get back from the holiday
> break next Thurs. ?Basically this patch was addressing a bug I found when I
> was trying to get the punch hole patch through an overnight run of fsx.
>
> With out this patch, fsx fails (after about 6 or so hours, with punch hole
> enabled). ?The failure is triggered when a region of the test file that is
> supposed to contain zeros, ends up containing garbage. ?In this case what I
> found was that a write operation that starts/ends in a hole or runs off the
> edge of the file, is supposed to zero out the part of the page that is still
> in the hole or beyond EOF. ?But instead of zeroing to the end of the page,
> it would only zero to the edge of the block. ?So it would only appear to
> work when blocksize = pagesize, but if blocksize < pagesize, we end up with
> extra garbage in the page.
Thank you for your response.

Well. According to code in vfs, do_mpage_readpage can handle this by
calling block_read_full_page. Consider that current code can not
handle the case above, then if there is a hole in a file, ext4 with
punching hole does not work as well. I am suspecting punch hole do
something abnormal, eg. setting false uptodate status or operating on
a unlocked page. Just a guess:-) I will have a look at the code.

Yongqiang.
>
> ext4_discard_partial_page_buffers_no_lock() and
> ext4_discard_partial_page_buffers(), were modeled off of the original
> ext4_block_zero_page_range routine, but modified to handle multiple blocks
> for a page. ?ext4_discard_partial_page_buffers is simply a wrapper that
> locks the page before passing it to
> ext4_discard_partial_page_buffers_no_lock. ?In most cases I found that the
> page needs to be locked, but for ext4_da_write_end and ext4_da_write_begin I
> ran into deadlocks, so I added the wrapper for optional locking. ?I will
> look more into it when I get back, but perhaps all we need here is some more
> logic to figure out if the page is present and needs locking.
>
> Allison Henderson
>
>>
>> On Mon, Nov 21, 2011 at 4:59 AM, Hugh Dickins<[email protected]> ?wrote:
>>>
>>> We've seen no response to this, so Cc'ing Ted and linux-kernel,
>>> and I'll fill in some more detail below.
>>>
>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>>
>>>> It appears that there's a bug with this patch:
>>>>
>>>> -------------------------------------------
>>>> commit 02fac1297eb3f471a27368271aadd285548297b0
>>>> Author: Allison Henderson<[email protected]>
>>>> Date: ? Tue Sep 6 21:53:01 2011 -0400
>>>>
>>>> ? ? ext4: fix partial page writes
>>>> ...
>>>> -------------------------------------------
>>>>
>>>> Hugh Dickins found a bug with some nasty testing and lockdep that
>>>
>>> It's the tmpfs swapping test that I've been running, with variations,
>>> for years. ?System booted with mem=700M and 1.5G swap, two repetitious
>>> make -j20 kernel builds (of a 2.6.24 kernel: I stuck with that because
>>> the balance of built to unbuilt source grows smaller with later kernels),
>>> one directly in a tmpfs (irrelevant in this case, except for the added
>>> pressure it generates), the other in a 1k-block ext2 (that I drive with
>>> ext4's CONFIG_EXT4_USE_FOR_EXT23) on /dev/loop0 on a 450MB tmpfs file.
>>>
>>> The first oops I got was indeed down in lockdep, but I've since seen
>>> crashes from the same cause without lockdep configured in. ?I've not
>>> bothered to write down the stacks, beyond noting ext4_da_write_end()'s
>>> call to ext4_discard_partial_page_buffers_no_lock() in them, since the
>>> code there is clearly at fault as Curt describes.
>>>
>>>> crashed in ext4_da_write_end(), and after looking at the code with
>>>> him, it appears that the call to
>>>> ext4_discard_partial_page_buffers_no_lock() in this routine is
>>>> manipulating an unlocked, and possibly non-existent page:
>>>>
>>>>
>>>> -------------------------------------------
>>>> ...
>>>> ? ? ? ret2 = generic_write_end(file, mapping, pos, len, copied,
>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? page, fsdata);
>>>>
>>>> ? ? ? page_len = PAGE_CACHE_SIZE -
>>>> ? ? ? ? ? ? ? ? ? ? ? ((pos + copied - 1)& ?(PAGE_CACHE_SIZE - 1));
>>>>
>>>> ? ? ? if (page_len> ?0) {
>>>> ? ? ? ? ? ? ? ret = ext4_discard_partial_page_buffers_no_lock(handle,
>>>> ? ? ? ? ? ? ? ? ? ? ? inode, page, pos + copied - 1, page_len,
>>>> ? ? ? ? ? ? ? ? ? ? ? EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED);
>>>> ? ? ? }
>>>> ...
>>>> -------------------------------------------
>>>>
>>>> Note that generic_write_end() will unlock and release the page before
>>>> it returns.
>>>
>>> Exactly. ?And clearly the loop-on-tmpfs aspect of the test is
>>> irrelevant, except in generating more pressure to trigger it.
>>>
>>>>
>>>> I've no good answer for how to fix this properly, but I wanted to let
>>>> Allison know about this, if she hadn't already. ?I looked but didn't
>>>> see any related email on the linux-ext4 list for this problem.
>>>
>>> There was a second problem I was seeing, more elusive and much harder
>>> to attribute: occasionally the build on ext2 would fail with errors
>>> from ld (almost always of the kind "In function `no symbol': multiple
>>> definition of `no symbol'" and "Warning: size of symbol `' changed":
>>> I don't know if there's anything to be deduced from that). ?I took
>>> these to indicate an error in filesystem or loop or tmpfs or swap.
>>>
>>> First suspect was loop changes from hch in 3.2-rc1, but backing those
>>> out made no difference. ?I thought I was facing a week's bisection
>>> (since it would need at least a day to conclude any stage good), but
>>> took a gamble on backing out *both* parts of 02fac1297eb3: page_len
>>> additions to ext4_da_write_begin() as well as page_len additions to
>>> ext4_da_write_end().
>>>
>>> That gamble paid off: the test then showed no problems in several
>>> days running on two machines. ?So, both parts of 02fac1297eb3 are
>>> bad, but it's not so easy to see what's wrong with the write_begin.
>>>
>>> My *guess* is that the partial page fixes have interfered with the
>>> subtle page-dirty buffer-dirty protocol in some way, which manifests
>>> only under memory pressure.
>>>
>>> It's conceivable that loop and tmpfs and swap play a part in this
>>> further error, but I don't think so: I have no evidence for that,
>>> and no such problem was seen before 3.2-rc1.
>>>
>>> ---
>>>
>>> I wanted to find you an easier way to reproduce the problem, so I
>>> tried fsx (I'm still using a pretty old fsx, no fallocate or punch
>>> hole), run in ext2 on a kernel booted with mem=700M. ?Sorry, I did
>>> this a week ago, then didn't find time to write it up, and failed to
>>> note when my ext2 was in /dev/loop0 and when it was directly on disk.
>>>
>>> fsx foo -q -c 100 -l 100000000&
>>> while :
>>> do ? ? ?# memory hog mmaps and touches each page of 800MB private area
>>> ? ? ? ?swapout 800
>>> done
>>>
>>> I did not reproduce either problem above with that. ?Instead I found
>>> that backing out 02fac1297eb3 made fsx on 3.2-rc1 fail in a few minutes.
>>> But leaving 02fac1297eb3 in, fsx still failed in 20 minutes or an hour.
>>> On 3.1, fsx failed in a few minutes. ?On 3.0, fsx failed in half an hour.
>>> On 2.6.39, fsx failed in a few minutes. ?I had to go back to 2.6.38 for
>>> fsx to run successfully under memory pressure for more than two hours.
>>>
>>> It looks as if ext4 testing has not been running fsx under memory
>>> pressure recently. ?And although I didn't reproduce my main problems
>>> that way, it could well be that getting fsx to run reliably again
>>> under memory pressure will be the way to fix those problems.
>>>
>>> Thanks,
>>> Hugh
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to [email protected]
>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>>
>
>



--
Best Wishes
Yongqiang Yang

2011-12-05 23:39:03

by Hugh Dickins

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Mon, 21 Nov 2011, Hugh Dickins wrote:
> On Mon, 21 Nov 2011, Ted Ts'o wrote:
> > On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
> > > On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
> > > It appears that there's a bug with this patch:

This has been outstanding for a month now, and we've heard no progress:
please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.

The problems appear on a 1k-blocksize filesystem under memory pressure:
the hunk in ext4_da_write_end() causes oops, because it's playing with
a page after generic_write_end() dropped our last reference to it; and
backing out the hunk in ext4_da_write_begin() is then found to stop
rare data corruption seen when kbuilding.

Although I earlier reported that backing out the patch caused an fsx
test to fail earlier, I've since found great variation in how soon it
fails, and seen it fail just as quickly with 02fac1297eb3 still in.
I also reported that I had to go back to 2.6.38 for fsx not to fail
under memory pressure: you won't be surprised that that turned out to
be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.

Thanks,
Hugh

2011-12-06 01:40:58

by Allison Henderson

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On 12/05/2011 04:38 PM, Hugh Dickins wrote:
> On Mon, 21 Nov 2011, Hugh Dickins wrote:
>> On Mon, 21 Nov 2011, Ted Ts'o wrote:
>>> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>> It appears that there's a bug with this patch:
>
> This has been outstanding for a month now, and we've heard no progress:
> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
>
> The problems appear on a 1k-blocksize filesystem under memory pressure:
> the hunk in ext4_da_write_end() causes oops, because it's playing with
> a page after generic_write_end() dropped our last reference to it; and
> backing out the hunk in ext4_da_write_begin() is then found to stop
> rare data corruption seen when kbuilding.
>
> Although I earlier reported that backing out the patch caused an fsx
> test to fail earlier, I've since found great variation in how soon it
> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
> I also reported that I had to go back to 2.6.38 for fsx not to fail
> under memory pressure: you won't be surprised that that turned out to
> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>
> Thanks,
> Hugh
>


Hi there,

Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
works well when blocksize < pagesize" ? I have tried it and it does
seem to help, but I am still running into some failures that I am trying
to debug, but let please let us know if it helps the issues that you are
seeing. Thx!

Allison Henderson


2011-12-06 03:08:37

by Yongqiang Yang

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

Hi Allison,

I noticed another problem which has nothing to do with punching hole.
__block_write_begin does not zero buffers beyond EOF.(I guess you
tried to zero them in your code, am I right? ) When users mapread
beyond EOF, users get non-zero data. I am not sure zero or non-zero
data should be, but fsx thinks they should be zero data and reports an
error.

It I understand the problem right, it happens more often with punch hole.

Yongqiang.
On Tue, Dec 6, 2011 at 9:40 AM, Allison Henderson
<[email protected]> wrote:
> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>
>> On Mon, 21 Nov 2011, Hugh Dickins wrote:
>>>
>>> On Mon, 21 Nov 2011, Ted Ts'o wrote:
>>>>
>>>> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>>>>>
>>>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>>> It appears that there's a bug with this patch:
>>
>>
>> This has been outstanding for a month now, and we've heard no progress:
>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
>>
>> The problems appear on a 1k-blocksize filesystem under memory pressure:
>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>> a page after generic_write_end() dropped our last reference to it; and
>> backing out the hunk in ext4_da_write_begin() is then found to stop
>> rare data corruption seen when kbuilding.
>>
>> Although I earlier reported that backing out the patch caused an fsx
>> test to fail earlier, I've since found great variation in how soon it
>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>> under memory pressure: you won't be surprised that that turned out to
>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>
>> Thanks,
>> Hugh
>>
>
>
> Hi there,
>
> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
> works well when blocksize < pagesize" ? ?I have tried it and it does seem to
> help, but I am still running into some failures that I am trying to debug,
> but let please let us know if it helps the issues that you are seeing. ?Thx!
>
> Allison Henderson
>



--
Best Wishes
Yongqiang Yang

2011-12-06 03:20:11

by Yongqiang Yang

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Tue, Dec 6, 2011 at 11:08 AM, Yongqiang Yang <[email protected]> wrote:
> Hi Allison,
>
> I noticed another problem which has nothing to do with punching hole.
> ?__block_write_begin does not zero buffers beyond EOF.(I guess you
> tried to zero them in your code, am I right? ) ?When users mapread
> beyond EOF, ?users get non-zero data. ?I am not sure zero or non-zero
> data should be, but fsx thinks they should be zero data and reports an
> error.
>
> It I understand the problem right, it happens more often with punch hole.
Strange! filemap_fault should have handled the case.

Yongqiang.
>
> Yongqiang.
> On Tue, Dec 6, 2011 at 9:40 AM, Allison Henderson
> <[email protected]> wrote:
>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>
>>> On Mon, 21 Nov 2011, Hugh Dickins wrote:
>>>>
>>>> On Mon, 21 Nov 2011, Ted Ts'o wrote:
>>>>>
>>>>> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>>>>>>
>>>>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>>>> It appears that there's a bug with this patch:
>>>
>>>
>>> This has been outstanding for a month now, and we've heard no progress:
>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
>>>
>>> The problems appear on a 1k-blocksize filesystem under memory pressure:
>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>> a page after generic_write_end() dropped our last reference to it; and
>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>> rare data corruption seen when kbuilding.
>>>
>>> Although I earlier reported that backing out the patch caused an fsx
>>> test to fail earlier, I've since found great variation in how soon it
>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>> under memory pressure: you won't be surprised that that turned out to
>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>>
>>> Thanks,
>>> Hugh
>>>
>>
>>
>> Hi there,
>>
>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>> works well when blocksize < pagesize" ? ?I have tried it and it does seem to
>> help, but I am still running into some failures that I am trying to debug,
>> but let please let us know if it helps the issues that you are seeing. ?Thx!
>>
>> Allison Henderson
>>
>
>
>
> --
> Best Wishes
> Yongqiang Yang



--
Best Wishes
Yongqiang Yang

2011-12-06 03:34:03

by Tao Ma

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On 12/06/2011 11:08 AM, Yongqiang Yang wrote:
> Hi Allison,
>
> I noticed another problem which has nothing to do with punching hole.
> __block_write_begin does not zero buffers beyond EOF.(I guess you
yes, that is expected.
> tried to zero them in your code, am I right? ) When users mapread
> beyond EOF, users get non-zero data. I am not sure zero or non-zero
> data should be, but fsx thinks they should be zero data and reports an
> error.
why users can read the data passing EOF? I am also puzzled. Punching
hole will do this? I don't think it's right.

Thanks
Tao
>
> It I understand the problem right, it happens more often with punch hole.
>
> Yongqiang.
> On Tue, Dec 6, 2011 at 9:40 AM, Allison Henderson
> <[email protected]> wrote:
>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>
>>> On Mon, 21 Nov 2011, Hugh Dickins wrote:
>>>>
>>>> On Mon, 21 Nov 2011, Ted Ts'o wrote:
>>>>>
>>>>> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>>>>>>
>>>>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>>>> It appears that there's a bug with this patch:
>>>
>>>
>>> This has been outstanding for a month now, and we've heard no progress:
>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
>>>
>>> The problems appear on a 1k-blocksize filesystem under memory pressure:
>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>> a page after generic_write_end() dropped our last reference to it; and
>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>> rare data corruption seen when kbuilding.
>>>
>>> Although I earlier reported that backing out the patch caused an fsx
>>> test to fail earlier, I've since found great variation in how soon it
>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>> under memory pressure: you won't be surprised that that turned out to
>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>>
>>> Thanks,
>>> Hugh
>>>
>>
>>
>> Hi there,
>>
>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>> works well when blocksize < pagesize" ? I have tried it and it does seem to
>> help, but I am still running into some failures that I am trying to debug,
>> but let please let us know if it helps the issues that you are seeing. Thx!
>>
>> Allison Henderson
>>
>
>
>


2011-12-06 03:44:08

by Yongqiang Yang

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Tue, Dec 6, 2011 at 11:33 AM, Tao Ma <[email protected]> wrote:
> On 12/06/2011 11:08 AM, Yongqiang Yang wrote:
>> Hi Allison,
>>
>> I noticed another problem which has nothing to do with punching hole.
>> ?__block_write_begin does not zero buffers beyond EOF.(I guess you
> yes, that is expected.
>> tried to zero them in your code, am I right? ) ?When users mapread
>> beyond EOF, ?users get non-zero data. ?I am not sure zero or non-zero
>> data should be, but fsx thinks they should be zero data and reports an
>> error.
> why users can read the data passing EOF? I am also puzzled. Punching
> hole will do this? I don't think it's right.
According to code, fiemap_fault handles the case right. But I met
the error - 'non-zero data beyond EOF' reported by fsx. It is
strange. It seems that uptodate status is set wrong. Just a guess:-)

I am guessing Allison met the problem before and tried to fix it in
write path by zeroing buffers beyond EOF.

Yongqiang.
>
> Thanks
> Tao
>>
>> It I understand the problem right, it happens more often with punch hole.
>>
>> Yongqiang.
>> On Tue, Dec 6, 2011 at 9:40 AM, Allison Henderson
>> <[email protected]> wrote:
>>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>>
>>>> On Mon, 21 Nov 2011, Hugh Dickins wrote:
>>>>>
>>>>> On Mon, 21 Nov 2011, Ted Ts'o wrote:
>>>>>>
>>>>>> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>>>>>>>
>>>>>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>>>>> It appears that there's a bug with this patch:
>>>>
>>>>
>>>> This has been outstanding for a month now, and we've heard no progress:
>>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
>>>>
>>>> The problems appear on a 1k-blocksize filesystem under memory pressure:
>>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>>> a page after generic_write_end() dropped our last reference to it; and
>>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>>> rare data corruption seen when kbuilding.
>>>>
>>>> Although I earlier reported that backing out the patch caused an fsx
>>>> test to fail earlier, I've since found great variation in how soon it
>>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>>> under memory pressure: you won't be surprised that that turned out to
>>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>>>
>>>> Thanks,
>>>> Hugh
>>>>
>>>
>>>
>>> Hi there,
>>>
>>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>>> works well when blocksize < pagesize" ? ?I have tried it and it does seem to
>>> help, but I am still running into some failures that I am trying to debug,
>>> but let please let us know if it helps the issues that you are seeing. ?Thx!
>>>
>>> Allison Henderson
>>>
>>
>>
>>
>



--
Best Wishes
Yongqiang Yang

2011-12-06 04:07:17

by Allison Henderson

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On 12/05/2011 08:44 PM, Yongqiang Yang wrote:
> On Tue, Dec 6, 2011 at 11:33 AM, Tao Ma<[email protected]> wrote:
>> On 12/06/2011 11:08 AM, Yongqiang Yang wrote:
>>> Hi Allison,
>>>
>>> I noticed another problem which has nothing to do with punching hole.
>>> __block_write_begin does not zero buffers beyond EOF.(I guess you
>> yes, that is expected.
>>> tried to zero them in your code, am I right? ) When users mapread
>>> beyond EOF, users get non-zero data. I am not sure zero or non-zero
>>> data should be, but fsx thinks they should be zero data and reports an
>>> error.
>> why users can read the data passing EOF? I am also puzzled. Punching
>> hole will do this? I don't think it's right.
> According to code, fiemap_fault handles the case right. But I met
> the error - 'non-zero data beyond EOF' reported by fsx. It is
> strange. It seems that uptodate status is set wrong. Just a guess:-)
>
> I am guessing Allison met the problem before and tried to fix it in
> write path by zeroing buffers beyond EOF.

Yes I did run into something similar. I found 2 cases that involved EOF:
1. A truncate shortens EOF, but only zeroed to the end of the block, but
not to the end of the page. This was corrected by "[PATCH 5/6 v7] ext4:
fix fsx truncate failure"

2. A write extends EOF, but does not zero all of the page beyond EOF,
and that was what "[PATCH 6/6 v7] ext4: fix partial page writes" was
supposed to address.

I am still digging through tracing output at the moment, so I dont have
a very good explanation right now, but I will keep folks posted if I
find something.

Allison Henderson


>
> Yongqiang.
>>
>> Thanks
>> Tao
>>>
>>> It I understand the problem right, it happens more often with punch hole.
>>>
>>> Yongqiang.
>>> On Tue, Dec 6, 2011 at 9:40 AM, Allison Henderson
>>> <[email protected]> wrote:
>>>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>>>
>>>>> On Mon, 21 Nov 2011, Hugh Dickins wrote:
>>>>>>
>>>>>> On Mon, 21 Nov 2011, Ted Ts'o wrote:
>>>>>>>
>>>>>>> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>>>>>>>>
>>>>>>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>>>>>> It appears that there's a bug with this patch:
>>>>>
>>>>>
>>>>> This has been outstanding for a month now, and we've heard no progress:
>>>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
>>>>>
>>>>> The problems appear on a 1k-blocksize filesystem under memory pressure:
>>>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>>>> a page after generic_write_end() dropped our last reference to it; and
>>>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>>>> rare data corruption seen when kbuilding.
>>>>>
>>>>> Although I earlier reported that backing out the patch caused an fsx
>>>>> test to fail earlier, I've since found great variation in how soon it
>>>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>>>> under memory pressure: you won't be surprised that that turned out to
>>>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>>>>
>>>>> Thanks,
>>>>> Hugh
>>>>>
>>>>
>>>>
>>>> Hi there,
>>>>
>>>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>>>> works well when blocksize< pagesize" ? I have tried it and it does seem to
>>>> help, but I am still running into some failures that I am trying to debug,
>>>> but let please let us know if it helps the issues that you are seeing. Thx!
>>>>
>>>> Allison Henderson
>>>>
>>>
>>>
>>>
>>
>
>
>


2011-12-06 07:57:07

by Yongqiang Yang

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Tue, Dec 6, 2011 at 12:05 PM, Allison Henderson
<[email protected]> wrote:
> On 12/05/2011 08:44 PM, Yongqiang Yang wrote:
>>
>> On Tue, Dec 6, 2011 at 11:33 AM, Tao Ma<[email protected]> ?wrote:
>>>
>>> On 12/06/2011 11:08 AM, Yongqiang Yang wrote:
>>>>
>>>> Hi Allison,
>>>>
>>>> I noticed another problem which has nothing to do with punching hole.
>>>> ?__block_write_begin does not zero buffers beyond EOF.(I guess you
>>>
>>> yes, that is expected.
>>>>
>>>> tried to zero them in your code, am I right? ) ?When users mapread
>>>> beyond EOF, ?users get non-zero data. ?I am not sure zero or non-zero
>>>> data should be, but fsx thinks they should be zero data and reports an
>>>> error.
>>>
>>> why users can read the data passing EOF? I am also puzzled. Punching
>>> hole will do this? I don't think it's right.
>>
>> According to code, fiemap_fault handles the case right. ? But I met
>> the error - 'non-zero data beyond EOF' reported by fsx. ?It is
>> strange. ?It seems that uptodate status is set wrong. ?Just a guess:-)
>>
>> I am guessing Allison met the problem before and tried to fix it in
>> write path by zeroing buffers beyond EOF.
>
>
> Yes I did run into something similar. ?I found 2 cases that involved EOF:
> 1. A truncate shortens EOF, but only zeroed to the end of the block, but not
> to the end of the page. ?This was corrected by "[PATCH 5/6 v7] ext4: fix fsx
> truncate failure"
>
> 2. A write extends EOF, but does not zero all of the page beyond EOF, and
> that was what "[PATCH 6/6 v7] ext4: fix partial page writes" was supposed to
> address.
I ran into the 2nd case, this case should be handled by readpage. In
this case, write_end should not set uptodate on page. Both mapread
and read should work. Becasue fiemap_fault calls readpage on
non-uptodate page in mapread case.

It seems that write_end sets page uptodate, as a result garbage data
is seen by applications. But I can not find why this happens.


Yongqiang.
>
> I am still digging through tracing output at the moment, so I dont have a
> very good explanation right now, but I will keep folks posted if I find
> something.
>
> Allison Henderson
>
>
>
>>
>> Yongqiang.
>>>
>>>
>>> Thanks
>>> Tao
>>>>
>>>>
>>>> It I understand the problem right, it happens more often with punch
>>>> hole.
>>>>
>>>> Yongqiang.
>>>> On Tue, Dec 6, 2011 at 9:40 AM, Allison Henderson
>>>> <[email protected]> ?wrote:
>>>>>
>>>>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>>>>
>>>>>>
>>>>>> On Mon, 21 Nov 2011, Hugh Dickins wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 21 Nov 2011, Ted Ts'o wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>>>>>>> It appears that there's a bug with this patch:
>>>>>>
>>>>>>
>>>>>>
>>>>>> This has been outstanding for a month now, and we've heard no
>>>>>> progress:
>>>>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for
>>>>>> rc5.
>>>>>>
>>>>>> The problems appear on a 1k-blocksize filesystem under memory
>>>>>> pressure:
>>>>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>>>>> a page after generic_write_end() dropped our last reference to it; and
>>>>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>>>>> rare data corruption seen when kbuilding.
>>>>>>
>>>>>> Although I earlier reported that backing out the patch caused an fsx
>>>>>> test to fail earlier, I've since found great variation in how soon it
>>>>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>>>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>>>>> under memory pressure: you won't be surprised that that turned out to
>>>>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>>>>>
>>>>>> Thanks,
>>>>>> Hugh
>>>>>>
>>>>>
>>>>>
>>>>> Hi there,
>>>>>
>>>>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>>>>> works well when blocksize< ?pagesize" ? ?I have tried it and it does
>>>>> seem to
>>>>> help, but I am still running into some failures that I am trying to
>>>>> debug,
>>>>> but let please let us know if it helps the issues that you are seeing.
>>>>> ?Thx!
>>>>>
>>>>> Allison Henderson
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>



--
Best Wishes
Yongqiang Yang

2011-12-06 08:56:14

by Hugh Dickins

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Mon, 5 Dec 2011, Allison Henderson wrote:
> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
> >
> > This has been outstanding for a month now, and we've heard no progress:
> > please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
> >
> > The problems appear on a 1k-blocksize filesystem under memory pressure:
> > the hunk in ext4_da_write_end() causes oops, because it's playing with
> > a page after generic_write_end() dropped our last reference to it; and
> > backing out the hunk in ext4_da_write_begin() is then found to stop
> > rare data corruption seen when kbuilding.
> >
> > Although I earlier reported that backing out the patch caused an fsx
> > test to fail earlier, I've since found great variation in how soon it
> > fails, and seen it fail just as quickly with 02fac1297eb3 still in.
> > I also reported that I had to go back to 2.6.38 for fsx not to fail
> > under memory pressure: you won't be surprised that that turned out to
> > be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>
> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
> works well when blocksize < pagesize" ? I have tried it and it does seem to
> help, but I am still running into some failures that I am trying to debug,
> but let please let us know if it helps the issues that you are seeing. Thx!

That 1/2, or the 2/2 "ext4: let ext4_discard_partial_buffers handle
pages without buffers correctly"? The latter is mostly a reversion
of your 02fac1297eb3, so that's the one I need to fix the oops and
rare data corruption. Perhaps you're suggesting 1/2 for fsx failures
under memory pressure?

I've now tried the fsx test on three machines, with both 1/2 and 2/2
applied to 3.2-rc4. On one machine, with ext2 on loop on tmpfs, the
fsx test failed in a couple of minutes with those patches; on another
machine, with ext2 on loop on tmpfs, it failed after about 40 minutes
with the patches; on this laptop, with ext2 on SSD, it's just now
failed after 35 minutes with the patches.

That's not to say that Yongqiang's patches aren't good; but I cannot
detect whether they make any improvement or not, since lasting for 2 or
40 minutes is typical for fsx under memory pressure with recent kernels.

Hugh

2011-12-06 09:32:05

by Yongqiang Yang

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Tue, Dec 6, 2011 at 4:55 PM, Hugh Dickins <[email protected]> wrote:
> On Mon, 5 Dec 2011, Allison Henderson wrote:
>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>> >
>> > This has been outstanding for a month now, and we've heard no progress:
>> > please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
>> >
>> > The problems appear on a 1k-blocksize filesystem under memory pressure:
>> > the hunk in ext4_da_write_end() causes oops, because it's playing with
>> > a page after generic_write_end() dropped our last reference to it; and
>> > backing out the hunk in ext4_da_write_begin() is then found to stop
>> > rare data corruption seen when kbuilding.
>> >
>> > Although I earlier reported that backing out the patch caused an fsx
>> > test to fail earlier, I've since found great variation in how soon it
>> > fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>> > I also reported that I had to go back to 2.6.38 for fsx not to fail
>> > under memory pressure: you won't be surprised that that turned out to
>> > be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>
>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>> works well when blocksize < pagesize" ? ?I have tried it and it does seem to
>> help, but I am still running into some failures that I am trying to debug,
>> but let please let us know if it helps the issues that you are seeing. ?Thx!
>
> That 1/2, or the 2/2 "ext4: let ext4_discard_partial_buffers handle
> pages without buffers correctly"? ?The latter is mostly a reversion
> of your 02fac1297eb3, so that's the one I need to fix the oops and
> rare data corruption. ?Perhaps you're suggesting 1/2 for fsx failures
> under memory pressure?
>
> I've now tried the fsx test on three machines, with both 1/2 and 2/2
> applied to 3.2-rc4. ?On one machine, with ext2 on loop on tmpfs, the
> fsx test failed in a couple of minutes with those patches; on another
> machine, with ext2 on loop on tmpfs, it failed after about 40 minutes
> with ?the patches; on this laptop, with ext2 on SSD, it's just now
> failed after 35 minutes with the patches.
ext2? So files are indirect mapped? If so, the failure should has
nothing to do with punching hole, I remember that punch hole is not
supported for indirect mapped files.

Do you mean fsx failure or the bug you reported earlier due to
referencing a unlocked page?

Yongqiang.
>
> That's not to say that Yongqiang's patches aren't good; but I cannot
> detect whether they make any improvement or not, since lasting for 2 or
> 40 minutes is typical for fsx under memory pressure with recent kernels.
>
> Hugh



--
Best Wishes
Yongqiang Yang

2011-12-06 21:15:41

by Allison Henderson

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On 12/06/2011 01:55 AM, Hugh Dickins wrote:
> On Mon, 5 Dec 2011, Allison Henderson wrote:
>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>
>>> This has been outstanding for a month now, and we've heard no progress:
>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
>>>
>>> The problems appear on a 1k-blocksize filesystem under memory pressure:
>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>> a page after generic_write_end() dropped our last reference to it; and
>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>> rare data corruption seen when kbuilding.
>>>
>>> Although I earlier reported that backing out the patch caused an fsx
>>> test to fail earlier, I've since found great variation in how soon it
>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>> under memory pressure: you won't be surprised that that turned out to
>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>
>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>> works well when blocksize< pagesize" ? I have tried it and it does seem to
>> help, but I am still running into some failures that I am trying to debug,
>> but let please let us know if it helps the issues that you are seeing. Thx!
>
> That 1/2, or the 2/2 "ext4: let ext4_discard_partial_buffers handle
> pages without buffers correctly"? The latter is mostly a reversion
> of your 02fac1297eb3, so that's the one I need to fix the oops and
> rare data corruption. Perhaps you're suggesting 1/2 for fsx failures
> under memory pressure?
>
> I've now tried the fsx test on three machines, with both 1/2 and 2/2
> applied to 3.2-rc4. On one machine, with ext2 on loop on tmpfs, the
> fsx test failed in a couple of minutes with those patches; on another
> machine, with ext2 on loop on tmpfs, it failed after about 40 minutes
> with the patches; on this laptop, with ext2 on SSD, it's just now
> failed after 35 minutes with the patches.
>
> That's not to say that Yongqiang's patches aren't good; but I cannot
> detect whether they make any improvement or not, since lasting for 2 or
> 40 minutes is typical for fsx under memory pressure with recent kernels.


Well, initially I meant to just try the whole set, but now that I try
just one of them, I find that I get further with only the first one. I
think Yongqiang and I have a similar set up because I get the hang if I
dont have the first patch, and I get the fsx write failure (in about 20
or so minutes) if I have the second one. But I think Yongqiang's right,
we need to figure out why the page is uptodate when it shouldn't be.


>
> Hugh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2011-12-06 22:32:12

by Hugh Dickins

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Tue, 6 Dec 2011, Yongqiang Yang wrote:
> On Tue, Dec 6, 2011 at 4:55 PM, Hugh Dickins <[email protected]> wrote:
> >
> > I've now tried the fsx test on three machines, with both 1/2 and 2/2
> > applied to 3.2-rc4. ?On one machine, with ext2 on loop on tmpfs, the
> > fsx test failed in a couple of minutes with those patches; on another
> > machine, with ext2 on loop on tmpfs, it failed after about 40 minutes
> > with ?the patches; on this laptop, with ext2 on SSD, it's just now
> > failed after 35 minutes with the patches.

> ext2? So files are indirect mapped? If so, the failure should has
> nothing to do with punching hole, I remember that punch hole is not
> supported for indirect mapped files.

Yes, I am not trying to test hole punching: just trying to do my own
testing under traditional loads, and hitting problems in changes which
have gone into extN to fix up hole punching.

Some of the time I've been using an old fsx which doesn't even know
about fallocate(), some of the time a recent fsx from xfstests. Most
of the time I've been using ext2, but occasionally I try it on ext4.

I think the only thing which makes the problems go away is blocksize
same as pagesize; but it's a long time since I tried nomblk_io_submit,
that made a big difference around 2.6.38/39, perhaps it still does.

>
> Do you mean fsx failure or the bug you reported earlier due to
> referencing a unlocked page?

In first quoted paragraph above, fsx failure.

Hugh

2011-12-06 22:47:24

by Hugh Dickins

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Tue, 6 Dec 2011, Allison Henderson wrote:
> On 12/06/2011 01:55 AM, Hugh Dickins wrote:
> > On Mon, 5 Dec 2011, Allison Henderson wrote:
> > >
> > > Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
> > > works well when blocksize< pagesize" ? I have tried it and it does
> > > seem to
> > > help, but I am still running into some failures that I am trying to
> > > debug,
> > > but let please let us know if it helps the issues that you are seeing.
> > > Thx!
> >
> > That 1/2, or the 2/2 "ext4: let ext4_discard_partial_buffers handle
> > pages without buffers correctly"? The latter is mostly a reversion
> > of your 02fac1297eb3, so that's the one I need to fix the oops and
> > rare data corruption. Perhaps you're suggesting 1/2 for fsx failures
> > under memory pressure?
> >
> > I've now tried the fsx test on three machines, with both 1/2 and 2/2
> > applied to 3.2-rc4. On one machine, with ext2 on loop on tmpfs, the
> > fsx test failed in a couple of minutes with those patches; on another
> > machine, with ext2 on loop on tmpfs, it failed after about 40 minutes
> > with the patches; on this laptop, with ext2 on SSD, it's just now
> > failed after 35 minutes with the patches.
> >
> > That's not to say that Yongqiang's patches aren't good; but I cannot
> > detect whether they make any improvement or not, since lasting for 2 or
> > 40 minutes is typical for fsx under memory pressure with recent kernels.
>
>
> Well, initially I meant to just try the whole set, but now that I try just
> one of them, I find that I get further with only the first one. I think
> Yongqiang and I have a similar set up because I get the hang if I dont have
> the first patch, and I get the fsx write failure (in about 20 or so minutes)
> if I have the second one. But I think Yongqiang's right, we need to figure
> out why the page is uptodate when it shouldn't be.

I've not seen a hang myself. I'm reluctant to drop the second patch,
since it appears to fix the real problems (oops and corruption) that
I have seen, and fsx fails with or without it; but I don't know whether
it's any better than simply reverting yours.

I would certainly like fsx under memory pressure to be reliable again -
that will help all our testing of new changes; but I think it's more
urgent to get normal loads to be reliable again.

If you can call building on -t ext2 -b 1024 under memory pressure normal ;)

Hugh

2011-12-07 08:28:56

by Yongqiang Yang

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

Hi Allison and Hugh,

I think I found the problem and it has nothing to do with punching
hole. The patch [ext4: let ext4_bio_write_page handle EOF correctly]
would fix up the problem.

I post the patch so that it can be tested as early as possible. The
problem has not appeared on my machine since the patch is applied.

Yongqiang.
On Wed, Dec 7, 2011 at 5:15 AM, Allison Henderson
<[email protected]> wrote:
> On 12/06/2011 01:55 AM, Hugh Dickins wrote:
>>
>> On Mon, 5 Dec 2011, Allison Henderson wrote:
>>>
>>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>>
>>>>
>>>> This has been outstanding for a month now, and we've heard no progress:
>>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for
>>>> rc5.
>>>>
>>>> The problems appear on a 1k-blocksize filesystem under memory pressure:
>>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>>> a page after generic_write_end() dropped our last reference to it; and
>>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>>> rare data corruption seen when kbuilding.
>>>>
>>>> Although I earlier reported that backing out the patch caused an fsx
>>>> test to fail earlier, I've since found great variation in how soon it
>>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>>> under memory pressure: you won't be surprised that that turned out to
>>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>>
>>>
>>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>>> works well when blocksize< ?pagesize" ? ?I have tried it and it does seem
>>> to
>>> help, but I am still running into some failures that I am trying to
>>> debug,
>>> but let please let us know if it helps the issues that you are seeing.
>>> ?Thx!
>>
>>
>> That 1/2, or the 2/2 "ext4: let ext4_discard_partial_buffers handle
>> pages without buffers correctly"? ?The latter is mostly a reversion
>> of your 02fac1297eb3, so that's the one I need to fix the oops and
>> rare data corruption. ?Perhaps you're suggesting 1/2 for fsx failures
>> under memory pressure?
>>
>> I've now tried the fsx test on three machines, with both 1/2 and 2/2
>> applied to 3.2-rc4. ?On one machine, with ext2 on loop on tmpfs, the
>> fsx test failed in a couple of minutes with those patches; on another
>> machine, with ext2 on loop on tmpfs, it failed after about 40 minutes
>> with ?the patches; on this laptop, with ext2 on SSD, it's just now
>> failed after 35 minutes with the patches.
>>
>> That's not to say that Yongqiang's patches aren't good; but I cannot
>> detect whether they make any improvement or not, since lasting for 2 or
>> 40 minutes is typical for fsx under memory pressure with recent kernels.
>
>
>
> Well, initially I meant to just try the whole set, but now that I try just
> one of them, I find that I get further with only the first one. ?I think
> Yongqiang and I have a similar set up because I get the hang if I dont have
> the first patch, and I get the fsx write failure (in about 20 or so minutes)
> if I have the second one. ?But I think Yongqiang's right, we need to figure
> out why the page is uptodate when it shouldn't be.
>
>
>>
>> Hugh
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html



--
Best Wishes
Yongqiang Yang

2011-12-07 17:04:43

by Allison Henderson

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On 12/07/2011 01:28 AM, Yongqiang Yang wrote:
> Hi Allison and Hugh,
>
> I think I found the problem and it has nothing to do with punching
> hole. The patch [ext4: let ext4_bio_write_page handle EOF correctly]
> would fix up the problem.
>
> I post the patch so that it can be tested as early as possible. The
> problem has not appeared on my machine since the patch is applied.
>
> Yongqiang.

Great! I will try it out with your other set in my sandbox and let you
know what happens. Thx!

Allison Henderson

> On Wed, Dec 7, 2011 at 5:15 AM, Allison Henderson
> <[email protected]> wrote:
>> On 12/06/2011 01:55 AM, Hugh Dickins wrote:
>>>
>>> On Mon, 5 Dec 2011, Allison Henderson wrote:
>>>>
>>>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>>>
>>>>>
>>>>> This has been outstanding for a month now, and we've heard no progress:
>>>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for
>>>>> rc5.
>>>>>
>>>>> The problems appear on a 1k-blocksize filesystem under memory pressure:
>>>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>>>> a page after generic_write_end() dropped our last reference to it; and
>>>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>>>> rare data corruption seen when kbuilding.
>>>>>
>>>>> Although I earlier reported that backing out the patch caused an fsx
>>>>> test to fail earlier, I've since found great variation in how soon it
>>>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>>>> under memory pressure: you won't be surprised that that turned out to
>>>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>>>
>>>>
>>>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>>>> works well when blocksize< pagesize" ? I have tried it and it does seem
>>>> to
>>>> help, but I am still running into some failures that I am trying to
>>>> debug,
>>>> but let please let us know if it helps the issues that you are seeing.
>>>> Thx!
>>>
>>>
>>> That 1/2, or the 2/2 "ext4: let ext4_discard_partial_buffers handle
>>> pages without buffers correctly"? The latter is mostly a reversion
>>> of your 02fac1297eb3, so that's the one I need to fix the oops and
>>> rare data corruption. Perhaps you're suggesting 1/2 for fsx failures
>>> under memory pressure?
>>>
>>> I've now tried the fsx test on three machines, with both 1/2 and 2/2
>>> applied to 3.2-rc4. On one machine, with ext2 on loop on tmpfs, the
>>> fsx test failed in a couple of minutes with those patches; on another
>>> machine, with ext2 on loop on tmpfs, it failed after about 40 minutes
>>> with the patches; on this laptop, with ext2 on SSD, it's just now
>>> failed after 35 minutes with the patches.
>>>
>>> That's not to say that Yongqiang's patches aren't good; but I cannot
>>> detect whether they make any improvement or not, since lasting for 2 or
>>> 40 minutes is typical for fsx under memory pressure with recent kernels.
>>
>>
>>
>> Well, initially I meant to just try the whole set, but now that I try just
>> one of them, I find that I get further with only the first one. I think
>> Yongqiang and I have a similar set up because I get the hang if I dont have
>> the first patch, and I get the fsx write failure (in about 20 or so minutes)
>> if I have the second one. But I think Yongqiang's right, we need to figure
>> out why the page is uptodate when it shouldn't be.
>>
>>
>>>
>>> Hugh
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>

2011-12-08 05:10:40

by Allison Henderson

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On 12/07/2011 10:04 AM, Allison Henderson wrote:
> On 12/07/2011 01:28 AM, Yongqiang Yang wrote:
>> Hi Allison and Hugh,
>>
>> I think I found the problem and it has nothing to do with punching
>> hole. The patch [ext4: let ext4_bio_write_page handle EOF correctly]
>> would fix up the problem.
>>
>> I post the patch so that it can be tested as early as possible. The
>> problem has not appeared on my machine since the patch is applied.
>>
>> Yongqiang.
>
> Great! I will try it out with your other set in my sandbox and let you
> know what happens. Thx!
>
> Allison Henderson

Well, it's been running several hours now with out problems, so I think
it will be ok, but I will let it run the full day.

Andy, I know you were also seeing issues in this area. Could you try
Yongqiang patches? The code you were modifying needed to be removed, so
I think they will resolve the issues you were seeing too. Please try
the following patch sets:

[PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize
[PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages without
buffers correctly

and

[PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
[PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly

Thx!

Allison Henderson

>
>> On Wed, Dec 7, 2011 at 5:15 AM, Allison Henderson
>> <[email protected]> wrote:
>>> On 12/06/2011 01:55 AM, Hugh Dickins wrote:
>>>>
>>>> On Mon, 5 Dec 2011, Allison Henderson wrote:
>>>>>
>>>>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>>>>
>>>>>>
>>>>>> This has been outstanding for a month now, and we've heard no
>>>>>> progress:
>>>>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for
>>>>>> rc5.
>>>>>>
>>>>>> The problems appear on a 1k-blocksize filesystem under memory
>>>>>> pressure:
>>>>>> the hunk in ext4_da_write_end() causes oops, because it's playing
>>>>>> with
>>>>>> a page after generic_write_end() dropped our last reference to it;
>>>>>> and
>>>>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>>>>> rare data corruption seen when kbuilding.
>>>>>>
>>>>>> Although I earlier reported that backing out the patch caused an fsx
>>>>>> test to fail earlier, I've since found great variation in how soon it
>>>>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>>>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>>>>> under memory pressure: you won't be surprised that that turned out to
>>>>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39
>>>>>> mblk_io_submit.
>>>>>
>>>>>
>>>>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let
>>>>> mpage_submit_io
>>>>> works well when blocksize< pagesize" ? I have tried it and it does
>>>>> seem
>>>>> to
>>>>> help, but I am still running into some failures that I am trying to
>>>>> debug,
>>>>> but let please let us know if it helps the issues that you are seeing.
>>>>> Thx!
>>>>
>>>>
>>>> That 1/2, or the 2/2 "ext4: let ext4_discard_partial_buffers handle
>>>> pages without buffers correctly"? The latter is mostly a reversion
>>>> of your 02fac1297eb3, so that's the one I need to fix the oops and
>>>> rare data corruption. Perhaps you're suggesting 1/2 for fsx failures
>>>> under memory pressure?
>>>>
>>>> I've now tried the fsx test on three machines, with both 1/2 and 2/2
>>>> applied to 3.2-rc4. On one machine, with ext2 on loop on tmpfs, the
>>>> fsx test failed in a couple of minutes with those patches; on another
>>>> machine, with ext2 on loop on tmpfs, it failed after about 40 minutes
>>>> with the patches; on this laptop, with ext2 on SSD, it's just now
>>>> failed after 35 minutes with the patches.
>>>>
>>>> That's not to say that Yongqiang's patches aren't good; but I cannot
>>>> detect whether they make any improvement or not, since lasting for 2 or
>>>> 40 minutes is typical for fsx under memory pressure with recent
>>>> kernels.
>>>
>>>
>>>
>>> Well, initially I meant to just try the whole set, but now that I try
>>> just
>>> one of them, I find that I get further with only the first one. I think
>>> Yongqiang and I have a similar set up because I get the hang if I
>>> dont have
>>> the first patch, and I get the fsx write failure (in about 20 or so
>>> minutes)
>>> if I have the second one. But I think Yongqiang's right, we need to
>>> figure
>>> out why the page is uptodate when it shouldn't be.
>>>
>>>
>>>>
>>>> Hugh
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe
>>>> linux-ext4" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2011-12-08 17:40:28

by Hugh Dickins

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Wed, 7 Dec 2011, Allison Henderson wrote:
> On 12/07/2011 10:04 AM, Allison Henderson wrote:
> > On 12/07/2011 01:28 AM, Yongqiang Yang wrote:
> > > Hi Allison and Hugh,
> > >
> > > I think I found the problem and it has nothing to do with punching
> > > hole. The patch [ext4: let ext4_bio_write_page handle EOF correctly]
> > > would fix up the problem.
> > >
> > > I post the patch so that it can be tested as early as possible. The
> > > problem has not appeared on my machine since the patch is applied.
> > >
> > > Yongqiang.
> >
> > Great! I will try it out with your other set in my sandbox and let you
> > know what happens. Thx!
> >
> > Allison Henderson
>
> Well, it's been running several hours now with out problems, so I think it
> will be ok, but I will let it run the full day.
>
> Andy, I know you were also seeing issues in this area. Could you try
> Yongqiang patches? The code you were modifying needed to be removed, so I
> think they will resolve the issues you were seeing too. Please try the
> following patch sets:
>
> [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize
> [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages without
> buffers correctly
>
> and
>
> [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
> [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly

Those patches are working well for me, many thanks to Yongqiang.

The last (or more of them?) fix behaviour going back several
releases, and ought to be sent to -stable after verification.

I ran fsx (args as before on 1024k block ext2fs under memory pressure)
for 8 hours on three machines, and no problem showed up on any.
I didn't have time to try ext4, but I expect that you did.

And I've run kernel builds under memory pressure for 7.5 hours,
no problem has showed up there either - although that's not long
enough yet to validate the oops fix by itself, we've earlier run
long enough with the first 2/2 to be sure that it fixes the oops,
and the "corruption" that I saw.

Quotes around corruption now because, from Yongqiang's description,
I'm guessing that ld was mmap'ing objfiles and acting on "data"
from beyond eof. Which ld does have the right to do, it should
indeed be zeroed.

Only once, before the fixes, did I ever see an unexplained EINVAL
(from cp), like Andy reports: I'm very hopeful his case is fixed too.

Thanks!
Hugh

2011-12-08 18:12:41

by Andy Whitcroft

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Thu, Dec 08, 2011 at 09:39:55AM -0800, Hugh Dickins wrote:

[...]
> > [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize
> > [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages without
> >
> > [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
> > [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly
[...]
> Only once, before the fixes, did I ever see an unexplained EINVAL
> (from cp), like Andy reports: I'm very hopeful his case is fixed too.

Yes I have reverted my fix and applied all four of these patches (above).
I have just completed a 100 iteration run of my test case without failure.
This would typically fail in the first iteration 90% of the time and
never survived more than two iterations.

I am comfortable saying they resolve my issue.

-apw

2011-12-09 13:33:48

by Andy Whitcroft

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Thu, Dec 08, 2011 at 06:12:35PM +0000, Andy Whitcroft wrote:
> On Thu, Dec 08, 2011 at 09:39:55AM -0800, Hugh Dickins wrote:
>
> [...]
> > > [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize
> > > [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages without
> > >
> > > [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
> > > [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly
> [...]
> > Only once, before the fixes, did I ever see an unexplained EINVAL
> > (from cp), like Andy reports: I'm very hopeful his case is fixed too.
>
> Yes I have reverted my fix and applied all four of these patches (above).
> I have just completed a 100 iteration run of my test case without failure.
> This would typically fail in the first iteration 90% of the time and
> never survived more than two iterations.
>
> I am comfortable saying they resolve my issue.

Are we likely to see these fixes in a 3.2-rcN or will they be going to
stable?

-apw

2011-12-09 13:44:46

by Yongqiang Yang

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Fri, Dec 9, 2011 at 9:33 PM, Andy Whitcroft <[email protected]> wrote:
> On Thu, Dec 08, 2011 at 06:12:35PM +0000, Andy Whitcroft wrote:
>> On Thu, Dec 08, 2011 at 09:39:55AM -0800, Hugh Dickins wrote:
>>
>> [...]
>> > > [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize
>> > > [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages without
>> > >
>> > > [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
>> > > [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly
>> [...]
>> > Only once, before the fixes, did I ever see an unexplained EINVAL
>> > (from cp), like Andy reports: I'm very hopeful his case is fixed too.
>>
>> Yes I have reverted my fix and applied all four of these patches (above).
>> I have just completed a 100 iteration run of my test case without failure.
>> This would typically fail in the first iteration 90% of the time and
>> never survived more than two iterations.
>>
>> I am comfortable saying they resolve my issue.
>
> Are we likely to see these fixes in a 3.2-rcN or will they be going to
> stable?
All patches will go to 3.2-rcN and the later 2 patches would go to
stable too, I think.

Yongqiang.
>
> -apw



--
Best Wishes
Yongqiang Yang

2011-12-09 16:38:14

by Allison Henderson

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On 12/09/2011 06:44 AM, Yongqiang Yang wrote:
> On Fri, Dec 9, 2011 at 9:33 PM, Andy Whitcroft<[email protected]> wrote:
>> On Thu, Dec 08, 2011 at 06:12:35PM +0000, Andy Whitcroft wrote:
>>> On Thu, Dec 08, 2011 at 09:39:55AM -0800, Hugh Dickins wrote:
>>>
>>> [...]
>>>>> [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize< pagesize
>>>>> [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages without
>>>>>
>>>>> [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
>>>>> [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly
>>> [...]
>>>> Only once, before the fixes, did I ever see an unexplained EINVAL
>>>> (from cp), like Andy reports: I'm very hopeful his case is fixed too.
>>>
>>> Yes I have reverted my fix and applied all four of these patches (above).
>>> I have just completed a 100 iteration run of my test case without failure.
>>> This would typically fail in the first iteration 90% of the time and
>>> never survived more than two iterations.
>>>
>>> I am comfortable saying they resolve my issue.
>>
>> Are we likely to see these fixes in a 3.2-rcN or will they be going to
>> stable?
> All patches will go to 3.2-rcN and the later 2 patches would go to
> stable too, I think.
>
> Yongqiang.

Sounds good, its lasted 24hrs now for me so I think it's safe to stop
it. Thx!

Allison Henderson

>>
>> -apw
>
>
>


2011-12-14 04:12:59

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

Hey Hugh,

If you could do me a favor and run your "kernel compile torture test"
on the tree found on the "dev" branch of the git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git

... I'd really appreciate it. This should have all of the patches
needed to address the problems you had been having, and it would be
great to get some additional testing before I push it to Linus.

I'm currently running a full xfstests run on this tree as we speak....

- Ted

2011-12-14 18:07:42

by Hugh Dickins

[permalink] [raw]
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Tue, 13 Dec 2011, Ted Ts'o wrote:
> Hey Hugh,
>
> If you could do me a favor and run your "kernel compile torture test"
> on the tree found on the "dev" branch of the git tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
>
> ... I'd really appreciate it. This should have all of the patches
> needed to address the problems you had been having, and it would be
> great to get some additional testing before I push it to Linus.
>
> I'm currently running a full xfstests run on this tree as we speak....

Builds from that tree have now been running it without incident for
9 hours overnight on five machines. Before setting that off, I gave
them each an hour of fsx under memory pressure, some ext2, some ext4:
again, nothing to report. Plus I have confidence in Yongqiang's four
patches from running the test with them on rc4 for 3 days on two
machines. Looks good to me - thanks to you all!

Hugh