2009-11-16 03:38:53

by JiSheng Zhang

Subject: [BUG]2.6.27.y some contents lost after writing to mmaped file

Hi,

I triggered a failure in an fs test with fsx-linux from ltp. It seems that
fsx-linux failed at mmap->write sequence.

Tested kernel is 2.6.27.12 and 2.6.27.39
Tested file system: ext3, tmpfs.
IMHO, it impacts all file systems.

Some fsx-linux log is:

READ BAD DATA: offset = 0x2771b, size = 0xa28e
OFFSET GOOD BAD RANGE
0x287e0 0x35c9 0x15a9 0x80
operation# (mod 256) for the bad datamay be 21
...
7828: 1257514978.306753 READ 0x23dba thru 0x25699 (0x18e0 bytes)
7829: 1257514978.306899 MAPWRITE 0x27eeb thru 0x2a516 (0x262c bytes)
******WWWW
7830: 1257514978.307504 READ 0x2771b thru 0x319a8 (0xa28e bytes)
***RRRR***
Correct content saved for comparison
...


2009-11-17 02:00:34

by Greg KH

Subject: Re: [BUG]2.6.27.y some contents lost after writing to mmaped file

On Mon, Nov 16, 2009 at 11:38:57AM +0800, JiSheng Zhang wrote:
> Hi,
>
> I triggered a failure in an fs test with fsx-linux from ltp. It seems that
> fsx-linux failed at mmap->write sequence.
>
> Tested kernel is 2.6.27.12 and 2.6.27.39

Does this work on any kernel you have tested? Or is it a regression?

> Tested file system: ext3, tmpfs.
> IMHO, it impacts all file systems.
>
> Some fsx-linux log is:
>
> READ BAD DATA: offset = 0x2771b, size = 0xa28e
> OFFSET GOOD BAD RANGE
> 0x287e0 0x35c9 0x15a9 0x80
> operation# (mod 256) for the bad datamay be 21
> ...
> 7828: 1257514978.306753 READ 0x23dba thru 0x25699 (0x18e0 bytes)
> 7829: 1257514978.306899 MAPWRITE 0x27eeb thru 0x2a516 (0x262c bytes)
> ******WWWW
> 7830: 1257514978.307504 READ 0x2771b thru 0x319a8 (0xa28e bytes)
> ***RRRR***
> Correct content saved for comparison
> ...

Are you sure that the LTP is correct? It wouldn't be the first time it
wasn't...

thanks,

greg k-h

2009-11-17 11:07:29

by JiSheng Zhang

Subject: Re: [BUG]2.6.27.y some contents lost after writing to mmaped file

Hi Greg,

2009/11/17 Greg KH <[email protected]>:
>>
>> Tested kernel is 2.6.27.12 and 2.6.27.39
>
> Does this work on any kernel you have tested? Or is it a regression?

I have tested both 2.6.27.12 and 2.6.27.39; fsx-linux fails on both.

>
>> Tested file system: ext3, tmpfs.
>> IMHO, it impacts all file systems.
>>
>> Some fsx-linux log is:
>>
>> READ BAD DATA: offset = 0x2771b, size = 0xa28e
>> OFFSET GOOD BAD RANGE
> Are you sure that the LTP is correct? It wouldn't be the first time it
> wasn't...

hmmm, I read the source again, IMHO it is correct.

>
> thanks,
>
> greg k-h
>

One more finding: if I add a "return" at the beginning of domapwrite(),
no failures have been seen so far.

Regards,
Jisheng

2009-11-17 12:37:45

by Chris Mason

Subject: Re: [BUG]2.6.27.y some contents lost after writing to mmaped file

On Mon, Nov 16, 2009 at 05:56:55PM -0800, Greg KH wrote:
> On Mon, Nov 16, 2009 at 11:38:57AM +0800, JiSheng Zhang wrote:
> > Hi,
> >
> > I triggered a failure in an fs test with fsx-linux from ltp. It seems that
> > fsx-linux failed at mmap->write sequence.
> >
> > Tested kernel is 2.6.27.12 and 2.6.27.39
>
> Does this work on any kernel you have tested? Or is it a regression?
>
> > Tested file system: ext3, tmpfs.
> > IMHO, it impacts all file systems.
> >
> > Some fsx-linux log is:
> >
> > READ BAD DATA: offset = 0x2771b, size = 0xa28e
> > OFFSET GOOD BAD RANGE
> > 0x287e0 0x35c9 0x15a9 0x80
> > operation# (mod 256) for the bad datamay be 21
> > ...
> > 7828: 1257514978.306753 READ 0x23dba thru 0x25699 (0x18e0 bytes)
> > 7829: 1257514978.306899 MAPWRITE 0x27eeb thru 0x2a516 (0x262c bytes)
> > ******WWWW
> > 7830: 1257514978.307504 READ 0x2771b thru 0x319a8 (0xa28e bytes)
> > ***RRRR***
> > Correct content saved for comparison
> > ...
>
> Are you sure that the LTP is correct? It wouldn't be the first time it
> wasn't...

I'm afraid fsx usually finds bugs. I thought Jan Kara recently fixed
something here in ext3, does 2.6.32-rc work?

-chris

2009-11-17 19:06:32

by Jan Kara

Subject: Re: [BUG]2.6.27.y some contents lost after writing to mmaped file

On Tue 17-11-09 07:36:22, Chris Mason wrote:
> On Mon, Nov 16, 2009 at 05:56:55PM -0800, Greg KH wrote:
> > On Mon, Nov 16, 2009 at 11:38:57AM +0800, JiSheng Zhang wrote:
> > > Hi,
> > >
> > > I triggered a failure in an fs test with fsx-linux from ltp. It seems that
> > > fsx-linux failed at mmap->write sequence.
> > >
> > > Tested kernel is 2.6.27.12 and 2.6.27.39
> >
> > Does this work on any kernel you have tested? Or is it a regression?
> >
> > > Tested file system: ext3, tmpfs.
> > > IMHO, it impacts all file systems.
> > >
> > > Some fsx-linux log is:
> > >
> > > READ BAD DATA: offset = 0x2771b, size = 0xa28e
> > > OFFSET GOOD BAD RANGE
> > > 0x287e0 0x35c9 0x15a9 0x80
> > > operation# (mod 256) for the bad datamay be 21
> > > ...
> > > 7828: 1257514978.306753 READ 0x23dba thru 0x25699 (0x18e0 bytes)
> > > 7829: 1257514978.306899 MAPWRITE 0x27eeb thru 0x2a516 (0x262c bytes)
> > > ******WWWW
> > > 7830: 1257514978.307504 READ 0x2771b thru 0x319a8 (0xa28e bytes)
> > > ***RRRR***
> > > Correct content saved for comparison
> > > ...
Hmm, how long does it take to reproduce? I'm running fsx-linux on tmpfs
for a while on 2.6.27.21 and didn't hit the problem yet.

> > Are you sure that the LTP is correct? It wouldn't be the first time it
> > wasn't...
>
> I'm afraid fsx usually finds bugs. I thought Jan Kara recently fixed
> something here in ext3, does 2.6.32-rc work?
Yeah, fsx usually finds bugs. Note that he sees the problem also on tmpfs
so it's not ext3 problem. Anyway, trying to reproduce with 2.6.32-rc? would
be interesting.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-11-18 13:55:38

by JiSheng Zhang

Subject: Re: [BUG]2.6.27.y some contents lost after writing to mmaped file

On Tue, 17 Nov 2009 20:06:35 +0100
Jan Kara <[email protected]> wrote:

> On Tue 17-11-09 07:36:22, Chris Mason wrote:
> > On Mon, Nov 16, 2009 at 05:56:55PM -0800, Greg KH wrote:
> > > On Mon, Nov 16, 2009 at 11:38:57AM +0800, JiSheng Zhang wrote:
> > > > Hi,
> > > >
> > > > I triggered a failure in an fs test with fsx-linux from ltp. It seems that
> > > > fsx-linux failed at mmap->write sequence.
> > > >
> > > > Tested kernel is 2.6.27.12 and 2.6.27.39
> > >
> > > Does this work on any kernel you have tested? Or is it a regression?
> > >
> > > > Tested file system: ext3, tmpfs.
> > > > IMHO, it impacts all file systems.
> > > >
> > > > Some fsx-linux log is:
> > > >
> > > > READ BAD DATA: offset = 0x2771b, size = 0xa28e
> > > > OFFSET GOOD BAD RANGE
> > > > 0x287e0 0x35c9 0x15a9 0x80
> > > > operation# (mod 256) for the bad datamay be 21
> > > > ...
> > > > 7828: 1257514978.306753 READ 0x23dba thru 0x25699 (0x18e0 bytes)
> > > > 7829: 1257514978.306899 MAPWRITE 0x27eeb thru 0x2a516 (0x262c bytes)
> > > > ******WWWW
> > > > 7830: 1257514978.307504 READ 0x2771b thru 0x319a8 (0xa28e bytes)
> > > > ***RRRR***
> > > > Correct content saved for comparison
> > > > ...
> Hmm, how long does it take to reproduce? I'm running fsx-linux on tmpfs
> for a while on 2.6.27.21 and didn't hit the problem yet.

I forgot to mention that the tests were done on an ARM board with 64M of RAM.
I have tested fsx-linux again on a PC, and there the failure seems to go away.

>
> > > Are you sure that the LTP is correct? It wouldn't be the first time it
> > > wasn't...
> >
> > I'm afraid fsx usually finds bugs. I thought Jan Kara recently fixed
> > something here in ext3, does 2.6.32-rc work?
> Yeah, fsx usually finds bugs. Note that he sees the problem also on tmpfs
> so it's not ext3 problem. Anyway, trying to reproduce with 2.6.32-rc? would
> be interesting.

Currently the ARM board doesn't support 2.6.32-rc. But I tested 2.6.32-rc7
on my PC box, and there has been no failure so far.

>
> Honza

I found this via google:
http://marc.info/?t=118026315000001&r=1&w=2

I even tried the code from
http://marc.info/?l=linux-arch&m=118030601701617&w=2
I got mostly:
firstfirstfirst
firstfirstfirst
firstfirstfirst


There is no change after passing "MS_SYNC|MS_INVALIDATE" to msync() and making
the flush_dcache_page() call unconditional in do_generic_mapping_read().
This behavior is different from what I read in the mail thread above.

> void do_generic_mapping_read(struct address_space *mapping,
>                              struct file_ra_state *_ra,
>                              struct file *filp,
>                              loff_t *ppos,
>                              read_descriptor_t *desc,
>                              read_actor_t actor)
> {
> ...
>         /* If users can be writing to this page using arbitrary
>          * virtual addresses, take care about potential aliasing
>          * before reading the page on the kernel side.
>          */
>         if (1 || mapping_writably_mapped(mapping))
>                 flush_dcache_page(page);
Then I ran fsx-linux with the above modification, and it still failed in the
same way on both tmpfs and ext3.

2009-11-19 14:43:28

by Jan Kara

Subject: Re: [BUG]2.6.27.y some contents lost after writing to mmaped file

On Wed 18-11-09 22:17:56, JiSheng Zhang wrote:
> On Tue, 17 Nov 2009 20:06:35 +0100
> Jan Kara <[email protected]> wrote:
>
> > On Tue 17-11-09 07:36:22, Chris Mason wrote:
> > > On Mon, Nov 16, 2009 at 05:56:55PM -0800, Greg KH wrote:
> > > > On Mon, Nov 16, 2009 at 11:38:57AM +0800, JiSheng Zhang wrote:
> > > > > Hi,
> > > > >
> > > > > I triggered a failure in an fs test with fsx-linux from ltp. It seems that
> > > > > fsx-linux failed at mmap->write sequence.
> > > > >
> > > > > Tested kernel is 2.6.27.12 and 2.6.27.39
> > > >
> > > > Does this work on any kernel you have tested? Or is it a regression?
> > > >
> > > > > Tested file system: ext3, tmpfs.
> > > > > IMHO, it impacts all file systems.
> > > > >
> > > > > Some fsx-linux log is:
> > > > >
> > > > > READ BAD DATA: offset = 0x2771b, size = 0xa28e
> > > > > OFFSET GOOD BAD RANGE
> > > > > 0x287e0 0x35c9 0x15a9 0x80
> > > > > operation# (mod 256) for the bad datamay be 21
> > > > > ...
> > > > > 7828: 1257514978.306753 READ 0x23dba thru 0x25699 (0x18e0 bytes)
> > > > > 7829: 1257514978.306899 MAPWRITE 0x27eeb thru 0x2a516 (0x262c bytes)
> > > > > ******WWWW
> > > > > 7830: 1257514978.307504 READ 0x2771b thru 0x319a8 (0xa28e bytes)
> > > > > ***RRRR***
> > > > > Correct content saved for comparison
> > > > > ...
> > Hmm, how long does it take to reproduce? I'm running fsx-linux on tmpfs
> > for a while on 2.6.27.21 and didn't hit the problem yet.
>
> I forgot to mention that the tests were done on an ARM board with 64M of RAM.
> I have tested fsx-linux again on a PC, and there the failure seems to go away.
>
> > > > Are you sure that the LTP is correct? It wouldn't be the first time it
> > > > wasn't...
> > >
> > > I'm afraid fsx usually finds bugs. I thought Jan Kara recently fixed
> > > something here in ext3, does 2.6.32-rc work?
> > Yeah, fsx usually finds bugs. Note that he sees the problem also on tmpfs
> > so it's not ext3 problem. Anyway, trying to reproduce with 2.6.32-rc? would
> > be interesting.
>
> Currently the ARM board doesn't support 2.6.32-rc. But I tested 2.6.32-rc7
> on my PC box, and there has been no failure so far.
OK, so it's either ARM specific or it's triggered by low amount of
available memory (you might want to try testing your PC with mem=64M).

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-11-19 15:26:01

by Russell King - ARM Linux

Subject: Re: [BUG]2.6.27.y some contents lost after writing to mmaped file

On Wed, Nov 18, 2009 at 10:17:56PM +0800, JiSheng Zhang wrote:
> I forgot to mention that the tests were done on an ARM board with 64M of RAM.
> I have tested fsx-linux again on a PC, and there the failure seems to go away.

Could you provide a full bug report please, as in:

- CPU type
- is it a SMP CPU
- are you running a SMP kernel
- board type

All the above can be provided by supplying the kernel boot messages
(preferred)

- the storage peripheral being used for this test
- is DMA being used for this peripheral
- any additional block layers (eg, lvm, dm, md)
- filesystem type

Plus, please cc suspected ARM problems to the ARM _kernel_ mailing list.

Thanks.

2009-11-20 03:40:58

by JiSheng Zhang

Subject: Re: [BUG]2.6.27.y some contents lost after writing to mmaped file

Hi,

Russell King wrote
>- CPU type
ARM926EJ-S
>- is it a SMP CPU
no. UP
>- are you running a SMP kernel
no
>- board type
an SoC


>- the storage peripheral being used for this test
memory and harddrive
>- is DMA being used for this peripheral
for memory, DMA? for harddrive, yes
>- any additional block layers (eg, lvm, dm, md)
no
>- filesystem type
tmpfs and ext3

2009/11/18 JiSheng Zhang <[email protected]>:
> On Tue, 17 Nov 2009 20:06:35 +0100
> Jan Kara <[email protected]> wrote:
>
>> On Tue 17-11-09 07:36:22, Chris Mason wrote:
>> > On Mon, Nov 16, 2009 at 05:56:55PM -0800, Greg KH wrote:
>> > > On Mon, Nov 16, 2009 at 11:38:57AM +0800, JiSheng Zhang wrote:
>> > > > Hi,
>> > > >
>> > > > I triggered a failure in an fs test with fsx-linux from ltp. It seems that
>> > > > fsx-linux failed at mmap->write sequence.
>> > > >
>> > > > Tested kernel is 2.6.27.12 and 2.6.27.39
>> > >
>> > > Does this work on any kernel you have tested? Or is it a regression?
>> > >
>> > > > Tested file system: ext3, tmpfs.
>> > > > IMHO, it impacts all file systems.
>> > > >
>> > > > Some fsx-linux log is:
>> > > >
>> > > > READ BAD DATA: offset = 0x2771b, size = 0xa28e
>> > > > OFFSET GOOD BAD RANGE
>> > > > 0x287e0 0x35c9 0x15a9 0x80
>> > > > operation# (mod 256) for the bad datamay be 21
>> > > > ...
>> > > > 7828: 1257514978.306753 READ 0x23dba thru 0x25699 (0x18e0 bytes)
>> > > > 7829: 1257514978.306899 MAPWRITE 0x27eeb thru 0x2a516 (0x262c bytes)
>> > > > ******WWWW
>> > > > 7830: 1257514978.307504 READ 0x2771b thru 0x319a8 (0xa28e bytes)
>> > > > ***RRRR***
>> > > > Correct content saved for comparison
>> > > > ...
>> Hmm, how long does it take to reproduce? I'm running fsx-linux on tmpfs
>> for a while on 2.6.27.21 and didn't hit the problem yet.
>
> I forgot to mention that the tests were done on an ARM board with 64M of RAM.
> I have tested fsx-linux again on a PC, and there the failure seems to go away.
>
>>
>> > > Are you sure that the LTP is correct? It wouldn't be the first time it
>> > > wasn't...
>> >
>> > I'm afraid fsx usually finds bugs. I thought Jan Kara recently fixed
>> > something here in ext3, does 2.6.32-rc work?
>> Yeah, fsx usually finds bugs. Note that he sees the problem also on tmpfs
>> so it's not ext3 problem. Anyway, trying to reproduce with 2.6.32-rc? would
>> be interesting.
>
> Currently the ARM board doesn't support 2.6.32-rc. But I tested 2.6.32-rc7
> on my PC box, and there has been no failure so far.
>
>>
>> Honza
>
> I found this via google:
> http://marc.info/?t=118026315000001&r=1&w=2
>
> I even tried the code from
> http://marc.info/?l=linux-arch&m=118030601701617&w=2
> I got mostly:
> firstfirstfirst
> firstfirstfirst
> firstfirstfirst
>
>
> There is no change after passing "MS_SYNC|MS_INVALIDATE" to msync() and making
> the flush_dcache_page() call unconditional in do_generic_mapping_read().
> This behavior is different from what I read in the mail thread above.
>
>> void do_generic_mapping_read(struct address_space *mapping,
>>                              struct file_ra_state *_ra,
>>                              struct file *filp,
>>                              loff_t *ppos,
>>                              read_descriptor_t *desc,
>>                              read_actor_t actor)
>> {
>> ...
>>         /* If users can be writing to this page using arbitrary
>>          * virtual addresses, take care about potential aliasing
>>          * before reading the page on the kernel side.
>>          */
>>         if (1 || mapping_writably_mapped(mapping))
>>                 flush_dcache_page(page);
>
> Then I ran fsx-linux with the above modification, and it still failed in the
> same way on both tmpfs and ext3.