LinuxLists.cc - OpenPosix test case mmap_11-4 fails in ext4 filesystem

2014-05-21 06:42:07

Subject: OpenPosix test case mmap_11-4 fails in ext4 filesystem

Hi,

Recently I met the mmap_11-4 fails when running LTP in RHEL7.0RC. Attached is a
test program to reproduce this problem, which is written by Cyril. Uncommenting the
msync() makes the test succeed in old linux distribution, such as RHEL6.5GA, but
fails in RHEL7.0RC.

I also read some ext4's source code in RHEL7.0RC and here is the possible reason
according to my understanding. Hope this will help you something.
--------------------------------------------------------------------------------------------

When calling msync() in an ext4 file system, ext4_bio_write_page will be
called to write back dirty pages. Here is the source code in RHEL7.0RC:

int ext4_bio_write_page(struct ext4_io_submit *io, struct page *page, int len, struct writeback_control *wbc)
{
struct inode *inode = page->mapping->host;
unsigned block_start, blocksize;
struct buffer_head *bh, *head;
int ret = 0;
int nr_submitted = 0;

blocksize = 1 << inode->i_blkbits;

BUG_ON(!PageLocked(page));
BUG_ON(PageWriteback(page));

set_page_writeback(page);
ClearPageError(page);

......

bh = head = page_buffers(page);
do {
block_start = bh_offset(bh);
if (block_start >= len) {
/*
* Comments copied from block_write_full_page_endio:
*
* The page straddles i_size. It must be zeroed out on
* each and every writepage invocation because it may
* be mmapped. "A file is mapped in multiples of the
* page size. For a file that is not a multiple of
* the page size, the remaining memory is zeroed when
* mapped, and writes to that region are not written
* out to the file."
*/
zero_user_segment(page, block_start,
block_start + blocksize);
clear_buffer_dirty(bh);
set_buffer_uptodate(bh);
continue;
}
......
} while ((bh = bh->b_this_page) != head);
--------------------------------------------------------------------------------------------
I deleted some irrelevant code.

The argument len is computed by the following code:
loff_t size = i_size_read(inode); // file's length
if (index == size >> PAGE_CACHE_SHIFT)
len = size & ~PAGE_CACHE_MASK;
else
len = PAGE_CACHE_SIZE;

That means len is the valid file length in every page.

When ext4 file system's block size is 1024, then there will be 4 struct buffer head attached to this page.

See the above "do... while ..." statements in ext4_bio_write_page(), "block_start = bh_offset(bh);" will make
block_start be 0 for the first buffer head, 1024 for the second, 2048 for the third, 3072 for the forth.

So in the reproduce program, in this case, len is 2048, the "if (block_start >= len) "
condition will be satisfied in the third and forth iteration, so "zero_user_segment(page, block_start, block_start + blocksize);" will
be called, then the content beyond the file's end will be zeroed, so the reproduce program will succeed.

But when ext4 file system's block size if 4096, then there will only on buffer head attached to
this page, then when len is 2048, "while ((bh = bh->b_this_page) != head);" statement will make the "do ... while..."
statement execute only once. In the first iteration, "block_start = bh_offset(bh); " will make
block_start be 0, " if (block_start >= len) " won't be satisfied, zero_user_segment() won't be called,
so the content in current page beyond the file's end will not be zeroed, so the reproduce program fails.

In RHEL6.5GA, block_write_full_page() will be called to do work similar to ext4_bio_write_page, this function does
not do the zero work in unit of struct buffer head, so this bug is not exist.

The above is my understanding. If it's not correct, I'd like that you explain the true reason, thanks.

Also I don't know whether this can be considered an ext4 bug or not. And according msync()'s definition,
it seems that it dose not require msync() to zero the partial page beyond mmaped file's area. But at least, msync()'s behavior will be
different when ext4 file system has different block size, I think we may fix these to keep consistency.

Or you think ext4's implementation is correct and the LTP mmap_11-4 case is invalid? Thanks.

Regards,
Xiaoguang Wang

Attachments:

test.c (2.00 kB)

2014-05-21 12:55:20

by Lukas Czerner

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

On Wed, 21 May 2014, Xiaoguang Wang wrote:

> Date: Wed, 21 May 2014 14:39:43 +0800
> From: Xiaoguang Wang <[email protected]>
> To: [email protected]
> Cc: [email protected], [email protected]
> Subject: OpenPosix test case mmap_11-4 fails in ext4 filesystem
>
> Hi,
>
> Recently I met the mmap_11-4 fails when running LTP in RHEL7.0RC. Attached is a
> test program to reproduce this problem, which is written by Cyril. Uncommenting the
> msync() makes the test succeed in old linux distribution, such as RHEL6.5GA, but
> fails in RHEL7.0RC.

Hi,

please contact you Red Hat support to triage the issue within Red
Hat.

>
> I also read some ext4's source code in RHEL7.0RC and here is the possible reason
> according to my understanding. Hope this will help you something.
> --------------------------------------------------------------------------------------------
>
> When calling msync() in an ext4 file system, ext4_bio_write_page will be
> called to write back dirty pages. Here is the source code in RHEL7.0RC:
>
> int ext4_bio_write_page(struct ext4_io_submit *io, struct page *page, int len, struct writeback_control *wbc)
> {
> struct inode *inode = page->mapping->host;
> unsigned block_start, blocksize;
> struct buffer_head *bh, *head;
> int ret = 0;
> int nr_submitted = 0;
>
> blocksize = 1 << inode->i_blkbits;
>
> BUG_ON(!PageLocked(page));
> BUG_ON(PageWriteback(page));
>
> set_page_writeback(page);
> ClearPageError(page);
>
> ......
>
> bh = head = page_buffers(page);
> do {
> block_start = bh_offset(bh);
> if (block_start >= len) {
> /*
> * Comments copied from block_write_full_page_endio:
> *
> * The page straddles i_size. It must be zeroed out on
> * each and every writepage invocation because it may
> * be mmapped. "A file is mapped in multiples of the
> * page size. For a file that is not a multiple of
> * the page size, the remaining memory is zeroed when
> * mapped, and writes to that region are not written
> * out to the file."
> */
> zero_user_segment(page, block_start,
> block_start + blocksize);
> clear_buffer_dirty(bh);
> set_buffer_uptodate(bh);
> continue;
> }
> ......
> } while ((bh = bh->b_this_page) != head);
> --------------------------------------------------------------------------------------------
> I deleted some irrelevant code.
>
> The argument len is computed by the following code:
> loff_t size = i_size_read(inode); // file's length
> if (index == size >> PAGE_CACHE_SHIFT)
> len = size & ~PAGE_CACHE_MASK;
> else
> len = PAGE_CACHE_SIZE;
>
> That means len is the valid file length in every page.
>
> When ext4 file system's block size is 1024, then there will be 4 struct buffer head attached to this page.
>
> See the above "do... while ..." statements in ext4_bio_write_page(), "block_start = bh_offset(bh);" will make
> block_start be 0 for the first buffer head, 1024 for the second, 2048 for the third, 3072 for the forth.
>
> So in the reproduce program, in this case, len is 2048, the "if (block_start >= len) "
> condition will be satisfied in the third and forth iteration, so "zero_user_segment(page, block_start, block_start + blocksize);" will
> be called, then the content beyond the file's end will be zeroed, so the reproduce program will succeed.
>
> But when ext4 file system's block size if 4096, then there will only on buffer head attached to
> this page, then when len is 2048, "while ((bh = bh->b_this_page) != head);" statement will make the "do ... while..."
> statement execute only once. In the first iteration, "block_start = bh_offset(bh); " will make
> block_start be 0, " if (block_start >= len) " won't be satisfied, zero_user_segment() won't be called,
> so the content in current page beyond the file's end will not be zeroed, so the reproduce program fails.

Actually this is the same even with block size < page size. The zero
our will always be block aligned. The last block will alway be
written out.

-Lukas

>
> In RHEL6.5GA, block_write_full_page() will be called to do work similar to ext4_bio_write_page, this function does
> not do the zero work in unit of struct buffer head, so this bug is not exist.
>
> The above is my understanding. If it's not correct, I'd like that you explain the true reason, thanks.
>
> Also I don't know whether this can be considered an ext4 bug or not. And according msync()'s definition,
> it seems that it dose not require msync() to zero the partial page beyond mmaped file's area. But at least, msync()'s behavior will be
> different when ext4 file system has different block size, I think we may fix these to keep consistency.
>
> Or you think ext4's implementation is correct and the LTP mmap_11-4 case is invalid? Thanks.
>
> Regards,
> Xiaoguang Wang
>

2014-05-21 14:12:16

by Theodore Ts'o

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

On Wed, May 21, 2014 at 02:55:12PM +0200, Lukáš Czerner wrote:
> >
> > Recently I met the mmap_11-4 fails when running LTP in RHEL7.0RC. Attached is a
> > test program to reproduce this problem, which is written by Cyril. Uncommenting the
> > msync() makes the test succeed in old linux distribution, such as RHEL6.5GA, but
> > fails in RHEL7.0RC.
>
> please contact you Red Hat support to triage the issue within Red
> Hat.

This is a valid bug report which applies to upstream ext4. It also
applies to tmpfs, by the way.

It would be good to get this test case (thank you very much for
writing it!) imported into xfstests, since the lack of this test is
why we didn't notice the bug when we added the fs/ext4/page_io.c code
paths.

BTW, there is one interesting thing about that changed when we moved
to page based I/O (years and years ago). The requirement in mmap is
as Xiaoguang stated:

A file is mapped in multiples of the page size. For a file
that is not a multi‐ ple of the page size, the remaining memory
is zeroed when mapped, and writes to that region are not
written out to the file. The effect of changing the size of
the underlying file of a mapping on the pages that correspond
to added or removed regions of the file is unspecified.

Previously in Linux 2.2 (or was it 2.0; my memory is a bit fuzzy about
when we unified the buffer and page caches), when we copied the data
from the page cache to the buffer cache, we stopped at EOF. This
implies that if the program modified the portion of the page mapped
beyond EOF, it would remain modified until the page was released and
the page was dropped from the page cache.

Now, when we call writepage() --- assuming the file system wasn't
buggy --- the bytes after EOF will get cleared. If the user doesn't
call msync(), when this happens is unpredictable to the userspace
program, since it happens when writeback daemons decide to write back
the data.

See my modified test program to demonstrate this issue. For ext3 and
xfs, you will see message:

nb: byte modified beyond EOF cleared after mysync

I think this is valid, or at least, not prohibited by the mmap()
interface specification, but it's not explicitly documented, and it
might be a little surprising. Given that ext3 users have been seeing
this behavior for years and years, it's probably fine, so I'm
proposing that we change this; but we might want to make a note of it
in the man page.

Regards,

- Ted

Attachments:

(No filename) (2.44 kB)
mmap-test.c (2.08 kB)
Download all attachments

2014-05-21 15:06:31

by Cyril Hrubis

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

Hi!
> This is a valid bug report which applies to upstream ext4. It also
> applies to tmpfs, by the way.
>
> It would be good to get this test case (thank you very much for
> writing it!) imported into xfstests, since the lack of this test is
> why we didn't notice the bug when we added the fs/ext4/page_io.c code
> paths.

It has been in LTP for ages. Maybe it's a time developers should start
to use LTP :) (We managed to fix most of the problems that are credible
for the bad LTP reputation...)

> BTW, there is one interesting thing about that changed when we moved
> to page based I/O (years and years ago). The requirement in mmap is
> as Xiaoguang stated:
>
> A file is mapped in multiples of the page size. For a file
> that is not a multi??? ple of the page size, the remaining memory
> is zeroed when mapped, and writes to that region are not
> written out to the file. The effect of changing the size of
> the underlying file of a mapping on the pages that correspond
> to added or removed regions of the file is unspecified.
>
> Previously in Linux 2.2 (or was it 2.0; my memory is a bit fuzzy about
> when we unified the buffer and page caches), when we copied the data
> from the page cache to the buffer cache, we stopped at EOF. This
> implies that if the program modified the portion of the page mapped
> beyond EOF, it would remain modified until the page was released and
> the page was dropped from the page cache.
>
> Now, when we call writepage() --- assuming the file system wasn't
> buggy --- the bytes after EOF will get cleared. If the user doesn't
> call msync(), when this happens is unpredictable to the userspace
> program, since it happens when writeback daemons decide to write back
> the data.

This is exacly what we concluded when we were fixing the testacase (I
talked about this I think with Jan Kara and Michal Hocko). And the
result was to add the msync() to the testcase. We also agreed that
fixing this for tmpfs is not worth the effort although when interpreting
POSIX strictly it should work with shm memory as well.

--
Cyril Hrubis
[email protected]

2014-05-21 15:18:32

by Lukas Czerner

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

On Wed, 21 May 2014, Theodore Ts'o wrote:

> Date: Wed, 21 May 2014 10:12:03 -0400
> From: Theodore Ts'o <[email protected]>
> To: Lukáš Czerner <[email protected]>
> Cc: Xiaoguang Wang <[email protected]>, [email protected],
> [email protected]
> Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem
>
> On Wed, May 21, 2014 at 02:55:12PM +0200, Lukáš Czerner wrote:
> > >
> > > Recently I met the mmap_11-4 fails when running LTP in RHEL7.0RC. Attached is a
> > > test program to reproduce this problem, which is written by Cyril. Uncommenting the
> > > msync() makes the test succeed in old linux distribution, such as RHEL6.5GA, but
> > > fails in RHEL7.0RC.
> >
> > please contact you Red Hat support to triage the issue within Red
> > Hat.
>
> This is a valid bug report which applies to upstream ext4. It also
> applies to tmpfs, by the way.

I am not disputing it, but since he seems to be using RHEL I am
asking for the triage with Red Hat support.

Thanks!
-Lukas

2014-05-21 18:58:39

by Theodore Ts'o

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

On Wed, May 21, 2014 at 05:06:19PM +0200, [email protected] wrote:
> > It would be good to get this test case (thank you very much for
> > writing it!) imported into xfstests, since the lack of this test is
> > why we didn't notice the bug when we added the fs/ext4/page_io.c code
> > paths.
>
> It has been in LTP for ages. Maybe it's a time developers should start
> to use LTP :) (We managed to fix most of the problems that are credible
> for the bad LTP reputation...)

There is a pretty large amount of overlap between LTP and xfstests,
and xfstests are what most of the file system developers are using,
and we have developed a lot of automated test automation which means
running xfstests is very easy and convenient. For example:

https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/README

The ability for me to build a kernel and then with a single command,
"kvm-xfstests smoke", do a quick verification in about 30 minutes, is
very convenient.

As I recall, ltp was integrated with autotest, and my experience with
autotest at multiple companies is if anything, worse than ltp's
reputation. (I considered ltp to be mostly harmless, albeit not
particularly useful, whereas I considered autotest to be activetly
harmful to engineer productivity.)

Anyway, it's already the case that most of the useful file-system
specific bits of LTP has been cherry picked into xfstests, and I
suspect it will be a lot easier to get a few additional LTP test cases
added into xfstests, than it will be to convince a large number of
file system developers that they should (a) try to figure out how to
integrate LTP into their test harnesses, and (b) how to avoid
duplicating tests which xfstests are already running.

> This is exacly what we concluded when we were fixing the testacase (I
> talked about this I think with Jan Kara and Michal Hocko). And the
> result was to add the msync() to the testcase. We also agreed that
> fixing this for tmpfs is not worth the effort although when interpreting
> POSIX strictly it should work with shm memory as well.

Sure, although it's not clear what "written out" means in the context
of tmpfs.

- Ted

2014-05-21 19:01:40

by Theodore Ts'o

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

On Wed, May 21, 2014 at 05:18:24PM +0200, Lukáš Czerner wrote:
>
> I am not disputing it, but since he seems to be using RHEL I am
> asking for the triage with Red Hat support.

Sure, it makes sence that a RH bugzilla entry be opened to make sure
that whatever eventual bug fix we have in upstream gets backported to
the RHEL 7.0RC kernel.

And sure, normally when someone uses distro-specific version
references when reporting bugs upstream, this is not the best
procedure, especially since not everyone (including me) has access to
RHEL 7.0 RC kernels. However, I'm minded to overlook such faux pas
when the bug report is accompanied by a very nice test case which
makes it trivial to verify the problematic behavior, and that the
problem is present upstream.

Cheers,

- ted

2014-05-21 21:20:09

by Cyril Hrubis

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

Hi!
> There is a pretty large amount of overlap between LTP and xfstests,
> and xfstests are what most of the file system developers are using,
> and we have developed a lot of automated test automation which means
> running xfstests is very easy and convenient. For example:
>
> https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/README
>
> The ability for me to build a kernel and then with a single command,
> "kvm-xfstests smoke", do a quick verification in about 30 minutes, is
> very convenient.

LTP is automated to a degree where you run single script and get a file
with list of failed testcases later. We do not have any kind of kvm
integration though.

> As I recall, ltp was integrated with autotest, and my experience with
> autotest at multiple companies is if anything, worse than ltp's
> reputation. (I considered ltp to be mostly harmless, albeit not
> particularly useful, whereas I considered autotest to be activetly
> harmful to engineer productivity.)

The autotest integration is not a part of the LTP at all. I remember
seeing it somewhere but I've never used it/looked at the code.

LTP has it's own script and testdriver to run testcases, but given that
LTP tests are just binaries that are mostly self-contained it's not
doing much more than starting a test, writing logfiles and killing
lefover processes (if the tests fails to collect them itself). I will
not pretend that it's clean and well designed code but at least it works
fine (as a matter of fact I've started to work on redesigning/rewriting
it from scratch some time ago).

> Anyway, it's already the case that most of the useful file-system
> specific bits of LTP has been cherry picked into xfstests, and I
> suspect it will be a lot easier to get a few additional LTP test cases
> added into xfstests, than it will be to convince a large number of
> file system developers that they should (a) try to figure out how to
> integrate LTP into their test harnesses, and (b) how to avoid
> duplicating tests which xfstests are already running.

Well I can personally help with (a).

The test in question here (mmap_11-4) is a part of the Open Posix
Testsuite that continues to live in LTP. The whole testsuite runs in
about 30 minutes and covers most of the POSIX interfaces in ~1600
testcases.

Then there is a syscalls testsuite which covers, in addition to the
POSIX specs, some of the Linux specific interfaces too. The runtime is
about 15 minutes for ~1030 testcases.

I guess that we can filter filesystem related syscalls quite easily. The
overlap would take more work though. In LTP we have mostly conformance
testcases and some stress testcases. I'm not much familiar with xfstests
and its coverage.

And we have a more tests that may be interesting to fs maintaners, there
are aio testcases (which are likely covered by xfstests allready), some
fs stress tests, ...

--
Cyril Hrubis
[email protected]

2014-05-21 22:06:22

by Eric Sandeen

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

On 5/21/14, 4:20 PM, [email protected] wrote:
> Hi!
>> There is a pretty large amount of overlap between LTP and xfstests,
>> and xfstests are what most of the file system developers are using,
>> and we have developed a lot of automated test automation which means
>> running xfstests is very easy and convenient. For example:
>>
>> https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/README
>>
>> The ability for me to build a kernel and then with a single command,
>> "kvm-xfstests smoke", do a quick verification in about 30 minutes, is
>> very convenient.
>
> LTP is automated to a degree where you run single script and get a file
> with list of failed testcases later. We do not have any kind of kvm
> integration though.
>
>> As I recall, ltp was integrated with autotest, and my experience with
>> autotest at multiple companies is if anything, worse than ltp's
>> reputation. (I considered ltp to be mostly harmless, albeit not
>> particularly useful, whereas I considered autotest to be activetly
>> harmful to engineer productivity.)
>
> The autotest integration is not a part of the LTP at all. I remember
> seeing it somewhere but I've never used it/looked at the code.
>
> LTP has it's own script and testdriver to run testcases, but given that
> LTP tests are just binaries that are mostly self-contained it's not
> doing much more than starting a test, writing logfiles and killing
> lefover processes (if the tests fails to collect them itself). I will
> not pretend that it's clean and well designed code but at least it works
> fine (as a matter of fact I've started to work on redesigning/rewriting
> it from scratch some time ago).
>
>> Anyway, it's already the case that most of the useful file-system
>> specific bits of LTP has been cherry picked into xfstests, and I
>> suspect it will be a lot easier to get a few additional LTP test cases
>> added into xfstests, than it will be to convince a large number of
>> file system developers that they should (a) try to figure out how to
>> integrate LTP into their test harnesses, and (b) how to avoid
>> duplicating tests which xfstests are already running.
>
> Well I can personally help with (a).
>
> The test in question here (mmap_11-4) is a part of the Open Posix
> Testsuite that continues to live in LTP. The whole testsuite runs in
> about 30 minutes and covers most of the POSIX interfaces in ~1600
> testcases.
>
> Then there is a syscalls testsuite which covers, in addition to the
> POSIX specs, some of the Linux specific interfaces too. The runtime is
> about 15 minutes for ~1030 testcases.
>
> I guess that we can filter filesystem related syscalls quite easily. The
> overlap would take more work though. In LTP we have mostly conformance
> testcases and some stress testcases. I'm not much familiar with xfstests
> and its coverage.
>
> And we have a more tests that may be interesting to fs maintaners, there
> are aio testcases (which are likely covered by xfstests allready), some
> fs stress tests, ...
>

FWIW, I don't think there's any real compelling reason to migrate existing
LTP tests into xfstests. Maybe LTP folks just need to do a better job of
pitching LTP to filesystem developers, as we did with xfstests. :)

-Eric

2014-05-22 02:45:38

by Theodore Ts'o

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

On Wed, May 21, 2014 at 11:20:03PM +0200, [email protected] wrote:
>
> The test in question here (mmap_11-4) is a part of the Open Posix
> Testsuite that continues to live in LTP. The whole testsuite runs in
> about 30 minutes and covers most of the POSIX interfaces in ~1600
> testcases.

I'm pretty familiar with the PCTS (which is different from the Open
Posix Testsuite), epsecially as it relates to the tty/termios
interfaces.

At least with the PCTS, there are a large number of the tests which
are primarily issues which are implemented in the core VFS layer or in
the core TTY layer. If you are a core tty implementor (as I was), the
PCTS was really useful for that. If you are a serial device driver
implementor, however, not all of the tests were as useful.

I suspect the same would be true with file system related tests.
There will be those tests which are are really useful if you are doing
core VFS work, and then there are those tests which are useful if you
are working on a particular file system. Separating out those tests
which are most useful for developers would probably be a good thing,
although obviously tere will always be a certain amount of overlap.

> I guess that we can filter filesystem related syscalls quite easily. The
> overlap would take more work though. In LTP we have mostly conformance
> testcases and some stress testcases. I'm not much familiar with xfstests
> and its coverage.

Xfstests has already taken some parts of LTP. The xfstests sources
has a ltp directory with the following:

% ls ltp
total 376
40 aio-stress.c 8 doio.h 44 fsx.c 44 iogen.c 8 rwtest.sh*
84 doio.c 68 fsstress.c 76 growfiles.c 4 Makefile

Furthermore, there are a number of xfstests which run programs like
fsstress and fsx using a variety of configuration options which have
historically been best at stress testing file systems.

So it sounds like there's quite a lot of overlap, some of it caused by
xfstests grabbing bits and pieces of LTP, and part of it because
people have been adding both functional and stress tests to xfstests.

The question is what's the best way of dealing with the overlap.
Clearly xfstests has a lot more mindshare amonst file system
developers, and LTP does have that unfortunate reputation which it is
still trying to overcome. :-/

Cheers,

- Ted

2014-05-22 10:45:12

by Cyril Hrubis

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

Hi!
> > The test in question here (mmap_11-4) is a part of the Open Posix
> > Testsuite that continues to live in LTP. The whole testsuite runs in
> > about 30 minutes and covers most of the POSIX interfaces in ~1600
> > testcases.
>
> I'm pretty familiar with the PCTS (which is different from the Open
> Posix Testsuite), epsecially as it relates to the tty/termios
> interfaces.
>
> At least with the PCTS, there are a large number of the tests which
> are primarily issues which are implemented in the core VFS layer or in
> the core TTY layer. If you are a core tty implementor (as I was), the
> PCTS was really useful for that. If you are a serial device driver
> implementor, however, not all of the tests were as useful.

The Open Posix Testsuite is written to the POSIX standard, it tries to
test each assertion written in POSIX so the coverage is very broad. I
guess that you are right that not much of the tests are relevant for
filesystem developers and that would similarily be true for the syscalls
tests, however the bug that started this thread proves that there are
some.

And by the way trying to test each POSIX assertion is sometimes wery
dumb thing to do and we did remove quite a lot of testcases that were
useless or even wrong but that is completly different story...

> I suspect the same would be true with file system related tests.
> There will be those tests which are are really useful if you are doing
> core VFS work, and then there are those tests which are useful if you
> are working on a particular file system. Separating out those tests
> which are most useful for developers would probably be a good thing,
> although obviously tere will always be a certain amount of overlap.

Right, sorting out right testcases is not easy task. But given that the
runtime is not that long we can do rough selection and end up with
something that runs in about 10 or 20 minutes and covers all possibly
relevant testcases. We may end up selecting tests that are not relevant
to the job, but that shouldn't be problem unless these tests start to
misbehave and produce false positives. And even if misbehaved test is
found, which shouldn't common case, it's my job to help with getting it
right.

I've looked quickly at the syscalls testcases and I think that there are
conformance testcases that may be interesting to filesystem developers,
namely: ftruncate, fallocate, io_* syscalls tests, fdatasync, setxattr,
then there are separate testcases for capabilities (although I would
have to review these before use). And then there are a lot of testcases
for syscalls that interacts with the underlying filesystem i.e.
reading/writing/mmaping files, checking that file permissons are set
correctly etc.

> > I guess that we can filter filesystem related syscalls quite easily. The
> > overlap would take more work though. In LTP we have mostly conformance
> > testcases and some stress testcases. I'm not much familiar with xfstests
> > and its coverage.
>
> Xfstests has already taken some parts of LTP. The xfstests sources
> has a ltp directory with the following:
>
> % ls ltp
> total 376
> 40 aio-stress.c 8 doio.h 44 fsx.c 44 iogen.c 8 rwtest.sh*
> 84 doio.c 68 fsstress.c 76 growfiles.c 4 Makefile

I've looked at these tests in LTP and xfstests. The code in both
projects has drifted apart a bit and there are different fixes in both
projects which is not healty.

The best I can do as LTP maintainer is to help to unify the code so that
both LTP and xfstests uses latest code with all the fixes. Which would
be harder than just applying patches because there were some larger code
cleanups in both projects, it should be doable though.

> Furthermore, there are a number of xfstests which run programs like
> fsstress and fsx using a variety of configuration options which have
> historically been best at stress testing file systems.

I'm not trying to argue with that. And I'm not trying to convince
anybody that LTP is the only right tool for kernel testing.

All I'm trying to say is that LTP is worth running. :)
(And maybe that LTP is a good home if you have just a test or two that
doesn't seem to fit to any specific testsuite and you don't want to
start you own test project.)

> So it sounds like there's quite a lot of overlap, some of it caused by
> xfstests grabbing bits and pieces of LTP, and part of it because
> people have been adding both functional and stress tests to xfstests.

There is overlap even inside of the LTP as well (for example between
Open Posix Testsuite and syscall tests), but I don't think that is
necessarily wrong. Each of these tests has a slightly different purpose
and different implementation and together it covers more cases.

> The question is what's the best way of dealing with the overlap.

I guess making sure that both LTP and xfstests share latest source code
with all the fixes would be a good start.

> Clearly xfstests has a lot more mindshare amonst file system
> developers, and LTP does have that unfortunate reputation which it is
> still trying to overcome. :-/

To be frank the reputation is one thing I would really like to fix.
Nothing is more frustrating than spending years fixing code and still
being ignored because of the past you haven't been even involved in.

Unfortunately this seems to be one of the most challenging tasks in LTP
maintainership.

--
Cyril Hrubis
[email protected]

2014-05-22 13:42:21

by Jan Kara

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

Hello,

On Wed 21-05-14 14:39:43, Xiaoguang Wang wrote:
> Recently I met the mmap_11-4 fails when running LTP in RHEL7.0RC. Attached is a
> test program to reproduce this problem, which is written by Cyril. Uncommenting the
> msync() makes the test succeed in old linux distribution, such as RHEL6.5GA, but
> fails in RHEL7.0RC.
Thanks for report and the detailed analysis!

> I also read some ext4's source code in RHEL7.0RC and here is the possible reason
> according to my understanding. Hope this will help you something.
> --------------------------------------------------------------------------------------------
>
> When calling msync() in an ext4 file system, ext4_bio_write_page will be
> called to write back dirty pages. Here is the source code in RHEL7.0RC:
>
> int ext4_bio_write_page(struct ext4_io_submit *io, struct page *page, int len, struct writeback_control *wbc)
> {
> struct inode *inode = page->mapping->host;
> unsigned block_start, blocksize;
> struct buffer_head *bh, *head;
> int ret = 0;
> int nr_submitted = 0;
>
> blocksize = 1 << inode->i_blkbits;
>
> BUG_ON(!PageLocked(page));
> BUG_ON(PageWriteback(page));
>
> set_page_writeback(page);
> ClearPageError(page);
>
> ......
>
> bh = head = page_buffers(page);
> do {
> block_start = bh_offset(bh);
> if (block_start >= len) {
> /*
> * Comments copied from block_write_full_page_endio:
> *
> * The page straddles i_size. It must be zeroed out on
> * each and every writepage invocation because it may
> * be mmapped. "A file is mapped in multiples of the
> * page size. For a file that is not a multiple of
> * the page size, the remaining memory is zeroed when
> * mapped, and writes to that region are not written
> * out to the file."
> */
> zero_user_segment(page, block_start,
> block_start + blocksize);
> clear_buffer_dirty(bh);
> set_buffer_uptodate(bh);
> continue;
> }
> ......
> } while ((bh = bh->b_this_page) != head);
> --------------------------------------------------------------------------------------------
> I deleted some irrelevant code.
You are right, the code is wrong. The breakage actually seems to have
happened already in 2010 when Ted originally wrote the code and Yongqiang's
patch in 2011 fixed it only partially. I'll send the fix in a minute.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-05-22 14:39:12

by Theodore Ts'o

[permalink] [raw]

Subject: Re: OpenPosix test case mmap_11-4 fails in ext4 filesystem

On Thu, May 22, 2014 at 12:45:05PM +0200, [email protected] wrote:
> Right, sorting out right testcases is not easy task. But given that the
> runtime is not that long we can do rough selection and end up with
> something that runs in about 10 or 20 minutes and covers all possibly
> relevant testcases.

Something to keep in mind is that for at least some file system
developers, such as myself, we end up running certain tests multiple
times, so we can get test coverage for different file system mount
options, block sizes, file system features, etc. Currently, I have
ten or so different configurations for my full test run.

So if the bulk of the 20 minutes are tests for which there is
duplicate coverage with xfstests, or which is stuff that is only
testing behaviour implemented in the VFS, that's actually 3-4 extra
hours of testing as far as I'm concerned. Hence, the longer and the
longer the tests take, the less often I will run them --- and the
effect is not linear.

So failing to eliminate tests which are not relevant is not cost-free.
In some cases, the costs of winnowing out non-relevant tests may not
be worth the effort. If it's only an extra 5 seconds, I probably
wouldn't care. Although if you multiply the time saved by the number
of file system developers, and the number of full test runs that they
likely will be running during the development cycle, spending a few
minutes to disable a non-relevant test starts becoming net-positive.

> To be frank the reputation is one thing I would really like to fix.
> Nothing is more frustrating than spending years fixing code and still
> being ignored because of the past you haven't been even involved in.

Well, to the extent that many years ago, I was employed by one of the
companies had a well-meaning, but ultimately counterproductive
contribution to LTP, by putting relatively junior test engineers that
didn't understand the kernel code all that deeply, and was given the
mandate to increase the code coverage percentages as the only figure
of merit, I was actually somewhat involved, if only indirectly.

Unfortunately, I wasn't able to effect change by advocating strongly
enough for a more productive approach (which would have required
putting in far more senior, and thus more expensive and harder to
allocate engineering talent), although I did help eventually help by
arguing that ceasing support for what I considered to be a deeply
dysfunctional effort as the best thing that particular company could
have done at that point. :-(

So I really have to commend Xiaoguang Wang for having done a large
amount of analysis when submitting the report about the LTP test
failure. That's significantly different from my previous experience
interacting with previous people working on the LTP project, and it's
precisely that sort of work and analysis which I'm sure will help turn
around LTP's reputation.

Cheers,

- Ted