2005-11-21 19:26:14

by Marcelo Tosatti

[permalink] [raw]
Subject: [PATCH] properly account readahead file major faults

Hi,

The fault accounting of filemap_dopage() is currently unable to account
for readahead pages as major faults.

Which means that getrusage's major fault reporting is pretty useless.

Fix that by using the PageReferenced and PageActive bits: allows
differentiation between newly accessed pages and reaccessed pages.

Follows the output of "/usr/bin/time -v" of an app which reads
10MB through a mapping on cold pagecache.

BEFORE PATCH:
Size: 10MB
Loops: 10
opening ./scan-testfile
reading scan-testfile, length = 10MB
Command being timed: "./scan-shared 10 10"
<snip>
Major (requiring I/O) page faults: 225
Minor (reclaiming a frame) page faults: 2437

AFTER PATCH:
Size: 10MB
Loops: 10
opening ./scan-testfile
reading scan-testfile, length = 10MB
Command being timed: "./scan-shared 10 10"
<>snip>
Major (requiring I/O) page faults: 2562
Minor (reclaiming a frame) page faults: 101



Signed-off-by: Marcelo Tosatti <[email protected]>

diff --git a/mm/filemap.c b/mm/filemap.c
index 5d6e4c2..8655443 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1237,14 +1237,6 @@ retry_find:
if (ra->mmap_miss > ra->mmap_hit + MMAP_LOTSAMISS)
goto no_cached_page;

- /*
- * To keep the pgmajfault counter straight, we need to
- * check did_readaround, as this is an inner loop.
- */
- if (!did_readaround) {
- majmin = VM_FAULT_MAJOR;
- inc_page_state(pgmajfault);
- }
did_readaround = 1;
ra_pages = max_sane_readahead(file->f_ra.ra_pages);
if (ra_pages) {
@@ -1273,6 +1265,15 @@ success:
/*
* Found the page and have a reference on it.
*/
+ if (PageActive(page) && !PageReferenced(page)) {
+ /* only account active pages as major faults, since inactive
+ * pages might have their referenced bit cleaned by
+ * memory scanning.
+ */
+ majmin = VM_FAULT_MAJOR;
+ inc_page_state(pgmajfault);
+ }
+
mark_page_accessed(page);
if (type)
*type = majmin;
@@ -1312,10 +1313,6 @@ no_cached_page:
return NULL;

page_not_uptodate:
- if (!did_readaround) {
- majmin = VM_FAULT_MAJOR;
- inc_page_state(pgmajfault);
- }
lock_page(page);

/* Did it get unhashed while we waited for it? */


2005-11-22 04:23:05

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH] properly account readahead file major faults

Hi,

On Mon, Nov 21, 2005 at 12:00:38PM -0200, Marcelo Tosatti wrote:
> Hi,
>
> The fault accounting of filemap_dopage() is currently unable to account
> for readahead pages as major faults.

Sorry, I don't know much about the definition of major/minor page faults.
So I googled one that explains the old behavior:

--> Page Faults <--
These come in two varieties. Minor and Major faults. A Major fault results
when an application tries to access a memory page that has been swapped out to
disk. The page must be swapped back in. A Minor fault results when an
application tries to access a memory page that is still in memory, but the
physical location of which is not immediately known. The address must be
looked up.

With the current accounting logic:
- major faults reflect the times one has to wait for real I/O.
- the more success read-ahead, the less major faults.
- anyway, major+minor faults remain the same for the same benchmark.

With your patch:
- major faults are expected to remain the same with whatever read-ahead.
- but what's the new meaning of minor faults?

Thanks,
Wu

2005-11-22 11:51:01

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH] properly account readahead file major faults

Hi Wu!

On Tue, Nov 22, 2005 at 12:24:43PM +0800, Wu Fengguang wrote:
> Hi,
>
> On Mon, Nov 21, 2005 at 12:00:38PM -0200, Marcelo Tosatti wrote:
> > Hi,
> >
> > The fault accounting of filemap_dopage() is currently unable to account
> > for readahead pages as major faults.
>
> Sorry, I don't know much about the definition of major/minor page faults.
> So I googled one that explains the old behavior:
>
> --> Page Faults <--
> These come in two varieties. Minor and Major faults. A Major fault results
> when an application tries to access a memory page that has been swapped out to
> disk. The page must be swapped back in. A Minor fault results when an
> application tries to access a memory page that is still in memory, but the
> physical location of which is not immediately known. The address must be
> looked up.

Yep, just that "swapped out"/"swappin in" can be though of as "read
in/"read out".

> With the current accounting logic:
> - major faults reflect the times one has to wait for real I/O.
> - the more success read-ahead, the less major faults.
> - anyway, major+minor faults remain the same for the same benchmark.
>
> With your patch:
> - major faults are expected to remain the same with whatever read-ahead.
> - but what's the new meaning of minor faults?

With the patch minor faults are only those faults which can be serviced
by the pagecache, requiring no I/O.

Pages which hit the first time in cache due to readahead _have_ caused
IO, and as such they should be counted as major faults.

I suppose that if you want to count readahead hits it should be done
separately (which is now "sort of" available with the "majflt" field).

2005-11-22 12:34:17

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH] properly account readahead file major faults

On Tue, Nov 22, 2005 at 04:23:21AM -0200, Marcelo Tosatti wrote:
> > With the current accounting logic:
> > - major faults reflect the times one has to wait for real I/O.
> > - the more success read-ahead, the less major faults.
> > - anyway, major+minor faults remain the same for the same benchmark.
> >
> > With your patch:
> > - major faults are expected to remain the same with whatever read-ahead.
> > - but what's the new meaning of minor faults?
>
> With the patch minor faults are only those faults which can be serviced
> by the pagecache, requiring no I/O.
>
> Pages which hit the first time in cache due to readahead _have_ caused
> IO, and as such they should be counted as major faults.

Thanks, so you are changing the semantics. It may be a good idea, though I have
no idea which is more valuable for the people, or whether this change is ok for
the majority, or if it makes linux in line with other UNIX behaviors. Sorry, I
cannot comment on this.

> I suppose that if you want to count readahead hits it should be done
> separately (which is now "sort of" available with the "majflt" field).

Sure. It is hard to collect the exact readahead hit values, for the pages from
active list and pages newly allocated are not distinguishable. But I do manage
to collect the short term values for each readahead, which are sufficient most
of the time.

Regards,
Wu

2005-11-22 12:55:05

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] properly account readahead file major faults

On Tue, 22 Nov 2005, Marcelo Tosatti wrote:
>
> Pages which hit the first time in cache due to readahead _have_ caused
> IO, and as such they should be counted as major faults.

Have caused IO, or have benefitted from IO which was done earlier?

It sounds debatable, each will have their own idea of what's major.

Maybe PageUptodate at the time the entry is found in the page cache
should come into it? !PageUptodate implying that we'll be waiting
for read to complete.

Hugh

2005-11-22 13:36:43

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH] properly account readahead file major faults

Hi Hugh!

On Tue, Nov 22, 2005 at 12:55:02PM +0000, Hugh Dickins wrote:
> On Tue, 22 Nov 2005, Marcelo Tosatti wrote:
> >
> > Pages which hit the first time in cache due to readahead _have_ caused
> > IO, and as such they should be counted as major faults.
>
> Have caused IO, or have benefitted from IO which was done earlier?

Which caused IO, either synchronously or via (previously read)
readahead.

> It sounds debatable, each will have their own idea of what's major.

I see your point... and I much prefer the "majflt means IO performed"
definition :)

As a user I want to know how many pages have been read in from disk to
service my application requests.

>From the "time" manpage:

F Number of major, or I/O-requiring, page faults that oc-
curred while the process was running. These are faults
where the page has actually migrated out of primary memo-
ry.

> Maybe PageUptodate at the time the entry is found in the page cache
> should come into it? !PageUptodate implying that we'll be waiting
> for read to complete.

Hum, I still strongly feel that users care about IO performed and not
readahead effectiveness (which could be separate information).

I don't think the semantics are precisely defined anywhere are they?