2007-05-19 12:30:38

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH 1/9] readahead: introduce PG_readahead

On Fri, May 18, 2007 at 11:28:24PM -0700, Andrew Morton wrote:
> On Thu, 17 May 2007 06:47:53 +0800 Fengguang Wu <[email protected]> wrote:
>
> > Introduce a new page flag: PG_readahead.
>
> Is there any way in which we can avoid adding a new page flag?
>
> We have the advantage that if the kernel very occasionally gets the wrong
> result for PageReadahead(page), nothing particularly bad will happen, so we
> can do racy things.
>
> >From a quick peek, it appears that PG_readahead is only ever set against
> non-uptodate pages. If true we could perhaps exploit that: say,
> PageReadahead(page) == PG_referenced && !PG_uptodate?

PG_uptodate will flip to 1 before the reader touches the page :(

However, it may be possible to share the same bit with PG_reclaim or PG_booked.
Which one would be preferred?


2007-05-19 15:30:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/9] readahead: introduce PG_readahead

On Sat, 19 May 2007 20:30:31 +0800 Fengguang Wu <[email protected]> wrote:

> On Fri, May 18, 2007 at 11:28:24PM -0700, Andrew Morton wrote:
> > On Thu, 17 May 2007 06:47:53 +0800 Fengguang Wu <[email protected]> wrote:
> >
> > > Introduce a new page flag: PG_readahead.
> >
> > Is there any way in which we can avoid adding a new page flag?
> >
> > We have the advantage that if the kernel very occasionally gets the wrong
> > result for PageReadahead(page), nothing particularly bad will happen, so we
> > can do racy things.
> >
> > >From a quick peek, it appears that PG_readahead is only ever set against
> > non-uptodate pages. If true we could perhaps exploit that: say,
> > PageReadahead(page) == PG_referenced && !PG_uptodate?
>
> PG_uptodate will flip to 1 before the reader touches the page :(

OK.

> However, it may be possible to share the same bit with PG_reclaim or PG_booked.
> Which one would be preferred?

I'd like PG_booked to go away too - I don't think we've put that under the
microscope yet. If it remains then its scope will be "defined by the
filesystem", so readahead shouldn't use it. PG_reclaim belongs to core VFS
so that's much better.

Let's not do anything ugly, slow or silly in there, but please do take a
look, see if there is an opportunity here.

2007-05-20 03:09:05

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH 1/9] readahead: introduce PG_readahead

On Sat, May 19, 2007 at 08:25:04AM -0700, Andrew Morton wrote:
> On Sat, 19 May 2007 20:30:31 +0800 Fengguang Wu <[email protected]> wrote:
>
> > On Fri, May 18, 2007 at 11:28:24PM -0700, Andrew Morton wrote:
> > > On Thu, 17 May 2007 06:47:53 +0800 Fengguang Wu <[email protected]> wrote:
> > >
> > > > Introduce a new page flag: PG_readahead.
> > >
> > > Is there any way in which we can avoid adding a new page flag?
> > >
> > > We have the advantage that if the kernel very occasionally gets the wrong
> > > result for PageReadahead(page), nothing particularly bad will happen, so we
> > > can do racy things.
> > >
> > > >From a quick peek, it appears that PG_readahead is only ever set against
> > > non-uptodate pages. If true we could perhaps exploit that: say,
> > > PageReadahead(page) == PG_referenced && !PG_uptodate?
> >
> > PG_uptodate will flip to 1 before the reader touches the page :(
>
> OK.
>
> > However, it may be possible to share the same bit with PG_reclaim or PG_booked.
> > Which one would be preferred?
>
> I'd like PG_booked to go away too - I don't think we've put that under the
> microscope yet. If it remains then its scope will be "defined by the
> filesystem", so readahead shouldn't use it. PG_reclaim belongs to core VFS
> so that's much better.
>
> Let's not do anything ugly, slow or silly in there, but please do take a
> look, see if there is an opportunity here.

The reuse code would look like the attached one.
It still needs more testing, and would fail if Christoph reuses
PG_reclaim in higher order pagecache in the future.

Fengguang Wu
---

Subject: mm: share PG_readahead and PG_reclaim

Share the same page flag bit for PG_readahead and PG_reclaim.

One is used only on file reads, another is only for emergency writes.
One is used mostly for fresh/young pages, another is for old pages.

Combinations of possible interactions are:

a) clear PG_reclaim => implicit clear of PG_readahead
it will delay an asynchronous readahead into a synchronous one
it actually does _good_ for readahead:
the pages will be reclaimed soon, it's readahead thrashing!
in this case, synchronous readahead makes more sense.

b) clear PG_readahead => implicit clear of PG_reclaim
one(and only one) page will not be reclaimed in time
it can be avoided by checking PageWriteback(page) in readahead first

c) set PG_reclaim => implicit set of PG_readahead
will confuse readahead and make it restart the size rampup process
it's a trivial problem, and can mostly be avoided by checking
PageWriteback(page) first in readahead

d) set PG_readahead => implicit set of PG_reclaim
PG_readahead will never be set on already cached pages.
PG_reclaim will always be cleared on dirtying a page.
so not a problem.

In summary,
a) we get better behavior
b,d) possible interactions can be avoided
c) racy condition exists that might affect readahead, but the chance
is _really_ low, and the hurt on readahead is trivial.

Compound pages also use PG_reclaim, but for now they do not interact with
reclaim/readahead code.

Signed-off-by: Fengguang Wu <[email protected]>
---
include/linux/page-flags.h | 4 +++-
mm/page-writeback.c | 1 +
mm/readahead.c | 4 ++++
3 files changed, 8 insertions(+), 1 deletion(-)

--- linux-2.6.22-rc1-mm1.orig/include/linux/page-flags.h
+++ linux-2.6.22-rc1-mm1/include/linux/page-flags.h
@@ -92,7 +92,9 @@
#define PG_lazyfree 20 /* MADV_FREE potential throwaway */

#define PG_booked 21 /* Has blocks reserved on-disk */
-#define PG_readahead 22 /* Reminder to do async read-ahead */
+
+/* PG_readahead is only used for file reads; PG_reclaim is only for writes */
+#define PG_readahead PG_reclaim /* Reminder to do async read-ahead */

/* PG_owner_priv_1 users should have descriptive aliases */
#define PG_checked PG_owner_priv_1 /* Used by some filesystems */
--- linux-2.6.22-rc1-mm1.orig/mm/page-writeback.c
+++ linux-2.6.22-rc1-mm1/mm/page-writeback.c
@@ -922,6 +922,7 @@ int clear_page_dirty_for_io(struct page

BUG_ON(!PageLocked(page));

+ ClearPageReclaim(page);
if (mapping && mapping_cap_account_dirty(mapping)) {
/*
* Yes, Virginia, this is indeed insane.
--- linux-2.6.22-rc1-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc1-mm1/mm/readahead.c
@@ -447,6 +447,10 @@ page_cache_readahead_ondemand(struct add
if (!ra->ra_pages)
return 0;

+ /* It's PG_reclaim! */
+ if (PageWriteback(page))
+ return 0;
+
if (page) {
ClearPageReadahead(page);


2007-05-20 03:09:04

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH 1/9] readahead: introduce PG_readahead

On Sat, May 19, 2007 at 08:25:04AM -0700, Andrew Morton wrote:
> On Sat, 19 May 2007 20:30:31 +0800 Fengguang Wu <[email protected]> wrote:
>
> > On Fri, May 18, 2007 at 11:28:24PM -0700, Andrew Morton wrote:
> > > On Thu, 17 May 2007 06:47:53 +0800 Fengguang Wu <[email protected]> wrote:
> > >
> > > > Introduce a new page flag: PG_readahead.
> > >
> > > Is there any way in which we can avoid adding a new page flag?
> > >
> > > We have the advantage that if the kernel very occasionally gets the wrong
> > > result for PageReadahead(page), nothing particularly bad will happen, so we
> > > can do racy things.
> > >
> > > >From a quick peek, it appears that PG_readahead is only ever set against
> > > non-uptodate pages. If true we could perhaps exploit that: say,
> > > PageReadahead(page) == PG_referenced && !PG_uptodate?
> >
> > PG_uptodate will flip to 1 before the reader touches the page :(
>
> OK.
>
> > However, it may be possible to share the same bit with PG_reclaim or PG_booked.
> > Which one would be preferred?
>
> I'd like PG_booked to go away too - I don't think we've put that under the
> microscope yet. If it remains then its scope will be "defined by the
> filesystem", so readahead shouldn't use it. PG_reclaim belongs to core VFS
> so that's much better.
>
> Let's not do anything ugly, slow or silly in there, but please do take a
> look, see if there is an opportunity here.

The reuse code would look like the attached one.
It still needs more testing, and would fail if Christoph reuses
PG_reclaim in higher order pagecache in the future.

Fengguang Wu
---

Subject: mm: share PG_readahead and PG_reclaim

Share the same page flag bit for PG_readahead and PG_reclaim.

One is used only on file reads, another is only for emergency writes.
One is used mostly for fresh/young pages, another is for old pages.

Combinations of possible interactions are:

a) clear PG_reclaim => implicit clear of PG_readahead
it will delay an asynchronous readahead into a synchronous one
it actually does _good_ for readahead:
the pages will be reclaimed soon, it's readahead thrashing!
in this case, synchronous readahead makes more sense.

b) clear PG_readahead => implicit clear of PG_reclaim
one(and only one) page will not be reclaimed in time
it can be avoided by checking PageWriteback(page) in readahead first

c) set PG_reclaim => implicit set of PG_readahead
will confuse readahead and make it restart the size rampup process
it's a trivial problem, and can mostly be avoided by checking
PageWriteback(page) first in readahead

d) set PG_readahead => implicit set of PG_reclaim
PG_readahead will never be set on already cached pages.
PG_reclaim will always be cleared on dirtying a page.
so not a problem.

In summary,
a) we get better behavior
b,d) possible interactions can be avoided
c) racy condition exists that might affect readahead, but the chance
is _really_ low, and the hurt on readahead is trivial.

Compound pages also use PG_reclaim, but for now they do not interact with
reclaim/readahead code.

Signed-off-by: Fengguang Wu <[email protected]>
---
include/linux/page-flags.h | 4 +++-
mm/page-writeback.c | 1 +
mm/readahead.c | 4 ++++
3 files changed, 8 insertions(+), 1 deletion(-)

--- linux-2.6.22-rc1-mm1.orig/include/linux/page-flags.h
+++ linux-2.6.22-rc1-mm1/include/linux/page-flags.h
@@ -92,7 +92,9 @@
#define PG_lazyfree 20 /* MADV_FREE potential throwaway */

#define PG_booked 21 /* Has blocks reserved on-disk */
-#define PG_readahead 22 /* Reminder to do async read-ahead */
+
+/* PG_readahead is only used for file reads; PG_reclaim is only for writes */
+#define PG_readahead PG_reclaim /* Reminder to do async read-ahead */

/* PG_owner_priv_1 users should have descriptive aliases */
#define PG_checked PG_owner_priv_1 /* Used by some filesystems */
--- linux-2.6.22-rc1-mm1.orig/mm/page-writeback.c
+++ linux-2.6.22-rc1-mm1/mm/page-writeback.c
@@ -922,6 +922,7 @@ int clear_page_dirty_for_io(struct page

BUG_ON(!PageLocked(page));

+ ClearPageReclaim(page);
if (mapping && mapping_cap_account_dirty(mapping)) {
/*
* Yes, Virginia, this is indeed insane.
--- linux-2.6.22-rc1-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc1-mm1/mm/readahead.c
@@ -447,6 +447,10 @@ page_cache_readahead_ondemand(struct add
if (!ra->ra_pages)
return 0;

+ /* It's PG_reclaim! */
+ if (PageWriteback(page))
+ return 0;
+
if (page) {
ClearPageReadahead(page);


2007-05-20 07:10:33

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 1/9] readahead: introduce PG_readahead

On Sun, 20 May 2007, Fengguang Wu wrote:

> The reuse code would look like the attached one.
> It still needs more testing, and would fail if Christoph reuses
> PG_reclaim in higher order pagecache in the future.

Do not worry about that. All higher order pagecache patchsets remove that
sharing because we will then need these pages on the LRU and thus
PG_reclaim cannot be shared anymore.