LinuxLists.cc - [PATCH] earlier allocation of order 0 pages from pcp in __alloc

2005-09-29 22:02:14

Subject: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

[PATCH]: Try to service a order 0 page request in __alloc_pages from the pcp list before checking the aone_watermarks.

Try to service a order 0 page request from pcp list. This will allow us to not check and possibly start the reclaim activity when there are free pages present on the pcp. This early allocation does not try to replenish an empty pcp.

Signed-off-by: Rohit Seth <[email protected]>

--- linux-2.6.14-rc2-mm1.org/mm/page_alloc.c 2005-09-27 10:03:51.000000000 -0700
+++ linux-2.6.14-rc2-mm1/mm/page_alloc.c 2005-09-28 17:38:15.000000000 -0700
@@ -716,6 +716,39 @@
clear_highpage(page + i);
}

+/* This routine allocates a order 0 page from cpu's pcp list when one is present.
+ * It does not try to remove the pages from zone_free_list as the zone low
+ * water mark has not yet been checked.
+ */
+
+static struct page *
+remove_from_pcp(struct zone *zone, unsigned int __nocast gfp_flags)
+{
+ unsigned long flags;
+ struct per_cpu_pages *pcp;
+ struct page *page = NULL;
+ int cold = !!(gfp_flags & __GFP_COLD);
+
+ pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
+ local_irq_save(flags);
+ if (pcp->count > pcp->low) {
+ page = list_entry(pcp->list.next, struct page, lru);
+ list_del(&page->lru);
+ pcp->count--;
+ }
+ local_irq_restore(flags);
+ put_cpu();
+
+ if (page != NULL) {
+ mod_page_state_zone(zone, pgalloc, 1 );
+ prep_new_page(page, 0);
+
+ if (gfp_flags & __GFP_ZERO)
+ prep_zero_page(page, 0, gfp_flags);
+ }
+ return page;
+}
+
/*
* Really, prep_compound_page() should be called from __rmqueue_bulk(). But
* we cheat by calling it from here, in the order > 0 path. Saves a branch
@@ -905,6 +938,12 @@
if (!cpuset_zone_allowed(z, __GFP_HARDWALL))
continue;

+ if (order == 0) {
+ page = remove_from_pcp(z, gfp_mask);
+ if (page)
+ goto got_pg;
+ }
+
/*
* If the zone is to attempt early page reclaim then this loop
* will try to reclaim pages and check the watermark a second

2005-09-29 22:36:25

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

"Seth, Rohit" <[email protected]> wrote:
>
> [PATCH]: Try to service a order 0 page request in __alloc_pages from the pcp list before checking the aone_watermarks.
>
> Try to service a order 0 page request from pcp list. This will allow us to not check and possibly start the reclaim activity when there are free pages present on the pcp. This early allocation does not try to replenish an empty pcp.

(Please avoid the 240-column emails!)

Why is this a desirable change to make?

2005-09-29 22:48:34

by Martin Bligh

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

> Try to service a order 0 page request from pcp list. This will allow us to not check and possibly start the reclaim activity when there are free pages present on the pcp. This early allocation does not try to replenish an empty pcp.
>
> Signed-off-by: Rohit Seth <[email protected]>

It seems a bit odd to copy more code to do this, which I think we already
have in buffered_rmqueue? Can we clean up the flow a bit here ... it
is already looking messy in __alloc_pages with various gotos and crud
there. I'm not saying what you're trying to do is bad, but the flow
in there is getting more and move convoluted, and we perhaps need to
straighten it.

It looks like we're now dropping into direct reclaim as the first thing
in __alloc_pages before even trying to kick off kswapd. When the hell
did that start? Or is that only meant to trigger if we're already below
the low watermark level?

What do we want to do at a higher level?

if (order 0) and (have stuff in the local lists)
take from local lists
else if (we're under a little pressure)
do kswapd reclaim
else if (we're under a lot of pressure)
do direct reclaim?

That whole code area seems to have been turned into spagetti, without
any clear comments.

M.

> --- linux-2.6.14-rc2-mm1.org/mm/page_alloc.c 2005-09-27 10:03:51.000000000 -0700
> +++ linux-2.6.14-rc2-mm1/mm/page_alloc.c 2005-09-28 17:38:15.000000000 -0700
> @@ -716,6 +716,39 @@
> clear_highpage(page + i);
> }
>
> +/* This routine allocates a order 0 page from cpu's pcp list when one is present.
> + * It does not try to remove the pages from zone_free_list as the zone low
> + * water mark has not yet been checked.
> + */
> +
> +static struct page *
> +remove_from_pcp(struct zone *zone, unsigned int __nocast gfp_flags)
> +{
> + unsigned long flags;
> + struct per_cpu_pages *pcp;
> + struct page *page = NULL;
> + int cold = !!(gfp_flags & __GFP_COLD);
> +
> + pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
> + local_irq_save(flags);
> + if (pcp->count > pcp->low) {
> + page = list_entry(pcp->list.next, struct page, lru);
> + list_del(&page->lru);
> + pcp->count--;
> + }
> + local_irq_restore(flags);
> + put_cpu();
> +
> + if (page != NULL) {
> + mod_page_state_zone(zone, pgalloc, 1 );
> + prep_new_page(page, 0);
> +
> + if (gfp_flags & __GFP_ZERO)
> + prep_zero_page(page, 0, gfp_flags);
> + }
> + return page;
> +}
> +
> /*
> * Really, prep_compound_page() should be called from __rmqueue_bulk(). But
> * we cheat by calling it from here, in the order > 0 path. Saves a branch
> @@ -905,6 +938,12 @@
> if (!cpuset_zone_allowed(z, __GFP_HARDWALL))
> continue;
>
> + if (order == 0) {
> + page = remove_from_pcp(z, gfp_mask);
> + if (page)
> + goto got_pg;
> + }
> +
> /*
> * If the zone is to attempt early page reclaim then this loop
> * will try to reclaim pages and check the watermark a second
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>
>

2005-09-29 22:50:09

by Dave Hansen

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

On Thu, 2005-09-29 at 15:01 -0700, Seth, Rohit wrote:
> +/* This routine allocates a order 0 page from cpu's pcp list when one is present.
> + * It does not try to remove the pages from zone_free_list as the zone low
> + * water mark has not yet been checked.
> + */
> +
> +static struct page *
> +remove_from_pcp(struct zone *zone, unsigned int __nocast gfp_flags)
> +{
> + unsigned long flags;
> + struct per_cpu_pages *pcp;
> + struct page *page = NULL;
> + int cold = !!(gfp_flags & __GFP_COLD);
> +
> + pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
> + local_irq_save(flags);
> + if (pcp->count > pcp->low) {
> + page = list_entry(pcp->list.next, struct page, lru);
> + list_del(&page->lru);
> + pcp->count--;
> + }
> + local_irq_restore(flags);
> + put_cpu();
> +
> + if (page != NULL) {
> + mod_page_state_zone(zone, pgalloc, 1 );
> + prep_new_page(page, 0);
> +
> + if (gfp_flags & __GFP_ZERO)
> + prep_zero_page(page, 0, gfp_flags);
> + }
> + return page;
> +}
> +

That looks to share a decent amount of logic with the pcp code in
buffered_rmqueue. Any chance it could be consolidated instead of
copy/pasting?

-- Dave

2005-09-29 23:11:46

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

"Martin J. Bligh" <[email protected]> wrote:
>
> It looks like we're now dropping into direct reclaim as the first thing
> in __alloc_pages before even trying to kick off kswapd. When the hell
> did that start? Or is that only meant to trigger if we're already below
> the low watermark level?

That's all the numa goop which Martin Hicks added. It's all disabled if
z->reclaim_pages is zero (it is). However we could be testing that flag a
bit earlier, I think.

And yeah, some de-spaghettification would be nice. Certainly before adding
more logic.

Martin, should we take out the early zone reclaim logic? It's all
unreachable at present anyway.

2005-09-30 00:58:09

by Rohit Seth

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

On Thu, 2005-09-29 at 15:36 -0700, Andrew Morton wrote:
> "Seth, Rohit" <[email protected]> wrote:
> >
> > [PATCH]: Try to service a order 0 page request in __alloc_pages from the pcp list before checking the aone_watermarks.
> >
> > Try to service a order 0 page request from pcp list. This will allow us to not check and possibly start the reclaim activity when there are free pages present on the pcp. This early allocation does not try to replenish an empty pcp.
>
> (Please avoid the 240-column emails!)
>

This is bad. Sorry.

> Why is this a desirable change to make?

This change avoids any checks for watermarks and starting potential
reclaims for the cases where we already know we don't need to.

-rohit

2005-09-30 01:03:09

by Rohit Seth

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

On Thu, 2005-09-29 at 15:50 -0700, Dave Hansen wrote:

>
> That looks to share a decent amount of logic with the pcp code in
> buffered_rmqueue. Any chance it could be consolidated instead of
> copy/pasting?
>

It indeed does share most of the code with buffered_rmqueue. And it is
definitely possible to streamline this control flow. But that would
require more changes in the existing code (didn't want to make that as
part of this patch to start with).

-rohit

2005-09-30 01:24:59

by Rohit Seth

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

On Thu, 2005-09-29 at 15:48 -0700, Martin J. Bligh wrote:
> > Try to service a order 0 page request from pcp list. This will allow us to not check and possibly start the reclaim activity when there are free pages present on the pcp. This early allocation does not try to replenish an empty pcp.
> >
> > Signed-off-by: Rohit Seth <[email protected]>
>
> It seems a bit odd to copy more code to do this, which I think we already
> have in buffered_rmqueue? Can we clean up the flow a bit here ... it
> is already looking messy in __alloc_pages with various gotos and crud
> there. I'm not saying what you're trying to do is bad, but the flow
> in there is getting more and move convoluted, and we perhaps need to
> straighten it.
>

I will update/streamline __alloc_pages code and send the patch.

> It looks like we're now dropping into direct reclaim as the first thing
> in __alloc_pages before even trying to kick off kswapd. When the hell
> did that start? Or is that only meant to trigger if we're already below
> the low watermark level?
>

As Andrew said in the other mail that do_reclaim is never true so the
first reclaim never happens. That also means that we don't look at pcp
for the scenarios when zone has run below the low water mark before
waking kswapd.

> What do we want to do at a higher level?
>
> if (order 0) and (have stuff in the local lists)
> take from local lists
> else if (we're under a little pressure)
> do kswapd reclaim
> else if (we're under a lot of pressure)
> do direct reclaim?
>
> That whole code area seems to have been turned into spagetti, without
> any clear comments.

Agreed.

2005-09-30 01:32:46

by Martin Bligh

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

> I will update/streamline __alloc_pages code and send the patch.
>
>> It looks like we're now dropping into direct reclaim as the first thing
>> in __alloc_pages before even trying to kick off kswapd. When the hell
>> did that start? Or is that only meant to trigger if we're already below
>> the low watermark level?
>>
>
> As Andrew said in the other mail that do_reclaim is never true so the
> first reclaim never happens. That also means that we don't look at pcp
> for the scenarios when zone has run below the low water mark before
> waking kswapd.
>
>> What do we want to do at a higher level?
>>
>> if (order 0) and (have stuff in the local lists)
>> take from local lists
>> else if (we're under a little pressure)
>> do kswapd reclaim
>> else if (we're under a lot of pressure)
>> do direct reclaim?
>>
>> That whole code area seems to have been turned into spagetti, without
>> any clear comments.
>
> Agreed.

Thanks. I didn't mean you'd done so, BTW ... just many iterations of people
stacking more and more stuff on top without it ever getting cleaned up,
and it's got a bit silly ;-)

M.

2005-09-30 01:52:16

by Rohit Seth

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

On Thu, 2005-09-29 at 16:11 -0700, Andrew Morton wrote:
> "Martin J. Bligh" <[email protected]> wrote:
> >
> > It looks like we're now dropping into direct reclaim as the first thing
> > in __alloc_pages before even trying to kick off kswapd. When the hell
> > did that start? Or is that only meant to trigger if we're already below
> > the low watermark level?
>
> That's all the numa goop which Martin Hicks added. It's all disabled if
> z->reclaim_pages is zero (it is). However we could be testing that flag a
> bit earlier, I think.
>
> And yeah, some de-spaghettification would be nice. Certainly before adding
> more logic.
>
> Martin, should we take out the early zone reclaim logic? It's all
> unreachable at present anyway.
>
...yeah just like sys_set_zone_reclaim. was it intended to be added as
a system call?

-rohit

2005-09-30 12:30:34

by Martin Hicks

[permalink] [raw]

Subject: Re: [PATCH] earlier allocation of order 0 pages from pcp in __alloc_pages

On Thu, Sep 29, 2005 at 04:11:18PM -0700, Andrew Morton wrote:
> "Martin J. Bligh" <[email protected]> wrote:
> >
> > It looks like we're now dropping into direct reclaim as the first thing
> > in __alloc_pages before even trying to kick off kswapd. When the hell
> > did that start? Or is that only meant to trigger if we're already below
> > the low watermark level?
>
> That's all the numa goop which Martin Hicks added. It's all disabled if
> z->reclaim_pages is zero (it is). However we could be testing that flag a
> bit earlier, I think.
>
> And yeah, some de-spaghettification would be nice. Certainly before adding
> more logic.
>
> Martin, should we take out the early zone reclaim logic? It's all
> unreachable at present anyway.
>

Yes, Please do.

mh

--
Martin Hicks || Silicon Graphics Inc. || [email protected]