Recent changes in page allocation for pcps have increased the high watermark for these lists. This has resulted in scenarios where pcp lists can hold a large number of free pages even under low memory conditions.
[PATCH]: Reduce the high mark in cpu's pcp lists.
Signed-off-by: Rohit Seth <[email protected]>
--- linux-2.6.14-rc2-mm1.org/mm/page_alloc.c 2005-09-27 10:03:51.000000000 -0700
+++ linux-2.6.14-rc2-mm1/mm/page_alloc.c 2005-09-27 18:01:21.000000000 -0700
@@ -1859,15 +1859,15 @@
pcp = &p->pcp[0]; /* hot */
pcp->count = 0;
pcp->low = 0;
- pcp->high = 6 * batch;
+ pcp->high = 4 * batch;
pcp->batch = max(1UL, 1 * batch);
INIT_LIST_HEAD(&pcp->list);
pcp = &p->pcp[1]; /* cold*/
pcp->count = 0;
pcp->low = 0;
- pcp->high = 2 * batch;
pcp->batch = max(1UL, batch/2);
+ pcp->high = pcp->batch + 1;
INIT_LIST_HEAD(&pcp->list);
}
On Wed, 28 Sep 2005, Seth, Rohit wrote:
> Recent changes in page allocation for pcps have increased the high watermark for these lists. This has resulted in scenarios where pcp lists can hold a large number of free pages even under low memory conditions.
>
> [PATCH]: Reduce the high mark in cpu's pcp lists.
There is no need for such a patch. The pcp lists are regularly flushed.
See drain_remote_pages.
On Wed, 2005-09-28 at 13:01 -0700, Christoph Lameter wrote:
> On Wed, 28 Sep 2005, Seth, Rohit wrote:
>
> > Recent changes in page allocation for pcps have increased the high watermark for these lists. This has resulted in scenarios where pcp lists can hold a large number of free pages even under low memory conditions.
> >
> > [PATCH]: Reduce the high mark in cpu's pcp lists.
>
> There is no need for such a patch. The pcp lists are regularly flushed.
> See drain_remote_pages.
CONFIG_NUMA needs to be defined for that, and even then it only flushes
the remote pages. When are the local pcps flushed? Also note that this
patch just brings the number of free pages on the pcp lists closer to
what it used to be earlier.
-rohit
On Wed, 28 Sep 2005, Rohit Seth wrote:
> CONFIG_NUMA needs to be defined for that, and even then it only flushes
> the remote pages. When are the local pcps flushed? Also note that this
> patch just brings the number of free pages on the pcp lists closer to
> what it used to be earlier.
What was the reason for the increase of those numbers?
--On Wednesday, September 28, 2005 13:01:23 -0700 Christoph Lameter <[email protected]> wrote:
> On Wed, 28 Sep 2005, Seth, Rohit wrote:
>
>> Recent changes in page allocation for pcps have increased the high watermark for these lists. This has resulted in scenarios where pcp lists can hold a large number of free pages even under low memory conditions.
>>
>> [PATCH]: Reduce the high mark in cpu's pcp lists.
>
> There is no need for such a patch. The pcp lists are regularly flushed.
> See drain_remote_pages.
That's only retrieving pages which have migrated off-node, is it not?
M.
On Wed, 2005-09-28 at 14:09 -0700, Christoph Lameter wrote:
> On Wed, 28 Sep 2005, Rohit Seth wrote:
>
> > CONFIG_NUMA needs to be defined for that, and even then it only flushes
> > the remote pages. When are the local pcps flushed? Also note that this
> > patch just brings the number of free pages on the pcp lists closer to
> > what it used to be earlier.
>
> What was the reason for the increase of those numbers?
>
Bigger batch size, to possibly get more physically contiguous pages. That
indirectly increased the high watermarks for the pcps.
On Wed, 28 Sep 2005, Martin J. Bligh wrote:
> >> Recent changes in page allocation for pcps have increased the high watermark for these lists. This has resulted in scenarios where pcp lists can hold a large number of free pages even under low memory conditions.
> >>
> >> [PATCH]: Reduce the high mark in cpu's pcp lists.
> >
> > There is no need for such a patch. The pcp lists are regularly flushed.
> > See drain_remote_pages.
>
> That's only retrieving pages which have migrated off-node, is it not?
It's freeing all pages in off-node pcps. There is no page migration
in the current kernels.
On Wed, 28 Sep 2005, Rohit Seth wrote:
> On Wed, 2005-09-28 at 14:09 -0700, Christoph Lameter wrote:
> > On Wed, 28 Sep 2005, Rohit Seth wrote:
> >
> > > CONFIG_NUMA needs to be defined for that, and even then it only flushes
> > > the remote pages. When are the local pcps flushed? Also note that this
> > > patch just brings the number of free pages on the pcp lists closer to
> > > what it used to be earlier.
> >
> > What was the reason for the increase of those numbers?
> Bigger batch size, to possibly get more physically contiguous pages. That
> indirectly increased the high watermarks for the pcps.
I know that Jack and Nick did something with those counts to ensure that
page coloring effects are avoided. Would you comment?
On Wed, 2005-09-28 at 14:56 -0700, Christoph Lameter wrote:
> On Wed, 28 Sep 2005, Rohit Seth wrote:
>
> > On Wed, 2005-09-28 at 14:09 -0700, Christoph Lameter wrote:
> > > On Wed, 28 Sep 2005, Rohit Seth wrote:
> > >
> > > > CONFIG_NUMA needs to be defined for that, and even then it only flushes
> > > > the remote pages. When are the local pcps flushed? Also note that this
> > > > patch just brings the number of free pages on the pcp lists closer to
> > > > what it used to be earlier.
> > >
> > > What was the reason for the increase of those numbers?
> > Bigger batch size, to possibly get more physically contiguous pages. That
> > indirectly increased the high watermarks for the pcps.
>
> I know that Jack and Nick did something with those counts to ensure that
> page coloring effects are avoided. Would you comment?
>
About 10% performance variation was seen from run to run with the original
settings with certain workloads on x86 and IA-64 platforms. This
variation came down to about 2% with the new settings.
-rohit
Christoph Lameter wrote:
>
>I know that Jack and Nick did something with those counts to ensure that
>page coloring effects are avoided. Would you comment?
>
>
The 'batch' argument to setup_pageset should be clamped to a power
of 2 minus 1 (i.e. 15, 31, etc.), which was found to avoid the worst
of the colouring problems.
pcp->high of the hotlist IMO should have been reduced to 4 anyway
after its pcp->low was reduced from 2 to 0.
I don't see that there would be any problems with playing with the
->high and ->low numbers so long as they are a reasonable multiple
of batch, however I would question the merit of setting the high
watermark of the cold queue to ->batch + 1 (should really stay at
2*batch IMO).
Nick
On Thu, 2005-09-29 at 16:49 +1000, Nick Piggin wrote:
> I don't see that there would be any problems with playing with the
> ->high and ->low numbers so long as they are a reasonable multiple
> of batch, however I would question the merit of setting the high
> watermark of the cold queue to ->batch + 1 (should really stay at
> 2*batch IMO).
>
I agree that this watermark is a little low at this point. But that is
mainly because currently we don't have a way to drain the pcps under low
memory conditions. Once I add that support, I will bump up the high
watermarks.
Can you share a list of the specific workloads that you ran earlier while
fixing these numbers?
-rohit