2011-02-09 13:21:25

by Namhyung Kim

Subject: [PATCH] mm: batch-free pcp list if possible

free_pcppages_bulk() frees pages from pcp lists in a round-robin
fashion by keeping batch_free counter. But it doesn't need to spin
if there is only one non-empty list. This can be checked by
batch_free == MIGRATE_PCPTYPES.

Signed-off-by: Namhyung Kim <[email protected]>
---
mm/page_alloc.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a873e61e312e..470fb42e303c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -614,6 +614,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
list = &pcp->lists[migratetype];
} while (list_empty(list));

+ /* This is an only non-empty list. Free them all. */
+ if (batch_free == MIGRATE_PCPTYPES)
+ batch_free = to_free;
+
do {
page = list_entry(list->prev, struct page, lru);
/* must delete as __free_one_page list manipulates */
--
1.7.3.4.600.g982838b0


2011-02-09 14:42:14

by Johannes Weiner

Subject: Re: [PATCH] mm: batch-free pcp list if possible

On Wed, Feb 09, 2011 at 10:21:17PM +0900, Namhyung Kim wrote:
> free_pcppages_bulk() frees pages from pcp lists in a round-robin
> fashion by keeping batch_free counter. But it doesn't need to spin
> if there is only one non-empty list. This can be checked by
> batch_free == MIGRATE_PCPTYPES.
>
> Signed-off-by: Namhyung Kim <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

> ---
> mm/page_alloc.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a873e61e312e..470fb42e303c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -614,6 +614,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> list = &pcp->lists[migratetype];
> } while (list_empty(list));
>
> + /* This is an only non-empty list. Free them all. */
> + if (batch_free == MIGRATE_PCPTYPES)
> + batch_free = to_free;
> +
> do {
> page = list_entry(list->prev, struct page, lru);
> /* must delete as __free_one_page list manipulates */
> --
> 1.7.3.4.600.g982838b0
>

2011-02-09 20:38:35

by Andrew Morton

Subject: Re: [PATCH] mm: batch-free pcp list if possible

On Wed, 9 Feb 2011 22:21:17 +0900
Namhyung Kim <[email protected]> wrote:

> free_pcppages_bulk() frees pages from pcp lists in a round-robin
> fashion by keeping batch_free counter. But it doesn't need to spin
> if there is only one non-empty list. This can be checked by
> batch_free == MIGRATE_PCPTYPES.
>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> mm/page_alloc.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a873e61e312e..470fb42e303c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -614,6 +614,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> list = &pcp->lists[migratetype];
> } while (list_empty(list));
>
> + /* This is an only non-empty list. Free them all. */
> + if (batch_free == MIGRATE_PCPTYPES)
> + batch_free = to_free;
> +
> do {
> page = list_entry(list->prev, struct page, lru);
> /* must delete as __free_one_page list manipulates */

free_pcppages_bulk() hurts my brain.

What is it actually trying to do, and why? It counts up the number of
contiguous empty lists and then frees that number of pages from the
first-encountered non-empty list and then advances onto the next list?

What's the point in that? What relationship does the number of
contiguous empty lists have with the number of pages to free from one
list?

The comment "This is so more pages are freed off fuller lists instead
of spinning excessively around empty lists" makes no sense - the only
way this can be true is if the code knows the number of elements on
each list, and it doesn't know that.

Also, the covering comments over free_pcppages_bulk() regarding the
pages_scanned counter and the "all pages pinned" logic appear to be out
of date. Or, alternatively, those comments do reflect the desired
design, but we broke it.


Methinks that free_pcppages_bulk() is an area ripe for simplification
and clarification.

2011-02-09 21:33:47

by Johannes Weiner

Subject: Re: [PATCH] mm: batch-free pcp list if possible

On Wed, Feb 09, 2011 at 12:38:03PM -0800, Andrew Morton wrote:
> On Wed, 9 Feb 2011 22:21:17 +0900
> Namhyung Kim <[email protected]> wrote:
>
> > free_pcppages_bulk() frees pages from pcp lists in a round-robin
> > fashion by keeping batch_free counter. But it doesn't need to spin
> > if there is only one non-empty list. This can be checked by
> > batch_free == MIGRATE_PCPTYPES.
> >
> > Signed-off-by: Namhyung Kim <[email protected]>
> > ---
> > mm/page_alloc.c | 4 ++++
> > 1 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a873e61e312e..470fb42e303c 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -614,6 +614,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> > list = &pcp->lists[migratetype];
> > } while (list_empty(list));
> >
> > + /* This is an only non-empty list. Free them all. */
> > + if (batch_free == MIGRATE_PCPTYPES)
> > + batch_free = to_free;
> > +
> > do {
> > page = list_entry(list->prev, struct page, lru);
> > /* must delete as __free_one_page list manipulates */
>
> free_pcppages_bulk() hurts my brain.

Thanks for saying that ;-)

> What is it actually trying to do, and why? It counts up the number of
> contiguous empty lists and then frees that number of pages from the
> first-encountered non-empty list and then advances onto the next list?
>
> What's the point in that? What relationship does the number of
> contiguous empty lists have with the number of pages to free from one
> list?

It at least recovers some of the otherwise wasted effort of looking at
an empty list, by flushing more pages once it encounters a non-empty
list. After all, freeing to_free pages is the goal.

That breaks the round-robin fashion, though. If list-1 has pages,
list-2 is empty and list-3 has pages, it will repeatedly free one page
from list-1 and two pages from list-3.
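
To make that pattern concrete, here is a minimal userspace sketch of the current loop. It is not kernel code: each list is modelled as a plain page count, and NR_LISTS, simulate_bulk_free(), pages[] and freed[] are names made up for this example. It also clamps the request to the pages actually available, an assumption added here so the model always terminates.

```c
#include <assert.h>

#define NR_LISTS 3	/* stands in for MIGRATE_PCPTYPES (assumption) */

/*
 * Model of the existing batch_free loop. pages[i] is the number of
 * pages on list i; freed[i] reports how many were taken from it.
 */
static void simulate_bulk_free(int pages[NR_LISTS], int count,
			       int freed[NR_LISTS])
{
	int migratetype = 0;
	int batch_free = 0;
	int total = 0;
	int to_free;
	int i;

	assert(count >= 0);
	for (i = 0; i < NR_LISTS; i++) {
		freed[i] = 0;
		total += pages[i];
	}
	/* Never ask for more than the lists hold. */
	to_free = count < total ? count : total;

	while (to_free > 0) {
		/* Every list visited, empty or not, bumps batch_free. */
		do {
			batch_free++;
			if (++migratetype == NR_LISTS)
				migratetype = 0;
		} while (pages[migratetype] == 0);

		/* Take up to batch_free pages from the list we landed on. */
		do {
			pages[migratetype]--;
			freed[migratetype]++;
		} while (--to_free && --batch_free && pages[migratetype] > 0);
	}
}
```

With pages = {10, 0, 10} and count = 9, freed ends up as {3, 0, 6}: one page per visit from the first list and two per visit from the third.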

My initial response to Namhyung's patch was to write up a version that
used a bitmap for all lists. It starts with all lists set and clears
their respective bit once the list is empty, so it would never
consider them again. But it looked a bit over-engineered for 3 lists
and the resulting object code was bigger than what we have now.
Though, it would be more readable. Attached for reference (untested
and all).

Hannes

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 60e58b0..c77ab28 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -590,8 +590,7 @@ static inline int free_pages_check(struct page *page)
static void free_pcppages_bulk(struct zone *zone, int count,
struct per_cpu_pages *pcp)
{
- int migratetype = 0;
- int batch_free = 0;
+ unsigned long listmap = (1 << MIGRATE_PCPTYPES) - 1;
int to_free = count;

spin_lock(&zone->lock);
@@ -599,31 +598,29 @@ static void free_pcppages_bulk(struct zone *zone, int count,
zone->pages_scanned = 0;

while (to_free) {
- struct page *page;
- struct list_head *list;
-
+ int migratetype;
/*
- * Remove pages from lists in a round-robin fashion. A
- * batch_free count is maintained that is incremented when an
- * empty list is encountered. This is so more pages are freed
- * off fuller lists instead of spinning excessively around empty
- * lists
+ * Remove pages from lists in a round-robin fashion.
+ * Empty lists are excluded from subsequent rounds.
*/
- do {
- batch_free++;
- if (++migratetype == MIGRATE_PCPTYPES)
- migratetype = 0;
- list = &pcp->lists[migratetype];
- } while (list_empty(list));
+ for_each_set_bit (migratetype, &listmap, MIGRATE_PCPTYPES) {
+ struct list_head *list;
+ struct page *page;

- do {
+ list = &pcp->lists[migratetype];
+ if (list_empty(list)) {
+ listmap &= ~(1 << migratetype);
+ continue;
+ }
+ if (!to_free--)
+ break;
page = list_entry(list->prev, struct page, lru);
/* must delete as __free_one_page list manipulates */
list_del(&page->lru);
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, zone, 0, page_private(page));
trace_mm_page_pcpu_drain(page, 0, page_private(page));
- } while (--to_free && --batch_free && !list_empty(list));
+ }
}
__mod_zone_page_state(zone, NR_FREE_PAGES, count);
spin_unlock(&zone->lock);
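
For experimenting with the bitmap idea outside the kernel, a minimal userspace model might look like the following. This is a sketch under assumptions, not the patch above: lists are plain page counts, and NR_LISTS, free_round_robin_bitmap(), pages[] and freed[] are invented for illustration. The model tests the remaining count before taking a page, so the counter cannot drop below zero; that is a choice of this sketch.

```c
#include <assert.h>

#define NR_LISTS 3	/* stands in for MIGRATE_PCPTYPES (assumption) */

/*
 * Round-robin over the lists, one page per list per round. A list
 * found empty has its bit cleared and is skipped in later rounds.
 */
static void free_round_robin_bitmap(int pages[NR_LISTS], int count,
				    int freed[NR_LISTS])
{
	unsigned long listmap = (1UL << NR_LISTS) - 1;
	int to_free = count;
	int i;

	assert(count >= 0);
	for (i = 0; i < NR_LISTS; i++)
		freed[i] = 0;

	while (to_free > 0 && listmap != 0) {
		for (i = 0; i < NR_LISTS && to_free > 0; i++) {
			if (!(listmap & (1UL << i)))
				continue;	/* excluded earlier */
			if (pages[i] == 0) {
				listmap &= ~(1UL << i);
				continue;
			}
			pages[i]--;
			freed[i]++;
			to_free--;
		}
	}
}
```

With pages = {10, 0, 10} and count = 9 this frees {5, 0, 4}, so the two non-empty lists are drained evenly.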

2011-02-09 21:48:57

by Andrew Morton

Subject: Re: [PATCH] mm: batch-free pcp list if possible

On Wed, 9 Feb 2011 22:33:38 +0100
Johannes Weiner <[email protected]> wrote:

> On Wed, Feb 09, 2011 at 12:38:03PM -0800, Andrew Morton wrote:
> > On Wed, 9 Feb 2011 22:21:17 +0900
> > Namhyung Kim <[email protected]> wrote:
> >
> > > free_pcppages_bulk() frees pages from pcp lists in a round-robin
> > > fashion by keeping batch_free counter. But it doesn't need to spin
> > > if there is only one non-empty list. This can be checked by
> > > batch_free == MIGRATE_PCPTYPES.
> > >
> > > Signed-off-by: Namhyung Kim <[email protected]>
> > > ---
> > > mm/page_alloc.c | 4 ++++
> > > 1 files changed, 4 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index a873e61e312e..470fb42e303c 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -614,6 +614,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> > > list = &pcp->lists[migratetype];
> > > } while (list_empty(list));
> > >
> > > + /* This is an only non-empty list. Free them all. */
> > > + if (batch_free == MIGRATE_PCPTYPES)
> > > + batch_free = to_free;
> > > +
> > > do {
> > > page = list_entry(list->prev, struct page, lru);
> > > /* must delete as __free_one_page list manipulates */
> >
> > free_pcppages_bulk() hurts my brain.
>
> Thanks for saying that ;-)

My brain has a lot of scar tissue.

> > What is it actually trying to do, and why? It counts up the number of
> > contiguous empty lists and then frees that number of pages from the
> > first-encountered non-empty list and then advances onto the next list?
> >
> > What's the point in that? What relationship does the number of
> > contiguous empty lists have with the number of pages to free from one
> > list?
>
> It at least recovers some of the otherwise wasted effort of looking at
> an empty list, by flushing more pages once it encounters a non-empty
> list. After all, freeing to_free pages is the goal.
>
> That breaks the round-robin fashion, though. If list-1 has pages,
> list-2 is empty and list-3 has pages, it will repeatedly free one page
> from list-1 and two pages from list-3.
>
> My initial response to Namhyung's patch was to write up a version that
> used a bitmap for all lists. It starts with all lists set and clears
> their respective bit once the list is empty, so it would never
> consider them again. But it looked a bit over-engineered for 3 lists
> and the resulting object code was bigger than what we have now.
> Though, it would be more readable. Attached for reference (untested
> and all).
>
> Hannes
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 60e58b0..c77ab28 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -590,8 +590,7 @@ static inline int free_pages_check(struct page *page)
> static void free_pcppages_bulk(struct zone *zone, int count,
> struct per_cpu_pages *pcp)
> {
> - int migratetype = 0;
> - int batch_free = 0;
> + unsigned long listmap = (1 << MIGRATE_PCPTYPES) - 1;
> int to_free = count;
>
> spin_lock(&zone->lock);
> @@ -599,31 +598,29 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> zone->pages_scanned = 0;
>
> while (to_free) {
> - struct page *page;
> - struct list_head *list;
> -
> + int migratetype;
> /*
> - * Remove pages from lists in a round-robin fashion. A
> - * batch_free count is maintained that is incremented when an
> - * empty list is encountered. This is so more pages are freed
> - * off fuller lists instead of spinning excessively around empty
> - * lists
> + * Remove pages from lists in a round-robin fashion.
> + * Empty lists are excluded from subsequent rounds.
> */
> - do {
> - batch_free++;
> - if (++migratetype == MIGRATE_PCPTYPES)
> - migratetype = 0;
> - list = &pcp->lists[migratetype];
> - } while (list_empty(list));
> + for_each_set_bit (migratetype, &listmap, MIGRATE_PCPTYPES) {
> + struct list_head *list;
> + struct page *page;
>
> - do {
> + list = &pcp->lists[migratetype];
> + if (list_empty(list)) {
> + listmap &= ~(1 << migratetype);
> + continue;
> + }
> + if (!to_free--)
> + break;
> page = list_entry(list->prev, struct page, lru);
> /* must delete as __free_one_page list manipulates */
> list_del(&page->lru);
> /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> __free_one_page(page, zone, 0, page_private(page));
> trace_mm_page_pcpu_drain(page, 0, page_private(page));
> - } while (--to_free && --batch_free && !list_empty(list));
> + }
> }
> __mod_zone_page_state(zone, NR_FREE_PAGES, count);
> spin_unlock(&zone->lock);

Well, it replaces one linear search with another one. If you really
want to avoid repeated walking over empty lists then create a local
array `list_head *lists[MIGRATE_PCPTYPES]' (or MIGRATE_PCPTYPES+1 for
null-termination), populate it on entry and compact it as lists fall
empty. Then the code can simply walk around the lists until to_free is
satisfied or list_empty(lists[0]). It's not obviously worth the effort
though - the empty list_heads will be cache-hot and all the cost will
be in hitting cache-cold pageframes.
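
A rough userspace sketch of that compacting-array idea, under stated assumptions: the array holds list indices rather than list_head pointers, lists are modelled as page counts, and NR_LISTS, free_round_robin_compacted(), pages[] and freed[] are hypothetical names. It only illustrates the walk-and-compact logic, nothing about real cost.

```c
#include <assert.h>

#define NR_LISTS 3	/* stands in for MIGRATE_PCPTYPES (assumption) */

/*
 * Build an array of the non-empty lists on entry, walk around it one
 * page at a time, and squeeze out a list as soon as it falls empty.
 */
static void free_round_robin_compacted(int pages[NR_LISTS], int count,
				       int freed[NR_LISTS])
{
	int active[NR_LISTS];	/* indices of lists that still have pages */
	int nr_active = 0;
	int pos = 0;
	int i;

	assert(count >= 0);
	for (i = 0; i < NR_LISTS; i++) {
		freed[i] = 0;
		if (pages[i] > 0)
			active[nr_active++] = i;
	}

	while (count > 0 && nr_active > 0) {
		int idx = active[pos];

		pages[idx]--;
		freed[idx]++;
		count--;

		if (pages[idx] == 0) {
			/* Compact: shift the later entries down one slot. */
			for (i = pos; i < nr_active - 1; i++)
				active[i] = active[i + 1];
			nr_active--;
		} else {
			pos++;
		}
		if (pos >= nr_active)
			pos = 0;
	}
}
```

With pages = {1, 0, 5} and count = 4, the first list is dropped from the array after its only page goes, and the result is freed = {1, 0, 3}.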


2011-02-09 23:23:13

by Minchan Kim

Subject: Re: [PATCH] mm: batch-free pcp list if possible

On Thu, Feb 10, 2011 at 6:47 AM, Andrew Morton
<[email protected]> wrote:
> On Wed, 9 Feb 2011 22:33:38 +0100
> Johannes Weiner <[email protected]> wrote:
>
>> On Wed, Feb 09, 2011 at 12:38:03PM -0800, Andrew Morton wrote:
>> > On Wed,  9 Feb 2011 22:21:17 +0900
>> > Namhyung Kim <[email protected]> wrote:
>> >
>> > > free_pcppages_bulk() frees pages from pcp lists in a round-robin
>> > > fashion by keeping batch_free counter. But it doesn't need to spin
>> > > if there is only one non-empty list. This can be checked by
>> > > batch_free == MIGRATE_PCPTYPES.
>> > >
>> > > Signed-off-by: Namhyung Kim <[email protected]>
>> > > ---
>> > >  mm/page_alloc.c |    4 ++++
>> > >  1 files changed, 4 insertions(+), 0 deletions(-)
>> > >
>> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> > > index a873e61e312e..470fb42e303c 100644
>> > > --- a/mm/page_alloc.c
>> > > +++ b/mm/page_alloc.c
>> > > @@ -614,6 +614,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>> > >                   list = &pcp->lists[migratetype];
>> > >           } while (list_empty(list));
>> > >
>> > > +         /* This is an only non-empty list. Free them all. */
>> > > +         if (batch_free == MIGRATE_PCPTYPES)
>> > > +                 batch_free = to_free;
>> > > +
>> > >           do {
>> > >                   page = list_entry(list->prev, struct page, lru);
>> > >                   /* must delete as __free_one_page list manipulates */
>> >
>> > free_pcppages_bulk() hurts my brain.
>>
>> Thanks for saying that ;-)
>
> My brain has a lot of scar tissue.
>
>> > What is it actually trying to do, and why?  It counts up the number of
>> > contiguous empty lists and then frees that number of pages from the
>> > first-encountered non-empty list and then advances onto the next list?
>> >
>> > What's the point in that?  What relationship does the number of
>> > contiguous empty lists have with the number of pages to free from one
>> > list?
>>
>> It at least recovers some of the otherwise wasted effort of looking at
>> an empty list, by flushing more pages once it encounters a non-empty
>> list.  After all, freeing to_free pages is the goal.
>>
>> That breaks the round-robin fashion, though.  If list-1 has pages,
>> list-2 is empty and list-3 has pages, it will repeatedly free one page
>> from list-1 and two pages from list-3.
>>
>> My initial response to Namhyung's patch was to write up a version that
>> used a bitmap for all lists.  It starts with all lists set and clears
>> their respective bit once the list is empty, so it would never
>> consider them again.  But it looked a bit over-engineered for 3 lists
>> and the resulting object code was bigger than what we have now.
>> Though, it would be more readable.  Attached for reference (untested
>> and all).
>>
>>       Hannes
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 60e58b0..c77ab28 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -590,8 +590,7 @@ static inline int free_pages_check(struct page *page)
>>  static void free_pcppages_bulk(struct zone *zone, int count,
>>                                       struct per_cpu_pages *pcp)
>>  {
>> -     int migratetype = 0;
>> -     int batch_free = 0;
>> +     unsigned long listmap = (1 << MIGRATE_PCPTYPES) - 1;
>>       int to_free = count;
>>
>>       spin_lock(&zone->lock);
>> @@ -599,31 +598,29 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>       zone->pages_scanned = 0;
>>
>>       while (to_free) {
>> -             struct page *page;
>> -             struct list_head *list;
>> -
>> +             int migratetype;
>>               /*
>> -              * Remove pages from lists in a round-robin fashion. A
>> -              * batch_free count is maintained that is incremented when an
>> -              * empty list is encountered.  This is so more pages are freed
>> -              * off fuller lists instead of spinning excessively around empty
>> -              * lists
>> +              * Remove pages from lists in a round-robin fashion.
>> +              * Empty lists are excluded from subsequent rounds.
>>                */
>> -             do {
>> -                     batch_free++;
>> -                     if (++migratetype == MIGRATE_PCPTYPES)
>> -                             migratetype = 0;
>> -                     list = &pcp->lists[migratetype];
>> -             } while (list_empty(list));
>> +             for_each_set_bit (migratetype, &listmap, MIGRATE_PCPTYPES) {
>> +                     struct list_head *list;
>> +                     struct page *page;
>>
>> -             do {
>> +                     list = &pcp->lists[migratetype];
>> +                     if (list_empty(list)) {
>> +                             listmap &= ~(1 << migratetype);
>> +                             continue;
>> +                     }
>> +                     if (!to_free--)
>> +                             break;
>>                       page = list_entry(list->prev, struct page, lru);
>>                       /* must delete as __free_one_page list manipulates */
>>                       list_del(&page->lru);
>>                       /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
>>                       __free_one_page(page, zone, 0, page_private(page));
>>                       trace_mm_page_pcpu_drain(page, 0, page_private(page));
>> -             } while (--to_free && --batch_free && !list_empty(list));
>> +             }
>>       }
>>       __mod_zone_page_state(zone, NR_FREE_PAGES, count);
>>       spin_unlock(&zone->lock);
>
> Well, it replaces one linear search with another one.  If you really
> want to avoid repeated walking over empty lists then create a local
> array `list_head *lists[MIGRATE_PCPTYPES]' (or MIGRATE_PCPTYPES+1 for
> null-termination), populate it on entry and compact it as lists fall
> empty.  Then the code can simply walk around the lists until to_free is
> satisfied or list_empty(lists[0]).  It's not obviously worth the effort
> though - the empty list_heads will be cache-hot and all the cost will
> be in hitting cache-cold pageframes.

Hannes's patch addresses round-robin fairness as well as skipping
empty lists, although it makes the code rather bloated.
I think fixing the fairness is enough, whether that is done with
Hannes's approach or your idea.




--
Kind regards,
Minchan Kim

2011-02-10 09:36:14

by Mel Gorman

Subject: Re: [PATCH] mm: batch-free pcp list if possible

On Wed, Feb 09, 2011 at 12:38:03PM -0800, Andrew Morton wrote:
> On Wed, 9 Feb 2011 22:21:17 +0900
> Namhyung Kim <[email protected]> wrote:
>
> > free_pcppages_bulk() frees pages from pcp lists in a round-robin
> > fashion by keeping batch_free counter. But it doesn't need to spin
> > if there is only one non-empty list. This can be checked by
> > batch_free == MIGRATE_PCPTYPES.
> >
> > Signed-off-by: Namhyung Kim <[email protected]>
> > ---
> > mm/page_alloc.c | 4 ++++
> > 1 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a873e61e312e..470fb42e303c 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -614,6 +614,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> > list = &pcp->lists[migratetype];
> > } while (list_empty(list));
> >
> > + /* This is an only non-empty list. Free them all. */
> > + if (batch_free == MIGRATE_PCPTYPES)
> > + batch_free = to_free;
> > +
> > do {
> > page = list_entry(list->prev, struct page, lru);
> > /* must delete as __free_one_page list manipulates */
>
> free_pcppages_bulk() hurts my brain.
>

I vaguely recall trying to make it easier to understand. Each attempt
made it easier to read, but slower. At the time there were complaints
about the overhead of the page allocator so making it slower was not an
option. "Overhead" was what oprofile reported as the time spent in each
function.

> What is it actually trying to do, and why? It counts up the number of
> contiguous empty lists and then frees that number of pages from the
> first-encountered non-empty list and then advances onto the next list?
>

Yes. This is potentially unfair because lists for one migratetype can get
drained more heavily than others. However, checking empty lists was showing up as
a reasonably significant cost according to profiles for allocator-intensive
workloads. I *think* the workload I was using was netperf-based.

> What's the point in that? What relationship does the number of
> contiguous empty lists have with the number of pages to free from one
> list?
>

The point is to avoid excessive checking of empty lists. There is no
relationship between the number of empty lists and the size of the next
list. The size of the lists is related to the workload and the resulting
allocator/free pattern.

> The comment "This is so more pages are freed off fuller lists instead
> of spinning excessively around empty lists" makes no sense - the only
> way this can be true is if the code knows the number of elements on
> each list, and it doesn't know that.
>

batch_free gets preserved if a list empties so if batch_free was 2 but
there was only 1 page on the next list, more pages are taken off a
larger list. We know what the total size of all the lists is, so there
are always pages to find. You're right in that we don't know the size of
individual lists because space in the pcp structure is tight.

> Also, the covering comments over free_pcppages_bulk() regarding the
> pages_scanned counter and the "all pages pinned" logic appear to be out
> of date. Or, alternatively, those comments do reflect the desired
> design, but we broke it.
>

This comment is really old.... heh, you introduced it back in 2.5.49
apparently.

The comment is referring to the clearing of all_unreclaimable. By clearing it,
kswapd will scan that zone again and set all_unreclaimable back if necessary
and that is still valid.

More importantly, if there is another process in direct reclaim and it failed
to reclaim any pages, the clearing of all_unreclaimable will avoid the direct
reclaimer entering OOM.

The comment could be better but it doesn't look wrong, just not
particularly helpful.

> Methinks that free_pcppages_bulk() is an area ripe for simplification
> and clarification.
>

Probably, but any patch that simplifies it needs to be accompanied by
profiles of an allocator-intensive workload showing it's not worse as a result.

--
Mel Gorman

2011-02-10 21:01:12

by Andrew Morton

Subject: Re: [PATCH] mm: batch-free pcp list if possible

On Thu, 10 Feb 2011 09:35:44 +0000
Mel Gorman <[email protected]> wrote:

> > What's the point in that? What relationship does the number of
> > contiguous empty lists have with the number of pages to free from one
> > list?
> >
>
> The point is to avoid excessive checking of empty lists.

It seems pretty simple to me to skip the testing of empty lists
altogether. I suggested one way, however I suspect a better approach
might be to maintain a count of the number of pages in each list and
then change free_pcppages_bulk() so that it calculates up-front the
number of pages to free from each list (equal proportion of each) then
sits in a tight loop freeing that number of pages.

It might be that the overhead of maintaining the per-list count makes
that not worthwhile. It'll be hard to tell because the count
maintenance cost will be smeared all over the place.
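
For what it's worth, the up-front calculation could be sketched roughly as below. This is a userspace illustration only and assumes per-list counts already exist; NR_LISTS, plan_proportional_free(), pages[] and plan[] are hypothetical names, and handing the rounding remainder to the lowest-numbered lists is an arbitrary choice.

```c
#include <assert.h>

#define NR_LISTS 3	/* stands in for MIGRATE_PCPTYPES (assumption) */

/*
 * Given per-list page counts, decide how many pages to take from each
 * list so that every list gives up roughly the same proportion.
 */
static void plan_proportional_free(const int pages[NR_LISTS], int count,
				   int plan[NR_LISTS])
{
	int total = 0;
	int planned = 0;
	int i;

	assert(count >= 0);
	for (i = 0; i < NR_LISTS; i++) {
		plan[i] = 0;
		total += pages[i];
	}
	if (total == 0)
		return;
	if (count > total)
		count = total;

	/* Proportional share, rounded down. */
	for (i = 0; i < NR_LISTS; i++) {
		plan[i] = (int)((long long)pages[i] * count / total);
		planned += plan[i];
	}
	/* Hand out what rounding lost, one page per list with room. */
	for (i = 0; i < NR_LISTS && planned < count; i++) {
		if (plan[i] < pages[i]) {
			plan[i]++;
			planned++;
		}
	}
}
```

With pages = {6, 3, 1} and count = 5 the plan is {4, 1, 0}; the freeing loop would then just take plan[i] pages off each list in turn.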

I doubt if any of it matters much, compared to the cost of allocating,
populating and freeing a page. I just want free_pcppages_bulk() to
stop hurting my brain ;)