2010-07-08 07:38:19

by KOSAKI Motohiro

Subject: [PATCH v2 1/2] vmscan: avoid subtraction of unsigned values

'slab_reclaimable' and 'nr_pages' are unsigned, so the subtraction can
underflow and wrap around.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
---
mm/vmscan.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c7e57c..8715da1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2588,7 +2588,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
.swappiness = vm_swappiness,
.order = order,
};
- unsigned long slab_reclaimable;
+ unsigned long n, m;

disable_swap_token();
cond_resched();
@@ -2615,8 +2615,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
}

- slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
- if (slab_reclaimable > zone->min_slab_pages) {
+ n = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+ if (n > zone->min_slab_pages) {
/*
* shrink_slab() does not currently allow us to determine how
* many pages were freed in this zone. So we take the current
@@ -2628,16 +2628,16 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
* take a long time.
*/
while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
- zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
- slab_reclaimable - nr_pages)
+ (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages > n))
;

/*
* Update nr_reclaimed by the number of slab pages we
* reclaimed from this zone.
*/
- sc.nr_reclaimed += slab_reclaimable -
- zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+ m = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+ if (m < n)
+ sc.nr_reclaimed += n - m;
}

p->reclaim_state = NULL;
--
1.6.5.2



2010-07-08 07:40:56

by KOSAKI Motohiro

Subject: [PATCH v2 2/2] vmscan: shrink_slab() requires number of lru_pages, not page order

Fix a simple argument error. 'order' is usually a much smaller value than
lru_pages, so passing it causes unnecessary icache dropping.

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/vmscan.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8715da1..60d813b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2617,6 +2617,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)

n = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
if (n > zone->min_slab_pages) {
+ unsigned long lru_pages = zone_reclaimable_pages(zone);
+
/*
* shrink_slab() does not currently allow us to determine how
* many pages were freed in this zone. So we take the current
@@ -2627,7 +2629,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
* Note that shrink_slab will free memory on all zones and may
* take a long time.
*/
- while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
+ while (shrink_slab(sc.nr_scanned, gfp_mask, lru_pages) &&
(zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages > n))
;

--
1.6.5.2


2010-07-08 07:41:44

by KOSAKI Motohiro

Subject: Re: [PATCH v2 1/2] vmscan: avoid subtraction of unsigned values

Oops, sorry. I forgot to cc Christoph.
Resending.


> 'slab_reclaimable' and 'nr_pages' are unsigned, so the subtraction can
> underflow and wrap around.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Acked-by: Christoph Lameter <[email protected]>
> ---
> mm/vmscan.c | 14 +++++++-------
> 1 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9c7e57c..8715da1 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2588,7 +2588,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> .swappiness = vm_swappiness,
> .order = order,
> };
> - unsigned long slab_reclaimable;
> + unsigned long n, m;
>
> disable_swap_token();
> cond_resched();
> @@ -2615,8 +2615,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> } while (priority >= 0 && sc.nr_reclaimed < nr_pages);
> }
>
> - slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> - if (slab_reclaimable > zone->min_slab_pages) {
> + n = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> + if (n > zone->min_slab_pages) {
> /*
> * shrink_slab() does not currently allow us to determine how
> * many pages were freed in this zone. So we take the current
> @@ -2628,16 +2628,16 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> * take a long time.
> */
> while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
> - zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
> - slab_reclaimable - nr_pages)
> + (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages > n))
> ;
>
> /*
> * Update nr_reclaimed by the number of slab pages we
> * reclaimed from this zone.
> */
> - sc.nr_reclaimed += slab_reclaimable -
> - zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> + m = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> + if (m < n)
> + sc.nr_reclaimed += n - m;
> }
>
> p->reclaim_state = NULL;
> --
> 1.6.5.2
>
>
>


2010-07-08 13:23:47

by Rik van Riel

Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() requires number of lru_pages, not page order

On 07/08/2010 03:40 AM, KOSAKI Motohiro wrote:
> Fix a simple argument error. 'order' is usually a much smaller value than
> lru_pages, so passing it causes unnecessary icache dropping.
>
> Signed-off-by: KOSAKI Motohiro<[email protected]>

Acked-by: Rik van Riel <[email protected]>

--
All rights reversed

2010-07-08 14:05:40

by Christoph Lameter

Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() requires number of lru_pages, not page order

On Thu, 8 Jul 2010, KOSAKI Motohiro wrote:

> Fix a simple argument error. 'order' is usually a much smaller value than
> lru_pages, so passing it causes unnecessary icache dropping.

AFAICT this is not an argument error; someone changed the naming of the
parameter. The "lru_pages" parameter is really a division factor affecting
the number of pages scanned. This patch increases this division factor
significantly and therefore reduces the number of items scanned during
zone_reclaim.

2010-07-08 20:02:17

by Andrew Morton

Subject: Re: [PATCH v2 1/2] vmscan: avoid subtraction of unsigned values

On Thu, 8 Jul 2010 16:38:10 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> 'slab_reclaimable' and 'nr_pages' are unsigned, so the subtraction can
> underflow and wrap around.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Acked-by: Christoph Lameter <[email protected]>
> ---
> mm/vmscan.c | 14 +++++++-------
> 1 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9c7e57c..8715da1 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2588,7 +2588,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> .swappiness = vm_swappiness,
> .order = order,
> };
> - unsigned long slab_reclaimable;
> + unsigned long n, m;

Please use better identifiers.

> disable_swap_token();
> cond_resched();
> @@ -2615,8 +2615,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> } while (priority >= 0 && sc.nr_reclaimed < nr_pages);
> }
>
> - slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> - if (slab_reclaimable > zone->min_slab_pages) {
> + n = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> + if (n > zone->min_slab_pages) {
> /*
> * shrink_slab() does not currently allow us to determine how
> * many pages were freed in this zone. So we take the current
> @@ -2628,16 +2628,16 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> * take a long time.
> */
> while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
> - zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
> - slab_reclaimable - nr_pages)
> + (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages > n))
> ;
>
> /*
> * Update nr_reclaimed by the number of slab pages we
> * reclaimed from this zone.
> */
> - sc.nr_reclaimed += slab_reclaimable -
> - zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> + m = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> + if (m < n)
> + sc.nr_reclaimed += n - m;

And it's not a completely trivial objection. Your patch made the above
code snippet quite a lot harder to read (and hence harder to maintain).

2010-07-08 20:33:06

by Andrew Morton

Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() requires number of lru_pages, not page order

On Thu, 8 Jul 2010 09:04:18 -0500 (CDT)
Christoph Lameter <[email protected]> wrote:

> On Thu, 8 Jul 2010, KOSAKI Motohiro wrote:
>
> > Fix a simple argument error. 'order' is usually a much smaller value than
> > lru_pages, so passing it causes unnecessary icache dropping.
>
> AFAICT this is not an argument error; someone changed the naming of the
> parameter.

It's been there since day zero:

: commit 2a16e3f4b0c408b9e50297d2ec27e295d490267a
: Author: Christoph Lameter <[email protected]>
: AuthorDate: Wed Feb 1 03:05:35 2006 -0800
: Commit: Linus Torvalds <[email protected]>
: CommitDate: Wed Feb 1 08:53:16 2006 -0800
:
: [PATCH] Reclaim slab during zone reclaim
:
: If large amounts of zone memory are used by empty slabs then zone_reclaim
: becomes uneffective. This patch shakes the slab a bit.
:
: The problem with this patch is that the slab reclaim is not containable to a
: zone. Thus slab reclaim may affect the whole system and be extremely slow.
: This also means that we cannot determine how many pages were freed in this
: zone. Thus we need to go off node for at least one allocation.
:
: The functionality is disabled by default.
:
: We could modify the shrinkers to take a zone parameter but that would be quite
: invasive. Better ideas are welcome.
:
: Signed-off-by: Christoph Lameter <[email protected]>
: Signed-off-by: Andrew Morton <[email protected]>
: Signed-off-by: Linus Torvalds <[email protected]>
:
: diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
: index 4bca2a3..a46c10f 100644
: --- a/Documentation/sysctl/vm.txt
: +++ b/Documentation/sysctl/vm.txt
: @@ -137,6 +137,7 @@ This is value ORed together of
: 1 = Zone reclaim on
: 2 = Zone reclaim writes dirty pages out
: 4 = Zone reclaim swaps pages
: +8 = Also do a global slab reclaim pass
:
: zone_reclaim_mode is set during bootup to 1 if it is determined that pages
: from remote zones will cause a measurable performance reduction. The
: @@ -160,6 +161,11 @@ Allowing regular swap effectively restricts allocations to the local
: node unless explicitly overridden by memory policies or cpuset
: configurations.
:
: +It may be advisable to allow slab reclaim if the system makes heavy
: +use of files and builds up large slab caches. However, the slab
: +shrink operation is global, may take a long time and free slabs
: +in all nodes of the system.
: +
: ================================================================
:
: zone_reclaim_interval:
: diff --git a/mm/vmscan.c b/mm/vmscan.c
: index 9e2ef36..aa4b80d 100644
: --- a/mm/vmscan.c
: +++ b/mm/vmscan.c
: @@ -1596,6 +1596,7 @@ int zone_reclaim_mode __read_mostly;
: #define RECLAIM_ZONE (1<<0) /* Run shrink_cache on the zone */
: #define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
: #define RECLAIM_SWAP (1<<2) /* Swap pages out during reclaim */
: +#define RECLAIM_SLAB (1<<3) /* Do a global slab shrink if the zone is out of memory */
:
: /*
: * Mininum time between zone reclaim scans
: @@ -1666,6 +1667,19 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
:
: } while (sc.nr_reclaimed < nr_pages && sc.priority > 0);
:
: + if (sc.nr_reclaimed < nr_pages && (zone_reclaim_mode & RECLAIM_SLAB)) {
: + /*
: + * shrink_slab does not currently allow us to determine
: + * how many pages were freed in the zone. So we just
: + * shake the slab and then go offnode for a single allocation.
: + *
: + * shrink_slab will free memory on all zones and may take
: + * a long time.
: + */
: + shrink_slab(sc.nr_scanned, gfp_mask, order);
: + sc.nr_reclaimed = 1; /* Avoid getting the off node timeout */
: + }
: +
: p->reclaim_state = NULL;
: current->flags &= ~PF_MEMALLOC;

> The "lru_pages" parameter is really a division factor affecting
> the number of pages scanned. This patch increases this division factor
> significantly and therefore reduces the number of items scanned during
> zone_reclaim.
>

And for that reason I won't apply the patch. I'd be crazy to do so.
It tosses away four years testing, replacing it with something which
could have a large effect on reclaim behaviour, with no indication
whether that effect is good or bad.

2010-07-08 21:02:46

by Christoph Lameter

Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() requires number of lru_pages, not page order

On Thu, 8 Jul 2010, Andrew Morton wrote:

> > AFAICT this is not an argument error; someone changed the naming of the
> > parameter.
>
> It's been there since day zero:
>
> : commit 2a16e3f4b0c408b9e50297d2ec27e295d490267a
> : Author: Christoph Lameter <[email protected]>
> : AuthorDate: Wed Feb 1 03:05:35 2006 -0800
> : Commit: Linus Torvalds <[email protected]>
> : CommitDate: Wed Feb 1 08:53:16 2006 -0800
> :
> : [PATCH] Reclaim slab during zone reclaim

That only shows that the order parameter was passed to shrink_slab() which
is what I remember being done intentionally.

Vaguely recall that this was necessary to avoid shrink_slab() throwing out
too many pages for higher order allocs.

Initially zone_reclaim was full of heuristics that later were replaced by
counter as the new ZVCs became available and gradually better ways of
accounting for pages became possible.


2010-07-09 00:46:36

by KOSAKI Motohiro

Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() requires number of lru_pages, not page order

> On Thu, 8 Jul 2010, Andrew Morton wrote:
>
> > > AFAICT this is not an argument error; someone changed the naming of the
> > > parameter.
> >
> > It's been there since day zero:
> >
> > : commit 2a16e3f4b0c408b9e50297d2ec27e295d490267a
> > : Author: Christoph Lameter <[email protected]>
> > : AuthorDate: Wed Feb 1 03:05:35 2006 -0800
> > : Commit: Linus Torvalds <[email protected]>
> > : CommitDate: Wed Feb 1 08:53:16 2006 -0800
> > :
> > : [PATCH] Reclaim slab during zone reclaim
>
> That only shows that the order parameter was passed to shrink_slab() which
> is what I remember being done intentionally.
>
> Vaguely recall that this was necessary to avoid shrink_slab() throwing out
> too many pages for higher order allocs.

But it has the opposite effect. The number of objects scanned by shrink_slab() is:

                           lru_scanned            max_pass
  basic_scan_objects = 4 x ------------- x -----------------------------
                            lru_pages      shrinker->seeks (default:2)

  scan_objects = min(basic_scan_objects, max_pass * 2)


That is, a small lru_pages parameter causes far too much slab to be dropped.
In practice, zone_reclaim always ends up taking max_pass * 2. For inodes,
shrink_icache_memory() reports the number of unused inodes as max_pass,
which means a single shrink_slab() call drops the whole icache. Is this the
intended behavior? Why?
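
To make the arithmetic easy to check, here is a rough, compilable restatement of
the equation above (just a sketch, not the literal shrink_slab() source; the
function name and the lru_pages == 0 guard are mine):

	/* Sketch: per-shrinker scan count as described by the equation above. */
	static unsigned long scan_objects(unsigned long lru_scanned,
					  unsigned long lru_pages,
					  unsigned long max_pass,
					  unsigned int seeks)
	{
		unsigned long long basic;

		if (lru_pages == 0)	/* guard, only so this illustration is well defined */
			lru_pages = 1;

		/* lru_pages is a divisor: the smaller it is, the more objects we scan */
		basic = 4ULL * lru_scanned * max_pass;
		basic /= (unsigned long long)lru_pages * seeks;

		/* capped at twice the number of objects the shrinker reports */
		if (basic > 2ULL * max_pass)
			basic = 2ULL * max_pass;

		return (unsigned long)basic;
	}

With lru_pages == order == 0 (or any tiny value) the divisor vanishes, so the
max_pass * 2 cap is hit on essentially every call.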

Am I missing something?

> Initially zone_reclaim was full of heuristics that later were replaced by
> counter as the new ZVCs became available and gradually better ways of
> accounting for pages became possible.

2010-07-09 01:16:37

by KOSAKI Motohiro

Subject: Re: [PATCH v2 1/2] vmscan: avoid subtraction of unsigned values


> > @@ -2628,16 +2628,16 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> > * take a long time.
> > */
> > while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
> > - zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
> > - slab_reclaimable - nr_pages)
> > + (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages > n))
> > ;
> >
> > /*
> > * Update nr_reclaimed by the number of slab pages we
> > * reclaimed from this zone.
> > */
> > - sc.nr_reclaimed += slab_reclaimable -
> > - zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> > + m = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> > + if (m < n)
> > + sc.nr_reclaimed += n - m;
>
> And it's not a completely trivial objection. Your patch made the above
> code snippet quite a lot harder to read (and hence harder to maintain).

Initially, I proposed the following patch to Christoph, but he preferred n and m.
To be honest, I don't think the naming is a big deal, so if you prefer the
version below I'll submit it.




=====================================================================
From 397199d69860061eaa5e1aaadac45c46c76b0522 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <[email protected]>
Date: Wed, 30 Jun 2010 13:35:16 +0900
Subject: [PATCH] vmscan: avoid subtraction of unsigned values

'slab_reclaimable' and 'nr_pages' are unsigned, so the subtraction can
underflow and wrap around.

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/vmscan.c | 15 ++++++++-------
1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c7e57c..79ff877 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2588,7 +2588,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
.swappiness = vm_swappiness,
.order = order,
};
- unsigned long slab_reclaimable;
+ unsigned long nr_slab_pages0, nr_slab_pages1;

disable_swap_token();
cond_resched();
@@ -2615,8 +2615,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
}

- slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
- if (slab_reclaimable > zone->min_slab_pages) {
+ nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+ if (nr_slab_pages0 > zone->min_slab_pages) {
/*
* shrink_slab() does not currently allow us to determine how
* many pages were freed in this zone. So we take the current
@@ -2628,16 +2628,17 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
* take a long time.
*/
while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
- zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
- slab_reclaimable - nr_pages)
+ (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
+ nr_slab_pages0))
;

/*
* Update nr_reclaimed by the number of slab pages we
* reclaimed from this zone.
*/
- sc.nr_reclaimed += slab_reclaimable -
- zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+ nr_slab_pages1 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+ if (nr_slab_pages1 < nr_slab_pages0)
+ sc.nr_reclaimed += nr_slab_pages0 - nr_slab_pages1;
}

p->reclaim_state = NULL;
--
1.6.5.2






2010-07-09 01:46:36

by Minchan Kim

Subject: Re: [PATCH v2 1/2] vmscan: avoid subtraction of unsigned values

On Fri, Jul 9, 2010 at 10:16 AM, KOSAKI Motohiro
<[email protected]> wrote:
>
>> > @@ -2628,16 +2628,16 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>> >              * take a long time.
>> >              */
>> >             while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
>> > -                   zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
>> > -                           slab_reclaimable - nr_pages)
>> > +                  (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages > n))
>> >                     ;
>> >
>> >             /*
>> >              * Update nr_reclaimed by the number of slab pages we
>> >              * reclaimed from this zone.
>> >              */
>> > -           sc.nr_reclaimed += slab_reclaimable -
>> > -                   zone_page_state(zone, NR_SLAB_RECLAIMABLE);
>> > +           m = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
>> > +           if (m < n)
>> > +                   sc.nr_reclaimed += n - m;
>>
>> And it's not a completely trivial objection. Your patch made the above
>> code snippet quite a lot harder to read (and hence harder to maintain).
>
> Initially, I proposed following patch to Christoph. but he prefer n and m.
> To be honest, I don't think this naming is big matter. so you prefer following
> I'll submit it.
>
>
>
>
> =====================================================================
> From 397199d69860061eaa5e1aaadac45c46c76b0522 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <[email protected]>
> Date: Wed, 30 Jun 2010 13:35:16 +0900
> Subject: [PATCH] vmscan: avoid subtraction of unsigned values
>
> 'slab_reclaimable' and 'nr_pages' are unsigned, so the subtraction can
> underflow and wrap around.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

I like this better than n, m.

--
Kind regards,
Minchan Kim

2010-07-09 08:21:18

by KOSAKI Motohiro

Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() requires number of lru_pages, not page order

> > The "lru_pages" parameter is really a division factor affecting
> > the number of pages scanned. This patch increases this division factor
> > significantly and therefore reduces the number of items scanned during
> > zone_reclaim.
> >
>
> And for that reason I won't apply the patch. I'd be crazy to do so.
> It tosses away four years testing, replacing it with something which
> could have a large effect on reclaim behaviour, with no indication
> whether that effect is good or bad.

Unfortunately, the current code is even crazier; it is close to worst-case behavior.
I applied the attached debugging patch and got the following result.

In other words, really low memory pressure (scan = 32, order = 0) drops _all_
icache and dcache (about 100MB!). I don't believe anyone has tested slab
behavior with zone_reclaim=1 during these four years.

Please remember the shrink_slab() scanning equation: order=0 always causes
all slab objects to be dropped!

Btw, the current shrink_slab() doesn't exit even when (*shrinker->shrink)(0) returns 0.
That is an inefficient and meaningless loop iteration. I'll make a fix.
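
Putting rough numbers on it (max_pass taken from the first trace line below,
scan and order from the zone_reclaim line at the end; this is back-of-the-envelope
arithmetic, not instrumented output):

    lru_scanned = 32, lru_pages = order = 0, max_pass = 56100 unused inodes, seeks = 2

    basic_scan_objects = 4 x 32 x 56100 / (0 x 2)  -> effectively unbounded (divisor ~zero)
    scan_objects       = min(unbounded, 56100 x 2) = 112200

So one zone_reclaim pass asks the inode shrinker to scan twice the entire
unused-inode list, which is why the trace below ends at before=0 after=0.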


<...>-2677 [001] 840.832711: shrink_slab: shrink_icache_memory before=56100 after=56000
<...>-2677 [001] 840.832898: shrink_slab: shrink_icache_memory before=56000 after=55900
<...>-2677 [001] 840.833081: shrink_slab: shrink_icache_memory before=55900 after=55800
<...>-2677 [001] 840.833693: shrink_slab: shrink_icache_memory before=55800 after=55600
<...>-2677 [001] 840.833860: shrink_slab: shrink_icache_memory before=55600 after=55500
<...>-2677 [001] 840.834033: shrink_slab: shrink_icache_memory before=55500 after=55400
<...>-2677 [001] 840.834201: shrink_slab: shrink_icache_memory before=55400 after=55200
<...>-2677 [001] 840.834385: shrink_slab: shrink_icache_memory before=55200 after=55100
<...>-2677 [001] 840.834553: shrink_slab: shrink_icache_memory before=55100 after=55000
<...>-2677 [001] 840.834720: shrink_slab: shrink_icache_memory before=55000 after=54900
<...>-2677 [001] 840.834889: shrink_slab: shrink_icache_memory before=54900 after=54700
<...>-2677 [001] 840.835063: shrink_slab: shrink_icache_memory before=54700 after=54600
<...>-2677 [001] 840.835229: shrink_slab: shrink_icache_memory before=54600 after=54500
<...>-2677 [001] 840.835596: shrink_slab: shrink_icache_memory before=54500 after=54300
<...>-2677 [001] 840.835761: shrink_slab: shrink_icache_memory before=54300 after=54200
<...>-2677 [001] 840.835926: shrink_slab: shrink_icache_memory before=54200 after=54100
<...>-2677 [001] 840.836097: shrink_slab: shrink_icache_memory before=54100 after=54000
<...>-2677 [001] 840.836284: shrink_slab: shrink_icache_memory before=54000 after=53800
<...>-2677 [001] 840.836453: shrink_slab: shrink_icache_memory before=53800 after=53700
<...>-2677 [001] 840.836621: shrink_slab: shrink_icache_memory before=53700 after=53600
<...>-2677 [001] 840.836793: shrink_slab: shrink_icache_memory before=53600 after=53500
<...>-2677 [001] 840.836962: shrink_slab: shrink_icache_memory before=53500 after=53300
<...>-2677 [001] 840.837137: shrink_slab: shrink_icache_memory before=53300 after=53200
<...>-2677 [001] 840.837317: shrink_slab: shrink_icache_memory before=53200 after=53100
<...>-2677 [001] 840.837485: shrink_slab: shrink_icache_memory before=53100 after=52900
<...>-2677 [001] 840.837652: shrink_slab: shrink_icache_memory before=52900 after=52800
<...>-2677 [001] 840.837821: shrink_slab: shrink_icache_memory before=52800 after=52700
<...>-2677 [001] 840.837993: shrink_slab: shrink_icache_memory before=52700 after=52600
<...>-2677 [001] 840.838168: shrink_slab: shrink_icache_memory before=52600 after=52400
<...>-2677 [001] 840.838353: shrink_slab: shrink_icache_memory before=52400 after=52300
<...>-2677 [001] 840.838524: shrink_slab: shrink_icache_memory before=52300 after=52200
<...>-2677 [001] 840.838695: shrink_slab: shrink_icache_memory before=52200 after=52000
<...>-2677 [001] 840.838865: shrink_slab: shrink_icache_memory before=52000 after=51900
<...>-2677 [001] 840.839037: shrink_slab: shrink_icache_memory before=51900 after=51800
<...>-2677 [001] 840.839207: shrink_slab: shrink_icache_memory before=51800 after=51700
<...>-2677 [001] 840.839422: shrink_slab: shrink_icache_memory before=51700 after=51500
<...>-2677 [001] 840.839589: shrink_slab: shrink_icache_memory before=51500 after=51400
<...>-2677 [001] 840.839753: shrink_slab: shrink_icache_memory before=51400 after=51300
<...>-2677 [001] 840.839920: shrink_slab: shrink_icache_memory before=51300 after=51100
<...>-2677 [001] 840.840094: shrink_slab: shrink_icache_memory before=51100 after=51000
<...>-2677 [001] 840.840278: shrink_slab: shrink_icache_memory before=51000 after=50900
<...>-2677 [001] 840.840446: shrink_slab: shrink_icache_memory before=50900 after=50800
<...>-2677 [001] 840.840618: shrink_slab: shrink_icache_memory before=50800 after=50600
<...>-2677 [001] 840.840787: shrink_slab: shrink_icache_memory before=50600 after=50500
<...>-2677 [001] 840.840953: shrink_slab: shrink_icache_memory before=50500 after=50400
<...>-2677 [001] 840.841128: shrink_slab: shrink_icache_memory before=50400 after=50300
<...>-2677 [001] 840.841310: shrink_slab: shrink_icache_memory before=50300 after=50100
<...>-2677 [001] 840.841480: shrink_slab: shrink_icache_memory before=50100 after=50000
<...>-2677 [001] 840.841649: shrink_slab: shrink_icache_memory before=50000 after=49900
<...>-2677 [001] 840.841815: shrink_slab: shrink_icache_memory before=49900 after=49700
<...>-2677 [001] 840.841984: shrink_slab: shrink_icache_memory before=49700 after=49600
<...>-2677 [001] 840.842159: shrink_slab: shrink_icache_memory before=49600 after=49500
<...>-2677 [001] 840.842346: shrink_slab: shrink_icache_memory before=49500 after=49400
<...>-2677 [001] 840.842515: shrink_slab: shrink_icache_memory before=49400 after=49200
<...>-2677 [001] 840.842684: shrink_slab: shrink_icache_memory before=49200 after=49100
<...>-2677 [001] 840.842864: shrink_slab: shrink_icache_memory before=49100 after=49000
<...>-2677 [001] 840.843039: shrink_slab: shrink_icache_memory before=49000 after=48800
<...>-2677 [001] 840.843205: shrink_slab: shrink_icache_memory before=48800 after=48700
<...>-2677 [001] 840.843391: shrink_slab: shrink_icache_memory before=48700 after=48600
<...>-2677 [001] 840.843560: shrink_slab: shrink_icache_memory before=48600 after=48500
<...>-2677 [001] 840.843735: shrink_slab: shrink_icache_memory before=48500 after=48300
<...>-2677 [001] 840.844964: shrink_slab: shrink_icache_memory before=48300 after=48200
<...>-2677 [001] 840.845242: shrink_slab: shrink_icache_memory before=48200 after=48100
<...>-2677 [001] 840.845411: shrink_slab: shrink_icache_memory before=48100 after=47900
<...>-2677 [001] 840.845581: shrink_slab: shrink_icache_memory before=47900 after=47800
<...>-2677 [001] 840.845752: shrink_slab: shrink_icache_memory before=47800 after=47700
<...>-2677 [001] 840.845920: shrink_slab: shrink_icache_memory before=47700 after=47600
<...>-2677 [001] 840.860766: shrink_slab: shrink_icache_memory before=47600 after=47400
<...>-2677 [001] 840.860949: shrink_slab: shrink_icache_memory before=47400 after=47300
<...>-2677 [001] 840.861118: shrink_slab: shrink_icache_memory before=47300 after=47200
<...>-2677 [001] 840.861306: shrink_slab: shrink_icache_memory before=47200 after=47100
<...>-2677 [001] 840.861476: shrink_slab: shrink_icache_memory before=47100 after=46900
<...>-2677 [001] 840.861646: shrink_slab: shrink_icache_memory before=46900 after=46800
<...>-2677 [001] 840.861817: shrink_slab: shrink_icache_memory before=46800 after=46700
<...>-2677 [001] 840.861986: shrink_slab: shrink_icache_memory before=46700 after=46500
<...>-2677 [001] 840.862159: shrink_slab: shrink_icache_memory before=46500 after=46400
<...>-2677 [001] 840.862438: shrink_slab: shrink_icache_memory before=46400 after=46300
<...>-2677 [001] 840.862626: shrink_slab: shrink_icache_memory before=46300 after=46200
<...>-2677 [001] 840.862796: shrink_slab: shrink_icache_memory before=46200 after=46000
<...>-2677 [001] 840.862963: shrink_slab: shrink_icache_memory before=46000 after=45900
<...>-2677 [001] 840.863148: shrink_slab: shrink_icache_memory before=45900 after=45800
<...>-2677 [001] 840.863323: shrink_slab: shrink_icache_memory before=45800 after=45600
<...>-2677 [001] 840.863492: shrink_slab: shrink_icache_memory before=45600 after=45500
<...>-2677 [001] 840.863656: shrink_slab: shrink_icache_memory before=45500 after=45400
<...>-2677 [001] 840.863821: shrink_slab: shrink_icache_memory before=45400 after=45300
<...>-2677 [001] 840.863988: shrink_slab: shrink_icache_memory before=45300 after=45100
CPU:1 [LOST 702 EVENTS]
<...>-2677 [001] 840.928410: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928411: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928412: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928414: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928415: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928416: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928418: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928419: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928420: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928422: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928423: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928425: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928426: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928427: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928429: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928430: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928432: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928433: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928434: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928436: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928437: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928438: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928440: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928441: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928443: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928444: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928445: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928447: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928448: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928449: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928451: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928452: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928454: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928455: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928456: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928458: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928459: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928460: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928462: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928463: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928464: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928466: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928467: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928468: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928470: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928471: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928473: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928474: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928475: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928477: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928478: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928479: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928481: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928482: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928483: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928485: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928486: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928487: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928489: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928490: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928492: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928493: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928494: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928496: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928497: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928498: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928500: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928501: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928502: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928504: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928505: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928506: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928508: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928509: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928510: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928512: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928513: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928515: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928516: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928518: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928519: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928520: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928522: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928523: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928524: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928526: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928527: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928529: shrink_slab: shrink_icache_memory before=0 after=0
<...>-2677 [001] 840.928532: zone_reclaim: scan 32, order 0, old 39546 new 16605





diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5a377e6..5767a08 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -244,6 +244,12 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,

nr_before = (*shrinker->shrink)(0, gfp_mask);
shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
+
+ trace_printk("%pf before=%d after=%d\n",
+ shrinker->shrink,
+ nr_before,
+ shrink_ret);
+
if (shrink_ret == -1)
break;
if (shrink_ret < nr_before)
@@ -2619,10 +2625,23 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
* Note that shrink_slab will free memory on all zones and may
* take a long time.
*/
- while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
- zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
- slab_reclaimable - nr_pages)
- ;
+ for (;;) {
+ unsigned long slab;
+
+ if (!shrink_slab(sc.nr_scanned, gfp_mask, order))
+ break;
+
+ slab = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+
+ trace_printk("scan %lu, order %d, old %lu new %lu\n",
+ sc.nr_scanned,
+ order,
+ slab_reclaimable,
+ slab);
+
+ if (slab + nr_pages <= slab_reclaimable)
+ break;
+ }

2010-07-09 08:36:10

by Minchan Kim

Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() requires number of lru_pages, not page order

On Thu, Jul 8, 2010 at 4:40 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Fix a simple argument error. 'order' is usually a much smaller value than
> lru_pages, so passing it causes unnecessary icache dropping.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

With your test result, this patch makes sense to me.
Please include your test result in the description.

--
Kind regards,
Minchan Kim

2010-07-09 10:14:04

by KOSAKI Motohiro

Subject: [PATCH] vmscan: stop meaningless loop iteration when no reclaimable slab

If the number of reclaimable slab objects is zero, shrink_icache_memory() and
shrink_dcache_memory() return 0, but strangely shrink_slab() ignores that
and continues its meaningless loop iteration.

This patch fixes it.

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/vmscan.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0f9f624..8f61adb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -243,6 +243,11 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
int nr_before;

nr_before = (*shrinker->shrink)(0, gfp_mask);
+ /* no slab objects, no more reclaim. */
+ if (nr_before == 0) {
+ total_scan = 0;
+ break;
+ }
shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
if (shrink_ret == -1)
break;
--
1.6.5.2


2010-07-09 10:53:15

by Minchan Kim

Subject: Re: [PATCH] vmscan: stop meaningless loop iteration when no reclaimable slab

On Fri, Jul 9, 2010 at 7:13 PM, KOSAKI Motohiro
<[email protected]> wrote:
> If the number of reclaimable slab objects is zero, shrink_icache_memory() and
> shrink_dcache_memory() return 0, but strangely shrink_slab() ignores that
> and continues its meaningless loop iteration.
>
> This patch fixes it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
>  mm/vmscan.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 0f9f624..8f61adb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -243,6 +243,11 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
>                        int nr_before;
>
>                        nr_before = (*shrinker->shrink)(0, gfp_mask);
> +                       /* no slab objects, no more reclaim. */
> +                       if (nr_before == 0) {
> +                               total_scan = 0;

Why do you reset total_scan to 0?
I don't know the exact meaning of shrinker->nr.
AFAIU, it can affect the next shrinker's total_scan.
Isn't it harmful?

--
Kind regards,
Minchan Kim

2010-07-09 11:04:43

by KOSAKI Motohiro

Subject: Re: [PATCH] vmscan: stop meaningless loop iteration when no reclaimable slab

> On Fri, Jul 9, 2010 at 7:13 PM, KOSAKI Motohiro
> <[email protected]> wrote:
> > If the number of reclaimable slab objects is zero, shrink_icache_memory() and
> > shrink_dcache_memory() return 0, but strangely shrink_slab() ignores that
> > and continues its meaningless loop iteration.
> >
> > This patch fixes it.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> > ---
> >  mm/vmscan.c |    5 +++++
> >  1 files changed, 5 insertions(+), 0 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 0f9f624..8f61adb 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -243,6 +243,11 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
> >                        int nr_before;
> >
> >                        nr_before = (*shrinker->shrink)(0, gfp_mask);
> > +                       /* no slab objects, no more reclaim. */
> > +                       if (nr_before == 0) {
> > +                               total_scan = 0;
>
> Why do you reset total_scan to 0?

If slab objects are zero, we don't need any more reclaim.

> I don't know exact meaning of shrinker->nr.

Its meaning is similar to reclaim_stat->nr_saved_scan.
If total_scan is not divisible by SHRINK_BATCH (128), the remainder is saved and used at the next shrink_slab().

> AFAIU, it can affect next shrinker's total_scan.
> Isn't it harmful?

No. This loop is

                total_scan = shrinker->nr;              /* Reset and init total_scan */
                shrinker->nr = 0;

                while (total_scan >= SHRINK_BATCH) {
                        nr_before = (*shrinker->shrink)(0, gfp_mask);
                        /* no slab objects, no more reclaim. */
                        if (nr_before == 0) {
                                total_scan = 0;
                                break;
                        }
                        shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
                        if (shrink_ret == -1)
                                break;
                        if (shrink_ret < nr_before)
                                ret += nr_before - shrink_ret;
                        total_scan -= this_scan;
                }

                shrinker->nr += total_scan;             /* save remainder #of-scan */
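
As an illustration of the remainder handling (hypothetical numbers, not taken
from any real trace): suppose total_scan starts at 300 for a shrinker.

                total_scan = 300
                1st pass: this_scan = 128, total_scan -> 172
                2nd pass: this_scan = 128, total_scan -> 44
                44 < SHRINK_BATCH, the loop exits, shrinker->nr = 44 (carried to the next call)

With my patch, if the first (*shrinker->shrink)(0, ...) call reports zero
objects, that carry is discarded instead of being kept.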


2010-07-09 14:03:24

by Christoph Lameter

Subject: Re: [PATCH] vmscan: stop meaningless loop iteration when no reclaimable slab

On Fri, 9 Jul 2010, KOSAKI Motohiro wrote:

> If the number of reclaimable slab objects is zero, shrink_icache_memory() and
> shrink_dcache_memory() return 0, but strangely shrink_slab() ignores that
> and continues its meaningless loop iteration.

There is also a per zone/node/global counter SLAB_RECLAIM_ACCOUNT that
could be used to determine if it's worth looking at things at all. I saw
some effort going into making the shrinkers zone aware. If so then we may
be able to avoid scanning slabs.

2010-07-09 22:29:48

by Andrew Morton

Subject: Re: [PATCH v2 1/2] vmscan: avoid subtraction of unsigned values

On Fri, 9 Jul 2010 10:16:33 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2588,7 +2588,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> .swappiness = vm_swappiness,
> .order = order,
> };
> - unsigned long slab_reclaimable;
> + unsigned long nr_slab_pages0, nr_slab_pages1;
>
> disable_swap_token();
> cond_resched();
> @@ -2615,8 +2615,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> } while (priority >= 0 && sc.nr_reclaimed < nr_pages);
> }
>
> - slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> - if (slab_reclaimable > zone->min_slab_pages) {
> + nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> + if (nr_slab_pages0 > zone->min_slab_pages) {
> /*
> * shrink_slab() does not currently allow us to determine how
> * many pages were freed in this zone.

Well no, but it could do so, with some minor changes to struct
reclaim_state and its handling. Put a zone* and a counter in
reclaim_state, handle them in sl?b.c.

> So we take the current
> @@ -2628,16 +2628,17 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> * take a long time.
> */
> while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
> - zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
> - slab_reclaimable - nr_pages)
> + (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
> + nr_slab_pages0))
> ;
>
> /*
> * Update nr_reclaimed by the number of slab pages we
> * reclaimed from this zone.
> */
> - sc.nr_reclaimed += slab_reclaimable -
> - zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> + nr_slab_pages1 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> + if (nr_slab_pages1 < nr_slab_pages0)
> + sc.nr_reclaimed += nr_slab_pages0 - nr_slab_pages1;

My, that's horrible. The whole expression says "this number is
basically a pile of random junk. Let's add it in anyway".


> }
>
> p->reclaim_state = NULL;

2010-07-11 22:28:12

by Minchan Kim

Subject: Re: [PATCH] vmscan: stop meaningless loop iteration when no reclaimable slab

On Fri, Jul 9, 2010 at 8:04 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> On Fri, Jul 9, 2010 at 7:13 PM, KOSAKI Motohiro
>> <[email protected]> wrote:
>> > If the number of reclaimable slab objects is zero, shrink_icache_memory() and
>> > shrink_dcache_memory() return 0, but strangely shrink_slab() ignores that
>> > and continues its meaningless loop iteration.
>> >
>> > This patch fixes it.
>> >
>> > Signed-off-by: KOSAKI Motohiro <[email protected]>
>> > ---
>> >  mm/vmscan.c |    5 +++++
>> >  1 files changed, 5 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index 0f9f624..8f61adb 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -243,6 +243,11 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
>> >                        int nr_before;
>> >
>> >                        nr_before = (*shrinker->shrink)(0, gfp_mask);
>> > +                       /* no slab objects, no more reclaim. */
>> > +                       if (nr_before == 0) {
>> > +                               total_scan = 0;
>>
>> Why do you reset total_scan to 0?
>
> If slab objects are zero, we don't need any more reclaim.
>
>> I don't know exact meaning of shrinker->nr.
>
> Its meaning is similar to reclaim_stat->nr_saved_scan.
> If total_scan is not divisible by SHRINK_BATCH (128), the remainder is saved and used at the next shrink_slab().
>
>> AFAIU, it can affect next shrinker's total_scan.
>> Isn't it harmful?
>
> No. This loop is
>
>                total_scan = shrinker->nr;              /* Reset and init total_scan */
>                shrinker->nr = 0;
>
>                while (total_scan >= SHRINK_BATCH) {
>                        nr_before = (*shrinker->shrink)(0, gfp_mask);
>                        /* no slab objects, no more reclaim. */
>                        if (nr_before == 0) {
>                                total_scan = 0;
>                                break;
>                        }
>                        shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
>                        if (shrink_ret == -1)
>                                break;
>                        if (shrink_ret < nr_before)
>                                ret += nr_before - shrink_ret;
>                        total_scan -= this_scan;
>                }
>
>                shrinker->nr += total_scan;             /* save remainder #of-scan */
>
>
I can't understand your point.


old shrink_slab

        shrinker->nr += delta; /* Add delta to previous shrinker's remained count */
        total_scan = shrinker->nr;

        while (total_scan >= SHRINK_BATCH) {
                nr_before = shrink(xxx);
                total_scan -= this_scan;
        }

        shrinker->nr += total_scan;

The leftover total_scan is always a number < SHRINK_BATCH.
So, when the loop count for this shrinker is calculated next time, that number can affect it.

new shrink_slab

        shrinker->nr += delta; /* nr is always zero by your patch */
        total_scan = shrinker->nr;

        while (total_scan >= SHRINK_BATCH) {
                nr_before = shrink(xxx);
                if (nr_before == 0) {
                        total_scan = 0;
                        break;
                }
        }

        shrinker->nr += 0;

But after your patch, total_scan is always zero. It never affects
the next shrinker's loop count.

Am I missing something?
--
Kind regards,
Minchan Kim

2010-07-13 04:48:34

by KOSAKI Motohiro

Subject: Re: [PATCH] vmscan: stop meaningless loop iteration when no reclaimable slab

Hi

>
> old shrink_slab
>
> shrinker->nr += delta; /* Add delta to previous shrinker's remained count */
> total_scan = shrinker->nr;
>
> while(total_scan >= SHRINK_BATCH) {
> nr_before = shrink(xxx);
> total_scan =- this_scan;
> }
>
> shrinker->nr += total_scan;
>
> The total_scan can always be the number < SHRINK_BATCH.
> So, when next shrinker calcuates loop count, the number can affect.

Correct.


>
> new shrink_slab
>
> shrinker->nr += delta; /* nr is always zero by your patch */

No.
My patch doesn't change the delta calculation at all.


> total_scan = shrinker->nr;
>
> while(total_scan >= SHRINK_BATCH) {
> nr_before = shrink(xxx);
> if (nr_before == 0) {
> total_scan = 0;
> break;
> }
> }
>
> shrinker->nr += 0;
>
> But after your patch, total_scan is always zero. It never affect
> next shrinker's loop count.

No. After my patch this loop has two ways of exiting:
 1) total_scan is less than SHRINK_BATCH.
    -> no behavior change. We still reach the shrinker->nr += total_scan code.
 2) (*shrinker->shrink)(0, gfp_mask) returns 0.
    We don't increase shrinker->nr, for two reasons:
    a) if total_scan is, say, 10000, we shouldn't carry over such a big number.
    b) we now have zero slab objects, so we are freed from the duty of keeping
       page and slab reclaim in balance; shrinker->nr += 0 has zero side effect.

Thanks.

>
> Am I missing something?
> --
> Kind regards,
> Minchan Kim


2010-07-13 04:59:19

by KOSAKI Motohiro

Subject: Re: [PATCH] vmscan: stop meaningless loop iteration when no reclaimable slab

> On Fri, 9 Jul 2010, KOSAKI Motohiro wrote:
>
> > If the number of reclaimable slab objects is zero, shrink_icache_memory() and
> > shrink_dcache_memory() return 0, but strangely shrink_slab() ignores that
> > and continues its meaningless loop iteration.
>
> There is also a per zone/node/global counter SLAB_RECLAIM_ACCOUNT that
> could be used to determine if its worth looking at things at all. I saw
> some effort going into making the shrinkers zone aware. If so then we may
> be able to avoid scanning slabs.

Yup.
Once Nick's effort is merged, we can make further improvements, I bet :)



2010-07-13 05:41:32

by KOSAKI Motohiro

Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() requires number of lru_pages, not page order

> On Thu, Jul 8, 2010 at 4:40 PM, KOSAKI Motohiro
> <[email protected]> wrote:
> > Fix a simple argument error. 'order' is usually a much smaller value than
> > lru_pages, so passing it causes unnecessary icache dropping.
> >
> > Signed-off-by: KOSAKI Motohiro <[email protected]>
> Reviewed-by: Minchan Kim <[email protected]>
>
> With your test result, this patch makes sense to me.
> Please include your test result in the description.

How's this?

==============================================================
From 19872d74875e2331cbe7eca46c8ef65f5f00d7c4 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <[email protected]>
Date: Tue, 13 Jul 2010 13:39:11 +0900
Subject: [PATCH] vmscan: shrink_slab() requires number of lru_pages, not page order

Now, shrink_slab() uses the following scanning equation:

                           lru_scanned            max_pass
  basic_scan_objects = 4 x ------------- x -----------------------------
                            lru_pages      shrinker->seeks (default:2)

  scan_objects = min(basic_scan_objects, max_pass * 2)

So if we pass a very small value as lru_pages instead of the real number of
lru pages, shrink_slab() drops far more objects than necessary, and
currently __zone_reclaim() passes 'order' as lru_pages by mistake. That
produces bad results.

For example, if we receive very low memory pressure (scan = 32, order = 0),
shrink_slab() via zone_reclaim() always drops _all_ icache/dcache
objects (see the equation above: a very small lru_pages makes a very big
scan_objects result).

This patch fixes it.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Acked-by: Rik van Riel <[email protected]>
---
mm/vmscan.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6ff51c0..1bf9f72 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2612,6 +2612,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)

nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
if (nr_slab_pages0 > zone->min_slab_pages) {
+ unsigned long lru_pages = zone_reclaimable_pages(zone);
+
/*
* shrink_slab() does not currently allow us to determine how
* many pages were freed in this zone. So we take the current
@@ -2622,7 +2624,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
* Note that shrink_slab will free memory on all zones and may
* take a long time.
*/
- while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
+ while (shrink_slab(sc.nr_scanned, gfp_mask, lru_pages) &&
(zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
nr_slab_pages0))
;
--
1.6.5.2



2010-07-13 06:33:59

by Minchan Kim

Subject: Re: [PATCH] vmscan: stop meaningless loop iteration when no reclaimable slab

On Tue, Jul 13, 2010 at 1:48 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Hi
>
>>
>> old shrink_slab
>>
>> shrinker->nr += delta; /* Add delta to previous shrinker's remained count */
>> total_scan = shrinker->nr;
>>
>> while(total_scan >= SHRINK_BATCH) {
>>       nr_before = shrink(xxx);
>>       total_scan -= this_scan;
>> }
>>
>> shrinker->nr += total_scan;
>>
>> The total_scan can always be the number < SHRINK_BATCH.
>> So, when next shrinker calcuates loop count, the number can affect.
>
> Correct.
>
>
>>
>> new shrink_slab
>>
>> shrinker->nr += delta; /* nr is always zero by your patch */
>
> no.
> my patch don't change delta calculation at all.
>
>
>> total_scan = shrinker->nr;
>>
>> while(total_scan >= SHRINK_BATCH) {
>>       nr_before = shrink(xxx);
>>       if (nr_before == 0) {
>>               total_scan = 0;
>>               break;
>>       }
>> }
>>
>> shrinker->nr += 0;
>>
>> But after your patch, total_scan is always zero. It never affect
>> next shrinker's loop count.
>
> No. After my patch this loop has two ways of exiting:
>  1) total_scan is less than SHRINK_BATCH.
>     -> no behavior change. We still reach the shrinker->nr += total_scan code.
>  2) (*shrinker->shrink)(0, gfp_mask) returns 0.
>     We don't increase shrinker->nr, for two reasons:
>     a) if total_scan is, say, 10000, we shouldn't carry over such a big number.
>     b) we now have zero slab objects, so we are freed from the duty of keeping
>        page and slab reclaim in balance; shrinker->nr += 0 has zero side effect.

I totally agree with you.
Thanks for the good explanation, Kosaki.

Reviewed-by: Minchan kim <[email protected]>


--
Kind regards,
Minchan Kim

2010-07-13 09:32:27

by KOSAKI Motohiro

Subject: Re: [PATCH v2 1/2] vmscan: avoid subtraction of unsigned values

> On Fri, 9 Jul 2010 10:16:33 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2588,7 +2588,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> > .swappiness = vm_swappiness,
> > .order = order,
> > };
> > - unsigned long slab_reclaimable;
> > + unsigned long nr_slab_pages0, nr_slab_pages1;
> >
> > disable_swap_token();
> > cond_resched();
> > @@ -2615,8 +2615,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> > } while (priority >= 0 && sc.nr_reclaimed < nr_pages);
> > }
> >
> > - slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> > - if (slab_reclaimable > zone->min_slab_pages) {
> > + nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> > + if (nr_slab_pages0 > zone->min_slab_pages) {
> > /*
> > * shrink_slab() does not currently allow us to determine how
> > * many pages were freed in this zone.
>
> Well no, but it could do so, with some minor changes to struct
> reclaim_state and its handling. Put a zone* and a counter in
> reclaim_state, handle them in sl?b.c.
>
> > So we take the current
> > @@ -2628,16 +2628,17 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> > * take a long time.
> > */
> > while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
> > - zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
> > - slab_reclaimable - nr_pages)
> > + (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
> > + nr_slab_pages0))
> > ;
> >
> > /*
> > * Update nr_reclaimed by the number of slab pages we
> > * reclaimed from this zone.
> > */
> > - sc.nr_reclaimed += slab_reclaimable -
> > - zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> > + nr_slab_pages1 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> > + if (nr_slab_pages1 < nr_slab_pages0)
> > + sc.nr_reclaimed += nr_slab_pages0 - nr_slab_pages1;
>
> My, that's horrible. The whole expression says "this number is
> basically a pile of random junk. Let's add it in anyway".
>
>
> > }
> >
> > p->reclaim_state = NULL;


How's this?

Christoph, can we hear your opinion about adding a new branch to the slab-free path?
I think this is OK, because reclaim causes a lot of cache misses, so a branch
misprediction is a relatively minor penalty. Thoughts?



From 9f7d7a9bd836b7373ade3056e6a3d2a3d82ac7ce Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <[email protected]>
Date: Tue, 13 Jul 2010 14:43:21 +0900
Subject: [PATCH] vmscan: count reclaimed slab pages properly

Andrew Morton pointed out that __zone_reclaim() shouldn't compare the old and new
zone_page_state(NR_SLAB_RECLAIMABLE) results. Instead, it has to account for the
number of freed slab pages by enhancing reclaim_state.

This patch does that.

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/linux/swap.h | 3 ++-
mm/slab.c | 4 +++-
mm/slob.c | 4 +++-
mm/slub.c | 7 +++++--
mm/vmscan.c | 44 ++++++++++++++++----------------------------
5 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index ff4acea..b8d3f33 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -107,7 +107,8 @@ typedef struct {
* memory reclaim
*/
struct reclaim_state {
- unsigned long reclaimed_slab;
+ unsigned long reclaimed_slab;
+ struct zone *zone;
};

#ifdef __KERNEL__
diff --git a/mm/slab.c b/mm/slab.c
index 4e9c46f..aac9306 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1741,7 +1741,9 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
page++;
}
if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += nr_freed;
+ if (!current->reclaim_state->zone ||
+ current->reclaim_state->zone == page_zone(page))
+ current->reclaim_state->reclaimed_slab += nr_freed;
free_pages((unsigned long)addr, cachep->gfporder);
}

diff --git a/mm/slob.c b/mm/slob.c
index 3f19a34..192d05c 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -260,7 +260,9 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
static void slob_free_pages(void *b, int order)
{
if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += 1 << order;
+ if (!current->reclaim_state->zone ||
+ current->reclaim_state->zone == page_zone(virt_to_page(b)))
+ current->reclaim_state->reclaimed_slab += 1 << order;
free_pages((unsigned long)b, order);
}

diff --git a/mm/slub.c b/mm/slub.c
index 7bb7940..f510b14 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1204,8 +1204,11 @@ static void __free_slab(struct kmem_cache *s, struct page *page)

__ClearPageSlab(page);
reset_page_mapcount(page);
- if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += pages;
+ if (current->reclaim_state) {
+ if (!current->reclaim_state->zone ||
+ current->reclaim_state->zone == page_zone(page))
+ current->reclaim_state->reclaimed_slab += pages;
+ }
__free_pages(page, order);
}

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1bf9f72..8faef0c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2571,7 +2571,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
/* Minimum pages needed in order to stay on node */
const unsigned long nr_pages = 1 << order;
struct task_struct *p = current;
- struct reclaim_state reclaim_state;
int priority;
struct scan_control sc = {
.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
@@ -2583,8 +2582,10 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
.swappiness = vm_swappiness,
.order = order,
};
- unsigned long nr_slab_pages0, nr_slab_pages1;
-
+ struct reclaim_state reclaim_state = {
+ .reclaimed_slab = 0,
+ .zone = zone,
+ };

cond_resched();
/*
@@ -2594,7 +2595,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
*/
p->flags |= PF_MEMALLOC | PF_SWAPWRITE;
lockdep_set_current_reclaim_state(gfp_mask);
- reclaim_state.reclaimed_slab = 0;
p->reclaim_state = &reclaim_state;

if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages) {
@@ -2610,34 +2610,22 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
}

- nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
- if (nr_slab_pages0 > zone->min_slab_pages) {
+ if (zone_page_state(zone, NR_SLAB_RECLAIMABLE) > zone->min_slab_pages) {
unsigned long lru_pages = zone_reclaimable_pages(zone);

- /*
- * shrink_slab() does not currently allow us to determine how
- * many pages were freed in this zone. So we take the current
- * number of slab pages and shake the slab until it is reduced
- * by the same nr_pages that we used for reclaiming unmapped
- * pages.
- *
- * Note that shrink_slab will free memory on all zones and may
- * take a long time.
- */
- while (shrink_slab(sc.nr_scanned, gfp_mask, lru_pages) &&
- (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
- nr_slab_pages0))
- ;
-
- /*
- * Update nr_reclaimed by the number of slab pages we
- * reclaimed from this zone.
- */
- nr_slab_pages1 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
- if (nr_slab_pages1 < nr_slab_pages0)
- sc.nr_reclaimed += nr_slab_pages0 - nr_slab_pages1;
+ for (;;) {
+ /*
+ * Note that shrink_slab will free memory on all zones
+ * and may take a long time.
+ */
+ if (!shrink_slab(sc.nr_scanned, gfp_mask, lru_pages))
+ break;
+ if (reclaim_state.reclaimed_slab >= nr_pages)
+ break;
+ }
}

+ sc.nr_reclaimed += reclaim_state.reclaimed_slab;
p->reclaim_state = NULL;
current->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE);
lockdep_clear_current_reclaim_state();
--
1.6.5.2





2010-07-14 01:51:40

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] vmscan: don't subtraction of unsined

On Tue, 13 Jul 2010, KOSAKI Motohiro wrote:

> Christoph, can we hear your opinion about adding a new branch to the slab-free path?
> I think this is OK, because reclaim causes a lot of cache misses, so a branch
> misprediction is a relatively minor penalty. Thoughts?

It's on the slow path, so I would think that should be okay. But is this
really necessary? Working with the per zone slab reclaim counters is not
enough? We are adding counter after counter that have similar purposes and
the handling gets more complex.

Maybe we can get rid of the code in the slabs instead by just relying on
the difference of the zone counters?
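
As a concrete illustration of relying on the zone-counter difference, here is
a tiny standalone sketch (not kernel code; the counter value and
shrink_slab_model() are invented stand-ins). It credits reclaimed slab pages
purely from the drop in NR_SLAB_RECLAIMABLE, with the unsigned subtraction
guarded the same way the v2 patch does, and with no extra bookkeeping in
sl?b.c:

#include <stdio.h>

/* stand-in for zone_page_state(zone, NR_SLAB_RECLAIMABLE) */
static unsigned long nr_slab_reclaimable = 1000;

static void shrink_slab_model(void)
{
	/* pretend the shrinkers freed 300 reclaimable slab pages here */
	nr_slab_reclaimable -= 300;
}

int main(void)
{
	unsigned long nr_reclaimed = 0;
	unsigned long before = nr_slab_reclaimable;
	unsigned long after;

	shrink_slab_model();
	after = nr_slab_reclaimable;

	/* only credit a decrease; avoids wrapping the unsigned difference */
	if (after < before)
		nr_reclaimed += before - after;

	printf("slab pages credited to reclaim: %lu\n", nr_reclaimed);
	return 0;
}

The trade-off raised earlier in the thread still applies: the counter covers
the whole zone, so concurrent slab allocations can hide some of the pages
that were actually freed.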

2010-07-14 02:15:37

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] vmscan: don't subtraction of unsined

> On Tue, 13 Jul 2010, KOSAKI Motohiro wrote:
>
> > Christoph, can we hear your opinion about adding a new branch to the slab-free path?
> > I think this is OK, because reclaim causes a lot of cache misses, so a branch
> > misprediction is a relatively minor penalty. Thoughts?
>
> It's on the slow path, so I would think that should be okay. But is this
> really necessary? Working with the per zone slab reclaim counters is not
> enough? We are adding counter after counter that have similar purposes and
> the handling gets more complex.
>
> Maybe we can get rid of the code in the slabs instead by just relying on
> the difference of the zone counters?

Okay, I agree. I'm putting this work on hold for now, and I'll (probably) resume it
after Nick's work is merged.

Thanks.



2010-07-15 19:16:40

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() require number of lru_pages, not page order

On Tue, 13 Jul 2010 14:41:28 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Now, shrink_slab() uses the following scanning equation.
>
>                             lru_scanned             max_pass
>   basic_scan_objects = 4 x ------------- x -----------------------------
>                              lru_pages      shrinker->seeks (default:2)
>
> scan_objects = min(basic_scan_objects, max_pass * 2)
>
> Then, if we pass a very small value as lru_pages instead of the real number of
> lru pages, shrink_slab() drops many more objects than necessary. And currently,
> __zone_reclaim() passes 'order' as lru_pages by mistake, which produces bad
> results.
>
> For example, if we receive very low memory pressure (scan = 32, order = 0),
> shrink_slab() via zone_reclaim() always drops _all_ icache/dcache
> objects (see the above equation: a very small lru_pages makes for a very big
> scan_objects result).
>
> This patch fixes it.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Reviewed-by: Minchan Kim <[email protected]>
> Acked-by: Christoph Lameter <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> ---
> mm/vmscan.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6ff51c0..1bf9f72 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2612,6 +2612,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>
> nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> if (nr_slab_pages0 > zone->min_slab_pages) {
> + unsigned long lru_pages = zone_reclaimable_pages(zone);
> +
> /*
> * shrink_slab() does not currently allow us to determine how
> * many pages were freed in this zone. So we take the current
> @@ -2622,7 +2624,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> * Note that shrink_slab will free memory on all zones and may
> * take a long time.
> */
> - while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
> + while (shrink_slab(sc.nr_scanned, gfp_mask, lru_pages) &&
> (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
> nr_slab_pages0))
> ;

Wouldn't it be better to recalculate zone_reclaimable_pages() each time
around the loop? For example, shrink_icache_memory()->prune_icache()
will remove pagecache from an inode if it hits the tail of the list.
This can change the number of reclaimable pages by squigabytes, but
this code thinks nothing changed?
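
To put numbers on the effect being discussed, here is a standalone sketch of
the scanning equation quoted above (not kernel code; every value is invented
for illustration):

#include <stdio.h>

int main(void)
{
	unsigned long lru_scanned = 32;		/* sc.nr_scanned under light pressure */
	unsigned long max_pass = 100000;	/* reclaimable dcache/icache objects */
	unsigned long seeks = 2;		/* shrinker->seeks default */

	unsigned long tiny_lru = 1;		/* order (0) passed as lru_pages; divisor treated as 1 */
	unsigned long real_lru = 500000;	/* actual reclaimable LRU pages in the zone */

	/* basic_scan_objects = 4 x (lru_scanned / lru_pages) x (max_pass / seeks) */
	unsigned long scan_buggy = 4UL * lru_scanned * max_pass / (tiny_lru * seeks);
	unsigned long scan_fixed = 4UL * lru_scanned * max_pass / (real_lru * seeks);

	/* the result is also clamped to max_pass * 2, as in the quoted equation */
	if (scan_buggy > max_pass * 2)
		scan_buggy = max_pass * 2;
	if (scan_fixed > max_pass * 2)
		scan_fixed = max_pass * 2;

	printf("order as lru_pages: scan %lu objects (the whole cache, twice over)\n",
	       scan_buggy);
	printf("real lru_pages:     scan %lu objects\n", scan_fixed);
	return 0;
}

With 'order' as the divisor the result is limited only by the max_pass * 2
cap, so the whole dcache/icache becomes eligible, while the real lru_pages
yields a scan of roughly a dozen objects under the same light pressure.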

2010-07-16 01:40:05

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() require number of lru_pages, not page order

> > nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> > if (nr_slab_pages0 > zone->min_slab_pages) {
> > + unsigned long lru_pages = zone_reclaimable_pages(zone);
> > +
> > /*
> > * shrink_slab() does not currently allow us to determine how
> > * many pages were freed in this zone. So we take the current
> > @@ -2622,7 +2624,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> > * Note that shrink_slab will free memory on all zones and may
> > * take a long time.
> > */
> > - while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
> > + while (shrink_slab(sc.nr_scanned, gfp_mask, lru_pages) &&
> > (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
> > nr_slab_pages0))
> > ;
>
> Wouldn't it be better to recalculate zone_reclaimable_pages() each time
> around the loop? For example, shrink_icache_memory()->prune_icache()
> will remove pagecache from an inode if it hits the tail of the list.
> This can change the number of reclaimable pages by squigabytes, but
> this code thinks nothing changed?

Ah, I missed this. An incremental patch is here.

thank you!



From 8f7c70cfb4a25f8292a59564db6c3ff425a69b53 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <[email protected]>
Date: Fri, 16 Jul 2010 08:40:01 +0900
Subject: [PATCH] vmscan: recalculate lru_pages on each shrink_slab()

Andrew Morton pointed out that shrink_slab() may change the number of reclaimable
pages (e.g. shrink_icache_memory()->prune_icache() will remove unmapped
pagecache).

So, we need to recalculate lru_pages on each shrink_slab() call.
This patch fixes that.

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/vmscan.c | 18 ++++++++++++------
1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1bf9f72..1da9b14 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2612,8 +2612,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)

nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
if (nr_slab_pages0 > zone->min_slab_pages) {
- unsigned long lru_pages = zone_reclaimable_pages(zone);
-
/*
* shrink_slab() does not currently allow us to determine how
* many pages were freed in this zone. So we take the current
@@ -2624,10 +2622,18 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
* Note that shrink_slab will free memory on all zones and may
* take a long time.
*/
- while (shrink_slab(sc.nr_scanned, gfp_mask, lru_pages) &&
- (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
- nr_slab_pages0))
- ;
+ for (;;) {
+ unsigned long lru_pages = zone_reclaimable_pages(zone);
+
+ /* No reclaimable slab or very low memory pressure */
+ if (!shrink_slab(sc.nr_scanned, gfp_mask, lru_pages))
+ break;
+
+ /* Freed enough memory */
+ nr_slab_pages1 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+ if (nr_slab_pages1 + nr_pages <= nr_slab_pages0)
+ break;
+ }

/*
* Update nr_reclaimed by the number of slab pages we
--
1.6.5.2


2010-07-16 01:44:27

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] vmscan: shrink_slab() require number of lru_pages, not page order

On Fri, Jul 16, 2010 at 10:39 AM, KOSAKI Motohiro
<[email protected]> wrote:
>> >     nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
>> >     if (nr_slab_pages0 > zone->min_slab_pages) {
>> > +           unsigned long lru_pages = zone_reclaimable_pages(zone);
>> > +
>> >             /*
>> >              * shrink_slab() does not currently allow us to determine how
>> >              * many pages were freed in this zone. So we take the current
>> > @@ -2622,7 +2624,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>> >              * Note that shrink_slab will free memory on all zones and may
>> >              * take a long time.
>> >              */
>> > -           while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
>> > +           while (shrink_slab(sc.nr_scanned, gfp_mask, lru_pages) &&
>> >                    (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
>> >                     nr_slab_pages0))
>> >                     ;
>>
>> Wouldn't it be better to recalculate zone_reclaimable_pages() each time
>> around the loop? ?For example, shrink_icache_memory()->prune_icache()
>> will remove pagecache from an inode if it hits the tail of the list.
>> This can change the number of reclaimable pages by squigabytes, but
>> this code thinks nothing changed?
>
> Ah, I missed this. An incremental patch is here.
>
> thank you!
>
>
>
> From 8f7c70cfb4a25f8292a59564db6c3ff425a69b53 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <[email protected]>
> Date: Fri, 16 Jul 2010 08:40:01 +0900
> Subject: [PATCH] vmscan: recalculate lru_pages on each shrink_slab()
>
> Andrew Morton pointed out that shrink_slab() may change the number of reclaimable
> pages (e.g. shrink_icache_memory()->prune_icache() will remove unmapped
> pagecache).
>
> So, we need to recalculate lru_pages on each shrink_slab() call.
> This patch fixes that.
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

It does make sense.

--
Kind regards,
Minchan Kim