2015-11-24 11:55:12

by Michal Hocko

Subject: [PATCH 0/2] 2 zone_pages_reclaimable fixes

Hi,
Johannes had a valid point [1] that zone_reclaimable_pages should count
isolated pages as well. This is what the first patch does. While I was
there, I realized that the current logic of this function allows for
a large overestimation of the reclaimable memory when anon >> nr_swap_pages,
which would be visible especially when swap is running short on space.
I think this is a bug, and it is fixed in the second patch.

I do not have any particular workload which would show significant misbehavior
because of the current implementation though. We mostly just happen to scan
longer than necessary because zone_reclaimable would keep us looping longer
but I still think it makes sense to fix this regardless.

[1] http://lkml.kernel.org/r/20151123182447.GF13000%40cmpxchg.org


2015-11-24 11:55:14

by Michal Hocko

Subject: [PATCH 1/2] mm, vmscan: consider isolated pages in zone_reclaimable_pages

From: Michal Hocko <[email protected]>

zone_reclaimable_pages counts how many pages are reclaimable in
the given zone. This currently includes all pages on file lrus and
anon lrus if there is an available swap storage. We do not consider
NR_ISOLATED_{ANON,FILE} counters though which is not correct because
these counters reflect temporarily isolated pages which are still
reclaimable because they either get back to their LRU or get freed
either by the page reclaim or page migration.

The number of these pages might be sufficiently high to confuse users of
zone_reclaimable_pages (e.g. mbind can migrate large ranges of memory at
once).

Suggested-by: Johannes Weiner <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/vmscan.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a4507ecaefbf..946d348f5040 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -197,11 +197,13 @@ static unsigned long zone_reclaimable_pages(struct zone *zone)
 	unsigned long nr;

 	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
-	     zone_page_state(zone, NR_INACTIVE_FILE);
+	     zone_page_state(zone, NR_INACTIVE_FILE) +
+	     zone_page_state(zone, NR_ISOLATED_FILE);

 	if (get_nr_swap_pages() > 0)
 		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
-		      zone_page_state(zone, NR_INACTIVE_ANON);
+		      zone_page_state(zone, NR_INACTIVE_ANON) +
+		      zone_page_state(zone, NR_ISOLATED_ANON);

 	return nr;
 }
--
2.6.2
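
To see the effect of this change outside the kernel, the following is a minimal userspace sketch of the counting logic before and after the patch. The struct zone_counters type and both function names are hypothetical stand-ins for the per-zone counters that the kernel reads through zone_page_state(); they are not kernel APIs, and all values are in pages.

```c
/* Hypothetical stand-in for the per-zone vmstat counters (in pages). */
struct zone_counters {
	unsigned long active_file, inactive_file, isolated_file;
	unsigned long active_anon, inactive_anon, isolated_anon;
};

/* Pre-patch logic: only pages currently sitting on the LRU lists count. */
static unsigned long reclaimable_lru_only(const struct zone_counters *z,
					  long nr_swap)
{
	unsigned long nr = z->active_file + z->inactive_file;

	/* Anon pages are only counted when some swap space is left. */
	if (nr_swap > 0)
		nr += z->active_anon + z->inactive_anon;
	return nr;
}

/* Post-patch logic: temporarily isolated pages are counted as well. */
static unsigned long reclaimable_with_isolated(const struct zone_counters *z,
					       long nr_swap)
{
	unsigned long nr = z->active_file + z->inactive_file +
			   z->isolated_file;

	if (nr_swap > 0)
		nr += z->active_anon + z->inactive_anon +
		      z->isolated_anon;
	return nr;
}
```

The difference between the two results is exactly the number of isolated pages, which is the amount by which the old function underestimates while a large migration (such as the mbind case mentioned above) is in flight.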

2015-11-24 11:55:39

by Michal Hocko

Subject: [PATCH 2/2] mm, vmscan: do not overestimate anonymous reclaimable pages

From: Michal Hocko <[email protected]>

zone_reclaimable_pages considers all anonymous pages on LRUs reclaimable
if there is at least one entry on the swap storage left. This can be
really misleading when the swap is short on space and skew reclaim
decisions based on zone_reclaimable_pages. Fix this by clamping the
number to the minimum of the available swap space and anon LRU pages.

Signed-off-by: Michal Hocko <[email protected]>
---
mm/vmscan.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 946d348f5040..646001a1f279 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -195,15 +195,20 @@ static bool sane_reclaim(struct scan_control *sc)
 static unsigned long zone_reclaimable_pages(struct zone *zone)
 {
 	unsigned long nr;
+	long nr_swap = get_nr_swap_pages();

 	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
 	     zone_page_state(zone, NR_INACTIVE_FILE) +
 	     zone_page_state(zone, NR_ISOLATED_FILE);

-	if (get_nr_swap_pages() > 0)
-		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
-		      zone_page_state(zone, NR_INACTIVE_ANON) +
-		      zone_page_state(zone, NR_ISOLATED_ANON);
+	if (nr_swap > 0) {
+		unsigned long anon;
+
+		anon = zone_page_state(zone, NR_ACTIVE_ANON) +
+		       zone_page_state(zone, NR_INACTIVE_ANON) +
+		       zone_page_state(zone, NR_ISOLATED_ANON);
+		nr += min_t(unsigned long, nr_swap, anon);
+	}

 	return nr;
 }
--
2.6.2
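
The clamp introduced by this patch can be illustrated in isolation with a small userspace sketch. The function name anon_reclaimable_clamped is hypothetical, the comparison is written out by hand in place of the kernel's min_t() macro, and values are in pages.

```c
/*
 * Userspace model of the clamp: the anon contribution to the estimate
 * is limited by the number of free swap pages.
 */
static unsigned long anon_reclaimable_clamped(unsigned long anon,
					      long nr_swap)
{
	unsigned long swap;

	/* No swap left: no anon page is counted at all. */
	if (nr_swap <= 0)
		return 0;

	/* nr_swap is known to be positive here, so the cast is safe. */
	swap = (unsigned long)nr_swap;
	return swap < anon ? swap : anon;
}
```

Assuming 4 KiB pages, 1G of anon memory is 262144 pages and 100M of swap is 25600 pages, so the function returns 25600 in that case, where the unclamped logic would count all 262144 pages.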

2015-11-24 13:07:58

by Vladimir Davydov

Subject: Re: [PATCH 2/2] mm, vmscan: do not overestimate anonymous reclaimable pages

On Tue, Nov 24, 2015 at 12:55:00PM +0100, Michal Hocko wrote:
> zone_reclaimable_pages considers all anonymous pages on LRUs reclaimable
> if there is at least one entry on the swap storage left. This can be
> really misleading when the swap is short on space and skew reclaim
> decisions based on zone_reclaimable_pages. Fix this by clamping the
> number to the minimum of the available swap space and anon LRU pages.

Suppose there's 100M of swap and 1G of anon pages. This patch makes
zone_reclaimable_pages return 100M instead of 1G in this case. If you
rotate the 600M of oldest anon pages, which is quite possible,
zone_reclaimable will start returning false. That is wrong, because
there are still 400M of pages that were not even scanned. Besides, those
600M of rotated pages could have become reclaimable after their ref bits
got cleared.
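
The 600M figure in this example can be reproduced with a short userspace sketch. It assumes that zone_reclaimable() compares the number of scanned pages against six times zone_reclaimable_pages(), as mm/vmscan.c did in kernels of this period, and that pages are 4 KiB. The name zone_reclaimable_model and the PAGES_PER_MB macro are hypothetical.

```c
#include <stdbool.h>

#define PAGES_PER_MB 256UL	/* assumption: 4 KiB pages */

/*
 * Model of the check: the zone is still considered reclaimable while
 * fewer than six times the estimated reclaimable pages were scanned.
 */
static bool zone_reclaimable_model(unsigned long nr_scanned,
				   unsigned long nr_reclaimable)
{
	return nr_scanned < nr_reclaimable * 6;
}
```

With the clamped estimate of 100M, the check fails once 600M worth of pages have been scanned, since 600 * PAGES_PER_MB is no longer below 6 * 100 * PAGES_PER_MB. With the unclamped estimate of 1G, the same amount of scanning still leaves the zone reclaimable.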

I think it is the name of zone_reclaimable_pages which is misleading. It
should be called something like "zone_scannable_pages" judging by how it
is used in zone_reclaimable.

Thanks,
Vladimir

>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> mm/vmscan.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 946d348f5040..646001a1f279 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -195,15 +195,20 @@ static bool sane_reclaim(struct scan_control *sc)
>  static unsigned long zone_reclaimable_pages(struct zone *zone)
>  {
>  	unsigned long nr;
> +	long nr_swap = get_nr_swap_pages();
>
>  	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
>  	     zone_page_state(zone, NR_INACTIVE_FILE) +
>  	     zone_page_state(zone, NR_ISOLATED_FILE);
>
> -	if (get_nr_swap_pages() > 0)
> -		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
> -		      zone_page_state(zone, NR_INACTIVE_ANON) +
> -		      zone_page_state(zone, NR_ISOLATED_ANON);
> +	if (nr_swap > 0) {
> +		unsigned long anon;
> +
> +		anon = zone_page_state(zone, NR_ACTIVE_ANON) +
> +		       zone_page_state(zone, NR_INACTIVE_ANON) +
> +		       zone_page_state(zone, NR_ISOLATED_ANON);
> +		nr += min_t(unsigned long, nr_swap, anon);
> +	}
>
>  	return nr;
>  }

2015-11-24 13:37:17

by Michal Hocko

Subject: Re: [PATCH 2/2] mm, vmscan: do not overestimate anonymous reclaimable pages

On Tue 24-11-15 16:07:40, Vladimir Davydov wrote:
> On Tue, Nov 24, 2015 at 12:55:00PM +0100, Michal Hocko wrote:
> > zone_reclaimable_pages considers all anonymous pages on LRUs reclaimable
> > if there is at least one entry on the swap storage left. This can be
> > really misleading when the swap is short on space and skew reclaim
> > decisions based on zone_reclaimable_pages. Fix this by clamping the
> > number to the minimum of the available swap space and anon LRU pages.
>
> Suppose there's 100M of swap and 1G of anon pages. This patch makes
> zone_reclaimable_pages return 100M instead of 1G in this case. If you
> rotate the 600M of oldest anon pages, which is quite possible,
> zone_reclaimable will start returning false. That is wrong, because
> there are still 400M of pages that were not even scanned. Besides, those
> 600M of rotated pages could have become reclaimable after their ref bits
> got cleared.

Uhm, OK, I guess you are right. Making zone_reclaimable less
conservative can lead to hard-to-predict results. Please scratch this
patch.

> I think it is the name of zone_reclaimable_pages which is misleading. It
> should be called something like "zone_scannable_pages" judging by how it
> is used in zone_reclaimable.

Thanks!
--
Michal Hocko
SUSE Labs

2015-11-24 13:53:53

by Vladimir Davydov

Subject: Re: [PATCH 1/2] mm, vmscan: consider isolated pages in zone_reclaimable_pages

On Tue, Nov 24, 2015 at 12:54:59PM +0100, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> zone_reclaimable_pages counts how many pages are reclaimable in
> the given zone. This currently includes all pages on file lrus and
> anon lrus if there is an available swap storage. We do not consider
> NR_ISOLATED_{ANON,FILE} counters though which is not correct because
> these counters reflect temporarily isolated pages which are still
> reclaimable because they either get back to their LRU or get freed
> either by the page reclaim or page migration.
>
> The number of these pages might be sufficiently high to confuse users of
> zone_reclaimable_pages (e.g. mbind can migrate large ranges of memory at
> once).

Sounds reasonable to me.

Reviewed-by: Vladimir Davydov <[email protected]>

Thanks,
Vladimir

>
> Suggested-by: Johannes Weiner <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> mm/vmscan.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a4507ecaefbf..946d348f5040 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -197,11 +197,13 @@ static unsigned long zone_reclaimable_pages(struct zone *zone)
>  	unsigned long nr;
>
>  	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
> -	     zone_page_state(zone, NR_INACTIVE_FILE);
> +	     zone_page_state(zone, NR_INACTIVE_FILE) +
> +	     zone_page_state(zone, NR_ISOLATED_FILE);
>
>  	if (get_nr_swap_pages() > 0)
>  		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
> -		      zone_page_state(zone, NR_INACTIVE_ANON);
> +		      zone_page_state(zone, NR_INACTIVE_ANON) +
> +		      zone_page_state(zone, NR_ISOLATED_ANON);
>
>  	return nr;
>  }

2015-11-24 16:04:09

by Johannes Weiner

Subject: Re: [PATCH 1/2] mm, vmscan: consider isolated pages in zone_reclaimable_pages

On Tue, Nov 24, 2015 at 12:54:59PM +0100, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> zone_reclaimable_pages counts how many pages are reclaimable in
> the given zone. This currently includes all pages on file lrus and
> anon lrus if there is an available swap storage. We do not consider
> NR_ISOLATED_{ANON,FILE} counters though which is not correct because
> these counters reflect temporarily isolated pages which are still
> reclaimable because they either get back to their LRU or get freed
> either by the page reclaim or page migration.
>
> The number of these pages might be sufficiently high to confuse users of
> zone_reclaimable_pages (e.g. mbind can migrate large ranges of memory at
> once).
>
> Suggested-by: Johannes Weiner <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

2015-11-25 11:00:21

by David Rientjes

Subject: Re: [PATCH 1/2] mm, vmscan: consider isolated pages in zone_reclaimable_pages

On Tue, 24 Nov 2015, Michal Hocko wrote:

> From: Michal Hocko <[email protected]>
>
> zone_reclaimable_pages counts how many pages are reclaimable in
> the given zone. This currently includes all pages on file lrus and
> anon lrus if there is an available swap storage. We do not consider
> NR_ISOLATED_{ANON,FILE} counters though which is not correct because
> these counters reflect temporarily isolated pages which are still
> reclaimable because they either get back to their LRU or get freed
> either by the page reclaim or page migration.
>
> The number of these pages might be sufficiently high to confuse users of
> zone_reclaimable_pages (e.g. mbind can migrate large ranges of memory at
> once).
>
> Suggested-by: Johannes Weiner <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>

Acked-by: David Rientjes <[email protected]>