2015-08-03 15:02:45

by Vladimir Davydov

Subject: [PATCH] mm: vmscan: never isolate more pages than necessary

If transparent huge pages are enabled, we can isolate many more pages
than we actually need to scan, because we count both single and huge
pages equally in isolate_lru_pages().

Since commit 5bc7b8aca942d ("mm: thp: add split tail pages to shrink
page list in page reclaim"), we scan all the tail pages immediately
after a huge page split (see shrink_page_list()). As a result, we can
reclaim up to SWAP_CLUSTER_MAX * HPAGE_PMD_NR (512 MB) in one run!
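
To make the arithmetic explicit, here is a tiny userspace model of that
accounting (a simplified sketch, not the kernel code itself; the
constants assume x86-64 with 4 KiB base pages, 2 MiB THP, and
SWAP_CLUSTER_MAX already raised to 256):

/*
 * Simplified model of the pre-patch accounting in isolate_lru_pages():
 * the loop stops after nr_to_scan iterations, but each iteration may
 * take HPAGE_PMD_NR subpages when it hits a huge page.
 */
#include <stdio.h>

#define SWAP_CLUSTER_MAX	256UL
#define HPAGE_PMD_NR		512UL

int main(void)
{
	unsigned long nr_to_scan = SWAP_CLUSTER_MAX;
	unsigned long nr_taken = 0;
	unsigned long scan;

	/* Worst case: every entry on the LRU list is an unreferenced THP. */
	for (scan = 0; scan < nr_to_scan; scan++)
		nr_taken += HPAGE_PMD_NR;

	printf("isolated %lu base pages = %lu MB\n",
	       nr_taken, nr_taken * 4096 >> 20);
	return 0;
}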

This is easy to catch on memcg reclaim with zswap enabled. The latter
makes swapout instant so that if we happen to scan an unreferenced huge
page we will evict both its head and tail pages immediately, which is
likely to result in excessive reclaim.

Signed-off-by: Vladimir Davydov <[email protected]>
---
mm/vmscan.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5221e19e98f4..94092fd3b96b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1387,7 +1387,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
unsigned long nr_taken = 0;
unsigned long scan;

- for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
+ for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
+ !list_empty(src); scan++) {
struct page *page;
int nr_pages;

--
2.1.4


2015-08-04 13:53:01

by Michal Hocko

Subject: Re: [PATCH] mm: vmscan: never isolate more pages than necessary

On Mon 03-08-15 18:02:27, Vladimir Davydov wrote:
> If transparent huge pages are enabled, we can isolate many more pages
> than we actually need to scan, because we count both single and huge
> pages equally in isolate_lru_pages().
>
> Since commit 5bc7b8aca942d ("mm: thp: add split tail pages to shrink
> page list in page reclaim"), we scan all the tail pages immediately
> after a huge page split (see shrink_page_list()). As a result, we can
> reclaim up to SWAP_CLUSTER_MAX * HPAGE_PMD_NR (512 MB) in one run!

512MB is really unexpected, but yeah, you are right. Mel has recently
increased SWAP_CLUSTER_MAX to 256 ("mm: increase SWAP_CLUSTER_MAX to
batch TLB flushes"), which I had missed. That makes the situation
potentially much worse. I guess this is worth mentioning in the
changelog, because the original SWAP_CLUSTER_MAX (32) didn't look that
scary.
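
Just to put numbers on it (my arithmetic, assuming 4 KiB base pages and
HPAGE_PMD_NR = 512): with the old SWAP_CLUSTER_MAX of 32 the worst case
per isolation round is 32 * 512 * 4 KiB = 64 MB, whereas with 256 it
becomes 256 * 512 * 4 KiB = 512 MB.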

> This is easy to catch on memcg reclaim with zswap enabled. The latter
> makes swapout instant so that if we happen to scan an unreferenced huge
> page we will evict both its head and tail pages immediately, which is
> likely to result in excessive reclaim.
>
> Signed-off-by: Vladimir Davydov <[email protected]>

Reviewed-by: Michal Hocko <[email protected]>

> ---
> mm/vmscan.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5221e19e98f4..94092fd3b96b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1387,7 +1387,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> unsigned long nr_taken = 0;
> unsigned long scan;
>
> - for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
> + for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
> + !list_empty(src); scan++) {
> struct page *page;
> int nr_pages;
>
> --
> 2.1.4
>

--
Michal Hocko
SUSE Labs