2011-06-05 05:08:49

by Minchan Kim

Subject: [PATCH] Fix page isolated count mismatch

If migration fails, we normally call putback_lru_pages, which
decrements NR_ISOLATED_[ANON|FILE].
That means we should increment NR_ISOLATED_[ANON|FILE] before calling
putback_lru_pages, but soft_offline_page doesn't do so.

This can drive NR_ISOLATED_[ANON|FILE] negative. In a UP build,
zone_page_state then reports a huge number of isolated pages, so the
too_many_isolated functions are completely deceived. In the end, some
processes get stuck in D state because they expect the while loop to
end after congestion_wait, but it never does.

If this is right, it should go to -stable.

Cc: Andi Kleen <[email protected]>
Cc: Mel Gorman <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/memory-failure.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5c8f7e0..eac0ba5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -52,6 +52,7 @@
 #include <linux/swapops.h>
 #include <linux/hugetlb.h>
 #include <linux/memory_hotplug.h>
+#include <linux/mm_inline.h>
 #include "internal.h"

 int sysctl_memory_failure_early_kill __read_mostly = 0;
@@ -1468,7 +1469,8 @@ int soft_offline_page(struct page *page, int flags)
 	put_page(page);
 	if (!ret) {
 		LIST_HEAD(pagelist);
-
+		inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
 								0, true);
--
1.7.0.4
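
The accounting rule behind this patch can be illustrated with a small userspace model. This is only a sketch, not kernel code: the `toy_*` names and `struct toy_zone` are hypothetical stand-ins. It assumes, as the changelog states, that the putback path always decrements the isolated counter, so a caller that skips the increment leaves the counter at -1 after a failed migration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one zone's NR_ISOLATED_ANON/NR_ISOLATED_FILE pair. */
enum { ISOLATED_ANON, ISOLATED_FILE, NR_ISOLATED_KINDS };

struct toy_zone {
	long isolated[NR_ISOLATED_KINDS];
};

/*
 * Caller side: account for a page before handing it to migration.
 * Indexing by ISOLATED_ANON + file_backed mirrors the patch's
 * NR_ISOLATED_ANON + page_is_file_cache(page).
 */
static void toy_account_isolated(struct toy_zone *zone, bool file_backed)
{
	zone->isolated[ISOLATED_ANON + file_backed]++;
}

/* Failure path: models the putback step dropping the isolated count. */
static void toy_putback(struct toy_zone *zone, bool file_backed)
{
	zone->isolated[ISOLATED_ANON + file_backed]--;
}

/* Returns the anon counter after one failed migration of an anon page. */
static long toy_failed_migration(bool caller_accounts)
{
	struct toy_zone zone = { { 0, 0 } };

	if (caller_accounts)
		toy_account_isolated(&zone, false);
	toy_putback(&zone, false);	/* migration failed, page is put back */
	return zone.isolated[ISOLATED_ANON];
}
```

With `caller_accounts` set, the counter returns to 0. Without it, the model ends at -1, which is the mismatch the patch addresses.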


2011-06-07 09:51:11

by Mel Gorman

Subject: Re: [PATCH] Fix page isolated count mismatch

On Sun, Jun 05, 2011 at 02:08:36PM +0900, Minchan Kim wrote:
> If migration fails, we normally call putback_lru_pages, which
> decrements NR_ISOLATED_[ANON|FILE].
> That means we should increment NR_ISOLATED_[ANON|FILE] before calling
> putback_lru_pages, but soft_offline_page doesn't do so.
>
> This can drive NR_ISOLATED_[ANON|FILE] negative. In a UP build,
> zone_page_state then reports a huge number of isolated pages, so the
> too_many_isolated functions are completely deceived. In the end, some
> processes get stuck in D state because they expect the while loop to
> end after congestion_wait, but it never does.
>
> If this is right, it should go to -stable.
>

The patch is fine but the changelog is tricky to read. How about this?

[PATCH] Fix isolated page count during memory failure

Pages isolated for migration are accounted with the vmstat counters
NR_ISOLATED_[ANON|FILE]. Callers of migrate_pages() are expected to
increment these counters when pages are isolated from the LRU. Once
the pages have been migrated, they are put back on the LRU or freed
and the isolated count is decremented.

Memory failure is not properly accounting for pages it isolates,
causing the NR_ISOLATED counters to go negative. On SMP builds,
this goes unnoticed as negative counters are treated as 0 due to
expected per-cpu drift. On UP builds, the counter is treated by
too_many_isolated() as a large value causing processes to enter D
state during page reclaim or compaction. This patch accounts for
pages isolated by memory failure correctly.

Whether you add the changelog or not;

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs
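
The SMP/UP difference described in the proposed changelog can be sketched in userspace as follows. This is a simplified model, not the kernel source: the `toy_*` functions are hypothetical, the `smp` flag stands in for the compile-time configuration, and the comparison against the inactive count is an assumed, reduced form of the too_many_isolated() check.

```c
#include <stdbool.h>

/*
 * Models reading a zone counter: the signed value is returned as an
 * unsigned long, and only the SMP-style read clamps negatives to zero.
 */
static unsigned long toy_zone_page_state(long counter, bool smp)
{
	long x = counter;

	if (smp && x < 0)
		x = 0;		/* negative treated as per-cpu drift */
	return (unsigned long)x;	/* on UP, -1 becomes a huge value */
}

/* Assumed simplified check: throttle when isolated exceeds inactive. */
static bool toy_too_many_isolated(long isolated, long inactive, bool smp)
{
	return toy_zone_page_state(isolated, smp) >
	       toy_zone_page_state(inactive, smp);
}
```

In this model a counter of -1 reads as 0 with `smp` set, so the check passes. With `smp` clear it reads as the largest unsigned long, so the check reports too many isolated pages no matter how many inactive pages there are.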