2013-08-22 09:48:48

by Wanpeng Li

Subject: [PATCH 1/6] mm/hwpoison: fix loss of PG_dirty flag for errors on mlocked pages

memory_failure() stores the page flags of the error page before unmapping it,
and (only) if the first check against the current page flags classifies the
error page as unknown, it performs a second check against the stored page
flags. This is because memory_failure() unmaps the error page before calling
page_action(), and the unmapping changes the page state: in particular,
page_remove_rmap() (called from try_to_unmap_one()) clears PG_mlocked, so
page_action() can no longer identify mlocked pages afterwards.

However, memory_failure() still cannot handle memory errors on dirty mlocked
pages correctly: try_to_unmap_one() moves the dirty bit from the pte to the
physical page, and the second check misses it because it only looks at the
stored page flags. Fix this by propagating PG_dirty into the stored page
flags when the page is dirty after unmapping.
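
For reference, the dirty-bit transfer happens on the unmap path; a simplified
excerpt of try_to_unmap_one() as found in kernels of this era (a paraphrased
sketch, not the complete function):

	/* Nuke the page table entry, then move the dirty bit from the
	 * pte to the physical page before the mapping goes away. */
	pteval = ptep_clear_flush(vma, address, pte);
	if (pte_dirty(pteval))
		set_page_dirty(page);	/* sets PG_dirty on the page */
	/* ... */
	page_remove_rmap(page);		/* clears PG_mlocked, as noted above */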

Testcase:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 2
#define PAGE_SIZE 4096

int main(void)
{
	char *mem;
	int i;

	mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
		   PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	for (i = 0; i < PAGES_TO_TEST; i++)
		mem[i * PAGE_SIZE] = 'a';

	if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
		return -1;

	return 0;
}
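
(Build with e.g. "gcc -o hwpoison-test testcase.c" — the file name is
arbitrary — and run as root: MADV_HWPOISON requires CAP_SYS_ADMIN and a
kernel built with CONFIG_MEMORY_FAILURE=y.)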

Before patch:

[ 912.839247] Injecting memory failure for page 7dfb8 at 7f6b4e37b000
[ 912.839257] MCE 0x7dfb8: clean mlocked LRU page recovery: Recovered
[ 912.845550] MCE 0x7dfb8: clean mlocked LRU page still referenced by 1 users
[ 912.852586] Injecting memory failure for page 7e6aa at 7f6b4e37c000
[ 912.852594] MCE 0x7e6aa: clean mlocked LRU page recovery: Recovered
[ 912.858936] MCE 0x7e6aa: clean mlocked LRU page still referenced by 1 users

After patch:

[ 163.590225] Injecting memory failure for page 91bc2f at 7f9f5b0e5000
[ 163.590264] MCE 0x91bc2f: dirty mlocked LRU page recovery: Recovered
[ 163.596680] MCE 0x91bc2f: dirty mlocked LRU page still referenced by 1 users
[ 163.603831] Injecting memory failure for page 91cdd3 at 7f9f5b0e6000
[ 163.603852] MCE 0x91cdd3: dirty mlocked LRU page recovery: Recovered
[ 163.610305] MCE 0x91cdd3: dirty mlocked LRU page still referenced by 1 users

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index bee58d8..e156084 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1206,6 +1206,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
for (ps = error_states;; ps++)
if ((p->flags & ps->mask) == ps->res)
break;
+
+ page_flags |= (p->flags & (1UL << PG_dirty));
+
if (!ps->mask)
for (ps = error_states;; ps++)
if ((page_flags & ps->mask) == ps->res)
--
1.7.7.6


2013-08-22 09:48:50

by Wanpeng Li

Subject: [PATCH 6/6] mm/hwpoison: centralize setting PG_hwpoison and increasing num_poisoned_pages

soft_offline_page() invokes __soft_offline_page() for in-use normal pages and
soft_offline_huge_page() for in-use hugetlbfs pages. On success, both do the
same work as the free-page path: set PG_hwpoison, increase num_poisoned_pages,
and so on. Centralize this common work in soft_offline_page().
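
For clarity, the tail of soft_offline_page() after this patch, assembled from
the hunks below, looks roughly like:

	ret = get_any_page(page, pfn, flags);
	if (ret < 0)
		return ret;
	if (ret) { /* for in-use pages */
		if (PageHuge(page))
			ret = soft_offline_huge_page(page, flags);
		else
			ret = __soft_offline_page(page, flags);
	}

	if (!ret) { /* common accounting for every successful path */
		if (PageHuge(page)) {
			set_page_hwpoison_huge_page(hpage);
			dequeue_hwpoisoned_huge_page(hpage);
			atomic_long_add(1 << compound_order(hpage),
					&num_poisoned_pages);
		} else {
			SetPageHWPoison(page);
			atomic_long_inc(&num_poisoned_pages);
		}
	}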

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0a52571..3226de1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1486,15 +1486,9 @@ static int soft_offline_huge_page(struct page *page, int flags)
ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL,
MIGRATE_SYNC);
put_page(hpage);
- if (ret) {
+ if (ret)
pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
pfn, ret, page->flags);
- } else {
- set_page_hwpoison_huge_page(hpage);
- dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_order(hpage),
- &num_poisoned_pages);
- }
return ret;
}

@@ -1530,8 +1524,6 @@ static int __soft_offline_page(struct page *page, int flags)
if (ret == 1) {
put_page(page);
pr_info("soft_offline: %#lx: invalidated\n", pfn);
- SetPageHWPoison(page);
- atomic_long_inc(&num_poisoned_pages);
return 0;
}

@@ -1572,11 +1564,9 @@ static int __soft_offline_page(struct page *page, int flags)
lru_add_drain_all();
if (!is_free_buddy_page(page))
drain_all_pages();
- SetPageHWPoison(page);
if (!is_free_buddy_page(page))
pr_info("soft offline: %#lx: page leaked\n",
pfn);
- atomic_long_inc(&num_poisoned_pages);
}
} else {
pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
@@ -1633,7 +1623,9 @@ int soft_offline_page(struct page *page, int flags)
ret = soft_offline_huge_page(page, flags);
else
ret = __soft_offline_page(page, flags);
- } else { /* for free pages */
+ }
+
+ if (!ret) {
if (PageHuge(page)) {
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
--
1.8.1.2

2013-08-22 09:48:46

by Wanpeng Li

Subject: [PATCH 3/6] mm/hwpoison: fix num_poisoned_pages error statistics for thp

There is a race between hwpoison and unpoison: memory_failure() sets
PG_hwpoison and increases num_poisoned_pages without holding the page lock,
and for a thp it accounts only one page. However, unpoison can run before
memory_failure() takes the page lock and splits the transparent hugepage, in
which case unpoison decreases num_poisoned_pages by 1 << compound_order
because the thp has not been split yet. That means we account one page for
hwpoison but 1 << compound_order for unpoison. Fix this by decrementing
num_poisoned_pages by exactly one in the non-hugetlbfs case.
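
Schematically, the race looks like this (an illustrative interleaving, not
verbatim code):

	CPU A (memory_failure)           CPU B (unpoison_memory)
	TestSetPageHWPoison(p);
	nr_pages = 1; /* thp not split */
	atomic_long_add(nr_pages,
			&num_poisoned_pages);
	                                 nr_pages = 1 << compound_trans_order(page);
	                                 if (TestClearPageHWPoison(p))
	                                         atomic_long_sub(nr_pages,
	                                                 &num_poisoned_pages);
	lock_page(hpage);
	/* would split the thp here, but unpoison already ran */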

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5092e06..6bfd51e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1350,7 +1350,7 @@ int unpoison_memory(unsigned long pfn)
return 0;
}
if (TestClearPageHWPoison(p))
- atomic_long_sub(nr_pages, &num_poisoned_pages);
+ atomic_long_dec(&num_poisoned_pages);
pr_info("MCE: Software-unpoisoned free page %#lx\n", pfn);
return 0;
}
--
1.8.1.2

2013-08-22 09:49:32

by Wanpeng Li

Subject: [PATCH 2/6] mm/hwpoison: no need to hold compound lock for hugetlbfs pages

The compound lock was introduced by commit e9da73d67 ("thp: compound_lock."),
and is used to serialize put_page() against __split_huge_page_refcount().
Transparent hugepages are split in the hwpoison handler and only one subpage
is poisoned, while hugetlbfs pages are never split, so there is no need to
hold the compound lock for a hugetlbfs page. Replace compound_trans_order()
with compound_order() in the places where the page is known to be a hugetlbfs
page.
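
For reference, compound_trans_order() is where the compound lock is taken; it
looked roughly like this in kernels of this era (paraphrased from
include/linux/mm.h):

	static inline int compound_trans_order(struct page *page)
	{
		int order;
		unsigned long flags;

		if (!PageHead(page))
			return 0;
		/*
		 * The lock serializes against __split_huge_page_refcount(),
		 * which only matters for THP; hugetlbfs pages are never split.
		 */
		flags = compound_lock_irqsave(page);
		order = compound_order(page);
		compound_unlock_irqrestore(page, flags);
		return order;
	}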

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2c13aa7..5092e06 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -206,7 +206,7 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno,
#ifdef __ARCH_SI_TRAPNO
si.si_trapno = trapno;
#endif
- si.si_addr_lsb = compound_trans_order(compound_head(page)) + PAGE_SHIFT;
+ si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;

if ((flags & MF_ACTION_REQUIRED) && t == current) {
si.si_code = BUS_MCEERR_AR;
@@ -983,7 +983,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
static void set_page_hwpoison_huge_page(struct page *hpage)
{
int i;
- int nr_pages = 1 << compound_trans_order(hpage);
+ int nr_pages = 1 << compound_order(hpage);
for (i = 0; i < nr_pages; i++)
SetPageHWPoison(hpage + i);
}
@@ -991,7 +991,7 @@ static void set_page_hwpoison_huge_page(struct page *hpage)
static void clear_page_hwpoison_huge_page(struct page *hpage)
{
int i;
- int nr_pages = 1 << compound_trans_order(hpage);
+ int nr_pages = 1 << compound_order(hpage);
for (i = 0; i < nr_pages; i++)
ClearPageHWPoison(hpage + i);
}
@@ -1491,7 +1491,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
} else {
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_trans_order(hpage),
+ atomic_long_add(1 << compound_order(hpage),
&num_poisoned_pages);
}
return ret;
@@ -1551,7 +1551,7 @@ int soft_offline_page(struct page *page, int flags)
if (PageHuge(page)) {
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_trans_order(hpage),
+ atomic_long_add(1 << compound_order(hpage),
&num_poisoned_pages);
} else {
SetPageHWPoison(page);
--
1.8.1.2

2013-08-22 09:49:31

by Wanpeng Li

Subject: [PATCH 4/6] mm/hwpoison: don't set migration type twice to avoid holding the heavily contended zone->lock

Set pageblock migration type will hold zone->lock which is heavy contended
in system to avoid race. However, soft offline page will set pageblock
migration type twice during get page if the page is in used, not hugetlbfs
page and not on lru list. There is unnecessary to set the pageblock migration
type and hold heavy contended zone->lock again if the first round get page
have already set the pageblock to right migration type.
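
For context, set_migratetype_isolate() takes the contended lock up front; a
condensed paraphrase of the mm/page_alloc.c code of this era (the notifier
and isolatability checks that set ret to 0 are elided):

	int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
	{
		struct zone *zone = page_zone(page);
		unsigned long flags;
		int ret = -EBUSY;

		spin_lock_irqsave(&zone->lock, flags); /* the contended lock */
		/* ... checks that set ret to 0 on success ... */
		if (!ret)
			set_pageblock_migratetype(page, MIGRATE_ISOLATE);
		spin_unlock_irqrestore(&zone->lock, flags);
		if (!ret)
			drain_all_pages();
		return ret;
	}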

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6bfd51e..3bfb45f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1413,7 +1413,8 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
* was free. This flag should be kept set until the source page
* is freed and PG_hwpoison on it is set.
*/
- set_migratetype_isolate(p, true);
+ if (get_pageblock_migratetype(p) == MIGRATE_ISOLATE)
+ set_migratetype_isolate(p, true);
/*
* When the target page is a free hugepage, just remove it
* from free hugepage list.
--
1.8.1.2

2013-08-22 09:58:56

by Wanpeng Li

Subject: [PATCH 5/6] mm/hwpoison: drop forward declaration of __soft_offline_page()

Drop the forward declaration of __soft_offline_page() by moving
soft_offline_page() below its definition. No functional change.

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 129 ++++++++++++++++++++++++++--------------------------
1 file changed, 64 insertions(+), 65 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3bfb45f..0a52571 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1498,71 +1498,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
return ret;
}

-static int __soft_offline_page(struct page *page, int flags);
-
-/**
- * soft_offline_page - Soft offline a page.
- * @page: page to offline
- * @flags: flags. Same as memory_failure().
- *
- * Returns 0 on success, otherwise negated errno.
- *
- * Soft offline a page, by migration or invalidation,
- * without killing anything. This is for the case when
- * a page is not corrupted yet (so it's still valid to access),
- * but has had a number of corrected errors and is better taken
- * out.
- *
- * The actual policy on when to do that is maintained by
- * user space.
- *
- * This should never impact any application or cause data loss,
- * however it might take some time.
- *
- * This is not a 100% solution for all memory, but tries to be
- * ``good enough'' for the majority of memory.
- */
-int soft_offline_page(struct page *page, int flags)
-{
- int ret;
- unsigned long pfn = page_to_pfn(page);
- struct page *hpage = compound_trans_head(page);
-
- if (PageHWPoison(page)) {
- pr_info("soft offline: %#lx page already poisoned\n", pfn);
- return -EBUSY;
- }
- if (!PageHuge(page) && PageTransHuge(hpage)) {
- if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
- pr_info("soft offline: %#lx: failed to split THP\n",
- pfn);
- return -EBUSY;
- }
- }
-
- ret = get_any_page(page, pfn, flags);
- if (ret < 0)
- return ret;
- if (ret) { /* for in-use pages */
- if (PageHuge(page))
- ret = soft_offline_huge_page(page, flags);
- else
- ret = __soft_offline_page(page, flags);
- } else { /* for free pages */
- if (PageHuge(page)) {
- set_page_hwpoison_huge_page(hpage);
- dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_order(hpage),
- &num_poisoned_pages);
- } else {
- SetPageHWPoison(page);
- atomic_long_inc(&num_poisoned_pages);
- }
- }
- unset_migratetype_isolate(page, MIGRATE_MOVABLE);
- return ret;
-}
-
static int __soft_offline_page(struct page *page, int flags)
{
int ret;
@@ -1649,3 +1584,66 @@ static int __soft_offline_page(struct page *page, int flags)
}
return ret;
}
+
+/**
+ * soft_offline_page - Soft offline a page.
+ * @page: page to offline
+ * @flags: flags. Same as memory_failure().
+ *
+ * Returns 0 on success, otherwise negated errno.
+ *
+ * Soft offline a page, by migration or invalidation,
+ * without killing anything. This is for the case when
+ * a page is not corrupted yet (so it's still valid to access),
+ * but has had a number of corrected errors and is better taken
+ * out.
+ *
+ * The actual policy on when to do that is maintained by
+ * user space.
+ *
+ * This should never impact any application or cause data loss,
+ * however it might take some time.
+ *
+ * This is not a 100% solution for all memory, but tries to be
+ * ``good enough'' for the majority of memory.
+ */
+int soft_offline_page(struct page *page, int flags)
+{
+ int ret;
+ unsigned long pfn = page_to_pfn(page);
+ struct page *hpage = compound_trans_head(page);
+
+ if (PageHWPoison(page)) {
+ pr_info("soft offline: %#lx page already poisoned\n", pfn);
+ return -EBUSY;
+ }
+ if (!PageHuge(page) && PageTransHuge(hpage)) {
+ if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
+ pr_info("soft offline: %#lx: failed to split THP\n",
+ pfn);
+ return -EBUSY;
+ }
+ }
+
+ ret = get_any_page(page, pfn, flags);
+ if (ret < 0)
+ return ret;
+ if (ret) { /* for in-use pages */
+ if (PageHuge(page))
+ ret = soft_offline_huge_page(page, flags);
+ else
+ ret = __soft_offline_page(page, flags);
+ } else { /* for free pages */
+ if (PageHuge(page)) {
+ set_page_hwpoison_huge_page(hpage);
+ dequeue_hwpoisoned_huge_page(hpage);
+ atomic_long_add(1 << compound_order(hpage),
+ &num_poisoned_pages);
+ } else {
+ SetPageHWPoison(page);
+ atomic_long_inc(&num_poisoned_pages);
+ }
+ }
+ unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+ return ret;
+}
--
1.8.1.2

2013-08-22 15:52:55

by Naoya Horiguchi

Subject: Re: [PATCH 1/6] mm/hwpoison: fix loss of PG_dirty flag for errors on mlocked pages

Hi Wanpeng,

On Thu, Aug 22, 2013 at 05:48:22PM +0800, Wanpeng Li wrote:
> memory_failure() stores the page flags of the error page before unmapping
> it, and (only) if the first check against the current page flags classifies
> the error page as unknown, it performs a second check against the stored
> page flags. This is because memory_failure() unmaps the error page before
> calling page_action(), and the unmapping changes the page state: in
> particular, page_remove_rmap() (called from try_to_unmap_one()) clears
> PG_mlocked, so page_action() can no longer identify mlocked pages afterwards.
>
> However, memory_failure() still cannot handle memory errors on dirty mlocked
> pages correctly: try_to_unmap_one() moves the dirty bit from the pte to the
> physical page, and the second check misses it because it only looks at the
> stored page flags. Fix this by propagating PG_dirty into the stored page
> flags when the page is dirty after unmapping.

Right. And I'm guessing that the discrepancy between pte_dirty and PageDirty
can happen in situations other than mlocked pages.
Anyway, using both of the page flags from before and after unmapping looks
right to me.

Reviewed-by: Naoya Horiguchi <[email protected]>


> Testcase:
>
> #define _GNU_SOURCE
> #include <stdlib.h>
> #include <stdio.h>
> #include <sys/mman.h>
> #include <sys/types.h>
> #include <errno.h>
>
> #define PAGES_TO_TEST 2
> #define PAGE_SIZE 4096
>
> int main(void)
> {
> 	char *mem;
> 	int i;
>
> 	mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
> 		   PROT_READ | PROT_WRITE,
> 		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
> 	if (mem == MAP_FAILED)
> 		return -1;
>
> 	for (i = 0; i < PAGES_TO_TEST; i++)
> 		mem[i * PAGE_SIZE] = 'a';
>
> 	if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
> 		return -1;
>
> 	return 0;
> }
>
> Before patch:
>
> [ 912.839247] Injecting memory failure for page 7dfb8 at 7f6b4e37b000
> [ 912.839257] MCE 0x7dfb8: clean mlocked LRU page recovery: Recovered
> [ 912.845550] MCE 0x7dfb8: clean mlocked LRU page still referenced by 1 users
> [ 912.852586] Injecting memory failure for page 7e6aa at 7f6b4e37c000
> [ 912.852594] MCE 0x7e6aa: clean mlocked LRU page recovery: Recovered
> [ 912.858936] MCE 0x7e6aa: clean mlocked LRU page still referenced by 1 users
>
> After patch:
>
> [ 163.590225] Injecting memory failure for page 91bc2f at 7f9f5b0e5000
> [ 163.590264] MCE 0x91bc2f: dirty mlocked LRU page recovery: Recovered
> [ 163.596680] MCE 0x91bc2f: dirty mlocked LRU page still referenced by 1 users
> [ 163.603831] Injecting memory failure for page 91cdd3 at 7f9f5b0e6000
> [ 163.603852] MCE 0x91cdd3: dirty mlocked LRU page recovery: Recovered
> [ 163.610305] MCE 0x91cdd3: dirty mlocked LRU page still referenced by 1 users
>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> mm/memory-failure.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index bee58d8..e156084 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1206,6 +1206,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
> for (ps = error_states;; ps++)
> if ((p->flags & ps->mask) == ps->res)
> break;
> +
> + page_flags |= (p->flags & (1UL << PG_dirty));
> +
> if (!ps->mask)
> for (ps = error_states;; ps++)
> if ((page_flags & ps->mask) == ps->res)
> --
> 1.7.7.6
>

2013-08-22 15:53:22

by Naoya Horiguchi

Subject: Re: [PATCH 2/6] mm/hwpoison: no need to hold compound lock for hugetlbfs pages

On Thu, Aug 22, 2013 at 05:48:23PM +0800, Wanpeng Li wrote:
> The compound lock was introduced by commit e9da73d67 ("thp: compound_lock."),
> and is used to serialize put_page() against __split_huge_page_refcount().
> Transparent hugepages are split in the hwpoison handler and only one subpage
> is poisoned, while hugetlbfs pages are never split, so there is no need to
> hold the compound lock for a hugetlbfs page. Replace compound_trans_order()
> with compound_order() in the places where the page is known to be a
> hugetlbfs page.
>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> mm/memory-failure.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 2c13aa7..5092e06 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -206,7 +206,7 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno,
> #ifdef __ARCH_SI_TRAPNO
> si.si_trapno = trapno;
> #endif
> - si.si_addr_lsb = compound_trans_order(compound_head(page)) + PAGE_SHIFT;
> + si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;
>
> if ((flags & MF_ACTION_REQUIRED) && t == current) {
> si.si_code = BUS_MCEERR_AR;
> @@ -983,7 +983,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
> static void set_page_hwpoison_huge_page(struct page *hpage)
> {
> int i;
> - int nr_pages = 1 << compound_trans_order(hpage);
> + int nr_pages = 1 << compound_order(hpage);
> for (i = 0; i < nr_pages; i++)
> SetPageHWPoison(hpage + i);
> }
> @@ -991,7 +991,7 @@ static void set_page_hwpoison_huge_page(struct page *hpage)
> static void clear_page_hwpoison_huge_page(struct page *hpage)
> {
> int i;
> - int nr_pages = 1 << compound_trans_order(hpage);
> + int nr_pages = 1 << compound_order(hpage);
> for (i = 0; i < nr_pages; i++)
> ClearPageHWPoison(hpage + i);
> }
> @@ -1491,7 +1491,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
> } else {
> set_page_hwpoison_huge_page(hpage);
> dequeue_hwpoisoned_huge_page(hpage);
> - atomic_long_add(1 << compound_trans_order(hpage),
> + atomic_long_add(1 << compound_order(hpage),
> &num_poisoned_pages);
> }
> return ret;
> @@ -1551,7 +1551,7 @@ int soft_offline_page(struct page *page, int flags)
> if (PageHuge(page)) {
> set_page_hwpoison_huge_page(hpage);
> dequeue_hwpoisoned_huge_page(hpage);
> - atomic_long_add(1 << compound_trans_order(hpage),
> + atomic_long_add(1 << compound_order(hpage),
> &num_poisoned_pages);
> } else {
> SetPageHWPoison(page);

We have one more compound_trans_order() in unpoison_memory(), so could you
replace that too?

With that change ...
Reviewed-by: Naoya Horiguchi <[email protected]>

Thanks,
Naoya Horiguchi

2013-08-22 16:43:36

by Naoya Horiguchi

Subject: Re: [PATCH 3/6] mm/hwpoison: fix num_poisoned_pages error statistics for thp

On Thu, Aug 22, 2013 at 05:48:24PM +0800, Wanpeng Li wrote:
> There is a race between hwpoison and unpoison: memory_failure() sets
> PG_hwpoison and increases num_poisoned_pages without holding the page lock,
> and for a thp it accounts only one page. However, unpoison can run before
> memory_failure() takes the page lock and splits the transparent hugepage, in
> which case unpoison decreases num_poisoned_pages by 1 << compound_order
> because the thp has not been split yet. That means we account one page for
> hwpoison but 1 << compound_order for unpoison. Fix this by decrementing
> num_poisoned_pages by exactly one in the non-hugetlbfs case.
>
> Signed-off-by: Wanpeng Li <[email protected]>

I think that a thp never becomes hwpoisoned without splitting, so "trying
to unpoison thp" never happens (I think that this implicit fact should be
commented somewhere or asserted with VM_BUG_ON().)
And nr_pages in unpoison_memory() can be greater than 1 for a hugetlbfs page.
So does this patch break counting when unpoisoning free hugetlbfs pages?

Thanks,
Naoya Horiguchi

> ---
> mm/memory-failure.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5092e06..6bfd51e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1350,7 +1350,7 @@ int unpoison_memory(unsigned long pfn)
> return 0;
> }
> if (TestClearPageHWPoison(p))
> - atomic_long_sub(nr_pages, &num_poisoned_pages);
> + atomic_long_dec(&num_poisoned_pages);
> pr_info("MCE: Software-unpoisoned free page %#lx\n", pfn);
> return 0;
> }
> --
> 1.8.1.2
>

2013-08-22 17:01:11

by Naoya Horiguchi

Subject: Re: [PATCH 3/6] mm/hwpoison: fix num_poisoned_pages error statistics for thp

On Thu, Aug 22, 2013 at 12:43:08PM -0400, Naoya Horiguchi wrote:
> On Thu, Aug 22, 2013 at 05:48:24PM +0800, Wanpeng Li wrote:
> > There is a race between hwpoison and unpoison: memory_failure() sets
> > PG_hwpoison and increases num_poisoned_pages without holding the page
> > lock, and for a thp it accounts only one page. However, unpoison can run
> > before memory_failure() takes the page lock and splits the transparent
> > hugepage, in which case unpoison decreases num_poisoned_pages by
> > 1 << compound_order because the thp has not been split yet. That means we
> > account one page for hwpoison but 1 << compound_order for unpoison. Fix
> > this by decrementing num_poisoned_pages by exactly one in the
> > non-hugetlbfs case.
> >
> > Signed-off-by: Wanpeng Li <[email protected]>
>
> I think that a thp never becomes hwpoisoned without splitting, so "trying
> to unpoison thp" never happens (I think that this implicit fact should be
> commented somewhere or asserted with VM_BUG_ON().)

> And nr_pages in unpoison_memory() can be greater than 1 for hugetlbfs page.
> So does this patch break counting when unpoisoning free hugetlbfs pages?

Sorry, the latter part of this remark was incorrect. Please ignore it.

- Naoya

2013-08-22 19:06:58

by Naoya Horiguchi

Subject: Re: [PATCH 4/6] mm/hwpoison: don't set migration type twice to avoid holding the heavily contended zone->lock

On Thu, Aug 22, 2013 at 05:48:25PM +0800, Wanpeng Li wrote:
> Setting the pageblock migration type takes zone->lock, which is heavily
> contended, to avoid races. However, soft offline sets the pageblock
> migration type twice while getting the page when the page is in use, is not
> a hugetlbfs page, and is not on the LRU list. There is no need to set the
> pageblock migration type, and take the heavily contended zone->lock again,
> if the first round of getting the page already set the pageblock to the
> right migration type.

Can we use get_pageblock_migratetype() outside zone->lock?

There are surely some users which call this function outside
zone->lock like free_hot_cold_pages(), __free_pages, etc.,
but I think that there's a race window where the migratetype is
updated just after the get_pageblock_migratetype() check.

> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> mm/memory-failure.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 6bfd51e..3bfb45f 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1413,7 +1413,8 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
> * was free. This flag should be kept set until the source page
> * is freed and PG_hwpoison on it is set.
> */
> - set_migratetype_isolate(p, true);
> + if (get_pageblock_migratetype(p) == MIGRATE_ISOLATE)

You meant '!=', right?
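
That is, presumably:

	if (get_pageblock_migratetype(p) != MIGRATE_ISOLATE)
		set_migratetype_isolate(p, true);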

Thanks,
Naoya Horiguchi

> + set_migratetype_isolate(p, true);
> /*
> * When the target page is a free hugepage, just remove it
> * from free hugepage list.
> --
> 1.8.1.2
>

2013-08-22 19:25:01

by Naoya Horiguchi

Subject: Re: [PATCH 5/6] mm/hwpoison: drop forward declaration of __soft_offline_page()

On Thu, Aug 22, 2013 at 05:48:26PM +0800, Wanpeng Li wrote:
> Drop the forward declaration of __soft_offline_page() by moving
> soft_offline_page() below its definition. No functional change.
>
> Signed-off-by: Wanpeng Li <[email protected]>

Reviewed-by: Naoya Horiguchi <[email protected]>

> ---
> mm/memory-failure.c | 129 ++++++++++++++++++++++++++--------------------------
> 1 file changed, 64 insertions(+), 65 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 3bfb45f..0a52571 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1498,71 +1498,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
> return ret;
> }
>
> -static int __soft_offline_page(struct page *page, int flags);
> -
> -/**
> - * soft_offline_page - Soft offline a page.
> - * @page: page to offline
> - * @flags: flags. Same as memory_failure().
> - *
> - * Returns 0 on success, otherwise negated errno.
> - *
> - * Soft offline a page, by migration or invalidation,
> - * without killing anything. This is for the case when
> - * a page is not corrupted yet (so it's still valid to access),
> - * but has had a number of corrected errors and is better taken
> - * out.
> - *
> - * The actual policy on when to do that is maintained by
> - * user space.
> - *
> - * This should never impact any application or cause data loss,
> - * however it might take some time.
> - *
> - * This is not a 100% solution for all memory, but tries to be
> - * ``good enough'' for the majority of memory.
> - */
> -int soft_offline_page(struct page *page, int flags)
> -{
> - int ret;
> - unsigned long pfn = page_to_pfn(page);
> - struct page *hpage = compound_trans_head(page);
> -
> - if (PageHWPoison(page)) {
> - pr_info("soft offline: %#lx page already poisoned\n", pfn);
> - return -EBUSY;
> - }
> - if (!PageHuge(page) && PageTransHuge(hpage)) {
> - if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
> - pr_info("soft offline: %#lx: failed to split THP\n",
> - pfn);
> - return -EBUSY;
> - }
> - }
> -
> - ret = get_any_page(page, pfn, flags);
> - if (ret < 0)
> - return ret;
> - if (ret) { /* for in-use pages */
> - if (PageHuge(page))
> - ret = soft_offline_huge_page(page, flags);
> - else
> - ret = __soft_offline_page(page, flags);
> - } else { /* for free pages */
> - if (PageHuge(page)) {
> - set_page_hwpoison_huge_page(hpage);
> - dequeue_hwpoisoned_huge_page(hpage);
> - atomic_long_add(1 << compound_order(hpage),
> - &num_poisoned_pages);
> - } else {
> - SetPageHWPoison(page);
> - atomic_long_inc(&num_poisoned_pages);
> - }
> - }
> - unset_migratetype_isolate(page, MIGRATE_MOVABLE);
> - return ret;
> -}
> -
> static int __soft_offline_page(struct page *page, int flags)
> {
> int ret;
> @@ -1649,3 +1584,66 @@ static int __soft_offline_page(struct page *page, int flags)
> }
> return ret;
> }
> +
> +/**
> + * soft_offline_page - Soft offline a page.
> + * @page: page to offline
> + * @flags: flags. Same as memory_failure().
> + *
> + * Returns 0 on success, otherwise negated errno.
> + *
> + * Soft offline a page, by migration or invalidation,
> + * without killing anything. This is for the case when
> + * a page is not corrupted yet (so it's still valid to access),
> + * but has had a number of corrected errors and is better taken
> + * out.
> + *
> + * The actual policy on when to do that is maintained by
> + * user space.
> + *
> + * This should never impact any application or cause data loss,
> + * however it might take some time.
> + *
> + * This is not a 100% solution for all memory, but tries to be
> + * ``good enough'' for the majority of memory.
> + */
> +int soft_offline_page(struct page *page, int flags)
> +{
> + int ret;
> + unsigned long pfn = page_to_pfn(page);
> + struct page *hpage = compound_trans_head(page);
> +
> + if (PageHWPoison(page)) {
> + pr_info("soft offline: %#lx page already poisoned\n", pfn);
> + return -EBUSY;
> + }
> + if (!PageHuge(page) && PageTransHuge(hpage)) {
> + if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
> + pr_info("soft offline: %#lx: failed to split THP\n",
> + pfn);
> + return -EBUSY;
> + }
> + }
> +
> + ret = get_any_page(page, pfn, flags);
> + if (ret < 0)
> + return ret;
> + if (ret) { /* for in-use pages */
> + if (PageHuge(page))
> + ret = soft_offline_huge_page(page, flags);
> + else
> + ret = __soft_offline_page(page, flags);
> + } else { /* for free pages */
> + if (PageHuge(page)) {
> + set_page_hwpoison_huge_page(hpage);
> + dequeue_hwpoisoned_huge_page(hpage);
> + atomic_long_add(1 << compound_order(hpage),
> + &num_poisoned_pages);
> + } else {
> + SetPageHWPoison(page);
> + atomic_long_inc(&num_poisoned_pages);
> + }
> + }
> + unset_migratetype_isolate(page, MIGRATE_MOVABLE);
> + return ret;
> +}
> --
> 1.8.1.2
>

2013-08-22 20:13:44

by Naoya Horiguchi

Subject: Re: [PATCH 6/6] mm/hwpoison: centralize setting PG_hwpoison and increasing num_poisoned_pages

On Thu, Aug 22, 2013 at 05:48:27PM +0800, Wanpeng Li wrote:
> soft_offline_page() invokes __soft_offline_page() for in-use normal pages
> and soft_offline_huge_page() for in-use hugetlbfs pages. On success, both do
> the same work as the free-page path: set PG_hwpoison, increase
> num_poisoned_pages, and so on. Centralize this common work in
> soft_offline_page().
>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> mm/memory-failure.c | 16 ++++------------
> 1 file changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 0a52571..3226de1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1486,15 +1486,9 @@ static int soft_offline_huge_page(struct page *page, int flags)
> ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL,
> MIGRATE_SYNC);
> put_page(hpage);
> - if (ret) {
> + if (ret)
> pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
> pfn, ret, page->flags);
> - } else {
> - set_page_hwpoison_huge_page(hpage);
> - dequeue_hwpoisoned_huge_page(hpage);
> - atomic_long_add(1 << compound_order(hpage),
> - &num_poisoned_pages);
> - }
> return ret;
> }
>
> @@ -1530,8 +1524,6 @@ static int __soft_offline_page(struct page *page, int flags)
> if (ret == 1) {
> put_page(page);
> pr_info("soft_offline: %#lx: invalidated\n", pfn);
> - SetPageHWPoison(page);
> - atomic_long_inc(&num_poisoned_pages);
> return 0;
> }
>
> @@ -1572,11 +1564,9 @@ static int __soft_offline_page(struct page *page, int flags)
> lru_add_drain_all();
> if (!is_free_buddy_page(page))
> drain_all_pages();
> - SetPageHWPoison(page);
> if (!is_free_buddy_page(page))
> pr_info("soft offline: %#lx: page leaked\n",
> pfn);
> - atomic_long_inc(&num_poisoned_pages);
> }
> } else {
> pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",

This change is not simply a cleanup; it also changes behavior.
The memory leak check should come after SetPageHWPoison().
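
That is, the original in-place ordering should be kept, roughly:

	lru_add_drain_all();
	if (!is_free_buddy_page(page))
		drain_all_pages();
	SetPageHWPoison(page); /* must precede the leak check below */
	if (!is_free_buddy_page(page))
		pr_info("soft offline: %#lx: page leaked\n", pfn);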

Thanks,
Naoya Horiguchi

> @@ -1633,7 +1623,9 @@ int soft_offline_page(struct page *page, int flags)
> ret = soft_offline_huge_page(page, flags);
> else
> ret = __soft_offline_page(page, flags);
> - } else { /* for free pages */
> + }
> +
> + if (!ret) {
> if (PageHuge(page)) {
> set_page_hwpoison_huge_page(hpage);
> dequeue_hwpoisoned_huge_page(hpage);
> --
> 1.8.1.2
>

2013-08-23 03:27:44

by Naoya Horiguchi

Subject: Re: [PATCH 3/6] mm/hwpoison: fix num_poisoned_pages error statistics for thp

Hi Wanpeng,

On Fri, Aug 23, 2013 at 07:52:40AM +0800, Wanpeng Li wrote:
> Hi Naoya,
> On Thu, Aug 22, 2013 at 12:43:08PM -0400, Naoya Horiguchi wrote:
> >On Thu, Aug 22, 2013 at 05:48:24PM +0800, Wanpeng Li wrote:
> >> There is a race between hwpoison and unpoison: memory_failure() sets
> >> PG_hwpoison and increases num_poisoned_pages without holding the page
> >> lock, and for a thp it accounts only one page. However, unpoison can run
> >> before memory_failure() takes the page lock and splits the transparent
> >> hugepage, in which case unpoison decreases num_poisoned_pages by
> >> 1 << compound_order because the thp has not been split yet. That means we
> >> account one page for hwpoison but 1 << compound_order for unpoison. Fix
> >> this by decrementing num_poisoned_pages by exactly one in the
> >> non-hugetlbfs case.
> >>
> >> Signed-off-by: Wanpeng Li <[email protected]>
> >
> >I think that a thp never becomes hwpoisoned without splitting, so "trying
> >to unpoison thp" never happens (I think that this implicit fact should be
>
> There is a race window here for hwpoison thp:

OK, thanks for the great explanation (it's worth including in the patch
description.)
And I found my previous comment was completely pointless, sorry :(

> A                                      B
> memory_failure()
>   TestSetPageHWPoison(p);
>   if (PageHuge(p))
>           nr_pages = 1 << compound_order(hpage);
>   else
>           nr_pages = 1;
>   atomic_long_add(nr_pages, &num_poisoned_pages);
>                                        unpoison_memory()
>                                          nr_pages = 1 << compound_trans_order(page);
>                                          if (TestClearPageHWPoison(p))
>                                                  atomic_long_sub(nr_pages, &num_poisoned_pages);
>   lock page
>   if (!PageHWPoison(p))
>           unlock page and return
>   hwpoison_user_mappings()
>     if (PageTransHuge(hpage))
>             split_huge_page(hpage);

When this race happens, our expectation is that num_poisoned_pages is
increased by 1, because thread A finally succeeds in hwpoisoning one normal
page. So thread B should fail to unpoison, neither clearing PageHWPoison nor
decreasing num_poisoned_pages. My suggestion is to insert a PageTransHuge
check before doing TestClearPageHWPoison, like the following:

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1cb3b7d..f551b72 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1336,6 +1336,16 @@ int unpoison_memory(unsigned long pfn)
return 0;
}

+ /*
+ * unpoison_memory() can encounter thp only when the thp is being
+ * handled by memory_failure() and the page lock is not held yet.
+ * In such case, we yield to memory_failure() and make unpoison fail.
+ */
+ if (PageTransHuge(page)) {
+ pr_info("MCE: Memory failure is now running on %#lx\n", pfn);
+ return 0;
+ }
+
nr_pages = 1 << compound_trans_order(page);

if (!get_page_unless_zero(page)) {


I think that replacing atomic_long_sub() with atomic_long_dec() still makes
sense, so you don't have to drop that change.

>
> We increase the count by one page but decrease it by 1 << compound_trans_order.
> The compound_trans_order you mentioned is used here for thp, which is why I
> didn't drop it in patch 2/6.

I don't think we need compound_trans_order() any more, because with the above
change we no longer calculate nr_pages for thp. That lets us avoid the cost of
locking/unlocking the compound lock, as described in 2/6.

> >commented somewhere or asserted with VM_BUG_ON().)
>
> I will add the VM_BUG_ON() in unpoison_memory after lock page in next
> version.

Sorry, my previous suggestion didn't make sense.

Thank you!
Naoya Horiguchi

> >And nr_pages in unpoison_memory() can be greater than 1 for a hugetlbfs page.
> >So does this patch break counting when unpoisoning free hugetlbfs pages?
> >
> >Thanks,
> >Naoya Horiguchi
> >
> >> ---
> >> mm/memory-failure.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 5092e06..6bfd51e 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1350,7 +1350,7 @@ int unpoison_memory(unsigned long pfn)
> >> return 0;
> >> }
> >> if (TestClearPageHWPoison(p))
> >> - atomic_long_sub(nr_pages, &num_poisoned_pages);
> >> + atomic_long_dec(&num_poisoned_pages);
> >> pr_info("MCE: Software-unpoisoned free page %#lx\n", pfn);
> >> return 0;
> >> }
> >> --
> >> 1.8.1.2
> >>
>