2013-09-02 12:34:07

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood

madvise_hwpoison won't check if the page is small page or huge page and traverse
in small page granularity against the range unconditional, which result in a printk
flood "MCE xxx: already hardware poisoned" if the page is huge page. This patch fix
it by increase compound_order(compound_head(page)) for huge page iterator.

Testcase:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 3
#define PAGE_SIZE 4096 * 512

int main(void)
{
char *mem;
int i;

mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, 0, 0);

if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
return -1;

munmap(mem, PAGES_TO_TEST * PAGE_SIZE);

return 0;
}

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/madvise.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6975bc8..539eeb9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -343,10 +343,11 @@ static long madvise_remove(struct vm_area_struct *vma,
*/
static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
{
+ struct page *p;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
- for (; start < end; start += PAGE_SIZE) {
- struct page *p;
+ for (; start < end; start += PAGE_SIZE <<
+ compound_order(compound_head(p))) {
int ret;

ret = get_user_pages_fast(start, 1, 0, &p);
--
1.8.1.2


2013-09-02 12:34:13

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH 4/4] mm/hwpoison: fix the lack of one reference count against poisoned page

The lack of one reference count against poisoned page for hwpoison_inject w/o
hwpoison_filter enabled result in hwpoison detect -1 users still referenced
the page, however, the number should be 0 except the poison handler held one
after successfully unmap. This patch fix it by hold one referenced count against
poisoned page for hwpoison_inject w/ and w/o hwpoison_filter enabled.

Before patch:

[ 71.902112] Injecting memory failure at pfn 224706
[ 71.902137] MCE 0x224706: dirty LRU page recovery: Failed
[ 71.902138] MCE 0x224706: dirty LRU page still referenced by -1 users

After patch:

[ 94.710860] Injecting memory failure at pfn 215b68
[ 94.710885] MCE 0x215b68: dirty LRU page recovery: Recovered

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/hwpoison-inject.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index afc2daa..4c84678 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -20,8 +20,6 @@ static int hwpoison_inject(void *data, u64 val)
if (!capable(CAP_SYS_ADMIN))
return -EPERM;

- if (!hwpoison_filter_enable)
- goto inject;
if (!pfn_valid(pfn))
return -ENXIO;

@@ -33,6 +31,9 @@ static int hwpoison_inject(void *data, u64 val)
if (!get_page_unless_zero(hpage))
return 0;

+ if (!hwpoison_filter_enable)
+ goto inject;
+
if (!PageLRU(p) && !PageHuge(p))
shake_page(p, 0);
/*
--
1.8.1.2

2013-09-02 12:34:10

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH 3/4] mm/hwpoison: fix false report 2nd try page recovery

If the page is poisoned by software inject w/ MF_COUNT_INCREASED flag, there
is a false report 2nd try page recovery which is not truth, this patch fix it
by report first try free buddy page recovery if MF_COUNT_INCREASED is set.

Before patch:

[ 346.332041] Injecting memory failure at pfn 200010
[ 346.332189] MCE 0x200010: free buddy, 2nd try page recovery: Delayed

After patch:

[ 297.742600] Injecting memory failure at pfn 200010
[ 297.742941] MCE 0x200010: free buddy page recovery: Delayed

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b114570..6293164 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1114,8 +1114,10 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
* shake_page could have turned it free.
*/
if (is_free_buddy_page(p)) {
- action_result(pfn, "free buddy, 2nd try",
- DELAYED);
+ if (flags & MF_COUNT_INCREASED)
+ action_result(pfn, "free buddy", DELAYED);
+ else
+ action_result(pfn, "free buddy, 2nd try", DELAYED);
return 0;
}
action_result(pfn, "non LRU", IGNORED);
--
1.7.5.4

2013-09-02 12:34:42

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH 2/4] mm/hwpoison: fix miss catch transparent huge page

PageTransHuge() can't guarantee the page is transparent huge page since it
return true for both transparent huge and hugetlbfs pages. This patch fix
it by check the page is also !hugetlbfs page.

Before patch:

[ 121.571128] Injecting memory failure at pfn 23a200
[ 121.571141] MCE 0x23a200: huge page recovery: Delayed
[ 140.355100] MCE: Memory failure is now running on 0x23a200

After patch:

[ 94.290793] Injecting memory failure at pfn 23a000
[ 94.290800] MCE 0x23a000: huge page recovery: Delayed
[ 105.722303] MCE: Software-unpoisoned page 0x23a000

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e28ee77..b114570 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1349,7 +1349,7 @@ int unpoison_memory(unsigned long pfn)
* worked by memory_failure() and the page lock is not held yet.
* In such case, we yield to memory_failure() and make unpoison fail.
*/
- if (PageTransHuge(page)) {
+ if (PageTransHuge(page) && !PageHuge(page)) {
pr_info("MCE: Memory failure is now running on %#lx\n", pfn);
return 0;
}
--
1.8.1.2

2013-09-02 18:34:34

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood

On Mon, Sep 02, 2013 at 08:33:41PM +0800, Wanpeng Li wrote:
> madvise_hwpoison won't check if the page is small page or huge page and traverse
> in small page granularity against the range unconditional, which result in a printk
> flood "MCE xxx: already hardware poisoned" if the page is huge page. This patch fix
> it by increase compound_order(compound_head(page)) for huge page iterator.
>
> Testcase:
>
> #define _GNU_SOURCE
> #include <stdlib.h>
> #include <stdio.h>
> #include <sys/mman.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <errno.h>
>
> #define PAGES_TO_TEST 3
> #define PAGE_SIZE 4096 * 512
>
> int main(void)
> {
> char *mem;
> int i;
>
> mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
> PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, 0, 0);
>
> if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
> return -1;
>
> munmap(mem, PAGES_TO_TEST * PAGE_SIZE);
>
> return 0;
> }
>
> Signed-off-by: Wanpeng Li <[email protected]>

Reviewed-by: Naoya Horiguchi <[email protected]>

> ---
> mm/madvise.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6975bc8..539eeb9 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -343,10 +343,11 @@ static long madvise_remove(struct vm_area_struct *vma,
> */
> static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
> {
> + struct page *p;
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
> - for (; start < end; start += PAGE_SIZE) {
> - struct page *p;
> + for (; start < end; start += PAGE_SIZE <<
> + compound_order(compound_head(p))) {
> int ret;
>
> ret = get_user_pages_fast(start, 1, 0, &p);
> --
> 1.8.1.2
>

2013-09-02 18:34:45

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm/hwpoison: fix the lack of one reference count against poisoned page

On Mon, Sep 02, 2013 at 08:33:44PM +0800, Wanpeng Li wrote:
> The lack of one reference count against poisoned page for hwpoison_inject w/o
> hwpoison_filter enabled result in hwpoison detect -1 users still referenced
> the page, however, the number should be 0 except the poison handler held one
> after successfully unmap. This patch fix it by hold one referenced count against
> poisoned page for hwpoison_inject w/ and w/o hwpoison_filter enabled.
>
> Before patch:
>
> [ 71.902112] Injecting memory failure at pfn 224706
> [ 71.902137] MCE 0x224706: dirty LRU page recovery: Failed
> [ 71.902138] MCE 0x224706: dirty LRU page still referenced by -1 users
>
> After patch:
>
> [ 94.710860] Injecting memory failure at pfn 215b68
> [ 94.710885] MCE 0x215b68: dirty LRU page recovery: Recovered
>
> Signed-off-by: Wanpeng Li <[email protected]>

Reviewed-by: Naoya Horiguchi <[email protected]>

> ---
> mm/hwpoison-inject.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index afc2daa..4c84678 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -20,8 +20,6 @@ static int hwpoison_inject(void *data, u64 val)
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> - if (!hwpoison_filter_enable)
> - goto inject;
> if (!pfn_valid(pfn))
> return -ENXIO;
>
> @@ -33,6 +31,9 @@ static int hwpoison_inject(void *data, u64 val)
> if (!get_page_unless_zero(hpage))
> return 0;
>
> + if (!hwpoison_filter_enable)
> + goto inject;
> +
> if (!PageLRU(p) && !PageHuge(p))
> shake_page(p, 0);
> /*
> --
> 1.8.1.2
>

2013-09-02 18:34:47

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/hwpoison: fix false report 2nd try page recovery

On Mon, Sep 02, 2013 at 08:33:43PM +0800, Wanpeng Li wrote:
> If the page is poisoned by software inject w/ MF_COUNT_INCREASED flag, there
> is a false report 2nd try page recovery which is not truth, this patch fix it
> by report first try free buddy page recovery if MF_COUNT_INCREASED is set.
>
> Before patch:
>
> [ 346.332041] Injecting memory failure at pfn 200010
> [ 346.332189] MCE 0x200010: free buddy, 2nd try page recovery: Delayed
>
> After patch:
>
> [ 297.742600] Injecting memory failure at pfn 200010
> [ 297.742941] MCE 0x200010: free buddy page recovery: Delayed
>
> Signed-off-by: Wanpeng Li <[email protected]>

Reviewed-by: Naoya Horiguchi <[email protected]>

> ---
> mm/memory-failure.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b114570..6293164 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1114,8 +1114,10 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
> * shake_page could have turned it free.
> */
> if (is_free_buddy_page(p)) {
> - action_result(pfn, "free buddy, 2nd try",
> - DELAYED);
> + if (flags & MF_COUNT_INCREASED)
> + action_result(pfn, "free buddy", DELAYED);
> + else
> + action_result(pfn, "free buddy, 2nd try", DELAYED);
> return 0;
> }
> action_result(pfn, "non LRU", IGNORED);
> --
> 1.7.5.4
>

2013-09-02 18:34:43

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm/hwpoison: fix miss catch transparent huge page

On Mon, Sep 02, 2013 at 08:33:42PM +0800, Wanpeng Li wrote:
> PageTransHuge() can't guarantee the page is transparent huge page since it
> return true for both transparent huge and hugetlbfs pages. This patch fix
> it by check the page is also !hugetlbfs page.
>
> Before patch:
>
> [ 121.571128] Injecting memory failure at pfn 23a200
> [ 121.571141] MCE 0x23a200: huge page recovery: Delayed
> [ 140.355100] MCE: Memory failure is now running on 0x23a200
>
> After patch:
>
> [ 94.290793] Injecting memory failure at pfn 23a000
> [ 94.290800] MCE 0x23a000: huge page recovery: Delayed
> [ 105.722303] MCE: Software-unpoisoned page 0x23a000
>
> Signed-off-by: Wanpeng Li <[email protected]>

PageTransHuge doesn't care about hugetlbfs at all, assuming that it
shouldn't be called hugetlbfs context as commented.

/*
* PageHuge() only returns true for hugetlbfs pages, but not for
* normal or transparent huge pages.
*
* PageTransHuge() returns true for both transparent huge and
* hugetlbfs pages, but not normal pages. PageTransHuge() can only be
* called only in the core VM paths where hugetlbfs pages can't exist.
*/
static inline int PageTransHuge(struct page *page)

I think it's for the ultra optimization of thp, so we can't change that.
So we need to follow the pattern whenever possible.

if (PageHuge) {
hugetlb specific code
} else if (PageTransHuge) {
thp specific code
}
normal page code / common code

> ---
> mm/memory-failure.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index e28ee77..b114570 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1349,7 +1349,7 @@ int unpoison_memory(unsigned long pfn)
> * worked by memory_failure() and the page lock is not held yet.
> * In such case, we yield to memory_failure() and make unpoison fail.
> */
> - if (PageTransHuge(page)) {
> + if (PageTransHuge(page) && !PageHuge(page)) {
> pr_info("MCE: Memory failure is now running on %#lx\n", pfn);
> return 0;
> }

I think that we can effectively follow the above pattern by reversing
these two checks.

Thanks,
Naoya Horiguchi