2013-08-26 08:46:29

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 1/10] mm/hwpoison: fix lose PG_dirty flag for errors on mlocked pages

memory_failure() store the page flag of the error page before doing unmap,
and (only) if the first check with page flags at the time decided the error
page is unknown, it do the second check with the stored page flag since
memory_failure() does unmapping of the error pages before doing page_action().
This unmapping changes the page state, especially page_remove_rmap() (called
from try_to_unmap_one()) clears PG_mlocked, so page_action() can't catch
mlocked pages after that.

However, memory_failure() can't handle memory errors on dirty mlocked pages
correctly. try_to_unmap_one will move the dirty bit from pte to the physical
page, the second check lose it since it check the stored page flag. This patch
fix it by restore PG_dirty flag to stored page flag if the page is dirty.

Testcase:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 2
#define PAGE_SIZE 4096

int main(void)
{
char *mem;
int i;

mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, 0, 0);

for (i = 0; i < PAGES_TO_TEST; i++)
mem[i * PAGE_SIZE] = 'a';

if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
return -1;

return 0;
}

Before patch:

[ 912.839247] Injecting memory failure for page 7dfb8 at 7f6b4e37b000
[ 912.839257] MCE 0x7dfb8: clean mlocked LRU page recovery: Recovered
[ 912.845550] MCE 0x7dfb8: clean mlocked LRU page still referenced by 1 users
[ 912.852586] Injecting memory failure for page 7e6aa at 7f6b4e37c000
[ 912.852594] MCE 0x7e6aa: clean mlocked LRU page recovery: Recovered
[ 912.858936] MCE 0x7e6aa: clean mlocked LRU page still referenced by 1 users

After patch:

[ 163.590225] Injecting memory failure for page 91bc2f at 7f9f5b0e5000
[ 163.590264] MCE 0x91bc2f: dirty mlocked LRU page recovery: Recovered
[ 163.596680] MCE 0x91bc2f: dirty mlocked LRU page still referenced by 1 users
[ 163.603831] Injecting memory failure for page 91cdd3 at 7f9f5b0e6000
[ 163.603852] MCE 0x91cdd3: dirty mlocked LRU page recovery: Recovered
[ 163.610305] MCE 0x91cdd3: dirty mlocked LRU page still referenced by 1 users

Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2c13aa7..d5686d4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1204,6 +1204,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
for (ps = error_states;; ps++)
if ((p->flags & ps->mask) == ps->res)
break;
+
+ page_flags |= (p->flags & (1UL << PG_dirty));
+
if (!ps->mask)
for (ps = error_states;; ps++)
if ((page_flags & ps->mask) == ps->res)
--
1.8.1.2


2013-08-26 08:46:31

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 2/10] mm/hwpoison: don't need to hold compound lock for hugetlbfs page

v1 -> v2:
* drop compound_trans_order completely

compound lock is introduced by commit e9da73d67("thp: compound_lock."),
it is used to serialize put_page against __split_huge_page_refcount().
In addition, transparent hugepages will be splitted in hwpoison handler
and just one subpage will be poisoned. There is unnecessary to hold
compound lock for hugetlbfs page. This patch replace compound_trans_order
by compond_order in the place where the page is hugetlbfs page.

Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
include/linux/mm.h | 14 --------------
mm/memory-failure.c | 12 ++++++------
2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f022460..1745a2a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -489,20 +489,6 @@ static inline int compound_order(struct page *page)
return (unsigned long)page[1].lru.prev;
}

-static inline int compound_trans_order(struct page *page)
-{
- int order;
- unsigned long flags;
-
- if (!PageHead(page))
- return 0;
-
- flags = compound_lock_irqsave(page);
- order = compound_order(page);
- compound_unlock_irqrestore(page, flags);
- return order;
-}
-
static inline void set_compound_order(struct page *page, unsigned long order)
{
page[1].lru.prev = (void *)order;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2c13aa7..efa6bd7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -206,7 +206,7 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno,
#ifdef __ARCH_SI_TRAPNO
si.si_trapno = trapno;
#endif
- si.si_addr_lsb = compound_trans_order(compound_head(page)) + PAGE_SHIFT;
+ si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;

if ((flags & MF_ACTION_REQUIRED) && t == current) {
si.si_code = BUS_MCEERR_AR;
@@ -983,7 +983,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
static void set_page_hwpoison_huge_page(struct page *hpage)
{
int i;
- int nr_pages = 1 << compound_trans_order(hpage);
+ int nr_pages = 1 << compound_order(hpage);
for (i = 0; i < nr_pages; i++)
SetPageHWPoison(hpage + i);
}
@@ -991,7 +991,7 @@ static void set_page_hwpoison_huge_page(struct page *hpage)
static void clear_page_hwpoison_huge_page(struct page *hpage)
{
int i;
- int nr_pages = 1 << compound_trans_order(hpage);
+ int nr_pages = 1 << compound_order(hpage);
for (i = 0; i < nr_pages; i++)
ClearPageHWPoison(hpage + i);
}
@@ -1336,7 +1336,7 @@ int unpoison_memory(unsigned long pfn)
return 0;
}

- nr_pages = 1 << compound_trans_order(page);
+ nr_pages = 1 << compound_order(page);

if (!get_page_unless_zero(page)) {
/*
@@ -1491,7 +1491,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
} else {
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_trans_order(hpage),
+ atomic_long_add(1 << compound_order(hpage),
&num_poisoned_pages);
}
return ret;
@@ -1551,7 +1551,7 @@ int soft_offline_page(struct page *page, int flags)
if (PageHuge(page)) {
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_trans_order(hpage),
+ atomic_long_add(1 << compound_order(hpage),
&num_poisoned_pages);
} else {
SetPageHWPoison(page);
--
1.7.5.4

2013-08-26 08:46:37

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 6/10] mm/hwpoison: drop forward reference declarations __soft_offline_page()

Drop forward reference declarations __soft_offline_page.

Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 128 ++++++++++++++++++++++++++--------------------------
1 file changed, 63 insertions(+), 65 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f357c91..ca714ac 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1511,71 +1511,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
return ret;
}

-static int __soft_offline_page(struct page *page, int flags);
-
-/**
- * soft_offline_page - Soft offline a page.
- * @page: page to offline
- * @flags: flags. Same as memory_failure().
- *
- * Returns 0 on success, otherwise negated errno.
- *
- * Soft offline a page, by migration or invalidation,
- * without killing anything. This is for the case when
- * a page is not corrupted yet (so it's still valid to access),
- * but has had a number of corrected errors and is better taken
- * out.
- *
- * The actual policy on when to do that is maintained by
- * user space.
- *
- * This should never impact any application or cause data loss,
- * however it might take some time.
- *
- * This is not a 100% solution for all memory, but tries to be
- * ``good enough'' for the majority of memory.
- */
-int soft_offline_page(struct page *page, int flags)
-{
- int ret;
- unsigned long pfn = page_to_pfn(page);
- struct page *hpage = compound_trans_head(page);
-
- if (PageHWPoison(page)) {
- pr_info("soft offline: %#lx page already poisoned\n", pfn);
- return -EBUSY;
- }
- if (!PageHuge(page) && PageTransHuge(hpage)) {
- if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
- pr_info("soft offline: %#lx: failed to split THP\n",
- pfn);
- return -EBUSY;
- }
- }
-
- ret = get_any_page(page, pfn, flags);
- if (ret < 0)
- return ret;
- if (ret) { /* for in-use pages */
- if (PageHuge(page))
- ret = soft_offline_huge_page(page, flags);
- else
- ret = __soft_offline_page(page, flags);
- } else { /* for free pages */
- if (PageHuge(page)) {
- set_page_hwpoison_huge_page(hpage);
- dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_order(hpage),
- &num_poisoned_pages);
- } else {
- SetPageHWPoison(page);
- atomic_long_inc(&num_poisoned_pages);
- }
- }
- unset_migratetype_isolate(page, MIGRATE_MOVABLE);
- return ret;
-}
-
static int __soft_offline_page(struct page *page, int flags)
{
int ret;
@@ -1662,3 +1597,66 @@ static int __soft_offline_page(struct page *page, int flags)
}
return ret;
}
+
+/**
+ * soft_offline_page - Soft offline a page.
+ * @page: page to offline
+ * @flags: flags. Same as memory_failure().
+ *
+ * Returns 0 on success, otherwise negated errno.
+ *
+ * Soft offline a page, by migration or invalidation,
+ * without killing anything. This is for the case when
+ * a page is not corrupted yet (so it's still valid to access),
+ * but has had a number of corrected errors and is better taken
+ * out.
+ *
+ * The actual policy on when to do that is maintained by
+ * user space.
+ *
+ * This should never impact any application or cause data loss,
+ * however it might take some time.
+ *
+ * This is not a 100% solution for all memory, but tries to be
+ * ``good enough'' for the majority of memory.
+ */
+int soft_offline_page(struct page *page, int flags)
+{
+ int ret;
+ unsigned long pfn = page_to_pfn(page);
+ struct page *hpage = compound_trans_head(page);
+
+ if (PageHWPoison(page)) {
+ pr_info("soft offline: %#lx page already poisoned\n", pfn);
+ return -EBUSY;
+ }
+ if (!PageHuge(page) && PageTransHuge(hpage)) {
+ if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
+ pr_info("soft offline: %#lx: failed to split THP\n",
+ pfn);
+ return -EBUSY;
+ }
+ }
+
+ ret = get_any_page(page, pfn, flags);
+ if (ret < 0)
+ return ret;
+ if (ret) { /* for in-use pages */
+ if (PageHuge(page))
+ ret = soft_offline_huge_page(page, flags);
+ else
+ ret = __soft_offline_page(page, flags);
+ } else { /* for free pages */
+ if (PageHuge(page)) {
+ set_page_hwpoison_huge_page(hpage);
+ dequeue_hwpoisoned_huge_page(hpage);
+ atomic_long_add(1 << compound_order(hpage),
+ &num_poisoned_pages);
+ } else {
+ SetPageHWPoison(page);
+ atomic_long_inc(&num_poisoned_pages);
+ }
+ }
+ unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+ return ret;
+}
--
1.8.1.2

2013-08-26 08:46:41

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 8/10] mm/hwpoison: fix memory failure still hold reference count after unpoison empty zero page

madvise hwpoison inject will poison the read-only empty zero page if there is
no write access before poison. Empty zero page reference count will be increased
for hwpoison, subsequent poison zero page will return directly since page has
already been set PG_hwpoison, however, page reference count is still increased
by get_user_pages_fast. The unpoison process will unpoison the empty zero page
and decrease the reference count successfully for the fist time, however,
subsequent unpoison empty zero page will return directly since page has already
been unpoisoned and without decrease the page reference count of empty zero page.
This patch fix it by decrease page reference count for empty zero page which has
already been unpoisoned and page count > 1.

Testcase:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 3
#define PAGE_SIZE 4096

int main(void)
{
char *mem;
int i;

mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);

if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
return -1;

munmap(mem, PAGES_TO_TEST * PAGE_SIZE);

return 0;
}

Add printk to dump page reference count:

[ 93.075959] Injecting memory failure for page 0x19d0 at 0xb77d8000
[ 93.076207] MCE 0x19d0: non LRU page recovery: Ignored
[ 93.076209] pfn 0x19d0, page count = 1 after memory failure
[ 93.076220] Injecting memory failure for page 0x19d0 at 0xb77d9000
[ 93.076221] MCE 0x19d0: already hardware poisoned
[ 93.076222] pfn 0x19d0, page count = 2 after memory failure
[ 93.076224] Injecting memory failure for page 0x19d0 at 0xb77da000
[ 93.076224] MCE 0x19d0: already hardware poisoned
[ 93.076225] pfn 0x19d0, page count = 3 after memory failure

Before patch:

[ 139.197474] MCE: Software-unpoisoned page 0x19d0
[ 139.197479] pfn 0x19d0, page count = 2 after unpoison memory
[ 150.478130] MCE: Page was already unpoisoned 0x19d0
[ 150.478135] pfn 0x19d0, page count = 2 after unpoison memory
[ 151.548288] MCE: Page was already unpoisoned 0x19d0
[ 151.548292] pfn 0x19d0, page count = 2 after unpoison memory

After patch:

[ 116.022122] MCE: Software-unpoisoned page 0x19d0
[ 116.022127] pfn 0x19d0, page count = 2 after unpoison memory
[ 117.256163] MCE: Page was already unpoisoned 0x19d0
[ 117.256167] pfn 0x19d0, page count = 1 after unpoison memory
[ 117.917772] MCE: Page was already unpoisoned 0x19d0
[ 117.917777] pfn 0x19d0, page count = 1 after unpoison memory

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ca714ac..fb687fd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1335,6 +1335,8 @@ int unpoison_memory(unsigned long pfn)
page = compound_head(p);

if (!PageHWPoison(p)) {
+ if (pfn == my_zero_pfn(0) && page_count(p) > 1)
+ put_page(p);
pr_info("MCE: Page was already unpoisoned %#lx\n", pfn);
return 0;
}
--
1.8.1.2

2013-08-26 08:46:47

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 10/10] mm/hwpoison: fix bug triggered by unpoison empty zero page

[ 57.579580] Injecting memory failure for page 0x19d0 at 0xb77d2000
[ 57.579824] MCE 0x19d0: non LRU page recovery: Ignored
[ 91.290453] MCE: Software-unpoisoned page 0x19d0
[ 91.290456] BUG: Bad page state in process bash pfn:019d0
[ 91.290466] page:f3461a00 count:0 mapcount:0 mapping: (null) index:0x0
[ 91.290467] page flags: 0x40000404(referenced|reserved)
[ 91.290469] Modules linked in: nfsd auth_rpcgss i915 nfs_acl nfs lockd video drm_kms_helper drm bnep rfcomm sunrpc bluetooth psmouse parport_pc ppdev lp serio_raw fscache parport gpio_ich lpc_ich mac_hid i2c_algo_bit tpm_tis wmi usb_storage hid_generic usbhid hid e1000e firewire_ohci firewire_core ahci ptp libahci pps_core crc_itu_t
[ 91.290486] CPU: 3 PID: 2123 Comm: bash Not tainted 3.11.0-rc6+ #12
[ 91.290487] Hardware name: LENOVO 7034DD7/ , BIOS 9HKT47AUS 01//2012
[ 91.290488] 00000000 00000000 e9625ea0 c15ec49b f3461a00 e9625eb8 c15ea119 c17cbf18
[ 91.290491] ef084314 000019d0 f3461a00 e9625ed8 c110dc8a f3461a00 00000001 00000000
[ 91.290494] f3461a00 40000404 00000000 e9625ef8 c110dcc1 f3461a00 f3461a00 000019d0
[ 91.290497] Call Trace:
[ 91.290501] [<c15ec49b>] dump_stack+0x41/0x52
[ 91.290504] [<c15ea119>] bad_page+0xcf/0xeb
[ 91.290515] [<c110dc8a>] free_pages_prepare+0x12a/0x140
[ 91.290517] [<c110dcc1>] free_hot_cold_page+0x21/0x110
[ 91.290519] [<c11123c1>] __put_single_page+0x21/0x30
[ 91.290521] [<c1112815>] put_page+0x25/0x40
[ 91.290524] [<c11544e7>] unpoison_memory+0x107/0x200
[ 91.290526] [<c104a537>] ? ns_capable+0x27/0x60
[ 91.290528] [<c1155720>] hwpoison_unpoison+0x20/0x30
[ 91.290530] [<c1178266>] simple_attr_write+0xb6/0xd0
[ 91.290532] [<c11781b0>] ? generic_fh_to_dentry+0x50/0x50
[ 91.290535] [<c1158c60>] vfs_write+0xa0/0x1b0
[ 91.290537] [<c11781b0>] ? generic_fh_to_dentry+0x50/0x50
[ 91.290539] [<c11590df>] SyS_write+0x4f/0x90
[ 91.290549] [<c15f9a81>] sysenter_do_call+0x12/0x22
[ 91.290550] Disabling lock debugging due to kernel taint

Testcase:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 1
#define PAGE_SIZE 4096

int main(void)
{
char *mem;

mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);

if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
return -1;

munmap(mem, PAGES_TO_TEST * PAGE_SIZE);

return 0;
}

There is one page reference count for default empty zero page, madvise_hwpoison
add another one by get_user_pages_fast. memory_hwpoison reduce one page reference
count since it's a non LRU page. unpoison_memory release the last page reference
count and free empty zero page to buddy system which is not correct since empty
zero page has PG_reserved flag. This patch fix it by don't reduce the page
reference count under 1 against empty zero page.

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fb687fd..be6b453 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1387,7 +1387,7 @@ int unpoison_memory(unsigned long pfn)
unlock_page(page);

put_page(page);
- if (freeit)
+ if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
put_page(page);

return 0;
--
1.8.1.2

2013-08-26 08:46:49

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 9/10] mm/hwpoison: change permission of corrupt-pfn/unpoison-pfn to 0400

Hwpoison inject doesn't implement read method for corrupt-pfn/unpoison-pfn
attributes:

# cat /sys/kernel/debug/hwpoison/corrupt-pfn
cat: /sys/kernel/debug/hwpoison/corrupt-pfn: Permission denied
# cat /sys/kernel/debug/hwpoison/unpoison-pfn
cat: /sys/kernel/debug/hwpoison/unpoison-pfn: Permission denied

This patch change the permission of corrupt-pfn/unpoison-pfn to 0400.

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/hwpoison-inject.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 3a61efc..8b77bfd 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -88,12 +88,12 @@ static int pfn_inject_init(void)
* hardware status change, hence do not require hardware support.
* They are mainly for testing hwpoison in software level.
*/
- dentry = debugfs_create_file("corrupt-pfn", 0600, hwpoison_dir,
+ dentry = debugfs_create_file("corrupt-pfn", 0400, hwpoison_dir,
NULL, &hwpoison_fops);
if (!dentry)
goto fail;

- dentry = debugfs_create_file("unpoison-pfn", 0600, hwpoison_dir,
+ dentry = debugfs_create_file("unpoison-pfn", 0400, hwpoison_dir,
NULL, &unpoison_fops);
if (!dentry)
goto fail;
--
1.8.1.2

2013-08-26 08:47:31

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 3/10] mm/hwpoison: fix race against poison thp

v1 -> v2:
* unpoison thp fail

There is a race between hwpoison page and unpoison page, memory_failure
set the page hwpoison and increase num_poisoned_pages without hold page
lock, and one page count will be accounted against thp for num_poisoned_pages.
However, unpoison can occur before memory_failure hold page lock and
split transparent hugepage, unpoison will decrease num_poisoned_pages
by 1 << compound_order since memory_failure has not yet split transparent
hugepage with page lock held. That means we account one page for hwpoison
and 1 << compound_order for unpoison. This patch fix it by inserting a
PageTransHuge check before doing TestClearPageHWPoison, unpoison failed
without clearing PageHWPoison and decreasing num_poisoned_pages.


A B
memory_failue
TestSetPageHWPoison(p);
if (PageHuge(p))
nr_pages = 1 << compound_order(hpage);
else
nr_pages = 1;
atomic_long_add(nr_pages, &num_poisoned_pages);
unpoison_memory
nr_pages = 1<< compound_trans_order(page);
if(TestClearPageHWPoison(p))
atomic_long_sub(nr_pages, &num_poisoned_pages);
lock page
if (!PageHWPoison(p))
unlock page and return
hwpoison_user_mappings
if (PageTransHuge(hpage))
split_huge_page(hpage);


Suggested-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5a4f4d6..a6c4752 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1339,6 +1339,16 @@ int unpoison_memory(unsigned long pfn)
return 0;
}

+ /*
+ * unpoison_memory() can encounter thp only when the thp is being
+ * worked by memory_failure() and the page lock is not held yet.
+ * In such case, we yield to memory_failure() and make unpoison fail.
+ */
+ if (PageTransHuge(page)) {
+ pr_info("MCE: Memory failure is now running on %#lx\n", pfn);
+ return 0;
+ }
+
nr_pages = 1 << compound_order(page);

if (!get_page_unless_zero(page)) {
--
1.8.1.2

2013-08-26 08:46:35

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 5/10] mm/hwpoison: don't set migration type twice to avoid hold heavy contend zone->lock

v1 -> v2:
* add more explanation in patch description.
v2 -> v3:
* set MIGRATE_ISOLATE only if it's not set.

Set pageblock migration type will hold zone->lock which is heavy contended
in system to avoid race. However, soft offline page will set pageblock
migration type twice during get page if the page is in used, not hugetlbfs
page and not on lru list. There is unnecessary to set the pageblock migration
type and hold heavy contended zone->lock again if the first round get page
have already set the pageblock to right migration type.

The trick here is migration type is MIGRATE_ISOLATE. There are other two parts
can change MIGRATE_ISOLATE except hwpoison. One is memory hoplug, however, we
hold lock_memory_hotplug() which avoid race. The second is CMA which umovable
page allocation requst can't fallback to. So it's safe here.

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 297965e..f357c91 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1426,7 +1426,8 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
* was free. This flag should be kept set until the source page
* is freed and PG_hwpoison on it is set.
*/
- set_migratetype_isolate(p, true);
+ if (get_pageblock_migratetype(p) != MIGRATE_ISOLATE)
+ set_migratetype_isolate(p, true);
/*
* When the target page is a free hugepage, just remove it
* from free hugepage list.
--
1.8.1.2

2013-08-26 08:47:46

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 7/10] mm/hwpoison: add '#' to madvise_hwpoison

Add '#' to madvise_hwpoison.

Before patch:

[ 95.892866] Injecting memory failure for page 19d0 at b7786000
[ 95.893151] MCE 0x19d0: non LRU page recovery: Ignored

After patch:

[ 95.892866] Injecting memory failure for page 0x19d0 at 0xb7786000
[ 95.893151] MCE 0x19d0: non LRU page recovery: Ignored

Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/madvise.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 95795df..588bb19 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -353,14 +353,14 @@ static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
if (ret != 1)
return ret;
if (bhv == MADV_SOFT_OFFLINE) {
- printk(KERN_INFO "Soft offlining page %lx at %lx\n",
+ pr_info("Soft offlining page %#lx at %#lx\n",
page_to_pfn(p), start);
ret = soft_offline_page(p, MF_COUNT_INCREASED);
if (ret)
break;
continue;
}
- printk(KERN_INFO "Injecting memory failure for page %lx at %lx\n",
+ pr_info("Injecting memory failure for page %#lx at %#lx\n",
page_to_pfn(p), start);
/* Ignore return value for now */
memory_failure(page_to_pfn(p), 0, MF_COUNT_INCREASED);
--
1.8.1.2

2013-08-26 08:48:13

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 4/10] mm/hwpoison: replacing atomic_long_sub() with atomic_long_dec()

Repalce atomic_long_sub() with atomic_long_dec() since the page is
normal page instead of hugetlbfs page or thp.

Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a6c4752..297965e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1363,7 +1363,7 @@ int unpoison_memory(unsigned long pfn)
return 0;
}
if (TestClearPageHWPoison(p))
- atomic_long_sub(nr_pages, &num_poisoned_pages);
+ atomic_long_dec(&num_poisoned_pages);
pr_info("MCE: Software-unpoisoned free page %#lx\n", pfn);
return 0;
}
--
1.8.1.2

2013-08-26 15:47:04

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v4 8/10] mm/hwpoison: fix memory failure still hold reference count after unpoison empty zero page

On Mon, Aug 26, 2013 at 04:46:12PM +0800, Wanpeng Li wrote:
> madvise hwpoison inject will poison the read-only empty zero page if there is
> no write access before poison. Empty zero page reference count will be increased
> for hwpoison, subsequent poison zero page will return directly since page has
> already been set PG_hwpoison, however, page reference count is still increased
> by get_user_pages_fast. The unpoison process will unpoison the empty zero page
> and decrease the reference count successfully for the fist time, however,
> subsequent unpoison empty zero page will return directly since page has already
> been unpoisoned and without decrease the page reference count of empty zero page.
> This patch fix it by decrease page reference count for empty zero page which has
> already been unpoisoned and page count > 1.

I guess that fixing on the madvise side looks reasonable to me, because this
refcount mismatch happens only when we poison with madvise(). The root cause
is that we can get refcount multiple times on a page, even if memory_failure()
or soft_offline_page() can do its work only once.

How about making madvise_hwpoison() put a page and return immediately
(without calling memory_failure() or soft_offline_page()) when the page
is already hwpoisoned?
I hope it also helps us avoid meaningless printk flood.

Thanks,
Naoya Horiguchi

> Testcase:
>
> #define _GNU_SOURCE
> #include <stdlib.h>
> #include <stdio.h>
> #include <sys/mman.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <errno.h>
>
> #define PAGES_TO_TEST 3
> #define PAGE_SIZE 4096
>
> int main(void)
> {
> char *mem;
> int i;
>
> mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
> PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
>
> if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
> return -1;
>
> munmap(mem, PAGES_TO_TEST * PAGE_SIZE);
>
> return 0;
> }
>
> Add printk to dump page reference count:
>
> [ 93.075959] Injecting memory failure for page 0x19d0 at 0xb77d8000
> [ 93.076207] MCE 0x19d0: non LRU page recovery: Ignored
> [ 93.076209] pfn 0x19d0, page count = 1 after memory failure
> [ 93.076220] Injecting memory failure for page 0x19d0 at 0xb77d9000
> [ 93.076221] MCE 0x19d0: already hardware poisoned
> [ 93.076222] pfn 0x19d0, page count = 2 after memory failure
> [ 93.076224] Injecting memory failure for page 0x19d0 at 0xb77da000
> [ 93.076224] MCE 0x19d0: already hardware poisoned
> [ 93.076225] pfn 0x19d0, page count = 3 after memory failure
>
> Before patch:
>
> [ 139.197474] MCE: Software-unpoisoned page 0x19d0
> [ 139.197479] pfn 0x19d0, page count = 2 after unpoison memory
> [ 150.478130] MCE: Page was already unpoisoned 0x19d0
> [ 150.478135] pfn 0x19d0, page count = 2 after unpoison memory
> [ 151.548288] MCE: Page was already unpoisoned 0x19d0
> [ 151.548292] pfn 0x19d0, page count = 2 after unpoison memory
>
> After patch:
>
> [ 116.022122] MCE: Software-unpoisoned page 0x19d0
> [ 116.022127] pfn 0x19d0, page count = 2 after unpoison memory
> [ 117.256163] MCE: Page was already unpoisoned 0x19d0
> [ 117.256167] pfn 0x19d0, page count = 1 after unpoison memory
> [ 117.917772] MCE: Page was already unpoisoned 0x19d0
> [ 117.917777] pfn 0x19d0, page count = 1 after unpoison memory
>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> mm/memory-failure.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ca714ac..fb687fd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1335,6 +1335,8 @@ int unpoison_memory(unsigned long pfn)
> page = compound_head(p);
>
> if (!PageHWPoison(p)) {
> + if (pfn == my_zero_pfn(0) && page_count(p) > 1)
> + put_page(p);
> pr_info("MCE: Page was already unpoisoned %#lx\n", pfn);
> return 0;
> }
> --
> 1.8.1.2
>

2013-08-26 15:47:55

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v4 9/10] mm/hwpoison: change permission of corrupt-pfn/unpoison-pfn to 0400

On Mon, Aug 26, 2013 at 05:08:38PM +0800, Wanpeng Li wrote:
> On Mon, Aug 26, 2013 at 04:46:13PM +0800, Wanpeng Li wrote:
> >Hwpoison inject doesn't implement read method for corrupt-pfn/unpoison-pfn
> >attributes:
> >
> ># cat /sys/kernel/debug/hwpoison/corrupt-pfn
> >cat: /sys/kernel/debug/hwpoison/corrupt-pfn: Permission denied
> ># cat /sys/kernel/debug/hwpoison/unpoison-pfn
> >cat: /sys/kernel/debug/hwpoison/unpoison-pfn: Permission denied
> >
> >This patch change the permission of corrupt-pfn/unpoison-pfn to 0400.
> >
> >Signed-off-by: Wanpeng Li <[email protected]>
> >---
> > mm/hwpoison-inject.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> >index 3a61efc..8b77bfd 100644
> >--- a/mm/hwpoison-inject.c
> >+++ b/mm/hwpoison-inject.c
> >@@ -88,12 +88,12 @@ static int pfn_inject_init(void)
> > * hardware status change, hence do not require hardware support.
> > * They are mainly for testing hwpoison in software level.
> > */
> >- dentry = debugfs_create_file("corrupt-pfn", 0600, hwpoison_dir,
> >+ dentry = debugfs_create_file("corrupt-pfn", 0400, hwpoison_dir,
> > NULL, &hwpoison_fops);
> > if (!dentry)
> > goto fail;
> >
> >- dentry = debugfs_create_file("unpoison-pfn", 0600, hwpoison_dir,
> >+ dentry = debugfs_create_file("unpoison-pfn", 0400, hwpoison_dir,
> > NULL, &unpoison_fops);
> > if (!dentry)
> > goto fail;
> >--
> >1.8.1.2
>
> Fix this patch:

Reviewed-by: Naoya Horiguchi <[email protected]>

2013-08-26 23:32:46

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4 8/10] mm/hwpoison: fix memory failure still hold reference count after unpoison empty zero page

On Tue, 27 Aug 2013 07:26:04 +0800 Wanpeng Li <[email protected]> wrote:

> Hi Naoya,
> On Mon, Aug 26, 2013 at 11:45:37AM -0400, Naoya Horiguchi wrote:
> >On Mon, Aug 26, 2013 at 04:46:12PM +0800, Wanpeng Li wrote:
> >> madvise hwpoison inject will poison the read-only empty zero page if there is
> >> no write access before poison. Empty zero page reference count will be increased
> >> for hwpoison, subsequent poison zero page will return directly since page has
> >> already been set PG_hwpoison, however, page reference count is still increased
> >> by get_user_pages_fast. The unpoison process will unpoison the empty zero page
> >> and decrease the reference count successfully for the fist time, however,
> >> subsequent unpoison empty zero page will return directly since page has already
> >> been unpoisoned and without decrease the page reference count of empty zero page.
> >> This patch fix it by decrease page reference count for empty zero page which has
> >> already been unpoisoned and page count > 1.
> >
> >I guess that fixing on the madvise side looks reasonable to me, because this
> >refcount mismatch happens only when we poison with madvise(). The root cause
> >is that we can get refcount multiple times on a page, even if memory_failure()
> >or soft_offline_page() can do its work only once.
> >
>
> I think this just happen in read-only before poison case against empty
> zero page.
>
> Hi Andrew,
>
> I see you have already merged the patch, which method you prefer?
>

Addressing it within the madvise code does sound more appropriate. The
change which
mm-hwpoison-fix-memory-failure-still-holding-reference-count-after-unpoisoning-empty-zero-page.patch
makes is pretty darn strange-looking at at least needs a comment
telling people what it's doing, and why.

2013-08-27 00:12:51

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v4 8/10] mm/hwpoison: fix memory failure still hold reference count after unpoison empty zero page

Hi Wanpeng,

On Tue, Aug 27, 2013 at 07:26:04AM +0800, Wanpeng Li wrote:
> Hi Naoya,
> On Mon, Aug 26, 2013 at 11:45:37AM -0400, Naoya Horiguchi wrote:
> >On Mon, Aug 26, 2013 at 04:46:12PM +0800, Wanpeng Li wrote:
> >> madvise hwpoison inject will poison the read-only empty zero page if there is
> >> no write access before poison. Empty zero page reference count will be increased
> >> for hwpoison, subsequent poison zero page will return directly since page has
> >> already been set PG_hwpoison, however, page reference count is still increased
> >> by get_user_pages_fast. The unpoison process will unpoison the empty zero page
> >> and decrease the reference count successfully for the fist time, however,
> >> subsequent unpoison empty zero page will return directly since page has already
> >> been unpoisoned and without decrease the page reference count of empty zero page.
> >> This patch fix it by decrease page reference count for empty zero page which has
> >> already been unpoisoned and page count > 1.
> >
> >I guess that fixing on the madvise side looks reasonable to me, because this
> >refcount mismatch happens only when we poison with madvise(). The root cause
> >is that we can get refcount multiple times on a page, even if memory_failure()
> >or soft_offline_page() can do its work only once.
> >
>
> I think this just happen in read-only before poison case against empty
> zero page.

OK. I agree.

> Hi Andrew,
>
> I see you have already merged the patch, which method you prefer?
>
> >How about making madvise_hwpoison() put a page and return immediately
> >(without calling memory_failure() or soft_offline_page()) when the page
> >is already hwpoisoned?
> >I hope it also helps us avoid meaningless printk flood.
> >
>
> Btw, Naoya, how about patch 10/10, any input are welcome! ;-)

No objection if you (and Andrew) decide to go with current approach.
But I think that if we shift to fix this problem in madvise(),
we don't need 10/10 any more. So it looks simpler to me.

Thanks,
Naoya Horiguchi

2013-08-27 00:47:17

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v4 8/10] mm/hwpoison: fix memory failure still hold reference count after unpoison empty zero page

On Tue, Aug 27, 2013 at 08:21:05AM +0800, Wanpeng Li wrote:
> Hi Naoya,
> On Mon, Aug 26, 2013 at 08:12:29PM -0400, Naoya Horiguchi wrote:
> >Hi Wanpeng,
> >
> >On Tue, Aug 27, 2013 at 07:26:04AM +0800, Wanpeng Li wrote:
> >> Hi Naoya,
> >> On Mon, Aug 26, 2013 at 11:45:37AM -0400, Naoya Horiguchi wrote:
> >> >On Mon, Aug 26, 2013 at 04:46:12PM +0800, Wanpeng Li wrote:
> >> >> madvise hwpoison inject will poison the read-only empty zero page if there is
> >> >> no write access before poison. Empty zero page reference count will be increased
> >> >> for hwpoison, subsequent poison zero page will return directly since page has
> >> >> already been set PG_hwpoison, however, page reference count is still increased
> >> >> by get_user_pages_fast. The unpoison process will unpoison the empty zero page
> >> >> and decrease the reference count successfully for the fist time, however,
> >> >> subsequent unpoison empty zero page will return directly since page has already
> >> >> been unpoisoned and without decrease the page reference count of empty zero page.
> >> >> This patch fix it by decrease page reference count for empty zero page which has
> >> >> already been unpoisoned and page count > 1.
> >> >
> >> >I guess that fixing on the madvise side looks reasonable to me, because this
> >> >refcount mismatch happens only when we poison with madvise(). The root cause
> >> >is that we can get refcount multiple times on a page, even if memory_failure()
> >> >or soft_offline_page() can do its work only once.
> >> >
> >>
> >> I think this just happen in read-only before poison case against empty
> >> zero page.
> >
> >OK. I agree.
> >
> >> Hi Andrew,
> >>
> >> I see you have already merged the patch, which method you prefer?
> >>
> >> >How about making madvise_hwpoison() put a page and return immediately
> >> >(without calling memory_failure() or soft_offline_page()) when the page
> >> >is already hwpoisoned?
> >> >I hope it also helps us avoid meaningless printk flood.
> >> >
> >>
> >> Btw, Naoya, how about patch 10/10, any input are welcome! ;-)
> >
> >No objection if you (and Andrew) decide to go with current approach.
>
> Andrew prefer your method, I will resend the patch w/ your suggested-by. ;-)

Thanks you :)

> >But I think that if we shift to fix this problem in madvise(),
> >we don't need 10/10 any more. So it looks simpler to me.
>
> I don't think it's same issue. There is just one page in my test case.
> #define PAGES_TO_TEST 1
> If I miss something?

Ah, OK.

BTW, in my understanding, zero pages are not exist physically (I mean that
no real page is allocated to store 4096 bytes of 0.) So there can't happen
any real MCE SRAO on zero page. So one possible solution might be that we
completely ignore all of madvise(MADV_HWPOISON) over zero pages.

Thanks,
Naoya Horiguchi

2013-08-27 01:34:36

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v4 8/10] mm/hwpoison: fix memory failure still hold reference count after unpoison empty zero page

On Tue, Aug 27, 2013 at 09:17:29AM +0800, Wanpeng Li wrote:
> On Mon, Aug 26, 2013 at 08:46:54PM -0400, Naoya Horiguchi wrote:
> >On Tue, Aug 27, 2013 at 08:21:05AM +0800, Wanpeng Li wrote:
> >> Hi Naoya,
> >> On Mon, Aug 26, 2013 at 08:12:29PM -0400, Naoya Horiguchi wrote:
> >> >Hi Wanpeng,
> >> >
> >> >On Tue, Aug 27, 2013 at 07:26:04AM +0800, Wanpeng Li wrote:
> >> >> Hi Naoya,
> >> >> On Mon, Aug 26, 2013 at 11:45:37AM -0400, Naoya Horiguchi wrote:
> >> >> >On Mon, Aug 26, 2013 at 04:46:12PM +0800, Wanpeng Li wrote:
> >> >> >> madvise hwpoison inject will poison the read-only empty zero page if there is
> >> >> >> no write access before poison. Empty zero page reference count will be increased
> >> >> >> for hwpoison, subsequent poison zero page will return directly since page has
> >> >> >> already been set PG_hwpoison, however, page reference count is still increased
> >> >> >> by get_user_pages_fast. The unpoison process will unpoison the empty zero page
> >> >> >> and decrease the reference count successfully for the fist time, however,
> >> >> >> subsequent unpoison empty zero page will return directly since page has already
> >> >> >> been unpoisoned and without decrease the page reference count of empty zero page.
> >> >> >> This patch fix it by decrease page reference count for empty zero page which has
> >> >> >> already been unpoisoned and page count > 1.
> >> >> >
> >> >> >I guess that fixing on the madvise side looks reasonable to me, because this
> >> >> >refcount mismatch happens only when we poison with madvise(). The root cause
> >> >> >is that we can get refcount multiple times on a page, even if memory_failure()
> >> >> >or soft_offline_page() can do its work only once.
> >> >> >
> >> >>
> >> >> I think this just happen in read-only before poison case against empty
> >> >> zero page.
> >> >
> >> >OK. I agree.
> >> >
> >> >> Hi Andrew,
> >> >>
> >> >> I see you have already merged the patch, which method you prefer?
> >> >>
> >> >> >How about making madvise_hwpoison() put a page and return immediately
> >> >> >(without calling memory_failure() or soft_offline_page()) when the page
> >> >> >is already hwpoisoned?
> >> >> >I hope it also helps us avoid meaningless printk flood.
> >> >> >
> >> >>
> >> >> Btw, Naoya, how about patch 10/10, any input are welcome! ;-)
> >> >
> >> >No objection if you (and Andrew) decide to go with current approach.
> >>
> >> Andrew prefer your method, I will resend the patch w/ your suggested-by. ;-)
> >
> >Thanks you :)
> >
> >> >But I think that if we shift to fix this problem in madvise(),
> >> >we don't need 10/10 any more. So it looks simpler to me.
> >>
> >> I don't think it's same issue. There is just one page in my test case.
> >> #define PAGES_TO_TEST 1
> >> If I miss something?
> >
> >Ah, OK.
>
> I complete do it in madvise codes, however, the bug mentioned in patch
> 10/10 is still there. ;-)
>
> >
> >BTW, in my understanding, zero pages are not exist physically (I mean that
> >no real page is allocated to store 4096 bytes of 0.) So there can't happen
> >any real MCE SRAO on zero page. So one possible solution might be that we
> >completely ignore all of madvise(MADV_HWPOISON) over zero pages.
>
> What's the userland visible difference against mmap w/o write access before poison
> you expect?

In this case the userland is a test program like mce-test, so my expectation
is that the test program shouldn't detect false test failures when it
accidentally calls madvise(MADV_HWPOISON) on zero pages, because there's no
real test target associated with such testcases. So I think just returning
with success return code without doing anything looks good.

Thanks,
Naoya Horiguchi

2013-08-29 06:00:19

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v4 1/10] mm/hwpoison: fix lose PG_dirty flag for errors on mlocked pages


I did a quick read and the patches look all good to me
(except the one just moving code around -- seems pointless,
forward declarations are totally fine)

Acked-by: Andi Kleen <[email protected]>

-Andi
--
[email protected] -- Speaking for myself only.