memory_failure() store the page flag of the error page before doing unmap,
and (only) if the first check with page flags at the time decided the error
page is unknown, it do the second check with the stored page flag since
memory_failure() does unmapping of the error pages before doing page_action().
This unmapping changes the page state, especially page_remove_rmap() (called
from try_to_unmap_one()) clears PG_mlocked, so page_action() can't catch
mlocked pages after that.
However, memory_failure() can't handle memory errors on dirty mlocked pages
correctly. try_to_unmap_one will move the dirty bit from pte to the physical
page, the second check lose it since it check the stored page flag. This patch
fix it by restore PG_dirty flag to stored page flag if the page is dirty.
Testcase:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <errno.h>
#define PAGES_TO_TEST 2
#define PAGE_SIZE 4096
int main(void)
{
char *mem;
int i;
mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, 0, 0);
for (i = 0; i < PAGES_TO_TEST; i++)
mem[i * PAGE_SIZE] = 'a';
if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
return -1;
return 0;
}
Before patch:
[ 912.839247] Injecting memory failure for page 7dfb8 at 7f6b4e37b000
[ 912.839257] MCE 0x7dfb8: clean mlocked LRU page recovery: Recovered
[ 912.845550] MCE 0x7dfb8: clean mlocked LRU page still referenced by 1 users
[ 912.852586] Injecting memory failure for page 7e6aa at 7f6b4e37c000
[ 912.852594] MCE 0x7e6aa: clean mlocked LRU page recovery: Recovered
[ 912.858936] MCE 0x7e6aa: clean mlocked LRU page still referenced by 1 users
After patch:
[ 163.590225] Injecting memory failure for page 91bc2f at 7f9f5b0e5000
[ 163.590264] MCE 0x91bc2f: dirty mlocked LRU page recovery: Recovered
[ 163.596680] MCE 0x91bc2f: dirty mlocked LRU page still referenced by 1 users
[ 163.603831] Injecting memory failure for page 91cdd3 at 7f9f5b0e6000
[ 163.603852] MCE 0x91cdd3: dirty mlocked LRU page recovery: Recovered
[ 163.610305] MCE 0x91cdd3: dirty mlocked LRU page still referenced by 1 users
Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2c13aa7..d5686d4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1204,6 +1204,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
for (ps = error_states;; ps++)
if ((p->flags & ps->mask) == ps->res)
break;
+
+ page_flags |= (p->flags & (1UL << PG_dirty));
+
if (!ps->mask)
for (ps = error_states;; ps++)
if ((page_flags & ps->mask) == ps->res)
--
1.8.1.2
v1 -> v2:
* unpoison thp fail
There is a race between hwpoison page and unpoison page, memory_failure
set the page hwpoison and increase num_poisoned_pages without hold page
lock, and one page count will be accounted against thp for num_poisoned_pages.
However, unpoison can occur before memory_failure hold page lock and
split transparent hugepage, unpoison will decrease num_poisoned_pages
by 1 << compound_order since memory_failure has not yet split transparent
hugepage with page lock held. That means we account one page for hwpoison
and 1 << compound_order for unpoison. This patch fix it by inserting a
PageTransHuge check before doing TestClearPageHWPoison, unpoison failed
without clearing PageHWPoison and decreasing num_poisoned_pages.
A B
memory_failue
TestSetPageHWPoison(p);
if (PageHuge(p))
nr_pages = 1 << compound_order(hpage);
else
nr_pages = 1;
atomic_long_add(nr_pages, &num_poisoned_pages);
unpoison_memory
nr_pages = 1<< compound_trans_order(page);
if(TestClearPageHWPoison(p))
atomic_long_sub(nr_pages, &num_poisoned_pages);
lock page
if (!PageHWPoison(p))
unlock page and return
hwpoison_user_mappings
if (PageTransHuge(hpage))
split_huge_page(hpage);
Suggested-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5a4f4d6..a6c4752 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1339,6 +1339,16 @@ int unpoison_memory(unsigned long pfn)
return 0;
}
+ /*
+ * unpoison_memory() can encounter thp only when the thp is being
+ * worked by memory_failure() and the page lock is not held yet.
+ * In such case, we yield to memory_failure() and make unpoison fail.
+ */
+ if (PageTransHuge(page)) {
+ pr_info("MCE: Memory failure is now running on %#lx\n", pfn);
+ return 0;
+ }
+
nr_pages = 1 << compound_order(page);
if (!get_page_unless_zero(page)) {
--
1.8.1.2
Drop forward reference declarations __soft_offline_page.
Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 128 ++++++++++++++++++++++++++--------------------------
1 file changed, 63 insertions(+), 65 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f357c91..ca714ac 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1511,71 +1511,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
return ret;
}
-static int __soft_offline_page(struct page *page, int flags);
-
-/**
- * soft_offline_page - Soft offline a page.
- * @page: page to offline
- * @flags: flags. Same as memory_failure().
- *
- * Returns 0 on success, otherwise negated errno.
- *
- * Soft offline a page, by migration or invalidation,
- * without killing anything. This is for the case when
- * a page is not corrupted yet (so it's still valid to access),
- * but has had a number of corrected errors and is better taken
- * out.
- *
- * The actual policy on when to do that is maintained by
- * user space.
- *
- * This should never impact any application or cause data loss,
- * however it might take some time.
- *
- * This is not a 100% solution for all memory, but tries to be
- * ``good enough'' for the majority of memory.
- */
-int soft_offline_page(struct page *page, int flags)
-{
- int ret;
- unsigned long pfn = page_to_pfn(page);
- struct page *hpage = compound_trans_head(page);
-
- if (PageHWPoison(page)) {
- pr_info("soft offline: %#lx page already poisoned\n", pfn);
- return -EBUSY;
- }
- if (!PageHuge(page) && PageTransHuge(hpage)) {
- if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
- pr_info("soft offline: %#lx: failed to split THP\n",
- pfn);
- return -EBUSY;
- }
- }
-
- ret = get_any_page(page, pfn, flags);
- if (ret < 0)
- return ret;
- if (ret) { /* for in-use pages */
- if (PageHuge(page))
- ret = soft_offline_huge_page(page, flags);
- else
- ret = __soft_offline_page(page, flags);
- } else { /* for free pages */
- if (PageHuge(page)) {
- set_page_hwpoison_huge_page(hpage);
- dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_order(hpage),
- &num_poisoned_pages);
- } else {
- SetPageHWPoison(page);
- atomic_long_inc(&num_poisoned_pages);
- }
- }
- unset_migratetype_isolate(page, MIGRATE_MOVABLE);
- return ret;
-}
-
static int __soft_offline_page(struct page *page, int flags)
{
int ret;
@@ -1662,3 +1597,66 @@ static int __soft_offline_page(struct page *page, int flags)
}
return ret;
}
+
+/**
+ * soft_offline_page - Soft offline a page.
+ * @page: page to offline
+ * @flags: flags. Same as memory_failure().
+ *
+ * Returns 0 on success, otherwise negated errno.
+ *
+ * Soft offline a page, by migration or invalidation,
+ * without killing anything. This is for the case when
+ * a page is not corrupted yet (so it's still valid to access),
+ * but has had a number of corrected errors and is better taken
+ * out.
+ *
+ * The actual policy on when to do that is maintained by
+ * user space.
+ *
+ * This should never impact any application or cause data loss,
+ * however it might take some time.
+ *
+ * This is not a 100% solution for all memory, but tries to be
+ * ``good enough'' for the majority of memory.
+ */
+int soft_offline_page(struct page *page, int flags)
+{
+ int ret;
+ unsigned long pfn = page_to_pfn(page);
+ struct page *hpage = compound_trans_head(page);
+
+ if (PageHWPoison(page)) {
+ pr_info("soft offline: %#lx page already poisoned\n", pfn);
+ return -EBUSY;
+ }
+ if (!PageHuge(page) && PageTransHuge(hpage)) {
+ if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
+ pr_info("soft offline: %#lx: failed to split THP\n",
+ pfn);
+ return -EBUSY;
+ }
+ }
+
+ ret = get_any_page(page, pfn, flags);
+ if (ret < 0)
+ return ret;
+ if (ret) { /* for in-use pages */
+ if (PageHuge(page))
+ ret = soft_offline_huge_page(page, flags);
+ else
+ ret = __soft_offline_page(page, flags);
+ } else { /* for free pages */
+ if (PageHuge(page)) {
+ set_page_hwpoison_huge_page(hpage);
+ dequeue_hwpoisoned_huge_page(hpage);
+ atomic_long_add(1 << compound_order(hpage),
+ &num_poisoned_pages);
+ } else {
+ SetPageHWPoison(page);
+ atomic_long_inc(&num_poisoned_pages);
+ }
+ }
+ unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+ return ret;
+}
--
1.8.1.2
Repalce atomic_long_sub() with atomic_long_dec() since the page is
normal page instead of hugetlbfs page or thp.
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a6c4752..297965e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1363,7 +1363,7 @@ int unpoison_memory(unsigned long pfn)
return 0;
}
if (TestClearPageHWPoison(p))
- atomic_long_sub(nr_pages, &num_poisoned_pages);
+ atomic_long_dec(&num_poisoned_pages);
pr_info("MCE: Software-unpoisoned free page %#lx\n", pfn);
return 0;
}
--
1.8.1.2
v1 -> v2:
* add more explanation in patch description.
Set pageblock migration type will hold zone->lock which is heavy contended
in system to avoid race. However, soft offline page will set pageblock
migration type twice during get page if the page is in used, not hugetlbfs
page and not on lru list. There is unnecessary to set the pageblock migration
type and hold heavy contended zone->lock again if the first round get page
have already set the pageblock to right migration type.
The trick here is migration type is MIGRATE_ISOLATE. There are other two parts
can change MIGRATE_ISOLATE except hwpoison. One is memory hoplug, however, we
hold lock_memory_hotplug() which avoid race. The second is CMA which umovable
page allocation requst can't fallback to. So it's safe here.
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 297965e..f357c91 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1426,7 +1426,8 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
* was free. This flag should be kept set until the source page
* is freed and PG_hwpoison on it is set.
*/
- set_migratetype_isolate(p, true);
+ if (get_pageblock_migratetype(p) == MIGRATE_ISOLATE)
+ set_migratetype_isolate(p, true);
/*
* When the target page is a free hugepage, just remove it
* from free hugepage list.
--
1.8.1.2
Add '#' to madvise_hwpoison.
Before patch:
[ 95.892866] Injecting memory failure for page 19d0 at b7786000
[ 95.893151] MCE 0x19d0: non LRU page recovery: Ignored
After patch:
[ 95.892866] Injecting memory failure for page 0x19d0 at 0xb7786000
[ 95.893151] MCE 0x19d0: non LRU page recovery: Ignored
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/madvise.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 95795df..588bb19 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -353,14 +353,14 @@ static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
if (ret != 1)
return ret;
if (bhv == MADV_SOFT_OFFLINE) {
- printk(KERN_INFO "Soft offlining page %lx at %lx\n",
+ pr_info("Soft offlining page %#lx at %#lx\n",
page_to_pfn(p), start);
ret = soft_offline_page(p, MF_COUNT_INCREASED);
if (ret)
break;
continue;
}
- printk(KERN_INFO "Injecting memory failure for page %lx at %lx\n",
+ pr_info("Injecting memory failure for page %#lx at %#lx\n",
page_to_pfn(p), start);
/* Ignore return value for now */
memory_failure(page_to_pfn(p), 0, MF_COUNT_INCREASED);
--
1.8.1.2
v1 -> v2:
* drop compound_trans_order completely
compound lock is introduced by commit e9da73d67("thp: compound_lock."),
it is used to serialize put_page against __split_huge_page_refcount().
In addition, transparent hugepages will be splitted in hwpoison handler
and just one subpage will be poisoned. There is unnecessary to hold
compound lock for hugetlbfs page. This patch replace compound_trans_order
by compond_order in the place where the page is hugetlbfs page.
Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
include/linux/mm.h | 14 --------------
mm/memory-failure.c | 12 ++++++------
2 files changed, 6 insertions(+), 20 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f022460..1745a2a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -489,20 +489,6 @@ static inline int compound_order(struct page *page)
return (unsigned long)page[1].lru.prev;
}
-static inline int compound_trans_order(struct page *page)
-{
- int order;
- unsigned long flags;
-
- if (!PageHead(page))
- return 0;
-
- flags = compound_lock_irqsave(page);
- order = compound_order(page);
- compound_unlock_irqrestore(page, flags);
- return order;
-}
-
static inline void set_compound_order(struct page *page, unsigned long order)
{
page[1].lru.prev = (void *)order;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2c13aa7..efa6bd7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -206,7 +206,7 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno,
#ifdef __ARCH_SI_TRAPNO
si.si_trapno = trapno;
#endif
- si.si_addr_lsb = compound_trans_order(compound_head(page)) + PAGE_SHIFT;
+ si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;
if ((flags & MF_ACTION_REQUIRED) && t == current) {
si.si_code = BUS_MCEERR_AR;
@@ -983,7 +983,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
static void set_page_hwpoison_huge_page(struct page *hpage)
{
int i;
- int nr_pages = 1 << compound_trans_order(hpage);
+ int nr_pages = 1 << compound_order(hpage);
for (i = 0; i < nr_pages; i++)
SetPageHWPoison(hpage + i);
}
@@ -991,7 +991,7 @@ static void set_page_hwpoison_huge_page(struct page *hpage)
static void clear_page_hwpoison_huge_page(struct page *hpage)
{
int i;
- int nr_pages = 1 << compound_trans_order(hpage);
+ int nr_pages = 1 << compound_order(hpage);
for (i = 0; i < nr_pages; i++)
ClearPageHWPoison(hpage + i);
}
@@ -1336,7 +1336,7 @@ int unpoison_memory(unsigned long pfn)
return 0;
}
- nr_pages = 1 << compound_trans_order(page);
+ nr_pages = 1 << compound_order(page);
if (!get_page_unless_zero(page)) {
/*
@@ -1491,7 +1491,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
} else {
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_trans_order(hpage),
+ atomic_long_add(1 << compound_order(hpage),
&num_poisoned_pages);
}
return ret;
@@ -1551,7 +1551,7 @@ int soft_offline_page(struct page *page, int flags)
if (PageHuge(page)) {
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
- atomic_long_add(1 << compound_trans_order(hpage),
+ atomic_long_add(1 << compound_order(hpage),
&num_poisoned_pages);
} else {
SetPageHWPoison(page);
--
1.7.5.4
On Fri, Aug 23, 2013 at 06:30:38PM +0800, Wanpeng Li wrote:
> Repalce atomic_long_sub() with atomic_long_dec() since the page is
> normal page instead of hugetlbfs page or thp.
>
> Signed-off-by: Wanpeng Li <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
> ---
> mm/memory-failure.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index a6c4752..297965e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1363,7 +1363,7 @@ int unpoison_memory(unsigned long pfn)
> return 0;
> }
> if (TestClearPageHWPoison(p))
> - atomic_long_sub(nr_pages, &num_poisoned_pages);
> + atomic_long_dec(&num_poisoned_pages);
> pr_info("MCE: Software-unpoisoned free page %#lx\n", pfn);
> return 0;
> }
> --
> 1.8.1.2
>
On Fri, Aug 23, 2013 at 06:30:39PM +0800, Wanpeng Li wrote:
> v1 -> v2:
> * add more explanation in patch description.
>
> Set pageblock migration type will hold zone->lock which is heavy contended
> in system to avoid race. However, soft offline page will set pageblock
> migration type twice during get page if the page is in used, not hugetlbfs
> page and not on lru list. There is unnecessary to set the pageblock migration
> type and hold heavy contended zone->lock again if the first round get page
> have already set the pageblock to right migration type.
>
> The trick here is migration type is MIGRATE_ISOLATE. There are other two parts
> can change MIGRATE_ISOLATE except hwpoison. One is memory hoplug, however, we
> hold lock_memory_hotplug() which avoid race. The second is CMA which umovable
> page allocation requst can't fallback to. So it's safe here.
>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> mm/memory-failure.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 297965e..f357c91 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1426,7 +1426,8 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
> * was free. This flag should be kept set until the source page
> * is freed and PG_hwpoison on it is set.
> */
> - set_migratetype_isolate(p, true);
> + if (get_pageblock_migratetype(p) == MIGRATE_ISOLATE)
> + set_migratetype_isolate(p, true);
Do you really mean "we set MIGRATE_ISOLATE only if it's already set?"
Thanks,
Naoya Horiguchi
> /*
> * When the target page is a free hugepage, just remove it
> * from free hugepage list.
> --
> 1.8.1.2
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>
On Fri, Aug 23, 2013 at 06:30:41PM +0800, Wanpeng Li wrote:
> Add '#' to madvise_hwpoison.
>
> Before patch:
>
> [ 95.892866] Injecting memory failure for page 19d0 at b7786000
> [ 95.893151] MCE 0x19d0: non LRU page recovery: Ignored
>
> After patch:
>
> [ 95.892866] Injecting memory failure for page 0x19d0 at 0xb7786000
> [ 95.893151] MCE 0x19d0: non LRU page recovery: Ignored
>
> Signed-off-by: Wanpeng Li <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
> ---
> mm/madvise.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 95795df..588bb19 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -353,14 +353,14 @@ static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
> if (ret != 1)
> return ret;
> if (bhv == MADV_SOFT_OFFLINE) {
> - printk(KERN_INFO "Soft offlining page %lx at %lx\n",
> + pr_info("Soft offlining page %#lx at %#lx\n",
> page_to_pfn(p), start);
> ret = soft_offline_page(p, MF_COUNT_INCREASED);
> if (ret)
> break;
> continue;
> }
> - printk(KERN_INFO "Injecting memory failure for page %lx at %lx\n",
> + pr_info("Injecting memory failure for page %#lx at %#lx\n",
> page_to_pfn(p), start);
> /* Ignore return value for now */
> memory_failure(page_to_pfn(p), 0, MF_COUNT_INCREASED);
> --
> 1.8.1.2
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>