2015-07-31 06:49:21

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v2 0/5] hwpoison: fixes on v4.2-rc4

This is v2 of hwpoison fix series for v4.2.
I reflected the feedback for v1, and tried another solution for "reuse just
after soft-offline" problem (see patch 5/5.)

General description (mostly identical to v1)
===================

Recently I addressed a few of hwpoison race problems and the patches are merged
on v4.2-rc1. It made progress, but unfortunately some problems still remain due
to less coverage of my testing. So I'm trying to fix or avoid them in this series.

One point I'm expecting to discuss is that patch 4/5 changes the page flag set
to be checked on free time. In current behavior, __PG_HWPOISON is not supposed
to be set when the page is freed. I think that there is no strong reason for this
behavior, and it causes a problem hard to fix only in error handler side (because
__PG_HWPOISON could be set at arbitrary timing.) So I suggest to change it.

Test
====

With this patchset, hwpoison stress testing in official mce-test testsuite
(which previously failed) passes.

Thanks,
Naoya Horiguchi
---
Tree: https://github.com/Naoya-Horiguchi/linux/tree/v4.2-rc4/hwpoison.v2
---
Summary:

Naoya Horiguchi (5):
mm/memory-failure: unlock_page before put_page
mm/memory-failure: fix race in counting num_poisoned_pages
mm/memory-failure: give up error handling for non-tail-refcounted thp
mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*
mm/memory-failure: set PageHWPoison before migrate_pages()

include/linux/page-flags.h | 10 +++++++---
mm/huge_memory.c | 7 +------
mm/memory-failure.c | 32 ++++++++++++++++++--------------
mm/migrate.c | 8 ++++++--
mm/page_alloc.c | 4 ++++
5 files changed, 36 insertions(+), 25 deletions(-)-


2015-07-31 06:49:20

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v2 1/5] mm/memory-failure: unlock_page before put_page

In "just unpoisoned" path, we do put_page and then unlock_page, which is a
wrong order and causes "freeing locked page" bug. So let's fix it.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git v4.2-rc4.orig/mm/memory-failure.c v4.2-rc4/mm/memory-failure.c
index c53543d89282..04d677048af7 100644
--- v4.2-rc4.orig/mm/memory-failure.c
+++ v4.2-rc4/mm/memory-failure.c
@@ -1209,9 +1209,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
if (!PageHWPoison(p)) {
printk(KERN_ERR "MCE %#lx: just unpoisoned\n", pfn);
atomic_long_sub(nr_pages, &num_poisoned_pages);
+ unlock_page(hpage);
put_page(hpage);
- res = 0;
- goto out;
+ return 0;
}
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
--
2.4.3

2015-07-31 06:49:28

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v2 4/5] mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*

The race condition addressed in commit add05cecef80 ("mm: soft-offline: don't
free target page in successful page migration") was not closed completely,
because that can happen not only for soft-offline, but also for hard-offline.
Consider that a slab page is about to be freed into buddy pool, and then an
uncorrected memory error hits the page just after entering __free_one_page(),
then VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP) is triggered,
despite the fact that it's not necessary because the data on the affected
page is not consumed.

To solve it, this patch drops __PG_HWPOISON from page flag checks at
allocation/free time. I think it's justified because __PG_HWPOISON flags is
defined to prevent the page from being reused, and setting it outside the
page's alloc-free cycle is a designed behavior (not a bug.)

For recent months, I was annoyed about BUG_ON when soft-offlined page remains
on lru cache list for a while, which is avoided by calling put_page() instead
of putback_lru_page() in page migration's success path. This means that this
patch reverts a major change from commit add05cecef80 about the new refcounting
rule of soft-offlined pages, so "reuse window" revives. This will be closed
by a subsequent patch.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
v1 -> v2:
- call put_page() when migration succeeded in reason == MR_MEMORY_FAILURE
- refrain from reviving "reuse prevention" by MIGRATE_ISOLATE flag
---
include/linux/page-flags.h | 10 +++++++---
mm/huge_memory.c | 7 +------
mm/migrate.c | 5 ++++-
mm/page_alloc.c | 4 ++++
4 files changed, 16 insertions(+), 10 deletions(-)

diff --git v4.2-rc4.orig/include/linux/page-flags.h v4.2-rc4/include/linux/page-flags.h
index f34e040b34e9..41c93844fb1d 100644
--- v4.2-rc4.orig/include/linux/page-flags.h
+++ v4.2-rc4/include/linux/page-flags.h
@@ -631,15 +631,19 @@ static inline void ClearPageSlabPfmemalloc(struct page *page)
1 << PG_private | 1 << PG_private_2 | \
1 << PG_writeback | 1 << PG_reserved | \
1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
- 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \
+ 1 << PG_unevictable | __PG_MLOCKED | \
__PG_COMPOUND_LOCK)

/*
* Flags checked when a page is prepped for return by the page allocator.
- * Pages being prepped should not have any flags set. It they are set,
+ * Pages being prepped should not have these flags set. It they are set,
* there has been a kernel bug or struct page corruption.
+ *
+ * __PG_HWPOISON is exceptional because it needs to be kept beyond page's
+ * alloc-free cycle to prevent from reusing the page.
*/
-#define PAGE_FLAGS_CHECK_AT_PREP ((1 << NR_PAGEFLAGS) - 1)
+#define PAGE_FLAGS_CHECK_AT_PREP \
+ (((1 << NR_PAGEFLAGS) - 1) & ~__PG_HWPOISON)

#define PAGE_FLAGS_PRIVATE \
(1 << PG_private | 1 << PG_private_2)
diff --git v4.2-rc4.orig/mm/huge_memory.c v4.2-rc4/mm/huge_memory.c
index c107094f79ba..097c7a4bfbd9 100644
--- v4.2-rc4.orig/mm/huge_memory.c
+++ v4.2-rc4/mm/huge_memory.c
@@ -1676,12 +1676,7 @@ static void __split_huge_page_refcount(struct page *page,
/* after clearing PageTail the gup refcount can be released */
smp_mb__after_atomic();

- /*
- * retain hwpoison flag of the poisoned tail page:
- * fix for the unsuitable process killed on Guest Machine(KVM)
- * by the memory-failure.
- */
- page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP | __PG_HWPOISON;
+ page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
page_tail->flags |= (page->flags &
((1L << PG_referenced) |
(1L << PG_swapbacked) |
diff --git v4.2-rc4.orig/mm/migrate.c v4.2-rc4/mm/migrate.c
index ee401e4e5ef1..f2415be7d93b 100644
--- v4.2-rc4.orig/mm/migrate.c
+++ v4.2-rc4/mm/migrate.c
@@ -950,7 +950,10 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- if (reason != MR_MEMORY_FAILURE)
+ /* Soft-offlined page shouldn't go through lru cache list */
+ if (reason == MR_MEMORY_FAILURE)
+ put_page(page);
+ else
putback_lru_page(page);
}

diff --git v4.2-rc4.orig/mm/page_alloc.c v4.2-rc4/mm/page_alloc.c
index ef19f22b2b7d..775c254648f7 100644
--- v4.2-rc4.orig/mm/page_alloc.c
+++ v4.2-rc4/mm/page_alloc.c
@@ -1285,6 +1285,10 @@ static inline int check_new_page(struct page *page)
bad_reason = "non-NULL mapping";
if (unlikely(atomic_read(&page->_count) != 0))
bad_reason = "nonzero _count";
+ if (unlikely(page->flags & __PG_HWPOISON)) {
+ bad_reason = "HWPoisoned (hardware-corrupted)";
+ bad_flags = __PG_HWPOISON;
+ }
if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) {
bad_reason = "PAGE_FLAGS_CHECK_AT_PREP flag set";
bad_flags = PAGE_FLAGS_CHECK_AT_PREP;
--
2.4.3

2015-07-31 06:49:29

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v2 3/5] mm/memory-failure: give up error handling for non-tail-refcounted thp

"non anonymous thp" case is still racy with freeing thp, which causes panic
due to put_page() for refcount-0 page. It seems that closing up this race
might be hard (and/or not worth doing,) so let's give up the error handling
for this case.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
v1 -> v2:
- keep pr_err() message
---
mm/memory-failure.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)

diff --git v4.2-rc4.orig/mm/memory-failure.c v4.2-rc4/mm/memory-failure.c
index f72d2fad0b90..cd985530f102 100644
--- v4.2-rc4.orig/mm/memory-failure.c
+++ v4.2-rc4/mm/memory-failure.c
@@ -909,6 +909,18 @@ int get_hwpoison_page(struct page *page)
* directly for tail pages.
*/
if (PageTransHuge(head)) {
+ /*
+ * Non anonymous thp exists only in allocation/free time. We
+ * can't handle such a case correctly, so let's give it up.
+ * This should be better than triggering BUG_ON when kernel
+ * tries to touch the "partially handled" page.
+ */
+ if (!PageAnon(head)) {
+ pr_err("MCE: %#lx: non anonymous thp\n",
+ page_to_pfn(page));
+ return 0;
+ }
+
if (get_page_unless_zero(head)) {
if (PageTail(page))
get_page(page);
@@ -1134,15 +1146,6 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
}

if (!PageHuge(p) && PageTransHuge(hpage)) {
- if (!PageAnon(hpage)) {
- pr_err("MCE: %#lx: non anonymous thp\n", pfn);
- if (TestClearPageHWPoison(p))
- atomic_long_sub(nr_pages, &num_poisoned_pages);
- put_page(p);
- if (p != hpage)
- put_page(hpage);
- return -EBUSY;
- }
if (unlikely(split_huge_page(hpage))) {
pr_err("MCE: %#lx: thp split failed\n", pfn);
if (TestClearPageHWPoison(p))
--
2.4.3

2015-07-31 06:49:27

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v2 2/5] mm/memory-failure: fix race in counting num_poisoned_pages

When memory_failure() is called on a page which are just freed after page
migration from soft offlining, the counter num_poisoned_pages is raised twice.
So let's fix it with using TestSetPageHWPoison.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git v4.2-rc4.orig/mm/memory-failure.c v4.2-rc4/mm/memory-failure.c
index 04d677048af7..f72d2fad0b90 100644
--- v4.2-rc4.orig/mm/memory-failure.c
+++ v4.2-rc4/mm/memory-failure.c
@@ -1671,8 +1671,8 @@ static int __soft_offline_page(struct page *page, int flags)
if (ret > 0)
ret = -EIO;
} else {
- SetPageHWPoison(page);
- atomic_long_inc(&num_poisoned_pages);
+ if (!TestSetPageHWPoison(page))
+ atomic_long_inc(&num_poisoned_pages);
}
} else {
pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
--
2.4.3

2015-07-31 06:49:54

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v2 5/5] mm/memory-failure: set PageHWPoison before migrate_pages()

Now page freeing code doesn't consider PageHWPoison as a bad page, so by
setting it before completing the page containment, we can prevent the error
page from being reused just after successful page migration.

I added TTU_IGNORE_HWPOISON for try_to_unmap() to make sure that the page
table entry is transformed into migration entry, not to hwpoison entry.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 7 ++++---
mm/migrate.c | 3 ++-
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git v4.2-rc4.orig/mm/memory-failure.c v4.2-rc4/mm/memory-failure.c
index cd985530f102..ea5a93659488 100644
--- v4.2-rc4.orig/mm/memory-failure.c
+++ v4.2-rc4/mm/memory-failure.c
@@ -1659,6 +1659,8 @@ static int __soft_offline_page(struct page *page, int flags)
inc_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
list_add(&page->lru, &pagelist);
+ if (!TestSetPageHWPoison(page))
+ atomic_long_inc(&num_poisoned_pages);
ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (ret) {
@@ -1673,9 +1675,8 @@ static int __soft_offline_page(struct page *page, int flags)
pfn, ret, page->flags);
if (ret > 0)
ret = -EIO;
- } else {
- if (!TestSetPageHWPoison(page))
- atomic_long_inc(&num_poisoned_pages);
+ if (TestClearPageHWPoison(page))
+ atomic_long_dec(&num_poisoned_pages);
}
} else {
pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
diff --git v4.2-rc4.orig/mm/migrate.c v4.2-rc4/mm/migrate.c
index f2415be7d93b..eb4267107d1f 100644
--- v4.2-rc4.orig/mm/migrate.c
+++ v4.2-rc4/mm/migrate.c
@@ -880,7 +880,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
/* Establish migration ptes or remove ptes */
if (page_mapped(page)) {
try_to_unmap(page,
- TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+ TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
+ TTU_IGNORE_HWPOISON);
page_was_mapped = 1;
}

--
2.4.3

2015-07-31 20:54:25

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] mm/memory-failure: unlock_page before put_page

On Fri, 31 Jul 2015, Naoya Horiguchi wrote:

> In "just unpoisoned" path, we do put_page and then unlock_page, which is a
> wrong order and causes "freeing locked page" bug. So let's fix it.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> mm/memory-failure.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git v4.2-rc4.orig/mm/memory-failure.c v4.2-rc4/mm/memory-failure.c
> index c53543d89282..04d677048af7 100644
> --- v4.2-rc4.orig/mm/memory-failure.c
> +++ v4.2-rc4/mm/memory-failure.c
> @@ -1209,9 +1209,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
> if (!PageHWPoison(p)) {
> printk(KERN_ERR "MCE %#lx: just unpoisoned\n", pfn);
> atomic_long_sub(nr_pages, &num_poisoned_pages);
> + unlock_page(hpage);
> put_page(hpage);
> - res = 0;
> - goto out;
> + return 0;
> }
> if (hwpoison_filter(p)) {
> if (TestClearPageHWPoison(p))

Looks like you could do the unlock_page() before either the printk or
atomic_long_sub(), but probably not important.

Acked-by: David Rientjes <[email protected]>