Recently I addressed a few of hwpoison race problems and the patches are merged
on v4.2-rc1. It made progress, but unfortunately some problems still remain due
to less coverage of my testing. So I'm trying to fix or avoid them in this series.
One point I'm expecting to discuss is that patch 4/4 changes the page flag set
to be checked on free time. In current behavior, __PG_HWPOISON is not supposed
to be set when the page is freed. I think that there is no strong reason for this
behavior, and it causes a problem hard to fix only in error handler side (because
__PG_HWPOISON could be set at arbitrary timing.) So I suggest to change it.
With this patchset, the stress testing in official mce-test testsuite passes.
Thanks,
Naoya Horiguchi
---
Tree: https://github.com/Naoya-Horiguchi/linux/tree/v4.2-rc2/hwpoison.v1
---
Summary:
Naoya Horiguchi (4):
mm/memory-failure: unlock_page before put_page
mm/memory-failure: fix race in counting num_poisoned_pages
mm/memory-failure: give up error handling for non-tail-refcounted thp
mm/memory-failure: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*
include/linux/page-flags.h | 10 +++++++---
mm/huge_memory.c | 7 +------
mm/memory-failure.c | 32 ++++++++++++++++++--------------
mm/migrate.c | 9 +++------
mm/page_alloc.c | 4 ++++
5 files changed, 33 insertions(+), 29 deletions(-)-
The race condition addressed in commit add05cecef80 ("mm: soft-offline: don't
free target page in successful page migration") was not closed completely,
because that can happen not only for soft-offline, but also for hard-offline.
Consider that a slab page is about to be freed into buddy pool, and then an
uncorrected memory error hits the page just after entering __free_one_page(),
then VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP) is triggered,
despite the fact that it's not necessary because the data on the affected
page is not consumed.
To solve it, this patch drops __PG_HWPOISON from page flag checks at
allocation/free time. I think it's justified because __PG_HWPOISON flags is
defined to prevent the page from being reused and setting it outside the
page's alloc-free cycle is a designed behavior (not a bug.)
And the patch reverts most of the changes from commit add05cecef80 about
the new refcounting rule of soft-offlined pages, which is no longer necessary.
Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/page-flags.h | 10 +++++++---
mm/huge_memory.c | 7 +------
mm/memory-failure.c | 6 +++++-
mm/migrate.c | 9 +++------
mm/page_alloc.c | 4 ++++
5 files changed, 20 insertions(+), 16 deletions(-)
diff --git v4.2-rc2.orig/include/linux/page-flags.h v4.2-rc2/include/linux/page-flags.h
index f34e040b34e9..53400f101f2d 100644
--- v4.2-rc2.orig/include/linux/page-flags.h
+++ v4.2-rc2/include/linux/page-flags.h
@@ -631,15 +631,19 @@ static inline void ClearPageSlabPfmemalloc(struct page *page)
1 << PG_private | 1 << PG_private_2 | \
1 << PG_writeback | 1 << PG_reserved | \
1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
- 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \
+ 1 << PG_unevictable | __PG_MLOCKED | \
__PG_COMPOUND_LOCK)
/*
* Flags checked when a page is prepped for return by the page allocator.
- * Pages being prepped should not have any flags set. It they are set,
+ * Pages being prepped should not have these flags set. It they are set,
* there has been a kernel bug or struct page corruption.
+ *
+ * __PG_HWPOISON is exceptional because it need to be kept beyond page's
+ * alloc-free cycle to prevent from reusing the page.
*/
-#define PAGE_FLAGS_CHECK_AT_PREP ((1 << NR_PAGEFLAGS) - 1)
+#define PAGE_FLAGS_CHECK_AT_PREP \
+ (((1 << NR_PAGEFLAGS) - 1) & ~__PG_HWPOISON)
#define PAGE_FLAGS_PRIVATE \
(1 << PG_private | 1 << PG_private_2)
diff --git v4.2-rc2.orig/mm/huge_memory.c v4.2-rc2/mm/huge_memory.c
index c107094f79ba..097c7a4bfbd9 100644
--- v4.2-rc2.orig/mm/huge_memory.c
+++ v4.2-rc2/mm/huge_memory.c
@@ -1676,12 +1676,7 @@ static void __split_huge_page_refcount(struct page *page,
/* after clearing PageTail the gup refcount can be released */
smp_mb__after_atomic();
- /*
- * retain hwpoison flag of the poisoned tail page:
- * fix for the unsuitable process killed on Guest Machine(KVM)
- * by the memory-failure.
- */
- page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP | __PG_HWPOISON;
+ page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
page_tail->flags |= (page->flags &
((1L << PG_referenced) |
(1L << PG_swapbacked) |
diff --git v4.2-rc2.orig/mm/memory-failure.c v4.2-rc2/mm/memory-failure.c
index 421d7c9b30f4..755f87e4ec64 100644
--- v4.2-rc2.orig/mm/memory-failure.c
+++ v4.2-rc2/mm/memory-failure.c
@@ -1723,6 +1723,9 @@ int soft_offline_page(struct page *page, int flags)
get_online_mems();
+ if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+ set_migratetype_isolate(page, true);
+
ret = get_any_page(page, pfn, flags);
put_online_mems();
if (ret > 0) { /* for in-use pages */
@@ -1730,7 +1733,7 @@ int soft_offline_page(struct page *page, int flags)
ret = soft_offline_huge_page(page, flags);
else
ret = __soft_offline_page(page, flags);
- } else if (ret == 0) { /* for free pages */
+ } else if (ret == 0) {
if (PageHuge(page)) {
set_page_hwpoison_huge_page(hpage);
if (!dequeue_hwpoisoned_huge_page(hpage))
@@ -1741,5 +1744,6 @@ int soft_offline_page(struct page *page, int flags)
atomic_long_inc(&num_poisoned_pages);
}
}
+ unset_migratetype_isolate(page, MIGRATE_MOVABLE);
return ret;
}
diff --git v4.2-rc2.orig/mm/migrate.c v4.2-rc2/mm/migrate.c
index ee401e4e5ef1..c37d5772767b 100644
--- v4.2-rc2.orig/mm/migrate.c
+++ v4.2-rc2/mm/migrate.c
@@ -918,8 +918,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
static ICE_noinline int unmap_and_move(new_page_t get_new_page,
free_page_t put_new_page,
unsigned long private, struct page *page,
- int force, enum migrate_mode mode,
- enum migrate_reason reason)
+ int force, enum migrate_mode mode)
{
int rc = 0;
int *result = NULL;
@@ -950,8 +949,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- if (reason != MR_MEMORY_FAILURE)
- putback_lru_page(page);
+ putback_lru_page(page);
}
/*
@@ -1124,8 +1122,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
pass > 2, mode);
else
rc = unmap_and_move(get_new_page, put_new_page,
- private, page, pass > 2, mode,
- reason);
+ private, page, pass > 2, mode);
switch(rc) {
case -ENOMEM:
diff --git v4.2-rc2.orig/mm/page_alloc.c v4.2-rc2/mm/page_alloc.c
index 506eac8b38af..e32d58ce5d2f 100644
--- v4.2-rc2.orig/mm/page_alloc.c
+++ v4.2-rc2/mm/page_alloc.c
@@ -1287,6 +1287,10 @@ static inline int check_new_page(struct page *page)
bad_reason = "non-NULL mapping";
if (unlikely(atomic_read(&page->_count) != 0))
bad_reason = "nonzero _count";
+ if (unlikely(page->flags & __PG_HWPOISON)) {
+ bad_reason = "HWPoisoned (hardware-corrupted)";
+ bad_flags = __PG_HWPOISON;
+ }
if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) {
bad_reason = "PAGE_FLAGS_CHECK_AT_PREP flag set";
bad_flags = PAGE_FLAGS_CHECK_AT_PREP;
--
2.4.3
When memory_failure() is called on a page which are just freed after page
migration from soft offlining, the counter num_poisoned_pages is raised twice.
So let's fix it with using TestSetPageHWPoison.
Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git v4.2-rc2.orig/mm/memory-failure.c v4.2-rc2/mm/memory-failure.c
index 04d677048af7..f72d2fad0b90 100644
--- v4.2-rc2.orig/mm/memory-failure.c
+++ v4.2-rc2/mm/memory-failure.c
@@ -1671,8 +1671,8 @@ static int __soft_offline_page(struct page *page, int flags)
if (ret > 0)
ret = -EIO;
} else {
- SetPageHWPoison(page);
- atomic_long_inc(&num_poisoned_pages);
+ if (!TestSetPageHWPoison(page))
+ atomic_long_inc(&num_poisoned_pages);
}
} else {
pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
--
2.4.3
"non anonymous thp" case is still racy with freeing thp, which causes panic
due to put_page() for refcount-0 page. It seems that closing up this race
might be hard (and/or not worth doing,) so let's give up the error handling
for this case.
Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git v4.2-rc2.orig/mm/memory-failure.c v4.2-rc2/mm/memory-failure.c
index f72d2fad0b90..421d7c9b30f4 100644
--- v4.2-rc2.orig/mm/memory-failure.c
+++ v4.2-rc2/mm/memory-failure.c
@@ -909,6 +909,15 @@ int get_hwpoison_page(struct page *page)
* directly for tail pages.
*/
if (PageTransHuge(head)) {
+ /*
+ * Non anonymous thp exists only in allocation/free time. We
+ * can't handle such a case correctly, so let's give it up.
+ * This should be better than triggering BUG_ON when kernel
+ * tries to touch a "partially handled" page.
+ */
+ if (!PageAnon(head))
+ return 0;
+
if (get_page_unless_zero(head)) {
if (PageTail(page))
get_page(page);
@@ -1134,15 +1143,6 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
}
if (!PageHuge(p) && PageTransHuge(hpage)) {
- if (!PageAnon(hpage)) {
- pr_err("MCE: %#lx: non anonymous thp\n", pfn);
- if (TestClearPageHWPoison(p))
- atomic_long_sub(nr_pages, &num_poisoned_pages);
- put_page(p);
- if (p != hpage)
- put_page(hpage);
- return -EBUSY;
- }
if (unlikely(split_huge_page(hpage))) {
pr_err("MCE: %#lx: thp split failed\n", pfn);
if (TestClearPageHWPoison(p))
--
2.4.3
In "just unpoisoned" path, we do put_page and then unlock_page, which is a
wrong order and causes "freeing locked page" bug. So let's fix it.
Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git v4.2-rc2.orig/mm/memory-failure.c v4.2-rc2/mm/memory-failure.c
index c53543d89282..04d677048af7 100644
--- v4.2-rc2.orig/mm/memory-failure.c
+++ v4.2-rc2/mm/memory-failure.c
@@ -1209,9 +1209,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
if (!PageHWPoison(p)) {
printk(KERN_ERR "MCE %#lx: just unpoisoned\n", pfn);
atomic_long_sub(nr_pages, &num_poisoned_pages);
+ unlock_page(hpage);
put_page(hpage);
- res = 0;
- goto out;
+ return 0;
}
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
--
2.4.3
> @@ -909,6 +909,15 @@ int get_hwpoison_page(struct page *page)
> * directly for tail pages.
> */
> if (PageTransHuge(head)) {
> + /*
> + * Non anonymous thp exists only in allocation/free time. We
> + * can't handle such a case correctly, so let's give it up.
> + * This should be better than triggering BUG_ON when kernel
> + * tries to touch a "partially handled" page.
> + */
> + if (!PageAnon(head))
> + return 0;
Please print a message for this case. In the future there will be
likely more non anonymous THP pages from Kirill's large page cache work
(so eventually we'll need it)
-Andi
On Thu, Jul 16, 2015 at 04:33:07AM +0200, Andi Kleen wrote:
> > @@ -909,6 +909,15 @@ int get_hwpoison_page(struct page *page)
> > * directly for tail pages.
> > */
> > if (PageTransHuge(head)) {
> > + /*
> > + * Non anonymous thp exists only in allocation/free time. We
> > + * can't handle such a case correctly, so let's give it up.
> > + * This should be better than triggering BUG_ON when kernel
> > + * tries to touch a "partially handled" page.
> > + */
> > + if (!PageAnon(head))
> > + return 0;
>
> Please print a message for this case. In the future there will be
> likely more non anonymous THP pages from Kirill's large page cache work
> (so eventually we'll need it)
OK, I'll do this.
Thanks,
Naoya Horiguchi-
On Thu, 16 Jul 2015 01:41:56 +0000 Naoya Horiguchi <[email protected]> wrote:
> The race condition addressed in commit add05cecef80 ("mm: soft-offline: don't
> free target page in successful page migration") was not closed completely,
> because that can happen not only for soft-offline, but also for hard-offline.
> Consider that a slab page is about to be freed into buddy pool, and then an
> uncorrected memory error hits the page just after entering __free_one_page(),
> then VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP) is triggered,
> despite the fact that it's not necessary because the data on the affected
> page is not consumed.
>
> To solve it, this patch drops __PG_HWPOISON from page flag checks at
> allocation/free time. I think it's justified because __PG_HWPOISON flags is
> defined to prevent the page from being reused and setting it outside the
> page's alloc-free cycle is a designed behavior (not a bug.)
>
> And the patch reverts most of the changes from commit add05cecef80 about
> the new refcounting rule of soft-offlined pages, which is no longer necessary.
>
> ...
>
> --- v4.2-rc2.orig/mm/memory-failure.c
> +++ v4.2-rc2/mm/memory-failure.c
> @@ -1723,6 +1723,9 @@ int soft_offline_page(struct page *page, int flags)
>
> get_online_mems();
>
> + if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> + set_migratetype_isolate(page, true);
> +
> ret = get_any_page(page, pfn, flags);
> put_online_mems();
> if (ret > 0) { /* for in-use pages */
This patch gets build-broken by your
mm-page_isolation-make-set-unset_migratetype_isolate-file-local.patch,
which I shall drop.
From: Naoya Horiguchi <[email protected]>
Subject: mm, page_isolation: make set/unset_migratetype_isolate() file-local
Nowaday, set/unset_migratetype_isolate() is defined and used only in
mm/page_isolation, so let's limit the scope within the file.
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
include/linux/page-isolation.h | 5 -----
mm/page_isolation.c | 5 +++--
2 files changed, 3 insertions(+), 7 deletions(-)
diff -puN include/linux/page-isolation.h~mm-page_isolation-make-set-unset_migratetype_isolate-file-local include/linux/page-isolation.h
--- a/include/linux/page-isolation.h~mm-page_isolation-make-set-unset_migratetype_isolate-file-local
+++ a/include/linux/page-isolation.h
@@ -65,11 +65,6 @@ undo_isolate_page_range(unsigned long st
int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
bool skip_hwpoisoned_pages);
-/*
- * Internal functions. Changes pageblock's migrate type.
- */
-int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
-void unset_migratetype_isolate(struct page *page, unsigned migratetype);
struct page *alloc_migrate_target(struct page *page, unsigned long private,
int **resultp);
diff -puN mm/page_isolation.c~mm-page_isolation-make-set-unset_migratetype_isolate-file-local mm/page_isolation.c
--- a/mm/page_isolation.c~mm-page_isolation-make-set-unset_migratetype_isolate-file-local
+++ a/mm/page_isolation.c
@@ -9,7 +9,8 @@
#include <linux/hugetlb.h>
#include "internal.h"
-int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
+static int set_migratetype_isolate(struct page *page,
+ bool skip_hwpoisoned_pages)
{
struct zone *zone;
unsigned long flags, pfn;
@@ -72,7 +73,7 @@ out:
return ret;
}
-void unset_migratetype_isolate(struct page *page, unsigned migratetype)
+static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
{
struct zone *zone;
unsigned long flags, nr_pages;
_
On Thu, Jul 23, 2015 at 01:37:02PM -0700, Andrew Morton wrote:
> On Thu, 16 Jul 2015 01:41:56 +0000 Naoya Horiguchi <[email protected]> wrote:
>
> > The race condition addressed in commit add05cecef80 ("mm: soft-offline: don't
> > free target page in successful page migration") was not closed completely,
> > because that can happen not only for soft-offline, but also for hard-offline.
> > Consider that a slab page is about to be freed into buddy pool, and then an
> > uncorrected memory error hits the page just after entering __free_one_page(),
> > then VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP) is triggered,
> > despite the fact that it's not necessary because the data on the affected
> > page is not consumed.
> >
> > To solve it, this patch drops __PG_HWPOISON from page flag checks at
> > allocation/free time. I think it's justified because __PG_HWPOISON flags is
> > defined to prevent the page from being reused and setting it outside the
> > page's alloc-free cycle is a designed behavior (not a bug.)
> >
> > And the patch reverts most of the changes from commit add05cecef80 about
> > the new refcounting rule of soft-offlined pages, which is no longer necessary.
> >
> > ...
> >
> > --- v4.2-rc2.orig/mm/memory-failure.c
> > +++ v4.2-rc2/mm/memory-failure.c
> > @@ -1723,6 +1723,9 @@ int soft_offline_page(struct page *page, int flags)
> >
> > get_online_mems();
> >
> > + if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> > + set_migratetype_isolate(page, true);
> > +
> > ret = get_any_page(page, pfn, flags);
> > put_online_mems();
> > if (ret > 0) { /* for in-use pages */
>
> This patch gets build-broken by your
> mm-page_isolation-make-set-unset_migratetype_isolate-file-local.patch,
> which I shall drop.
I apologize this build failure. At first I planned to add another hwpoison patch
after this to remove this migratetype thing separately, but I was not 100% sure
of the correctness, so I did not include it in this version.
But Vlastimil's cleanup patch showed me that using MIGRATE_ISOLATE at free time
(, which is what soft offline code does now,) is wrong (or not an expected usage).
So I shouldn't have reverted the above part.
So I want the patch "mm, page_isolation: make set/unset_migratetype_isolate()
file-local" to be merged first, and I'd like to update this hwpoison before
going into mmotm. Could you drop this series from your tree for now?
I'll repost the next version probably next week.
Thanks,
Naoya Horiguchi-