2017-06-01 08:17:07

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 0/9] mm: hwpoison: fixlet for hugetlb migration

Hi everyone,

I wrote the patchset updating hwpoison/hugetlb code to address
the 2 reported issues.

One is madvise(MADV_HWPOISON) failure reported by Intel's lkp robot
(see http://lkml.kernel.org/r/20170417055948.GM31394@yexl-desktop.)
First half was already fixed in mainline, and another half about hugetlb
cases are solved in this series.

Another issue is "narrow-down error affected region into a single 4kB
page instead of a whole hugetlb page" issue, which was tried by Anshuman
(http://lkml.kernel.org/r/[email protected])
and I updated it to apply it more widely.

Hopefully it helps people who are interested in hugetlb migration for
wider arch/setting.

Thanks,
Naoya Horiguchi
---
Summary:

Anshuman Khandual (1):
mm: hugetlb: soft-offline: dissolve source hugepage after successful migration

Naoya Horiguchi (8):
mm: hugetlb: prevent reuse of hwpoisoned free hugepages
mm: hugetlb: return immediately for hugetlb page in __delete_from_page_cache()
mm: hwpoison: change PageHWPoison behavior on hugetlb pages
mm: soft-offline: dissolve free hugepage if soft-offlined
mm: hwpoison: introduce memory_failure_hugetlb()
mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error
mm: hugetlb: delete dequeue_hwpoisoned_huge_page()
mm: hwpoison: introduce idenfity_page_state

fs/hugetlbfs/inode.c | 11 ++
include/linux/hugetlb.h | 8 +-
include/linux/swapops.h | 9 --
mm/filemap.c | 8 +-
mm/hugetlb.c | 47 ++-----
mm/memory-failure.c | 323 +++++++++++++++++++++++-------------------------
mm/migrate.c | 2 +
7 files changed, 184 insertions(+), 224 deletions(-)


2017-06-01 08:17:12

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 1/9] mm: hugetlb: prevent reuse of hwpoisoned free hugepages

We no longer use MIGRATE_ISOLATE to prevent reuse of hwpoison hugepages
as we did before. So current dequeue_huge_page_node() doesn't work as
intended because it still uses is_migrate_isolate_page() for this check.
This patch fixes it with PageHWPoison flag.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/hugetlb.c | 3 +--
mm/memory-failure.c | 1 -
2 files changed, 1 insertion(+), 3 deletions(-)

diff --git v4.12-rc3/mm/hugetlb.c v4.12-rc3_patched/mm/hugetlb.c
index e582887..6d6c659 100644
--- v4.12-rc3/mm/hugetlb.c
+++ v4.12-rc3_patched/mm/hugetlb.c
@@ -22,7 +22,6 @@
#include <linux/rmap.h>
#include <linux/swap.h>
#include <linux/swapops.h>
-#include <linux/page-isolation.h>
#include <linux/jhash.h>

#include <asm/page.h>
@@ -872,7 +871,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
struct page *page;

list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
- if (!is_migrate_isolate_page(page))
+ if (!PageHWPoison(page))
break;
/*
* if 'non-isolated free hugepage' not found on the list,
diff --git v4.12-rc3/mm/memory-failure.c v4.12-rc3_patched/mm/memory-failure.c
index 2527dfed..6c1a7c9 100644
--- v4.12-rc3/mm/memory-failure.c
+++ v4.12-rc3_patched/mm/memory-failure.c
@@ -49,7 +49,6 @@
#include <linux/swap.h>
#include <linux/backing-dev.h>
#include <linux/migrate.h>
-#include <linux/page-isolation.h>
#include <linux/suspend.h>
#include <linux/slab.h>
#include <linux/swapops.h>
--
2.7.0

2017-06-01 08:17:19

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 4/9] mm: hugetlb: soft-offline: dissolve source hugepage after successful migration

From: Anshuman Khandual <[email protected]>

Currently hugepage migrated by soft-offline (i.e. due to correctable
memory errors) is contained as a hugepage, which means many non-error
pages in it are unreusable, i.e. wasted.

This patch solves this issue by dissolving source hugepages into buddy.
As done in previous patch, PageHWPoison is set only on a head page of
the error hugepage. Then in dissoliving we move the PageHWPoison flag to
the raw error page so that all healthy subpages return back to buddy.

Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/hugetlb.h | 2 ++
mm/hugetlb.c | 10 +++++++++-
mm/memory-failure.c | 5 +----
mm/migrate.c | 2 ++
4 files changed, 14 insertions(+), 5 deletions(-)

diff --git v4.12-rc3/include/linux/hugetlb.h v4.12-rc3_patched/include/linux/hugetlb.h
index b857fc8..89afe40 100644
--- v4.12-rc3/include/linux/hugetlb.h
+++ v4.12-rc3_patched/include/linux/hugetlb.h
@@ -461,6 +461,7 @@ static inline pgoff_t basepage_index(struct page *page)
return __basepage_index(page);
}

+extern int dissolve_free_huge_page(struct page *page);
extern int dissolve_free_huge_pages(unsigned long start_pfn,
unsigned long end_pfn);
static inline bool hugepage_migration_supported(struct hstate *h)
@@ -529,6 +530,7 @@ static inline pgoff_t basepage_index(struct page *page)
{
return page->index;
}
+#define dissolve_free_huge_page(p) 0
#define dissolve_free_huge_pages(s, e) 0
#define hugepage_migration_supported(h) false

diff --git v4.12-rc3/mm/hugetlb.c v4.12-rc3_patched/mm/hugetlb.c
index 6d6c659..41c37ed 100644
--- v4.12-rc3/mm/hugetlb.c
+++ v4.12-rc3_patched/mm/hugetlb.c
@@ -1443,7 +1443,7 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
* number of free hugepages would be reduced below the number of reserved
* hugepages.
*/
-static int dissolve_free_huge_page(struct page *page)
+int dissolve_free_huge_page(struct page *page)
{
int rc = 0;

@@ -1456,6 +1456,14 @@ static int dissolve_free_huge_page(struct page *page)
rc = -EBUSY;
goto out;
}
+ /*
+ * Move PageHWPoison flag from head page to the raw error page,
+ * which makes any subpages rather than the error page reusable.
+ */
+ if (PageHWPoison(head) && page != head) {
+ SetPageHWPoison(page);
+ ClearPageHWPoison(head);
+ }
list_del(&head->lru);
h->free_huge_pages--;
h->free_huge_pages_node[nid]--;
diff --git v4.12-rc3/mm/memory-failure.c v4.12-rc3_patched/mm/memory-failure.c
index aae620f..e03903f 100644
--- v4.12-rc3/mm/memory-failure.c
+++ v4.12-rc3_patched/mm/memory-failure.c
@@ -1571,11 +1571,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
if (ret > 0)
ret = -EIO;
} else {
- /* overcommit hugetlb page will be freed to buddy */
- SetPageHWPoison(page);
if (PageHuge(page))
- dequeue_hwpoisoned_huge_page(hpage);
- num_poisoned_pages_inc();
+ dissolve_free_huge_page(page);
}
return ret;
}
diff --git v4.12-rc3/mm/migrate.c v4.12-rc3_patched/mm/migrate.c
index 89a0a17..f0319db 100644
--- v4.12-rc3/mm/migrate.c
+++ v4.12-rc3_patched/mm/migrate.c
@@ -1251,6 +1251,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
out:
if (rc != -EAGAIN)
putback_active_hugepage(hpage);
+ if (reason == MR_MEMORY_FAILURE && !test_set_page_hwpoison(hpage))
+ num_poisoned_pages_inc();

/*
* If migration was not successful and there's a freeing callback, use
--
2.7.0

2017-06-01 08:17:22

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 6/9] mm: hwpoison: introduce memory_failure_hugetlb()

memory_failure() is a big function and hard to maintain.
Handling hugetlb- and non-hugetlb- case in a single function is not
good, so this patch separates PageHuge() branch into a new function,
which saves many PageHuge() check.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 134 ++++++++++++++++++++++++++++++++--------------------
1 file changed, 82 insertions(+), 52 deletions(-)

diff --git v4.12-rc3/mm/memory-failure.c v4.12-rc3_patched/mm/memory-failure.c
index a5f52cc..31b761a 100644
--- v4.12-rc3/mm/memory-failure.c
+++ v4.12-rc3_patched/mm/memory-failure.c
@@ -1009,6 +1009,76 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
return unmap_success;
}

+static int memory_failure_hugetlb(unsigned long pfn, int trapno, int flags)
+{
+ struct page_state *ps;
+ struct page *p = pfn_to_page(pfn);
+ struct page *head = compound_head(p);
+ int res;
+ unsigned long page_flags;
+
+ if (TestSetPageHWPoison(head)) {
+ pr_err("Memory failure: %#lx: already hardware poisoned\n",
+ pfn);
+ return 0;
+ }
+
+ num_poisoned_pages_inc();
+
+ if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
+ /*
+ * Check "filter hit" and "race with other subpage."
+ */
+ lock_page(head);
+ if (PageHWPoison(head)) {
+ if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
+ || (p != head && TestSetPageHWPoison(head))) {
+ num_poisoned_pages_dec();
+ unlock_page(head);
+ return 0;
+ }
+ }
+ unlock_page(head);
+ dissolve_free_huge_page(p);
+ action_result(pfn, MF_MSG_FREE_HUGE, MF_DELAYED);
+ return 0;
+ }
+
+ lock_page(head);
+ page_flags = head->flags;
+
+ if (!PageHWPoison(head)) {
+ pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
+ num_poisoned_pages_dec();
+ unlock_page(head);
+ put_hwpoison_page(head);
+ return 0;
+ }
+
+ if (!hwpoison_user_mappings(p, pfn, trapno, flags, &head)) {
+ action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
+ res = -EBUSY;
+ goto out;
+ }
+
+ res = -EBUSY;
+
+ for (ps = error_states;; ps++)
+ if ((p->flags & ps->mask) == ps->res)
+ break;
+
+ page_flags |= (p->flags & (1UL << PG_dirty));
+
+ if (!ps->mask)
+ for (ps = error_states;; ps++)
+ if ((page_flags & ps->mask) == ps->res)
+ break;
+ res = page_action(ps, p, pfn);
+out:
+ unlock_page(head);
+ return res;
+}
+
/**
* memory_failure - Handle memory failure of a page.
* @pfn: Page Number of the corrupted page
@@ -1046,33 +1116,22 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
}

p = pfn_to_page(pfn);
- orig_head = hpage = compound_head(p);
-
- /* tmporary check code, to be updated in later patches */
- if (PageHuge(p)) {
- if (TestSetPageHWPoison(hpage)) {
- pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
- return 0;
- }
- goto tmp;
- }
+ if (PageHuge(p))
+ return memory_failure_hugetlb(pfn, trapno, flags);
if (TestSetPageHWPoison(p)) {
pr_err("Memory failure: %#lx: already hardware poisoned\n",
pfn);
return 0;
}

-tmp:
+ orig_head = hpage = compound_head(p);
num_poisoned_pages_inc();

/*
* We need/can do nothing about count=0 pages.
* 1) it's a free page, and therefore in safe hand:
* prep_new_page() will be the gate keeper.
- * 2) it's a free hugepage, which is also safe:
- * an affected hugepage will be dequeued from hugepage freelist,
- * so there's no concern about reusing it ever after.
- * 3) it's part of a non-compound high order page.
+ * 2) it's part of a non-compound high order page.
* Implies some kernel user: cannot stop them from
* R/W the page; let's pray that the page has been
* used and will be freed some time later.
@@ -1083,31 +1142,13 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
if (is_free_buddy_page(p)) {
action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
return 0;
- } else if (PageHuge(hpage)) {
- /*
- * Check "filter hit" and "race with other subpage."
- */
- lock_page(hpage);
- if (PageHWPoison(hpage)) {
- if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
- || (p != hpage && TestSetPageHWPoison(hpage))) {
- num_poisoned_pages_dec();
- unlock_page(hpage);
- return 0;
- }
- }
- res = dequeue_hwpoisoned_huge_page(hpage);
- action_result(pfn, MF_MSG_FREE_HUGE,
- res ? MF_IGNORED : MF_DELAYED);
- unlock_page(hpage);
- return res;
} else {
action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
return -EBUSY;
}
}

- if (!PageHuge(p) && PageTransHuge(hpage)) {
+ if (PageTransHuge(hpage)) {
lock_page(p);
if (!PageAnon(p) || unlikely(split_huge_page(p))) {
unlock_page(p);
@@ -1145,7 +1186,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
return 0;
}

- lock_page(hpage);
+ lock_page(p);

/*
* The page could have changed compound pages during the locking.
@@ -1172,33 +1213,22 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
if (!PageHWPoison(p)) {
pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
num_poisoned_pages_dec();
- unlock_page(hpage);
- put_hwpoison_page(hpage);
+ unlock_page(p);
+ put_hwpoison_page(p);
return 0;
}
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
num_poisoned_pages_dec();
- unlock_page(hpage);
- put_hwpoison_page(hpage);
+ unlock_page(p);
+ put_hwpoison_page(p);
return 0;
}

- if (!PageHuge(p) && !PageTransTail(p) && !PageLRU(p))
+ if (!PageTransTail(p) && !PageLRU(p))
goto identify_page_state;

/*
- * For error on the tail page, we should set PG_hwpoison
- * on the head page to show that the hugepage is hwpoisoned
- */
- if (PageHuge(p) && PageTail(p) && TestSetPageHWPoison(hpage)) {
- action_result(pfn, MF_MSG_POISONED_HUGE, MF_IGNORED);
- unlock_page(hpage);
- put_hwpoison_page(hpage);
- return 0;
- }
-
- /*
* It's very difficult to mess with pages currently under IO
* and in many cases impossible, so we just avoid it here.
*/
@@ -1245,7 +1275,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
break;
res = page_action(ps, p, pfn);
out:
- unlock_page(hpage);
+ unlock_page(p);
return res;
}
EXPORT_SYMBOL_GPL(memory_failure);
--
2.7.0

2017-06-01 08:17:29

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 9/9] mm: hwpoison: introduce idenfity_page_state

Factoring duplicate code into a function.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 57 +++++++++++++++++++++++------------------------------
1 file changed, 25 insertions(+), 32 deletions(-)

diff --git v4.12-rc3/mm/memory-failure.c v4.12-rc3_patched/mm/memory-failure.c
index 85009ead..d4a88d0 100644
--- v4.12-rc3/mm/memory-failure.c
+++ v4.12-rc3_patched/mm/memory-failure.c
@@ -1025,9 +1025,31 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
return unmap_success;
}

-static int memory_failure_hugetlb(unsigned long pfn, int trapno, int flags)
+static int identify_page_state(unsigned long pfn, struct page *p,
+ unsigned long page_flags)
{
struct page_state *ps;
+
+ /*
+ * The first check uses the current page flags which may not have any
+ * relevant information. The second check with the saved page flags is
+ * carried out only if the first check can't determine the page status.
+ */
+ for (ps = error_states;; ps++)
+ if ((p->flags & ps->mask) == ps->res)
+ break;
+
+ page_flags |= (p->flags & (1UL << PG_dirty));
+
+ if (!ps->mask)
+ for (ps = error_states;; ps++)
+ if ((page_flags & ps->mask) == ps->res)
+ break;
+ return page_action(ps, p, pfn);
+}
+
+static int memory_failure_hugetlb(unsigned long pfn, int trapno, int flags)
+{
struct page *p = pfn_to_page(pfn);
struct page *head = compound_head(p);
int res;
@@ -1077,19 +1099,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int trapno, int flags)
goto out;
}

- res = -EBUSY;
-
- for (ps = error_states;; ps++)
- if ((p->flags & ps->mask) == ps->res)
- break;
-
- page_flags |= (p->flags & (1UL << PG_dirty));
-
- if (!ps->mask)
- for (ps = error_states;; ps++)
- if ((page_flags & ps->mask) == ps->res)
- break;
- res = page_action(ps, p, pfn);
+ res = identify_page_state(pfn, p, page_flags);
out:
unlock_page(head);
return res;
@@ -1115,7 +1125,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int trapno, int flags)
*/
int memory_failure(unsigned long pfn, int trapno, int flags)
{
- struct page_state *ps;
struct page *p;
struct page *hpage;
struct page *orig_head;
@@ -1273,23 +1282,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
}

identify_page_state:
- res = -EBUSY;
- /*
- * The first check uses the current page flags which may not have any
- * relevant information. The second check with the saved page flagss is
- * carried out only if the first check can't determine the page status.
- */
- for (ps = error_states;; ps++)
- if ((p->flags & ps->mask) == ps->res)
- break;
-
- page_flags |= (p->flags & (1UL << PG_dirty));
-
- if (!ps->mask)
- for (ps = error_states;; ps++)
- if ((page_flags & ps->mask) == ps->res)
- break;
- res = page_action(ps, p, pfn);
+ res = identify_page_state(pfn, p, page_flags);
out:
unlock_page(p);
return res;
--
2.7.0

2017-06-01 08:17:35

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 7/9] mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error

Currently me_huge_page() relies on dequeue_hwpoisoned_huge_page() to
keep the error hugepage away from the system, which is OK but not good
enough because the hugepage still has a refcount and unpoison doesn't
work on the error hugepage (PageHWPoison flags are cleared but pages
are still leaked.) And there's "wasting health subpages" issue too.
This patch reworks on me_huge_page() to solve these issues.

For hugetlb file, recently we have truncating code so let's use it
in hugetlbfs specific ->error_remove_page().

For anonymous hugepage, it's helpful to dissolve the error page after
freeing it into free hugepage list. Migration entry and PageHWPoison
in the head page prevent the access to it.

TODO: dissolve_free_huge_page() can fail but we don't considered it
yet. It's not critical (and at least no worse that now) because in
such case the error hugepage just stays in free hugepage list without
being dissolved. By virtue of PageHWPoison in head page, it's never
allocated to processes.

Fixes: 23a003bfd23ea9ea0b7756b920e51f64b284b468 ("mm/madvise: pass return code of memory_failure() to userspace")
Link: http://lkml.kernel.org/r/20170417055948.GM31394@yexl-desktop
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
---
fs/hugetlbfs/inode.c | 11 ++++++
mm/memory-failure.c | 94 ++++++++++++++++++++++++++++++----------------------
2 files changed, 66 insertions(+), 39 deletions(-)

diff --git v4.12-rc3/fs/hugetlbfs/inode.c v4.12-rc3_patched/fs/hugetlbfs/inode.c
index dde8613..33b33ec 100644
--- v4.12-rc3/fs/hugetlbfs/inode.c
+++ v4.12-rc3_patched/fs/hugetlbfs/inode.c
@@ -851,6 +851,16 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
return MIGRATEPAGE_SUCCESS;
}

+static int hugetlbfs_error_remove_page(struct address_space *mapping,
+ struct page *page)
+{
+ struct inode *inode = mapping->host;
+
+ remove_huge_page(page);
+ hugetlb_fix_reserve_counts(inode);
+ return 0;
+}
+
static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(dentry->d_sb);
@@ -966,6 +976,7 @@ static const struct address_space_operations hugetlbfs_aops = {
.write_end = hugetlbfs_write_end,
.set_page_dirty = hugetlbfs_set_page_dirty,
.migratepage = hugetlbfs_migrate_page,
+ .error_remove_page = hugetlbfs_error_remove_page,
};


diff --git v4.12-rc3/mm/memory-failure.c v4.12-rc3_patched/mm/memory-failure.c
index 31b761a..047867b 100644
--- v4.12-rc3/mm/memory-failure.c
+++ v4.12-rc3_patched/mm/memory-failure.c
@@ -554,6 +554,39 @@ static int delete_from_lru_cache(struct page *p)
return -EIO;
}

+static int truncate_error_page(struct page *p, unsigned long pfn,
+ struct address_space *mapping)
+{
+ int ret = MF_FAILED;
+
+ if (mapping->a_ops->error_remove_page) {
+ int err = mapping->a_ops->error_remove_page(mapping, p);
+
+ if (err != 0) {
+ pr_info("Memory failure: %#lx: Failed to punch page: %d\n",
+ pfn, err);
+ } else if (page_has_private(p) &&
+ !try_to_release_page(p, GFP_NOIO)) {
+ pr_info("Memory failure: %#lx: failed to release buffers\n",
+ pfn);
+ } else {
+ ret = MF_RECOVERED;
+ }
+ } else {
+ /*
+ * If the file system doesn't support it just invalidate
+ * This fails on dirty or anything with private pages
+ */
+ if (invalidate_inode_page(p))
+ ret = MF_RECOVERED;
+ else
+ pr_info("Memory failure: %#lx: Failed to invalidate\n",
+ pfn);
+ }
+
+ return ret;
+}
+
/*
* Error hit kernel page.
* Do nothing, try to be lucky and not touch this instead. For a few cases we
@@ -579,7 +612,6 @@ static int me_unknown(struct page *p, unsigned long pfn)
static int me_pagecache_clean(struct page *p, unsigned long pfn)
{
int err;
- int ret = MF_FAILED;
struct address_space *mapping;

delete_from_lru_cache(p);
@@ -611,30 +643,7 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
*
* Open: to take i_mutex or not for this? Right now we don't.
*/
- if (mapping->a_ops->error_remove_page) {
- err = mapping->a_ops->error_remove_page(mapping, p);
- if (err != 0) {
- pr_info("Memory failure: %#lx: Failed to punch page: %d\n",
- pfn, err);
- } else if (page_has_private(p) &&
- !try_to_release_page(p, GFP_NOIO)) {
- pr_info("Memory failure: %#lx: failed to release buffers\n",
- pfn);
- } else {
- ret = MF_RECOVERED;
- }
- } else {
- /*
- * If the file system doesn't support it just invalidate
- * This fails on dirty or anything with private pages
- */
- if (invalidate_inode_page(p))
- ret = MF_RECOVERED;
- else
- pr_info("Memory failure: %#lx: Failed to invalidate\n",
- pfn);
- }
- return ret;
+ return truncate_error_page(p, pfn, mapping);
}

/*
@@ -740,24 +749,31 @@ static int me_huge_page(struct page *p, unsigned long pfn)
{
int res = 0;
struct page *hpage = compound_head(p);
+ struct address_space *mapping;

if (!PageHuge(hpage))
return MF_DELAYED;

- /*
- * We can safely recover from error on free or reserved (i.e.
- * not in-use) hugepage by dequeuing it from freelist.
- * To check whether a hugepage is in-use or not, we can't use
- * page->lru because it can be used in other hugepage operations,
- * such as __unmap_hugepage_range() and gather_surplus_pages().
- * So instead we use page_mapping() and PageAnon().
- */
- if (!(page_mapping(hpage) || PageAnon(hpage))) {
- res = dequeue_hwpoisoned_huge_page(hpage);
- if (!res)
- return MF_RECOVERED;
+ mapping = page_mapping(hpage);
+ if (mapping) {
+ res = truncate_error_page(hpage, pfn, mapping);
+ } else {
+ int dissolve_ret;
+
+ unlock_page(hpage);
+ /*
+ * migration entry prevents later access on error anonymous
+ * hugepage, so we can free and dissolve it into buddy to
+ * save healthy subpages.
+ */
+ if (PageAnon(hpage))
+ put_page(hpage);
+ dissolve_free_huge_page(p);
+ res = MF_RECOVERED;
+ lock_page(hpage);
}
- return MF_DELAYED;
+
+ return res;
}

/*
@@ -856,7 +872,7 @@ static int page_action(struct page_state *ps, struct page *p,
count = page_count(p) - 1;
if (ps->action == me_swapcache_dirty && result == MF_DELAYED)
count--;
- if (count != 0) {
+ if (count > 0) {
pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
pfn, action_page_types[ps->type], count);
result = MF_FAILED;
--
2.7.0

2017-06-01 08:17:43

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 8/9] mm: hugetlb: delete dequeue_hwpoisoned_huge_page()

dequeue_hwpoisoned_huge_page() is no longer used, so let's remove it.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/hugetlb.h | 6 ------
mm/hugetlb.c | 34 ----------------------------------
mm/memory-failure.c | 11 -----------
3 files changed, 51 deletions(-)

diff --git v4.12-rc3/include/linux/hugetlb.h v4.12-rc3_patched/include/linux/hugetlb.h
index 89afe40..b81931b 100644
--- v4.12-rc3/include/linux/hugetlb.h
+++ v4.12-rc3_patched/include/linux/hugetlb.h
@@ -92,7 +92,6 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
vm_flags_t vm_flags);
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
long freed);
-int dequeue_hwpoisoned_huge_page(struct page *page);
bool isolate_huge_page(struct page *page, struct list_head *list);
void putback_active_hugepage(struct page *page);
void free_huge_page(struct page *page);
@@ -158,11 +157,6 @@ static inline void hugetlb_show_meminfo(void)
#define hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
src_addr, pagep) ({ BUG(); 0; })
#define huge_pte_offset(mm, address) 0
-static inline int dequeue_hwpoisoned_huge_page(struct page *page)
-{
- return 0;
-}
-
static inline bool isolate_huge_page(struct page *page, struct list_head *list)
{
return false;
diff --git v4.12-rc3/mm/hugetlb.c v4.12-rc3_patched/mm/hugetlb.c
index 41c37ed..17f5f0f 100644
--- v4.12-rc3/mm/hugetlb.c
+++ v4.12-rc3_patched/mm/hugetlb.c
@@ -4701,40 +4701,6 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
}

-#ifdef CONFIG_MEMORY_FAILURE
-
-/*
- * This function is called from memory failure code.
- */
-int dequeue_hwpoisoned_huge_page(struct page *hpage)
-{
- struct hstate *h = page_hstate(hpage);
- int nid = page_to_nid(hpage);
- int ret = -EBUSY;
-
- spin_lock(&hugetlb_lock);
- /*
- * Just checking !page_huge_active is not enough, because that could be
- * an isolated/hwpoisoned hugepage (which have >0 refcount).
- */
- if (!page_huge_active(hpage) && !page_count(hpage)) {
- /*
- * Hwpoisoned hugepage isn't linked to activelist or freelist,
- * but dangling hpage->lru can trigger list-debug warnings
- * (this happens when we call unpoison_memory() on it),
- * so let it point to itself with list_del_init().
- */
- list_del_init(&hpage->lru);
- set_page_refcounted(hpage);
- h->free_huge_pages--;
- h->free_huge_pages_node[nid]--;
- ret = 0;
- }
- spin_unlock(&hugetlb_lock);
- return ret;
-}
-#endif
-
bool isolate_huge_page(struct page *page, struct list_head *list)
{
bool ret = true;
diff --git v4.12-rc3/mm/memory-failure.c v4.12-rc3_patched/mm/memory-failure.c
index 047867b..85009ead 100644
--- v4.12-rc3/mm/memory-failure.c
+++ v4.12-rc3_patched/mm/memory-failure.c
@@ -1458,17 +1458,6 @@ int unpoison_memory(unsigned long pfn)
}

if (!get_hwpoison_page(p)) {
- /*
- * Since HWPoisoned hugepage should have non-zero refcount,
- * race between memory failure and unpoison seems to happen.
- * In such case unpoison fails and memory failure runs
- * to the end.
- */
- if (PageHuge(page)) {
- unpoison_pr_info("Unpoison: Memory failure is now running on free hugepage %#lx\n",
- pfn, &unpoison_rs);
- return 0;
- }
if (TestClearPageHWPoison(p))
num_poisoned_pages_dec();
unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
--
2.7.0

2017-06-01 08:18:07

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 5/9] mm: soft-offline: dissolve free hugepage if soft-offlined

Now we have code to rescue most of healthy pages from a hwpoisoned
hugepage. So let's apply it to soft_offline_free_page too.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git v4.12-rc3/mm/memory-failure.c v4.12-rc3_patched/mm/memory-failure.c
index e03903f..a5f52cc 100644
--- v4.12-rc3/mm/memory-failure.c
+++ v4.12-rc3_patched/mm/memory-failure.c
@@ -1693,7 +1693,7 @@ static void soft_offline_free_page(struct page *page)
if (!TestSetPageHWPoison(head)) {
num_poisoned_pages_inc();
if (PageHuge(head))
- dequeue_hwpoisoned_huge_page(head);
+ dissolve_free_huge_page(page);
}
}

--
2.7.0

2017-06-01 08:18:33

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 3/9] mm: hwpoison: change PageHWPoison behavior on hugetlb pages

We'd like to narrow down the error region in memory error on hugetlb
pages. However, currently we set PageHWPoison flags on all subpages
in the error hugepage and add # of subpages to num_hwpoison_pages,
which doesn't fit our purpose.

So this patch changes the behavior and we only set PageHWPoison on
the head page then increase num_hwpoison_pages only by 1. This is
a preparation for narrow-down part which comes in later patches.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/swapops.h | 9 -----
mm/memory-failure.c | 87 ++++++++++++++-----------------------------------
2 files changed, 24 insertions(+), 72 deletions(-)

diff --git v4.12-rc3/include/linux/swapops.h v4.12-rc3_patched/include/linux/swapops.h
index 5c3a5f3..c5ff7b2 100644
--- v4.12-rc3/include/linux/swapops.h
+++ v4.12-rc3_patched/include/linux/swapops.h
@@ -196,15 +196,6 @@ static inline void num_poisoned_pages_dec(void)
atomic_long_dec(&num_poisoned_pages);
}

-static inline void num_poisoned_pages_add(long num)
-{
- atomic_long_add(num, &num_poisoned_pages);
-}
-
-static inline void num_poisoned_pages_sub(long num)
-{
- atomic_long_sub(num, &num_poisoned_pages);
-}
#else

static inline swp_entry_t make_hwpoison_entry(struct page *page)
diff --git v4.12-rc3/mm/memory-failure.c v4.12-rc3_patched/mm/memory-failure.c
index 6c1a7c9..aae620f 100644
--- v4.12-rc3/mm/memory-failure.c
+++ v4.12-rc3_patched/mm/memory-failure.c
@@ -1009,22 +1009,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
return unmap_success;
}

-static void set_page_hwpoison_huge_page(struct page *hpage)
-{
- int i;
- int nr_pages = 1 << compound_order(hpage);
- for (i = 0; i < nr_pages; i++)
- SetPageHWPoison(hpage + i);
-}
-
-static void clear_page_hwpoison_huge_page(struct page *hpage)
-{
- int i;
- int nr_pages = 1 << compound_order(hpage);
- for (i = 0; i < nr_pages; i++)
- ClearPageHWPoison(hpage + i);
-}
-
/**
* memory_failure - Handle memory failure of a page.
* @pfn: Page Number of the corrupted page
@@ -1050,7 +1034,6 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
struct page *hpage;
struct page *orig_head;
int res;
- unsigned int nr_pages;
unsigned long page_flags;

if (!sysctl_memory_failure_recovery)
@@ -1064,24 +1047,23 @@ int memory_failure(unsigned long pfn, int trapno, int flags)

p = pfn_to_page(pfn);
orig_head = hpage = compound_head(p);
+
+ /* tmporary check code, to be updated in later patches */
+ if (PageHuge(p)) {
+ if (TestSetPageHWPoison(hpage)) {
+ pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
+ return 0;
+ }
+ goto tmp;
+ }
if (TestSetPageHWPoison(p)) {
pr_err("Memory failure: %#lx: already hardware poisoned\n",
pfn);
return 0;
}

- /*
- * Currently errors on hugetlbfs pages are measured in hugepage units,
- * so nr_pages should be 1 << compound_order. OTOH when errors are on
- * transparent hugepages, they are supposed to be split and error
- * measurement is done in normal page units. So nr_pages should be one
- * in this case.
- */
- if (PageHuge(p))
- nr_pages = 1 << compound_order(hpage);
- else /* normal page or thp */
- nr_pages = 1;
- num_poisoned_pages_add(nr_pages);
+tmp:
+ num_poisoned_pages_inc();

/*
* We need/can do nothing about count=0 pages.
@@ -1109,12 +1091,11 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
if (PageHWPoison(hpage)) {
if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
|| (p != hpage && TestSetPageHWPoison(hpage))) {
- num_poisoned_pages_sub(nr_pages);
+ num_poisoned_pages_dec();
unlock_page(hpage);
return 0;
}
}
- set_page_hwpoison_huge_page(hpage);
res = dequeue_hwpoisoned_huge_page(hpage);
action_result(pfn, MF_MSG_FREE_HUGE,
res ? MF_IGNORED : MF_DELAYED);
@@ -1137,7 +1118,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
pr_err("Memory failure: %#lx: thp split failed\n",
pfn);
if (TestClearPageHWPoison(p))
- num_poisoned_pages_sub(nr_pages);
+ num_poisoned_pages_dec();
put_hwpoison_page(p);
return -EBUSY;
}
@@ -1190,14 +1171,14 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
*/
if (!PageHWPoison(p)) {
pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
- num_poisoned_pages_sub(nr_pages);
+ num_poisoned_pages_dec();
unlock_page(hpage);
put_hwpoison_page(hpage);
return 0;
}
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
- num_poisoned_pages_sub(nr_pages);
+ num_poisoned_pages_dec();
unlock_page(hpage);
put_hwpoison_page(hpage);
return 0;
@@ -1216,14 +1197,6 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
put_hwpoison_page(hpage);
return 0;
}
- /*
- * Set PG_hwpoison on all pages in an error hugepage,
- * because containment is done in hugepage unit for now.
- * Since we have done TestSetPageHWPoison() for the head page with
- * page lock held, we can safely set PG_hwpoison bits on tail pages.
- */
- if (PageHuge(p))
- set_page_hwpoison_huge_page(hpage);

/*
* It's very difficult to mess with pages currently under IO
@@ -1394,7 +1367,6 @@ int unpoison_memory(unsigned long pfn)
struct page *page;
struct page *p;
int freeit = 0;
- unsigned int nr_pages;
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);

@@ -1439,8 +1411,6 @@ int unpoison_memory(unsigned long pfn)
return 0;
}

- nr_pages = 1 << compound_order(page);
-
if (!get_hwpoison_page(p)) {
/*
* Since HWPoisoned hugepage should have non-zero refcount,
@@ -1470,10 +1440,8 @@ int unpoison_memory(unsigned long pfn)
if (TestClearPageHWPoison(page)) {
unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
pfn, &unpoison_rs);
- num_poisoned_pages_sub(nr_pages);
+ num_poisoned_pages_dec();
freeit = 1;
- if (PageHuge(page))
- clear_page_hwpoison_huge_page(page);
}
unlock_page(page);

@@ -1604,14 +1572,10 @@ static int soft_offline_huge_page(struct page *page, int flags)
ret = -EIO;
} else {
/* overcommit hugetlb page will be freed to buddy */
- if (PageHuge(page)) {
- set_page_hwpoison_huge_page(hpage);
+ SetPageHWPoison(page);
+ if (PageHuge(page))
dequeue_hwpoisoned_huge_page(hpage);
- num_poisoned_pages_add(1 << compound_order(hpage));
- } else {
- SetPageHWPoison(page);
- num_poisoned_pages_inc();
- }
+ num_poisoned_pages_inc();
}
return ret;
}
@@ -1727,15 +1691,12 @@ static int soft_offline_in_use_page(struct page *page, int flags)

static void soft_offline_free_page(struct page *page)
{
- if (PageHuge(page)) {
- struct page *hpage = compound_head(page);
+ struct page *head = compound_head(page);

- set_page_hwpoison_huge_page(hpage);
- if (!dequeue_hwpoisoned_huge_page(hpage))
- num_poisoned_pages_add(1 << compound_order(hpage));
- } else {
- if (!TestSetPageHWPoison(page))
- num_poisoned_pages_inc();
+ if (!TestSetPageHWPoison(head)) {
+ num_poisoned_pages_inc();
+ if (PageHuge(head))
+ dequeue_hwpoisoned_huge_page(head);
}
}

--
2.7.0

2017-06-01 08:18:50

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 2/9] mm: hugetlb: return immediately for hugetlb page in __delete_from_page_cache()

We avoid calling __mod_node_page_state(NR_FILE_PAGES) for hugetlb page
now, but it's not enough because later code doesn't handle hugetlb
properly. Actually in our testing, WARN_ON_ONCE(PageDirty(page)) at the
end of this function fires for hugetlb, which makes no sense. So we
should return immediately for hugetlb pages.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/filemap.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git v4.12-rc3/mm/filemap.c v4.12-rc3_patched/mm/filemap.c
index 6f1be57..2867fcb 100644
--- v4.12-rc3/mm/filemap.c
+++ v4.12-rc3_patched/mm/filemap.c
@@ -239,14 +239,16 @@ void __delete_from_page_cache(struct page *page, void *shadow)
/* Leave page->index set: truncation lookup relies upon it */

/* hugetlb pages do not participate in page cache accounting. */
- if (!PageHuge(page))
- __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
+ if (PageHuge(page))
+ return;
+
+ __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
if (PageSwapBacked(page)) {
__mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr);
if (PageTransHuge(page))
__dec_node_page_state(page, NR_SHMEM_THPS);
} else {
- VM_BUG_ON_PAGE(PageTransHuge(page) && !PageHuge(page), page);
+ VM_BUG_ON_PAGE(PageTransHuge(page), page);
}

/*
--
2.7.0