I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
rebasing required some non-trivial changes to adjust, but mainly that was
straightforward. I confirmed that the reported problem doesn't reproduce on
compaction after soft offline. For more precise description of the problem
and the motivation of this patchset, please see [2].
I think that the following two patches in v2 are better to be done with
separate work of hard-offline rework, so it's not included in this series.
- mm,hwpoison: Take pages off the buddy when hard-offlining
- mm/hwpoison-inject: Rip off duplicated checks
These two are not directly related to the reported problem, so they seems
not urgent. And the first one breaks num_poisoned_pages counting in some
testcases, and The second patch needs more consideration about commented point.
Any comment/suggestion/help would be appreciated.
[1] v1: https://lore.kernel.org/linux-mm/[email protected]/
[2] v2: https://lore.kernel.org/linux-mm/[email protected]/
Thanks,
Naoya Horiguchi
---
Summary:
Naoya Horiguchi (7):
mm,hwpoison: cleanup unused PageHuge() check
mm, hwpoison: remove recalculating hpage
mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
mm,hwpoison-inject: don't pin for hwpoison_filter
mm,hwpoison: remove MF_COUNT_INCREASED
mm,hwpoison: remove flag argument from soft offline functions
mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
Oscar Salvador (8):
mm,madvise: Refactor madvise_inject_error
mm,hwpoison: Un-export get_hwpoison_page and make it static
mm,hwpoison: Kill put_hwpoison_page
mm,hwpoison: Unify THP handling for hard and soft offline
mm,hwpoison: Rework soft offline for free pages
mm,hwpoison: Rework soft offline for in-use pages
mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
drivers/base/memory.c | 2 +-
include/linux/mm.h | 12 +-
include/linux/page-flags.h | 6 +-
include/ras/ras_event.h | 3 +
mm/hwpoison-inject.c | 18 +--
mm/madvise.c | 39 +++---
mm/memory-failure.c | 331 ++++++++++++++++++++-------------------------
mm/migrate.c | 11 +-
mm/page_alloc.c | 63 +++++++--
9 files changed, 233 insertions(+), 252 deletions(-)
From: Oscar Salvador <[email protected]>
Merging soft_offline_huge_page and __soft_offline_page let us get rid of
quite some duplicated code, and makes the code much easier to follow.
Now, __soft_offline_page will handle both normal and hugetlb pages.
Note that move put_page() block to the beginning of page_handle_poison()
with drain_all_pages() in order to make sure that the target page is
freed and sent into free list to make take_page_off_buddy() work properly.
Signed-off-by: Oscar Salvador <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
---
ChangeLog v2 -> v3:
- use page_is_file_lru() instead of page_is_file_cache(),
- add description about put_page() and drain_all_pages().
- fix coding style warnings by checkpatch.pl
---
mm/memory-failure.c | 185 ++++++++++++++++++++------------------------
1 file changed, 86 insertions(+), 99 deletions(-)
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index f744eb90c15c..22c904f6d17a 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -78,14 +78,36 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
-static void page_handle_poison(struct page *page, bool release)
+static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
{
+ if (release) {
+ put_page(page);
+ drain_all_pages(page_zone(page));
+ }
+
+ if (hugepage_or_freepage) {
+ /*
+ * Doing this check for free pages is also fine since dissolve_free_huge_page
+ * returns 0 for non-hugetlb pages as well.
+ */
+ if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
+ /*
+ * The hugetlb page can end up being enqueued back into
+ * the freelists by means of:
+ * unmap_and_move_huge_page
+ * putback_active_hugepage
+ * put_page->free_huge_page
+ * enqueue_huge_page
+ * If this happens, we might lose the race against an allocation.
+ */
+ return false;
+ }
SetPageHWPoison(page);
- if (release)
- put_page(page);
page_ref_inc(page);
num_poisoned_pages_inc();
+
+ return true;
}
static int hwpoison_filter_dev(struct page *p)
@@ -1718,63 +1740,52 @@ static int get_any_page(struct page *page, unsigned long pfn)
return ret;
}
-static int soft_offline_huge_page(struct page *page)
+static bool isolate_page(struct page *page, struct list_head *pagelist)
{
- int ret;
- unsigned long pfn = page_to_pfn(page);
- struct page *hpage = compound_head(page);
- LIST_HEAD(pagelist);
+ bool isolated = false;
+ bool lru = PageLRU(page);
+
+ if (PageHuge(page)) {
+ isolated = isolate_huge_page(page, pagelist);
+ } else {
+ if (lru)
+ isolated = !isolate_lru_page(page);
+ else
+ isolated = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
+
+ if (isolated)
+ list_add(&page->lru, pagelist);
- /*
- * This double-check of PageHWPoison is to avoid the race with
- * memory_failure(). See also comment in __soft_offline_page().
- */
- lock_page(hpage);
- if (PageHWPoison(hpage)) {
- unlock_page(hpage);
- put_page(hpage);
- pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
- return -EBUSY;
}
- unlock_page(hpage);
- ret = isolate_huge_page(hpage, &pagelist);
+ if (isolated && lru)
+ inc_node_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_lru(page));
+
/*
- * get_any_page() and isolate_huge_page() takes a refcount each,
- * so need to drop one here.
+ * If we succeed to isolate the page, we grabbed another refcount on
+ * the page, so we can safely drop the one we got from get_any_pages().
+ * If we failed to isolate the page, it means that we cannot go further
+ * and we will return an error, so drop the reference we got from
+ * get_any_pages() as well.
*/
- put_page(hpage);
- if (!ret) {
- pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
- return -EBUSY;
- }
-
- ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
- MIGRATE_SYNC, MR_MEMORY_FAILURE);
- if (ret) {
- pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n",
- pfn, ret, page->flags, &page->flags);
- if (!list_empty(&pagelist))
- putback_movable_pages(&pagelist);
- if (ret > 0)
- ret = -EIO;
- } else {
- /*
- * We set PG_hwpoison only when we were able to take the page
- * off the buddy.
- */
- if (!dissolve_free_huge_page(page) && take_page_off_buddy(page))
- page_handle_poison(page, false);
- else
- ret = -EBUSY;
- }
- return ret;
+ put_page(page);
+ return isolated;
}
+/*
+ * __soft_offline_page handles hugetlb-pages and non-hugetlb pages.
+ * If the page is a non-dirty unmapped page-cache page, it simply invalidates.
+ * If the page is mapped, it migrates the contents over.
+ */
static int __soft_offline_page(struct page *page)
{
- int ret;
+ int ret = 0;
unsigned long pfn = page_to_pfn(page);
+ struct page *hpage = compound_head(page);
+ const char *msg_page[] = {"page", "hugepage"};
+ bool huge = PageHuge(page);
+ LIST_HEAD(pagelist);
/*
* Check PageHWPoison again inside page lock because PageHWPoison
@@ -1783,98 +1794,74 @@ static int __soft_offline_page(struct page *page)
* so there's no race between soft_offline_page() and memory_failure().
*/
lock_page(page);
- wait_on_page_writeback(page);
+ if (!PageHuge(page))
+ wait_on_page_writeback(page);
if (PageHWPoison(page)) {
unlock_page(page);
put_page(page);
pr_info("soft offline: %#lx page already poisoned\n", pfn);
return -EBUSY;
}
- /*
- * Try to invalidate first. This should work for
- * non dirty unmapped page cache pages.
- */
- ret = invalidate_inode_page(page);
+
+ if (!PageHuge(page))
+ /*
+ * Try to invalidate first. This should work for
+ * non dirty unmapped page cache pages.
+ */
+ ret = invalidate_inode_page(page);
unlock_page(page);
+
/*
* RED-PEN would be better to keep it isolated here, but we
* would need to fix isolation locking first.
*/
- if (ret == 1) {
+ if (ret) {
pr_info("soft_offline: %#lx: invalidated\n", pfn);
- page_handle_poison(page, true);
+ page_handle_poison(page, false, true);
return 0;
}
- /*
- * Simple invalidation didn't work.
- * Try to migrate to a new page instead. migrate.c
- * handles a large number of cases for us.
- */
- if (PageLRU(page))
- ret = isolate_lru_page(page);
- else
- ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
- /*
- * Drop page reference which is came from get_any_page()
- * successful isolate_lru_page() already took another one.
- */
- put_page(page);
- if (!ret) {
- LIST_HEAD(pagelist);
- /*
- * After isolated lru page, the PageLRU will be cleared,
- * so use !__PageMovable instead for LRU page's mapping
- * cannot have PAGE_MAPPING_MOVABLE.
- */
- if (!__PageMovable(page))
- inc_node_page_state(page, NR_ISOLATED_ANON +
- page_is_file_lru(page));
- list_add(&page->lru, &pagelist);
+ if (isolate_page(hpage, &pagelist)) {
ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (!ret) {
- page_handle_poison(page, true);
+ bool release = !huge;
+
+ if (!page_handle_poison(page, true, release))
+ ret = -EBUSY;
} else {
if (!list_empty(&pagelist))
putback_movable_pages(&pagelist);
- pr_info("soft offline: %#lx: migration failed %d, type %lx (%pGp)\n",
- pfn, ret, page->flags, &page->flags);
+
+ pr_info("soft offline: %#lx: %s migration failed %d, type %lx (%pGp)\n",
+ pfn, msg_page[huge], ret, page->flags, &page->flags);
if (ret > 0)
ret = -EIO;
}
} else {
- pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx (%pGp)\n",
- pfn, ret, page_count(page), page->flags, &page->flags);
+ pr_info("soft offline: %#lx: %s isolation failed: %d, page count %d, type %lx (%pGp)\n",
+ pfn, msg_page[huge], ret, page_count(page), page->flags, &page->flags);
}
return ret;
}
static int soft_offline_in_use_page(struct page *page)
{
- int ret;
struct page *hpage = compound_head(page);
if (!PageHuge(page) && PageTransHuge(hpage))
if (try_to_split_thp_page(page, "soft offline") < 0)
return -EBUSY;
-
- if (PageHuge(page))
- ret = soft_offline_huge_page(page);
- else
- ret = __soft_offline_page(page);
- return ret;
+ return __soft_offline_page(page);
}
static int soft_offline_free_page(struct page *page)
{
- int rc = -EBUSY;
+ int rc = 0;
- if (!dissolve_free_huge_page(page) && take_page_off_buddy(page)) {
- page_handle_poison(page, false);
- rc = 0;
- }
+ if (!page_handle_poison(page, true, false))
+ rc = -EBUSY;
return rc;
}
--
2.17.1
From: Oscar Salvador <[email protected]>
After commit 4e41a30c6d50 ("mm: hwpoison: adjust for new thp refcounting"),
put_hwpoison_page got reduced to a put_page.
Let us just use put_page instead.
Signed-off-by: Oscar Salvador <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/mm.h | 1 -
mm/memory-failure.c | 30 +++++++++++++++---------------
2 files changed, 15 insertions(+), 16 deletions(-)
diff --git v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
index 050e9cffc2ea..c64ffec363b5 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
@@ -2995,7 +2995,6 @@ extern int memory_failure(unsigned long pfn, int flags);
extern void memory_failure_queue(unsigned long pfn, int flags);
extern void memory_failure_queue_kick(int cpu);
extern int unpoison_memory(unsigned long pfn);
-#define put_hwpoison_page(page) put_page(page)
extern int sysctl_memory_failure_early_kill;
extern int sysctl_memory_failure_recovery;
extern void shake_page(struct page *p, int access);
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index 48feb45047f7..0b7d9769cf29 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -1144,7 +1144,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
num_poisoned_pages_dec();
unlock_page(head);
- put_hwpoison_page(head);
+ put_page(head);
return 0;
}
@@ -1336,7 +1336,7 @@ int memory_failure(unsigned long pfn, int flags)
pfn);
if (TestClearPageHWPoison(p))
num_poisoned_pages_dec();
- put_hwpoison_page(p);
+ put_page(p);
return -EBUSY;
}
unlock_page(p);
@@ -1389,14 +1389,14 @@ int memory_failure(unsigned long pfn, int flags)
pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
num_poisoned_pages_dec();
unlock_page(p);
- put_hwpoison_page(p);
+ put_page(p);
return 0;
}
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
num_poisoned_pages_dec();
unlock_page(p);
- put_hwpoison_page(p);
+ put_page(p);
return 0;
}
@@ -1630,9 +1630,9 @@ int unpoison_memory(unsigned long pfn)
}
unlock_page(page);
- put_hwpoison_page(page);
+ put_page(page);
if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
- put_hwpoison_page(page);
+ put_page(page);
return 0;
}
@@ -1690,7 +1690,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
/*
* Try to free it.
*/
- put_hwpoison_page(page);
+ put_page(page);
shake_page(page, 1);
/*
@@ -1699,7 +1699,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
ret = __get_any_page(page, pfn, 0);
if (ret == 1 && !PageLRU(page)) {
/* Drop page reference which is from __get_any_page() */
- put_hwpoison_page(page);
+ put_page(page);
pr_info("soft_offline: %#lx: unknown non LRU page type %lx (%pGp)\n",
pfn, page->flags, &page->flags);
return -EIO;
@@ -1722,7 +1722,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
lock_page(hpage);
if (PageHWPoison(hpage)) {
unlock_page(hpage);
- put_hwpoison_page(hpage);
+ put_page(hpage);
pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
return -EBUSY;
}
@@ -1733,7 +1733,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
* get_any_page() and isolate_huge_page() takes a refcount each,
* so need to drop one here.
*/
- put_hwpoison_page(hpage);
+ put_page(hpage);
if (!ret) {
pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
return -EBUSY;
@@ -1782,7 +1782,7 @@ static int __soft_offline_page(struct page *page, int flags)
wait_on_page_writeback(page);
if (PageHWPoison(page)) {
unlock_page(page);
- put_hwpoison_page(page);
+ put_page(page);
pr_info("soft offline: %#lx page already poisoned\n", pfn);
return -EBUSY;
}
@@ -1797,7 +1797,7 @@ static int __soft_offline_page(struct page *page, int flags)
* would need to fix isolation locking first.
*/
if (ret == 1) {
- put_hwpoison_page(page);
+ put_page(page);
pr_info("soft_offline: %#lx: invalidated\n", pfn);
SetPageHWPoison(page);
num_poisoned_pages_inc();
@@ -1817,7 +1817,7 @@ static int __soft_offline_page(struct page *page, int flags)
* Drop page reference which is came from get_any_page()
* successful isolate_lru_page() already took another one.
*/
- put_hwpoison_page(page);
+ put_page(page);
if (!ret) {
LIST_HEAD(pagelist);
/*
@@ -1861,7 +1861,7 @@ static int soft_offline_in_use_page(struct page *page, int flags)
pr_info("soft offline: %#lx: non anonymous thp\n", page_to_pfn(page));
else
pr_info("soft offline: %#lx: thp split failed\n", page_to_pfn(page));
- put_hwpoison_page(page);
+ put_page(page);
return -EBUSY;
}
unlock_page(page);
@@ -1934,7 +1934,7 @@ int soft_offline_page(unsigned long pfn, int flags)
if (PageHWPoison(page)) {
pr_info("soft offline: %#lx page already poisoned\n", pfn);
if (flags & MF_COUNT_INCREASED)
- put_hwpoison_page(page);
+ put_page(page);
return -EBUSY;
}
--
2.17.1
On Wed, 24 Jun 2020 15:01:22 +0000 [email protected] wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward. I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline. For more precise description of the problem
> and the motivation of this patchset, please see [2].
>
> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
>
> - mm,hwpoison: Take pages off the buddy when hard-offlining
> - mm/hwpoison-inject: Rip off duplicated checks
>
> These two are not directly related to the reported problem, so they seems
> not urgent. And the first one breaks num_poisoned_pages counting in some
> testcases, and The second patch needs more consideration about commented point.
>
It would be nice to have some sort of overview of the patch series in
this [0/n] email.
> [1] v1: https://lore.kernel.org/linux-mm/[email protected]/
> [2] v2: https://lore.kernel.org/linux-mm/[email protected]/
The above have such, but are they up to date?
On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 15:01:22 +0000 [email protected] wrote:
>
> > I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> > rebasing required some non-trivial changes to adjust, but mainly that was
> > straightforward. I confirmed that the reported problem doesn't reproduce on
> > compaction after soft offline. For more precise description of the problem
> > and the motivation of this patchset, please see [2].
> >
> > I think that the following two patches in v2 are better to be done with
> > separate work of hard-offline rework, so it's not included in this series.
> >
> > - mm,hwpoison: Take pages off the buddy when hard-offlining
> > - mm/hwpoison-inject: Rip off duplicated checks
> >
> > These two are not directly related to the reported problem, so they seems
> > not urgent. And the first one breaks num_poisoned_pages counting in some
> > testcases, and The second patch needs more consideration about commented point.
> >
>
> It would be nice to have some sort of overview of the patch series in
> this [0/n] email.
>
> > [1] v1: https://lore.kernel.org/linux-mm/[email protected]/
> > [2] v2: https://lore.kernel.org/linux-mm/[email protected]/
>
> The above have such, but are they up to date?
The description of the problem doesn't change, but there're some new patches
and some patches are postponed, so I should've added an overview of this series:
- patch 1, 2 are cleanups.
- patch 3, 4, 5 change the precondition when calling memory_failure(). Previously
we sometimes call it with holding refcount of the target page and somtimes call
without holding it, and we passed a flag of whether refcount was taken out of
memory_failure(). It was confusing and caused code more complex than needed.
- patch 6-10 are cleanups.
- patch 11 introduces new logic to remove the error page from buddy allocator,
which is also applied to the path of soft-offling in-use pages in patch 12.
- patch 13 is basically a refactoring but I added some adjustment to make sure
that the freed page is surely sent back to buddy instead of being kept in pcplist,
which is based on discussion in v2.
- patch 14 fixes the inconsistency of return values between injection interfaces.
- patch 15 is a new patch to complement missing code found in code review for
previous version.
Core change is in patch 11 and 12, and the others are kind of cleanup/refactoring.
Thanks,
Naoya Horiguchi
On Wed, 24 Jun 2020 22:36:18 +0000 HORIGUCHI NAOYA(堀口 直也) <[email protected]> wrote:
> On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> > On Wed, 24 Jun 2020 15:01:22 +0000 [email protected] wrote:
> >
> > > I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> > > rebasing required some non-trivial changes to adjust, but mainly that was
> > > straightforward. I confirmed that the reported problem doesn't reproduce on
> > > compaction after soft offline. For more precise description of the problem
> > > and the motivation of this patchset, please see [2].
> > >
> > > I think that the following two patches in v2 are better to be done with
> > > separate work of hard-offline rework, so it's not included in this series.
> > >
> > > - mm,hwpoison: Take pages off the buddy when hard-offlining
> > > - mm/hwpoison-inject: Rip off duplicated checks
> > >
> > > These two are not directly related to the reported problem, so they seems
> > > not urgent. And the first one breaks num_poisoned_pages counting in some
> > > testcases, and The second patch needs more consideration about commented point.
> > >
> >
> > It would be nice to have some sort of overview of the patch series in
> > this [0/n] email.
> >
> > > [1] v1: https://lore.kernel.org/linux-mm/[email protected]/
> > > [2] v2: https://lore.kernel.org/linux-mm/[email protected]/
> >
> > The above have such, but are they up to date?
>
> The description of the problem doesn't change, but there're some new patches
> and some patches are postponed, so I should've added an overview of this series:
>
> - patch 1, 2 are cleanups.
> - patch 3, 4, 5 change the precondition when calling memory_failure(). Previously
> we sometimes call it with holding refcount of the target page and somtimes call
> without holding it, and we passed a flag of whether refcount was taken out of
> memory_failure(). It was confusing and caused code more complex than needed.
> - patch 6-10 are cleanups.
> - patch 11 introduces new logic to remove the error page from buddy allocator,
> which is also applied to the path of soft-offling in-use pages in patch 12.
> - patch 13 is basically a refactoring but I added some adjustment to make sure
> that the freed page is surely sent back to buddy instead of being kept in pcplist,
> which is based on discussion in v2.
> - patch 14 fixes the inconsistency of return values between injection interfaces.
> - patch 15 is a new patch to complement missing code found in code review for
> previous version.
>
> Core change is in patch 11 and 12, and the others are kind of cleanup/refactoring.
And all the other words in
https://lore.kernel.org/linux-mm/[email protected]/
are still accurate and complete?
On Wed, Jun 24, 2020 at 03:49:47PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 22:36:18 +0000 HORIGUCHI NAOYA($BKY8}!!D>Li(B) <[email protected]> wrote:
>
> > On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> > > On Wed, 24 Jun 2020 15:01:22 +0000 [email protected] wrote:
> > >
> > > > I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> > > > rebasing required some non-trivial changes to adjust, but mainly that was
> > > > straightforward. I confirmed that the reported problem doesn't reproduce on
> > > > compaction after soft offline. For more precise description of the problem
> > > > and the motivation of this patchset, please see [2].
> > > >
> > > > I think that the following two patches in v2 are better to be done with
> > > > separate work of hard-offline rework, so it's not included in this series.
> > > >
> > > > - mm,hwpoison: Take pages off the buddy when hard-offlining
> > > > - mm/hwpoison-inject: Rip off duplicated checks
> > > >
> > > > These two are not directly related to the reported problem, so they seems
> > > > not urgent. And the first one breaks num_poisoned_pages counting in some
> > > > testcases, and The second patch needs more consideration about commented point.
> > > >
> > >
> > > It would be nice to have some sort of overview of the patch series in
> > > this [0/n] email.
> > >
> > > > [1] v1: https://lore.kernel.org/linux-mm/[email protected]/
> > > > [2] v2: https://lore.kernel.org/linux-mm/[email protected]/
> > >
> > > The above have such, but are they up to date?
> >
> > The description of the problem doesn't change, but there're some new patches
> > and some patches are postponed, so I should've added an overview of this series:
> >
> > - patch 1, 2 are cleanups.
> > - patch 3, 4, 5 change the precondition when calling memory_failure(). Previously
> > we sometimes call it with holding refcount of the target page and somtimes call
> > without holding it, and we passed a flag of whether refcount was taken out of
> > memory_failure(). It was confusing and caused code more complex than needed.
> > - patch 6-10 are cleanups.
> > - patch 11 introduces new logic to remove the error page from buddy allocator,
> > which is also applied to the path of soft-offling in-use pages in patch 12.
> > - patch 13 is basically a refactoring but I added some adjustment to make sure
> > that the freed page is surely sent back to buddy instead of being kept in pcplist,
> > which is based on discussion in v2.
> > - patch 14 fixes the inconsistency of return values between injection interfaces.
> > - patch 15 is a new patch to complement missing code found in code review for
> > previous version.
> >
> > Core change is in patch 11 and 12, and the others are kind of cleanup/refactoring.
>
> And all the other words in
> https://lore.kernel.org/linux-mm/[email protected]/
> are still accurate and complete?
Yes, they are.
- Naoya
On Wed, Jun 24, 2020 at 03:01:22PM +0000, [email protected] wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward. I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline. For more precise description of the problem
> and the motivation of this patchset, please see [2].
>
> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
>
> - mm,hwpoison: Take pages off the buddy when hard-offlining
> - mm/hwpoison-inject: Rip off duplicated checks
>
> These two are not directly related to the reported problem, so they seems
> not urgent. And the first one breaks num_poisoned_pages counting in some
> testcases, and The second patch needs more consideration about commented point.
>
> Any comment/suggestion/help would be appreciated.
Next-20200626 failed to compile due to this series. Reverting the whole
thing [1] will fix the issue below right away,
mm/memory-failure.c: In function ‘__soft_offline_page’:
mm/memory-failure.c:1827:3: error: implicit declaration of function ‘page_handle_poison’; did you mean ‘page_init_poison’? [-Werror=implicit-function-declaration]
page_handle_poison(page, false, true);
^~~~~~~~~~~~~~~~~~
page_init_poison
.config used,
https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
[1] git revert --no-edit f296ba1d3a07..0bd1762119e9
>
> [1] v1: https://lore.kernel.org/linux-mm/[email protected]/
> [2] v2: https://lore.kernel.org/linux-mm/[email protected]/
>
> Thanks,
> Naoya Horiguchi
> ---
> Summary:
>
> Naoya Horiguchi (7):
> mm,hwpoison: cleanup unused PageHuge() check
> mm, hwpoison: remove recalculating hpage
> mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
> mm,hwpoison-inject: don't pin for hwpoison_filter
> mm,hwpoison: remove MF_COUNT_INCREASED
> mm,hwpoison: remove flag argument from soft offline functions
> mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
>
> Oscar Salvador (8):
> mm,madvise: Refactor madvise_inject_error
> mm,hwpoison: Un-export get_hwpoison_page and make it static
> mm,hwpoison: Kill put_hwpoison_page
> mm,hwpoison: Unify THP handling for hard and soft offline
> mm,hwpoison: Rework soft offline for free pages
> mm,hwpoison: Rework soft offline for in-use pages
> mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
> mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
>
> drivers/base/memory.c | 2 +-
> include/linux/mm.h | 12 +-
> include/linux/page-flags.h | 6 +-
> include/ras/ras_event.h | 3 +
> mm/hwpoison-inject.c | 18 +--
> mm/madvise.c | 39 +++---
> mm/memory-failure.c | 331 ++++++++++++++++++++-------------------------
> mm/migrate.c | 11 +-
> mm/page_alloc.c | 63 +++++++--
> 9 files changed, 233 insertions(+), 252 deletions(-)
>
On Wed, 24 Jun 2020 at 20:32, <[email protected]> wrote:
>
> From: Oscar Salvador <[email protected]>
>
> Merging soft_offline_huge_page and __soft_offline_page let us get rid of
> quite some duplicated code, and makes the code much easier to follow.
>
> Now, __soft_offline_page will handle both normal and hugetlb pages.
>
> Note that move put_page() block to the beginning of page_handle_poison()
> with drain_all_pages() in order to make sure that the target page is
> freed and sent into free list to make take_page_off_buddy() work properly.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> ChangeLog v2 -> v3:
> - use page_is_file_lru() instead of page_is_file_cache(),
> - add description about put_page() and drain_all_pages().
> - fix coding style warnings by checkpatch.pl
> ---
> mm/memory-failure.c | 185 ++++++++++++++++++++------------------------
> 1 file changed, 86 insertions(+), 99 deletions(-)
>
> diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
> index f744eb90c15c..22c904f6d17a 100644
> --- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
> +++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
> @@ -78,14 +78,36 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
> EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
> EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
>
> -static void page_handle_poison(struct page *page, bool release)
> +static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
> {
> + if (release) {
> + put_page(page);
> + drain_all_pages(page_zone(page));
> + }
> +
> + if (hugepage_or_freepage) {
> + /*
> + * Doing this check for free pages is also fine since dissolve_free_huge_page
> + * returns 0 for non-hugetlb pages as well.
> + */
> + if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
> + /*
> + * The hugetlb page can end up being enqueued back into
> + * the freelists by means of:
> + * unmap_and_move_huge_page
> + * putback_active_hugepage
> + * put_page->free_huge_page
> + * enqueue_huge_page
> + * If this happens, we might lose the race against an allocation.
> + */
> + return false;
> + }
>
> SetPageHWPoison(page);
> - if (release)
> - put_page(page);
> page_ref_inc(page);
> num_poisoned_pages_inc();
> +
> + return true;
> }
>
> static int hwpoison_filter_dev(struct page *p)
> @@ -1718,63 +1740,52 @@ static int get_any_page(struct page *page, unsigned long pfn)
> return ret;
> }
>
> -static int soft_offline_huge_page(struct page *page)
> +static bool isolate_page(struct page *page, struct list_head *pagelist)
> {
> - int ret;
> - unsigned long pfn = page_to_pfn(page);
> - struct page *hpage = compound_head(page);
> - LIST_HEAD(pagelist);
> + bool isolated = false;
> + bool lru = PageLRU(page);
> +
> + if (PageHuge(page)) {
> + isolated = isolate_huge_page(page, pagelist);
> + } else {
> + if (lru)
> + isolated = !isolate_lru_page(page);
> + else
> + isolated = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
> +
> + if (isolated)
> + list_add(&page->lru, pagelist);
>
> - /*
> - * This double-check of PageHWPoison is to avoid the race with
> - * memory_failure(). See also comment in __soft_offline_page().
> - */
> - lock_page(hpage);
> - if (PageHWPoison(hpage)) {
> - unlock_page(hpage);
> - put_page(hpage);
> - pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
> - return -EBUSY;
> }
> - unlock_page(hpage);
>
> - ret = isolate_huge_page(hpage, &pagelist);
> + if (isolated && lru)
> + inc_node_page_state(page, NR_ISOLATED_ANON +
> + page_is_file_lru(page));
> +
> /*
> - * get_any_page() and isolate_huge_page() takes a refcount each,
> - * so need to drop one here.
> + * If we succeed to isolate the page, we grabbed another refcount on
> + * the page, so we can safely drop the one we got from get_any_pages().
> + * If we failed to isolate the page, it means that we cannot go further
> + * and we will return an error, so drop the reference we got from
> + * get_any_pages() as well.
> */
> - put_page(hpage);
> - if (!ret) {
> - pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
> - return -EBUSY;
> - }
> -
> - ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> - MIGRATE_SYNC, MR_MEMORY_FAILURE);
> - if (ret) {
> - pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n",
> - pfn, ret, page->flags, &page->flags);
> - if (!list_empty(&pagelist))
> - putback_movable_pages(&pagelist);
> - if (ret > 0)
> - ret = -EIO;
> - } else {
> - /*
> - * We set PG_hwpoison only when we were able to take the page
> - * off the buddy.
> - */
> - if (!dissolve_free_huge_page(page) && take_page_off_buddy(page))
> - page_handle_poison(page, false);
> - else
> - ret = -EBUSY;
> - }
> - return ret;
> + put_page(page);
> + return isolated;
> }
>
> +/*
> + * __soft_offline_page handles hugetlb-pages and non-hugetlb pages.
> + * If the page is a non-dirty unmapped page-cache page, it simply invalidates.
> + * If the page is mapped, it migrates the contents over.
> + */
> static int __soft_offline_page(struct page *page)
> {
> - int ret;
> + int ret = 0;
> unsigned long pfn = page_to_pfn(page);
> + struct page *hpage = compound_head(page);
> + const char *msg_page[] = {"page", "hugepage"};
> + bool huge = PageHuge(page);
> + LIST_HEAD(pagelist);
>
> /*
> * Check PageHWPoison again inside page lock because PageHWPoison
> @@ -1783,98 +1794,74 @@ static int __soft_offline_page(struct page *page)
> * so there's no race between soft_offline_page() and memory_failure().
> */
> lock_page(page);
> - wait_on_page_writeback(page);
> + if (!PageHuge(page))
> + wait_on_page_writeback(page);
> if (PageHWPoison(page)) {
> unlock_page(page);
> put_page(page);
> pr_info("soft offline: %#lx page already poisoned\n", pfn);
> return -EBUSY;
> }
> - /*
> - * Try to invalidate first. This should work for
> - * non dirty unmapped page cache pages.
> - */
> - ret = invalidate_inode_page(page);
> +
> + if (!PageHuge(page))
> + /*
> + * Try to invalidate first. This should work for
> + * non dirty unmapped page cache pages.
> + */
> + ret = invalidate_inode_page(page);
> unlock_page(page);
> +
> /*
> * RED-PEN would be better to keep it isolated here, but we
> * would need to fix isolation locking first.
> */
> - if (ret == 1) {
> + if (ret) {
> pr_info("soft_offline: %#lx: invalidated\n", pfn);
> - page_handle_poison(page, true);
> + page_handle_poison(page, false, true);
arm64 build failed on linux-next 20200626 tag
make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=arm64
CROSS_COMPILE=aarch64-linux-gnu- HOSTCC=gcc CC="sccache
aarch64-linux-gnu-gcc" O=build Image
79 #
80 ../mm/memory-failure.c: In function ‘__soft_offline_page’:
81 ../mm/memory-failure.c:1827:3: error: implicit declaration of
function ‘page_handle_poison’; did you mean ‘page_init_poison’?
[-Werror=implicit-function-declaration]
82 1827 | page_handle_poison(page, false, true);
83 | ^~~~~~~~~~~~~~~~~~
84 | page_init_poison
--
Linaro LKFT
https://lkft.linaro.org
On Fri, Jun 26, 2020 at 10:23:43PM +0530, Naresh Kamboju wrote:
> On Wed, 24 Jun 2020 at 20:32, <[email protected]> wrote:
> >
> > From: Oscar Salvador <[email protected]>
> >
> > Merging soft_offline_huge_page and __soft_offline_page let us get rid of
> > quite some duplicated code, and makes the code much easier to follow.
> >
> > Now, __soft_offline_page will handle both normal and hugetlb pages.
> >
> > Note that move put_page() block to the beginning of page_handle_poison()
> > with drain_all_pages() in order to make sure that the target page is
> > freed and sent into free list to make take_page_off_buddy() work properly.
> >
> > Signed-off-by: Oscar Salvador <[email protected]>
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > ---
> > ChangeLog v2 -> v3:
> > - use page_is_file_lru() instead of page_is_file_cache(),
> > - add description about put_page() and drain_all_pages().
> > - fix coding style warnings by checkpatch.pl
> > ---
> > mm/memory-failure.c | 185 ++++++++++++++++++++------------------------
> > 1 file changed, 86 insertions(+), 99 deletions(-)
> >
> > diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
> > index f744eb90c15c..22c904f6d17a 100644
> > --- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
> > +++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
> > @@ -78,14 +78,36 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
> > EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
> > EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
> >
> > -static void page_handle_poison(struct page *page, bool release)
> > +static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
> > {
> > + if (release) {
> > + put_page(page);
> > + drain_all_pages(page_zone(page));
> > + }
> > +
> > + if (hugepage_or_freepage) {
> > + /*
> > + * Doing this check for free pages is also fine since dissolve_free_huge_page
> > + * returns 0 for non-hugetlb pages as well.
> > + */
> > + if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
> > + /*
> > + * The hugetlb page can end up being enqueued back into
> > + * the freelists by means of:
> > + * unmap_and_move_huge_page
> > + * putback_active_hugepage
> > + * put_page->free_huge_page
> > + * enqueue_huge_page
> > + * If this happens, we might lose the race against an allocation.
> > + */
> > + return false;
> > + }
> >
> > SetPageHWPoison(page);
> > - if (release)
> > - put_page(page);
> > page_ref_inc(page);
> > num_poisoned_pages_inc();
> > +
> > + return true;
> > }
> >
> > static int hwpoison_filter_dev(struct page *p)
> > @@ -1718,63 +1740,52 @@ static int get_any_page(struct page *page, unsigned long pfn)
> > return ret;
> > }
> >
> > -static int soft_offline_huge_page(struct page *page)
> > +static bool isolate_page(struct page *page, struct list_head *pagelist)
> > {
> > - int ret;
> > - unsigned long pfn = page_to_pfn(page);
> > - struct page *hpage = compound_head(page);
> > - LIST_HEAD(pagelist);
> > + bool isolated = false;
> > + bool lru = PageLRU(page);
> > +
> > + if (PageHuge(page)) {
> > + isolated = isolate_huge_page(page, pagelist);
> > + } else {
> > + if (lru)
> > + isolated = !isolate_lru_page(page);
> > + else
> > + isolated = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
> > +
> > + if (isolated)
> > + list_add(&page->lru, pagelist);
> >
> > - /*
> > - * This double-check of PageHWPoison is to avoid the race with
> > - * memory_failure(). See also comment in __soft_offline_page().
> > - */
> > - lock_page(hpage);
> > - if (PageHWPoison(hpage)) {
> > - unlock_page(hpage);
> > - put_page(hpage);
> > - pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
> > - return -EBUSY;
> > }
> > - unlock_page(hpage);
> >
> > - ret = isolate_huge_page(hpage, &pagelist);
> > + if (isolated && lru)
> > + inc_node_page_state(page, NR_ISOLATED_ANON +
> > + page_is_file_lru(page));
> > +
> > /*
> > - * get_any_page() and isolate_huge_page() takes a refcount each,
> > - * so need to drop one here.
> > + * If we succeed to isolate the page, we grabbed another refcount on
> > + * the page, so we can safely drop the one we got from get_any_pages().
> > + * If we failed to isolate the page, it means that we cannot go further
> > + * and we will return an error, so drop the reference we got from
> > + * get_any_pages() as well.
> > */
> > - put_page(hpage);
> > - if (!ret) {
> > - pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
> > - return -EBUSY;
> > - }
> > -
> > - ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> > - MIGRATE_SYNC, MR_MEMORY_FAILURE);
> > - if (ret) {
> > - pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n",
> > - pfn, ret, page->flags, &page->flags);
> > - if (!list_empty(&pagelist))
> > - putback_movable_pages(&pagelist);
> > - if (ret > 0)
> > - ret = -EIO;
> > - } else {
> > - /*
> > - * We set PG_hwpoison only when we were able to take the page
> > - * off the buddy.
> > - */
> > - if (!dissolve_free_huge_page(page) && take_page_off_buddy(page))
> > - page_handle_poison(page, false);
> > - else
> > - ret = -EBUSY;
> > - }
> > - return ret;
> > + put_page(page);
> > + return isolated;
> > }
> >
> > +/*
> > + * __soft_offline_page handles hugetlb-pages and non-hugetlb pages.
> > + * If the page is a non-dirty unmapped page-cache page, it simply invalidates.
> > + * If the page is mapped, it migrates the contents over.
> > + */
> > static int __soft_offline_page(struct page *page)
> > {
> > - int ret;
> > + int ret = 0;
> > unsigned long pfn = page_to_pfn(page);
> > + struct page *hpage = compound_head(page);
> > + const char *msg_page[] = {"page", "hugepage"};
> > + bool huge = PageHuge(page);
> > + LIST_HEAD(pagelist);
> >
> > /*
> > * Check PageHWPoison again inside page lock because PageHWPoison
> > @@ -1783,98 +1794,74 @@ static int __soft_offline_page(struct page *page)
> > * so there's no race between soft_offline_page() and memory_failure().
> > */
> > lock_page(page);
> > - wait_on_page_writeback(page);
> > + if (!PageHuge(page))
> > + wait_on_page_writeback(page);
> > if (PageHWPoison(page)) {
> > unlock_page(page);
> > put_page(page);
> > pr_info("soft offline: %#lx page already poisoned\n", pfn);
> > return -EBUSY;
> > }
> > - /*
> > - * Try to invalidate first. This should work for
> > - * non dirty unmapped page cache pages.
> > - */
> > - ret = invalidate_inode_page(page);
> > +
> > + if (!PageHuge(page))
> > + /*
> > + * Try to invalidate first. This should work for
> > + * non dirty unmapped page cache pages.
> > + */
> > + ret = invalidate_inode_page(page);
> > unlock_page(page);
> > +
> > /*
> > * RED-PEN would be better to keep it isolated here, but we
> > * would need to fix isolation locking first.
> > */
> > - if (ret == 1) {
> > + if (ret) {
> > pr_info("soft_offline: %#lx: invalidated\n", pfn);
> > - page_handle_poison(page, true);
> > + page_handle_poison(page, false, true);
>
> arm64 build failed on linux-next 20200626 tag
>
> make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=arm64
> CROSS_COMPILE=aarch64-linux-gnu- HOSTCC=gcc CC="sccache
> aarch64-linux-gnu-gcc" O=build Image
> 79 #
> 80 ../mm/memory-failure.c: In function $B!F(B__soft_offline_page$B!G(B:
> 81 ../mm/memory-failure.c:1827:3: error: implicit declaration of
> function $B!F(Bpage_handle_poison$B!G(B; did you mean $B!F(Bpage_init_poison$B!G(B?
> [-Werror=implicit-function-declaration]
> 82 1827 | page_handle_poison(page, false, true);
> 83 | ^~~~~~~~~~~~~~~~~~
> 84 | page_init_poison
Thanks for reporting, Naresh, Qian Cai.
page_handle_poison() is now defined in #ifdef CONFIG_HWPOISON_INJECT block,
which is not correct because this function is used in non-injection code
path. I'd like to move this function out of the block and it affects
multiple patches in this series, so maybe I have to update/resend the whole
series, but I like to wait for a few days for other reviews, so for quick
fix for linux-next I suggest the following one.
Andrew, could you append this diff on top of this series on mmotm?
Thanks,
Naoya Horiguchi
---
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e90ddddab397..b49590bc5615 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -78,38 +78,6 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
-static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
-{
- if (release) {
- put_page(page);
- drain_all_pages(page_zone(page));
- }
-
- if (hugepage_or_freepage) {
- /*
- * Doing this check for free pages is also fine since dissolve_free_huge_page
- * returns 0 for non-hugetlb pages as well.
- */
- if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
- /*
- * The hugetlb page can end up being enqueued back into
- * the freelists by means of:
- * unmap_and_move_huge_page
- * putback_active_hugepage
- * put_page->free_huge_page
- * enqueue_huge_page
- * If this happens, we might lose the race against an allocation.
- */
- return false;
- }
-
- SetPageHWPoison(page);
- page_ref_inc(page);
- num_poisoned_pages_inc();
-
- return true;
-}
-
static int hwpoison_filter_dev(struct page *p)
{
struct address_space *mapping;
@@ -204,6 +172,38 @@ int hwpoison_filter(struct page *p)
EXPORT_SYMBOL_GPL(hwpoison_filter);
+static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
+{
+ if (release) {
+ put_page(page);
+ drain_all_pages(page_zone(page));
+ }
+
+ if (hugepage_or_freepage) {
+ /*
+ * Doing this check for free pages is also fine since dissolve_free_huge_page
+ * returns 0 for non-hugetlb pages as well.
+ */
+ if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
+ /*
+ * The hugetlb page can end up being enqueued back into
+ * the freelists by means of:
+ * unmap_and_move_huge_page
+ * putback_active_hugepage
+ * put_page->free_huge_page
+ * enqueue_huge_page
+ * If this happens, we might lose the race against an allocation.
+ */
+ return false;
+ }
+
+ SetPageHWPoison(page);
+ page_ref_inc(page);
+ num_poisoned_pages_inc();
+
+ return true;
+}
+
/*
* Kill all processes that have a poisoned page mapped and then isolate
* the page.
Hi Naoya,
On Sun, 28 Jun 2020 15:54:09 +0900 Naoya Horiguchi <[email protected]> wrote:
>
> Andrew, could you append this diff on top of this series on mmotm?
I have added that patch to linux-next today.
--
Cheers,
Stephen Rothwell
On Mon, Jun 29, 2020 at 05:22:40PM +1000, Stephen Rothwell wrote:
> Hi Naoya,
>
> On Sun, 28 Jun 2020 15:54:09 +0900 Naoya Horiguchi <[email protected]> wrote:
> >
> > Andrew, could you append this diff on top of this series on mmotm?
>
> I have added that patch to linux-next today.
Thank you!
- Naoya
On Wed, 2020-06-24 at 15:01 +0000, [email protected] wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest
> mmotm. The
> rebasing required some non-trivial changes to adjust, but mainly that
> was
> straightforward. I confirmed that the reported problem doesn't
> reproduce on
> compaction after soft offline. For more precise description of the
> problem
> and the motivation of this patchset, please see [2].
Hi Naoya,
Thanks for dusting this off.
To be honest, I got stuck with the hard offline mode so this delayed
the resubmission, along other problems.
> I think that the following two patches in v2 are better to be done
> with
> separate work of hard-offline rework, so it's not included in this
> series.
>
> - mm,hwpoison: Take pages off the buddy when hard-offlining
> - mm/hwpoison-inject: Rip off duplicated checks
>
> These two are not directly related to the reported problem, so they
> seems
> not urgent. And the first one breaks num_poisoned_pages counting in
> some
> testcases, and The second patch needs more consideration about
> commented point.
I fully agree.
> Any comment/suggestion/help would be appreciated.
My "new" version included a patch to make sure we give a chance to
pages that possibly are in a pcplist.
Current behavior is that if someone tries to soft-offline such a page,
we return an error because page count is 0 but page is not in the buddy
system.
Since this patchset already landed in the mm tree, I could send it as a
standalone patch on top if you agree with it.
My patch looked something like:
From: Oscar Salvador <[email protected]>
Date: Mon, 29 Jun 2020 12:25:11 +0200
Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for
non-buddy
zero-refcount page
A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
Currently, we bail out with an error if we encounter such a page,
meaning that we do not give a chance to handle pcppages.
Fix this by draining pcplists whenever we find this kind of page
and retry the check again.
It might be that pcplists have been spilled into the buddy allocator
and so we can handle it.
Signed-off-by: Oscar Salvador <[email protected]>
---
mm/memory-failure.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e90ddddab397..3aac3f1eeed0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -958,7 +958,7 @@ static int page_action(struct page_state *ps,
struct page *p,
* Return: return 0 if failed to grab the refcount, otherwise true
(some
* non-zero value.)
*/
-static int get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page)
{
struct page *head = compound_head(page);
@@ -988,6 +988,28 @@ static int get_hwpoison_page(struct page *page)
return 0;
}
+static int get_hwpoison_page(struct page *p)
+{
+ int ret;
+ bool drained = false;
+
+retry:
+ ret = __get_hwpoison_page(p);
+ if (!ret) {
+ if (!is_free_buddy_page(p) && !page_count(p) &&
!drained) {
+ /*
+ * The page might be in a pcplist, so try to
drain
+ * those and see if we are lucky.
+ */
+ drain_all_pages(page_zone(p));
+ drained = true;
+ goto retry;
+ }
+ }
+
+ return ret;
+}
+
/*
* Do all that is necessary to remove user space mappings. Unmap
* the pages and send SIGBUS to the processes if the data was dirty.
--
2.26.2
--
Oscar Salvador
SUSE L3
On Wed, Jun 24, 2020 at 03:01:22PM +0000, [email protected] wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward. I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline. For more precise description of the problem
> and the motivation of this patchset, please see [2].
>
> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
>
> - mm,hwpoison: Take pages off the buddy when hard-offlining
> - mm/hwpoison-inject: Rip off duplicated checks
>
> These two are not directly related to the reported problem, so they seems
> not urgent. And the first one breaks num_poisoned_pages counting in some
> testcases, and The second patch needs more consideration about commented point.
>
> Any comment/suggestion/help would be appreciated.
Even after applied the compling fix,
https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
would succeed without this series. Steps:
# git clone https://github.com/cailca/linux-mm
# cd linux-mm; make
# ./random 1 (Need at least two NUMA memory nodes)
start: migrate_huge_offline
- use NUMA nodes 0,4.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 4
madvise: Input/output error
(x86.config is also included there.)
[10718.158548][T13039] __get_any_page: 0x1d1600 free huge page
[10718.165684][T13039] Soft offlining pfn 0x1d1600 at process virtual address 0x7f1dd2000000
[10718.175061][T13039] Soft offlining pfn 0x154c00 at process virtual address 0x7f1dd2200000
[10718.185492][T13039] Soft offlining pfn 0x137c00 at process virtual address 0x7f1dd2000000
[10718.195209][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd2200000
[10718.203483][T13039] soft offline: 0x4c7a00: hugepage isolation failed: 0, page count 2, type bfffc00001000f (locked|referenced|uptodate|dirty|head)
[10718.218228][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd2000000
[10718.227522][T13039] Soft offlining pfn 0x2da800 at process virtual address 0x7f1dd2200000
[10718.238503][T13039] Soft offlining pfn 0x1de200 at process virtual address 0x7f1dd2000000
[10718.247822][T13039] Soft offlining pfn 0x155c00 at process virtual address 0x7f1dd2200000
[10718.259031][T13039] Soft offlining pfn 0x203600 at process virtual address 0x7f1dd2000000
[10718.268504][T13039] Soft offlining pfn 0x417600 at process virtual address 0x7f1dd2200000
[10718.278830][T13039] Soft offlining pfn 0x20a600 at process virtual address 0x7f1dd2000000
[10718.288133][T13039] Soft offlining pfn 0x1d0800 at process virtual address 0x7f1dd2200000
[10718.299198][T13039] Soft offlining pfn 0x3e5800 at process virtual address 0x7f1dd2000000
[10718.308593][T13039] Soft offlining pfn 0x21f200 at process virtual address 0x7f1dd2200000
[10718.319725][T13039] Soft offlining pfn 0x18c600 at process virtual address 0x7f1dd2000000
[10718.329301][T13039] Soft offlining pfn 0x396a00 at process virtual address 0x7f1dd2200000
[10718.378882][T13039] Soft offlining pfn 0x4d5000 at process virtual address 0x7f1dd2000000
[10718.388415][T13039] Soft offlining pfn 0x4e5000 at process virtual address 0x7f1dd2200000
[10718.398807][T13039] Soft offlining pfn 0x2f5000 at process virtual address 0x7f1dd2000000
[10718.408236][T13039] Soft offlining pfn 0x50b400 at process virtual address 0x7f1dd2200000
[10718.419781][T13039] Soft offlining pfn 0x396800 at process virtual address 0x7f1dd2000000
[10718.429677][T13039] Soft offlining pfn 0xd69c00 at process virtual address 0x7f1dd2200000
[10718.440435][T13039] Soft offlining pfn 0x21f000 at process virtual address 0x7f1dd2000000
[10718.450341][T13039] Soft offlining pfn 0x399400 at process virtual address 0x7f1dd2200000
[10718.458768][T13039] __get_any_page: 0x399400: unknown zero refcount page type bfffc000000000
The main part is,
https://github.com/cailca/linux-mm/blob/master/random.c#L372
>
> [1] v1: https://lore.kernel.org/linux-mm/[email protected]/
> [2] v2: https://lore.kernel.org/linux-mm/[email protected]/
>
> Thanks,
> Naoya Horiguchi
> ---
> Summary:
>
> Naoya Horiguchi (7):
> mm,hwpoison: cleanup unused PageHuge() check
> mm, hwpoison: remove recalculating hpage
> mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
> mm,hwpoison-inject: don't pin for hwpoison_filter
> mm,hwpoison: remove MF_COUNT_INCREASED
> mm,hwpoison: remove flag argument from soft offline functions
> mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
>
> Oscar Salvador (8):
> mm,madvise: Refactor madvise_inject_error
> mm,hwpoison: Un-export get_hwpoison_page and make it static
> mm,hwpoison: Kill put_hwpoison_page
> mm,hwpoison: Unify THP handling for hard and soft offline
> mm,hwpoison: Rework soft offline for free pages
> mm,hwpoison: Rework soft offline for in-use pages
> mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
> mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
>
> drivers/base/memory.c | 2 +-
> include/linux/mm.h | 12 +-
> include/linux/page-flags.h | 6 +-
> include/ras/ras_event.h | 3 +
> mm/hwpoison-inject.c | 18 +--
> mm/madvise.c | 39 +++---
> mm/memory-failure.c | 331 ++++++++++++++++++++-------------------------
> mm/migrate.c | 11 +-
> mm/page_alloc.c | 63 +++++++--
> 9 files changed, 233 insertions(+), 252 deletions(-)
>
On Tue, 2020-06-30 at 01:08 -0400, Qian Cai wrote:
> On Wed, Jun 24, 2020 at 03:01:22PM +0000, [email protected]
> wrote:
> > I rebased soft-offline rework patchset [1][2] onto the latest
> > mmotm. The
> > rebasing required some non-trivial changes to adjust, but mainly
> > that was
> > straightforward. I confirmed that the reported problem doesn't
> > reproduce on
> > compaction after soft offline. For more precise description of the
> > problem
> > and the motivation of this patchset, please see [2].
> >
> > I think that the following two patches in v2 are better to be done
> > with
> > separate work of hard-offline rework, so it's not included in this
> > series.
> >
> > - mm,hwpoison: Take pages off the buddy when hard-offlining
> > - mm/hwpoison-inject: Rip off duplicated checks
> >
> > These two are not directly related to the reported problem, so they
> > seems
> > not urgent. And the first one breaks num_poisoned_pages counting
> > in some
> > testcases, and The second patch needs more consideration about
> > commented point.
> >
> > Any comment/suggestion/help would be appreciated.
>
> Even after applied the compling fix,
>
> https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
>
> madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> would succeed without this series. Steps:
>
> # git clone https://github.com/cailca/linux-mm
> # cd linux-mm; make
> # ./random 1 (Need at least two NUMA memory nodes)
> start: migrate_huge_offline
> - use NUMA nodes 0,4.
> - mmap and free 8388608 bytes hugepages on node 0
> - mmap and free 8388608 bytes hugepages on node 4
> madvise: Input/output error
I think I know why.
It's been a while since I took a look, but I compared the posted
patchset with my newest patchset I had ready and I saw I made some
changes with regard of hugetlb pages.
I will be taking a look, although it might be better to re-post the
patchset instead of adding a fix on top since the changes are a bit
substantial.
Thanks for reporting.
--
Oscar Salvador
SUSE L3
On Mon, Jun 29, 2020 at 12:29:25PM +0200, Oscar Salvador wrote:
> On Wed, 2020-06-24 at 15:01 +0000, [email protected] wrote:
> > I rebased soft-offline rework patchset [1][2] onto the latest
> > mmotm. The
> > rebasing required some non-trivial changes to adjust, but mainly that
> > was
> > straightforward. I confirmed that the reported problem doesn't
> > reproduce on
> > compaction after soft offline. For more precise description of the
> > problem
> > and the motivation of this patchset, please see [2].
>
> Hi Naoya,
>
> Thanks for dusting this off.
> To be honest, I got stuck with the hard offline mode so this delayed
> the resubmission, along other problems.
>
> > I think that the following two patches in v2 are better to be done
> > with
> > separate work of hard-offline rework, so it's not included in this
> > series.
> >
> > - mm,hwpoison: Take pages off the buddy when hard-offlining
> > - mm/hwpoison-inject: Rip off duplicated checks
> >
> > These two are not directly related to the reported problem, so they
> > seems
> > not urgent. And the first one breaks num_poisoned_pages counting in
> > some
> > testcases, and The second patch needs more consideration about
> > commented point.
>
> I fully agree.
>
> > Any comment/suggestion/help would be appreciated.
>
> My "new" version included a patch to make sure we give a chance to
> pages that possibly are in a pcplist.
> Current behavior is that if someone tries to soft-offline such a page,
> we return an error because page count is 0 but page is not in the buddy
> system.
>
> Since this patchset already landed in the mm tree, I could send it as a
> standalone patch on top if you agree with it.
>
> My patch looked something like:
>
> From: Oscar Salvador <[email protected]>
> Date: Mon, 29 Jun 2020 12:25:11 +0200
> Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for
> non-buddy
> zero-refcount page
>
> A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
> Currently, we bail out with an error if we encounter such a page,
> meaning that we do not give a chance to handle pcppages.
>
> Fix this by draining pcplists whenever we find this kind of page
> and retry the check again.
> It might be that pcplists have been spilled into the buddy allocator
> and so we can handle it.
>
> Signed-off-by: Oscar Salvador <[email protected]>
I'm fine with this patch, so this will be with the next version with
some minor change like moving function comment block together.
Actually I feel we could refactor get_hwpoison_page() with get_any_page()
to handle this kind of pcplist related issues in one place. But this could
be more important in hard-offline reworking, and we don't have to do it
in this series.
Thanks,
Naoya Horiguchi
On Tue, 2020-06-30 at 08:35 +0200, Oscar Salvador wrote:
> > Even after applied the compling fix,
> >
> > https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
> >
> > madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> > would succeed without this series. Steps:
> >
> > # git clone https://github.com/cailca/linux-mm
> > # cd linux-mm; make
> > # ./random 1 (Need at least two NUMA memory nodes)
> > start: migrate_huge_offline
> > - use NUMA nodes 0,4.
> > - mmap and free 8388608 bytes hugepages on node 0
> > - mmap and free 8388608 bytes hugepages on node 4
> > madvise: Input/output error
>
> I think I know why.
> It's been a while since I took a look, but I compared the posted
> patchset with my newest patchset I had ready and I saw I made some
> changes with regard of hugetlb pages.
>
> I will be taking a look, although it might be better to re-post the
> patchset instead of adding a fix on top since the changes are a bit
> substantial.
Yap, I just confirmed this.
Posted version had some flaws wrt. hugetlb pages, that is why I was
working on a v3.
I am rebasing my v3 on top of the mm with the posted patchset reverted.
I expect to post it on Friday.
--
Oscar Salvador
SUSE L3
On Wed, Jul 01, 2020 at 10:22:07AM +0200, Oscar Salvador wrote:
> On Tue, 2020-06-30 at 08:35 +0200, Oscar Salvador wrote:
> > > Even after applied the compling fix,
> > >
> > > https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
> > >
> > > madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> > > would succeed without this series. Steps:
> > >
> > > # git clone https://github.com/cailca/linux-mm
> > > # cd linux-mm; make
> > > # ./random 1 (Need at least two NUMA memory nodes)
> > > start: migrate_huge_offline
> > > - use NUMA nodes 0,4.
> > > - mmap and free 8388608 bytes hugepages on node 0
> > > - mmap and free 8388608 bytes hugepages on node 4
> > > madvise: Input/output error
> >
> > I think I know why.
> > It's been a while since I took a look, but I compared the posted
> > patchset with my newest patchset I had ready and I saw I made some
> > changes with regard of hugetlb pages.
> >
> > I will be taking a look, although it might be better to re-post the
> > patchset instead of adding a fix on top since the changes are a bit
> > substantial.
>
> Yap, I just confirmed this.
I've reproduced the EIO, but still not sure how this happens ...
> Posted version had some flaws wrt. hugetlb pages, that is why I was
> working on a v3.
> I am rebasing my v3 on top of the mm with the posted patchset reverted.
OK, thank you for offering to work for v4, so I'll wait for it for testing.
Thanks,
Naoya Horiguchi
On Tue, Jun 30, 2020 at 01:08:03AM -0400, Qian Cai wrote:
> Even after applied the compling fix,
>
> https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
>
> madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> would succeed without this series. Steps:
>
> # git clone https://github.com/cailca/linux-mm
> # cd linux-mm; make
> # ./random 1 (Need at least two NUMA memory nodes)
> start: migrate_huge_offline
> - use NUMA nodes 0,4.
> - mmap and free 8388608 bytes hugepages on node 0
> - mmap and free 8388608 bytes hugepages on node 4
> madvise: Input/output error
Ok, sorry for the lateness, but I had to re-fetch the code on my brain again.
I just finished v4 of this patchset and it seems this problem is gone:
# ./random 1
- start: migrate_huge_offline
- use NUMA nodes 0,1.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 1
- pass: mmap_offline_node_huge
- start: hotplug_memory
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Invalid argument
offline: Device or resource busy
offline: Invalid argument
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
- pass: hotplug_memory
The test seems to suceed and no crash on the kernel side.
I will just run some more tests to make sure the thing is solid enough
and then I will post v4.
Thanks
--
Oscar Salvador
SUSE L3
On Tue, Jul 14, 2020 at 12:08:46PM +0200, Oscar Salvador wrote:
> On Tue, Jun 30, 2020 at 01:08:03AM -0400, Qian Cai wrote:
> > Even after applied the compling fix,
> >
> > https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
> >
> > madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> > would succeed without this series. Steps:
> >
> > # git clone https://github.com/cailca/linux-mm
> > # cd linux-mm; make
> > # ./random 1 (Need at least two NUMA memory nodes)
> > start: migrate_huge_offline
> > - use NUMA nodes 0,4.
> > - mmap and free 8388608 bytes hugepages on node 0
> > - mmap and free 8388608 bytes hugepages on node 4
> > madvise: Input/output error
>
> Ok, sorry for the lateness, but I had to re-fetch the code on my brain again.
>
> I just finished v4 of this patchset and it seems this problem is gone:
>
> # ./random 1
> - start: migrate_huge_offline
> - use NUMA nodes 0,1.
> - mmap and free 8388608 bytes hugepages on node 0
> - mmap and free 8388608 bytes hugepages on node 1
> - pass: mmap_offline_node_huge
> - start: hotplug_memory
> offline: Device or resource busy
> offline: Device or resource busy
> offline: Device or resource busy
> offline: Device or resource busy
> offline: Device or resource busy
> offline: Device or resource busy
> offline: Device or resource busy
> offline: Device or resource busy
> offline: Invalid argument
> offline: Device or resource busy
> offline: Invalid argument
> offline: Device or resource busy
> offline: Device or resource busy
> offline: Device or resource busy
> offline: Device or resource busy
> - pass: hotplug_memory
>
> The test seems to suceed and no crash on the kernel side.
>
> I will just run some more tests to make sure the thing is solid enough
> and then I will post v4.
Great, I plan to reproduce a bit more of the crash below which I am not 100%
sure yet which patchset/patch is the culprit where that LTP move_pages12
reproducer does the similar things.
https://lore.kernel.org/lkml/[email protected]/