Hi everyone,
This is v3 of the fix of return value issue of hugepage soft-offlining
(v2: https://lkml.org/lkml/2019/6/10/156).
Straightforwardly applied feedbacks on v2.
Thanks,
Naoya Horiguchi
The pass/fail of soft offline should be judged by checking whether the
raw error page was finally contained or not (i.e. the result of
set_hwpoison_free_buddy_page()), but current code do not work like that.
So this patch is suggesting to fix it.
Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may
not offline the original page and will not return an error. It might
lead us to misjudge the test result when set_hwpoison_free_buddy_page()
actually fails.
Signed-off-by: Naoya Horiguchi <[email protected]>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Cc: <[email protected]> # v4.19+
---
ChangeLog v2->v3:
- update patch description to clarify user visible change
---
mm/memory-failure.c | 2 ++
1 file changed, 2 insertions(+)
diff --git v5.2-rc4/mm/memory-failure.c v5.2-rc4_patched/mm/memory-failure.c
index 8da0334..8ee7b16 100644
--- v5.2-rc4/mm/memory-failure.c
+++ v5.2-rc4_patched/mm/memory-failure.c
@@ -1730,6 +1730,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
if (!ret) {
if (set_hwpoison_free_buddy_page(page))
num_poisoned_pages_inc();
+ else
+ ret = -EBUSY;
}
}
return ret;
--
2.7.0
madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
for hugepages with overcommitting enabled. That was caused by the suboptimal
code in current soft-offline code. See the following part:
ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (ret) {
...
} else {
/*
* We set PG_hwpoison only when the migration source hugepage
* was successfully dissolved, because otherwise hwpoisoned
* hugepage remains on free hugepage list, then userspace will
* find it as SIGBUS by allocation failure. That's not expected
* in soft-offlining.
*/
ret = dissolve_free_huge_page(page);
if (!ret) {
if (set_hwpoison_free_buddy_page(page))
num_poisoned_pages_inc();
}
}
return ret;
Here dissolve_free_huge_page() returns -EBUSY if the migration source page
was freed into buddy in migrate_pages(), but even in that case we actually
has a chance that set_hwpoison_free_buddy_page() succeeds. So that means
current code gives up offlining too early now.
dissolve_free_huge_page() checks that a given hugepage is suitable for
dissolving, where we should return success for !PageHuge() case because
the given hugepage is considered as already dissolved.
This change also affects other callers of dissolve_free_huge_page(),
which are cleaned up together.
Reported-by: Chen, Jerry T <[email protected]>
Tested-by: Chen, Jerry T <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Cc: <[email protected]> # v4.19+
---
ChangeLog v2->v3:
- add PageHuge check in dissolve_free_huge_page() outside hugetlb_lock
- update comment on dissolve_free_huge_page() about return value
---
mm/hugetlb.c | 29 ++++++++++++++++++++---------
mm/memory-failure.c | 5 +----
2 files changed, 21 insertions(+), 13 deletions(-)
diff --git v5.2-rc4/mm/hugetlb.c v5.2-rc4_patched/mm/hugetlb.c
index ac843d3..ede7e7f 100644
--- v5.2-rc4/mm/hugetlb.c
+++ v5.2-rc4_patched/mm/hugetlb.c
@@ -1510,16 +1510,29 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
/*
* Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * dissolution fails because a give page is not a free hugepage, or because
- * free hugepages are fully reserved.
+ * nothing for in-use hugepages and non-hugepages.
+ * This function returns values like below:
+ *
+ * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
+ * (allocated or reserved.)
+ * 0: successfully dissolved free hugepages or the page is not a
+ * hugepage (considered as already dissolved)
*/
int dissolve_free_huge_page(struct page *page)
{
int rc = -EBUSY;
+ /* Not to disrupt normal path by vainly holding hugetlb_lock */
+ if (!PageHuge(page))
+ return 0;
+
spin_lock(&hugetlb_lock);
- if (PageHuge(page) && !page_count(page)) {
+ if (!PageHuge(page)) {
+ rc = 0;
+ goto out;
+ }
+
+ if (!page_count(page)) {
struct page *head = compound_head(page);
struct hstate *h = page_hstate(head);
int nid = page_to_nid(head);
@@ -1564,11 +1577,9 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
page = pfn_to_page(pfn);
- if (PageHuge(page) && !page_count(page)) {
- rc = dissolve_free_huge_page(page);
- if (rc)
- break;
- }
+ rc = dissolve_free_huge_page(page);
+ if (rc)
+ break;
}
return rc;
diff --git v5.2-rc4/mm/memory-failure.c v5.2-rc4_patched/mm/memory-failure.c
index 8ee7b16..d9cc660 100644
--- v5.2-rc4/mm/memory-failure.c
+++ v5.2-rc4_patched/mm/memory-failure.c
@@ -1856,11 +1856,8 @@ static int soft_offline_in_use_page(struct page *page, int flags)
static int soft_offline_free_page(struct page *page)
{
- int rc = 0;
- struct page *head = compound_head(page);
+ int rc = dissolve_free_huge_page(page);
- if (PageHuge(head))
- rc = dissolve_free_huge_page(page);
if (!rc) {
if (set_hwpoison_free_buddy_page(page))
num_poisoned_pages_inc();
--
2.7.0
On Mon, Jun 17, 2019 at 05:51:15PM +0900, Naoya Horiguchi wrote:
> The pass/fail of soft offline should be judged by checking whether the
> raw error page was finally contained or not (i.e. the result of
> set_hwpoison_free_buddy_page()), but current code do not work like that.
> So this patch is suggesting to fix it.
>
> Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may
> not offline the original page and will not return an error. It might
> lead us to misjudge the test result when set_hwpoison_free_buddy_page()
> actually fails.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> Cc: <[email protected]> # v4.19+
Reviewed-by: Oscar Salvador <[email protected]>
> ---
> ChangeLog v2->v3:
> - update patch description to clarify user visible change
> ---
> mm/memory-failure.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git v5.2-rc4/mm/memory-failure.c v5.2-rc4_patched/mm/memory-failure.c
> index 8da0334..8ee7b16 100644
> --- v5.2-rc4/mm/memory-failure.c
> +++ v5.2-rc4_patched/mm/memory-failure.c
> @@ -1730,6 +1730,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
> if (!ret) {
> if (set_hwpoison_free_buddy_page(page))
> num_poisoned_pages_inc();
> + else
> + ret = -EBUSY;
> }
> }
> return ret;
> --
> 2.7.0
>
--
Oscar Salvador
SUSE L3
On Mon, Jun 17, 2019 at 05:51:16PM +0900, Naoya Horiguchi wrote:
> madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
> for hugepages with overcommitting enabled. That was caused by the suboptimal
> code in current soft-offline code. See the following part:
>
> ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> MIGRATE_SYNC, MR_MEMORY_FAILURE);
> if (ret) {
> ...
> } else {
> /*
> * We set PG_hwpoison only when the migration source hugepage
> * was successfully dissolved, because otherwise hwpoisoned
> * hugepage remains on free hugepage list, then userspace will
> * find it as SIGBUS by allocation failure. That's not expected
> * in soft-offlining.
> */
> ret = dissolve_free_huge_page(page);
> if (!ret) {
> if (set_hwpoison_free_buddy_page(page))
> num_poisoned_pages_inc();
> }
> }
> return ret;
Hi Naoya,
just a nit:
>
> Here dissolve_free_huge_page() returns -EBUSY if the migration source page
> was freed into buddy in migrate_pages(), but even in that case we actually
> has a chance that set_hwpoison_free_buddy_page() succeeds. So that means
> current code gives up offlining too early now.
Maybe it is me that I am not really familiar with hugetlb code, but having had
a comment pointing out that the releasing of overcommited hugetlb pages into the
buddy allocator happens in migrate_pages()->put_page()->free_huge_page() would
have been great.
>
> dissolve_free_huge_page() checks that a given hugepage is suitable for
> dissolving, where we should return success for !PageHuge() case because
> the given hugepage is considered as already dissolved.
>
> This change also affects other callers of dissolve_free_huge_page(),
> which are cleaned up together.
>
> Reported-by: Chen, Jerry T <[email protected]>
> Tested-by: Chen, Jerry T <[email protected]>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> Cc: <[email protected]> # v4.19+
Reviewed-by: Oscar Salvador <[email protected]>
--
Oscar Salvador
SUSE L3
On 6/17/19 1:51 AM, Naoya Horiguchi wrote:
> The pass/fail of soft offline should be judged by checking whether the
> raw error page was finally contained or not (i.e. the result of
> set_hwpoison_free_buddy_page()), but current code do not work like that.
> So this patch is suggesting to fix it.
>
> Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may
> not offline the original page and will not return an error. It might
> lead us to misjudge the test result when set_hwpoison_free_buddy_page()
> actually fails.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
Thanks for the updates,
Reviewed-by: Mike Kravetz <[email protected]>
--
Mike Kravetz
On 6/17/19 1:51 AM, Naoya Horiguchi wrote:
> madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
> for hugepages with overcommitting enabled. That was caused by the suboptimal
> code in current soft-offline code. See the following part:
>
> ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> MIGRATE_SYNC, MR_MEMORY_FAILURE);
> if (ret) {
> ...
> } else {
> /*
> * We set PG_hwpoison only when the migration source hugepage
> * was successfully dissolved, because otherwise hwpoisoned
> * hugepage remains on free hugepage list, then userspace will
> * find it as SIGBUS by allocation failure. That's not expected
> * in soft-offlining.
> */
> ret = dissolve_free_huge_page(page);
> if (!ret) {
> if (set_hwpoison_free_buddy_page(page))
> num_poisoned_pages_inc();
> }
> }
> return ret;
>
> Here dissolve_free_huge_page() returns -EBUSY if the migration source page
> was freed into buddy in migrate_pages(), but even in that case we actually
> has a chance that set_hwpoison_free_buddy_page() succeeds. So that means
> current code gives up offlining too early now.
>
> dissolve_free_huge_page() checks that a given hugepage is suitable for
> dissolving, where we should return success for !PageHuge() case because
> the given hugepage is considered as already dissolved.
>
> This change also affects other callers of dissolve_free_huge_page(),
> which are cleaned up together.
>
> Reported-by: Chen, Jerry T <[email protected]>
> Tested-by: Chen, Jerry T <[email protected]>
> Signed-off-by: Naoya Horiguchi <[email protected]>
Thanks,
Reviewed-by: Mike Kravetz <[email protected]>
--
Mike Kravetz