2022-04-27 09:43:32

by Naoya Horiguchi

[permalink] [raw]
Subject: [RFC PATCH v1 4/4] mm, memory_hotplug: fix inconsistent num_poisoned_pages on memory hotremove

From: Naoya Horiguchi <[email protected]>

When offlining memory section with hwpoisoned pages, the hwpoisons are
canceled. But num_poisoned_pages is not updated for that event, so the
counter becomes inconsistent.

Add num_poisoned_pages_dec() in __offline_isolated_pages(). PageHWPoison
can be set on a tail page of some high order buddy page, so we need check
PageHWPoison on each subpage.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/page_alloc.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e5b4488a0c5..dcd962855617 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9487,12 +9487,15 @@ void __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
zone = page_zone(pfn_to_page(pfn));
spin_lock_irqsave(&zone->lock, flags);
while (pfn < end_pfn) {
+ int i;
+
page = pfn_to_page(pfn);
/*
* The HWPoisoned page may be not in buddy system, and
* page_count() is not 0.
*/
if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
+ num_poisoned_pages_dec();
pfn++;
continue;
}
@@ -9510,6 +9513,9 @@ void __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
BUG_ON(page_count(page));
BUG_ON(!PageBuddy(page));
order = buddy_order(page);
+ for (i = 0; i < 1 << order; i++)
+ if (PageHWPoison(page + i))
+ num_poisoned_pages_dec();
del_page_from_free_list(page, zone, order);
pfn += (1 << order);
}
--
2.25.1


Subject: Re: [RFC PATCH v1 4/4] mm, memory_hotplug: fix inconsistent num_poisoned_pages on memory hotremove

On Thu, Apr 28, 2022 at 11:20:16AM +0800, Miaohe Lin wrote:
> On 2022/4/27 12:28, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <[email protected]>
> >
> > When offlining memory section with hwpoisoned pages, the hwpoisons are
> > canceled. But num_poisoned_pages is not updated for that event, so the
> > counter becomes inconsistent.
>
> IIUC, this work is already done via clear_hwpoisoned_pages when __remove_pages.
> Or am I miss something?

Actually I had the same question when writing this patch, and found that
__remove_pages() seems to be called from device memory or HMM, but not from
offline_pages(). If you mean that we could make offline_pages() call
clear_hwpoisoned_pages(), that seems possible and I'll consider it.

But as David and Oscar pointed out for 0/4, removing PageHWPoison flags
in offlining seems not to be right thing, so I'd like to have some consensus
on what way to go first.

Thanks,
Naoya Horiguchi

2022-05-09 13:43:45

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [RFC PATCH v1 4/4] mm, memory_hotplug: fix inconsistent num_poisoned_pages on memory hotremove

On Thu, Apr 28, 2022 at 03:16:01PM +0800, Miaohe Lin wrote:
> On 2022/4/28 12:05, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Thu, Apr 28, 2022 at 11:20:16AM +0800, Miaohe Lin wrote:
> >> On 2022/4/27 12:28, Naoya Horiguchi wrote:
> >>> From: Naoya Horiguchi <[email protected]>
> >>>
> >>> When offlining memory section with hwpoisoned pages, the hwpoisons are
> >>> canceled. But num_poisoned_pages is not updated for that event, so the
> >>> counter becomes inconsistent.
> >>
> >> IIUC, this work is already done via clear_hwpoisoned_pages when __remove_pages.
> >> Or am I miss something?
> >
> > Actually I had the same question when writing this patch, and found that
> > __remove_pages() seems to be called from device memory or HMM, but not from
>
> It seems remove_memory (which calls __remove_pages) will be called as .detach callback of
> memory_device_handler in drivers/acpi/acpi_memhotplug.c. So the hwpoison info will also be
> clear for that memory ?

Sorry, you're right. That code path also calls __remove_pages() and
clear_hwpoisoned_pages(). So most major usecases of memory hotremove seems
not to be affected by the reported problem.

Thanks,
Naoya Horiguchi