2015-08-10 11:56:34

by Wanpeng Li

Subject: [PATCH v2 5/5] mm/hwpoison: replace most of put_page in memory error handling by put_hwpoison_page

Replace most of the put_page() calls in memory error handling with
put_hwpoison_page(), except the ones at the front of soft_offline_page(),
since the page there may be a THP page and the refcount taken in
madvise_hwpoison() is against the single 4KB page instead of following
the logic in get_hwpoison_page().
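The refcount mismatch this commit message worries about can be illustrated with a toy userspace model. Everything in it is hypothetical: `struct toy_page`, `toy_pin_single()`, `toy_get_hwpoison()` and `toy_put_hwpoison()` are illustrative stand-ins, not kernel APIs, and one counter per page replaces the kernel's real THP tail-page accounting. The sketch assumes, as the thread describes, that get_hwpoison_page() pins both the head and the tail of a THP tail page, and (on this commit message's premise) that madvise_hwpoison() pins only the 4KB page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT the kernel's struct page: one counter per
 * page; head pages and normal pages point at themselves. */
struct toy_page {
	int count;
	struct toy_page *head;
	bool thp;               /* page belongs to a transparent huge page */
};

static bool toy_is_thp_tail(const struct toy_page *p)
{
	return p->thp && p->head != p;
}

/* Commit-message premise: madvise_hwpoison() pins only the 4KB page. */
static void toy_pin_single(struct toy_page *p)
{
	p->count++;
}

/* Toy get_hwpoison_page(): pinning a THP tail also pins its head. */
static void toy_get_hwpoison(struct toy_page *p)
{
	if (toy_is_thp_tail(p))
		p->head->count++;
	p->count++;
}

/* Toy put_hwpoison_page(): mirrors toy_get_hwpoison(). The asserts stand
 * in for refcount underflow checks. */
static void toy_put_hwpoison(struct toy_page *p)
{
	if (toy_is_thp_tail(p)) {
		assert(p->head->count > 0);
		p->head->count--;
	}
	assert(p->count > 0);
	p->count--;
}
```

Starting from a head with count 1 and a tail with count 0, `toy_get_hwpoison()` followed by `toy_put_hwpoison()` restores both counts, while `toy_pin_single()` followed by `toy_put_hwpoison()` leaves the head one reference short. In this toy, that over-release is the reason the front of soft_offline_page() keeps plain put_page() in this version of the patch.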

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 28 +++++++++++++---------------
1 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fa9aa21..6179fc1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1159,9 +1159,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
pr_err("MCE: %#lx: thp split failed\n", pfn);
if (TestClearPageHWPoison(p))
atomic_long_sub(nr_pages, &num_poisoned_pages);
- put_page(p);
- if (p != hpage)
- put_page(hpage);
+ put_hwpoison_page(p);
return -EBUSY;
}
VM_BUG_ON_PAGE(!page_count(p), p);
@@ -1222,14 +1220,14 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
printk(KERN_ERR "MCE %#lx: just unpoisoned\n", pfn);
atomic_long_sub(nr_pages, &num_poisoned_pages);
unlock_page(hpage);
- put_page(hpage);
+ put_hwpoison_page(hpage);
return 0;
}
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
atomic_long_sub(nr_pages, &num_poisoned_pages);
unlock_page(hpage);
- put_page(hpage);
+ put_hwpoison_page(hpage);
return 0;
}

@@ -1243,7 +1241,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
if (PageHuge(p) && PageTail(p) && TestSetPageHWPoison(hpage)) {
action_result(pfn, MF_MSG_POISONED_HUGE, MF_IGNORED);
unlock_page(hpage);
- put_page(hpage);
+ put_hwpoison_page(hpage);
return 0;
}
/*
@@ -1477,9 +1475,9 @@ int unpoison_memory(unsigned long pfn)
}
unlock_page(page);

- put_page(page);
+ put_hwpoison_page(page);
if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
- put_page(page);
+ put_hwpoison_page(page);

return 0;
}
@@ -1539,7 +1537,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
/*
* Try to free it.
*/
- put_page(page);
+ put_hwpoison_page(page);
shake_page(page, 1);

/*
@@ -1548,7 +1546,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
ret = __get_any_page(page, pfn, 0);
if (!PageLRU(page)) {
/* Drop page reference which is from __get_any_page() */
- put_page(page);
+ put_hwpoison_page(page);
pr_info("soft_offline: %#lx: unknown non LRU page type %lx\n",
pfn, page->flags);
return -EIO;
@@ -1571,7 +1569,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
lock_page(hpage);
if (PageHWPoison(hpage)) {
unlock_page(hpage);
- put_page(hpage);
+ put_hwpoison_page(hpage);
pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
return -EBUSY;
}
@@ -1582,7 +1580,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
* get_any_page() and isolate_huge_page() takes a refcount each,
* so need to drop one here.
*/
- put_page(hpage);
+ put_hwpoison_page(hpage);
if (!ret) {
pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
return -EBUSY;
@@ -1631,7 +1629,7 @@ static int __soft_offline_page(struct page *page, int flags)
wait_on_page_writeback(page);
if (PageHWPoison(page)) {
unlock_page(page);
- put_page(page);
+ put_hwpoison_page(page);
pr_info("soft offline: %#lx page already poisoned\n", pfn);
return -EBUSY;
}
@@ -1646,7 +1644,7 @@ static int __soft_offline_page(struct page *page, int flags)
* would need to fix isolation locking first.
*/
if (ret == 1) {
- put_page(page);
+ put_hwpoison_page(page);
pr_info("soft_offline: %#lx: invalidated\n", pfn);
SetPageHWPoison(page);
atomic_long_inc(&num_poisoned_pages);
@@ -1663,7 +1661,7 @@ static int __soft_offline_page(struct page *page, int flags)
* Drop page reference which is came from get_any_page()
* successful isolate_lru_page() already took another one.
*/
- put_page(page);
+ put_hwpoison_page(page);
if (!ret) {
LIST_HEAD(pagelist);
inc_zone_page_state(page, NR_ISOLATED_ANON +
--
1.7.1
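The first hunk folds `put_page(p); if (p != hpage) put_page(hpage);` into a single put_hwpoison_page(p) call. The toy model below checks that equivalence for a THP tail page and for a normal page. It is a sketch under stated assumptions: `struct toy_page`, `toy_put_page()`, `old_release()` and `toy_put_hwpoison_page()` are hypothetical, each page has a single counter, and the helper derives the head from the page itself instead of taking hpage from the caller. Hugetlbfs pages are not modelled.

```c
#include <assert.h>

/* Hypothetical toy model, NOT the kernel's struct page. */
struct toy_page {
	int count;
	struct toy_page *head;  /* points at itself for head and normal pages */
};

/* Stand-in for put_page(); the assert plays the role of a refcount
 * underflow check. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->count > 0);
	p->count--;
}

/* The two-step release removed by the first hunk. */
static void old_release(struct toy_page *p, struct toy_page *hpage)
{
	toy_put_page(p);
	if (p != hpage)
		toy_put_page(hpage);
}

/* Toy put_hwpoison_page(): finds the head itself, so callers pass only p. */
static void toy_put_hwpoison_page(struct toy_page *p)
{
	if (p->head != p)
		toy_put_page(p->head);
	toy_put_page(p);
}
```

For a tail page with count 1 under a head with count 2, both releases end at head 1 and tail 0; for a normal page both drop a single reference. Deriving the head inside the helper means a caller cannot pass a stale or mismatched hpage.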


2015-08-12 08:57:10

by Naoya Horiguchi

Subject: Re: [PATCH v2 5/5] mm/hwpoison: replace most of put_page in memory error handling by put_hwpoison_page

On Mon, Aug 10, 2015 at 07:28:23PM +0800, Wanpeng Li wrote:
> Replace most of the put_page() calls in memory error handling with
> put_hwpoison_page(), except the ones at the front of soft_offline_page(),
> since the page there may be a THP page and the refcount taken in
> madvise_hwpoison() is against the single 4KB page instead of following
> the logic in get_hwpoison_page().
>
> Signed-off-by: Wanpeng Li <[email protected]>

# Sorry for my late response.

If I read correctly, get_user_pages_fast() (called by madvise_hwpoison)
for a THP tail page takes a refcount on each of the head and tail pages.
gup_huge_pmd() does this in the fast path, and get_page_foll() does this
in the slow path (perhaps via the following code path):

  get_user_pages_unlocked
    __get_user_pages_unlocked
      __get_user_pages_locked
        __get_user_pages
          follow_page_mask
            follow_trans_huge_pmd (with FOLL_GET set)
              get_page_foll

So this should be equivalent to what get_hwpoison_page() does for thp pages
with regard to refcounting.
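Naoya's point can be checked with a toy model: if the gup pin on a THP tail page takes one reference on the head and one on the tail, it has the same shape as get_hwpoison_page(), so put_hwpoison_page() returns both counters to where they started. The names below (`struct toy_page`, `toy_gup_pin()`, `toy_put_page()`, `toy_put_hwpoison_page()`) are hypothetical, and one counter per page stands in for the 2015 kernel's real tail-page accounting (including put_page()'s own compound-page handling), which this sketch does not reproduce.

```c
#include <assert.h>

/* Hypothetical toy model, NOT the kernel's struct page. */
struct toy_page {
	int count;
	struct toy_page *head;  /* points at itself for head and normal pages */
};

/* Toy gup pin of a THP tail page, as described in the thread:
 * one reference on the head and one on the tail. */
static void toy_gup_pin(struct toy_page *p)
{
	if (p->head != p)
		p->head->count++;
	p->count++;
}

/* Toy put_page(): touches only the page it is given. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->count > 0);
	p->count--;
}

/* Toy put_hwpoison_page(): drops the head reference as well. */
static void toy_put_hwpoison_page(struct toy_page *p)
{
	if (p->head != p)
		toy_put_page(p->head);
	toy_put_page(p);
}
```

With the head starting at 1 and the tail at 0, `toy_gup_pin()` followed by `toy_put_hwpoison_page()` returns to (1, 0), whereas `toy_put_page()` on the tail alone leaves the head at 2 in this toy. That is the argument for also converting the put_page() calls at the front of soft_offline_page().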

And I'm expecting that a refcount taken by get_hwpoison_page() is released
by put_hwpoison_page() even if the page's status is changed during error
handling (the typical, or perhaps only, case is a successful THP split).

So I think you can apply put_hwpoison_page() for 3 more callsites in
mm/memory-failure.c.
- MF_MSG_POISONED_HUGE case
- "soft offline: %#lx page already poisoned" case (you mentioned above)
- "soft offline: %#lx: failed to split THP" case (you mentioned above)

Thanks,
Naoya Horiguchi

2015-08-12 09:13:48

by Wanpeng Li

Subject: Re: [PATCH v2 5/5] mm/hwpoison: replace most of put_page in memory error handling by put_hwpoison_page

On 8/12/15 4:55 PM, Naoya Horiguchi wrote:
> On Mon, Aug 10, 2015 at 07:28:23PM +0800, Wanpeng Li wrote:
>> Replace most of the put_page() calls in memory error handling with
>> put_hwpoison_page(), except the ones at the front of soft_offline_page(),
>> since the page there may be a THP page and the refcount taken in
>> madvise_hwpoison() is against the single 4KB page instead of following
>> the logic in get_hwpoison_page().
>>
>> Signed-off-by: Wanpeng Li <[email protected]>
> # Sorry for my late response.
>
> If I read correctly, get_user_pages_fast() (called by madvise_hwpoison)
> for a THP tail page takes a refcount on each of the head and tail pages.
> gup_huge_pmd() does this in the fast path, and get_page_foll() does this
> in the slow path (perhaps via the following code path):
>
>   get_user_pages_unlocked
>     __get_user_pages_unlocked
>       __get_user_pages_locked
>         __get_user_pages
>           follow_page_mask
>             follow_trans_huge_pmd (with FOLL_GET set)
>               get_page_foll
>
> So this should be equivalent to what get_hwpoison_page() does for thp pages
> with regard to refcounting.
>
> And I'm expecting that a refcount taken by get_hwpoison_page() is released
> by put_hwpoison_page() even if the page's status is changed during error
> handling (the typical, or perhaps only, case is a successful THP split).

Indeed. :-)

>
> So I think you can apply put_hwpoison_page() for 3 more callsites in
> mm/memory-failure.c.
> - MF_MSG_POISONED_HUGE case

I have already done this in my patch.

> - "soft offline: %#lx page already poisoned" case (you mentioned above)
> - "soft offline: %#lx: failed to split THP" case (you mentioned above)

You are right. I will send a follow-up patch on top of this one, since
these patches have already been merged.

Regards,
Wanpeng Li

>
> Thanks,
> Naoya Horiguchi

2015-08-12 09:35:27

by Wanpeng Li

Subject: Re: [PATCH v2 5/5] mm/hwpoison: replace most of put_page in memory error handling by put_hwpoison_page

On 8/12/15 5:13 PM, Wanpeng Li wrote:
> On 8/12/15 4:55 PM, Naoya Horiguchi wrote:
>> On Mon, Aug 10, 2015 at 07:28:23PM +0800, Wanpeng Li wrote:
>>> Replace most of the put_page() calls in memory error handling with
>>> put_hwpoison_page(), except the ones at the front of soft_offline_page(),
>>> since the page there may be a THP page and the refcount taken in
>>> madvise_hwpoison() is against the single 4KB page instead of following
>>> the logic in get_hwpoison_page().
>>>
>>> Signed-off-by: Wanpeng Li <[email protected]>
>> # Sorry for my late response.
>>
>> If I read correctly, get_user_pages_fast() (called by madvise_hwpoison)
>> for a THP tail page takes a refcount on each of the head and tail pages.
>> gup_huge_pmd() does this in the fast path, and get_page_foll() does this
>> in the slow path (perhaps via the following code path):
>>
>>   get_user_pages_unlocked
>>     __get_user_pages_unlocked
>>       __get_user_pages_locked
>>         __get_user_pages
>>           follow_page_mask
>>             follow_trans_huge_pmd (with FOLL_GET set)
>>               get_page_foll
>>
>> So this should be equivalent to what get_hwpoison_page() does for thp pages
>> with regard to refcounting.
>>
>> And I'm expecting that a refcount taken by get_hwpoison_page() is released
>> by put_hwpoison_page() even if the page's status is changed during error
>> handling (the typical, or perhaps only, case is a successful THP split).
> Indeed. :-)
>
>> So I think you can apply put_hwpoison_page() for 3 more callsites in
>> mm/memory-failure.c.
>> - MF_MSG_POISONED_HUGE case
> I have already done this in my patch.
>
>> - "soft offline: %#lx page already poisoned" case (you mentioned above)
>> - "soft offline: %#lx: failed to split THP" case (you mentioned above)
> You are right. I will send a follow-up patch on top of this one, since
> these patches have already been merged.

The fix patch is attached. :)

Regards,
Wanpeng Li


Attachments:
0001-mm-hwpoison-mm-hwpoison-replace-most-of-put_page-in-.patch (1.32 kB)