2015-08-10 06:49:58

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH 1/2] mm/hwpoison: fix fail to split THP w/ refcount held

THP pages will get a refcount in madvise_hwpoison() w/ MF_COUNT_INCREASED
flag, however, the refcount is still held when fail to split THP pages.

Fix it by reducing the refcount of THP pages when fail to split THP.

Signed-off-by: Wanpeng Li <[email protected]>
---
mm/memory-failure.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8077b1c..56b8a71 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1710,6 +1710,8 @@ int soft_offline_page(struct page *page, int flags)
if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
pr_info("soft offline: %#lx: failed to split THP\n",
pfn);
+ if (flags & MF_COUNT_INCREASED)
+ put_page(page);
return -EBUSY;
}
}
--
1.7.1


2015-08-10 08:18:57

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/hwpoison: fix fail to split THP w/ refcount held

On Mon, Aug 10, 2015 at 02:32:30PM +0800, Wanpeng Li wrote:
> THP pages will get a refcount in madvise_hwpoison() w/ MF_COUNT_INCREASED
> flag, however, the refcount is still held when fail to split THP pages.
>
> Fix it by reducing the refcount of THP pages when fail to split THP.
>
> Signed-off-by: Wanpeng Li <[email protected]>

It seems that the same conditional put_page() would be added to
"soft offline: %#lx page already poisoned" branch too, right?

> ---
> mm/memory-failure.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 8077b1c..56b8a71 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1710,6 +1710,8 @@ int soft_offline_page(struct page *page, int flags)
> if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
> pr_info("soft offline: %#lx: failed to split THP\n",
> pfn);
> + if (flags & MF_COUNT_INCREASED)
> + put_page(page);
> return -EBUSY;
> }
> }
> --
> 1.7.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>-

2015-08-10 08:29:28

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/hwpoison: fix fail to split THP w/ refcount held

Hi Naoya,

On 8/10/15 4:10 PM, Naoya Horiguchi wrote:
> On Mon, Aug 10, 2015 at 02:32:30PM +0800, Wanpeng Li wrote:
>> THP pages will get a refcount in madvise_hwpoison() w/ MF_COUNT_INCREASED
>> flag, however, the refcount is still held when fail to split THP pages.
>>
>> Fix it by reducing the refcount of THP pages when fail to split THP.
>>
>> Signed-off-by: Wanpeng Li <[email protected]>
> It seems that the same conditional put_page() would be added to
> "soft offline: %#lx page already poisoned" branch too, right?

PageHWPoison() is just called before the soft_offline_page() in
madvise_hwpoion(). I think the PageHWPosion()
check in soft_offline_page() makes more sense for the other
soft_offline_page() callsites which don't have the
refcount held.

Regards,
Wanpeng Li

>
>> ---
>> mm/memory-failure.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 8077b1c..56b8a71 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1710,6 +1710,8 @@ int soft_offline_page(struct page *page, int flags)
>> if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
>> pr_info("soft offline: %#lx: failed to split THP\n",
>> pfn);
>> + if (flags & MF_COUNT_INCREASED)
>> + put_page(page);
>> return -EBUSY;
>> }
>> }
>> --
>> 1.7.1
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to [email protected]. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2015-08-10 08:51:59

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/hwpoison: fix fail to split THP w/ refcount held

On Mon, Aug 10, 2015 at 04:29:18PM +0800, Wanpeng Li wrote:
> Hi Naoya,
>
> On 8/10/15 4:10 PM, Naoya Horiguchi wrote:
> >On Mon, Aug 10, 2015 at 02:32:30PM +0800, Wanpeng Li wrote:
> >>THP pages will get a refcount in madvise_hwpoison() w/ MF_COUNT_INCREASED
> >>flag, however, the refcount is still held when fail to split THP pages.
> >>
> >>Fix it by reducing the refcount of THP pages when fail to split THP.
> >>
> >>Signed-off-by: Wanpeng Li <[email protected]>
> >It seems that the same conditional put_page() would be added to
> >"soft offline: %#lx page already poisoned" branch too, right?
>
> PageHWPoison() is just called before the soft_offline_page() in
> madvise_hwpoion(). I think the PageHWPosion()
> check in soft_offline_page() makes more sense for the other
> soft_offline_page() callsites which don't have the
> refcount held.

What I am worried is a race like below:

CPU0 CPU1

madvise_hwpoison
get_user_pages_fast
PageHWPoison check (false)
memory_failure
TestSetPageHWPoison
soft_offline_page
PageHWPoison check (true)
return -EBUSY (without put_page)

It's rare and madvise_hwpoison() is testing feature, so this never causes
real problems in production systems, so it's not a big deal.
My suggestion is maybe just for code correctness thing ...

Thanks,
Naoya Horiguchi-

2015-08-10 09:16:32

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/hwpoison: fix fail to split THP w/ refcount held



On 8/10/15 4:50 PM, Naoya Horiguchi wrote:
> On Mon, Aug 10, 2015 at 04:29:18PM +0800, Wanpeng Li wrote:
>> Hi Naoya,
>>
>> On 8/10/15 4:10 PM, Naoya Horiguchi wrote:
>>> On Mon, Aug 10, 2015 at 02:32:30PM +0800, Wanpeng Li wrote:
>>>> THP pages will get a refcount in madvise_hwpoison() w/ MF_COUNT_INCREASED
>>>> flag, however, the refcount is still held when fail to split THP pages.
>>>>
>>>> Fix it by reducing the refcount of THP pages when fail to split THP.
>>>>
>>>> Signed-off-by: Wanpeng Li <[email protected]>
>>> It seems that the same conditional put_page() would be added to
>>> "soft offline: %#lx page already poisoned" branch too, right?
>> PageHWPoison() is just called before the soft_offline_page() in
>> madvise_hwpoion(). I think the PageHWPosion()
>> check in soft_offline_page() makes more sense for the other
>> soft_offline_page() callsites which don't have the
>> refcount held.
> What I am worried is a race like below:
>
> CPU0 CPU1
>
> madvise_hwpoison
> get_user_pages_fast
> PageHWPoison check (false)
> memory_failure
> TestSetPageHWPoison
> soft_offline_page
> PageHWPoison check (true)
> return -EBUSY (without put_page)

Indeed, there is a race even through it is rared happen.

>
> It's rare and madvise_hwpoison() is testing feature, so this never causes
> real problems in production systems, so it's not a big deal.
> My suggestion is maybe just for code correctness thing ...

Thanks for your proposal, I will add your suggestion in v2 and post out
after we have a uniform solution for patch 2/2. :)

Regards,
Wanpeng Li

>
> Thanks,
> Naoya Horiguchi