2022-08-15 01:31:14

by Wang, Haiyue

Subject: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page

Not all huge page APIs support FOLL_GET option, so the __NR_move_pages
will fail to get the page node information for huge page.

This is an temporary solution to mitigate the racing fix.

After supporting follow huge page by FOLL_GET is done, this fix can be
reverted safely.

Fixes: 4cd614841c06 ("mm: migration: fix possible do_pages_stat_array racing with memory offline")
Signed-off-by: Haiyue Wang <[email protected]>
---
mm/migrate.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 6a1597c92261..581dfaad9257 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1848,6 +1848,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,

 	for (i = 0; i < nr_pages; i++) {
 		unsigned long addr = (unsigned long)(*pages);
+		unsigned int foll_flags = FOLL_DUMP;
 		struct vm_area_struct *vma;
 		struct page *page;
 		int err = -EFAULT;
@@ -1856,8 +1857,12 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 		if (!vma)
 			goto set_status;

+		/* Not all huge page follow APIs support 'FOLL_GET' */
+		if (!is_vm_hugetlb_page(vma))
+			foll_flags |= FOLL_GET;
+
 		/* FOLL_DUMP to ignore special (like zero) pages */
-		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
+		page = follow_page(vma, addr, foll_flags);

 		err = PTR_ERR(page);
 		if (IS_ERR(page))
@@ -1865,7 +1870,8 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,

 		if (page && !is_zone_device_page(page)) {
 			err = page_to_nid(page);
-			put_page(page);
+			if (foll_flags & FOLL_GET)
+				put_page(page);
 		} else {
 			err = -ENOENT;
 		}
--
2.37.2


2022-08-15 02:37:53

by Huang, Ying

Subject: Re: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page

Haiyue Wang <[email protected]> writes:

> Not all huge page APIs support FOLL_GET option, so the __NR_move_pages

move_pages() is a syscall, so you can just call it move_pages(), or
move_pages() syscall.

> will fail to get the page node information for huge page.
~~~~~~~~~
some huge pages?

> This is an temporary solution to mitigate the racing fix.

Why is it "racing fix"? This isn't a race condition fix.

Best Regards,
Huang, Ying

> After supporting follow huge page by FOLL_GET is done, this fix can be
> reverted safely.
>
> Fixes: 4cd614841c06 ("mm: migration: fix possible do_pages_stat_array racing with memory offline")
> Signed-off-by: Haiyue Wang <[email protected]>

[snip]

2022-08-15 02:53:20

by Wang, Haiyue

Subject: RE: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page

> -----Original Message-----
> From: Huang, Ying <[email protected]>
> Sent: Monday, August 15, 2022 09:59
> To: Wang, Haiyue <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page
>
> Haiyue Wang <[email protected]> writes:
>
> > Not all huge page APIs support FOLL_GET option, so the __NR_move_pages
>
> move_pages() is a syscall, so you can just call it move_pages(), or
> move_pages() syscall.

The application hits the issue when it calls the function below:
syscall (__NR_move_pages, 0, n_pages, ptr, 0, status, 0)

That is why I used __NR_move_pages directly in the commit message.
"move_pages() syscall" is better. I will update it later.
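
[Editor's sketch] For readers unfamiliar with this calling pattern: when the nodes argument is NULL, move_pages() moves nothing and only reports, in status[], the NUMA node of each page (or a negative errno value for that page). The example below is a minimal sketch of that query for one ordinary, non-huge page. The helper names query_page_nodes and demo_query_one_page are hypothetical. Reproducing the failure discussed in this thread would additionally need a hugetlb-backed mapping, which is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper wrapping the raw syscall from the report.
 * pid 0 means the calling process; nodes == NULL means "query only".
 * Returns 0 on success, or -1 with errno set.
 */
static long query_page_nodes(unsigned long n_pages, void **pages, int *status)
{
	assert(n_pages > 0);
	return syscall(__NR_move_pages, 0, n_pages, pages, NULL, status, 0);
}

/* Hypothetical demo: allocate one page, touch it, and ask for its node. */
static long demo_query_one_page(void)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	void *buf = NULL;
	void *pages[1];
	int status[1] = { 0 };
	long rc;

	if (posix_memalign(&buf, page_size, page_size) != 0)
		return -1;

	/* Touch the page so that it is actually backed by memory. */
	memset(buf, 0xa5, page_size);
	pages[0] = buf;

	rc = query_page_nodes(1, pages, status);
	if (rc == 0)
		printf("page %p: status %d (node id, or negative errno)\n",
		       buf, status[0]);
	else
		printf("move_pages failed: %s\n", strerror(errno));

	free(buf);
	return rc;
}
```

The call can fail outright on kernels built without NUMA support or in sandboxes that filter the syscall, so callers should check both the return value and each status[] entry.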

>
> > will fail to get the page node information for huge page.
> ~~~~~~~~~
> some huge pages?

OK.

>
> > This is an temporary solution to mitigate the racing fix.
>
> Why is it "racing fix"? This isn't a race condition fix.

The 'Fixes' commit is a race condition fix.

How about "This is a temporary solution to mitigate the side effect
of the race condition fix"?

>
> Best Regards,
> Huang, Ying
>
> > After supporting follow huge page by FOLL_GET is done, this fix can be
> > reverted safely.
> >
> > Fixes: 4cd614841c06 ("mm: migration: fix possible do_pages_stat_array racing with memory offline")
> > Signed-off-by: Haiyue Wang <[email protected]>
>
> [snip]

2022-08-15 02:55:37

by Wang, Haiyue

Subject: RE: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page

> -----Original Message-----
> From: Wang, Haiyue
> Sent: Monday, August 15, 2022 10:11
> To: Huang, Ying <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]
> Subject: RE: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page
>
> > -----Original Message-----
> > From: Huang, Ying <[email protected]>
> > Sent: Monday, August 15, 2022 09:59
> > To: Wang, Haiyue <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page
> >
> > Haiyue Wang <[email protected]> writes:
> >


> >
> > > This is an temporary solution to mitigate the racing fix.
> >
> > Why is it "racing fix"? This isn't a race condition fix.
>
> The 'Fixes' commit is a race condition fix.
>
> How about "This is a temporary solution to mitigate the side effect
> of the race condition fix"?

Let me add a few more words to make things clear:

"This is a temporary solution to mitigate the side effect of the race
condition fix by calling follow_page() with FOLL_GET set."

>
> >
> > Best Regards,
> > Huang, Ying
> >
> > > After supporting follow huge page by FOLL_GET is done, this fix can be
> > > reverted safely.
> > >
> > > Fixes: 4cd614841c06 ("mm: migration: fix possible do_pages_stat_array racing with memory offline")
> > > Signed-off-by: Haiyue Wang <[email protected]>
> >
> > [snip]

2022-08-15 03:22:56

by Huang, Ying

Subject: Re: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page

"Wang, Haiyue" <[email protected]> writes:

>> -----Original Message-----
>> From: Wang, Haiyue
>> Sent: Monday, August 15, 2022 10:11
>> To: Huang, Ying <[email protected]>
>> Cc: [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]; [email protected]
>> Subject: RE: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page
>>
>> > -----Original Message-----
>> > From: Huang, Ying <[email protected]>
>> > Sent: Monday, August 15, 2022 09:59
>> > To: Wang, Haiyue <[email protected]>
>> > Cc: [email protected]; [email protected]; [email protected]; [email protected];
>> > [email protected]; [email protected]; [email protected]; [email protected]
>> > Subject: Re: [PATCH v3 1/2] mm: migration: fix the FOLL_GET failure on following huge page
>> >
>> > Haiyue Wang <[email protected]> writes:
>> >
>
>
>> >
>> > > This is an temporary solution to mitigate the racing fix.
>> >
>> > Why is it "racing fix"? This isn't a race condition fix.
>>
>> The 'Fixes' commit is a race condition fix.
>>
>> How about "This is a temporary solution to mitigate the side effect
>> of the race condition fix"?
>
> Let me add a few more words to make things clear:
>
> "This is a temporary solution to mitigate the side effect of the race
> condition fix by calling follow_page() with FOLL_GET set."

Looks good to me. Thanks!

Best Regards,
Huang, Ying

>>
>> >
>> > Best Regards,
>> > Huang, Ying
>> >
>> > > After supporting follow huge page by FOLL_GET is done, this fix can be
>> > > reverted safely.
>> > >
>> > > Fixes: 4cd614841c06 ("mm: migration: fix possible do_pages_stat_array racing with memory offline")
>> > > Signed-off-by: Haiyue Wang <[email protected]>
>> >
>> > [snip]