2022-08-12 02:00:15

by Miaohe Lin

Subject: Re: Linux 5.19 __NR_move_pages failed for hugepage

On 2022/8/11 16:01, Wang, Haiyue wrote:
> Hi Miaohe,
>

Hi Haiyue,

Many thanks for your report and debugging.

>
> When I call "syscall(__NR_move_pages, 0, n_pages, ptr, 0, status, 0)" to get the huge page node
> information, it fails with -2 returned in the 'status' array.
>
> After some debugging, I found that "follow_huge_pud" returns NULL if 'FOLL_GET' is set.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e66f17ff71772b209eed39de35aaa99ba819c93d
>
> This makes your patch not work for huge pages.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd614841c06338a087769ee3cfa96718784d1f5
>
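For readers unfamiliar with the call quoted above: passing NULL as the `nodes` argument turns move_pages(2) into a pure query, so nothing is migrated and each `status` entry holds either the NUMA node backing that page or a negative errno. The -2 in the report is -ENOENT, which the move_pages(2) man page describes as "the page is not present". Below is a minimal sketch of that query for a single address, assuming a glibc-based Linux system; `page_status()` and `describe_status()` are hypothetical helper names introduced only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>    /* mmap(), used when preparing a page to query */
#include <sys/syscall.h> /* __NR_move_pages */
#include <unistd.h>      /* syscall(), sysconf() */

/* Hypothetical helper: ask which NUMA node backs the page at `addr`
 * without moving it. With nodes == NULL, move_pages(2) only fills
 * *status with a node id (>= 0) or a negative errno such as -ENOENT.
 * Returns 0 on success, or -1 with errno set if the syscall itself
 * failed (e.g. ENOSYS on a kernel built without CONFIG_NUMA). */
static int page_status(void *addr, int *status)
{
    void *pages[1] = { addr };
    long rc = syscall(__NR_move_pages,
                      0L,     /* pid 0: the calling process */
                      1UL,    /* number of pages queried */
                      pages,
                      NULL,   /* nodes == NULL: query only */
                      status,
                      0L);    /* flags */
    return rc < 0 ? -1 : 0;
}

/* Hypothetical helper: render one status[] entry as text. */
static void describe_status(int st, char *buf, size_t len)
{
    if (st >= 0)
        snprintf(buf, len, "node %d", st);
    else
        snprintf(buf, len, "error %d (%s)", st, strerror(-st));
}
```

To reproduce the scenario in the report, one would pass the address of a touched 1 GiB hugetlb page (mapped at PUD level on x86-64, e.g. via mmap() with MAP_HUGETLB) instead of a regular page; per the report, an affected 5.19 kernel then leaves -2 in `status`, while the same workload on 5.18 did not show the error.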

Support for FOLL_GET in follow_huge_pud is introduced by the following commit:

https://lore.kernel.org/all/20220714042420.1847125-9-[email protected]/T/#mb3c83df087fba454b7b4ea32227fb8775ca70081

But that's still not complete: the s390 version of follow_huge_pud still doesn't support FOLL_GET,
and PGD-level hugepages don't support FOLL_GET yet either.
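The failure chain discussed here can be sketched as a tiny userspace model. As the report implies, the patch makes the status query request a page reference (FOLL_GET); the PUD-level lookup then returns NULL, and a NULL lookup surfaces as -ENOENT (-2) in `status`. This is not kernel code: `struct model_page`, `MODEL_FOLL_GET`, `model_follow_huge_pud()` and `model_page_status()` are hypothetical stand-ins that only illustrate that logic under the stated assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* HYPOTHETICAL model, not kernel code: all names below are invented. */
#define MODEL_FOLL_GET 0x1 /* stands in for the kernel's FOLL_GET flag */

struct model_page {
    int nid; /* NUMA node the (huge) page lives on */
};

/* Stand-in for follow_huge_pud() as described in the report: when the
 * caller asks for a reference (FOLL_GET), the PUD-level lookup gives up
 * and returns NULL instead of the page. */
static struct model_page *model_follow_huge_pud(struct model_page *pg, int flags)
{
    if (flags & MODEL_FOLL_GET)
        return NULL;
    return pg;
}

/* Stand-in for the move_pages(2) status query: a found page reports its
 * node id, while a NULL lookup is reported as -ENOENT (-2 on Linux). */
static int model_page_status(struct model_page *pg, int flags)
{
    const struct model_page *found = model_follow_huge_pud(pg, flags);
    return found ? found->nid : -ENOENT;
}
```

With flags 0 the model reports the page's node; with MODEL_FOLL_GET set it reports -2, the value seen in the report.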

> Not sure whether you know about this issue or not; just sharing my debug information.

I'm not sure whether it's better to revert my "problematic" patch above first and then add it back once all hugetlb
pages support FOLL_GET, or whether we could just live with it. Any thoughts?


Thanks,
Miaohe Lin


>
> BR,
>
> Haiyue


2022-08-12 03:22:53

by Wang, Haiyue

Subject: RE: Linux 5.19 __NR_move_pages failed for hugepage

> -----Original Message-----
> From: Miaohe Lin <[email protected]>
> Sent: Friday, August 12, 2022 09:59
> To: Wang, Haiyue <[email protected]>
> Cc: [email protected]; Linux-MM <[email protected]>; linux-kernel <linux-[email protected]>; Naoya Horiguchi <[email protected]>; David Hildenbrand <[email protected]>
> Subject: Re: Linux 5.19 __NR_move_pages failed for hugepage
>
> I'm not sure whether it's better to revert my "problematic" patch above first and then add it back once
> all hugetlb pages support FOLL_GET, or whether we could just live with it. Any thoughts?
>

TBH, the issue is more complicated than I thought. :-(

It looks like only '[PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages' will be
backported to 5.19? It's the only patch with a "Fixes:" tag. If so, it will break 5.19.

I just ran VPP (https://fd.io/) and got the error message about huge page allocation
after I switched from 5.18 to 5.19.


2022-08-12 06:47:12

by Miaohe Lin

Subject: Re: Linux 5.19 __NR_move_pages failed for hugepage

On 2022/8/12 11:04, Wang, Haiyue wrote:
> TBH, the issue is more complicated than I thought. :-(
>
> It looks like only '[PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages' will be
> backported to 5.19? It's the only patch with a "Fixes:" tag. If so, it will break 5.19.

If you want to mitigate the problem of __NR_move_pages failing for hugepages,
"[PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry" could be backported to 5.19.

>
> I just ran VPP (https://fd.io/) and got the error message about huge page allocation
> after I switched from 5.18 to 5.19.

Do you mean the reported problem was found with VPP? Anyway, feel free to send a patch to fix the problem if you like. :)
Of course, I will try to fix it if requested (though I'm not sure how to fix it yet).

Thanks,
Miaohe Lin

2022-08-12 09:43:50

by Wang, Haiyue

Subject: RE: Linux 5.19 __NR_move_pages failed for hugepage

> -----Original Message-----
> From: Miaohe Lin <[email protected]>
> Sent: Friday, August 12, 2022 14:41
> To: Wang, Haiyue <[email protected]>
> Cc: [email protected]; Linux-MM <[email protected]>; linux-kernel <linux-[email protected]>; Naoya Horiguchi <[email protected]>; David Hildenbrand <[email protected]>
> Subject: Re: Linux 5.19 __NR_move_pages failed for hugepage
>
> On 2022/8/12 11:04, Wang, Haiyue wrote:
> > TBH, the issue is more complicated than I thought. :-(
> >
> > It looks like only '[PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages' will be
> > backported to 5.19? It's the only patch with a "Fixes:" tag. If so, it will break 5.19.
>
> If you want to mitigate the problem of __NR_move_pages failing for hugepages,
> "[PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry" could be backported to 5.19.
>
> >
> > I just ran VPP (https://fd.io/) and got the error message about huge page allocation
> > after I switched from 5.18 to 5.19.
>
> Do you mean the reported problem was found with VPP? Anyway, feel free to send a patch to fix the problem if
> you like. :)
> Of course, I will try to fix it if requested (though I'm not sure how to fix it yet).
>

I tried a quick fix and CC'ed you. The design is ugly, but your fix is kept.
