2022-03-31 21:41:09

by Liu Shixin

Subject: Question about hwpoison handling of 1GB hugepage

Hi,

Recently, I found a problem with hwpoison handling of 1GB hugepages.
I created a process that maps a 1GB hugepage. This process then forks a
child process which writes to and reads from this 1GB hugepage. Then I
inject hwpoison into the 1GB hugepage. The child process triggers the
memory failure and is killed as expected. After this, the parent process
forks a new child process that does the same thing; it is killed again,
and so on, so the parent ends up in an infinite loop. I found this was
caused by commit 31286a8484a8 ("mm: hwpoison: disable memory error
handling on 1GB hugepage").

It looks like there is a known bug with hwpoison handling of 1GB hugepages,
so I tried to reproduce the bug that commit describes. After reverting the
patch in an earlier kernel version, I reproduced the described bug. Then I
reverted the patch in the latest version and found that the bug no longer
reproduces.

I compared the code paths of 1GB and 2MB hugepages for the second
madvise(MADV_HWPOISON), and found that the problem occurs because, in
gup_pud_range(), pud_none() and pud_huge() both return false, which then
triggers the bug. In gup_pmd_range(), however, the pmd_none() check has been
replaced with a pmd_present() check, which makes the code return early.
I then found that it is commit 15494520b776 ("mm: fix gup_pud_range") that
makes the latest version no longer reproduce the bug. I backported
commit 15494520b776 to the earlier version and the bug is no longer
reproducible there either.
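
Roughly, the difference looks like the following (my own paraphrase of the
fast-GUP checks, not verbatim kernel code; the details differ between
kernel versions):

/*
 * gup_pmd_range(): a pmd-level hwpoison swap entry is not "present",
 * so the walk bails out before misinterpreting the entry:
 */
        if (!pmd_present(pmd))
                return 0;

/*
 * gup_pud_range() before commit 15494520b776: a pud-level hwpoison
 * entry is neither none nor a huge mapping, so neither check fires and
 * the entry is descended into as if it pointed to a PMD page table:
 */
        if (pud_none(pud))
                return 0;
        if (unlikely(pud_huge(pud))) {
                /* handled as a 1GB mapping */
        } else if (!gup_pmd_range(pud, addr, next, ...))
                return 0;

/*
 * With 15494520b776 the first check becomes !pud_present(pud), which
 * also rejects the pud-level hwpoison entry and matches the pmd path.
 */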

So I'd like to ask: is it time to revert commit 31286a8484a8?
Or would it be sufficient to modify pud_huge() to behave like pmd_huge()?
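
For reference, this is the asymmetry I mean, as I read the x86 code
(paraphrasing arch/x86/mm/hugetlbpage.c; other versions and architectures
may differ): pmd_huge() also returns true for a non-present hugepage entry
such as a migration or hwpoison entry, while pud_huge() only checks the PSE
bit:

/* Paraphrased from arch/x86/mm/hugetlbpage.c, may differ by version. */
int pmd_huge(pmd_t pmd)
{
        /* true for a present PMD-sized mapping and also for non-present
         * (migration/hwpoison) hugepage entries */
        return !pmd_none(pmd) &&
               (pmd_val(pmd) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT;
}

int pud_huge(pud_t pud)
{
        /* only true when the PSE bit is set, i.e. a present PUD-sized
         * mapping; a pud-level hwpoison entry is not covered */
        return !!(pud_val(pud) & _PAGE_PSE);
}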

I also noticed there is a TODO comment in memory_failure_hugetlb() saying
that we need to make sure that:
- conversion of a pud that maps an error hugetlb into hwpoison
entry properly works, and
- other mm code walking over page table is aware of pud-aligned
hwpoison entries.

I'm not sure whether the above fixes are sufficient, so is there anything
else that needs to be analyzed which I haven't considered?

Thanks,


Subject: Re: Question about hwpoison handling of 1GB hugepage

On Thu, Mar 31, 2022 at 06:56:25PM +0800, Liu Shixin wrote:
> Hi,
>
> Recently, I found a problem with hwpoison handling of 1GB hugepages.
> I created a process that maps a 1GB hugepage. This process then forks a
> child process which writes to and reads from this 1GB hugepage. Then I
> inject hwpoison into the 1GB hugepage. The child process triggers the
> memory failure and is killed as expected. After this, the parent process
> forks a new child process that does the same thing; it is killed again,
> and so on, so the parent ends up in an infinite loop. I found this was
> caused by commit 31286a8484a8 ("mm: hwpoison: disable memory error
> handling on 1GB hugepage").

Hello Shixin,

It's a little unclear to me what behavior you are expecting and what the
infinite loop repeats; could you explain a little more about them?
(I briefly tried to reproduce it, but didn't manage to...)

>
> It looks like there is a known bug with hwpoison handling of 1GB hugepages,
> so I tried to reproduce the bug that commit describes. After reverting the
> patch in an earlier kernel version, I reproduced the described bug. Then I
> reverted the patch in the latest version and found that the bug no longer
> reproduces.
>
> I compared the code paths of 1GB and 2MB hugepages for the second
> madvise(MADV_HWPOISON), and found that the problem occurs because, in
> gup_pud_range(), pud_none() and pud_huge() both return false, which then
> triggers the bug. In gup_pmd_range(), however, the pmd_none() check has been
> replaced with a pmd_present() check, which makes the code return early.
> I then found that it is commit 15494520b776 ("mm: fix gup_pud_range") that
> makes the latest version no longer reproduce the bug. I backported
> commit 15494520b776 to the earlier version and the bug is no longer
> reproducible there either.

Thank you for the analysis.
So this patch might make 31286a8484a8 unnecessary; that's good news.

>
> So I'd like to ask: is it time to revert commit 31286a8484a8?
> Or would it be sufficient to modify pud_huge() to behave like pmd_huge()?
>
> I also noticed there is a TODO comment in memory_failure_hugetlb() saying
> that we need to make sure that:
> - conversion of a pud that maps an error hugetlb into hwpoison
> entry properly works, and
> - other mm code walking over page table is aware of pud-aligned
> hwpoison entries.

These are simply the minimum requirements and might not be sufficient.
We need testing (with 31286a8484a8 removed) to make sure that
there are no issues in corner cases.
(I have started extending the existing hugetlb-related testcases to 1GB ones.)

Thanks,
Naoya Horiguchi

>
> I'm not sure whether the above fixes are sufficient, so is there anything
> else that needs to be analyzed which I haven't considered?

2022-04-06 16:44:26

by Liu Shixin

Subject: Re: Question about hwpoison handling of 1GB hugepage


On 2022/4/4 7:42, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, Mar 31, 2022 at 06:56:25PM +0800, Liu Shixin wrote:
>> Hi,
>>
>> Recently, I found a problem with hwpoison handling of 1GB hugepages.
>> I created a process that maps a 1GB hugepage. This process then forks a
>> child process which writes to and reads from this 1GB hugepage. Then I
>> inject hwpoison into the 1GB hugepage. The child process triggers the
>> memory failure and is killed as expected. After this, the parent process
>> forks a new child process that does the same thing; it is killed again,
>> and so on, so the parent ends up in an infinite loop. I found this was
>> caused by commit 31286a8484a8 ("mm: hwpoison: disable memory error
>> handling on 1GB hugepage").
> Hello Shixin,
>
> It's a little unclear to me what behavior you are expecting and what the
> infinite loop repeats; could you explain a little more about them?
> (I briefly tried to reproduce it, but didn't manage to...)

There are two processes in my environment. The parent process first maps
a 1GB hugepage, then forks a child process and monitors it. If the child
process is killed, the parent process forks a new child process. The child
process writes to the hugepage.

After we inject hwpoison into the 1GB hugepage (madvise(MADV_HWPOISON)),
the child process is killed by MCE when it writes to the hugepage. Then the
parent process forks a new child process.

I expect the new child process to get a fresh 1GB hugepage and no longer be
killed. But currently the child process writes to the hwpoisoned hugepage
again and is killed. As a result, the parent process keeps forking new child
processes, and each child keeps writing to the hwpoisoned hugepage and
getting killed.
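
A simplified sketch of the reproducer is below (not the exact program I
use; the mapping flags, size and injection point are illustrative, and it
assumes 1GB hugepages are reserved, e.g. hugepagesz=1G hugepages=2, plus
enough privilege for madvise(MADV_HWPOISON)):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB    (30 << 26)  /* 30 = log2(1GB), MAP_HUGE_SHIFT = 26 */
#endif

#define SIZE_1GB        (1UL << 30)

int main(void)
{
        char *p;
        pid_t pid;

        /* parent: map one 1GB hugepage */
        p = mmap(NULL, SIZE_1GB, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                 -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        memset(p, 1, SIZE_1GB);         /* fault the hugepage in */

        /* inject the error (needs privilege and CONFIG_MEMORY_FAILURE) */
        if (madvise(p, getpagesize(), MADV_HWPOISON))
                perror("madvise(MADV_HWPOISON)");

        for (;;) {
                pid = fork();
                if (pid == 0) {
                        /*
                         * Child: write to the hugepage.  Expected: after the
                         * first child is killed, a new child gets a fresh
                         * hugepage.  Observed: every child is killed again.
                         */
                        p[0] = 1;
                        _exit(0);
                }
                /* parent: monitor the child, fork a new one when it dies */
                waitpid(pid, NULL, 0);
                sleep(1);
        }
}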

>
>> It looks like there is a known bug with hwpoison handling of 1GB hugepages,
>> so I tried to reproduce the bug that commit describes. After reverting the
>> patch in an earlier kernel version, I reproduced the described bug. Then I
>> reverted the patch in the latest version and found that the bug no longer
>> reproduces.
>>
>> I compared the code paths of 1GB and 2MB hugepages for the second
>> madvise(MADV_HWPOISON), and found that the problem occurs because, in
>> gup_pud_range(), pud_none() and pud_huge() both return false, which then
>> triggers the bug. In gup_pmd_range(), however, the pmd_none() check has been
>> replaced with a pmd_present() check, which makes the code return early.
>> I then found that it is commit 15494520b776 ("mm: fix gup_pud_range") that
>> makes the latest version no longer reproduce the bug. I backported
>> commit 15494520b776 to the earlier version and the bug is no longer
>> reproducible there either.
> Thank you for the analysis.
> So this patch might make 31286a8484a8 unnecessary; that's good news.
>
>> So I'd like to ask: is it time to revert commit 31286a8484a8?
>> Or would it be sufficient to modify pud_huge() to behave like pmd_huge()?
>>
>> I also noticed there is a TODO comment in memory_failure_hugetlb() saying
>> that we need to make sure that:
>> - conversion of a pud that maps an error hugetlb into hwpoison
>> entry properly works, and
>> - other mm code walking over page table is aware of pud-aligned
>> hwpoison entries.
> These are simply the minimum requirements and might not be sufficient.
> We need testing (with 31286a8484a8 removed) to make sure that
> there are no issues in corner cases.
> (I have started extending the existing hugetlb-related testcases to 1GB ones.)
Looking forward to the testcases and further conclusions.

2022-05-17 20:29:35

by Liu Shixin

Subject: Re: Question about hwpoison handling of 1GB hugepage

Hello Naoya,

Is there any progress on memory error handling for 1GB hugepages? :)

Thanks,
Liu Shixin



Subject: Re: Question about hwpoison handling of 1GB hugepage

On Tue, May 17, 2022 at 08:03:03PM +0800, Liu Shixin wrote:
> Hello Naoya,
>
> Is there any progress on memory error handling for 1GB hugepages? :)

Hi Shixin,

I have a little... I found that error handling fails for an anonymous 1GB
hugepage because __page_handle_poison() fails. I haven't pinpointed the issue
precisely yet, but I suspect there is some issue in free_gigantic_page() that
fails to send the victim raw page to the buddy allocator. I don't think this
is a critical issue, because the error page should not be reused (it's
isolated, just not in a controlled manner). But it prevents unpoisoning and
makes testing inefficient, so I'd like to fix it.
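
(For reference, a rough sketch of how my test unpoisons a page after
injection; it assumes CONFIG_HWPOISON_INJECT, debugfs mounted at
/sys/kernel/debug, and that the pfn was recorded at injection time, e.g.
from /proc/self/pagemap:)

#include <stdio.h>

/* Sketch only: write the recorded pfn to the hwpoison-inject debugfs file. */
static int unpoison_pfn(unsigned long pfn)
{
        FILE *f = fopen("/sys/kernel/debug/hwpoison/unpoison-pfn", "w");

        if (!f)
                return -1;
        fprintf(f, "%lu\n", pfn);
        return fclose(f);
}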

Thanks,
Naoya Horiguchi


Subject: Re: Question about hwpoison handling of 1GB hugepage

On Thu, May 19, 2022 at 02:17:58AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, May 17, 2022 at 08:03:03PM +0800, Liu Shixin wrote:
> > Hello Naoya,
> >
> > Is there any progress on memory error handling for 1GB hugepages? :)
>
> Hi Shixin,
>
> I have a little... I found that error handling fails for an anonymous 1GB
> hugepage because __page_handle_poison() fails. I haven't pinpointed the issue
> precisely yet, but I suspect there is some issue in free_gigantic_page() that
> fails to send the victim raw page to the buddy allocator. I don't think this
> is a critical issue, because the error page should not be reused (it's
> isolated, just not in a controlled manner). But it prevents unpoisoning and
> makes testing inefficient, so I'd like to fix it.

I posted a patchset enabling 1GB hugepage support:
https://lore.kernel.org/linux-mm/[email protected]/T/#u

It passed my testing, but I would appreciate it if you could also try
testing it with your workload.

Thanks,
Naoya Horiguchi
