In some cases it appears the invalidation of a hwpoisoned page
fails because the page is still mapped in another process. This
can cause a program to be continuously restarted and die when
it page faults on the page that was not invalidated. Avoid that
problem by unmapping the hwpoisoned page when we find it.

Another issue is that sometimes we end up oopsing in finish_fault,
if the code tries to do something with the now-NULL vmf->page.
I did not hit this error when submitting the previous patch because
there are several opportunities for alloc_set_pte to bail out before
accessing vmf->page, and that apparently happened on those systems,
and most of the time on other systems, too.

However, across several million systems that error does occur a
handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
which will cause do_read_fault to return before calling finish_fault.

Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
Cc: Oscar Salvador <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
---
 mm/memory.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index be44d0b36b18..76e3af9639d9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 		return ret;
 
 	if (unlikely(PageHWPoison(vmf->page))) {
+		struct page *page = vmf->page;
 		vm_fault_t poisonret = VM_FAULT_HWPOISON;
 		if (ret & VM_FAULT_LOCKED) {
+			if (page_mapped(page))
+				unmap_mapping_pages(page_mapping(page),
+						    page->index, 1, false);
 			/* Retry if a clean page was removed from the cache. */
-			if (invalidate_inode_page(vmf->page))
-				poisonret = 0;
-			unlock_page(vmf->page);
+			if (invalidate_inode_page(page))
+				poisonret = VM_FAULT_NOPAGE;
+			unlock_page(page);
 		}
-		put_page(vmf->page);
+		put_page(page);
 		vmf->page = NULL;
 		return poisonret;
 	}
--
2.35.1
On 2022/3/26 4:14, Rik van Riel wrote:
> In some cases it appears the invalidation of a hwpoisoned page
> fails because the page is still mapped in another process. This
> can cause a program to be continuously restarted and die when
> it page faults on the page that was not invalidated. Avoid that
> problem by unmapping the hwpoisoned page when we find it.
>
> Another issue is that sometimes we end up oopsing in finish_fault,
> if the code tries to do something with the now-NULL vmf->page.
> I did not hit this error when submitting the previous patch because
> there are several opportunities for alloc_set_pte to bail out before
> accessing vmf->page, and that apparently happened on those systems,
> and most of the time on other systems, too.
>
> However, across several million systems that error does occur a
> handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
> which will cause do_read_fault to return before calling finish_fault.
>
> Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
> Cc: Oscar Salvador <[email protected]>
> Cc: Miaohe Lin <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> ---
> mm/memory.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be44d0b36b18..76e3af9639d9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> return ret;
>
> if (unlikely(PageHWPoison(vmf->page))) {
> + struct page *page = vmf->page;
> vm_fault_t poisonret = VM_FAULT_HWPOISON;
> if (ret & VM_FAULT_LOCKED) {
> + if (page_mapped(page))
> + unmap_mapping_pages(page_mapping(page),
> + page->index, 1, false);
It seems this unmap_mapping_pages also helps the success rate of the below invalidate_inode_page.
> /* Retry if a clean page was removed from the cache. */
> - if (invalidate_inode_page(vmf->page))
> - poisonret = 0;
> - unlock_page(vmf->page);
> + if (invalidate_inode_page(page))
> + poisonret = VM_FAULT_NOPAGE;
> + unlock_page(page);
> }
> - put_page(vmf->page);
> + put_page(page);
Do we use page instead of vmf->page just for simplicity? Or is there some other concern?
> vmf->page = NULL;
We return either VM_FAULT_NOPAGE or VM_FAULT_HWPOISON with vmf->page = NULL. In any case,
finish_fault won't be called later. So I think your fix is right.
> return poisonret;
> }
>
Many thanks for your patch.
On 2022/3/28 10:24, Rik van Riel wrote:
> On Mon, 2022-03-28 at 10:14 +0800, Miaohe Lin wrote:
>> On 2022/3/27 4:14, Rik van Riel wrote:
>>
>>
>>>
>>>>> /* Retry if a clean page was removed from the cache. */
>>>>> - if (invalidate_inode_page(vmf->page))
>>>>> - poisonret = 0;
>>>>> - unlock_page(vmf->page);
>>>>> + if (invalidate_inode_page(page))
>>>>> + poisonret = VM_FAULT_NOPAGE;
>>>>> + unlock_page(page);
>>>
>>
>> Sure, but when I think more about this, it seems this fix isn't ideal:
>> If VM_FAULT_NOPAGE is returned with the page table unset, the process will
>> re-trigger the page fault again and again until invalidate_inode_page
>> succeeds in evicting the inode page. This might hang the process for a
>> really long time. Or am I missing something?
>>
> If invalidate_inode_page fails, we will return
> VM_FAULT_HWPOISON, and kill the task, instead
> of looping indefinitely.
Oh, really sorry! It's a drowsy Monday morning. :)
This patch looks good to me. Thanks!
Reviewed-by: Miaohe Lin <[email protected]>
>
On 2022/3/27 4:14, Rik van Riel wrote:
> On Sat, 2022-03-26 at 15:48 +0800, Miaohe Lin wrote:
>> On 2022/3/26 4:14, Rik van Riel wrote:
>>>
>>> +++ b/mm/memory.c
>>> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>>> return ret;
>>>
>>> if (unlikely(PageHWPoison(vmf->page))) {
>>> + struct page *page = vmf->page;
>>> vm_fault_t poisonret = VM_FAULT_HWPOISON;
>>> if (ret & VM_FAULT_LOCKED) {
>>> + if (page_mapped(page))
>>> + unmap_mapping_pages(page_mapping(page),
>>> + page->index, 1, false);
>>
>> It seems this unmap_mapping_pages also helps the success rate of the
>> below invalidate_inode_page.
>>
>
> That is indeed what it is supposed to do.
>
> It isn't foolproof, since you can still end up
> with dirty pages that don't get cleaned immediately,
> but it seems to turn infinite loops of a program
> being killed every time it's started into a more
> manageable situation where the task succeeds again
> pretty quickly.
Looks convincing to me.
>
>>> /* Retry if a clean page was removed from the cache. */
>>> - if (invalidate_inode_page(vmf->page))
>>> - poisonret = 0;
>>> - unlock_page(vmf->page);
>>> + if (invalidate_inode_page(page))
>>> + poisonret = VM_FAULT_NOPAGE;
>>> + unlock_page(page);
>>> }
>>> - put_page(vmf->page);
>>> + put_page(page);
>>
>> Do we use page instead of vmf->page just for simplicity? Or is there
>> some other concern?
>>
>
> Just a simplification, and not dereferencing the same thing
> 6 times.
>
I see. :)
>>> vmf->page = NULL;
>>
>> We return either VM_FAULT_NOPAGE or VM_FAULT_HWPOISON with vmf->page = NULL.
>> In any case, finish_fault won't be called later. So I think your fix is right.
>
> Want to send in a Reviewed-by or Acked-by? :)
>
Sure, but when I think more about this, it seems this fix isn't ideal:
If VM_FAULT_NOPAGE is returned with the page table unset, the process will
re-trigger the page fault again and again until invalidate_inode_page succeeds
in evicting the inode page. This might hang the process for a really long time.
Or am I missing something?
Thanks.
On Mon, 2022-03-28 at 10:14 +0800, Miaohe Lin wrote:
> On 2022/3/27 4:14, Rik van Riel wrote:
>
>
> >
> > > > /* Retry if a clean page was removed from the cache. */
> > > > - if (invalidate_inode_page(vmf->page))
> > > > - poisonret = 0;
> > > > - unlock_page(vmf->page);
> > > > + if (invalidate_inode_page(page))
> > > > + poisonret = VM_FAULT_NOPAGE;
> > > > + unlock_page(page);
> >
>
> Sure, but when I think more about this, it seems this fix isn't ideal:
> If VM_FAULT_NOPAGE is returned with the page table unset, the process will
> re-trigger the page fault again and again until invalidate_inode_page
> succeeds in evicting the inode page. This might hang the process for a
> really long time. Or am I missing something?
>
If invalidate_inode_page fails, we will return
VM_FAULT_HWPOISON, and kill the task, instead
of looping indefinitely.
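
To make the two outcomes explicit, here is a small userspace sketch of the
decision the patched hwpoison branch makes. It is not kernel code: the
model_* names and enum values are invented for illustration and do not
match the real VM_FAULT_* constants.

```c
#include <stdbool.h>

/* Illustrative stand-ins only; the values are NOT the kernel's. */
enum model_fault {
	MODEL_FAULT_HWPOISON = 1,	/* fault fails, the task is killed */
	MODEL_FAULT_NOPAGE   = 2,	/* return to userspace, fault is retried */
};

/*
 * Hypothetical model of the patched hwpoison branch in __do_fault().
 * page_locked: the fault handler returned with VM_FAULT_LOCKED set
 * invalidated: invalidate_inode_page() removed the page from the cache
 */
static enum model_fault model_poison_result(bool page_locked, bool invalidated)
{
	/* Default outcome, as in the patch: report the poisoned page. */
	enum model_fault poisonret = MODEL_FAULT_HWPOISON;

	/* Only a successful invalidation turns the fault into a retry. */
	if (page_locked && invalidated)
		poisonret = MODEL_FAULT_NOPAGE;

	return poisonret;
}
```

The retry result is only reachable when the invalidation succeeded, so a
page that cannot be invalidated never leads to a retry loop in this model.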
--
All Rights Reversed.
On Fri, Mar 25, 2022 at 04:14:28PM -0400, Rik van Riel wrote:
> In some cases it appears the invalidation of a hwpoisoned page
> fails because the page is still mapped in another process. This
> can cause a program to be continuously restarted and die when
> it page faults on the page that was not invalidated. Avoid that
> problem by unmapping the hwpoisoned page when we find it.
>
> Another issue is that sometimes we end up oopsing in finish_fault,
> if the code tries to do something with the now-NULL vmf->page.
> I did not hit this error when submitting the previous patch because
> there are several opportunities for alloc_set_pte to bail out before
> accessing vmf->page, and that apparently happened on those systems,
> and most of the time on other systems, too.
>
> However, across several million systems that error does occur a
> handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
> which will cause do_read_fault to return before calling finish_fault.
I artificially created clean/dirty page cache pages with PageHWPoison flag
(with SystemTap), then reproduced NULL pointer dereference by page fault on
current mainline branch (with e53ac7374e64). And confirmed that the bug was
fixed with this patch, so the fix seems to work.
(Maybe I should've done this kind of testing before merging e53ac7374e64, sorry..)
Anyway, thank you very much.
Tested-by: Naoya Horiguchi <[email protected]>
>
> Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
> Cc: Oscar Salvador <[email protected]>
> Cc: Miaohe Lin <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> ---
> mm/memory.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be44d0b36b18..76e3af9639d9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> return ret;
>
> if (unlikely(PageHWPoison(vmf->page))) {
> + struct page *page = vmf->page;
> vm_fault_t poisonret = VM_FAULT_HWPOISON;
> if (ret & VM_FAULT_LOCKED) {
> + if (page_mapped(page))
> + unmap_mapping_pages(page_mapping(page),
> + page->index, 1, false);
> /* Retry if a clean page was removed from the cache. */
> - if (invalidate_inode_page(vmf->page))
> - poisonret = 0;
> - unlock_page(vmf->page);
> + if (invalidate_inode_page(page))
> + poisonret = VM_FAULT_NOPAGE;
> + unlock_page(page);
> }
> - put_page(vmf->page);
> + put_page(page);
> vmf->page = NULL;
> return poisonret;
> }
> --
> 2.35.1
>
On Sat, 2022-03-26 at 15:48 +0800, Miaohe Lin wrote:
> On 2022/3/26 4:14, Rik van Riel wrote:
> >
> > +++ b/mm/memory.c
> > @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> > return ret;
> >
> > if (unlikely(PageHWPoison(vmf->page))) {
> > + struct page *page = vmf->page;
> > vm_fault_t poisonret = VM_FAULT_HWPOISON;
> > if (ret & VM_FAULT_LOCKED) {
> > + if (page_mapped(page))
> > + unmap_mapping_pages(page_mapping(page),
> > + page->index, 1, false);
>
> It seems this unmap_mapping_pages also helps the success rate of the
> below invalidate_inode_page.
>
That is indeed what it is supposed to do.

It isn't foolproof, since you can still end up
with dirty pages that don't get cleaned immediately,
but it seems to turn infinite loops of a program
being killed every time it's started into a more
manageable situation where the task succeeds again
pretty quickly.
> > /* Retry if a clean page was removed from the cache. */
> > - if (invalidate_inode_page(vmf->page))
> > - poisonret = 0;
> > - unlock_page(vmf->page);
> > + if (invalidate_inode_page(page))
> > + poisonret = VM_FAULT_NOPAGE;
> > + unlock_page(page);
> > }
> > - put_page(vmf->page);
> > + put_page(page);
>
> Do we use page instead of vmf->page just for simplicity? Or is there
> some other concern?
>
Just a simplification, and not dereferencing the same thing
6 times.
> > vmf->page = NULL;
>
> We return either VM_FAULT_NOPAGE or VM_FAULT_HWPOISON with vmf->page = NULL.
> In any case, finish_fault won't be called later. So I think your fix is right.
Want to send in a Reviewed-by or Acked-by? :)
--
All Rights Reversed.
On Fri, Mar 25, 2022 at 04:14:28PM -0400, Rik van Riel wrote:
> In some cases it appears the invalidation of a hwpoisoned page
> fails because the page is still mapped in another process. This
> can cause a program to be continuously restarted and die when
> it page faults on the page that was not invalidated. Avoid that
> problem by unmapping the hwpoisoned page when we find it.
>
> Another issue is that sometimes we end up oopsing in finish_fault,
> if the code tries to do something with the now-NULL vmf->page.
> I did not hit this error when submitting the previous patch because
> there are several opportunities for alloc_set_pte to bail out before
> accessing vmf->page, and that apparently happened on those systems,
> and most of the time on other systems, too.
>
> However, across several million systems that error does occur a
> handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
> which will cause do_read_fault to return before calling finish_fault.
>
> Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
> Cc: Oscar Salvador <[email protected]>
> Cc: Miaohe Lin <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> ---
> mm/memory.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be44d0b36b18..76e3af9639d9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> return ret;
>
> if (unlikely(PageHWPoison(vmf->page))) {
> + struct page *page = vmf->page;
> vm_fault_t poisonret = VM_FAULT_HWPOISON;
> if (ret & VM_FAULT_LOCKED) {
> + if (page_mapped(page))
> + unmap_mapping_pages(page_mapping(page),
> + page->index, 1, false);
> /* Retry if a clean page was removed from the cache. */
> - if (invalidate_inode_page(vmf->page))
> - poisonret = 0;
> - unlock_page(vmf->page);
> + if (invalidate_inode_page(page))
> + poisonret = VM_FAULT_NOPAGE;
What is the effect of returning VM_FAULT_NOPAGE?
I take it that we are cool because the pte has been installed and points to
a new page? (I could not find where that is being done).
--
Oscar Salvador
SUSE Labs
On Mon, 2022-03-28 at 11:00 +0200, Oscar Salvador wrote:
> On Fri, Mar 25, 2022 at 04:14:28PM -0400, Rik van Riel wrote:
> > + if (invalidate_inode_page(page))
> > + poisonret = VM_FAULT_NOPAGE;
>
> What is the effect of returning VM_FAULT_NOPAGE?
> I take it that we are cool because the pte has been installed and points
> to a new page? (I could not find where that is being done).
>
It results in us returning to userspace as if the page
fault had been handled, resulting in a second fault on
the same address.
However, now the page is no longer in the page cache,
and we can read it in from disk, to a page that is not
hardware poisoned, and we can then use that second page
without issues.
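
As a rough userspace sketch of that sequence (again not kernel code; every
model_* name is made up for illustration, and "dirty pages cannot be
invalidated" is a simplification of what invalidate_inode_page checks):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a page and a one-slot page cache. */
struct model_page  { bool hwpoison; bool dirty; };
struct model_cache { struct model_page *page; /* NULL: not cached */ };

enum model_result { MODEL_MAPPED, MODEL_RETRY, MODEL_KILLED };

/* A healthy page "read in from disk": zero-initialized, so clean and unpoisoned. */
static struct model_page model_fresh_page;

/* One pass through the fault path for a single address. */
static enum model_result model_fault_once(struct model_cache *cache)
{
	if (!cache->page)
		cache->page = &model_fresh_page;	/* cache miss: read from disk */

	if (!cache->page->hwpoison)
		return MODEL_MAPPED;			/* normal case: page gets mapped */

	if (cache->page->dirty)
		return MODEL_KILLED;			/* invalidation fails: HWPOISON */

	cache->page = NULL;				/* invalidation succeeded */
	return MODEL_RETRY;				/* NOPAGE: userspace faults again */
}

/* Re-fault until the access is resolved; returns the number of faults taken. */
static int model_faults_until_done(struct model_cache *cache,
				   enum model_result *result)
{
	int faults = 0;

	do {
		*result = model_fault_once(cache);
		faults++;
	} while (*result == MODEL_RETRY);

	return faults;
}
```

In this model a clean poisoned page costs exactly one extra fault, because
the retry finds the cache empty and reads in a fresh page.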
--
All Rights Reversed.
On Fri, Mar 25, 2022 at 04:14:28PM -0400, Rik van Riel wrote:
> In some cases it appears the invalidation of a hwpoisoned page
> fails because the page is still mapped in another process. This
> can cause a program to be continuously restarted and die when
> it page faults on the page that was not invalidated. Avoid that
> problem by unmapping the hwpoisoned page when we find it.
>
> Another issue is that sometimes we end up oopsing in finish_fault,
> if the code tries to do something with the now-NULL vmf->page.
> I did not hit this error when submitting the previous patch because
> there are several opportunities for alloc_set_pte to bail out before
> accessing vmf->page, and that apparently happened on those systems,
> and most of the time on other systems, too.
>
> However, across several million systems that error does occur a
> handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
> which will cause do_read_fault to return before calling finish_fault.
>
> Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
> Cc: Oscar Salvador <[email protected]>
> Cc: Miaohe Lin <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
Reviewed-by: Oscar Salvador <[email protected]>
> ---
> mm/memory.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be44d0b36b18..76e3af9639d9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> return ret;
>
> if (unlikely(PageHWPoison(vmf->page))) {
> + struct page *page = vmf->page;
> vm_fault_t poisonret = VM_FAULT_HWPOISON;
> if (ret & VM_FAULT_LOCKED) {
> + if (page_mapped(page))
> + unmap_mapping_pages(page_mapping(page),
> + page->index, 1, false);
> /* Retry if a clean page was removed from the cache. */
> - if (invalidate_inode_page(vmf->page))
> - poisonret = 0;
> - unlock_page(vmf->page);
> + if (invalidate_inode_page(page))
> + poisonret = VM_FAULT_NOPAGE;
> + unlock_page(page);
> }
> - put_page(vmf->page);
> + put_page(page);
> vmf->page = NULL;
> return poisonret;
> }
> --
> 2.35.1
>
>
>
--
Oscar Salvador
SUSE Labs
On Tue, Mar 29, 2022 at 11:49:53AM -0400, Rik van Riel wrote:
> It results in us returning to userspace as if the page
> fault had been handled, resulting in a second fault on
> the same address.
>
> However, now the page is no longer in the page cache,
> and we can read it in from disk, to a page that is not
> hardware poisoned, and we can then use that second page
> without issues.
Ok, I see, thanks a lot for the explanation Rik.
--
Oscar Salvador
SUSE Labs