2024-05-21 07:35:17

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH] mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp

hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which
means that we might need to call restore_reserve_on_error() upon error.
vmf_anon_prepare() releases the vma lock before returning, but
restore_reserve_on_error() expects the vma lock to be held by the caller.

Fix it by calling vmf_anon_prepare() before allocating the page.

Signed-off-by: Oscar Salvador <[email protected]>
Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()")
---
I did not hit this bug, I just spotted this because I was looking at hugetlb_wp
for some other reason. And I did not want to get creative to see if I could
trigger this so I could get a backtrace.
My assumption is that we could trigger this if 1) this was a shared mapping,
so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause
the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp().

mm/hugetlb.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6be78e7d4f6e..eb0d8a45505e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6005,6 +6005,15 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
* be acquired again before returning to the caller, as expected.
*/
spin_unlock(vmf->ptl);
+
+ /*
+ * When the original hugepage is shared one, it does not have
+ * anon_vma prepared.
+ */
+ ret = vmf_anon_prepare(vmf);
+ if (unlikely(ret))
+ goto out_release_old;
+
new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve);

if (IS_ERR(new_folio)) {
@@ -6058,14 +6067,6 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
goto out_release_old;
}

- /*
- * When the original hugepage is shared one, it does not have
- * anon_vma prepared.
- */
- ret = vmf_anon_prepare(vmf);
- if (unlikely(ret))
- goto out_release_all;
-
if (copy_user_large_folio(new_folio, old_folio, vmf->real_address, vma)) {
ret = VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h));
goto out_release_all;
--
2.45.1



2024-05-21 09:57:10

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp

On 21.05.24 09:34, Oscar Salvador wrote:
> hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which
> means that we might need to call restore_reserve_on_error() upon error.
> vmf_anon_prepare() releases the vma lock before returning, but
> restore_reserve_on_error() expects the vma lock to be held by the caller.
>
> Fix it by calling vmf_anon_prepare() before allocating the page.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()")
> ---
> I did not hit this bug, I just spotted this because I was looking at hugetlb_wp
> for some other reason. And I did not want to get creative to see if I could
> trigger this so I could get a backtrace.
> My assumption is that we could trigger this if 1) this was a shared mapping,
> so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause
> the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp().

FOLL_WRITE should never result in FLAG_UNSHARE.

>
> mm/hugetlb.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6be78e7d4f6e..eb0d8a45505e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6005,6 +6005,15 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
> * be acquired again before returning to the caller, as expected.
> */
> spin_unlock(vmf->ptl);
> +
> + /*
> + * When the original hugepage is shared one, it does not have
> + * anon_vma prepared.
> + */
> + ret = vmf_anon_prepare(vmf);
> + if (unlikely(ret))
> + goto out_release_old;
> +
> new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve);
>
> if (IS_ERR(new_folio)) {
> @@ -6058,14 +6067,6 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
> goto out_release_old;
> }
>
> - /*
> - * When the original hugepage is shared one, it does not have
> - * anon_vma prepared.
> - */
> - ret = vmf_anon_prepare(vmf);
> - if (unlikely(ret))
> - goto out_release_all;
> -
> if (copy_user_large_folio(new_folio, old_folio, vmf->real_address, vma)) {
> ret = VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h));
> goto out_release_all;

The joy of hugetlb reservation code.

LGTM

--
Cheers,

David / dhildenb


2024-05-21 10:26:45

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH] mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp

On Tue, May 21, 2024 at 11:56:54AM +0200, David Hildenbrand wrote:
> On 21.05.24 09:34, Oscar Salvador wrote:
> > hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which
> > means that we might need to call restore_reserve_on_error() upon error.
> > vmf_anon_prepare() releases the vma lock before returning, but
> > restore_reserve_on_error() expects the vma lock to be held by the caller.
> >
> > Fix it by calling vmf_anon_prepare() before allocating the page.
> >
> > Signed-off-by: Oscar Salvador <[email protected]>
> > Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()")
> > ---
> > I did not hit this bug, I just spotted this because I was looking at hugetlb_wp
> > for some other reason. And I did not want to get creative to see if I could
> > trigger this so I could get a backtrace.
> > My assumption is that we could trigger this if 1) this was a shared mapping,
> > so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause
> > the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp().
>
> FOLL_WRITE should never result in FLAG_UNSHARE.

You are right.
It was quite early when I looked at this and I managed to confuse myself when
reading hugetlb_follow_page_mask().

> The joy of hugetlb reservation code.
>
> LGTM

thanks David!


--
Oscar Salvador
SUSE Labs

2024-05-27 08:54:08

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH] mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp

On Tue, May 21, 2024 at 09:34:46AM +0200, Oscar Salvador wrote:
> I did not hit this bug, I just spotted this because I was looking at hugetlb_wp
> for some other reason. And I did not want to get creative to see if I could
> trigger this so I could get a backtrace.
> My assumption is that we could trigger this if 1) this was a shared mapping,
> so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause
> the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp().

So I checked this again and I have to confess I am bit confused.

hugetlb_wp() can be called from either hugetlb_fault() or hugetlb_no_page().

hugetlb_fault()->hugetlb_wp() upon FAULT_FLAG_{WRITE,UNSHARE}
hugetlb_no_page->hugetlb_wp()-> upon FAULT_FLAG_WRITE && !VM_SHARED

hugetlb_no_page()->vmf_anon_prepare() upon !VM_SHARED, which means that VM_SHARED
mappings do not have vma->anon_vma, while others do.

hugetlb_wp() will call set_huge_ptep_writable() right away and return if it sees
that the mapping is shared.
So the only other we have to end up in hugetlb_wp() is via FAULT_FLAG_UNSHARE.
For that to happen gup_must_unshare() must return true, which means the following
assumptions must hold.

- For Anonymous pages:
1) !PageAnonExclusive
- For Filebacked pages:
2) We do not have a vma
3) It is a COW mapping

1) If gup_must_unshare() returns true for Anonymous pages because the page is not
exclusive and must be unshared, hugetlb_wp() will already see the
vma->anon_prepare being initialized because of the previous
hugetlb_no_page()->vmf_anon_prepare.

2) I do not quite understand this case.
3) !VMSHARED mappings already had its anon_vma initialized in
hugetlb_no_page()->vmf_anon_prepare.

Probably I am missing some bits here, but I cannot see how we would even
need vmf_anon_prepare() in hugetlb_wp() in the first place.


--
Oscar Salvador
SUSE Labs

2024-05-27 13:22:41

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp



Am 27.05.24 um 10:53 schrieb Oscar Salvador:
> On Tue, May 21, 2024 at 09:34:46AM +0200, Oscar Salvador wrote:
>> I did not hit this bug, I just spotted this because I was looking at hugetlb_wp
>> for some other reason. And I did not want to get creative to see if I could
>> trigger this so I could get a backtrace.
>> My assumption is that we could trigger this if 1) this was a shared mapping,
>> so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause
>> the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp().
>
> So I checked this again and I have to confess I am bit confused.
>
> hugetlb_wp() can be called from either hugetlb_fault() or hugetlb_no_page().
>
> hugetlb_fault()->hugetlb_wp() upon FAULT_FLAG_{WRITE,UNSHARE}
> hugetlb_no_page->hugetlb_wp()-> upon FAULT_FLAG_WRITE && !VM_SHARED
>
> hugetlb_no_page()->vmf_anon_prepare() upon !VM_SHARED, which means that VM_SHARED
> mappings do not have vma->anon_vma, while others do.
>
> hugetlb_wp() will call set_huge_ptep_writable() right away and return if it sees
> that the mapping is shared.
> So the only other we have to end up in hugetlb_wp() is via FAULT_FLAG_UNSHARE.
> For that to happen gup_must_unshare() must return true, which means the following
> assumptions must hold.
>
> - For Anonymous pages:
> 1) !PageAnonExclusive
> - For Filebacked pages:
> 2) We do not have a vma
> 3) It is a COW mapping
>
> 1) If gup_must_unshare() returns true for Anonymous pages because the page is not
> exclusive and must be unshared, hugetlb_wp() will already see the
> vma->anon_prepare being initialized because of the previous
> hugetlb_no_page()->vmf_anon_prepare.
>
> 2) I do not quite understand this case.

gup_must_unshare() without a VMA is only used for GUP-fast. Before triggering a
page fault we always fallback to GUP-slow first, where we have a VMA.

IMHO we better not make assumptions that hugetlb_wp() will always already have
an anon VMA.

--
Thanks,

David / dhildenb


2024-05-27 14:04:12

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH] mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp

On Mon, May 27, 2024 at 03:17:01PM +0200, David Hildenbrand wrote:
> gup_must_unshare() without a VMA is only used for GUP-fast. Before
> triggering a page fault we always fallback to GUP-slow first, where we have
> a VMA.

I see, thanks for explaining!

> IMHO we better not make assumptions that hugetlb_wp() will always already
> have an anon VMA.

Fine by me, I just wanted to have the picture more clear, because I could not
see how we could end up in such situation, but better be safe and than sorry.

@Andrew: This fix is to avoid an issue like [1], but for hugetlb_wp().

[1] https://lore.kernel.org/linux-mm/[email protected]/T/


--
Oscar Salvador
SUSE Labs

2024-06-12 20:27:20

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH] mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp

On Tue, May 21, 2024 at 09:34:46AM +0200, Oscar Salvador wrote:
> hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which
> means that we might need to call restore_reserve_on_error() upon error.
> vmf_anon_prepare() releases the vma lock before returning, but
> restore_reserve_on_error() expects the vma lock to be held by the caller.
>
> Fix it by calling vmf_anon_prepare() before allocating the page.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()")

Let me just revamp this and run away


--
Oscar Salvador
SUSE Labs