This patch fixes an issue where a hugetlb uffd-wr-protected mapping can be
writable even with the uffd-wp bit set. It only happens with hugetlb
private mappings, when someone first wr-protects a missing pte (which
installs a pte marker), then writes to the same page without any prior
access to the page.

The userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault(),
before reaching hugetlb_wp(), to avoid taking locks that userfaultfd won't
need. However, there's one CoW optimization path that can trigger
hugetlb_wp() inside hugetlb_no_page(), which bypasses the trap.

This patch skips hugetlb_wp() for CoW and retries the fault if the uffd-wp
bit is detected. The new path will only trigger in the CoW optimization
path, because the generic hugetlb_fault() path (e.g. when a present pte was
wr-protected) will already have resolved the uffd-wp bit. Also make sure
anonymous UNSHARE won't be affected and can still be resolved; IOW, only
skip CoW, not CoR.

This patch will be needed for v5.19+, hence copy stable.
Reported-by: Muhammad Usama Anjum <[email protected]>
Cc: linux-stable <[email protected]>
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <[email protected]>
---
Notes:
v2 is not on the list but in an attachment in the reply; this v3 is mostly
to make sure it's not the same as the patch that used to be attached.
Sorry, Andrew, we need to drop the queued one as I rewrote the commit
message.

Muhammad, I didn't attach your T-b because of the slight functional change.
Please feel free to re-attach it if it still works for you (which I believe
it should).
thanks,
---
mm/hugetlb.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8bfd07f4c143..a58b3739ed4b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
struct folio *pagecache_folio, spinlock_t *ptl)
{
const bool unshare = flags & FAULT_FLAG_UNSHARE;
- pte_t pte;
+ pte_t pte = huge_ptep_get(ptep);
struct hstate *h = hstate_vma(vma);
struct page *old_page;
struct folio *new_folio;
@@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long haddr = address & huge_page_mask(h);
struct mmu_notifier_range range;
+ /*
+ * Never handle CoW for uffd-wp protected pages. It should be only
+ * handled when the uffd-wp protection is removed.
+ *
+ * Note that only the CoW optimization path (in hugetlb_no_page())
+ * can trigger this, because hugetlb_fault() will always resolve
+ * uffd-wp bit first.
+ */
+ if (!unshare && huge_pte_uffd_wp(pte))
+ return 0;
+
/*
* hugetlb does not support FOLL_FORCE-style write faults that keep the
* PTE mapped R/O such as maybe_mkwrite() would do.
@@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
return 0;
}
- pte = huge_ptep_get(ptep);
old_page = pte_page(pte);
delayacct_wpcopy_start();
--
2.39.1
On 3/24/23 7:26 PM, Peter Xu wrote:
> This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
> writable even with uffd-wp bit set. It only happens with hugetlb private
> mappings, when someone firstly wr-protects a missing pte (which will
> install a pte marker), then a write to the same page without any prior
> access to the page.
>
> Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
> reaching hugetlb_wp() to avoid taking more locks that userfault won't need.
> However there's one CoW optimization path that can trigger hugetlb_wp()
> inside hugetlb_no_page(), which will bypass the trap.
>
> This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
> is detected. The new path will only trigger in the CoW optimization path
> because generic hugetlb_fault() (e.g. when a present pte was wr-protected)
> will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE
> won't be affected and can still be resolved, IOW only skip CoW not CoR.
>
> This patch will be needed for v5.19+ hence copy stable.
>
> Reported-by: Muhammad Usama Anjum <[email protected]>
> Cc: linux-stable <[email protected]>
> Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
> Signed-off-by: Peter Xu <[email protected]>
Tested-by: Muhammad Usama Anjum <[email protected]>
> ---
>
> Notes:
>
> v2 is not on the list but in an attachment in the reply; this v3 is mostly
> to make sure it's not the same as the patch used to be attached. Sorry
> Andrew, we need to drop the queued one as I rewrote the commit message.
>
> Muhammad, I didn't attach your T-b because of the slight functional change.
> Please feel free to re-attach if it still works for you (which I believe
> should).
Thank you for the fix.
> [...]
--
BR,
Muhammad Usama Anjum
On 03/24/23 10:26, Peter Xu wrote:
> [...]
> ---
>
> Notes:
>
> v2 is not on the list but in an attachment in the reply; this v3 is mostly
> to make sure it's not the same as the patch used to be attached. Sorry
> Andrew, we need to drop the queued one as I rewrote the commit message.
My apologies! I saw the code path missed in v2 and assumed you did not
think it applied, so I said nothing. My bad!
> Muhammad, I didn't attach your T-b because of the slight functional change.
> Please feel free to re-attach if it still works for you (which I believe
> should).
>
> thanks,
> ---
> mm/hugetlb.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8bfd07f4c143..a58b3739ed4b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> struct folio *pagecache_folio, spinlock_t *ptl)
> {
> const bool unshare = flags & FAULT_FLAG_UNSHARE;
> - pte_t pte;
> + pte_t pte = huge_ptep_get(ptep);
> struct hstate *h = hstate_vma(vma);
> struct page *old_page;
> struct folio *new_folio;
> @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> unsigned long haddr = address & huge_page_mask(h);
> struct mmu_notifier_range range;
>
> + /*
> + * Never handle CoW for uffd-wp protected pages. It should be only
> + * handled when the uffd-wp protection is removed.
> + *
> + * Note that only the CoW optimization path (in hugetlb_no_page())
> + * can trigger this, because hugetlb_fault() will always resolve
> + * uffd-wp bit first.
> + */
> + if (!unshare && huge_pte_uffd_wp(pte))
> + return 0;
This looks correct. However, since the previous version also looked correct, I
must ask: can we have unshare set and huge_pte_uffd_wp() true? If so, it seems
we would need to propagate that uffd_wp to the new pte, as in v2.
> [...]
--
Mike Kravetz
On 24.03.23 23:27, Mike Kravetz wrote:
> On 03/24/23 10:26, Peter Xu wrote:
>> [...]
>> + /*
>> + * Never handle CoW for uffd-wp protected pages. It should be only
>> + * handled when the uffd-wp protection is removed.
>> + *
>> + * Note that only the CoW optimization path (in hugetlb_no_page())
>> + * can trigger this, because hugetlb_fault() will always resolve
>> + * uffd-wp bit first.
>> + */
>> + if (!unshare && huge_pte_uffd_wp(pte))
>> + return 0;
>
> This looks correct. However, since the previous version looked correct I must
> ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems
> we would need to possibly propagate that uffd_wp to the new pte as in v2
We can. A reproducer would share an anon hugetlb page between parent and
child. In the parent, we would uffd-wp that page. We could trigger
unsharing by R/O-pinning that page.
--
Thanks,
David / dhildenb
On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote:
> > > [...]
> >
> > This looks correct. However, since the previous version looked correct I must
> > ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems
> > we would need to possibly propagate that uffd_wp to the new pte as in v2
Good point, thanks for spotting!
>
> We can. A reproducer would share an anon hugetlb page between parent and
> child. In the parent, we would uffd-wp that page. We could trigger unsharing
> by R/O-pinning that page.
Right. This seems to be a separate bug. It would be triggered in a totally
different context and is much harder to hit, given the rare use of R/O pins
combined with userfaultfd-wp.
If both of you agree, I can prepare a separate patch for this bug, and I'll
prepare a reproducer/selftest to go with it.
--
Peter Xu
On 03/26/23 10:46, Peter Xu wrote:
> On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote:
> > > > [...]
> > >
> > > This looks correct. However, since the previous version looked correct I must
> > > ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems
> > > we would need to possibly propagate that uffd_wp to the new pte as in v2
>
> Good point, thanks for spotting!
>
> >
> > We can. A reproducer would share an anon hugetlb page between parent and
> > child. In the parent, we would uffd-wp that page. We could trigger unsharing
> > by R/O-pinning that page.
>
> Right. This seems to be a separate bug.. It should be triggered in
> totally different context and much harder due to rare use of RO pins,
> meanwhile used with userfault-wp.
>
> If both of you agree, I can prepare a separate patch for this bug, and I'll
> better prepare a reproducer/selftest with it.
>
I am OK with separate patches, and agree that the R/O pinning case is less
likely to happen.
Since this patch addresses the issue found by Muhammad,
Reviewed-by: Mike Kravetz <[email protected]>
--
Mike Kravetz
On 27.03.23 20:34, Mike Kravetz wrote:
> On 03/26/23 10:46, Peter Xu wrote:
>> On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote:
>>>>> [...]
>>>>
>>>> This looks correct. However, since the previous version looked correct I must
>>>> ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems
>>>> we would need to possibly propagate that uffd_wp to the new pte as in v2
>>
>> Good point, thanks for spotting!
>>
>>>
>>> We can. A reproducer would share an anon hugetlb page between parent and
>>> child. In the parent, we would uffd-wp that page. We could trigger unsharing
>>> by R/O-pinning that page.
>>
>> Right. This seems to be a separate bug.. It should be triggered in
>> totally different context and much harder due to rare use of RO pins,
>> meanwhile used with userfault-wp.
>>
>> If both of you agree, I can prepare a separate patch for this bug, and I'll
>> better prepare a reproducer/selftest with it.
>>
>
> I am OK with separate patches, and agree that the R/O pinning case is less
> likely to happen.
Yes, the combination should be rather rare and we can fix that
separately. Ideally, we'd mimic in the hugetlb cow/unshare handling the
same uffd code flow that we use in mm/memory.c.
>
> Since this patch addresses the issue found by Muhammad,
>
> Reviewed-by: Mike Kravetz <[email protected]>
Hopefully we didn't forget about yet another case :D
Acked-by: David Hildenbrand <[email protected]>
--
Thanks,
David / dhildenb