hugetlb_follow_page_mask() appears to be missing permission checks. For
example, follow_page() can get a hugetlb page with FOLL_WRITE even if the
page is read-only.

The checks weren't there even in the old follow_page_mask(), as can be
seen in the code before commit 57a196a58421 ("hugetlb: simplify hugetlb
handling in follow_page_mask").

Let's add them: reject the follow_page() request either when CoW is needed
because the write bit is missing, or when CoR is needed for a R/O pin on a
!AnonExclusive page. That brings this function closer to
follow_hugetlb_page().
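For readers who want to see the two checks in isolation, here is a minimal
userspace sketch of the decision order described above. It is a model, not
kernel code: struct model_pte, model_must_unshare(), model_follow() and the
MODEL_* flag values are hypothetical names invented for illustration, and
the unshare rule is reduced to "R/O pin on an anonymous page that is not
exclusive", which only approximates what gup_must_unshare() checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for this model; not the kernel's definitions. */
#define MODEL_FOLL_WRITE 0x1u
#define MODEL_FOLL_PIN   0x2u

/* Hypothetical stand-in for the state the real code reads from PTE and page. */
struct model_pte {
	bool present;
	bool writable;
	bool anon;
	bool anon_exclusive;
};

/* Simplified stand-in for gup_must_unshare(): R/O pin on a shared anon page. */
static bool model_must_unshare(unsigned int flags, const struct model_pte *pte)
{
	if ((flags & (MODEL_FOLL_WRITE | MODEL_FOLL_PIN)) != MODEL_FOLL_PIN)
		return false;
	return pte->anon && !pte->anon_exclusive;
}

/*
 * Returns 1 if the page may be handed out, 0 if the lookup is rejected
 * (the patch returns NULL in that case), or -EMLINK if the caller is
 * expected to do CoR first.
 */
static int model_follow(unsigned int flags, const struct model_pte *pte)
{
	assert(pte);

	if (!pte->present)
		return 0;
	if (model_must_unshare(flags, pte))
		return -EMLINK;
	if ((flags & MODEL_FOLL_WRITE) && !pte->writable)
		return 0;
	return 1;
}
```

In this model, a FOLL_WRITE lookup on a present but read-only entry is
rejected, and a R/O pin on a non-exclusive anonymous page reports -EMLINK
rather than handing out the page.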

I doubt many of us care about this today, since a FOLL_PIN follow_page()
doesn't really happen at all. But we will care, and care more, if we
switch slow-gup over to hugetlb_follow_page_mask(). At that point we will
also care about when to return -EMLINK, as that's the gup-internal API
meaning "we should do CoR".

While at it, switch try_grab_page() to WARN_ON_ONCE(), to make clear that
it should never fail.
Signed-off-by: Peter Xu <[email protected]>
---
 mm/hugetlb.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 82dfdd96db4c..9c261921b2cf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6481,8 +6481,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	ptl = huge_pte_lock(h, mm, pte);
 	entry = huge_ptep_get(pte);
 	if (pte_present(entry)) {
-		page = pte_page(entry) +
-				((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+		page = pte_page(entry);
+
+		if (gup_must_unshare(vma, flags, page)) {
+			/* Tell the caller to do Copy-On-Read */
+			page = ERR_PTR(-EMLINK);
+			goto out;
+		}
+
+		if ((flags & FOLL_WRITE) && !pte_write(entry)) {
+			page = NULL;
+			goto out;
+		}
+
+		page += ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+
 		/*
 		 * Note that page may be a sub-page, and with vmemmap
 		 * optimizations the page struct may be read only.
@@ -6492,10 +6505,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 		 * try_grab_page() should always be able to get the page here,
 		 * because we hold the ptl lock and have verified pte_present().
 		 */
-		if (try_grab_page(page, flags)) {
-			page = NULL;
-			goto out;
-		}
+		WARN_ON_ONCE(try_grab_page(page, flags));
 	}
 out:
 	spin_unlock(ptl);
--
2.40.1
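
The sub-page arithmetic that the patch moves below the new checks,
(address & ~huge_page_mask(h)) >> PAGE_SHIFT, can be tried out in userspace
with the sketch below. The names model_subpage_index() and MODEL_PAGE_SHIFT
are hypothetical, and the sketch assumes 4 KiB base pages and a power-of-two
huge page size passed in by the caller; it is an illustration of the
arithmetic only, not of how the kernel derives the mask from struct hstate.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u	/* assumption: 4 KiB base pages */

/*
 * Index of the base page that 'address' falls into, counted from the
 * start of the huge page that contains it. 'huge_size' is the huge page
 * size in bytes and must be a power of two.
 */
static uint64_t model_subpage_index(uint64_t address, uint64_t huge_size)
{
	uint64_t huge_mask;

	assert(huge_size != 0 && (huge_size & (huge_size - 1)) == 0);

	/* Plays the role of huge_page_mask(h): clears the in-page offset. */
	huge_mask = ~(huge_size - 1);

	return (address & ~huge_mask) >> MODEL_PAGE_SHIFT;
}
```

With a 2 MiB huge page, an address 0x5000 bytes past the start of the huge
page gives index 5, and the last byte of the huge page gives index 511. In
the patch, this offset is added to the head page only after both checks
have passed.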
On 13.06.23 23:53, Peter Xu wrote:
> It seems hugetlb_follow_page_mask() was missing permission checks. For
> example, one follow_page() can get the hugetlb page with FOLL_WRITE even if
> the page is read-only.
I'm curious if there even is a follow_page() user that operates on
hugetlb ...

s390x secure storage does not apply to hugetlb IIRC.

ksm.c? No.

huge_memory.c? No.

So what remains is most probably mm/migrate.c, which never sets FOLL_WRITE.

Or am I missing a user?
--
Cheers,
David / dhildenb
On 14.06.23 17:46, Peter Xu wrote:
> On Wed, Jun 14, 2023 at 05:31:36PM +0200, David Hildenbrand wrote:
>> On 13.06.23 23:53, Peter Xu wrote:
>>> It seems hugetlb_follow_page_mask() was missing permission checks. For
>>> example, one follow_page() can get the hugetlb page with FOLL_WRITE even if
>>> the page is read-only.
>>
>> I'm curious if there even is a follow_page() user that operates on hugetlb
>> ...
>>
>> s390x secure storage does not apply to hugetlb IIRC.
>
> You're the expert, so I'll rely on you. :)
>
Hehe, there is a comment in gmap_destroy_page(), above one of the
follow_page() users:
	/*
	 * Huge pages should not be able to become secure
	 */
	if (is_vm_hugetlb_page(vma))
		goto out;
--
Cheers,
David / dhildenb
On Wed, Jun 14, 2023 at 05:31:36PM +0200, David Hildenbrand wrote:
> On 13.06.23 23:53, Peter Xu wrote:
> > It seems hugetlb_follow_page_mask() was missing permission checks. For
> > example, one follow_page() can get the hugetlb page with FOLL_WRITE even if
> > the page is read-only.
>
> I'm curious if there even is a follow_page() user that operates on hugetlb
> ...
>
> s390x secure storage does not apply to hugetlb IIRC.
You're the expert, so I'll rely on you. :)
>
> ksm.c? no.
>
> huge_memory.c ? no
>
> So what remains is most probably mm/migrate.c, which never sets FOLL_WRITE.
>
> Or am I missing a user?
Yes, none of the rest use FOLL_WRITE.

Then I assume no fixes/backport is needed at all (which is what this patch
already does). This is purely preparatory. I'll mention that in the
new version.
Thanks,
--
Peter Xu
On 06/14/23 11:46, Peter Xu wrote:
> On Wed, Jun 14, 2023 at 05:31:36PM +0200, David Hildenbrand wrote:
> > On 13.06.23 23:53, Peter Xu wrote:
>
> Then I assume no fixes/backport is needed at all (which is what this patch
> already does). This is purely preparatory. I'll mention that in the
> new version.
Code looks fine to me. Feel free to add,
Reviewed-by: Mike Kravetz <[email protected]>
--
Mike Kravetz