2022-10-30 22:02:37

Subject: [PATCH RFC 09/10] mm/hugetlb: Make hugetlb_fault() RCU-safe

RCU makes sure the pte_t* won't go away from under us. Please refer to the
comment above huge_pte_offset() for more information.

Signed-off-by: Peter Xu <[email protected]>
---
mm/hugetlb.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5dc87e4e6780..6d336d286394 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5822,6 +5822,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);

+	/* For huge_pte_offset() */
+	rcu_read_lock();
 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
 		/*
@@ -5830,13 +5832,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * not actually modifying content here.
 		 */
 		entry = huge_ptep_get(ptep);
+		rcu_read_unlock();
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
 			migration_entry_wait_huge(vma, ptep);
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
-	}
+	} else
+		rcu_read_unlock();

 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
--
2.37.3

2022-11-02 19:16:20

by James Houghton

Subject: Re: [PATCH RFC 09/10] mm/hugetlb: Make hugetlb_fault() RCU-safe

On Sun, Oct 30, 2022 at 2:30 PM Peter Xu <[email protected]> wrote:
>
> RCU makes sure the pte_t* won't go away from under us. Please refer to the
> comment above huge_pte_offset() for more information.

Thanks for this series, Peter! :)

>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> mm/hugetlb.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5dc87e4e6780..6d336d286394 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5822,6 +5822,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> int need_wait_lock = 0;
> unsigned long haddr = address & huge_page_mask(h);
>
> + /* For huge_pte_offset() */
> + rcu_read_lock();
> ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
> if (ptep) {
> /*
> @@ -5830,13 +5832,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> * not actually modifying content here.
> */
> entry = huge_ptep_get(ptep);
> + rcu_read_unlock();
> if (unlikely(is_hugetlb_entry_migration(entry))) {
> migration_entry_wait_huge(vma, ptep);

ptep is used here (and we dereference it in
`__migration_entry_wait_huge`), so this looks unsafe to me. A simple
way to fix this would be to move the migration entry check after the
huge_pte_alloc call.

- James

> return 0;
> } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> return VM_FAULT_HWPOISON_LARGE |
> VM_FAULT_SET_HINDEX(hstate_index(h));
> - }
> + } else
> + rcu_read_unlock();
>
> /*
> * Serialize hugepage allocation and instantiation, so that we don't
> --
> 2.37.3
>

2022-11-03 16:04:25

by Peter Xu

Subject: Re: [PATCH RFC 09/10] mm/hugetlb: Make hugetlb_fault() RCU-safe

On Wed, Nov 02, 2022 at 11:04:01AM -0700, James Houghton wrote:
> On Sun, Oct 30, 2022 at 2:30 PM Peter Xu <[email protected]> wrote:
> >
> > RCU makes sure the pte_t* won't go away from under us. Please refer to the
> > comment above huge_pte_offset() for more information.
>
> Thanks for this series, Peter! :)

Thanks for reviewing, James!

>
> >
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > mm/hugetlb.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 5dc87e4e6780..6d336d286394 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5822,6 +5822,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > int need_wait_lock = 0;
> > unsigned long haddr = address & huge_page_mask(h);
> >
> > + /* For huge_pte_offset() */
> > + rcu_read_lock();
> > ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
> > if (ptep) {
> > /*
> > @@ -5830,13 +5832,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > * not actually modifying content here.
> > */
> > entry = huge_ptep_get(ptep);
> > + rcu_read_unlock();
> > if (unlikely(is_hugetlb_entry_migration(entry))) {
> > migration_entry_wait_huge(vma, ptep);
>
> ptep is used here (and we dereference it in
> `__migration_entry_wait_huge`), so this looks unsafe to me. A simple
> way to fix this would be to move the migration entry check after the
> huge_pte_alloc call.

Right, I definitely overlooked the migration entries in both patches
(including the previous one that you commented on), thanks for pointing that
out.

Though moving that after huge_pte_alloc() may have a similar problem, iiuc.
The thing is we need either the vma lock or RCU to protect accesses to the
pte*, while the pte* page and its pgtable lock can be accessed very deep
in the migration core (e.g., migration_entry_wait_on_locked()), as the
lock cannot be released before the thread queues itself on the waitqueue.

So far I don't see a good way to achieve this other than adding a hook to
migration_entry_wait_on_locked(), so that any lock held for huge migrations
can be properly released after the pgtable lock is released but before the
thread yields.

--
Peter Xu
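
The ordering described in the last paragraph of this reply (queue on the waitqueue, release the pgtable lock, release any outer lock through a hook, then yield) can be sketched as a userspace toy. This is an illustration of the sequence only, under stated assumptions: the toy_* names, the callback type, and the event log are all hypothetical, and nothing here reflects the real signature or internals of migration_entry_wait_on_locked().

```c
/* Userspace toy; all toy_* names are hypothetical, not kernel APIs. */
#include <assert.h>

enum toy_event {
	TOY_QUEUED,         /* thread queued itself on the waitqueue */
	TOY_PTL_UNLOCKED,   /* pgtable lock released */
	TOY_OUTER_RELEASED, /* hook released the outer (vma lock or RCU) protection */
	TOY_SLEPT           /* thread yielded */
};

struct toy_log {
	enum toy_event ev[8];
	int n;
};

static void toy_record(struct toy_log *log, enum toy_event e)
{
	assert(log->n < 8);
	log->ev[log->n++] = e;
}

/* Hypothetical hook type: releases whatever outer lock the caller holds. */
typedef void (*toy_release_fn)(struct toy_log *log);

static void toy_release_outer(struct toy_log *log)
{
	toy_record(log, TOY_OUTER_RELEASED);
}

/*
 * Toy wait routine: the hook runs after the pgtable lock is dropped
 * but before the thread yields, which is the window the reply asks for.
 */
static void toy_wait_on_locked(struct toy_log *log, toy_release_fn release)
{
	toy_record(log, TOY_QUEUED);
	toy_record(log, TOY_PTL_UNLOCKED);
	if (release)
		release(log);
	toy_record(log, TOY_SLEPT);
}
```

Passing toy_release_outer as the hook yields the event order queued, pgtable lock released, outer lock released, slept; passing NULL simply skips the third step. Whether a callback is the right shape for the real interface is left open here.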