Use huge_ptep_get() to translate huge ptes to normal ptes so that we can
check them with the huge_pte_* functions. Otherwise, some architectures
will check the wrong values and will not wait for userspace to bring in
the memory.

Signed-off-by: Janosch Frank <[email protected]>
Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
---
fs/userfaultfd.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 123bf7d516fc..594d192b2331 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -222,24 +222,26 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
					 unsigned long reason)
 {
 	struct mm_struct *mm = ctx->mm;
-	pte_t *pte;
+	pte_t *ptep, pte;
 	bool ret = true;
 
 	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
-	pte = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
-	if (!pte)
+	ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
+
+	if (!ptep)
 		goto out;
 
 	ret = false;
+	pte = huge_ptep_get(ptep);
 
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
 	 * changes under us.
 	 */
-	if (huge_pte_none(*pte))
+	if (huge_pte_none(pte))
 		ret = true;
-	if (!huge_pte_write(*pte) && (reason & VM_UFFD_WP))
+	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
 		ret = true;
 out:
 	return ret;
--
2.14.3
On 26.06.2018 15:24, Janosch Frank wrote:
> Use huge_ptep_get to translate huge ptes to normal ptes so we can
> check them with the huge_pte_* functions. Otherwise some architectures
> will check the wrong values and will not wait for userspace to bring
> in the memory.
>
> Signed-off-by: Janosch Frank <[email protected]>
> Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
Indeed, this only works by luck on x86 and ppc, but not on s390x.
Reviewed-by: David Hildenbrand <[email protected]>
--
Thanks,
David / dhildenb
On 06/26/2018 06:24 AM, Janosch Frank wrote:
> Use huge_ptep_get to translate huge ptes to normal ptes so we can
> check them with the huge_pte_* functions. Otherwise some architectures
> will check the wrong values and will not wait for userspace to bring
> in the memory.
>
> Signed-off-by: Janosch Frank <[email protected]>
> Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
Adding linux-mm and Andrew on Cc:
Thanks for catching and fixing this.
I think this needs to be fixed in stable as well. Correct? Assuming
userfaultfd is/can be enabled for impacted architectures.
Reviewed-by: Mike Kravetz <[email protected]>
--
Mike Kravetz
On 26.06.2018 19:00, Mike Kravetz wrote:
> On 06/26/2018 06:24 AM, Janosch Frank wrote:
>> Use huge_ptep_get to translate huge ptes to normal ptes so we can
>> check them with the huge_pte_* functions. Otherwise some architectures
>> will check the wrong values and will not wait for userspace to bring
>> in the memory.
>>
>> Signed-off-by: Janosch Frank <[email protected]>
>> Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
> Adding linux-mm and Andrew on Cc:
>
> Thanks for catching and fixing this.
Sure.
I'd be happy if we saw fewer of these problems over time; this one was
rather painful to debug. :)
> I think this needs to be fixed in stable as well. Correct? Assuming
> userfaultfd is/can be enabled for impacted architectures.
Correct, it seems I forgot to Cc: stable...
>
> Reviewed-by: Mike Kravetz <[email protected]>
Thanks
> -- Mike Kravetz
Hello,
On Wed, Jun 27, 2018 at 10:47:44AM +0200, Janosch Frank wrote:
> Sure.
> I'd be happy if we saw fewer of these problems over time; this one was
> rather painful to debug. :)
What I thought when I read the fix is that it would be more robust, and
we could catch any further error like this at build time, if
huge_pte_offset() returned a new type "hugepte_t *" instead of the
current "pte_t *". Of course huge_ptep_get() would then take a
"hugepte_t *" as its parameter. The x86 implementation would then become:
static inline pte_t huge_ptep_get(hugepte_t *ptep)
{
return *(pte_t *)ptep;
}
I haven't tried it, and perhaps it's not feasible because of the
significant fallout from such a change (i.e. a lot of hugetlbfs methods
would need to change their input type), but since you said you're
actively looking to reduce these problems, this could be a way to do it,
so I thought I should mention it.
The need for huge_ptep_get() is of course very apparent when reading the
fix, but it was all but apparent when reading the previous code, and the
previous code was correct for x86 because huge_ptep_get() is implemented
there as a plain *ptep.
For now the current fix is certainly good; any robustness cleanup is
cleaner done as an orthogonal change anyway.
Thanks!
Andrea