2022-09-08 21:18:13

by Sergei Antonov

Subject: [PATCH] mm: bring back update_mmu_cache() to finish_fault()

Running this test program on ARMv4 a few times (sometimes just once)
reproduces the bug.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed value: the original message does not say what SIZE was. */
#define SIZE 4096

int main(void)
{
	unsigned i;
	char paragon[SIZE];
	void *ptr;

	memset(paragon, 0xAA, SIZE);
	ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		   MAP_ANON | MAP_SHARED, -1, 0);
	if (ptr == MAP_FAILED)
		return 1;
	printf("ptr = %p\n", ptr);
	for (i = 0; i < 10000; i++) {
		memset(ptr, 0xAA, SIZE);
		if (memcmp(ptr, paragon, SIZE)) {
			printf("Unexpected bytes on iteration %u!!!\n", i);
			break;
		}
	}
	munmap(ptr, SIZE);
	return 0;
}

Runs of zero bytes appear in the "ptr" buffer. They are aligned to
16 bytes and their lengths are multiples of 16.
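
To characterize the corruption, something like the helper below could be
called from the mismatch branch of the test program. It is only a sketch:
report_zero_runs() is a hypothetical name, not part of the original test,
and it simply prints the offset and length of every run of zero bytes so
the alignment pattern can be checked by eye.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the original test program): print every
 * run of zero bytes in buf and return the number of runs found.
 */
static unsigned report_zero_runs(const unsigned char *buf, size_t len)
{
	unsigned runs = 0;
	size_t i = 0;

	while (i < len) {
		size_t start;

		if (buf[i] != 0) {
			i++;
			continue;
		}
		/* Found the first zero byte of a run; scan to its end. */
		start = i;
		while (i < len && buf[i] == 0)
			i++;
		printf("zero run: offset %zu, length %zu\n", start, i - start);
		runs++;
	}
	return runs;
}
```

In the test program this would be invoked as report_zero_runs(ptr, SIZE)
just before the "break".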

Linux v5.11 does not have the bug; "git bisect" finds the first bad commit:
f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")

Before the commit, update_mmu_cache() was called during a call to
filemap_map_pages() as well as finish_fault(). After the commit,
finish_fault() lacks it.

Bring back update_mmu_cache() to finish_fault() to fix the bug.
Also call update_mmu_tlb() only when returning VM_FAULT_NOPAGE, to more
closely reproduce the code of the alloc_set_pte() function that existed
before the commit.

On many platforms update_mmu_cache() is a no-op:
x86, see arch/x86/include/asm/pgtable
ARMv6+, see arch/arm/include/asm/tlbflush.h
So it seems few users have run into this bug.

Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Sergei Antonov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
---
mm/memory.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 4ba73f5aa8bb..a78814413ac0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4386,14 +4386,20 @@ vm_fault_t finish_fault(struct vm_fault *vmf)

 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				      vmf->address, &vmf->ptl);
-	ret = 0;
+
 	/* Re-check under ptl */
-	if (likely(!vmf_pte_changed(vmf)))
+	if (likely(!vmf_pte_changed(vmf))) {
 		do_set_pte(vmf, page, vmf->address);
-	else
+
+		/* no need to invalidate: a not-present page won't be cached */
+		update_mmu_cache(vma, vmf->address, vmf->pte);
+
+		ret = 0;
+	} else {
+		update_mmu_tlb(vma, vmf->address, vmf->pte);
 		ret = VM_FAULT_NOPAGE;
+	}

-	update_mmu_tlb(vma, vmf->address, vmf->pte);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
 }
--
2.34.1
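
The branch structure the patch gives the tail of finish_fault() can be
modelled in userspace as below. This is only an illustrative sketch: every
name prefixed with mock_ and the FAULT_* constants are stand-ins invented
for this example, not kernel definitions, and the stubs merely count calls.

```c
#include <stdbool.h>

/* Userspace stand-ins; none of these are the kernel's definitions. */
enum { FAULT_OK = 0, FAULT_NOPAGE = 1 };

static int cache_calls, tlb_calls;

static void mock_update_mmu_cache(void) { cache_calls++; }
static void mock_update_mmu_tlb(void) { tlb_calls++; }

/* Mirrors the if/else shape of the patched finish_fault() tail. */
static int mock_finish_fault_tail(bool pte_changed)
{
	int ret;

	if (!pte_changed) {
		/* do_set_pte() would install the new PTE at this point. */
		mock_update_mmu_cache();
		ret = FAULT_OK;
	} else {
		/* The PTE changed under us, so nothing is installed. */
		mock_update_mmu_tlb();
		ret = FAULT_NOPAGE;
	}
	return ret;
}
```

In this model each hook runs on exactly one of the two paths. Before the
patch, only the TLB hook ran, and it ran on both paths.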


2022-09-08 22:35:11

by Kirill A. Shutemov

Subject: Re: [PATCH] mm: bring back update_mmu_cache() to finish_fault()

On Thu, Sep 08, 2022 at 11:48:09PM +0300, Sergei Antonov wrote:
> [...]

+Will.

Seems I confused update_mmu_tlb() with update_mmu_cache() :/

Looks good to me:

Acked-by: Kirill A. Shutemov <[email protected]>

--
Kiryl Shutsemau / Kirill A. Shutemov

2022-09-09 05:41:25

by Greg KH

Subject: Re: [PATCH] mm: bring back update_mmu_cache() to finish_fault()

On Thu, Sep 08, 2022 at 11:48:09PM +0300, Sergei Antonov wrote:
> [...]

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

2022-09-09 10:25:04

by Will Deacon

Subject: Re: [PATCH] mm: bring back update_mmu_cache() to finish_fault()

On Fri, Sep 09, 2022 at 01:24:10AM +0300, Kirill A. Shutemov wrote:
> On Thu, Sep 08, 2022 at 11:48:09PM +0300, Sergei Antonov wrote:
> > [...]
>
> +Will.
>
> Seems I confused update_mmu_tlb() with update_mmu_cache() :/

Urgh, that thing is pretty horrible! But anyway, I agree that this change
looks correct based on the other callers in the file.

> Looks good to me:
>
> Acked-by: Kirill A. Shutemov <[email protected]>

I'm assuming Andrew will pick this up. Otherwise, please let me know if
I should route it via the arm64 tree.

Will