LinuxLists.cc - mm: NULL ptr deref in unlink_file

2014-12-22 15:04:24

Subject: mm: NULL ptr deref in unlink_file_vma

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel, I've stumbled on the following spew:

[ 432.376425] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[ 432.378876] IP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
[ 432.380085] PGD 57e5e0067 PUD 57e5e1067 PMD 0
[ 432.380085] Oops: 0002 [#1] PREEMPT SMP KASAN
[ 432.380085] Dumping ftrace buffer:
[ 432.380085] (ftrace buffer empty)
[ 432.380085] Modules linked in:
[ 432.380085] CPU: 4 PID: 9249 Comm: trinity-subchil Not tainted 3.18.0-next-20141219-sasha-00047-gaab33f6-dirty #1627
[ 432.380085] task: ffff8806a88c8000 ti: ffff880664f3c000 task.ti: ffff880664f3c000
[ 432.380085] RIP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
[ 432.380085] RSP: 0018:ffff880664f3fc98 EFLAGS: 00010292
[ 432.380085] RAX: 0000000000000038 RBX: ffff880101aa7c00 RCX: 1ffff10020354f8f
[ 432.380085] RDX: ffffffff00000001 RSI: 1ffff100fe326200 RDI: 0000000000000038
[ 432.380085] RBP: ffff880664f3fcb8 R08: 0000000000000000 R09: ffff880101aa6258
[ 432.380085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880000025900
[ 432.380085] R13: 0000000000000038 R14: 0000000000000000 R15: ffff880101aa7c00
[ 432.380085] FS: 00007f21149c4700(0000) GS:ffff880216400000(0000) knlGS:0000000000000000
[ 432.380085] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 432.380085] CR2: 0000000000000038 CR3: 000000057a4f3000 CR4: 00000000000006a0
[ 432.380085] Stack:
[ 432.380085] ffff880101aa7c00 ffff880101aa7c78 ffff880101aa6200 ffff880101aa7c00
[ 432.380085] ffff880664f3fce8 ffffffffa1953e9d[ 432.400920] CONFIG_KASAN_INLINE enabled
[ 432.400923] GPF could be caused by NULL-ptr deref or user memory access

[ 432.402566] ffff880101aa7c00 00007f210f9f3000
[ 432.402566] dfffe90000000000 ffff880101aa4600 ffff880664f3fd48 ffffffffa1937821
[ 432.402566] Call Trace:
[ 432.402566] unlink_file_vma (mm/mmap.c:264)
[ 432.402566] free_pgtables (mm/memory.c:548)
[ 432.402566] exit_mmap (mm/mmap.c:2846)
[ 432.402566] ? kmem_cache_free (mm/slub.c:2712 mm/slub.c:2721)
[ 432.402566] mmput (kernel/fork.c:659)
[ 432.402566] do_exit (./arch/x86/include/asm/thread_info.h:164 kernel/exit.c:438 kernel/exit.c:732)
[ 432.402566] ? preempt_count_sub (kernel/sched/core.c:2620)
[ 432.402566] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 432.402566] do_group_exit (include/linux/sched.h:775 kernel/exit.c:858)
[ 432.402566] SyS_exit_group (kernel/exit.c:886)
[ 432.402566] system_call_fastpath (arch/x86/kernel/entry_64.S:423)
[ 432.402566] Code: 79 05 e8 f9 e9 a6 f2 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 48 ba 01 00 00 00 ff ff ff ff 55 48 89 f8 48 89 e5 53 48 83 ec 18 <f0> 48 0f c1 10 85 d2 74 05 e8 f7 e9 a6 f2 65 48 8b 1c 25 80 b9
All code
========
0: 79 05 jns 0x7
2: e8 f9 e9 a6 f2 callq 0xfffffffff2a6ea00
7: 5d pop %rbp
8: c3 retq
9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
10: 66 66 66 66 90 data32 data32 data32 xchg %ax,%ax
15: 48 ba 01 00 00 00 ff movabs $0xffffffff00000001,%rdx
1c: ff ff ff
1f: 55 push %rbp
20: 48 89 f8 mov %rdi,%rax
23: 48 89 e5 mov %rsp,%rbp
26: 53 push %rbx
27: 48 83 ec 18 sub $0x18,%rsp
2b:* f0 48 0f c1 10 lock xadd %rdx,(%rax) <-- trapping instruction
30: 85 d2 test %edx,%edx
32: 74 05 je 0x39
34: e8 f7 e9 a6 f2 callq 0xfffffffff2a6ea30
39: 65 gs
3a: 48 rex.W
3b: 8b .byte 0x8b
3c: 1c 25 sbb $0x25,%al
3e: 80 .byte 0x80
3f: b9 .byte 0xb9
...

Code starting with the faulting instruction
===========================================
0: f0 48 0f c1 10 lock xadd %rdx,(%rax)
5: 85 d2 test %edx,%edx
7: 74 05 je 0xe
9: e8 f7 e9 a6 f2 callq 0xfffffffff2a6ea05
e: 65 gs
f: 48 rex.W
10: 8b .byte 0x8b
11: 1c 25 sbb $0x25,%al
13: 80 .byte 0x80
14: b9 .byte 0xb9
...
[ 432.402566] RIP down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
[ 432.402566] RSP <ffff880664f3fc98>
[ 432.402566] CR2: 0000000000000038

Thanks,
Sasha

2014-12-22 18:01:27

by Kirill A. Shutemov

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

On Mon, Dec 22, 2014 at 10:04:02AM -0500, Sasha Levin wrote:
> Hi all,
>
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel, I've stumbled on the following spew:
>
> [ 432.376425] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> [ 432.378876] IP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)

Looks like vma->vm_file->mapping is NULL. Somebody freed ->vm_file from
under us?

I suspect Davidlohr's patchset on i_mmap_lock, but I cannot find any code
path which could lead to the crash.

I've noticed one strange code path, which probably is not related to the
issue:

unmap_mapping_range()
i_mmap_lock_read(mapping);
unmap_mapping_range_tree()
unmap_mapping_range_vma()
zap_page_range_single()
unmap_single_vma()
if (unlikely(is_vm_hugetlb_page(vma))) {
i_mmap_lock_write(vma->vm_file->f_mapping);

Is it legal to down_write() semaphore while we have it taken on read?
It must be the same mapping, right?

> [ 432.380085] PGD 57e5e0067 PUD 57e5e1067 PMD 0
> [ 432.380085] Oops: 0002 [#1] PREEMPT SMP KASAN
> [ 432.380085] Dumping ftrace buffer:
> [ 432.380085] (ftrace buffer empty)
> [ 432.380085] Modules linked in:
> [ 432.380085] CPU: 4 PID: 9249 Comm: trinity-subchil Not tainted 3.18.0-next-20141219-sasha-00047-gaab33f6-dirty #1627
> [ 432.380085] task: ffff8806a88c8000 ti: ffff880664f3c000 task.ti: ffff880664f3c000
> [ 432.380085] RIP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
> [ 432.380085] RSP: 0018:ffff880664f3fc98 EFLAGS: 00010292
> [ 432.380085] RAX: 0000000000000038 RBX: ffff880101aa7c00 RCX: 1ffff10020354f8f
> [ 432.380085] RDX: ffffffff00000001 RSI: 1ffff100fe326200 RDI: 0000000000000038
> [ 432.380085] RBP: ffff880664f3fcb8 R08: 0000000000000000 R09: ffff880101aa6258
> [ 432.380085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880000025900
> [ 432.380085] R13: 0000000000000038 R14: 0000000000000000 R15: ffff880101aa7c00
> [ 432.380085] FS: 00007f21149c4700(0000) GS:ffff880216400000(0000) knlGS:0000000000000000
> [ 432.380085] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 432.380085] CR2: 0000000000000038 CR3: 000000057a4f3000 CR4: 00000000000006a0
> [ 432.380085] Stack:
> [ 432.380085] ffff880101aa7c00 ffff880101aa7c78 ffff880101aa6200 ffff880101aa7c00
> [ 432.380085] ffff880664f3fce8 ffffffffa1953e9d[ 432.400920] CONFIG_KASAN_INLINE enabled
> [ 432.400923] GPF could be caused by NULL-ptr deref or user memory access
>
> [ 432.402566] ffff880101aa7c00 00007f210f9f3000
> [ 432.402566] dfffe90000000000 ffff880101aa4600 ffff880664f3fd48 ffffffffa1937821
> [ 432.402566] Call Trace:
> [ 432.402566] unlink_file_vma (mm/mmap.c:264)
> [ 432.402566] free_pgtables (mm/memory.c:548)
> [ 432.402566] exit_mmap (mm/mmap.c:2846)
> [ 432.402566] ? kmem_cache_free (mm/slub.c:2712 mm/slub.c:2721)
> [ 432.402566] mmput (kernel/fork.c:659)
> [ 432.402566] do_exit (./arch/x86/include/asm/thread_info.h:164 kernel/exit.c:438 kernel/exit.c:732)
> [ 432.402566] ? preempt_count_sub (kernel/sched/core.c:2620)
> [ 432.402566] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [ 432.402566] do_group_exit (include/linux/sched.h:775 kernel/exit.c:858)
> [ 432.402566] SyS_exit_group (kernel/exit.c:886)
> [ 432.402566] system_call_fastpath (arch/x86/kernel/entry_64.S:423)
> [ 432.402566] Code: 79 05 e8 f9 e9 a6 f2 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 48 ba 01 00 00 00 ff ff ff ff 55 48 89 f8 48 89 e5 53 48 83 ec 18 <f0> 48 0f c1 10 85 d2 74 05 e8 f7 e9 a6 f2 65 48 8b 1c 25 80 b9
> All code
> ========
> 0: 79 05 jns 0x7
> 2: e8 f9 e9 a6 f2 callq 0xfffffffff2a6ea00
> 7: 5d pop %rbp
> 8: c3 retq
> 9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
> 10: 66 66 66 66 90 data32 data32 data32 xchg %ax,%ax
> 15: 48 ba 01 00 00 00 ff movabs $0xffffffff00000001,%rdx
> 1c: ff ff ff
> 1f: 55 push %rbp
> 20: 48 89 f8 mov %rdi,%rax
> 23: 48 89 e5 mov %rsp,%rbp
> 26: 53 push %rbx
> 27: 48 83 ec 18 sub $0x18,%rsp
> 2b:* f0 48 0f c1 10 lock xadd %rdx,(%rax) <-- trapping instruction
> 30: 85 d2 test %edx,%edx
> 32: 74 05 je 0x39
> 34: e8 f7 e9 a6 f2 callq 0xfffffffff2a6ea30
> 39: 65 gs
> 3a: 48 rex.W
> 3b: 8b .byte 0x8b
> 3c: 1c 25 sbb $0x25,%al
> 3e: 80 .byte 0x80
> 3f: b9 .byte 0xb9
> ...
>
> Code starting with the faulting instruction
> ===========================================
> 0: f0 48 0f c1 10 lock xadd %rdx,(%rax)
> 5: 85 d2 test %edx,%edx
> 7: 74 05 je 0xe
> 9: e8 f7 e9 a6 f2 callq 0xfffffffff2a6ea05
> e: 65 gs
> f: 48 rex.W
> 10: 8b .byte 0x8b
> 11: 1c 25 sbb $0x25,%al
> 13: 80 .byte 0x80
> 14: b9 .byte 0xb9
> ...
> [ 432.402566] RIP down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
> [ 432.402566] RSP <ffff880664f3fc98>
> [ 432.402566] CR2: 0000000000000038
>
>
> Thanks,
> Sasha
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Kirill A. Shutemov

2014-12-22 18:04:34

by Kirill A. Shutemov

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

[ fixed Davidlohr's address. ]

On Mon, Dec 22, 2014 at 08:01:02PM +0200, Kirill A. Shutemov wrote:
> On Mon, Dec 22, 2014 at 10:04:02AM -0500, Sasha Levin wrote:
> > Hi all,
> >
> > While fuzzing with trinity inside a KVM tools guest running the latest -next
> > kernel, I've stumbled on the following spew:
> >
> > [ 432.376425] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > [ 432.378876] IP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
>
> Looks like vma->vm_file->mapping is NULL. Somebody freed ->vm_file from
> under us?
>
> I suspect Davidlohr's patchset on i_mmap_lock, but I cannot find any code
> path which could lead to the crash.
>
> I've noticed one strange code path, which probably is not related to the
> issue:
>
> unmap_mapping_range()
> i_mmap_lock_read(mapping);
> unmap_mapping_range_tree()
> unmap_mapping_range_vma()
> zap_page_range_single()
> unmap_single_vma()
> if (unlikely(is_vm_hugetlb_page(vma))) {
> i_mmap_lock_write(vma->vm_file->f_mapping);
>
> Is it legal to down_write() semaphore while we have it taken on read?
> It must be the same mapping, right?
>
> > [ 432.380085] PGD 57e5e0067 PUD 57e5e1067 PMD 0
> > [ 432.380085] Oops: 0002 [#1] PREEMPT SMP KASAN
> > [ 432.380085] Dumping ftrace buffer:
> > [ 432.380085] (ftrace buffer empty)
> > [ 432.380085] Modules linked in:
> > [ 432.380085] CPU: 4 PID: 9249 Comm: trinity-subchil Not tainted 3.18.0-next-20141219-sasha-00047-gaab33f6-dirty #1627
> > [ 432.380085] task: ffff8806a88c8000 ti: ffff880664f3c000 task.ti: ffff880664f3c000
> > [ 432.380085] RIP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
> > [ 432.380085] RSP: 0018:ffff880664f3fc98 EFLAGS: 00010292
> > [ 432.380085] RAX: 0000000000000038 RBX: ffff880101aa7c00 RCX: 1ffff10020354f8f
> > [ 432.380085] RDX: ffffffff00000001 RSI: 1ffff100fe326200 RDI: 0000000000000038
> > [ 432.380085] RBP: ffff880664f3fcb8 R08: 0000000000000000 R09: ffff880101aa6258
> > [ 432.380085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880000025900
> > [ 432.380085] R13: 0000000000000038 R14: 0000000000000000 R15: ffff880101aa7c00
> > [ 432.380085] FS: 00007f21149c4700(0000) GS:ffff880216400000(0000) knlGS:0000000000000000
> > [ 432.380085] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 432.380085] CR2: 0000000000000038 CR3: 000000057a4f3000 CR4: 00000000000006a0
> > [ 432.380085] Stack:
> > [ 432.380085] ffff880101aa7c00 ffff880101aa7c78 ffff880101aa6200 ffff880101aa7c00
> > [ 432.380085] ffff880664f3fce8 ffffffffa1953e9d[ 432.400920] CONFIG_KASAN_INLINE enabled
> > [ 432.400923] GPF could be caused by NULL-ptr deref or user memory access
> >
> > [ 432.402566] ffff880101aa7c00 00007f210f9f3000
> > [ 432.402566] dfffe90000000000 ffff880101aa4600 ffff880664f3fd48 ffffffffa1937821
> > [ 432.402566] Call Trace:
> > [ 432.402566] unlink_file_vma (mm/mmap.c:264)
> > [ 432.402566] free_pgtables (mm/memory.c:548)
> > [ 432.402566] exit_mmap (mm/mmap.c:2846)
> > [ 432.402566] ? kmem_cache_free (mm/slub.c:2712 mm/slub.c:2721)
> > [ 432.402566] mmput (kernel/fork.c:659)
> > [ 432.402566] do_exit (./arch/x86/include/asm/thread_info.h:164 kernel/exit.c:438 kernel/exit.c:732)
> > [ 432.402566] ? preempt_count_sub (kernel/sched/core.c:2620)
> > [ 432.402566] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> > [ 432.402566] do_group_exit (include/linux/sched.h:775 kernel/exit.c:858)
> > [ 432.402566] SyS_exit_group (kernel/exit.c:886)
> > [ 432.402566] system_call_fastpath (arch/x86/kernel/entry_64.S:423)
> > [ 432.402566] Code: 79 05 e8 f9 e9 a6 f2 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 48 ba 01 00 00 00 ff ff ff ff 55 48 89 f8 48 89 e5 53 48 83 ec 18 <f0> 48 0f c1 10 85 d2 74 05 e8 f7 e9 a6 f2 65 48 8b 1c 25 80 b9
> > All code
> > ========
> > 0: 79 05 jns 0x7
> > 2: e8 f9 e9 a6 f2 callq 0xfffffffff2a6ea00
> > 7: 5d pop %rbp
> > 8: c3 retq
> > 9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
> > 10: 66 66 66 66 90 data32 data32 data32 xchg %ax,%ax
> > 15: 48 ba 01 00 00 00 ff movabs $0xffffffff00000001,%rdx
> > 1c: ff ff ff
> > 1f: 55 push %rbp
> > 20: 48 89 f8 mov %rdi,%rax
> > 23: 48 89 e5 mov %rsp,%rbp
> > 26: 53 push %rbx
> > 27: 48 83 ec 18 sub $0x18,%rsp
> > 2b:* f0 48 0f c1 10 lock xadd %rdx,(%rax) <-- trapping instruction
> > 30: 85 d2 test %edx,%edx
> > 32: 74 05 je 0x39
> > 34: e8 f7 e9 a6 f2 callq 0xfffffffff2a6ea30
> > 39: 65 gs
> > 3a: 48 rex.W
> > 3b: 8b .byte 0x8b
> > 3c: 1c 25 sbb $0x25,%al
> > 3e: 80 .byte 0x80
> > 3f: b9 .byte 0xb9
> > ...
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: f0 48 0f c1 10 lock xadd %rdx,(%rax)
> > 5: 85 d2 test %edx,%edx
> > 7: 74 05 je 0xe
> > 9: e8 f7 e9 a6 f2 callq 0xfffffffff2a6ea05
> > e: 65 gs
> > f: 48 rex.W
> > 10: 8b .byte 0x8b
> > 11: 1c 25 sbb $0x25,%al
> > 13: 80 .byte 0x80
> > 14: b9 .byte 0xb9
> > ...
> > [ 432.402566] RIP down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
> > [ 432.402566] RSP <ffff880664f3fc98>
> > [ 432.402566] CR2: 0000000000000038
> >
> >
> > Thanks,
> > Sasha
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
>
> --
> Kirill A. Shutemov
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Kirill A. Shutemov

2014-12-22 18:05:38

by Sasha Levin

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

On 12/22/2014 01:01 PM, Kirill A. Shutemov wrote:
> On Mon, Dec 22, 2014 at 10:04:02AM -0500, Sasha Levin wrote:
>> > Hi all,
>> >
>> > While fuzzing with trinity inside a KVM tools guest running the latest -next
>> > kernel, I've stumbled on the following spew:
>> >
>> > [ 432.376425] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
>> > [ 432.378876] IP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
> Looks like vma->vm_file->mapping is NULL. Somebody freed ->vm_file from
> under us?
>
> I suspect Davidlohr's patchset on i_mmap_lock, but I cannot find any code
> path which could lead to the crash.

I've reported a different issue which that patchset: https://lkml.org/lkml/2014/12/9/741

I guess it could be related?

Thanks,
Sasha

2014-12-22 19:04:40

by Davidlohr Bueso

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

On Mon, 2014-12-22 at 20:04 +0200, Kirill A. Shutemov wrote:
> [ fixed Davidlohr's address. ]
>
> On Mon, Dec 22, 2014 at 08:01:02PM +0200, Kirill A. Shutemov wrote:
> > On Mon, Dec 22, 2014 at 10:04:02AM -0500, Sasha Levin wrote:
> > > Hi all,
> > >
> > > While fuzzing with trinity inside a KVM tools guest running the latest -next
> > > kernel, I've stumbled on the following spew:
> > >
> > > [ 432.376425] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > [ 432.378876] IP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
> >
> > Looks like vma->vm_file->mapping is NULL. Somebody freed ->vm_file from
> > under us?
> >
> > I suspect Davidlohr's patchset on i_mmap_lock, but I cannot find any code
> > path which could lead to the crash.

Sasha, does this still occur if you revert c8475d144abb?

> > I've noticed one strange code path, which probably is not related to the
> > issue:
> >
> > unmap_mapping_range()
> > i_mmap_lock_read(mapping);
> > unmap_mapping_range_tree()
> > unmap_mapping_range_vma()
> > zap_page_range_single()
> > unmap_single_vma()
> > if (unlikely(is_vm_hugetlb_page(vma))) {
> > i_mmap_lock_write(vma->vm_file->f_mapping);

Right, this is would be completely bogus. But the deadlock cannot happen
in reality as hugetlb uses its own handlers and thus never calls
unmap_mapping_range.

Thanks,
Davidlohr

2014-12-22 19:17:47

by Kirill A. Shutemov

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

On Mon, Dec 22, 2014 at 01:05:13PM -0500, Sasha Levin wrote:
> On 12/22/2014 01:01 PM, Kirill A. Shutemov wrote:
> > On Mon, Dec 22, 2014 at 10:04:02AM -0500, Sasha Levin wrote:
> >> > Hi all,
> >> >
> >> > While fuzzing with trinity inside a KVM tools guest running the latest -next
> >> > kernel, I've stumbled on the following spew:
> >> >
> >> > [ 432.376425] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> >> > [ 432.378876] IP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
> > Looks like vma->vm_file->mapping is NULL. Somebody freed ->vm_file from
> > under us?
> >
> > I suspect Davidlohr's patchset on i_mmap_lock, but I cannot find any code
> > path which could lead to the crash.
>
> I've reported a different issue which that patchset: https://lkml.org/lkml/2014/12/9/741
>
> I guess it could be related?

Maybe.

Other thing:

unmap_mapping_range()
i_mmap_lock_read(mapping);
unmap_mapping_range_tree()
unmap_mapping_range_vma()
zap_page_range_single()
unmap_single_vma()
untrack_pfn()
vma->vm_flags &= ~VM_PAT;

It seems we modify ->vm_flags without mmap_sem taken, means we can corrupt
them.

Sasha could you check if you hit untrack_pfn()?

The problem probably was hidden by exclusive i_mmap_lock on
unmap_mapping_range(), but it's not exclusive anymore afrer Dave's
patchset.

Konstantin, you've modified untrack_pfn() back in 2012 to change
->vm_flags. Any coments?

For now, I would propose to revert the commit and probably re-introduce it
after v3.19:

>From 14392c69fcfeeda34eb9f75d983dad32698cdd5c Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <[email protected]>
Date: Mon, 22 Dec 2014 21:01:54 +0200
Subject: [PATCH] Revert "mm/memory.c: share the i_mmap_rwsem"

This reverts commit c8475d144abb1e62958cc5ec281d2a9e161c1946.

There are several[1][2] of bug reports which points to this commit as potential
cause[3].

Let's revert it until we figure out what's going on.

[1] https://lkml.org/lkml/2014/11/14/342
[2] https://lkml.org/lkml/2014/12/22/213
[3] https://lkml.org/lkml/2014/12/9/741

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Mel Gorman <[email protected]>
---
mm/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 649e7d440bd7..ca920d1fd314 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2378,12 +2378,12 @@ void unmap_mapping_range(struct address_space *mapping,
details.last_index = ULONG_MAX;

- i_mmap_lock_read(mapping);
+ i_mmap_lock_write(mapping);
if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
unmap_mapping_range_tree(&mapping->i_mmap, &details);
if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
- i_mmap_unlock_read(mapping);
+ i_mmap_unlock_write(mapping);
}
EXPORT_SYMBOL(unmap_mapping_range);

--
Kirill A. Shutemov

2014-12-22 20:38:45

by Sasha Levin

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

On 12/22/2014 02:04 PM, Davidlohr Bueso wrote:
> Sasha, does this still occur if you revert c8475d144abb?

On 12/22/2014 02:14 PM, Kirill A. Shutemov wrote:
> Sasha could you check if you hit untrack_pfn()?

I'm afraid I only hit this issue once, unlike the other once which
was bisected down.

I'm trying to play with it a bit to see if I can "help" it reproduce,
but no luck so far.

Thanks,
Sasha

2014-12-22 22:12:14

by Davidlohr Bueso

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

On Mon, 2014-12-22 at 21:14 +0200, Kirill A. Shutemov wrote:
> Other thing:
>
> unmap_mapping_range()
> i_mmap_lock_read(mapping);
> unmap_mapping_range_tree()
> unmap_mapping_range_vma()
> zap_page_range_single()
> unmap_single_vma()
> untrack_pfn()
> vma->vm_flags &= ~VM_PAT;
>
> It seems we modify ->vm_flags without mmap_sem taken, means we can corrupt
> them.

yep. Although one thing that wouldn't match this would be the mlock'd
bad page when freeing in both of Sasha's previous reports, as we would
need to have VM_PFNMAP when calling untrack_pfn().

> Sasha could you check if you hit untrack_pfn()?
>
> The problem probably was hidden by exclusive i_mmap_lock on
> unmap_mapping_range(), but it's not exclusive anymore afrer Dave's
> patchset.
>
> Konstantin, you've modified untrack_pfn() back in 2012 to change
> ->vm_flags. Any coments?
>
> For now, I would propose to revert the commit and probably re-introduce it
> after v3.19:
>
> From 14392c69fcfeeda34eb9f75d983dad32698cdd5c Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <[email protected]>
> Date: Mon, 22 Dec 2014 21:01:54 +0200
> Subject: [PATCH] Revert "mm/memory.c: share the i_mmap_rwsem"
>
> This reverts commit c8475d144abb1e62958cc5ec281d2a9e161c1946.
>
> There are several[1][2] of bug reports which points to this commit as potential
> cause[3].
>
> Let's revert it until we figure out what's going on.
>
> [1] https://lkml.org/lkml/2014/11/14/342
> [2] https://lkml.org/lkml/2014/12/22/213
> [3] https://lkml.org/lkml/2014/12/9/741
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> Reported-by: Sasha Levin <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>

I certainly have no problem with this. Furthermore we snuck this one in
kinda last minute, so:

Acked-by: Davidlohr Bueso <[email protected]>

Thanks,
Davidlohr

2015-02-10 18:42:34

by Konstantin Khlebnikov

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

On Mon, Dec 22, 2014 at 10:14 PM, Kirill A. Shutemov
<[email protected]> wrote:
> On Mon, Dec 22, 2014 at 01:05:13PM -0500, Sasha Levin wrote:
>> On 12/22/2014 01:01 PM, Kirill A. Shutemov wrote:
>> > On Mon, Dec 22, 2014 at 10:04:02AM -0500, Sasha Levin wrote:
>> >> > Hi all,
>> >> >
>> >> > While fuzzing with trinity inside a KVM tools guest running the latest -next
>> >> > kernel, I've stumbled on the following spew:
>> >> >
>> >> > [ 432.376425] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
>> >> > [ 432.378876] IP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
>> > Looks like vma->vm_file->mapping is NULL. Somebody freed ->vm_file from
>> > under us?
>> >
>> > I suspect Davidlohr's patchset on i_mmap_lock, but I cannot find any code
>> > path which could lead to the crash.
>>
>> I've reported a different issue which that patchset: https://lkml.org/lkml/2014/12/9/741
>>
>> I guess it could be related?
>
> Maybe.
>
> Other thing:
>
> unmap_mapping_range()
> i_mmap_lock_read(mapping);
> unmap_mapping_range_tree()
> unmap_mapping_range_vma()
> zap_page_range_single()
> unmap_single_vma()
> untrack_pfn()
> vma->vm_flags &= ~VM_PAT;
>
> It seems we modify ->vm_flags without mmap_sem taken, means we can corrupt
> them.
>
> Sasha could you check if you hit untrack_pfn()?
>
> The problem probably was hidden by exclusive i_mmap_lock on
> unmap_mapping_range(), but it's not exclusive anymore afrer Dave's
> patchset.
>
> Konstantin, you've modified untrack_pfn() back in 2012 to change
> ->vm_flags. Any coments?

Hmm. I don't really understand how
unmap_mapping_range() could be used for VM_PFNMAP mappings
except unmap() or exit_mmap() where mm is locked anyway.
Somebody truncates memory mapped device and unmaps mapped PFNs?

If it's a problem then I think VM_PAT could be tuned into hint which
means PAT tracking was here and we pat should check internal structure
for details and take actions if pat tracking is still here. As I see
pat tracking probably also have problems if somebody unmaps that vma
partially.

>
> For now, I would propose to revert the commit and probably re-introduce it
> after v3.19:
>
> From 14392c69fcfeeda34eb9f75d983dad32698cdd5c Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <[email protected]>
> Date: Mon, 22 Dec 2014 21:01:54 +0200
> Subject: [PATCH] Revert "mm/memory.c: share the i_mmap_rwsem"
>
> This reverts commit c8475d144abb1e62958cc5ec281d2a9e161c1946.
>
> There are several[1][2] of bug reports which points to this commit as potential
> cause[3].
>
> Let's revert it until we figure out what's going on.
>
> [1] https://lkml.org/lkml/2014/11/14/342
> [2] https://lkml.org/lkml/2014/12/22/213
> [3] https://lkml.org/lkml/2014/12/9/741
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> Reported-by: Sasha Levin <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Srikar Dronamraju <[email protected]>
> Cc: Mel Gorman <[email protected]>
> ---
> mm/memory.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 649e7d440bd7..ca920d1fd314 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2378,12 +2378,12 @@ void unmap_mapping_range(struct address_space *mapping,
> details.last_index = ULONG_MAX;
>
>
> - i_mmap_lock_read(mapping);
> + i_mmap_lock_write(mapping);

Probably we could enable read-lock for "good" mappings, for example:

if (mapping->a_ops->error_remove_page == generic_error_remove_page)
i_mmap_lock_read(mapping);
else
i_mmap_lock_write(mapping);

=)

> if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
> unmap_mapping_range_tree(&mapping->i_mmap, &details);
> if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
> unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
> - i_mmap_unlock_read(mapping);
> + i_mmap_unlock_write(mapping);
> }
> EXPORT_SYMBOL(unmap_mapping_range);
>
> --
> Kirill A. Shutemov

2015-02-11 12:25:19

by Kirill A. Shutemov

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

On Tue, Feb 10, 2015 at 10:42:31PM +0400, Konstantin Khlebnikov wrote:
> On Mon, Dec 22, 2014 at 10:14 PM, Kirill A. Shutemov
> <[email protected]> wrote:
> > On Mon, Dec 22, 2014 at 01:05:13PM -0500, Sasha Levin wrote:
> >> On 12/22/2014 01:01 PM, Kirill A. Shutemov wrote:
> >> > On Mon, Dec 22, 2014 at 10:04:02AM -0500, Sasha Levin wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > While fuzzing with trinity inside a KVM tools guest running the latest -next
> >> >> > kernel, I've stumbled on the following spew:
> >> >> >
> >> >> > [ 432.376425] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> >> >> > [ 432.378876] IP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
> >> > Looks like vma->vm_file->mapping is NULL. Somebody freed ->vm_file from
> >> > under us?
> >> >
> >> > I suspect Davidlohr's patchset on i_mmap_lock, but I cannot find any code
> >> > path which could lead to the crash.
> >>
> >> I've reported a different issue which that patchset: https://lkml.org/lkml/2014/12/9/741
> >>
> >> I guess it could be related?
> >
> > Maybe.
> >
> > Other thing:
> >
> > unmap_mapping_range()
> > i_mmap_lock_read(mapping);
> > unmap_mapping_range_tree()
> > unmap_mapping_range_vma()
> > zap_page_range_single()
> > unmap_single_vma()
> > untrack_pfn()
> > vma->vm_flags &= ~VM_PAT;
> >
> > It seems we modify ->vm_flags without mmap_sem taken, means we can corrupt
> > them.
> >
> > Sasha could you check if you hit untrack_pfn()?
> >
> > The problem probably was hidden by exclusive i_mmap_lock on
> > unmap_mapping_range(), but it's not exclusive anymore afrer Dave's
> > patchset.
> >
> > Konstantin, you've modified untrack_pfn() back in 2012 to change
> > ->vm_flags. Any coments?
>
> Hmm. I don't really understand how
> unmap_mapping_range() could be used for VM_PFNMAP mappings
> except unmap() or exit_mmap() where mm is locked anyway.
> Somebody truncates memory mapped device and unmaps mapped PFNs?

Hm. Probably not. But it's not obvious to me what would stop this.
Should we at least have assert on mmap_sem locked in untrack_pfn()?

> If it's a problem then I think VM_PAT could be tuned into hint which
> means PAT tracking was here and we pat should check internal structure
> for details and take actions if pat tracking is still here. As I see
> pat tracking probably also have problems if somebody unmaps that vma
> partially.

IIUC, we only mark a vma with VM_PAT if whole vma is subject for
remap_pfn_range(). I don't see a point in cleaning VM_PAT -- just let it
die with vma. Or do I miss something?

--
Kirill A. Shutemov

2015-02-12 13:42:23

by Konstantin Khlebnikov

[permalink] [raw]

Subject: Re: mm: NULL ptr deref in unlink_file_vma

On Wed, Feb 11, 2015 at 3:22 PM, Kirill A. Shutemov
<[email protected]> wrote:
> On Tue, Feb 10, 2015 at 10:42:31PM +0400, Konstantin Khlebnikov wrote:
>> On Mon, Dec 22, 2014 at 10:14 PM, Kirill A. Shutemov
>> <[email protected]> wrote:
>> > On Mon, Dec 22, 2014 at 01:05:13PM -0500, Sasha Levin wrote:
>> >> On 12/22/2014 01:01 PM, Kirill A. Shutemov wrote:
>> >> > On Mon, Dec 22, 2014 at 10:04:02AM -0500, Sasha Levin wrote:
>> >> >> > Hi all,
>> >> >> >
>> >> >> > While fuzzing with trinity inside a KVM tools guest running the latest -next
>> >> >> > kernel, I've stumbled on the following spew:
>> >> >> >
>> >> >> > [ 432.376425] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
>> >> >> > [ 432.378876] IP: down_write (./arch/x86/include/asm/rwsem.h:105 ./arch/x86/include/asm/rwsem.h:121 kernel/locking/rwsem.c:71)
>> >> > Looks like vma->vm_file->mapping is NULL. Somebody freed ->vm_file from
>> >> > under us?
>> >> >
>> >> > I suspect Davidlohr's patchset on i_mmap_lock, but I cannot find any code
>> >> > path which could lead to the crash.
>> >>
>> >> I've reported a different issue which that patchset: https://lkml.org/lkml/2014/12/9/741
>> >>
>> >> I guess it could be related?
>> >
>> > Maybe.
>> >
>> > Other thing:
>> >
>> > unmap_mapping_range()
>> > i_mmap_lock_read(mapping);
>> > unmap_mapping_range_tree()
>> > unmap_mapping_range_vma()
>> > zap_page_range_single()
>> > unmap_single_vma()
>> > untrack_pfn()
>> > vma->vm_flags &= ~VM_PAT;
>> >
>> > It seems we modify ->vm_flags without mmap_sem taken, means we can corrupt
>> > them.
>> >
>> > Sasha could you check if you hit untrack_pfn()?
>> >
>> > The problem probably was hidden by exclusive i_mmap_lock on
>> > unmap_mapping_range(), but it's not exclusive anymore afrer Dave's
>> > patchset.
>> >
>> > Konstantin, you've modified untrack_pfn() back in 2012 to change
>> > ->vm_flags. Any coments?
>>
>> Hmm. I don't really understand how
>> unmap_mapping_range() could be used for VM_PFNMAP mappings
>> except unmap() or exit_mmap() where mm is locked anyway.
>> Somebody truncates memory mapped device and unmaps mapped PFNs?
>
> Hm. Probably not. But it's not obvious to me what would stop this.
> Should we at least have assert on mmap_sem locked in untrack_pfn()?

exit_mmap() runs without mmap_sem thus this should be something like thus:
WARN_ON_ONCE(atomic_read(&mm->mm_users) && !rwsem_is_locked(&mm->mmap_sem));

Clearing VM_MAYSHARE in __unmap_hugepage_range_final() has the same problem,
it's called from unmap_single_vma() as well as untrack_pfn().

Also clear_soft_dirty_pmd() clears VM_SOFTDIRTY under mmap_sem locked for read.

>
>> If it's a problem then I think VM_PAT could be tuned into hint which
>> means PAT tracking was here and we pat should check internal structure
>> for details and take actions if pat tracking is still here. As I see
>> pat tracking probably also have problems if somebody unmaps that vma
>> partially.
>
> IIUC, we only mark a vma with VM_PAT if whole vma is subject for
> remap_pfn_range(). I don't see a point in cleaning VM_PAT -- just let it
> die with vma. Or do I miss something?

Yep, for now track/untrack works only for whole vma and cannot handle
vma split or partial munmap.

>
> --
> Kirill A. Shutemov