LinuxLists.cc - a crash when running strace from persistent memory

2020-09-03 19:25:42

Subject: a crash when running strace from persistent memory

Hi

There's a bug when you run strace from a DAX-based filesystem.

-- create real or emulated persistent memory device (/dev/pmem0)
mkfs.ext2 /dev/pmem0
-- mount it
mount -t ext2 -o dax /dev/pmem0 /mnt/test
-- copy the system to it (well, you can copy just a few files that are
needed for running strace and ls)
cp -ax / /mnt/test
-- bind the system directories
mount --bind /dev /mnt/test/dev
mount --bind /proc /mnt/test/proc
mount --bind /sys /mnt/test/sys
-- run strace on the ls command
chroot /mnt/test/ strace /bin/ls

You get this warning and ls is killed with SIGSEGV.
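
The steps above can be collected into one script. This is only a sketch and not part of the original report: the DEV, MNT and DRY_RUN variables and the run() and repro_steps() helpers are names assumed here for illustration. It defaults to a dry run that prints the commands, because the real run needs root and formats the device.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the report.
# Assumed for illustration (not in the original mail): DEV, MNT, DRY_RUN,
# run() and repro_steps().

DEV="${DEV:-/dev/pmem0}"      # real or emulated persistent memory device
MNT="${MNT:-/mnt/test}"       # mount point, must already exist
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

repro_steps() {
    # create the filesystem and mount it with DAX enabled
    run mkfs.ext2 "$DEV" || return 1
    run mount -t ext2 -o dax "$DEV" "$MNT" || return 1
    # copy the system to it (a few files for strace and ls would be enough)
    run cp -ax / "$MNT" || return 1
    # bind the system directories
    for d in dev proc sys; do
        run mount --bind "/$d" "$MNT/$d" || return 1
    done
    # run strace on the ls command
    run chroot "$MNT/" strace /bin/ls
}

repro_steps
```

Running it as root with DRY_RUN=0 executes the commands for real; mkfs.ext2 destroys whatever is on the device, so check DEV first.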

I bisected the problem and it is caused by the commit
17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around
"COW can break either way" issue). When I revert the patch (on the kernel
5.9-rc3), the bug goes away.

Mikulas

[ 84.190961] ------------[ cut here ]------------
[ 84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6
[ 84.192398] Modules linked in: ext2 uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb fbdev ipv6 tun autofs4 binfmt_misc configfs af_packet mousedev virtio_balloon virtio_rng evdev rng_core pcspkr button raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx libcrc32c raid1 raid0 md_mod sd_mod t10_pi virtio_scsi virtio_net psmouse net_failover scsi_mod failover
[ 84.196301] CPU: 6 PID: 1350 Comm: strace Not tainted 5.9.0-rc3 #6
[ 84.197020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 84.197685] RIP: 0010:wp_page_copy.cold+0xdb/0xf6
[ 84.198231] Code: ff ff ff 0f 00 eb 8e 48 8b 3c 24 48 8b 74 24 08 ba 00 10 00 00 e8 33 87 1f 00 85 c0 74 1f 48 c7 c7 a7 2b ba 81 e8 cc b6 f0 ff <0f> 0b 48 8b 3c 24 e8 08 82 1f 00 41 be 01 00 00 00 eb ae 41 be 01
[ 84.200410] RSP: 0018:ffff88940c1dba58 EFLAGS: 00010282
[ 84.201035] RAX: 0000000000000006 RBX: ffff88940c1dbb00 RCX: 0000000000000000
[ 84.201842] RDX: 0000000000000003 RSI: ffffffff81b9c0fa RDI: 00000000ffffffff
[ 84.202650] RBP: ffffea004f0e4d80 R08: 0000000000000000 R09: 0000000000000000
[ 84.203460] R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000000
[ 84.204265] R13: ffff88940aa86318 R14: 00000000f7fac000 R15: ffff8893c8db3c40
[ 84.205083] FS: 00007fd8a8320740(0000) GS:ffff88940fb80000(0000) knlGS:0000000000000000
[ 84.206000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 84.206664] CR2: 00000000f7fac000 CR3: 00000013c93a6000 CR4: 00000000000006a0
[ 84.207481] Call Trace:
[ 84.207883] do_wp_page+0x172/0x6a0
[ 84.208285] handle_mm_fault+0xd0b/0x1540
[ 84.208753] __get_user_pages+0x21a/0x6c0
[ 84.209213] __get_user_pages_remote+0xc8/0x2a0
[ 84.209735] process_vm_rw_core.isra.0+0x1ac/0x440
[ 84.210318] ? __might_fault+0x26/0x40
[ 84.210758] ? _copy_from_user+0x6a/0xa0
[ 84.211208] ? __might_fault+0x26/0x40
[ 84.211642] ? _copy_from_user+0x6a/0xa0
[ 84.212091] process_vm_rw+0xd1/0x100
[ 84.212511] ? _copy_to_user+0x69/0x80
[ 84.212946] ? ptrace_get_syscall_info+0x9b/0x180
[ 84.213484] ? find_held_lock+0x2b/0x80
[ 84.213926] ? __x64_sys_ptrace+0x106/0x140
[ 84.214405] ? fpregs_assert_state_consistent+0x19/0x40
[ 84.215002] ? exit_to_user_mode_prepare+0x2d/0x120
[ 84.215556] __x64_sys_process_vm_readv+0x22/0x40
[ 84.216103] do_syscall_64+0x2d/0x80
[ 84.216518] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 84.217098] RIP: 0033:0x7fd8a84896da
[ 84.217512] Code: 48 8b 15 b9 f7 0b 00 f7 d8 64 89 02 b8 ff ff ff ff eb d2 e8 18 f0 00 00 0f 1f 84 00 00 00 00 00 49 89 ca b8 36 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 81 f7 0b 00 f7
[ 84.219618] RSP: 002b:00007ffd08c3c678 EFLAGS: 00000246 ORIG_RAX: 0000000000000136
[ 84.220563] RAX: ffffffffffffffda RBX: 00000000f7fac000 RCX: 00007fd8a84896da
[ 84.221380] RDX: 0000000000000001 RSI: 00007ffd08c3c680 RDI: 0000000000000549
[ 84.222194] RBP: 00007ffd08c3c760 R08: 0000000000000001 R09: 0000000000000000
[ 84.222999] R10: 00007ffd08c3c690 R11: 0000000000000246 R12: 00000000f7faca80
[ 84.223804] R13: 0000000000000580 R14: 0000000000000549 R15: 00005589a50eee80
[ 84.224612] ---[ end trace d8dbf2da5dc1b7ca ]---

2020-09-03 19:59:37

by Linus Torvalds

Subject: Re: a crash when running strace from persistent memory

On Thu, Sep 3, 2020 at 12:24 PM Mikulas Patocka <[email protected]> wrote:
>
> There's a bug when you run strace from a DAX-based filesystem.
>
> -- create real or emulated persistent memory device (/dev/pmem0)
> mkfs.ext2 /dev/pmem0
> -- mount it
> mount -t ext2 -o dax /dev/pmem0 /mnt/test
> -- copy the system to it (well, you can copy just a few files that are
> needed for running strace and ls)
> cp -ax / /mnt/test
> -- bind the system directories
> mount --bind /dev /mnt/test/dev
> mount --bind /proc /mnt/test/proc
> mount --bind /sys /mnt/test/sys
> -- run strace on the ls command
> chroot /mnt/test/ strace /bin/ls
>
> You get this warning and ls is killed with SIGSEGV.
>
> I bisected the problem and it is caused by the commit
> 17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around
> "COW can break either way" issue). When I revert the patch (on the kernel
> 5.9-rc3), the bug goes away.

Funky. I really don't see how it could cause that, but we have the
UFFD issue too, so I'm guessing I will have to fix it the radical way
with Peter Xu's series based on my "rip out COW special cases" patch.

Or maybe I'm just using that as an excuse for really wanting to apply
that series.. Because we can't just revert that GUP commit due to
security concerns.

> [ 84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6

I'm assuming this is the WARN_ON_ONCE(1) on line 2482, and you have
some extra debug patch that causes that line to be off by 4? Because
at least for me, line 2486 is actually an empty line in v5.9-rc3.

That said, I really think this is a pre-existing race, and all the
"COW can break either way" patch does is change the timing (presumably
due to the actual pattern of actually doing the COW changing).

See commit c3e5ea6ee574 ("mm: avoid data corruption on CoW fault into
PFN-mapped VMA") for background.

Mikulas, can you check that everything works ok for that case if you
apply Peter's series? See

https://lore.kernel.org/lkml/[email protected]/

or if you have 'b4' installed, use

b4 am [email protected]

to get the series..

Linus

2020-09-04 08:10:21

by Mikulas Patocka

Subject: Re: a crash when running strace from persistent memory

On Thu, 3 Sep 2020, Linus Torvalds wrote:

> On Thu, Sep 3, 2020 at 12:24 PM Mikulas Patocka <[email protected]> wrote:
> >
> > There's a bug when you run strace from a DAX-based filesystem.
> >
> > -- create real or emulated persistent memory device (/dev/pmem0)
> > mkfs.ext2 /dev/pmem0
> > -- mount it
> > mount -t ext2 -o dax /dev/pmem0 /mnt/test
> > -- copy the system to it (well, you can copy just a few files that are
> > needed for running strace and ls)
> > cp -ax / /mnt/test
> > -- bind the system directories
> > mount --bind /dev /mnt/test/dev
> > mount --bind /proc /mnt/test/proc
> > mount --bind /sys /mnt/test/sys
> > -- run strace on the ls command
> > chroot /mnt/test/ strace /bin/ls
> >
> > You get this warning and ls is killed with SIGSEGV.
> >
> > I bisected the problem and it is caused by the commit
> > 17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around
> > "COW can break either way" issue). When I revert the patch (on the kernel
> > 5.9-rc3), the bug goes away.
>
> Funky. I really don't see how it could cause that, but we have the
> UFFD issue too, so I'm guessing I will have to fix it the radical way
> with Peter Xu's series based on my "rip out COW special cases" patch.
>
> Or maybe I'm just using that as an excuse for really wanting to apply
> that series.. Because we can't just revert that GUP commit due to
> security concerns.
>
> > [ 84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6
>
> I'm assuming this is the WARN_ON_ONCE(1) on line 2482, and you have
> some extra debug patch that causes that line to be off by 4? Because
> at least for me, line 2486 is actually an empty line in v5.9-rc3.

Yes, that's it. I added a few printk to look at the control flow.

> That said, I really think this is a pre-existing race, and all the
> "COW can break either way" patch does is change the timing (presumably
> due to the actual pattern of actually doing the COW changing).
>
> See commit c3e5ea6ee574 ("mm: avoid data corruption on CoW fault into
> PFN-mapped VMA") for background.
>
> Mikulas, can you check that everything works ok for that case if you
> apply Peter's series? See
>
> https://lore.kernel.org/lkml/[email protected]/

I applied these four patches and strace works well. There is no longer any
warning or crash.

Mikulas

> or if you have 'b4' installed, use
>
> b4 am [email protected]
>
> to get the series..
>
> Linus
>

2020-09-04 17:13:09

by Linus Torvalds

Subject: Re: a crash when running strace from persistent memory

On Fri, Sep 4, 2020 at 1:08 AM Mikulas Patocka <[email protected]> wrote:
>
> I applied these four patches and strace works well. There is no longer any
> warning or crash.

Ok. I obviously approve of that series whole-heartedly, but I still
didn't want to apply it this way (and with this kind of "mid-rc"
timing).

I was hoping to just leave it for the next merge window, but there are
now two independent problems that that forced COW patch of mine
caused, and a plain revert isn't acceptable either, so I've just
applied that series to my tree despite the garbage timing.

Maybe I'm just making excuses and rationalizing because I wanted that
series anyway, and patches that remove lines in core code make me
happy, but I don't see other great alternatives.

Linus