2020-08-11 11:19:49

by John Hubbard

Subject: btrfs crash in kobject_del while running xfstest

Hi,

Here's an early warning of a possible problem.

I'm seeing a new btrfs crash when running xfstests, as of
00e4db51259a5f936fec1424b884f029479d3981 ("Merge tag
'perf-tools-2020-08-10' of
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux") in linux.git.

This doesn't crash in v5.8, so I attempted to bisect, but the bisection
landed on the net-next merge commit: commit
47ec5303d73ea344e84f46660fff693c57641386 ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next"), which
doesn't really help, since that merge touches 2088 files.

I'm attaching the .config that I used.

This is easily reproducible via something like (change to match your setup,
of course):

    sudo TEST_DEV=/dev/nvme0n1p8 TEST_DIR=/xfstest_btrfs \
      SCRATCH_DEV=/dev/nvme0n1p9 SCRATCH_MNT=/xfstest_scratch ./check \
      btrfs/002

which leads to:

[ 586.097360] BTRFS info (device nvme0n1p8): disk space caching is enabled
[ 586.103232] BTRFS info (device nvme0n1p8): has skinny extents
[ 586.115169] BTRFS info (device nvme0n1p8): enabling ssd optimizations
[ 586.308264] BTRFS: device fsid 5dfff89d-8f8d-42ac-8538-acb95164d0be devid 1 transid 5 /dev/nvme0n1p9 scanned by mkfs.btrfs (6374)
[ 586.342776] BTRFS info (device nvme0n1p9): disk space caching is enabled
[ 586.348585] BTRFS info (device nvme0n1p9): has skinny extents
[ 586.353413] BTRFS info (device nvme0n1p9): flagging fs with big metadata feature
[ 586.368129] BTRFS info (device nvme0n1p9): enabling ssd optimizations
[ 586.373996] BTRFS info (device nvme0n1p9): checking UUID tree
[ 586.387449] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 586.393485] #PF: supervisor read access in kernel mode
[ 586.397623] #PF: error_code(0x0000) - not-present page
[ 586.401763] PGD 0 P4D 0
[ 586.403219] Oops: 0000 [#1] SMP PTI
[ 586.405650] CPU: 1 PID: 6405 Comm: umount Not tainted 5.8.0-hubbard-github+ #171
[ 586.412118] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./X99-UD3P-CF, BIOS F1 02/10/2015
[ 586.421360] RIP: 0010:kobject_del+0x1/0x20
[ 586.424427] Code: 48 c7 43 18 00 00 00 00 5b 5d c3 c3 be 01 00 00 00 48 89 df e8 60 1b 00 00 eb c9 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 <48> 8b 6f 18 e8 86 ff ff ff 48 89 ef 5d e9 cd fe ff ff 66 66 2e 0f
[ 586.442644] RSP: 0018:ffffc90009ef7e08 EFLAGS: 00010246
[ 586.446914] RAX: 0000000000000000 RBX: ffff888896080000 RCX: 0000000000000006
[ 586.453149] RDX: ffff88888ee4b000 RSI: ffffffff82669a00 RDI: 0000000000000000
[ 586.459390] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 586.465631] R10: 0000000000000001 R11: 0000000000000000 R12: ffff888896080000
[ 586.471866] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 586.478106] FS: 00007f5595739c80(0000) GS:ffff88889fc40000(0000) knlGS:0000000000000000
[ 586.485325] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 586.490129] CR2: 0000000000000018 CR3: 0000000896d5a006 CR4: 00000000001706e0
[ 586.496372] Call Trace:
[ 586.497807] btrfs_sysfs_del_qgroups+0xa5/0xe0 [btrfs]
[ 586.502017] close_ctree+0x1c5/0x2b6 [btrfs]
[ 586.505307] ? fsnotify_destroy_marks+0x24/0x124
[ 586.508948] generic_shutdown_super+0x67/0x100
[ 586.512408] kill_anon_super+0x14/0x30
[ 586.515159] btrfs_kill_super+0x12/0x20 [btrfs]
[ 586.518704] deactivate_locked_super+0x36/0x90
[ 586.522159] cleanup_mnt+0x12d/0x190
[ 586.524720] task_work_run+0x5c/0xa0
[ 586.527285] exit_to_user_mode_loop+0xb9/0xc0
[ 586.530648] exit_to_user_mode_prepare+0xab/0xe0
[ 586.534276] syscall_exit_to_user_mode+0x17/0x50
[ 586.537908] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 586.541984] RIP: 0033:0x7f55959896fb
[ 586.544531] Code: 07 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 07 0c 00 f7 d8 64 89 01 48
[ 586.562775] RSP: 002b:00007fffcc431228 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 586.569485] RAX: 0000000000000000 RBX: 00007f5595ab31e4 RCX: 00007f55959896fb
[ 586.575753] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005601fb16bb80
[ 586.582020] RBP: 00005601fb16b970 R08: 0000000000000000 R09: 00007fffcc42ffa0
[ 586.588278] R10: 00005601fb16c930 R11: 0000000000000246 R12: 00005601fb16bb80
[ 586.594534] R13: 0000000000000000 R14: 00005601fb16ba68 R15: 0000000000000000
[ 586.600805] Modules linked in: xfs rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace
fscache bpfilter dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support
x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul btrfs ghash_clmulni_intel aesni_intel
blake2b_generic crypto_simd xor cryptd zstd_compress glue_helper input_leds raid6_pq libcrc32c
lpc_ich i2c_i801 mfd_core mei_me i2c_smbus mei rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser
libiscsi ib_srpt target_core_mod ib_srp ib_ipoib rdma_ucm ib_uverbs ib_umad sr_mod cdrom sd_mod
nouveau ahci libahci nvme crc32c_intel video e1000e led_class nvme_core libata t10_pi ttm mxm_wmi
wmi fuse
[ 586.661098] CR2: 0000000000000018
[ 586.663455] ---[ end trace 158f42d646f4715d ]---

A quick peek shows that this is crashing here:

void kobject_del(struct kobject *kobj)
{
    struct kobject *parent = kobj->parent;  /* <---- CRASHES HERE with NULL kobj */

    __kobject_del(kobj);
    kobject_put(parent);
}
EXPORT_SYMBOL(kobject_del);

The fault address of 0x18 is consistent with a NULL kobj being passed in: RDI, which
carries the first argument, is 0 in the register dump, 0x18 is the offset of ->parent,
and the disassembly confirms that the 0x18 offset is dereferenced right at kobject_del+0x1:

Dump of assembler code for function kobject_del:
   0xffffffff81534ec0 <+0>:     push   %rbp
   0xffffffff81534ec1 <+1>:     mov    0x18(%rdi),%rbp
   0xffffffff81534ec5 <+5>:     callq  0xffffffff81534e50 <__kobject_del>
   0xffffffff81534eca <+10>:    mov    %rbp,%rdi
   0xffffffff81534ecd <+13>:    pop    %rbp
   0xffffffff81534ece <+14>:    jmpq   0xffffffff81534da0 <kobject_put>
End of assembler dump.
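The offset arithmetic can be checked in userspace. This is a sketch under stated assumptions: `mock_list_head`, `mock_kobject` and `parent_load_address()` are hypothetical stand-ins, and the member order (`name`, then `entry`, then `parent`) is assumed to match `struct kobject` in this kernel. With 8-byte pointers, `parent` follows one pointer and one two-pointer list head, so it sits at 8 + 16 = 0x18, and a load through a NULL `kobj` (RDI = 0 in the oops) touches address 0x18, the value reported in CR2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct list_head: two pointers. */
struct mock_list_head {
    struct mock_list_head *next, *prev;
};

/*
 * Hypothetical stand-in for the leading members of struct kobject,
 * ASSUMING the order name, entry, parent. Later members are omitted
 * because they do not affect the offset of parent.
 */
struct mock_kobject {
    const char *name;
    struct mock_list_head entry;
    struct mock_kobject *parent;
};

/* On 64-bit targets, parent should sit at 8 + 16 = 0x18. */
static_assert(sizeof(void *) != 8 ||
              offsetof(struct mock_kobject, parent) == 0x18,
              "parent expected at offset 0x18 with 8-byte pointers");

/*
 * Address read by "mov 0x18(%rdi),%rbp" for a given kobj value in RDI,
 * which carries the first argument under the x86-64 calling convention.
 */
static inline uintptr_t parent_load_address(uintptr_t rdi)
{
    return rdi + offsetof(struct mock_kobject, parent);
}
```

On a 64-bit build, `parent_load_address(0)` yields 0x18, matching CR2; the `static_assert` is skipped on targets without 8-byte pointers, where the offset would differ.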

But as for how we ended up with a null kobj here, that's actually hard to see, at least
for a non-btrfs person, which is why I hoped git bisect would help more than it did here.
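Whatever the btrfs-side cause turns out to be, the kobject_del() excerpt above reads `kobj->parent` before anything checks `kobj`. The following is a minimal userspace sketch of a NULL-tolerant delete path, and it is not claimed to match the patch linked later in this thread: `demo_kobject`, `demo_kobject_unlink()`, `demo_kobject_put()` and `demo_kobject_del()` are hypothetical stand-ins for `struct kobject`, `__kobject_del()`, `kobject_put()` and `kobject_del()`, with the reference count reduced to a plain integer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kobject, reduced to what the sketch needs. */
struct demo_kobject {
    struct demo_kobject *parent;
    int refcount;   /* plain counter standing in for the kref */
    bool deleted;   /* set by the stand-in for __kobject_del() */
};

/* Stand-in for __kobject_del(): detach kobj from its hierarchy. */
static void demo_kobject_unlink(struct demo_kobject *kobj)
{
    kobj->deleted = true;
}

/* Stand-in for kobject_put(): ignores NULL, drops one reference otherwise. */
static void demo_kobject_put(struct demo_kobject *kobj)
{
    if (kobj)
        kobj->refcount--;
}

/*
 * NULL-tolerant delete: test kobj before reading kobj->parent,
 * which is the load that faulted at kobject_del+0x1.
 */
static void demo_kobject_del(struct demo_kobject *kobj)
{
    struct demo_kobject *parent;

    if (!kobj)
        return;

    parent = kobj->parent;
    demo_kobject_unlink(kobj);
    demo_kobject_put(parent);
}
```

Calling `demo_kobject_del(NULL)` simply returns, while deleting a child marks it unlinked and drops one reference on its parent. Putting the check at the top of the delete path protects every caller at once; the alternative is for each caller to skip the call when it has no object.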


thanks,
--
John Hubbard
NVIDIA


Attachments:
btrfs_crash.config (137.10 kB)

2020-08-11 11:23:43

by John Hubbard

Subject: Re: btrfs crash in kobject_del while running xfstest

Somehow the copy-paste of Chris Mason's name failed (user error
on my end), sorry about that Chris!

thanks,
--
John Hubbard
NVIDIA

2020-08-11 14:07:14

by David Sterba

Subject: Re: btrfs crash in kobject_del while running xfstest

On Tue, Aug 11, 2020 at 04:19:47AM -0700, John Hubbard wrote:
> Somehow the copy-paste of Chris Mason's name failed (user error
> on my end), sorry about that Chris!
>
> On 8/11/20 4:17 AM, John Hubbard wrote:
> > Hi,
> >
> > Here's an early warning of a possible problem.
> >
> > I'm seeing a new btrfs crash when running xfstests, as of
> > 00e4db51259a5f936fec1424b884f029479d3981 ("Merge tag
> > 'perf-tools-2020-08-10' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux") in linux.git.
> >
> > This doesn't crash in v5.8, so I attempted to bisect, but ended up with
> > the net-next merge commit as the offending one: commit
> > 47ec5303d73ea344e84f46660fff693c57641386 ("Merge
> > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next"), which
> > doesn't really help because it's 2088 files changed, of course.

Thanks for the report. It's already known, and a patch is on the way to
Linus' tree (ETA before rc1). You can apply
https://lore.kernel.org/linux-btrfs/[email protected]/
locally.