2024-04-14 22:32:25

by Kent Overstreet

[permalink] [raw]
Subject: debugfs revoke broken?

I recently started seeing test failures like the following; from the
"shutdown complete" line we're well after the point where we called
debugfs_remove_recursive() - yet from the backtrace we were still able
to call into debugfs.

And I see from the history the remove path has been getting tweaked,
so...

00091 ========= TEST generic/001
00091
00092 Setting up swapspace version 1, size = 2 GiB (2147479552 bytes)
00092 no label, UUID=73a80295-2b03-4512-aae1-785187926ce3
00092 Adding 2097148k swap on /dev/vde. Priority:-2 extents:1 across:2097148k
00092 configuration error - unknown item 'NONEXISTENT' (notify administrator)
00092 configuration error - unknown item 'PREVENT_NO_AUTH' (notify administrator)
00094 configuration error - unknown item 'NONEXISTENT' (notify administrator)
00094 configuration error - unknown item 'PREVENT_NO_AUTH' (notify administrator)
00094 configuration error - unknown item 'NONEXISTENT' (notify administrator)
00094 configuration error - unknown item 'PREVENT_NO_AUTH' (notify administrator)
00101 building 001... done
00101 bcachefs (vdb): mounting version 1.7: mi_btree_bitmap
00101 bcachefs (vdb): initializing new filesystem
00101 bcachefs (vdb): going read-write
00101 bcachefs (vdb): marking superblocks
00101 bcachefs (vdb): initializing freespace
00101 bcachefs (vdb): done initializing freespace
00101 bcachefs (vdb): reading snapshots table
00101 bcachefs (vdb): reading snapshots done
00101 bcachefs (vdb): done starting filesystem
00102 FSTYP -- bcachefs
00102 PLATFORM -- Linux/aarch64 Debian-1103-bullseye-arm64-base-kvm 6.9.0-rc2-ktest-g2719f811ae24 #18142 SMP Sun Apr 14 16:26:05 NZST 2024
00102 MKFS_OPTIONS -- --encrypted --no_passphrase /dev/vdc
00102 MOUNT_OPTIONS -- /dev/vdc /mnt/scratch
00102
00102 bcachefs (vdc): mounting version 1.7: mi_btree_bitmap
00102 bcachefs (vdc): initializing new filesystem
00102 bcachefs (vdc): going read-write
00102 bcachefs (vdc): marking superblocks
00102 bcachefs (vdc): initializing freespace
00102 bcachefs (vdc): done initializing freespace
00102 bcachefs (vdc): reading snapshots table
00102 bcachefs (vdc): reading snapshots done
00102 bcachefs (vdc): done starting filesystem
00102 bcachefs (vdc): shutting down
00102 bcachefs (vdc): going read-only
00102 bcachefs (vdc): finished waiting for writes to stop
00102 bcachefs (vdc): flushing journal and stopping allocators, journal seq 3
00102 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 5
00102 bcachefs (vdc): shutdown complete, journal seq 6
00102 bcachefs (vdc): marking filesystem clean
00102 bcachefs (vdc): shutdown complete
00102 bcachefs (vdb): shutting down
00102 bcachefs (vdb): going read-only
00102 bcachefs (vdb): finished waiting for writes to stop
00102 bcachefs (vdb): flushing journal and stopping allocators, journal seq 6
00102 bcachefs (vdb): flushing journal and stopping allocators complete, journal seq 7
00102 bcachefs (vdb): shutdown complete, journal seq 8
00102 bcachefs (vdb): marking filesystem clean
00102 bcachefs (vdb): shutdown complete
00102 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
00102 Mem abort info:
00102 ESR = 0x0000000096000004
00102 EC = 0x25: DABT (current EL), IL = 32 bits
00102 SET = 0, FnV = 0
00102 EA = 0, S1PTW = 0
00102 FSC = 0x04: level 0 translation fault
00102 Data abort info:
00102 ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
00102 CM = 0, WnR = 0, TnD = 0, TagAccess = 0
00102 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
00102 user pgtable: 4k pages, 48-bit VAs, pgdp=000000011585c000
00102 [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
00102 Internal error: Oops: 0000000096000004 [#1] SMP
00102 Modules linked in:
00102 CPU: 7 PID: 1805 Comm: cat Not tainted 6.9.0-rc2-ktest-g2719f811ae24 #18142
00102 Hardware name: linux,dummy-virt (DT)
00102 pstate: 00001005 (nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00102 pc : bch2_journal_seq_pins_to_text+0x100/0x208
00102 lr : bch2_journal_seq_pins_to_text+0xf0/0x208
00102 sp : ffff0000d6dd3c80
00102 x29: ffff0000d6dd3c80 x28: ffff0000ca361f00 x27: 0000000000000000
00102 x26: 0000000000000000 x25: ffff0000da0002c0 x24: ffff0000da0002f0
00102 x23: ffff0000d50668c0 x22: ffff800080998950 x21: ffff0000da0002c0
00102 x20: ffff0000c46165c0 x19: 0000000000000000 x18: 00000000fffffffe
00102 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
00102 x14: ffffffffffffffff x13: ffff0000c0ada1df x12: ffff0000c0ada1d9
00102 x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080400ec8
00102 x8 : 0000000000000000 x7 : 20746e756f63203a x6 : 0000000000000000
00102 x5 : 0000000000000020 x4 : 000000000000000d x3 : ffff0000c0ada1d0
00102 x2 : 0000000000000010 x1 : ffff0000c0ada1d0 x0 : 0000000000000012
00102 Call trace:
00102 bch2_journal_seq_pins_to_text+0x100/0x208
00102 bch2_journal_pins_read+0x48/0xd0
00102 full_proxy_read+0x64/0xb8
00102 vfs_read+0xd0/0x2d0
00102 ksys_read+0x5c/0xe0
00102 __arm64_sys_read+0x20/0x30
00102 invoke_syscall.constprop.0+0x50/0xe0
00102 do_el0_svc+0x44/0xc8
00102 el0_svc+0x18/0x58
00102 el0t_64_sync_handler+0xb8/0xc0
00102 el0t_64_sync+0x14c/0x150
00102 Code: f94002b3 eb15027f 54000180 d503201f (f9400a63)
00102 ---[ end trace 0000000000000000 ]---
00102 Kernel panic - not syncing: Oops: Fatal exception
00102 SMP: stopping secondary CPUs
00102 Kernel Offset: disabled
00102 CPU features: 0x0,00000003,80000008,4240500b
00102 Memory Limit: none
00102 ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
00107 ========= FAILED TIMEOUT generic.001 in 1200s



2024-04-15 08:48:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: debugfs revoke broken?

On Sun, Apr 14, 2024 at 06:32:07PM -0400, Kent Overstreet wrote:
> I recently started seeing test failures like the following; from the
> "shutdown complete" line we're well after the point where we called
> debugfs_remove_recursive() - yet from the backtrace we were still able
> to call into debugfs.
>
> And I see from the history the remove path has been getting tweaked,
> so...
>
> 00091 ========= TEST generic/001
> 00091
> 00092 Setting up swapspace version 1, size = 2 GiB (2147479552 bytes)
> 00092 no label, UUID=73a80295-2b03-4512-aae1-785187926ce3
> 00092 Adding 2097148k swap on /dev/vde. Priority:-2 extents:1 across:2097148k
> 00092 configuration error - unknown item 'NONEXISTENT' (notify administrator)
> 00092 configuration error - unknown item 'PREVENT_NO_AUTH' (notify administrator)
> 00094 configuration error - unknown item 'NONEXISTENT' (notify administrator)
> 00094 configuration error - unknown item 'PREVENT_NO_AUTH' (notify administrator)
> 00094 configuration error - unknown item 'NONEXISTENT' (notify administrator)
> 00094 configuration error - unknown item 'PREVENT_NO_AUTH' (notify administrator)
> 00101 building 001... done
> 00101 bcachefs (vdb): mounting version 1.7: mi_btree_bitmap
> 00101 bcachefs (vdb): initializing new filesystem
> 00101 bcachefs (vdb): going read-write
> 00101 bcachefs (vdb): marking superblocks
> 00101 bcachefs (vdb): initializing freespace
> 00101 bcachefs (vdb): done initializing freespace
> 00101 bcachefs (vdb): reading snapshots table
> 00101 bcachefs (vdb): reading snapshots done
> 00101 bcachefs (vdb): done starting filesystem
> 00102 FSTYP -- bcachefs
> 00102 PLATFORM -- Linux/aarch64 Debian-1103-bullseye-arm64-base-kvm 6.9.0-rc2-ktest-g2719f811ae24 #18142 SMP Sun Apr 14 16:26:05 NZST 2024
> 00102 MKFS_OPTIONS -- --encrypted --no_passphrase /dev/vdc
> 00102 MOUNT_OPTIONS -- /dev/vdc /mnt/scratch
> 00102
> 00102 bcachefs (vdc): mounting version 1.7: mi_btree_bitmap
> 00102 bcachefs (vdc): initializing new filesystem
> 00102 bcachefs (vdc): going read-write
> 00102 bcachefs (vdc): marking superblocks
> 00102 bcachefs (vdc): initializing freespace
> 00102 bcachefs (vdc): done initializing freespace
> 00102 bcachefs (vdc): reading snapshots table
> 00102 bcachefs (vdc): reading snapshots done
> 00102 bcachefs (vdc): done starting filesystem
> 00102 bcachefs (vdc): shutting down
> 00102 bcachefs (vdc): going read-only
> 00102 bcachefs (vdc): finished waiting for writes to stop
> 00102 bcachefs (vdc): flushing journal and stopping allocators, journal seq 3
> 00102 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 5
> 00102 bcachefs (vdc): shutdown complete, journal seq 6
> 00102 bcachefs (vdc): marking filesystem clean
> 00102 bcachefs (vdc): shutdown complete
> 00102 bcachefs (vdb): shutting down
> 00102 bcachefs (vdb): going read-only
> 00102 bcachefs (vdb): finished waiting for writes to stop
> 00102 bcachefs (vdb): flushing journal and stopping allocators, journal seq 6
> 00102 bcachefs (vdb): flushing journal and stopping allocators complete, journal seq 7
> 00102 bcachefs (vdb): shutdown complete, journal seq 8
> 00102 bcachefs (vdb): marking filesystem clean
> 00102 bcachefs (vdb): shutdown complete
> 00102 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> 00102 Mem abort info:
> 00102 ESR = 0x0000000096000004
> 00102 EC = 0x25: DABT (current EL), IL = 32 bits
> 00102 SET = 0, FnV = 0
> 00102 EA = 0, S1PTW = 0
> 00102 FSC = 0x04: level 0 translation fault
> 00102 Data abort info:
> 00102 ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> 00102 CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> 00102 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> 00102 user pgtable: 4k pages, 48-bit VAs, pgdp=000000011585c000
> 00102 [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
> 00102 Internal error: Oops: 0000000096000004 [#1] SMP
> 00102 Modules linked in:
> 00102 CPU: 7 PID: 1805 Comm: cat Not tainted 6.9.0-rc2-ktest-g2719f811ae24 #18142
> 00102 Hardware name: linux,dummy-virt (DT)
> 00102 pstate: 00001005 (nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
> 00102 pc : bch2_journal_seq_pins_to_text+0x100/0x208
> 00102 lr : bch2_journal_seq_pins_to_text+0xf0/0x208
> 00102 sp : ffff0000d6dd3c80
> 00102 x29: ffff0000d6dd3c80 x28: ffff0000ca361f00 x27: 0000000000000000
> 00102 x26: 0000000000000000 x25: ffff0000da0002c0 x24: ffff0000da0002f0
> 00102 x23: ffff0000d50668c0 x22: ffff800080998950 x21: ffff0000da0002c0
> 00102 x20: ffff0000c46165c0 x19: 0000000000000000 x18: 00000000fffffffe
> 00102 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> 00102 x14: ffffffffffffffff x13: ffff0000c0ada1df x12: ffff0000c0ada1d9
> 00102 x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080400ec8
> 00102 x8 : 0000000000000000 x7 : 20746e756f63203a x6 : 0000000000000000
> 00102 x5 : 0000000000000020 x4 : 000000000000000d x3 : ffff0000c0ada1d0
> 00102 x2 : 0000000000000010 x1 : ffff0000c0ada1d0 x0 : 0000000000000012
> 00102 Call trace:
> 00102 bch2_journal_seq_pins_to_text+0x100/0x208
> 00102 bch2_journal_pins_read+0x48/0xd0
> 00102 full_proxy_read+0x64/0xb8
> 00102 vfs_read+0xd0/0x2d0
> 00102 ksys_read+0x5c/0xe0
> 00102 __arm64_sys_read+0x20/0x30
> 00102 invoke_syscall.constprop.0+0x50/0xe0
> 00102 do_el0_svc+0x44/0xc8
> 00102 el0_svc+0x18/0x58
> 00102 el0t_64_sync_handler+0xb8/0xc0
> 00102 el0t_64_sync+0x14c/0x150

So this file was open when debugfs_remove() was called?

Any chance you can bisect? We just fixed some issues here in
952c3fce297f ("debugfs: fix wait/cancellation handling during remove")
that I thought should have handled this. If you revert that commit,
does things work again? And/or what about commit 8c88a474357e
("debugfs: add API to allow debugfs operations cancellation")? Maybe we
need to go back to not having completions at all in the debugfs remove
path and just live with waiting for the files to be removed before
continuing (which should at the least, resolve the issue you see here,
while slowing things down a bit.)

thanks,

greg k-h

2024-04-15 18:02:49

by Kent Overstreet

[permalink] [raw]
Subject: Re: debugfs revoke broken?

On Mon, Apr 15, 2024 at 10:47:56AM +0200, Greg Kroah-Hartman wrote:
> So this file was open when debugfs_remove() was called?

Seems likely.

> Any chance you can bisect? We just fixed some issues here in
> 952c3fce297f ("debugfs: fix wait/cancellation handling during remove")
> that I thought should have handled this. If you revert that commit,
> does things work again? And/or what about commit 8c88a474357e
> ("debugfs: add API to allow debugfs operations cancellation")? Maybe we
> need to go back to not having completions at all in the debugfs remove
> path and just live with waiting for the files to be removed before
> continuing (which should at the least, resolve the issue you see here,
> while slowing things down a bit.)

No, this doesn't occur often enough to bisect, unfortunately. All my
tests bang on sysfs/debugfs in the background to torture test those
codepaths, and I've seen 2 occurrences out of thousands of test runs -
it's rare.