2022-04-05 00:32:14

by Sachin Sant

Subject: [powerpc]Kernel crash while running xfstests (generic/250) [next-20220404]

While running xfstests (with ext4 or XFS as the filesystem) on a Power10 LPAR booted with today’s
linux-next (5.18.0-rc1-next-20220404), the following crash is seen.

[ 51.260209] XFS (dm-0): Unmounting Filesystem
[ 51.262949] XFS (dm-0): Mounting V5 Filesystem
[ 51.270524] XFS (dm-0): Ending clean mount
[ 51.272641] xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff)
[ 51.377505] XFS (dm-0): Unmounting Filesystem
[ 51.397584] BUG: Unable to handle kernel data access at 0x5deadbeef0000122
[ 51.397591] Faulting instruction address: 0xc0000000001561bc
[ 51.397595] Oops: Kernel access of bad area, sig: 11 [#1]
[ 51.397598] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 51.397602] Modules linked in: xfs dm_mod ip_set rfkill nf_tables bonding libcrc32c nfnetlink sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ext4 mbcache jbd2 sd_mod t10_pi crc64_rocksoft crc64 sg ibmvscsi ibmveth scsi_transport_srp fuse
[ 51.397626] CPU: 3 PID: 3448 Comm: dmsetup Not tainted 5.18.0-rc1-next-20220404 #16
[ 51.397630] NIP: c0000000001561bc LR: c0000000001560e8 CTR: c000000000672ef0
[ 51.397633] REGS: c000000095c9b610 TRAP: 0380 Not tainted (5.18.0-rc1-next-20220404)
[ 51.397636] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24024824 XER: 00000000
[ 51.397646] CFAR: c0000000001560f0 IRQMASK: 0
[ 51.397646] GPR00: c0000000001560e8 c000000095c9b8b0 c000000002a03800 0000000000000000
[ 51.397646] GPR04: c000000017a1ab78 0000000000000000 c00000002cab6ac0 c000000093e73900
[ 51.397646] GPR08: c000000093e73900 5deadbeef0000100 5deadbeef0000122 c008000001b5a4e8
[ 51.397646] GPR12: c000000000672ef0 c000000abfff8e80 000000013dbd0b60 00007fff849e9da8
[ 51.397646] GPR16: 00007fff849e9da8 00007fff849e9da8 00007fff84a23670 0000000000000000
[ 51.397646] GPR20: 00007fff849f3388 00007fff84a22040 000000013dbd0b90 0000000000000131
[ 51.397646] GPR24: c00000000254d768 ffffffffffff0000 c00000000254d730 c000000027668e00
[ 51.397646] GPR28: c0000000029b0170 c000000017a1ab78 0000000000000017 0000000000000000
[ 51.397684] NIP [c0000000001561bc] __cpuhp_state_remove_instance+0x19c/0x2c0
[ 51.397692] LR [c0000000001560e8] __cpuhp_state_remove_instance+0xc8/0x2c0
[ 51.397697] Call Trace:
[ 51.397698] [c000000095c9b8b0] [c0000000001560e8] __cpuhp_state_remove_instance+0xc8/0x2c0 (unreliable)
[ 51.397705] [c000000095c9b920] [c000000000672f4c] bioset_exit+0x5c/0x280
[ 51.397709] [c000000095c9b9c0] [c008000001b433f4] cleanup_mapped_device+0x4c/0x1a0 [dm_mod]
[ 51.397721] [c000000095c9ba00] [c008000001b436f0] __dm_destroy+0x1a8/0x360 [dm_mod]
[ 51.397730] [c000000095c9baa0] [c008000001b50e90] dev_remove+0x1a8/0x280 [dm_mod]
[ 51.397740] [c000000095c9bb30] [c008000001b5115c] ctl_ioctl+0x1f4/0x7c0 [dm_mod]
[ 51.397750] [c000000095c9bd40] [c008000001b51748] dm_ctl_ioctl+0x20/0x40 [dm_mod]
[ 51.397759] [c000000095c9bd60] [c0000000004b1f68] sys_ioctl+0xf8/0x150
[ 51.397763] [c000000095c9bdb0] [c00000000003373c] system_call_exception+0x18c/0x390
[ 51.397767] [c000000095c9be10] [c00000000000c64c] system_call_common+0xec/0x270
[ 51.397772] --- interrupt: c00 at 0x7fff84329210
[ 51.397776] NIP: 00007fff84329210 LR: 00007fff849e6824 CTR: 0000000000000000
[ 51.397780] REGS: c000000095c9be80 TRAP: 0c00 Not tainted (5.18.0-rc1-next-20220404)
[ 51.397785] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 24004484 XER: 00000000
[ 51.397795] IRQMASK: 0
[ 51.397795] GPR00: 0000000000000036 00007ffffdb43030 00007fff84407300 0000000000000003
[ 51.397795] GPR04: 00000000c138fd04 000000013dbd0b60 0000000000000004 00007fff849f3f98
[ 51.397795] GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000
[ 51.397795] GPR12: 0000000000000000 00007fff84acfa80 000000013dbd0b60 00007fff849e9da8
[ 51.397795] GPR16: 00007fff849e9da8 00007fff849e9da8 00007fff84a23670 0000000000000000
[ 51.397795] GPR20: 00007fff849f3388 00007fff84a22040 000000013dbd0b90 000000013dbd02e0
[ 51.397795] GPR24: 00007fff849e9da8 00007fff849e9da8 00007fff849e9da8 00007fff849e9da8
[ 51.397795] GPR28: 0000000000000001 00007fff849e9da8 0000000000000000 00007fff849e9da8
[ 51.397829] NIP [00007fff84329210] 0x7fff84329210
[ 51.397831] LR [00007fff849e6824] 0x7fff849e6824
[ 51.397834] --- interrupt: c00
[ 51.397835] Instruction dump:
[ 51.397838] 60000000 7f69db78 7f83e040 7c7f07b4 7bea1f24 419cffb4 eae10028 eb210038
[ 51.397844] eb610048 e93d0000 e95d0008 2fa90000 <f92a0000> 419e0008 f9490008 3d405dea
[ 51.397850] ---[ end trace 0000000000000000 ]---
[ 51.400133]
[ 52.400136] Kernel panic - not syncing: Fatal exception
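
The faulting address 0x5deadbeef0000122 looks like one of the kernel's list-poison values, and GPR09/GPR10 hold 0x5deadbeef0000100 and 0x5deadbeef0000122. A minimal Python sketch for classifying such an address follows. It assumes the LIST_POISON1/LIST_POISON2 offsets from include/linux/poison.h and the ppc64 default of CONFIG_ILLEGAL_POINTER_VALUE (0x5deadbeef0000000); both should be checked against the kernel actually under test. The helper name classify_poison is hypothetical.

```python
# Assumed values: offsets as in include/linux/poison.h, delta from the ppc64
# default of CONFIG_ILLEGAL_POINTER_VALUE. Verify against the kernel's .config.
POISON_POINTER_DELTA = 0x5DEADBEEF0000000
LIST_POISON1 = 0x100 + POISON_POINTER_DELTA  # stored in node->next on delete
LIST_POISON2 = 0x122 + POISON_POINTER_DELTA  # stored in node->pprev on delete


def classify_poison(addr: int) -> str:
    """Label a faulting address (hypothetical helper, not a kernel API)."""
    if addr == LIST_POISON1:
        return "LIST_POISON1"
    if addr == LIST_POISON2:
        return "LIST_POISON2"
    return "not a list poison value"


if __name__ == "__main__":
    # Faulting address and the GPR09 value taken from the oops above.
    for value in (0x5DEADBEEF0000122, 0x5DEADBEEF0000100):
        print(hex(value), "->", classify_poison(value))
```

If the address does decode as LIST_POISON2, one plausible reading is that an hlist node that had already been unlinked was unlinked a second time. That is an interpretation of the register contents, not something the log alone proves.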

This problem was possibly introduced with 5.17.0-next-20220330.
Git bisect points to the following commit:
commit 1d158814db8e7b3cbca0f2c8d9242fbec4fbc57e
dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset

-Sachin


2022-04-07 07:58:29

by Sachin Sant

Subject: Re: [powerpc]Kernel crash while running xfstests (generic/250) [next-20220404]


> On 04-Apr-2022, at 5:04 PM, Sachin Sant <[email protected]> wrote:
>
> While running xfstests(ext4 or XFS as fs) on a Power10 LPAR booted with today’s
> next (5.18.0-rc1-next-20220404) following crash is seen.
>
> This problem was possibly introduced with 5.17.0-next-20220330.
> Git bisect leads me to following patch
> commit 1d158814db8e7b3cbca0f2c8d9242fbec4fbc57e
> dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset
>

I continue to see this problem with the latest linux-next (5.18.0-rc1-next-20220406).

[ 2388.091152] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: none.
[ 2388.091173] ext4 filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff)
[ 2388.287138] BUG: Unable to handle kernel data access at 0x5deadbeef0000122
[ 2388.287154] Faulting instruction address: 0xc000000000154a6c
[ 2388.287160] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2388.287164] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 2388.287172] Modules linked in: xfs dm_flakey dm_snapshot dm_bufio dm_zero loop dm_mod ip_set bonding rfkill nf_tables libcrc32c nfnetlink sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ext4 mbcache jbd2 lpfc nvmet_fc nvmet sr_mod sd_mod cdrom nvme_fc sg nvme nvme_fabrics tg3 nvme_core ptp ibmvscsi t10_pi crc64_rocksoft ibmveth scsi_transport_srp scsi_transport_fc pps_core crc64 ipmi_devintf ipmi_msghandler fuse [last unloaded: scsi_debug]
[ 2388.287236] CPU: 16 PID: 1043652 Comm: dmsetup Not tainted 5.18.0-rc1-next-20220406 #1
[ 2388.287244] NIP: c000000000154a6c LR: c000000000154998 CTR: c000000000674690
[ 2388.287249] REGS: c000000145fb3610 TRAP: 0380 Not tainted (5.18.0-rc1-next-20220406)
[ 2388.287255] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28024824 XER: 00000000
[ 2388.287271] CFAR: c0000000001549a0 IRQMASK: 0
[ 2388.287271] GPR00: c000000000154998 c000000145fb38b0 c000000002a1f400 0000000000000000
[ 2388.287271] GPR04: c00000004aa2c378 0000000000000000 c000000048fdf060 c00000015387b600
[ 2388.287271] GPR08: c00000015387b600 5deadbeef0000100 5deadbeef0000122 c00800000988a4e8
[ 2388.287271] GPR12: c000000000674690 c00000001ec28a80 0000010014bf0b40 00007fff9f1b9da8
[ 2388.287271] GPR16: 00007fff9f1b9da8 00007fff9f1b9da8 00007fff9f1f3670 0000000000000000
[ 2388.287271] GPR20: 00007fff9f1c3388 00007fff9f1f2040 0000010014bf0b70 0000000000000131
[ 2388.287271] GPR24: c00000000254d768 ffffffffffff0000 c00000000254d730 c0000000f5103a00
[ 2388.287271] GPR28: c0000000029b0570 c00000004aa2c378 0000000000000017 0000000000000000
[ 2388.287332] NIP [c000000000154a6c] __cpuhp_state_remove_instance+0x19c/0x2c0
[ 2388.287344] LR [c000000000154998] __cpuhp_state_remove_instance+0xc8/0x2c0
[ 2388.287351] Call Trace:
[ 2388.287353] [c000000145fb38b0] [c000000000154998] __cpuhp_state_remove_instance+0xc8/0x2c0 (unreliable)
[ 2388.287362] [c000000145fb3920] [c0000000006746ec] bioset_exit+0x5c/0x280
[ 2388.287369] [c000000145fb39c0] [c0080000098733f4] cleanup_mapped_device+0x4c/0x1a0 [dm_mod]
[ 2388.287385] [c000000145fb3a00] [c0080000098736f0] __dm_destroy+0x1a8/0x360 [dm_mod]
[ 2388.287397] [c000000145fb3aa0] [c008000009880e90] dev_remove+0x1a8/0x280 [dm_mod]
[ 2388.287409] [c000000145fb3b30] [c00800000988115c] ctl_ioctl+0x1f4/0x7c0 [dm_mod]
[ 2388.287422] [c000000145fb3d40] [c008000009881748] dm_ctl_ioctl+0x20/0x40 [dm_mod]
[ 2388.287434] [c000000145fb3d60] [c0000000004b2c08] sys_ioctl+0xf8/0x150
[ 2388.287441] [c000000145fb3db0] [c0000000000324e8] system_call_exception+0x178/0x380
[ 2388.287449] [c000000145fb3e10] [c00000000000c64c] system_call_common+0xec/0x250
[ 2388.287457] --- interrupt: c00 at 0x7fff9ec991a0
[ 2388.287461] NIP: 00007fff9ec991a0 LR: 00007fff9f1b6824 CTR: 0000000000000000
[ 2388.287466] REGS: c000000145fb3e80 TRAP: 0c00 Not tainted (5.18.0-rc1-next-20220406)
[ 2388.287471] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28004484 XER: 00000000
[ 2388.287486] IRQMASK: 0
[ 2388.287486] GPR00: 0000000000000036 00007fffe5635be0 00007fff9ed77300 0000000000000003
[ 2388.287486] GPR04: 00000000c138fd04 0000010014bf0b40 0000000000000004 00007fff9f1c3f98
[ 2388.287486] GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000
[ 2388.287486] GPR12: 0000000000000000 00007fff9f29fa00 0000010014bf0b40 00007fff9f1b9da8
[ 2388.287486] GPR16: 00007fff9f1b9da8 00007fff9f1b9da8 00007fff9f1f3670 0000000000000000
[ 2388.287486] GPR20: 00007fff9f1c3388 00007fff9f1f2040 0000010014bf0b70 0000010014bf0940
[ 2388.287486] GPR24: 00007fff9f1b9da8 00007fff9f1b9da8 00007fff9f1b9da8 00007fff9f1b9da8
[ 2388.287486] GPR28: 0000000000000001 00007fff9f1b9da8 0000000000000000 00007fff9f1b9da8
[ 2388.287543] NIP [00007fff9ec991a0] 0x7fff9ec991a0
[ 2388.287547] LR [00007fff9f1b6824] 0x7fff9f1b6824
[ 2388.287551] --- interrupt: c00
[ 2388.287554] Instruction dump:
[ 2388.287558] 60000000 7f69db78 7f83e040 7c7f07b4 7bea1f24 419cffb4 eae10028 eb210038
[ 2388.287569] eb610048 e93d0000 e95d0008 2fa90000 <f92a0000> 419e0008 f9490008 3d405dea
[ 2388.287581] ---[ end trace 0000000000000000 ]---
[ 2388.403785]
[ 2389.403791] Kernel panic - not syncing: Fatal exception
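
In this trace, too, GPR09 and GPR10 hold 0x5deadbeef0000100 and 0x5deadbeef0000122, which resemble the poison values a deleted hlist node carries in its next and pprev fields (assuming the standard include/linux/poison.h definitions). The Python model below is not kernel code; it only sketches how unlinking the same node twice would end in a store through the poisoned pprev. All class and function names are illustrative, and the explicit check that raises stands in for the point where the kernel would take a data access fault.

```python
LIST_POISON1 = "LIST_POISON1"  # stand-in for the poisoned next pointer
LIST_POISON2 = "LIST_POISON2"  # stand-in for the poisoned pprev pointer


class Head:
    def __init__(self):
        self.first = None


class Node:
    def __init__(self, name):
        self.name = name
        self.next = None
        # (owner, attribute) pair modelling a pointer to the previous "next" slot
        self.pprev = None


def hlist_add_head(node, head):
    """Insert node at the front of the list."""
    first = head.first
    node.next = first
    if first is not None:
        first.pprev = (node, "next")
    head.first = node
    node.pprev = (head, "first")


def hlist_del(node):
    """Unlink node, then poison its pointers."""
    if node.pprev == LIST_POISON2:
        # The kernel has no such check here; it would store through the
        # poison address and fault instead.
        raise RuntimeError("store through LIST_POISON2: node already deleted")
    owner, attr = node.pprev
    setattr(owner, attr, node.next)      # *pprev = next
    if node.next is not None:
        node.next.pprev = node.pprev     # next->pprev = pprev
    node.next = LIST_POISON1
    node.pprev = LIST_POISON2


if __name__ == "__main__":
    head, node = Head(), Node("instance")
    hlist_add_head(node, head)
    hlist_del(node)                      # first removal succeeds
    try:
        hlist_del(node)                  # second removal hits the poison
    except RuntimeError as exc:
        print("second delete failed:", exc)
```

Whether the dm teardown path really removes the same hotplug instance twice, or reaches this state some other way, is not established by the trace.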

Let me know if any additional information is required.

-Sachin

2022-05-10 09:35:46

by Sachin Sant

Subject: Re: [powerpc]Kernel crash while running xfstests (generic/250) [next-20220404]



> On 07-Apr-2022, at 10:19 AM, Sachin Sant <[email protected]> wrote:
>
>
>> On 04-Apr-2022, at 5:04 PM, Sachin Sant <[email protected]> wrote:
>>
>> While running xfstests(ext4 or XFS as fs) on a Power10 LPAR booted with today’s
>> next (5.18.0-rc1-next-20220404) following crash is seen.
>>
>> This problem was possibly introduced with 5.17.0-next-20220330.
>> Git bisect leads me to following patch
>> commit 1d158814db8e7b3cbca0f2c8d9242fbec4fbc57e
>> dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset
>>
>
> Continue to see this problem with latest next.

I can still recreate this issue against the latest linux-next build (5.18.0-rc6-next-20220509).

[ 1536.883400] Buffer I/O error on dev dm-0, logical block 10485497, async page read
[ 1536.936018] XFS (dm-0): Unmounting Filesystem
[ 1536.938849] XFS (dm-0): Mounting V5 Filesystem
[ 1536.946007] XFS (dm-0): Ending clean mount
[ 1536.947926] xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff)
[ 1537.052850] XFS (dm-0): Unmounting Filesystem
[ 1537.083979] BUG: Unable to handle kernel data access at 0x5deadbeef0000122
[ 1537.083982] Faulting instruction address: 0xc00000000015b0bc
[ 1537.083984] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1537.084000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 1537.084006] Modules linked in: dm_snapshot(E) dm_bufio(E) loop(E) dm_flakey(E) xfs(E) dm_mod(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) ip_set(E) nf_tables(E) bonding(E) tls(E) libcrc32c(E) nfnetlink(E) sunrpc(E) nd_pmem(E) nd_btt(E) dax_pmem(E) pseries_rng(E) papr_scm(E) libnvdimm(E) vmx_crypto(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) fuse(E) [last unloaded: scsi_debug]
[ 1537.084056] CPU: 10 PID: 970489 Comm: dmsetup Tainted: G E 5.18.0-rc6-next-20220509 #2
[ 1537.084061] NIP: c00000000015b0bc LR: c00000000015afe8 CTR: c000000000753bb0
[ 1537.084064] REGS: c0000000211fb610 TRAP: 0380 Tainted: G E (5.18.0-rc6-next-20220509)
[ 1537.084068] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24024824 XER: 20040000
[ 1537.084078] CFAR: c00000000015aff0 IRQMASK: 0
[ 1537.084078] GPR00: c00000000015afe8 c0000000211fb8b0 c000000002a7cf00 0000000000000000
[ 1537.084078] GPR04: c0000000f98a1378 0000000000000000 c0000000f5043b50 c00000043463e280
[ 1537.084078] GPR08: c00000043463e280 5deadbeef0000100 5deadbeef0000122 c00800000214dcb0
[ 1537.084078] GPR12: c000000000753bb0 c000000abfff1700 0000000155ee0b60 00007fffa7c29da8
[ 1537.084078] GPR16: 00007fffa7c29da8 00007fffa7c29da8 00007fffa7c63670 0000000000000000
[ 1537.084078] GPR20: 00007fffa7c33388 00007fffa7c62040 0000000155ee0b90 0000000000000131
[ 1537.084078] GPR24: c0000000025adb68 ffffffffffffffff c0000000025adb30 c000000103a5e000
[ 1537.084078] GPR28: c000000002a23ce8 c0000000f98a1378 0000000000000017 0000000000000000
[ 1537.084117] NIP [c00000000015b0bc] __cpuhp_state_remove_instance+0x19c/0x2c0
[ 1537.084125] LR [c00000000015afe8] __cpuhp_state_remove_instance+0xc8/0x2c0
[ 1537.084130] Call Trace:
[ 1537.084131] [c0000000211fb8b0] [c00000000015afe8] __cpuhp_state_remove_instance+0xc8/0x2c0 (unreliable)
[ 1537.084138] [c0000000211fb920] [c000000000753c14] bioset_exit+0x64/0x280
[ 1537.084144] [c0000000211fb9c0] [c008000002137744] cleanup_mapped_device+0x4c/0x1c0 [dm_mod]
[ 1537.084155] [c0000000211fba00] [c008000002137a60] __dm_destroy+0x1a8/0x360 [dm_mod]
[ 1537.084163] [c0000000211fbaa0] [c0080000021445c0] dev_remove+0x1b8/0x290 [dm_mod]
[ 1537.084172] [c0000000211fbb30] [c00800000214488c] ctl_ioctl+0x1f4/0x7d0 [dm_mod]
[ 1537.084181] [c0000000211fbd40] [c008000002144e88] dm_ctl_ioctl+0x20/0x40 [dm_mod]
[ 1537.084190] [c0000000211fbd60] [c00000000055ff28] sys_ioctl+0xf8/0x190
[ 1537.084195] [c0000000211fbdb0] [c00000000003377c] system_call_exception+0x17c/0x350
[ 1537.084200] [c0000000211fbe10] [c00000000000c54c] system_call_common+0xec/0x270
[ 1537.084205] --- interrupt: c00 at 0x7fffa7529210
[ 1537.084208] NIP: 00007fffa7529210 LR: 00007fffa7c26824 CTR: 0000000000000000
[ 1537.084211] REGS: c0000000211fbe80 TRAP: 0c00 Tainted: G E (5.18.0-rc6-next-20220509)
[ 1537.084215] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 24004484 XER: 00000000
[ 1537.084224] IRQMASK: 0
[ 1537.084224] GPR00: 0000000000000036 00007fffc7448c30 00007fffa7607300 0000000000000003
[ 1537.084224] GPR04: 00000000c138fd04 0000000155ee0b60 0000000000000004 00007fffa7c33f98
[ 1537.084224] GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000
[ 1537.084224] GPR12: 0000000000000000 00007fffa7d0fa80 0000000155ee0b60 00007fffa7c29da8
[ 1537.084224] GPR16: 00007fffa7c29da8 00007fffa7c29da8 00007fffa7c63670 0000000000000000
[ 1537.084224] GPR20: 00007fffa7c33388 00007fffa7c62040 0000000155ee0b90 0000000155ee02e0
[ 1537.084224] GPR24: 00007fffa7c29da8 00007fffa7c29da8 00007fffa7c29da8 00007fffa7c29da8
[ 1537.084224] GPR28: 0000000000000001 00007fffa7c29da8 0000000000000000 00007fffa7c29da8
[ 1537.084261] NIP [00007fffa7529210] 0x7fffa7529210
[ 1537.084263] LR [00007fffa7c26824] 0x7fffa7c26824
[ 1537.084265] --- interrupt: c00
[ 1537.084267] Instruction dump:
[ 1537.084270] 60000000 7f69db78 7f83e040 7c7f07b4 7bea1f24 419cffb4 eae10028 eb210038
[ 1537.084276] eb610048 e93d0000 e95d0008 2fa90000 <f92a0000> 419e0008 f9490008 3d405dea
[ 1537.084284] ---[ end trace 0000000000000000 ]---
[ 1537.106557]
[ 1538.106559] Kernel panic - not syncing: Fatal exception
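
The three traces in this thread appear to follow the same call path from dm_ctl_ioctl down to __cpuhp_state_remove_instance. A small Python sketch for pulling the symbol names out of such a trace, so that reports can be compared quickly, is shown below. The regular expression and the function name call_trace_symbols are assumptions made for illustration and are tuned only to the powerpc frame format seen here.

```python
import re

# Matches stack frame lines such as:
# [ 1537.084138] [c0000000211fb920] [c000000000753c14] bioset_exit+0x64/0x280
FRAME_RE = re.compile(
    r"\]\s+\[[0-9a-f]{16}\]\s+\[[0-9a-f]{16}\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_trace_symbols(log_text):
    """Return the function names of all stack frame lines, in log order."""
    symbols = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            symbols.append(match.group(1))
    return symbols


if __name__ == "__main__":
    # Sample lines copied from the trace above.
    sample = """\
[ 1537.084117] NIP [c00000000015b0bc] __cpuhp_state_remove_instance+0x19c/0x2c0
[ 1537.084131] [c0000000211fb8b0] [c00000000015afe8] __cpuhp_state_remove_instance+0xc8/0x2c0 (unreliable)
[ 1537.084138] [c0000000211fb920] [c000000000753c14] bioset_exit+0x64/0x280
[ 1537.084144] [c0000000211fb9c0] [c008000002137744] cleanup_mapped_device+0x4c/0x1c0 [dm_mod]
"""
    print(call_trace_symbols(sample))
```

The NIP and LR lines are skipped on purpose, since they carry only one bracketed address and do not match the two-address frame pattern.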

- Sachin