2021-02-08 09:28:46

by Abdul Haleem

Subject: [powerpc][Oops] 5.11.0-rc6 boot failure on my lpar

Greetings,

The mainline 5.11.0-rc6 kernel panics while booting on my powerpc LPAR, with the message below:

sd 1:0:1:0: [sdg] Attached SCSI disk
sd 1:0:0:5: [sdf] Attached SCSI disk
sd 1:0:1:1: [sdh] Attached SCSI disk
sd 1:0:0:2: [sdc] Attached SCSI disk
qla2xxx [0012:01:00.1]-ffff:2: register_localport: host-traddr=nn-0x2000f4e9d454a70f:pn-0x2100f4e9d454a70f on portID:f1100
scsi 2:0:0:0: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
scsi 2:0:0:0: alua: supports implicit TPGS
scsi 2:0:0:0: alua: device naa.60050768108001b3a8000000000000cf port group 10 rel port 680
sd 2:0:0:0: Attached scsi generic sg12 type 0
sd 2:0:0:0: Power-on or device reset occurred
sd 2:0:0:0: [sdm] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
sd 2:0:0:0: [sdm] Write Protect is off
scsi 2:0:0:1: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
sd 2:0:0:0: [sdm] Write cache: disabled, read cache: enabled, supports DPO and FUA
scsi 2:0:0:1: alua: supports implicit TPGS
scsi 2:0:0:1: alua: device naa.60050768108001b3a8000000000000d0 port group 10 rel port 680
sd 2:0:0:1: Attached scsi generic sg13 type 0
sd 2:0:0:1: Power-on or device reset occurred
scsi 2:0:0:2: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
scsi 2:0:0:2: alua: supports implicit TPGS
scsi 2:0:0:2: alua: device naa.60050768108001b3a8000000000000d1 port group 10 rel port 680
sd 2:0:0:2: Attached scsi generic sg14 type 0
sd 2:0:0:2: Power-on or device reset occurred
scsi 2:0:0:3: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
scsi 2:0:0:3: alua: supports implicit TPGS
scsi 2:0:0:3: alua: device naa.60050768108001b3a8000000000000d2 port group 10 rel port 680
sd 2:0:0:3: Attached scsi generic sg15 type 0
sd 2:0:0:3: Power-on or device reset occurred
scsi 2:0:0:4: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
sd 2:0:0:0: [sdm] Attached SCSI disk
scsi 2:0:0:4: alua: supports implicit TPGS
scsi 2:0:0:4: alua: device naa.60050768108001b3a8000000000000d3 port group 10 rel port 680
sd 2:0:0:4: Attached scsi generic sg16 type 0
sd 2:0:0:4: Power-on or device reset occurred
scsi 2:0:0:5: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
scsi 2:0:0:5: alua: supports implicit TPGS
scsi 2:0:0:5: alua: device naa.60050768108001b3a8000000000000d4 port group 10 rel port 680
sd 2:0:0:5: Attached scsi generic sg17 type 0
sd 2:0:0:5: Power-on or device reset occurred
sd 2:0:0:0: alua: port group 10 state N non-preferred supports tolusna
sd 2:0:0:5: alua: port group 10 state A non-preferred supports tolusna
sd 2:0:0:4: alua: port group 10 state N non-preferred supports tolusna
sd 2:0:0:2: alua: port group 10 state N non-preferred supports tolusna
sd 2:0:0:1: alua: port group 10 state A non-preferred supports tolusna
sd 2:0:0:3: alua: port group 10 state A non-preferred supports tolusna
sd 2:0:0:1: [sdn] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
sd 2:0:0:1: [sdn] Write Protect is off
sd 2:0:0:1: [sdn] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 2:0:0:4: [sdq] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
sd 2:0:0:3: [sdp] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
sd 2:0:0:2: [sdo] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
sd 2:0:0:2: [sdo] Write Protect is off
sd 2:0:0:4: [sdq] Write Protect is off
sd 2:0:0:2: [sdo] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 2:0:0:3: [sdp] Write Protect is off
sd 2:0:0:3: [sdp] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 2:0:0:4: [sdq] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 2:0:0:5: [sdr] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
sd 2:0:0:5: [sdr] Write Protect is off
sd 2:0:0:5: [sdr] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 2:0:0:1: [sdn] Attached SCSI disk
sd 2:0:0:4: [sdq] Attached SCSI disk
sd 2:0:0:2: [sdo] Attached SCSI disk
sd 2:0:0:3: [sdp] Attached SCSI disk
sd 2:0:0:5: [sdr] Attached SCSI disk
scsi 2:0:1:0: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
scsi 2:0:1:0: alua: supports implicit TPGS
scsi 2:0:1:0: alua: device naa.60050768108001b3a8000000000000cf port group 11 rel port e80
sd 2:0:1:0: Attached scsi generic sg18 type 0
sd 2:0:1:0: Power-on or device reset occurred
scsi 2:0:1:1: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
sd 2:0:1:0: [sds] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
sd 2:0:1:0: [sds] Write Protect is off
scsi 2:0:1:1: alua: supports implicit TPGS
scsi 2:0:1:1: alua: device naa.60050768108001b3a8000000000000d0 port group 11 rel port e80
sd 2:0:1:0: [sds] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 2:0:1:1: Attached scsi generic sg19 type 0
sd 2:0:1:1: Power-on or device reset occurred
scsi 2:0:1:2: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
scsi 2:0:1:2: alua: supports implicit TPGS
scsi 2:0:1:2: alua: device naa.60050768108001b3a8000000000000d1 port group 11 rel port e80
BUG: Unable to handle kernel data access on write at 0xc009ffffff600098
Faulting instruction address: 0xc0000000000b8d30
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: sd_mod sg mlx5_core qla2xxx ibmvfc(+) ibmveth nvme_fc nvme_fabrics nvme_core mlxfw t10_pi ptp scsi_transport_fc pps_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod
CPU: 17 PID: 610 Comm: kworker/u64:4 Not tainted 5.11.0-rc6-autotest-gb3d2c7b876d4 #2
Workqueue: scsi_wq_2 fc_scsi_scan_rport [scsi_transport_fc]
NIP: c0000000000b8d30 LR: c00000000036252c CTR: 0000000000000002
REGS: c000000005dbf210 TRAP: 0300 Not tainted (5.11.0-rc6-autotest-gb3d2c7b876d4)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44224280 XER: 0000000b
CFAR: c000000000010300 DAR: c009ffffff600098 DSISR: 02200000 IRQMASK: 0
GPR00: 0000000000000002 c000000005dbf4b0 c000000001969700 c009ffffff600098
GPR04: 0000000000000000 0000000000000018
sd 2:0:1:2: Power-on or device reset occurred
c009ffffff600098 0000000000000000
GPR08: c000000c7ffde780 c009fffffe000098 00000000000000b0 0000000000000010
GPR12: 000000000000000f c00000001ec0d600 0000000000000000 000000000000006d
GPR16: 0000000000000000 0000000000000000 0000000000000cc0 c000000001c02300
GPR20: c000000c7ffde500 c000000c7ffde610 0000000000000008 0000000000000cc0
GPR24: c00000000ff00208 0000000000000098 c0000000019b26a0 0000000000000000
GPR28: 0000000000000098 c0000000019b2260 0000000000000016 c00000000ff00180
NIP [c0000000000b8d30] memset+0x68/0x104
LR [c00000000036252c] pcpu_alloc+0x4ec/0xb20
Call Trace:
[c000000005dbf4b0] [c00000000036253c] pcpu_alloc+0x4fc/0xb20 (unreliable)
[c000000005dbf5c0] [c0000000004ab608] bdev_alloc+0xd8/0x170
[c000000005dbf610] [c0000000006295d0] __alloc_disk_node+0x80/0x160
[c000000005dbf690] [c0080000002d2ca0] sg_add_device+0x38/0x590 [sg]
[c000000005dbf760] [c0000000007d214c] device_add+0x61c/0xa70
[c000000005dbf860] [c000000000838d40] scsi_sysfs_add_sdev+0x260/0x3a0
[c000000005dbf8f0] [c0000000008333bc] scsi_probe_and_add_lun+0xb5c/0x1120
[c000000005dbfaa0] [c000000000834b04] __scsi_scan_target+0x624/0x780
[c000000005dbfbd0] [c000000000834e1c] scsi_scan_target+0x1bc/0x1e0
[c000000005dbfc30] [c008000000344738] fc_scsi_scan_rport+0xd0/0xe0 [scsi_transport_fc]
[c000000005dbfc60] [c00000000016fe00] process_one_work+0x260/0x530
[c000000005dbfd00] [c000000000170148] worker_thread+0x78/0x5f0
[c000000005dbfda0] [c00000000017a6a8] kthread+0x198/0x1a0
[c000000005dbfe10] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
Instruction dump:
409e000c b0860000 38c60002 409d000c 90860000 38c60004 78a0d183 78a506a0
7c0903a6 41820034 60000000 60000000 <f8860000> f8860008 f8860010 f8860018
---[ end trace c8bca8ca0f2b771c ]---
sd 2:0:1:0: [sds] Attached SCSI disk

sd 2:0:1:1: [sdt] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
sd 2:0:1:1: [sdt] Write Protect is off
sd 2:0:1:1: [sdt] Mode Sense: 97 00 10 08
sd 2:0:1:1: [sdt] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 2:0:1:1: [sdt] Attached SCSI disk
sd 2:0:1:2: [sdu] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
sd 2:0:1:2: [sdu] Write Protect is off
sd 2:0:1:2: [sdu] Mode Sense: 97 00 10 08
sd 2:0:1:2: [sdu] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 2:0:1:2: [sdu] Attached SCSI disk
Kernel panic - not syncing: Fatal exception

The faulting instruction address resolves to:

gdb -batch vmlinux -ex 'list *(0xc0000000000b8d30)'
0xc0000000000b8d30 is at arch/powerpc/lib/mem_64.S:58.
53 3: srdi. r0,r5,6
54 clrldi r5,r5,58
55 mtctr r0
56 beq 5f
57 .balign 16
58 4: std r4,0(r6)
59 std r4,8(r6)
60 std r4,16(r6)
61 std r4,24(r6)
62 std r4,32(r6)
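
As a rough cross-check of the register dump against this listing, the sketch below recomputes how many bytes memset still had to write when it faulted. It assumes the registers in the oops still hold the values set up by lines 53-55 (r0 = r5 >> 6 moved into CTR, r5 masked to its low 6 bits) and that GPR06/GPR07 are the two values on the line split by the interleaved "Power-on or device reset" message. The helper name remaining_length is made up for illustration.

```python
# Register values copied from the oops above.
DAR = 0xc009ffffff600098    # faulting data address
GPR03 = 0xc009ffffff600098  # r3: first argument, the memset destination
GPR05 = 0x18                # r5 after clrldi r5,r5,58 (length & 63)
GPR06 = 0xc009ffffff600098  # r6: current store pointer (assumed, see above)
CTR = 0x2                   # loaded by mtctr r0, where r0 = length >> 6


def remaining_length(ctr: int, r5: int) -> int:
    """Undo lines 53-55: ctr is length >> 6, r5 is length & 63."""
    return (ctr << 6) | (r5 & 0x3F)


# If r6 still equals r3 and both match DAR, the very first std of the
# 64-byte loop (line 58) faulted, so nothing had been written yet.
first_store = DAR == GPR06 == GPR03
length = remaining_length(CTR, GPR05)

print(f"fault on first store: {first_store}")
print(f"bytes left to clear:  {length} ({length:#x})")
```

With the values above this gives 152 bytes, and r6 == r3 == DAR, which suggests the first store into the area returned by pcpu_alloc already hit an unmapped address. This is only arithmetic on the dump, not a diagnosis.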

I am attaching the kernel config.

Regards,
Abdul Haleem
IBM Linux Technology Center


Attachments:
ZZ-VM-config.txt (143.70 kB)