2019-08-28 05:05:31

by Abdul Haleem

Subject: [linux-next][BUG][driver/scsi/lpfc][10541f] Kernel panics when booting next kernel on my Power 9 box

Greetings,

The linux-next 5.3.0-rc1 kernel failed to boot with a kernel Oops on my
Power 9 box.

I see that a recent change to the lpfc code came from commit
10541f03 scsi: lpfc: Update lpfc version to 12.4.0.0

Recent boot logs:

[..snip..]
Emulex LightPulse Fibre Channel SCSI driver 12.4.0.0
Copyright (C) 2017-2019 Broadcom. All Rights Reserved. The term "Broadcom" refers to Broadcom Inc. and/or its subsidiaries.
bnx2x 0021:01:00.0: part number 0-0-0-0
bnx2x 0021:01:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
bnx2x 0021:01:00.1: msix capability found
bnx2x 0021:01:00.1: part number 0-0-0-0
lpfc 0022:01:00.0: 0:2574 IO channels: hdwQ 16 IRQ 16 MRQ: 0
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 0 (phys 0 core 0): hdwq 0 eq 0 irq 48 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 1 (phys 0 core 1): hdwq 1 eq 1 irq 49 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 2 (phys 0 core 2): hdwq 2 eq 2 irq 50 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 3 (phys 0 core 3): hdwq 3 eq 3 irq 51 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 4 (phys 0 core 4): hdwq 4 eq 4 irq 52 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 5 (phys 0 core 5): hdwq 5 eq 5 irq 53 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 6 (phys 0 core 6): hdwq 6 eq 6 irq 54 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 7 (phys 0 core 7): hdwq 7 eq 7 irq 55 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 8 (phys 0 core 8): hdwq 8 eq 8 irq 56 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 9 (phys 0 core 9): hdwq 9 eq 9 irq 57 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 10 (phys 0 core 10): hdwq 10 eq 10 irq 58 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 11 (phys 0 core 11): hdwq 11 eq 11 irq 59 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 12 (phys 0 core 12): hdwq 12 eq 12 irq 60 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 13 (phys 0 core 13): hdwq 13 eq 13 irq 61 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 14 (phys 0 core 14): hdwq 14 eq 14 irq 62 flg x4
lpfc 0022:01:00.0: 0:3333 Set Affinity: CPU 15 (phys 0 core 15): hdwq 15 eq 15 irq 63 flg x4
scsi host1: Emulex LPe32000 32Gb PCIe Fibre Channel Adapter on PCI bus 01 device 00 irq 20 PCI resettable
bnx2x 0021:01:00.1: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
bnx2x 0021:01:00.2: msix capability found
bnx2x 0021:01:00.2: part number 0-0-0-0
bnx2x 0021:01:00.2: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
bnx2x 0021:01:00.3: msix capability found
bnx2x 0021:01:00.3: part number 0-0-0-0
BUG: Kernel NULL pointer dereference at 0x00000148
Faulting instruction address: 0xc0080000006f97dc
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: lpfc(E+) bnx2x(E+) ibmvscsi(E+) ibmveth(E)
scsi_transport_srp(E) nvmet_fc(E) nvmet(E) nvme_fc(E) nvme_fabrics(E)
scsi_transport_fc(E) mdio(E) libcrc32c(E) ptp(E) pps_core(E) nvme(E)
nvme_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
CPU: 0 PID: 16 Comm: kworker/0:0 Tainted: G E 5.3.0-rc6-next-20190826-autotest-autotest #1
Workqueue: events work_for_cpu_fn
NIP: c0080000006f97dc LR: c008000000712de4 CTR: c0080000006f9760
REGS: c00000027b57b660 TRAP: 0380 Tainted: G E (5.3.0-rc6-next-20190826-autotest-autotest)
MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 84002044 XER: 20000008
CFAR: c0080000006f976c IRQMASK: 0
GPR00: c008000000712c90 c00000027b57b8f0 c0080000007c8900 c000000003a44000
GPR04: c00000026bc95c00 0000000000000000 0000000000000001 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000020000000 c008000000799d18
GPR12: c0080000006f9760 c00000001ecaee00 c00000027d096980 c00000027d0966f0
GPR16: c00000027d0966a0 c00000027d096680 c000000001323a00 0000000000000000
GPR20: c00000027b4ad000 fffffffffffffef7 0000000000000000 0000000000000000
GPR24: 0000000000000000 c000000001323a00 c00000026e3c3848 c000000003a45148
GPR28: 0000000000000000 000000000000000a c00000026e394c00 c000000003a44000
NIP [c0080000006f97dc] lpfc_sli4_write_cq_db+0x7c/0xa0 [lpfc]
LR [c008000000712de4] lpfc_sli4_hba_setup+0x1024/0x1dc0 [lpfc]
Call Trace:
[c00000027b57b8f0] [c008000000712c90] lpfc_sli4_hba_setup+0xed0/0x1dc0 [lpfc] (unreliable)
[c00000027b57ba90] [c008000000752c28] lpfc_pci_probe_one_s4.isra.25+0x358/0x13b0 [lpfc]
[c00000027b57bb20] [c008000000753270] lpfc_pci_probe_one_s4.isra.25+0x9a0/0x13b0 [lpfc]
[c00000027b57bbb0] [c0000000005d7af4] local_pci_probe+0x64/0x100
[c00000027b57bc30] [c0000000001363c0] work_for_cpu_fn+0x30/0x50
[c00000027b57bc60] [c00000000013c2a0] process_one_work+0x1c0/0x480
[c00000027b57bd00] [c00000000013c7e8] worker_thread+0x288/0x590
[c00000027b57bdb0] [c00000000014401c] kthread+0x14c/0x190
[c00000027b57be20] [c00000000000b760] ret_from_kernel_thread+0x5c/0x7c
Instruction dump:
a14d1180 b14d1182 4e800020 2fa60000 4dde0020 81040098 54a580de e9240038
3d402000 7caa5378 5507b2be 550805be <e9290148> 54e75c28 7d4a3b78 7d484378
---[ end trace 067b4fb92c298ba5 ]---

Kernel panic - not syncing: Fatal exception
------------[ cut here ]------------
WARNING: CPU: 0 PID: 16 at drivers/tty/vt/vt.c:4256 do_unblank_screen+0x1dc/0x250
Modules linked in: lpfc(E+) bnx2x(E+) ibmvscsi(E+) ibmveth(E)
scsi_transport_srp(E) nvmet_fc(E) nvmet(E) nvme_fc(E) nvme_fabrics(E)
scsi_transport_fc(E) mdio(E) libcrc32c(E) ptp(E) pps_core(E) nvme(E)
nvme_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
CPU: 0 PID: 16 Comm: kworker/0:0 Tainted: G D E 5.3.0-rc6-next-20190826-autotest-autotest #1
Workqueue: events work_for_cpu_fn
NIP: c000000000659adc LR: c000000000659ac4 CTR: c000000000499980
REGS: c00000027b57b140 TRAP: 0700 Tainted: G D E (5.3.0-rc6-next-20190826-autotest-autotest)
MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 28002042 XER: 2000000c
CFAR: c000000000194da0 IRQMASK: 3
GPR00: c000000000659af4 c00000027b57b3d0 c000000001303000 0000000000000000
GPR04: 0000000000000003 0000000000000800 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000001001
GPR12: c000000000499980 c00000001ecaee00 c00000027d096980 c00000027d0966f0
GPR16: c00000027d0966a0 c00000027d096680 c000000001323a00 0000000000000000
GPR20: c00000027b4ad000 fffffffffffffef7 0000000000000000 0000000000000000
GPR24: 0000000000000000 c000000001323a00 c0000000011d2728 c0000000014d6758
GPR28: c0000000014d6730 0000000000000000 c000000000cada08 c0000000015e5270
NIP [c000000000659adc] do_unblank_screen+0x1dc/0x250
LR [c000000000659ac4] do_unblank_screen+0x1c4/0x250
Call Trace:
[c00000027b57b3d0] [c000000000659af4] do_unblank_screen+0x1f4/0x250 (unreliable)
[c00000027b57b450] [c000000000112c08] panic+0x1e0/0x3f8
[c00000027b57b4e0] [c00000000002a628] oops_end+0x1a8/0x1b0
[c00000027b57b560] [c00000000006f718] bad_page_fault+0xe8/0x194
[c00000027b57b5d0] [c000000000078318] do_bad_slb_fault+0x88/0xa0
[c00000027b57b5f0] [c000000000008840] data_access_slb_common+0x130/0x140
--- interrupt: 380 at lpfc_sli4_write_cq_db+0x7c/0xa0 [lpfc]
LR = lpfc_sli4_hba_setup+0x1024/0x1dc0 [lpfc]
[c00000027b57b8f0] [c008000000712c90] lpfc_sli4_hba_setup+0xed0/0x1dc0 [lpfc] (unreliable)
[c00000027b57ba90] [c008000000752c28] lpfc_pci_probe_one_s4.isra.25+0x358/0x13b0 [lpfc]
[c00000027b57bb20] [c008000000753270] lpfc_pci_probe_one_s4.isra.25+0x9a0/0x13b0 [lpfc]
[c00000027b57bbb0] [c0000000005d7af4] local_pci_probe+0x64/0x100
[c00000027b57bc30] [c0000000001363c0] work_for_cpu_fn+0x30/0x50
[c00000027b57bc60] [c00000000013c2a0] process_one_work+0x1c0/0x480
[c00000027b57bd00] [c00000000013c7e8] worker_thread+0x288/0x590
[c00000027b57bdb0] [c00000000014401c] kthread+0x14c/0x190
[c00000027b57be20] [c00000000000b760] ret_from_kernel_thread+0x5c/0x7c
Instruction dump:
7c0803a6 4e800020 60000000 60000000 60420000 4bb3b2d9 60000000 2fa30000
409efe80 813f0000 2f890000 409efe74 <0fe00000> 4bfffe6c 60000000 60000000
---[ end trace 067b4fb92c298ba6 ]---
Rebooting in 10 seconds..

Detailed logs attached
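
As a reading aid for the first oops above, here is a minimal sketch of how the marked word in the instruction dump, <e9290148>, can be decoded by hand. It assumes the standard PowerPC DS-form load encoding (primary opcode 58 with extended opcode 0 is `ld`); the struct and function names are hypothetical and are not part of the kernel or the lpfc driver.

```c
#include <stdint.h>

/* Fields of a PowerPC DS-form instruction (hypothetical helper type). */
struct ds_form {
    unsigned opcode;   /* primary opcode, top 6 bits        */
    unsigned rt;       /* target register                   */
    unsigned ra;       /* base register                     */
    unsigned xo;       /* extended opcode, low 2 bits       */
    int32_t  disp;     /* sign-extended byte displacement   */
};

/* Split a 32-bit instruction word into its DS-form fields. */
static struct ds_form decode_ds_form(uint32_t insn)
{
    struct ds_form d;
    int32_t v = (int32_t)(insn & 0xfffcu);  /* DS field with low 2 bits cleared */

    if (v & 0x8000)                         /* sign-extend from 16 bits */
        v -= 0x10000;

    d.opcode = insn >> 26;
    d.rt     = (insn >> 21) & 0x1f;
    d.ra     = (insn >> 16) & 0x1f;
    d.xo     = insn & 0x3;
    d.disp   = v;
    return d;
}

/* Effective address of the load: (RA|0) + displacement. */
static uint64_t effective_address(struct ds_form d, const uint64_t gpr[32])
{
    uint64_t base = d.ra ? gpr[d.ra] : 0;
    return base + (uint64_t)(int64_t)d.disp;
}
```

Decoding 0xe9290148 this way gives opcode 58, RT 9, RA 9 and displacement 0x148, i.e. `ld r9,0x148(r9)`. With GPR09 shown as 0 in the register dump, the effective address is 0x148, which is consistent with the reported "NULL pointer dereference at 0x00000148".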

--
Regards

Abdul Haleem
IBM Linux Technology Centre



Attachments:
bootlogs.txt (28.68 kB)
ZZ-VM-config (149.70 kB)

2019-08-28 15:23:50

by James Smart

Subject: Re: [linux-next][BUG][driver/scsi/lpfc][10541f] Kernel panics when booting next kernel on my Power 9 box

On 8/27/2019 10:02 PM, Abdul Haleem wrote:
> Greetings,
>
> The linux-next 5.3.0-rc1 kernel failed to boot with a kernel Oops on my
> Power 9 box.
>
> I see that a recent change to the lpfc code came from commit
> 10541f03 scsi: lpfc: Update lpfc version to 12.4.0.0
>
> Recent boot logs:
>
> [..snip..]

See https://www.spinics.net/lists/linux-scsi/msg133343.html

That patch hasn't been tested yet, but it appears to address this issue.

-- james

2019-08-28 17:44:22

by Abdul Haleem

Subject: Re: [linux-next][BUG][driver/scsi/lpfc][c00f62e6] Kernel panics when booting next kernel on my Power 9 box

On Wed, 2019-08-28 at 08:22 -0700, James Smart wrote:
> On 8/27/2019 10:02 PM, Abdul Haleem wrote:
> > Greetings,
> >
> > The linux-next 5.3.0-rc1 kernel failed to boot with a kernel Oops on my
> > Power 9 box.
> >
> > I see that a recent change to the lpfc code came from commit
> > 10541f03 scsi: lpfc: Update lpfc version to 12.4.0.0
> >
> > Recent boot logs:
> >
> > [..snip..]
>
> See https://www.spinics.net/lists/linux-scsi/msg133343.html
>
> That patch hasn't been tested yet, but it appears to address this issue.

Ah, commit c00f62e6 (scsi: lpfc: Merge per-protocol...) is the bad one,
and yes, the patch fixes it. The system booted fine with the code change
below:

--- a/drivers/scsi/lpfc/lpfc_sli.c 2019-08-23 13:55:18.253546775 -0700
+++ b/drivers/scsi/lpfc/lpfc_sli.c 2019-08-27 17:04:51.095330056 -0700
@@ -5553,7 +5553,7 @@ lpfc_sli4_arm_cqeq_intr(struct lpfc_hba
     for (qidx = 0; qidx < phba->cfg_hdw_queue; qidx++) {
         qp = &sli4_hba->hdwq[qidx];
         /* ARM the corresponding CQ */
-        sli4_hba->sli4_write_cq_db(phba, qp[qidx].io_cq, 0,
+        sli4_hba->sli4_write_cq_db(phba, qp->io_cq, 0,
                 LPFC_QUEUE_REARM);
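
To illustrate why the one-line change matters, here is a minimal standalone sketch of the double-indexing pattern. Since qp already points at hdwq[qidx], the expression qp[qidx] selects hdwq[qidx + qidx], which runs past the end of the array for the upper half of the queues. The types and function names below are hypothetical stand-ins, not the driver's definitions; the queue count of 16 is taken from the "hdwQ 16" line in the boot log, and the backing array is deliberately oversized so the sketch itself never computes an out-of-range pointer.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_HDWQ 16   /* "hdwQ 16" in the boot log */

/* Hypothetical stand-ins for the driver's structures. */
struct demo_queue { int id; };
struct demo_hdwq  { struct demo_queue *io_cq; };

/* Slot selected by the pre-fix expression qp[qidx], with qp = &hdwq[qidx].
 * The caller must pass an array of at least 2 * NUM_HDWQ elements so the
 * pointer arithmetic stays inside one object. */
static size_t slot_before_fix(const struct demo_hdwq *hdwq, size_t qidx)
{
    const struct demo_hdwq *qp;

    assert(qidx < NUM_HDWQ);
    qp = &hdwq[qidx];
    return (size_t)(&qp[qidx] - hdwq);   /* same as qidx + qidx */
}

/* Slot selected by the fixed expression qp->io_cq: just qidx. */
static size_t slot_after_fix(const struct demo_hdwq *hdwq, size_t qidx)
{
    const struct demo_hdwq *qp = &hdwq[qidx];

    return (size_t)(qp - hdwq);
}

/* First loop index whose pre-fix slot falls outside the NUM_HDWQ
 * valid entries; returns NUM_HDWQ if there is none. */
static size_t first_out_of_range_qidx(void)
{
    static struct demo_hdwq backing[2 * NUM_HDWQ];
    size_t q;

    for (q = 0; q < NUM_HDWQ; q++)
        if (slot_before_fix(backing, q) >= NUM_HDWQ)
            return q;
    return NUM_HDWQ;
}
```

Under these assumptions, first_out_of_range_qidx() reports 8: from the ninth queue onward the pre-fix expression would read whatever memory follows the array, so io_cq may be a stale or unrelated pointer. That is one plausible way to end up with the fault seen in lpfc_sli4_write_cq_db(), though the sketch does not prove that this is the exact path taken on this machine.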


Tested-by: Abdul Haleem <[email protected]>

--
Regards

Abdul Haleem
IBM Linux Technology Centre