2023-10-18 13:26:29

by Tasmiya Nalatwad

Subject: [Bisected] [efeda3bf912f] OOPS crash while performing Block device module parameter test [qla2xxx / FC]

Greetings,

I am seeing a kernel Oops while running the block device module parameter
test [qla2xxx / FC] on linux-next 6.6.0-rc5-next-20231010.

--- Traces ---

[30876.431678] Kernel attempted to read user page (30) - exploit attempt? (uid: 0)
[30876.431687] BUG: Kernel NULL pointer dereference on read at 0x00000030
[30876.431692] Faulting instruction address: 0xc0080000018e3180
[30876.431697] Oops: Kernel access of bad area, sig: 11 [#1]
[30876.431700] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
[30876.431705] Modules linked in: qla2xxx(+) nvme_fc nvme_fabrics nvme_core dm_round_robin dm_queue_length exfat vfat fat btrfs blake2b_generic zstd_compress loop raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 linear xfs libcrc32c raid0 nvram rpadlpar_io rpaphp xsk_diag bonding tls rfkill vmx_crypto pseries_rng binfmt_misc ext4 mbcache jbd2 dm_service_time sd_mod sg ibmvfc ibmveth t10_pi crc64_rocksoft crc64 scsi_transport_fc dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: nvme_core]
[30876.431767] CPU: 0 PID: 1289400 Comm: kworker/0:2 Kdump: loaded Not tainted 6.6.0-rc5-next-20231010-auto #1
[30876.431773] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.30 (NH1030_062) hv:phyp pSeries
[30876.431779] Workqueue: events work_for_cpu_fn
[30876.431788] NIP:  c0080000018e3180 LR: c0080000018e3128 CTR: c000000000513f80
[30876.431792] REGS: c000000062a8b930 TRAP: 0300   Not tainted (6.6.0-rc5-next-20231010-auto)
[30876.431797] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28000482  XER: 2004000f
[30876.431811] CFAR: c0080000018e3138 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 0
[30876.431811] GPR00: c0080000018e3128 c000000062a8bbd0 c008000000eb8300 0000000000000000
[30876.431811] GPR04: 0000000000000000 0000000000000000 0000000000000000 000000000017bbac
[30876.431811] GPR08: 0000000000000000 0000000000000030 0000000000000000 c0080000019a6d68
[30876.431811] GPR12: 0000000000000000 c000000002ff0000 c00000000019cb98 c000000082a97980
[30876.431811] GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000003071ab0
[30876.431811] GPR20: c000000003491c0d c000000063bb9a00 c000000063bb30c0 c0000001d8b52928
[30876.431811] GPR24: c008000000eb63a8 ffffffffffffffed c0000001d8b52000 0000000000000102
[30876.431811] GPR28: c008000000ebaf00 c0000001d8b52890 0000000000000000 c0000001d8b58000
[30876.431856] NIP [c0080000018e3180] qla2x00_mem_free+0x298/0x6b0 [qla2xxx]
[30876.431876] LR [c0080000018e3128] qla2x00_mem_free+0x240/0x6b0 [qla2xxx]
[30876.431895] Call Trace:
[30876.431897] [c000000062a8bbd0] [c0080000018e2f1c] qla2x00_mem_free+0x34/0x6b0 [qla2xxx] (unreliable)
[30876.431917] [c000000062a8bc20] [c0080000018eed30] qla2x00_probe_one+0x16d8/0x2640 [qla2xxx]
[30876.431937] [c000000062a8bd90] [c0000000008c589c] local_pci_probe+0x6c/0x110
[30876.431943] [c000000062a8be10] [c000000000189ba8] work_for_cpu_fn+0x38/0x60
[30876.431948] [c000000062a8be40] [c00000000018d0d0] process_scheduled_works+0x230/0x4f0
[30876.431952] [c000000062a8bf10] [c00000000018fe14] worker_thread+0x1e4/0x500
[30876.431955] [c000000062a8bf90] [c00000000019ccc8] kthread+0x138/0x140
[30876.431960] [c000000062a8bfe0] [c00000000000df98] start_kernel_thread+0x14/0x18
[30876.431965] Code: 4082000c a09f0198 78841b68 e8df0278 38e00000 480c3b8d e8410018 39200000 e91f0178 f93f0280 f93f0278 39280030 <e9480030> 7fa95040 419e00b8 ebc80030
[30876.431977] ---[ end trace 0000000000000000 ]---
[30876.480385] pstore: backend (nvram) writing error (-1)
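
A note on reading the trace: the word in angle brackets on the Code: line
is the faulting instruction. The sketch below is a minimal decoder in
Python, assuming the Power ISA D-form (addi) and DS-form (ld) encodings.
The helper name decode_ppc is hypothetical and it only handles the two
primary opcodes needed here; it is not a general disassembler.

```python
def sign_extend_16(value: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def decode_ppc(word: int) -> str:
    """Decode a 32-bit Power instruction word (addi and ld family only)."""
    opcode = (word >> 26) & 0x3F   # primary opcode, top 6 bits
    rt = (word >> 21) & 0x1F       # target register
    ra = (word >> 16) & 0x1F       # base / source register
    imm = word & 0xFFFF            # low 16 bits: immediate or displacement

    if opcode == 14:               # D-form addi (li when RA is 0)
        simm = sign_extend_16(imm)
        if ra == 0:
            return f"li r{rt},{simm:#x}"
        return f"addi r{rt},r{ra},{simm:#x}"

    if opcode == 58:               # DS-form: low 2 bits select ld/ldu/lwa
        mnemonic = {0: "ld", 1: "ldu", 2: "lwa"}.get(imm & 0x3)
        if mnemonic is None:
            return f".long {word:#010x}"
        disp = sign_extend_16(imm & 0xFFFC)
        return f"{mnemonic} r{rt},{disp:#x}(r{ra})"

    return f".long {word:#010x}"   # anything else is left undecoded


if __name__ == "__main__":
    # Words taken from the Code: line of the trace above.
    for w in (0xE91F0178, 0x39280030, 0xE9480030):
        print(f"{w:08x}  {decode_ppc(w)}")
```

Decoded this way, the faulting word e9480030 is a load from offset 0x30
of r8, and GPR08 is 0 in the register dump, which matches DAR 0x30. The
earlier word e91f0178 loads r8 from offset 0x178 of r31, so the NULL base
appears to come from a pointer field read just before the fault. Which
field that is cannot be determined from the trace alone.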


Git bisect points to the commit below. Reverting it fixes the problem.
commit efeda3bf912f269bcae16816683f432f58d68075
    scsi: qla2xxx: Move resource to allow code reuse

--
Regards,
Tasmiya Nalatwad
IBM Linux Technology Center


2023-10-18 14:33:48

by Nilesh Javali

Subject: RE: [EXT] [Bisected] [efeda3bf912f] OOPS crash while performing Block device module parameter test [qla2xxx / FC]

Hi Tasmiya,

> Git bisect points to the commit below. Reverting it fixes the problem.
> commit efeda3bf912f269bcae16816683f432f58d68075
>     scsi: qla2xxx: Move resource to allow code reuse

We recently posted a fix for the commit you pointed to:
https://marc.info/?l=linux-scsi&m=169750508721982&w=2

Thanks,
Nilesh

2023-10-18 17:05:10

by Tasmiya Nalatwad

Subject: Re: [EXT] [Bisected] [efeda3bf912f] OOPS crash while performing Block device module parameter test [qla2xxx / FC]

Thanks, Nilesh. The patch fixes the issue.

On 10/18/23 19:59, Nilesh Javali wrote:
> Hi Tasmiya,
>
> We recently posted a fix for the commit you pointed to:
> https://marc.info/?l=linux-scsi&m=169750508721982&w=2
>
> Thanks,
> Nilesh

--
Regards,
Tasmiya Nalatwad
IBM Linux Technology Center