2024-06-10 10:18:50

by Venkat Rao Bagalkote

Subject: Kernel OOPS while creating an NVMe Namespace

Greetings!!!

Observing a kernel OOPS while creating a namespace on an NVMe device.

[  140.209777] BUG: Unable to handle kernel data access at
0x18d7003065646fee
[  140.209792] Faulting instruction address: 0xc00000000023b45c
[  140.209798] Oops: Kernel access of bad area, sig: 11 [#1]
[  140.209802] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[  140.209809] Modules linked in: rpadlpar_io rpaphp xsk_diag
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
bonding nf_conntrack tls nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set
nf_tables nfnetlink vmx_crypto pseries_rng binfmt_misc fuse xfs
libcrc32c sd_mod sg ibmvscsi scsi_transport_srp ibmveth nvme nvme_core
t10_pi crc64_rocksoft_generic crc64_rocksoft crc64
[  140.209864] CPU: 2 PID: 129 Comm: kworker/u65:3 Kdump: loaded Not
tainted 6.10.0-rc3 #2
[  140.209870] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202
0xf000005 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
[  140.209876] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[  140.209889] NIP:  c00000000023b45c LR: c008000006a96b20 CTR:
c00000000023b42c
[  140.209894] REGS: c0000000506078a0 TRAP: 0380   Not tainted (6.10.0-rc3)
[  140.209899] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> 
CR: 24000244  XER: 00000000
[  140.209915] CFAR: c008000006aa80ac IRQMASK: 0
[  140.209915] GPR00: c008000006a96b20 c000000050607b40 c000000001573700
c000000004291ee0
[  140.209915] GPR04: 0000000000000000 c000000006150080 00000000c0080005
fffffffffffe0000
[  140.209915] GPR08: 0000000000000000 18d7003065646f6e 0000000000000000
c008000006aa8098
[  140.209915] GPR12: c00000000023b42c c00000000f7cdf00 c0000000001a151c
c000000004f2be80
[  140.209915] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[  140.209915] GPR20: c000000004dbcc00 0000000000000006 0000000000000002
c000000004911270
[  140.209915] GPR24: 0000000000000000 0000000000000000 c0000000ee254ffc
c0000000049111f0
[  140.209915] GPR28: 0000000000000000 c000000004911260 c000000004291ee0
c000000004911260
[  140.209975] NIP [c00000000023b45c] synchronize_srcu+0x30/0x1c0
[  140.209984] LR [c008000006a96b20] nvme_ns_remove+0x80/0x2d8 [nvme_core]
[  140.209994] Call Trace:
[  140.209997] [c000000050607b90] [c008000006a96b20]
nvme_ns_remove+0x80/0x2d8 [nvme_core]
[  140.210008] [c000000050607bd0] [c008000006a972b4]
nvme_remove_invalid_namespaces+0x144/0x1ac [nvme_core]
[  140.210020] [c000000050607c60] [c008000006a9dbd4]
nvme_scan_ns_list+0x19c/0x370 [nvme_core]
[  140.210032] [c000000050607d70] [c008000006a9dfc8]
nvme_scan_work+0xc8/0x278 [nvme_core]
[  140.210043] [c000000050607e40] [c00000000019414c]
process_one_work+0x20c/0x4f4
[  140.210051] [c000000050607ef0] [c0000000001950cc]
worker_thread+0x378/0x544
[  140.210058] [c000000050607f90] [c0000000001a164c] kthread+0x138/0x140
[  140.210065] [c000000050607fe0] [c00000000000df98]
start_kernel_thread+0x14/0x18
[  140.210072] Code: 3c4c0134 384282d4 7c0802a6 60000000 7c0802a6
fbc1fff0 fba1ffe8 fbe1fff8 7c7e1b78 f8010010 f821ffb1 e9230010
<e9290080> 7c2004ac 71290003 41820008
[  140.210093] ---[ end trace 0000000000000000 ]---


The issue is introduced by commit be647e2c76b27f409cdd520f66c95be888b553a3.


After reverting it, the issue is not seen.


Regards,

Venkat.




2024-06-10 18:32:23

by Chaitanya Kulkarni

Subject: Re: Kernel OOPS while creating an NVMe Namespace

On 6/10/24 00:51, Venkat Rao Bagalkote wrote:
> Greetings!!!
>
> Observing a kernel OOPS while creating a namespace on an NVMe device.
>
> [  140.209777] BUG: Unable to handle kernel data access at
> 0x18d7003065646fee
> [  140.209792] Faulting instruction address: 0xc00000000023b45c
> [  140.209798] Oops: Kernel access of bad area, sig: 11 [#1]
> [  140.209802] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
> [  140.209809] Modules linked in: rpadlpar_io rpaphp xsk_diag
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> bonding nf_conntrack tls nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set
> nf_tables nfnetlink vmx_crypto pseries_rng binfmt_misc fuse xfs
> libcrc32c sd_mod sg ibmvscsi scsi_transport_srp ibmveth nvme nvme_core
> t10_pi crc64_rocksoft_generic crc64_rocksoft crc64
> [  140.209864] CPU: 2 PID: 129 Comm: kworker/u65:3 Kdump: loaded Not
> tainted 6.10.0-rc3 #2
> [  140.209870] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202
> 0xf000005 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
> [  140.209876] Workqueue: nvme-wq nvme_scan_work [nvme_core]
> [  140.209889] NIP:  c00000000023b45c LR: c008000006a96b20 CTR:
> c00000000023b42c
> [  140.209894] REGS: c0000000506078a0 TRAP: 0380   Not tainted
> (6.10.0-rc3)
> [  140.209899] MSR:  800000000280b033
> <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24000244  XER: 00000000
> [  140.209915] CFAR: c008000006aa80ac IRQMASK: 0
> [  140.209915] GPR00: c008000006a96b20 c000000050607b40
> c000000001573700 c000000004291ee0
> [  140.209915] GPR04: 0000000000000000 c000000006150080
> 00000000c0080005 fffffffffffe0000
> [  140.209915] GPR08: 0000000000000000 18d7003065646f6e
> 0000000000000000 c008000006aa8098
> [  140.209915] GPR12: c00000000023b42c c00000000f7cdf00
> c0000000001a151c c000000004f2be80
> [  140.209915] GPR16: 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> [  140.209915] GPR20: c000000004dbcc00 0000000000000006
> 0000000000000002 c000000004911270
> [  140.209915] GPR24: 0000000000000000 0000000000000000
> c0000000ee254ffc c0000000049111f0
> [  140.209915] GPR28: 0000000000000000 c000000004911260
> c000000004291ee0 c000000004911260
> [  140.209975] NIP [c00000000023b45c] synchronize_srcu+0x30/0x1c0
> [  140.209984] LR [c008000006a96b20] nvme_ns_remove+0x80/0x2d8
> [nvme_core]
> [  140.209994] Call Trace:
> [  140.209997] [c000000050607b90] [c008000006a96b20]
> nvme_ns_remove+0x80/0x2d8 [nvme_core]
> [  140.210008] [c000000050607bd0] [c008000006a972b4]
> nvme_remove_invalid_namespaces+0x144/0x1ac [nvme_core]
> [  140.210020] [c000000050607c60] [c008000006a9dbd4]
> nvme_scan_ns_list+0x19c/0x370 [nvme_core]
> [  140.210032] [c000000050607d70] [c008000006a9dfc8]
> nvme_scan_work+0xc8/0x278 [nvme_core]
> [  140.210043] [c000000050607e40] [c00000000019414c]
> process_one_work+0x20c/0x4f4
> [  140.210051] [c000000050607ef0] [c0000000001950cc]
> worker_thread+0x378/0x544
> [  140.210058] [c000000050607f90] [c0000000001a164c] kthread+0x138/0x140
> [  140.210065] [c000000050607fe0] [c00000000000df98]
> start_kernel_thread+0x14/0x18
> [  140.210072] Code: 3c4c0134 384282d4 7c0802a6 60000000 7c0802a6
> fbc1fff0 fba1ffe8 fbe1fff8 7c7e1b78 f8010010 f821ffb1 e9230010
> <e9290080> 7c2004ac 71290003 41820008
> [  140.210093] ---[ end trace 0000000000000000 ]---
>
>
> The issue is introduced by commit
> be647e2c76b27f409cdd520f66c95be888b553a3.
>
>
> After reverting it, the issue is not seen.
>
>
> Regards,
>
> Venkat.
>
>
>

Do you have steps that you can share?
Did you find this using blktests? If not, can you please submit
a blktests test case for this issue? This clearly needs to be tested
regularly, since people are working on this sensitive area...

-ck


2024-06-10 18:53:31

by Keith Busch

Subject: Re: Kernel OOPS while creating an NVMe Namespace

On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
>
> The issue is introduced by commit be647e2c76b27f409cdd520f66c95be888b553a3.

My mistake. The namespace remove list appears to be getting corrupted
because I'm using the wrong APIs to replace a "list_move_tail". This is
fixing the issue on my end:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7c9f91314d366..c667290de5133 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 
        mutex_lock(&ctrl->namespaces_lock);
        list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-               if (ns->head->ns_id > nsid)
-                       list_splice_init_rcu(&ns->list, &rm_list,
-                                       synchronize_rcu);
+               if (ns->head->ns_id > nsid) {
+                       list_del_rcu(&ns->list);
+                       list_add_tail_rcu(&ns->list, &rm_list);
+               }
        }
        mutex_unlock(&ctrl->namespaces_lock);
        synchronize_srcu(&ctrl->srcu);
--
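
For anyone who wants to see the list mechanics, below is a minimal
userspace sketch (plain doubly-linked lists, not the kernel's RCU-aware
variants; the names mirror include/linux/list.h but the implementations
are simplified for illustration). It shows why the original call
corrupts the list: list_splice_init() treats its first argument as a
list *head* and moves everything hanging off it, so passing an embedded
*entry* such as &ns->list drags the real ctrl->namespaces head onto the
on-stack removal list, whereas moving a single entry needs the del/add
pair used in the fix above.

---
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_del(struct list_head *e)
{
        e->prev->next = e->next;
        e->next->prev = e->prev;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
        e->prev = h->prev;
        e->next = h;
        h->prev->next = e;
        h->prev = e;
}

/* Simplified list_splice_init(): move everything hanging off 'head'
 * onto 'dst'.  Only correct when 'head' really is a list head. */
static void list_splice_init(struct list_head *head, struct list_head *dst)
{
        struct list_head *first = head->next, *last = head->prev;

        if (first == head)
                return;
        first->prev = dst->prev;
        dst->prev->next = first;
        last->next = dst;
        dst->prev = last;
        list_init(head);
}

int main(void)
{
        struct list_head ns_list, rm_list, a, b, c;

        list_init(&ns_list);
        list_init(&rm_list);
        list_add_tail(&a, &ns_list);
        list_add_tail(&b, &ns_list);
        list_add_tail(&c, &ns_list);

        /* Buggy pattern: 'b' is an entry, not a head, so the splice
         * moves c -> ns_list -> a onto rm_list, dragging the real
         * list head along with it. */
        list_splice_init(&b, &rm_list);
        printf("real head ended up on rm_list: %s\n",
               rm_list.next->next == &ns_list ? "yes" : "no"); /* yes */

        /* Rebuild, then do the correct single-entry move the fix
         * uses (minus the RCU variants): */
        list_init(&ns_list);
        list_init(&rm_list);
        list_add_tail(&a, &ns_list);
        list_add_tail(&b, &ns_list);
        list_add_tail(&c, &ns_list);
        list_del(&b);
        list_add_tail(&b, &rm_list);
        printf("after del+add_tail, heads intact: %s\n",
               ns_list.next == &a && rm_list.next == &b ? "yes" : "no");
        return 0;
}
--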

2024-06-10 19:05:13

by Sagi Grimberg

Subject: Re: Kernel OOPS while creating an NVMe Namespace



On 10/06/2024 21:53, Keith Busch wrote:
> On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
>> The issue is introduced by commit be647e2c76b27f409cdd520f66c95be888b553a3.
> My mistake. The namespace remove list appears to be getting corrupted
> because I'm using the wrong APIs to replace a "list_move_tail". This is
> fixing the issue on my end:
>
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 7c9f91314d366..c667290de5133 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>
>        mutex_lock(&ctrl->namespaces_lock);
>        list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> -               if (ns->head->ns_id > nsid)
> -                       list_splice_init_rcu(&ns->list, &rm_list,
> -                                       synchronize_rcu);
> +               if (ns->head->ns_id > nsid) {
> +                       list_del_rcu(&ns->list);
> +                       list_add_tail_rcu(&ns->list, &rm_list);
> +               }
>        }
>        mutex_unlock(&ctrl->namespaces_lock);
>        synchronize_srcu(&ctrl->srcu);
> --

Can we add a reproducer for this in blktests? I'm assuming we can
easily trigger it by adding/removing nvmet namespaces?

2024-06-10 19:15:20

by Keith Busch

Subject: Re: Kernel OOPS while creating an NVMe Namespace

On Mon, Jun 10, 2024 at 10:05:00PM +0300, Sagi Grimberg wrote:
>
>
> On 10/06/2024 21:53, Keith Busch wrote:
> > On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
> > > The issue is introduced by commit be647e2c76b27f409cdd520f66c95be888b553a3.
> > My mistake. The namespace remove list appears to be getting corrupted
> > because I'm using the wrong APIs to replace a "list_move_tail". This is
> > fixing the issue on my end:
> >
> > ---
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 7c9f91314d366..c667290de5133 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
> >
> >        mutex_lock(&ctrl->namespaces_lock);
> >        list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> > -               if (ns->head->ns_id > nsid)
> > -                       list_splice_init_rcu(&ns->list, &rm_list,
> > -                                       synchronize_rcu);
> > +               if (ns->head->ns_id > nsid) {
> > +                       list_del_rcu(&ns->list);
> > +                       list_add_tail_rcu(&ns->list, &rm_list);
> > +               }
> >        }
> >        mutex_unlock(&ctrl->namespaces_lock);
> >        synchronize_srcu(&ctrl->srcu);
> > --
>
> Can we add a reproducer for this in blktests? I'm assuming we can
> easily trigger it by adding/removing nvmet namespaces?

I'm testing this with Namespace Management commands, which nvmet doesn't
support. You can recreate the issue by detaching the last namespace.
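
Something like this hypothetical nvme-cli sequence, for anyone trying
to reproduce (assumes a controller implementing the Namespace
Management/Attachment commands, e.g. qemu's emulated nvme device;
sizes, namespace IDs, and the cntlid lookup are illustrative):

---
ctrl=/dev/nvme0
# controller id for attach/detach; output format varies by nvme-cli version
cntlid=$(nvme id-ctrl $ctrl | awk '/^cntlid/ {print $3}')

nvme create-ns $ctrl --nsze=2097152 --ncap=2097152 --flbas=0
nvme attach-ns $ctrl --namespace-id=1 --controllers=$cntlid
nvme create-ns $ctrl --nsze=2097152 --ncap=2097152 --flbas=0
nvme attach-ns $ctrl --namespace-id=2 --controllers=$cntlid
nvme ns-rescan $ctrl

# detaching the last (highest) nsid makes the next scan prune it via
# nvme_remove_invalid_namespaces(), where the oops fired
nvme detach-ns $ctrl --namespace-id=2 --controllers=$cntlid
nvme ns-rescan $ctrl
--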

2024-06-10 19:18:20

by Sagi Grimberg

Subject: Re: Kernel OOPS while creating an NVMe Namespace



On 10/06/2024 22:15, Keith Busch wrote:
> On Mon, Jun 10, 2024 at 10:05:00PM +0300, Sagi Grimberg wrote:
>>
>> On 10/06/2024 21:53, Keith Busch wrote:
>>> On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
>>>> The issue is introduced by commit be647e2c76b27f409cdd520f66c95be888b553a3.
>>> My mistake. The namespace remove list appears to be getting corrupted
>>> because I'm using the wrong APIs to replace a "list_move_tail". This is
>>> fixing the issue on my end:
>>>
>>> ---
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 7c9f91314d366..c667290de5133 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>>>
>>>        mutex_lock(&ctrl->namespaces_lock);
>>>        list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
>>> -               if (ns->head->ns_id > nsid)
>>> -                       list_splice_init_rcu(&ns->list, &rm_list,
>>> -                                       synchronize_rcu);
>>> +               if (ns->head->ns_id > nsid) {
>>> +                       list_del_rcu(&ns->list);
>>> +                       list_add_tail_rcu(&ns->list, &rm_list);
>>> +               }
>>>        }
>>>        mutex_unlock(&ctrl->namespaces_lock);
>>>        synchronize_srcu(&ctrl->srcu);
>>> --
>> Can we add a reproducer for this in blktests? I'm assuming we can
>> easily trigger it by adding/removing nvmet namespaces?
> I'm testing this with Namespace Management commands, which nvmet doesn't
> support. You can recreate the issue by detaching the last namespace.
>

I think the same will happen in a test that creates two namespaces and
then does echo 0 > ns/enable on the last one.
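
Roughly, assuming an nvmet subsystem "testnqn" is already set up with
two enabled namespaces and connected from the host (paths follow the
standard nvmet configfs layout; the subsystem and device names are
illustrative):

---
ns=/sys/kernel/config/nvmet/subsystems/testnqn/namespaces
echo 0 > $ns/2/enable        # disable the highest nsid
nvme ns-rescan /dev/nvme1    # host prunes nsid 2 on the next scan
--

nvmet should also emit a namespace-attribute-changed AEN on disable, so
the explicit rescan may be redundant.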

2024-06-10 19:34:04

by Keith Busch

Subject: Re: Kernel OOPS while creating an NVMe Namespace

On Mon, Jun 10, 2024 at 10:17:42PM +0300, Sagi Grimberg wrote:
> On 10/06/2024 22:15, Keith Busch wrote:
> > On Mon, Jun 10, 2024 at 10:05:00PM +0300, Sagi Grimberg wrote:
> > >
> > > On 10/06/2024 21:53, Keith Busch wrote:
> > > > On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
> > > > > The issue is introduced by commit be647e2c76b27f409cdd520f66c95be888b553a3.
> > > > My mistake. The namespace remove list appears to be getting corrupted
> > > > because I'm using the wrong APIs to replace a "list_move_tail". This is
> > > > fixing the issue on my end:
> > > >
> > > > ---
> > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > > index 7c9f91314d366..c667290de5133 100644
> > > > --- a/drivers/nvme/host/core.c
> > > > +++ b/drivers/nvme/host/core.c
> > > > @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
> > > >
> > > >        mutex_lock(&ctrl->namespaces_lock);
> > > >        list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> > > > -               if (ns->head->ns_id > nsid)
> > > > -                       list_splice_init_rcu(&ns->list, &rm_list,
> > > > -                                       synchronize_rcu);
> > > > +               if (ns->head->ns_id > nsid) {
> > > > +                       list_del_rcu(&ns->list);
> > > > +                       list_add_tail_rcu(&ns->list, &rm_list);
> > > > +               }
> > > >        }
> > > >        mutex_unlock(&ctrl->namespaces_lock);
> > > >        synchronize_srcu(&ctrl->srcu);
> > > > --
> > > Can we add a reproducer for this in blktests? I'm assuming we can
> > > easily trigger it by adding/removing nvmet namespaces?
> > I'm testing this with Namespace Management commands, which nvmet doesn't
> > support. You can recreate the issue by detaching the last namespace.
> >
>
> I think the same will happen in a test that creates two namespaces and
> then does echo 0 > ns/enable on the last one.

Looks like nvme/016 tests this. It's reporting as "passed" on my end, but
I don't think it's actually exercising the driver as intended. Still
messing with it.
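
For reference, one way to run just that case from a blktests checkout
(transport selected via the nvme test group's environment variable, per
the blktests documentation):

---
cd blktests
nvme_trtype=loop ./check nvme/016
--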