2023-09-13 00:29:52

by Daniel Wagner

[permalink] [raw]
Subject: Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port

> We have 3 association:
> >
> > assoc 0: systemd/udev triggered 'nvme connect-all' discovery controller
> > assoc 1: discovery controller from nvme/005
> > assoc 2: i/o controller from nvme/005
>
> Actually, assoc 1 is also a i/o controller for the same hostnqn as assoc
> 2. This looks more like it assoc 0 and 1 are both from systemd/udev
> trigger. But why the heck is the hostnqn for assoc 1 the same as the
> hostnqn we use in blktests? Something is really off here.
>
> The rest of my analysis is still valid.

I figured it out. I still used an older version nvme-cli which didn't
apply the hostnqn correctly. We fixed this in the meantime. With the
latest git version of nvme-cli the second I/O controller is not setup
anymore and the crash is gone. Though we still don't cleanup all
resources as the kernel module complains with

[41707.083039] nvmet_fc: nvmet_fc_exit_module: targetport list not empty


2023-09-13 11:43:42

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port

So that's interesting. But what I'm mostly worried about is how the
nvmet kernel code allows a request without ->port to progress to the
actual command handler. We should never allow a command to get that
far if ->port is NULL, and should not allow to clear ->port while
commands are still handled.

2023-09-13 15:12:17

by Daniel Wagner

[permalink] [raw]
Subject: Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port

On Wed, Sep 13, 2023 at 01:35:19PM +0200, Christoph Hellwig wrote:
> So that's interesting. But what I'm mostly worried about is how the
> nvmet kernel code allows a request without ->port to progress to the
> actual command handler.

nvmet_fc_handle_fcp_rqst()

if (tgtport->pe)
fod->req.port = tgtport->pe->port;

Not sure why this is there. Will test what happens when we just return
an error when we don't have pe set.

> We should never allow a command to get that
> far if ->port is NULL, and should not allow to clear ->port while
> commands are still handled.

Okay, makes sense. I'll test this when I have access to my rig again tomorrow.