2019-08-31 00:03:20

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH] nvme-core: Fix subsystem instance mismatches

With multipath enabled (which is now default in many distros), nvme
controllers and their respective namespaces can be numbered
differently. For example: nvme0n1 might actually belong to controller
nvme1, which is super confusing (and may have broken any scripts that
rely on the numbering relation). To make matters worse, the mismatches
can sometimes change from boot to boot so anyone dealing with the
nvme control device has to inspect sysfs every boot to ensure they
get the right one.

The reason for this is that the X in nvmeXn1 is the subsystem's instance
number (when multipath is enabled) which is distinct from the
controller's instance and the subsystem instance is assigned from a
separate IDA after the controller gets identified and this can race
a bit when multiple controllers are being setup.

To fix this, assign the subsystem's instance based on the instance
number of the controller's instance that first created it. There should
always be fewer subsystems than controllers so the should not be a need
to create extra subsystems that overlap existing controllers.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/nvme/host/core.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d3d6b7bd6903..ca201b71ab49 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -81,7 +81,6 @@ EXPORT_SYMBOL_GPL(nvme_reset_wq);
struct workqueue_struct *nvme_delete_wq;
EXPORT_SYMBOL_GPL(nvme_delete_wq);

-static DEFINE_IDA(nvme_subsystems_ida);
static LIST_HEAD(nvme_subsystems);
static DEFINE_MUTEX(nvme_subsystems_lock);

@@ -2332,7 +2331,6 @@ static void nvme_release_subsystem(struct device *dev)
struct nvme_subsystem *subsys =
container_of(dev, struct nvme_subsystem, dev);

- ida_simple_remove(&nvme_subsystems_ida, subsys->instance);
kfree(subsys);
}

@@ -2461,12 +2459,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
if (!subsys)
return -ENOMEM;
- ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL);
- if (ret < 0) {
- kfree(subsys);
- return ret;
- }
- subsys->instance = ret;
+ subsys->instance = ctrl->instance;
mutex_init(&subsys->lock);
kref_init(&subsys->ref);
INIT_LIST_HEAD(&subsys->ctrls);
@@ -4074,7 +4067,6 @@ static int __init nvme_core_init(void)

static void __exit nvme_core_exit(void)
{
- ida_destroy(&nvme_subsystems_ida);
class_destroy(nvme_subsys_class);
class_destroy(nvme_class);
unregister_chrdev_region(nvme_chr_devt, NVME_MINORS);
--
2.20.1


2019-08-31 15:31:56

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches

On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote:
> To fix this, assign the subsystem's instance based on the instance
> number of the controller's instance that first created it. There should
> always be fewer subsystems than controllers so the should not be a need
> to create extra subsystems that overlap existing controllers.

The subsystem's lifetime is not tied to the controller's. When the
controller is removed and releases its instance, the next controller
to take that available instance will create naming collisions with the
subsystem still using it.

2019-09-03 16:09:51

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches



On 2019-08-31 9:29 a.m., Keith Busch wrote:
> On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote:
>> To fix this, assign the subsystem's instance based on the instance
>> number of the controller's instance that first created it. There should
>> always be fewer subsystems than controllers so the should not be a need
>> to create extra subsystems that overlap existing controllers.
>
> The subsystem's lifetime is not tied to the controller's. When the
> controller is removed and releases its instance, the next controller
> to take that available instance will create naming collisions with the
> subsystem still using it.
>

Hmm, yes, ok.

So perhaps we can just make the subsystem prefer the ctrl's instance
when allocating the ID? Then at least, in the common case, the
controller numbers will match the subsystem numbers. Only when there's
random hot-plugs would the numbers get out of sync.

Logan

2019-09-03 16:49:22

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches

On Tue, Sep 03, 2019 at 10:08:01AM -0600, Logan Gunthorpe wrote:
> On 2019-08-31 9:29 a.m., Keith Busch wrote:
> > On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote:
> >> To fix this, assign the subsystem's instance based on the instance
> >> number of the controller's instance that first created it. There should
> >> always be fewer subsystems than controllers so the should not be a need
> >> to create extra subsystems that overlap existing controllers.
> >
> > The subsystem's lifetime is not tied to the controller's. When the
> > controller is removed and releases its instance, the next controller
> > to take that available instance will create naming collisions with the
> > subsystem still using it.
> >
>
> Hmm, yes, ok.
>
> So perhaps we can just make the subsystem prefer the ctrl's instance
> when allocating the ID? Then at least, in the common case, the
> controller numbers will match the subsystem numbers. Only when there's
> random hot-plugs would the numbers get out of sync.

I really don't know about a patch that works only on static
configurations. Connects and disconnects do happen on live systems,
so the numerals will inevitably get out of sync.

Could we possibly make /dev/nvmeX be a subsystem handle without causing
trouble for anyone? This would essentially be the same thing as today
for non-CMIC controllers with a device-per-controller and only affects
the CMIC ones.

2019-09-03 18:15:27

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches



On 2019-09-03 10:46 a.m., Keith Busch wrote:
> On Tue, Sep 03, 2019 at 10:08:01AM -0600, Logan Gunthorpe wrote:
>> On 2019-08-31 9:29 a.m., Keith Busch wrote:
>>> On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote:
>>>> To fix this, assign the subsystem's instance based on the instance
>>>> number of the controller's instance that first created it. There should
>>>> always be fewer subsystems than controllers so the should not be a need
>>>> to create extra subsystems that overlap existing controllers.
>>>
>>> The subsystem's lifetime is not tied to the controller's. When the
>>> controller is removed and releases its instance, the next controller
>>> to take that available instance will create naming collisions with the
>>> subsystem still using it.
>>>
>>
>> Hmm, yes, ok.
>>
>> So perhaps we can just make the subsystem prefer the ctrl's instance
>> when allocating the ID? Then at least, in the common case, the
>> controller numbers will match the subsystem numbers. Only when there's
>> random hot-plugs would the numbers get out of sync.
>
> I really don't know about a patch that works only on static
> configurations. Connects and disconnects do happen on live systems,
> so the numerals will inevitably get out of sync.

Well this depends on how big a problem we think the number mismatch is.
Right now it's pretty annoying because numbers aren't matching for
non-CMIC controllers in simple setups on boot. I think having a small
patch that makes it more consistent for the static would be worth it and
if CMIC controllers with significant hot-plug events have mismatches
that seems more understandable to me.

> Could we possibly make /dev/nvmeX be a subsystem handle without causing
> trouble for anyone? This would essentially be the same thing as today
> for non-CMIC controllers with a device-per-controller and only affects
> the CMIC ones.

Well then we'd have to be able to do everything that's possible with a
controller via the subsystem and it would have to multiplex all admin
commands for CMIC ones, etc to a sensible controller. This might make
sense in the long term but it sounds like a larger project than I have
time to take on.

Logan

2019-09-04 06:07:51

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches

On Tue, Sep 03, 2019 at 10:46:20AM -0600, Keith Busch wrote:
> Could we possibly make /dev/nvmeX be a subsystem handle without causing
> trouble for anyone? This would essentially be the same thing as today
> for non-CMIC controllers with a device-per-controller and only affects
> the CMIC ones.

A per-subsyste character device doesn't make sense, as a lot of admin
command require a specific controller.

If this really is an isue for people we'll just need to refcount the
handle allocation. That is:

- nvme_init_ctrl allocates a new nvme_instance or so object, which
does the ida_simple_get.
- we allocate a new subsystem that reuses the handle and grabs
a reference in nvme_init_subsystem, then if we find an existing
subsystem we drop that reference again.
- last free of a ctrl or subsystem also drops a reference, with
the final free releasing the ida

2019-09-04 14:47:12

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches

On Wed, Sep 04, 2019 at 08:05:58AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 03, 2019 at 10:46:20AM -0600, Keith Busch wrote:
> > Could we possibly make /dev/nvmeX be a subsystem handle without causing
> > trouble for anyone? This would essentially be the same thing as today
> > for non-CMIC controllers with a device-per-controller and only affects
> > the CMIC ones.
>
> A per-subsyste character device doesn't make sense, as a lot of admin
> command require a specific controller.

Yeah, I was hoping to provide something special for CMIC controllers
so you can do path specific admin, but that looks sure to break user
space.

> If this really is an isue for people we'll just need to refcount the
> handle allocation. That is:
>
> - nvme_init_ctrl allocates a new nvme_instance or so object, which
> does the ida_simple_get.
> - we allocate a new subsystem that reuses the handle and grabs
> a reference in nvme_init_subsystem, then if we find an existing
> subsystem we drop that reference again.
> - last free of a ctrl or subsystem also drops a reference, with
> the final free releasing the ida

Let me step through an example:

Ctrl A gets instance 0.

Its subsystem gets the same instance, and takes ref count on it:
all namespaces in this subsystem will use '0'.

Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so
no new subsytem is allocated.

Ctrl A is disconnected, dropping its ref on instance 0, but the
subsystem still has its refcount, making it unavailable.

Ctrl A is reconnected, and allocates instance 2 because 0 is still in
use.

Now all the namespaces in this subsystem are prefixed with nvme0, but no
controller exists with the same prefix. We still have inevitable naming
mismatch, right?

2019-09-04 15:45:01

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches

On Wed, Sep 04, 2019 at 08:44:27AM -0600, Keith Busch wrote:
> Let me step through an example:
>
> Ctrl A gets instance 0.
>
> Its subsystem gets the same instance, and takes ref count on it:
> all namespaces in this subsystem will use '0'.
>
> Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so
> no new subsytem is allocated.
>
> Ctrl A is disconnected, dropping its ref on instance 0, but the
> subsystem still has its refcount, making it unavailable.
>
> Ctrl A is reconnected, and allocates instance 2 because 0 is still in
> use.
>
> Now all the namespaces in this subsystem are prefixed with nvme0, but no
> controller exists with the same prefix. We still have inevitable naming
> mismatch, right?

I think th major confusion was that we can use the same handle for
and unrelated subsystem vs controller, and that would avoid it.

I don't see how we can avoid the controller is entirely different
from namespace problem ever.

2019-09-04 15:57:45

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches

On Wed, Sep 04, 2019 at 05:42:15PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 04, 2019 at 08:44:27AM -0600, Keith Busch wrote:
> > Let me step through an example:
> >
> > Ctrl A gets instance 0.
> >
> > Its subsystem gets the same instance, and takes ref count on it:
> > all namespaces in this subsystem will use '0'.
> >
> > Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so
> > no new subsytem is allocated.
> >
> > Ctrl A is disconnected, dropping its ref on instance 0, but the
> > subsystem still has its refcount, making it unavailable.
> >
> > Ctrl A is reconnected, and allocates instance 2 because 0 is still in
> > use.
> >
> > Now all the namespaces in this subsystem are prefixed with nvme0, but no
> > controller exists with the same prefix. We still have inevitable naming
> > mismatch, right?
>
> I think th major confusion was that we can use the same handle for
> and unrelated subsystem vs controller, and that would avoid it.
>
> I don't see how we can avoid the controller is entirely different
> from namespace problem ever.

Can we just ensure there is never a matching controller then? This
patch will accomplish that and simpler than wrapping the instance in a
refcount'ed object:

http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html

2019-09-04 16:06:53

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches

On Wed, Sep 04, 2019 at 09:54:45AM -0600, Keith Busch wrote:
> Can we just ensure there is never a matching controller then? This
> patch will accomplish that and simpler than wrapping the instance in a
> refcount'ed object:
>
> http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html

I guess that is fine to. Btw, what happened to the plan for udev
rules in that thread?

2019-09-04 16:09:49

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches



On 2019-09-04 9:54 a.m., Keith Busch wrote:
> On Wed, Sep 04, 2019 at 05:42:15PM +0200, Christoph Hellwig wrote:
>> On Wed, Sep 04, 2019 at 08:44:27AM -0600, Keith Busch wrote:
>>> Let me step through an example:
>>>
>>> Ctrl A gets instance 0.
>>>
>>> Its subsystem gets the same instance, and takes ref count on it:
>>> all namespaces in this subsystem will use '0'.
>>>
>>> Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so
>>> no new subsytem is allocated.
>>>
>>> Ctrl A is disconnected, dropping its ref on instance 0, but the
>>> subsystem still has its refcount, making it unavailable.
>>>
>>> Ctrl A is reconnected, and allocates instance 2 because 0 is still in
>>> use.
>>>
>>> Now all the namespaces in this subsystem are prefixed with nvme0, but no
>>> controller exists with the same prefix. We still have inevitable naming
>>> mismatch, right?
>>
>> I think th major confusion was that we can use the same handle for
>> and unrelated subsystem vs controller, and that would avoid it.
>>
>> I don't see how we can avoid the controller is entirely different
>> from namespace problem ever.


Yes, I agree, we can't solve the mismatch problem in the general case:
with sequences of hot plug events there will always be a case that
mismatches. I just think we can do better in the simple common default case.


> Can we just ensure there is never a matching controller then? This
> patch will accomplish that and simpler than wrapping the instance in a
> refcount'ed object:
>
> http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html

I don't really like that idea. It reduces the confusion caused by
mismatching numbers, but causes the controller to never match the
namespace, which is also confusing but in a different way.

I like the nvme_instance idea. It's not going to be perfect but it has
some nice properties: the subsystem will try to match the controller's
instance whenever possible, but in cases where it doesn't, the instance
number of the subsystem will never be the same as an existing controller.

I'll see if I can work up a quick patch set and see what people think.

Logan

2019-09-04 16:40:27

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches

On Wed, Sep 04, 2019 at 10:07:12AM -0600, Logan Gunthorpe wrote:
> Yes, I agree, we can't solve the mismatch problem in the general case:
> with sequences of hot plug events there will always be a case that
> mismatches. I just think we can do better in the simple common default case.

This may be something where udev can help us. I might be able to find
some time to look at that, but not today.

> > Can we just ensure there is never a matching controller then? This
> > patch will accomplish that and simpler than wrapping the instance in a
> > refcount'ed object:
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html
>
> I don't really like that idea. It reduces the confusion caused by
> mismatching numbers, but causes the controller to never match the
> namespace, which is also confusing but in a different way.
>
> I like the nvme_instance idea. It's not going to be perfect but it has
> some nice properties: the subsystem will try to match the controller's
> instance whenever possible, but in cases where it doesn't, the instance
> number of the subsystem will never be the same as an existing controller.
>
> I'll see if I can work up a quick patch set and see what people think.

How about this: we have the subsys copy the controller's instance,
and the nvme_free_ctrl() doesn't release it if its subsys matches?

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 14c0bfb55615..8a8279ece5ee 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -81,7 +81,6 @@ EXPORT_SYMBOL_GPL(nvme_reset_wq);
struct workqueue_struct *nvme_delete_wq;
EXPORT_SYMBOL_GPL(nvme_delete_wq);

-static DEFINE_IDA(nvme_subsystems_ida);
static LIST_HEAD(nvme_subsystems);
static DEFINE_MUTEX(nvme_subsystems_lock);

@@ -2344,7 +2343,8 @@ static void nvme_release_subsystem(struct device *dev)
struct nvme_subsystem *subsys =
container_of(dev, struct nvme_subsystem, dev);

- ida_simple_remove(&nvme_subsystems_ida, subsys->instance);
+ if (subsys->instance >= 0)
+ ida_simple_remove(&nvme_instance_ida, subsys->instance);
kfree(subsys);
}

@@ -2473,12 +2473,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
if (!subsys)
return -ENOMEM;
- ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL);
- if (ret < 0) {
- kfree(subsys);
- return ret;
- }
- subsys->instance = ret;
+
+ subsys->instance = -1;
mutex_init(&subsys->lock);
kref_init(&subsys->ref);
INIT_LIST_HEAD(&subsys->ctrls);
@@ -2497,7 +2493,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
subsys->dev.class = nvme_subsys_class;
subsys->dev.release = nvme_release_subsystem;
subsys->dev.groups = nvme_subsys_attrs_groups;
- dev_set_name(&subsys->dev, "nvme-subsys%d", subsys->instance);
+ dev_set_name(&subsys->dev, "nvme-subsys%d", ctrl->instance);
device_initialize(&subsys->dev);

mutex_lock(&nvme_subsystems_lock);
@@ -2528,6 +2524,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
goto out_put_subsystem;
}

+ if (!found)
+ subsys->instance = ctrl->instance;
ctrl->subsys = subsys;
list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
mutex_unlock(&nvme_subsystems_lock);
@@ -3803,7 +3801,9 @@ static void nvme_free_ctrl(struct device *dev)
container_of(dev, struct nvme_ctrl, ctrl_device);
struct nvme_subsystem *subsys = ctrl->subsys;

- ida_simple_remove(&nvme_instance_ida, ctrl->instance);
+ if (subsys && ctrl->instance != subsys->instance)
+ ida_simple_remove(&nvme_instance_ida, ctrl->instance);
+
kfree(ctrl->effects);
nvme_mpath_uninit(ctrl);
__free_page(ctrl->discard_page);
@@ -4085,7 +4085,6 @@ static int __init nvme_core_init(void)

static void __exit nvme_core_exit(void)
{
- ida_destroy(&nvme_subsystems_ida);
class_destroy(nvme_subsys_class);
class_destroy(nvme_class);
unregister_chrdev_region(nvme_chr_devt, NVME_MINORS);
--

2019-09-04 17:03:16

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches



On 2019-09-04 10:35 a.m., Keith Busch wrote:
> On Wed, Sep 04, 2019 at 10:07:12AM -0600, Logan Gunthorpe wrote:
>> Yes, I agree, we can't solve the mismatch problem in the general case:
>> with sequences of hot plug events there will always be a case that
>> mismatches. I just think we can do better in the simple common default case.
>
> This may be something where udev can help us. I might be able to find
> some time to look at that, but not today.
>
>>> Can we just ensure there is never a matching controller then? This
>>> patch will accomplish that and simpler than wrapping the instance in a
>>> refcount'ed object:
>>>
>>> http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html
>>
>> I don't really like that idea. It reduces the confusion caused by
>> mismatching numbers, but causes the controller to never match the
>> namespace, which is also confusing but in a different way.
>>
>> I like the nvme_instance idea. It's not going to be perfect but it has
>> some nice properties: the subsystem will try to match the controller's
>> instance whenever possible, but in cases where it doesn't, the instance
>> number of the subsystem will never be the same as an existing controller.
>>
>> I'll see if I can work up a quick patch set and see what people think.
>
> How about this: we have the subsys copy the controller's instance,
> and the nvme_free_ctrl() doesn't release it if its subsys matches?

Oh, yes that's simpler than the struct/kref method and looks like it
will accomplish the same thing. I did some brief testing with it and it
seems to work for me (though I don't have any subsystems with multiple
controllers). If you want to make a patch out of it you can add my

Reviewed-by: Logan Gunthorpe <[email protected]>

Thanks!

Logan

> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 14c0bfb55615..8a8279ece5ee 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -81,7 +81,6 @@ EXPORT_SYMBOL_GPL(nvme_reset_wq);
> struct workqueue_struct *nvme_delete_wq;
> EXPORT_SYMBOL_GPL(nvme_delete_wq);
>
> -static DEFINE_IDA(nvme_subsystems_ida);
> static LIST_HEAD(nvme_subsystems);
> static DEFINE_MUTEX(nvme_subsystems_lock);
>
> @@ -2344,7 +2343,8 @@ static void nvme_release_subsystem(struct device *dev)
> struct nvme_subsystem *subsys =
> container_of(dev, struct nvme_subsystem, dev);
>
> - ida_simple_remove(&nvme_subsystems_ida, subsys->instance);
> + if (subsys->instance >= 0)
> + ida_simple_remove(&nvme_instance_ida, subsys->instance);
> kfree(subsys);
> }
>
> @@ -2473,12 +2473,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
> subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
> if (!subsys)
> return -ENOMEM;
> - ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL);
> - if (ret < 0) {
> - kfree(subsys);
> - return ret;
> - }
> - subsys->instance = ret;
> +
> + subsys->instance = -1;
> mutex_init(&subsys->lock);
> kref_init(&subsys->ref);
> INIT_LIST_HEAD(&subsys->ctrls);
> @@ -2497,7 +2493,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
> subsys->dev.class = nvme_subsys_class;
> subsys->dev.release = nvme_release_subsystem;
> subsys->dev.groups = nvme_subsys_attrs_groups;
> - dev_set_name(&subsys->dev, "nvme-subsys%d", subsys->instance);
> + dev_set_name(&subsys->dev, "nvme-subsys%d", ctrl->instance);
> device_initialize(&subsys->dev);
>
> mutex_lock(&nvme_subsystems_lock);
> @@ -2528,6 +2524,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
> goto out_put_subsystem;
> }
>
> + if (!found)
> + subsys->instance = ctrl->instance;
> ctrl->subsys = subsys;
> list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
> mutex_unlock(&nvme_subsystems_lock);
> @@ -3803,7 +3801,9 @@ static void nvme_free_ctrl(struct device *dev)
> container_of(dev, struct nvme_ctrl, ctrl_device);
> struct nvme_subsystem *subsys = ctrl->subsys;
>
> - ida_simple_remove(&nvme_instance_ida, ctrl->instance);
> + if (subsys && ctrl->instance != subsys->instance)
> + ida_simple_remove(&nvme_instance_ida, ctrl->instance);
> +
> kfree(ctrl->effects);
> nvme_mpath_uninit(ctrl);
> __free_page(ctrl->discard_page);
> @@ -4085,7 +4085,6 @@ static int __init nvme_core_init(void)
>
> static void __exit nvme_core_exit(void)
> {
> - ida_destroy(&nvme_subsystems_ida);
> class_destroy(nvme_subsys_class);
> class_destroy(nvme_class);
> unregister_chrdev_region(nvme_chr_devt, NVME_MINORS);
> --
>

2019-09-04 17:17:36

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches

On Wed, Sep 04, 2019 at 11:01:22AM -0600, Logan Gunthorpe wrote:
> Oh, yes that's simpler than the struct/kref method and looks like it
> will accomplish the same thing. I did some brief testing with it and it
> seems to work for me (though I don't have any subsystems with multiple
> controllers). If you want to make a patch out of it you can add my
>
> Reviewed-by: Logan Gunthorpe <[email protected]>

Thanks! I'll make it a proper patch and send shortly.

For testing multi-controller subsystems, I haven't got proper hardware
either, so I really like the nvme loop target. Here's a very simple json
defining a two namespace subsystem backed by two real nvme devices:

loop.json:
---
{
"ports": [
{
"addr": {
"adrfam": "",
"traddr": "",
"treq": "not specified",
"trsvcid": "",
"trtype": "loop"
},
"portid": 1,
"referrals": [],
"subsystems": [
"testnqn"
]
}
],
"subsystems": [
{
"attr": {
"allow_any_host": "1"
},
"namespaces": [
{
"device": {
"nguid": "ef90689c-6c46-d44c-89c1-4067801309a8",
"path": "/dev/nvme0n1"
},
"enable": 1,
"nsid": 1
},
{
"device": {
"nguid": "ef90689c-6c46-d44c-89c1-4067801309a9",
"path": "/dev/nvme1n1"
},
"enable": 1,
"nsid": 2
}
],
"nqn": "testnqn"
}
]
}
--

Configure the target:

# nvmetcli restore loop.json

Connect to it twice:

# nvme connect -n testnqn -t loop
# nvme connect -n testnqn -t loop

List the result:

# nvme list -v
NVM Express Subsystems

Subsystem Subsystem-NQN Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys0 nqn.2014.08.org.nvmexpress:8086108ePHLE7200015N6P4BGN-17335943:ICDPC5ED2ORA6.4T nvme0
nvme-subsys1 nqn.2014.08.org.nvmexpress:8086108ePHLE7200015N6P4BGN-27335943:ICDPC5ED2ORA6.4T nvme1
nvme-subsys2 testnqn nvme2, nvme3

NVM Express Controllers

Device SN MN FR TxPort Address Subsystem Namespaces
-------- -------------------- ---------------------------------------- -------- ------ -------------- ------------ ----------------
nvme0 PHLE7200015N6P4BGN-1 7335943:ICDPC5ED2ORA6.4T QDV1RD07 pcie 0000:88:00.0 nvme-subsys0 nvme0n1
nvme1 PHLE7200015N6P4BGN-2 7335943:ICDPC5ED2ORA6.4T QDV1RD03 pcie 0000:89:00.0 nvme-subsys1 nvme1n1
nvme2 9eb72cbeecc6fdb0 Linux 5.3.0-rc loop nvme-subsys2 nvme2n1, nvme2n2
nvme3 9eb72cbeecc6fdb0 Linux 5.3.0-rc loop nvme-subsys2 nvme2n1, nvme2n2

NVM Express Namespaces

Device NSID Usage Format Controllers
------------ -------- -------------------------- ---------------- ----------------
nvme0n1 1 3.20 TB / 3.20 TB 512 B + 0 B nvme0
nvme1n1 1 3.20 TB / 3.20 TB 512 B + 0 B nvme1
nvme2n1 1 3.20 TB / 3.20 TB 512 B + 0 B nvme2, nvme3
nvme2n2 2 3.20 TB / 3.20 TB 512 B + 0 B nvme2, nvme3


2019-09-04 17:32:06

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] nvme-core: Fix subsystem instance mismatches



On 2019-09-04 11:14 a.m., Keith Busch wrote:
> On Wed, Sep 04, 2019 at 11:01:22AM -0600, Logan Gunthorpe wrote:
>> Oh, yes that's simpler than the struct/kref method and looks like it
>> will accomplish the same thing. I did some brief testing with it and it
>> seems to work for me (though I don't have any subsystems with multiple
>> controllers). If you want to make a patch out of it you can add my
>>
>> Reviewed-by: Logan Gunthorpe <[email protected]>
>
> Thanks! I'll make it a proper patch and send shortly.
>
> For testing multi-controller subsystems, I haven't got proper hardware
> either, so I really like the nvme loop target. Here's a very simple json
> defining a two namespace subsystem backed by two real nvme devices:

Cool right, thanks for the tip, I should have thought of that. I just
did some more loop testing with your patch and it behaves roughly as we
expect. The controller and subsystem IDs never overlap unless they are
created at the same time and it doesn't look like any IDs are ever
leaked. With simple non-CMIC devices the ctrl and subsystem always have
the same instance number.

Logan