When we remove the siblings entry we update ns->head->list, hence we
cannot separate the removal from the test for the list being empty.
Both have to be done in the same critical section to avoid a race.
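A simplified sketch of the problematic interleaving (two namespaces of
the same ns->head being removed concurrently; calls abbreviated from
nvme_ns_remove()):

  CPU A (path 1)                       CPU B (path 2)
  lock(subsys->lock)
  list_del_rcu(&ns->siblings)
  unlock(subsys->lock)
                                       lock(subsys->lock)
                                       list_del_rcu(&ns->siblings)
                                       unlock(subsys->lock)
  lock(subsys->lock)
  list_empty(&head->list) == true
  last_path = true
  unlock(subsys->lock)
                                       lock(subsys->lock)
                                       list_empty(&head->list) == true
                                       last_path = true
                                       unlock(subsys->lock)
  nvme_mpath_shutdown_disk(head)       nvme_mpath_shutdown_disk(head)

Both removals observe an empty head->list, so nvme_mpath_shutdown_disk()
and with it nvme_cdev_del() run twice for the same head.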
Fixes: 5396fdac56d8 ("nvme: fix refcounting imbalance when all paths are down")
Cc: Hannes Reinecke <[email protected]>
Cc: Keith Busch <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
I am able to hit this race window when I remove two paths at the same
time, which I did by making delete_controller asynchronous:
[ 93.977701] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress:NVMf:uuid:de63429f-50a4-4e03-ade6-0be27b75be77"
[ 93.994213] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress:NVMf:uuid:de63429f-50a4-4e03-ade6-0be27b75be77"
[ 94.009093] del cdev ffff991a00b3c388 minor 0
[ 94.009102] CPU: 2 PID: 13239 Comm: nvme Not tainted 5.14.0-rc4+ #29
[ 94.009109] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 94.009112] Call Trace:
[ 94.009119] dump_stack_lvl+0x33/0x42
[ 94.009133] nvme_cdev_del+0x2d/0x60 [nvme_core]
[ 94.009148] nvme_mpath_shutdown_disk+0x41/0x50 [nvme_core]
[ 94.009157] nvme_ns_remove+0x199/0x1c0 [nvme_core]
[ 94.009166] nvme_remove_namespaces+0xac/0xf0 [nvme_core]
[ 94.009175] nvme_do_delete_ctrl+0x43/0x60 [nvme_core]
[ 94.009182] nvme_sysfs_delete+0x42/0x60 [nvme_core]
[ 94.009190] kernfs_fop_write_iter+0x12c/0x1a0
[ 94.009219] new_sync_write+0x11c/0x1b0
[ 94.009229] vfs_write+0x1ea/0x250
[ 94.009236] ksys_write+0xa1/0xe0
[ 94.009242] do_syscall_64+0x37/0x80
[ 94.009256] entry_SYSCALL_64_after_hwframe+0x44/0xae
With the patch, only one of the racing nvme_do_delete_ctrl() instances
sees last_path = true (the one whose list_del_rcu() actually empties
head->list, since the check now runs in the same critical section), and
I can't observe any crash anymore.
Though one thing I am not really sure about is how this interacts with
nvme_init_ns_head(), as we could be running nvme_init_ns_head()
after we have set last_path = true. I haven't really figured
out yet what this would mean. Is this a real problem?
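To make the window I am worried about a bit more concrete (this is just
how I read the code, I may well be missing something): once last_path =
true has been set and subsys->lock has been dropped, the head is already
off subsys->nsheads, so a scan for the same NSID on another controller
would go through

  nvme_alloc_ns()
    nvme_init_ns_head()
      nvme_find_ns_head()   /* old head is no longer on subsys->nsheads */
      nvme_alloc_ns_head()  /* new head allocated for the same NSID     */

while the old head is still being torn down.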
drivers/nvme/host/core.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 42b69f3c6e20..953d07d6a29d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3809,8 +3809,13 @@ static void nvme_ns_remove(struct nvme_ns *ns)
set_capacity(ns->disk, 0);
nvme_fault_inject_fini(&ns->fault_inject);
+ /* Synchronize with nvme_init_ns_head() */
mutex_lock(&ns->ctrl->subsys->lock);
list_del_rcu(&ns->siblings);
+ if (list_empty(&ns->head->list)) {
+ list_del_init(&ns->head->entry);
+ last_path = true;
+ }
mutex_unlock(&ns->ctrl->subsys->lock);
synchronize_rcu(); /* guarantee not available in head->list */
@@ -3830,13 +3835,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
list_del_init(&ns->list);
up_write(&ns->ctrl->namespaces_rwsem);
- /* Synchronize with nvme_init_ns_head() */
- mutex_lock(&ns->head->subsys->lock);
- if (list_empty(&ns->head->list)) {
- list_del_init(&ns->head->entry);
- last_path = true;
- }
- mutex_unlock(&ns->head->subsys->lock);
if (last_path)
nvme_mpath_shutdown_disk(ns->head);
nvme_put_ns(ns);
--
2.29.2
On Mon, Aug 30, 2021 at 11:36:18AM +0200, Daniel Wagner wrote:
> Though one thing I am not really sure about is how this interacts with
> nvme_init_ns_head(), as we could be running nvme_init_ns_head()
> after we have set last_path = true. I haven't really figured
> out yet what this would mean. Is this a real problem?
I suspect it will regress the very thing 5396fdac56d8 ("nvme: fix
refcounting imbalance when all paths are down") tried to fix.
On 8/30/21 12:04 PM, Daniel Wagner wrote:
> On Mon, Aug 30, 2021 at 11:36:18AM +0200, Daniel Wagner wrote:
>> Though one thing I am not really sure about is how this interacts with
>> nvme_init_ns_head(), as we could be running nvme_init_ns_head()
>> after we have set last_path = true. I haven't really figured
>> out yet what this would mean. Is this a real problem?
>
> I suspect it will regress the very thing 5396fdac56d8 ("nvme: fix
> refcounting imbalance when all paths are down") tried to fix.
>
Most likely. Do drop me a mail on how to create a reproducer for that;
it's not exactly trivial as you need to patch qemu for that
(and, of course, those patches will not go upstream as they again hit a
section which the maintainer deemed to be reworked any time now. So of
course he can't possibly apply them.)
(I seem to have a particular spell of bad luck, seeing that it's the
_third_ time this happened to me :-( )
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
On Mon, Aug 30, 2021 at 07:14:02PM +0200, Hannes Reinecke wrote:
> On 8/30/21 12:04 PM, Daniel Wagner wrote:
> > On Mon, Aug 30, 2021 at 11:36:18AM +0200, Daniel Wagner wrote:
> > > Though one thing I am not really sure about is how this interacts with
> > > nvme_init_ns_head(), as we could be running nvme_init_ns_head()
> > > after we have set last_path = true. I haven't really figured
> > > out yet what this would mean. Is this a real problem?
> >
> > I suspect it will regress the very thing 5396fdac56d8 ("nvme: fix
> > refcounting imbalance when all paths are down") tried to fix.
> >
> Most likely. Do drop me a mail on how to create a reproducer for that; it's
> not exactly trivial as you need to patch qemu for that
> (and, of course, those patches will not go upstream as they again hit a
> section which the maintainer deemed to be reworked any time now. So of
> course he can't possibly apply them.)
> (I seem to have a particular spell of bad luck, seeing that it's the _third_
> time this happened to me :-( )
Soo. What is the problem in simply checking in nvme_find_ns_head that
h->list is non-empty? E.g. this variant of the patch from Daniel:
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d535b00d65816..ce91655fa29bb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3523,7 +3523,9 @@ static struct nvme_ns_head *nvme_find_ns_head(struct nvme_subsystem *subsys,
lockdep_assert_held(&subsys->lock);
list_for_each_entry(h, &subsys->nsheads, entry) {
- if (h->ns_id == nsid && nvme_tryget_ns_head(h))
+ if (h->ns_id != nsid)
+ continue;
+ if (!list_empty(&h->list) && nvme_tryget_ns_head(h))
return h;
}
@@ -3835,7 +3837,11 @@ static void nvme_ns_remove(struct nvme_ns *ns)
mutex_lock(&ns->ctrl->subsys->lock);
list_del_rcu(&ns->siblings);
- mutex_unlock(&ns->ctrl->subsys->lock);
+ if (list_empty(&ns->head->list)) {
+ list_del_init(&ns->head->entry);
+ last_path = true;
+ }
+ mutex_unlock(&ns->head->subsys->lock);
/* guarantee not available in head->list */
synchronize_rcu();
@@ -3855,13 +3861,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
list_del_init(&ns->list);
up_write(&ns->ctrl->namespaces_rwsem);
- /* Synchronize with nvme_init_ns_head() */
- mutex_lock(&ns->head->subsys->lock);
- if (list_empty(&ns->head->list)) {
- list_del_init(&ns->head->entry);
- last_path = true;
- }
- mutex_unlock(&ns->head->subsys->lock);
if (last_path)
nvme_mpath_shutdown_disk(ns->head);
nvme_put_ns(ns);
>> Most likely. Do drop me a mail on how to create a reproducer for that; it's
>> not exactly trivial as you need to patch qemu for that
>> (and, of course, those patches will not go upstream as they again hit a
>> section which the maintainer deemed to be reworked any time now. So of
>> course he can't possibly apply them.)
>> (I seem to have a particular spell of bad luck, seeing that it's the _third_
>> time this happened to me :-( )
>
> Soo. What is the problem in simply checking in nvme_find_ns_head that
> h->list is non-empty? E.g. this variant of the patch from Daniel:
Don't see why this won't work...
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d535b00d65816..ce91655fa29bb 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3523,7 +3523,9 @@ static struct nvme_ns_head *nvme_find_ns_head(struct nvme_subsystem *subsys,
> lockdep_assert_held(&subsys->lock);
>
> list_for_each_entry(h, &subsys->nsheads, entry) {
> - if (h->ns_id == nsid && nvme_tryget_ns_head(h))
> + if (h->ns_id != nsid)
> + continue;
> + if (!list_empty(&h->list) && nvme_tryget_ns_head(h))
> return h;
> }
>
> @@ -3835,7 +3837,11 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>
> mutex_lock(&ns->ctrl->subsys->lock);
> list_del_rcu(&ns->siblings);
> - mutex_unlock(&ns->ctrl->subsys->lock);
> + if (list_empty(&ns->head->list)) {
> + list_del_init(&ns->head->entry);
> + last_path = true;
> + }
> + mutex_unlock(&ns->head->subsys->lock);
>
> /* guarantee not available in head->list */
> synchronize_rcu();
> @@ -3855,13 +3861,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
> list_del_init(&ns->list);
> up_write(&ns->ctrl->namespaces_rwsem);
>
> - /* Synchronize with nvme_init_ns_head() */
> - mutex_lock(&ns->head->subsys->lock);
> - if (list_empty(&ns->head->list)) {
> - list_del_init(&ns->head->entry);
> - last_path = true;
> - }
> - mutex_unlock(&ns->head->subsys->lock);
> if (last_path)
> nvme_mpath_shutdown_disk(ns->head);
> nvme_put_ns(ns);
>