2017-06-01 11:43:40

by Christoph Hellwig

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Tue, May 30, 2017 at 05:23:46PM +0300, Rakesh Pandit wrote:
> We can fix the user-space-triggered set_features higher up, e.g. in
> nvme_ioctl, by putting the same check there. Introducing a separate
> state NVME_CTRL_SCHED_RESET (being discussed in another thread) has
> the additional advantage of making sure that only one thread goes
> through the reset and, eventually, through removal (if required),
> and solves a lot of problems.

I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
sure I already had it in a local tree some time in the past, as a
generalization of how rdma and loop already use NVME_CTRL_RESETTING
(they set it before queueing the reset work).
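
For reference, the pattern rdma and loop use today looks roughly like
the minimal sketch below; the reset_wq workqueue and the reset_work
member are illustrative names, not the exact driver code:

/*
 * Sketch of "mark the controller RESETTING before queueing the reset
 * work": nvme_change_ctrl_state() fails for every caller but the
 * first, so the reset work cannot be scheduled twice.  reset_wq and
 * ctrl->reset_work are illustrative names only.
 */
static int nvme_sketch_schedule_reset(struct nvme_ctrl *ctrl)
{
	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
		return -EBUSY;	/* a reset is already in progress */
	if (!queue_work(reset_wq, &ctrl->reset_work))
		return -EBUSY;
	return 0;
}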

But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
related to this patch?


2017-06-01 12:28:21

by Rakesh Pandit

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Thu, Jun 01, 2017 at 01:43:38PM +0200, Christoph Hellwig wrote:
> On Tue, May 30, 2017 at 05:23:46PM +0300, Rakesh Pandit wrote:
> > We can fix the user-space-triggered set_features higher up, e.g. in
> > nvme_ioctl, by putting the same check there. Introducing a separate
> > state NVME_CTRL_SCHED_RESET (being discussed in another thread) has
> > the additional advantage of making sure that only one thread goes
> > through the reset and, eventually, through removal (if required),
> > and solves a lot of problems.
>
> I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> sure I already had it in a local tree some time in the past, as a
> generalization of how rdma and loop already use NVME_CTRL_RESETTING
> (they set it before queueing the reset work).
>
> But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> related to this patch?

They aren't related. Sorry for the confusion. I intended to reply to
another thread but wrote it here instead.

2017-06-01 12:35:16

by Christoph Hellwig

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Thu, Jun 01, 2017 at 03:28:18PM +0300, Rakesh Pandit wrote:
> > I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> > sure I already had it in a local tree some time in the past, as a
> > generalization of how rdma and loop already use NVME_CTRL_RESETTING
> > (they set it before queueing the reset work).
> >
> > But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> > related to this patch?
>
> They aren't related. Sorry for the confusion. I intended to reply to
> another thread but wrote it here instead.

Ok, thanks. But Sagi's point that we should move the state check all
the way down to submit_sync_command still makes sense to me.
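
I.e. a guard at the very bottom of the sync path, roughly like the
untested sketch below; the wrapper name and the error code are
assumptions, not a concrete proposal:

/*
 * Untested sketch: refuse new synchronous commands once the
 * controller is dead, in one central place instead of in every
 * caller.  The wrapper name and -ENODEV are illustrative.
 */
static int nvme_sketch_sync_cmd(struct nvme_ctrl *ctrl,
		struct nvme_command *cmd, void *buffer, unsigned bufflen)
{
	if (ctrl->state == NVME_CTRL_DEAD)
		return -ENODEV;

	return __nvme_submit_sync_cmd(ctrl->admin_q, cmd, NULL,
			buffer, bufflen, 0, NVME_QID_ANY, 0, 0);
}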

2017-06-01 12:36:53

by Rakesh Pandit

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Thu, Jun 01, 2017 at 03:28:18PM +0300, Rakesh Pandit wrote:
> On Thu, Jun 01, 2017 at 01:43:38PM +0200, Christoph Hellwig wrote:
> > On Tue, May 30, 2017 at 05:23:46PM +0300, Rakesh Pandit wrote:
> > > We can fix the user-space-triggered set_features higher up, e.g. in
> > > nvme_ioctl, by putting the same check there. Introducing a separate
> > > state NVME_CTRL_SCHED_RESET (being discussed in another thread) has
> > > the additional advantage of making sure that only one thread goes
> > > through the reset and, eventually, through removal (if required),
> > > and solves a lot of problems.
> >
> > I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> > sure I already had it in a local tree some time in the past, as a
> > generalization of how rdma and loop already use NVME_CTRL_RESETTING
> > (they set it before queueing the reset work).
> >
> > But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> > related to this patch?
>
> They aren't related. Sorry for the confusion. I intended to reply to
> another thread but wrote it here instead.
>

Also, Sagi's point that a user-space set_features ioctl fired in a
window after nvme_remove can also trigger this issue seems to be
correct. I would prefer to keep this patch as it is and introduce a
similar check higher up in nvme_ioctl instead, so that we don't send
sync commands if the queues have already been killed.
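
The check would sit at the top of nvme_ioctl, roughly like this
untested sketch (which controller states to reject is still an open
question):

/*
 * Untested sketch of the proposed nvme_ioctl guard: bail out before
 * building a sync command if the controller is already dead, so the
 * ioctl path cannot block on killed queues.
 */
static int nvme_ioctl_sketch(struct block_device *bdev, fmode_t mode,
		unsigned int cmd, unsigned long arg)
{
	struct nvme_ns *ns = bdev->bd_disk->private_data;

	if (ns->ctrl->state == NVME_CTRL_DEAD)
		return -ENODEV;

	/* ... the existing NVME_IOCTL_* dispatch continues here ... */
	return -ENOTTY;
}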

Would you prefer a patch? Thanks,

2017-06-01 12:46:34

by Christoph Hellwig

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> Also, Sagi's point that a user-space set_features ioctl fired in a
> window after nvme_remove can also trigger this issue seems to be
> correct. I would prefer to keep this patch as it is and introduce a
> similar check higher up in nvme_ioctl instead, so that we don't send
> sync commands if the queues have already been killed.
>
> Would you prefer a patch? Thanks,

If we want to fail every command we probably should do it in ->queue_rq.
Or is the block layer blocking you somewhere else?

2017-06-01 14:56:32

by Ming Lei

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Thu, Jun 01, 2017 at 02:46:32PM +0200, Christoph Hellwig wrote:
> On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> > Also, Sagi's point that a user-space set_features ioctl fired in a
> > window after nvme_remove can also trigger this issue seems to be
> > correct. I would prefer to keep this patch as it is and introduce a
> > similar check higher up in nvme_ioctl instead, so that we don't send
> > sync commands if the queues have already been killed.
> >
> > Would you prefer a patch? Thanks,
>
> If we want to fail every command we probably should do it in ->queue_rq.

It looks like ->queue_rq does that already, by checking nvmeq->cq_vector.
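
From memory it is roughly the following, early in the PCIe driver's
nvme_queue_rq (paraphrased, not a verbatim quote):

	/*
	 * A disabled queue has cq_vector set to -1, so new requests
	 * are failed under q_lock instead of being issued to dead
	 * hardware.  (Paraphrased from nvme_queue_rq.)
	 */
	spin_lock_irq(&nvmeq->q_lock);
	if (unlikely(nvmeq->cq_vector < 0)) {
		spin_unlock_irq(&nvmeq->q_lock);
		return BLK_MQ_RQ_QUEUE_ERROR;
	}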

> Or is the block layer blocking you somewhere else?

blk-mq doesn't handle dying in the I/O path.

Maybe it is similar to 806f026f9b901eaf1a ("nvme: use
blk_mq_start_hw_queues() in nvme_kill_queues()"); it seems we need to
do the same for the admin_q too.

Does the following change fix the issue?

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e44326d5cf19..360758488124 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
+	blk_mq_start_hw_queues(ctrl->admin_q);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		/*
 		 * Revalidating a dead namespace sets capacity to 0. This will


Thanks,
Ming

2017-06-01 19:33:08

by Rakesh Pandit

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Thu, Jun 01, 2017 at 10:56:10PM +0800, Ming Lei wrote:
> On Thu, Jun 01, 2017 at 02:46:32PM +0200, Christoph Hellwig wrote:
> > On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> > > Also, Sagi's point that a user-space set_features ioctl fired in a
> > > window after nvme_remove can also trigger this issue seems to be
> > > correct. I would prefer to keep this patch as it is and introduce a
> > > similar check higher up in nvme_ioctl instead, so that we don't send
> > > sync commands if the queues have already been killed.
> > >
> > > Would you prefer a patch? Thanks,
> >
> > If we want to fail every command we probably should do it in ->queue_rq.
>
> It looks like ->queue_rq does that already, by checking nvmeq->cq_vector.
>
> > Or is the block layer blocking you somewhere else?
>
> blk-mq doesn't handle dying in the I/O path.
>
> Maybe it is similar to 806f026f9b901eaf1a ("nvme: use
> blk_mq_start_hw_queues() in nvme_kill_queues()"); it seems we need to
> do the same for the admin_q too.
>
> Does the following change fix the issue?
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e44326d5cf19..360758488124 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns;
>  
>  	mutex_lock(&ctrl->namespaces_mutex);
> +	blk_mq_start_hw_queues(ctrl->admin_q);
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>  		/*
>  		 * Revalidating a dead namespace sets capacity to 0. This will
>
>

Yes, the change fixes the issue.

2017-06-02 01:42:24

by Ming Lei

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Thu, Jun 01, 2017 at 10:33:04PM +0300, Rakesh Pandit wrote:
> On Thu, Jun 01, 2017 at 10:56:10PM +0800, Ming Lei wrote:
> > On Thu, Jun 01, 2017 at 02:46:32PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> > > > Also, Sagi's point that a user-space set_features ioctl fired in a
> > > > window after nvme_remove can also trigger this issue seems to be
> > > > correct. I would prefer to keep this patch as it is and introduce a
> > > > similar check higher up in nvme_ioctl instead, so that we don't send
> > > > sync commands if the queues have already been killed.
> > > >
> > > > Would you prefer a patch? Thanks,
> > >
> > > If we want to fail every command we probably should do it in ->queue_rq.
> >
> > It looks like ->queue_rq does that already, by checking nvmeq->cq_vector.
> >
> > > Or is the block layer blocking you somewhere else?
> >
> > blk-mq doesn't handle dying in the I/O path.
> >
> > Maybe it is similar to 806f026f9b901eaf1a ("nvme: use
> > blk_mq_start_hw_queues() in nvme_kill_queues()"); it seems we need to
> > do the same for the admin_q too.
> >
> > Does the following change fix the issue?
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index e44326d5cf19..360758488124 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
> >  	struct nvme_ns *ns;
> >  
> >  	mutex_lock(&ctrl->namespaces_mutex);
> > +	blk_mq_start_hw_queues(ctrl->admin_q);
> >  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> >  		/*
> >  		 * Revalidating a dead namespace sets capacity to 0. This will
> >
> >
>
> Yes, the change fixes the issue.

Rakesh, thanks for your test; I will prepare a formal patch for
merging.

thanks,
Ming

2017-06-04 15:28:23

by Sagi Grimberg

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever


> I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> sure I already had it in a local tree some time in the past, as a
> generalization of how rdma and loop already use NVME_CTRL_RESETTING
> (they set it before queueing the reset work).

I don't remember having it, but I might be wrong...

Can you explain again why you think we need it? Sorry for being
difficult, but I'm not exactly sure why it makes things better
or simpler.

2017-06-05 08:18:20

by Christoph Hellwig

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Sun, Jun 04, 2017 at 06:28:15PM +0300, Sagi Grimberg wrote:
>
>> I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
>> sure I already had it in a local tree some time in the past, as a
>> generalization of how rdma and loop already use NVME_CTRL_RESETTING
>> (they set it before queueing the reset work).
>
> I don't remember having it, but I might be wrong...
>
> Can you explain again why you think we need it? Sorry for being
> difficult, but I'm not exactly sure why it makes things better
> or simpler.

Mostly that we can treat a controller as under reset before scheduling
the reset work, both to prevent multiple schedules and to make checks
like the one in nvme_should_reset robust.

But I think something along the lines of Rakesh's earlier patch, which
just sets the RESETTING state earlier, plus the cancel_work_sync you
suggested, should also work for that purpose. So maybe that's the way
to go after all.
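
In concrete terms, something like the rough sketch below on the
removal side; exactly where the two calls land is still to be
decided:

/*
 * Rough sketch of the combined approach: move the controller to
 * RESETTING first so no new reset can be scheduled, then wait
 * synchronously for any reset work that is already in flight.
 */
static void nvme_sketch_prepare_remove(struct nvme_dev *dev)
{
	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING);
	cancel_work_sync(&dev->reset_work);
}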

2017-06-05 10:52:20

by Rakesh Pandit

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

On Mon, Jun 05, 2017 at 10:18:17AM +0200, Christoph Hellwig wrote:
> On Sun, Jun 04, 2017 at 06:28:15PM +0300, Sagi Grimberg wrote:
> >
> >> I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> >> sure I already had it in a local tree some time in the past, as a
> >> generalization of how rdma and loop already use NVME_CTRL_RESETTING
> >> (they set it before queueing the reset work).
> >
> > I don't remember having it, but I might be wrong...
> >
> > Can you explain again why you think we need it? Sorry for being
> > difficult, but I'm not exactly sure why it makes things better
> > or simpler.
>
> Mostly that we can treat a controller as under reset before scheduling
> the reset work, both to prevent multiple schedules and to make checks
> like the one in nvme_should_reset robust.
>
> But I think something along the lines of Rakesh's earlier patch, which
> just sets the RESETTING state earlier, plus the cancel_work_sync you
> suggested, should also work for that purpose. So maybe that's the way
> to go after all.

I will post a new patch that combines my earlier RESETTING-state
patch with the cancel_work_sync Sagi suggested, after testing it.

Sagi: because my earlier RESETTING patch is a subset of your untested
patch with cancel_work_sync, it would be logical to get a sign-off
from you as well. Could you review/ack/nack the patch? Feel free to
let me know if you want me to change it further, or if you would
rather post it as author.

I am okay with either, as long as we fix the issue.

2017-06-05 11:09:28

by Sagi Grimberg

Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep for ever

Rakesh,

> I will post a new patch that combines my earlier RESETTING-state
> patch with the cancel_work_sync Sagi suggested, after testing it.
>
> Sagi: because my earlier RESETTING patch is a subset of your untested
> patch with cancel_work_sync, it would be logical to get a sign-off
> from you as well. Could you review/ack/nack the patch? Feel free to
> let me know if you want me to change it further, or if you would
> rather post it as author.

It's all yours :)

Send a patch and we'll review it, thanks.