On Tue, May 30, 2017 at 05:23:46PM +0300, Rakesh Pandit wrote:
> We can fix the user-space-triggered set_features higher up, e.g. in
> nvme_ioctl, by putting the same check there. Introducing a separate
> state NVME_CTRL_SCHED_RESET (being discussed in another thread) has the
> additional advantage of making sure that only one thread goes through
> reset and eventually through removal (if required), and it solves a lot
> of problems.
I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
sure some time in the past I already had it in a local tree as a
generalization of how rdma and loop already use NVME_CTRL_RESETTING
(they set it before queueing the reset work).
But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
related to this patch?
On Thu, Jun 01, 2017 at 01:43:38PM +0200, Christoph Hellwig wrote:
> On Tue, May 30, 2017 at 05:23:46PM +0300, Rakesh Pandit wrote:
> > We can fix the user-space-triggered set_features higher up, e.g. in
> > nvme_ioctl, by putting the same check there. Introducing a separate
> > state NVME_CTRL_SCHED_RESET (being discussed in another thread) has the
> > additional advantage of making sure that only one thread goes through
> > reset and eventually through removal (if required), and it solves a lot
> > of problems.
>
> I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> sure some time in the past I already had it in a local tree as a
> generalization of how rdma and loop already use NVME_CTRL_RESETTING
> (they set it before queueing the reset work).
>
> But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> related to this patch?
They aren't related. Sorry for the confusion. I intended to answer
another thread but instead wrote it here.
On Thu, Jun 01, 2017 at 03:28:18PM +0300, Rakesh Pandit wrote:
> > I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> > sure some time in the past I already had it in a local tree as a
> > generalization of how rdma and loop already use NVME_CTRL_RESETTING
> > (they set it before queueing the reset work).
> >
> > But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> > related to this patch?
>
> They aren't related. Sorry for the confusion. I intended to answer
> another thread but instead wrote it here.
Ok, thanks. But I think the point from Sagi that we should move
the state check all the way down to submit_sync_command still makes
sense to me.
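To make that concrete, I mean something along these lines. This is an
untested sketch only: the wrapper name is made up, the real sync-command
helpers in core.c take a request_queue rather than a ctrl (so the
controller would have to be derived from the queue or passed in by the
caller), and the exact set of rejected states is up for discussion.

/*
 * Illustrative sketch: refuse to issue sync commands once the
 * controller is being torn down, as close as possible to the place
 * where they are built and submitted.
 */
static bool nvme_ctrl_is_dying(struct nvme_ctrl *ctrl)
{
	return ctrl->state == NVME_CTRL_DELETING ||
	       ctrl->state == NVME_CTRL_DEAD;
}

static int nvme_submit_sync_cmd_checked(struct nvme_ctrl *ctrl,
		struct request_queue *q, struct nvme_command *cmd,
		void *buffer, unsigned bufflen)
{
	if (nvme_ctrl_is_dying(ctrl))
		return -ENODEV;

	return nvme_submit_sync_cmd(q, cmd, buffer, bufflen);
}

With a gate like that at the bottom, callers such as nvme_ioctl would not
need their own checks, because every sync submission path ends up in the
same place.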
On Thu, Jun 01, 2017 at 03:28:18PM +0300, Rakesh Pandit wrote:
> On Thu, Jun 01, 2017 at 01:43:38PM +0200, Christoph Hellwig wrote:
> > On Tue, May 30, 2017 at 05:23:46PM +0300, Rakesh Pandit wrote:
> > > We can fix the user-space-triggered set_features higher up, e.g. in
> > > nvme_ioctl, by putting the same check there. Introducing a separate
> > > state NVME_CTRL_SCHED_RESET (being discussed in another thread) has the
> > > additional advantage of making sure that only one thread goes through
> > > reset and eventually through removal (if required), and it solves a lot
> > > of problems.
> >
> > I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> > sure some time in the past I already had it in a local tree as a
> > > generalization of how rdma and loop already use NVME_CTRL_RESETTING
> > (they set it before queueing the reset work).
> >
> > But I don't fully understand how the NVME_CTRL_SCHED_RESET fix is
> > related to this patch?
>
> They aren't related. Sorry for the confusion. I intended to answer
> another thread but instead wrote it here.
>
Sagi also pointed out that a user-space set_features ioctl fired in a
window after nvme_removal can also result in this issue, which seems
correct. I would prefer to keep this as it is and introduce a similar
check higher up in nvme_ioctl instead, so that we don't send sync
commands if the queues are already killed.
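Concretely, the check I have in mind would be something like the sketch
below. It is untested and only meant to show the idea; which states to
reject, and whether -ENODEV is the right error, are open for discussion.

static int nvme_ioctl(struct block_device *bdev, fmode_t mode,
		unsigned int cmd, unsigned long arg)
{
	struct nvme_ns *ns = bdev->bd_disk->private_data;

	/*
	 * Untested sketch: bail out before building sync commands for a
	 * controller whose queues have already been killed, so user space
	 * cannot trigger set_features (or anything else) at that point.
	 */
	if (ns->ctrl->state == NVME_CTRL_DELETING ||
	    ns->ctrl->state == NVME_CTRL_DEAD)
		return -ENODEV;

	switch (cmd) {
	case NVME_IOCTL_ADMIN_CMD:
		return nvme_user_cmd(ns->ctrl, NULL, (void __user *)arg);
	/* ... remaining cases unchanged ... */
	default:
		return -ENOTTY;
	}
}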
Would you prefer a patch? Thanks,
On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> Sagi also pointed out that a user-space set_features ioctl fired in a
> window after nvme_removal can also result in this issue, which seems
> correct. I would prefer to keep this as it is and introduce a similar
> check higher up in nvme_ioctl instead, so that we don't send sync
> commands if the queues are already killed.
>
> Would you prefer a patch? Thanks,
If we want to kill everyone we probably should do it in ->queue_rq.
Or is the block layer blocking you somewhere else?
On Thu, Jun 01, 2017 at 02:46:32PM +0200, Christoph Hellwig wrote:
> On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> > Sagi also pointed out that a user-space set_features ioctl fired in a
> > window after nvme_removal can also result in this issue, which seems
> > correct. I would prefer to keep this as it is and introduce a similar
> > check higher up in nvme_ioctl instead, so that we don't send sync
> > commands if the queues are already killed.
> >
> > Would you prefer a patch? Thanks,
>
> If we want to kill everyone we probably should do it in ->queue_rq.
Looks like ->queue_rq has already done that by checking nvmeq->cq_vector.
> Or is the block layer blocking you somewhere else?
blk-mq doesn't handle dying in the I/O path.
Maybe it is similar to commit 806f026f9b901eaf1a ("nvme: use
blk_mq_start_hw_queues() in nvme_kill_queues()"); it seems we need to
do it for the admin_q too.
Can the following change fix the issue?
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e44326d5cf19..360758488124 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
+	blk_mq_start_hw_queues(ctrl->admin_q);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		/*
 		 * Revalidating a dead namespace sets capacity to 0. This will
Thanks,
Ming
On Thu, Jun 01, 2017 at 10:56:10PM +0800, Ming Lei wrote:
> On Thu, Jun 01, 2017 at 02:46:32PM +0200, Christoph Hellwig wrote:
> > On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> > > Sagi also pointed out that a user-space set_features ioctl fired in a
> > > window after nvme_removal can also result in this issue, which seems
> > > correct. I would prefer to keep this as it is and introduce a similar
> > > check higher up in nvme_ioctl instead, so that we don't send sync
> > > commands if the queues are already killed.
> > >
> > > Would you prefer a patch? Thanks,
> >
> > If we want to kill everyone we probably should do it in ->queue_rq.
>
> Looks like ->queue_rq has already done that by checking nvmeq->cq_vector.
>
> > Or is the block layer blocking you somewhere else?
>
> blk-mq doesn't handle dying in the I/O path.
>
> Maybe it is similar to commit 806f026f9b901eaf1a ("nvme: use
> blk_mq_start_hw_queues() in nvme_kill_queues()"); it seems we need to
> do it for the admin_q too.
>
> Can the following change fix the issue?
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e44326d5cf19..360758488124 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns;
>  
>  	mutex_lock(&ctrl->namespaces_mutex);
> +	blk_mq_start_hw_queues(ctrl->admin_q);
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>  		/*
>  		 * Revalidating a dead namespace sets capacity to 0. This will
>
>
Yes, the change fixes the issue.
On Thu, Jun 01, 2017 at 10:33:04PM +0300, Rakesh Pandit wrote:
> On Thu, Jun 01, 2017 at 10:56:10PM +0800, Ming Lei wrote:
> > On Thu, Jun 01, 2017 at 02:46:32PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 01, 2017 at 03:36:50PM +0300, Rakesh Pandit wrote:
> > > > Sagi also pointed out that a user-space set_features ioctl fired in a
> > > > window after nvme_removal can also result in this issue, which seems
> > > > correct. I would prefer to keep this as it is and introduce a similar
> > > > check higher up in nvme_ioctl instead, so that we don't send sync
> > > > commands if the queues are already killed.
> > > >
> > > > Would you prefer a patch? Thanks,
> > >
> > > If we want to kill everyone we probably should do it in ->queue_rq.
> >
> > Looks like ->queue_rq has already done that by checking nvmeq->cq_vector.
> >
> > > Or is the block layer blocking you somewhere else?
> >
> > blk-mq doesn't handle dying in the I/O path.
> >
> > Maybe it is similar to commit 806f026f9b901eaf1a ("nvme: use
> > blk_mq_start_hw_queues() in nvme_kill_queues()"); it seems we need to
> > do it for the admin_q too.
> >
> > Can the following change fix the issue?
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index e44326d5cf19..360758488124 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2438,6 +2438,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
> >  	struct nvme_ns *ns;
> >  
> >  	mutex_lock(&ctrl->namespaces_mutex);
> > +	blk_mq_start_hw_queues(ctrl->admin_q);
> >  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> >  		/*
> >  		 * Revalidating a dead namespace sets capacity to 0. This will
> >
> >
>
> Yes, the change fixes the issue.
Rakesh, thanks for your test, and I will prepare a formal patch for
merging.
thanks,
Ming
> I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> sure some time in the past I already had it in a local tree as a
> generalization of how rdma and loop already use NVME_CTRL_RESETTING
> (they set it before queueing the reset work).
I don't remember having it, but I might be wrong...
Can you explain again why you think we need it? Sorry for being
difficult, but I'm not exactly sure why it makes things better
or simpler.
On Sun, Jun 04, 2017 at 06:28:15PM +0300, Sagi Grimberg wrote:
>
>> I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
>> sure some time in the past I already had it in a local tree as a
>> generalization of how rdma and loop already use NVME_CTRL_RESETTING
>> (they set it before queueing the reset work).
>
> I don't remember having it, but I might be wrong...
>
> Can you explain again why you think we need it? Sorry for being
> difficult, but I'm not exactly sure why it makes things better
> or simpler.
Mostly that we can treat a controller as under reset before scheduling
the reset work, both to prevent multiple schedules and to make checks
like the one in nvme_should_reset robust.
But I think something along the lines of the earlier patch from
Rakesh that just sets the RESETTING state earlier + the cancel_work_sync
suggested by you should also work for that purpose. So maybe that's
the way to go after all.
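Very roughly, I am thinking of something like the sketch below for the
PCI driver. It is a sketch from memory, not the patch to be posted: the
function and workqueue names follow the current pci.c, but treat the
details as assumptions.

/*
 * Sketch only: mark the controller RESETTING before the reset work is
 * queued, so a second reset attempt (or a check like nvme_should_reset())
 * sees it, and make removal wait for any reset already in flight.
 */
static int nvme_reset(struct nvme_dev *dev)
{
	if (!dev->ctrl.admin_q || blk_queue_dying(dev->ctrl.admin_q))
		return -ENODEV;
	/* Only the thread that wins the state change owns the reset. */
	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
		return -EBUSY;
	if (!queue_work(nvme_workq, &dev->reset_work))
		return -EBUSY;
	return 0;
}

static void nvme_remove(struct pci_dev *pdev)
{
	struct nvme_dev *dev = pci_get_drvdata(pdev);

	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
	/* Don't tear the device down underneath a reset still running. */
	cancel_work_sync(&dev->reset_work);
	/* ... rest of the existing teardown unchanged ... */
}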
On Mon, Jun 05, 2017 at 10:18:17AM +0200, Christoph Hellwig wrote:
> On Sun, Jun 04, 2017 at 06:28:15PM +0300, Sagi Grimberg wrote:
> >
> >> I think we need the NVME_CTRL_SCHED_RESET state. In fact I'm pretty
> >> sure some time in the past I already had it in a local tree as a
> >> generalization of how rdma and loop already use NVME_CTRL_RESETTING
> >> (they set it before queueing the reset work).
> >
> > I don't remember having it, but I might be wrong...
> >
> > Can you explain again why you think we need it? Sorry for being
> > difficult, but I'm not exactly sure why it makes things better
> > or simpler.
>
> Mostly that we can treat a controller as under reset before scheduling
> the reset work, both to prevent multiple schedules and to make checks
> like the one in nvme_should_reset robust.
>
> But I think something along the lines of the earlier patch from
> Rakesh that just sets the RESETTING state earlier + the cancel_work_sync
> suggested by you should also work for that purpose. So maybe that's
> the way to go after all.
I will post a new patch, after testing, which includes my earlier
RESETTING-state patch plus the cancel_work_sync which Sagi suggested.

Sagi: because my earlier RESETTING patch is a subset of your untested
patch with cancel_work_sync, it would be logical to take a Signed-off-by
from you as well. Could you review/ack/nack the patch? Feel free to let
me know if you want me to change it further, or whether you would rather
post it as the author yourself.
I am okay with either as long as we fix the issue.
Rakesh,
> I will post a new patch, after testing, which includes my earlier
> RESETTING-state patch plus the cancel_work_sync which Sagi suggested.
>
> Sagi: because my earlier RESETTING patch is a subset of your untested
> patch with cancel_work_sync, it would be logical to take a Signed-off-by
> from you as well. Could you review/ack/nack the patch? Feel free to let
> me know if you want me to change it further, or whether you would rather
> post it as the author yourself.
It's all yours :)
Send a patch and we'll review it, thanks.