Hi,
this is a follow-up to the crash I reported in
https://lore.kernel.org/linux-block/[email protected]/
By moving the hardware check up, the crash is gone. Unfortunately, I
don't understand why this fixes the crash: the per-cpu access is what
crashes, but I can't see how blk_mq_update_nr_hw_queues() fixes that.
Even though I can't explain why it fixes the crash, I think it makes
sense to update the hardware queue mapping before we recreate the IO
queues. That's why the commit message avoids claiming that it fixes
anything.
Also, during testing I observed that we hang indefinitely in
blk_mq_freeze_queue_wait(). Again, I can't explain why we get stuck
there, but given that the common pattern is to use nvme_wait_freeze()
with a timeout, I think we should use the timeout variant here too :)
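Concretely, I am thinking of the bounded variant the other fabrics
transports already use (a sketch; the exact timeout value and error
handling are up for discussion):

	/*
	 * Bound the wait instead of blocking forever in
	 * blk_mq_freeze_queue_wait(); nvme_wait_freeze_timeout()
	 * returns 0 if the timeout expired before the queues froze.
	 */
	if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
		/*
		 * The frozen state was never reached; fail the
		 * reconnect instead of hanging indefinitely.
		 */
		ret = -ENODEV;
		goto out_free_io_queues;
	}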
Anyway, hopefully someone with a better understanding of the stack can
explain these problems.
Thanks,
Daniel
Daniel Wagner (2):
nvme-fc: Update hardware queues before using them
nvme-fc: Wait with a timeout for queue to freeze
drivers/nvme/host/fc.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
--
2.29.2
On Fri, Jun 25, 2021 at 12:16:47PM +0200, Daniel Wagner wrote:
> this is a follow-up to the crash I reported in
>
> https://lore.kernel.org/linux-block/[email protected]/
>
> By moving the hardware check up, the crash is gone. Unfortunately, I
> don't understand why this fixes the crash: the per-cpu access is what
> crashes, but I can't see how blk_mq_update_nr_hw_queues() fixes that.
>
> Even though I can't explain why it fixes the crash, I think it makes
> sense to update the hardware queue mapping before we recreate the IO
> queues. That's why the commit message avoids claiming that it fixes
> anything.
I just discussed this with Hannes and we figured out how the crash is
fixed by moving blk_mq_update_nr_hw_queues() before
nvme_fc_create_hw_io_queues()/nvme_fc_connect_io_queues().
First of all, blk_mq_update_nr_hw_queues() operates on the normal
tag_set and not on the admin_tag_set. That means when we move
blk_mq_update_nr_hw_queues() before nvme_fc_connect_io_queues(), the
mapping is updated so that it only contains CPUs and hctxs which are
actually available. When we then issue the connect call via
nvmf_connect_io_queue(), we only allocate tags from queues which are
no longer in the BLK_MQ_S_INACTIVE state, and hence we skip the
blk_mq_put_tag() call.
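For reference, the tag allocation path in question looks roughly like
this (condensed from blk_mq_get_tag() in block/blk-mq-tag.c, so only
approximate):

	tag = __blk_mq_get_tag(data, bt);

	/*
	 * If the hctx was marked inactive (all of its CPUs offline),
	 * the freshly allocated tag is handed back via blk_mq_put_tag(),
	 * which dereferences the per-cpu ctx; that is the access that
	 * blew up while the mapping was stale.
	 */
	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))) {
		blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
		return BLK_MQ_NO_TAG;
	}

With the mapping updated before the connect, the hctx we allocate from
is active and this branch is never taken.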
> Also, during testing I observed that we hang indefinitely in
> blk_mq_freeze_queue_wait(). Again, I can't explain why we get stuck
> there, but given that the common pattern is to use nvme_wait_freeze()
> with a timeout, I think we should use the timeout variant here too :)
The nvme_wait_freeze() is probably not needed at all:
__blk_mq_update_nr_hw_queues() already calls blk_mq_freeze_queue(), so
the explicit wait is redundant. Furthermore, if we move
blk_mq_update_nr_hw_queues() in front of nvme_fc_create_hw_io_queues(),
there can't be any pending I/Os because there are no queues yet.
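For reference, the core already brackets the whole update with a
freeze/unfreeze pair, roughly (condensed from
__blk_mq_update_nr_hw_queues() in block/blk-mq.c):

	list_for_each_entry(q, &set->tag_list, tag_set_list)
		blk_mq_freeze_queue(q);

	/* ... reallocate/remap hw contexts, update set->nr_hw_queues ... */

	list_for_each_entry(q, &set->tag_list, tag_set_list)
		blk_mq_unfreeze_queue(q);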
On Fri, Jun 25, 2021 at 02:21:56PM +0200, Daniel Wagner wrote:
> On Fri, Jun 25, 2021 at 12:16:47PM +0200, Daniel Wagner wrote:
> > this is a follow-up to the crash I reported in
> >
> > https://lore.kernel.org/linux-block/[email protected]/
> >
> > By moving the hardware check up, the crash is gone. Unfortunately, I
> > don't understand why this fixes the crash: the per-cpu access is what
> > crashes, but I can't see how blk_mq_update_nr_hw_queues() fixes that.
> >
> > Even though I can't explain why it fixes the crash, I think it makes
> > sense to update the hardware queue mapping before we recreate the IO
> > queues. That's why the commit message avoids claiming that it fixes
> > anything.
>
> I just discussed this with Hannes and we figured out how the crash is
> fixed by moving blk_mq_update_nr_hw_queues() before
> nvme_fc_create_hw_io_queues()/nvme_fc_connect_io_queues().
>
> First of all, blk_mq_update_nr_hw_queues() operates on the normal
> tag_set and not on the admin_tag_set. That means when we move
> blk_mq_update_nr_hw_queues() before nvme_fc_connect_io_queues(), the
> mapping is updated so that it only contains CPUs and hctxs which are
> actually available. When we then issue the connect call via
> nvmf_connect_io_queue(), we only allocate tags from queues which are
> no longer in the BLK_MQ_S_INACTIVE state, and hence we skip the
> blk_mq_put_tag() call.
Your patch just reduces the race window: what if all CPUs in
hctx->cpumask become offline while blk_mq_alloc_request_hctx() is
running?
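The window is in blk_mq_alloc_request_hctx() itself, roughly (condensed
and approximate):

	data.hctx = q->queue_hw_ctx[hctx_idx];
	if (!blk_mq_hw_queue_mapped(data.hctx))
		goto out_queue_exit;

	/*
	 * Nothing prevents the CPUs in hctx->cpumask from going offline
	 * right here; cpumask_first_and() then returns nr_cpu_ids and
	 * the per-cpu ctx lookup below runs off the end of the per-cpu
	 * area.
	 */
	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
	data.ctx = __blk_mq_get_ctx(q, cpu);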
Thanks,
Ming
On Fri, Jun 25, 2021 at 9:00 PM Ming Lei <[email protected]> wrote:
>
> On Fri, Jun 25, 2021 at 02:21:56PM +0200, Daniel Wagner wrote:
> > On Fri, Jun 25, 2021 at 12:16:47PM +0200, Daniel Wagner wrote:
> > > this is a follow-up to the crash I reported in
> > >
> > > https://lore.kernel.org/linux-block/[email protected]/
> > >
> > > By moving the hardware check up, the crash is gone. Unfortunately, I
> > > don't understand why this fixes the crash: the per-cpu access is what
> > > crashes, but I can't see how blk_mq_update_nr_hw_queues() fixes that.
> > >
> > > Even though I can't explain why it fixes the crash, I think it makes
> > > sense to update the hardware queue mapping before we recreate the IO
> > > queues. That's why the commit message avoids claiming that it fixes
> > > anything.
> >
> > I just discussed this with Hannes and we figured out how the crash is
> > fixed by moving blk_mq_update_nr_hw_queues() before
> > nvme_fc_create_hw_io_queues()/nvme_fc_connect_io_queues().
> >
> > First of all, blk_mq_update_nr_hw_queues() operates on the normal
> > tag_set and not on the admin_tag_set. That means when we move
> > blk_mq_update_nr_hw_queues() before nvme_fc_connect_io_queues(), the
> > mapping is updated so that it only contains CPUs and hctxs which are
> > actually available. When we then issue the connect call via
> > nvmf_connect_io_queue(), we only allocate tags from queues which are
> > no longer in the BLK_MQ_S_INACTIVE state, and hence we skip the
> > blk_mq_put_tag() call.
>
> Your patch just reduces the race window: what if all CPUs in
> hctx->cpumask become offline while blk_mq_alloc_request_hctx() is
> running?
Connecting the io queues after updating nr_hw_queues causes the
correct hctx_idx to be passed to blk_mq_alloc_request_hctx(), so the
patch looks good.
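Roughly, the connect loop hands the queue id straight through as the
hctx index (condensed and approximate):

	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
		/*
		 * nvmf_connect_io_queue() submits the connect command
		 * on hctx i - 1 via blk_mq_alloc_request_hctx(); with
		 * nr_hw_queues updated first, that index is always
		 * inside the updated map.
		 */
		ret = nvmf_connect_io_queue(&ctrl->ctrl, i, false);
		if (ret)
			break;
	}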
Yeah, there is still another issue that is not covered during cpu
hotplug, but that is a different problem from this one.
Thanks,