2022-07-12 13:09:42

by Niklas Schnelle

[permalink] [raw]
Subject: [PATCH 1/1] nvme-pci: fix hang during error recovery when the PCI device is isolated

On s390 and powerpc PCI devices are isolated when an error is detected
and driver->err_handler->error_detected is called with an inaccessible
PCI device and PCI channel state set to pci_channel_io_frozen
(see Step 1 in Documentation/PCI/pci-error-recovery.rst).

In the case of NVMe devices nvme_error_detected() then calls
nvme_dev_disable(dev, false) and requests a reset. After a successful
reset the device is accessible again and nvme_slot_reset() resets the
controller and queues nvme_reset_work() which then recovers the
controller.

Since commit b98235d3a471 ("nvme-pci: harden drive presence detect in
nvme_dev_disable()") however nvme_dev_disable() no longer freezes the
queues if pci_device_is_present() returns false. This is the case for an
isolated PCI device. In principle this makes sense as there are no
accessible hardware queues to run. The problem though is that for
a previously live reset controller with online queues nvme_reset_work()
calls nvme_wait_freeze() which, without the freeze having been
initiated, then hangs forever. Fix this by starting the freeze in
nvme_slot_reset() which is the earliest point where we know the device
should be accessible again.

Fixes: b98235d3a471 ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
Signed-off-by: Niklas Schnelle <[email protected]>
---
drivers/nvme/host/pci.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 193b44755662..7c0c61b74c30 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3399,6 +3399,7 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
dev_info(dev->ctrl.device, "restart after slot reset\n");
pci_restore_state(pdev);
nvme_reset_ctrl(&dev->ctrl);
+ nvme_start_freeze(&dev->ctrl);
return PCI_ERS_RESULT_RECOVERED;
}

--
2.34.1


2022-07-12 14:02:05

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: fix hang during error recovery when the PCI device is isolated

On 7/12/22 14:44, Niklas Schnelle wrote:
> On s390 and powerpc PCI devices are isolated when an error is detected
> and driver->err_handler->error_detected is called with an inaccessible
> PCI device and PCI channel state set to pci_channel_io_frozen
> (see Step 1 in Documentation/PCI/pci-error-recovery.rst).
>
> In the case of NVMe devices nvme_error_detected() then calls
> nvme_dev_disable(dev, false) and requests a reset. After a successful
> reset the device is accessible again and nvme_slot_reset() resets the
> controller and queues nvme_reset_work() which then recovers the
> controller.
>
> Since commit b98235d3a471 ("nvme-pci: harden drive presence detect in
> nvme_dev_disable()") however nvme_dev_disable() no longer freezes the
> queues if pci_device_is_present() returns false. This is the case for an
> isolated PCI device. In principle this makes sense as there are no
> accessible hardware queues to run. The problem though is that for
> a previously live reset controller with online queues nvme_reset_work()
> calls nvme_wait_freeze() which, without the freeze having been
> initiated, then hangs forever. Fix this by starting the freeze in
> nvme_slot_reset() which is the earliest point where we know the device
> should be accessible again.
>
> Fixes: b98235d3a471 ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
> Signed-off-by: Niklas Schnelle <[email protected]>
> ---
> drivers/nvme/host/pci.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 193b44755662..7c0c61b74c30 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3399,6 +3399,7 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
> dev_info(dev->ctrl.device, "restart after slot reset\n");
> pci_restore_state(pdev);
> nvme_reset_ctrl(&dev->ctrl);
> + nvme_start_freeze(&dev->ctrl);
> return PCI_ERS_RESULT_RECOVERED;
> }
>
I am not sure if that's the right fix.
From your description the hang occurs as nvme_reset_ctrl() is calling
nvme_wait_freeze() without an corresponding nvme_start_freeze().
So why are you calling it _after_ the call to nvme_reset_ctrl()?

Cheers,

Hannes

2022-07-12 14:53:44

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: fix hang during error recovery when the PCI device is isolated

On Tue, 2022-07-12 at 15:49 +0200, Hannes Reinecke wrote:
> On 7/12/22 14:44, Niklas Schnelle wrote:
> > On s390 and powerpc PCI devices are isolated when an error is detected
> > and driver->err_handler->error_detected is called with an inaccessible
> > PCI device and PCI channel state set to pci_channel_io_frozen
> > (see Step 1 in Documentation/PCI/pci-error-recovery.rst).
> >
> > In the case of NVMe devices nvme_error_detected() then calls
> > nvme_dev_disable(dev, false) and requests a reset. After a successful
> > reset the device is accessible again and nvme_slot_reset() resets the
> > controller and queues nvme_reset_work() which then recovers the
> > controller.
> >
> > Since commit b98235d3a471 ("nvme-pci: harden drive presence detect in
> > nvme_dev_disable()") however nvme_dev_disable() no longer freezes the
> > queues if pci_device_is_present() returns false. This is the case for an
> > isolated PCI device. In principle this makes sense as there are no
> > accessible hardware queues to run. The problem though is that for
> > a previously live reset controller with online queues nvme_reset_work()
> > calls nvme_wait_freeze() which, without the freeze having been
> > initiated, then hangs forever. Fix this by starting the freeze in
> > nvme_slot_reset() which is the earliest point where we know the device
> > should be accessible again.
> >
> > Fixes: b98235d3a471 ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
> > Signed-off-by: Niklas Schnelle <[email protected]>
> > ---
> > drivers/nvme/host/pci.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 193b44755662..7c0c61b74c30 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -3399,6 +3399,7 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
> > dev_info(dev->ctrl.device, "restart after slot reset\n");
> > pci_restore_state(pdev);
> > nvme_reset_ctrl(&dev->ctrl);
> > + nvme_start_freeze(&dev->ctrl);
> > return PCI_ERS_RESULT_RECOVERED;
> > }
> >
> I am not sure if that's the right fix.
> From your description the hang occurs as nvme_reset_ctrl() is calling
> nvme_wait_freeze() without an corresponding nvme_start_freeze().
> So why are you calling it _after_ the call to nvme_reset_ctrl()?
>
> Cheers,
>
> Hannes

Hmm, the call chain that used to have the nvme_start_freeze()
is nvme_error_detected()->nvme_dev_disable()->nvme_start_freeze().

With the referenced commit that nvme_start_freeze() no longer happens
because the nvme_error_detected callback occurs before the reset when
the device is still inaccessible (as mentioned Step 1 in
Documentation/PCI/pci-error-recovery.rst).

There is indeed another nvme_dev_disable() call in nvme_reset_work()
but in the nvme_slot_reset() path this comes before
pci_enable_device_mem() was called so that also doesn't do the
nvme_start_freeze(). I also tried doing the nvme_start_freeze() there
but at least in my test that broke /sys/bus/pci/devices/<dev>/reset,
though not entirely sure why since nvme_start_freeze() looks like a no-
op when done a second time. I'm also not sure though what the right
approach is here though.

2022-07-12 14:56:45

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: fix hang during error recovery when the PCI device is isolated

On Tue, 2022-07-12 at 08:17 -0600, Keith Busch wrote:
> On Tue, Jul 12, 2022 at 02:44:53PM +0200, Niklas Schnelle wrote:
> > On s390 and powerpc PCI devices are isolated when an error is detected
> > and driver->err_handler->error_detected is called with an inaccessible
> > PCI device and PCI channel state set to pci_channel_io_frozen
> > (see Step 1 in Documentation/PCI/pci-error-recovery.rst).
> >
> > In the case of NVMe devices nvme_error_detected() then calls
> > nvme_dev_disable(dev, false) and requests a reset. After a successful
> > reset the device is accessible again and nvme_slot_reset() resets the
> > controller and queues nvme_reset_work() which then recovers the
> > controller.
> >
> > Since commit b98235d3a471 ("nvme-pci: harden drive presence detect in
> > nvme_dev_disable()") however nvme_dev_disable() no longer freezes the
> > queues if pci_device_is_present() returns false. This is the case for an
> > isolated PCI device. In principle this makes sense as there are no
> > accessible hardware queues to run. The problem though is that for
> > a previously live reset controller with online queues nvme_reset_work()
> > calls nvme_wait_freeze() which, without the freeze having been
> > initiated, then hangs forever. Fix this by starting the freeze in
> > nvme_slot_reset() which is the earliest point where we know the device
> > should be accessible again.
> >
> > Fixes: b98235d3a471 ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
> > Signed-off-by: Niklas Schnelle <[email protected]>
>
> Oh, we've messed up the expected sequence. The mistaken assumption is a device
> not present means we're about to unbind from it, but it could appear that way
> just for normal error handling and reset, so we need to preserve the previous
> handling.
>
> The offending commit really just wants to avoid the register access (which we
> shouldn't have to do, but hey, broken hardware...). So let's keep the sequence
> the same as before and just skip the register read. Does this work for you?

Ah thanks for the explanation! I had actually tested a similar patch
but wasn't sure if nvme_start_freeze() also does register access for
starting the HW queues and if it makes sense on a dead/isolated device
at all. On the other hand this code very explicitly handles dead
devices so I guess this was kept in mind.

So yes the below patch works for me.

>
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index fdfee3e590db..c40e82cee735 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> static void nvme_dev_remove_admin(struct nvme_dev *dev)
> @@ -2690,9 +2772,11 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
> struct pci_dev *pdev = to_pci_dev(dev->dev);
>
> mutex_lock(&dev->shutdown_lock);
> - if (pci_device_is_present(pdev) && pci_is_enabled(pdev)) {
> - u32 csts = readl(dev->bar + NVME_REG_CSTS);
> + if (pci_is_enabled(pdev)) {
> + u32 csts = ~0;
>
> + if (pci_device_is_present(pdev))
> + csts = readl(dev->bar + NVME_REG_CSTS);
> if (dev->ctrl.state == NVME_CTRL_LIVE ||
> dev->ctrl.state == NVME_CTRL_RESETTING) {
> freeze = true;
> --


2022-07-12 15:00:24

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: fix hang during error recovery when the PCI device is isolated

On Tue, Jul 12, 2022 at 02:44:53PM +0200, Niklas Schnelle wrote:
> On s390 and powerpc PCI devices are isolated when an error is detected
> and driver->err_handler->error_detected is called with an inaccessible
> PCI device and PCI channel state set to pci_channel_io_frozen
> (see Step 1 in Documentation/PCI/pci-error-recovery.rst).
>
> In the case of NVMe devices nvme_error_detected() then calls
> nvme_dev_disable(dev, false) and requests a reset. After a successful
> reset the device is accessible again and nvme_slot_reset() resets the
> controller and queues nvme_reset_work() which then recovers the
> controller.
>
> Since commit b98235d3a471 ("nvme-pci: harden drive presence detect in
> nvme_dev_disable()") however nvme_dev_disable() no longer freezes the
> queues if pci_device_is_present() returns false. This is the case for an
> isolated PCI device. In principle this makes sense as there are no
> accessible hardware queues to run. The problem though is that for
> a previously live reset controller with online queues nvme_reset_work()
> calls nvme_wait_freeze() which, without the freeze having been
> initiated, then hangs forever. Fix this by starting the freeze in
> nvme_slot_reset() which is the earliest point where we know the device
> should be accessible again.
>
> Fixes: b98235d3a471 ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
> Signed-off-by: Niklas Schnelle <[email protected]>

Oh, we've messed up the expected sequence. The mistaken assumption is a device
not present means we're about to unbind from it, but it could appear that way
just for normal error handling and reset, so we need to preserve the previous
handling.

The offending commit really just wants to avoid the register access (which we
shouldn't have to do, but hey, broken hardware...). So let's keep the sequence
the same as before and just skip the register read. Does this work for you?

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fdfee3e590db..c40e82cee735 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
static void nvme_dev_remove_admin(struct nvme_dev *dev)
@@ -2690,9 +2772,11 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
struct pci_dev *pdev = to_pci_dev(dev->dev);

mutex_lock(&dev->shutdown_lock);
- if (pci_device_is_present(pdev) && pci_is_enabled(pdev)) {
- u32 csts = readl(dev->bar + NVME_REG_CSTS);
+ if (pci_is_enabled(pdev)) {
+ u32 csts = ~0;

+ if (pci_device_is_present(pdev))
+ csts = readl(dev->bar + NVME_REG_CSTS);
if (dev->ctrl.state == NVME_CTRL_LIVE ||
dev->ctrl.state == NVME_CTRL_RESETTING) {
freeze = true;
--

2022-07-18 09:39:58

by Stefan Roese

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: fix hang during error recovery when the PCI device is isolated

On 12.07.22 16:17, Keith Busch wrote:
> On Tue, Jul 12, 2022 at 02:44:53PM +0200, Niklas Schnelle wrote:
>> On s390 and powerpc PCI devices are isolated when an error is detected
>> and driver->err_handler->error_detected is called with an inaccessible
>> PCI device and PCI channel state set to pci_channel_io_frozen
>> (see Step 1 in Documentation/PCI/pci-error-recovery.rst).
>>
>> In the case of NVMe devices nvme_error_detected() then calls
>> nvme_dev_disable(dev, false) and requests a reset. After a successful
>> reset the device is accessible again and nvme_slot_reset() resets the
>> controller and queues nvme_reset_work() which then recovers the
>> controller.
>>
>> Since commit b98235d3a471 ("nvme-pci: harden drive presence detect in
>> nvme_dev_disable()") however nvme_dev_disable() no longer freezes the
>> queues if pci_device_is_present() returns false. This is the case for an
>> isolated PCI device. In principle this makes sense as there are no
>> accessible hardware queues to run. The problem though is that for
>> a previously live reset controller with online queues nvme_reset_work()
>> calls nvme_wait_freeze() which, without the freeze having been
>> initiated, then hangs forever. Fix this by starting the freeze in
>> nvme_slot_reset() which is the earliest point where we know the device
>> should be accessible again.
>>
>> Fixes: b98235d3a471 ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
>> Signed-off-by: Niklas Schnelle <[email protected]>
>
> Oh, we've messed up the expected sequence. The mistaken assumption is a device
> not present means we're about to unbind from it, but it could appear that way
> just for normal error handling and reset, so we need to preserve the previous
> handling.
>
> The offending commit really just wants to avoid the register access (which we
> shouldn't have to do, but hey, broken hardware...).

Correct.

> So let's keep the sequence
> the same as before and just skip the register read. Does this work for you?
>
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index fdfee3e590db..c40e82cee735 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> static void nvme_dev_remove_admin(struct nvme_dev *dev)
> @@ -2690,9 +2772,11 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
> struct pci_dev *pdev = to_pci_dev(dev->dev);
>
> mutex_lock(&dev->shutdown_lock);
> - if (pci_device_is_present(pdev) && pci_is_enabled(pdev)) {
> - u32 csts = readl(dev->bar + NVME_REG_CSTS);
> + if (pci_is_enabled(pdev)) {
> + u32 csts = ~0;
>
> + if (pci_device_is_present(pdev))
> + csts = readl(dev->bar + NVME_REG_CSTS);
> if (dev->ctrl.state == NVME_CTRL_LIVE ||
> dev->ctrl.state == NVME_CTRL_RESETTING) {
> freeze = true;
> --


Thanks. Looks good to me. So if anyone whats to send a proper patch with
this change, feel free to add my:

Reviewed-by: Stefan Roese <[email protected]>

Thanks,
Stefan