2023-04-03 07:57:56

by Niklas Schnelle

[permalink] [raw]
Subject: [PATCH] net/mlx5: stop waiting for PCI link if reset is required

after an error on the PCI link, the driver does not need to wait
for the link to become functional again as a reset is required. Stop
the wait loop in this case to accelerate the recovery flow.

Co-developed-by: Alexander Schmidt <[email protected]>
Signed-off-by: Alexander Schmidt <[email protected]>
Signed-off-by: Niklas Schnelle <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index f9438d4e43ca..81ca44e0705a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -325,6 +325,8 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
while (sensor_pci_not_working(dev)) {
if (time_after(jiffies, end))
return -ETIMEDOUT;
+ if (pci_channel_offline(dev->pdev))
+ return -EIO;
msleep(100);
}
return 0;
@@ -332,10 +334,16 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)

static int mlx5_health_try_recover(struct mlx5_core_dev *dev)
{
+ int rc;
+
mlx5_core_warn(dev, "handling bad device here\n");
mlx5_handle_bad_state(dev);
- if (mlx5_health_wait_pci_up(dev)) {
- mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
+ rc = mlx5_health_wait_pci_up(dev);
+ if (rc) {
+ if (rc == -ETIMEDOUT)
+ mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
+ else
+ mlx5_core_err(dev, "health recovery flow aborted, PCI channel offline\n");
return -EIO;
}
mlx5_core_err(dev, "starting health recovery flow\n");

base-commit: 7e364e56293bb98cae1b55fd835f5991c4e96e7d
--
2.37.2


2023-04-03 18:28:05

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH] net/mlx5: stop waiting for PCI link if reset is required

On Mon, Apr 03, 2023 at 09:56:56AM +0200, Niklas Schnelle wrote:
> after an error on the PCI link, the driver does not need to wait
> for the link to become functional again as a reset is required. Stop
> the wait loop in this case to accelerate the recovery flow.
>
> Co-developed-by: Alexander Schmidt <[email protected]>
> Signed-off-by: Alexander Schmidt <[email protected]>
> Signed-off-by: Niklas Schnelle <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> index f9438d4e43ca..81ca44e0705a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> @@ -325,6 +325,8 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
> while (sensor_pci_not_working(dev)) {

According to the comment in sensor_pci_not_working(), this loop is
supposed to wait till PCI will be ready again. Otherwise, already in
first iteration, we will bail out with pci_channel_offline() error.

Thanks

> if (time_after(jiffies, end))
> return -ETIMEDOUT;
> + if (pci_channel_offline(dev->pdev))
> + return -EIO;
> msleep(100);
> }
> return 0;
> @@ -332,10 +334,16 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
>
> static int mlx5_health_try_recover(struct mlx5_core_dev *dev)
> {
> + int rc;
> +
> mlx5_core_warn(dev, "handling bad device here\n");
> mlx5_handle_bad_state(dev);
> - if (mlx5_health_wait_pci_up(dev)) {
> - mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
> + rc = mlx5_health_wait_pci_up(dev);
> + if (rc) {
> + if (rc == -ETIMEDOUT)
> + mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
> + else
> + mlx5_core_err(dev, "health recovery flow aborted, PCI channel offline\n");
> return -EIO;
> }
> mlx5_core_err(dev, "starting health recovery flow\n");
>
> base-commit: 7e364e56293bb98cae1b55fd835f5991c4e96e7d
> --
> 2.37.2
>

2023-04-04 15:35:29

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH] net/mlx5: stop waiting for PCI link if reset is required

On Mon, 2023-04-03 at 21:21 +0300, Leon Romanovsky wrote:
> On Mon, Apr 03, 2023 at 09:56:56AM +0200, Niklas Schnelle wrote:
> > after an error on the PCI link, the driver does not need to wait
> > for the link to become functional again as a reset is required. Stop
> > the wait loop in this case to accelerate the recovery flow.
> >
> > Co-developed-by: Alexander Schmidt <[email protected]>
> > Signed-off-by: Alexander Schmidt <[email protected]>
> > Signed-off-by: Niklas Schnelle <[email protected]>
> > ---
> > drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > index f9438d4e43ca..81ca44e0705a 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > @@ -325,6 +325,8 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
> > while (sensor_pci_not_working(dev)) {
>
> According to the comment in sensor_pci_not_working(), this loop is
> supposed to wait till PCI will be ready again. Otherwise, already in
> first iteration, we will bail out with pci_channel_offline() error.
>
> Thanks

Well yes. The problem is that this works for intermittent errors
including when the card resets itself which seems to be the use case in
mlx5_fw_reset_complete_reload() and mlx5_devlink_reload_fw_activate().
If there is a PCI error that requires a link reset though we see some
problems though it does work after running into the timeout.

As I understand it and as implemented at least on s390,
pci_channel_io_frozen is only set for fatal errors that require a reset
while non fatal errors will have pci_channel_io_normal (see also
Documentation/PCI/pcieaer-howto.rst) thus I think pci_channel_offline()
should only be true if a reset is required or there is a permanent
error. Furthermore in the pci_channel_io_frozen state the PCI function
may be isolated and the reads will not reach the endpoint, this is the
case at least on s390. Thus for errors requiring a reset the loop
without pci_channel_offline() will run until the reset is performed or
the timeout is reached. In the mlx5_health_try_recover() case during
error recovery we will then indeed always loop until timeout, because
the loop blocks mlx5_pci_err_detected() from returning thus blocking
the reset (see Documentation/PCI/pci-error-recovery.rst). Adding Bjorn,
maybe he can confirm or correct my assumptions here.

Thanks,
Niklas

>
> > if (time_after(jiffies, end))
> > return -ETIMEDOUT;
> > + if (pci_channel_offline(dev->pdev))
> > + return -EIO;
> > msleep(100);
> > }
> > return 0;
> > @@ -332,10 +334,16 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
> >
> > static int mlx5_health_try_recover(struct mlx5_core_dev *dev)
> > {
> > + int rc;
> > +
> > mlx5_core_warn(dev, "handling bad device here\n");
> > mlx5_handle_bad_state(dev);
> > - if (mlx5_health_wait_pci_up(dev)) {
> > - mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
> > + rc = mlx5_health_wait_pci_up(dev);
> > + if (rc) {
> > + if (rc == -ETIMEDOUT)
> > + mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
> > + else
> > + mlx5_core_err(dev, "health recovery flow aborted, PCI channel offline\n");
> > return -EIO;
> > }
> > mlx5_core_err(dev, "starting health recovery flow\n");
> >
> > base-commit: 7e364e56293bb98cae1b55fd835f5991c4e96e7d
> > --
> > 2.37.2
> >

2023-04-05 21:09:29

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] net/mlx5: stop waiting for PCI link if reset is required

On Tue, Apr 04, 2023 at 05:27:35PM +0200, Niklas Schnelle wrote:
> On Mon, 2023-04-03 at 21:21 +0300, Leon Romanovsky wrote:
> > On Mon, Apr 03, 2023 at 09:56:56AM +0200, Niklas Schnelle wrote:
> > > after an error on the PCI link, the driver does not need to wait
> > > for the link to become functional again as a reset is required. Stop
> > > the wait loop in this case to accelerate the recovery flow.
> > >
> > > Co-developed-by: Alexander Schmidt <[email protected]>
> > > Signed-off-by: Alexander Schmidt <[email protected]>
> > > Signed-off-by: Niklas Schnelle <[email protected]>
> > > ---
> > > drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
> > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > index f9438d4e43ca..81ca44e0705a 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > @@ -325,6 +325,8 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
> > > while (sensor_pci_not_working(dev)) {
> >
> > According to the comment in sensor_pci_not_working(), this loop is
> > supposed to wait till PCI will be ready again. Otherwise, already in
> > first iteration, we will bail out with pci_channel_offline() error.
>
> Well yes. The problem is that this works for intermittent errors
> including when the card resets itself which seems to be the use case in
> mlx5_fw_reset_complete_reload() and mlx5_devlink_reload_fw_activate().
> If there is a PCI error that requires a link reset though we see some
> problems though it does work after running into the timeout.
>
> As I understand it and as implemented at least on s390,
> pci_channel_io_frozen is only set for fatal errors that require a reset
> while non fatal errors will have pci_channel_io_normal (see also
> Documentation/PCI/pcieaer-howto.rst)

Yes, I think that's true, see handle_error_source().

> thus I think pci_channel_offline()
> should only be true if a reset is required or there is a permanent
> error.

Yes, I think pci_channel_offline() will only be true when a fatal
error has been reported via AER or DPC (or a hotplug driver says the
device has been removed). The driver resetting the device should not
cause such a fatal error.

> Furthermore in the pci_channel_io_frozen state the PCI function
> may be isolated and the reads will not reach the endpoint, this is the
> case at least on s390. Thus for errors requiring a reset the loop
> without pci_channel_offline() will run until the reset is performed or
> the timeout is reached. In the mlx5_health_try_recover() case during
> error recovery we will then indeed always loop until timeout, because
> the loop blocks mlx5_pci_err_detected() from returning thus blocking
> the reset (see Documentation/PCI/pci-error-recovery.rst). Adding Bjorn,
> maybe he can confirm or correct my assumptions here.

> > > if (time_after(jiffies, end))
> > > return -ETIMEDOUT;
> > > + if (pci_channel_offline(dev->pdev))
> > > + return -EIO;
> > > msleep(100);
> > > }
> > > return 0;
> > > @@ -332,10 +334,16 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
> > >
> > > static int mlx5_health_try_recover(struct mlx5_core_dev *dev)
> > > {
> > > + int rc;
> > > +
> > > mlx5_core_warn(dev, "handling bad device here\n");
> > > mlx5_handle_bad_state(dev);
> > > - if (mlx5_health_wait_pci_up(dev)) {
> > > - mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
> > > + rc = mlx5_health_wait_pci_up(dev);
> > > + if (rc) {
> > > + if (rc == -ETIMEDOUT)
> > > + mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
> > > + else
> > > + mlx5_core_err(dev, "health recovery flow aborted, PCI channel offline\n");
> > > return -EIO;
> > > }
> > > mlx5_core_err(dev, "starting health recovery flow\n");
> > >
> > > base-commit: 7e364e56293bb98cae1b55fd835f5991c4e96e7d
> > > --
> > > 2.37.2
> > >
>

2023-04-09 08:55:10

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH] net/mlx5: stop waiting for PCI link if reset is required

On Wed, Apr 05, 2023 at 04:06:13PM -0500, Bjorn Helgaas wrote:
> On Tue, Apr 04, 2023 at 05:27:35PM +0200, Niklas Schnelle wrote:
> > On Mon, 2023-04-03 at 21:21 +0300, Leon Romanovsky wrote:
> > > On Mon, Apr 03, 2023 at 09:56:56AM +0200, Niklas Schnelle wrote:
> > > > after an error on the PCI link, the driver does not need to wait
> > > > for the link to become functional again as a reset is required. Stop
> > > > the wait loop in this case to accelerate the recovery flow.
> > > >
> > > > Co-developed-by: Alexander Schmidt <[email protected]>
> > > > Signed-off-by: Alexander Schmidt <[email protected]>
> > > > Signed-off-by: Niklas Schnelle <[email protected]>
> > > > ---
> > > > drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
> > > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > > index f9438d4e43ca..81ca44e0705a 100644
> > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > > @@ -325,6 +325,8 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
> > > > while (sensor_pci_not_working(dev)) {
> > >
> > > According to the comment in sensor_pci_not_working(), this loop is
> > > supposed to wait till PCI will be ready again. Otherwise, already in
> > > first iteration, we will bail out with pci_channel_offline() error.
> >
> > Well yes. The problem is that this works for intermittent errors
> > including when the card resets itself which seems to be the use case in
> > mlx5_fw_reset_complete_reload() and mlx5_devlink_reload_fw_activate().
> > If there is a PCI error that requires a link reset though we see some
> > problems though it does work after running into the timeout.
> >
> > As I understand it and as implemented at least on s390,
> > pci_channel_io_frozen is only set for fatal errors that require a reset
> > while non fatal errors will have pci_channel_io_normal (see also
> > Documentation/PCI/pcieaer-howto.rst)
>
> Yes, I think that's true, see handle_error_source().
>
> > thus I think pci_channel_offline()
> > should only be true if a reset is required or there is a permanent
> > error.
>
> Yes, I think pci_channel_offline() will only be true when a fatal
> error has been reported via AER or DPC (or a hotplug driver says the
> device has been removed). The driver resetting the device should not
> cause such a fatal error.

Thank you for an explanation and confirmation.

2023-04-09 08:59:12

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH] net/mlx5: stop waiting for PCI link if reset is required

On Mon, Apr 03, 2023 at 09:56:56AM +0200, Niklas Schnelle wrote:
> after an error on the PCI link, the driver does not need to wait
> for the link to become functional again as a reset is required. Stop
> the wait loop in this case to accelerate the recovery flow.
>
> Co-developed-by: Alexander Schmidt <[email protected]>
> Signed-off-by: Alexander Schmidt <[email protected]>
> Signed-off-by: Niklas Schnelle <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>

The subject line should include target for netdev patches: [PATCH net-next] ....

Thanks,
Reviewed-by: Leon Romanovsky <[email protected]>

2023-04-11 10:15:35

by Niklas Schnelle

[permalink] [raw]
Subject: Re: [PATCH] net/mlx5: stop waiting for PCI link if reset is required

On Sun, 2023-04-09 at 11:55 +0300, Leon Romanovsky wrote:
> On Mon, Apr 03, 2023 at 09:56:56AM +0200, Niklas Schnelle wrote:
> > after an error on the PCI link, the driver does not need to wait
> > for the link to become functional again as a reset is required. Stop
> > the wait loop in this case to accelerate the recovery flow.
> >
> > Co-developed-by: Alexander Schmidt <[email protected]>
> > Signed-off-by: Alexander Schmidt <[email protected]>
> > Signed-off-by: Niklas Schnelle <[email protected]>
> > ---
> > drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
>
> The subject line should include target for netdev patches: [PATCH net-next] ....
>
> Thanks,
> Reviewed-by: Leon Romanovsky <[email protected]>

Thanks, I'll sent a v2 with your R-b, the correct net-next prefix and a
Link to this discussion.