2024-05-28 10:04:45

by Ricky Wu

[permalink] [raw]
Subject: [PATCH] ice: irdma hardware init failed after suspend/resume

A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
that irdma would break and report hardware initialization failed after
suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).

The problem is caused due to the collision between the irq numbers
requested in irdma and the irq numbers requested in other drivers
after suspend/resume.

The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
which stores mappings between MSI-X index and Linux interrupt number.
It's supposed to be cleaned up when suspend and rebuilt in resume but
it's not, causing irdma using the old irq numbers stored in the old
ice_pf->msix_entries to request_irq() when resume. And eventually
collide with other drivers.

This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
irdma if we've dynamically allocated them). On Resume, we call
ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
MSI-X vectors if we would like to dynamically allocate them).

Signed-off-by: Ricky Wu <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f60c022f7960..ec3cbadaa162 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
*/
disabled = ice_service_task_stop(pf);

- ice_unplug_aux_dev(pf);
+ ice_deinit_rdma(pf);

/* Already suspended?, then there is nothing to do */
if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
@@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
if (ret)
dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);

+ ret = ice_init_rdma(pf);
+ if (ret)
+ dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
+
clear_bit(ICE_DOWN, pf->state);
/* Now perform PF reset and rebuild */
reset_type = ICE_RESET_PFR;
--
2.43.0



2024-05-28 10:43:30

by Wojciech Drewek

[permalink] [raw]
Subject: Re: [PATCH] ice: irdma hardware init failed after suspend/resume



On 28.05.2024 12:03, Ricky Wu wrote:
> A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> that irdma would break and report hardware initialization failed after
> suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
>
> The problem is caused due to the collision between the irq numbers
> requested in irdma and the irq numbers requested in other drivers
> after suspend/resume.
>
> The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> which stores mappings between MSI-X index and Linux interrupt number.
> It's supposed to be cleaned up when suspend and rebuilt in resume but
> it's not, causing irdma using the old irq numbers stored in the old
> ice_pf->msix_entries to request_irq() when resume. And eventually
> collide with other drivers.
>
> This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> irdma if we've dynamically allocated them). On Resume, we call
> ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> MSI-X vectors if we would like to dynamically allocate them).
>
> Signed-off-by: Ricky Wu <[email protected]>
> ---

Thanks for the patch!

> drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index f60c022f7960..ec3cbadaa162 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
> */
> disabled = ice_service_task_stop(pf);
>
> - ice_unplug_aux_dev(pf);
> + ice_deinit_rdma(pf);

I think this should be called later in the reset path IMO.
You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),

>
> /* Already suspended?, then there is nothing to do */
> if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
> if (ret)
> dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>
> + ret = ice_init_rdma(pf);
> + if (ret)
> + dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);

And ice_init_rdma should be moved to ice_rebuild (replace ice_plug_aux_dev)

> +
> clear_bit(ICE_DOWN, pf->state);
> /* Now perform PF reset and rebuild */
> reset_type = ICE_RESET_PFR;

2024-05-28 10:44:06

by Michal Swiatkowski

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH] ice: irdma hardware init failed after suspend/resume

On Tue, May 28, 2024 at 06:03:15PM +0800, Ricky Wu wrote:
> A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> that irdma would break and report hardware initialization failed after
> suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
>
> The problem is caused due to the collision between the irq numbers
> requested in irdma and the irq numbers requested in other drivers
> after suspend/resume.
>
> The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> which stores mappings between MSI-X index and Linux interrupt number.
> It's supposed to be cleaned up when suspend and rebuilt in resume but
> it's not, causing irdma using the old irq numbers stored in the old
> ice_pf->msix_entries to request_irq() when resume. And eventually
> collide with other drivers.
>
> This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> irdma if we've dynamically allocated them). On Resume, we call
> ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> MSI-X vectors if we would like to dynamically allocate them).
>
> Signed-off-by: Ricky Wu <[email protected]>
> ---
> drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index f60c022f7960..ec3cbadaa162 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
> */
> disabled = ice_service_task_stop(pf);
>
> - ice_unplug_aux_dev(pf);
> + ice_deinit_rdma(pf);
>
> /* Already suspended?, then there is nothing to do */
> if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
> if (ret)
> dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>
> + ret = ice_init_rdma(pf);
> + if (ret)
> + dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
> +
> clear_bit(ICE_DOWN, pf->state);
> /* Now perform PF reset and rebuild */
> reset_type = ICE_RESET_PFR;

Looks fine, thanks for the fix. You can add fixes tag and target it to a
net tree.

Reviewed-by: Michal Swiatkowski <[email protected]>

> --
> 2.43.0
>

2024-05-28 11:13:14

by Paul Menzel

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH] ice: irdma hardware init failed after suspend/resume

Dear Ricky,


Thank you for your patch. Some minor nits. It’d be great if you made the
summary about the action and not an issue description. Maybe:

Avoid IRQ collision to fix init failure on ACPI S3 resume

Am 28.05.24 um 12:03 schrieb Ricky Wu:
> A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> that irdma would break and report hardware initialization failed after
> suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
>
> The problem is caused due to the collision between the irq numbers
> requested in irdma and the irq numbers requested in other drivers
> after suspend/resume.
>
> The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> which stores mappings between MSI-X index and Linux interrupt number.
> It's supposed to be cleaned up when suspend and rebuilt in resume but
> it's not, causing irdma using the old irq numbers stored in the old
> ice_pf->msix_entries to request_irq() when resume. And eventually
> collide with other drivers.
>
> This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> irdma if we've dynamically allocated them). On Resume, we call

resume

> ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> MSI-X vectors if we would like to dynamically allocate them).
>
> Signed-off-by: Ricky Wu <[email protected]>

Please add a Link: tag.

If this was tested by somebody else, please also add a Tested-by: line.

> ---
> drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index f60c022f7960..ec3cbadaa162 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
> */
> disabled = ice_service_task_stop(pf);
>
> - ice_unplug_aux_dev(pf);
> + ice_deinit_rdma(pf);
>
> /* Already suspended?, then there is nothing to do */
> if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
> if (ret)
> dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>
> + ret = ice_init_rdma(pf);
> + if (ret)
> + dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
> +
> clear_bit(ICE_DOWN, pf->state);
> /* Now perform PF reset and rebuild */
> reset_type = ICE_RESET_PFR;

What effect does this have on resume time?


Kind regards,

Paul

2024-05-29 03:18:24

by Ricky Wu

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH] ice: irdma hardware init failed after suspend/resume

Thanks for your kind and quick reply.

> I think this should be called later in the reset path IMO.
> You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),
I'm afraid this would break the existing code because in
ice_deinit_rdma(), it will remove some entries in
pf->irq_tracker.entries. And in ice_reinit_interrupt_scheme() (which
is called before ice_prepare_for_reset), some entries have been
allocated for other irq usage.

> What effect does this have on resume time?
When we call ice_init_rdma() at resume time, it will allocate entries
at pf->irq_tracker.entries and update pf->msix_entries for later use
(request_irq) by irdma.

On Tue, 28 May 2024 at 19:12, Paul Menzel <[email protected]> wrote:
>
> Dear Ricky,
>
>
> Thank you for your patch. Some minor nits. It’d be great if you made the
> summary about the action and not an issue description. Maybe:
>
> Avoid IRQ collision to fix init failure on ACPI S3 resume
>
> Am 28.05.24 um 12:03 schrieb Ricky Wu:
> > A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> > that irdma would break and report hardware initialization failed after
> > suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
> >
> > The problem is caused due to the collision between the irq numbers
> > requested in irdma and the irq numbers requested in other drivers
> > after suspend/resume.
> >
> > The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> > which stores mappings between MSI-X index and Linux interrupt number.
> > It's supposed to be cleaned up when suspend and rebuilt in resume but
> > it's not, causing irdma using the old irq numbers stored in the old
> > ice_pf->msix_entries to request_irq() when resume. And eventually
> > collide with other drivers.
> >
> > This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> > clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> > irdma if we've dynamically allocated them). On Resume, we call
>
> resume
>
> > ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> > MSI-X vectors if we would like to dynamically allocate them).
> >
> > Signed-off-by: Ricky Wu <[email protected]>
>
> Please add a Link: tag.
>
> If this was tested by somebody else, please also add a Tested-by: line.
>
> > ---
> > drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > index f60c022f7960..ec3cbadaa162 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
> > */
> > disabled = ice_service_task_stop(pf);
> >
> > - ice_unplug_aux_dev(pf);
> > + ice_deinit_rdma(pf);
> >
> > /* Already suspended?, then there is nothing to do */
> > if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> > @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
> > if (ret)
> > dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
> >
> > + ret = ice_init_rdma(pf);
> > + if (ret)
> > + dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
> > +
> > clear_bit(ICE_DOWN, pf->state);
> > /* Now perform PF reset and rebuild */
> > reset_type = ICE_RESET_PFR;
>
> What effect does this have on resume time?
>
>
> Kind regards,
>
> Paul

2024-05-29 20:20:05

by Paul Menzel

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH] ice: irdma hardware init failed after suspend/resume

Dear En-Wei,


Thank you for responding so quickly.

Am 29.05.24 um 05:17 schrieb En-Wei WU:

[…]

>> What effect does this have on resume time?
>
> When we call ice_init_rdma() at resume time, it will allocate entries
> at pf->irq_tracker.entries and update pf->msix_entries for later use
> (request_irq) by irdma.

Sorry for being unclear. I meant, does resuming the system take longer
now? (initcall_debug might give a clue.)


Kind regards,

Paul

2024-05-30 07:08:51

by Ricky Wu

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH] ice: irdma hardware init failed after suspend/resume

Thank you for your reply.

> Sorry for being unclear. I meant, does resuming the system take longer
> now? (initcall_debug might give a clue.)
I've tested the S3 suspend/resume with the initcall_debug kernel
command option, and it shows no clear difference between having or not
having the ice_init_rdma in ice_resume:
Without ice_init_rdma:
```
[ 104.241129] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned
0 after 9415 usecs
[ 104.241206] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned
0 after 9443 usecs
```
With ice_init_rdma:
```
[ 122.749022] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned
0 after 9485 usecs
[ 122.749068] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned
0 after 9532 usecs
```

> And ice_init_rdma should be moved to ice_rebuild (replace ice_plug_aux_dev)
We can defer the ice_init_rdma to the later service task by adopting this.

> You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),
It seems like we must call ice_deinit_rdma in ice_suspend. If we call
it in the later service task, it will:
1. break some existing code setup by ice_resume
2. Since the PCI-X vector table is flushed at the end of ice_suspend,
we have no way to release PCI-X vectors for rdma if we had allocated
it dynamically
The second point is important since we didn't release the PCI-X
vectors for rdma (if we allocated it dynamically) in the original
ice_suspend, and it's somewhat like a leak in the original code.

Best regards,
Ricky.

On Thu, 30 May 2024 at 04:19, Paul Menzel <[email protected]> wrote:
>
> Dear En-Wei,
>
>
> Thank you for responding so quickly.
>
> Am 29.05.24 um 05:17 schrieb En-Wei WU:
>
> […]
>
> >> What effect does this have on resume time?
> >
> > When we call ice_init_rdma() at resume time, it will allocate entries
> > at pf->irq_tracker.entries and update pf->msix_entries for later use
> > (request_irq) by irdma.
>
> Sorry for being unclear. I meant, does resuming the system take longer
> now? (initcall_debug might give a clue.)
>
>
> Kind regards,
>
> Paul

2024-05-30 07:56:33

by Paul Menzel

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH] ice: irdma hardware init failed after suspend/resume

Dear En-Wei,


Thank you for your quick reply.

Am 30.05.24 um 09:08 schrieb En-Wei WU:

>> Sorry for being unclear. I meant, does resuming the system take longer
>> now? (initcall_debug might give a clue.)
> I've tested the S3 suspend/resume with the initcall_debug kernel
> command option, and it shows no clear difference between having or not
> having the ice_init_rdma in ice_resume:
> Without ice_init_rdma:
> ```
> [ 104.241129] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9415 usecs
> [ 104.241206] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9443 usecs
> ```
> With ice_init_rdma:
> ```
> [ 122.749022] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9485 usecs
> [ 122.749068] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9532 usecs
> ```

Thank you for verifying this. Nice to hear, it has no impact.

[…]


Kind regards,

Paul

2024-05-31 09:38:16

by Wojciech Drewek

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH] ice: irdma hardware init failed after suspend/resume



On 30.05.2024 09:08, En-Wei WU wrote:
> Thank you for your reply.
>
>> Sorry for being unclear. I meant, does resuming the system take longer
>> now? (initcall_debug might give a clue.)
> I've tested the S3 suspend/resume with the initcall_debug kernel
> command option, and it shows no clear difference between having or not
> having the ice_init_rdma in ice_resume:
> Without ice_init_rdma:
> ```
> [ 104.241129] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned
> 0 after 9415 usecs
> [ 104.241206] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned
> 0 after 9443 usecs
> ```
> With ice_init_rdma:
> ```
> [ 122.749022] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned
> 0 after 9485 usecs
> [ 122.749068] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned
> 0 after 9532 usecs
> ```
>
>> And ice_init_rdma should be moved to ice_rebuild (replace ice_plug_aux_dev)
> We can defer the ice_init_rdma to the later service task by adopting this.
>
>> You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),
> It seems like we must call ice_deinit_rdma in ice_suspend. If we call
> it in the later service task, it will:
> 1. break some existing code setup by ice_resume
> 2. Since the PCI-X vector table is flushed at the end of ice_suspend,
> we have no way to release PCI-X vectors for rdma if we had allocated
> it dynamically
> The second point is important since we didn't release the PCI-X
> vectors for rdma (if we allocated it dynamically) in the original
> ice_suspend, and it's somewhat like a leak in the original code.
>
> Best regards,
> Ricky.

Makes sense, let's keep ice_deinit_rdma in ice_suspend.

>
> On Thu, 30 May 2024 at 04:19, Paul Menzel <[email protected]> wrote:
>>
>> Dear En-Wei,
>>
>>
>> Thank you for responding so quickly.
>>
>> Am 29.05.24 um 05:17 schrieb En-Wei WU:
>>
>> […]
>>
>>>> What effect does this have on resume time?
>>>
>>> When we call ice_init_rdma() at resume time, it will allocate entries
>>> at pf->irq_tracker.entries and update pf->msix_entries for later use
>>> (request_irq) by irdma.
>>
>> Sorry for being unclear. I meant, does resuming the system take longer
>> now? (initcall_debug might give a clue.)
>>
>>
>> Kind regards,
>>
>> Paul