iavf_reset_task() takes crit_lock at the beginning and holds
it during whole call. The function subsequently calls
iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs
when userspace initiates during the reset task any ndo callback
that runs under RTNL like iavf_open() because some of that
functions tries to take crit_lock. This leads to classic A-B B-A
deadlock scenario.
To resolve this situation the device should be detached in
iavf_reset_task() prior taking crit_lock to avoid subsequent
ndos running under RTNL and reattach the device at the end.
Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around i40evf_set_interrupt_capability")
Cc: Jacob Keller <[email protected]>
Cc: Patryk Piotrowski <[email protected]>
Cc: SlawomirX Laba <[email protected]>
Tested-by: Vitaly Grinberg <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index f39440ad5c50..10aa99dfdcdb 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2877,6 +2877,11 @@ static void iavf_reset_task(struct work_struct *work)
int i = 0, err;
bool running;
+ /* Detach interface to avoid subsequent NDO callbacks */
+ rtnl_lock();
+ netif_device_detach(netdev);
+ rtnl_unlock();
+
/* When device is being removed it doesn't make sense to run the reset
* task, just return in such a case.
*/
@@ -2884,7 +2889,7 @@ static void iavf_reset_task(struct work_struct *work)
if (adapter->state != __IAVF_REMOVE)
queue_work(iavf_wq, &adapter->reset_task);
- return;
+ goto reset_finish;
}
while (!mutex_trylock(&adapter->client_lock))
@@ -2954,7 +2959,6 @@ static void iavf_reset_task(struct work_struct *work)
if (running) {
netif_carrier_off(netdev);
- netif_tx_stop_all_queues(netdev);
adapter->link_up = false;
iavf_napi_disable_all(adapter);
}
@@ -3084,7 +3088,7 @@ static void iavf_reset_task(struct work_struct *work)
mutex_unlock(&adapter->client_lock);
mutex_unlock(&adapter->crit_lock);
- return;
+ goto reset_finish;
reset_err:
if (running) {
set_bit(__IAVF_VSI_DOWN, adapter->vsi.state);
@@ -3095,6 +3099,10 @@ static void iavf_reset_task(struct work_struct *work)
mutex_unlock(&adapter->client_lock);
mutex_unlock(&adapter->crit_lock);
dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
+reset_finish:
+ rtnl_lock();
+ netif_device_attach(netdev);
+ rtnl_unlock();
}
/**
--
2.35.1
> -----Original Message-----
> From: Ivan Vecera <[email protected]>
> Sent: Tuesday, August 30, 2022 10:16 AM
> Subject: [PATCH net v2] iavf: Detach device during reset task
>
> iavf_reset_task() takes crit_lock at the beginning and holds it during whole call.
> The function subsequently calls
> iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs when userspace
> initiates during the reset task any ndo callback that runs under RTNL like
> iavf_open() because some of that functions tries to take crit_lock. This leads to
> classic A-B B-A deadlock scenario.
>
> To resolve this situation the device should be detached in
> iavf_reset_task() prior taking crit_lock to avoid subsequent ndos running under
> RTNL and reattach the device at the end.
>
> Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around
> i40evf_set_interrupt_capability")
> Cc: Jacob Keller <[email protected]>
> Cc: Patryk Piotrowski <[email protected]>
> Cc: SlawomirX Laba <[email protected]>
> Tested-by: Vitaly Grinberg <[email protected]>
> Signed-off-by: Ivan Vecera <[email protected]>
>
> @@ -2884,7 +2889,7 @@ static void iavf_reset_task(struct work_struct *work)
> if (adapter->state != __IAVF_REMOVE)
> queue_work(iavf_wq, &adapter->reset_task);
>
> - return;
> + goto reset_finish;
> }
>
> while (!mutex_trylock(&adapter->client_lock))
Ivan, what do you think about this flow [1]? Shouldn't it also goto reset_finish label?
if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) {
dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
reg_val);
iavf_disable_vf(adapter);
mutex_unlock(&adapter->client_lock);
mutex_unlock(&adapter->crit_lock);
return; /* Do not attempt to reinit. It's dead, Jim. */
}
I am concerned that if the reset never finishes and iavf goes into disabled state, and then for example if driver reload operation is performed, bad things can happen.
[1] https://elixir.bootlin.com/linux/v6.0-rc3/source/drivers/net/ethernet/intel/iavf/iavf_main.c#L2939
On Tue, 30 Aug 2022 20:49:54 +0000
"Laba, SlawomirX" <[email protected]> wrote:
> Ivan, what do you think about this flow [1]? Shouldn't it also goto reset_finish label?
>
> if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) {
> dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
> reg_val);
> iavf_disable_vf(adapter);
> mutex_unlock(&adapter->client_lock);
> mutex_unlock(&adapter->crit_lock);
> return; /* Do not attempt to reinit. It's dead, Jim. */
> }
>
> I am concerned that if the reset never finishes and iavf goes into disabled state, and then for example if driver reload operation is performed, bad things can happen.
I think we should not re-attach device back as the VF is disabled. Detached device causes that userspace (user) won't be able to configure associated interface
that is correct. Driver reload won't cause anything special in this situation because during module removal the interface is unregistered.
Thanks,
Ivan
> -----Original Message-----
> From: Ivan Vecera <[email protected]>
> Sent: Wednesday, August 31, 2022 12:06 AM
> To: Laba, SlawomirX <[email protected]>
> Cc: [email protected]; Keller, Jacob E <[email protected]>;
> Piotrowski, Patryk <[email protected]>; Vitaly Grinberg
> <[email protected]>; Brandeburg, Jesse <[email protected]>;
> Nguyen, Anthony L <[email protected]>; David S. Miller
> <[email protected]>; Eric Dumazet <[email protected]>; Jakub Kicinski
> <[email protected]>; Paolo Abeni <[email protected]>; Jeff Kirsher
> <[email protected]>; moderated list:INTEL ETHERNET DRIVERS <intel-
> [email protected]>; open list <[email protected]>
> Subject: Re: [PATCH net v2] iavf: Detach device during reset task
>
> On Tue, 30 Aug 2022 20:49:54 +0000
> "Laba, SlawomirX" <[email protected]> wrote:
>
> > Ivan, what do you think about this flow [1]? Shouldn't it also goto reset_finish
> label?
> >
> > if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) {
> > dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
> > reg_val);
> > iavf_disable_vf(adapter);
> > mutex_unlock(&adapter->client_lock);
> > mutex_unlock(&adapter->crit_lock);
> > return; /* Do not attempt to reinit. It's dead, Jim. */
> > }
> >
> > I am concerned that if the reset never finishes and iavf goes into disabled state,
> and then for example if driver reload operation is performed, bad things can
> happen.
>
> I think we should not re-attach device back as the VF is disabled. Detached device
> causes that userspace (user) won't be able to configure associated interface
> that is correct. Driver reload won't cause anything special in this situation
> because during module removal the interface is unregistered.
>
> Thanks,
> Ivan
I agree. It's safe to remove a detached device.
Thanks,
Jake
> -----Original Message-----
> From: Intel-wired-lan <[email protected]> On Behalf Of
> Ivan Vecera
> Sent: Tuesday, August 30, 2022 10:16 AM
> To: [email protected]
> Cc: Paolo Abeni <[email protected]>; Eric Dumazet
> <[email protected]>; moderated list:INTEL ETHERNET DRIVERS <intel-
> [email protected]>; open list <[email protected]>;
> Piotrowski, Patryk <[email protected]>; Jeff Kirsher
> <[email protected]>; Jakub Kicinski <[email protected]>; Vitaly
> Grinberg <[email protected]>; David S. Miller <[email protected]>
> Subject: [Intel-wired-lan] [PATCH net v2] iavf: Detach device during reset
> task
>
> iavf_reset_task() takes crit_lock at the beginning and holds it during whole
> call. The function subsequently calls
> iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs when
> userspace initiates during the reset task any ndo callback that runs under
> RTNL like iavf_open() because some of that functions tries to take crit_lock.
> This leads to classic A-B B-A deadlock scenario.
>
> To resolve this situation the device should be detached in
> iavf_reset_task() prior taking crit_lock to avoid subsequent ndos running
> under RTNL and reattach the device at the end.
>
> Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around
> i40evf_set_interrupt_capability")
> Cc: Jacob Keller <[email protected]>
> Cc: Patryk Piotrowski <[email protected]>
> Cc: SlawomirX Laba <[email protected]>
> Tested-by: Vitaly Grinberg <[email protected]>
> Signed-off-by: Ivan Vecera <[email protected]>
> ---
> drivers/net/ethernet/intel/iavf/iavf_main.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c
> b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index f39440ad5c50..10aa99dfdcdb 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
Tested-by: Konrad Jankowski <[email protected]>