2022-03-17 12:54:51

by Ivan Vecera

[permalink] [raw]
Subject: [PATCH] iavf: Fix hang during reboot/shutdown

Recent commit 974578017fc1 ("iavf: Add waiting so the port is
initialized in remove") adds a wait-loop at the beginning of
iavf_remove() to ensure that port initialization is finished
prior unregistering net device. This causes a regression
in reboot/shutdown scenario because in this case callback
iavf_shutdown() is called and this callback detaches the device,
makes it down if it is running and sets its state to __IAVF_REMOVE.
Later shutdown callback of associated PF driver (e.g. ice_shutdown)
is called. That callback calls among other things sriov_disable()
that calls indirectly iavf_remove() (see stack trace below).
As the adapter state is already __IAVF_REMOVE then the mentioned
loop is end-less and shutdown process hangs.

The patch fixes this by checking adapter's state at the beginning
of iavf_remove() and skips the rest of the function if the adapter
is already in remove state (shutdown is in progress).

Reproducer:
1. Create VF on PF driven by ice or i40e driver
2. Ensure that the VF is bound to iavf driver
3. Reboot

[52625.981294] sysrq: SysRq : Show Blocked State
[52625.988377] task:reboot state:D stack: 0 pid:17359 ppid: 1 f2
[52625.996732] Call Trace:
[52625.999187] __schedule+0x2d1/0x830
[52626.007400] schedule+0x35/0xa0
[52626.010545] schedule_hrtimeout_range_clock+0x83/0x100
[52626.020046] usleep_range+0x5b/0x80
[52626.023540] iavf_remove+0x63/0x5b0 [iavf]
[52626.027645] pci_device_remove+0x3b/0xc0
[52626.031572] device_release_driver_internal+0x103/0x1f0
[52626.036805] pci_stop_bus_device+0x72/0xa0
[52626.040904] pci_stop_and_remove_bus_device+0xe/0x20
[52626.045870] pci_iov_remove_virtfn+0xba/0x120
[52626.050232] sriov_disable+0x2f/0xe0
[52626.053813] ice_free_vfs+0x7c/0x340 [ice]
[52626.057946] ice_remove+0x220/0x240 [ice]
[52626.061967] ice_shutdown+0x16/0x50 [ice]
[52626.065987] pci_device_shutdown+0x34/0x60
[52626.070086] device_shutdown+0x165/0x1c5
[52626.074011] kernel_restart+0xe/0x30
[52626.077593] __do_sys_reboot+0x1d2/0x210
[52626.093815] do_syscall_64+0x5b/0x1a0
[52626.097483] entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove")
Signed-off-by: Ivan Vecera <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 45570e3f782e..0e178a0a59c5 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -4620,6 +4620,13 @@ static void iavf_remove(struct pci_dev *pdev)
struct iavf_hw *hw = &adapter->hw;
int err;

+ /* When reboot/shutdown is in progress no need to do anything
+ * as the adapter is already REMOVE state that was set during
+ * iavf_shutdown() callback.
+ */
+ if (adapter->state == __IAVF_REMOVE)
+ return;
+
set_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section);
/* Wait until port initialization is complete.
* There are flows where register/unregister netdev may race.
--
2.34.1


2022-03-17 19:16:34

by Tony Nguyen

[permalink] [raw]
Subject: Re: [PATCH] iavf: Fix hang during reboot/shutdown


On 3/17/2022 9:11 AM, Jakub Kicinski wrote:
> On Thu, 17 Mar 2022 11:45:24 +0100 Ivan Vecera wrote:
>> Recent commit 974578017fc1 ("iavf: Add waiting so the port is
>> initialized in remove") adds a wait-loop at the beginning of
>> iavf_remove() to ensure that port initialization is finished
>> prior unregistering net device. This causes a regression
>> in reboot/shutdown scenario because in this case callback
>> iavf_shutdown() is called and this callback detaches the device,
>> makes it down if it is running and sets its state to __IAVF_REMOVE.
>> Later shutdown callback of associated PF driver (e.g. ice_shutdown)
>> is called. That callback calls among other things sriov_disable()
>> that calls indirectly iavf_remove() (see stack trace below).
>> As the adapter state is already __IAVF_REMOVE then the mentioned
>> loop is end-less and shutdown process hangs.
> Tony, Jesse, looks like the regression is from 5.17-rc6, should
> I take this directly so it makes 5.17 final?

Hi Jakub,

There are some additional improvements that we think can be made but we
need more time to analyze and test. This is probably good for you to
take to make into this kernel though. We will send follow on patches if
needed.

Thanks,

Tony

2022-03-17 20:21:22

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH] iavf: Fix hang during reboot/shutdown

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <[email protected]>:

On Thu, 17 Mar 2022 11:45:24 +0100 you wrote:
> Recent commit 974578017fc1 ("iavf: Add waiting so the port is
> initialized in remove") adds a wait-loop at the beginning of
> iavf_remove() to ensure that port initialization is finished
> prior unregistering net device. This causes a regression
> in reboot/shutdown scenario because in this case callback
> iavf_shutdown() is called and this callback detaches the device,
> makes it down if it is running and sets its state to __IAVF_REMOVE.
> Later shutdown callback of associated PF driver (e.g. ice_shutdown)
> is called. That callback calls among other things sriov_disable()
> that calls indirectly iavf_remove() (see stack trace below).
> As the adapter state is already __IAVF_REMOVE then the mentioned
> loop is end-less and shutdown process hangs.
>
> [...]

Here is the summary with links:
- iavf: Fix hang during reboot/shutdown
https://git.kernel.org/netdev/net/c/b04683ff8f08

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


2022-03-17 20:42:56

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] iavf: Fix hang during reboot/shutdown

On Thu, 17 Mar 2022 11:45:24 +0100 Ivan Vecera wrote:
> Recent commit 974578017fc1 ("iavf: Add waiting so the port is
> initialized in remove") adds a wait-loop at the beginning of
> iavf_remove() to ensure that port initialization is finished
> prior unregistering net device. This causes a regression
> in reboot/shutdown scenario because in this case callback
> iavf_shutdown() is called and this callback detaches the device,
> makes it down if it is running and sets its state to __IAVF_REMOVE.
> Later shutdown callback of associated PF driver (e.g. ice_shutdown)
> is called. That callback calls among other things sriov_disable()
> that calls indirectly iavf_remove() (see stack trace below).
> As the adapter state is already __IAVF_REMOVE then the mentioned
> loop is end-less and shutdown process hangs.

Tony, Jesse, looks like the regression is from 5.17-rc6, should
I take this directly so it makes 5.17 final?