2023-03-14 15:32:31

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH net] net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()

Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol()
acquire phydev->lock, but the mscc phy driver implementations,
vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well,
resulting in a deadlock.

$ ip link set swp3 down
============================================
WARNING: possible recursive locking detected
mscc_felix 0000:00:00.5 swp3: Link is Down
--------------------------------------------
ip/375 is trying to acquire lock:
ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4

but task is already holding lock:
ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&dev->lock);
lock(&dev->lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

2 locks held by ip/375:
#0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c
#1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c

Call trace:
__mutex_lock+0x98/0x454
mutex_lock_nested+0x2c/0x38
vsc85xx_wol_get+0x2c/0xf4
phy_ethtool_get_wol+0x50/0x6c
phy_suspend+0x84/0xcc
phy_state_machine+0x1b8/0x27c
phy_stop+0x70/0x154
phylink_stop+0x34/0xc0
dsa_port_disable_rt+0x2c/0xa4
dsa_slave_close+0x38/0xec
__dev_close_many+0xc8/0x16c
__dev_change_flags+0xdc/0x218
dev_change_flags+0x24/0x6c
do_setlink+0x234/0xea4
__rtnl_newlink+0x46c/0x878
rtnl_newlink+0x50/0x7c
rtnetlink_rcv_msg+0x16c/0x58c

Removing the mutex_lock(&phydev->lock) calls from the driver restores
the functionality.

Fixes: 2f987d486610 ("net: phy: Add locks to ethtool functions")
Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/phy/mscc/mscc_main.c | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/drivers/net/phy/mscc/mscc_main.c b/drivers/net/phy/mscc/mscc_main.c
index 8a13b1ad9a33..62bf99e45af1 100644
--- a/drivers/net/phy/mscc/mscc_main.c
+++ b/drivers/net/phy/mscc/mscc_main.c
@@ -280,12 +280,9 @@ static int vsc85xx_wol_set(struct phy_device *phydev,
u16 pwd[3] = {0, 0, 0};
struct ethtool_wolinfo *wol_conf = wol;

- mutex_lock(&phydev->lock);
rc = phy_select_page(phydev, MSCC_PHY_PAGE_EXTENDED_2);
- if (rc < 0) {
- rc = phy_restore_page(phydev, rc, rc);
- goto out_unlock;
- }
+ if (rc < 0)
+ return phy_restore_page(phydev, rc, rc);

if (wol->wolopts & WAKE_MAGIC) {
/* Store the device address for the magic packet */
@@ -323,7 +320,7 @@ static int vsc85xx_wol_set(struct phy_device *phydev,

rc = phy_restore_page(phydev, rc, rc > 0 ? 0 : rc);
if (rc < 0)
- goto out_unlock;
+ return rc;

if (wol->wolopts & WAKE_MAGIC) {
/* Enable the WOL interrupt */
@@ -331,22 +328,19 @@ static int vsc85xx_wol_set(struct phy_device *phydev,
reg_val |= MII_VSC85XX_INT_MASK_WOL;
rc = phy_write(phydev, MII_VSC85XX_INT_MASK, reg_val);
if (rc)
- goto out_unlock;
+ return rc;
} else {
/* Disable the WOL interrupt */
reg_val = phy_read(phydev, MII_VSC85XX_INT_MASK);
reg_val &= (~MII_VSC85XX_INT_MASK_WOL);
rc = phy_write(phydev, MII_VSC85XX_INT_MASK, reg_val);
if (rc)
- goto out_unlock;
+ return rc;
}
/* Clear WOL iterrupt status */
reg_val = phy_read(phydev, MII_VSC85XX_INT_STATUS);

-out_unlock:
- mutex_unlock(&phydev->lock);
-
- return rc;
+ return 0;
}

static void vsc85xx_wol_get(struct phy_device *phydev,
@@ -358,10 +352,9 @@ static void vsc85xx_wol_get(struct phy_device *phydev,
u16 pwd[3] = {0, 0, 0};
struct ethtool_wolinfo *wol_conf = wol;

- mutex_lock(&phydev->lock);
rc = phy_select_page(phydev, MSCC_PHY_PAGE_EXTENDED_2);
if (rc < 0)
- goto out_unlock;
+ goto out_restore_page;

reg_val = __phy_read(phydev, MSCC_PHY_WOL_MAC_CONTROL);
if (reg_val & SECURE_ON_ENABLE)
@@ -377,9 +370,8 @@ static void vsc85xx_wol_get(struct phy_device *phydev,
}
}

-out_unlock:
+out_restore_page:
phy_restore_page(phydev, rc, rc > 0 ? 0 : rc);
- mutex_unlock(&phydev->lock);
}

#if IS_ENABLED(CONFIG_OF_MDIO)
--
2.34.1



2023-03-14 16:26:28

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH net] net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()

On Tue, Mar 14, 2023 at 05:30:25PM +0200, Vladimir Oltean wrote:
> Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol()
> acquire phydev->lock, but the mscc phy driver implementations,
> vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well,
> resulting in a deadlock.
>
> $ ip link set swp3 down
> ============================================
> WARNING: possible recursive locking detected
> mscc_felix 0000:00:00.5 swp3: Link is Down
> --------------------------------------------
> ip/375 is trying to acquire lock:
> ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4
>
> but task is already holding lock:
> ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&dev->lock);
> lock(&dev->lock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 2 locks held by ip/375:
> #0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c
> #1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
>
> Call trace:
> __mutex_lock+0x98/0x454
> mutex_lock_nested+0x2c/0x38
> vsc85xx_wol_get+0x2c/0xf4
> phy_ethtool_get_wol+0x50/0x6c
> phy_suspend+0x84/0xcc
> phy_state_machine+0x1b8/0x27c
> phy_stop+0x70/0x154
> phylink_stop+0x34/0xc0
> dsa_port_disable_rt+0x2c/0xa4
> dsa_slave_close+0x38/0xec
> __dev_close_many+0xc8/0x16c
> __dev_change_flags+0xdc/0x218
> dev_change_flags+0x24/0x6c
> do_setlink+0x234/0xea4
> __rtnl_newlink+0x46c/0x878
> rtnl_newlink+0x50/0x7c
> rtnetlink_rcv_msg+0x16c/0x58c
>
> Removing the mutex_lock(&phydev->lock) calls from the driver restores
> the functionality.
>
> Fixes: 2f987d486610 ("net: phy: Add locks to ethtool functions")
> Signed-off-by: Vladimir Oltean <[email protected]>

Reviewed-by: Simon Horman <[email protected]>


2023-03-14 16:32:17

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH net] net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()

On Tue, Mar 14, 2023 at 05:30:25PM +0200, Vladimir Oltean wrote:
> Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol()
> acquire phydev->lock, but the mscc phy driver implementations,
> vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well,
> resulting in a deadlock.
>
> Fixes: 2f987d486610 ("net: phy: Add locks to ethtool functions")
> Signed-off-by: Vladimir Oltean <[email protected]>

Thanks Vladimir.

[Goes and checks to see if the same problem exists for other PHY drivers]

Reviewed-by: Andrew Lunn <[email protected]>

Andrew

2023-03-14 16:46:14

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net] net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()

On Tue, Mar 14, 2023 at 05:31:45PM +0100, Andrew Lunn wrote:
> [Goes and checks to see if the same problem exists for other PHY drivers]

Here's a call path I am not sure how to interpret (but doesn't look like
there's anything preventing it).

linkstate_get_sqi()
-> mutex_lock(&phydev->lock)
-> phydev->drv->get_sqi(phydev);
-> lan87xx_get_sqi()
-> access_ereg()
-> lan937x_dsp_workaround()
-> mutex_lock(&phydev->lock);
-> mutex_unlock(&phydev->lock);
-> mutex_unlock(&phydev->lock)

2023-03-14 16:53:12

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH net] net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()

On Tue, Mar 14, 2023 at 06:45:47PM +0200, Vladimir Oltean wrote:
> On Tue, Mar 14, 2023 at 05:31:45PM +0100, Andrew Lunn wrote:
> > [Goes and checks to see if the same problem exists for other PHY drivers]
>
> Here's a call path I am not sure how to interpret (but doesn't look like
> there's anything preventing it).
>
> linkstate_get_sqi()
> -> mutex_lock(&phydev->lock)
> -> phydev->drv->get_sqi(phydev);
> -> lan87xx_get_sqi()
> -> access_ereg()
> -> lan937x_dsp_workaround()
> -> mutex_lock(&phydev->lock);
> -> mutex_unlock(&phydev->lock);
> -> mutex_unlock(&phydev->lock)

I noticed access_ereg() as well. But i don't think there have been any
recent changes there. Lots loop in the Microchip developers.

Andrew

2023-03-16 04:41:05

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH net] net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <[email protected]>:

On Tue, 14 Mar 2023 17:30:25 +0200 you wrote:
> Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol()
> acquire phydev->lock, but the mscc phy driver implementations,
> vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well,
> resulting in a deadlock.
>
> $ ip link set swp3 down
> ============================================
> WARNING: possible recursive locking detected
> mscc_felix 0000:00:00.5 swp3: Link is Down
>
> [...]

Here is the summary with links:
- [net] net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()
https://git.kernel.org/netdev/net/c/cd356010ce4c

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html