2016-12-27 03:40:42

by Kweh, Hock Leong

Subject: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe

From: "Kweh, Hock Leong" <[email protected]>

If the stmmac driver is loaded as a kernel module after the OS has booted,
there is a race condition between stmmac_open() and stmmac_mdio_register(),
which is invoked inside stmmac_dvr_probe(). In the dmesg log below, the PHY
is not found and stmmac_open() fails:
[ 473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): stmmac_dvr_probe: warning: cannot get CSR clock
[ 473.919382] stmmaceth 0000:01:00.0: no reset control found
[ 473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
[ 473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
[ 473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
[ 473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
[ 473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): Enable RX Mitigation via HW Watchdog Timer
[ 473.921395] libphy: PHY stmmac-1:00 not found
[ 473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
[ 473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to PHY (error: -19)
[ 473.959710] libphy: stmmac: probed
[ 473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL (stmmac-1:00) active
[ 473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL (stmmac-1:01)
[ 473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL (stmmac-1:02)
[ 473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL (stmmac-1:03)

This patch uses wait_for_completion_interruptible() to synchronize
stmmac_open() with stmmac_dvr_probe() and so prevent the race.

Signed-off-by: Kweh, Hock Leong <[email protected]>
---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 +
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++++++++++
2 files changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index eab04ae..5daf8a5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -131,6 +131,7 @@ struct stmmac_priv {
 	u32 rx_tail_addr;
 	u32 tx_tail_addr;
 	u32 mss;
+	struct completion probe_done;
 
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *dbgfs_dir;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index bb40382..28e85f6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1770,6 +1770,14 @@ static int stmmac_open(struct net_device *dev)
 	struct stmmac_priv *priv = netdev_priv(dev);
 	int ret;
 
+	ret = wait_for_completion_interruptible(&priv->probe_done);
+	if (ret) {
+		netdev_err(priv->dev,
+			   "%s: Interrupted while waiting probe completion\n",
+			   __func__);
+		return ret;
+	}
+
 	stmmac_check_ether_addr(priv);
 
 	if (priv->hw->pcs != STMMAC_PCS_RGMII &&
@@ -3226,6 +3234,7 @@ int stmmac_dvr_probe(struct device *device,
 	priv = netdev_priv(ndev);
 	priv->device = device;
 	priv->dev = ndev;
+	init_completion(&priv->probe_done);
 
 	stmmac_set_ethtool_ops(ndev);
 	priv->pause = pause;
@@ -3372,6 +3381,7 @@ int stmmac_dvr_probe(struct device *device,
 		}
 	}
 
+	complete_all(&priv->probe_done);
 	return 0;
 
 error_mdio_register:
--
1.7.9.5
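The completion handshake in the patch above can be modelled outside the kernel to see what it guarantees. The following is a minimal userspace sketch, assuming POSIX threads: struct sim_completion and the sim_* helpers are hypothetical stand-ins for init_completion(), complete_all() and wait_for_completion(), not kernel APIs, and the 10 ms sleep only imitates a slow MDIO bus registration.

```c
/*
 * Userspace model of the completion handshake (hypothetical names).
 * sim_completion / sim_* stand in for init_completion(), complete_all()
 * and wait_for_completion(); they are NOT kernel APIs.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct sim_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void sim_init_completion(struct sim_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Like complete_all(): release every current and future waiter. */
static void sim_complete_all(struct sim_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Like wait_for_completion(); unlike the patch, not interruptible. */
static void sim_wait_for_completion(struct sim_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static struct sim_completion probe_done;
static bool mdio_ready;   /* written by the "probe" thread */
static bool open_saw_phy; /* what the "open" thread observed */

static void *probe_thread(void *arg)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10000000L };

	(void)arg;
	nanosleep(&delay, NULL);       /* imitate slow MDIO registration */
	mdio_ready = true;             /* published before completing */
	sim_complete_all(&probe_done);
	return NULL;
}

static void *open_thread(void *arg)
{
	(void)arg;
	sim_wait_for_completion(&probe_done); /* block until probe is done */
	open_saw_phy = mdio_ready;
	return NULL;
}

/* Starts "open" before "probe"; returns true if open saw the PHY. */
bool run_open_during_probe(void)
{
	pthread_t opener, prober;
	int rc;

	sim_init_completion(&probe_done);
	mdio_ready = false;
	open_saw_phy = false;

	rc = pthread_create(&opener, NULL, open_thread, NULL);
	assert(rc == 0);
	rc = pthread_create(&prober, NULL, probe_thread, NULL);
	assert(rc == 0);
	(void)rc;

	pthread_join(opener, NULL);
	pthread_join(prober, NULL);
	pthread_cond_destroy(&probe_done.cond);
	pthread_mutex_destroy(&probe_done.lock);
	return open_saw_phy;
}
```

Build with -pthread. Because open_thread() waits on the completion, it only reads mdio_ready after probe_thread() has set it, so run_open_during_probe() returns true even though the open thread starts first. The kernel patch uses the interruptible variant, so stmmac_open() can also return early with an error if a signal arrives while it waits.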


2016-12-27 04:55:11

by David Miller

Subject: Re: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe

From: "Kweh, Hock Leong" <[email protected]>
Date: Tue, 27 Dec 2016 19:44:59 +0800

> From: "Kweh, Hock Leong" <[email protected]>
>
> If kernel module stmmac driver being loaded after OS booted, there is a
> race condition between stmmac_open() and stmmac_mdio_register(), which is
> invoked inside stmmac_dvr_probe(), and the error is showed in dmesg log as
> PHY not found and stmmac_open() failed:
...
> The resolution used wait_for_completion_interruptible() to synchronize
> stmmac_open() and stmmac_dvr_probe() to prevent the race condition
> happening.
>
> Signed-off-by: Kweh, Hock Leong <[email protected]>

The proper thing to do is to make sure register_netdevice() is not
invoked until it is 100% safe to call stmmac_open().
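The ordering David describes can be shown with a small deterministic sketch, assuming a simplified model in which userspace can only open an interface after it has been registered. struct fake_netdev, fake_open(), fake_ifup() and fake_probe() are hypothetical names; fake_probe() simulates an ifup racing with probe by calling fake_ifup() between the two setup steps.

```c
/*
 * Model of "register only when open() is safe" (hypothetical names,
 * not stmmac or netdev APIs).
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum ifup_result { IFUP_NO_DEVICE, IFUP_OPEN_FAILED, IFUP_OPEN_OK };

struct fake_netdev {
	bool mdio_registered;  /* MDIO bus and PHYs are available */
	bool registered;       /* interface is visible to userspace */
};

/* Stand-in for stmmac_open(): fails like "Cannot attach to PHY". */
static int fake_open(struct fake_netdev *dev)
{
	return dev->mdio_registered ? 0 : -ENODEV;
}

/* Stand-in for userspace bringing the interface up. */
enum ifup_result fake_ifup(struct fake_netdev *dev)
{
	if (!dev->registered)
		return IFUP_NO_DEVICE;  /* nothing to open yet */
	return fake_open(dev) == 0 ? IFUP_OPEN_OK : IFUP_OPEN_FAILED;
}

/*
 * Stand-in for a probe routine. 'register_last' selects the ordering;
 * the return value is what an ifup racing with probe would observe.
 */
enum ifup_result fake_probe(struct fake_netdev *dev, bool register_last)
{
	enum ifup_result racing;

	dev->mdio_registered = false;
	dev->registered = false;

	if (!register_last)
		dev->registered = true;   /* old order: register early */

	racing = fake_ifup(dev);      /* userspace races with probe */

	dev->mdio_registered = true;  /* MDIO registration finishes */

	if (register_last)
		dev->registered = true;   /* new order: register at the end */

	return racing;
}
```

With early registration, the racing open fails with -ENODEV (-19 on Linux), the same error seen in the log above. With registration last, the racing ifup simply finds no interface, and a later ifup succeeds, so stmmac_open() itself needs no extra synchronization.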

2016-12-27 05:10:21

by Florian Fainelli

Subject: Re: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe



On 12/27/2016 03:44 AM, Kweh, Hock Leong wrote:
> From: "Kweh, Hock Leong" <[email protected]>
>
> If kernel module stmmac driver being loaded after OS booted, there is a
> race condition between stmmac_open() and stmmac_mdio_register(), which is
> invoked inside stmmac_dvr_probe(), and the error is showed in dmesg log as
> PHY not found and stmmac_open() failed:
> ...
>
> The resolution used wait_for_completion_interruptible() to synchronize
> stmmac_open() and stmmac_dvr_probe() to prevent the race condition
> happening.

The proper fix for this would be to have register_netdev() be the last
thing done in stmmac_dvr_probe(), whereas right now, the last thing done
is stmmac_mdio_register(), leading to the window you are seeing here, where
the network interface can be opened before all resources are set up,
including, but not limited to, MDIO devices.
--
Florian

2016-12-27 05:14:07

by Florian Fainelli

Subject: Re: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe



On 12/26/2016 09:10 PM, Florian Fainelli wrote:
>
>
> On 12/27/2016 03:44 AM, Kweh, Hock Leong wrote:
>> From: "Kweh, Hock Leong" <[email protected]>
>>
>> If kernel module stmmac driver being loaded after OS booted, there is a
>> race condition between stmmac_open() and stmmac_mdio_register(), which is
>> invoked inside stmmac_dvr_probe(), and the error is showed in dmesg log as
>> PHY not found and stmmac_open() failed:
>> ...
>>
>> The resolution used wait_for_completion_interruptible() to synchronize
>> stmmac_open() and stmmac_dvr_probe() to prevent the race condition
>> happening.
>
> The proper fix for this would be to have register_netdev() be the last
> thing done in stmmac_dvr_probe(), whereas right now, the last thing done
> is stmmac_mdio_register(), leading to the window you are seeing here, where
> the network interface can be opened before all resources are set up,
> including, but not limited to, MDIO devices.

Something like the following untested patch should plug this race:

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index bb40382e205d..5910ea51f8f6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3339,13 +3339,6 @@ int stmmac_dvr_probe(struct device *device,
 
 	spin_lock_init(&priv->lock);
 
-	ret = register_netdev(ndev);
-	if (ret) {
-		netdev_err(priv->dev, "%s: ERROR %i registering the device\n",
-			   __func__, ret);
-		goto error_netdev_register;
-	}
-
 	/* If a specific clk_csr value is passed from the platform
 	 * this means that the CSR Clock Range selection cannot be
 	 * changed at run-time and it is fixed. Viceversa the driver'll try to
@@ -3372,11 +3365,14 @@ int stmmac_dvr_probe(struct device *device,
 		}
 	}
 
-	return 0;
+	ret = register_netdev(ndev);
+	if (ret)
+		netdev_err(priv->dev, "%s: ERROR %i registering the device\n",
+			   __func__, ret);
+
+	return ret;
 
 error_mdio_register:
-	unregister_netdev(ndev);
-error_netdev_register:
 	netif_napi_del(&priv->napi);
 error_hw_init:
 	clk_disable_unprepare(priv->pclk);

--
Florian
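In the patch as posted, a register_netdev() failure at the end of probe returns ret directly, so the steps that ran before it (MDIO bus, NAPI, clocks) are not unwound. The conventional kernel idiom is a chain of goto labels that undo completed steps in reverse order. Below is a minimal sketch of that idiom in plain C, not a tested replacement for the patch: struct probe_state, step(), sim_set_fail_at() and sim_probe() are hypothetical, and each numbered step only loosely corresponds to a call in stmmac_dvr_probe().

```c
/*
 * Sketch of goto-based unwinding when the last probe step fails.
 * All names are hypothetical; step n only loosely mirrors stmmac:
 * 1 = clocks, 2 = NAPI, 3 = MDIO bus, 4 = register_netdev().
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct probe_state {
	bool clocks_on;
	bool napi_added;
	bool mdio_registered;
	bool netdev_registered;
};

static int fail_at; /* step number that should fail; 0 = none */

void sim_set_fail_at(int step_no)
{
	fail_at = step_no;
}

/* Pretend to acquire a resource: set *flag, or fail with -EIO. */
static int step(int n, bool *flag)
{
	if (n == fail_at)
		return -EIO;
	*flag = true;
	return 0;
}

int sim_probe(struct probe_state *s)
{
	int ret;

	*s = (struct probe_state){ false, false, false, false };

	ret = step(1, &s->clocks_on);
	if (ret)
		return ret;            /* nothing to undo yet */
	ret = step(2, &s->napi_added);
	if (ret)
		goto undo_clocks;
	ret = step(3, &s->mdio_registered);
	if (ret)
		goto undo_napi;
	/* Register last: the interface becomes openable only now. */
	ret = step(4, &s->netdev_registered);
	if (ret)
		goto undo_mdio;
	return 0;

undo_mdio:
	s->mdio_registered = false;   /* reverse of step 3 */
undo_napi:
	s->napi_added = false;        /* reverse of step 2 */
undo_clocks:
	s->clocks_on = false;         /* reverse of step 1 */
	return ret;
}
```

Failing step 4 (the final registration) falls through undo_mdio, undo_napi and undo_clocks in that order, so nothing set up earlier in probe is left behind.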

2016-12-27 05:25:15

by Kweh, Hock Leong

Subject: RE: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe

> -----Original Message-----
> From: Florian Fainelli [mailto:[email protected]]
> Sent: Tuesday, December 27, 2016 1:14 PM
> To: Kweh, Hock Leong <[email protected]>; David S. Miller
> <[email protected]>; Joao Pinto <[email protected]>; Giuseppe
> CAVALLARO <[email protected]>; [email protected]
> Cc: Alexandre TORGUE <[email protected]>; Joachim Eastwood
> <[email protected]>; Niklas Cassel <[email protected]>; Johan Hovold
> <[email protected]>; [email protected]; Ong, Boon Leong
> <[email protected]>; netdev <[email protected]>; LKML <linux-
> [email protected]>; Voon, Weifeng <[email protected]>; Lars
> Persson <[email protected]>
> Subject: Re: [PATCH] net: stmmac: synchronize stmmac_open and
> stmmac_dvr_probe
>
>
>
> On 12/26/2016 09:10 PM, Florian Fainelli wrote:
> >
> >
> > On 12/27/2016 03:44 AM, Kweh, Hock Leong wrote:
> >> ...
> >
> > The proper fix for this would be to have register_netdev() be the last
> > thing done in stmmac_dvr_probe(), whereas right now, the last thing done
> > is stmmac_mdio_register(), leading to the window you are seeing here, where
> > the network interface can be opened before all resources are set up,
> > including, but not limited to, MDIO devices.
>
> Something like the following untested patch should plug this race:
>
> ...

Thanks. I will try it out to confirm.

Regards,
Wilson

2016-12-27 05:27:00

by Kweh, Hock Leong

Subject: RE: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe

> -----Original Message-----
> From: David Miller [mailto:[email protected]]
> Sent: Tuesday, December 27, 2016 12:55 PM
> To: Kweh, Hock Leong <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Ong, Boon Leong <[email protected]>;
> [email protected]; [email protected]; Voon, Weifeng
> <[email protected]>; [email protected]
> Subject: Re: [PATCH] net: stmmac: synchronize stmmac_open and
> stmmac_dvr_probe
>
> From: "Kweh, Hock Leong" <[email protected]>
> Date: Tue, 27 Dec 2016 19:44:59 +0800
>
> > From: "Kweh, Hock Leong" <[email protected]>
> >
> > If kernel module stmmac driver being loaded after OS booted, there is a
> > race condition between stmmac_open() and stmmac_mdio_register(), which is
> > invoked inside stmmac_dvr_probe(), and the error is showed in dmesg log as
> > PHY not found and stmmac_open() failed:
> ...
> > The resolution used wait_for_completion_interruptible() to synchronize
> > stmmac_open() and stmmac_dvr_probe() to prevent the race condition
> > happening.
> >
> > Signed-off-by: Kweh, Hock Leong <[email protected]>
>
> The proper thing to do is to make sure register_netdevice() is not
> invoked until it is 100% safe to call stmmac_open().

Noted, thanks. I will look into it.

Regards,
Wilson