Hi,
I'm seeing a deadlock in linkwatch_event() when bringing down an
Ethernet interface using the smc91x driver (drivers/net/smc91x.c).
What I am seeing is that smc_close() is calling netif_carrier_off which
has the call chain:
netif_carrier_off
-> linkwatch_fire_event
-> schedule_work or schedule_delayed_work
The function that is scheduled is linkwatch_event().
smc_close() then goes on to call flush_scheduled_work() in order to
ensure that it's own pending workqueue stuff (smc_phy_configure()) is
completed before powering down the PHY.
What I am seeing is that linkwatch_event() is deadlocking trying take
rtnl_sem via rtnl_shlock(). The lock appears to already be held by a
call to rtnl_lock() from devinet_ioctl().
Any ideas? Perhaps smc_phy_configure calls could just check that the
interface is up before continuing, then there would be no need to flush
the queue to get rid of it.
Ian.
--
Ian Campbell, Senior Design Engineer
Web: http://www.arcom.com
Arcom, Clifton Road, Direct: +44 (0)1223 403 465
Cambridge CB1 7EA, United Kingdom Phone: +44 (0)1223 411 200
Ian Campbell <[email protected]> wrote:
>
> Hi,
>
> I'm seeing a deadlock in linkwatch_event() when bringing down an
> Ethernet interface using the smc91x driver (drivers/net/smc91x.c).
>
> What I am seeing is that smc_close() is calling netif_carrier_off which
> has the call chain:
> netif_carrier_off
> -> linkwatch_fire_event
> -> schedule_work or schedule_delayed_work
> The function that is scheduled is linkwatch_event().
>
> smc_close() then goes on to call flush_scheduled_work() in order to
> ensure that it's own pending workqueue stuff (smc_phy_configure()) is
> completed before powering down the PHY.
>
> What I am seeing is that linkwatch_event() is deadlocking trying take
> rtnl_sem via rtnl_shlock(). The lock appears to already be held by a
> call to rtnl_lock() from devinet_ioctl().
>
> Any ideas? Perhaps smc_phy_configure calls could just check that the
> interface is up before continuing, then there would be no need to flush
> the queue to get rid of it.
>
linkwatch probably doesn't need the flush_scheduled_work(), because it
correctly does refcounting on the device. Presumably that
flush_scheduled_work() in smc_close() is there to force out any pending
calls to smc_phy_configure().
One possible fix would be to remove that flush_scheduled_work() and to do
refcounting around smc_phy_configure(): dev_hold() when scheduling the work
(if schedule_work() returned true), dev_put() in the handler.
On Tue, 2004-11-23 at 15:31 -0800, Andrew Morton wrote:
> One possible fix would be to remove that flush_scheduled_work() and to do
> refcounting around smc_phy_configure(): dev_hold() when scheduling the work
> (if schedule_work() returned true), dev_put() in the handler.
Something like the following? I can bring the interface up and down
without locks now.
I introduced the smc_phy_configure_wq() wrapper for the workqueue call
because smc_phy_configure is also called directly from a couple of
places -- I guess I could also have prefixed both of those calls with a
dev_hold() or punted both to the workqueue but this way seems nicest.
Signed-off-by: Ian Campbell <[email protected]>
Index: 2.6/drivers/net/smc91x.c
===================================================================
--- 2.6.orig/drivers/net/smc91x.c 2004-11-16 09:26:52.000000000 +0000
+++ 2.6/drivers/net/smc91x.c 2004-11-24 09:40:19.940528769 +0000
@@ -1158,6 +1158,20 @@
}
/*
+ * smc_phy_configure_wq
+ *
+ * The net_device is referenced when the work was scheduled to avoid
+ * the need for a flush_scheduled_work() in smc_close(). Drop the
+ * reference and then do the configuration.
+ */
+static void smc_phy_configure_wq(void *data)
+{
+ struct net_device *dev = data;
+ dev_put(dev);
+ smc_phy_configure(data);
+}
+
+/*
* smc_phy_interrupt
*
* Purpose: Handle interrupts relating to PHY register 18. This is
@@ -1350,10 +1364,13 @@
/*
* Reconfiguring the PHY doesn't seem like a bad idea here, but
* smc_phy_configure() calls msleep() which calls schedule_timeout()
- * which calls schedule(). Ence we use a work queue.
+ * which calls schedule(). Hence we use a work queue.
*/
- if (lp->phy_type != 0)
- schedule_work(&lp->phy_configure);
+ if (lp->phy_type != 0) {
+ if (schedule_work(&lp->phy_configure)) {
+ dev_hold(dev);
+ }
+ }
/* We can accept TX packets again */
dev->trans_start = jiffies;
@@ -1536,10 +1553,8 @@
/* clear everything */
smc_shutdown(dev);
- if (lp->phy_type != 0) {
- flush_scheduled_work();
+ if (lp->phy_type != 0)
smc_phy_powerdown(dev, lp->mii.phy_id);
- }
if (lp->pending_tx_skb) {
dev_kfree_skb(lp->pending_tx_skb);
@@ -1891,7 +1906,7 @@
dev->ethtool_ops = &smc_ethtool_ops;
tasklet_init(&lp->tx_task, smc_hardware_send_pkt, (unsigned long)dev);
- INIT_WORK(&lp->phy_configure, smc_phy_configure, dev);
+ INIT_WORK(&lp->phy_configure, smc_phy_configure_wq, dev);
lp->mii.phy_id_mask = 0x1f;
lp->mii.reg_num_mask = 0x1f;
lp->mii.force_media = 0;
--
Ian Campbell, Senior Design Engineer
Web: http://www.arcom.com
Arcom, Clifton Road, Direct: +44 (0)1223 403 465
Cambridge CB1 7EA, United Kingdom Phone: +44 (0)1223 411 200
_____________________________________________________________________
The message in this transmission is sent in confidence for the attention of the addressee only and should not be disclosed to any other party. Unauthorised recipients are requested to preserve this confidentiality. Please advise the sender if the addressee is not resident at the receiving end. Email to and from Arcom is automatically monitored for operational and lawful business reasons.
This message has been virus scanned by MessageLabs.
Ian Campbell <[email protected]> wrote:
>
> On Tue, 2004-11-23 at 15:31 -0800, Andrew Morton wrote:
> > One possible fix would be to remove that flush_scheduled_work() and to do
> > refcounting around smc_phy_configure(): dev_hold() when scheduling the work
> > (if schedule_work() returned true), dev_put() in the handler.
>
> Something like the following?
I think so.
> +static void smc_phy_configure_wq(void *data)
> +{
> + struct net_device *dev = data;
> + dev_put(dev);
> + smc_phy_configure(data);
> +}
You'd want to do the dev_put() after the smc_phy_configure() though. It
may still be a tiny bit racy against module unload.
On Wed, 2004-11-24 at 01:46 -0800, Andrew Morton wrote:
> Ian Campbell <[email protected]> wrote:
> >
> > On Tue, 2004-11-23 at 15:31 -0800, Andrew Morton wrote:
> > > One possible fix would be to remove that flush_scheduled_work() and to do
> > > refcounting around smc_phy_configure(): dev_hold() when scheduling the work
> > > (if schedule_work() returned true), dev_put() in the handler.
> >
> > Something like the following?
>
> I think so.
>
> > +static void smc_phy_configure_wq(void *data)
> > +{
> > + struct net_device *dev = data;
> > + dev_put(dev);
> > + smc_phy_configure(data);
> > +}
>
> You'd want to do the dev_put() after the smc_phy_configure() though. It
> may still be a tiny bit racy against module unload.
Quite right. Fixed patch included.
Signed-off-by: Ian Campbell <[email protected]>
Index: 2.6/drivers/net/smc91x.c
===================================================================
--- 2.6.orig/drivers/net/smc91x.c 2004-11-16 09:26:52.000000000 +0000
+++ 2.6/drivers/net/smc91x.c 2004-11-24 09:53:54.397248397 +0000
@@ -1158,6 +1158,20 @@
}
/*
+ * smc_phy_configure_wq
+ *
+ * The net_device is referenced when the work was scheduled to avoid
+ * the need for a flush_scheduled_work() in smc_close(). Drop the
+ * reference and then do the configuration.
+ */
+static void smc_phy_configure_wq(void *data)
+{
+ struct net_device *dev = data;
+ smc_phy_configure(data);
+ dev_put(dev);
+}
+
+/*
* smc_phy_interrupt
*
* Purpose: Handle interrupts relating to PHY register 18. This is
@@ -1350,10 +1364,13 @@
/*
* Reconfiguring the PHY doesn't seem like a bad idea here, but
* smc_phy_configure() calls msleep() which calls schedule_timeout()
- * which calls schedule(). Ence we use a work queue.
+ * which calls schedule(). Hence we use a work queue.
*/
- if (lp->phy_type != 0)
- schedule_work(&lp->phy_configure);
+ if (lp->phy_type != 0) {
+ if (schedule_work(&lp->phy_configure)) {
+ dev_hold(dev);
+ }
+ }
/* We can accept TX packets again */
dev->trans_start = jiffies;
@@ -1536,10 +1553,8 @@
/* clear everything */
smc_shutdown(dev);
- if (lp->phy_type != 0) {
- flush_scheduled_work();
+ if (lp->phy_type != 0)
smc_phy_powerdown(dev, lp->mii.phy_id);
- }
if (lp->pending_tx_skb) {
dev_kfree_skb(lp->pending_tx_skb);
@@ -1891,7 +1906,7 @@
dev->ethtool_ops = &smc_ethtool_ops;
tasklet_init(&lp->tx_task, smc_hardware_send_pkt, (unsigned long)dev);
- INIT_WORK(&lp->phy_configure, smc_phy_configure, dev);
+ INIT_WORK(&lp->phy_configure, smc_phy_configure_wq, dev);
lp->mii.phy_id_mask = 0x1f;
lp->mii.reg_num_mask = 0x1f;
lp->mii.force_media = 0;
--
Ian Campbell, Senior Design Engineer
Web: http://www.arcom.com
Arcom, Clifton Road, Direct: +44 (0)1223 403 465
Cambridge CB1 7EA, United Kingdom Phone: +44 (0)1223 411 200
On Wed, 24 Nov 2004, Ian Campbell wrote:
> Quite right. Fixed patch included.
Small question:
> + * smc_phy_configure_wq
> + *
> + * The net_device is referenced when the work was scheduled to avoid
> + * the need for a flush_scheduled_work() in smc_close(). Drop the
> + * reference and then do the configuration.
You probably want to invert the comment here too.
> +static void smc_phy_configure_wq(void *data)
> +{
> + struct net_device *dev = data;
> + smc_phy_configure(data);
> + dev_put(dev);
> +}
[...]
> @@ -1536,10 +1553,8 @@
> /* clear everything */
> smc_shutdown(dev);
>
> - if (lp->phy_type != 0) {
> - flush_scheduled_work();
> + if (lp->phy_type != 0)
> smc_phy_powerdown(dev, lp->mii.phy_id);
How do you ensure that smc_phy_configure() can't end up being called
after smc_phy_powerdown() here?
Nicolas
On Wed, 2004-11-24 at 10:21 -0500, Nicolas Pitre wrote:
> > + * smc_phy_configure_wq
> > + *
> > + * The net_device is referenced when the work was scheduled to avoid
> > + * the need for a flush_scheduled_work() in smc_close(). Drop the
> > + * reference and then do the configuration.
>
> You probably want to invert the comment here too.
Quite right.
> > @@ -1536,10 +1553,8 @@
> > /* clear everything */
> > smc_shutdown(dev);
> >
> > - if (lp->phy_type != 0) {
> > - flush_scheduled_work();
> > + if (lp->phy_type != 0)
> > smc_phy_powerdown(dev, lp->mii.phy_id);
>
>
> How do you ensure that smc_phy_configure() can't end up being called
> after smc_phy_powerdown() here?
Hmm, I think that smc_phy_configure() is only called from
smc_drv_resume() and smc_timeout() (via the workqueue).
For smc_drv_resume() I expected that there would be some sort of mutual
exclusion in the generic layers to prevent _close() and _resume() from
happening at the same time.
For smc_timeout() I would also expect that the generic layer would have
cancelled the tx_timeout etc before calling smc_close().
I guess I don't know if either of these things are true -- anyone know?
The other solution might be to set phy_type to 0 in smc_phy_powerdown()
and then redetect it in smc_open() and smc_resume(). Or just use another
flag altogether.
Ian.
--
Ian Campbell, Senior Design Engineer
Web: http://www.arcom.com
Arcom, Clifton Road, Direct: +44 (0)1223 403 465
Cambridge CB1 7EA, United Kingdom Phone: +44 (0)1223 411 200
_____________________________________________________________________
The message in this transmission is sent in confidence for the attention of the addressee only and should not be disclosed to any other party. Unauthorised recipients are requested to preserve this confidentiality. Please advise the sender if the addressee is not resident at the receiving end. Email to and from Arcom is automatically monitored for operational and lawful business reasons.
This message has been virus scanned by MessageLabs.
On Wed, 2004-11-24 at 11:57 -0500, Nicolas Pitre wrote:
I'll have a go at all this tomorrow.
Cheers,
Ian.
--
Ian Campbell, Senior Design Engineer
Web: http://www.arcom.com
Arcom, Clifton Road, Direct: +44 (0)1223 403 465
Cambridge CB1 7EA, United Kingdom Phone: +44 (0)1223 411 200
_____________________________________________________________________
The message in this transmission is sent in confidence for the attention of the addressee only and should not be disclosed to any other party. Unauthorised recipients are requested to preserve this confidentiality. Please advise the sender if the addressee is not resident at the receiving end. Email to and from Arcom is automatically monitored for operational and lawful business reasons.
This message has been virus scanned by MessageLabs.
On Wed, 24 Nov 2004, Ian Campbell wrote:
> On Wed, 2004-11-24 at 10:21 -0500, Nicolas Pitre wrote:
>
> > How do you ensure that smc_phy_configure() can't end up being called
> > after smc_phy_powerdown() here?
>
> Hmm, I think that smc_phy_configure() is only called from
> smc_drv_resume() and smc_timeout() (via the workqueue).
There is smc_open() as well, but only smc_timeout() is really
problematic because of the schedule_work() call.
> The other solution might be to set phy_type to 0 in smc_phy_powerdown()
> and then redetect it in smc_open() and smc_resume(). Or just use another
> flag altogether.
Please make it another flag. You could replace your dev_hold(dev) with
lp->work_pending = 1 and dev_put() with lp->work_pending = 0, then
adding a while (lp->work_pending) schedule() in place of the
flush_scheduled_work().
And while you're at it, could you replace:
smc_phy_configure(void *data)
with
smc_phy_configure(struct net_device *dev)
The parameter doesn't have to be void *data anymore now that you
introduced smc_phy_configure_wq().
And finally, please tell about the reason why flush_scheduled_work()
can't be used in your comment.
Nicolas
Hi,
I've taken Nico's comments on board and used lp->work_pending to detect
whether smc_phy_configure is pending or not instead of dev_hold/put(). I
was able to do away with smc_phy_configure_wq() from the previous
version since setting lp->work_pending=0 can safely be done in
smc_phy_configure() itself, unlike dev_put().
Also fixed a typo 'ence' -> 'Hence' and renamed smc_detect_phy to
smc_phy_detect in order to follow the same pattern as the other
smc_phy_* functions, I was typing smc_phy_detect() every time anyway and
it was getting on my wick. I hope that's OK...
Signed-off-by: Ian Campbell <[email protected]>
Index: 2.6/drivers/net/smc91x.c
===================================================================
--- 2.6.orig/drivers/net/smc91x.c 2004-11-16 09:26:52.000000000 +0000
+++ 2.6/drivers/net/smc91x.c 2004-11-25 09:49:38.830953019 +0000
@@ -203,7 +203,10 @@
u32 msg_enable;
u32 phy_type;
struct mii_if_info mii;
+
+ /* work queue */
struct work_struct phy_configure;
+ int work_pending;
spinlock_t lock;
@@ -903,7 +906,7 @@
/*
* Finds and reports the PHY address
*/
-static void smc_detect_phy(struct net_device *dev)
+static void smc_phy_detect(struct net_device *dev)
{
struct smc_local *lp = netdev_priv(dev);
int phyaddr;
@@ -1155,6 +1158,7 @@
smc_phy_configure_exit:
spin_unlock_irq(&lp->lock);
+ lp->work_pending = 0;
}
/*
@@ -1350,10 +1354,13 @@
/*
* Reconfiguring the PHY doesn't seem like a bad idea here, but
* smc_phy_configure() calls msleep() which calls schedule_timeout()
- * which calls schedule(). Ence we use a work queue.
+ * which calls schedule(). Hence we use a work queue.
*/
- if (lp->phy_type != 0)
- schedule_work(&lp->phy_configure);
+ if (lp->phy_type != 0) {
+ if (schedule_work(&lp->phy_configure)) {
+ lp->work_pending = 1;
+ }
+ }
/* We can accept TX packets again */
dev->trans_start = jiffies;
@@ -1537,7 +1544,18 @@
smc_shutdown(dev);
if (lp->phy_type != 0) {
- flush_scheduled_work();
+ /* We need to ensure that no calls to
+ smc_phy_configure are pending.
+
+ flush_scheduled_work() cannot be called because we
+ are running with the netlink semaphore held (from
+ devinet_ioctl()) and the pending work queue
+ contains linkwatch_event() (scheduled by
+ netif_carrier_off() above). linkwatch_event() also
+ wants the netlink semaphore.
+ */
+ while(lp->work_pending)
+ schedule();
smc_phy_powerdown(dev, lp->mii.phy_id);
}
@@ -1904,7 +1922,7 @@
* Locate the phy, if any.
*/
if (lp->version >= (CHIP_91100 << 4))
- smc_detect_phy(dev);
+ smc_phy_detect(dev);
/* Set default parameters */
lp->msg_enable = NETIF_MSG_LINK;
--
Ian Campbell, Senior Design Engineer
Web: http://www.arcom.com
Arcom, Clifton Road, Direct: +44 (0)1223 403 465
Cambridge CB1 7EA, United Kingdom Phone: +44 (0)1223 411 200
_____________________________________________________________________
The message in this transmission is sent in confidence for the attention of the addressee only and should not be disclosed to any other party. Unauthorised recipients are requested to preserve this confidentiality. Please advise the sender if the addressee is not resident at the receiving end. Email to and from Arcom is automatically monitored for operational and lawful business reasons.
This message has been virus scanned by MessageLabs.
On Thu, 25 Nov 2004, Ian Campbell wrote:
> Hi,
>
> I've taken Nico's comments on board and used lp->work_pending to detect
> whether smc_phy_configure is pending or not instead of dev_hold/put(). I
> was able to do away with smc_phy_configure_wq() from the previous
> version since setting lp->work_pending=0 can safely be done in
> smc_phy_configure() itself, unlike dev_put().
>
> Also fixed a typo 'ence' -> 'Hence' and renamed smc_detect_phy to
> smc_phy_detect in order to follow the same pattern as the other
> smc_phy_* functions, I was typing smc_phy_detect() every time anyway and
> it was getting on my wick. I hope that's OK...
>
> Signed-off-by: Ian Campbell <[email protected]>
Good! Please add "Signed-off-by: Nicolas Pitre <[email protected]>" and
send it to Jeff Garzik <[email protected]>.
> Index: 2.6/drivers/net/smc91x.c
> ===================================================================
> --- 2.6.orig/drivers/net/smc91x.c 2004-11-16 09:26:52.000000000 +0000
> +++ 2.6/drivers/net/smc91x.c 2004-11-25 09:49:38.830953019 +0000
> @@ -203,7 +203,10 @@
> u32 msg_enable;
> u32 phy_type;
> struct mii_if_info mii;
> +
> + /* work queue */
> struct work_struct phy_configure;
> + int work_pending;
>
> spinlock_t lock;
>
> @@ -903,7 +906,7 @@
> /*
> * Finds and reports the PHY address
> */
> -static void smc_detect_phy(struct net_device *dev)
> +static void smc_phy_detect(struct net_device *dev)
> {
> struct smc_local *lp = netdev_priv(dev);
> int phyaddr;
> @@ -1155,6 +1158,7 @@
>
> smc_phy_configure_exit:
> spin_unlock_irq(&lp->lock);
> + lp->work_pending = 0;
> }
>
> /*
> @@ -1350,10 +1354,13 @@
> /*
> * Reconfiguring the PHY doesn't seem like a bad idea here, but
> * smc_phy_configure() calls msleep() which calls schedule_timeout()
> - * which calls schedule(). Ence we use a work queue.
> + * which calls schedule(). Hence we use a work queue.
> */
> - if (lp->phy_type != 0)
> - schedule_work(&lp->phy_configure);
> + if (lp->phy_type != 0) {
> + if (schedule_work(&lp->phy_configure)) {
> + lp->work_pending = 1;
> + }
> + }
>
> /* We can accept TX packets again */
> dev->trans_start = jiffies;
> @@ -1537,7 +1544,18 @@
> smc_shutdown(dev);
>
> if (lp->phy_type != 0) {
> - flush_scheduled_work();
> + /* We need to ensure that no calls to
> + smc_phy_configure are pending.
> +
> + flush_scheduled_work() cannot be called because we
> + are running with the netlink semaphore held (from
> + devinet_ioctl()) and the pending work queue
> + contains linkwatch_event() (scheduled by
> + netif_carrier_off() above). linkwatch_event() also
> + wants the netlink semaphore.
> + */
> + while(lp->work_pending)
> + schedule();
> smc_phy_powerdown(dev, lp->mii.phy_id);
> }
>
> @@ -1904,7 +1922,7 @@
> * Locate the phy, if any.
> */
> if (lp->version >= (CHIP_91100 << 4))
> - smc_detect_phy(dev);
> + smc_phy_detect(dev);
>
> /* Set default parameters */
> lp->msg_enable = NETIF_MSG_LINK;
>
>
> --
> Ian Campbell, Senior Design Engineer
> Web: http://www.arcom.com
> Arcom, Clifton Road, Direct: +44 (0)1223 403 465
> Cambridge CB1 7EA, United Kingdom Phone: +44 (0)1223 411 200
>
>
> _____________________________________________________________________
> The message in this transmission is sent in confidence for the attention of the addressee only and should not be disclosed to any other party. Unauthorised recipients are requested to preserve this confidentiality. Please advise the sender if the addressee is not resident at the receiving end. Email to and from Arcom is automatically monitored for operational and lawful business reasons.
>
> This message has been virus scanned by MessageLabs.
>
Nicolas