From: Knut Petersen <[email protected]>
Date: Mon, 24 Jan 2011 10:25:55 +0100
> As I was hunting something different I found the following (potential)
> problem on an openSuSE 11.3 system with kernel 2.6.38-rc2.
> The message is triggered by smpppd starting a DSL connection.
>
> Knut
>
>
> NET: Registered protocol family 24
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.38-rc2-kape #7
> -------------------------------------------------------
> pppd/2529 is trying to acquire lock:
> (&(&pch->downl)->rlock){+.....}, at: [<f814a634>] ppp_push+0x59/0x4a8
> [ppp_generic]
>
> but task is already holding lock:
> (&(&ppp->wlock)->rlock){+.-...}, at: [<f814ae1b>]
> ppp_xmit_process+0x19/0x451 [ppp_generic]
>
> which lock already depends on the new lock.
I've stared over this trace several times and can't figure out what
the problem is.
Paul, any idea?
On Sun, Feb 06, 2011 at 11:28:56PM -0800, David Miller wrote:
> I've stared over this trace several times and can't figure out what
> the problem is.
>
> Paul, any idea?
We seem to have recursed in the ppp code because of (apparently)
handling a softirq inside a spin_lock_bh region. :( If I understand
the original report correctly, the stack trace looks like this in part:

[<c04153eb>] net_rx_action+0x3f/0xfe
[<c0128563>] __do_softirq+0x76/0xfd
-> #1 (_xmit_NETROM){+.-...}:
[<c01462b2>] lock_acquire+0x47/0x5e
[<c0471c9c>] _raw_spin_lock_irqsave+0x2e/0x3e
[<c040ed60>] skb_dequeue+0x12/0x4a
[<f814c237>] ppp_channel_push+0x2e/0x94 [ppp_generic]

So we were in ppp_channel_push, and the first thing it does is
spin_lock_bh(&pch->downl), and then it calls skb_dequeue, which did a
spin_lock_irqsave, and then somehow we get into __do_softirq. I
thought spin_lock_bh should have stopped softirqs from running?

Paul.
On Mon, Feb 07, 2011 at 09:29:50PM +1100, Paul Mackerras wrote:
> So we were in ppp_channel_push, and the first thing it does is
> spin_lock_bh(&pch->downl), and then it calls skb_dequeue, which did a
> spin_lock_irqsave, and then somehow we get into __do_softirq. I
> thought spin_lock_bh should have stopped softirqs from running?
OK, I think I have misinterpreted the lockdep info in the original
message. If it's saying that we are trying to get ppp->rlock when we
have taken chan->downl, then that would indeed be a bug, since the lock
ordering as documented in the comments is ppp->rlock -> chan->downl.
I can't see in the code where that happens though and the lockdep
trace doesn't seem to be telling me either.
Paul.
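
The ordering rule mentioned above (ppp->rlock before chan->downl) can be checked mechanically. The following is a toy userspace sketch of the idea behind a lock-order checker. It is not how lockdep is implemented, and order_graph, graph_init(), record_acquire() and the two class numbers are made-up names for illustration only. It records "B was acquired while A was held" edges and refuses an acquisition that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical class numbers for the two locks discussed above. */
enum { PPP_RLOCK = 0, CHAN_DOWNL = 1 };

/* edge[a][b] is true once class b has been acquired while a was held. */
struct order_graph {
	bool edge[MAX_CLASSES][MAX_CLASSES];
};

static void graph_init(struct order_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can `to` be reached from `from` over recorded edges? */
static bool reachable(const struct order_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record that `acquired` is taken while `held` is held.
 * Returns false, and records nothing, if the new edge would close a cycle,
 * i.e. if `held` is already reachable from `acquired`.
 */
static bool record_acquire(struct order_graph *g, int held, int acquired)
{
	bool seen[MAX_CLASSES] = { false };

	if (reachable(g, acquired, held, seen))
		return false;
	g->edge[held][acquired] = true;
	return true;
}
```

In this model, recording PPP_RLOCK then CHAN_DOWNL is accepted, and a later attempt to take PPP_RLOCK while holding CHAN_DOWNL is rejected as an inversion of the documented order.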
Hi everybody!

I bisected the problem with the following result:

aa9421041128abb4d269ee1dc502ff65fb3b7d69 is the first bad commit
commit aa9421041128abb4d269ee1dc502ff65fb3b7d69
Author: Changli Gao <[email protected]>
Date: Sat Dec 4 02:31:41 2010 +0000

    net: init ingress queue

    The dev field of ingress queue is forgot to initialized, then NULL
    pointer dereference happens in qdisc_alloc().

    Move inits of tx queues to netif_alloc_netdev_queues().

    Signed-off-by: Changli Gao <[email protected]>
    Acked-by: Eric Dumazet <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

:040000 040000 dcbb6ab41c4308cba1bc6823d200dcf92aa402d8 b5e190ec681d26ffe62d1d0214c4ef77b8034189 M net

cu,
Knut
From: Knut Petersen <[email protected]>
Date: Tue, 08 Feb 2011 08:51:22 +0100
> I bisected the problem with the following result:
>
> aa9421041128abb4d269ee1dc502ff65fb3b7d69 is the first bad commit
Indeed, this initialization is now too early for the sake
of getting the lockdep bits right. The problem is that at
the point where we call netif_alloc_netdev_queues() we
haven't initialized dev->type yet, so it is always
zero when we set up the lockdep class for ->_xmit_lock.
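
To make the ordering problem concrete, here is a minimal userspace sketch, not kernel code. The names fake_dev, xmit_class_for_type(), fake_alloc_queues(), ppp_like_setup(), alloc_queues_first() and alloc_setup_first() are hypothetical stand-ins, and the lookup is far simpler than the real one in the kernel. Only the ARPHRD_* values are taken from <linux/if_arp.h>. It shows why a class chosen before setup() runs ends up as the type-0 class, which is consistent with the _xmit_NETROM name in the trace quoted earlier in the thread.

```c
#include <assert.h>
#include <string.h>

/* Values as defined in <linux/if_arp.h>. */
#define ARPHRD_NETROM 0
#define ARPHRD_ETHER  1
#define ARPHRD_PPP    512

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct fake_dev {
	unsigned short type;         /* filled in by the setup() callback */
	const char *xmit_lock_class; /* class picked when the queues are set up */
};

/* Simplified stand-in for the dev->type to lockdep class lookup. */
static const char *xmit_class_for_type(unsigned short type)
{
	switch (type) {
	case ARPHRD_NETROM: return "_xmit_NETROM";
	case ARPHRD_ETHER:  return "_xmit_ETHER";
	case ARPHRD_PPP:    return "_xmit_PPP";
	default:            return "_xmit_VOID";
	}
}

/* Stand-in for queue allocation: the class is fixed from dev->type here. */
static void fake_alloc_queues(struct fake_dev *dev)
{
	dev->xmit_lock_class = xmit_class_for_type(dev->type);
}

/* Stand-in for a PPP-like driver setup() callback. */
static void ppp_like_setup(struct fake_dev *dev)
{
	dev->type = ARPHRD_PPP;
}

/* Ordering after "net: init ingress queue": queues first, then setup(). */
static struct fake_dev alloc_queues_first(void (*setup)(struct fake_dev *))
{
	struct fake_dev dev;

	memset(&dev, 0, sizeof(dev));
	fake_alloc_queues(&dev); /* dev.type is still 0 at this point */
	setup(&dev);
	return dev;
}

/* Reordered variant: setup() first, then the queues. */
static struct fake_dev alloc_setup_first(void (*setup)(struct fake_dev *))
{
	struct fake_dev dev;

	memset(&dev, 0, sizeof(dev));
	setup(&dev);
	fake_alloc_queues(&dev); /* dev.type is already ARPHRD_PPP */
	return dev;
}
```

Calling alloc_queues_first(ppp_like_setup) yields a device whose type is ARPHRD_PPP but whose lock class is still the type-0 one, while alloc_setup_first(ppp_like_setup) picks the class that matches the final type.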
From: David Miller <[email protected]>
Date: Tue, 08 Feb 2011 00:07:21 -0800 (PST)
[ Eric B., CC:'ing you so that you are aware of the init network
namespace leak that's fixed as a side effect of this change. ]
> Indeed, this initialization is now too early for the sake
> of getting the lockdep bits right. The problem is that at
> the point where we call netif_alloc_netdev_queues() we
> haven't initialized dev->type yet, so it is always
> zero when we set up the lockdep class for ->_xmit_lock.

Ok, this should fix it, please test it out for me.
Thanks!
--------------------
net: Fix lockdep regression caused by initializing netdev queues too early.

In commit aa9421041128abb4d269ee1dc502ff65fb3b7d69 ("net: init ingress
queue") we moved the allocation and lock initialization of the queues
into alloc_netdev_mq() since register_netdevice() is way too late.

The problem is that dev->type is not setup until the setup()
callback is invoked by alloc_netdev_mq(), and the dev->type is
what determines the lockdep class to use for the locks in the
queues.

Fix this by doing the queue allocation after the setup() callback
runs.

This is safe because the setup() callback is not allowed to make any
state changes that need to be undone on error (memory allocations,
etc.). It may, however, make state changes that are undone by
free_netdev() (such as netif_napi_add(), which is done by the
ipoib driver's setup routine).

The previous code also leaked a reference to the &init_net namespace
object on RX/TX queue allocation failures.

Signed-off-by: David S. Miller <[email protected]>
---
 net/core/dev.c | 27 ++++++++++++++++-----------
 1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b6d0bf8..8e726cb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5660,30 +5660,35 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 
 	dev_net_set(dev, &init_net);
 
+	dev->gso_max_size = GSO_MAX_SIZE;
+
+	INIT_LIST_HEAD(&dev->ethtool_ntuple_list.list);
+	dev->ethtool_ntuple_list.count = 0;
+	INIT_LIST_HEAD(&dev->napi_list);
+	INIT_LIST_HEAD(&dev->unreg_list);
+	INIT_LIST_HEAD(&dev->link_watch_list);
+	dev->priv_flags = IFF_XMIT_DST_RELEASE;
+	setup(dev);
+
 	dev->num_tx_queues = txqs;
 	dev->real_num_tx_queues = txqs;
 	if (netif_alloc_netdev_queues(dev))
-		goto free_pcpu;
+		goto free_all;
 
 #ifdef CONFIG_RPS
 	dev->num_rx_queues = rxqs;
 	dev->real_num_rx_queues = rxqs;
 	if (netif_alloc_rx_queues(dev))
-		goto free_pcpu;
+		goto free_all;
 #endif
 
-	dev->gso_max_size = GSO_MAX_SIZE;
-
-	INIT_LIST_HEAD(&dev->ethtool_ntuple_list.list);
-	dev->ethtool_ntuple_list.count = 0;
-	INIT_LIST_HEAD(&dev->napi_list);
-	INIT_LIST_HEAD(&dev->unreg_list);
-	INIT_LIST_HEAD(&dev->link_watch_list);
-	dev->priv_flags = IFF_XMIT_DST_RELEASE;
-	setup(dev);
 	strcpy(dev->name, name);
 	return dev;
 
+free_all:
+	free_netdev(dev);
+	return NULL;
+
 free_pcpu:
 	free_percpu(dev->pcpu_refcnt);
 	kfree(dev->_tx);
--
1.7.4

On 09.02.2011 00:04, David Miller wrote:
> Ok, this should fix it, please test it out for me.
>
> Thanks!
>
Yes, that patch does solve the problem, and I do not see
any negative effects.
Thanks!
Knut