2008-07-21 15:24:20

by David Miller

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

From: Ingo Molnar <[email protected]>
Date: Mon, 21 Jul 2008 17:04:48 +0200

[ Adding linux-wireless CC, again, Ingo please retain it for
followups, thanks! ]

> * Ingo Molnar <[email protected]> wrote:
>
> > > Pid: 1, comm: swapper Not tainted 2.6.26-tip-00013-g6de15c6-dirty #21290
> >
> > some more information: find below the same crash with vanilla
> > linus/master and no extra patches. The crash site is:
>
> a 32-bit testbox just triggered the same crash too:
>
> calling init_mac80211_hwsim+0x0/0x310
> mac80211_hwsim: Initializing radio 0
> phy0: Failed to select rate control algorithm
> phy0: Failed to initialize rate control algorithm
> mac80211_hwsim: ieee80211_register_hw failed (-2)
> BUG: unable to handle kernel NULL pointer dereference at 00000298
> IP: [<c06efb98>] rollback_registered+0x28/0x120
> *pdpt = 0000000000bc9001 *pde = 0000000000000000
> Oops: 0000 [#1] PREEMPT SMP
>
> and that system has no wireless so i guess it's just some unregister
> inbalance kind of init/deinit buglet.
>
> Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2008-07-24 10:09:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thu, 2008-07-24 at 02:32 -0700, David Miller wrote:
> From: Peter Zijlstra <[email protected]>
> Date: Thu, 24 Jul 2008 11:27:05 +0200
>
> > Well, not only lockdep, taking a very large number of locks is expensive
> > as well.
>
> Right now it would be on the order of 16 or 32 for
> real hardware.
>
> Much less than the scheduler currently takes on some
> of my systems, so currently you are the pot calling the
> kettle black. :-)

One nit, and then I'll let this issue rest :-)

The scheduler has a long lock dependancy chain (nr_cpu_ids rq locks),
but it never takes all of them at the same time. Any one code path will
at most hold two rq locks.




2008-07-25 19:36:50

by Johannes Berg

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Fri, 2008-07-25 at 21:34 +0200, Jarek Poplawski wrote:

> > Umm, of course it cannot, because then we'd have to take the mutex in
> > the TX path, which we cannot. We cannot have another lock in the TX
> > path, what's so hard to understand about? We need to be able to lock all
> > queues to lock out multiple tx paths at once in some (really) slow paths
> > but not have any extra lock overhead for the tx path, especially not a
> > single lock.
>
> But this mutex doesn't have to be mutex. And it's not for the tx path,
> only for "service" just like netif_tx_lock(). The fast path needs only
> queue->tx_lock.

No, we need to be able to lock out multiple TX paths at once.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-07-24 10:45:27

by Nick Piggin

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thursday 24 July 2008 20:08, Peter Zijlstra wrote:
> On Thu, 2008-07-24 at 02:32 -0700, David Miller wrote:
> > From: Peter Zijlstra <[email protected]>
> > Date: Thu, 24 Jul 2008 11:27:05 +0200
> >
> > > Well, not only lockdep, taking a very large number of locks is
> > > expensive as well.
> >
> > Right now it would be on the order of 16 or 32 for
> > real hardware.
> >
> > Much less than the scheduler currently takes on some
> > of my systems, so currently you are the pot calling the
> > kettle black. :-)
>
> One nit, and then I'll let this issue rest :-)
>
> The scheduler has a long lock dependancy chain (nr_cpu_ids rq locks),
> but it never takes all of them at the same time. Any one code path will
> at most hold two rq locks.

Aside from lockdep, is there a particular problem with taking 64k locks
at once? (in a very slow path, of course) I don't think it causes a
problem with preempt_count, does it cause issues with -rt kernel?

Hey, something kind of cool (and OT) I've just thought of that we can
do with ticket locks is to take tickets for 2 (or 64K) nested locks,
and then wait for them both (all), so the cost is N*lock + longest spin,
rather than N*lock + N*avg spin.

That would mean even at the worst case of a huge amount of contention
on all 64K locks, it should only take a couple of ms to take all of
them (assuming max spin time isn't ridiculous).

Probably not the kind of feature we want to expose widely, but for
really special things like the scheduler, it might be a neat hack to
save a few cycles ;) Traditional implementations would just have
#define spin_lock_async spin_lock
#define spin_lock_async_wait do {} while (0)

Sorry it's offtopic, but if I didn't post it, I'd forget to. Might be
a fun quick hack for someone.

2008-07-22 13:02:12

by Larry Finger

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

David Miller wrote:
> From: Larry Finger <[email protected]>
> Date: Tue, 22 Jul 2008 01:34:28 -0500
>
>> David Miller wrote:
>>> From: Larry Finger <[email protected]>
>>> Date: Mon, 21 Jul 2008 17:40:10 -0500
>>>
>>>> Sorry :(
>>>>
>>>> I used the davem patch, the second version of your first one, and your second
>>>> one. Both problems persist.
>>>>
>>>> Still plugging away on bisection.
>>> GIT bisecting the lockdep problem is surely going the land you on:
>>>
>>> commit e308a5d806c852f56590ffdd3834d0df0cbed8d7
>> No. It landed on this one.
>
> For the lockdep warnings?

When I was just one commit later, I got both the lockdep warning and the BUG.
This is the commit in question.

commit 16361127ebed0fb8f9d7cc94c6e137eaf710f676
Author: David S. Miller <[email protected]>
Date: Wed Jul 16 02:23:17 2008 -0700

pkt_sched: dev_init_scheduler() does not need to lock qdisc tree.

We are registering the device, there is no way anyone can get
at this object's qdiscs yet in any meaningful way.

Signed-off-by: David S. Miller <[email protected]>

Larry

2008-07-21 21:06:10

by Patrick McHardy

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

David Miller wrote:
> From: Patrick McHardy <[email protected]>
> Date: Mon, 21 Jul 2008 22:51:53 +0200
>
>> David Miller wrote:
>>> From: Larry Finger <[email protected]>
>>> Date: Mon, 21 Jul 2008 15:38:47 -0500
>>>
>>>> David Miller wrote:
>>>>> No further backtrace? That will tell us what driver is causing
>>>>> this.
>>>> Yes, I have a full backtrace.
>>>>
>>>> It starts with possible recursive locking in NetworkManager, and goes directly
>>>> into the Warning - this came from a later pull of Linus's tree.
>>> That helps a lot, I'm looking at this now.
>> I'm guessing this needs similar lockdep class initializations
>> to _xmit_lock since it essentially has the same nesting rules.
>
> Yes, I figured that out just now :-)
>
> Maybe something like the following should do it?


It looks correct in any case. I'm not sure whether it fixes
this lockdep warning though, according to the backtrace and
module list its b43 and dev_mc_sync in net/mac80211/main.c
that are causing the error, which don't seem to be included
in your patch. I'm unable to find where it previously
initialized the xmit_lock lockdep class though, so I must
be missing something :)



2008-07-21 20:21:00

by David Miller

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

From: Larry Finger <[email protected]>
Date: Mon, 21 Jul 2008 14:43:34 -0500

> Jul 21 12:19:37 larrylap kernel: kernel BUG at net/core/dev.c:1328!
> Jul 21 12:19:37 larrylap kernel: invalid opcode: 0000 [1] SMP
> Jul 21 12:19:37 larrylap kernel: CPU 0
> Jul 21 12:19:37 larrylap kernel: Modules linked in: af_packet rfkill_input nfs
> lockd nfs_acl sunrpc cpufreq_conservative cpu
> freq_userspace cpufreq_powersave powernow_k8 fuse loop dm_mod arc4 ecb
> crypto_blkcipher b43 firmware_class rfkill mac80211 c
> fg80211 snd_hda_intel snd_pcm snd_timer led_class snd k8temp input_polldev
> sr_mod soundcore button battery hwmon cdrom force
> deth ac serio_raw ssb snd_page_alloc sg ehci_hcd sd_mod ohci_hcd usbcore edd fan
> thermal processor ext3 mbcache jbd pata_amd
> ahci libata scsi_mod dock
> Jul 21 12:19:37 larrylap kernel: Pid: 2057, comm: b43 Not tainted
> 2.6.26-Linus-git-05253-g14b395e #1
> Jul 21 12:19:37 larrylap kernel: RIP: 0010:[<ffffffff8039ec4d>]
> [<ffffffff8039ec4d>] __netif_schedule+0x12/0x75
> Jul 21 12:19:37 larrylap kernel: RSP: 0000:ffff8800b9ae1de0 EFLAGS: 00010246
>
> With an invalid opcode, mine is likely due to stack corruption.

No further backtrace? That will tell us what driver is causing
this.


2008-07-21 19:48:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370



On Mon, 21 Jul 2008, Larry Finger wrote:
>
> Yes, I'm chasing a distinct bug. The header for mine is
>
> Jul 21 12:19:37 larrylap kernel: kernel BUG at net/core/dev.c:1328!

Ok, that one is fixed now in my tree. Or at least it's turned into a
warning, so the machine should work.

> With an invalid opcode, mine is likely due to stack corruption.

No, invalid opcode is because we use the "ud2" instruction for BUG(),
which causes an invalid op exception. So any BUG[_ON]() will always cause
that on x86.

Linus

2008-07-24 10:56:13

by Miklos Szeredi

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thu, 24 Jul 2008, Nick Piggin wrote:
> Hey, something kind of cool (and OT) I've just thought of that we can
> do with ticket locks is to take tickets for 2 (or 64K) nested locks,
> and then wait for them both (all), so the cost is N*lock + longest spin,
> rather than N*lock + N*avg spin.

Isn't this deadlocky?

E.g. one task takes ticket x=1, then other task comes in and takes x=2
and y=1, then first task takes y=2. Then neither can actually
complete both locks.

Miklos

2008-07-22 11:32:21

by David Miller

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

From: Larry Finger <[email protected]>
Date: Tue, 22 Jul 2008 01:34:28 -0500

> David Miller wrote:
> > From: Larry Finger <[email protected]>
> > Date: Mon, 21 Jul 2008 17:40:10 -0500
> >
> >> Sorry :(
> >>
> >> I used the davem patch, the second version of your first one, and your second
> >> one. Both problems persist.
> >>
> >> Still plugging away on bisection.
> >
> > GIT bisecting the lockdep problem is surely going the land you on:
> >
> > commit e308a5d806c852f56590ffdd3834d0df0cbed8d7
>
> No. It landed on this one.

For the lockdep warnings?

2008-07-23 07:59:21

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Jarek Poplawski <[email protected]>
Date: Wed, 23 Jul 2008 06:20:36 +0000

> PS: if there is nothing new in lockdep the classical method would
> be to change this static array:
>
> static struct lock_class_key
> netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
>
> to
>
> static struct lock_class_key
> netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)][MAX_NUM_TX_QUEUES];
>
> and set lockdep classes per queue as well. (If we are sure we don't
> need lockdep subclasses anywhere this could be optimized by using
> one lock_class_key per 8 queues and spin_lock_nested()).

Unfortunately MAX_NUM_TX_QUEUES is USHORT_MAX, so this isn't really
a feasible approach.

spin_lock_nested() isn't all that viable either, as the subclass
limit is something like 8.


2008-07-24 09:27:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thu, 2008-07-24 at 02:20 -0700, David Miller wrote:
> From: Peter Zijlstra <[email protected]>
> Date: Thu, 24 Jul 2008 11:10:48 +0200
>
> > Ok, then how about something like this, the idea is to wrap the per tx
> > lock with a read lock of the device and let the netif_tx_lock() be the
> > write side, therefore excluding all device locks, but not incure the
> > cacheline bouncing on the read side by using per-cpu counters like rcu
> > does.
> >
> > This of course requires that netif_tx_lock() is rare, otherwise stuff
> > will go bounce anyway...
> >
> > Probably missed a few details,.. but I think the below ought to show the
> > idea...
>
> Thanks for the effort, but I don't think we can seriously consider
> this.
>
> This lock is taken for every packet transmitted by the system, adding
> another memory reference (the RCU deref) and a counter bump is just
> not something we can just add to placate lockdep. We going through
> all of this effort to seperate the TX locking into individual
> queues, it would be silly to go back and make it more expensive.

Well, not only lockdep, taking a very large number of locks is expensive
as well.

> I have other ideas which I've expanded upon in other emails. They
> involve creating a netif_tx_freeze() interface and getting the drivers
> to start using it.

OK, as long as we get there :-)


2008-07-31 12:29:32

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Jarek Poplawski <[email protected]>
Date: Sun, 27 Jul 2008 22:37:57 +0200

> Looks like enough to me. (Probably it could even share space with
> the state.)

So I made some progress on this, three things:

1) I remember why I choose a to use a bit in my design, it's so that
it does not increase the costs of the checks in the fast paths.
test_bit(X) && test_bit(Y) can be combined into a single test by
the compiler.

2) We can't use the reference counting scheme, because we don't want
to let a second cpu into these protected code paths just because
another is in the middle of using a freeze too.

3) So we can simply put a top-level TX spinlock around these things.

Therefore all the hot paths:

a) grab _xmit_lock
b) check XOFF and FROZEN
c) only call ->hard_start_xmit() if both bits are clear

netif_tx_lock() does:

1) grab netdev->tx_global_lock
2) for_each_tx_queue() {
lock(txq);
set_bit(FROZEN);
unlock(txq);
}

and unlock does:

1) for_each_tx_queue() {
clear_bit(FROZEN);
if (!test_bit(XOFF))
__netif_schedule();
}
2) release netdev->tx_global_lock

And this seems to satisfy all the constraints which are:

1) Must act like a lock and protect execution of the code path
which occurs inside of "netif_tx_{lock,unlock}()"

2) Must ensure no cpus are executing inside of ->hard_start_xmit()
after netif_tx_lock() returns.

3) Must not try to grab all the TX queue locks at once.

This top-level tx_global_lock also simplifies the freezing, as
it makes sure only one cpu is initiating or finishing a freeze
at any given time.

I've also adjusted code that really and truly only wanted to
lock one queue at a time, which in particular was IFB and the
teql scheduler.

It's late here, but I'll start testing the following patch on my
multiqueue capable cards after some sleep.

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 0960e69..e4fbefc 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -69,18 +69,20 @@ static void ri_tasklet(unsigned long dev)
struct net_device *_dev = (struct net_device *)dev;
struct ifb_private *dp = netdev_priv(_dev);
struct net_device_stats *stats = &_dev->stats;
+ struct netdev_queue *txq;
struct sk_buff *skb;

+ txq = netdev_get_tx_queue(_dev, 0);
dp->st_task_enter++;
if ((skb = skb_peek(&dp->tq)) == NULL) {
dp->st_txq_refl_try++;
- if (netif_tx_trylock(_dev)) {
+ if (__netif_tx_trylock(txq)) {
dp->st_rxq_enter++;
while ((skb = skb_dequeue(&dp->rq)) != NULL) {
skb_queue_tail(&dp->tq, skb);
dp->st_rx2tx_tran++;
}
- netif_tx_unlock(_dev);
+ __netif_tx_unlock(txq);
} else {
/* reschedule */
dp->st_rxq_notenter++;
@@ -115,7 +117,7 @@ static void ri_tasklet(unsigned long dev)
BUG();
}

- if (netif_tx_trylock(_dev)) {
+ if (__netif_tx_trylock(txq)) {
dp->st_rxq_check++;
if ((skb = skb_peek(&dp->rq)) == NULL) {
dp->tasklet_pending = 0;
@@ -123,10 +125,10 @@ static void ri_tasklet(unsigned long dev)
netif_wake_queue(_dev);
} else {
dp->st_rxq_rsch++;
- netif_tx_unlock(_dev);
+ __netif_tx_unlock(txq);
goto resched;
}
- netif_tx_unlock(_dev);
+ __netif_tx_unlock(txq);
} else {
resched:
dp->tasklet_pending = 1;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b4d056c..ee583f6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -440,6 +440,7 @@ static inline void napi_synchronize(const struct napi_struct *n)
enum netdev_queue_state_t
{
__QUEUE_STATE_XOFF,
+ __QUEUE_STATE_FROZEN,
};

struct netdev_queue {
@@ -636,7 +637,7 @@ struct net_device
unsigned int real_num_tx_queues;

unsigned long tx_queue_len; /* Max frames per queue allowed */
-
+ spinlock_t tx_global_lock;
/*
* One part is mostly used on xmit path (device)
*/
@@ -1099,6 +1100,11 @@ static inline int netif_queue_stopped(const struct net_device *dev)
return netif_tx_queue_stopped(netdev_get_tx_queue(dev, 0));
}

+static inline int netif_tx_queue_frozen(const struct netdev_queue *dev_queue)
+{
+ return test_bit(__QUEUE_STATE_FROZEN, &dev_queue->state);
+}
+
/**
* netif_running - test if up
* @dev: network device
@@ -1475,6 +1481,26 @@ static inline void __netif_tx_lock_bh(struct netdev_queue *txq)
txq->xmit_lock_owner = smp_processor_id();
}

+static inline int __netif_tx_trylock(struct netdev_queue *txq)
+{
+ int ok = spin_trylock(&txq->_xmit_lock);
+ if (likely(ok))
+ txq->xmit_lock_owner = smp_processor_id();
+ return ok;
+}
+
+static inline void __netif_tx_unlock(struct netdev_queue *txq)
+{
+ txq->xmit_lock_owner = -1;
+ spin_unlock(&txq->_xmit_lock);
+}
+
+static inline void __netif_tx_unlock_bh(struct netdev_queue *txq)
+{
+ txq->xmit_lock_owner = -1;
+ spin_unlock_bh(&txq->_xmit_lock);
+}
+
/**
* netif_tx_lock - grab network device transmit lock
* @dev: network device
@@ -1484,12 +1510,23 @@ static inline void __netif_tx_lock_bh(struct netdev_queue *txq)
*/
static inline void netif_tx_lock(struct net_device *dev)
{
- int cpu = smp_processor_id();
unsigned int i;
+ int cpu;

+ spin_lock(&dev->tx_global_lock);
+ cpu = smp_processor_id();
for (i = 0; i < dev->num_tx_queues; i++) {
struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
+
+ /* We are the only thread of execution doing a
+ * freeze, but we have to grab the _xmit_lock in
+ * order to synchronize with threads which are in
+ * the ->hard_start_xmit() handler and already
+ * checked the frozen bit.
+ */
__netif_tx_lock(txq, cpu);
+ set_bit(__QUEUE_STATE_FROZEN, &txq->state);
+ __netif_tx_unlock(txq);
}
}

@@ -1499,40 +1536,22 @@ static inline void netif_tx_lock_bh(struct net_device *dev)
netif_tx_lock(dev);
}

-static inline int __netif_tx_trylock(struct netdev_queue *txq)
-{
- int ok = spin_trylock(&txq->_xmit_lock);
- if (likely(ok))
- txq->xmit_lock_owner = smp_processor_id();
- return ok;
-}
-
-static inline int netif_tx_trylock(struct net_device *dev)
-{
- return __netif_tx_trylock(netdev_get_tx_queue(dev, 0));
-}
-
-static inline void __netif_tx_unlock(struct netdev_queue *txq)
-{
- txq->xmit_lock_owner = -1;
- spin_unlock(&txq->_xmit_lock);
-}
-
-static inline void __netif_tx_unlock_bh(struct netdev_queue *txq)
-{
- txq->xmit_lock_owner = -1;
- spin_unlock_bh(&txq->_xmit_lock);
-}
-
static inline void netif_tx_unlock(struct net_device *dev)
{
unsigned int i;

for (i = 0; i < dev->num_tx_queues; i++) {
struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
- __netif_tx_unlock(txq);
- }

+ /* No need to grab the _xmit_lock here. If the
+ * queue is not stopped for another reason, we
+ * force a schedule.
+ */
+ clear_bit(__QUEUE_STATE_FROZEN, &txq->state);
+ if (!test_bit(__QUEUE_STATE_XOFF, &txq->state))
+ __netif_schedule(txq->qdisc);
+ }
+ spin_unlock(&dev->tx_global_lock);
}

static inline void netif_tx_unlock_bh(struct net_device *dev)
@@ -1556,13 +1575,18 @@ static inline void netif_tx_unlock_bh(struct net_device *dev)
static inline void netif_tx_disable(struct net_device *dev)
{
unsigned int i;
+ int cpu;

- netif_tx_lock_bh(dev);
+ local_bh_disable();
+ cpu = smp_processor_id();
for (i = 0; i < dev->num_tx_queues; i++) {
struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
+
+ __netif_tx_lock(txq, cpu);
netif_tx_stop_queue(txq);
+ __netif_tx_unlock(txq);
}
- netif_tx_unlock_bh(dev);
+ local_bh_enable();
}

static inline void netif_addr_lock(struct net_device *dev)
diff --git a/net/core/dev.c b/net/core/dev.c
index 63d6bcd..69320a5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4200,6 +4200,7 @@ static void netdev_init_queues(struct net_device *dev)
{
netdev_init_one_queue(dev, &dev->rx_queue, NULL);
netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
+ spin_lock_init(&dev->tx_global_lock);
}

/**
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index c127208..6c7af39 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -70,6 +70,7 @@ static void queue_process(struct work_struct *work)
local_irq_save(flags);
__netif_tx_lock(txq, smp_processor_id());
if (netif_tx_queue_stopped(txq) ||
+ netif_tx_queue_frozen(txq) ||
dev->hard_start_xmit(skb, dev) != NETDEV_TX_OK) {
skb_queue_head(&npinfo->txq, skb);
__netif_tx_unlock(txq);
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index c7d484f..3284605 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3305,6 +3305,7 @@ static __inline__ void pktgen_xmit(struct pktgen_dev *pkt_dev)

txq = netdev_get_tx_queue(odev, queue_map);
if (netif_tx_queue_stopped(txq) ||
+ netif_tx_queue_frozen(txq) ||
need_resched()) {
idle_start = getCurUs();

@@ -3320,7 +3321,8 @@ static __inline__ void pktgen_xmit(struct pktgen_dev *pkt_dev)

pkt_dev->idle_acc += getCurUs() - idle_start;

- if (netif_tx_queue_stopped(txq)) {
+ if (netif_tx_queue_stopped(txq) ||
+ netif_tx_queue_frozen(txq)) {
pkt_dev->next_tx_us = getCurUs(); /* TODO */
pkt_dev->next_tx_ns = 0;
goto out; /* Try the next interface */
@@ -3352,7 +3354,8 @@ static __inline__ void pktgen_xmit(struct pktgen_dev *pkt_dev)
txq = netdev_get_tx_queue(odev, queue_map);

__netif_tx_lock_bh(txq);
- if (!netif_tx_queue_stopped(txq)) {
+ if (!netif_tx_queue_stopped(txq) &&
+ !netif_tx_queue_frozen(txq)) {

atomic_inc(&(pkt_dev->skb->users));
retry_now:
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 345838a..9c9cd4d 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -135,7 +135,8 @@ static inline int qdisc_restart(struct Qdisc *q)
txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

HARD_TX_LOCK(dev, txq, smp_processor_id());
- if (!netif_subqueue_stopped(dev, skb))
+ if (!netif_tx_queue_stopped(txq) &&
+ !netif_tx_queue_frozen(txq))
ret = dev_hard_start_xmit(skb, dev, txq);
HARD_TX_UNLOCK(dev, txq);

@@ -162,7 +163,8 @@ static inline int qdisc_restart(struct Qdisc *q)
break;
}

- if (ret && netif_tx_queue_stopped(txq))
+ if (ret && (netif_tx_queue_stopped(txq) ||
+ netif_tx_queue_frozen(txq)))
ret = 0;

return ret;
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index 5372236..2c35c67 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -305,10 +305,11 @@ restart:

switch (teql_resolve(skb, skb_res, slave)) {
case 0:
- if (netif_tx_trylock(slave)) {
- if (!__netif_subqueue_stopped(slave, subq) &&
+ if (__netif_tx_trylock(slave_txq)) {
+ if (!netif_tx_queue_stopped(slave_txq) &&
+ !netif_tx_queue_frozen(slave_txq) &&
slave->hard_start_xmit(skb, slave) == 0) {
- netif_tx_unlock(slave);
+ __netif_tx_unlock(slave_txq);
master->slaves = NEXT_SLAVE(q);
netif_wake_queue(dev);
master->stats.tx_packets++;
@@ -316,7 +317,7 @@ restart:
qdisc_pkt_len(skb);
return 0;
}
- netif_tx_unlock(slave);
+ __netif_tx_unlock(slave_txq);
}
if (netif_queue_stopped(dev))
busy = 1;

2008-07-21 19:13:50

by Larry Finger

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

Ingo Molnar wrote:
> * Ian Schram <[email protected]> wrote:
>
>> I was looking at this out of interest, but I'm in no way familiar with
>> the code.
>
> thanks Ian for the patch, i'll test it.
>
> Note that it was whitespace damaged, find below a tidied up version of
> the patch that i've applied to tip/out-of-tree.
>
> Ingo

This patch may be needed to fix error handling in the hw_sim code, but I get the
crash even with that code disabled. I'm currently bisecting to find the culprit.

Larry

2008-07-22 17:20:06

by Patrick McHardy

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

Larry Finger wrote:
> I pulled from Linus's tree this morning and now have git-05752-g93ded9b.
> The kernel WARNING from __netif_schedule and the lockdep warning are
> present with or without the patches from yesterday.
>
> As I stated earlier, the kernel WARNING (it was a BUG then) was
> introduced in commit 37437bb2 when the BUG statement was entered.
>
> The lockdep warning started with the next commit (16361127).
>
> The lockdep warning is:
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.26-Linus-git-05752-g93ded9b #49

^^^^^^^^^^^^^^ dirty?

This kernel is not using the patches we sent.

2008-07-23 09:29:53

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, Jul 23, 2008 at 11:03:06AM +0200, Peter Zijlstra wrote:
> On Wed, 2008-07-23 at 08:54 +0000, Jarek Poplawski wrote:
> > On Wed, Jul 23, 2008 at 12:59:21AM -0700, David Miller wrote:
> > > From: Jarek Poplawski <[email protected]>
> > > Date: Wed, 23 Jul 2008 06:20:36 +0000
> > >
> > > > PS: if there is nothing new in lockdep the classical method would
> > > > be to change this static array:
> > > >
> > > > static struct lock_class_key
> > > > netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
> > > >
> > > > to
> > > >
> > > > static struct lock_class_key
> > > > netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)][MAX_NUM_TX_QUEUES];
> > > >
> > > > and set lockdep classes per queue as well. (If we are sure we don't
> > > > need lockdep subclasses anywhere this could be optimized by using
> > > > one lock_class_key per 8 queues and spin_lock_nested()).
> > >
> > > Unfortunately MAX_NUM_TX_QUEUES is USHORT_MAX, so this isn't really
> > > a feasible approach.
> >
> > Is it used by real devices already? Maybe for the beginning we could
> > start with something less?
> >
> > > spin_lock_nested() isn't all that viable either, as the subclass
> > > limit is something like 8.
> >
> > This method would need to do some additional counting: depending of
> > a queue number each 8 subsequent queues share (are set to) the same
> > class and their number mod 8 gives the subqueue number for
> > spin_lock_nested().
> >
> > I'll try to find if there is something new around this in lockdep.
> > (lockdep people added to CC.)
>
> There isn't.
>
> Is there a static data structure that the driver needs to instantiate to
> 'create' a queue? Something like:
>
> /* this imaginary e1000 hardware has 16 hardware queues */
> static struct net_tx_queue e1000e_tx_queues[16];

I guess, not.

Then, IMHO, we could be practical and simply skip lockdep validation
for "some" range of locks: e.g. to set the table for the first 256
queues only, and to do e.g. __raw_spin_lock() for bigger numbers. (If
there are any bad locking patterns this should be enough for checking.)

Jarek P.

2008-07-22 12:52:27

by Larry Finger

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

David Miller wrote:
> From: Larry Finger <[email protected]>
> Date: Tue, 22 Jul 2008 01:34:28 -0500
>
>> David Miller wrote:
>>> From: Larry Finger <[email protected]>
>>> Date: Mon, 21 Jul 2008 17:40:10 -0500
>>>
>>>> Sorry :(
>>>>
>>>> I used the davem patch, the second version of your first one, and your second
>>>> one. Both problems persist.
>>>>
>>>> Still plugging away on bisection.
>>> GIT bisecting the lockdep problem is surely going the land you on:
>>>
>>> commit e308a5d806c852f56590ffdd3834d0df0cbed8d7
>> No. It landed on this one.
>
> For the lockdep warnings?

No - this one triggers the kernel BUG as follows:

------------[ cut here ]------------
kernel BUG at net/core/dev.c:1328!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: af_packet rfkill_input nfs lockd nfs_acl sunrpc
cpufreq_conservative cpufreq_userspace cpufreq_powersave powernow_k8 fuse loop
dm_mod arc4 ecb crypto_blkcipher b43 firmware_class rfkill mac80211 cfg80211
led_class input_polldev k8temp sr_mod battery ac ssb button hwmon forcedeth
cdrom serio_raw sg ohci_hcd ehci_hcd sd_mod usbcore edd fan thermal processor
ext3 mbcache jbd pata_amd ahci libata scsi_mod dock
Pid: 2003, comm: b43 Not tainted 2.6.26-rc8-Linus-git-01424-g37437bb #43
RIP: 0010:[<ffffffff803958c6>] [<ffffffff803958c6>] __netif_schedule+0x12/0x75
RSP: 0018:ffff8100b9e33de0 EFLAGS: 00010246
RAX: ffff8100b63819c0 RBX: ffffffff80545300 RCX: ffff8100b6381980
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffffffff80545300
RBP: ffff8100b7b45158 R08: ffff8100b89d8000 R09: ffff8100b9d26000
R10: ffff8100b7b44480 R11: ffffffffa01239ef R12: ffff8100b7b44480
R13: ffff8100b9d26000 R14: ffff8100b89d8000 R15: 0000000000000000
FS: 00007f494406a6f0(0000) GS:ffffffff8055e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f49440933dc CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process b43 (pid: 2003, threadinfo ffff8100b9e32000, task ffff8100b4a3e480)
Stack: ffff8100b7b45158 ffff8100b89d8900 ffff8100b7b45158 ffffffffa0158455
ffff8100ba3287c0 0000000000000246 0000000000000000 0000000000000000
ffff8100b9e33e70 ffff8100b7b451b8 ffff8100ba3287c0 ffff8100b7b451b0
Call Trace:
[<ffffffffa0158455>] ? :mac80211:ieee80211_scan_completed+0x25b/0x2e1
[<ffffffffa01586d6>] ? :mac80211:ieee80211_sta_scan_work+0x0/0x1b8
[<ffffffff8023f7d7>] ? run_workqueue+0xf1/0x1f3
[<ffffffff8023f9b4>] ? worker_thread+0xdb/0xea
[<ffffffff80243017>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff8023f8d9>] ? worker_thread+0x0/0xea
[<ffffffff80242cff>] ? kthread+0x47/0x73
[<ffffffff80402845>] ? trace_hardirqs_on_thunk+0x35/0x3a
[<ffffffff8020cd48>] ? child_rip+0xa/0x12
[<ffffffff8020c45f>] ? restore_args+0x0/0x30
[<ffffffff8021d3b6>] ? flat_send_IPI_mask+0x0/0x67
[<ffffffff80242c93>] ? kthreadd+0x188/0x1ad
[<ffffffff80242c93>] ? kthreadd+0x188/0x1ad
[<ffffffff80242cb8>] ? kthread+0x0/0x73
[<ffffffff8020cd3e>] ? child_rip+0x0/0x12


Code: 00 00 75 0a 55 9d 5e 5b 5d e9 32 64 eb ff e8 21 73 eb ff 55 9d 59 5b 5d c3
55 53 48 89 fb 48 83 ec 08 48 81 ff 00 53 54 80 75 04 <0f> 0b eb fe 48 8d 47 30
f0 0f ba 28 01 19 d2 85 d2 75 4c 9c 5d
RIP [<ffffffff803958c6>] __netif_schedule+0x12/0x75
RSP <ffff8100b9e33de0>
---[ end trace 396dc6bdf73da468 ]---

I'll have to trace back to see which of the bisections produced both the lockdep
and the kernel bug.

Larry



2008-07-21 19:43:34

by Larry Finger

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

Ingo Molnar wrote:
> * Larry Finger <[email protected]> wrote:
>
>> Ingo Molnar wrote:
>>> * Ian Schram <[email protected]> wrote:
>>>
>>>> I was looking at this out of interest, but I'm in no way familiar
>>>> with the code.
>>> thanks Ian for the patch, i'll test it.
>>>
>>> Note that it was whitespace damaged, find below a tidied up version of
>>> the patch that i've applied to tip/out-of-tree.
>>>
>>> Ingo
>> This patch may be needed to fix error handling in the hw_sim code, but
>> I get the crash even with that code disabled. I'm currently bisecting
>> to find the culprit.
>
> ok. I just reactivated CONFIG_MAC80211_HWSIM, applied Ian's fix and the
> crash went away:
>
> calling iwl4965_init+0x0/0x6c
> iwl4965: Intel(R) Wireless WiFi Link 4965AGN driver for Linux, 1.3.27kd
> iwl4965: Copyright(c) 2003-2008 Intel Corporation
> initcall iwl4965_init+0x0/0x6c returned 0 after 10 msecs
> calling init_mac80211_hwsim+0x0/0x31c
> mac80211_hwsim: Initializing radio 0
> PM: Adding info for No Bus:hwsim0
> PM: Adding info for No Bus:phy0
> PM: Adding info for No Bus:wmaster0
> phy0: Failed to select rate control algorithm
> phy0: Failed to initialize rate control algorithm
> PM: Removing info for No Bus:wmaster0
> PM: Removing info for No Bus:phy0
> mac80211_hwsim: ieee80211_register_hw failed (-2)
> PM: Removing info for No Bus:hwsim0
> initcall init_mac80211_hwsim+0x0/0x31c returned -2 after 58 msecs
> initcall init_mac80211_hwsim+0x0/0x31c returned with error code -2
> calling dmfe_init_module+0x0/0xea
> dmfe: Davicom DM9xxx net driver, version 1.36.4 (2002-01-17)
> initcall dmfe_init_module+0x0/0xea returned 0 after 5 msecs
>
> So at least as far as the init_mac80211_hwsim() deinit crash goes:
>
> Tested-by: Ingo Molnar <[email protected]>

Yes, I'm chasing a distinct bug. The header for mine is

Jul 21 12:19:37 larrylap kernel: kernel BUG at net/core/dev.c:1328!
Jul 21 12:19:37 larrylap kernel: invalid opcode: 0000 [1] SMP
Jul 21 12:19:37 larrylap kernel: CPU 0
Jul 21 12:19:37 larrylap kernel: Modules linked in: af_packet rfkill_input nfs
lockd nfs_acl sunrpc cpufreq_conservative cpu
freq_userspace cpufreq_powersave powernow_k8 fuse loop dm_mod arc4 ecb
crypto_blkcipher b43 firmware_class rfkill mac80211 c
fg80211 snd_hda_intel snd_pcm snd_timer led_class snd k8temp input_polldev
sr_mod soundcore button battery hwmon cdrom force
deth ac serio_raw ssb snd_page_alloc sg ehci_hcd sd_mod ohci_hcd usbcore edd fan
thermal processor ext3 mbcache jbd pata_amd
ahci libata scsi_mod dock
Jul 21 12:19:37 larrylap kernel: Pid: 2057, comm: b43 Not tainted
2.6.26-Linus-git-05253-g14b395e #1
Jul 21 12:19:37 larrylap kernel: RIP: 0010:[<ffffffff8039ec4d>]
[<ffffffff8039ec4d>] __netif_schedule+0x12/0x75
Jul 21 12:19:37 larrylap kernel: RSP: 0000:ffff8800b9ae1de0 EFLAGS: 00010246

With an invalid opcode, mine is likely due to stack corruption.

Larry

2008-07-21 22:05:25

by Patrick McHardy

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

diff --git a/net/core/dev.c b/net/core/dev.c
index 2eed17b..371b1a0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -259,7 +259,7 @@ static RAW_NOTIFIER_HEAD(netdev_chain);

DEFINE_PER_CPU(struct softnet_data, softnet_data);

-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#ifdef CONFIG_LOCKDEP
/*
* register_netdevice() inits txq->_xmit_lock and sets lockdep class
* according to dev->type


Attachments:
x (389.00 B)

2008-07-23 20:42:52

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, Jul 23, 2008 at 01:16:07PM -0700, David Miller wrote:
...
> There will always be a need for a "stop all the TX queues" operation.

The question is if the current way is "all correct". As a matter of
fact I think Peter's doubts could be justified: taking "USHORT_MAX"
locks looks really dubious (so maybe it's not so strange lockdep
didn't get used to this).

Jarek P.

2008-07-21 21:51:26

by Larry Finger

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

Patrick McHardy wrote:
> Patrick McHardy wrote:
>> David Miller wrote:
>>> Maybe something like the following should do it?
>>
>>
>> It looks correct in any case. I'm not sure whether it fixes
>> this lockdep warning though, according to the backtrace and
>> module list its b43 and dev_mc_sync in net/mac80211/main.c
>> that are causing the error, which don't seem to be included
>> in your patch. I'm unable to find where it previously
>> initialized the xmit_lock lockdep class though, so I must
>> be missing something :)
>
> This is what I was missing, we're setting a lockdep class
> by default depending on dev->type. This patch combined
> with yours should fix all addr_list_lock warnings.

No cigar yet. I tried davem's patch first, then yours on top of his. I still get
both the recursive locking and the kernel warning.

BTW, wireless doesn't work but if I plug in the wire, then networking is OK. It
seems to be in mac80211, which is strange because I routinely run the latest
wireless-testing kernel, and all the wireless bits should be there already.

I'm still plugging away at the bisection. I think I got away from the kernel
that won't build.

Larry

2008-07-23 20:16:07

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Jarek Poplawski <[email protected]>
Date: Wed, 23 Jul 2008 11:49:14 +0000

> On Wed, Jul 23, 2008 at 11:35:19AM +0000, Jarek Poplawski wrote:
> > On Wed, Jul 23, 2008 at 12:58:16PM +0200, Peter Zijlstra wrote:
> ...
> > > When I look at the mac802.11 code in ieee80211_tx_pending() it looks
> > > like it can do with just one lock at a time, instead of all - but I
> > > might be missing some obvious details.
> > >
> > > So I guess my question is, is netif_tx_lock() here to stay, or is the
> > > right fix to convert all those drivers to use __netif_tx_lock() which
> > > locks only a single queue?
> > >
> >
> > It's a new thing mainly for new hardware/drivers, and just after
> > conversion (older drivers effectively use __netif_tx_lock()), so it'll
> > probably stay for some time until something better is found. David,
> > will tell the rest, I hope.
>
> ...And, of course, these new drivers should also lock a single queue
> where possible.

It isn't going away.

There will always be a need for a "stop all the TX queues" operation.

2008-07-21 21:42:58

by Patrick McHardy

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

net: set lockdep class for dev->addr_list_lock

Initialize dev->addr_list_lock lockdep classes equally to dev->_xmit_lock.

Signed-off-by: Patrick McHardy <[email protected]>

diff --git a/net/core/dev.c b/net/core/dev.c
index 2eed17b..6f8b6c5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -299,6 +299,7 @@ static const char *netdev_lock_name[] =
"_xmit_NONE"};

static struct lock_class_key netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
+static struct lock_class_key netdev_addr_lock_key[ARRAY_SIZE(netdev_lock_type)];

static inline unsigned short netdev_lock_pos(unsigned short dev_type)
{
@@ -311,8 +312,8 @@ static inline unsigned short netdev_lock_pos(unsigned short dev_type)
return ARRAY_SIZE(netdev_lock_type) - 1;
}

-static inline void netdev_set_lockdep_class(spinlock_t *lock,
- unsigned short dev_type)
+static inline void netdev_set_xmit_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
{
int i;

@@ -320,9 +321,22 @@ static inline void netdev_set_lockdep_class(spinlock_t *lock,
lockdep_set_class_and_name(lock, &netdev_xmit_lock_key[i],
netdev_lock_name[i]);
}
+
+static inline void netdev_set_addr_lockdep_class(struct net_device *dev)
+{
+ int i;
+
+ i = netdev_lock_pos(dev->type);
+ lockdep_set_class_and_name(&dev->addr_list_lock,
+ &netdev_addr_lock_key[i],
+ netdev_lock_name[i]);
+}
#else
-static inline void netdev_set_lockdep_class(spinlock_t *lock,
- unsigned short dev_type)
+static inline void netdev_set_xmit_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
+{
+}
+static inline void netdev_set_addr_lockdep_class(struct net_device *dev)
{
}
#endif
@@ -3843,7 +3857,7 @@ static void __netdev_init_queue_locks_one(struct net_device *dev,
void *_unused)
{
spin_lock_init(&dev_queue->_xmit_lock);
- netdev_set_lockdep_class(&dev_queue->_xmit_lock, dev->type);
+ netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
dev_queue->xmit_lock_owner = -1;
}

@@ -3888,6 +3902,7 @@ int register_netdevice(struct net_device *dev)
net = dev_net(dev);

spin_lock_init(&dev->addr_list_lock);
+ netdev_set_addr_lockdep_class(dev);
netdev_init_queue_locks(dev);

dev->iflink = -1;


Attachments:
x (2.18 kB)

2008-07-21 21:01:21

by David Miller

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

From: Patrick McHardy <[email protected]>
Date: Mon, 21 Jul 2008 22:51:53 +0200

> David Miller wrote:
> > From: Larry Finger <[email protected]>
> > Date: Mon, 21 Jul 2008 15:38:47 -0500
> >
> >> David Miller wrote:
> >>> No further backtrace? That will tell us what driver is causing
> >>> this.
> >> Yes, I have a full backtrace.
> >>
> >> It starts with possible recursive locking in NetworkManager, and goes directly
> >> into the Warning - this came from a later pull of Linus's tree.
> >
> > That helps a lot, I'm looking at this now.
>
> I'm guessing this needs similar lockdep class initializations
> to _xmit_lock since it essentially has the same nesting rules.

Yes, I figured that out just now :-)

Maybe something like the following should do it?

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 9737c06..a641eea 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5041,6 +5041,7 @@ static int bond_check_params(struct bond_params *params)
}

static struct lock_class_key bonding_netdev_xmit_lock_key;
+static struct lock_class_key bonding_netdev_addr_lock_key;

static void bond_set_lockdep_class_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -5052,6 +5053,8 @@ static void bond_set_lockdep_class_one(struct net_device *dev,

static void bond_set_lockdep_class(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock,
+ &bonding_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, bond_set_lockdep_class_one, NULL);
}

diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index b6500b2..58f4b1d 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -123,6 +123,7 @@ static LIST_HEAD(bpq_devices);
* off into a separate class since they always nest.
*/
static struct lock_class_key bpq_netdev_xmit_lock_key;
+static struct lock_class_key bpq_netdev_addr_lock_key;

static void bpq_set_lockdep_class_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -133,6 +134,7 @@ static void bpq_set_lockdep_class_one(struct net_device *dev,

static void bpq_set_lockdep_class(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock, &bpq_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, bpq_set_lockdep_class_one, NULL);
}

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index efbc155..4239450 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -276,6 +276,7 @@ static int macvlan_change_mtu(struct net_device *dev, int new_mtu)
* separate class since they always nest.
*/
static struct lock_class_key macvlan_netdev_xmit_lock_key;
+static struct lock_class_key macvlan_netdev_addr_lock_key;

#define MACVLAN_FEATURES \
(NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
@@ -295,6 +296,8 @@ static void macvlan_set_lockdep_class_one(struct net_device *dev,

static void macvlan_set_lockdep_class(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock,
+ &macvlan_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, macvlan_set_lockdep_class_one, NULL);
}

diff --git a/drivers/net/wireless/hostap/hostap_hw.c b/drivers/net/wireless/hostap/hostap_hw.c
index 13d5882..3153fe9 100644
--- a/drivers/net/wireless/hostap/hostap_hw.c
+++ b/drivers/net/wireless/hostap/hostap_hw.c
@@ -3101,6 +3101,7 @@ static void prism2_clear_set_tim_queue(local_info_t *local)
* This is a natural nesting, which needs a split lock type.
*/
static struct lock_class_key hostap_netdev_xmit_lock_key;
+static struct lock_class_key hostap_netdev_addr_lock_key;

static void prism2_set_lockdep_class_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -3112,6 +3113,8 @@ static void prism2_set_lockdep_class_one(struct net_device *dev,

static void prism2_set_lockdep_class(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock,
+ &hostap_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, prism2_set_lockdep_class_one, NULL);
}

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index f42bc2b..4bf014e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -569,6 +569,7 @@ static void vlan_dev_set_rx_mode(struct net_device *vlan_dev)
* separate class since they always nest.
*/
static struct lock_class_key vlan_netdev_xmit_lock_key;
+static struct lock_class_key vlan_netdev_addr_lock_key;

static void vlan_dev_set_lockdep_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -581,6 +582,9 @@ static void vlan_dev_set_lockdep_one(struct net_device *dev,

static void vlan_dev_set_lockdep_class(struct net_device *dev, int subclass)
{
+ lockdep_set_class_and_subclass(&dev->addr_list_lock,
+ &vlan_netdev_addr_lock_key,
+ subclass);
netdev_for_each_tx_queue(dev, vlan_dev_set_lockdep_one, &subclass);
}

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index fccc250..532e4fa 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -73,6 +73,7 @@ static const struct proto_ops nr_proto_ops;
* separate class since they always nest.
*/
static struct lock_class_key nr_netdev_xmit_lock_key;
+static struct lock_class_key nr_netdev_addr_lock_key;

static void nr_set_lockdep_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -83,6 +84,7 @@ static void nr_set_lockdep_one(struct net_device *dev,

static void nr_set_lockdep_key(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock, &nr_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, nr_set_lockdep_one, NULL);
}

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index dbc963b..a7f1ce1 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -74,6 +74,7 @@ ax25_address rose_callsign;
* separate class since they always nest.
*/
static struct lock_class_key rose_netdev_xmit_lock_key;
+static struct lock_class_key rose_netdev_addr_lock_key;

static void rose_set_lockdep_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -84,6 +85,7 @@ static void rose_set_lockdep_one(struct net_device *dev,

static void rose_set_lockdep_key(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock, &rose_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, rose_set_lockdep_one, NULL);
}


2008-07-26 09:18:48

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Jarek Poplawski <[email protected]>
Date: Fri, 25 Jul 2008 22:01:37 +0200

> On Fri, Jul 25, 2008 at 09:36:15PM +0200, Johannes Berg wrote:
> > On Fri, 2008-07-25 at 21:34 +0200, Jarek Poplawski wrote:
> >
> > No, we need to be able to lock out multiple TX paths at once.
>
> IMHO, it can do the same. We could e.g. insert a locked spinlock into
> this noop_tx_handler(), to give everyone some waiting.

I think there might be an easier way, but we may have
to modify the state bits a little.

Every call into ->hard_start_xmit() is made like this:

1. lock TX queue
2. check TX queue stopped
3. call ->hard_start_xmit() if not stopped

This means that we can in fact do something like:

unsigned int i;

for (i = 0; i < dev->num_tx_queues; i++) {
struct netdev_queue *txq;

txq = netdev_get_tx_queue(dev, i);
spin_lock_bh(&txq->_xmit_lock);
netif_tx_freeze_queue(txq);
spin_unlock_bh(&txq->_xmit_lock);
}

netif_tx_freeze_queue() just sets a new bit we add.

Then we go to the ->hard_start_xmit() call sites and check this new
"frozen" bit as well as the existing "stopped" bit.

When we unfreeze each queue later, we see if it is stopped, and if not
we schedule it's qdisc for packet processing.

A patch below shows how the guarding would work. It doesn't implement
the actual freeze/unfreeze.

We need to use a side-state bit to do this because we don't
want this operation to get all mixed up with the queue waking
operations that the driver TX reclaim code will be doing
asynchronously.

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b4d056c..cba98fb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -440,6 +440,7 @@ static inline void napi_synchronize(const struct napi_struct *n)
enum netdev_queue_state_t
{
__QUEUE_STATE_XOFF,
+ __QUEUE_STATE_FROZEN,
};

struct netdev_queue {
@@ -1099,6 +1100,11 @@ static inline int netif_queue_stopped(const struct net_device *dev)
return netif_tx_queue_stopped(netdev_get_tx_queue(dev, 0));
}

+static inline int netif_tx_queue_frozen(const struct netdev_queue *dev_queue)
+{
+ return test_bit(__QUEUE_STATE_FROZEN, &dev_queue->state);
+}
+
/**
* netif_running - test if up
* @dev: network device
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index c127208..6c7af39 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -70,6 +70,7 @@ static void queue_process(struct work_struct *work)
local_irq_save(flags);
__netif_tx_lock(txq, smp_processor_id());
if (netif_tx_queue_stopped(txq) ||
+ netif_tx_queue_frozen(txq) ||
dev->hard_start_xmit(skb, dev) != NETDEV_TX_OK) {
skb_queue_head(&npinfo->txq, skb);
__netif_tx_unlock(txq);
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index c7d484f..3284605 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3305,6 +3305,7 @@ static __inline__ void pktgen_xmit(struct pktgen_dev *pkt_dev)

txq = netdev_get_tx_queue(odev, queue_map);
if (netif_tx_queue_stopped(txq) ||
+ netif_tx_queue_frozen(txq) ||
need_resched()) {
idle_start = getCurUs();

@@ -3320,7 +3321,8 @@ static __inline__ void pktgen_xmit(struct pktgen_dev *pkt_dev)

pkt_dev->idle_acc += getCurUs() - idle_start;

- if (netif_tx_queue_stopped(txq)) {
+ if (netif_tx_queue_stopped(txq) ||
+ netif_tx_queue_frozen(txq)) {
pkt_dev->next_tx_us = getCurUs(); /* TODO */
pkt_dev->next_tx_ns = 0;
goto out; /* Try the next interface */
@@ -3352,7 +3354,8 @@ static __inline__ void pktgen_xmit(struct pktgen_dev *pkt_dev)
txq = netdev_get_tx_queue(odev, queue_map);

__netif_tx_lock_bh(txq);
- if (!netif_tx_queue_stopped(txq)) {
+ if (!netif_tx_queue_stopped(txq) &&
+ !netif_tx_queue_frozen(txq)) {

atomic_inc(&(pkt_dev->skb->users));
retry_now:
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index fd2a6ca..f17551a 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -135,7 +135,8 @@ static inline int qdisc_restart(struct Qdisc *q)
txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

HARD_TX_LOCK(dev, txq, smp_processor_id());
- if (!netif_subqueue_stopped(dev, skb))
+ if (!netif_tx_queue_stopped(txq) &&
+ !netif_tx_queue_frozen(txq))
ret = dev_hard_start_xmit(skb, dev, txq);
HARD_TX_UNLOCK(dev, txq);

@@ -162,7 +163,8 @@ static inline int qdisc_restart(struct Qdisc *q)
break;
}

- if (ret && netif_tx_queue_stopped(txq))
+ if (ret && (netif_tx_queue_stopped(txq) ||
+ netif_tx_queue_frozen(txq)))
ret = 0;

return ret;

2008-07-21 22:40:14

by Larry Finger

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

Patrick McHardy wrote:
> Larry Finger wrote:
>> Patrick McHardy wrote:
>>>
>>> This is what I was missing, we're setting a lockdep class
>>> by default depending on dev->type. This patch combined
>>> with yours should fix all addr_list_lock warnings.
>>
>> No cigar yet. I tried davem's patch first, then yours on top of his. I
>> still get both the recursive locking and the kernel warning.
>
> Does this one earn me my cigar? :)

Sorry :(

I used the davem patch, the second version of your first one, and your second
one. Both problems persist.

Still plugging away on bisection.

Larry

2008-07-21 18:18:53

by Ian Schram

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

I was looking at this out of interest, but I'm in no way familiar with the code.

Looks to me that the error handling code in mac80211_hwsim is awkward. Which
leads to it calling ieee80211_unregister_hw even when ieee80211_register_hw failed.

The function has a for loop where it generates all simulated radios. when something
fails, the error handling will call mac80211_hwsim_free which frees all simulated radios
who's pointer isn't zero. However the information stored is insufficient to determine
whether or not the call to ieee80211_register_hw succeeded or not for a specific radio.
The included patch makes init_mac80211_hwsim clean up the current simulated radio,
and then calls into mac80211_hwsim_free to clean up all the radios that did succeed.

This however doesn't explain why the rate control registration failed.. build tested this,
but had some problems reproducing the original problem.

signed-off-by: Ian Schram <[email protected]>
--- a/mac80211_hwsim.c 2008-07-21 18:48:38.000000000 +0200
+++ b/mac80211_hwsim.c 2008-07-21 19:31:44.000000000 +0200
@@ -364,8 +364,7 @@ static void mac80211_hwsim_free(void)
struct mac80211_hwsim_data *data;
data = hwsim_radios[i]->priv;
ieee80211_unregister_hw(hwsim_radios[i]);
- if (!IS_ERR(data->dev))
- device_unregister(data->dev);
+ device_unregister(data->dev);
ieee80211_free_hw(hwsim_radios[i]);
}
}
@@ -437,7 +436,7 @@ static int __init init_mac80211_hwsim(vo
"mac80211_hwsim: device_create_drvdata "
"failed (%ld)\n", PTR_ERR(data->dev));
err = -ENOMEM;
- goto failed;
+ goto failed_drvdata;
}
data->dev->driver = &mac80211_hwsim_driver;

@@ -461,7 +460,7 @@ static int __init init_mac80211_hwsim(vo
if (err < 0) {
printk(KERN_DEBUG "mac80211_hwsim: "
"ieee80211_register_hw failed (%d)\n", err);
- goto failed;
+ goto failed_hw;
}

printk(KERN_DEBUG "%s: hwaddr %s registered\n",
@@ -479,9 +478,9 @@ static int __init init_mac80211_hwsim(vo
rtnl_lock();

err = dev_alloc_name(hwsim_mon, hwsim_mon->name);
- if (err < 0) {
+ if (err < 0)
goto failed_mon;
- }
+

err = register_netdevice(hwsim_mon);
if (err < 0)
@@ -494,7 +493,14 @@ static int __init init_mac80211_hwsim(vo
failed_mon:
rtnl_unlock();
free_netdev(hwsim_mon);
+ mac80211_hwsim_free();
+ return err;

+failed_hw:
+ device_unregister(data->dev);
+failed_drvdata:
+ ieee80211_free_hw(hw);
+ hwsim_radios[i] = 0;
failed:
mac80211_hwsim_free();
return err;



2008-07-21 21:35:34

by Patrick McHardy

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

net: set lockdep class for dev->addr_list_lock

Initialize dev->addr_list_lock lockdep classes equally to dev->_xmit_lock.

Signed-off-by: Patrick McHardy <[email protected]>

diff --git a/net/core/dev.c b/net/core/dev.c
index 2eed17b..9cfed90 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -299,6 +299,7 @@ static const char *netdev_lock_name[] =
"_xmit_NONE"};

static struct lock_class_key netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
+static struct lock_class_key netdev_addr_lock_key[ARRAY_SIZE(netdev_lock_type)];

static inline unsigned short netdev_lock_pos(unsigned short dev_type)
{
@@ -311,8 +312,8 @@ static inline unsigned short netdev_lock_pos(unsigned short dev_type)
return ARRAY_SIZE(netdev_lock_type) - 1;
}

-static inline void netdev_set_lockdep_class(spinlock_t *lock,
- unsigned short dev_type)
+static inline void netdev_set_xmit_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
{
int i;

@@ -320,9 +321,23 @@ static inline void netdev_set_lockdep_class(spinlock_t *lock,
lockdep_set_class_and_name(lock, &netdev_xmit_lock_key[i],
netdev_lock_name[i]);
}
+
+static inline void netdev_set_addr_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
+{
+ int i;
+
+ i = netdev_lock_pos(dev_type);
+ lockdep_set_class_and_name(lock, &netdev_addr_lock_key[i],
+ netdev_lock_name[i]);
+}
#else
-static inline void netdev_set_lockdep_class(spinlock_t *lock,
- unsigned short dev_type)
+static inline void netdev_set_xmit_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
+{
+}
+static inline void netdev_set_addr_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
{
}
#endif
@@ -3843,14 +3858,15 @@ static void __netdev_init_queue_locks_one(struct net_device *dev,
void *_unused)
{
spin_lock_init(&dev_queue->_xmit_lock);
- netdev_set_lockdep_class(&dev_queue->_xmit_lock, dev->type);
+ netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
dev_queue->xmit_lock_owner = -1;
}

-static void netdev_init_queue_locks(struct net_device *dev)
+static void netdev_init_locks(struct net_device *dev)
{
netdev_for_each_tx_queue(dev, __netdev_init_queue_locks_one, NULL);
__netdev_init_queue_locks_one(dev, &dev->rx_queue, NULL);
+ netdev_set_addr_lockdep_class(&dev->addr_list_lock, dev->type);
}

/**
@@ -3888,7 +3904,7 @@ int register_netdevice(struct net_device *dev)
net = dev_net(dev);

spin_lock_init(&dev->addr_list_lock);
- netdev_init_queue_locks(dev);
+ netdev_init_locks(dev);

dev->iflink = -1;


Attachments:
x (2.52 kB)

2008-07-21 20:28:31

by Larry Finger

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

Linus Torvalds wrote:
>
> On Mon, 21 Jul 2008, Larry Finger wrote:
>> Yes, I'm chasing a distinct bug. The header for mine is
>>
>> Jul 21 12:19:37 larrylap kernel: kernel BUG at net/core/dev.c:1328!
>
> Ok, that one is fixed now in my tree. Or at least it's turned into a
> warning, so the machine should work.
>
>> With an invalid opcode, mine is likely due to stack corruption.
>
> No, invalid opcode is because we use the "ud2" instruction for BUG(),
> which causes an invalid op exception. So any BUG[_ON]() will always cause
> that on x86.

Thanks for the explanation.

With your latest tree, I do get the warning. Unfortunately, it still breaks my
wireless and I still need to do the bisection. That is complicated by getting a
kernel that won't build after the first try. I think I now have a workaround.

Larry

2008-07-23 09:03:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, 2008-07-23 at 08:54 +0000, Jarek Poplawski wrote:
> On Wed, Jul 23, 2008 at 12:59:21AM -0700, David Miller wrote:
> > From: Jarek Poplawski <[email protected]>
> > Date: Wed, 23 Jul 2008 06:20:36 +0000
> >
> > > PS: if there is nothing new in lockdep the classical method would
> > > be to change this static array:
> > >
> > > static struct lock_class_key
> > > netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
> > >
> > > to
> > >
> > > static struct lock_class_key
> > > netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)][MAX_NUM_TX_QUEUES];
> > >
> > > and set lockdep classes per queue as well. (If we are sure we don't
> > > need lockdep subclasses anywhere this could be optimized by using
> > > one lock_class_key per 8 queues and spin_lock_nested()).
> >
> > Unfortunately MAX_NUM_TX_QUEUES is USHORT_MAX, so this isn't really
> > a feasible approach.
>
> Is it used by real devices already? Maybe for the beginning we could
> start with something less?
>
> > spin_lock_nested() isn't all that viable either, as the subclass
> > limit is something like 8.
>
> This method would need to do some additional counting: depending of
> a queue number each 8 subsequent queues share (are set to) the same
> class and their number mod 8 gives the subqueue number for
> spin_lock_nested().
>
> I'll try to find if there is something new around this in lockdep.
> (lockdep people added to CC.)

There isn't.

Is there a static data structure that the driver needs to instantiate to
'create' a queue? Something like:

/* this imaginary e1000 hardware has 16 hardware queues */
static struct net_tx_queue e1000e_tx_queues[16];

In that case you can stick the key in there and do:

int e1000e_init_tx_queue(struct net_tx_queue *txq)
{
...

spin_lock_init(&txq->tx_lock);
lockdep_set_class(&txq->tx_lock, &txq->tx_lock_key);

...
}

( This is what the scheduler runqueues also do )


2008-07-21 20:16:00

by David Miller

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

From: Linus Torvalds <[email protected]>
Date: Mon, 21 Jul 2008 12:47:58 -0700 (PDT)

> On Mon, 21 Jul 2008, Larry Finger wrote:
> > With an invalid opcode, mine is likely due to stack corruption.
>
> No, invalid opcode is because we use the "ud2" instruction for BUG(),
> which causes an invalid op exception. So any BUG[_ON]() will always cause
> that on x86.

Is there really no more backtrace from that crash message?
It would tell me what driver it's in.

There is some "comm: b43" in the log so I'll check that one.

2008-07-23 06:15:30

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On 23-07-2008 01:04, David Miller wrote:
> From: Larry Finger <[email protected]>
> Date: Tue, 22 Jul 2008 13:39:08 -0500
>
>> =============================================
>> [ INFO: possible recursive locking detected ]
>> 2.6.26-Linus-05752-g93ded9b-dirty #53
>> ---------------------------------------------
>> b43/1997 is trying to acquire lock:
>> (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
>> ieee80211_scan_completed+0x130/0x2e1 [mac80211]
>>
>> but task is already holding lock:
>> (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
>> ieee80211_scan_completed+0x130/0x2e1 [mac80211]
...
> Lockdep doesn't like that we have an array of objects (the TX queues)
> and we're iterating over them grabbing all of their locks.
>
> Does anyone know how to teach lockdep that this is OK?

I guess, David Miller knows...:

http://permalink.gmane.org/gmane.linux.network/99784

Jarek P.

PS: if there is nothing new in lockdep the classical method would
be to change this static array:

static struct lock_class_key
netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];

to

static struct lock_class_key
netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)][MAX_NUM_TX_QUEUES];

and set lockdep classes per queue as well. (If we are sure we don't
need lockdep subclasses anywhere this could be optimized by using
one lock_class_key per 8 queues and spin_lock_nested()).

2008-07-21 23:15:17

by David Miller

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

From: Larry Finger <[email protected]>
Date: Mon, 21 Jul 2008 17:40:10 -0500

> Sorry :(
>
> I used the davem patch, the second version of your first one, and your second
> one. Both problems persist.
>
> Still plugging away on bisection.

GIT bisecting the lockdep problem is surely going the land you on:

commit e308a5d806c852f56590ffdd3834d0df0cbed8d7
Author: David S. Miller <[email protected]>
Date: Tue Jul 15 00:13:44 2008 -0700

netdev: Add netdev->addr_list_lock protection.

Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <[email protected]>


2008-07-21 20:51:56

by Patrick McHardy

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

David Miller wrote:
> From: Larry Finger <[email protected]>
> Date: Mon, 21 Jul 2008 15:38:47 -0500
>
>> David Miller wrote:
>>> No further backtrace? That will tell us what driver is causing
>>> this.
>> Yes, I have a full backtrace.
>>
>> It starts with possible recursive locking in NetworkManager, and goes directly
>> into the Warning - this came from a later pull of Linus's tree.
>
> That helps a lot, I'm looking at this now.

I'm guessing this needs similar lockdep class initializations
to _xmit_lock since it essentially has the same nesting rules.


2008-07-25 18:35:53

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Fri, Jul 25, 2008 at 07:04:36PM +0200, Ingo Oeser wrote:
...
> I'm sure as hell, I miss sth. but can't it be done by this pseudo-code:

...And I really doubt it can't be done like this.

Jarek P.

>
> netif_tx_lock(device)
> {
> mutex_lock(device->queue_entry_mutex);
> foreach_queue_entries(queue, device->queues)
> {
> spin_lock(queue->tx_lock);
> set_noop_tx_handler(queue);
> spin_unlock(queue->tx_lock);
> }
> mutex_unlock(device->queue_entry_mutex);
> }
>
> netif_tx_unlock(device)
> {
> mutex_lock(device->queue_entry_mutex);
> foreach_queue_entries(queue, device->queues)
> {
> spin_lock(queue->tx_lock);
> set_useful_tx_handler(queue);
> spin_unlock(queue->tx_lock);
> }
> mutex_unlock(device->queue_entry_mutex);
> }
>
> Then protect use of the queues by queue->tx_lock in transmit path.
> The first setup of the queue doesn't need to be protected, since no-one
> knows the device. The final cleanup of the device doesn't need to be
> protected either, because netif_tx_lock() and netif_tx_unlock() should
> not be called after entering the final cleanup.
>
> Some VM locking works this way...
>
>
> Best Regards
>
> Ingo Oeser

2008-07-21 20:38:49

by Larry Finger

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

David Miller wrote:
>
> No further backtrace? That will tell us what driver is causing
> this.

Yes, I have a full backtrace.

It starts with possible recursive locking in NetworkManager, and goes directly
into the Warning - this came from a later pull of Linus's tree.

Jul 21 15:11:07 larrylap kernel: [ INFO: possible recursive locking detected ]
Jul 21 15:11:07 larrylap kernel: 2.6.26-Linus-git-05614-ge89970a #8
Jul 21 15:11:07 larrylap kernel: ---------------------------------------------
Jul 21 15:11:07 larrylap kernel: NetworkManager/2661 is trying to acquire lock:
Jul 21 15:11:07 larrylap kernel: (&dev->addr_list_lock){-...}, at:
[<ffffffff803a2961>] dev_mc_sync+0x19/0x57
Jul 21 15:11:07 larrylap kernel:
Jul 21 15:11:07 larrylap kernel: but task is already holding lock:
Jul 21 15:11:07 larrylap kernel: (&dev->addr_list_lock){-...}, at:
[<ffffffff8039e7c5>] dev_set_rx_mode+0x19/0x2e
Jul 21 15:11:07 larrylap kernel:
Jul 21 15:11:07 larrylap kernel: other info that might help us debug this:
Jul 21 15:11:07 larrylap kernel: 2 locks held by NetworkManager/2661:
Jul 21 15:11:07 larrylap kernel: #0: (rtnl_mutex){--..}, at:
[<ffffffff803a8318>] rtnetlink_rcv+0x12/0x27
Jul 21 15:11:07 larrylap kernel: #1: (&dev->addr_list_lock){-...}, at:
[<ffffffff8039e7c5>] dev_set_rx_mode+0x19/0x2e
Jul 21 15:11:07 larrylap kernel:
Jul 21 15:11:07 larrylap kernel: stack backtrace:
Jul 21 15:11:07 larrylap kernel: Pid: 2661, comm: NetworkManager Not tainted
2.6.26-Linus-git-05614-ge89970a #8
Jul 21 15:11:07 larrylap kernel:
Jul 21 15:11:07 larrylap kernel: Call Trace:
Jul 21 15:11:07 larrylap kernel: [<ffffffff80251b02>] __lock_acquire+0xb7b/0xecc
Jul 21 15:11:07 larrylap kernel: [<ffffffff80251ea4>] lock_acquire+0x51/0x6a
Jul 21 15:11:07 larrylap kernel: [<ffffffff803a2961>] dev_mc_sync+0x19/0x57
Jul 21 15:11:07 larrylap kernel: [<ffffffff80408f9c>] _spin_lock_bh+0x23/0x2c
Jul 21 15:11:07 larrylap kernel: [<ffffffff803a2961>] dev_mc_sync+0x19/0x57
Jul 21 15:11:07 larrylap kernel: [<ffffffff8039e7cd>] dev_set_rx_mode+0x21/0x2e
Jul 21 15:11:07 larrylap kernel: [<ffffffff803a036c>] dev_open+0x8e/0xb0
Jul 21 15:11:07 larrylap kernel: [<ffffffff8039fd13>] dev_change_flags+0xa6/0x164
Jul 21 15:11:07 larrylap kernel: [<ffffffff803a7421>] do_setlink+0x286/0x349
Jul 21 15:11:07 larrylap kernel: [<ffffffff803a832d>] rtnetlink_rcv_msg+0x0/0x1ec
Jul 21 15:11:07 larrylap syslog-ng[2488]: last message repeated 2 times
Jul 21 15:11:07 larrylap kernel: [<ffffffff803a766e>] rtnl_setlink+0x10b/0x10d
Jul 21 15:11:07 larrylap kernel: [<ffffffff803a832d>] rtnetlink_rcv_msg+0x0/0x1ec
Jul 21 15:11:07 larrylap kernel: [<ffffffff803b1bcf>] netlink_rcv_skb+0x34/0x7d
Jul 21 15:11:07 larrylap kernel: [<ffffffff803a8327>] rtnetlink_rcv+0x21/0x27
Jul 21 15:11:07 larrylap kernel: [<ffffffff803b16cf>] netlink_unicast+0x1f0/0x261
Jul 21 15:11:07 larrylap kernel: [<ffffffff8039a448>] __alloc_skb+0x66/0x12a
Jul 21 15:11:07 larrylap kernel: [<ffffffff803b19a8>] netlink_sendmsg+0x268/0x27b
Jul 21 15:11:07 larrylap rpc.idmapd[2783]: main:
fcntl(/var/lib/nfs/rpc_pipefs/nfs): Invalid argument
Jul 21 15:11:07 larrylap kernel: [<ffffffff80393b8d>] sock_sendmsg+0xcb/0xe3
Jul 21 15:11:07 larrylap kernel: [<ffffffff80246aab>]
autoremove_wake_function+0x0/0x2e
Jul 21 15:11:07 larrylap kernel: [<ffffffff8039b194>] verify_iovec+0x46/0x82
Jul 21 15:11:07 larrylap kernel: [<ffffffff80393dbc>] sys_sendmsg+0x217/0x28a
Jul 21 15:11:07 larrylap kernel: [<ffffffff80393516>] sockfd_lookup_light+0x1a/0x52
Jul 21 15:11:07 larrylap kernel: [<ffffffff80250990>]
trace_hardirqs_on_caller+0xef/0x113
Jul 21 15:11:07 larrylap kernel: [<ffffffff80408b14>]
trace_hardirqs_on_thunk+0x3a/0x3f
Jul 21 15:11:07 larrylap kernel: [<ffffffff8020be9b>]
system_call_after_swapgs+0x7b/0x80
Jul 21 15:11:07 larrylap kernel:
Jul 21 15:11:07 larrylap kernel: NET: Registered protocol family 17
Jul 21 15:11:08 larrylap kernel: ------------[ cut here ]------------
Jul 21 15:11:08 larrylap kernel: WARNING: at net/core/dev.c:1330
__netif_schedule+0x2c/0x98()
Jul 21 15:11:08 larrylap kernel: Modules linked in: snd_seq_device af_packet nfs
lockd nfs_acl sunrpc rfkill_input cpufreq_conservative cpufreq_userspace
cpufreq_powersave powernow_k8 fuse loop dm_mod arc4 ecb crypto_blkcipher b43
firmware_class rfkill snd_hda_intel mac80211 cfg80211 snd_pcm snd_timer
led_class snd soundcore input_polldev ac k8temp button snd_page_alloc battery
sr_mod forcedeth cdrom serio_raw hwmon ssb sg ehci_hcd sd_mod ohci_hcd usbcore
edd fan thermal processor ext3 mbcache jbd pata_amd ahci libata scsi_mod dock
Jul 21 15:11:08 larrylap kernel: Pid: 2035, comm: b43 Not tainted
2.6.26-Linus-git-05614-ge89970a #8
Jul 21 15:11:08 larrylap kernel:
Jul 21 15:11:08 larrylap kernel: Call Trace:
Jul 21 15:11:08 larrylap kernel: [<ffffffff80233f6d>] warn_on_slowpath+0x51/0x8c
Jul 21 15:11:08 larrylap kernel: [<ffffffff8039d7f3>] __netif_schedule+0x2c/0x98
Jul 21 15:11:08 larrylap kernel: [<ffffffffa018b44d>]
ieee80211_scan_completed+0x25b/0x2e1 [mac80211]
Jul 21 15:11:08 larrylap kernel: [<ffffffffa018b6ce>]
ieee80211_sta_scan_work+0x0/0x1b8 [mac80211]
Jul 21 15:11:08 larrylap kernel: [<ffffffff8024325e>] run_workqueue+0xf0/0x1f2
Jul 21 15:11:08 larrylap kernel: [<ffffffff8024343b>] worker_thread+0xdb/0xea
Jul 21 15:11:08 larrylap kernel: [<ffffffff80246aab>]
autoremove_wake_function+0x0/0x2e
Jul 21 15:11:08 larrylap avahi-daemon[2877]: Found user 'avahi' (UID 102) and
group 'avahi' (GID 104).
Jul 21 15:11:09 larrylap kernel: [<ffffffff80243360>] worker_thread+0x0/0xea
Jul 21 15:11:09 larrylap kernel: [<ffffffff8024678b>] kthread+0x47/0x73
Jul 21 15:11:09 larrylap avahi-daemon[2877]: Successfully dropped root privileges.
Jul 21 15:11:09 larrylap kernel: [<ffffffff80408b14>]
trace_hardirqs_on_thunk+0x3a/0x3f
Jul 21 15:11:09 larrylap avahi-daemon[2877]: avahi-daemon 0.6.22 starting up.
Jul 21 15:11:09 larrylap kernel: [<ffffffff8020cea9>] child_rip+0xa/0x11
Jul 21 15:11:09 larrylap kernel: [<ffffffff8020c4df>] restore_args+0x0/0x30
Jul 21 15:11:09 larrylap kernel: [<ffffffff8024671f>] kthreadd+0x188/0x1ad
Jul 21 15:11:09 larrylap kernel: [<ffffffff80246744>] kthread+0x0/0x73
Jul 21 15:11:09 larrylap kernel: [<ffffffff8020ce9f>] child_rip+0x0/0x11
Jul 21 15:11:09 larrylap kernel:
Jul 21 15:11:09 larrylap kernel: ---[ end trace 030d0589d3c6c7f5 ]---


Larry

2008-07-21 19:34:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370


* Larry Finger <[email protected]> wrote:

> Ingo Molnar wrote:
>> * Ian Schram <[email protected]> wrote:
>>
>>> I was looking at this out of interest, but I'm in no way familiar
>>> with the code.
>>
>> thanks Ian for the patch, i'll test it.
>>
>> Note that it was whitespace damaged, find below a tidied up version of
>> the patch that i've applied to tip/out-of-tree.
>>
>> Ingo
>
> This patch may be needed to fix error handling in the hw_sim code, but
> I get the crash even with that code disabled. I'm currently bisecting
> to find the culprit.

ok. I just reactivated CONFIG_MAC80211_HWSIM, applied Ian's fix and the
crash went away:

calling iwl4965_init+0x0/0x6c
iwl4965: Intel(R) Wireless WiFi Link 4965AGN driver for Linux, 1.3.27kd
iwl4965: Copyright(c) 2003-2008 Intel Corporation
initcall iwl4965_init+0x0/0x6c returned 0 after 10 msecs
calling init_mac80211_hwsim+0x0/0x31c
mac80211_hwsim: Initializing radio 0
PM: Adding info for No Bus:hwsim0
PM: Adding info for No Bus:phy0
PM: Adding info for No Bus:wmaster0
phy0: Failed to select rate control algorithm
phy0: Failed to initialize rate control algorithm
PM: Removing info for No Bus:wmaster0
PM: Removing info for No Bus:phy0
mac80211_hwsim: ieee80211_register_hw failed (-2)
PM: Removing info for No Bus:hwsim0
initcall init_mac80211_hwsim+0x0/0x31c returned -2 after 58 msecs
initcall init_mac80211_hwsim+0x0/0x31c returned with error code -2
calling dmfe_init_module+0x0/0xea
dmfe: Davicom DM9xxx net driver, version 1.36.4 (2002-01-17)
initcall dmfe_init_module+0x0/0xea returned 0 after 5 msecs

So at least as far as the init_mac80211_hwsim() deinit crash goes:

Tested-by: Ingo Molnar <[email protected]>

Ingo

2008-07-31 12:44:01

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Nick Piggin <[email protected]>
Date: Thu, 31 Jul 2008 22:38:19 +1000

> Except for the braindead volatile that gets stuck on the bitops pointer.
>
> Last time I complained about this, a lot of noise was made and I think
> Linus wanted it to stay around so we could pass volatile pointers to
> bitops & co without warnings. I say we should just remove the volatile
> and kill any callers that might warn...

Ho hum... :)

Another way to approach that, and keep the volatile, is to have
a "test_flags()" interface that takes the bit mask of values
you want to test for cases where you know it is a single word
flags value.

The downside is that this kind of interface is easy to use
incorrectly especially when accesses to the same flags use
bot test_bit() and test_flags().

2008-07-22 20:43:41

by David Miller

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

From: Larry Finger <[email protected]>
Date: Tue, 22 Jul 2008 07:52:24 -0500

> David Miller wrote:
> > From: Larry Finger <[email protected]>
> > Date: Tue, 22 Jul 2008 01:34:28 -0500
> >
> >> David Miller wrote:
> >>> From: Larry Finger <[email protected]>
> >>> Date: Mon, 21 Jul 2008 17:40:10 -0500
> >>>
> >>>> Sorry :(
> >>>>
> >>>> I used the davem patch, the second version of your first one, and your second
> >>>> one. Both problems persist.
> >>>>
> >>>> Still plugging away on bisection.
> >>> GIT bisecting the lockdep problem is surely going the land you on:
> >>>
> >>> commit e308a5d806c852f56590ffdd3834d0df0cbed8d7
> >> No. It landed on this one.
> >
> > For the lockdep warnings?
>
> No - this one triggers the kernel BUG as follows:

Well, we were trying to fix the lockdep warnings. One
thing at a time :-)

And that BUG is now just a WARN in Linus's tree, so if
you update you should get further along.

2008-07-24 07:00:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, 2008-07-23 at 13:14 -0700, David Miller wrote:
> From: Peter Zijlstra <[email protected]>
> Date: Wed, 23 Jul 2008 12:58:16 +0200
>
> > So I guess my question is, is netif_tx_lock() here to stay, or is the
> > right fix to convert all those drivers to use __netif_tx_lock() which
> > locks only a single queue?
>
> It's staying.
>
> It's trying to block all potential calls into the ->hard_start_xmit()
> method of the driver, and the only reliable way to do that is to take
> all the TX queue locks. And in one form or another, we're going to
> have this "grab/release all the TX queue locks" construct.
>
> I find it interesting that this cannot be simply described to lockdep
> :-)

If you think its OK to take USHORT_MAX locks at once, I'm afraid we'll
have to agree to disagree :-/

Thing is, lockdep wants to be able to describe the locking hierarchy
with classes, and each class needs to be in static storage for various
reasons.

So if you make a locking hierarchy that is USHORT_MAX deep, you need at
least that amount of static classes.

Also, you'll run into the fact that lockdep will only track like 48 held
locks, after that it self terminates.

I'm aware of only 2 sites in the kernel that break this limit. The
down-side of stretching this limit is that deep lock chains come with
costs (esp so on -rt), so I'm not particularly eager to grow this - it
might give the impresssion its a good idea to have very long lock
chains.


2008-07-23 09:50:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, 2008-07-23 at 09:35 +0000, Jarek Poplawski wrote:
> On Wed, Jul 23, 2008 at 11:03:06AM +0200, Peter Zijlstra wrote:
> > On Wed, 2008-07-23 at 08:54 +0000, Jarek Poplawski wrote:
> > > On Wed, Jul 23, 2008 at 12:59:21AM -0700, David Miller wrote:
> > > > From: Jarek Poplawski <[email protected]>
> > > > Date: Wed, 23 Jul 2008 06:20:36 +0000
> > > >
> > > > > PS: if there is nothing new in lockdep the classical method would
> > > > > be to change this static array:
> > > > >
> > > > > static struct lock_class_key
> > > > > netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
> > > > >
> > > > > to
> > > > >
> > > > > static struct lock_class_key
> > > > > netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)][MAX_NUM_TX_QUEUES];
> > > > >
> > > > > and set lockdep classes per queue as well. (If we are sure we don't
> > > > > need lockdep subclasses anywhere this could be optimized by using
> > > > > one lock_class_key per 8 queues and spin_lock_nested()).
> > > >
> > > > Unfortunately MAX_NUM_TX_QUEUES is USHORT_MAX, so this isn't really
> > > > a feasible approach.
> > >
> > > Is it used by real devices already? Maybe for the beginning we could
> > > start with something less?
> > >
> > > > spin_lock_nested() isn't all that viable either, as the subclass
> > > > limit is something like 8.
> > >
> > > This method would need to do some additional counting: depending of
> > > a queue number each 8 subsequent queues share (are set to) the same
> > > class and their number mod 8 gives the subqueue number for
> > > spin_lock_nested().
> > >
> > > I'll try to find if there is something new around this in lockdep.
> > > (lockdep people added to CC.)
> >
> > There isn't.
> >
> > Is there a static data structure that the driver needs to instantiate to
> > 'create' a queue? Something like:
> >
> > /* this imaginary e1000 hardware has 16 hardware queues */
> > static struct net_tx_queue e1000e_tx_queues[16];
>
> I guess, not.
>
> Then, IMHO, we could be practical and simply skip lockdep validation
> for "some" range of locks: e.g. to set the table for the first 256
> queues only, and to do e.g. __raw_spin_lock() for bigger numbers. (If
> there are any bad locking patterns this should be enough for checking.)

definite NAK on using raw spinlock ops...

I'll go look at this multiqueue stuff to see if there is anything to be
done.. David, what would be a good commit to start reading?



2008-07-24 09:32:10

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Peter Zijlstra <[email protected]>
Date: Thu, 24 Jul 2008 11:27:05 +0200

> Well, not only lockdep, taking a very large number of locks is expensive
> as well.

Right now it would be on the order of 16 or 32 for
real hardware.

Much less than the scheduler currently takes on some
of my systems, so currently you are the pot calling the
kettle black. :-)

USHORT_MAX is just the upper hard limit imposed by the
software interface merely as a side effect of storing
the queue number of the SKB as a u16.

2008-07-22 23:04:09

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Larry Finger <[email protected]>
Date: Tue, 22 Jul 2008 13:39:08 -0500

> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.26-Linus-05752-g93ded9b-dirty #53
> ---------------------------------------------
> b43/1997 is trying to acquire lock:
> (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
> ieee80211_scan_completed+0x130/0x2e1 [mac80211]
>
> but task is already holding lock:
> (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
> ieee80211_scan_completed+0x130/0x2e1 [mac80211]
>
> other info that might help us debug this:
> 3 locks held by b43/1997:
> #0: ((name)){--..}, at: [<ffffffff80245185>] run_workqueue+0xa7/0x1f2
> #1: (&(&local->scan_work)->work){--..}, at: [<ffffffff80245185>]
> run_workqueue+0xa7/0x1f2
> #2: (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
> ieee80211_scan_completed+0x130/0x2e1 [mac80211]
>
> stack backtrace:
> Pid: 1997, comm: b43 Not tainted 2.6.26-Linus-05752-g93ded9b-dirty #53
>
> Call Trace:
> [<ffffffff80255616>] __lock_acquire+0xb7b/0xecc
> [<ffffffff8040c9a0>] __mutex_unlock_slowpath+0x100/0x10b
> [<ffffffff802559b8>] lock_acquire+0x51/0x6a
> [<ffffffffa028f322>] ieee80211_scan_completed+0x130/0x2e1 [mac80211]
> [<ffffffff8040dc08>] _spin_lock+0x1e/0x27
> [<ffffffffa028f322>] ieee80211_scan_completed+0x130/0x2e1 [mac80211]
> [<ffffffffa028f6ce>] ieee80211_sta_scan_work+0x0/0x1b8 [mac80211]
> [<ffffffff802451ce>] run_workqueue+0xf0/0x1f2
> [<ffffffff802453ab>] worker_thread+0xdb/0xea
> [<ffffffff80248a5f>] autoremove_wake_function+0x0/0x2e
> [<ffffffff802452d0>] worker_thread+0x0/0xea
> [<ffffffff80248731>] kthread+0x47/0x73
> [<ffffffff8040d7b1>] trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8020ceb9>] child_rip+0xa/0x11
> [<ffffffff8020c4ef>] restore_args+0x0/0x30
> [<ffffffff802486c5>] kthreadd+0x19a/0x1bf
> [<ffffffff802486ea>] kthread+0x0/0x73
> [<ffffffff8020ceaf>] child_rip+0x0/0x11

Lockdep doesn't like that we have an array of objects (the TX queues)
and we're iterating over them grabbing all of their locks.

Does anyone know how to teach lockdep that this is OK?

2008-07-23 10:08:47

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, Jul 23, 2008 at 11:50:14AM +0200, Peter Zijlstra wrote:
...
> definite NAK on using raw spinlock ops...
>
> I'll go look at this multiqueue stuff to see if there is anything to be
> done.. David, what would be a good commit to start reading?

...In the case David ever sleeps: I think, the current Linus' git is
good enough (the problem is with netdevice.h: netif_tx_lock()).

Jarek P.

2008-07-22 18:39:11

by Larry Finger

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

Patrick McHardy wrote:
> Larry Finger wrote:
>>
>> =============================================
>> [ INFO: possible recursive locking detected ]
>> 2.6.26-Linus-git-05752-g93ded9b #49
>
> ^^^^^^^^^^^^^^ dirty?
>
> This kernel is not using the patches we sent.

No, but they didn't make any difference. I tried with all 3 applied, then backed
them out one by one. That was the state when I posted before.

Here are the dumps with all 3 patches applied:

=============================================
[ INFO: possible recursive locking detected ]
2.6.26-Linus-05752-g93ded9b-dirty #53
---------------------------------------------
b43/1997 is trying to acquire lock:
(_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
ieee80211_scan_completed+0x130/0x2e1 [mac80211]

but task is already holding lock:
(_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
ieee80211_scan_completed+0x130/0x2e1 [mac80211]

other info that might help us debug this:
3 locks held by b43/1997:
#0: ((name)){--..}, at: [<ffffffff80245185>] run_workqueue+0xa7/0x1f2
#1: (&(&local->scan_work)->work){--..}, at: [<ffffffff80245185>]
run_workqueue+0xa7/0x1f2
#2: (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
ieee80211_scan_completed+0x130/0x2e1 [mac80211]

stack backtrace:
Pid: 1997, comm: b43 Not tainted 2.6.26-Linus-05752-g93ded9b-dirty #53

Call Trace:
[<ffffffff80255616>] __lock_acquire+0xb7b/0xecc
[<ffffffff8040c9a0>] __mutex_unlock_slowpath+0x100/0x10b
[<ffffffff802559b8>] lock_acquire+0x51/0x6a
[<ffffffffa028f322>] ieee80211_scan_completed+0x130/0x2e1 [mac80211]
[<ffffffff8040dc08>] _spin_lock+0x1e/0x27
[<ffffffffa028f322>] ieee80211_scan_completed+0x130/0x2e1 [mac80211]
[<ffffffffa028f6ce>] ieee80211_sta_scan_work+0x0/0x1b8 [mac80211]
[<ffffffff802451ce>] run_workqueue+0xf0/0x1f2
[<ffffffff802453ab>] worker_thread+0xdb/0xea
[<ffffffff80248a5f>] autoremove_wake_function+0x0/0x2e
[<ffffffff802452d0>] worker_thread+0x0/0xea
[<ffffffff80248731>] kthread+0x47/0x73
[<ffffffff8040d7b1>] trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8020ceb9>] child_rip+0xa/0x11
[<ffffffff8020c4ef>] restore_args+0x0/0x30
[<ffffffff802486c5>] kthreadd+0x19a/0x1bf
[<ffffffff802486ea>] kthread+0x0/0x73
[<ffffffff8020ceaf>] child_rip+0x0/0x11

------------[ cut here ]------------
WARNING: at net/core/dev.c:1344 __netif_schedule+0x2c/0x98()
Modules linked in: af_packet rfkill_input nfs lockd nfs_acl sunrpc
cpufreq_conservative cpufreq_userspace cpufreq_powersave powernow_k8 fuse loop
dm_mod arc4 ecb crypto_blkcipher b43 firmware_class rfkill mac80211
snd_hda_intel cfg80211 led_class input_polldev k8temp snd_pcm snd_timer battery
hwmon sr_mod forcedeth ssb joydev ac button serio_raw cdrom snd soundcore
snd_page_alloc sg sd_mod ohci_hcd ehci_hcd usbcore edd fan thermal processor
ext3 mbcache jbd pata_amd ahci libata scsi_mod dock
Pid: 1997, comm: b43 Not tainted 2.6.26-Linus-05752-g93ded9b-dirty #53

Call Trace:
[<ffffffff80235d49>] warn_on_slowpath+0x51/0x8c
[<ffffffff8040d7b1>] trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff803a2413>] __netif_schedule+0x2c/0x98
[<ffffffffa028f44d>] ieee80211_scan_completed+0x25b/0x2e1 [mac80211]
[<ffffffffa028f6ce>] ieee80211_sta_scan_work+0x0/0x1b8 [mac80211]
[<ffffffff802451ce>] run_workqueue+0xf0/0x1f2
[<ffffffff802453ab>] worker_thread+0xdb/0xea
[<ffffffff80248a5f>] autoremove_wake_function+0x0/0x2e
[<ffffffff802452d0>] worker_thread+0x0/0xea
[<ffffffff80248731>] kthread+0x47/0x73
[<ffffffff8040d7b1>] trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8020ceb9>] child_rip+0xa/0x11
[<ffffffff8020c4ef>] restore_args+0x0/0x30
[<ffffffff802486c5>] kthreadd+0x19a/0x1bf
[<ffffffff802486ea>] kthread+0x0/0x73
[<ffffffff8020ceaf>] child_rip+0x0/0x11

---[ end trace 88fab857dc2a4242 ]---

Larry





2008-07-22 14:53:35

by Patrick McHardy

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

Larry Finger wrote:
> David Miller wrote:
>> From: Larry Finger <[email protected]>
>> Date: Tue, 22 Jul 2008 01:34:28 -0500
>>
>>>> GIT bisecting the lockdep problem is surely going the land you on:
>>>>
>>>> commit e308a5d806c852f56590ffdd3834d0df0cbed8d7
>>> No. It landed on this one.
>>
>> For the lockdep warnings?
>
> When I was just one commit later, I got both the lockdep warning and the
> BUG. This is the commit in question.

I actually don't see how you could still get the warning with
Dave's patch and the two I sent applied.

The warning is triggered by the dev_mc_sync call in
ieee80211_set_multicast_list:

dev_mc_sync(local->mdev, dev);


local->mdev is the wmaster device, which has its type set to
ARPHRD_IEEE80211. dev is regular wireless device with type set
to ARPHRD_ETHER. So they have distinct lockdep classes set
by register_netdevice.

The warning is:

Jul 21 15:11:07 larrylap kernel: NetworkManager/2661 is trying to
acquire lock:
Jul 21 15:11:07 larrylap kernel: (&dev->addr_list_lock){-...}, at:
[<ffffffff803a2961>] dev_mc_sync+0x19/0x57
Jul 21 15:11:07 larrylap kernel:
Jul 21 15:11:07 larrylap kernel: but task is already holding lock:
Jul 21 15:11:07 larrylap kernel: (&dev->addr_list_lock){-...}, at:
[<ffffffff8039e7c5>] dev_set_rx_mode+0x19/0x2e
Jul 21 15:11:07 larrylap kernel:

The only already held is dev->addr_list_lock, the one taken
by dev_mc_sync is local->mdev->addr_list_lock. And this shouldn't
cause any warnings because of the distinct lockdep classes.

Could you please retry with the three patches attached to this
mail? If the lockdep warning still triggers, please post it again.


Attachments:
01.diff (5.45 kB)
02.diff (2.18 kB)
03.diff (389.00 B)
Download all attachments

2008-07-22 16:39:41

by Larry Finger

[permalink] [raw]
Subject: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

David and Patrick,

Here is the latest on this problem.

I pulled from Linus's tree this morning and now have git-05752-g93ded9b. The
kernel WARNING from __netif_schedule and the lockdep warning are present with or
without the patches from yesterday.

As I stated earlier, the kernel WARNING (it was a BUG then) was introduced in
commit 37437bb2 when the BUG statement was entered.

The lockdep warning started with the next commit (16361127).

I am not using any network traffic shaping. Is it correct that the faulty
condition is not that q == &noop_qdisc, but that __netif_schedule was called
when that condition exists?


The lockdep warning is:

=============================================
[ INFO: possible recursive locking detected ]
2.6.26-Linus-git-05752-g93ded9b #49
---------------------------------------------
NetworkManager/2611 is trying to acquire lock:
(&dev->addr_list_lock){-...}, at: [<ffffffff803a2ad1>] dev_mc_sync+0x19/0x57

but task is already holding lock:
(&dev->addr_list_lock){-...}, at: [<ffffffff8039e909>] dev_set_rx_mode+0x19/0x2e

other info that might help us debug this:
2 locks held by NetworkManager/2611:
#0: (rtnl_mutex){--..}, at: [<ffffffff803a8488>] rtnetlink_rcv+0x12/0x27
#1: (&dev->addr_list_lock){-...}, at: [<ffffffff8039e909>]
dev_set_rx_mode+0x19/0x2e

stack backtrace:
Pid: 2611, comm: NetworkManager Not tainted 2.6.26-Linus-git-05752-g93ded9b #49

Call Trace:
[<ffffffff80251b02>] __lock_acquire+0xb7b/0xecc
[<ffffffff80251ea4>] lock_acquire+0x51/0x6a
[<ffffffff803a2ad1>] dev_mc_sync+0x19/0x57
[<ffffffff8040b3fc>] _spin_lock_bh+0x23/0x2c
[<ffffffff803a2ad1>] dev_mc_sync+0x19/0x57
[<ffffffff8039e911>] dev_set_rx_mode+0x21/0x2e
[<ffffffff803a04da>] dev_open+0x8e/0xb0
[<ffffffff8039fe84>] dev_change_flags+0xa6/0x163
[<ffffffff803a7591>] do_setlink+0x286/0x349
[<ffffffff803a849d>] rtnetlink_rcv_msg+0x0/0x1ec
[<ffffffff803a849d>] rtnetlink_rcv_msg+0x0/0x1ec
[<ffffffff803a849d>] rtnetlink_rcv_msg+0x0/0x1ec
[<ffffffff803a77de>] rtnl_setlink+0x10b/0x10d
[<ffffffff803a849d>] rtnetlink_rcv_msg+0x0/0x1ec
[<ffffffff803b416f>] netlink_rcv_skb+0x34/0x7d
[<ffffffff803a8497>] rtnetlink_rcv+0x21/0x27
[<ffffffff803b3c6f>] netlink_unicast+0x1f0/0x261
[<ffffffff8039a58d>] __alloc_skb+0x66/0x12a
[<ffffffff803b3f48>] netlink_sendmsg+0x268/0x27b
[<ffffffff80393cb9>] sock_sendmsg+0xcb/0xe3
[<ffffffff80246aab>] autoremove_wake_function+0x0/0x2e
[<ffffffff8039b2d8>] verify_iovec+0x46/0x82
[<ffffffff80393ee8>] sys_sendmsg+0x217/0x28a
[<ffffffff80393642>] sockfd_lookup_light+0x1a/0x52
[<ffffffff80250990>] trace_hardirqs_on_caller+0xef/0x113
[<ffffffff8040af74>] trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8020be9b>] system_call_after_swapgs+0x7b/0x80
========================================================


The logged data for the WARNING is as follows:

------------[ cut here ]------------
WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()
Modules linked in: af_packet nfs lockd nfs_acl rfkill_input sunrpc
cpufreq_conservative cpufreq_userspace cpufreq_powersave powernow_k8 fuse loop
dm_mod arc4 ecb crypto_blkcipher b43 firmware_class rfkill mac80211 cfg80211
led_class input_polldev battery ac button ssb serio_raw forcedeth sr_mod cdrom
k8temp hwmon sg sd_mod ehci_hcd ohci_hcd usbcore edd fan thermal processor ext3
mbcache jbd pata_amd ahci libata scsi_mod dock
Pid: 1990, comm: b43 Not tainted 2.6.26-Linus-git-05752-g93ded9b #49

Call Trace:
[<ffffffff80233f6d>] warn_on_slowpath+0x51/0x8c
[<ffffffff8039d937>] __netif_schedule+0x2c/0x98
[<ffffffffa015445d>] ieee80211_scan_completed+0x26b/0x2f1 [mac80211]
[<ffffffffa01546de>] ieee80211_sta_scan_work+0x0/0x1b8 [mac80211]
[<ffffffff8024325e>] run_workqueue+0xf0/0x1f2
[<ffffffff8024343b>] worker_thread+0xdb/0xea
[<ffffffff80246aab>] autoremove_wake_function+0x0/0x2e
[<ffffffff80243360>] worker_thread+0x0/0xea
[<ffffffff8024678b>] kthread+0x47/0x73
[<ffffffff8040af74>] trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8020cea9>] child_rip+0xa/0x11
[<ffffffff8020c4df>] restore_args+0x0/0x30
[<ffffffff8024671f>] kthreadd+0x188/0x1ad
[<ffffffff80246744>] kthread+0x0/0x73
[<ffffffff8020ce9f>] child_rip+0x0/0x11

---[ end trace 42d234b678daea7a ]---

Other info I have found. The call to __netif_schedule from
ieee80211_scan_completed is through the following code from
include/linux/netdevice.h:

/**
* netif_wake_queue - restart transmit
* @dev: network device
*
* Allow upper layers to call the device hard_start_xmit routine.
* Used for flow control when transmit resources are available.
*/
static inline void netif_tx_wake_queue(struct netdev_queue *dev_queue)
{
#ifdef CONFIG_NETPOLL_TRAP
if (netpoll_trap()) {
clear_bit(__QUEUE_STATE_XOFF, &dev_queue->state);
return;
}
#endif
if (test_and_clear_bit(__QUEUE_STATE_XOFF, &dev_queue->state))
__netif_schedule(dev_queue->qdisc);
}

It doesn't make any difference if CONFIG_NETPOLL_TRAP is defined or not.

Please let me know if I can provide any further information,

Larry

2008-07-23 08:49:46

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, Jul 23, 2008 at 12:59:21AM -0700, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Wed, 23 Jul 2008 06:20:36 +0000
>
> > PS: if there is nothing new in lockdep the classical method would
> > be to change this static array:
> >
> > static struct lock_class_key
> > netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
> >
> > to
> >
> > static struct lock_class_key
> > netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)][MAX_NUM_TX_QUEUES];
> >
> > and set lockdep classes per queue as well. (If we are sure we don't
> > need lockdep subclasses anywhere this could be optimized by using
> > one lock_class_key per 8 queues and spin_lock_nested()).
>
> Unfortunately MAX_NUM_TX_QUEUES is USHORT_MAX, so this isn't really
> a feasible approach.

Is it used by real devices already? Maybe for the beginning we could
start with something less?

> spin_lock_nested() isn't all that viable either, as the subclass
> limit is something like 8.

This method would need to do some additional counting: depending of
a queue number each 8 subsequent queues share (are set to) the same
class and their number mod 8 gives the subqueue number for
spin_lock_nested().

I'll try to find if there is something new around this in lockdep.
(lockdep people added to CC.)

Jarek P.

2008-07-24 09:10:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, 2008-07-23 at 13:16 -0700, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Wed, 23 Jul 2008 11:49:14 +0000
>
> > On Wed, Jul 23, 2008 at 11:35:19AM +0000, Jarek Poplawski wrote:
> > > On Wed, Jul 23, 2008 at 12:58:16PM +0200, Peter Zijlstra wrote:
> > ...
> > > > When I look at the mac802.11 code in ieee80211_tx_pending() it looks
> > > > like it can do with just one lock at a time, instead of all - but I
> > > > might be missing some obvious details.
> > > >
> > > > So I guess my question is, is netif_tx_lock() here to stay, or is the
> > > > right fix to convert all those drivers to use __netif_tx_lock() which
> > > > locks only a single queue?
> > > >
> > >
> > > It's a new thing mainly for new hardware/drivers, and just after
> > > conversion (older drivers effectively use __netif_tx_lock()), so it'll
> > > probably stay for some time until something better is found. David,
> > > will tell the rest, I hope.
> >
> > ...And, of course, these new drivers should also lock a single queue
> > where possible.
>
> It isn't going away.
>
> There will always be a need for a "stop all the TX queues" operation.

Ok, then how about something like this, the idea is to wrap the per tx
lock with a read lock of the device and let the netif_tx_lock() be the
write side, therefore excluding all device locks, but not incure the
cacheline bouncing on the read side by using per-cpu counters like rcu
does.

This of course requires that netif_tx_lock() is rare, otherwise stuff
will go bounce anyway...

Probably missed a few details,.. but I think the below ought to show the
idea...

struct tx_lock {
int busy;
spinlock_t lock;
unsigned long *counters;
};


int tx_lock_init(struct tx_lock *txl)
{
txl->busy = 0;
spin_lock_init(&txl->lock);
txl->counters = alloc_percpu(unsigned long);

if (!txl->counters)
return -ENOMEM;

return 0;
}

void __netif_tx_lock(struct netdev_queue *txq, cpu)
{
struct net_device *dev = txq->dev;

if (rcu_dereference(dev->tx_lock.busy)) {
spin_lock(&dev->tx_lock.lock);
(*percpu_ptr(dev->tx_lock.counters, cpu))++;
spin_unlock(&dev->tx_lock.lock);
} else
(*percpu_ptr(dev->tx_lock.counters, cpu))++;

spin_lock(&txq->_xmit_lock);
txq->xmit_lock_owner = cpu;
}

void __netif_tx_unlock(struct netdev_queue *txq)
{
struct net_device *dev = txq->dev;

(*percpu_ptr(dev->tx_lock.counters, txq->xmit_lock_owner))--;
txq->xmit_lock_owner = -1;
spin_unlock(&txq->xmit_lock);
}

unsigned long tx_lock_read_counters(struct tx_lock *txl)
{
int i;
unsigned long counter = 0;

/* can use online - the inc/dec are matched per cpu */
for_each_online_cpu(i)
counter += *percpu_ptr(txl->counters, i);

return counter;
}

void netif_tx_lock(struct net_device *dev)
{
spin_lock(&dev->tx_lock.lock);
rcu_assign_pointer(dev->tx_lock.busy, 1);

while (tx_lock_read_counters(&dev->tx_lock)
cpu_relax();
}

void netif_tx_unlock(struct net_device *dev)
{
rcu_assign_pointer(dev->tx_lock.busy, 0);
smp_wmb(); /* because rcu_assign_pointer is broken */
spin_unlock(&dev->tx_lock.lock);
}


2008-07-25 20:00:30

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Fri, Jul 25, 2008 at 09:36:15PM +0200, Johannes Berg wrote:
> On Fri, 2008-07-25 at 21:34 +0200, Jarek Poplawski wrote:
>
> > > Umm, of course it cannot, because then we'd have to take the mutex in
> > > the TX path, which we cannot. We cannot have another lock in the TX
> > > path, what's so hard to understand about? We need to be able to lock all
> > > queues to lock out multiple tx paths at once in some (really) slow paths
> > > but not have any extra lock overhead for the tx path, especially not a
> > > single lock.
> >
> > But this mutex doesn't have to be mutex. And it's not for the tx path,
> > only for "service" just like netif_tx_lock(). The fast path needs only
> > queue->tx_lock.
>
> No, we need to be able to lock out multiple TX paths at once.

IMHO, it can do the same. We could e.g. insert a locked spinlock into
this noop_tx_handler(), to give everyone some waiting.

Jarek P.

2008-07-25 19:33:20

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Fri, Jul 25, 2008 at 09:16:24PM +0200, Johannes Berg wrote:
> On Fri, 2008-07-25 at 20:36 +0200, Jarek Poplawski wrote:
> > On Fri, Jul 25, 2008 at 07:04:36PM +0200, Ingo Oeser wrote:
> > ...
> > > I'm sure as hell, I miss sth. but can't it be done by this pseudo-code:
> >
> > ...And I really doubt it can't be done like this.
>
> Umm, of course it cannot, because then we'd have to take the mutex in
> the TX path, which we cannot. We cannot have another lock in the TX
> path, what's so hard to understand about? We need to be able to lock all
> queues to lock out multiple tx paths at once in some (really) slow paths
> but not have any extra lock overhead for the tx path, especially not a
> single lock.

But this mutex doesn't have to be mutex. And it's not for the tx path,
only for "service" just like netif_tx_lock(). The fast path needs only
queue->tx_lock.

Jarek P.

2008-07-22 18:44:55

by Patrick McHardy

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

Larry Finger wrote:
> Patrick McHardy wrote:
>> Larry Finger wrote:
>>>
>>> =============================================
>>> [ INFO: possible recursive locking detected ]
>>> 2.6.26-Linus-git-05752-g93ded9b #49
>>
>> ^^^^^^^^^^^^^^ dirty?
>>
>> This kernel is not using the patches we sent.
>
> No, but they didn't make any difference. I tried with all 3 applied,
> then backed them out one by one. That was the state when I posted before.

Well, this is a completely different warning.

>
> Here are the dumps with all 3 patches applied:
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.26-Linus-05752-g93ded9b-dirty #53
> ---------------------------------------------
> b43/1997 is trying to acquire lock:
> (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
> ieee80211_scan_completed+0x130/0x2e1 [mac80211]
>
> but task is already holding lock:
> (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
> ieee80211_scan_completed+0x130/0x2e1 [mac80211]


2008-07-31 12:38:35

by Nick Piggin

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thursday 31 July 2008 22:29, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Sun, 27 Jul 2008 22:37:57 +0200
>
> > Looks like enough to me. (Probably it could even share space with
> > the state.)
>
> So I made some progress on this, three things:
>
> 1) I remember why I choose a to use a bit in my design, it's so that
> it does not increase the costs of the checks in the fast paths.
> test_bit(X) && test_bit(Y) can be combined into a single test by
> the compiler.

Except for the braindead volatile that gets stuck on the bitops pointer.

Last time I complained about this, a lot of noise was made and I think
Linus wanted it to stay around so we could pass volatile pointers to
bitops & co without warnings. I say we should just remove the volatile
and kill any callers that might warn...

2008-07-23 11:44:09

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, Jul 23, 2008 at 11:35:19AM +0000, Jarek Poplawski wrote:
> On Wed, Jul 23, 2008 at 12:58:16PM +0200, Peter Zijlstra wrote:
...
> > When I look at the mac802.11 code in ieee80211_tx_pending() it looks
> > like it can do with just one lock at a time, instead of all - but I
> > might be missing some obvious details.
> >
> > So I guess my question is, is netif_tx_lock() here to stay, or is the
> > right fix to convert all those drivers to use __netif_tx_lock() which
> > locks only a single queue?
> >
>
> It's a new thing mainly for new hardware/drivers, and just after
> conversion (older drivers effectively use __netif_tx_lock()), so it'll
> probably stay for some time until something better is found. David,
> will tell the rest, I hope.

...And, of course, these new drivers should also lock a single queue
where possible.

Jarek P.

2008-07-23 11:30:14

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, Jul 23, 2008 at 12:58:16PM +0200, Peter Zijlstra wrote:
...
> Ah, right,...
>
> that takes a whole bunch of locks at once..
>
> Is that really needed? - when I grep for its usage its surprisingly few
> drivers using it and even less generic code.
>
> When I look at the mac802.11 code in ieee80211_tx_pending() it looks
> like it can do with just one lock at a time, instead of all - but I
> might be missing some obvious details.
>
> So I guess my question is, is netif_tx_lock() here to stay, or is the
> right fix to convert all those drivers to use __netif_tx_lock() which
> locks only a single queue?
>

It's a new thing mainly for new hardware/drivers, and just after
conversion (older drivers effectively use __netif_tx_lock()), so it'll
probably stay for some time until something better is found. David,
will tell the rest, I hope.

Jarek P.

2008-07-24 09:20:40

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Peter Zijlstra <[email protected]>
Date: Thu, 24 Jul 2008 11:10:48 +0200

> Ok, then how about something like this, the idea is to wrap the per tx
> lock with a read lock of the device and let the netif_tx_lock() be the
> write side, therefore excluding all device locks, but not incure the
> cacheline bouncing on the read side by using per-cpu counters like rcu
> does.
>
> This of course requires that netif_tx_lock() is rare, otherwise stuff
> will go bounce anyway...
>
> Probably missed a few details,.. but I think the below ought to show the
> idea...

Thanks for the effort, but I don't think we can seriously consider
this.

This lock is taken for every packet transmitted by the system, adding
another memory reference (the RCU deref) and a counter bump is just
not something we can just add to placate lockdep. We going through
all of this effort to seperate the TX locking into individual
queues, it would be silly to go back and make it more expensive.

I have other ideas which I've expanded upon in other emails. They
involve creating a netif_tx_freeze() interface and getting the drivers
to start using it.

2008-07-24 10:59:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thu, 2008-07-24 at 20:38 +1000, Nick Piggin wrote:
> On Thursday 24 July 2008 20:08, Peter Zijlstra wrote:
> > On Thu, 2008-07-24 at 02:32 -0700, David Miller wrote:
> > > From: Peter Zijlstra <[email protected]>
> > > Date: Thu, 24 Jul 2008 11:27:05 +0200
> > >
> > > > Well, not only lockdep, taking a very large number of locks is
> > > > expensive as well.
> > >
> > > Right now it would be on the order of 16 or 32 for
> > > real hardware.
> > >
> > > Much less than the scheduler currently takes on some
> > > of my systems, so currently you are the pot calling the
> > > kettle black. :-)
> >
> > One nit, and then I'll let this issue rest :-)
> >
> > The scheduler has a long lock dependancy chain (nr_cpu_ids rq locks),
> > but it never takes all of them at the same time. Any one code path will
> > at most hold two rq locks.
>
> Aside from lockdep, is there a particular problem with taking 64k locks
> at once? (in a very slow path, of course) I don't think it causes a
> problem with preempt_count, does it cause issues with -rt kernel?

PI-chains might explode I guess, Thomas?

Besides that, I just have this voice in my head telling me that
minimizing the number of locks held is a good thing.

> Hey, something kind of cool (and OT) I've just thought of that we can
> do with ticket locks is to take tickets for 2 (or 64K) nested locks,
> and then wait for them both (all), so the cost is N*lock + longest spin,
> rather than N*lock + N*avg spin.
>
> That would mean even at the worst case of a huge amount of contention
> on all 64K locks, it should only take a couple of ms to take all of
> them (assuming max spin time isn't ridiculous).
>
> Probably not the kind of feature we want to expose widely, but for
> really special things like the scheduler, it might be a neat hack to
> save a few cycles ;) Traditional implementations would just have
> #define spin_lock_async spin_lock
> #define spin_lock_async_wait do {} while (0)
>
> Sorry it's offtopic, but if I didn't post it, I'd forget to. Might be
> a fun quick hack for someone.

It might just be worth it for double_rq_lock() - if you can sort out the
deadlock potential Miklos just raised ;-)


2008-07-24 11:07:09

by Nick Piggin

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thursday 24 July 2008 20:55, Miklos Szeredi wrote:
> On Thu, 24 Jul 2008, Nick Piggin wrote:
> > Hey, something kind of cool (and OT) I've just thought of that we can
> > do with ticket locks is to take tickets for 2 (or 64K) nested locks,
> > and then wait for them both (all), so the cost is N*lock + longest spin,
> > rather than N*lock + N*avg spin.
>
> Isn't this deadlocky?
>
> E.g. one task takes ticket x=1, then other task comes in and takes x=2
> and y=1, then first task takes y=2. Then neither can actually
> complete both locks.

Oh duh of course you still need mutual exclusion from the first lock
to order the subsequent :P

So yeah it only works for N > 2 locks, and you have to spin_lock the
first one... so unsuitable for scheduler.

2008-07-26 13:17:43

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Sat, Jul 26, 2008 at 02:18:46AM -0700, David Miller wrote:
...
> I think there might be an easier way, but we may have
> to modify the state bits a little.
>
> Every call into ->hard_start_xmit() is made like this:
>
> 1. lock TX queue
> 2. check TX queue stopped
> 3. call ->hard_start_xmit() if not stopped
>
> This means that we can in fact do something like:
>
> unsigned int i;
>
> for (i = 0; i < dev->num_tx_queues; i++) {
> struct netdev_queue *txq;
>
> txq = netdev_get_tx_queue(dev, i);
> spin_lock_bh(&txq->_xmit_lock);
> netif_tx_freeze_queue(txq);
> spin_unlock_bh(&txq->_xmit_lock);
> }
>
> netif_tx_freeze_queue() just sets a new bit we add.
>
> Then we go to the ->hard_start_xmit() call sites and check this new
> "frozen" bit as well as the existing "stopped" bit.
>
> When we unfreeze each queue later, we see if it is stopped, and if not
> we schedule it's qdisc for packet processing.

I guess some additional synchronization will be added yet to prevent
parallel freeze and especially unfreeze.

Jarek P.

2008-07-22 19:30:36

by Larry Finger

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

Patrick McHardy wrote:
> Larry Finger wrote:
>> Patrick McHardy wrote:
>>> Larry Finger wrote:
>>>>
>>>> =============================================
>>>> [ INFO: possible recursive locking detected ]
>>>> 2.6.26-Linus-git-05752-g93ded9b #49
>>>
>>> ^^^^^^^^^^^^^^ dirty?
>>>
>>> This kernel is not using the patches we sent.
>>
>> No, but they didn't make any difference. I tried with all 3 applied,
>> then backed them out one by one. That was the state when I posted before.
>
> Well, this is a completely different warning.
>
>>
>> Here are the dumps with all 3 patches applied:
>>
>> =============================================
>> [ INFO: possible recursive locking detected ]
>> 2.6.26-Linus-05752-g93ded9b-dirty #53
>> ---------------------------------------------
>> b43/1997 is trying to acquire lock:
>> (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
>> ieee80211_scan_completed+0x130/0x2e1 [mac80211]
>>
>> but task is already holding lock:
>> (_xmit_IEEE80211#2){-...}, at: [<ffffffffa028f322>]
>> ieee80211_scan_completed+0x130/0x2e1 [mac80211]

Sorry. After all those kernel builds I got bleary-eyed and just looked for
recursive locking without any regard for the details.

Larry

2008-07-25 17:52:21

by Ingo Oeser

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

Hi David,

David Miller schrieb:
> From: Peter Zijlstra <[email protected]>
> Date: Wed, 23 Jul 2008 12:58:16 +0200
>
> > So I guess my question is, is netif_tx_lock() here to stay, or is the
> > right fix to convert all those drivers to use __netif_tx_lock() which
> > locks only a single queue?
>
> It's staying.
>
> It's trying to block all potential calls into the ->hard_start_xmit()
> method of the driver, and the only reliable way to do that is to take
> all the TX queue locks. And in one form or another, we're going to
> have this "grab/release all the TX queue locks" construct.
>
> I find it interesting that this cannot be simply described to lockdep
> :-)

I'm sure as hell, I miss sth. but can't it be done by this pseudo-code:

netif_tx_lock(device)
{
mutex_lock(device->queue_entry_mutex);
foreach_queue_entries(queue, device->queues)
{
spin_lock(queue->tx_lock);
set_noop_tx_handler(queue);
spin_unlock(queue->tx_lock);
}
mutex_unlock(device->queue_entry_mutex);
}

netif_tx_unlock(device)
{
mutex_lock(device->queue_entry_mutex);
foreach_queue_entries(queue, device->queues)
{
spin_lock(queue->tx_lock);
set_useful_tx_handler(queue);
spin_unlock(queue->tx_lock);
}
mutex_unlock(device->queue_entry_mutex);
}

Then protect use of the queues by queue->tx_lock in transmit path.
The first setup of the queue doesn't need to be protected, since no-one
knows the device. The final cleanup of the device doesn't need to be
protected either, because netif_tx_lock() and netif_tx_unlock() should
not be called after entering the final cleanup.

Some VM locking works this way...


Best Regards

Ingo Oeser

2008-07-23 20:55:30

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Jarek Poplawski <[email protected]>
Date: Wed, 23 Jul 2008 22:43:35 +0200

> On Wed, Jul 23, 2008 at 01:16:07PM -0700, David Miller wrote:
> ...
> > There will always be a need for a "stop all the TX queues" operation.
>
> The question is if the current way is "all correct". As a matter of
> fact I think Peter's doubts could be justified: taking "USHORT_MAX"
> locks looks really dubious (so maybe it's not so strange lockdep
> didn't get used to this).

There are, of course, potentially other ways to achieve the objective.

And for non-multiqueue aware devices (which is the vast majority of
the 400 or so networking drivers we have) there is only one queue and
thus one lock taken.

2008-07-22 10:46:53

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

On 22-07-2008 08:34, Larry Finger wrote:
...
>>> I used the davem patch, the second version of your first one, and
>>> your second one. Both problems persist.

Could you send lockdep info after Patrick's "set lockdep classes"
patch?

Jarek P.

2008-07-23 20:14:42

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Peter Zijlstra <[email protected]>
Date: Wed, 23 Jul 2008 12:58:16 +0200

> So I guess my question is, is netif_tx_lock() here to stay, or is the
> right fix to convert all those drivers to use __netif_tx_lock() which
> locks only a single queue?

It's staying.

It's trying to block all potential calls into the ->hard_start_xmit()
method of the driver, and the only reliable way to do that is to take
all the TX queue locks. And in one form or another, we're going to
have this "grab/release all the TX queue locks" construct.

I find it interesting that this cannot be simply described to lockdep
:-)

2008-07-27 00:34:35

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Jarek Poplawski <[email protected]>
Date: Sat, 26 Jul 2008 15:18:38 +0200

> I guess some additional synchronization will be added yet to prevent
> parallel freeze and especially unfreeze.

Yes, that could be a problem. Using test_and_set_bit() can
guard the freezing sequence itself, but it won't handle
letting two threads of control freeze and unfreeze safely
without a reference count.

We want this thing to be able to be used flexbly, which means
we can't just assume that this is a short code sequence and
the unfreeze will come quickly. That pretty much rules
out using a new lock around the operation or anything
like that.

So I guess we could replace the state bit with a reference
count. It doesn't even need to be atomic since it is set
and tested under dev_queue->_xmit_lock


2008-07-21 19:07:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370


* Ian Schram <[email protected]> wrote:

> I was looking at this out of interest, but I'm in no way familiar with
> the code.

thanks Ian for the patch, i'll test it.

Note that it was whitespace damaged, find below a tidied up version of
the patch that i've applied to tip/out-of-tree.

Ingo

----------------------->
commit 2f77dd3a3b5c3a27298fa0a09d8703c09c633fc6
Author: Ian Schram <[email protected]>
Date: Mon Jul 21 20:18:25 2008 +0200

mac80211_hwsim.c: fix: BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

I was looking at this out of interest, but I'm in no way familiar with
the code.

Looks to me that the error handling code in mac80211_hwsim is awkward.
Which leads to it calling ieee80211_unregister_hw even when
ieee80211_register_hw failed.

The function has a for loop where it generates all simulated radios.
when something fails, the error handling will call mac80211_hwsim_free
which frees all simulated radios who's pointer isn't zero. However the
information stored is insufficient to determine whether or not the call
to ieee80211_register_hw succeeded or not for a specific radio. The
included patch makes init_mac80211_hwsim clean up the current simulated
radio, and then calls into mac80211_hwsim_free to clean up all the
radios that did succeed.

This however doesn't explain why the rate control registration failed..
build tested this, but had some problems reproducing the original
problem.

Signed-off-by: Ian Schram <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/net/wireless/mac80211_hwsim.c | 18 ++++++++++++------
1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 913dc9f..5816230 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -364,8 +364,7 @@ static void mac80211_hwsim_free(void)
struct mac80211_hwsim_data *data;
data = hwsim_radios[i]->priv;
ieee80211_unregister_hw(hwsim_radios[i]);
- if (!IS_ERR(data->dev))
- device_unregister(data->dev);
+ device_unregister(data->dev);
ieee80211_free_hw(hwsim_radios[i]);
}
}
@@ -437,7 +436,7 @@ static int __init init_mac80211_hwsim(void)
"mac80211_hwsim: device_create_drvdata "
"failed (%ld)\n", PTR_ERR(data->dev));
err = -ENOMEM;
- goto failed;
+ goto failed_drvdata;
}
data->dev->driver = &mac80211_hwsim_driver;

@@ -461,7 +460,7 @@ static int __init init_mac80211_hwsim(void)
if (err < 0) {
printk(KERN_DEBUG "mac80211_hwsim: "
"ieee80211_register_hw failed (%d)\n", err);
- goto failed;
+ goto failed_hw;
}

printk(KERN_DEBUG "%s: hwaddr %s registered\n",
@@ -479,9 +478,9 @@ static int __init init_mac80211_hwsim(void)
rtnl_lock();

err = dev_alloc_name(hwsim_mon, hwsim_mon->name);
- if (err < 0) {
+ if (err < 0)
goto failed_mon;
- }
+

err = register_netdevice(hwsim_mon);
if (err < 0)
@@ -494,7 +493,14 @@ static int __init init_mac80211_hwsim(void)
failed_mon:
rtnl_unlock();
free_netdev(hwsim_mon);
+ mac80211_hwsim_free();
+ return err;

+failed_hw:
+ device_unregister(data->dev);
+failed_drvdata:
+ ieee80211_free_hw(hw);
+ hwsim_radios[i] = 0;
failed:
mac80211_hwsim_free();
return err;

2008-07-26 10:53:20

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Sat, Jul 26, 2008 at 02:18:46AM -0700, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Fri, 25 Jul 2008 22:01:37 +0200
>
> > On Fri, Jul 25, 2008 at 09:36:15PM +0200, Johannes Berg wrote:
> > > On Fri, 2008-07-25 at 21:34 +0200, Jarek Poplawski wrote:
> > >
> > > No, we need to be able to lock out multiple TX paths at once.
> >
> > IMHO, it can do the same. We could e.g. insert a locked spinlock into
> > this noop_tx_handler(), to give everyone some waiting.
>
> I think there might be an easier way, but we may have
> to modify the state bits a little.

Yes, this looks definitely easier, but here is this one little bit
more, plus additional code to handle this in various places. Ingo's
proposal needs a (one?!) bit more thinking in one place, but it
shouldn't add even a bit to tx path (and it looks really cool!). Of
course, it could be re-considered in some other time too.

BTW, it seems with "Ingo's method" this netif_queue_stopped() check
could be removed too - the change of handlers could be done with
single qdiscs as well.

Jarek P.

2008-07-22 06:34:31

by Larry Finger

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

David Miller wrote:
> From: Larry Finger <[email protected]>
> Date: Mon, 21 Jul 2008 17:40:10 -0500
>
>> Sorry :(
>>
>> I used the davem patch, the second version of your first one, and your second
>> one. Both problems persist.
>>
>> Still plugging away on bisection.
>
> GIT bisecting the lockdep problem is surely going the land you on:
>
> commit e308a5d806c852f56590ffdd3834d0df0cbed8d7

No. It landed on this one.

37437bb2e1ae8af470dfcd5b4ff454110894ccaf is first bad commit
commit 37437bb2e1ae8af470dfcd5b4ff454110894ccaf
Author: David S. Miller <[email protected]>
Date: Wed Jul 16 02:15:04 2008 -0700

pkt_sched: Schedule qdiscs instead of netdev_queue.

When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.

Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.

Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.

Signed-off-by: David S. Miller <[email protected]>

:040000 040000 4d13d1fb1ae37d9720c3db6b1368866e78621f55
f1a0f5e5a191e7904b528d9e10069a4324a5d328 M include
:040000 040000 3515aad52a2cdaaba85feeffc0944d7f07a19c96
4854d4f4df9726a2e8837037f82bde807bed2ede M net

Larry

2008-07-27 20:38:01

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Sat, Jul 26, 2008 at 05:34:34PM -0700, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Sat, 26 Jul 2008 15:18:38 +0200
>
> > I guess some additional synchronization will be added yet to prevent
> > parallel freeze and especially unfreeze.
>
> Yes, that could be a problem. Using test_and_set_bit() can
> guard the freezing sequence itself, but it won't handle
> letting two threads of control freeze and unfreeze safely
> without a reference count.
>
> We want this thing to be able to be used flexbly, which means
> we can't just assume that this is a short code sequence and
> the unfreeze will come quickly. That pretty much rules
> out using a new lock around the operation or anything
> like that.
>
> So I guess we could replace the state bit with a reference
> count. It doesn't even need to be atomic since it is set
> and tested under dev_queue->_xmit_lock

Looks like enough to me. (Probably it could even share space with
the state.)

Jarek P.

2008-07-23 10:58:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Wed, 2008-07-23 at 10:13 +0000, Jarek Poplawski wrote:
> On Wed, Jul 23, 2008 at 11:50:14AM +0200, Peter Zijlstra wrote:
> ....
> > definite NAK on using raw spinlock ops...
> >
> > I'll go look at this multiqueue stuff to see if there is anything to be
> > done.. David, what would be a good commit to start reading?
>
> ....In the case David ever sleeps: I think, the current Linus' git is
> good enough (the problem is with netdevice.h: netif_tx_lock()).

Ah, right,...

that takes a whole bunch of locks at once..

Is that really needed? - when I grep for its usage its surprisingly few
drivers using it and even less generic code.

When I look at the mac802.11 code in ieee80211_tx_pending() it looks
like it can do with just one lock at a time, instead of all - but I
might be missing some obvious details.

So I guess my question is, is netif_tx_lock() here to stay, or is the
right fix to convert all those drivers to use __netif_tx_lock() which
locks only a single queue?


2008-07-25 19:17:12

by Johannes Berg

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Fri, 2008-07-25 at 20:36 +0200, Jarek Poplawski wrote:
> On Fri, Jul 25, 2008 at 07:04:36PM +0200, Ingo Oeser wrote:
> ...
> > I'm sure as hell, I miss sth. but can't it be done by this pseudo-code:
>
> ...And I really doubt it can't be done like this.

Umm, of course it cannot, because then we'd have to take the mutex in
the TX path, which we cannot. We cannot have another lock in the TX
path, what's so hard to understand about? We need to be able to lock all
queues to lock out multiple tx paths at once in some (really) slow paths
but not have any extra lock overhead for the tx path, especially not a
single lock.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-07-21 20:46:08

by David Miller

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

From: Larry Finger <[email protected]>
Date: Mon, 21 Jul 2008 15:38:47 -0500

> David Miller wrote:
> >
> > No further backtrace? That will tell us what driver is causing
> > this.
>
> Yes, I have a full backtrace.
>
> It starts with possible recursive locking in NetworkManager, and goes directly
> into the Warning - this came from a later pull of Linus's tree.

That helps a lot, I'm looking at this now.

Thanks.

2008-07-22 21:17:48

by David Miller

[permalink] [raw]
Subject: Re: [crash] BUG: unable to handle kernel NULL pointer dereference at 0000000000000370

From: Patrick McHardy <[email protected]>
Date: Tue, 22 Jul 2008 16:53:30 +0200

> Could you please retry with the three patches attached to this
> mail? If the lockdep warning still triggers, please post it again.

Since I'm convinced the original lockdep spurious warning
issue is cured, and we're looking at something different
now, I've integrated all of these fixes together as one
commit as below.

Thanks a lot Patrick.

netdev: Handle ->addr_list_lock just like ->_xmit_lock for lockdep.

The new address list lock needs to handle the same device layering
issues that the _xmit_lock one does.

This integrates work done by Patrick McHardy.

Signed-off-by: David S. Miller <[email protected]>

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 9737c06..a641eea 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5041,6 +5041,7 @@ static int bond_check_params(struct bond_params *params)
}

static struct lock_class_key bonding_netdev_xmit_lock_key;
+static struct lock_class_key bonding_netdev_addr_lock_key;

static void bond_set_lockdep_class_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -5052,6 +5053,8 @@ static void bond_set_lockdep_class_one(struct net_device *dev,

static void bond_set_lockdep_class(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock,
+ &bonding_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, bond_set_lockdep_class_one, NULL);
}

diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index b6500b2..58f4b1d 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -123,6 +123,7 @@ static LIST_HEAD(bpq_devices);
* off into a separate class since they always nest.
*/
static struct lock_class_key bpq_netdev_xmit_lock_key;
+static struct lock_class_key bpq_netdev_addr_lock_key;

static void bpq_set_lockdep_class_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -133,6 +134,7 @@ static void bpq_set_lockdep_class_one(struct net_device *dev,

static void bpq_set_lockdep_class(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock, &bpq_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, bpq_set_lockdep_class_one, NULL);
}

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index efbc155..4239450 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -276,6 +276,7 @@ static int macvlan_change_mtu(struct net_device *dev, int new_mtu)
* separate class since they always nest.
*/
static struct lock_class_key macvlan_netdev_xmit_lock_key;
+static struct lock_class_key macvlan_netdev_addr_lock_key;

#define MACVLAN_FEATURES \
(NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
@@ -295,6 +296,8 @@ static void macvlan_set_lockdep_class_one(struct net_device *dev,

static void macvlan_set_lockdep_class(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock,
+ &macvlan_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, macvlan_set_lockdep_class_one, NULL);
}

diff --git a/drivers/net/wireless/hostap/hostap_hw.c b/drivers/net/wireless/hostap/hostap_hw.c
index 13d5882..3153fe9 100644
--- a/drivers/net/wireless/hostap/hostap_hw.c
+++ b/drivers/net/wireless/hostap/hostap_hw.c
@@ -3101,6 +3101,7 @@ static void prism2_clear_set_tim_queue(local_info_t *local)
* This is a natural nesting, which needs a split lock type.
*/
static struct lock_class_key hostap_netdev_xmit_lock_key;
+static struct lock_class_key hostap_netdev_addr_lock_key;

static void prism2_set_lockdep_class_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -3112,6 +3113,8 @@ static void prism2_set_lockdep_class_one(struct net_device *dev,

static void prism2_set_lockdep_class(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock,
+ &hostap_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, prism2_set_lockdep_class_one, NULL);
}

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index f42bc2b..4bf014e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -569,6 +569,7 @@ static void vlan_dev_set_rx_mode(struct net_device *vlan_dev)
* separate class since they always nest.
*/
static struct lock_class_key vlan_netdev_xmit_lock_key;
+static struct lock_class_key vlan_netdev_addr_lock_key;

static void vlan_dev_set_lockdep_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -581,6 +582,9 @@ static void vlan_dev_set_lockdep_one(struct net_device *dev,

static void vlan_dev_set_lockdep_class(struct net_device *dev, int subclass)
{
+ lockdep_set_class_and_subclass(&dev->addr_list_lock,
+ &vlan_netdev_addr_lock_key,
+ subclass);
netdev_for_each_tx_queue(dev, vlan_dev_set_lockdep_one, &subclass);
}

diff --git a/net/core/dev.c b/net/core/dev.c
index 65eea83..6bf217d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -261,7 +261,7 @@ static RAW_NOTIFIER_HEAD(netdev_chain);

DEFINE_PER_CPU(struct softnet_data, softnet_data);

-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#ifdef CONFIG_LOCKDEP
/*
* register_netdevice() inits txq->_xmit_lock and sets lockdep class
* according to dev->type
@@ -301,6 +301,7 @@ static const char *netdev_lock_name[] =
"_xmit_NONE"};

static struct lock_class_key netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
+static struct lock_class_key netdev_addr_lock_key[ARRAY_SIZE(netdev_lock_type)];

static inline unsigned short netdev_lock_pos(unsigned short dev_type)
{
@@ -313,8 +314,8 @@ static inline unsigned short netdev_lock_pos(unsigned short dev_type)
return ARRAY_SIZE(netdev_lock_type) - 1;
}

-static inline void netdev_set_lockdep_class(spinlock_t *lock,
- unsigned short dev_type)
+static inline void netdev_set_xmit_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
{
int i;

@@ -322,9 +323,22 @@ static inline void netdev_set_lockdep_class(spinlock_t *lock,
lockdep_set_class_and_name(lock, &netdev_xmit_lock_key[i],
netdev_lock_name[i]);
}
+
+static inline void netdev_set_addr_lockdep_class(struct net_device *dev)
+{
+ int i;
+
+ i = netdev_lock_pos(dev->type);
+ lockdep_set_class_and_name(&dev->addr_list_lock,
+ &netdev_addr_lock_key[i],
+ netdev_lock_name[i]);
+}
#else
-static inline void netdev_set_lockdep_class(spinlock_t *lock,
- unsigned short dev_type)
+static inline void netdev_set_xmit_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
+{
+}
+static inline void netdev_set_addr_lockdep_class(struct net_device *dev)
{
}
#endif
@@ -3851,7 +3865,7 @@ static void __netdev_init_queue_locks_one(struct net_device *dev,
void *_unused)
{
spin_lock_init(&dev_queue->_xmit_lock);
- netdev_set_lockdep_class(&dev_queue->_xmit_lock, dev->type);
+ netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
dev_queue->xmit_lock_owner = -1;
}

@@ -3896,6 +3910,7 @@ int register_netdevice(struct net_device *dev)
net = dev_net(dev);

spin_lock_init(&dev->addr_list_lock);
+ netdev_set_addr_lockdep_class(dev);
netdev_init_queue_locks(dev);

dev->iflink = -1;
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index fccc250..532e4fa 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -73,6 +73,7 @@ static const struct proto_ops nr_proto_ops;
* separate class since they always nest.
*/
static struct lock_class_key nr_netdev_xmit_lock_key;
+static struct lock_class_key nr_netdev_addr_lock_key;

static void nr_set_lockdep_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -83,6 +84,7 @@ static void nr_set_lockdep_one(struct net_device *dev,

static void nr_set_lockdep_key(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock, &nr_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, nr_set_lockdep_one, NULL);
}

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index dbc963b..a7f1ce1 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -74,6 +74,7 @@ ax25_address rose_callsign;
* separate class since they always nest.
*/
static struct lock_class_key rose_netdev_xmit_lock_key;
+static struct lock_class_key rose_netdev_addr_lock_key;

static void rose_set_lockdep_one(struct net_device *dev,
struct netdev_queue *txq,
@@ -84,6 +85,7 @@ static void rose_set_lockdep_one(struct net_device *dev,

static void rose_set_lockdep_key(struct net_device *dev)
{
+ lockdep_set_class(&dev->addr_list_lock, &rose_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, rose_set_lockdep_one, NULL);
}


2008-08-01 04:27:29

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: David Miller <[email protected]>
Date: Thu, 31 Jul 2008 05:29:32 -0700 (PDT)

> It's late here, but I'll start testing the following patch on my
> multiqueue capable cards after some sleep.

As a quick followup, I tested this on a machine where I had
a multiqueue interface and could reproduce the lockdep warnings,
and the patch makes them go away.

So I've pushed the patch into net-2.6 and will send it to Linus.

2008-08-01 21:10:42

by Paul E. McKenney

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thu, Jul 24, 2008 at 08:38:35PM +1000, Nick Piggin wrote:
> On Thursday 24 July 2008 20:08, Peter Zijlstra wrote:
> > On Thu, 2008-07-24 at 02:32 -0700, David Miller wrote:
> > > From: Peter Zijlstra <[email protected]>
> > > Date: Thu, 24 Jul 2008 11:27:05 +0200
> > >
> > > > Well, not only lockdep, taking a very large number of locks is
> > > > expensive as well.
> > >
> > > Right now it would be on the order of 16 or 32 for
> > > real hardware.
> > >
> > > Much less than the scheduler currently takes on some
> > > of my systems, so currently you are the pot calling the
> > > kettle black. :-)
> >
> > One nit, and then I'll let this issue rest :-)
> >
> > The scheduler has a long lock dependancy chain (nr_cpu_ids rq locks),
> > but it never takes all of them at the same time. Any one code path will
> > at most hold two rq locks.
>
> Aside from lockdep, is there a particular problem with taking 64k locks
> at once? (in a very slow path, of course) I don't think it causes a
> problem with preempt_count, does it cause issues with -rt kernel?
>
> Hey, something kind of cool (and OT) I've just thought of that we can
> do with ticket locks is to take tickets for 2 (or 64K) nested locks,
> and then wait for them both (all), so the cost is N*lock + longest spin,
> rather than N*lock + N*avg spin.
>
> That would mean even at the worst case of a huge amount of contention
> on all 64K locks, it should only take a couple of ms to take all of
> them (assuming max spin time isn't ridiculous).
>
> Probably not the kind of feature we want to expose widely, but for
> really special things like the scheduler, it might be a neat hack to
> save a few cycles ;) Traditional implementations would just have
> #define spin_lock_async spin_lock
> #define spin_lock_async_wait do {} while (0)
>
> Sorry it's offtopic, but if I didn't post it, I'd forget to. Might be
> a fun quick hack for someone.

FWIW, I did something similar in a previous life for the write-side of
a brlock-like locking mechanism. This was especially helpful if the
read-side critical sections were long.

Thanx, Paul

2008-08-01 07:36:15

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Fri, Aug 01, 2008 at 12:01:46AM -0700, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Fri, 1 Aug 2008 07:01:50 +0000
>
> > On Fri, Aug 01, 2008 at 06:48:10AM +0000, Jarek Poplawski wrote:
> > > On Thu, Jul 31, 2008 at 05:29:32AM -0700, David Miller wrote:
> > ...
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index 63d6bcd..69320a5 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -4200,6 +4200,7 @@ static void netdev_init_queues(struct net_device *dev)
> > > > {
> > > > netdev_init_one_queue(dev, &dev->rx_queue, NULL);
> > > > netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
> > > > + spin_lock_init(&dev->tx_global_lock);
> > >
> > > This will probably need some lockdep annotations similar to
> > > _xmit_lock.
> >
> > ...BTW, we probably could also consider some optimization here: the
> > xmit_lock of the first queue could be treated as special, and only
> > the owner could do such a freezing. This would save changes of
> > functionality to non mq devices. On the other hand, it would need
> > remembering about this special treatment (so, eg. a separate lockdep
> > initialization than all the others).
>
> I think special casing the zero's queue's lock is a bad idea.
> Having a real top-level synchronizer is a powerful tool and
> we could use it for other things.

Sure, if there is really no problem with lockdep here, there is no
need for this at all.

Thanks for the explanations,
Jarek P.

2008-08-01 06:42:54

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thu, Jul 31, 2008 at 05:29:32AM -0700, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Sun, 27 Jul 2008 22:37:57 +0200
>
> > Looks like enough to me. (Probably it could even share space with
> > the state.)

Alas I've some doubts here...

...
> static inline void netif_tx_unlock(struct net_device *dev)
> {
> unsigned int i;
>
> for (i = 0; i < dev->num_tx_queues; i++) {
> struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
> - __netif_tx_unlock(txq);
> - }
>
> + /* No need to grab the _xmit_lock here. If the
> + * queue is not stopped for another reason, we
> + * force a schedule.
> + */
> + clear_bit(__QUEUE_STATE_FROZEN, &txq->state);

The comments in asm-x86/bitops.h to set_bit/clear_bit are rather queer
about reordering on non x86: isn't eg. smp_mb_before_clear_bit()
useful here?

> + if (!test_bit(__QUEUE_STATE_XOFF, &txq->state))
> + __netif_schedule(txq->qdisc);
> + }
> + spin_unlock(&dev->tx_global_lock);
> }
...

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 63d6bcd..69320a5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4200,6 +4200,7 @@ static void netdev_init_queues(struct net_device *dev)
> {
> netdev_init_one_queue(dev, &dev->rx_queue, NULL);
> netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
> + spin_lock_init(&dev->tx_global_lock);

This will probably need some lockdep annotations similar to
_xmit_lock.

> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 345838a..9c9cd4d 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -135,7 +135,8 @@ static inline int qdisc_restart(struct Qdisc *q)
> txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
>
> HARD_TX_LOCK(dev, txq, smp_processor_id());
> - if (!netif_subqueue_stopped(dev, skb))
> + if (!netif_tx_queue_stopped(txq) &&
> + !netif_tx_queue_frozen(txq))
> ret = dev_hard_start_xmit(skb, dev, txq);
> HARD_TX_UNLOCK(dev, txq);

This thing is the most doubtful to me: before this patch callers would
wait on this lock. Now they take the lock without problems, check the
flags, and let to take this lock again, doing some re-queing in the
meantime.

So, it seems HARD_TX_LOCK should rather do some busy looping now with
a trylock, and re-checking the _FROZEN flag. Maybe even this should
be done in __netif_tx_lock(). On the other hand, this shouldn't block
too much the owner of tx_global_lock() with taking such a lock.

Jarek P.

2008-08-01 21:10:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thu, Jul 24, 2008 at 09:06:51PM +1000, Nick Piggin wrote:
> On Thursday 24 July 2008 20:55, Miklos Szeredi wrote:
> > On Thu, 24 Jul 2008, Nick Piggin wrote:
> > > Hey, something kind of cool (and OT) I've just thought of that we can
> > > do with ticket locks is to take tickets for 2 (or 64K) nested locks,
> > > and then wait for them both (all), so the cost is N*lock + longest spin,
> > > rather than N*lock + N*avg spin.
> >
> > Isn't this deadlocky?
> >
> > E.g. one task takes ticket x=1, then other task comes in and takes x=2
> > and y=1, then first task takes y=2. Then neither can actually
> > complete both locks.
>
> Oh duh of course you still need mutual exclusion from the first lock
> to order the subsequent :P
>
> So yeah it only works for N > 2 locks, and you have to spin_lock the
> first one... so unsuitable for scheduler.

Or sort the locks by address or some such.

Thanx, Paul

2008-08-01 07:01:46

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Jarek Poplawski <[email protected]>
Date: Fri, 1 Aug 2008 07:01:50 +0000

> On Fri, Aug 01, 2008 at 06:48:10AM +0000, Jarek Poplawski wrote:
> > On Thu, Jul 31, 2008 at 05:29:32AM -0700, David Miller wrote:
> ...
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 63d6bcd..69320a5 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -4200,6 +4200,7 @@ static void netdev_init_queues(struct net_device *dev)
> > > {
> > > netdev_init_one_queue(dev, &dev->rx_queue, NULL);
> > > netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
> > > + spin_lock_init(&dev->tx_global_lock);
> >
> > This will probably need some lockdep annotations similar to
> > _xmit_lock.
>
> ...BTW, we probably could also consider some optimization here: the
> xmit_lock of the first queue could be treated as special, and only
> the owner could do such a freezing. This would save changes of
> functionality to non mq devices. On the other hand, it would need
> remembering about this special treatment (so, eg. a separate lockdep
> initialization than all the others).

I think special casing the zero's queue's lock is a bad idea.
Having a real top-level synchronizer is a powerful tool and
we could use it for other things.

2008-08-01 07:00:06

by David Miller

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

From: Jarek Poplawski <[email protected]>
Date: Fri, 1 Aug 2008 06:48:10 +0000

> On Thu, Jul 31, 2008 at 05:29:32AM -0700, David Miller wrote:
> > + /* No need to grab the _xmit_lock here. If the
> > + * queue is not stopped for another reason, we
> > + * force a schedule.
> > + */
> > + clear_bit(__QUEUE_STATE_FROZEN, &txq->state);
>
> The comments in asm-x86/bitops.h to set_bit/clear_bit are rather queer
> about reordering on non x86: isn't eg. smp_mb_before_clear_bit()
> useful here?

It doesn't matter, we need no synchronization here at all.

We unconditionally perform a __netif_schedule(), and that
will run the TX queue on the local cpu. We will take the
_xmit_lock at least once time if in fact the queue was not
stopped before the first froze it.

> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 63d6bcd..69320a5 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4200,6 +4200,7 @@ static void netdev_init_queues(struct net_device *dev)
> > {
> > netdev_init_one_queue(dev, &dev->rx_queue, NULL);
> > netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
> > + spin_lock_init(&dev->tx_global_lock);
>
> This will probably need some lockdep annotations similar to
> _xmit_lock.

I highly doubt it. It will never be taken nested with another
device's instance.

It is only ->hard_start_xmit() leading to another ->hard_start_xmit()
where this can currently happen, but tx_global_lock will not be used
in such paths.

> > @@ -135,7 +135,8 @@ static inline int qdisc_restart(struct Qdisc *q)
> > txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
> >
> > HARD_TX_LOCK(dev, txq, smp_processor_id());
> > - if (!netif_subqueue_stopped(dev, skb))
> > + if (!netif_tx_queue_stopped(txq) &&
> > + !netif_tx_queue_frozen(txq))
> > ret = dev_hard_start_xmit(skb, dev, txq);
> > HARD_TX_UNLOCK(dev, txq);
>
> This thing is the most doubtful to me: before this patch callers would
> wait on this lock. Now they take the lock without problems, check the
> flags, and let to take this lock again, doing some re-queing in the
> meantime.
>
> So, it seems HARD_TX_LOCK should rather do some busy looping now with
> a trylock, and re-checking the _FROZEN flag. Maybe even this should
> be done in __netif_tx_lock(). On the other hand, this shouldn't block
> too much the owner of tx_global_lock() with taking such a lock.

'ret' will be NETDEV_TX_BUSY in such a case (finding the queue
frozen), which will cause the while() loop in __qdisc_run() to
terminate.

The freezer will unconditionally schedule a new __qdisc_run()
when it unfreezes the queue.

Sure it's possible for some cpus to bang in and out of there
a few times, but that's completely harmless. And it can only
happen a few times since this freeze state is only held across
a critical section.


2008-08-01 07:09:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Thu, 2008-07-31 at 21:27 -0700, David Miller wrote:
> From: David Miller <[email protected]>
> Date: Thu, 31 Jul 2008 05:29:32 -0700 (PDT)
>
> > It's late here, but I'll start testing the following patch on my
> > multiqueue capable cards after some sleep.
>
> As a quick followup, I tested this on a machine where I had
> a multiqueue interface and could reproduce the lockdep warnings,
> and the patch makes them go away.
>
> So I've pushed the patch into net-2.6 and will send it to Linus.

Thanks david!


2008-08-01 06:56:33

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Kernel WARNING: at net/core/dev.c:1330 __netif_schedule+0x2c/0x98()

On Fri, Aug 01, 2008 at 06:48:10AM +0000, Jarek Poplawski wrote:
> On Thu, Jul 31, 2008 at 05:29:32AM -0700, David Miller wrote:
...
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 63d6bcd..69320a5 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4200,6 +4200,7 @@ static void netdev_init_queues(struct net_device *dev)
> > {
> > netdev_init_one_queue(dev, &dev->rx_queue, NULL);
> > netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
> > + spin_lock_init(&dev->tx_global_lock);
>
> This will probably need some lockdep annotations similar to
> _xmit_lock.

...BTW, we probably could also consider some optimization here: the
xmit_lock of the first queue could be treated as special, and only
the owner could do such a freezing. This would save changes of
functionality to non mq devices. On the other hand, it would need
remembering about this special treatment (so, eg. a separate lockdep
initialization than all the others).

Jarek P.