2007-05-10 02:55:00

by Kyle Moffett

[permalink] [raw]
Subject: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

I've managed to fairly reliably trigger a deadlock in some portion of
the linux networking code on my Debian test box (using the debian
kernel linux-image-2.6.20-1-686). I'm pretty sure that it's a race
condition of some sort as it doesn't trigger if I ifdown the
interfaces one by one, but if I run "ifdown -a" then it triggers
halfway through reliably (although not with the same reference count
numbers, once it was 6, this time it was 2).

The message I get is: "unregister_netdevice: waiting for world0 to
become free. Usage count = 2"

My /etc/network/interfaces file uses a couple custom-made if-pre-up.d
and if-post-down.d scripts to set up the bridges and VLANs a little
more cleanly than the standard debian scripts do, but the general
configuration is as follows:

net0: Tigon-3 onboard gigabit NIC, hooked to managed switch, untagged
packets
wfi0: (net0.2 before renaming) WAP-connected VLAN 2 on the managed
switch
world0: (net0.4094 before renaming) Internet connection, runs DHCP

lan: Local Area Network bridge of "net0" and "wfi0" (current box has
lowest STP priority) This will eventually also have another untagged-
only ethernet port attached to it.
lan:0: Alias for acting as primary nameserver

world: pseudo-bridge of "world0" for highly-available DHCP client.

Just for a bit of background on why this is so complex: When I get
this networking problem sorted out I'm going to set up heartbeat and
a dummy "world1" interface with a shared MAC which is added to the
"world" bridge when the current system is the DHCP-client master.
Hopefully that way I can have 3 systems act as a highly-available
router. Whichever one is currently the master will add the dummy
world1 interface with shared MAC address to its "world" bridge and
bring dhcp3-client up on the "world1" interface with the latest copy
of the leases file from the previous master (even if the previous
master mysteriously crashed. This should keep my public IP from
changing unnecessarily and should allow me to reboot any one of the 3
router/firewall/mailserver/fileserver/etc systems without adversely
affecting the internet connection or other critical services.

Eventually I plan to add ebtables to help filter traffic between wfi*
interfaces and untagged VLAN-1 net* interfaces, but for now there's
no filtering.

Anyways, after the unregister_netdevice messages start popping up all
sorts of networking related things start to hang hard in the kernel;
I can't "ip link set down" any interfaces, etc. I've captured sysrq
dumps of the process state on the system at the time with most
processes. See the attached syslog.bz2 file.

The important part is probably the stuck "vconfig" process, but I
don't really understand enough about the timer stuff to interpret
that backtrace.

vconfig D 83CCD8CE 0 16564 16562 (NOTLB)
efdd7e7c 00000086 ee120afb 83ccd8ce 98f00788 b7083ffa
5384b49a c76c0b05
9ebaf791 00000004 efdd7e4e 00000007 f1468a90 2ab74174
00000362 00000326
f1468b9c c180e420 00000001 00000286 c012933c efdd7e8c
df98a000 c180e468
Call Trace:
[<c012933c>] lock_timer_base+0x15/0x2f
[<c0129445>] __mod_timer+0x91/0x9b
[<c02988f5>] schedule_timeout+0x70/0x8d
[<f8b75209>] vlan_device_event+0x13/0xf8 [8021q]
[<c0128bfe>] process_timeout+0x0/0x5
[<c01295db>] msleep+0x1a/0x1f
[<c023e5a9>] netdev_run_todo+0x10f/0x203
[<f8b757d4>] vlan_ioctl_handler+0x4dc/0x594 [8021q]
[<c023398d>] sock_ioctl+0x145/0x1be
[<c0233848>] sock_ioctl+0x0/0x1be
[<c016c39f>] do_ioctl+0x1f/0x62
[<c016c626>] vfs_ioctl+0x244/0x256
[<c016c684>] sys_ioctl+0x4c/0x64
[<c0102d70>] syscall_call+0x7/0xb


There's also a whole bunch of processes stuck in netdev_run_todo()
trying to lock a mutex. This even frustratingly affects things like
rndc which use netlink sockets:

rpc.mountd D C9F2BD14 0 4351 1 4425 4229 (NOTLB)
c9f2bd28 00000086 00000002 c9f2bd14 c9f2bd10 00000000
000010ff 46422e36
00000000 00000002 00000202 00000007 ed266030 495bcd12
000003a5 00013461
ed26613c c180e420 00000001 00000000 dffc8200 dfaeb580
000010ff df99ce00
Call Trace:
[<c0298c09>] __mutex_lock_slowpath+0x48/0x77
[<c0298ac0>] mutex_lock+0xa/0xb
[<c023e4aa>] netdev_run_todo+0x10/0x203
[<c0251460>] netlink_run_queue+0x9f/0xbe
[<c0243ac9>] rtnetlink_rcv_msg+0x0/0x1d9
[<c0243a97>] rtnetlink_rcv+0x34/0x3d
[<c0251880>] netlink_data_ready+0x12/0x4c
[<c0250841>] netlink_sendskb+0x19/0x30
[<c0251862>] netlink_sendmsg+0x26a/0x276
[<c02341b9>] sock_sendmsg+0xd0/0xeb
[<c0131e95>] autoremove_wake_function+0x0/0x35
[<c020a2be>] extract_entropy+0x45/0x89
[<c011a485>] __wake_up+0x32/0x43
[<c0250aa6>] netlink_insert+0x10f/0x119
[<c02344e3>] sys_sendto+0x116/0x140
[<c014841a>] find_get_page+0x18/0x41
[<c014a736>] filemap_nopage+0x197/0x2f9
[<c0234ea9>] sys_getsockname+0x86/0xb0
[<c015341a>] __handle_mm_fault+0x2ee/0x7d4
[<c0235207>] sys_socketcall+0x15e/0x242
[<c0102d70>] syscall_call+0x7/0xb

The "zz-km-vlan" script is the bash if-post-down.d script in charge
of disassembling VLAN interfaces. There is an equivalent "zz-km-
bridge" script for bridge interfaces, as well as if-pre-up.d scripts
called "00-km-vlan" and "00-km-bridge" to create the interfaces.

If anyone has any suggestions, patches, or debugging tips I'm very
interested to hear from you. Thanks!

Cheers,
Kyle Moffett


2007-05-10 04:26:19

by Ben Greear

[permalink] [raw]
Subject: Re: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

Kyle Moffett wrote:
>
> vconfig D 83CCD8CE 0 16564 16562 (NOTLB)
> efdd7e7c 00000086 ee120afb 83ccd8ce 98f00788 b7083ffa 5384b49a
> c76c0b05
> 9ebaf791 00000004 efdd7e4e 00000007 f1468a90 2ab74174 00000362
> 00000326
> f1468b9c c180e420 00000001 00000286 c012933c efdd7e8c df98a000
> c180e468
> Call Trace:
> [<c012933c>] lock_timer_base+0x15/0x2f
> [<c0129445>] __mod_timer+0x91/0x9b
> [<c02988f5>] schedule_timeout+0x70/0x8d
> [<f8b75209>] vlan_device_event+0x13/0xf8 [8021q]

Looks like a deadlock in the vlan code. Any chance you can run this
test with
lockdep enabled?

You could also add a printk in vlan_device_event() to check which event
it is hanging on, and
the netdevice that is passed in.

Since the vlan code holds RTNL at this point, then most other network
tasks will eventually
hang as well.

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2007-05-10 04:34:33

by Kyle Moffett

[permalink] [raw]
Subject: Re: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

On May 10, 2007, at 00:25:54, Ben Greear wrote:
> Kyle Moffett wrote:
>> vconfig D 83CCD8CE 0 16564 16562
>> (NOTLB)
>> efdd7e7c 00000086 ee120afb 83ccd8ce 98f00788 b7083ffa
>> 5384b49a c76c0b05
>> 9ebaf791 00000004 efdd7e4e 00000007 f1468a90 2ab74174
>> 00000362 00000326
>> f1468b9c c180e420 00000001 00000286 c012933c efdd7e8c
>> df98a000 c180e468
>> Call Trace:
>> [<c012933c>] lock_timer_base+0x15/0x2f
>> [<c0129445>] __mod_timer+0x91/0x9b
>> [<c02988f5>] schedule_timeout+0x70/0x8d
>> [<f8b75209>] vlan_device_event+0x13/0xf8 [8021q]
>
> Looks like a deadlock in the vlan code. Any chance you can run
> this test with lockdep enabled?
>
> You could also add a printk in vlan_device_event() to check which
> event it is hanging on, and the netdevice that is passed in.

Ok, I'll try building a 2.6.21 kernel with lockdep and some debugging
printk()s in the vlan_device_event() function and get back to you
tomorrow. Thanks for the quick response!

> Since the vlan code holds RTNL at this point, then most other
> network tasks will eventually hang as well.

Well, it's less of an "eventually" and more of an "almost
immediately". When that happens pretty close to everything more
complicated than socket(), bind(), and connect() with straightforward
UNIX or INET sockets tends to stick completely.

Thanks again!

Cheers,
Kyle Moffett

2007-05-11 05:49:59

by Kyle Moffett

[permalink] [raw]
Subject: Re: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

On May 10, 2007, at 00:34:11, Kyle Moffett wrote:
> On May 10, 2007, at 00:25:54, Ben Greear wrote:
>> Looks like a deadlock in the vlan code. Any chance you can run
>> this test with lockdep enabled?
>>
>> You could also add a printk in vlan_device_event() to check which
>> event it is hanging on, and the netdevice that is passed in.
>
> Ok, I'll try building a 2.6.21 kernel with lockdep and some
> debugging printk()s in the vlan_device_event() function and get
> back to you tomorrow. Thanks for the quick response!

Progress!!! I built a 2.6.21.1 kernel with a 1MB dmesg buffer,
almost all of the locking debugging options on (as well as a few
others just for kicks), a VLAN debug #define turned on in the net/
8021q/vlan.h file, and lots of extra debugging messages added to the
functions in vlan.c. My initial interpretation is that due to the
funny order in which "ifdown -a" takes down interfaces, it tries to
delete the VLAN interfaces before the bridges running atop them have
been taken down. Ordinarily this seems to work, but when the
underlying physical ethernet is down already, the last VLAN to be
deleted seems to hang somehow. The full results are as follows:

The lock dependency validator at startup passes all 218 testcases,
indicating that all the locking crap is probably working correctly
(those debug options chew up another meg of RAM).

ifup -a brings up the interfaces in this order (See previous email
for configuration details):
lo net0 wfi0 world0 lan lan:0 world

ifdown -a appears to bring them down in the same order (at least,
until it gets stuck).

Attached below is filtered debugging information. I cut out 90% of
the crap in the syslog, but there's still a lot left over to sift
through; sorry. If you want my .config or the full text of the log
then email me privately and I'll send it to you, as it's kinda big.

I appreciate any advice, thanks for all your help

Cheers,
Kyle Moffett

This first bit is the "ifup -a -v -i interfaces":
> ADDRCONF(NETDEV_UP): net0: link is not ready
> vlan_ioctl_handler: args.cmd: 6
> vlan_ioctl_handler: args.cmd: 0
> register_vlan_device: if_name -:net0:-^Ivid: 2
> About to allocate name, vlan_name_type: 3
> Allocated new name -:net0.2:-
> About to go find the group for idx: 2
> vlan_transfer_operstate: net0 state transition applies to net0.2 too:
> vlan_proc_add, device -:net0.2:- being added.
> Allocated new device successfully, returning.
> wfi0: add 33:33:00:00:00:01 mcast address to master interface
> wfi0: add 01:00:5e:00:00:01 mcast address to master interface
> ADDRCONF(NETDEV_UP): wfi0: link is not ready
> vlan_ioctl_handler: args.cmd: 6
> vlan_ioctl_handler: args.cmd: 0
> register_vlan_device: if_name -:net0:-^Ivid: 4094
> About to allocate name, vlan_name_type: 3
> Allocated new name -:net0.4094:-
> About to go find the group for idx: 2
> vlan_transfer_operstate: net0 state transition applies to net0.4094
> too:
> vlan_proc_add, device -:net0.4094:- being added.
> Allocated new device successfully, returning.
> world0: add 33:33:00:00:00:01 mcast address to master interface
> world0: add 01:00:5e:00:00:01 mcast address to master interface
> ADDRCONF(NETDEV_UP): world0: link is not ready
> tg3: net0: Link is up at 1000 Mbps, full duplex.
> tg3: net0: Flow control is on for TX and on for RX.
> ADDRCONF(NETDEV_CHANGE): net0: link becomes ready
> Propagating NETDEV_CHANGE for device net0...
> ... to wfi0
> vlan_transfer_operstate: net0 state transition applies to
> wfi0 too:
> ...found a carrier, applying to VLAN device
> ... to world0
> vlan_transfer_operstate: net0 state transition applies to
> world0 too:
> ...found a carrier, applying to VLAN device
> lan: port 1(net0) entering listening state
> ADDRCONF(NETDEV_CHANGE): wfi0: link becomes ready
> wfi0: dev_set_promiscuity(master, 1)
> wfi0: add 33:33:ff:5f:60:92 mcast address to master interface
> lan: port 2(wfi0) entering listening state
> ADDRCONF(NETDEV_CHANGE): world0: link becomes ready
> world0: add 33:33:ff:91:e2:4c mcast address to master interface
> lan: no IPv6 routers present
> world: no IPv6 routers present
> net0: no IPv6 routers present
> world0: no IPv6 routers present
> wfi0: no IPv6 routers present
> lan: port 1(net0) entering learning state
> lan: port 2(wfi0) entering learning state
> lan: topology change detected, propagating
> lan: port 1(net0) entering forwarding state
> lan: topology change detected, propagating
> lan: port 2(wfi0) entering forwarding state

This bit is for "ifdown -a -v -i interfaces":
> Propagating NETDEV_DOWN for device net0...
> ... to wfi0
> wfi0: del 33:33:ff:5f:60:92 mcast address from vlan interface
> wfi0: del 33:33:ff:5f:60:92 mcast address from master interface
> wfi0: del 01:00:5e:00:00:01 mcast address from vlan interface
> wfi0: del 01:00:5e:00:00:01 mcast address from master interface
> wfi0: del 33:33:00:00:00:01 mcast address from vlan interface
> wfi0: del 33:33:00:00:00:01 mcast address from master interface
> lan: port 2(wfi0) entering disabled state
> ... to world0
> world0: del 33:33:ff:91:e2:4c mcast address from vlan interface
> world0: del 33:33:ff:91:e2:4c mcast address from master interface
> world0: del 01:00:5e:00:00:01 mcast address from vlan interface
> world0: del 01:00:5e:00:00:01 mcast address from master interface
> world0: del 33:33:00:00:00:01 mcast address from vlan interface
> world0: del 33:33:00:00:00:01 mcast address from master interface
> lan: port 1(net0) entering disabled state
> vlan_ioctl_handler: args.cmd: 1
> unregister_vlan_device: request to unregister VLAN device wfi0
> ...found device wfi0 to remove
> ...oh goodie, it really is a vlan
> unregister_vlan_dev: real_dev=net0 vlan_id=2
> ...found a VLAN group on net0
> ...found VLAN2 device wfi0 (on net0)
> vlan_proc_rem_dev: dev: eed3a800
> lan: port 2(wfi0) entering disabled state
> vlan_ioctl_handler: args.cmd: 1
> unregister_vlan_device: request to unregister VLAN device world0
> ...found device world0 to remove
> ...oh goodie, it really is a vlan
> unregister_vlan_dev: real_dev=net0 vlan_id=4094
> ...found a VLAN group on net0
> ...found VLAN4094 device world0 (on net0)
> vlan_proc_rem_dev: dev: efd10000
> ...disabling VLAN RX function for net0

And then it hangs; ugh. I think the disable-the-RX function callback
is making things die horribly somehow, but I don't know how. At that
point these messages start every couple seconds, of course:
> unregister_netdevice: waiting for world0 to become free. Usage
> count = 2

I managed to dump a lot of info with /proc/sysrq-trigger before I
rebooted the box:

Locks held after vconfig hang:
> 1 lock held by racoon/2760:
> #0: (&u->readlock){--..}, at: [<c02bac6c>] mutex_lock+0x1c/0x1f
> 1 lock held by snmpd/3652:
> #0: (net_todo_run_mutex){--..}, at: [<c02bac6c>] mutex_lock+0x1c/0x1f
> 1 lock held by getty/4460:
> #0: (&tty->atomic_read_lock){--..}, at: [<c02ba86d>]
> mutex_lock_interruptible+0x1c/0x1f
> 1 lock held by getty/4462:
> #0: (&tty->atomic_read_lock){--..}, at: [<c02ba86d>]
> mutex_lock_interruptible+0x1c/0x1f
> 1 lock held by getty/4468:
> #0: (&tty->atomic_read_lock){--..}, at: [<c02ba86d>]
> mutex_lock_interruptible+0x1c/0x1f
> 1 lock held by getty/4469:
> #0: (&tty->atomic_read_lock){--..}, at: [<c02ba86d>]
> mutex_lock_interruptible+0x1c/0x1f
> 1 lock held by getty/4474:
> #0: (&tty->atomic_read_lock){--..}, at: [<c02ba86d>]
> mutex_lock_interruptible+0x1c/0x1f
> 2 locks held by bash/5276:
> #0: (sysrq_key_table_lock){....}, at: [<c02369bd>] __handle_sysrq
> +0x1a/0x109
> #1: (tasklist_lock){..-?}, at: [<c013cc9b>] debug_show_all_locks
> +0x3e/0x12c
> 1 lock held by ntpdate/14168:
> #0: (net_todo_run_mutex){--..}, at: [<c02bac6c>] mutex_lock+0x1c/0x1f
> 2 locks held by vconfig/14440:
> #0: (vlan_ioctl_mutex){--..}, at: [<c02bac6c>] mutex_lock+0x1c/0x1f
> #1: (net_todo_run_mutex){--..}, at: [<c02bac6c>] mutex_lock+0x1c/0x1f

Timers active after vconfig hang:
> Timer List Version: v0.3
> HRTIMER_MAX_CLOCK_BASES: 2
> now at 843386820778 nsecs
>
> cpu: 0
> clock 0:
> .index: 0
> .resolution: 1 nsecs
> .get_time: ktime_get_real
> .offset: 1178840350000000000 nsecs
> active timers:
> clock 1:
> .index: 1
> .resolution: 1 nsecs
> .get_time: ktime_get
> .offset: 0 nsecs
> active timers:
> #0: <f3ae1ef4>, tick_sched_timer, S:01
> # expires at 843388000000 nsecs [in 1179222 nsecs]
> #1: <f3ae1ef4>, hrtimer_wakeup, S:01
> # expires at 843615749775 nsecs [in 228928997 nsecs]
> #2: <f3ae1ef4>, hrtimer_wakeup, S:01
> # expires at 844260242939 nsecs [in 873422161 nsecs]
> #3: <f3ae1ef4>, hrtimer_wakeup, S:01
> # expires at 844526938106 nsecs [in 1140117328 nsecs]
> #4: <f3ae1ef4>, it_real_fn, S:01
> # expires at 857313402669 nsecs [in 13926581891 nsecs]
> #5: <f3ae1ef4>, it_real_fn, S:01
> # expires at 864373659163 nsecs [in 20986838385 nsecs]
> #6: <f3ae1ef4>, hrtimer_wakeup, S:01
> # expires at 868262819523 nsecs [in 24875998745 nsecs]
> #7: <f3ae1ef4>, it_real_fn, S:01
> # expires at 869177517060 nsecs [in 25790696282 nsecs]
> #8: <f3ae1ef4>, hrtimer_wakeup, S:01
> # expires at 887144741534 nsecs [in 43757920756 nsecs]
> #9: <f3ae1ef4>, it_real_fn, S:01
> # expires at 1237719810874 nsecs [in 394332990096 nsecs]
> #10: <f3ae1ef4>, it_real_fn, S:01
> # expires at 3649061431571 nsecs [in 2805674610793 nsecs]
> #11: <f3ae1ef4>, hrtimer_wakeup, S:01
> # expires at 3653786151926 nsecs [in 2810399331148 nsecs]
> #12: <f3ae1ef4>, hrtimer_wakeup, S:01
> # expires at 15650596970527 nsecs [in 14807210149749 nsecs]
> #13: <f3ae1ef4>, it_real_fn, S:01
> # expires at 28838055898524 nsecs [in 27994669077746 nsecs]
> .expires_next : 843388000000 nsecs
> .hres_active : 1
> .nr_events : 177960
> .nohz_mode : 2
> .idle_tick : 836132000000 nsecs
> .tick_stopped : 0
> .idle_jiffies : 134035
> .idle_calls : 252157
> .idle_sleeps : 40637
> .idle_entrytime : 843384013943 nsecs
> .idle_sleeptime : 291305642112 nsecs
> .last_jiffies : 135847
> .next_jiffies : 135848
> .idle_expires : 836144000000 nsecs
> jiffies: 135847
>
> cpu: 1
> clock 0:
> .index: 0
> .resolution: 1 nsecs
> .get_time: ktime_get_real
> .offset: 1178840350000000000 nsecs
> active timers:
> clock 1:
> .index: 1
> .resolution: 1 nsecs
> .get_time: ktime_get
> .offset: 0 nsecs
> active timers:
> #0: <f3ae1ef4>, tick_sched_timer, S:01
> # expires at 843408000000 nsecs [in 21179222 nsecs]
> #1: <f3ae1ef4>, it_real_fn, S:01
> # expires at 844040130585 nsecs [in 653309807 nsecs]
> #2: <f3ae1ef4>, hrtimer_wakeup, S:01
> # expires at 891590009741 nsecs [in 48203188963 nsecs]
> .expires_next : 843408000000 nsecs
> .hres_active : 1
> .nr_events : 109759
> .nohz_mode : 2
> .idle_tick : 843344000000 nsecs
> .tick_stopped : 1
> .idle_jiffies : 135836
> .idle_calls : 158231
> .idle_sleeps : 12809
> .idle_entrytime : 843340014823 nsecs
> .idle_sleeptime : 448828615174 nsecs
> .last_jiffies : 135836
> .next_jiffies : 135853
> .idle_expires : 843408000000 nsecs
> jiffies: 135847
>
>
> Tick Device: mode: 1
> Clock Event Device: hpet
> max_delta_ns: 2147483647
> min_delta_ns: 3352
> mult: 61496110
> shift: 32
> mode: 3
> next_event: 9223372036854775807 nsecs
> set_next_event: hpet_next_event
> set_mode: hpet_set_mode
> event_handler: tick_handle_oneshot_broadcast
> tick_broadcast_mask: 00000000
> tick_broadcast_oneshot_mask: 00000000
>
>
> Tick Device: mode: 1
> Clock Event Device: lapic
> max_delta_ns: 671115476
> min_delta_ns: 1200
> mult: 53684938
> shift: 32
> mode: 3
> next_event: 843388000000 nsecs
> set_next_event: lapic_next_event
> set_mode: lapic_timer_setup
> event_handler: hrtimer_interrupt
>
> Tick Device: mode: 1
> Clock Event Device: lapic
> max_delta_ns: 671115476
> min_delta_ns: 1200
> mult: 53684938
> shift: 32
> mode: 3
> next_event: 843408000000 nsecs
> set_next_event: lapic_next_event
> set_mode: lapic_timer_setup
> event_handler: hrtimer_interrupt

At that point, these tasks were blocked:
> free sibling
> task PC stack pid father child younger older
> snmpd D F3675E20 4860 3652 1 (NOTLB)
> f3675e34 00200082 00000002 f3675e20 f3675e1c 00000000
> f364e030 00000000
> 00000007 f364e030 e08a8ac0 000000ae 0000ef1d f364e13c
> c1d0d280 00000000
> 00018ef7 c013dce6 000000ff 00000000 00000000 f3675e50
> c038c320 00200246
> Call Trace:
> [<c02bab1f>] __mutex_lock_slowpath+0x167/0x298
> [<c02bac6c>] mutex_lock+0x1c/0x1f
> [<c025d993>] netdev_run_todo+0x13/0x204
> [<c02634f5>] rtnl_unlock+0x2f/0x31
> [<c025d7da>] dev_ioctl+0x3a5/0x46e
> [<c0252aad>] sock_ioctl+0x1be/0x1c9
> [<c0178e4e>] do_ioctl+0x22/0x67
> [<c01790e5>] vfs_ioctl+0x252/0x265
> [<c0179141>] sys_ioctl+0x49/0x64
> [<c0103ea0>] syscall_call+0x7/0xb
> =======================
> ntpdate D EDC0BCA4 6004 14168 1 (NOTLB)
> edc0bcb8 00200082 00000002 edc0bca4 edc0bca0 00000000
> ee177570 00000000
> 0000000a ee177570 fcb9946b 000000ac 00023add ee17767c
> c1d0d280 00000000
> 00018709 c013dce6 000000ff 00000000 00000000 edc0bcd4
> c038c320 00200246
> Call Trace:
> [<c02bab1f>] __mutex_lock_slowpath+0x167/0x298
> [<c02bac6c>] mutex_lock+0x1c/0x1f
> [<c025d993>] netdev_run_todo+0x13/0x204
> [<c02634bc>] rtnetlink_rcv+0x38/0x42
> [<c0271a14>] netlink_data_ready+0x15/0x56
> [<c0270aef>] netlink_sendskb+0x1f/0x37
> [<c02713df>] netlink_unicast+0x1a8/0x1c2
> [<c02719f3>] netlink_sendmsg+0x264/0x270
> [<c025325a>] sock_sendmsg+0xe7/0x104
> [<c02534f4>] sys_sendto+0xcc/0xec
> [<c025414e>] sys_socketcall+0x14e/0x222
> [<c0103ea0>] syscall_call+0x7/0xb
> =======================
> vconfig D EEF6BE24 6440 14440 14438 (NOTLB)
> eef6be38 00000086 00000002 eef6be24 eef6be20 00000000
> 00000000 c012bd4a
> 00000007 edcfca70 67a281c2 000000d3 00001336 edcfcb7c
> c1d0d280 00000000
> 00022873 c012bd4a 000000ff 00000000 00000000 eef6be48
> 000228b4 00022834
> Call Trace:
> [<c02ba23b>] schedule_timeout+0x70/0x8e
> [<c02ba26e>] schedule_timeout_uninterruptible+0x15/0x17
> [<c012bd65>] msleep+0x10/0x16
> [<c025da91>] netdev_run_todo+0x111/0x204
> [<c02634f5>] rtnl_unlock+0x2f/0x31
> [<f8b3494d>] vlan_ioctl_handler+0x60b/0x6d0 [8021q]
> [<c0252a40>] sock_ioctl+0x151/0x1c9
> [<c0178e4e>] do_ioctl+0x22/0x67
> [<c01790e5>] vfs_ioctl+0x252/0x265
> [<c0179141>] sys_ioctl+0x49/0x64
> [<c0103ea0>] syscall_call+0x7/0xb
> =======================

As the system continues to run more processes get stuck in the
netdev_run_todo function waiting for vconfig to release a lock, while
vconfig appears to be waiting in the kernel for its interfaces to
lose reference counts, which they can't do as long as vconfig holds
random locks.

2007-05-12 12:14:18

by Kyle Moffett

[permalink] [raw]
Subject: Re: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

On May 11, 2007, at 01:49:27, Kyle Moffett wrote:
> On May 10, 2007, at 00:34:11, Kyle Moffett wrote:
>> On May 10, 2007, at 00:25:54, Ben Greear wrote:
>>> Looks like a deadlock in the vlan code. Any chance you can run
>>> this test with lockdep enabled?
>>>
>>> You could also add a printk in vlan_device_event() to check which
>>> event it is hanging on, and the netdevice that is passed in.
>>
>> Ok, I'll try building a 2.6.21 kernel with lockdep and some
>> debugging printk()s in the vlan_device_event() function and get
>> back to you tomorrow. Thanks for the quick response!

[snip]

> ifup -a brings up the interfaces in this order (See previous email
> for configuration details):
> lo net0 wfi0 world0 lan lan:0 world
>
> ifdown -a appears to bring them down in the same order (at least,
> until it gets stuck).

Hmm, turns out that it always hung downing this entry in my
interfaces file, independent of ordering:

iface world0 inet manual
mac-address 8b:8d:cb:91:e2:4c
minimally-up yes
vlan-dev net0
vlan-id 4094

By commenting out the MAC address line it worked. Yes, I realize the
MAC address specified there is bogus, I managed to {think,type}o that
one somehow. I had been intending to specify a locally-allocated
virtual MAC address on world0 but instead I managed to somehow assign
one with the MAC multicast bit set (01:00:00:00:00:00)
If I change the above garbage MAC to 02:00:00:00:00:01 (first 02 is
the locally-administrated bit) then it seems to work perfectly fine,
My guess that the bridging code doesn't properly drop all references
to world0 when it has that garbage MAC address on it (since the
problem only shows up when both the invalid mac-address is present
*AND* I start the "world" bridge). I suppose this isn't really a big
problem, but it would be nice if things didn't leak refcounts on
invalid input.

Cheers,
Kyle Moffett