Wow. Nearly 400 lines of debug spew, from a simple 'ifup eth1'.
Dave
ADDRCONF(NETDEV_UP): eth1: link is not ready
eth1: New link status: Disconnected (0002)
======================================================
[ INFO: hard-safe -> hard-unsafe lock order detected ]
------------------------------------------------------
events/0/5 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(af_callback_keys + sk->sk_family){-.--}, at: [<ffffffff802136b1>] sock_def_readable+0x19/0x6f
and this task is already holding:
(&priv->lock){++..}, at: [<ffffffff8824f70e>] orinoco_send_wevents+0x28/0x8b [orinoco]
which would create a new lock dependency:
(&priv->lock){++..} -> (af_callback_keys + sk->sk_family){-.--}
but this new dependency connects a hard-irq-safe lock:
(&priv->lock){++..}
... which became hard-irq-safe at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff8824f7be>] orinoco_interrupt+0x4d/0xf49 [orinoco]
[<ffffffff8021151f>] handle_IRQ_event+0x2b/0x64
[<ffffffff802c0987>] __do_IRQ+0xae/0x114
[<ffffffff8026fca8>] do_IRQ+0xf7/0x107
[<ffffffff802609c4>] common_interrupt+0x64/0x65
to a hard-irq-unsafe lock:
(af_callback_keys + sk->sk_family){-.--}
... which became hard-irq-unsafe at:
... [<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267867>] _write_lock_bh+0x29/0x36
[<ffffffff80433960>] netlink_release+0x139/0x2ca
[<ffffffff80257903>] sock_release+0x19/0x9b
[<ffffffff80257b13>] sock_close+0x33/0x3a
[<ffffffff802130ee>] __fput+0xc6/0x1a8
[<ffffffff8022effe>] fput+0x13/0x16
[<ffffffff80225383>] filp_close+0x64/0x70
[<ffffffff8021eecc>] sys_close+0x93/0xb0
[<ffffffff8026048d>] system_call+0x7d/0x83
other info that might help us debug this:
1 lock held by events/0/5:
#0: (&priv->lock){++..}, at: [<ffffffff8824f70e>] orinoco_send_wevents+0x28/0x8b [orinoco]
the hard-irq-safe lock's dependencies:
-> (&priv->lock){++..} ops: 0 {
initial-use at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267a3e>] _spin_lock_irq+0x2a/0x38
[<ffffffff8824f102>] orinoco_init+0x934/0x966 [orinoco]
[<ffffffff8041e762>] register_netdevice+0xe6/0x375
[<ffffffff8041ea4b>] register_netdev+0x5a/0x69
[<ffffffff8826155f>] orinoco_cs_probe+0x3d7/0x475 [orinoco_cs]
[<ffffffff803daa02>] pcmcia_device_probe+0x7f/0x124
[<ffffffff803b5e74>] driver_probe_device+0x5b/0xb1
[<ffffffff803b5fde>] __driver_attach+0x88/0xdb
[<ffffffff803b5826>] bus_for_each_dev+0x48/0x7a
[<ffffffff803b5d9e>] driver_attach+0x1b/0x1e
[<ffffffff803b543e>] bus_add_driver+0x88/0x138
[<ffffffff803b6289>] driver_register+0x8e/0x93
[<ffffffff803da89b>] pcmcia_register_driver+0xd0/0xda
[<ffffffff880a9024>] 0xffffffff880a9024
[<ffffffff802af420>] sys_init_module+0x16f2/0x18b7
[<ffffffff8026048d>] system_call+0x7d/0x83
in-hardirq-W at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff8824f7be>] orinoco_interrupt+0x4d/0xf49 [orinoco]
[<ffffffff8021151f>] handle_IRQ_event+0x2b/0x64
[<ffffffff802c0987>] __do_IRQ+0xae/0x114
[<ffffffff8026fca8>] do_IRQ+0xf7/0x107
[<ffffffff802609c4>] common_interrupt+0x64/0x65
in-softirq-W at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff8824f7be>] orinoco_interrupt+0x4d/0xf49 [orinoco]
[<ffffffff8021151f>] handle_IRQ_event+0x2b/0x64
[<ffffffff802c0987>] __do_IRQ+0xae/0x114
[<ffffffff8026fca8>] do_IRQ+0xf7/0x107
[<ffffffff802609c4>] common_interrupt+0x64/0x65
[<ffffffff8028ebce>] scheduler_tick+0xc1/0x362
[<ffffffff80261739>] call_softirq+0x1d/0x28
[<ffffffff80295edb>] irq_exit+0x56/0x59
[<ffffffff8027a67f>] smp_apic_timer_interrupt+0x5c/0x62
[<ffffffff802610ad>] apic_timer_interrupt+0x69/0x70
}
... key at: [<ffffffff8825fd80>] __key.22351+0x0/0xffffffffffff27fa [orinoco]
-> (&cwq->lock){++..} ops: 0 {
initial-use at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff802a0314>] __queue_work+0x17/0x5e
[<ffffffff802a03de>] queue_work+0x4d/0x57
[<ffffffff8029fdda>] call_usermodehelper_keys+0x119/0x137
[<ffffffff8025af79>] kobject_uevent+0x3e5/0x42e
[<ffffffff803b6ebf>] class_device_add+0x314/0x471
[<ffffffff803b7034>] class_device_register+0x18/0x1d
[<ffffffff803b7130>] class_device_create+0xf7/0x129
[<ffffffff8097f2ed>] vtconsole_class_init+0x74/0xbb
[<ffffffff8026d7fc>] init+0x1fc/0x3cd
[<ffffffff802613dd>] child_rip+0x7/0x12
in-hardirq-W at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff802a0314>] __queue_work+0x17/0x5e
[<ffffffff802a03de>] queue_work+0x4d/0x57
[<ffffffff8033c786>] kblockd_schedule_work+0x15/0x18
[<ffffffff8034493b>] __cfq_slice_expired+0x63/0xe6
[<ffffffff80253352>] cfq_completed_request+0x116/0x154
[<ffffffff8033bb82>] elv_completed_request+0x38/0x85
[<ffffffff8033cca7>] __blk_put_request+0x35/0x9f
[<ffffffff8033cdfb>] end_that_request_last+0xea/0xf4
[<ffffffff8020b10a>] ide_end_request+0xf2/0x111
[<ffffffff8023f4a7>] ide_dma_intr+0x70/0xb5
[<ffffffff8020dcd6>] ide_intr+0x169/0x1df
[<ffffffff8021151f>] handle_IRQ_event+0x2b/0x64
[<ffffffff802c0987>] __do_IRQ+0xae/0x114
[<ffffffff8026fca8>] do_IRQ+0xf7/0x107
[<ffffffff802609c4>] common_interrupt+0x64/0x65
in-softirq-W at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff802a0314>] __queue_work+0x17/0x5e
[<ffffffff802a03de>] queue_work+0x4d/0x57
[<ffffffff802a03fd>] schedule_work+0x15/0x18
[<ffffffff803639bb>] cursor_timer_handler+0x1b/0x38
[<ffffffff8029a391>] run_timer_softirq+0x14b/0x1d5
[<ffffffff80212a1f>] __do_softirq+0x67/0xf5
[<ffffffff80261739>] call_softirq+0x1d/0x28
[<ffffffff80295edb>] irq_exit+0x56/0x59
[<ffffffff8027a67f>] smp_apic_timer_interrupt+0x5c/0x62
[<ffffffff802610ad>] apic_timer_interrupt+0x69/0x70
}
... key at: [<ffffffff806c47a0>] __key.10352+0x0/0x8
-> (&q->lock){++..} ops: 0 {
initial-use at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267a3e>] _spin_lock_irq+0x2a/0x38
[<ffffffff80265123>] wait_for_completion+0x2f/0xb3
[<ffffffff802a34d4>] keventd_create_kthread+0x35/0x6a
[<ffffffff802a35d3>] kthread_create+0xca/0x153
[<ffffffff8028e085>] migration_call+0x60/0x44f
[<ffffffff80975115>] migration_init+0x27/0x4f
[<ffffffff8026d669>] init+0x69/0x3cd
[<ffffffff802613dd>] child_rip+0x7/0x12
in-hardirq-W at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff80230689>] __wake_up+0x21/0x50
[<ffffffff8038719b>] acpi_ec_gpe_handler+0x96/0xdb
[<ffffffff803734f2>] acpi_ev_gpe_dispatch+0x6e/0x160
[<ffffffff80373876>] acpi_ev_gpe_detect+0xae/0xff
[<ffffffff80371cf0>] acpi_ev_sci_xrupt_handler+0x19/0x22
[<ffffffff8036c543>] acpi_irq+0x10/0x1b
[<ffffffff8021151f>] handle_IRQ_event+0x2b/0x64
[<ffffffff802c0987>] __do_IRQ+0xae/0x114
[<ffffffff8026fca8>] do_IRQ+0xf7/0x107
[<ffffffff802609c4>] common_interrupt+0x64/0x65
in-softirq-W at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff8028d434>] complete+0x1b/0x4c
[<ffffffff802a12dc>] wakeme_after_rcu+0xc/0xf
[<ffffffff802a1531>] __rcu_process_callbacks+0x154/0x1d9
[<ffffffff802a15d8>] rcu_process_callbacks+0x22/0x44
[<ffffffff80296014>] tasklet_action+0x6c/0xc5
[<ffffffff80212a1f>] __do_softirq+0x67/0xf5
[<ffffffff80261739>] call_softirq+0x1d/0x28
[<ffffffff80295edb>] irq_exit+0x56/0x59
[<ffffffff8027a67f>] smp_apic_timer_interrupt+0x5c/0x62
[<ffffffff802610ad>] apic_timer_interrupt+0x69/0x70
}
... key at: [<ffffffff806c4dd8>] __key.13972+0x0/0x8
-> (&rq->rq_lock_key){++..} ops: 0 {
initial-use at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff8028dd54>] init_idle+0x98/0xc7
[<ffffffff8097531d>] sched_init+0x1b8/0x1be
[<ffffffff809646e8>] start_kernel+0x7a/0x24c
[<ffffffff8096428a>] _sinittext+0x28a/0x292
[<ffffffffffffffff>] 0xffffffffffffffff
in-hardirq-W at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff802677ca>] _spin_lock+0x24/0x31
[<ffffffff8028eb81>] scheduler_tick+0x74/0x362
[<ffffffff8029abe0>] update_process_times+0x67/0x79
[<ffffffff80279f02>] smp_local_timer_interrupt+0x2a/0x50
[<ffffffff80271526>] main_timer_handler+0x202/0x3a5
[<ffffffff802716dd>] timer_interrupt+0x14/0x2a
[<ffffffff8021151f>] handle_IRQ_event+0x2b/0x64
[<ffffffff802c0987>] __do_IRQ+0xae/0x114
[<ffffffff8026fca8>] do_IRQ+0xf7/0x107
[<ffffffff802609c4>] common_interrupt+0x64/0x65
in-softirq-W at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff802677ca>] _spin_lock+0x24/0x31
[<ffffffff8028de0c>] task_rq_lock+0x41/0x74
[<ffffffff802489a3>] try_to_wake_up+0x26/0x418
[<ffffffff8028e022>] wake_up_process+0xf/0x12
[<ffffffff8029a593>] process_timeout+0x8/0xb
[<ffffffff8029a391>] run_timer_softirq+0x14b/0x1d5
[<ffffffff80212a1f>] __do_softirq+0x67/0xf5
[<ffffffff80261739>] call_softirq+0x1d/0x28
[<ffffffff80295edb>] irq_exit+0x56/0x59
[<ffffffff8027a67f>] smp_apic_timer_interrupt+0x5c/0x62
[<ffffffff802610ad>] apic_timer_interrupt+0x69/0x70
}
... key at: [<ffff810002618700>] 0xffff810002618700
... acquired at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff802677ca>] _spin_lock+0x24/0x31
[<ffffffff8028de0c>] task_rq_lock+0x41/0x74
[<ffffffff802489a3>] try_to_wake_up+0x26/0x418
[<ffffffff8028e010>] default_wake_function+0xc/0xf
[<ffffffff8028c310>] __wake_up_common+0x3d/0x68
[<ffffffff8028d450>] complete+0x37/0x4c
[<ffffffff80235411>] kthread+0xda/0x136
[<ffffffff802613dd>] child_rip+0x7/0x12
... acquired at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff80230689>] __wake_up+0x21/0x50
[<ffffffff802a0347>] __queue_work+0x4a/0x5e
[<ffffffff802a03de>] queue_work+0x4d/0x57
[<ffffffff8029fdda>] call_usermodehelper_keys+0x119/0x137
[<ffffffff8025af79>] kobject_uevent+0x3e5/0x42e
[<ffffffff803b6ebf>] class_device_add+0x314/0x471
[<ffffffff803b7034>] class_device_register+0x18/0x1d
[<ffffffff803b7130>] class_device_create+0xf7/0x129
[<ffffffff8097f2ed>] vtconsole_class_init+0x74/0xbb
[<ffffffff8026d7fc>] init+0x1fc/0x3cd
[<ffffffff802613dd>] child_rip+0x7/0x12
... acquired at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff802a0314>] __queue_work+0x17/0x5e
[<ffffffff802a03de>] queue_work+0x4d/0x57
[<ffffffff802a03fd>] schedule_work+0x15/0x18
[<ffffffff8824fc31>] orinoco_interrupt+0x4c0/0xf49 [orinoco]
[<ffffffff8021151f>] handle_IRQ_event+0x2b/0x64
[<ffffffff802c0987>] __do_IRQ+0xae/0x114
[<ffffffff8026fca8>] do_IRQ+0xf7/0x107
[<ffffffff802609c4>] common_interrupt+0x64/0x65
-> (&list->lock){....} ops: 0 {
initial-use at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff80258024>] skb_queue_tail+0x1e/0x49
[<ffffffff80259ac6>] netlink_broadcast+0x211/0x2e2
[<ffffffff8025af3f>] kobject_uevent+0x3ab/0x42e
[<ffffffff803b6ebf>] class_device_add+0x314/0x471
[<ffffffff803b7034>] class_device_register+0x18/0x1d
[<ffffffff803b7130>] class_device_create+0xf7/0x129
[<ffffffff803ff248>] evdev_connect+0xfc/0x121
[<ffffffff803fd73a>] input_register_device+0x1e8/0x26d
[<ffffffff80400ac5>] atkbd_connect+0x23d/0x26d
[<ffffffff803f8861>] serio_connect_driver+0x2c/0x41
[<ffffffff803f8890>] serio_driver_probe+0x1a/0x1d
[<ffffffff803b5e74>] driver_probe_device+0x5b/0xb1
[<ffffffff803b5fde>] __driver_attach+0x88/0xdb
[<ffffffff803b5826>] bus_for_each_dev+0x48/0x7a
[<ffffffff803b5d9e>] driver_attach+0x1b/0x1e
[<ffffffff803b543e>] bus_add_driver+0x88/0x138
[<ffffffff803b6289>] driver_register+0x8e/0x93
[<ffffffff803f942c>] serio_thread+0x14c/0x2a9
[<ffffffff80235436>] kthread+0xff/0x136
[<ffffffff802613dd>] child_rip+0x7/0x12
}
... key at: [<ffffffff80919fb0>] __key.17572+0x0/0x8
... acquired at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267ba2>] _spin_lock_irqsave+0x2b/0x3c
[<ffffffff80258024>] skb_queue_tail+0x1e/0x49
[<ffffffff80259ac6>] netlink_broadcast+0x211/0x2e2
[<ffffffff804287ea>] wireless_send_event+0x2ff/0x317
[<ffffffff8824f731>] orinoco_send_wevents+0x4b/0x8b [orinoco]
[<ffffffff8024f99b>] run_workqueue+0xa7/0xfb
[<ffffffff8024c17f>] worker_thread+0xee/0x122
[<ffffffff80235436>] kthread+0xff/0x136
[<ffffffff802613dd>] child_rip+0x7/0x12
the hard-irq-unsafe lock's dependencies:
-> (af_callback_keys + sk->sk_family){-.--} ops: 0 {
initial-use at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267947>] _read_lock+0x27/0x34
[<ffffffff802136b0>] sock_def_readable+0x18/0x6f
[<ffffffff80259ad6>] netlink_broadcast+0x221/0x2e2
[<ffffffff8025af3f>] kobject_uevent+0x3ab/0x42e
[<ffffffff803b6ebf>] class_device_add+0x314/0x471
[<ffffffff803b7034>] class_device_register+0x18/0x1d
[<ffffffff803b7130>] class_device_create+0xf7/0x129
[<ffffffff803ff248>] evdev_connect+0xfc/0x121
[<ffffffff803fd73a>] input_register_device+0x1e8/0x26d
[<ffffffff80400ac5>] atkbd_connect+0x23d/0x26d
[<ffffffff803f8861>] serio_connect_driver+0x2c/0x41
[<ffffffff803f8890>] serio_driver_probe+0x1a/0x1d
[<ffffffff803b5e74>] driver_probe_device+0x5b/0xb1
[<ffffffff803b5fde>] __driver_attach+0x88/0xdb
[<ffffffff803b5826>] bus_for_each_dev+0x48/0x7a
[<ffffffff803b5d9e>] driver_attach+0x1b/0x1e
[<ffffffff803b543e>] bus_add_driver+0x88/0x138
[<ffffffff803b6289>] driver_register+0x8e/0x93
[<ffffffff803f942c>] serio_thread+0x14c/0x2a9
[<ffffffff80235436>] kthread+0xff/0x136
[<ffffffff802613dd>] child_rip+0x7/0x12
hardirq-on-W at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267867>] _write_lock_bh+0x29/0x36
[<ffffffff80433960>] netlink_release+0x139/0x2ca
[<ffffffff80257903>] sock_release+0x19/0x9b
[<ffffffff80257b13>] sock_close+0x33/0x3a
[<ffffffff802130ee>] __fput+0xc6/0x1a8
[<ffffffff8022effe>] fput+0x13/0x16
[<ffffffff80225383>] filp_close+0x64/0x70
[<ffffffff8021eecc>] sys_close+0x93/0xb0
[<ffffffff8026048d>] system_call+0x7d/0x83
softirq-on-R at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267947>] _read_lock+0x27/0x34
[<ffffffff802136b0>] sock_def_readable+0x18/0x6f
[<ffffffff80259ad6>] netlink_broadcast+0x221/0x2e2
[<ffffffff8025af3f>] kobject_uevent+0x3ab/0x42e
[<ffffffff803b6ebf>] class_device_add+0x314/0x471
[<ffffffff803b7034>] class_device_register+0x18/0x1d
[<ffffffff803b7130>] class_device_create+0xf7/0x129
[<ffffffff803ff248>] evdev_connect+0xfc/0x121
[<ffffffff803fd73a>] input_register_device+0x1e8/0x26d
[<ffffffff80400ac5>] atkbd_connect+0x23d/0x26d
[<ffffffff803f8861>] serio_connect_driver+0x2c/0x41
[<ffffffff803f8890>] serio_driver_probe+0x1a/0x1d
[<ffffffff803b5e74>] driver_probe_device+0x5b/0xb1
[<ffffffff803b5fde>] __driver_attach+0x88/0xdb
[<ffffffff803b5826>] bus_for_each_dev+0x48/0x7a
[<ffffffff803b5d9e>] driver_attach+0x1b/0x1e
[<ffffffff803b543e>] bus_add_driver+0x88/0x138
[<ffffffff803b6289>] driver_register+0x8e/0x93
[<ffffffff803f942c>] serio_thread+0x14c/0x2a9
[<ffffffff80235436>] kthread+0xff/0x136
[<ffffffff802613dd>] child_rip+0x7/0x12
hardirq-on-R at:
[<ffffffff802a8e62>] lock_acquire+0x4a/0x69
[<ffffffff80267947>] _read_lock+0x27/0x34
[<ffffffff802136b0>] sock_def_readable+0x18/0x6f
[<ffffffff80259ad6>] netlink_broadcast+0x221/0x2e2
[<ffffffff8025af3f>] kobject_uevent+0x3ab/0x42e
[<ffffffff803b6ebf>] class_device_add+0x314/0x471
[<ffffffff803b7034>] class_device_register+0x18/0x1d
[<ffffffff803b7130>] class_device_create+0xf7/0x129
[<ffffffff803ff248>] evdev_connect+0xfc/0x121
[<ffffffff803fd73a>] input_register_device+0x1e8/0x26d
[<ffffffff80400ac5>] atkbd_connect+0x23d/0x26d
[<ffffffff803f8861>] serio_connect_driver+0x2c/0x41
[<ffffffff803f8890>] serio_driver_probe+0x1a/0x1d
[<ffffffff803b5e74>] driver_probe_device+0x5b/0xb1
[<ffffffff803b5fde>] __driver_attach+0x88/0xdb
[<ffffffff803b5826>] bus_for_each_dev+0x48/0x7a
[<ffffffff803b5d9e>] driver_attach+0x1b/0x1e
[<ffffffff803b543e>] bus_add_driver+0x88/0x138
[<ffffffff803b6289>] driver_register+0x8e/0x93
[<ffffffff803f942c>] serio_thread+0x14c/0x2a9
[<ffffffff80235436>] kthread+0xff/0x136
[<ffffffff802613dd>] child_rip+0x7/0x12
}
... key at: [<ffffffff8091a280>] af_callback_keys+0x80/0x100
stack backtrace:
Call Trace:
[<ffffffff8026e7fd>] show_trace+0xae/0x30e
[<ffffffff8026ea72>] dump_stack+0x15/0x17
[<ffffffff802a7dc1>] check_usage+0x27d/0x28e
[<ffffffff802a86e6>] __lock_acquire+0x878/0xa54
[<ffffffff802a8e63>] lock_acquire+0x4b/0x69
[<ffffffff80267948>] _read_lock+0x28/0x34
[<ffffffff802136b1>] sock_def_readable+0x19/0x6f
[<ffffffff80259ad7>] netlink_broadcast+0x222/0x2e2
[<ffffffff804287eb>] wireless_send_event+0x300/0x317
[<ffffffff8824f732>] :orinoco:orinoco_send_wevents+0x4c/0x8b
[<ffffffff8024f99c>] run_workqueue+0xa8/0xfb
[<ffffffff8024c180>] worker_thread+0xef/0x122
[<ffffffff80235437>] kthread+0x100/0x136
[<ffffffff802613de>] child_rip+0x8/0x12
DWARF2 unwinder stuck at child_rip+0x8/0x12
Leftover inexact backtrace:
[<ffffffff80267ab2>] _spin_unlock_irq+0x2b/0x31
[<ffffffff80260a1b>] restore_args+0x0/0x30
[<ffffffff80235337>] kthread+0x0/0x136
[<ffffffff802613d6>] child_rip+0x0/0x12
eth1: New link status: Connected (0001)
ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
--
http://www.codemonkey.org.uk
On Wed, 2006-08-02 at 17:59 -0400, Dave Jones wrote:
> Wow. Nearly 400 lines of debug spew, from a simple 'ifup eth1'.
>
> Dave
>
>
> ADDRCONF(NETDEV_UP): eth1: link is not ready
> eth1: New link status: Disconnected (0002)
>
> ======================================================
> [ INFO: hard-safe -> hard-unsafe lock order detected ]
> ------------------------------------------------------
> events/0/5 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> (af_callback_keys + sk->sk_family){-.--}, at: [<ffffffff802136b1>] sock_def_readable+0x19/0x6f
>
> and this task is already holding:
> (&priv->lock){++..}, at: [<ffffffff8824f70e>] orinoco_send_wevents+0x28/0x8b [orinoco]
> which would create a new lock dependency:
> (&priv->lock){++..} -> (af_callback_keys + sk->sk_family){-.--}
> [<ffffffff80267948>] _read_lock+0x28/0x34
> [<ffffffff802136b1>] sock_def_readable+0x19/0x6f
> [<ffffffff80259ad7>] netlink_broadcast+0x222/0x2e2
> [<ffffffff804287eb>] wireless_send_event+0x300/0x317
> [<ffffffff8824f732>] :orinoco:orinoco_send_wevents+0x4c/0x8b
> [<ffffffff8024f99c>] run_workqueue+0xa8/0xfb
> [<ffffffff8024c180>] worker_thread+0xef/0x122
> [<ffffffff80235437>] kthread+0x100/0x136
> [<ffffffff802613de>] child_rip+0x8/0x12
this is another one of those nasty buggers;
Lock A = the sk->sk_callback_lock
Lock B = priv->lock in the driver
Lock A is only BH safe
Lock B is hardirq safe and used in the hardirq
Cpu 0                                   cpu 1
user closes the netlink socket
                                        takes lock B in orinoco_send_wevents
takes lock A in user context in
netlink_release() (for write)
interrupt happens
takes lock B in hardirq handler (spins)
                                        calls netlink_broadcast
                                        which takes lock A for read (spins)
and you have a nice classical AB-BA deadlock
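
The rule that lockdep applied in the report can be sketched in a few lines of userspace C. This is only a toy model under stated assumptions, not the kernel's lockdep code: `struct lock_class_model`, `note_acquire()`, `dependency_is_unsafe()` and `replay_orinoco_report()` are hypothetical names made up for illustration. The model records whether a lock class was ever taken inside a hardirq handler (hard-irq-safe) and whether it was ever taken with hardirqs enabled (hard-irq-unsafe), then flags an attempt to take an unsafe lock while holding a safe one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a lock class; every name in this sketch is hypothetical. */
struct lock_class_model {
	const char *name;
	bool used_in_hardirq;    /* ever taken inside a hardirq handler: "hard-irq-safe" */
	bool taken_with_irqs_on; /* ever taken with hardirqs enabled: "hard-irq-unsafe" */
};

/* Record one acquisition and the context it happened in. */
void note_acquire(struct lock_class_model *lock,
		  bool in_hardirq, bool hardirqs_enabled)
{
	if (in_hardirq)
		lock->used_in_hardirq = true;
	if (hardirqs_enabled)
		lock->taken_with_irqs_on = true;
}

/*
 * Taking `next` while holding `held` is flagged when `held` is also taken
 * from hardirq context and `next` can be held with hardirqs enabled: an
 * interrupt arriving on the CPU that holds `next` may then spin on `held`.
 */
bool dependency_is_unsafe(const struct lock_class_model *held,
			  const struct lock_class_model *next)
{
	return held->used_in_hardirq && next->taken_with_irqs_on;
}

/* Replay the acquisitions named in the report above. */
bool replay_orinoco_report(void)
{
	struct lock_class_model priv_lock = { "&priv->lock", false, false };
	struct lock_class_model callback_lock = { "sk_callback_lock", false, false };

	/* orinoco_interrupt() takes priv->lock inside the hardirq handler. */
	note_acquire(&priv_lock, true, false);
	/* netlink_release() uses write_lock_bh(): softirqs off, hardirqs still on. */
	note_acquire(&callback_lock, false, true);

	/* orinoco_send_wevents() holds priv->lock and reaches sock_def_readable(). */
	return dependency_is_unsafe(&priv_lock, &callback_lock);
}
```

In this model the B -> A dependency (lock B held, lock A wanted) is flagged, while the reverse order is not, which matches the "hard-safe -> hard-unsafe" wording of the report.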
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Arjan van de Ven <[email protected]> wrote:
>
> this is another one of those nasty buggers;
Good catch. It's really time that we fix this properly rather than
adding more kludges to the core code.
Dave, once this goes in you can revert the previous netlink workaround
that added the _bh suffix.
[WIRELESS]: Send wireless netlink events with a clean slate
Drivers expect to be able to call wireless_send_event in arbitrary
contexts. On the other hand, netlink really doesn't like being
invoked in an IRQ context. So we need to postpone the sending of
netlink skb's to a tasklet.
Signed-off-by: Herbert Xu <[email protected]>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/core/wireless.c b/net/core/wireless.c
index d2bc72d..de0bde4 100644
--- a/net/core/wireless.c
+++ b/net/core/wireless.c
@@ -82,6 +82,7 @@ #include <linux/seq_file.h>
#include <linux/init.h> /* for __init */
#include <linux/if_arp.h> /* ARPHRD_ETHER */
#include <linux/etherdevice.h> /* compare_ether_addr */
+#include <linux/interrupt.h>
#include <linux/wireless.h> /* Pretty obvious */
#include <net/iw_handler.h> /* New driver API */
@@ -1842,6 +1843,18 @@ #endif /* CONFIG_NET_WIRELESS_RTNETLINK */
#ifdef WE_EVENT_RTNETLINK
+static struct sk_buff_head wireless_nlevent_queue;
+
+static void wireless_nlevent_process(unsigned long data)
+{
+	struct sk_buff *skb;
+
+	while ((skb = skb_dequeue(&wireless_nlevent_queue)))
+		netlink_broadcast(rtnl, skb, 0, RTNLGRP_LINK, GFP_ATOMIC);
+}
+
+static DECLARE_TASKLET(wireless_nlevent_tasklet, wireless_nlevent_process, 0);
+
/* ---------------------------------------------------------------- */
/*
* Fill a rtnetlink message with our event data.
@@ -1904,8 +1917,17 @@ static inline void rtmsg_iwinfo(struct n
 		return;
 	}
 	NETLINK_CB(skb).dst_group = RTNLGRP_LINK;
-	netlink_broadcast(rtnl, skb, 0, RTNLGRP_LINK, GFP_ATOMIC);
+	skb_queue_tail(&wireless_nlevent_queue, skb);
+	tasklet_schedule(&wireless_nlevent_tasklet);
+}
+
+static int __init wireless_nlevent_init(void)
+{
+	skb_queue_head_init(&wireless_nlevent_queue);
+	return 0;
}
+
+subsys_initcall(wireless_nlevent_init);
#endif /* WE_EVENT_RTNETLINK */
/* ---------------------------------------------------------------- */
On Thu, Aug 03, 2006 at 11:54:41PM +1000, Herbert Xu wrote:
> Arjan van de Ven <[email protected]> wrote:
> >
> > this is another one of those nasty buggers;
>
> Good catch. It's really time that we fix this properly rather than
> adding more kludges to the core code.
>
> Dave, once this goes in you can revert the previous netlink workaround
> that added the _bh suffix.
>
> [WIRELESS]: Send wireless netlink events with a clean slate
Could we please just get rid of the wireless extensions over netlink code
again? It doesn't help to solve anything and just creates a bigger mess
to untangle when switching to a fully fledged wireless stack.
Herbert Xu wrote:
Hi,
> Arjan van de Ven <[email protected]> wrote:
>> this is another one of those nasty buggers;
>
> Good catch. It's really time that we fix this properly rather than
> adding more kludges to the core code.
however I'm not quite yet convinced that this patch is going to solve
this particular deadlock.
(I agree with the principle of it and I think it's really needed,
I just don't yet see how it's going to solve this specific deadlock. But
then again it's early and I've not had sufficient coffee yet so I could
well be wrong)
> [WIRELESS]: Send wireless netlink events with a clean slate
>
> Drivers expect to be able to call wireless_send_event in arbitrary
> contexts. On the other hand, netlink really doesn't like being
> invoked in an IRQ context. So we need to postpone the sending of
> netlink skb's to a tasklet.
it's not just about irq context, it's about being called with any lock that's
used in IRQ context; that is what makes this double nasty...
Greetings,
Arjan van de Ven
On Thu, Aug 03, 2006 at 11:54:41PM +1000, Herbert Xu wrote:
> Arjan van de Ven <[email protected]> wrote:
> >
> > this is another one of those nasty buggers;
>
> Good catch. It's really time that we fix this properly rather than
> adding more kludges to the core code.
>
> Dave, once this goes in you can revert the previous netlink workaround
> that added the _bh suffix.
>
> [WIRELESS]: Send wireless netlink events with a clean slate
>
> Drivers expect to be able to call wireless_send_event in arbitrary
> contexts. On the other hand, netlink really doesn't like being
> invoked in an IRQ context. So we need to postpone the sending of
> netlink skb's to a tasklet.
Yes, this was needed. I really like the way you implemented
it, simple and efficient. Go for it !
> Signed-off-by: Herbert Xu <[email protected]>
For what it's worth :
Signed-off-by: Jean Tourrilhes <[email protected]>
> Cheers,
Thanks !
Jean
On Thu, Aug 03, 2006 at 03:11:53PM +0100, Christoph Hellwig wrote:
> Could we please just get rid of the wireless extensions over netlink code
> again? It doesn't help to solve anything and just creates a bigger mess
> to untangle when switching to a fully fledged wireless stack.
That's not going to happen any time soon; NetworkManager
depends on Wireless Events, as do many other apps. And there are
not many mechanisms you can use in the kernel to generate events from
a driver to userspace.
Have fun...
Jean
On Thu, Aug 03, 2006 at 11:58:00AM -0700, Jean Tourrilhes wrote:
> That's not going to happen any time soon, NetworkManager
> depends on Wireless Events, as well as many other apps. And there is
> not many mechanisms you can use in the kernel to generate events from
> driver to userspace.
It seemed to cope pretty well before we had this ?
Dave
On Thu, Aug 03, 2006 at 03:11:53PM +0100, Christoph Hellwig wrote:
> Could we please just get rid of the wireless extensions over netlink code
> again? It doesn't help to solve anything and just creates a bigger mess
> to untangle when switching to a fully fledged wireless stack.
If we're going to do that, now is probably the best time to do it,
before any distro userland starts using it.
Dave
On Thu, Aug 03, 2006 at 02:59:58PM -0400, Dave Jones wrote:
> It seemed to cope pretty well before we had this ?
Wireless Events were introduced in kernel 2.4.20 and 2.5.7,
which means 2002. NetworkManager and WPA Supplicant were based from
the very start on the availability of Wireless Events.
You are confusing different things...
> Dave
Have fun...
Jean
P.S. : By the way, don't ask me why it took four years for this bug to
get discovered...
Jean Tourrilhes wrote:
> Jean
>
> P.S. : By the way, don't ask me why it took four years for this bug to
> get discovered...
That I could answer: only from 2.6.18-rc1 onwards does the kernel have a built-in deadlock finder :)
On Thu, Aug 03, 2006 at 11:54:41PM +1000, Herbert Xu wrote:
> Arjan van de Ven <[email protected]> wrote:
> >
> > this is another one of those nasty buggers;
>
> Good catch. It's really time that we fix this properly rather than
> adding more kludges to the core code.
>
> Dave, once this goes in you can revert the previous netlink workaround
> that added the _bh suffix.
>
> [WIRELESS]: Send wireless netlink events with a clean slate
>
> Drivers expect to be able to call wireless_send_event in arbitrary
> contexts. On the other hand, netlink really doesn't like being
> invoked in an IRQ context. So we need to postpone the sending of
> netlink skb's to a tasklet.
>
> Signed-off-by: Herbert Xu <[email protected]>
Does anyone have any objection to Herbert's patch? It seems
appropriate to me.
Arjan, did you convince yourself whether or not this patch actually
resolves the problem at hand? Applying it makes sense to me either
way, but it would be nice to believe it fixed a known issue. :-)
John
--
John W. Linville
[email protected]
On Thu, Aug 03, 2006 at 03:53:13PM -0400, John W. Linville wrote:
>
> Does anyone have any objection to Herbert's patch? It seems
> appropriate to me.
I have no objections!
:)
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Thu, Aug 03, 2006 at 08:22:58AM -0700, Arjan van de Ven wrote:
>
> however I'm not quite yet convinced that this patch is going to solve
> this particular deadlock.
> (I agree with the principle of it and I think it's really needed,
> I just don't yet see how it's going to solve this specific deadlock. But
> then again it's early and I've not had sufficient coffee yet so I could
> well be wrong)
Well, it solves the deadlock by breaking the chain that links the
netlink system with the jungle of wireless locking :)
The spin lock in sk_buff_head acts as a mediator. We only feed the
skb to the netlink system once that spin lock has been dropped.
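
The queue-then-drain idea can be modelled in userspace C. What follows is a hedged, single-threaded sketch, not the kernel code: a plain flag stands in for the driver's `priv->lock`, a small ring buffer stands in for the `sk_buff_head` (whose own spin lock is omitted because nothing here runs concurrently), and `deliver_event()`, `queue_event()`, `drain_events()` and `driver_send_event()` are hypothetical names standing in for `netlink_broadcast()`, the patched `rtmsg_iwinfo()`, the tasklet, and a driver such as orinoco.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 16

/* Hypothetical stand-ins for kernel state; see the lead-in for the mapping. */
static bool driver_lock_held;            /* models priv->lock being held */
static int pending[MAX_EVENTS];          /* models the queued skbs */
static size_t pending_head, pending_tail;
static int delivered_count;
static bool delivered_under_driver_lock; /* set if delivery ever saw the lock held */

/* Models netlink_broadcast(): notes whether the caller held the driver lock. */
void deliver_event(int event)
{
	(void)event;
	if (driver_lock_held)
		delivered_under_driver_lock = true;
	delivered_count++;
}

/* Models the patched send path: only queue, never deliver directly. */
bool queue_event(int event)
{
	if (pending_tail - pending_head == MAX_EVENTS)
		return false; /* queue full, event dropped in this sketch */
	pending[pending_tail++ % MAX_EVENTS] = event;
	return true;
}

/* Models the tasklet: runs later, after the driver lock has been dropped. */
void drain_events(void)
{
	while (pending_head != pending_tail)
		deliver_event(pending[pending_head++ % MAX_EVENTS]);
}

/* Models a driver that generates an event while holding its own lock. */
void driver_send_event(int event)
{
	driver_lock_held = true;
	queue_event(event); /* before the fix this step delivered directly */
	driver_lock_held = false;
}
```

Calling `driver_send_event()` twice and then `drain_events()` delivers both events, and `delivered_under_driver_lock` stays false: delivery never happens while the modelled driver lock is held, which is the dependency the patch is meant to break.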
> it's not just about irq context, it's about being called with any lock that's
> used in IRQ context; that is what makes this double nasty...
Yes it is nasty. However, so far wireless seems to be the only offender.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Thu, 2006-08-03 at 15:53 -0400, John W. Linville wrote:
> On Thu, Aug 03, 2006 at 11:54:41PM +1000, Herbert Xu wrote:
> Arjan, did you convince yourself whether or not this patch actually
> resolves the problem at hand? Applying it makes sense to me either
> way, but it would be nice to believe it fixed a known issue. :-)
It'll fix a whole bunch of issues for sure, and this one as well afaics
(now with coffee ;-). It probably won't fix all of them, but that's ok:
with this in place we actually CAN fix any others that pop up, whereas
right now, without this patch, we probably can't.
> John
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com