2003-05-15 12:55:03

by Martin Diehl

Subject: [2.5.69] rtnl-deadlock with usermodehelper and keventd


Hi,

seems we may run into mutual deadlock in the unregister_netdev() path with
CONFIG_HOTPLUG=y. I managed to reproduce an irda-user report leading to
the following description:

* killing irattach (userland daemon comparable to pppd) starts closing the
irda tty-ldisc

* there we call unregister_netdev() on behalf of the (already closed)
irda0 network device.

* unregister_netdev() takes rtnl_lock

* further down in unregister_netdevice() with CONFIG_HOTPLUG the network
layer wants to call userland hotplug stuff

* the request to fork the usermodehelper gets queued for the events/0
workqueue (aka keventd) and we block, with rtnl still held, waiting for
its completion.

* at this moment for some reason keventd has a linkwatch_event()
apparently already scheduled before the usermode helper. So we run into
linkwatch_event(), which tries to get rtnl_lock.

-> mutual deadlock: keventd is waiting for rtnl_lock, which is still held
by unregister_netdev(), which in turn is blocked waiting for completion of
work scheduled for keventd.

I can reproduce this with 2.5.69 with CONFIG_HOTPLUG enabled, no matter
what /proc/sys/kernel/hotplug is, even /bin/true is sufficient. I've no
idea why I get this with irda0 but not with eth0 for example.
FWIW kernel is SMP running on UP without preempt.

As I don't see how the irda stuff could cause unregister_netdev() to
schedule the hotplug stuff with some linkwatch_event already scheduled
I've no idea what the real problem and fix might be.

Below is a commented call trace caught right when it hangs as described.

Thanks
Martin

-----------------------------

> May 14 13:14:17 laptop kernel: events/0 D C12FDF04 412092 3 1 4 2 (L-TLB)
> May 14 13:14:17 laptop kernel: Call Trace:
> May 14 13:14:17 laptop kernel: [__down+150/256] __down+0x96/0x100
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [__down_failed+8/12] __down_failed+0x8/0xc
> May 14 13:14:17 laptop kernel: [.text.lock.rtnetlink+5/54] .text.lock.rtnetlink+0x5/0x36
> May 14 13:14:17 laptop kernel: [linkwatch_event+29/48] linkwatch_event+0x1d/0x30
> May 14 13:14:17 laptop kernel: [worker_thread+511/736] worker_thread+0x1ff/0x2e0
> May 14 13:14:17 laptop kernel: [linkwatch_event+0/48] linkwatch_event+0x0/0x30
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [ret_from_fork+6/20] ret_from_fork+0x6/0x14
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [worker_thread+0/736] worker_thread+0x0/0x2e0
> May 14 13:14:17 laptop kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18

This is the keventd-thread. It has some work scheduled for the network
layer, namely linkwatch_event(). This is currently blocking to get the
rtnl_lock semaphore.


> May 14 13:14:17 laptop kernel: irattach D 00000000 4283667124 400 1 537 396 (NOTLB)
> May 14 13:14:17 laptop kernel: Call Trace:
> May 14 13:14:17 laptop kernel: [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0
> May 14 13:14:17 laptop kernel: [wait_for_completion+153/224] wait_for_completion+0x99/0xe0

(5)

> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [queue_work+132/160] queue_work+0x84/0xa0

(4)

> May 14 13:14:17 laptop kernel: [call_usermodehelper+257/272] call_usermodehelper+0x101/0x110
> May 14 13:14:17 laptop kernel: [__call_usermodehelper+0/112] __call_usermodehelper+0x0/0x70
> May 14 13:14:17 laptop kernel: [vsprintf+39/48] vsprintf+0x27/0x30
> May 14 13:14:17 laptop kernel: [sprintf+31/48] sprintf+0x1f/0x30
> May 14 13:14:17 laptop kernel: [net_run_sbin_hotplug+174/195] net_run_sbin_hotplug+0xae/0xc3

(3)

> May 14 13:14:17 laptop kernel: [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0
> May 14 13:14:17 laptop kernel: [pfifo_fast_reset+158/160] pfifo_fast_reset+0x9e/0xa0
> May 14 13:14:17 laptop kernel: [qdisc_destroy+158/160] qdisc_destroy+0x9e/0xa0
> May 14 13:14:17 laptop kernel: [unregister_netdevice+211/608] unregister_netdevice+0xd3/0x260
> May 14 13:14:17 laptop kernel: [_end+282800068/1070304612] sirdev_dtor+0x0/0x20 [sir_dev]

(2)

> May 14 13:14:17 laptop kernel: [unregister_netdev+24/48] unregister_netdev+0x18/0x30

(1)

> May 14 13:14:17 laptop kernel: [_end+282800429/1070304612] sirdev_put_instance+0x149/0x1ad [sir_dev]
> May 14 13:14:17 laptop kernel: [_end+282804705/1070304612] __func__.9+0x0/0x14 [sir_dev]
> May 14 13:14:17 laptop kernel: [_end+282131315/1070304612] irtty_close+0x4f/0x120 [irtty_sir]
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [tty_set_ldisc+1091/1200] tty_set_ldisc+0x443/0x4b0
> May 14 13:14:17 laptop kernel: [uart_wait_until_sent+144/224] uart_wait_until_sent+0x90/0xe0
> May 14 13:14:17 laptop kernel: [tty_wait_until_sent+243/272] tty_wait_until_sent+0xf3/0x110
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [sock_destroy_inode+27/32] sock_destroy_inode+0x1b/0x20
> May 14 13:14:17 laptop kernel: [_end+282132178/1070304612] +0x15a/0x16c [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282130740/1070304612] irtty_open+0x0/0x1f0 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282131236/1070304612] irtty_close+0x0/0x120 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282130132/1070304612] irtty_ioctl+0x0/0x260 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282129076/1070304612] irtty_receive_buf+0x0/0xc0 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282129268/1070304612] irtty_receive_room+0x0/0x30 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282129316/1070304612] irtty_write_wakeup+0x0/0x40 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282134820/1070304612] +0x0/0xe0 [irtty_sir]
> May 14 13:14:17 laptop kernel: [sys_ioctl+256/656] sys_ioctl+0x100/0x290
> May 14 13:14:17 laptop kernel: [syscall_call+7/11] syscall_call+0x7/0xb

Ok, nice trace btw: The last printk from sir_dev was at (1) before we
called unregister_netdev() - which in turn acquired rtnl_lock (2). Due to
the disappearing irda0 device (and CONFIG_HOTPLUG=y) the network layer
decided to call the hotplug stuff (3). For this to fork the usermode
helper, it scheduled some work for keventd (4). Finally we are blocking
for completion until keventd finishes wait4 usermodehelper (5).

Unfortunately we are blocking for completion with rtnl still locked and
keventd apparently having the linkwatch_event() scheduled before the
usermodehelper -> mutual deadlock between irattach and keventd!


2003-05-15 20:01:16

by Jean Tourrilhes

Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

Greg,

This is a HotPlug problem, so would you mind forwarding this
to the relevant person and help Martin ?
Thanks in advance...

Jean


2003-05-15 20:07:17

by Greg KH

Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

On Thu, May 15, 2003 at 01:12:55PM -0700, [email protected] wrote:
> Greg,
>
> This is a HotPlug problem, so would you mind forwarding this
> to the relevant person and help Martin ?

But it's a networking subsystem hotplug problem, right? That's way out
of my league.

I do agree it looks like a real problem, Martin did a great job in
tracking this down.

thanks,

greg k-h

2003-05-15 20:12:49

by Jean Tourrilhes

Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

On Thu, May 15, 2003 at 01:19:36PM -0700, Greg KH wrote:
> On Thu, May 15, 2003 at 01:12:55PM -0700, [email protected] wrote:
> > Greg,
> >
> > This is a HotPlug problem, so would you mind forwarding this
> > to the relevant person and help Martin ?
>
> But it's a networking subsystem hotplug problem, right? That's way out
> of my league.

That's why I say "forwarding", I know that we are all humans
after all ;-).

> I do agree it looks like a real problem, Martin did a great job in
> tracking this down.

Yes, I'm glad he is back, that way I can dedicate a bit more
time to pending wireless stuff ;-)

> thanks,
>
> greg k-h

Jean

2003-05-23 06:47:57

by David Miller

Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

   From: Martin Diehl <[email protected]>
   Date: Fri, 23 May 2003 09:06:10 +0200 (CEST)

   Asking just because there was another user hitting this deadlock:

It's fixed in current 2.5.x sources, wake up :-)

2003-05-23 09:18:51

by Martin Diehl

Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

On Thu, 22 May 2003, David S. Miller wrote:

> Asking just because there was another user hitting this deadlock:
>
> It's fixed in current 2.5.x sources, wake up :-)

Oops, sorry for the noise, I hadn't noticed this yet.

But nope, unfortunately it's still hanging! I've just tested with
2.5.69-bk15. Running into the same deadlock due to sleeping with rtnl
held. This time however it seems it's triggered from the sysfs side!

Thanks anyway!
Martin

-------------------------

May 23 11:07:31 srv kernel: events/0 D C02B05DC 4294946908 4 1 5 3 (L-TLB)
May 23 11:07:31 srv kernel: Call Trace:
May 23 11:07:31 srv kernel: [__down+197/368] __down+0xc5/0x170
May 23 11:07:31 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:31 srv kernel: [__down_failed+8/12] __down_failed+0x8/0xc
May 23 11:07:31 srv kernel: [.text.lock.rtnetlink+5/94] .text.lock.rtnetlink+0x5/0x5e
May 23 11:07:31 srv kernel: [linkwatch_event+33/48] linkwatch_event+0x21/0x30
May 23 11:07:32 srv kernel: [worker_thread+478/752] worker_thread+0x1de/0x2f0
May 23 11:07:32 srv kernel: [linkwatch_event+0/48] linkwatch_event+0x0/0x30
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [ret_from_fork+6/32] ret_from_fork+0x6/0x20
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [worker_thread+0/752] worker_thread+0x0/0x2f0
May 23 11:07:32 srv kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18

May 23 11:07:32 srv kernel: irattach D 00000000 19710128 2109 1 2104 (NOTLB)
May 23 11:07:32 srv kernel: Call Trace:
May 23 11:07:32 srv kernel: [wait_for_completion+220/352] wait_for_completion+0xdc/0x160
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [__wake_up+83/144] __wake_up+0x53/0x90
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [call_usermodehelper+290/301] call_usermodehelper+0x122/0x12d
May 23 11:07:32 srv kernel: [__call_usermodehelper+0/96] __call_usermodehelper+0x0/0x60
May 23 11:07:32 srv kernel: [__call_usermodehelper+0/96] __call_usermodehelper+0x0/0x60
May 23 11:07:32 srv kernel: [sprintf+18/32] sprintf+0x12/0x20
May 23 11:07:32 srv kernel: [kset_hotplug+419/464] kset_hotplug+0x1a3/0x1d0
May 23 11:07:32 srv kernel: [kobject_del+75/96] kobject_del+0x4b/0x60
May 23 11:07:32 srv kernel: [class_device_del+166/192] class_device_del+0xa6/0xc0
May 23 11:07:32 srv kernel: [class_device_unregister+11/32] class_device_unregister+0xb/0x20
May 23 11:07:32 srv kernel: [unregister_netdevice+356/496] unregister_netdevice+0x164/0x1f0
May 23 11:07:32 srv kernel: [unregister_netdev+16/48] unregister_netdev+0x10/0x30
May 23 11:07:32 srv kernel: [_end+206658744/1070163436] +0x128/0x75c [sir_dev]
May 23 11:07:32 srv kernel: [_end+206653994/1070163436] sirdev_put_instance+0xfe/0x110 [sir_dev]
May 23 11:07:32 srv kernel: [tty_wait_until_sent+235/256] tty_wait_until_sent+0xeb/0x100
May 23 11:07:32 srv kernel: [_end+206513126/1070163436] irtty_close+0x3a/0x141 [irtty_sir]
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [tty_set_ldisc+205/464] tty_set_ldisc+0xcd/0x1d0
May 23 11:07:32 srv kernel: [serial8250_tx_empty+60/128] serial8250_tx_empty+0x3c/0x80
May 23 11:07:32 srv kernel: [uart_wait_until_sent+150/224] uart_wait_until_sent+0x96/0xe0
May 23 11:07:32 srv kernel: [_end+206514243/1070163436] +0x274/0x31d [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206512540/1070163436] irtty_open+0x0/0x210 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206513068/1070163436] irtty_close+0x0/0x141 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206511964/1070163436] irtty_ioctl+0x0/0x240 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206510924/1070163436] irtty_receive_buf+0x0/0xb0 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206511100/1070163436] irtty_receive_room+0x0/0x30 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206511148/1070163436] irtty_write_wakeup+0x0/0x40 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206517164/1070163436] +0x0/0x100 [irtty_sir]
May 23 11:07:32 srv kernel: [dput+28/608] dput+0x1c/0x260
May 23 11:07:32 srv kernel: [tty_ioctl+888/1152] tty_ioctl+0x378/0x480
May 23 11:07:32 srv kernel: [sys_ioctl+646/744] sys_ioctl+0x286/0x2e8
May 23 11:07:32 srv kernel: [sys_fcntl64+89/112] sys_fcntl64+0x59/0x70
May 23 11:07:32 srv kernel: [sys_fcntl64+101/112] sys_fcntl64+0x65/0x70
May 23 11:07:32 srv kernel: [syscall_call+7/11] syscall_call+0x7/0xb

2003-05-23 09:31:56

by David Miller

Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

   From: Martin Diehl <[email protected]>
   Date: Fri, 23 May 2003 11:38:38 +0200 (CEST)

   On Thu, 22 May 2003, David S. Miller wrote:

   > Asking just because there was another user hitting this deadlock:
   >
   > It's fixed in current 2.5.x sources, wake up :-)

   Oops, sorry for the noise, I hadn't noticed this yet.

   But nope, unfortunately it's still hanging! I've just tested with
   2.5.69-bk15. Running into the same deadlock due to sleeping with rtnl
   held. This time however it seems it's triggered from the sysfs side!

Stephen, you need to do the device class stuff outside of the RTNL
lock please.

At least I didn't add this bug :-)

This should fix it.

--- net/core/dev.c.~1~ Fri May 23 02:42:37 2003
+++ net/core/dev.c Fri May 23 02:43:20 2003
@@ -2754,6 +2754,8 @@
 
 		dev->next = NULL;
 
+		netdev_unregister_sysfs(dev);
+
 		netdev_wait_allrefs(dev);
 
 		BUG_ON(atomic_read(&dev->refcnt));
@@ -2841,8 +2843,6 @@
 	BUG_TRAP(!dev->master);
 
 	free_divert_blk(dev);
-
-	netdev_unregister_sysfs(dev);
 
 	spin_lock(&unregister_todo_lock);
 	dev->next = unregister_todo;

2003-05-23 14:29:00

by Stian Jordet

Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

Fri, 23.05.2003 at 11:43, David S. Miller wrote:
> From: Martin Diehl <[email protected]>
> Date: Fri, 23 May 2003 11:38:38 +0200 (CEST)
>
> On Thu, 22 May 2003, David S. Miller wrote:
>
> > Asking just because there was another user hitting this deadlock:
> >
> > It's fixed in current 2.5.x sources, wake up :-)
>
> Oops, sorry for the noise, I hadn't noticed this yet.
>
> But nope, unfortunately it's still hanging! I've just tested with
> 2.5.69-bk15. Running into the same deadlock due to sleeping with rtnl
> held. This time however it seems it's triggered from the sysfs side!
>
> Stephen, you need to do the device class stuff outside of the RTNL
> lock please.
>
> At least I didn't add this bug :-)
>
> This should fix it.

And so it did :-) Thanks.

Best regards,
Stian

2003-05-23 16:33:07

by Jean Tourrilhes

Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

On Fri, May 23, 2003 at 02:43:08AM -0700, David S. Miller wrote:
> From: Martin Diehl <[email protected]>
> Date: Fri, 23 May 2003 11:38:38 +0200 (CEST)
>
> On Thu, 22 May 2003, David S. Miller wrote:
>
> > Asking just because there was another user hitting this deadlock:
> >
> > It's fixed in current 2.5.x sources, wake up :-)
>
> Oops, sorry for the noise, I hadn't noticed this yet.
>
> But nope, unfortunately it's still hanging! I've just tested with
> 2.5.69-bk15. Running into the same deadlock due to sleeping with rtnl
> held. This time however it seems it's triggered from the sysfs side!
>
> Stephen, you need to do the device class stuff outside of the RTNL
> lock please.
>
> At least I didn't add this bug :-)
>
> This should fix it.

Thanks Dave, we are very much obliged !

Jean

2003-05-23 23:05:38

by Martin Diehl

Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

On Fri, 23 May 2003, David S. Miller wrote:

> But nope, unfortunately it's still hanging! I've just tested with
> 2.5.69-bk15. Running into the same deadlock due to sleeping with rtnl
> held. This time however it seems it's triggered from the sysfs side!
>
> Stephen, you need to do the device class stuff outside of the RTNL
> lock please.
>
> At least I didn't add this bug :-)
>
> This should fix it.

Well, back online now pretty late ;-)

Yes, as was already reported I can also confirm from testing the deadlock
is gone now. Thanks for resolving this issue!

Just a minor question before the thread gets closed: Don't we have the
same problem in the register path? register_netdevice() runs under
rtnl and calls netdev_register_sysfs(). I've never seen a deadlock there,
but I'd expect this to sleep for hotplug usermode completion as well.
Maybe this is just what you meant by your comment above ;-)

Martin