Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S264233AbTEOUBQ (ORCPT ); Thu, 15 May 2003 16:01:16 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S264231AbTEOUAx (ORCPT ); Thu, 15 May 2003 16:00:53 -0400 Received: from palrel10.hp.com ([156.153.255.245]:59829 "EHLO palrel10.hp.com") by vger.kernel.org with ESMTP id S264227AbTEOUAF (ORCPT ); Thu, 15 May 2003 16:00:05 -0400 Date: Thu, 15 May 2003 13:12:55 -0700 To: Martin Diehl Cc: Jeff Garzik , Linux kernel mailing list Subject: Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd Message-ID: <20030515201255.GA18643@bougret.hpl.hp.com> Reply-To: jt@hpl.hp.com References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.28i Organisation: HP Labs Palo Alto Address: HP Labs, 1U-17, 1501 Page Mill road, Palo Alto, CA 94304, USA. E-mail: jt@hpl.hp.com From: Jean Tourrilhes Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7632 Lines: 143 Greg, This is a HotPlug problem, so would you mind forwarding this to the relevant person and help Martin ? Thanks in advance... Jean On Thu, May 15, 2003 at 03:14:36PM +0200, Martin Diehl wrote: > > Hi, > > seems we may run into mutual deadlock in the unregister_netdev() path with > CONFIG_HOTPLUG=y. I managed to reproduce an irda-user report leading to > the following description: > > * killing irattach (userland daemon comparable to pppd) starts closing the > irda tty-ldisc > > * there we call unregister_netdev() on behalf of the (already closed) > irda0 network device. > > * unregister_netdev() takes rtnl_lock > > * further down in unregister_netdevice() with CONFIG_HOTPLUG the network > layers wants to call userland hotplug stuff > > * the request to fork the usermodehelper gets queued for the event/0 > workqueue (aka keventd) and we are blocking with rtnl still acquired for > completion. > > * at this moment for some reason keventd has a linkwatch_event() > apparently already scheduled before the usermode helper. So we run into > linkwatch_event() with tries to get rtnl_lock. > > -> mutual deadlock: keventd waiting for rtnl_lock which is still hold by > unregister_netdev blocking for completion of work scheduled for keventd. > > I can reproduce this with 2.5.69 with CONFIG_HOTPLUG enabled, no matter > what /proc/sys/kernel/hotplug is, even /bin/true is sufficient. I've no > idea why I get this with irda0 but not with eth0 for example. > FWIW kernel is SMP running on UP without preempt. > > As I don't see how the irda stuff could cause unregister_netdev() to > schedule the hotplug stuff with some linkwatch_event already scheduled > I've no idea what the real problem and fix might be. > > Below a commented calltrace catched right when it hangs as described. > > Thanks > Martin > > ----------------------------- > > > May 14 13:14:17 laptop kernel: events/0 D C12FDF04 412092 3 1 4 2 (L-TLB) > > May 14 13:14:17 laptop kernel: Call Trace: > > May 14 13:14:17 laptop kernel: [__down+150/256] __down+0x96/0x100 > > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20 > > May 14 13:14:17 laptop kernel: [__down_failed+8/12] __down_failed+0x8/0xc > > May 14 13:14:17 laptop kernel: [.text.lock.rtnetlink+5/54] .text.lock.rtnetlink+0x5/0x36 > > May 14 13:14:17 laptop kernel: [linkwatch_event+29/48] linkwatch_event+0x1d/0x30 > > May 14 13:14:17 laptop kernel: [worker_thread+511/736] worker_thread+0x1ff/0x2e0 > > May 14 13:14:17 laptop kernel: [linkwatch_event+0/48] linkwatch_event+0x0/0x30 > > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20 > > May 14 13:14:17 laptop kernel: [ret_from_fork+6/20] ret_from_fork+0x6/0x14 > > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20 > > May 14 13:14:17 laptop kernel: [worker_thread+0/736] worker_thread+0x0/0x2e0 > > May 14 13:14:17 laptop kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18 > > This is the keventd-thread. It has some work scheduled for the network > layer, namely linkwatch_event(). This is currently blocking to get the > rtnl_lock semaphore. > > > > May 14 13:14:17 laptop kernel: irattach D 00000000 4283667124 400 1 537 396 (NOTLB) > > May 14 13:14:17 laptop kernel: Call Trace: > > May 14 13:14:17 laptop kernel: [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0 > > May 14 13:14:17 laptop kernel: [wait_for_completion+153/224] wait_for_completion+0x99/0xe0 > > (5) > > > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20 > > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20 > > May 14 13:14:17 laptop kernel: [queue_work+132/160] queue_work+0x84/0xa0 > > (4) > > > May 14 13:14:17 laptop kernel: [call_usermodehelper+257/272] call_usermodehelper+0x101/0x110 > > May 14 13:14:17 laptop kernel: [__call_usermodehelper+0/112] __call_usermodehelper+0x0/0x70 > > May 14 13:14:17 laptop kernel: [vsprintf+39/48] vsprintf+0x27/0x30 > > May 14 13:14:17 laptop kernel: [sprintf+31/48] sprintf+0x1f/0x30 > > May 14 13:14:17 laptop kernel: [net_run_sbin_hotplug+174/195] net_run_sbin_hotplug+0xae/0xc3 > > (3) > > > May 14 13:14:17 laptop kernel: [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0 > > May 14 13:14:17 laptop kernel: [pfifo_fast_reset+158/160] pfifo_fast_reset+0x9e/0xa0 > > May 14 13:14:17 laptop kernel: [qdisc_destroy+158/160] qdisc_destroy+0x9e/0xa0 > > May 14 13:14:17 laptop kernel: [unregister_netdevice+211/608] unregister_netdevice+0xd3/0x260 > > May 14 13:14:17 laptop kernel: [_end+282800068/1070304612] sirdev_dtor+0x0/0x20 [sir_dev] > > (2) > > > May 14 13:14:17 laptop kernel: [unregister_netdev+24/48] unregister_netdev+0x18/0x30 > > (1) > > > May 14 13:14:17 laptop kernel: [_end+282800429/1070304612] sirdev_put_instance+0x149/0x1ad [sir_dev] > > May 14 13:14:17 laptop kernel: [_end+282804705/1070304612] __func__.9+0x0/0x14 [sir_dev] > > May 14 13:14:17 laptop kernel: [_end+282131315/1070304612] irtty_close+0x4f/0x120 [irtty_sir] > > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20 > > May 14 13:14:17 laptop kernel: [tty_set_ldisc+1091/1200] tty_set_ldisc+0x443/0x4b0 > > May 14 13:14:17 laptop kernel: [uart_wait_until_sent+144/224] uart_wait_until_sent+0x90/0xe0 > > May 14 13:14:17 laptop kernel: [tty_wait_until_sent+243/272] tty_wait_until_sent+0xf3/0x110 > > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20 > > May 14 13:14:17 laptop kernel: [sock_destroy_inode+27/32] sock_destroy_inode+0x1b/0x20 > > May 14 13:14:17 laptop kernel: [_end+282132178/1070304612] +0x15a/0x16c [irtty_sir] > > May 14 13:14:17 laptop kernel: [_end+282130740/1070304612] irtty_open+0x0/0x1f0 [irtty_sir] > > May 14 13:14:17 laptop kernel: [_end+282131236/1070304612] irtty_close+0x0/0x120 [irtty_sir] > > May 14 13:14:17 laptop kernel: [_end+282130132/1070304612] irtty_ioctl+0x0/0x260 [irtty_sir] > > May 14 13:14:17 laptop kernel: [_end+282129076/1070304612] irtty_receive_buf+0x0/0xc0 [irtty_sir] > > May 14 13:14:17 laptop kernel: [_end+282129268/1070304612] irtty_receive_room+0x0/0x30 [irtty_sir] > > May 14 13:14:17 laptop kernel: [_end+282129316/1070304612] irtty_write_wakeup+0x0/0x40 [irtty_sir] > > May 14 13:14:17 laptop kernel: [_end+282134820/1070304612] +0x0/0xe0 [irtty_sir] > > May 14 13:14:17 laptop kernel: [sys_ioctl+256/656] sys_ioctl+0x100/0x290 > > May 14 13:14:17 laptop kernel: [syscall_call+7/11] syscall_call+0x7/0xb > > Ok, nice trace btw: The last printk from sir_dev was at (1) before we > called unregister_netdev() - which in turn acquired rtnl_lock (2). Due to > the disappearing irda0 device (and CONFIG_HOTPLUG=y) the network layer > decided to call the hotplug stuff (3). For this to fork the usermode > helper, it scheduled some work for keventd (4). Finally we are blocking > for completion until keventd finishes wait4 usermodehelper (5). > > Unfortunately we are blocking for completion with rtnl still locked and > keventd apparently having the linkwatch_event() scheduled before the > usermodehelper -> mutual deadlock between irattach and keventd! > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/