Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1764092AbXE2H5g (ORCPT ); Tue, 29 May 2007 03:57:36 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1753208AbXE2H53 (ORCPT ); Tue, 29 May 2007 03:57:29 -0400
Received: from smtp1.linux-foundation.org ([207.189.120.13]:46392
	"EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1753129AbXE2H52 (ORCPT );
	Tue, 29 May 2007 03:57:28 -0400
Date: Tue, 29 May 2007 00:56:28 -0700
From: Andrew Morton
To: Folkert van Heusden
Cc: linux-kernel@vger.kernel.org, Jarek Poplawski, Jason Wessel,
	Thomas Gleixner, stable@kernel.org
Subject: Re: [2.6.21.1] soft lockup when removing netconsole module
Message-Id: <20070529005628.f7f3abc6.akpm@linux-foundation.org>
In-Reply-To: <20070526154011.GB3735@vanheusden.com>
References: <20070526154011.GB3735@vanheusden.com>
X-Mailer: Sylpheed 2.4.1 (GTK+ 2.8.17; x86_64-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2897
Lines: 72

On Sat, 26 May 2007 17:40:12 +0200 Folkert van Heusden wrote:

> When trying to remove the netconsole module, I got the following kernel
> output after a while (couple of minutes iirc):
>
> [525720.117293] BUG: soft lockup detected on CPU#1!
> [525720.117353]  [] show_trace_log_lvl+0x1a/0x30
> [525720.117439]  [] show_trace+0x12/0x14
> [525720.117526]  [] dump_stack+0x16/0x18
> [525720.117613]  [] softlockup_tick+0xa6/0xc2
> [525720.117694]  [] run_local_timers+0x12/0x14
> [525720.117738]  [] update_process_times+0x72/0xa1
> [525720.117744]  [] tick_sched_timer+0x53/0xb6
> [525720.117748]  [] hrtimer_interrupt+0x189/0x1e3
> [525720.117753]  [] local_apic_timer_interrupt+0x55/0x5b
> [525720.117761]  [] smp_apic_timer_interrupt+0x2a/0x39
> [525720.117766]  [] apic_timer_interrupt+0x33/0x38
> [525720.117770]  [] mutex_lock+0x8/0xa
> [525720.117775]  [] flush_workqueue+0x2f/0x8f
> [525720.117780]  [] cancel_rearming_delayed_workqueue+0x29/0x2b
> [525720.117785]  [] cancel_rearming_delayed_work+0xf/0x11
> [525720.117790]  [] netpoll_cleanup+0x75/0xa5
> [525720.117794]  [] cleanup_netconsole+0x17/0x1a [netconsole]
> [525720.117804]  [] sys_delete_module+0x12f/0x14f
> [525720.117809]  [] syscall_call+0x7/0xb
> [525720.117812]  =======================
>
> Also the rmmod hangs and would not exit even with kill -9. It also
> sucks up 100% cpu.

Jason recently posted a mystery patch without telling us what problem it
fixed. It looks like you just found it: cancel_rearming_delayed_work()
will hang if the work isn't actually pending.

Please test this:


From: Jason Wessel

Do not call cancel_rearming_delayed_work() if there is no pending work.

Signed-off-by: Jason Wessel
Signed-off-by: Andrew Morton
---

 net/core/netpoll.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff -puN net/core/netpoll.c~a net/core/netpoll.c
--- a/net/core/netpoll.c~a
+++ a/net/core/netpoll.c
@@ -784,8 +784,10 @@ void netpoll_cleanup(struct netpoll *np)
 			if (atomic_dec_and_test(&npinfo->refcnt)) {
 				skb_queue_purge(&npinfo->arp_tx);
 				skb_queue_purge(&npinfo->txq);
-				cancel_rearming_delayed_work(&npinfo->tx_work);
-				flush_scheduled_work();
+				if (delayed_work_pending(&npinfo->tx_work)) {
+					cancel_rearming_delayed_work(&npinfo->tx_work);
+					flush_scheduled_work();
+				}
 
 				kfree(npinfo);
 			}
_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/