Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755249AbXFLL1A (ORCPT ); Tue, 12 Jun 2007 07:27:00 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1753605AbXFLL0u (ORCPT ); Tue, 12 Jun 2007 07:26:50 -0400
Received: from mx10.go2.pl ([193.17.41.74]:58395 "EHLO poczta.o2.pl"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1752649AbXFLL0t (ORCPT ); Tue, 12 Jun 2007 07:26:49 -0400
Date: Tue, 12 Jun 2007 13:02:33 +0200
From: Jarek Poplawski
To: Andrew Morton
Cc: Folkert van Heusden, linux-kernel@vger.kernel.org, Jason Wessel,
        Thomas Gleixner, stable@kernel.org, netdev@vger.kernel.org
Subject: Re: [2.6.21.1] soft lockup when removing netconsole module
Message-ID: <20070612110233.GA3281@ff.dom.local>
References: <20070526154011.GB3735@vanheusden.com> <20070529005628.f7f3abc6.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20070529005628.f7f3abc6.akpm@linux-foundation.org>
User-Agent: Mutt/1.4.2.2i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4155
Lines: 104

On Tue, May 29, 2007 at 12:56:28AM -0700, Andrew Morton wrote:
> On Sat, 26 May 2007 17:40:12 +0200 Folkert van Heusden wrote:
>
> > When trying to remove the netconsole module, I got the following kernel
> > output after a while (couple of minutes iirc):
> >
> > [525720.117293] BUG: soft lockup detected on CPU#1!
> > [525720.117353]  [] show_trace_log_lvl+0x1a/0x30
> > [525720.117439]  [] show_trace+0x12/0x14
> > [525720.117526]  [] dump_stack+0x16/0x18
> > [525720.117613]  [] softlockup_tick+0xa6/0xc2
> > [525720.117694]  [] run_local_timers+0x12/0x14
> > [525720.117738]  [] update_process_times+0x72/0xa1
> > [525720.117744]  [] tick_sched_timer+0x53/0xb6
> > [525720.117748]  [] hrtimer_interrupt+0x189/0x1e3
> > [525720.117753]  [] local_apic_timer_interrupt+0x55/0x5b
> > [525720.117761]  [] smp_apic_timer_interrupt+0x2a/0x39
> > [525720.117766]  [] apic_timer_interrupt+0x33/0x38
> > [525720.117770]  [] mutex_lock+0x8/0xa
> > [525720.117775]  [] flush_workqueue+0x2f/0x8f
> > [525720.117780]  [] cancel_rearming_delayed_workqueue+0x29/0x2b
> > [525720.117785]  [] cancel_rearming_delayed_work+0xf/0x11
> > [525720.117790]  [] netpoll_cleanup+0x75/0xa5
> > [525720.117794]  [] cleanup_netconsole+0x17/0x1a [netconsole]
> > [525720.117804]  [] sys_delete_module+0x12f/0x14f
> > [525720.117809]  [] syscall_call+0x7/0xb
> > [525720.117812]  =======================
> >
> > Also the rmmod hangs and would not exit even with kill -9. It also
> > sucks up 100% cpu.
>
> Jason recently posted a mystery patch without telling us what problem it
> fixed.
>

To be fair the problem should be known:

http://marc.info/?l=linux-kernel&m=117700287817801&w=2

List:       linux-kernel
Subject:    Re: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work
From:       Chuck Ebbert
Date:       2007-04-19 17:07:11
Message-ID: 4627A1BF.8080406 () redhat ! com

> Okay, an easy test for it: insmod netconsole ; rmmod netconsole
>
> In 2.6.20.x it loops forever and cancel_rearming_delayed_work()
> is part of the trace...

I hoped the discussion about cancel_rearming_delayed_work would
reach more people (there was also a patch proposal to add a warning
to the usage comment). But it seems it was not enough...

Of course such a problem should preferably be fixed by somebody who
knows the code (alas I don't know netconsole), to be sure all needed
cancels are still done after this change. I hope Jason's patch is
right but I'm a little surprised I can't see netdev in cc (I'll try
to fix this).

Cheers,
Jarek P.

PS: I'm very sorry for such a late response (holidays).

> It looks like you just found it: cancel_rearming_delayed_work() will hang
> if the work isn't actually pending. Please test this:
>
>
> From: Jason Wessel
>
> Do not call cancel_rearming_delayed_work() if there is no
> pending work.
>
> Signed-off-by: Jason Wessel
> Signed-off-by: Andrew Morton
> ---
>
>  net/core/netpoll.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff -puN net/core/netpoll.c~a net/core/netpoll.c
> --- a/net/core/netpoll.c~a
> +++ a/net/core/netpoll.c
> @@ -784,8 +784,10 @@ void netpoll_cleanup(struct netpoll *np)
>  			if (atomic_dec_and_test(&npinfo->refcnt)) {
>  				skb_queue_purge(&npinfo->arp_tx);
>  				skb_queue_purge(&npinfo->txq);
> -				cancel_rearming_delayed_work(&npinfo->tx_work);
> -				flush_scheduled_work();
> +				if (delayed_work_pending(&npinfo->tx_work)) {
> +					cancel_rearming_delayed_work(&npinfo->tx_work);
> +					flush_scheduled_work();
> +				}
>
>  				kfree(npinfo);
>  			}
> _
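
For readers who want to see why the guard in the quoted patch matters,
here is a rough userspace model of the hang. It is only a sketch under
an assumption: that cancel_rearming_delayed_work() keeps retrying the
cancel and flushing the queue until a cancel succeeds, which matches the
"loops forever" report above but is not copied from the kernel source.
Every model_* name is hypothetical, and the iteration cap exists only so
the model terminates where the real call would spin.

```c
#include <stdbool.h>

/* Toy stand-in for a delayed work item; NOT the kernel's struct. */
struct model_work {
	bool timer_pending;	/* true while the work's timer is armed */
};

/* Models a cancel that succeeds only if a timer was actually armed. */
static bool model_cancel_delayed_work(struct model_work *w)
{
	bool was_pending = w->timer_pending;

	w->timer_pending = false;
	return was_pending;
}

/* Models a flush; in this toy model the work never rearms itself. */
static void model_flush(struct model_work *w)
{
	(void)w;
}

/*
 * Assumed retry loop: cancel, and flush again on failure.
 * Returns the number of attempts used, or -1 if the cap was reached
 * (the case where the uncapped loop would never exit).
 */
static int model_cancel_rearming(struct model_work *w, int max_iter)
{
	int i;

	for (i = 0; i < max_iter; i++) {
		if (model_cancel_delayed_work(w))
			return i + 1;
		model_flush(w);
	}
	return -1;
}

/* Same cleanup with the pending check from the quoted patch in front. */
static int model_cleanup_guarded(struct model_work *w, int max_iter)
{
	if (!w->timer_pending)
		return 0;	/* nothing armed, so skip the retry loop */
	return model_cancel_rearming(w, max_iter);
}
```

With a work item that was never armed, model_cancel_rearming() runs
into the cap, while model_cleanup_guarded() returns at once. The model
says nothing about whether skipping the cancel is safe for work that is
already running, which is the open question raised in the mail above.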