Message-ID: <51DC9379.9050408@oracle.com>
Date: Tue, 09 Jul 2013 18:49:29 -0400
From: Sasha Levin
To: Dave Jones, Tejun Heo, tglx@linutronix.de, Peter Zijlstra, LKML, trinity@vger.kernel.org
Subject: Re: timer: lockup in run_timer_softirq()
References: <51DC902F.3070403@oracle.com> <20130709224706.GA13855@redhat.com>
In-Reply-To: <20130709224706.GA13855@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/09/2013 06:47 PM, Dave Jones wrote:
> On Tue, Jul 09, 2013 at 06:35:27PM -0400, Sasha Levin wrote:
> > Hi all,
> >
> > While fuzzing with trinity inside a KVM tools guest running latest -next, I've
> > stumbled on the following spew:
> >
> > [ 2536.440007] BUG: soft lockup - CPU#0 stuck for 23s! [trinity-child86:12368]
> > [ 2536.440007] Call Trace:
> > [ 2536.440007]
> > [ 2536.440007] [] run_timer_softirq+0x2d0/0x330
> > [ 2536.440007] [] ? lock_timer_base+0x70/0x70
> > [ 2536.440007] [] __do_softirq+0x261/0x4d0
> > [ 2536.440007] [] irq_exit+0x86/0x120
> > [ 2536.440007] [] smp_apic_timer_interrupt+0x4a/0x60
> > [ 2536.440007] [] apic_timer_interrupt+0x72/0x80
> > [ 2536.440007]
> > [ 2536.440007] [] ? retint_restore_args+0x13/0x13
> > [ 2536.440007] [] ? user_enter+0x135/0x150
> > [ 2536.440007] [] syscall_trace_leave+0x12d/0x160
> > [ 2536.440007] [] int_check_syscall_exit_work+0x34/0x3d
> > [ 2536.440007] Code: 01 fd 48 89 df e8 45 5b 8f fd e8 b0 f1 00 fd 48 83 3d 30 e6 8a 01 00 75 0e 0f
> > 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 fb 66 66 90 <66> 66 90 bf 01 00 00 00 e8 17 4f 00 00 65 48 8b
> > 04 25 88 d9 00
> >
> > While going through the NMI dump, I noticed that it's very incomplete, and full of:
> >
> > [ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run:
> > 697182.008 msecs
>
> I've been reporting these (and other traces) for a while https://lkml.org/lkml/2013/7/5/185
> on bare-metal.
>
> I've not managed to find time yet to try and narrow down the exact combination
> of trinity syscalls that causes this. Sometimes it happens quickly, sometimes
> after a few hours (even when run with the same seeds, it seems variable)

Interesting. It's the first time I'm seeing these, but I haven't really
changed anything in my configuration.

Are you also seeing the "NMI handler took too long" messages?


Thanks,
Sasha