Date: Tue, 18 Nov 2014 20:40:55 -0800
Subject: Re: frequent lockups in 3.18rc4
From: Linus Torvalds
To: Dave Jones, Don Zickus, Thomas Gleixner, Linus Torvalds, Linux Kernel, "the arch/x86 maintainers"

On Tue, Nov 18, 2014 at 6:19 PM, Dave Jones wrote:
>
> NMI watchdog: BUG: soft lockup - CPU#2 stuck for 21s! [trinity-c42:31480]
> CPU: 2 PID: 31480 Comm: trinity-c42 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
> RIP: 0010:[] [] context_tracking_user_enter+0xa4/0x190
> Call Trace:
> [] syscall_trace_leave+0xa5/0x160
> [] int_check_syscall_exit_work+0x34/0x3d

Hmm, if we are getting soft-lockups here, maybe it suggests too much
exit-work. Some TIF_NOHZ loop, perhaps? You have nohz on, don't you?

That makes me wonder: does the problem go away if you disable NOHZ?

> CPU: 0 PID: 27716 Comm: kworker/0:1 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
> Workqueue: events nohz_kick_work_fn
> RIP: 0010:[] [] smp_call_function_many+0x1b2/0x320
> Call Trace:
> [] tick_nohz_full_kick_all+0x35/0x70
> [] nohz_kick_work_fn+0xe/0x10
> [] process_one_work+0x1fd/0x590
> [] worker_thread+0x11b/0x490
> [] kthread+0xf9/0x110
> [] ret_from_fork+0x7c/0xb0

Yeah, there's certainly some NOHZ work going on on CPU0 too.

> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 10/411 32140]
> RIP: 0010:[] [] intel_idle+0xd5/0x180
> Call Trace:
> [] cpuidle_enter_state+0x55/0x1c0
> [] cpuidle_enter+0x17/0x20
> [] cpu_startup_entry+0x433/0x4e0
> [] start_secondary+0x1a3/0x220

Nothing.

> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 10/411 32140]
> RIP: 0010:[] [] intel_idle+0xd5/0x180
> [] cpuidle_enter_state+0x55/0x1c0
> [] cpuidle_enter+0x17/0x20
> [] cpu_startup_entry+0x433/0x4e0
> [] start_secondary+0x1a3/0x220

Nothing.

Hmm. NOHZ?

                   Linus