From: Linus Torvalds
To: Dave Jones, Linus Torvalds, Linux Kernel, "the arch/x86 maintainers", Don Zickus
Subject: Re: frequent lockups in 3.18rc4
Date: Thu, 27 Nov 2014 11:17:16 -0800
In-Reply-To: <20141126225745.GA30346@redhat.com>
References: <20141114213124.GB3344@redhat.com> <20141115213405.GA31971@redhat.com> <20141116014006.GA5016@redhat.com> <20141126002501.GA11752@redhat.com> <20141126024032.GA13246@redhat.com> <20141126225745.GA30346@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 26, 2014 at 2:57 PM, Dave Jones wrote:
>
> So 3.17 also has this problem.
> Good news I guess in that it's not a regression, but damn I really didn't
> want to have to go digging through the mists of time to find the last 'good' point.

So I'm looking at the watchdog code, and it seems racy wrt parking and
startup.

In particular, it sets the high priority *after* starting the hrtimer,
and it goes back to SCHED_NORMAL *before* canceling the timer.

Which seems completely ass-backwards. And the smp_hotplug_thread stuff
explicitly enables preemption around the setup/cleanup/park/unpark
operations.

However, that would be an issue only if trinity might be doing things
that enable and disable the watchdog. And doing so under insane loads.
Even then it seems unlikely.

The insane loads you have. But even then, could a load average of 169
possibly delay running a non-RT process for 22 seconds? Doubtful.

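[Editorial illustration, not part of the original mail.] The ordering concern above can be sketched as a toy state model. This is not kernel code: the step names and the `exposed_steps` helper are hypothetical, and the model only tracks two flags (timer armed, thread at RT priority). It flags every step after which a preemption would leave the timer ticking while the watchdog thread is still an ordinary non-RT task.

```python
# Hypothetical step names for this toy model; they are not kernel symbols.
START_TIMER = "start_timer"
CANCEL_TIMER = "cancel_timer"
SET_RT = "set_rt_priority"
SET_NORMAL = "set_normal_priority"


def exposed_steps(sequence):
    """Return the steps after which the timer is armed but the thread is not RT.

    If the thread is preempted right after any returned step, the timer
    keeps running while the thread competes as a normal-priority task.
    """
    timer_armed = False
    is_rt = False
    exposed = []
    for step in sequence:
        if step == START_TIMER:
            timer_armed = True
        elif step == CANCEL_TIMER:
            timer_armed = False
        elif step == SET_RT:
            is_rt = True
        elif step == SET_NORMAL:
            is_rt = False
        else:
            raise ValueError(f"unknown step: {step!r}")
        if timer_armed and not is_rt:
            exposed.append(step)
    return exposed


# Ordering as described in the mail: priority raised after the timer starts,
# and dropped before the timer is canceled.
AS_DESCRIBED = [START_TIMER, SET_RT, SET_NORMAL, CANCEL_TIMER]

# One possible reordering (an assumption, not a proposed patch): the timer
# only ever runs while the thread holds RT priority.
REORDERED = [SET_RT, START_TIMER, CANCEL_TIMER, SET_NORMAL]

print(exposed_steps(AS_DESCRIBED))  # ['start_timer', 'set_normal_priority']
print(exposed_steps(REORDERED))     # []
```

The model says nothing about how long such a window lasts or whether a real load could stretch it to 22 seconds; it only shows that the described ordering has two windows and the reordered one has none.
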
But just in case: do you do cpu hotplug events (that will disable and
re-enable the watchdog process)? Anything else that will park/unpark
the hotplug thread?

Quite frankly, I'm just grasping for straws here, but a lot of the
watchdog traces really have seemed spurious...

                 Linus