Date: Sun, 21 Dec 2014 15:58:18 -0800
Subject: Re: frequent lockups in 3.18rc4
From: Linus Torvalds
To: Dave Jones, Linus Torvalds, Thomas Gleixner, Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin, "Paul E. McKenney", Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov, Peter Anvin
In-Reply-To: <20141221223204.GA9618@codemonkey.org.uk>
References: <20141219145528.GC13404@redhat.com> <20141221223204.GA9618@codemonkey.org.uk>

On Sun, Dec 21, 2014 at 2:32 PM, Dave Jones wrote:
> On Sun, Dec 21, 2014 at 02:19:03PM -0800, Linus Torvalds wrote:
> >
> > And finally, and stupidly, is there any chance that you have anything
> > accessing /dev/hpet?
>
> Not knowingly at least, but who the hell knows what systemd has its
> fingers in these days.

Actually, it looks like /dev/hpet doesn't allow write access.

I can do the mmap(/dev/mem) thing and access the HPET by hand, and
when I write zero to it I immediately get something like this:

  Clocksource tsc unstable (delta = -284317725450 ns)
  Switched to clocksource hpet

just to confirm that yes, a jump in the HPET counter would indeed give
those kinds of symptoms: blaming the TSC with a negative delta in the
0-300s range, even though it's the HPET that is broken.
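
To make the arithmetic behind that negative delta concrete, here is a rough user-space sketch. It is not the kernel's watchdog code. It assumes a 32-bit HPET main counter running at the common 14.318180 MHz, and that the reported delta is the clocksource interval minus the watchdog interval; the function names and constants are made up for illustration. When the counter jumps backwards, the masked subtraction wraps into a huge positive interval, so the TSC looks like it fell behind by anything up to about 300 seconds.

```c
#include <stdint.h>

/* Assumptions for this sketch, not values taken from the thread: */
#define SKETCH_HPET_HZ   14318180ULL    /* common HPET frequency */
#define SKETCH_HPET_MASK 0xffffffffULL  /* 32-bit main counter */
#define SKETCH_NSEC_PER_SEC 1000000000ULL

/*
 * Interval the watchdog believes has elapsed, in ns.
 * The subtraction is masked to the counter width, so a counter that
 * went backwards is seen as one that almost wrapped around.
 */
static int64_t sketch_hpet_interval_ns(uint32_t last, uint32_t now)
{
    uint64_t cycles = ((uint64_t)now - last) & SKETCH_HPET_MASK;

    /* cycles < 2^32, so cycles * 1e9 still fits in 64 bits */
    return (int64_t)(cycles * SKETCH_NSEC_PER_SEC / SKETCH_HPET_HZ);
}

/*
 * Delta as the watchdog would report it against the TSC:
 * (interval measured by the TSC) - (interval measured by the HPET).
 */
static int64_t sketch_watchdog_delta_ns(int64_t tsc_interval_ns,
                                        uint32_t hpet_last,
                                        uint32_t hpet_now)
{
    return tsc_interval_ns - sketch_hpet_interval_ns(hpet_last, hpet_now);
}
```

With a healthy counter, half a second of TSC time against 7159090 HPET cycles gives a delta of zero. If the counter is reset to near zero between two samples, the same call returns a negative delta whose magnitude is bounded by the 2^32-cycle wrap period, which is just under 300 seconds at this frequency.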

And if the HPET then occasionally jumps around afterwards, it would
show up as ktime_get() occasionally going backwards, which in turn
would - as far as I can tell - result in exactly that pseudo-infinite
loop with timers.

Anyway, any wild kernel pointer access *could* happen to just hit the
HPET and write to the main counter value, although I'd personally be
more inclined to blame BIOS/SMM kind of code playing tricks with
time.. We do have a few places where we explicitly write the value on
purpose, but they are in the HPET init code, and in the clocksource
resume code, so they should not be involved.

Thomas - have you had reports of HPET breakage in RT circles, the same
way BIOSes have been tinkering with TSC?

Also, would it perhaps be a good idea to make "ktime_get()" save the
last time in a percpu variable, and warn if time ever goes backwards
on a particular CPU? A percpu thing should be pretty cheap, even if
we write to it every time somebody asks for time..

                    Linus
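
The percpu "did time go backwards" check suggested above could look roughly like the following user-space sketch. This is only an illustration of the idea, not a proposed patch: a plain array indexed by CPU number stands in for a real percpu variable, and the names and the array size are invented. In the kernel one would presumably use DEFINE_PER_CPU with this_cpu_read()/this_cpu_write() and a WARN_ONCE() instead of a return value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU limit for the sketch; stands in for real percpu storage */
#define SKETCH_NR_CPUS 8

/* Last timestamp handed out on each CPU, in ns */
static int64_t sketch_last_time_ns[SKETCH_NR_CPUS];

/*
 * Record the timestamp about to be returned on this CPU and report
 * whether it is earlier than the previous one seen on the same CPU.
 * The stored value is always updated, so a single backwards step is
 * reported once rather than on every later call.
 */
static bool sketch_time_went_backwards(unsigned int cpu, int64_t now_ns)
{
    bool backwards;

    if (cpu >= SKETCH_NR_CPUS)
        return false;   /* out of range: nothing to compare against */

    backwards = now_ns < sketch_last_time_ns[cpu];
    sketch_last_time_ns[cpu] = now_ns;
    return backwards;
}
```

The cost is one load, one compare and one store to CPU-local data per call. Comparing only against the same CPU's previous value keeps the check free of cross-CPU traffic, at the price of missing jumps that are only visible between CPUs.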