Date: Tue, 24 Feb 2015 07:35:14 +0100
From: Ingo Molnar
To: Anton Blanchard
Cc: Andrew Morton, Steven Rostedt, Michael Ellerman, Paul Mackerras, Benjamin Herrenschmidt, sam.bobroff@au1.ibm.com, Thomas Gleixner, Ingo Molnar, hpa@zytor.com, Russell King, peterz@infradead.org, Don Zickus, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, Linus Torvalds, Arjan van de Ven
Subject: Re: [PATCH 0/7] Serialise oopses, BUGs, WARNs, dump_stack, soft lockups and hard lockups
Message-ID: <20150224063513.GA15387@gmail.com>
In-Reply-To: <1424748634-9153-1-git-send-email-anton@samba.org>

* Anton Blanchard wrote:

> Every now and then I end up with an undebuggable issue
> because multiple CPUs hit something at the same time and
> everything is interleaved:
>
> CR: 48000082 XER: 00000000
> ,RI
> c0000003dc72fd10
> ,LE
> d0000000065b84e8
> Instruction dump:
> MSR: 8000000100029033
>
> Very annoying.
>
> Some architectures already have their own recursive
> locking for oopses and we have another version for
> serialising dump_stack.
>
> Create a common version and use it everywhere (oopses,
> BUGs, WARNs, dump_stack, soft lockups and hard lockups).

Dunno. I've had cases where the simultaneity of the oopses
(i.e. their garbled nature) gave me the clue about the type
of race to expect.

To still get that information: instead of taking a
serializing spinlock (or in addition to it), it would be
nice to at least preserve the true time order of the
incidents, at minimum by generating a global count for
oopses/warnings (a bit like the oops count # currently),
and to gather it first - before taking any spinlocks.

Thanks,

	Ingo
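
For illustration only, here is a rough user-space C11 sketch of the ordering idea above: take a global sequence number first, and only then take the serialising lock. It is not kernel code and not part of the patch series. The names incident_seq, report_lock and report_incident() are hypothetical, and the atomic_flag spinlock merely stands in for whatever lock the series ends up using. In the kernel one would presumably use an atomic_t with atomic_inc_return() instead of <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical global incident counter (not an existing kernel symbol). */
static atomic_ulong incident_seq;

/* Stand-in for the serialising lock; a trivial test-and-set spinlock. */
static atomic_flag report_lock = ATOMIC_FLAG_INIT;

static void report_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&report_lock,
						 memory_order_acquire))
		; /* spin until the current holder releases the lock */
}

static void report_lock_release(void)
{
	atomic_flag_clear_explicit(&report_lock, memory_order_release);
}

/*
 * Report one incident. The sequence number is taken before the lock,
 * so it reflects the order in which incidents were detected rather
 * than the order in which they happened to win the lock.
 */
unsigned long report_incident(const char *msg)
{
	unsigned long seq;

	assert(msg != NULL);

	/* Gather the global count first, lock-free. */
	seq = atomic_fetch_add(&incident_seq, 1) + 1;

	/* Only now serialise the (possibly long) output. */
	report_lock_acquire();
	fprintf(stderr, "[incident #%lu] %s\n", seq, msg);
	report_lock_release();

	return seq;
}
```

With something like this, two CPUs racing into report_incident() still print one at a time, but a lower number printed after a higher one shows that the earlier incident lost the race for the lock, so the information about simultaneity is not lost.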