Date: Mon, 25 Jun 2012 11:17:34 +0200
From: Ingo Molnar
To: ShuoX Liu, Andrew Morton
Cc: linux-kernel@vger.kernel.org, Borislav Petkov, Yanmin Zhang, "Luck, Tony", Andrew Morton, andi@firstfloor.org, Ingo Molnar, Peter Zijlstra, Thomas Gleixner
Subject: Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface
Message-ID: <20120625091734.GA25839@gmail.com>
In-Reply-To: <4FD01933.7070000@intel.com>

* ShuoX Liu wrote:

> From: ShuoX Liu
>
> On x86 machines, an MCE sometimes occurs just as the kernel is calling
> printk to output log messages to the serial console, and it is the MCE
> code that prints hardware error information, such as a bad cache or a
> bad memory bank. This causes printk recursion, and printk drops the
> MCE output.
>
> We hit this when running MTBF testing on Android ATOM mobile devices.
>
> In print_mce() we therefore disable the printk recursion check, to make
> sure the MCE logs are printed out.
>
> Signed-off-by: Yanmin Zhang
> Signed-off-by: ShuoX Liu
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..6056e94 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
>  {
>  	int ret = 0;
>
> +	printk_recursion_check_disable();
>  	pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
>  		m->extcpu, m->mcgstatus, m->bank, m->status);
>
> @@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
>  	 * (if the CPU has an implementation for that)
>  	 */
>  	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> -	if (ret == NOTIFY_STOP)
> +	if (ret == NOTIFY_STOP) {
> +		printk_recursion_check_enable();
>  		return;
> +	}
>
>  	pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
> +	printk_recursion_check_enable();

Ok, this looks useful and it solves a real problem, but I'd prefer a
better interface: instead of exposing the guts of printk to drivers in an
unsafe manner (and allowing them to keep printk in an unsafe state
indefinitely), shouldn't we introduce printk_emergency() (and variants)
that just disable the recursion check, do the printk and then re-enable
the check?

That way drivers cannot possibly leave the recursion check disabled
permanently, and it would also be much more obvious *which* printk sites
are affected by this exception.

In theory this could be achieved via a new, super-high-priority printk
level, KERN_CRASH or so, which the core printk code could check, without
introducing a new side facility.

Thanks,

	Ingo
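The difference between the patch's manual disable/enable pair and a scoped printk_emergency() can be modeled in user space. This is a minimal sketch, assuming a single global flag in place of printk's real recursion logic. model_vprint(), model_print(), mce_during_print() and the bypass counter are hypothetical illustration names, and printk_emergency() is only the name proposed in the thread, not an existing kernel function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * User-space model only: every name here is hypothetical and merely
 * illustrates the idea from the thread. None of it is kernel API.
 */

static bool in_print;       /* models the recursion guard: a print is in progress */
static int bypass_depth;    /* >0 only while printk_emergency() is running */
static int dropped;         /* messages thrown away by the recursion check */
static char console[512];   /* collected output, standing in for the console */

/* Model of the core print path: a nested call is dropped unless bypassed. */
static void model_vprint(const char *fmt, va_list ap)
{
	bool outermost;
	size_t len;

	if (in_print && bypass_depth == 0) {
		dropped++;              /* recursion detected: message lost */
		return;
	}
	outermost = !in_print;
	in_print = true;
	len = strlen(console);
	vsnprintf(console + len, sizeof(console) - len, fmt, ap);
	if (outermost)
		in_print = false;       /* only the outermost call clears the guard */
}

/* Ordinary print: subject to the recursion check. */
static void model_print(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	model_vprint(fmt, ap);
	va_end(ap);
}

/*
 * Hypothetical printk_emergency(): the recursion check is bypassed for
 * exactly this one message and restored before returning, so a caller
 * has no way to leave it disabled.
 */
static void printk_emergency(const char *fmt, ...)
{
	va_list ap;

	bypass_depth++;
	va_start(ap, fmt);
	model_vprint(fmt, ap);
	va_end(ap);
	bypass_depth--;
}

/*
 * Simulate a machine check arriving while an ordinary print is already
 * in progress, and return how many messages the recursion check dropped.
 */
static int mce_during_print(bool use_emergency)
{
	console[0] = '\0';
	dropped = 0;
	in_print = true;                /* interrupted mid-print */
	if (use_emergency)
		printk_emergency("CPU 0: Machine Check Exception\n");
	else
		model_print("CPU 0: Machine Check Exception\n");
	in_print = false;               /* the interrupted print finishes */
	assert(bypass_depth == 0);      /* the bypass never outlives one call */
	return dropped;
}
```

Calling mce_during_print(false) models the original problem: the MCE line is dropped. Calling mce_during_print(true) gets it onto the console. Because the bypass is raised and lowered inside a single call, no early return in the caller (like the NOTIFY_STOP path in the patch) can leave the check disabled. A KERN_CRASH-style level would move the same test into the core print path instead of a separate wrapper.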