Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752555AbbFKI2F (ORCPT ); Thu, 11 Jun 2015 04:28:05 -0400 Received: from cantor2.suse.de ([195.135.220.15]:54184 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750872AbbFKI1z (ORCPT ); Thu, 11 Jun 2015 04:27:55 -0400 Date: Thu, 11 Jun 2015 10:27:47 +0200 From: Borislav Petkov To: "Luck, Tony" Cc: "Chen, Gong" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH 4/4 Rebase] x86, MCE: Avoid potential deadlock in MCE context Message-ID: <20150611082747.GA30391@pd.tnic> References: <1432150538-3120-1-git-send-email-gong.chen@linux.intel.com> <1432150538-3120-5-git-send-email-gong.chen@linux.intel.com> <20150520092800.GB3645@pd.tnic> <20150522211247.GB4930@gchen.bj.intel.com> <20150522090941.GD3606@pd.tnic> <20150608134127.GE5877@pd.tnic> <3908561D78D1C84285E8C5FCA982C28F32A8E51A@ORSMSX114.amr.corp.intel.com> <20150608202658.GF5877@pd.tnic> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150608202658.GF5877@pd.tnic> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5494 Lines: 186 On Mon, Jun 08, 2015 at 10:26:58PM +0200, Borislav Petkov wrote: > On Mon, Jun 08, 2015 at 08:03:08PM +0000, Luck, Tony wrote: > > @@ -156,7 +156,8 @@ void mce_log(struct mce *mce) > > /* Emit the trace record: */ > > trace_mce_record(mce); > > > > - atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce); > > + mce_genpool_add(mce); > > + irq_work_queue(&mce_irq_work); > > > > Is it safe to call irq_work_queue() from MCE context? > > Yeah, we're using it in contexts like: Yeah, so that is safe but there's another issue which I triggered yesterday. I hope the commit message explains it thoroughly. If you see holes, please shout. I haven't run this on the EINJ box yet, that's still outstanding. In any case, I'm thinking of not rushing those changes to tip yet and have them simmer in linux-next for a whole round and have them ready for 4.3. Just to be on the safe side and have some more time hammering on them. --- From: Borislav Petkov Date: Wed, 10 Jun 2015 18:49:18 +0200 Subject: [PATCH] x86/mce: Fix logging of early boot machine checks With moving the irq_work queuing to mce_log() which simplified other code, we're faced with the problem of logging MCEs during early boot which might be there from before. In particular, facilities which we rely on, like workqueues, for example, are not initialized yet. See splat below. As a first step, move the workqueue init before the first invocation of mce_log() in __mcheck_cpu_init_generic |-> machine_check_poll |-> mce_log But that's not enough because the normal system workqueue on which stuff gets queued with schedule_work() is not initialized yet either. Therefore, we still queue the work in the genpool but delay the scheduling of that work to a late initcall where everything should be initialized. While at it, make sure we queue irq_work only if the addition to the genpool has been successful. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 0 Oops: 0010 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc7+ #12 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 task: ffffffff8197d580 ti: ffffffff8196c000 task.ti: ffffffff8196c000 RIP: 0010:[<0000000000000000>] [< (null)>] (null) Call Trace: ? irq_work_run_list irq_work_run smp_irq_work_interrupt irq_work_interrupt ? apic_send_IPI_self arch_irq_work_raise irq_work_queue mce_log machine_check_poll __mcheck_cpu_init_generic mcheck_cpu_init identify_cpu identify_boot_cpu check_bugs start_kernel x86_64_start_reservations x86_64_start_kernel Code: Bad RIP value. RIP [< (null)>] (null) RSP CR2: 0000000000000000 ---[ end trace 6d6dd507e2636bd4 ]--- Kernel panic - not syncing: Fatal exception in interrupt ---[ end Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Borislav Petkov --- arch/x86/kernel/cpu/mcheck/mce.c | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index a93a150c3583..17cef061c24a 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -159,8 +159,8 @@ void mce_log(struct mce *mce) /* Emit the trace record: */ trace_mce_record(mce); - mce_genpool_add(mce); - irq_work_queue(&mce_irq_work); + if (mce_genpool_add(mce)) + irq_work_queue(&mce_irq_work); mce->finished = 0; wmb(); @@ -480,7 +480,7 @@ int mce_available(struct cpuinfo_x86 *c) static void mce_schedule_work(void) { - if (!mce_genpool_empty()) + if (!mce_genpool_empty() && keventd_up()) schedule_work(&mce_work); } @@ -648,8 +648,9 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b) if (m.status & MCI_STATUS_ADDRV) { m.severity = severity; m.usable_addr = mce_usable_address(&m); - mce_genpool_add(&m); - mce_schedule_work(); + + if (mce_genpool_add(&m)) + mce_schedule_work(); } } @@ -1704,13 +1705,14 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c) return; } + INIT_WORK(&mce_work, mce_process_work); + init_irq_work(&mce_irq_work, mce_irq_work_cb); + machine_check_vector = do_machine_check; __mcheck_cpu_init_generic(); __mcheck_cpu_init_vendor(c); __mcheck_cpu_init_timer(); - INIT_WORK(&mce_work, mce_process_work); - init_irq_work(&mce_irq_work, mce_irq_work_cb); } /* @@ -2565,5 +2567,20 @@ static int __init mcheck_debugfs_init(void) return 0; } -late_initcall(mcheck_debugfs_init); +#else +static int __init mcheck_debugfs_init(void) {} #endif + +static int __init mcheck_late_init(void) +{ + mcheck_debugfs_init(); + + /* + * Flush out everything that has been logged during early boot, now that + * everything has been initialized (workqueues, decoders, ...). + */ + mce_schedule_work(); + + return 0; +} +late_initcall(mcheck_late_init); -- 2.3.5 -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/