Subject: Re: 4.14.9 with CONFIG_MCORE2 fails to boot
From: Dave Hansen
To: Alexander Tsoy, Greg KH, Andy Lutomirski, Thomas Gleixner, Ingo Molnar
Cc: Borislav Petkov, Boris Ostrovsky, Brian Gerst, Dave Hansen, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, "H. Peter Anvin", Josh Poimboeuf, Juergen Gross, Linus Torvalds, Peter Zijlstra, Rik van Riel, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Kernel Mailing List, stable
Date: Fri, 29 Dec 2017 09:32:13 -0800
Message-ID: <1b4569ee-8c06-4480-447b-2af8f6804053@intel.com>
In-Reply-To: <1514558513.28262.3.camel@tsoy.me>
References: <1514453602.6251.8.camel@tsoy.me> <20171229091741.GC18441@kroah.com> <1514557888.28262.1.camel@tsoy.me> <1514558513.28262.3.camel@tsoy.me>
X-Mailing-List: linux-kernel@vger.kernel.org

Does anyone have the results of a build that they can share? (vmlinux,
vmlinuz/bzImage, System.map, .config). That, plus a corresponding serial
log with an oops, would be helpful. I tried just adding MCORE2=y to my
normal config but it didn't reproduce this.
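
As a rough illustration of why System.map is useful next to an oops: a trace entry of the form symbol+offset/size can be turned into an absolute address by looking the symbol up in the map. The sketch below is only that, a sketch. The function names are hypothetical, the map address in the example is made up, and the map format is assumed to be one "<hex address> <type> <symbol>" entry per line.

```python
import re

# Matches trace tokens such as "__schedule+0x37f/0x7b0".
TOKEN_RE = re.compile(
    r"^(?P<sym>[A-Za-z_.][\w.]*)\+(?P<off>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)$"
)


def parse_token(token):
    """Split 'symbol+offset/size' into (symbol, offset, size)."""
    m = TOKEN_RE.match(token.strip())
    if m is None:
        raise ValueError("not a symbol+offset/size token: %r" % token)
    return m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16)


def load_system_map(lines):
    """Build {symbol: address} from System.map-style lines.

    Assumption: each line is '<hex address> <type> <symbol>'.
    If a name appears more than once, the first entry wins.
    """
    symbols = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        addr, _kind, name = parts
        symbols.setdefault(name, int(addr, 16))
    return symbols


def resolve(token, symbols):
    """Return the absolute address that a trace token refers to."""
    sym, off, size = parse_token(token)
    if off >= size:
        raise ValueError("offset lies outside the function: %r" % token)
    if sym not in symbols:
        raise KeyError("symbol not found in map: %s" % sym)
    return symbols[sym] + off


if __name__ == "__main__":
    # Made-up map entry, for illustration only.
    example_map = ["ffffffff81000000 T __schedule"]
    table = load_system_map(example_map)
    print(hex(resolve("__schedule+0x37f/0x7b0", table)))
```

This only computes an address. Mapping that address to a source line still needs the debug information in vmlinux.
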

If you can't send the entire build like that, just running
scripts/faddr2line on __schedule+0x37f/0x7b0 would be very enlightening.

On 12/29/2017 06:41 AM, Alexander Tsoy wrote:
> [    0.775461] NMI backtrace for cpu 0
> [    0.775461] CPU: 0 PID: 114 Comm: modprobe Not tainted 4.15.0-rc5+
...
> [    0.775461] Call Trace:
> [    0.775461]  <#DF>
> [    0.775461]  ? double_fault+0xc/0x30
> [    0.775461]  ? page_fault+0x36/0x60
> [    0.775461]  do_double_fault+0xb/0x130
> [    0.775461]  </#DF>
> [    0.775461] Code: 78 4c 89 7c 24 08 4c 89 74 24 10 4c 89 6c 24 18 4c
> 89 64 24 20 48 89 6c 24 28 48 89 5c 24 30 bb 01 00 00 00 b9 01 01 00 c0
> 0f 32 <85> d2 78 05 0f 01 f8 31 db c3 0f 1f 40 00 66 2e 0f 1f 84 00 00

From the various oopses, it looks like this happens when getting a
double fault while trying to go idle. The CPU is probably trying to
return from the double fault, but it didn't do anything useful in the
fault handler, so it just continues faulting. The NMI watchdog can still
get an oops out of it, though. It doesn't appear to be recursing *too*
far because it's not blowing through the stack and triple faulting.

Of the several traces, they all appear to be in paths that might call
safe_halt() (including the kvm async page fault code). It makes me
wonder if we've been taking double faults there for a long time, but the
new trampoline stack somehow ends up being more fragile and can't
recover from the double fault.

Couple more things: MCORE2 seems to get one oddball compiler flag
(-march=core2):

> cflags-$(CONFIG_MCORE2) += \
>         $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))

It would be interesting to see if replacing the above "$(call" with:

	$(call cc-option,-mtune=generic)

makes the problem go away the same way as changing the .config option.

The MCORE2 config option also sets CONFIG_X86_P6_NOP, which overrides
the normal X86_64 nops, if I'm reading that code correctly. But I think
that's much less likely to be the since there