Date: Thu, 1 Jun 2017 09:27:44 +0100
From: Chris Wilson
To: Linus Torvalds, Mikulas Patocka, Ingo Molnar
Cc: Peter Zijlstra, "H. Peter Anvin", linux-kernel@vger.kernel.org
Subject: [v4.12-rc3] Early boot panic on Broadwell
Message-ID: <20170601082744.GD23936@nuc-i3427.alporthouse.com>

Hi guys,

I hit an early boot panic on a Broadwell laptop (xps13-9343) that I
bisected to:

commit cbed27cdf0e3f7ea3b2259e86b9e34df02be3fe4
Author: Mikulas Patocka
Date:   Tue Apr 18 15:07:11 2017 -0400

    x86/PAT: Fix Xorg regression on CPUs that don't support PAT

    In the file arch/x86/mm/pat.c, there's a '__pat_enabled' variable. The
    variable is set to 1 by default and the function pat_init() sets
    __pat_enabled to 0 if the CPU doesn't support PAT.

    However, on AMD K6-3 CPUs, the processor initialization code never calls
    pat_init() and so __pat_enabled stays 1 and the function pat_enabled()
    returns true, even though the K6-3 CPU doesn't support PAT.

    The result of this bug is that a kernel warning is produced when
    attempting to start the Xserver and the Xserver doesn't start (fork()
    returns ENOMEM).
    Another symptom of this bug is that the framebuffer driver doesn't set
    the K6-3 MTRR registers:

      x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 3891 at arch/x86/mm/pat.c:1020 untrack_pfn+0x5c/0x9f
      ...
      x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining

    To fix the bug change pat_enabled() so that it returns true only if PAT
    initialization was actually done.

    Also, I changed boot_cpu_has(X86_FEATURE_PAT) to
    this_cpu_has(X86_FEATURE_PAT) in pat_ap_init(), so that we check the PAT
    feature on the processor that is being initialized.

In my testing, I found that reverting the s/boot_cpu_has/this_cpu_has/
change was enough to restore working behaviour:

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 83a59a6..c537bfb 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -234,7 +234,7 @@ static void pat_bsp_init(u64 pat)
 
 static void pat_ap_init(u64 pat)
 {
-	if (!this_cpu_has(X86_FEATURE_PAT)) {
+	if (!boot_cpu_has(X86_FEATURE_PAT)) {
 		/*
 		 * If this happens we are on a secondary CPU, but switched to
 		 * PAT on the boot CPU. We have no way to undo PAT.

It seems scary enough that different CPUs may have different features,
but that may just be a symptom of the boot phase?

-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre