Subject: [RFC][PATCH 1/8] x86/mm: clarify hardware vs. software "error_code"
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen, sean.j.christopherson@intel.com, peterz@infradead.org,
    tglx@linutronix.de, x86@kernel.org, luto@kernel.org
From: Dave Hansen
Date: Fri, 07 Sep 2018 12:48:54 -0700
References: <20180907194852.3C351B82@viggo.jf.intel.com>
In-Reply-To: <20180907194852.3C351B82@viggo.jf.intel.com>
Message-Id: <20180907194854.74729D71@viggo.jf.intel.com>

From: Dave Hansen

We pass around a variable called "error_code" all around the page
fault code.  Sounds simple enough, especially since "error_code" looks
like it exactly matches the values that the hardware gives us on the
stack to report the page fault error code (PFEC in SDM parlance).

But, that's not how it works.  For part of the page fault handler,
"error_code" does exactly match PFEC.  But, during later parts, it
diverges and starts to mean something a bit different.

Give it two names for its two jobs.

The place it diverges is also really screwy.  It's only in a spot
where the hardware tells us we have kernel-mode access that occurred
while we were in usermode accessing user-controlled address space.
Add a warning in there.

Signed-off-by: Dave Hansen
Cc: Sean Christopherson
Cc: "Peter Zijlstra (Intel)"
Cc: Thomas Gleixner
Cc: x86@kernel.org
Cc: Andy Lutomirski
---
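Note for reviewers, not part of the patch: below is a minimal,
stand-alone sketch of the hardware PFEC bits that "hw_error_code"
carries; the bit values are intended to mirror the X86_PF_* definitions
in arch/x86/include/asm/traps.h, and the decode_pfec()/main() helpers
are purely illustrative userspace code, not anything in the kernel tree.
It is only here to make the "(hw_error_code & 4)" (X86_PF_USER) and
"(hw_error_code & 9)" (X86_PF_PROT | X86_PF_RSVD) comments in the diff
easier to follow.

#include <stdio.h>

/* Hardware page fault error code (PFEC) bits, as pushed by the CPU. */
#define X86_PF_PROT	(1UL << 0)	/* 0: not-present page, 1: protection fault */
#define X86_PF_WRITE	(1UL << 1)	/* 0: read access,      1: write access */
#define X86_PF_USER	(1UL << 2)	/* 0: kernel-mode,      1: user-mode access */
#define X86_PF_RSVD	(1UL << 3)	/* 1: reserved bit set in a paging entry */
#define X86_PF_INSTR	(1UL << 4)	/* 1: fault was an instruction fetch */
#define X86_PF_PK	(1UL << 5)	/* 1: protection-keys violation */

/* Illustrative helper only: pretty-print a raw PFEC value. */
static void decode_pfec(unsigned long pfec)
{
	printf("PFEC %#lx: %s %s, %s%s%s%s\n", pfec,
	       (pfec & X86_PF_USER)  ? "user-mode"  : "kernel-mode",
	       (pfec & X86_PF_WRITE) ? "write"      : "read",
	       (pfec & X86_PF_PROT)  ? "protection violation" : "page not present",
	       (pfec & X86_PF_RSVD)  ? ", reserved bit set" : "",
	       (pfec & X86_PF_INSTR) ? ", instruction fetch" : "",
	       (pfec & X86_PF_PK)    ? ", protection key" : "");
}

int main(void)
{
	/* e.g. a user-mode write denied by page permissions: */
	decode_pfec(X86_PF_USER | X86_PF_WRITE | X86_PF_PROT);
	return 0;
}

With those names in hand, the WARN_ONCE added below fires exactly in
the odd case the changelog describes: user_mode(regs) is true but
X86_PF_USER is clear in hw_error_code.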
software "error_code" To: linux-kernel@vger.kernel.org Cc: Dave Hansen , sean.j.christopherson@intel.com, peterz@infradead.org, tglx@linutronix.de, x86@kernel.org, luto@kernel.org From: Dave Hansen Date: Fri, 07 Sep 2018 12:48:54 -0700 References: <20180907194852.3C351B82@viggo.jf.intel.com> In-Reply-To: <20180907194852.3C351B82@viggo.jf.intel.com> Message-Id: <20180907194854.74729D71@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen We pass around a variable called "error_code" all around the page fault code. Sounds simple enough, especially since "error_code" looks like it exactly matches the values that the hardware gives us on the stack to report the page fault error code (PFEC in SDM parlance). But, that's not how it works. For part of the page fault handler, "error_code" does exactly match PFEC. But, during later parts, it diverges and starts to mean something a bit different. Give it two names for its two jobs. The place it diverges is also really screwy. It's only in a spot where the hardware tells us we have kernel-mode access that occurred while we were in usermode accessing user-controlled address space. Add a warning in there. Signed-off-by: Dave Hansen Cc: Sean Christopherson Cc: "Peter Zijlstra (Intel)" Cc: Thomas Gleixner Cc: x86@kernel.org Cc: Andy Lutomirski --- b/arch/x86/mm/fault.c | 77 +++++++++++++++++++++++++++++++++----------------- 1 file changed, 52 insertions(+), 25 deletions(-) diff -puN arch/x86/mm/fault.c~pkeys-fault-warnings-0 arch/x86/mm/fault.c --- a/arch/x86/mm/fault.c~pkeys-fault-warnings-0 2018-09-07 11:21:45.629751903 -0700 +++ b/arch/x86/mm/fault.c 2018-09-07 11:21:45.633751903 -0700 @@ -1209,9 +1209,10 @@ static inline bool smap_violation(int er * routines. */ static noinline void -__do_page_fault(struct pt_regs *regs, unsigned long error_code, +__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code, unsigned long address) { + unsigned long sw_error_code; struct vm_area_struct *vma; struct task_struct *tsk; struct mm_struct *mm; @@ -1237,17 +1238,17 @@ __do_page_fault(struct pt_regs *regs, un * nothing more. * * This verifies that the fault happens in kernel space - * (error_code & 4) == 0, and that the fault was not a - * protection error (error_code & 9) == 0. + * (hw_error_code & 4) == 0, and that the fault was not a + * protection error (hw_error_code & 9) == 0. */ if (unlikely(fault_in_kernel_space(address))) { - if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) { + if (!(hw_error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) { if (vmalloc_fault(address) >= 0) return; } /* Can handle a stale RO->RW TLB: */ - if (spurious_fault(error_code, address)) + if (spurious_fault(hw_error_code, address)) return; /* kprobes don't want to hook the spurious faults: */ @@ -1257,7 +1258,7 @@ __do_page_fault(struct pt_regs *regs, un * Don't take the mm semaphore here. 
 		 * fault we could otherwise deadlock:
 		 */
-		bad_area_nosemaphore(regs, error_code, address, NULL);
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 
 		return;
 	}
@@ -1266,11 +1267,11 @@ __do_page_fault(struct pt_regs *regs, un
 	if (unlikely(kprobes_fault(regs)))
 		return;
 
-	if (unlikely(error_code & X86_PF_RSVD))
-		pgtable_bad(regs, error_code, address);
+	if (unlikely(hw_error_code & X86_PF_RSVD))
+		pgtable_bad(regs, hw_error_code, address);
 
-	if (unlikely(smap_violation(error_code, regs))) {
-		bad_area_nosemaphore(regs, error_code, address, NULL);
+	if (unlikely(smap_violation(hw_error_code, regs))) {
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
 	}
 
@@ -1279,11 +1280,18 @@ __do_page_fault(struct pt_regs *regs, un
 	 * in a region with pagefaults disabled then we must not take the fault
 	 */
 	if (unlikely(faulthandler_disabled() || !mm)) {
-		bad_area_nosemaphore(regs, error_code, address, NULL);
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
 	}
 
 	/*
+	 * hw_error_code is literally the "page fault error code" passed to
+	 * the kernel directly from the hardware.  But, we will shortly be
+	 * modifying it in software, so give it a new name.
+	 */
+	sw_error_code = hw_error_code;
+
+	/*
 	 * It's safe to allow irq's after cr2 has been saved and the
 	 * vmalloc fault has been handled.
 	 *
@@ -1292,7 +1300,26 @@ __do_page_fault(struct pt_regs *regs, un
 	 */
 	if (user_mode(regs)) {
 		local_irq_enable();
-		error_code |= X86_PF_USER;
+		/*
+		 * Up to this point, X86_PF_USER set in hw_error_code
+		 * indicated a user-mode access.  But, after this,
+		 * X86_PF_USER in sw_error_code will indicate either
+		 * that, *or* an implicit kernel(supervisor)-mode access
+		 * which originated from user mode.
+		 */
+		if (!(hw_error_code & X86_PF_USER)) {
+			/*
+			 * The CPU was in user mode, but the CPU says
+			 * the fault was not a user-mode access.
+			 * Must be an implicit kernel-mode access,
+			 * which we do not expect to happen in the
+			 * user address space.
+			 */
+			WARN_ONCE(1, "kernel-mode error from user-mode: %lx\n",
+					hw_error_code);
+
+			sw_error_code |= X86_PF_USER;
+		}
 		flags |= FAULT_FLAG_USER;
 	} else {
 		if (regs->flags & X86_EFLAGS_IF)
@@ -1301,9 +1328,9 @@ __do_page_fault(struct pt_regs *regs, un
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
-	if (error_code & X86_PF_WRITE)
+	if (sw_error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
-	if (error_code & X86_PF_INSTR)
+	if (sw_error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
 	/*
@@ -1323,9 +1350,9 @@ __do_page_fault(struct pt_regs *regs, un
 	 * space check, thus avoiding the deadlock:
 	 */
 	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
-		if (!(error_code & X86_PF_USER) &&
+		if (!(sw_error_code & X86_PF_USER) &&
 		    !search_exception_tables(regs->ip)) {
-			bad_area_nosemaphore(regs, error_code, address, NULL);
+			bad_area_nosemaphore(regs, sw_error_code, address, NULL);
 			return;
 		}
 retry:
@@ -1341,16 +1368,16 @@ retry:
 
 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, sw_error_code, address);
 		return;
 	}
 	if (likely(vma->vm_start <= address))
 		goto good_area;
 	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, sw_error_code, address);
 		return;
 	}
-	if (error_code & X86_PF_USER) {
+	if (sw_error_code & X86_PF_USER) {
 		/*
 		 * Accessing the stack below %sp is always a bug.
 		 * The large cushion allows instructions like enter
@@ -1358,12 +1385,12 @@ retry:
 		 * 32 pointers and then decrements %sp by 65535.)
 		 */
 		if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
-			bad_area(regs, error_code, address);
+			bad_area(regs, sw_error_code, address);
 			return;
 		}
 	}
 	if (unlikely(expand_stack(vma, address))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, sw_error_code, address);
 		return;
 	}
 
@@ -1372,8 +1399,8 @@ retry:
 	 * we can handle it..
 	 */
 good_area:
-	if (unlikely(access_error(error_code, vma))) {
-		bad_area_access_error(regs, error_code, address, vma);
+	if (unlikely(access_error(sw_error_code, vma))) {
+		bad_area_access_error(regs, sw_error_code, address, vma);
 		return;
 	}
 
@@ -1415,13 +1442,13 @@ good_area:
 			return;
 
 		/* Not returning to user mode? Handle exceptions or die: */
-		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+		no_context(regs, sw_error_code, address, SIGBUS, BUS_ADRERR);
 		return;
 	}
 
 	up_read(&mm->mmap_sem);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
-		mm_fault_error(regs, error_code, address, &pkey, fault);
+		mm_fault_error(regs, sw_error_code, address, &pkey, fault);
 		return;
 	}
 
_