From: Yu-cheng Yu
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu
Subject: [PATCH v10 11/26] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY_HW to _PAGE_COW
Date: Wed, 29 Apr 2020 15:07:17 -0700
Message-Id: <20200429220732.31602-12-yu-cheng.yu@intel.com>
In-Reply-To: <20200429220732.31602-1-yu-cheng.yu@intel.com>
References: <20200429220732.31602-1-yu-cheng.yu@intel.com>

Once shadow stack is introduced, a [R/O + _PAGE_DIRTY_HW] PTE is reserved
for shadow stack. Copy-on-write PTEs have [R/O + _PAGE_COW].

When a PTE goes from [R/W + _PAGE_DIRTY_HW] to [R/O + _PAGE_COW], it could
transiently become a shadow stack PTE in two cases:

In the first case, some processors can start a write but end up seeing a
read-only PTE by the time they get to the Dirty bit, creating a transient
shadow stack PTE. However, this will not occur on processors supporting
shadow stack, so no TLB flush is needed here.

In the second case, if software tests and replaces _PAGE_DIRTY_HW with
_PAGE_COW non-atomically, a transient shadow stack PTE can exist. This is
prevented with cmpxchg.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights into the issue. Jann Horn provided the cmpxchg solution.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kees Cook
---

v10:
- Replace bit shift with pte_wrprotect()/pmd_wrprotect(), which use bit
  test & shift.
- Move READ_ONCE of old_pte into try_cmpxchg() loop.
- Change static_cpu_has() to cpu_feature_enabled().

v9:
- Change compile-time conditionals to runtime checks.
- Fix parameters of try_cmpxchg(): change pte_t/pmd_t to pte_t.pte/pmd_t.pmd.

v4:
- Implement try_cmpxchg().

 arch/x86/include/asm/pgtable.h | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f4870cd040de..eaa38adb1038 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1316,6 +1316,32 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+	/*
+	 * Some processors can start a write, but end up seeing a read-only
+	 * PTE by the time they get to the Dirty bit.  In this case, they
+	 * will set the Dirty bit, leaving a read-only, Dirty PTE which
+	 * looks like a shadow stack PTE.
+	 *
+	 * However, this behavior has been improved and will not occur on
+	 * processors supporting shadow stack.  Without this guarantee, a
+	 * transition to a non-present PTE and flush the TLB would be
+	 * needed.
+	 *
+	 * When changing a writable PTE to read-only and if the PTE has
+	 * _PAGE_DIRTY_HW set, move that bit to _PAGE_COW so that the
+	 * PTE is not a shadow stack PTE.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pte_t old_pte, new_pte;
+
+		do {
+			old_pte = READ_ONCE(*ptep);
+			new_pte = pte_wrprotect(old_pte);
+
+		} while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
+
+		return;
+	}
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
 }
 
@@ -1372,6 +1398,32 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+	/*
+	 * Some processors can start a write, but end up seeing a read-only
+	 * PMD by the time they get to the Dirty bit.  In this case, they
+	 * will set the Dirty bit, leaving a read-only, Dirty PMD which
+	 * looks like a Shadow Stack PMD.
+	 *
+	 * However, this behavior has been improved and will not occur on
+	 * processors supporting Shadow Stack.  Without this guarantee, a
+	 * transition to a non-present PMD and flush the TLB would be
+	 * needed.
+	 *
+	 * When changing a writable PMD to read-only and if the PMD has
+	 * _PAGE_DIRTY_HW set, we move that bit to _PAGE_COW so that the
+	 * PMD is not a shadow stack PMD.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pmd_t old_pmd, new_pmd;
+
+		do {
+			old_pmd = READ_ONCE(*pmdp);
+			new_pmd = pmd_wrprotect(old_pmd);
+
+		} while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd, pmd_val(new_pmd)));
+
+		return;
+	}
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
-- 
2.21.0