From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, linux-kernel@vger.kernel.org
Subject: [PATCH V8 38/44] memremap_pages: Define pgmap_mk_{readwrite|noaccess}() calls
Date: Thu, 27 Jan 2022 09:54:59 -0800
Message-Id: <20220127175505.851391-39-ira.weiny@intel.com>
In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com>
References: <20220127175505.851391-1-ira.weiny@intel.com>

From: Ira Weiny

Users need a way to flag valid access to pages which have been
protected with PGMAP protections.  Provide this by defining the
pgmap_mk_*() accessor functions.

pgmap_mk_{readwrite|noaccess}() take a struct page for convenience.
They determine whether the page is protected by dev_pagemap
protections and, if so, perform the requested operation.

In addition, the lower level __pgmap_* functions are exported.  They
take the dev_pagemap object directly for internal users who already
have knowledge of the dev_pagemap.

All changes to the protections must go through the above calls.  They
abstract the protection implementation (currently the PKS API) from
the upper layer users.

Furthermore, the calls are nestable through the use of a per task
reference count.  This ensures that the first call to re-enable
protection does not 'break' the last access of the device memory.
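As an illustration only (not part of the patch), a sketch of the
intended nesting behavior; the kmap_local_page() integration is
assumed from elsewhere in this series, and page_a/page_b are
hypothetical protected devmap pages:

	/* One task; counts refer to current->pgmap_prot_count */
	addr_a = kmap_local_page(page_a);	/* 0 -> 1: pks_mk_readwrite() */
	addr_b = kmap_local_page(page_b);	/* 1 -> 2: already writable   */
	/* ... access both pages ... */
	kunmap_local(addr_b);			/* 2 -> 1: access retained    */
	kunmap_local(addr_a);			/* 1 -> 0: pks_mk_noaccess()  */

Without the count, the first kunmap_local() would remove access while
page_a is still legitimately mapped.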
Access to device memory during exceptions (#PF) is expected only from
user faults.  Therefore there is no need to maintain the reference
count when entering or exiting exceptions.  However, reference
counting will occur during the exception.  Recall that protection is
automatically enabled during exceptions by the PKS core.[1]

NOTE: It is not anticipated that any code paths will directly nest
these calls.  For this reason multiple reviewers, including Dan and
Thomas, asked why this reference counting is needed at this level
rather than in a higher level call such as kmap_{atomic,local_page}().
The reason is that pgmap_mk_readwrite() can nest with other callers of
pgmap_mk_*() such as kmap_{atomic,local_page}().  Therefore the
reference counting is pushed to the lowest level, and these calls are
simply guaranteed to be nestable.

[1] https://lore.kernel.org/lkml/20210401225833.566238-9-ira.weiny@intel.com/

Signed-off-by: Ira Weiny

---
Changes for V8:
	Split these functions into their own patch.  This helps to
	clarify the commit message and usage.
---
 include/linux/mm.h    | 34 ++++++++++++++++++++++++++++++++++
 include/linux/sched.h |  7 +++++++
 init/init_task.c      |  3 +++
 mm/memremap.c         | 14 ++++++++++++++
 4 files changed, 58 insertions(+)
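(Illustration only, not part of the patch: a minimal hypothetical
direct caller of the new calls; 'page' is assumed to be a protected
devmap page, and dst/len are assumed to exist.)

	pgmap_mk_readwrite(page);		/* 0 -> 1: open PKS access   */
	memcpy(dst, page_address(page), len);	/* access is now valid       */
	pgmap_mk_noaccess(page);		/* 1 -> 0: restore no-access */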
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6e4a2758e3d3..60044de77c54 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1162,10 +1162,44 @@ static inline bool devmap_protected(struct page *page)
 	return false;
 }
 
+void __pgmap_mk_readwrite(struct dev_pagemap *pgmap);
+void __pgmap_mk_noaccess(struct dev_pagemap *pgmap);
+
+static inline bool pgmap_check_pgmap_prot(struct page *page)
+{
+	if (!devmap_protected(page))
+		return false;
+
+	/*
+	 * There is no known use case to change permissions in an irq for pgmap
+	 * pages
+	 */
+	lockdep_assert_in_irq();
+	return true;
+}
+
+static inline void pgmap_mk_readwrite(struct page *page)
+{
+	if (!pgmap_check_pgmap_prot(page))
+		return;
+	__pgmap_mk_readwrite(page->pgmap);
+}
+static inline void pgmap_mk_noaccess(struct page *page)
+{
+	if (!pgmap_check_pgmap_prot(page))
+		return;
+	__pgmap_mk_noaccess(page->pgmap);
+}
+
 bool pgmap_protection_available(void);
 
 #else
 
+static inline void __pgmap_mk_readwrite(struct dev_pagemap *pgmap) { }
+static inline void __pgmap_mk_noaccess(struct dev_pagemap *pgmap) { }
+static inline void pgmap_mk_readwrite(struct page *page) { }
+static inline void pgmap_mk_noaccess(struct page *page) { }
+
 static inline bool pgmap_protection_available(void)
 {
 	return false;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f5b2be39a78c..5020ed7e67b7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1492,6 +1492,13 @@ struct task_struct {
 	struct callback_head		l1d_flush_kill;
 #endif
 
+#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION
+	/*
+	 * NOTE: pgmap_prot_count is modified within a single thread of
+	 * execution. So it does not need to be atomic_t.
+	 */
+	u32				pgmap_prot_count;
+#endif
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f03511a..948b32cf8139 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -209,6 +209,9 @@ struct task_struct init_task
 #ifdef CONFIG_SECCOMP_FILTER
 	.seccomp	= { .filter_count = ATOMIC_INIT(0) },
 #endif
+#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION
+	.pgmap_prot_count = 0,
+#endif
 };
 EXPORT_SYMBOL(init_task);
 
diff --git a/mm/memremap.c b/mm/memremap.c
index d3e6f328a711..b75c4f778c59 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -96,6 +96,20 @@ static void devmap_protection_disable(void)
 	static_branch_dec(&dev_pgmap_protection_static_key);
 }
 
+void __pgmap_mk_readwrite(struct dev_pagemap *pgmap)
+{
+	if (!current->pgmap_prot_count++)
+		pks_mk_readwrite(PKS_KEY_PGMAP_PROTECTION);
+}
+EXPORT_SYMBOL_GPL(__pgmap_mk_readwrite);
+
+void __pgmap_mk_noaccess(struct dev_pagemap *pgmap)
+{
+	if (!--current->pgmap_prot_count)
+		pks_mk_noaccess(PKS_KEY_PGMAP_PROTECTION);
+}
+EXPORT_SYMBOL_GPL(__pgmap_mk_noaccess);
+
 bool pgmap_protection_available(void)
 {
 	return pks_available();
-- 
2.31.1