Received: by 2002:a05:7412:b101:b0:e2:908c:2ebd with SMTP id az1csp2865928rdb; Wed, 15 Nov 2023 12:58:26 -0800 (PST) X-Google-Smtp-Source: AGHT+IH2KYgKZ2S8fm4KK1Co3HNGS567mxU2JVHEWbW7C5y4Z162C1el8DI0MntCGHHdSMv57pot X-Received: by 2002:a05:6a20:e111:b0:161:2389:e34b with SMTP id kr17-20020a056a20e11100b001612389e34bmr16462509pzb.13.1700081906031; Wed, 15 Nov 2023 12:58:26 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700081906; cv=none; d=google.com; s=arc-20160816; b=AHw6brQDf9pc0iAMBoaur8HDoYPbAsIrLzjwz+8ahpoTHjoYqFzvSknO+GEEo5kiTI 41RrE/dTwaLmJ7W+nHwk1OB52UHMADaQYJf6ch0F5XXgOBO4uXkgUAEOmkdidMb9jbPb yemLc3VO4VeBDzhzHXXek3bdHFuGcQ3ctHwhJ4t4ocMKF6UkHOaRBD8kIorTd5o8LLdc HsD1UY+bcaAktCOCyZJ5CMgWgXP9ExJNthQ5j1f3am41saokTd3EsXTsPw5TznGU7IkZ fFYLCTMA7G4PgjZU0JYZhXnBB7xdTDMhxzkEMhePxP3WN2Jy7gWYJhBsgzx5pm40LbDn izKg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:references:to:from:subject:cc :message-id:date:content-transfer-encoding:mime-version :dkim-signature; bh=FXaMwXj5ij7vkDizR6jQAfP6nHUZJLBqRO8hvfyF58M=; fh=g2HrtGQ2SKEJjmO4h3B/NWxQLF02BrQLKX946LuMoK4=; b=FLsWVlLbLvsNyaXxqXVI01lcofONKhoJt3muwai7lmbswT8KkKp9mkVI4iwklNW5gY qJDW2Pg+zaOKnTer0nfjPn2QsLeujcCyFqDBXdtLWHjniUrV2kJYdlYd+w0+OwJJLvKG +h5m3Jvfgnbs6E3/kqZpHgTU2ijqFQ5gJYXWyu6tsoWtzxlI5tBHBkr+4mfZrbYj1+EM zAzSqfITJe7lOY0whTaRQrPtJEz1JnCBHB9Iaq1nyTea/g2xiPEH4uJozV8ObT78E8mm 3/IAB4WMYKT+uaOHsnJtHAXwcf7j2exaVIKXqVl3ntGCKjN0K5wp6Hw4+GwiMNTtUuGV dBsg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HoZtaZW2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id z16-20020a170903019000b001c9c108bf6esi11204825plg.387.2023.11.15.12.58.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 Nov 2023 12:58:26 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HoZtaZW2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id C035580D44D9; Wed, 15 Nov 2023 12:58:24 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344654AbjKOU6W (ORCPT + 99 others); Wed, 15 Nov 2023 15:58:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344779AbjKOU6E (ORCPT ); Wed, 15 Nov 2023 15:58:04 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF6451BF7; Wed, 15 Nov 2023 12:57:34 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2CEEEC4160E; Wed, 15 Nov 2023 20:48:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1700081338; bh=PJculjTWiL2GB3exzyUInoYkoRADOvSehHA7YLmK9QU=; h=Date:Cc:Subject:From:To:References:In-Reply-To:From; b=HoZtaZW2Nj0gACIllbuUieALzComiVjQK9O/TLxPbX6KK+VvX/Zoe6+qUDhyQdcpK zSiEgGXP4NRKTWIAdAFFuieVlJkkNebZq3zFiUlEUtL2nT7a0LgEUSAyazgeNK4YK8 RnfNq/0BWN1om6SmTCM7xVSC4KWV2kDe9sIZ0swVMAjku6+UbVAgsce6lYQbcBXqxm jiQP5TItttouyWlf8WqToNqx9QECMIGrpps5fN5tzhiy+fRwE+4QV5nE71/yghJBKh DF4jCjluzWVsvW5ZcpLJ2CtlF1IAH8Ojz5vw2WWMLx6qo6pMslgvVallP/tSrf2xoO TVZcp9A4nhSSg== Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Wed, 15 Nov 2023 22:48:52 +0200 Message-Id: Cc: , , , , , , , "Sean Christopherson" Subject: Re: [PATCH v6 04/12] x86/sgx: Implement basic EPC misc cgroup functionality From: "Jarkko Sakkinen" To: "Haitao Huang" , , , , , , , , , , , , X-Mailer: aerc 0.15.2 References: <20231030182013.40086-1-haitao.huang@linux.intel.com> <20231030182013.40086-5-haitao.huang@linux.intel.com> In-Reply-To: <20231030182013.40086-5-haitao.huang@linux.intel.com> X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 15 Nov 2023 12:58:25 -0800 (PST) On Mon Oct 30, 2023 at 8:20 PM EET, Haitao Huang wrote: > From: Kristen Carlson Accardi > > Implement support for cgroup control of SGX Enclave Page Cache (EPC) > memory using the misc cgroup controller. EPC memory is independent > from normal system memory, e.g. must be reserved at boot from RAM and > cannot be converted between EPC and normal memory while the system is > running. EPC is managed by the SGX subsystem and is not accounted by > the memory controller. > > Much like normal system memory, EPC memory can be overcommitted via > virtual memory techniques and pages can be swapped out of the EPC to > their backing store (normal system memory, e.g. shmem). The SGX EPC > subsystem is analogous to the memory subsystem and the SGX EPC controller > is in turn analogous to the memory controller; it implements limit and > protection models for EPC memory. > > The misc controller provides a mechanism to set a hard limit of EPC > usage via the "sgx_epc" resource in "misc.max". The total EPC memory > available on the system is reported via the "sgx_epc" resource in > "misc.capacity". > > This patch was modified from the previous version to only add basic EPC > cgroup structure, accounting allocations for cgroup usage > (charge/uncharge), setup misc cgroup callbacks, set total EPC capacity. > > For now, the EPC cgroup simply blocks additional EPC allocation in > sgx_alloc_epc_page() when the limit is reached. Reclaimable pages are > still tracked in the global active list, only reclaimed by the global > reclaimer when the total free page count is lower than a threshold. > > Later patches will reorganize the tracking and reclamation code in the > globale reclaimer and implement per-cgroup tracking and reclaiming. > > Co-developed-by: Sean Christopherson > Signed-off-by: Sean Christopherson > Signed-off-by: Kristen Carlson Accardi > Co-developed-by: Haitao Huang > Signed-off-by: Haitao Huang > --- > V6: > - Split the original large patch"Limit process EPC usage with misc > cgroup controller" and restructure it (Kai) > --- > arch/x86/Kconfig | 13 ++++ > arch/x86/kernel/cpu/sgx/Makefile | 1 + > arch/x86/kernel/cpu/sgx/epc_cgroup.c | 103 +++++++++++++++++++++++++++ > arch/x86/kernel/cpu/sgx/epc_cgroup.h | 36 ++++++++++ > arch/x86/kernel/cpu/sgx/main.c | 28 ++++++++ > arch/x86/kernel/cpu/sgx/sgx.h | 3 + > 6 files changed, 184 insertions(+) > create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c > create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 66bfabae8814..e17c5dc3aea4 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1921,6 +1921,19 @@ config X86_SGX > =20 > If unsure, say N. > =20 > +config CGROUP_SGX_EPC > + bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for = Intel SGX" > + depends on X86_SGX && CGROUP_MISC > + help > + Provides control over the EPC footprint of tasks in a cgroup via > + the Miscellaneous cgroup controller. > + > + EPC is a subset of regular memory that is usable only by SGX > + enclaves and is very limited in quantity, e.g. less than 1% > + of total DRAM. > + > + Say N if unsure. > + > config X86_USER_SHADOW_STACK > bool "X86 userspace shadow stack" > depends on AS_WRUSS > diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/M= akefile > index 9c1656779b2a..12901a488da7 100644 > --- a/arch/x86/kernel/cpu/sgx/Makefile > +++ b/arch/x86/kernel/cpu/sgx/Makefile > @@ -4,3 +4,4 @@ obj-y +=3D \ > ioctl.o \ > main.o > obj-$(CONFIG_X86_SGX_KVM) +=3D virt.o > +obj-$(CONFIG_CGROUP_SGX_EPC) +=3D epc_cgroup.o > diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/s= gx/epc_cgroup.c > new file mode 100644 > index 000000000000..500627d0563f > --- /dev/null > +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c > @@ -0,0 +1,103 @@ > +// SPDX-License-Identifier: GPL-2.0 > +// Copyright(c) 2022 Intel Corporation. > + > +#include > +#include > +#include "epc_cgroup.h" > + > +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct = misc_cg *cg) > +{ > + return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv); > +} > + > +static inline bool sgx_epc_cgroup_disabled(void) > +{ > + return !cgroup_subsys_enabled(misc_cgrp_subsys); > +} > + > +/** > + * sgx_epc_cgroup_try_charge() - hierarchically try to charge a single E= PC page > + * > + * Returns EPC cgroup or NULL on success, -errno on failure. Should have a description explaining what "charging hierarchically" is all about. This is too cryptic like this. E.g. consider wahat non-hierarchically charging means. There must be opposite end in order to have a meaning (for anything expressed with a language). > + */ > +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void) > +{ > + struct sgx_epc_cgroup *epc_cg; > + int ret; > + > + if (sgx_epc_cgroup_disabled()) > + return NULL; > + > + epc_cg =3D sgx_epc_cgroup_from_misc_cg(get_current_misc_cg()); > + ret =3D misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); > + > + if (!ret) { > + /* No epc_cg returned, release ref from get_current_misc_cg() */ > + put_misc_cg(epc_cg->cg); > + return ERR_PTR(-ENOMEM); > + } > + > + /* Ref released in sgx_epc_cgroup_uncharge() */ > + return epc_cg; > +} > + > +/** > + * sgx_epc_cgroup_uncharge() - hierarchically uncharge EPC pages > + * @epc_cg: the charged epc cgroup > + */ > +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) > +{ > + if (sgx_epc_cgroup_disabled()) > + return; > + > + misc_cg_uncharge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); > + > + /* Ref got from sgx_epc_cgroup_try_charge() */ > + put_misc_cg(epc_cg->cg); > +} > + > +static void sgx_epc_cgroup_free(struct misc_cg *cg) > +{ > + struct sgx_epc_cgroup *epc_cg; > + > + epc_cg =3D sgx_epc_cgroup_from_misc_cg(cg); > + if (!epc_cg) > + return; > + > + kfree(epc_cg); > +} > + > +static int sgx_epc_cgroup_alloc(struct misc_cg *cg); > + > +const struct misc_operations_struct sgx_epc_cgroup_ops =3D { > + .alloc =3D sgx_epc_cgroup_alloc, > + .free =3D sgx_epc_cgroup_free, > +}; > + > +static int sgx_epc_cgroup_alloc(struct misc_cg *cg) > +{ > + struct sgx_epc_cgroup *epc_cg; > + > + epc_cg =3D kzalloc(sizeof(*epc_cg), GFP_KERNEL); > + if (!epc_cg) > + return -ENOMEM; > + > + cg->res[MISC_CG_RES_SGX_EPC].misc_ops =3D &sgx_epc_cgroup_ops; > + cg->res[MISC_CG_RES_SGX_EPC].priv =3D epc_cg; > + epc_cg->cg =3D cg; > + return 0; > +} > + > +static int __init sgx_epc_cgroup_init(void) > +{ > + struct misc_cg *cg; > + > + if (!boot_cpu_has(X86_FEATURE_SGX)) > + return 0; > + > + cg =3D misc_cg_root(); > + BUG_ON(!cg); > + > + return sgx_epc_cgroup_alloc(cg); > +} > +subsys_initcall(sgx_epc_cgroup_init); > diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/s= gx/epc_cgroup.h > new file mode 100644 > index 000000000000..c3abfe82be15 > --- /dev/null > +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h > @@ -0,0 +1,36 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* Copyright(c) 2022 Intel Corporation. */ > +#ifndef _INTEL_SGX_EPC_CGROUP_H_ > +#define _INTEL_SGX_EPC_CGROUP_H_ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "sgx.h" > + > +#ifndef CONFIG_CGROUP_SGX_EPC > +#define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES > +struct sgx_epc_cgroup; > + > +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void) > +{ > + return NULL; > +} > + > +static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg= ) { } > +#else > +struct sgx_epc_cgroup { > + struct misc_cg *cg; > +}; > + > +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void); > +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg); > +bool sgx_epc_cgroup_lru_empty(struct misc_cg *root); > + > +#endif > + > +#endif /* _INTEL_SGX_EPC_CGROUP_H_ */ > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/mai= n.c > index 166692f2d501..07606f391540 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -6,6 +6,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -17,6 +18,7 @@ > #include "driver.h" > #include "encl.h" > #include "encls.h" > +#include "epc_cgroup.h" > =20 > struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; > static int sgx_nr_epc_sections; > @@ -559,6 +561,11 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page = *page) > struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) > { > struct sgx_epc_page *page; > + struct sgx_epc_cgroup *epc_cg; > + > + epc_cg =3D sgx_epc_cgroup_try_charge(); > + if (IS_ERR(epc_cg)) > + return ERR_CAST(epc_cg); > =20 > for ( ; ; ) { > page =3D __sgx_alloc_epc_page(); > @@ -580,10 +587,21 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner= , bool reclaim) > break; > } > =20 > + /* > + * Need to do a global reclamation if cgroup was not full but free > + * physical pages run out, causing __sgx_alloc_epc_page() to fail. > + */ > sgx_reclaim_pages(); > cond_resched(); > } > =20 > + if (!IS_ERR(page)) { > + WARN_ON_ONCE(page->epc_cg); > + page->epc_cg =3D epc_cg; > + } else { > + sgx_epc_cgroup_uncharge(epc_cg); > + } > + > if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) > wake_up(&ksgxd_waitq); > =20 > @@ -604,6 +622,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page) > struct sgx_epc_section *section =3D &sgx_epc_sections[page->section]; > struct sgx_numa_node *node =3D section->node; > =20 > + if (page->epc_cg) { > + sgx_epc_cgroup_uncharge(page->epc_cg); > + page->epc_cg =3D NULL; > + } > + > spin_lock(&node->lock); > =20 > page->owner =3D NULL; > @@ -643,6 +666,7 @@ static bool __init sgx_setup_epc_section(u64 phys_add= r, u64 size, > section->pages[i].flags =3D 0; > section->pages[i].owner =3D NULL; > section->pages[i].poison =3D 0; > + section->pages[i].epc_cg =3D NULL; > list_add_tail(§ion->pages[i].list, &sgx_dirty_page_list); > } > =20 > @@ -787,6 +811,7 @@ static void __init arch_update_sysfs_visibility(int n= id) {} > static bool __init sgx_page_cache_init(void) > { > u32 eax, ebx, ecx, edx, type; > + u64 capacity =3D 0; > u64 pa, size; > int nid; > int i; > @@ -837,6 +862,7 @@ static bool __init sgx_page_cache_init(void) > =20 > sgx_epc_sections[i].node =3D &sgx_numa_nodes[nid]; > sgx_numa_nodes[nid].size +=3D size; > + capacity +=3D size; > =20 > sgx_nr_epc_sections++; > } > @@ -846,6 +872,8 @@ static bool __init sgx_page_cache_init(void) > return false; > } > =20 > + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, capacity); > + > return true; > } > =20 > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.= h > index d2dad21259a8..b1786774b8d2 100644 > --- a/arch/x86/kernel/cpu/sgx/sgx.h > +++ b/arch/x86/kernel/cpu/sgx/sgx.h > @@ -29,12 +29,15 @@ > /* Pages on free list */ > #define SGX_EPC_PAGE_IS_FREE BIT(1) > =20 > +struct sgx_epc_cgroup; > + > struct sgx_epc_page { > unsigned int section; > u16 flags; > u16 poison; > struct sgx_encl_page *owner; > struct list_head list; > + struct sgx_epc_cgroup *epc_cg; > }; > =20 > /* BR, Jarkko