From: Kristen Carlson Accardi
To: linux-sgx@vger.kernel.org, Jarkko Sakkinen, Dave Hansen,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
    "H. Peter Anvin"
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com,
    roman.gushchin@linux.dev, hannes@cmpxchg.org, shakeelb@google.com,
    Kristen Carlson Accardi, stable@vger.kernel.org
Subject: [PATCH v3] x86/sgx: Set active memcg prior to shmem allocation
Date: Fri, 20 May 2022 10:42:47 -0700
Message-Id: <20220520174248.4918-1-kristen@linux.intel.com>

When the system runs out of enclave memory, SGX can reclaim EPC pages
by swapping to normal RAM. These backing pages are allocated via a
per-enclave shared memory area. Since SGX allows unlimited over
commit on EPC memory, the reclaimer thread can allocate a large
number of backing RAM pages in response to EPC memory pressure.

When the shared memory backing RAM allocation occurs during
the reclaimer thread context, the shared memory is charged to
the root memory control group, and the shmem usage of the enclave
is not properly accounted for, making cgroups ineffective at
limiting the amount of RAM an enclave can consume.

For example, when using a cgroup to launch a set of test
enclaves, the kernel does not properly account for 50% - 75% of
shmem page allocations on average. In the worst case, when
nearly all allocations occur during the reclaimer thread, the
kernel accounts less than a percent of the amount of shmem used
by the enclave's cgroup to the correct cgroup.

SGX stores a list of mm_structs that are associated with
an enclave. Pick one of them during reclaim and charge that
mm's memcg with the shmem allocation. The one that gets picked
is arbitrary, but this list almost always only has one mm. The
cases where there is more than one mm with different memcg's
are not worth considering.

Create a new function - sgx_encl_alloc_backing(). This function
is used whenever a new backing storage page needs to be
allocated. Previously the same function was used for page
allocation as well as retrieving a previously allocated page.
Prior to backing page allocation, if there is a mm_struct
associated with the enclave that is requesting the allocation,
it is set as the active memory control group.

Signed-off-by: Kristen Carlson Accardi
Reviewed-by: Shakeel Butt
Acked-by: Roman Gushchin
Cc: stable@vger.kernel.org
---
V2 -> V3:
 Changed memcg variable names in sgx_encl_alloc_backing()
 and removed some whitespace.

V1 -> V2:
 Changed sgx_encl_set_active_memcg() to simply return the correct
 memcg for the enclave and renamed to sgx_encl_get_mem_cgroup().

 Created helper function current_is_ksgxd() to improve readability.

 Use mmget_not_zero()/mmput_async() when searching mm_list.

 Move call to set_active_memcg() to sgx_encl_alloc_backing()
 and use mem_cgroup_put() to avoid leaking a memcg reference.

 Address review feedback regarding comments and commit log.
---
 arch/x86/kernel/cpu/sgx/encl.c | 105 ++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/encl.h |  11 +++-
 arch/x86/kernel/cpu/sgx/main.c |   4 +-
 3 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..6f05e3d919f7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	else
 		page_index = PFN_DOWN(encl->size);
 
-	ret = sgx_encl_get_backing(encl, page_index, &b);
+	ret = sgx_encl_lookup_backing(encl, page_index, &b);
 	if (ret)
 		return ret;
 
@@ -574,7 +574,7 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
  *   0 on success,
  *   -errno otherwise.
  */
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
+static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 			 struct sgx_backing *backing)
 {
 	pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
@@ -601,6 +601,107 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 	return 0;
 }
 
+/*
+ * When called from ksgxd, returns the mem_cgroup of a struct mm stored
+ * in the enclave's mm_list. When not called from ksgxd, just returns
+ * the mem_cgroup of the current task.
+ */
+static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
+{
+	struct mem_cgroup *memcg = NULL;
+	struct sgx_encl_mm *encl_mm;
+	int idx;
+
+	/*
+	 * If called from normal task context, return the mem_cgroup
+	 * of the current task's mm. The remainder of the handling is for
+	 * ksgxd.
+	 */
+	if (!current_is_ksgxd())
+		return get_mem_cgroup_from_mm(current->mm);
+
+	/*
+	 * Search the enclave's mm_list to find an mm associated with
+	 * this enclave to charge the allocation to.
+	 */
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+		if (!mmget_not_zero(encl_mm->mm))
+			continue;
+
+		memcg = get_mem_cgroup_from_mm(encl_mm->mm);
+
+		mmput_async(encl_mm->mm);
+
+		break;
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	/*
+	 * In the rare case that there isn't an mm associated with
+	 * the enclave, set memcg to the current active mem_cgroup.
+	 * This will be the root mem_cgroup if there is no active
+	 * mem_cgroup.
+	 */
+	if (!memcg)
+		return get_mem_cgroup_from_mm(NULL);
+
+	return memcg;
+}
+
+/**
+ * sgx_encl_alloc_backing() - allocate a new backing storage page
+ * @encl:	an enclave pointer
+ * @page_index:	enclave page index
+ * @backing:	data for accessing backing storage for the page
+ *
+ * When called from ksgxd, sets the active memcg from one of the
+ * mms in the enclave's mm_list prior to any backing page allocation,
+ * in order to ensure that shmem page allocations are charged to the
+ * enclave.
+ *
+ * Return:
+ *   0 on success,
+ *   -errno otherwise.
+ */
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing)
+{
+	struct mem_cgroup *encl_memcg = sgx_encl_get_mem_cgroup(encl);
+	struct mem_cgroup *memcg = set_active_memcg(encl_memcg);
+	int ret;
+
+	ret = sgx_encl_get_backing(encl, page_index, backing);
+
+	set_active_memcg(memcg);
+	mem_cgroup_put(encl_memcg);
+
+	return ret;
+}
+
+/**
+ * sgx_encl_lookup_backing() - retrieve an existing backing storage page
+ * @encl:	an enclave pointer
+ * @page_index:	enclave page index
+ * @backing:	data for accessing backing storage for the page
+ *
+ * Retrieve a backing page for loading data back into an EPC page with ELDU.
+ * It is the caller's responsibility to ensure that it is appropriate to use
+ * sgx_encl_lookup_backing() rather than sgx_encl_alloc_backing(). If lookup is
+ * not used correctly, this will cause an allocation which is not accounted for.
+ *
+ * Return:
+ *   0 on success,
+ *   -errno otherwise.
+ */
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing)
+{
+	return sgx_encl_get_backing(encl, page_index, backing);
+}
+
 /**
  * sgx_encl_put_backing() - Unpin the backing storage
  * @backing:	data for accessing backing storage for the page
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..2de3b150ab00 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -100,13 +100,20 @@ static inline int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
 	return 0;
 }
 
+static inline bool current_is_ksgxd(void)
+{
+	return current->mm ? false : true;
+}
+
 int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 		     unsigned long end, unsigned long vm_flags);
 
 void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
-			 struct sgx_backing *backing);
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+			    struct sgx_backing *backing);
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing);
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4b41efc9e367..7d41c8538795 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -310,7 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	encl->secs_child_cnt--;
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
-		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
+		ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size),
 					   &secs_backing);
 		if (ret)
 			goto out;
@@ -381,7 +381,7 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
-		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
+		ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]);
 		if (ret)
 			goto skip;
 
-- 
2.20.1
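
Illustration only, not part of the patch: the save/set/restore sequence
that sgx_encl_alloc_backing() wraps around the allocation can be
modelled in user space. This is a minimal sketch under stated
assumptions. struct memcg_model, root_memcg, active_memcg and the
model_*() helpers are hypothetical stand-ins invented for this example;
they are not kernel APIs, and reference counting (mem_cgroup_put()) is
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct mem_cgroup. */
struct memcg_model {
	const char *name;
	unsigned long charged_pages;
};

/* Assumption: charges with no override land in a single root group. */
static struct memcg_model root_memcg = { "root", 0 };

/* Currently installed override; NULL means "no override". */
static struct memcg_model *active_memcg;

/* Install a new override and hand back the previous one. */
static struct memcg_model *model_set_active_memcg(struct memcg_model *memcg)
{
	struct memcg_model *old = active_memcg;

	active_memcg = memcg;
	return old;
}

/* Charge one page to the override if set, otherwise to root. */
static void model_alloc_page(void)
{
	struct memcg_model *target = active_memcg ? active_memcg : &root_memcg;

	target->charged_pages++;
}

/*
 * Same shape as sgx_encl_alloc_backing(): remember the old override,
 * install the enclave's group, allocate, then restore the old value.
 */
static void model_alloc_backing(struct memcg_model *encl_memcg)
{
	struct memcg_model *old;

	assert(encl_memcg != NULL);

	old = model_set_active_memcg(encl_memcg);
	model_alloc_page();
	model_set_active_memcg(old);
}
```

In this model, calling model_alloc_page() directly charges root_memcg,
which corresponds to the unaccounted case described in the commit log,
while model_alloc_backing() charges the group passed in and leaves the
previous override untouched afterwards.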