From: Kristen Carlson Accardi
To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
Cc: Kristen Carlson Accardi, Sean Christopherson
Subject: [RFC PATCH 05/20] x86/sgx: Introduce unreclaimable EPC page lists
Date: Thu, 22 Sep 2022 10:10:42 -0700
Message-Id: <20220922171057.1236139-6-kristen@linux.intel.com>
In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com>
References: <20220922171057.1236139-1-kristen@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

Add code to keep track of pages that are not tracked by the reclaimer
in the LRU's "unreclaimable" list. When there is an OOM event and an
enclave must be OOM killed, the EPC pages which are not tracked by
the reclaimer can still be freed.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/encl.c  | 10 +++++++---
 arch/x86/kernel/cpu/sgx/ioctl.c | 11 +++++++----
 arch/x86/kernel/cpu/sgx/main.c  | 26 +++++++++++++++-----------
 arch/x86/kernel/cpu/sgx/sgx.h   |  7 ++++---
 arch/x86/kernel/cpu/sgx/virt.c  | 28 ++++++++++++++++++++--------
 5 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index a18f1311b57d..ad611c06798f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -252,6 +252,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		epc_page = sgx_encl_eldu(&encl->secs, NULL);
 		if (IS_ERR(epc_page))
 			return ERR_CAST(epc_page);
+		sgx_record_epc_page(epc_page, 0);
 	}
 
 	epc_page = sgx_encl_eldu(entry, encl->secs.epc_page);
@@ -259,7 +260,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		return ERR_CAST(epc_page);
 
 	encl->secs_child_cnt++;
-	sgx_mark_page_reclaimable(entry->epc_page);
+	sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 
 	return entry;
 }
@@ -375,7 +376,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	encl_page->type = SGX_PAGE_TYPE_REG;
 	encl->secs_child_cnt++;
 
-	sgx_mark_page_reclaimable(encl_page->epc_page);
+	sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 
 	phys_addr = sgx_get_epc_phys_addr(epc_page);
 	/*
@@ -687,7 +688,7 @@ void sgx_encl_release(struct kref *ref)
 			 * The page and its radix tree entry cannot be freed
 			 * if the page is being held by the reclaimer.
 			 */
-			if (sgx_unmark_page_reclaimable(entry->epc_page))
+			if (sgx_drop_epc_page(entry->epc_page))
 				continue;
 
 			sgx_encl_free_epc_page(entry->epc_page);
@@ -703,6 +704,7 @@ void sgx_encl_release(struct kref *ref)
 	xa_destroy(&encl->page_array);
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
+		sgx_drop_epc_page(encl->secs.epc_page);
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
@@ -711,6 +713,7 @@ void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
+		sgx_drop_epc_page(va_page->epc_page);
 		sgx_encl_free_epc_page(va_page->epc_page);
 		kfree(va_page);
 	}
@@ -1218,6 +1221,7 @@ struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
 		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(-EFAULT);
 	}
+	sgx_record_epc_page(epc_page, 0);
 
 	epc_page->owner = encl;
 
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 9a1bb3c3211a..aca80a3f38a1 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -48,6 +48,7 @@ void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 	encl->page_cnt--;
 
 	if (va_page) {
+		sgx_drop_epc_page(va_page->epc_page);
 		sgx_encl_free_epc_page(va_page->epc_page);
 		list_del(&va_page->list);
 		kfree(va_page);
@@ -113,6 +114,8 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	encl->attributes = secs->attributes;
 	encl->attributes_mask = SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_KSS;
 
+	sgx_record_epc_page(encl->secs.epc_page, 0);
+
 	/* Set only after completion, as encl->lock has not been taken. */
 	set_bit(SGX_ENCL_CREATED, &encl->flags);
 
@@ -322,7 +325,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 		goto err_out;
 	}
 
-	sgx_mark_page_reclaimable(encl_page->epc_page);
+	sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 	mutex_unlock(&encl->lock);
 	mmap_read_unlock(current->mm);
 	return ret;
@@ -958,7 +961,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
 			 * Prevent page from being reclaimed while mutex
 			 * is released.
 			 */
-			if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+			if (sgx_drop_epc_page(entry->epc_page)) {
 				ret = -EAGAIN;
 				goto out_entry_changed;
 			}
@@ -973,7 +976,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
 
 			mutex_lock(&encl->lock);
 
-			sgx_mark_page_reclaimable(entry->epc_page);
+			sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 		}
 
 		/* Change EPC type */
@@ -1130,7 +1133,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
 			goto out_unlock;
 		}
 
-		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+		if (sgx_drop_epc_page(entry->epc_page)) {
 			ret = -EBUSY;
 			goto out_unlock;
 		}
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index af68dc1c677b..543bc5b20508 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -262,7 +262,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 			goto out;
 
 		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
-
+		sgx_drop_epc_page(encl->secs.epc_page);
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
@@ -495,31 +495,35 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 }
 
 /**
- * sgx_mark_page_reclaimable() - Mark a page as reclaimable
+ * sgx_record_epc_page() - Add a page to the LRU tracking
  * @page:	EPC page
  *
- * Mark a page as reclaimable and add it to the active page list. Pages
- * are automatically removed from the active list when freed.
+ * Mark a page with the specified flags and add it to the appropriate
+ * (un)reclaimable list.
  */
-void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
+void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 {
 	spin_lock(&sgx_global_lru.lock);
-	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	page->flags |= flags;
+	if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED)
+		list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	else
+		list_add_tail(&page->list, &sgx_global_lru.unreclaimable);
 	spin_unlock(&sgx_global_lru.lock);
 }
 
 /**
- * sgx_unmark_page_reclaimable() - Remove a page from the reclaim list
+ * sgx_drop_epc_page() - Remove a page from a LRU list
  * @page:	EPC page
  *
- * Clear the reclaimable flag and remove the page from the active page list.
+ * Clear the reclaimable flag if set and remove the page from its LRU.
  *
  * Return:
  *   0 on success,
  *   -EBUSY if the page is in the process of being reclaimed
  */
-int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
+int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
 	spin_lock(&sgx_global_lru.lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
@@ -529,9 +533,9 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 			return -EBUSY;
 		}
 
-		list_del(&page->list);
 		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
+	list_del(&page->list);
 	spin_unlock(&sgx_global_lru.lock);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 7b208ee8eb45..65625ea8fd6e 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -86,20 +86,21 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 struct sgx_epc_lru {
 	spinlock_t lock;
 	struct list_head reclaimable;
+	struct list_head unreclaimable;
 };
 
 static inline void sgx_lru_init(struct sgx_epc_lru *lru)
 {
 	spin_lock_init(&lru->lock);
 	INIT_LIST_HEAD(&lru->reclaimable);
+	INIT_LIST_HEAD(&lru->unreclaimable);
 }
 
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
-
 void sgx_reclaim_direct(void);
-void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
-int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
+void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
+int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
 void sgx_ipi_cb(void *info);
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 6a77a14eee38..287e235bc3c1 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -62,6 +62,8 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc,
 		goto err_delete;
 	}
 
+	sgx_record_epc_page(epc_page, 0);
+
 	return 0;
 
 err_delete:
@@ -146,6 +148,7 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 		return ret;
 	}
 
+	sgx_drop_epc_page(epc_page);
 	sgx_free_epc_page(epc_page);
 	return 0;
 }
@@ -218,8 +221,15 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
 		 * have been removed, the SECS page must have a child on
 		 * another instance.
 		 */
-		if (sgx_vepc_free_page(epc_page))
+		if (sgx_vepc_free_page(epc_page)) {
+			/*
+			 * Drop the page before adding it to the list of SECS
+			 * pages. Moving the page off the unreclaimable list
+			 * needs to be done under the LRU's spinlock.
+			 */
+			sgx_drop_epc_page(epc_page);
 			list_add_tail(&epc_page->list, &secs_pages);
+		}
 
 		xa_erase(&vepc->page_array, index);
 	}
@@ -234,15 +244,17 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
 	mutex_lock(&zombie_secs_pages_lock);
 	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
 		/*
-		 * Speculatively remove the page from the list of zombies,
-		 * if the page is successfully EREMOVE'd it will be added to
-		 * the list of free pages. If EREMOVE fails, throw the page
-		 * on the local list, which will be spliced on at the end.
+		 * If EREMOVE fails, throw the page on the local list, which
+		 * will be spliced on at the end.
+		 *
+		 * Note, this abuses sgx_drop_epc_page() to delete the page off
+		 * the list of zombies, but this is a very rare path (probably
+		 * never hit in production). It's not worth special casing the
+		 * free path for this super rare case just to avoid taking the
+		 * LRU's spinlock.
 		 */
-		list_del(&epc_page->list);
-
 		if (sgx_vepc_free_page(epc_page))
-			list_add_tail(&epc_page->list, &secs_pages);
+			list_move_tail(&epc_page->list, &secs_pages);
 	}
 
 	if (!list_empty(&secs_pages))
-- 
2.37.3
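
Editor's illustration (not part of the patch): the record/drop bookkeeping the patch introduces can be modelled in user space roughly as below. This is a minimal sketch under stated assumptions, not the kernel code. Every name in it (struct node, struct page, struct lru, record_page, drop_page, lru_demo, TRACKED) is a hypothetical stand-in, the LRU spinlock is left out because the model is single-threaded, and the "being reclaimed" test (a tracked page whose list node is already detached) is an assumption about lines the diff context does not show.

```c
#include <assert.h>
#include <errno.h>

/*
 * User-space model only. struct node stands in for list_head, struct page
 * for sgx_epc_page and struct lru for sgx_epc_lru; all are hypothetical.
 */
struct node { struct node *prev, *next; };

static void node_init(struct node *n) { n->prev = n->next = n; }
static int node_detached(const struct node *n) { return n->next == n; }

static void add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the node and point it at itself, so node_detached() becomes true. */
static void del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

static int length(const struct node *head)
{
	int count = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		count++;
	return count;
}

#define TRACKED 0x1UL	/* models SGX_EPC_PAGE_RECLAIMER_TRACKED */

struct page { unsigned long flags; struct node list; };
struct lru { struct node reclaimable, unreclaimable; };

static void lru_init(struct lru *lru)
{
	node_init(&lru->reclaimable);
	node_init(&lru->unreclaimable);
}

/* Model of sgx_record_epc_page(): the flags argument selects the list. */
static void record_page(struct lru *lru, struct page *pg, unsigned long flags)
{
	pg->flags |= flags;
	if (flags & TRACKED)
		add_tail(&pg->list, &lru->reclaimable);
	else
		add_tail(&pg->list, &lru->unreclaimable);
}

/*
 * Model of sgx_drop_epc_page(). Assumption: a tracked page whose node is
 * already detached is in the middle of being reclaimed.
 */
static int drop_page(struct page *pg)
{
	if (pg->flags & TRACKED) {
		if (node_detached(&pg->list))
			return -EBUSY;
		pg->flags &= ~TRACKED;
	}
	del_init(&pg->list);	/* now done for both kinds of page */
	return 0;
}

/* Exercise one untracked and one tracked page; returns 0 if all checks hold. */
static int lru_demo(void)
{
	struct lru lru;
	struct page secs = { 0 }, reg = { 0 };

	lru_init(&lru);
	node_init(&secs.list);
	node_init(&reg.list);

	record_page(&lru, &secs, 0);		/* e.g. a SECS or VA page */
	record_page(&lru, &reg, TRACKED);	/* e.g. a regular enclave page */
	if (length(&lru.unreclaimable) != 1 || length(&lru.reclaimable) != 1)
		return 1;

	del_init(&reg.list);			/* model: reclaimer isolated the page */
	if (drop_page(&reg) != -EBUSY)
		return 2;

	if (drop_page(&secs) != 0 || length(&lru.unreclaimable) != 0)
		return 3;
	return 0;
}
```

Calling lru_demo() from a small main() runs the model. The point it illustrates is that after this patch every EPC page sits on exactly one of the two lists, so the drop path unlinks unconditionally and only the tracked case can report -EBUSY.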