Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp942188iob; Fri, 13 May 2022 17:14:02 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxR8cT5n2+Zx9pINGTrx0xAJad2uj2s9e7qo2IFbpqZt5sSSWZhu6nqnjvWFZQWN5qxfS56 X-Received: by 2002:a5d:6da3:0:b0:20c:86d4:efbd with SMTP id u3-20020a5d6da3000000b0020c86d4efbdmr5748590wrs.633.1652487241914; Fri, 13 May 2022 17:14:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652487241; cv=none; d=google.com; s=arc-20160816; b=V0Xp6Yr72otkExZGLCdikmj6iNnVG6cMS1R7SQJDrp/Sp97sRRAM2uUpqixJ/WBB2m 9ik8uupfaxz1yJf13QMpSeJGs2dR1E6+OQBcSxd1MdqFhmtWSnsknY47LhtjXvp2Uaw0 SPjP6oJEHnK/M6pobBpnbqhoYaNET4THCSaRH1InhcJCumPYYrWdnrJsjihHclzRC2nL 1CIGsOUKUh9NS/vdz8kz0yjAspAe2W9uONSrYKVWndS4aderdn6wDXWsiVVF6Rh79wmt QZHnAfNYbhI+4yFWLJAUuk3XxgQPnM+s3wMLpjJluYFbY/3jThmTIdNkPVTeaZ8LlqpG 4qsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=/Mzmys8A0AN0sPZ5EYd0TYQQSrHsNNw4NLrVhLnK2HU=; b=Yx1kfMNibu4KR+rK3eju7vei3yqaG1MQ5EvvME8+6M6az3YrMhQgYfVl8VxAAS6/8a m2bd5iqbjEY8GpwBdSW4VQ/OcmGRdOCEYNMJCV0O1SM4ZpAc/3mlrz1KvDLLiVs5eFqw pSUpOEO5o3voaOJdMCjkr9VopM+SC0Jm45gMkAgbbk59mrI4EELIjH/zxCelkDw7tScw +2lNkSMF+m+MkuvjRz6I0kDWNJ/WNTDWcEEPCyd+LDXhKp26CxPvlwJhsiuFdS/8tcUX f29Rpfsw8W3/fCikZjmmfL8A66VImFEibXTt0Rmq5KwrqscQPMbw8CDXW3vbnB5tVRZq xl0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=X3u0SjlG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id o2-20020a05600002c200b00207916b3cf0si4496903wry.106.2022.05.13.17.14.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 13 May 2022 17:14:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=X3u0SjlG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 9362E321617; Fri, 13 May 2022 16:14:28 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234299AbiEMOoh (ORCPT + 99 others); Fri, 13 May 2022 10:44:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1382839AbiEMOnS (ORCPT ); Fri, 13 May 2022 10:43:18 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 695D513D05; Fri, 13 May 2022 07:41:45 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id EA6E36227D; Fri, 13 May 2022 14:41:44 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 07C55C34100; Fri, 13 May 2022 14:41:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1652452904; bh=35n03oMDiGuZXYeT9U2fcPTKKLqo1pbg0WdIXgmtDV4=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=X3u0SjlG0YiI+D0q6rsr1qf1mOa4liBMOuEGgjjupbbamygFh2iPf1OWRr2Qk5WWM RNZx/8FAUPj8DNXbZe9CXXkQCugu8ms9JaPnYuZBtCUjAVZVqAJK1Raw/5nYaJeOVL WXTY9yogEoNR+M/UBFhebcBXsV6XNpNUvcIYilvwLzS8UdygcjM8gn5ReexCuv1goA PV5JgmdjrPnd/7zTzJ3nr7k9hEMh2Cm0H91P70QMZX6tmyN+0SNFVR7iWtpAAjgKZv oaoNzj9Q6VbMOYAmItueO4bwfDoidlao7o4mHB4wpD62mnTfaODVm8o0SEV0jzTDDo tRYZ7woddrRDA== Date: Fri, 13 May 2022 17:40:14 +0300 From: Jarkko Sakkinen To: Reinette Chatre Cc: dave.hansen@linux.intel.com, tglx@linutronix.de, bp@alien8.de, luto@kernel.org, mingo@redhat.com, linux-sgx@vger.kernel.org, x86@kernel.org, haitao.huang@intel.com, hpa@zytor.com, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: Re: [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page fault handler Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 12, 2022 at 02:51:00PM -0700, Reinette Chatre wrote: > Haitao reported encountering a WARN triggered by the ENCLS[ELDU] > instruction faulting with a #GP. > > The WARN is encountered when the reclaimer evicts a range of > pages from the enclave when the same pages are faulted back right away. > > Consider two enclave pages (ENCLAVE_A and ENCLAVE_B) > sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the > enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains > just one entry, that of ENCLAVE_B. > > Scenario proceeds where ENCLAVE_A is being evicted from the enclave > while ENCLAVE_B is faulted in. > > sgx_reclaim_pages() { > > ... > > /* > * Reclaim ENCLAVE_A > */ > mutex_lock(&encl->lock); > /* > * Get a reference to ENCLAVE_A's > * shmem page where enclave page > * encrypted data will be stored > * as well as a reference to the > * enclave page's PCMD data page, > * PCMD_AB. > * Release mutex before writing > * any data to the shmem pages. > */ > sgx_encl_get_backing(...); > encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; > mutex_unlock(&encl->lock); > > /* > * Fault ENCLAVE_B > */ > > sgx_vma_fault() { > > mutex_lock(&encl->lock); > /* > * Get reference to > * ENCLAVE_B's shmem page > * as well as PCMD_AB. > */ > sgx_encl_get_backing(...) > /* > * Load page back into > * enclave via ELDU. > */ > /* > * Release reference to > * ENCLAVE_B' shmem page and > * PCMD_AB. > */ > sgx_encl_put_backing(...); > /* > * PCMD_AB is found empty so > * it and ENCLAVE_B's shmem page > * are truncated. > */ > /* Truncate ENCLAVE_B backing page */ > sgx_encl_truncate_backing_page(); > /* Truncate PCMD_AB */ > sgx_encl_truncate_backing_page(); > > mutex_unlock(&encl->lock); > > ... > } > mutex_lock(&encl->lock); > encl_page->desc &= > ~SGX_ENCL_PAGE_BEING_RECLAIMED; > /* > * Write encrypted contents of > * ENCLAVE_A to ENCLAVE_A shmem > * page and its PCMD data to > * PCMD_AB. > */ > sgx_encl_put_backing(...) > > /* > * Reference to PCMD_AB is > * dropped and it is truncated. > * ENCLAVE_A's PCMD data is lost. > */ > mutex_unlock(&encl->lock); > } > > What happens next depends on whether it is ENCLAVE_A being faulted > in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting > with a #GP. > > If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called > a new PCMD page is allocated and providing the empty PCMD data for > ENCLAVE_A would cause ENCLS[ELDU] to #GP > > If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the > reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction > would #GP during its checks of the PCMD value and the WARN would be > encountered. > > Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time > it obtains a reference to the backing store pages of an enclave page it > is in the process of reclaiming, fix the race by only truncating the PCMD > page after ensuring that no page sharing the PCMD page is in the process > of being reclaimed. > > Cc: stable@vger.kernel.org > Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") > Reported-by: Haitao Huang > Tested-by: Haitao Huang > Signed-off-by: Reinette Chatre > --- > Changes since V2: > - Declare "addr" and "entry" within loop. (Dave) > - Fix incorrect error return when enclave page not found to belong > to enclave. Continue search instead of returning failure. (Dave) > - Add Haitao's "Tested-by" tag. > - Rename pcmd_page_in_use() to reclaimer_writing_to_pcmd() to be less > generic. (Dave) > - Improve function comments to be clear about it testing for PCMD > page soon becoming non-empty, also add context info to kernel-doc > to indicate that enclave mutex must be held. (Dave) > > Changes since RFC v1: > - New patch. > > arch/x86/kernel/cpu/sgx/encl.c | 94 +++++++++++++++++++++++++++++++++- > 1 file changed, 93 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c > index 5104a428b72c..243f3bd78145 100644 > --- a/arch/x86/kernel/cpu/sgx/encl.c > +++ b/arch/x86/kernel/cpu/sgx/encl.c > @@ -12,6 +12,92 @@ > #include "encls.h" > #include "sgx.h" > > +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd)) > +/* > + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to > + * determine the page index associated with the first PCMD entry > + * within a PCMD page. > + */ > +#define PCMD_FIRST_MASK GENMASK(4, 0) > + > +/** > + * reclaimer_writing_to_pcmd() - Query if any enclave page associated with > + * a PCMD page is in process of being reclaimed. > + * @encl: Enclave to which PCMD page belongs > + * @start_addr: Address of enclave page using first entry within the PCMD page > + * > + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is > + * stored. The PCMD data of a reclaimed enclave page contains enough > + * information for the processor to verify the page at the time > + * it is loaded back into the Enclave Page Cache (EPC). > + * > + * The backing storage to which enclave pages are reclaimed is laid out as > + * follows: > + * Encrypted enclave pages:SECS page:PCMD pages > + * > + * Each PCMD page contains the PCMD metadata of > + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages. > + * > + * A PCMD page can only be truncated if it is (a) empty, and (b) not in the > + * process of getting data (and thus soon being non-empty). (b) is tested with > + * a check if an enclave page sharing the PCMD page is in the process of being > + * reclaimed. > + * > + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it > + * intends to reclaim that enclave page - it means that the PCMD page > + * associated with that enclave page is about to get some data and thus > + * even if the PCMD page is empty, it should not be truncated. > + * > + * Context: Enclave mutex (&sgx_encl->lock) must be held. > + * Return: 1 if the reclaimer is about to write to the PCMD page > + * 0 if the reclaimer has no intention to write to the PCMD page > + */ > +static int reclaimer_writing_to_pcmd(struct sgx_encl *encl, > + unsigned long start_addr) > +{ > + int reclaimed = 0; > + int i; > + > + /* > + * PCMD_FIRST_MASK is based on number of PCMD entries within > + * PCMD page being 32. > + */ > + BUILD_BUG_ON(PCMDS_PER_PAGE != 32); > + > + for (i = 0; i < PCMDS_PER_PAGE; i++) { > + struct sgx_encl_page *entry; > + unsigned long addr; > + > + addr = start_addr + i * PAGE_SIZE; > + > + /* > + * Stop when reaching the SECS page - it does not > + * have a page_array entry and its reclaim is > + * started and completed with enclave mutex held so > + * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED > + * flag. > + */ > + if (addr == encl->base + encl->size) > + break; > + > + entry = xa_load(&encl->page_array, PFN_DOWN(addr)); > + if (!entry) > + continue; > + > + /* > + * VA page slot ID uses same bit as the flag so it is important > + * to ensure that the page is not already in backing store. > + */ > + if (entry->epc_page && > + (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) { > + reclaimed = 1; > + break; > + } > + } > + > + return reclaimed; > +} > + > /* > * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's > * follow right after the EPC data in the backing storage. In addition to the > @@ -47,6 +133,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, > unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK; > struct sgx_encl *encl = encl_page->encl; > pgoff_t page_index, page_pcmd_off; > + unsigned long pcmd_first_page; > struct sgx_pageinfo pginfo; > struct sgx_backing b; > bool pcmd_page_empty; > @@ -58,6 +145,11 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, > else > page_index = PFN_DOWN(encl->size); > > + /* > + * Address of enclave page using the first entry within the PCMD page. > + */ > + pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base; > + > page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index); > > ret = sgx_encl_get_backing(encl, page_index, &b); > @@ -99,7 +191,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, > > sgx_encl_truncate_backing_page(encl, page_index); > > - if (pcmd_page_empty) > + if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) > sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off)); > > return ret; > -- > 2.25.1 > Reviewed-by: Jarkko Sakkinen BR, Jarkko