Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp2663224pxk; Sun, 4 Oct 2020 07:36:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz1AdilPSIFPtHihwND8R9mNMDyJ9HeTLRoR/x+RCEWO4cSRHsWRN8zGZExOcFXlAJdfFdm X-Received: by 2002:a50:9e87:: with SMTP id a7mr11546165edf.297.1601822197880; Sun, 04 Oct 2020 07:36:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1601822197; cv=none; d=google.com; s=arc-20160816; b=BYl7TlYZk4HEypO51BDH6czbV9bEZR7W9y2g+LwZz0DYjvBL9aSeHjsRpBqKSJA18a aTvGm+Dn06sBK3SCZTn2CeaYDCqAkjLqZaWKjuhgOPGEgipIVcqUQ/hNz6dt7eZH2GoT OwHUaAV/12FmqtaXia9WSopZiZmdnoe6B5XPE9MiWf1rpxC/L+0aSdrj4sQ2YfZoqi9o RUYoxIQluyV4df66wLHdLXb2r+biszsFELTLGaVnbEBhUGbLFBs/3N+pLeF53ow6k4/+ KVNvaX5Rzg121hN0PK1kHlLUJ4k+89ZpSAtwLlROlM5rnepySnIJ7uqvfMO/U50ZSrjk vvog== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:organization:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :ironport-sdr:ironport-sdr; bh=/a00NcJwC/OFpbZMO5dHOx9aXhiq6BJ9a6EdbxTWFnc=; b=I3TtGX9kE/7bkBoBx8B3PKJ68gNGEY1I5vVtB0zWIf0wP+80sShvKy3QKFrcVVNuMd kP99jvCYBe/cfM2lx6Fh2Nm7MLDswPEmebXYytu1wwoWRgerKGSe1QdKKoUq12QzORef ROjXOLiU3Rjydj64lhiG2UtDWxdwoM+237Y1rWJDxFfvGa4MtxFKMdHwTYkJCEdVgxaU fuIwQVKU89yUjL96sqF3BByYlGc6dEeoogf5IEtVkPEpGvTmrM/Eo7jyU6S6MReV1G3h OSPWt3lys7Re0HllE1Xr5kli3CdZ8HYWOQnETviaOzS5X5apKccQt7K9y6RiiSGEHvBa HfpA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id dr7si5373727ejc.447.2020.10.04.07.36.14; Sun, 04 Oct 2020 07:36:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726061AbgJDOc6 (ORCPT + 99 others); Sun, 4 Oct 2020 10:32:58 -0400 Received: from mga05.intel.com ([192.55.52.43]:49353 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725827AbgJDOc6 (ORCPT ); Sun, 4 Oct 2020 10:32:58 -0400 IronPort-SDR: ILKPSoNOsdJpnD0es5voKnDH875NR8YiImQDOL7Rhtpfw4Z/0BytQgvpzNV+hy7OzzC3BBsUXb QYDitoVnMYQA== X-IronPort-AV: E=McAfee;i="6000,8403,9763"; a="248026200" X-IronPort-AV: E=Sophos;i="5.77,335,1596524400"; d="scan'208";a="248026200" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Oct 2020 07:32:56 -0700 IronPort-SDR: 4Xvz+TMkm+Qpbjp+2v6pfbZheZBTQpWTrakW+HOcLN3bIvhBqiL/xddm00pP8FI7Js7AlcB83z o/PbvDM8QrnA== X-IronPort-AV: E=Sophos;i="5.77,335,1596524400"; d="scan'208";a="520142122" Received: from pkozlov-mobl1.ger.corp.intel.com (HELO localhost) ([10.249.35.250]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Oct 2020 07:32:48 -0700 Date: Sun, 4 Oct 2020 17:32:46 +0300 From: Jarkko Sakkinen To: Greg KH Cc: x86@kernel.org, linux-sgx@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, linux-mm@kvack.org, Andrew Morton , Matthew Wilcox , Jethro Beekman , Haitao Huang , Chunyang Hui , Jordan Hand , Nathaniel McCallum , Seth Moore , Darren Kenny , Sean Christopherson , Suresh Siddha , andriy.shevchenko@linux.intel.com, asapek@google.com, bp@alien8.de, cedric.xing@intel.com, chenalexchen@google.com, conradparker@google.com, cyhanish@google.com, dave.hansen@intel.com, haitao.huang@intel.com, kai.huang@intel.com, kai.svahn@intel.com, kmoy@google.com, ludloff@google.com, luto@kernel.org, nhorman@redhat.com, puiterwijk@redhat.com, rientjes@google.com, tglx@linutronix.de, yaozhangx@google.com, mikko.ylinen@intel.com Subject: Re: [PATCH v39 11/24] x86/sgx: Add SGX enclave driver Message-ID: <20201004143246.GA3561@linux.intel.com> References: <20201003045059.665934-1-jarkko.sakkinen@linux.intel.com> <20201003045059.665934-12-jarkko.sakkinen@linux.intel.com> <20201003143925.GB800720@kroah.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20201003143925.GB800720@kroah.com> Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Oct 03, 2020 at 04:39:25PM +0200, Greg KH wrote: > On Sat, Oct 03, 2020 at 07:50:46AM +0300, Jarkko Sakkinen wrote: > > Intel Software Guard eXtensions (SGX) is a set of CPU instructions that can > > be used by applications to set aside private regions of code and data. The > > code outside the SGX hosted software entity is prevented from accessing the > > memory inside the enclave by the CPU. We call these entities enclaves. > > > > Add a driver that provides an ioctl API to construct and run enclaves. > > Enclaves are constructed from pages residing in reserved physical memory > > areas. The contents of these pages can only be accessed when they are > > mapped as part of an enclave, by a hardware thread running inside the > > enclave. > > > > The starting state of an enclave consists of a fixed measured set of > > pages that are copied to the EPC during the construction process by > > using the opcode ENCLS leaf functions and Software Enclave Control > > Structure (SECS) that defines the enclave properties. > > > > Enclaves are constructed by using ENCLS leaf functions ECREATE, EADD and > > EINIT. ECREATE initializes SECS, EADD copies pages from system memory to > > the EPC and EINIT checks a given signed measurement and moves the enclave > > into a state ready for execution. > > > > An initialized enclave can only be accessed through special Thread Control > > Structure (TCS) pages by using ENCLU (ring-3 only) leaf EENTER. This leaf > > function converts a thread into enclave mode and continues the execution in > > the offset defined by the TCS provided to EENTER. An enclave is exited > > through syscall, exception, interrupts or by explicitly calling another > > ENCLU leaf EEXIT. > > > > The mmap() permissions are capped by the contained enclave page > > permissions. The mapped areas must also be populated, i.e. each page > > address must contain a page. This logic is implemented in > > sgx_encl_may_map(). > > > > Cc: linux-security-module@vger.kernel.org > > Cc: linux-mm@kvack.org > > Cc: Andrew Morton > > Cc: Matthew Wilcox > > Acked-by: Jethro Beekman > > Tested-by: Jethro Beekman > > Tested-by: Haitao Huang > > Tested-by: Chunyang Hui > > Tested-by: Jordan Hand > > Tested-by: Nathaniel McCallum > > Tested-by: Seth Moore > > Tested-by: Darren Kenny > > Reviewed-by: Darren Kenny > > Co-developed-by: Sean Christopherson > > Signed-off-by: Sean Christopherson > > Co-developed-by: Suresh Siddha > > Signed-off-by: Suresh Siddha > > Signed-off-by: Jarkko Sakkinen > > --- > > arch/x86/kernel/cpu/sgx/Makefile | 2 + > > arch/x86/kernel/cpu/sgx/driver.c | 173 ++++++++++++++++ > > arch/x86/kernel/cpu/sgx/driver.h | 29 +++ > > arch/x86/kernel/cpu/sgx/encl.c | 331 +++++++++++++++++++++++++++++++ > > arch/x86/kernel/cpu/sgx/encl.h | 85 ++++++++ > > arch/x86/kernel/cpu/sgx/main.c | 11 + > > 6 files changed, 631 insertions(+) > > create mode 100644 arch/x86/kernel/cpu/sgx/driver.c > > create mode 100644 arch/x86/kernel/cpu/sgx/driver.h > > create mode 100644 arch/x86/kernel/cpu/sgx/encl.c > > create mode 100644 arch/x86/kernel/cpu/sgx/encl.h > > > > diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile > > index 79510ce01b3b..3fc451120735 100644 > > --- a/arch/x86/kernel/cpu/sgx/Makefile > > +++ b/arch/x86/kernel/cpu/sgx/Makefile > > @@ -1,2 +1,4 @@ > > obj-y += \ > > + driver.o \ > > + encl.o \ > > main.o > > diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c > > new file mode 100644 > > index 000000000000..f54da5f19c2b > > --- /dev/null > > +++ b/arch/x86/kernel/cpu/sgx/driver.c > > @@ -0,0 +1,173 @@ > > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) > > You use gpl-only header files in this file, so how in the world can it > be bsd-3 licensed? > > Please get your legal department to agree with this, after you explain > to them how you are mixing gpl2-only code in with this file. I'll do what I already stated that I will do. Should I do something more? > > +// Copyright(c) 2016-18 Intel Corporation. > > Dates are hard to get right :( Will fix. > > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include "driver.h" > > +#include "encl.h" > > + > > +u64 sgx_encl_size_max_32; > > +u64 sgx_encl_size_max_64; > > +u32 sgx_misc_reserved_mask; > > +u64 sgx_attributes_reserved_mask; > > +u64 sgx_xfrm_reserved_mask = ~0x3; > > +u32 sgx_xsave_size_tbl[64]; > > + > > +static int sgx_open(struct inode *inode, struct file *file) > > +{ > > + struct sgx_encl *encl; > > + int ret; > > + > > + encl = kzalloc(sizeof(*encl), GFP_KERNEL); > > + if (!encl) > > + return -ENOMEM; > > + > > + atomic_set(&encl->flags, 0); > > + kref_init(&encl->refcount); > > + xa_init(&encl->page_array); > > + mutex_init(&encl->lock); > > + INIT_LIST_HEAD(&encl->mm_list); > > + spin_lock_init(&encl->mm_lock); > > + > > + ret = init_srcu_struct(&encl->srcu); > > + if (ret) { > > + kfree(encl); > > + return ret; > > + } > > + > > + file->private_data = encl; > > + > > + return 0; > > +} > > + > > +static int sgx_release(struct inode *inode, struct file *file) > > +{ > > + struct sgx_encl *encl = file->private_data; > > + struct sgx_encl_mm *encl_mm; > > + > > + for ( ; ; ) { > > + spin_lock(&encl->mm_lock); > > + > > + if (list_empty(&encl->mm_list)) { > > + encl_mm = NULL; > > + } else { > > + encl_mm = list_first_entry(&encl->mm_list, > > + struct sgx_encl_mm, list); > > + list_del_rcu(&encl_mm->list); > > + } > > + > > + spin_unlock(&encl->mm_lock); > > + > > + /* The list is empty, ready to go. */ > > + if (!encl_mm) > > + break; > > + > > + synchronize_srcu(&encl->srcu); > > + mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm); > > + kfree(encl_mm); > > + } > > + > > + mutex_lock(&encl->lock); > > + atomic_or(SGX_ENCL_DEAD, &encl->flags); > > So you set a flag that this is dead, and then instantly delete it? Why > does that matter? I see you check for this flag elsewhere, but as you > are just about to delete this structure, how can this be an issue? It matters because ksgxswapd (sgx_reclaimer_*) might be processing it. It will use the flag to skip the operations that it would do to a victim page, when the enclave is still alive. > > > + mutex_unlock(&encl->lock); > > + > > + kref_put(&encl->refcount, sgx_encl_release); > > Don't you need to hold the lock across the put? If not, what is > serializing this? > > But an even larger comment, why is this reference count needed at all? > > You never grab it except at init time, and you free it at close time. > Why not rely on the reference counting that the vfs ensures you? Because ksgxswapd needs the alive enclave instance while it is in the process of swapping a victim page. The reason for this is the hierarchical nature of the enclave pages. As an example, a write operation to main memory, EWB (SDM vol 3D 40-79) needs to access SGX Enclave Control Structure (SECS) page, which is contains global data for an enclave, like the unswapped child count. > > + return 0; > > +} > > + > > +static int sgx_mmap(struct file *file, struct vm_area_struct *vma) > > +{ > > + struct sgx_encl *encl = file->private_data; > > + int ret; > > + > > + ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end, vma->vm_flags); > > + if (ret) > > + return ret; > > + > > + ret = sgx_encl_mm_add(encl, vma->vm_mm); > > + if (ret) > > + return ret; > > + > > + vma->vm_ops = &sgx_vm_ops; > > + vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO; > > + vma->vm_private_data = encl; > > + > > + return 0; > > +} > > + > > +static unsigned long sgx_get_unmapped_area(struct file *file, > > + unsigned long addr, > > + unsigned long len, > > + unsigned long pgoff, > > + unsigned long flags) > > +{ > > + if ((flags & MAP_TYPE) == MAP_PRIVATE) > > + return -EINVAL; > > + > > + if (flags & MAP_FIXED) > > + return addr; > > + > > + return current->mm->get_unmapped_area(file, addr, len, pgoff, flags); > > +} > > + > > +static const struct file_operations sgx_encl_fops = { > > + .owner = THIS_MODULE, > > + .open = sgx_open, > > + .release = sgx_release, > > + .mmap = sgx_mmap, > > + .get_unmapped_area = sgx_get_unmapped_area, > > +}; > > + > > +static struct miscdevice sgx_dev_enclave = { > > + .minor = MISC_DYNAMIC_MINOR, > > + .name = "enclave", > > + .nodename = "sgx/enclave", > > A subdir for a single device node? Ok, odd, but why not just > "sgx_enclave"? How "special" is this device node? There is a patch that adds "sgx/provision". Either works for me. Should I flatten them to "sgx_enclave" and "sgx_provision", or keep them as they are? > thanks, > > greg k-h /Jarkko