From: "Huang, Kai"
To: "Christopherson, Sean J"
CC: Andy Lutomirski, Jethro Beekman, Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "x86@kernel.org", Dave Hansen, Peter Zijlstra, "H. Peter Anvin", "linux-kernel@vger.kernel.org", "linux-sgx@vger.kernel.org", Josh Triplett, "Haitao Huang", "Dr. Greg Wettstein"
Subject: RE: x86/sgx: uapi change proposal
Date: Wed, 9 Jan 2019 05:24:38 +0000
Message-ID: <105F7BF4D0229846AF094488D65A0989355A58F1@PGSMSX112.gar.corp.intel.com>
References: <20181214215729.4221-1-sean.j.christopherson@intel.com>
 <7706b2aa71312e1f0009958bcab24e1e9d8d1237.camel@linux.intel.com>
 <598cd050-f0b5-d18c-96a0-915f02525e3e@fortanix.com>
 <20181219091148.GA5121@linux.intel.com>
 <613c6814-4e71-38e5-444a-545f0e286df8@fortanix.com>
 <20181219144515.GA30909@linux.intel.com>
 <20181221162825.GB26865@linux.intel.com>
 <105F7BF4D0229846AF094488D65A0989355A45B6@PGSMSX112.gar.corp.intel.com>
 <20190108220946.GA30462@linux.intel.com>
In-Reply-To: <20190108220946.GA30462@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote:
> > > >
> > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > opening a new instance of /dev/sgx for each enclave?
> > >
> > > Directly associating /dev/sgx with an enclave means /dev/sgx can't
> > > be used to provide ioctl()'s for other SGX-related needs, e.g. to
> > > mmap() raw EPC and expose it to a VM. Proposed layout in the link
> > > below. I'll also respond to Jarkko's question about exposing EPC
> > > through /dev/sgx instead of having KVM allocate it on behalf of the VM.
> > >
> > > https://lkml.kernel.org/r/20181218185349.GC30082@linux.intel.com
> >
> > Hi Sean,
> >
> > Sorry for replying to old email. But IMHO it is not a must that Qemu
> > needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual
> > EPC slot, instead, KVM could create private slot, which is not visible
> > to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation
> > API directly.
>
> That's possible, but it has several downsides.
>
> - Duplicates a lot of code in KVM for managing memory regions.

I don't see why there would be duplicated code. You can simply call __x86_set_memory_region to create a private slot. It is the KVM x86 equivalent of KVM_SET_USER_MEMORY_REGION from userspace. The only difference is that Qemu is not aware of the private slot.

> - Artificially restricts userspace to a single EPC region, unless
>   even more code is duplicated to handle multiple private regions.

You can have multiple private slots, by calling __x86_set_memory_region for each EPC section. KVM receives the EPC section info from Qemu, via CPUID or a dedicated IOCTL (is this what you are going to add?), and simply creates the private EPC slot(s).

> - Requires additional ioctls() or capabilities to probe EPC support

No. EPC info comes from Qemu at the beginning (size is given by a parameter, base is calculated by Qemu), and it is actually Qemu that notifies KVM of the EPC info, so I don't think we require additional ioctls or capabilities here.

> - Does not fit with Qemu/KVM's memory model, e.g. all other types of
>   memory are exposed to a guest through
>   KVM_SET_USER_MEMORY_REGION.

EPC is different.
I am not sure whether EPC needs to fit such a model. There are already examples in KVM which use a private slot w/o using KVM_SET_USER_MEMORY_REGION, for example, the APIC access page.

> - Prevents userspace from debugging a guest's enclave. I'm not saying
>   this is a likely scenario, but I also don't think we should preclude
>   it without good reason.

I am not sure how important it is, so I don't know whether this can be a justification. To me we don't need to consider this. Qemu normally doesn't access guest memory unless it has to (i.e., for the device model).

> - KVM is now responsible for managing the lifecycle of EPC, e.g. what
>   happens if an EPC cgroup limit is lowered on a running VM and
>   KVM can't gracefully reclaim EPC? The userspace hypervisor should
>   ultimately decide how to handle such an event.

Even using KVM_SET_USER_MEMORY_REGION, KVM is also responsible for managing the lifecycle of EPC, no? And by "managing the lifecycle of EPC" I suppose you don't mean "managing EPC itself", since EPC should always be managed by core SGX code.

I don't see the difference between a private slot and KVM_SET_USER_MEMORY_REGION, in terms of how KVM reclaims EPC, or what KVM does when it fails to reclaim EPC.

> - SGX logic is split between SGX and KVM, e.g. VA page management for
>   oversubscription will likely be common to SGX and KVM. From a long
>   term maintenance perspective, this means that changes to the EPC
>   management could potentially need to be Acked by KVM, and vice versa.

I think most of the code should be in core SGX code under x86. KVM should only have the code that is specifically related to virtualization, i.e., ENCLV.

VA page allocation should be under core SGX code. KVM might need to call functions such as alloc_va_page, etc, but this is not a problem. There are many other such cases now.

And this is not related to private slot vs KVM_SET_USER_MEMORY_REGION.

>
> > I am not sure what's the good of allowing userspace to alloc/mmap a
> > raw EPC region?
> > Userspace is not allowed to touch EPC anyway, except
> > enclave code.
> >
> > To me, having KVM create a private EPC slot is cleaner than exposing
> > /dev/sgx/epc and allowing userspace to map some raw EPC region.
>
> Cleaner in the sense that it's faster to get basic support up and running since
> there are fewer touchpoints, but there are long term ramifications to
> cramming EPC management in KVM.
>
> And at this point I'm not stating any absolutes, e.g. how EPC will be handled
> by KVM. What I'm pushing for is to not eliminate the possibility of having
> the SGX subsystem own all EPC management, e.g. don't tie /dev/sgx to a
> single enclave.

I suppose by "SGX subsystem" you mean core SGX code under x86. IMHO EPC should always be managed by such an SGX subsystem, and KVM and the SGX driver are just consumers (i.e., calling EPC allocation functions, etc).

IMHO /dev/sgx (or whatever /dev/sgx/xxx) should be in the SGX driver, not in SGX core code. For example, if we don't consider KVM EPC oversubscription here, theoretically we only need the code below in core SGX code to make KVM SGX work:

1) SGX feature detection. There should be some core structure (such as boot_cpu_data, etc) where KVM can query SGX capabilities, i.e., whether FLC is available.
2) EPC management. KVM simply calls EPC management APIs when it needs to. For example, when an EPC slot is created, we should populate all EPC pages, and fail to create the VM if running out of EPC.
3) Other code to deal with ENCLS, SGX data structures, etc, since we have agreed that even with EPC static allocation, we should trap EINIT.

We probably don't even need code to deal with enclaves, if only for KVM. Of course, in order to support KVM EPC oversubscription, we should also include enclave management code in core SGX code, but this doesn't necessarily mean we should include /dev/sgx/xxx in core SGX code.

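To make item 2) above concrete, here is a minimal userspace sketch of static EPC allocation, where a slot is fully populated at creation time and creation fails if the pool cannot back it. This is only a toy model, not kernel code: struct epc_pool, struct vepc_slot, epc_alloc_pages(), epc_free_pages(), vepc_slot_create() and vepc_slot_destroy() are all hypothetical names made up for illustration and do not correspond to any existing SGX or KVM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the EPC pool owned by core SGX code. */
struct epc_pool {
	size_t total_pages;
	size_t free_pages;
};

/* Hypothetical stand-in for one virtual EPC slot of a guest. */
struct vepc_slot {
	size_t nr_pages;	/* pages backing the slot, 0 if unpopulated */
};

/* Assumed core-SGX-side API: hand out all n pages or none at all. */
static bool epc_alloc_pages(struct epc_pool *pool, size_t n)
{
	if (n > pool->free_pages)
		return false;
	pool->free_pages -= n;
	return true;
}

/* Assumed core-SGX-side API: return n pages to the pool. */
static void epc_free_pages(struct epc_pool *pool, size_t n)
{
	/* Catch double frees in this toy model. */
	assert(pool->free_pages + n <= pool->total_pages);
	pool->free_pages += n;
}

/*
 * Consumer side (KVM in this discussion): populate every page of the
 * slot up front. Returns 0 on success, -1 if the pool cannot back the
 * whole slot, in which case VM creation would fail.
 */
static int vepc_slot_create(struct epc_pool *pool, struct vepc_slot *slot,
			    size_t nr_pages)
{
	if (!epc_alloc_pages(pool, nr_pages))
		return -1;
	slot->nr_pages = nr_pages;
	return 0;
}

/* Consumer side: give all pages of the slot back on VM teardown. */
static void vepc_slot_destroy(struct epc_pool *pool, struct vepc_slot *slot)
{
	epc_free_pages(pool, slot->nr_pages);
	slot->nr_pages = 0;
}
```

With a 128-page pool, creating a 96-page slot succeeds, a following 64-page slot is refused, and the same 64-page request succeeds once the first slot is destroyed. The point of the sketch is only the split of roles: the pool accounting lives in one place and the consumer just calls into it, with no oversubscription or reclaim involved.
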
It seems there are still lots of design issues on which we haven't reached consensus, in terms of how the SGX driver should look (or how the SGX stack is supposed to work), but if we can focus on what core SGX code should really have (w/o involving SGX driver logic), we can at least get KVM SGX code working. After all, we also have window SGX support.

Thanks,
-Kai