Received: by 2002:a25:86ce:0:0:0:0:0 with SMTP id y14csp1009245ybm; Wed, 22 May 2019 15:44:43 -0700 (PDT) X-Google-Smtp-Source: APXvYqwJq/0TMtzyLYuASlYjpdlWai7eZXNF8FoOkiGr3wrOBOjFZdnGuMKKDC17UcXe90yl6Wz/ X-Received: by 2002:a62:6d47:: with SMTP id i68mr99263561pfc.189.1558565083164; Wed, 22 May 2019 15:44:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558565083; cv=none; d=google.com; s=arc-20160816; b=mFHQAotrwd/SYJtt6cUuaNNFG0fxBcnqddJH1HXTgaMrVbIaqPkh3VUGu+hTO49HqG Gb/QQEmhJIjhYx7k2waCcdcLrBr+Ai4IaFz+tKKPFq5/JUNmE63ZwqQ9RFiP58f1PtVq GEN/s0rRpHcmb1Ug4c2PGe4mEGPqhO0LF+ncIuc5K1C1bTRCcw0AbcA4eDNeHDSacQuV rNZk05Uo61VBW8MnF7ybPwKMBAbJQINS9nMwXKC4QGFRVrJtcCzXLR98tdI+gZ1tRec/ D4igAP9PU6NRgNUE7CliGB4LiNyw851idAunGEzYFdM30SHqekFCkK38K3cOp/RYtmh2 eczg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=r7kXOA1RaNdkRTQQYtwAxZWwm1WE0LcY2HGOh9mfDxI=; b=iGk2Sv09Lj69GlO82li81CaHxGYDa/hbLjLiwNoc4mE8e6kTN8Z4TSQuGUdfPnyWqn 5mKk9w1Rigd8Mm/6gaCFhSbpyVQxQEBEUYiVc4ggTWhwdbKRpEqRdcQpAB9ZfVrsydMi rJeP0PHZbzKNru7jBOw7iEEpoKf/R4jMnIi2AneWBMG0Bhtr4m6EIAKCLA2mzjbs5AoQ 0xlyEV3i2Ql9z0m/1sXo+c8lJyLGO7nSmE40EPHd2IL069TH14paWP4e3xLqW7+HvKmu wfxUfA2MG3jd4P8XWqv3VHpVQHYpBrlyj01BT/JVvS4SMYZnq9aJ+OYqQD7/RZKpFvD2 PSpw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Uv3dbxZB; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j2si27586152pgh.192.2019.05.22.15.44.27; Wed, 22 May 2019 15:44:43 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Uv3dbxZB; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729329AbfEVWnB (ORCPT + 99 others); Wed, 22 May 2019 18:43:01 -0400 Received: from mail.kernel.org ([198.145.29.99]:58206 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728202AbfEVWnB (ORCPT ); Wed, 22 May 2019 18:43:01 -0400 Received: from mail-wm1-f46.google.com (mail-wm1-f46.google.com [209.85.128.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A471821855 for ; Wed, 22 May 2019 22:42:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1558564979; bh=OOvk4jhd4RJaf/SM5z16pjS8vJ3csmmntOLya71ImTY=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=Uv3dbxZBVzkpdBCbr+gOafq0SWIgkZDVjd3lr2+tS4Ri2jV+M0mOrPu8pY9d0wn3T DHL0MaocHFIaV913/qDSFVudmp3spEApGaelHChdWetR5Gb/uoW26APzBLn5dEZcZD hU8LYCVAohKZLS9FA16eD4RlISBsDbzzDbT0ynhs= Received: by mail-wm1-f46.google.com with SMTP id i3so3783178wml.4 for ; Wed, 22 May 2019 15:42:59 -0700 (PDT) X-Gm-Message-State: APjAAAWY49dJKIRXp4Buhgq3xbCZVrRX7vgouIKQa3o3RESxR89CpsPC bald4FOS7WuT3kL+108yyK0o7YtAm0H/72LKERo8wQ== X-Received: by 2002:a1c:4107:: with SMTP id o7mr9456931wma.122.1558564978244; Wed, 22 May 2019 15:42:58 -0700 (PDT) MIME-Version: 1.0 References: <20190515013031.GF1977@linux.intel.com> <20190517000331.GD11204@linux.intel.com> <20190520114105.GD27805@linux.intel.com> <20190521151836.GA4843@linux.intel.com> <20190521155140.GE22089@linux.intel.com> <20190522132022.GC31176@linux.intel.com> <20190522132227.GD31176@linux.intel.com> <0e183cce-c4b4-0e10-dbb6-bd81bea58b66@tycho.nsa.gov> <20190522153836.GA24833@linux.intel.com> In-Reply-To: <20190522153836.GA24833@linux.intel.com> From: Andy Lutomirski Date: Wed, 22 May 2019 15:42:45 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support) To: Sean Christopherson Cc: Stephen Smalley , Jarkko Sakkinen , Andy Lutomirski , James Morris , "Serge E. Hallyn" , LSM List , Paul Moore , Eric Paris , selinux@vger.kernel.org, Jethro Beekman , "Xing, Cedric" , "Hansen, Dave" , Thomas Gleixner , "Dr. Greg" , Linus Torvalds , LKML , X86 ML , "linux-sgx@vger.kernel.org" , Andrew Morton , "nhorman@redhat.com" , "npmccallum@redhat.com" , "Ayoun, Serge" , "Katz-zamir, Shay" , "Huang, Haitao" , Andy Shevchenko , "Svahn, Kai" , Borislav Petkov , Josh Triplett , "Huang, Kai" , David Rientjes Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 22, 2019 at 8:38 AM Sean Christopherson wrote: > > On Wed, May 22, 2019 at 09:56:30AM -0400, Stephen Smalley wrote: > > On 5/22/19 9:22 AM, Jarkko Sakkinen wrote: > > >On Wed, May 22, 2019 at 04:20:22PM +0300, Jarkko Sakkinen wrote: > > >>On Tue, May 21, 2019 at 08:51:40AM -0700, Sean Christopherson wrote: > > >>>Except that mmap() is more or less required to guarantee that ELRANGE > > >>>established by ECREATE is available. And we want to disallow mmap() as > > >>>soon as the first EADD is done so that userspace can't remap the enclave's > > >>>VMAs via munmap()->mmap() and gain execute permissions to pages that were > > >>>EADD'd as NX. > > >> > > >>We don't want to guarantee such thing and it is not guaranteed. It does > > >>not fit at all to the multi process work done. Enclaves are detached > > >>from any particular process addresse spaces. It is responsibility of > > >>process to open windows to them. > > >> > > >>That would be completely against work that we've done lately. > > > > > >Example use case: you have a process that just constructs an enclave > > >and sends it to another process or processes for use. The constructor > > >process could have basically anything on that range. This was the key > > >goal of the fd based enclave work. > > > > What exactly happens in the constructor versus the recipient processes? > > Which process performs each of the necessary open(), mmap(), and ioctl() > > calls for setting up the enclave? Can you provide a high level overview of > > the sequence of userspace calls by the constructor and by the recipient > > similar to what Sean showed earlier for just a single process? > > Hmm, what we had talked about was allowing the SGX ioctls to work without > an associated VMA, with the end goal of letting userspace restrict access > to /dev/sgx/enclave. Very roughly... > > Enclave Owner: > > connect(builder, ...); > send(builder, "/home/sean/path/to/my/enclave"); > > recv(builder, &enclave_fd); > > for_each_chunk { > mmap(enclave_addr + offset, size, ..., MAP_SHARED, enclave_fd, 0); > } > > > Enclave Builder: > > recv(sock, &enclave_path); > > source_fd = open(enclave_path, O_RDONLY); > for_each_chunk { > > } > > enclave_fd = open("/dev/sgx/enclave", O_RDWR); > > ioctl(enclave_fd, ENCLAVE_CREATE, ...); > for_each_chunk { > struct sgx_enclave_add ioctlargs = { > .offset = chunk.offset, > .source = chunk.addr, > .size = chunk.size, > .type = chunk.type, /* SGX specific metadata */ > } > ioctl(fd, ENCLAVE_ADD, &ioctlargs); /* modifies enclave's VMAs */ > } > ioctl(enclave_fd, ENCLAVE_INIT, ...); > > write(sock, enclave_fd); > > > But the above flow is flawed because there'a catch-22: ENCLAVE_ECREATE > takes the virtual address of the enclave, but in the above flow that's > not established until "mmap(..., enclave_fd)". And because an enclave's > virtual range needs to be naturally aligned (hardware requirements), the > enclave owner would need to do something like: > > source_fd = open("/home/sean/path/to/my/enclave", O_RDONLY); > size = > > enclave_range = mmap(NULL, size*2, PROT_READ, ???, NULL, 0); > enclave_addr = (enclave_range + (size - 1)) & ~(size - 1); > > connect(builder, ...); > send(builder, {"/home/sean/path/to/my/enclave", enclave_addr}); > > recv(builder, &enclave_fd); > > munmap(enclave_range); > > for_each_chunk { > addr = mmap(enclave_addr + c.offset, c.size, ..., MAP_SHARED, enclave_fd, 0); > if (addr != enclave_addr + c.offset) > exit(1); > } > > And that straight up doesn't work with the v20 driver because mmap() with > the enclave_fd will run through sgx_get_unmapped_area(), which also does > the natural alignment adjustments (the idea being that mmap() is mapping > the entire enclave). E.g. mmap() will map the wrong address if the offset > of a chunk is less than its size due to the driver adjusting the address. That presumably needs to change. Are we entirely missing an API to allocate a naturally aligned VA range? That's kind of annoying. > > Eliminating sgx_get_unmapped_area() means userspace is once again on the > hook for naturally aligning the enclave, which is less than desirable. > > Looking back at the original API discussions around a builder process[1], > we never fleshed out the end-to-end flow. While having a builder process > *sounds* reasonable, in practice it adds a lot of complexity without > providing much in the way of added security. E.g. in addition to the > above mmap() issues, since the order of EADDs affects the enclave > measurement, the enclave owner would need to communicate the exact steps > to build the enclave, or the builder would need a priori knowledge of the > enclave format. > > Userspace can still restrict access to /dev/sgx/enclave, e.g. by having a > daemon that requires additional credentials to obtain a new enclave_fd. > So AFAICT, the only benefit to having a dedicated builder is that it can > do its own whitelisting of enclaves, but since we're trending towards > supporting whitelisting enclaves in the kernel, e.g. via sigstruct, > whitelisting in userspace purely in userspace also provides marginal value. > > TL;DR: Requiring VMA backing to build an enclave seems reasonable and sane. This isn't necessarily a problem, but we pretty much have to use mprotect() then. Maybe the semantics could just be that mmap() on the SGX device gives natural alignment, but that there is no actual constraint enforced by the driver as to whether mmap() happens before or after ECREATE. After all, it's *ugly* for user code to reserve its address range with an awkward giant mmap(), there's nothing fundamentally wrong with it. As far as I know from this whole discussion, we still haven't come up with any credible way to avoid tracking, per enclave page, whether that page came from unmodified PROT_EXEC memory.