Received: by 2002:a25:86ce:0:0:0:0:0 with SMTP id y14csp2485141ybm; Thu, 23 May 2019 18:22:47 -0700 (PDT) X-Google-Smtp-Source: APXvYqwZQlMdEirjSZ27OACPPz4YNgEoORA789CfB1WuDVr84s+qE9yR01wJVl7iAGlZHG3VMP0y X-Received: by 2002:a17:902:b70d:: with SMTP id d13mr2879950pls.259.1558660967379; Thu, 23 May 2019 18:22:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558660967; cv=none; d=google.com; s=arc-20160816; b=ZkzHpglndX1/g+EEwspj/dTECidl70Jm8Y0ovfp0gAFO5nLKVAK4XTrcy5YXmKY0jU EbrXzhyz80lpEk7z5sdMzgxNeexB46xU8GfdMFfHtz8IWpagWRDby/Wruyy3ftygpezG 88ATpYoVYE3JoUheAmwH6ahhJwgBJTGsbkuzPRPpYsoTCglcxfQNmzAMC7oLKxqT/KzU CILZc38S//NjGMmyMdQU5nXTBcEm7hYQfREtF3ab9wMR/eH3N95elJDSZCUMAa+dE3cD 1o6LujW1Vm7kQMz2F6GyikUMLW7ObPin+Y0VR4RrkNg85bBdzWK2jJwqx2uVlcYy3MGJ McmQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=G1qrX1d6EpuBeTlTpBy49ppgZ2ZLEOjffjjJDsFLZ5I=; b=BSal5pwby1wHUIzq9ZyRQWtBqhZDcAzgCxymzFvSCpqL0OMV5r0IqDUfZCiwMSIoii HKWRRKo7mG+5m5PXH11FflxMqFZLi8W8cRywe3npx9lnEPg/7+Mr9/SYRSamw9ks3eIu kk2cAm8koOVAud8YyNFjzCyVPctQzq9hVNQspTfQk3+7vqy23GcrbV3DgJpvcRH+2IdX XlB6oNiOqFVPU8B1y7dOx0eTu41e2dZn/IBu/XRWaJwz9rVEmf0uNyUDLwRdDqTg2uTn W1HDlp8zpercdotCcgjIpmnLLheW6Csgx+dom2gkv9LEOxeBYQIAAXiun77Y7k+v7OpD 3Kpg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b="L17Jfa/U"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a10si1912396pfc.55.2019.05.23.18.22.27; Thu, 23 May 2019 18:22:47 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b="L17Jfa/U"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389183AbfEXBSL (ORCPT + 99 others); Thu, 23 May 2019 21:18:11 -0400 Received: from mail.kernel.org ([198.145.29.99]:46556 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387626AbfEXBSK (ORCPT ); Thu, 23 May 2019 21:18:10 -0400 Received: from mail-wm1-f47.google.com (mail-wm1-f47.google.com [209.85.128.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6416C218B6 for ; Fri, 24 May 2019 01:18:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1558660688; bh=7lZcMPoWnmeu0Lf6tScsQ7kQKOEs9HqN2DCaPjTfUzo=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=L17Jfa/Us2ZXApwta7cAoOC9c3NGw2tm1QB1VurrQU0TaWCdhiMet8mHL20XvrAqV TH20q/gdWLNJKa3bM34yERSCAPx4cIp6M17ITISjzJRIM+um7KyEETShxLr84/uD5V 7pxeT3K+W7thx1oE8oyiPHYiMXItdqJ/8+384sO0= Received: by mail-wm1-f47.google.com with SMTP id v19so48549wmh.0 for ; Thu, 23 May 2019 18:18:08 -0700 (PDT) X-Gm-Message-State: APjAAAU21k38l55trV5P6ErQAjMVHEQQ/4JijvU/Z+pEkEPKeN08mqW6 j0W713DVCr2rbQEKleNlTiG94Y645IgqsCtXc2hxDA== X-Received: by 2002:a1c:e906:: with SMTP id q6mr14707368wmc.47.1558660686807; Thu, 23 May 2019 18:18:06 -0700 (PDT) MIME-Version: 1.0 References: <20190521155140.GE22089@linux.intel.com> <20190522132022.GC31176@linux.intel.com> <20190522132227.GD31176@linux.intel.com> <0e183cce-c4b4-0e10-dbb6-bd81bea58b66@tycho.nsa.gov> <20190522153836.GA24833@linux.intel.com> <20190523023517.GA31950@linux.intel.com> <20190523102628.GC10955@linux.intel.com> <20190523141752.GA12078@linux.intel.com> <20190523234044.GC12078@linux.intel.com> In-Reply-To: <20190523234044.GC12078@linux.intel.com> From: Andy Lutomirski Date: Thu, 23 May 2019 18:17:54 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support) To: Sean Christopherson Cc: Andy Lutomirski , Jarkko Sakkinen , Stephen Smalley , James Morris , "Serge E. Hallyn" , LSM List , Paul Moore , Eric Paris , selinux@vger.kernel.org, Jethro Beekman , "Xing, Cedric" , "Hansen, Dave" , Thomas Gleixner , "Dr. Greg" , Linus Torvalds , LKML , X86 ML , "linux-sgx@vger.kernel.org" , Andrew Morton , "nhorman@redhat.com" , "npmccallum@redhat.com" , "Ayoun, Serge" , "Katz-zamir, Shay" , "Huang, Haitao" , Andy Shevchenko , "Svahn, Kai" , Borislav Petkov , Josh Triplett , "Huang, Kai" , David Rientjes Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 23, 2019 at 4:40 PM Sean Christopherson wrote: > > On Thu, May 23, 2019 at 08:38:17AM -0700, Andy Lutomirski wrote: > > On Thu, May 23, 2019 at 7:17 AM Sean Christopherson > > wrote: > > > > > > On Thu, May 23, 2019 at 01:26:28PM +0300, Jarkko Sakkinen wrote: > > > > On Wed, May 22, 2019 at 07:35:17PM -0700, Sean Christopherson wrote: > > > > > But actually, there's no need to disallow mmap() after ECREATE since the > > > > > LSM checks also apply to mmap(), e.g. FILE__EXECUTE would be needed to > > > > > mmap() any enclave pages PROT_EXEC. I guess my past self thought mmap() > > > > > bypassed LSM checks? The real problem is that mmap()'ng an existing > > > > > enclave would require FILE__WRITE and FILE__EXECUTE, which puts us back > > > > > at square one. > > > > > > > > I'm lost with the constraints we want to set. > > > > > > As is today, SELinux policies would require enclave loaders to have > > > FILE__WRITE and FILE__EXECUTE permissions on /dev/sgx/enclave. Presumably > > > other LSMs have similar requirements. Requiring all processes to have > > > FILE__{WRITE,EXECUTE} permissions means the permissions don't add much > > > value, e.g. they can't be used to distinguish between an enclave that is > > > being loaded from an unmodified file and an enclave that is being > > > generated on the fly, e.g. Graphene. > > > > > > Looking back at Andy's mail, he was talking about requiring FILE__EXECUTE > > > to run an enclave, so perhaps it's only FILE__WRITE that we're trying to > > > special case. > > > > > > > I thought about this some more, and I have a new proposal that helps > > address the ELRANGE alignment issue and the permission issue at the > > cost of some extra verbosity. Maybe you all can poke holes in it :) > > The basic idea is to make everything more explicit from a user's > > perspective. Here's how it works: > > > > Opening /dev/sgx/enclave gives an enclave_fd that, by design, doesn't > > give EXECUTE or WRITE. mmap() on the enclave_fd only works if you > > pass PROT_NONE and gives the correct alignment. The resulting VMA > > cannot be mprotected or mremapped. It can't be mmapped at all until > > I assume you're thinking of clearing all VM_MAY* flags in sgx_mmap()? > > > after ECREATE because the alignment isn't known before that. > > I don't follow. The alignment is known because userspace knows the size > of its enclave. The initial unknown is the address, but that becomes > known once the initial mmap() completes. [...] I think I made the mistake of getting too carried away with implementation details rather than just getting to the point. And I misremembered the ECREATE flow -- oops. Let me try again. First, here are some problems with some earlier proposals (mine, yours Cedric's): - Having the EADD operation always work but have different effects depending on the source memory permissions is, at the very least, confusing. - If we want to encourage user programs to be well-behaved, we want to make it easy to map the RX parts of an enclave RX, the RW parts RW, the RO parts R, etc. But this interacts poorly with the sgx_mmap() alignment magic, as you've pointed out. - We don't want to couple LSMs with SGX too tightly. So here's how a nice interface might work: int enclave_fd = open("/dev/sgx/enclave", O_RDWR); /* enclave_fd points to a totally blank enclave. Before ECREATE, we need to decide on an address. */ void *addr = mmap(NULL, size, PROT_NONE, MAP_SHARED, enclave_fd, 0); /* we have an address! */ ioctl(enclave_fd, ECREATE, ...); /* now add some data to the enclave. We want the RWX addition to fail immediately unless we have the relevant LSM pemission. Similarly, we want the RX addition to fail immediately unless the source VMA is appropriate. */ ioctl(enclave_fd, EADD, rx_source_1, MAXPERM=RX, ...); [the ... includes SECINFO, which the kernel doesn't really care about] ioctl(enclave_fd, EADD, ro_source_1, MAXPERM=RX ...); ioctl(enclave_fd, EADD, rw_source_1, MAXPERM=RW ...); ioctl(enclave_fd, EADD, rwx_source_1, MAXPERM=RWX ...); ioctl(enclave_fd, EINIT, ...); /* presumably pass sigstruct_fd here, too. */ /* at this point, all is well except that the enclave is mapped PROT_NONE. There are a couple ways I can imagine to fix this. */ We could use mmap: mmap(baseaddr+offset, len, PROT_READ, MAP_SHARED | MAP_FIXED, enclave_fd, 0); /* only succeeds if MAXPERM & R == R */ But this has some annoying implications with regard to sgx_get_unmapped_area(). We could use an ioctl: ioctl(enclave_fd, SGX_IOC_MPROTECT, offset, len, PROT_READ); which has the potentially nice property that we can completely bypass the LSM hooks, because the LSM has *already* vetted everything when the EADD calls were allowed. Or we could maybe even just use mprotect() itself: mprotect(baseaddr + offset, len, PROT_READ); Or, for the really evil option, we could use a bit of magic in .fault and do nothing here. Instead we'd make the initial mapping PROT_READ|PROT_WRITE|PROT_EXEC and have .fault actually instantiate the PTEs with the intersection of the VMA permissions and MAXPERM. I don't think I like this alternative, since it feels more magical than needed and it will be harder to debug. I like the fact that /proc/self/maps shows the actual permissions in all the other variants. All of the rest of the crud in my earlier email was just implementation details. The point I was trying to make was that I think it's possible to implement this without making too much of a mess internally. I think I favor the mprotect() approach since it makes the behavior fairly obvious. I don't think any of this needs to change for SGX2. We'd have an ioctl() that does EAUG and specifies MAXPERM. Trying to mprotect() a page that hasn't been added yet with any permission other than PROT_NONE would fail. I suppose we might end up needing a way to let the EAUG operation *change* MAXPERM, and this operation would have to do some more LSM checks and walk all the existing mappings to make sure they're consistent with the new MAXPERM. As an aside, I wonder if Linus et all would be okay with a new MAP_FULLY_ALIGNED mmap() flag that allocated memory aligned to the requested size. Then we could get rid of yet another bit of magic. --Andy