Date: Mon, 16 Nov 2020 12:00:23 -0600
From: "Dr. Greg"
Reply-To: "Dr. Greg"
To: Andy Lutomirski
Cc: Dave Hansen, Jarkko Sakkinen, X86 ML, linux-sgx@vger.kernel.org,
    LKML, Sean Christopherson, Linux-MM, Andrew Morton, Matthew Wilcox,
    Jethro Beekman, Darren Kenny, Andy Shevchenko, asapek@google.com,
    Borislav Petkov, "Xing, Cedric", chenalexchen@google.com,
    Conrad Parker, cyhanish@google.com, "Huang, Haitao", "Huang, Kai",
    "Svahn, Kai", Keith Moyer, Christian Ludloff, Neil Horman,
    Nathaniel McCallum, Patrick Uiterwijk, David Rientjes,
    Thomas Gleixner, yaozhangx@google.com, Mikko Ylinen
Subject: Re: [PATCH v40 10/24] mm: Add 'mprotect' hook to struct vm_operations_struct
Message-ID: <20201116180023.GA32481@wind.enjellic.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 12, 2020 at 02:41:00PM -0800, Andy Lutomirski wrote:

Good morning, I hope the week is starting well for everyone.

> On Thu, Nov 12, 2020 at 1:31 PM Dave Hansen wrote:
> >
> > On 11/12/20 12:58 PM, Dr. Greg wrote:
> > > @@ -270,11 +270,10 @@ static int sgx_vma_mprotect(struct vm_area_struct *vma,
> > >                              struct vm_area_struct **pprev, unsigned long start,
> > >                              unsigned long end, unsigned long newflags)
> > >  {
> > > -        int ret;
> > > +        struct sgx_encl *encl = vma->vm_private_data;
> > >
> > > -        ret = sgx_encl_may_map(vma->vm_private_data, start, end, newflags);
> > > -        if (ret)
> > > -                return ret;
> > > +        if ( test_bit(SGX_ENCL_INITIALIZED, &encl->flags) )
> > > +                return -EACCES;
> > >
> > >          return mprotect_fixup(vma, pprev, start, end, newflags);
> > >  }
> >
> > This rules out mprotect() on running enclaves. Does that break any
> > expectations from enclave authors, or take away capabilities that folks
> > need?
> It certainly prevents any scheme in which an enclave coordinates
> with the outside world to do W-and-then-X JIT inside. I'm also not
> convinced it has any real effect at all unless there's some magic I
> missed to prevent someone from using mmap(2) to effectively change
> permissions.

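As a rough illustration of the concern raised in the quoted text above,
here is a small user-space toy in C. It is only a sketch under stated
assumptions: toy_encl, toy_may_change(), toy_mprotect() and toy_mmap()
are names made up for this example, none of it is kernel code, and the
'initialized' field merely stands in for SGX_ENCL_INITIALIZED. It shows
that a refusal placed only in the mprotect path leaves a fresh mmap
free to choose new protections, while a check shared by both paths
does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for an enclave; NOT the kernel's struct sgx_encl. */
struct toy_encl {
    bool initialized;   /* plays the role of SGX_ENCL_INITIALIZED */
    unsigned int prot;  /* protections currently in effect */
};

/* Shared gate: refuse any permission change once initialized. */
static int toy_may_change(const struct toy_encl *encl)
{
    return encl->initialized ? -EACCES : 0;
}

/* mprotect-like path, mirroring the shape of the quoted hunk. */
static int toy_mprotect(struct toy_encl *encl, unsigned int newprot)
{
    int ret = toy_may_change(encl);

    if (ret)
        return ret;
    encl->prot = newprot;
    return 0;
}

/* mmap-like path; 'gated' selects whether it consults the shared gate. */
static int toy_mmap(struct toy_encl *encl, unsigned int prot, bool gated)
{
    if (gated) {
        int ret = toy_may_change(encl);

        if (ret)
            return ret;
    }
    encl->prot = prot;
    return 0;
}
```

With 'initialized' set, toy_mprotect() returns -EACCES, an ungated
toy_mmap() still succeeds and replaces the protections, and a gated
toy_mmap() is refused.
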
The patch that I posted yesterday addresses the security issue for
both mmap and mprotect by trapping the permission change request at
the level of the sgx_encl_may_map() function.

With respect to the W-and-then-X JIT issue, the stated purpose of the
driver is to implement basic SGX functionality, which is SGX1
semantics. The developers themselves have stated formally, for a year
now, that they are not entertaining a driver that addresses any of the
issues associated with non-static memory permissions.

As I've noted previously, the hardware itself is capable of enforcing
that after initialization, if mmap/mprotect is blocked on hardware
that supports SGX2 instructions.

> Everyone, IMO this SGX1 - vs - SGX2 - vs - EDMM discussion is
> entirely missing the point and is a waste of everyone's time. Let's
> pretend we're building a system that has nothing to do with SGX and
> requires no special hardware support at all. It works like this:

I don't doubt there is a potential bigger vision here; quite frankly,
it is probably an open question whether or not SGX is going to be a
part of this future, for a variety of reasons. I also do not doubt
that you have the skills to define that vision.

Right now, however, the issue is not about pretending but rather one
of getting a driver into the kernel that provides a framework for
building whatever future SGX may have. Given GKH's comments on LWN
last week in response to the RAPL vulnerability, I'm not sure if it is
a politically done deal that the driver will go in.

SGX has specific hardware characteristics that impact the driver; I
don't see how fitting it into a generic trusted execution model
advances the agenda immediately at hand, particularly given the fact
that I'm not even sure people understand the questions that need to be
answered about such a generic model.

> A user program opens /dev/xyz and gets back an fd that represents 16
> MB of memory.
> The user program copies some data from disk (or
> network or whatever) into fd (using write(2) or ioctl(2) or mmap(2)
> and memcpy) and then mmaps some of the fd as R and some as RW and
> some as RX, and then the user program jumps into the RX mapping.

This is basically the SGX model in the new driver.

The important defining characteristic of the driver, one that we can't
wave away, is that the hardware requires a specific set of initial
page permissions to be implemented in order for initialization of the
memory range (enclave) to succeed. This is inherent to the way SGX
hardware was designed to work. The only difference between SGX1 and
SGX2 is that the latter offers a small number of additional
instructions that allow the page permissions to be dynamically
manipulated after initialization is complete.

From a security perspective, the issue at hand is that the executable
material is not going to come in through the fd; it is going to be
loaded by the enclave over the network. This isn't fear mongering; it
is the stated intent of what people want to do with the technology as
an integral part of confidential computing.

I've had the opportunity to brief DOD and other entities concerned
with intelligence issues about these types of potential capabilities.
It isn't hard to envision scenarios where having potentially sensitive
code and data only ever handled and executed by a trusted entity, in
an environment that is inherently ephemeral with respect to its
persistence, is an important design characteristic. Thermite has also
been known to play a role in some of these designs prior to the
greater elegance of trusted execution environments.

Ultimately, if you believe the Confidential Computing Consortium, it
is also what people want for their sensitive cloud workloads. Absent
the thermite of course.

> If we replace /dev/xyz with /dev/zero, then this simply does not work
> under a reasonably strict W^X policy -- a lot of people think it's
> quite reasonable for an OS to prevent a user program from obtaining an
> X mapping containing anything other than a mapping from a file on
> disk. To solve this, we can do one of at least three things:
>
> a) You can't use /dev/xyz unless you have permission to create WX
> memory or to at least create W memory and then change it to X.
>
> b) You can do whatever you want with /dev/xyz, and LSM policies are
> blatantly violated as a result.

I think the important issue at hand is that classic LSM policies
simply are not relevant with respect to how this technology was
designed to operate, and perhaps more importantly, how people want to
use it.

That is why I have consistently stated that I think the only relevant
knob is a binary decision as to whether or not a platform owner wants
to entertain completely anonymous code execution.

> c) The /dev/xyz API is clever and tracks, page-by-page, whether the
> user intends to ever write and/or execute that page, and behaves
> accordingly.
>
> This patchset takes the approach (c). The actual clever policy
> isn't here yet, and we don't really know whether it will ever
> appear, but the API is set up to enable such a policy to be written.
> This appears to be a win for everyone, since the code is pretty
> clean and the API is straightforward.

I believe I have been clear in stating that I have never doubted the
cleverness of the approach or its potential utility for the future.
The issue at hand is that it simply isn't relevant at this stage of
the driver.

Getting this new vision right is something that would benefit from a
lot of conversations between runtime and kernel developers. Arguably,
the case can be made that it should have a second type of
implementation to ensure that the approach is generic, extensible and
most importantly secure.

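Approach (c), as described in the quoted text, can be pictured with a
small user-space sketch in C. This is a toy under stated assumptions,
not the driver's code: toy_dev, toy_may_map() and the TOY_* bits are
hypothetical names, and the per-page max_prot array simply records the
most that the user declared an intent to do with each page. A mapping
request is allowed only if every page in the range covers all of the
requested bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_R 0x1u
#define TOY_W 0x2u
#define TOY_X 0x4u
#define TOY_NPAGES 4

/* Hypothetical per-page record of the most the user ever intends to do. */
struct toy_dev {
    unsigned int max_prot[TOY_NPAGES];
};

/*
 * Allow mapping pages [first, last] with 'prot' only if every page's
 * declared maximum covers all of the requested permission bits.
 */
static int toy_may_map(const struct toy_dev *dev, size_t first, size_t last,
                       unsigned int prot)
{
    if (first > last || last >= TOY_NPAGES)
        return -EINVAL;

    for (size_t i = first; i <= last; i++) {
        /* Any requested bit outside the page's maximum is refused. */
        if (prot & ~dev->max_prot[i])
            return -EACCES;
    }
    return 0;
}
```

In this toy, a page declared as R|W can be mapped R or R|W but never
with X, which is the page-by-page bookkeeping the quoted option
describes; what policy should sit on top of it is a separate question.
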
The 'cleverness' of the policy needs to be evaluated in the context of
what it means with respect to the risk arbitration decisions that we
are trying to support. The open question comes down to, in essence,
asking ourselves whether or not we believe that it makes sense to say
that 15 pages of RWX memory is a security threat but 5 are not.

> Now, back to SGX. There are only two things that are remotely
> SGX-specific here. First, SGX requires this unusual memory model in
> which there is an executable mapping of (part of) a device node. [0]
> Second, early SGX hardware had this oddity that the kernel could set
> a per-backing-page (as opposed to per-PTE) bit to permanently
> disable X on a given /dev/sgx page. Building a security model
> around that would have been a hack, and it DOES NOT WORK on new
> hardware. So can we please stop discussing it? None of the actual
> interesting parts of this have much to do with SGX per se and have
> nothing whatsoever to do with EDMM or any other Intel buzzword.

Just a clarification for everyone sitting in their recliners eating
popcorn and following along at home.

As I've stated before, I don't argue the potential utility of some new
model; SGX, however, has hardware characteristics that cannot be waved
away in this discussion.

The technology was designed to have a cryptographic measurement that
controls whether or not the memory image is suitable for execution.
The description of that image is defined by the enclave author when
the enclave is signed.

This is why the current EDMM implementation requires that a maximum
aperture range be defined for dynamic memory regions. Since the linear
address of each page in the enclave is encoded into the measurement,
enclave initialization will fail unless the loaded memory image is
consistent with the wishes of the enclave signer.
Having 40 pages of potential heap memory when the author called for 39
would be considered an initialization defect, which the hardware would
enforce.

The desired page permissions are also encoded into the enclave
measurement. Since the current implementation takes the maximum scoped
permissions from the security information encoded in the enclave, it
would require that the enclave encode for RWX permissions if the
intent was to dynamically load executable or JIT code after the
enclave was initialized. If I understand your above analysis
correctly, this would be problematic for current security
models/practices.

Obviously an API could be proposed that allowed this permissible
memory map to be defined independently from the enclave. This notion,
based on my read of the security risk considerations that went into
the design of SGX, would be problematic, since it would allow an
untrusted party to modify the characteristics that were desired by the
enclave author for the executable image.

> Heck, if anyone actually cared to do so, something with essentially
> the same semantics could probably be built using SEV hardware
> instead of SGX, and it would have exactly the same issue if we
> wanted it to work for tasks that didn't have access to /dev/kvm.

As I noted above, perhaps whatever this driver may become in the
future would benefit from the community creating something like this
as a second reference to build an API on top of.

> [0] SGX doesn't *really* require this. We could set things up so that
> you do mmap(..., MAP_ANONYMOUS, fd, ...) and then somehow introduce
> that mapping to SGX. I think the result would be too disgusting to
> seriously consider.

Let me be clear, I certainly would not advocate doing anything too
disgusting to consider. Hopefully our proposal for simplifying the
security model for the driver, while still allowing the framework for
a still unspecified future pathway, doesn't fit this description.

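The earlier point about the measurement covering both the page
addresses and the page permissions can be made concrete with a toy
model in C. This is a sketch under loud assumptions: toy_page,
toy_measure() and toy_init() are invented names, and FNV-1a is used
purely as a compact stand-in hash; it is not what the hardware
computes. The only property being illustrated is that a digest taken
over (offset, permissions) for every page no longer matches the signed
value once a page is added or a permission is widened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One page as the signer described it: where it sits and what it may do. */
struct toy_page {
    uint64_t offset;  /* page offset within the enclave's linear range */
    uint8_t  prot;    /* permission bits, e.g. R=1, W=2, X=4 */
};

/* Fold 'len' bytes into a running 64-bit FNV-1a hash (stand-in only). */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;  /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Toy "measurement": depends on page count, order, offsets, permissions. */
static uint64_t toy_measure(const struct toy_page *pages, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */

    for (size_t i = 0; i < n; i++) {
        h = fnv1a(h, &pages[i].offset, sizeof pages[i].offset);
        h = fnv1a(h, &pages[i].prot, sizeof pages[i].prot);
    }
    return h;
}

/* Initialization succeeds only if what was loaded matches what was signed. */
static int toy_init(const struct toy_page *loaded, size_t n,
                    uint64_t signed_digest)
{
    return toy_measure(loaded, n) == signed_digest ? 0 : -1;
}
```

In this model, loading one page more than the signer described, or
loading a page as RWX where the signer specified RW, makes toy_init()
fail, which mirrors the 40-versus-39 page example above.
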
Best wishes for a productive week to everyone.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker      Autonomously self-defensive
Enjellic Systems Development, LLC     IOT platforms and edge devices.
4206 N. 19th Ave.
Fargo, ND  58102
PH: 701-281-1686                      EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Boy, it must not take much to make a phone work.  Looking at
 everthing else here it must be the same way with the INTERNET."
                                -- Francis 'Fritz' Wettstein