Message-ID: <1541541565.8854.13.camel@intel.com>
Subject: Re: RFC: userspace exception fixups
From: Sean Christopherson
To: Andy Lutomirski, Dave Hansen
Cc: Jann Horn, Linus Torvalds, Rich Felker, Dave Hansen, Jethro Beekman,
 Jarkko Sakkinen, Florian Weimer, Linux API, X86 ML, linux-arch, LKML,
 Peter Zijlstra, nhorman@redhat.com, npmccallum@redhat.com,
 "Ayoun, Serge", shay.katz-zamir@intel.com, linux-sgx@vger.kernel.org,
 Andy Shevchenko, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Carlos O'Donell, adhemerval.zanella@linaro.org
Date: Tue, 06 Nov 2018 13:59:25 -0800
References:
 <20181102170627.GD7393@linux.intel.com>
 <20181102173350.GF7393@linux.intel.com>
 <20181102182712.GG7393@linux.intel.com>
 <20181102220437.GI7393@linux.intel.com>
 <1541518670.7839.31.camel@intel.com>
 <1541524750.7839.51.camel@intel.com>
 <22596E35-F5D1-4935-86AB-B510DCA0FABE@amacapital.net>
 <1C426267-492F-4AE7-8BE8-C7FE278531F9@amacapital.net>
 <209cf4a5-eda9-2495-539f-fed22252cf02@intel.com>
 <9B76E95B-5745-412E-8007-7FAA7F83D6FB@amacapital.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski wrote:
> >
> > > On Nov 6, 2018, at 1:00 PM, Dave Hansen wrote:
> > >
> > > > On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> > > > True, but what if we have a nasty enclave that writes to memory just
> > > > below SP *before* decrementing SP?
> > >
> > > Yeah, that would be unfortunate.  If an enclave did this (roughly):
> > >
> > >    1. EENTER
> > >    2. Hardware sets eenter_hwframe->sp = %sp
> > >    3. Enclave runs... wants to do out-call
> > >    4. Enclave sets up parameters:
> > >        memcpy(&eenter_hwframe->sp[-offset], arg1, size);
> > >        ...
> > >    5. Enclave sets eenter_hwframe->sp -= offset
> > >
> > > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> > > was on the stack.  The enclave could easily fix this by moving ->sp first.
> > >
> > > But, this is one of those "fun" parts of the ABI that I think we need to
> > > talk about.  If we do this, we also basically require that the code
> > > which handles asynchronous exits must *not* write to the stack.  That's
> > > not hard because it's typically just a single ERESUME instruction, but
> > > it *is* a requirement.
> > >
> > I was assuming that the async exit stuff was completely hidden by the
> > API. The AEP code would decide whether the exit got fixed up by the
> > kernel (which may or may not be easy to tell -- can the code even tell
> > without kernel help whether it was, say, an IRQ vs #UD?) and then
> > either do ERESUME or cause sgx_enter_enclave() to return with an
> > appropriate return value.
> >
> Sean, how does the current SDK AEX handler decide whether to do
> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> like the *CPU* could give a big hint, but I don't see where there is
> any architectural indication of why the AEX code got called or any
> obvious way for the user code to know whether the exit was fixed up by
> the kernel?

The SDK "unconditionally" does ERESUME at the AEP location, but that's a
bit misleading because its signal handler may muck with the context's
RIP, e.g. to abort the enclave on a fatal fault.

On an event/exception from within an enclave, the event is immediately
delivered after loading synthetic state and changing RIP to the AEP.
In other words, jamming CPU state is essentially a bunch of vectoring
ucode preamble, but from software's perspective it's a normal event
that happens to point at the AEP instead of somewhere in the enclave.
And because the signals the SDK cares about are all synchronous, the
SDK can simply hardcode ERESUME at the AEP since all of the fault logic
resides in its signal handler.  IRQs and whatnot simply trampoline back
into the enclave.

Userspace can do something funky instead of ERESUME, but only *after*
IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's case,
after the trap handler has run.

Jumping back a bit, how much do we care about preventing userspace from
doing stupid things?  I did a quick POC on the idea of hardcoding fixup
for the ENCLU opcode, and the basic idea checks out.  The code is fairly
minimal and doesn't impact the core functionality of the SDK.  They'd
need to redo their trap handling to move it from the signal handler to
inline, but their stack shenanigans won't be any more broken than they
already are.
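
For anyone following along, the window between steps 4 and 5 in Dave's
quoted sequence can be modelled in plain C.  This is only a rough sketch
under stated assumptions, not SDK or kernel code: the untrusted stack is
a byte array, "sp" stands in for eenter_hwframe->sp, and signal delivery
is modelled as the kernel writing a frame immediately below the saved
sp.  The names stack_model, deliver_signal() and outcall_survives(), and
the 32-byte and 64-byte sizes, are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 256

/* Hypothetical model of the untrusted stack; it grows toward index 0. */
struct stack_model {
    unsigned char stack[STACK_SIZE];
    size_t sp;  /* stands in for eenter_hwframe->sp */
};

/* Model of signal delivery: a frame is written just below the saved sp. */
static void deliver_signal(struct stack_model *m, size_t frame_size)
{
    assert(frame_size <= m->sp);
    memset(&m->stack[m->sp - frame_size], 0xAA, frame_size);
}

/*
 * Sets up an out-call argument and takes a signal in the window.
 * Returns 1 if the argument copy is intact afterwards, 0 if clobbered.
 */
static int outcall_survives(int move_sp_first)
{
    struct stack_model m;
    const char arg1[] = "arg1-payload";
    const size_t offset = 32;      /* illustrative parameter area size */
    const size_t frame_size = 64;  /* illustrative signal frame size */

    memset(m.stack, 0, sizeof(m.stack));
    m.sp = STACK_SIZE;

    if (move_sp_first) {
        m.sp -= offset;                              /* step 5 first */
        memcpy(&m.stack[m.sp], arg1, sizeof(arg1));  /* then step 4 */
        deliver_signal(&m, frame_size);              /* lands below sp */
    } else {
        memcpy(&m.stack[m.sp - offset], arg1, sizeof(arg1)); /* step 4 */
        deliver_signal(&m, frame_size);  /* signal between 4 and 5 */
        m.sp -= offset;                                      /* step 5 */
    }
    return memcmp(&m.stack[m.sp], arg1, sizeof(arg1)) == 0;
}
```

In this model outcall_survives(0) returns 0, because the frame lands on
top of the argument copy, and outcall_survives(1) returns 1, because the
frame is written below the already-decremented sp.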