In-Reply-To: <20181107000235.GC11101@linux.intel.com>
From: Andy Lutomirski
Date: Tue, 6 Nov 2018 17:17:14 -0800
Subject: Re: RFC: userspace exception fixups
To: "Christopherson, Sean J"
Cc: Andrew Lutomirski, Dave Hansen, Jann Horn, Linus Torvalds,
    Rich Felker, Dave Hansen, Jethro Beekman, Jarkko Sakkinen,
    Florian Weimer, Linux API, X86 ML, linux-arch, LKML,
    Peter Zijlstra, nhorman@redhat.com, npmccallum@redhat.com,
    "Ayoun, Serge", shay.katz-zamir@intel.com,
    linux-sgx@vger.kernel.org, Andy Shevchenko, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, "Carlos O'Donell",
    adhemerval.zanella@linaro.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson wrote:
>
> On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson
> > wrote:
> > >
> > > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> > > >
> > > >
> > > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson wrote:
> > > > >>
> > > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> > > > >> Sean, how does the current SDK AEX handler decide whether to do
> > > > >> EENTER, ERESUME, or just bail and consider the enclave dead?
> > > > >> It seems like the *CPU* could give a big hint, but I don't see
> > > > >> where there is any architectural indication of why the AEX code
> > > > >> got called or any obvious way for the user code to know whether
> > > > >> the exit was fixed up by the kernel?
> > > > >
> > > > > The SDK "unconditionally" does ERESUME at the AEP location, but that's
> > > > > a bit misleading because its signal handler may muck with the context's
> > > > > RIP, e.g. to abort the enclave on a fatal fault.
> > > > >
> > > > > On an event/exception from within an enclave, the event is immediately
> > > > > delivered after loading synthetic state and changing RIP to the AEP.
> > > > > In other words, jamming CPU state is essentially a bunch of vectoring
> > > > > ucode preamble, but from software's perspective it's a normal event
> > > > > that happens to point at the AEP instead of somewhere in the enclave.
> > > > > And because the signals the SDK cares about are all synchronous, the
> > > > > SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> > > > > resides in its signal handler.  IRQs and whatnot simply trampoline back
> > > > > into the enclave.
> > > > >
> > > > > Userspace can do something funky instead of ERESUME, but only *after*
> > > > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > > > > case, after the trap handler has run.
> > > > >
> > > > > Jumping back a bit, how much do we care about preventing userspace
> > > > > from doing stupid things?
> > > >
> > > > My general feeling is that userspace should be allowed to do apparently
> > > > stupid things.  For example, as far as the kernel is concerned, Wine and
> > > > DOSEMU are just user programs that do stupid things.  Linux generally tries
> > > > to provide a reasonably complete view of architectural behavior.  This is
> > > > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may
> > > > cause very odd behavior indeed.
> > > > So magic fixups that do non-architectural things are not so great.
> > >
> > > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> > > with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> > > fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> > > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> > > that the enclave can EEXIT to immediately after the EENTER location.
> > >
> >
> > How does that even work, though?  On an AEX, RIP points to the ERESUME
> > instruction, not the EENTER instruction, so if we skip it we just end
> > up in lala land.
>
> Userspace would obviously need to be aware of the fixup behavior, but
> it actually works out fairly nicely to have a separate path for ERESUME
> fixup since a fault on EENTER is generally fatal, whereas a fault on
> ERESUME might be recoverable.
>

Hmm.

>
> do_eenter:
>     mov     tcs, %rbx
>     lea     async_exit, %rcx
>     mov     $EENTER, %rax
>     ENCLU

Or SOME_SILLY_PREFIX ENCLU?

>
> /*
>  * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
>  * fault indicator, e.g. -EFAULT.
>  */
> eexit_or_eenter_fault:
>     ret

But userspace wants to know whether it was a fault or not.  So I think
we either need two landing pads or we need to hijack a flag bit (are
there any known-zeroed flag bits after EEXIT?) to say whether it was a
fault.  And, if it was a fault, we should give the vector, the
sanitized error code, and possibly CR2.

>
> async_exit:
>     ENCLU

Same prefix here, right?

>
> fixup_handler:
>

This whole thing is a bit odd, but not necessarily a terrible idea.

> > How averse would everyone be to making enclave entry be a syscall?
> > The user code would do sys_sgx_enter_enclave(), and the kernel would
> > stash away the register state (vm86()-style), point RIP to the vDSO's
> > ENCLU instruction, point RCX to another vDSO ENCLU instruction, and
> > SYSRET.
> > The trap handlers would understand what's going on and restore
> > register state accordingly.
>
> Wouldn't that blast away any stack changes made by the enclave?

Yes, but I was imagining that it would stash the registers into the
struct host_state thing I made up :)
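
For illustration only, here is a rough user-space C model of the opt-in
fixup idea discussed above: skip a prefixed ENCLU and report the fault,
otherwise fall back to a signal.  It is a sketch under assumptions, not
kernel code and not anything proposed verbatim in this thread.  The
names struct user_regs and try_enclu_fixup(), the choice of 0x3e as a
stand-in for SOME_SILLY_PREFIX, and the use of RDI/RSI/RDX to carry the
vector, error code, and fault address are all made up for the example.
Only the ENCLU encoding (0F 01 D7) and the "-EFAULT in RAX" convention
come from the architecture and the quoted text respectively.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ENCLU is encoded as the three bytes 0F 01 D7. */
static const uint8_t enclu_opcode[3] = { 0x0f, 0x01, 0xd7 };

/*
 * HYPOTHETICAL: placeholder for "SOME_SILLY_PREFIX".  The thread does not
 * pick a prefix; 0x3e (DS segment override) is used here only so the
 * example has something concrete to match on.
 */
#define SILLY_PREFIX 0x3e

/* HYPOTHETICAL: the handful of registers this model cares about. */
struct user_regs {
	uint64_t rip;
	uint64_t rax;
	uint64_t rdi;
	uint64_t rsi;
	uint64_t rdx;
};

/*
 * Model of the opt-in fixup.  'insn' points at the 'len' instruction bytes
 * located at regs->rip.  The caller is assumed to have already sanitized
 * 'error_code'.
 *
 * Returns true if the fixup was applied, false if the code did not opt in
 * and the fault should be delivered as a signal instead.
 */
bool try_enclu_fixup(struct user_regs *regs, const uint8_t *insn, size_t len,
		     uint8_t vector, uint32_t error_code, uint64_t fault_addr)
{
	assert(regs != NULL && insn != NULL);

	/* Only a prefixed ENCLU opts in; a bare ENCLU keeps signal semantics. */
	if (len < 1 + sizeof(enclu_opcode) ||
	    insn[0] != SILLY_PREFIX ||
	    memcmp(insn + 1, enclu_opcode, sizeof(enclu_opcode)) != 0)
		return false;

	/* Jam RIP past the prefix and the ENCLU itself. */
	regs->rip += 1 + sizeof(enclu_opcode);

	/* Fault indicator in RAX, details in (hypothetically chosen) registers. */
	regs->rax = (uint64_t)(int64_t)-EFAULT;
	regs->rdi = vector;
	regs->rsi = error_code;
	regs->rdx = fault_addr;
	return true;
}
```

With this model, a #PF (vector 14) taken at a prefixed ENCLU leaves RIP
four bytes further on with -EFAULT in RAX, while the same fault at an
unprefixed ENCLU leaves the registers untouched so the caller can fall
back to signal delivery.  It does not address the "two landing pads vs.
flag bit" question for telling a fault apart from a normal EEXIT.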