Date: Wed, 7 Nov 2018 11:01:15 -0800
From: Sean Christopherson
To: Andy Lutomirski
Cc: Dave Hansen, Jann Horn, Linus Torvalds, Rich Felker, Dave Hansen,
    Jethro Beekman, Jarkko Sakkinen, Florian Weimer, Linux API, X86 ML,
    linux-arch, LKML, Peter Zijlstra, nhorman@redhat.com,
    npmccallum@redhat.com, "Ayoun, Serge", shay.katz-zamir@intel.com,
    linux-sgx@vger.kernel.org, Andy Shevchenko, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Carlos O'Donell,
    adhemerval.zanella@linaro.org
Subject: Re: RFC: userspace exception fixups
Message-ID: <20181107190114.GA26603@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 07, 2018 at 07:34:52AM -0800, Sean Christopherson wrote:
> On Tue, Nov 06, 2018 at 05:17:14PM -0800, Andy Lutomirski wrote:
> > On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson
> > wrote:
> > >
> > > On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> > > > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson
> > > > wrote:
> > > > >
> > > > > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> > > > > with a specific (ignored) prefix pattern? I.e. effectively make the magic
> > > > > fixup opt-in, falling back to signals. Jamming RIP to skip ENCLU isn't
> > > > > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> > > > > that the enclave can EEXIT to immediately after the EENTER location.
> > > > >
> > > >
> > > > How does that even work, though? On an AEX, RIP points to the ERESUME
> > > > instruction, not the EENTER instruction, so if we skip it we just end
> > > > up in lala land.
> > >
> > > Userspace would obviously need to be aware of the fixup behavior, but
> > > it actually works out fairly nicely to have a separate path for ERESUME
> > > fixup since a fault on EENTER is generally fatal, whereas a fault on
> > > ERESUME might be recoverable.
> > >
> > Hmm.
> >
> > >
> > > do_eenter:
> > > 	mov tcs, %rbx
> > > 	lea async_exit, %rcx
> > > 	mov $EENTER, %rax
> > > 	ENCLU
> >
> > Or SOME_SILLY_PREFIX ENCLU?
>
> Yeah, forgot to include that.
>
> > >
> > > /*
> > >  * EEXIT or EENTER faulted. In the latter case, %RAX already holds some
> > >  * fault indicator, e.g. -EFAULT.
> > >  */
> > > eexit_or_eenter_fault:
> > > 	ret
> >
> > But userspace wants to know whether it was a fault or not. So I think
> > we either need two landing pads or we need to hijack a flag bit (are
> > there any known-zeroed flag bits after EEXIT?) to say whether it was a
> > fault. And, if it was a fault, we should give the vector, the
> > sanitized error code, and possibly CR2.
>
> As Jethro mentioned, RAX will always be 4 on a successful EEXIT, so we
> can use RAX to indicate a fault. That's what I was trying to imply with
> EFAULT. Here's the reg stuffing I use for the POC:
>
> 	regs->ax = EFAULT;
> 	regs->di = trapnr;
> 	regs->si = error_code;
> 	regs->dx = address;
>
> Well-known RAX values also mean the kernel fault handlers only need to
> look for SOME_SILLY_PREFIX ENCLU if RAX==2 || RAX==3, i.e. the fault
> occurred on EENTER or in an enclave (RAX is set to ERESUME's leaf as
> part of the asynchronous enclave exit flow).

POC kernel code, 64-bit only. Limiting this to 64-bit isn't necessary,
but it makes the code prettier and allows using REX as the magic prefix.
I like the idea of using REX because it seems the least likely to be
repurposed for yet another new feature. I have no idea if 64-bit only
will fly with the SDK folks. Going by comments in similar code related
to UMIP, we'd need to figure out how to handle protection keys.

#include <linux/errno.h>	/* EFAULT */
#include <linux/uaccess.h>	/* copy_from_user() */
#include <asm/ptrace.h>		/* struct pt_regs, user_64bit_mode() */

/* REX with all bits set, ignored by ENCLU. */
#define SGX_DO_ENCLU_FIXUP	0x4F

#define SGX_ENCLU_OPCODE0	0x0F
#define SGX_ENCLU_OPCODE1	0x01
#define SGX_ENCLU_OPCODE2	0xD7

/*
 * ENCLU is a three-byte opcode, plus one byte for the magic prefix.
 */
#define SGX_ENCLU_FIXUP_INSN_LEN	4

/* Return the length of the magic-prefixed ENCLU at RIP, or 0 if there is none. */
static int sgx_detect_enclu(struct pt_regs *regs)
{
	unsigned char buf[SGX_ENCLU_FIXUP_INSN_LEN];

	/* Look for EENTER or ERESUME in RAX, 64-bit mode only. */
	if (!regs || (regs->ax != 2 && regs->ax != 3) || !user_64bit_mode(regs))
		return 0;

	if (copy_from_user(buf, (void __user *)(regs->ip), sizeof(buf)))
		return 0;

	if (buf[0] == SGX_DO_ENCLU_FIXUP &&
	    buf[1] == SGX_ENCLU_OPCODE0 &&
	    buf[2] == SGX_ENCLU_OPCODE1 &&
	    buf[3] == SGX_ENCLU_OPCODE2)
		return SGX_ENCLU_FIXUP_INSN_LEN;

	return 0;
}

/*
 * If the fault hit an opted-in ENCLU, skip it and report the fault to
 * userspace in RAX/RDI/RSI/RDX instead of delivering a signal.
 */
bool sgx_fixup_enclu_fault(struct pt_regs *regs, int trapnr,
			   unsigned long error_code, unsigned long address)
{
	int insn_len;

	insn_len = sgx_detect_enclu(regs);
	if (!insn_len)
		return false;

	regs->ip += insn_len;
	regs->ax = EFAULT;
	regs->di = trapnr;
	regs->si = error_code;
	regs->dx = address;
	return true;
}
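For anyone who wants to poke at the prefix-matching logic without building a kernel, here is a minimal userspace model of sgx_detect_enclu() and sgx_fixup_enclu_fault(). It is only a sketch under stated assumptions: struct fake_regs is a hypothetical stand-in for struct pt_regs, RIP is modeled as an offset into a byte array instead of a user address, a short buffer stands in for a failing copy_from_user(), and user_64bit_mode() is reduced to a boolean field. The model_* names are made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte values copied from the POC: REX.WRXB (0x4F), then ENCLU (0F 01 D7). */
#define SGX_DO_ENCLU_FIXUP       0x4F
#define SGX_ENCLU_OPCODE0        0x0F
#define SGX_ENCLU_OPCODE1        0x01
#define SGX_ENCLU_OPCODE2        0xD7
#define SGX_ENCLU_FIXUP_INSN_LEN 4

/* ENCLU leaf numbers the POC accepts in RAX at fault time. */
#define SGX_EENTER_LEAF  2
#define SGX_ERESUME_LEAF 3

/* Hypothetical stand-in for struct pt_regs; 'ip' is an offset into 'text'. */
struct fake_regs {
	uint64_t ip, ax, di, si, dx;
	bool user_64bit; /* models user_64bit_mode(regs) */
};

/* Hypothetical model of sgx_detect_enclu(): prefixed-ENCLU length at 'ip', or 0. */
int model_detect_enclu(const struct fake_regs *regs,
		       const uint8_t *text, size_t text_len)
{
	const uint8_t *p;

	if (!regs || (regs->ax != SGX_EENTER_LEAF && regs->ax != SGX_ERESUME_LEAF) ||
	    !regs->user_64bit)
		return 0;

	/* Models copy_from_user() failing: fewer than 4 readable bytes at RIP. */
	if (regs->ip > text_len || text_len - regs->ip < SGX_ENCLU_FIXUP_INSN_LEN)
		return 0;

	p = text + regs->ip;
	if (p[0] == SGX_DO_ENCLU_FIXUP && p[1] == SGX_ENCLU_OPCODE0 &&
	    p[2] == SGX_ENCLU_OPCODE1 && p[3] == SGX_ENCLU_OPCODE2)
		return SGX_ENCLU_FIXUP_INSN_LEN;

	return 0;
}

/* Hypothetical model of sgx_fixup_enclu_fault(): skip the ENCLU, report the fault. */
bool model_fixup_enclu_fault(struct fake_regs *regs, const uint8_t *text,
			     size_t text_len, int trapnr,
			     unsigned long error_code, unsigned long address)
{
	int insn_len = model_detect_enclu(regs, text, text_len);

	if (!insn_len)
		return false; /* not opted in: the kernel would deliver a signal */

	regs->ip += insn_len;
	regs->ax = EFAULT;
	regs->di = (uint64_t)trapnr;
	regs->si = error_code;
	regs->dx = address;
	return true;
}
```

With text bytes { 0x90, 0x4F, 0x0F, 0x01, 0xD7, ... }, a fault with RAX == 3 and RIP at offset 1 is fixed up (RIP advances to offset 5, RAX becomes EFAULT), while a bare 0F 01 D7 without the prefix, RAX == 4, or a non-64-bit context all fall through to the signal path, matching the opt-in behavior described in the thread.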
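On the userspace side, the RAX convention discussed in the thread lets the code after the prefixed ENCLU tell a normal EEXIT (RAX == 4) apart from a kernel fixup (RAX == EFAULT in the POC, with the vector, error code and address in RDI, RSI and RDX). The helper below is a hypothetical sketch of that decision written in C for readability; a real SDK would do this in its assembly landing pad, and decode_enclu_exit(), enum enclu_outcome and struct enclu_fault_info are invented names. It assumes the POC's positive EFAULT; the quoted asm comment says -EFAULT, so the sign would still need to be settled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* RAX after a successful EEXIT holds the EEXIT leaf number. */
#define SGX_EEXIT_LEAF 4

/* Hypothetical names, for illustration only. */
enum enclu_outcome { ENCLU_EEXIT, ENCLU_FAULT, ENCLU_UNKNOWN };

struct enclu_fault_info {
	uint64_t trapnr;     /* RDI: exception vector */
	uint64_t error_code; /* RSI: (sanitized) error code */
	uint64_t address;    /* RDX: faulting address, e.g. CR2 for a #PF */
};

/*
 * Classify how control came back to the instruction after the prefixed
 * ENCLU, given the register values userspace observes there.
 */
enum enclu_outcome decode_enclu_exit(uint64_t rax, uint64_t rdi, uint64_t rsi,
				     uint64_t rdx, struct enclu_fault_info *info)
{
	if (rax == SGX_EEXIT_LEAF)
		return ENCLU_EEXIT;

	if (rax == EFAULT) {
		if (info) {
			info->trapnr = rdi;
			info->error_code = rsi;
			info->address = rdx;
		}
		return ENCLU_FAULT;
	}

	return ENCLU_UNKNOWN;
}
```

Because EEXIT always leaves 4 in RAX and EFAULT is a different value, a single landing pad is enough; no flag bit has to be hijacked to signal that a fault occurred.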