Message-ID: <1541518670.7839.31.camel@intel.com>
Subject: Re: RFC: userspace exception fixups
From: Sean Christopherson
To: Andy Lutomirski, Jann Horn
Cc: Dave Hansen, Linus Torvalds, Rich Felker, Dave Hansen,
 Jethro Beekman, Jarkko Sakkinen, Florian Weimer, Linux API, X86 ML,
 linux-arch, LKML, Peter Zijlstra, nhorman@redhat.com,
 npmccallum@redhat.com, "Ayoun, Serge", shay.katz-zamir@intel.com,
 linux-sgx@vger.kernel.org, Andy Shevchenko, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Carlos O'Donell,
 adhemerval.zanella@linaro.org
Date: Tue, 06 Nov 2018 07:37:50 -0800
References:
 <20181102163034.GB7393@linux.intel.com>
 <7050972d-a874-dc08-3214-93e81181da60@intel.com>
 <20181102170627.GD7393@linux.intel.com>
 <20181102173350.GF7393@linux.intel.com>
 <20181102182712.GG7393@linux.intel.com>
 <20181102220437.GI7393@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2018-11-02 at 16:32 -0700, Andy Lutomirski wrote:
> On Fri, Nov 2, 2018 at 4:28 PM Jann Horn wrote:
> >
> > On Fri, Nov 2, 2018 at 11:04 PM Sean Christopherson wrote:
> > >
> > > On Fri, Nov 02, 2018 at 08:02:23PM +0100, Jann Horn wrote:
> > > >
> > > > On Fri, Nov 2, 2018 at 7:27 PM Sean Christopherson wrote:
> > > > >
> > > > > On Fri, Nov 02, 2018 at 10:48:38AM -0700, Andy Lutomirski wrote:
> > > > > >
> > > > > > This whole mechanism seems very complicated, and it's not clear
> > > > > > exactly what behavior user code wants.
> > > > > No argument there.  That's why I like the approach of dumping the
> > > > > exception to userspace without trying to do anything intelligent in
> > > > > the kernel.  Userspace can then do whatever it wants AND we don't
> > > > > have to worry about mucking with stacks.
> > > > >
> > > > > One of the hiccups with the VDSO approach is that the enclave may
> > > > > want to use the untrusted stack, i.e. the stack that has the VDSO's
> > > > > stack frame.  For example, Intel's SDK uses the untrusted stack to
> > > > > pass parameters for EEXIT, which means an AEX might occur with what
> > > > > is effectively a bad stack from the VDSO's perspective.
> > > > What exactly does "uses the untrusted stack to pass parameters for
> > > > EEXIT" mean?
> > > > I guess you're saying that the enclave is writing to
> > > > RSP+[0...some_positive_offset], and the written data needs to be
> > > > visible to the code outside the enclave afterwards?
> > > As is, they actually do it the other way around, i.e. negative offsets
> > > relative to the untrusted %RSP.  Going into the enclave there is no
> > > reserved space on the stack.  The SDK uses EEXIT like a function call,
> > > i.e. pushing parameters on the stack and making a call outside of the
> > > enclave, hence the name out-call.  This allows the SDK to handle any
> > > reasonable out-call without a priori knowledge of the application's
> > > maximum out-call "size".
> > But presumably this is bounded to be at most 128 bytes (the red zone
> > size), right? Otherwise this would be incompatible with
> > non-sigaltstack signal delivery.
>
> I think Sean is saying that the enclave also updates RSP.

Yeah, the enclave saves/restores RSP from/to the current save state area.

> One might reasonably wonder how the SDK knows the offset from RSP to
> the function ID.  Presumably using RBP?

Here's pseudocode for how the SDK uses the untrusted stack, minus a
bunch of error checking and gory details.

The function ID and a pointer to a marshalling struct are passed to the
untrusted runtime via normal register params, e.g. RDI and RSI.  The
marshalling struct is what's actually allocated on the untrusted stack,
like alloca() but more complex and explicit.  The marshalling struct
size is not artificially restricted by the SDK, e.g. AFAIK it could span
multiple 4k pages.

int sgx_out_call(const unsigned int func_index, void *marshalling_struct)
{
	struct sgx_encl_tls *tls = get_encl_tls();

	/* load the untrusted frame/stack pointers from the save state area */
	%RBP = tls->save_state_area[SSA_RBP];
	%RSP = tls->save_state_area[SSA_RSP];

	/* pass the function ID and marshalling struct in registers */
	%RDI = func_index;
	%RSI = marshalling_struct;

	EEXIT

	/* magic elsewhere to get back here on an EENTER(OUT_CALL_RETURN) */
	return %RAX;
}

void *sgx_alloc_untrusted_stack(size_t size)
{
	struct sgx_encl_tls *tls = get_encl_tls();
	struct sgx_out_call_context *context;
	void *tmp;

	/* create a frame on the trusted stack to hold the out-call context */
	tls->trusted_stack -= sizeof(struct sgx_out_call_context);

	/* save the untrusted %RSP into the out-call context */
	context = (struct sgx_out_call_context *)tls->trusted_stack;
	context->untrusted_stack = tls->save_state_area[SSA_RSP];

	/* allocate space on the untrusted stack */
	tmp = (void *)(tls->save_state_area[SSA_RSP] - size);
	tls->save_state_area[SSA_RSP] = tmp;

	return tmp;
}

void sgx_pop_untrusted_stack(void)
{
	struct sgx_encl_tls *tls = get_encl_tls();
	struct sgx_out_call_context *context;

	/* retrieve the current out-call context from the trusted stack */
	context = (struct sgx_out_call_context *)tls->trusted_stack;

	/* restore untrusted %RSP */
	tls->save_state_area[SSA_RSP] = context->untrusted_stack;

	/* pop the out-call context frame */
	tls->trusted_stack += sizeof(struct sgx_out_call_context);
}

int sgx_main(void)
{
	struct my_out_call_struct *params;
	int ret;

	params = sgx_alloc_untrusted_stack(sizeof(*params));

	params->0..N = XYZ;

	ret = sgx_out_call(DO_WORK, params);

	sgx_pop_untrusted_stack();

	return ret;
}
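
The bookkeeping in the pseudocode above can be modeled in plain C.  What
follows is only a rough sketch under loud assumptions, not SDK code:
heap buffers stand in for the trusted and untrusted stacks, an ordinary
function call (untrusted_dispatch) stands in for EEXIT/EENTER, and every
name here (encl_tls, out_call_context, work_params, sim_init,
out_call_demo, the buffer sizes) is hypothetical.  It only illustrates
that the saved untrusted RSP moves down by the full size of the
marshalling struct, which here is larger than both the 128-byte red
zone and a 4k page, and is restored on pop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the SSA slots used in the pseudocode. */
enum { SSA_RBP, SSA_RSP, SSA_NR };
enum { DO_WORK = 1 };
enum { UNTRUSTED_SIZE = 8192, TRUSTED_SIZE = 256 };

struct out_call_context {
	uintptr_t untrusted_stack;	/* untrusted RSP saved before allocating */
};

struct encl_tls {
	uintptr_t save_state_area[SSA_NR];
	uintptr_t trusted_stack;	/* top of the simulated trusted stack */
};

/* Assumed marshalling struct: 4208 bytes, i.e. more than one 4k page. */
struct work_params {
	uint64_t input;
	uint64_t output;
	unsigned char payload[4192];
};

static struct encl_tls tls;

/* Point both simulated stacks at the top of heap buffers (stacks grow down). */
static void sim_init(void)
{
	static unsigned char *untrusted_mem, *trusted_mem;

	if (!untrusted_mem)
		untrusted_mem = malloc(UNTRUSTED_SIZE);
	if (!trusted_mem)
		trusted_mem = malloc(TRUSTED_SIZE);
	assert(untrusted_mem && trusted_mem);

	tls.save_state_area[SSA_RSP] = (uintptr_t)(untrusted_mem + UNTRUSTED_SIZE);
	tls.save_state_area[SSA_RBP] = tls.save_state_area[SSA_RSP];
	tls.trusted_stack = (uintptr_t)(trusted_mem + TRUSTED_SIZE);
}

static void *alloc_untrusted_stack(size_t size)
{
	struct out_call_context *ctx;

	/* push a context frame on the trusted stack and save the old RSP */
	tls.trusted_stack -= sizeof(*ctx);
	ctx = (struct out_call_context *)tls.trusted_stack;
	ctx->untrusted_stack = tls.save_state_area[SSA_RSP];

	/* carve the allocation out of the untrusted stack (no size cap) */
	tls.save_state_area[SSA_RSP] -= size;
	return (void *)tls.save_state_area[SSA_RSP];
}

static void pop_untrusted_stack(void)
{
	struct out_call_context *ctx;

	/* restore the saved untrusted RSP and pop the context frame */
	ctx = (struct out_call_context *)tls.trusted_stack;
	tls.save_state_area[SSA_RSP] = ctx->untrusted_stack;
	tls.trusted_stack += sizeof(*ctx);
}

/* Stand-in for the untrusted runtime: it sees only the ID and the pointer. */
static int untrusted_dispatch(unsigned int func_index, void *marshalling_struct)
{
	struct work_params *p = marshalling_struct;

	if (func_index != DO_WORK)
		return -1;
	p->output = p->input * 2;
	return 0;
}

/* Mirrors sgx_main(): allocate, fill, "out-call", pop.  Returns 0 on success. */
int out_call_demo(void)
{
	struct work_params *params;
	uintptr_t rsp_before;
	int ret;

	sim_init();
	rsp_before = tls.save_state_area[SSA_RSP];

	params = alloc_untrusted_stack(sizeof(*params));
	memset(params, 0, sizeof(*params));
	params->input = 21;

	ret = untrusted_dispatch(DO_WORK, params);
	if (ret == 0 && params->output != 42)
		ret = -1;

	pop_untrusted_stack();
	if (tls.save_state_area[SSA_RSP] != rsp_before)
		ret = -1;
	return ret;
}
```

Calling out_call_demo() from a main() exercises one allocate/call/pop
cycle.  The model makes no attempt at alignment, bounds checking, or
nested AEX handling, all of which a real runtime would presumably need.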