Date: Fri, 2 Nov 2018 09:30:34 -0700
From: Sean Christopherson
To: Andy Lutomirski
Cc: Linus Torvalds, Rich Felker, Jann Horn, Dave Hansen, Jethro Beekman,
    Jarkko Sakkinen, Florian Weimer, Linux API, X86 ML, linux-arch, LKML,
    Peter Zijlstra, nhorman@redhat.com, npmccallum@redhat.com,
    "Ayoun, Serge", shay.katz-zamir@intel.com, linux-sgx@vger.kernel.org,
    Andy Shevchenko, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Carlos O'Donell, adhemerval.zanella@linaro.org
Subject: Re: RFC: userspace exception fixups
Message-ID: <20181102163034.GB7393@linux.intel.com>
References:
    <20181101185225.GC5150@brightrain.aerifal.cx>
    <20181101193107.GE5150@brightrain.aerifal.cx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 01, 2018 at 04:22:55PM -0700, Andy Lutomirski wrote:
> On Thu, Nov 1, 2018 at 2:24 PM Linus Torvalds wrote:
> >
> > On Thu, Nov 1, 2018 at 12:31 PM Rich Felker wrote:
> > >
> > > See my other emails in this thread. You would register the *address*
> > > (in TLS) of a function pointer object pointing to the handler, rather
> > > than the function address of the handler. Then switching handler is
> > > just a single store in userspace, no syscalls involved.
> >
> > Yes.
> >
> > And for just EENTER, maybe that's the right model.
> >
> > If we want to generalize it to other thread-synchronous faults, it
> > needs way more information and a list of handlers, but if we limit the
> > thing to _only_ EENTER getting an SGX fault, then a single "this is
> > the fault handler" address is probably the right thing to do.
>
> It sounds like you're saying that the kernel should know, *before*
> running any user fixup code, whether the fault in question is one that
> wants a fixup. Sounds reasonable.
>
> I think it would be nice, but not absolutely necessary, if user code
> didn't need to poke some value into TLS each time it ran a function
> that had a fixup. With the poke-into-TLS approach, it looks a lot
> like rseq, and rseq doesn't nest very nicely. I think we really want
> this mechanism to Just Work. So we could maybe have a syscall that
> associates a list of fixups with a given range of text addresses. We
> might want the kernel to automatically zap the fixups when the text in
> question is unmapped.

If this is EENTER specific then nesting isn't an issue. But I don't see
a simple way to restrict the mechanism to EENTER.
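
For reference, the poke-into-TLS scheme from the quoted text could be
sketched roughly as below. This is only an illustration: the handler
signature, the names fixup_slot and simulate_fault, and the way the
kernel side is simulated are all assumptions, not anything proposed in
this thread. It shows the one-time registration of the slot's address
and the save/store/restore that a caller would need in order to nest.

```c
#include <stddef.h>

/* Hypothetical handler signature; the thread does not define one. */
typedef void (*fixup_handler_t)(int fault_code);

/*
 * Per-thread slot holding the current handler. Its *address* is what
 * would be registered with the kernel, once per thread.
 */
_Thread_local fixup_handler_t fixup_slot;

/* Records the last fault seen, so the effect of a handler is visible. */
int last_fault = -1;

void enclave_fault_handler(int fault_code)
{
    last_fault = fault_code;
}

/*
 * Stand-in for the kernel side (an assumption for illustration): on a
 * fault it reads the handler through the registered slot address.
 * Returns 1 if a handler was installed and called, 0 otherwise.
 */
int simulate_fault(fixup_handler_t *registered_slot, int fault_code)
{
    fixup_handler_t handler = *registered_slot;

    if (handler == NULL)
        return 0;
    handler(fault_code);
    return 1;
}

/*
 * Runs one simulated fault with `handler` installed, then restores the
 * previous handler. Installing is a single store, no syscall; the
 * save/restore pair is what a nested user would have to do by hand.
 */
int fault_with_handler(fixup_handler_t handler, int fault_code)
{
    fixup_handler_t saved = fixup_slot;
    int handled;

    fixup_slot = handler;
    handled = simulate_fault(&fixup_slot, fault_code);
    fixup_slot = saved;
    return handled;
}
```

The manual save/restore in fault_with_handler() is the part that looks
like rseq and is easy to get wrong when calls nest.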

What if, rather than having userspace register an address for fixup,
the kernel instead unconditionally does fixup on the ENCLU opcode? For
example, skip the instruction and put fault info into some combination
of RDX/RSI/RDI (they're cleared on asynchronous enclave exits). The
decode logic is straightforward since ENCLU doesn't have operands; we'd
just have to eat any ignored prefixes. The intended convention for
EENTER is to have an ENCLU at the AEX target (to automatically do
ERESUME after INTR, etc...), so this would work regardless of whether
the fault happened on EENTER or in the enclave. EENTER/ERESUME are the
only ENCLU functions that are allowed outside of an enclave, so there's
no danger of accidentally crushing something else.

This way we wouldn't need a VDSO blob and we'd enforce the kernel's
ABI, e.g. a library that tried to use signal handling would go off the
rails when the kernel mucked with the registers. We could even have the
SGX EPC fault handler return VM_FAULT_SIGBUS if the faulting
instruction isn't ENCLU, e.g. to further enforce that the AEX target
needs to be ENCLU.

Userspace would look something like this:

    mov     tcs, %rbx               /* Thread Control Structure address */
    leaq    async_exit(%rip), %rcx  /* AEX target for EENTER/ERESUME */
    mov     $SGX_EENTER, %rax       /* EENTER leaf */

async_exit:
    ENCLU

fault_handler:

enclave_exit:                       /* EEXIT target */
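
A rough sketch of what the "decode ENCLU and skip it" step might look
like, written as plain C so it can be tried outside the kernel. ENCLU
is encoded as 0F 01 D7; everything else here is an assumption for
illustration: the set of prefixes treated as ignorable, the struct
regs layout, the function names, and which of RDI/RSI/RDX receives
which piece of fault info.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86 instructions are at most 15 bytes long, prefixes included. */
#define MAX_INSN_LEN 15

/* Hypothetical register snapshot; not the kernel's struct pt_regs. */
struct regs {
    uint64_t ip, di, si, dx;
};

/*
 * Assumed set of prefixes to eat: segment overrides, address-size
 * override, and REX. Which prefixes ENCLU really ignores would need to
 * be checked against the SDM.
 */
static bool is_skippable_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2e: case 0x36: case 0x3e: /* ES/CS/SS/DS override */
    case 0x64: case 0x65:                       /* FS/GS override */
    case 0x67:                                  /* address-size override */
        return true;
    default:
        return (b & 0xf0) == 0x40;              /* REX, 64-bit mode */
    }
}

/*
 * Returns the total length (prefixes plus 3-byte opcode) if the bytes at
 * `code` decode as ENCLU, or 0 if they do not.
 */
size_t enclu_length(const uint8_t *code, size_t len)
{
    size_t i = 0;

    if (len > MAX_INSN_LEN)
        len = MAX_INSN_LEN;
    while (i < len && is_skippable_prefix(code[i]))
        i++;
    if (len - i < 3)
        return 0;
    if (code[i] != 0x0f || code[i + 1] != 0x01 || code[i + 2] != 0xd7)
        return 0;
    return i + 3;
}

/*
 * If the faulting instruction is ENCLU, step over it and report the
 * fault in registers. The register assignment is made up for this sketch.
 */
bool enclu_fixup(struct regs *r, const uint8_t *code, size_t len,
                 uint64_t trapnr, uint64_t error_code, uint64_t fault_addr)
{
    size_t insn_len = enclu_length(code, len);

    if (insn_len == 0)
        return false;   /* not ENCLU: fall back to e.g. SIGBUS */
    r->ip += insn_len;  /* resume at the instruction after ENCLU */
    r->di = trapnr;
    r->si = error_code;
    r->dx = fault_addr;
    return true;
}
```

With the userspace layout above, a successful fixup would leave the
instruction pointer at fault_handler, since that is the label
immediately after the ENCLU.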