From: Andy Lutomirski
Date: Fri, 8 Jun 2018 08:01:43 -0700
Subject: Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
To: Florian Weimer
Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc@vger.kernel.org, Linux-MM, linux-arch, X86 ML, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, "H. J. Lu", "Shanbhogue, Vedvyas", "Ravi V. Shankar", Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz@oracle.com
References: <20180607143807.3611-1-yu-cheng.yu@intel.com> <20180607143807.3611-5-yu-cheng.yu@intel.com> <3c1bdf85-0c52-39ed-a799-e26ac0e52391@redhat.com> <6ee29e8b-4a0a-3459-a1ee-03923ba4e15d@redhat.com>
In-Reply-To: <6ee29e8b-4a0a-3459-a1ee-03923ba4e15d@redhat.com>

On Fri, Jun 8, 2018 at 7:53 AM Florian Weimer wrote:
>
> On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer wrote:
> >>
> >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu wrote:
> >>>>
> >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> >>>> needs a separate program stack and a separate shadow stack.
> >>>> This patch handles allocation and freeing of the thread shadow
> >>>> stack.
> >>>
> >>> Aha -- you're trying to make this automatic.  I'm not convinced this
> >>> is a good idea.  The Linux kernel has a long and storied history of
> >>> enabling new hardware features in ways that are almost entirely
> >>> useless for userspace.
> >>>
> >>> Florian, do you have any thoughts on how the user/kernel interaction
> >>> for the shadow stack should work?
> >>
> >> I have not looked at this in detail, have not played with the emulator,
> >> and have not been privy to any discussions before these patches were
> >> posted, however …
> >>
> >> I believe that we want as little code in userspace for shadow stack
> >> management as possible.  One concern I have is that even with the code
> >> we arguably need for various kinds of stack unwinding, we might have
> >> unwittingly built a generic trampoline that leads to a full CET bypass.
> >
> > I was imagining an API like "allocate a shadow stack for the current
> > thread, fail if the current thread already has one, and turn on the
> > shadow stack".  glibc would call clone and then call this ABI pretty
> > much immediately (i.e. before making any calls from which it expects
> > to return).
>
> Ahh.  So you propose not to enable shadow stack enforcement on the new
> thread even if it is enabled for the current thread?  For the cases
> where CLONE_VM is involved?
>
> It will still need a new assembler wrapper which sets up the shadow
> stack, and it's probably required to disable signals.
>
> I think it should be reasonably safe and actually implementable.  But
> the benefits are not immediately obvious to me.

Doing it this way would have been my first inclination.  It would avoid
all the oddities of the kernel magically creating a VMA when clone() is
called, guessing the shadow stack size, etc.  But I'm okay with having
the kernel do it automatically, too.

I think it would be very nice to have a way for user code to find out
the size of the shadow stack and change it, though.  (And relocate it,
but maybe that's impossible.  The CET documentation doesn't have a clear
description of the shadow stack layout.)

> > We definitely want strong enough user control that tools like CRIU can
> > continue to work.  I haven't looked at the SDM recently enough to
> > remember for sure, but I'm reasonably confident that user code can
> > learn the address of its own shadow stack.
> > If nothing else, CRIU
> > needs to be able to restore from a context where there's a signal on
> > the stack and the signal frame contains a shadow stack pointer.
>
> CRIU also needs the shadow stack *contents*, which shouldn't be directly
> available to the process.  So it needs very special interfaces anyway.

True.  I proposed in a different email that ptrace() have full control
of the shadow stack (read, write, lock, unlock, etc.).

>
> Does CRIU implement MPX support?

Dunno.  But given that MPX seems to be dying, I'm not sure it matters.

--Andy