Message-ID: <1531504984.11680.21.camel@intel.com>
Subject: Re: [RFC PATCH v2 17/27] x86/cet/shstk: User-mode shadow stack support
From: Yu-cheng Yu
To: Andy Lutomirski, Jann Horn
Cc: the arch/x86 maintainers, "H. Peter Anvin", Thomas Gleixner,
    Ingo Molnar, kernel list, linux-doc@vger.kernel.org, Linux-MM,
    linux-arch, Linux API, Arnd Bergmann, bsingharora@gmail.com,
    Cyrill Gorcunov, Dave Hansen, Florian Weimer, hjl.tools@gmail.com,
    Jonathan Corbet, keescook@chromiun.org, Mike Kravetz, Nadav Amit,
    Oleg Nesterov, Pavel Machek, Peter Zijlstra,
    ravi.v.shankar@intel.com, vedvyas.shanbhogue@intel.com
Date: Fri, 13 Jul 2018 11:03:04 -0700
References: <20180710222639.8241-1-yu-cheng.yu@intel.com>
    <20180710222639.8241-18-yu-cheng.yu@intel.com>
    <6F5FEFFD-0A9A-4181-8D15-5FC323632BA6@amacapital.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2018-07-11 at 15:21 -0700, Andy Lutomirski wrote:
>
> > On Jul 11, 2018, at 2:51 PM, Jann Horn wrote:
> >
> > On Wed, Jul 11, 2018 at 2:34 PM Andy Lutomirski wrote:
> > >
> > > > On Jul 11, 2018, at 2:10 PM, Jann Horn wrote:
> > > >
> > > > > On Tue, Jul 10, 2018 at 3:31 PM Yu-cheng Yu wrote:
> > > > >
> > > > > This patch adds basic shadow stack enabling/disabling routines.
> > > > > A task's shadow stack is allocated from memory with VM_SHSTK
> > > > > flag set and read-only protection.  The shadow stack is
> > > > > allocated to a fixed size.
> > > > >
> > > > > Signed-off-by: Yu-cheng Yu
> > > > [...]
> > > > >
> > > > > diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
> > > > > new file mode 100644
> > > > > index 000000000000..96bf69db7da7
> > > > > --- /dev/null
> > > > > +++ b/arch/x86/kernel/cet.c
> > > > [...]
> > > > >
> > > > > +static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
> > > > > +{
> > > > > +       struct mm_struct *mm = current->mm;
> > > > > +       unsigned long populate;
> > > > > +
> > > > > +       down_write(&mm->mmap_sem);
> > > > > +       addr = do_mmap(NULL, addr, len, PROT_READ,
> > > > > +                      MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK,
> > > > > +                      0, &populate, NULL);
> > > > > +       up_write(&mm->mmap_sem);
> > > > > +
> > > > > +       if (populate)
> > > > > +               mm_populate(addr, populate);
> > > > > +
> > > > > +       return addr;
> > > > > +}
> > [...]
> > > >
> > > > Should the kernel enforce that two shadow stacks must have a guard
> > > > page between them so that they can not be directly adjacent, so that
> > > > if you have too much recursion, you can't end up corrupting an
> > > > adjacent shadow stack?
> > > I think the answer is a qualified “no”. I would like to instead
> > > enforce a general guard page on all mmaps that don’t use MAP_FORCE.
> > > We *might* need to exempt any mmap with an address hint for
> > > compatibility.
> > I like this idea a lot.
> >
> > >
> > > My commercial software has been manually adding guard pages on every
> > > single mmap done by tcmalloc for years, and it has caught a couple
> > > bugs and costs essentially nothing.
> > >
> > > Hmm. Linux should maybe add something like Windows’ “reserved”
> > > virtual memory. It’s basically a way to ask for a VA range that
> > > explicitly contains nothing and can subsequently be turned into
> > > something useful with the equivalent of MAP_FORCE.
> > What's the benefit over creating an anonymous PROT_NONE region? That
> > the kernel won't have to scan through the corresponding PTEs when
> > tearing down the mapping?
> Make it more obvious what’s happening and avoid accounting issues?
> What I’ve actually used is MAP_NORESERVE | PROT_NONE, but I think this
> still counts against the VA rlimit. But maybe that’s actually the
> desired behavior.

We can put a NULL at both ends of a SHSTK to guard against corruption.

Yu-cheng
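
For reference, the user-space guard-page approach discussed above (a
MAP_NORESERVE | PROT_NONE reservation with only the middle made
accessible) could look roughly like the sketch below. This is not part
of the patch and not the kernel-side enforcement being proposed; the
helper names map_with_guards() and unmap_with_guards() are made up for
illustration, and the sketch assumes one guard page on each side is
enough.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a whole number of pages. */
static size_t round_up_to_page(size_t len, size_t page)
{
	return (len + page - 1) & ~(page - 1);
}

/*
 * Hypothetical helper: map len bytes (rounded up to pages) with one
 * inaccessible guard page before and after the usable region.
 * Returns the start of the usable region, or NULL on failure.
 */
static void *map_with_guards(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t usable = round_up_to_page(len, page);
	size_t total = usable + 2 * page;
	char *base;

	/* Reserve the whole range with no access permissions. */
	base = mmap(NULL, total, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Open up only the middle; the first and last page stay PROT_NONE. */
	if (mprotect(base + page, usable, PROT_READ | PROT_WRITE) != 0) {
		munmap(base, total);
		return NULL;
	}

	return base + page;
}

/* Hypothetical helper: undo map_with_guards(), guard pages included. */
static int unmap_with_guards(void *p, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t usable = round_up_to_page(len, page);

	return munmap((char *)p - page, usable + 2 * page);
}
```

With this layout, an access one byte before the returned pointer, or one
byte past the rounded-up usable size, hits a PROT_NONE page and faults
instead of silently touching a neighbouring mapping.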