Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp1202638imm; Wed, 11 Jul 2018 20:02:38 -0700 (PDT) X-Google-Smtp-Source: AAOMgpe4MzFJXFJ0itlVZNLQ991BvRa8VL778DJugz6nw9yD/Ms23DdvAGY6J3DbTn7GlV3dN3HI X-Received: by 2002:a62:8b0f:: with SMTP id j15-v6mr496517pfe.33.1531364558697; Wed, 11 Jul 2018 20:02:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1531364558; cv=none; d=google.com; s=arc-20160816; b=HPSDslsuVVxHxghV2Hv+9DRv5JPjnHmvmx7OgWpOAgFyVBlff8zyk2LjHLj42uTbFV MRDtcqF6B22FRb+sF3UZrF6R8FWVL4UW4oaQDnqFUGq9KSlwHF6KGHo52nA1qLfO+WyX UrWUboXTeKV90V52TNUd80I9J1jp9x1c9CWz9+3Hz5jywR6r45Rx0rushJHp3hE55ZQT 5wPtn5lbnIXzC7Dv5CAuFq4L0//6kJxO59dQOc1HXjtunxD5AxHHz+PoV8FPOTHZYJI3 B+VamzY5bUJZf4IWTXJzMoaSb9UBSmLdqZZ6aj8yM1NDsw3jr5tB1NblsdjqNXytjkLQ h8CQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:date:to:from:subject:message-id :arc-authentication-results; bh=YlW6k/pzbASViicDJz0XvXGMoYYLI9ROQKRahnQIlPA=; b=QEY40rOfbnEZ3p2E0rjzdOZco/kfbRrBA/0khcxTzWFX3k1b17x79nMS+6il0Y7eHl ABuOVJThy15IzURbUNeo/TqyD2D+qF9IH6NTtfw20pjT/Mh9+Y0/N9jQTMLSyaWp276R bFV0DtK9emtY6i+1fcUJTl9ME89gz5X2Ve+9EVQAo6kZ5hJOhK/+a3IwPqnLljNU3P3h 97llVjZzk4gDYC3tt3IclMF6Tw4+8rHGkpgqsv13bhdUWB20riYUcVz0HPoWnU5b2fLw X2ky4d93phB9ac0IFmtjenR0C9kyqtj/r9X9FcQ7qN/GFi1jOvTbT+EHvIMChv05BtMp c5cg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 33-v6si20600920plf.133.2018.07.11.20.02.23; Wed, 11 Jul 2018 20:02:38 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390473AbeGKWUX (ORCPT + 99 others); Wed, 11 Jul 2018 18:20:23 -0400 Received: from mga04.intel.com ([192.55.52.120]:26724 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387637AbeGKWUX (ORCPT ); Wed, 11 Jul 2018 18:20:23 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jul 2018 15:13:58 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,339,1526367600"; d="scan'208";a="66187062" Received: from 2b52.sc.intel.com ([143.183.136.52]) by fmsmga002.fm.intel.com with ESMTP; 11 Jul 2018 15:13:57 -0700 Message-ID: <1531347019.15351.89.camel@intel.com> Subject: Re: [RFC PATCH v2 22/27] x86/cet/ibt: User-mode indirect branch tracking support From: Yu-cheng Yu To: Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Cyrill Gorcunov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , "Ravi V. Shankar" , Vedvyas Shanbhogue Date: Wed, 11 Jul 2018 15:10:19 -0700 In-Reply-To: <3a7e9ce4-03c6-cc28-017b-d00108459e94@linux.intel.com> References: <20180710222639.8241-1-yu-cheng.yu@intel.com> <20180710222639.8241-23-yu-cheng.yu@intel.com> <3a7e9ce4-03c6-cc28-017b-d00108459e94@linux.intel.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.18.5.2-0ubuntu3.2 Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 2018-07-10 at 17:11 -0700, Dave Hansen wrote: > Is this feature *integral* to shadow stacks?  Or, should it just be > in a > different series? The whole CET series is mostly about SHSTK and only a minority for IBT. IBT changes cannot be applied by itself without first applying SHSTK changes.  Would the titles help, e.g. x86/cet/ibt, x86/cet/shstk, etc.? > > > > > diff --git a/arch/x86/include/asm/cet.h > > b/arch/x86/include/asm/cet.h > > index d9ae3d86cdd7..71da2cccba16 100644 > > --- a/arch/x86/include/asm/cet.h > > +++ b/arch/x86/include/asm/cet.h > > @@ -12,7 +12,10 @@ struct task_struct; > >  struct cet_status { > >   unsigned long shstk_base; > >   unsigned long shstk_size; > > + unsigned long ibt_bitmap_addr; > > + unsigned long ibt_bitmap_size; > >   unsigned int shstk_enabled:1; > > + unsigned int ibt_enabled:1; > >  }; > Is there a reason we're not using pointers here?  This seems like the > kind of place that we probably want __user pointers. Yes, I will change that. > > > > > > +static unsigned long ibt_mmap(unsigned long addr, unsigned long > > len) > > +{ > > + struct mm_struct *mm = current->mm; > > + unsigned long populate; > > + > > + down_write(&mm->mmap_sem); > > + addr = do_mmap(NULL, addr, len, PROT_READ | PROT_WRITE, > > +        MAP_ANONYMOUS | MAP_PRIVATE, > > +        VM_DONTDUMP, 0, &populate, NULL); > > + up_write(&mm->mmap_sem); > > + > > + if (populate) > > + mm_populate(addr, populate); > > + > > + return addr; > > +} > We're going to have to start consolidating these at some point.  We > have > at least three of them now, maybe more. Maybe we can do the following in linux/mm.h? +static inline unsigned long do_mmap_locked(addr, len, prot, +     flags, vm_flags) +{ + struct mm_struct *mm = current->mm; + unsigned long populate; + + down_write(&mm->mmap_sem); + addr = do_mmap(NULL, addr, len, prot, flags, vm_flags, +        0, &populate, NULL); + up_write(&mm->mmap_sem); + + if (populate) + mm_populate(addr, populate); + + return addr; +}  > > > > +int cet_setup_ibt_bitmap(void) > > +{ > > + u64 r; > > + unsigned long bitmap; > > + unsigned long size; > > + > > + if (!cpu_feature_enabled(X86_FEATURE_IBT)) > > + return -EOPNOTSUPP; > > + > > + size = TASK_SIZE_MAX / PAGE_SIZE / BITS_PER_BYTE; > Just a note: this table is going to be gigantic on 5-level paging > systems, and userspace won't, by default use any of that extra > address > space.  I think it ends up being a 512GB allocation in a 128TB > address > space. > > Is that a problem? > > On 5-level paging systems, maybe we should just stick it up in the > high > part of the address space. We do not know in advance if dlopen() needs to create the bitmap.  Do we always reserve high address or force legacy libs to low address? > > > > > + bitmap = ibt_mmap(0, size); > > + > > + if (bitmap >= TASK_SIZE_MAX) > > + return -ENOMEM; > > + > > + bitmap &= PAGE_MASK; > We're page-aligning the result of an mmap()?  Why? This may not be necessary.  The lower bits of MSR_IA32_U_CET are settings and not part of the bitmap address.  Is this is safer? > > > > > + rdmsrl(MSR_IA32_U_CET, r); > > + r |= (MSR_IA32_CET_LEG_IW_EN | bitmap); > > + wrmsrl(MSR_IA32_U_CET, r); > Comments, please.  What is this doing, logically?  Also, why are we > OR'ing the results into this MSR?  What are we trying to preserve? I will add comments. > > > > > + current->thread.cet.ibt_bitmap_addr = bitmap; > > + current->thread.cet.ibt_bitmap_size = size; > > + return 0; > > +} > > + > > +void cet_disable_ibt(void) > > +{ > > + u64 r; > > + > > + if (!cpu_feature_enabled(X86_FEATURE_IBT)) > > + return; > Does this need a check for being already disabled? We need that.  We cannot write to those MSRs if the CPU does not support it. > > > > > + rdmsrl(MSR_IA32_U_CET, r); > > + r &= ~(MSR_IA32_CET_ENDBR_EN | MSR_IA32_CET_LEG_IW_EN | > > +        MSR_IA32_CET_NO_TRACK_EN); > > + wrmsrl(MSR_IA32_U_CET, r); > > + current->thread.cet.ibt_enabled = 0; > > +} > What's the locking for current->thread.cet? Now CET is not locked until the application calls ARCH_CET_LOCK. > > > > > diff --git a/arch/x86/kernel/cpu/common.c > > b/arch/x86/kernel/cpu/common.c > > index 705467839ce8..c609c9ce5691 100644 > > --- a/arch/x86/kernel/cpu/common.c > > +++ b/arch/x86/kernel/cpu/common.c > > @@ -413,7 +413,8 @@ __setup("nopku", setup_disable_pku); > >   > >  static __always_inline void setup_cet(struct cpuinfo_x86 *c) > >  { > > - if (cpu_feature_enabled(X86_FEATURE_SHSTK)) > > + if (cpu_feature_enabled(X86_FEATURE_SHSTK) || > > +     cpu_feature_enabled(X86_FEATURE_IBT)) > >   cr4_set_bits(X86_CR4_CET); > >  } > >   > > @@ -434,6 +435,23 @@ static __init int setup_disable_shstk(char *s) > >  __setup("no_cet_shstk", setup_disable_shstk); > >  #endif > >   > > +#ifdef CONFIG_X86_INTEL_BRANCH_TRACKING_USER > > +static __init int setup_disable_ibt(char *s) > > +{ > > + /* require an exact match without trailing characters */ > > + if (strlen(s)) > > + return 0; > > + > > + if (!boot_cpu_has(X86_FEATURE_IBT)) > > + return 1; > > + > > + setup_clear_cpu_cap(X86_FEATURE_IBT); > > + pr_info("x86: 'no_cet_ibt' specified, disabling Branch > > Tracking\n"); > > + return 1; > > +} > > +__setup("no_cet_ibt", setup_disable_ibt); > > +#endif > >  /* > >   * Some CPU features depend on higher CPUID levels, which may not > > always > >   * be available due to CPUID level capping or broken > > virtualization > > diff --git a/arch/x86/kernel/elf.c b/arch/x86/kernel/elf.c > > index 233f6dad9c1f..42e08d3b573e 100644 > > --- a/arch/x86/kernel/elf.c > > +++ b/arch/x86/kernel/elf.c > > @@ -15,6 +15,7 @@ > >  #include > >  #include > >  #include > > +#include > >   > >  /* > >   * The .note.gnu.property layout: > > @@ -222,7 +223,8 @@ int arch_setup_features(void *ehdr_p, void > > *phdr_p, > >   > >   struct elf64_hdr *ehdr64 = ehdr_p; > >   > > - if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) > > + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) && > > +     !cpu_feature_enabled(X86_FEATURE_IBT)) > >   return 0; > >   > >   if (ehdr64->e_ident[EI_CLASS] == ELFCLASS64) { > > @@ -250,6 +252,9 @@ int arch_setup_features(void *ehdr_p, void > > *phdr_p, > >   current->thread.cet.shstk_enabled = 0; > >   current->thread.cet.shstk_base = 0; > >   current->thread.cet.shstk_size = 0; > > + current->thread.cet.ibt_enabled = 0; > > + current->thread.cet.ibt_bitmap_addr = 0; > > + current->thread.cet.ibt_bitmap_size = 0; > >   if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { > >   if (shstk) { > >   err = cet_setup_shstk(); > > @@ -257,6 +262,15 @@ int arch_setup_features(void *ehdr_p, void > > *phdr_p, > >   goto out; > >   } > >   } > > + > > + if (cpu_feature_enabled(X86_FEATURE_IBT)) { > > + if (ibt) { > > + err = cet_setup_ibt(); > > + if (err < 0) > > + goto out; > > + } > > + } > You introduced 'ibt' before it was used.  Please wait to introduce it > until you actually use it to make it easier to review. > > Also, what's wrong with: > > if (cpu_feature_enabled(X86_FEATURE_IBT) && ibt) { > ... > } > > ? I will fix it.