Subject: Re: [RFC PATCH v2 22/27] x86/cet/ibt: User-mode indirect branch tracking support
From: Dave Hansen <dave.hansen@linux.intel.com>
To: Yu-cheng Yu, x86@kernel.org, "H. Peter Anvin", Thomas Gleixner,
    Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-mm@kvack.org, linux-arch@vger.kernel.org,
    linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
    Balbir Singh, Cyrill Gorcunov, Florian Weimer, "H.J. Lu", Jann Horn,
    Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
    Pavel Machek, Peter Zijlstra, "Ravi V. Shankar", Vedvyas Shanbhogue
Date: Wed, 11 Jul 2018 15:40:46 -0700
References: <20180710222639.8241-1-yu-cheng.yu@intel.com>
    <20180710222639.8241-23-yu-cheng.yu@intel.com>
    <3a7e9ce4-03c6-cc28-017b-d00108459e94@linux.intel.com>
    <1531347019.15351.89.camel@intel.com>
In-Reply-To: <1531347019.15351.89.camel@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/11/2018 03:10 PM, Yu-cheng Yu wrote:
> On Tue, 2018-07-10 at 17:11 -0700, Dave Hansen wrote:
>> Is this feature *integral* to shadow stacks?
>> Or, should it just be in a different series?
>
> The whole CET series is mostly about SHSTK and only a minority for IBT.
> IBT changes cannot be applied by themselves without first applying SHSTK
> changes.  Would the titles help, e.g. x86/cet/ibt, x86/cet/shstk, etc.?

That doesn't really answer what I asked, though.  Do shadow stacks
*require* IBT?  Or, should we concentrate on merging shadow stacks
themselves first and then do IBT at a later time, in a different patch
series?

But, yes, better patch titles would help, although I'm not sure that's
quite the format that Ingo and Thomas prefer.

>>> +int cet_setup_ibt_bitmap(void)
>>> +{
>>> +	u64 r;
>>> +	unsigned long bitmap;
>>> +	unsigned long size;
>>> +
>>> +	if (!cpu_feature_enabled(X86_FEATURE_IBT))
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	size = TASK_SIZE_MAX / PAGE_SIZE / BITS_PER_BYTE;
>> Just a note: this table is going to be gigantic on 5-level paging
>> systems, and userspace won't, by default, use any of that extra
>> address space.  I think it ends up being a 512GB allocation in a
>> 128TB address space.
>>
>> Is that a problem?
>>
>> On 5-level paging systems, maybe we should just stick it up in the
>> high part of the address space.
>
> We do not know in advance if dlopen() needs to create the bitmap.  Do
> we always reserve high address or force legacy libs to low address?

Does it matter?  Does code ever get pointers to this area?  Might they
be depending on high address bits for the IBT being clear?

>>> +	bitmap = ibt_mmap(0, size);
>>> +
>>> +	if (bitmap >= TASK_SIZE_MAX)
>>> +		return -ENOMEM;
>>> +
>>> +	bitmap &= PAGE_MASK;
>> We're page-aligning the result of an mmap()?  Why?
>
> This may not be necessary.  The lower bits of MSR_IA32_U_CET are
> settings and not part of the bitmap address.  Is this safer?

No.  If we have mmap() returning non-page-aligned addresses, we have
bigger problems.  Worst-case, do

	WARN_ON_ONCE(bitmap & ~PAGE_MASK);

>>> +	current->thread.cet.ibt_bitmap_addr = bitmap;
>>> +	current->thread.cet.ibt_bitmap_size = size;
>>> +	return 0;
>>> +}
>>> +
>>> +void cet_disable_ibt(void)
>>> +{
>>> +	u64 r;
>>> +
>>> +	if (!cpu_feature_enabled(X86_FEATURE_IBT))
>>> +		return;
>> Does this need a check for being already disabled?
>
> We need that.  We cannot write to those MSRs if the CPU does not
> support it.

No, I mean for code doing cet_disable_ibt() twice in a row.

>>> +	rdmsrl(MSR_IA32_U_CET, r);
>>> +	r &= ~(MSR_IA32_CET_ENDBR_EN | MSR_IA32_CET_LEG_IW_EN |
>>> +	       MSR_IA32_CET_NO_TRACK_EN);
>>> +	wrmsrl(MSR_IA32_U_CET, r);
>>> +	current->thread.cet.ibt_enabled = 0;
>>> +}
>> What's the locking for current->thread.cet?
>
> Now CET is not locked until the application calls ARCH_CET_LOCK.

No, I mean what is the in-kernel locking for the current->thread.cet
data structure?  Is there none because it's only ever modified via
current->thread and it's entirely thread-local?
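
As a rough check of the bitmap sizes under discussion, the quoted
TASK_SIZE_MAX / PAGE_SIZE / BITS_PER_BYTE can be evaluated in userspace.
This is only a sketch: ibt_bitmap_bytes() is a made-up helper, 4 KiB
pages are assumed, and the address space is rounded up to exactly 2^47
(4-level) or 2^56 (5-level) bytes, whereas the kernel's TASK_SIZE_MAX is
slightly smaller.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT     12u  /* assumption: 4 KiB pages */
#define SK_BITS_PER_BYTE  8u

/*
 * Bytes needed for a bitmap holding one bit per page of a user address
 * space that is 2^va_bits bytes long.  Hypothetical helper, not a
 * kernel function.
 */
static uint64_t ibt_bitmap_bytes(unsigned int va_bits)
{
	assert(va_bits > SK_PAGE_SHIFT && va_bits < 64);

	uint64_t task_size = (uint64_t)1 << va_bits;      /* stand-in for TASK_SIZE_MAX */
	uint64_t page_size = (uint64_t)1 << SK_PAGE_SHIFT; /* stand-in for PAGE_SIZE */

	return task_size / page_size / SK_BITS_PER_BYTE;
}
```

Under these assumptions ibt_bitmap_bytes(47) is 2^32 bytes (4 GiB) and
ibt_bitmap_bytes(56) is 2^41 bytes (2 TiB).  That does not match the
estimate quoted above, which was itself given as a guess, so the numbers
are worth re-checking against the real TASK_SIZE_MAX.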
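
The "warn instead of mask" suggestion can be modelled in userspace
roughly as follows.  Everything prefixed sk_ or SK_ is a stand-in
invented for this sketch (the real WARN_ON_ONCE() and PAGE_MASK are
kernel macros), and returning an error on a misaligned address is an
extra choice made here only so the sketch has observable behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SK_PAGE_SIZE 4096UL                  /* assumption: 4 KiB pages */
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1))

/* Set once the first warning has been printed. */
static bool sk_warned;

/*
 * Userspace stand-in for WARN_ON_ONCE(): complain the first time the
 * condition is true, and always return the condition.
 */
static bool sk_warn_on_once(bool cond, const char *what)
{
	if (cond && !sk_warned) {
		sk_warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Store the bitmap address without silently masking it.  Returns 0 on
 * success, -1 (leaving *out untouched) if the address is not
 * page-aligned.
 */
static int sk_set_bitmap_addr(unsigned long addr, unsigned long *out)
{
	if (sk_warn_on_once((addr & ~SK_PAGE_MASK) != 0,
			    "bitmap address is not page-aligned"))
		return -1;

	*out = addr;
	return 0;
}
```

The point of the sketch is that a misaligned address becomes visible
rather than being quietly rounded down.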
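
If the answer to the "twice in a row" question is that a guard is
wanted, one possible shape is an early return on the per-thread flag.
The sketch below is a userspace model, not the patch: struct
sk_cet_state, sk_disable_ibt() and the SK_ bit values are placeholders,
and the fake_msr field merely stands in for MSR_IA32_U_CET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define SK_CET_ENDBR_EN     (1ULL << 2)
#define SK_CET_LEG_IW_EN    (1ULL << 3)
#define SK_CET_NO_TRACK_EN  (1ULL << 4)
#define SK_CET_IBT_BITS \
	(SK_CET_ENDBR_EN | SK_CET_LEG_IW_EN | SK_CET_NO_TRACK_EN)

/* Models the parts of current->thread.cet that matter here. */
struct sk_cet_state {
	bool ibt_enabled;        /* models thread.cet.ibt_enabled */
	uint64_t fake_msr;       /* models MSR_IA32_U_CET */
	unsigned int msr_writes; /* counts simulated wrmsrl() calls */
};

/* Disable IBT; a second call in a row does no MSR work. */
static void sk_disable_ibt(struct sk_cet_state *cet)
{
	if (!cet->ibt_enabled)
		return;  /* already disabled, nothing to do */

	cet->fake_msr &= ~SK_CET_IBT_BITS;  /* models rdmsrl + mask + wrmsrl */
	cet->msr_writes++;
	cet->ibt_enabled = false;
}
```

Calling sk_disable_ibt() twice on the same state performs the simulated
MSR write only once and leaves unrelated bits alone.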