Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp4963670ybi; Tue, 28 May 2019 05:31:45 -0700 (PDT) X-Google-Smtp-Source: APXvYqzjiYH6yafHkl+bFUhhg4AzROfpZrR2OfObn9Lr2qoraCny96LcXJsWJeoJanMkBr/PvHdJ X-Received: by 2002:a17:902:f01:: with SMTP id 1mr5091392ply.170.1559046705226; Tue, 28 May 2019 05:31:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559046705; cv=none; d=google.com; s=arc-20160816; b=W8oeA42o81n8F0tZLc0LQ3WV/L7pWkjUW4oTKnzL4VkES7GAxnrpRF51UR5D08fua+ spboCK/7TrIyEzRnu84VGlamc3uwO8a+YRowcp92qu+Ja3TZv+8bJgkVWc5ONZm3soMS n0ruXVJT2WAc33ZI22wUCd2gq8DpqBWHSSG/SPRZ7A3/KihqNamhlKkcbujrDqS92bB8 Pm8l8lkJCEow18kacm2aLmJAw04Wd9elmVjlEk7mrVpwEWCvNCIB7q1nZi3p7iCphZvO 7ygCqNlDL6rh8AbWaCixvCQTqDqIlFVOc0yBE5/Vb+YU9h8sBOOzqKmQobVCoh+N/E4V L9fw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :dlp-reaction:dlp-version:dlp-product:content-language :accept-language:in-reply-to:references:message-id:date:thread-index :thread-topic:subject:cc:to:from; bh=9BpPI0xqD6FSXFxoulXTAbm721CseTCi9yPrNqJe/NM=; b=KMRuxVWf97ZOb8AWYbQGuNIbUdnP5RqNh6gB06bZ5DkAXSfot80qUFtqosmwzrXY1O HocPoTFuD/rjQh3V3Tay2a2AAoBhGqiB3zopgYAZFYN+y0kpXL/zrgB5P1/wIYxXhwpM LYcBrD9zQu99XcUJfax67ILpfKnVX7oW0Lhfr0xQ1r4kN3TPx3ruec4Cj1OTWyDSueeg beb1tnTH2+n+02tWXKwW4C6SAY4Ixy46rYbJJnwwrf9zbiamIlybniyxacLEhtrdu3+X 6SeReq+5/Ddfh9DKQUp1PSnEvALJxXguoKHtqUprPHP28gm/1gRFlhwFFswjoRZbVm2l R8IQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r10si22617789pfh.147.2019.05.28.05.31.28; Tue, 28 May 2019 05:31:45 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726999AbfE1M27 convert rfc822-to-8bit (ORCPT + 99 others); Tue, 28 May 2019 08:28:59 -0400 Received: from mga03.intel.com ([134.134.136.65]:18010 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726580AbfE1M27 (ORCPT ); Tue, 28 May 2019 08:28:59 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 May 2019 05:28:58 -0700 X-ExtLoop1: 1 Received: from irsmsx106.ger.corp.intel.com ([163.33.3.31]) by fmsmga001.fm.intel.com with ESMTP; 28 May 2019 05:28:54 -0700 Received: from irsmsx102.ger.corp.intel.com ([169.254.2.108]) by IRSMSX106.ger.corp.intel.com ([169.254.8.166]) with mapi id 14.03.0415.000; Tue, 28 May 2019 13:28:53 +0100 From: "Reshetova, Elena" To: Kees Cook , Ingo Molnar , "Andy Lutomirski" CC: David Laight , Theodore Ts'o , Eric Biggers , "ebiggers@google.com" , "herbert@gondor.apana.org.au" , Peter Zijlstra , "Daniel Borkmann" , "linux-kernel@vger.kernel.org" , "jpoimboe@redhat.com" , "jannh@google.com" , "Perla, Enrico" , "mingo@redhat.com" , "bp@alien8.de" , "tglx@linutronix.de" , "gregkh@linuxfoundation.org" , "Edgecombe, Rick P" , Linus Torvalds , Peter Zijlstra Subject: RE: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall Thread-Topic: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall Thread-Index: AQHU81HQwzT9MH4dM0y/JZXnSwiYT6Y8wW2AgAAdM1CAAXexAIAANZ3ggAAW1gCAAApRgIAAMeKAgAAd+PCAAQuGgIAAYQuAgAAKhwCACsPi4IADJTwAgAAcagCAAExngIAEBbGAgACIbACAAbyQ8IAA626AgAGZfXCAAARpgIAAWpuAgAAF74CAABf/AIAAAvkAgAGZnrD///dzgIAHjbaA///31ICAAC4VAIABBxmAgAAfuaCAAA5FAIAED8OAgAAYaYCAAINWgIAAbRaAgBjvMfA= Date: Tue, 28 May 2019 12:28:53 +0000 Message-ID: <2236FBA76BA1254E88B949DDB74E612BA4CA8DBF@IRSMSX102.ger.corp.intel.com> References: <2236FBA76BA1254E88B949DDB74E612BA4C760A7@IRSMSX102.ger.corp.intel.com> <20190508113239.GA33324@gmail.com> <2236FBA76BA1254E88B949DDB74E612BA4C762F7@IRSMSX102.ger.corp.intel.com> <20190509055915.GA58462@gmail.com> <2236FBA76BA1254E88B949DDB74E612BA4C7741F@IRSMSX102.ger.corp.intel.com> <20190509084352.GA96236@gmail.com> <201905111703.5998DF5F@keescook> <20190512080245.GA7827@gmail.com> <201905120705.4F27DF3244@keescook> In-Reply-To: <201905120705.4F27DF3244@keescook> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 11.0.600.7 dlp-reaction: no-action x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > > With 5 bits there's a ~96.9% chance of crashing the system in an attempt, > > the exploit cannot be used for a range of attacks, including spear > > attacks and fast-spreading worms, right? A crashed and inaccessible > > system also increases the odds of leaving around unfinished attack code > > and leaking a zero-day attack. > > Yup, which is why I'd like to have _something_ here without us getting > lost in the "perfect entropy" weeds. :) I really start to believe that we cannot make good randomness sources behave fast enough for per-syscall usage if our target is 1-2% overhead under worst possible (and potentially unrealistic ) scenario (stress test of some simple syscall like getpid()). The only thing that would fit the margin is indeed rdtsc(). I profiled the path in use with get_random_bytes() and results look like this (arch_get_random_long in not inline for measurement purpose here): > > > > | | | --9.44%--random_get_byte > > | | | | > > | | | --8.08%--get_random_bytes > > | | | | > > | | | --7.80%--_extract_crng.constprop.45 > > | | | | > > | | | |--4.95%--arch_get_random_long > > | | | | > > | | | --2.39%--chacha_block And here is the proof that under such usage _extract_crng bottlenecks on rdrand: PerfTop: 5877 irqs/sec kernel:78.6% exact: 100.0% [4000Hz cycles:ppp], (all, 8 CPUs) ------------------------------------------------------------------------------------------------------------------------------------ Showing cycles:ppp for _extract_crng.constprop.46 Events Pcnt (>=5%) Percent | Source code & Disassembly of kcore for cycles:ppp (2104 samples, percent: local period) ------------------------------------------------------------------------------------------------------- 0.00 : ffffffff9abd1a62: mov $0xa,%edx 97.94 : ffffffff9abd1a67: rdrand %rax And then of course there is chacha permutation itself. So, I think Andy's proposal to rewrite "get_random_bytes" for speed is not so easy to implement. So, given that all we want is to raise the bar for attackers to predict the stack location on subsequent syscall, is it really worth to try to come up with more complex solutions than just using lower bits of rdtsc() by default? One idea that I got suggested last week is to create a pool of good randomness and then during syscall select a random number from the pool using smth rdtsc()%POOL_SIZE. Pool would need to be refilled periodically, outside of syscall path to maintain diversity. I can try this approach, if people believe that it would address the security concerns around rdtsc() (my personal feeling is that one can still time attack this if we assume that rdtsc can be attacked and complexity of the whole thing increases considerably). If we decide that this is too much trouble for just 5 bits of randomness we need per syscall, I would still propose we reconsider original rdtsc() approach since it is still better than nothing. We can have the whole thing on three levels: CONFIG_RANDOMIZE_KSTACK_OFFSET - off - no randomization, like now CONFIG_RANDOMIZE_KSTACK_OFFSET on with rdtsc(), fast, better than nothing, but prone to timing attacks CONFIG_RANDOMIZE_KSTACK_OFFSET based on get_random_bytes() with better security guarantees. Performance numbers for will approx. look like No randomization: Simple syscall: 0.0534 microseconds With rdtsc(): Simple syscall: 0.0539 microseconds Wih get_random_bytes(4096 buffer): Simple syscall: 0.0597 microseconds Pure rdrand option with calling rdrand_long every 10th syscall is considerably slower With rdrand (every 10th syscall): Simple syscall: 0.0719 microseconds And I guess we should once again remember that these are *not* the numbers that real users will see in practice since I doubt we have the real loads issuing millions of *very lightweight* syscalls in a loop, so this is really more "theoretical, worst case ever" numbers. If someone could actually propose a reasonable *practical* workload to measure with, then we can see what is the overhead on that both for rdtsc and get_random_bytes(). Best Regards, Elena.