Subject: Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
From: Igor Stoppa
Date: Sat, 3 Feb 2018 17:38:11 +0200
To: Christopher Lameter, Matthew Wilcox, Boris Lukashev
CC: Jann Horn, Kees Cook, Michal Hocko, Laura Abbott, Christoph Hellwig, kernel list, Kernel Hardening
X-Mailing-List: linux-kernel@vger.kernel.org

+Boris Lukashev

On 02/02/18 20:39, Christopher Lameter wrote:
> On Thu, 25 Jan 2018, Matthew Wilcox wrote:
>
>> It's worth having a discussion about whether we want the pmalloc API
>> or whether we want a slab-based API. We can have a separate discussion
>> about an API to remove pages from the physmap.
>
> We could even do this in a more thorough way. Can we use a ring 1 / 2
> distinction to create a hardened OS core that policies the rest of
> the ever expanding kernel with all its modules and this and that feature?

What would be the differentiating criteria?
Furthermore, what are the chances of the entire concept being invalidated because a hypervisor is already using those higher privilege levels?
That is what you are proposing, if I understand correctly.
But more on this below...

> I think that will long term be a better approach and allow more than the
> current hardening approaches can get you. It seems that we are willing to
> tolerate significant performance regressions now. So lets use the
> protection mechanisms that the hardware offers.

I would rather *not* propose a significant performance regression :-P
There might be some one-off cases, or otherwise rare events, that get penalized, but my preference is to avoid introducing any significant performance penalty during regular use.
After all, the lower the penalty, the wider the (potential) adoption.

In more detail: there are 2 major cases for wanting some form of read-only protection.

1) extra guard against accidental corruption

The kernel provides many debugging tools, and they can detect lots of errors during development, but using them requires time and expertise, which are not always available.
Furthermore, it is objectively true that not all code has the same level of maturity, especially when non-upstream code is used in some custom product.
It's not my main goal, but it would be nice if the protection could address this case too.
Corruption *can* happen. Live guards against it will definitely help spot bugs or, at the very least, crash/reboot a device before it can cause permanent data corruption.
Protection against accidental corruption should be used as widely as possible, therefore it cannot carry a high price tag in terms of lost performance.
Otherwise, there's the risk that it will end up being just a debug feature, like lockdep or ubsan.

2) protection against malicious attacks

This is harder, of course, but what can realistically be expected?
If an attacker can gain full control of the kernel, the only way to do damage control is to have HW and/or higher-privilege SW that can somehow limit the attacker's reach.
To make this work for real, these extra HW/SW means must either be able to tell legitimate kernel activity apart from rogue actions, or operate so independently from the kernel that a compromised kernel cannot use any API to influence them.

The consensus seems to be to put this concern aside (for now) and instead focus on a typical scenario:
- some bug is found that allows reading/writing kernel memory
- some other bug is found that leaks the address of a well-known variable; once the relative locations of the symbols are known, this effectively reveals the randomized offset of every symbol placed in linear memory.

With patience, the toolkit described above can be used to attack anything that is writable by the kernel, including page tables and permissions.

However, the typical attack is more like: "let's flip some bit(s)". That is exactly the case __ro_after_init exists to address.

My proposal is to extend the same sort of protection to dynamically allocated variables:
* make the pages read-only once the data is initialized
* use vmalloc, so that exfiltrating the address of an unrelated variable cannot easily give away the location of the real target, because vmalloc memory is mapped page by page rather than being part of the linear mapping.
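As a rough userspace analogue of this proposal (not the pmalloc API itself), the sketch below allocates from a page-aligned region, initializes the data, and then drops write permission on the whole region. mmap()/mprotect() are only stand-ins for vmalloc'd pages and the kernel's own page-permission handling; the toy_pool_* names are hypothetical, and the simple bump allocator is an assumption made for brevity.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical toy pool: one page-aligned, page-granular mapping,
 * standing in (in userspace) for a vmalloc-backed protectable pool. */
struct toy_pool {
	unsigned char *base; /* start of the mapping */
	size_t size;         /* mapping size, a multiple of the page size */
	size_t used;         /* bump-allocator offset */
	int sealed;          /* 1 once the pages have been made read-only */
};

/* Map 'pages' anonymous pages as read/write. Returns 0 on success. */
static int toy_pool_create(struct toy_pool *p, size_t pages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *mem = mmap(NULL, pages * page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (mem == MAP_FAILED)
		return -1;
	p->base = mem;
	p->size = pages * page;
	p->used = 0;
	p->sealed = 0;
	return 0;
}

/* Bump-allocate 'n' bytes, 8-byte aligned; refused once the pool is sealed. */
static void *toy_pool_alloc(struct toy_pool *p, size_t n)
{
	size_t off = (p->used + 7) & ~(size_t)7;

	if (p->sealed || off > p->size || n > p->size - off)
		return NULL;
	p->used = off + n;
	return p->base + off;
}

/* Seal the pool: remove write permission from every page it spans. */
static int toy_pool_protect(struct toy_pool *p)
{
	if (mprotect(p->base, p->size, PROT_READ) != 0)
		return -1;
	p->sealed = 1;
	return 0;
}

static void toy_pool_destroy(struct toy_pool *p)
{
	munmap(p->base, p->size);
}
```

After toy_pool_protect(), a stray write through a pointer into the pool (for example `*cfg = 0;` in the test below) faults with SIGSEGV instead of silently flipping a bit, and further allocations from the sealed pool are refused.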
Boris Lukashev proposed additional hardening in the form of a hash/checksum, checked when accessing a certain variable, but I could not come up with an implementation that did not have too much overhead.
Reconsidering this, one option would be a function "pool_validate()", probably expensive, that a piece of code could invoke before using the data from the pool.
It's not perfect, because it would not be atomic, but it could be called once, at the beginning of a function, without adding overhead to each access to the pool that the function performs.
An attacker would have to time the attack so that the data is corrupted after the pool is validated and before the data is read from it.
Possible, but way trickier than the current unprotected situation.

What I am trying to say is that even with a multi-ring implementation (which would be more dependent on HW features), there would still be the problem of validating the legitimacy of each use of the API that such an implementation would expose.
I'd rather try to preserve performance and still provide a defense against the more trivial attacks, since other types of attacks are much harder to perform in the wild.

Of course, I'm interested in alternatives (I'll comment separately on the compound pages).

pmalloc is designed to take advantage of any page provider.
So far, vmalloc seems to me the best option, but something else might emerge that works better.
Either way, I think the pmalloc API would still be needed to let the rest of the kernel take advantage of this feature.

--
igor
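The pool_validate() idea discussed above can be sketched in userspace C, under stated assumptions: a digest of the pool contents is recorded once at seal time, and pool_validate() recomputes it on demand. The struct, the checked_pool_seal() helper and the choice of 64-bit FNV-1a are all hypothetical illustrations, not part of pmalloc. FNV-1a is cheap but not cryptographic, so an attacker able to rewrite both the data and the stored digest would defeat it; a real design would need a keyed MAC and a digest kept out of the attacker's reach. As noted in the email, the check is also not atomic with respect to later reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sealed pool: the data region plus a digest taken at seal time. */
struct checked_pool {
	const unsigned char *data;
	size_t len;
	uint64_t digest; /* recorded once, when the pool is sealed */
};

/* 64-bit FNV-1a: simple and fast, but NOT a cryptographic hash. */
static uint64_t fnv1a64(const unsigned char *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;      /* FNV prime */
	}
	return h;
}

/* Record the digest; in the scheme above this happens when the pool is sealed. */
static void checked_pool_seal(struct checked_pool *p,
			      const unsigned char *data, size_t len)
{
	p->data = data;
	p->len = len;
	p->digest = fnv1a64(data, len);
}

/* Hypothetical pool_validate(): O(len), so it is meant to be called once at
 * function entry rather than on every access. Returns 1 if contents match. */
static int pool_validate(const struct checked_pool *p)
{
	return fnv1a64(p->data, p->len) == p->digest;
}
```

Because each FNV-1a step is a bijection on the running hash, changing any single byte of the pool always changes the digest, so a one-bit flip is caught by the next pool_validate() call.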