Subject: Re: [kernel-hardening] [PATCH 4/6] Protectable Memory
To: Matthew Wilcox, Jann Horn
CC: Kees Cook, Michal Hocko, Laura Abbott, Christoph Hellwig, Christoph Lameter, kernel list, Kernel Hardening
References: <20180124175631.22925-1-igor.stoppa@huawei.com> <20180124175631.22925-5-igor.stoppa@huawei.com> <20180126053542.GA30189@bombadil.infradead.org>
From: Igor Stoppa
Message-ID: <4788245d-c8e1-3587-c1e0-09c18245fe9a@huawei.com>
Date: Fri, 26 Jan 2018 13:46:55 +0200
In-Reply-To: <20180126053542.GA30189@bombadil.infradead.org>
On 26/01/18 07:35, Matthew Wilcox wrote:
> On Wed, Jan 24, 2018 at 08:10:53PM +0100, Jann Horn wrote:
>> I'm not entirely convinced by the approach of marking small parts of
>> kernel memory as readonly for hardening.
>
> It depends how significant the data stored in there are.  For example,
> storing function pointers in read-only memory provides significant
> hardening.
>
>> You're allocating with vmalloc(), which, as far as I know, establishes
>> a second mapping in the vmalloc area for pages that are already mapped
>> as RW through the physmap. AFAICS, later, when you're trying to make
>> pages readonly, you're only changing the protections on the second
>> mapping in the vmalloc area, therefore leaving the memory writable
>> through the physmap. Is that correct? If so, please either document
>> the reasoning why this is okay or change it.
>
> Yes, this is still vulnerable to attacks through the physmap.  That's
> also true for marking structs as const.  We should probably fix that at
> some point, but at least they're not vulnerable to heap overruns by
> small amounts ... you have to be able to overrun some other array by
> terabytes.

Actually, I think there is something to be said in favor of a
vmalloc-based approach, precisely because of the physmap :-P

If I understood correctly, the physmap is primarily meant to speed up
access to physical memory through the TLB, in particular for
kmalloc-based allocations. Which means that, to perform a physmap-based
attack on a kmalloced allocation, one needs to know:

- the address of the target variable in the kmalloc range
- the randomized offset of the kernel
- the location of the physmap

But for a vmalloc-based allocation there is one extra hoop: since the
mapping is really per page, the attacker now actually has to walk the
page tables to figure out where to poke in the physmap.
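The aliasing problem Jann describes can be illustrated with a userspace
analogy: map the same physical pages twice, seal one mapping read-only,
and the pages remain writable through the other alias, just as an RO
vmalloc mapping is still writable through the physmap. This is only a
sketch using memfd_create()/mmap() in place of the kernel mappings; the
names and sizes are illustrative:

```c
/* Userspace analogy of the RO-vmalloc-vs-writable-physmap aliasing:
 * two mappings of the same pages, only one of them sealed read-only. */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Create a read-only and a read-write view of the same physical pages.
 * Returns 0 on success, -1 on failure. */
static int make_aliases(unsigned char **ro, unsigned char **rw, size_t len)
{
    int fd = memfd_create("alias-demo", 0);   /* anonymous shared backing */

    if (fd < 0 || ftruncate(fd, (off_t)len) < 0)
        return -1;

    /* The "physmap" analogue: a plain writable mapping. */
    *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    /* The "protected vmalloc" analogue: same pages, mapped read-only. */
    *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);

    return (*rw == MAP_FAILED || *ro == MAP_FAILED) ? -1 : 0;
}
```

A write through the rw alias is immediately visible through the ro
alias: the read-only protection only constrains that one mapping, not
the underlying pages.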
One more thought about the physmap: does it also map code? Because if
it does, and one wants to use it for an attack, isn't it easier to look
for some security check and replace a bne with a beq, or equivalent?

> It's worth having a discussion about whether we want the pmalloc API
> or whether we want a slab-based API.

pmalloc is meant to be useful where the attack surface is made up of
lots of small allocations: my first use case was the SELinux policy DB,
where a variety of elements are allocated, in large numbers, to the
point where having ready-made caches would be wasteful.

Then there is the issue I already mentioned about arm/arm64, which
would require breaking down large mappings; that seems to be against
current policy, as described in my previous mail:

http://www.openwall.com/lists/kernel-hardening/2018/01/24/11

I do not know exactly what you have in mind wrt slab, but my impression
is that it would most likely gravitate toward the pmalloc
implementation. It would need:

- "pools", or some other means to lock only a certain group of pages,
  related to a specific kernel user
- (mostly) lockless allocation
- a way to manage granularity (i.e. the order of allocation)

Most of this is already provided by genalloc, which is what I ended up
almost re-implementing before being pointed to it :-) I only had to add
the tracking of the end of each allocation, which is what patch 1/6
does. As a side note: is anybody maintaining genalloc? I could not find
an entry in MAINTAINERS.

As I mentioned above, using vmalloc even adds an extra layer of
protection. The major downside is the increased TLB pressure; however,
that has not been significant for the volumes of data I have had to
deal with so far: only a few 4K pages.

But you might have something else in mind. I'd be interested to know
what it is, and what the obstacle would be to using pmalloc. Maybe it
can be solved.

--
igor