Subject: Re: [PATCH 10/17] prmem: documentation
To: Andy Lutomirski, Igor Stoppa
CC: Kees Cook, Peter Zijlstra, Nadav Amit, Mimi Zohar, Matthew Wilcox,
    Dave Chinner, James Morris, Michal Hocko, "Kernel Hardening",
    linux-integrity, LSM List, Dave Hansen, Jonathan Corbet, Laura Abbott,
    Randy Dunlap, Mike Rapoport, "open list:DOCUMENTATION", LKML,
    "Thomas Gleixner"
References: <20181023213504.28905-1-igor.stoppa@huawei.com>
    <20181023213504.28905-11-igor.stoppa@huawei.com>
    <20181026092609.GB3159@worktop.c.hoisthospitality.com>
    <20181028183126.GB744@hirez.programming.kicks-ass.net>
    <40cd77ce-f234-3213-f3cb-0c3137c5e201@gmail.com>
    <20181030152641.GE8177@hirez.programming.kicks-ass.net>
    <0A7AFB50-9ADE-4E12-B541-EC7839223B65@amacapital.net>
    <6f60afc9-0fed-7f95-a11a-9a2eef33094c@gmail.com>
From: Igor Stoppa
Message-ID: <17a007eb-43ea-e4da-b066-0d8c502f5f6e@huawei.com>
Date: Tue, 13 Nov 2018 20:26:01 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 13/11/2018 19:16, Andy Lutomirski wrote:
> On Tue, Nov 13, 2018 at 6:25 AM Igor Stoppa wrote:

[...]

>> How about having one mm_struct for each writer (core or thread)?
>>
>
> I don't think that helps anything.  I think the mm_struct used for
> prmem (or rare_write or whatever you want to call it)

write_rare / rarely can be shortened to wr_, which is kinda less
confusing than rare_write: that one would become rw_, which is easy to
confuse with R/W.

Any advice for better naming is welcome.

> should be
> entirely abstracted away by an appropriate API, so neither SELinux nor
> IMA need to be aware that there's an mm_struct involved.

Yes, that is fine. In my proposal I was thinking about tying it to the
core/thread that performs the actual write.

The high level API could be something like:

wr_memcpy(void *src, void *dst, uint_t size)

> It's also
> entirely possible that some architectures won't even use an mm_struct
> behind the scenes -- x86, for example, could have avoided it if there
> were a kernel equivalent of PKRU.  Sadly, there isn't.

The mm_struct - or whatever is the means to do the write on that
architecture - can be kept hidden from the API.

But the reason why I was proposing to have one mm_struct per writer is
that, iiuc, the secondary mapping is created in the secondary mm_struct
for each writer using it.

So the code updating the IMA measurements would, theoretically, also
have write access to the SELinux AVC.
Which is what I was trying to avoid. And the same goes for any other
write-rare updater. Is this correct?

>> 2) Iiuc, the purpose of the 2 pages being remapped is that the target of
>> the patch might spill across the page boundary, however if I deal with
>> the modification of generic data, I shouldn't (shouldn't I?) assume that
>> the data will not span across multiple pages.
>
> The reason for the particular architecture of text_poke() is to avoid
> memory allocation to get it working.  I think that prmem/rare_write
> should have each rare-writable kernel address map to a unique user
> address, possibly just by offsetting everything by a constant.  For
> rare_write, you don't actually need it to work as such until fairly
> late in boot, since the rare_writable data will just be writable early
> on.

Yes, that is true. I think it's safe to assume, from an attack-pattern
point of view, that as long as user space is not started, the system
can be considered ok. Even user-space code run from initrd should be
ok, since it can be bundled (and signed) as a single binary with the
kernel.

Modules loaded from a regular filesystem are a bit more risky, because
an attack might inject a rogue key in the key-ring and use it to load
malicious modules.

>> If the data spans across multiple pages, in unknown amount, I suppose
>> that I should not keep interrupts disabled for an unknown time, as it
>> would hurt preemption.
>>
>> What I thought, in my initial patch-set, was to iterate over each page
>> that must be written to, in a loop, re-enabling interrupts in-between
>> iterations, to give pending interrupts a chance to be served.
>>
>> This would mean that the data being written to would not be consistent,
>> but it's a problem that would have to be addressed anyways, since it can
>> be still read by other cores, while the write is ongoing.
>
> This probably makes sense, except that enabling and disabling
> interrupts means you also need to restore the original mm_struct (most
> likely), which is slow.  I don't think there's a generic way to check
> whether an interrupt is pending without turning interrupts on.

The only "excuse" I have is that write_rare is opt-in and is "rare".
Maybe the enabling/disabling of interrupts - and the consequent switch
of mm_struct - could be somehow tied to the latency configuration?

If preemption is disabled, the expectations on the system latency are
more relaxed anyway.

But I'm not sure how it would work against I/O.

--
igor
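
For reference, the per-page loop discussed above could look roughly
like the following. This is only a user-space sketch, not kernel code:
wr_memcpy_chunked(), wr_gap_fn, count_gap() and SKETCH_PAGE_SIZE are
hypothetical names, the 4096-byte page size is an assumption, the plain
memcpy() stands in for the write through the alternate mapping, and the
callback stands in for "re-enable interrupts, let pending ones run,
disable them again". The argument order (src, dst, size) follows the
wr_memcpy() prototype proposed in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical hook invoked between two pages; in a kernel version this
 * is where interrupts would be briefly re-enabled. */
typedef void (*wr_gap_fn)(void *ctx);

/* Example hook: just counts how many times it was called. */
void count_gap(void *ctx)
{
	++*(int *)ctx;
}

/*
 * Copy size bytes from src to dst, never crossing a destination page
 * boundary within a single chunk. Returns the number of chunks written.
 */
size_t wr_memcpy_chunked(const void *src, void *dst, size_t size,
			 wr_gap_fn gap, void *ctx)
{
	const unsigned char *s = src;
	unsigned char *d = dst;
	size_t chunks = 0;

	assert(src != NULL && dst != NULL);

	while (size > 0) {
		/* Bytes left before dst reaches the next page boundary. */
		size_t room = SKETCH_PAGE_SIZE -
			      (size_t)((uintptr_t)d % SKETCH_PAGE_SIZE);
		size_t n = size < room ? size : room;

		/* Stand-in for the write through the alternate mapping. */
		memcpy(d, s, n);
		s += n;
		d += n;
		size -= n;
		chunks++;

		/* Give pending work a chance only if more pages follow. */
		if (size > 0 && gap)
			gap(ctx);
	}
	return chunks;
}
```

A write of 100 bytes starting 10 bytes before a page boundary is split
into two chunks with one gap in between. Readers on other cores can
observe the half-updated state during that gap, which is the
consistency problem mentioned in the quoted text.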