From: Andy Lutomirski
Date: Tue, 13 Nov 2018 10:35:53 -0800
Subject: Re: [PATCH 10/17] prmem: documentation
To: Igor Stoppa
Cc: Igor Stoppa, Kees Cook, Peter Zijlstra, Nadav Amit, Mimi Zohar,
    Matthew Wilcox, Dave Chinner, James Morris, Michal Hocko,
    Kernel Hardening, linux-integrity, LSM List, Dave Hansen,
    Jonathan Corbet, Laura Abbott, Randy Dunlap, Mike Rapoport,
    "open list:DOCUMENTATION", LKML, Thomas Gleixner
In-Reply-To: <17a007eb-43ea-e4da-b066-0d8c502f5f6e@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 13, 2018 at 10:26 AM Igor Stoppa wrote:
>
> On 13/11/2018 19:16, Andy Lutomirski wrote:
>
> > should be
> > entirely abstracted away by an appropriate API, so neither SELinux nor
> > IMA need to be aware that there's an mm_struct involved.
>
> Yes, that is fine.
> In my proposal I was thinking about tying it to the
> core/thread that performs the actual write.
>
> The high level API could be something like:
>
> wr_memcpy(void *src, void *dst, uint_t size)
>
> > It's also
> > entirely possible that some architectures won't even use an mm_struct
> > behind the scenes -- x86, for example, could have avoided it if there
> > were a kernel equivalent of PKRU.  Sadly, there isn't.
>
> The mm_struct - or whatever is the means to do the write on that
> architecture - can be kept hidden from the API.
>
> But the reason why I was proposing to have one mm_struct per writer is
> that, iiuc, the secondary mapping is created in the secondary mm_struct
> for each writer using it.
>
> So the updating of IMA measurements would have, theoretically, also
> write access to the SELinux AVC. Which I was trying to avoid.
> And similarly any other write rare updater. Is this correct?

If you call a wr_memcpy() function with the signature you suggested,
then you can overwrite any memory of this type.  Having a different
mm_struct under the hood makes no difference.  As far as I'm
concerned, for *dynamically allocated* rare-writable memory, you might
as well allocate all the paging structures at the same time, so the
mm_struct will always contain the mappings.  If there are serious bugs
in wr_memcpy() that cause it to write to the wrong place, we have
bigger problems.

I can imagine that we'd want a *typed* wr_memcpy()-like API some day,
but that can wait for v2.  And it still doesn't obviously need
multiple mm_structs.

>
> >> 2) Iiuc, the purpose of the 2 pages being remapped is that the target of
> >> the patch might spill across the page boundary, however if I deal with
> >> the modification of generic data, I shouldn't (shouldn't I?) assume that
> >> the data will not span across multiple pages.
> >
> > The reason for the particular architecture of text_poke() is to avoid
> > memory allocation to get it working.
> > I think that prmem/rare_write
> > should have each rare-writable kernel address map to a unique user
> > address, possibly just by offsetting everything by a constant.  For
> > rare_write, you don't actually need it to work as such until fairly
> > late in boot, since the rare_writable data will just be writable early
> > on.
>
> Yes, that is true. I think it's safe to assume, from an attack pattern,
> that as long as user space is not started, the system can be considered
> ok. Even user-space code run from initrd should be ok, since it can be
> bundled (and signed) as a single binary with the kernel.
>
> Modules loaded from a regular filesystem are a bit more risky, because
> an attack might inject a rogue key in the key-ring and use it to load
> malicious modules.

If a malicious module is loaded, the game is over.

>
> >> If the data spans across multiple pages, in unknown amount, I suppose
> >> that I should not keep interrupts disabled for an unknown time, as it
> >> would hurt preemption.
> >>
> >> What I thought, in my initial patch-set, was to iterate over each page
> >> that must be written to, in a loop, re-enabling interrupts in-between
> >> iterations, to give pending interrupts a chance to be served.
> >>
> >> This would mean that the data being written to would not be consistent,
> >> but it's a problem that would have to be addressed anyways, since it can
> >> be still read by other cores, while the write is ongoing.
> >
> > This probably makes sense, except that enabling and disabling
> > interrupts means you also need to restore the original mm_struct (most
> > likely), which is slow.  I don't think there's a generic way to check
> > whether an interrupt is pending without turning interrupts on.
>
> The only "excuse" I have is that write_rare is opt-in and is "rare".
> Maybe the enabling/disabling of interrupts - and the consequent switch
> of mm_struct - could be somehow tied to the latency configuration?
>
> If preemption is disabled, the expectations on the system latency are
> anyway more relaxed.
>
> But I'm not sure how it would work against I/O.

I think it's entirely reasonable for the API to internally break up
very large memcpys.
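
To make the "break up very large memcpys" idea concrete, here is a
rough user-space sketch of one way the splitting could look.  It is
only an illustration under stated assumptions, not code from the
patch set: sketch_wr_memcpy(), sketch_write_one_page(),
sketch_chunks_done and SKETCH_PAGE_SIZE are made-up names, the
argument order follows plain memcpy() (dst first) rather than the
signature quoted above, and an ordinary memcpy() stands in for the
step that, in the kernel, would disable interrupts, switch to the
alternate mapping, copy, and switch back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for the sketch; a real kernel would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Counts how many per-page writes were issued (for illustration only). */
static size_t sketch_chunks_done;

/*
 * Hypothetical stand-in for the per-page critical section.  In the
 * kernel this is where interrupts would be off and the alternate
 * mapping active; here it is just a memcpy().
 */
static void sketch_write_one_page(void *dst, const void *src, size_t len)
{
	/* A single chunk must never cross a page boundary. */
	assert(((uintptr_t)dst & (SKETCH_PAGE_SIZE - 1)) + len <= SKETCH_PAGE_SIZE);
	memcpy(dst, src, len);
	sketch_chunks_done++;
}

/*
 * Copy 'size' bytes, splitting the work so that each step touches at
 * most one destination page.  Between steps a real implementation
 * could let pending interrupts run.
 */
static void *sketch_wr_memcpy(void *dst, const void *src, size_t size)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (size) {
		/* Bytes left in the destination page that 'd' points into. */
		size_t off = (uintptr_t)d & (SKETCH_PAGE_SIZE - 1);
		size_t chunk = SKETCH_PAGE_SIZE - off;

		if (chunk > size)
			chunk = size;

		sketch_write_one_page(d, s, chunk);

		d += chunk;
		s += chunk;
		size -= chunk;
	}
	return dst;
}
```

As noted earlier in the thread, readers on other cores can observe a
partially updated object between chunks, so callers that need
consistency would still have to handle that separately.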