From: Kees Cook
Date: Tue, 30 Oct 2018 14:07:45 -0700
Subject: Re: [PATCH 10/17] prmem: documentation
To: Andy Lutomirski
Cc: Igor Stoppa, Matthew Wilcox, Tycho Andersen, Peter Zijlstra, Mimi Zohar, Dave Chinner, James Morris, Michal Hocko, Kernel Hardening, linux-integrity, linux-security-module, Igor Stoppa, Dave Hansen, Jonathan Corbet, Laura Abbott, Randy Dunlap, Mike Rapoport, "open list:DOCUMENTATION", LKML, Thomas Gleixner

On Tue, Oct 30, 2018 at 2:02 PM, Andy Lutomirski wrote:
>
>
>> On Oct 30, 2018, at 1:43 PM, Igor Stoppa wrote:
>>
>>> On 30/10/2018 21:20, Matthew Wilcox wrote:
>>>> On Tue, Oct 30, 2018 at 12:28:41PM -0600, Tycho Andersen wrote:
>>>>> On Tue, Oct 30, 2018 at 10:58:14AM -0700, Matthew Wilcox wrote:
>>>>> On Tue, Oct 30, 2018 at 10:06:51AM -0700, Andy Lutomirski wrote:
>>>>>>> On Oct 30, 2018, at 9:37 AM, Kees Cook wrote:
>>>>>> I support the addition of a rare-write mechanism to the upstream kernel.
>>>>>> And I think that there is only one sane way to implement it: using an
>>>>>> mm_struct. That mm_struct, just like any sane mm_struct, should only
>>>>>> differ from init_mm in that it has extra mappings in the *user* region.
>>>>>
>>>>> I'd like to understand this approach a little better. In a syscall path,
>>>>> we run with the user task's mm. What you're proposing is that when we
>>>>> want to modify rare data, we switch to rare_mm which contains a
>>>>> writable mapping to all the kernel data which is rare-write.
>>>>>
>>>>> So the API might look something like this:
>>>>>
>>>>> void *p = rare_alloc(...); /* writable pointer */
>>>>> p->a = x;
>>>>> q = rare_protect(p); /* read-only pointer */
>>
>> With pools and memory allocated from vmap_areas, I was able to say
>>
>> protect(pool)
>>
>> and that would do a swipe on all the pages currently in use.
>> In the SELinux policyDB, for example, one doesn't really want to individually protect each allocation.
>>
>> The loading phase happens usually at boot, when the system can be assumed to be sane (one might even preload a bare-bone set of rules from initramfs and then replace it later on, with the full blown set).
>>
>> There is no need to process each of these tens of thousands allocations and initialization as write-rare.
>>
>> Would it be possible to do the same here?
>
> I don’t see why not, although getting the API right will be a tad complicated.
>
>>
>>>>>
>>>>> To subsequently modify q,
>>>>>
>>>>> p = rare_modify(q);
>>>>> q->a = y;
>>>>
>>>> Do you mean
>>>>
>>>> p->a = y;
>>>>
>>>> here? I assume the intent is that q isn't writable ever, but that's
>>>> the one we have in the structure at rest.
>>> Yes, that was my intent, thanks.
>>> To handle the list case that Igor has pointed out, you might want to
>>> do something like this:
>>> list_for_each_entry(x, &xs, entry) {
>>> struct foo *writable = rare_modify(entry);
>>
>> Would this mapping be impossible to spoof by other cores?
>>
>
> Indeed. Only the core with the special mm loaded could see it.
>
> But I dislike allowing regular writes in the protected region. We really only need four write primitives:
>
> 1. Just write one value. Call at any time (except NMI).
>
> 2. Just copy some bytes. Same as (1) but any number of bytes.
>
> 3,4: Same as 1 and 2 but must be called inside a special rare write region. This is purely an optimization.
>
> Actually getting a modifiable pointer should be disallowed for two reasons:
>
> 1. Some architectures may want to use a special write-different-address-space operation. Heck, x86 could, too: make the actual offset be a secret and shove the offset into FSBASE or similar. Then %fs-prefixed writes would do the rare writes.
>
> 2. Alternatively, x86 could set the U bit. Then the actual writes would use the uaccess helpers, giving extra protection via SMAP.
>
> We don’t really want a situation where an unchecked pointer in the rare write region completely defeats the mechanism.

We still have to deal with certain structures under the write-rare window.
For example, see:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/write-rarely&id=60430b4d3b113aae4adab66f8339074986276474
They are wrappers to non-inline functions that have the same sanity-checking.

-- 
Kees Cook
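
For readers following the thread, the first two write primitives from the quoted text (write one value, copy some bytes, with no writable pointer ever handed to the caller) can be sketched in userspace. This is only an illustration under loud assumptions: every name here (rare_pool, rare_pool_init, rare_pool_protect, rare_write_bytes, rare_write_long) is hypothetical and comes from neither the prmem series nor the kspp/write-rarely branch. It also swaps in a different technique: it toggles page permissions with mprotect() on the one existing mapping, whereas the thread proposes a separate mm with a writable alias, so this sketch does not reproduce the "only the core with the special mm loaded could see it" property.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for a write-rare pool. */
struct rare_pool {
    void *base;   /* start of the page-aligned region */
    size_t len;   /* region size, rounded up to whole pages */
};

/* Allocate a writable, page-aligned region (the "loading phase"). */
static int rare_pool_init(struct rare_pool *pool, size_t len)
{
    assert(pool != NULL && len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = (len + page - 1) / page * page;
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    pool->base = p;
    pool->len = rounded;
    return 0;
}

/* Protect the whole pool in one sweep, as in "protect(pool)". */
static int rare_pool_protect(struct rare_pool *pool)
{
    return mprotect(pool->base, pool->len, PROT_READ);
}

/*
 * Primitive 2: copy some bytes into the protected pool.
 * The caller passes an offset, never a writable pointer, and the
 * range is sanity-checked before the write window is opened.
 */
static int rare_write_bytes(struct rare_pool *pool, size_t off,
                            const void *src, size_t n)
{
    if (off > pool->len || n > pool->len - off)
        return -1;
    if (mprotect(pool->base, pool->len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy((char *)pool->base + off, src, n);
    return mprotect(pool->base, pool->len, PROT_READ);
}

/* Primitive 1: write one value, built on the byte-copy primitive. */
static int rare_write_long(struct rare_pool *pool, size_t off, long value)
{
    return rare_write_bytes(pool, off, &value, sizeof(value));
}
```

A caller would fill the pool after rare_pool_init(), call rare_pool_protect() once, and route every later update through rare_write_long() or rare_write_bytes(); a stray direct store into the pool then faults instead of silently succeeding.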