Subject: Re: [PATCH 10/17] prmem: documentation
To: Matthew Wilcox, Tycho Andersen
Cc: Andy Lutomirski, Kees Cook, Peter Zijlstra, Mimi Zohar, Dave Chinner,
 James Morris, Michal Hocko, Kernel Hardening, linux-integrity,
 linux-security-module, Igor Stoppa, Dave Hansen, Jonathan Corbet,
 Laura Abbott, Randy Dunlap, Mike Rapoport, "open list:DOCUMENTATION",
 LKML, Thomas Gleixner
References: <20181023213504.28905-11-igor.stoppa@huawei.com>
 <20181026092609.GB3159@worktop.c.hoisthospitality.com>
 <20181028183126.GB744@hirez.programming.kicks-ass.net>
 <40cd77ce-f234-3213-f3cb-0c3137c5e201@gmail.com>
 <20181030152641.GE8177@hirez.programming.kicks-ass.net>
 <0A7AFB50-9ADE-4E12-B541-EC7839223B65@amacapital.net>
 <20181030175814.GB10491@bombadil.infradead.org>
 <20181030182841.GE7343@cisco>
 <20181030192021.GC10491@bombadil.infradead.org>
From: Igor Stoppa
Message-ID: <9edbdf8b-b5fb-5a82-43b4-b639f5ec8484@gmail.com>
Date: Tue, 30 Oct 2018 22:43:14 +0200
In-Reply-To:
 <20181030192021.GC10491@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 30/10/2018 21:20, Matthew Wilcox wrote:
> On Tue, Oct 30, 2018 at 12:28:41PM -0600, Tycho Andersen wrote:
>> On Tue, Oct 30, 2018 at 10:58:14AM -0700, Matthew Wilcox wrote:
>>> On Tue, Oct 30, 2018 at 10:06:51AM -0700, Andy Lutomirski wrote:
>>>>> On Oct 30, 2018, at 9:37 AM, Kees Cook wrote:
>>>> I support the addition of a rare-write mechanism to the upstream kernel.
>>>> And I think that there is only one sane way to implement it: using an
>>>> mm_struct. That mm_struct, just like any sane mm_struct, should only
>>>> differ from init_mm in that it has extra mappings in the *user* region.
>>>
>>> I'd like to understand this approach a little better. In a syscall path,
>>> we run with the user task's mm. What you're proposing is that when we
>>> want to modify rare data, we switch to rare_mm which contains a
>>> writable mapping to all the kernel data which is rare-write.
>>>
>>> So the API might look something like this:
>>>
>>> 	void *p = rare_alloc(...);	/* writable pointer */
>>> 	p->a = x;
>>> 	q = rare_protect(p);		/* read-only pointer */

With pools and memory allocated from vmap_areas, I was able to say

protect(pool)

and that would do a sweep over all the pages currently in use.
In the SELinux policyDB, for example, one doesn't really want to
protect each allocation individually.

The loading phase usually happens at boot, when the system can be
assumed to be sane (one might even preload a bare-bones set of rules from
initramfs and then replace it later on with the full-blown set).

There is no need to treat each of these tens of thousands of allocations
and initializations as write-rare.

Would it be possible to do the same here?

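To make the protect(pool) idea concrete, here is a rough userspace analogy, not the prmem code. It assumes a bump allocator over an anonymous mmap() region, and every name in it (struct pool, pool_init, pool_alloc, pool_protect, pool_destroy, pool_demo) is hypothetical. The point is only that allocations stay plainly writable while loading, and a single call then seals all of them:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for a pool of protectable memory. */
struct pool {
	char *base;	/* start of the page-aligned region */
	size_t size;	/* total bytes, a multiple of the page size */
	size_t used;	/* bump-allocator offset */
};

int pool_init(struct pool *p, size_t pages)
{
	size_t size = pages * (size_t)sysconf(_SC_PAGESIZE);
	void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return -1;
	p->base = base;
	p->size = size;
	p->used = 0;
	return 0;
}

/* Ordinary writable allocation, used during the loading phase. */
void *pool_alloc(struct pool *p, size_t n)
{
	void *ret;

	n = (n + 15) & ~(size_t)15;	/* round up to 16 bytes */
	if (n > p->size - p->used)
		return NULL;
	ret = p->base + p->used;
	p->used += n;
	return ret;
}

/* One call write-protects every allocation made so far. */
int pool_protect(struct pool *p)
{
	return mprotect(p->base, p->size, PROT_READ);
}

void pool_destroy(struct pool *p)
{
	munmap(p->base, p->size);
}

/* Load 100 "rules" with plain stores, seal the pool, read one back. */
int pool_demo(void)
{
	struct pool p;
	int i, *rule, *last = NULL;

	if (pool_init(&p, 4))
		return -1;
	for (i = 0; i < 100; i++) {
		rule = pool_alloc(&p, sizeof(*rule));
		if (!rule)
			return -1;
		*rule = i;
		last = rule;
	}
	if (pool_protect(&p))
		return -1;
	i = *last;		/* reads still work; a store would fault */
	pool_destroy(&p);
	return i == 99 ? 0 : -1;
}
```

In this sketch the cost of protection is one mprotect() per pool rather than one operation per allocation, which is the property I am asking about.
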
>>>
>>> To subsequently modify q,
>>>
>>> 	p = rare_modify(q);
>>> 	q->a = y;
>>
>> Do you mean
>>
>> 	p->a = y;
>>
>> here? I assume the intent is that q isn't writable ever, but that's
>> the one we have in the structure at rest.
>
> Yes, that was my intent, thanks.
>
> To handle the list case that Igor has pointed out, you might want to
> do something like this:
>
> 	list_for_each_entry(x, &xs, entry) {
> 		struct foo *writable = rare_modify(entry);

Would this mapping be impossible to spoof by other cores?

I'm asking because, from what I understand, local interrupts are
enabled here, so an attacker could freeze the core performing the
write-rare operation, while another core scrapes the memory.

But blocking interrupts for the entire body of the loop would make RT
latency unpredictable.

> 		kref_get(&writable->ref);
> 		rare_protect(writable);
> 	}
>
> but we'd probably wrap it in list_for_each_rare_entry(), just to be nicer.

This seems suspiciously close to the duplication of kernel interfaces
that I was roasted for :-)

--
igor
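
For reference, the rare_modify()/rare_protect() pairing quoted above can be mimicked in userspace with two views of the same pages: a permanent read-only one and a short-lived writable alias. This is only a sketch under assumptions. The signatures are guessed from the pseudocode in the thread, rare_init() and rare_demo() are hypothetical helpers, and the backing store is a tmpfile() rather than a separate mm_struct. In particular, the alias here is visible to every thread in the process, so it does not answer the question about other cores:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

struct foo {
	int ref;
};

static FILE *rare_file;		/* backing store shared by both views */
static size_t rare_size;	/* one page in this sketch */
static char *rare_ro;		/* permanent read-only view */
static char *rare_rw;		/* temporary writable alias, one at a time */

int rare_init(void)
{
	long page = sysconf(_SC_PAGESIZE);

	rare_file = tmpfile();
	if (!rare_file || page <= 0)
		return -1;
	rare_size = (size_t)page;
	if (ftruncate(fileno(rare_file), (off_t)rare_size))
		return -1;
	rare_ro = mmap(NULL, rare_size, PROT_READ, MAP_SHARED,
		       fileno(rare_file), 0);
	return rare_ro == MAP_FAILED ? -1 : 0;
}

/* Map a writable alias and translate the read-only pointer into it. */
void *rare_modify(const void *q)
{
	size_t off = (size_t)((const char *)q - rare_ro);

	rare_rw = mmap(NULL, rare_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		       fileno(rare_file), 0);
	if (rare_rw == MAP_FAILED)
		return NULL;
	return rare_rw + off;
}

/* Drop the writable alias and hand back the read-only pointer. */
const void *rare_protect(void *p)
{
	size_t off = (size_t)((char *)p - rare_rw);

	munmap(rare_rw, rare_size);
	rare_rw = NULL;
	return rare_ro + off;
}

/* Bump a reference count through the alias; returns the new count. */
int rare_demo(void)
{
	const struct foo *q;
	struct foo *p;

	if (rare_init())
		return -1;
	q = (const struct foo *)rare_ro;	/* zero-filled object at rest */
	p = rare_modify(q);
	if (!p)
		return -1;
	p->ref++;			/* store goes through the alias */
	q = rare_protect(p);
	return q->ref;			/* read back via the read-only view */
}
```

The store goes through p and is then read back through q, which matches the corrected p->a = y usage discussed above.
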