References: <1572171452-7958-1-git-send-email-rppt@kernel.org> <2236FBA76BA1254E88B949DDB74E612BA4EEC0CE@IRSMSX102.ger.corp.intel.com> <1572371012.4812.19.camel@linux.ibm.com>
In-Reply-To: <1572371012.4812.19.camel@linux.ibm.com>
From: Andy Lutomirski
Date: Tue, 29 Oct 2019 11:10:33 -0700
Subject: Re: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings
To: James Bottomley
Cc: Andy Lutomirski, "Reshetova, Elena", Mike Rapoport, "linux-kernel@vger.kernel.org", Alexey Dobriyan, Andrew Morton, Arnd Bergmann, Borislav Petkov, Dave Hansen, Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", "linux-api@vger.kernel.org", "linux-mm@kvack.org", "x86@kernel.org", Mike Rapoport, Tycho Andersen, Alan Cox
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 29, 2019 at 10:44 AM James Bottomley wrote:
>
> On Tue, 2019-10-29 at 10:03 -0700, Andy Lutomirski wrote:
> > On Tue, Oct 29, 2019 at 4:25 AM Reshetova, Elena wrote:
> > >
> > > > The patch below aims to allow applications to create mappings that
> > > > have pages visible only to the owning process. Such mappings could
> > > > be used to store secrets so that these secrets are visible neither
> > > > to other processes nor to the kernel.
> > >
> > > Hi Mike,
> > >
> > > I have actually been looking into a closely related problem for the
> > > past couple of weeks (on and off). What is common here is the need
> > > for userspace to indicate to the kernel that some pages contain
> > > secrets. And then there are actually a number of things that the
> > > kernel can do to try to protect these secrets better. Unmapping from
> > > the direct map is one of them.
> > > Another thing is to map such pages as non-cached, which can help us
> > > to prevent or considerably restrict speculation on such pages. The
> > > initial proof of concept for marking pages as "UNCACHED" that I got
> > > from Dave Hansen was actually based on mlock2() and a new flag for
> > > it for this purpose. Since then I have been thinking about what
> > > interface suits the use case better and actually chose to go with a
> > > new madvise() flag instead because of all the possible implications
> > > for fragmentation and performance.
> >
> > Doing all of this with MAP_SECRET seems bad to me. If user code
> > wants UC memory, it should ask for UC memory -- having the kernel
> > involved in the decision to use UC memory is a bad idea, because the
> > performance impact of using UC memory where user code wasn't
> > expecting it will be so bad that the system might as well not work at
> > all. (For kicks, I once added a sysctl to turn off caching in
> > CR0. I enabled it in gnome-shell. The system slowed down to such an
> > extent that I was unable to enter the three or so keystrokes to turn
> > it back off.)
> >
> > EXCLUSIVE makes sense. Saying "don't ptrace this" makes sense. UC
> > makes sense. But having one flag to rule them all does not make
> > sense to me.
>
> So this is a usability problem. We have a memory flag that can be used
> for "secrecy" for some userspace value of the word and we have a load
> of internal properties depending on how the hardware works, including
> potentially some hardware additions like SEV or TME, that can be used
> to implement the property. If we expose our hardware vagaries, the
> user is really not going to know what to do ... and we have a limited
> number of flags to express this, so it stands to reason that we need to
> define "secrecy" for the user and then implement it using whatever
> flags we have.
> So I think no ptrace and no direct map make sense for
> pretty much any value of "secrecy". The UC bit seems to be an attempt
> to prevent exfiltration via L1TF or other cache side channels, so it
> looks like it should only be applied if the side channel mitigations
> aren't active ... which would tend to indicate it's a kernel decision
> as well.

I just don't think this will work in practice. Someone will say "hey,
let's keep this giant buffer we do crypto from, or maybe even the
entire data area of some critical service, secret". It will work
*fine* at first. But then some kernel config changes and we can't do
DMA, and now it breaks on some configs. Someone else will say "hey, I
don't have L1TF or whatever mitigation, let's turn on UC", and
everything goes to hell.

IMO the kernel should attempt to keep *all memory* secret. Specific
applications that want greater levels of secrecy should opt in to more
expensive things.

Here's what's already on the table:

Exclusive / XPFO / XPO: allocation might be extremely expensive.
Overuse might hurt performance due to huge page fragmentation. DMA
may not work. Otherwise it's peachy.

SEV: Works only in some contexts. The current kernel implementation
is, IMO, unacceptable to the extent that I wish I could go back in
time and NAK it.

TME: it's on or it's off. There's no room for a MAP_ flag here.

MKTME: of highly dubious value here. The only useful thing here I can
think of would be a MAP_NOTSECRET to opt *out* of encryption for a
specific range. Other than that, it has all the same performance
implications that EXCLUSIVE has.

UC: Performance hit is extreme. *Also* has the perf implications of
exclusive. I can't imagine this making any sense except where the user
application is written in the expectation that UC might be used so
that the access patterns would be reasonable.

WC: Same issues as UC plus memory ordering issues such that
unsuspecting applications will corrupt data.

Trying to bundle these together with kernel- or admin-only config seems like a lost cause.
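
To make the "opt in to each property separately" idea concrete, here is a rough userspace sketch. It deliberately does not use MAP_EXCLUSIVE or any other flag proposed in this thread. It assumes Linux with glibc and uses only interfaces that exist today as stand-ins: mlock() for "keep it out of swap", madvise(MADV_DONTDUMP) for "keep it out of core dumps", and prctl(PR_SET_DUMPABLE, 0) for a process-wide "don't ptrace this". The names secret_opt, secret_buf, secret_alloc and secret_free are hypothetical and exist only for illustration.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS and MADV_DONTDUMP */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Hypothetical names: one bit per property, each requested explicitly. */
enum secret_opt {
	SECRET_NO_SWAP   = 1 << 0, /* mlock(): pages stay resident */
	SECRET_NO_DUMP   = 1 << 1, /* MADV_DONTDUMP: skipped in core dumps */
	SECRET_NO_PTRACE = 1 << 2, /* PR_SET_DUMPABLE=0: affects whole process */
};

struct secret_buf {
	void *addr;
	size_t len;
	unsigned applied; /* subset of the requested bits that took effect */
};

/* Map an anonymous private region and apply only the requested properties.
 * Returns 0 on success, -1 if the mapping itself failed. A property that
 * cannot be applied is reported through b->applied, not as a hard error. */
static int secret_alloc(struct secret_buf *b, size_t len, unsigned want)
{
	assert(b != NULL && len > 0);

	b->addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (b->addr == MAP_FAILED) {
		b->addr = NULL;
		b->len = 0;
		b->applied = 0;
		return -1;
	}
	b->len = len;
	b->applied = 0;

	if ((want & SECRET_NO_SWAP) && mlock(b->addr, len) == 0)
		b->applied |= SECRET_NO_SWAP;
	if ((want & SECRET_NO_DUMP) && madvise(b->addr, len, MADV_DONTDUMP) == 0)
		b->applied |= SECRET_NO_DUMP;
	if ((want & SECRET_NO_PTRACE) && prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) == 0)
		b->applied |= SECRET_NO_PTRACE;
	return 0;
}

/* Wipe the region through a volatile pointer, then unmap it. */
static void secret_free(struct secret_buf *b)
{
	assert(b != NULL);
	if (b->addr == NULL)
		return;

	volatile unsigned char *p = b->addr;
	for (size_t i = 0; i < b->len; i++)
		p[i] = 0;

	munmap(b->addr, b->len); /* also drops any mlock on the range */
	b->addr = NULL;
	b->len = 0;
	b->applied = 0;
}
```

A caller that only wants to stay out of core dumps would pass SECRET_NO_DUMP and nothing else, then check the applied field to see what it actually got. The point of the sketch is the shape of the interface, where each cost is requested by name. It says nothing about how exclusive mappings or UC memory should be exposed.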