Received: by 2002:a25:d7c1:0:0:0:0:0 with SMTP id o184csp4812493ybg; Tue, 29 Oct 2019 12:46:21 -0700 (PDT) X-Google-Smtp-Source: APXvYqwKbbKzXVeWMMjqztIacss7PKONxYDB5IVDDvHdP28v029bjMiFVtX4fKj58OAqDPN1Jvve X-Received: by 2002:aa7:c502:: with SMTP id o2mr27419471edq.223.1572378381292; Tue, 29 Oct 2019 12:46:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572378381; cv=none; d=google.com; s=arc-20160816; b=pwHqvYLmAqQr1HfeJLs7zfDdlLad7zSMox6ccbmZOOCHiPdxwknMFVZ2u/48CTEhn4 ZJHnNIhWfgCRvq6g7W0o6/FJ5j2nFTJfY2whkTGxgugZn2XZnYcfN9tsz8Mw+ovlAHJ1 +K67MtckYIgKiVoEX8uiuDWJqSho9RYqt44juWrpVOv/IfTuGWkO8hiCjfnakQKcmDFJ lN5O6GG3VE/zDf1p4VV8PmXQCWx/TL1LiSW3297XZ7Hl+y7OI3DE7tP0jQ740KjAFBEf OhqNFfV2SWzle/NQ6bfiHrYnltfA75HstL4OmVaTM3iMthIzMD0zrEgyg7sFxuSzLdMk /Q1w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=OE51UBN6LQ7VYVZ0gEA2r+oupQT/3uJeLD2O6XKVczM=; b=ZqZ4yozXHRIqvgWvYuqDu5Vl+9drFiKRKQTkAy3gYtsw0oYW+SsTSdaJCgNq60dBJo wzTe0UEhtJqL+TTq/mFwme6amvTZw9LnDLDKiirQ/rSK2SufY67JIUbUaQp06QQXIGnW a9pF9REhGUsM3T9cAtOAKNZuUQPKSvAhMLp+5iazAlbIg2OQZ/tJHNqIDuD7ZcaQWEhZ QvWADiyC5p20hxcNEYA718YSmeAVC2bRiYhMem/hUChcc7CHy9DLdGo6qSigI+UAWtNk z8XAR6cwADoDMj6gtM7OFkdWpE8+N2cuoY2AVSbhwslYT6pJDrF/RqRU/iiDTawsU/d2 NH6w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=q3ICBVGc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l15si8669002ejp.114.2019.10.29.12.45.42; Tue, 29 Oct 2019 12:46:21 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=q3ICBVGc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390620AbfJ2RBL (ORCPT + 99 others); Tue, 29 Oct 2019 13:01:11 -0400 Received: from mail.kernel.org ([198.145.29.99]:43078 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2390518AbfJ2RBL (ORCPT ); Tue, 29 Oct 2019 13:01:11 -0400 Received: from mail-wr1-f52.google.com (mail-wr1-f52.google.com [209.85.221.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6584E21D7C for ; Tue, 29 Oct 2019 17:01:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1572368469; bh=CEfNBOwII3kRKjNTV1QnlKd06oaG8CbrkIl/0qlzYVo=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=q3ICBVGcY42CD77N0qTsXJMlZjKwCidRWPAxH82TYrbs7YAGB3c7j/9/y7vR/rPZR eDzCjhVK7kXCC1J3TLXjgEAjKbyUIb+2Sw6OzQcEHDflnMgQERH7a35eeLBeGOOIQO H7iDIO3lp7G8rFWGFzKfkz3edX4IblThKLG68/UA= Received: by mail-wr1-f52.google.com with SMTP id q13so14443952wrs.12 for ; Tue, 29 Oct 2019 10:01:09 -0700 (PDT) X-Gm-Message-State: APjAAAX5jAkdn8PSLv0ZaybVmL/ROPnf20B6etjwDNTLpMd5TJznS37B 0YseMihruPAwb0EKK4mMTRo6s1ZxMpNrWP3sPzlRyQ== X-Received: by 2002:adf:f04e:: with SMTP id t14mr21415495wro.106.1572368466787; Tue, 29 Oct 2019 10:01:06 -0700 (PDT) MIME-Version: 1.0 References: <1572171452-7958-1-git-send-email-rppt@kernel.org> <20191029093254.GE18773@rapoport-lnx> In-Reply-To: <20191029093254.GE18773@rapoport-lnx> From: Andy Lutomirski Date: Tue, 29 Oct 2019 10:00:55 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings To: Mike Rapoport Cc: LKML , Alexey Dobriyan , Andrew Morton , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Dave Hansen , James Bottomley , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , Linux API , Linux-MM , X86 ML , Mike Rapoport Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 29, 2019 at 2:33 AM Mike Rapoport wrote: > > On Mon, Oct 28, 2019 at 02:44:23PM -0600, Andy Lutomirski wrote: > > > > > On Oct 27, 2019, at 4:17 AM, Mike Rapoport wrote: > > > > > > =EF=BB=BFFrom: Mike Rapoport > > > > > > Hi, > > > > > > The patch below aims to allow applications to create mappins that hav= e > > > pages visible only to the owning process. Such mappings could be used= to > > > store secrets so that these secrets are not visible neither to other > > > processes nor to the kernel. > > > > > > I've only tested the basic functionality, the changes should be verif= ied > > > against THP/migration/compaction. Yet, I'd appreciate early feedback. > > > > I=E2=80=99ve contemplated the concept a fair amount, and I think you sh= ould > > consider a change to the API. In particular, rather than having it be a > > MAP_ flag, make it a chardev. You can, at least at first, allow only > > MAP_SHARED, and admins can decide who gets to use it. It might also pl= ay > > better with the VM overall, and you won=E2=80=99t need a VM_ flag for i= t =E2=80=94 you > > can just wire up .fault to do the right thing. > > I think mmap()/mprotect()/madvise() are the natural APIs for such > interface. Then you have a whole bunch of questions to answer. For example: What happens if you mprotect() or similar when the mapping is already in use in a way that's incompatible with MAP_EXCLUSIVE? Is it actually reasonable to malloc() some memory and then make it exclusiv= e? Are you permitted to map a file MAP_EXCLUSIVE? What does it mean? What does MAP_PRIVATE | MAP_EXCLUSIVE do? How does one pass exclusive memory via SCM_RIGHTS? (If it's a memfd-like or chardev interface, it's trivial. mmap(), not so much.) And finally, there's my personal giant pet peeve: a major use of this will be for virtualization. I suspect that a lot of people would like the majority of KVM guest memory to be unmapped from the host pagetables. But people might also like for guest memory to be unmapped in *QEMU's* pagetables, and mmap() is a basically worthless interface for this. Getting fd-backed memory into a guest will take some possibly major work in the kernel, but getting vma-backed memory into a guest without mapping it in the host user address space seems much, much worse. > Switching to a chardev doesn't solve the major problem of direct > map fragmentation and defeats the ability to use exclusive memory mapping= s > with the existing allocators, while mprotect() and madvise() do not. > Will people really want to do malloc() and then remap it exclusive? This sounds dubiously useful at best.