Date: Mon, 8 Feb 2021 23:26:05 +0200
From: Mike Rapoport
To: Michal Hocko
Cc: Andrew Morton, Alexander Viro, Andy Lutomirski, Arnd Bergmann,
 Borislav Petkov, Catalin Marinas, Christopher Lameter, Dan Williams,
 Dave Hansen, David Hildenbrand, Elena Reshetova, "H. Peter Anvin",
 Ingo Molnar, James Bottomley, "Kirill A. Shutemov", Matthew Wilcox,
 Mark Rutland, Mike Rapoport, Michael Kerrisk, Palmer Dabbelt,
 Paul Walmsley, Peter Zijlstra, Rick Edgecombe, Roman Gushchin,
 Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer,
 Palmer Dabbelt
Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas
Message-ID: <20210208212605.GX242749@kernel.org>
References: <20210208084920.2884-1-rppt@kernel.org> <20210208084920.2884-8-rppt@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 08, 2021 at 11:49:22AM +0100, Michal Hocko wrote:
> On Mon 08-02-21 10:49:17, Mike Rapoport wrote:
> > From: Mike Rapoport
> >
> > Introduce the "memfd_secret" system call with the ability to create memory
> > areas that are visible only in the context of the owning process and are
> > mapped neither in other processes nor in the kernel page tables.
> >
> > The secretmem feature is off by default and the user must explicitly enable
> > it at boot time.
> >
> > Once secretmem is enabled, the user will be able to create a file
> > descriptor using the memfd_secret() system call. The memory areas created
> > by mmap() calls on this file descriptor will be unmapped from the kernel
> > direct map and will be mapped only in the page table of the owning mm.
>
> Is this really true? I guess you meant to say that the memory will be
> visible only via page tables to anybody who can mmap the respective file
> descriptor.
> There is no such thing as an owning mm, as the fd is inherently a
> shareable resource and ownership becomes a very vague and hard to
> define term.

Hmm, it seems I've been dragging this paragraph along since the very first
mmap(MAP_EXCLUSIVE) RFC and nobody (including myself) noticed the
inconsistency.

> > The file descriptor based memory has several advantages over the
> > "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). It
> > paves the way for VMMs to remove the secret memory range from the process;
>
> I do not understand how it helps to remove the memory from the process
> as the interface explicitly allows adding memory that is removed from
> all other processes via the direct map.

The current implementation does not help to remove the memory from the
process, but fd-backed memory seems a better interface than mmap for
removing guest memory from host mappings. As Andy nicely put it:

"Getting fd-backed memory into a guest will take some possibly major work in
the kernel, but getting vma-backed memory into a guest without mapping it in
the host user address space seems much, much worse."

> > As the secret memory implementation is not an extension of tmpfs or
> > hugetlbfs, using a dedicated system call rather than hooking new
> > functionality into memfd_create(2) emphasises that memfd_secret(2) has
> > different semantics and allows better upwards compatibility.
>
> What is this supposed to mean? What are the differences?

Well, the phrasing could be better indeed. It is supposed to mean that they
differ in the semantics behind the file descriptor: memfd_create implements
sealing for shmem and hugetlbfs, while memfd_secret implements memory hidden
from the kernel.

> > The secretmem mappings are locked in memory, so they cannot exceed
> > RLIMIT_MEMLOCK. Since these mappings are already locked, an attempt to
> > mlock() a secretmem range will fail and mlockall() will ignore secretmem
> > mappings.
>
> What about munlock? Isn't this implied?
> ;-)

I'll add a sentence about it.

--
Sincerely yours,
Mike.
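For reference, the flow described in the quoted changelog (create a descriptor with memfd_secret(), give it a size with ftruncate(), then mmap() it) might look roughly like the sketch below. It assumes a kernel that provides the syscall with secretmem enabled at boot, and userspace headers that define SYS_memfd_secret; no libc wrapper is assumed, so it goes through syscall(2). The helper name secret_alloc() is hypothetical. Because the result is an ordinary file descriptor, anything that obtains it and maps it (a forked child, or a process it is passed to) should see the same pages, which is the point raised above about there being no single owning mm.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of secret memory.
 * On success returns the mapping and stores the backing fd in *fd_out;
 * on failure returns NULL with errno set. */
static void *secret_alloc(size_t size, int *fd_out)
{
    assert(fd_out != NULL);

#ifdef SYS_memfd_secret
    /* Raw syscall; no libc wrapper is assumed. flags = 0 (no O_CLOEXEC). */
    int fd = (int)syscall(SYS_memfd_secret, 0);
#else
    /* Headers predate the syscall, so we cannot even try it. */
    int fd = -1;
    errno = ENOSYS;
#endif
    if (fd < 0)
        return NULL; /* e.g. ENOSYS if unsupported or not enabled */

    /* A fresh descriptor has size 0; give it a size before mapping. */
    if (ftruncate(fd, (off_t)size) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return NULL;
    }

    /* The mapping is shared; the pages are expected to be removed from
     * the kernel direct map when they are faulted in. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        int saved = errno;
        close(fd);
        errno = saved;
        return NULL;
    }

    *fd_out = fd;
    return p;
}
```

A caller would write its secret into the returned buffer and later munmap() it and close() the descriptor. Per the quoted changelog, the mapping is locked, so its size counts against RLIMIT_MEMLOCK.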