Date: Mon, 28 Oct 2019 10:32:30 -0700
From: Sean Christopherson
To: Dave Hansen
Cc: Mike Rapoport, linux-kernel@vger.kernel.org, Alexey Dobriyan,
    Andrew Morton, Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
    Dave Hansen, James Bottomley, Peter Zijlstra, Steven Rostedt,
    Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
    linux-api@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
    Mike Rapoport
Subject: Re: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings
Message-ID: <20191028173229.GC5061@linux.intel.com>
References: <1572171452-7958-1-git-send-email-rppt@kernel.org>
    <1572171452-7958-2-git-send-email-rppt@kernel.org>
In-Reply-To:
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 28, 2019 at 10:12:44AM -0700, Dave Hansen wrote:
> On 10/27/19 3:17 AM, Mike Rapoport wrote:
> > The pages in these mappings are removed from the kernel direct map and
> > marked with PG_user_exclusive flag. When the exclusive area is unmapped,
> > the pages are mapped back into the direct map.
>
> This looks fun. It's certainly simple.
>
> But, the description is not really calling out the pros and cons very
> well. I'm also not sure that folks will use an interface like this that
> requires up-front, special code to do an allocation instead of something
> like madvise(). That's why protection keys ended up the way it did: if
> you do this as an mmap() replacement, you need to modify all *allocators*
> to be enabled for this. If you do it with mprotect()-style, you can
> apply it to existing allocations.
>
> Some other random thoughts:
>
>  * The page flag is probably not a good idea. It would probably be
>    better to set _PAGE_SPECIAL on the PTE and force get_user_pages()
>    into the slow path.
>  * This really stops being "normal" memory. You can't do futexes on it,
>    can't splice it. Probably need a more fleshed-out list of
>    incompatible features.
>  * As Kirill noted, each 4k page ends up with a potential 1GB "blast
>    radius" of demoted pages in the direct map. Not cool. This is
>    probably a non-starter as it stands.
>  * The global TLB flushes are going to eat you alive. They probably
>    border on a DoS on larger systems.
>  * Do we really want this user interface to dictate the kernel
>    implementation? In other words, do we really want MAP_EXCLUSIVE,
>    or do we want MAP_SECRET? One tells the kernel what to *do*, the
>    other tells the kernel what the memory *IS*.

If we go that route, maybe MAP_USER_SECRET so that there's wiggle room in
the event that there are different secret keepers that require different
implementations in the kernel? E.g. MAP_GUEST_SECRET for a KVM guest to
take the userspace VMM (QEMU) out of the TCB, i.e. the mapping would be
accessible by the kernel (or just KVM?) and the KVM guest, but not
userspace.

>  * There's a lot of other stuff going on in this area: XPFO, SEV, MKTME,
>    Persistent Memory, where the kernel direct map is a liability in some
>    way. We probably need some kind of overall, architected solution
>    rather than five or ten things all poking at the direct map.
>