Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752070AbaFZXPf (ORCPT ); Thu, 26 Jun 2014 19:15:35 -0400
Received: from mail-lb0-f169.google.com ([209.85.217.169]:38225 "EHLO mail-lb0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751329AbaFZXPe (ORCPT ); Thu, 26 Jun 2014 19:15:34 -0400
MIME-Version: 1.0
In-Reply-To: <53ACA5B3.3010702@intel.com>
References: <1403084656-27284-1-git-send-email-qiaowei.ren@intel.com> <1403084656-27284-3-git-send-email-qiaowei.ren@intel.com> <53A884B2.5070702@mit.edu> <53A88806.1060908@intel.com> <53A88DE4.8050107@intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016AF41C@shsmsx102.ccr.corp.intel.com> <9E0BE1322F2F2246BD820DA9FC397ADE016B26AB@shsmsx102.ccr.corp.intel.com> <53AB42E1.4090102@intel.com> <53ACA5B3.3010702@intel.com>
From: Andy Lutomirski
Date: Thu, 26 Jun 2014 16:15:12 -0700
Message-ID:
Subject: Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface
To: Dave Hansen
Cc: "Ren, Qiaowei" , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , X86 ML , "linux-kernel@vger.kernel.org" , Linux MM
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 26, 2014 at 3:58 PM, Dave Hansen wrote:
> On 06/26/2014 03:19 PM, Andy Lutomirski wrote:
>> On Wed, Jun 25, 2014 at 2:45 PM, Dave Hansen wrote:
>>> On 06/25/2014 02:05 PM, Andy Lutomirski wrote:
>>>> Hmm. the memfd_create thing may be able to do this for you. If you
>>>> created a per-mm memfd and mapped it, it all just might work.
>>>
>>> memfd_create() seems to bring a fair amount of baggage along (the fd
>>> part :) if all we want is a marker. Really, all we need is _a_ bit, and
>>> some way to plumb to userspace the RSS values of VMAs with that bit set.
>>>
>>> Creating and mmap()'ing a fd seems a rather roundabout way to get there.
>>
>> Hmm. So does VM_MPX, though.
>> If this stuff were done entirely in
>> userspace, then memfd_create would be exactly the right solution, I
>> think.
>>
>> Would it work to just scan the bound directory to figure out how many
>> bound tables exist?
>
> Theoretically, perhaps.
>
> Practically, the bounds directory is 2GB, and it is likely to be very
> sparse. You would have to walk the page tables finding where pages were
> mapped, then search the mapped pages for bounds table entries.
>
> Assuming that it was aligned and minimally populated, that's a *MINIMUM*
> search looking for a PGD entry, then you have to look at 512 PUD
> entries. A full search would have to look at half a million ptes.
> That's just finding out how sparse the first level of the tables are
> before you've looked at a byte of actual data, and if they were empty.
>
> We could keep another, parallel, data structure that handles this better
> other than the hardware tables. Like, say, an rbtree that stores ranges
> of virtual addresses. We could call them vm_area_somethings ... wait a
> sec... we have a structure like that. ;)
>
>

So here's my mental image of how I might do this if I were doing it
entirely in userspace: I'd create a file or memfd for the bound tables
and another for the bound directory. These files would be *huge*: the
bound directory file would be 2GB and the bound table file would be
2^48 bytes or whatever it is. (Maybe even bigger?)

Then I'd just map pieces of those files wherever they'd need to be,
and I'd make the mappings sparse. I suspect that you don't actually
want a vma for each piece of bound table that gets mapped -- the space
of vmas could end up incredibly sparse. So I'd at least map (in the
vma sense, not the pte sense) an entire bound table at a time. And I'd
probably just map the bound directory in one big piece.

Then I'd populate it in the fault handler.

This is almost what the code is doing, I think, modulo the files.

This has one killer problem: these mappings need to be private (cowed
on fork). So memfd is no good. There's got to be an easyish way to
modify the mm code to allow anonymous maps with vm_ops. Maybe a new
mmap_region parameter or something? Maybe even a special anon_vma,
but I don't really understand how those work.

Also, egads: what happens when a bound table entry is associated with
a MAP_SHARED page?

--Andy