References: <20200123014627.71720-1-bgeffon@google.com> <20200124190625.257659-1-bgeffon@google.com> <20200126051642.GA39508@ubuntu-x2-xlarge-x86>
In-Reply-To: <20200126051642.GA39508@ubuntu-x2-xlarge-x86>
From: Brian Geffon
Date: Sun, 26 Jan 2020 18:21:39 -0800
Subject: Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
To: Nathan Chancellor
Cc: Andrew Morton, "Michael S . Tsirkin", Arnd Bergmann, LKML, linux-mm, linux-api@vger.kernel.org, Andy Lutomirski, Andrea Arcangeli, Sonny Rao, Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes, clang-built-linux@googlegroups.com
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Nathan,

Thank you! That was an oversight on my part. I'll address it in the next patch.

Brian

On Sat, Jan 25, 2020 at 9:16 PM Nathan Chancellor wrote:
>
> Hi Brian,
>
> On Fri, Jan 24, 2020 at 11:06:25AM -0800, Brian Geffon wrote:
> > When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
> > set, the source mapping will not be removed. Instead it will be
> > cleared as if a brand new anonymous, private mapping had been created
> > atomically as part of the mremap() call. If a userfaultfd was watching
> > the source, it will continue to watch the new mapping. For a mapping
> > that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
> > mremap() call to fail.
> > MREMAP_DONTUNMAP implies that MREMAP_FIXED is
> > also used. The final result is two equally sized VMAs where the
> > destination contains the PTEs of the source.
> >
> > We hope to use this in Chrome OS where with userfaultfd we could write
> > an anonymous mapping to disk without having to STOP the process or worry
> > about VMA permission changes.
> >
> > This feature also has a use case in Android, Lokesh Gidra has said
> > that "As part of using userfaultfd for GC, We'll have to move the physical
> > pages of the java heap to a separate location. For this purpose mremap
> > will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
> > heap, its virtual mapping will be removed as well. Therefore, we'll
> > require performing mmap immediately after. This is not only time consuming
> > but also opens a time window where a native thread may call mmap and
> > reserve the java heap's address range for its own usage. This flag
> > solves the problem."
> >
> > Signed-off-by: Brian Geffon
> > ---
> >  include/uapi/linux/mman.h |  5 +++--
> >  mm/mremap.c               | 37 ++++++++++++++++++++++++++++++-------
> >  2 files changed, 33 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> > index fc1a64c3447b..923cc162609c 100644
> > --- a/include/uapi/linux/mman.h
> > +++ b/include/uapi/linux/mman.h
> > @@ -5,8 +5,9 @@
> >  #include
> >  #include
> >
> > -#define MREMAP_MAYMOVE	1
> > -#define MREMAP_FIXED	2
> > +#define MREMAP_MAYMOVE		1
> > +#define MREMAP_FIXED		2
> > +#define MREMAP_DONTUNMAP	4
> >
> >  #define OVERCOMMIT_GUESS		0
> >  #define OVERCOMMIT_ALWAYS		1
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 122938dcec15..bf97c3eb538b 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  static unsigned long move_vma(struct vm_area_struct *vma,
> >  		unsigned long old_addr, unsigned long old_len,
> >  		unsigned long new_len, unsigned long new_addr,
> > -		bool *locked, struct vm_userfaultfd_ctx *uf,
> > -		struct list_head *uf_unmap)
> > +		bool *locked, unsigned long flags,
> > +		struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
> >  {
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	struct vm_area_struct *new_vma;
> > @@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> >  	if (unlikely(vma->vm_flags & VM_PFNMAP))
> >  		untrack_pfn_moved(vma);
> >
> > +	if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
> > +		if (vm_flags & VM_ACCOUNT)
> > +			vma->vm_flags |= VM_ACCOUNT;
> > +
> > +		goto out;
> > +	}
> > +
> >  	if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
> >  		/* OOM: unable to split vma, just get accounts right */
> >  		vm_unacct_memory(excess >> PAGE_SHIFT);
> > @@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> >  			vma->vm_next->vm_flags |= VM_ACCOUNT;
> >  	}
> >
> > +out:
> >  	if (vm_flags & VM_LOCKED) {
> >  		mm->locked_vm += new_len >> PAGE_SHIFT;
> >  		*locked = true;
> > @@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> >
> >  static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> >  		unsigned long new_addr, unsigned long new_len, bool *locked,
> > -		struct vm_userfaultfd_ctx *uf,
> > +		unsigned long flags, struct vm_userfaultfd_ctx *uf,
> >  		struct list_head *uf_unmap_early,
> >  		struct list_head *uf_unmap)
> >  {
> > @@ -545,6 +553,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> >  		old_len = new_len;
> >  	}
> >
> > +	/*
> > +	 * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
> > +	 * check that we can expand by old_len and vma_to_resize will handle
> > +	 * the vma growing.
> > +	 */
> > +	if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
> > +				vma->vm_flags, old_len >> PAGE_SHIFT))) {
>
> We received a Clang build report that vma is used uninitialized here
> (they aren't being publicly sent to LKML due to GCC vs Clang
> warning/error overlap):
>
> https://groups.google.com/d/msg/clang-built-linux/gE5wRaeHdSI/xVA0MBQVEgAJ
>
> Sure enough, vma is initialized first in the next block. Not sure if
> this section should be moved below that initialization or if something
> else should be done to resolve it but that dereference will obviously be
> fatal.
>
> Cheers,
> Nathan
>
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> >  	vma = vma_to_resize(addr, old_len, new_len, &charged);
> >  	if (IS_ERR(vma)) {
> >  		ret = PTR_ERR(vma);
> > @@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> >  	if (IS_ERR_VALUE(ret))
> >  		goto out1;
> >
> > -	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
> > +	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
> >  		       uf_unmap);
> >  	if (!(offset_in_page(ret)))
> >  		goto out;
> > @@ -609,12 +628,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> >  	addr = untagged_addr(addr);
> >  	new_addr = untagged_addr(new_addr);
> >
> > -	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
> > +	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
> >  		return ret;
> >
> >  	if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
> >  		return ret;
> >
> > +	if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
> > +		return ret;
> > +
> >  	if (offset_in_page(addr))
> >  		return ret;
> >
> > @@ -634,7 +656,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> >
> >  	if (flags & MREMAP_FIXED) {
> >  		ret = mremap_to(addr, old_len, new_addr, new_len,
> > -				&locked, &uf, &uf_unmap_early, &uf_unmap);
> > +				&locked, flags, &uf, &uf_unmap_early,
> > +				&uf_unmap);
> >  		goto out;
> >  	}
> >
> > @@ -712,7 +735,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> >  		}
> >
> >  		ret = move_vma(vma, addr, old_len, new_len, new_addr,
> > -			       &locked, &uf, &uf_unmap);
> > +			       &locked, flags, &uf, &uf_unmap);
> >  	}
> >  out:
> >  	if (offset_in_page(ret)) {
> > --
> > 2.25.0.341.g760bfbb309-goog
> >
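
For anyone following the flag checks in the quoted SYSCALL_DEFINE5(mremap, ...) hunk, here is a minimal userspace sketch of the same validation order, assuming the v2 semantics described in the commit message (MREMAP_DONTUNMAP requires MREMAP_FIXED, which in turn requires MREMAP_MAYMOVE). The helper name mremap_flags_valid() and the SKETCH_ prefix are hypothetical and not part of the patch; the numeric values are taken from the quoted mman.h hunk. It does not call mremap() and says nothing about the other checks the kernel performs.

```c
#include <stdbool.h>

/*
 * Flag values as defined in the quoted include/uapi/linux/mman.h hunk.
 * The SKETCH_ prefix is an assumption of this example, used only to
 * avoid clashing with definitions from libc headers.
 */
#define SKETCH_MREMAP_MAYMOVE   1UL
#define SKETCH_MREMAP_FIXED     2UL
#define SKETCH_MREMAP_DONTUNMAP 4UL

/*
 * Hypothetical helper: mirrors the three flag checks at the top of the
 * mremap() syscall in the v2 patch. Returns true if the combination
 * would pass those checks, false where the patch does "return ret".
 */
static bool mremap_flags_valid(unsigned long flags)
{
	/* Any bit outside the three known flags is rejected. */
	if (flags & ~(SKETCH_MREMAP_FIXED | SKETCH_MREMAP_MAYMOVE |
		      SKETCH_MREMAP_DONTUNMAP))
		return false;

	/* MREMAP_FIXED is only accepted together with MREMAP_MAYMOVE. */
	if ((flags & SKETCH_MREMAP_FIXED) && !(flags & SKETCH_MREMAP_MAYMOVE))
		return false;

	/* In this v2 patch, MREMAP_DONTUNMAP is only accepted with MREMAP_FIXED. */
	if ((flags & SKETCH_MREMAP_DONTUNMAP) && !(flags & SKETCH_MREMAP_FIXED))
		return false;

	return true;
}
```

Under these checks the only accepted combination that includes MREMAP_DONTUNMAP is all three flags together (value 7); MREMAP_DONTUNMAP alone or with just one of the other flags is rejected.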