Date: Fri, 7 Feb 2020 12:18:56 -0800
Message-Id: <20200207201856.46070-1-bgeffon@google.com>
Subject: [PATCH v4] mm: Add MREMAP_DONTUNMAP to mremap().
From: Brian Geffon
To: Andrew Morton
Cc: Michael S. Tsirkin, Brian Geffon, Arnd Bergmann,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-api@vger.kernel.org, Andy Lutomirski, Will Deacon,
    Andrea Arcangeli, Sonny Rao, Minchan Kim, Joel Fernandes,
    Yu Zhao, Jesse Barnes, Nathan Chancellor, Florian Weimer,
    Kirill A. Shutemov
X-Mailing-List: linux-kernel@vger.kernel.org

When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
set, the source mapping will not be removed. Instead it will be cleared
as if a brand new anonymous, private mapping had been created atomically
as part of the mremap() call. If a userfaultfd was watching the source,
it will continue to watch the new mapping. For a mapping that is shared
or not anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.
Because MREMAP_DONTUNMAP always results in moving a VMA, you MUST use
the MREMAP_MAYMOVE flag. The final result is two equally sized VMAs
where the destination contains the PTEs of the source.
We hope to use this in Chrome OS, where with userfaultfd we could write
an anonymous mapping to disk without having to stop the process or worry
about VMA permission changes. This feature also has a use case in
Android; Lokesh Gidra has said that "As part of using userfaultfd for
GC, we'll have to move the physical pages of the java heap to a separate
location. For this purpose mremap will be used. Without the
MREMAP_DONTUNMAP flag, when I mremap the java heap, its virtual mapping
will be removed as well. Therefore, we'll require performing mmap
immediately after. This is not only time consuming but also opens a time
window where a native thread may call mmap and reserve the java heap's
address range for its own usage. This flag solves the problem."

Signed-off-by: Brian Geffon
---
 include/uapi/linux/mman.h |  5 +-
 mm/mremap.c               | 98 ++++++++++++++++++++++++++++++---------
 2 files changed, 80 insertions(+), 23 deletions(-)

diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index fc1a64c3447b..923cc162609c 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -5,8 +5,9 @@
 #include <asm/mman.h>
 #include <asm-generic/hugetlb_encode.h>
 
-#define MREMAP_MAYMOVE	1
-#define MREMAP_FIXED	2
+#define MREMAP_MAYMOVE		1
+#define MREMAP_FIXED		2
+#define MREMAP_DONTUNMAP	4
 
 #define OVERCOMMIT_GUESS		0
 #define OVERCOMMIT_ALWAYS		1
diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..9f4aa17f178b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 static unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr,
-		bool *locked, struct vm_userfaultfd_ctx *uf,
-		struct list_head *uf_unmap)
+		bool *locked, unsigned long flags,
+		struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *new_vma;
@@ -408,11 +408,41 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	if (unlikely(vma->vm_flags & VM_PFNMAP))
 		untrack_pfn_moved(vma);
 
+	if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
+		if (vm_flags & VM_ACCOUNT) {
+			/* Always put back VM_ACCOUNT since we won't unmap */
+			vma->vm_flags |= VM_ACCOUNT;
+
+			vm_acct_memory(vma_pages(new_vma));
+		}
+
+		/*
+		 * locked_vm accounting: if the mapping remained the same size
+		 * it will have just moved and we don't need to touch locked_vm
+		 * because we skip the do_unmap. If the mapping shrunk before
+		 * being moved then the do_unmap on that portion will have
+		 * adjusted vm_locked. Only if the mapping grows do we need to
+		 * do something special; the reason is locked_vm only accounts
+		 * for old_len, but we're now adding new_len - old_len locked
+		 * bytes to the new mapping.
+		 */
+		if (new_len > old_len)
+			mm->locked_vm += (new_len - old_len) >> PAGE_SHIFT;
+
+		goto out;
+	}
+
 	if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
 		/* OOM: unable to split vma, just get accounts right */
 		vm_unacct_memory(excess >> PAGE_SHIFT);
 		excess = 0;
 	}
+
+	if (vm_flags & VM_LOCKED) {
+		mm->locked_vm += new_len >> PAGE_SHIFT;
+		*locked = true;
+	}
+out:
 	mm->hiwater_vm = hiwater_vm;
 
 	/* Restore VM_ACCOUNT if one or two pieces of vma left */
@@ -422,16 +452,12 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 			vma->vm_next->vm_flags |= VM_ACCOUNT;
 	}
 
-	if (vm_flags & VM_LOCKED) {
-		mm->locked_vm += new_len >> PAGE_SHIFT;
-		*locked = true;
-	}
-
 	return new_addr;
 }
 
 static struct vm_area_struct *vma_to_resize(unsigned long addr,
-	unsigned long old_len, unsigned long new_len, unsigned long *p)
+	unsigned long old_len, unsigned long new_len, unsigned long flags,
+	unsigned long *p)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma = find_vma(mm, addr);
@@ -453,6 +479,10 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
+			vma->vm_flags & VM_SHARED))
+		return ERR_PTR(-EINVAL);
+
 	if (is_vm_hugetlb_page(vma))
 		return ERR_PTR(-EINVAL);
 
@@ -497,7 +527,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 
 static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		unsigned long new_addr, unsigned long new_len, bool *locked,
-		struct vm_userfaultfd_ctx *uf,
+		unsigned long flags, struct vm_userfaultfd_ctx *uf,
 		struct list_head *uf_unmap_early,
 		struct list_head *uf_unmap)
 {
@@ -505,7 +535,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	struct vm_area_struct *vma;
 	unsigned long ret = -EINVAL;
 	unsigned long charged = 0;
-	unsigned long map_flags;
+	unsigned long map_flags = 0;
 
 	if (offset_in_page(new_addr))
 		goto out;
@@ -534,9 +564,11 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
 		return -ENOMEM;
 
-	ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
-	if (ret)
-		goto out;
+	if (flags & MREMAP_FIXED) {
+		ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
+		if (ret)
+			goto out;
+	}
 
 	if (old_len >= new_len) {
 		ret = do_munmap(mm, addr+new_len, old_len - new_len, uf_unmap);
@@ -545,13 +577,26 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		old_len = new_len;
 	}
 
-	vma = vma_to_resize(addr, old_len, new_len, &charged);
+	vma = vma_to_resize(addr, old_len, new_len, flags, &charged);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto out;
 	}
 
-	map_flags = MAP_FIXED;
+	/*
+	 * MREMAP_DONTUNMAP expands by new_len - (new_len - old_len), we will
+	 * check that we can expand by new_len and vma_to_resize will handle
+	 * the vma growing which is (new_len - old_len).
+	 */
+	if (flags & MREMAP_DONTUNMAP &&
+		!may_expand_vm(mm, vma->vm_flags, new_len >> PAGE_SHIFT)) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (flags & MREMAP_FIXED)
+		map_flags |= MAP_FIXED;
+
 	if (vma->vm_flags & VM_MAYSHARE)
 		map_flags |= MAP_SHARED;
 
@@ -561,10 +606,16 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if (IS_ERR_VALUE(ret))
 		goto out1;
 
-	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
+	/* We got a new mapping */
+	if (!(flags & MREMAP_FIXED))
+		new_addr = ret;
+
+	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
 		       uf_unmap);
+
 	if (!(offset_in_page(ret)))
 		goto out;
+
 out1:
 	vm_unacct_memory(charged);
 
@@ -609,12 +660,16 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	addr = untagged_addr(addr);
 	new_addr = untagged_addr(new_addr);
 
-	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
 		return ret;
 
 	if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
 		return ret;
 
+	/* MREMAP_DONTUNMAP is always a move */
+	if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_MAYMOVE))
+		return ret;
+
 	if (offset_in_page(addr))
 		return ret;
 
@@ -632,9 +687,10 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	if (down_write_killable(&current->mm->mmap_sem))
 		return -EINTR;
 
-	if (flags & MREMAP_FIXED) {
+	if (flags & MREMAP_FIXED || flags & MREMAP_DONTUNMAP) {
 		ret = mremap_to(addr, old_len, new_addr, new_len,
-				&locked, &uf, &uf_unmap_early, &uf_unmap);
+				&locked, flags, &uf, &uf_unmap_early,
+				&uf_unmap);
 		goto out;
 	}
 
@@ -662,7 +718,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	/*
 	 * Ok, we need to grow..
 	 */
-	vma = vma_to_resize(addr, old_len, new_len, &charged);
+	vma = vma_to_resize(addr, old_len, new_len, flags, &charged);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto out;
@@ -712,7 +768,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	}
 
 	ret = move_vma(vma, addr, old_len, new_len, new_addr,
-		       &locked, &uf, &uf_unmap);
+		       &locked, flags, &uf, &uf_unmap);
 	}
 out:
 	if (offset_in_page(ret)) {
-- 
2.25.0.341.g760bfbb309-goog