Date: Fri, 24 Jan 2020 11:06:25 -0800
In-Reply-To: <20200123014627.71720-1-bgeffon@google.com>
Message-Id: <20200124190625.257659-1-bgeffon@google.com>
Mime-Version: 1.0
References: <20200123014627.71720-1-bgeffon@google.com>
X-Mailer: git-send-email 2.25.0.341.g760bfbb309-goog
Subject: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap().
From: Brian Geffon
To: Andrew Morton
Cc: "Michael S . Tsirkin", Brian Geffon, Arnd Bergmann,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-api@vger.kernel.org, Andy Lutomirski, Andrea Arcangeli,
	Sonny Rao, Minchan Kim, Joel Fernandes, Yu Zhao, Jesse Barnes
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
set, the source mapping will not be removed. Instead it will be
cleared as if a brand new anonymous, private mapping had been created
atomically as part of the mremap() call. If a userfaultfd was watching
the source, it will continue to watch the new mapping. For a mapping
that is shared or not anonymous, MREMAP_DONTUNMAP will cause the
mremap() call to fail. MREMAP_DONTUNMAP requires that MREMAP_FIXED is
also set. The final result is two equally sized VMAs where the
destination contains the PTEs of the source.

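To make the semantics above concrete, here is a minimal userspace sketch, not part of the patch. The helper name demo_mremap_dontunmap() is hypothetical, the fallback #define uses the value 4 from this patch, and the sketch assumes no userfaultfd is registered on the source, so the cleared source is expected to read back as zeroes. On a kernel without the flag, mremap() is expected to fail and the helper returns 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4	/* value taken from this patch */
#endif

/*
 * Hypothetical helper. Returns 1 if the move succeeded and the contents
 * match the description above, 0 if the kernel rejected a request (for
 * example because it lacks MREMAP_DONTUNMAP), -1 on unexpected contents.
 */
static int demo_mremap_dontunmap(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *src, *dst, *moved;
	size_t len;
	int ok;

	assert(page > 0);
	len = (size_t)page;

	/* Source: an anonymous, private mapping, as the patch requires. */
	src = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src == MAP_FAILED)
		return 0;

	/* Reserve a destination range to hand to MREMAP_FIXED. */
	dst = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED) {
		munmap(src, len);
		return 0;
	}

	memset(src, 0xAB, len);

	/* Old and new lengths are equal: two equally sized VMAs result. */
	moved = mremap(src, len, len,
		       MREMAP_MAYMOVE | MREMAP_FIXED | MREMAP_DONTUNMAP, dst);
	if (moved == MAP_FAILED) {
		munmap(src, len);
		munmap(dst, len);
		return 0;
	}

	/* Destination holds the old pages; source is still mapped but cleared. */
	ok = moved == dst &&
	     moved[0] == 0xAB && moved[len - 1] == 0xAB &&
	     src[0] == 0 && src[len - 1] == 0;

	munmap(src, len);
	munmap(moved, len);
	return ok ? 1 : -1;
}
```

A caller would treat a return of 0 as "feature unavailable" and fall back to the mremap() followed by mmap() sequence that the Android use case below wants to avoid.
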
We hope to use this in Chrome OS where, with userfaultfd, we could
write an anonymous mapping to disk without having to STOP the process
or worry about VMA permission changes.

This feature also has a use case in Android. Lokesh Gidra has said
that "As part of using userfaultfd for GC, We'll have to move the
physical pages of the java heap to a separate location. For this
purpose mremap will be used. Without the MREMAP_DONTUNMAP flag, when
I mremap the java heap, its virtual mapping will be removed as well.
Therefore, we'll require performing mmap immediately after. This is
not only time consuming but also opens a time window where a native
thread may call mmap and reserve the java heap's address range for
its own usage. This flag solves the problem."

Signed-off-by: Brian Geffon
---
 include/uapi/linux/mman.h |  5 +++--
 mm/mremap.c               | 37 ++++++++++++++++++++++++++++++-------
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index fc1a64c3447b..923cc162609c 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -5,8 +5,9 @@
 #include
 #include

-#define MREMAP_MAYMOVE	1
-#define MREMAP_FIXED	2
+#define MREMAP_MAYMOVE		1
+#define MREMAP_FIXED		2
+#define MREMAP_DONTUNMAP	4

 #define OVERCOMMIT_GUESS		0
 #define OVERCOMMIT_ALWAYS		1
diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..bf97c3eb538b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 static unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr,
-		bool *locked, struct vm_userfaultfd_ctx *uf,
-		struct list_head *uf_unmap)
+		bool *locked, unsigned long flags,
+		struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *new_vma;
@@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	if (unlikely(vma->vm_flags & VM_PFNMAP))
 		untrack_pfn_moved(vma);

+	if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
+		if (vm_flags & VM_ACCOUNT)
+			vma->vm_flags |= VM_ACCOUNT;
+
+		goto out;
+	}
+
 	if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
 		/* OOM: unable to split vma, just get accounts right */
 		vm_unacct_memory(excess >> PAGE_SHIFT);
@@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 			vma->vm_next->vm_flags |= VM_ACCOUNT;
 	}

+out:
 	if (vm_flags & VM_LOCKED) {
 		mm->locked_vm += new_len >> PAGE_SHIFT;
 		*locked = true;
@@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,

 static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		unsigned long new_addr, unsigned long new_len, bool *locked,
-		struct vm_userfaultfd_ctx *uf,
+		unsigned long flags, struct vm_userfaultfd_ctx *uf,
 		struct list_head *uf_unmap_early,
 		struct list_head *uf_unmap)
 {
@@ -545,6 +553,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		old_len = new_len;
 	}

+	/*
+	 * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
+	 * check that we can expand by old_len and vma_to_resize will handle
+	 * the vma growing.
+	 */
+	if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
+				vma->vm_flags, old_len >> PAGE_SHIFT))) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	vma = vma_to_resize(addr, old_len, new_len, &charged);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
@@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if (IS_ERR_VALUE(ret))
 		goto out1;

-	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
+	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
 		       uf_unmap);
 	if (!(offset_in_page(ret)))
 		goto out;
@@ -609,12 +628,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	addr = untagged_addr(addr);
 	new_addr = untagged_addr(new_addr);

-	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
 		return ret;

 	if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
 		return ret;

+	if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
+		return ret;
+
 	if (offset_in_page(addr))
 		return ret;

@@ -634,7 +656,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,

 	if (flags & MREMAP_FIXED) {
 		ret = mremap_to(addr, old_len, new_addr, new_len,
-				&locked, &uf, &uf_unmap_early, &uf_unmap);
+				&locked, flags, &uf, &uf_unmap_early,
+				&uf_unmap);
 		goto out;
 	}

@@ -712,7 +735,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		}

 		ret = move_vma(vma, addr, old_len, new_len, new_addr,
-			       &locked, &uf, &uf_unmap);
+			       &locked, flags, &uf, &uf_unmap);
 	}
 out:
 	if (offset_in_page(ret)) {
-- 
2.25.0.341.g760bfbb309-goog
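
The flag checks added to the mremap() syscall entry can be read as a small truth table. The sketch below is a userspace model of those three checks only. The function names mremap_flags_valid() and mremap_flags_examples() are hypothetical, the flag values are the ones from the mman.h hunk, and returning false stands in for the early "return ret" in the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/uapi/linux/mman.h by this patch. */
#define MREMAP_MAYMOVE		1
#define MREMAP_FIXED		2
#define MREMAP_DONTUNMAP	4

/* Hypothetical model of the flag validation at the top of the syscall. */
static bool mremap_flags_valid(unsigned long flags)
{
	/* Any bit outside the three known flags is rejected. */
	if (flags & ~(unsigned long)(MREMAP_FIXED | MREMAP_MAYMOVE |
				     MREMAP_DONTUNMAP))
		return false;

	/* MREMAP_FIXED is only accepted together with MREMAP_MAYMOVE. */
	if ((flags & MREMAP_FIXED) && !(flags & MREMAP_MAYMOVE))
		return false;

	/* New in this patch: MREMAP_DONTUNMAP needs MREMAP_FIXED. */
	if ((flags & MREMAP_DONTUNMAP) && !(flags & MREMAP_FIXED))
		return false;

	return true;
}

/* Walks through the combinations the checks distinguish. */
static void mremap_flags_examples(void)
{
	assert(mremap_flags_valid(0));
	assert(mremap_flags_valid(MREMAP_MAYMOVE));
	assert(mremap_flags_valid(MREMAP_MAYMOVE | MREMAP_FIXED));
	assert(mremap_flags_valid(MREMAP_MAYMOVE | MREMAP_FIXED |
				  MREMAP_DONTUNMAP));

	assert(!mremap_flags_valid(MREMAP_FIXED));
	assert(!mremap_flags_valid(MREMAP_DONTUNMAP));
	assert(!mremap_flags_valid(MREMAP_MAYMOVE | MREMAP_DONTUNMAP));
	assert(!mremap_flags_valid(8));
}
```

Because MREMAP_FIXED already requires MREMAP_MAYMOVE, the only accepted combination that includes MREMAP_DONTUNMAP in this version of the patch is all three flags together.
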