Date: Fri, 12 Feb 2021 13:54:01 -0800
In-Reply-To: <20210212215403.3457686-1-axelrasmussen@google.com>
Message-Id: <20210212215403.3457686-6-axelrasmussen@google.com>
References: <20210212215403.3457686-1-axelrasmussen@google.com>
Subject: [PATCH v6 5/7] userfaultfd: add UFFDIO_CONTINUE ioctl
From: Axel Rasmussen
To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
    Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
    Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
    "Matthew Wilcox (Oracle)", Michael Ellerman, Michal Koutný,
    Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin,
    Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt, Steven Price,
    Vlastimil Babka
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Adam Ruprecht, Axel Rasmussen, Cannon Matthews,
    "Dr. David Alan Gilbert", David Rientjes, Mina Almasry, Oliver Upton

This ioctl is how userspace ought to resolve "minor" userfaults. The
idea is: userspace is notified that a minor fault has occurred. It
might change the contents of the page using its second non-UFFD
mapping, or not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I
have ensured the page contents are correct, carry on setting up the
mapping".

Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
minor-fault-registered VMAs. ZEROPAGE maps the VMA to the zero page,
but in the minor fault case we already have some pre-existing
underlying page. Likewise, UFFDIO_COPY isn't useful if we have a
second non-UFFD mapping; we'd just use memcpy() or similar instead.

It turns out hugetlb_mcopy_atomic_pte() already does something very
close to what we want, if an existing page is provided via
`struct page **pagep`. We already special-case its behavior a bit for
the UFFDIO_ZEROPAGE case, so just extend that design: add an enum for
the three modes of operation, and make the small adjustments needed
for the MCOPY_ATOMIC_CONTINUE case. (Basically: look up the existing
page, and avoid adding it to the page cache or calling
set_page_huge_active() on it.)
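To make the intended flow concrete, here is a minimal, illustrative
sketch of the userspace side. It is not part of this patch: it assumes
the minor-fault registration and fault-message flags added earlier in
this series (UFFDIO_REGISTER_MODE_MINOR, UFFD_PAGEFAULT_FLAG_MINOR),
and error handling is omitted.

#include <stddef.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Illustrative fault-handling loop. "uffd" is assumed to be a
 * userfaultfd registered for minor faults over a hugetlbfs mapping,
 * and huge_sz is the huge page size backing that mapping.
 */
static void handle_minor_faults(int uffd, size_t huge_sz)
{
	struct uffd_msg msg;

	while (read(uffd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg)) {
		if (msg.event != UFFD_EVENT_PAGEFAULT ||
		    !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR))
			continue;

		/*
		 * The page already has contents. If needed, update them
		 * through the second, non-UFFD mapping of the same file
		 * (memcpy(), network receive, ...), then ask the kernel
		 * to carry on setting up the mapping.
		 */
		struct uffdio_continue cont = {
			.range = {
				.start = msg.arg.pagefault.address &
					 ~((__u64)huge_sz - 1),
				.len = huge_sz,
			},
			.mode = 0,	/* 0 == also wake the faulting thread */
		};
		ioctl(uffd, UFFDIO_CONTINUE, &cont);
	}
}

As with UFFDIO_COPY, passing UFFDIO_CONTINUE_MODE_DONTWAKE in `mode`
suppresses the wakeup, and the number of bytes mapped is reported back
through the `mapped` field.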
Signed-off-by: Axel Rasmussen
---
 fs/userfaultfd.c                 | 78 ++++++++++++++++++++++++++++++--
 include/linux/hugetlb.h          |  3 ++
 include/linux/userfaultfd_k.h    | 18 ++++++++
 include/uapi/linux/userfaultfd.h | 21 ++++++++-
 mm/hugetlb.c                     | 41 +++++++++++------
 mm/userfaultfd.c                 | 37 +++++++++------
 6 files changed, 164 insertions(+), 34 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index edfdb8f1c740..b0142a9919f7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1327,7 +1327,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	struct uffdio_register __user *user_uffdio_register;
 	unsigned long vm_flags, new_flags;
 	bool found;
-	bool basic_ioctls;
+	bool found_hugetlb;
 	unsigned long start, end, vma_end;
 
 	user_uffdio_register = (struct uffdio_register __user *) arg;
@@ -1386,7 +1386,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	 * Search for not compatible vmas.
 	 */
 	found = false;
-	basic_ioctls = false;
+	found_hugetlb = false;
 	for (cur = vma; cur && cur->vm_start < end; cur = cur->vm_next) {
 		cond_resched();
 
@@ -1441,7 +1441,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		 * Note vmas containing huge pages
 		 */
 		if (is_vm_hugetlb_page(cur))
-			basic_ioctls = true;
+			found_hugetlb = true;
 
 		found = true;
 	}
@@ -1514,7 +1514,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	if (!ret) {
 		__u64 ioctls_out;
 
-		ioctls_out = basic_ioctls ? UFFD_API_RANGE_IOCTLS_BASIC :
+		ioctls_out = found_hugetlb ? UFFD_API_RANGE_IOCTLS_BASIC :
 		    UFFD_API_RANGE_IOCTLS;
 
 		/*
@@ -1524,6 +1524,13 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_WP))
 			ioctls_out &= ~((__u64)1 << _UFFDIO_WRITEPROTECT);
 
+		/* CONTINUE ioctl is only supported for minor ranges. */
+		if (!(found_hugetlb &&
+		      (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING) &&
+		      (ctx->features & UFFD_FEATURE_MINOR_HUGETLBFS))) {
+			ioctls_out &= ~((__u64)1 << _UFFDIO_CONTINUE);
+		}
+
 		/*
 		 * Now that we scanned all vmas we can already tell
 		 * userland which ioctls methods are guaranteed to
@@ -1877,6 +1884,66 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	return ret;
 }
 
+static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
+{
+	__s64 ret;
+	struct uffdio_continue uffdio_continue;
+	struct uffdio_continue __user *user_uffdio_continue;
+	struct userfaultfd_wake_range range;
+
+	user_uffdio_continue = (struct uffdio_continue __user *)arg;
+
+	ret = -EAGAIN;
+	if (READ_ONCE(ctx->mmap_changing))
+		goto out;
+
+	ret = -EFAULT;
+	if (copy_from_user(&uffdio_continue, user_uffdio_continue,
+			   /* don't copy the output fields */
+			   sizeof(uffdio_continue) - (sizeof(__s64))))
+		goto out;
+
+	ret = validate_range(ctx->mm, &uffdio_continue.range.start,
+			     uffdio_continue.range.len);
+	if (ret)
+		goto out;
+
+	ret = -EINVAL;
+	/* double check for wraparound just in case. */
+	if (uffdio_continue.range.start + uffdio_continue.range.len <=
+	    uffdio_continue.range.start) {
+		goto out;
+	}
+	if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+		goto out;
+
+	if (mmget_not_zero(ctx->mm)) {
+		ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
+				     uffdio_continue.range.len,
+				     &ctx->mmap_changing);
+		mmput(ctx->mm);
+	} else {
+		return -ESRCH;
+	}
+
+	if (unlikely(put_user(ret, &user_uffdio_continue->mapped)))
+		return -EFAULT;
+	if (ret < 0)
+		goto out;
+
+	/* len == 0 would wake all */
+	BUG_ON(!ret);
+	range.len = ret;
+	if (!(uffdio_continue.mode & UFFDIO_CONTINUE_MODE_DONTWAKE)) {
+		range.start = uffdio_continue.range.start;
+		wake_userfault(ctx, &range);
+	}
+	ret = range.len == uffdio_continue.range.len ? 0 : -EAGAIN;
+
+out:
+	return ret;
+}
+
 static inline unsigned int uffd_ctx_features(__u64 user_features)
 {
 	/*
@@ -1961,6 +2028,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
 	case UFFDIO_WRITEPROTECT:
 		ret = userfaultfd_writeprotect(ctx, arg);
 		break;
+	case UFFDIO_CONTINUE:
+		ret = userfaultfd_continue(ctx, arg);
+		break;
 	}
 	return ret;
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index aa9e1d6de831..3d01d228fc78 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 struct ctl_table;
 struct user_struct;
@@ -139,6 +140,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				struct vm_area_struct *dst_vma,
 				unsigned long dst_addr,
 				unsigned long src_addr,
+				enum mcopy_atomic_mode mode,
 				struct page **pagep);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
@@ -317,6 +319,7 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 						struct vm_area_struct *dst_vma,
 						unsigned long dst_addr,
 						unsigned long src_addr,
+						enum mcopy_atomic_mode mode,
 						struct page **pagep)
 {
 	BUG();
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 98cb6260b4b4..5d87f2883783 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -44,6 +44,22 @@ extern int sysctl_unprivileged_userfaultfd;
 extern vm_fault_t handle_userfault(struct vm_fault *vmf,
 				   enum uffd_trigger_reason reason);
 
+/*
+ * The mode of operation for __mcopy_atomic and its helpers.
+ *
+ * This is almost an implementation detail (mcopy_atomic below doesn't take this
+ * as a parameter), but it's exposed here because memory-kind-specific
+ * implementations (e.g. hugetlbfs) need to know the mode of operation.
+ */
+enum mcopy_atomic_mode {
+	/* A normal copy_from_user into the destination range. */
+	MCOPY_ATOMIC_NORMAL,
+	/* Don't copy; map the destination range to the zero page. */
+	MCOPY_ATOMIC_ZEROPAGE,
+	/* Just install pte(s) with the existing page(s) in the page cache. */
+	MCOPY_ATOMIC_CONTINUE,
+};
+
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 			    unsigned long src_start, unsigned long len,
 			    bool *mmap_changing, __u64 mode);
@@ -51,6 +67,8 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
 			      unsigned long dst_start,
 			      unsigned long len,
 			      bool *mmap_changing);
+extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
+			      unsigned long len, bool *mmap_changing);
 extern int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 			       unsigned long len, bool enable_wp,
 			       bool *mmap_changing);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 6b038d56bca7..10e98cb29352 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -37,10 +37,12 @@
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
 	 (__u64)1 << _UFFDIO_ZEROPAGE |		\
-	 (__u64)1 << _UFFDIO_WRITEPROTECT)
+	 (__u64)1 << _UFFDIO_WRITEPROTECT |	\
+	 (__u64)1 << _UFFDIO_CONTINUE)
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
-	 (__u64)1 << _UFFDIO_COPY)
+	 (__u64)1 << _UFFDIO_COPY |		\
+	 (__u64)1 << _UFFDIO_CONTINUE)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -56,6 +58,7 @@
 #define _UFFDIO_COPY			(0x03)
 #define _UFFDIO_ZEROPAGE		(0x04)
 #define _UFFDIO_WRITEPROTECT		(0x06)
+#define _UFFDIO_CONTINUE		(0x07)
 #define _UFFDIO_API			(0x3F)
 
 /* userfaultfd ioctl ids */
@@ -74,6 +77,8 @@
 				      struct uffdio_zeropage)
 #define UFFDIO_WRITEPROTECT	_IOWR(UFFDIO, _UFFDIO_WRITEPROTECT, \
 				      struct uffdio_writeprotect)
+#define UFFDIO_CONTINUE		_IOR(UFFDIO, _UFFDIO_CONTINUE,	\
+				     struct uffdio_continue)
 
 /* read() structure */
 struct uffd_msg {
@@ -264,6 +269,18 @@ struct uffdio_writeprotect {
 	__u64 mode;
 };
 
+struct uffdio_continue {
+	struct uffdio_range range;
+#define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
+	__u64 mode;
+
+	/*
+	 * Fields below here are written by the ioctl and must be at the end:
+	 * the copy_from_user will not read past here.
+	 */
+	__s64 mapped;
+};
+
 /*
  * Flags for the userfaultfd(2) system call itself.
  */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 37b9ff7c2d04..adca0d99b954 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -39,7 +39,6 @@
 #include
 #include
 #include
-#include
 #include
 
 #include "internal.h"
@@ -4648,8 +4647,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    struct vm_area_struct *dst_vma,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
+			    enum mcopy_atomic_mode mode,
 			    struct page **pagep)
 {
+	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct address_space *mapping;
 	pgoff_t idx;
 	unsigned long size;
@@ -4659,8 +4660,18 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	spinlock_t *ptl;
 	int ret;
 	struct page *page;
+	int writable;
 
-	if (!*pagep) {
+	mapping = dst_vma->vm_file->f_mapping;
+	idx = vma_hugecache_offset(h, dst_vma, dst_addr);
+
+	if (is_continue) {
+		ret = -EFAULT;
+		page = find_lock_page(mapping, idx);
+		*pagep = NULL;
+		if (!page)
+			goto out;
+	} else if (!*pagep) {
 		ret = -ENOMEM;
 		page = alloc_huge_page(dst_vma, dst_addr, 0);
 		if (IS_ERR(page))
@@ -4689,13 +4700,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 */
 	__SetPageUptodate(page);
 
-	mapping = dst_vma->vm_file->f_mapping;
-	idx = vma_hugecache_offset(h, dst_vma, dst_addr);
-
-	/*
-	 * If shared, add to page cache
-	 */
-	if (vm_shared) {
+	/* Add shared, newly allocated pages to the page cache. */
+	if (vm_shared && !is_continue) {
 		size = i_size_read(mapping->host) >> huge_page_shift(h);
 		ret = -EFAULT;
 		if (idx >= size)
@@ -4740,8 +4746,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
 	}
 
-	_dst_pte = make_huge_pte(dst_vma, page, dst_vma->vm_flags & VM_WRITE);
-	if (dst_vma->vm_flags & VM_WRITE)
+	/* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
+	if (is_continue && !vm_shared)
+		writable = 0;
+	else
+		writable = dst_vma->vm_flags & VM_WRITE;
+
+	_dst_pte = make_huge_pte(dst_vma, page, writable);
+	if (writable)
 		_dst_pte = huge_pte_mkdirty(_dst_pte);
 	_dst_pte = pte_mkyoung(_dst_pte);
 
@@ -4755,15 +4767,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	update_mmu_cache(dst_vma, dst_addr, dst_pte);
 
 	spin_unlock(ptl);
-	SetHPageMigratable(page);
-	if (vm_shared)
+	if (!is_continue)
+		SetHPageMigratable(page);
+	if (vm_shared || is_continue)
 		unlock_page(page);
 	ret = 0;
 out:
 	return ret;
 out_release_unlock:
 	spin_unlock(ptl);
-	if (vm_shared)
+	if (vm_shared || is_continue)
 		unlock_page(page);
 out_release_nounlock:
 	put_page(page);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index b2ce61c1b50d..ce6cb4760d2c 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -207,7 +207,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      bool zeropage)
+					      enum mcopy_atomic_mode mode)
 {
 	int vm_alloc_shared = dst_vma->vm_flags & VM_SHARED;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
@@ -227,7 +227,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	 * by THP. Since we can not reliably insert a zero page, this
 	 * feature is not supported.
 	 */
-	if (zeropage) {
+	if (mode == MCOPY_ATOMIC_ZEROPAGE) {
 		mmap_read_unlock(dst_mm);
 		return -EINVAL;
 	}
@@ -273,8 +273,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}
 
 	while (src_addr < src_start + len) {
-		pte_t dst_pteval;
-
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
@@ -297,16 +295,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 			goto out_unlock;
 		}
 
-		err = -EEXIST;
-		dst_pteval = huge_ptep_get(dst_pte);
-		if (!huge_pte_none(dst_pteval)) {
+		if (mode != MCOPY_ATOMIC_CONTINUE &&
+		    !huge_pte_none(huge_ptep_get(dst_pte))) {
+			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}
 
 		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
-					       dst_addr, src_addr, &page);
+					       dst_addr, src_addr, mode, &page);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -408,7 +406,7 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				      unsigned long dst_start,
 				      unsigned long src_start,
 				      unsigned long len,
-				      bool zeropage);
+				      enum mcopy_atomic_mode mode);
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -458,7 +456,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      bool zeropage,
+					      enum mcopy_atomic_mode mcopy_mode,
 					      bool *mmap_changing,
 					      __u64 mode)
 {
@@ -469,6 +467,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	long copied;
 	struct page *page;
 	bool wp_copy;
+	bool zeropage = (mcopy_mode == MCOPY_ATOMIC_ZEROPAGE);
 
 	/*
 	 * Sanitize the command parameters:
@@ -527,10 +526,12 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	 */
 	if (is_vm_hugetlb_page(dst_vma))
 		return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
-						src_start, len, zeropage);
+						src_start, len, mcopy_mode);
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;
+	if (mcopy_mode == MCOPY_ATOMIC_CONTINUE)
+		goto out_unlock;
 
 	/*
 	 * Ensure the dst_vma has a anon_vma or this page
@@ -626,14 +627,22 @@ ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 		     unsigned long src_start, unsigned long len,
 		     bool *mmap_changing, __u64 mode)
 {
-	return __mcopy_atomic(dst_mm, dst_start, src_start, len, false,
-			      mmap_changing, mode);
+	return __mcopy_atomic(dst_mm, dst_start, src_start, len,
+			      MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
 }
 
 ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
 		       unsigned long len, bool *mmap_changing)
 {
-	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
+	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
+			      mmap_changing, 0);
+}
+
+ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
+		       unsigned long len, bool *mmap_changing)
+{
+	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
+			      mmap_changing, 0);
 }
 
 int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
-- 
2.30.0.478.g8a0d178c01-goog