References: <20210217204418.54259-1-peterx@redhat.com> <20210217204619.54761-1-peterx@redhat.com> <20210217204619.54761-3-peterx@redhat.com>
In-Reply-To: <20210217204619.54761-3-peterx@redhat.com>
From: Axel Rasmussen
Date: Thu, 18 Feb 2021 10:32:00 -0800
Subject: Re: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
To: Peter Xu
Cc: Linux MM, LKML, Mike Kravetz, Mike Rapoport, Andrea Arcangeli, Matthew Wilcox, "Kirill A . Shutemov", Andrew Morton
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 17, 2021 at 12:46 PM Peter Xu wrote:
>
> Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
> userfaultfd-wp is always based on pgtable entries, so they cannot be shared.
>
> Walk the hugetlb range and unshare all such mappings if there is, right before
> UFFDIO_REGISTER will succeed and return to userspace.
>
> This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing
> is completely disabled for userfaultfd-wp registered range.
>
> Signed-off-by: Peter Xu
> ---
>  fs/userfaultfd.c        |  4 ++++
>  include/linux/hugetlb.h |  1 +
>  mm/hugetlb.c            | 51 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 56 insertions(+)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 894cc28142e7..e259318fcae1 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -15,6 +15,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  		vma->vm_flags = new_flags;
>  		vma->vm_userfaultfd_ctx.ctx = ctx;
>
> +		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
> +			hugetlb_unshare_all_pmds(vma);

This line yields the following error, if building with:

# CONFIG_CMA is not set

./fs/userfaultfd.c:1459: undefined reference to `hugetlb_unshare_all_pmds'

> +
>  	skip:
>  		prev = vma;
>  		start = vma->vm_end;
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 3b4104021dd3..97ecfd4c20b2 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>  		unsigned long address, unsigned long end, pgprot_t newprot);
>
>  bool is_hugetlb_entry_migration(pte_t pte);
> +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
>
>  #else /* !CONFIG_HUGETLB_PAGE */
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f53a0b852ed8..83c006ea3ff9 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5723,4 +5723,55 @@ void __init hugetlb_cma_check(void)
>  	pr_warn("hugetlb_cma: the option isn't supported by current arch\n");
>  }
>
> +/*
> + * This function will unconditionally remove all the shared pmd pgtable entries
> + * within the specific vma for a hugetlbfs memory range.
> + */
> +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
> +{
> +	struct hstate *h = hstate_vma(vma);
> +	unsigned long sz = huge_page_size(h);
> +	struct mm_struct *mm = vma->vm_mm;
> +	struct mmu_notifier_range range;
> +	unsigned long address, start, end;
> +	spinlock_t *ptl;
> +	pte_t *ptep;
> +
> +	if (!(vma->vm_flags & VM_MAYSHARE))
> +		return;
> +
> +	start = ALIGN(vma->vm_start, PUD_SIZE);
> +	end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
> +
> +	if (start >= end)
> +		return;
> +
> +	/*
> +	 * No need to call adjust_range_if_pmd_sharing_possible(), because
> +	 * we're going to operate on the whole vma
> +	 */
> +	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
> +				vma->vm_start, vma->vm_end);
> +	mmu_notifier_invalidate_range_start(&range);
> +	i_mmap_lock_write(vma->vm_file->f_mapping);
> +	for (address = start; address < end; address += PUD_SIZE) {
> +		unsigned long tmp = address;
> +
> +		ptep = huge_pte_offset(mm, address, sz);
> +		if (!ptep)
> +			continue;
> +		ptl = huge_pte_lock(h, mm, ptep);
> +		/* We don't want 'address' to be changed */
> +		huge_pmd_unshare(mm, vma, &tmp, ptep);
> +		spin_unlock(ptl);
> +	}
> +	flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end);
> +	i_mmap_unlock_write(vma->vm_file->f_mapping);
> +	/*
> +	 * No need to call mmu_notifier_invalidate_range(), see
> +	 * Documentation/vm/mmu_notifier.rst.
> +	 */
> +	mmu_notifier_invalidate_range_end(&range);
> +}
> +
>  #endif /* CONFIG_CMA */
> --
> 2.26.2
>
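
For context on the link error noted above: going by the mm/hugetlb.c hunk
(it follows hugetlb_cma_check() and ends just before the
"#endif /* CONFIG_CMA */" context line), the new definition appears to land
inside the CONFIG_CMA block, while the prototype in include/linux/hugetlb.h
sits on the CONFIG_HUGETLB_PAGE side of the header. One possible fix would be
to move the definition past that #endif. A minimal standalone sketch of that
layout follows; every DEMO_/demo_ name is a hypothetical stand-in and not a
kernel symbol, and the function body is only a placeholder:

```c
/*
 * Hypothetical stand-in for CONFIG_CMA. It is deliberately left undefined
 * to mirror "# CONFIG_CMA is not set".
 */
/* #define DEMO_CONFIG_CMA 1 */

/* Prototype visible in every configuration, like a header declaration. */
int demo_unshare_all(int nr_ranges);

#ifdef DEMO_CONFIG_CMA
/* CMA-only code: compiled out when DEMO_CONFIG_CMA is not defined. */
int demo_cma_check(void)
{
	return 1;
}
#endif /* DEMO_CONFIG_CMA */

/*
 * Defined after the #endif, so the symbol exists whether or not
 * DEMO_CONFIG_CMA is set. Had this body been placed above the #endif,
 * callers would fail to link with an undefined reference.
 * Returns the number of ranges visited (placeholder for the real work).
 */
int demo_unshare_all(int nr_ranges)
{
	int visited = 0;

	for (int i = 0; i < nr_ranges; i++)
		visited++;

	return visited;
}
```

With this layout, a caller of demo_unshare_all() links in both
configurations, because only demo_cma_check() depends on the option.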