From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz, Nadav Amit, Matthew Wilcox, Mike Rapoport,
    David Hildenbrand, Hugh Dickins, Jerome Glisse, "Kirill A . Shutemov",
    Andrea Arcangeli, Andrew Morton, Axel Rasmussen, Alistair Popple,
    peterx@redhat.com
Subject: [PATCH v8 18/23] mm/hugetlb: Handle uffd-wp during fork()
Date: Mon, 4 Apr 2022 21:49:18 -0400
Message-Id: <20220405014918.14932-1-peterx@redhat.com>
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0

Firstly, we'll need to pass dst_vma into copy_hugetlb_page_range(), because
for uffd-wp it is the dst vma that matters when deciding how we should treat
uffd-wp protected ptes.

We should also recognize pte markers during fork and copy the pte if needed.

Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h |  7 +++++--
 mm/hugetlb.c            | 42 +++++++++++++++++++++++++++--------------
 mm/memory.c             |  2 +-
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ab48b3bbb0e6..6df51d23b7ee 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -137,7 +137,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 			     struct vm_area_struct *new_vma,
 			     unsigned long old_addr, unsigned long new_addr,
 			     unsigned long len);
-int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
+int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
+			    struct vm_area_struct *, struct vm_area_struct *);
 long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 struct page **, struct vm_area_struct **,
 			 unsigned long *, unsigned long *, long, unsigned int,
@@ -268,7 +269,9 @@ static inline struct page *follow_huge_addr(struct mm_struct *mm,
 }
 
 static inline int copy_hugetlb_page_range(struct mm_struct *dst,
-			struct mm_struct *src, struct vm_area_struct *vma)
+					  struct mm_struct *src,
+					  struct vm_area_struct *dst_vma,
+					  struct vm_area_struct *src_vma)
 {
 	BUG();
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e4af8b357b90..e1571179698a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4706,23 +4706,24 @@ hugetlb_install_page(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr
 }
 
 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
-			    struct vm_area_struct *vma)
+			    struct vm_area_struct *dst_vma,
+			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry, dst_entry;
 	struct page *ptepage;
 	unsigned long addr;
-	bool cow = is_cow_mapping(vma->vm_flags);
-	struct hstate *h = hstate_vma(vma);
+	bool cow = is_cow_mapping(src_vma->vm_flags);
+	struct hstate *h = hstate_vma(src_vma);
 	unsigned long sz = huge_page_size(h);
 	unsigned long npages = pages_per_huge_page(h);
-	struct address_space *mapping = vma->vm_file->f_mapping;
+	struct address_space *mapping = src_vma->vm_file->f_mapping;
 	struct mmu_notifier_range range;
 	int ret = 0;
 
 	if (cow) {
-		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, src,
-					vma->vm_start,
-					vma->vm_end);
+		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, src_vma, src,
+					src_vma->vm_start,
+					src_vma->vm_end);
 		mmu_notifier_invalidate_range_start(&range);
 		mmap_assert_write_locked(src);
 		raw_write_seqcount_begin(&src->write_protect_seq);
@@ -4736,12 +4737,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		i_mmap_lock_read(mapping);
 	}
 
-	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
+	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
 		spinlock_t *src_ptl, *dst_ptl;
 		src_pte = huge_pte_offset(src, addr, sz);
 		if (!src_pte)
 			continue;
-		dst_pte = huge_pte_alloc(dst, vma, addr, sz);
+		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
 			break;
@@ -4776,6 +4777,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		} else if (unlikely(is_hugetlb_entry_migration(entry) ||
 				    is_hugetlb_entry_hwpoisoned(entry))) {
 			swp_entry_t swp_entry = pte_to_swp_entry(entry);
+			bool uffd_wp = huge_pte_uffd_wp(entry);
 
 			if (!is_readable_migration_entry(swp_entry) && cow) {
 				/*
@@ -4785,10 +4787,21 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				swp_entry = make_readable_migration_entry(
 							swp_offset(swp_entry));
 				entry = swp_entry_to_pte(swp_entry);
+				if (userfaultfd_wp(src_vma) && uffd_wp)
+					entry = huge_pte_mkuffd_wp(entry);
 				set_huge_swap_pte_at(src, addr, src_pte,
 						     entry, sz);
 			}
+			if (!userfaultfd_wp(dst_vma) && uffd_wp)
+				entry = huge_pte_clear_uffd_wp(entry);
 			set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
+		} else if (unlikely(is_pte_marker(entry))) {
+			/*
+			 * We copy the pte marker only if the dst vma has
+			 * uffd-wp enabled.
+			 */
+			if (userfaultfd_wp(dst_vma))
+				set_huge_pte_at(dst, addr, dst_pte, entry);
 		} else {
 			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
@@ -4806,20 +4819,21 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 */
 			if (!PageAnon(ptepage)) {
 				page_dup_file_rmap(ptepage, true);
-			} else if (page_try_dup_anon_rmap(ptepage, true, vma)) {
+			} else if (page_try_dup_anon_rmap(ptepage, true,
+							  src_vma)) {
 				pte_t src_pte_old = entry;
 				struct page *new;
 
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new = alloc_huge_page(vma, addr, 1);
+				new = alloc_huge_page(dst_vma, addr, 1);
 				if (IS_ERR(new)) {
 					put_page(ptepage);
 					ret = PTR_ERR(new);
 					break;
 				}
-				copy_user_huge_page(new, ptepage, addr, vma,
+				copy_user_huge_page(new, ptepage, addr, dst_vma,
 						    npages);
 				put_page(ptepage);
 
@@ -4829,13 +4843,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 				entry = huge_ptep_get(src_pte);
 				if (!pte_same(src_pte_old, entry)) {
-					restore_reserve_on_error(h, vma, addr,
+					restore_reserve_on_error(h, dst_vma, addr,
 								new);
 					put_page(new);
 					/* dst_entry won't change as in child */
 					goto again;
 				}
-				hugetlb_install_page(vma, dst_pte, addr, new);
+				hugetlb_install_page(dst_vma, dst_pte, addr, new);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				continue;
diff --git a/mm/memory.c b/mm/memory.c
index 9808edfe18d4..d1e9c2517dfb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1278,7 +1278,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 		return 0;
 
 	if (is_vm_hugetlb_page(src_vma))
-		return copy_hugetlb_page_range(dst_mm, src_mm, src_vma);
+		return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);
 
 	if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
 		/*
-- 
2.32.0
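
For readers following the series, the fork-time rule in this patch can be
approximated in user space. The sketch below is not kernel code: the names
model_kind, model_entry, model_fork_copy() and model_self_check() are all
hypothetical and exist only for illustration. It covers just the two
non-present cases the patch touches (migration/hwpoison entries and pte
markers), and it assumes the uffd-wp bit is set on the parent entry only
when the src vma itself has uffd-wp enabled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model; these types are NOT kernel API.
 * They only mirror the two non-present cases touched by the patch.
 */
enum model_kind {
	MODEL_NONE,		/* nothing installed in the child */
	MODEL_MIGRATION,	/* migration or hwpoison swap entry */
	MODEL_MARKER,		/* pte marker */
};

struct model_entry {
	enum model_kind kind;
	bool uffd_wp;		/* uffd-wp bit; only used for MODEL_MIGRATION */
};

/*
 * Return what the child (dst) would receive for a non-present parent
 * entry, given whether the dst vma has uffd-wp enabled.
 */
struct model_entry model_fork_copy(struct model_entry src,
				   bool dst_vma_uffd_wp)
{
	const struct model_entry none = { MODEL_NONE, false };

	switch (src.kind) {
	case MODEL_MIGRATION:
		/* Always copied; the bit is cleared if dst lacks uffd-wp. */
		if (!dst_vma_uffd_wp)
			src.uffd_wp = false;
		return src;
	case MODEL_MARKER:
		/* Copied only if the dst vma has uffd-wp enabled. */
		return dst_vma_uffd_wp ? src : none;
	default:
		return none;
	}
}

/* Small self-check exercising each case of the model. */
void model_self_check(void)
{
	struct model_entry mig = { MODEL_MIGRATION, true };
	struct model_entry marker = { MODEL_MARKER, false };

	assert(model_fork_copy(mig, true).uffd_wp);
	assert(!model_fork_copy(mig, false).uffd_wp);
	assert(model_fork_copy(mig, false).kind == MODEL_MIGRATION);
	assert(model_fork_copy(marker, true).kind == MODEL_MARKER);
	assert(model_fork_copy(marker, false).kind == MODEL_NONE);
}
```

The model keys every decision on the dst vma, which is the reason the patch
threads dst_vma through copy_hugetlb_page_range() instead of reusing the
src vma.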