Shutemov" , Hugh Dickins , Axel Rasmussen , Matthew Wilcox , Andrew Morton , Andrea Arcangeli , Nadav Amit Subject: [PATCH RFC 28/30] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Date: Fri, 15 Jan 2021 12:09:05 -0500 Message-Id: <20210115170907.24498-29-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210115170907.24498-1-peterx@redhat.com> References: <20210115170907.24498-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Just like what we've done with shmem uffd-wp special ptes, we shouldn't drop uffd-wp special swap pte for hugetlb too, only if we're going to unmap the whole vma, or we're punching a hole with safe locks held. For example, remove_inode_hugepages() is safe to drop uffd-wp ptes, because it has taken hugetlb fault mutex so that no concurrent page fault would trigger. While the call to hugetlb_vmdelete_list() in hugetlbfs_punch_hole() is not safe. That's why the previous call will be with ZAP_FLAG_DROP_FILE_UFFD_WP, while the latter one won't be able to. Signed-off-by: Peter Xu --- fs/hugetlbfs/inode.c | 15 +++++++++------ include/linux/hugetlb.h | 13 ++++++++----- mm/hugetlb.c | 27 +++++++++++++++++++++------ mm/memory.c | 5 ++++- 4 files changed, 42 insertions(+), 18 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index b5c109703daa..f9ff2ba5e47b 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -399,7 +399,8 @@ static void remove_huge_page(struct page *page) } static void -hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end) +hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, + unsigned long zap_flags) { struct vm_area_struct *vma; @@ -432,7 +433,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end) } unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end, - NULL); + NULL, zap_flags); } } @@ -513,7 +514,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, mutex_lock(&hugetlb_fault_mutex_table[hash]); hugetlb_vmdelete_list(&mapping->i_mmap, index * pages_per_huge_page(h), - (index + 1) * pages_per_huge_page(h)); + (index + 1) * pages_per_huge_page(h), + ZAP_FLAG_DROP_FILE_UFFD_WP); i_mmap_unlock_write(mapping); } @@ -579,7 +581,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset) i_mmap_lock_write(mapping); i_size_write(inode, offset); if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)) - hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0); + hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0, + ZAP_FLAG_DROP_FILE_UFFD_WP); i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, offset, LLONG_MAX); return 0; @@ -613,8 +616,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) i_mmap_lock_write(mapping); if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)) hugetlb_vmdelete_list(&mapping->i_mmap, - hole_start >> PAGE_SHIFT, - hole_end >> PAGE_SHIFT); + hole_start >> PAGE_SHIFT, + hole_end >> PAGE_SHIFT, 0); i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, hole_start, hole_end); inode_unlock(inode); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 8841d118f45b..93f3c46439b2 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -121,14 +121,15 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, unsigned long *, unsigned long *, long, unsigned int, int *); void unmap_hugepage_range(struct vm_area_struct *, 
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index b5c109703daa..f9ff2ba5e47b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -399,7 +399,8 @@ static void remove_huge_page(struct page *page)
 }
 
 static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+		      unsigned long zap_flags)
 {
 	struct vm_area_struct *vma;
 
@@ -432,7 +433,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
 		}
 
 		unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
-				     NULL);
+				     NULL, zap_flags);
 	}
 }
 
@@ -513,7 +514,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_vmdelete_list(&mapping->i_mmap,
 				index * pages_per_huge_page(h),
-				(index + 1) * pages_per_huge_page(h));
+				(index + 1) * pages_per_huge_page(h),
+				ZAP_FLAG_DROP_FILE_UFFD_WP);
 			i_mmap_unlock_write(mapping);
 		}
 
@@ -579,7 +581,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
-		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+				      ZAP_FLAG_DROP_FILE_UFFD_WP);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 	return 0;
@@ -613,8 +616,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		i_mmap_lock_write(mapping);
 		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
-					hole_start >> PAGE_SHIFT,
-					hole_end  >> PAGE_SHIFT);
+					hole_start >> PAGE_SHIFT,
+					hole_end >> PAGE_SHIFT, 0);
 		i_mmap_unlock_write(mapping);
 		remove_inode_hugepages(inode, hole_start, hole_end);
 		inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8841d118f45b..93f3c46439b2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -121,14 +121,15 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 unsigned long *, unsigned long *, long, unsigned int,
 			 int *);
 void unmap_hugepage_range(struct vm_area_struct *,
-			  unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *,
+			  unsigned long);
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
-			  struct page *ref_page);
+			  struct page *ref_page, unsigned long zap_flags);
 void __unmap_hugepage_range(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
-			  struct page *ref_page);
+			  struct page *ref_page, unsigned long zap_flags);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(char *buf, int len, int nid);
 void hugetlb_show_meminfo(void);
@@ -353,14 +354,16 @@ static inline unsigned long hugetlb_change_protection(
 
 static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
 
 static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7959fb4b1633..731a26617673 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3864,7 +3864,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			    unsigned long start, unsigned long end,
-			    struct page *ref_page)
+			    struct page *ref_page, unsigned long zap_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -3916,6 +3916,19 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			continue;
 		}
 
+		if (unlikely(is_swap_special_pte(pte))) {
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+			/*
+			 * Only drop the special swap uffd-wp pte if
+			 * e.g. unmapping a vma or punching a hole (with proper
+			 * lock held so that concurrent page fault won't happen).
+			 */
+			if (zap_flags & ZAP_FLAG_DROP_FILE_UFFD_WP)
+				huge_pte_clear(mm, address, ptep, sz);
+			spin_unlock(ptl);
+			continue;
+		}
+
 		/*
 		 * Migrating hugepage or HWPoisoned hugepage is already
 		 * unmapped and its refcount is dropped, so just clear pte here.
@@ -3967,9 +3980,10 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
-	__unmap_hugepage_range(tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
 	 * Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -3985,7 +3999,8 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
 	struct mm_struct *mm;
 	struct mmu_gather tlb;
@@ -4004,7 +4019,7 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 
 	mm = vma->vm_mm;
 	tlb_gather_mmu(&tlb, mm, tlb_start, tlb_end);
-	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
 	tlb_finish_mmu(&tlb, tlb_start, tlb_end);
 }
 
@@ -4059,7 +4074,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
 			unmap_hugepage_range(iter_vma, address,
-					     address + huge_page_size(h), page);
+					     address + huge_page_size(h), page, 0);
 	}
 	i_mmap_unlock_write(mapping);
 }
diff --git a/mm/memory.c b/mm/memory.c
index 59d56f57ba2c..993ec7a7961a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1499,8 +1499,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 		 * safe to do nothing in this case.
 		 */
 		if (vma->vm_file) {
+			unsigned long zap_flags = details ?
+				details->zap_flags : 0;
 			i_mmap_lock_write(vma->vm_file->f_mapping);
-			__unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+			__unmap_hugepage_range_final(tlb, vma, start, end,
+						     NULL, zap_flags);
 			i_mmap_unlock_write(vma->vm_file->f_mapping);
 		}
 	} else
-- 
2.26.2