From: riel@surriel.com
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org,
    muchun.song@linux.dev, mike.kravetz@oracle.com, leit@meta.com,
    Rik van Riel
Subject: [PATCH 1/2] hugetlbfs: extend hugetlb_vma_lock to private VMAs
Date: Tue, 19 Sep 2023 22:16:09 -0400
Message-ID: <20230920021811.3095089-2-riel@surriel.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230920021811.3095089-1-riel@surriel.com>
References: <20230920021811.3095089-1-riel@surriel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Rik van Riel

Extend the locking scheme used to protect shared hugetlb mappings
from truncate vs page fault races, in order to protect private
hugetlb mappings (with resv_map) against MADV_DONTNEED.

Add a read-write semaphore to the resv_map data structure, and
use that from the hugetlb_vma_(un)lock_* functions, in preparation
for closing the race between MADV_DONTNEED and page faults.

Signed-off-by: Rik van Riel
---
 include/linux/hugetlb.h |  6 ++++++
 mm/hugetlb.c            | 36 ++++++++++++++++++++++++++++++++----
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5b2626063f4f..694928fa06a3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -60,6 +60,7 @@ struct resv_map {
 	long adds_in_progress;
 	struct list_head region_cache;
 	long region_cache_count;
+	struct rw_semaphore rw_sema;
 #ifdef CONFIG_CGROUP_HUGETLB
 	/*
 	 * On private mappings, the counter to uncharge reservations is stored
@@ -1231,6 +1232,11 @@ static inline bool __vma_shareable_lock(struct vm_area_struct *vma)
 	return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
 }
 
+static inline bool __vma_private_lock(struct vm_area_struct *vma)
+{
+	return (!(vma->vm_flags & VM_MAYSHARE)) && vma->vm_private_data;
+}
+
 /*
  * Safe version of huge_pte_offset() to check the locks.  See comments
  * above huge_pte_offset().
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ba6d39b71cb1..b99d215d2939 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -97,6 +97,7 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
 static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end);
+static struct resv_map *vma_resv_map(struct vm_area_struct *vma);
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
@@ -267,6 +268,10 @@ void hugetlb_vma_lock_read(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		down_read(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		down_read(&resv_map->rw_sema);
 	}
 }
 
@@ -276,6 +281,10 @@ void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		up_read(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		up_read(&resv_map->rw_sema);
 	}
 }
 
@@ -285,6 +294,10 @@ void hugetlb_vma_lock_write(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		down_write(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		down_write(&resv_map->rw_sema);
 	}
 }
 
@@ -294,17 +307,27 @@ void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		up_write(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		up_write(&resv_map->rw_sema);
 	}
 }
 
 int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
 {
-	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
-	if (!__vma_shareable_lock(vma))
-		return 1;
+	if (__vma_shareable_lock(vma)) {
+		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+		return down_write_trylock(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		return down_write_trylock(&resv_map->rw_sema);
+	}
 
-	return down_write_trylock(&vma_lock->rw_sema);
+	return 1;
 }
 
 void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
@@ -313,6 +336,10 @@ void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		lockdep_assert_held(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		lockdep_assert_held(&resv_map->rw_sema);
 	}
 }
 
@@ -1068,6 +1095,7 @@ struct resv_map *resv_map_alloc(void)
 	kref_init(&resv_map->refs);
 	spin_lock_init(&resv_map->lock);
 	INIT_LIST_HEAD(&resv_map->regions);
+	init_rwsem(&resv_map->rw_sema);
 
 	resv_map->adds_in_progress = 0;
 	/*
-- 
2.41.0