Date: Thu, 3 Nov 2022 14:11:31 -0400
From: Peter Xu
To: Mike Kravetz
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, James Houghton, Miaohe Lin, David Hildenbrand, Muchun Song, Andrea Arcangeli, Nadav Amit, Rik van Riel
Subject: Re: [PATCH RFC 02/10] mm/hugetlb: Comment huge_pte_offset() for its locking requirements

On Thu, Nov 03, 2022 at 08:42:01AM -0700, Mike Kravetz wrote:
> On 10/30/22 17:29, Peter Xu wrote:
> > huge_pte_offset() is potentially a pgtable walker, looking up pte_t* for a
> > hugetlb address.
> >
> > Normally, it's always safe to walk the pgtable as long as we're with the
> > mmap lock held for either read or write, because that guarantees the
> > pgtable pages will always be valid during the process.
> >
> > But it's not true for hugetlbfs: hugetlbfs has the pmd sharing feature, it
> > means that even with mmap lock held, the PUD pgtable page can still go away
> > from under us if pmd unsharing is possible during the walk.
> >
> > It's not always the case, e.g.:
> >
> > (1) If the mapping is private we're not prone to pmd sharing or
> >     unsharing, so it's okay.
> >
> > (2) If we're with the hugetlb vma lock held for either read/write, it's
> >     okay too because pmd unshare cannot happen at all.
> >
> > Document all these explicitly for huge_pte_offset(), because it's really
> > not that obvious.  This also tells all the callers on what it needs to
> > guarantee huge_pte_offset() thread-safety.
> >
> > Signed-off-by: Peter Xu
> > ---
> >  arch/arm64/mm/hugetlbpage.c | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > index 35e9a468d13e..0bf930c75d4b 100644
> > --- a/arch/arm64/mm/hugetlbpage.c
> > +++ b/arch/arm64/mm/hugetlbpage.c
> > @@ -329,6 +329,38 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
> >  	return ptep;
> >  }
> >
> > +/*
> > + * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
> > + * Returns the pte_t* if found, or NULL if the address is not mapped.
> > + *
> > + * NOTE: since this function will walk all the pgtable pages (including not
> > + * only high-level pgtable page, but also PUD that can be unshared
> > + * concurrently for VM_SHARED), the caller of this function should be
> > + * responsible of its thread safety.  One can follow this rule:
> > + *
> > + *  (1) For private mappings: pmd unsharing is not possible, so it'll
> > + *      always be safe if we're with the mmap sem for either read or write.
> > + *      This is normally always the case, so IOW we don't need to do
> > + *      anything special.
>
> Not sure if it is worth calling out that we are safe if the process owning the
> page table being walked is single threaded?  Although, a pmd can be 'unshared'
> due to an operation in another process, the primary is when the pmd is cleared
> which only happens when the unshare is initiated by a thread of the process
> owning the page tables being walked.

Even if the process is single threaded, a pmd unshare can still be triggered
by threads of other processes, am I right?
Looking at the huge_pmd_unshare() callers, the major ones that don't need the
current mm context are:

  - __unmap_hugepage_range() (e.g. a hole punch on the file from another
    process?)
  - try_to_unmap_one()
  - try_to_migrate_one()

So, for example, even for a single-threaded process, if its pmd is shared with
another process, the other process can (1) punch a hole in the pmd-shared
region, then (2) munmap() the pmd-shared region, and then it seems the
single-threaded process can still be at risk of accessing a freed pgtable.

Thanks,

-- 
Peter Xu