Date: Wed, 18 Jan 2023 10:40:09 +0100
From: Michal Hocko
To: Jann Horn
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, michel@lespinasse.org,
    jglisse@google.com, vbabka@suse.cz, hannes@cmpxchg.org,
    mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org,
    liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com,
    laurent.dufour@fr.ibm.com, paulmck@kernel.org, luto@kernel.org,
    songliubraving@fb.com, peterx@redhat.com, david@redhat.com,
    dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de,
    kent.overstreet@linux.dev, punit.agrawal@bytedance.com,
    lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com,
    axelrasmussen@google.com, joelaf@google.com, minchan@google.com,
    shakeelb@google.com, tatashin@google.com, edumazet@google.com,
    gthelen@google.com, gurua@google.com, arjunroy@google.com,
    soheil@google.com, hughlynch@google.com, leewalsh@google.com,
    posk@google.com, linux-mm@kvack.org,
    linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
    x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 18/41] mm/khugepaged: write-lock VMA while collapsing a huge page
References: <20230109205336.3665937-1-surenb@google.com>
    <20230109205336.3665937-19-surenb@google.com>

On Tue 17-01-23 21:28:06, Jann Horn wrote:
> On Tue, Jan 17, 2023 at 4:25 PM Michal Hocko wrote:
> > On Mon 09-01-23 12:53:13, Suren Baghdasaryan wrote:
> > > Protect VMA from concurrent page fault handler while collapsing a huge
> > > page. Page fault handler needs a stable PMD to use PTL and relies on
> > > per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(),
> > > set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will
> > > not be detected by a page fault handler without proper locking.
> >
> > I am struggling with this changelog. Maybe because my recollection of
> > the THP collapsing subtleties is weak. But aren't you just trying to say
> > that the current #PF handling and THP collapsing need to be mutually
> > exclusive currently so in order to keep that assumption you have mark
> > the vma write locked?
> >
> > Also it is not really clear to me how that handles other vmas which can
> > share the same thp?
>
> It's not about the hugepage itself, it's about how the THP collapse
> operation frees page tables.
>
> Before this series, page tables can be walked under any one of the
> mmap lock, the mapping lock, and the anon_vma lock; so when khugepaged
> unlinks and frees page tables, it must ensure that all of those either
> are locked or don't exist. This series adds a fourth lock under which
> page tables can be traversed, and so khugepaged must also lock out that one.
>
> There is a codepath in khugepaged that iterates through all mappings
> of a file to zap page tables (retract_page_tables()), which locks each
> visited mm with mmap_write_trylock() and now also does
> vma_write_lock().

OK, I see. This would be a great addendum to the changelog.

> I think one aspect of this patch that might cause trouble later on, if
> support for non-anonymous VMAs is added, is that retract_page_tables()
> now does vma_write_lock() while holding the mapping lock; the page
> fault handling path would probably take the locks the other way
> around, leading to a deadlock? So the vma_write_lock() in
> retract_page_tables() might have to become a trylock later on.

This, right?

#PF                     retract_page_tables
vma_read_lock
                        i_mmap_lock_write
i_mmap_lock_read
                        vma_write_lock

I might be missing something but I have only found huge_pmd_share to be
called from the #PF path. That one should be safe as it cannot be a
target for THP. Not that it would matter much because such a dependency
chain would be really subtle.
-- 
Michal Hocko
SUSE Labs
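
For illustration only, the lock-ordering concern discussed above (one path
takes the VMA lock and then the mapping lock, the other takes them in the
opposite order) and the suggested trylock fix can be sketched in userspace.
This is not kernel code: the two threading.Lock objects and both function
names are hypothetical stand-ins, plain mutexes replace the kernel's
reader/writer locks, and Python is used only to keep the sketch runnable.

```python
import threading

# Hypothetical stand-ins for the two locks in the discussion. These are
# plain userspace mutexes, not the kernel's rwsem or per-VMA lock.
mapping_lock = threading.Lock()   # plays the role of the mapping lock
vma_lock = threading.Lock()       # plays the role of the per-VMA lock


def fault_path():
    """Model of the fault side: VMA lock first, then the mapping lock."""
    with vma_lock:
        with mapping_lock:
            return "fault handled"


def retract_with_trylock():
    """Model of the retract side: mapping lock first, then *try* the VMA lock.

    Returns True if the VMA was processed and False if it was skipped
    because the VMA lock was busy. This side never blocks while holding
    the mapping lock, so the two paths cannot wait on each other in a cycle.
    """
    with mapping_lock:
        if not vma_lock.acquire(blocking=False):
            return False          # contended: skip this VMA, do not wait
        try:
            return True           # page tables would be zapped here
        finally:
            vma_lock.release()
```

With the VMA lock free, retract_with_trylock() processes the VMA; with it
held elsewhere, the function backs off and the caller can revisit the VMA
later. That is the same pattern the quoted text describes for
mmap_write_trylock() in retract_page_tables().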