Date: Wed, 23 Mar 2022 14:21:07 -0700
From: Jaegeuk Kim
To: Linus Torvalds
Cc: Tim Murray, Waiman Long, Linux Kernel Mailing List,
	Linux F2FS Dev Mailing List
Subject: Re: [GIT PULL] f2fs for 5.18

On 03/23, Linus Torvalds wrote:
> On Wed, Mar 23, 2022 at 9:26 AM Jaegeuk Kim wrote:
> >
> > OTOH, I was suspecting the major contention would be
> >
> >   f2fs_lock_op -> f2fs_down_read(&sbi->cp_rwsem);
> >
> > which is used for most filesystem operations.
>
> Very possible, I was just looking at a random one in f2fs/file.c
> obviously with no actual numbers in hand.
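For reference, the conversion under discussion wraps every f2fs rwsem in
a small structure and turns the read-side acquisition into a
wait_event() on down_read_trylock(). Roughly, and simplified from the
actual patch:

	struct f2fs_rwsem {
		struct rw_semaphore internal_rwsem;
		wait_queue_head_t read_waiters;
	};

	static inline void f2fs_down_read(struct f2fs_rwsem *sem)
	{
		/*
		 * Sleep until the trylock succeeds, bypassing the
		 * rwsem's reader handoff / optimistic spinning.
		 */
		wait_event(sem->read_waiters,
			   down_read_trylock(&sem->internal_rwsem));
	}

	static inline void f2fs_up_read(struct f2fs_rwsem *sem)
	{
		up_read(&sem->internal_rwsem);
	}

	static inline void f2fs_down_write(struct f2fs_rwsem *sem)
	{
		down_write(&sem->internal_rwsem);
	}

	static inline void f2fs_up_write(struct f2fs_rwsem *sem)
	{
		up_write(&sem->internal_rwsem);
		/* a releasing writer wakes all readers parked above */
		wake_up_all(&sem->read_waiters);
	}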
>
> In general, I really hate seeing specialized locks, but this f2fs use
> case is in some ways worse than other ad-hoc locks I've seen - simply
> because it's been one wholesale conversion of "down_read/write()" to
> "f2fs_down_read/write()" - regardless of _which_ lock is being locked.
>
> (Now, it's not all bad news - in other respects it's much better than
> some ad-hoc locking: at least you still participate in lockdep, and
> you use the actual low-level locking primitives instead of making up
> your own and getting memory ordering wrong.)
>
> But basically I think it would have been much nicer if you had done
> this for just the _one_ lock that mattered, whichever lock that might
> be. Partly as documentation, and partly so that maybe some day you
> can split that lock up (or maybe notice cases where you can avoid it
> entirely).
>
> For example, if it's really just f2fs_lock_op() that needs this, the
> special "wait_event(trylock)" hack could have been entirely local to
> just *that*, rather than affecting all the other locks too.
>
> And in the very first f2fs_lock_op() case I looked at, I see that the
> lock is pointless. Again, that's unlikely to be the *cause* of any of
> these problems, but the fact that I've now looked at two of the f2fs
> locks and gone "the locking seems to be pointlessly badly done" does
> imply that the problem isn't "down_read()", it's the use.
>
> The other lock I reacted to was the f2fs_lock_op(sbi) at the top of
> f2fs_new_inode().
>
> Look, you have a new inode that you just allocated, that nobody else
> can yet access.
>
> And the only thing that that f2fs_lock_op(sbi) -> f2fs_unlock_op(sbi)
> sequence protects is the f2fs_alloc_nid() for that new inode.
>
> Ok, so maybe f2fs_alloc_nid() needs that lock?
>
> No it doesn't. It already has
>
>  - &nm_i->nid_list_lock, a spinlock for its own in-memory internal
>    NID caches
>
> *and* when that fails
>
>  - &NM_I(sbi)->build_lock, protecting all of f2fs_build_free_nids()
>
> *and* inside of that lock
>
>  - f2fs_down_read(&nm_i->nat_tree_lock), protecting the NAT tree
>    structures.
>
> So I see two major issues in the very first user of that
> f2fs_lock_op() that I look at:
>
>  (a) it seems to be entirely unnecessary

Actually, when I took a look at the above path, f2fs_lock_op() in
f2fs_new_inode() may indeed be unnecessary, in line with your points.
Worse, it may even hurt performance, since we take f2fs_lock_op() twice
before dealing with dentries in paths like f2fs_add_link(). Let me test
whether removing f2fs_lock_op() from f2fs_new_inode() causes any
regression.

>  (b) it is a classic case of "multiple nested locks".
>
> Now, it's possible that I'm wrong on (a) and there's some odd reason
> that lock is needed (maybe there is a lock ordering problem for one
> of the other locks between readers and writers, and the op-lock acts
> as a mutual exclusion for that).
>
> But (b) really is a classic problem case for locking: nested locks
> are *much* more likely to cause horrible contention, because now any
> contention in any of the locks will end up affecting the others (and
> you easily get "bunching up" of different processes when they get
> synchronized with each other thanks to the inner lock).
>
> Nested locking is often required, but it's one of those things where
> you just need to be aware that it can be horribly bad for
> performance, _particularly_ if an inner lock sees contention and
> essentially "transfers" that contention to an outer lock.
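Agreed. To make sure we're looking at the same chain, the nesting on
this path is roughly the following (a simplified sketch of the call
path, not the exact code):

	f2fs_new_inode()
	  f2fs_lock_op(sbi)                    /* f2fs_down_read(&sbi->cp_rwsem) */
	    f2fs_alloc_nid()
	      spin_lock(&nm_i->nid_list_lock)  /* in-memory free NID cache */
	      ...
	      /* and when the cache is empty: */
	      mutex_lock(&NM_I(sbi)->build_lock)        /* f2fs_build_free_nids() */
	        f2fs_down_read(&nm_i->nat_tree_lock)    /* NAT tree lookup */

So any stall on the inner nat_tree_lock extends how long cp_rwsem is
held, which is exactly the "transfer" you describe.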
>
> Maybe I've been unlucky. Maybe the two cases I happened to look at
> were just completely harmless, and very unusual. But the fact that
> I'm two-for-two and go "that locking looks like a prime candidate to
> be fixed" makes me suspect there's a lot of low-hanging fruit in
> there.

Thank you so much for taking the time to write this great advice. Let
me dig into whether there are any cases where I can relax the locking
further. (TBH, I haven't reviewed these paths for a long time, since
I've mostly been focusing on stability issues.)

> And that whole "wait_event(trylock)" thing is a symptom of
> problematic f2fs locking, rather than a solution to it.

Understood. If I can avoid the lock contention up front, there would be
no need for the rwsem change at all. Let me take some time to think
about how to move forward.

>
> Linus