Received: by 2002:a05:6a10:413:0:0:0:0 with SMTP id 19csp2285554pxp; Mon, 21 Mar 2022 15:54:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyA8beb1MufCUbgYJdsyoDrz8CR5bUcn8qfVNoeYUPjipv+0pkSh7q+EVB08U5UwpEkl2T6 X-Received: by 2002:a05:6a00:d50:b0:4f6:d0a2:fe1b with SMTP id n16-20020a056a000d5000b004f6d0a2fe1bmr26106240pfv.84.1647903269543; Mon, 21 Mar 2022 15:54:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1647903269; cv=none; d=google.com; s=arc-20160816; b=tN+S9vXpy878gbLtFZ7XsOh4Si6l2fnai9Ekz1B5BW6DC6tCknUP4k7ZJYKL+V0F/V lcZ0EBZvX1EnQpmAIpqQUGQbBRq5DSjcuLiT3tXJZ4uQOwcD1ohNiwMlIabJrGMlnYjV bN4O+TA8ZT2kghrWaFDPktAAI6lfvhqAdeQplsFUznfJg08KRQfcc01fl02GYnW3k/CO LdUHpqpJmJ2GjZvxqHbB6TjunGbQnv8bG1mgGiHbUzrgMJFlnpNC0zqj73KFt5Wzsc2X Ly6Z5dyk8WHIgyaCvz3qnlwi0VdplpyDuwN1giBeQ+mHNcekrR5q4ovS6QuBHUk8vA3t NQYA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=VCo3yGjveon5Bx3px7mkMlJvpiiIm4Y9jjFxD1P5TFE=; b=stkV2mX8Po8FxW6tC9H5pzkF7G2OuNP32kZWNGGq7gIjCTlk7SAhAJRVBbjuxPdMO0 b05Qs0lDN4FpQ1AB0wm7+P0Zw2kI8Npf2yNIN/C5uVPMjHAYQHBgyx5bmWh8GrlTFXlH k/qZCBsdmslNAOnqSdk8LJtPwLojAggPJ9vQFL8lKGPXX5d5MNDdUuKyjKDQ1a6Rc9wl MmYc/i5q2S6+tJtg1iErOF17hZjQ0tptwEM2ORwEnIYa7lqI7f9zPyjRCkGIO08pKKzx 48LjAWv/5aFBGL9VZxCFuLeolbwux6c87lIGlRB4vdQw2CogTckGE/WZFgD9Q1kiV0oG G7Lg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id n12-20020a654ccc000000b003816043f159si16385811pgt.846.2022.03.21.15.54.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Mar 2022 15:54:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 89258352E04; Mon, 21 Mar 2022 15:04:19 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351861AbiCUR52 (ORCPT + 99 others); Mon, 21 Mar 2022 13:57:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56678 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351886AbiCUR50 (ORCPT ); Mon, 21 Mar 2022 13:57:26 -0400 Received: from zeniv-ca.linux.org.uk (zeniv-ca.linux.org.uk [IPv6:2607:5300:60:148a::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A76FC49907 for ; Mon, 21 Mar 2022 10:55:58 -0700 (PDT) Received: from viro by zeniv-ca.linux.org.uk with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nWMG1-00EuGN-5b; Mon, 21 Mar 2022 17:55:53 +0000 Date: Mon, 21 Mar 2022 17:55:53 +0000 From: Al Viro To: Tejun Heo Cc: Imran Khan , gregkh@linuxfoundation.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org Subject: Re: [RESEND PATCH v7 7/8] kernfs: Replace per-fs rwsem with hashed rwsems. Message-ID: References: <20220317072612.163143-1-imran.f.khan@oracle.com> <20220317072612.163143-8-imran.f.khan@oracle.com> <536f2392-45d2-2f43-5e9d-01ef50e33126@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: Al Viro X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RDNS_NONE, SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 21, 2022 at 06:46:53AM -1000, Tejun Heo wrote: > On Mon, Mar 21, 2022 at 07:29:45AM +0000, Al Viro wrote: > ... > > stabilizing the tree topology. Turn it into rwlock if you wish, > > with that thing being a reader and existing users - writers. > > And don't bother with further scaling, until and unless you see a real > > contention on it. > > Given how rare these renames are, in the (unlikely) case the rename rwsem > becomes a problem, we should probably just switch it to a percpu_rwsem. Why bother with rwsem, when we don't need anything blocking under it? DEFINE_RWLOCK instead of DEFINE_SPINLOCK and don't make it static. Again, we already have a spinlock protecting ->parent and ->name. Existing users: kernfs_name() - can be shared. kernfs_path_from_node() - can be shared. pr_cont_kernfs_name() - exclusive, since that thing works into a static buffer. pr_cont_kernfs_path() - exclusive, same reasons. kernfs_get_parent() - can be shared, but its callers need to be reviewed; that's the prime breeding ground for rename races. kernfs_walk_ns() - this is fucking insane; on the surface, it needs to be exclusive due to the use of the same static buffer. It uses that buffer to generate a pathname, *THEN* walks over it with strsep(). That's an... interesting approach, for the lack of other printable terms - we walk the chain of ancestors, concatenating their names into a buffer and separating those names with slashes, then we walk that buffer, searching for slashes... WTF? kernfs_rename_ns() - exclusive; that's where the tree topology gets changed. So we can just turn that spinlock into rwlock, replace the existing uses with read_lock()/read_unlock() in kernfs_{name,path_from_node,get_parent} and with write_lock()/write_unlock() in the rest of fs/kernfs/dir.c, make it non-static, put extern into kernfs-internal.h and there you go... Wait a sec; what happens if e.g. kernfs_path_from_node() races with __kernfs_remove()? We do _not_ clear ->parent, but we do drop references that used to pin what it used to point to, unless I'm misreading that code... Or is it somehow prevented by drain-related logics? Seeing that it seems to be possible to have kernfs_path_from_node() called from an interrupt context, that could be delicate...