Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp951569rwi; Wed, 19 Oct 2022 05:07:58 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4Cent8R32RIcgHCYvlRD73VJz3m6qJSKLK/92Nvf7e2TJJ0FS343LHkCuiaTw6rBLFQtDm X-Received: by 2002:aa7:84ce:0:b0:563:1aaf:81d5 with SMTP id x14-20020aa784ce000000b005631aaf81d5mr8387040pfn.63.1666181277958; Wed, 19 Oct 2022 05:07:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666181277; cv=none; d=google.com; s=arc-20160816; b=z9DACp7DV+zSeCpdpyjGs3RWE3dfvXBU7p7qbQ2d+cSn6dgbcxvvQAP43YjlKBAcoC dCP2u+Ppa2taow86VWCjZB84AWlFoWZWZOwLR8Od9KcZ/3Ddgvry9qmgaJtb2XnxhOrB wV8Y/Qgqrox+mI4Vg1LOpUInL24Lo2HX2bB9Mye/R7OeIU35OcYeCVDlnJcNliagBZNM 3py4fYK4MyxGfqr/qfZLbUzTabEBHVgUpPmZD7KP/G0HjC5jLcY0QuAM5WcDGvqBkhbM UfjTYngFM4w/vuBn87cvDWZS9J41Kb/n09xWaYb81rM9JF7LrDAG0DH4dxkbGfpcuaIs TyCA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent :content-transfer-encoding:references:in-reply-to:date:cc:to:from :subject:message-id:dkim-signature; bh=kt4bBTVbXW81YAenwRldgzFfhVYXZdpeKJjNUxSlOeU=; b=opI9RBAXrbTvDedc82w3CSXKwS1vpHP19LbHLlo6vTA0UJVyx/Qv5avq2LXl3B7kup CSuS7ptu5b5ccZdcx+YCIVbj+j5Pzk4JC+6dylymepqo9DdkSwcrtFvwi7xgYfmjXUC4 alul2/jPlqYhtqHNzTJu5qHKfF7JNqZa5eM9Y2iPAdq5+xYbZgFqbFEhBD7QdJewCZNQ UtbZfYTtA+9PjFyUNmCtpWI4LZj9L44vR9N+Ja1io5MR0a0IT2K9JVE++YU8XW47apCn MxUqcNxPy+NhzfZxO6hXz9Kts9vpPTbiyZHZXslEvv7nFWHOkQp7sm5LMvxXck9idVkN 3xtw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@poochiereds-net.20210112.gappssmtp.com header.s=20210112 header.b=s6iUeH+w; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z6-20020a056a00240600b00562b0b92756si20250475pfh.297.2022.10.19.05.07.22; Wed, 19 Oct 2022 05:07:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@poochiereds-net.20210112.gappssmtp.com header.s=20210112 header.b=s6iUeH+w; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229658AbiJSMFV (ORCPT + 99 others); Wed, 19 Oct 2022 08:05:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37864 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231271AbiJSMEx (ORCPT ); Wed, 19 Oct 2022 08:04:53 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C55418DD47 for ; Wed, 19 Oct 2022 04:40:59 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id z8so11431416qtv.5 for ; Wed, 19 Oct 2022 04:40:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=poochiereds-net.20210112.gappssmtp.com; s=20210112; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:from:to:cc:subject :date:message-id:reply-to; bh=kt4bBTVbXW81YAenwRldgzFfhVYXZdpeKJjNUxSlOeU=; b=s6iUeH+w4n4Ls7LtfummMV7p+bd31M+wBWLZGj2aUXfFA8LIb5uTTJ7Xr8X4vA24a9 OM7IktSapN4qGr5v8AC5WnjH+Y+owKx10z0eBdq28kG6rEAeFWnt9N8hLxX0yWSLsxqv NtY77tFxE2oyLa25ICs4PyXUyWZjBScSUcbttZ8jyFwp+CS5A0ooYeXB+CVr8s2GIFd8 qeRIls/05VNVyH6xT5lTwhGOqyFA1TSiITfwDz8Ivr6l++70SL+MijcUYU8Z8y/qhUGO KCnrIhKwKfYU6CePGr6cq2owfDGQqX86vcELTwPYny7YupfHv8C79ZDI4OJGL35YFPYI 4cqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=kt4bBTVbXW81YAenwRldgzFfhVYXZdpeKJjNUxSlOeU=; b=Bki7kF8h941Mj1szzLe+ZgLUs8iNKRcC8uJk0DT+m4u7rCUPcn8i28LWLNHKa8am6v jBOCbpzK7GQP39Csr5UKzeE9okHqTKnbU5jpeAfl+6tIm+0fNsWNEHm4yGuhRL5BfTOs mn7I0aSyl53/cYCRKf+KjbHdaLPGKgq4uhfgCLM9cSB5qK/YOFwYMdlpoS863Lt/6f0R I0yCy3yO9jc+IIgHRwF0HxWsaVyXB3bP5UK4eEEyM3yeQZsT5vuLavps3/CDgD8iQ1iQ p+nEZLLIINY9xWc0x90jf/TzN9esD6pIFiGsgzpWzYhlj93lA0lMFnYI9VVNI4NIv8Xk hECA== X-Gm-Message-State: ACrzQf2ErN9b6vusSF/NoA0sVnPvQCNmPkSkpvKhz4v57+MhqvqDbHar /zn9It1ve7cw9Bmy935IN6r1KQ== X-Received: by 2002:ac8:5745:0:b0:35c:9f9b:9d56 with SMTP id 5-20020ac85745000000b0035c9f9b9d56mr5905199qtx.103.1666179581917; Wed, 19 Oct 2022 04:39:41 -0700 (PDT) Received: from [192.168.1.3] (68-20-15-154.lightspeed.rlghnc.sbcglobal.net. [68.20.15.154]) by smtp.gmail.com with ESMTPSA id de38-20020a05620a372600b006ce30a5f892sm4927509qkb.102.2022.10.19.04.39.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 Oct 2022 04:39:39 -0700 (PDT) Message-ID: Subject: Re: [PATCH v4 5/7] NFSD: Use rhashtable for managing nfs4_file objects From: Jeff Layton To: Chuck Lever , linux-nfs@vger.kernel.org Cc: neilb@suse.de Date: Wed, 19 Oct 2022 07:39:38 -0400 In-Reply-To: <166612313084.1291.5764156173845222109.stgit@manet.1015granger.net> References: <166612295223.1291.11761205673682408148.stgit@manet.1015granger.net> <166612313084.1291.5764156173845222109.stgit@manet.1015granger.net> Content-Type: text/plain; charset="ISO-8859-15" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.44.4 (3.44.4-2.fc36) MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,T_SPF_PERMERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Tue, 2022-10-18 at 15:58 -0400, Chuck Lever wrote: > fh_match() is costly, especially when filehandles are large (as is > the case for NFSv4). It needs to be used sparingly when searching > data structures. Unfortunately, with common workloads, I see > multiple thousands of objects stored in file_hashtbl[], which always > has only 256 buckets, which makes the hash chains quite lengthy. >=20 > Walking long hash chains with the state_lock held blocks other > activity that needs that lock. >=20 > To help mitigate the cost of searching with fh_match(), replace the > nfs4_file hash table with an rhashtable, which can dynamically > resize its bucket array to minimize hash chain length. The ideal for > this use case is one bucket per inode. >=20 > The result is an improvement in the latency of NFSv4 operations > and the reduction of nfsd CPU utilization due to the cost of > fh_match() and the CPU cache misses incurred while walking long > hash chains in the nfs4_file hash table. >=20 > Note that hash removal is no longer protected by the state_lock. > This is because insert_nfs4_file() takes the RCU read lock then the > state_lock. rhltable_remove() takes the RCU read lock internally; > if remove_nfs4_file() took the state_lock before calling > rhltable_remove(), that would result in a lock inversion. >=20 > Signed-off-by: Chuck Lever > --- > fs/nfsd/nfs4state.c | 215 +++++++++++++++++++++++++++++++++++----------= ------ > fs/nfsd/state.h | 5 - > 2 files changed, 147 insertions(+), 73 deletions(-) >=20 > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index 2b850de288cf..a63334ad61f6 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -44,7 +44,9 @@ > #include > #include > #include > +#include > #include > + > #include "xdr4.h" > #include "xdr4cb.h" > #include "vfs.h" > @@ -84,6 +86,7 @@ static bool check_for_locks(struct nfs4_file *fp, struc= t nfs4_lockowner *lowner) > static void nfs4_free_ol_stateid(struct nfs4_stid *stid); > void nfsd4_end_grace(struct nfsd_net *nn); > static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cp= ntf_state *cps); > +static void remove_nfs4_file(struct nfs4_file *fi); > =20 > /* Locking: */ > =20 > @@ -577,11 +580,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu= ) > void > put_nfs4_file(struct nfs4_file *fi) > { > - might_lock(&state_lock); > - > - if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) { > - hlist_del_rcu(&fi->fi_hash); > - spin_unlock(&state_lock); > + if (refcount_dec_and_test(&fi->fi_ref)) { > + remove_nfs4_file(fi); > WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate)); > WARN_ON_ONCE(!list_empty(&fi->fi_delegations)); > call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu); > @@ -695,19 +695,82 @@ static unsigned int ownerstr_hashval(struct xdr_net= obj *ownername) > return ret & OWNER_HASH_MASK; > } > =20 > -/* hash table for nfs4_file */ > -#define FILE_HASH_BITS 8 > -#define FILE_HASH_SIZE (1 << FILE_HASH_BITS) > +static struct rhltable nfs4_file_rhltable ____cacheline_aligned_in_smp; > + > +/* > + * The returned hash value is based solely on the address of an in-code > + * inode, a pointer to a slab-allocated object. The entropy in such a > + * pointer is concentrated in its middle bits. > + */ > +static u32 nfs4_file_inode_hash(const struct inode *inode, u32 seed) > +{ > + u32 k; > + > + k =3D ((unsigned long)inode) >> L1_CACHE_SHIFT; > + return jhash2(&k, 1, seed); > +} > =20 > -static unsigned int file_hashval(struct svc_fh *fh) > +/** > + * nfs4_file_key_hashfn - Compute the hash value of a lookup key > + * @data: key on which to compute the hash value > + * @len: rhash table's key_len parameter (unused) > + * @seed: rhash table's random seed of the day > + * > + * Return value: > + * Computed 32-bit hash value > + */ > +static u32 nfs4_file_key_hashfn(const void *data, u32 len, u32 seed) > { > - struct inode *inode =3D d_inode(fh->fh_dentry); > + const struct svc_fh *fhp =3D data; > =20 > - /* XXX: why not (here & in file cache) use inode? */ > - return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS); > + return nfs4_file_inode_hash(d_inode(fhp->fh_dentry), seed); > } > =20 > -static struct hlist_head file_hashtbl[FILE_HASH_SIZE]; > +/** > + * nfs4_file_obj_hashfn - Compute the hash value of an nfs4_file object > + * @data: object on which to compute the hash value > + * @len: rhash table's key_len parameter (unused) > + * @seed: rhash table's random seed of the day > + * > + * Return value: > + * Computed 32-bit hash value > + */ > +static u32 nfs4_file_obj_hashfn(const void *data, u32 len, u32 seed) > +{ > + const struct nfs4_file *fi =3D data; > + > + return nfs4_file_inode_hash(fi->fi_inode, seed); > +} > + > +/** > + * nfs4_file_obj_cmpfn - Match a cache item against search criteria > + * @arg: search criteria > + * @ptr: cache item to check > + * > + * Return values: > + * %0 - Success; item matches search criteria > + */ > +static int nfs4_file_obj_cmpfn(struct rhashtable_compare_arg *arg, > + const void *ptr) > +{ > + const struct svc_fh *fhp =3D arg->key; > + const struct nfs4_file *fi =3D ptr; > + > + return fh_match(&fi->fi_fhandle, &fhp->fh_handle) ? 0 : 1; > +} > + > +static const struct rhashtable_params nfs4_file_rhash_params =3D { > + .key_len =3D sizeof_field(struct nfs4_file, fi_inode), > + .key_offset =3D offsetof(struct nfs4_file, fi_inode), > + .head_offset =3D offsetof(struct nfs4_file, fi_rhash), > + .hashfn =3D nfs4_file_key_hashfn, > + .obj_hashfn =3D nfs4_file_obj_hashfn, > + .obj_cmpfn =3D nfs4_file_obj_cmpfn, > + > + /* Reduce resizing churn on light workloads */ > + .min_size =3D 512, /* buckets */ > + .automatic_shrinking =3D true, > +}; > =20 > /* > * Check if courtesy clients have conflicting access and resolve it if p= ossible > @@ -4251,11 +4314,8 @@ static struct nfs4_file *nfsd4_alloc_file(void) > } > =20 > /* OPEN Share state helper functions */ > -static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval, > - struct nfs4_file *fp) > +static void init_nfs4_file(const struct svc_fh *fh, struct nfs4_file *fp= ) > { > - lockdep_assert_held(&state_lock); > - > refcount_set(&fp->fi_ref, 1); > spin_lock_init(&fp->fi_lock); > INIT_LIST_HEAD(&fp->fi_stateids); > @@ -4273,7 +4333,6 @@ static void nfsd4_init_file(struct svc_fh *fh, unsi= gned int hashval, > INIT_LIST_HEAD(&fp->fi_lo_states); > atomic_set(&fp->fi_lo_recalls, 0); > #endif > - hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]); > } > =20 > void > @@ -4626,71 +4685,75 @@ move_to_close_lru(struct nfs4_ol_stateid *s, stru= ct net *net) > nfs4_put_stid(&last->st_stid); > } > =20 > -/* search file_hashtbl[] for file */ > -static struct nfs4_file * > -find_file_locked(struct svc_fh *fh, unsigned int hashval) > +static struct nfs4_file *find_nfs4_file(const struct svc_fh *fhp) > { > - struct nfs4_file *fp; > + struct rhlist_head *tmp, *list; > + struct nfs4_file *fi =3D NULL; > =20 > - hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, > - lockdep_is_held(&state_lock)) { > - if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { > - if (refcount_inc_not_zero(&fp->fi_ref)) > - return fp; > + list =3D rhltable_lookup(&nfs4_file_rhltable, fhp, > + nfs4_file_rhash_params); > + rhl_for_each_entry_rcu(fi, tmp, list, fi_rhash) > + if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { > + if (!refcount_inc_not_zero(&fi->fi_ref)) > + fi =3D NULL; > + break; > } > - } > - return NULL; > + return fi; > } > =20 > -static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_f= h *fh, > - unsigned int hashval) > +/* > + * On hash insertion, identify entries with the same inode but > + * distinct filehandles. They will all be in the same hash bucket > + * because we hash by the address of the nfs4_file's struct inode. > + */ > +static struct nfs4_file *insert_file(struct nfs4_file *new, > + const struct svc_fh *fhp) > { > - struct nfs4_file *fp; > - struct nfs4_file *ret =3D NULL; > - bool alias_found =3D false; > + struct rhlist_head *tmp, *list; > + struct nfs4_file *fi; > + int err; > =20 > spin_lock(&state_lock); > - hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, > - lockdep_is_held(&state_lock)) { > - if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { > - if (refcount_inc_not_zero(&fp->fi_ref)) > - ret =3D fp; > - } else if (d_inode(fh->fh_dentry) =3D=3D fp->fi_inode) > - fp->fi_aliased =3D alias_found =3D true; > - } > - if (likely(ret =3D=3D NULL)) { > - nfsd4_init_file(fh, hashval, new); > - new->fi_aliased =3D alias_found; > - ret =3D new; > + > + err =3D rhltable_insert_key(&nfs4_file_rhltable, fhp, > + &new->fi_rhash, > + nfs4_file_rhash_params); > + if (err) { > + spin_unlock(&state_lock); > + return NULL; > } > + > + list =3D rhltable_lookup(&nfs4_file_rhltable, fhp, > + nfs4_file_rhash_params); > + rhl_for_each_entry_rcu(fi, tmp, list, fi_rhash) > + if (fi->fi_inode =3D=3D d_inode(fhp->fh_dentry) && > + !fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { > + fi->fi_aliased =3D true; > + new->fi_aliased =3D true; > + } > + > spin_unlock(&state_lock); > - return ret; > + return new; > } > =20 > -static struct nfs4_file * find_file(struct svc_fh *fh) > +static noinline struct nfs4_file * > +insert_nfs4_file(struct nfs4_file *new, const struct svc_fh *fhp) > { > - struct nfs4_file *fp; > - unsigned int hashval =3D file_hashval(fh); > + struct nfs4_file *fi; > =20 > - rcu_read_lock(); > - fp =3D find_file_locked(fh, hashval); > - rcu_read_unlock(); > - return fp; > + /* Do not hash the same filehandle more than once */ > + fi =3D find_nfs4_file(fhp); > + if (unlikely(fi)) > + return fi; > + > + init_nfs4_file(fhp, new); > + return insert_file(new, fhp); > } > =20 > -static struct nfs4_file * > -find_or_add_file(struct nfs4_file *new, struct svc_fh *fh) > +static noinline void remove_nfs4_file(struct nfs4_file *fi) > { > - struct nfs4_file *fp; > - unsigned int hashval =3D file_hashval(fh); > - > - rcu_read_lock(); > - fp =3D find_file_locked(fh, hashval); > - rcu_read_unlock(); > - if (fp) > - return fp; > - > - return insert_file(new, fh, hashval); > + rhltable_remove(&nfs4_file_rhltable, &fi->fi_rhash, > + nfs4_file_rhash_params); > } > =20 > /* > @@ -4703,9 +4766,12 @@ nfs4_share_conflict(struct svc_fh *current_fh, uns= igned int deny_type) > struct nfs4_file *fp; > __be32 ret =3D nfs_ok; > =20 > - fp =3D find_file(current_fh); > + rcu_read_lock(); > + fp =3D find_nfs4_file(current_fh); > + rcu_read_unlock(); > if (!fp) > return ret; > + > /* Check for conflicting share reservations */ > spin_lock(&fp->fi_lock); > if (fp->fi_share_deny & deny_type) > @@ -5548,7 +5614,11 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct= svc_fh *current_fh, struct nf > * and check for delegations in the process of being recalled. > * If not found, create the nfs4_file struct > */ > - fp =3D find_or_add_file(open->op_file, current_fh); > + rcu_read_lock(); > + fp =3D insert_nfs4_file(open->op_file, current_fh); > + rcu_read_unlock(); It'd probably be better to push this rcu_read_lock down into insert_nfs4_file. You don't need to hold it over the actual insertion, since that requires the state_lock. > + if (unlikely(!fp)) > + return nfserr_jukebox; > if (fp !=3D open->op_file) { > status =3D nfs4_check_deleg(cl, open, &dp); > if (status) > @@ -7905,10 +7975,16 @@ nfs4_state_start(void) > { > int ret; > =20 > - ret =3D nfsd4_create_callback_queue(); > + ret =3D rhltable_init(&nfs4_file_rhltable, &nfs4_file_rhash_params); > if (ret) > return ret; > =20 > + ret =3D nfsd4_create_callback_queue(); > + if (ret) { > + rhltable_destroy(&nfs4_file_rhltable); > + return ret; > + } > + > set_max_delegations(); > return 0; > } > @@ -7939,6 +8015,7 @@ nfs4_state_shutdown_net(struct net *net) > =20 > nfsd4_client_tracking_exit(net); > nfs4_state_destroy_net(net); > + rhltable_destroy(&nfs4_file_rhltable); > #ifdef CONFIG_NFSD_V4_2_INTER_SSC > nfsd4_ssc_shutdown_umount(nn); > #endif > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > index ae596dbf8667..ad20467c9dee 100644 > --- a/fs/nfsd/state.h > +++ b/fs/nfsd/state.h > @@ -536,16 +536,13 @@ struct nfs4_clnt_odstate { > * inode can have multiple filehandles associated with it, so there is > * (potentially) a many to one relationship between this struct and stru= ct > * inode. > - * > - * These are hashed by filehandle in the file_hashtbl, which is protecte= d by > - * the global state_lock spinlock. > */ > struct nfs4_file { > refcount_t fi_ref; > struct inode * fi_inode; > bool fi_aliased; > spinlock_t fi_lock; > - struct hlist_node fi_hash; /* hash on fi_fhandle */ > + struct rhlist_head fi_rhash; > struct list_head fi_stateids; > union { > struct list_head fi_delegations; >=20 >=20 --=20 Jeff Layton