Date: Sat, 8 Aug 2015 06:36:24 -0400
From: Jeff Layton
To: Kinglong Mee
Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd
Message-ID: <20150808063624.24b6fcc3@tlielax.poochiereds.net>
In-Reply-To: <55C549D6.4030104@gmail.com>
References: <1438264341-18048-1-git-send-email-jeff.layton@primarydata.com>
	<1438809216-4846-1-git-send-email-jeff.layton@primarydata.com>
	<1438809216-4846-5-git-send-email-jeff.layton@primarydata.com>
	<55C4CEA9.306@gmail.com>
	<20150807131857.0a94bdfe@synchrony.poochiereds.net>
	<55C549D6.4030104@gmail.com>

On Sat, 8 Aug 2015 08:14:14 +0800
Kinglong Mee wrote:

> On 8/8/2015 01:18, Jeff Layton wrote:
> > On Fri, 7 Aug 2015 23:28:41 +0800
> > Kinglong Mee wrote:
> >
> >> On 8/6/2015 05:13, Jeff Layton wrote:
> >>> Currently, NFSv2/3 reads and writes have to open a file, do the read or
> >>> write, and then close it again for each RPC. This is highly inefficient,
> >>> especially when the underlying filesystem has a relatively slow open
> >>> routine.
> >>>
> >>> This patch adds a new open file cache to knfsd. Rather than doing an
> >>> open for each RPC, the read/write handlers can call into this cache to
> >>> see if there is one already there for the correct filehandle and
> >>> NFS_MAY_READ/WRITE flags.
> >>>
> >>> If there isn't an entry, then we create a new one and attempt to
> >>> perform the open. If there is, then we wait until the entry is fully
> >>> instantiated and return it if it is at the end of the wait. If it's
> >>> not, then we attempt to take over construction.
> >>>
> >>> Since the main goal is to speed up NFSv2/3 I/O, we don't want to
> >>> close these files on the last put of these objects. We need to keep them
> >>> around for a little while since we never know when the next READ/WRITE
> >>> will come in.
> >>>
> >>> Cache entries have a hardcoded 1s timeout, and we have a recurring
> >>> workqueue job that walks the cache and purges any entries that have
> >>> expired.
> >>>
> >>> Signed-off-by: Jeff Layton
> >> ... snip ...
> >>> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> >>> index 9051ee54faa3..adf7e78b8e43 100644
> >>> --- a/fs/nfsd/filecache.h
> >>> +++ b/fs/nfsd/filecache.h
> >>> @@ -4,6 +4,7 @@
> >>>  #include
> >>>  #include
> >>>
> >>> +#include "nfsfh.h"
> >>>  #include "export.h"
> >>>
> >>>  /* hash table for nfs4_file */
> >>> @@ -22,4 +23,24 @@ file_hashval(struct knfsd_fh *fh)
> >>>  	return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
> >>>  }
> >>>
> >>> +struct nfsd_file {
> >>
> >> There is already a nfs4_file in nfsd, and it's not easy for a newcomer
> >> to distinguish the two. More comments, or a more meaningful name, would
> >> be better.
> >>
> >
> > Maybe. Again, any suggestions? My hope was that eventually we can unify
> > the two caches somehow, but maybe that's naive.
>
> I cannot find a better name for the new file cache, so more comments, then.
>
> I don't agree with merging those two into one cache.
>
> nfsv4's file cache is a piece of client state, which exists until the
> client closes the file or its lease expires. But the nfsd_file cache is
> a temporary resource for nfsv2/v3 clients, without state.
>

You're probably right here. It was an idle thought from when I first
started this work.

What we probably would want to do, however, is to layer the nfs4_file
cache on top of this cache instead of having it manage filps on its
own. I tried to design this cache so that it can handle O_RDWR opens,
even though the current callers don't ever actually request those. It
should be possible to hook the nfs4_file cache up to it, though I'd
prefer to wait until we have this code in place first and then do that
later.
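To make the flow described in the changelog a bit more concrete, the
acquire path boils down to roughly the following. This is only
pseudocode restating the description above -- the helper names are made
up, and locking, refcounting, and error handling are all elided -- so
look at the actual patch for the real logic:

	/*
	 * Hypothetical sketch of the acquire path. Helpers like
	 * find_entry() and do_open() are illustrative stand-ins,
	 * not the real function names.
	 */
	nf = find_entry(fh, may_flags);
	if (!nf) {
		/* no entry: create one with NFSD_FILE_PENDING set... */
		nf = alloc_entry(fh, may_flags);
		insert_entry(nf);
		/* ...and attempt the open ourselves */
		nf->nf_file = do_open(fh, may_flags);
		clear_pending_and_wake(nf);
		return nf;
	}

	/* entry already exists: wait for whoever is constructing it */
	wait_on_pending(nf);
	if (nf->nf_file)
		return nf;		/* fully instantiated */

	/* construction failed or was abandoned: try to take it over */
	if (!test_and_set_bit(NFSD_FILE_PENDING, &nf->nf_flags)) {
		nf->nf_file = do_open(fh, may_flags);
		clear_pending_and_wake(nf);
	}
	return nf;

Whoever wins the NFSD_FILE_PENDING bit owns construction; everyone else
just sleeps on it.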
> Also, for nfsv4's conflict checking, should we check the temporary file
> cache for nfsv2/v3 too?
>

You mean for share/deny modes? We traditionally have not done that, and
I don't see a compelling reason to start now. That would be a separate
project in its own right, IMO. We'd need to lift the share/deny mode
handling into this new cache, and there's also the problem that there
is no protocol support for that in NFSv2/3. What would we return to the
client if there's a deny mode conflict when it's trying to do (e.g.) a
READ?

> >
> >>> +	struct hlist_node	nf_node;
> >>> +	struct list_head	nf_dispose;
> >>> +	struct rcu_head		nf_rcu;
> >>> +	struct file		*nf_file;
> >>> +	unsigned long		nf_time;
> >>> +#define NFSD_FILE_HASHED	(0)
> >>
> >> Why not use hlist_unhashed()?
> >>
> >
> > These entries are removed from the list using hlist_del_rcu(), and
> > hlist_unhashed() will not return true after that.
>
> As I understand it, NFSD_FILE_HASHED means the entry has been added to
> nfsd_file_hashtbl and the global file count has been increased.
>
> After calling hlist_del_rcu, the entry has been deleted from the hashtbl
> and the NFSD_FILE_HASHED bit is cleared.
>

That's correct.

> As used in the code, the bit and hlist_unhashed() have the same meaning.
> Am I misunderstanding something?
>

hlist_unhashed() won't work here:

static inline int hlist_unhashed(const struct hlist_node *h)
{
	return !h->pprev;
}

...but:

static inline void hlist_del_rcu(struct hlist_node *n)
{
	__hlist_del(n);
	n->pprev = LIST_POISON2;
}

...so after a hlist_del_rcu, hlist_unhashed will still return false. In
order to get it to return true, we'd need to use hlist_del_init, but
that would mean that we couldn't safely traverse the hash chain under
the rcu_read_lock.
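For comparison, here's hlist_del_init() from list.h (quoting from
memory, so double-check against your tree):

static inline void hlist_del_init(struct hlist_node *n)
{
	if (!hlist_unhashed(n)) {
		__hlist_del(n);
		INIT_HLIST_NODE(n);
	}
}

INIT_HLIST_NODE() resets both n->pprev (which is why hlist_unhashed()
works after it) and n->next. Clearing ->next is the killer: a reader
walking the chain under the rcu_read_lock could be sitting on the node
when it's deleted, see a NULL ->next, and stop traversing the rest of
the chain early. hlist_del_rcu() deliberately leaves ->next intact for
exactly that reason, at the cost of poisoning ->pprev. Hence the
separate NFSD_FILE_HASHED bit.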
> +static void
> +nfsd_file_unhash(struct nfsd_file *nf)
> +{
> +	if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> +		hlist_del_rcu(&nf->nf_node);
> +		nfsd_file_count_dec();
> +	}
> +}
>
> thanks,
> Kinglong Mee
>
> >
> >>> +#define NFSD_FILE_PENDING	(1)
> >>> +	unsigned long		nf_flags;
> >>> +	struct knfsd_fh		nf_handle;
> >>> +	unsigned int		nf_hashval;
> >>> +	atomic_t		nf_ref;
> >>> +	unsigned char		nf_may;
> >>> +};
> >>> +
> >>> +int nfsd_file_cache_init(void);
> >>> +void nfsd_file_cache_shutdown(void);
> >>> +void nfsd_file_put(struct nfsd_file *nf);
> >>> +__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>> +		unsigned int may_flags, struct nfsd_file **nfp);
> >>>  #endif /* _FS_NFSD_FILECACHE_H */
> >>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> >>> index ced9944201a0..0572441e23ec 100644
> >>> --- a/fs/nfsd/nfssvc.c
> >>> +++ b/fs/nfsd/nfssvc.c
> >>> @@ -23,6 +23,7 @@
> >>>  #include "cache.h"
> >>>  #include "vfs.h"
> >>>  #include "netns.h"
> >>> +#include "filecache.h"
> >>>
> >>>  #define NFSDDBG_FACILITY	NFSDDBG_SVC
> >>>
> >>> @@ -233,11 +234,17 @@ static int nfsd_startup_generic(int nrservs)
> >>>  	if (!nfsd_laundry_wq)
> >>>  		goto out_racache;
> >>>
> >>> -	ret = nfs4_state_start();
> >>> +	ret = nfsd_file_cache_init();
> >>>  	if (ret)
> >>>  		goto out_wq;
> >>> +
> >>> +	ret = nfs4_state_start();
> >>> +	if (ret)
> >>> +		goto out_nfsd_file;
> >>>  	return 0;
> >>>
> >>> +out_nfsd_file:
> >>> +	nfsd_file_cache_shutdown();
> >>>  out_wq:
> >>>  	destroy_workqueue(nfsd_laundry_wq);
> >>>  	nfsd_laundry_wq = NULL;
> >>> @@ -254,6 +261,7 @@ static void nfsd_shutdown_generic(void)
> >>>  		return;
> >>>
> >>>  	nfs4_state_shutdown();
> >>> +	nfsd_file_cache_shutdown();
> >>>  	nfsd_racache_shutdown();
> >>>  }
> >>>
> >>>
> >
> >

-- 
Jeff Layton