LinuxLists.cc - [PATCH 6/8] knfsd: repcache: use client IP address in hash

2006-10-11 11:28:55

Subject: [PATCH 6/8] knfsd: repcache: use client IP address in hash

knfsd: Use the client's IP address in the duplicate request cache
hash function, instead of just the XID. This avoids contention
on hash buckets when the workload has many clients whose XIDs are
nearly in lockstep, a property seen on compute clusters using NFS
for shared storage.

Signed-off-by: Greg Banks <[email protected]>
---

fs/nfsd/nfscache.c | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)

Index: linux-git-20061009/fs/nfsd/nfscache.c
===================================================================
--- linux-git-20061009.orig/fs/nfsd/nfscache.c 2006-10-10 16:41:07.121363949 +1000
+++ linux-git-20061009/fs/nfsd/nfscache.c 2006-10-10 16:41:49.107949488 +1000
@@ -93,13 +93,18 @@ static int cache_disabled = 1;
  * Calculate the hash index from an XID. Note, some clients increment
  * their XIDs in host order, which can result in all the variation being
  * in the top bits we see here. So we fold those bits down.
+ *
+ * Experiment shows that using the Jenkins hash improves the spectral
+ * properties of this hash, but the CPU cost of calculating it outweighs
+ * the advantages.
  */
 static inline u32
-request_hash(u32 xid)
+request_hash(u32 xid, const struct sockaddr_in *sin)
 {
 	u32 h = xid;
 	h ^= (xid >> 24);
 	h ^= ((xid & 0xff0000) >> 8);
+	h ^= sin->sin_addr.s_addr;
 	return h;
 }
 
@@ -248,7 +253,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp
 	int safe = 0;
 	int expand = 0;
 
-	h = request_hash(xid);
+	h = request_hash(xid, &rqstp->rq_addr);
 	b = bucket_for_hash(h);
 	h = (h / CACHE_NUM_BUCKETS) & (HASHSIZE-1);
 
@@ -399,7 +404,7 @@ nfsd_cache_update(struct svc_rqst *rqstp
 	if (!(rp = rqstp->rq_cacherep) || cache_disabled)
 		return;
 
-	b = bucket_for_hash(request_hash(rp->c_xid));
+	b = bucket_for_hash(request_hash(rp->c_xid, &rp->c_addr));
 
 	len = resv->iov_len - ((char*)statp - (char*)resv->iov_base);
 	len >>= 2;
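
To illustrate the effect described in the commit message, here is a minimal user-space sketch (not kernel code) that reimplements the XID fold from request_hash() with and without the client address, then counts how many buckets a burst of lockstep requests lands in. The table size NBUCKETS and the simple `h & (NBUCKETS - 1)` bucket selection are assumptions for illustration; the real code goes through bucket_for_hash() and HASHSIZE, which are not shown in the patch. Addresses are passed in host byte order for simplicity, whereas the kernel's sin_addr.s_addr is in network byte order, so which bits actually vary between clients depends on host endianness.

```c
#include <stdint.h>

/* Hypothetical table size for the sketch; must be a power of two. */
#define NBUCKETS 64u

/* Same fold as the pre-patch request_hash(): XID only. */
static inline uint32_t hash_xid_only(uint32_t xid)
{
    uint32_t h = xid;
    h ^= (xid >> 24);
    h ^= ((xid & 0xff0000) >> 8);
    return h;
}

/* Post-patch idea: also fold in the client's IPv4 address.
 * Assumption: addr is in host byte order here (the kernel uses s_addr,
 * which is in network byte order). */
static inline uint32_t hash_xid_addr(uint32_t xid, uint32_t addr)
{
    return hash_xid_only(xid) ^ addr;
}

/* Count how many distinct buckets n clients hit when they all send the
 * same XID at the same moment (the "lockstep" case). */
static unsigned distinct_buckets(uint32_t xid, const uint32_t *addrs,
                                 unsigned n, int use_addr)
{
    unsigned char seen[NBUCKETS] = {0};
    unsigned count = 0;

    for (unsigned i = 0; i < n; i++) {
        uint32_t h = use_addr ? hash_xid_addr(xid, addrs[i])
                              : hash_xid_only(xid);
        unsigned b = h & (NBUCKETS - 1);   /* assumed bucket selection */
        if (!seen[b]) {
            seen[b] = 1;
            count++;
        }
    }
    return count;
}
```

With 16 clients at consecutive addresses (10.0.0.1 to 10.0.0.16) all sending the same XID, the XID-only hash puts every request into a single bucket, while folding in the address spreads them over 16 buckets.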

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-10-23 19:52:13

by J. Bruce Fields

Subject: Re: [PATCH 6/8] knfsd: repcache: use client IP address in hash

On Wed, Oct 11, 2006 at 09:28:50PM +1000, Greg Banks wrote:
> knfsd: Use the client's IP address in the duplicate request cache
> hash function, instead of just the XID.

By the way, do we ever match the credential used on the replayed request
with the credential used on the original request? From a quick check of
the code, I can't see any place where we do.

It strikes me as something an attacker might be able to have some fun
with. (Poison the cache with requests matching XIDs you expect to be
used in the future? "Replay" somebody else's request just to see a
response that you wouldn't otherwise have been able to see?)
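
One possible mitigation, sketched here under assumptions: store the credential of the original request in the cache entry and require it to match before treating a new request as a replay. The structures and names below are hypothetical and much simpler than knfsd's; the credential is reduced to an AUTH_UNIX-style uid/gid pair, whereas a real implementation would have to compare whatever the security flavor in use provides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified AUTH_UNIX-style credential. */
struct cred_sketch {
    uint32_t uid;
    uint32_t gid;
};

/* Hypothetical reply-cache entry; not knfsd's actual entry layout. */
struct cache_entry_sketch {
    uint32_t xid;
    uint32_t client_addr;
    uint32_t proc;
    struct cred_sketch cred;   /* credential of the original request */
};

static bool cred_equal(const struct cred_sketch *a,
                       const struct cred_sketch *b)
{
    return a->uid == b->uid && a->gid == b->gid;
}

/* Treat a cached entry as a replay only if the XID, client address,
 * procedure and credential all match the incoming request. */
static bool is_replay(const struct cache_entry_sketch *e, uint32_t xid,
                      uint32_t client_addr, uint32_t proc,
                      const struct cred_sketch *cred)
{
    return e->xid == xid &&
           e->client_addr == client_addr &&
           e->proc == proc &&
           cred_equal(&e->cred, cred);
}
```

A request that reuses someone else's XID but presents a different credential then misses the cache and is processed normally instead of receiving the cached reply.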

--b.

2006-10-12 02:30:19

by Trond Myklebust

Subject: Re: [PATCH 6/8] knfsd: repcache: use client IP address in hash

On Wed, 2006-10-11 at 21:28 +1000, Greg Banks wrote:
> knfsd: Use the client's IP address in the duplicate request cache
> hash function, instead of just the XID. This avoids contention
> on hash buckets when the workload has many clients whose XIDs are
> nearly in lockstep, a property seen on compute clusters using NFS
> for shared storage.

Note that some platforms (in particular the *BSDs) use an MD5 checksum
of the first couple of 100 bytes of the RPC header+message instead of
relying on the XID. That is a good deal safer w.r.t. port reuse by other
clients etc.
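
A minimal sketch of a cache key built from the XID plus a checksum over the start of the RPC message. FNV-1a is swapped in for MD5 purely to keep the example short and dependency-free; it is not what the BSD servers use. CKSUM_LEN (100 bytes) is an assumed window size, and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed number of leading RPC message bytes covered by the checksum. */
#define CKSUM_LEN 100u

/* 32-bit FNV-1a: a cheap stand-in for MD5 in this sketch. */
static uint32_t fnv1a32(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical duplicate-request key: XID plus message checksum. */
struct dup_key {
    uint32_t xid;
    uint32_t cksum;
};

static struct dup_key make_key(uint32_t xid, const unsigned char *msg,
                               size_t len)
{
    struct dup_key k;
    k.xid = xid;
    k.cksum = fnv1a32(msg, len < CKSUM_LEN ? len : CKSUM_LEN);
    return k;
}

static int key_equal(const struct dup_key *a, const struct dup_key *b)
{
    return a->xid == b->xid && a->cksum == b->cksum;
}
```

Two requests that share an XID but differ within the checksummed window (for example, a different client that happens to reuse the port and XID) no longer match; differences beyond the window still go unnoticed.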

Cheers,
Trond

2006-10-12 08:21:43

by Greg Banks

Subject: Re: [PATCH 6/8] knfsd: repcache: use client IP address in hash

On Wed, Oct 11, 2006 at 07:30:00PM -0700, Trond Myklebust wrote:
> On Wed, 2006-10-11 at 21:28 +1000, Greg Banks wrote:
> > knfsd: Use the client's IP address in the duplicate request cache
> > hash function, instead of just the XID. This avoids contention
> > on hash buckets when the workload has many clients whose XIDs are
> > nearly in lockstep, a property seen on compute clusters using NFS
> > for shared storage.
>
> Note that some platforms (in particular the *BSDs) use an MD5 checksum
> of the first couple of 100 bytes of the RPC header+message instead of
> relying on the XID. That is a good deal safer w.r.t. port reuse by other
> clients etc.
>

I hear that there was a Cthon presentation on this subject. It sounds
very interesting, does anyone have a URL?

I presume the approach involves masking out the IPID and TCP sequence
number? Otherwise retries would never hash to the same value as the
original requests, thus defeating the repcache entirely.

Also, I'm not entirely convinced that a hash function which distributes
repcache entries more evenly across the hash table (which is what
I would expect an MD5 to do) is necessarily the best approach.
For one thing, that maximises the number of times that cachelines
need to be retrieved from remote nodes. A better approach might be
to construct the hash function so that certain cachelines naturally
stay on certain CPUs. I thought I'd done something like this, but
looking at the patch I sent it's dumber than that.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

2006-10-12 14:19:34

by Chuck Lever

Subject: Re: [PATCH 6/8] knfsd: repcache: use client IP address in hash

On 10/12/06, Greg Banks <[email protected]> wrote:
> On Wed, Oct 11, 2006 at 07:30:00PM -0700, Trond Myklebust wrote:
> > On Wed, 2006-10-11 at 21:28 +1000, Greg Banks wrote:
> > > knfsd: Use the client's IP address in the duplicate request cache
> > > hash function, instead of just the XID. This avoids contention
> > > on hash buckets when the workload has many clients whose XIDs are
> > > nearly in lockstep, a property seen on compute clusters using NFS
> > > for shared storage.
> >
> > Note that some platforms (in particular the *BSDs) use an MD5 checksum
> > of the first couple of 100 bytes of the RPC header+message instead of
> > relying on the XID. That is a good deal safer w.r.t. port reuse by other
> > clients etc.
> >
>
> I hear that there was a Cthon presentation on this subject. It sounds
> very interesting, does anyone have a URL?
>
> I presume the approach involves masking out the IPID and TCP sequence
> number? Otherwise retries would never hash to the same value as the
> original requests, thus defeating the repcache entirely.

The hash starts at the beginning of the RPC header, so the IP and
TCP/UDP headers are skipped.
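
A sketch of what starting at the RPC header means for a UDP request, assuming an IPv4 header with no options (20 bytes) followed by the 8-byte UDP header. A retransmission that differs only in the IP ID field yields the same checksum because those bytes are never hashed. In practice the RPC layer receives the message with these headers already stripped; the sketch keeps them in one buffer only to make the offset explicit. The constants, names and the FNV-1a checksum are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV4_HDR_LEN  20u  /* assumed: IPv4 header without options */
#define UDP_HDR_LEN    8u
#define RPC_CKSUM_LEN 100u /* assumed checksum window */

/* 32-bit FNV-1a, standing in for whatever checksum the server uses. */
static uint32_t fnv1a32(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Checksum the RPC message in a raw IPv4/UDP datagram, skipping the IP
 * and UDP headers so that per-transmission fields (IP ID, UDP checksum)
 * do not affect the result. */
static uint32_t rpc_cksum(const unsigned char *pkt, size_t pkt_len)
{
    size_t off = IPV4_HDR_LEN + UDP_HDR_LEN;
    size_t n;

    if (pkt_len <= off)
        return 0;                      /* no RPC payload to checksum */
    n = pkt_len - off;
    if (n > RPC_CKSUM_LEN)
        n = RPC_CKSUM_LEN;
    return fnv1a32(pkt + off, n);
}
```

Hashing the whole datagram instead would make every retransmission look like a new request, which is exactly the failure Greg describes.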

--
"We who cut mere stones must always be envisioning cathedrals"
-- Quarry worker's creed

2006-10-12 15:32:02

by J. Bruce Fields

Subject: Re: [PATCH 6/8] knfsd: repcache: use client IP address in hash

On Thu, Oct 12, 2006 at 06:21:26PM +1000, Greg Banks wrote:
> On Wed, Oct 11, 2006 at 07:30:00PM -0700, Trond Myklebust wrote:
> > On Wed, 2006-10-11 at 21:28 +1000, Greg Banks wrote:
> > > knfsd: Use the client's IP address in the duplicate request cache
> > > hash function, instead of just the XID. This avoids contention
> > > on hash buckets when the workload has many clients whose XIDs are
> > > nearly in lockstep, a property seen on compute clusters using NFS
> > > for shared storage.
> >
> > Note that some platforms (in particular the *BSDs) use an MD5 checksum
> > of the first couple of 100 bytes of the RPC header+message instead of
> > relying on the XID. That is a good deal safer w.r.t. port reuse by other
> > clients etc.
> >
>
> I hear that there was a Cthon presentation on this subject. It sounds
> very interesting, does anyone have a URL?

My possibly muddled notes from Rick's presentation:

Rick suggests:
    LRU cache per TCP connection:
        evicts from each cache on the TCP ack of the reply
        keeps individual caches around forever, even after disconnect,
        since, e.g., a long-term network partition may be the typical case.
        (OK, maybe not forever.)
    (Note lookups are *global*: he doesn't look up by TCP connection (or
    even IP address), because he wants reconnects to get hits.)
    He uses the XID and a checksum of the first 100 bytes of the decrypted
    RPC body as the key into the cache.
    He assumes any hit on an in-progress RPC is a false positive.
    He also will *never* drop based on a hit on anything with
    a sequenceid-mutating op in it.
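
The notes above can be sketched roughly as follows. This is an illustration under several assumptions, not Rick's actual implementation: fixed-size arrays, a ring buffer standing in for a true LRU, a checksum computed elsewhere, TCP sequence numbers supplied by the caller, and the sequenceid-mutating-op rule omitted. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 4   /* hypothetical limits for the sketch */
#define PER_CONN  8

enum entry_state { ENTRY_EMPTY, ENTRY_IN_PROGRESS, ENTRY_DONE };

struct entry {
    enum entry_state state;
    uint32_t xid;
    uint32_t cksum;          /* checksum of the start of the RPC body */
    uint32_t reply_end_seq;  /* TCP sequence number just past the reply */
};

/* One cache per TCP connection; a ring buffer stands in for an LRU. */
struct conn_cache {
    struct entry slot[PER_CONN];
    unsigned next;
};

/* Static storage, so the caches outlive disconnects. */
static struct conn_cache conns[MAX_CONNS];

/* Lookups are global: every connection's cache is searched, so a client
 * that reconnects on a new connection still gets a hit. */
static struct entry *cache_lookup(uint32_t xid, uint32_t cksum)
{
    for (int c = 0; c < MAX_CONNS; c++) {
        for (int i = 0; i < PER_CONN; i++) {
            struct entry *e = &conns[c].slot[i];
            /* A match on an in-progress RPC is treated as a false
             * positive, so only completed entries can hit. */
            if (e->state == ENTRY_DONE && e->xid == xid &&
                e->cksum == cksum)
                return e;
        }
    }
    return NULL;
}

/* Record a new request on connection conn, overwriting the oldest slot. */
static struct entry *cache_insert(int conn, uint32_t xid, uint32_t cksum)
{
    struct conn_cache *cc = &conns[conn];
    struct entry *e = &cc->slot[cc->next];

    cc->next = (cc->next + 1) % PER_CONN;
    e->state = ENTRY_IN_PROGRESS;
    e->xid = xid;
    e->cksum = cksum;
    e->reply_end_seq = 0;
    return e;
}

/* The reply has been sent; remember where it ends in the TCP stream. */
static void cache_complete(struct entry *e, uint32_t reply_end_seq)
{
    e->state = ENTRY_DONE;
    e->reply_end_seq = reply_end_seq;
}

/* A TCP ack up to acked_seq means the client has received the reply, so
 * it will never retransmit that request; evict the entry. Serial-number
 * comparison copes with sequence-number wraparound. */
static void cache_ack(int conn, uint32_t acked_seq)
{
    for (int i = 0; i < PER_CONN; i++) {
        struct entry *e = &conns[conn].slot[i];
        if (e->state == ENTRY_DONE &&
            (int32_t)(acked_seq - e->reply_end_seq) >= 0)
            e->state = ENTRY_EMPTY;
    }
}
```

Because cache_lookup() ignores the connection, a retransmission arriving on a fresh connection after a partition still finds the completed entry, while acked replies are dropped per connection.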

He also has more detailed notes at
ftp://ftp.cis.uoguelph.ca:/pub/nfsv4/server-cache.algorithm
and code in newnfs/nfsd/nfsd_srvcache.c in any of the nfsv4-fullkern* tarballs
on the same site.

--b.
