2013-02-07 14:51:46

by Jeff Layton

Subject: [PATCH v3 0/2] nfsd: checksum first 256 bytes of request to guard against XID collisions in the DRC

This patchset is a respin of the patches from my DRC overhaul that Bruce
has not yet committed. The main difference is the first patch, which adds
a routine to strip the trailing checksum off of a decrypted or
integrity-verified buffer.

I've tested both the client and server with different krb5 flavors (and
using both the v1 and v2 codepaths) and it seems to work fine with those
checksum blobs stripped off.

I think we should consider this for 3.9 since XID collisions are a lot
more likely now with the new DRC code in place.

Jeff Layton (2):
sunrpc: trim off trailing checksum before returning decrypted or
integrity authenticated buffer
nfsd: keep a checksum of the first 256 bytes of request

fs/nfsd/cache.h | 5 ++++
fs/nfsd/nfscache.c | 53 ++++++++++++++++++++++++++++++++++---
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/auth_gss/gss_krb5_wrap.c | 2 ++
net/sunrpc/auth_gss/svcauth_gss.c | 10 +++++--
net/sunrpc/xdr.c | 42 +++++++++++++++++++++++++++++
6 files changed, 107 insertions(+), 6 deletions(-)

--
1.7.11.7



2013-02-08 15:57:07

by Jeff Layton

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Fri, 8 Feb 2013 10:42:20 -0500
Chuck Lever <[email protected]> wrote:

>
> On Feb 8, 2013, at 8:27 AM, Jeff Layton <[email protected]> wrote:
>
> > On Thu, 7 Feb 2013 13:03:16 -0500
> > Jeff Layton <[email protected]> wrote:
> >
> >> On Thu, 7 Feb 2013 10:51:02 -0500
> >> Chuck Lever <[email protected]> wrote:
> >>
> >>>
> >>> On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
> >>>
> >>>> Now that we're allowing more DRC entries, it becomes a lot easier to hit
> >>>> problems with XID collisions. In order to mitigate those, calculate the
> >>>> crc32 of up to the first 256 bytes of each request coming in and store
> >>>> that in the cache entry, along with the total length of the request.
> >>>
> >>> I'm happy to see a checksummed DRC finally become reality for the Linux NFS server.
> >>>
> >>> Have you measured the CPU utilization impact and CPU cache footprint of performing a CRC computation for every incoming RPC? I'm wondering if a simpler checksum might be just as useful but less costly to compute.
> >>>
> >>
> >> No, I haven't, at least not in any sort of rigorous way. It's pretty
> >> negligible on "normal" PC hardware, but I think most intel and amd cpus
> >> have instructions for handling crc32. I'm ok with a different checksum,
> >> we don't need anything cryptographically secure here. I simply chose
> >> crc32 since it has an easily available API, and I figured it would be
> >> fairly lightweight.
> >>
> >
> > After an abortive attempt to measure this with ftrace, I ended up
> > hacking together a patch to just measure the latency of the
> > nfsd_cache_csum/_crc functions to get some rough numbers. On my x86_64
> > KVM guest, the avg time to calculate the crc32 is ~1750ns. Using IP
> > checksums cuts that roughly in half to ~800ns. I'm not sure how best to
> > measure the cache footprint however.
>
> Thanks for sorting that, I feel better having a real data point.
>

Me too -- thanks for raising the question.

> > Neither seems terribly significant, especially given the other
> > inefficiencies in this code. OTOH, I guess those latencies can add up,
> > and I don't see any need to use crc32 over the net/checksum.h routines.
> > We probably ought to go with my RFC patch from yesterday.
> >
> > This could look very different on other arches too of course, but I
> > don't have a test rig for any other arches at the moment.
>
> Going with the IP checksum seems perfectly adequate to me.
>

Agreed.

> Bruce made a good point that it is difficult to know how to test the efficacy of this change. We don't really know of any especially bad cases that defeat the current DRC, though I know the Red Hat Q/A team likes to use NFS/UDP and they seem to run into problems with some frequency. Having a chat with Rick Macklem and/or Tom Talpey on this topic might have some value, and likewise perhaps the bug's OP might have some ideas about what test cases they prefer.
>

It is difficult to know since XID collisions are so rare (or at least
we think so). When they do happen they're often not noticeable, but we
do have some documented cases of it happening in the field...

RH's QA team does test with NFS/UDP, but I'm not aware of any problems
with XID collisions per se. The problems they've had are that the DRC
sometimes just plain doesn't work. Based on some testing we've done
internally, I think the problem is likely that it's trivial to blow out
the current fixed-size DRC. Until I backport this and get them to
deploy it onto our test NFS servers, though, it's going to be hard to
know for certain.

I decided to add in the extra checksumming for 3 main reasons:

1) I was in here anyway, and it didn't seem too tough to do

2) the potential for XID collisions will be greater with a larger DRC

3) we've had people request it in the past

Given that it seems fairly cheap to add this, we might as well do it.

--
Jeff Layton <[email protected]>

2013-02-07 16:00:36

by J. Bruce Fields

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Thu, Feb 07, 2013 at 10:51:02AM -0500, Chuck Lever wrote:
>
> On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
>
> > Now that we're allowing more DRC entries, it becomes a lot easier to
> > hit problems with XID collisions. In order to mitigate those,
> > calculate the crc32 of up to the first 256 bytes of each request
> > coming in and store that in the cache entry, along with the total
> > length of the request.
>
> I'm happy to see a checksummed DRC finally become reality for the
> Linux NFS server.
>
> Have you measured the CPU utilization impact and CPU cache footprint
> of performing a CRC computation for every incoming RPC?

Note this is over the first 256 bytes of the request--which we're
probably just about to read for xdr decoding anyway.

> I'm wondering if a simpler checksum might be just as useful but less
> costly to compute.

What would be an example of a simpler checksum?

--b.

>
>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfsd/cache.h | 5 +++++
> > fs/nfsd/nfscache.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++----
> > 2 files changed, 54 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> > index 9c7232b..4822db3 100644
> > --- a/fs/nfsd/cache.h
> > +++ b/fs/nfsd/cache.h
> > @@ -29,6 +29,8 @@ struct svc_cacherep {
> > u32 c_prot;
> > u32 c_proc;
> > u32 c_vers;
> > + unsigned int c_len;
> > + u32 c_crc;
> > unsigned long c_timestamp;
> > union {
> > struct kvec u_vec;
> > @@ -73,6 +75,9 @@ enum {
> > /* Cache entries expire after this time period */
> > #define RC_EXPIRE (120 * HZ)
> >
> > +/* Checksum this amount of the request */
> > +#define RC_CSUMLEN (256U)
> > +
> > int nfsd_reply_cache_init(void);
> > void nfsd_reply_cache_shutdown(void);
> > int nfsd_cache_lookup(struct svc_rqst *);
> > diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> > index f754469..a8c3f1e 100644
> > --- a/fs/nfsd/nfscache.c
> > +++ b/fs/nfsd/nfscache.c
> > @@ -11,6 +11,8 @@
> > #include <linux/slab.h>
> > #include <linux/sunrpc/addr.h>
> > #include <linux/highmem.h>
> > +#include <linux/crc32.h>
> > +#include <linux/sunrpc/svcauth_gss.h>
> >
> > #include "nfsd.h"
> > #include "cache.h"
> > @@ -24,6 +26,7 @@ static struct list_head lru_head;
> > static struct kmem_cache *drc_slab;
> > static unsigned int num_drc_entries;
> > static unsigned int max_drc_entries;
> > +static u32 crc_seed;
> >
> > /*
> > * Calculate the hash index from an XID.
> > @@ -130,6 +133,9 @@ int nfsd_reply_cache_init(void)
> > INIT_LIST_HEAD(&lru_head);
> > max_drc_entries = nfsd_cache_size_limit();
> > num_drc_entries = 0;
> > +
> > + /* Is a random seed any better than some well-defined constant? */
> > + get_random_bytes(&crc_seed, sizeof(crc_seed));
> > return 0;
> > out_nomem:
> > printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
> > @@ -238,12 +244,45 @@ nfsd_reply_cache_shrink(struct shrinker *shrink, struct shrink_control *sc)
> > }
> >
> > /*
> > + * Walk an xdr_buf and get a CRC for at most the first RC_CSUMLEN bytes
> > + */
> > +static u32
> > +nfsd_cache_crc(struct svc_rqst *rqstp)
> > +{
> > + int idx;
> > + unsigned int base;
> > + u32 crc;
> > + struct xdr_buf *buf = &rqstp->rq_arg;
> > + const unsigned char *p = buf->head[0].iov_base;
> > + size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
> > + RC_CSUMLEN);
> > + size_t len = min(buf->head[0].iov_len, csum_len);
> > +
> > + /* rq_arg.head first */
> > + crc = crc32(crc_seed, p, len);
> > + csum_len -= len;
> > +
> > + /* Continue into page array */
> > + idx = buf->page_base / PAGE_SIZE;
> > + base = buf->page_base & ~PAGE_MASK;
> > + while (csum_len) {
> > + p = page_address(buf->pages[idx]) + base;
> > + len = min(PAGE_SIZE - base, csum_len);
> > + crc = crc32(crc, p, len);
> > + csum_len -= len;
> > + base = 0;
> > + ++idx;
> > + }
> > + return crc;
> > +}
> > +
> > +/*
> > * Search the request hash for an entry that matches the given rqstp.
> > * Must be called with cache_lock held. Returns the found entry or
> > * NULL on failure.
> > */
> > static struct svc_cacherep *
> > -nfsd_cache_search(struct svc_rqst *rqstp)
> > +nfsd_cache_search(struct svc_rqst *rqstp, u32 crc)
> > {
> > struct svc_cacherep *rp;
> > struct hlist_node *hn;
> > @@ -257,6 +296,7 @@ nfsd_cache_search(struct svc_rqst *rqstp)
> > hlist_for_each_entry(rp, hn, rh, c_hash) {
> > if (xid == rp->c_xid && proc == rp->c_proc &&
> > proto == rp->c_prot && vers == rp->c_vers &&
> > + rqstp->rq_arg.len == rp->c_len && crc == rp->c_crc &&
> > rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
> > rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
> > return rp;
> > @@ -276,7 +316,8 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> > __be32 xid = rqstp->rq_xid;
> > u32 proto = rqstp->rq_prot,
> > vers = rqstp->rq_vers,
> > - proc = rqstp->rq_proc;
> > + proc = rqstp->rq_proc,
> > + crc;
> > unsigned long age;
> > int type = rqstp->rq_cachetype;
> > int rtn;
> > @@ -287,10 +328,12 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> > return RC_DOIT;
> > }
> >
> > + crc = nfsd_cache_crc(rqstp);
> > +
> > spin_lock(&cache_lock);
> > rtn = RC_DOIT;
> >
> > - rp = nfsd_cache_search(rqstp);
> > + rp = nfsd_cache_search(rqstp, crc);
> > if (rp)
> > goto found_entry;
> >
> > @@ -318,7 +361,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> > * Must search again just in case someone inserted one
> > * after we dropped the lock above.
> > */
> > - found = nfsd_cache_search(rqstp);
> > + found = nfsd_cache_search(rqstp, crc);
> > if (found) {
> > nfsd_reply_cache_free_locked(rp);
> > rp = found;
> > @@ -344,6 +387,8 @@ setup_entry:
> > rpc_set_port((struct sockaddr *)&rp->c_addr, rpc_get_port(svc_addr(rqstp)));
> > rp->c_prot = proto;
> > rp->c_vers = vers;
> > + rp->c_len = rqstp->rq_arg.len;
> > + rp->c_crc = crc;
> >
> > hash_refile(rp);
> > lru_put_end(rp);
> > --
> > 1.7.11.7
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>

2013-02-08 20:55:57

by J. Bruce Fields

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Fri, Feb 08, 2013 at 08:27:06AM -0500, Jeff Layton wrote:
> On Thu, 7 Feb 2013 13:03:16 -0500
> Jeff Layton <[email protected]> wrote:
>
> > On Thu, 7 Feb 2013 10:51:02 -0500
> > Chuck Lever <[email protected]> wrote:
> >
> > >
> > > On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
> > >
> > > > Now that we're allowing more DRC entries, it becomes a lot easier to hit
> > > > problems with XID collisions. In order to mitigate those, calculate the
> > > > crc32 of up to the first 256 bytes of each request coming in and store
> > > > that in the cache entry, along with the total length of the request.
> > >
> > > I'm happy to see a checksummed DRC finally become reality for the Linux NFS server.
> > >
> > > Have you measured the CPU utilization impact and CPU cache footprint of performing a CRC computation for every incoming RPC? I'm wondering if a simpler checksum might be just as useful but less costly to compute.
> > >
> >
> > No, I haven't, at least not in any sort of rigorous way. It's pretty
> > negligible on "normal" PC hardware, but I think most intel and amd cpus
> > have instructions for handling crc32. I'm ok with a different checksum,
> > we don't need anything cryptographically secure here. I simply chose
> > crc32 since it has an easily available API, and I figured it would be
> > fairly lightweight.
> >
>
> After an abortive attempt to measure this with ftrace, I ended up
> hacking together a patch to just measure the latency of the
> nfsd_cache_csum/_crc functions to get some rough numbers. On my x86_64
> KVM guest, the avg time to calculate the crc32 is ~1750ns. Using IP
> checksums cuts that roughly in half to ~800ns. I'm not sure how best to
> measure the cache footprint however.
>
> Neither seems terribly significant, especially given the other
> inefficiencies in this code. OTOH, I guess those latencies can add up,
> and I don't see any need to use crc32 over the net/checksum.h routines.
> We probably ought to go with my RFC patch from yesterday.

OK, I hadn't committed the original yet, so I've just rolled them
together and added a little of the above to the changelog. Look OK?
Chuck, should I add a Reviewed-by: ?

--b.

commit a937bd422ccc4306cdc81b5aa60b12a7212b70d3
Author: Jeff Layton <[email protected]>
Date: Mon Feb 4 11:57:27 2013 -0500

nfsd: keep a checksum of the first 256 bytes of request

Now that we're allowing more DRC entries, it becomes a lot easier to hit
problems with XID collisions. In order to mitigate those, calculate a
checksum of up to the first 256 bytes of each request coming in and store
that in the cache entry, along with the total length of the request.

This initially used crc32, but Chuck Lever and Jim Rees pointed out that
crc32 is probably more heavyweight than we really need for generating
these checksums, and recommended looking at using the same routines that
are used to generate checksums for IP packets.

On an x86_64 KVM guest measurements with ftrace showed ~800ns to use
csum_partial vs ~1750ns for crc32. The difference probably isn't
terribly significant, but for now we may as well use csum_partial.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
index 9c7232b..87fd141 100644
--- a/fs/nfsd/cache.h
+++ b/fs/nfsd/cache.h
@@ -29,6 +29,8 @@ struct svc_cacherep {
u32 c_prot;
u32 c_proc;
u32 c_vers;
+ unsigned int c_len;
+ __wsum c_csum;
unsigned long c_timestamp;
union {
struct kvec u_vec;
@@ -73,6 +75,9 @@ enum {
/* Cache entries expire after this time period */
#define RC_EXPIRE (120 * HZ)

+/* Checksum this amount of the request */
+#define RC_CSUMLEN (256U)
+
int nfsd_reply_cache_init(void);
void nfsd_reply_cache_shutdown(void);
int nfsd_cache_lookup(struct svc_rqst *);
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index f754469..40db57e 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/sunrpc/addr.h>
#include <linux/highmem.h>
+#include <net/checksum.h>

#include "nfsd.h"
#include "cache.h"
@@ -130,6 +131,7 @@ int nfsd_reply_cache_init(void)
INIT_LIST_HEAD(&lru_head);
max_drc_entries = nfsd_cache_size_limit();
num_drc_entries = 0;
+
return 0;
out_nomem:
printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
@@ -238,12 +240,45 @@ nfsd_reply_cache_shrink(struct shrinker *shrink, struct shrink_control *sc)
}

/*
+ * Walk an xdr_buf and get a CRC for at most the first RC_CSUMLEN bytes
+ */
+static __wsum
+nfsd_cache_csum(struct svc_rqst *rqstp)
+{
+ int idx;
+ unsigned int base;
+ __wsum csum;
+ struct xdr_buf *buf = &rqstp->rq_arg;
+ const unsigned char *p = buf->head[0].iov_base;
+ size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
+ RC_CSUMLEN);
+ size_t len = min(buf->head[0].iov_len, csum_len);
+
+ /* rq_arg.head first */
+ csum = csum_partial(p, len, 0);
+ csum_len -= len;
+
+ /* Continue into page array */
+ idx = buf->page_base / PAGE_SIZE;
+ base = buf->page_base & ~PAGE_MASK;
+ while (csum_len) {
+ p = page_address(buf->pages[idx]) + base;
+ len = min(PAGE_SIZE - base, csum_len);
+ csum = csum_partial(p, len, csum);
+ csum_len -= len;
+ base = 0;
+ ++idx;
+ }
+ return csum;
+}
+
+/*
* Search the request hash for an entry that matches the given rqstp.
* Must be called with cache_lock held. Returns the found entry or
* NULL on failure.
*/
static struct svc_cacherep *
-nfsd_cache_search(struct svc_rqst *rqstp)
+nfsd_cache_search(struct svc_rqst *rqstp, __wsum csum)
{
struct svc_cacherep *rp;
struct hlist_node *hn;
@@ -257,6 +292,7 @@ nfsd_cache_search(struct svc_rqst *rqstp)
hlist_for_each_entry(rp, hn, rh, c_hash) {
if (xid == rp->c_xid && proc == rp->c_proc &&
proto == rp->c_prot && vers == rp->c_vers &&
+ rqstp->rq_arg.len == rp->c_len && csum == rp->c_csum &&
rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
return rp;
@@ -277,6 +313,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
u32 proto = rqstp->rq_prot,
vers = rqstp->rq_vers,
proc = rqstp->rq_proc;
+ __wsum csum;
unsigned long age;
int type = rqstp->rq_cachetype;
int rtn;
@@ -287,10 +324,12 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
return RC_DOIT;
}

+ csum = nfsd_cache_csum(rqstp);
+
spin_lock(&cache_lock);
rtn = RC_DOIT;

- rp = nfsd_cache_search(rqstp);
+ rp = nfsd_cache_search(rqstp, csum);
if (rp)
goto found_entry;

@@ -318,7 +357,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
* Must search again just in case someone inserted one
* after we dropped the lock above.
*/
- found = nfsd_cache_search(rqstp);
+ found = nfsd_cache_search(rqstp, csum);
if (found) {
nfsd_reply_cache_free_locked(rp);
rp = found;
@@ -344,6 +383,8 @@ setup_entry:
rpc_set_port((struct sockaddr *)&rp->c_addr, rpc_get_port(svc_addr(rqstp)));
rp->c_prot = proto;
rp->c_vers = vers;
+ rp->c_len = rqstp->rq_arg.len;
+ rp->c_csum = csum;

hash_refile(rp);
lru_put_end(rp);

2013-02-07 18:03:24

by Jeff Layton

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Thu, 7 Feb 2013 10:51:02 -0500
Chuck Lever <[email protected]> wrote:

>
> On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
>
> > Now that we're allowing more DRC entries, it becomes a lot easier to hit
> > problems with XID collisions. In order to mitigate those, calculate the
> > crc32 of up to the first 256 bytes of each request coming in and store
> > that in the cache entry, along with the total length of the request.
>
> I'm happy to see a checksummed DRC finally become reality for the Linux NFS server.
>
> Have you measured the CPU utilization impact and CPU cache footprint of performing a CRC computation for every incoming RPC? I'm wondering if a simpler checksum might be just as useful but less costly to compute.
>

No, I haven't, at least not in any sort of rigorous way. It's pretty
negligible on "normal" PC hardware, but I think most Intel and AMD CPUs
have instructions for handling crc32. I'm OK with a different checksum;
we don't need anything cryptographically secure here. I simply chose
crc32 since it has an easily available API, and I figured it would be
fairly lightweight.

I originally had hopes of just using the checksum in the TCP/UDP
header, but that's computed in hardware by some cards and so isn't
available. There are also things that change during a retransmit (like
GSS sequence numbers), so we can't just scrape those out anyway.

As far as why 256 bytes: we had a bug opened a while back (by Oracle,
I think) that asked us to add this capability and suggested 200 bytes.
I like powers of 2, so I rounded up. We could easily extend that later
by just changing RC_CSUMLEN if we think it's not enough.

--
Jeff Layton <[email protected]>

2013-02-07 16:37:10

by J. Bruce Fields

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Thu, Feb 07, 2013 at 11:23:17AM -0500, Chuck Lever wrote:
>
> On Feb 7, 2013, at 11:00 AM, "J. Bruce Fields" <[email protected]> wrote:
>
> > On Thu, Feb 07, 2013 at 10:51:02AM -0500, Chuck Lever wrote:
> >>
> >> On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
> >>
> >>> Now that we're allowing more DRC entries, it becomes a lot easier to
> >>> hit problems with XID collisions. In order to mitigate those,
> >>> calculate the crc32 of up to the first 256 bytes of each request
> >>> coming in and store that in the cache entry, along with the total
> >>> length of the request.
> >>
> >> I'm happy to see a checksummed DRC finally become reality for the
> >> Linux NFS server.
> >>
> >> Have you measured the CPU utilization impact and CPU cache footprint
> >> of performing a CRC computation for every incoming RPC?
> >
> > Note this is over the first 256 bytes of the request--which we're
> > probably just about to read for xdr decoding anyway.
>
> XDR decoding is copying and branching. Computing a CRC involves real math, which tends to be significantly more expensive than successfully predicted branches, especially on low-power CPUs that might be found in SOHO NAS products.

OK, I wouldn't know.

(I was just responding to the "cache footprint" question--I thought you
were concerned about reading in a bunch of the request.) Looks like the
biggest piece of the crc32 code is a 1k lookup table?

> >> I'm wondering if a simpler checksum might be just as useful but less
> >> costly to compute.
> >
> > What would be an example of a simpler checksum?
>
> The same one TCP uses, like a simple additive sum, or an XOR. Is a heavyweight checksum needed because checksums generated with a simple function are more likely to collide?
>
> Not that this should hold up merging Jeff's work! We can easily tweak or replace the checksum algorithm after it's upstream. It's not kABI.
>
> But someone should assess the impact of the additional checksum computation. CRC seems to me heavier than is needed here.

OK, sure, may be worth looking into.

> Possible tweaks:
>
> Why 256 bytes? Is that too much? Or not enough for some NFSv4
> compounds that might often start with the same data? Could we, for
> instance, use fewer bytes for NFSv2 and NFSv3? Or even a variable
> checksum length depending on the NFS operation? Is 256 bytes enough
> for NFSv4.1, whose compounds always start with the same operation?

NFSv4.1 has the DRC turned completely off.

> If integrity or privacy is in play, can we use that information in
> place of a separate DRC checksum?

There's a gss sequence number that's incremented even on resends of the
same rpc, so this doesn't work. (By design: you don't want an attacker
to be able to replay an old rpc.)

--b.

2013-02-08 13:27:16

by Jeff Layton

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Thu, 7 Feb 2013 13:03:16 -0500
Jeff Layton <[email protected]> wrote:

> On Thu, 7 Feb 2013 10:51:02 -0500
> Chuck Lever <[email protected]> wrote:
>
> >
> > On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
> >
> > > Now that we're allowing more DRC entries, it becomes a lot easier to hit
> > > problems with XID collisions. In order to mitigate those, calculate the
> > > crc32 of up to the first 256 bytes of each request coming in and store
> > > that in the cache entry, along with the total length of the request.
> >
> > I'm happy to see a checksummed DRC finally become reality for the Linux NFS server.
> >
> > Have you measured the CPU utilization impact and CPU cache footprint of performing a CRC computation for every incoming RPC? I'm wondering if a simpler checksum might be just as useful but less costly to compute.
> >
>
> No, I haven't, at least not in any sort of rigorous way. It's pretty
> negligible on "normal" PC hardware, but I think most intel and amd cpus
> have instructions for handling crc32. I'm ok with a different checksum,
> we don't need anything cryptographically secure here. I simply chose
> crc32 since it has an easily available API, and I figured it would be
> fairly lightweight.
>

After an abortive attempt to measure this with ftrace, I ended up
hacking together a patch to just measure the latency of the
nfsd_cache_csum/_crc functions to get some rough numbers. On my x86_64
KVM guest, the avg time to calculate the crc32 is ~1750ns. Using IP
checksums cuts that roughly in half to ~800ns. I'm not sure how best to
measure the cache footprint however.

Neither seems terribly significant, especially given the other
inefficiencies in this code. OTOH, I guess those latencies can add up,
and I don't see any need to use crc32 over the net/checksum.h routines.
We probably ought to go with my RFC patch from yesterday.

This could look very different on other arches too of course, but I
don't have a test rig for any other arches at the moment.

--
Jeff Layton <[email protected]>

2013-02-08 15:41:28

by Jeff Layton

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Thu, 7 Feb 2013 16:32:20 +0000
"Myklebust, Trond" <[email protected]> wrote:

> > -----Original Message-----
> > From: [email protected] [mailto:linux-nfs-
> > [email protected]] On Behalf Of J. Bruce Fields
> > Sent: Thursday, February 07, 2013 11:01 AM
> > To: Chuck Lever
> > Cc: Jeff Layton; [email protected]
> > Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of
> > request
> >
> > On Thu, Feb 07, 2013 at 10:51:02AM -0500, Chuck Lever wrote:
> > >
> > > On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
> > >
> > > > Now that we're allowing more DRC entries, it becomes a lot easier to
> > > > hit problems with XID collisions. In order to mitigate those,
> > > > calculate the crc32 of up to the first 256 bytes of each request
> > > > coming in and store that in the cache entry, along with the total
> > > > length of the request.
> > >
> > > I'm happy to see a checksummed DRC finally become reality for the
> > > Linux NFS server.
> > >
> > > Have you measured the CPU utilization impact and CPU cache footprint
> > > of performing a CRC computation for every incoming RPC?
> >
> > Note this is over the first 256 bytes of the request--which we're probably just
> > about to read for xdr decoding anyway.
>
> - Would it make sense perhaps to generate the checksum as you are reading the data?

Maybe, but I'm not sure it's worth the engineering effort. On my highly
unscientific test rig, the time to calculate the checksum is ~800ns
using the IP checksum routine. If we can run the checksum while we read
the data off the socket, then we might shave some of that off, but that
"feels" like small potatoes to me.

OTOH, I don't have a good way to measure what the effect of this is on
the CPU cache. Any ideas of how to measure whether that's significant?

> - Also, is 256 bytes sufficient? How far does that get you with your average WRITE compound?

I also had a look at this, and it looks like the data portion of a
WRITE compound from the Linux client starts at ~80 bytes from the
beginning of the NFS part of the frame. So with a 256 byte checksum
length, we're checksumming over about 176 bytes of write data from a
typical NFSv4.0 WRITE request.

It might not hurt to extend that out to 384 bytes or so? We have to
figure that some clients also set the tag field, so having a bit of
fudge factor for that might be reasonable.

Making it variable doesn't seem like it would buy us too much since
most calls are much smaller than 256 bytes to begin with.

--
Jeff Layton <[email protected]>

2013-02-07 18:35:42

by Jeff Layton

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Thu, 7 Feb 2013 16:32:20 +0000
"Myklebust, Trond" <[email protected]> wrote:

> > -----Original Message-----
> > From: [email protected] [mailto:linux-nfs-
> > [email protected]] On Behalf Of J. Bruce Fields
> > Sent: Thursday, February 07, 2013 11:01 AM
> > To: Chuck Lever
> > Cc: Jeff Layton; [email protected]
> > Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of
> > request
> >
> > On Thu, Feb 07, 2013 at 10:51:02AM -0500, Chuck Lever wrote:
> > >
> > > On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
> > >
> > > > Now that we're allowing more DRC entries, it becomes a lot easier to
> > > > hit problems with XID collisions. In order to mitigate those,
> > > > calculate the crc32 of up to the first 256 bytes of each request
> > > > coming in and store that in the cache entry, along with the total
> > > > length of the request.
> > >
> > > I'm happy to see a checksummed DRC finally become reality for the
> > > Linux NFS server.
> > >
> > > Have you measured the CPU utilization impact and CPU cache footprint
> > > of performing a CRC computation for every incoming RPC?
> >
> > Note this is over the first 256 bytes of the request--which we're probably just
> > about to read for xdr decoding anyway.
>
> - Would it make sense perhaps to generate the checksum as you are reading the data?

It would be nice, but that would require some significant
reengineering, AFAICT. I'm not sure it's worth all of that, but maybe
there's an easy way to do it that I'm not seeing.

> - Also, is 256 bytes sufficient? How far does that get you with your average WRITE compound?

Mostly that length comes from a bug we had opened a while back which was
entitled "Oracle has insisted of all the NAS vendors that for all the
dNFS IO, the first 200 bytes check-sum of every write to be validated
before the commit takes place." The bug is marked private or I'd post a
link to it here.

In any case, the title is poorly worded, but basically they were saying
we should checksum the first 200 bytes of write data as a guard against
xid collisions in the DRC. I rounded it up to 256 just because I like
powers of 2 and we needed some extra to cover the NFS header anyway.

We could always extend that, or even make it variable based on some
criteria.

> - Could the integrity checksum in RPCSEC_GSS/krbi be reused as a DRC checksum?
>

Sadly, no. As Bruce pointed out, that has GSSAPI sequence numbers,
which change on a retransmit. Scraping the checksum out of the TCP/UDP
headers somehow is also problematic for similar reasons...

--
Jeff Layton <[email protected]>

2013-02-07 15:11:03

by J. Bruce Fields

Subject: Re: [PATCH v3 0/2] nfsd: checksum first 256 bytes of request to guard against XID collisions in the DRC

On Thu, Feb 07, 2013 at 09:51:39AM -0500, Jeff Layton wrote:
> This patchset is a respin of the patches from my DRC overhaul that Bruce
> has not yet committed. The main difference is the first patch, which adds
> a routine to strip the trailing checksum off of a decrypted or
> integrity-verified buffer.
>
> I've tested both the client and server with different krb5 flavors (and
> using both the v1 and v2 codepaths) and it seems to work fine with those
> checksum blobs stripped off.
>
> I think we should consider this for 3.9 since XID collisions are a lot
> more likely now with the new DRC code in place.

Looks good--applying for 3.9 absent any objections.

--b.

>
> Jeff Layton (2):
> sunrpc: trim off trailing checksum before returning decrypted or
> integrity authenticated buffer
> nfsd: keep a checksum of the first 256 bytes of request
>
> fs/nfsd/cache.h | 5 ++++
> fs/nfsd/nfscache.c | 53 ++++++++++++++++++++++++++++++++++---
> include/linux/sunrpc/xdr.h | 1 +
> net/sunrpc/auth_gss/gss_krb5_wrap.c | 2 ++
> net/sunrpc/auth_gss/svcauth_gss.c | 10 +++++--
> net/sunrpc/xdr.c | 42 +++++++++++++++++++++++++++++
> 6 files changed, 107 insertions(+), 6 deletions(-)
>
> --
> 1.7.11.7
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-02-07 16:41:44

by Jim Rees

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

Chuck Lever wrote:

>
>> I'm wondering if a simpler checksum might be just as useful but less
>> costly to compute.
>
> What would be an example of a simpler checksum?

The same one TCP uses, like a simple additive sum, or an XOR. Is a
heavyweight checksum needed because checksums generated with a simple
function are more likely to collide?

At least CRC isn't as bad as a crypto checksum like md5. Those are often
misused when what's really wanted is just a simple guard against accidental
corruption. In this case I'd go with the TCP checksum, which I think has an
API and even an accelerated implementation depending on the machine arch,
although I can't find it right now.

2013-02-08 15:42:34

by Chuck Lever

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request


On Feb 8, 2013, at 8:27 AM, Jeff Layton <[email protected]> wrote:

> On Thu, 7 Feb 2013 13:03:16 -0500
> Jeff Layton <[email protected]> wrote:
>
>> On Thu, 7 Feb 2013 10:51:02 -0500
>> Chuck Lever <[email protected]> wrote:
>>
>>>
>>> On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
>>>
>>>> Now that we're allowing more DRC entries, it becomes a lot easier to hit
>>>> problems with XID collisions. In order to mitigate those, calculate the
>>>> crc32 of up to the first 256 bytes of each request coming in and store
>>>> that in the cache entry, along with the total length of the request.
>>>
>>> I'm happy to see a checksummed DRC finally become reality for the Linux NFS server.
>>>
>>> Have you measured the CPU utilization impact and CPU cache footprint of performing a CRC computation for every incoming RPC? I'm wondering if a simpler checksum might be just as useful but less costly to compute.
>>>
>>
>> No, I haven't, at least not in any sort of rigorous way. It's pretty
>> negligible on "normal" PC hardware, but I think most intel and amd cpus
>> have instructions for handling crc32. I'm ok with a different checksum,
>> we don't need anything cryptographically secure here. I simply chose
>> crc32 since it has an easily available API, and I figured it would be
>> fairly lightweight.
>>
>
> After an abortive attempt to measure this with ftrace, I ended up
> hacking together a patch to just measure the latency of the
> nfsd_cache_csum/_crc functions to get some rough numbers. On my x86_64
> KVM guest, the avg time to calculate the crc32 is ~1750ns. Using IP
> checksums cuts that roughly in half to ~800ns. I'm not sure how best to
> measure the cache footprint however.

Thanks for sorting that, I feel better having a real data point.

> Neither seems terribly significant, especially given the other
> inefficiencies in this code. OTOH, I guess those latencies can add up,
> and I don't see any need to use crc32 over the net/checksum.h routines.
> We probably ought to go with my RFC patch from yesterday.
>
> This could look very different on other arches too of course, but I
> don't have a test rig for any other arches at the moment.

Going with the IP checksum seems perfectly adequate to me.

Bruce made a good point that it is difficult to know how to test the efficacy of this change. We don't really know of any especially bad cases that defeat the current DRC, though I know the Red Hat Q/A team likes to use NFS/UDP and they seem to run into problems with some frequency. Having a chat with Rick Macklem and/or Tom Talpey on this topic might have some value, and likewise perhaps the bug's OP might have some ideas about what test cases they prefer.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2013-02-07 14:51:47

by Jeff Layton

Subject: [PATCH v3 1/2] sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer

When GSSAPI integrity signatures are in use, or when we're using GSSAPI
privacy with the v2 token format, there is a trailing checksum on the
xdr_buf that is returned.

It's checked during the authentication stage, and afterward nothing
cares about it. Ordinarily, it's not a problem since the XDR code
generally ignores it, but it will be when we try to compute a checksum
over the buffer to help prevent XID collisions in the duplicate reply
cache.

Fix the code to trim off the checksums after verifying them. Note that
in unwrap_integ_data, we must avoid trying to reverify the checksum if
the request was deferred since it will no longer be present when it's
revisited.

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/auth_gss/gss_krb5_wrap.c | 2 ++
net/sunrpc/auth_gss/svcauth_gss.c | 10 +++++++--
net/sunrpc/xdr.c | 42 +++++++++++++++++++++++++++++++++++++
4 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 224d060..15f9204 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -152,6 +152,7 @@ xdr_adjust_iovec(struct kvec *iov, __be32 *p)
extern void xdr_shift_buf(struct xdr_buf *, size_t);
extern void xdr_buf_from_iov(struct kvec *, struct xdr_buf *);
extern int xdr_buf_subsegment(struct xdr_buf *, struct xdr_buf *, unsigned int, unsigned int);
+extern void xdr_buf_trim(struct xdr_buf *, unsigned int);
extern int xdr_buf_read_netobj(struct xdr_buf *, struct xdr_netobj *, unsigned int);
extern int read_bytes_from_xdr_buf(struct xdr_buf *, unsigned int, void *, unsigned int);
extern int write_bytes_to_xdr_buf(struct xdr_buf *, unsigned int, void *, unsigned int);
diff --git a/net/sunrpc/auth_gss/gss_krb5_wrap.c b/net/sunrpc/auth_gss/gss_krb5_wrap.c
index 107c452..88edec9 100644
--- a/net/sunrpc/auth_gss/gss_krb5_wrap.c
+++ b/net/sunrpc/auth_gss/gss_krb5_wrap.c
@@ -574,6 +574,8 @@ gss_unwrap_kerberos_v2(struct krb5_ctx *kctx, int offset, struct xdr_buf *buf)
buf->head[0].iov_len -= GSS_KRB5_TOK_HDR_LEN + headskip;
buf->len -= GSS_KRB5_TOK_HDR_LEN + headskip;

+ /* Trim off the checksum blob */
+ xdr_buf_trim(buf, GSS_KRB5_TOK_HDR_LEN + tailskip);
return GSS_S_COMPLETE;
}

diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 73e9573..a5b41e2 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -817,13 +817,17 @@ read_u32_from_xdr_buf(struct xdr_buf *buf, int base, u32 *obj)
* The server uses base of head iovec as read pointer, while the
* client uses separate pointer. */
static int
-unwrap_integ_data(struct xdr_buf *buf, u32 seq, struct gss_ctx *ctx)
+unwrap_integ_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct gss_ctx *ctx)
{
int stat = -EINVAL;
u32 integ_len, maj_stat;
struct xdr_netobj mic;
struct xdr_buf integ_buf;

+ /* Did we already verify the signature on the original pass through? */
+ if (rqstp->rq_deferred)
+ return 0;
+
integ_len = svc_getnl(&buf->head[0]);
if (integ_len & 3)
return stat;
@@ -846,6 +850,8 @@ unwrap_integ_data(struct xdr_buf *buf, u32 seq, struct gss_ctx *ctx)
goto out;
if (svc_getnl(&buf->head[0]) != seq)
goto out;
+ /* trim off the mic at the end before returning */
+ xdr_buf_trim(buf, mic.len + 4);
stat = 0;
out:
kfree(mic.data);
@@ -1190,7 +1196,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp)
/* placeholders for length and seq. number: */
svc_putnl(resv, 0);
svc_putnl(resv, 0);
- if (unwrap_integ_data(&rqstp->rq_arg,
+ if (unwrap_integ_data(rqstp, &rqstp->rq_arg,
gc->gc_seq, rsci->mechctx))
goto garbage_args;
break;
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 5605563..02c1577 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -879,6 +879,48 @@ xdr_buf_subsegment(struct xdr_buf *buf, struct xdr_buf *subbuf,
}
EXPORT_SYMBOL_GPL(xdr_buf_subsegment);

+/**
+ * xdr_buf_trim - lop at most "len" bytes off the end of "buf"
+ * @buf: buf to be trimmed
+ * @len: number of bytes to reduce "buf" by
+ *
+ * Trim an xdr_buf by the given number of bytes by fixing up the lengths. Note
+ * that it's possible that we'll trim less than that amount if the xdr_buf is
+ * too small, or if (for instance) it's all in the head and the parser has
+ * already read too far into it.
+ */
+void xdr_buf_trim(struct xdr_buf *buf, unsigned int len)
+{
+ size_t cur;
+ unsigned int trim = len;
+
+ if (buf->tail[0].iov_len) {
+ cur = min_t(size_t, buf->tail[0].iov_len, trim);
+ buf->tail[0].iov_len -= cur;
+ trim -= cur;
+ /* XXX: ok to leave tail[0].iov_base as non-NULL here? */
+ if (!trim)
+ goto fix_len;
+ }
+
+ if (buf->page_len) {
+ cur = min_t(unsigned int, buf->page_len, trim);
+ buf->page_len -= cur;
+ trim -= cur;
+ if (!trim)
+ goto fix_len;
+ }
+
+ if (buf->head[0].iov_len) {
+ cur = min_t(size_t, buf->head[0].iov_len, trim);
+ buf->head[0].iov_len -= cur;
+ trim -= cur;
+ }
+fix_len:
+ buf->len -= (len - trim);
+}
+EXPORT_SYMBOL_GPL(xdr_buf_trim);
+
static void __read_bytes_from_xdr_buf(struct xdr_buf *subbuf, void *obj, unsigned int len)
{
unsigned int this_len;
--
1.7.11.7


2013-02-09 11:37:00

by Jeff Layton

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Fri, 8 Feb 2013 15:55:55 -0500
"J. Bruce Fields" <[email protected]> wrote:

> On Fri, Feb 08, 2013 at 08:27:06AM -0500, Jeff Layton wrote:
> > On Thu, 7 Feb 2013 13:03:16 -0500
> > Jeff Layton <[email protected]> wrote:
> >
> > > On Thu, 7 Feb 2013 10:51:02 -0500
> > > Chuck Lever <[email protected]> wrote:
> > >
> > > >
> > > > On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
> > > >
> > > > > Now that we're allowing more DRC entries, it becomes a lot easier to hit
> > > > > problems with XID collisions. In order to mitigate those, calculate the
> > > > > crc32 of up to the first 256 bytes of each request coming in and store
> > > > > that in the cache entry, along with the total length of the request.
> > > >
> > > > I'm happy to see a checksummed DRC finally become reality for the Linux NFS server.
> > > >
> > > > Have you measured the CPU utilization impact and CPU cache footprint of performing a CRC computation for every incoming RPC? I'm wondering if a simpler checksum might be just as useful but less costly to compute.
> > > >
> > >
> > > No, I haven't, at least not in any sort of rigorous way. It's pretty
> > > negligible on "normal" PC hardware, but I think most intel and amd cpus
> > > have instructions for handling crc32. I'm ok with a different checksum,
> > > we don't need anything cryptographically secure here. I simply chose
> > > crc32 since it has an easily available API, and I figured it would be
> > > fairly lightweight.
> > >
> >
> > After an abortive attempt to measure this with ftrace, I ended up
> > hacking together a patch to just measure the latency of the
> > nfsd_cache_csum/_crc functions to get some rough numbers. On my x86_64
> > KVM guest, the avg time to calculate the crc32 is ~1750ns. Using IP
> > checksums cuts that roughly in half to ~800ns. I'm not sure how best to
> > measure the cache footprint however.
> >
> > Neither seems terribly significant, especially given the other
> > inefficiencies in this code. OTOH, I guess those latencies can add up,
> > and I don't see any need to use crc32 over the net/checksum.h routines.
> > We probably ought to go with my RFC patch from yesterday.
>
> OK, I hadn't committed the original yet, so I've just rolled them
> together and added a little of the above to the changelog. Look OK?
> Chuck, should I add a Reviewed-by: ?
>
> --b.
>
> commit a937bd422ccc4306cdc81b5aa60b12a7212b70d3
> Author: Jeff Layton <[email protected]>
> Date: Mon Feb 4 11:57:27 2013 -0500
>
> nfsd: keep a checksum of the first 256 bytes of request
>
> Now that we're allowing more DRC entries, it becomes a lot easier to hit
> problems with XID collisions. In order to mitigate those, calculate a
> checksum of up to the first 256 bytes of each request coming in and store
> that in the cache entry, along with the total length of the request.
>
> This initially used crc32, but Chuck Lever and Jim Rees pointed out that
> crc32 is probably more heavyweight than we really need for generating
> these checksums, and recommended looking at using the same routines that
> are used to generate checksums for IP packets.
>
> On an x86_64 KVM guest measurements with ftrace showed ~800ns to use
> csum_partial vs ~1750ns for crc32. The difference probably isn't
> terribly significant, but for now we may as well use csum_partial.
>
> Signed-off-by: Jeff Layton <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
>


Thanks Bruce. Looks good to me.

> diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> index 9c7232b..87fd141 100644
> --- a/fs/nfsd/cache.h
> +++ b/fs/nfsd/cache.h
> @@ -29,6 +29,8 @@ struct svc_cacherep {
> u32 c_prot;
> u32 c_proc;
> u32 c_vers;
> + unsigned int c_len;
> + __wsum c_csum;
> unsigned long c_timestamp;
> union {
> struct kvec u_vec;
> @@ -73,6 +75,9 @@ enum {
> /* Cache entries expire after this time period */
> #define RC_EXPIRE (120 * HZ)
>
> +/* Checksum this amount of the request */
> +#define RC_CSUMLEN (256U)
> +
> int nfsd_reply_cache_init(void);
> void nfsd_reply_cache_shutdown(void);
> int nfsd_cache_lookup(struct svc_rqst *);
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index f754469..40db57e 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -11,6 +11,7 @@
> #include <linux/slab.h>
> #include <linux/sunrpc/addr.h>
> #include <linux/highmem.h>
> +#include <net/checksum.h>
>
> #include "nfsd.h"
> #include "cache.h"
> @@ -130,6 +131,7 @@ int nfsd_reply_cache_init(void)
> INIT_LIST_HEAD(&lru_head);
> max_drc_entries = nfsd_cache_size_limit();
> num_drc_entries = 0;
> +
> return 0;
> out_nomem:
> printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
> @@ -238,12 +240,45 @@ nfsd_reply_cache_shrink(struct shrinker *shrink, struct shrink_control *sc)
> }
>
> /*
> + * Walk an xdr_buf and get a checksum for at most the first RC_CSUMLEN bytes
> + */
> +static __wsum
> +nfsd_cache_csum(struct svc_rqst *rqstp)
> +{
> + int idx;
> + unsigned int base;
> + __wsum csum;
> + struct xdr_buf *buf = &rqstp->rq_arg;
> + const unsigned char *p = buf->head[0].iov_base;
> + size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
> + RC_CSUMLEN);
> + size_t len = min(buf->head[0].iov_len, csum_len);
> +
> + /* rq_arg.head first */
> + csum = csum_partial(p, len, 0);
> + csum_len -= len;
> +
> + /* Continue into page array */
> + idx = buf->page_base / PAGE_SIZE;
> + base = buf->page_base & ~PAGE_MASK;
> + while (csum_len) {
> + p = page_address(buf->pages[idx]) + base;
> + len = min(PAGE_SIZE - base, csum_len);
> + csum = csum_partial(p, len, csum);
> + csum_len -= len;
> + base = 0;
> + ++idx;
> + }
> + return csum;
> +}
> +
> +/*
> * Search the request hash for an entry that matches the given rqstp.
> * Must be called with cache_lock held. Returns the found entry or
> * NULL on failure.
> */
> static struct svc_cacherep *
> -nfsd_cache_search(struct svc_rqst *rqstp)
> +nfsd_cache_search(struct svc_rqst *rqstp, __wsum csum)
> {
> struct svc_cacherep *rp;
> struct hlist_node *hn;
> @@ -257,6 +292,7 @@ nfsd_cache_search(struct svc_rqst *rqstp)
> hlist_for_each_entry(rp, hn, rh, c_hash) {
> if (xid == rp->c_xid && proc == rp->c_proc &&
> proto == rp->c_prot && vers == rp->c_vers &&
> + rqstp->rq_arg.len == rp->c_len && csum == rp->c_csum &&
> rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
> rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
> return rp;
> @@ -277,6 +313,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> u32 proto = rqstp->rq_prot,
> vers = rqstp->rq_vers,
> proc = rqstp->rq_proc;
> + __wsum csum;
> unsigned long age;
> int type = rqstp->rq_cachetype;
> int rtn;
> @@ -287,10 +324,12 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> return RC_DOIT;
> }
>
> + csum = nfsd_cache_csum(rqstp);
> +
> spin_lock(&cache_lock);
> rtn = RC_DOIT;
>
> - rp = nfsd_cache_search(rqstp);
> + rp = nfsd_cache_search(rqstp, csum);
> if (rp)
> goto found_entry;
>
> @@ -318,7 +357,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> * Must search again just in case someone inserted one
> * after we dropped the lock above.
> */
> - found = nfsd_cache_search(rqstp);
> + found = nfsd_cache_search(rqstp, csum);
> if (found) {
> nfsd_reply_cache_free_locked(rp);
> rp = found;
> @@ -344,6 +383,8 @@ setup_entry:
> rpc_set_port((struct sockaddr *)&rp->c_addr, rpc_get_port(svc_addr(rqstp)));
> rp->c_prot = proto;
> rp->c_vers = vers;
> + rp->c_len = rqstp->rq_arg.len;
> + rp->c_csum = csum;
>
> hash_refile(rp);
> lru_put_end(rp);

--
Jeff Layton <[email protected]>

2013-02-07 15:51:13

by Chuck Lever

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request


On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:

> Now that we're allowing more DRC entries, it becomes a lot easier to hit
> problems with XID collisions. In order to mitigate those, calculate the
> crc32 of up to the first 256 bytes of each request coming in and store
> that in the cache entry, along with the total length of the request.

I'm happy to see a checksummed DRC finally become reality for the Linux NFS server.

Have you measured the CPU utilization impact and CPU cache footprint of performing a CRC computation for every incoming RPC? I'm wondering if a simpler checksum might be just as useful but less costly to compute.


> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/cache.h | 5 +++++
> fs/nfsd/nfscache.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 54 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> index 9c7232b..4822db3 100644
> --- a/fs/nfsd/cache.h
> +++ b/fs/nfsd/cache.h
> @@ -29,6 +29,8 @@ struct svc_cacherep {
> u32 c_prot;
> u32 c_proc;
> u32 c_vers;
> + unsigned int c_len;
> + u32 c_crc;
> unsigned long c_timestamp;
> union {
> struct kvec u_vec;
> @@ -73,6 +75,9 @@ enum {
> /* Cache entries expire after this time period */
> #define RC_EXPIRE (120 * HZ)
>
> +/* Checksum this amount of the request */
> +#define RC_CSUMLEN (256U)
> +
> int nfsd_reply_cache_init(void);
> void nfsd_reply_cache_shutdown(void);
> int nfsd_cache_lookup(struct svc_rqst *);
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index f754469..a8c3f1e 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -11,6 +11,8 @@
> #include <linux/slab.h>
> #include <linux/sunrpc/addr.h>
> #include <linux/highmem.h>
> +#include <linux/crc32.h>
> +#include <linux/sunrpc/svcauth_gss.h>
>
> #include "nfsd.h"
> #include "cache.h"
> @@ -24,6 +26,7 @@ static struct list_head lru_head;
> static struct kmem_cache *drc_slab;
> static unsigned int num_drc_entries;
> static unsigned int max_drc_entries;
> +static u32 crc_seed;
>
> /*
> * Calculate the hash index from an XID.
> @@ -130,6 +133,9 @@ int nfsd_reply_cache_init(void)
> INIT_LIST_HEAD(&lru_head);
> max_drc_entries = nfsd_cache_size_limit();
> num_drc_entries = 0;
> +
> + /* Is a random seed any better than some well-defined constant? */
> + get_random_bytes(&crc_seed, sizeof(crc_seed));
> return 0;
> out_nomem:
> printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
> @@ -238,12 +244,45 @@ nfsd_reply_cache_shrink(struct shrinker *shrink, struct shrink_control *sc)
> }
>
> /*
> + * Walk an xdr_buf and get a CRC for at most the first RC_CSUMLEN bytes
> + */
> +static u32
> +nfsd_cache_crc(struct svc_rqst *rqstp)
> +{
> + int idx;
> + unsigned int base;
> + u32 crc;
> + struct xdr_buf *buf = &rqstp->rq_arg;
> + const unsigned char *p = buf->head[0].iov_base;
> + size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
> + RC_CSUMLEN);
> + size_t len = min(buf->head[0].iov_len, csum_len);
> +
> + /* rq_arg.head first */
> + crc = crc32(crc_seed, p, len);
> + csum_len -= len;
> +
> + /* Continue into page array */
> + idx = buf->page_base / PAGE_SIZE;
> + base = buf->page_base & ~PAGE_MASK;
> + while (csum_len) {
> + p = page_address(buf->pages[idx]) + base;
> + len = min(PAGE_SIZE - base, csum_len);
> + crc = crc32(crc, p, len);
> + csum_len -= len;
> + base = 0;
> + ++idx;
> + }
> + return crc;
> +}
> +
> +/*
> * Search the request hash for an entry that matches the given rqstp.
> * Must be called with cache_lock held. Returns the found entry or
> * NULL on failure.
> */
> static struct svc_cacherep *
> -nfsd_cache_search(struct svc_rqst *rqstp)
> +nfsd_cache_search(struct svc_rqst *rqstp, u32 crc)
> {
> struct svc_cacherep *rp;
> struct hlist_node *hn;
> @@ -257,6 +296,7 @@ nfsd_cache_search(struct svc_rqst *rqstp)
> hlist_for_each_entry(rp, hn, rh, c_hash) {
> if (xid == rp->c_xid && proc == rp->c_proc &&
> proto == rp->c_prot && vers == rp->c_vers &&
> + rqstp->rq_arg.len == rp->c_len && crc == rp->c_crc &&
> rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
> rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
> return rp;
> @@ -276,7 +316,8 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> __be32 xid = rqstp->rq_xid;
> u32 proto = rqstp->rq_prot,
> vers = rqstp->rq_vers,
> - proc = rqstp->rq_proc;
> + proc = rqstp->rq_proc,
> + crc;
> unsigned long age;
> int type = rqstp->rq_cachetype;
> int rtn;
> @@ -287,10 +328,12 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> return RC_DOIT;
> }
>
> + crc = nfsd_cache_crc(rqstp);
> +
> spin_lock(&cache_lock);
> rtn = RC_DOIT;
>
> - rp = nfsd_cache_search(rqstp);
> + rp = nfsd_cache_search(rqstp, crc);
> if (rp)
> goto found_entry;
>
> @@ -318,7 +361,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> * Must search again just in case someone inserted one
> * after we dropped the lock above.
> */
> - found = nfsd_cache_search(rqstp);
> + found = nfsd_cache_search(rqstp, crc);
> if (found) {
> nfsd_reply_cache_free_locked(rp);
> rp = found;
> @@ -344,6 +387,8 @@ setup_entry:
> rpc_set_port((struct sockaddr *)&rp->c_addr, rpc_get_port(svc_addr(rqstp)));
> rp->c_prot = proto;
> rp->c_vers = vers;
> + rp->c_len = rqstp->rq_arg.len;
> + rp->c_crc = crc;
>
> hash_refile(rp);
> lru_put_end(rp);
> --
> 1.7.11.7
>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2013-02-07 16:23:28

by Chuck Lever

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request


On Feb 7, 2013, at 11:00 AM, "J. Bruce Fields" <[email protected]> wrote:

> On Thu, Feb 07, 2013 at 10:51:02AM -0500, Chuck Lever wrote:
>>
>> On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
>>
>>> Now that we're allowing more DRC entries, it becomes a lot easier to
>>> hit problems with XID collisions. In order to mitigate those,
>>> calculate the crc32 of up to the first 256 bytes of each request
>>> coming in and store that in the cache entry, along with the total
>>> length of the request.
>>
>> I'm happy to see a checksummed DRC finally become reality for the
>> Linux NFS server.
>>
>> Have you measured the CPU utilization impact and CPU cache footprint
>> of performing a CRC computation for every incoming RPC?
>
> Note this is over the first 256 bytes of the request--which we're
> probably just about to read for xdr decoding anyway.

XDR decoding is copying and branching. Computing a CRC involves real math, which tends to be significantly more expensive than successfully predicted branches, especially on low-power CPUs that might be found in SOHO NAS products.


>
>> I'm wondering if a simpler checksum might be just as useful but less
>> costly to compute.
>
> What would be an example of a simpler checksum?

The same one TCP uses, like a simple additive sum, or an XOR. Is a heavyweight checksum needed because checksums generated with a simple function are more likely to collide?

Not that this should hold up merging Jeff's work! We can easily tweak or replace the checksum algorithm after it's upstream. It's not kABI.

But someone should assess the impact of the additional checksum computation. CRC seems to me heavier than is needed here.


Possible tweaks:

Why 256 bytes? Is that too much? Or not enough for some NFSv4 compounds that might often start with the same data? Could we, for instance, use fewer bytes for NFSv2 and NFSv3? Or even a variable checksum length depending on the NFS operation? Is 256 bytes enough for NFSv4.1, whose compounds always start with the same operation?

If integrity or privacy is in play, can we use that information in place of a separate DRC checksum?



>
> --b.
>
>>
>>
>>> Signed-off-by: Jeff Layton <[email protected]>
>>> ---
>>> fs/nfsd/cache.h | 5 +++++
>>> fs/nfsd/nfscache.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++----
>>> 2 files changed, 54 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
>>> index 9c7232b..4822db3 100644
>>> --- a/fs/nfsd/cache.h
>>> +++ b/fs/nfsd/cache.h
>>> @@ -29,6 +29,8 @@ struct svc_cacherep {
>>> u32 c_prot;
>>> u32 c_proc;
>>> u32 c_vers;
>>> + unsigned int c_len;
>>> + u32 c_crc;
>>> unsigned long c_timestamp;
>>> union {
>>> struct kvec u_vec;
>>> @@ -73,6 +75,9 @@ enum {
>>> /* Cache entries expire after this time period */
>>> #define RC_EXPIRE (120 * HZ)
>>>
>>> +/* Checksum this amount of the request */
>>> +#define RC_CSUMLEN (256U)
>>> +
>>> int nfsd_reply_cache_init(void);
>>> void nfsd_reply_cache_shutdown(void);
>>> int nfsd_cache_lookup(struct svc_rqst *);
>>> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
>>> index f754469..a8c3f1e 100644
>>> --- a/fs/nfsd/nfscache.c
>>> +++ b/fs/nfsd/nfscache.c
>>> @@ -11,6 +11,8 @@
>>> #include <linux/slab.h>
>>> #include <linux/sunrpc/addr.h>
>>> #include <linux/highmem.h>
>>> +#include <linux/crc32.h>
>>> +#include <linux/sunrpc/svcauth_gss.h>
>>>
>>> #include "nfsd.h"
>>> #include "cache.h"
>>> @@ -24,6 +26,7 @@ static struct list_head lru_head;
>>> static struct kmem_cache *drc_slab;
>>> static unsigned int num_drc_entries;
>>> static unsigned int max_drc_entries;
>>> +static u32 crc_seed;
>>>
>>> /*
>>> * Calculate the hash index from an XID.
>>> @@ -130,6 +133,9 @@ int nfsd_reply_cache_init(void)
>>> INIT_LIST_HEAD(&lru_head);
>>> max_drc_entries = nfsd_cache_size_limit();
>>> num_drc_entries = 0;
>>> +
>>> + /* Is a random seed any better than some well-defined constant? */
>>> + get_random_bytes(&crc_seed, sizeof(crc_seed));
>>> return 0;
>>> out_nomem:
>>> printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
>>> @@ -238,12 +244,45 @@ nfsd_reply_cache_shrink(struct shrinker *shrink, struct shrink_control *sc)
>>> }
>>>
>>> /*
>>> + * Walk an xdr_buf and get a CRC for at most the first RC_CSUMLEN bytes
>>> + */
>>> +static u32
>>> +nfsd_cache_crc(struct svc_rqst *rqstp)
>>> +{
>>> + int idx;
>>> + unsigned int base;
>>> + u32 crc;
>>> + struct xdr_buf *buf = &rqstp->rq_arg;
>>> + const unsigned char *p = buf->head[0].iov_base;
>>> + size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
>>> + RC_CSUMLEN);
>>> + size_t len = min(buf->head[0].iov_len, csum_len);
>>> +
>>> + /* rq_arg.head first */
>>> + crc = crc32(crc_seed, p, len);
>>> + csum_len -= len;
>>> +
>>> + /* Continue into page array */
>>> + idx = buf->page_base / PAGE_SIZE;
>>> + base = buf->page_base & ~PAGE_MASK;
>>> + while (csum_len) {
>>> + p = page_address(buf->pages[idx]) + base;
>>> + len = min(PAGE_SIZE - base, csum_len);
>>> + crc = crc32(crc, p, len);
>>> + csum_len -= len;
>>> + base = 0;
>>> + ++idx;
>>> + }
>>> + return crc;
>>> +}
>>> +
>>> +/*
>>> * Search the request hash for an entry that matches the given rqstp.
>>> * Must be called with cache_lock held. Returns the found entry or
>>> * NULL on failure.
>>> */
>>> static struct svc_cacherep *
>>> -nfsd_cache_search(struct svc_rqst *rqstp)
>>> +nfsd_cache_search(struct svc_rqst *rqstp, u32 crc)
>>> {
>>> struct svc_cacherep *rp;
>>> struct hlist_node *hn;
>>> @@ -257,6 +296,7 @@ nfsd_cache_search(struct svc_rqst *rqstp)
>>> hlist_for_each_entry(rp, hn, rh, c_hash) {
>>> if (xid == rp->c_xid && proc == rp->c_proc &&
>>> proto == rp->c_prot && vers == rp->c_vers &&
>>> + rqstp->rq_arg.len == rp->c_len && crc == rp->c_crc &&
>>> rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
>>> rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
>>> return rp;
>>> @@ -276,7 +316,8 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
>>> __be32 xid = rqstp->rq_xid;
>>> u32 proto = rqstp->rq_prot,
>>> vers = rqstp->rq_vers,
>>> - proc = rqstp->rq_proc;
>>> + proc = rqstp->rq_proc,
>>> + crc;
>>> unsigned long age;
>>> int type = rqstp->rq_cachetype;
>>> int rtn;
>>> @@ -287,10 +328,12 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
>>> return RC_DOIT;
>>> }
>>>
>>> + crc = nfsd_cache_crc(rqstp);
>>> +
>>> spin_lock(&cache_lock);
>>> rtn = RC_DOIT;
>>>
>>> - rp = nfsd_cache_search(rqstp);
>>> + rp = nfsd_cache_search(rqstp, crc);
>>> if (rp)
>>> goto found_entry;
>>>
>>> @@ -318,7 +361,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
>>> * Must search again just in case someone inserted one
>>> * after we dropped the lock above.
>>> */
>>> - found = nfsd_cache_search(rqstp);
>>> + found = nfsd_cache_search(rqstp, crc);
>>> if (found) {
>>> nfsd_reply_cache_free_locked(rp);
>>> rp = found;
>>> @@ -344,6 +387,8 @@ setup_entry:
>>> rpc_set_port((struct sockaddr *)&rp->c_addr, rpc_get_port(svc_addr(rqstp)));
>>> rp->c_prot = proto;
>>> rp->c_vers = vers;
>>> + rp->c_len = rqstp->rq_arg.len;
>>> + rp->c_crc = crc;
>>>
>>> hash_refile(rp);
>>> lru_put_end(rp);
>>> --
>>> 1.7.11.7
>>>
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2013-02-07 14:51:49

by Jeff Layton

Subject: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

Now that we're allowing more DRC entries, it becomes a lot easier to hit
problems with XID collisions. In order to mitigate those, calculate the
crc32 of up to the first 256 bytes of each request coming in and store
that in the cache entry, along with the total length of the request.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/cache.h | 5 +++++
fs/nfsd/nfscache.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
index 9c7232b..4822db3 100644
--- a/fs/nfsd/cache.h
+++ b/fs/nfsd/cache.h
@@ -29,6 +29,8 @@ struct svc_cacherep {
u32 c_prot;
u32 c_proc;
u32 c_vers;
+ unsigned int c_len;
+ u32 c_crc;
unsigned long c_timestamp;
union {
struct kvec u_vec;
@@ -73,6 +75,9 @@ enum {
/* Cache entries expire after this time period */
#define RC_EXPIRE (120 * HZ)

+/* Checksum this amount of the request */
+#define RC_CSUMLEN (256U)
+
int nfsd_reply_cache_init(void);
void nfsd_reply_cache_shutdown(void);
int nfsd_cache_lookup(struct svc_rqst *);
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index f754469..a8c3f1e 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -11,6 +11,8 @@
#include <linux/slab.h>
#include <linux/sunrpc/addr.h>
#include <linux/highmem.h>
+#include <linux/crc32.h>
+#include <linux/sunrpc/svcauth_gss.h>

#include "nfsd.h"
#include "cache.h"
@@ -24,6 +26,7 @@ static struct list_head lru_head;
static struct kmem_cache *drc_slab;
static unsigned int num_drc_entries;
static unsigned int max_drc_entries;
+static u32 crc_seed;

/*
* Calculate the hash index from an XID.
@@ -130,6 +133,9 @@ int nfsd_reply_cache_init(void)
INIT_LIST_HEAD(&lru_head);
max_drc_entries = nfsd_cache_size_limit();
num_drc_entries = 0;
+
+ /* Is a random seed any better than some well-defined constant? */
+ get_random_bytes(&crc_seed, sizeof(crc_seed));
return 0;
out_nomem:
printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
@@ -238,12 +244,45 @@ nfsd_reply_cache_shrink(struct shrinker *shrink, struct shrink_control *sc)
}

/*
+ * Walk an xdr_buf and get a CRC for at most the first RC_CSUMLEN bytes
+ */
+static u32
+nfsd_cache_crc(struct svc_rqst *rqstp)
+{
+ int idx;
+ unsigned int base;
+ u32 crc;
+ struct xdr_buf *buf = &rqstp->rq_arg;
+ const unsigned char *p = buf->head[0].iov_base;
+ size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
+ RC_CSUMLEN);
+ size_t len = min(buf->head[0].iov_len, csum_len);
+
+ /* rq_arg.head first */
+ crc = crc32(crc_seed, p, len);
+ csum_len -= len;
+
+ /* Continue into page array */
+ idx = buf->page_base / PAGE_SIZE;
+ base = buf->page_base & ~PAGE_MASK;
+ while (csum_len) {
+ p = page_address(buf->pages[idx]) + base;
+ len = min(PAGE_SIZE - base, csum_len);
+ crc = crc32(crc, p, len);
+ csum_len -= len;
+ base = 0;
+ ++idx;
+ }
+ return crc;
+}
+
+/*
* Search the request hash for an entry that matches the given rqstp.
* Must be called with cache_lock held. Returns the found entry or
* NULL on failure.
*/
static struct svc_cacherep *
-nfsd_cache_search(struct svc_rqst *rqstp)
+nfsd_cache_search(struct svc_rqst *rqstp, u32 crc)
{
struct svc_cacherep *rp;
struct hlist_node *hn;
@@ -257,6 +296,7 @@ nfsd_cache_search(struct svc_rqst *rqstp)
hlist_for_each_entry(rp, hn, rh, c_hash) {
if (xid == rp->c_xid && proc == rp->c_proc &&
proto == rp->c_prot && vers == rp->c_vers &&
+ rqstp->rq_arg.len == rp->c_len && crc == rp->c_crc &&
rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
return rp;
@@ -276,7 +316,8 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
__be32 xid = rqstp->rq_xid;
u32 proto = rqstp->rq_prot,
vers = rqstp->rq_vers,
- proc = rqstp->rq_proc;
+ proc = rqstp->rq_proc,
+ crc;
unsigned long age;
int type = rqstp->rq_cachetype;
int rtn;
@@ -287,10 +328,12 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
return RC_DOIT;
}

+ crc = nfsd_cache_crc(rqstp);
+
spin_lock(&cache_lock);
rtn = RC_DOIT;

- rp = nfsd_cache_search(rqstp);
+ rp = nfsd_cache_search(rqstp, crc);
if (rp)
goto found_entry;

@@ -318,7 +361,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
* Must search again just in case someone inserted one
* after we dropped the lock above.
*/
- found = nfsd_cache_search(rqstp);
+ found = nfsd_cache_search(rqstp, crc);
if (found) {
nfsd_reply_cache_free_locked(rp);
rp = found;
@@ -344,6 +387,8 @@ setup_entry:
rpc_set_port((struct sockaddr *)&rp->c_addr, rpc_get_port(svc_addr(rqstp)));
rp->c_prot = proto;
rp->c_vers = vers;
+ rp->c_len = rqstp->rq_arg.len;
+ rp->c_crc = crc;

hash_refile(rp);
lru_put_end(rp);
--
1.7.11.7


2013-02-08 21:00:15

by Chuck Lever

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request


On Feb 8, 2013, at 3:55 PM, "J. Bruce Fields" <[email protected]> wrote:

> On Fri, Feb 08, 2013 at 08:27:06AM -0500, Jeff Layton wrote:
>> On Thu, 7 Feb 2013 13:03:16 -0500
>> Jeff Layton <[email protected]> wrote:
>>
>>> On Thu, 7 Feb 2013 10:51:02 -0500
>>> Chuck Lever <[email protected]> wrote:
>>>
>>>>
>>>> On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
>>>>
>>>>> Now that we're allowing more DRC entries, it becomes a lot easier to hit
>>>>> problems with XID collisions. In order to mitigate those, calculate the
>>>>> crc32 of up to the first 256 bytes of each request coming in and store
>>>>> that in the cache entry, along with the total length of the request.
>>>>
>>>> I'm happy to see a checksummed DRC finally become reality for the Linux NFS server.
>>>>
>>>> Have you measured the CPU utilization impact and CPU cache footprint of performing a CRC computation for every incoming RPC? I'm wondering if a simpler checksum might be just as useful but less costly to compute.
>>>>
>>>
>>> No, I haven't, at least not in any sort of rigorous way. It's pretty
>>> negligible on "normal" PC hardware, but I think most intel and amd cpus
>>> have instructions for handling crc32. I'm ok with a different checksum,
>>> we don't need anything cryptographically secure here. I simply chose
>>> crc32 since it has an easily available API, and I figured it would be
>>> fairly lightweight.
>>>
>>
>> After an abortive attempt to measure this with ftrace, I ended up
>> hacking together a patch to just measure the latency of the
>> nfsd_cache_csum/_crc functions to get some rough numbers. On my x86_64
>> KVM guest, the avg time to calculate the crc32 is ~1750ns. Using IP
>> checksums cuts that roughly in half to ~800ns. I'm not sure how best to
>> measure the cache footprint however.
>>
>> Neither seems terribly significant, especially given the other
>> inefficiencies in this code. OTOH, I guess those latencies can add up,
>> and I don't see any need to use crc32 over the net/checksum.h routines.
>> We probably ought to go with my RFC patch from yesterday.
>
> OK, I hadn't committed the original yet, so I've just rolled them
> together and added a little of the above to the changelog. Look OK?
> Chuck, should I add a Reviewed-by: ?

Not sure my participation counts as review. How about:

Stones-thrown-by: Chuck Lever <[email protected]>

> --b.
>
> commit a937bd422ccc4306cdc81b5aa60b12a7212b70d3
> Author: Jeff Layton <[email protected]>
> Date: Mon Feb 4 11:57:27 2013 -0500
>
> nfsd: keep a checksum of the first 256 bytes of request
>
> Now that we're allowing more DRC entries, it becomes a lot easier to hit
> problems with XID collisions. In order to mitigate those, calculate a
> checksum of up to the first 256 bytes of each request coming in and store
> that in the cache entry, along with the total length of the request.
>
> This initially used crc32, but Chuck Lever and Jim Rees pointed out that
> crc32 is probably more heavyweight than we really need for generating
> these checksums, and recommended looking at using the same routines that
> are used to generate checksums for IP packets.
>
> On an x86_64 KVM guest measurements with ftrace showed ~800ns to use
> csum_partial vs ~1750ns for crc32. The difference probably isn't
> terribly significant, but for now we may as well use csum_partial.
>
> Signed-off-by: Jeff Layton <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> index 9c7232b..87fd141 100644
> --- a/fs/nfsd/cache.h
> +++ b/fs/nfsd/cache.h
> @@ -29,6 +29,8 @@ struct svc_cacherep {
> u32 c_prot;
> u32 c_proc;
> u32 c_vers;
> + unsigned int c_len;
> + __wsum c_csum;
> unsigned long c_timestamp;
> union {
> struct kvec u_vec;
> @@ -73,6 +75,9 @@ enum {
> /* Cache entries expire after this time period */
> #define RC_EXPIRE (120 * HZ)
>
> +/* Checksum this amount of the request */
> +#define RC_CSUMLEN (256U)
> +
> int nfsd_reply_cache_init(void);
> void nfsd_reply_cache_shutdown(void);
> int nfsd_cache_lookup(struct svc_rqst *);
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index f754469..40db57e 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -11,6 +11,7 @@
> #include <linux/slab.h>
> #include <linux/sunrpc/addr.h>
> #include <linux/highmem.h>
> +#include <net/checksum.h>
>
> #include "nfsd.h"
> #include "cache.h"
> @@ -130,6 +131,7 @@ int nfsd_reply_cache_init(void)
> INIT_LIST_HEAD(&lru_head);
> max_drc_entries = nfsd_cache_size_limit();
> num_drc_entries = 0;
> +
> return 0;
> out_nomem:
> printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
> @@ -238,12 +240,45 @@ nfsd_reply_cache_shrink(struct shrinker *shrink, struct shrink_control *sc)
> }
>
> /*
> + * Walk an xdr_buf and get a checksum for at most the first RC_CSUMLEN bytes
> + */
> +static __wsum
> +nfsd_cache_csum(struct svc_rqst *rqstp)
> +{
> + int idx;
> + unsigned int base;
> + __wsum csum;
> + struct xdr_buf *buf = &rqstp->rq_arg;
> + const unsigned char *p = buf->head[0].iov_base;
> + size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
> + RC_CSUMLEN);
> + size_t len = min(buf->head[0].iov_len, csum_len);
> +
> + /* rq_arg.head first */
> + csum = csum_partial(p, len, 0);
> + csum_len -= len;
> +
> + /* Continue into page array */
> + idx = buf->page_base / PAGE_SIZE;
> + base = buf->page_base & ~PAGE_MASK;
> + while (csum_len) {
> + p = page_address(buf->pages[idx]) + base;
> + len = min(PAGE_SIZE - base, csum_len);
> + csum = csum_partial(p, len, csum);
> + csum_len -= len;
> + base = 0;
> + ++idx;
> + }
> + return csum;
> +}
> +
> +/*
> * Search the request hash for an entry that matches the given rqstp.
> * Must be called with cache_lock held. Returns the found entry or
> * NULL on failure.
> */
> static struct svc_cacherep *
> -nfsd_cache_search(struct svc_rqst *rqstp)
> +nfsd_cache_search(struct svc_rqst *rqstp, __wsum csum)
> {
> struct svc_cacherep *rp;
> struct hlist_node *hn;
> @@ -257,6 +292,7 @@ nfsd_cache_search(struct svc_rqst *rqstp)
> hlist_for_each_entry(rp, hn, rh, c_hash) {
> if (xid == rp->c_xid && proc == rp->c_proc &&
> proto == rp->c_prot && vers == rp->c_vers &&
> + rqstp->rq_arg.len == rp->c_len && csum == rp->c_csum &&
> rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
> rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
> return rp;
> @@ -277,6 +313,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> u32 proto = rqstp->rq_prot,
> vers = rqstp->rq_vers,
> proc = rqstp->rq_proc;
> + __wsum csum;
> unsigned long age;
> int type = rqstp->rq_cachetype;
> int rtn;
> @@ -287,10 +324,12 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> return RC_DOIT;
> }
>
> + csum = nfsd_cache_csum(rqstp);
> +
> spin_lock(&cache_lock);
> rtn = RC_DOIT;
>
> - rp = nfsd_cache_search(rqstp);
> + rp = nfsd_cache_search(rqstp, csum);
> if (rp)
> goto found_entry;
>
> @@ -318,7 +357,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> * Must search again just in case someone inserted one
> * after we dropped the lock above.
> */
> - found = nfsd_cache_search(rqstp);
> + found = nfsd_cache_search(rqstp, csum);
> if (found) {
> nfsd_reply_cache_free_locked(rp);
> rp = found;
> @@ -344,6 +383,8 @@ setup_entry:
> rpc_set_port((struct sockaddr *)&rp->c_addr, rpc_get_port(svc_addr(rqstp)));
> rp->c_prot = proto;
> rp->c_vers = vers;
> + rp->c_len = rqstp->rq_arg.len;
> + rp->c_csum = csum;
>
> hash_refile(rp);
> lru_put_end(rp);

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2013-02-08 21:02:45

by J. Bruce Fields

Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

On Fri, Feb 08, 2013 at 03:59:53PM -0500, Chuck Lever wrote:
>
> On Feb 8, 2013, at 3:55 PM, "J. Bruce Fields" <[email protected]> wrote:
>
> > On Fri, Feb 08, 2013 at 08:27:06AM -0500, Jeff Layton wrote:
> >> On Thu, 7 Feb 2013 13:03:16 -0500
> >> Jeff Layton <[email protected]> wrote:
> >>
> >>> On Thu, 7 Feb 2013 10:51:02 -0500
> >>> Chuck Lever <[email protected]> wrote:
> >>>
> >>>>
> >>>> On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
> >>>>
> >>>>> Now that we're allowing more DRC entries, it becomes a lot easier to hit
> >>>>> problems with XID collisions. In order to mitigate those, calculate the
> >>>>> crc32 of up to the first 256 bytes of each request coming in and store
> >>>>> that in the cache entry, along with the total length of the request.
> >>>>
> >>>> I'm happy to see a checksummed DRC finally become reality for the Linux NFS server.
> >>>>
> >>>> Have you measured the CPU utilization impact and CPU cache footprint of performing a CRC computation for every incoming RPC? I'm wondering if a simpler checksum might be just as useful but less costly to compute.
> >>>>
> >>>
> >>> No, I haven't, at least not in any sort of rigorous way. It's pretty
> >>> negligible on "normal" PC hardware, but I think most intel and amd cpus
> >>> have instructions for handling crc32. I'm ok with a different checksum,
> >>> we don't need anything cryptographically secure here. I simply chose
> >>> crc32 since it has an easily available API, and I figured it would be
> >>> fairly lightweight.
> >>>
> >>
> >> After an abortive attempt to measure this with ftrace, I ended up
> >> hacking together a patch to just measure the latency of the
> >> nfsd_cache_csum/_crc functions to get some rough numbers. On my x86_64
> >> KVM guest, the avg time to calculate the crc32 is ~1750ns. Using IP
> >> checksums cuts that roughly in half to ~800ns. I'm not sure how best to
> >> measure the cache footprint however.
> >>
> >> Neither seems terribly significant, especially given the other
> >> inefficiencies in this code. OTOH, I guess those latencies can add up,
> >> and I don't see any need to use crc32 over the net/checksum.h routines.
> >> We probably ought to go with my RFC patch from yesterday.
> >
> > OK, I hadn't committed the original yet, so I've just rolled them
> > together and added a little of the above to the changelog. Look OK?
> > Chuck, should I add a Reviewed-by: ?
>
> Not sure my participation counts as review. How about:
>
> Stones-thrown-by: Chuck Lever <[email protected]>

As you wish. --b.

>
> > --b.
> >
> > commit a937bd422ccc4306cdc81b5aa60b12a7212b70d3
> > Author: Jeff Layton <[email protected]>
> > Date: Mon Feb 4 11:57:27 2013 -0500
> >
> > nfsd: keep a checksum of the first 256 bytes of request
> >
> > Now that we're allowing more DRC entries, it becomes a lot easier to hit
> > problems with XID collisions. In order to mitigate those, calculate a
> > checksum of up to the first 256 bytes of each request coming in and store
> > that in the cache entry, along with the total length of the request.
> >
> > This initially used crc32, but Chuck Lever and Jim Rees pointed out that
> > crc32 is probably more heavyweight than we really need for generating
> > these checksums, and recommended looking at using the same routines that
> > are used to generate checksums for IP packets.
> >
> > On an x86_64 KVM guest measurements with ftrace showed ~800ns to use
> > csum_partial vs ~1750ns for crc32. The difference probably isn't
> > terribly significant, but for now we may as well use csum_partial.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > Signed-off-by: J. Bruce Fields <[email protected]>
> >
> > diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> > index 9c7232b..87fd141 100644
> > --- a/fs/nfsd/cache.h
> > +++ b/fs/nfsd/cache.h
> > @@ -29,6 +29,8 @@ struct svc_cacherep {
> > u32 c_prot;
> > u32 c_proc;
> > u32 c_vers;
> > + unsigned int c_len;
> > + __wsum c_csum;
> > unsigned long c_timestamp;
> > union {
> > struct kvec u_vec;
> > @@ -73,6 +75,9 @@ enum {
> > /* Cache entries expire after this time period */
> > #define RC_EXPIRE (120 * HZ)
> >
> > +/* Checksum this amount of the request */
> > +#define RC_CSUMLEN (256U)
> > +
> > int nfsd_reply_cache_init(void);
> > void nfsd_reply_cache_shutdown(void);
> > int nfsd_cache_lookup(struct svc_rqst *);
> > diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> > index f754469..40db57e 100644
> > --- a/fs/nfsd/nfscache.c
> > +++ b/fs/nfsd/nfscache.c
> > @@ -11,6 +11,7 @@
> > #include <linux/slab.h>
> > #include <linux/sunrpc/addr.h>
> > #include <linux/highmem.h>
> > +#include <net/checksum.h>
> >
> > #include "nfsd.h"
> > #include "cache.h"
> > @@ -130,6 +131,7 @@ int nfsd_reply_cache_init(void)
> > INIT_LIST_HEAD(&lru_head);
> > max_drc_entries = nfsd_cache_size_limit();
> > num_drc_entries = 0;
> > +
> > return 0;
> > out_nomem:
> > printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
> > @@ -238,12 +240,45 @@ nfsd_reply_cache_shrink(struct shrinker *shrink, struct shrink_control *sc)
> > }
> >
> > /*
> > + * Walk an xdr_buf and get a checksum for at most the first RC_CSUMLEN bytes
> > + */
> > +static __wsum
> > +nfsd_cache_csum(struct svc_rqst *rqstp)
> > +{
> > + int idx;
> > + unsigned int base;
> > + __wsum csum;
> > + struct xdr_buf *buf = &rqstp->rq_arg;
> > + const unsigned char *p = buf->head[0].iov_base;
> > + size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
> > + RC_CSUMLEN);
> > + size_t len = min(buf->head[0].iov_len, csum_len);
> > +
> > + /* rq_arg.head first */
> > + csum = csum_partial(p, len, 0);
> > + csum_len -= len;
> > +
> > + /* Continue into page array */
> > + idx = buf->page_base / PAGE_SIZE;
> > + base = buf->page_base & ~PAGE_MASK;
> > + while (csum_len) {
> > + p = page_address(buf->pages[idx]) + base;
> > + len = min(PAGE_SIZE - base, csum_len);
> > + csum = csum_partial(p, len, csum);
> > + csum_len -= len;
> > + base = 0;
> > + ++idx;
> > + }
> > + return csum;
> > +}
> > +
> > +/*
> > * Search the request hash for an entry that matches the given rqstp.
> > * Must be called with cache_lock held. Returns the found entry or
> > * NULL on failure.
> > */
> > static struct svc_cacherep *
> > -nfsd_cache_search(struct svc_rqst *rqstp)
> > +nfsd_cache_search(struct svc_rqst *rqstp, __wsum csum)
> > {
> > struct svc_cacherep *rp;
> > struct hlist_node *hn;
> > @@ -257,6 +292,7 @@ nfsd_cache_search(struct svc_rqst *rqstp)
> > hlist_for_each_entry(rp, hn, rh, c_hash) {
> > if (xid == rp->c_xid && proc == rp->c_proc &&
> > proto == rp->c_prot && vers == rp->c_vers &&
> > + rqstp->rq_arg.len == rp->c_len && csum == rp->c_csum &&
> > rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
> > rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
> > return rp;
> > @@ -277,6 +313,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> > u32 proto = rqstp->rq_prot,
> > vers = rqstp->rq_vers,
> > proc = rqstp->rq_proc;
> > + __wsum csum;
> > unsigned long age;
> > int type = rqstp->rq_cachetype;
> > int rtn;
> > @@ -287,10 +324,12 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> > return RC_DOIT;
> > }
> >
> > + csum = nfsd_cache_csum(rqstp);
> > +
> > spin_lock(&cache_lock);
> > rtn = RC_DOIT;
> >
> > - rp = nfsd_cache_search(rqstp);
> > + rp = nfsd_cache_search(rqstp, csum);
> > if (rp)
> > goto found_entry;
> >
> > @@ -318,7 +357,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
> > * Must search again just in case someone inserted one
> > * after we dropped the lock above.
> > */
> > - found = nfsd_cache_search(rqstp);
> > + found = nfsd_cache_search(rqstp, csum);
> > if (found) {
> > nfsd_reply_cache_free_locked(rp);
> > rp = found;
> > @@ -344,6 +383,8 @@ setup_entry:
> > rpc_set_port((struct sockaddr *)&rp->c_addr, rpc_get_port(svc_addr(rqstp)));
> > rp->c_prot = proto;
> > rp->c_vers = vers;
> > + rp->c_len = rqstp->rq_arg.len;
> > + rp->c_csum = csum;
> >
> > hash_refile(rp);
> > lru_put_end(rp);
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>

2013-02-07 16:32:22

by Myklebust, Trond

Subject: RE: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of request

> -----Original Message-----
> From: [email protected] [mailto:linux-nfs-
> [email protected]] On Behalf Of J. Bruce Fields
> Sent: Thursday, February 07, 2013 11:01 AM
> To: Chuck Lever
> Cc: Jeff Layton; [email protected]
> Subject: Re: [PATCH v3 2/2] nfsd: keep a checksum of the first 256 bytes of
> request
>
> On Thu, Feb 07, 2013 at 10:51:02AM -0500, Chuck Lever wrote:
> >
> > On Feb 7, 2013, at 9:51 AM, Jeff Layton <[email protected]> wrote:
> >
> > > Now that we're allowing more DRC entries, it becomes a lot easier to
> > > hit problems with XID collisions. In order to mitigate those,
> > > calculate the crc32 of up to the first 256 bytes of each request
> > > coming in and store that in the cache entry, along with the total
> > > length of the request.
> >
> > I'm happy to see a checksummed DRC finally become reality for the
> > Linux NFS server.
> >
> > Have you measured the CPU utilization impact and CPU cache footprint
> > of performing a CRC computation for every incoming RPC?
>
> Note this is over the first 256 bytes of the request--which we're probably just
> about to read for xdr decoding anyway.

- Would it make sense perhaps to generate the checksum as you are reading the data?
- Also, is 256 bytes sufficient? How far does that get you with your average WRITE compound?
- Could the integrity checksum in RPCSEC_GSS/krb5i be reused as a DRC checksum?

Cheers
Trond