LinuxLists.cc - nfsd delays between svc_recv and gss_check_seq

2016-04-10 11:44:48

Subject: nfsd delays between svc_recv and gss_check_seq_num

My client hangs on xfstests generic/074 on a krb5 mount, and I've found that
the linux server is silently discarding one or more RPCs because the GSS
sequence numbers are outside the sequence window.

The reason is that sometimes one of the nfsd threads takes a long time
between receiving the RPC and then checking if the sequence is within the
window. That delay allows the other nfsd threads to quickly move the window
forward out of range.

If the server discards the RPC then that causes then the client to wait
forever for a response or until the connection is reset.

By inserting tracepoints, I think I found two sources of delay:

1) gss_svc_searchbyctx() uses dup_to_netobj() which has a kmemdup with
GFP_KERNEL. It does this because presumabely it doesn't know how big the
context handle should be.

2) gss_verify_mic() uses make_checksum() which eventually gets to
crypto_alloc_hash() with GFP_KERNEL.

For the first delay, can we assume the context handles are all going to be
the same size? It looks like the handle is assigned by the server, so it
seems like we should be able to know beforehand how large they are.

For the second allocation -- I haven't thrown a lot of thought into what
could be done to fix it.. seems a bit tricker. I'll think about both of
these a bit more, but I thought in the meantime to ask if anyone has
thoughts about this problem. Maybe we can to the sequence check before
verify_mic -- but then a message that fails verification could flip the
sequence bit..

Ben

2016-04-25 18:38:02

by J. Bruce Fields

[permalink] [raw]

Subject: Re: nfsd delays between svc_recv and gss_check_seq_num

On Sun, Apr 10, 2016 at 07:44:45AM -0400, Benjamin Coddington wrote:
> My client hangs on xfstests generic/074 on a krb5 mount, and I've found that
> the linux server is silently discarding one or more RPCs because the GSS
> sequence numbers are outside the sequence window.
>
> The reason is that sometimes one of the nfsd threads takes a long time
> between receiving the RPC and then checking if the sequence is within the
> window. That delay allows the other nfsd threads to quickly move the window
> forward out of range.
>
> If the server discards the RPC then that causes then the client to wait
> forever for a response or until the connection is reset.
>
> By inserting tracepoints, I think I found two sources of delay:
>
> 1) gss_svc_searchbyctx() uses dup_to_netobj() which has a kmemdup with
> GFP_KERNEL. It does this because presumabely it doesn't know how big the
> context handle should be.
>
> 2) gss_verify_mic() uses make_checksum() which eventually gets to
> crypto_alloc_hash() with GFP_KERNEL.
>
> For the first delay, can we assume the context handles are all going to be
> the same size? It looks like the handle is assigned by the server, so it
> seems like we should be able to know beforehand how large they are.

It's assigned by the server, but I believe that happens in userland,
either in svcgssd or gss-proxy. On a quick look I can't find a limit
other than the rpc-imposed limit of 400 bytes for an rpc credential. So
we'd need a documented agreement with svcgssd and gss-proxy for that.
Probably easy for the former, not sure about the latter.

> For the second allocation -- I haven't thrown a lot of thought into what
> could be done to fix it.. seems a bit tricker. I'll think about both of
> these a bit more, but I thought in the meantime to ask if anyone has
> thoughts about this problem. Maybe we can to the sequence check before
> verify_mic -- but then a message that fails verification could flip the
> sequence bit..

How much is this happening? Could increase the sequence window?

--b.

2016-04-25 19:23:59

by Benjamin Coddington

[permalink] [raw]

Subject: Re: nfsd delays between svc_recv and gss_check_seq_num

On Mon, 25 Apr 2016, J. Bruce Fields wrote:

> On Sun, Apr 10, 2016 at 07:44:45AM -0400, Benjamin Coddington wrote:
> > My client hangs on xfstests generic/074 on a krb5 mount, and I've found that
> > the linux server is silently discarding one or more RPCs because the GSS
> > sequence numbers are outside the sequence window.
> >
> > The reason is that sometimes one of the nfsd threads takes a long time
> > between receiving the RPC and then checking if the sequence is within the
> > window. That delay allows the other nfsd threads to quickly move the window
> > forward out of range.
> >
> > If the server discards the RPC then that causes then the client to wait
> > forever for a response or until the connection is reset.
> >
> > By inserting tracepoints, I think I found two sources of delay:
> >
> > 1) gss_svc_searchbyctx() uses dup_to_netobj() which has a kmemdup with
> > GFP_KERNEL. It does this because presumabely it doesn't know how big the
> > context handle should be.
> >
> > 2) gss_verify_mic() uses make_checksum() which eventually gets to
> > crypto_alloc_hash() with GFP_KERNEL.
> >
> > For the first delay, can we assume the context handles are all going to be
> > the same size? It looks like the handle is assigned by the server, so it
> > seems like we should be able to know beforehand how large they are.
>
> It's assigned by the server, but I believe that happens in userland,
> either in svcgssd or gss-proxy. On a quick look I can't find a limit
> other than the rpc-imposed limit of 400 bytes for an rpc credential. So
> we'd need a documented agreement with svcgssd and gss-proxy for that.
> Probably easy for the former, not sure about the latter.

OK, that gives me something to follow up.

> > For the second allocation -- I haven't thrown a lot of thought into what
> > could be done to fix it.. seems a bit tricker. I'll think about both of
> > these a bit more, but I thought in the meantime to ask if anyone has
> > thoughts about this problem. Maybe we can to the sequence check before
> > verify_mic -- but then a message that fails verification could flip the
> > sequence bit..
>
> How much is this happening? Could increase the sequence window?

It happens once or twice each run. I've seen the sequence check miss by
several times the current sequence window size. I don't think increasing
the sequence window is a great fix because the delay waiting for an
allocation is not well-bounded. Different hardware or in other
circumstances it could be quite a bit longer.

Ben

2016-04-25 20:10:02

by J. Bruce Fields

[permalink] [raw]

Subject: Re: nfsd delays between svc_recv and gss_check_seq_num

On Mon, Apr 25, 2016 at 03:23:56PM -0400, Benjamin Coddington wrote:
> On Mon, 25 Apr 2016, J. Bruce Fields wrote:
>
> > On Sun, Apr 10, 2016 at 07:44:45AM -0400, Benjamin Coddington wrote:
> > > My client hangs on xfstests generic/074 on a krb5 mount, and I've found that
> > > the linux server is silently discarding one or more RPCs because the GSS
> > > sequence numbers are outside the sequence window.
> > >
> > > The reason is that sometimes one of the nfsd threads takes a long time
> > > between receiving the RPC and then checking if the sequence is within the
> > > window. That delay allows the other nfsd threads to quickly move the window
> > > forward out of range.
> > >
> > > If the server discards the RPC then that causes then the client to wait
> > > forever for a response or until the connection is reset.
> > >
> > > By inserting tracepoints, I think I found two sources of delay:
> > >
> > > 1) gss_svc_searchbyctx() uses dup_to_netobj() which has a kmemdup with
> > > GFP_KERNEL. It does this because presumabely it doesn't know how big the
> > > context handle should be.
> > >
> > > 2) gss_verify_mic() uses make_checksum() which eventually gets to
> > > crypto_alloc_hash() with GFP_KERNEL.
> > >
> > > For the first delay, can we assume the context handles are all going to be
> > > the same size? It looks like the handle is assigned by the server, so it
> > > seems like we should be able to know beforehand how large they are.
> >
> > It's assigned by the server, but I believe that happens in userland,
> > either in svcgssd or gss-proxy. On a quick look I can't find a limit
> > other than the rpc-imposed limit of 400 bytes for an rpc credential. So
> > we'd need a documented agreement with svcgssd and gss-proxy for that.
> > Probably easy for the former, not sure about the latter.
>
> OK, that gives me something to follow up.

Taking a look at the code.... That thing we're allocating is just a
temporary way to pass in the handle that we're looking for, so why not
just something like this?:

diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 1095be9c80ab..22886097d8ee 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -569,10 +569,9 @@ gss_svc_searchbyctx(struct cache_detail *cd, struct xdr_netobj *handle)
struct rsc *found;

memset(&rsci, 0, sizeof(rsci));
- if (dup_to_netobj(&rsci.handle, handle->data, handle->len))
- return NULL;
+ rsci.handle.data = handle->data;
+ rsci.handle.len = handle->len;
found = rsc_lookup(cd, &rsci);
- rsc_free(&rsci);
if (!found)
return NULL;
if (cache_check(cd, &found->h, NULL))

--b.

2016-04-25 21:22:38

by J. Bruce Fields

[permalink] [raw]

Subject: Re: nfsd delays between svc_recv and gss_check_seq_num

On Sun, Apr 10, 2016 at 07:44:45AM -0400, Benjamin Coddington wrote:
> My client hangs on xfstests generic/074 on a krb5 mount, and I've found that
> the linux server is silently discarding one or more RPCs because the GSS
> sequence numbers are outside the sequence window.
>
> The reason is that sometimes one of the nfsd threads takes a long time
> between receiving the RPC and then checking if the sequence is within the
> window. That delay allows the other nfsd threads to quickly move the window
> forward out of range.
>
> If the server discards the RPC then that causes then the client to wait
> forever for a response or until the connection is reset.
>
> By inserting tracepoints, I think I found two sources of delay:
>
> 1) gss_svc_searchbyctx() uses dup_to_netobj() which has a kmemdup with
> GFP_KERNEL. It does this because presumabely it doesn't know how big the
> context handle should be.
>
> 2) gss_verify_mic() uses make_checksum() which eventually gets to
> crypto_alloc_hash() with GFP_KERNEL.
>
> For the first delay, can we assume the context handles are all going to be
> the same size? It looks like the handle is assigned by the server, so it
> seems like we should be able to know beforehand how large they are.
>
> For the second allocation -- I haven't thrown a lot of thought into what
> could be done to fix it.. seems a bit tricker. I'll think about both of
> these a bit more, but I thought in the meantime to ask if anyone has
> thoughts about this problem. Maybe we can to the sequence check before
> verify_mic -- but then a message that fails verification could flip the
> sequence bit..

Yes.

It would be better to allocate the crypto context at the time we create
the gss context, instead of each time we need to use it.

The problem is that we could then use that crypto context concurrently
from multiple tasks. And it could have some state--it did when we first
wrote this code, at least. But I'm sure we could work around that
somehow if it were worth it.

Looking at the code now--I don't think the original crypto api had this
separate ahash_request_alloc step. If all the state's in there, then we
should be able to allocate the crypto_ahash once when we create the
krb5_ctx, and then only need the ahash_request_alloc for each operation.
Does that help?

--b.

2016-07-20 10:33:01

by Red Hat

[permalink] [raw]

Subject: Re: nfsd delays between svc_recv and gss_check_seq_num

On Sun, 10 Apr 2016, Benjamin Coddington wrote:

> My client hangs on xfstests generic/074 on a krb5 mount, and I've found that
> the linux server is silently discarding one or more RPCs because the GSS
> sequence numbers are outside the sequence window.
>
> The reason is that sometimes one of the nfsd threads takes a long time
> between receiving the RPC and then checking if the sequence is within the
> window. That delay allows the other nfsd threads to quickly move the window
> forward out of range.
>
> If the server discards the RPC then that causes then the client to wait
> forever for a response or until the connection is reset.
>
> By inserting tracepoints, I think I found two sources of delay:
>
> 1) gss_svc_searchbyctx() uses dup_to_netobj() which has a kmemdup with
> GFP_KERNEL. It does this because presumabely it doesn't know how big the
> context handle should be.
>
> 2) gss_verify_mic() uses make_checksum() which eventually gets to
> crypto_alloc_hash() with GFP_KERNEL.
>
> For the first delay, can we assume the context handles are all going to be
> the same size? It looks like the handle is assigned by the server, so it
> seems like we should be able to know beforehand how large they are.
>
> For the second allocation -- I haven't thrown a lot of thought into what
> could be done to fix it.. seems a bit tricker. I'll think about both of
> these a bit more, but I thought in the meantime to ask if anyone has
> thoughts about this problem. Maybe we can to the sequence check before
> verify_mic -- but then a message that fails verification could flip the
> sequence bit..

I haven't gotten back to this problem yet, but in the interest of not
forgetting about it I want to self-reply that some testing with kmemleak
showed a bunch of these on trond's linux-next branch. I haven't done any
other investigation:

unreferenced object 0xffff8800545bfe48 (size 16):
comm "nfsd", pid 1548, jiffies 4294980011 (age 54.660s)
hex dump (first 16 bytes):
2a 86 48 86 f7 12 01 02 02 6b 6b 6b 6b 6b 6b a5 *.H......kkkkkk.
backtrace:
[<ffffffff817c1f2e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff81220767>] __kmalloc_track_caller+0x187/0x2a0
[<ffffffff811d6cb0>] kmemdup+0x20/0x50
[<ffffffffc011d86f>] gssx_dec_buffer+0x8f/0xa0 [auth_rpcgss]
[<ffffffffc011e252>] gssx_dec_accept_sec_context+0x242/0x58e
[auth_rpcgss]
[<ffffffffc0147e57>] rpcauth_unwrap_resp+0xa7/0xe0 [sunrpc]
[<ffffffffc01386e3>] call_decode+0x1e3/0x810 [sunrpc]
[<ffffffffc01441b1>] __rpc_execute+0x91/0x460 [sunrpc]
[<ffffffffc01468a1>] rpc_execute+0x61/0xc0 [sunrpc]
[<ffffffffc013a351>] rpc_run_task+0xf1/0x130 [sunrpc]
[<ffffffffc013a3e0>] rpc_call_sync+0x50/0xc0 [sunrpc]
[<ffffffffc011d3fa>] gssp_accept_sec_context_upcall+0x1aa/0x4b0
[auth_rpcgss]
[<ffffffffc011b9c5>] svcauth_gss_proxy_init+0x175/0x520 [auth_rpcgss]
[<ffffffffc011c172>] svcauth_gss_accept+0x402/0xb60 [auth_rpcgss]
[<ffffffffc014d77d>] svc_authenticate+0xed/0xf0 [sunrpc]
[<ffffffffc01491e4>] svc_process_common+0x244/0x690 [sunrpc]

Ben

2016-07-20 11:29:33

by Benjamin Coddington

[permalink] [raw]

Subject: Re: nfsd delays between svc_recv and gss_check_seq_num

Uh, something is clearly wrong with my mailer now..

On 20 Jul 2016, at 6:33, Red Hat wrote:

> On Sun, 10 Apr 2016, Benjamin Coddington wrote:
>
>> My client hangs on xfstests generic/074 on a krb5 mount, and I've
>> found that
>> the linux server is silently discarding one or more RPCs because the
>> GSS
>> sequence numbers are outside the sequence window.
>>
>> The reason is that sometimes one of the nfsd threads takes a long
>> time
>> between receiving the RPC and then checking if the sequence is within
>> the
>> window. That delay allows the other nfsd threads to quickly move the
>> window
>> forward out of range.
>>
>> If the server discards the RPC then that causes then the client to
>> wait
>> forever for a response or until the connection is reset.
>>
>> By inserting tracepoints, I think I found two sources of delay:
>>
>> 1) gss_svc_searchbyctx() uses dup_to_netobj() which has a kmemdup
>> with
>> GFP_KERNEL. It does this because presumabely it doesn't know how big
>> the
>> context handle should be.
>>
>> 2) gss_verify_mic() uses make_checksum() which eventually gets to
>> crypto_alloc_hash() with GFP_KERNEL.
>>
>> For the first delay, can we assume the context handles are all going
>> to be
>> the same size? It looks like the handle is assigned by the server,
>> so it
>> seems like we should be able to know beforehand how large they are.
>>
>> For the second allocation -- I haven't thrown a lot of thought into
>> what
>> could be done to fix it.. seems a bit tricker. I'll think about both
>> of
>> these a bit more, but I thought in the meantime to ask if anyone has
>> thoughts about this problem. Maybe we can to the sequence check
>> before
>> verify_mic -- but then a message that fails verification could flip
>> the
>> sequence bit..
>
> I haven't gotten back to this problem yet, but in the interest of not
> forgetting about it I want to self-reply that some testing with
> kmemleak
> showed a bunch of these on trond's linux-next branch. I haven't done
> any
> other investigation:
>
> unreferenced object 0xffff8800545bfe48 (size 16):
> comm "nfsd", pid 1548, jiffies 4294980011 (age 54.660s)
> hex dump (first 16 bytes):
> 2a 86 48 86 f7 12 01 02 02 6b 6b 6b 6b 6b 6b a5 *.H......kkkkkk.
> backtrace:
> [<ffffffff817c1f2e>] kmemleak_alloc+0x4e/0xb0
> [<ffffffff81220767>] __kmalloc_track_caller+0x187/0x2a0
> [<ffffffff811d6cb0>] kmemdup+0x20/0x50
> [<ffffffffc011d86f>] gssx_dec_buffer+0x8f/0xa0 [auth_rpcgss]
> [<ffffffffc011e252>] gssx_dec_accept_sec_context+0x242/0x58e
> [auth_rpcgss]
> [<ffffffffc0147e57>] rpcauth_unwrap_resp+0xa7/0xe0 [sunrpc]
> [<ffffffffc01386e3>] call_decode+0x1e3/0x810 [sunrpc]
> [<ffffffffc01441b1>] __rpc_execute+0x91/0x460 [sunrpc]
> [<ffffffffc01468a1>] rpc_execute+0x61/0xc0 [sunrpc]
> [<ffffffffc013a351>] rpc_run_task+0xf1/0x130 [sunrpc]
> [<ffffffffc013a3e0>] rpc_call_sync+0x50/0xc0 [sunrpc]
> [<ffffffffc011d3fa>] gssp_accept_sec_context_upcall+0x1aa/0x4b0
> [auth_rpcgss]
> [<ffffffffc011b9c5>] svcauth_gss_proxy_init+0x175/0x520
> [auth_rpcgss]
> [<ffffffffc011c172>] svcauth_gss_accept+0x402/0xb60 [auth_rpcgss]
> [<ffffffffc014d77d>] svc_authenticate+0xed/0xf0 [sunrpc]
> [<ffffffffc01491e4>] svc_process_common+0x244/0x690 [sunrpc]
>
> Ben