2016-09-07 20:36:19

by Chuck Lever III

Subject: [PATCH v1] svcauth_gss: Close connection when dropping an incoming message

S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
whose sequence number lies outside the current window is dropped.
The rationale is:

   The reason for discarding requests silently is that the server
   is unable to determine if the duplicate or out of range request
   was due to a sequencing problem in the client, network, or the
   operating system, or due to some quirk in routing, or a replay
   attack by an intruder. Discarding the request allows the client
   to recover after timing out, if indeed the duplication was
   unintentional or well intended.

However, clients may rely on the server dropping the connection to
indicate that a retransmit is needed. Without a connection reset, a
client can wait forever without retransmitting, and the workload
just stops dead. I've reproduced this behavior by running xfstests
generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.

To address this issue, have the server close the connection when it
silently discards an incoming message due to a GSS sequence number
problem.

Signed-off-by: Chuck Lever <[email protected]>
Cc: Benjamin Coddington <[email protected]>
---
Hi-

Passed testing with my reproducer: 10 runs of generic/323 with
proto=rdma and sec=krb5i, with NFSv3, NFSv4.0, and NFSv4.1.
generic/323 is 120 seconds or so of a heavy aio workload.

I tested with that dprintk replaced with pr_warn to confirm that the
reproducer hits this path one or more times per test run.

net/sunrpc/auth_gss/svcauth_gss.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index d858202..3ff52ec 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -696,7 +696,8 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci,
 	if (!gss_check_seq_num(rsci, gc->gc_seq)) {
 		dprintk("RPC: svcauth_gss: discarding request with "
 			"old sequence number %d\n", gc->gc_seq);
-		return SVC_DROP;
+		/* Signal to the client that an RPC message was lost */
+		return SVC_CLOSE;
 	}
 	return SVC_OK;
 }
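
For readers who have not met the sequence window that RFC 2203 describes, here is a minimal userspace sketch of such a check. It is not the kernel's gss_check_seq_num(); the names seq_window, seq_window_init, seq_window_accept and the window size SEQ_WIN are assumptions made for illustration only. A false return corresponds to the "silently discard" case the patch touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SEQ_WIN 128u	/* hypothetical window size for this sketch */

struct seq_window {
	uint32_t max;		/* highest sequence number accepted so far */
	bool seen[SEQ_WIN];	/* seen[n % SEQ_WIN] marks numbers inside the window */
};

static void seq_window_init(struct seq_window *w)
{
	assert(w != NULL);
	memset(w, 0, sizeof(*w));
}

/*
 * Returns true if the request may be processed, false if it is a
 * duplicate or lies below the window and must be discarded.
 */
static bool seq_window_accept(struct seq_window *w, uint32_t seq)
{
	assert(w != NULL);

	if (seq > w->max) {
		if (seq - w->max >= SEQ_WIN) {
			/* Jumped past the whole window: forget everything. */
			memset(w->seen, 0, sizeof(w->seen));
		} else {
			/* Slide forward, clearing the slots being reused. */
			for (uint32_t n = w->max + 1; n < seq; n++)
				w->seen[n % SEQ_WIN] = false;
		}
		w->max = seq;
		w->seen[seq % SEQ_WIN] = true;
		return true;
	}
	if (w->max - seq >= SEQ_WIN)
		return false;	/* older than the window */
	if (w->seen[seq % SEQ_WIN])
		return false;	/* already seen: duplicate or replay */
	w->seen[seq % SEQ_WIN] = true;
	return true;
}
```

The sketch leaves out locking, which a real server would need because requests on one GSS context can be processed concurrently.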


2016-09-09 21:18:22

by J. Bruce Fields

Subject: Re: [PATCH v1] svcauth_gss: Close connection when dropping an incoming message

On Wed, Sep 07, 2016 at 04:36:19PM -0400, Chuck Lever wrote:
> S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
> whose sequence number lies outside the current window is dropped.
> The rationale is:
>
> The reason for discarding requests silently is that the server
> is unable to determine if the duplicate or out of range request
> was due to a sequencing problem in the client, network, or the
> operating system, or due to some quirk in routing, or a replay
> attack by an intruder. Discarding the request allows the client
> to recover after timing out, if indeed the duplication was
> unintentional or well intended.
>
> However, clients may rely on the server dropping the connection to
> indicate that a retransmit is needed. Without a connection reset, a
> client can wait forever without retransmitting, and the workload
> just stops dead. I've reproduced this behavior by running xfstests
> generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.
>
> To address this issue, have the server close the connection when it
> silently discards an incoming message due to a GSS sequence number
> problem.
>
> Signed-off-by: Chuck Lever <[email protected]>
> Cc: Benjamin Coddington <[email protected]>
> ---
> Hi-
>
> Passed testing with my reproducer: 10 runs of generic/323 with
> proto=rdma and sec=krb5i, with NFSv3, NFSv4.0, and NFSv4.1.
> generic/323 is 120 seconds or so of a heavy aio workload.
>
> I tested with that dprintk replaced with pr_warn to confirm that the
> reproducer hits this path one or more times per test run.

Thanks, this is useful, but before applying I'd just like to audit other
uses of SVC_DROP in the server rpc code as this probably isn't the only
place with this problem.

Also, this changes behavior for v2/v3 too, does that cause any problems?
Is it OK for the server to just always close connections on dropping in
the v2/v3 case too?

--b.

>
> net/sunrpc/auth_gss/svcauth_gss.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
> index d858202..3ff52ec 100644
> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> @@ -696,7 +696,8 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci,
> if (!gss_check_seq_num(rsci, gc->gc_seq)) {
> dprintk("RPC: svcauth_gss: discarding request with "
> "old sequence number %d\n", gc->gc_seq);
> - return SVC_DROP;
> + /* Signal to the client that an RPC message was lost */
> + return SVC_CLOSE;
> }
> return SVC_OK;
> }

2016-09-12 15:57:13

by Chuck Lever III

Subject: Re: [PATCH v1] svcauth_gss: Close connection when dropping an incoming message

Hi Bruce-


> On Sep 9, 2016, at 5:18 PM, J. Bruce Fields <[email protected]> wrote:
>
> On Wed, Sep 07, 2016 at 04:36:19PM -0400, Chuck Lever wrote:
>> S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
>> whose sequence number lies outside the current window is dropped.
>> The rationale is:
>>
>> The reason for discarding requests silently is that the server
>> is unable to determine if the duplicate or out of range request
>> was due to a sequencing problem in the client, network, or the
>> operating system, or due to some quirk in routing, or a replay
>> attack by an intruder. Discarding the request allows the client
>> to recover after timing out, if indeed the duplication was
>> unintentional or well intended.
>>
>> However, clients may rely on the server dropping the connection to
>> indicate that a retransmit is needed. Without a connection reset, a
>> client can wait forever without retransmitting, and the workload
>> just stops dead. I've reproduced this behavior by running xfstests
>> generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.
>>
>> To address this issue, have the server close the connection when it
>> silently discards an incoming message due to a GSS sequence number
>> problem.
>>
>> Signed-off-by: Chuck Lever <[email protected]>
>> Cc: Benjamin Coddington <[email protected]>
>> ---
>> Hi-
>>
>> Passed testing with my reproducer: 10 runs of generic/323 with
>> proto=rdma and sec=krb5i, with NFSv3, NFSv4.0, and NFSv4.1.
>> generic/323 is 120 seconds or so of a heavy aio workload.
>>
>> I tested with that dprintk replaced with pr_warn to confirm that the
>> reproducer hits this path one or more times per test run.
>
> Thanks, this is useful, but before applying I'd just like to audit other
> uses of SVC_DROP in the server rpc code as this probably isn't the only
> place with this problem.

Consider this a test result, then.

So, "I'd just like to audit" means you are doing the auditing now, or
would you like me to dig into that?


> Also, this changes behavior for v2/v3 too, does that cause any problems?
> Is it OK for the server to just always close connections on dropping in
> the v2/v3 case too?

I've run the same tests with NFSv3 (NFS/RDMA + krb5i or krb5p) and did
not see a negative impact. Not much, but there it is.

What would provide more confidence that NFSv2/3 is not impacted?


> --b.
>
>>
>> net/sunrpc/auth_gss/svcauth_gss.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
>> index d858202..3ff52ec 100644
>> --- a/net/sunrpc/auth_gss/svcauth_gss.c
>> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
>> @@ -696,7 +696,8 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci,
>> if (!gss_check_seq_num(rsci, gc->gc_seq)) {
>> dprintk("RPC: svcauth_gss: discarding request with "
>> "old sequence number %d\n", gc->gc_seq);
>> - return SVC_DROP;
>> + /* Signal to the client that an RPC message was lost */
>> + return SVC_CLOSE;
>> }
>> return SVC_OK;
>> }

--
Chuck Lever

2016-09-12 16:18:41

by J. Bruce Fields

Subject: Re: [PATCH v1] svcauth_gss: Close connection when dropping an incoming message

On Mon, Sep 12, 2016 at 11:57:13AM -0400, Chuck Lever wrote:
> Hi Bruce-
>
>
> > On Sep 9, 2016, at 5:18 PM, J. Bruce Fields <[email protected]> wrote:
> >
> > On Wed, Sep 07, 2016 at 04:36:19PM -0400, Chuck Lever wrote:
> >> S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
> >> whose sequence number lies outside the current window is dropped.
> >> The rationale is:
> >>
> >> The reason for discarding requests silently is that the server
> >> is unable to determine if the duplicate or out of range request
> >> was due to a sequencing problem in the client, network, or the
> >> operating system, or due to some quirk in routing, or a replay
> >> attack by an intruder. Discarding the request allows the client
> >> to recover after timing out, if indeed the duplication was
> >> unintentional or well intended.
> >>
> >> However, clients may rely on the server dropping the connection to
> >> indicate that a retransmit is needed. Without a connection reset, a
> >> client can wait forever without retransmitting, and the workload
> >> just stops dead. I've reproduced this behavior by running xfstests
> >> generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.
> >>
> >> To address this issue, have the server close the connection when it
> >> silently discards an incoming message due to a GSS sequence number
> >> problem.
> >>
> >> Signed-off-by: Chuck Lever <[email protected]>
> >> Cc: Benjamin Coddington <[email protected]>
> >> ---
> >> Hi-
> >>
> >> Passed testing with my reproducer: 10 runs of generic/323 with
> >> proto=rdma and sec=krb5i, with NFSv3, NFSv4.0, and NFSv4.1.
> >> generic/323 is 120 seconds or so of a heavy aio workload.
> >>
> >> I tested with that dprintk replaced with pr_warn to confirm that the
> >> reproducer hits this path one or more times per test run.
> >
> > Thanks, this is useful, but before applying I'd just like to audit other
> > uses of SVC_DROP in the server rpc code as this probably isn't the only
> > place with this problem.
>
> Consider this a test result, then.
>
> So, "I'd just like to audit" means you are doing the auditing now, or
> would you like me to dig into that?

I haven't looked at it, if you can that would be fantastic.

> > Also, this changes behavior for v2/v3 too, does that cause any problems?
> > Is it OK for the server to just always close connections on dropping in
> > the v2/v3 case too?
>
> I've run the same tests with NFSv3 (NFS/RDMA + krb5i or krb5p) and did
> not see a negative impact. Not much, but there it is.
>
> What would provide more confidence that NFSv2/3 is not impacted?

I guess I'm not too worried. Surely NFSv3 clients have always had to
handle reconnecting connections closed by the server.

--b.
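
The hang described in the original patch can be pictured with a toy model. This is only a sketch: VERDICT_OK, VERDICT_DROP and VERDICT_CLOSE are hypothetical stand-ins for the server's SVC_OK, SVC_DROP and SVC_CLOSE, and the client policy (retransmit only once the connection is lost) is a simplifying assumption about clients on connection-oriented transports, not a description of any particular client's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the server's per-request verdicts. */
enum verdict { VERDICT_OK, VERDICT_DROP, VERDICT_CLOSE };

struct conn {
	bool open;	/* true while the transport connection is up */
};

/*
 * Server side: apply the verdict to the connection.
 * Returns true if a reply is sent to the client.
 */
static bool server_apply(struct conn *c, enum verdict v)
{
	assert(c != NULL);

	switch (v) {
	case VERDICT_OK:
		return true;		/* reply sent, connection kept */
	case VERDICT_CLOSE:
		c->open = false;	/* no reply, connection torn down */
		return false;
	case VERDICT_DROP:
	default:
		return false;		/* no reply, connection left open */
	}
}

/*
 * Client side (assumed policy): a request with no reply is
 * retransmitted only after the connection has been lost.
 */
static bool client_should_retransmit(const struct conn *c, bool got_reply)
{
	assert(c != NULL);
	return !got_reply && !c->open;
}
```

Under this model a dropped request on an open connection never triggers a retransmit, which matches the stall seen with generic/323; closing the connection gives the client the signal it is waiting for.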