LinuxLists.cc - [PATCH] NFSD: fix decode_cb

2011-02-22 22:43:26

Subject: [PATCH] NFSD: fix decode_cb_sequence4resok

Fix bug introduced in patch
85a56480 NFSD: Update XDR decoders in NFSv4 callback client

Although decode_cb_sequence4resok ignores highest slotid and target highest slotid
it must account for their space in their xdr stream when calling xdr_inline_decode

Cc: Chuck Lever <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4callback.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index da54498..d046bdb 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -592,7 +592,7 @@ static int decode_cb_sequence4resok(struct xdr_stream *xdr,
* If the server returns different values for sessionID, slotID or
* sequence number, the server is looney tunes.
*/
- p = xdr_inline_decode(xdr, NFS4_MAX_SESSIONID_LEN + 4 + 4);
+ p = xdr_inline_decode(xdr, NFS4_MAX_SESSIONID_LEN + 4 + 4 + 4 + 4);
if (unlikely(p == NULL))
goto out_overflow;
memcpy(id.data, p, NFS4_MAX_SESSIONID_LEN);
--
1.7.3.4

2011-02-23 00:25:31

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH] NFSD: fix decode_cb_sequence4resok

On Tue, Feb 22, 2011 at 04:15:22PM -0800, Benny Halevy wrote:
> On 2011-02-22 16:11, J. Bruce Fields wrote:
> > On Tue, Feb 22, 2011 at 02:43:22PM -0800, Benny Halevy wrote:
> >> Fix bug introduced in patch
> >> 85a56480 NFSD: Update XDR decoders in NFSv4 callback client
> >>
> >> Although decode_cb_sequence4resok ignores highest slotid and target highest slotid
> >> it must account for their space in their xdr stream when calling xdr_inline_decode
> >
> > Thanks, applying for 2.6.38. (How come you caught this, and I didn't?
> > I guess it's just that the object code depends more on the callback
> > returns?)
>
> That's a great question.
> I caught it by testing cb_layoutrecall, both with the objects layout
> and with the files layout with the patch I sent for PNFSD_LEXP to recall the
> layout on truncate.
> cb_recall decoding should be broken too at the server.
> It's possible nothing really depends on its status.

Yeah, the server may take a failure as a sign it should retry the
request, but the only "reply" that really matters is the eventual
{deleg,layout,whatever}return.

On the client side a failure here would be visisble at most as a retry,
so pynfs won't catch this kind of thing.

--b.

2011-02-23 18:07:19

by Benny Halevy

[permalink] [raw]

Subject: Re: [PATCH] NFSD: fix decode_cb_sequence4resok

On 2011-02-23 09:29, J. Bruce Fields wrote:
> On Wed, Feb 23, 2011 at 09:08:34AM -0800, Benny Halevy wrote:
>> On 2011-02-23 08:48, Chuck Lever wrote:
>>>
>>> On Feb 22, 2011, at 2:43 PM, Benny Halevy wrote:
>>>
>>>> Fix bug introduced in patch
>>>> 85a56480 NFSD: Update XDR decoders in NFSv4 callback client
>>>>
>>>> Although decode_cb_sequence4resok ignores highest slotid and target highest slotid
>>>> it must account for their space in their xdr stream when calling xdr_inline_decode
>>>
>>> The real problem is that decoding for the next operation in the compound will start too early in the buffer, because we didn't account for the ignored 8 bytes here, yes?
>>
>> Right on the spot.
>
> So actually I guess there is another bug here, which is a subset of
>
> http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues#Callback_error_handling
>
> The server should be setting the appropriate sequence flag when it
> (rightly or wrongly) things that a cb reply is garbage (not sure which
> flag off the top of my head), and pynfs should be insisting that the
> flag be set on any further sequence flags.
>
> --b.

SEQ4_STATUS_BACKCHANNEL_FAULT
The server has encountered an unrecoverable fault with the
backchannel (e.g., it has lost track of the sequence ID for a slot
in the backchannel). The client MUST stop sending more requests
on the session's fore channel, wait for all outstanding requests
to complete on the fore and back channel, and then destroy the
session.

Right?

2011-02-23 17:31:12

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH] NFSD: fix decode_cb_sequence4resok

On Wed, Feb 23, 2011 at 09:08:34AM -0800, Benny Halevy wrote:
> On 2011-02-23 08:48, Chuck Lever wrote:
> >
> > On Feb 22, 2011, at 2:43 PM, Benny Halevy wrote:
> >
> >> Fix bug introduced in patch
> >> 85a56480 NFSD: Update XDR decoders in NFSv4 callback client
> >>
> >> Although decode_cb_sequence4resok ignores highest slotid and target highest slotid
> >> it must account for their space in their xdr stream when calling xdr_inline_decode
> >
> > The real problem is that decoding for the next operation in the compound will start too early in the buffer, because we didn't account for the ignored 8 bytes here, yes?
>
> Right on the spot.

So actually I guess there is another bug here, which is a subset of

http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues#Callback_error_handling

The server should be setting the appropriate sequence flag when it
(rightly or wrongly) things that a cb reply is garbage (not sure which
flag off the top of my head), and pynfs should be insisting that the
flag be set on any further sequence flags.

--b.

2011-02-23 16:49:06

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH] NFSD: fix decode_cb_sequence4resok

On Feb 22, 2011, at 2:43 PM, Benny Halevy wrote:

> Fix bug introduced in patch
> 85a56480 NFSD: Update XDR decoders in NFSv4 callback client
>
> Although decode_cb_sequence4resok ignores highest slotid and target highest slotid
> it must account for their space in their xdr stream when calling xdr_inline_decode

The real problem is that decoding for the next operation in the compound will start too early in the buffer, because we didn't account for the ignored 8 bytes here, yes?

Reviewed-by: Chuck Lever <[email protected]>

> Cc: Chuck Lever <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfsd/nfs4callback.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index da54498..d046bdb 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -592,7 +592,7 @@ static int decode_cb_sequence4resok(struct xdr_stream *xdr,
> * If the server returns different values for sessionID, slotID or
> * sequence number, the server is looney tunes.
> */
> - p = xdr_inline_decode(xdr, NFS4_MAX_SESSIONID_LEN + 4 + 4);
> + p = xdr_inline_decode(xdr, NFS4_MAX_SESSIONID_LEN + 4 + 4 + 4 + 4);
> if (unlikely(p == NULL))
> goto out_overflow;
> memcpy(id.data, p, NFS4_MAX_SESSIONID_LEN);
> --
> 1.7.3.4
>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2011-02-23 17:08:33

by Benny Halevy

[permalink] [raw]

Subject: Re: [PATCH] NFSD: fix decode_cb_sequence4resok

On 2011-02-23 08:48, Chuck Lever wrote:
>
> On Feb 22, 2011, at 2:43 PM, Benny Halevy wrote:
>
>> Fix bug introduced in patch
>> 85a56480 NFSD: Update XDR decoders in NFSv4 callback client
>>
>> Although decode_cb_sequence4resok ignores highest slotid and target highest slotid
>> it must account for their space in their xdr stream when calling xdr_inline_decode
>
> The real problem is that decoding for the next operation in the compound will start too early in the buffer, because we didn't account for the ignored 8 bytes here, yes?

Right on the spot.

Benny

>
> Reviewed-by: Chuck Lever <[email protected]>
>
>> Cc: Chuck Lever <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> ---
>> fs/nfsd/nfs4callback.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>> index da54498..d046bdb 100644
>> --- a/fs/nfsd/nfs4callback.c
>> +++ b/fs/nfsd/nfs4callback.c
>> @@ -592,7 +592,7 @@ static int decode_cb_sequence4resok(struct xdr_stream *xdr,
>> * If the server returns different values for sessionID, slotID or
>> * sequence number, the server is looney tunes.
>> */
>> - p = xdr_inline_decode(xdr, NFS4_MAX_SESSIONID_LEN + 4 + 4);
>> + p = xdr_inline_decode(xdr, NFS4_MAX_SESSIONID_LEN + 4 + 4 + 4 + 4);
>> if (unlikely(p == NULL))
>> goto out_overflow;
>> memcpy(id.data, p, NFS4_MAX_SESSIONID_LEN);
>> --
>> 1.7.3.4
>>
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>

2011-02-23 18:43:12

by Benny Halevy

[permalink] [raw]

Subject: Re: [PATCH] NFSD: fix decode_cb_sequence4resok

On 2011-02-23 10:07, Benny Halevy wrote:
> On 2011-02-23 09:29, J. Bruce Fields wrote:
>> On Wed, Feb 23, 2011 at 09:08:34AM -0800, Benny Halevy wrote:
>>> On 2011-02-23 08:48, Chuck Lever wrote:
>>>>
>>>> On Feb 22, 2011, at 2:43 PM, Benny Halevy wrote:
>>>>
>>>>> Fix bug introduced in patch
>>>>> 85a56480 NFSD: Update XDR decoders in NFSv4 callback client
>>>>>
>>>>> Although decode_cb_sequence4resok ignores highest slotid and target highest slotid
>>>>> it must account for their space in their xdr stream when calling xdr_inline_decode
>>>>
>>>> The real problem is that decoding for the next operation in the compound will start too early in the buffer, because we didn't account for the ignored 8 bytes here, yes?
>>>
>>> Right on the spot.
>>
>> So actually I guess there is another bug here, which is a subset of
>>
>> http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues#Callback_error_handling
>>
>> The server should be setting the appropriate sequence flag when it
>> (rightly or wrongly) things that a cb reply is garbage (not sure which
>> flag off the top of my head), and pynfs should be insisting that the
>> flag be set on any further sequence flags.
>>
>> --b.
>
> SEQ4_STATUS_BACKCHANNEL_FAULT
> The server has encountered an unrecoverable fault with the
> backchannel (e.g., it has lost track of the sequence ID for a slot
> in the backchannel). The client MUST stop sending more requests
> on the session's fore channel, wait for all outstanding requests
> to complete on the fore and back channel, and then destroy the
> session.
>
> Right?
> --

How about this patch?

>From c5b856eaab1e17f3d67b6fd0964d0803318ec342 Mon Sep 17 00:00:00 2001
From: Benny Halevy <[email protected]>
Date: Wed, 23 Feb 2011 10:38:19 -0800
Subject: [PATCH] nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4callback.c | 10 ++++++++++
fs/nfsd/nfs4state.c | 8 +++++++-
fs/nfsd/state.h | 1 +
3 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index d046bdb..b914fb1 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -39,6 +39,8 @@

#define NFSDDBG_FACILITY NFSDDBG_PROC

+static void nfsd4_mark_cb_fault(struct nfs4_client *, int reason);
+
#define NFSPROC4_CB_NULL 0
#define NFSPROC4_CB_COMPOUND 1

@@ -620,6 +622,8 @@ static int decode_cb_sequence4resok(struct xdr_stream *xdr,
*/
status = 0;
out:
+ if (status)
+ nfsd4_mark_cb_fault(cb->cb_clp, status);
return status;
out_overflow:
print_overflow_msg(__func__, xdr);
@@ -935,6 +939,12 @@ static void nfsd4_mark_cb_down(struct nfs4_client *clp, int reason)
warn_no_callback_path(clp, reason);
}

+static void nfsd4_mark_cb_fault(struct nfs4_client *clp, int reason)
+{
+ clp->cl_cb_state = NFSD4_CB_FAULT;
+ warn_no_callback_path(clp, reason);
+}
+
static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata)
{
struct nfs4_client *clp = container_of(calldata, struct nfs4_client, cl_cb_null);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index e8df39f..2188c16 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1856,8 +1856,14 @@ out:

nfsd4_get_session(cstate->session);
atomic_inc(&clp->cl_refcount);
- if (clp->cl_cb_state == NFSD4_CB_DOWN)
+ switch (clp->cl_cb_state) {
+ case NFSD4_CB_DOWN:
seq->status_flags |= SEQ4_STATUS_CB_PATH_DOWN;
+ break;
+ case NFSD4_CB_FAULT:
+ seq->status_flags |= SEQ4_STATUS_BACKCHANNEL_FAULT;
+ break;
+ }
}
kfree(conn);
spin_unlock(&client_lock);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index c934e1c..95ddf7a 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -235,6 +235,7 @@ struct nfs4_client {
#define NFSD4_CB_UP 0
#define NFSD4_CB_UNKNOWN 1
#define NFSD4_CB_DOWN 2
+#define NFSD4_CB_FAULT 3
int cl_cb_state;
struct nfsd4_callback cl_cb_null;
struct nfsd4_session *cl_cb_session;
--
1.7.3.4

2011-02-25 00:37:33

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH] NFSD: fix decode_cb_sequence4resok

On Wed, Feb 23, 2011 at 10:43:13AM -0800, Benny Halevy wrote:
> On 2011-02-23 10:07, Benny Halevy wrote:
> > SEQ4_STATUS_BACKCHANNEL_FAULT
> > The server has encountered an unrecoverable fault with the
> > backchannel (e.g., it has lost track of the sequence ID for a slot
> > in the backchannel). The client MUST stop sending more requests
> > on the session's fore channel, wait for all outstanding requests
> > to complete on the fore and back channel, and then destroy the
> > session.
> >
> > Right?

Yep, thanks. Looking at my todo list on the wiki, I see I mistakenly
had two entries for the backchannel error handling.... I've fixed that
and added a mention of this case:

http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues#Callback_failure_handling

> How about this patch?

Thanks for diving into this!

I suspect the FAULT state should be per-session, not
per-client--otherwise a client that has more than one session may be
forced to throw away some other perfectly good session.

Makes sense to me to set this on any callback decode error, as that's a
pretty sure sign that the backchannel is hopeless.

It's probably the right behavior on failure to decode a later op in the
compound as well, and we don't want to add this check to every op
decoder--perhaps a check in e.g. nfsd4_cb_done() could catch any error
that results from callback decoding.

--b.

>
> From c5b856eaab1e17f3d67b6fd0964d0803318ec342 Mon Sep 17 00:00:00 2001
> From: Benny Halevy <[email protected]>
> Date: Wed, 23 Feb 2011 10:38:19 -0800
> Subject: [PATCH] nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid
>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfsd/nfs4callback.c | 10 ++++++++++
> fs/nfsd/nfs4state.c | 8 +++++++-
> fs/nfsd/state.h | 1 +
> 3 files changed, 18 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index d046bdb..b914fb1 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -39,6 +39,8 @@
>
> #define NFSDDBG_FACILITY NFSDDBG_PROC
>
> +static void nfsd4_mark_cb_fault(struct nfs4_client *, int reason);
> +
> #define NFSPROC4_CB_NULL 0
> #define NFSPROC4_CB_COMPOUND 1
>
> @@ -620,6 +622,8 @@ static int decode_cb_sequence4resok(struct xdr_stream *xdr,
> */
> status = 0;
> out:
> + if (status)
> + nfsd4_mark_cb_fault(cb->cb_clp, status);
> return status;
> out_overflow:
> print_overflow_msg(__func__, xdr);
> @@ -935,6 +939,12 @@ static void nfsd4_mark_cb_down(struct nfs4_client *clp, int reason)
> warn_no_callback_path(clp, reason);
> }
>
> +static void nfsd4_mark_cb_fault(struct nfs4_client *clp, int reason)
> +{
> + clp->cl_cb_state = NFSD4_CB_FAULT;
> + warn_no_callback_path(clp, reason);
> +}
> +
> static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata)
> {
> struct nfs4_client *clp = container_of(calldata, struct nfs4_client, cl_cb_null);
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index e8df39f..2188c16 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1856,8 +1856,14 @@ out:
>
> nfsd4_get_session(cstate->session);
> atomic_inc(&clp->cl_refcount);
> - if (clp->cl_cb_state == NFSD4_CB_DOWN)
> + switch (clp->cl_cb_state) {
> + case NFSD4_CB_DOWN:
> seq->status_flags |= SEQ4_STATUS_CB_PATH_DOWN;
> + break;
> + case NFSD4_CB_FAULT:
> + seq->status_flags |= SEQ4_STATUS_BACKCHANNEL_FAULT;
> + break;
> + }
> }
> kfree(conn);
> spin_unlock(&client_lock);
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index c934e1c..95ddf7a 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -235,6 +235,7 @@ struct nfs4_client {
> #define NFSD4_CB_UP 0
> #define NFSD4_CB_UNKNOWN 1
> #define NFSD4_CB_DOWN 2
> +#define NFSD4_CB_FAULT 3
> int cl_cb_state;
> struct nfsd4_callback cl_cb_null;
> struct nfsd4_session *cl_cb_session;
> --
> 1.7.3.4
>

2011-02-23 00:12:14

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH] NFSD: fix decode_cb_sequence4resok

On Tue, Feb 22, 2011 at 02:43:22PM -0800, Benny Halevy wrote:
> Fix bug introduced in patch
> 85a56480 NFSD: Update XDR decoders in NFSv4 callback client
>
> Although decode_cb_sequence4resok ignores highest slotid and target highest slotid
> it must account for their space in their xdr stream when calling xdr_inline_decode

Thanks, applying for 2.6.38. (How come you caught this, and I didn't?
I guess it's just that the object code depends more on the callback
returns?)

--b.

>
> Cc: Chuck Lever <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfsd/nfs4callback.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index da54498..d046bdb 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -592,7 +592,7 @@ static int decode_cb_sequence4resok(struct xdr_stream *xdr,
> * If the server returns different values for sessionID, slotID or
> * sequence number, the server is looney tunes.
> */
> - p = xdr_inline_decode(xdr, NFS4_MAX_SESSIONID_LEN + 4 + 4);
> + p = xdr_inline_decode(xdr, NFS4_MAX_SESSIONID_LEN + 4 + 4 + 4 + 4);
> if (unlikely(p == NULL))
> goto out_overflow;
> memcpy(id.data, p, NFS4_MAX_SESSIONID_LEN);
> --
> 1.7.3.4
>

2011-02-23 00:15:25

by Benny Halevy

[permalink] [raw]

Subject: Re: [PATCH] NFSD: fix decode_cb_sequence4resok

On 2011-02-22 16:11, J. Bruce Fields wrote:
> On Tue, Feb 22, 2011 at 02:43:22PM -0800, Benny Halevy wrote:
>> Fix bug introduced in patch
>> 85a56480 NFSD: Update XDR decoders in NFSv4 callback client
>>
>> Although decode_cb_sequence4resok ignores highest slotid and target highest slotid
>> it must account for their space in their xdr stream when calling xdr_inline_decode
>
> Thanks, applying for 2.6.38. (How come you caught this, and I didn't?
> I guess it's just that the object code depends more on the callback
> returns?)

That's a great question.
I caught it by testing cb_layoutrecall, both with the objects layout
and with the files layout with the patch I sent for PNFSD_LEXP to recall the
layout on truncate.
cb_recall decoding should be broken too at the server.
It's possible nothing really depends on its status.

Benny

>
> --b.
>
>>