2010-06-28 17:33:23

by Boaz Harrosh

[permalink] [raw]
Subject: [PATCH] nfsd41: Fix a crash when a callback is retried


If a callback is retried at nfsd4_cb_recall_done() do to
some error. The returned rpc reply would then crash here:

@@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
u32 dummy;
__be32 *p;

+ BUG_ON(!res);
if (res->cbs_minorversion == 0)
return 0;

[BUG_ON added for demonstration]

This is because the nfsd4_cb_done_sequence() has NULLed out
the task->tk_msg.rpc_resp pointer.

This problem was introduced by a 4.1 protocol addition patch:
[0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1

Which was overlooking the possibility of an RPC callback retries.

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfsd/nfs4callback.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index f3b5015..dace7e2 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -869,9 +869,6 @@ static void nfsd4_cb_done_sequence(struct rpc_task *task,
rpc_wake_up_next(&clp->cl_cb_waitq);
dprintk("%s: freed slot, new seqid=%d\n", __func__,
clp->cl_cb_seq_nr);
-
- /* We're done looking into the sequence information */
- task->tk_msg.rpc_resp = NULL;
}
}

--
1.6.6.1



2010-06-28 17:38:39

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH] nfsd41: Fix a crash when a callback is retried

On 06/28/2010 08:33 PM, Boaz Harrosh wrote:
>
> If a callback is retried at nfsd4_cb_recall_done() do to
> some error. The returned rpc reply would then crash here:
>
> @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
> u32 dummy;
> __be32 *p;
>
> + BUG_ON(!res);
> if (res->cbs_minorversion == 0)
> return 0;
>
> [BUG_ON added for demonstration]
>
> This is because the nfsd4_cb_done_sequence() has NULLed out
> the task->tk_msg.rpc_resp pointer.
>
> This problem was introduced by a 4.1 protocol addition patch:
> [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1
>
> Which was overlooking the possibility of an RPC callback retries.
>
> Signed-off-by: Boaz Harrosh <[email protected]>

Bruce do we need a CC: [email protected] here. Is any code actually
exercising callbacks? And with 4.1?

It is an existing bug but an highly theoretical one. I'd say:
innocent until proven guilty in this case.

Boaz

> ---
> fs/nfsd/nfs4callback.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index f3b5015..dace7e2 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -869,9 +869,6 @@ static void nfsd4_cb_done_sequence(struct rpc_task *task,
> rpc_wake_up_next(&clp->cl_cb_waitq);
> dprintk("%s: freed slot, new seqid=%d\n", __func__,
> clp->cl_cb_seq_nr);
> -
> - /* We're done looking into the sequence information */
> - task->tk_msg.rpc_resp = NULL;
> }
> }
>


2010-06-28 18:50:15

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH] nfsd41: Fix a crash when a callback is retried

On Jun. 28, 2010, 20:33 +0300, Boaz Harrosh <[email protected]> wrote:
>
> If a callback is retried at nfsd4_cb_recall_done() do to
> some error. The returned rpc reply would then crash here:
>
> @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
> u32 dummy;
> __be32 *p;
>
> + BUG_ON(!res);
> if (res->cbs_minorversion == 0)
> return 0;
>
> [BUG_ON added for demonstration]
>
> This is because the nfsd4_cb_done_sequence() has NULLed out
> the task->tk_msg.rpc_resp pointer.
>
> This problem was introduced by a 4.1 protocol addition patch:
> [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1
>
> Which was overlooking the possibility of an RPC callback retries.
>
> Signed-off-by: Boaz Harrosh <[email protected]>
> ---
> fs/nfsd/nfs4callback.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index f3b5015..dace7e2 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -869,9 +869,6 @@ static void nfsd4_cb_done_sequence(struct rpc_task *task,
> rpc_wake_up_next(&clp->cl_cb_waitq);
> dprintk("%s: freed slot, new seqid=%d\n", __func__,
> clp->cl_cb_seq_nr);
> -
> - /* We're done looking into the sequence information */
> - task->tk_msg.rpc_resp = NULL;
> }
> }
>

It looks like we have a more fundamental problem that
nfsd41_cb_setup_sequence is not called on the retry path
meaning that not only the message isn't reinitialized properly
but the single slot is not allocated as it should.
Boaz, I think you saw multiple callbacks going out concurrently, right?

rpc_restart_call_prepare() should be called instead of rpc_restart_call()


2010-06-29 07:43:29

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH] nfsd41: Fix a crash when a callback is retried

On 06/28/2010 09:50 PM, Benny Halevy wrote:
> On Jun. 28, 2010, 20:33 +0300, Boaz Harrosh <[email protected]> wrote:
>>
>> If a callback is retried at nfsd4_cb_recall_done() do to
>> some error. The returned rpc reply would then crash here:
>>
>> @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
>> u32 dummy;
>> __be32 *p;
>>
>> + BUG_ON(!res);
>> if (res->cbs_minorversion == 0)
>> return 0;
>>
>> [BUG_ON added for demonstration]
>>
>> This is because the nfsd4_cb_done_sequence() has NULLed out
>> the task->tk_msg.rpc_resp pointer.
>>
>> This problem was introduced by a 4.1 protocol addition patch:
>> [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1
>>
>> Which was overlooking the possibility of an RPC callback retries.
>>
>> Signed-off-by: Boaz Harrosh <[email protected]>
>> ---
>> fs/nfsd/nfs4callback.c | 3 ---
>> 1 files changed, 0 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>> index f3b5015..dace7e2 100644
>> --- a/fs/nfsd/nfs4callback.c
>> +++ b/fs/nfsd/nfs4callback.c
>> @@ -869,9 +869,6 @@ static void nfsd4_cb_done_sequence(struct rpc_task *task,
>> rpc_wake_up_next(&clp->cl_cb_waitq);
>> dprintk("%s: freed slot, new seqid=%d\n", __func__,
>> clp->cl_cb_seq_nr);
>> -
>> - /* We're done looking into the sequence information */
>> - task->tk_msg.rpc_resp = NULL;
>> }
>> }
>>
>
> It looks like we have a more fundamental problem that
> nfsd41_cb_setup_sequence is not called on the retry path
> meaning that not only the message isn't reinitialized properly
> but the single slot is not allocated as it should.

That's true clp->cl_cb_slot_busy is not set on the retry path and there for
there might be a bug with a second in-coming cb_layout/cb_recall.
This is not the case with my test because exofs limits one cb_layout per inode,
and my test only uses one file. But with multiple files test it would.

> Boaz, I think you saw multiple callbacks going out concurrently, right?
>

No, that was a stupid exofs bug. I showed you. I was sending two cb_layout
each time, Just for fun ;-) (Patch coming soon)

> rpc_restart_call_prepare() should be called instead of rpc_restart_call()
>

Yes, that should work better, and nfsd4_cb_recall_done should be fixed as well.
Am testing cb_layout_done() path. (Will send patches soon)

Boaz

2010-06-29 11:33:58

by Boaz Harrosh

[permalink] [raw]
Subject: [PATCH version2] nfsd41: Fix a crash when a callback is retried


If a callback is retried at nfsd4_cb_recall_done() do to
some error. The returned rpc reply would then crash here:

@@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
u32 dummy;
__be32 *p;

+ BUG_ON(!res);
if (res->cbs_minorversion == 0)
return 0;

[BUG_ON added for demonstration]

This is because the nfsd4_cb_done_sequence() has NULLed out
the task->tk_msg.rpc_resp pointer.

Also eventually the rpc would use the new slot without making
sure it is free by calling nfsd41_cb_setup_sequence().

This problem was introduced by a 4.1 protocol addition patch:
[0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1

Which was overlooking the possibility of an RPC callback retries.
For not-4.1 case redoing the _prepare is harmless.

Signed-off-by: Boaz Harrosh <[email protected]>
---
fs/nfsd/nfs4callback.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index f3b5015..3bbeae8 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -913,7 +913,7 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
if (dp->dl_retries--) {
rpc_delay(task, 2*HZ);
task->tk_status = 0;
- rpc_restart_call(task);
+ rpc_restart_call_prepare(task);
return;
} else {
atomic_set(&clp->cl_cb_set, 0);
--
1.6.6.1



2010-07-20 14:37:39

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH version2] nfsd41: Fix a crash when a callback is retried

On 06/29/2010 02:33 PM, Boaz Harrosh wrote:
>
> If a callback is retried at nfsd4_cb_recall_done() do to
> some error. The returned rpc reply would then crash here:
>
> @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
> u32 dummy;
> __be32 *p;
>
> + BUG_ON(!res);
> if (res->cbs_minorversion == 0)
> return 0;
>
> [BUG_ON added for demonstration]
>
> This is because the nfsd4_cb_done_sequence() has NULLed out
> the task->tk_msg.rpc_resp pointer.
>
> Also eventually the rpc would use the new slot without making
> sure it is free by calling nfsd41_cb_setup_sequence().
>
> This problem was introduced by a 4.1 protocol addition patch:
> [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1
>
> Which was overlooking the possibility of an RPC callback retries.
> For not-4.1 case redoing the _prepare is harmless.
>
> Signed-off-by: Boaz Harrosh <[email protected]>

Bruce hi.

This is a crash fix for current 4.1 code. Perhaps you have missed it.
(If not, sorry. Just that I've not seen any response)

Thanks
Boaz

> ---
> fs/nfsd/nfs4callback.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index f3b5015..3bbeae8 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -913,7 +913,7 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
> if (dp->dl_retries--) {
> rpc_delay(task, 2*HZ);
> task->tk_status = 0;
> - rpc_restart_call(task);
> + rpc_restart_call_prepare(task);
> return;
> } else {
> atomic_set(&clp->cl_cb_set, 0);


2010-07-21 23:29:39

by J.Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH version2] nfsd41: Fix a crash when a callback is retried

On Tue, Jul 20, 2010 at 05:37:36PM +0300, Boaz Harrosh wrote:
> On 06/29/2010 02:33 PM, Boaz Harrosh wrote:
> >
> > If a callback is retried at nfsd4_cb_recall_done() do to
> > some error. The returned rpc reply would then crash here:
> >
> > @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
> > u32 dummy;
> > __be32 *p;
> >
> > + BUG_ON(!res);
> > if (res->cbs_minorversion == 0)
> > return 0;
> >
> > [BUG_ON added for demonstration]
> >
> > This is because the nfsd4_cb_done_sequence() has NULLed out
> > the task->tk_msg.rpc_resp pointer.
> >
> > Also eventually the rpc would use the new slot without making
> > sure it is free by calling nfsd41_cb_setup_sequence().
> >
> > This problem was introduced by a 4.1 protocol addition patch:
> > [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1
> >
> > Which was overlooking the possibility of an RPC callback retries.
> > For not-4.1 case redoing the _prepare is harmless.
> >
> > Signed-off-by: Boaz Harrosh <[email protected]>
>
> Bruce hi.
>
> This is a crash fix for current 4.1 code. Perhaps you have missed it.
> (If not, sorry. Just that I've not seen any response)

It's always good to poke me again in a case like this.... I haven't
gotten to it yet, but it's on my list, thanks.

--b.

>
> Thanks
> Boaz
>
> > ---
> > fs/nfsd/nfs4callback.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index f3b5015..3bbeae8 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -913,7 +913,7 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
> > if (dp->dl_retries--) {
> > rpc_delay(task, 2*HZ);
> > task->tk_status = 0;
> > - rpc_restart_call(task);
> > + rpc_restart_call_prepare(task);
> > return;
> > } else {
> > atomic_set(&clp->cl_cb_set, 0);
>

2010-07-01 18:29:01

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH version2] nfsd41: Fix a crash when a callback is retried

On Jun. 29, 2010, 14:33 +0300, Boaz Harrosh <[email protected]> wrote:
>
> If a callback is retried at nfsd4_cb_recall_done() do to
> some error. The returned rpc reply would then crash here:
>
> @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
> u32 dummy;
> __be32 *p;
>
> + BUG_ON(!res);
> if (res->cbs_minorversion == 0)
> return 0;
>
> [BUG_ON added for demonstration]
>
> This is because the nfsd4_cb_done_sequence() has NULLed out
> the task->tk_msg.rpc_resp pointer.
>
> Also eventually the rpc would use the new slot without making
> sure it is free by calling nfsd41_cb_setup_sequence().
>
> This problem was introduced by a 4.1 protocol addition patch:
> [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1
>
> Which was overlooking the possibility of an RPC callback retries.
> For not-4.1 case redoing the _prepare is harmless.
>
> Signed-off-by: Boaz Harrosh <[email protected]>

Merged at pnfs-all-2.6.35-rc3-2010-07-01

Thanks!

Benny

> ---
> fs/nfsd/nfs4callback.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index f3b5015..3bbeae8 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -913,7 +913,7 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
> if (dp->dl_retries--) {
> rpc_delay(task, 2*HZ);
> task->tk_status = 0;
> - rpc_restart_call(task);
> + rpc_restart_call_prepare(task);
> return;
> } else {
> atomic_set(&clp->cl_cb_set, 0);

2010-08-23 22:32:45

by J.Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH version2] nfsd41: Fix a crash when a callback is retried

On Mon, Aug 23, 2010 at 01:10:28PM +0300, Boaz Harrosh wrote:
> On 08/05/2010 05:22 PM, J. Bruce Fields wrote:
> > Thanks, applied. There may be a delay before it shows up in my
> > for-2.6.36 queue, while I sort out a few other bugs.
> >
> > (We still have problems here: getting a new slot isn't always the
> > correct thing to do, depending on the error. But this seems an
> > improvement....).
> >
>
> We do free the slot unconditionally, (hence the crash) so we better
> grab a new one.
>
> Please store in the corner of your mind those cases you think should
> not free the slot, I promise to put it on my todo, and come back to it.

OK, thanks.

Might be worth starting by looking at the client code, as we probably
have to handle all the same errors in cb_sequence replies as are handled
in sequence replies.

--b.

2010-08-23 10:10:34

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH version2] nfsd41: Fix a crash when a callback is retried

On 08/05/2010 05:22 PM, J. Bruce Fields wrote:
> Thanks, applied. There may be a delay before it shows up in my
> for-2.6.36 queue, while I sort out a few other bugs.
>
> (We still have problems here: getting a new slot isn't always the
> correct thing to do, depending on the error. But this seems an
> improvement....).
>

We do free the slot unconditionally, (hence the crash) so we better
grab a new one.

Please store in the corner of your mind those cases you think should
not free the slot, I promise to put it on my todo, and come back to it.

We will then have to switch on these case that don't free the slot and
bail out early. At this point of the code, it will remain the same.

But you are right, I have not thought about these cases.

> --b.

Thanks
Boaz

2010-08-05 14:23:55

by J.Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH version2] nfsd41: Fix a crash when a callback is retried

On Tue, Jun 29, 2010 at 02:33:55PM +0300, Boaz Harrosh wrote:
>
> If a callback is retried at nfsd4_cb_recall_done() do to
> some error. The returned rpc reply would then crash here:
>
> @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
> u32 dummy;
> __be32 *p;
>
> + BUG_ON(!res);
> if (res->cbs_minorversion == 0)
> return 0;

Thanks, applied. There may be a delay before it shows up in my
for-2.6.36 queue, while I sort out a few other bugs.

(We still have problems here: getting a new slot isn't always the
correct thing to do, depending on the error. But this seems an
improvement....).

--b.

>
> [BUG_ON added for demonstration]
>
> This is because the nfsd4_cb_done_sequence() has NULLed out
> the task->tk_msg.rpc_resp pointer.
>
> Also eventually the rpc would use the new slot without making
> sure it is free by calling nfsd41_cb_setup_sequence().
>
> This problem was introduced by a 4.1 protocol addition patch:
> [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1
>
> Which was overlooking the possibility of an RPC callback retries.
> For not-4.1 case redoing the _prepare is harmless.
>
> Signed-off-by: Boaz Harrosh <[email protected]>
> ---
> fs/nfsd/nfs4callback.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index f3b5015..3bbeae8 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -913,7 +913,7 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
> if (dp->dl_retries--) {
> rpc_delay(task, 2*HZ);
> task->tk_status = 0;
> - rpc_restart_call(task);
> + rpc_restart_call_prepare(task);
> return;
> } else {
> atomic_set(&clp->cl_cb_set, 0);
> --
> 1.6.6.1
>
>