LinuxLists.cc - [PATCH 1/1] NFSD: fix error handling in callbacks

2021-03-09 14:43:19

Subject: [PATCH 1/1] NFSD: fix error handling in callbacks

From: Olga Kornievskaia <[email protected]>

When the server tries to do a callback and a client fails it due to
authentication problems, we need the server to set callback down
flag in RENEW so that client can recover.

Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfsd/nfs4callback.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 052be5bf9ef5..7325592b456e 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
switch (task->tk_status) {
case -EIO:
case -ETIMEDOUT:
+ case -EACCES:
nfsd4_mark_cb_down(clp, task->tk_status);
}
break;
--
2.27.0

2021-03-09 15:40:04

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <[email protected]>
>
> When the server tries to do a callback and a client fails it due to
> authentication problems, we need the server to set callback down
> flag in RENEW so that client can recover.

I was looking at this. It looks to me like this should really be just:

case 1:
if (task->tk_status)
nfsd4_mark_cb_down(clp, task->tk_status);

If tk_status showed an error, and the ->done method doesn't return 0 to
tell us it something worth retrying, then the callback failed
permanently, so we should mark the callback path down, regardless of the
exact error.

--b.

>
> Signed-off-by: Olga Kornievskaia <[email protected]>
> ---
> fs/nfsd/nfs4callback.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 052be5bf9ef5..7325592b456e 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> switch (task->tk_status) {
> case -EIO:
> case -ETIMEDOUT:
> + case -EACCES:
> nfsd4_mark_cb_down(clp, task->tk_status);
> }
> break;
> --
> 2.27.0
>

2021-03-09 16:41:41

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, Mar 9, 2021 at 10:37 AM J. Bruce Fields <[email protected]> wrote:
>
> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <[email protected]>
> >
> > When the server tries to do a callback and a client fails it due to
> > authentication problems, we need the server to set callback down
> > flag in RENEW so that client can recover.
>
> I was looking at this. It looks to me like this should really be just:
>
> case 1:
> if (task->tk_status)
> nfsd4_mark_cb_down(clp, task->tk_status);
>
> If tk_status showed an error, and the ->done method doesn't return 0 to
> tell us it something worth retrying, then the callback failed
> permanently, so we should mark the callback path down, regardless of the
> exact error.

Ok. v2 coming (will change the title to make it 4.0 callback)

>
> --b.
>
> >
> > Signed-off-by: Olga Kornievskaia <[email protected]>
> > ---
> > fs/nfsd/nfs4callback.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 052be5bf9ef5..7325592b456e 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > switch (task->tk_status) {
> > case -EIO:
> > case -ETIMEDOUT:
> > + case -EACCES:
> > nfsd4_mark_cb_down(clp, task->tk_status);
> > }
> > break;
> > --
> > 2.27.0
> >
>

2021-03-09 16:48:05

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, Mar 9, 2021 at 11:39 AM Olga Kornievskaia
<[email protected]> wrote:
>
> On Tue, Mar 9, 2021 at 10:37 AM J. Bruce Fields <[email protected]> wrote:
> >
> > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <[email protected]>
> > >
> > > When the server tries to do a callback and a client fails it due to
> > > authentication problems, we need the server to set callback down
> > > flag in RENEW so that client can recover.
> >
> > I was looking at this. It looks to me like this should really be just:
> >
> > case 1:
> > if (task->tk_status)
> > nfsd4_mark_cb_down(clp, task->tk_status);
> >
> > If tk_status showed an error, and the ->done method doesn't return 0 to
> > tell us it something worth retrying, then the callback failed
> > permanently, so we should mark the callback path down, regardless of the
> > exact error.
>
> Ok. v2 coming (will change the title to make it 4.0 callback)

Sigh, I didn't change the wording of the commit and left the
authentication problem which is not accurate enough for this patch (as
say connection errors are also covered by this patch). Do you need me
to change the wording of the commit and send v3?

>
> >
> > --b.
> >
> > >
> > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > ---
> > > fs/nfsd/nfs4callback.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index 052be5bf9ef5..7325592b456e 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > > switch (task->tk_status) {
> > > case -EIO:
> > > case -ETIMEDOUT:
> > > + case -EACCES:
> > > nfsd4_mark_cb_down(clp, task->tk_status);
> > > }
> > > break;
> > > --
> > > 2.27.0
> > >
> >

2021-03-09 17:45:23

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

> On Mar 9, 2021, at 11:46 AM, Olga Kornievskaia <[email protected]> wrote:
>
> On Tue, Mar 9, 2021 at 11:39 AM Olga Kornievskaia
> <[email protected]> wrote:
>>
>> On Tue, Mar 9, 2021 at 10:37 AM J. Bruce Fields <[email protected]> wrote:
>>>
>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
>>>> From: Olga Kornievskaia <[email protected]>
>>>>
>>>> When the server tries to do a callback and a client fails it due to
>>>> authentication problems, we need the server to set callback down
>>>> flag in RENEW so that client can recover.
>>>
>>> I was looking at this. It looks to me like this should really be just:
>>>
>>> case 1:
>>> if (task->tk_status)
>>> nfsd4_mark_cb_down(clp, task->tk_status);
>>>
>>> If tk_status showed an error, and the ->done method doesn't return 0 to
>>> tell us it something worth retrying, then the callback failed
>>> permanently, so we should mark the callback path down, regardless of the
>>> exact error.
>>
>> Ok. v2 coming (will change the title to make it 4.0 callback)
>
> Sigh, I didn't change the wording of the commit and left the
> authentication problem which is not accurate enough for this patch (as
> say connection errors are also covered by this patch). Do you need me
> to change the wording of the commit and send v3?

Yes, please post a v3. Thanks.

>>> --b.
>>>
>>>>
>>>> Signed-off-by: Olga Kornievskaia <[email protected]>
>>>> ---
>>>> fs/nfsd/nfs4callback.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>> index 052be5bf9ef5..7325592b456e 100644
>>>> --- a/fs/nfsd/nfs4callback.c
>>>> +++ b/fs/nfsd/nfs4callback.c
>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
>>>> switch (task->tk_status) {
>>>> case -EIO:
>>>> case -ETIMEDOUT:
>>>> + case -EACCES:
>>>> nfsd4_mark_cb_down(clp, task->tk_status);
>>>> }
>>>> break;
>>>> --
>>>> 2.27.0

--
Chuck Lever

2021-03-09 19:46:55

by Scott Mayhew

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, 09 Mar 2021, J. Bruce Fields wrote:

> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <[email protected]>
> >
> > When the server tries to do a callback and a client fails it due to
> > authentication problems, we need the server to set callback down
> > flag in RENEW so that client can recover.
>
> I was looking at this. It looks to me like this should really be just:
>
> case 1:
> if (task->tk_status)
> nfsd4_mark_cb_down(clp, task->tk_status);
>
> If tk_status showed an error, and the ->done method doesn't return 0 to
> tell us it something worth retrying, then the callback failed
> permanently, so we should mark the callback path down, regardless of the
> exact error.

That switch was added because the server was erroneously setting
SEQ4_STATUS_CB_PATH_DOWN in the event that a client returned an
NFS-level error for a CB_RECALL that the client had already done a
FREE_STATEID for. Removing the switch will cause the server to go back
to that behaviour, won't it?

-Scott
>
> --b.
>
> >
> > Signed-off-by: Olga Kornievskaia <[email protected]>
> > ---
> > fs/nfsd/nfs4callback.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 052be5bf9ef5..7325592b456e 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > switch (task->tk_status) {
> > case -EIO:
> > case -ETIMEDOUT:
> > + case -EACCES:
> > nfsd4_mark_cb_down(clp, task->tk_status);
> > }
> > break;
> > --
> > 2.27.0
> >
>

2021-03-09 20:13:41

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, Mar 09, 2021 at 02:45:19PM -0500, Scott Mayhew wrote:
> On Tue, 09 Mar 2021, J. Bruce Fields wrote:
>
> > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <[email protected]>
> > >
> > > When the server tries to do a callback and a client fails it due to
> > > authentication problems, we need the server to set callback down
> > > flag in RENEW so that client can recover.
> >
> > I was looking at this. It looks to me like this should really be just:
> >
> > case 1:
> > if (task->tk_status)
> > nfsd4_mark_cb_down(clp, task->tk_status);
> >
> > If tk_status showed an error, and the ->done method doesn't return 0 to
> > tell us it something worth retrying, then the callback failed
> > permanently, so we should mark the callback path down, regardless of the
> > exact error.
>
> That switch was added because the server was erroneously setting
> SEQ4_STATUS_CB_PATH_DOWN in the event that a client returned an
> NFS-level error for a CB_RECALL that the client had already done a
> FREE_STATEID for. Removing the switch will cause the server to go back
> to that behaviour, won't it?

Oh, thanks for the history.

My knee-jerk reaction is: that sounds like a recall-specific issue, so
maybe that logic should be in nfsd4_cb_recall_done?

I guess I'm OK continuing instead to enumerate transport-layer errors in
nfsd4_cb_done, but do we know that EACCES makes it a complete list?

--b.

>
> -Scott
> >
> > --b.
> >
> > >
> > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > ---
> > > fs/nfsd/nfs4callback.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index 052be5bf9ef5..7325592b456e 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > > switch (task->tk_status) {
> > > case -EIO:
> > > case -ETIMEDOUT:
> > > + case -EACCES:
> > > nfsd4_mark_cb_down(clp, task->tk_status);
> > > }
> > > break;
> > > --
> > > 2.27.0
> > >
> >

2021-03-09 20:24:22

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <[email protected]>
> >
> > When the server tries to do a callback and a client fails it due to
> > authentication problems, we need the server to set callback down
> > flag in RENEW so that client can recover.
>
> I was looking at this. It looks to me like this should really be
> just:
>
>         case 1:
>                 if (task->tk_status)
>                         nfsd4_mark_cb_down(clp, task->tk_status);
>
> If tk_status showed an error, and the ->done method doesn't return 0
> to
> tell us it something worth retrying, then the callback failed
> permanently, so we should mark the callback path down, regardless of
> the
> exact error.

I disagree. task->tk_status could be an unhandled NFSv4 error (see
nfsd4_cb_recall_done()). The client might, for instance, be in the
process of returning the delegation being recalled. Why should that
result in the callback channel being marked as down?

>
> --b.
>
> >
> > Signed-off-by: Olga Kornievskaia <[email protected]>
> > ---
> > fs/nfsd/nfs4callback.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 052be5bf9ef5..7325592b456e 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > *task, void *calldata)
> >                 switch (task->tk_status) {
> >                 case -EIO:
> >                 case -ETIMEDOUT:
> > +               case -EACCES:
> >                         nfsd4_mark_cb_down(clp, task->tk_status);
> >                 }
> >                 break;
> > --
> > 2.27.0
> >
>

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2021-03-09 20:41:50

by Scott Mayhew

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, 09 Mar 2021, J. Bruce Fields wrote:

> On Tue, Mar 09, 2021 at 02:45:19PM -0500, Scott Mayhew wrote:
> > On Tue, 09 Mar 2021, J. Bruce Fields wrote:
> >
> > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <[email protected]>
> > > >
> > > > When the server tries to do a callback and a client fails it due to
> > > > authentication problems, we need the server to set callback down
> > > > flag in RENEW so that client can recover.
> > >
> > > I was looking at this. It looks to me like this should really be just:
> > >
> > > case 1:
> > > if (task->tk_status)
> > > nfsd4_mark_cb_down(clp, task->tk_status);
> > >
> > > If tk_status showed an error, and the ->done method doesn't return 0 to
> > > tell us it something worth retrying, then the callback failed
> > > permanently, so we should mark the callback path down, regardless of the
> > > exact error.
> >
> > That switch was added because the server was erroneously setting
> > SEQ4_STATUS_CB_PATH_DOWN in the event that a client returned an
> > NFS-level error for a CB_RECALL that the client had already done a
> > FREE_STATEID for. Removing the switch will cause the server to go back
> > to that behaviour, won't it?
>
> Oh, thanks for the history.
>
> My knee-jerk reaction is: that sounds like a recall-specific issue, so
> maybe that logic should be in nfsd4_cb_recall_done?
>
> I guess I'm OK continuing instead to enumerate transport-layer errors in
> nfsd4_cb_done, but do we know that EACCES makes it a complete list?

No idea. I'm pretty sure EIO and ETIMEDOUT were the two errors that I
was seeing at the time. I don't remember the exact test case but I
think I was hammering the server trying to reproduce seq misordered &
false retry errors. And EACCES is what Olga's seeing when the server
uses the wrong principal for the callback cred.

-Scott
>
> --b.
>
> >
> > -Scott
> > >
> > > --b.
> > >
> > > >
> > > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > > ---
> > > > fs/nfsd/nfs4callback.c | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index 052be5bf9ef5..7325592b456e 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > > > switch (task->tk_status) {
> > > > case -EIO:
> > > > case -ETIMEDOUT:
> > > > + case -EACCES:
> > > > nfsd4_mark_cb_down(clp, task->tk_status);
> > > > }
> > > > break;
> > > > --
> > > > 2.27.0
> > > >
> > >

2021-03-09 20:44:44

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <[email protected]> wrote:
>
> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <[email protected]>
> > >
> > > When the server tries to do a callback and a client fails it due to
> > > authentication problems, we need the server to set callback down
> > > flag in RENEW so that client can recover.
> >
> > I was looking at this. It looks to me like this should really be
> > just:
> >
> > case 1:
> > if (task->tk_status)
> > nfsd4_mark_cb_down(clp, task->tk_status);
> >
> > If tk_status showed an error, and the ->done method doesn't return 0
> > to
> > tell us it something worth retrying, then the callback failed
> > permanently, so we should mark the callback path down, regardless of
> > the
> > exact error.
>
> I disagree. task->tk_status could be an unhandled NFSv4 error (see
> nfsd4_cb_recall_done()). The client might, for instance, be in the
> process of returning the delegation being recalled. Why should that
> result in the callback channel being marked as down?
>

Are you talking about say the connection going down and server should
just reconnect instead of recovering the callback channel. I assumed
that connection break is something that's not recoverable by the
callback but perhaps I'm wrong.

> >
> > --b.
> >
> > >
> > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > ---
> > > fs/nfsd/nfs4callback.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index 052be5bf9ef5..7325592b456e 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > *task, void *calldata)
> > > switch (task->tk_status) {
> > > case -EIO:
> > > case -ETIMEDOUT:
> > > + case -EACCES:
> > > nfsd4_mark_cb_down(clp, task->tk_status);
> > > }
> > > break;
> > > --
> > > 2.27.0
> > >
> >
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2021-03-09 21:02:01

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> [email protected]> wrote:
> >
> > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > wrote:
> > > > From: Olga Kornievskaia <[email protected]>
> > > >
> > > > When the server tries to do a callback and a client fails it
> > > > due to
> > > > authentication problems, we need the server to set callback
> > > > down
> > > > flag in RENEW so that client can recover.
> > >
> > > I was looking at this. It looks to me like this should really be
> > > just:
> > >
> > >         case 1:
> > >                 if (task->tk_status)
> > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > >
> > > If tk_status showed an error, and the ->done method doesn't
> > > return 0
> > > to
> > > tell us it something worth retrying, then the callback failed
> > > permanently, so we should mark the callback path down, regardless
> > > of
> > > the
> > > exact error.
> >
> > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > process of returning the delegation being recalled. Why should that
> > result in the callback channel being marked as down?
> >
>
> Are you talking about say the connection going down and server should
> just reconnect instead of recovering the callback channel. I assumed
> that connection break is something that's not recoverable by the
> callback but perhaps I'm wrong.

No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
not seeing why either of those errors should be handled by marking the
callback channel as being down.

Looking further, it seems that the same function will also return '1'
without checking the value of task->tk_status if the delegation has
been revoked or returned. So that would mean that even NFS4ERR_DELAY
could trigger the call to nfsd4_mark_cb_down() with the above change.

>
> > >
> > > --b.
> > >
> > > >
> > > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > > ---
> > > > fs/nfsd/nfs4callback.c | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index 052be5bf9ef5..7325592b456e 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > *task, void *calldata)
> > > >                 switch (task->tk_status) {
> > > >                 case -EIO:
> > > >                 case -ETIMEDOUT:
> > > > +               case -EACCES:
> > > >                         nfsd4_mark_cb_down(clp, task-
> > > > >tk_status);
> > > >                 }
> > > >                 break;
> > > > --
> > > > 2.27.0
> > > >
> > >
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > [email protected]
> >
> >

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2021-03-10 14:48:55

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> > On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> > [email protected]> wrote:
> > >
> > > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > > wrote:
> > > > > From: Olga Kornievskaia <[email protected]>
> > > > >
> > > > > When the server tries to do a callback and a client fails it
> > > > > due to
> > > > > authentication problems, we need the server to set callback
> > > > > down
> > > > > flag in RENEW so that client can recover.
> > > >
> > > > I was looking at this.? It looks to me like this should really be
> > > > just:
> > > >
> > > > ??????? case 1:
> > > > ??????????????? if (task->tk_status)
> > > > ??????????????????????? nfsd4_mark_cb_down(clp, task->tk_status);
> > > >
> > > > If tk_status showed an error, and the ->done method doesn't
> > > > return 0
> > > > to
> > > > tell us it something worth retrying, then the callback failed
> > > > permanently, so we should mark the callback path down, regardless
> > > > of
> > > > the
> > > > exact error.
> > >
> > > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > > process of returning the delegation being recalled. Why should that
> > > result in the callback channel being marked as down?
> > >
> >
> > Are you talking about say the connection going down and server should
> > just reconnect instead of recovering the callback channel. I assumed
> > that connection break is something that's not? recoverable by the
> > callback but perhaps I'm wrong.
>
> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> not seeing why either of those errors should be handled by marking the
> callback channel as being down.
>
> Looking further, it seems that the same function will also return '1'
> without checking the value of task->tk_status if the delegation has
> been revoked or returned. So that would mean that even NFS4ERR_DELAY
> could trigger the call to nfsd4_mark_cb_down() with the above change.

Yeah, OK, that's wrong, apologies.

I'm just a little worried about the attempt to enumerate transport level
errors in nfsd4_cb_done(). Are we sure that EIO, ETIMEDOUT, EACCESS is
the right list?

--b.

>
> >
> > > >
> > > > --b.
> > > >
> > > > >
> > > > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > > > ---
> > > > > ?fs/nfsd/nfs4callback.c | 1 +
> > > > > ?1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > index 052be5bf9ef5..7325592b456e 100644
> > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > > *task, void *calldata)
> > > > > ??????????????? switch (task->tk_status) {
> > > > > ??????????????? case -EIO:
> > > > > ??????????????? case -ETIMEDOUT:
> > > > > +?????????????? case -EACCES:
> > > > > ??????????????????????? nfsd4_mark_cb_down(clp, task-
> > > > > >tk_status);
> > > > > ??????????????? }
> > > > > ??????????????? break;
> > > > > --
> > > > > 2.27.0
> > > > >
> > > >
> > >
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > [email protected]
> > >
> > >
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2021-03-10 22:11:22

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <[email protected]> wrote:
>
> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> > On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> > > On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> > > [email protected]> wrote:
> > > >
> > > > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > > > wrote:
> > > > > > From: Olga Kornievskaia <[email protected]>
> > > > > >
> > > > > > When the server tries to do a callback and a client fails it
> > > > > > due to
> > > > > > authentication problems, we need the server to set callback
> > > > > > down
> > > > > > flag in RENEW so that client can recover.
> > > > >
> > > > > I was looking at this. It looks to me like this should really be
> > > > > just:
> > > > >
> > > > > case 1:
> > > > > if (task->tk_status)
> > > > > nfsd4_mark_cb_down(clp, task->tk_status);
> > > > >
> > > > > If tk_status showed an error, and the ->done method doesn't
> > > > > return 0
> > > > > to
> > > > > tell us it something worth retrying, then the callback failed
> > > > > permanently, so we should mark the callback path down, regardless
> > > > > of
> > > > > the
> > > > > exact error.
> > > >
> > > > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > > > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > > > process of returning the delegation being recalled. Why should that
> > > > result in the callback channel being marked as down?
> > > >
> > >
> > > Are you talking about say the connection going down and server should
> > > just reconnect instead of recovering the callback channel. I assumed
> > > that connection break is something that's not recoverable by the
> > > callback but perhaps I'm wrong.
> >
> > No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> > for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> > not seeing why either of those errors should be handled by marking the
> > callback channel as being down.
> >
> > Looking further, it seems that the same function will also return '1'
> > without checking the value of task->tk_status if the delegation has
> > been revoked or returned. So that would mean that even NFS4ERR_DELAY
> > could trigger the call to nfsd4_mark_cb_down() with the above change.
>
> Yeah, OK, that's wrong, apologies.
>
> I'm just a little worried about the attempt to enumerate transport level
> errors in nfsd4_cb_done(). Are we sure that EIO, ETIMEDOUT, EACCESS is
> the right list?

Looking at call_transmit_status error handling, I don't think
connection errors are returned. Instead the code tries to fix the
connection by retrying unless the rpc_timeout is reached and then only
EIO,TIMEDOUT is returned.

Can then my original patch be considered without resubmission?

>
> --b.
>
> >
> > >
> > > > >
> > > > > --b.
> > > > >
> > > > > >
> > > > > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > > > > ---
> > > > > > fs/nfsd/nfs4callback.c | 1 +
> > > > > > 1 file changed, 1 insertion(+)
> > > > > >
> > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > index 052be5bf9ef5..7325592b456e 100644
> > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > > > *task, void *calldata)
> > > > > > switch (task->tk_status) {
> > > > > > case -EIO:
> > > > > > case -ETIMEDOUT:
> > > > > > + case -EACCES:
> > > > > > nfsd4_mark_cb_down(clp, task-
> > > > > > >tk_status);
> > > > > > }
> > > > > > break;
> > > > > > --
> > > > > > 2.27.0
> > > > > >
> > > > >
> > > >
> > > > --
> > > > Trond Myklebust
> > > > Linux NFS client maintainer, Hammerspace
> > > > [email protected]
> > > >
> > > >
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > [email protected]
> >
> >
>

2021-03-11 15:00:24

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Wed, Mar 10, 2021 at 05:09:20PM -0500, Olga Kornievskaia wrote:
> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <[email protected]> wrote:
> >
> > On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> > > On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> > > > On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> > > > [email protected]> wrote:
> > > > >
> > > > > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > > > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > > > > wrote:
> > > > > > > From: Olga Kornievskaia <[email protected]>
> > > > > > >
> > > > > > > When the server tries to do a callback and a client fails it
> > > > > > > due to
> > > > > > > authentication problems, we need the server to set callback
> > > > > > > down
> > > > > > > flag in RENEW so that client can recover.
> > > > > >
> > > > > > I was looking at this. It looks to me like this should really be
> > > > > > just:
> > > > > >
> > > > > > case 1:
> > > > > > if (task->tk_status)
> > > > > > nfsd4_mark_cb_down(clp, task->tk_status);
> > > > > >
> > > > > > If tk_status showed an error, and the ->done method doesn't
> > > > > > return 0
> > > > > > to
> > > > > > tell us it something worth retrying, then the callback failed
> > > > > > permanently, so we should mark the callback path down, regardless
> > > > > > of
> > > > > > the
> > > > > > exact error.
> > > > >
> > > > > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > > > > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > > > > process of returning the delegation being recalled. Why should that
> > > > > result in the callback channel being marked as down?
> > > > >
> > > >
> > > > Are you talking about say the connection going down and server should
> > > > just reconnect instead of recovering the callback channel. I assumed
> > > > that connection break is something that's not recoverable by the
> > > > callback but perhaps I'm wrong.
> > >
> > > No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> > > for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> > > not seeing why either of those errors should be handled by marking the
> > > callback channel as being down.
> > >
> > > Looking further, it seems that the same function will also return '1'
> > > without checking the value of task->tk_status if the delegation has
> > > been revoked or returned. So that would mean that even NFS4ERR_DELAY
> > > could trigger the call to nfsd4_mark_cb_down() with the above change.
> >
> > Yeah, OK, that's wrong, apologies.
> >
> > I'm just a little worried about the attempt to enumerate transport level
> > errors in nfsd4_cb_done(). Are we sure that EIO, ETIMEDOUT, EACCESS is
> > the right list?
>
> Looking at call_transmit_status error handling, I don't think
> connection errors are returned. Instead the code tries to fix the
> connection by retrying unless the rpc_timeout is reached and then only
> EIO,TIMEDOUT is returned.
>
> Can then my original patch be considered without resubmission?

Sure, thanks for checking that.

--b.

>
> >
> > --b.
> >
> > >
> > > >
> > > > > >
> > > > > > --b.
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > > > > > ---
> > > > > > > fs/nfsd/nfs4callback.c | 1 +
> > > > > > > 1 file changed, 1 insertion(+)
> > > > > > >
> > > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > > index 052be5bf9ef5..7325592b456e 100644
> > > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > > > > *task, void *calldata)
> > > > > > > switch (task->tk_status) {
> > > > > > > case -EIO:
> > > > > > > case -ETIMEDOUT:
> > > > > > > + case -EACCES:
> > > > > > > nfsd4_mark_cb_down(clp, task-
> > > > > > > >tk_status);
> > > > > > > }
> > > > > > > break;
> > > > > > > --
> > > > > > > 2.27.0
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Trond Myklebust
> > > > > Linux NFS client maintainer, Hammerspace
> > > > > [email protected]
> > > > >
> > > > >
> > >
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > [email protected]
> > >
> > >
> >
>

2021-03-11 15:11:59

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

> On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <[email protected]> wrote:
>>
>> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
>>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
>>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
>>>> [email protected]> wrote:
>>>>>
>>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
>>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
>>>>>> wrote:
>>>>>>> From: Olga Kornievskaia <[email protected]>
>>>>>>>
>>>>>>> When the server tries to do a callback and a client fails it
>>>>>>> due to
>>>>>>> authentication problems, we need the server to set callback
>>>>>>> down
>>>>>>> flag in RENEW so that client can recover.
>>>>>>
>>>>>> I was looking at this. It looks to me like this should really be
>>>>>> just:
>>>>>>
>>>>>> case 1:
>>>>>> if (task->tk_status)
>>>>>> nfsd4_mark_cb_down(clp, task->tk_status);
>>>>>>
>>>>>> If tk_status showed an error, and the ->done method doesn't
>>>>>> return 0
>>>>>> to
>>>>>> tell us it something worth retrying, then the callback failed
>>>>>> permanently, so we should mark the callback path down, regardless
>>>>>> of
>>>>>> the
>>>>>> exact error.
>>>>>
>>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
>>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
>>>>> process of returning the delegation being recalled. Why should that
>>>>> result in the callback channel being marked as down?
>>>>>
>>>>
>>>> Are you talking about say the connection going down and server should
>>>> just reconnect instead of recovering the callback channel. I assumed
>>>> that connection break is something that's not recoverable by the
>>>> callback but perhaps I'm wrong.
>>>
>>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
>>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
>>> not seeing why either of those errors should be handled by marking the
>>> callback channel as being down.
>>>
>>> Looking further, it seems that the same function will also return '1'
>>> without checking the value of task->tk_status if the delegation has
>>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
>>> could trigger the call to nfsd4_mark_cb_down() with the above change.
>>
>> Yeah, OK, that's wrong, apologies.
>>
>> I'm just a little worried about the attempt to enumerate transport level
>> errors in nfsd4_cb_done(). Are we sure that EIO, ETIMEDOUT, EACCESS is
>> the right list?
>
> Looking at call_transmit_status error handling, I don't think
> connection errors are returned. Instead the code tries to fix the
> connection by retrying unless the rpc_timeout is reached and then only
> EIO,TIMEDOUT is returned.
>
> Can then my original patch be considered without resubmission?

Bruce has authorized v1 of this patch, but that one has the
uncorrected patch description. Post a v4?

>> --b.
>>
>>>
>>>>
>>>>>>
>>>>>> --b.
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Olga Kornievskaia <[email protected]>
>>>>>>> ---
>>>>>>> fs/nfsd/nfs4callback.c | 1 +
>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>> index 052be5bf9ef5..7325592b456e 100644
>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
>>>>>>> *task, void *calldata)
>>>>>>> switch (task->tk_status) {
>>>>>>> case -EIO:
>>>>>>> case -ETIMEDOUT:
>>>>>>> + case -EACCES:
>>>>>>> nfsd4_mark_cb_down(clp, task-
>>>>>>>> tk_status);
>>>>>>> }
>>>>>>> break;
>>>>>>> --
>>>>>>> 2.27.0
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Trond Myklebust
>>>>> Linux NFS client maintainer, Hammerspace
>>>>> [email protected]
>>>>>
>>>>>
>>>
>>> --
>>> Trond Myklebust
>>> Linux NFS client maintainer, Hammerspace
>>> [email protected]

--
Chuck Lever

2021-03-11 15:20:06

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

On Thu, Mar 11, 2021 at 10:10 AM Chuck Lever III <[email protected]> wrote:
>
>
>
> > On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <[email protected]> wrote:
> >
> > On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <[email protected]> wrote:
> >>
> >> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> >>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> >>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> >>>> [email protected]> wrote:
> >>>>>
> >>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> >>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> >>>>>> wrote:
> >>>>>>> From: Olga Kornievskaia <[email protected]>
> >>>>>>>
> >>>>>>> When the server tries to do a callback and a client fails it
> >>>>>>> due to
> >>>>>>> authentication problems, we need the server to set callback
> >>>>>>> down
> >>>>>>> flag in RENEW so that client can recover.
> >>>>>>
> >>>>>> I was looking at this. It looks to me like this should really be
> >>>>>> just:
> >>>>>>
> >>>>>> case 1:
> >>>>>> if (task->tk_status)
> >>>>>> nfsd4_mark_cb_down(clp, task->tk_status);
> >>>>>>
> >>>>>> If tk_status showed an error, and the ->done method doesn't
> >>>>>> return 0
> >>>>>> to
> >>>>>> tell us it something worth retrying, then the callback failed
> >>>>>> permanently, so we should mark the callback path down, regardless
> >>>>>> of
> >>>>>> the
> >>>>>> exact error.
> >>>>>
> >>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
> >>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
> >>>>> process of returning the delegation being recalled. Why should that
> >>>>> result in the callback channel being marked as down?
> >>>>>
> >>>>
> >>>> Are you talking about say the connection going down and server should
> >>>> just reconnect instead of recovering the callback channel. I assumed
> >>>> that connection break is something that's not recoverable by the
> >>>> callback but perhaps I'm wrong.
> >>>
> >>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> >>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> >>> not seeing why either of those errors should be handled by marking the
> >>> callback channel as being down.
> >>>
> >>> Looking further, it seems that the same function will also return '1'
> >>> without checking the value of task->tk_status if the delegation has
> >>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
> >>> could trigger the call to nfsd4_mark_cb_down() with the above change.
> >>
> >> Yeah, OK, that's wrong, apologies.
> >>
> >> I'm just a little worried about the attempt to enumerate transport level
> >> errors in nfsd4_cb_done(). Are we sure that EIO, ETIMEDOUT, EACCESS is
> >> the right list?
> >
> > Looking at call_transmit_status error handling, I don't think
> > connection errors are returned. Instead the code tries to fix the
> > connection by retrying unless the rpc_timeout is reached and then only
> > EIO,TIMEDOUT is returned.
> >
> > Can then my original patch be considered without resubmission?
>
> Bruce has authorized v1 of this patch, but that one has the
> uncorrected patch description. Post a v4?

v1's description is accurate. It reflects that only authentication
errors are handled.

>
>
>
> >> --b.
> >>
> >>>
> >>>>
> >>>>>>
> >>>>>> --b.
> >>>>>>
> >>>>>>>
> >>>>>>> Signed-off-by: Olga Kornievskaia <[email protected]>
> >>>>>>> ---
> >>>>>>> fs/nfsd/nfs4callback.c | 1 +
> >>>>>>> 1 file changed, 1 insertion(+)
> >>>>>>>
> >>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> >>>>>>> index 052be5bf9ef5..7325592b456e 100644
> >>>>>>> --- a/fs/nfsd/nfs4callback.c
> >>>>>>> +++ b/fs/nfsd/nfs4callback.c
> >>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> >>>>>>> *task, void *calldata)
> >>>>>>> switch (task->tk_status) {
> >>>>>>> case -EIO:
> >>>>>>> case -ETIMEDOUT:
> >>>>>>> + case -EACCES:
> >>>>>>> nfsd4_mark_cb_down(clp, task-
> >>>>>>>> tk_status);
> >>>>>>> }
> >>>>>>> break;
> >>>>>>> --
> >>>>>>> 2.27.0
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Trond Myklebust
> >>>>> Linux NFS client maintainer, Hammerspace
> >>>>> [email protected]
> >>>>>
> >>>>>
> >>>
> >>> --
> >>> Trond Myklebust
> >>> Linux NFS client maintainer, Hammerspace
> >>> [email protected]
>
> --
> Chuck Lever
>
>
>

2021-03-11 15:20:19

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH 1/1] NFSD: fix error handling in callbacks

> On Mar 11, 2021, at 10:16 AM, Olga Kornievskaia <[email protected]> wrote:
>
> On Thu, Mar 11, 2021 at 10:10 AM Chuck Lever III <[email protected]> wrote:
>>
>>
>>
>>> On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <[email protected]> wrote:
>>>>
>>>> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
>>>>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
>>>>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
>>>>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
>>>>>>>> wrote:
>>>>>>>>> From: Olga Kornievskaia <[email protected]>
>>>>>>>>>
>>>>>>>>> When the server tries to do a callback and a client fails it
>>>>>>>>> due to
>>>>>>>>> authentication problems, we need the server to set callback
>>>>>>>>> down
>>>>>>>>> flag in RENEW so that client can recover.
>>>>>>>>
>>>>>>>> I was looking at this. It looks to me like this should really be
>>>>>>>> just:
>>>>>>>>
>>>>>>>> case 1:
>>>>>>>> if (task->tk_status)
>>>>>>>> nfsd4_mark_cb_down(clp, task->tk_status);
>>>>>>>>
>>>>>>>> If tk_status showed an error, and the ->done method doesn't
>>>>>>>> return 0
>>>>>>>> to
>>>>>>>> tell us it something worth retrying, then the callback failed
>>>>>>>> permanently, so we should mark the callback path down, regardless
>>>>>>>> of
>>>>>>>> the
>>>>>>>> exact error.
>>>>>>>
>>>>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
>>>>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
>>>>>>> process of returning the delegation being recalled. Why should that
>>>>>>> result in the callback channel being marked as down?
>>>>>>>
>>>>>>
>>>>>> Are you talking about say the connection going down and server should
>>>>>> just reconnect instead of recovering the callback channel. I assumed
>>>>>> that connection break is something that's not recoverable by the
>>>>>> callback but perhaps I'm wrong.
>>>>>
>>>>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
>>>>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
>>>>> not seeing why either of those errors should be handled by marking the
>>>>> callback channel as being down.
>>>>>
>>>>> Looking further, it seems that the same function will also return '1'
>>>>> without checking the value of task->tk_status if the delegation has
>>>>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
>>>>> could trigger the call to nfsd4_mark_cb_down() with the above change.
>>>>
>>>> Yeah, OK, that's wrong, apologies.
>>>>
>>>> I'm just a little worried about the attempt to enumerate transport level
>>>> errors in nfsd4_cb_done(). Are we sure that EIO, ETIMEDOUT, EACCESS is
>>>> the right list?
>>>
>>> Looking at call_transmit_status error handling, I don't think
>>> connection errors are returned. Instead the code tries to fix the
>>> connection by retrying unless the rpc_timeout is reached and then only
>>> EIO,TIMEDOUT is returned.
>>>
>>> Can then my original patch be considered without resubmission?
>>
>> Bruce has authorized v1 of this patch, but that one has the
>> uncorrected patch description. Post a v4?
>
> v1's description is accurate. It reflects that only authentication
> errors are handled.

Nit: The short description isn't specific to NFSv4.0, as the v3
one was.

>>>> --b.
>>>>
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> --b.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Signed-off-by: Olga Kornievskaia <[email protected]>
>>>>>>>>> ---
>>>>>>>>> fs/nfsd/nfs4callback.c | 1 +
>>>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>>>> index 052be5bf9ef5..7325592b456e 100644
>>>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
>>>>>>>>> *task, void *calldata)
>>>>>>>>> switch (task->tk_status) {
>>>>>>>>> case -EIO:
>>>>>>>>> case -ETIMEDOUT:
>>>>>>>>> + case -EACCES:
>>>>>>>>> nfsd4_mark_cb_down(clp, task-
>>>>>>>>>> tk_status);
>>>>>>>>> }
>>>>>>>>> break;
>>>>>>>>> --
>>>>>>>>> 2.27.0
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Trond Myklebust
>>>>>>> Linux NFS client maintainer, Hammerspace
>>>>>>> [email protected]
>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Trond Myklebust
>>>>> Linux NFS client maintainer, Hammerspace
>>>>> [email protected]
>>
>> --
>> Chuck Lever

--
Chuck Lever