2009-12-18 13:56:16

by Jeff Layton

[permalink] [raw]
Subject: [PATCH] sunrpc: on successful gss error pipe write, don't return error

When handling the gssd downcall, the kernel should distinguish between a
successful downcall that contains an error code and a failed downcall
(i.e. where the parsing failed or some other sort of problem occurred).

In the former case, gss_pipe_downcall should be returning the number of
bytes written to the pipe instead of an error.

Signed-off-by: Jeff Layton <[email protected]>
---
net/sunrpc/auth_gss/auth_gss.c | 21 +++++++++++++++++----
1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 3c3c50f..03cc5a4 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -184,7 +184,8 @@ gss_alloc_context(void)

#define GSSD_MIN_TIMEOUT (60 * 60)
static const void *
-gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx, struct gss_api_mech *gm)
+gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx,
+ struct gss_api_mech *gm, ssize_t *downcall_err)
{
const void *q;
unsigned int seclen;
@@ -208,6 +209,7 @@ gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx, struct
if (ctx->gc_win == 0) {
/* in which case, p points to an error code which we ignore */
p = ERR_PTR(-EACCES);
+ *downcall_err = -EACCES;
goto err;
}
/* copy the opaque wire context */
@@ -641,10 +643,21 @@ gss_pipe_downcall(struct file *filp, const char __user *src, size_t mlen)
list_del_init(&gss_msg->list);
spin_unlock(&inode->i_lock);

- p = gss_fill_context(p, end, ctx, gss_msg->auth->mech);
+ err = 0;
+ p = gss_fill_context(p, end, ctx, gss_msg->auth->mech, &err);
if (IS_ERR(p)) {
- err = PTR_ERR(p);
- gss_msg->msg.errno = (err == -EAGAIN) ? -EAGAIN : -EACCES;
+ /*
+ * a non-zero downcall_err indicates that downcall write was
+ * OK, but contained a zero gc_win (and hence an error code).
+ */
+ if (err) {
+ gss_msg->msg.errno = err;
+ err = mlen;
+ } else {
+ err = PTR_ERR(p);
+ gss_msg->msg.errno = (err == -EAGAIN) ?
+ -EAGAIN : -EACCES;
+ }
goto err_release_msg;
}
gss_msg->ctx = gss_get_ctx(ctx);
--
1.6.5.2



2009-12-18 14:12:15

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 2009-12-18 at 08:56 -0500, Jeff Layton wrote:
> When handling the gssd downcall, the kernel should distinguish between a
> successful downcall that contains an error code and a failed downcall
> (i.e. where the parsing failed or some other sort of problem occurred).
>
> In the former case, gss_pipe_downcall should be returning the number of
> bytes written to the pipe instead of an error.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> net/sunrpc/auth_gss/auth_gss.c | 21 +++++++++++++++++----
> 1 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> index 3c3c50f..03cc5a4 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -184,7 +184,8 @@ gss_alloc_context(void)
>
> #define GSSD_MIN_TIMEOUT (60 * 60)
> static const void *
> -gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx, struct gss_api_mech *gm)
> +gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx,
> + struct gss_api_mech *gm, ssize_t *downcall_err)
> {
> const void *q;
> unsigned int seclen;
> @@ -208,6 +209,7 @@ gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx, struct
> if (ctx->gc_win == 0) {
> /* in which case, p points to an error code which we ignore */
> p = ERR_PTR(-EACCES);
> + *downcall_err = -EACCES;
> goto err;
> }
> /* copy the opaque wire context */
> @@ -641,10 +643,21 @@ gss_pipe_downcall(struct file *filp, const char __user *src, size_t mlen)
> list_del_init(&gss_msg->list);
> spin_unlock(&inode->i_lock);
>
> - p = gss_fill_context(p, end, ctx, gss_msg->auth->mech);
> + err = 0;
> + p = gss_fill_context(p, end, ctx, gss_msg->auth->mech, &err);
> if (IS_ERR(p)) {
> - err = PTR_ERR(p);
> - gss_msg->msg.errno = (err == -EAGAIN) ? -EAGAIN : -EACCES;
> + /*
> + * a non-zero downcall_err indicates that downcall write was
> + * OK, but contained a zero gc_win (and hence an error code).
> + */
> + if (err) {
> + gss_msg->msg.errno = err;
> + err = mlen;
> + } else {
> + err = PTR_ERR(p);
> + gss_msg->msg.errno = (err == -EAGAIN) ?
> + -EAGAIN : -EACCES;
> + }
> goto err_release_msg;
> }
> gss_msg->ctx = gss_get_ctx(ctx);

Is this extra parameter really necessary? Can't we just distinguish
between EACCES, which means that the downcall was successful, but
contained an error, and EFAULT/ENOMEM/ENOSYS, which are context creation
errors.

BTW: while looking at this, I spotted a nasty bug in
gss_import_sec_context_kerberos(). If the kzalloc() call fails, we will
return a random error code since 'p' still points to a valid memory
location...

Trond

2009-12-18 14:39:12

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 18 Dec 2009 09:11:08 -0500
Trond Myklebust <[email protected]> wrote:

> On Fri, 2009-12-18 at 08:56 -0500, Jeff Layton wrote:
> > When handling the gssd downcall, the kernel should distinguish between a
> > successful downcall that contains an error code and a failed downcall
> > (i.e. where the parsing failed or some other sort of problem occurred).
> >
> > In the former case, gss_pipe_downcall should be returning the number of
> > bytes written to the pipe instead of an error.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > net/sunrpc/auth_gss/auth_gss.c | 21 +++++++++++++++++----
> > 1 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> > index 3c3c50f..03cc5a4 100644
> > --- a/net/sunrpc/auth_gss/auth_gss.c
> > +++ b/net/sunrpc/auth_gss/auth_gss.c
> > @@ -184,7 +184,8 @@ gss_alloc_context(void)
> >
> > #define GSSD_MIN_TIMEOUT (60 * 60)
> > static const void *
> > -gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx, struct gss_api_mech *gm)
> > +gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx,
> > + struct gss_api_mech *gm, ssize_t *downcall_err)
> > {
> > const void *q;
> > unsigned int seclen;
> > @@ -208,6 +209,7 @@ gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx, struct
> > if (ctx->gc_win == 0) {
> > /* in which case, p points to an error code which we ignore */
> > p = ERR_PTR(-EACCES);
> > + *downcall_err = -EACCES;
> > goto err;
> > }
> > /* copy the opaque wire context */
> > @@ -641,10 +643,21 @@ gss_pipe_downcall(struct file *filp, const char __user *src, size_t mlen)
> > list_del_init(&gss_msg->list);
> > spin_unlock(&inode->i_lock);
> >
> > - p = gss_fill_context(p, end, ctx, gss_msg->auth->mech);
> > + err = 0;
> > + p = gss_fill_context(p, end, ctx, gss_msg->auth->mech, &err);
> > if (IS_ERR(p)) {
> > - err = PTR_ERR(p);
> > - gss_msg->msg.errno = (err == -EAGAIN) ? -EAGAIN : -EACCES;
> > + /*
> > + * a non-zero downcall_err indicates that downcall write was
> > + * OK, but contained a zero gc_win (and hence an error code).
> > + */
> > + if (err) {
> > + gss_msg->msg.errno = err;
> > + err = mlen;
> > + } else {
> > + err = PTR_ERR(p);
> > + gss_msg->msg.errno = (err == -EAGAIN) ?
> > + -EAGAIN : -EACCES;
> > + }
> > goto err_release_msg;
> > }
> > gss_msg->ctx = gss_get_ctx(ctx);
>
> Is this extra parameter really necessary? Can't we just distinguish
> between EACCES, which means that the downcall was successful, but
> contained an error, and EFAULT/ENOMEM/ENOSYS, which are context creation
> errors.
>

Yeah, we could do that with the existing code. I sort of don't like
that because it's hard to know if other functions could eventually
return an EACCES for another reason and then that error would bubble up
to this function. If you think it's the right thing to do though, I'm
OK with it.

FWIW: The reason I'm poking around in here is because I'm taking a stab
at fixing the problem where syscalls start returning errors when a krb5
ticket expires.

As part of that, I want to have gssd send a more granular error code
and have the kernel adjust what it does accordingly. I'd like to have
it retry the upcall indefinitely when there's an expired credcache, and
return an error when there's no credcache at all).

Without a separate downcall error field, we'll need to special case at
least 2 different errors -- one for a "real" EACCES and one that
indicates that the ticket expired and the upcall should be retried
instead.

> BTW: while looking at this, I spotted a nasty bug in
> gss_import_sec_context_kerberos(). If the kzalloc() call fails, we will
> return a random error code since 'p' still points to a valid memory
> location...

Good catch. Do you want fix that one, or should I?

--
Jeff Layton <[email protected]>

2009-12-18 14:48:35

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> Yeah, we could do that with the existing code. I sort of don't like
> that because it's hard to know if other functions could eventually
> return an EACCES for another reason and then that error would bubble up
> to this function. If you think it's the right thing to do though, I'm
> OK with it.

The error EACCES means 'you are not authorised'. I don't see how it can
be ambiguous here.

> FWIW: The reason I'm poking around in here is because I'm taking a stab
> at fixing the problem where syscalls start returning errors when a krb5
> ticket expires.
>
> As part of that, I want to have gssd send a more granular error code
> and have the kernel adjust what it does accordingly. I'd like to have
> it retry the upcall indefinitely when there's an expired credcache, and
> return an error when there's no credcache at all).

That makes sense.

> Without a separate downcall error field, we'll need to special case at
> least 2 different errors -- one for a "real" EACCES and one that
> indicates that the ticket expired and the upcall should be retried
> instead.

We can find another error for the 'ticket expired' case. EKEYEXPIRED
springs to mind...

> > BTW: while looking at this, I spotted a nasty bug in
> > gss_import_sec_context_kerberos(). If the kzalloc() call fails, we will
> > return a random error code since 'p' still points to a valid memory
> > location...
>
> Good catch. Do you want fix that one, or should I?
>

I can do it, if you like.

Trond

2009-12-18 15:12:41

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 2009-12-18 at 09:47 -0500, Trond Myklebust wrote:
> On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> > Without a separate downcall error field, we'll need to special case at
> > least 2 different errors -- one for a "real" EACCES and one that
> > indicates that the ticket expired and the upcall should be retried
> > instead.
>
> We can find another error for the 'ticket expired' case. EKEYEXPIRED
> springs to mind...

BTW: Here be dragons!

I think we need to handle the 'ticket expired' case as if it were an
NFS4ERR_DELAY/EJUKEBOX, and actually do the retry in the NFS layer after
a suitable exponential back-off period.

Otherwise, we end up holding onto resources (in particular NFSv4.1
slots, but also RPC slots, ...) which will cause congestion, and prevent
other RPC calls from making progress.

Cheers
Trond

2009-12-18 15:25:55

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 18 Dec 2009 09:47:52 -0500
Trond Myklebust <[email protected]> wrote:

> On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> > Yeah, we could do that with the existing code. I sort of don't like
> > that because it's hard to know if other functions could eventually
> > return an EACCES for another reason and then that error would bubble up
> > to this function. If you think it's the right thing to do though, I'm
> > OK with it.
>
> The error EACCES means 'you are not authorised'. I don't see how it can
> be ambiguous here.
>

I guess I was thinking that the pipe writer might not be authorized to
import the sec context for some reason, which would be a different
situation. If that'll never be the case, then a single error code is
probably fine. I'll respin it to just special case EACCES for now and
will plan to have it special case the other error when that work is
more fleshed out.

> > FWIW: The reason I'm poking around in here is because I'm taking a stab
> > at fixing the problem where syscalls start returning errors when a krb5
> > ticket expires.
> >
> > As part of that, I want to have gssd send a more granular error code
> > and have the kernel adjust what it does accordingly. I'd like to have
> > it retry the upcall indefinitely when there's an expired credcache, and
> > return an error when there's no credcache at all).
>
> That makes sense.
>
> > Without a separate downcall error field, we'll need to special case at
> > least 2 different errors -- one for a "real" EACCES and one that
> > indicates that the ticket expired and the upcall should be retried
> > instead.
>
> We can find another error for the 'ticket expired' case. EKEYEXPIRED
> springs to mind...
>

That's exactly the error I was going to use.

> > > BTW: while looking at this, I spotted a nasty bug in
> > > gss_import_sec_context_kerberos(). If the kzalloc() call fails, we will
> > > return a random error code since 'p' still points to a valid memory
> > > location...
> >
> > Good catch. Do you want fix that one, or should I?
> >
>
> I can do it, if you like.
>

Sure. I see another problem in this area too. gss_import_sec_context
can return GSS_S_FAILURE which is unsigned and positive when cast to a
signed value. gss_import_sec_context checks for a negative return from
that function though. Should it be checking for non-zero instead?

--
Jeff Layton <[email protected]>

2009-12-18 15:37:23

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 18 Dec 2009 10:12:22 -0500
Trond Myklebust <[email protected]> wrote:

> On Fri, 2009-12-18 at 09:47 -0500, Trond Myklebust wrote:
> > On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> > > Without a separate downcall error field, we'll need to special case at
> > > least 2 different errors -- one for a "real" EACCES and one that
> > > indicates that the ticket expired and the upcall should be retried
> > > instead.
> >
> > We can find another error for the 'ticket expired' case. EKEYEXPIRED
> > springs to mind...
>
> BTW: Here be dragons!
>
> I think we need to handle the 'ticket expired' case as if it were an
> NFS4ERR_DELAY/EJUKEBOX, and actually do the retry in the NFS layer after
> a suitable exponential back-off period.
>
> Otherwise, we end up holding onto resources (in particular NFSv4.1
> slots, but also RPC slots, ...) which will cause congestion, and prevent
> other RPC calls from making progress.
>

Thanks. My original thought was that we should handle this situation as
we do when gssd is down -- just retry at the RPC layer. I hadn't
considered the resource issue however. I'll shoot for making the retry
happen at the NFS layer instead. That should also make it easier to
handle this situation differently on hard vs. soft mounts too.

--
Jeff Layton <[email protected]>

2009-12-18 16:25:24

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 18 Dec 2009 10:25:50 -0500
Jeff Layton <[email protected]> wrote:

> Sure. I see another problem in this area too. gss_import_sec_context
> can return GSS_S_FAILURE which is unsigned and positive when cast to a
> signed value. gss_import_sec_context checks for a negative return from
> that function though. Should it be checking for non-zero instead?

Oops, sorry. I meant to say "gss_fill_context checks for a negative
return".

--
Jeff Layton <[email protected]>

2009-12-18 18:30:32

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 2009-12-18 at 10:37 -0500, Jeff Layton wrote:
> On Fri, 18 Dec 2009 10:12:22 -0500
> Trond Myklebust <[email protected]> wrote:
>
> > On Fri, 2009-12-18 at 09:47 -0500, Trond Myklebust wrote:
> > > On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> > > > Without a separate downcall error field, we'll need to special case at
> > > > least 2 different errors -- one for a "real" EACCES and one that
> > > > indicates that the ticket expired and the upcall should be retried
> > > > instead.
> > >
> > > We can find another error for the 'ticket expired' case. EKEYEXPIRED
> > > springs to mind...
> >
> > BTW: Here be dragons!
> >
> > I think we need to handle the 'ticket expired' case as if it were an
> > NFS4ERR_DELAY/EJUKEBOX, and actually do the retry in the NFS layer after
> > a suitable exponential back-off period.
> >
> > Otherwise, we end up holding onto resources (in particular NFSv4.1
> > slots, but also RPC slots, ...) which will cause congestion, and prevent
> > other RPC calls from making progress.
> >
>
> Thanks. My original thought was that we should handle this situation as
> we do when gssd is down -- just retry at the RPC layer. I hadn't
> considered the resource issue however. I'll shoot for making the retry
> happen at the NFS layer instead. That should also make it easier to
> handle this situation differently on hard vs. soft mounts too.
>

It will also make it easier to do things like preventing flushd from
hanging forever on a set of writebacks that cannot make progress.

At some point we might also want to allow the administrator to set a
limit on the number of write retries, so that a user who decides to go
on a 1 year sabbatical doesn't end up holding up access to a file
forever...

Cheers
Trond


2009-12-18 19:14:08

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 18 Dec 2009 13:30:27 -0500
Trond Myklebust <[email protected]> wrote:

> On Fri, 2009-12-18 at 10:37 -0500, Jeff Layton wrote:
> > On Fri, 18 Dec 2009 10:12:22 -0500
> > Trond Myklebust <[email protected]> wrote:
> >
> > > On Fri, 2009-12-18 at 09:47 -0500, Trond Myklebust wrote:
> > > > On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> > > > > Without a separate downcall error field, we'll need to special case at
> > > > > least 2 different errors -- one for a "real" EACCES and one that
> > > > > indicates that the ticket expired and the upcall should be retried
> > > > > instead.
> > > >
> > > > We can find another error for the 'ticket expired' case. EKEYEXPIRED
> > > > springs to mind...
> > >
> > > BTW: Here be dragons!
> > >
> > > I think we need to handle the 'ticket expired' case as if it were an
> > > NFS4ERR_DELAY/EJUKEBOX, and actually do the retry in the NFS layer after
> > > a suitable exponential back-off period.
> > >
> > > Otherwise, we end up holding onto resources (in particular NFSv4.1
> > > slots, but also RPC slots, ...) which will cause congestion, and prevent
> > > other RPC calls from making progress.
> > >
> >
> > Thanks. My original thought was that we should handle this situation as
> > we do when gssd is down -- just retry at the RPC layer. I hadn't
> > considered the resource issue however. I'll shoot for making the retry
> > happen at the NFS layer instead. That should also make it easier to
> > handle this situation differently on hard vs. soft mounts too.
> >
>
> It will also make it easier to do things like preventing flushd from
> hanging forever on a set of writebacks that cannot make progress.
>
> At some point we might also want to allow the administrator to set a
> limit on the number of write retries, so that a user who decides to go
> on a 1 year sabbatical doesn't end up holding up access to a file
> forever...
>

Possibly. To make the calls start erroring out with the design I'm
working on, all you'd need to do is destroy their credcache. That's a
manual process though and it might be better to be able to handle this
situation more automatically. I'll need to ponder it some...

I'd like to avoid too much scope creep here. My feeling here is that we
should start simply and just make this situation behave like
NFS4ERR_DELAY/EJUKEBOX for the first pass. If that turns up problems,
then we can modify that behavior.

Sound ok?

--
Jeff Layton <[email protected]>

2009-12-18 19:33:37

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 2009-12-18 at 10:25 -0500, Jeff Layton wrote:
> On Fri, 18 Dec 2009 09:47:52 -0500
> Trond Myklebust <[email protected]> wrote:
>
> > On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> > > Yeah, we could do that with the existing code. I sort of don't like
> > > that because it's hard to know if other functions could eventually
> > > return an EACCES for another reason and then that error would bubble up
> > > to this function. If you think it's the right thing to do though, I'm
> > > OK with it.
> >
> > The error EACCES means 'you are not authorised'. I don't see how it can
> > be ambiguous here.
> >
>
> I guess I was thinking that the pipe writer might not be authorized to
> import the sec context for some reason, which would be a different
> situation. If that'll never be the case, then a single error code is
> probably fine. I'll respin it to just special case EACCES for now and
> will plan to have it special case the other error when that work is
> more fleshed out.
>
> > > FWIW: The reason I'm poking around in here is because I'm taking a stab
> > > at fixing the problem where syscalls start returning errors when a krb5
> > > ticket expires.
> > >
> > > As part of that, I want to have gssd send a more granular error code
> > > and have the kernel adjust what it does accordingly. I'd like to have
> > > it retry the upcall indefinitely when there's an expired credcache, and
> > > return an error when there's no credcache at all).
> >
> > That makes sense.
> >
> > > Without a separate downcall error field, we'll need to special case at
> > > least 2 different errors -- one for a "real" EACCES and one that
> > > indicates that the ticket expired and the upcall should be retried
> > > instead.
> >
> > We can find another error for the 'ticket expired' case. EKEYEXPIRED
> > springs to mind...
> >
>
> That's exactly the error I was going to use.
>
> > > > BTW: while looking at this, I spotted a nasty bug in
> > > > gss_import_sec_context_kerberos(). If the kzalloc() call fails, we will
> > > > return a random error code since 'p' still points to a valid memory
> > > > location...
> > >
> > > Good catch. Do you want fix that one, or should I?
> > >
> >
> > I can do it, if you like.
> >
>
> Sure. I see another problem in this area too. gss_import_sec_context
> can return GSS_S_FAILURE which is unsigned and positive when cast to a
> signed value. gss_import_sec_context checks for a negative return from
> that function though. Should it be checking for non-zero instead?
>

No. All the other functions there return ENOMEM on memory allocation
failure. The GSS_S_FAILURE is just wrong.


--------------------------------------------------------------------------------------------
SUNRPC: Fix the return value in gss_import_sec_context()

From: Trond Myklebust <[email protected]>

If the context allocation fails, it will return GSS_S_FAILURE, which is
neither a valid error code, nor is it even negative.

Return ENOMEM instead...

Reported-by: Jeff Layton <[email protected]>
Cc: [email protected]
Signed-off-by: Trond Myklebust <[email protected]>
---

net/sunrpc/auth_gss/gss_mech_switch.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/net/sunrpc/auth_gss/gss_mech_switch.c b/net/sunrpc/auth_gss/gss_mech_switch.c
index 6efbb0c..76e4c6f 100644
--- a/net/sunrpc/auth_gss/gss_mech_switch.c
+++ b/net/sunrpc/auth_gss/gss_mech_switch.c
@@ -252,7 +252,7 @@ gss_import_sec_context(const void *input_token, size_t bufsize,
struct gss_ctx **ctx_id)
{
if (!(*ctx_id = kzalloc(sizeof(**ctx_id), GFP_KERNEL)))
- return GSS_S_FAILURE;
+ return -ENOMEM;
(*ctx_id)->mech_type = gss_mech_get(mech);

return mech->gm_ops

2009-12-18 19:42:29

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 2009-12-18 at 14:14 -0500, Jeff Layton wrote:
> On Fri, 18 Dec 2009 13:30:27 -0500
> Trond Myklebust <[email protected]> wrote:
>
> > On Fri, 2009-12-18 at 10:37 -0500, Jeff Layton wrote:
> > > On Fri, 18 Dec 2009 10:12:22 -0500
> > > Trond Myklebust <[email protected]> wrote:
> > >
> > > > On Fri, 2009-12-18 at 09:47 -0500, Trond Myklebust wrote:
> > > > > On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> > > > > > Without a separate downcall error field, we'll need to special case at
> > > > > > least 2 different errors -- one for a "real" EACCES and one that
> > > > > > indicates that the ticket expired and the upcall should be retried
> > > > > > instead.
> > > > >
> > > > > We can find another error for the 'ticket expired' case. EKEYEXPIRED
> > > > > springs to mind...
> > > >
> > > > BTW: Here be dragons!
> > > >
> > > > I think we need to handle the 'ticket expired' case as if it were an
> > > > NFS4ERR_DELAY/EJUKEBOX, and actually do the retry in the NFS layer after
> > > > a suitable exponential back-off period.
> > > >
> > > > Otherwise, we end up holding onto resources (in particular NFSv4.1
> > > > slots, but also RPC slots, ...) which will cause congestion, and prevent
> > > > other RPC calls from making progress.
> > > >
> > >
> > > Thanks. My original thought was that we should handle this situation as
> > > we do when gssd is down -- just retry at the RPC layer. I hadn't
> > > considered the resource issue however. I'll shoot for making the retry
> > > happen at the NFS layer instead. That should also make it easier to
> > > handle this situation differently on hard vs. soft mounts too.
> > >
> >
> > It will also make it easier to do things like preventing flushd from
> > hanging forever on a set of writebacks that cannot make progress.
> >
> > At some point we might also want to allow the administrator to set a
> > limit on the number of write retries, so that a user who decides to go
> > on a 1 year sabbatical doesn't end up holding up access to a file
> > forever...
> >
>
> Possibly. To make the calls start erroring out with the design I'm
> working on, all you'd need to do is destroy their credcache. That's a
> manual process though and it might be better to be able to handle this
> situation more automatically. I'll need to ponder it some...
>
> I'd like to avoid too much scope creep here. My feeling here is that we
> should start simply and just make this situation behave like
> NFS4ERR_DELAY/EJUKEBOX for the first pass. If that turns up problems,
> then we can modify that behavior.
>
> Sound ok?

Sure. I fully agree that the first pass should be to do the
NFS4ERR_DELAY/EJUKEBOX behaviour. I'm just saying that this may be
something that we want to do in the future, and so it is worth
considering what consequences that will have for the very basic
design...

Trond

2009-12-18 20:14:11

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: on successful gss error pipe write, don't return error

On Fri, 18 Dec 2009 14:33:37 -0500
Trond Myklebust <[email protected]> wrote:

> On Fri, 2009-12-18 at 10:25 -0500, Jeff Layton wrote:
> > On Fri, 18 Dec 2009 09:47:52 -0500
> > Trond Myklebust <[email protected]> wrote:
> >
> > > On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> > > > Yeah, we could do that with the existing code. I sort of don't like
> > > > that because it's hard to know if other functions could eventually
> > > > return an EACCES for another reason and then that error would bubble up
> > > > to this function. If you think it's the right thing to do though, I'm
> > > > OK with it.
> > >
> > > The error EACCES means 'you are not authorised'. I don't see how it can
> > > be ambiguous here.
> > >
> >
> > I guess I was thinking that the pipe writer might not be authorized to
> > import the sec context for some reason, which would be a different
> > situation. If that'll never be the case, then a single error code is
> > probably fine. I'll respin it to just special case EACCES for now and
> > will plan to have it special case the other error when that work is
> > more fleshed out.
> >
> > > > FWIW: The reason I'm poking around in here is because I'm taking a stab
> > > > at fixing the problem where syscalls start returning errors when a krb5
> > > > ticket expires.
> > > >
> > > > As part of that, I want to have gssd send a more granular error code
> > > > and have the kernel adjust what it does accordingly. I'd like to have
> > > > it retry the upcall indefinitely when there's an expired credcache, and
> > > > return an error when there's no credcache at all).
> > >
> > > That makes sense.
> > >
> > > > Without a separate downcall error field, we'll need to special case at
> > > > least 2 different errors -- one for a "real" EACCES and one that
> > > > indicates that the ticket expired and the upcall should be retried
> > > > instead.
> > >
> > > We can find another error for the 'ticket expired' case. EKEYEXPIRED
> > > springs to mind...
> > >
> >
> > That's exactly the error I was going to use.
> >
> > > > > BTW: while looking at this, I spotted a nasty bug in
> > > > > gss_import_sec_context_kerberos(). If the kzalloc() call fails, we will
> > > > > return a random error code since 'p' still points to a valid memory
> > > > > location...
> > > >
> > > > Good catch. Do you want fix that one, or should I?
> > > >
> > >
> > > I can do it, if you like.
> > >
> >
> > Sure. I see another problem in this area too. gss_import_sec_context
> > can return GSS_S_FAILURE which is unsigned and positive when cast to a
> > signed value. gss_import_sec_context checks for a negative return from
> > that function though. Should it be checking for non-zero instead?
> >
>
> No. All the other functions there return ENOMEM on memory allocation
> failure. The GSS_S_FAILURE is just wrong.
>
>
> --------------------------------------------------------------------------------------------
> SUNRPC: Fix the return value in gss_import_sec_context()
>
> From: Trond Myklebust <[email protected]>
>
> If the context allocation fails, it will return GSS_S_FAILURE, which is
> neither a valid error code, nor is it even negative.
>
> Return ENOMEM instead...
>
> Reported-by: Jeff Layton <[email protected]>
> Cc: [email protected]
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> net/sunrpc/auth_gss/gss_mech_switch.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/net/sunrpc/auth_gss/gss_mech_switch.c b/net/sunrpc/auth_gss/gss_mech_switch.c
> index 6efbb0c..76e4c6f 100644
> --- a/net/sunrpc/auth_gss/gss_mech_switch.c
> +++ b/net/sunrpc/auth_gss/gss_mech_switch.c
> @@ -252,7 +252,7 @@ gss_import_sec_context(const void *input_token, size_t bufsize,
> struct gss_ctx **ctx_id)
> {
> if (!(*ctx_id = kzalloc(sizeof(**ctx_id), GFP_KERNEL)))
> - return GSS_S_FAILURE;
> + return -ENOMEM;
> (*ctx_id)->mech_type = gss_mech_get(mech);
>
> return mech->gm_ops
>

Looks good.

Acked-by: Jeff Layton <[email protected]>