2016-12-05 04:10:21

by NeilBrown

[permalink] [raw]
Subject: [PATCH] SUNRPC: fix refcounting problems with auth_gss messages.


There are two problems with refcounting of auth_gss messages.

First, the reference on the pipe->pipe list (taken by a call
to rpc_queue_upcall()) is not counted. It seems to be
assumed that a message in pipe->pipe will always also be in
pipe->in_downcall, where it is correctly reference counted.

However there is no guaranty of this. I have a report of a
NULL dereferences in rpc_pipe_read() which suggests a msg
that has been freed is still on the pipe->pipe list.

One way I imagine this might happen is:
- message is queued for uid=U and auth->service=S1
- rpc.gssd reads this message and starts processing.
This removes the message from pipe->pipe
- message is queued for uid=U and auth->service=S2
- rpc.gssd replies to the first message. gss_pipe_downcall()
calls __gss_find_upcall(pipe, U, NULL) and it finds the
*second* message, as new messages are placed at the head
of ->in_downcall, and the service type is not checked.
- This second message is removed from ->in_downcall and freed
by gss_release_msg() (even though it is still on pipe->pipe)
- rpc.gssd tries to read another message, and dereferences a pointer
to this message that has just been freed.

I fix this by incrementing the reference count before calling
rpc_queue_upcall(), and decrementing it if that fails, or normally in
gss_pipe_destroy_msg().

It seems strange that the reply doesn't target the message more
precisely, but I don't know all the details. In any case, I think the
reference counting irregularity became a measureable bug when the
extra arg was added to __gss_find_upcall(), hence the Fixes: line
below.

The second problem is that if rpc_queue_upcall() fails, the new
message is not freed. gss_alloc_msg() set the ->count to 1,
gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
then the pointer is discarded so the memory never gets freed.

Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service")
Cc: [email protected]
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250
Signed-off-by: NeilBrown <[email protected]>
---
net/sunrpc/auth_gss/auth_gss.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 3dfd769dc5b5..16cea00c959b 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -541,9 +541,13 @@ gss_setup_upcall(struct gss_auth *gss_auth, struct rpc_cred *cred)
return gss_new;
gss_msg = gss_add_msg(gss_new);
if (gss_msg == gss_new) {
- int res = rpc_queue_upcall(gss_new->pipe, &gss_new->msg);
+ int res;
+ atomic_inc(&gss_msg->count);
+ res = rpc_queue_upcall(gss_new->pipe, &gss_new->msg);
if (res) {
gss_unhash_msg(gss_new);
+ atomic_dec(&gss_msg->count);
+ gss_release_msg(gss_new);
gss_msg = ERR_PTR(res);
}
} else
@@ -836,6 +840,7 @@ gss_pipe_destroy_msg(struct rpc_pipe_msg *msg)
warn_gssd();
gss_release_msg(gss_msg);
}
+ gss_release_msg(gss_msg);
}

static void gss_pipe_dentry_destroy(struct dentry *dir,
--
2.10.2


Attachments:
signature.asc (832.00 B)

2016-12-05 17:29:08

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH] SUNRPC: fix refcounting problems with auth_gss messages.

On Sun, Dec 4, 2016 at 11:10 PM, NeilBrown <[email protected]> wrote:
>
> There are two problems with refcounting of auth_gss messages.
>
> First, the reference on the pipe->pipe list (taken by a call
> to rpc_queue_upcall()) is not counted. It seems to be
> assumed that a message in pipe->pipe will always also be in
> pipe->in_downcall, where it is correctly reference counted.
>
> However there is no guaranty of this. I have a report of a
> NULL dereferences in rpc_pipe_read() which suggests a msg
> that has been freed is still on the pipe->pipe list.
>
> One way I imagine this might happen is:
> - message is queued for uid=U and auth->service=S1
> - rpc.gssd reads this message and starts processing.
> This removes the message from pipe->pipe
> - message is queued for uid=U and auth->service=S2
> - rpc.gssd replies to the first message. gss_pipe_downcall()
> calls __gss_find_upcall(pipe, U, NULL) and it finds the
> *second* message, as new messages are placed at the head
> of ->in_downcall, and the service type is not checked.

Correct "service" was/is not a part of the protocol between the kernel
and gssd. So that's why the kernel can't match what's coming from gssd
to a particular waiting upcall.

> - This second message is removed from ->in_downcall and freed
> by gss_release_msg() (even though it is still on pipe->pipe)
> - rpc.gssd tries to read another message, and dereferences a pointer
> to this message that has just been freed.

Agreed. This is a problem.

Doesn't the problem still exist even with this patch because
gss_add_msg() adds the msg onto the in_downcall() list? So gssd in
__gss_fin_upcall() can find the 2nd upcall even before the 2nd msg is
added to the pipe->pipe()?

In the perfect world, the real solution (even for the initial problem)
would have been changing the gssd-kernel protocol to include "service"
as a part of the upcall. But that's not going to happen...

The solution I can thinking of is something where the message is not
on "in_downcall" list until it's consumed by the rpc_pipe. I'd need to
some time to figure out how to do that...

> I fix this by incrementing the reference count before calling
> rpc_queue_upcall(), and decrementing it if that fails, or normally in
> gss_pipe_destroy_msg().
> It seems strange that the reply doesn't target the message more
> precisely, but I don't know all the details. In any case, I think the
> reference counting irregularity became a measureable bug when the
> extra arg was added to __gss_find_upcall(), hence the Fixes: line
> below.
>
> The second problem is that if rpc_queue_upcall() fails, the new
> message is not freed. gss_alloc_msg() set the ->count to 1,
> gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
> then the pointer is discarded so the memory never gets freed.
>
> Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service")
> Cc: [email protected]
> Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250
> Signed-off-by: NeilBrown <[email protected]>
> ---
> net/sunrpc/auth_gss/auth_gss.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> index 3dfd769dc5b5..16cea00c959b 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -541,9 +541,13 @@ gss_setup_upcall(struct gss_auth *gss_auth, struct rpc_cred *cred)
> return gss_new;
> gss_msg = gss_add_msg(gss_new);
> if (gss_msg == gss_new) {
> - int res = rpc_queue_upcall(gss_new->pipe, &gss_new->msg);
> + int res;
> + atomic_inc(&gss_msg->count);
> + res = rpc_queue_upcall(gss_new->pipe, &gss_new->msg);
> if (res) {
> gss_unhash_msg(gss_new);
> + atomic_dec(&gss_msg->count);
> + gss_release_msg(gss_new);
> gss_msg = ERR_PTR(res);
> }
> } else
> @@ -836,6 +840,7 @@ gss_pipe_destroy_msg(struct rpc_pipe_msg *msg)
> warn_gssd();
> gss_release_msg(gss_msg);
> }
> + gss_release_msg(gss_msg);
> }
>
> static void gss_pipe_dentry_destroy(struct dentry *dir,
> --
> 2.10.2
>

2016-12-05 22:04:25

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] SUNRPC: fix refcounting problems with auth_gss messages.

On Tue, Dec 06 2016, Olga Kornievskaia wrote:

> On Sun, Dec 4, 2016 at 11:10 PM, NeilBrown <[email protected]> wrote:
>>
>> There are two problems with refcounting of auth_gss messages.
>>
>> First, the reference on the pipe->pipe list (taken by a call
>> to rpc_queue_upcall()) is not counted. It seems to be
>> assumed that a message in pipe->pipe will always also be in
>> pipe->in_downcall, where it is correctly reference counted.
>>
>> However there is no guaranty of this. I have a report of a
>> NULL dereferences in rpc_pipe_read() which suggests a msg
>> that has been freed is still on the pipe->pipe list.
>>
>> One way I imagine this might happen is:
>> - message is queued for uid=U and auth->service=S1
>> - rpc.gssd reads this message and starts processing.
>> This removes the message from pipe->pipe
>> - message is queued for uid=U and auth->service=S2
>> - rpc.gssd replies to the first message. gss_pipe_downcall()
>> calls __gss_find_upcall(pipe, U, NULL) and it finds the
>> *second* message, as new messages are placed at the head
>> of ->in_downcall, and the service type is not checked.
>
> Correct "service" was/is not a part of the protocol between the kernel
> and gssd. So that's why the kernel can't match what's coming from gssd
> to a particular waiting upcall.
>
>> - This second message is removed from ->in_downcall and freed
>> by gss_release_msg() (even though it is still on pipe->pipe)
>> - rpc.gssd tries to read another message, and dereferences a pointer
>> to this message that has just been freed.
>
> Agreed. This is a problem.
>
> Doesn't the problem still exist even with this patch because
> gss_add_msg() adds the msg onto the in_downcall() list? So gssd in
> __gss_fin_upcall() can find the 2nd upcall even before the 2nd msg is
> added to the pipe->pipe()?

The use-after-free problem is solved I think. It doesn't really make
any difference if the down-call arrives before or after
rpc_queue_upcall() is called. The msg will still not be freed before it
is removed from both lists.

Thanks,
NeilBrown


Attachments:
signature.asc (832.00 B)

2016-12-06 18:07:54

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH] SUNRPC: fix refcounting problems with auth_gss messages.

On Mon, Dec 5, 2016 at 5:04 PM, NeilBrown <[email protected]> wrote:
> On Tue, Dec 06 2016, Olga Kornievskaia wrote:
>
>> On Sun, Dec 4, 2016 at 11:10 PM, NeilBrown <[email protected]> wrote:
>>>
>>> There are two problems with refcounting of auth_gss messages.
>>>
>>> First, the reference on the pipe->pipe list (taken by a call
>>> to rpc_queue_upcall()) is not counted. It seems to be
>>> assumed that a message in pipe->pipe will always also be in
>>> pipe->in_downcall, where it is correctly reference counted.
>>>
>>> However there is no guaranty of this. I have a report of a
>>> NULL dereferences in rpc_pipe_read() which suggests a msg
>>> that has been freed is still on the pipe->pipe list.
>>>
>>> One way I imagine this might happen is:
>>> - message is queued for uid=U and auth->service=S1
>>> - rpc.gssd reads this message and starts processing.
>>> This removes the message from pipe->pipe
>>> - message is queued for uid=U and auth->service=S2
>>> - rpc.gssd replies to the first message. gss_pipe_downcall()
>>> calls __gss_find_upcall(pipe, U, NULL) and it finds the
>>> *second* message, as new messages are placed at the head
>>> of ->in_downcall, and the service type is not checked.
>>
>> Correct "service" was/is not a part of the protocol between the kernel
>> and gssd. So that's why the kernel can't match what's coming from gssd
>> to a particular waiting upcall.
>>
>>> - This second message is removed from ->in_downcall and freed
>>> by gss_release_msg() (even though it is still on pipe->pipe)
>>> - rpc.gssd tries to read another message, and dereferences a pointer
>>> to this message that has just been freed.
>>
>> Agreed. This is a problem.
>>
>> Doesn't the problem still exist even with this patch because
>> gss_add_msg() adds the msg onto the in_downcall() list? So gssd in
>> __gss_fin_upcall() can find the 2nd upcall even before the 2nd msg is
>> added to the pipe->pipe()?
>
> The use-after-free problem is solved I think. It doesn't really make
> any difference if the down-call arrives before or after
> rpc_queue_upcall() is called. The msg will still not be freed before it
> is removed from both lists.
>

Sorry I don't see it.

Thread 1 adds an upcall and it's getting processed by gssd.
Thread 2 executes gss_add_msg() which puts the message on the
in_downcall list. Context switch (before the atomic_inc()!).
Upcall comes back from the gssd, finds msg from Thread2 in_downcall
list. gss_release_msg() will dec the counter to 0 and will remove the
msg.

2016-12-06 22:13:11

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] SUNRPC: fix refcounting problems with auth_gss messages.

On Wed, Dec 07 2016, Olga Kornievskaia wrote:
>>>
>>> Agreed. This is a problem.
>>>
>>> Doesn't the problem still exist even with this patch because
>>> gss_add_msg() adds the msg onto the in_downcall() list? So gssd in
>>> __gss_fin_upcall() can find the 2nd upcall even before the 2nd msg is
>>> added to the pipe->pipe()?
>>
>> The use-after-free problem is solved I think. It doesn't really make
>> any difference if the down-call arrives before or after
>> rpc_queue_upcall() is called. The msg will still not be freed before it
>> is removed from both lists.
>>
>
> Sorry I don't see it.

Maybe we are looking at different code?

>
> Thread 1 adds an upcall and it's getting processed by gssd.
> Thread 2 executes gss_add_msg() which puts the message on the
> in_downcall list. Context switch (before the atomic_inc()!).

gss_add_msg(), as of 4.9-rc8, is
spin_lock(&pipe->lock);
old = __gss_find_upcall(pipe, gss_msg->uid, gss_msg->auth);
if (old == NULL) {
atomic_inc(&gss_msg->count);
list_add(&gss_msg->list, &pipe->in_downcall);
} else
gss_msg = old;
spin_unlock(&pipe->lock);

so the gss_msg is added to in_downcall *after* the atomic_inc(), and the
whole is protected by pipe->lock anyway so even if the atomic_inc() were
delayed by the CPU reordering things, there would be no risk of
gss_pipe_downcall() finding a gss_msg which didn't have the ->count
elevated.

NeilBrown

> Upcall comes back from the gssd, finds msg from Thread2 in_downcall
> list. gss_release_msg() will dec the counter to 0 and will remove the
> msg.


Attachments:
signature.asc (832.00 B)

2016-12-07 14:58:46

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH] SUNRPC: fix refcounting problems with auth_gss messages.

On Tue, Dec 6, 2016 at 5:12 PM, NeilBrown <[email protected]> wrote:
> On Wed, Dec 07 2016, Olga Kornievskaia wrote:
>>>>
>>>> Agreed. This is a problem.
>>>>
>>>> Doesn't the problem still exist even with this patch because
>>>> gss_add_msg() adds the msg onto the in_downcall() list? So gssd in
>>>> __gss_fin_upcall() can find the 2nd upcall even before the 2nd msg is
>>>> added to the pipe->pipe()?
>>>
>>> The use-after-free problem is solved I think. It doesn't really make
>>> any difference if the down-call arrives before or after
>>> rpc_queue_upcall() is called. The msg will still not be freed before it
>>> is removed from both lists.
>>>
>>
>> Sorry I don't see it.
>
> Maybe we are looking at different code?
>
>>
>> Thread 1 adds an upcall and it's getting processed by gssd.
>> Thread 2 executes gss_add_msg() which puts the message on the
>> in_downcall list. Context switch (before the atomic_inc()!).
>
> gss_add_msg(), as of 4.9-rc8, is
> spin_lock(&pipe->lock);
> old = __gss_find_upcall(pipe, gss_msg->uid, gss_msg->auth);
> if (old == NULL) {
> atomic_inc(&gss_msg->count);
> list_add(&gss_msg->list, &pipe->in_downcall);
> } else
> gss_msg = old;
> spin_unlock(&pipe->lock);
>
> so the gss_msg is added to in_downcall *after* the atomic_inc(), and the
> whole is protected by pipe->lock anyway so even if the atomic_inc() were
> delayed by the CPU reordering things, there would be no risk of
> gss_pipe_downcall() finding a gss_msg which didn't have the ->count
> elevated.
>

Ah, I missed the atomic_inc() in the gss_add_msg(). I was only looking
at your patch that did another atomic_inc(). Makes sense now.

> NeilBrown
>
>> Upcall comes back from the gssd, finds msg from Thread2 in_downcall
>> list. gss_release_msg() will dec the counter to 0 and will remove the
>> msg.