2021-02-26 23:06:55

by Daniel Kobras

[permalink] [raw]
Subject: [PATCH] sunrpc: fix refcount leak for rpc auth modules

If an auth module's accept op returns SVC_CLOSE, svc_process_common()
enters a call path that does not call svc_authorise() before leaving the
function, and thus leaks a reference on the auth module's refcount. Hence,
make sure calls to svc_authenticate() and svc_authorise() are paired for
all call paths, to make sure rpc auth modules can be unloaded.

Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
Signed-off-by: Daniel Kobras <[email protected]>
---
Hi!

While debugging NFS on a system with misconfigured krb5 settings, we noticed
a suspiciously high refcount on the auth_rpcgss module, despite all of its
consumers already unloaded. I wasn't able to analyze any further on the live
system, but had a look at the code afterwards, and found a path that seems
to leak references if the mechanism's accept() op shuts down a connection
early. Although I couldn't verify, this seem to be a plausible fix.

Kind regards,

Daniel

net/sunrpc/svc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 61fb8a18552c..d76dc9d95d16 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)

sendit:
if (svc_authorise(rqstp))
- goto close;
+ goto close_xprt;
return 1; /* Caller can now send it */

release_dropit:
@@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
return 0;

close:
+ svc_authorise(rqstp);
+close_xprt:
if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
svc_close_xprt(rqstp->rq_xprt);
dprintk("svc: svc_process close\n");
@@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
err_short_len:
svc_printk(rqstp, "short len %zd, dropping request\n",
argv->iov_len);
- goto close;
+ goto close_xprt;

err_bad_rpc:
serv->sv_stats->rpcbadfmt++;
--
2.25.1


--
Puzzle ITC Deutschland GmbH
Sitz der Gesellschaft: Eisenbahnstra?e 1, 72072
T?bingen

Eingetragen am Amtsgericht Stuttgart HRB 765802
Gesch?ftsf?hrer:
Lukas Kallies, Daniel Kobras, Mark Pr?hl


2021-03-01 15:24:22

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: fix refcount leak for rpc auth modules


> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <[email protected]> wrote:
>
> If an auth module's accept op returns SVC_CLOSE, svc_process_common()
> enters a call path that does not call svc_authorise() before leaving the
> function, and thus leaks a reference on the auth module's refcount. Hence,
> make sure calls to svc_authenticate() and svc_authorise() are paired for
> all call paths, to make sure rpc auth modules can be unloaded.
>
> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
> Signed-off-by: Daniel Kobras <[email protected]>
> ---
> Hi!
>
> While debugging NFS on a system with misconfigured krb5 settings, we noticed
> a suspiciously high refcount on the auth_rpcgss module, despite all of its
> consumers already unloaded. I wasn't able to analyze any further on the live
> system, but had a look at the code afterwards, and found a path that seems
> to leak references if the mechanism's accept() op shuts down a connection
> early. Although I couldn't verify, this seem to be a plausible fix.
>
> Kind regards,
>
> Daniel

Hi Daniel-

I've provisionally included your patch in my NFSD for-rc topic branch
here:

git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

Your bug report seems plausible, but I need to take a closer look at that
code and your proposed change. Would very much like to hear from others,
too.


> net/sunrpc/svc.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 61fb8a18552c..d76dc9d95d16 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>
> sendit:
> if (svc_authorise(rqstp))
> - goto close;
> + goto close_xprt;
> return 1; /* Caller can now send it */
>
> release_dropit:
> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> return 0;
>
> close:
> + svc_authorise(rqstp);
> +close_xprt:
> if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
> svc_close_xprt(rqstp->rq_xprt);
> dprintk("svc: svc_process close\n");
> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> err_short_len:
> svc_printk(rqstp, "short len %zd, dropping request\n",
> argv->iov_len);
> - goto close;
> + goto close_xprt;
>
> err_bad_rpc:
> serv->sv_stats->rpcbadfmt++;
> --
> 2.25.1
>
>
> --
> Puzzle ITC Deutschland GmbH
> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072
> Tübingen
>
> Eingetragen am Amtsgericht Stuttgart HRB 765802
> Geschäftsführer:
> Lukas Kallies, Daniel Kobras, Mark Pröhl
>

--
Chuck Lever



2021-03-01 16:37:40

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: fix refcount leak for rpc auth modules

On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote:
>
> > On Feb 26, 2021, at 6:04 PM, Daniel Kobras <[email protected]> wrote:
> >
> > If an auth module's accept op returns SVC_CLOSE, svc_process_common()
> > enters a call path that does not call svc_authorise() before leaving the
> > function, and thus leaks a reference on the auth module's refcount. Hence,
> > make sure calls to svc_authenticate() and svc_authorise() are paired for
> > all call paths, to make sure rpc auth modules can be unloaded.
> >
> > Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
> > Signed-off-by: Daniel Kobras <[email protected]>
> > ---
> > Hi!
> >
> > While debugging NFS on a system with misconfigured krb5 settings, we noticed
> > a suspiciously high refcount on the auth_rpcgss module, despite all of its
> > consumers already unloaded. I wasn't able to analyze any further on the live
> > system, but had a look at the code afterwards, and found a path that seems
> > to leak references if the mechanism's accept() op shuts down a connection
> > early. Although I couldn't verify, this seem to be a plausible fix.
> >
> > Kind regards,
> >
> > Daniel
>
> Hi Daniel-
>
> I've provisionally included your patch in my NFSD for-rc topic branch
> here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>
> Your bug report seems plausible, but I need to take a closer look at that
> code and your proposed change. Would very much like to hear from others,
> too.

So, the effect of this is to call svc_authorise more often. I think
that's always safe, because svc_authorise is a no-op unless rq_authops
is set, it clears rq_authops itself, and rq_authops being set is a
guarantee that ->accept() already ran.

It's harder to know if this solves the problem, as I see a lot of other
mentions of THIS_MODULE in svcauth_gss.c.

Possibly orthogonal to this problem, but: svcauth_gss_release
unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL
dereference if the kmalloc at the start of svcauth_gss_accept() fails?

Finally, should we care about module reference leaks? Does anyone
really *need* to unload modules? And will bad stuff happen when the
count overflows, or does the module code fail safely somehow in the
overflow case? I know, bugs are bugs, I should care about fixing all of
them, shame on me....

--b.

>
>
> > net/sunrpc/svc.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 61fb8a18552c..d76dc9d95d16 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> >
> > sendit:
> > if (svc_authorise(rqstp))
> > - goto close;
> > + goto close_xprt;
> > return 1; /* Caller can now send it */
> >
> > release_dropit:
> > @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> > return 0;
> >
> > close:
> > + svc_authorise(rqstp);
> > +close_xprt:
> > if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
> > svc_close_xprt(rqstp->rq_xprt);
> > dprintk("svc: svc_process close\n");
> > @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
> > err_short_len:
> > svc_printk(rqstp, "short len %zd, dropping request\n",
> > argv->iov_len);
> > - goto close;
> > + goto close_xprt;
> >
> > err_bad_rpc:
> > serv->sv_stats->rpcbadfmt++;
> > --
> > 2.25.1
> >
> >
> > --
> > Puzzle ITC Deutschland GmbH
> > Sitz der Gesellschaft: Eisenbahnstraße 1, 72072
> > Tübingen
> >
> > Eingetragen am Amtsgericht Stuttgart HRB 765802
> > Geschäftsführer:
> > Lukas Kallies, Daniel Kobras, Mark Pröhl
> >
>
> --
> Chuck Lever
>
>
>

2021-03-01 17:51:39

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: fix refcount leak for rpc auth modules



> On Mar 1, 2021, at 11:28 AM, Bruce Fields <[email protected]> wrote:
>
> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote:
>>
>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <[email protected]> wrote:
>>>
>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common()
>>> enters a call path that does not call svc_authorise() before leaving the
>>> function, and thus leaks a reference on the auth module's refcount. Hence,
>>> make sure calls to svc_authenticate() and svc_authorise() are paired for
>>> all call paths, to make sure rpc auth modules can be unloaded.
>>>
>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
>>> Signed-off-by: Daniel Kobras <[email protected]>
>>> ---
>>> Hi!
>>>
>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed
>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its
>>> consumers already unloaded. I wasn't able to analyze any further on the live
>>> system, but had a look at the code afterwards, and found a path that seems
>>> to leak references if the mechanism's accept() op shuts down a connection
>>> early. Although I couldn't verify, this seem to be a plausible fix.
>>>
>>> Kind regards,
>>>
>>> Daniel
>>
>> Hi Daniel-
>>
>> I've provisionally included your patch in my NFSD for-rc topic branch
>> here:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>>
>> Your bug report seems plausible, but I need to take a closer look at that
>> code and your proposed change. Would very much like to hear from others,
>> too.
>
> So, the effect of this is to call svc_authorise more often. I think
> that's always safe, because svc_authorise is a no-op unless rq_authops
> is set, it clears rq_authops itself, and rq_authops being set is a
> guarantee that ->accept() already ran.
>
> It's harder to know if this solves the problem, as I see a lot of other
> mentions of THIS_MODULE in svcauth_gss.c.

Perhaps a deeper audit is necessary.

A small code change to inject SVC_CLOSE returns at random would enable
a more dynamic analysis.


> Possibly orthogonal to this problem, but: svcauth_gss_release
> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL
> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
>
> Finally, should we care about module reference leaks?

I would prefer that module reference counting work as expected. When it
doesn't that tends to lead to people (say, me) hunting for bugs that
might actually be serious.


> Does anyone really *need* to unload modules?

Anyone who wants to replace the module with a newer build that fixes a
bug. It avoids a full reboot, and for some that's important.


> And will bad stuff happen when the
> count overflows, or does the module code fail safely somehow in the
> overflow case? I know, bugs are bugs, I should care about fixing all of
> them, shame on me....
>
> --b.
>
>>
>>
>>> net/sunrpc/svc.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>>> index 61fb8a18552c..d76dc9d95d16 100644
>>> --- a/net/sunrpc/svc.c
>>> +++ b/net/sunrpc/svc.c
>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>
>>> sendit:
>>> if (svc_authorise(rqstp))
>>> - goto close;
>>> + goto close_xprt;
>>> return 1; /* Caller can now send it */
>>>
>>> release_dropit:
>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>> return 0;
>>>
>>> close:
>>> + svc_authorise(rqstp);
>>> +close_xprt:
>>> if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
>>> svc_close_xprt(rqstp->rq_xprt);
>>> dprintk("svc: svc_process close\n");
>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>> err_short_len:
>>> svc_printk(rqstp, "short len %zd, dropping request\n",
>>> argv->iov_len);
>>> - goto close;
>>> + goto close_xprt;
>>>
>>> err_bad_rpc:
>>> serv->sv_stats->rpcbadfmt++;
>>> --
>>> 2.25.1
>>>
>>>
>>> --
>>> Puzzle ITC Deutschland GmbH
>>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072
>>> Tübingen
>>>
>>> Eingetragen am Amtsgericht Stuttgart HRB 765802
>>> Geschäftsführer:
>>> Lukas Kallies, Daniel Kobras, Mark Pröhl
>>>
>>
>> --
>> Chuck Lever

--
Chuck Lever



2021-03-01 18:19:08

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: fix refcount leak for rpc auth modules

On Mon, Mar 01, 2021 at 05:44:02PM +0000, Chuck Lever wrote:
> > So, the effect of this is to call svc_authorise more often. I think
> > that's always safe, because svc_authorise is a no-op unless rq_authops
> > is set, it clears rq_authops itself, and rq_authops being set is a
> > guarantee that ->accept() already ran.
> >
> > It's harder to know if this solves the problem, as I see a lot of other
> > mentions of THIS_MODULE in svcauth_gss.c.
>
> Perhaps a deeper audit is necessary.
>
> A small code change to inject SVC_CLOSE returns at random would enable
> a more dynamic analysis.

That might be interesting.

I don't think tihs patch necessarily has to wait for that.

> > Possibly orthogonal to this problem, but: svcauth_gss_release
> > unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL
> > dereference if the kmalloc at the start of svcauth_gss_accept() fails?
> >
> > Finally, should we care about module reference leaks?
>
> I would prefer that module reference counting work as expected. When it
> doesn't that tends to lead to people (say, me) hunting for bugs that
> might actually be serious.
>
>
> > Does anyone really *need* to unload modules?
>
> Anyone who wants to replace the module with a newer build that fixes a
> bug. It avoids a full reboot, and for some that's important.

Fair enough.

--b.

2021-03-01 18:26:42

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: fix refcount leak for rpc auth modules



> On Mar 1, 2021, at 1:15 PM, Bruce Fields <[email protected]> wrote:
>
> On Mon, Mar 01, 2021 at 05:44:02PM +0000, Chuck Lever wrote:
>>> So, the effect of this is to call svc_authorise more often. I think
>>> that's always safe, because svc_authorise is a no-op unless rq_authops
>>> is set, it clears rq_authops itself, and rq_authops being set is a
>>> guarantee that ->accept() already ran.
>>>
>>> It's harder to know if this solves the problem, as I see a lot of other
>>> mentions of THIS_MODULE in svcauth_gss.c.
>>
>> Perhaps a deeper audit is necessary.
>>
>> A small code change to inject SVC_CLOSE returns at random would enable
>> a more dynamic analysis.
>
> That might be interesting.
>
> I don't think this patch necessarily has to wait for that.

OK. It's in for-rc now, and sounds like that doesn't need to change.

Poking around a little, I see a try_module_get() and module_put() done
for every RPC. Considering that both have a preempt_disable/enable pair,
that seems a little expensive for the value it provides. One might like
to see the module reference count handled a little less frequently, but
I don't see an obvious way to address that.


>>> Possibly orthogonal to this problem, but: svcauth_gss_release
>>> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL
>>> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
>>>
>>> Finally, should we care about module reference leaks?
>>
>> I would prefer that module reference counting work as expected. When it
>> doesn't that tends to lead to people (say, me) hunting for bugs that
>> might actually be serious.
>>
>>
>>> Does anyone really *need* to unload modules?
>>
>> Anyone who wants to replace the module with a newer build that fixes a
>> bug. It avoids a full reboot, and for some that's important.
>
> Fair enough.
>
> --b.

--
Chuck Lever



2021-03-04 06:32:48

by Daniel Kobras

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: fix refcount leak for rpc auth modules

Hi all!

Am 01.03.21 um 18:44 schrieb Chuck Lever:
>> On Mar 1, 2021, at 11:28 AM, Bruce Fields <[email protected]> wrote:
>>
>> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote:
>>>
>>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <[email protected]> wrote:
>>>>
>>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common()
>>>> enters a call path that does not call svc_authorise() before leaving the
>>>> function, and thus leaks a reference on the auth module's refcount. Hence,
>>>> make sure calls to svc_authenticate() and svc_authorise() are paired for
>>>> all call paths, to make sure rpc auth modules can be unloaded.
>>>>
>>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
>>>> Signed-off-by: Daniel Kobras <[email protected]>
>>>> ---
>>>> Hi!
>>>>
>>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed
>>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its
>>>> consumers already unloaded. I wasn't able to analyze any further on the live
>>>> system, but had a look at the code afterwards, and found a path that seems
>>>> to leak references if the mechanism's accept() op shuts down a connection
>>>> early. Although I couldn't verify, this seem to be a plausible fix.
>>>>
>>>> Kind regards,
>>>>
>>>> Daniel
>>>
>>> Hi Daniel-
>>>
>>> I've provisionally included your patch in my NFSD for-rc topic branch
>>> here:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>>>
>>> Your bug report seems plausible, but I need to take a closer look at that
>>> code and your proposed change. Would very much like to hear from others,
>>> too.
>>
>> So, the effect of this is to call svc_authorise more often. I think
>> that's always safe, because svc_authorise is a no-op unless rq_authops
>> is set, it clears rq_authops itself, and rq_authops being set is a
>> guarantee that ->accept() already ran.
>>
>> It's harder to know if this solves the problem, as I see a lot of other
>> mentions of THIS_MODULE in svcauth_gss.c.
>
> Perhaps a deeper audit is necessary.
>
> A small code change to inject SVC_CLOSE returns at random would enable
> a more dynamic analysis.

I've managed to come up with simple reproducer for this bug:

On a working krb5 NFS mount from a test client, check which enctype is
used in the ticket for the NFS service. Then unmount, and exclude this
enctype from permitted_enctypes in the server's /etc/krb5.conf.[*]
Trying to mount again from the test client should now fail (EPERM), and
each mount attempt increases the refcount of the server's auth_rpcgss
module (by 22 in my test).

Exchanging sunrpc.ko for a version with the patch applied, and
re-running the same test, the refcount remains constant instead. This
confirms the initial analysis, and indicates the fix is actually correct.

[*] For a quick test in a standard setup, I've used
[libdefaults]
permitted_enctypes = aes128-cts-hmac-sha1-96
(...)
to make the normal AES256 tickets fail. A more realistic scenario
would be a client that's restricted to RC4, and a server with
RC4 keys on the KDC, but only AES allowed in krb5.conf. Default
behaviour of typical AD join tools makes it easy to end up in a
situation like this.

>> Possibly orthogonal to this problem, but: svcauth_gss_release
>> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL
>> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
>>
>> Finally, should we care about module reference leaks?
>
> I would prefer that module reference counting work as expected. When it
> doesn't that tends to lead to people (say, me) hunting for bugs that
> might actually be serious.

The refcount leak is the easily visible consequence, but the skipped
call to svcauth_gss_release() probably also leaks a small amount of
memory for each request. Repeating the test case above for a longer
period of time (eg. by throwing an automounter into the mix), this might
eventually become noticeable.

>> Does anyone really *need* to unload modules?
>
> Anyone who wants to replace the module with a newer build that fixes a
> bug. It avoids a full reboot, and for some that's important.

Switching from rpc.svcgssd to gssproxy without taking down the machine
as a whole was the situation that originally prompted me to look into
this, but I admit that's a rather rare use case.

>> And will bad stuff happen when the
>> count overflows, or does the module code fail safely somehow in the
>> overflow case? I know, bugs are bugs, I should care about fixing all of
>> them, shame on me....
>>
>> --b.
>>
>>>
>>>
>>>> net/sunrpc/svc.c | 6 ++++--
>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>>>> index 61fb8a18552c..d76dc9d95d16 100644
>>>> --- a/net/sunrpc/svc.c
>>>> +++ b/net/sunrpc/svc.c
>>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>
>>>> sendit:
>>>> if (svc_authorise(rqstp))
>>>> - goto close;
>>>> + goto close_xprt;
>>>> return 1; /* Caller can now send it */
>>>>
>>>> release_dropit:
>>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>> return 0;
>>>>
>>>> close:
>>>> + svc_authorise(rqstp);
>>>> +close_xprt:
>>>> if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
>>>> svc_close_xprt(rqstp->rq_xprt);
>>>> dprintk("svc: svc_process close\n");
>>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>> err_short_len:
>>>> svc_printk(rqstp, "short len %zd, dropping request\n",
>>>> argv->iov_len);
>>>> - goto close;
>>>> + goto close_xprt;
>>>>
>>>> err_bad_rpc:
>>>> serv->sv_stats->rpcbadfmt++;
>>>> --
>>>> 2.25.1
>>>>
>>>>
>>>> --
>>>> Puzzle ITC Deutschland GmbH
>>>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072
>>>> Tübingen
>>>>
>>>> Eingetragen am Amtsgericht Stuttgart HRB 765802
>>>> Geschäftsführer:
>>>> Lukas Kallies, Daniel Kobras, Mark Pröhl
>>>>
>>>
>>> --
>>> Chuck Lever
>
> --
> Chuck Lever

Kind regards,

Daniel
--
Daniel Kobras
Principal Architect
Puzzle ITC Deutschland
+49 7071 14316 0
http://www.puzzle-itc.de

--
Puzzle ITC Deutschland GmbH
Sitz der Gesellschaft: Eisenbahnstraße 1, 72072
Tübingen

Eingetragen am Amtsgericht Stuttgart HRB 765802
Geschäftsführer:
Lukas Kallies, Daniel Kobras, Mark Pröhl

2021-03-04 06:34:12

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: fix refcount leak for rpc auth modules

Excellent, Daniel. Thanks for following up! I will add a Link: tag
to this thread in the patch description.


> On Mar 2, 2021, at 6:50 AM, Daniel Kobras <[email protected]> wrote:
>
> Hi all!
>
> Am 01.03.21 um 18:44 schrieb Chuck Lever:
>>> On Mar 1, 2021, at 11:28 AM, Bruce Fields <[email protected]> wrote:
>>>
>>> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote:
>>>>
>>>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <[email protected]> wrote:
>>>>>
>>>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common()
>>>>> enters a call path that does not call svc_authorise() before leaving the
>>>>> function, and thus leaks a reference on the auth module's refcount. Hence,
>>>>> make sure calls to svc_authenticate() and svc_authorise() are paired for
>>>>> all call paths, to make sure rpc auth modules can be unloaded.
>>>>>
>>>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
>>>>> Signed-off-by: Daniel Kobras <[email protected]>
>>>>> ---
>>>>> Hi!
>>>>>
>>>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed
>>>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its
>>>>> consumers already unloaded. I wasn't able to analyze any further on the live
>>>>> system, but had a look at the code afterwards, and found a path that seems
>>>>> to leak references if the mechanism's accept() op shuts down a connection
>>>>> early. Although I couldn't verify, this seem to be a plausible fix.
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Daniel
>>>>
>>>> Hi Daniel-
>>>>
>>>> I've provisionally included your patch in my NFSD for-rc topic branch
>>>> here:
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>>>>
>>>> Your bug report seems plausible, but I need to take a closer look at that
>>>> code and your proposed change. Would very much like to hear from others,
>>>> too.
>>>
>>> So, the effect of this is to call svc_authorise more often. I think
>>> that's always safe, because svc_authorise is a no-op unless rq_authops
>>> is set, it clears rq_authops itself, and rq_authops being set is a
>>> guarantee that ->accept() already ran.
>>>
>>> It's harder to know if this solves the problem, as I see a lot of other
>>> mentions of THIS_MODULE in svcauth_gss.c.
>>
>> Perhaps a deeper audit is necessary.
>>
>> A small code change to inject SVC_CLOSE returns at random would enable
>> a more dynamic analysis.
>
> I've managed to come up with simple reproducer for this bug:
>
> On a working krb5 NFS mount from a test client, check which enctype is
> used in the ticket for the NFS service. Then unmount, and exclude this
> enctype from permitted_enctypes in the server's /etc/krb5.conf.[*]
> Trying to mount again from the test client should now fail (EPERM), and
> each mount attempt increases the refcount of the server's auth_rpcgss
> module (by 22 in my test).
>
> Exchanging sunrpc.ko for a version with the patch applied, and
> re-running the same test, the refcount remains constant instead. This
> confirms the initial analysis, and indicates the fix is actually correct.
>
> [*] For a quick test in a standard setup, I've used
> [libdefaults]
> permitted_enctypes = aes128-cts-hmac-sha1-96
> (...)
> to make the normal AES256 tickets fail. A more realistic scenario
> would be a client that's restricted to RC4, and a server with
> RC4 keys on the KDC, but only AES allowed in krb5.conf. Default
> behaviour of typical AD join tools makes it easy to end up in a
> situation like this.
>
>>> Possibly orthogonal to this problem, but: svcauth_gss_release
>>> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL
>>> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
>>>
>>> Finally, should we care about module reference leaks?
>>
>> I would prefer that module reference counting work as expected. When it
>> doesn't that tends to lead to people (say, me) hunting for bugs that
>> might actually be serious.
>
> The refcount leak is the easily visible consequence, but the skipped
> call to svcauth_gss_release() probably also leaks a small amount of
> memory for each request. Repeating the test case above for a longer
> period of time (eg. by throwing an automounter into the mix), this might
> eventually become noticeable.
>
>>> Does anyone really *need* to unload modules?
>>
>> Anyone who wants to replace the module with a newer build that fixes a
>> bug. It avoids a full reboot, and for some that's important.
>
> Switching from rpc.svcgssd to gssproxy without taking down the machine
> as a whole was the situation that originally prompted me to look into
> this, but I admit that's a rather rare use case.
>
>>> And will bad stuff happen when the
>>> count overflows, or does the module code fail safely somehow in the
>>> overflow case? I know, bugs are bugs, I should care about fixing all of
>>> them, shame on me....
>>>
>>> --b.
>>>
>>>>
>>>>
>>>>> net/sunrpc/svc.c | 6 ++++--
>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>>>>> index 61fb8a18552c..d76dc9d95d16 100644
>>>>> --- a/net/sunrpc/svc.c
>>>>> +++ b/net/sunrpc/svc.c
>>>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>>
>>>>> sendit:
>>>>> if (svc_authorise(rqstp))
>>>>> - goto close;
>>>>> + goto close_xprt;
>>>>> return 1; /* Caller can now send it */
>>>>>
>>>>> release_dropit:
>>>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>> return 0;
>>>>>
>>>>> close:
>>>>> + svc_authorise(rqstp);
>>>>> +close_xprt:
>>>>> if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
>>>>> svc_close_xprt(rqstp->rq_xprt);
>>>>> dprintk("svc: svc_process close\n");
>>>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>> err_short_len:
>>>>> svc_printk(rqstp, "short len %zd, dropping request\n",
>>>>> argv->iov_len);
>>>>> - goto close;
>>>>> + goto close_xprt;
>>>>>
>>>>> err_bad_rpc:
>>>>> serv->sv_stats->rpcbadfmt++;
>>>>> --
>>>>> 2.25.1
>>>>>
>>>>>
>>>>> --
>>>>> Puzzle ITC Deutschland GmbH
>>>>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072
>>>>> Tübingen
>>>>>
>>>>> Eingetragen am Amtsgericht Stuttgart HRB 765802
>>>>> Geschäftsführer:
>>>>> Lukas Kallies, Daniel Kobras, Mark Pröhl
>>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>
>> --
>> Chuck Lever
>
> Kind regards,
>
> Daniel
> --
> Daniel Kobras
> Principal Architect
> Puzzle ITC Deutschland
> +49 7071 14316 0
> http://www.puzzle-itc.de
>
> --
> Puzzle ITC Deutschland GmbH
> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072
> Tübingen
>
> Eingetragen am Amtsgericht Stuttgart HRB 765802
> Geschäftsführer:
> Lukas Kallies, Daniel Kobras, Mark Pröhl

--
Chuck Lever



2021-03-04 07:51:05

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH] rpc: fix NULL dereference on kmalloc failure

From: "J. Bruce Fields" <[email protected]>

I think this is unlikely but possible:

svc_authenticate sets rq_authop and calls svcauth_gss_accept. The
kmalloc(sizeof(*svcdata), GFP_KERNEL) fails, leaving rq_auth_data NULL,
and returning SVC_DENIED.

This causes svc_process_common to go to err_bad_auth, and eventually
call svc_authorise. That calls ->release == svcauth_gss_release, which
tries to dereference rq_auth_data.

Signed-off-by: J. Bruce Fields <[email protected]>
---
net/sunrpc/auth_gss/svcauth_gss.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

On Mon, Mar 01, 2021 at 11:28:20AM -0500, Bruce Fields wrote:
> Possibly orthogonal to this problem, but: svcauth_gss_release
> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL
> dereference if the kmalloc at the start of svcauth_gss_accept() fails?

diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index bd4678db9d76..6dff64374bfe 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -1825,11 +1825,14 @@ static int
svcauth_gss_release(struct svc_rqst *rqstp)
{
struct gss_svc_data *gsd = (struct gss_svc_data *)rqstp->rq_auth_data;
- struct rpc_gss_wire_cred *gc = &gsd->clcred;
+ struct rpc_gss_wire_cred *gc;
struct xdr_buf *resbuf = &rqstp->rq_res;
int stat = -EINVAL;
struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id);

+ if (!gsd)
+ goto out;
+ gc = &gsd->clcred;
if (gc->gc_proc != RPC_GSS_PROC_DATA)
goto out;
/* Release can be called twice, but we only wrap once. */
@@ -1870,10 +1873,10 @@ svcauth_gss_release(struct svc_rqst *rqstp)
if (rqstp->rq_cred.cr_group_info)
put_group_info(rqstp->rq_cred.cr_group_info);
rqstp->rq_cred.cr_group_info = NULL;
- if (gsd->rsci)
+ if (gsd && gsd->rsci) {
cache_put(&gsd->rsci->h, sn->rsc_cache);
- gsd->rsci = NULL;
-
+ gsd->rsci = NULL;
+ }
return stat;
}

--
2.29.2

2021-03-04 14:02:29

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] rpc: fix NULL dereference on kmalloc failure

Hi Bruce-

> On Mar 2, 2021, at 10:48 AM, Bruce Fields <[email protected]> wrote:
>
> From: "J. Bruce Fields" <[email protected]>
>
> I think this is unlikely but possible:
>
> svc_authenticate sets rq_authop and calls svcauth_gss_accept. The
> kmalloc(sizeof(*svcdata), GFP_KERNEL) fails, leaving rq_auth_data NULL,
> and returning SVC_DENIED.
>
> This causes svc_process_common to go to err_bad_auth, and eventually
> call svc_authorise. That calls ->release == svcauth_gss_release, which
> tries to dereference rq_auth_data.
>
> Signed-off-by: J. Bruce Fields <[email protected]>

Thanks, now included in the for-rc topic branch at:

git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

with the addition of a Link: tag to reference the extra text below.


> ---
> net/sunrpc/auth_gss/svcauth_gss.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> On Mon, Mar 01, 2021 at 11:28:20AM -0500, Bruce Fields wrote:
>> Possibly orthogonal to this problem, but: svcauth_gss_release
>> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL
>> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
>
> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
> index bd4678db9d76..6dff64374bfe 100644
> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> @@ -1825,11 +1825,14 @@ static int
> svcauth_gss_release(struct svc_rqst *rqstp)
> {
> struct gss_svc_data *gsd = (struct gss_svc_data *)rqstp->rq_auth_data;
> - struct rpc_gss_wire_cred *gc = &gsd->clcred;
> + struct rpc_gss_wire_cred *gc;
> struct xdr_buf *resbuf = &rqstp->rq_res;
> int stat = -EINVAL;
> struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id);
>
> + if (!gsd)
> + goto out;
> + gc = &gsd->clcred;
> if (gc->gc_proc != RPC_GSS_PROC_DATA)
> goto out;
> /* Release can be called twice, but we only wrap once. */
> @@ -1870,10 +1873,10 @@ svcauth_gss_release(struct svc_rqst *rqstp)
> if (rqstp->rq_cred.cr_group_info)
> put_group_info(rqstp->rq_cred.cr_group_info);
> rqstp->rq_cred.cr_group_info = NULL;
> - if (gsd->rsci)
> + if (gsd && gsd->rsci) {
> cache_put(&gsd->rsci->h, sn->rsc_cache);
> - gsd->rsci = NULL;
> -
> + gsd->rsci = NULL;
> + }
> return stat;
> }
>
> --
> 2.29.2
>

--
Chuck Lever