2010-02-16 19:17:14

by Jeff Layton

[permalink] [raw]
Subject: [PATCH] libtirpc: handle large numbers of supplemental groups gracefully (try #2)

This is the second attempt at this patch. The main changes are that this
one doesn't set a floor value for the size of the group list. There are
also a few minor cleanups and comments added.

If authunix_create_default() is called by a user with more than 16
supplimental groups, it'll abort(), which causes the program to crash
and coredump.

Fix it to handle this situation gracefully. Get the number of groups
that the user has first, and then allocate a big enough buffer to hold
them. Then, just don't let the lower function use more than the NGRPS
groups.

Also fix up the error handling in this function so that it just returns
a NULL pointer on error and logs a message via warnx() instead of
calling abort().

Reported-by: Peter Engel <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/src/auth_unix.c b/src/auth_unix.c
index 71ca15d..a295e71 100644
--- a/src/auth_unix.c
+++ b/src/auth_unix.c
@@ -49,6 +49,7 @@
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
+#include <errno.h>

#include <rpc/types.h>
#include <rpc/xdr.h>
@@ -175,20 +176,69 @@ AUTH *
authunix_create_default()
{
int len;
+ size_t bufsize = 0;
char machname[MAXHOSTNAMELEN + 1];
uid_t uid;
gid_t gid;
- gid_t gids[NGRPS];
+ gid_t *gids = NULL;
+ AUTH *auth;
+
+ if (gethostname(machname, sizeof machname) == -1) {
+ warnx("%s: gethostname() failed: %s", __func__,
+ strerror(errno));
+ return NULL;
+ }

- if (gethostname(machname, sizeof machname) == -1)
- abort();
machname[sizeof(machname) - 1] = 0;
uid = geteuid();
gid = getegid();
- if ((len = getgroups(NGRPS, gids)) < 0)
- abort();
+
+retry:
+ len = getgroups(0, NULL);
+ if (len < 0) {
+ warnx("%s: failed to get number of groups: %s", __func__,
+ strerror(errno));
+ return NULL;
+ }
+
+ if (len == 0)
+ goto no_groups;
+
+ bufsize = len * sizeof(gid_t);
+ gids = mem_alloc(bufsize);
+ if (gids == NULL) {
+ warnx("%s: memory allocation failed: %s", __func__,
+ strerror(ENOMEM));
+ return NULL;
+ }
+
+ len = getgroups(len, gids);
+ if (len < 0) {
+ mem_free(gids, bufsize);
+ /*
+ * glibc equivalent routines mention that it's possible for
+ * the number of groups to change between two getgroups calls.
+ * If that happens, retry the whole thing again.
+ */
+ if (len == -EINVAL) {
+ gids = NULL;
+ bufsize = 0;
+ goto retry;
+ }
+ warnx("%s: failed to get group list: %s", __func__,
+ strerror(errno));
+ return NULL;
+ }
+
+ /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */
+ if (len > NGRPS)
+ len = NGRPS;
+
+no_groups:
/* XXX: interface problem; those should all have been unsigned */
- return (authunix_create(machname, uid, gid, len, gids));
+ auth = authunix_create(machname, uid, gid, len, gids);
+ mem_free(gids, bufsize);
+ return auth;
}

/*
--
1.6.6



2010-02-16 19:24:33

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] libtirpc: handle large numbers of supplemental groups gracefully (try #2)

Jeff Layton wrote:
> This is the second attempt at this patch. The main changes are that this
> one doesn't set a floor value for the size of the group list. There are
> also a few minor cleanups and comments added.
>
> If authunix_create_default() is called by a user with more than 16
> supplimental groups, it'll abort(), which causes the program to crash
> and coredump.
>
> Fix it to handle this situation gracefully. Get the number of groups
> that the user has first, and then allocate a big enough buffer to hold
> them. Then, just don't let the lower function use more than the NGRPS
> groups.
>
> Also fix up the error handling in this function so that it just returns
> a NULL pointer on error and logs a message via warnx() instead of
> calling abort().
>
> Reported-by: Peter Engel <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 files changed, 56 insertions(+), 6 deletions(-)
>
> diff --git a/src/auth_unix.c b/src/auth_unix.c
> index 71ca15d..a295e71 100644
> --- a/src/auth_unix.c
> +++ b/src/auth_unix.c
> @@ -49,6 +49,7 @@
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
> +#include <errno.h>
>
> #include <rpc/types.h>
> #include <rpc/xdr.h>
> @@ -175,20 +176,69 @@ AUTH *
> authunix_create_default()
> {
> int len;
> + size_t bufsize = 0;
> char machname[MAXHOSTNAMELEN + 1];
> uid_t uid;
> gid_t gid;
> - gid_t gids[NGRPS];
> + gid_t *gids = NULL;
> + AUTH *auth;
> +
> + if (gethostname(machname, sizeof machname) == -1) {
> + warnx("%s: gethostname() failed: %s", __func__,
> + strerror(errno));
> + return NULL;
> + }
>
> - if (gethostname(machname, sizeof machname) == -1)
> - abort();
> machname[sizeof(machname) - 1] = 0;
> uid = geteuid();
> gid = getegid();
> - if ((len = getgroups(NGRPS, gids)) < 0)
> - abort();
> +
> +retry:
> + len = getgroups(0, NULL);
> + if (len < 0) {
> + warnx("%s: failed to get number of groups: %s", __func__,
> + strerror(errno));
> + return NULL;
> + }
> +
> + if (len == 0)
> + goto no_groups;
> +
> + bufsize = len * sizeof(gid_t);
> + gids = mem_alloc(bufsize);
> + if (gids == NULL) {
> + warnx("%s: memory allocation failed: %s", __func__,
> + strerror(ENOMEM));
> + return NULL;
> + }
> +
> + len = getgroups(len, gids);
> + if (len < 0) {
> + mem_free(gids, bufsize);
> + /*
> + * glibc equivalent routines mention that it's possible for
> + * the number of groups to change between two getgroups calls.
> + * If that happens, retry the whole thing again.
> + */
> + if (len == -EINVAL) {
> + gids = NULL;
> + bufsize = 0;
> + goto retry;
> + }
> + warnx("%s: failed to get group list: %s", __func__,
> + strerror(errno));
> + return NULL;
> + }
> +
> + /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */
> + if (len > NGRPS)
> + len = NGRPS;
> +
> +no_groups:
> /* XXX: interface problem; those should all have been unsigned */
> - return (authunix_create(machname, uid, gid, len, gids));
> + auth = authunix_create(machname, uid, gid, len, gids);
> + mem_free(gids, bufsize);
> + return auth;
> }
>
> /*

This change to restrict the groups used to the first NGRPS
groups is one that we have always avoided. It can be quite
confusing to the user, to have an operation fail, but then
to look and notice that the correct group is listed.

Having the library abort seems odd and wrong, but this will
also change semantics. Is there really a problem here,
after all of these years, that must be addressed?

Thanx...

ps

2010-02-16 19:24:57

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] libtirpc: handle large numbers of supplemental groups gracefully (try #2)

On 02/16/2010 02:16 PM, Jeff Layton wrote:
> This is the second attempt at this patch. The main changes are that this
> one doesn't set a floor value for the size of the group list. There are
> also a few minor cleanups and comments added.
>
> If authunix_create_default() is called by a user with more than 16
> supplimental groups, it'll abort(), which causes the program to crash
> and coredump.
>
> Fix it to handle this situation gracefully. Get the number of groups
> that the user has first, and then allocate a big enough buffer to hold
> them. Then, just don't let the lower function use more than the NGRPS
> groups.
>
> Also fix up the error handling in this function so that it just returns
> a NULL pointer on error and logs a message via warnx() instead of
> calling abort().
>
> Reported-by: Peter Engel<[email protected]>
> Signed-off-by: Jeff Layton<[email protected]>

Reviewed-by: Chuck Lever <[email protected]>

> ---
> src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 files changed, 56 insertions(+), 6 deletions(-)
>
> diff --git a/src/auth_unix.c b/src/auth_unix.c
> index 71ca15d..a295e71 100644
> --- a/src/auth_unix.c
> +++ b/src/auth_unix.c
> @@ -49,6 +49,7 @@
> #include<stdlib.h>
> #include<unistd.h>
> #include<string.h>
> +#include<errno.h>
>
> #include<rpc/types.h>
> #include<rpc/xdr.h>
> @@ -175,20 +176,69 @@ AUTH *
> authunix_create_default()
> {
> int len;
> + size_t bufsize = 0;
> char machname[MAXHOSTNAMELEN + 1];
> uid_t uid;
> gid_t gid;
> - gid_t gids[NGRPS];
> + gid_t *gids = NULL;
> + AUTH *auth;
> +
> + if (gethostname(machname, sizeof machname) == -1) {
> + warnx("%s: gethostname() failed: %s", __func__,
> + strerror(errno));
> + return NULL;
> + }
>
> - if (gethostname(machname, sizeof machname) == -1)
> - abort();
> machname[sizeof(machname) - 1] = 0;
> uid = geteuid();
> gid = getegid();
> - if ((len = getgroups(NGRPS, gids))< 0)
> - abort();
> +
> +retry:
> + len = getgroups(0, NULL);
> + if (len< 0) {
> + warnx("%s: failed to get number of groups: %s", __func__,
> + strerror(errno));
> + return NULL;
> + }
> +
> + if (len == 0)
> + goto no_groups;
> +
> + bufsize = len * sizeof(gid_t);
> + gids = mem_alloc(bufsize);
> + if (gids == NULL) {
> + warnx("%s: memory allocation failed: %s", __func__,
> + strerror(ENOMEM));
> + return NULL;
> + }
> +
> + len = getgroups(len, gids);
> + if (len< 0) {
> + mem_free(gids, bufsize);
> + /*
> + * glibc equivalent routines mention that it's possible for
> + * the number of groups to change between two getgroups calls.
> + * If that happens, retry the whole thing again.
> + */
> + if (len == -EINVAL) {
> + gids = NULL;
> + bufsize = 0;
> + goto retry;
> + }
> + warnx("%s: failed to get group list: %s", __func__,
> + strerror(errno));
> + return NULL;
> + }
> +
> + /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */
> + if (len> NGRPS)
> + len = NGRPS;
> +
> +no_groups:
> /* XXX: interface problem; those should all have been unsigned */
> - return (authunix_create(machname, uid, gid, len, gids));
> + auth = authunix_create(machname, uid, gid, len, gids);
> + mem_free(gids, bufsize);
> + return auth;
> }
>
> /*


--
chuck[dot]lever[at]oracle[dot]com

2010-02-16 19:27:00

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] libtirpc: handle large numbers of supplemental groups gracefully (try #2)

On 02/16/2010 02:24 PM, Peter Staubach wrote:
> Jeff Layton wrote:
>> This is the second attempt at this patch. The main changes are that this
>> one doesn't set a floor value for the size of the group list. There are
>> also a few minor cleanups and comments added.
>>
>> If authunix_create_default() is called by a user with more than 16
>> supplimental groups, it'll abort(), which causes the program to crash
>> and coredump.
>>
>> Fix it to handle this situation gracefully. Get the number of groups
>> that the user has first, and then allocate a big enough buffer to hold
>> them. Then, just don't let the lower function use more than the NGRPS
>> groups.
>>
>> Also fix up the error handling in this function so that it just returns
>> a NULL pointer on error and logs a message via warnx() instead of
>> calling abort().
>>
>> Reported-by: Peter Engel<[email protected]>
>> Signed-off-by: Jeff Layton<[email protected]>
>> ---
>> src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-----
>> 1 files changed, 56 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/auth_unix.c b/src/auth_unix.c
>> index 71ca15d..a295e71 100644
>> --- a/src/auth_unix.c
>> +++ b/src/auth_unix.c
>> @@ -49,6 +49,7 @@
>> #include<stdlib.h>
>> #include<unistd.h>
>> #include<string.h>
>> +#include<errno.h>
>>
>> #include<rpc/types.h>
>> #include<rpc/xdr.h>
>> @@ -175,20 +176,69 @@ AUTH *
>> authunix_create_default()
>> {
>> int len;
>> + size_t bufsize = 0;
>> char machname[MAXHOSTNAMELEN + 1];
>> uid_t uid;
>> gid_t gid;
>> - gid_t gids[NGRPS];
>> + gid_t *gids = NULL;
>> + AUTH *auth;
>> +
>> + if (gethostname(machname, sizeof machname) == -1) {
>> + warnx("%s: gethostname() failed: %s", __func__,
>> + strerror(errno));
>> + return NULL;
>> + }
>>
>> - if (gethostname(machname, sizeof machname) == -1)
>> - abort();
>> machname[sizeof(machname) - 1] = 0;
>> uid = geteuid();
>> gid = getegid();
>> - if ((len = getgroups(NGRPS, gids))< 0)
>> - abort();
>> +
>> +retry:
>> + len = getgroups(0, NULL);
>> + if (len< 0) {
>> + warnx("%s: failed to get number of groups: %s", __func__,
>> + strerror(errno));
>> + return NULL;
>> + }
>> +
>> + if (len == 0)
>> + goto no_groups;
>> +
>> + bufsize = len * sizeof(gid_t);
>> + gids = mem_alloc(bufsize);
>> + if (gids == NULL) {
>> + warnx("%s: memory allocation failed: %s", __func__,
>> + strerror(ENOMEM));
>> + return NULL;
>> + }
>> +
>> + len = getgroups(len, gids);
>> + if (len< 0) {
>> + mem_free(gids, bufsize);
>> + /*
>> + * glibc equivalent routines mention that it's possible for
>> + * the number of groups to change between two getgroups calls.
>> + * If that happens, retry the whole thing again.
>> + */
>> + if (len == -EINVAL) {
>> + gids = NULL;
>> + bufsize = 0;
>> + goto retry;
>> + }
>> + warnx("%s: failed to get group list: %s", __func__,
>> + strerror(errno));
>> + return NULL;
>> + }
>> +
>> + /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */
>> + if (len> NGRPS)
>> + len = NGRPS;
>> +
>> +no_groups:
>> /* XXX: interface problem; those should all have been unsigned */
>> - return (authunix_create(machname, uid, gid, len, gids));
>> + auth = authunix_create(machname, uid, gid, len, gids);
>> + mem_free(gids, bufsize);
>> + return auth;
>> }
>>
>> /*
>
> This change to restrict the groups used to the first NGRPS
> groups is one that we have always avoided. It can be quite
> confusing to the user, to have an operation fail, but then
> to look and notice that the correct group is listed.
>
> Having the library abort seems odd and wrong, but this will
> also change semantics. Is there really a problem here,
> after all of these years, that must be addressed?

We're actually copying semantics from glibc. I think RPC applications
on Linux might expect the glibc behavior. Thoughts?

--
chuck[dot]lever[at]oracle[dot]com

2010-02-16 19:33:24

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] libtirpc: handle large numbers of supplemental groups gracefully (try #2)

On Tue, 16 Feb 2010 14:24:09 -0500
Peter Staubach <[email protected]> wrote:

> Jeff Layton wrote:
> > This is the second attempt at this patch. The main changes are that this
> > one doesn't set a floor value for the size of the group list. There are
> > also a few minor cleanups and comments added.
> >
> > If authunix_create_default() is called by a user with more than 16
> > supplimental groups, it'll abort(), which causes the program to crash
> > and coredump.
> >
> > Fix it to handle this situation gracefully. Get the number of groups
> > that the user has first, and then allocate a big enough buffer to hold
> > them. Then, just don't let the lower function use more than the NGRPS
> > groups.
> >
> > Also fix up the error handling in this function so that it just returns
> > a NULL pointer on error and logs a message via warnx() instead of
> > calling abort().
> >
> > Reported-by: Peter Engel <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-----
> > 1 files changed, 56 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/auth_unix.c b/src/auth_unix.c
> > index 71ca15d..a295e71 100644
> > --- a/src/auth_unix.c
> > +++ b/src/auth_unix.c
> > @@ -49,6 +49,7 @@
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <string.h>
> > +#include <errno.h>
> >
> > #include <rpc/types.h>
> > #include <rpc/xdr.h>
> > @@ -175,20 +176,69 @@ AUTH *
> > authunix_create_default()
> > {
> > int len;
> > + size_t bufsize = 0;
> > char machname[MAXHOSTNAMELEN + 1];
> > uid_t uid;
> > gid_t gid;
> > - gid_t gids[NGRPS];
> > + gid_t *gids = NULL;
> > + AUTH *auth;
> > +
> > + if (gethostname(machname, sizeof machname) == -1) {
> > + warnx("%s: gethostname() failed: %s", __func__,
> > + strerror(errno));
> > + return NULL;
> > + }
> >
> > - if (gethostname(machname, sizeof machname) == -1)
> > - abort();
> > machname[sizeof(machname) - 1] = 0;
> > uid = geteuid();
> > gid = getegid();
> > - if ((len = getgroups(NGRPS, gids)) < 0)
> > - abort();
> > +
> > +retry:
> > + len = getgroups(0, NULL);
> > + if (len < 0) {
> > + warnx("%s: failed to get number of groups: %s", __func__,
> > + strerror(errno));
> > + return NULL;
> > + }
> > +
> > + if (len == 0)
> > + goto no_groups;
> > +
> > + bufsize = len * sizeof(gid_t);
> > + gids = mem_alloc(bufsize);
> > + if (gids == NULL) {
> > + warnx("%s: memory allocation failed: %s", __func__,
> > + strerror(ENOMEM));
> > + return NULL;
> > + }
> > +
> > + len = getgroups(len, gids);
> > + if (len < 0) {
> > + mem_free(gids, bufsize);
> > + /*
> > + * glibc equivalent routines mention that it's possible for
> > + * the number of groups to change between two getgroups calls.
> > + * If that happens, retry the whole thing again.
> > + */
> > + if (len == -EINVAL) {
> > + gids = NULL;
> > + bufsize = 0;
> > + goto retry;
> > + }
> > + warnx("%s: failed to get group list: %s", __func__,
> > + strerror(errno));
> > + return NULL;
> > + }
> > +
> > + /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */
> > + if (len > NGRPS)
> > + len = NGRPS;
> > +
> > +no_groups:
> > /* XXX: interface problem; those should all have been unsigned */
> > - return (authunix_create(machname, uid, gid, len, gids));
> > + auth = authunix_create(machname, uid, gid, len, gids);
> > + mem_free(gids, bufsize);
> > + return auth;
> > }
> >
> > /*
>
> This change to restrict the groups used to the first NGRPS
> groups is one that we have always avoided. It can be quite
> confusing to the user, to have an operation fail, but then
> to look and notice that the correct group is listed.
>
> Having the library abort seems odd and wrong, but this will
> also change semantics. Is there really a problem here,
> after all of these years, that must be addressed?
>

Yes, we had a bug report against nfs-utils where someone was seeing
umount.nfs core dump because root was in >NGRPS groups.

https://bugzilla.redhat.com/show_bug.cgi?id=565507

...as Chuck points out, this is basically what glibc has done for
years. While it may be confusing for users, I think it's more
preferable than having the programs fail outright.

What sort of behavior would you propose in its place?

--
Jeff Layton <[email protected]>

2010-02-16 19:49:46

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] libtirpc: handle large numbers of supplemental groups gracefully (try #2)

Jeff Layton wrote:
> On Tue, 16 Feb 2010 14:24:09 -0500
> Peter Staubach <[email protected]> wrote:
>
>> Jeff Layton wrote:
>>> This is the second attempt at this patch. The main changes are that this
>>> one doesn't set a floor value for the size of the group list. There are
>>> also a few minor cleanups and comments added.
>>>
>>> If authunix_create_default() is called by a user with more than 16
>>> supplimental groups, it'll abort(), which causes the program to crash
>>> and coredump.
>>>
>>> Fix it to handle this situation gracefully. Get the number of groups
>>> that the user has first, and then allocate a big enough buffer to hold
>>> them. Then, just don't let the lower function use more than the NGRPS
>>> groups.
>>>
>>> Also fix up the error handling in this function so that it just returns
>>> a NULL pointer on error and logs a message via warnx() instead of
>>> calling abort().
>>>
>>> Reported-by: Peter Engel <[email protected]>
>>> Signed-off-by: Jeff Layton <[email protected]>
>>> ---
>>> src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-----
>>> 1 files changed, 56 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/src/auth_unix.c b/src/auth_unix.c
>>> index 71ca15d..a295e71 100644
>>> --- a/src/auth_unix.c
>>> +++ b/src/auth_unix.c
>>> @@ -49,6 +49,7 @@
>>> #include <stdlib.h>
>>> #include <unistd.h>
>>> #include <string.h>
>>> +#include <errno.h>
>>>
>>> #include <rpc/types.h>
>>> #include <rpc/xdr.h>
>>> @@ -175,20 +176,69 @@ AUTH *
>>> authunix_create_default()
>>> {
>>> int len;
>>> + size_t bufsize = 0;
>>> char machname[MAXHOSTNAMELEN + 1];
>>> uid_t uid;
>>> gid_t gid;
>>> - gid_t gids[NGRPS];
>>> + gid_t *gids = NULL;
>>> + AUTH *auth;
>>> +
>>> + if (gethostname(machname, sizeof machname) == -1) {
>>> + warnx("%s: gethostname() failed: %s", __func__,
>>> + strerror(errno));
>>> + return NULL;
>>> + }
>>>
>>> - if (gethostname(machname, sizeof machname) == -1)
>>> - abort();
>>> machname[sizeof(machname) - 1] = 0;
>>> uid = geteuid();
>>> gid = getegid();
>>> - if ((len = getgroups(NGRPS, gids)) < 0)
>>> - abort();
>>> +
>>> +retry:
>>> + len = getgroups(0, NULL);
>>> + if (len < 0) {
>>> + warnx("%s: failed to get number of groups: %s", __func__,
>>> + strerror(errno));
>>> + return NULL;
>>> + }
>>> +
>>> + if (len == 0)
>>> + goto no_groups;
>>> +
>>> + bufsize = len * sizeof(gid_t);
>>> + gids = mem_alloc(bufsize);
>>> + if (gids == NULL) {
>>> + warnx("%s: memory allocation failed: %s", __func__,
>>> + strerror(ENOMEM));
>>> + return NULL;
>>> + }
>>> +
>>> + len = getgroups(len, gids);
>>> + if (len < 0) {
>>> + mem_free(gids, bufsize);
>>> + /*
>>> + * glibc equivalent routines mention that it's possible for
>>> + * the number of groups to change between two getgroups calls.
>>> + * If that happens, retry the whole thing again.
>>> + */
>>> + if (len == -EINVAL) {
>>> + gids = NULL;
>>> + bufsize = 0;
>>> + goto retry;
>>> + }
>>> + warnx("%s: failed to get group list: %s", __func__,
>>> + strerror(errno));
>>> + return NULL;
>>> + }
>>> +
>>> + /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */
>>> + if (len > NGRPS)
>>> + len = NGRPS;
>>> +
>>> +no_groups:
>>> /* XXX: interface problem; those should all have been unsigned */
>>> - return (authunix_create(machname, uid, gid, len, gids));
>>> + auth = authunix_create(machname, uid, gid, len, gids);
>>> + mem_free(gids, bufsize);
>>> + return auth;
>>> }
>>>
>>> /*
>> This change to restrict the groups used to the first NGRPS
>> groups is one that we have always avoided. It can be quite
>> confusing to the user, to have an operation fail, but then
>> to look and notice that the correct group is listed.
>>
>> Having the library abort seems odd and wrong, but this will
>> also change semantics. Is there really a problem here,
>> after all of these years, that must be addressed?
>>
>
> Yes, we had a bug report against nfs-utils where someone was seeing
> umount.nfs core dump because root was in >NGRPS groups.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=565507
>
> ...as Chuck points out, this is basically what glibc has done for
> years. While it may be confusing for users, I think it's more
> preferable than having the programs fail outright.
>
> What sort of behavior would you propose in its place?
>

Well, I do agree that a library routine should not call abort().
That's just pure and simple, wrong. That bug must be fixed.

However, making a policy decision for the user also seems
wrong. Just because glibc did something does not make it right.
It does tend to make it a standard, so the semantic change
argument becomes weaker, but this change was made to a different
RPC library, so the application must have/be changed to use it,
so all bets are off again.

My inclination would be to return NULL and see who complains.

ps

2010-02-16 20:12:27

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] libtirpc: handle large numbers of supplemental groups gracefully (try #2)

On 02/16/2010 02:49 PM, Peter Staubach wrote:
> Jeff Layton wrote:
>> On Tue, 16 Feb 2010 14:24:09 -0500
>> Peter Staubach<[email protected]> wrote:
>>
>>> Jeff Layton wrote:
>>>> This is the second attempt at this patch. The main changes are that this
>>>> one doesn't set a floor value for the size of the group list. There are
>>>> also a few minor cleanups and comments added.
>>>>
>>>> If authunix_create_default() is called by a user with more than 16
>>>> supplimental groups, it'll abort(), which causes the program to crash
>>>> and coredump.
>>>>
>>>> Fix it to handle this situation gracefully. Get the number of groups
>>>> that the user has first, and then allocate a big enough buffer to hold
>>>> them. Then, just don't let the lower function use more than the NGRPS
>>>> groups.
>>>>
>>>> Also fix up the error handling in this function so that it just returns
>>>> a NULL pointer on error and logs a message via warnx() instead of
>>>> calling abort().
>>>>
>>>> Reported-by: Peter Engel<[email protected]>
>>>> Signed-off-by: Jeff Layton<[email protected]>
>>>> ---
>>>> src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-----
>>>> 1 files changed, 56 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/src/auth_unix.c b/src/auth_unix.c
>>>> index 71ca15d..a295e71 100644
>>>> --- a/src/auth_unix.c
>>>> +++ b/src/auth_unix.c
>>>> @@ -49,6 +49,7 @@
>>>> #include<stdlib.h>
>>>> #include<unistd.h>
>>>> #include<string.h>
>>>> +#include<errno.h>
>>>>
>>>> #include<rpc/types.h>
>>>> #include<rpc/xdr.h>
>>>> @@ -175,20 +176,69 @@ AUTH *
>>>> authunix_create_default()
>>>> {
>>>> int len;
>>>> + size_t bufsize = 0;
>>>> char machname[MAXHOSTNAMELEN + 1];
>>>> uid_t uid;
>>>> gid_t gid;
>>>> - gid_t gids[NGRPS];
>>>> + gid_t *gids = NULL;
>>>> + AUTH *auth;
>>>> +
>>>> + if (gethostname(machname, sizeof machname) == -1) {
>>>> + warnx("%s: gethostname() failed: %s", __func__,
>>>> + strerror(errno));
>>>> + return NULL;
>>>> + }
>>>>
>>>> - if (gethostname(machname, sizeof machname) == -1)
>>>> - abort();
>>>> machname[sizeof(machname) - 1] = 0;
>>>> uid = geteuid();
>>>> gid = getegid();
>>>> - if ((len = getgroups(NGRPS, gids))< 0)
>>>> - abort();
>>>> +
>>>> +retry:
>>>> + len = getgroups(0, NULL);
>>>> + if (len< 0) {
>>>> + warnx("%s: failed to get number of groups: %s", __func__,
>>>> + strerror(errno));
>>>> + return NULL;
>>>> + }
>>>> +
>>>> + if (len == 0)
>>>> + goto no_groups;
>>>> +
>>>> + bufsize = len * sizeof(gid_t);
>>>> + gids = mem_alloc(bufsize);
>>>> + if (gids == NULL) {
>>>> + warnx("%s: memory allocation failed: %s", __func__,
>>>> + strerror(ENOMEM));
>>>> + return NULL;
>>>> + }
>>>> +
>>>> + len = getgroups(len, gids);
>>>> + if (len< 0) {
>>>> + mem_free(gids, bufsize);
>>>> + /*
>>>> + * glibc equivalent routines mention that it's possible for
>>>> + * the number of groups to change between two getgroups calls.
>>>> + * If that happens, retry the whole thing again.
>>>> + */
>>>> + if (len == -EINVAL) {
>>>> + gids = NULL;
>>>> + bufsize = 0;
>>>> + goto retry;
>>>> + }
>>>> + warnx("%s: failed to get group list: %s", __func__,
>>>> + strerror(errno));
>>>> + return NULL;
>>>> + }
>>>> +
>>>> + /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */
>>>> + if (len> NGRPS)
>>>> + len = NGRPS;
>>>> +
>>>> +no_groups:
>>>> /* XXX: interface problem; those should all have been unsigned */
>>>> - return (authunix_create(machname, uid, gid, len, gids));
>>>> + auth = authunix_create(machname, uid, gid, len, gids);
>>>> + mem_free(gids, bufsize);
>>>> + return auth;
>>>> }
>>>>
>>>> /*
>>> This change to restrict the groups used to the first NGRPS
>>> groups is one that we have always avoided. It can be quite
>>> confusing to the user, to have an operation fail, but then
>>> to look and notice that the correct group is listed.
>>>
>>> Having the library abort seems odd and wrong, but this will
>>> also change semantics. Is there really a problem here,
>>> after all of these years, that must be addressed?
>>>
>>
>> Yes, we had a bug report against nfs-utils where someone was seeing
>> umount.nfs core dump because root was in>NGRPS groups.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=565507
>>
>> ...as Chuck points out, this is basically what glibc has done for
>> years. While it may be confusing for users, I think it's more
>> preferable than having the programs fail outright.
>>
>> What sort of behavior would you propose in its place?
>>
>
> Well, I do agree that a library routine should not call abort().
> That's just pure and simple, wrong. That bug must be fixed.
>
> However, making a policy decision for the user also seems
> wrong. Just because glibc did something does not make it right.

I'm not going to argue with that :-)

> It does tend to make it a standard, so the semantic change
> argument becomes weaker, but this change was made to a different
> RPC library, so the application must have/be changed to use it,
> so all bets are off again.

Not precisely true here. authunix_create_default() is, I believe, also
part of the legacy RPC interface (TS-RPC). Thus any application that
uses only the legacy RPC calls and is relinked with libtirpc can use
this interface, and may expect it to behave as before.

> My inclination would be to return NULL and see who complains.

Yeah, unfortunately many of authunix_create_default's callers don't
check the return value.

In the umount.nfs and showmount cases, what would be the appropriate
recovery action for the application, if authunix_create_default()
returns NULL?

--
chuck[dot]lever[at]oracle[dot]com

2010-02-16 20:14:35

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] libtirpc: handle large numbers of supplemental groups gracefully (try #2)

On Tue, 16 Feb 2010 14:49:33 -0500
Peter Staubach <[email protected]> wrote:

> Jeff Layton wrote:
> > On Tue, 16 Feb 2010 14:24:09 -0500
> > Peter Staubach <[email protected]> wrote:
> >
> >> Jeff Layton wrote:
> >>> This is the second attempt at this patch. The main changes are that this
> >>> one doesn't set a floor value for the size of the group list. There are
> >>> also a few minor cleanups and comments added.
> >>>
> >>> If authunix_create_default() is called by a user with more than 16
> >>> supplimental groups, it'll abort(), which causes the program to crash
> >>> and coredump.
> >>>
> >>> Fix it to handle this situation gracefully. Get the number of groups
> >>> that the user has first, and then allocate a big enough buffer to hold
> >>> them. Then, just don't let the lower function use more than the NGRPS
> >>> groups.
> >>>
> >>> Also fix up the error handling in this function so that it just returns
> >>> a NULL pointer on error and logs a message via warnx() instead of
> >>> calling abort().
> >>>
> >>> Reported-by: Peter Engel <[email protected]>
> >>> Signed-off-by: Jeff Layton <[email protected]>
> >>> ---
> >>> src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-----
> >>> 1 files changed, 56 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/src/auth_unix.c b/src/auth_unix.c
> >>> index 71ca15d..a295e71 100644
> >>> --- a/src/auth_unix.c
> >>> +++ b/src/auth_unix.c
> >>> @@ -49,6 +49,7 @@
> >>> #include <stdlib.h>
> >>> #include <unistd.h>
> >>> #include <string.h>
> >>> +#include <errno.h>
> >>>
> >>> #include <rpc/types.h>
> >>> #include <rpc/xdr.h>
> >>> @@ -175,20 +176,69 @@ AUTH *
> >>> authunix_create_default()
> >>> {
> >>> int len;
> >>> + size_t bufsize = 0;
> >>> char machname[MAXHOSTNAMELEN + 1];
> >>> uid_t uid;
> >>> gid_t gid;
> >>> - gid_t gids[NGRPS];
> >>> + gid_t *gids = NULL;
> >>> + AUTH *auth;
> >>> +
> >>> + if (gethostname(machname, sizeof machname) == -1) {
> >>> + warnx("%s: gethostname() failed: %s", __func__,
> >>> + strerror(errno));
> >>> + return NULL;
> >>> + }
> >>>
> >>> - if (gethostname(machname, sizeof machname) == -1)
> >>> - abort();
> >>> machname[sizeof(machname) - 1] = 0;
> >>> uid = geteuid();
> >>> gid = getegid();
> >>> - if ((len = getgroups(NGRPS, gids)) < 0)
> >>> - abort();
> >>> +
> >>> +retry:
> >>> + len = getgroups(0, NULL);
> >>> + if (len < 0) {
> >>> + warnx("%s: failed to get number of groups: %s", __func__,
> >>> + strerror(errno));
> >>> + return NULL;
> >>> + }
> >>> +
> >>> + if (len == 0)
> >>> + goto no_groups;
> >>> +
> >>> + bufsize = len * sizeof(gid_t);
> >>> + gids = mem_alloc(bufsize);
> >>> + if (gids == NULL) {
> >>> + warnx("%s: memory allocation failed: %s", __func__,
> >>> + strerror(ENOMEM));
> >>> + return NULL;
> >>> + }
> >>> +
> >>> + len = getgroups(len, gids);
> >>> + if (len < 0) {
> >>> + mem_free(gids, bufsize);
> >>> + /*
> >>> + * glibc equivalent routines mention that it's possible for
> >>> + * the number of groups to change between two getgroups calls.
> >>> + * If that happens, retry the whole thing again.
> >>> + */
> >>> + if (len == -EINVAL) {
> >>> + gids = NULL;
> >>> + bufsize = 0;
> >>> + goto retry;
> >>> + }
> >>> + warnx("%s: failed to get group list: %s", __func__,
> >>> + strerror(errno));
> >>> + return NULL;
> >>> + }
> >>> +
> >>> + /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */
> >>> + if (len > NGRPS)
> >>> + len = NGRPS;
> >>> +
> >>> +no_groups:
> >>> /* XXX: interface problem; those should all have been unsigned */
> >>> - return (authunix_create(machname, uid, gid, len, gids));
> >>> + auth = authunix_create(machname, uid, gid, len, gids);
> >>> + mem_free(gids, bufsize);
> >>> + return auth;
> >>> }
> >>>
> >>> /*
> >> This change to restrict the groups used to the first NGRPS
> >> groups is one that we have always avoided. It can be quite
> >> confusing to the user, to have an operation fail, but then
> >> to look and notice that the correct group is listed.
> >>
> >> Having the library abort seems odd and wrong, but this will
> >> also change semantics. Is there really a problem here,
> >> after all of these years, that must be addressed?
> >>
> >
> > Yes, we had a bug report against nfs-utils where someone was seeing
> > umount.nfs core dump because root was in >NGRPS groups.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=565507
> >
> > ...as Chuck points out, this is basically what glibc has done for
> > years. While it may be confusing for users, I think it's more
> > preferable than having the programs fail outright.
> >
> > What sort of behavior would you propose in its place?
> >
>
> Well, I do agree that a library routine should not call abort().
> That's just pure and simple, wrong. That bug must be fixed.
>
> However, making a policy decision for the user also seems
> wrong. Just because glibc did something does not make it right.
> It does tend to make it a standard, so the semantic change
> argument becomes weaker, but this change was made to a different
> RPC library, so the application must have/be changed to use it,
> so all bets are off again.
>
> My inclination would be to return NULL and see who complains.
>

That'll pretty much make authunix_create_default() worthless. No
program will ever be able to reasonably use it since it'll also almost
certainly need to deal with the case where the user has >16 groups. At
that point, you might as well just call authunix_create and just do
everything yourself.

I sort of see this function as a convenience wrapper where we need to
use AUTH_UNIX, but don't care much about what groups end up in the
list. For people who want to micromanage the group list, there's always
authunix_create...

--
Jeff Layton <[email protected]>