From: Jeff Layton Subject: Re: [PATCH] libtirpc: handle large numbers of supplemental groups gracefully (try #2) Date: Tue, 16 Feb 2010 14:33:07 -0500 Message-ID: <20100216143307.45a7ed42@tlielax.poochiereds.net> References: <1266347817-11186-1-git-send-email-jlayton@redhat.com> <4B7AF0D9.4030208@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Cc: steved@redhat.com, chuck.lever@oracle.com, libtirpc-devel@lists.sourceforge.net, linux-nfs@vger.kernel.org To: Peter Staubach Return-path: Received: from mx1.redhat.com ([209.132.183.28]:42405 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932872Ab0BPTdY (ORCPT ); Tue, 16 Feb 2010 14:33:24 -0500 In-Reply-To: <4B7AF0D9.4030208@redhat.com> Sender: linux-nfs-owner@vger.kernel.org List-ID: On Tue, 16 Feb 2010 14:24:09 -0500 Peter Staubach wrote: > Jeff Layton wrote: > > This is the second attempt at this patch. The main changes are that this > > one doesn't set a floor value for the size of the group list. There are > > also a few minor cleanups and comments added. > > > > If authunix_create_default() is called by a user with more than 16 > > supplimental groups, it'll abort(), which causes the program to crash > > and coredump. > > > > Fix it to handle this situation gracefully. Get the number of groups > > that the user has first, and then allocate a big enough buffer to hold > > them. Then, just don't let the lower function use more than the NGRPS > > groups. > > > > Also fix up the error handling in this function so that it just returns > > a NULL pointer on error and logs a message via warnx() instead of > > calling abort(). > > > > Reported-by: Peter Engel > > Signed-off-by: Jeff Layton > > --- > > src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++----- > > 1 files changed, 56 insertions(+), 6 deletions(-) > > > > diff --git a/src/auth_unix.c b/src/auth_unix.c > > index 71ca15d..a295e71 100644 > > --- a/src/auth_unix.c > > +++ b/src/auth_unix.c > > @@ -49,6 +49,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -175,20 +176,69 @@ AUTH * > > authunix_create_default() > > { > > int len; > > + size_t bufsize = 0; > > char machname[MAXHOSTNAMELEN + 1]; > > uid_t uid; > > gid_t gid; > > - gid_t gids[NGRPS]; > > + gid_t *gids = NULL; > > + AUTH *auth; > > + > > + if (gethostname(machname, sizeof machname) == -1) { > > + warnx("%s: gethostname() failed: %s", __func__, > > + strerror(errno)); > > + return NULL; > > + } > > > > - if (gethostname(machname, sizeof machname) == -1) > > - abort(); > > machname[sizeof(machname) - 1] = 0; > > uid = geteuid(); > > gid = getegid(); > > - if ((len = getgroups(NGRPS, gids)) < 0) > > - abort(); > > + > > +retry: > > + len = getgroups(0, NULL); > > + if (len < 0) { > > + warnx("%s: failed to get number of groups: %s", __func__, > > + strerror(errno)); > > + return NULL; > > + } > > + > > + if (len == 0) > > + goto no_groups; > > + > > + bufsize = len * sizeof(gid_t); > > + gids = mem_alloc(bufsize); > > + if (gids == NULL) { > > + warnx("%s: memory allocation failed: %s", __func__, > > + strerror(ENOMEM)); > > + return NULL; > > + } > > + > > + len = getgroups(len, gids); > > + if (len < 0) { > > + mem_free(gids, bufsize); > > + /* > > + * glibc equivalent routines mention that it's possible for > > + * the number of groups to change between two getgroups calls. > > + * If that happens, retry the whole thing again. > > + */ > > + if (len == -EINVAL) { > > + gids = NULL; > > + bufsize = 0; > > + goto retry; > > + } > > + warnx("%s: failed to get group list: %s", __func__, > > + strerror(errno)); > > + return NULL; > > + } > > + > > + /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */ > > + if (len > NGRPS) > > + len = NGRPS; > > + > > +no_groups: > > /* XXX: interface problem; those should all have been unsigned */ > > - return (authunix_create(machname, uid, gid, len, gids)); > > + auth = authunix_create(machname, uid, gid, len, gids); > > + mem_free(gids, bufsize); > > + return auth; > > } > > > > /* > > This change to restrict the groups used to the first NGRPS > groups is one that we have always avoided. It can be quite > confusing to the user, to have an operation fail, but then > to look and notice that the correct group is listed. > > Having the library abort seems odd and wrong, but this will > also change semantics. Is there really a problem here, > after all of these years, that must be addressed? > Yes, we had a bug report against nfs-utils where someone was seeing umount.nfs core dump because root was in >NGRPS groups. https://bugzilla.redhat.com/show_bug.cgi?id=565507 ...as Chuck points out, this is basically what glibc has done for years. While it may be confusing for users, I think it's more preferable than having the programs fail outright. What sort of behavior would you propose in its place? -- Jeff Layton