2022-03-03 06:13:35

by NeilBrown

[permalink] [raw]
Subject: NFS access problems when group membership changes on server


Since
Commit 57b691819ee2 ("NFS: Cache access checks more aggressively")
we do not recheck "ACCESS" tests unless the inode changes (or falls out
of cache). I recently discovered that this can be problematic.

The ACCESS test checks if a given user has access to a given file.
That is most likely to change if the file changes, but it can change if
the users capabilities change - e.g. their group memberships change.
With AUTH_SYS the group membership as seen on the client is used (though
with the Linux NFS server, adding the -g option to mountd can change
this).
With AUTH_GSS, the mapping from user to groups must happen on the
server. IF this mapping changes it might be invisible to the client
for an arbitrary long time.

I don't think this affects files with NFSv4 as there will be a OPEN
request to the server, but it does affect directories.

If a user does "ls -l some-directory", this fails, and they go ask the
server admin "please add me to the group to access the directory", then
they go back and try again, it may well still not work.

With a local filesystem, logging off and back on again might be
required to refresh the groups list, but even this isn't sufficient for
NFS.

What to do?

We could simply revert that patch and refresh the access cache similar
to refreshing of other attributes. We could possibly do it with a
longer timeout or only with a mount option, but I don't really like
those options.

Maybe we could add a 'struct pid *' to the access entry which references
the PIDTYPE_SID of the calling process. This would be different for
different login sessions, but common throughout one login.
If we did this, then logging out and back in again would resolve the
access problem, which matches that is needed for local access.
I'm not sure if I like this idea or not, but I thought it worth
mentioning.

Any other ideas?

Thanks,
NeilBrown


2022-03-03 19:40:25

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS access problems when group membership changes on server

On Thu, 2022-03-03 at 16:29 +1100, NeilBrown wrote:
>
> Since
>  Commit 57b691819ee2 ("NFS: Cache access checks more aggressively")
> we do not recheck "ACCESS" tests unless the inode changes (or falls
> out
> of cache).  I recently discovered that this can be problematic.
>
> The ACCESS test checks if a given user has access to a given file.
> That is most likely to change if the file changes, but it can change
> if
> the users capabilities change  - e.g. their group memberships change.
> With AUTH_SYS the group membership as seen on the client is used
> (though
> with the Linux NFS server, adding the -g option to mountd can change
> this).
> With AUTH_GSS, the mapping from user to groups must happen on the
> server.  IF this mapping changes it might be invisible to the client
> for an arbitrary long time.
>
> I don't think this affects files with NFSv4 as there will be a OPEN
> request to the server, but it does affect directories.
>
> If a user does "ls -l some-directory", this fails, and they go ask
> the
> server admin "please add me to the group to access the directory",
> then
> they go back and try again, it may well still not work.
>
> With a local filesystem, logging off and back on again might be
> required to refresh the groups list, but even this isn't sufficient
> for
> NFS.
>
> What to do?
>
> We could simply revert that patch and refresh the access cache
> similar
> to refreshing of other attributes.  We could possibly do it with a
> longer timeout or only with a mount option, but I don't really like
> those options.
>
> Maybe we could add a 'struct pid *' to the access entry which
> references
> the PIDTYPE_SID of the calling process.  This would be different for
> different login sessions, but common throughout one login.
> If we did this, then logging out and back in again would resolve the
> access problem, which matches that is needed for local access.
> I'm not sure if I like this idea or not, but I thought it worth
> mentioning.
>

Actually, most of the newer common identity mapping schemes, including
Active Directory and FreeIPA will store the group membership in a PAC,
which is attached to the ticket on the client. So the mapping still
happens on the server, but it uses information from the GSS session
instead of doing another round trip to the identity server.

That means the group membership as understood by the server is tied to
your AUTH_GSS session and the ticket that created it. To change it, you
may need to invalidate the AUTH_GSS session, and change your kerberos
ticket through a new "kinit".

>

> Any other ideas?
>

As I see it, the cache needs to be tied to the GSS ticket so that any
given entry can be invalidated when the ticket is. That's a scheme that
also works just fine for AUTH_SYS, and would exactly preserve the POSIX
semantics whereby we update the group membership on login.

One suggestion for doing so might be to allow the cache entries to
subscribe to the RPC cred that was used to create them in a way that
doesn't prevent the cred from being kicked out of the cache (i.e. we
don't take a reference). If the cred is no longer in the cred cache
when we check the access cache, then consider the access entry to need
refreshing/revalidation.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-03-04 09:06:42

by NeilBrown

[permalink] [raw]
Subject: Re: NFS access problems when group membership changes on server

On Fri, 04 Mar 2022, Trond Myklebust wrote:
> On Thu, 2022-03-03 at 16:29 +1100, NeilBrown wrote:
> >
> > Since
> >  Commit 57b691819ee2 ("NFS: Cache access checks more aggressively")
> > we do not recheck "ACCESS" tests unless the inode changes (or falls
> > out
> > of cache).  I recently discovered that this can be problematic.
> >
> > The ACCESS test checks if a given user has access to a given file.
> > That is most likely to change if the file changes, but it can change
> > if
> > the users capabilities change  - e.g. their group memberships change.
> > With AUTH_SYS the group membership as seen on the client is used
> > (though
> > with the Linux NFS server, adding the -g option to mountd can change
> > this).
> > With AUTH_GSS, the mapping from user to groups must happen on the
> > server.  IF this mapping changes it might be invisible to the client
> > for an arbitrary long time.
> >
> > I don't think this affects files with NFSv4 as there will be a OPEN
> > request to the server, but it does affect directories.
> >
> > If a user does "ls -l some-directory", this fails, and they go ask
> > the
> > server admin "please add me to the group to access the directory",
> > then
> > they go back and try again, it may well still not work.
> >
> > With a local filesystem, logging off and back on again might be
> > required to refresh the groups list, but even this isn't sufficient
> > for
> > NFS.
> >
> > What to do?
> >
> > We could simply revert that patch and refresh the access cache
> > similar
> > to refreshing of other attributes.  We could possibly do it with a
> > longer timeout or only with a mount option, but I don't really like
> > those options.
> >
> > Maybe we could add a 'struct pid *' to the access entry which
> > references
> > the PIDTYPE_SID of the calling process.  This would be different for
> > different login sessions, but common throughout one login.
> > If we did this, then logging out and back in again would resolve the
> > access problem, which matches that is needed for local access.
> > I'm not sure if I like this idea or not, but I thought it worth
> > mentioning.
> >
>
> Actually, most of the newer common identity mapping schemes, including
> Active Directory and FreeIPA will store the group membership in a PAC,
> which is attached to the ticket on the client. So the mapping still
> happens on the server, but it uses information from the GSS session
> instead of doing another round trip to the identity server.
>
> That means the group membership as understood by the server is tied to
> your AUTH_GSS session and the ticket that created it. To change it, you
> may need to invalidate the AUTH_GSS session, and change your kerberos
> ticket through a new "kinit".

Interesting .... so it might be possible that logging of and back on
again could already work for the customer. I'll check.
Though I don't even know if they are using GSS. All I know is they did
"something" on the NetApp server and didn't get the access they expected.

>
> >
>
> > Any other ideas?
> >
>
> As I see it, the cache needs to be tied to the GSS ticket so that any
> given entry can be invalidated when the ticket is. That's a scheme that
> also works just fine for AUTH_SYS, and would exactly preserve the POSIX
> semantics whereby we update the group membership on login.
>
> One suggestion for doing so might be to allow the cache entries to
> subscribe to the RPC cred that was used to create them in a way that
> doesn't prevent the cred from being kicked out of the cache (i.e. we
> don't take a reference). If the cred is no longer in the cred cache
> when we check the access cache, then consider the access entry to need
> refreshing/revalidation.

Holding a cred without taking a reference would mean it could get freed,
unless we added some new "weak reference" to creds. I wonder if we can
really justify that.

But maybe we could require the group_info pointer to match. They are
refcounted and shared with the cred, but you get a new one when you log
in.
You get a group info even if the list of groups is empty.
So that would track closely to the login credential.
What would you think of something like this (once it had been tested etc
of course).

Thanks,
NeilBrown


diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 75cb1cbe4cde..5852c67a3bd4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2737,22 +2737,9 @@ static int access_cmp(const struct cred *a, const struct nfs_access_entry *b)
gb = b->group_info;
if (ga == gb)
return 0;
- if (ga == NULL)
+ if (ga < gb)
return -1;
- if (gb == NULL)
- return 1;
- if (ga->ngroups < gb->ngroups)
- return -1;
- if (ga->ngroups > gb->ngroups)
- return 1;
-
- for (g = 0; g < ga->ngroups; g++) {
- if (gid_lt(ga->gid[g], gb->gid[g]))
- return -1;
- if (gid_gt(ga->gid[g], gb->gid[g]))
- return 1;
- }
- return 0;
+ return 1;
}

static struct nfs_access_entry *nfs_access_search_rbtree(struct inode *inode, const struct cred *cred)