Since commit 57b691819ee2 ("NFS: Cache access checks more aggressively")
(Linux 4.8), NFS has cached the results of ACCESS indefinitely while the
inode isn't changing.
This is often a good choice, but doesn't take into account the
possibility that changes outside of the inode can change effective
permissions.
Depending on configuration, some servers can map the user provided in
the RPC credential to a group list at the time of each request. If the
group list for a user is changed, the result of ACCESS can change.
This is particularly a problem when extra permissions are given on the
server. The client may make decisions based on outdated ACCESS results
and not even try operations which would in fact succeed.
These two patches change the ACCESS cache so that when the cache grants
an access, that is trusted indefinitely just as it currently does.
However when the cache denies an access, that is only trusted if the
cached data is less than the attribute cache minimum timeout
(acregmin/acdirmin) old. Otherwise a new ACCESS request is made.
This allows additions to group membership to become effective with
only a modest delay.
The second patch contains even more explanatory detail.
Thanks,
NeilBrown
---
NeilBrown (2):
NFS: change nfs_access_get_cached() to nfs_access_check_cached()
NFS: limit use of ACCESS cache for negative responses
fs/nfs/dir.c | 80 +++++++++++++++++++++++++-----------------
fs/nfs/nfs4proc.c | 25 ++++++-------
include/linux/nfs_fs.h | 5 +--
3 files changed, 61 insertions(+), 49 deletions(-)
--
Signature
NFS currently caches the results of ACCESS lookups indefinitely while the
inode doesn't change (i.e. while ctime, changeid, owner/group/mode etc
don't change). This is a problem if the result from the server might
change.
When the result from the server depends purely on the credentials
provided by the client and the information in the inode, it is not
expected that the result from the server will change, and the current
behaviour is safe.
However in some configurations the server can include another lookup
step. This happens with the Linux NFS server when the "--manage-gids"
option is given to rpc.mountd. NetApp servers have similar
"AUTH_SYS Extended Groups" functionality in ONTAP9. With these,
the user reported by the client is mapped on the server to a list of
group ids. If this mapping changes, the result of ACCESS can change.
This is particularly a problem when a new group is added to a user. If
they had already tried and failed to access the file (or more commonly a
directory) then adding them to the group will not successfully give them
access as the failure is cached. Even if the user logs out and back in
again to get the new credential on the client, they might try to access
the file before the server is aware of the change. By default the Linux
server caches group information for 30 minutes. This can be reduced but
there will always be a window after the group has been added when the
server can still report ACCESS based on old group information.
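For reference, the server-side mapping step described above is enabled on
the Linux server either via rpc.mountd's --manage-gids command-line option
or, with recent nfs-utils, in /etc/nfs.conf. This is a sketch of the
relevant fragment; check your nfs-utils documentation for the exact
section and key names supported by your version:

```
# /etc/nfs.conf (nfs-utils): have mountd look up the group list for the
# uid in each request from the server's user database, instead of
# trusting the (16-group-limited) AUTH_SYS credential from the client.
[mountd]
manage-gids = y
```

On typical kernels the server's uid-to-gid cache backing this feature can
be flushed early (rather than waiting out the default 30-minute expiry)
by writing the current time in seconds to
/proc/net/rpc/auth.unix.gid/flush.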
The inverse is less of a problem. Removing someone from a group has
never been a guaranteed way to remove any access - at the very least a
user normally needs to log off before they lose access to any groups that
they were a member of.
If a user is removed from a group they may still be granted "access" on
the client, but this isn't sufficient to change anything for which write
access has been removed, and normally only provides read access to
cached content. So caching positive ACCESS rights indefinitely is not a
significant security problem.
The value of the ACCESS cache is realised primarily for successful
tests. These happen often, for example the test for X permissions
during filename lookups. Having a quick (even lock-free) result helps
this common operation. When the ACCESS cache denies the access there is
less cost in taking longer to confirm the access, and this is the case
where a stale cache is more problematic.
So, this patch changes the way that the access cache is used.
- If the requested access is found in the cache, and is granted, then the
call uses the cached information no matter how old it is.
- If the requested access is found in the cache and is denied, then the
cached result is only used if it is newer than the current MINATTRTIMEO
for the file.
- If the requested access is found in the cache, is denied, and is more
than MINATTRTIMEO old, then a new ACCESS request is made to the server.
- If the requested access is NOT found in the cache, obviously a new
ACCESS request is made to the server, and this will be cached.
Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/dir.c | 24 +++++++++++++++++++-----
fs/nfs/nfs4proc.c | 1 +
include/linux/nfs_fs.h | 1 +
3 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index b4e2c6a34234..c461029515b5 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2900,7 +2900,8 @@ static struct nfs_access_entry *nfs_access_search_rbtree(struct inode *inode, co
 	return NULL;
 }
 
-static int nfs_access_get_cached_locked(struct inode *inode, const struct cred *cred, u32 *mask, bool may_block)
+static int nfs_access_get_cached_locked(struct inode *inode, const struct cred *cred,
+					u32 *mask, unsigned long *cjiffies, bool may_block)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
 	struct nfs_access_entry *cache;
@@ -2931,6 +2932,7 @@ static int nfs_access_get_cached_locked(struct inode *inode, const struct cred *
 		retry = false;
 	}
 	*mask = cache->mask;
+	*cjiffies = cache->jiffies;
 	list_move_tail(&cache->lru, &nfsi->access_cache_entry_lru);
 	err = 0;
 out:
@@ -2942,7 +2944,8 @@ static int nfs_access_get_cached_locked(struct inode *inode, const struct cred *
 	return -ENOENT;
 }
 
-static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cred, u32 *mask)
+static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cred,
+				     u32 *mask, unsigned long *cjiffies)
 {
 	/* Only check the most recently returned cache entry,
 	 * but do it without locking.
@@ -2965,6 +2968,7 @@ static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cre
 	if (nfs_check_cache_invalid(inode, NFS_INO_INVALID_ACCESS))
 		goto out;
 	*mask = cache->mask;
+	*cjiffies = cache->jiffies;
 	err = 0;
 out:
 	rcu_read_unlock();
@@ -2974,16 +2978,24 @@ static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cre
 int nfs_access_check_cached(struct inode *inode, const struct cred *cred,
 			    u32 want, bool may_block)
 {
+	unsigned long cjiffies;
 	u32 have;
 	int status;
 
-	status = nfs_access_get_cached_rcu(inode, cred, &have);
+	status = nfs_access_get_cached_rcu(inode, cred, &have, &cjiffies);
 	if (status != 0)
 		status = nfs_access_get_cached_locked(inode, cred, &have,
-						      may_block);
+						      &cjiffies, may_block);
 
-	if (status == 0 && (want & ~have) != 0)
+	if (status == 0 && (want & ~have) != 0) {
 		status = -EACCES;
+		/* Don't trust a negative result beyond MINATTRTIMEO,
+		 * even if we hold a delegation
+		 */
+		if (!time_in_range_open(jiffies, cjiffies,
+					cjiffies + NFS_MINATTRTIMEO(inode)))
+			status = -ENOENT;
+	}
 	return status;
 }
 EXPORT_SYMBOL_GPL(nfs_access_check_cached);
@@ -3036,6 +3048,7 @@ void nfs_access_add_cache(struct inode *inode, struct nfs_access_entry *set,
 	cache->fsgid = cred->fsgid;
 	cache->group_info = get_group_info(cred->group_info);
 	cache->mask = set->mask;
+	cache->jiffies = set->jiffies;
 
 	/* The above field assignments must be visible
 	 * before this item appears on the lru. We cannot easily
@@ -3121,6 +3134,7 @@ static int nfs_do_access(struct inode *inode, const struct cred *cred, int mask)
 	 */
 	cache.mask = NFS_ACCESS_READ | NFS_ACCESS_MODIFY | NFS_ACCESS_EXTEND |
 		     nfs_access_xattr_mask(NFS_SERVER(inode));
+	cache.jiffies = jiffies;
 	if (S_ISDIR(inode->i_mode))
 		cache.mask |= NFS_ACCESS_DELETE | NFS_ACCESS_LOOKUP;
 	else
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1e80d88df588..a78b62c5bad1 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2644,6 +2644,7 @@ static int nfs4_opendata_access(const struct cred *cred,
 		mask = NFS4_ACCESS_READ;
 
 	nfs_access_set_mask(&cache, opendata->o_res.access_result);
+	cache.jiffies = jiffies;
 	nfs_access_add_cache(state->inode, &cache, cred);
 
 	flags = NFS4_ACCESS_READ | NFS4_ACCESS_EXECUTE | NFS4_ACCESS_LOOKUP;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 7614f621c42d..e40bd61c80c5 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -56,6 +56,7 @@
 struct nfs_access_entry {
 	struct rb_node		rb_node;
 	struct list_head	lru;
+	unsigned long		jiffies;
 	kuid_t			fsuid;
 	kgid_t			fsgid;
 	struct group_info	*group_info;
Hi,
any thoughts on these patches?
Thanks,
NeilBrown
On Thu, 28 Apr 2022, NeilBrown wrote:
> Since Commit 57b691819ee2 ("NFS: Cache access checks more aggressively")
> (Linux 4.8) NFS has cached the results of ACCESS indefinitely while the
> inode isn't changing.
>
> This is often a good choice, but doesn't take into account the
> possibility that changes out side of the inode can change effective
> permissions.
>
> Depending on configuration, some servers can map the user provided in
> the RPC credential to a group list at time of request. If the group
> list for a user is changed, the result of ACCESS can change.
>
> This is particularly a problem when extra permissions are given on the
> server. The client may make decisions based on outdated ACCESS results
> and not even try operations which would in fact succeed.
>
> These two patches change the ACCESS cache so that when the cache grants
> an access, that is trusted indefinitely just as it currently does.
> However when the cache denies an access, that is only trusted if the
> cached data is less than acmin seconds old. Otherwise a new ACCESS
> request is made.
>
> This allows additions to group membership to become effective with
> only a modest delay.
>
> The second patch contains even more explanatory detail.
>
> Thanks,
> NeilBrown
>
> ---
>
> NeilBrown (2):
> NFS: change nfs_access_get_cached() to nfs_access_check_cached()
> NFS: limit use of ACCESS cache for negative responses
>
>
> fs/nfs/dir.c | 80 +++++++++++++++++++++++++-----------------
> fs/nfs/nfs4proc.c | 25 ++++++-------
> include/linux/nfs_fs.h | 5 +--
> 3 files changed, 61 insertions(+), 49 deletions(-)
>
> --
> Signature
>
>
On Tue, 2022-05-17 at 10:05 +1000, NeilBrown wrote:
>
> Hi,
> any thoughts on these patches?
To me, this problem is simply not worth breaking hot path functionality
for. If the credential changes on the server, but not on the client (so
that the cred cache comparison still matches), then why do we care?
IOW: I'm a NACK until convinced otherwise.
>
> Thanks,
> NeilBrown
>
>
> On Thu, 28 Apr 2022, NeilBrown wrote:
> > Since Commit 57b691819ee2 ("NFS: Cache access checks more
> > aggressively")
> > (Linux 4.8) NFS has cached the results of ACCESS indefinitely while
> > the
> > inode isn't changing.
> >
> > This is often a good choice, but doesn't take into account the
> > possibility that changes out side of the inode can change effective
> > permissions.
> >
> > Depending on configuration, some servers can map the user provided
> > in
> > the RPC credential to a group list at time of request. If the
> > group
> > list for a user is changed, the result of ACCESS can change.
> >
> > This is particularly a problem when extra permissions are given on
> > the
> > server. The client may make decisions based on outdated ACCESS
> > results
> > and not even try operations which would in fact succeed.
> >
> > These two patches change the ACCESS cache so that when the cache
> > grants
> > an access, that is trusted indefinitely just as it currently does.
> > However when the cache denies an access, that is only trusted if
> > the
> > cached data is less than acmin seconds old. Otherwise a new ACCESS
> > request is made.
> >
> > This allows additions to group membership to become effective with
> > only a modest delay.
> >
> > The second patch contains even more explanatory detail.
> >
> > Thanks,
> > NeilBrown
> >
> > ---
> >
> > NeilBrown (2):
> > NFS: change nfs_access_get_cached() to
> > nfs_access_check_cached()
> > NFS: limit use of ACCESS cache for negative responses
> >
> >
> > fs/nfs/dir.c | 80 +++++++++++++++++++++++++-------------
> > ----
> > fs/nfs/nfs4proc.c | 25 ++++++-------
> > include/linux/nfs_fs.h | 5 +--
> > 3 files changed, 61 insertions(+), 49 deletions(-)
> >
> > --
> > Signature
> >
> >
--
Trond Myklebust
CTO, Hammerspace Inc
4984 El Camino Real, Suite 208
Los Altos, CA 94022
http://www.hammer.space
On Tue, 17 May 2022, Trond Myklebust wrote:
> On Tue, 2022-05-17 at 10:05 +1000, NeilBrown wrote:
> >
> > Hi,
> > any thoughts on these patches?
>
>
> To me, this problem is simply not worth breaking hot path functionality
> for. If the credential changes on the server, but not on the client (so
> that the cred cache comparison still matches), then why do we care?
>
> IOW: I'm a NACK until convinced otherwise.
In what way is the hot path broken? It only affects a permission test
failure. Why is that considered "hot path"??
RFC 1813 says - for NFSv3 at least -
The information returned by the server in response to an
ACCESS call is not permanent. It was correct at the exact
time that the server performed the checks, but not
necessarily afterwards. The server can revoke access
permission at any time.
Clearly the server can allow access at any time.
This seems to argue against caching - or at least against relying too
heavily on the cache.
RFC 8881 has the same text for NFSv4.1 - section 18.1.4
"why do we care?" Because the server has changed to grant access, but
the client is ignoring the possibility of that change - so the user is
prevented from doing work.
NeilBrown
On Tue, 2022-05-17 at 11:22 +1000, NeilBrown wrote:
> On Tue, 17 May 2022, Trond Myklebust wrote:
> > On Tue, 2022-05-17 at 11:05 +1000, NeilBrown wrote:
> > > On Tue, 17 May 2022, Trond Myklebust wrote:
> > > > On Tue, 2022-05-17 at 10:40 +1000, NeilBrown wrote:
> > > > > On Tue, 17 May 2022, Trond Myklebust wrote:
> > > > > > On Tue, 2022-05-17 at 10:05 +1000, NeilBrown wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > > any thoughts on these patches?
> > > > > >
> > > > > >
> > > > > > To me, this problem is simply not worth breaking hot path
> > > > > > functionality
> > > > > > for. If the credential changes on the server, but not on
> > > > > > the
> > > > > > client
> > > > > > (so
> > > > > > that the cred cache comparison still matches), then why do
> > > > > > we
> > > > > > care?
> > > > > >
> > > > > > IOW: I'm a NACK until convinced otherwise.
> > > > >
> > > > > In what way is the hot path broken? It only affect a
> > > > > permission
> > > > > test
> > > > > failure. Why is that considered "hot path"??
> > > >
> > > > It is a permission test that is critical for caching path
> > > > resolution on
> > > > a per-user basis.
> > > >
> > > > >
> > > > > RFC 1813 says - for NFSv3 at least -
> > > > >
> > > > > The information returned by the server in response to
> > > > > an
> > > > > ACCESS call is not permanent. It was correct at the
> > > > > exact
> > > > > time that the server performed the checks, but not
> > > > > necessarily afterwards. The server can revoke access
> > > > > permission at any time.
> > > > >
> > > > > Clearly the server can allow allow access at any time.
> > > > > This seems to argue against caching - or at least against
> > > > > relying
> > > > > too
> > > > > heavily on the cache.
> > > > >
> > > > > RFC 8881 has the same text for NFSv4.1 - section 18.1.4
> > > > >
> > > > > "why do we care?" Because the server has changed to grant
> > > > > access,
> > > > > but
> > > > > the client is ignoring the possibility of that change - so
> > > > > the
> > > > > user
> > > > > is
> > > > > prevented from doing work.
> > > >
> > > > The server enforces permissions in NFS. The client permissions
> > > > checks
> > > > are performed in order to gate access to data that is already
> > > > in
> > > > cache.
> > >
> > > So if the permission check fails, then the client should ignore
> > > the
> > > cache and ask the server for the requested data, so that the
> > > server
> > > has
> > > a chance to enforce the permissions - whether denying or allowing
> > > the
> > > access.
> > >
> > > I completely agree with you, and that is effectively what my
> > > patch
> > > implements.
> > >
> >
> > No. I'm saying that no matter how many spec paragraphs you quote at
> > me,
> > I'm not willing to introduce a timeout on a cache that is critical
> > for
> > path resolution in order to address a corner case of a corner case.
> >
> > I'm saying that if you want this problem fixed, then you need to
> > look
> > at a different solution that doesn't have side-effects for the
> > 99.99999% cases or more that I do care about.
>
> What, specifically, as the cases that you do care about?
The general case, where the group membership is not changing on the
server without also changing on the client. Whether it is positive or
negative lookups. I care about the ability of the client to manage its
cache without gratuitous invalidations.
This isn't even something that is specific to NFS. This is how local
filesystems operate too.
So until you have a different solution that doesn't impact the client's
ability to cache permissions, then the answer is going to be "no" to
these patches.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Tue, 17 May 2022, Trond Myklebust wrote:
> On Tue, 2022-05-17 at 10:40 +1000, NeilBrown wrote:
> > On Tue, 17 May 2022, Trond Myklebust wrote:
> > > On Tue, 2022-05-17 at 10:05 +1000, NeilBrown wrote:
> > > >
> > > > Hi,
> > > > any thoughts on these patches?
> > >
> > >
> > > To me, this problem is simply not worth breaking hot path
> > > functionality
> > > for. If the credential changes on the server, but not on the client
> > > (so
> > > that the cred cache comparison still matches), then why do we care?
> > >
> > > IOW: I'm a NACK until convinced otherwise.
> >
> > In what way is the hot path broken? It only affect a permission test
> > failure. Why is that considered "hot path"??
>
> It is a permission test that is critical for caching path resolution on
> a per-user basis.
>
> >
> > RFC 1813 says - for NFSv3 at least -
> >
> > The information returned by the server in response to an
> > ACCESS call is not permanent. It was correct at the exact
> > time that the server performed the checks, but not
> > necessarily afterwards. The server can revoke access
> > permission at any time.
> >
> > Clearly the server can allow allow access at any time.
> > This seems to argue against caching - or at least against relying too
> > heavily on the cache.
> >
> > RFC 8881 has the same text for NFSv4.1 - section 18.1.4
> >
> > "why do we care?" Because the server has changed to grant access, but
> > the client is ignoring the possibility of that change - so the user
> > is
> > prevented from doing work.
>
> The server enforces permissions in NFS. The client permissions checks
> are performed in order to gate access to data that is already in cache.
So if the permission check fails, then the client should ignore the
cache and ask the server for the requested data, so that the server has
a chance to enforce the permissions - whether denying or allowing the
access.
I completely agree with you, and that is effectively what my patch
implements.
NeilBrown
> NACK
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>
>
On Tue, 2022-05-17 at 10:40 +1000, NeilBrown wrote:
> On Tue, 17 May 2022, Trond Myklebust wrote:
> > On Tue, 2022-05-17 at 10:05 +1000, NeilBrown wrote:
> > >
> > > Hi,
> > > any thoughts on these patches?
> >
> >
> > To me, this problem is simply not worth breaking hot path
> > functionality
> > for. If the credential changes on the server, but not on the client
> > (so
> > that the cred cache comparison still matches), then why do we care?
> >
> > IOW: I'm a NACK until convinced otherwise.
>
> In what way is the hot path broken? It only affect a permission test
> failure. Why is that considered "hot path"??
It is a permission test that is critical for caching path resolution on
a per-user basis.
>
> RFC 1813 says - for NFSv3 at least -
>
> The information returned by the server in response to an
> ACCESS call is not permanent. It was correct at the exact
> time that the server performed the checks, but not
> necessarily afterwards. The server can revoke access
> permission at any time.
>
> Clearly the server can allow allow access at any time.
> This seems to argue against caching - or at least against relying too
> heavily on the cache.
>
> RFC 8881 has the same text for NFSv4.1 - section 18.1.4
>
> "why do we care?" Because the server has changed to grant access, but
> the client is ignoring the possibility of that change - so the user
> is
> prevented from doing work.
The server enforces permissions in NFS. The client permissions checks
are performed in order to gate access to data that is already in cache.
NACK
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Tue, 17 May 2022, Trond Myklebust wrote:
> On Tue, 2022-05-17 at 11:05 +1000, NeilBrown wrote:
> > On Tue, 17 May 2022, Trond Myklebust wrote:
> > > On Tue, 2022-05-17 at 10:40 +1000, NeilBrown wrote:
> > > > On Tue, 17 May 2022, Trond Myklebust wrote:
> > > > > On Tue, 2022-05-17 at 10:05 +1000, NeilBrown wrote:
> > > > > >
> > > > > > Hi,
> > > > > > any thoughts on these patches?
> > > > >
> > > > >
> > > > > To me, this problem is simply not worth breaking hot path
> > > > > functionality
> > > > > for. If the credential changes on the server, but not on the
> > > > > client
> > > > > (so
> > > > > that the cred cache comparison still matches), then why do we
> > > > > care?
> > > > >
> > > > > IOW: I'm a NACK until convinced otherwise.
> > > >
> > > > In what way is the hot path broken? It only affect a permission
> > > > test
> > > > failure. Why is that considered "hot path"??
> > >
> > > It is a permission test that is critical for caching path
> > > resolution on
> > > a per-user basis.
> > >
> > > >
> > > > RFC 1813 says - for NFSv3 at least -
> > > >
> > > > The information returned by the server in response to an
> > > > ACCESS call is not permanent. It was correct at the exact
> > > > time that the server performed the checks, but not
> > > > necessarily afterwards. The server can revoke access
> > > > permission at any time.
> > > >
> > > > Clearly the server can allow allow access at any time.
> > > > This seems to argue against caching - or at least against relying
> > > > too
> > > > heavily on the cache.
> > > >
> > > > RFC 8881 has the same text for NFSv4.1 - section 18.1.4
> > > >
> > > > "why do we care?" Because the server has changed to grant access,
> > > > but
> > > > the client is ignoring the possibility of that change - so the
> > > > user
> > > > is
> > > > prevented from doing work.
> > >
> > > The server enforces permissions in NFS. The client permissions
> > > checks
> > > are performed in order to gate access to data that is already in
> > > cache.
> >
> > So if the permission check fails, then the client should ignore the
> > cache and ask the server for the requested data, so that the server
> > has
> > a chance to enforce the permissions - whether denying or allowing the
> > access.
> >
> > I completely agree with you, and that is effectively what my patch
> > implements.
> >
>
> No. I'm saying that no matter how many spec paragraphs you quote at me,
> I'm not willing to introduce a timeout on a cache that is critical for
> path resolution in order to address a corner case of a corner case.
>
> I'm saying that if you want this problem fixed, then you need to look
> at a different solution that doesn't have side-effects for the
> 99.99999% cases or more that I do care about.
What, specifically, are the cases that you do care about?
I assumed that the cases you care about are cases where the user *does*
have access, and where this access is correctly cached, so that a
permission(..., MAY_EXEC)
call returns immediately with a positive answer.
I care about these cases too, and I've ensured that the patch doesn't
change the behaviour for these cases.
What other cases - cases where permission() returns an error - do you care
about?
Thanks,
NeilBrown
On Tue, 2022-05-17 at 11:05 +1000, NeilBrown wrote:
> On Tue, 17 May 2022, Trond Myklebust wrote:
> > On Tue, 2022-05-17 at 10:40 +1000, NeilBrown wrote:
> > > On Tue, 17 May 2022, Trond Myklebust wrote:
> > > > On Tue, 2022-05-17 at 10:05 +1000, NeilBrown wrote:
> > > > >
> > > > > Hi,
> > > > > any thoughts on these patches?
> > > >
> > > >
> > > > To me, this problem is simply not worth breaking hot path
> > > > functionality
> > > > for. If the credential changes on the server, but not on the
> > > > client
> > > > (so
> > > > that the cred cache comparison still matches), then why do we
> > > > care?
> > > >
> > > > IOW: I'm a NACK until convinced otherwise.
> > >
> > > In what way is the hot path broken? It only affect a permission
> > > test
> > > failure. Why is that considered "hot path"??
> >
> > It is a permission test that is critical for caching path
> > resolution on
> > a per-user basis.
> >
> > >
> > > RFC 1813 says - for NFSv3 at least -
> > >
> > > The information returned by the server in response to an
> > > ACCESS call is not permanent. It was correct at the exact
> > > time that the server performed the checks, but not
> > > necessarily afterwards. The server can revoke access
> > > permission at any time.
> > >
> > > Clearly the server can allow allow access at any time.
> > > This seems to argue against caching - or at least against relying
> > > too
> > > heavily on the cache.
> > >
> > > RFC 8881 has the same text for NFSv4.1 - section 18.1.4
> > >
> > > "why do we care?" Because the server has changed to grant access,
> > > but
> > > the client is ignoring the possibility of that change - so the
> > > user
> > > is
> > > prevented from doing work.
> >
> > The server enforces permissions in NFS. The client permissions
> > checks
> > are performed in order to gate access to data that is already in
> > cache.
>
> So if the permission check fails, then the client should ignore the
> cache and ask the server for the requested data, so that the server
> has
> a chance to enforce the permissions - whether denying or allowing the
> access.
>
> I completely agree with you, and that is effectively what my patch
> implements.
>
No. I'm saying that no matter how many spec paragraphs you quote at me,
I'm not willing to introduce a timeout on a cache that is critical for
path resolution in order to address a corner case of a corner case.
I'm saying that if you want this problem fixed, then you need to look
at a different solution that doesn't have side-effects for the
99.99999% cases or more that I do care about.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 16 May 2022, at 21:36, Trond Myklebust wrote:
> So until you have a different solution that doesn't impact the client's
> ability to cache permissions, then the answer is going to be "no" to
> these patches.
Hi Trond,
We have some folks negatively impacted by this issue as well. Are you
willing to consider this via a mount option?
Ben
On Fri, 2022-08-26 at 10:59 -0400, Benjamin Coddington wrote:
> On 16 May 2022, at 21:36, Trond Myklebust wrote:
> > So until you have a different solution that doesn't impact the
> > client's
> > ability to cache permissions, then the answer is going to be "no"
> > to
> > these patches.
>
> Hi Trond,
>
> We have some folks negatively impacted by this issue as well. Are
> you
> willing to consider this via a mount option?
>
> Ben
>
I don't see how that answers my concern.
I'd rather see us set up an explicit trigger mechanism. It doesn't have
to be particularly sophisticated. I can imagine just having a global,
or more likely a per-container, cookie that has a control mechanism in
/sys/fs/nfs, and that can be used to order all the inodes to invalidate
their permissions caches when you believe there is a need to do so.
i.e. you cache the value of the global cookie in the inode, and if you
notice a change, then that's the signal that you need to invalidate the
permissions cache before updating the cached value of the cookie.
That way, you have a mechanism that serves all purposes: it can do an
immediate one-time only flush, or you can set up a userspace job that
issues a global flush once every so often, e.g. using a cron job.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 26 Aug 2022, at 11:44, Trond Myklebust wrote:
> On Fri, 2022-08-26 at 10:59 -0400, Benjamin Coddington wrote:
>> On 16 May 2022, at 21:36, Trond Myklebust wrote:
>>> So until you have a different solution that doesn't impact the
>>> client's
>>> ability to cache permissions, then the answer is going to be "no"
>>> to
>>> these patches.
>>
>> Hi Trond,
>>
>> We have some folks negatively impacted by this issue as well. Are
>> you
>> willing to consider this via a mount option?
>>
>> Ben
>>
>
> I don't see how that answers my concern.
A mount option would need to be set to enable the behavior, so the cases
you're concerned about would be unaffected.
> I'd rather see us set up an explicit trigger mechanism. It doesn't have
> to be particularly sophisticated. I can imagine just having a global,
> or more likely a per-container, cookie that has a control mechanism in
> /sys/fs/nfs, and that can be used to order all the inodes to invalidate
> their permissions caches when you believe there is a need to do so.
>
> i.e. you cache the value of the global cookie in the inode, and if you
> notice a change, then that's the signal that you need to invalidate the
> permissions cache before updating the cached value of the cookie.
>
> That way, you have a mechanism that serves all purposes: it can do an
> immediate one-time only flush, or you can set up a userspace job that
> issues a global flush once every so often, e.g. using a cron job.
We had the every-so-often flush per-inode before 57b691819ee2.
Here's the setup in play: there's a large number of v3 clients and users,
and many times each day group membership changes occur which either restrict
or grant access to parts of the namespace.
The feedback I'm getting is that it will be a lot of extra orchestration to
have to trigger something on each client when the group memberships change.
The desired behavior was as before 57b691819ee2: the access cache expired
with the attribute timeout. It would be nice to have a mount option that
could restore the access cache behavior as it was before 57b691819ee2.
Ben
On Fri, 2022-08-26 at 12:43 -0400, Benjamin Coddington wrote:
> On 26 Aug 2022, at 11:44, Trond Myklebust wrote:
>
> > On Fri, 2022-08-26 at 10:59 -0400, Benjamin Coddington wrote:
> > > On 16 May 2022, at 21:36, Trond Myklebust wrote:
> > > > So until you have a different solution that doesn't impact the
> > > > client's
> > > > ability to cache permissions, then the answer is going to be
> > > > "no"
> > > > to
> > > > these patches.
> > >
> > > Hi Trond,
> > >
> > > We have some folks negatively impacted by this issue as well.
> > > Are
> > > you
> > > willing to consider this via a mount option?
> > >
> > > Ben
> > >
> >
> > I don't see how that answers my concern.
>
> A mount option would need to be set to enable the behavior, so the
> cases
> you're concerned about would be unaffected.
>
> > I'd rather see us set up an explicit trigger mechanism. It doesn't
> > have
> > to be particularly sophisticated. I can imagine just having a
> > global,
> > or more likely a per-container, cookie that has a control mechanism
> > in
> > /sys/fs/nfs, and that can be used to order all the inodes to
> > invalidate
> > their permissions caches when you believe there is a need to do so.
> >
> > i.e. you cache the value of the global cookie in the inode, and if
> > you
> > notice a change, then that's the signal that you need to invalidate
> > the
> > permissions cache before updating the cached value of the cookie.
> >
> > That way, you have a mechanism that serves all purposes: it can do
> > an
> > immediate one-time only flush, or you can set up a userspace job
> > that
> > issues a global flush once every so often, e.g. using a cron job.
>
> We had the every-so-often flush per-inode before 57b691819ee2.
>
> Here's the setup in play: there's a large number of v3 clients and
> users,
> and many times each day group membership changes occur which either
> restrict
> or grant access to parts of the namespace.
>
> The feedback I'm getting is that it will be a lot of extra
> orchestration to
> have to trigger something on each client when the group memberships
> change.
> The desired behavior was as before 57b691819ee2: the access cache
> expired
> with the attribute timeout. It would be nice to have a mount option
> that
> could restore the access cache behavior as it was before
> 57b691819ee2.
>
User group membership is not a per-mount thing. It's a global thing.
As I said, what I'm proposing does allow you to set up a cron job that
flushes your cache on a regular basis. There is absolutely no extra
value whatsoever provided by moving that into the kernel on a per-mount
basis.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 26 Aug 2022, at 12:56, Trond Myklebust wrote:
> User group membership is not a per-mount thing. It's a global thing.
The cached access entry is a per-inode thing.
> As I said, what I'm proposing does allow you to set up a cron job that
> flushes your cache on a regular basis. There is absolutely no extra
> value whatsoever provided by moving that into the kernel on a per-mount
> basis.
Sure there is - that's where we configure major NFS client behaviors.
I understand where you're coming from, but it seems so bizarre that a previous
behavior that multiple organizations built and depend upon has been removed
to optimize performance, and now we will need to tell them that to restore
it we must write cron jobs on all the clients. I don't think there's been a
dependency on cron to get NFS to work a certain way yet.
A mount option is much easier to deploy in these organizations that have
autofs deployed, and documenting it in NFS(5) seems the right place.
If that's just not acceptable, at least let's just make a tuneable that
expires entries rather than a trigger to flush everything. Please consider
the configuration sprawl on the NFS client, and let me know how to proceed.
Ben
On Sat, 27 Aug 2022, Trond Myklebust wrote:
> On Fri, 2022-08-26 at 10:59 -0400, Benjamin Coddington wrote:
> > On 16 May 2022, at 21:36, Trond Myklebust wrote:
> > > So until you have a different solution that doesn't impact the
> > > client's
> > > ability to cache permissions, then the answer is going to be "no"
> > > to
> > > these patches.
> >
> > Hi Trond,
> >
> > We have some folks negatively impacted by this issue as well. Are
> > you
> > willing to consider this via a mount option?
> >
> > Ben
> >
>
> I don't see how that answers my concern.
Could you please spell out again what your concerns are? I still don't
understand.
The only performance impact is when a permission test fails. In what
circumstance is permission failure expected on a fast-path?
>
> I'd rather see us set up an explicit trigger mechanism. It doesn't have
> to be particularly sophisticated. I can imagine just having a global,
> or more likely a per-container, cookie that has a control mechanism in
> /sys/fs/nfs, and that can be used to order all the inodes to invalidate
> their permissions caches when you believe there is a need to do so.
I hope it would only invalidate negative cached permissions, not
positive.
Cached positive permissions aren't really a problem, as we'll find out
they were wrong as soon as we send the relevant request to the server.
The problem with cached negative permissions is that we never even try
to send a request to the server.
The client doesn't *know* when the server changes its understanding of
group membership, so it cannot know when to write to this. So the best
the client can do is invalidate negative cached permissions
periodically. So if this /sys/fs/nfs/ tunable were to be added, I would
like it to be a time interval after which they can expire (I would set it
to zero of course).
Thanks,
NeilBrown
>
> i.e. you cache the value of the global cookie in the inode, and if you
> notice a change, then that's the signal that you need to invalidate the
> permissions cache before updating the cached value of the cookie.
>
> That way, you have a mechanism that serves all purposes: it can do an
> immediate one-time only flush, or you can set up a userspace job that
> issues a global flush once every so often, e.g. using a cron job.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>
>
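The cache policy Neil is arguing for (trust a cached grant indefinitely, but re-check a cached denial once it is older than some tunable interval) can be sketched in plain C. The struct and function names below are hypothetical illustrations, not the actual fs/nfs/dir.c code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified access-cache entry. */
struct access_entry {
	uint32_t granted_mask; /* access bits the server granted */
	uint64_t timestamp;    /* when the entry was cached (seconds) */
};

/*
 * May this cached entry answer a check for the bits in 'want'?
 * A cached grant is trusted indefinitely; a cached denial is
 * trusted only while the entry is younger than 'neg_timeout',
 * after which a fresh ACCESS request should be sent.
 */
static bool access_cache_usable(const struct access_entry *e,
				uint32_t want, uint64_t now,
				uint64_t neg_timeout)
{
	if ((e->granted_mask & want) == want)
		return true;                      /* grant: trust forever */
	return now - e->timestamp < neg_timeout;  /* denial: time-limited */
}
```

Setting the interval to zero, as Neil says he would, makes every cached denial trigger a new ACCESS request while leaving cached grants untouched.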
On Fri, 2022-08-26 at 14:27 -0400, Benjamin Coddington wrote:
> On 26 Aug 2022, at 12:56, Trond Myklebust wrote:
> > User group membership is not a per-mount thing. It's a global
> > thing.
>
> The cached access entry is a per-inode thing.
That's irrelevant. We already have code that deals with per-inode
changes. Those changes happen when the owner changes, the group owner
changes, the mode changes or the ACL changes.
The problem that you're asking to be solved is when the lookup key for
every single access entry across the entire universe changes because
someone edited a group for a given person, and the server picked up on
that change at some time after the NFS client picked up on that change.
>
> > As I said, what I'm proposing does allow you to set up a cron job
> > that
> > flushes your cache on a regular basis. There is absolutely no extra
> > value whatsoever provided by moving that into the kernel on a per-
> > mount
> > basis.
>
> Sure there is - that's where we configure major NFS client behaviors.
>
> I understand where you're coming from, but it seems so bizarre that a
> previous
> behavior that multiple organizations built and depend upon has been
> removed
> to optimize performance, and now we will need to tell them that to
> restore
> it we must write cron jobs on all the clients. I don't think there's
> been a
> dependency on cron to get NFS to work a certain way yet.
>
> A mount option is much easier to deploy in these organizations that
> have
> autofs deployed, and documenting it in NFS(5) seems the right place.
>
> If that's just not acceptable, at least let's just make a tuneable
> that
> expires entries rather than a trigger to flush everything. Please
> consider
> the configuration sprawl on the NFS client, and let me know how to
> proceed.
>
Can we please try to solve the real problem first? The real problem is
not that user permissions change every hour on the server.
POSIX normally only expects changes to happen to your group membership
when you log in. The problem here is that the NFS server is updating
its rules concerning your group membership at some random time after
your log in on the NFS client.
So how about if we just do our best to approximate the POSIX rules, and
promise to revalidate your cached file access permissions at least once
after you log in? Then we can let the NFS server do whatever the hell
it wants to do after that.
IOW: If the sysadmin changes the group membership for the user, then
the user can remedy the problem by logging out and then logging back in
again, just like they do for local filesystems.
8<----------------------------------------
From 72d5b8b6bb053ff3574c4dcbf7ecf3102e43b5b8 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Fri, 26 Aug 2022 19:44:44 -0400
Subject: [PATCH] NFS: Clear the file access cache upon login
POSIX typically only refreshes the user's supplementary group
information upon login. Since NFS servers may often refresh their
concept of the user's supplementary group membership at their own cadence,
it is possible for the NFS client's access cache to become stale due to
the user's group membership changing on the server after the user has
already logged in on the client.
While it is reasonable to expect that such group membership changes are
rare, and that we do not want to optimise the cache to accommodate them,
it is also not unreasonable for the user to expect that if they log out
and log back in again, that the staleness would clear up.
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/dir.c | 22 ++++++++++++++++++++++
include/linux/nfs_fs.h | 1 +
2 files changed, 23 insertions(+)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 5d6c2ddc7ea6..5e09cb79544e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2949,9 +2949,28 @@ static struct nfs_access_entry *nfs_access_search_rbtree(struct inode *inode, co
return NULL;
}
+static u64 nfs_access_login_time(const struct task_struct *task,
+ const struct cred *cred)
+{
+ const struct task_struct *parent;
+ u64 ret;
+
+ rcu_read_lock();
+ for (;;) {
+ parent = rcu_dereference(task->real_parent);
+ if (parent == task || cred_fscmp(parent->cred, cred) != 0)
+ break;
+ task = parent;
+ }
+ ret = task->start_time;
+ rcu_read_unlock();
+ return ret;
+}
+
static int nfs_access_get_cached_locked(struct inode *inode, const struct cred *cred, u32 *mask, bool may_block)
{
struct nfs_inode *nfsi = NFS_I(inode);
+ u64 login_time = nfs_access_login_time(current, cred);
struct nfs_access_entry *cache;
bool retry = true;
int err;
@@ -2979,6 +2998,8 @@ static int nfs_access_get_cached_locked(struct inode *inode, const struct cred *
spin_lock(&inode->i_lock);
retry = false;
}
+ if ((s64)(cache->timestamp - login_time) < 0)
+ goto out_zap;
*mask = cache->mask;
list_move_tail(&cache->lru, &nfsi->access_cache_entry_lru);
err = 0;
@@ -3058,6 +3079,7 @@ static void nfs_access_add_rbtree(struct inode *inode,
else
goto found;
}
+ set->timestamp = ktime_get_ns();
rb_link_node(&set->rb_node, parent, p);
rb_insert_color(&set->rb_node, root_node);
list_add_tail(&set->lru, &nfsi->access_cache_entry_lru);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 7931fa472561..d92fdfd2444c 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -59,6 +59,7 @@ struct nfs_access_entry {
kuid_t fsuid;
kgid_t fsgid;
struct group_info *group_info;
+ u64 timestamp;
__u32 mask;
struct rcu_head rcu_head;
};
--
2.37.2
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Sat, 2022-08-27 at 09:39 +1000, NeilBrown wrote:
> On Sat, 27 Aug 2022, Trond Myklebust wrote:
> > On Fri, 2022-08-26 at 10:59 -0400, Benjamin Coddington wrote:
> > > On 16 May 2022, at 21:36, Trond Myklebust wrote:
> > > > So until you have a different solution that doesn't impact the
> > > > client's
> > > > ability to cache permissions, then the answer is going to be
> > > > "no"
> > > > to
> > > > these patches.
> > >
> > > Hi Trond,
> > >
> > > We have some folks negatively impacted by this issue as well.
> > > Are
> > > you
> > > willing to consider this via a mount option?
> > >
> > > Ben
> > >
> >
> > I don't see how that answers my concern.
>
> Could you please spell out again what your concerns are? I still
> don't
> understand.
> The only performance impact is when a permission test fails. In what
> circumstance is permission failure expected on a fast-path?
>
You're treating the problem as if it were a timeout issue, when clearly
it has nothing at all to do with timeouts. There is no problem of
'group membership changes on a regular basis' to be solved.
The problem to be solved is that on the very rare occasion when a group
membership does change, then the server and the client may update their
view of that membership at completely different times. In the
particular case when the client updates its view of the group
membership before the server does, then the access cache is polluted,
and there is no remedy.
So my concerns are around the mismatch of problem and solution. I see
multiple issues.
1. Your timeouts are per inode. That means that if inode A sees the
problem being solved, then there is no guarantee that inode B
sees the same problem as being solved (and the converse is true
as well).
2. There is no quick on-the-spot solution. If your admin updates the
group membership, then you are only guaranteed that the client
and server are in sync once the server has picked up the solution
(however you arrange that), and the client cache has expired.
IOW: your only solution is to wait 1 client cache expiration
period after the server is known to be fixed (or to reboot the
client).
3. There is no solution at all for the positive cache case. If your
sysadmin is trying to revoke an access due to a group membership
change, their only solution is to reboot the client.
4. You are tying the access cache timeout to the completely
unrelated 'acregmin' and 'acdirmin' values. Not only does that
mean that the default values for regular files are extremely
small (3 seconds), meaning that we have to refresh extremely
often. However it also means that you have to explain why
directories behave differently (longer default timeouts) despite
the fact that the group membership changed at exactly the same
time for both types of object.
1. Bonus points for explaining why our default values are designed
for a group membership that changes every 3 seconds.
5. 'noac' suddenly now turns off access caching, but only for
negative cached values.
--
Trond Myklebust Linux NFS client maintainer, Hammerspace
[email protected]
On Sat, 27 Aug 2022, Trond Myklebust wrote:
> On Sat, 2022-08-27 at 09:39 +1000, NeilBrown wrote:
> > On Sat, 27 Aug 2022, Trond Myklebust wrote:
> > > On Fri, 2022-08-26 at 10:59 -0400, Benjamin Coddington wrote:
> > > > On 16 May 2022, at 21:36, Trond Myklebust wrote:
> > > > > So until you have a different solution that doesn't impact the
> > > > > client's
> > > > > ability to cache permissions, then the answer is going to be
> > > > > "no"
> > > > > to
> > > > > these patches.
> > > >
> > > > Hi Trond,
> > > >
> > > > We have some folks negatively impacted by this issue as well.
> > > > Are
> > > > you
> > > > willing to consider this via a mount option?
> > > >
> > > > Ben
> > > >
> > >
> > > I don't see how that answers my concern.
> >
> > Could you please spell out again what your concerns are? I still
> > don't
> > understand.
> > The only performance impact is when a permission test fails. In what
> > circumstance is permission failure expected on a fast-path?
> >
>
> You're treating the problem as if it were a timeout issue, when clearly
> it has nothing at all to do with timeouts. There is no problem of
> 'group membership changes on a regular basis' to be solved.
You are the one who suggested a timeout. I quote:
> That way, you have a mechanism that serves all purposes: it can do an
> immediate one-time only flush, or you can set up a userspace job that
> issues a global flush once every so often, e.g. using a cron job.
"every so often" == "timeout". I thought that maybe this was somewhere
that we could find some agreement. As I said, I would set the timeout
to zero. I don't really want the timeout - not more than the ac
timeouts we already have.
>
> The problem to be solved is that on the very rare occasion when a group
> membership does change, then the server and the client may update their
> view of that membership at completely different times. In the
> particular case when the client updates its view of the group
> membership before the server does, then the access cache is polluted,
> and there is no remedy.
Agreed.
>
> So my concerns are around the mismatch of problem and solution. I see
> multiple issues.
>
> 1. Your timeouts are per inode. That means that if inode A sees the
> problem being solved, then there is no guarantee that inode B
> sees the same problem as being solved (and the converse is true
> as well).
Is that a problem? Is that even anything new?
If I chmod file A and file B on the server, then the client may see
the change to file A before the change to file B (or vice-versa).
i.e. the inconsistent-cache problem might be solved for one but not the
other. It has always been this way.
> 2. There is no quick on-the-spot solution. If your admin updates the
> group membership, then you are only guaranteed that the client
> and server are in sync once the server has picked up the solution
> (however you arrange that), and the client cache has expired.
> IOW: your only solution is to wait 1 client cache expiration
> period after the server is known to be fixed (or to reboot the
> client).
This is also the only solution to seeing other changes that have been
made to inodes.
For file/directory content you can open the file/directory and this
triggers CTO consistency checks. But stat() or access() doesn't.
Hmmm.. What if we add an ACCESS check to the OPEN request for files, and
the equivalent GETATTR for directories? That would provide a direct
way to force a refresh without adding any extra RPC requests??
> 3. There is no solution at all for the positive cache case. If your
> sysadmin is trying to revoke an access due to a group membership
> change, their only solution is to reboot the client.
Yes. Revoking read/execute access that you have already granted is not
really possible. The application may have already read the file. It
might even have emailed the content to $BLACKHAT. Even rebooting the
client isn't really a solution.
Revoking write access already works fine as does revoking read access to
a file before putting new content in it - the new content is safe.
> 4. You are tying the access cache timeout to the completely
> unrelated 'acregmin' and 'acdirmin' values. Not only does that
> mean that the default values for regular files are extremely
> small (3 seconds), meaning that we have to refresh extremely
> often. However it also means that you have to explain why
> directories behave differently (longer default timeouts) despite
> the fact that the group membership changed at exactly the same
> time for both types of object.
> 1. Bonus points for explaining why our default values are designed
> for a group membership that changes every 3 seconds.
I don't see why you treat the access information as different from all
the other attributes. "bob has group x access to the directory" and
"file size is 42000 bytes" are just attributes of the inode. We collect
them different ways, but they are not deeply different.
The odd thing here is that we cache these "access" attributes
indefinitely when ctime doesn't change - even though there is no
guarantee that ctime captures access changes. I think that choice needs
to be justified.
Using the cached value indefinitely when it grants access is defensible
because it is a frequent operation (checking x access in a directory
while following a path). I think that adequately justifies the choice.
I cannot see the justification when the access is denied by the cache.
> 5. 'noac' suddenly now turns off access caching, but only for
> negative cached values.
Is this a surprise?
But what do you think of adding an ACCESS check when opening a dir/file?
At least for NFSv4?
Thanks,
NeilBrown
On Mon, 2022-08-29 at 09:32 +1000, NeilBrown wrote:
> On Sat, 27 Aug 2022, Trond Myklebust wrote:
> > On Sat, 2022-08-27 at 09:39 +1000, NeilBrown wrote:
> > > On Sat, 27 Aug 2022, Trond Myklebust wrote:
> > > > On Fri, 2022-08-26 at 10:59 -0400, Benjamin Coddington wrote:
> > > > > On 16 May 2022, at 21:36, Trond Myklebust wrote:
> > > > > > So until you have a different solution that doesn't impact the
> > > > > > client's
> > > > > > ability to cache permissions, then the answer is going to be
> > > > > > "no"
> > > > > > to
> > > > > > these patches.
> > > > >
> > > > > Hi Trond,
> > > > >
> > > > > We have some folks negatively impacted by this issue as well.
> > > > > Are
> > > > > you
> > > > > willing to consider this via a mount option?
> > > > >
> > > > > Ben
> > > > >
> > > >
> > > > I don't see how that answers my concern.
> > >
> > > Could you please spell out again what your concerns are? I still
> > > don't
> > > understand.
> > > The only performance impact is when a permission test fails. In what
> > > circumstance is permission failure expected on a fast-path?
> > >
> >
> > You're treating the problem as if it were a timeout issue, when clearly
> > it has nothing at all to do with timeouts. There is no problem of
> > 'group membership changes on a regular basis' to be solved.
>
> You are the one who suggested a timeout. I quote:
>
> > That way, you have a mechanism that serves all purposes: it can do an
> > immediate one-time only flush, or you can set up a userspace job that
> > issues a global flush once every so often, e.g. using a cron job.
>
> "every so often" == "timeout". I thought that maybe this was somewhere
> that we could find some agreement. As I said, I would set the timeout
> to zero. I don't really want the timeout - not more than the ac
> timeouts we already have.
>
> >
> > The problem to be solved is that on the very rare occasion when a group
> > membership does change, then the server and the client may update their
> > view of that membership at completely different times. In the
> > particular case when the client updates its view of the group
> > membership before the server does, then the access cache is polluted,
> > and there is no remedy.
>
> Agreed.
>
> >
> > So my concerns are around the mismatch of problem and solution. I see
> > multiple issues.
> >
> > 1. Your timeouts are per inode. That means that if inode A sees the
> > problem being solved, then there is no guarantee that inode B
> > sees the same problem as being solved (and the converse is true
> > as well).
>
> Is that a problem? Is that even anything new?
> If I chmod file A and file B on the server, then the client may see
> the change to file A before the change to file B (or vice-versa).
> i.e. the inconsistent-cache problem might be solved for one but not the
> other. It has always been this way.
>
> > 2. There is no quick on-the-spot solution. If your admin updates the
> > group membership, then you are only guaranteed that the client
> > and server are in sync once the server has picked up the solution
> > (however you arrange that), and the client cache has expired.
> > IOW: your only solution is to wait 1 client cache expiration
> > period after the server is known to be fixed (or to reboot the
> > client).
>
> This is also the only solution to seeing other changes that have been
> made to inodes.
> For file/directory content you can open the file/directory and this
> triggers CTO consistency checks. But stat() or access() doesn't.
>
> Hmmm.. What if we add an ACCESS check to the OPEN request for files, and
> the equivalent GETATTR for directories? That would provide a direct
> way to force a refresh without adding any extra RPC requests??
>
> > 3. There is no solution at all for the positive cache case. If your
> > sysadmin is trying to revoke an access due to a group membership
> > change, their only solution is to reboot the client.
>
> Yes. Revoking read/execute access that you have already granted is not
> really possible. The application may have already read the file. It
> might even have emailed the content to $BLACKHAT. Even rebooting the
> client isn't really a solution.
> Revoking write access already works fine as does revoking read access to
> a file before putting new content in it - the new content is safe.
>
> > 4. You are tying the access cache timeout to the completely
> > unrelated 'acregmin' and 'acdirmin' values. Not only does that
> > mean that the default values for regular files are extremely
> > small (3 seconds), meaning that we have to refresh extremely
> > often. However it also means that you have to explain why
> > directories behave differently (longer default timeouts) despite
> > the fact that the group membership changed at exactly the same
> > time for both types of object.
> > 1. Bonus points for explaining why our default values are designed
> > for a group membership that changes every 3 seconds.
>
> I don't see why you treat the access information as different from all
> the other attributes. "bob has group x access to the directory" and
> "file size is 42000 bytes" are just attributes of the inode. We collect
> them different ways, but they are not deeply different.
>
> The odd thing here is that we cache these "access" attributes
> indefinitely when ctime doesn't change - even though there is no
> guarantee that ctime captures access changes. I think that choice needs
> to be justified.
Maybe I'm being pedantic, but I don't see the first as an inode
attribute. There are really 2 pieces to that access control example:
- bob is a member of group x
- group x has access to the directory
The first has nothing directly to do with the inode and so it's no
surprise that its ctime and i_version aren't affected when group
membership changes.
> Using the cached value indefinitely when it grants access is defensible
> because it is a frequent operation (checking x access in a directory
> while following a path). I think that adequately justifies the choice.
> I cannot see the justification when the access is denied by the cache.
>
> > 5. 'noac' suddenly now turns off access caching, but only for
> > negative cached values.
>
> Is this a surprise?
>
> But what do you think of adding an ACCESS check when opening a dir/file?
> At least for NFSv4?
--
Jeff Layton <[email protected]>
On Tue, 30 Aug 2022, Jeff Layton wrote:
> On Mon, 2022-08-29 at 09:32 +1000, NeilBrown wrote:
> > On Sat, 27 Aug 2022, Trond Myklebust wrote:
> > > On Sat, 2022-08-27 at 09:39 +1000, NeilBrown wrote:
> > > > On Sat, 27 Aug 2022, Trond Myklebust wrote:
> > > > > On Fri, 2022-08-26 at 10:59 -0400, Benjamin Coddington wrote:
> > > > > > On 16 May 2022, at 21:36, Trond Myklebust wrote:
> > > > > > > So until you have a different solution that doesn't impact the
> > > > > > > client's
> > > > > > > ability to cache permissions, then the answer is going to be
> > > > > > > "no"
> > > > > > > to
> > > > > > > these patches.
> > > > > >
> > > > > > Hi Trond,
> > > > > >
> > > > > > We have some folks negatively impacted by this issue as well.
> > > > > > Are
> > > > > > you
> > > > > > willing to consider this via a mount option?
> > > > > >
> > > > > > Ben
> > > > > >
> > > > >
> > > > > I don't see how that answers my concern.
> > > >
> > > > Could you please spell out again what your concerns are? I still
> > > > don't
> > > > understand.
> > > > The only performance impact is when a permission test fails. In what
> > > > circumstance is permission failure expected on a fast-path?
> > > >
> > >
> > > You're treating the problem as if it were a timeout issue, when clearly
> > > it has nothing at all to do with timeouts. There is no problem of
> > > 'group membership changes on a regular basis' to be solved.
> >
> > You are the one who suggested a timeout. I quote:
> >
> > > That way, you have a mechanism that serves all purposes: it can do an
> > > immediate one-time only flush, or you can set up a userspace job that
> > > issues a global flush once every so often, e.g. using a cron job.
> >
> > "every so often" == "timeout". I thought that maybe this was somewhere
> > that we could find some agreement. As I said, I would set the timeout
> > to zero. I don't really want the timeout - not more than the ac
> > timeouts we already have.
> >
> > >
> > > The problem to be solved is that on the very rare occasion when a group
> > > membership does change, then the server and the client may update their
> > > view of that membership at completely different times. In the
> > > particular case when the client updates its view of the group
> > > membership before the server does, then the access cache is polluted,
> > > and there is no remedy.
> >
> > Agreed.
> >
> > >
> > > So my concerns are around the mismatch of problem and solution. I see
> > > multiple issues.
> > >
> > > 1. Your timeouts are per inode. That means that if inode A sees the
> > > problem being solved, then there is no guarantee that inode B
> > > sees the same problem as being solved (and the converse is true
> > > as well).
> >
> > Is that a problem? Is that even anything new?
> > If I chmod file A and file B on the server, then the client may see
> > the change to file A before the change to file B (or vice-versa).
> > i.e. the inconsistent-cache problem might be solved for one but not the
> > other. It has always been this way.
> >
> > > 2. There is no quick on-the-spot solution. If your admin updates the
> > > group membership, then you are only guaranteed that the client
> > > and server are in sync once the server has picked up the solution
> > > (however you arrange that), and the client cache has expired.
> > > IOW: your only solution is to wait 1 client cache expiration
> > > period after the server is known to be fixed (or to reboot the
> > > client).
> >
> > This is also the only solution to seeing other changes that have been
> > made to inodes.
> > For file/directory content you can open the file/directory and this
> > triggers CTO consistency checks. But stat() or access() doesn't.
> >
> > Hmmm.. What if we add an ACCESS check to the OPEN request for files, and
> > the equivalent GETATTR for directories? That would provide a direct
> > way to force a refresh without adding any extra RPC requests??
> >
> > > 3. There is no solution at all for the positive cache case. If your
> > > sysadmin is trying to revoke an access due to a group membership
> > > change, their only solution is to reboot the client.
> >
> > Yes. Revoking read/execute access that you have already granted is not
> > really possible. The application may have already read the file. It
> > might even have emailed the content to $BLACKHAT. Even rebooting the
> > client isn't really a solution.
> > Revoking write access already works fine as does revoking read access to
> > a file before putting new content in it - the new content is safe.
> >
> > > 4. You are tying the access cache timeout to the completely
> > > unrelated 'acregmin' and 'acdirmin' values. Not only does that
> > > mean that the default values for regular files are extremely
> > > small (3 seconds), meaning that we have to refresh extremely
> > > often. However it also means that you have to explain why
> > > directories behave differently (longer default timeouts) despite
> > > the fact that the group membership changed at exactly the same
> > > time for both types of object.
> > > 1. Bonus points for explaining why our default values are designed
> > > for a group membership that changes every 3 seconds.
> >
> > I don't see why you treat the access information as different from all
> > the other attributes. "bob has group x access to the directory" and
> > "file size is 42000 bytes" are just attributes of the inode. We collect
> > them different ways, but they are not deeply different.
> >
> > The odd thing here is that we cache these "access" attributes
> > indefinitely when ctime doesn't change - even though there is no
> > guarantee that ctime captures access changes. I think that choice needs
> > to be justified.
>
> Maybe I'm being pedantic, but I don't see the first as an inode
> attribute. There are really 2 pieces to that access control example:
>
> - bob is a member of group x
> - group x has access to the directory
>
> The first has nothing directly to do with the inode and so it's no
> surprise that its ctime and i_version aren't affected when group
> membership changes.
The whole point of the NFS ACCESS request is that the client cannot know
how to interpret any access controls that the server might have in
place. Who knows, they might even be time based access controls
(games can only be played on the weekend). Ok, that is a bit extreme,
but the principle remains.
*You* might "know" that there are two pieces (in this specific example)
and that one doesn't directly relate to the inodes, but the NFS client
doesn't know that. The NFS client only knows that it can request ACCESS
information for a filehandle in the context of a credential. The Linux
NFS client then caches that against the inode (so it is all somehow
related to the inode).
The spec doesn't give the client permission to cache these details AT
ALL. Caching positive results makes perfect sense from a performance
perspective and is easily defensible. Caching negative results ... not
so much.
Thanks,
NeilBrown
>
> > Using the cached value indefinitely when it grants access is defensible
> > because it is a frequent operation (checking x access in a directory
> > while following a path). I think that adequately justifies the choice.
> > I cannot see the justification when the access is denied by the cache.
> >
> > > 5. 'noac' suddenly now turns off access caching, but only for
> > > negative cached values.
> >
> > Is this a surprise?
> >
> > But what do you think of adding an ACCESS check when opening a dir/file?
> > At least for NFSv4?
>
> --
> Jeff Layton <[email protected]>
>
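For concreteness, the asymmetric policy being debated in this thread
(trust a cached grant indefinitely; trust a cached denial only while it
is younger than acmin) can be sketched roughly as below. The struct and
helper names are illustrative only, not the actual patch:

```c
#include <time.h>

/* Illustrative sketch (not the actual patch): a cached ACCESS reply
 * that grants all the requested bits is trusted indefinitely, while a
 * cached denial is trusted only while younger than acmin seconds. */
struct access_entry {
	unsigned int mask;	/* access bits the server granted */
	time_t timestamp;	/* when the reply was cached */
};

static int access_cache_usable(const struct access_entry *e,
			       unsigned int want, time_t now, time_t acmin)
{
	if ((e->mask & want) == want)
		return 1;			/* grant: trust indefinitely */
	return (now - e->timestamp) < acmin;	/* denial: trust only briefly */
}
```

With this shape, a stale denial simply triggers a fresh ACCESS RPC
rather than a hard failure, which is how additions to group membership
would become visible after at most acmin seconds.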
On Sat, 2022-09-03 at 19:57 +1000, NeilBrown wrote:
> The whole point of the NFS ACCESS request is that the client cannot
> know
> how to interpret any access controls that the server might have in
> place. Who knows, they might even be time based access controls
> (games can only be played on the weekend). Ok, that is a bit
> extreme,
> but the principle remains.
>
> *You* might "know" that there are two pieces (in this specific
> example)
> and that one doesn't directly relate to the inodes, but the NFS
> client
> doesn't know that. The NFS client only knows that it can request
> ACCESS
> information for a filehandle in the context of a credential. The
> Linux
> NFS client then caches that against the inode (so it is all somehow
> related to the inode).
>
> The spec doesn't give the client permission to cached these details
> AT
> ALL. Caching positive results makes perfect sense from a performance
> perspective and is easily defensible. Caching negative results ...
> not
> so much.
Of course the spec allows the client to cache permissions in both NFSv3
and NFSv4. Otherwise there would be no way we could cache objects at
all. Enabling the client to gate access to cached data, such as
directory contents, without troubling the server is the whole point of
the ACCESS operation. Without that ability, we would
have to ask the server every time a user tried to 'cd' into a
directory, called open(), called getdents(), called read(), called
write(), ... Without that ability, NFSv4 delegations would be
pointless, because we'd have to go and ask the server for all these
operations.
I know the spec is littered with mealy-mouthed "The server can revoke
access permission at any time" statements in various spots, but that
looks more like something to placate a review by the IBM legal team
(yes, it happened) rather than a basic assumption that was engineered
into the protocol. The protocol simply cannot work as described in
RFC5661 and RFC7530 without the assumption that you can cache.
The other point is that access cache changes due to AD/LDAP/NIS group
membership updates happen once in a blue moon, and when they do, they
affect _ALL_ cached values for that user, and not just one at a time.
When those things do happen, the user normally expects to have to log
out and then log back in so that the new group memberships are
reflected in that user's cred (as per POSIX prescribed behaviour). That
will cause the NFS client to send new ACCESS calls to the server,
because the new cred will not be found in the existing access cache.
The only case where this assumption fails, is if the server is
implementing the '--manage-gids' behaviour, and caches its
representation of the user's creds past the time when the user logged
out and logged in again.
So this is literally a 1 in a billion chance problem that you're asking
us to fix. Optimising for that case by adding in a timeout that affects
all cached access values is just not in the cards.
I've already sent out a patch that forces realignment of the NFS client
behaviour with that of POSIX by requiring the client to recheck cached
access values for an existing credential after a new login, but does
not add any further impediments to the access cache. Why is that
approach not sufficient?
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Sun, 04 Sep 2022, Trond Myklebust wrote:
> On Sat, 2022-09-03 at 19:57 +1000, NeilBrown wrote:
> > The whole point of the NFS ACCESS request is that the client cannot
> > know
> > how to interpret any access controls that the server might have in
> > place. Who knows, they might even be time based access controls
> > (games can only be played on the weekend). Ok, that is a bit
> > extreme,
> > but the principle remains.
> >
> > *You* might "know" that there are two pieces (in this specific
> > example)
> > and that one doesn't directly relate to the inodes, but the NFS
> > client
> > doesn't know that. The NFS client only knows that it can request
> > ACCESS
> > information for a filehandle in the context of a credential. The
> > Linux
> > NFS client then caches that against the inode (so it is all somehow
> > related to the inode).
> >
> > The spec doesn't give the client permission to cached these details
> > AT
> > ALL. Caching positive results makes perfect sense from a performance
> > perspective and is easily defensible. Caching negative results ...
> > not
> > so much.
>
> Of course the spec allows the client to cache permissions in both NFSv3
> and NFSv4. Otherwise there would be no way we could cache objects at
> all. Enabling the client to gate access to cached data such as
> directory contents, and cached data without troubling the server is the
> whole point of the ACCESS operation. Without that ability, we would
> have to ask the server every time a user tried to 'cd' into a
> directory, called open(), called getdents(), called read(), called
> write(), ... Without that ability, NFSv4 delegations would be
> pointless, because we'd have to go and ask the server for all these
> operations.
> I know the spec is littered with mealy mouthed "The server can revoke
> access permission at any time" statements in various spots, but that
> looks more like something to placate a review by the IBM legal team
> (yes, it happened) rather than a basic assumption that was engineered
> into the protocol. The protocol simply cannot work as described in
> RFC5661 and RFC7530 without the assumption that you can cache.
It isn't explicit though, is it? Not like the caching of data, which is
explicit.
So I agree the reading between the lines tells us that some caching is
needed and so permitted, but that doesn't mean that any and all caching
of ACCESS results is good. We should cache what we need, and no more.
>
> The other point is that access cache changes due to AD/LDAP/NIS group
> membership updates happen once in a blue moon, and when they do, they
> affect _ALL_ cached values for that user, and not just one at a time.
> When those things do happen, the user normally expects to have to log
> out and then log back in so that the new group memberships are
> reflected in that user's cred (as per POSIX prescribed behaviour). That
> will cause the NFS client to send new ACCESS calls to the server,
> because the new cred will not be found in the existing access cache.
> The only case where this assumption fails, is if the server is
> implementing the '--manage-gids' behaviour, and caches its
> representation of the user's creds past the time when the user logged
> out and logged in again.
> So this is literally a 1 in a billion chance problem that you're asking
> us to fix. Optimising for that case by adding in a timeout that affects
> all cached access value is just not in the cards.
I'll accept 1 in several million (only two enterprises identified out of
how many that use Linux NFS clients?). Obviously that doesn't justify
anything potentially costly.
I don't think there is any interesting cost in not caching negative
access results beyond the normal attribute-cache timeout. I don't
recall you addressing this particular issue. What is the cost?
>
> I've already sent out a patch that forces realignment of the NFS client
> behaviour with that of POSIX by requiring the client to recheck cached
> access values for an existing credential after a new login, but does
> not add any further impediments to the access cache. Why is that
> approach not sufficient?
I think you've observed this fact yourself, but propagation times for
group membership changes (e.g. through LDAP) are not uniform. It is
entirely possible for the client to see a new membership before the
server. So with --manage-gids (or krb5) it is possible to log in to the
client, check that group membership has been updated, access a file
and have that fail. But then a minute later (if you could flush the
access cache) have that access succeed.
When I was first presented with this problem I thought logout/login was
the answer and even provided a patch which would mean different logins
got different cache keys. This wasn't very well received because it
used to work without logging out/in (because we didn't cache access
indefinitely), but the killer was that it didn't even work reliably -
because of propagation variability.
Thanks,
NeilBrown
On Mon, 05 Sep 2022, Trond Myklebust wrote:
> On Mon, 2022-09-05 at 09:28 +1000, NeilBrown wrote:
> >
> > When I was first presented with this problem I thought logout/login
> > was
> > the answer and even provided a patch which would mean different
> > logins
> > got different cache keys. This wasn't very well received because it
> > used to work without log out/in (because we didn't cache access
> > indefinitely), but the killer was that it didn't even work reliably -
> > because of propagation variability.
>
> Then the onus is upon you and the people who don't want the two
> proposed solutions to figure out new ones. Timeouts are not acceptable.
Why not? They have always been part of NFS.
But let's forget about timeouts for the moment.
You still haven't answered my question: What is the cost of not caching
negative access results??
NeilBrown
On Mon, 2022-09-05 at 09:28 +1000, NeilBrown wrote:
>
> When I was first presented with this problem I thought logout/login
> was
> the answer and even provided a patch which would mean different
> logins
> got different cache keys. This wasn't very well received because it
> used to work without log out/in (because we didn't cache access
> indefinitely), but the killer was that it didn't even work reliably -
> because of propagation variability.
Then the onus is upon you and the people who don't want the two
proposed solutions to figure out new ones. Timeouts are not acceptable.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Mon, 2022-09-05 at 10:09 +1000, NeilBrown wrote:
> On Mon, 05 Sep 2022, Trond Myklebust wrote:
> > On Mon, 2022-09-05 at 09:28 +1000, NeilBrown wrote:
> > >
> > > When I was first presented with this problem I thought
> > > logout/login
> > > was
> > > the answer and even provided a patch which would mean different
> > > logins
> > > got different cache keys. This wasn't very well received because
> > > it
> > > used to work without log out/in (because we didn't cache access
> > > indefinitely), but the killer was that it didn't even work
> > > reliably -
> > > because of propagation variability.
> >
> > Then the onus is upon you and the people who don't want the two
> > proposed solutions to figure out new ones. Timeouts are not
> > acceptable.
>
> Why not? They have always been part of NFS.
In moderation, yes. However we don't gratuitously fling them at every
problem for the very good reason that they tend to create more problems
than they solve. Look at the proliferation of mount options that
already exist to solve the basic problem of tuning all the timeouts
that we already have.
> But let's forget about timeouts for the moment.
> You still haven't answered my question: What is the cost of not
> caching
> negative access results??
>
What makes you think they are rare enough to compare to the 1 in a
billion cases you're talking about? You're the one who is claiming
without offering any shred of proof that negative results are rare.
As far as I'm concerned, all it takes is a $PATH component that does not
have lookup rights set to allow 'other', or ditto for a library or
config file path component. Alternatively, you happen to have a file
that is not executable in $PATH, but with a name that matches a real
executable further down the list. Suddenly you are hammering the server
despite the fact that the permissions are never observed to change.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 26 Aug 2022, at 20:52, Trond Myklebust wrote:
> Can we please try to solve the real problem first? The real problem is
> not that user permissions change every hour on the server.
>
> POSIX normally only expects changes to happen to your group membership
> when you log in. The problem here is that the NFS server is updating
> its rules concerning your group membership at some random time after
> your log in on the NFS client.
>
> So how about if we just do our best to approximate the POSIX rules, and
> promise to revalidate your cached file access permissions at least once
> after you log in? Then we can let the NFS server do whatever the hell
> it wants to do after that.
> IOW: If the sysadmin changes the group membership for the user, then
> the user can remedy the problem by logging out and then logging back in
> again, just like they do for local filesystems.
This goes a long way toward fixing things up for us, I appreciate it, and
hope to see it merged. The version on your testing branch (d84b059f) can
have my:
Reviewed-by: Benjamin Coddington <[email protected]>
Tested-by: Benjamin Coddington <[email protected]>
Ben
On Tue, 20 Sep 2022, Benjamin Coddington wrote:
> On 26 Aug 2022, at 20:52, Trond Myklebust wrote:
>
> > Can we please try to solve the real problem first? The real problem is
> > not that user permissions change every hour on the server.
> >
> > POSIX normally only expects changes to happen to your group membership
> > when you log in. The problem here is that the NFS server is updating
> > its rules concerning your group membership at some random time after
> > your log in on the NFS client.
> >
> > So how about if we just do our best to approximate the POSIX rules, and
> > promise to revalidate your cached file access permissions at least once
> > after you log in? Then we can let the NFS server do whatever the hell
> > it wants to do after that.
> > IOW: If the sysadmin changes the group membership for the user, then
> > the user can remedy the problem by logging out and then logging back in
> > again, just like they do for local filesystems.
>
> This goes a long way toward fixing things up for us, I appreciate it, and
> hope to see it merged. The version on your testing branch (d84b059f) can
> have my:
>
> Reviewed-by: Benjamin Coddington <[email protected]>
> Tested-by: Benjamin Coddington <[email protected]>
>
The test in that commit can be "gamed".
I could write a tool that double-forks with the intermediate exiting
so the grandchild will be inherited by init. Then the grandchild can
access the problematic path and force the access cache for the current
user to be refreshed. It would optionally need to do a 'find' to be
thorough.
Is this an API we are willing to support indefinitely? Should I write
the tool?
NeilBrown
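The double-fork trick Neil describes is the standard daemonisation
idiom. A sketch of the tool, with an invented helper name and an
illustrative path, might look like this:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of the "gaming" tool Neil describes: the intermediate child
 * exits at once so the grandchild is reparented to init, making it
 * look detached from the login session; the grandchild then touches
 * the path, forcing a fresh ACCESS check for the current cred.
 * Helper name and path are illustrative only. */
static int detach_and_touch(const char *path)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid > 0) {
		waitpid(pid, NULL, 0);	/* reap the intermediate child */
		return 0;
	}
	/* intermediate child: fork again and exit immediately so the
	 * grandchild is inherited by init */
	if (fork() != 0)
		_exit(0);
	/* grandchild: access the problematic path; a 'find' over the
	 * tree would be needed to be thorough */
	int fd = open(path, O_RDONLY);
	if (fd >= 0)
		close(fd);
	_exit(0);
}
```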
On Tue, 2022-09-20 at 08:38 +1000, NeilBrown wrote:
> On Tue, 20 Sep 2022, Benjamin Coddington wrote:
> > On 26 Aug 2022, at 20:52, Trond Myklebust wrote:
> >
> > > Can we please try to solve the real problem first? The real
> > > problem is
> > > not that user permissions change every hour on the server.
> > >
> > > POSIX normally only expects changes to happen to your group
> > > membership
> > > when you log in. The problem here is that the NFS server is
> > > updating
> > > its rules concerning your group membership at some random time
> > > after
> > > your log in on the NFS client.
> > >
> > > So how about if we just do our best to approximate the POSIX
> > > rules, and
> > > promise to revalidate your cached file access permissions at
> > > least once
> > > after you log in? Then we can let the NFS server do whatever the
> > > hell
> > > it wants to do after that.
> > > IOW: If the sysadmin changes the group membership for the user,
> > > then
> > > the user can remedy the problem by logging out and then logging
> > > back in
> > > again, just like they do for local filesystems.
> >
> > This goes a long way toward fixing things up for us, I appreciate
> > it, and
> > hope to see it merged. The version on your testing branch
> > (d84b059f) can
> > have my:
> >
> > Reviewed-by: Benjamin Coddington <[email protected]>
> > Tested-by: Benjamin Coddington <[email protected]>
> >
>
> The test in that commit can be "gamed".
> I could write a tool that double-forks with the intermediate exiting
> so the grandchild will be inherited by init. Then the grandchild can
> access the problematic path and force the access cache for the
> current
> user to be refreshed. It would optionally need to do a 'find' to be
> thorough.
Sure... If two tasks share the same cred, one can do deliberately crazy
stuff to break performance of the other. Is that news?
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]