2023-09-20 23:14:24

by Charles Hedrick

[permalink] [raw]
Subject: bad info in NFS context

Ubuntu 22 client and server (5.15). Mount is 4.2, sec=sys. I add a user to a group, but they can't see things that the group should be able to see. /proc/net/rpc/auth.unix.gid/content shows that the nfs group cache has their group membership. Doing a mount -o vers=4.1 works (4.1 to force a different context). Other users that didn't try before work. It's been several hours, and 4.2 still won't work for this user.

What do I need to flush?

Note that I'm using gssproxy on the server.
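
The server-side check described above can be scripted. The following is a minimal sketch, assuming the cache file has one `uid count: gid gid ...` line per user below a `#` header line. The function name `uid_has_gid` and the optional file argument are made up for illustration.

```shell
# uid_has_gid UID GID [FILE]
# Succeeds if FILE (default: the server's group cache) lists GID for UID.
# Assumed line layout: "<uid> <count>: <gid> <gid> ..."
uid_has_gid() {
    awk -v uid="$1" -v gid="$2" '
        /^#/ { next }                      # skip the header line
        $1 == uid {
            for (i = 3; i <= NF; i++)      # gids start at the third field
                if ($i == gid) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${3:-/proc/net/rpc/auth.unix.gid/content}"
}
```

Run on the server (reading the file typically needs root), `uid_has_gid 1000 27 && echo "cached"` would confirm that the server already knows about the membership, which points the search at the client instead.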


2023-09-21 17:30:55

by Charles Hedrick

[permalink] [raw]
Subject: Re: bad info in NFS context

thanks. I can work with that info. Restarting the server isn't practical. This is a large-scale system serving hundreds of students. We generally keep it up uninterrupted for a whole semester.

> So, the NFS client will keep caching the result of previous calls to unchanged inodes until it notices that the process' oldest parent with the same user/credential has a task start_time that is older than the currently cached entries.

I trust you mean newer. This is jupyterhub, which likes to keep user processes around after logout and reattach when they login. But as long as we know what's going on, there's a way for a user to kill their processes manually.


From: Benjamin Coddington <[email protected]>
Sent: Thursday, September 21, 2023 9:49 AM
To: Charles Hedrick <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: bad info in NFS context
On 21 Sep 2023, at 9:36, Charles Hedrick wrote:

> this is a web app. What is it about logging out and logging in that clears the cache? We'll need to make sure that the web app does it.

There's a very real performance tradeoff to perform NFS ACCESS operations for every part of a tree walk vs. caching the results of previous ACCESS calls. There's not a sane way for the client to know when a user's group membership has changed in order to invalidate the cache, since the change to the user's group membership is not reflected on the inodes themselves. So, the NFS client will keep caching the result of previous calls to unchanged inodes until it notices that the process' oldest parent with the same user/credential has a task start_time that is older than the currently cached entries.

TL;DR - you probably want to restart the web server.

Ben


> ________________________________
> From: Benjamin Coddington <[email protected]>
> Sent: Thursday, September 21, 2023 6:44 AM
> To: Charles Hedrick <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: bad info in NFS context
>
> On 20 Sep 2023, at 19:14, Charles Hedrick wrote:
>
>> Ubuntu 22 client and server (5.15). Mount is 4.2, sec=sys. I add a user to a group, but they can't see things that the group should be able to see. /proc/net/rpc/auth.unix.gid/content shows that the nfs group cache has their group membership. Doing a mount -o vers=4.1 works (4.1 to force a different context). Other users that didn't try before work. It's been several hours, and 4.2 still won't work for this user.
>>
>> What do I need to flush?
>>
>> Note that I'm using gssproxy on the server.
>
> Have the user log out and then back in again after the group change, that
> should cause the user's NFS ACCESS cache to clear.
>
> Ben

2023-09-21 19:04:12

by Benjamin Coddington

[permalink] [raw]
Subject: Re: bad info in NFS context

On 21 Sep 2023, at 9:56, Charles Hedrick wrote:

> thanks. I can work with that info. Restarting the server isn't practical. This is a large-scale system serving hundreds of students. We generally keep it up uninterrupted for a whole semester.

By web server, I mean the process not the system. Though, it must have a lot of local state if you don't have it load-balanced and redundant, so maybe even restarting the process is impractical.

Without trying this, we're still guessing that it's the ACCESS cache. You should be able to do something like "sudo su - webserveruser" and that /should/ count as a login for that process, and so that process /should/ gain the access you need from the new membership. It's worth a test to make sure we're not actually dealing with a different problem.

>>  So, the NFS client will keep caching the result of previous calls to unchanged inodes until it notices that the process' oldest parent with the same user/credential has a task start_time that is older than the currently cached entries.
>
> I trust you mean newer. This is jupyterhub, which likes to keep user processes around after logout and reattach when they login. But as long as we know what's going on, there's a way for a user to kill their processes manually.

There have been some attempts to add an "fasc" or "nofasc" mount option to the upstream NFS client, which would modify the behavior of the client. That's not had a lot of traction (I think because the patch wants to change the default behavior again).

It's possible to submit work to add a sysfs knob to flush the access cache. That could look like a full cache flush for everyone, or maybe, upon writing a uid to a sysfs file, a flush of the cached entries for that uid.

Have you tried talking to your NFS client vendor about this problem?

Ben

2023-09-22 05:19:18

by Benjamin Coddington

[permalink] [raw]
Subject: Re: bad info in NFS context

On 21 Sep 2023, at 9:36, Charles Hedrick wrote:

> this is a web app. What is it about logging out and logging in that clears the cache? We'll need to make sure that the web app does it.

There's a very real performance tradeoff to perform NFS ACCESS operations for every part of a tree walk vs. caching the results of previous ACCESS calls. There's not a sane way for the client to know when a user's group membership has changed in order to invalidate the cache, since the change to the user's group membership is not reflected on the inodes themselves. So, the NFS client will keep caching the result of previous calls to unchanged inodes until it notices that the process' oldest parent with the same user/credential has a task start_time that is older than the currently cached entries.

TL;DR - you probably want to restart the web server.
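
To see which process that "oldest parent with the same user/credential" is for a given task, the parent chain can be walked in /proc. This is only a rough sketch: it compares the real UID from the `Uid:` line of `/proc/<pid>/status`, whereas the kernel compares the full credential, and `PROC_ROOT` is a hypothetical override added here so the walk can be tried against a fake tree.

```shell
# Root of the process tree to inspect; defaults to the real /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

# status_field PID KEY
# Print the first value of a "KEY:" line (e.g. PPid, Uid) from PID's status file.
status_field() {
    awk -v key="$2:" '$1 == key { print $2; exit }' \
        "$PROC_ROOT/$1/status" 2>/dev/null
}

# oldest_same_uid_ancestor PID
# Walk up from PID while each parent has the same real UID; print where it stops.
oldest_same_uid_ancestor() {
    pid="$1"
    uid="$(status_field "$pid" Uid)"
    [ -n "$uid" ] || return 1
    while :; do
        parent="$(status_field "$pid" PPid)"
        # Stop at the top of the tree or if the parent cannot be read.
        [ -n "$parent" ] && [ "$parent" -gt 0 ] || break
        # Stop when the parent belongs to a different user.
        [ "$(status_field "$parent" Uid)" = "$uid" ] || break
        pid="$parent"
    done
    echo "$pid"
}
```

For example, `oldest_same_uid_ancestor $$` prints a PID whose start time can then be checked with `ps -o lstart= -p <pid>`. If the explanation above holds, cached ACCESS results stay in use for as long as that process predates them, which is why a long-lived parent process keeps the stale result alive.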

Ben


> ________________________________
> From: Benjamin Coddington <[email protected]>
> Sent: Thursday, September 21, 2023 6:44 AM
> To: Charles Hedrick <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: bad info in NFS context
>
> On 20 Sep 2023, at 19:14, Charles Hedrick wrote:
>
>> Ubuntu 22 client and server (5.15). Mount is 4.2, sec=sys. I add a user to a group, but they can't see things that the group should be able to see. /proc/net/rpc/auth.unix.gid/content shows that the nfs group cache has their group membership. Doing a mount -o vers=4.1 works (4.1 to force a different context). Other users that didn't try before work. It's been several hours, and 4.2 still won't work for this user.
>>
>> What do I need to flush?
>>
>> Note that I'm using gssproxy on the server.
>
> Have the user log out and then back in again after the group change, that
> should cause the user's NFS ACCESS cache to clear.
>
> Ben

2023-09-22 09:42:33

by Benjamin Coddington

[permalink] [raw]
Subject: Re: bad info in NFS context

On 20 Sep 2023, at 19:14, Charles Hedrick wrote:

> Ubuntu 22 client and server (5.15). Mount is 4.2, sec=sys. I add a user to a group, but they can't see things that the group should be able to see. /proc/net/rpc/auth.unix.gid/content shows that the nfs group cache has their group membership. Doing a mount -o vers=4.1 works (4.1 to force a different context). Other users that didn't try before work. It's been several hours, and 4.2 still won't work for this user.
>
> What do I need to flush?
>
> Note that I'm using gssproxy on the server.

Have the user log out and then back in again after the group change, that
should cause the user's NFS ACCESS cache to clear.

Ben

2023-10-18 15:32:28

by Benjamin Coddington

[permalink] [raw]
Subject: Re: bad info in NFS context

On 18 Oct 2023, at 11:18, Charles Hedrick wrote:

> Ubuntu added fasc, but it's a kernel parameter. It can be changed without
> a reboot, so we were able to fix the problem. It appears that this has
> become the default upstream. I looked at Ubuntu's kernel code for their
> next LTS, and it looks like they've accepted the upstream code, so the
> problem shouldn't occur in the next LTS. Meanwhile we've taken precautions
> to set the option on all our systems.

I'm surprised - I haven't seen any "fasc" mount option go upstream. Where
did you see the upstream work?

I'll probably hack up a patch to dump the access cache via sysfs file and
send that sometime this week, I can copy you if you have interest. I'm
thinking usage would look something like:

echo <uid> > /sys/fs/nfs/0\:57/drop_access_cache

Ben