2014-10-17 14:48:29

by Colin Hudler

Subject: when rpc.mountd flushes auth.unix.gid

We have a few hundred computers mounting an NFS server in a typical
LDAP-based users (nss) setup. We frequently add and remove exports and
use exportfs -r to update etab. Every time we do so, the clients report
"NFS server not responding" and start backing off their requests. After
a painful 3-5 minutes, they recover and life is normal again.

We discovered that when the rpc.mountd cache flushing occurs, our NIS
system is overwhelmed with grouplist requests and this obviously blocks
things. We are working on that problem separately, and I admit this to
be a weakness in our setup. My question is simple.

Why does it flush auth.unix.gid when the etab changed? I think it makes
unnecessary work for rpc.mountd because the gids are unlikely to have
changed, and they already have a reasonable expiration policy.

--
Colin


2014-10-17 21:06:48

by Jeff Layton

Subject: Re: when rpc.mountd flushes auth.unix.gid

On Fri, 17 Oct 2014 09:42:14 -0500
Colin Hudler <[email protected]> wrote:

> We have a few hundred computers mounting an NFS server in a typical
> LDAP-based users (nss) setup. We frequently add and remove exports and
> use exportfs -r to update etab. Every time we do so, the clients report
> "NFS server not responding" and start backing off their requests. After
> a painful 3-5 minutes, they recover and life is normal again.
>
> We discovered that when the rpc.mountd cache flushing occurs, our NIS
> system is overwhelmed with grouplist requests and this obviously blocks
> things. We are working on that problem separately, and I admit this to
> be a weakness in our setup. My question is simple.
>
> Why does it flush auth.unix.gid when the etab changed? I think it makes
> unnecessary work for rpc.mountd because the gids are unlikely to have
> changed, and they already have a reasonable expiration policy.
>

Most likely because no one really cared until now.

When exports change, cache_flush() is called and that function flushes
out all of the kernel caches.

I expect that could be made to do something a bit more granular, but
you may need to do some archaeology in mountd/exportfs (and the kernel)
to ensure that you're not missing anything.

--
Jeff Layton <[email protected]>

2014-10-17 22:24:17

by Thomas Haynes

Subject: Re: when rpc.mountd flushes auth.unix.gid




> On Oct 17, 2014, at 4:06 PM, Jeff Layton <[email protected]> wrote:
>
> On Fri, 17 Oct 2014 09:42:14 -0500
> Colin Hudler <[email protected]> wrote:
>
>> We have a few hundred computers mounting an NFS server in a typical
>> LDAP-based users (nss) setup. We frequently add and remove exports and
>> use exportfs -r to update etab. Every time we do so, the clients report
>> "NFS server not responding" and start backing off their requests. After
>> a painful 3-5 minutes, they recover and life is normal again.
>>
>> We discovered that when the rpc.mountd cache flushing occurs, our NIS
>> system is overwhelmed with grouplist requests and this obviously blocks
>> things. We are working on that problem separately, and I admit this to
>> be a weakness in our setup. My question is simple.
>>
>> Why does it flush auth.unix.gid when the etab changed? I think it makes
>> unnecessary work for rpc.mountd because the gids are unlikely to have
>> changed, and they already have a reasonable expiration policy.
>
> Most likely because no one really cared until now.
>
> When exports change, cache_flush() is called and that function flushes
> out all of the kernel caches.
>
> I expect that could be made to do something a bit more granular, but
> you may need to do some archaeology in mountd/exportfs (and the kernel)
> to ensure that you're not missing anything.
>

One thing would be to not remove the exports which are going to be added back in.

The catch here is that you have to account for new entries which need to be added.



> --
> Jeff Layton <[email protected]>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-10-17 18:19:46

by Colin Hudler

Subject: Re: when rpc.mountd flushes auth.unix.gid



On 10/17/2014 11:15 AM, Tom Haynes wrote:
>
> On Oct 17, 2014, at 9:42 AM, Colin Hudler <[email protected]> wrote:
>
>> We have a few hundred computers mounting an NFS server in a typical LDAP-based users (nss) setup. We frequently add and remove exports and use exportfs -r to update etab.
>
> I know this isn't your question, but you would be better served by using explicit exportfs -a and exportfs -r commands for the specific changes.
>

Thank you for the insight and suggestions. We are considering changing
our methods but it requires breaking some (long-standing, internal)
abstractions and weighing the risk associated with that. In short, our
automations suck and cannot be changed so easily. However, manual-mode
is always an option.

>
>
>> Every time we do so, the clients report "NFS server not responding" and start backing off their requests. After a painful 3-5 minutes, they recover and life is normal again.
>>
>> We discovered that when the rpc.mountd cache flushing occurs, our NIS system is overwhelmed with grouplist requests and this obviously blocks things. We are working on that problem separately, and I admit this to be a weakness in our setup. My question is simple.
>>
>> Why does it flush auth.unix.gid when the etab changed? I think it makes unnecessary work for rpc.mountd because the gids are unlikely to have changed,
>
> Another assumption is that exports rarely change. I expect your setup is an exception to the rule.
>
>
>> and they already have a reasonable expiration policy.
>
> One way to read what the man page states for exportfs -r:
>
> -r Reexport all directories, synchronizing /var/lib/nfs/etab with /etc/exports and files under /etc/exports.d.
> This option removes entries in /var/lib/nfs/etab which have been deleted from /etc/exports or files under
> /etc/exports.d, and removes any entries from the kernel export table which are no longer valid.
>
> is that it only removes entries which have been deleted.
>
> Instead, it removes all entries and re-exports those that are still valid. Removing everything is what blows away the auth.unix.gid cache.
>
> Using exportfs -a <path> and exportfs -r <path> should solve this for you.


Understood. I am now tempted to rework exportfs -r into a loop over
dump(). Thanks again.
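
A minimal sketch of the incremental idea discussed above, in Python. It assumes the current and desired exports are already available as sets of (client, path) pairs; the name plan_export_changes and the sample data are hypothetical, and export options are ignored for brevity. It only plans the exportfs invocations. Whether each invocation still triggers a cache flush depends on the nfs-utils version and is not addressed here.

```python
def plan_export_changes(current, desired):
    """Return exportfs argument lists that turn `current` into `desired`.

    Both arguments are sets of (client, path) pairs. Entries present in
    both sets are left alone, so nothing is removed only to be re-added.
    """
    to_remove = sorted(current - desired)  # exported now, no longer wanted
    to_add = sorted(desired - current)     # wanted, not yet exported
    commands = [["exportfs", "-u", f"{client}:{path}"] for client, path in to_remove]
    commands += [["exportfs", f"{client}:{path}"] for client, path in to_add]
    return commands


# Hypothetical sample data: /srv/b stays, /srv/a goes away, /srv/c is new.
current = {("10.0.0.0/24", "/srv/a"), ("10.0.0.0/24", "/srv/b")}
desired = {("10.0.0.0/24", "/srv/b"), ("10.0.0.0/24", "/srv/c")}
for cmd in plan_export_changes(current, desired):
    print(" ".join(cmd))
```

Unchanged entries never appear in the plan, which covers the "do not remove what will be added back" point, while the set difference in the other direction accounts for new entries.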

2014-10-17 23:37:46

by Jeff Layton

Subject: Re: when rpc.mountd flushes auth.unix.gid

On Fri, 17 Oct 2014 17:24:14 -0500
Tom Haynes <[email protected]> wrote:

>
>
>
> > On Oct 17, 2014, at 4:06 PM, Jeff Layton <[email protected]> wrote:
> >
> > On Fri, 17 Oct 2014 09:42:14 -0500
> > Colin Hudler <[email protected]> wrote:
> >
> >> We have a few hundred computers mounting an NFS server in a typical
> >> LDAP-based users (nss) setup. We frequently add and remove exports and
> >> use exportfs -r to update etab. Every time we do so, the clients report
> >> "NFS server not responding" and start backing off their requests. After
> >> a painful 3-5 minutes, they recover and life is normal again.
> >>
> >> We discovered that when the rpc.mountd cache flushing occurs, our NIS
> >> system is overwhelmed with grouplist requests and this obviously blocks
> >> things. We are working on that problem separately, and I admit this to
> >> be a weakness in our setup. My question is simple.
> >>
> >> Why does it flush auth.unix.gid when the etab changed? I think it makes
> >> unnecessary work for rpc.mountd because the gids are unlikely to have
> >> changed, and they already have a reasonable expiration policy.
> >
> > Most likely because no one really cared until now.
> >
> > When exports change, cache_flush() is called and that function flushes
> > out all of the kernel caches.
> >
> > I expect that could be made to do something a bit more granular, but
> > you may need to do some archaeology in mountd/exportfs (and the kernel)
> > to ensure that you're not missing anything.
> >
>
> One thing would be to not remove the exports which are going to be added back in.
>
> The catch here is that you have to account for new entries which need to be added.
>
>

I'm not sure that flushing the uid or gid caches is really necessary on
an exports change at all. I don't think we expect that info to change.

In practical terms, we might be able to change exportfs to just flush
the nfsd.fh and nfsd.export caches instead of a full cache_flush() ?

--
Jeff Layton <[email protected]>

2014-10-18 10:10:38

by Jeff Layton

Subject: Re: when rpc.mountd flushes auth.unix.gid

On Fri, 17 Oct 2014 21:21:18 -0500
Tom Haynes <[email protected]> wrote:

>
> On Oct 17, 2014, at 6:37 PM, Jeff Layton <[email protected]> wrote:
>
> > On Fri, 17 Oct 2014 17:24:14 -0500
> > Tom Haynes <[email protected]> wrote:
> >
> >>
> >>
> >>
> >>> On Oct 17, 2014, at 4:06 PM, Jeff Layton <[email protected]> wrote:
> >>>
> >>> On Fri, 17 Oct 2014 09:42:14 -0500
> >>> Colin Hudler <[email protected]> wrote:
> >>>
> >>>> We have a few hundred computers mounting an NFS server in a typical
> >>>> LDAP-based users (nss) setup. We frequently add and remove exports and
> >>>> use exportfs -r to update etab. Every time we do so, the clients report
> >>>> "NFS server not responding" and start backing off their requests. After
> >>>> a painful 3-5 minutes, they recover and life is normal again.
> >>>>
> >>>> We discovered that when the rpc.mountd cache flushing occurs, our NIS
> >>>> system is overwhelmed with grouplist requests and this obviously blocks
> >>>> things. We are working on that problem separately, and I admit this to
> >>>> be a weakness in our setup. My question is simple.
> >>>>
> >>>> Why does it flush auth.unix.gid when the etab changed? I think it makes
> >>>> unnecessary work for rpc.mountd because the gids are unlikely to have
> >>>> changed, and they already have a reasonable expiration policy.
> >>>
> >>> Most likely because no one really cared until now.
> >>>
> >>> When exports change, cache_flush() is called and that function flushes
> >>> out all of the kernel caches.
> >>>
> >>> I expect that could be made to do something a bit more granular, but
> >>> you may need to do some archaeology in mountd/exportfs (and the kernel)
> >>> to ensure that you're not missing anything.
> >>>
> >>
> >> One thing would be to not remove the exports which are going to be added back in.
> >>
> >> The catch here is that you have to account for new entries which need to be added.
> >>
> >>
> >
> > I'm not sure that flushing the uid or gid caches is really necessary on
> > an exports change at all. I don't think we expect that info to change.
>
> Is there a manual way to flush these caches?
>
> Bump down the default TTL?
>
>

The manual way is to write to /proc/net/rpc/*/flush (which is what
cache_flush() in nfs-utils does). The comments over it say:

/* flush the kNFSd caches.
 * Set the flush time to the mtime of _PATH_ETAB or
 * if force, to now.
 * the caches to flush are:
 *  auth.unix.ip nfsd.export nfsd.fh
 */

...but it looks like auth.unix.gid was added in 2007 and the
comment wasn't updated.
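
A rough sketch of a manual, narrower flush, written in Python rather than the C used by nfs-utils. The function name flush_caches and its base parameter are made up for illustration. It assumes that a decimal timestamp followed by a newline, written to the flush file, is what the kernel expects, which is what cache_flush() appears to do but should be checked against the source. On a real server this needs root and base="/proc/net/rpc"; the demonstration below runs against a throwaway directory instead.

```python
import os
import tempfile
import time


def flush_caches(cache_names, base="/proc/net/rpc", when=None):
    """Write a flush time to <base>/<name>/flush for each named cache.

    `when` is seconds since the epoch and defaults to now. Returns the
    list of paths written. Caches whose flush file is missing are skipped.
    """
    stamp = f"{int(time.time() if when is None else when)}\n"
    written = []
    for name in cache_names:
        path = os.path.join(base, name, "flush")
        if not os.path.exists(path):
            continue
        with open(path, "w") as fh:
            fh.write(stamp)
        written.append(path)
    return written


# Demonstration against a temporary tree that mimics the /proc layout.
with tempfile.TemporaryDirectory() as base:
    for name in ("auth.unix.ip", "auth.unix.gid", "nfsd.fh", "nfsd.export"):
        os.makedirs(os.path.join(base, name))
        open(os.path.join(base, name, "flush"), "w").close()
    # Flush only the export-related caches, leaving auth.unix.gid alone.
    touched = flush_caches(["nfsd.fh", "nfsd.export"], base=base, when=1413540000)
    print(touched)
```

Passing only nfsd.fh and nfsd.export leaves the auth.unix.gid entries in place, so they would expire under their normal policy instead of all at once.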

> >
> > In practical terms, we might be able to change exportfs to just flush
> > the nfsd.fh and nfsd.export caches instead of a full cache_flush() ?
> >
> > --
> > Jeff Layton <[email protected]>
>


--
Jeff Layton <[email protected]>

2014-10-17 16:15:55

by Thomas Haynes

Subject: Re: when rpc.mountd flushes auth.unix.gid


On Oct 17, 2014, at 9:42 AM, Colin Hudler <[email protected]> wrote:

> We have a few hundred computers mounting an NFS server in a typical LDAP-based users (nss) setup. We frequently add and remove exports and use exportfs -r to update etab.

I know this isn't your question, but you would be better served by using explicit exportfs -a and exportfs -r commands for the specific changes.



> Every time we do so, the clients report "NFS server not responding" and start backing off their requests. After a painful 3-5 minutes, they recover and life is normal again.
>
> We discovered that when the rpc.mountd cache flushing occurs, our NIS system is overwhelmed with grouplist requests and this obviously blocks things. We are working on that problem separately, and I admit this to be a weakness in our setup. My question is simple.
>
> Why does it flush auth.unix.gid when the etab changed? I think it makes unnecessary work for rpc.mountd because the gids are unlikely to have changed,

Another assumption is that exports rarely change. I expect your setup is an exception to the rule.


> and they already have a reasonable expiration policy.

One way to read what the man page states for exportfs -r:

-r Reexport all directories, synchronizing /var/lib/nfs/etab with /etc/exports and files under /etc/exports.d.
This option removes entries in /var/lib/nfs/etab which have been deleted from /etc/exports or files under
/etc/exports.d, and removes any entries from the kernel export table which are no longer valid.

is that it only removes entries which have been deleted.

Instead, it removes all entries and re-exports those that are still valid. Removing everything is what blows away the auth.unix.gid cache.

Using exportfs -a <path> and exportfs -r <path> should solve this for you.




>
> --
> Colin