2021-06-28 23:41:06

by Olga Kornievskaia

Subject: client's caching of server-side capabilities

Hi folks,

I have a general question about why the client doesn't throw away the
cached server capabilities on server reboot. Say a client mounted a
server when the server didn't support security_labels, then the server
was rebooted and support was enabled. The client re-establishes its
clientid/session and recovers state, but assumes all the old
capabilities still apply. A remount is required to clear the old
capabilities and find the new ones. The opposite is also possible: a
capability could be removed (but I'm assuming that's a less practical
example).

I'm curious what the problems are with clearing server capabilities and
rediscovering them on reboot. Is it because a local filesystem could
never have its attributes changed and thus a network file system can't
either?

Thank you.
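
To make the behavior described above concrete, here is a minimal sketch in Python. It is not the Linux client's code: the names Cap, Server, and Mount and the two capability bits are hypothetical, chosen only to model "probe once at mount time, keep the cache across a server reboot".

```python
from enum import Flag, auto


class Cap(Flag):
    """Hypothetical capability bits, loosely in the spirit of NFS_CAP_*."""
    NONE = 0
    SECURITY_LABEL = auto()
    ACLS = auto()


class Server:
    """Toy server whose feature set can change across a reboot."""

    def __init__(self, caps):
        self.caps = caps
        self.boot_instance = 0

    def reboot(self, new_caps):
        self.caps = new_caps
        self.boot_instance += 1


class Mount:
    """Toy client mount that probes capabilities once, at mount time."""

    def __init__(self, server):
        self.server = server
        self.cached_caps = server.caps          # the only probe
        self.seen_boot = server.boot_instance

    def recover_after_reboot(self):
        # Models re-establishing clientid/session and reclaiming state.
        # The cached capabilities are deliberately left untouched.
        self.seen_boot = self.server.boot_instance

    def supports(self, cap):
        return bool(self.cached_caps & cap)
```

In this model, a mount created before the reboot keeps reporting the old feature set after recover_after_reboot(), while a mount created afterwards sees the new one.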


2021-06-28 23:43:15

by Trond Myklebust

Subject: Re: client's caching of server-side capabilities

On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> Hi folks,
>
> I have a general question of why the client doesn't throw away the
> cached server's capabilities on server reboot. Say a client mounted a
> server when the server didn't support security_labels, then the
> server
> was rebooted and support was enabled. Client re-establishes its
> clientid/session, recovers state, but assumes all the old
> capabilities
> apply. A remount is required to clear old/find new capabilities. The
> opposite is true that a capability could be removed (but I'm assuming
> that's a less practical example).
>
> I'm curious what are the problems of clearing server capabilities and
> rediscovering them on reboot? Is it because a local filesystem could
> never have its attributes changed and thus a network file system
> can't
> either?
>
> Thank you.

In my opinion, the client should aim for the absolute minimum overhead
on a server reboot. The goal should be to recover state and get I/O
started again as quickly as possible. Detection of new features, etc.,
can wait until the client needs to restart.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-06-29 13:02:14

by Chuck Lever III

Subject: Re: client's caching of server-side capabilities



> On Jun 28, 2021, at 6:06 PM, Trond Myklebust <[email protected]> wrote:
>
> On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
>> Hi folks,
>>
>> I have a general question of why the client doesn't throw away the
>> cached server's capabilities on server reboot. Say a client mounted a
>> server when the server didn't support security_labels, then the
>> server
>> was rebooted and support was enabled. Client re-establishes its
>> clientid/session, recovers state, but assumes all the old
>> capabilities
>> apply. A remount is required to clear old/find new capabilities. The
>> opposite is true that a capability could be removed (but I'm assuming
>> that's a less practical example).
>>
>> I'm curious what are the problems of clearing server capabilities and
>> rediscovering them on reboot? Is it because a local filesystem could
>> never have its attributes changed and thus a network file system
>> can't
>> either?
>>
>> Thank you.
>
> In my opinion, the client should aim for the absolute minimum overhead
> on a server reboot. The goal should be to recover state and get I/O
> started again as quickly as possible.

I 100% agree with the above. However...


> Detection of new features, etc
> can wait until the client needs to restart.

A server reboot can be part of a failover to a different server. I
think capability discovery needs to happen as part of server reboot
recovery; it can't be optimized away.


--
Chuck Lever



2021-06-29 13:50:58

by Olga Kornievskaia

Subject: Re: client's caching of server-side capabilities

On Mon, Jun 28, 2021 at 6:06 PM Trond Myklebust <[email protected]> wrote:
>
> On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> > Hi folks,
> >
> > I have a general question of why the client doesn't throw away the
> > cached server's capabilities on server reboot. Say a client mounted a
> > server when the server didn't support security_labels, then the
> > server
> > was rebooted and support was enabled. Client re-establishes its
> > clientid/session, recovers state, but assumes all the old
> > capabilities
> > apply. A remount is required to clear old/find new capabilities. The
> > opposite is true that a capability could be removed (but I'm assuming
> > that's a less practical example).
> >
> > I'm curious what are the problems of clearing server capabilities and
> > rediscovering them on reboot? Is it because a local filesystem could
> > never have its attributes changed and thus a network file system
> > can't
> > either?
> >
> > Thank you.
>
> In my opinion, the client should aim for the absolute minimum overhead
> on a server reboot. The goal should be to recover state and get I/O
> started again as quickly as possible. Detection of new features, etc
> can wait until the client needs to restart.

Do I interpret this correctly: no capability discovery before
RECLAIM_COMPLETE, but perhaps after? I agree that reboot recovery
should be done as quickly as possible. If discovery happens some time
after, then perhaps it can be done on demand through, say, an NFS sysfs
API that can clear the current capabilities (or a specific one) and
discover new ones?

The use case I'm going for is when a server upgrades and comes up with
support for new features. Currently, it requires a client re-mount.
But perhaps requiring "mount -o remount" in that case isn't any
different than requiring use of sysfs.

Thank you for the feedback.
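
The idea above (no discovery during recovery, an on-demand reprobe afterwards) can be sketched roughly as follows. This is only an illustration in Python: RecoveringMount, its methods, and the string capability names are all hypothetical, and no such sysfs interface is implied to exist.

```python
class RecoveringMount:
    """Toy mount that refuses to reprobe until recovery has completed."""

    def __init__(self, probe):
        # probe: hypothetical callable returning the server's current
        # capability names as an iterable of strings.
        self.probe = probe
        self.caps = set(probe())
        self.reclaim_complete = True
        self.log = []

    def server_rebooted(self):
        self.reclaim_complete = False
        self.log.append("reboot detected")

    def finish_recovery(self):
        # Recovery stays minimal: reclaim state, then RECLAIM_COMPLETE.
        self.log.append("state reclaimed")
        self.reclaim_complete = True
        self.log.append("RECLAIM_COMPLETE")

    def reprobe(self, only=None):
        """On-demand reprobe of all capabilities, or of a single one."""
        if not self.reclaim_complete:
            raise RuntimeError("recovery in progress; reprobe deferred")
        current = set(self.probe())
        if only is None:
            self.caps = current
        elif only in current:
            self.caps.add(only)
        else:
            self.caps.discard(only)
        self.log.append("capabilities reprobed")
```

An administrator-triggered call to reprobe() after recovery plays the role of the proposed sysfs knob here; whether that is any more convenient than a remount is exactly the open question.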

>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2021-06-29 13:52:45

by Olga Kornievskaia

Subject: Re: client's caching of server-side capabilities

On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III <[email protected]> wrote:
>
>
>
> > On Jun 28, 2021, at 6:06 PM, Trond Myklebust <[email protected]> wrote:
> >
> > On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> >> Hi folks,
> >>
> >> I have a general question of why the client doesn't throw away the
> >> cached server's capabilities on server reboot. Say a client mounted a
> >> server when the server didn't support security_labels, then the
> >> server
> >> was rebooted and support was enabled. Client re-establishes its
> >> clientid/session, recovers state, but assumes all the old
> >> capabilities
> >> apply. A remount is required to clear old/find new capabilities. The
> >> opposite is true that a capability could be removed (but I'm assuming
> >> that's a less practical example).
> >>
> >> I'm curious what are the problems of clearing server capabilities and
> >> rediscovering them on reboot? Is it because a local filesystem could
> >> never have its attributes changed and thus a network file system
> >> can't
> >> either?
> >>
> >> Thank you.
> >
> > In my opinion, the client should aim for the absolute minimum overhead
> > on a server reboot. The goal should be to recover state and get I/O
> > started again as quickly as possible.
>
> I 100% agree with the above. However...
>
>
> > Detection of new features, etc
> > can wait until the client needs to restart.
>
> A server reboot can be part of a failover to a different server. I
> think capability discovery needs to happen as part of server reboot
> recovery, it can't be optimized away.

Can you clarify what you mean by a "failover to a different server"?
To do reboot recovery it has to be the "same" server (by the
definitions of the RFC). The use case I was thinking of was a reboot of
the "same" server (same major, minor, and scope) but with new features.
Of course, one could argue that if it has new features it's no longer
the "same" server. I think you are probably thinking about migration.
Or are you thinking of telling the difference between session-trunkable
servers, which are considered to be the same but, being on a different
IP, might have different capabilities?

Thank you for the feedback!

>
>
> --
> Chuck Lever
>
>
>

2021-06-29 13:53:37

by Chuck Lever III

Subject: Re: client's caching of server-side capabilities



> On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <[email protected]> wrote:
>
> On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III <[email protected]> wrote:
>>
>>
>>
>>> On Jun 28, 2021, at 6:06 PM, Trond Myklebust <[email protected]> wrote:
>>>
>>> On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
>>>> Hi folks,
>>>>
>>>> I have a general question of why the client doesn't throw away the
>>>> cached server's capabilities on server reboot. Say a client mounted a
>>>> server when the server didn't support security_labels, then the
>>>> server
>>>> was rebooted and support was enabled. Client re-establishes its
>>>> clientid/session, recovers state, but assumes all the old
>>>> capabilities
>>>> apply. A remount is required to clear old/find new capabilities. The
>>>> opposite is true that a capability could be removed (but I'm assuming
>>>> that's a less practical example).
>>>>
>>>> I'm curious what are the problems of clearing server capabilities and
>>>> rediscovering them on reboot? Is it because a local filesystem could
>>>> never have its attributes changed and thus a network file system
>>>> can't
>>>> either?
>>>>
>>>> Thank you.
>>>
>>> In my opinion, the client should aim for the absolute minimum overhead
>>> on a server reboot. The goal should be to recover state and get I/O
>>> started again as quickly as possible.
>>
>> I 100% agree with the above. However...
>>
>>
>>> Detection of new features, etc
>>> can wait until the client needs to restart.
>>
>> A server reboot can be part of a failover to a different server. I
>> think capability discovery needs to happen as part of server reboot
>> recovery, it can't be optimized away.
>
> Can you clarify what you mean by a "failover to a different server"?

IP-based failover means that a server can crash, and its partner can
detect that and take over the IP address and exports of the failed
server. The replacement server doesn't have to have exactly the same
set of capabilities.
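
As a rough illustration of why this matters, the following Python sketch compares what a client has cached with what a replacement server actually offers. The function name and the capability names are hypothetical.

```python
def capability_drift(cached, actual):
    """Return (gained, lost) as frozensets.

    gained: offered by the replacement server but unknown to the client.
    lost:   still assumed by the client but no longer offered.
    """
    cached, actual = frozenset(cached), frozenset(actual)
    return actual - cached, cached - actual
```

A non-empty "gained" set is the upgrade case from the original question; a non-empty "lost" set is the case where the client's cached view overstates what the partner can do.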


> To do reboot recovery it has to be the "same" server (by the
> definitions of the RFC). My use case I was thinking of was a reboot of
> the "same" server (major, minor, scope same) but with new features but
> of course one could argue if it has new features it's no longer the
> "same" server. I think you are probably thinking about migration or
> are you thinking of telling a difference between session trunkable
> servers which are considered to be the same but since it's a different
> IP it might have different capabilities?
>
> Thank you for the feedback!
>
>>
>>
>> --
>> Chuck Lever

--
Chuck Lever



2021-06-29 16:14:09

by Olga Kornievskaia

Subject: Re: client's caching of server-side capabilities

On Tue, Jun 29, 2021 at 9:41 AM Olga Kornievskaia <[email protected]> wrote:
>
> On Mon, Jun 28, 2021 at 6:06 PM Trond Myklebust <[email protected]> wrote:
> >
> > On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> > > Hi folks,
> > >
> > > I have a general question of why the client doesn't throw away the
> > > cached server's capabilities on server reboot. Say a client mounted a
> > > server when the server didn't support security_labels, then the
> > > server
> > > was rebooted and support was enabled. Client re-establishes its
> > > clientid/session, recovers state, but assumes all the old
> > > capabilities
> > > apply. A remount is required to clear old/find new capabilities. The
> > > opposite is true that a capability could be removed (but I'm assuming
> > > that's a less practical example).
> > >
> > > I'm curious what are the problems of clearing server capabilities and
> > > rediscovering them on reboot? Is it because a local filesystem could
> > > never have its attributes changed and thus a network file system
> > > can't
> > > either?
> > >
> > > Thank you.
> >
> > In my opinion, the client should aim for the absolute minimum overhead
> > on a server reboot. The goal should be to recover state and get I/O
> > started again as quickly as possible. Detection of new features, etc
> > can wait until the client needs to restart.
>
> Do I interpret this correctly: no capability discoveries before
> RECLAIM_COMPLETE but perhaps after? I agree that reboot recovery
> should be done as quickly as possible. If it's some time after, then
> perhaps it can be done on-demand thru say nfs sysfs api: have ability
> to clear current capabilities (or a specific one) and do discover new
> ones?
>
> The use case I'm going for is when a server upgrades and comes up with
> support for new features. Currently, it requires a client re-mount.
> But perhaps requiring "mount -o remount" in that case isn't any
> different than requiring use of sysfs.

Actually, I tried to do a "mount -o remount" after taking down the
server and changing its features (i.e., security label support), and
the client does not query for supported attributes. So I think either
that can be changed somehow, or we do need a sysfs API to be able to
change the server's capabilities for a given mount.

>
> Thank you for the feedback.
>
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > [email protected]
> >
> >

2021-06-29 17:11:07

by Trond Myklebust

Subject: Re: client's caching of server-side capabilities

On Tue, 2021-06-29 at 12:12 -0400, Olga Kornievskaia wrote:
> On Tue, Jun 29, 2021 at 9:41 AM Olga Kornievskaia <[email protected]>
> wrote:
> >
> > On Mon, Jun 28, 2021 at 6:06 PM Trond Myklebust
> > <[email protected]> wrote:
> > >
> > > On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> > > > Hi folks,
> > > >
> > > > I have a general question of why the client doesn't throw away
> > > > the
> > > > cached server's capabilities on server reboot. Say a client
> > > > mounted a
> > > > server when the server didn't support security_labels, then the
> > > > server
> > > > was rebooted and support was enabled. Client re-establishes its
> > > > clientid/session, recovers state, but assumes all the old
> > > > capabilities
> > > > apply. A remount is required to clear old/find new
> > > > capabilities. The
> > > > opposite is true that a capability could be removed (but I'm
> > > > assuming
> > > > that's a less practical example).
> > > >
> > > > I'm curious what are the problems of clearing server
> > > > capabilities and
> > > > rediscovering them on reboot? Is it because a local filesystem
> > > > could
> > > > never have its attributes changed and thus a network file
> > > > system
> > > > can't
> > > > either?
> > > >
> > > > Thank you.
> > >
> > > In my opinion, the client should aim for the absolute minimum
> > > overhead
> > > on a server reboot. The goal should be to recover state and get
> > > I/O
> > > started again as quickly as possible. Detection of new features,
> > > etc
> > > can wait until the client needs to restart.
> >
> > Do I interpret this correctly: no capability discoveries before
> > RECLAIM_COMPLETE but perhaps after? I agree that reboot recovery
> > should be done as quickly as possible. If it's some time after,
> > then
> > perhaps it can be done on-demand thru say nfs sysfs api: have
> > ability
> > to clear current capabilities (or a specific one) and do discover
> > new
> > ones?
> >
> > The use case I'm going for is when a server upgrades and comes up
> > with
> > support for new features. Currently, it requires a client re-mount.
> > But perhaps requiring "mount -o remount" in that case isn't any
> > different than requiring use of sysfs.
>
> Actually, I tried to do a "mount -o remount" after taking down the
> server and changing its features (ie security label support), and the
> client does not query for supported attributes. So I think either at
> least perhaps that can be changed somehow or we do need a sysfs api
> to
> be able to change server's capabilities of a given mount.
> > >


'-o remount' does not currently try to reprobe server capabilities, but
we could have it do that.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-06-30 15:25:33

by J. Bruce Fields

Subject: Re: client's caching of server-side capabilities

On Tue, Jun 29, 2021 at 01:51:43PM +0000, Chuck Lever III wrote:
>
>
> > On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <[email protected]> wrote:
> >
> > On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III <[email protected]> wrote:
> >>
> >>
> >>
> >>> On Jun 28, 2021, at 6:06 PM, Trond Myklebust <[email protected]> wrote:
> >>>
> >>> On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> >>>> Hi folks,
> >>>>
> >>>> I have a general question of why the client doesn't throw away the
> >>>> cached server's capabilities on server reboot. Say a client mounted a
> >>>> server when the server didn't support security_labels, then the
> >>>> server
> >>>> was rebooted and support was enabled. Client re-establishes its
> >>>> clientid/session, recovers state, but assumes all the old
> >>>> capabilities
> >>>> apply. A remount is required to clear old/find new capabilities. The
> >>>> opposite is true that a capability could be removed (but I'm assuming
> >>>> that's a less practical example).
> >>>>
> >>>> I'm curious what are the problems of clearing server capabilities and
> >>>> rediscovering them on reboot? Is it because a local filesystem could
> >>>> never have its attributes changed and thus a network file system
> >>>> can't
> >>>> either?
> >>>>
> >>>> Thank you.
> >>>
> >>> In my opinion, the client should aim for the absolute minimum overhead
> >>> on a server reboot. The goal should be to recover state and get I/O
> >>> started again as quickly as possible.
> >>
> >> I 100% agree with the above. However...
> >>
> >>
> >>> Detection of new features, etc
> >>> can wait until the client needs to restart.
> >>
> >> A server reboot can be part of a failover to a different server. I
> >> think capability discovery needs to happen as part of server reboot
> >> recovery, it can't be optimized away.
> >
> > Can you clarify what you mean by a "failover to a different server"?
>
> IP-based failover means that a server can crash, and its partner can
> detect that and take over the IP address and exports of the failed
> server. The replacement server doesn't have to have exactly the same
> set of capabilities.

So it could also lose capabilities?

I'm a little nervous about server features being changed out from under
the client while the client has the server mounted.

But, I don't know, looking quickly through the list of NFS_CAP_*
definitions in nfs_fs_sb.h, I'm not coming up with a case where we
couldn't handle it. Maybe it's OK.

--b.

2021-06-30 15:53:20

by Trond Myklebust

Subject: Re: client's caching of server-side capabilities

On Wed, 2021-06-30 at 11:22 -0400, J. Bruce Fields wrote:
> On Tue, Jun 29, 2021 at 01:51:43PM +0000, Chuck Lever III wrote:
> >
> >
> > > On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <[email protected]>
> > > wrote:
> > >
> > > On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III
> > > <[email protected]> wrote:
> > > >
> > > >
> > > >
> > > > > On Jun 28, 2021, at 6:06 PM, Trond Myklebust
> > > > > <[email protected]> wrote:
> > > > >
> > > > > On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > I have a general question of why the client doesn't throw
> > > > > > away the
> > > > > > cached server's capabilities on server reboot. Say a client
> > > > > > mounted a
> > > > > > server when the server didn't support security_labels, then
> > > > > > the
> > > > > > server
> > > > > > was rebooted and support was enabled. Client re-establishes
> > > > > > its
> > > > > > clientid/session, recovers state, but assumes all the old
> > > > > > capabilities
> > > > > > apply. A remount is required to clear old/find new
> > > > > > capabilities. The
> > > > > > opposite is true that a capability could be removed (but
> > > > > > I'm assuming
> > > > > > that's a less practical example).
> > > > > >
> > > > > > I'm curious what are the problems of clearing server
> > > > > > capabilities and
> > > > > > rediscovering them on reboot? Is it because a local
> > > > > > filesystem could
> > > > > > never have its attributes changed and thus a network file
> > > > > > system
> > > > > > can't
> > > > > > either?
> > > > > >
> > > > > > Thank you.
> > > > >
> > > > > In my opinion, the client should aim for the absolute minimum
> > > > > overhead
> > > > > on a server reboot. The goal should be to recover state and
> > > > > get I/O
> > > > > started again as quickly as possible.
> > > >
> > > > I 100% agree with the above. However...
> > > >
> > > >
> > > > > Detection of new features, etc
> > > > > can wait until the client needs to restart.
> > > >
> > > > A server reboot can be part of a failover to a different
> > > > server. I
> > > > think capability discovery needs to happen as part of server
> > > > reboot
> > > > recovery, it can't be optimized away.
> > >
> > > Can you clarify what you mean by a "failover to a different
> > > server"?
> >
> > IP-based failover means that a server can crash, and its partner
> > can
> > detect that and take over the IP address and exports of the failed
> > server. The replacement server doesn't have to have exactly the
> > same
> > set of capabilities.
>
> So it could also lose capabilities?
>
> I'm a little nervous about server features being changed out from
> under
> the client while the client has the server mounted.
>
> But, I don't know, looking quickly through the list of NFS_CAP_*
> definitions in nfs_fs_sb.h, I'm not coming up with a case where we
> couldn't handle it, maybe it's OK.
>
> --b.

I'm not taking any patches for the server reboot case. If someone wants
to do it for the migration case, then fine: that's not a case that is
common or that requires performance. However, reprobing all mounted
filesystems on every server reboot is NACKed.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-06-30 16:49:49

by Olga Kornievskaia

Subject: Re: client's caching of server-side capabilities

On Wed, Jun 30, 2021 at 11:23 AM J. Bruce Fields <[email protected]> wrote:
>
> On Tue, Jun 29, 2021 at 01:51:43PM +0000, Chuck Lever III wrote:
> >
> >
> > > On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <[email protected]> wrote:
> > >
> > > On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III <[email protected]> wrote:
> > >>
> > >>
> > >>
> > >>> On Jun 28, 2021, at 6:06 PM, Trond Myklebust <[email protected]> wrote:
> > >>>
> > >>> On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> > >>>> Hi folks,
> > >>>>
> > >>>> I have a general question of why the client doesn't throw away the
> > >>>> cached server's capabilities on server reboot. Say a client mounted a
> > >>>> server when the server didn't support security_labels, then the
> > >>>> server
> > >>>> was rebooted and support was enabled. Client re-establishes its
> > >>>> clientid/session, recovers state, but assumes all the old
> > >>>> capabilities
> > >>>> apply. A remount is required to clear old/find new capabilities. The
> > >>>> opposite is true that a capability could be removed (but I'm assuming
> > >>>> that's a less practical example).
> > >>>>
> > >>>> I'm curious what are the problems of clearing server capabilities and
> > >>>> rediscovering them on reboot? Is it because a local filesystem could
> > >>>> never have its attributes changed and thus a network file system
> > >>>> can't
> > >>>> either?
> > >>>>
> > >>>> Thank you.
> > >>>
> > >>> In my opinion, the client should aim for the absolute minimum overhead
> > >>> on a server reboot. The goal should be to recover state and get I/O
> > >>> started again as quickly as possible.
> > >>
> > >> I 100% agree with the above. However...
> > >>
> > >>
> > >>> Detection of new features, etc
> > >>> can wait until the client needs to restart.
> > >>
> > >> A server reboot can be part of a failover to a different server. I
> > >> think capability discovery needs to happen as part of server reboot
> > >> recovery, it can't be optimized away.
> > >
> > > Can you clarify what you mean by a "failover to a different server"?
> >
> > IP-based failover means that a server can crash, and its partner can
> > detect that and take over the IP address and exports of the failed
> > server. The replacement server doesn't have to have exactly the same
> > set of capabilities.
>
> So it could also lose capabilities?

Well, wouldn't the client lose capabilities even now? Operations
relying on those capabilities wouldn't work (e.g., a security label
wouldn't be returned, or an operation would fail with ENOTSUPP). And I
think when it comes to operations, that's fine, as the capability would
then be adjusted (removed).

To make it clear again, I'm not suggesting doing it at server reboot,
as it was pointed out that this causes performance problems.
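
The "adjust the capability when an operation fails" behavior mentioned above can be sketched like this. It is a Python illustration, not the client's code: LazyMount and get_label_unsupported are hypothetical names, and Python's errno.EOPNOTSUPP stands in for the ENOTSUPP error named above.

```python
import errno


class LazyMount:
    """Toy mount that drops a cached capability when the server rejects it."""

    def __init__(self, caps):
        self.caps = set(caps)

    def call(self, cap, op):
        """Run op() only if cap is cached; forget cap if it is unsupported."""
        if cap not in self.caps:
            return None                      # never send the request
        try:
            return op()
        except OSError as exc:
            if exc.errno == errno.EOPNOTSUPP:
                self.caps.discard(cap)       # capability "adjusted (removed)"
                return None
            raise                            # unrelated errors propagate


def get_label_unsupported():
    """Stand-in for a server that no longer supports security labels."""
    raise OSError(errno.EOPNOTSUPP, "operation not supported")
```

This only handles lost capabilities. Nothing in this path ever adds a capability back, which is why newly gained features still need some explicit reprobe.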

> I'm a little nervous about server features being changed out from under
> the client while the client has the server mounted.
>
> But, I don't know, looking quickly through the list of NFS_CAP_*
> definitions in nfs_fs_sb.h, I'm not coming up with a case where we
> couldn't handle it, maybe it's OK.
>
> --b.

2021-06-30 17:12:44

by Trond Myklebust

Subject: Re: client's caching of server-side capabilities

On Wed, 2021-06-30 at 12:48 -0400, Olga Kornievskaia wrote:
> On Wed, Jun 30, 2021 at 11:23 AM J. Bruce Fields
> <[email protected]> wrote:
> >
> > On Tue, Jun 29, 2021 at 01:51:43PM +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <[email protected]>
> > > > wrote:
> > > >
> > > > On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III
> > > > <[email protected]> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > On Jun 28, 2021, at 6:06 PM, Trond Myklebust
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> > > > > > > Hi folks,
> > > > > > >
> > > > > > > I have a general question of why the client doesn't throw
> > > > > > > away the
> > > > > > > cached server's capabilities on server reboot. Say a
> > > > > > > client mounted a
> > > > > > > server when the server didn't support security_labels,
> > > > > > > then the
> > > > > > > server
> > > > > > > was rebooted and support was enabled. Client re-
> > > > > > > establishes its
> > > > > > > clientid/session, recovers state, but assumes all the old
> > > > > > > capabilities
> > > > > > > apply. A remount is required to clear old/find new
> > > > > > > capabilities. The
> > > > > > > opposite is true that a capability could be removed (but
> > > > > > > I'm assuming
> > > > > > > that's a less practical example).
> > > > > > >
> > > > > > > I'm curious what are the problems of clearing server
> > > > > > > capabilities and
> > > > > > > rediscovering them on reboot? Is it because a local
> > > > > > > filesystem could
> > > > > > > never have its attributes changed and thus a network file
> > > > > > > system
> > > > > > > can't
> > > > > > > either?
> > > > > > >
> > > > > > > Thank you.
> > > > > >
> > > > > > In my opinion, the client should aim for the absolute
> > > > > > minimum overhead
> > > > > > on a server reboot. The goal should be to recover state and
> > > > > > get I/O
> > > > > > started again as quickly as possible.
> > > > >
> > > > > I 100% agree with the above. However...
> > > > >
> > > > >
> > > > > > Detection of new features, etc
> > > > > > can wait until the client needs to restart.
> > > > >
> > > > > A server reboot can be part of a failover to a different
> > > > > server. I
> > > > > think capability discovery needs to happen as part of server
> > > > > reboot
> > > > > recovery, it can't be optimized away.
> > > >
> > > > Can you clarify what you mean by a "failover to a different
> > > > server"?
> > >
> > > IP-based failover means that a server can crash, and its partner
> > > can
> > > detect that and take over the IP address and exports of the
> > > failed
> > > server. The replacement server doesn't have to have exactly the
> > > same
> > > set of capabilities.
> >
> > So it could also lose capabilities?
>
> Well, wouldn't the client lose capabilities even now? Operations
> relying on those capabilities wouldn't work (ie., say security label
> wouldn't be returned or an operation would error with ENOTSUPP). And
> I
> think when it comes to operations, that's fine as the capability
> would
> then be adjusted (removed).
>
> To make it clear again, I'm not suggesting to do it at server reboot
> as it was pointed out to cause performance problems.
>

Yep. The reason why I'd be more tolerant of this in the case of
migration/server failover is that in that case, the client is
already expected to trawl the various mountpoints for NFS4ERR_MOVED
errors and to run fs_locations probes anyway. The process is already
slow and disruptive, so throwing in an fsinfo probe to the new server
isn't really a big deal.
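
The distinction drawn above can be written down as a small policy sketch in Python. The Event enum, the function name, and the step strings are hypothetical labels for the steps named in this thread, not actual client code paths.

```python
from enum import Enum


class Event(Enum):
    SERVER_REBOOT = "reboot"
    MIGRATION = "migration"


def recovery_steps(event):
    """Return the ordered steps the client would run for each event."""
    if event is Event.SERVER_REBOOT:
        # Minimum overhead: no capability reprobe on this path.
        return [
            "re-establish clientid/session",
            "reclaim state",
            "RECLAIM_COMPLETE",
        ]
    if event is Event.MIGRATION:
        # Already slow and disruptive, so one more probe is cheap.
        return [
            "handle NFS4ERR_MOVED",
            "fs_locations probe",
            "fsinfo probe of new server",
        ]
    raise ValueError(f"unknown event: {event!r}")
```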

> > I'm a little nervous about server features being changed out from
> > under
> > the client while the client has the server mounted.
> >
> > But, I don't know, looking quickly through the list of NFS_CAP_*
> > definitions in nfs_fs_sb.h, I'm not coming up with a case where we
> > couldn't handle it, maybe it's OK.
> >
> > --b.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]