2022-08-16 03:09:44

by NeilBrown

Subject: Thoughts on mount option to configure client lease renewal time.


Currently the Linux NFS client renews leases at 2/3 of the lease time advised
by the server.
Some server vendors (Not Exactly Targeting Any Particular Party)
recommend very short lease times - as short as 5 seconds in fail-over
configurations. This means 1.7 seconds of jitter in any part of the
system can result in leases being lost - but it does achieve fast
fail-over.

If we could configure a 5 second lease-renewal on the client, but leave
a 60 second lease time on the server, then we could get the best of both
worlds. Failover would happen quickly, but you would need a much longer
load spike or network partition to cause the loss of leases.

As v4.1 can end the grace period early once everyone checks in, a large
grace period (which is needed for a large lease time) would rarely be a
problem.

So my thought is to add a mount option "lease-renew=5" for v4.1+ mounts.
The client then uses that number, provided it is less than 2/3 of the
server-declared lease time.
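
To make the proposed rule concrete, here is a minimal sketch in Python. It is
not kernel code: the function names and the lease_renew parameter are
hypothetical, and the only things taken from the description above are the 2/3
factor and the condition that the option is honoured only when it is below
that value.

```python
from typing import Optional


def renew_interval(lease_time: float, lease_renew: Optional[float] = None) -> float:
    """Seconds between lease renewals under the proposed rule.

    lease_time  -- lease time declared by the server, in seconds
    lease_renew -- hypothetical value of the proposed mount option, if given
    """
    default = lease_time * 2 / 3
    # The option is only used when it is shorter than the default interval.
    if lease_renew is not None and 0 < lease_renew < default:
        return lease_renew
    return default


def jitter_margin(lease_time: float, lease_renew: Optional[float] = None) -> float:
    """Delay a renewal can tolerate before the lease expires on the server."""
    return lease_time - renew_interval(lease_time, lease_renew)


if __name__ == "__main__":
    # 5 second lease, no option: about 1.7 seconds of slack.
    print(round(jitter_margin(5), 1))
    # 60 second lease with lease-renew=5: 55 seconds of slack.
    print(jitter_margin(60, 5))
```

With a 5 second lease the slack is roughly 1.7 seconds; with a 60 second lease
and a 5 second renewal interval it grows to 55 seconds, while the renewal
traffic stays in the same range as today's short-lease setups.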

What do people think of this? Is there a better solution, or a problem
with this one?

NeilBrown


2022-08-16 13:52:04

by Trond Myklebust

Subject: Re: Thoughts on mount option to configure client lease renewal time.

On Tue, 2022-08-16 at 09:35 +1000, NeilBrown wrote:
>
> Currently the Linux NFS renews leases at 2/3 of the lease time
> advised
> by the server.
> Some server vendors (Not Exactly Targeting Any Particular Party)
> recommend very short lease times - as short a 5 seconds in fail-over
> configurations.  This means 1.7 seconds of jitter in any part of the
> system can result in leases being lost - but it does achieve fast
> fail-over.
>
> If we could configure a 5 second lease-renewal on the client, but
> leave
> a 60 second lease time on the server, then we could get the best of
> both
> worlds.  Failover would happen quickly, but you would need a much
> longer
> load spike or network partition to cause the loss of leases.
>
> As v4.1 can end the grace period early once everyone checks in, a
> large
> grace period (which is needed for a large lease time) would rarely be
> a
> problem.
>
> So my thought is to add a mount option "lease-renew=5" for v4.1+
> mounts.
> The clients then uses that number providing it is less than 2/3 of
> the
> server-declared lease time.
>
> What do people think of this?  Is there a better solution, or a
> problem
> with this one?
>
> NeilBrown
>  

I don't see how the NFS client can ever guarantee a 5 second lease
renewal time, so as far as I'm concerned, this is not a problem we need
to solve.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-08-16 15:38:46

by Chuck Lever III

Subject: Re: Thoughts on mount option to configure client lease renewal time.


> On Aug 15, 2022, at 7:35 PM, NeilBrown <[email protected]> wrote:
>
> Currently the Linux NFS renews leases at 2/3 of the lease time advised
> by the server.
> Some server vendors (Not Exactly Targeting Any Particular Party)
> recommend very short lease times - as short a 5 seconds in fail-over
> configurations. This means 1.7 seconds of jitter in any part of the
> system can result in leases being lost - but it does achieve fast
> fail-over.
>
> If we could configure a 5 second lease-renewal on the client, but leave
> a 60 second lease time on the server, then we could get the best of both
> worlds. Failover would happen quickly, but you would need a much longer
> load spike or network partition to cause the loss of leases.

If loss of leases is the only concern (ie, there is no file sharing that
can cause a client to steal another's locks when the other client loses
contact with the server) then a courteous server should handle that. The
Linux NFS server is now courteous, and several other implementations are
as well.


> As v4.1 can end the grace period early once everyone checks in, a large
> grace period (which is needed for a large lease time) would rarely be a
> problem.

IMO the above paragraph is the most salient: if failover time is being
impacted by state recovery, use NFSv4.1 with implementations that take
proper advantage of RECLAIM_COMPLETE.


> So my thought is to add a mount option "lease-renew=5" for v4.1+ mounts.
> The clients then uses that number providing it is less than 2/3 of the
> server-declared lease time.
>
> What do people think of this? Is there a better solution, or a problem
> with this one?

RECLAIM_COMPLETE is the preferred solution, if I understand your problem
statement correctly. Can you describe how it does not meet expectations?

The other side of this coin is that clients can have so much outstanding
state that they can't recover it all before the grace period expires.
To compensate, a server can limit the number of delegations it hands out,
or it can lengthen its lease/grace period.


--
Chuck Lever



2022-08-16 22:58:02

by NeilBrown

Subject: Re: Thoughts on mount option to configure client lease renewal time.

On Wed, 17 Aug 2022, Chuck Lever III wrote:
> > On Aug 15, 2022, at 7:35 PM, NeilBrown <[email protected]> wrote:
> >
> > Currently the Linux NFS renews leases at 2/3 of the lease time advised
> > by the server.
> > Some server vendors (Not Exactly Targeting Any Particular Party)
> > recommend very short lease times - as short a 5 seconds in fail-over
> > configurations. This means 1.7 seconds of jitter in any part of the
> > system can result in leases being lost - but it does achieve fast
> > fail-over.
> >
> > If we could configure a 5 second lease-renewal on the client, but leave
> > a 60 second lease time on the server, then we could get the best of both
> > worlds. Failover would happen quickly, but you would need a much longer
> > load spike or network partition to cause the loss of leases.
>
> If loss of leases is the only concern (ie, there is no file sharing that
> can cause a client to steal another's locks when the other client loses
> contact with the server) then courteous server should handle that. The
> Linux NFS server is now courteous, and several other implementations are
> as well.

"If" being the key word. A courteous server is great and will certainly
help, but it doesn't provide the same guarantee as actually getting a
renew in before the lease expires.

>
>
> > As v4.1 can end the grace period early once everyone checks in, a large
> > grace period (which is needed for a large lease time) would rarely be a
> > problem.
>
> IMO the above paragraph is the most salient: if failover time is being
> impacted by state recovery, use NFSv4.1 with implementations that take
> proper advantage of RECLAIM_COMPLETE.
>
>
> > So my thought is to add a mount option "lease-renew=5" for v4.1+ mounts.
> > The clients then uses that number providing it is less than 2/3 of the
> > server-declared lease time.
> >
> > What do people think of this? Is there a better solution, or a problem
> > with this one?
>
> RECLAIM_COMPLETE is the preferred solution, if I understand your problem
> statement correctly. Can you describe how it does not meet expectations?
>

RECLAIM_COMPLETE is an important part of the solution, but not a
complete solution.
If a client is idle (not touching the filesystem for a little while),
then it won't notice the server failover until it sends a renew, which
it might not do for 2/3 of the lease time, e.g. for about 1 minute.
Even if it only takes 1 second to reclaim state and send
RECLAIM_COMPLETE, that is still over 1 minute that the server has to
wait before it can end the grace period.

To reliably reduce the effective grace period, you need a short renew
time, and the use of RECLAIM_COMPLETE.

> The other side of this coin is that clients can have so much outstanding
> state that they can't recover it all before the grace period expires.
> To compensate, a server can limit the number of delegations it hands out,
> or it can lengthen its lease/grace period.

Maybe an ideal client would estimate the time it would take to recover
all its state, and would ensure the gap between renewal time and lease
time were at least that long. I don't know that a practical client
would do that though. Certainly it would make sense for the server to
extend the grace period while a client were actively reclaiming state -
with some limit in case of misbehaving client.
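
As a rough illustration of that server-side policy, the sketch below decides
whether a grace period may end. It is an assumption-laden model, not how any
real server is implemented: GraceState, grace_over, max_extension and
active_window are all hypothetical names. It combines three ideas from this
thread: end early once every client has sent RECLAIM_COMPLETE, otherwise wait
for the nominal grace period, and extend it while a client is still
reclaiming, up to a hard limit.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class GraceState:
    started: float        # time the grace period began (seconds)
    grace_period: float   # nominal grace length (seconds)
    max_extension: float  # hard cap on extra time for slow reclaimers
    # Clients that have not yet sent RECLAIM_COMPLETE.
    pending: Set[str] = field(default_factory=set)
    # Time of the most recent reclaim operation from any client.
    last_reclaim: Optional[float] = None


def grace_over(state: GraceState, now: float, active_window: float = 1.0) -> bool:
    """Return True if the server may leave the grace period at time `now`."""
    if not state.pending:
        return True  # everyone has checked in: end early
    elapsed = now - state.started
    if elapsed < state.grace_period:
        return False  # still inside the nominal grace period
    if elapsed >= state.grace_period + state.max_extension:
        return True  # limit reached, guards against a misbehaving client
    # Past the nominal period: keep waiting only while reclaims are arriving.
    reclaiming = (
        state.last_reclaim is not None
        and now - state.last_reclaim <= active_window
    )
    return not reclaiming
```

The max_extension cap is what keeps a single slow or misbehaving client from
holding the whole server in grace indefinitely.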

Thanks,
NeilBrown

2022-08-16 23:27:57

by NeilBrown

Subject: Re: Thoughts on mount option to configure client lease renewal time.

On Tue, 16 Aug 2022, Trond Myklebust wrote:
> On Tue, 2022-08-16 at 09:35 +1000, NeilBrown wrote:
> >
> > Currently the Linux NFS renews leases at 2/3 of the lease time
> > advised
> > by the server.
> > Some server vendors (Not Exactly Targeting Any Particular Party)
> > recommend very short lease times - as short a 5 seconds in fail-over
> > configurations.  This means 1.7 seconds of jitter in any part of the
> > system can result in leases being lost - but it does achieve fast
> > fail-over.
> >
> > If we could configure a 5 second lease-renewal on the client, but
> > leave
> > a 60 second lease time on the server, then we could get the best of
> > both
> > worlds.  Failover would happen quickly, but you would need a much
> > longer
> > load spike or network partition to cause the loss of leases.
> >
> > As v4.1 can end the grace period early once everyone checks in, a
> > large
> > grace period (which is needed for a large lease time) would rarely be
> > a
> > problem.
> >
> > So my thought is to add a mount option "lease-renew=5" for v4.1+
> > mounts.
> > The clients then uses that number providing it is less than 2/3 of
> > the
> > server-declared lease time.
> >
> > What do people think of this?  Is there a better solution, or a
> > problem
> > with this one?
> >
> > NeilBrown
> >  
>
> I don't see how the NFS client can ever guarantee a 5 second lease
> renewal time, so as far as I'm concerned, this is not a problem we need
> to solve.

I completely agree with the first statement.
The problem we need to solve is whatever problem it is that motivates
server vendors to recommend unrealistically short lease times.

I believe this problem is fail-over time.
Assuming that a server fail-over happens instantly, full NFS service does
not resume until after the grace period completes.

Providing clients send RECLAIM_COMPLETE appropriately, the grace period
could easily be as long as:

client renew time + time to reclaim all state

as clients that are idle (or busy thinking, not accessing the
filesystem) will not notice the failover until they send a renew, which
may not be until the full renew time has passed.

The only part of that calculation that can be controlled is the client
renew time, and at present that can only be controlled by reducing the
lease time. Hence the recommendation for a short lease time.
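
The calculation above can be worked through numerically. This is only a
sketch: the helper name is made up, and the 90 second lease and 1 second
reclaim time are illustrative assumptions rather than measurements.

```python
from typing import Optional


def worst_case_grace(lease_time: float, reclaim_time: float,
                     renew_time: Optional[float] = None) -> float:
    """Upper bound on the effective grace period, in seconds.

    An idle client only notices the failover at its next renew, so the
    bound is: client renew time + time to reclaim all state.
    If renew_time is not given, the client's default of 2/3 of the lease
    time is assumed.
    """
    if renew_time is None:
        renew_time = lease_time * 2 / 3
    return renew_time + reclaim_time


if __name__ == "__main__":
    # Assumed 90 s lease, default renew interval, 1 s to reclaim state.
    print(worst_case_grace(90, 1))
    # Short 5 s lease: fast fail-over, but little tolerance for jitter.
    print(round(worst_case_grace(5, 1), 2))
    # Assumed 90 s lease with a 5 s renew interval set independently.
    print(worst_case_grace(90, 1, renew_time=5))
```

Under these assumptions the bound drops from 61 seconds to 6 seconds when the
renew interval is shortened to 5 seconds, without shortening the lease itself.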

If we could provide an alternative means of reducing the client renew time
- a mount option - then there would be no incentive to recommend an
impractically short lease time.

Thanks,
NeilBrown

2022-08-16 23:34:35

by Trond Myklebust

Subject: Re: Thoughts on mount option to configure client lease renewal time.

On Wed, 2022-08-17 at 09:09 +1000, NeilBrown wrote:
> On Tue, 16 Aug 2022, Trond Myklebust wrote:
> > On Tue, 2022-08-16 at 09:35 +1000, NeilBrown wrote:
> > >
> > > Currently the Linux NFS renews leases at 2/3 of the lease time
> > > advised
> > > by the server.
> > > Some server vendors (Not Exactly Targeting Any Particular Party)
> > > recommend very short lease times - as short a 5 seconds in fail-
> > > over
> > > configurations.  This means 1.7 seconds of jitter in any part of
> > > the
> > > system can result in leases being lost - but it does achieve fast
> > > fail-over.
> > >
> > > If we could configure a 5 second lease-renewal on the client, but
> > > leave
> > > a 60 second lease time on the server, then we could get the best
> > > of
> > > both
> > > worlds.  Failover would happen quickly, but you would need a much
> > > longer
> > > load spike or network partition to cause the loss of leases.
> > >
> > > As v4.1 can end the grace period early once everyone checks in, a
> > > large
> > > grace period (which is needed for a large lease time) would
> > > rarely be
> > > a
> > > problem.
> > >
> > > So my thought is to add a mount option "lease-renew=5" for v4.1+
> > > mounts.
> > > The clients then uses that number providing it is less than 2/3
> > > of
> > > the
> > > server-declared lease time.
> > >
> > > What do people think of this?  Is there a better solution, or a
> > > problem
> > > with this one?
> > >
> > > NeilBrown
> > >  
> >
> > I don't see how the NFS client can ever guarantee a 5 second lease
> > renewal time, so as far as I'm concerned, this is not a problem we
> > need
> > to solve.
>
> I completely agree with the first statement.
> The problem we need to solve is whatever problem it is that motivates
> server vendors to recommend unrealistically short lease times.
>
> I believe this problem is fail-over time.
> Assuming that a server fail-over happens instantly, full NFS service
> does
> not resume until after the grace period completes.
>
> Providing clients send RECLAIM_COMPLETE appropriately, the grace
> period
> could easily be as long as:
>
>   client renew time + time to reclaim all state
>
> as clients that are idle (or busy thinking, not accessing the
> filesystem) will not notice the failover until they send a renew,
> which
> may not be until the full renew time has passed.
>
> The only part of that calculation that can be controlled is the
> client
> renew time, andat present that can only be controlled by reducing the
> lease time.  Hence the recommendation for a short lease time.
>
> If we could provide an alternate means to reducing the client renew
> time
> - a mount option - then there would be no incentive to recommend an
> impractically short lease time.
>
> Thanks,
> NeilBrown

Instead of wasting a load of CPU cycles pinging the NFS layer, why not
farm this out to the TCP layer? We already have keepalive to ensure
that the connection stays up. All we really need is to handle the case
where the connection is broken by the server.

So the suggestion would be that when the connection is broken, we start
sending a SEQUENCE ping in order to figure out what happened, and
whether we need to re-establish state.
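
For readers unfamiliar with TCP keepalive, here is a user-space sketch of the
mechanism. It only illustrates the socket-level knobs; the kernel RPC
transport configures keepalive internally and this is not its code. The
function names and the default values (5, 1, 3) are assumptions chosen for
the example, and the TCP_KEEP* options are Linux-specific, so they are applied
only where the platform exposes them.

```python
import socket


def keepalive_detection_time(idle: int, interval: int, count: int) -> int:
    """Approximate seconds for keepalive to declare a silent peer dead.

    The first probe goes out after `idle` seconds without traffic, then one
    probe every `interval` seconds; after `count` unanswered probes the
    connection is reported as broken.
    """
    return idle + interval * count


def enable_keepalive(sock: socket.socket, idle: int = 5,
                     interval: int = 1, count: int = 3) -> int:
    """Turn on keepalive for `sock` and return the approximate detection time."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timing options; skipped on platforms that lack them.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return keepalive_detection_time(idle, interval, count)
```

Once the connection is reported broken, the client would reconnect and send
the SEQUENCE ping described above to learn whether state must be
re-established; no periodic NFS-level traffic is needed while the connection
is healthy.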

No mount options needed...

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-08-17 00:59:42

by NeilBrown

Subject: Re: Thoughts on mount option to configure client lease renewal time.

On Wed, 17 Aug 2022, Trond Myklebust wrote:
> On Wed, 2022-08-17 at 09:09 +1000, NeilBrown wrote:
> > On Tue, 16 Aug 2022, Trond Myklebust wrote:
> > > On Tue, 2022-08-16 at 09:35 +1000, NeilBrown wrote:
> > > >
> > > > Currently the Linux NFS renews leases at 2/3 of the lease time
> > > > advised
> > > > by the server.
> > > > Some server vendors (Not Exactly Targeting Any Particular Party)
> > > > recommend very short lease times - as short a 5 seconds in fail-
> > > > over
> > > > configurations.  This means 1.7 seconds of jitter in any part of
> > > > the
> > > > system can result in leases being lost - but it does achieve fast
> > > > fail-over.
> > > >
> > > > If we could configure a 5 second lease-renewal on the client, but
> > > > leave
> > > > a 60 second lease time on the server, then we could get the best
> > > > of
> > > > both
> > > > worlds.  Failover would happen quickly, but you would need a much
> > > > longer
> > > > load spike or network partition to cause the loss of leases.
> > > >
> > > > As v4.1 can end the grace period early once everyone checks in, a
> > > > large
> > > > grace period (which is needed for a large lease time) would
> > > > rarely be
> > > > a
> > > > problem.
> > > >
> > > > So my thought is to add a mount option "lease-renew=5" for v4.1+
> > > > mounts.
> > > > The clients then uses that number providing it is less than 2/3
> > > > of
> > > > the
> > > > server-declared lease time.
> > > >
> > > > What do people think of this?  Is there a better solution, or a
> > > > problem
> > > > with this one?
> > > >
> > > > NeilBrown
> > > >  
> > >
> > > I don't see how the NFS client can ever guarantee a 5 second lease
> > > renewal time, so as far as I'm concerned, this is not a problem we
> > > need
> > > to solve.
> >
> > I completely agree with the first statement.
> > The problem we need to solve is whatever problem it is that motivates
> > server vendors to recommend unrealistically short lease times.
> >
> > I believe this problem is fail-over time.
> > Assuming that a server fail-over happens instantly, full NFS service
> > does
> > not resume until after the grace period completes.
> >
> > Providing clients send RECLAIM_COMPLETE appropriately, the grace
> > period
> > could easily be as long as:
> >
> >   client renew time + time to reclaim all state
> >
> > as clients that are idle (or busy thinking, not accessing the
> > filesystem) will not notice the failover until they send a renew,
> > which
> > may not be until the full renew time has passed.
> >
> > The only part of that calculation that can be controlled is the
> > client
> > renew time, andat present that can only be controlled by reducing the
> > lease time.  Hence the recommendation for a short lease time.
> >
> > If we could provide an alternate means to reducing the client renew
> > time
> > - a mount option - then there would be no incentive to recommend an
> > impractically short lease time.
> >
> > Thanks,
> > NeilBrown
>
> Instead of wasting a load of CPU cycles pinging the NFS layer, why not
> farm this out to the TCP layer? We already have keepalive to ensure
> that the connection stays up. All we really need is to handle the case
> where the connection is broken by the server.
>
> So the suggestion would be that when the connection is broken, we start
> sending a SEQUENCE ping in order to figure out what happened, and
> whether we need to re-establish state.
>
> No mount options needed...

Yes, that is an interesting idea.
This would mean that the timeo/retrans mount options would determine the
effective lease renewal time when the server stops responding. That
seems to make sense.

I'll have a look and see how much change is required to send a renew if
there are no pending requests when the connection closes.

Thanks!
NeilBrown