2006-06-16 06:10:06

by NeilBrown

Subject: Re: [Linux-cluster] Re: [RFC] NLM lock failover admin interface

On Thursday June 15, [email protected] wrote:
> this discussion has centered around removing the locks of an export.
> we also want the interface to be able to remove the locks owned by a single
> client. this is needed to enable client migration between replicas or between
> nodes in a cluster file system. it is not acceptable to place an entire export
> in grace just to move a small number of clients.

Hmmmm....
You want to remove all the locks owned by a particular client
with the intention of reclaiming those locks against a different NFS
server (on a cluster filesystem)
and you don't want to put the whole filesystem into grace mode while
doing it.

Is that correct?

Sounds extremely racy to me. Suppose some other client takes a
conflicting lock between dropping them on one server and claiming them
on the other? That would be bad. The purpose of the grace mode is
precisely to avoid this sort of race.
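
The race can be illustrated with a toy model. This is only a minimal sketch: the `LockTable` class and its methods are hypothetical names invented for illustration, not lockd or NLM interfaces. It assumes a single lock table shared by both servers, as on a cluster filesystem, and exclusive locks only.

```python
class LockTable:
    """Hypothetical shared lock table: at most one exclusive owner per file."""

    def __init__(self):
        self.owner = {}  # file name -> client holding the exclusive lock

    def lock(self, file, client):
        # Grant the lock only if no other client holds it.
        if self.owner.get(file, client) != client:
            return False
        self.owner[file] = client
        return True

    def drop_client(self, client):
        # Remove every lock held by one client (the proposed admin operation).
        for f in [f for f, c in self.owner.items() if c == client]:
            del self.owner[f]


table = LockTable()
table.lock("data.db", "client_a")              # client A holds the lock via server 1
table.drop_client("client_a")                  # admin drops A's locks to migrate A
stolen = table.lock("data.db", "client_b")     # B takes the lock before A reclaims
reclaimed = table.lock("data.db", "client_a")  # A's reclaim via server 2 is refused
```

With no grace period in force, nothing stops client B between the drop and the reclaim, so A loses a lock it believed it still held.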

It would seem that what you "really" want to do is to tell the cluster
filesystem to migrate the locks to a different node and somehow tell
lockd about it.

Is there a comprehensive design document about how this is going to
work, because I'm feeling doubtful.

For the 'between replicas' case - I'm not sure locking makes sense.
Locking on a read-only filesystem is pretty pointless, and presumably
replicas are read-only???

Basically, dropping locks that are expected to be picked up again,
without putting the whole filesystem into a grace period simply
doesn't sound workable to me.

Am I missing something?

NeilBrown


_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


Subject: Re: [NFS] Re: [RFC] NLM lock failover admin interface

> On Thursday June 15, [email protected] wrote:
> > this discussion has centered around removing the locks of an export.
> > we also want the interface to be able to remove the locks owned by a single
> > client. this is needed to enable client migration between replicas or between
> > nodes in a cluster file system. it is not acceptable to place an entire export
> > in grace just to move a small number of clients.
>
> Hmmmm....
> You want to remove all the locks owned by a particular client
> with the intention of reclaiming those locks against a different NFS
> server (on a cluster filesystem)
> and you don't want to put the whole filesystem into grace mode while
> doing it.
>
> Is that correct?

yes.

>
> Sounds extremely racy to me. Suppose some other client takes a
> conflicting lock between dropping them on one server and claiming them
> on the other? That would be bad. The purpose of the grace mode is
> precisely to avoid this sort of race.

the idea is that the underlying file system can place only the files with
locks held by the migrating client(s) into grace, leaving all other files for
normal operation. the migrating (nfsv4) client then reclaims opens, locks and
delegations on the new server. it's just reducing the scope of the grace period.
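
a minimal sketch of the reduced-scope grace idea, under stated assumptions: `GraceLockTable`, `begin_migration` and `end_grace` are hypothetical names for illustration only, not an existing lockd, nfsd or file system interface, and the model covers exclusive locks only. only the files locked by the migrating client enter grace; every other file keeps normal locking.

```python
class GraceLockTable:
    """Hypothetical lock table with a per-file grace set."""

    def __init__(self):
        self.owner = {}     # file name -> client holding the exclusive lock
        self.grace = set()  # files on which only reclaims are accepted

    def lock(self, file, client, reclaim=False):
        # In grace only reclaims are accepted; outside grace only new locks.
        if (file in self.grace) != reclaim:
            return False
        # Refuse if another client already holds the lock.
        if self.owner.get(file, client) != client:
            return False
        self.owner[file] = client
        return True

    def begin_migration(self, client):
        # Drop the client's locks and put only those files into grace.
        files = [f for f, c in self.owner.items() if c == client]
        for f in files:
            del self.owner[f]
        self.grace.update(files)
        return files

    def end_grace(self, files):
        # Return the listed files to normal locking.
        self.grace.difference_update(files)


table = GraceLockTable()
table.lock("a.db", "client_a")
moved = table.begin_migration("client_a")
blocked = table.lock("a.db", "client_b")        # refused: a.db is in grace
unrelated = table.lock("other.db", "client_b")  # granted: other.db is not in grace
reclaimed = table.lock("a.db", "client_a", reclaim=True)
table.end_grace(moved)
```

the sketch does not check that a reclaim really comes from the migrated client; a real server would need to verify that, and to agree with the underlying file system on which files are in grace.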

>
> It would seem that what you "really" want to do is to tell the cluster
> filesystem to migrate the locks to a different node and somehow tell
> lockd about it.

what we really want is for the cluster file system to share the locks between
the original node and the new node. then the client can simply be redirected
and no grace period or reclaim is needed. this is much harder to code than a
reduced grace period as described above. from what we hear, lustre has this
functionality.

either way, the files with locks held by the migrating client need to be
identified by both the lock manager (lockd/nfsv4 server) and the underlying fs.

>
> Is there a comprehensive design document about how this is going to
> work, because I'm feeling doubtful.

we have a work in progress - it's not done but may help describe our thinking.

http://wiki.linux-nfs.org/index.php/Recovery_and_migration

>
> For the 'between replicas' case - I'm not sure locking makes sense.
> Locking on a read-only filesystem is pretty pointless, and presumably
> replicas are read-only???

nope. we have a promising prototype read/write replica scheme that we are
testing.

http://www.citi.umich.edu/techreports/reports/citi-tr-06-3.pdf

i agree this is an outlying case....

but another immediate consumer of such an interface would be an administrator
who needs to remove the locks for a client.

-->Andy

>
> Basically, dropping locks that are expected to be picked up again,
> without putting the whole filesystem into a grace period simply
> doesn't sound workable to me.
>
> Am I missing something?
>
> NeilBrown