2007-04-28 04:52:00

by NeilBrown

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover

On Friday April 27, [email protected] wrote:
> Frank van Maarseveen wrote:
> >
> >I'd prefer (2) "echo /some/path > /proc/fs/nfsd/nlm_drop_lock" because:
> >
> >
> To convert the first patch of this submitted series from "fsid" to
> "/some/path" is a no-brainer, since we had gone thru several rounds of
> similar changes. However, my questions (it is more of a Neil's question)
> are, if I convert the first patch to do this,
>
> 1) then why do we still need the RPC drop-lock call in nfs-util ?

Maybe we don't.
I can imagine a (probably hypothetical) situation where you want to
drop some but not all of the locks on a filesystem - if it is a
cluster-aware filesystem that several virtual-NAS's export, and you
want to move just one virtual-NAS. But if you don't want to be able
to do that, you obviously don't have to.

> 2) what should we do for the 2nd patch ? i.e., how do we communicate
> with the take-over server it is time for its action, by RPC call or by
> "echo /some/path > /proc/fs/nfsd/nlm_set_grace_or_whatever" ?

I'm happy with using a path name like this to restart the grace
period. Where would you store the per-filesystem grace-period-end??
I guess you would need a new little data structure indexed by
... 'struct super_block *' I guess. It would need to hold a reference
on the superblock until the grace period expired would it?

It might seem 'obvious' to store it in 'struct svc_export', but there
can be several of these per filesystem, and more could be added after
you set the grace period. So it would be messy to get that right.


>
> In general, I feel if we do this "/some/path" approach, we may as well
> simply convert the 2nd patch from "fsid" to "/some/path". Then we would
> finish this long journey.

Certainly a lot closer.
If we are creating "nlm_drop_locks" and "nlm_set_grace" interfaces, we
should spend a few moments considering exactly what semantics they
should have.

In both cases we write a filename. Presumably it must start with a
'/' and be null terminated, so you use "echo -n" rather than "echo".
After all, a filename can contain a newline.

Is there any extra info we might want to pass in or out at the same
time?

For nlm_drop_locks, we might also want to be able to query locked -
"Do you hold any locks on this filesystem". Even "how many?".
For set_grace, we might want to ask how many seconds are left in the
grace period (I'm not sure how this info would be used, but it is
always nice to be able to read any value that you can write).

Does it make sense to have a single file with composite semantics?

We write
XX/path/name
where XX can be:
a number, to set second remaining in grace period
a '?' (or empty string) to query state
a '-' to remove all locks (and cancels any grace period)
We then read back two numbers, the seconds remaining in the grace
period, and the number of locked files.

Then we need to make sure we choose appropriate names. I think that
the string 'lockd' make more sense than 'nlm', as we are interacting
with the daemon, not configuring the protocol. We might not either
need either as the file is inside /proc/fs/nfsd, it is obviously
related to nfsd.
And if we can use the interface to query, then names like 'set' and
'drop' and probably mis-placed. Maybe "grace" and "locks".
If no path is given, the requests have system-wide effect. If there
is a non-empty path, just that filesystem if queried/modified.

These are just possibilities. I'm quite happy with either 1 or 2
files. I just want to be sure a number of options have been
considered, and that a reasoned choice as been made.

NeilBrown

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2007-04-28 05:26:57

by Marc Eshel

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover

[email protected] wrote on 04/27/2007 09:51:17 PM:

> On Friday April 27, [email protected] wrote:
> > Frank van Maarseveen wrote:
> > >
> > >I'd prefer (2) "echo /some/path > /proc/fs/nfsd/nlm_drop_lock"
because:
> > >
> > >
> > To convert the first patch of this submitted series from "fsid" to
> > "/some/path" is a no-brainer, since we had gone thru several rounds of

> > similar changes. However, my questions (it is more of a Neil's
question)
> > are, if I convert the first patch to do this,
> >
> > 1) then why do we still need the RPC drop-lock call in nfs-util ?
>
> Maybe we don't.
> I can imagine a (probably hypothetical) situation where you want to
> drop some but not all of the locks on a filesystem - if it is a
> cluster-aware filesystem that several virtual-NAS's export, and you
> want to move just one virtual-NAS. But if you don't want to be able
> to do that, you obviously don't have to.

It would be very useful for cluster filesystems, that can export the same
filesystem from few servers using multiple IP addresses from each server,
to be able to move IP address among server for load balancing.
Marc.

> > 2) what should we do for the 2nd patch ? i.e., how do we communicate
> > with the take-over server it is time for its action, by RPC call or by

> > "echo /some/path > /proc/fs/nfsd/nlm_set_grace_or_whatever" ?
>
> I'm happy with using a path name like this to restart the grace
> period. Where would you store the per-filesystem grace-period-end??
> I guess you would need a new little data structure indexed by
> ... 'struct super_block *' I guess. It would need to hold a reference
> on the superblock until the grace period expired would it?
>
> It might seem 'obvious' to store it in 'struct svc_export', but there
> can be several of these per filesystem, and more could be added after
> you set the grace period. So it would be messy to get that right.
>
>
> >
> > In general, I feel if we do this "/some/path" approach, we may as well

> > simply convert the 2nd patch from "fsid" to "/some/path". Then we
would
> > finish this long journey.
>
> Certainly a lot closer.
> If we are creating "nlm_drop_locks" and "nlm_set_grace" interfaces, we
> should spend a few moments considering exactly what semantics they
> should have.
>
> In both cases we write a filename. Presumably it must start with a
> '/' and be null terminated, so you use "echo -n" rather than "echo".
> After all, a filename can contain a newline.
>
> Is there any extra info we might want to pass in or out at the same
> time?
>
> For nlm_drop_locks, we might also want to be able to query locked -
> "Do you hold any locks on this filesystem". Even "how many?".
> For set_grace, we might want to ask how many seconds are left in the
> grace period (I'm not sure how this info would be used, but it is
> always nice to be able to read any value that you can write).
>
> Does it make sense to have a single file with composite semantics?
>
> We write
> XX/path/name
> where XX can be:
> a number, to set second remaining in grace period
> a '?' (or empty string) to query state
> a '-' to remove all locks (and cancels any grace period)
> We then read back two numbers, the seconds remaining in the grace
> period, and the number of locked files.
>
> Then we need to make sure we choose appropriate names. I think that
> the string 'lockd' make more sense than 'nlm', as we are interacting
> with the daemon, not configuring the protocol. We might not either
> need either as the file is inside /proc/fs/nfsd, it is obviously
> related to nfsd.
> And if we can use the interface to query, then names like 'set' and
> 'drop' and probably mis-placed. Maybe "grace" and "locks".
> If no path is given, the requests have system-wide effect. If there
> is a non-empty path, just that filesystem if queried/modified.
>
> These are just possibilities. I'm quite happy with either 1 or 2
> files. I just want to be sure a number of options have been
> considered, and that a reasoned choice as been made.
>
> NeilBrown
>
>
-------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> NFS maillist - [email protected]
> https://lists.sourceforge.net/lists/listinfo/nfs


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-28 12:33:34

by Frank van Maarseveen

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover

On Sat, Apr 28, 2007 at 02:51:17PM +1000, Neil Brown wrote:
> On Friday April 27, [email protected] wrote:
[...]
> Certainly a lot closer.
> If we are creating "nlm_drop_locks" and "nlm_set_grace" interfaces, we
> should spend a few moments considering exactly what semantics they
> should have.
>
> In both cases we write a filename. Presumably it must start with a
> '/' and be null terminated, so you use "echo -n" rather than "echo".
> After all, a filename can contain a newline.

I don't care much about the trailing newline. Try mounting and
exporting it and mounting it on the client ;-). Truncating the
string at the first newline may be a practical thing to do.

>
> Is there any extra info we might want to pass in or out at the same
> time?
>
> For nlm_drop_locks, we might also want to be able to query locked -
> "Do you hold any locks on this filesystem". Even "how many?".

The "no locks dropped" case might be useful. #locks dropped is
only informational (without client info) and covers the first case
too so that would be my choice but I don't have any strong opinion
about this.

>
> Does it make sense to have a single file with composite semantics?

Only if that would avoid an otherwise unavoidable race. There are
just too many components involved with NFS so to avoid any race I'd
probably unplug it temporarily with iptables or "ip addr del..."
But I would like to be able to drop locks without entering grace
mode: a zero second grace mode when combined.

>
> We write
> XX/path/name
> where XX can be:

Try mounting and exporting pathnames with spaces.. that's not going
to work anytime soon, or even anytime at all (other unixes). So no
need to use / as separator.

> a number, to set second remaining in grace period
> a '?' (or empty string) to query state

You mean: write "?/path/name" to tell the kernel what subsequent
reads should query?

> a '-' to remove all locks (and cancels any grace period)

That's a strange combination. But cancelling a grace period
is equivalent with setting it to zero seconds so no need for a
special case.

I'd go for simplicity: one file per function (unless there's an
unavoidable race). What about:

/proc/fs/nfsd/nlm_grace:
Write a number to set the grace period in seconds
(0==cancel). May be followed by a space + pathname to
indicate the superblock/list of svc_something the grace
period applies to (otherwise it's global). Truncate the
string at a newline.

/proc/fs/nfsd/nlm_unlock:
Write either a pathname or "" to drop locks. This has the
same syntax as the second field of nlm_grace.


Optional: In addition to a pathname support "fsid=" syntax in
both cases.

If you wanna go wild then support a file= syntax to recover from
stale locks on individual files due to buggy clients.

--
Frank

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs