2007-04-30 01:46:08

by NeilBrown

[permalink] [raw]
Subject: flock and lockf over NFS and the 'pid' in lockd requests.


Hi,
I've been looking at a difficultly being experienced with procmail
used over NFS.

procmail (and I don't think it is alone in this) can be compiled to
use multiple locking schemes (just in case). So it might create a
lock file, call flock, then get an fcntl lock on the file.

For local filesystems, this works fine, as each of the locking
domains are independent - an flock doesn't conflict with an fcntl
lock on the same file.

And it used to work over NFS, because flock didn't travel to the
server, so the domains were again independent.

But it doesn't work with recent kernels.
Since about 2.6.12, flock calls on NFS have been sent over the
network and become fcntl locks on the server (which is a little
inconsistent, but understandable).

This means that when procmail tries to take an flock and a fcntl
lock, the server sees that the second one conflicts with the first,
and procmail blocks. No good.

Now we could compile procmail (and any other mail program that does
similar tricks) to only use one locking mechanism, but that isn't
really very satisfactory I think.

We cannot get the server to support two independent locking domains,
so the only way this can be fixed is to not let the server see the
conflict, and I believe this is possible.

The current lockd code makes sure that it can get a local lock before
asking for a remote lock. If there is a local conflict, there is no
point in asking the server. This means that a lockd server will
*never* see two locks from the one lockd client that conflict. It
may well see locks from two different clients that conflict (that is
the whole point) but not two from the one client.

Currently an flock and an fcntl lock on the one file will both be sent
to the lockd server and it will appear to be a conflict. We can tell
lockd that it isn't a conflict by setting the 'pid' field in the lock
requests to some constant (e.g. 0, or 42 or whatever).
As no genuine conflicts are ever sent to a server, it is always safe
to just send a constant pid. The pid is only used for lockd to
mediate locking requests between processes on the same client. Now
that the lockd client never depends on the server for that mediation,
there is no need to send the pid.


So I would like to propose the following patch.

It is possible that the use of 'pid' in req->a_owner string (at top
of the chunk in the patch) should be removed too, but that doesn't
seem to be used (in the NFS server implementation at least) so I
didn't need that removed for testing.

Note too that if we drop the usage of 'pid' here, then there is a lot
of code for generation unique pids that can be dropped as well. So
this isn't a complete patch, just the core idea.

Have I missed something important? Can we really drop the pid?
If not, is there something else that can be done to make procmail work
other that requiring a recompile.

Thanks,
NeilBrown


diff .prev/fs/lockd/clntproc.c ./fs/lockd/clntproc.c
--- .prev/fs/lockd/clntproc.c 2007-04-30 10:41:01.000000000 +1000
+++ ./fs/lockd/clntproc.c 2007-04-30 11:10:08.000000000 +1000
@@ -134,7 +134,7 @@ static void nlmclnt_setlockargs(struct n
lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
(unsigned int)fl->fl_u.nfs_fl.owner->pid,
utsname()->nodename);
- lock->svid = fl->fl_u.nfs_fl.owner->pid;
+ lock->svid = 42;
lock->fl.fl_start = fl->fl_start;
lock->fl.fl_end = fl->fl_end;
lock->fl.fl_type = fl->fl_type;



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2007-04-30 02:13:57

by Trond Myklebust

[permalink] [raw]
Subject: Re: flock and lockf over NFS and the 'pid' in lockd requests.

On Mon, 2007-04-30 at 11:45 +1000, Neil Brown wrote:
> Currently an flock and an fcntl lock on the one file will both be sent
> to the lockd server and it will appear to be a conflict. We can tell
> lockd that it isn't a conflict by setting the 'pid' field in the lock
> requests to some constant (e.g. 0, or 42 or whatever).
> As no genuine conflicts are ever sent to a server, it is always safe
> to just send a constant pid. The pid is only used for lockd to
> mediate locking requests between processes on the same client. Now
> that the lockd client never depends on the server for that mediation,
> there is no need to send the pid.

??? On the contrary. Genuine conflicts are currently resolved by the
server and the server alone. We do admittedly call do_vfs_lock() with
the ACCESS flag set, but that doesn't set a lock. It is quite possible
for 2 processes to pass the do_vfs_lock() call, but then conflict on
actually setting the lock.

IOW: afaics your patch will fundamentally break posix/fcntl() locks.

Worse, it means that unlocking the flock() may result in an automatic
clearing of all posix/fcntl() locks...

Cheers
Trond


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-30 03:45:18

by NeilBrown

[permalink] [raw]
Subject: Re: flock and lockf over NFS and the 'pid' in lockd requests.

On Sunday April 29, [email protected] wrote:
> On Mon, 2007-04-30 at 11:45 +1000, Neil Brown wrote:
> > Currently an flock and an fcntl lock on the one file will both be sent
> > to the lockd server and it will appear to be a conflict. We can tell
> > lockd that it isn't a conflict by setting the 'pid' field in the lock
> > requests to some constant (e.g. 0, or 42 or whatever).
> > As no genuine conflicts are ever sent to a server, it is always safe
> > to just send a constant pid. The pid is only used for lockd to
> > mediate locking requests between processes on the same client. Now
> > that the lockd client never depends on the server for that mediation,
> > there is no need to send the pid.
>
> ??? On the contrary. Genuine conflicts are currently resolved by the
> server and the server alone. We do admittedly call do_vfs_lock() with
> the ACCESS flag set, but that doesn't set a lock. It is quite possible
> for 2 processes to pass the do_vfs_lock() call, but then conflict on
> actually setting the lock.
>

Oh bother, I didn't notice the ACCESS setting. Thanks for that.

> IOW: afaics your patch will fundamentally break posix/fcntl() locks.

Indeed... can it be fixed I wonder.

Not easily I think. e.g. when unlocking we need to tell the server
only to unlock that part (those parts) which is/are not held by some other
local client. And that is just one of the issues.

Which means either:
- you cannot hold both an fcntl lock and a flock for the same file at
the same time (thus NFS breaks local file semantics again) or
- we return to the unfortunate but well-understood situation that
flock locks don't work over NFS.
- massive rewrite of the lockd code so that the client doesn't depend
on the server for local arbitration, and the client needs to
potentially send lots of subrange unlocks for a single unlock
call.

What would you think of an "noflock" option to allow people to choose
the old practice of not supporting flock???

NeilBrown

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-30 04:00:37

by Wendy Cheng

[permalink] [raw]
Subject: Re: flock and lockf over NFS and the 'pid' in lockd requests.

Neil Brown wrote:

>On Sunday April 29, [email protected] wrote:
>
>
>>??? On the contrary. Genuine conflicts are currently resolved by the
>>server and the server alone. We do admittedly call do_vfs_lock() with
>>the ACCESS flag set, but that doesn't set a lock.
>>
>Oh bother, I didn't notice the ACCESS setting. Thanks for that.
>
>

Tons of people, including me, missed that flag when they first looked at
this code.

> Indeed... can it be fixed I wonder.
>
>[snip]
>
> - massive rewrite of the lockd code so that the client doesn't depend
> on the server for local arbitration,
>
I "like" this (think about performance gain).

>
>What would you think of an "noflock" option to allow people to choose
>the old practice of not supporting flock???
>
>
Ha! Take easy way out ? (sorry, can't resist this :) ).

-- Wendy


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-30 04:06:13

by Wendy Cheng

[permalink] [raw]
Subject: Re: flock and lockf over NFS and the 'pid' in lockd requests.

Wendy Cheng wrote:

> Neil Brown wrote:
>
>>
>> What would you think of an "noflock" option to allow people to choose
>> the old practice of not supporting flock???
>>
Sorry, shouldn't make a joke as previous post - I should have said this
is a sensible solution for now.

-- Wendy

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-30 06:08:28

by Olaf Kirch

[permalink] [raw]
Subject: Re: flock and lockf over NFS and the 'pid' in lockd requests.

On Monday 30 April 2007 05:44, Neil Brown wrote:
> - you cannot hold both an fcntl lock and a flock for the same file at
> the same time (thus NFS breaks local file semantics again) or

It would be nice if we could at least return EDEADLOCK
and do a printk("%s tried to take flock+posix lock\n", current->comm);
I think in general the current behavior isn't that bad, and I wouldn't
expect many apps to be affected. In general, MUAs will usually
implement either flock or fcntl locks, but not both at the same time.
Except for procmail, which has always been a bit... special.

> - we return to the unfortunate but well-understood situation that
> flock locks don't work over NFS.

I don't think we want that.

> - massive rewrite of the lockd code so that the client doesn't depend
> on the server for local arbitration, and the client needs to
> potentially send lots of subrange unlocks for a single unlock
> call.

Don't go there. You basically need to copy fs/locks.c and butcher it
badly. On a per-file and per-owner basis, you need to maintain a
list of ranges and their lock status. They would be overlapping
(shared locks), and the semantics on unlock would be a mess
because of the different inheritance rules for posix vs flock locks.

> What would you think of an "noflock" option to allow people to choose
> the old practice of not supporting flock???

I'd suggest, fix procmail and be done with it :-)

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-30 13:17:50

by Trond Myklebust

[permalink] [raw]
Subject: Re: flock and lockf over NFS and the 'pid' in lockd requests.

On Mon, 2007-04-30 at 08:08 +0200, Olaf Kirch wrote:
> On Monday 30 April 2007 05:44, Neil Brown wrote:
> > - you cannot hold both an fcntl lock and a flock for the same file at
> > the same time (thus NFS breaks local file semantics again) or
>
> It would be nice if we could at least return EDEADLOCK
> and do a printk("%s tried to take flock+posix lock\n", current->comm);
> I think in general the current behavior isn't that bad, and I wouldn't
> expect many apps to be affected. In general, MUAs will usually
> implement either flock or fcntl locks, but not both at the same time.
> Except for procmail, which has always been a bit... special.

EDEADLK isn't really a valid flock() error, but I wouldn't object to
returning ENOLCK if someone tries to take both a flock and a posix lock.

> > - massive rewrite of the lockd code so that the client doesn't depend
> > on the server for local arbitration, and the client needs to
> > potentially send lots of subrange unlocks for a single unlock
> > call.
>
> Don't go there. You basically need to copy fs/locks.c and butcher it
> badly. On a per-file and per-owner basis, you need to maintain a
> list of ranges and their lock status. They would be overlapping
> (shared locks), and the semantics on unlock would be a mess
> because of the different inheritance rules for posix vs flock locks.

Yeah. You also lose the ability to do things like lockf(fd, F_ULOCK, 0)
in order to atomically release all locks on the file. That has
consequences for fair queueing of blocking locks, etc...

Cheers
Trond


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs