2010-05-27 11:11:04

by Bernd Petrovitsch

Subject: Debugging fcntl() file locks

Hi all!

How can I debug fcntl() file locks on an NFSv3 client?
The server side is a NetApp box (no details at hand, but I can ask).
The clients run stock RHEL 5.3/CentOS 5.3 kernels (2.6.18-92.el5).

The file system in question is mounted on 2 clients. flock() on a file
succeeds but fcntl() fails with EAGAIN. From what I found on the
Internet and in the manual pages, this means that someone else has already
locked that file (the file is successfully open()ed read/write, so it can't
be "you are not allowed to write-lock the file").
But how do I find out on which host and by which process?

strace shows:
---- snip ----
open("/... secret ...", O_RDWR) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff7925b400) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0600, st_size=935, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
flock(3, LOCK_EX|LOCK_NB) = 0
fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
---- snip ----
[ This is a Perl script using CPAN modules at the top. ]

/proc/mounts shows
---- snip ----
rw,nodiratime,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=<ip-address>
---- snip ----
as options for the filesystem.
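
On the question of which process holds a conflicting lock: one option is to
ask the kernel with fcntl(F_GETLK). The following is only a minimal sketch,
not something taken from the script discussed here. The function name
find_lock_holder and the struct flock layout "hhqqi" (64-bit Linux) are
assumptions. On NFS the reported PID may belong to a process on another host
and so may not be meaningful locally.

```python
import fcntl
import os
import struct

# Assumed layout of struct flock on 64-bit Linux:
# short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid
FLOCK_FMT = "hhqqi"


def find_lock_holder(path):
    """Return the PID reported for a conflicting POSIX lock, or None.

    Asks whether a write lock on the whole file could be placed right now.
    Caveat: closing the descriptor at the end drops any POSIX locks this
    process itself holds on the same file.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Describe the lock we would like to take: F_WRLCK, whole file.
        query = struct.pack(FLOCK_FMT, fcntl.F_WRLCK, os.SEEK_SET, 0, 0, 0)
        reply = fcntl.fcntl(fd, fcntl.F_GETLK, query)
        l_type, _whence, _start, _length, l_pid = struct.unpack(FLOCK_FMT, reply)
        # F_UNLCK in the reply means no conflicting lock was found.
        return None if l_type == fcntl.F_UNLCK else l_pid
    finally:
        os.close(fd)
```

For locks held by processes on the same host, /proc/locks lists the PID and
inode of each lock and can be cross-checked against the result.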

Thanks in advance,
Bernd
--
mobile: +43 664 4416156 http://www.sysprog.at/
Linux Software Development, Consulting and Services



2010-05-28 09:36:15

by Bernd Petrovitsch

Subject: Re: Debugging fcntl() file locks

On Thu, 2010-05-27 at 10:18 -0400, Trond Myklebust wrote:
> On Thu, 2010-05-27 at 15:46 +0200, Bernd Petrovitsch wrote:
[...]
> > On Don, 2010-05-27 at 08:34 -0400, Trond Myklebust wrote:
> > > On Thu, 2010-05-27 at 12:11 +0200, Bernd Petrovitsch wrote:
> > [...]
> > > > How can I debug fcntl() file locks on a NFSv3-client?
> > > > The server side is a NetApp-Box (no details at hand but I can ask).
> > > > The client side are stock-RHEL5.3/CentOS-5.3 kernels - 2.6.18-92.el5.
> > > >
> > > > The file system in question is mounted on 2 clients. flock() on a file
> > > > succeeds but fcntl() fails with EAGAIN. From what I found in the
> > > > Internet and manual pages, this means that someone else already locked
> > > > that file (- the file is successfully open()ed read/write so it can't be
> > > > "your are not allowed to write-lock the file").
> > > > But how do I find out on which host and which process?
> > > >
> > > > strace shows:
> > > > ---- snip ----
> > > > open("/... secret ...", O_RDWR) = 3
> > > > ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff7925b400) = -1 ENOTTY (Inappropriate ioctl for device)
> > > > lseek(3, 0, SEEK_CUR) = 0
> > > > fstat(3, {st_mode=S_IFREG|0600, st_size=935, ...}) = 0
> > > > fcntl(3, F_SETFD, FD_CLOEXEC) = 0
> > > > flock(3, LOCK_EX|LOCK_NB) = 0
> > > > fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
> > > > ---- snip ----
> > > > [ This is a perl script using CPAN modules at the top. ]
> > [...]
> > > The Linux NFS client does not allow you to lock a file using both
> > > flock() and POSIX locks. You should choose one or the other locking
> > > scheme. If you remove the flock() line above, then the POSIX lock will
> > > likely succeed.
> >
> > Thanks.
> > Does flock() work on NFS these days on NFS?
> > Historically that was not the case IIRC.
>
> Yes. flock() has worked since the early 2.6.x series, however it uses
> the same NFS byte range lock protocol as POSIX locks, so there is no way
> to avoid conflicts.

IMHO this is the saner approach: treat flock() and fcntl() locking as the
same thing with just different APIs/interfaces. But there are probably more
(historically grown) differences.

Other filesystems, ext3 in particular, do allow this "duplicate locking",
at least for /dev/null.
Said script tries exactly that to check whether fcntl() can be used after
flock().
Not that I find that a particularly good solution, because
- there is no check whether /dev/null is on the same filesystem as the
to-be-locked files, and
- it is unclear whether locking a special char device behaves the same as
locking a plain file.
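
Both objections could be addressed by probing a scratch file in the same
directory as the files to be locked. The sketch below illustrates the idea
and is not the CPAN module's actual code. The function name
fcntl_works_after_flock is hypothetical, and it assumes the directory is
writable.

```python
import errno
import fcntl
import os
import tempfile


def fcntl_works_after_flock(directory):
    """Return True if a POSIX lock can be taken on a regular file in
    `directory` after flock() has already locked the same descriptor."""
    # The scratch file is a plain file on the filesystem of interest.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            # lockf() wraps fcntl(F_SETLK); length 0 means the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EACCES):
                return False  # the two lock types conflict here
            raise
        return True
    finally:
        os.close(fd)  # closing the descriptor releases both locks
        os.unlink(path)
```

Called with the directory that holds the real lock files, this tests the
filesystem that matters rather than whichever one /dev happens to be on.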

Bernd
--
mobile: +43 664 4416156 http://www.sysprog.at/
Linux Software Development, Consulting and Services


2010-05-27 12:34:52

by Trond Myklebust

Subject: Re: Debugging fcntl() file locks

On Thu, 2010-05-27 at 12:11 +0200, Bernd Petrovitsch wrote:
> Hi all!
>
> How can I debug fcntl() file locks on a NFSv3-client?
> The server side is a NetApp-Box (no details at hand but I can ask).
> The client side are stock-RHEL5.3/CentOS-5.3 kernels - 2.6.18-92.el5.
>
> The file system in question is mounted on 2 clients. flock() on a file
> succeeds but fcntl() fails with EAGAIN. From what I found in the
> Internet and manual pages, this means that someone else already locked
> that file (- the file is successfully open()ed read/write so it can't be
> "your are not allowed to write-lock the file").
> But how do I find out on which host and which process?
>
> strace shows:
> ---- snip ----
> open("/... secret ...", O_RDWR) = 3
> ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff7925b400) = -1 ENOTTY (Inappropriate ioctl for device)
> lseek(3, 0, SEEK_CUR) = 0
> fstat(3, {st_mode=S_IFREG|0600, st_size=935, ...}) = 0
> fcntl(3, F_SETFD, FD_CLOEXEC) = 0
> flock(3, LOCK_EX|LOCK_NB) = 0
> fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
> ---- snip ----
> [ This is a perl script using CPAN modules at the top. ]
>
> /proc/mounts shows
> ---- snip ----
> rw,nodiratime,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=<ip-address>
> ---- snip ----
> as options for the filesystem.
>
> Thanks in advance,
> Bernd

The Linux NFS client does not allow you to lock a file using both
flock() and POSIX locks. You should choose one or the other locking
scheme. If you remove the flock() line above, then the POSIX lock will
likely succeed.
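
As an illustration of choosing one scheme, here is a minimal sketch that
takes only the POSIX lock and never calls flock(). It mirrors the F_SETLK
call in the strace output (F_WRLCK, start=0, len=0). The helper name
lock_exclusive_posix is an assumption for the example.

```python
import fcntl
import os


def lock_exclusive_posix(path):
    """Open `path` read/write and take a non-blocking POSIX write lock
    on the whole file. Returns the locked file descriptor."""
    fd = os.open(path, os.O_RDWR)
    try:
        # lockf() wraps fcntl(F_SETLK); the default length of 0 covers
        # the whole file. Raises OSError (EAGAIN/EACCES) if it is locked.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd
```

The caller keeps the descriptor open for as long as the lock is needed;
closing it releases the lock.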

Cheers
Trond


2010-05-27 14:18:51

by Trond Myklebust

Subject: Re: Debugging fcntl() file locks

On Thu, 2010-05-27 at 15:46 +0200, Bernd Petrovitsch wrote:
> Hi!
>
> On Don, 2010-05-27 at 08:34 -0400, Trond Myklebust wrote:
> > On Thu, 2010-05-27 at 12:11 +0200, Bernd Petrovitsch wrote:
> [...]
> > > How can I debug fcntl() file locks on a NFSv3-client?
> > > The server side is a NetApp-Box (no details at hand but I can ask).
> > > The client side are stock-RHEL5.3/CentOS-5.3 kernels - 2.6.18-92.el5.
> > >
> > > The file system in question is mounted on 2 clients. flock() on a file
> > > succeeds but fcntl() fails with EAGAIN. From what I found in the
> > > Internet and manual pages, this means that someone else already locked
> > > that file (- the file is successfully open()ed read/write so it can't be
> > > "your are not allowed to write-lock the file").
> > > But how do I find out on which host and which process?
> > >
> > > strace shows:
> > > ---- snip ----
> > > open("/... secret ...", O_RDWR) = 3
> > > ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff7925b400) = -1 ENOTTY (Inappropriate ioctl for device)
> > > lseek(3, 0, SEEK_CUR) = 0
> > > fstat(3, {st_mode=S_IFREG|0600, st_size=935, ...}) = 0
> > > fcntl(3, F_SETFD, FD_CLOEXEC) = 0
> > > flock(3, LOCK_EX|LOCK_NB) = 0
> > > fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
> > > ---- snip ----
> > > [ This is a perl script using CPAN modules at the top. ]
> [...]
> > The Linux NFS client does not allow you to lock a file using both
> > flock() and POSIX locks. You should choose one or the other locking
> > scheme. If you remove the flock() line above, then the POSIX lock will
> > likely succeed.
>
> Thanks.
> Does flock() work on NFS these days on NFS?
> Historically that was not the case IIRC.

Yes. flock() has worked since the early 2.6.x series, however it uses
the same NFS byte range lock protocol as POSIX locks, so there is no way
to avoid conflicts.

Cheers
Trond


2010-05-27 14:11:03

by Bernd Petrovitsch

Subject: Re: Debugging fcntl() file locks

Hi!

On Thu, 2010-05-27 at 08:34 -0400, Trond Myklebust wrote:
> On Thu, 2010-05-27 at 12:11 +0200, Bernd Petrovitsch wrote:
[...]
> > How can I debug fcntl() file locks on a NFSv3-client?
> > The server side is a NetApp-Box (no details at hand but I can ask).
> > The client side are stock-RHEL5.3/CentOS-5.3 kernels - 2.6.18-92.el5.
> >
> > The file system in question is mounted on 2 clients. flock() on a file
> > succeeds but fcntl() fails with EAGAIN. From what I found in the
> > Internet and manual pages, this means that someone else already locked
> > that file (- the file is successfully open()ed read/write so it can't be
> > "your are not allowed to write-lock the file").
> > But how do I find out on which host and which process?
> >
> > strace shows:
> > ---- snip ----
> > open("/... secret ...", O_RDWR) = 3
> > ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff7925b400) = -1 ENOTTY (Inappropriate ioctl for device)
> > lseek(3, 0, SEEK_CUR) = 0
> > fstat(3, {st_mode=S_IFREG|0600, st_size=935, ...}) = 0
> > fcntl(3, F_SETFD, FD_CLOEXEC) = 0
> > flock(3, LOCK_EX|LOCK_NB) = 0
> > fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
> > ---- snip ----
> > [ This is a perl script using CPAN modules at the top. ]
[...]
> The Linux NFS client does not allow you to lock a file using both
> flock() and POSIX locks. You should choose one or the other locking
> scheme. If you remove the flock() line above, then the POSIX lock will
> likely succeed.

Thanks.
Does flock() work on NFS these days?
Historically that was not the case IIRC.

Bernd
--
Bernd Petrovitsch Email : [email protected]
LUGA : http://www.luga.at