2006-08-08 14:39:59

by Jesper Juhl

[permalink] [raw]
Subject: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

I have some webservers that have recently started reporting the
following message in their logs :

do_vfs_lock: VFS is out of sync with lock manager!

The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
the message started appearing.
The servers were previously running 2.6.13.4 without experiencing this problem.
Nothing has changed except the kernel.

I've googled a bit and found this mail
(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
"The above is a lockd error that states that the VFS is failing to track
your NFS locks correctly".
Ok, but that doesn't really help me resolve the issue. The servers are
indeed running NFS and access their apache DocumentRoots from a NFS
mount.

Is there anything I can do to help track down this issue?

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-08-09 08:07:44

by Jesper Juhl

[permalink] [raw]
Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On 09/08/06, Grant Coady <[email protected]> wrote:
> On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:
>
> >I have some webservers that have recently started reporting the
> >following message in their logs :
> >
> > do_vfs_lock: VFS is out of sync with lock manager!
> >
> >The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
> >the message started appearing.
> >The servers were previously running 2.6.13.4 without experiencing this problem.
> >Nothing has changed except the kernel.
> >
> >I've googled a bit and found this mail
> >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
> >"The above is a lockd error that states that the VFS is failing to track
> >your NFS locks correctly".
> >Ok, but that doesn't really help me resolve the issue. The servers are
> >indeed running NFS and access their apache DocumentRoots from a NFS
> >mount.
> >
> >Is there anything I can do to help track down this issue?
>
> I don't have an answer, but offer this observation: five boxen running
> 2.6.17.8 doing six simultaneous
>
> bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
>
> didn't burp. The /home/share/ is an NFS export from another box running
> 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
> NFS source file was only opened for non-exclusive read-only.
>
The NFS server here is running 2.6.11.11 and doesn't seem to be
reporting any problems. But I now have two more of my webservers (both
running 2.6.17.8) that have started to complain about "do_vfs_lock:
VFS is out of sync with lock manager!"

I've not found a way to cause the message to be repported at will unfortunately.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-08-10 22:37:44

by Jesper Juhl

[permalink] [raw]
Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On 09/08/06, Jesper Juhl <[email protected]> wrote:
> On 09/08/06, Grant Coady <[email protected]> wrote:
> > On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:
> >
> > >I have some webservers that have recently started reporting the
> > >following message in their logs :
> > >
> > > do_vfs_lock: VFS is out of sync with lock manager!
> > >
> > >The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
> > >the message started appearing.
> > >The servers were previously running 2.6.13.4 without experiencing this problem.
> > >Nothing has changed except the kernel.
> > >
> > >I've googled a bit and found this mail
> > >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
> > >"The above is a lockd error that states that the VFS is failing to track
> > >your NFS locks correctly".
> > >Ok, but that doesn't really help me resolve the issue. The servers are
> > >indeed running NFS and access their apache DocumentRoots from a NFS
> > >mount.
> > >
> > >Is there anything I can do to help track down this issue?
> >
> > I don't have an answer, but offer this observation: five boxen running
> > 2.6.17.8 doing six simultaneous
> >
> > bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
> >
> > didn't burp. The /home/share/ is an NFS export from another box running
> > 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
> > NFS source file was only opened for non-exclusive read-only.
> >
> The NFS server here is running 2.6.11.11 and doesn't seem to be
> reporting any problems. But I now have two more of my webservers (both
> running 2.6.17.8) that have started to complain about "do_vfs_lock:
> VFS is out of sync with lock manager!"
>
> I've not found a way to cause the message to be repported at will unfortunately.
>
Today 3 more of my webservers running 2.6.17.8 reported this message.
The machines all seem to be running fine still, so it doesn't seem to
be a serious problem, but it would still be nice to get it fixed ;)


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-08-11 00:30:33

by Grant Coady

[permalink] [raw]
Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On Fri, 11 Aug 2006 00:37:35 +0200, "Jesper Juhl" <[email protected]> wrote:

>On 09/08/06, Jesper Juhl <[email protected]> wrote:
>> On 09/08/06, Grant Coady <[email protected]> wrote:
>> > On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:
>> >
>> > >I have some webservers that have recently started reporting the
>> > >following message in their logs :
>> > >
>> > > do_vfs_lock: VFS is out of sync with lock manager!
>> > >
>> > >The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
>> > >the message started appearing.
>> > >The servers were previously running 2.6.13.4 without experiencing this problem.
>> > >Nothing has changed except the kernel.
>> > >
>> > >I've googled a bit and found this mail
>> > >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
>> > >"The above is a lockd error that states that the VFS is failing to track
>> > >your NFS locks correctly".
>> > >Ok, but that doesn't really help me resolve the issue. The servers are
>> > >indeed running NFS and access their apache DocumentRoots from a NFS
>> > >mount.
>> > >
>> > >Is there anything I can do to help track down this issue?
>> >
>> > I don't have an answer, but offer this observation: five boxen running
>> > 2.6.17.8 doing six simultaneous
>> >
>> > bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
>> >
>> > didn't burp. The /home/share/ is an NFS export from another box running
>> > 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
>> > NFS source file was only opened for non-exclusive read-only.
>> >
>> The NFS server here is running 2.6.11.11 and doesn't seem to be
>> reporting any problems. But I now have two more of my webservers (both
>> running 2.6.17.8) that have started to complain about "do_vfs_lock:
>> VFS is out of sync with lock manager!"
>>
>> I've not found a way to cause the message to be repported at will unfortunately.
>>
>Today 3 more of my webservers running 2.6.17.8 reported this message.
>The machines all seem to be running fine still, so it doesn't seem to
>be a serious problem, but it would still be nice to get it fixed ;)

Do you have some test case other than web server that triggers it?
I can try it here, recently did much NFS testing reported under
2.4.33-rc1 or -rc2 on lkml to try sort a vfs_unlink issue.

My web server here is low volume, not a good test situation -- plus
it no longer runs under 2.6 and I don't want to take it down too often
to sort out why. Once it could dual boot 2.4 or 2.6 without trouble,
but that option fell apart when 2.6.16 came out.

Grant.

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-08-13 23:08:57

by Grant Coady

[permalink] [raw]
Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On Fri, 11 Aug 2006 00:37:35 +0200, "Jesper Juhl" <[email protected]> wrote:

>On 09/08/06, Jesper Juhl <[email protected]> wrote:
>> On 09/08/06, Grant Coady <[email protected]> wrote:
>> > On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:
>> >
>> > >I have some webservers that have recently started reporting the
>> > >following message in their logs :
>> > >
>> > > do_vfs_lock: VFS is out of sync with lock manager!
>> > >
>> > >The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
>> > >the message started appearing.
>> > >The servers were previously running 2.6.13.4 without experiencing this problem.
>> > >Nothing has changed except the kernel.
>> > >
>> > >I've googled a bit and found this mail
>> > >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
>> > >"The above is a lockd error that states that the VFS is failing to track
>> > >your NFS locks correctly".
>> > >Ok, but that doesn't really help me resolve the issue. The servers are
>> > >indeed running NFS and access their apache DocumentRoots from a NFS
>> > >mount.
>> > >
>> > >Is there anything I can do to help track down this issue?
>> >
>> > I don't have an answer, but offer this observation: five boxen running
>> > 2.6.17.8 doing six simultaneous
>> >
>> > bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
>> >
>> > didn't burp. The /home/share/ is an NFS export from another box running
>> > 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
>> > NFS source file was only opened for non-exclusive read-only.
>> >
>> The NFS server here is running 2.6.11.11 and doesn't seem to be
>> reporting any problems. But I now have two more of my webservers (both
>> running 2.6.17.8) that have started to complain about "do_vfs_lock:
>> VFS is out of sync with lock manager!"
>>
>> I've not found a way to cause the message to be repported at will unfortunately.
>>
>Today 3 more of my webservers running 2.6.17.8 reported this message.
>The machines all seem to be running fine still, so it doesn't seem to
>be a serious problem, but it would still be nice to get it fixed ;)

I'm running continuous kernel 2.4.33 rebuild from make mrproper plus
another console extracting tarball, diff tree against last_extracted,
on pair of 2.6.17.8 boxen overnight with NFS TCP support, no problems,
now testing without TCP support. Report again only if I see problems.

Let me know if you want to see test scripts.

Grant.

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-08-09 05:53:52

by Grant Coady

[permalink] [raw]
Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:

>I have some webservers that have recently started reporting the
>following message in their logs :
>
> do_vfs_lock: VFS is out of sync with lock manager!
>
>The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
>the message started appearing.
>The servers were previously running 2.6.13.4 without experiencing this problem.
>Nothing has changed except the kernel.
>
>I've googled a bit and found this mail
>(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
>"The above is a lockd error that states that the VFS is failing to track
>your NFS locks correctly".
>Ok, but that doesn't really help me resolve the issue. The servers are
>indeed running NFS and access their apache DocumentRoots from a NFS
>mount.
>
>Is there anything I can do to help track down this issue?

I don't have an answer, but offer this observation: five boxen running
2.6.17.8 doing six simultaneous

bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1

didn't burp. The /home/share/ is an NFS export from another box running
2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
NFS source file was only opened for non-exclusive read-only.

Grant.

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-11-27 09:19:31

by Jesper Juhl

[permalink] [raw]
Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

Any chance we could get the patch below (or something similar) pushed
into 2.6.19?


On 21/11/06, Jesper Juhl <[email protected]> wrote:
> On 21/08/06, Trond Myklebust <[email protected]> wrote:
> > On Mon, 2006-08-21 at 13:34 +1000, Neil Brown wrote:
> > > Looking in fs/nfs/file.c (at 2.6.18-rc4-mm1 if it matters, but 2.6.17
> > > is much the same)
> > >
> > > - do_vfs_lock is only called when the filesystem was mounted with
> > > -o nolock EXCEPT
> > > - If a lock request to the server in interrupted (when mounted with
> > > -o intr) then do_vfs_lock is called to try to get the lock
> > > locally. Normally equivalent code will be called inside
> > > fs/lockd/clntproc.c when the server replies that the lock has been
> > > gained. In the case of an interrupt though this doesn't happen
> > > but the lock may still have happened on the server. So we record
> > > locally that the lock was gained, to ensure that it gets unlocked
> > > when the process exits.
> > >
> > > As you don't have '-o nolocks' you must be hitting the second case.
> > > The lock call to the server returns -EINTR or -ERESTARTSYS and
> > > do_vfs_lock is called just-in-case.
> > > As this is a just-in-case call, it is quite possible that the lock is
> > > held by some other process, so getting an error is entirely possible.
> > > So printing the message in this case seems wrong.
> > >
> > > On the other hand, printing the message in any other case seems wrong
> > > too, as server locking is not being used, so there is nothing to get
> > > out of sync with.
> > >
> > > As a further complication, I don't think that in the just-in-case
> > > situation that it should risk waiting for the lock.
> > > Now maybe we can be sure there is a pending signal which will break
> > > out of any wait (though I'm worried about -ERESTARTSYS - that doesn't
> > > imply a signal does it?), but I would feel more comfortable if
> > > FL_SLEEP were turned off in that path.
> > >
> > > So: Trond: Any obvious errors in the above?
> > > Is the following patch ok?
> >
> > Could we instead replace it with a dprintk() that returns the value of
> > "res"? That will keep it useful for debugging purposes.
> >
>
> How about the below?
> (compile tested only)
>
> Neil: I left your Signed-off-by line since I just modified your patch slightly.
>
> Since Gmail will probably mangle the inline patch, it is attached as well.
>
>
> Signed-off-by: Neil Brown <[email protected]>
> Signed-off-by: Jesper Juhl <[email protected]>
> --
>
> fs/nfs/file.c | 11 +++++++----
> 1 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index cc93865..22572af 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -428,8 +428,8 @@ static int do_vfs_lock(struct file *file
> BUG();
> }
> if (res < 0)
> - printk(KERN_WARNING "%s: VFS is out of sync with lock
> manager!\n",
> - __FUNCTION__);
> + dprintk("%s: VFS is out of sync with lock manager (res
> = %d)!\n",
> + __FUNCTION__, res);
> return res;
> }
>
> @@ -479,10 +479,13 @@ static int do_setlk(struct file *filp, i
> * we clean up any state on the server. We therefore
> * record the lock call as having succeeded in order to
> * ensure that locks_remove_posix() cleans it out when
> - * the process exits.
> + * the process exits. Make sure not to sleep if
> + * someone else holds the lock.
> */
> - if (status == -EINTR || status == -ERESTARTSYS)
> + if (status == -EINTR || status == -ERESTARTSYS) {
> + fl->fl_flags &= ~FL_SLEEP;
> do_vfs_lock(filp, fl);
> + }
> } else
> status = do_vfs_lock(filp, fl);
> unlock_kernel();
>



--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-02-01 22:39:54

by NeilBrown

[permalink] [raw]
Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On Wednesday January 31, [email protected] wrote:
> On 29/01/07, Trond Myklebust <[email protected]> wrote:
>
> Finally getting that in will be sooooo nice :-) Thank you.
>
> Btw: any reason why not to include the
> fl->fl_flags &= ~FL_SLEEP;
> bit as well? As in http://lkml.org/lkml/2006/11/27/41 ??
>

Uhmm... I guess I had forgotten it ....

Looking again, I cannot convince myself that it is needed.
The comment says "If we were signalled ...", and if we were signalled
then do_vfs_lock won't block anyway. I think I was probably being
over-cautious as I didn't know the significance of testing for
-ERESTARTSYS.
However it seems to only get returned if a signal is pending (so why
EINTR isn't returned I still don't know) so a signal must be pending
on that piece of code, so there is no need to clear FL_SLEEP.

Thanks,
NeilBrown

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier.
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs