2006-08-08 14:39:57

by Jesper Juhl

Subject: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

I have some webservers that have recently started reporting the
following message in their logs :

do_vfs_lock: VFS is out of sync with lock manager!

The servers' kernels were upgraded to 2.6.17.8, and the message started
appearing after the upgrade.
The servers were previously running 2.6.13.4 without experiencing this problem.
Nothing has changed except the kernel.

I've googled a bit and found this mail
(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
"The above is a lockd error that states that the VFS is failing to track
your NFS locks correctly".
OK, but that doesn't really help me resolve the issue. The servers are
indeed running NFS and access their Apache DocumentRoots from an NFS
mount.

Is there anything I can do to help track down this issue?

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html


2006-08-09 05:53:52

by Grant Coady

Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:

>I have some webservers that have recently started reporting the
>following message in their logs :
>
> do_vfs_lock: VFS is out of sync with lock manager!
>
>The servers' kernels were upgraded to 2.6.17.8 and since the upgrade
>the message started appearing.
>The servers were previously running 2.6.13.4 without experiencing this problem.
>Nothing has changed except the kernel.
>
>I've googled a bit and found this mail
>(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
>"The above is a lockd error that states that the VFS is failing to track
>your NFS locks correctly".
>Ok, but that doesn't really help me resolve the issue. The servers are
>indeed running NFS and access their apache DocumentRoots from a NFS
>mount.
>
>Is there anything I can do to help track down this issue?

I don't have an answer, but offer this observation: five boxen running
2.6.17.8 doing six simultaneous

bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1

didn't burp. The /home/share/ is an NFS export from another box running
2.4.33-rc3a, though I'm not sure whether this exercised any NFS locking,
as the NFS source file was only opened non-exclusively and read-only.

Grant.
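
A read-only open such as the bzcat above does not request a byte-range lock, so it probably never reaches lockd. On an NFSv2/v3 mount (unless it is mounted with nolock), fcntl() locks are the usual way to get there. The following is only a minimal sketch of a test that repeatedly takes and drops a whole-file POSIX lock; the function names and the test file path are hypothetical, and nothing in this thread confirms that such a test triggers the warning.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Apply one whole-file POSIX lock operation (F_WRLCK, F_RDLCK or F_UNLCK).
 * Returns 0 on success, -errno on failure. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "up to end of file" */

    /* F_SETLKW blocks until the lock can be granted. */
    if (fcntl(fd, F_SETLKW, &fl) == -1)
        return -errno;
    return 0;
}

/* Lock and unlock `path` `rounds` times (hypothetical helper).
 * Returns the number of completed rounds, or -errno on the first failure. */
int exercise_posix_lock(const char *path, int rounds)
{
    int fd, i, res;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -errno;

    for (i = 0; i < rounds; i++) {
        res = set_whole_file_lock(fd, F_WRLCK);
        if (res == 0)
            res = set_whole_file_lock(fd, F_UNLCK);
        if (res < 0) {
            close(fd);
            return res;
        }
    }
    close(fd);
    return i;
}
```

Calling exercise_posix_lock() from a small main() on several clients at once, all pointed at the same file on the NFS mount, should at least put lock and unlock traffic through the lock manager while the kernel log is watched for the message.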

2006-08-09 08:07:43

by Jesper Juhl

Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On 09/08/06, Grant Coady <[email protected]> wrote:
> On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:
>
> >I have some webservers that have recently started reporting the
> >following message in their logs :
> >
> > do_vfs_lock: VFS is out of sync with lock manager!
> >
> >The servers' kernels were upgraded to 2.6.17.8 and since the upgrade
> >the message started appearing.
> >The servers were previously running 2.6.13.4 without experiencing this problem.
> >Nothing has changed except the kernel.
> >
> >I've googled a bit and found this mail
> >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
> >"The above is a lockd error that states that the VFS is failing to track
> >your NFS locks correctly".
> >Ok, but that doesn't really help me resolve the issue. The servers are
> >indeed running NFS and access their apache DocumentRoots from a NFS
> >mount.
> >
> >Is there anything I can do to help track down this issue?
>
> I don't have an answer, but offer this observation: five boxen running
> 2.6.17.8 doing six simultaneous
>
> bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
>
> didn't burp. The /home/share/ is an NFS export from another box running
> 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
> NFS source file was only opened for non-exclusive read-only.
>
The NFS server here is running 2.6.11.11 and doesn't seem to be
reporting any problems. But I now have two more of my webservers (both
running 2.6.17.8) that have started to complain about "do_vfs_lock:
VFS is out of sync with lock manager!"

Unfortunately, I've not found a way to cause the message to be reported at will.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-08-10 22:37:38

by Jesper Juhl

Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On 09/08/06, Jesper Juhl <[email protected]> wrote:
> On 09/08/06, Grant Coady <[email protected]> wrote:
> > On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:
> >
> > >I have some webservers that have recently started reporting the
> > >following message in their logs :
> > >
> > > do_vfs_lock: VFS is out of sync with lock manager!
> > >
> > >The servers' kernels were upgraded to 2.6.17.8 and since the upgrade
> > >the message started appearing.
> > >The servers were previously running 2.6.13.4 without experiencing this problem.
> > >Nothing has changed except the kernel.
> > >
> > >I've googled a bit and found this mail
> > >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
> > >"The above is a lockd error that states that the VFS is failing to track
> > >your NFS locks correctly".
> > >Ok, but that doesn't really help me resolve the issue. The servers are
> > >indeed running NFS and access their apache DocumentRoots from a NFS
> > >mount.
> > >
> > >Is there anything I can do to help track down this issue?
> >
> > I don't have an answer, but offer this observation: five boxen running
> > 2.6.17.8 doing six simultaneous
> >
> > bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
> >
> > didn't burp. The /home/share/ is an NFS export from another box running
> > 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
> > NFS source file was only opened for non-exclusive read-only.
> >
> The NFS server here is running 2.6.11.11 and doesn't seem to be
> reporting any problems. But I now have two more of my webservers (both
> running 2.6.17.8) that have started to complain about "do_vfs_lock:
> VFS is out of sync with lock manager!"
>
> I've not found a way to cause the message to be reported at will unfortunately.
>
Today 3 more of my webservers running 2.6.17.8 reported this message.
The machines all seem to be running fine still, so it doesn't seem to
be a serious problem, but it would still be nice to get it fixed ;)


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-08-11 00:30:32

by Grant Coady

Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On Fri, 11 Aug 2006 00:37:35 +0200, "Jesper Juhl" <[email protected]> wrote:

>On 09/08/06, Jesper Juhl <[email protected]> wrote:
>> On 09/08/06, Grant Coady <[email protected]> wrote:
>> > On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:
>> >
>> > >I have some webservers that have recently started reporting the
>> > >following message in their logs :
>> > >
>> > > do_vfs_lock: VFS is out of sync with lock manager!
>> > >
>> > >The servers' kernels were upgraded to 2.6.17.8 and since the upgrade
>> > >the message started appearing.
>> > >The servers were previously running 2.6.13.4 without experiencing this problem.
>> > >Nothing has changed except the kernel.
>> > >
>> > >I've googled a bit and found this mail
>> > >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
>> > >"The above is a lockd error that states that the VFS is failing to track
>> > >your NFS locks correctly".
>> > >Ok, but that doesn't really help me resolve the issue. The servers are
>> > >indeed running NFS and access their apache DocumentRoots from a NFS
>> > >mount.
>> > >
>> > >Is there anything I can do to help track down this issue?
>> >
>> > I don't have an answer, but offer this observation: five boxen running
>> > 2.6.17.8 doing six simultaneous
>> >
>> > bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
>> >
>> > didn't burp. The /home/share/ is an NFS export from another box running
>> > 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
>> > NFS source file was only opened for non-exclusive read-only.
>> >
>> The NFS server here is running 2.6.11.11 and doesn't seem to be
>> reporting any problems. But I now have two more of my webservers (both
>> running 2.6.17.8) that have started to complain about "do_vfs_lock:
>> VFS is out of sync with lock manager!"
>>
>> I've not found a way to cause the message to be reported at will unfortunately.
>>
>Today 3 more of my webservers running 2.6.17.8 reported this message.
>The machines all seem to be running fine still, so it doesn't seem to
>be a serious problem, but it would still be nice to get it fixed ;)

Do you have a test case other than a web server that triggers it?
I can try it here; I recently did a lot of NFS testing (reported on lkml
under 2.4.33-rc1 or -rc2) to try to sort out a vfs_unlink issue.

My web server here is low volume, not a good test situation -- plus
it no longer runs under 2.6 and I don't want to take it down too often
to sort out why. Once it could dual boot 2.4 or 2.6 without trouble,
but that option fell apart when 2.6.16 came out.

Grant.

2006-08-12 09:03:10

by Chuck Ebbert

Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

In-Reply-To: <[email protected]>

On Fri, 11 Aug 2006 00:37:35 +0200, Jesper Juhl wrote:

> > > >I have some webservers that have recently started reporting the
> > > >following message in their logs :
> > > >
> > > > do_vfs_lock: VFS is out of sync with lock manager!

What does this (not even compile tested) patch print?

--- 2.6.17.8-nb/fs/lockd/clntproc.c 2006-06-10 17:39:21.000000000 -0400
+++ 2.6.17.8-nb/fs/lockd/clntproc.c.new 2006-08-12 04:43:45.000000000 -0400
@@ -458,7 +458,9 @@ static void nlmclnt_locks_init_private(s
 static void do_vfs_lock(struct file_lock *fl)
 {
 	int res = 0;
-	switch (fl->fl_flags & (FL_POSIX|FL_FLOCK)) {
+	unsigned char flags = fl->fl_flags & (FL_POSIX|FL_FLOCK);
+
+	switch (flags) {
 		case FL_POSIX:
 			res = posix_lock_file_wait(fl->fl_file, fl);
 			break;
@@ -469,8 +471,8 @@ static void do_vfs_lock(struct file_lock
 			BUG();
 	}
 	if (res < 0)
-		printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n",
-				__FUNCTION__);
+		printk(KERN_WARNING "%s: VFS is out of sync with lock manager! -- %s: %d\n",
+				__FUNCTION__, flags == FL_POSIX ? "POSIX" : "FLOCK", res);
 }
 
 /*
--
Chuck
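
The patched message ends with the lock type and the value of res, which by kernel convention is a negative errno. As a rough aid for reading that number, here is a small user-space sketch that turns such a value into text with strerror(); the helper name describe_lock_result is made up for illustration and is not part of the kernel or of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format a kernel-style result code (0 or a negative errno) as text.
 * Hypothetical helper; returns buf for convenience. */
const char *describe_lock_result(int res, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (res >= 0)
        snprintf(buf, len, "%d (success)", res);
    else
        /* The kernel reports -ENOLCK, -EAGAIN, ... so negate before lookup. */
        snprintf(buf, len, "%d (%s)", res, strerror(-res));
    return buf;
}
```

Kernel-internal codes such as ERESTARTSYS (512) have no user-space name, so strerror() will only describe them as an unknown error; for those the numeric value has to be looked up in the kernel's own errno headers.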

2006-08-13 23:08:55

by Grant Coady

Subject: Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!

On Fri, 11 Aug 2006 00:37:35 +0200, "Jesper Juhl" <[email protected]> wrote:

>On 09/08/06, Jesper Juhl <[email protected]> wrote:
>> On 09/08/06, Grant Coady <[email protected]> wrote:
>> > On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <[email protected]> wrote:
>> >
>> > >I have some webservers that have recently started reporting the
>> > >following message in their logs :
>> > >
>> > > do_vfs_lock: VFS is out of sync with lock manager!
>> > >
>> > >The servers' kernels were upgraded to 2.6.17.8 and since the upgrade
>> > >the message started appearing.
>> > >The servers were previously running 2.6.13.4 without experiencing this problem.
>> > >Nothing has changed except the kernel.
>> > >
>> > >I've googled a bit and found this mail
>> > >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
>> > >"The above is a lockd error that states that the VFS is failing to track
>> > >your NFS locks correctly".
>> > >Ok, but that doesn't really help me resolve the issue. The servers are
>> > >indeed running NFS and access their apache DocumentRoots from a NFS
>> > >mount.
>> > >
>> > >Is there anything I can do to help track down this issue?
>> >
>> > I don't have an answer, but offer this observation: five boxen running
>> > 2.6.17.8 doing six simultaneous
>> >
>> > bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
>> >
>> > didn't burp. The /home/share/ is an NFS export from another box running
>> > 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
>> > NFS source file was only opened for non-exclusive read-only.
>> >
>> The NFS server here is running 2.6.11.11 and doesn't seem to be
>> reporting any problems. But I now have two more of my webservers (both
>> running 2.6.17.8) that have started to complain about "do_vfs_lock:
>> VFS is out of sync with lock manager!"
>>
>> I've not found a way to cause the message to be reported at will unfortunately.
>>
>Today 3 more of my webservers running 2.6.17.8 reported this message.
>The machines all seem to be running fine still, so it doesn't seem to
>be a serious problem, but it would still be nice to get it fixed ;)

I've been running a continuous kernel 2.4.33 rebuild from make mrproper,
plus another console extracting a tarball and diffing the tree against
last_extracted, on a pair of 2.6.17.8 boxen overnight with NFS TCP support:
no problems. I'm now testing without TCP support and will report again
only if I see problems.

Let me know if you want to see test scripts.

Grant.