2005-01-12 15:23:27

by Anders Saaby

Subject: 2.6.10 - VFS is out of sync with lock manager!

Hi Trond,

Yesterday I posted this to LKML (but my mail reader cheated me and didn't keep the
thread info):

(I am very sorry if you have already seen my previous mail; I don't want to
bother you unnecessarily!)

->

I have seen the exact same error on one of my web servers, which serves
from an NFS export under heavy load: ~2 hours of uptime before panicking.
I then tried Trond's patch, which seems to work. 14 hours of uptime now. :)

Anyway, I have a couple of issues you might be able to clear up for me:

First issue:
New strange message in the kernel log:

"nlmclnt_lock: VFS is out of sync with lock manager!"

What does this mean? Is it bad? What can I do?


Second issue:
my fs/nfs/file.c doesn't look like yours (Vanilla 2.6.10):

<fs/nfs/file.c SNIP>
        status = NFS_PROTO(inode)->lock(filp, cmd, fl);
        /* If we were signalled we still need to ensure that
         * we clean up any state on the server. We therefore
         * record the lock call as having succeeded in order to
         * ensure that locks_remove_posix() cleans it out when
         * the process exits.
         */
        if (status == -EINTR || status == -ERESTARTSYS)
                posix_lock_file_wait(filp, fl);
        unlock_kernel();
        if (status < 0)
                return status;
        /*
         * Make sure we clear the cache whenever we try to get the lock.
         * This makes locking act as a cache coherency point.
         */
        filemap_fdatawrite(filp->f_mapping);
        down(&inode->i_sem);
        nfs_wb_all(inode);      /* we may have slept */
        up(&inode->i_sem);
        filemap_fdatawait(filp->f_mapping);
        nfs_zap_caches(inode);
        return 0;
</SNIP>
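For context, the path above is entered when an application on the NFS mount asks for a blocking POSIX lock. Below is a minimal user-space sketch, assuming a POSIX system; the helper names lock_whole_file() and unlock_whole_file() are hypothetical. A signal arriving while fcntl(F_SETLKW) is waiting makes the call fail with EINTR in user space, which roughly corresponds to the -EINTR/-ERESTARTSYS case handled in the snippet.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take a blocking, exclusive POSIX lock on the
 * whole file behind fd. On an NFS mount this request is passed to the
 * NFS client's ->lock() handling. If a signal interrupts the wait,
 * fcntl() fails with EINTR and this sketch simply retries. */
static int lock_whole_file(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 = up to end of file, however large */

    while (fcntl(fd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR)
            return -1;      /* real failure: leave errno for the caller */
    }
    return 0;
}

/* Hypothetical helper: drop the whole-file lock again. */
static int unlock_whole_file(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);
}
```

Retrying on EINTR is only a choice made for this sketch; a program that wants ^C to abort the wait would return instead. Retrying is harmless for the same process, since POSIX locks held by one owner merge rather than conflict.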

So... Am I missing another patch or something else?

Jan-Frode Myklebust wrote:

> On Wed, Jan 05, 2005 at 10:54:03PM +0100, Trond Myklebust wrote:
>>
>> Looking at the NFS code, I can attempt a wild guess about what may be
>> happening: there may be a race when pressing ^C in the middle of a
>> blocking NFS lock RPC call, and if so, the following patch will fix it.
>
>
> A whopping 9 hours of uptime now :) So the one-liner patch seems to have
> fixed it.
>
> Thanks!
>
>> -        posix_lock_file(filp, fl);
>> +        posix_lock_file_wait(filp, fl);
>
>
>   -jf

--
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: [email protected] - http://www.cohaesio.com
------------------------------------------------


2005-01-12 17:01:49

by Trond Myklebust

Subject: Re: 2.6.10 - VFS is out of sync with lock manager!

On Wed, 12.01.2005 at 16:23 (+0100), Anders Saaby wrote:

> First issue:
> New strange message in the kernel log:
>
> "nlmclnt_lock: VFS is out of sync with lock manager!"
>
> What does this mean? Is it bad? What can I do?

It means that the VFS failed to register the lock that was just granted
to you because of a combination of an RPC race (the reply telling you
the lock was granted arrived before the previous lock holder had been
notified that his lock was freed) and the user pressing ^C at the wrong
moment.

You may have an "orphaned lock" on the server.
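The race described above, and why the one-line change from posix_lock_file() to posix_lock_file_wait() helps, can be illustrated with a toy user-space model. This is not kernel code: struct local_lock_table, register_nowait(), register_wait(), release_lock() and late_release() are hypothetical names, the client's local lock records are reduced to a single exclusive owner, and returning -EAGAIN on conflict is an assumption meant to mirror the non-waiting call. In the model the server has already granted the lock to owner 2, but the local record of the previous holder (owner 1) is only removed a little later.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the client's local (VFS) lock records on one
 * file: at most one exclusive lock, identified by owner id (0 = free). */
struct local_lock_table {
    pthread_mutex_t mu;
    pthread_cond_t released;  /* broadcast whenever the record is cleared */
    int owner;
};

/* Non-waiting registration, playing the role of posix_lock_file():
 * if another owner's record is still present, give up at once. */
static int register_nowait(struct local_lock_table *t, int owner)
{
    int ret = 0;

    pthread_mutex_lock(&t->mu);
    if (t->owner != 0 && t->owner != owner)
        ret = -EAGAIN;          /* stale record from the previous holder */
    else
        t->owner = owner;
    pthread_mutex_unlock(&t->mu);
    return ret;
}

/* Waiting registration, playing the role of posix_lock_file_wait():
 * sleep until the conflicting record has been removed, then record ours. */
static int register_wait(struct local_lock_table *t, int owner)
{
    pthread_mutex_lock(&t->mu);
    while (t->owner != 0 && t->owner != owner)
        pthread_cond_wait(&t->released, &t->mu);
    t->owner = owner;
    pthread_mutex_unlock(&t->mu);
    return 0;
}

/* Remove owner's record (if it holds one) and wake any waiters. */
static void release_lock(struct local_lock_table *t, int owner)
{
    pthread_mutex_lock(&t->mu);
    if (t->owner == owner) {
        t->owner = 0;
        pthread_cond_broadcast(&t->released);
    }
    pthread_mutex_unlock(&t->mu);
}

/* Thread body: the previous holder (owner 1) only learns that its lock
 * was freed after a short delay, i.e. after the new grant has arrived. */
static void *late_release(void *arg)
{
    struct timespec delay = { 0, 50L * 1000 * 1000 };  /* 50 ms */

    nanosleep(&delay, NULL);
    release_lock((struct local_lock_table *)arg, 1);
    return NULL;
}
```

In the model, register_nowait(&t, 2) fails while owner 1's record is still present, which is the point where the real client would log "VFS is out of sync with lock manager!". register_wait(&t, 2) instead sleeps until late_release() clears the stale record and then succeeds.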

>
> Second issue:
> my fs/nfs/file.c doesn't look like yours (Vanilla 2.6.10):

<shrug>My fs/nfs/file.c looks very much like the one in vanilla
2.6.11-rc1. See Linus' changelog for what may be missing.</shrug>

Cheers,
Trond
--
Trond Myklebust <[email protected]>