2007-04-04 07:11:24

by Greg Bradner

Subject: lockd and statd

I'm getting these errors in the syslog:
Apr 4 00:08:53 fbs2 kernel: statd: server localhost not responding, timed out
Apr 4 00:08:53 fbs2 kernel: lockd: cannot monitor fnas2
Apr 4 00:08:53 fbs2 kernel: lockd: failed to monitor fnas2


I'm using the 2.6.20.4 kernel.

I don't see any statd.c in the src tree.

Did I miss something? Ideas?

Thanks,
Greg

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2007-04-04 23:21:31

by Wendy Cheng

Subject: Re: lockd and statd

The kernel patch is based on the 2.6.21-rc4 kernel, in case there are
other changes that I'm not aware of. ... Wendy

> Trond Myklebust wrote:
>> On Wed, 2007-04-04 at 14:49 -0400, Wendy Cheng wrote:
>>
>>> I'm wondering what would be the reason(s) that the community
>>> version of Linux statd is put in user space? Does anyone care to give
>>> either a technical or historical explanation?
>>>
>>
>> Off the cuff, I can think of 2 main reasons why it needs to be in user
>> space:
>> 1) It creates and maintains a directory hierarchy on permanent storage
>> (as opposed to in a pseudo filesystem).
>> 2) It needs to resolve addresses via DNS etc.
>>
>
> Ha! #2 happens to be my question. Is there any reason why it has to
> be a DNS name? The following is my issue:
>
> While testing out the NLM failover patches on Neil's new
> nfs-utils-1.1.0-rc1, sm_mon_1_svc() writes to /var/lib/nfs/sm using
> the DNS name (say, for example, dhcp146.something.com), even though the
> kernel has passed it a dotted IP address (say 192.168.24.146, since I
> can't use "nsm_use_hostnames" as a lockd module param). Unfortunately,
> sm_unmon_1_svc() doesn't do a similar conversion. So when the kernel
> side tries to delete the monitored name, I got:
>
> Apr 4 18:12:25 dhcp143 kernel: lockd: delete host dhcp146.perf.redhat.com
> Apr 4 18:12:25 dhcp143 kernel: lockd: nsm_unmonitor(dhcp146.perf.redhat.com)
> Apr 4 18:12:25 dhcp143 rpc.statd[4210]: unlink (/var/lib/nfs/sm/192.168.24.146): No such file or directory
>
> BTW, don't be fooled by the above lockd trace (it printed out the hostname,
> but I'm very sure I passed in an IPv4 dotted address, as seen in the
> attached kernel statd patch).
>
> Oversight? My configuration problem? nfs-utils bug? My bug? Or do I
> misunderstand the logic?
>
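The mismatch reported above can be reproduced with a toy model. This is a hypothetical Python sketch, not nfs-utils code: `MonitorDir`, `resolve_name` and the one-entry reverse-DNS table are invented for illustration. It only shows that if the SM_MON path (`sm_mon_1_svc()`) converts an address to a hostname before storing it while the SM_UNMON path (`sm_unmon_1_svc()`) does not, the unmonitor lookup misses, which corresponds to the failed `unlink` in the log.

```python
# Hypothetical model of the statd monitor list (one entry per monitored host).
# The reverse-DNS table below is made up; real statd would query the resolver.
REVERSE_DNS = {"192.168.24.146": "dhcp146.perf.redhat.com"}


def resolve_name(addr):
    """Return the DNS name recorded for addr, or addr itself if there is none."""
    return REVERSE_DNS.get(addr, addr)


class MonitorDir:
    def __init__(self, normalize_on_unmon):
        self.entries = set()
        # False reproduces the asymmetry described above; True applies the fix.
        self.normalize_on_unmon = normalize_on_unmon

    def mon(self, mon_name):
        # SM_MON path: the name is converted before it is stored.
        self.entries.add(resolve_name(mon_name))

    def unmon(self, mon_name):
        # SM_UNMON path: only converted when normalize_on_unmon is True.
        key = resolve_name(mon_name) if self.normalize_on_unmon else mon_name
        if key not in self.entries:
            return False  # analogous to unlink() failing with ENOENT
        self.entries.remove(key)
        return True


buggy = MonitorDir(normalize_on_unmon=False)
buggy.mon("192.168.24.146")
print(buggy.unmon("192.168.24.146"))  # False: entry was stored under the hostname

fixed = MonitorDir(normalize_on_unmon=True)
fixed.mon("192.168.24.146")
print(fixed.unmon("192.168.24.146"))  # True: both paths derive the same key
```

Whichever form is chosen as the key (name or address), the point of the sketch is that both paths must apply the same normalization, or neither should.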

2007-04-04 23:22:40

by Trond Myklebust

Subject: Re: lockd and statd

On Wed, 2007-04-04 at 18:48 -0400, Wendy Cheng wrote:
> Trond Myklebust wrote:
> > On Wed, 2007-04-04 at 14:49 -0400, Wendy Cheng wrote:
> >
> >>
> >>
> >> I'm wondering what would be the reason(s) that the community version of
> >> Linux statd is put in user space? Does anyone care to give either a
> >> technical or historical explanation?
> >>
> >
> > Off the cuff, I can think of 2 main reasons why it needs to be in user
> > space:
> > 1) It creates and maintains a directory hierarchy on permanent storage
> > (as opposed to in a pseudo filesystem).
> > 2) It needs to resolve addresses via DNS etc.
> >
>
> Ha! #2 happens to be my question. Is there any reason why it has to
> be a DNS name? The following is my issue:
>
> While testing out the NLM failover patches on Neil's new
> nfs-utils-1.1.0-rc1, sm_mon_1_svc() writes to /var/lib/nfs/sm using
> the DNS name (say, for example, dhcp146.something.com), even though the
> kernel has passed it a dotted IP address (say 192.168.24.146, since I
> can't use "nsm_use_hostnames" as a lockd module param). Unfortunately,
> sm_unmon_1_svc() doesn't do a similar conversion. So when the kernel
> side tries to delete the monitored name, I got:
>
> Apr 4 18:12:25 dhcp143 kernel: lockd: delete host dhcp146.perf.redhat.com
> Apr 4 18:12:25 dhcp143 kernel: lockd: nsm_unmonitor(dhcp146.perf.redhat.com)
> Apr 4 18:12:25 dhcp143 rpc.statd[4210]: unlink (/var/lib/nfs/sm/192.168.24.146): No such file or directory
>
> BTW, don't be fooled by the above lockd trace (it printed out the hostname,
> but I'm very sure I passed in an IPv4 dotted address, as seen in the
> attached kernel statd patch).
>
> Oversight? My configuration problem? nfs-utils bug? My bug? Or do I
> misunderstand the logic?
>
> --- Wendy

Tom gave a talk on this subject at Connectathon last year. You can find
his arguments for why statd DNS lookups are a must in his slides:

http://www.connectathon.org/talks06/talpey-cthon06-nsm.pdf

Note his point that the server needs to store both the client hostname
and IP address so that you can fail over from one to the other if the
notification fails.
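A minimal sketch of the fallback Trond describes, assuming a monitor record that keeps both the client's hostname and the address it was monitored from. `MonitorRecord`, `notify_targets` and `send_notify` are hypothetical names, and the `lookup` and `send` callables stand in for a DNS query and an SM_NOTIFY RPC; neither is actually performed here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MonitorRecord:
    mon_name: str   # client hostname as given at SM_MON time
    last_addr: str  # client IP address recorded at SM_MON time


def notify_targets(rec: MonitorRecord,
                   lookup: Callable[[str], Optional[list[str]]]) -> list[str]:
    """Addresses to try for SM_NOTIFY: DNS answers first, stored IP as backup."""
    targets = list(lookup(rec.mon_name) or [])
    if rec.last_addr not in targets:
        targets.append(rec.last_addr)
    return targets


def send_notify(rec: MonitorRecord,
                lookup: Callable[[str], Optional[list[str]]],
                send: Callable[[str], bool]) -> Optional[str]:
    """Try each target in order; return the address that accepted, or None."""
    for addr in notify_targets(rec, lookup):
        if send(addr):
            return addr
    return None


rec = MonitorRecord("client.example.com", "10.0.0.5")
print(notify_targets(rec, lambda name: None))          # ['10.0.0.5']
print(notify_targets(rec, lambda name: ["10.0.0.7"]))  # ['10.0.0.7', '10.0.0.5']
```

The stored address only comes into play when the name cannot be resolved or every resolved address fails, i.e. it acts as a backup rather than the primary key.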

Cheers
Trond


2007-04-10 08:47:53

by Olaf Kirch

Subject: Re: lockd and statd

On Thursday 05 April 2007 01:22, Trond Myklebust wrote:
> Tom gave a talk on this subject at Connectathon last year. You can find
> his arguments for why statd DNS lookups are a must in his slides:
>
> http://www.connectathon.org/talks06/talpey-cthon06-nsm.pdf
>
> Note his point that the server needs to store both the client hostname
> and IP address so that you can fail over from one to the other if the
> notification fails.

Well, I'm not really convinced. When monitoring a host, I think it is sufficient
to just store the mon_name; no IP address or anything. When dealing with an
incoming SM_NOTIFY, statd just needs to do a string match of the mon_name.
When sending out an SM_NOTIFY, we need a user space helper anyway,
which can do a full-blown DNS lookup and send notifications to *all* addresses
listed for each peer.
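Olaf's alternative keeps only the `mon_name`. Below is a hypothetical Python sketch of that bookkeeping, assuming exact string comparison for incoming SM_NOTIFY and a `resolve_all` callable standing in for the user space helper's DNS lookup. `NameOnlyMonitor` and its methods are invented names, and no RPC is actually sent.

```python
from __future__ import annotations

from typing import Callable


class NameOnlyMonitor:
    """Monitor list that stores nothing but each peer's mon_name."""

    def __init__(self) -> None:
        self.names = set()

    def monitor(self, mon_name: str) -> None:
        self.names.add(mon_name)

    def on_notify(self, mon_name: str) -> bool:
        # Incoming SM_NOTIFY: a plain string match; the source address is unused.
        return mon_name in self.names

    def notify_all(self,
                   resolve_all: Callable[[str], list[str]],
                   send: Callable[[str, str], None]) -> list[tuple[str, str]]:
        # Outgoing SM_NOTIFY: look up every address of every peer and notify each.
        sent = []
        for name in sorted(self.names):
            for addr in resolve_all(name):
                send(name, addr)
                sent.append((name, addr))
        return sent


table = {"client.example.com": ["10.0.0.5", "10.0.1.5"]}  # made-up DNS data
mon = NameOnlyMonitor()
mon.monitor("client.example.com")
print(mon.on_notify("client.example.com"))  # True
print(mon.on_notify("client"))              # False: the short name does not match
sent = mon.notify_all(lambda n: table.get(n, []), lambda n, a: None)
print(sent)  # [('client.example.com', '10.0.0.5'), ('client.example.com', '10.0.1.5')]
```

In this model a peer whose name has no DNS entry resolves to an empty list and receives no notification at all.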

This is the approach I took with the kernel statd - these patches have been in
Suse kernels for 1.5 years or so, and until I left in January, I hadn't seen a
single report that complained about lock reclaim problems.

Of course, clients need to send a mon_name that makes sense from the
server's point of view in DNS, i.e. it must be a FQDN, or the server must be
able to find it via its resolv.conf search list.
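Whether a short `mon_name` resolves on the server depends on the server's resolver configuration. The sketch below is a simplified model of the search-list rule documented in resolv.conf(5), assuming the default `ndots:1`: a name with at least `ndots` dots is tried as given first, otherwise the search domains are tried first, and a trailing dot makes the name absolute. Other resolver options are not modeled, and the lookup table is made up.

```python
def query_candidates(name, search, ndots=1):
    """Order in which a stub resolver would try names (simplified model)."""
    if name.endswith("."):
        return [name[:-1]]  # absolute name: no search list applied
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


def resolve(name, search, table, ndots=1):
    """Return the first hit in a hypothetical name->address table, else None."""
    for candidate in query_candidates(name, search, ndots):
        if candidate in table:
            return table[candidate]
    return None


hosts = {"dhcp146.perf.redhat.com": "192.168.24.146"}
print(resolve("dhcp146", ["perf.redhat.com"], hosts))  # 192.168.24.146
print(resolve("dhcp146", ["example.com"], hosts))      # None
print(resolve("dhcp146.perf.redhat.com", [], hosts))   # 192.168.24.146
```

The short name only resolves when the server's search list happens to include the client's domain, which is why a fully qualified `mon_name` is the safer choice.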

I believe relying on the client's IP address will paper over configuration
problems at best, and lead to bad or incomplete notifications at worst.

A user space statd has its uses, e.g. with lock failover, but I believe the
majority of users are served just as well by a small kernel statd.

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax

2007-04-10 19:36:34

by Trond Myklebust

Subject: Re: lockd and statd

On Tue, 2007-04-10 at 10:47 +0200, Olaf Kirch wrote:
> On Thursday 05 April 2007 01:22, Trond Myklebust wrote:
> > Tom gave a talk on this subject at Connectathon last year. You can find
> > his arguments for why statd DNS lookups are a must in his slides:
> >
> > http://www.connectathon.org/talks06/talpey-cthon06-nsm.pdf
> >
> > Note his point that the server needs to store both the client hostname
> > and IP address so that you can fail over from one to the other if the
> > notification fails.
>
> Well, I'm not really convinced. When monitoring a host, I think it is sufficient
> to just store the mon_name; no IP address or anything. When dealing with an
> incoming SM_NOTIFY, statd just needs to do a string match of the mon_name.
> When sending out an SM_NOTIFY, we need a user space helper anyway,
> which can do a full-blown DNS lookup and send notifications to *all* addresses
> listed for each peer.

What about peers that don't have a DNS entry? I certainly don't expect
all machines on my private LAN to have DNS entries.

> This is the approach I took with the kernel statd - these patches have been in
> Suse kernels for 1.5 years or so, and until I left in January, I hadn't seen a
> single report that complained about lock reclaim problems.

History shows that a lack of complaints is not necessarily a good
measure of lack of problems. :-)

> Of course, clients need to send a mon_name that makes sense from the
> server's point of view in DNS, i.e. it must be a FQDN, or the server must be
> able to find it via its resolv.conf search list.

> I believe relying on the client's IP address will paper over configuration
> problems at best, and lead to bad or incomplete notifications at worst.

You missed the point. The IP address was proposed as a _backup_ in case
the DNS lookup fails.

Cheers
Trond


2007-04-10 20:16:35

by Talpey, Thomas

Subject: Re: lockd and statd

At 04:47 AM 4/10/2007, Olaf Kirch wrote:
>This is the approach I took with the kernel statd - these patches have been in
>Suse kernels for 1.5 years or so, and until I left in January, I hadn't seen a
>single report that complained about lock reclaim problems.

But how would you know? Users have no clue what's going on, and
only some client machines make any attempt to tell them if the
reclaim doesn't succeed. Generally, their first indication is data
corruption.

Frankly, I was shocked when we investigated this behavior on
multiple systems, which is what motivated that Connectathon talk
I gave. Even when everything in the statd notifies worked, it was
easy to find situations where clients failed to reclaim.

>Of course, clients need to send a mon_name that makes sense from the
>server's point of view in DNS, i.e. it must be a FQDN, or the server must be
>able to find it via its resolv.conf search list.
>
>I believe relying on the client's IP address will paper over configuration
>problems at best, and lead to bad or incomplete notifications at worst.

Yes, definitely a FQDN. Servers are not always in the same domain
as their clients. But the server must never rely on the source IP
either - it may change due to NAT, after a network partition, etc.
As I mentioned before, the client's hostname is the only invariant
in NLM recovery, such as it is. Belt-and-suspenders is the order of
the day, I'm afraid.

Tom.


2007-04-04 12:40:39

by Trond Myklebust

Subject: Re: lockd and statd

On Wed, 2007-04-04 at 00:11 -0700, Greg Bradner wrote:
> I'm getting these errors in the syslog:
> Apr 4 00:08:53 fbs2 kernel: statd: server localhost not responding, timed out
> Apr 4 00:08:53 fbs2 kernel: lockd: cannot monitor fnas2
> Apr 4 00:08:53 fbs2 kernel: lockd: failed to monitor fnas2
>
>
> I'm using the 2.6.20.4 kernel.
>
> I don't see any statd.c in the src tree.
>
> Did I miss something? Ideas?

statd lives in userland, and is part of the nfs-utils package. It should
be pretty standard in all distributions that claim to support nfs.

Cheers
Trond


2007-04-04 16:09:08

by Greg Bradner

Subject: Re: lockd and statd

On 4/4/07, Trond Myklebust <[email protected]> wrote:
> On Wed, 2007-04-04 at 00:11 -0700, Greg Bradner wrote:
> > I'm getting these errors in the syslog:
> > Apr 4 00:08:53 fbs2 kernel: statd: server localhost not responding, timed out
> > Apr 4 00:08:53 fbs2 kernel: lockd: cannot monitor fnas2
> > Apr 4 00:08:53 fbs2 kernel: lockd: failed to monitor fnas2
> >
> >
> > I'm using the 2.6.20.4 kernel.
> >
> > I don't see any statd.c in the src tree.
> >
> > Did I miss something? Ideas?
>
> statd lives in userland, and is part of the nfs-utils package. It should
> be pretty standard in all distributions that claim to support nfs.
>

That's what I thought. But it doesn't seem to be the case with SUSE 9.3:

> rpm -ql nfs-utils-1.0.7-3
/etc/init.d/nfsserver
/sbin/rpc.lockd
/usr/sbin/exportfs
/usr/sbin/nfsstat
/usr/sbin/nhfsgraph
/usr/sbin/nhfsnums
/usr/sbin/nhfsrun
/usr/sbin/nhfsstone
/usr/sbin/rcnfsserver
/usr/sbin/rpc.mountd
/usr/sbin/rpc.nfsd
/usr/sbin/showmount
/usr/share/doc/packages/nfs-utils
/usr/share/doc/packages/nfs-utils/ChangeLog
/usr/share/doc/packages/nfs-utils/INSTALL
/usr/share/doc/packages/nfs-utils/KNOWNBUGS
/usr/share/doc/packages/nfs-utils/NEW
/usr/share/doc/packages/nfs-utils/README
/usr/share/doc/packages/nfs-utils/THANKS
/usr/share/doc/packages/nfs-utils/TODO
/usr/share/doc/packages/nfs-utils/index.html
/usr/share/doc/packages/nfs-utils/nfs.html
/usr/share/doc/packages/nfs-utils/nfs.ps
/usr/share/doc/packages/nfs-utils/node1.html
/usr/share/doc/packages/nfs-utils/node10.html
/usr/share/doc/packages/nfs-utils/node11.html
/usr/share/doc/packages/nfs-utils/node12.html
/usr/share/doc/packages/nfs-utils/node13.html
/usr/share/doc/packages/nfs-utils/node14.html
/usr/share/doc/packages/nfs-utils/node15.html
/usr/share/doc/packages/nfs-utils/node16.html
/usr/share/doc/packages/nfs-utils/node17.html
/usr/share/doc/packages/nfs-utils/node18.html
/usr/share/doc/packages/nfs-utils/node19.html
/usr/share/doc/packages/nfs-utils/node2.html
/usr/share/doc/packages/nfs-utils/node20.html
/usr/share/doc/packages/nfs-utils/node21.html
/usr/share/doc/packages/nfs-utils/node22.html
/usr/share/doc/packages/nfs-utils/node23.html
/usr/share/doc/packages/nfs-utils/node24.html
/usr/share/doc/packages/nfs-utils/node25.html
/usr/share/doc/packages/nfs-utils/node26.html
/usr/share/doc/packages/nfs-utils/node27.html
/usr/share/doc/packages/nfs-utils/node3.html
/usr/share/doc/packages/nfs-utils/node4.html
/usr/share/doc/packages/nfs-utils/node5.html
/usr/share/doc/packages/nfs-utils/node6.html
/usr/share/doc/packages/nfs-utils/node7.html
/usr/share/doc/packages/nfs-utils/node8.html
/usr/share/doc/packages/nfs-utils/node9.html
/usr/share/man/man5/exports.5.gz
/usr/share/man/man7/nfsd.7.gz
/usr/share/man/man8/exportfs.8.gz
/usr/share/man/man8/lockd.8.gz
/usr/share/man/man8/mountd.8.gz
/usr/share/man/man8/nfsd.8.gz
/usr/share/man/man8/nfsstat.8.gz
/usr/share/man/man8/nhfsgraph.8.gz
/usr/share/man/man8/nhfsnums.8.gz
/usr/share/man/man8/nhfsrun.8.gz
/usr/share/man/man8/nhfsstone.8.gz
/usr/share/man/man8/rpc.mountd.8.gz
/usr/share/man/man8/rpc.nfsd.8.gz
/usr/share/man/man8/showmount.8.gz
/var/adm/fillup-templates/sysconfig.nfs-nfs-utils
/var/lib/nfs
/var/lib/nfs/etab
/var/lib/nfs/rmtab
/var/lib/nfs/xtab
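The listing above ships `/sbin/rpc.lockd`, `rpc.mountd` and `rpc.nfsd` but no `rpc.statd`. A small hypothetical helper makes that kind of check explicit. It assumes the input is a file list such as the output of `rpm -ql`, one path per line, and it only compares file names against a fixed, illustrative list of daemon names.

```python
from pathlib import PurePosixPath

# Daemon names to look for; the selection is an illustrative choice.
NFS_DAEMONS = ("rpc.lockd", "rpc.statd", "rpc.mountd", "rpc.nfsd")


def daemons_in(listing: str) -> dict:
    """Map each daemon name to True if some path in listing ends in it."""
    names = {PurePosixPath(line.strip()).name
             for line in listing.splitlines() if line.strip()}
    return {daemon: daemon in names for daemon in NFS_DAEMONS}


excerpt = """/sbin/rpc.lockd
/usr/sbin/rpc.mountd
/usr/sbin/rpc.nfsd
"""
missing = [d for d, present in daemons_in(excerpt).items() if not present]
print(missing)  # ['rpc.statd']
```

On SUSE 9.3 that absence is expected, since (per Olaf's reply in this thread) SUSE kernels carry statd functionality in the kernel itself.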

2007-04-04 17:33:40

by Olaf Kirch

Subject: Re: lockd and statd

On Wednesday 04 April 2007 18:09, Greg Bradner wrote:
> That's what I thought. But it doesn't seem to be the case with SUSE 9.3:

Suse uses a patch that puts statd functionality into the kernel.
So when you switch to a non-Suse kernel, you also need to compile
and install rpc.statd manually - and frob your init scripts so that it gets
started.

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax

2007-04-04 19:10:35

by Wendy Cheng

Subject: Re: lockd and statd

Olaf Kirch wrote:
> On Wednesday 04 April 2007 18:09, Greg Bradner wrote:
>
>> That's what I thought. But it doesn't seem to be the case with SUSE 9.3:
>>
>
> Suse uses a patch that puts statd functionality into the kernel.
> So when you switch to a non-Suse kernel, you also need to compile
> and install rpc.statd manually - and frob your init scripts so that it gets
> started.
>

I'm wondering what would be the reason(s) that the community version of
Linux statd is put in user space? Does anyone care to give either a
technical or historical explanation?

-- Wendy

2007-04-04 21:34:37

by Trond Myklebust

Subject: Re: lockd and statd

On Wed, 2007-04-04 at 14:49 -0400, Wendy Cheng wrote:
> Olaf Kirch wrote:
> > On Wednesday 04 April 2007 18:09, Greg Bradner wrote:
> >
> >> That's what I thought. But it doesn't seem to be the case with SUSE 9.3:
> >>
> >
> > Suse uses a patch that puts statd functionality into the kernel.
> > So when you switch to a non-Suse kernel, you also need to compile
> > and install rpc.statd manually - and frob your init scripts so that it gets
> > started.
> >
>
> I'm wondering what would be the reason(s) that the community version of
> Linux statd is put in user space? Does anyone care to give either a
> technical or historical explanation?
>
> -- Wendy

Off the cuff, I can think of 2 main reasons why it needs to be in user
space:
1) It creates and maintains a directory hierarchy on permanent storage
(as opposed to in a pseudo filesystem).
2) It needs to resolve addresses via DNS etc.
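Trond's first point, a directory hierarchy on permanent storage, can be illustrated with the layout visible in the statd `unlink` log line quoted earlier in this thread: one file per monitored host under /var/lib/nfs/sm, named after that host. The sketch is hypothetical, works in a temporary directory, and does not reproduce what real statd writes inside those files.

```python
import os
import tempfile
from pathlib import Path


class SmDirectory:
    """Monitor list persisted as one empty file per host (hypothetical layout)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name):
        # Refuse names that would escape the directory.
        if not name or "/" in name or name in (".", ".."):
            raise ValueError(f"bad host name: {name!r}")
        return self.root / name

    def monitor(self, name):
        self._path(name).touch()

    def unmonitor(self, name):
        try:
            self._path(name).unlink()
            return True
        except FileNotFoundError:  # the ENOENT case seen in the statd log
            return False

    def monitored(self):
        return sorted(p.name for p in self.root.iterdir())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "sm")
        SmDirectory(root).monitor("client.example.com")
        restarted = SmDirectory(root)  # a fresh instance, as after a daemon restart
        listed = restarted.monitored()
        first = restarted.unmonitor("client.example.com")
        second = restarted.unmonitor("client.example.com")
    return listed, first, second


print(demo())  # (['client.example.com'], True, False)
```

The monitor list survives a restart only because it lives in ordinary files, which is the kind of state Trond argues belongs in user space.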

Cheers
Trond

