2008-09-29 16:55:27

by Just Marc

Subject: lockd using up 60% CPU and won't let go

Hi everyone,

Doing a seemingly innocent operation such as opening a file with vim on
a CFS (yes, that old crypto file system) NFS mount, lockd would wake up
and take 60% of my CPU away - probably doing nothing important but
certainly keeping the CPU busy, forever.

I use kernel 2.6.26 and kernel NFS. Some detail is available below:

$ grep nfs /proc/mounts
nfsd /proc/fs/nfsd nfsd rw 0 0localhost:/var/lib/cfs/.cfsfs /var/cfs nfs
rw,vers=2,rsize=8192,wsize=8192,namlen=255,hard,intr,proto=udp,timeo=11,retrans=3,sec=sys,addr=127.0.0.1
0 0
localhost:/var/lib/cfs/.cfsfs/x /var/cfs/x nfs
rw,vers=2,rsize=8192,wsize=8192,namlen=255,hard,intr,proto=udp,timeo=11,retrans=3,sec=sys,addr=127.0.0.1
0 0

$ egrep 'NFS|_LOCKD' .config
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFSD=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_ROOT_NFS=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y

I noticed this a few weeks ago. I don't quite know what causes it, but
I certainly know how to trigger it. Stopping CFS and NFS completely
doesn't help - as soon as NFS is restarted lockd starts eating CPU again
just like before.

I'd appreciate any hints on what I can do to find the root cause of the
problem and help get this bug out of the way.

Best,
Marc



2008-09-29 17:14:09

by J. Bruce Fields

Subject: Re: lockd using up 60% CPU and won't let go

On Mon, Sep 29, 2008 at 12:46:15PM -0400, Just Marc wrote:
> Doing a seemingly innocent operation such as opening a file with vim on
> a CFS (yes, that old crypto file system)

It's basically just a userspace NFS server, right?

> NFS mount, lockd would wake up
> and take 60% of my CPU away - probably doing nothing important but
> certainly keeping the CPU busy, forever.

Could you work around the problem by mounting with -onolock?

> I use kernel 2.6.26 and kernel NFS. Some detail is available below:
>
> $ grep nfs /proc/mounts
> nfsd /proc/fs/nfsd nfsd rw 0 0localhost:/var/lib/cfs/.cfsfs /var/cfs nfs

(Missing end-of-line before "localhost"?)

> rw,vers=2,rsize=8192,wsize=8192,namlen=255,hard,intr,proto=udp,timeo=11,retrans=3,sec=sys,addr=127.0.0.1
> 0 0
> localhost:/var/lib/cfs/.cfsfs/x /var/cfs/x nfs
> rw,vers=2,rsize=8192,wsize=8192,namlen=255,hard,intr,proto=udp,timeo=11,retrans=3,sec=sys,addr=127.0.0.1
> 0 0
>
> $ egrep 'NFS|_LOCKD' .config
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_NFS_FS=y
> CONFIG_NFS_V3=y
> CONFIG_NFS_V3_ACL=y
> CONFIG_NFS_V4=y
> CONFIG_NFSD=y
> CONFIG_NFSD_V2_ACL=y
> CONFIG_NFSD_V3=y
> CONFIG_NFSD_V3_ACL=y
> CONFIG_NFSD_V4=y
> CONFIG_ROOT_NFS=y
> CONFIG_LOCKD=y
> CONFIG_LOCKD_V4=y
> CONFIG_NFS_ACL_SUPPORT=y
> CONFIG_NFS_COMMON=y
>
> I noticed this a few weeks ago. I don't quite know what causes it, but
> I certainly know how to trigger it. Stopping CFS and NFS completely
> doesn't help - as soon as NFS is restarted lockd starts eating CPU again
> just like before.
>
> I'd appreciate any hints on what I can do to find the root cause of the
> problem and help get this bug out of the way.

You might try running wireshark on the "lo" interface and seeing whether
there's any NLM traffic from lockd.

Or a sysrq-t trace ("echo t >/proc/sysrq-trigger", then look in the
logs) might show what lockd's doing.

--b.
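
[Editor's sketch, not part of the original message.] The sysrq-t suggestion
above dumps the state of every task, so the lockd entry can be hard to spot.
A minimal sketch of a filter for it follows. The helper name task_trace and
the default of 20 context lines are assumptions made for illustration, and
the exact layout of the dump varies between kernel versions.

```shell
# Hypothetical helper: read a kernel log dump on stdin and print every line
# that mentions the given task name as a whole word, plus the N lines that
# follow it (default 20), which is roughly where its call trace would be.
task_trace() {
    task="$1"
    after="${2:-20}"
    grep -w -A "$after" -- "$task"
}

# Usage, as root (the dump itself is the command quoted in the message):
#   echo t > /proc/sysrq-trigger
#   dmesg | task_trace lockd 25
```

If the kernel ring buffer is too small to hold the whole dump, the same
filter can be pointed at the system log file instead of dmesg.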

2008-09-30 00:18:14

by Just Marc

Subject: Re: lockd using up 60% CPU and won't let go

Hi,

> It's basically just a userspace NFS server, right?

Correct.

> Could you work around the problem by mounting with -onolock?

That doesn't seem to help.

> You might try running wireshark on the "lo" interface and seeing whether
> there's any NLM traffic from lockd.

You guessed right. There's 12 megabytes per second of NLM traffic on lo.

It consists of unlock call requests and unlock replies saying permission
denied, and it looks like it just repeats forever in a tight loop.

Marc
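
[Editor's sketch, not part of the original message.] For readers who want to
look at the same traffic from the command line, one possible approach is to
ask the portmapper which UDP port the NLM service is registered on and then
capture only that port on lo. The helper name nlm_udp_port is an assumption
made for illustration. UDP is assumed here because the mounts above use
proto=udp. NLM is RPC program 100021, listed by rpcinfo as "nlockmgr".

```shell
# Hypothetical helper: read `rpcinfo -p` output on stdin and print the UDP
# port registered for the NLM program (RPC program number 100021).
# Columns of `rpcinfo -p` are: program, vers, proto, port, service.
nlm_udp_port() {
    awk '$1 == 100021 && $3 == "udp" { print $4; exit }'
}

# Usage (the capture needs root; -c 50 stops after 50 packets so the
# tight loop does not flood the terminal):
#   port=$(rpcinfo -p localhost | nlm_udp_port)
#   tcpdump -n -i lo -c 50 udp port "$port"
```

Limiting the capture to a fixed packet count keeps the output manageable when
the requests repeat as quickly as described above.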