2006-05-27 20:39:26

by Ramon van Alteren

Subject: Soft CPU Lockup on 2.6.15 with kernel nfsd

Hi List,

I'm seeing several of these in my dmesg output on one of my NAS servers.
Any pointers on where to start looking for a cause would be highly
appreciated.

Pid: 7122, comm: nfsd
EIP: 0060:[<c057fe43>] CPU: 0
EIP is at _read_lock_bh+0x12/0x1a
EFLAGS: 00000216 Not tainted (2.6.15-gentoo-r1)
EAX: c069980c EBX: 00000000 ECX: f7ece000 EDX: f6100000
ESI: c06997e0 EDI: ef7b4020 EBP: c0785b80 DS: 007b ES: 007b
CR0: 8005003b CR2: 08056f98 CR3: 0073b000 CR4: 000006d0
[<c05101c6>] ipt_do_table+0x6a/0x326
[<c050a335>] ip_conntrack_in+0xd2/0x2c7
[<c04d1fde>] ip_rcv_finish+0x0/0x2ba
[<c051365b>] ipt_route_hook+0x37/0x3b
[<c04c90ef>] nf_iterate+0x6f/0x87
[<c04d1fde>] ip_rcv_finish+0x0/0x2ba
[<c04d1fde>] ip_rcv_finish+0x0/0x2ba
[<c04c9172>] nf_hook_slow+0x6b/0x10d
[<c04d1fde>] ip_rcv_finish+0x0/0x2ba
[<c04d1cc4>] ip_rcv+0x460/0x57e
[<c04d1fde>] ip_rcv_finish+0x0/0x2ba
[<c04b5c93>] netif_receive_skb+0x147/0x1c3
[<c03efd98>] tg3_rx+0x2ed/0x3d4
[<c03efede>] tg3_poll+0x5f/0x14e
[<c04b5e89>] net_rx_action+0x77/0xfe
[<c01249b7>] __do_softirq+0xbf/0xd1
[<c01249fb>] do_softirq+0x32/0x34
[<c0105242>] do_IRQ+0x1e/0x24
[<c010377e>] common_interrupt+0x1a/0x20
[<c057fe8f>] _spin_lock+0x3/0xf
[<c0177c23>] dput+0x33/0x1cb
[<c02217ee>] fh_put+0x113/0x186
[<c022e957>] nfs3svc_release_fhandle2+0x24/0x31
[<c05756ef>] svc_process+0x42e/0x643
[<c021ed92>] nfsd+0x1b4/0x345
[<c021ebde>] nfsd+0x0/0x345
[<c0101125>] kernel_thread_helper+0x5/0xb
RPC: bad TCP reclen 0x0ddc8af7 (large)
RPC: bad TCP reclen 0x272d2e25 (non-terminal)
RPC: bad TCP reclen 0x46c6f2c8 (large)
BUG: soft lockup detected on CPU#0!

Thanks,
Ramon

--
If to err is human, I'm most certainly human.





-------------------------------------------------------
All the advantages of Linux Managed Hosting--Without the Cost and Risk!
Fully trained technicians. The highest number of Red Hat certifications in
the hosting industry. Fanatical Support. Click to learn more
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=107521&bid=248729&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-05-28 23:58:48

by NeilBrown

Subject: Re: Soft CPU Lockup on 2.6.15 with kernel nfsd

On Saturday May 27, Ramon van Alteren wrote:
> Hi List,
>
> I'm seeing several of these in my dmesg output on one of my NAS servers.
> Any pointers on where to start looking for a cause would be highly
> appreciated.
>
> Pid: 7122, comm: nfsd
> EIP: 0060:[<c057fe43>] CPU: 0
> EIP is at _read_lock_bh+0x12/0x1a
> EFLAGS: 00000216 Not tainted (2.6.15-gentoo-r1)
> EAX: c069980c EBX: 00000000 ECX: f7ece000 EDX: f6100000
> ESI: c06997e0 EDI: ef7b4020 EBP: c0785b80 DS: 007b ES: 007b
> CR0: 8005003b CR2: 08056f98 CR3: 0073b000 CR4: 000006d0
> [<c05101c6>] ipt_do_table+0x6a/0x326
> [<c050a335>] ip_conntrack_in+0xd2/0x2c7
> [<c04d1fde>] ip_rcv_finish+0x0/0x2ba
> [<c051365b>] ipt_route_hook+0x37/0x3b
> [snip]
> [<c021ebde>] nfsd+0x0/0x345
> [<c0101125>] kernel_thread_helper+0x5/0xb
> RPC: bad TCP reclen 0x0ddc8af7 (large)
> RPC: bad TCP reclen 0x272d2e25 (non-terminal)
> RPC: bad TCP reclen 0x46c6f2c8 (large)
> BUG: soft lockup detected on CPU#0!

Shouldn't the stack trace be "after" the 'soft lockup' message?

Anyway, the 'bad TCP reclen' messages say that you are getting incoming
garbage on TCP connections to the NFS server. The data must at least look
like valid TCP segments to get that far, but the payload definitely
doesn't look like RPC records.
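
For reference, RPC over TCP frames each record with a 4-byte big-endian
record marker: the top bit flags the last fragment and the low 31 bits give
the fragment length. The sketch below decodes such a marker and labels it the
way the log messages above do. It is only an illustration: the function names,
the 32 KiB `max_record` limit, and the order of the two checks are assumptions
made for this example, not taken from the kernel source.

```python
import struct

LAST_FRAGMENT_BIT = 0x80000000  # top bit: this fragment ends the record
LENGTH_MASK = 0x7FFFFFFF        # low 31 bits: fragment length in bytes


def decode_record_marker(header: bytes):
    """Decode a 4-byte ONC RPC record marker into (last_fragment, length)."""
    if len(header) != 4:
        raise ValueError("record marker must be exactly 4 bytes")
    (word,) = struct.unpack("!I", header)  # network byte order
    return bool(word & LAST_FRAGMENT_BIT), word & LENGTH_MASK


def classify(header: bytes, max_record: int = 32 * 1024) -> str:
    """Label a marker; max_record is a hypothetical server-side limit."""
    last_fragment, length = decode_record_marker(header)
    if not last_fragment:
        return "non-terminal"   # server expects a single-fragment record
    if length > max_record:
        return "large"          # claimed length exceeds the buffer limit
    return "ok"
```

Random bytes arriving where a marker is expected will almost always fail one
of these two checks, which would be consistent with the mix of "(large)" and
"(non-terminal)" messages in the log.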

Why this is causing a 'soft lockup' isn't clear. The lockup seems to be
in an interrupt service routine. Maybe something about these packets
is confusing netfilter enough that it is taking a long time to
process them.
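
One way to see the interrupt-context part of the trace is to split it at the
`common_interrupt` frame: everything listed before it (innermost first) ran in
interrupt/softirq context, and everything after it is the nfsd code that was
interrupted. The sketch below does that split. The helper name and regular
expression are made up for this example, and traces from kernels built without
frame pointers can contain stale entries, so treat the result as a rough guide.

```python
import re

# Matches lines such as "[<c05101c6>] ipt_do_table+0x6a/0x326"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def split_at_interrupt(trace: str, entry: str = "common_interrupt"):
    """Return (interrupt_frames, interrupted_frames) as lists of symbol names.

    Frames are assumed to be listed innermost first, as in the oops above.
    If the entry symbol is not present, all frames are returned as the
    second element.
    """
    names = [m.group(1) for m in FRAME_RE.finditer(trace)]
    if entry not in names:
        return [], names
    i = names.index(entry)
    return names[: i + 1], names[i + 1:]
```

Applied to the trace in this thread, the first list would hold the
tg3/netfilter receive path and the second the dput/fh_put path inside nfsd.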

I would suggest checking your network to make sure there is no
unexpected traffic to port 2049.

NeilBrown

