Hi,
We use Ubuntu 10.04.3 LTS and often get a kernel traceback for NFS indicating
that the NFS daemon hangs for more than two minutes. At the same time some
client machines cannot access the server and have to wait. After a few minutes
everything continues normally.
What could cause the problem? Is there anything we should change?
Here is the message in the kernel log:
[330573.697121] INFO: task nfsd:1376 blocked for more than 120 seconds.
[330573.708375] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[330573.730773] nfsd D 0000000000000001 0 1376 2 0x00000000
[330573.730776] ffff88061c21bdc0 0000000000000046 0000000000015f00 0000000000015f00
[330573.730779] ffff88061c111ad0 ffff88061c21bfd8 0000000000015f00 ffff88061c111700
[330573.730781] 0000000000015f00 ffff88061c21bfd8 0000000000015f00 ffff88061c111ad0
[330573.730784] Call Trace:
[330573.730788] [<ffffffff81559e67>] __mutex_lock_slowpath+0x107/0x190
[330573.730796] [<ffffffffa012300f>] ? svc_authorise+0x3f/0x50 [sunrpc]
[330573.730799] [<ffffffff81559863>] mutex_lock+0x23/0x50
[330573.730807] [<ffffffffa012d478>] svc_send+0x58/0xe0 [sunrpc]
[330573.730809] [<ffffffff8105df90>] ? default_wake_function+0x0/0x20
[330573.730817] [<ffffffffa011faec>] svc_process+0x11c/0x150 [sunrpc]
[330573.730821] [<ffffffffa0184ae5>] nfsd+0xc5/0x170 [nfsd]
[330573.730830] [<ffffffffa0184a20>] ? nfsd+0x0/0x170 [nfsd]
[330573.730832] [<ffffffff81085db6>] kthread+0x96/0xa0
[330573.730835] [<ffffffff810141aa>] child_rip+0xa/0x20
[330573.730837] [<ffffffff81085d20>] ? kthread+0x0/0xa0
[330573.730839] [<ffffffff810141a0>] ? child_rip+0x0/0x20
--
Christoph Bartoschek
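For readers less used to these reports: the kernel's hung-task detector prints this when a task has stayed in uninterruptible sleep (state `D`, visible in the `nfsd D ...` line) for longer than `hung_task_timeout_secs`. The Python sketch below pulls out the task name, PID and timeout and lists the call-trace frames, dropping those marked `?`, which the unwinder found on the stack but could not confirm as real callers. It is an illustrative parser written against the lines in this log, not a kernel tool; `parse_hung_task` and `HungTask` are hypothetical names, and it assumes the text holds a single report.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Summary line printed by the hung-task detector, e.g.
# "INFO: task nfsd:1376 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g. "[<ffffffffa012d478>] svc_send+0x58/0xe0 [sunrpc]".
# A "? " before the symbol marks an address the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


class HungTask(NamedTuple):
    comm: str                                # task name (here: nfsd)
    pid: int
    timeout_secs: int                        # hung_task_timeout_secs value that was exceeded
    frames: List[Tuple[str, Optional[str]]]  # (function, module), innermost first, "?" frames dropped


def parse_hung_task(log: str) -> Optional[HungTask]:
    """Parse the hung-task summary in `log` and the reliable frames after it (hypothetical helper)."""
    summary = HUNG_RE.search(log)
    if summary is None:
        return None
    frames = [
        (m["func"], m["module"])
        for m in FRAME_RE.finditer(log, summary.end())
        if not m["unreliable"]  # skip frames the unwinder flagged with "?"
    ]
    return HungTask(summary["comm"], int(summary["pid"]), int(summary["secs"]), frames)
```

Applied to the trace above, the innermost reliable frames are `__mutex_lock_slowpath`, `mutex_lock` and `svc_send [sunrpc]`: nfsd is waiting for a mutex in the RPC send path rather than doing disk I/O.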
On Sat, 2012-03-31 at 13:55 +0200, Christoph Bartoschek wrote:
> [...]
> [330573.730784] Call Trace:
> [330573.730788]  [<ffffffff81559e67>] __mutex_lock_slowpath+0x107/0x190
> [330573.730796]  [<ffffffffa012300f>] ? svc_authorise+0x3f/0x50 [sunrpc]

At a guess, I'd say that your mountd or rpc.svcgssd is probably
busy/hanging, causing the kernel NFS daemon to hang while it waits to
authorise a client or user. Typically, you will see the above in the
case of a Kerberos, NIS or LDAP outage.

So are you using NIS or LDAP-based netgroups in your /etc/exports, or
are your clients perhaps mounting with sec=krb5?

Cheers
  Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
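Trond's two questions can be partly answered from the server's export table: exports(5) writes NIS netgroups as `@group` client specs, and restricts an export to Kerberos with `sec=` followed by a colon-separated list of flavours (`krb5`, `krb5i`, `krb5p`). The Python sketch below scans `/etc/exports` text for both. It is a simplified, hypothetical helper (`scan_exports` is not an nfs-utils tool): it does not handle quoted paths containing spaces or default options given with a leading `-`, and it cannot see client-side mount options such as `sec=krb5` in a client's fstab.

```python
from typing import Dict, List, NamedTuple, Set

# Security flavours that involve Kerberos (and therefore rpc.svcgssd on the server).
KRB5_FLAVOURS = {"krb5", "krb5i", "krb5p"}


class ExportFindings(NamedTuple):
    netgroups: Dict[str, List[str]]  # export path -> "@netgroup" client specs
    krb5_paths: Set[str]             # export paths whose options include a sec=krb5* flavour


def scan_exports(text: str) -> ExportFindings:
    """Report netgroup clients and Kerberos flavours in /etc/exports text (simplified sketch)."""
    text = text.replace("\\\n", " ")  # exports(5) lets long lines continue after a backslash
    netgroups: Dict[str, List[str]] = {}
    krb5_paths: Set[str] = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        path, *clients = line.split()
        for spec in clients:
            host, _, opts = spec.partition("(")  # "host(opt,opt)" -> "host", "opt,opt)"
            if host.startswith("@"):
                netgroups.setdefault(path, []).append(host)
            for opt in opts.rstrip(")").split(","):
                key, _, value = opt.partition("=")
                if key == "sec" and KRB5_FLAVOURS & set(value.split(":")):
                    krb5_paths.add(path)
    return ExportFindings(netgroups, krb5_paths)
```

Any path listed under `netgroups` means every access check for it may involve a NIS (or LDAP) lookup by mountd, which fits Trond's suggestion that a slow directory service can stall nfsd.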
I have just noticed that the clock on our NIS server was several minutes off.
Could this be the cause of our problems?
I know that Kerberos needs accurate time, but is this also the case for NIS?
Christoph
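How far off a server clock is can be quantified rather than eyeballed. NTP estimates a server's offset from the four timestamps of one request/reply exchange using the on-wire formulas in RFC 5905, assuming the network delay is the same in both directions; `ntpdate -q <server>` reports that estimate without setting the clock. The Python sketch below only reproduces the arithmetic. The function names are hypothetical, and the 300-second threshold is an assumption borrowed from MIT Kerberos' default `clockskew`; the sketch says nothing about whether NIS itself is sensitive to skew.

```python
from typing import Tuple


def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Clock offset and round-trip delay from one NTP-style exchange (RFC 5905 formulas).

    t1: request sent      (client clock)
    t2: request received  (server clock)
    t3: reply sent        (server clock)
    t4: reply received    (client clock)
    A positive offset means the server clock is ahead of the client clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)  # round trip minus the server's processing time
    return offset, delay


def exceeds_skew(offset: float, max_skew: float = 300.0) -> bool:
    """True if |offset| is larger than max_skew seconds (300 s assumed, as in MIT Kerberos)."""
    return abs(offset) > max_skew
```

For example, a server that is seven minutes ahead, answering after 10 ms each way and 1 ms of processing, gives an offset of about 420 s and a delay of about 0.02 s, which `exceeds_skew` flags.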