2018-09-20 16:35:14

by Jäkel, Guido

[permalink] [raw]
Subject: RE: NFS3 subsystem hung, Kernel alive

Hi all,

Today at about "the event time" production keeps running but I discover that one of the hosts in the Test stage (bladerunner10) become very "stuttering" to react on commands.

From https://utcc.utoronto.ca/~cks/space/blog/linux/NFSMountstatsXprt I got some information about. And I started to

    watch -n 1 "sed -n '/^device .* on \/ with/,/^$/ p'  /proc/self/mountstats"

on the hosts to watch the root mount. On bladerunner10 I notice a very high value of the 8th field of xprt ('bad XIDs'), which is identical to the difference between filed 6 and 7 (TX-RX). Does that mean, that there were a high number of bad answers to questions? Or is this the number of replies that are out of time?

If I watch TX-RX-BAD, this is near zero on all hosts. But on bladerunner10, it sometime rises to enormous values (>100000) and in this moment, all File-IO is frozen - E.g. I don't get a new prompt if I simply hit enter on an bash command line.


device 10.69.63.196:/02/q/diskless/roots/bladerunner10 mounted on / with fstype nfs statvers=1.1
        opts:   rw,vers=3,rsize=1024,wsize=1024,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.69.63.196,mountvers=3,mountport=0,mountproto=tcp,local_lock=all
        age:    9939702
        caps:   caps=0x3fc7,wtmult=512,dtsize=1024,bsize=0,namlen=255
        sec:    flavor=1,pseudoflavor=1
        events: 269343924 134739087308 20734 140915 232195524 79262 134886538148 21804722 104 16067 0 293341786 222190 75356 177067969 35796 2826 231908027 0 411 21783902 199 0 0 0 0 0
        bytes:  128654830696 20320953759 0 0 219517679 20415228955 63772 5008821
        RPC iostats version: 1.0  p/v: 100003/3 (nfs)
        xprt:   tcp 837 1 1 0 0 21448220350 21448165066 55284 576287654630121 0 34712 845220323041 514256914035
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
             GETATTR: 269343899 269343899 0 36809071916 30166513552 3034498 71578350 78080492
             SETATTR: 75721 75721 0 15972628 10903824 1855 70284 73720
              LOOKUP: 80296 80296 0 15825484 18814360 7312 135951 144678
              ACCESS: 39274 39274 0 7048052 4712880 4241 26485 31274
            READLINK: 995 995 0 170796 139564 72 479 567
                READ: 223945 223945 0 40327228 248198116 130225 1437810 1583172
               WRITE: 19958985 19958985 0 24406783848 3193437600 167421458404 27086586679 194511012992
              CREATE: 5281 5281 0 1126060 1542052 132 21698 21989
               MKDIR: 127 127 0 29160 36740 10 12307 12321
             SYMLINK: 3 3 0 716 876 0 1 1
               MKNOD: 3 3 0 636 876 0 2 2
              REMOVE: 3400 3400 0 663604 489600 52 12164 12312
               RMDIR: 122 122 0 24624 17520 15 463 483
              RENAME: 2074 2074 0 491352 539240 67 11433 11529
                LINK: 0 0 0 0 0 0 0 0
             READDIR: 31882 31882 0 6376400 32311036 2707 64806 68379
         READDIRPLUS: 273882 273882 0 55807876 140884360 14257 509826 530894
              FSSTAT: 538 538 0 95212 90384 61 445 519
              FSINFO: 2 2 0 272 328 0 0 0
            PATHCONF: 1 1 0 136 140 0 0 0
              COMMIT: 0 0 0 0 0 0 0 0
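
The TX-RX observation in the message above can be checked with a small shell sketch. It assumes the `xprt: tcp` field order described in the linked blog post (port, bind count, connect count, connect time, idle time, sends, receives, bad XIDs, ...). The function name `xprt_summary` is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Sketch: pull sends, receives and bad XIDs out of an "xprt: tcp ..." line.
# The field order is an assumption taken from the blog post linked above.
xprt_summary() {
    # $1 is the whole xprt line; awk splits on any run of blanks or tabs,
    # so leading indentation in /proc/self/mountstats does not matter.
    printf '%s\n' "$1" | awk '
        $1 == "xprt:" && $2 == "tcp" {
            tx = $8; rx = $9; bad = $10
            # tx - rx: requests sent minus replies matched to a request.
            # Subtracting bad XIDs as well gives the "TX-RX-BAD" value.
            printf "tx-rx=%d bad=%d tx-rx-bad=%d\n", tx - rx, bad, tx - rx - bad
        }'
}

# The xprt line from the mountstats dump in this thread.
xprt_summary "xprt: tcp 837 1 1 0 0 21448220350 21448165066 55284 576287654630121 0 34712 845220323041 514256914035"
```

For the dump in this thread the difference between sends and receives equals the bad XID counter, so TX-RX-BAD comes out as zero at that moment.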


2018-09-25 04:03:12

by J. Bruce Fields

[permalink] [raw]
Subject: Re: NFS3 subsystem hung, Kernel alive

On Thu, Sep 20, 2018 at 10:52:17AM +0000, Jäkel, Guido wrote:
> Hi all,
>
> Today at about "the event time" production keeps running but I discover that one of the hosts in the Test stage (bladerunner10) become very "stuttering" to react on commands.
>
> From https://utcc.utoronto.ca/~cks/space/blog/linux/NFSMountstatsXprt I got some information about. And I started to
>
> watch -n 1 "sed -n '/^device .* on \/ with/,/^$/ p' /proc/self/mountstats"
>
> on the hosts to watch the root mount. On bladerunner10 I notice a very high value of the 8th field of xprt ('bad XIDs'), which is identical to the difference between filed 6 and 7 (TX-RX). Does that mean, that there were a high number of bad answers to questions? Or is this the number of replies that are out of time?

I don't know what you mean by "filed 6 and 7". Oh, wait, I guess you're
talking about the 6th and 7th fields of the "xprt" line in mountstats.

bad_xids means the client got a response but couldn't find a matching
request. I'm not sure why that would happen--maybe a response came after
the client gave up waiting for it?

--b.

>
> If I watch TX-RX-BAD, this is near zero on all hosts. But on bladerunner10, it sometime rises to enormous values (>100000) and in this moment, all File-IO is frozen - E.g. I don't get a new prompt if I simply hit enter on an bash command line.
>
>
>
> device 10.69.63.196:/02/q/diskless/roots/bladerunner10 mounted on / with fstype nfs statvers=1.1
> opts: rw,vers=3,rsize=1024,wsize=1024,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.69.63.196,mountvers=3,mountport=0,mountproto=tcp,local_lock=all
> age: 9939702
> caps: caps=0x3fc7,wtmult=512,dtsize=1024,bsize=0,namlen=255
> sec: flavor=1,pseudoflavor=1
> events: 269343924 134739087308 20734 140915 232195524 79262 134886538148 21804722 104 16067 0 293341786 222190 75356 177067969 35796 2826 231908027 0 411 21783902 199 0 0 0 0 0
> bytes: 128654830696 20320953759 0 0 219517679 20415228955 63772 5008821
> RPC iostats version: 1.0 p/v: 100003/3 (nfs)
> xprt: tcp 837 1 1 0 0 21448220350 21448165066 55284 576287654630121 0 34712 845220323041 514256914035
> per-op statistics
> NULL: 0 0 0 0 0 0 0 0
> GETATTR: 269343899 269343899 0 36809071916 30166513552 3034498 71578350 78080492
> SETATTR: 75721 75721 0 15972628 10903824 1855 70284 73720
> LOOKUP: 80296 80296 0 15825484 18814360 7312 135951 144678
> ACCESS: 39274 39274 0 7048052 4712880 4241 26485 31274
> READLINK: 995 995 0 170796 139564 72 479 567
> READ: 223945 223945 0 40327228 248198116 130225 1437810 1583172
> WRITE: 19958985 19958985 0 24406783848 3193437600 167421458404 27086586679 194511012992
> CREATE: 5281 5281 0 1126060 1542052 132 21698 21989
> MKDIR: 127 127 0 29160 36740 10 12307 12321
> SYMLINK: 3 3 0 716 876 0 1 1
> MKNOD: 3 3 0 636 876 0 2 2
> REMOVE: 3400 3400 0 663604 489600 52 12164 12312
> RMDIR: 122 122 0 24624 17520 15 463 483
> RENAME: 2074 2074 0 491352 539240 67 11433 11529
> LINK: 0 0 0 0 0 0 0 0
> READDIR: 31882 31882 0 6376400 32311036 2707 64806 68379
> READDIRPLUS: 273882 273882 0 55807876 140884360 14257 509826 530894
> FSSTAT: 538 538 0 95212 90384 61 445 519
> FSINFO: 2 2 0 272 328 0 0 0
> PATHCONF: 1 1 0 136 140 0 0 0
> COMMIT: 0 0 0 0 0 0 0 0
>
>
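
One way to picture the bad_xids counter discussed in this reply is a toy model of reply matching. It is only a sketch and not the kernel's code: the client keeps a list of outstanding XIDs, and a reply whose XID is no longer on the list (for example a late or duplicate reply) is counted as bad. The names `send_request` and `receive_reply` and the XID values are invented for illustration.

```shell
#!/bin/sh
# Toy model of matching RPC replies to requests by XID (illustration only).
pending=" "      # space-delimited list of XIDs still waiting for a reply
bad_xids=0

send_request() {            # remember an outstanding request
    pending="$pending$1 "
}

receive_reply() {           # complete the request, or count a bad XID
    case "$pending" in
        *" $1 "*)
            # Match found: cut this XID out of the pending list.
            pending="${pending%% $1 *} ${pending#* $1 }"
            ;;
        *)
            # No request is waiting for this XID any more.
            bad_xids=$((bad_xids + 1))
            ;;
    esac
}

send_request 101
send_request 102
receive_reply 101           # normal completion
receive_reply 102           # normal completion
receive_reply 102           # duplicate or late reply: nothing is waiting
echo "bad_xids=$bad_xids"
```

In this model a reply that arrives after its request has already been completed or abandoned bumps the counter without affecting any other request.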

2018-09-25 13:02:31

by Jäkel, Guido

[permalink] [raw]
Subject: RE: NFS3 subsystem hung, Kernel alive

Dear Bruce,

I wrote the script rpc-stat

    root@bladerunner10 ~ # cat /opt/bin/rpc-stat
    sed -n '/^device .* on \/ with/,/^$/ {/xprt:/ !d; p}'  /proc/self/mountstats | cut -d " " -f 7,8,9,11,12 | \
    ( read TX RX BAD BQ MAXSLOTS && printf "running:%-3d timeout:%-3d, queued:%-3d, max:%-5d\n"  $((TX-RX-BAD)) $BAD $BQ $MAXSLOTS )

to let it run on terminals via

    watch -d -n .1 rpc-stat

This outputs something like

    Every 0.1s: rpc-stat              bladerunner10: Tue Sep 25 08:43:02 2018

    running:0   timeout:56146, queued:0  , max:38632


The value of "running" (TX-RX-BAD) is mostly zero; it seems to correspond well to activity. I wonder about the "timeout" (bad XIDs) value - it seems much too high to me. The recently booted bladerunner14 shows an unobtrusive value, but the one on the struggling bladerunner10 seems very high to me.


    root@bladerunner14 ~ # rpc-stat
    running:0   timeout:27 , queued:0  , max:10242

    root@bladerunner14 ~ # uptime
     08:48:39 up 5 days, 21:25,  7 users,  load average: 10.96, 12.18, 12.47
            ^--(6 users from the still running consoles with the busybox shells)


Here are the values of the other blade host used for the Production stage, bladerunner15:

    root@bladerunner15 ~ # rpc-stat
    running:1   timeout:18 , queued:0  , max:7440

    root@bladerunner15 ~ # uptime
     08:53:05 up 111 days, 26 min,  1 user,  load average: 20.38, 19.27, 19.40
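
Because the bad XID counter is cumulative, comparing hosts with very different uptimes is easier after dividing by the time the counter has been running. The sketch below does that with the figures quoted in this thread. It assumes the counter started at boot, which seems plausible for an NFS root mount but is not confirmed here; `bad_per_day` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: normalise a cumulative bad-XID count to a per-day rate.
# Assumption: the counter has been accumulating for the given number of seconds.
bad_per_day() {
    # $1 = bad XID count, $2 = seconds over which it accumulated
    awk -v bad="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bad * 86400 / secs }'
}

# bladerunner14: 27 bad XIDs, up 5 days 21:25
bad_per_day 27 $(( 5*86400 + 21*3600 + 25*60 ))
# bladerunner15: 18 bad XIDs, up 111 days 26 min
bad_per_day 18 $(( 111*86400 + 26*60 ))
# bladerunner10: 55284 bad XIDs at mount age 9939702 s (from the first message)
bad_per_day 55284 9939702
```

Under these assumptions bladerunner14 and bladerunner15 stay at a handful per day or fewer, while bladerunner10 is in the hundreds per day, which fits the impression that its value is out of line.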