Hi,
there seems to be a problem with (at least) Linux 2.4.13, 2.4.15pre3
and network connections to AIX.
Trond thinks, it's not in the nfs3 bit and I found the problem with
these driver/nic combinations:
1. e100 driver (from Intel) on a EtherExpress Pro 100
2. kernel-3c59x on a 3Com 905C
3. recent Donald Becker-3c59x on a 3Com 905C.
on 10Mbit/HalfDuplex and 100MBit/FullDuplex
So it seems to be somewhere in between (IPV4 socket layer???).
Here is what happens:
When copying a file of >900kb into a nfs3-exported Directory of an AIX
nfs3-Server we get on the second attempt:
nfs: server camc1083 not responding, still trying
and after a while:
nfs: server camc1083 OK
The tcpdump during a copy looks like this:
tcpdump on AIX (caes04):
13:47:28.337776317 truncated-ip - 18 bytes missing!capc25.muc.796 > caes04.muc.shilp: P 2059179904:2059180060(156) ack 4022052897 win 17520 (DF)
13:47:28.337860266 caes04.muc.shilp > capc25.muc.796: P 1:121(120) ack 156 win 60032
13:47:28.343619224 capc25.muc.796 > caes04.muc.shilp: . ack 121 win 17520 (DF)
13:47:28.344042473 truncated-ip - 50 bytes missing!capc25.muc.796 > caes04.muc.shilp: P 156:344(188) ack 121 win 17520 (DF)
13:47:28.357398139 truncated-ip - 138 bytes missing!caes04.muc.shilp > capc25.muc.796: P 121:397(276) ack 344 win 60032
13:47:28.364496982 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 344:1804(1460) ack 397 win 17520 (DF)
13:47:28.364872917 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 1804:3264(1460) ack 397 win 17520 (DF)
13:47:28.365142046 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 3264:4724(1460) ack 397 win 17520 (DF)
13:47:28.482136300 caes04.muc.shilp > capc25.muc.796: . ack 4724 win 55652
13:47:28.489517784 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 4724:6184(1460) ack 397 win 17520 (DF)
13:47:28.490018916 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 6184:7644(1460) ack 397 win 17520 (DF)
13:47:28.490313868 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 7644:9104(1460) ack 397 win 17520 (DF)
13:47:28.490566573 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 9104:10564(1460) ack 397 win 17520 (DF)
13:47:28.685246999 caes04.muc.shilp > capc25.muc.796: . ack 2059190468 win 49812
and on Linux (capc25):
13:47:28.324042 > capc25.muc.796 > caes04.muc.nfs: P 2059179904:2059180060(156) ack 4022052897 win 17520 (DF)
13:47:28.330599 < caes04.muc.nfs > capc25.muc.796: P 1:121(120) ack 156 win 60032
13:47:28.330620 > capc25.muc.796 > caes04.muc.nfs: . 156:156(0) ack 121 win 17520 (DF)
13:47:28.330857 > capc25.muc.796 > caes04.muc.nfs: P 156:344(188) ack 121 win 17520 (DF)
13:47:28.350291 < caes04.muc.nfs > capc25.muc.796: P 121:397(276) ack 344 win 60032
13:47:28.350556 > capc25.muc.796 > caes04.muc.nfs: . 344:1804(1460) ack 397 win 17520 (DF)
13:47:28.350569 > capc25.muc.796 > caes04.muc.nfs: . 1804:3264(1460) ack 397 win 17520 (DF)
13:47:28.350581 > capc25.muc.796 > caes04.muc.nfs: . 3264:4724(1460) ack 397 win 17520 (DF)
13:47:28.475691 < caes04.muc.nfs > capc25.muc.796: . 397:397(0) ack 4724 win 55652
13:47:28.475724 > capc25.muc.796 > caes04.muc.nfs: . 4724:6184(1460) ack 397 win 17520 (DF)
13:47:28.475734 > capc25.muc.796 > caes04.muc.nfs: . 6184:7644(1460) ack 397 win 17520 (DF)
13:47:28.475743 > capc25.muc.796 > caes04.muc.nfs: . 7644:9104(1460) ack 397 win 17520 (DF)
13:47:28.475752 > capc25.muc.796 > caes04.muc.nfs: . 9104:10564(1460) ack 397 win 17520 (DF)
13:47:28.678570 < caes04.muc.nfs > capc25.muc.796: . 397:397(0) ack 10564 win 49812
13:47:28.678604 > capc25.muc.796 > caes04.muc.nfs: . 10564:12024(1460) ack 397 win 17520 (DF)
13:47:28.678614 > capc25.muc.796 > caes04.muc.nfs: . 12024:13484(1460) ack 397 win 17520 (DF)
13:47:28.678623 > capc25.muc.796 > caes04.muc.nfs: . 13484:14944(1460) ack 397 win 17520 (DF)
13:47:28.678632 > capc25.muc.796 > caes04.muc.nfs: . 14944:16404(1460) ack 397 win 17520 (DF)
13:47:28.678642 > capc25.muc.796 > caes04.muc.nfs: . 16404:17864(1460) ack 397 win 17520 (DF)
13:47:28.884628 < caes04.muc.nfs > capc25.muc.796: . 397:397(0) ack 17864 win 42512
I have no clue on how to debug a IPV4 socket layer, so I'm grateful
for any comment on how to provide better information...
Regards,
Birger
Trond Myklebust writes:
> Odd... That dump seems to indicate a problem with TCP in which the
> server is seeing bad packets. That's not an NFS problem per se, as we
> just use the standard socket functions. IOW it's either a bug in the
> IPV4 socket layer or a bug with your network card driver.
>
> Have you tried changing the networking card and/or its driver?
>
> Cheers,
> Trond
>
ps: Trond: the email I sent earlier was written after too little
testing...
--
Dr. Birger Lammering
science+computing ag fon: 089 356386-15
Geschaeftsstelle Muenchen fax: 089 356386-37
Ingolstaedter Str. 22
D-80807 Muenchen mailto:[email protected]
> 13:47:28.337776317 truncated-ip - 18 bytes missing!capc25.muc.796 > caes04.muc.shilp: P 2059179904:2059180060(156) ack 4022052897 win 17520 (DF)
Right so someone is truncating frames. I'd start with the hub then work
outwards.
Hi,
Alan Cox writes:
> Right so someone is truncating frames. I'd start with the hub then work
> outwards.
tcpdump on the aix was truncating the frames. sorry, the snap length was
too small --- with using -s 64000 there were no truncated frames
anymore.
Well, I guess (n.b.: GUESS) the problem is with nfs again...
This is a consistent reproducible bug. We have tested it on several
AIXen, Linuxen on different switched Networks.
After having learned a bit better how to use tcpdump, here we go:
nfs3 Client: Linux 2.4.15pre5 (capc25) 3com 905B / Intel EtherReal 100+
nfs3 Server: Aix 4.3 (caes04), IBM SP2 / 7044-170
100MBit/FullDuplex
switched Network (Cisco Catalyst)
This is the first attempt of copying a 500k file onto the nfs server:
16:24:19.652672 > capc25.muc.654 > caes04.muc.sunrpc: udp 40 (DF)
16:24:19.655073 < caes04.muc.sunrpc > capc25.muc.654: udp 24
16:24:19.655119 > capc25.muc.655 > caes04.muc.sunrpc: udp 56 (DF)
16:24:19.656813 < caes04.muc.sunrpc > capc25.muc.655: udp 28
16:24:19.656853 > capc25.muc.656 > caes04.muc.nfs: S 2952133600:2952133600(0) win 5840 <mss 1460,sackOK,timestamp 2001001 0,nop,wscale 0> (DF)
16:24:19.657695 < caes04.muc.nfs > capc25.muc.656: S 4177984247:4177984247(0) ack 2952133601 win 59860 <mss 1460>
16:24:19.657711 > capc25.muc.656 > caes04.muc.nfs: . 1:1(0) ack 1 win 5840 (DF)
16:24:19.657746 > capc25.muc.656 > caes04.muc.nfs: P 1:45(44) ack 1 win 5840 (DF)
16:24:19.658653 < caes04.muc.nfs > capc25.muc.656: P 1:29(28) ack 45 win 60032
16:24:19.658666 > capc25.muc.656 > caes04.muc.nfs: . 45:45(0) ack 29 win 5840 (DF)
16:24:19.658702 > capc25.muc.656 > caes04.muc.nfs: F 45:45(0) ack 29 win 5840 (DF)
16:24:19.658749 > capc25.muc.657 > caes04.muc.sunrpc: udp 40 (DF)
16:24:19.659739 < caes04.muc.nfs > capc25.muc.656: . 29:29(0) ack 46 win 60032
16:24:19.660017 < caes04.muc.nfs > capc25.muc.656: F 29:29(0) ack 46 win 60032
16:24:19.660026 > capc25.muc.656 > caes04.muc.nfs: . 46:46(0) ack 30 win 5840 (DF)
16:24:19.660157 < caes04.muc.sunrpc > capc25.muc.657: udp 24
16:24:19.660199 > capc25.muc.658 > caes04.muc.sunrpc: udp 56 (DF)
16:24:19.662361 < caes04.muc.sunrpc > capc25.muc.658: udp 28
16:24:19.662393 > capc25.muc.50947429 > caes04.muc.nfs: 40 null (DF)
16:24:19.664439 < caes04.muc.nfs > capc25.muc.50947429: reply ok 24 null
16:24:19.665397 > capc25.muc.799 > caes04.muc.nfs: S 2955913379:2955913379(0) win 5840 <mss 1460,sackOK,timestamp 2001002 0,nop,wscale 0> (DF)
16:24:19.667274 < caes04.muc.nfs > capc25.muc.799: S 3186497216:3186497216(0) ack 2955913380 win 59860 <mss 1460>
16:24:19.667293 > capc25.muc.799 > caes04.muc.nfs: . 1:1(0) ack 1 win 5840 (DF)
16:24:19.667328 > capc25.muc.799 > caes04.muc.nfs: P 1:109(108) ack 1 win 5840 (DF)
16:24:19.669096 < caes04.muc.nfs > capc25.muc.799: P 1:117(116) ack 109 win 60032
16:24:19.669108 > capc25.muc.799 > caes04.muc.nfs: . 109:109(0) ack 117 win 5840 (DF)
16:24:19.669155 > capc25.muc.799 > caes04.muc.nfs: P 109:217(108) ack 117 win 5840 (DF)
16:24:19.670267 < caes04.muc.nfs > capc25.muc.799: P 117:289(172) ack 217 win 60032
16:24:19.670295 > capc25.muc.799 > caes04.muc.nfs: P 217:325(108) ack 289 win 6432 (DF)
16:24:19.671674 < caes04.muc.nfs > capc25.muc.799: P 289:457(168) ack 325 win 60032
16:24:19.673364 > capc25.muc.799 > caes04.muc.nfs: P 325:461(136) ack 457 win 6432 (DF)
16:24:19.675030 < caes04.muc.nfs > capc25.muc.799: P 457:701(244) ack 461 win 60032
16:24:19.675121 > capc25.muc.799 > caes04.muc.nfs: P 461:625(164) ack 701 win 7504 (DF)
16:24:19.693914 < caes04.muc.nfs > capc25.muc.799: P 701:849(148) ack 625 win 60032
16:24:19.694170 > capc25.muc.799 > caes04.muc.nfs: . 625:2085(1460) ack 849 win 7504 (DF)
16:24:19.694186 > capc25.muc.799 > caes04.muc.nfs: . 2085:3545(1460) ack 849 win 7504 (DF)
16:24:19.774514 < caes04.muc.nfs > capc25.muc.799: . 849:849(0) ack 3545 win 57112
16:24:19.774545 > capc25.muc.799 > caes04.muc.nfs: . 3545:5005(1460) ack 849 win 7504 (DF)
16:24:19.774556 > capc25.muc.799 > caes04.muc.nfs: . 5005:6465(1460) ack 849 win 7504 (DF)
16:24:19.774565 > capc25.muc.799 > caes04.muc.nfs: . 6465:7925(1460) ack 849 win 7504 (DF)
16:24:19.983865 < caes04.muc.nfs > capc25.muc.799: . 849:849(0) ack 7925 win 52732
16:24:19.983887 > capc25.muc.799 > caes04.muc.nfs: . 7925:9385(1460) ack 849 win 7504 (DF)
16:24:19.983897 > capc25.muc.799 > caes04.muc.nfs: . 9385:10845(1460) ack 849 win 7504 (DF)
16:24:19.983906 > capc25.muc.799 > caes04.muc.nfs: . 10845:12305(1460) ack 849 win 7504 (DF)
16:24:19.983916 > capc25.muc.799 > caes04.muc.nfs: P 12305:13765(1460) ack 849 win 7504 (DF)
16:24:20.194551 < caes04.muc.nfs > capc25.muc.799: . 849:849(0) ack 13765 win 46892
16:24:20.194569 > capc25.muc.799 > caes04.muc.nfs: . 13765:15225(1460) ack 849 win 7504 (DF)
16:24:20.194578 > capc25.muc.799 > caes04.muc.nfs: P 15225:16685(1460) ack 849 win 7504 (DF)
16:24:20.194588 > capc25.muc.799 > caes04.muc.nfs: . 16685:18145(1460) ack 849 win 7504 (DF)
16:24:20.194598 > capc25.muc.799 > caes04.muc.nfs: . 18145:19605(1460) ack 849 win 7504 (DF)
16:24:20.194607 > capc25.muc.799 > caes04.muc.nfs: P 19605:21065(1460) ack 849 win 7504 (DF)
16:24:20.396361 < caes04.muc.nfs > capc25.muc.799: . 849:849(0) ack 21065 win 39592
...and so on...
16:24:21.904205 < caes04.muc.nfs > capc25.muc.799: . 2981:2981(0) ack 462513 win 58968
16:24:21.904231 > capc25.muc.799 > caes04.muc.nfs: P 480033:481493(1460) ack 2981 win 7504 (DF)
16:24:21.904239 > capc25.muc.799 > caes04.muc.nfs: . 481493:482953(1460) ack 2981 win 7504 (DF)
16:24:21.904247 > capc25.muc.799 > caes04.muc.nfs: . 482953:484413(1460) ack 2981 win 7504 (DF)
16:24:21.904256 > capc25.muc.799 > caes04.muc.nfs: . 484413:485873(1460) ack 2981 win 7504 (DF)
16:24:21.904265 > capc25.muc.799 > caes04.muc.nfs: . 485873:487333(1460) ack 2981 win 7504 (DF)
16:24:21.904277 > capc25.muc.799 > caes04.muc.nfs: . 487333:488793(1460) ack 2981 win 7504 (DF)
16:24:21.904289 > capc25.muc.799 > caes04.muc.nfs: . 488793:490253(1460) ack 2981 win 7504 (DF)
16:24:21.904298 > capc25.muc.799 > caes04.muc.nfs: . 490253:491713(1460) ack 2981 win 7504 (DF)
16:24:21.904309 > capc25.muc.799 > caes04.muc.nfs: . 491713:493173(1460) ack 2981 win 7504 (DF)
16:24:21.904317 > capc25.muc.799 > caes04.muc.nfs: . 493173:494633(1460) ack 2981 win 7504 (DF)
16:24:21.904327 > capc25.muc.799 > caes04.muc.nfs: . 494633:496093(1460) ack 2981 win 7504 (DF)
16:24:21.904337 > capc25.muc.799 > caes04.muc.nfs: . 496093:497553(1460) ack 2981 win 7504 (DF)
16:24:21.904349 > capc25.muc.799 > caes04.muc.nfs: . 497553:499013(1460) ack 2981 win 7504 (DF)
16:24:21.904357 > capc25.muc.799 > caes04.muc.nfs: . 499013:500473(1460) ack 2981 win 7504 (DF)
16:24:21.904364 > capc25.muc.799 > caes04.muc.nfs: . 500473:501933(1460) ack 2981 win 7504 (DF)
16:24:21.904372 > capc25.muc.799 > caes04.muc.nfs: . 501933:503393(1460) ack 2981 win 7504 (DF)
16:24:21.905180 < caes04.muc.nfs > capc25.muc.799: P 2981:3145(164) ack 465433 win 56048
16:24:21.905979 > capc25.muc.799 > caes04.muc.nfs: . 503393:504853(1460) ack 2981 win 7504 (DF)
16:24:21.905986 > capc25.muc.799 > caes04.muc.nfs: . 504853:506313(1460) ack 2981 win 7504 (DF)
16:24:21.905992 > capc25.muc.799 > caes04.muc.nfs: . 506313:507773(1460) ack 2981 win 7504 (DF)
16:24:21.905998 > capc25.muc.799 > caes04.muc.nfs: . 507773:509233(1460) ack 2981 win 7504 (DF)
16:24:21.906004 > capc25.muc.799 > caes04.muc.nfs: P 509233:510693(1460) ack 2981 win 7504 (DF)
16:24:21.906009 > capc25.muc.799 > caes04.muc.nfs: P 510693:512153(1460) ack 2981 win 7504 (DF)
16:24:21.906016 > capc25.muc.799 > caes04.muc.nfs: . 512153:513613(1460) ack 2981 win 7504 (DF)
16:24:21.906021 > capc25.muc.799 > caes04.muc.nfs: . 513613:515073(1460) ack 3145 win 7504 (DF)
16:24:21.906027 > capc25.muc.799 > caes04.muc.nfs: P 515073:515133(60) ack 3145 win 7504 (DF)
16:24:21.909806 < caes04.muc.nfs > capc25.muc.799: . 3145:3145(0) ack 494633 win 59764
16:24:21.910644 < caes04.muc.nfs > capc25.muc.799: P 3145:3309(164) ack 497553 win 56844
16:24:21.912247 < caes04.muc.nfs > capc25.muc.799: . 3309:3309(0) ack 515133 win 59892
16:24:21.912855 < caes04.muc.nfs > capc25.muc.799: P 3309:3473(164) ack 515133 win 59892
16:24:21.941386 > capc25.muc.799 > caes04.muc.nfs: . 515133:515133(0) ack 3473 win 7504 (DF)
16:24:21.965016 < caes04.muc.nfs > capc25.muc.799: P 3473:3629(156) ack 515133 win 60032
16:24:21.965042 > capc25.muc.799 > caes04.muc.nfs: . 515133:515133(0) ack 3629 win 7504 (DF)
16:24:21.965128 > capc25.muc.799 > caes04.muc.nfs: P 515133:515273(140) ack 3629 win 7504 (DF)
16:24:21.998128 < caes04.muc.nfs > capc25.muc.799: P 3629:3785(156) ack 515273 win 60032
16:24:22.031385 > capc25.muc.799 > caes04.muc.nfs: . 515273:515273(0) ack 3785 win 7504 (DF)
The file was copied all right.
qx09820@capc25 /home/qx09820 > netstat |grep caes04
tcp 0 0 capc25.muc:729 caes04.muc:nfs TIME_WAIT
tcp 0 0 capc25.muc:744 caes04.muc:nfs TIME_WAIT
tcp 0 0 capc25.muc:737 caes04.muc:nfs TIME_WAIT
tcp 0 0 capc25.muc:798 caes04.muc:nfs ESTABLISHED
Now copying the same file for a second time:
16:27:20.716640 > capc25.muc.799 > caes04.muc.nfs: P 2956428652:2956428780(128) ack 3186501001 win 7504 (DF)
16:27:20.717226 < caes04.muc.nfs > capc25.muc.799: P 1:117(116) ack 128 win 60032
16:27:20.717243 > capc25.muc.799 > caes04.muc.nfs: . 128:128(0) ack 117 win 7504 (DF)
16:27:20.717360 > capc25.muc.799 > caes04.muc.nfs: P 128:264(136) ack 117 win 7504 (DF)
16:27:20.717966 < caes04.muc.nfs > capc25.muc.799: P 117:361(244) ack 264 win 60032
16:27:20.718041 > capc25.muc.799 > caes04.muc.nfs: P 264:428(164) ack 361 win 8576 (DF)
16:27:20.735478 < caes04.muc.nfs > capc25.muc.799: P 361:509(148) ack 428 win 60032
16:27:20.735745 > capc25.muc.799 > caes04.muc.nfs: . 428:1888(1460) ack 509 win 8576 (DF)
16:27:20.735760 > capc25.muc.799 > caes04.muc.nfs: . 1888:3348(1460) ack 509 win 8576 (DF)
16:27:20.858656 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 3348 win 57112
16:27:20.858672 > capc25.muc.799 > caes04.muc.nfs: . 3348:4808(1460) ack 509 win 8576 (DF)
16:27:20.858682 > capc25.muc.799 > caes04.muc.nfs: . 4808:6268(1460) ack 509 win 8576 (DF)
16:27:20.858692 > capc25.muc.799 > caes04.muc.nfs: . 6268:7728(1460) ack 509 win 8576 (DF)
16:27:21.068297 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 7728 win 52732
16:27:21.068318 > capc25.muc.799 > caes04.muc.nfs: . 7728:9188(1460) ack 509 win 8576 (DF)
16:27:21.068328 > capc25.muc.799 > caes04.muc.nfs: . 9188:10648(1460) ack 509 win 8576 (DF)
16:27:21.068337 > capc25.muc.799 > caes04.muc.nfs: . 10648:12108(1460) ack 509 win 8576 (DF)
16:27:21.068347 > capc25.muc.799 > caes04.muc.nfs: . 12108:13568(1460) ack 509 win 8576 (DF)
16:27:21.278296 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 13568 win 46892
16:27:21.278321 > capc25.muc.799 > caes04.muc.nfs: . 13568:15028(1460) ack 509 win 8576 (DF)
16:27:21.278331 > capc25.muc.799 > caes04.muc.nfs: . 15028:16488(1460) ack 509 win 8576 (DF)
16:27:21.278341 > capc25.muc.799 > caes04.muc.nfs: . 16488:17948(1460) ack 509 win 8576 (DF)
16:27:21.278350 > capc25.muc.799 > caes04.muc.nfs: . 17948:19408(1460) ack 509 win 8576 (DF)
16:27:21.278359 > capc25.muc.799 > caes04.muc.nfs: . 19408:20868(1460) ack 509 win 8576 (DF)
16:27:21.488343 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 20868 win 39592
16:27:21.488376 > capc25.muc.799 > caes04.muc.nfs: . 20868:22328(1460) ack 509 win 8576 (DF)
16:27:21.488387 > capc25.muc.799 > caes04.muc.nfs: . 22328:23788(1460) ack 509 win 8576 (DF)
16:27:21.488397 > capc25.muc.799 > caes04.muc.nfs: . 23788:25248(1460) ack 509 win 8576 (DF)
16:27:21.488406 > capc25.muc.799 > caes04.muc.nfs: . 25248:26708(1460) ack 509 win 8576 (DF)
16:27:21.488415 > capc25.muc.799 > caes04.muc.nfs: . 26708:28168(1460) ack 509 win 8576 (DF)
16:27:21.488424 > capc25.muc.799 > caes04.muc.nfs: . 28168:29628(1460) ack 509 win 8576 (DF)
16:27:21.690712 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 29628 win 30832
16:27:21.690736 > capc25.muc.799 > caes04.muc.nfs: P 29628:31088(1460) ack 509 win 8576 (DF)
16:27:21.690746 > capc25.muc.799 > caes04.muc.nfs: . 31088:32548(1460) ack 509 win 8576 (DF)
16:27:21.690756 > capc25.muc.799 > caes04.muc.nfs: . 32548:34008(1460) ack 509 win 8576 (DF)
16:27:21.690765 > capc25.muc.799 > caes04.muc.nfs: . 34008:35468(1460) ack 509 win 8576 (DF)
16:27:21.690774 > capc25.muc.799 > caes04.muc.nfs: . 35468:36928(1460) ack 509 win 8576 (DF)
16:27:21.690784 > capc25.muc.799 > caes04.muc.nfs: . 36928:38388(1460) ack 509 win 8576 (DF)
16:27:21.690793 > capc25.muc.799 > caes04.muc.nfs: . 38388:39848(1460) ack 509 win 8576 (DF)
..and so on..
16:27:22.567415 < caes04.muc.nfs > capc25.muc.799: P 2477:2641(164) ack 434048 win 54320
16:27:22.568505 > capc25.muc.799 > caes04.muc.nfs: . 470548:472008(1460) ack 2477 win 8576 (DF)
16:27:22.568512 > capc25.muc.799 > caes04.muc.nfs: . 472008:473468(1460) ack 2477 win 8576 (DF)
16:27:22.568518 > capc25.muc.799 > caes04.muc.nfs: . 473468:474928(1460) ack 2477 win 8576 (DF)
16:27:22.568524 > capc25.muc.799 > caes04.muc.nfs: . 474928:476388(1460) ack 2477 win 8576 (DF)
16:27:22.568530 > capc25.muc.799 > caes04.muc.nfs: . 476388:477848(1460) ack 2477 win 8576 (DF)
16:27:22.568536 > capc25.muc.799 > caes04.muc.nfs: . 477848:479308(1460) ack 2641 win 8576 (DF)
16:27:22.568541 > capc25.muc.799 > caes04.muc.nfs: . 479308:480768(1460) ack 2641 win 8576 (DF)
16:27:22.568550 > capc25.muc.799 > caes04.muc.nfs: . 480768:482228(1460) ack 2641 win 8576 (DF)
16:27:22.568556 > capc25.muc.799 > caes04.muc.nfs: . 482228:483688(1460) ack 2641 win 8576 (DF)
16:27:22.570140 < caes04.muc.nfs > capc25.muc.799: . 2641:2641(0) ack 461788 win 59496
16:27:22.570163 > capc25.muc.799 > caes04.muc.nfs: . 483688:485148(1460) ack 2641 win 8576 (DF)
16:27:22.570172 > capc25.muc.799 > caes04.muc.nfs: . 485148:486608(1460) ack 2641 win 8576 (DF)
16:27:22.570180 > capc25.muc.799 > caes04.muc.nfs: . 486608:488068(1460) ack 2641 win 8576 (DF)
16:27:22.570190 > capc25.muc.799 > caes04.muc.nfs: . 488068:489528(1460) ack 2641 win 8576 (DF)
16:27:22.570200 > capc25.muc.799 > caes04.muc.nfs: . 489528:490988(1460) ack 2641 win 8576 (DF)
16:27:22.570214 > capc25.muc.799 > caes04.muc.nfs: . 490988:492448(1460) ack 2641 win 8576 (DF)
16:27:22.570222 > capc25.muc.799 > caes04.muc.nfs: . 492448:493908(1460) ack 2641 win 8576 (DF)
16:27:22.570230 > capc25.muc.799 > caes04.muc.nfs: . 493908:495368(1460) ack 2641 win 8576 (DF)
16:27:22.570238 > capc25.muc.799 > caes04.muc.nfs: . 495368:496828(1460) ack 2641 win 8576 (DF)
16:27:22.570247 > capc25.muc.799 > caes04.muc.nfs: . 496828:498288(1460) ack 2641 win 8576 (DF)
16:27:22.570257 > capc25.muc.799 > caes04.muc.nfs: P 498288:499748(1460) ack 2641 win 8576 (DF)
16:27:22.570268 > capc25.muc.799 > caes04.muc.nfs: . 499748:501208(1460) ack 2641 win 8576 (DF)
16:27:22.570279 > capc25.muc.799 > caes04.muc.nfs: . 501208:502668(1460) ack 2641 win 8576 (DF)
16:27:22.570287 > capc25.muc.799 > caes04.muc.nfs: P 502668:504128(1460) ack 2641 win 8576 (DF)
16:27:22.570294 > capc25.muc.799 > caes04.muc.nfs: . 504128:505588(1460) ack 2641 win 8576 (DF)
16:27:22.570307 > capc25.muc.799 > caes04.muc.nfs: . 505588:507048(1460) ack 2641 win 8576 (DF)
16:27:22.570433 < caes04.muc.nfs > capc25.muc.799: P 2641:2805(164) ack 464708 win 56576
16:27:22.571906 > capc25.muc.799 > caes04.muc.nfs: . 507048:508508(1460) ack 2641 win 8576 (DF)
16:27:22.571913 > capc25.muc.799 > caes04.muc.nfs: . 508508:509968(1460) ack 2641 win 8576 (DF)
16:27:22.571919 > capc25.muc.799 > caes04.muc.nfs: . 509968:511428(1460) ack 2641 win 8576 (DF)
16:27:22.571924 > capc25.muc.799 > caes04.muc.nfs: . 511428:512888(1460) ack 2641 win 8576 (DF)
16:27:22.571933 > capc25.muc.799 > caes04.muc.nfs: . 512888:514348(1460) ack 2805 win 8576 (DF)
16:27:22.571938 > capc25.muc.799 > caes04.muc.nfs: P 514348:514796(448) ack 2805 win 8576 (DF)
16:27:22.573924 < caes04.muc.nfs > capc25.muc.799: . 2805:2805(0) ack 495368 win 58832
16:27:22.574462 < caes04.muc.nfs > capc25.muc.799: P 2805:2969(164) ack 499748 win 54452
16:27:22.575192 < caes04.muc.nfs > capc25.muc.799: . 2969:2969(0) ack 514796 win 60032
16:27:22.575518 < caes04.muc.nfs > capc25.muc.799: P 2969:3133(164) ack 514796 win 60032
16:27:22.608665 > capc25.muc.799 > caes04.muc.nfs: . 514796:514796(0) ack 3133 win 8576 (DF)
16:27:26.248872 > capc25.muc.799 > caes04.muc.nfs: P 514796:514936(140) ack 3133 win 8576 (DF)
16:27:26.268210 < caes04.muc.nfs > capc25.muc.799: . 3133:3133(0) ack 514936 win 60032
16:27:26.282829 < caes04.muc.nfs > capc25.muc.799: P 3133:3289(156) ack 514936 win 60032
16:27:26.282843 > capc25.muc.799 > caes04.muc.nfs: . 514936:514936(0) ack 3289 win 8576 (DF)
from now on we get lot's of these:
16:27:26.489024 > capc25.muc.576126976 > caes04.muc.nfs: 40 null (DF)
16:27:26.489647 < caes04.muc.nfs > capc25.muc.576126976: reply ok 24 null
The cp command on the Linux nfs3-client side hangs and cannot be
killed. We get:
dmesg: nfs: server caes04 not responding, still trying
then after a while:
dmesg: nfs: server caes04 OK
qx09820@capc25 /home/qx09820 > netstat | grep caes04
tcp 0 0 capc25.muc:798 caes04.muc:nfs ESTABLISHED
The cp seems to be finished and the game can start again. :-(
The interesting bit seems to be at the end, where `cp` on the Linux
nfs3 client hangs. The cause of the problem might just as well be on
the AIX nfs3 server as on the client. The tcpdump on the server is
basically identical -- now that option -s 64000 has been used.
Files of size 500kB, 1MB and 2MB have been used. Sometimes the whole file
is copied before (after/while?) cp locks up, sometimes just about 800k
are copied.
We also got these messages after the cp died and the nfs: server
caes04 OK message appeared. The line with <mss 1460,sackOK,timestamp
2101448 0,nop,wscale 0> looks weird. Selective acknowledgement? Is
this normal?
16:34:56.487168 > capc25.muc.1649999872 > caes04.muc.nfs: 40 null (DF)
16:34:56.493170 < caes04.muc.nfs > capc25.muc.1649999872: reply ok 24 null
16:35:26.488376 > capc25.muc.2186870784 > caes04.muc.nfs: 40 null (DF)
16:35:26.492354 < caes04.muc.nfs > capc25.muc.2186870784: reply ok 24 null
16:35:56.489587 > capc25.muc.2723741696 > caes04.muc.nfs: 40 null (DF)
16:35:56.491055 < caes04.muc.nfs > capc25.muc.2723741696: reply ok 24 null
16:36:26.490783 > capc25.muc.3260612608 > caes04.muc.nfs: 40 null (DF)
16:36:26.505492 < caes04.muc.nfs > capc25.muc.3260612608: reply ok 24 null
16:36:56.491996 > capc25.muc.3797483520 > caes04.muc.nfs: 40 null (DF)
16:36:56.494516 < caes04.muc.nfs > capc25.muc.3797483520: reply ok 24 null
16:37:26.493194 > capc25.muc.39452672 > caes04.muc.nfs: 40 null (DF)
16:37:26.493631 < caes04.muc.nfs > capc25.muc.39452672: reply ok 24 null
16:37:56.484418 > capc25.muc.576323584 > caes04.muc.nfs: 40 null (DF)
16:37:56.485075 < caes04.muc.nfs > capc25.muc.576323584: reply ok 24 null
16:38:26.485620 > capc25.muc.1113194496 > caes04.muc.nfs: 40 null (DF)
16:38:26.486086 < caes04.muc.nfs > capc25.muc.1113194496: reply ok 24 null
16:38:56.486833 > capc25.muc.1650065408 > caes04.muc.nfs: 40 null (DF)
16:38:56.487256 < caes04.muc.nfs > capc25.muc.1650065408: reply ok 24 null
16:39:26.488042 > capc25.muc.2186936320 > caes04.muc.nfs: 40 null (DF)
16:39:26.488555 < caes04.muc.nfs > capc25.muc.2186936320: reply ok 24 null
16:39:56.489253 > capc25.muc.2723807232 > caes04.muc.nfs: 40 null (DF)
16:39:56.489655 < caes04.muc.nfs > capc25.muc.2723807232: reply ok 24 null
16:40:26.490458 > capc25.muc.3260678144 > caes04.muc.nfs: 40 null (DF)
16:40:26.490858 < caes04.muc.nfs > capc25.muc.3260678144: reply ok 24 null
16:40:56.491656 > capc25.muc.3797549056 > caes04.muc.nfs: 40 null (DF)
16:40:56.492277 < caes04.muc.nfs > capc25.muc.3797549056: reply ok 24 null
16:41:04.171081 < caes04.muc.nfs > capc25.muc.798: F 2278783378:2278783378(0) ack 3544800811 win 60032
16:41:04.171126 > capc25.muc.798 > caes04.muc.nfs: F 1:1(0) ack 1 win 9408 (DF)
16:41:04.171307 > capc25.muc.799 > caes04.muc.nfs: S 4020491914:4020491914(0) win 5840 <mss 1460,sackOK,timestamp 2101448 0,nop,wscale 0> (DF)
16:41:04.172213 < caes04.muc.nfs > capc25.muc.798: . 1:1(0) ack 2 win 60032
16:41:04.172379 < caes04.muc.nfs > capc25.muc.799: S 3316289216:3316289216(0) ack 4020491915 win 59860 <mss 1460>
16:41:04.172394 > capc25.muc.799 > caes04.muc.nfs: . 1:1(0) ack 1 win 5840 (DF)
16:41:04.172426 > capc25.muc.799 > caes04.muc.nfs: . 1:1461(1460) ack 1 win 5840 (DF)
16:41:04.172437 > capc25.muc.799 > caes04.muc.nfs: . 1461:2921(1460) ack 1 win 5840 (DF)
16:41:04.193288 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 2921 win 57112
16:41:04.193322 > capc25.muc.799 > caes04.muc.nfs: . 2921:4381(1460) ack 1 win 5840 (DF)
16:41:04.193330 > capc25.muc.799 > caes04.muc.nfs: . 4381:5841(1460) ack 1 win 5840 (DF)
16:41:04.193339 > capc25.muc.799 > caes04.muc.nfs: . 5841:7301(1460) ack 1 win 5840 (DF)
16:41:04.403451 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 7301 win 52732
16:41:04.403467 > capc25.muc.799 > caes04.muc.nfs: . 7301:8761(1460) ack 1 win 5840 (DF)
16:41:04.403477 > capc25.muc.799 > caes04.muc.nfs: . 8761:10221(1460) ack 1 win 5840 (DF)
16:41:04.403486 > capc25.muc.799 > caes04.muc.nfs: . 10221:11681(1460) ack 1 win 5840 (DF)
16:41:04.403496 > capc25.muc.799 > caes04.muc.nfs: P 11681:13141(1460) ack 1 win 5840 (DF)
16:41:04.613256 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 13141 win 46892
16:41:04.613294 > capc25.muc.799 > caes04.muc.nfs: . 13141:14601(1460) ack 1 win 5840 (DF)
16:41:04.613305 > capc25.muc.799 > caes04.muc.nfs: P 14601:16061(1460) ack 1 win 5840 (DF)
16:41:04.613315 > capc25.muc.799 > caes04.muc.nfs: . 16061:17521(1460) ack 1 win 5840 (DF)
16:41:04.613324 > capc25.muc.799 > caes04.muc.nfs: . 17521:18981(1460) ack 1 win 5840 (DF)
16:41:04.613334 > capc25.muc.799 > caes04.muc.nfs: P 18981:20441(1460) ack 1 win 5840 (DF)
16:41:04.815623 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 20441 win 39592
16:41:04.815641 > capc25.muc.799 > caes04.muc.nfs: . 20441:21901(1460) ack 1 win 5840 (DF)
16:41:04.815651 > capc25.muc.799 > caes04.muc.nfs: . 21901:23361(1460) ack 1 win 5840 (DF)
16:41:04.815660 > capc25.muc.799 > caes04.muc.nfs: . 23361:24821(1460) ack 1 win 5840 (DF)
16:41:04.815670 > capc25.muc.799 > caes04.muc.nfs: P 24821:26281(1460) ack 1 win 5840 (DF)
16:41:04.815693 > capc25.muc.799 > caes04.muc.nfs: . 26281:27741(1460) ack 1 win 5840 (DF)
16:41:04.815704 > capc25.muc.799 > caes04.muc.nfs: . 27741:29201(1460) ack 1 win 5840 (DF)
16:41:05.023080 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 29201 win 30832
16:41:05.023117 > capc25.muc.799 > caes04.muc.nfs: . 29201:30661(1460) ack 1 win 5840 (DF)
16:41:05.023127 > capc25.muc.799 > caes04.muc.nfs: . 30661:32121(1460) ack 1 win 5840 (DF)
16:41:05.023137 > capc25.muc.799 > caes04.muc.nfs: P 32121:32917(796) ack 1 win 5840 (DF)
16:41:05.024644 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 32917 win 60032
16:41:05.025192 < caes04.muc.nfs > capc25.muc.799: P 1:165(164) ack 32917 win 60032
16:41:05.025203 > capc25.muc.799 > caes04.muc.nfs: . 32917:32917(0) ack 165 win 5840 (DF)
16:41:05.025272 > capc25.muc.799 > caes04.muc.nfs: P 32917:33057(140) ack 165 win 5840 (DF)
16:41:05.048922 < caes04.muc.nfs > capc25.muc.799: P 165:321(156) ack 33057 win 60032
16:41:05.081819 > capc25.muc.799 > caes04.muc.nfs: . 33057:33057(0) ack 321 win 5840 (DF)
16:41:26.492871 > capc25.muc.39518208 > caes04.muc.nfs: 40 null (DF)
16:41:26.495098 < caes04.muc.nfs > capc25.muc.39518208: reply ok 24 null
16:41:56.494082 > capc25.muc.576389120 > caes04.muc.nfs: 40 null (DF)
16:41:56.500261 < caes04.muc.nfs > capc25.muc.576389120: reply ok 24 null
16:42:26.485359 > capc25.muc.1113260032 > caes04.muc.nfs: 40 null (DF)
16:42:26.487429 < caes04.muc.nfs > capc25.muc.1113260032: reply ok 24 null
16:42:56.486530 > capc25.muc.1650130944 > caes04.muc.nfs: 40 null (DF)
16:42:56.487097 < caes04.muc.nfs > capc25.muc.1650130944: reply ok 24 null
16:43:26.487748 > capc25.muc.2187001856 > caes04.muc.nfs: 40 null (DF)
16:43:26.489881 < caes04.muc.nfs > capc25.muc.2187001856: reply ok 24 null
16:43:34.489177 > capc25.muc.799 > caes04.muc.nfs: F 33057:33057(0) ack 321 win 5840 (DF)
16:43:34.489777 < caes04.muc.nfs > capc25.muc.799: . 321:321(0) ack 33058 win 60032
16:43:34.490370 < caes04.muc.nfs > capc25.muc.799: F 321:321(0) ack 33058 win 60032
16:43:34.490391 > capc25.muc.799 > caes04.muc.nfs: . 33058:33058(0) ack 322 win 5840 (DF)
16:43:34.494041 > capc25.muc.1022 > caes04.muc.32859: udp 80 (DF)
16:43:34.494079 > capc25.muc.1022 > caes04.muc.sunrpc: udp 84 (DF)
16:43:34.494091 > capc25.muc.1022 > caes04.muc.32859: udp 80 (DF)
16:43:34.496654 < caes04.muc.sunrpc > capc25.muc.1022: udp 28
16:43:34.507767 < caes04.muc.32859 > capc25.muc.1022: udp 72
16:43:34.507939 < caes04.muc.32859 > capc25.muc.1022: udp 24
16:43:56.488965 > capc25.muc.2723872768 > caes04.muc.nfs: 40 null (DF)
16:43:56.489405 < caes04.muc.nfs > capc25.muc.2723872768: reply ok 24 null
16:44:26.490297 > capc25.muc.3260743680 > caes04.muc.nfs: 40 null (DF)
16:44:26.490754 < caes04.muc.nfs > capc25.muc.3260743680: reply ok 24 null
Slighlty clueless,
Birger
>>>>> " " == Birger Lammering <[email protected]> writes:
> 3133:3289(156) ack 514936 win 60032 16:27:26.282843 >
> capc25.muc.799 > caes04.muc.nfs: . 514936:514936(0) ack 3289
> win 8576 (DF)
> from now on we get lot's of these:
> 16:27:26.489024 > capc25.muc.576126976 > caes04.muc.nfs: 40
> null (DF) 16:27:26.489647 < caes04.muc.nfs >
> capc25.muc.576126976: reply ok 24 null
> The cp command on the Linux nfs3-client side hangs and cannot
> be killed. We get:
> dmesg: nfs: server caes04 not responding, still trying
> then after a while: dmesg: nfs: server caes04 OK
> qx09820@capc25 /home/qx09820 > netstat | grep caes04 tcp 0 0
> capc25.muc:798 caes04.muc:nfs ESTABLISHED
Ho hum... It looks to me as if the problem is that the Linux NFS
client is falling asleep before a write, and then not waking
up. That sort of points at the write_space() callback.
When the socket buffer is full, and we get an EAGAIN response to our
sendmsg() request, we normally put the request to sleep, block the
socket, and rely on write_space() to wake us up when there is enough
memory to proceed.
Assuming that this is the case, there are 2 possible causes:
1) A bug in the IPV4 TCP layer in which we don't call write_space()
despite having liberated enough memory to proceed.
2) I've misunderstood the IPV4 tcp api, and so the check for
sock_writeable() in net/sunrpc/xprt.c:tcp_write_space() is
incorrect.
Alexey: Do you have any comments? Is it correct to check for
sock_writeable() on a TCP socket?
Birger: could you try the following patch, that simply removes the
check for sock_writeable()?
Cheers,
Trond
--- linux-2.4.15-pre6/net/sunrpc/xprt.c.orig Mon Oct 8 21:36:07 2001
+++ linux-2.4.15-pre6/net/sunrpc/xprt.c Mon Nov 19 19:07:09 2001
@@ -1071,10 +1071,6 @@
if (xprt->shutdown)
return;
- /* Wait until we have enough socket memory */
- if (!sock_writeable(sk))
- return;
-
if (!xprt_test_and_set_wspace(xprt)) {
spin_lock(&xprt->sock_lock);
if (xprt->snd_task && xprt->snd_task->tk_rpcwait == &xprt->sending)
Hello!
> Alexey: Do you have any comments? Is it correct to check for
> sock_writeable() on a TCP socket?
No. sock_writable() is for datagram sockets, TCP never used or
satisfied this predicate, it used(s) more interesting one.
BTW applications need not use this anyway, we do not awake people
for no reasons. If a write failed with EAGAIN, wakeup will happen
only when there is some room for write. And it will not be awaken
again until the next write will fail. So, if you rejected wakeup
(due to wrong predicate), nobody will remind you again.
Alexey
>>>>> " " == kuznet <[email protected]> writes:
> No. sock_writable() is for datagram sockets, TCP never used or
> satisfied this predicate, it used(s) more interesting one.
> BTW applications need not use this anyway, we do not awake
> people for no reasons. If a write failed with EAGAIN, wakeup
> will happen only when there is some room for write. And it will
> not be awaken again until the next write will fail. So, if you
> rejected wakeup (due to wrong predicate), nobody will remind
> you again.
Thanks... Then the patch I sent Birger is very likely the correct one.
Originally I had a test for whether sock_wspace(sk) was greater than
some minimal value. We need this for UDP in order to avoid waking up
'rpciod' before the socket buffer is large enough to accommodate the
RPC datagram. As the same code worked in 2.2.x for TCP, I had assumed
it was OK...
Cheers,
Trond
Hello!
> Originally I had a test for whether sock_wspace(sk) was greater than
> some minimal value. We need this for UDP in order to avoid waking up
> 'rpciod' before the socket buffer is large enough to accommodate the
> RPC datagram.
I do no think this was right, to be honest. write_space with udp is
too hairy thing to use it correctly. :-) Anyway, it is enough to select
right sndbuf. Right is... well, default value is right. :-)
> As the same code worked in 2.2.x for TCP, I had assumed
> it was OK...
Most likely, it worked because 2.2 did not protect of overschedule
and waked up thread each time when some space was freed, so it was enough
that wakeup predicate used by tcp and by application had one common point:
wmem_alloc==0 and they always have it. 2.4 does not wake
up process, if it did not see full buffer after previous wakeup.
Alexey
>>>>> " " == kuznet <[email protected]> writes:
> I do no think this was right, to be honest. write_space with
> udp is too hairy thing to use it correctly. :-) Anyway, it is
> enough to select right sndbuf. Right is... well, default value
> is right. :-)
The sndbuf is by default 64k, so that should indeed suffice to fit all
the possible varieties of NFS datagram.
The problem is that when EAGAIN is returned by sendmsg, we want to put
the RPC request to sleep (and have rpciod deal with other pending
requests), and then reactivate it as soon as sock_wspace() reports
that the available free buffer space is large enough to fit the next
request.
Assuming that sock_wfree() always gets called whenever an skb is
released and that sock_wspace() does indeed reflect more or less the
maximum message size for which there is free buffer space (I allow a
couple of kilobytes for extra padding) then the current UDP code
should be correct and race-free.
Cheers,
Trond
Hello!
> The problem is that when EAGAIN is returned by sendmsg,
BTW this is already problem. UDP should not hit EAGAIN case, if the
predicate is right.
> requests), and then reactivate it as soon as sock_wspace() reports
> that the available free buffer space is large enough to fit the next
> request.
sock_wspace() is OK. sock_writable() is bad.
> released and that sock_wspace() does indeed reflect more or less the
> maximum message size for which there is free buffer space (I allow a
> couple of kilobytes for extra padding)
Do not economize. :-) The longer you wait the less you scheduled.
We used to wait for half of sndbuf to be freed.
> then the current UDP code
> should be correct and race-free.
BTW recently I was reported it deadlocks on some spinlock...
Alexey
>>>>> " " == kuznet <[email protected]> writes:
>> The problem is that when EAGAIN is returned by sendmsg,
> BTW this is already problem. UDP should not hit EAGAIN case, if
> the predicate is right.
You are saying that the it is impossible for sock_alloc_send_skb() to
fail when using non-blocking writes? It was certainly occuring in
2.2.x.
Don't forget that we can be trying to fire off 16 * 32k 'simultaneous'
requests down the same socket when NFS is doing asynchronous block
writes. (Note: by 'simultaneous' I mean that we don't wait for the
server to reply before firing off the next request)
>> then the current UDP code should be correct and race-free.
> BTW recently I was reported it deadlocks on some spinlock...
Ulrich Weigand reported the following problem on the S/390:
A QDIO networking driver bottom half was grabbing a private spinlock,
then calling dev_kfree_skb_any() which again calls write_space() (via
kfree_skb()) and so tries to take xprt->sock_lock.
At the same time, a QDIO hard interrupt could be trying to take the
same private spinlock on another processor. Since the RPC layer only
protects against bottom halves, and since the interrupted process
could already be holding the RPC lock that kfree_skb() tries to take,
the hard interrupt could deadlock.
I haven't done anything about this because IMHO it makes more sense to
have the QDIO driver drop their special spinlock when calling external
functions such as dev_kfree_skb_any() rather than to force the RPC
layer to use the spin_lock_irqsave().
Cheers,
Trond
>>>>> " " == Trond Myklebust <[email protected]> writes:
> I haven't done anything about this because IMHO it makes more
> sense to have the QDIO driver drop their special spinlock when
> calling external functions such as dev_kfree_skb_any() rather
> than to force the RPC layer to use the spin_lock_irqsave().
I forgot to add: The socket fasync lists use spinlocking in the same
was as RPC does, with sock_fasync() setting
write_lock_bh(&sk->callback_lock), and sock_def_write_space()
doing read_lock(&sk->callback_lock).
So that would deadlock with the QDIO driver in the exact same manner
as the RPC stuff (albeit probably a lot less frequently).
Cheers,
Trond
Hi Trond,
sorry to disappoint you: the patch didn't work for me :-(
I've commented out the two lines in net/sunrpc/xprt.c of 2.4.15pre5.
(Has there been a change in pre6 that might make a difference?)
The result (copying 500k twice from capc25 to caes04):
tcpdump on capc25 (Linux 2.4.15pre5 with sock_writeable patch)
09:59:08.243965 eth0 > capc25.muc.33882112 > caes04.muc.nfs: 40 null (DF)
09:59:08.244500 eth0 < caes04.muc.nfs > capc25.muc.33882112: reply ok 24 null
09:59:08.244533 eth0 > capc25.muc.1022 > caes04.muc.sunrpc: udp 84 (DF)
09:59:08.245271 eth0 < caes04.muc.sunrpc > capc25.muc.1022: udp 28
09:59:31.841577 eth0 > capc25.muc.798 > caes04.muc.nfs: P 1035886130:1035886266(136) ack 1814843210 win 7504 (DF)
09:59:31.843105 eth0 < caes04.muc.nfs > capc25.muc.798: P 1:245(244) ack 136 win 60032
09:59:31.843118 eth0 > capc25.muc.798 > caes04.muc.nfs: . 136:136(0) ack 245 win 8576 (DF)
09:59:31.843160 eth0 > capc25.muc.798 > caes04.muc.nfs: P 136:300(164) ack 245 win 8576 (DF)
09:59:31.850179 eth0 < caes04.muc.nfs > capc25.muc.798: P 245:393(148) ack 300 win 60032
09:59:31.850373 eth0 > capc25.muc.798 > caes04.muc.nfs: . 300:1760(1460) ack 393 win 8576 (DF)
09:59:31.850387 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1760:3220(1460) ack 393 win 8576 (DF)
09:59:31.864330 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 3220 win 57112
09:59:31.864360 eth0 > capc25.muc.798 > caes04.muc.nfs: . 3220:4680(1460) ack 393 win 8576 (DF)
09:59:31.864370 eth0 > capc25.muc.798 > caes04.muc.nfs: . 4680:6140(1460) ack 393 win 8576 (DF)
09:59:31.864379 eth0 > capc25.muc.798 > caes04.muc.nfs: . 6140:7600(1460) ack 393 win 8576 (DF)
09:59:32.064719 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 7600 win 52732
09:59:32.064732 eth0 > capc25.muc.798 > caes04.muc.nfs: . 7600:9060(1460) ack 393 win 8576 (DF)
09:59:32.064741 eth0 > capc25.muc.798 > caes04.muc.nfs: . 9060:10520(1460) ack 393 win 8576 (DF)
09:59:32.064750 eth0 > capc25.muc.798 > caes04.muc.nfs: . 10520:11980(1460) ack 393 win 8576 (DF)
09:59:32.064760 eth0 > capc25.muc.798 > caes04.muc.nfs: P 11980:13440(1460) ack 393 win 8576 (DF)
09:59:32.265178 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 13440 win 46892
09:59:32.265205 eth0 > capc25.muc.798 > caes04.muc.nfs: . 13440:14900(1460) ack 393 win 8576 (DF)
09:59:32.265215 eth0 > capc25.muc.798 > caes04.muc.nfs: P 14900:16360(1460) ack 393 win 8576 (DF)
09:59:32.265225 eth0 > capc25.muc.798 > caes04.muc.nfs: . 16360:17820(1460) ack 393 win 8576 (DF)
09:59:32.265234 eth0 > capc25.muc.798 > caes04.muc.nfs: . 17820:19280(1460) ack 393 win 8576 (DF)
09:59:32.265244 eth0 > capc25.muc.798 > caes04.muc.nfs: P 19280:20740(1460) ack 393 win 8576 (DF)
09:59:32.465675 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 20740 win 39592
09:59:32.465686 eth0 > capc25.muc.798 > caes04.muc.nfs: . 20740:22200(1460) ack 393 win 8576 (DF)
09:59:32.465693 eth0 > capc25.muc.798 > caes04.muc.nfs: . 22200:23660(1460) ack 393 win 8576 (DF)
09:59:32.465702 eth0 > capc25.muc.798 > caes04.muc.nfs: . 23660:25120(1460) ack 393 win 8576 (DF)
09:59:32.465710 eth0 > capc25.muc.798 > caes04.muc.nfs: P 25120:26580(1460) ack 393 win 8576 (DF)
09:59:32.465725 eth0 > capc25.muc.798 > caes04.muc.nfs: . 26580:28040(1460) ack 393 win 8576 (DF)
09:59:32.465736 eth0 > capc25.muc.798 > caes04.muc.nfs: . 28040:29500(1460) ack 393 win 8576 (DF)
09:59:32.667194 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 29500 win 30832
09:59:32.667209 eth0 > capc25.muc.798 > caes04.muc.nfs: . 29500:30960(1460) ack 393 win 8576 (DF)
09:59:32.667218 eth0 > capc25.muc.798 > caes04.muc.nfs: . 30960:32420(1460) ack 393 win 8576 (DF)
09:59:32.667227 eth0 > capc25.muc.798 > caes04.muc.nfs: . 32420:33880(1460) ack 393 win 8576 (DF)
09:59:32.667236 eth0 > capc25.muc.798 > caes04.muc.nfs: . 33880:35340(1460) ack 393 win 8576 (DF)
09:59:32.667248 eth0 > capc25.muc.798 > caes04.muc.nfs: . 35340:36800(1460) ack 393 win 8576 (DF)
09:59:32.667257 eth0 > capc25.muc.798 > caes04.muc.nfs: P 36800:38260(1460) ack 393 win 8576 (DF)
09:59:32.667277 eth0 > capc25.muc.798 > caes04.muc.nfs: . 38260:39720(1460) ack 393 win 8576 (DF)
09:59:32.668855 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 33880 win 59368
09:59:32.668864 eth0 > capc25.muc.798 > caes04.muc.nfs: . 39720:41180(1460) ack 393 win 8576 (DF)
09:59:32.668872 eth0 > capc25.muc.798 > caes04.muc.nfs: . 41180:42640(1460) ack 393 win 8576 (DF)
09:59:32.668880 eth0 > capc25.muc.798 > caes04.muc.nfs: . 42640:44100(1460) ack 393 win 8576 (DF)
09:59:32.668888 eth0 > capc25.muc.798 > caes04.muc.nfs: . 44100:45560(1460) ack 393 win 8576 (DF)
09:59:32.669337 eth0 < caes04.muc.nfs > capc25.muc.798: P 393:557(164) ack 36800 win 56448
09:59:32.669350 eth0 > capc25.muc.798 > caes04.muc.nfs: . 45560:47020(1460) ack 557 win 8576 (DF)
09:59:32.669357 eth0 > capc25.muc.798 > caes04.muc.nfs: . 47020:48480(1460) ack 557 win 8576 (DF)
09:59:32.669365 eth0 > capc25.muc.798 > caes04.muc.nfs: P 48480:49940(1460) ack 557 win 8576 (DF)
09:59:32.867633 eth0 < caes04.muc.nfs > capc25.muc.798: . 557:557(0) ack 49940 win 43308
09:59:32.867668 eth0 > capc25.muc.798 > caes04.muc.nfs: . 49940:51400(1460) ack 557 win 8576 (DF)
09:59:32.867678 eth0 > capc25.muc.798 > caes04.muc.nfs: . 51400:52860(1460) ack 557 win 8576 (DF)
09:59:32.867687 eth0 > capc25.muc.798 > caes04.muc.nfs: P 52860:54320(1460) ack 557 win 8576 (DF)
09:59:32.867696 eth0 > capc25.muc.798 > caes04.muc.nfs: . 54320:55780(1460) ack 557 win 8576 (DF)
09:59:32.867707 eth0 > capc25.muc.798 > caes04.muc.nfs: P 55780:57240(1460) ack 557 win 8576 (DF)
09:59:32.867733 eth0 > capc25.muc.798 > caes04.muc.nfs: . 57240:58700(1460) ack 557 win 8576 (DF)
09:59:32.867745 eth0 > capc25.muc.798 > caes04.muc.nfs: . 58700:60160(1460) ack 557 win 8576 (DF)
09:59:32.867758 eth0 > capc25.muc.798 > caes04.muc.nfs: . 60160:61620(1460) ack 557 win 8576 (DF)
09:59:32.867772 eth0 > capc25.muc.798 > caes04.muc.nfs: . 61620:63080(1460) ack 557 win 8576 (DF)
09:59:32.867781 eth0 > capc25.muc.798 > caes04.muc.nfs: . 63080:64540(1460) ack 557 win 8576 (DF)
.. and so on..
09:59:33.507567 eth0 < caes04.muc.nfs > capc25.muc.798: . 2525:2525(0) ack 462188 win 58968
09:59:33.507587 eth0 > capc25.muc.798 > caes04.muc.nfs: . 478248:479708(1460) ack 2525 win 8576 (DF)
09:59:33.507595 eth0 > capc25.muc.798 > caes04.muc.nfs: . 479708:481168(1460) ack 2525 win 8576 (DF)
09:59:33.507604 eth0 > capc25.muc.798 > caes04.muc.nfs: . 481168:482628(1460) ack 2525 win 8576 (DF)
09:59:33.507613 eth0 > capc25.muc.798 > caes04.muc.nfs: . 482628:484088(1460) ack 2525 win 8576 (DF)
09:59:33.507621 eth0 > capc25.muc.798 > caes04.muc.nfs: . 484088:485548(1460) ack 2525 win 8576 (DF)
09:59:33.507632 eth0 > capc25.muc.798 > caes04.muc.nfs: . 485548:487008(1460) ack 2525 win 8576 (DF)
09:59:33.507644 eth0 > capc25.muc.798 > caes04.muc.nfs: . 487008:488468(1460) ack 2525 win 8576 (DF)
09:59:33.507654 eth0 > capc25.muc.798 > caes04.muc.nfs: . 488468:489928(1460) ack 2525 win 8576 (DF)
09:59:33.507661 eth0 > capc25.muc.798 > caes04.muc.nfs: P 489928:491388(1460) ack 2525 win 8576 (DF)
09:59:33.507672 eth0 > capc25.muc.798 > caes04.muc.nfs: . 491388:492848(1460) ack 2525 win 8576 (DF)
09:59:33.507682 eth0 > capc25.muc.798 > caes04.muc.nfs: P 492848:494040(1192) ack 2525 win 8576 (DF)
09:59:33.507863 eth0 < caes04.muc.nfs > capc25.muc.798: P 2525:2689(164) ack 466568 win 54588
09:59:33.510198 eth0 < caes04.muc.nfs > capc25.muc.798: . 2689:2689(0) ack 494040 win 60032
09:59:33.510770 eth0 < caes04.muc.nfs > capc25.muc.798: P 2689:2853(164) ack 494040 win 60032
09:59:33.510827 eth0 > capc25.muc.798 > caes04.muc.nfs: . 494040:495500(1460) ack 2853 win 8576 (DF)
09:59:33.510838 eth0 > capc25.muc.798 > caes04.muc.nfs: . 495500:496960(1460) ack 2853 win 8576 (DF)
09:59:33.510850 eth0 > capc25.muc.798 > caes04.muc.nfs: . 496960:498420(1460) ack 2853 win 8576 (DF)
09:59:33.510860 eth0 > capc25.muc.798 > caes04.muc.nfs: . 498420:499880(1460) ack 2853 win 8576 (DF)
09:59:33.510875 eth0 > capc25.muc.798 > caes04.muc.nfs: . 499880:501340(1460) ack 2853 win 8576 (DF)
09:59:33.510886 eth0 > capc25.muc.798 > caes04.muc.nfs: . 501340:502800(1460) ack 2853 win 8576 (DF)
09:59:33.510895 eth0 > capc25.muc.798 > caes04.muc.nfs: . 502800:504260(1460) ack 2853 win 8576 (DF)
09:59:33.510905 eth0 > capc25.muc.798 > caes04.muc.nfs: . 504260:505720(1460) ack 2853 win 8576 (DF)
09:59:33.510916 eth0 > capc25.muc.798 > caes04.muc.nfs: . 505720:507180(1460) ack 2853 win 8576 (DF)
09:59:33.510925 eth0 > capc25.muc.798 > caes04.muc.nfs: . 507180:508640(1460) ack 2853 win 8576 (DF)
09:59:33.510935 eth0 > capc25.muc.798 > caes04.muc.nfs: . 508640:510100(1460) ack 2853 win 8576 (DF)
09:59:33.510946 eth0 > capc25.muc.798 > caes04.muc.nfs: . 510100:511560(1460) ack 2853 win 8576 (DF)
09:59:33.510955 eth0 > capc25.muc.798 > caes04.muc.nfs: . 511560:513020(1460) ack 2853 win 8576 (DF)
09:59:33.510965 eth0 > capc25.muc.798 > caes04.muc.nfs: . 513020:514480(1460) ack 2853 win 8576 (DF)
09:59:33.510977 eth0 > capc25.muc.798 > caes04.muc.nfs: P 514480:514668(188) ack 2853 win 8576 (DF)
09:59:33.513913 eth0 < caes04.muc.nfs > capc25.muc.798: . 2853:2853(0) ack 514668 win 60032
09:59:33.514296 eth0 < caes04.muc.nfs > capc25.muc.798: P 2853:3017(164) ack 514668 win 60032
09:59:33.514361 eth0 > capc25.muc.798 > caes04.muc.nfs: P 514668:514808(140) ack 3017 win 8576 (DF)
09:59:33.628254 eth0 < caes04.muc.nfs > capc25.muc.798: P 3017:3173(156) ack 514808 win 60032
09:59:33.665083 eth0 > capc25.muc.798 > caes04.muc.nfs: . 514808:514808(0) ack 3173 win 8576 (DF)
09:59:38.625495 eth0 > capc25.muc.839188480 > caes04.muc.nfs: 40 null (DF)
09:59:38.625997 eth0 < caes04.muc.nfs > capc25.muc.839188480: reply ok 24 null
09:59:49.583361 eth0 > capc25.muc.798 > caes04.muc.nfs: P 514808:514936(128) ack 3173 win 8576 (DF)
09:59:49.584108 eth0 < caes04.muc.nfs > capc25.muc.798: P 3173:3289(116) ack 514936 win 60032
09:59:49.584123 eth0 > capc25.muc.798 > caes04.muc.nfs: . 514936:514936(0) ack 3289 win 8576 (DF)
09:59:49.586188 eth0 > capc25.muc.798 > caes04.muc.nfs: P 514936:515072(136) ack 3289 win 8576 (DF)
09:59:49.586917 eth0 < caes04.muc.nfs > capc25.muc.798: P 3289:3533(244) ack 515072 win 60032
09:59:49.625793 eth0 > capc25.muc.798 > caes04.muc.nfs: . 515072:515072(0) ack 3533 win 8576 (DF)
09:59:52.567700 eth0 > capc25.muc.798 > caes04.muc.nfs: P 515072:515208(136) ack 3533 win 8576 (DF)
09:59:52.568477 eth0 < caes04.muc.nfs > capc25.muc.798: P 3533:3777(244) ack 515208 win 60032
09:59:52.568491 eth0 > capc25.muc.798 > caes04.muc.nfs: . 515208:515208(0) ack 3777 win 8576 (DF)
09:59:52.568530 eth0 > capc25.muc.798 > caes04.muc.nfs: P 515208:515372(164) ack 3777 win 8576 (DF)
09:59:52.584532 eth0 < caes04.muc.nfs > capc25.muc.798: P 3777:3925(148) ack 515372 win 60032
09:59:52.584767 eth0 > capc25.muc.798 > caes04.muc.nfs: . 515372:516832(1460) ack 3925 win 8576 (DF)
09:59:52.584781 eth0 > capc25.muc.798 > caes04.muc.nfs: . 516832:518292(1460) ack 3925 win 8576 (DF)
09:59:52.745784 eth0 < caes04.muc.nfs > capc25.muc.798: . 3925:3925(0) ack 518292 win 57112
09:59:52.745809 eth0 > capc25.muc.798 > caes04.muc.nfs: . 518292:519752(1460) ack 3925 win 8576 (DF)
09:59:52.745820 eth0 > capc25.muc.798 > caes04.muc.nfs: . 519752:521212(1460) ack 3925 win 8576 (DF)
09:59:52.745829 eth0 > capc25.muc.798 > caes04.muc.nfs: . 521212:522672(1460) ack 3925 win 8576 (DF)
09:59:52.946195 eth0 < caes04.muc.nfs > capc25.muc.798: . 3925:3925(0) ack 522672 win 52732
09:59:52.946209 eth0 > capc25.muc.798 > caes04.muc.nfs: . 522672:524132(1460) ack 3925 win 8576 (DF)
09:59:52.946219 eth0 > capc25.muc.798 > caes04.muc.nfs: . 524132:525592(1460) ack 3925 win 8576 (DF)
09:59:52.946228 eth0 > capc25.muc.798 > caes04.muc.nfs: . 525592:527052(1460) ack 3925 win 8576 (DF)
09:59:52.946237 eth0 > capc25.muc.798 > caes04.muc.nfs: . 527052:528512(1460) ack 3925 win 8576 (DF)
09:59:53.146698 eth0 < caes04.muc.nfs > capc25.muc.798: . 3925:3925(0) ack 528512 win 46892
09:59:53.146726 eth0 > capc25.muc.798 > caes04.muc.nfs: . 528512:529972(1460) ack 3925 win 8576 (DF)
09:59:53.146736 eth0 > capc25.muc.798 > caes04.muc.nfs: . 529972:531432(1460) ack 3925 win 8576 (DF)
09:59:53.146746 eth0 > capc25.muc.798 > caes04.muc.nfs: . 531432:532892(1460) ack 3925 win 8576 (DF)
09:59:53.146755 eth0 > capc25.muc.798 > caes04.muc.nfs: . 532892:534352(1460) ack 3925 win 8576 (DF)
09:59:53.146763 eth0 > capc25.muc.798 > caes04.muc.nfs: . 534352:535812(1460) ack 3925 win 8576 (DF)
09:59:53.346907 eth0 < caes04.muc.nfs > capc25.muc.798: . 3925:3925(0) ack 535812 win 39592
.. and so on...
09:59:54.380949 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1010312:1011772(1460) ack 5893 win 8576 (DF)
09:59:54.380956 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1011772:1013232(1460) ack 5893 win 8576 (DF)
09:59:54.380964 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1013232:1014692(1460) ack 5893 win 8576 (DF)
09:59:54.380971 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1014692:1016152(1460) ack 5893 win 8576 (DF)
09:59:54.380978 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1016152:1017612(1460) ack 5893 win 8576 (DF)
09:59:54.382590 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1017612:1019072(1460) ack 5893 win 8576 (DF)
09:59:54.382598 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1019072:1020532(1460) ack 5893 win 8576 (DF)
09:59:54.382603 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1020532:1021992(1460) ack 5893 win 8576 (DF)
09:59:54.382608 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1021992:1023452(1460) ack 5893 win 8576 (DF)
09:59:54.382617 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1023452:1024912(1460) ack 5893 win 8576 (DF)
09:59:54.382623 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1024912:1026372(1460) ack 5893 win 8576 (DF)
09:59:54.382627 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1026372:1027832(1460) ack 5893 win 8576 (DF)
09:59:54.384353 eth0 < caes04.muc.nfs > capc25.muc.798: . 5893:5893(0) ack 1010312 win 58832
09:59:54.384375 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1027832:1029292(1460) ack 5893 win 8576 (DF)
09:59:54.384383 eth0 > capc25.muc.798 > caes04.muc.nfs: P 1029292:1029740(448) ack 5893 win 8576 (DF)
09:59:54.385187 eth0 < caes04.muc.nfs > capc25.muc.798: P 5893:6057(164) ack 1014692 win 54452
09:59:54.385216 eth0 < caes04.muc.nfs > capc25.muc.798: P 6057:6221(164) ack 1017612 win 51532
09:59:54.385237 eth0 < caes04.muc.nfs > capc25.muc.798: P 6221:6385(164) ack 1021992 win 47152
09:59:54.385830 eth0 < caes04.muc.nfs > capc25.muc.798: . 6385:6385(0) ack 1029740 win 60032
09:59:54.386145 eth0 < caes04.muc.nfs > capc25.muc.798: P 6385:6549(164) ack 1029740 win 60032
09:59:54.386153 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1029740:1029740(0) ack 6549 win 8576 (DF)
09:59:58.976285 eth0 > capc25.muc.798 > caes04.muc.nfs: P 1029740:1029880(140) ack 6549 win 8576 (DF)
09:59:59.029839 eth0 < caes04.muc.nfs > capc25.muc.798: P 6549:6705(156) ack 1029880 win 60032
09:59:59.066239 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1029880:1029880(0) ack 6705 win 8576 (DF)
10:00:08.566727 eth0 > capc25.muc.1644494848 > caes04.muc.nfs: 40 null (DF)
10:00:08.567211 eth0 < caes04.muc.nfs > capc25.muc.1644494848: reply ok 24 null
10:00:38.018114 eth0 > capc25.muc.2449801216 > caes04.muc.nfs: 40 null (DF)
10:00:38.018764 eth0 < caes04.muc.nfs > capc25.muc.2449801216: reply ok 24 null
tcpdump on caes04 (AIX)
09:59:31.845604763 capc25.muc.798 > caes04.muc.shilp: P 1035886130:1035886266(136) ack 1814843210 win 7504 (DF)
09:59:31.845756217 caes04.muc.shilp > capc25.muc.798: P 1:245(244) ack 136 win 60032
09:59:31.846304152 capc25.muc.798 > caes04.muc.shilp: . ack 245 win 8576 (DF)
09:59:31.846367867 capc25.muc.798 > caes04.muc.shilp: P 136:300(164) ack 245 win 8576 (DF)
09:59:31.852903330 caes04.muc.shilp > capc25.muc.798: P 245:393(148) ack 300 win 60032
09:59:31.854441517 capc25.muc.798 > caes04.muc.shilp: . 300:1760(1460) ack 393 win 8576 (DF)
09:59:31.854577748 capc25.muc.798 > caes04.muc.shilp: . 1035887890:1035889350(1460) ack 1814843602 win 8576 (DF)
09:59:31.867147719 caes04.muc.shilp > capc25.muc.798: . ack 1460 win 57112
09:59:31.868416956 capc25.muc.798 > caes04.muc.shilp: . 1460:2920(1460) ack 1 win 8576 (DF)
09:59:31.868541353 capc25.muc.798 > caes04.muc.shilp: . 2920:4380(1460) ack 1 win 8576 (DF)
09:59:31.868699607 capc25.muc.798 > caes04.muc.shilp: . 4380:5840(1460) ack 1 win 8576 (DF)
09:59:32.067542678 caes04.muc.shilp > capc25.muc.798: . ack 5840 win 52732
09:59:32.068873485 capc25.muc.798 > caes04.muc.shilp: . 5840:7300(1460) ack 1 win 8576 (DF)
09:59:32.068995737 capc25.muc.798 > caes04.muc.shilp: . 7300:8760(1460) ack 1 win 8576 (DF)
09:59:32.069121401 capc25.muc.798 > caes04.muc.shilp: . 8760:10220(1460) ack 1 win 8576 (DF)
09:59:32.069246176 capc25.muc.798 > caes04.muc.shilp: P 10220:11680(1460) ack 1 win 8576 (DF)
09:59:32.268000574 caes04.muc.shilp > capc25.muc.798: . ack 11680 win 46892
09:59:32.269330237 capc25.muc.798 > caes04.muc.shilp: . 11680:13140(1460) ack 1 win 8576 (DF)
09:59:32.269447200 capc25.muc.798 > caes04.muc.shilp: P 13140:14600(1460) ack 1 win 8576 (DF)
09:59:32.269575574 capc25.muc.798 > caes04.muc.shilp: . 14600:16060(1460) ack 1 win 8576 (DF)
09:59:32.269700305 capc25.muc.798 > caes04.muc.shilp: . 16060:17520(1460) ack 1 win 8576 (DF)
09:59:32.269824713 capc25.muc.798 > caes04.muc.shilp: P 17520:18980(1460) ack 1 win 8576 (DF)
09:59:32.468507295 caes04.muc.shilp > capc25.muc.798: . ack 18980 win 39592
09:59:32.469794655 capc25.muc.798 > caes04.muc.shilp: . 18980:20440(1460) ack 1 win 8576 (DF)
09:59:32.469949532 capc25.muc.798 > caes04.muc.shilp: . 20440:21900(1460) ack 1 win 8576 (DF)
09:59:32.470074273 capc25.muc.798 > caes04.muc.shilp: . 21900:23360(1460) ack 1 win 8576 (DF)
09:59:32.470197836 capc25.muc.798 > caes04.muc.shilp: P 23360:24820(1460) ack 1 win 8576 (DF)
09:59:32.470324178 capc25.muc.798 > caes04.muc.shilp: . 24820:26280(1460) ack 1 win 8576 (DF)
09:59:32.470450442 capc25.muc.798 > caes04.muc.shilp: . 26280:27740(1460) ack 1 win 8576 (DF)
09:59:32.669979456 caes04.muc.shilp > capc25.muc.798: . ack 27740 win 30832
09:59:32.671290251 capc25.muc.798 > caes04.muc.shilp: . 27740:29200(1460) ack 1 win 8576 (DF)
09:59:32.671411003 capc25.muc.798 > caes04.muc.shilp: . 29200:30660(1460) ack 1 win 8576 (DF)
09:59:32.671536311 capc25.muc.798 > caes04.muc.shilp: . 30660:32120(1460) ack 1 win 8576 (DF)
09:59:32.671604804 caes04.muc.shilp > capc25.muc.798: . ack 32120 win 59368
09:59:32.671661586 capc25.muc.798 > caes04.muc.shilp: . 32120:33580(1460) ack 1 win 8576 (DF)
09:59:32.671888756 capc25.muc.798 > caes04.muc.shilp: . 33580:35040(1460) ack 1 win 8576 (DF)
09:59:32.672038243 caes04.muc.shilp > capc25.muc.798: P 1:165(164) ack 35040 win 56448
09:59:32.672178508 capc25.muc.798 > caes04.muc.shilp: . 36500:37960(1460) ack 1 win 8576 (DF)
09:59:32.672918233 capc25.muc.798 > caes04.muc.shilp: . 37960:39420(1460) ack 1 win 8576 (DF)
.. and so on...
09:59:33.515030950 capc25.muc.798 > caes04.muc.shilp: . 493740:495200(1460) ack 2461 win 8576 (DF)
09:59:33.515177881 capc25.muc.798 > caes04.muc.shilp: . 495200:496660(1460) ack 2461 win 8576 (DF)
09:59:33.515304223 capc25.muc.798 > caes04.muc.shilp: . 496660:498120(1460) ack 2461 win 8576 (DF)
09:59:33.515429442 capc25.muc.798 > caes04.muc.shilp: . 498120:499580(1460) ack 2461 win 8576 (DF)
09:59:33.515561750 capc25.muc.798 > caes04.muc.shilp: . 499580:501040(1460) ack 2461 win 8576 (DF)
09:59:33.515695048 capc25.muc.798 > caes04.muc.shilp: . 501040:502500(1460) ack 2461 win 8576 (DF)
09:59:33.515817122 capc25.muc.798 > caes04.muc.shilp: . 502500:503960(1460) ack 2461 win 8576 (DF)
09:59:33.515942297 capc25.muc.798 > caes04.muc.shilp: . 503960:505420(1460) ack 2461 win 8576 (DF)
09:59:33.516066994 capc25.muc.798 > caes04.muc.shilp: . 505420:506880(1460) ack 2461 win 8576 (DF)
09:59:33.516190724 capc25.muc.798 > caes04.muc.shilp: . 506880:508340(1460) ack 2461 win 8576 (DF)
09:59:33.516316599 capc25.muc.798 > caes04.muc.shilp: . 508340:509800(1460) ack 2461 win 8576 (DF)
09:59:33.516443951 capc25.muc.798 > caes04.muc.shilp: . 509800:511260(1460) ack 2461 win 8576 (DF)
09:59:33.516575404 capc25.muc.798 > caes04.muc.shilp: . 511260:512720(1460) ack 2461 win 8576 (DF)
09:59:33.516586905 capc25.muc.798 > caes04.muc.shilp: P 512720:512908(188) ack 2461 win 8576 (DF)
09:59:33.516681467 caes04.muc.shilp > capc25.muc.798: . ack 512908 win 60032
09:59:33.516883991 caes04.muc.shilp > capc25.muc.798: P 2461:2625(164) ack 512908 win 60032
09:59:33.517569101 capc25.muc.798 > caes04.muc.shilp: P 512908:513048(140) ack 2625 win 8576 (DF)
09:59:33.630955691 caes04.muc.shilp > capc25.muc.798: P 2625:2781(156) ack 513048 win 60032
09:59:33.668220832 capc25.muc.798 > caes04.muc.shilp: . ack 2781 win 8576 (DF)
09:59:38.628694302 capc25.muc.32050000 > caes04.muc.nfs: 40 null (DF)
09:59:38.628778496 caes04.muc.nfs > capc25.muc.32050000: reply ok 24
09:59:49.586536582 capc25.muc.798 > caes04.muc.shilp: P 1036400938:1036401066(128) ack 1814846382 win 8576 (DF)
09:59:49.586776164 caes04.muc.shilp > capc25.muc.798: P 1:117(116) ack 128 win 60032
09:59:49.587157099 capc25.muc.798 > caes04.muc.shilp: . ack 117 win 8576 (DF)
09:59:49.589315282 capc25.muc.798 > caes04.muc.shilp: P 128:264(136) ack 117 win 8576 (DF)
09:59:49.589408743 caes04.muc.shilp > capc25.muc.798: P 117:361(244) ack 264 win 60032
09:59:49.629006777 capc25.muc.798 > caes04.muc.shilp: . ack 361 win 8576 (DF)
09:59:52.570844645 capc25.muc.798 > caes04.muc.shilp: P 264:400(136) ack 361 win 8576 (DF)
09:59:52.571043168 caes04.muc.shilp > capc25.muc.798: P 361:605(244) ack 400 win 60032
09:59:52.571576558 capc25.muc.798 > caes04.muc.shilp: . ack 605 win 8576 (DF)
09:59:52.571649841 capc25.muc.798 > caes04.muc.shilp: P 400:564(164) ack 605 win 8576 (DF)
09:59:52.587169789 caes04.muc.shilp > capc25.muc.798: P 605:753(148) ack 564 win 60032
09:59:52.588747424 capc25.muc.798 > caes04.muc.shilp: . 564:2024(1460) ack 753 win 8576 (DF)
09:59:52.588875654 capc25.muc.798 > caes04.muc.shilp: . 515072:516532(1460) ack 3533 win 8576 (DF)
09:59:52.748473502 caes04.muc.shilp > capc25.muc.798: . ack 516532 win 57112
09:59:52.749948552 capc25.muc.798 > caes04.muc.shilp: . 516532:517992(1460) ack 3533 win 8576 (DF)
09:59:52.750054981 capc25.muc.798 > caes04.muc.shilp: . 517992:519452(1460) ack 3533 win 8576 (DF)
09:59:52.750181422 capc25.muc.798 > caes04.muc.shilp: . 519452:520912(1460) ack 3533 win 8576 (DF)
09:59:52.948926208 caes04.muc.shilp > capc25.muc.798: . ack 520912 win 52732
09:59:52.950208057 capc25.muc.798 > caes04.muc.shilp: . 520912:522372(1460) ack 3533 win 8576 (DF)
09:59:52.950317664 capc25.muc.798 > caes04.muc.shilp: . 522372:523832(1460) ack 3533 win 8576 (DF)
09:59:52.950449373 capc25.muc.798 > caes04.muc.shilp: . 523832:525292(1460) ack 3533 win 8576 (DF)
09:59:52.950596493 capc25.muc.798 > caes04.muc.shilp: . 525292:526752(1460) ack 3533 win 8576 (DF)
09:59:53.149427407 caes04.muc.shilp > capc25.muc.798: . ack 526752 win 46892
09:59:53.150752059 capc25.muc.798 > caes04.muc.shilp: . 526752:528212(1460) ack 3533 win 8576 (DF)
09:59:53.150865744 capc25.muc.798 > caes04.muc.shilp: . 528212:529672(1460) ack 3533 win 8576 (DF)
09:59:53.150994330 capc25.muc.798 > caes04.muc.shilp: . 529672:531132(1460) ack 3533 win 8576 (DF)
09:59:53.151120238 capc25.muc.798 > caes04.muc.shilp: . 531132:532592(1460) ack 3533 win 8576 (DF)
09:59:53.151253113 capc25.muc.798 > caes04.muc.shilp: . 532592:534052(1460) ack 3533 win 8576 (DF)
09:59:53.349648355 caes04.muc.shilp > capc25.muc.798: . ack 534052 win 39592
09:59:53.350953271 capc25.muc.798 > caes04.muc.shilp: . 534052:535512(1460) ack 3533 win 8576 (DF)
.. and so on...
09:59:54.385746887 capc25.muc.798 > caes04.muc.shilp: . 999792:1001252(1460) ack 5501 win 8576 (DF)
09:59:54.385886062 capc25.muc.798 > caes04.muc.shilp: . 1001252:1002712(1460) ack 5501 win 8576 (DF)
09:59:54.386048695 capc25.muc.798 > caes04.muc.shilp: P 1002712:1004172(1460) ack 5501 win 8576 (DF)
09:59:54.386173981 capc25.muc.798 > caes04.muc.shilp: . 1004172:1005632(1460) ack 5501 win 8576 (DF)
09:59:54.386299911 capc25.muc.798 > caes04.muc.shilp: . 1005632:1007092(1460) ack 5501 win 8576 (DF)
09:59:54.386425664 capc25.muc.798 > caes04.muc.shilp: P 1007092:1008552(1460) ack 5501 win 8576 (DF)
09:59:54.386469944 caes04.muc.shilp > capc25.muc.798: . ack 1008552 win 58832
09:59:54.386555427 capc25.muc.798 > caes04.muc.shilp: . 1008552:1010012(1460) ack 5501 win 8576 (DF)
09:59:54.386684180 capc25.muc.798 > caes04.muc.shilp: . 1010012:1011472(1460) ack 5501 win 8576 (DF)
09:59:54.386918284 capc25.muc.798 > caes04.muc.shilp: . 1011472:1012932(1460) ack 5501 win 8576 (DF)
09:59:54.386971732 caes04.muc.shilp > capc25.muc.798: P 5501:5665(164) ack 1012932 win 54452
09:59:54.387297675 caes04.muc.shilp > capc25.muc.798: P 5665:5829(164) ack 1015852 win 51532
09:59:54.387677099 caes04.muc.shilp > capc25.muc.798: P 5829:5993(164) ack 1020232 win 47152
09:59:54.387728714 capc25.muc.798 > caes04.muc.shilp: . 1020232:1021692(1460) ack 5501 win 8576 (DF)
09:59:54.387853499 capc25.muc.798 > caes04.muc.shilp: . 1021692:1023152(1460) ack 5501 win 8576 (DF)
09:59:54.387999031 capc25.muc.798 > caes04.muc.shilp: . 1036911042:1036912502(1460) ack 1814849102 win 8576 (DF)
09:59:54.388126872 capc25.muc.798 > caes04.muc.shilp: . 1460:2920(1460) ack 1 win 8576 (DF)
and cp is running in the timeout again....
Cheers,
Birger
Trond Myklebust writes:
> >>>>> " " == Birger Lammering <[email protected]> writes:
>
> > 3133:3289(156) ack 514936 win 60032 16:27:26.282843 >
> > capc25.muc.799 > caes04.muc.nfs: . 514936:514936(0) ack 3289
> > win 8576 (DF)
>
> > from now on we get lot's of these:
>
> > 16:27:26.489024 > capc25.muc.576126976 > caes04.muc.nfs: 40
> > null (DF) 16:27:26.489647 < caes04.muc.nfs >
> > capc25.muc.576126976: reply ok 24 null
>
> > The cp command on the Linux nfs3-client side hangs and cannot
> > be killed. We get:
>
> > dmesg: nfs: server caes04 not responding, still trying
>
> > then after a while: dmesg: nfs: server caes04 OK
>
> > qx09820@capc25 /home/qx09820 > netstat | grep caes04 tcp 0 0
> > capc25.muc:798 caes04.muc:nfs ESTABLISHED
>
> Ho hum... It looks to me as if the problem is that the Linux NFS
> client is falling asleep before a write, and then not waking
> up. That sort of points at the write_space() callback.
>
> When the socket buffer is full, and we get an EAGAIN response to our
> sendmsg() request, we normally put the request to sleep, block the
> socket, and rely on write_space() to wake us up when there is enough
> memory to proceed.
>
> Assuming that this is the case, there are 2 possible causes:
>
> 1) A bug in the IPV4 TCP layer in which we don't call write_space()
> despite having liberated enough memory to proceed.
>
> 2) I've misunderstood the IPV4 tcp api, and so the check for
> sock_writeable() in net/sunrpc/xprt.c:tcp_write_space() is
> incorrect.
>
> Alexey: Do you have any comments? Is it correct to check for
> sock_writeable() on a TCP socket?
>
>
> Birger: could you try the following patch, that simply removes the
> check for sock_writeable()?
>
> Cheers,
> Trond
>
> --- linux-2.4.15-pre6/net/sunrpc/xprt.c.orig Mon Oct 8 21:36:07 2001
> +++ linux-2.4.15-pre6/net/sunrpc/xprt.c Mon Nov 19 19:07:09 2001
> @@ -1071,10 +1071,6 @@
> if (xprt->shutdown)
> return;
>
> - /* Wait until we have enough socket memory */
> - if (!sock_writeable(sk))
> - return;
> -
> if (!xprt_test_and_set_wspace(xprt)) {
> spin_lock(&xprt->sock_lock);
> if (xprt->snd_task && xprt->snd_task->tk_rpcwait == &xprt->sending)
>
Hello!
> You are saying that the it is impossible for sock_alloc_send_skb() to
> fail when using non-blocking writes?
It is possible and normal provided frame is not fragmented.
And this is bug in nfsd if this happens with its frames.
> writes. (Note: by 'simultaneous' I mean that we don't wait for the
> server to reply before firing off the next request)
I do not understand, you have said you wait for write space yet. :-)
> I haven't done anything about this because IMHO it makes more sense to
> have the QDIO driver drop their special spinlock when calling external
> functions such as dev_kfree_skb_any()
It is pretty normal, if I understand your words correctly.
kfree_skb() is called under various kinds of locks in lots of places.
> rather than to force the RPC layer to use the spin_lock_irqsave().
I see no relation at all. Do it irqsave and nothing will change,
write_space is called only from softirqs.
It is bug in xprt level to grab spinlock which can cause deadlocks
inside write_space. Probably, I misunderstood you.
Alexey
Hello!
> I forgot to add: The socket fasync lists use spinlocking in the same
> was as RPC does, with sock_fasync() setting
> write_lock_bh(&sk->callback_lock), and sock_def_write_space()
> doing read_lock(&sk->callback_lock).
>
> So that would deadlock with the QDIO driver in the exact same manner
> as the RPC stuff (albeit probably a lot less frequently).
Please, elaborate. I do not see any way.
Alexey
>>>>> " " == kuznet <[email protected]> writes:
> Hello!
>> I forgot to add: The socket fasync lists use spinlocking in the
>> same was as RPC does, with sock_fasync() setting
>> write_lock_bh(&sk->callback_lock), and sock_def_write_space()
>> doing read_lock(&sk->callback_lock).
>>
>> So that would deadlock with the QDIO driver in the exact same
>> manner as the RPC stuff (albeit probably a lot less
>> frequently).
> Please, elaborate. I do not see any way.
Processor 1 Processor 2
(Call QDIO bottom half code)
spin_lock(&QDIO_lock);
write_lock_bh(&sk->callback_lock)
dev_kfree_skb_any()
-> kfree_skb()
<QDIO hard interrupt>
-> write_space()
->spin_lock(&QDIO_lock)
(spins...)
->read_lock(&sk->callback_lock);
(spins)
Deadlock - in exactly the same way as with the xprt code...
Cheers,
Trond
Hello!
> Deadlock - in exactly the same way as with the xprt code...
Soooory! Delete from the picture all except for containing QDIO:
(Call QDIO bottom half code)
spin_lock(&QDIO_lock);
<QDIO hard interrupt>
->spin_lock(&QDIO_lock)
(spins...)
with the same result. No help of nfs is required. :-)
So, you have drawn wrong picture.
Alexey
>>>>> " " == kuznet <[email protected]> writes:
> Hello!
>> Deadlock - in exactly the same way as with the xprt code...
> Soooory! Delete from the picture all except for containing
> QDIO:
> (Call QDIO bottom half code) spin_lock(&QDIO_lock);
> <QDIO hard
> interrupt>
-> spin_lock(&QDIO_lock)
> (spins...)
> with the same result. No help of nfs is required. :-)
> So, you have drawn wrong picture.
>From the mail I received, I gathered that they were protected against
this. The hard interrupt could only occur on another processor.
In any case, my point is that the xprt stuff does *nothing* that is
not also done in the fasync code. If a spinlock deadlock scenario is
possible in one, then it is also possible in the other.
Cheers,
Trond
From: [email protected]
Date: Tue, 20 Nov 2001 22:45:21 +0300 (MSK)
No help of nfs is required. :-)
So, you have drawn wrong picture.
Trond, Alexey is right, all the examples of sunrpc/fasync/etc.
are totally irrelevant. This driver is buggy beyond belief.
Hello!
> Trond, Alexey is right, all the examples of sunrpc/fasync/etc.
> are totally irrelevant. This driver is buggy beyond belief.
It is he who is right and said this. :-) And me farted.
Alexey
>>>>> " " == David S Miller <[email protected]> writes:
> Trond, Alexey is right, all the examples of sunrpc/fasync/etc.
> are totally irrelevant. This driver is buggy beyond belief.
Right: I wasn't trying to defend it. Rather I wanted to explain why I
believe that this particular driver deadlock is not a good enough
reason to change the current RPC code.
Cheers,
Trond
On Tuesday 20. November 2001 20:45, [email protected] wrote:
> (Call QDIO bottom half code)
> spin_lock(&QDIO_lock);
> <QDIO hard interrupt>
>
> ->spin_lock(&QDIO_lock)
> (spins...)
>
> with the same result. No help of nfs is required. :-)
Now that my blood caffeine levels are above the 'sleep' watermark, I should
probably correct the above in case Ulrich decides to slap us both with a
slander lawsuit 8-)...
Since (as I said in the original mail) CPU 1 is in a bottom half context,
both local bh and local interrupts should be disabled on that process!
Basically, the deadlock between the QDIO driver and RPC/fasync can be reduced
to:
CPU 1 CPU 2
(In BH context) (In ordinary process context)
spin_lock_irq(&lock1);
spin_lock_bh(&lock2);
dev_kfree_skb_any(skb);
<QDIO Hard interrupt>
(switch to QDIO interrupt context)
-> sk->write_space();
spin_lock(&lock1);
-> spin_lock(&lock2);
IOW:
Either we must demand that CPU 2 uses irq-safe spinlocks in order to
protect against sk->write_space(), or we must demand that CPU 1 should drop
'lock1' before being allowed to call dev_kfree_skb_any().
Cheers,
Trond
Hello!
> Either we must demand that CPU 2 uses irq-safe spinlocks in order to
> protect against sk->write_space(),
Never. :-)
> or we must demand that CPU 1 should drop
> 'lock1' before being allowed to call dev_kfree_skb_any().
Yes, alas.
Being pretty evident after seen once, this rule was _not_ evident for me
until yesterday. Actually, this is very sad observation, because core
has no way to detect this is a generic way...
Alexey