We are seeing a performance slowdown between Windows PPP users and
servers running 2.4.0-test10. Attached is a tcpdump log of the
connection. The machines are without TCP ECN support. The Windows machine
is running Windows 98 SE 4.10.2222 A dialed up over PPP w/ TCP header
compression. The Linux machine is connected directly to the Internet via
a 6509. There is a possibility that we are hitting a bandwidth cap on
outgoing traffic.
18:51:33.282286 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: S
3013389:3013389(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
18:51:33.282395 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: S
2198113890:2198113890(0) ack 3013390 win 5840 <mss 1460,nop,nop,sackOK>
(DF)
18:51:33.509532 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: .
1:1(0) ack 1 win 8576 (DF)
18:51:33.510360 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: .
1:1(0) ack 1 win 65280 (DF)
18:51:33.510416 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: P
1:44(43) ack 1 win 65280 (DF)
18:51:33.510457 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: .
1:1(0) ack 44 win 5840 (DF)
18:51:33.988330 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1:21(20) ack 44 win 5840 (DF)
18:51:33.988474 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
21:557(536) ack 44 win 5840 (DF)
18:51:36.987336 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1:21(20) ack 44 win 5840 (DF)
18:51:37.177772 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: P
44:56(12) ack 21 win 65260 (DF)
18:51:37.177794 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
21:557(536) ack 44 win 5840 (DF)
18:51:37.177806 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
557:1093(536) ack 56 win 5840 (DF)
18:51:39.845046 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: P
44:456(412) ack 21 win 65260 (DF)
18:51:39.845071 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: .
1093:1093(0) ack 456 win 6432 <nop,nop, sack 1 {44:56} > (DF)
18:51:43.177329 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
21:557(536) ack 456 win 6432 (DF)
18:51:43.538219 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: .
456:456(0) ack 557 win 65280 (DF)
18:51:43.538275 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
557:1093(536) ack 456 win 6432 (DF)
18:51:43.538292 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1093:1629(536) ack 456 win 6432 (DF)
18:51:55.537346 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
557:1093(536) ack 456 win 6432 (DF)
18:51:55.841360 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: .
456:456(0) ack 1093 win 65280 (DF)
18:51:55.841384 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1093:1629(536) ack 456 win 6432 (DF)
18:51:55.841393 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1629:1849(220) ack 456 win 6432 (DF)
18:52:19.837335 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1093:1629(536) ack 456 win 6432 (DF)
18:52:20.153776 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: .
456:456(0) ack 1629 win 65280 (DF)
18:52:20.153803 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1629:1849(220) ack 456 win 6432 (DF)
18:53:08.147334 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1629:1849(220) ack 456 win 6432 (DF)
18:53:08.475911 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: .
456:456(0) ack 1849 win 65060 (DF)
18:53:08.475947 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1849:1871(22) ack 456 win 6432 (DF)
18:54:44.467332 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1849:1871(22) ack 456 win 6432 (DF)
18:54:44.824187 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: .
456:456(0) ack 1871 win 65038 (DF)
18:54:44.824256 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1871:1893(22) ack 456 win 6432 (DF)
18:54:55.212750 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: P
456:506(50) ack 1871 win 65038 (DF)
18:54:55.212767 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: .
1893:1893(0) ack 506 win 6432 (DF)
18:54:55.571337 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: P
1893:2429(536) ack 506 win 6432 (DF)
18:54:57.394879 eth0 < 209.179.248.69.1238 > 64.124.41.177.8888: P
456:506(50) ack 1871 win 65038 (DF)
18:54:57.394894 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: .
2429:2429(0) ack 506 win 6432 <nop,nop, sack 1 {456:506} > (DF)
Here are some numbers from /proc/sys/net/ipv4:
$ cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 174760
$ cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 131072
$ cat /proc/sys/net/ipv4/tcp_sack
1
$ cat /proc/sys/net/ipv4/tcp_fack
1
$ cat /proc/sys/net/ipv4/tcp_dsack
1
$ cat /proc/sys/net/ipv4/tcp_window_scaling
1
$ cat /proc/sys/net/ipv4/tcp_syncookies
0
$ cat /proc/sys/net/ipv4/tcp_timestamps
1
Jordan
Jordan Mendelson wrote:
>
> We are seeing a performance slowdown between Windows PPP users and
> servers running 2.4.0-test10. Attached is a tcpdump log of the
> connection. The machines are without TCP ECN support. The Windows machine
> is running Windows 98 SE 4.10.2222 A dialed up over PPP w/ TCP header
> compression. The Linux machine is connected directly to the Internet via
> a 6509. There is a possibility that we are hitting a bandwidth cap on
> outgoing traffic.
Just some updates. This problem does not appear to happen under 2.2.16.
The dump for 2.2.16 is almost the same except we send an mss back of 536
and not 1460 (remote mtu vs local mtu).
Here is the head of a tcpdump with the same client, but this time with a
2.2.16 machine instead of a 2.4.0-test10 machine:
19:26:23.593114 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: S
5061245:5061245(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
19:26:23.593237 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: S
119520695:119520695(0) ack 5061246 win 32696 <mss 536,nop,nop,sackOK>
(DF)
19:26:23.824394 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: .
1:1(0) ack 1 win 65280 (DF)
19:26:23.824398 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: .
1:1(0) ack 1 win 8576 (DF)
19:26:23.825249 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: P
1:44(43) ack 1 win 65280 (DF)
19:26:23.825283 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: .
1:1(0) ack 44 win 32696 (DF)
19:26:25.245845 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: P
1:21(20) ack 44 win 32696 (DF)
19:26:25.245956 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: P
21:342(321) ack 44 win 32696 (DF)
19:26:25.466759 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: .
44:44(0) ack 342 win 64939 (DF)
19:26:25.466792 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: P
342:878(536) ack 44 win 32696 (DF)
19:26:25.466800 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: P
878:1401(523) ack 44 win 32696 (DF)
19:26:25.467562 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: P
44:56(12) ack 342 win 64939 (DF)
19:26:25.480104 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: .
1401:1401(0) ack 56 win 32696 (DF)
19:26:25.763509 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: P
56:456(400) ack 878 win 65280 (DF)
19:26:25.766253 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: .
456:456(0) ack 1401 win 64757 (DF)
19:26:26.070115 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: .
1401:1401(0) ack 456 win 32296 (DF)
19:26:26.431515 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: P
1401:1413(12) ack 456 win 32696 (DF)
19:26:26.432141 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: P
1413:1684(271) ack 456 win 32696 (DF)
19:26:26.657631 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: .
456:456(0) ack 1684 win 65280 (DF)
19:26:26.657663 eth0 > 64.124.41.136.8888 > 209.179.248.69.1260: P
1684:1817(133) ack 456 win 32696 (DF)
19:26:26.952825 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: .
456:456(0) ack 1817 win 65147 (DF)
19:26:31.086138 eth0 < 209.179.248.69.1260 > 64.124.41.136.8888: P
456:506(50) ack 1817 win 65147 (DF)
Jordan
Date: Mon, 06 Nov 2000 18:17:19 -0800
From: Jordan Mendelson <[email protected]>
18:54:57.394894 eth0 > 64.124.41.177.8888 > 209.179.248.69.1238: .
2429:2429(0) ack 506 win 6432 <nop,nop, sack 1 {456:506} > (DF)
And this is it? The connection dies right here and says no
more? Surely, there was more said on this connection after
this point.
Otherwise I see nothing obviously wrong in these dumps.
Later,
David S. Miller
[email protected]
Date: Mon, 06 Nov 2000 19:44:57 -0800
From: Jordan Mendelson <[email protected]>
Just some updates. This problem does not appear to happen under
2.2.16. The dump for 2.2.16 is almost the same except we send an
mss back of 536 and not 1460 (remote mtu vs local mtu).
The MSS we advertise makes no difference here. It does not control
what size payloads we send; that is determined in this case by PMTU,
and thus both Linux 2.2.x and 2.4.x send equally limited packets.
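A minimal sketch of that point in Python, under a simplified model: the sender is bounded by the smaller of the peer's advertised MSS and the path MTU minus headers, and its own advertised MSS is not an input at all. The function name and the 40-byte header constant (IPv4 plus TCP, no options) are assumptions for illustration, not kernel code.

```python
# Simplified model, not the Linux implementation.
IP_TCP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def send_segment_size(peer_mss, path_mtu):
    """Largest payload the sender may put in one segment.

    peer_mss: MSS the remote end advertised in its SYN (536 in the dumps).
    path_mtu: path MTU towards the peer (an assumed value in the examples).
    The sender's own advertised MSS (536 on 2.2.16, 1460 on 2.4.0-test10)
    does not appear here.
    """
    return min(peer_mss, path_mtu - IP_TCP_HEADERS)


# Same result whether the path MTU is a dial-up style 576 or Ethernet's 1500.
print(send_segment_size(536, 576))
print(send_segment_size(536, 1500))
```

Both calls give 536, which matches the full-sized segments (for example 21:557) seen in the 2.2.16 and 2.4.0-test10 dumps alike.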
Later,
David S. Miller
[email protected]
22:00:01.684927 209.179.245.186.1091 > 64.124.41.136.8888: S 4033171:4033171(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
22:00:01.685021 64.124.41.136.8888 > 209.179.245.186.1091: S 1261602556:1261602556(0) ack 4033172 win 32696 <mss 536,nop,nop,sackOK> (DF)
22:00:01.916120 209.179.245.186.1091 > 64.124.41.136.8888: . ack 1 win 8576 (DF)
22:00:01.916191 209.179.245.186.1091 > 64.124.41.136.8888: . ack 1 win 65280 (DF)
22:00:01.916981 209.179.245.186.1091 > 64.124.41.136.8888: P 1:44(43) ack 1 win 65280 (DF)
22:00:01.917032 64.124.41.136.8888 > 209.179.245.186.1091: . ack 44 win 32696 (DF)
22:00:02.121143 64.124.41.136.8888 > 209.179.245.186.1091: P 1:21(20) ack 44 win 32696 (DF)
22:00:02.121279 64.124.41.136.8888 > 209.179.245.186.1091: P 21:349(328) ack 44 win 32696 (DF)
22:00:02.327779 209.179.245.186.1091 > 64.124.41.136.8888: . ack 349 win 64932 (DF)
22:00:02.327813 64.124.41.136.8888 > 209.179.245.186.1091: P 349:885(536) ack 44 win 32696 (DF)
22:00:02.327825 64.124.41.136.8888 > 209.179.245.186.1091: P 885:1408(523) ack 44 win 32696 (DF)
22:00:02.328909 209.179.245.186.1091 > 64.124.41.136.8888: P 44:56(12) ack 349 win 64932 (DF)
22:00:02.340110 64.124.41.136.8888 > 209.179.245.186.1091: . ack 56 win 32696 (DF)
22:00:02.605282 209.179.245.186.1091 > 64.124.41.136.8888: P 56:456(400) ack 885 win 65280 (DF)
22:00:02.608462 209.179.245.186.1091 > 64.124.41.136.8888: . ack 1408 win 64757 (DF)
22:00:02.608533 64.124.41.136.8888 > 209.179.245.186.1091: P 1408:1420(12) ack 456 win 32296 (DF)
22:00:02.766833 64.124.41.136.8888 > 209.179.245.186.1091: P 1420:1689(269) ack 456 win 32696 (DF)
22:00:02.889731 209.179.245.186.1091 > 64.124.41.136.8888: . ack 1420 win 64745 (DF)
22:00:03.091796 209.179.245.186.1091 > 64.124.41.136.8888: . ack 1689 win 65280 (DF)
22:00:03.091829 64.124.41.136.8888 > 209.179.245.186.1091: P 1689:1822(133) ack 456 win 32696 (DF)
22:00:03.388700 209.179.245.186.1091 > 64.124.41.136.8888: . ack 1822 win 65147 (DF)
22:00:04.442114 209.179.245.186.1091 > 64.124.41.136.8888: F 456:456(0) ack 1822 win 65147 (DF)
22:00:04.442178 64.124.41.136.8888 > 209.179.245.186.1091: . ack 457 win 32696 (DF)
22:00:04.502433 64.124.41.136.8888 > 209.179.245.186.1091: F 1822:1822(0) ack 457 win 32696 (DF)
22:00:04.689026 209.179.245.186.1091 > 64.124.41.136.8888: . ack 1823 win 65147 (DF)
Date: Mon, 06 Nov 2000 21:20:39 -0800
From: Jordan Mendelson <[email protected]>
It looks to me like there is an artificial delay in 2.4.0 which is
slowing down the traffic to unbearable levels.
No, I think I see what's wrong: it's nothing more than packet drop.
The large gaps in time seem to be due to packets being dropped:
22:00:39.991515 64.124.41.179.8888 > 209.179.245.186.1092: P 1:21(20) ack 44 win 5840 (DF)
22:00:39.991660 64.124.41.179.8888 > 209.179.245.186.1092: P 21:557(536) ack 44 win 5840 (DF)
3 seconds pass, retransmit time out.
22:00:42.991490 64.124.41.179.8888 > 209.179.245.186.1092: P 1:21(20) ack 44 win 5840 (DF)
Linux retransmits dropped data.
22:00:43.180946 209.179.245.186.1092 > 64.124.41.179.8888: P 44:56(12) ack 21 win 65260 (DF)
Windows95 responds, acknowledges up to byte 21.
22:00:43.180997 64.124.41.179.8888 > 209.179.245.186.1092: P 21:557(536) ack 44 win 5840 (DF)
22:00:43.181025 64.124.41.179.8888 > 209.179.245.186.1092: P 557:1093(536) ack 56 win 5840 (DF)
22:00:45.685143 209.179.245.186.1092 > 64.124.41.179.8888: P 44:456(412) ack 21 win 65260 (DF)
Linux resends bytes 21:556 and sends new data from 557:1093.
Windows95 sends new data and ACKs only up to 21 (meaning presumably
that all bytes sent by Linux this time were dropped).
22:00:45.685204 64.124.41.179.8888 > 209.179.245.186.1092: . ack 456 win 6432 <nop,nop, sack 1 {44:56} > (DF)
Linux acknowledges data received from Windows95 machine.
A retransmit timeout occurs on the lost data.
22:00:49.171046 64.124.41.179.8888 > 209.179.245.186.1092: P 21:557(536) ack 456 win 6432 (DF)
22:00:49.470193 209.179.245.186.1092 > 64.124.41.179.8888: . ack 557 win 65280 (DF)
Linux resends 21:557, Windows95 (finally) acknowledges it.
Looking at the equivalent 2.2.16 traces, the only difference appears
to be that the packets are not getting dropped.
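The retransmit gaps can be read directly off the timestamps. A minimal sketch in Python, using original-send and resend times copied from the first 2.4.0-test10 dump in this thread; the helper names are made up for illustration.

```python
from datetime import datetime

# (time a segment was sent, time the same segment was resent after a timeout),
# copied from the first 2.4.0-test10 dump in the thread.
PAIRS = [
    ("18:51:33.988330", "18:51:36.987336"),  # 1:21
    ("18:51:37.177794", "18:51:43.177329"),  # 21:557
    ("18:51:43.538275", "18:51:55.537346"),  # 557:1093
    ("18:51:55.841384", "18:52:19.837335"),  # 1093:1629
    ("18:52:20.153803", "18:53:08.147334"),  # 1629:1849
    ("18:53:08.475947", "18:54:44.467332"),  # 1849:1871
]


def gap_seconds(sent, resent):
    """Elapsed seconds between two tcpdump-style HH:MM:SS.usec stamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(resent, fmt) - datetime.strptime(sent, fmt)
    return delta.total_seconds()


def retransmit_gaps(pairs=PAIRS):
    """Gaps rounded to whole seconds, in trace order."""
    return [round(gap_seconds(sent, resent)) for sent, resent in pairs]


print(retransmit_gaps())
```

The gaps come out as 3, 6, 12, 24, 48 and 96 seconds: the retransmit timer doubles on every round, which is why the connection slows to a crawl once each new segment needs a timeout before it is acknowledged.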
Alexey, do you have any other similar reports wrt. the new MSS
advertisement scheme in 2.4.x?
Jordan, you mentioned something about possibly being "bandwidth
limited"? Please, elaborate...
Later,
David S. Miller
[email protected]
"David S. Miller" wrote:
>
> Date: Mon, 06 Nov 2000 21:20:39 -0800
> From: Jordan Mendelson <[email protected]>
>
> It looks to me like there is an artificial delay in 2.4.0 which is
> slowing down the traffic to unbearable levels.
>
> No, I think I see what's wrong: it's nothing more than packet drop.
>
> Looking at the equivalent 2.2.16 traces, the only difference appears
> to be that the packets are not getting dropped.
I would like to note that these two machines the windows client is
connecting to are sitting on the exact same switch connected to the same
provider handling identical user loads.
> Alexey, do you have any other similar reports wrt. the new MSS
> advertisement scheme in 2.4.x?
>
> Jordan, you mentioned something about possibly being "bandwidth
> limited"? Please, elaborate...
There is a possibility that we are hitting an upper level bandwidth
limit between us and our upstream provider due to a misconfiguration on
the other end, but this should only happen during peak time (which it is
not right now). It just bugs me that 2.2.16 doesn't appear to have this
problem.
Jordan
Date: Mon, 06 Nov 2000 22:13:23 -0800
From: Jordan Mendelson <[email protected]>
There is a possibility that we are hitting an upper level bandwidth
limit between us and our upstream provider due to a misconfiguration
on the other end, but this should only happen during peak time
(which it is not right now). It just bugs me that 2.2.16 doesn't
appear to have this problem.
The only thing I can do now is beg for a tcpdump from the windows95
machine side. Do you have the facilities necessary to obtain this?
This would prove that it is packet drop between the two systems, for
whatever reason, that is causing this.
Later,
David S. Miller
[email protected]
22:34:34.884487 arp who-has 64.124.41.179 tell 209.179.194.175
22:34:34.889477 209.179.194.175.1084 > 64.124.41.179.8888: S 370996:370996(0) win 8192 <mss 536,nop,nop,sackOK> (DF)
22:34:35.669892 64.124.41.179.8888 > 209.179.194.175.1084: S 3050526223:3050526223(0) ack 370997 win 5840 <mss 1460,nop,nop,sackOK> (DF)
22:34:35.670624 209.179.194.175.1084 > 64.124.41.179.8888: . ack 1 win 8576 (DF)
22:34:35.670653 209.179.194.175.1084 > 64.124.41.179.8888: . ack 1 win 65280 (DF)
22:34:35.674484 209.179.194.175.1084 > 64.124.41.179.8888: P 1:44(43) ack 1 win 65280 (DF)
22:34:36.049808 64.124.41.179.8888 > 209.179.194.175.1084: . ack 44 win 5840 (DF)
22:34:36.069773 64.124.41.179.8888 > 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 (DF)
22:34:36.069837 64.124.41.179.8888 > 209.179.194.175.1084: P 19:553(534) ack 44 win 5840 (DF)
22:34:39.049788 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)
22:34:39.051638 209.179.194.175.1084 > 64.124.41.179.8888: P 44:56(12) ack 21 win 65260 (DF)
22:34:39.245138 64.124.41.179.8888 > 209.179.194.175.1084: P 21:555(534) ack 44 win 5840 (DF)
22:34:39.245208 64.124.41.179.8888 > 209.179.194.175.1084: P 557:1091(534) ack 56 win 5840 (DF)
22:34:41.739438 209.179.194.175.1084 > 64.124.41.179.8888: P 44:456(412) ack 21 win 65260 (DF)
22:34:42.064811 64.124.41.179.8888 > 209.179.194.175.1084: . ack 456 win 6432 <nop,nop,sack 43360@5 43372@5> (DF)
22:34:45.224789 64.124.41.179.8888 > 209.179.194.175.1084: P 21:557(536) ack 456 win 6432 (DF)
22:34:45.339396 209.179.194.175.1084 > 64.124.41.179.8888: . ack 557 win 65280 (DF)
22:34:45.524819 64.124.41.179.8888 > 209.179.194.175.1084: P 557:1091(534) ack 456 win 6432 (DF)
22:34:45.544830 64.124.41.179.8888 > 209.179.194.175.1084: P 1091:1625(534) ack 456 win 6432 (DF)
22:34:57.508659 64.124.41.179.8888 > 209.179.194.175.1084: P 557:1093(536) ack 456 win 6432 (DF)
22:34:57.664295 209.179.194.175.1084 > 64.124.41.179.8888: . ack 1093 win 65280 (DF)
22:34:57.834842 64.124.41.179.8888 > 209.179.194.175.1084: P 1093:1627(534) ack 456 win 6432 (DF)
22:34:57.854637 64.124.41.179.8888 > 209.179.194.175.1084: P 1627:1843(216) ack 456 win 6432 (DF)
22:35:21.859406 64.124.41.179.8888 > 209.179.194.175.1084: P 1093:1629(536) ack 456 win 6432 (DF)
22:35:21.974090 209.179.194.175.1084 > 64.124.41.179.8888: . ack 1629 win 65280 (DF)
22:35:22.119319 64.124.41.179.8888 > 209.179.194.175.1084: P 1629:1845(216) ack 456 win 6432 (DF)
22:36:10.179021 64.124.41.179.8888 > 209.179.194.175.1084: P 1629:1847(218) ack 456 win 6432 (DF)
22:36:10.323454 209.179.194.175.1084 > 64.124.41.179.8888: . ack 1847 win 65062 (DF)
22:36:10.478939 64.124.41.179.8888 > 209.179.194.175.1084: P 1847:1866(19) ack 456 win 6432 (DF)
22:36:19.818615 209.179.194.175.1084 > 64.124.41.179.8888: F 456:456(0) ack 1847 win 65062 (DF)
22:36:20.003942 [|tcp] (DF)
22:36:20.004076 64.124.41.179.8888 > 209.179.194.175.1084: F 1868:1868(0) ack 457 win 6432 (DF)
22:36:20.008601 209.179.194.175.1084 > 64.124.41.179.8888: . ack 1847 win 65062 <nop,nop,sack 23899@46547 23900@46547> (DF)
22:37:46.513418 64.124.41.179.8888 > 209.179.194.175.1084: P 1847:1868(21) ack 457 win 6432 (DF)
22:37:46.517916 209.179.194.175.1084 > 64.124.41.179.8888: R 371453:371453(0) win 0 (DF)
On Mon, Nov 06, 2000 at 10:03:05PM -0800, David S. Miller wrote:
> The only thing I can do now is beg for a tcpdump from the windows95
> machine side. Do you have the facilities necessary to obtain this?
> This would prove that it is packet drop between the two systems, for
> whatever reason, that is causing this.
It looks very much to me like a poster child for the non-timestamp
RTT update problem I just described on netdev. Linux always retransmits
too early and there is never a better RTT estimate which could fix it.
2.4's advertised windows also do not seem to cope with the weird window
advertising strategy of Windows (start with a small window and then
suddenly increase it). Linux's stays small.
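For readers unfamiliar with the RTT update problem, here is a textbook Jacobson/Karn retransmit-timeout estimator sketched in Python. It is an illustration under stated assumptions, not the Linux 2.4 code: the class name, the 3 second initial value and the smoothing constants are the commonly published ones. The key line is Karn's rule: without timestamps, an ACK for a retransmitted segment gives no usable RTT sample.

```python
class RtoEstimator:
    """Textbook Jacobson/Karn RTO estimator (illustrative, not kernel code)."""

    def __init__(self, initial_rto=3.0):
        self.srtt = None      # smoothed round-trip time, seconds
        self.rttvar = None    # round-trip time variation, seconds
        self.rto = initial_rto

    def on_ack(self, rtt_sample, was_retransmitted):
        # Karn's rule: the ACK could belong to either transmission, so a
        # retransmitted segment yields no sample and the RTO stays as is.
        if was_retransmitted:
            return self.rto
        if self.srtt is None:
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt_sample)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt_sample
        self.rto = self.srtt + 4 * self.rttvar
        return self.rto

    def on_timeout(self):
        # Exponential backoff after each retransmit timeout.
        self.rto *= 2
        return self.rto


est = RtoEstimator()
est.on_timeout()              # RTO backs off to 6 s
est.on_ack(0.2, True)         # ACK of a retransmit: no sample, still 6 s
print(est.rto)
```

If every data segment has to be retransmitted before it is acknowledged, the estimator never receives a valid sample and the timeout can only grow.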
-Andi
Date: Mon, 06 Nov 2000 22:44:00 -0800
From: Jordan Mendelson <[email protected]>
Attached to this message are dumps from the windows 98 machine using
windump and the linux 2.4.0-test10. Sorry the time stamps don't match
up.
Ok, something is "odd" at the win98 side, I quote the win98 log:
22:34:36.069773 64.124.41.179.8888 > 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 (DF)
22:34:36.069837 64.124.41.179.8888 > 209.179.194.175.1084: P 19:553(534) ack 44 win 5840 (DF)
Linux sends 1-->553
Since this is in the win98 log, it saw this data, but refuses to
acknowledge it and the retransmit timeout expires on the Linux side.
22:34:39.049788 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)
So Linux resends 1-->21
22:34:39.051638 209.179.194.175.1084 > 64.124.41.179.8888: P 44:56(12) ack 21 win 65260 (DF)
Win98 sends data, and only acknowledges the resent data from Linux.
22:34:39.245138 64.124.41.179.8888 > 209.179.194.175.1084: P 21:555(534) ack 44 win 5840 (DF)
22:34:39.245208 64.124.41.179.8888 > 209.179.194.175.1084: P 557:1091(534) ack 56 win 5840 (DF)
Win98 machine receives bytes 21-->1091 from Linux, Linux also is
acknowledging Win98's data up to 56, but...
22:34:41.739438 209.179.194.175.1084 > 64.124.41.179.8888: P 44:456(412) ack 21 win 65260 (DF)
Win98 still claims it only saw up to byte 21 from Linux. Win98 also
resends its data, therefore it has not seen Linux's ACKs either.
And this goes on and on.
Just to be absolutely sure, 64.124.41.179 is the Linux machine, right?
If so, Win98 is dropping packets it did in fact receive correctly,
before Win98's TCP has a look at them.
WHOA, wait a second! From the Linux side log:
23:36:16.261533 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)
23:36:16.261669 64.124.41.179.8888 > 209.179.194.175.1084: P 21:557(536) ack 44 win 5840 (DF)
23:36:19.261055 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)
The equivalent packets from the win98 log:
22:34:36.069773 64.124.41.179.8888 > 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 (DF)
22:34:36.069837 64.124.41.179.8888 > 209.179.194.175.1084: P 19:553(534) ack 44 win 5840 (DF)
22:34:39.049788 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)
(ie. Linux sends bytes 1:21 both the first time, and when it
retransmits that data. However win98 "sees" this as 1:19 the first
time and 1:21 during the retransmit by Linux)
That is bogus. Something is mangling the packets between the Linux
machine and the win98 machine. You mentioned something about
bandwidth limiting at your upstream provider, any chance you can have
them turn this bandwidth limiting device off?
Or maybe earthlink is using some packet mangling device?
It is clear though, that something is messing with or corrupting the
packets. One thing you might try is turning off TCP header
compression for the PPP link, does this make a difference?
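The mismatch between the two logs can also be checked mechanically. A small Python sketch, assuming both captures are available as tcpdump text lines and that data segments line up one-to-one in order; the function names are hypothetical.

```python
import re

# Matches the tcpdump "first:last(len)" field, e.g. "21:557(536)".
SEG = re.compile(r"(\d+):(\d+)\((\d+)\)")


def seq_ranges(lines):
    """Return (first, last) for every line that carries payload."""
    ranges = []
    for line in lines:
        m = SEG.search(line)
        if m and int(m.group(3)) > 0:
            ranges.append((int(m.group(1)), int(m.group(2))))
    return ranges


def mismatches(sent_lines, received_lines):
    """Pairs of (sent, received) ranges that differ, in capture order."""
    pairs = zip(seq_ranges(sent_lines), seq_ranges(received_lines))
    return [(s, r) for s, r in pairs if s != r]


LINUX_LOG = [
    "23:36:16.261533 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)",
    "23:36:16.261669 64.124.41.179.8888 > 209.179.194.175.1084: P 21:557(536) ack 44 win 5840 (DF)",
    "23:36:19.261055 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)",
]
WIN98_LOG = [
    "22:34:36.069773 64.124.41.179.8888 > 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 (DF)",
    "22:34:36.069837 64.124.41.179.8888 > 209.179.194.175.1084: P 19:553(534) ack 44 win 5840 (DF)",
    "22:34:39.049788 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack 44 win 5840 (DF)",
]

for sent, seen in mismatches(LINUX_LOG, WIN98_LOG):
    print("sent", sent, "seen", seen)
```

On these six lines it flags the two original transmissions and passes the retransmit, which is the pattern described above.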
Later,
David S. Miller
[email protected]
Date: Tue, 7 Nov 2000 08:03:42 +0100
From: Andi Kleen <[email protected]>
It looks very much to me like a poster child for the non-timestamp
RTT update problem I just described on netdev. Linux always
retransmits too early and there is never a better RTT estimate
which could fix it.
I thought so too, _BUT_ see my analysis of the Linux side vs.
Win98 side logs, they don't match up and therefore something
is mangling the packets in the middle. The TCP sequence numbers are
being changed!
Also, if your theory were true then 2.2.x would be affected
by it as well.
Later,
David S. Miller
[email protected]
On Mon, Nov 06, 2000 at 10:59:04PM -0800, David S. Miller wrote:
> Date: Tue, 7 Nov 2000 08:03:42 +0100
> From: Andi Kleen <[email protected]>
>
> It looks very much to me like a poster child for the non-timestamp
> RTT update problem I just described on netdev. Linux always
> retransmits too early and there is never a better RTT estimate
> which could fix it.
>
> I thought so too, _BUT_ see my analysis of the Linux side vs.
> Win98 side logs, they don't match up and therefore something
> is mangling the packets in the middle. The TCP sequence numbers are
> being changed!
Hmm. One of these weird bandwidth limiters again?
>
> Also, if your theory were true then 2.2.x would be affected
> by it as well.
2.2 does not save RTTs between connections. The RTT is lower than 2.2's
initial 3s RTT, so 2.2 would never see it. One useful experiment would
be to flush the routing cache between attempts or turn off the tcp metrics
saving (why don't we have a sysctl for that btw?)
-Andi
"David S. Miller" wrote:
>
> Date: Mon, 06 Nov 2000 22:44:00 -0800
> From: Jordan Mendelson <[email protected]>
>
> Attached to this message are dumps from the windows 98 machine using
> windump and the linux 2.4.0-test10. Sorry the time stamps don't match
> up.
>
> (ie. Linux sends bytes 1:21 both the first time, and when it
> retransmits that data. However win98 "sees" this as 1:19 the first
> time and 1:21 during the retransmit by Linux)
>
> That is bogus. Something is mangling the packets between the Linux
> machine and the win98 machine. You mentioned something about
> bandwidth limiting at your upstream provider, any chance you can have
> them turn this bandwidth limiting device off?
It actually turns out that that problem with bandwidth was fixed
yesterday, so this can not be the problem here and yes, 64.124.41.179 is
a linux box. :)
> Or maybe earthlink is using some packet mangling device?
>
> It is clear though, that something is messing with or corrupting the
> packets. One thing you might try is turning off TCP header
> compression for the PPP link, does this make a difference?
Actually, there has been several reports that turning header compression
does help.
Jordan
Date: Mon, 06 Nov 2000 23:16:21 -0800
From: Jordan Mendelson <[email protected]>
"David S. Miller" wrote:
> It is clear though, that something is messing with or corrupting the
> packets. One thing you might try is turning off TCP header
> compression for the PPP link, does this make a difference?
Actually, there has been several reports that turning header
compression does help.
If this is what is causing the TCP sequence numbers to change
then either Win98's or Earthlink terminal server's implementation
of TCP header compression is buggy.
Assuming this is true, it explains why Win98's TCP does not "see" the
data sent by Linux, because such a bug would make the TCP checksum of
these packets incorrect and thus dropped by Win98's TCP.
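To illustrate why a changed sequence number would make the receiver drop the segment, here is a sketch of the standard TCP checksum (one's-complement sum over an IPv4 pseudo-header plus the segment) in Python. The addresses and ports are the ones from the dumps; the payload, the acknowledgment number and the helper names are made up for illustration.

```python
import socket
import struct


def internet_checksum(data):
    """16-bit one's-complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_segment(sport, dport, seq, ack, payload, checksum=0):
    """20-byte TCP header (PSH|ACK, window 5840, no options) plus payload."""
    header = struct.pack("!HHIIBBHHH", sport, dport, seq, ack,
                         5 << 4, 0x18, 5840, checksum, 0)
    return header + payload


def checksum_for(src, dst, segment):
    """Checksum over the IPv4 pseudo-header followed by the segment."""
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, 6, len(segment)))
    return internet_checksum(pseudo + segment)


def verifies(src, dst, segment):
    # With the checksum field filled in, a valid segment sums to zero.
    return checksum_for(src, dst, segment) == 0


SRC, DST = "64.124.41.179", "209.179.194.175"
PAYLOAD = b"x" * 20            # placeholder data
SEQ, ACK = 3050526224, 371040  # ISN + 1 from the dump; ACK is illustrative

csum = checksum_for(SRC, DST, tcp_segment(8888, 1084, SEQ, ACK, PAYLOAD))
good = tcp_segment(8888, 1084, SEQ, ACK, PAYLOAD, csum)
# Same bytes and same checksum, but the sequence number is off by two,
# as in the 1:19 versus 1:21 discrepancy.
mangled = tcp_segment(8888, 1084, SEQ - 2, ACK, PAYLOAD, csum)

print(verifies(SRC, DST, good), verifies(SRC, DST, mangled))
```

The intact segment verifies and the altered one does not, so a stack that checks the checksum would silently discard it, consistent with Win98 never acknowledging those segments.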
Later,
David S. Miller
[email protected]
Date: Tue, 7 Nov 2000 08:16:04 +0100
From: Andi Kleen <[email protected]>
Hmm. One of these weird bandwidth limiters again?
In a more recent mail, TCP header compression in Win98 or Earthlink's
terminal servers has become the current prime suspect. :-)
The RTT is lower than 2.2's initial 3s RTT, so 2.2 would never see
it.
The 2.4.0 traces are using an RTT of 3s (look at the time difference
of the first retransmit), so this is not it.
Later,
David S. Miller
[email protected]
"David S. Miller" wrote:
>
> Date: Mon, 06 Nov 2000 23:16:21 -0800
> From: Jordan Mendelson <[email protected]>
>
> "David S. Miller" wrote:
> > It is clear though, that something is messing with or corrupting the
> > packets. One thing you might try is turning off TCP header
> > compression for the PPP link, does this make a difference?
>
> Actually, there has been several reports that turning header
> compression does help.
>
> If this is what is causing the TCP sequence numbers to change
> then either Win98's or Earthlink terminal server's implementation
> of TCP header compression is buggy.
>
> Assuming this is true, it explains why Win98's TCP does not "see" the
> data sent by Linux, because such a bug would make the TCP checksum of
> these packets incorrect and thus dropped by Win98's TCP.
Ok, but why doesn't 2.2.16 exhibit this behavior?
We've had reports from quite a number of people complaining about this
and I'm fairly certain not all of them are from Earthlink.
Jordan
Date: Mon, 06 Nov 2000 23:32:42 -0800
From: Jordan Mendelson <[email protected]>
Ok, but why doesn't 2.2.16 exhibit this behavior?
We've had reports from quite a number of people complaining about
this and I'm fairly certain not all of them are from Earthlink.
The only thing different is that 2.2.x is packetizing the write()
system calls on the server differently, otherwise there is no
difference whatsoever.
What 2.4.x is doing is completely legal. Really, even if not all of
these people are from Earthlink (well, you should see if this is for
certain) they may all be using the same buggy terminal server at these
different ISPs.
Later,
David S. Miller
[email protected]
David S. Miller wrote:
> Linux resends 21:557, Windows95 (finally) acknowledges it.
>
> Looking at the equivalent 2.2.16 traces, the only difference appears
> to be that the packets are not getting dropped.
This smells of "wrong checksums getting generated", in my opinion.
(This is not my field of expertise. I'll keep my trap shut from now
on, OK?)
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* Common sense is the collection of *
****** prejudices acquired by age eighteen. -- Albert Einstein ********
David S. Miller wrote:
> It is clear though, that something is messing with or corrupting the
> packets. One thing you might try is turning off TCP header
> compression for the PPP link, does this make a difference?
Try specifying "asyncmap 0xffffffff" too.
Roger.
On Mon, Nov 06, 2000 at 11:27:54PM -0800, David S. Miller wrote:
> What 2.4.x is doing is completely legal. Really, even if not all of
> these people are from Earthlink (well, you should see if this is for
> certain) they may all be using the same buggy terminal server at these
> different ISPs.
I think such a theory would at least need verifying (e.g. by a sniffer
on the windows end that checks checksums or someone finding the checksum
failed counters windows probably maintains)
-Andi
On Mon, Nov 06, 2000 at 11:16:21PM -0800, Jordan Mendelson wrote:
> > It is clear though, that something is messing with or corrupting the
> > packets. One thing you might try is turning off TCP header
> > compression for the PPP link, does this make a difference?
>
> Actually, there has been several reports that turning header compression
> does help.
What does help ? Turning it on or turning it off ?
-Andi
> Date: Tue, 7 Nov 2000 10:41:36 +0100
> From: Andi Kleen <[email protected]>
>
> I think such a theory would at least need verifying (e.g. by a
> sniffer on the windows end that checks checksums or someone finding
> the checksum failed counters windows probably maintains)
Sure.
BTW, note the pattern of the win98 logs, basically the whole
connection from the WIN98 side is:
repeat:
    LINUX data(N)   --> win98    DIFFERENT sequence number
    LINUX data(N+1) --> win98    DIFFERENT sequence number
    retimeout
    LINUX data(N)   --> win98    SAME sequence number
    win98 ACK N     --> LINUX
    N++; goto repeat
And in all cases except the first data segment sent by Linux, the
sequence numbers are corrupted such that the original 536 byte payload
is reduced to a 534 byte payload.
Now one could argue it might be a bandwidth limiter, because the
first change in sequence numbers is:
22:34:36.069773 64.124.41.179.8888 > 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 (DF)
22:34:36.069837 64.124.41.179.8888 > 209.179.194.175.1084: P 19:553(534) ack 44 win 5840 (DF)
They're changed, but lined up. However later we get stuff like:
22:34:39.245138 64.124.41.179.8888 > 209.179.194.175.1084: P 21:555(534) ack 44 win 5840 (DF)
22:34:39.245208 64.124.41.179.8888 > 209.179.194.175.1084: P 557:1091(534) ack 56 win 5840 (DF)
Which don't even line up at all. Furthermore, if the checksums were
correct win98 should have ACK'd fully the <1:19> data packet and the
<19:553> one. Also for the non-contiguous cases win98 should have
sent an immediate ACK (and a SACK block indicating the non-contiguous
data received).
Now, I could see some buggy bandwidth limiter chopping up the sequence
numbers such that they aren't lined up occasionally, but corrupting
all the TCP checksums as well? I find that hard to believe.
Well, if it is what's happening, I wouldn't expect such a company to
be making such products for long :-)
Later,
David S. Miller
[email protected]
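The "lined up" versus "don't even line up" observation above can be checked mechanically. The following is a minimal sketch, not part of the original thread: it assumes tcpdump-style lines with a `first:last(len)` sequence field, and the helper names `parse_segments` and `find_gaps` are made up for illustration.

```python
import re

# Matches the "first:last(len)" sequence field that tcpdump prints.
SEQ_FIELD = re.compile(r"\b(\d+):(\d+)\((\d+)\)")


def parse_segments(lines):
    """Return (start, end, length) for every line carrying a sequence range."""
    segments = []
    for line in lines:
        match = SEQ_FIELD.search(line)
        if match:
            segments.append(tuple(int(group) for group in match.groups()))
    return segments


def find_gaps(segments):
    """Return (expected_start, actual_start) wherever a segment does not
    begin at the byte where the previous segment ended."""
    gaps = []
    for (_, prev_end, _), (start, _, _) in zip(segments, segments[1:]):
        if start != prev_end:
            gaps.append((prev_end, start))
    return gaps


# The two pairs of lines quoted from the win98-side log in the message above.
lined_up = [
    "22:34:36.069773 64.124.41.179.8888 > 209.179.194.175.1084: P 1:19(18) ack 44 win 5840 (DF)",
    "22:34:36.069837 64.124.41.179.8888 > 209.179.194.175.1084: P 19:553(534) ack 44 win 5840 (DF)",
]
not_lined_up = [
    "22:34:39.245138 64.124.41.179.8888 > 209.179.194.175.1084: P 21:555(534) ack 44 win 5840 (DF)",
    "22:34:39.245208 64.124.41.179.8888 > 209.179.194.175.1084: P 557:1091(534) ack 56 win 5840 (DF)",
]

print(find_gaps(parse_segments(lined_up)))      # []
print(find_gaps(parse_segments(not_lined_up)))  # [(555, 557)]
```

The second pair leaves a two-byte hole between 555 and 557, which is the non-contiguous case that should have drawn an immediate ACK with a SACK block from the receiver.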
> Date: Tue, 7 Nov 2000 10:38:12 +0100 (MET)
> From: [email protected] (Rogier Wolff)
>
> David S. Miller wrote:
> > It is clear though, that something is messing with or corrupting the
> > packets. One thing you might try is turning off TCP header
> > compression for the PPP link, does this make a difference?
>
> Try specifying "asyncmap 0xffffffff" too.
I wonder how this is specified under win98 :-)
Later,
David S. Miller
[email protected]
> Date: Tue, 7 Nov 2000 10:35:21 +0100 (MET)
> From: [email protected] (Rogier Wolff)
>
> This smells of "wrong checksums getting generated", in my opinion.
Actually the current hypothesis is that the checksums are incorrect,
but only because something between Linux and win98 is changing the
TCP sequence numbers in the packet but not updating the checksum to
match.
Jordan, if you check the windows registry or wherever you view SNMP
statistics under win98, do the "TCP checksum" or "TCP discard" error
counters change after one of these "slow" PPP sessions to
2.4.0-test10?
Later,
David S. Miller
[email protected]
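To illustrate the hypothesis above (a sequence number rewritten in flight without the checksum being updated), here is a minimal sketch using the ones-complement Internet checksum from RFC 1071. It is a simplification and not taken from the thread: the real TCP checksum also covers a pseudo-header of IP addresses, which is omitted here, and the port, sequence and window values are just borrowed from the traces as toy inputs. `build_header` and `checksum_ok` are hypothetical helper names.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Ones-complement sum of 16-bit big-endian words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(seq: int, checksum: int = 0) -> bytes:
    """Toy 20-byte TCP header: ports, seq, ack, offset/flags, window,
    checksum, urgent pointer. The pseudo-header is deliberately left out."""
    return struct.pack("!HHIIHHHH", 8888, 1084, seq, 44, 0x5018, 5840, checksum, 0)


def checksum_ok(segment: bytes) -> bool:
    """A segment whose stored checksum is valid sums to zero."""
    return internet_checksum(segment) == 0


payload = b"x" * 20

# The sender computes the checksum over the header (checksum field zeroed) plus data.
good_sum = internet_checksum(build_header(21) + payload)
original = build_header(21, good_sum) + payload

# Something in the path rewrites the sequence number but keeps the old checksum.
tampered = build_header(19, good_sum) + payload

print(checksum_ok(original))  # True
print(checksum_ok(tampered))  # False
```

A receiver that verifies checksums would silently discard the tampered segment, which would match the "win98 does not see the data" symptom, if this hypothesis is what is actually happening.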
David S. Miller wrote:
> Date: Tue, 7 Nov 2000 10:38:12 +0100 (MET)
> From: [email protected] (Rogier Wolff)
>
> David S. Miller wrote:
> > It is clear though, that something is messing with or corrupting the
> > packets. One thing you might try is turning off TCP header
> > compression for the PPP link, does this make a difference?
>
> Try specifying "asyncmap 0xffffffff" too.
>
> I wonder how this is specified under win98 :-)
Well, I missed the initial part of this discussion. From your remark I
must conclude that the Windows box is dialling in to an ISP and it's a
Linux box that is somewhere on the net. Then you can't just put it in
the ppp0.options on the linux box. :-(
I do suspect that there must be a popup screen on W98 that allows you
this control. Possibly it's tucked away in some "registry" entry
somewhere.
Roger.
> > Assuming this is true, it explains why Win98's TCP does not "see" the
> > data sent by Linux, because such a bug would make the TCP checksum of
> > these packets incorrect and thus dropped by Win98's TCP.
>
> Ok, but why doesn't 2.2.16 exhibit this behavior?
>
> We've had reports from quite a number of people complaining about this
> and I'm fairly certain not all of them are from Earthlink.
If their system is confused by tcp options in data segments then the SACK stuff
in 2.4 may well be the trigger. Windows generally doesn't try and use vj at
all. With the predictable QA results for anyone who does try and use it.
> Date: Tue, 7 Nov 2000 12:22:14 +0000 (GMT)
> From: Alan Cox <[email protected]>
>
> If their system is confused by tcp options in data segments then
> the SACK stuff in 2.4 may well be the trigger.
SACK is on by default in both the 2.2.x and 2.4.x traces...
Later,
David S. Miller
[email protected]
Andi Kleen wrote:
>
> On Mon, Nov 06, 2000 at 11:16:21PM -0800, Jordan Mendelson wrote:
> > > It is clear though, that something is messing with or corrupting the
> > > packets. One thing you might try is turning off TCP header
> > > compression for the PPP link, does this make a difference?
> >
> > Actually, there has been several reports that turning header compression
> > does help.
>
> What does help ? Turning it on or turning it off ?
We had a good number of reports that turning PPP header compression off
helped. The Windows 98 connection I was testing with did have header
compression turned on. Unfortunately, I can't just ask the entire Windows
world to turn off header compression in order to use our software. :)
I believe we've reverted all of our machines to 2.2, so testing this any
further is going to be a problem.
Jordan
>23:36:16.261533 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack
>44 win 5840 (DF)
>23:36:16.261669 64.124.41.179.8888 > 209.179.194.175.1084: P 21:557(536)
>ack 44 win 5840 (DF)
>23:36:19.261055 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack
>44 win 5840 (DF)
>
>The equivalent packets from the win98 log:
>
>22:34:36.069773 64.124.41.179.8888 > 209.179.194.175.1084: P 1:19(18) ack
>44 win 5840 (DF)
>22:34:36.069837 64.124.41.179.8888 > 209.179.194.175.1084: P 19:553(534)
>ack 44 win 5840 (DF)
>22:34:39.049788 64.124.41.179.8888 > 209.179.194.175.1084: P 1:21(20) ack
>44 win 5840 (DF)
>
>(ie. Linux sends bytes 1:21 both the first time, and when it
> retransmits that data. However win98 "sees" this as 1:19 the first
> time and 1:21 during the retransmit by Linux)
this excerpt looks like when a modem is set to eat XON/XOFF ...
a ping which does a sweep of many byte values should show this up ...
cheers,
lincoln.
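The XON/XOFF suggestion above can be modelled offline before trying it on a real link. This is a rough sketch under stated assumptions, not a diagnosis of the traces in this thread: it assumes a link that silently drops the software flow-control bytes XON (0x11) and XOFF (0x13), and the function names are invented for illustration. On a real link the sweep payload would be sent across the modem and the received copy compared in the same way.

```python
XON, XOFF = 0x11, 0x13  # ASCII DC1 and DC3, the software flow-control bytes


def strip_flow_control(payload: bytes) -> bytes:
    """Model a link that silently swallows XON/XOFF bytes (an assumption)."""
    return bytes(b for b in payload if b not in (XON, XOFF))


def sweep_payload() -> bytes:
    """One copy of every byte value, as a byte-value sweep would send."""
    return bytes(range(256))


def missing_values(sent: bytes, received: bytes) -> list:
    """Byte values present in what was sent but absent from what arrived."""
    return sorted(set(sent) - set(received))


sent = sweep_payload()
received = strip_flow_control(sent)

print(len(sent), len(received))                         # 256 254
print([hex(v) for v in missing_values(sent, received)])  # ['0x11', '0x13']
```

If a sweep over a real connection showed exactly these two values going missing, that would point at flow-control handling on the modem or terminal server rather than at the TCP stack on either end.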