2002-01-18 03:41:52

by Jason Thomas

Subject: network connections stalls

Please CC me; I'm not on the list.

Hi,

I need some help with this bug.

I've got two Athlon boxes which produce this same problem with all
kernels from 2.4.18-pre4 back to 2.4.0, which is as far as I went.

It seems to be triggered only by a particular string of binary data,
which I've narrowed down (it came from a PDF document) and will
attach now. It's 735 bytes; if I change it or remove a byte, it doesn't
cause the problem.

The machines are both Athlon 1.1GHz on ASUS A7V133 motherboards; that's
the only similarity between them. One is SCSI and the other is IDE; one
has a via-rhine NIC and the other a tulip.

Output from tcpdump:
14:28:25.184415 hathor.tsa.46893 > alhazred.tsa.1234: S 2668390347:2668390347(0) win 5840 <mss 1460,sackOK,timestamp 18451301 0,nop,wscale 0> (DF)
14:28:25.184461 alhazred.tsa.1234 > hathor.tsa.46893: S 2674935738:2674935738(0) ack 2668390348 win 5792 <mss 1460,sackOK,timestamp 102014 18451301,nop,wscale 0> (DF)
14:28:25.184616 hathor.tsa.46893 > alhazred.tsa.1234: . ack 1 win 5840 <nop,nop,timestamp 18451301 102014> (DF)
14:28:25.184954 hathor.tsa.46893 > alhazred.tsa.1234: P 1:735(734) ack 1 win 5840 <nop,nop,timestamp 18451301 102014> (DF)
14:28:25.393041 hathor.tsa.46893 > alhazred.tsa.1234: P 1:735(734) ack 1 win 5840 <nop,nop,timestamp 18451322 102014> (DF)
14:28:25.813051 hathor.tsa.46893 > alhazred.tsa.1234: P 1:735(734) ack 1 win 5840 <nop,nop,timestamp 18451364 102014> (DF)
14:28:26.653052 hathor.tsa.46893 > alhazred.tsa.1234: P 1:735(734) ack 1 win 5840 <nop,nop,timestamp 18451448 102014> (DF)
14:28:28.333082 hathor.tsa.46893 > alhazred.tsa.1234: P 1:735(734) ack 1 win 5840 <nop,nop,timestamp 18451616 102014> (DF)
14:28:31.182923 hathor.tsa.46893 > alhazred.tsa.1234: F 735:735(0) ack 1 win 5840 <nop,nop,timestamp 18451901 102014> (DF)
14:28:31.182947 alhazred.tsa.1234 > hathor.tsa.46893: . ack 1 win 5792 <nop,nop,timestamp 102614 18451616,nop,nop,sack sack 1 {735:736} > (DF)
14:28:31.693138 hathor.tsa.46893 > alhazred.tsa.1234: P 1:735(734) ack 1 win 5840 <nop,nop,timestamp 18451952 102614> (DF)
14:28:38.413253 hathor.tsa.46893 > alhazred.tsa.1234: P 1:735(734) ack 1 win 5840 <nop,nop,timestamp 18452624 102614> (DF)
14:28:51.853463 hathor.tsa.46893 > alhazred.tsa.1234: P 1:735(734) ack 1 win 5840 <nop,nop,timestamp 18453968 102614> (DF)
14:29:18.733896 hathor.tsa.46893 > alhazred.tsa.1234: P 1:735(734) ack 1 win 5840 <nop,nop,timestamp 18456656 102614> (DF)
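
The spacing of the repeated "P 1:735(734)" segment in the trace above can be checked with a short script. This is only a sketch in Python: the timestamps are copied from the capture, and the names SEND_TIMES and retransmit_gaps are illustrative. Each gap comes out at roughly twice the previous one, which is what a sender's exponential retransmission backoff looks like when the data segment is never acknowledged.

```python
from datetime import datetime

# Send times of the repeated "P 1:735(734)" segment, copied from the trace.
SEND_TIMES = [
    "14:28:25.184954",
    "14:28:25.393041",
    "14:28:25.813051",
    "14:28:26.653052",
    "14:28:28.333082",
    "14:28:31.693138",
    "14:28:38.413253",
    "14:28:51.853463",
    "14:29:18.733896",
]


def retransmit_gaps(stamps):
    """Return the seconds elapsed between consecutive send times."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


gaps = retransmit_gaps(SEND_TIMES)
for gap in gaps:
    print(f"{gap:.3f}s")
```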

I've tried this with both netcat and rsh, which produce the same
results. I've also tested sending the same string from a Solaris box,
which results in the same error.

I've also reduced the kernel to a bare minimum of drivers. I can send the
.config if anyone wants it.

The problem does not show itself on the loopback device.

I am willing to put some effort into debugging this if someone can help
me figure out where to start. If there is any other info required let me
know.

Thanks.


Attachments:
(No filename) (2.84 kB)
testdata.dat (734.00 B)

2002-01-18 07:40:04

by David Miller

Subject: Re: network connections stalls


Does the "InErrs" TCP counter in /proc/net/snmp increment on the
receiver when this occurs?

It smells of bad checksumming...
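
For reference, the check behind a checksum failure can be sketched as follows. This is a minimal Python illustration of the 16-bit one's-complement Internet checksum (RFC 1071), not the kernel's code: the real TCP checksum also covers a pseudo-header built from the IP addresses, protocol and length, which is left out here. The function name and the sample bytes are made up for the example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")  # arbitrary sample bytes
cksum = internet_checksum(payload)
print(hex(cksum))  # 0x220d

# A receiver sums the data together with the transmitted checksum;
# an intact segment yields 0, a damaged one does not.
intact = payload + cksum.to_bytes(2, "big")
damaged = bytes([intact[0] ^ 0x01]) + intact[1:]
print(internet_checksum(intact) == 0)   # True
print(internet_checksum(damaged) == 0)  # False
```

A segment that fails this kind of check is dropped without an ACK, so the sender keeps retransmitting.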

2002-01-18 23:12:48

by Jason Thomas

Subject: Re: network connections stalls

On Thu, Jan 17, 2002 at 11:38:17PM -0800, David S. Miller wrote:
>
> Does the "InErrs" TCP counter in /proc/net/snmp increment on the
> receiver when this occurs?
>
> It smells of bad checksumming...

Yep, it increments by 7:

--- snmp1.txt Sat Jan 19 10:09:57 2002
+++ snmp2.txt Sat Jan 19 10:10:25 2002
@@ -1,8 +1,8 @@
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
-Ip: 2 64 192 0 0 0 0 0 148 248 0 0 0 0 0 0 0 0 0
+Ip: 2 64 322 0 0 0 0 0 256 359 0 0 0 0 0 0 0 0 0
Icmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
-Icmp: 44 0 19 0 0 0 0 25 0 0 0 0 0 25 0 0 0 0 0 0 0 25 0 0 0 0 0
+Icmp: 66 0 41 0 0 0 0 25 0 0 0 0 0 25 0 0 0 0 0 0 0 25 0 0 0 0 0
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
-Tcp: 0 0 0 0 0 0 0 0 3 144 101 2 0 0
+Tcp: 0 0 0 0 0 0 0 0 3 250 166 2 7 0
Udp: InDatagrams NoPorts InErrors OutDatagrams
-Udp: 3 0 0 122
+Udp: 3 0 0 168
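
The before/after comparison above can also be done with a small script instead of diff. A rough sketch in Python, assuming the header-line/value-line layout that /proc/net/snmp shows in the diff; the Tcp lines are copied from the two snapshots, and parse_snmp and changed_counters are illustrative names.

```python
TCP_HEADER = ("Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
              "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs "
              "InErrs OutRsts")
BEFORE = TCP_HEADER + "\nTcp: 0 0 0 0 0 0 0 0 3 144 101 2 0 0\n"
AFTER = TCP_HEADER + "\nTcp: 0 0 0 0 0 0 0 0 3 250 166 2 7 0\n"


def parse_snmp(text):
    """Map each protocol to {counter name: value} from snmp-style text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    tables = {}
    # Lines come in pairs: a header line of names, then a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            raise ValueError(f"mismatched lines: {proto} / {value_proto}")
        tables[proto] = dict(zip(names.split(), map(int, numbers.split())))
    return tables


def changed_counters(before, after, proto="Tcp"):
    """Return {counter: delta} for counters that differ between snapshots."""
    old, new = parse_snmp(before)[proto], parse_snmp(after)[proto]
    return {name: new[name] - old[name] for name in old if new[name] != old[name]}


deltas = changed_counters(BEFORE, AFTER)
print(deltas)  # {'InSegs': 106, 'OutSegs': 65, 'InErrs': 7}
```

On a live system the two snapshots would be read from /proc/net/snmp before and after reproducing the stall.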