2001-12-18 09:34:27

by Mika.Liljeberg

Subject: RE: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

Hi again Alexey,

We had the missing FIN retransmit problem recur a few times.

> It is possible _only_ if rto is at 120 seconds. It is the only case
> when retransmissions do not happen and this would be normal behaviour.
>
> For now it is the only hypothesis and it will be clear from
> /proc/net/tcp, whether is this right or not.

Again, 10.0.5.11 is Linux, 10.0.5.3 is Symbian. The FIN-ACK from Linux to
Symbian gets dropped. Symbian retransmits the FIN, which is acked by Linux.
Nothing happens after this. Linux eventually times out from LAST-ACK and
Symbian remains stuck in FIN-WAIT-2.

The dump plus /proc/net/tcp is attached. As you can see, no data is
transferred from Linux to Symbian, so the only RTT sample for Linux comes
from the SYN exchange (about 200 ms). So, something is wrong?
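For reference, Mika's point about the single RTT sample can be worked through with the standard estimator. The following is a minimal sketch, assuming the RFC 2988 formulas (after the first sample R: SRTT = R, RTTVAR = R/2, RTO = SRTT + 4*RTTVAR), with the minimum-RTO rounding and clock granularity left out. Linux's own estimator differs in detail, and the function names and the RTO_MAX_MS constant are illustrative, not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* The 120 second ceiling discussed in the thread, in milliseconds.
 * The name is an assumption made for this sketch. */
#define RTO_MAX_MS 120000u

/* RTO after the very first RTT sample, RFC 2988 style:
 * SRTT = R, RTTVAR = R/2, RTO = SRTT + 4 * RTTVAR. */
static uint32_t rto_after_first_sample_ms(uint32_t rtt_ms)
{
	uint32_t srtt = rtt_ms;
	uint32_t rttvar = rtt_ms / 2;

	return srtt + 4 * rttvar;
}

/* How many consecutive timeouts (each one doubling the RTO) it takes
 * before the RTO reaches the ceiling. */
static unsigned backoffs_to_reach_max(uint32_t rto_ms)
{
	unsigned n = 0;

	assert(rto_ms > 0);
	while (rto_ms < RTO_MAX_MS) {
		rto_ms *= 2;
		n++;
	}
	return n;
}
```

With a 200 ms sample this gives an RTO of 600 ms, and eight doublings are needed before the ceiling is reached, which is why an RTO of 120 seconds on a connection with only one RTT sample looks suspicious.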

Cheers,

MikaL

----------------------------------------------------------------
Mika Liljeberg Phone: +358 5048 36791
Nokia Research Center Fax: +358 7180 36850
P.O.Box 407 Email: [email protected]
FIN-00045 NOKIA GROUP Office: Itämerenkatu 11-13,
Finland FIN-00180, Helsinki, Finland
----------------------------------------------------------------



Attachments:
remote.txt (4.28 kB)
proc.net.tcp.txt (6.01 kB)

2001-12-18 18:41:45

by Alexey Kuznetsov

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

Hello!

> from the SYN exchange (about 200 ms). So, something is wrong?

Well, the guess was right and this is pleasant.

The only minor :-) question remained is to guess how rto could happen
to be at this value. I will think. Well, if you have some guesses,
please, tell me. Is this intel btw? I just see that other side
sends bogus misaligned tcp options... not a problem, but it can
be reason of funnies with some probability.

Alexey

2001-12-18 20:22:54

by Mika Liljeberg

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

[email protected] wrote:
>
> Hello!
>
> > from the SYN exchange (about 200 ms). So, something is wrong?
>
> Well, the guess was right and this is pleasant.

Yes. We also saw a case, where the RTO was quite high but not quite 120,
so we got exactly one retransmission.

> The only minor :-) question remained is to guess how rto could happen
> to be at this value. I will think. Well, if you have some guesses,
> please, tell me.

Sorry, I'm not really trying to debug Linux so I haven't given it much
thought. We're exercising retransmission algorithms with a packet loss
ratio of 5% if that's any help.

> Is this intel btw?

It's ARM in little endian mode.

> I just see that other side
> sends bogus misaligned tcp options... not a problem, but it can
> be reason of funnies with some probability.

Heh, they're not bogus, just differently aligned. :) This is an
implementation where packet processing latency is not highest
item on the list of optimization targets.

Now that you mention it, tcp_parse_options() in input.c seems to expect
that the timestamps are word aligned, which is not the case here, and a
false assumption in any case. I would have expected a bus error for
that, unless the pointer cast generates code that magically word aligns
the resulting pointer...

Cheers,

MikaL

2001-12-18 20:30:03

by David Miller

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

   From: Mika Liljeberg <[email protected]>
   Date: Tue, 18 Dec 2001 22:21:44 +0200

   Now that you mention it, tcp_parse_options() in input.c seems to expect
   that the timestamps are word aligned, which is not the case here, and a
   false assumption in any case. I would have expected a bus error for
   that, unless the pointer cast generates code that magically word aligns
   the resulting pointer...

Unaligned kernel loads and stores must be properly handled by the
platform code, and on ARM chips where that is possible it is.

Nevertheless, if you'd like to rule this out, please try the
patch below:

diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/net/ipv4/tcp_input.c linux/net/ipv4/tcp_input.c
--- vanilla/linux/net/ipv4/tcp_input.c Tue Oct 30 15:08:12 2001
+++ linux/net/ipv4/tcp_input.c Tue Nov 6 15:48:01 2001
@@ -1987,6 +1987,18 @@
 	return 0;
 }

+static __inline__ __u16 tcp_options_get16(unsigned char *p)
+{
+	return ((__u16) p[0] << 8) | (__u16) p[1];
+}
+
+static __inline__ __u32 tcp_options_get32(unsigned char *p)
+{
+	return (((__u32) p[0] << 24) |
+		((__u32) p[1] << 16) |
+		((__u32) p[2] << 8) |
+		((__u32) p[3] << 0));
+}

 /* Look for tcp options. Normally only called on SYN and SYNACK packets.
  * But, this can also be called on packets in the established flow when
@@ -2020,7 +2032,7 @@
 			switch(opcode) {
 				case TCPOPT_MSS:
 					if(opsize==TCPOLEN_MSS && th->syn && !estab) {
-						u16 in_mss = ntohs(*(__u16 *)ptr);
+						u16 in_mss = tcp_options_get16(ptr);
 						if (in_mss) {
 							if (tp->user_mss && tp->user_mss < in_mss)
 								in_mss = tp->user_mss;
@@ -2047,8 +2059,8 @@
 						if ((estab && tp->tstamp_ok) ||
 						    (!estab && sysctl_tcp_timestamps)) {
 							tp->saw_tstamp = 1;
-							tp->rcv_tsval = ntohl(*(__u32 *)ptr);
-							tp->rcv_tsecr = ntohl(*(__u32 *)(ptr+4));
+							tp->rcv_tsval = tcp_options_get32(ptr);
+							tp->rcv_tsecr = tcp_options_get32(ptr + 4);
 						}
 					}
 					break;
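
The idea behind the patch can be tried outside the kernel. Below is a rough user-space sketch, not the kernel's parser: it walks a TCP option block and reads the timestamp values byte by byte, so the result does not depend on whether the option happens to be word aligned. The helper names (get_be32, find_timestamp) and the OPT_* constants are made up for this illustration; only the option kind (8) and length (10) of the timestamp option come from the TCP specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL           0
#define OPT_NOP           1
#define OPT_TIMESTAMP     8
#define OPT_TIMESTAMP_LEN 10

/* Assemble a big-endian 32-bit value from individual bytes; the pointer
 * needs no particular alignment (same idea as tcp_options_get32 above). */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walk the option block; return 1 and fill tsval/tsecr if a well-formed
 * timestamp option is found, 0 otherwise. */
static int find_timestamp(const unsigned char *opt, size_t len,
			  uint32_t *tsval, uint32_t *tsecr)
{
	size_t i = 0;

	assert(opt != NULL && tsval != NULL && tsecr != NULL);
	while (i < len) {
		unsigned char kind = opt[i];
		unsigned char size;

		if (kind == OPT_EOL)
			break;
		if (kind == OPT_NOP) {	/* one-byte padding option */
			i++;
			continue;
		}
		if (i + 1 >= len)	/* no room for the length byte */
			break;
		size = opt[i + 1];
		if (size < 2 || i + size > len)	/* malformed or truncated */
			break;
		if (kind == OPT_TIMESTAMP && size == OPT_TIMESTAMP_LEN) {
			*tsval = get_be32(opt + i + 2);
			*tsecr = get_be32(opt + i + 6);
			return 1;
		}
		i += size;
	}
	return 0;
}
```

A timestamp option placed directly at the start of the block (values at offsets 2 and 6) and one preceded by two NOPs (values at offsets 4 and 8) parse to the same result.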

2001-12-18 20:30:13

by Alexey Kuznetsov

Subject: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

Hello!

> It's ARM in little endian mode.

I think it is answer to the question.

No doubts it still has broken misaligned access.


> Now that you mention it, tcp_parse_options() in input.c seems to expect
> that the timestamps are word aligned,

Nope. It does not expect any alignment, but it is really supposed
to penalise misbehaving cases.

Alexey

2001-12-18 20:53:13

by Mika Liljeberg

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

[email protected] wrote:

> > It's ARM in little endian mode.
>
> I think it is answer to the question.
>
> No doubts it still has broken misaligned access.

Oops, I think there was a misunderstanding. The linux machine is Intel.
The other one is a non-Linux ARM.

> > Now that you mention it, tcp_parse_options() in input.c seems to expect
> > that the timestamps are word aligned,
>
> Nope. It does not expect any alignment, but it is really supposed
> to penalise misbehaving cases.

Ahh, I see. There's a kernel exception handler that is supposed to fix
misaligned access? Hacky.

Cheers,

MikaL

2001-12-18 21:05:13

by Russell King

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

On Tue, Dec 18, 2001 at 11:29:06PM +0300, [email protected] wrote:
> > It's ARM in little endian mode.
>
> I think it is answer to the question.
>
> No doubts it still has broken misaligned access.

You're way out of line with that comment.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-12-18 21:10:33

by David Miller

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

   From: Mika Liljeberg <[email protected]>
   Date: Tue, 18 Dec 2001 22:52:31 +0200

   Ahh, I see. There's a kernel exception handler that is supposed to fix
   misaligned access? Hacky.

Not hacky, "transparent". It allows us to fast-path everything.

2001-12-18 21:14:37

by David Miller

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

   From: Russell King <[email protected]>
   Date: Tue, 18 Dec 2001 21:03:32 +0000

   On Tue, Dec 18, 2001 at 11:29:06PM +0300, [email protected] wrote:
   > No doubts it still has broken misaligned access.

   You're way out of line with that comment.

Not necessarily Russell. You have even told us on several occasions
that the older ARMs simply cannot fix up unaligned loads/stores in
fact.

Look, we're analyzing a problem and trying to explore every avenue
for possible problems. If this were sparc64 I'd be checking my
unaligned handler for bugs :-)

2001-12-18 21:17:54

by Russell King

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

On Tue, Dec 18, 2001 at 01:11:55PM -0800, David S. Miller wrote:
> On Tue, Dec 18, 2001 at 11:29:06PM +0300, [email protected] wrote:
> > No doubts it still has broken misaligned access.
>
> You're way out of line with that comment.
>
> Not necessarily Russell. You have even told us on several occasions
> that the older ARMs simply cannot fix up unaligned loads/stores in
> fact.

It read as "Oh, it's ARM, that's your problem then".

> Look, we're analyzing a problem and trying to explore every avenue
> for possible problems. If this were sparc64 I'd be checking my
> unaligned handler for bugs :-)

Well, as it's already been established, it's not running Linux, so it's not
my problem. 8)

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-12-18 21:19:25

by David Miller

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

   From: Russell King <[email protected]>
   Date: Tue, 18 Dec 2001 21:14:50 +0000

   On Tue, Dec 18, 2001 at 01:11:55PM -0800, David S. Miller wrote:
   > On Tue, Dec 18, 2001 at 11:29:06PM +0300, [email protected] wrote:
   > > No doubts it still has broken misaligned access.
   >
   > You're way out of line with that comment.
   >
   > Not necessarily Russell. You have even told us on several occasions
   > that the older ARMs simply cannot fix up unaligned loads/stores in
   > fact.

   It read as "Oh, it's ARM, that's your problem then".

If it was "your problem, so go away" why did I even bother posting a
patch for him to test out?

Franks a lot,
David S. Miller
[email protected]

2001-12-18 21:26:59

by Rik van Riel

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

On Tue, 18 Dec 2001, David S. Miller wrote:
> From: Russell King <[email protected]>
> On Tue, Dec 18, 2001 at 11:29:06PM +0300, [email protected] wrote:
> > No doubts it still has broken misaligned access.
>
> You're way out of line with that comment.
>
> Not necessarily Russell. You have even told us on several occasions
> that the older ARMs simply cannot fix up unaligned loads/stores in
> fact.

Then the problem will have to be fixed elsewhere, maybe
by having the networking code do explicit unaligned
accesses through some macro which defaults to a normal
access on other systems ?
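
Rik's suggestion can be sketched in portable C. This is an illustration only, not code from the kernel: the names LOAD_U32, load_u32_bytes and the HYPOTHETICAL_FAST_UNALIGNED build switch are all invented here. By default the access goes through memcpy, which is safe at any alignment; the plain dereference is selected only when the (hypothetical) switch says the platform copes with it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef HYPOTHETICAL_FAST_UNALIGNED
/* Platform is assumed to handle misaligned loads itself (in hardware or
 * via a fixup handler), so use a normal access. Strictly speaking this
 * is outside what ISO C guarantees for a misaligned pointer. */
#define LOAD_U32(p) (*(const uint32_t *)(p))
#else
/* Safe at any alignment: copy the bytes into a properly aligned local. */
static inline uint32_t load_u32_bytes(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof v);
	return v;
}
#define LOAD_U32(p) load_u32_bytes(p)
#endif
```

Callers would write LOAD_U32(ptr) everywhere a header field may be misaligned, so the cost of the byte-wise path is paid only on platforms that need it.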

> Look, we're analyzing a problem and trying to explore every avenue for
> possible problems. If this were sparc64 I'd be checking my unaligned
> handler for bugs :-)

If sparc64 had these hardware problems, I'm sure we'd have
special hacks to handle the situation all throughout the
kernel already, instead of having such hacks blocked by
subsystem maintainers.

cheers,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-12-18 21:30:04

by Mika Liljeberg

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

"David S. Miller" wrote:
> If it was "your problem, so go away" why did I even bother posting a
> patch for him to test out?

Whoa, chill fellas! I didn't mean to start a flame war. :-|

I'll give your patch a go, although seeing that the platform takes care
of the alignment problem, the real bug is probably elsewhere.

MikaL

2001-12-18 21:31:33

by Russell King

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

On Tue, Dec 18, 2001 at 07:24:41PM -0200, Rik van Riel wrote:
> On Tue, 18 Dec 2001, David S. Miller wrote:
> > Not necessarily Russell. You have even told us on several occasions
> > that the older ARMs simply cannot fix up unaligned loads/stores in
> > fact.
>
> Then the problem will have to be fixed elsewhere, maybe
> by having the networking code do explicit unaligned
> accesses through some macro which defaults to a normal
> access on other systems ?

It's actually not worth it; these "older ARMs" I believe we can safely
drop from our sights - it has been my intention throughout 2.4 to drop
them out when 2.5 came around. The problem is that there are people
who do still want to use antiquated machines, but they can look after
the problems that entails themselves IMHO. 8)

So, as far as 2.5 is concerned, consider all ARMs capable of handling
mis-aligned accesses via a fault handler.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-12-18 21:37:43

by David Miller

Subject: Re: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

   From: Rik van Riel <[email protected]>
   Date: Tue, 18 Dec 2001 19:24:41 -0200 (BRST)

   Then the problem will have to be fixed elsewhere, maybe
   by having the networking code do explicit unaligned
   accesses through some macro which defaults to a normal
   access on other systems ?

It is a port requirement to fix up such accesses. It has always been
a port requirement to fix up such accesses, and it isn't going to
change.

If I fix up TCP options, I'd have to fixup every access to every
single networking header in the entire stack because "protocol in
protocol" cases can cause unaligned accesses to happen just about any
place.

2001-12-19 09:11:33

by Mika.Liljeberg

Subject: RE: ARM: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

From: ext Alan Cox [mailto:[email protected]]:
> > Ahh, I see. There's a kernel exception handler that is
> > supposed to fix misaligned access? Hacky.
>
> It's a big performance win to only do fixups for the unusual cases.

I didn't mean "hacky" as a criticism and I can see the advantages (even
though "fast pathing" the TCP slow path seems a bit strange to me). But if
this isn't a hack (in the archaic sense of the word) I don't know what is.
:-)

Regards,

MikaL

2001-12-20 07:37:51

by Stephen Oberholtzer

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

At 12:28 PM 12/18/2001 -0800, David S. Miller wrote:
>Unaligned kernel loads and stores must be properly handled by the
>platform code, and on ARM chips where that is possible it is.

I don't know what arch you're using, but I work with ARM7TDMI, which has a
behavior I believe can be found documented in some obscure .pdf from arm.com:

Unaligned accesses wrap.

If you have this:

[mem.] 00 01 02 03 04 05 06 07
[data] 00 11 22 33 44 55 66 77

in little-endian mode,

*(int*)0x00 == 0x33221100
*(int*)0x01 == 0x00332211
*(int*)0x02 == 0x11003322
*(int*)0x03 == 0x22110033
*(int*)0x04 == 0x77665544

At least, that's how ARM's docs seem to describe it. I work with this cpu
embedded in a microcontroller (AT91M40800), and these values result:

*(int*)0x00 == 0x33221100
*(int*)0x01 == 0x33221100
*(int*)0x02 == 0x33221100
*(int*)0x03 == 0x33221100
*(int*)0x04 == 0x77665544
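
Both tables can be reproduced with a small model. The sketch below simulates a little-endian word load on a byte array in two ways: the documented behaviour, where the aligned word is rotated right by eight bits for each byte of misalignment, and the behaviour observed on the AT91M40800, where the low two address bits are simply ignored. It models memory in software and says nothing about any other ARM core; the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian word at the aligned address containing addr. */
static uint32_t load_le32_aligned(const uint8_t *mem, size_t addr)
{
	size_t base = addr & ~(size_t)3;

	assert(mem != NULL);
	return (uint32_t)mem[base] |
	       ((uint32_t)mem[base + 1] << 8) |
	       ((uint32_t)mem[base + 2] << 16) |
	       ((uint32_t)mem[base + 3] << 24);
}

/* Rotate right, guarding the n == 0 case to avoid a shift by 32. */
static uint32_t ror32(uint32_t v, unsigned n)
{
	n &= 31;
	return n ? (v >> n) | (v << (32 - n)) : v;
}

/* First table: aligned word rotated by 8 bits per byte of misalignment. */
static uint32_t ldr_rotating(const uint8_t *mem, size_t addr)
{
	return ror32(load_le32_aligned(mem, addr), 8 * (unsigned)(addr & 3));
}

/* Second table: low two address bits dropped, no rotation. */
static uint32_t ldr_truncating(const uint8_t *mem, size_t addr)
{
	return load_le32_aligned(mem, addr);
}
```

Either way, a misaligned load never sees bytes from the following word, so code that expects the four bytes starting at the given address reads the wrong value without any fault being raised.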

An unaligned access to an assembly-declared variable caused me much grief
once, overwriting the task scheduler's ready-to-run list under certain
conditions...

The moral of the story:
RISC cpus abhor unaligned accesses.


--
Stevie-O

Real programmers use COPY CON PROGRAM.EXE

2001-12-20 07:42:21

by David Miller

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

   From: Stevie O <[email protected]>
   Date: Thu, 20 Dec 2001 02:31:44 -0500

   The moral of the story:
   RISC cpus abhor unaligned accesses.

They should trap on it or handle it, silent "garbage" is really not
nice behavior.

2001-12-20 07:57:30

by Stephen Oberholtzer

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

At 11:40 PM 12/19/2001 -0800, David S. Miller wrote:
>They should trap on it or handle it, silent "garbage" is really not
>nice behavior.

hah, I wish. The ARM7 has seven "exception" vectors -- that's it.

0x00 = RESET
0x04 = Undefined instruction
0x08 = Software interrupt (SWI instruction, used to escape the restricted
USR cpu mode)
0x0C = Prefetch abort (an abort that happens trying to read the next
instruction)
0x10 = Data abort (a very very very much lesser form of an access
violation; accessing memory that's physically not there)
0x14 = <- Oh, yeah, and one "reserved" slot which isn't likely to ever
be used.
0x18 = IRQ <- these two can't really even count as exceptions!
0x1C = FIQ <- which makes for five...

Anyway, this is a bit off-topic now.


--
Stevie-O

Real programmers use COPY CON PROGRAM.EXE

2001-12-20 09:09:52

by Russell King

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

On Thu, Dec 20, 2001 at 02:31:44AM -0500, Stevie O wrote:
> I don't know what arch you're using, but I work with ARM7TDMI, which has a
> behavior I believe can be found documented in some obscure .pdf from arm.com:

Sorry, it's not an obscure PDF. It's documented in the Architecture
Reference Manual, which is the main reference for the behaviour of any
ARM processor. If you don't have that, then you're missing *vital*
information.

> At least, that's how ARM's docs seem to describe it. I work with this cpu
> embedded in a microcontroller (AT91M40800), and these values result:
>
> *(int*)0x00 == 0x33221100
> *(int*)0x01 == 0x33221100
> *(int*)0x02 == 0x33221100
> *(int*)0x03 == 0x33221100
> *(int*)0x04 == 0x77665544

Looks like some random manufacturer decided to do something different.
Nothing out of the ordinary there. 8(

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-12-20 09:06:52

by Russell King

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

On Thu, Dec 20, 2001 at 02:51:42AM -0500, Stevie O wrote:
> hah, I wish. The ARM7 has seven "exception" vectors -- that's it.

If you're running on a processor without CP#15 register 1, and therefore
doesn't have the alignment data abort trap (eg, MMUless ARMs), then you're
on your own.

If you do have it, and you didn't enable the alignment fault handler,
that's your problem as well.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-12-20 10:23:01

by David Weinehall

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

On Thu, Dec 20, 2001 at 02:31:44AM -0500, Stevie O wrote:
> At 12:28 PM 12/18/2001 -0800, David S. Miller wrote:
> >Unaligned kernel loads and stores must be properly handled by the
> >platform code, and on ARM chips where that is possible it is.
>
> I don't know what arch you're using, but I work with ARM7TDMI, which
> has a behavior I believe can be found documented in some obscure .pdf
> from arm.com:

Last time I checked, the ARM7tdmi was a mmu-less cpu -> ucLinux.


/David
_ _
// David Weinehall <[email protected]> /> Northern lights wander \\
// Maintainer of the v2.0 kernel // Dance across the winter sky //
\> http://www.acc.umu.se/~tao/ </ Full colour fire </

2002-01-02 19:55:20

by Mike Touloumtzis

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2 [NEW DATA]

On Thu, Dec 20, 2001 at 11:22:18AM +0100, David Weinehall wrote:
>
> Last time I checked, the ARM7tdmi was a mmu-less cpu -> ucLinux.

The Cirrus Logic EP7211, among others, is an ARM7TDMI with an MMU.

miket