2001-02-14 20:40:06

by roger

Subject: MTU and 2.4.x kernel



Kernel 2.4.x apparently disregards my ppp options MTU setting of 552
and sets mss=536 (=> MTU=576). Kernel 2.2.16 sets mss=512 correctly.
Is this a kernel bug or what?

I ran a much broader inquiry recently concerning poor ppp performance
under the 2.4.x kernel. But I got a disappointing response to my
questions so I am narrowing the field to this one question.

I include my original enquiry below.

Thank you,
Roger Young.

....................................................................


Topic: Problem with Netscape/Linux v.2.4.x [repeat] (MTU problem)?

Symptoms: The browser (Netscape or Lynx) will not download from remote
web sites (dynamic ppp connection via external modem).

This is a second post. The problem is still not resolved, but can now
be described in more detail, thanks to help given by David Woodhouse
(and others) and my ISP.

Description: Typically Netscape/Lynx will connect to a remote site but
will not download (it will hang indefinitely). When the browser is in
such a hung state I am still able to ping/telnet/ftp to the URL. I have
no difficulty browsing with Linux 2.2.16. The problem only occurs with
the 2.4.x kernels (2.4.0, 2.4.1).

My ISP operates a "transparent proxy server". According to tcpdump,
TX packets from my machine are passed on by the proxy server to the
remote site, and the response from the latter is also logged by the
server and passed on to me. However at that point there is no further
traffic from the proxy server.

This looks to be a problem with my PC and the 2.4.x kernel; however,
the proxy server is also involved for the following reason: although
the browser is locked for almost all remote sites, I _am_ able to
connect to (the web page of) the proxy server itself. And after I do
this the browser is *unlocked*, and I can connect/download from any web
address. However this only lasts for 5 minutes or so, after which time
I must reconnect to the ISP proxy server. It is as though some information
has been cached and then lost after a time??

Now I include a note from my ISP:


>Roger, as we discussed, I think the problem is to do with the MTU being
>used for TCP connections in combination with the 2.4.1 kernel and PPP.
>
>At any rate, what we have found from the packet dumps is that when you use
>kernel 2.2.16 and set the MTU to 552, our cache receives SYN packets
>from your host with an "mss" option of 512 (MSS = MTU 552 - IP header
>(20) - TCP header (20)). Here is a packet dump of that:
>
>19:29:33.146337 131.203.xxx.yyy.1028 > www.google.com.www: S 1878153551:1878153551(0) win 15872 <mss 512,sackOK,timestamp 614080,nop,wscale 0> (DF)
>
>However, when your 2.4.1 kernel, also set with an MTU of 552, does the
>same thing, we find an "mss" option of 536 (MSS = 576 - IP header -
>TCP header), not one derived from 552! Here is the packet trace:
>
>19:34:17.559674 131.203.xxx.yyy.32771 > www.google.com.www: S 2178626299:2178626299(0) win 2144 <mss 536,sackOK,timestamp 175390,nop,wscale 0> (DF)
>
>There is more in the trace indicating that packets with data segment
>sizes of 536 are not getting through, and when the data segment drops to
>468 it does get through. Likewise, with the 2.2.16 kernel, packets only
>get as big as 512 and they all get through OK.
>
>This indicates that although the MTU is being set to 552, this is being
>ignored by the 2.4.1 kernel and it is using 576 instead. Kernel 2.2.16
>correctly uses the 552 as specified.
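The arithmetic behind the ISP's note can be checked with a small sketch. It assumes plain 20-byte IPv4 and 20-byte TCP headers with no options, which is what the "MTU - 40" rule in the note takes for granted; the function names are illustrative, not kernel APIs.

```python
# Sketch: relation between link MTU and TCP MSS, assuming option-free headers.
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options


def mss_from_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IP datagram."""
    return mtu - IPV4_HEADER - TCP_HEADER


def mtu_from_mss(mss: int) -> int:
    """Size of the full datagram a peer builds from an advertised MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


print(mss_from_mtu(552))  # MSS that 2.2.16 advertised for the 552-byte PPP link
print(mtu_from_mss(536))  # datagram size implied by the 2.4.1 advertisement
```

With these numbers, an advertised mss of 536 invites 576-byte datagrams, 24 bytes more than the 552-byte PPP link can carry unfragmented.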


This is as far as my understanding of the situation reaches. There
appear to be 3 interdependent elements:

(1) the web browser
(2) the 2.4.x kernel
(3) the ISP transparent proxy server

Can anyone throw further light on this problem and/or suggest how to
fix it? I'll say straight away that it has nothing to do with ECN
since this has not been selected as a kernel option. Our analysis
seems to suggest that with 2.4.x the MTU is being incorrectly set, but
I don't know whether this is the whole explanation.

Thanks for any help you can provide...

Roger Young.
([email protected])

...................................................................
Motherboard: GA-6VX7-4X with Via Apollo Pro AGP chipset
CPU: P3/733 MHz
Memory: 256 MB SDRAM
Modem: Dynalink 56K external modem. Serial port IRQ4, I/O 03F8-03FF

Distribution: Slackware 7.1
Linux kernel(s): 2.4.1/2.4.0/2.2.16
PPP: 2.4.0. MTU 552
Netscape: 4.76
XFree86: 4.0.2
modutils: 2.4.1
binutils: 2.10.1




2001-02-14 20:52:16

by Alan

Subject: Re: MTU and 2.4.x kernel

> Kernel 2.4.x apparently disregards my ppp options MTU setting of 552
> and sets mss=536 (=> MTU=576). Kernel 2.2.16 sets mss=512 correctly.
> Is this a kernel bug or what?

The kernel is entitled to set an MSS that may cause fragmentation. So no,
it isn't a bug.

536 + 40 = 576

I'm not sure why it made that choice, but it is allowed to.
(cc'd to netdev to see if they know)

> Description: Typically Netscape/Lynx will connect to a remote site but
> will not download (it will hang indefinitely). When the browser is in

That typically indicates your ISP has path mtu problems.

> the browser is locked for almost all remote sites, I _am_ able to
> connect to (the web page of) the proxy server itself. And after I do
> this the browser is *unlocked*, and I can connect/download from any web
> address. However this only lasts for 5 minutes or so, after which time

That would be a cached pmtu for that connection. I suspect the connections
via the proxy server are not sending back valid ICMP fragmentation required
frames for path mtu discovery. That would suggest the problem is the ISP.
2.2 happened to cover this up for the case of a single host directly connected
to a modem with a low mtu.
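Alan's explanation relies on a host remembering a learned path MTU per destination only for a limited time. The sketch below models such a cache; the class, its method names and the 300-second lifetime are assumptions for illustration (chosen to resemble the "5 minutes or so" Roger observed), not the kernel's actual data structures or default.

```python
class PmtuCache:
    """Per-destination path-MTU cache with expiry (hypothetical sketch)."""

    def __init__(self, link_mtu: int, expires: float) -> None:
        self.link_mtu = link_mtu  # MTU of the outgoing interface
        self.expires = expires    # seconds a learned PMTU stays valid (assumed)
        self._entries = {}        # destination -> (pmtu, time it was learned)

    def learn(self, dest: str, pmtu: int, now: float) -> None:
        """Record a PMTU, e.g. from an ICMP 'fragmentation needed' message."""
        self._entries[dest] = (min(pmtu, self.link_mtu), now)

    def lookup(self, dest: str, now: float) -> int:
        """Return the PMTU to use for dest, falling back to the link MTU."""
        entry = self._entries.get(dest)
        if entry is None:
            return self.link_mtu
        pmtu, learned = entry
        if now - learned >= self.expires:
            del self._entries[dest]  # expired: forget the learned value
            return self.link_mtu
        return pmtu


cache = PmtuCache(link_mtu=1500, expires=300.0)
cache.learn("client", 552, now=0.0)
print(cache.lookup("client", now=60.0))   # still the learned 552
print(cache.lookup("client", now=400.0))  # expired, back to the link MTU
```

Once the entry expires, the sender again tries full-sized packets and depends on a fresh ICMP message to relearn the smaller path MTU, which is where a path that never sends valid ICMP replies breaks.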

Alan

2001-02-15 18:22:21

by Alexey Kuznetsov

Subject: Re: MTU and 2.4.x kernel

Hello!

> Kernel 2.4.x apparently disregards my ppp options MTU setting of 552
> and sets mss=536 (=> MTU=576).

Yes, the default configuration does not allow advertising an mss<536.
The limit is controlled via /proc/sys/net/ipv4/route/min_adv_mss,
you can change it to 256.
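A simplified model of the clamp Alexey describes: the advertised MSS is derived from the link MTU but never allowed below the min_adv_mss floor. This is a sketch that assumes a plain max() and option-free 40-byte headers, not the kernel's actual code; on a real 2.4 system the floor is changed by writing the new value (e.g. 256) to /proc/sys/net/ipv4/route/min_adv_mss as root.

```python
HEADERS = 40  # IPv4 (20) + TCP (20), no options


def advertised_mss(link_mtu: int, min_adv_mss: int) -> int:
    """MSS placed in an outgoing SYN, never below the min_adv_mss floor."""
    return max(link_mtu - HEADERS, min_adv_mss)


print(advertised_mss(552, 536))  # 2.4.x default floor of 536
print(advertised_mss(552, 256))  # floor lowered to 256
print(advertised_mss(296, 256))  # a 296-byte link with the lowered floor
```

With the floor at 536 the 552-byte PPP link advertises 536, as in Roger's trace; lowering it to 256 restores the 512 that 2.2.16 sent.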

The default of 536 is sadistic (and apparently will be changed eventually
to stop the tears of poor people whose providers not only supply them
with bogus mtu values such as 552 or even 296, but also jail them
behind some proxy or masquerading domain), but it is still right: IP
with an mtu lower than 576 is not fully functional.

Alexey

2001-02-15 18:48:34

by Alan

Subject: Re: MTU and 2.4.x kernel

> with bogus mtu values such as 552 or even 296, but also jail them
> behind some proxy or masquerading domain), but it is still right: IP
> with an mtu lower than 576 is not fully functional.

Please cite an exact RFC reference.

The 576-byte requirement is for reassembled packets handled by the host.
That is, if you send a 576-byte frame, you know the other end will be able
to put it back together. Our handling of DF on syn frames is also broken
due to that misassumption, but fortunately only for crazy mtus like 70.

Alan

2001-02-15 19:34:18

by Alexey Kuznetsov

Subject: Re: MTU and 2.4.x kernel

Hello!

> Please cite an exact RFC reference.

No need to cite an RFC; this is a plain syllogism.

A. Datagram protocols do not work with mtus that do not allow sending
512-byte frames (even DNS).
B. Accounting, classification and resource reservation do not work on
fragmented packets.

-> IP suite is not fully functional with low MTUs and must be eliminated.


The current setting of min_adv_mss to 536 is actually accidental.
I tested pmtu discovery on local clients using mtu 296 and did not
change the value to something less fascist afterwards. It turned out not to be a
mistake: I had some fun talking to people who suffer from the superstition
that "mtu 296 is good for..." (latency for example) 8)8)8)


> to put it back together. Our handling of DF on syn frames is also broken
> due to that misassumption, but fortunately only for crazy mtus like 70.

Right observation. It stops working even earlier: at mtu<128.
It is a strict limit. Pardon, but discussing marginal cases is useless.
If someone has a device with an mtu of 128, let him put it back
where he found it.

Preventing DoSes requires blocking pmtu discovery at 576 or at least 552.

The more practical question is mtu=296. There is an old myth that this value
is good for PPP. It is nothing but a myth: 14% overhead.
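The 14% figure is simply 40 bytes of IPv4 and TCP headers divided by a 296-byte packet. The sketch assumes no header compression and ignores PPP framing; with Van Jacobson TCP/IP header compression (RFC 1144) the real per-packet overhead on PPP links is much lower.

```python
HEADERS = 40  # IPv4 (20) + TCP (20), no options, no header compression


def header_overhead(mtu: int) -> float:
    """Fraction of each full-sized packet spent on IP and TCP headers."""
    return HEADERS / mtu


for mtu in (296, 552, 576, 1500):
    print(f"mtu {mtu}: {header_overhead(mtu):.1%}")
```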

I would prefer that the minimal MTU on the internet stayed at 576, which
is already a fact.

Alexey

2001-02-15 20:28:20

by Alan

Subject: Re: MTU and 2.4.x kernel

> A. Datagram protocols do not work with mtus that do not allow sending
> 512-byte frames (even DNS).

I ran DNS reliably over AX.25 networks. They have an MTU of 216. They work.
Please explain your claim in more detail. Please explain why the real world
is violating your version of the laws of physics.

> B. Accounting, classification and resource reservation do not work on
> fragmented packets.

That's a bug in accounting, classification and resource reservation. We expect
Cisco to fix their ECN bugs; I think it's rather cheeky to refuse to deal
with our own flaws.

> mistake: I had some fun talking to people who suffer from the superstition
> that "mtu 296 is good for..." (latency for example) 8)8)8)

Over a 9600 mobile phone link mtu 296 makes measurable differences to the
latency when mixing a mail fetch with typing. Over a radio link where
error rate causes exponential increases in probability of packet loss as
the frame length grows the issue is even more visible.
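Alan's latency point is about serialization delay: on a slow link, a keystroke packet queued behind a full-sized packet must wait until that whole packet has been clocked out. A minimal sketch, assuming 8 bits per byte on the wire and ignoring modem, framing and radio overheads (asynchronous start/stop bits would make the real figures larger):

```python
def serialization_delay(frame_bytes: int, link_bps: int) -> float:
    """Seconds needed to clock one frame onto a link of the given bit rate."""
    return frame_bytes * 8 / link_bps


LINK_BPS = 9600  # the 9600 bit/s mobile phone link from the thread
for mtu in (296, 576, 1500):
    # Worst case: a keystroke waits behind one full-sized mail packet.
    print(f"mtu {mtu}: {serialization_delay(mtu, LINK_BPS):.2f} s")
```

At 9600 bit/s a 1500-byte packet occupies the link for 1.25 s, against roughly 0.25 s for a 296-byte one.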

> Right observation. It stops working even earlier: at mtu<128.
> It is a strict limit. Pardon, but discussing marginal cases is useless.
> If someone has a device with an mtu of 128, let him put it back
> where he found it.

NetROM is MTU 128. It is a necessary but inconvenient limitation that would
require ripping out tens of thousands of nodes to fix.

> I would prefer that the minimal MTU on the internet stayed at 576, which
> is already a fact.

Only in your mind. Not in the real world. If you won't fix the TCP/IP code
to handle a 128 byte MTU properly then -ac will diverge from the main tree
because some of us want Linux to work on real world tricky environments.

If you want to argue that a MTU < 512 is hard to deal with by MTU discovery
you are right. So when you get a 'must fragment' below 512, just turn DF off
for that socket. It's not exactly a hard problem to solve properly.


I repeat my request. Cite the RFC number and line.

Alan

2001-02-15 20:41:04

by Rick Jones

Subject: Re: MTU and 2.4.x kernel

> The default of 536 is sadistic (and apparently will be changed eventually
> to stop the tears of poor people whose providers not only supply them
> with bogus mtu values such as 552 or even 296, but also jail them
> behind some proxy or masquerading domain), but it is still right: IP
> with an mtu lower than 576 is not fully functional.

I thought that the specs said that 576 was the "minimum maximum"
reassemblable IP datagram size and not a minimum MTU.

rick jones
--
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

2001-02-15 20:42:21

by Alexey Kuznetsov

Subject: Re: MTU and 2.4.x kernel

Hello!

> I ran DNS reliably over AX.25 networks. They have an MTU of 216. They work.

Please, Alan, distinguish two things: "works" and "works, until
I ask X". The second is equal to "does not".

512 is the maximal message size that is transmitted without trouble;
it is hardwired into almost all datagram protocols.


> > B. Accounting, classification and resource reservation do not work on
> > fragmented packets.
>
> That's a bug in accounting, classification and resource reservation.

Sorry? It is a bug in client mtu selection. The functions above are impossible
on fragmented packets even in theory. And because of A, if a client uses mtu
296, it cannot use 100% of emerging and existing IP functions.


> Over a 9600 mobile phone link mtu 296 makes measurable differences to the
> latency when mixing a mail fetch with typing.

It is a myth. Changing the mtu up to ~4K does not affect latency; it stays at 4K/bw.


> Over a radio link where
> error rate causes exponential increases in probability of packet loss as

Another myth. They all do error correction and have such high latency
that _increasing_ mtu only helps. And helps a lot.

When you have a 22Kbit link and 2 seconds of latency, the mtu must be large.



> NetROM is MTU 128.

I wrote "<". 8)


> If you want to argue that a MTU < 512 is hard to deal with by MTU discovery
> you are right. So when you get a 'must fragment' below 512, just turn DF off
> for that socket.

That is exactly what we do, Alan. 8)
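The behaviour Alan proposes, and which Alexey says the stack already implements, can be sketched as follows: honour an ICMP "fragmentation needed" report down to a floor, and below that floor stop shrinking the path MTU and clear DF so routers may fragment. RouteState, on_frag_needed and the 552 floor (one of the values Alexey mentions for min_pmtu) are hypothetical illustrations, not the kernel's code.

```python
from dataclasses import dataclass


@dataclass
class RouteState:
    pmtu: int     # path MTU currently used toward a destination
    set_df: bool  # whether outgoing packets carry Don't Fragment


def on_frag_needed(route: RouteState, reported_mtu: int, min_pmtu: int) -> RouteState:
    """React to an ICMP 'fragmentation needed' message (hypothetical helper)."""
    if reported_mtu >= min_pmtu:
        # Normal path MTU discovery: shrink to the reported size, keep DF.
        return RouteState(pmtu=min(route.pmtu, reported_mtu), set_df=True)
    # Below the floor: stop shrinking, send without DF, let routers fragment.
    return RouteState(pmtu=min(route.pmtu, min_pmtu), set_df=False)


route = RouteState(pmtu=1500, set_df=True)
print(on_frag_needed(route, 1006, 552))  # ordinary PMTU reduction, DF kept
print(on_frag_needed(route, 296, 552))   # below floor: 552 without DF
```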


> I repeat my request. Cite the RFC number and line.

I repeat my reply: it is the syllogism of A and B. See above.
You can write the RFC yourself. 8)

Alexey

2001-02-15 20:54:53

by Jordan Mendelson

Subject: Re: MTU and 2.4.x kernel

Rick Jones wrote:
>
> > The default of 536 is sadistic (and apparently will be changed eventually
> > to stop the tears of poor people whose providers not only supply them
> > with bogus mtu values such as 552 or even 296, but also jail them
> > behind some proxy or masquerading domain), but it is still right: IP
> > with an mtu lower than 576 is not fully functional.
>
> I thought that the specs said that 576 was the "minimum maximum"
> reassemblable IP datagram size and not a minimum MTU.

RFC 1191 (Path MTU Discovery as it happens):


Plateau    MTU    Comments                      Reference
------     ---    --------                      ---------
           65535  Official maximum MTU          RFC 791
           65535  Hyperchannel                  RFC 1044
65535
32000             Just in case
           17914  16Mb IBM Token Ring           ref. [6]
17914
           8166   IEEE 802.4                    RFC 1042
8166
           4464   IEEE 802.5 (4Mb max)          RFC 1042
           4352   FDDI (Revised)                RFC 1188
4352 (1%)
           2048   Wideband Network              RFC 907
           2002   IEEE 802.5 (4Mb recommended)  RFC 1042
2002 (2%)
           1536   Exp. Ethernet Nets            RFC 895
           1500   Ethernet Networks             RFC 894
           1500   Point-to-Point (default)      RFC 1134
           1492   IEEE 802.3                    RFC 1042
1492 (3%)
           1006   SLIP                          RFC 1055
           1006   ARPANET                       BBN 1822
1006
           576    X.25 Networks                 RFC 877
           544    DEC IP Portal                 ref. [10]
           512    NETBIOS                       RFC 1088
           508    IEEE 802/Source-Rt Bridge     RFC 1042
           508    ARCNET                        RFC 1051
508 (13%)
           296    Point-to-Point (low delay)    RFC 1144
296
68                Official minimum MTU          RFC 791
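RFC 1191 offers this table for hosts that get a "fragmentation needed" message from an older router that does not report the next-hop MTU: the host can then fall back to the next plateau below the size of the datagram that was too big. A minimal sketch of that lookup using the plateau column above (the function name is illustrative):

```python
import bisect

# Plateau column from the RFC 1191 table above, in ascending order.
PLATEAUS = [68, 296, 508, 1006, 1492, 2002, 4352, 8166, 17914, 32000, 65535]


def next_lower_plateau(too_big: int) -> int:
    """Largest plateau strictly below the size of a rejected datagram."""
    i = bisect.bisect_left(PLATEAUS, too_big)  # index of first plateau >= too_big
    if i == 0:
        return PLATEAUS[0]  # never go below the official minimum of 68
    return PLATEAUS[i - 1]


print(next_lower_plateau(1500))  # an Ethernet-sized datagram was rejected
print(next_lower_plateau(576))
```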


Jordan

2001-02-15 21:02:33

by Alan

Subject: Re: MTU and 2.4.x kernel

> > I ran DNS reliably over AX.25 networks. They have an MTU of 216. They work.
>
> 512 is the maximal message size that is transmitted without trouble;
> it is hardwired into almost all datagram protocols.

Message size != MTU. DNS doesn't use DF. In fact DNS can even fall back to
TCP.
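Alan's point that message size is not the same as MTU can be made concrete by counting IP fragments. The sketch assumes option-free IPv4 (20 bytes) and UDP (8 bytes) headers; every fragment except the last must carry a multiple of 8 data bytes.

```python
import math

IP_HEADER = 20  # bytes, IPv4 without options
UDP_HEADER = 8  # bytes


def fragment_count(udp_payload: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    data = UDP_HEADER + udp_payload              # everything after the IP header
    per_fragment = (mtu - IP_HEADER) // 8 * 8    # non-final fragments: multiple of 8
    return math.ceil(data / per_fragment)


print(fragment_count(512, 216))  # full-size DNS reply over AX.25 (MTU 216)
print(fragment_count(512, 576))
```

A 512-byte DNS reply over an MTU of 216 needs three fragments, and it arrives only if every one of them does.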

> > > B. Accounting, classification and resource reservation do not work on
> > > fragmented packets.
> > That's a bug in accounting, classification and resource reservation.
> Sorry? It is a bug in client mtu selection. The functions above are impossible
> on fragmented packets even in theory. And because of A, if a client uses mtu
> 296, it cannot use 100% of emerging and existing IP functions.

Tragic. You are required to accept existing realities and degrade nicely.

> > Over a 9600 mobile phone link mtu 296 makes measurable differences to the
> > latency when mixing a mail fetch with typing.
>
> It is a myth. Changing the mtu up to ~4K does not affect latency; it stays at 4K/bw.

Please tell that to my phone.

> > Over a radio link where
> > error rate causes exponential increases in probability of packet loss as
>
> Another myth. They all do error correction and have such high latency
> that _increasing_ mtu only helps. And helps a lot.

No. There are large amounts of real world hardware for which this is not true.
You cannot do good FEC on a narrow band link.
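The radio-link argument rests on a simple model: with independent bit errors and no forward error correction, the chance that a frame arrives intact falls off exponentially with its length. The sketch uses an assumed bit error rate of 1e-4 purely for illustration; it is not a measurement of any real link.

```python
def frame_success(frame_bytes: int, bit_error_rate: float) -> float:
    """Probability a frame arrives intact, assuming independent bit errors, no FEC."""
    return (1.0 - bit_error_rate) ** (8 * frame_bytes)


BER = 1e-4  # assumed raw bit error rate for a noisy narrow-band radio link
for size in (128, 296, 576, 1500):
    print(f"{size:5d} bytes: {frame_success(size, BER):.1%} delivered intact")
```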

Alan

2001-02-16 12:50:49

by Rik van Riel

Subject: Re: MTU and 2.4.x kernel

On Thu, 15 Feb 2001 [email protected] wrote:

> -> IP suite is not fully functional with low MTUs and must be eliminated.

Wouldn't it be simpler to just fix the bugs instead of
eliminating the entire Linux IP suite ;)


Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-02-16 12:53:09

by Rik van Riel

Subject: Re: MTU and 2.4.x kernel

On Thu, 15 Feb 2001 [email protected] wrote:

> > Over a 9600 mobile phone link mtu 296 makes measurable differences to the
> > latency when mixing a mail fetch with typing.
>
> It is a myth.

> > Over a radio link where
> > error rate causes exponential increases in probability of packet loss as
>
> Another myth.

I've seen both these "myth"s happen in practice.


Rik

2001-02-18 09:41:40

by David Miller

Subject: Re: MTU and 2.4.x kernel


[email protected] writes:
> A. Datagram protocols do not work with mtus that do not allow sending
> 512-byte frames (even DNS).

This smells bad. Datagram protocol send sizes are only limited by
socket buffer size, nothing more. Fragmentation makes it work.

If you are really talking about side effects of UDP path-mtu, then I
will turn off UDP path-mtu by default in 2.4.x because it is obviously
very broken either conceptually or in our implementation. :-)

Later,
David S. Miller
[email protected]

2001-02-18 11:25:30

by Pierfrancesco Caci

Subject: Re: MTU and 2.4.x kernel



:-> "kuznet" == kuznet <[email protected]> writes:

>> Over a radio link where
>> error rate causes exponential increases in probability of packet loss as

> Another myth. They all do error correction and have such high latency
> that _increasing_ mtu only helps. And helps a lot.

Please don't break existing implementations. Some old hardware used in
the amateur radio world doesn't even accept an mtu longer than 256(*),
and the resulting packets will be silently chopped at the end.
If you want to drop mtu lower than 512, please at least add a
CONFIG_I_NEED_A_GODDAMN_SMALL_MTU as an option.

Pf

(*) Kantronics TNCs are an example.




--

-------------------------------------------------------------------------------
Pierfrancesco Caci | ik5pvx | mailto:[email protected] - http://gusp.dyndns.org
Firenze - Italia | Office for the Complication of Otherwise Simple Affairs
Linux penny 2.4.1 #1 Sat Feb 3 20:43:54 CET 2001 i686 unknown

2001-02-18 19:54:26

by Alexey Kuznetsov

Subject: Re: MTU and 2.4.x kernel

Hello!

> Message size != MTU.

Alan, you misunderstand the _sense_ of the problem.

Fragmentation does _not_ work on the poor internet any more. At all.
Look at the original report. It failed _only_ because his intermediate
node failed to forward fragmented packets.

Alexey

2001-02-18 19:56:26

by Alexey Kuznetsov

Subject: Re: MTU and 2.4.x kernel

Hello!

> Wouldn't it be simpler to just fix the bugs

There are no bugs.

There is a philosophical discussion about the current state of internet
communications.

Alexey

2001-02-18 20:18:43

by Alexey Kuznetsov

Subject: Re: MTU and 2.4.x kernel

Hello!

> This smells bad. Datagram protocol send sizes are only limited by
> socket buffer size, nothing more. Fragmentation makes it work.

The thread was started by the observation that fragmented frames
do _not_ pass through the router. See? 8)

Path mtu discovery exists exactly to help solve this problem.

In this case the mtu is too low to be accepted by pmtu discovery,
so we simply disable it and start to fragment, exactly as if
pmtu discovery were disabled completely. With all the consequences.

So the workaround is not to _disable_ path mtu discovery,
but to _enforce_ it, by changing thresholds.

There is nothing to argue about. min_adv_mss must be changed to 256
in the production version, no doubt. min_pmtu must stay at its current value
of 576. That's all.

Alexey

2001-02-18 20:33:34

by Alexey Kuznetsov

Subject: Re: MTU and 2.4.x kernel

Hello!

> Please cite an exact RFC reference.

Imagine, I have found this reference after all. It is RFC 1191, of course. 8)

in the MSS option. The MSS option should be 40 octets less than the
size of the largest datagram the host is able to reassemble (MMS_R,
as defined in [1]); in many cases, this will be the architectural
limit of 65495 (65535 - 40) octets.

Alexey


PS: But:

A host MAY send an MSS value
derived from the MTU of its connected network (the maximum MTU over
its connected networks, for a multi-homed host); this should not
cause problems for PMTU Discovery, and may dissuade a broken peer
from sending enormous datagrams.

Note: At the moment, we see no reason to send an MSS greater
than the maximum MTU of the connected networks, and we
recommend that hosts do not use 65495. It is quite possible
that some IP implementations have sign-bit bugs that would be
tickled by unnecessary use of such a large MSS.

2001-02-19 01:45:37

by Alan

Subject: Re: MTU and 2.4.x kernel

> Fragmentation does _not_ work on the poor internet any more. At all.

We are implementing an IP stack. Fragmentation works very well, thank you;
pointing at a few broken sites as an excuse not to do things right isn't
very good.

Alan

2001-02-19 18:27:40

by Alexey Kuznetsov

Subject: Re: MTU and 2.4.x kernel

Hello!

> We are implementing an IP stack.

Alan, please tell me what is wrong, and we will repair it.

The implementation follows the RFCs and even relaxes their requirements
in cases where they are far from reality.

Alexey

2001-02-19 22:20:51

by Rick Jones

Subject: Re: MTU and 2.4.x kernel

the TCP code should be "honouring" the link-local MTU in its selection
of MSS.

rick jones