The following patch changes the default ip_default_ttl from 64 to 128 hops.
This matches the default of many modern client OSes seen by my server farms,
and the logic is that anything capable of reaching me should also be
reachable by me.

This has considerably reduced the number of ICMP messages reporting a packet
expired in transit from my server farms. It looks like there are a lot of
clients out there running (apparently) modern Microsoft OS versions on
networks with a lot of hops (more than 64).
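(As an aside, for anyone who wants to watch these expirations arrive: ICMP
"time exceeded" is type 11, so a capture filter along the lines of the one
below will show them; the interface name is just an example.)

# watch ICMP time-exceeded messages arriving on eth0 (interface is an example)
tcpdump -n -i eth0 'icmp[0] == 11'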
--- linux/include/linux/ip.h.orig Sun Jul 15 08:41:07 2001
+++ linux/include/linux/ip.h Sun Jul 15 08:41:40 2001
@@ -65,7 +65,7 @@
#define IPVERSION 4
#define MAXTTL 255
-#define IPDEFTTL 64
+#define IPDEFTTL 128
/* struct timestamp, struct route and MAX_ROUTES are removed.
George Bonser writes:
> This has considerably reduced the number of ICMP messages reporting a
> packet expired in transit from my server farms. It looks like there are a
> lot of clients out there running (apparently) modern Microsoft OS versions
> on networks with a lot of hops (more than 64).
Why are there 64 friggin hops between machines in your server farm?
That is what I want to know. It makes no sense, even over today's
internet, to have more than 64 hops between two sites.
Later,
David S. Miller
[email protected]
> Why are there 64 friggin hops between machines in your server farm?
> That is what I want to know. It makes no sense, even over today's
> internet, to have more than 64 hops between two sites.
>
> Later,
> David S. Miller
> [email protected]
I have NO idea and feel the same way. Some of the clients might be buried in
some net inside India or China or the US or some other place with some goofy
internal net ... I dunno. All I know is that MicroSquish set their default
TTL to 128 and there APPEAR to be people reaching me that are more than 64
hops away and that are in fact reachable when I increase the TTL.
On Sun, 15 Jul 2001, David S. Miller wrote:
> Why are there 64 friggin hops between machines in your server farm?
I think he is referring to the number of hops between him and clients
accessing his servers.
> That is what I want to know. It makes no sense, even over today's
> internet, to have more than 64 hops between two sites.
I have seen sites that are 35 hops away. I'd say it's unlikely
to have more than 64 hops between you and any machine on the internet, but
if this guy is seeing ICMP Time Exceeded messages and they lessened when he
changed the TTL, then I guess there actually ARE machines out there behind a
lot of IP hops.
What problems could occur from raising it to 128? I'd imagine routing
loops might mean a bit more traffic, but if other major OSes are at TTL
128 and someone is actually having trouble with 64, then why not raise it?
--
Mikael Abrahamsson email: [email protected]
On Sun, Jul 15, 2001 at 03:03:31AM -0700, David S. Miller wrote:
>
> George Bonser writes:
> > This has considerably reduced the number of ICMP messages reporting a
> > packet expired in transit from my server farms. It looks like there are
> > a lot of clients out there running (apparently) modern Microsoft OS
> > versions on networks with a lot of hops (more than 64).
>
> Why are there 64 friggin hops between machines in your server farm?
> That is what I want to know. It makes no sense, even over today's
> internet, to have more than 64 hops between two sites.
I seem to recall seeing an NT box set up as a router that decided to
decrement the TTL by 128 every time instead of by 1.
> What problems could occur from raising it to 128? I'd imagine routing
> loops might mean a bit more traffic, but if other major OSes are at TTL
> 128 and someone is actually having trouble with 64, then why not raise it?
I just did a traceroute to one of the IP addresses that fails with a TTL of
64 ... it is in India but the traceroute ends with a different IP address in
less than 16 hops ... proxy arp ???
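(For anyone wanting to try the same kind of check: traceroute can cap its
maximum TTL, and Linux ping can set the outgoing TTL explicitly; the
hostname below is only a placeholder.)

# stop probing after 64 hops (client.example.net is a placeholder)
traceroute -m 64 client.example.net
# send three pings with an outgoing TTL of 64 (Linux iputils ping)
ping -t 64 -c 3 client.example.net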
Anyway ... the address in question is able to access my server farm
with a TTL of 128 but not with 64. I have NO IDEA what those people are
doing inside their net ... and really do not care. The bottom line as far as
I am concerned is that if they can reach me, I should be able to reach them
... and with a TTL of 128, it appears that I can.
George Bonser writes:
> While it is still the wee hours ... 4am here ... the change in TTL has
> resulted in a 10% increase in bandwidth to my server farms so far. It
> appears to be a substantial improvement.
These people need to fix their systems and routes.
As it stands they have put themselves in a position where they cannot
reach any existing Linux system running the default TTL over routes that
exceed 64 hops.
I mean, even if I did change the default ttl, this would still leave
them with the problem for all existing sites.
Later,
David S. Miller
[email protected]
While it is still the wee hours ... 4am here ... the change in TTL has
resulted in a 10% increase in bandwidth to my server farms so far. It
appears to be a substantial improvement.
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of David S. Miller
> Sent: Sunday, July 15, 2001 3:04 AM
> To: George Bonser
> Cc: [email protected]
> Subject: Re: [PATCH] Linux default IP ttl
>
>
>
> George Bonser writes:
> > This has considerably reduced the number of ICMP messages reporting a
> > packet expired in transit from my server farms. It looks like there are
> > a lot of clients out there running (apparently) modern Microsoft OS
> > versions on networks with a lot of hops (more than 64).
>
> Why are there 64 friggin hops between machines in your server farm?
> That is what I want to know. It makes no sense, even over today's
> internet, to have more than 64 hops between two sites.
>
> Later,
> David S. Miller
> [email protected]
>
> I mean, even if I did change the default ttl, this would still leave
> them with the problem for all existing sites.
>
> Later,
> David S. Miller
> [email protected]
Only Linux sites ... Since NT 4.0, Microsoft has used a default TTL of 128.
George Bonser writes:
> Only Linux sites ...
Ie. a significant percentage of the net.
Later,
David S. Miller
[email protected]
Right, which is why I suggest that Linux match that default TTL ...
otherwise people will think that Microsoft sites are more reliable than
Linux sites, since they (the users) can reach more sites running Microsoft
servers than they can sites running Linux servers.
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of David S. Miller
> Sent: Sunday, July 15, 2001 4:17 AM
> To: George Bonser
> Cc: [email protected]
> Subject: RE: [PATCH] Linux default IP ttl
>
>
>
> George Bonser writes:
> > Only Linux sites ...
>
> Ie. a significant percentage of the net.
>
> Later,
> David S. Miller
> [email protected]
It really does not matter all that much to me; I can simply:
echo 128 >/proc/sys/net/ipv4/ip_default_ttl
on every single one of my servers.
But I thought I would "share the wealth" with other admins out there and
make that the kernel default.
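(For reference, a persistent variant of the same thing, assuming a
distribution that reads /etc/sysctl.conf at boot:)

# /etc/sysctl.conf entry -- same effect as the echo above, applied at boot
net.ipv4.ip_default_ttl = 128
# apply it immediately without waiting for a reboot
sysctl -p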
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of David S. Miller
> Sent: Sunday, July 15, 2001 4:17 AM
> To: George Bonser
> Cc: [email protected]
> Subject: RE: [PATCH] Linux default IP ttl
>
>
>
> George Bonser writes:
> > Only Linux sites ...
>
> Ie. a significant percentage of the net.
>
> Later,
> David S. Miller
> [email protected]
George Bonser writes:
> Right, which is why I suggest that Linux match that default TTL ...
Then the broken sites will surely never get fixed.
Later,
David S. Miller
[email protected]
George Bonser writes:
> But I thought I would "share the wealth" with other admins out there and
> make that the kernel default.
How about "sharing the wealth" with the broken sites/routers instead?
Later,
David S. Miller
[email protected]
Hehe, hey, it's not MY routers/sites that are broken. Look at it this way ...
Microsoft must have figured out that there were some broken nets out there
and that setting a default TTL of 128 made their stuff work. I also noticed
that setting my TTL to 128 made my stuff work. I am not on any kind of
crusade to make people do "the right thing". I am simply trying to make my
site work with the maximum number of possible clients.
I really do not care WHY it works; all I care about is that it DOES work. I
am not the least bit interested, given the current economy of things, in
trying to bully people into doing what is right. I am more interested in
operating with the client population that is out there without having to
make them change anything.
> -----Original Message-----
> From: David S. Miller [mailto:[email protected]]
> Sent: Sunday, July 15, 2001 4:31 AM
> To: George Bonser
> Cc: [email protected]
> Subject: RE: [PATCH] Linux default IP ttl
>
>
>
> George Bonser writes:
> > But I thought I would "share the wealth" with other admins out there and
> > make that the kernel default.
>
> How about "sharing the wealth" with the broken sites/routers instead?
>
> Later,
> David S. Miller
> [email protected]
George Bonser writes:
> I really do not care WHY it works; all I care about is that it DOES work.
> I am not the least bit interested, given the current economy of things, in
> trying to bully people into doing what is right. I am more interested in
> operating with the client population that is out there without having to
> make them change anything.
You do of course realize that your problem was caused by other people
who probably have exactly the same attitude as you do -- they didn't
care whether they were doing the right thing, they just slapped together
something that worked, even if it did introduce way too many routing
hops. So you're introducing a kludge to counteract their kludge, and
eventually this all turns into a big pile of kludges that doesn't work.
To the extent that the Internet works today, it's because people have
chosen to do the right thing instead of just the thing that works.
Encouraging (not "bullying") other people to do the right thing is
always a good idea.
On Sun, 15 Jul 2001, Steve VanDevender wrote:
> To the extent that the Internet works today, it's because people have
> chosen to do the right thing instead of just the thing that works.
> Encouraging (not "bullying") other people to do the right thing is
> always a good idea.
However, sometimes doing the right thing will cause you to lose
the war. I recall that early Solaris systems had a problem,
the details of which I forget, where web browsers of a certain
very large company would fail. Apparently the Solaris TCP/IP stack
was strictly adhering to the RFCs; it was the other large company's
stack that didn't conform. If memory serves, there was a raging
discussion at the time about whether this non-conformance was
intentional, in an effort to make Solaris look like an inferior web
server platform. Solaris bowed to the inevitable.
Thus, we have the possibility that parameters may get modified to
gain competitive advantage. While it's nice to stand on principle,
is that really what you want to do in this case?
Regards,
Lew Wolfgang
>
> To the extent that the Internet works today, it's because people have
> chosen to do the right thing instead of just the thing that works.
> Encouraging (not "bullying") other people to do the right thing is
> always a good idea.
I have no evidence to say that their network is not configured properly.
For all I know they have ethernet cable strung from one computer to the next
and it takes 64 hops just to get out of the village. I did a traceroute; it
stopped at an address that would lead me to think, based on experience and
intuition, that it is a gateway or firewall of some sort (a .1 IP
address). The traceroute actually terminated properly at that address, so it
might be some kind of virtual IP that it is handling. I could not see what
was beyond it.
The fact that it seemed to work after increasing the TTL told me that they
did not have a hard loop somewhere, and that there really did seem to be
either a large number of hops or something decrementing the hop count by
more than one along the way.
Had the number of ICMP messages not decreased, I would have written it
off as misconfigured nets, routing loops, or whatever and not bothered to
change the TTL. The bottom line is that 85% of the client software now
shipping (I think I saw recently that Microsoft has 85% of the desktops)
has a default TTL of 128. Having Linux client and server software not
work in the same net where another popular OS does would give someone in
that situation the idea that Linux is somehow inferior both as a client and
as a server. They might not immediately draw the conclusion that their
network is inferior. Or, if they DO, they might think that MS is more
"robust" in being able to tolerate their rickety net.
Regardless of who is "right", if there is a substantial population of
software out there that can reach me but that I cannot reply to, then
adapting my software to the situation is much easier than trying to get them to
change their network design, software, or equipment. They probably have the
"wrong" religion, wear the "wrong" clothes, eat the "wrong" food, and a
whole bunch of other "wrong" stuff too besides using the "wrong" OS but I
would still like to engage in communications with them.
Probably the only reason I noticed it was that the farm in question gets
about 25 million requests a day (last time I checked), so it has a pretty
good chance of seeing any problems that might exist out there in the world.
> However, sometimes doing the right thing will cause you to lose
> the war. I recall that early Solaris systems had a problem,
> the details of which I forget, where web browsers of a certain
> very large company would fail. Apparently the Solaris TCP/IP stack
> was strictly adhering to the RFCs; it was the other large company's
> stack that didn't conform. If memory serves, there was a raging
> discussion at the time about whether this non-conformance was
> intentional, in an effort to make Solaris look like an inferior web
> server platform. Solaris bowed to the inevitable.
I *THINK* there was some Path MTU Discovery thing that did not always work
properly with some BSD 4.2-derived stacks. It has been a long time.
> You do of course realize that your problem was caused by other people
> who probably have exactly the same attitude as you do -- they didn't
> care whether they were doing the right thing, they just slapped together
> something that worked, even if it did introduce way too many routing
> hops. So you're introducing a kludge to counteract their kludge, and
> eventually this all turns into a big pile of kludges that doesn't work.
One other thing is that I have no proof that they are reaching that server
farm from a conventional web browser or conventional network. They might be
using some kind of cellular modem attached to a phone or some other wireless
access that might be making several hops to get the data to them. I simply
have no idea. All I know is that increasing the default hop count makes it
work, and that is good enough for me. If that person can access a Win2k web server
but not my Linux server, there might be a business case against using Linux.
Of course I can always manually bump the default TTL but not every admin of
a website will know to do that. I am just trying to help Linux gain the
maximum possible acceptance by working with the maximum possible number of
clients with the least amount of fuss.
On Sun, Jul 15, 2001 at 03:03:31AM -0700, David S. Miller wrote:
> George Bonser writes:
> > This has considerably reduced the number of ICMP messages reporting a
> > packet expired in transit from my server farms. It looks like there are
> > a lot of clients out there running (apparently) modern Microsoft OS
> > versions on networks with a lot of hops (more than 64).
>
> Why are there 64 friggin hops between machines in your server farm?
> That is what I want to know. It makes no sense, even over today's
> internet, to have more than 64 hops between two sites.
While I haven't measured it, I'd almost be willing to bet my right arm that
you'll find something like that in the AMPR.org packet radio network,
which has at least one /8 network and so is a considerable part of the
internet's IP space.
Ralf