2008-06-15 20:57:52

by David Newall

Subject: Feedback on TCP: Make TCP_RTO_MAX a variable

Last year, Obata Noboru sent a patch to permit adjustment of
TCP_RTO_MAX, which I have found useful. Refer to
http://marc.info/?l=linux-netdev&m=118422471428855 for details.

A customer reported that their internet-connected POS terminals were
regularly "freezing" for extended periods, sometimes for as long as a
few minutes. My analysis, such as it was, suggested that those
occasions were caused by floods of packets directed towards the internet
link at one end or the other (i.e. POS terminal or central server),
leading to severe packet loss and maximum packet retransmit times during
which no session data could be transmitted. I believe those floods were
caused by anonymous third parties scanning the internet, and attempting
to break through my client's routers. I also believe that to be an
unavoidable social quality of the internet; I have to live with it.

Having a "cash register" randomly freeze for minutes at a time is not
acceptable, and neither does it seem necessary. Using Obata Noboru's
patch, I set TCP_RTO_MAX to 5 seconds at both ends. The system has been
running thus for five weeks, and I have not been called by my customer
since. While this change obviously did nothing to solve the underlying
problem of temporary link congestion (it has no solution), it did remove
a frequent, multi-minute pause. Perhaps surprisingly, I have not heard
of sessions being dropped, which could be expected to occur as a
consequence of the substantially reduced retransmit times. This might
be luck, and sessions aren't dropping; or it might be insufficiently
important (annoying) for my client to report; the application would
restart quickly. Either way, apparently my client no longer has a problem.

I acknowledge that this patch must exacerbate an already hopeless
situation: A link is congested and I am causing packets to be sent at
five-second intervals instead of 10, 20, 40, 80 or 120. I am
unconcerned by this because the number of additional packets is
minuscule when compared to the number of packets that caused the problem
in the initial instance. I do not know how 120 seconds was chosen for
the RTO maximum but I observe that network bandwidth has increased by
orders of magnitude since it was, and feel that a corresponding decrease
in RTO is fair. I put it to administrators everywhere to consider this
when faced with similar problems.
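
To make the arithmetic above concrete, here is a minimal sketch of a capped exponential back-off. It is a simplified model, not kernel code: the function names are made up for illustration, the 5-second starting timeout is an assumption chosen only to match the figures quoted above, and the limit on the number of retransmissions is ignored.

```python
def rto_schedule(initial_rto, rto_max, attempts):
    """Return the wait (seconds) before each successive retransmission.

    Simplified model: the timeout doubles after every unanswered
    retransmission and is clamped to rto_max.
    """
    waits = []
    rto = min(initial_rto, rto_max)
    for _ in range(attempts):
        waits.append(rto)
        rto = min(rto * 2, rto_max)
    return waits


def retransmits_during(outage, initial_rto, rto_max):
    """Count the retransmissions sent during an outage lasting `outage` seconds."""
    elapsed, count = 0, 0
    rto = min(initial_rto, rto_max)
    while elapsed + rto <= outage:
        elapsed += rto
        count += 1
        rto = min(rto * 2, rto_max)
    return count


if __name__ == "__main__":
    # Hypothetical 5 s starting timeout, maximum of 120 s versus 5 s.
    print(rto_schedule(5, 120, 6))           # [5, 10, 20, 40, 80, 120]
    print(rto_schedule(5, 5, 6))             # [5, 5, 5, 5, 5, 5]
    # Retransmissions per connection during a five-minute outage.
    print(retransmits_during(300, 5, 120))   # 6
    print(retransmits_during(300, 5, 5))     # 60
```

Under these assumptions a five-minute outage costs 60 retransmissions per connection instead of 6, which is the sense in which the extra load is small next to a flood.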

It's a pity that Obata Noboru's patch was rejected.

Thank you, Noboru.


2008-06-16 00:50:58

by Chris Fowler

Subject: Re: Feedback on TCP: Make TCP_RTO_MAX a variable

My only question would be that if you did not have this feature how
would you have solved your problem?

Chris

2008-06-16 02:52:13

by Stephen Hemminger

Subject: Re: Feedback on TCP: Make TCP_RTO_MAX a variable

On Mon, 16 Jun 2008 06:27:35 +0930
David Newall <[email protected]> wrote:

> Last year, Obata Noboru sent a patch to permit adjustment of
> TCP_RTO_MAX, which I have found useful. Refer to
> http://marc.info/?l=linux-netdev&m=118422471428855 for details.
>
> A customer reported that their internet-connected POS terminals were
> regularly "freezing" for extended periods, sometimes for as long as a
> few minutes. My analysis, such as it was, suggested that those
> occasions were caused by floods of packets directed towards the internet
> link at one end or the other (i.e. POS terminal or central server),
> leading to severe packet loss and maximum packet retransmit times during
> which no session data could be transmitted. I believe those floods were
> caused by anonymous third parties scanning the internet, and attempting
> to break through my client's routers. I also believe that to be an
> unavoidable social quality of the internet; I have to live with it.

Why are you letting them through? Use proper firewalling.

> Having a "cash register" randomly freeze for minutes at a time is not
> acceptable, and neither does it seem necessary. Using Obata Noboru's
> patch, I set TCP_RTO_MAX to 5 seconds at both ends. The system has been
> running thus for five weeks, and I have not been called by my customer
> since. While this change obviously did nothing to solve the underlying
> problem of temporary link congestion (it has no solution), it did remove
> a frequent, multi-minute pause. Perhaps surprisingly, I have not heard
> of sessions being dropped, which could be expected to occur as a
> consequence of the substantially reduced retransmit times. This might
> be luck, and sessions aren't dropping; or it might be insufficiently
> important (annoying) for my client to report; the application would
> restart quickly. Either way, apparently my client no longer has a problem.
>

A real VPN with IPsec would have stopped the problem.
I wouldn't expose a mission-critical system directly to the Internet.

> I acknowledge that this patch must exacerbate an already hopeless
> situation: A link is congested and I am causing packets to be sent at
> five-second intervals instead of 10, 20, 40, 80 or 120. I am
> unconcerned by this because the number of additional packets is
> minuscule when compared to the number of packets that caused the problem
> in the initial instance. I do not know how 120 seconds was chosen for
> the RTO maximum but I observe that network bandwidth has increased by
> orders of magnitude since it was, and feel that a corresponding decrease
> in RTO is fair. I put it to administrators everywhere to consider this
> when faced with similar problems.
>
> It's a pity that Obata Noboru's patch was rejected.

Linux already doesn't follow enough RFCs. But it is free software, so
you can do what you want. That is the beauty of it.

2008-06-16 07:33:35

by David Newall

Subject: Re: Feedback on TCP: Make TCP_RTO_MAX a variable

Stephen Hemminger wrote:
> On Mon, 16 Jun 2008 06:27:35 +0930
> David Newall <[email protected]> wrote:
>
>> ... caused by floods of packets directed towards the internet
>> link at one end or the other
> Why are you letting them through? Use proper firewalling.
>

They didn't get through the router. These floods congested the border
links (devices).

> A real VPN with IPsec would have stopped the problem.
>

No, it wouldn't. If you don't see this, ask and I'll explain, again.


> I wouldn't expose a mission-critical system directly to the Internet.
>

I didn't. Standard NAT appliances protect all ends.

2008-06-16 07:40:48

by David Newall

Subject: Re: Feedback on TCP: Make TCP_RTO_MAX a variable

Chris Fowler wrote:
> My only question would be that if you did not have this feature how
> would you have solved your problem?


I have no idea. An option would be to use private managed links, but
that would cost about 100 times as much, and I would not be able to
administer the machines via the internet.

With programs that scan and probe the internet widely available for
every script kiddie's entertainment, I believe this is, as I said, a
standard characteristic of the internet, and I must live with it. Had
this been an actual DoS attack on my client, rather than the normal,
random attention from said script kiddies, reducing TCP_RTO_MAX would
not have helped.

2008-06-16 07:52:48

by David Miller

Subject: Re: Feedback on TCP: Make TCP_RTO_MAX a variable


linux-net is for user questions rather than development discussion;
please use netdev instead to reach the developers.

2008-06-16 14:50:09

by Noboru OBATA

Subject: Re: Feedback on TCP: Make TCP_RTO_MAX a variable

> Last year, Obata Noboru sent a patch to permit adjustment of
> TCP_RTO_MAX, which I have found useful. Refer to
> http://marc.info/?l=linux-netdev&m=118422471428855 for details.
>
> A customer reported that their internet-connected POS terminals were
> regularly "freezing" for extended periods, sometimes for as long as a
> few minutes. My analysis, such as it was, suggested that those
> occasions were caused by floods of packets directed towards the internet
> link at one end or the other (i.e. POS terminal or central server),
> leading to severe packet loss and maximum packet retransmit times during
> which no session data could be transmitted. I believe those floods were
> caused by anonymous third parties scanning the internet, and attempting
> to break through my client's routers. I also believe that to be an
> unavoidable social quality of the internet; I have to live with it.

I found your feedback interesting, David.

The wireless people gave me similar feedback when I first
posted the patch. They said my patch helps TCP connections
recover from bursty loss much faster than normal TCP
behavior. They found my patch useful because packet loss on
wireless is not necessarily caused by link congestion, but
largely by temporary radio noise or handover between base
stations.

It seems to me that, in your situation, your connection itself does
not contribute to the congestion, but other bursty incoming traffic
does. So exponential back-off does not substantially reduce the
packet loss.

My motivation for the patch is different, however. I wanted TCP
to retransmit the packet shortly after a failover of the
underlying network.
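
As a rough illustration of the failover case, the sketch below estimates how long a connection stays silent after the network comes back, using a simplified back-off model. It is not the patch itself: the function name, the 5-second starting timeout and the recovery time are hypothetical values picked for the example.

```python
def delay_after_recovery(recovery_time, initial_rto, rto_max):
    """Seconds between network recovery and the first retransmission after it.

    Simplified model: the first loss happens at t=0, and the timeout
    doubles after each unanswered retransmission, clamped to rto_max.
    """
    if initial_rto <= 0 or rto_max <= 0:
        raise ValueError("timeouts must be positive")
    t = 0
    rto = min(initial_rto, rto_max)
    while True:
        t += rto                      # time of the next retransmission
        if t >= recovery_time:
            return t - recovery_time  # idle time after the network is back
        rto = min(rto * 2, rto_max)


if __name__ == "__main__":
    # Hypothetical failover that completes 156 s after the first loss.
    print(delay_after_recovery(156, 5, 120))  # 119
    print(delay_after_recovery(156, 5, 5))    # 4
```

In this model, with a 120-second maximum the retransmissions fall at 5, 15, 35, 75, 155 and 275 seconds, so a network that recovers just after the 155-second attempt is left unused for about two minutes. With a 5-second maximum the wait is never more than 5 seconds.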

I find it interesting that in all the cases where my patch
helps people, the TCP connection in question is not really
part of the congestion and has nothing to do with the packet loss
the connection is experiencing.

Regards,

--
Noboru OBATA ([email protected])