2009-09-24 17:49:24

by Mike Caoco

Subject: TCP stack bug related to F-RTO?

Hello,

I have seen the following behavior with several versions of the Linux kernel. The attached pcap trace was collected with the server (192.168.0.13) running 2.6.24 and shows the problem. The behavior is as follows:

1. The client opens up a big window.
2. The server sends 19 packets in a row (pkt #14-#32 in the trace), but all of them are dropped due to congestion.
3. The server hits RTO and retransmits pkt #14 in #33.
4. The client immediately acks #33 (=#14), and the server (which seems to enter F-RTO) expands the window and sends *NEW* pkts #35 and #36. The timeout is doubled to 2*RTO. The client immediately sends two dup-ACKs for #35 and #36.
5. After 2*RTO, pkt #15 is retransmitted in #39.
6. The client immediately acks #39 (=#15) in #40, and the server continues to expand the window and sends two *NEW* pkts, #41 and #42. Now the timeout is doubled to 4*RTO.
8. After the 4*RTO timeout, #16 is retransmitted.
9. ...
10. The above steps repeat for retransmitting pkts #16-#32, and each time the timeout is doubled.
11. It takes a very long time to retransmit all the lost packets, and before that is done, the client sends a RST because of a timeout.

The above behavior looks like F-RTO is in effect, and there seems to be a bug in TCP's congestion control and retransmission algorithm. Why doesn't TCP on the server (running 2.6.24) enter slow start? Why should the server take that long to recover from a short period of packet loss?
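
To give a feel for why this pattern is so slow, here is a minimal Python sketch of the timing described in the steps above: one lost segment is repaired per retransmission timeout, and the timeout doubles each time. The function name `recovery_time`, the initial RTO of 0.2 s, and the 120 s cap are illustrative assumptions, not values read from the trace.

```python
# Toy model: one lost segment is repaired per timer expiry, and the timer
# doubles after every expiry. initial_rto and rto_max are assumed values
# chosen for illustration; they are not taken from the pcap trace.

def recovery_time(lost_segments, initial_rto=0.2, rto_max=120.0):
    """Return the total seconds spent waiting on timers to repair all segments."""
    total = 0.0
    rto = initial_rto
    for _ in range(lost_segments):
        total += rto                 # wait for the retransmission timer to fire
        rto = min(rto * 2, rto_max)  # exponential backoff, capped at rto_max
    return total


if __name__ == "__main__":
    for n in (1, 5, 10, 19):
        print(f"{n:2d} lost segments -> {recovery_time(n):.1f} s of timer waits")
```

Under these assumptions the wait grows geometrically until the cap is reached, which is consistent with the client giving up before all 19 segments are repaired.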

Has anyone else noticed a similar problem before? If my analysis is wrong, can anyone give me some pointers to what's really wrong and how to fix it?

Thanks a lot,
Joe

PS. Please cc me on replies to this message.



Attachments:
frto.pcap.7 (71.90 kB)

2009-09-25 00:03:58

by Ray Lee

Subject: Re: TCP stack bug related to F-RTO?

[adding netdev cc:]



Attachments:
frto.pcap.7 (71.90 kB)

2009-09-25 02:32:54

by zhigang gong

Subject: Re: TCP stack bug related to F-RTO?

On Fri, Sep 25, 2009 at 1:43 AM, Joe Cao <[email protected]> wrote:
> Hello,
>
> I have found the following behavior with different versions of linux kernel. The attached pcap trace is collected with server (192.168.0.13) running 2.6.24 and shows the problem. Basically the behavior is like this:
>
> 1. The client opens up a big window,
> 2. the server sends 19 packets in a row (pkt #14- #32 in the trace), but all of them are dropped due to some congestion.
> 3. The server hits RTO and retransmits pkt #14 in #33
> 4. The client immediately acks #33 (=#14), and the server (which seems to enter F-RTO) expands the window and sends *NEW* pkts #35 and #36. The timeout is doubled to 2*RTO. The client immediately sends two dup-ACKs for #35 and #36.
> 5. After 2*RTO, pkt #15 is retransmitted in #39.
> 6. The client immediately acks #39 (=#15) in #40, and the server continues to expand the window and sends two *NEW* pkts, #41 and #42. Now the timeout is doubled to 4*RTO.
> 8. After 4*RTO timeout, #16 is retransmitted.
> 9....
> 10. The above steps repeats for retransmitting pkt #16-#32 and each time the timeout is doubled.
> 11. It takes a long long time to retransmit all the lost packets and before that is done, the client sends a RST because of timeout.
>
> The above behavior looks like F-RTO is in effect, and there seems to be a bug in TCP's congestion control and
> retransmission algorithm. Why doesn't TCP on the server (running 2.6.24) enter slow start?
As far as I know, early implementations did not enter slow start if the
remote end was on the same network. I'm not sure about version 2.6.24.
But after looking at your trace, I don't think this is the cause of
your problem. The behaviour of your client 192.168.0.82 is very
strange. The client always sends packets with a bad TCP checksum, and
packets #4 to #13 sent by the client don't conform to the TCP protocol
at all: not only do they have a wrong TCP checksum, they also have
incorrect seq and ack numbers.

My suggestion is that before you investigate the server side's
behaviour, you first correct your client side's TCP/IP stack
implementation.

>Why should the server take that long to recover from a short period of packet loss?


2009-09-25 06:42:43

by Mike Caoco

Subject: Re: TCP stack bug related to F-RTO?

Hi,

On the wrong TCP checksum: that's because of hardware checksum offload.

As for the seq/ack numbers: because the trace is long, I deliberately removed the irrelevant packets between the three-way handshake and the point where the problem happens. That can be seen from the timestamps.

Please also note that I intentionally replaced the IP addresses and MAC addresses in the trace to hide proprietary information.

Anyway, the problem is not related to the checksum or the seq/ack numbers; otherwise, you wouldn't see the behavior shown in the trace.

Thanks,
Joe




2009-09-25 08:55:36

by zhigang gong

Subject: Re: TCP stack bug related to F-RTO?

Oh, I see, so I spoke too quickly in my last mail. You just omitted some
packets from the trace. I have analysed the traffic flow and have some
findings below; I hope they're helpful.

>> > 1. The client opens up a big window,
>> > 2. the server sends 19 packets in a row (pkt #14- #32
>> in the trace), but all of them are dropped due to some
>> congestion.
>> > 3. The server hits RTO and retransmits pkt #14 in #33
This retransmission timer expiry causes the server's TCP/IP
stack to enter slow start mode; as a result, we can see the
server's sending window reduced to one.

>> > 4. The client immediately acks #33 (=#14), and the
>> server (which seems to enter F-RTO) expands the window and
>> sends *NEW* pkts #35 and #36. The timeout is doubled to
>> 2*RTO. The client immediately sends two dup-ACKs for #35 and
>> #36.

The server is still in slow start mode, and extends the window to 2.

>> > 5. After 2*RTO, pkt #15 is retransmitted in #39.

Here, the second retransmission timer expiry occurs. The server's sending
window is reduced to one again, and it continues in slow start mode.

>> > 6. The client immediately acks #39 (=#15) in #40, and
>> the server continues to expand the window and sends two
>> *NEW* pkts, #41 and #42. Now the timeout is doubled to
>> 4*RTO.
Here you overlooked the two duplicate ACKs, #37 and #38, sent by the
client. As far as I know, the server must receive three or more
duplicate ACKs before it enters fast retransmit mode; otherwise it
stays in slow start mode and waits for the next retransmission timer
expiry before retransmitting the lost packets. And this is actually
what you got.

I'm not a kernel expert; I am just analysing from the TCP protocol
standard. In my view, there is no problem in the server's network
stack. But there may be a problem on the client side (or in some
intermediate network appliance), as it always sends just two duplicate
ACKs at a time and never sends a third, no matter how long the interval
is. In my opinion, if the client could send a third duplicate ACK, the
server would enter fast retransmit mode and then fast recovery, and
everything would be OK.
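
The duplicate-ACK threshold mentioned above can be sketched in a few lines of Python. This is only an illustration of the classic rule of three duplicate ACKs; the function name `fast_retransmit_triggered` and the use of plain integers as cumulative ACK numbers are assumptions made for the example, and a real stack applies further conditions before counting an ACK as a duplicate.

```python
DUPACK_THRESHOLD = 3  # classic fast-retransmit threshold


def fast_retransmit_triggered(acks, threshold=DUPACK_THRESHOLD):
    """Return True if some ACK number is repeated `threshold` times in a row.

    `acks` is the ordered sequence of cumulative ACK numbers seen by the
    sender. The first occurrence of a value is a new ACK; each immediate
    repeat counts as one duplicate ACK.
    """
    last_ack = None
    dupacks = 0
    for ack in acks:
        if ack == last_ack:
            dupacks += 1
            if dupacks >= threshold:
                return True
        else:
            last_ack = ack
            dupacks = 0
    return False


if __name__ == "__main__":
    # One new ACK followed by two duplicates: below the threshold.
    print(fast_retransmit_triggered([15, 15, 15]))
    # One new ACK followed by three duplicates: threshold reached.
    print(fast_retransmit_triggered([15, 15, 15, 15]))
```

With only two new segments in flight after each timeout, at most two duplicate ACKs can come back, so a sender relying on this rule alone would never reach the threshold.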

>> > 8. After 4*RTO timeout, #16 is retransmitted.
>> > 9....
>> > 10. The above steps repeats for retransmitting pkt
>> #16-#32 and each time the timeout is doubled.
>> > 11. It takes a long long time to retransmit all the
>> lost packets and before that is done, the client sends a RST
>> because of timeout.


2009-09-25 13:09:37

by Ilpo Järvinen

Subject: Re: TCP stack bug related to F-RTO?

On Thu, 24 Sep 2009, Ray Lee wrote:

> [adding netdev cc:]
>
> On Thu, Sep 24, 2009 at 10:43 AM, Joe Cao <[email protected]> wrote:
> >
> > Hello,
> >
> > I have found the following behavior with different versions of linux
> > kernel. The attached pcap trace is collected with server
> > (192.168.0.13) running 2.6.24 and shows the problem. Basically the
> > behavior is like this:
> >
> > 1. The client opens up a big window,
> > 2. the server sends 19 packets in a row (pkt #14- #32 in the trace), but all of them are dropped due to some congestion.
> > 3. The server hits RTO and retransmits pkt #14 in #33
> > 4. The client immediately acks #33 (=#14), and the server (which seems to enter F-RTO) expands the window and sends *NEW* pkts #35 and #36. The timeout is doubled to 2*RTO. The client immediately sends two dup-ACKs for #35 and #36.
> > 5. After 2*RTO, pkt #15 is retransmitted in #39.
> > 6. The client immediately acks #39 (=#15) in #40, and the server continues to expand the window and sends two *NEW* pkts, #41 and #42. Now the timeout is doubled to 4*RTO.
> > 8. After 4*RTO timeout, #16 is retransmitted.
> > 9....
> > 10. The above steps repeats for retransmitting pkt #16-#32 and each time the timeout is doubled.
> > 11. It takes a long long time to retransmit all the lost packets and before that is done, the client sends a RST because of timeout.
> >
> > The above behavior looks like F-RTO is in effect, and there seems to
> > be a bug in TCP's congestion control and retransmission algorithm.
> > Why doesn't TCP on the server (running 2.6.24) enter slow start?
> > Why should the server take that long to recover from a short period
> > of packet loss?
> >
> > Has anyone else noticed a similar problem before? If my analysis is
> > wrong, can anyone give me some pointers to what's really wrong and
> > how to fix it?

Yes, 2.6.24 is an obsolete version with known bugs in its FRTO
implementation. The fixes never went into the 2.6.24 stable series, as
it was _already_ obsolete when the problems were reported and found. The
correct fixes can be found in 2.6.25.7 (.7 iirc) and are included from
2.6.26 onward too.
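
As a small illustration of the version ranges named above, the following Python sketch checks whether a plain dotted kernel version falls in a range that should carry the fixes. The helper names `parse_version` and `has_frto_fixes` are hypothetical, the ".7" boundary is taken from the "iirc" above and may be off, and distribution kernels with backports or suffixes such as "-generic" are not handled.

```python
def parse_version(text):
    """Turn a plain dotted version such as '2.6.25.7' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_frto_fixes(version):
    """Return True if `version` is 2.6.25.7 or later in 2.6.25.y, or >= 2.6.26.

    Assumption: the boundaries follow the mail above; verify against the
    stable changelogs before relying on them.
    """
    v = parse_version(version)
    if v[:3] == (2, 6, 25):
        return v >= (2, 6, 25, 7)
    return v >= (2, 6, 26)


if __name__ == "__main__":
    for candidate in ("2.6.24", "2.6.25.6", "2.6.25.7", "2.6.26", "2.6.31"):
        print(candidate, has_frto_fixes(candidate))
```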

Just in case you happen to run an Ubuntu-based kernel from that era (of
course you should be reporting the bug here then...), a word of warning:
it seemed nearly impossible for them to get a simple thing like that
fixed. I haven't checked whether they eventually came to some sensible
conclusion in the matter, or whether it is still unresolved (or, e.g.,
closed without real resolution).

--
i.

2009-09-25 15:58:13

by Mike Caoco

Subject: Re: TCP stack bug related to F-RTO?


Hi Ilpo,

Thanks for the reply! Do you happen to know which patch fixed the problem? Is there a bug tracking system for the Linux kernel?

I studied the FRTO code in the latest kernel, 2.6.31. It seems the problem is still there:

1. Every time an RTO fires, because tcp_is_sackfrto(tp) returns 1, tcp_use_frto() returns true, and the server's TCP enters FRTO.
2. After the head of the write queue is retransmitted, two new data packets are transmitted, and the server receives two dup-ACKs. That makes TCP enter tcp_enter_frto_loss(); however, that only resets ssthresh and some other fields.
3. After another, longer RTO fires, because tcp_is_sackfrto(tp) returns 1, tcp_use_frto() again returns true. The stack enters FRTO again.
4. The above repeats, and the stack cannot retransmit the lost packets any faster.

Is my understanding above correct?

Thanks,
Joe




2009-09-25 16:07:35

by Mike Caoco

Subject: Re: TCP stack bug related to F-RTO?

Hi Zhigang,

Thanks for helping look into the issue.

My answer to your analysis is that of course there won't be a third dup-ACK, because the server only sends TWO NEW data packets each time. Clearly this is the server's problem and not the client's.

Thanks,
Joe




2009-09-25 18:03:32

by Ilpo Järvinen

Subject: Re: TCP stack bug related to F-RTO?

On Fri, 25 Sep 2009, Joe Cao wrote:

> Thanks for the reply! Do you happen to know which patch fixed the
> problem?

You can find those patches in the stable queue git tree. I gave you a hint
about which release to look at in my last mail. However, as 2.6.24 is
obsolete anyway, my recommendation is that you consider upgrading to fix
all the other bugs that have been found since 2.6.24 was obsoleted.

> Is there a bug tracking system for linux kernel?

Nothing that knows everything about everything.

> I studied the FRTO code in latest kernel 2.6.31. It seems the problem
> is still there:
>
> 1. Every time an RTO fires, because tcp_is_sackfrto(tp) returns 1,
> tcp_use_frto() returns true, and the server's TCP enters FRTO.
> 2. After the head of the write queue is retransmitted, two new data
> packets are transmitted, and the server receives two dup-ACKs. That
> makes TCP enter tcp_enter_frto_loss(); however, that only resets
> ssthresh and some other fields.

Perhaps those other fields are far more important than you think... :-)
...Some retransmission would happen here as step 3.

> 3. After another, longer RTO fires, because tcp_is_sackfrto(tp) returns
> 1, tcp_use_frto() again returns true. The stack enters FRTO again.
> 4. The above repeats, and the stack cannot retransmit the lost packets
> any faster.
>
> Is my understanding above correct?

...No. All the magic that happens in tcp_enter_frto_loss should be enough
to really do more than a single retransmission (that is, in any kernel
other than the 2.6.24 series). There was an unfortunate bug in this area
in 2.6.24 which basically undid the effect of the correct actions
tcp_enter_frto_loss took, which effectively prevented
tcp_xmit_retransmit_queue from doing its part.
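
For readers following along, the decision F-RTO makes after a timeout can be sketched roughly as below. This is a simplified reading of the basic two-ACK check from the F-RTO specification, not the kernel's code: the names `Outcome` and `frto_outcome` are invented for the example, and SACK handling and the "recover" sequence checks are left out.

```python
from enum import Enum


class Outcome(Enum):
    CONVENTIONAL_RECOVERY = "retransmit the outstanding segments in slow start"
    SPURIOUS_TIMEOUT = "timeout judged spurious, continue sending new data"


def frto_outcome(first_ack_advances, second_ack_advances):
    """Classify a timeout from the first two ACKs after the RTO retransmission.

    Each argument is True if that ACK advanced the window and False if it
    was a duplicate ACK. Simplified sketch; see the assumptions above.
    """
    if not first_ack_advances:
        # A duplicate ACK right after the retransmission: real loss.
        return Outcome.CONVENTIONAL_RECOVERY
    # At this point the sender transmits up to two previously unsent segments.
    if second_ack_advances:
        return Outcome.SPURIOUS_TIMEOUT
    # A duplicate ACK for the new data: the original segments were lost.
    return Outcome.CONVENTIONAL_RECOVERY


if __name__ == "__main__":
    # Pattern from the trace: the retransmission is acked, then dup-ACKs arrive.
    print(frto_outcome(True, False).name)
```

In the pattern reported in this thread, the second ACK is a duplicate, so the expected result is conventional recovery with further retransmissions rather than another wait for the timer.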

--
i.


2009-09-26 01:50:57

by Mike Caoco

Subject: Re: TCP stack bug related to F-RTO?

That makes sense. Thanks for the info!

Joe

--- On Fri, 9/25/09, Ilpo J?rvinen <[email protected]> wrote:

> From: Ilpo J?rvinen <[email protected]>
> Subject: Re: TCP stack bug related to F-RTO?
> To: "Joe Cao" <[email protected]>
> Cc: "Ray Lee" <[email protected]>, "Netdev" <[email protected]>, "LKML" <[email protected]>
> Date: Friday, September 25, 2009, 11:03 AM
> On Fri, 25 Sep 2009, Joe Cao wrote:
>
> > Thanks for the reply!? Do you happen to know
> which patch fixed the
> > problem?
>
> You can find those patches from the stable queue git tree.
> I gave you hint
> from what release to look from in the last mail. However,
> as 2.6.24 is
> anyway obsolete my recommendation is that you should
> probably consider
> upgrading to fix all the other bugs that have been found
> since 2.6.24 was
> obsoleted.
>
> > Is there a bug tracking system for linux kernel?
>
> Nothing that knows everything about everything.
>
> > I studied the FRTO code in latest kernel 2.6.31.?
> It seems the problem
> > is still there:?
> >
> > 1. Every time a RTO fires, because tcp_is_sackfrto(tp)
> returns 1,
> > tcp_use_frto() returns true.? And the server tcp
> enters FRTO.
> > 2. After the head of write queue is retransmitted, two
> new data packets
> > are transmitted, the server receives two
> dup-ACKs.? That will make the
> > TCP enter tcp_enter_frto_loss(), however, that only
> rests ssthresh and
> > some other fields.
>
> Perhaps those other fields are far more important than you
> think... :-)
> ...Some retransmission would happen here as step 3.
>
> > 3. After another longer RTO fires, because
> tcp_is_sackfrto(tp) returns
> > 1, tcp_use_frto() again returns true.? The stack
> enters FRTO again.
> > 4. The above repeats and the stack couldn't
> retransmits the lost packets
> > faster.
> >
> > Is my understanding above correct?
>
> ...No. All magic that happens in tcp_enter_frto_loss should
> be enough to
> really do more than a single retransmission (that is, in
> any other than
> 2.6.24 series kernel). There was an unfortunate bug in this
> area in 2.6.24
> which basically undoed the effect of correct actions
> tcp_enter_frto_loss
> did which effectively prevented tcp_xmit_retransmit_queue
> from doing its
> part.
>
> --
> i.
>



2009-09-26 16:53:42

by Mike Caoco

[permalink] [raw]
Subject: Re: TCP stack bug related to F-RTO?

Hi Ilpo,

Can you elaborate on "Some retransmission would happen here as step 3"? As I understand it, when the second timeout happens, the stack will again go into FRTO and then retransmit the write queue head.

I looked at the patch (Debian Bug#478062) that is probably what you mentioned as the fix. All it does is exclude the SACK case when considering FRTO. But in my case, SACK was enabled, as seen in the trace.

In other words, do we still have a problem with FRTO when SACK is enabled in the latest kernel?

Thanks,
Joe




2009-09-26 17:51:09

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: TCP stack bug related to F-RTO?

On Sat, 26 Sep 2009, Joe Cao wrote:

> Can you elaborate on "Some retransmission would happen here as step 3"?
> When the second timeout happens, it will again go into FRTO and then
> retransmit the write queue head.

Why do you think that the second RTO will happen with anything other than
2.6.24? And it's perfectly ok to go into FRTO for the second time.
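
To put rough numbers on why the behaviour in the reported trace is so costly, here is a minimal sketch of the arithmetic. It assumes what the original report describes: 19 lost segments, one segment retransmitted per timer expiry, and the timer doubling each time. The function names, the 0.2 s initial RTO and the 120 s cap are illustrative assumptions (the cap is meant to mirror Linux's TCP_RTO_MAX), and the model ignores retry limits and everything else a real stack does. The second function is an idealized slow start (window of 1, doubling every round trip) for contrast.

```python
def backoff_recovery_time(lost_segments, initial_rto, rto_max=120.0):
    """Seconds spent waiting on the retransmission timer when every
    expiry resends exactly one lost segment and the timer then doubles."""
    total = 0.0
    rto = initial_rto
    for _ in range(lost_segments):
        total += rto                 # wait for the timer, resend one segment
        rto = min(rto * 2, rto_max)  # exponential backoff, clamped (assumed cap)
    return total


def slow_start_rounds(lost_segments):
    """Round trips needed to resend the same segments under an idealized
    slow start: window starts at 1 segment and doubles each round trip."""
    rounds, cwnd, sent = 0, 1, 0
    while sent < lost_segments:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    # 19 lost segments as in the report; 0.2 s initial RTO is an assumption.
    print("one segment per doubled RTO: %.1f s" % backoff_recovery_time(19, 0.2))
    print("idealized slow start: %d round trips" % slow_start_rounds(19))
```

Under these assumptions the one-per-timeout pattern needs on the order of minutes, while slow start needs a handful of round trips, which matches the client giving up and sending a RST before recovery completes.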

> I looked at the patch (debian Bug#478062) that's probably what you
> mentioned as the fix. All it does was to exclude the SACK case when
> considering FRTO. But in my case, SACK was enabled, as seen in the
> trace.

You should be looking where I said rather than picking up your own
sources and assuming that they'll tell you the whole story :-). In fact,
there were two fixes made in a row and one workaround in the
same timeframe. ...And you managed to pick the wrong one of the fixes, so
I kind of understand why you got confused :-).

> In other words, do we still have a problem with FRTO when SACK is
> enabled in the latest kernel?

For sure we might have all kinds of problems no one has yet
noticed/reported :-). ...However, it seems that this particular problem
your trace is showing is solved. Can you please test with a fixed kernel
before coming back here with these claims?
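
As a rough aid for deciding whether a given machine counts as a "fixed kernel", the sketch below compares a kernel release string against the versions named earlier in this thread (2.6.25.7, quoted there with an "iirc", and 2.6.26 onward). The function names are hypothetical, and the result is only a heuristic: distribution kernels may carry backported fixes, or lack them, regardless of the version number, so testing on the actual kernel is still the real answer.

```python
import re

# Versions taken from the thread; the ".7" was given from memory ("iirc").
FIXED_STABLE = (2, 6, 25, 7)
FIXED_MAINLINE = (2, 6, 26)


def parse_release(release):
    """Return the leading dotted numbers of a release string as a tuple,
    e.g. '2.6.24-19-generic' -> (2, 6, 24)."""
    match = re.match(r"\d+(?:\.\d+)*", release)
    if not match:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.group(0).split("."))


def likely_has_frto_fix(release):
    """Heuristic only: True if the version is at or past the releases the
    thread names as containing the F-RTO fixes."""
    version = parse_release(release)
    if version[:3] == (2, 6, 25):
        # Within the 2.6.25 stable series, the fix arrived in a point release.
        return version >= FIXED_STABLE
    return version >= FIXED_MAINLINE


if __name__ == "__main__":
    import platform
    rel = platform.release()
    print(rel, "->", likely_has_frto_fix(rel))
```

On a Linux host, `platform.release()` returns the same string as `uname -r`, so running the script prints the verdict for the running kernel.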


--
i.


2009-09-26 20:48:28

by Mike Caoco

[permalink] [raw]
Subject: Re: TCP stack bug related to F-RTO?

Hi Ilpo,

Thanks for the reply. We noticed the problem while we were debugging a connection failure reported by one of our customers (we are a network device vendor). We have already suggested that the customer upgrade their server software to fix the problem, and we are still waiting for their feedback. Meanwhile, I asked all those questions because I want to understand the issue and the fixes. We also have to convince the customer to move to the right kernel, and we don't want them to come back with the same problem again.

Again, thanks for the help!

Joe
