Date: Fri, 25 Sep 2009 18:50:59 -0700 (PDT)
From: Joe Cao
Subject: Re: TCP stack bug related to F-RTO?
To: Ilpo Järvinen
Cc: Ray Lee, Netdev, LKML
Message-ID: <557199.94656.qm@web63402.mail.re1.yahoo.com>
X-Mailing-List: linux-kernel@vger.kernel.org

That makes sense. Thanks for the info!
Joe

--- On Fri, 9/25/09, Ilpo Järvinen wrote:

> From: Ilpo Järvinen
> Subject: Re: TCP stack bug related to F-RTO?
> To: "Joe Cao"
> Cc: "Ray Lee", "Netdev", "LKML"
> Date: Friday, September 25, 2009, 11:03 AM
>
> On Fri, 25 Sep 2009, Joe Cao wrote:
>
> > Thanks for the reply! Do you happen to know which patch fixed the
> > problem?
>
> You can find those patches in the stable queue git tree. I gave you a hint
> about which release to look at in the last mail. However, as 2.6.24 is
> obsolete anyway, my recommendation is that you should probably consider
> upgrading to fix all the other bugs that have been found since 2.6.24 was
> obsoleted.
>
> > Is there a bug tracking system for linux kernel?
>
> Nothing that knows everything about everything.
>
> > I studied the FRTO code in latest kernel 2.6.31. It seems the problem
> > is still there:
> >
> > 1. Every time an RTO fires, because tcp_is_sackfrto(tp) returns 1,
> > tcp_use_frto() returns true. And the server TCP enters FRTO.
> > 2. After the head of the write queue is retransmitted, two new data
> > packets are transmitted, and the server receives two dup-ACKs. That will
> > make the TCP enter tcp_enter_frto_loss(); however, that only resets
> > ssthresh and some other fields.
>
> Perhaps those other fields are far more important than you think... :-)
> ...Some retransmission would happen here as step 3.
>
> > 3. After another longer RTO fires, because tcp_is_sackfrto(tp) returns
> > 1, tcp_use_frto() again returns true. The stack enters FRTO again.
> > 4. The above repeats and the stack can't retransmit the lost packets
> > faster.
> >
> > Is my understanding above correct?
>
> ...No. All the magic that happens in tcp_enter_frto_loss should be enough to
> really do more than a single retransmission (that is, in any kernel other
> than the 2.6.24 series).
> There was an unfortunate bug in this area in 2.6.24
> which basically undid the effect of the correct actions tcp_enter_frto_loss
> took, which effectively prevented tcp_xmit_retransmit_queue from doing its
> part.
>
> --
> i.
>
> > --- On Fri, 9/25/09, Ilpo Järvinen wrote:
> >
> > From: Ilpo Järvinen
> > Subject: Re: TCP stack bug related to F-RTO?
> > To: "Ray Lee"
> > Cc: "Joe Cao", "Netdev", "LKML", jcaoco2002@yahoo.com
> > Date: Friday, September 25, 2009, 6:09 AM
> >
> > On Thu, 24 Sep 2009, Ray Lee wrote:
> >
> > > [adding netdev cc:]
> > >
> > > On Thu, Sep 24, 2009 at 10:43 AM, Joe Cao wrote:
> > > >
> > > > Hello,
> > > >
> > > > I have found the following behavior with different versions of linux
> > > > kernel. The attached pcap trace is collected with server
> > > > (192.168.0.13) running 2.6.24 and shows the problem. Basically the
> > > > behavior is like this:
> > > >
> > > > 1. The client opens up a big window,
> > > > 2. the server sends 19 packets in a row (pkt #14-#32 in the trace),
> > > > but all of them are dropped due to some congestion.
> > > > 3. The server hits RTO and retransmits pkt #14 in #33
> > > > 4. The client immediately acks #33 (=#14), and the server (seems to
> > > > enter F-RTO) expands the window and sends *NEW* pkt #35 & #36.
> > > > Timeout is doubled to 2*RTO; the client immediately sends two
> > > > dup-ACKs to #35 and #36.
> > > > 5. After 2*RTO, pkt #15 is retransmitted in #39.
> > > > 6. The client immediately acks #39 (=#15) in #40, and the server
> > > > continues to expand the window and sends two *NEW* pkt #41 & #42.
> > > > Now the timeout is doubled to 4*RTO.
> > > > 8. After 4*RTO timeout, #16 is retransmitted.
> > > > 9....
> > > > 10. The above steps repeat for retransmitting pkt #16-#32 and each
> > > > time the timeout is doubled.
> > > > 11. It takes a long, long time to retransmit all the lost packets and
> > > > before that is done, the client sends a RST because of timeout.
> > > >
> > > > The above behavior looks like F-RTO is in effect. And there seems to
> > > > be a bug in the TCP's congestion control and retransmission algorithm.
> > > > Why doesn't the TCP on server (running 2.6.24) enter the slow start?
> > > > Why should the server take that long to recover from a short period
> > > > of packet loss?
> > > >
> > > > Has anyone else noticed similar problem before? If my analysis was
> > > > wrong, can anyone give me some pointers to what's really wrong and
> > > > how to fix it?
> >
> > Yes, 2.6.24 is an obsoleted version with known wrongs in FRTO
> > implementation. Fixes never went to 2.6.24 stable series as it was
> > _already_ obsoleted when the problems were reported and found. The
> > correct fixes may be found from 2.6.25.7 (.7 iirc) and are included from
> > 2.6.26 onward too.
> >
> > Just in case you happen to run ubuntu based kernel from that era (of
> > course you should be reporting the bug here then...), a word of warning:
> > it seemed nearly impossible for them to get a simple thing like that
> > fixed, I haven't been looking if they'd eventually come to some sensible
> > conclusion in that matter or is it still unresolved (or e.g., closed
> > without real resolution).
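
For a sense of scale, the backoff described in the original report (one lost packet retransmitted per timeout, with the timeout doubling each time, for 19 lost packets) can be put into numbers. This is only a rough sketch under stated assumptions: the initial RTO of 0.2 s is a guess (the thread does not give the value seen in the trace), the function name is made up for illustration, and the 120 s ceiling reflects the Linux TCP_RTO_MAX limit on the retransmission timer. It models the pattern as reported, not the kernel code itself.

```python
# Rough model of the reported behaviour: one retransmission per RTO expiry,
# with the RTO doubling after every expiry. Not kernel code.

TCP_RTO_MAX = 120.0  # seconds; ceiling on the retransmission timer in Linux


def retransmit_schedule(initial_rto, lost_packets, rto_max=TCP_RTO_MAX):
    """Return the elapsed time (in seconds) at which each lost packet is
    retransmitted, assuming exactly one packet is resent per timeout and
    the timeout doubles after every firing, capped at rto_max."""
    elapsed = 0.0
    rto = initial_rto
    times = []
    for _ in range(lost_packets):
        elapsed += rto          # wait for the current timeout to fire
        times.append(elapsed)   # one lost packet is retransmitted here
        rto = min(rto * 2, rto_max)  # exponential backoff with a ceiling
    return times


if __name__ == "__main__":
    # ASSUMPTION: initial RTO of 0.2 s; the real value in the trace is unknown.
    for i, t in enumerate(retransmit_schedule(0.2, 19), start=1):
        print(f"lost packet {i:2d} retransmitted after {t:8.1f} s")
```

Under those assumptions the nineteenth retransmission would only go out a bit over 21 minutes after the first timeout, which is consistent with the client giving up and sending a RST long before recovery completes.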