Date: Tue, 31 Mar 2009 20:49:59 +0200
From: Markus Trippelsdorf
To: Ilpo Järvinen
Cc: Netdev, LKML
Subject: Re: WARNING: at net/ipv4/tcp_input.c:2927 tcp_ack+0xd55/0x1991()
Message-ID: <20090331184959.GA2725@gentoox2.trippelsdorf.de>
References: <20090327211202.GA10014@gentoox2.trippelsdorf.de>
 <20090328045056.GA2394@gentoox2.trippelsdorf.de>
 <20090328095514.GA2599@gentoox2.trippelsdorf.de>
 <20090330164035.GA2652@gentoox2.trippelsdorf.de>
 <20090331071018.GA2641@gentoox2.trippelsdorf.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 31, 2009 at 12:16:51PM +0300, Ilpo Järvinen wrote:
> On Tue, 31 Mar 2009, Markus Trippelsdorf wrote:
>
> > On Mon, Mar 30, 2009 at 09:52:55PM +0300, Ilpo Järvinen wrote:
> > > On Mon, 30 Mar 2009, Markus Trippelsdorf wrote:
> > >
> > > > On Mon, Mar 30, 2009 at 07:01:22PM +0300, Ilpo Järvinen wrote:
> > > > > On Sat, 28 Mar 2009, Markus Trippelsdorf wrote:
> > > > > > On Sat, Mar 28, 2009 at 10:29:58AM +0200, Ilpo Järvinen wrote:
> > > > >
> > > > > ...And, let me guess, you're in X and therefore unable to catch a final
> > > > > oops if any would be printed?
> > > > > It would be nice to get around that as well,
> > > > > either use serial/netconsole or hang in text mode while waiting for the
> > > > > crash (shouldn't be too hard if you are able to set up the workload first
> > > > > and then switch away from X and if reproducing takes about an hour)...
> > > >
> > > > OK, I will try this later.
> > >
> > > Let's hope that gives some clue where it ends up going boom (if it is
> > > caused by TCP we certainly should see something more sensible in console
> > > than just a hang)... ...I once again read through tcp commits but just
> > > cannot find anything that could cause fackets_out miscount, not to speak
> > > of crash prone changes so we'll just have to wait and see...
> >
> > The machine hung again last night and I took two pictures:
> > http://www.mypicx.com/uploadimg/1055813374_03302009_2.jpg
> > http://www.mypicx.com/uploadimg/1543678904_03302009_1.jpg
> >
> > But this time there was no TCP-related warning in the logs.
>
> Right. If that oops were hit often enough one can easily mix up the
> warning with that hang, though there is no relation (the fact that the final
> oops always goes unnoticed in X amplifies the effect).
>
> > I then pulled the latest git changes, rebuilt, rebooted and set up the
> > workload again. The machine was still up and running in the morning
> > (~4 hours uptime). So it may well be that the issue was fixed with
> > the latest changes.
>
> Let's hope so, in any case if you still see hangs please get the final oops.
>
> > If it ever occurs again I will notify you.

It happened again. In this case it took ~10 minutes from the warning to
the final crash. I'm pretty sure there must be some kind of relation
between the two. How else could one explain that the machine crashes
just minutes after _each_ instance of that WARNING?
(Unfortunately I was in X again, because I thought this problem was
solved.)

------------[ cut here ]------------
WARNING: at net/ipv4/tcp_input.c:2927 tcp_ack+0xd5b/0x19a6()
Hardware name: To Be Filled By O.E.M.
Pid: 0, comm: swapper Not tainted 2.6.29-06608-g15f7176 #2
Call Trace:
 [] warn_slowpath+0xaa/0xd1
 [] ? lock_timer_base+0x27/0x4d
 [] ? mod_timer+0xc4/0xd6
 [] ? _spin_unlock_bh+0xf/0x11
 [] ? lock_timer_base+0x27/0x4d
 [] ? mod_timer+0xc4/0xd6
 [] tcp_ack+0xd5b/0x19a6
 [] ? tcp_validate_incoming+0x4c/0x2bd
 [] tcp_rcv_established+0x625/0x87d
 [] ? nf_nat_fn+0x13a/0x150
 [] tcp_v4_do_rcv+0x31/0x1b7
 [] ? sk_filter+0x72/0x7f
 [] tcp_v4_rcv+0x431/0x630
 [] ? nf_hook_slow+0x65/0xc6
 [] ip_local_deliver_finish+0xbc/0x146
 [] ip_local_deliver+0x72/0x79
 [] ip_rcv_finish+0x293/0x2bf
 [] ip_rcv+0x24b/0x283
 [] netif_receive_skb+0x22c/0x24a
 [] skge_poll+0x4bd/0x624
 [] net_rx_action+0x71/0x13a
 [] __do_softirq+0x79/0x138
 [] ? ack_apic_level+0x48/0xe4
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x34/0x76
 [] irq_exit+0x3f/0x7c
 [] do_IRQ+0xa9/0xc0
 [] ret_from_intr+0x0/0xa
 [] ? default_idle+0x2d/0x42
 [] ? enter_idle+0x20/0x22
 [] ? cpu_idle+0x52/0x95
---[ end trace 4787b4dcb14517b0 ]---

--
Markus