Date: Mon, 1 Feb 2010 17:45:38 +0300
Message-ID: <53cc795f1002010645w54b98b51s3dbdea18e5eb73f2@mail.gmail.com>
In-Reply-To: <53cc795f1001190813m377c6c91l16b2dc04f63049e7@mail.gmail.com>
References: <53cc795f1001190813m377c6c91l16b2dc04f63049e7@mail.gmail.com>
Subject: Re: Panic at tcp_xmit_retransmit_queue
From: sbs
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org

Actually, removing netconsole from the kernel didn't help.

I found many people with the same problem, but with different hardware
configurations, here:

freez in TCP stack:
http://bugzilla.kernel.org/show_bug.cgi?id=14470

Is there someone who can investigate it?

On Tue, Jan 19, 2010 at 7:13 PM, sbs wrote:
> We are hitting kernel panics on servers with nVidia MCP55 NICs once a day;
> it usually appears under high network traffic (around 10000Mbit/s), but
> that is not a rule: it has happened even under low traffic.
>
> The servers are used for nginx + static content.
> On 2 identical servers this panic happens approx. 2 times a day, depending on
> network load.
> The machine freezes completely till the netconsole reboots.
>
> Kernel: 2.6.32.3
>
> What can it be? What's wrong with the tcp_xmit_retransmit_queue() function?
> Can anyone explain or fix it?
>
> Panic output:
>
> Dec 29 22:33:51 linuxtest [1188725.037019] BUG: unable to handle kernel
> Dec 29 22:33:51 linuxtest NULL pointer dereference
> Dec 29 22:33:51 linuxtest at (null)
> Dec 29 22:33:51 linuxtest [1188725.037042] IP:
> Dec 29 22:33:51 linuxtest [] tcp_xmit_retransmit_queue+0x1b2/0x1dc
> Dec 29 22:33:51 linuxtest [1188725.037064] *pdpt = 00000000229c2001
> Dec 29 22:33:51 linuxtest *pde = 0000000000000000
> Dec 29 22:33:51 linuxtest
> Dec 29 22:33:51 linuxtest [1188725.037080] Thread overran stack, or stack corrupted
> Dec 29 22:33:51 linuxtest [1188725.037091] Oops: 0000 [#1]
> Dec 29 22:33:51 linuxtest SMP
> Dec 29 22:33:51 linuxtest
> Dec 29 22:33:51 linuxtest [1188725.037104] last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:07:00.0/0000:08:01.0/0000:09:00.0/class
> Dec 29 22:33:51 linuxtest [1188725.037124]
> Dec 29 22:33:51 linuxtest [1188725.037131] Pid: 0, comm: swapper Not tainted (2.6.31.6-v03 #2) H8DMU
> Dec 29 22:33:51 linuxtest [1188725.037145] EIP: 0060:[] EFLAGS: 00010246 CPU: 0
> Dec 29 22:33:51 linuxtest [1188725.037158] EIP is at tcp_xmit_retransmit_queue+0x1b2/0x1dc
> Dec 29 22:33:51 linuxtest [1188725.037170] EAX: c540513c EBX: c54050c0 ECX: 0e377f15 EDX: c540513c
> Dec 29 22:33:51 linuxtest [1188725.037183] ESI: 00000000 EDI: 00000000 EBP: c0805d28 ESP: c0805d0c
> Dec 29 22:33:51 linuxtest [1188725.037196] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Dec 29 22:33:51 linuxtest [1188725.037208] Process swapper (pid: 0, ti=c0804000 task=c080b5a0 task.ti=c0804000)
> Dec 29 22:33:51 linuxtest [1188725.037285] Stack:
> Dec 29 22:33:51 linuxtest [1188725.037368] 00000202
> Dec 29 22:33:51 linuxtest 00000000
> Dec 29 22:33:51 linuxtest c540513c
> Dec 29 22:33:51 linuxtest 0e377f14
> Dec 29 22:33:51 linuxtest 00000000
> Dec 29
22:33:51 linuxtest c54050c0
> Dec 29 22:33:51 linuxtest 0000050e
> Dec 29 22:33:51 linuxtest c0805da8
> Dec 29 22:33:51 linuxtest
> Dec 29 22:33:51 linuxtest [1188725.037472] <0>
> Dec 29 22:33:51 linuxtest c05fe931
> Dec 29 22:33:51 linuxtest 00000001
> Dec 29 22:33:51 linuxtest 00000001
> Dec 29 22:33:51 linuxtest 00000006
> Dec 29 22:33:51 linuxtest 00000005
> Dec 29 22:33:51 linuxtest 00000001
> Dec 29 22:33:51 linuxtest 00000001
> Dec 29 22:33:51 linuxtest 00000006
> Dec 29 22:33:51 linuxtest
> Dec 29 22:33:51 linuxtest [1188725.037629] <0>
> Dec 29 22:33:51 linuxtest 01000246
> Dec 29 22:33:51 linuxtest 00000005
> Dec 29 22:33:51 linuxtest 11b57b53
> Dec 29 22:33:51 linuxtest c5405168
> Dec 29 22:33:51 linuxtest c061df41
> Dec 29 22:33:51 linuxtest 00000006
> Dec 29 22:33:51 linuxtest 00000000
> Dec 29 22:33:51 linuxtest 00000000
> Dec 29 22:33:51 linuxtest
> Dec 29 22:33:51 linuxtest [1188725.037887] Call Trace:
> Dec 29 22:33:51 linuxtest [1188725.037975] [] ? tcp_ack+0x1591/0x1778
> Dec 29 22:33:51 linuxtest [1188725.038073] [] ? ipt_do_table+0x2f8/0x310
> Dec 29 22:33:51 linuxtest [1188725.038148] [] ? tcp_rcv_state_process+0x4db/0x7fc
> Dec 29 22:33:51 linuxtest [1188725.038246] [] ? tcp_v4_do_rcv+0x263/0x29d
> Dec 29 22:33:51 linuxtest [1188725.038321] [] ? local_bh_enable+0xb/0xd
> Dec 29 22:33:51 linuxtest [1188725.038419] [] ? sk_filter+0x5e/0x69
> Dec 29 22:33:51 linuxtest [1188725.038510] [] ? tcp_v4_rcv+0x371/0x502
> Dec 29 22:33:51 linuxtest [1188725.038607] [] ? ip_local_deliver_finish+0x0/0x171
> Dec 29 22:33:51 linuxtest [1188725.038684] [] ? ip_local_deliver_finish+0xfe/0x171
> Dec 29 22:33:51 linuxtest [1188725.038784] [] ? ip_local_deliver+0x61/0x66
> Dec 29 22:33:51 linuxtest [1188725.038876] [] ? ip_rcv_finish+0x289/0x2b1
> Dec 29 22:33:51 linuxtest [1188725.038961] [] ? ip_rcv+0x203/0x233
> Dec 29 22:33:51 linuxtest [1188725.039052] [] ?
> netif_receive_skb+0x335/0x350
> Dec 29 22:33:51 linuxtest [1188725.039151] [] ? process_backlog+0x62/0x88
> Dec 29 22:33:51 linuxtest [1188725.039242] [] ? net_rx_action+0x8e/0x16b
> Dec 29 22:33:51 linuxtest [1188725.039333] [] ? __do_softirq+0xa7/0x148
> Dec 29 22:33:51 linuxtest [1188725.039423] [] ? do_softirq+0x26/0x2b
> Dec 29 22:33:51 linuxtest [1188725.039520] [] ? irq_exit+0x29/0x5c
> Dec 29 22:33:51 linuxtest [1188725.039610] [] ? do_IRQ+0x81/0x95
> Dec 29 22:33:51 linuxtest [1188725.039706] [] ? common_interrupt+0x29/0x30
> Dec 29 22:33:51 linuxtest [1188725.039797] [] ? default_idle+0x3e/0x5b
> Dec 29 22:33:51 linuxtest [1188725.039895] [] ? clockevents_notify+0x60/0x65
> Dec 29 22:33:51 linuxtest [1188725.039986] [] ? c1e_idle+0xb8/0xd2
> Dec 29 22:33:51 linuxtest [1188725.040058] [] ? cpu_idle+0x45/0x5f
> Dec 29 22:33:51 linuxtest [1188725.040131] [] ? rest_init+0x58/0x5a
> Dec 29 22:33:51 linuxtest [1188725.040212] [] ? start_kernel+0x2f0/0x2f5
> Dec 29 22:33:51 linuxtest [1188725.040285] [] ?
> i386_start_kernel+0x70/0x77
> Dec 29 22:33:51 linuxtest [1188725.040381] Code:
> Dec 29 22:33:51 linuxtest ec
> Dec 29 22:33:51 linuxtest bd
> Dec 29 22:33:51 linuxtest 84
> Dec 29 22:33:51 linuxtest c0
> Dec 29 22:33:51 linuxtest ff
> Dec 29 22:33:51 linuxtest 04
> Dec 29 22:33:51 linuxtest 88
> Dec 29 22:33:51 linuxtest 8b
> Dec 29 22:33:51 linuxtest 55
> Dec 29 22:33:51 linuxtest ec
> Dec 29 22:33:51 linuxtest 8b
> Dec 29 22:33:51 linuxtest 02
> Dec 29 22:33:51 linuxtest 39
> Dec 29 22:33:51 linuxtest d0
> Dec 29 22:33:51 linuxtest ba
> Dec 29 22:33:51 linuxtest 00
> Dec 29 22:33:51 linuxtest 00
> Dec 29 22:33:51 linuxtest 00
> Dec 29 22:33:51 linuxtest 00
> Dec 29 22:33:51 linuxtest 0f
> Dec 29 22:33:51 linuxtest 44
> Dec 29 22:33:51 linuxtest c2
> Dec 29 22:33:51 linuxtest 39
> Dec 29 22:33:51 linuxtest c6
> Dec 29 22:33:51 linuxtest 75
> Dec 29 22:33:51 linuxtest 0f
> Dec 29 22:33:51 linuxtest 8b
> Dec 29 22:33:51 linuxtest 8b
> Dec 29 22:33:51 linuxtest 18
> Dec 29 22:33:51 linuxtest 02
> Dec 29 22:33:51 linuxtest 00
> Dec 29 22:33:51 linuxtest 00
> Dec 29 22:33:51 linuxtest b2
> Dec 29 22:33:51 linuxtest 01
> Dec 29 22:33:51 linuxtest 89
> Dec 29 22:33:51 linuxtest d8
> Dec 29 22:33:51 linuxtest e8
> Dec 29 22:33:51 linuxtest ee
> Dec 29 22:33:51 linuxtest fd
> Dec 29 22:33:51 linuxtest ff
> Dec 29 22:33:51 linuxtest ff
> Dec 29 22:33:51 linuxtest 8b
> Dec 29 22:33:51 linuxtest 36
> Dec 29 13:33:50 linuxtest unparseable log message: "<8b> "
> Dec 29 22:33:51 linuxtest 06
> Dec 29 22:33:51 linuxtest 0f
> Dec 29 22:33:51 linuxtest 18
> Dec 29 22:33:51 linuxtest 00
> Dec 29 22:33:51 linuxtest 90
> Dec 29 22:33:51 linuxtest 3b
> Dec 29 22:33:51 linuxtest 75
> Dec 29 22:33:51 linuxtest ec
> Dec 29 22:33:51 linuxtest 0f
> Dec 29 22:33:51 linuxtest 85
> Dec 29 22:33:51 linuxtest a9
> Dec 29 22:33:51 linuxtest fe
> Dec 29 22:33:51 linuxtest ff
> Dec 29 22:33:51 linuxtest ff
> Dec 29 22:33:51 linuxtest eb
> Dec 29 22:33:51 linuxtest 11
> Dec 29 22:33:51 linuxtest
85
> Dec 29 22:33:51 linuxtest ff
> Dec 29 22:33:51 linuxtest 0f
> Dec 29 22:33:51 linuxtest 84
> Dec 29 22:33:51 linuxtest
> Dec 29 22:33:51 linuxtest [1188725.040771] EIP: []
> Dec 29 22:33:51 linuxtest tcp_xmit_retransmit_queue+0x1b2/0x1dc
> Dec 29 22:33:51 linuxtest SS:ESP 0068:c0805d0c
> Dec 29 22:33:51 linuxtest [1188725.040929] CR2: 0000000000000000
> Dec 29 22:33:51 linuxtest [1188725.041346] ---[ end trace 1b9e8ae01c5d5485 ]---
> Dec 29 22:33:51 linuxtest [1188725.042940] Kernel panic - not syncing: Fatal exception in interrupt
> Dec 29 22:33:51 linuxtest [1188725.043076] Pid: 0, comm: swapper Tainted: G      D    2.6.31.6-v03 #2
> Dec 29 22:33:51 linuxtest [1188725.043188] Call Trace:
> Dec 29 22:33:51 linuxtest [1188725.043318] [] ? printk+0xf/0x11
> Dec 29 22:33:51 linuxtest [1188725.043441] [] panic+0x39/0xd6
> Dec 29 22:33:51 linuxtest [1188725.043558] [] oops_end+0x8b/0x9a
> Dec 29 22:33:51 linuxtest [1188725.043683] [] no_context+0x13c/0x146
> Dec 29 22:33:51 linuxtest [1188725.043814] [] __bad_area_nosemaphore+0x113/0x11b
> Dec 29 22:33:51 linuxtest [1188725.043943] [] ? nv_start_xmit_optimized+0x3d4/0x401
> Dec 29 22:33:51 linuxtest [1188725.044073] [] ? __enqueue_entity+0x8d/0x95
> Dec 29 22:33:51 linuxtest [1188725.044182] [] bad_area_nosemaphore+0xd/0x10
> Dec 29 22:33:51 linuxtest [1188725.044319] [] do_page_fault+0x108/0x265
> Dec 29 22:33:51 linuxtest [1188725.044444] [] ? enqueue_task+0x72/0x7f
> Dec 29 22:33:51 linuxtest [1188725.044562] [] ? do_page_fault+0x0/0x265
> Dec 29 22:33:51 linuxtest [1188725.044686] [] error_code+0x66/0x6c
> Dec 29 22:33:51 linuxtest [1188725.044817] [] ? do_page_fault+0x0/0x265
> Dec 29 22:33:51 linuxtest [1188725.044944] [] ? tcp_xmit_retransmit_queue+0x1b2/0x1dc
> Dec 29 22:33:51 linuxtest [1188725.045077] [] tcp_ack+0x1591/0x1778
> Dec 29 22:33:51 linuxtest [1188725.045201] [] ?
> ipt_do_table+0x2f8/0x310
> Dec 29 22:33:51 linuxtest [1188725.045332] [] tcp_rcv_state_process+0x4db/0x7fc
> Dec 29 22:33:51 linuxtest [1188725.045442] [] tcp_v4_do_rcv+0x263/0x29d
> Dec 29 22:33:51 linuxtest [1188725.045567] [] ? local_bh_enable+0xb/0xd
> Dec 29 22:33:51 linuxtest [1188725.045694] [] ? sk_filter+0x5e/0x69
> Dec 29 22:33:51 linuxtest [1188725.045802] [] tcp_v4_rcv+0x371/0x502
> Dec 29 22:33:51 linuxtest [1188725.045911] [] ? ip_local_deliver_finish+0x0/0x171
> Dec 29 22:33:51 linuxtest [1188725.046045] [] ip_local_deliver_finish+0xfe/0x171
> Dec 29 22:33:51 linuxtest [1188725.046155] [] ip_local_deliver+0x61/0x66
> Dec 29 22:33:51 linuxtest [1188725.046301] [] ip_rcv_finish+0x289/0x2b1
> Dec 29 22:33:51 linuxtest [1188725.046429] [] ip_rcv+0x203/0x233
> Dec 29 22:33:51 linuxtest [1188725.046555] [] netif_receive_skb+0x335/0x350
> Dec 29 22:33:51 linuxtest [1188725.046664] [] process_backlog+0x62/0x88
> Dec 29 22:33:51 linuxtest [1188725.046809] [] net_rx_action+0x8e/0x16b
> Dec 29 22:33:51 linuxtest [1188725.046917] [] __do_softirq+0xa7/0x148
> Dec 29 22:33:51 linuxtest [1188725.047041] [] do_softirq+0x26/0x2b
> Dec 29 22:33:51 linuxtest [1188725.047162] [] irq_exit+0x29/0x5c
> Dec 29 22:33:51 linuxtest [1188725.047285] [] do_IRQ+0x81/0x95
> Dec 29 22:33:51 linuxtest [1188725.047409] [] common_interrupt+0x29/0x30
> Dec 29 22:33:51 linuxtest [1188725.047536] [] ? default_idle+0x3e/0x5b
> Dec 29 22:33:51 linuxtest [1188725.047664] [] ? clockevents_notify+0x60/0x65
> Dec 29 22:33:51 linuxtest [1188725.047790] [] c1e_idle+0xb8/0xd2
> Dec 29 22:33:51 linuxtest [1188725.047913] [] cpu_idle+0x45/0x5f
> Dec 29 22:33:51 linuxtest [1188725.048030] [] rest_init+0x58/0x5a
> Dec 29 22:33:51 linuxtest [1188725.048153] [] start_kernel+0x2f0/0x2f5
> Dec 29 22:33:51 linuxtest [1188725.048271] [] i386_start_kernel+0x70/0x77
> Dec 29 22:33:51 linuxtest [1188725.048404] Rebooting in 10 seconds..