Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1765504AbZFQLLZ (ORCPT ); Wed, 17 Jun 2009 07:11:25 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1765409AbZFQLIW (ORCPT ); Wed, 17 Jun 2009 07:08:22 -0400
Received: from mx3.mail.elte.hu ([157.181.1.138]:40453 "EHLO mx3.mail.elte.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1763663AbZFQLIR (ORCPT ); Wed, 17 Jun 2009 07:08:17 -0400
Date: Wed, 17 Jun 2009 13:08:03 +0200
From: Ingo Molnar
To: Eric Dumazet
Cc: David Miller, Thomas Gleixner, torvalds@linux-foundation.org, akpm@linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Patrick McHardy
Subject: Re: [bug] __nf_ct_refresh_acct(): WARNING: at lib/list_debug.c:30 __list_add+0x7d/0xad()
Message-ID: <20090617110803.GA10175@elte.hu>
References: <20090615.050449.144947903.davem@davemloft.net> <20090616091538.GA4184@elte.hu> <20090616.034752.226811527.davem@davemloft.net> <20090616105304.GA3579@elte.hu> <20090616122415.GA16630@elte.hu> <20090617092152.GA17449@elte.hu> <4A38C2F3.3000009@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <4A38C2F3.3000009@gmail.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-ELTE-SpamScore: -1.5
X-ELTE-SpamLevel:
X-ELTE-SpamCheck: no
X-ELTE-SpamVersion: ELTE 2.0
X-ELTE-SpamCheck-Details: score=-1.5 required=5.9 tests=BAYES_00 autolearn=no SpamAssassin version=3.2.5 -1.5 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6168
Lines: 115

* Eric Dumazet wrote:

> Ingo Molnar wrote:
> > here's another bug i triggered today - some sort of memory/list
> > corruption going on in the timer code.
> > Then i turned on debugobjects
> > and got a pretty specific assert in the TCP code:
> >
> > [ 48.320340] ------------[ cut here ]------------
> > [ 48.324031] WARNING: at lib/list_debug.c:30 __list_add+0x7d/0xad()
> > [ 48.324031] Hardware name: System Product Name
> > [ 48.324031] list_add corruption. prev->next should be next (ffffffff81fe2280), but was ffff88003f901440. (prev=ffff880002a9bcf0).
> > [ 48.324031] Modules linked in:
> > [ 48.324031] Pid: 0, comm: swapper Tainted: G W 2.6.30-tip #54394
> > [ 48.324031] Call Trace:
> > [ 48.324031] [] ? __list_add+0x7d/0xad
> > [ 48.324031] [] warn_slowpath_common+0x8d/0xd0
> > [ 48.324031] [] warn_slowpath_fmt+0x50/0x66
> > [ 48.324031] [] __list_add+0x7d/0xad
> > [ 48.324031] [] internal_add_timer+0xd1/0xe7
> > [ 48.324031] [] __mod_timer+0x107/0x139
> > [ 48.324031] [] mod_timer_pending+0x28/0x3e
> > [ 48.324031] [] __nf_ct_refresh_acct+0x71/0xf9
> > [ 48.324031] [] tcp_packet+0x60c/0x6a2
> > [ 48.324031] [] ? nf_conntrack_find_get+0xb7/0xef
> > [ 48.324031] [] ? nf_conntrack_find_get+0x0/0xef
> > [ 48.324031] [] nf_conntrack_in+0x3a3/0x534
> > [ 48.324031] [] ? ip_rcv_finish+0x0/0x3bc
> > [ 48.324031] [] ipv4_conntrack_in+0x34/0x4a
> > [ 48.324031] [] nf_iterate+0x5d/0xb1
> > [ 48.324031] [] ? ftrace_call+0x5/0x2b
> > [ 48.324031] [] ? ip_rcv_finish+0x0/0x3bc
> > [ 48.324031] [] nf_hook_slow+0xa4/0x133
> > [ 48.324031] [] ? ip_rcv_finish+0x0/0x3bc
> > [ 48.324031] [] ip_rcv+0x2ae/0x30d
> > [ 48.324031] [] ? netpoll_rx+0x14/0x9d
> > [ 48.324031] [] netif_receive_skb+0x3b1/0x402
> > [ 48.324031] [] ? netif_receive_skb+0x17b/0x402
> > [ 48.324031] [] ? skb_pull+0xd/0x59
> > [ 48.324031] [] ? eth_type_trans+0x48/0x104
> > [ 48.324031] [] nv_rx_process_optimized+0x15a/0x227
> > [ 48.324031] [] nv_napi_poll+0x2a9/0x2cd
> > [ 48.324031] [] net_rx_action+0xd1/0x249
> > [ 48.324031] [] ? net_rx_action+0x1e8/0x249
> > [ 48.324031] [] __do_softirq+0xcb/0x1bb
> > [ 48.324031] [] call_softirq+0x1c/0x30
> > [ 48.324031] [] do_softirq+0x5f/0xd7
> > [ 48.324031] [] irq_exit+0x66/0xb9
> > [ 48.324031] [] do_IRQ+0xbb/0xe8
> > [ 48.324031] [] ? early_idt_handler+0x0/0x71
> > [ 48.324031] [] ret_from_intr+0x0/0x16
> > [ 48.324031] [] ? default_idle+0x59/0x9d
> > [ 48.324031] [] ? trace_hardirqs_on+0x20/0x36
> > [ 48.324031] [] ? native_safe_halt+0xb/0xd
> > [ 48.324031] [] ? native_safe_halt+0x9/0xd
> > [ 48.324031] [] ? default_idle+0x5e/0x9d
> > [ 48.324031] [] ? stop_critical_timings+0x3d/0x54
> > [ 48.324031] [] ? cpu_idle+0xbe/0x107
> > [ 48.324031] [] ? early_idt_handler+0x0/0x71
> > [ 48.324031] [] ? rest_init+0x79/0x8f
> > [ 48.324031] [] ? early_idt_handler+0x0/0x71
> > [ 48.324031] [] ? start_kernel+0x2d8/0x2f3
> > [ 48.324031] [] ? early_idt_handler+0x0/0x71
> > [ 48.324031] [] ? x86_64_start_reservations+0x8f/0xaa
> > [ 48.324031] [] ? __init_begin+0x0/0x140
> > [ 48.324031] [] ? x86_64_start_kernel+0x104/0x127
> > [ 48.324031] ---[ end trace 5a5d197966b56a31 ]---
> > modprobe: FATAL: Could not load /lib/modules/2.6.30-tip/modules.dep: No such file or directory
> >
> > this too is a new pattern. Config and full bootlog attached.
> >
> > Unfortunately it's not clearly reproducible - needs some networking
> > load to trigger, and sometimes the symptoms are just a straight hang
> > (with no console messages) - so not very bisection friendly.
> >
> > 	Ingo
> >
>
> commit 65cb9fda32be613216f601a330b311c3bd7a8436 seems the origin...
> (and/or 440f0d588555892601cfe511728a0fc0c8204063)
>
> commit 65cb9fda32be613216f601a330b311c3bd7a8436
> Author: Patrick McHardy
> Date: Sat Jun 13 12:21:49 2009 +0200
>
>     netfilter: nf_conntrack: use mod_timer_pending() for conntrack refresh
>
>     Use mod_timer_pending() instead of atomic sequence of del_timer()/
>     add_timer().
>     mod_timer_pending() does not rearm an inactive timer,
>     so we don't need the conntrack lock anymore to make sure we don't
>     accidentally rearm a timer of a conntrack which is in the process
>     of being destroyed.
>
>     With this change, we don't need to take the global lock anymore at all,
>     counter updates can be performed under the per-conntrack lock.
>
>     Signed-off-by: Patrick McHardy
>
>
>
> IPS_CONFIRMED_BIT is set under nf_conntrack_lock (in __nf_conntrack_confirm()),
> we probably want to add a synchronisation under ct->lock as well,
> or __nf_ct_refresh_acct() could set ct->timeout.expires to extra_jiffies,
> while a different cpu could confirm the conntrack.
>
> Following patch as RFC

A quick test suggests that it seems to work here - thanks Eric!

	Ingo