Hi Ingo... I'm seeing a few freezes with 2.6.15-rt21, they seem to be
few and far between and most of the time they don't leave traces behind
in the logs (and the reset button is the only way out). Here's one that
apparently did leave something behind:
Mar 23 15:22:48 host kernel: BUG at net/ipv4/netfilter/ip_conntrack_core.c:124!
Mar 23 15:22:48 host kernel: ------------[ cut here ]------------
Mar 23 15:22:48 host kernel: kernel BUG at net/ipv4/netfilter/ip_conntrack_core.c:124!
Mar 23 15:22:48 host kernel: invalid operand: 0000 [#1]
Mar 23 15:22:48 host kernel: PREEMPT SMP
Mar 23 15:22:48 host kernel: last sysfs file: /devices/pci0000:00/0000:00:0b.0/0000:01:00.0/modalias
Mar 23 15:22:48 host kernel: Modules linked in: radeon drm parport_pc lp
parport snd_seq_midi(U) autofs4 ipt_REJECT ipt_LOG ipt_state ipt_pkttype
ipt_CONNMARK ipt_MARK ipt_connmark ipt_owner ipt_recent ipt_iprange
ipt_physdev ipt_multiport ipt_conntrack iptable_mangle ip_nat_irc
ip_nat_tftp ip_nat_ftp iptable_nat ip_nat ip_conntrack_irc
ip_conntrack_tftp ip_conntrack_ftp ip_conntrack nfnetlink iptable_filter
ip_tables nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc dm_mod video
button battery ac ipv6 ohci1394 ieee1394 ohci_hcd ehci_hcd i2c_nforce2
i2c_core snd_intel8x0(U) snd_ac97_codec(U) snd_ac97_bus(U) skge
snd_hdsp(U) snd_rawmidi(U) snd_seq_dummy(U) snd_seq_oss(U)
snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U)
snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) snd_hwdep(U)
snd(U) soundcore sk98lin floppy ext3 jbd sata_nv sata_sil libata sd_mod
scsi_mod
Mar 23 15:22:48 host kernel: CPU: 1
Mar 23 15:22:48 host kernel: EIP: 0060:[<f8ca8bbe>] Not tainted VLI
Mar 23 15:22:48 host kernel: EFLAGS: 00010292 (2.6.15-1.1833.4.rrt.rhfc4.ccrmasmp)
Mar 23 15:22:48 host kernel: EIP is at __ip_ct_event_cache_init+0x9b/0xaa [ip_conntrack]
Mar 23 15:22:48 host kernel: eax: 00000036 ebx: c2e34ec0 ecx: 00000000 edx: c335c270
Mar 23 15:22:48 host kernel: esi: f4035888 edi: 00000000 ebp: f4035888 esp: f7a078d0
Mar 23 15:22:48 host kernel: ds: 007b es: 007b ss: 0068 preempt: 00000001
Mar 23 15:22:48 host kernel: Process ardour (pid: 13156, threadinfo=f7a07000 task=c335c270 stack_left=2200 worst_left=-1)
Mar 23 15:22:48 host kernel: Stack: f8caddad f8cad80c 0000007c c2e24ec0 c044df40 f8caabb5 8bc540ab 97c540ab
Mar 23 15:22:48 host kernel: 00000000 dfcddd40 00000000 00000808 f4035888 dfcddd40 f7a079dc f4035888
Mar 23 15:22:49 host kernel: f8cac890 0002bf20 00000001 f8cb52e0 f4035888 f7a079dc f4035888 f8ca9ffb
Mar 23 15:22:49 host kernel: Call Trace:
Mar 23 15:22:49 host kernel: [<f8caabb5>] __ip_ct_refresh_acct+0x92/0x164 [ip_conntrack] (24)
Mar 23 15:22:49 host kernel: [<f8cac890>] udp_packet+0x2d/0xc2 [ip_conntrack] (44)
Mar 23 15:22:49 host kernel: [<f8ca9ffb>] ip_conntrack_in+0x142/0x3a4 [ip_conntrack] (28)
Mar 23 15:22:49 host kernel: [<c02e958d>] nf_iterate+0x60/0x84 (64)
Mar 23 15:22:49 host kernel: [<c02f232d>] dst_output+0x0/0x18 (8)
Mar 23 15:22:49 host kernel: [<c02e9603>] nf_hook_slow+0x52/0xf5 (28)
Mar 23 15:22:49 host kernel: [<c02f232d>] dst_output+0x0/0x18 (16)
Mar 23 15:22:49 host kernel: [<c02f501e>] ip_push_pending_frames+0x315/0x4dd (28)
Mar 23 15:22:49 host kernel: [<c02f232d>] dst_output+0x0/0x18 (12)
Mar 23 15:22:49 host kernel: [<c030f3e7>] udp_push_pending_frames+0x131/0x26b (40)
Mar 23 15:22:49 host kernel: [<c030f90d>] udp_sendmsg+0x3b0/0x69f (40)
Mar 23 15:22:49 host kernel: [<c03165bc>] inet_sendmsg+0x2b/0x49 (164)
Mar 23 15:22:49 host kernel: [<c02c7e93>] sock_sendmsg+0xde/0xf9 (24)
Mar 23 15:22:49 host kernel: [<c0332e84>] __schedule+0x314/0x9ac (88)
Mar 23 15:22:49 host kernel: [<c013a864>] autoremove_wake_function+0x0/0x37 (12)
Mar 23 15:22:49 host kernel: [<c03336d5>] preempt_schedule_irq+0x3e/0x58 (68)
Mar 23 15:22:49 host kernel: [<c02c7ed4>] kernel_sendmsg+0x26/0x2c (48)
Mar 23 15:22:49 host kernel: [<f8c6929f>] xs_udp_send_request+0x229/0x396 [sunrpc] (12)
Mar 23 15:22:49 host kernel: [<c0332e84>] __schedule+0x314/0x9ac (12)
Mar 23 15:22:49 host kernel: [<f8c68496>] xprt_transmit+0x48/0x1dc [sunrpc] (88)
Mar 23 15:22:49 host kernel: [<f8c66bc0>] call_encode+0x94/0xfe [sunrpc] (12)
Mar 23 15:22:49 host kernel: [<f8c66f77>] call_transmit+0x47/0xc0 [sunrpc] (32)
Mar 23 15:22:49 host kernel: [<f8c6b75f>] __rpc_execute+0x5a/0x22d [sunrpc] (20)
Mar 23 15:22:49 host kernel: [<c03355db>] lock_kernel+0x1d/0x23 (12)
Mar 23 15:22:49 host kernel: [<f8cd78c3>] nfs_execute_read+0x34/0x49 [nfs] (12)
Mar 23 15:22:49 host kernel: [<f8cd7bdb>] nfs_pagein_one+0xe8/0x100 [nfs] (24)
Mar 23 15:22:49 host kernel: [<f8cd7c35>] nfs_pagein_list+0x42/0x69 [nfs] (28)
Mar 23 15:22:49 host kernel: [<f8cd81a9>] nfs_readpages+0x86/0xf6 [nfs] (32)
Mar 23 15:22:49 host kernel: [<f8cd8123>] nfs_readpages+0x0/0xf6 [nfs] (48)
Mar 23 15:22:49 host kernel: [<c0157304>] read_pages+0x2a/0x13d (16)
Mar 23 15:22:49 host kernel: [<c01575a8>] __do_page_cache_readahead+0x191/0x1bf (80)
Mar 23 15:22:49 host kernel: [<c03336d5>] preempt_schedule_irq+0x3e/0x58 (12)
Mar 23 15:22:49 host kernel: [<c01576e9>] blockable_page_cache_readahead+0x53/0xbc (60)
Mar 23 15:22:49 host kernel: [<c01577ad>] make_ahead_window+0x5b/0x98 (24)
Mar 23 15:22:49 host kernel: [<c015786f>] page_cache_readahead+0x85/0x15f (32)
Mar 23 15:22:49 host kernel: [<c0150e92>] file_read_actor+0x6d/0xd0 (8)
Mar 23 15:22:49 host kernel: [<c0150d02>] do_generic_mapping_read+0x412/0x535 (32)
Mar 23 15:22:49 host kernel: [<c013defd>] hrtimer_interrupt+0x151/0x231 (12)
Mar 23 15:22:49 host kernel: [<c015106d>] __generic_file_aio_read+0x178/0x24c (112)
Mar 23 15:22:49 host kernel: [<c0150e25>] file_read_actor+0x0/0xd0 (12)
Mar 23 15:22:49 host kernel: [<c015117f>] generic_file_aio_read+0x3e/0x6b (72)
Mar 23 15:22:49 host kernel: [<c0170f19>] do_sync_read+0xbb/0x116 (36)
Mar 23 15:22:49 host kernel: [<c01d0166>] selinux_file_permission+0xe3/0x15a (56)
Mar 23 15:22:49 host kernel: [<c013a864>] autoremove_wake_function+0x0/0x37 (52)
Mar 23 15:22:49 host kernel: [<c0170e5e>] do_sync_read+0x0/0x116 (40)
Mar 23 15:22:49 host kernel: [<c0171014>] vfs_read+0xa0/0x158 (4)
Mar 23 15:22:49 host kernel: [<c017146e>] sys_pread64+0x5e/0x62 (24)
Mar 23 15:22:49 host kernel: [<c0104241>] syscall_call+0x7/0xb (20)
Mar 23 15:22:49 host kernel: Code: c7 8b 0b eb b9 89 c8 ff 51 04 8d 76
00 eb c4 c7 44 24 08 7c 00 00 00 c7 44 24 04 0c d8 ca f8 c7 04 24 ad dd
ca f8 e8 77 dc 47 c7 <0f> 0b 7c 00 0c d8 ca f8 8b 0b e9 79 ff ff ff 53
a1 ac 75 39 c0
-- Fernando
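
A note for anyone decoding the "Code:" line in the report above: on i386 kernels of this era built with verbose BUG reporting, BUG() is understood to expand to a ud2 instruction (bytes 0f 0b) followed by a 16-bit line number and a 32-bit pointer to the file name. The faulting bytes "<0f> 0b 7c 00 0c d8 ca f8" are consistent with that layout. Below is a minimal user-space sketch of the decoding, assuming that layout holds for this build. decode_bug(), struct bug_info and oops_tail are hypothetical names made up for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: what follows a ud2 under the assumed layout. */
struct bug_info {
    int      found;      /* 1 if a ud2 (0f 0b) with 6 trailing bytes was seen */
    uint16_t line;       /* little-endian 16-bit line number                  */
    uint32_t file_addr;  /* little-endian 32-bit address of the file string   */
};

/* Scan a byte dump for ud2 and decode the 6 bytes after it.
 * Assumption: "ud2; .word line; .long file" layout, as described above. */
static struct bug_info decode_bug(const uint8_t *code, size_t len)
{
    struct bug_info info = { 0, 0, 0 };

    for (size_t i = 0; i + 8 <= len; i++) {
        if (code[i] == 0x0f && code[i + 1] == 0x0b) {
            info.found = 1;
            info.line = (uint16_t)(code[i + 2] | (code[i + 3] << 8));
            info.file_addr = (uint32_t)code[i + 4]
                           | ((uint32_t)code[i + 5] << 8)
                           | ((uint32_t)code[i + 6] << 16)
                           | ((uint32_t)code[i + 7] << 24);
            return info;
        }
    }
    return info;
}

/* The bytes at and after the <0f> marker in the Code: line of the report. */
static const uint8_t oops_tail[] = {
    0x0f, 0x0b, 0x7c, 0x00, 0x0c, 0xd8, 0xca, 0xf8
};
```

Under that assumption, decode_bug(oops_tail, sizeof(oops_tail)) gives line 0x7c (124), which matches ip_conntrack_core.c:124 in the report, and a file-name pointer of 0xf8cad80c, which also shows up as the second word of the stack dump.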
* Fernando Lopez-Lezcano <[email protected]> wrote:
> Hi Ingo... I'm seeing a few freezes with 2.6.15-rt21, they seem to be
> few and far between and most of the time they don't leave traces
> behind in the logs (and the reset button is the only way out). Here's
> one that apparently did leave something behind:
>
> Mar 23 15:22:48 host kernel: BUG at
> net/ipv4/netfilter/ip_conntrack_core.c:124!
does the patch below help?
Ingo
Index: linux/include/linux/netfilter_ipv4/ip_conntrack.h
===================================================================
--- linux.orig/include/linux/netfilter_ipv4/ip_conntrack.h
+++ linux/include/linux/netfilter_ipv4/ip_conntrack.h
@@ -336,7 +336,8 @@ ip_conntrack_expect_unregister_notifier(
}
extern void ip_ct_deliver_cached_events(const struct ip_conntrack *ct);
-extern void __ip_ct_event_cache_init(struct ip_conntrack *ct);
+extern void __ip_ct_event_cache_init(struct ip_conntrack_ecache *ecache,
+ struct ip_conntrack *ct);
static inline void
ip_conntrack_event_cache(enum ip_conntrack_events event,
@@ -349,7 +350,7 @@ ip_conntrack_event_cache(enum ip_conntra
local_bh_disable();
ecache = &get_cpu_var_locked(ip_conntrack_ecache, &cpu);
if (ct != ecache->ct)
- __ip_ct_event_cache_init(ct);
+ __ip_ct_event_cache_init(ecache, ct);
ecache->events |= event;
put_cpu_var_locked(ip_conntrack_ecache, cpu);
local_bh_enable();
Index: linux/net/ipv4/netfilter/ip_conntrack_core.c
===================================================================
--- linux.orig/net/ipv4/netfilter/ip_conntrack_core.c
+++ linux/net/ipv4/netfilter/ip_conntrack_core.c
@@ -114,13 +114,10 @@ void ip_ct_deliver_cached_events(const s
local_bh_enable();
}
-void __ip_ct_event_cache_init(struct ip_conntrack *ct)
+void __ip_ct_event_cache_init(struct ip_conntrack_ecache *ecache,
+ struct ip_conntrack *ct)
{
- struct ip_conntrack_ecache *ecache;
- int cpu = raw_smp_processor_id();
-
/* take care of delivering potentially old events */
- ecache = &__get_cpu_var_locked(ip_conntrack_ecache, cpu);
BUG_ON(ecache->ct == ct);
if (ecache->ct)
__ip_ct_deliver_cached_events(ecache);
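
One plausible reading of the patch above, sketched as a user-space model: before the change, the inline caller picked a per-CPU ecache slot and compared it against ct, and the callee then looked the slot up again from the current CPU number. If the task is on a different CPU by the time of the second lookup, the callee can see a slot whose ct already equals the one being cached, and the BUG_ON fires. Passing the caller's slot down removes the second lookup. Everything below (struct conn, struct ecache, current_cpu, the simulate_* functions) is a hypothetical stand-in, not kernel code, and the migration between the two lookups is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2

struct conn   { int id; };
struct ecache { struct conn *ct; unsigned int events; };

static struct ecache ecaches[NR_CPUS]; /* stand-in for the per-CPU variable */
static int current_cpu;                /* stand-in for the current CPU id   */

/* Old shape: the callee re-derives the slot from the current CPU.
 * Returns 1 where the kernel's BUG_ON(ecache->ct == ct) would fire. */
static int cache_init_old(struct conn *ct)
{
    struct ecache *ecache = &ecaches[current_cpu];

    if (ecache->ct == ct)
        return 1;
    ecache->ct = ct;
    ecache->events = 0;
    return 0;
}

/* New shape: the callee uses the slot the caller already holds. */
static int cache_init_new(struct ecache *ecache, struct conn *ct)
{
    if (ecache->ct == ct)
        return 1;
    ecache->ct = ct;
    ecache->events = 0;
    return 0;
}

/* Start state: ct is already cached in CPU 1's slot, CPU 0's slot is empty. */
static void reset_slots(struct conn *ct)
{
    ecaches[0].ct = NULL; ecaches[0].events = 0;
    ecaches[1].ct = ct;   ecaches[1].events = 0;
}

static int simulate_old_lookup(void)
{
    static struct conn ct = { 42 };
    struct ecache *ecache;

    reset_slots(&ct);
    current_cpu = 0;
    ecache = &ecaches[current_cpu];  /* caller's lookup: CPU 0's slot      */
    if (&ct != ecache->ct) {
        current_cpu = 1;             /* assumed migration before the call  */
        return cache_init_old(&ct);  /* second lookup lands on CPU 1's slot */
    }
    return 0;
}

static int simulate_new_lookup(void)
{
    static struct conn ct = { 42 };
    struct ecache *ecache;

    reset_slots(&ct);
    current_cpu = 0;
    ecache = &ecaches[current_cpu];
    if (&ct != ecache->ct) {
        current_cpu = 1;                     /* same assumed migration     */
        return cache_init_new(ecache, &ct);  /* still uses CPU 0's slot    */
    }
    return 0;
}
```

In this model simulate_old_lookup() reports the BUG_ON condition and simulate_new_lookup() does not, because the check is always made against the same slot the caller compared. Whether migration is what actually happened on the reporting machine is not established by the thread.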
* Ingo Molnar <[email protected]> wrote:
> > Mar 23 15:22:48 host kernel: BUG at
> > net/ipv4/netfilter/ip_conntrack_core.c:124!
>
> does the patch below help?
updated patch below.
Ingo
Index: linux/include/linux/netfilter_ipv4/ip_conntrack.h
===================================================================
--- linux.orig/include/linux/netfilter_ipv4/ip_conntrack.h
+++ linux/include/linux/netfilter_ipv4/ip_conntrack.h
@@ -336,7 +336,8 @@ ip_conntrack_expect_unregister_notifier(
}
extern void ip_ct_deliver_cached_events(const struct ip_conntrack *ct);
-extern void __ip_ct_event_cache_init(struct ip_conntrack *ct);
+extern void __ip_ct_event_cache_init(struct ip_conntrack_ecache *ecache,
+ struct ip_conntrack *ct);
static inline void
ip_conntrack_event_cache(enum ip_conntrack_events event,
@@ -349,7 +350,7 @@ ip_conntrack_event_cache(enum ip_conntra
local_bh_disable();
ecache = &get_cpu_var_locked(ip_conntrack_ecache, &cpu);
if (ct != ecache->ct)
- __ip_ct_event_cache_init(ct);
+ __ip_ct_event_cache_init(ecache, ct);
ecache->events |= event;
put_cpu_var_locked(ip_conntrack_ecache, cpu);
local_bh_enable();
Index: linux/net/ipv4/netfilter/arp_tables.c
===================================================================
--- linux.orig/net/ipv4/netfilter/arp_tables.c
+++ linux/net/ipv4/netfilter/arp_tables.c
@@ -248,7 +248,7 @@ unsigned int arpt_do_table(struct sk_buf
outdev = out ? out->name : nulldevname;
read_lock_bh(&table->lock);
- table_base = (void *)private->entries[smp_processor_id()];
+ table_base = (void *)private->entries[raw_smp_processor_id()];
e = get_entry(table_base, private->hook_entry[hook]);
back = get_entry(table_base, private->underflow[hook]);
@@ -948,7 +948,7 @@ static int do_add_counters(void __user *
i = 0;
/* Choose the copy that is on our node */
- loc_cpu_entry = private->entries[smp_processor_id()];
+ loc_cpu_entry = private->entries[raw_smp_processor_id()];
ARPT_ENTRY_ITERATE(loc_cpu_entry,
private->size,
add_counter_to_entry,
Index: linux/net/ipv4/netfilter/ip_conntrack_core.c
===================================================================
--- linux.orig/net/ipv4/netfilter/ip_conntrack_core.c
+++ linux/net/ipv4/netfilter/ip_conntrack_core.c
@@ -114,13 +114,10 @@ void ip_ct_deliver_cached_events(const s
local_bh_enable();
}
-void __ip_ct_event_cache_init(struct ip_conntrack *ct)
+void __ip_ct_event_cache_init(struct ip_conntrack_ecache *ecache,
+ struct ip_conntrack *ct)
{
- struct ip_conntrack_ecache *ecache;
- int cpu = raw_smp_processor_id();
-
/* take care of delivering potentially old events */
- ecache = &__get_cpu_var_locked(ip_conntrack_ecache, cpu);
BUG_ON(ecache->ct == ct);
if (ecache->ct)
__ip_ct_deliver_cached_events(ecache);
Index: linux/net/ipv4/netfilter/ip_tables.c
===================================================================
--- linux.orig/net/ipv4/netfilter/ip_tables.c
+++ linux/net/ipv4/netfilter/ip_tables.c
@@ -246,7 +246,7 @@ ipt_do_table(struct sk_buff **pskb,
read_lock_bh(&table->lock);
IP_NF_ASSERT(table->valid_hooks & (1 << hook));
- table_base = (void *)private->entries[smp_processor_id()];
+ table_base = (void *)private->entries[raw_smp_processor_id()];
e = get_entry(table_base, private->hook_entry[hook]);
/* For return from builtin chain */
On Sun, 2006-03-26 at 18:34 +0200, Ingo Molnar wrote:
> * Ingo Molnar <[email protected]> wrote:
>
> > > Mar 23 15:22:48 host kernel: BUG at
> > > net/ipv4/netfilter/ip_conntrack_core.c:124!
> >
> > does the patch below help?
>
> updated patch below.
Thanks! I'll test later today. It may take a while to be reasonably sure
whether it makes a difference. The hangs have not been frequent.
If I try a 2.6.16 based kernel, should I also use this patch?
-- Fernando
* Fernando Lopez-Lezcano <[email protected]> wrote:
> On Sun, 2006-03-26 at 18:34 +0200, Ingo Molnar wrote:
> > * Ingo Molnar <[email protected]> wrote:
> >
> > > > Mar 23 15:22:48 host kernel: BUG at
> > > > net/ipv4/netfilter/ip_conntrack_core.c:124!
> > >
> > > does the patch below help?
> >
> > updated patch below.
>
> Thanks! I'll test later today. It may take a while to be reasonably
> sure whether it makes a difference. The hangs have not been frequent.
>
> If I try a 2.6.16 based kernel, should I also use this patch?
not needed - it's included in -rt10. (-rt9 had it too but was buggy)
Ingo
On Mon, 2006-03-27 at 01:19 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[email protected]> wrote:
>
> > On Sun, 2006-03-26 at 18:34 +0200, Ingo Molnar wrote:
> > > * Ingo Molnar <[email protected]> wrote:
> > >
> > > > > Mar 23 15:22:48 host kernel: BUG at
> > > > > net/ipv4/netfilter/ip_conntrack_core.c:124!
> > > >
> > > > does the patch below help?
> > >
> > > updated patch below.
> >
> > Thanks! I'll test later today. It may take a while to be reasonably
> > sure whether it makes a difference. The hangs have not been frequent.
> >
> > If I try a 2.6.16 based kernel, should I also use this patch?
>
> not needed - it's included in -rt10. (-rt9 had it too but was buggy)
Oh well, I just experienced another complete hang, no traces left
behind, reset button was the only option :-( So most probably this
problem is unrelated to your fix. I'm including a dmesg of the
successful reboot just in case there is something there of use (I only
had to apply one chunk of your fix to arp_tables.c, the other raw_* was
already there, maybe included in 2.6.15.6?).
Maybe I'll have to try 2.6.16...
-- Fernando
On Sun, 2006-03-26 at 16:33 -0800, Fernando Lopez-Lezcano wrote:
> On Mon, 2006-03-27 at 01:19 +0200, Ingo Molnar wrote:
> > * Fernando Lopez-Lezcano <[email protected]> wrote:
> > > On Sun, 2006-03-26 at 18:34 +0200, Ingo Molnar wrote:
> > > > * Ingo Molnar <[email protected]> wrote:
> > > >
> > > > > > Mar 23 15:22:48 host kernel: BUG at
> > > > > > net/ipv4/netfilter/ip_conntrack_core.c:124!
> > > > >
> > > > > does the patch below help?
> > > >
> > > > updated patch below.
> > >
> > > Thanks! I'll test later today. It may take a while to be reasonably
> > > sure whether it makes a difference. The hangs have not been frequent.
> > >
> > > If I try a 2.6.16 based kernel, should I also use this patch?
> >
> > not needed - it's included in -rt10. (-rt9 had it too but was buggy)
>
> Oh well, I just experienced another complete hang, no traces left
> behind, reset button was the only option :-( So most probably this
> problem is unrelated to your fix.
This morning I got a not-so-complete hang, that is, a sysrq-b key
combination surprisingly rebooted the machine!
So I tried again and managed to get it to hang, and captured a sysrq-T
trace of all running programs through a serial console link, which I
attach to this email (including the boot process that led to it). This
is running 2.6.15.7 + 2.6.15-rt21 + small fix you sent a few days ago on
an Athlon X2. I was logged in running evolution, mozilla and downloading
some iso images with bittorrent.
Hopefully some clue will be found on what the heck is happening...
Thanks for all the help so far!
-- Fernando