Hello!
My laptop is an Acer TravelMate 630, and somewhere between 2.6.2 and 2.6.3-rc2
the kernel began returning an oops right after boot.
kernel BUG at kernel/timer.c:370!
invalid operand: 0000 [#1]
CPU: 0
EIP: 0060:[<c0127177>] Not tainted
EFLAGS: 00010006
EIP is at cascade+0x44/0x4e
eax: c03e4368 ebx: c03e02b0 ecx: fffce200 edx: c03e03b0
esi: c03e0398 edi: c03dfa80 ebp: c0387f08 esp: c0387ef4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0386000 task=c0306520)
Stack: c03dfa80 cde229c4 00000000 c03df7a8 c0387f20 c0387f38 c0127732 c03dfa80
c03e0288 00000022 c0387f34 c0387f20 c0387f20 c0308d64 00000001 c03df7a8
0000000a c0387f54 c0123b7c c03df7a8 00000046 00000000 c037da00 c0308d64
Call Trace:
[<c0127732>] run_timer_softirq+0xec/0x16b
[<c0123b7c>] do_softirq+0x98/0x9a
[<c010d2ff>] do_IRQ+0xe4/0x11c
[<c010b974>] common_interrupt+0x18/0x20
[<d08c8257>] acpi_processor_idle+0xe9/0x1e5 [processor]
[<c0105000>] _stext+0x0/0x2a
[<c01090b7>] cpu_idle+0x2f/0x38
[<c038c70a>] start_kernel+0x185/0x1c9
[<c038c44a>] unknown_bootoption+0x0/0x108
Code: 0f 0b 72 01 3b 05 2d c0 eb d4 55 89 e5 56 53 83 ec 04 0f bf
Here is the function:
static int cascade(tvec_base_t *base, tvec_t *tv, int index)
{
/* cascade all the timers from tv up one level */
struct list_head *head, *curr;
head = tv->vec + index;
curr = head->next;
/*
* We are removing _all_ timers from the list, so we don't have to
* detach them individually, just clear the list afterwards.
*/
while (curr != head) {
struct timer_list *tmp;
tmp = list_entry(curr, struct timer_list, entry);
BUG_ON(tmp->base != base);
curr = curr->next;
internal_add_timer(base, tmp);
}
INIT_LIST_HEAD(head);
return index;
}
Any ideas about this one?
Thanks!
--
Flávio Bruno Leitner <[email protected]>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
Flavio Bruno Leitner <[email protected]> wrote:
>
> My laptop is an Acer TravelMate 630, and somewhere between 2.6.2 and 2.6.3-rc2
> the kernel began returning an oops right after boot.
>
> kernel BUG at kernel/timer.c:370!
Oh fantastic. Something scrogged the timer lists.
I suggest you try stripping your kernel config down to the bare minimum
which is needed to boot, see if that fixes it and if so, start
reintroducing things until you've worked out which driver is causing the
problem.
On Fri, Mar 05, 2004 at 03:06:15PM -0800, Andrew Morton wrote:
> Flavio Bruno Leitner <[email protected]> wrote:
> >
> > My laptop is an Acer TravelMate 630, and somewhere between 2.6.2 and 2.6.3-rc2
> > the kernel began returning an oops right after boot.
> >
> > kernel BUG at kernel/timer.c:370!
>
> Oh fantastic. Something scrogged the timer lists.
>
> I suggest you try stripping your kernel config down to the bare minimum
> which is needed to boot, see if that fixes it and if so, start
> reintroducing things until you've worked out which driver is causing the
> problem.
Done!
The oops happens when the patch is applied: just run ifconfig eth0 down
and then ifconfig eth0 <with another ip> up. DHCP always gets the wrong IP,
so my rc.local runs ifconfig down and up. With the patch removed, I can't
reproduce it anymore.
This oops still happens with newer kernels.
Thanks!
--
Flávio Bruno Leitner <[email protected]>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
Flavio Bruno Leitner <[email protected]> wrote:
>
> On Fri, Mar 05, 2004 at 03:06:15PM -0800, Andrew Morton wrote:
> > Flavio Bruno Leitner <[email protected]> wrote:
> > >
> > > My laptop is an Acer TravelMate 630, and somewhere between 2.6.2 and 2.6.3-rc2
> > > the kernel began returning an oops right after boot.
> > >
> > > kernel BUG at kernel/timer.c:370!
> >
> > Oh fantastic. Something scrogged the timer lists.
> >
> > I suggest you try stripping your kernel config down to the bare minimum
> > which is needed to boot, see if that fixes it and if so, start
> > reintroducing things until you've worked out which driver is causing the
> > problem.
>
> Done!
>
> The oops happens when the patch is applied: just run ifconfig eth0 down
> and then ifconfig eth0 <with another ip> up. DHCP always gets the wrong IP,
> so my rc.local runs ifconfig down and up. With the patch removed, I can't
> reproduce it anymore.
>
Thanks for working that out. Maybe we need to terminate those sysctl
tables. Does this fix it?
---
25-akpm/net/ipv4/devinet.c | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)
diff -puN net/ipv4/devinet.c~devinet-ctl_table-fix net/ipv4/devinet.c
--- 25/net/ipv4/devinet.c~devinet-ctl_table-fix Thu Mar 11 13:40:38 2004
+++ 25-akpm/net/ipv4/devinet.c Thu Mar 11 13:40:53 2004
@@ -1210,11 +1210,11 @@ int ipv4_doint_and_flush_strategy(ctl_ta
static struct devinet_sysctl_table {
struct ctl_table_header *sysctl_header;
- ctl_table devinet_vars[20];
- ctl_table devinet_dev[2];
- ctl_table devinet_conf_dir[2];
- ctl_table devinet_proto_dir[2];
- ctl_table devinet_root_dir[2];
+ ctl_table devinet_vars[21];
+ ctl_table devinet_dev[3];
+ ctl_table devinet_conf_dir[3];
+ ctl_table devinet_proto_dir[3];
+ ctl_table devinet_root_dir[3];
} devinet_sysctl = {
.devinet_vars = {
{
@@ -1372,6 +1372,7 @@ static struct devinet_sysctl_table {
.proc_handler = &ipv4_doint_and_flush,
.strategy = &ipv4_doint_and_flush_strategy,
},
+ { .ctl_name = 0 }
},
.devinet_dev = {
{
@@ -1380,6 +1381,7 @@ static struct devinet_sysctl_table {
.mode = 0555,
.child = devinet_sysctl.devinet_vars,
},
+ { .ctl_name = 0 }
},
.devinet_conf_dir = {
{
@@ -1388,6 +1390,7 @@ static struct devinet_sysctl_table {
.mode = 0555,
.child = devinet_sysctl.devinet_dev,
},
+ { .ctl_name = 0 }
},
.devinet_proto_dir = {
{
@@ -1396,6 +1399,7 @@ static struct devinet_sysctl_table {
.mode = 0555,
.child = devinet_sysctl.devinet_conf_dir,
},
+ { .ctl_name = 0 }
},
.devinet_root_dir = {
{
@@ -1404,6 +1408,7 @@ static struct devinet_sysctl_table {
.mode = 0555,
.child = devinet_sysctl.devinet_proto_dir,
},
+ { .ctl_name = 0 }
},
};
_
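A note on what "terminating" a table means here: as far as I understand the sysctl code of that era, it walks each ctl_table array until it reaches an entry whose ctl_name is 0, so an array with no zeroed last element would be read past its end. The sketch below is a user-space illustration only; struct demo_ctl_table, demo_count_entries and the ctl_name values are made up for the example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's ctl_table. */
struct demo_ctl_table {
    int ctl_name;          /* 0 marks the end of the table */
    const char *procname;
};

/* Count entries by walking until the zero ctl_name sentinel is reached. */
static size_t demo_count_entries(const struct demo_ctl_table *table)
{
    size_t n = 0;

    assert(table != NULL);
    while (table[n].ctl_name != 0)
        n++;
    return n;
}

/*
 * Two entries plus one terminator, hence [3]. The ctl_name values are
 * placeholders. The third element is not listed, so C zero-initializes
 * it and it acts as the sentinel.
 */
static const struct demo_ctl_table demo_vars[3] = {
    { .ctl_name = 1, .procname = "forwarding" },
    { .ctl_name = 2, .procname = "mc_forwarding" },
};
```

Calling demo_count_entries(demo_vars) yields 2 here, because the third array slot is implicitly zeroed; that is the role the added { .ctl_name = 0 } entries and the larger array sizes play in the patch above.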
On Thu, Mar 11, 2004 at 01:42:21PM -0800, Andrew Morton wrote:
> Thanks for working that out. Maybe we need to terminate those sysctl
> tables. Does this fix it?
No, still the same oops. :(
I tested it on the old kernel where this problem started and on today's
BitKeeper tree.
--
Flávio Bruno Leitner <[email protected]>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
I just observed this failure on two separate systems this morning. I
added the patch in the hopes that it will provide some useful
information.
Dave Craig
QUALCOMM Incorporated
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Andrew Morton
Sent: Saturday, February 14, 2004 12:22 AM
To: Rafael D'Halleweyn (List)
Cc: [email protected]
Subject: Re: kernel BUG at kernel/timer.c:370!
"Rafael D'Halleweyn (List)" <[email protected]> wrote:
>
> I sometimes get the following BUG (transcribed from a digital camera
> snapshot, so it might contain errors). I did not copy the stack trace,
> let me know if you want it.
>
> kernel BUG at kernel/timer.c:370!
> invalid operand: 0000 [#1]
> CPU: 0
> EIP: 0060:[<c01284f8>] Not tainted
> EFLAGS: 00010003
> EIP is at cascade+0x50/0x70
> eax: d0a77724 ebx: d0a77724 ecx: c04aaa28 edx: 0000001c
> esi: c04aab08 edi: c04aa220 ebp: 0000001c esp: c0457e9e
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0, threadinfo=c0456000 task=c03d2de0)
> Stack: ...
> Call Trace:
> [<c01289e4>] update_process_times+0x44/0x50
> [<c0128b3f>] run_timer_softirq+0x12f/0x1c0
> [<c0124695>] do_softirq+0x95/0xa0
> [<c010d2fb>] do_IRQ+0xfb/0x130
> [<c010b5e8>] common_interrupt+0x18/0x20
This could be a hardware problem. Or it could be a bug basically anywhere
in the kernel.
Are you using CONFIG_DEBUG_SLAB?
Could you please apply the below patch, wait for the problem to reoccur,
then let us know?
diff -puN kernel/timer.c~a kernel/timer.c
--- 25/kernel/timer.c~a 2004-02-14 00:14:46.000000000 -0800
+++ 25-akpm/kernel/timer.c 2004-02-14 00:20:09.000000000 -0800
@@ -31,6 +31,7 @@
#include <linux/time.h>
#include <linux/jiffies.h>
#include <linux/cpu.h>
+#include <linux/kallsyms.h>
#include <asm/uaccess.h>
#include <asm/div64.h>
@@ -367,7 +368,15 @@ static int cascade(tvec_base_t *base, tv
struct timer_list *tmp;
tmp = list_entry(curr, struct timer_list, entry);
- BUG_ON(tmp->base != base);
+ if (tmp->base != base) {
+ printk("%s: %p != %p\n",
+ __FUNCTION__, tmp->base, base);
+ printk("handler=%p", tmp->function);
+ print_symbol(" (%s)", (unsigned long)tmp->function);
+ printk("\n");
+ dump_stack();
+ tmp->base = base;
+ }
curr = curr->next;
internal_add_timer(base, tmp);
}
_
cascade: c1a1d5e0 != c1a0d5e0
handler=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
Call Trace:
[<c012ca73>] cascade+0x79/0xa1
[<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
[<c012d0b3>] run_timer_softirq+0x159/0x1c9
[<c012899d>] do_softirq+0xc9/0xcb
[<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
[<c0108c09>] default_idle+0x0/0x32
[<c010bab2>] apic_timer_interrupt+0x1a/0x20
[<c0108c09>] default_idle+0x0/0x32
[<c0108c36>] default_idle+0x2d/0x32
[<c0108cb4>] cpu_idle+0x3a/0x43
[<c0105000>] rest_init+0x0/0x68
[<c039c89f>] start_kernel+0x1b7/0x209
[<c039c427>] unknown_bootoption+0x0/0x124
Here is the result. I am doing a lot of IPv4 multicast.
Dave
-----Original Message-----
From: Craig, Dave
Sent: Wednesday, March 31, 2004 9:00 AM
To: 'Andrew Morton'; Rafael D'Halleweyn (List)
Cc: [email protected]
Subject: RE: kernel BUG at kernel/timer.c:370!
I just observed this failure on two separate systems this morning. I
added the patch in the hopes that it will provide some useful
information.
Dave Craig
QUALCOMM Incorporated
"Craig, Dave" <[email protected]> wrote:
>
> cascade: c1a1d5e0 != c1a0d5e0
> handler=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
> Call Trace:
> [<c012ca73>] cascade+0x79/0xa1
> [<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
> [<c012d0b3>] run_timer_softirq+0x159/0x1c9
> [<c012899d>] do_softirq+0xc9/0xcb
> [<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
> [<c0108c09>] default_idle+0x0/0x32
> [<c010bab2>] apic_timer_interrupt+0x1a/0x20
> [<c0108c09>] default_idle+0x0/0x32
> [<c0108c36>] default_idle+0x2d/0x32
> [<c0108cb4>] cpu_idle+0x3a/0x43
> [<c0105000>] rest_init+0x0/0x68
> [<c039c89f>] start_kernel+0x1b7/0x209
> [<c039c427>] unknown_bootoption+0x0/0x124
>
> Here is the result. I am doing a lot of IPv4 multicast.
There's only a single bit difference between the expected and actual
timer->base value. So either your machine has flakey memory or the percpu
data area happened to be separated by 64k.
Is the machine SMP? If so can you please run
nm vmlinux | grep __per_cpu
and send the output?
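The single-bit observation can be checked mechanically by XORing the two values printed by the debug patch. A minimal user-space sketch follows; the helper names are made up for this example, and the two addresses are the ones from the cascade output quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if a and b differ in exactly one bit position, else 0. */
static int differs_by_one_bit(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;

    /* A nonzero value with a single set bit has no bits in common with x - 1. */
    return x != 0 && (x & (x - 1)) == 0;
}

/* Index of the lowest set bit; the argument must be nonzero. */
static int lowest_set_bit(uint32_t x)
{
    int bit = 0;

    assert(x != 0);
    while ((x & 1u) == 0) {
        x >>= 1;
        bit++;
    }
    return bit;
}

/* Values reported as "cascade: c1a1d5e0 != c1a0d5e0". */
static const uint32_t actual_base   = 0xc1a1d5e0u;
static const uint32_t expected_base = 0xc1a0d5e0u;
```

For these two values the XOR is 0x00010000, so only bit 16 differs, which is where the 64k figure comes from.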
Sure thing.
7ecb001b A __crc___per_cpu_offset
c033a510 r __kcrctab___per_cpu_offset
c033c462 r __kstrtab___per_cpu_offset
c03366c4 r __ksymtab___per_cpu_offset
c040bd90 A __per_cpu_end
c040c020 B __per_cpu_offset
c04090a0 A __per_cpu_start
It is a dual processor and the processors are hyperthreaded.
Dave
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Andrew Morton
Sent: Wednesday, March 31, 2004 11:52 AM
To: Craig, Dave
Cc: [email protected]; [email protected]
Subject: Re: kernel BUG at kernel/timer.c:370!
There's only a single bit difference between the expected and actual
timer->base value. So either your machine has flakey memory or the percpu
data area happened to be separated by 64k.
Is the machine SMP? If so can you please run
nm vmlinux | grep __per_cpu
and send the output?
"Craig, Dave" <[email protected]> wrote:
>
> Sure thing.
>
> 7ecb001b A __crc___per_cpu_offset
> c033a510 r __kcrctab___per_cpu_offset
> c033c462 r __kstrtab___per_cpu_offset
> c03366c4 r __ksymtab___per_cpu_offset
> c040bd90 A __per_cpu_end
> c040c020 B __per_cpu_offset
> c04090a0 A __per_cpu_start
>
> It is a dual processor and the processors are hyperthreaded.
OK. We're consistently seeing a single-bit difference and there's no
simple power-of-two stride in the things which that pointer points at.
Most likely you have a hardware problem.
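For reference, the size of the per-CPU section can be read off the nm output by subtracting __per_cpu_start from __per_cpu_end. The sketch below does only that arithmetic; the function names are made up, and it is an assumption of this example that the section size is a useful proxy for the spacing of per-CPU data, which nm alone does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Symbol addresses copied from the nm output in this thread. */
#define PER_CPU_START 0xc04090a0u
#define PER_CPU_END   0xc040bd90u

/* Size in bytes of the region between the two linker symbols. */
static uint32_t per_cpu_section_size(void)
{
    assert(PER_CPU_END >= PER_CPU_START);
    return PER_CPU_END - PER_CPU_START;
}

/* Return 1 if x is a power of two, else 0. */
static int is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

The difference is 0x2cf0 (11504 bytes), which is not a power of two, unlike the 0x10000 gap between the two base pointers.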
On Wed, Mar 31, 2004 at 09:16:52AM -0800, Craig, Dave wrote:
> cascade: c1a1d5e0 != c1a0d5e0
> handler=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
> Call Trace:
> [<c012ca73>] cascade+0x79/0xa1
> [<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
> [<c012d0b3>] run_timer_softirq+0x159/0x1c9
> [<c012899d>] do_softirq+0xc9/0xcb
> [<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
> [<c0108c09>] default_idle+0x0/0x32
> [<c010bab2>] apic_timer_interrupt+0x1a/0x20
> [<c0108c09>] default_idle+0x0/0x32
> [<c0108c36>] default_idle+0x2d/0x32
> [<c0108cb4>] cpu_idle+0x3a/0x43
> [<c0105000>] rest_init+0x0/0x68
> [<c039c89f>] start_kernel+0x1b7/0x209
> [<c039c427>] unknown_bootoption+0x0/0x124
>
> Here is the result. I am doing a lot of IPv4 multicast.
Applied the patch, here is the result.
cascade: c040b170 != c040ab00
handler=c040b168 (0xc040b168)
Call Trace:
[<c012741f>] cascade+0x7f/0xb0
[<c0127a3e>] run_timer_softirq+0xee/0x170
[<c0123b15>] do_softirq+0xa5/0xb0
[<c010b625>] do_IRQ+0xe5/0x120
[<c0109a94>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c03b0829>] start_kernel+0x189/0x1e0
[<c03b0540>] unknown_bootoption+0x0/0x120
cascade: c040ab20 != c040ab00
handler=c040ab18 (0xc040ab18)
Call Trace:
[<c012741f>] cascade+0x7f/0xb0
[<c0127a3e>] run_timer_softirq+0xee/0x170
[<c0123b15>] do_softirq+0xa5/0xb0
[<c010b625>] do_IRQ+0xe5/0x120
[<c0109a94>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c03b0829>] start_kernel+0x189/0x1e0
[<c03b0540>] unknown_bootoption+0x0/0x120
--
Flávio Bruno Leitner <[email protected]>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
Another output with all debug options enabled.
cascade: c03b3128 != c03b28c0
kernel/timer.c:296: spin_lock(kernel/timer.c:c03b28c0) already locked by kernel/timer.c/401
handler=c03b3120 (0xc03b3120)
Call Trace:
[<c01347ef>] cascade+0x7f/0xb0
[<c0135025>] run_timer_softirq+0x315/0x3f0
[<c012fa35>] do_softirq+0xa5/0xb0
[<c010caea>] do_IRQ+0x21a/0x360
[<c012b5bf>] profile_hook+0x1f/0x23
[<c010a934>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c0434829>] start_kernel+0x189/0x1e0
[<c0434540>] unknown_bootoption+0x0/0x120
cascade: c03b2f88 != c03b28c0
handler=c03b2f80 (0xc03b2f80)
Call Trace:
[<c01347ef>] cascade+0x7f/0xb0
[<c0135025>] run_timer_softirq+0x315/0x3f0
[<c012fa35>] do_softirq+0xa5/0xb0
[<c010caea>] do_IRQ+0x21a/0x360
[<c012b5bf>] profile_hook+0x1f/0x23
[<c010a934>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c0434829>] start_kernel+0x189/0x1e0
[<c0434540>] unknown_bootoption+0x0/0x120
cascade: c03b2910 != c03b28c0
handler=c03b2908 (0xc03b2908)
Call Trace:
[<c01347ef>] cascade+0x7f/0xb0
[<c0135025>] run_timer_softirq+0x315/0x3f0
[<c012fa35>] do_softirq+0xa5/0xb0
[<c010caea>] do_IRQ+0x21a/0x360
[<c012b5bf>] profile_hook+0x1f/0x23
[<c010a934>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c0434829>] start_kernel+0x189/0x1e0
[<c0434540>] unknown_bootoption+0x0/0x120
--
Flávio Bruno Leitner <[email protected]>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
Flavio Bruno Leitner <[email protected]> wrote:
>
> cascade: c03b3128 != c03b28c0
> kernel/timer.c:296: spin_lock(kernel/timer.c:c03b28c0) already locked by kernel/timer.c/401
> handler=c03b3120 (0xc03b3120)
> Call Trace:
> [<c01347ef>] cascade+0x7f/0xb0
> [<c0135025>] run_timer_softirq+0x315/0x3f0
> [<c012fa35>] do_softirq+0xa5/0xb0
> [<c010caea>] do_IRQ+0x21a/0x360
> [<c012b5bf>] profile_hook+0x1f/0x23
> [<c010a934>] common_interrupt+0x18/0x20
> [<c0107066>] default_idle+0x26/0x40
> [<c01070f4>] cpu_idle+0x34/0x40
> [<c0434829>] start_kernel+0x189/0x1e0
> [<c0434540>] unknown_bootoption+0x0/0x120
Is the machine SMP?
What was the machine doing at the time?
Can you have a look in System.map, see if you can work out what's at
0xc03b3120?
It could be hardware, but it would be hardware negatively interacting
with the kernel preemption feature. The failure does not occur when
that feature is disabled.
Dave
-----Original Message-----
From: Andrew Morton [mailto:[email protected]]
Sent: Wednesday, March 31, 2004 2:16 PM
To: Craig, Dave
Cc: [email protected]; [email protected]
Subject: Re: kernel BUG at kernel/timer.c:370!
OK. We're consistently seeing a single-bit difference and there's no
simple power-of-two stride in the things which that pointer points at.
Most likely you have a hardware problem.
On Thu, Apr 01, 2004 at 10:37:18AM -0800, Andrew Morton wrote:
> Flavio Bruno Leitner <[email protected]> wrote:
> >
> > cascade: c03b3128 != c03b28c0
> > kernel/timer.c:296: spin_lock(kernel/timer.c:c03b28c0) already locked by kernel/timer.c/401
> > handler=c03b3120 (0xc03b3120)
> > Call Trace:
> > [<c01347ef>] cascade+0x7f/0xb0
> > [<c0135025>] run_timer_softirq+0x315/0x3f0
> > [<c012fa35>] do_softirq+0xa5/0xb0
> > [<c010caea>] do_IRQ+0x21a/0x360
> > [<c012b5bf>] profile_hook+0x1f/0x23
> > [<c010a934>] common_interrupt+0x18/0x20
> > [<c0107066>] default_idle+0x26/0x40
> > [<c01070f4>] cpu_idle+0x34/0x40
> > [<c0434829>] start_kernel+0x189/0x1e0
> > [<c0434540>] unknown_bootoption+0x0/0x120
>
> Is the machine SMP?
No, it's a simple Pentium II.
> What was the machine doing at the time?
I was running processes like postfix, pump, and ntpd. After you asked, I
tried to reproduce it in runlevel 1 (single), but so far I can't. The next
step will be to disable the services one by one until I can no longer
reproduce it.
>
> Can you have a look in System.map, see if you can work out what's at
> 0xc03b3120?
c03b3128 => Not found in System.map
c03b28c0 => per_cpu__tvec_bases
c03b3120 => Not found in System.map
--
Flávio Bruno Leitner <[email protected]>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
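An exact-match search of System.map only finds addresses that are the start of a symbol. An address inside an object is usually resolved by taking the last symbol whose address is not greater than the target. A minimal sketch of that lookup, assuming a table already sorted by address: only the per_cpu__tvec_bases entry comes from this thread, the other two entries are made-up placeholders, and a real System.map may well have other symbols in between.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym {
    uint32_t addr;
    const char *name;
};

/*
 * Binary search over a table sorted by ascending address.
 * Returns the last entry with addr <= target, or NULL if the
 * target lies below the first entry.
 */
static const struct sym *containing_symbol(const struct sym *tab, size_t n,
                                           uint32_t target)
{
    const struct sym *best = NULL;
    size_t lo = 0, hi = n;

    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= target) {
            best = &tab[mid];
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}

/* Toy table: the first and last entries are hypothetical placeholders. */
static const struct sym demo_map[] = {
    { 0xc0100000u, "demo_placeholder_low" },
    { 0xc03b28c0u, "per_cpu__tvec_bases" },
    { 0xc0500000u, "demo_placeholder_high" },
};
```

With this toy table, 0xc03b3120 resolves to per_cpu__tvec_bases+0x860; whether that holds on the real system depends on which symbols actually follow per_cpu__tvec_bases in System.map.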