2006-10-11 19:19:21

by Frank Sorenson

Subject: Kernel panic in 2.6.19-rc1

I'm getting kernel panics within a few minutes of boot with 2.6.19-rc1
(latest git) on x86_64. Other than "make oldconfig", it's an identical
configuration to a working kernel on 2.6.18.

The panic scrolls off the screen, but I copied down what was on the screen:

[<ffffffff8103e6d3>] blocking_notifier_call_chain+0x1b/0x41
[<ffffffff810332b3>] profile_task_exit+0x15/0x17
[<ffffffff81034d97>] do_exit+0x25/0x918
[<ffffffff8131b113>] sync_regs+0x0/0x72
[<ffffffff8131b767>] nmi_watchdog_tick+0xfe/0x1de
[<ffffffff8131b318>] default_do_nmi+0x83/0x1c8
[<ffffffff8131b86e>] do_nmi+0x27/0x36
[<ffffffff8131acff>] nmi+0x7f/0x90
[<ffffffff811838e0>] acpi_processor_idle+0x259/0x48d
<<EOE>> [<ffffffff8131cb55>] atomic_notifier_call_chain+0x3e/0x60
[<ffffffff81183687>] acpi_processor_idle+0x0/0x48d
[<ffffffff81008caa>] cpu_idle+0x8f/0xc6
[<ffffffff81018274>] start_secondary+0x44a/0x45a

Kernel panic - not syncing: Attempted to kill the idle task!
CPU 0
Modules linked in: sunrpc asus_acpi lp parport_pc parport nvram ohci1394
ieee1394 joydev uhci_hcd ehci_hcd bcm43xx i2c_i801 i2c_core
Pid: 0, comm: swapper Not tainted 2.6.19-rc1-fs2 #2
RIP: 0010: ffffffff811838e0 acpi_processor_idle+0x259/0x48d
RSP: 0x18:ffffffff8149ff18 EFLAGS: 00000092
RAX: 0000000000d58d95 RBX: 0000000000000002 RCX: 0000000000001008
RDX: 0000000000001016 RSI: 0000000000000013 RDI: 0000000000000001
RBP: ffffffff8149ff68 R08: ffff81007db42d00 R09: 000000007db42d60
R10: 000000000055cfe0 R11: 0000000000000246 R12: ffff81007db42d60
R13: 0000000000d58d95 R14: ffff81007db42c00 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff8148c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b2838a40000 CR3: 0000000060f52000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff8149e000, task ffffffff813d0420)
Stack: 0000000000000000 0000000000000001 ffffffff8149ff58 ffffffff8131cb55
0000000000000246 ffffffff81183687 0000000000000000 0000000000000000
0000000000000000 0000000000000000 ffffffff8149ff88 ffffffff81008caa
Call Trace:
[<ffffffff8131cb55>] atomic_notifier_call_chain+0x3e/0x60
[<ffffffff81183687>] acpi_processor_idle+0x0/0x48d
[<ffffffff81008caa>] cpu_idle+0x8f/0xc6
[<ffffffff8100703f>] rest_init+0x3f/0x41
[<ffffffff814a96c9>] start_kernel+0x21a/0x21c
[<ffffffff814a9156>] _sinittext+0x156/0x15d

Code: 80 ca ed ed 89 c3 41 f6 46 18 20 74 15 f0 ff 0d b4 79 3d 00
BUG: warning at drivers/char/vt.c:3395/do_unblank_screen()

Call Trace:
<NMI> [<ffffffff8100b3ab>] _show_stack+0xdb/0xea
[<ffffffff8119ca6a>] do_unblank_screen+0x5a/0x131
[<ffffffff8119cb4c>] unblank_screen+0xb/0xd
[<ffffffff81020ab2>] bust_spinlocks+0x24/0x50
[<ffffffff8131adea>] oops_end+0x1d/0x62
[<ffffffff8131b0fe>] die_nmi+0x73/0x88
[<ffffffff8131b767>] nmi_watchdog_tick+0xfe/0x1de
[<ffffffff8131b318>] default_do_nmi+0x83/0x1c8
[<ffffffff8131b86e>] do_nmi+0x27/0x36
[<ffffffff8131acff>] nmi+0x7f/0x90
[<ffffffff811838e0>] acpi_processor_idle+0x259/0x48d
<<EOE>> [<ffffffff8131cb55>] atomic_notifier_call_chain+0x3e/0x60
[<ffffffff81183687>] acpi_processor_idle+0x0/0x48d
[<ffffffff81008caa>] cpu_idle+0x8f/0xc6
[<ffffffff8100703f>] rest_init+0x3f/0x41
[<ffffffff814a96c9>] start_kernel+0x21a/0x21c
[<ffffffff814a9156>] _sinittext+0x156/0x15d



Thanks,

Frank


2006-10-12 07:06:50

by Andrew Morton

Subject: Re: Kernel panic in 2.6.19-rc1

On Wed, 11 Oct 2006 14:19:18 -0500
Frank Sorenson <[email protected]> wrote:

> I'm getting kernel panics within a few minutes of boot with 2.6.19-rc1
> (latest git) on x86_64. Other than "make oldconfig", it's an identical
> configuration to a working kernel on 2.6.18.
>
> The panic scrolls off the screen, but I copied down what was on the screen:

Can you get netconsole going? Documentation/networking/netconsole.txt.
It's pretty simple.

2006-10-12 19:13:55

by Frank Sorenson

Subject: Re: Kernel panic in 2.6.19-rc1

[ 0.000000] Linux version 2.6.19-rc1-fs2 ([email protected]) (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) #4 SMP PREEMPT Thu Oct 12 12:32:29 CDT 2006
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
[ 0.000000] BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000007fed3400 (usable)
[ 0.000000] BIOS-e820: 000000007fed3400 - 0000000080000000 (reserved)
[ 0.000000] BIOS-e820: 00000000e0000000 - 00000000f0007000 (reserved)
[ 0.000000] BIOS-e820: 00000000f0008000 - 00000000f000c000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[ 0.000000] BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
[ 0.000000] BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
[ 0.000000] BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
[ 0.000000] end_pfn_map = 1048576
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] DMA32 4096 -> 1048576
[ 0.000000] Normal 1048576 -> 1048576
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0 -> 159
[ 0.000000] 0: 256 -> 523987
[ 0.000000] mapped APIC to ffffffffff5fd000 ( fee00000)
[ 0.000000] mapped IOAPIC to ffffffffff5fc000 (00000000fec00000)
[ 0.000000] Nosave address range: 000000000009f000 - 00000000000a0000
[ 0.000000] Nosave address range: 00000000000a0000 - 0000000000100000
[ 0.000000] Built 1 zonelists. Total pages: 515695
[ 0.000000] Kernel command line: ro root=/dev/VolGroup00/RootVol vga=794 apic=verbose nmi_watchdog=1 notsc [email protected]/,@64.62.190.123/00:0F:66:99:97:4F loglevel=6 3
[ 0.000000] Initializing CPU#0
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[ 12.763844] Console: colour dummy device 80x25
[ 12.764869] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 12.765692] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 12.765859] Checking aperture...
[ 12.796709] Memory: 2052256k/2095948k available (3202k kernel code, 43304k reserved, 1458k data, 328k init)
[ 12.856726] Calibrating delay using timer specific routine.. 4330.64 BogoMIPS (lpj=2165320)
[ 12.856851] Mount-cache hash table entries: 256
[ 12.857041] using mwait in idle threads.
[ 12.876954] enabled ExtINT on CPU#0
[ 12.876958] ESR value after enabling vector: 00000000, after 00000040
[ 12.877155] ENABLING IO-APIC IRQs
[ 12.934048] result 10402908
[ 14.730182] Initializing CPU#1
[ 14.730379] masked ExtINT on CPU#1
[ 14.789672] Calibrating delay using timer specific routine.. 4326.95 BogoMIPS (lpj=2163476)
[ 14.790157] Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz stepping 06
[ 14.839363] migration_cost=31
[ 15.022656] ACPI: PCI Interrupt Link [LNKA] (IRQs 9 10 11) *4
[ 15.022947] ACPI: PCI Interrupt Link [LNKB] (IRQs *5 7)
[ 15.023228] ACPI: PCI Interrupt Link [LNKC] (IRQs *9 10 11)
[ 15.023510] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 7 9 10 11) *3
[ 15.023799] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
[ 15.024084] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
[ 15.024371] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
[ 15.024677] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *7 9 10 11 12 14 15)
[ 15.064728] intel_rng: FWH not detected
[ 15.064962] SCSI subsystem initialized
[ 15.065225]
[ 15.097552] IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 15.097735] TCP established hash table entries: 131072 (order: 10, 4194304 bytes)
[ 15.099548] TCP bind hash table entries: 65536 (order: 9, 2097152 bytes)
[ 15.102254] audit(1160678537.508:1): initialized
[ 15.102355] Total HugeTLB memory allocated, 0
[ 15.102494] VFS: Disk quotas dquot_6.5.1
[ 15.102518] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 15.102623] fuse init (API version 7.7)
[ 15.109060] assign_interrupt_mode Found MSI capability
[ 15.109218] assign_interrupt_mode Found MSI capability
[ 15.109426] assign_interrupt_mode Found MSI capability
[ 15.109649] assign_interrupt_mode Found MSI capability
[ 15.109981] pciehp: Cannot get control of hotplug hardware for pci 0000:00:1c.0
[ 15.110044] pciehp: Cannot get control of hotplug hardware for pci 0000:00:1c.1
[ 15.110106] pciehp: Cannot get control of hotplug hardware for pci 0000:00:1c.3
[ 15.152148] Console: switching to colour frame buffer device 160x64
[ 15.216724] ACPI (exconfig-0455): Dynamic SSDT Load - OemId [ PmRef] OemTableId [ Cpu0Ist] [20060707]
[ 15.217357] ACPI (exconfig-0455): Dynamic SSDT Load - OemId [ PmRef] OemTableId [ Cpu0Cst] [20060707]
[ 15.218528] ACPI (exconfig-0455): Dynamic SSDT Load - OemId [ PmRef] OemTableId [ Cpu1Ist] [20060707]
[ 15.219152] ACPI (exconfig-0455): Dynamic SSDT Load - OemId [ PmRef] OemTableId [ Cpu1Cst] [20060707]
[ 15.249464] RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
[ 18.760068] ide0: I/O resource 0x1F0-0x1F7 not free.
[ 18.771524] ide0: ports already in use, skipping probe
[ 17.049447] hdc: TSSTcorp DVD+/-RW TS-L632D, ATAPI CD/DVD-ROM drive
[ 17.090870] ide1 at 0x170-0x177,0x376 on irq 15
[ 17.106756] ide-floppy driver 0.99.newide
[ 17.118679] ata: 0x170 IDE port busy
[ 17.118680] ata: conflict with ide1
[ 17.149371] scsi 0:0:0:0: Direct-Access ATA Hitachi HTS72101 MCZO PQ: 0 ANSI: 5
[ 17.160143] SCSI device sda: 192426570 512-byte hdwr sectors (98522 MB)
[ 17.171027] sda: Write Protect is off
[ 17.182042] SCSI device sda: drive cache: write back
[ 17.192902] SCSI device sda: 192426570 512-byte hdwr sectors (98522 MB)
[ 17.203609] sda: Write Protect is off
[ 18.998025] SCSI device sda: drive cache: write back
[ 17.241954] sd 0:0:0:0: Attached scsi disk sda
[ 17.252306] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 17.407853] hda_intel: azx_get_response timeout, switching to polling mode...
[ 17.425053] Write protecting the kernel read-only data: 677k
[ 19.653029] audit(1160678546.743:2): policy loaded auid=4294967295
[ 79.097281] audit(1160678605.629:3): dev=eth0 prom=256 old_prom=0 auid=4294967295
[ 178.402790] do_IRQ: 0.65 No irq handler for vector
[ 183.545101] NMI Watchdog detected LOCKUP on CPU 0
[ 183.545121] CPU 0
[ 183.545133] Modules linked in: sunrpc asus_acpi lp parport_pc parport nvram i2c_i801 joydev i2c_core ohci1394 ieee1394 ehci_hcd uhci_hcd
[ 183.545202] Pid: 0, comm: swapper Not tainted 2.6.19-rc1-fs2 #4
[ 183.545211] RIP: 0010:[<ffffffff81183dbc>] [<ffffffff81183dbc>] acpi_processor_idle+0x259/0x48d
[ 183.545231] RSP: 0018:ffffffff814a1f18 EFLAGS: 00000097
[ 183.545238] RAX: 000000000052573a RBX: 0000000000000001 RCX: 0000000000001008
[ 183.545247] RDX: 0000000000001016 RSI: 0000000000000013 RDI: 0000000000000000
[ 183.545256] RBP: ffffffff814a1f68 R08: ffff81007db44d00 R09: 000000007db44d60
[ 183.545265] R10: 0000000000000000 R11: 0000000000000246 R12: ffff81007db44d60
[ 183.545273] R13: 000000000052573a R14: ffff81007db44c00 R15: 0000000000000000
[ 183.545283] FS: 0000000000000000(0000) GS:ffffffff8148e000(0000) knlGS:0000000000000000
[ 183.545292] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 183.545301] CR2: 00007fffe4584418 CR3: 0000000077d67000 CR4: 00000000000006e0
[ 183.545310] Process swapper (pid: 0, threadinfo ffffffff814a0000, task ffffffff813d1420)
[ 183.545318] Stack: Bootdata ok (command line is ro root=/dev/VolGroup00/RootVol vga=794 apic=verbose nmi_watchdog=1 notsc [email protected]/,@64.62.190.123/00:0F:66:99:97:4F)


Attachments:
netconsole-1.txt (23.08 kB)
netconsole-2.txt (22.43 kB)
netconsole-3.txt (7.57 kB)

2006-10-12 19:57:20

by Andrew Morton

Subject: Re: Kernel panic in 2.6.19-rc1

On Thu, 12 Oct 2006 14:13:27 -0500
Frank Sorenson <[email protected]> wrote:

> Andrew Morton wrote:
> > On Wed, 11 Oct 2006 14:19:18 -0500
> > Frank Sorenson <[email protected]> wrote:
> >
> >> I'm getting kernel panics within a few minutes of boot with 2.6.19-rc1
> >> (latest git) on x86_64. Other than "make oldconfig", it's an identical
> >> configuration to a working kernel on 2.6.18.
> >>
> >> The panic scrolls off the screen, but I copied down what was on the screen:
> >
> > Can you get netconsole going? Documentation/networking/netconsole.txt.
> > It's pretty simple.
>
> Three netconsole dumps attached. I hope they provide more information.
> Let me know if there's anything more I can provide.
>

hmm.


> [ 20.889846] warning: process `date' used the removed sysctl system call
> [ 143.574063] do_IRQ: 0.65 No irq handler for vector

This might be the cause. Please try the appended fix.

> [ 160.311799] NMI Watchdog detected LOCKUP on CPU 1
> [ 160.312107] CPU 1
> [ 160.312250] Modules linked in: sunrpc asus_acpi lp parport_pc parport nvram uhci_hcd joydev i2c_i801 ohci1394 ieee1394 ehci_hcd i2c_core
> [ 160.313252] Pid: 0, comm: swapper Not tainted 2.6.19-rc1-fs2 #4
> [ 160.313635] RIP: 0010:[<ffffffff81183dbc>] [<ffffffff81183dbc>] acpi_processor_idle+0x259/0x48d
> [ 160.314224] RSP: 0018:ffff810037e1be78 EFLAGS: 00000097
> [ 160.314566] RAX: 00000000009bd686 RBX: 0000000000000001 RCX: 0000000000001008
> [ 160.315026] RDX: 0000000000001016 RSI: 0000000000000013 RDI: 0000000000000000
> [ 160.315485] RBP: ffff810037e1bec8 R08: ffff81007db44900 R09: 000000007db44960
> [ 160.315945] R10: 00007fff89ad2cd0 R11: 0000000000000246 R12: ffff81007db44960
> [ 160.316403] R13: 00000000009bd686 R14: ffff81007db44800 R15: 0000000000000008
> [ 160.316864] FS: 0000000000000000(0000) GS:ffff81007debecc0(0000) knlGS:0000000000000000
> [ 160.317383] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 160.317755] CR2: 00002b6954757000 CR3: 0000000067fb5000 CR4: 00000000000006e0
> [ 160.318214] Process swapper (pid: 0, threadinfo ffff810037e1a000, task ffff810037e02100)
> [ 160.318733] Stack: 0000000000000000 0000000000000001 ffff810037e1beb8 ffffffff8131d205
> [ 160.319324] 00000000810088ad ffffffff81183b63 0000000000000001 0000000000000100
> [ 160.319859] ffffffff8148e300 0000000000000008 ffff810037e1bee8 ffffffff81008caa
> [ 160.320379] Call Trace:
> [ 160.320559] [<ffffffff8131d205>] atomic_notifier_call_chain+0x3e/0x60
> [ 160.320984] [<ffffffff81183b63>] acpi_processor_idle+0x0/0x48d
> [ 160.321367] [<ffffffff81008caa>] cpu_idle+0x8f/0xc6
> [ 160.321693] [<ffffffff81018274>] start_secondary+0x44a/0x45a

That's a strange way for it to have manifested.


commit 994bd4f9f5a065ead4a92435fdd928ac7fd33809
tree 11e5b123bd5c5319a65ad4732ad3965b815dedbb
parent c25d5180441e344a3368d100c57f0a481c6944f7
author Eric W. Biederman <[email protected]> 1160628286 -0600
committer Linus Torvalds <[email protected]> 1160663850 -0700

[PATCH] x86_64 irq: Properly update vector_irq

This patch fixes my one line thinko where I was clearing
the vector_irq entries on the wrong cpus.

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arch/x86_64/kernel/io_apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index c3cdcab..44b55f8 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -660,7 +660,7 @@ next:
 		}
 		if (old_vector >= 0) {
 			int old_cpu;
-			for_each_cpu_mask(old_cpu, domain)
+			for_each_cpu_mask(old_cpu, irq_domain[irq])
 				per_cpu(vector_irq, old_cpu)[old_vector] = -1;
 		}
 		for_each_cpu_mask(new_cpu, domain)
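
The thinko named in the commit message can be modelled in user space. What follows is a minimal sketch under stated assumptions, not kernel code: `vector_irq` is a plain two-dimensional array, CPU masks are bit flags, and the helper names (`write_slots`, `migrate_irq16`), the IRQ numbers, the CPU assignment and vector 73 are all invented for illustration. Only vector 65 is borrowed from the "do_IRQ: 0.65 No irq handler for vector" line, and whether this exact sequence is what happened on the reporter's machine is not established by the thread.

```c
#include <assert.h>

#define NR_CPUS    2
#define NR_VECTORS 256
#define NO_IRQ     (-1)

/* Toy stand-in for the kernel's per-CPU vector_irq[] tables. */
int vector_irq[NR_CPUS][NR_VECTORS];

/* Write `value` into slot `vector` on every CPU whose bit is set in `mask`. */
void write_slots(unsigned int mask, int vector, int value)
{
	assert(vector >= 0 && vector < NR_VECTORS);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			vector_irq[cpu][vector] = value;
}

/*
 * Hypothetical scenario: IRQ 16 lives on CPU 0 and IRQ 17 on CPU 1, both
 * at vector 65. IRQ 16 then migrates to CPU 1, vector 73.
 * use_fix == 0 clears the old vector on the NEW domain (the thinko);
 * use_fix != 0 clears it on the OLD domain (the patched behaviour).
 */
void migrate_irq16(int use_fix, int *cpu0_v65, int *cpu1_v65, int *cpu1_v73)
{
	const unsigned int old_domain = 1u << 0; /* where IRQ 16 used to live */
	const unsigned int new_domain = 1u << 1; /* where IRQ 16 is going */

	/* Start from empty tables, then set up the initial assignment. */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (int v = 0; v < NR_VECTORS; v++)
			vector_irq[cpu][v] = NO_IRQ;
	write_slots(old_domain, 65, 16);
	write_slots(new_domain, 65, 17);

	/* Release the old vector, on the wrong or the right set of CPUs. */
	write_slots(use_fix ? old_domain : new_domain, 65, NO_IRQ);

	/* Install the new vector on the new domain. */
	write_slots(new_domain, 73, 16);

	*cpu0_v65 = vector_irq[0][65];
	*cpu1_v65 = vector_irq[1][65];
	*cpu1_v73 = vector_irq[1][73];
}
```

Calling `migrate_irq16(0, ...)` leaves a stale entry for IRQ 16 on CPU 0 and wipes IRQ 17's slot on CPU 1, so in this model an interrupt arriving on CPU 1 at vector 65 would find no handler. Calling `migrate_irq16(1, ...)` clears only the slot IRQ 16 actually owned.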

2006-10-12 19:59:18

by Andrew Morton

Subject: Re: Kernel panic in 2.6.19-rc1

On Thu, 12 Oct 2006 15:50:24 -0400
"Doug Reiland" <[email protected]> wrote:

> FYI, I had to get CONFIG_SYSCTL_SYSCALL set to solve my 2.6.19-rc1 boot
> panic.

What boot panic was that?

> Actually, I couldn't get CONFIG_SYSCTL_SYSCALL=y to stick so I modified
> kernel/sysctl.c's ifdefs.

It depends on CONFIG_EMBEDDED.

2006-10-12 20:36:29

by Frank Sorenson

Subject: Re: Kernel panic in 2.6.19-rc1

Andrew Morton wrote:
> On Thu, 12 Oct 2006 14:13:27 -0500
> Frank Sorenson <[email protected]> wrote:
>
>> Andrew Morton wrote:
>>> On Wed, 11 Oct 2006 14:19:18 -0500
>>> Frank Sorenson <[email protected]> wrote:
>>>
>>>> I'm getting kernel panics within a few minutes of boot with 2.6.19-rc1
>>>> (latest git) on x86_64. Other than "make oldconfig", it's an identical
>>>> configuration to a working kernel on 2.6.18.
>>>>
>>>> The panic scrolls off the screen, but I copied down what was on the screen:
>>> Can you get netconsole going? Documentation/networking/netconsole.txt.
>>> It's pretty simple.
>> Three netconsole dumps attached. I hope they provide more information.
>> Let me know if there's anything more I can provide.
>>
>
> hmm.
>
>
>> [ 20.889846] warning: process `date' used the removed sysctl system call
>> [ 143.574063] do_IRQ: 0.65 No irq handler for vector
>
> This might be the cause. Please try the appended fix.

This patch seems to fix the problem, and since it's already gone into
the kernel, the latest git tree works without modification.

Thanks for the quick response,

Frank

2006-10-12 20:57:52

by Doug Reiland

Subject: Re: Kernel panic in 2.6.19-rc1

I had similar problems moving from 2.6.18 to 2.6.19-rc1. It might
have something to do with me using a new kernel on an old
distribution, but things like INIT need sysctl.

I had a hard time getting CONFIG_SYSCTL_SYSCALL=y to stick, so I changed
the kernel/sysctl.c ifdefs. You might try that. It looks like you are
using x86_64, so double-check for usage of that define under arch/x86_64.
I thought I saw it under the 32-bit emulation stuff.

It boots just fine after this.

2006-11-17 14:44:45

by Doug Reiland

Subject: Re: Kernel panic in 2.6.19-rc1

Andrew Morton, sorry I missed your reply.

To recap, I said:
FYI, I had to get CONFIG_SYSCTL_SYSCALL set to solve my
2.6.19-rc1 boot panic.
Actually, I couldn't get CONFIG_SYSCTL_SYSCALL=y to stick so I
modified kernel/sysctl.c's ifdefs.

You said:
What boot panic was that?
It depends on CONFIG_EMBEDDED.

The panic was because init died. I got an error message about an unknown
library version (I can't recall the exact message) and then the panic.

Again, I am running a new 2.6.x kernel on an old distribution, so my
init binary or run-time loader might still depend on SYSCTL.

I am now playing with an x86_64 kernel and saw this same problem. Your
CONFIG_EMBEDDED hint helped: with that set, CONFIG_SYSCTL_SYSCALL
stays on.

Thanks again and sorry for not attaching this to the original email thread.