2009-09-21 19:21:20

by Tony Vroon

Subject: 2.6.31-07068-g43c1266 Early boot exception

Good evening,

Current linus-2.6 no longer boots on my workstation, config attached.

Early exception transcribed (1.4MB JPEG available as well, probably too
big to attach here though):
IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x10de8201 base: 0xfed00000
SMP: allowing 12 CPUs, 0 hotplug CPUs
Allocating PCI resources starting at b00000000 (gap:b00000000:3000000
NR_CPUS: 12 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:2
PERCPU: Embedded 23 pages/cpu @ffff880028200000 s73688 r0 d20520 u262
PANIC: early exception 06 rip 10:ffffffff815a50e6 error 0 cr2 d0fff6
Pid: 0, comm: swapper Not tainted 2.6.31-07068-g43c266 #1
Call Trace:
[<ffffffff81591175>] early_idt_handler+0x55/0x68
[<ffffffff815a50e6>] ? pcpu_setup_first_chunk+0x40b/0x692
[<ffffffff815a50ff>] ? pcpu_setup_first_chunk+0x424/0x692
[<ffffffff815a5867>] pcpu_embed_first_chunk+0x1e2/0x241
[<ffffffff81598c14>] ? pcpu_fc_alloc+0x0/0xac
[<ffffffff81598bf5>] ? pcpu_fc_free+0x0/0x1f
[<ffffffff81598a36>] setup_per_cpu_areas+0x65/0x219
[<ffffffff815919bf>] start_kernel+0x124/0x2c5
[<ffffffff81591252>] x86_64_start_reservations+0x82/0x86
[<ffffffff8159133a>] x86_64_start_kernel+0xe4/0xeb
RIP pcpu_setup_first_chunk+0x40b/0x692

This is a Tyan S2932-SI mainboard on BIOS:
Vendor: American Megatrends Inc.
Version: 'V2.05 '
Release Date: 06/04/2009

This is a dual hex-core system.

Last known working kernel is:
Linux prometheus 2.6.31-03123-g99bc470 #1 SMP Mon Sep 14 22:53:06 BST
2009 x86_64 Six-Core AMD Opteron(tm) Processor 2435 AuthenticAMD
GNU/Linux
(dmesg for normal operation in g99bc470 kernel attached)

Happy to provide any further information that you may require.

Regards,
Tony V.


Attachments:
.config (49.52 kB)
dmesg-g99bc470.txt (43.88 kB)
signature.asc (198.00 B)
This is a digitally signed message part

2009-09-21 19:26:21

by H. Peter Anvin

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

Tony Vroon wrote:
>
> Last known working kernel is:
> Linux prometheus 2.6.31-03123-g99bc470 #1 SMP Mon Sep 14 22:53:06 BST
> 2009 x86_64 Six-Core AMD Opteron(tm) Processor 2435 AuthenticAMD
> GNU/Linux
> (dmesg for normal operation in g99bc470 kernel attached)
>
> Happy to provide any further information that you may require.
>

Could you please do a git bisect on this problem?

-hpa

2009-09-21 19:34:21

by Tony Vroon

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

On Mon, 2009-09-21 at 12:26 -0700, H. Peter Anvin wrote:
> Could you please do a git bisect on this problem?

Would love to, but how do I convert g99bc470 to a correct git revision
number please?

chainsaw@prometheus /cvs/linux-2.6 $ git bisect start
chainsaw@prometheus /cvs/linux-2.6 $ git bisect bad
chainsaw@prometheus /cvs/linux-2.6 $ git bisect good g99bc470
fatal: Needed a single revision
Bad rev input: g99bc470

> -hpa

Regards,
Tony V.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2009-09-21 19:36:51

by Ingo Molnar

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception


* Tony Vroon <[email protected]> wrote:

> On Mon, 2009-09-21 at 12:26 -0700, H. Peter Anvin wrote:
> > Could you please do a git bisect on this problem?
>
> Would love to, but how do I convert g99bc470 to a correct git revision
> number please?
>
> chainsaw@prometheus /cvs/linux-2.6 $ git bisect start
> chainsaw@prometheus /cvs/linux-2.6 $ git bisect bad
> chainsaw@prometheus /cvs/linux-2.6 $ git bisect good g99bc470
> fatal: Needed a single revision
> Bad rev input: g99bc470

strip the leading g. (which stands for Git)

i.e. 99bc470.
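
For readers following along: the suffix of a version string such as
2.6.31-03123-g99bc470 follows the git describe convention, where the
text after "-g" is an abbreviated commit hash. A minimal sketch of
pulling it out with plain shell parameter expansion (the function name
abbrev_from_version is made up for illustration, it is not a git
command):

```shell
#!/bin/sh
# Hypothetical helper: print the abbreviated commit hash embedded in a
# git-describe style version string (the text after the last "-g").
abbrev_from_version() {
    case "$1" in
        *-g*) printf '%s\n' "${1##*-g}" ;;   # full version string
        g*)   printf '%s\n' "${1#g}" ;;      # bare "g<hash>" form
        *)    printf '%s\n' "$1" ;;          # already a plain hash
    esac
}

abbrev_from_version 2.6.31-03123-g99bc470   # prints 99bc470
abbrev_from_version g99bc470                # prints 99bc470
```

The printed value is what "git bisect good" expects as a revision.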

Ingo

2009-09-21 19:53:31

by Henk Martijn

Subject: 2.6.31-0.1-default-07068-g43c1266 lockdep warning and scheduling while atomic BUG

This is an i7-based Dell, and current git throws the following warning and bug at me during boot:

config attached.

[ 13.905024] PM: Adding info for No Bus:hidraw1
[ 13.905118] generic-usb 0003:0D62:2106.0002: input,hidraw1: USB HID v1.10 Keyboard [USB Multimedia Keyboard] on usb-0000:00:1d.0-2/input0
[ 13.942228] driver: '0003:0D62:2106.0002': driver_bound: bound to device 'generic-usb'
[ 13.942233] bus: 'hid': really_probe: bound device 0003:0D62:2106.0002 to driver generic-usb
[ 13.942301] driver: '4-2:1.0': driver_bound: bound to device 'usbhid'
[ 13.942306] bus: 'usb': really_probe: bound device 4-2:1.0 to driver usbhid
[ 13.942310] bus: 'usb': driver_probe_device: matched device 4-2:1.1 with driver usbhid
[ 13.942312] bus: 'usb': really_probe: probing driver usbhid with device 4-2:1.1
[ 13.942396] device: '0003:0D62:2106.0003': device_add
[ 13.942422] bus: 'hid': add device 0003:0D62:2106.0003
[ 13.942485] PM: Adding info for hid:0003:0D62:2106.0003
[ 13.942546] bus: 'hid': driver_probe_device: matched device 0003:0D62:2106.0003 with driver generic-usb
[ 13.942548] bus: 'hid': really_probe: probing driver generic-usb with device 0003:0D62:2106.0003
[ 13.978016] ------------[ cut here ]------------
[ 14.014518] WARNING: at kernel/lockdep.c:2457 lockdep_trace_alloc+0x96/0xc3()
[ 14.051564] Hardware name: Studio XPS 435MT
[ 14.088517] Modules linked in: usbhid(+) usb_libusual dcdbas ehci_hcd uhci_hcd sd_mod usbcore edd ext3 mbcache jbd fan ata_generic ata_piix thermal processor
[ 14.127811] Pid: 0, comm: swapper Not tainted 2.6.31-0.1-default-07068-g43c1266 #25
[ 14.166977] Call Trace:
[ 14.205599] <IRQ> [<ffffffff81063570>] ? lockdep_trace_alloc+0x96/0xc3
[ 14.244469] [<ffffffff8103e227>] warn_slowpath_common+0x77/0xa4
[ 14.282978] [<ffffffff8103e263>] warn_slowpath_null+0xf/0x11
[ 14.320837] [<ffffffff81063570>] lockdep_trace_alloc+0x96/0xc3
[ 14.358269] [<ffffffff810c9af3>] kmem_cache_alloc+0x31/0x12f
[ 14.395248] [<ffffffff81265e62>] ? hid_input_report+0x8e/0x2d4
[ 14.431762] [<ffffffff81265e62>] hid_input_report+0x8e/0x2d4
[ 14.467764] [<ffffffffa00f4fe6>] hid_ctrl+0xaa/0x180 [usbhid]
[ 14.503265] [<ffffffffa0087c83>] usb_hcd_giveback_urb+0x84/0xbb [usbcore]
[ 14.538766] [<ffffffffa00c28c8>] uhci_giveback_urb+0x114/0x257 [uhci_hcd]
[ 14.573767] [<ffffffff810c2904>] ? dma_pool_free+0x1ce/0x1da
[ 14.608146] [<ffffffffa00c32b5>] uhci_scan_schedule+0x5ad/0x868 [uhci_hcd]
[ 14.642116] [<ffffffffa00c5329>] uhci_irq+0x134/0x14e [uhci_hcd]
[ 14.675410] [<ffffffffa008764c>] usb_hcd_irq+0x38/0x93 [usbcore]
[ 14.708595] [<ffffffff81084832>] handle_IRQ_event+0x54/0x12d
[ 14.741681] [<ffffffff81086329>] handle_fasteoi_irq+0x8b/0xcb
[ 14.774501] [<ffffffff8100dacf>] handle_irq+0x89/0x92
[ 14.806831] [<ffffffff8100cc54>] do_IRQ+0x5a/0xba
[ 14.838698] [<ffffffff8100b993>] ret_from_intr+0x0/0xf
[ 14.870502] <EOI> [<ffffffff811c7b0d>] ? acpi_hw_validate_io_request+0x71/0x153
[ 14.902707] [<ffffffffa00030ff>] ? acpi_idle_enter_simple+0x12d/0x15b [processor]
[ 14.935371] [<ffffffffa00030f5>] ? acpi_idle_enter_simple+0x123/0x15b [processor]
[ 14.967892] [<ffffffffa0002e0e>] ? acpi_idle_enter_bm+0xd3/0x297 [processor]
[ 15.000367] [<ffffffff81336afd>] ? __atomic_notifier_call_chain+0x78/0x87
[ 15.032797] [<ffffffff8126286f>] ? cpuidle_idle_call+0x93/0xf0
[ 15.065101] [<ffffffff8100a30c>] ? cpu_idle+0x89/0xca
[ 15.097244] [<ffffffff8132e13a>] ? start_secondary+0x290/0x2db
[ 15.129253] ---[ end trace 427177f6fb7d2a3d ]---
[ 15.161208] BUG: scheduling while atomic: swapper/0/0x10010000
[ 15.193419] INFO: lockdep is turned off.
[ 15.225415] Modules linked in: usbhid(+) usb_libusual dcdbas ehci_hcd uhci_hcd sd_mod usbcore edd ext3 mbcache jbd fan ata_generic ata_piix thermal processor
[ 15.259383] irq event stamp: 100508
[ 15.292529] hardirqs last enabled at (100507): [<ffffffffa00030f5>] acpi_idle_enter_simple+0x123/0x15b [processor]
[ 15.326939] hardirqs last disabled at (100508): [<ffffffff8100ade7>] save_args+0x67/0x70
[ 15.361106] softirqs last enabled at (100496): [<ffffffff810440ce>] __do_softirq+0x1a3/0x1b5
[ 15.395335] softirqs last disabled at (100481): [<ffffffff8100c17c>] call_softirq+0x1c/0x34
[ 15.429297] CPU 3:
[ 15.462587] Modules linked in: usbhid(+) usb_libusual dcdbas ehci_hcd uhci_hcd sd_mod usbcore edd ext3 mbcache jbd fan ata_generic ata_piix thermal processor
[ 15.497904] Pid: 0, comm: swapper Tainted: G W 2.6.31-0.1-default-07068-g43c1266 #25 Studio XPS 435MT
[ 15.533540] RIP: 0010:[<ffffffffa00030ff>] [<ffffffffa00030ff>] acpi_idle_enter_simple+0x12d/0x15b [processor]
[ 15.569867] RSP: 0018:ffff8800be6bde28 EFLAGS: 00000206
[ 15.606233] RAX: ffff8800be6bdfd8 RBX: ffff8800be6bde68 RCX: 00000000000f4240
[ 15.643177] RDX: ffff8800068c0000 RSI: 0000000000000004 RDI: ffff8800be6c8000
[ 15.680292] RBP: ffffffff8100b98e R08: ffff8800b9ce2090 R09: 0000000000000000
[ 15.717613] R10: ffffffff811c7b0d R11: ffff8800be6bde08 R12: ffffffff8152eed0
[ 15.754834] R13: 0000000000000086 R14: 0000000000000046 R15: ffff8800be6bddb8
[ 15.792046] FS: 0000000000000000(0000) GS:ffff8800068c0000(0000) knlGS:0000000000000000
[ 15.828930] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 15.865108] CR2: 00000000023e2c78 CR3: 00000000b98bc000 CR4: 00000000000006e0
[ 15.901501] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 15.937058] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 15.971439] Call Trace:
[ 16.004630] [<ffffffffa00030f5>] ? acpi_idle_enter_simple+0x123/0x15b [processor]
[ 16.037876] [<ffffffffa0002e0e>] ? acpi_idle_enter_bm+0xd3/0x297 [processor]
[ 16.070344] [<ffffffff81336afd>] ? __atomic_notifier_call_chain+0x78/0x87
[ 16.102634] [<ffffffff8126286f>] ? cpuidle_idle_call+0x93/0xf0
[ 16.134673] [<ffffffff8100a30c>] ? cpu_idle+0x89/0xca
[ 16.165749] [<ffffffff8132e13a>] ? start_secondary+0x290/0x2db

I can provide more information or test patches if needed,

Regards,

Henk


Attachments:
config.gz (12.75 kB)

2009-09-21 20:03:11

by Oliver Neukum

Subject: Re: 2.6.31-0.1-default-07068-g43c1266 lockdep warning and scheduling while atomic BUG

Am Montag, 21. September 2009 21:53:28 schrieb Henk Martijn:
> This is an i7-based Dell, and current git throws the following warning and
> bug at me during boot:

Please try this patch

Regards
Oliver

--

commit ca5c4a1397d1a1c0d1074f4d8922630fdd732780
Author: Oliver Neukum <[email protected]>
Date: Mon Sep 21 22:02:01 2009 +0200

hid:usbhid: fix wrong use of GFP_KERNEL

hid_input_report() must be told it is called in interrupt context

diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
index 1b0e07a..ab2869d 100644
--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -455,7 +455,7 @@ static void hid_ctrl(struct urb *urb)
if (usbhid->ctrl[usbhid->ctrltail].dir == USB_DIR_IN)
hid_input_report(urb->context,
usbhid->ctrl[usbhid->ctrltail].report->type,
- urb->transfer_buffer, urb->actual_length, 0);
+ urb->transfer_buffer, urb->actual_length, 1);
break;
case -ESHUTDOWN: /* unplug */
unplug = 1;

2009-09-21 20:17:29

by Henk Martijn

Subject: Re: 2.6.31-0.1-default-07068-g43c1266 lockdep warning and scheduling while atomic BUG

Yes that worked, Thanks!

/Henk

Oliver Neukum wrote:
> Am Montag, 21. September 2009 21:53:28 schrieb Henk Martijn:
>> This is an i7-based Dell, and current git throws the following warning and
>> bug at me during boot:
>
> Please try this patch
>
> Regards
> Oliver
>
> --
>
> commit ca5c4a1397d1a1c0d1074f4d8922630fdd732780
> Author: Oliver Neukum <[email protected]>
> Date: Mon Sep 21 22:02:01 2009 +0200
>
> hid:usbhid: fix wrong use of GFP_KERNEL
>
> hid_input_report() must be told it is called in interrupt context
>
> diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
> index 1b0e07a..ab2869d 100644
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -455,7 +455,7 @@ static void hid_ctrl(struct urb *urb)
> if (usbhid->ctrl[usbhid->ctrltail].dir == USB_DIR_IN)
> hid_input_report(urb->context,
> usbhid->ctrl[usbhid->ctrltail].report->type,
> - urb->transfer_buffer, urb->actual_length, 0);
> + urb->transfer_buffer, urb->actual_length, 1);
> break;
> case -ESHUTDOWN: /* unplug */
> unplug = 1;
>
>

2009-09-21 20:46:09

by Tony Vroon

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

On Mon, 2009-09-21 at 12:26 -0700, H. Peter Anvin wrote:
> Could you please do a git bisect on this problem?

On Mon, 2009-09-21 at 21:36 +0200, Ingo Molnar wrote:
> strip the leading g. (which stands for Git)

Thanks Ingo. 11-step bisection reveals:

chainsaw@prometheus /cvs/linux-2.6 $ git bisect bad
fd1e8a1fe2b54df6c185b4fa65f181f50b9c4d4e is the first bad commit
commit fd1e8a1fe2b54df6c185b4fa65f181f50b9c4d4e
Author: Tejun Heo <[email protected]>
Date: Fri Aug 14 15:00:51 2009 +0900

percpu: introduce pcpu_alloc_info and pcpu_group_info
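
As a rough sanity check on the step count: bisection halves the
candidate range each round, so the number of steps is about log2 of
the number of commits between the good and bad revisions. The sketch
below assumes the five-digit fields in the two version strings (03123
and 07068) are commit counts since the v2.6.31 tag, and the helper name
bisect_steps is made up for illustration. Real git bisect runs can
differ by a step or so depending on the shape of the history.

```shell
#!/bin/sh
# Rough estimate of bisect steps: floor(log2(n)) for n candidate commits.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$((n / 2))
        steps=$((steps + 1))
    done
    printf '%s\n' "$steps"
}

# Commit counts read off the version strings, written without leading
# zeros so the shell does not treat them as octal.
good=3123
bad=7068
bisect_steps $((bad - good))   # 3945 commits, prints 11
```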

I will repeat the early exception here:
IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x10de8201 base: 0xfed00000
SMP: allowing 12 CPUs, 0 hotplug CPUs
Allocating PCI resources starting at b00000000 (gap:b00000000:3000000
NR_CPUS: 12 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:2
PERCPU: Embedded 23 pages/cpu @ffff880028200000 s73688 r0 d20520 u262
PANIC: early exception 06 rip 10:ffffffff815a50e6 error 0 cr2 d0fff6
Pid: 0, comm: swapper Not tainted 2.6.31-07068-g43c266 #1
Call Trace:
[<ffffffff81591175>] early_idt_handler+0x55/0x68
[<ffffffff815a50e6>] ? pcpu_setup_first_chunk+0x40b/0x692
[<ffffffff815a50ff>] ? pcpu_setup_first_chunk+0x424/0x692
[<ffffffff815a5867>] pcpu_embed_first_chunk+0x1e2/0x241
[<ffffffff81598c14>] ? pcpu_fc_alloc+0x0/0xac
[<ffffffff81598bf5>] ? pcpu_fc_free+0x0/0x1f
[<ffffffff81598a36>] setup_per_cpu_areas+0x65/0x219
[<ffffffff815919bf>] start_kernel+0x124/0x2c5
[<ffffffff81591252>] x86_64_start_reservations+0x82/0x86
[<ffffffff8159133a>] x86_64_start_kernel+0xe4/0xeb
RIP pcpu_setup_first_chunk+0x40b/0x692

Config attached.

Regards,
Tony V.


Attachments:
config.txt (49.52 kB)
signature.asc (198.00 B)
This is a digitally signed message part

2009-09-21 21:14:17

by H. Peter Anvin

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

Tony Vroon wrote:
> On Mon, 2009-09-21 at 12:26 -0700, H. Peter Anvin wrote:
>> Could you please do a git bisect on this problem?
>
> Would love to, but how do I convert g99bc470 to a correct git revision
> number please?
>
> chainsaw@prometheus /cvs/linux-2.6 $ git bisect start
> chainsaw@prometheus /cvs/linux-2.6 $ git bisect bad
> chainsaw@prometheus /cvs/linux-2.6 $ git bisect good g99bc470
> fatal: Needed a single revision
> Bad rev input: g99bc470
>

Drop the leading "g" (which stands for git).

-hpa

2009-09-22 09:08:59

by Tejun Heo

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

Tony Vroon wrote:
> Thanks Ingo. 11-step bisection reveals:
>
> chainsaw@prometheus /cvs/linux-2.6 $ git bisect bad
> fd1e8a1fe2b54df6c185b4fa65f181f50b9c4d4e is the first bad commit
> commit fd1e8a1fe2b54df6c185b4fa65f181f50b9c4d4e
> Author: Tejun Heo <[email protected]>
> Date: Fri Aug 14 15:00:51 2009 +0900
>
> percpu: introduce pcpu_alloc_info and pcpu_group_info
>
> I will repeat the early exception here:
> IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
> Using ACPI (MADT) for SMP configuration information
> ACPI: HPET id: 0x10de8201 base: 0xfed00000
> SMP: allowing 12 CPUs, 0 hotplug CPUs
> Allocating PCI resources starting at b00000000 (gap:b00000000:3000000
> NR_CPUS: 12 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:2
> PERCPU: Embedded 23 pages/cpu @ffff880028200000 s73688 r0 d20520 u262
> PANIC: early exception 06 rip 10:ffffffff815a50e6 error 0 cr2 d0fff6
> Pid: 0, comm: swapper Not tainted 2.6.31-07068-g43c266 #1
> Call Trace:
> [<ffffffff81591175>] early_idt_handler+0x55/0x68
> [<ffffffff815a50e6>] ? pcpu_setup_first_chunk+0x40b/0x692
> [<ffffffff815a50ff>] ? pcpu_setup_first_chunk+0x424/0x692
> [<ffffffff815a5867>] pcpu_embed_first_chunk+0x1e2/0x241
> [<ffffffff81598c14>] ? pcpu_fc_alloc+0x0/0xac
> [<ffffffff81598bf5>] ? pcpu_fc_free+0x0/0x1f
> [<ffffffff81598a36>] setup_per_cpu_areas+0x65/0x219
> [<ffffffff815919bf>] start_kernel+0x124/0x2c5
> [<ffffffff81591252>] x86_64_start_reservations+0x82/0x86
> [<ffffffff8159133a>] x86_64_start_kernel+0xe4/0xeb
> RIP pcpu_setup_first_chunk+0x40b/0x692

Can you please try the patch below? Also, please build the kernel
with debug info and ask gdb which line the crash corresponds to,
i.e. l *pcpu_embed_first_chunk+0x1e2

Thanks.

diff --git a/mm/percpu.c b/mm/percpu.c
index 43d8cac..66b7d5e 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1812,6 +1812,9 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, ssize_t dyn_size,
/* allocate space for the whole group */
ptr = alloc_fn(cpu, gi->nr_units * ai->unit_size, atom_size);
if (!ptr) {
+ printk("PERCPU: failed to allocate %zu bytes for "
+ "group %d from cpu%u\n",
+ gi->nr_units * ai->unit_size, group, cpu);
rc = -ENOMEM;
goto out_free_areas;
}
@@ -1844,8 +1847,9 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, ssize_t dyn_size,

out_free_areas:
for (group = 0; group < ai->nr_groups; group++)
- free_fn(areas[group],
- ai->groups[group].nr_units * ai->unit_size);
+ if (areas[group])
+ free_fn(areas[group],
+ ai->groups[group].nr_units * ai->unit_size);
out_free:
pcpu_free_alloc_info(ai);
if (areas)
@@ -1956,7 +1960,8 @@ int __init pcpu_page_first_chunk(size_t reserved_size,

enomem:
while (--j >= 0)
- free_fn(page_address(pages[j]), PAGE_SIZE);
+ if (pages[j])
+ free_fn(page_address(pages[j]), PAGE_SIZE);
rc = -ENOMEM;
out_free_ar:
free_bootmem(__pa(pages), pages_size);


--
tejun

2009-09-22 09:23:41

by Arjan van de Ven

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

On Tue, 22 Sep 2009 18:08:39 +0900
Tejun Heo <[email protected]> wrote:

> Tony Vroon wrote:
> > Thanks Ingo. 11-step bisection reveals:
> >
> > chainsaw@prometheus /cvs/linux-2.6 $ git bisect bad
> > fd1e8a1fe2b54df6c185b4fa65f181f50b9c4d4e is the first bad commit
> > commit fd1e8a1fe2b54df6c185b4fa65f181f50b9c4d4e
> > Author: Tejun Heo <[email protected]>
> > Date: Fri Aug 14 15:00:51 2009 +0900
> >
> > percpu: introduce pcpu_alloc_info and pcpu_group_info
> >
> > I will repeat the early exception here:
> > IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
> > Using ACPI (MADT) for SMP configuration information
> > ACPI: HPET id: 0x10de8201 base: 0xfed00000
> > SMP: allowing 12 CPUs, 0 hotplug CPUs
> > Allocating PCI resources starting at b00000000 (gap:b00000000:3000000
> > NR_CPUS: 12 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:2
> > PERCPU: Embedded 23 pages/cpu @ffff880028200000 s73688 r0 d20520 u262
> > PANIC: early exception 06 rip 10:ffffffff815a50e6 error 0 cr2 d0fff6
> > Pid: 0, comm: swapper Not tainted 2.6.31-07068-g43c266 #1
> > Call Trace:
> > [<ffffffff81591175>] early_idt_handler+0x55/0x68
> > [<ffffffff815a50e6>] ? pcpu_setup_first_chunk+0x40b/0x692
> > [<ffffffff815a50ff>] ? pcpu_setup_first_chunk+0x424/0x692
> > [<ffffffff815a5867>] pcpu_embed_first_chunk+0x1e2/0x241
> > [<ffffffff81598c14>] ? pcpu_fc_alloc+0x0/0xac
> > [<ffffffff81598bf5>] ? pcpu_fc_free+0x0/0x1f
> > [<ffffffff81598a36>] setup_per_cpu_areas+0x65/0x219
> > [<ffffffff815919bf>] start_kernel+0x124/0x2c5
> > [<ffffffff81591252>] x86_64_start_reservations+0x82/0x86
> > [<ffffffff8159133a>] x86_64_start_kernel+0xe4/0xeb
> > RIP pcpu_setup_first_chunk+0x40b/0x692
>
> Can you please try the patch below? Also, please build the kernel
> with debug info and ask gdb which line the crash corresponds to?
> ie. l *pcpu_embed_first_chunk+0x1e2

if you build with debuginfo.. run
perl scripts/markup_oops.pl on your dmesg

that gives you not just the line, but the whole context of the crash.



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-22 11:40:29

by Tejun Heo

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

Arjan van de Ven wrote:
> if you build with debuginfo.. run
> perl scripts/markup_oops.pl on your dmesg
>
> that gives you not just the line, but the whole context of the crash.

Darn, I knew there had to be something like that. That's a pretty nice
script. Thanks for the tip. :-)

--
tejun

2009-09-22 19:34:56

by Tony Vroon

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

On Tue, 2009-09-22 at 18:08 +0900, Tejun Heo wrote:
> Can you please try the patch below? Also, please build the kernel
> with debug info and ask gdb which line the crash corresponds to?
> ie. l *pcpu_embed_first_chunk+0x1e2

Patch applied, no additional output visible:
IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x10de8201 base: 0xfed00000
SMP: allowing 12 CPUs, 0 hotplug CPUs
Allocating PCI resources starting at b00000000 (gap:b0000000:30000000)
NR_CPUS: 12 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:2
PERCPU: Embedded 23 pages/cpu @ffff880028200000 s73688 r0 d20520 u26214
PANIC: early exception 06 rip 10:ffffffff81598250 error 0 cr2 d0fff6
Pid: 0, comm: swapper Not tainted 2.6.31-07068-g43c266-dirty #1
Call Trace:
[<ffffffff815719b5>] early_idt_handler+0x55/0x68
[<ffffffff81587259>] ? pcpu_setup_first_chunk+0x40b/0x703
[<ffffffff81587273>] ? pcpu_setup_first_chunk+0x427/0x703
[<ffffffff81587a78>] pcpu_embed_first_chunk+0x1fd/0x261
[<ffffffff8158a927>] ? pcpu_fc_alloc+0x0/0xac
[<ffffffff8157a908>] ? pcpu_fc_free+0x0/0x1f
[<ffffffff8157a749>] setup_per_cpu_areas+0x65/0x219
[<ffffffff815721ff>] start_kernel+0x124/0x2c5
[<ffffffff81571a92>] x86_64_start_reservations+0x82/0x86
[<ffffffff81571b7a>] x86_64_start_kernel+0xe4/0xeb
RIP pcpu_setup_first_chunk+0x40d/0x703

gdb output:

(gdb) l *pcpu_embed_first_chunk+0x1fd
0xffffffff81587a78 is in pcpu_embed_first_chunk (mm/percpu.c:1845).
1840
1841 pr_info("PERCPU: Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n",
1842 PFN_DOWN(size_sum), base, ai->static_size, ai->reserved_size,
1843 ai->dyn_size, ai->unit_size);
1844
1845 rc = pcpu_setup_first_chunk(ai, base);
1846 goto out_free;
1847
1848 out_free_areas:
1849 for (group = 0; group < ai->nr_groups; group++)

Regards,
Tony V.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2009-09-23 05:53:22

by Tejun Heo

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

Hello,

Tony Vroon wrote:
> On Tue, 2009-09-22 at 18:08 +0900, Tejun Heo wrote:
>> Can you please try the patch below? Also, please build the kernel
>> with debug info and ask gdb which line the crash corresponds to?
>> ie. l *pcpu_embed_first_chunk+0x1e2
>
> Patch applied, no additional output visible:
> IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
> Using ACPI (MADT) for SMP configuration information
> ACPI: HPET id: 0x10de8201 base: 0xfed00000
> SMP: allowing 12 CPUs, 0 hotplug CPUs
> Allocating PCI resources starting at b00000000 (gap:b0000000:30000000)
> NR_CPUS: 12 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:2
> PERCPU: Embedded 23 pages/cpu @ffff880028200000 s73688 r0 d20520 u26214
> PANIC: early exception 06 rip 10:ffffffff81598250 error 0 cr2 d0fff6

Can you please do l on the above rip - ffffffff81598250?

Thanks.

--
tejun

2009-09-23 09:20:19

by Tony Vroon

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

On Wed, 2009-09-23 at 14:52 +0900, Tejun Heo wrote:
> Can you please do l on the above rip - ffffffff81598250?

I was unable to:
(gdb) l *ffffffff81598250
No symbol "ffffffff81598250" in current context.

Here is my vmlinux, LZMA-packed (~15MB download):
http://www.vroon.org/vmlinux.lzma

Hopefully this will allow you to find out the information requested.

Regards,
Tony V.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2009-09-23 09:23:57

by Tejun Heo

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

Tony Vroon wrote:
> On Wed, 2009-09-23 at 14:52 +0900, Tejun Heo wrote:
>> Can you please do l on the above rip - ffffffff81598250?
>
> I was unable to:
> (gdb) l *ffffffff81598250
> No symbol "ffffffff81598250" in current context.

Ah... you should have done l *0xffffffff81598250.

> Here is my vmlinux, LZMA-packed (~15MB download):
> http://www.vroon.org/vmlinux.lzma
>
> Hopefully this will allow you to find out the information requested.

But having the kernel image is much better. Thanks.

--
tejun

2009-09-23 15:47:50

by H. Peter Anvin

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

Tejun Heo wrote:
> Tony Vroon wrote:
>> On Wed, 2009-09-23 at 14:52 +0900, Tejun Heo wrote:
>>> Can you please do l on the above rip - ffffffff81598250?
>> I was unable to:
>> (gdb) l *ffffffff81598250
>> No symbol "ffffffff81598250" in current context.
>
> Ah... you should have done l *0xffffffff81598250.
>

A better way to do this is:

addr2line -e vmlinux 0xffffffff81598250

No need to involve gdb.
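
Both lookups above depend on the address being written with a 0x
prefix; without it gdb reads the hex digits as a symbol name. A small
sketch that normalizes the address and prints the two command lines
(the helper name with_0x is made up for illustration, and the -f flag,
which asks addr2line to also print the function name, is an addition
not used in this thread):

```shell
#!/bin/sh
# Hypothetical helper: make sure a kernel text address carries the 0x
# prefix, so tools parse it as a number rather than as a symbol name.
with_0x() {
    case "$1" in
        0x*|0X*) printf '%s\n' "$1" ;;
        *)       printf '0x%s\n' "$1" ;;
    esac
}

addr=$(with_0x ffffffff81598250)

# Print the two lookups instead of running them, since they need a
# vmlinux built with debug info in the current directory.
printf 'gdb -batch -ex "list *%s" vmlinux\n' "$addr"
printf 'addr2line -f -e vmlinux %s\n' "$addr"
```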

-hpa

2009-09-23 23:26:11

by Tejun Heo

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

H. Peter Anvin wrote:
> Tejun Heo wrote:
>> Tony Vroon wrote:
>>> On Wed, 2009-09-23 at 14:52 +0900, Tejun Heo wrote:
>>>> Can you please do l on the above rip - ffffffff81598250?
>>> I was unable to:
>>> (gdb) l *ffffffff81598250
>>> No symbol "ffffffff81598250" in current context.
>>
>> Ah... you should have done l *0xffffffff81598250.
>>
>
> A better way to do this is:
>
> addr2line -e vmlinux 0xffffffff81598250
>
> No need to involve gdb.

Neat. I usually keep gdb open as a calculator w/ C syntax, so I've
never looked for other stuff but this is much better for bug
reporters. Thanks.

--
tejun

2009-09-24 00:05:26

by Tejun Heo

Subject: Re: 2.6.31-07068-g43c1266 Early boot exception

Hello,

Tony Vroon wrote:
> Patch applied, no additional output visible:
> IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
> Using ACPI (MADT) for SMP configuration information
> ACPI: HPET id: 0x10de8201 base: 0xfed00000
> SMP: allowing 12 CPUs, 0 hotplug CPUs
> Allocating PCI resources starting at b00000000 (gap:b0000000:30000000)
> NR_CPUS: 12 nr_cpumask_bits:12 nr_cpu_ids:12 nr_node_ids:2
> PERCPU: Embedded 23 pages/cpu @ffff880028200000 s73688 r0 d20520 u26214
> PANIC: early exception 06 rip 10:ffffffff81598250 error 0 cr2 d0fff6
> Pid: 0, comm: swapper Not tainted 2.6.31-07068-g43c266-dirty #1
> Call Trace:
> [<ffffffff815719b5>] early_idt_handler+0x55/0x68
> [<ffffffff81587259>] ? pcpu_setup_first_chunk+0x40b/0x703
> [<ffffffff81587273>] ? pcpu_setup_first_chunk+0x427/0x703
> [<ffffffff81587a78>] pcpu_embed_first_chunk+0x1fd/0x261
> [<ffffffff8158a927>] ? pcpu_fc_alloc+0x0/0xac
> [<ffffffff8157a908>] ? pcpu_fc_free+0x0/0x1f
> [<ffffffff8157a749>] setup_per_cpu_areas+0x65/0x219
> [<ffffffff815721ff>] start_kernel+0x124/0x2c5
> [<ffffffff81571a92>] x86_64_start_reservations+0x82/0x86
> [<ffffffff81571b7a>] x86_64_start_kernel+0xe4/0xeb
> RIP pcpu_setup_first_chunk+0x40d/0x703

I'm a bit confused by the panic message. The PANIC line says rip is
0xffffffff81598250 which is arch/x86/pci/amd_bus.c:315 which is in the
middle of early_fill_mp_bus_info() which happens way later during
initialization while the call stack shows pcpu_embed_first_chunk()
just called pcpu_setup_first_chunk. You sure this is the right image?
Anyways, the last line "RIP pcpu_setup_first_chunk+0x40d/0x703"
combined with the exception number (#UD) indicates that it's probably
line 1635 in percpu.c.
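
For anyone wanting to repeat that lookup: the "symbol+0xOFF/0xSIZE"
form in the RIP line can be turned into the same kind of gdb list
expression used earlier in this thread. A minimal sketch using shell
parameter expansion; the helper name gdb_list_expr is made up for
illustration, and the resulting line still has to be fed to gdb against
a vmlinux with debug info:

```shell
#!/bin/sh
# Hypothetical helper: split "symbol+0xOFF/0xSIZE" and print a gdb list
# expression for it.
gdb_list_expr() {
    entry=$1
    sym=${entry%%+*}     # text before the "+"
    rest=${entry#*+}     # "0xOFF/0xSIZE"
    off=${rest%%/*}      # drop the "/0xSIZE" part
    printf 'list *%s+%s\n' "$sym" "$off"
}

gdb_list_expr 'pcpu_setup_first_chunk+0x40d/0x703'
# prints: list *pcpu_setup_first_chunk+0x40d
```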

Can you please apply the following patch and report the failing log?
Specifying earlyprintk=vga (or something like
earlyprintk=ttyS0,115200) should show you how pcpu initialization is
failing.
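
The earlyprintk option goes on the kernel command line in the boot
loader configuration. As a small sketch for checking that a given
command line carries it, the helper below prints the setting it finds;
the name earlyprintk_setting and the root= values are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the earlyprintk= setting found in a kernel
# command line string, or nothing if it is absent.
earlyprintk_setting() {
    for word in $1; do
        case "$word" in
            earlyprintk=*) printf '%s\n' "${word#earlyprintk=}" ;;
        esac
    done
}

# Example command lines; the root= value is invented.
earlyprintk_setting 'root=/dev/sda1 ro earlyprintk=vga'            # prints vga
earlyprintk_setting 'root=/dev/sda1 ro earlyprintk=ttyS0,115200'   # prints ttyS0,115200
```

On a system that did boot, the same check can be run against the live
command line with earlyprintk_setting "$(cat /proc/cmdline)".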

Thanks.

diff --git a/mm/percpu.c b/mm/percpu.c
index 43d8cac..d9a636f 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1631,8 +1631,18 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
pcpu_last_unit_cpu = cpu;
pcpu_nr_units = unit;

- for_each_possible_cpu(cpu)
- BUG_ON(unit_map[cpu] == NR_CPUS);
+ for_each_possible_cpu(cpu) {
+ static char cpus[4096] __initdata;
+
+ if (unit_map[cpu] != NR_CPUS)
+ continue;
+
+ printk(KERN_CRIT "PERCPU: unit missing for cpu%d\n", cpu);
+ cpumask_scnprintf(cpus, sizeof(cpus), cpu_possible_mask);
+ printk(KERN_CRIT "PERCPU: cpu_possible_mask=%s\n", cpus);
+ pcpu_dump_alloc_info(KERN_CRIT, ai);
+ BUG();
+ }

pcpu_nr_groups = ai->nr_groups;
pcpu_group_offsets = group_offsets;


--
tejun