2004-09-30 19:15:54

by Fernando Lopez-Lezcano

Subject: 2.6.9rc2-mm4 oops

I ran into this while testing the voluntary preemption S7 patch. It
still happens in vanilla 2.6.9rc2-mm4 (with the small patch below that
Ingo sent me, otherwise plain rc2-mm4 would not boot). The machine is an
athlon64 based workstation and has no floppy drive:

inserting floppy driver for 2.6.8.1-1.520.1nov.rhfc2.ccrma
Unable to handle kernel paging request at virtual address f8881920
printing eip:
c0251d3d
*pde = 37f5f067
Oops: 0002 [#1]
PREEMPT
Modules linked in: floppy(U) sg(U) dm_mod(U) uhci_hcd(U) ehci_hcd(U)
button(U) battery(U) asus_acpi(U) ac(U) ext3(U) jbd(U) raid5(U) xor(U)
sata_via(U) sata_promise(U) libata(U) sd_mod(U) scsi_mod(U)
CPU: 0
EIP: 0060:[<c0251d3d>] Not tainted VLI
EFLAGS: 00010246 (2.6.8.1-1.520.1nov.rhfc2.ccrma)
EIP is at acpi_bus_register_driver+0xd2/0x165
eax: f8881920 ebx: f88eefe0 ecx: c03d6b40 edx: f88ebd30
esi: ffffffed edi: f6ed4000 ebp: c03d9460 esp: f6ed4f5c
ds: 007b es: 007b ss: 0068
Process modprobe (pid: 2119, threadinfo=f6ed4000 task=f6e71870)
Stack: c03d94a0 f88e9126 00000015 00000014 f8870bc1 c03d94a0 f88ef280
f6ed4000
c0129ef7 c03d94a0 f88ef280 f6ed4000 c03d9460 c014dd52 00000246
f62f29ac
f6e1fc40 f6d8b8ac f6de4380 f6de43ac 00000000 b7fde008 0807a1a0
006a809d
Call Trace:
[<f88e9126>] acpi_floppy_init+0x16/0x50 [floppy]
[<f8870bc1>] floppy_init+0x11/0x600 [floppy]
[<c0129ef7>] printk+0x17/0x20
[<c014dd52>] sys_init_module+0x252/0x3b0
[<c0106afd>] sysenter_past_esp+0x52/0x71
Code: 00 00 00 a1 ec 67 3e c0 c7 05 78 67 3e c0 a0 7b 3a c0 c7 05 7c 67
3e c0 bd 01 00 00 89 1d ec 67 3e c0 c7 03 e8 67 3e c0 89 43 04 <89> 18
81 3d 68 67 3e c0 3c 4b 24 1d 74 1c 68 68 67 3e c0 68 bf
<6>note: modprobe[2119] exited with preempt_count 1
Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():0
[<c0125652>] __might_sleep+0xa2/0xb0
[<c012d222>] do_exit+0xa2/0x980
[<c010775f>] die+0x2bf/0x2c0
[<c012a0b6>] vprintk+0x1b6/0x340
[<c011e1e4>] do_page_fault+0x314/0x56c
[<c01d8155>] sysfs_new_dirent+0x25/0x80
[<c01d81cd>] sysfs_make_dirent+0x1d/0x90
[<c0172a29>] unmap_area_pmd+0x49/0x60
[<c01d7d84>] sysfs_add_file+0x74/0xa0
[<c0172bb0>] unmap_vm_area+0x30/0x80
[<c0173136>] __vunmap+0xb6/0xf0
[<c0129cf0>] call_console_drivers+0x80/0x110
[<c011ded0>] do_page_fault+0x0/0x56c
[<c0106cf9>] error_code+0x2d/0x38
[<c0251d3d>] acpi_bus_register_driver+0xd2/0x165
[<f88e9126>] acpi_floppy_init+0x16/0x50 [floppy]
[<f8870bc1>] floppy_init+0x11/0x600 [floppy]
[<c0129ef7>] printk+0x17/0x20
[<c014dd52>] sys_init_module+0x252/0x3b0
[<c0106afd>] sysenter_past_esp+0x52/0x71
ohci1394: $Rev: 1226 $ Ben Collins <[email protected]>

-- Fernando

--- linux/init/main.c.orig
+++ linux/init/main.c
@@ -435,6 +435,12 @@ static void noinline rest_init(void)
{
kernel_thread(init, NULL, CLONE_FS | CLONE_SIGHAND);
numa_default_policy();
+ /*
+ * Re-enable preemption but disable interrupts to make sure
+ * we dont get preempted until we schedule() in cpu_idle().
+ */
+ local_irq_disable();
+ preempt_enable_no_resched();
unlock_kernel();
cpu_idle();
}
@@ -501,6 +507,7 @@ asmlinkage void __init start_kernel(void
* time - but meanwhile we still have a functioning scheduler.
*/
sched_init();
+ preempt_disable();
build_all_zonelists();
page_alloc_init();
trap_init();
--- linux/arch/i386/kernel/entry.S.orig
+++ linux/arch/i386/kernel/entry.S
@@ -197,10 +197,8 @@ need_resched:
jz restore_all
testl $IF_MASK,EFLAGS(%esp) # interrupts off (exception path) ?
jz restore_all
- movl $PREEMPT_ACTIVE,TI_preempt_count(%ebp)
sti
- call schedule
- movl $0,TI_preempt_count(%ebp)
+ call preempt_schedule
cli
jmp need_resched
#endif
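
The header lines of the oops in this report already narrow the fault down. Below is a minimal Python sketch for reading them, assuming the i386 page-fault error code layout of 2.6-era kernels (bit 0: protection violation rather than a not-present page, bit 1: write access, bit 2: user mode). The names `decode_i386_fault_code`, `summarize_oops` and `SAMPLE_OOPS` are hypothetical helpers made up for illustration, not part of any kernel tooling.

```python
import re

# Three header lines copied from the oops in this report.
SAMPLE_OOPS = """\
Unable to handle kernel paging request at virtual address f8881920
Oops: 0002 [#1]
EIP is at acpi_bus_register_driver+0xd2/0x165
"""


def decode_i386_fault_code(code):
    """Split an i386 page-fault error code into its three low bits."""
    return {
        "protection_violation": bool(code & 0x1),  # False: page not present
        "write": bool(code & 0x2),                 # False: read access
        "user_mode": bool(code & 0x4),             # False: kernel mode
    }


def summarize_oops(text):
    """Extract the faulting address, EIP symbol and decoded error code."""
    fault = re.search(r"virtual address ([0-9a-fA-F]+)", text)
    code = re.search(r"Oops: ([0-9a-fA-F]+)", text)
    eip = re.search(r"EIP is at (\S+)", text)
    if not (fault and code and eip):
        raise ValueError("text does not look like an i386 oops header")
    summary = {
        "fault_address": int(fault.group(1), 16),
        "eip": eip.group(1),
    }
    summary.update(decode_i386_fault_code(int(code.group(1), 16)))
    return summary
```

Calling `summarize_oops(SAMPLE_OOPS)` reports a kernel-mode write to a page that was not present. The faulting address f8881920 is also the value shown in eax in the register dump.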


2004-09-30 19:51:56

by Andrew Morton

Subject: Re: 2.6.9rc2-mm4 oops

Fernando Pablo Lopez-Lezcano wrote:
>
> I ran into this while testing the voluntary preemption S7 patch. It
> still happens in vanilla 2.6.9rc2-mm4 (with the small patch below that
> Ingo sent me, otherwise plain rc2-mm4 would not boot). The machine is an
> athlon64 based workstation and has no floppy drive:

Thanks. Whenever I see floppy+acpi+oops, I think "Bjorn!".

> inserting floppy driver for 2.6.8.1-1.520.1nov.rhfc2.ccrma
> Unable to handle kernel paging request at virtual address f8881920
> printing eip:
> c0251d3d
> *pde = 37f5f067
> Oops: 0002 [#1]
> PREEMPT
> Modules linked in: floppy(U) sg(U) dm_mod(U) uhci_hcd(U) ehci_hcd(U)
> button(U) battery(U) asus_acpi(U) ac(U) ext3(U) jbd(U) raid5(U) xor(U)
> sata_via(U) sata_promise(U) libata(U) sd_mod(U) scsi_mod(U)
> CPU: 0
> EIP: 0060:[<c0251d3d>] Not tainted VLI
> EFLAGS: 00010246 (2.6.8.1-1.520.1nov.rhfc2.ccrma)
> EIP is at acpi_bus_register_driver+0xd2/0x165
> eax: f8881920 ebx: f88eefe0 ecx: c03d6b40 edx: f88ebd30
> esi: ffffffed edi: f6ed4000 ebp: c03d9460 esp: f6ed4f5c
> ds: 007b es: 007b ss: 0068
> Process modprobe (pid: 2119, threadinfo=f6ed4000 task=f6e71870)
> Stack: c03d94a0 f88e9126 00000015 00000014 f8870bc1 c03d94a0 f88ef280
> f6ed4000
> c0129ef7 c03d94a0 f88ef280 f6ed4000 c03d9460 c014dd52 00000246
> f62f29ac
> f6e1fc40 f6d8b8ac f6de4380 f6de43ac 00000000 b7fde008 0807a1a0
> 006a809d
> Call Trace:
> [<f88e9126>] acpi_floppy_init+0x16/0x50 [floppy]
> [<f8870bc1>] floppy_init+0x11/0x600 [floppy]
> [<c0129ef7>] printk+0x17/0x20
> [<c014dd52>] sys_init_module+0x252/0x3b0
> [<c0106afd>] sysenter_past_esp+0x52/0x71
> Code: 00 00 00 a1 ec 67 3e c0 c7 05 78 67 3e c0 a0 7b 3a c0 c7 05 7c 67
> 3e c0 bd 01 00 00 89 1d ec 67 3e c0 c7 03 e8 67 3e c0 89 43 04 <89> 18
> 81 3d 68 67 3e c0 3c 4b 24 1d 74 1c 68 68 67 3e c0 68 bf
> <6>note: modprobe[2119] exited with preempt_count 1
> Debug: sleeping function called from invalid context at
> include/linux/rwsem.h:43
> in_atomic():1, irqs_disabled():0
> [<c0125652>] __might_sleep+0xa2/0xb0
> [<c012d222>] do_exit+0xa2/0x980
> [<c010775f>] die+0x2bf/0x2c0
> [<c012a0b6>] vprintk+0x1b6/0x340
> [<c011e1e4>] do_page_fault+0x314/0x56c
> [<c01d8155>] sysfs_new_dirent+0x25/0x80
> [<c01d81cd>] sysfs_make_dirent+0x1d/0x90
> [<c0172a29>] unmap_area_pmd+0x49/0x60
> [<c01d7d84>] sysfs_add_file+0x74/0xa0
> [<c0172bb0>] unmap_vm_area+0x30/0x80
> [<c0173136>] __vunmap+0xb6/0xf0
> [<c0129cf0>] call_console_drivers+0x80/0x110
> [<c011ded0>] do_page_fault+0x0/0x56c
> [<c0106cf9>] error_code+0x2d/0x38
> [<c0251d3d>] acpi_bus_register_driver+0xd2/0x165
> [<f88e9126>] acpi_floppy_init+0x16/0x50 [floppy]
> [<f8870bc1>] floppy_init+0x11/0x600 [floppy]
> [<c0129ef7>] printk+0x17/0x20
> [<c014dd52>] sys_init_module+0x252/0x3b0
> [<c0106afd>] sysenter_past_esp+0x52/0x71
> ohci1394: $Rev: 1226 $ Ben Collins <[email protected]>
>
> -- Fernando
>
> --- linux/init/main.c.orig
> +++ linux/init/main.c
> @@ -435,6 +435,12 @@ static void noinline rest_init(void)
> {
> kernel_thread(init, NULL, CLONE_FS | CLONE_SIGHAND);
> numa_default_policy();
> + /*
> + * Re-enable preemption but disable interrupts to make sure
> + * we dont get preempted until we schedule() in cpu_idle().
> + */
> + local_irq_disable();
> + preempt_enable_no_resched();
> unlock_kernel();
> cpu_idle();
> }
> @@ -501,6 +507,7 @@ asmlinkage void __init start_kernel(void
> * time - but meanwhile we still have a functioning scheduler.
> */
> sched_init();
> + preempt_disable();
> build_all_zonelists();
> page_alloc_init();
> trap_init();
> --- linux/arch/i386/kernel/entry.S.orig
> +++ linux/arch/i386/kernel/entry.S
> @@ -197,10 +197,8 @@ need_resched:
> jz restore_all
> testl $IF_MASK,EFLAGS(%esp) # interrupts off (exception path) ?
> jz restore_all
> - movl $PREEMPT_ACTIVE,TI_preempt_count(%ebp)
> sti
> - call schedule
> - movl $0,TI_preempt_count(%ebp)
> + call preempt_schedule
> cli
> jmp need_resched
> #endif

2004-09-30 21:23:06

by Bjorn Helgaas

Subject: Re: 2.6.9rc2-mm4 oops

Fernando Pablo Lopez-Lezcano wrote:
> inserting floppy driver for 2.6.8.1-1.520.1nov.rhfc2.ccrma
> Unable to handle kernel paging request at virtual address f8881920
> printing eip:
> c0251d3d
> *pde = 37f5f067
> Oops: 0002 [#1]
> PREEMPT
> Modules linked in: floppy(U) sg(U) dm_mod(U) uhci_hcd(U) ehci_hcd(U)
> button(U) battery(U) asus_acpi(U) ac(U) ext3(U) jbd(U) raid5(U) xor(U)
> sata_via(U) sata_promise(U) libata(U) sd_mod(U) scsi_mod(U)
> CPU: 0
> EIP: 0060:[<c0251d3d>] Not tainted VLI
> EFLAGS: 00010246 (2.6.8.1-1.520.1nov.rhfc2.ccrma)
> EIP is at acpi_bus_register_driver+0xd2/0x165

Can you reproduce this oops with CONFIG_PREEMPT turned off? I found
a few reports of similar problems, and they all seem to have PREEMPT
turned on:

http://sourceforge.net/mailarchive/message.php?msg_id=9308735
http://lkml.org/lkml/2004/6/5/92
http://lkml.org/lkml/2004/9/23/71
http://lkml.org/lkml/2004/9/30/175 (yours)

Pierre reported that the oops seemed to be related to DEBUG_PAGEALLOC,
not PREEMPT. But you don't seem to have DEBUG_PAGEALLOC turned on,
so I still wonder if there's some connection with PREEMPT.

> eax: f8881920 ebx: f88eefe0 ecx: c03d6b40 edx: f88ebd30
> esi: ffffffed edi: f6ed4000 ebp: c03d9460 esp: f6ed4f5c
> ds: 007b es: 007b ss: 0068
> Process modprobe (pid: 2119, threadinfo=f6ed4000 task=f6e71870)
> Stack: c03d94a0 f88e9126 00000015 00000014 f8870bc1 c03d94a0 f88ef280
> f6ed4000
> c0129ef7 c03d94a0 f88ef280 f6ed4000 c03d9460 c014dd52 00000246
> f62f29ac
> f6e1fc40 f6d8b8ac f6de4380 f6de43ac 00000000 b7fde008 0807a1a0
> 006a809d
> Call Trace:
> [<f88e9126>] acpi_floppy_init+0x16/0x50 [floppy]
> [<f8870bc1>] floppy_init+0x11/0x600 [floppy]
> [<c0129ef7>] printk+0x17/0x20
> [<c014dd52>] sys_init_module+0x252/0x3b0
> [<c0106afd>] sysenter_past_esp+0x52/0x71
> Code: 00 00 00 a1 ec 67 3e c0 c7 05 78 67 3e c0 a0 7b 3a c0 c7 05 7c 67
> 3e c0 bd 01 00 00 89 1d ec 67 3e c0 c7 03 e8 67 3e c0 89 43 04 <89> 18
> 81 3d 68 67 3e c0 3c 4b 24 1d 74 1c 68 68 67 3e c0 68 bf
> <6>note: modprobe[2119] exited with preempt_count 1
> Debug: sleeping function called from invalid context at
> include/linux/rwsem.h:43
> in_atomic():1, irqs_disabled():0
> [<c0125652>] __might_sleep+0xa2/0xb0
> [<c012d222>] do_exit+0xa2/0x980
> [<c010775f>] die+0x2bf/0x2c0
> [<c012a0b6>] vprintk+0x1b6/0x340
> [<c011e1e4>] do_page_fault+0x314/0x56c
> [<c01d8155>] sysfs_new_dirent+0x25/0x80
> [<c01d81cd>] sysfs_make_dirent+0x1d/0x90
> [<c0172a29>] unmap_area_pmd+0x49/0x60
> [<c01d7d84>] sysfs_add_file+0x74/0xa0
> [<c0172bb0>] unmap_vm_area+0x30/0x80
> [<c0173136>] __vunmap+0xb6/0xf0
> [<c0129cf0>] call_console_drivers+0x80/0x110
> [<c011ded0>] do_page_fault+0x0/0x56c
> [<c0106cf9>] error_code+0x2d/0x38
> [<c0251d3d>] acpi_bus_register_driver+0xd2/0x165
> [<f88e9126>] acpi_floppy_init+0x16/0x50 [floppy]
> [<f8870bc1>] floppy_init+0x11/0x600 [floppy]
> [<c0129ef7>] printk+0x17/0x20
> [<c014dd52>] sys_init_module+0x252/0x3b0
> [<c0106afd>] sysenter_past_esp+0x52/0x71
> ohci1394: $Rev: 1226 $ Ben Collins <[email protected]>

2004-09-30 23:04:51

by Bjorn Helgaas

Subject: Re: 2.6.9rc2-mm4 oops

On Thursday 30 September 2004 3:22 pm, Bjorn Helgaas wrote:
> Fernando Pablo Lopez-Lezcano wrote:
> > inserting floppy driver for 2.6.8.1-1.520.1nov.rhfc2.ccrma
> > Unable to handle kernel paging request at virtual address f8881920
> > printing eip:
> > c0251d3d
> > *pde = 37f5f067
> > Oops: 0002 [#1]
> > PREEMPT
> > Modules linked in: floppy(U) sg(U) dm_mod(U) uhci_hcd(U) ehci_hcd(U)
> > button(U) battery(U) asus_acpi(U) ac(U) ext3(U) jbd(U) raid5(U) xor(U)
> > sata_via(U) sata_promise(U) libata(U) sd_mod(U) scsi_mod(U)
> > CPU: 0
> > EIP: 0060:[<c0251d3d>] Not tainted VLI
> > EFLAGS: 00010246 (2.6.8.1-1.520.1nov.rhfc2.ccrma)
> > EIP is at acpi_bus_register_driver+0xd2/0x165

Like Pierre, I was able to reproduce this with DEBUG_PAGEALLOC.
I found a struct acpi_driver in hpet.c that was erroneously marked
__init, and the attached patch fixed the oops for me. Can you give
this a whirl?
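
Why would a driver structure marked __init lead to this kind of fault? Here is a minimal sketch in Python, assuming that registration appends the driver to a doubly linked list and that init memory is unmapped once initialization has finished. The stale node stays linked, and the next registration has to write through it. The `Memory` class, `list_add_tail` helper and the addresses are hypothetical stand-ins for illustration, not kernel code.

```python
NEXT, PREV = 0, 1  # field indexes inside a toy list node


class Fault(Exception):
    """Stands in for 'Unable to handle kernel paging request'."""


class Memory:
    """Toy address space mapping an address to a [next, prev] pair."""

    def __init__(self):
        self.cells = {}

    def alloc(self, addr):
        # A fresh node points at itself, like an empty list head.
        self.cells[addr] = [addr, addr]

    def free(self, addr):
        # Models init memory being discarded and unmapped.
        del self.cells[addr]

    def read(self, addr, field):
        if addr not in self.cells:
            raise Fault(f"read from unmapped address {addr:#x}")
        return self.cells[addr][field]

    def write(self, addr, field, value):
        if addr not in self.cells:
            raise Fault(f"write to unmapped address {addr:#x}")
        self.cells[addr][field] = value


def list_add_tail(mem, new, head):
    """Append `new` before `head`, touching the old tail last."""
    prev = mem.read(head, PREV)
    mem.write(head, PREV, new)
    mem.write(new, NEXT, head)
    mem.write(new, PREV, prev)
    mem.write(prev, NEXT, new)  # faults if the old tail was freed


def demo():
    mem = Memory()
    head, init_driver, late_driver = 0x1000, 0x2000, 0x3000
    mem.alloc(head)
    mem.alloc(init_driver)
    list_add_tail(mem, init_driver, head)   # first driver registers fine
    mem.free(init_driver)                   # its memory goes away, link stays
    mem.alloc(late_driver)
    try:
        list_add_tail(mem, late_driver, head)
    except Fault as exc:
        return str(exc)
    return "no fault"
```

In this model the second registration fails on its final store, the one that updates the old tail. That has the same shape as the report: error code 0002 on i386 denotes a kernel-mode write to a not-present page, and eax holds the faulting address f8881920. The sketch does not say which driver left the stale node behind.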


Attachments:
(No filename) (940.00 B)
diffs.hpet (332.00 B)

2004-10-01 01:37:29

by Fernando Lopez-Lezcano

Subject: Re: 2.6.9rc2-mm4 oops

On Thu, 2004-09-30 at 16:04, Bjorn Helgaas wrote:
> On Thursday 30 September 2004 3:22 pm, Bjorn Helgaas wrote:
> > Fernando Pablo Lopez-Lezcano wrote:
> > > inserting floppy driver for 2.6.8.1-1.520.1nov.rhfc2.ccrma
> > > Unable to handle kernel paging request at virtual address f8881920
> > > printing eip:
> > > c0251d3d
> > > *pde = 37f5f067
> > > Oops: 0002 [#1]
> > > PREEMPT
> > > Modules linked in: floppy(U) sg(U) dm_mod(U) uhci_hcd(U) ehci_hcd(U)
> > > button(U) battery(U) asus_acpi(U) ac(U) ext3(U) jbd(U) raid5(U) xor(U)
> > > sata_via(U) sata_promise(U) libata(U) sd_mod(U) scsi_mod(U)
> > > CPU: 0
> > > EIP: 0060:[<c0251d3d>] Not tainted VLI
> > > EFLAGS: 00010246 (2.6.8.1-1.520.1nov.rhfc2.ccrma)
> > > EIP is at acpi_bus_register_driver+0xd2/0x165
>
> Like Pierre, I was able to reproduce this with DEBUG_PAGEALLOC.
> I found a struct acpi_driver in hpet.c that was erroneously marked
> __init, and the attached patch fixed the oops for me. Can you give
> this a whirl?

Sorry, I did, and still get the oops. This is it this time (looks the
same to me):

inserting floppy driver for 2.6.8.1-1.520.1nov.rhfc2.ccrma
Unable to handle kernel paging request at virtual address f8881920
printing eip:
c0251d3d
*pde = 37f5f067
Oops: 0002 [#1]
PREEMPT
Modules linked in: floppy(U) sg(U) dm_mod(U) uhci_hcd(U) ehci_hcd(U)
button(U) battery(U) asus_acpi(U) ac(U) ext3(U) jbd(U) raid5(U) xor(U)
sata_via(U) sata_promise(U) libata(U) sd_mod(U) scsi_mod(U)
CPU: 0
EIP: 0060:[<c0251d3d>] Not tainted VLI
EFLAGS: 00010246 (2.6.8.1-1.520.1nov.rhfc2.ccrma)
EIP is at acpi_bus_register_driver+0xd2/0x165
eax: f8881920 ebx: f88eefe0 ecx: c03d6b40 edx: f88ebd30
esi: ffffffed edi: f6eb8000 ebp: c03d9460 esp: f6eb8f5c
ds: 007b es: 007b ss: 0068
Process modprobe (pid: 2119, threadinfo=f6eb8000 task=f6dc4640)
Stack: c03d94a0 f88e9126 00000015 00000014 f8870bc1 c03d94a0 f88ef280
f6eb8000
c0129ef7 c03d94a0 f88ef280 f6eb8000 c03d9460 c014dd52 00000246
f62d1eac
f6e44c40 f6f16564 f6ea1c40 f6ea1c6c 00000000 b7fde008 0807a1a0
006a809d
Call Trace:
[<f88e9126>] acpi_floppy_init+0x16/0x50 [floppy]
[<f8870bc1>] floppy_init+0x11/0x600 [floppy]
[<c0129ef7>] printk+0x17/0x20
[<c014dd52>] sys_init_module+0x252/0x3b0
[<c0106afd>] sysenter_past_esp+0x52/0x71
Code: 00 00 00 a1 ec 67 3e c0 c7 05 78 67 3e c0 a0 7b 3a c0 c7 05 7c 67
3e c0 bd 01 00 00 89 1d ec 67 3e c0 c7 03 e8 67 3e c0 89 43 04 <89> 18
81 3d 68 67 3e c0 3c 4b 24 1d 74 1c 68 68 67 3e c0 68 bf
<6>note: modprobe[2119] exited with preempt_count 1
Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():0
[<c0125652>] __might_sleep+0xa2/0xb0
[<c012d222>] do_exit+0xa2/0x980
[<c010775f>] die+0x2bf/0x2c0
[<c012a0b6>] vprintk+0x1b6/0x340
[<c011e1e4>] do_page_fault+0x314/0x56c
[<c01d8155>] sysfs_new_dirent+0x25/0x80
[<c01d81cd>] sysfs_make_dirent+0x1d/0x90
[<c0172a29>] unmap_area_pmd+0x49/0x60
[<c01d7d84>] sysfs_add_file+0x74/0xa0
[<c0172bb0>] unmap_vm_area+0x30/0x80
[<c0173136>] __vunmap+0xb6/0xf0
[<c0129cf0>] call_console_drivers+0x80/0x110
[<c011ded0>] do_page_fault+0x0/0x56c
[<c0106cf9>] error_code+0x2d/0x38
[<c0251d3d>] acpi_bus_register_driver+0xd2/0x165
[<f88e9126>] acpi_floppy_init+0x16/0x50 [floppy]
[<f8870bc1>] floppy_init+0x11/0x600 [floppy]
[<c0129ef7>] printk+0x17/0x20
[<c014dd52>] sys_init_module+0x252/0x3b0
[<c0106afd>] sysenter_past_esp+0x52/0x71

I tried building without PREEMPT, but there were some errors about
unknown symbols at the end. I'll try again tomorrow.

-- Fernando


2004-10-01 15:06:19

by Bjorn Helgaas

Subject: Re: 2.6.9rc2-mm4 oops

On Thursday 30 September 2004 7:36 pm, Fernando Pablo Lopez-Lezcano wrote:
> On Thu, 2004-09-30 at 16:04, Bjorn Helgaas wrote:
> > Like Pierre, I was able to reproduce this with DEBUG_PAGEALLOC.
> > I found a struct acpi_driver in hpet.c that was erroneously marked
> > __init, and the attached patch fixed the oops for me. Can you give
> > this a whirl?
>
> Sorry, I did, and still get the oops. This is it this time (looks the
> same to me):
>
> inserting floppy driver for 2.6.8.1-1.520.1nov.rhfc2.ccrma
> Unable to handle kernel paging request at virtual address f8881920

Can you post your .config? If you don't have CONFIG_HPET turned
on, my patch wouldn't help. Also, can you look up the bad address
(e.g., f8881920) in /proc/kallsyms? When I reproduced it, the
faulting address was hpet_acpi_driver. Maybe you've found a
similar bug in a different driver.
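
One way to do the /proc/kallsyms lookup is to find the nearest symbol at or below the bad address. The following is a minimal Python sketch, assuming the usual "address type name [module]" line layout and sorting the entries itself rather than relying on the file's order. `parse_kallsyms` and `nearest_symbol` are hypothetical helper names, and the sample lines are the ones that appear elsewhere in this thread.

```python
import bisect

# Sample lines as they appear in this thread; real input would come
# from reading /proc/kallsyms on the affected machine.
KALLSYMS_SAMPLE = """\
f8884130 t xor_pII_mmx_3 [xor]
f8880f80 ? __mod_vermagic5 [xor]
f8881000 t acpi_button_init [button]
f8880fcd ? __module_depends [xor]
f8881000 t init_module [button]
f8884000 t xor_pII_mmx_2 [xor]
"""


def parse_kallsyms(text):
    """Return (address, name, module) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        address = int(parts[0], 16)
        name = parts[2]
        module = parts[3] if len(parts) > 3 else ""
        symbols.append((address, name, module))
    symbols.sort()
    return symbols


def nearest_symbol(symbols, address):
    """Find the last symbol whose address is <= `address`.

    Returns (name, module, offset) or None if every symbol lies above it.
    """
    addresses = [entry[0] for entry in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    sym_address, name, module = symbols[index]
    return name, module, address - sym_address
```

With the sample above, `nearest_symbol(parse_kallsyms(KALLSYMS_SAMPLE), 0xf8881920)` lands 0x920 bytes past an init entry point of the button module. A large offset past the nearest symbol only says which symbol precedes the address. It does not prove that the address still belongs to that symbol, so a result like this is a hint rather than an answer.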

2004-10-01 17:58:38

by Fernando Lopez-Lezcano

Subject: Re: 2.6.9rc2-mm4 oops

On Fri, 2004-10-01 at 08:04, Bjorn Helgaas wrote:
> On Thursday 30 September 2004 7:36 pm, Fernando Pablo Lopez-Lezcano wrote:
> > On Thu, 2004-09-30 at 16:04, Bjorn Helgaas wrote:
> > > Like Pierre, I was able to reproduce this with DEBUG_PAGEALLOC.
> > > I found a struct acpi_driver in hpet.c that was erroneously marked
> > > __init, and the attached patch fixed the oops for me. Can you give
> > > this a whirl?
> >
> > Sorry, I did, and still get the oops. This is it this time (looks the
> > same to me):
> >
> > inserting floppy driver for 2.6.8.1-1.520.1nov.rhfc2.ccrma
> > Unable to handle kernel paging request at virtual address f8881920
>
> Can you post your .config? If you don't have CONFIG_HPET turned
> on, my patch wouldn't help.

Sorry, it did not occur to me to check for that. I have it off (config
file attached).

> Also, can you look up the bad address
> (e.g., f8881920) in /proc/kallsyms?

This is what I find:
f8880f80 ? __mod_vermagic5 [xor]
f8880fcd ? __module_depends [xor]
f8881000 t acpi_button_init [button]
f8881000 t init_module [button]
f8884000 t xor_pII_mmx_2 [xor]
f8884130 t xor_pII_mmx_3 [xor]

> When I reproduced it, the
> faulting address was hpet_acpi_driver. Maybe you've found a
> similar bug in a different driver.

-- Fernando


Attachments:
kernel-2.6.8.1-i686.ccrma.config (50.84 kB)

2004-10-01 22:35:10

by Bjorn Helgaas

Subject: Re: 2.6.9rc2-mm4 oops

On Friday 01 October 2004 11:52 am, Fernando Pablo Lopez-Lezcano wrote:
> On Fri, 2004-10-01 at 08:04, Bjorn Helgaas wrote:
> > Also, can you look up the bad address
> > (e.g., f8881920) in /proc/kallsyms?
>
> This is what I find:
> f8880f80 ? __mod_vermagic5 [xor]
> f8880fcd ? __module_depends [xor]
> f8881000 t acpi_button_init [button]
> f8881000 t init_module [button]
> f8884000 t xor_pII_mmx_2 [xor]
> f8884130 t xor_pII_mmx_3 [xor]

You are remembering that /proc/kallsyms isn't sorted, right?

If you still can't match the address to anything interesting,
can you see whether it's related to any of the other modules
(i.e., see whether it happens even if you don't load any of
the other ACPI drivers, or try leaving out any other drivers
you can get along without)? Maybe try loading an ACPI driver
other than floppy, at the same point in the module load sequence,
to see if the problem is specific to floppy, or if floppy is
just an innocent bystander?

I looked at all the callers of acpi_bus_register_driver(), and
they all look fine (except the hpet one I found yesterday). But
maybe there's something I missed, or maybe the acpi_bus_drivers
list got corrupted somehow.

If you don't load the floppy driver, is the system stable?

2004-10-07 20:38:40

by Bjorn Helgaas

Subject: Re: 2.6.9rc2-mm4 oops

On Friday 01 October 2004 4:33 pm, Bjorn Helgaas wrote:
> On Friday 01 October 2004 11:52 am, Fernando Pablo Lopez-Lezcano wrote:
> > On Fri, 2004-10-01 at 08:04, Bjorn Helgaas wrote:
> > > Also, can you look up the bad address
> > > (e.g., f8881920) in /proc/kallsyms?
> >
> > This is what I find:
> > f8880f80 ? __mod_vermagic5 [xor]
> > f8880fcd ? __module_depends [xor]
> > f8881000 t acpi_button_init [button]
> > f8881000 t init_module [button]
> > f8884000 t xor_pII_mmx_2 [xor]
> > f8884130 t xor_pII_mmx_3 [xor]
>
> You are remembering that /proc/kallsyms isn't sorted, right?
>
> If you still can't match the address to anything interesting,
> can you see whether it's related to any of the other modules
> (i.e., see whether it happens even if you don't load any of
> the other ACPI drivers, or try leaving out any other drivers
> you can get along without)? Maybe try loading an ACPI driver
> other than floppy, at the same point in the module load sequence,
> to see if the problem is specific to floppy, or if floppy is
> just an innocent bystander?
>
> I looked at all the callers of acpi_bus_register_driver(), and
> they all look fine (except the hpet one I found yesterday). But
> maybe there's something I missed, or maybe the acpi_bus_drivers
> list got corrupted somehow.
>
> If you don't load the floppy driver, is the system stable?

Any update on all this? I've tried to reproduce the problem on
my Athlon box, but so far I've been unsuccessful.

2004-10-07 21:21:23

by Fernando Lopez-Lezcano

Subject: Re: 2.6.9rc2-mm4 oops

On Thu, 2004-10-07 at 13:31, Bjorn Helgaas wrote:
> On Friday 01 October 2004 4:33 pm, Bjorn Helgaas wrote:
> > On Friday 01 October 2004 11:52 am, Fernando Pablo Lopez-Lezcano wrote:
> > > On Fri, 2004-10-01 at 08:04, Bjorn Helgaas wrote:
> > > > Also, can you look up the bad address
> > > > (e.g., f8881920) in /proc/kallsyms?
> > >
> > > This is what I find:
> > > f8880f80 ? __mod_vermagic5 [xor]
> > > f8880fcd ? __module_depends [xor]
> > > f8881000 t acpi_button_init [button]
> > > f8881000 t init_module [button]
> > > f8884000 t xor_pII_mmx_2 [xor]
> > > f8884130 t xor_pII_mmx_3 [xor]
> >
> > You are remembering that /proc/kallsyms isn't sorted, right?
> >
> > If you still can't match the address to anything interesting,
> > can you see whether it's related to any of the other modules
> > (i.e., see whether it happens even if you don't load any of
> > the other ACPI drivers, or try leaving out any other drivers
> > you can get along without)? Maybe try loading an ACPI driver
> > other than floppy, at the same point in the module load sequence,
> > to see if the problem is specific to floppy, or if floppy is
> > just an innocent bystander?
> >
> > I looked at all the callers of acpi_bus_register_driver(), and
> > they all look fine (except the hpet one I found yesterday). But
> > maybe there's something I missed, or maybe the acpi_bus_drivers
> > list got corrupted somehow.
> >
> > If you don't load the floppy driver, is the system stable?

Even if I load it (unsuccessfully - there's actually no floppy), the
system _appears_ to be fine. If I remember correctly, this happens in the
context of device discovery (kudzu).

> Any update on all this? I've tried to reproduce the problem on
> my Athlon box, but so far I've been unsuccessful.

Sorry for the delay, too busy as usual.

BTW, what I sent was actually sorted. I'll see if I can get more
information later today (I have not forgotten).

-- Fernando


2004-10-17 16:26:51

by Bernhard Rosenkraenzer

Subject: Re: 2.6.9rc2-mm4 oops

On Thursday 07 October 2004 22:31, Bjorn Helgaas wrote:

> > I looked at all the callers of acpi_bus_register_driver(), and
> > they all look fine (except the hpet one I found yesterday). But
> > maybe there's something I missed, or maybe the acpi_bus_drivers
> > list got corrupted somehow.
> >
> > If you don't load the floppy driver, is the system stable?
>
> Any update on all this? I've tried to reproduce the problem on
> my Athlon box, but so far I've been unsuccessful.

I tried a couple of boxes (with versions up to 2.6.9-rc4-mm1). It appears to be
100% reproducible on boxes that don't have a floppy drive, while it works OK
on boxes that do have one.

The system still works reliably after the oops btw.

LLaP
bero