2004-06-22 20:12:10

by Anssi Saari

Subject: PROBLEM: booting 2.6.7 hangs with IRQ handling problems


Hello,

On my home PC I have an AMD Athlon XP 1900+ on an Aopen AK77-600Max
motherboard, VIA KT600 chipset. It works fine with Linux 2.6.6, apart
from the apparently nonexistent support for PATA devices on the Promise
PDC20378, but I can't boot 2.6.7. I've tried vanilla 2.6.7, 2.6.7 with
acpi-20040326 patch and 2.6.7-bk4. acpi=off, noapic or nolapic don't
seem to help.

I captured the boot messages over the serial console; the full log is at
http://www.sci.fi/~as/linux_2.6.7_boot_hang. Looking through the log,
things seem to go fine for a while, until after the cmd64x module loads
(building cmd64x into the kernel didn't help). Then I get this:

irq 10: nobody cared!
[<c0105ac9>] dump_stack+0x19/0x20
[<c0106c93>] __report_bad_irq+0x33/0x90
[<c0106d70>] note_interrupt+0x50/0x80
[<c0106f89>] do_IRQ+0xa9/0x130
[<c0105684>] common_interrupt+0x18/0x20
[<c011cbe5>] do_softirq+0x25/0x30
[<c0106ff1>] do_IRQ+0x111/0x130
[<c0105684>] common_interrupt+0x18/0x20
[<c01070d9>] request_irq+0x89/0xb0
[<c020b007>] init_irq+0x257/0x430
[<c020b5d8>] hwif_init+0x108/0x270
[<c020ac04>] probe_hwif_init+0x14/0x60
[<c020deec>] ide_setup_pci_device+0x3c/0x70
[<f983e3d0>] cmd64x_init_one+0x20/0x30 [cmd64x]
[<c01aeb0d>] pci_device_probe_static+0x2d/0x50
[<c01aeb50>] __pci_device_probe+0x20/0x40
[<c01aeb8e>] pci_device_probe+0x1e/0x40
[<c01f3c42>] bus_match+0x32/0x60
[<c01f3d40>] driver_attach+0x40/0x80
[<c01f3fe5>] bus_add_driver+0x85/0xb0
[<c01f4416>] driver_register+0x36/0x40
[<c01aeda6>] pci_register_driver+0x56/0x80
[<c020e026>] ide_pci_register_driver+0x36/0x50
[<f983e3ed>] cmd64x_ide_init+0xd/0x14 [cmd64x]
[<c012d3c8>] sys_init_module+0x118/0x240
[<c0104d17>] syscall_call+0x7/0xb
handlers:
[<c0207cb0>] (ide_intr+0x0/0x180)
Disabling IRQ #10
ide2 at 0xb400-0xb407,0xb802 on irq 10
hde: max request size: 128KiB
irq 10: nobody cared!

This kind of thing goes on for a while, interleaved with normal boot
messages, until finally:

Debug: sleeping function called from invalid context at arch/i386/lib/usercopy.c:597
in_atomic():1, irqs_disabled():0
[<c0105ac9>] dump_stack+0x19/0x20
[<c01172e6>] __might_sleep+0xa6/0xb0
[<c01ab6ca>] copy_to_user+0x1a/0x50
[<c011c4c5>] sys_gettimeofday+0x25/0x60
[<c0104d17>] syscall_call+0x7/0xb
bad: scheduling while atomic!
[<c0105ac9>] dump_stack+0x19/0x20
[<c02890ac>] schedule+0x3c/0x430
[<c0104d3e>] work_resched+0x5/0x16
bad: scheduling while atomic!
[<c0105ac9>] dump_stack+0x19/0x20
[<c02890ac>] schedule+0x3c/0x430
[<c0116e11>] sys_sched_yield+0x41/0x50
[<c0289897>] yield+0x17/0x20
[<c0155c18>] coredump_wait+0x48/0xb0
[<c0155d54>] do_coredump+0xd4/0x1dd
[<c0122cda>] get_signal_to_deliver+0x2ba/0x330
[<c0104ac0>] do_signal+0x50/0xd0
[<c0104b70>] do_notify_resume+0x30/0x48
[<c0104d62>] work_notifysig+0x13/0x15
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing




Subject: Re: PROBLEM: booting 2.6.7 hangs with IRQ handling problems

On Tuesday 22 of June 2004 21:29, Anssi Saari wrote:
>
> Hello,

Hi,

> On my home PC I have an AMD Athlon XP 1900+ on an Aopen AK77-600Max
> motherboard, VIA KT600 chipset. It works fine with Linux 2.6.6, apart
> from the apparently nonexistent support for PATA devices on the Promise
> PDC20378, but I can't boot 2.6.7. I've tried vanilla 2.6.7, 2.6.7 with
> acpi-20040326 patch and 2.6.7-bk4. acpi=off, noapic or nolapic don't
> seem to help.

Since 2.6.6 works and 2.6.7-bk4 doesn't, can you try -bk1/2/3 and
do a bisection search on specific changesets? Thanks!

Bartlomiej

2004-06-23 18:07:57

by Anssi Saari

Subject: Re: PROBLEM: booting 2.6.7 hangs with IRQ handling problems

On Wed, Jun 23, 2004 at 05:48:33PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Tuesday 22 of June 2004 21:29, Anssi Saari wrote:
> >
> > Hello,
>
> Hi,
>
> > On my home PC I have an AMD Athlon XP 1900+ on an Aopen AK77-600Max
> > motherboard, VIA KT600 chipset. It works fine with Linux 2.6.6, apart
> > from the apparently nonexistent support for PATA devices on the Promise
> > PDC20378, but I can't boot 2.6.7. I've tried vanilla 2.6.7, 2.6.7 with
> > acpi-20040326 patch and 2.6.7-bk4. acpi=off, noapic or nolapic don't
> > seem to help.
>
> Since 2.6.6 works and 2.6.7-bk4 doesn't, can you try -bk1/2/3 and
> do a bisection search on specific changesets? Thanks!

OK. I find that 2.6.6-bk1 seemed fine, but 2.6.6-bk2 already prints out
these messages. It did boot, but then hung shortly after. I hope this
helps to narrow it down?

Anssi


Subject: Re: PROBLEM: booting 2.6.7 hangs with IRQ handling problems

On Wednesday 23 of June 2004 20:04, Anssi Saari wrote:
> On Wed, Jun 23, 2004 at 05:48:33PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > On Tuesday 22 of June 2004 21:29, Anssi Saari wrote:
> > > Hello,
> >
> > Hi,
> >
> > > On my home PC I have an AMD Athlon XP 1900+ on an Aopen AK77-600Max
> > > motherboard, VIA KT600 chipset. It works fine with Linux 2.6.6, apart
> > > from the apparently nonexistent support for PATA devices on the Promise
> > > PDC20378, but I can't boot 2.6.7. I've tried vanilla 2.6.7, 2.6.7 with
> > > acpi-20040326 patch and 2.6.7-bk4. acpi=off, noapic or nolapic don't
> > > seem to help.
> >
> > Since 2.6.6 works and 2.6.7-bk4 doesn't, can you try -bk1/2/3 and
> > do a bisection search on specific changesets? Thanks!
>
> OK. I find that 2.6.6-bk1 seemed fine, but 2.6.6-bk2 already prints out
> these messages. It did boot, but then hung shortly after. I hope this
> helps to narrow it down?

Does it hang the same way as 2.6.7?

There were no IDE changes between 2.6.6-bk1 and 2.6.6-bk2.
Can you do a diff between dmesg outputs from -bk1 and -bk2?

You can also try narrowing it down to a specific changeset
[ http://linux.bkbits.net:8080/linux-2.5/ ] but it can take a while.
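
To give a feel for how long "a while" is: narrowing down to one changeset
is a bisection over the ordered list of changesets between the last good
and the first bad snapshot, so it needs roughly log2(N) build-and-boot
tests for N changesets. Below is a minimal sketch in C of that search.
The names first_bad, is_bad_fn and bad_from are hypothetical, and
bad_from is only a stand-in for the manual "build, boot, check for the
hang" step; nothing here talks to bk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test hook: nonzero if the tree with changesets
 * 0..index applied shows the problem. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/*
 * Find the first changeset that makes the tree bad.
 * Assumes the tree before changeset 0 is good, the tree after
 * changeset n-1 is bad, and that once bad it stays bad.
 * If tests is non-NULL, it counts how many trees were tested.
 */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx, size_t *tests)
{
	size_t lo = 0, hi;

	assert(n > 0);
	hi = n - 1;			/* the culprit lies in [lo, hi] */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tests)
			(*tests)++;
		if (is_bad(mid, ctx))
			hi = mid;	/* culprit is at or before mid */
		else
			lo = mid + 1;	/* culprit is after mid */
	}
	return lo;
}

/* Stand-in for a real build-and-boot test: everything from the
 * changeset index stored in ctx onwards is bad. */
static int bad_from(size_t index, void *ctx)
{
	return index >= *(const size_t *)ctx;
}
```

With 100 changesets between the two snapshots (a made-up number), this
needs at most 7 test boots.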

Bartlomiej

2004-06-26 18:13:53

by Anssi Saari

Subject: Re: PROBLEM: booting 2.6.7 hangs with IRQ handling problems

On Fri, Jun 25, 2004 at 09:06:03PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 23 of June 2004 20:04, Anssi Saari wrote:
> > On Wed, Jun 23, 2004 at 05:48:33PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > On Tuesday 22 of June 2004 21:29, Anssi Saari wrote:
> > > > Hello,
> > >
> > > Hi,
> > >
> > > > On my home PC I have an AMD Athlon XP 1900+ on an Aopen AK77-600Max
> > > > motherboard, VIA KT600 chipset. It works fine with Linux 2.6.6, apart
> > > > from the apparently nonexistent support for PATA devices on the Promise
> > > > PDC20378, but I can't boot 2.6.7. I've tried vanilla 2.6.7, 2.6.7 with
> > > > acpi-20040326 patch and 2.6.7-bk4. acpi=off, noapic or nolapic don't
> > > > seem to help.
> > >
> > > Since 2.6.6 works and 2.6.7-bk4 doesn't, can you try -bk1/2/3 and
> > > do a bisection search on specific changesets? Thanks!
> >
> > OK. I find that 2.6.6-bk1 seemed fine, but 2.6.6-bk2 already prints out
> > these messages. It did boot, but then hung shortly after. I hope this
> > helps to narrow it down?
>
> Does it hang the same way as 2.6.7?
>
> There were no IDE changes between 2.6.6-bk1 and 2.6.6-bk2.
> Can you do a diff between dmesg outputs from -bk1 and -bk2?

Well, who said this has anything to do with IDE?

> You can also try narrowing it down to a specific changeset
> [ http://linux.bkbits.net:8080/linux-2.5/ ] but it can take a while.

I couldn't figure out if there is a way to get individual changesets for
a date range from bk, each as a separate diff file, so I just went
through the whole big diff between 2.6.6-bk1 and 2.6.6-bk2. Since this
didn't smell like a device driver problem, I was left with only three
changed files.

This is the change that breaks things for me:

-------------------------------------------------------------------------
diff -urN linux-2.6.6-bk1/drivers/acpi/pci_link.c linux-2.6.6-bk2/drivers/acpi/pci_link.c
--- linux-2.6.6-bk1/drivers/acpi/pci_link.c 2004-05-09 19:32:00.000000000 -0700
+++ linux-2.6.6-bk2/drivers/acpi/pci_link.c 2004-05-15 04:50:32.000000000 -0700
@@ -479,7 +479,7 @@
 	PIRQ_PENALTY_PCI_AVAILABLE, /* IRQ9 PCI, often acpi */
 	PIRQ_PENALTY_PCI_AVAILABLE, /* IRQ10 PCI */
 	PIRQ_PENALTY_PCI_AVAILABLE, /* IRQ11 PCI */
-	PIRQ_PENALTY_ISA_TYPICAL, /* IRQ12 mouse */
+	PIRQ_PENALTY_ISA_USED, /* IRQ12 mouse */
 	PIRQ_PENALTY_ISA_USED, /* IRQ13 fpe, sometimes */
 	PIRQ_PENALTY_ISA_USED, /* IRQ14 ide0 */
 	PIRQ_PENALTY_ISA_USED, /* IRQ15 ide1 */
@@ -546,17 +546,23 @@
 		if (link->irq.active == link->irq.possible[i])
 			break;
 	}
+	/*
+	 * forget active IRQ that is not in possible list
+	 */
+	if (i == link->irq.possible_count) {
+		if (acpi_strict)
+			printk(KERN_WARNING PREFIX "_CRS %d not found"
+				" in _PRS\n", link->irq.active);
+		link->irq.active = 0;
+	}

 	/*
 	 * if active found, use it; else pick entry from end of possible list.
 	 */
-	if (i != link->irq.possible_count) {
+	if (link->irq.active) {
 		irq = link->irq.active;
 	} else {
 		irq = link->irq.possible[link->irq.possible_count - 1];
-		if (acpi_strict)
-			printk(KERN_WARNING PREFIX "_CRS %d not found"
-				" in _PRS\n", link->irq.active);
 	}

 	if (acpi_irq_balance || !link->irq.active) {
-------------------------------------------------------------------------

With this change reverted, I'm now running vanilla 2.6.7 with the lirc
patch and don't seem to have any problems.

The comment in the change "forget active IRQ that is not in possible list"
and then the "irq 10: nobody cared!" messages from the kernel might have
something to do with each other.
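
As a rough sketch of what the second hunk changes, here are the two
versions of the selection logic side by side in plain C. This is a
simplified model, not the kernel code: struct link_irq, select_irq_old,
select_irq_new and pick_lowest_penalty are made-up names, the
acpi_strict warning is left out, and the body of the final
"if (acpi_irq_balance || !link->irq.active)" branch is not shown in the
diff, so the sketch assumes it re-picks the entry with the lowest
penalty from the possible list.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the link state the patch touches. */
struct link_irq {
	int active;		/* IRQ reported by _CRS */
	const int *possible;	/* IRQs listed in _PRS */
	size_t possible_count;
};

/* Index of active in possible[], or possible_count if it is absent. */
static size_t find_active(const struct link_irq *l)
{
	size_t i;

	for (i = 0; i < l->possible_count; i++)
		if (l->active == l->possible[i])
			break;
	return i;
}

/* ASSUMED body of the final branch: lowest penalty wins, scanning
 * the possible list from the end. penalty[] is indexed by IRQ. */
static int pick_lowest_penalty(const struct link_irq *l, int irq,
			       const int *penalty)
{
	size_t i = l->possible_count;

	while (i-- > 0)
		if (penalty[irq] > penalty[l->possible[i]])
			irq = l->possible[i];
	return irq;
}

/* 2.6.6-bk1: an active IRQ outside _PRS stays recorded as active. */
static int select_irq_old(struct link_irq *l, const int *penalty, int balance)
{
	size_t i = find_active(l);
	int irq;

	assert(l->possible_count > 0);
	if (i != l->possible_count)
		irq = l->active;
	else
		irq = l->possible[l->possible_count - 1];
	if (balance || !l->active)
		irq = pick_lowest_penalty(l, irq, penalty);
	return irq;
}

/* 2.6.6-bk2: an active IRQ outside _PRS is forgotten (set to 0),
 * which also makes the final branch run without balancing. */
static int select_irq_new(struct link_irq *l, const int *penalty, int balance)
{
	size_t i = find_active(l);
	int irq;

	assert(l->possible_count > 0);
	if (i == l->possible_count)
		l->active = 0;
	if (l->active)
		irq = l->active;
	else
		irq = l->possible[l->possible_count - 1];
	if (balance || !l->active)
		irq = pick_lowest_penalty(l, irq, penalty);
	return irq;
}
```

Under these assumptions the two versions agree whenever the active IRQ
is in the possible list. They can differ when it is not: the old code
takes the last possible entry and stops, while the new code clears
active and so goes on to re-pick by penalty, which may land on a
different IRQ.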

This change apparently corresponds to this changeset from the log:

[email protected], 2004-05-10 16:48:38-04:00, [email protected]
[ACPI] handle _CRS outside _PRS -- even when non-zero
avoid sharing IRQ12
http://bugzilla.kernel.org/show_bug.cgi?id=2665

Apparently Len Brown came up with this as a bugfix for a different
problem, but it breaks things for me. I've copied him on this message.

Anssi