The system in question has a set of root bridges, with the CD rom (which
is seen) being on bus 0 and the SCSI controller, which doesn't show up
being on bus 1 (with several other things on busses > 1).
The system uses ACPI to detect the multiple bridges, so it seems that
this failure in 2.6.35-rc3:
ACPI: Core revision 20100428
ACPI Error: Hardware did not enter ACPI mode (20100428/evxfevnt-103)
ACPI Warning: AcpiEnable failed (20100428/utxface-147)
ACPI: Unable to enable ACPI
Means that we can't detect any PCI bus other than 0 and thus the system
won't boot.
The full dmesgs for a successful (2.6.34) and failed (2.6.35-rc3) boot
are below.
James
On Mon, 2010-06-21 at 11:03 -0500, James Bottomley wrote:
> The system in question has a set of root bridges, with the CD rom (which
> is seen) being on bus 0 and the SCSI controller, which doesn't show up
> being on bus 1 (with several other things on busses > 1).
>
> The system uses ACPI to detect the multiple bridges, so it seems that
> this failure in 2.6.35-rc3:
>
> ACPI: Core revision 20100428
> ACPI Error: Hardware did not enter ACPI mode (20100428/evxfevnt-103)
> ACPI Warning: AcpiEnable failed (20100428/utxface-147)
> ACPI: Unable to enable ACPI
>
> Means that we can't detect any PCI bus other than 0 and thus the system
> won't boot.
>
> The full dmesgs for a successful (2.6.34) and failed (2.6.35-rc3) boot
> are below.
This didn't make either the PCI or the ACPI lists ... probably because
of the attachments. However, the full message (with attachments) is on
lkml:
http://marc.info/?l=linux-kernel&m=127713622821166
James
James Bottomley <[email protected]> writes:
> The system in question has a set of root bridges, with the CD rom (which
> is seen) being on bus 0 and the SCSI controller, which doesn't show up
> being on bus 1 (with several other things on busses > 1).
>
> The system uses ACPI to detect the multiple bridges, so it seems that
> this failure in 2.6.35-rc3:
>
> ACPI: Core revision 20100428
> ACPI Error: Hardware did not enter ACPI mode (20100428/evxfevnt-103)
> ACPI Warning: AcpiEnable failed (20100428/utxface-147)
> ACPI: Unable to enable ACPI
That probably causes all the other problems. Without ACPI enabled
modern systems generally do not work.
Just guessing, but maybe try to revert
b430acbd7c4b919886fa7fd92eeb7a695f1940d3
That was the last change in this area.
-Andi
--
[email protected] -- Speaking for myself only.
On Mon, 2010-06-21 at 18:18 +0200, Andi Kleen wrote:
> James Bottomley <[email protected]> writes:
>
> > The system in question has a set of root bridges, with the CD rom (which
> > is seen) being on bus 0 and the SCSI controller, which doesn't show up
> > being on bus 1 (with several other things on busses > 1).
> >
> > The system uses ACPI to detect the multiple bridges, so it seems that
> > this failure in 2.6.35-rc3:
> >
> > ACPI: Core revision 20100428
> > ACPI Error: Hardware did not enter ACPI mode (20100428/evxfevnt-103)
> > ACPI Warning: AcpiEnable failed (20100428/utxface-147)
> > ACPI: Unable to enable ACPI
>
> That probably causes all the other problems. Without ACPI enabled
> modern systems generally do not work.
>
> Just guessing, but maybe try to revert
>
> b430acbd7c4b919886fa7fd92eeb7a695f1940d3
>
> That was the last change in this area.
I can confirm that reverting this over a vanilla 2.6.35-rc3 fixes my
boot problem.
James
On Mon, 2010-06-21 at 11:33 -0500, James Bottomley wrote:
> On Mon, 2010-06-21 at 18:18 +0200, Andi Kleen wrote:
> > James Bottomley <[email protected]> writes:
> >
> > > The system in question has a set of root bridges, with the CD rom (which
> > > is seen) being on bus 0 and the SCSI controller, which doesn't show up
> > > being on bus 1 (with several other things on busses > 1).
> > >
> > > The system uses ACPI to detect the multiple bridges, so it seems that
> > > this failure in 2.6.35-rc3:
> > >
> > > ACPI: Core revision 20100428
> > > ACPI Error: Hardware did not enter ACPI mode (20100428/evxfevnt-103)
> > > ACPI Warning: AcpiEnable failed (20100428/utxface-147)
> > > ACPI: Unable to enable ACPI
> >
> > That probably causes all the other problems. Without ACPI enabled
> > modern systems generally do not work.
> >
> > Just guessing, but maybe try to revert
> >
> > b430acbd7c4b919886fa7fd92eeb7a695f1940d3
> >
> > That was the last change in this area.
>
> I can confirm that reverting this over a vanilla 2.6.35-rc3 fixes my
> boot problem.
Andi asked for the timings with the commit reverted, so here they are
(from ACPI version print to PCI root busses found):
[ 0.084086] ACPI: Core revision 20100428
[ 0.119779] Setting APIC routing to physical flat
[ 0.124259] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.128000] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 0.128000] ...trying to set up timer (IRQ0) through the 8259A ...
[ 0.128000] ..... (found apic 0 pin 2) ...
[ 0.128000] ....... failed.
[ 0.128000] ...trying to set up timer as Virtual Wire IRQ...
[ 0.169681] ..... works.
[ 0.172005] CPU0: Intel(R) Xeon(TM) MP CPU 3.16GHz stepping 01
[ 0.188000] lockdep: fixing up alternatives.
[ 0.188133] Booting Node 0, Processors #1 Ok.
[ 0.288089] Brought up 2 CPUs
[ 0.292006] Total of 2 processors activated (12676.86 BogoMIPS).
[ 0.300923] khelper used greatest stack depth: 5864 bytes left
[ 0.305017] khelper used greatest stack depth: 5176 bytes left
[ 0.325510] NET: Registered protocol family 16
[ 0.341322] ACPI: bus type pci registered
[ 0.344474] PCI: Using configuration type 1 for base access
[ 0.359165] bio: create slab <bio-0> at 0
[ 0.478153] ACPI: Interpreter enabled
[ 0.488006] ACPI: (supports S0 S5)
[ 0.497875] ACPI: Using IOAPIC for interrupt routing
[ 0.548145] ACPI: No dock devices found.
[ 0.552176] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[ 0.556263] ACPI: PCI Root Bridge [VP00] (domain 0000 [bus 00])
[ 0.568051] ACPI: PCI Root Bridge [VP01] (domain 0000 [bus 01])
[ 0.576336] ACPI: PCI Root Bridge [VP02] (domain 0000 [bus 02-03])
[ 0.596889] ACPI: PCI Root Bridge [VP03] (domain 0000 [bus 04-05])
[ 0.605222] ACPI: PCI Root Bridge [VP04] (domain 0000 [bus 06-07])
[ 0.613530] ACPI: PCI Root Bridge [VP05] (domain 0000 [bus 08-09])
[ 0.617658] ACPI: PCI Root Bridge [VP06] (domain 0000 [bus 0a-0b])
[ 0.621672] ACPI: PCI Root Bridge [VP07] (domain 0000 [bus 0c-0d])
[ 0.625948] HEST: Table is not found!
James
On Mon, 2010-06-21 at 11:54 -0500, James Bottomley wrote:
> On Mon, 2010-06-21 at 11:33 -0500, James Bottomley wrote:
> > On Mon, 2010-06-21 at 18:18 +0200, Andi Kleen wrote:
> > > James Bottomley <[email protected]> writes:
> > >
> > > > The system in question has a set of root bridges, with the CD rom (which
> > > > is seen) being on bus 0 and the SCSI controller, which doesn't show up
> > > > being on bus 1 (with several other things on busses > 1).
> > > >
> > > > The system uses ACPI to detect the multiple bridges, so it seems that
> > > > this failure in 2.6.35-rc3:
> > > >
> > > > ACPI: Core revision 20100428
> > > > ACPI Error: Hardware did not enter ACPI mode (20100428/evxfevnt-103)
> > > > ACPI Warning: AcpiEnable failed (20100428/utxface-147)
> > > > ACPI: Unable to enable ACPI
> > >
> > > That probably causes all the other problems. Without ACPI enabled
> > > modern systems generally do not work.
> > >
> > > Just guessing, but maybe try to revert
> > >
> > > b430acbd7c4b919886fa7fd92eeb7a695f1940d3
> > >
> > > That was the last change in this area.
> >
> > I can confirm that reverting this over a vanilla 2.6.35-rc3 fixes my
> > boot problem.
>
> Andi asked for the timings with the commit reverted, so here they are
> (from ACPI version print to PCI root busses found):
>
> [ 0.084086] ACPI: Core revision 20100428
> [ 0.119779] Setting APIC routing to physical flat
> [ 0.124259] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.128000] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> [ 0.128000] ...trying to set up timer (IRQ0) through the 8259A ...
> [ 0.128000] ..... (found apic 0 pin 2) ...
> [ 0.128000] ....... failed.
> [ 0.128000] ...trying to set up timer as Virtual Wire IRQ...
> [ 0.169681] ..... works.
> [ 0.172005] CPU0: Intel(R) Xeon(TM) MP CPU 3.16GHz stepping 01
> [ 0.188000] lockdep: fixing up alternatives.
> [ 0.188133] Booting Node 0, Processors #1 Ok.
> [ 0.288089] Brought up 2 CPUs
> [ 0.292006] Total of 2 processors activated (12676.86 BogoMIPS).
> [ 0.300923] khelper used greatest stack depth: 5864 bytes left
> [ 0.305017] khelper used greatest stack depth: 5176 bytes left
> [ 0.325510] NET: Registered protocol family 16
> [ 0.341322] ACPI: bus type pci registered
> [ 0.344474] PCI: Using configuration type 1 for base access
> [ 0.359165] bio: create slab <bio-0> at 0
> [ 0.478153] ACPI: Interpreter enabled
> [ 0.488006] ACPI: (supports S0 S5)
> [ 0.497875] ACPI: Using IOAPIC for interrupt routing
> [ 0.548145] ACPI: No dock devices found.
> [ 0.552176] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
> [ 0.556263] ACPI: PCI Root Bridge [VP00] (domain 0000 [bus 00])
> [ 0.568051] ACPI: PCI Root Bridge [VP01] (domain 0000 [bus 01])
> [ 0.576336] ACPI: PCI Root Bridge [VP02] (domain 0000 [bus 02-03])
> [ 0.596889] ACPI: PCI Root Bridge [VP03] (domain 0000 [bus 04-05])
> [ 0.605222] ACPI: PCI Root Bridge [VP04] (domain 0000 [bus 06-07])
> [ 0.613530] ACPI: PCI Root Bridge [VP05] (domain 0000 [bus 08-09])
> [ 0.617658] ACPI: PCI Root Bridge [VP06] (domain 0000 [bus 0a-0b])
> [ 0.621672] ACPI: PCI Root Bridge [VP07] (domain 0000 [bus 0c-0d])
> [ 0.625948] HEST: Table is not found!
Another update: It's clear the problem is the new expectation that the
hardware sets ACPI mode instantly. I applied the following patch on my
system to see how long it actually takes. I see a single loop:
[ 0.084087] ACPI: Core revision 20100428
[ 0.119720] ACPI Error: Hardware did not enter ACPI mode, retry = 0 (20100428/evxfevnt-104)
[ 0.128240] Setting APIC routing to physical flat
So it needs a delay of about 1ms (or one retry).
James
---
diff --git a/drivers/acpi/acpica/evxfevnt.c b/drivers/acpi/acpica/evxfevnt.c
index d5a5efc..bace473 100644
--- a/drivers/acpi/acpica/evxfevnt.c
+++ b/drivers/acpi/acpica/evxfevnt.c
@@ -70,6 +70,7 @@ acpi_ev_get_gpe_device(struct acpi_gpe_xrupt_info *gpe_xrupt_info,
acpi_status acpi_enable(void)
{
acpi_status status;
+ int retry = 0;
ACPI_FUNCTION_TRACE(acpi_enable);
@@ -98,6 +99,11 @@ acpi_status acpi_enable(void)
/* Sanity check that transition succeeded */
+ while (acpi_hw_get_mode() != ACPI_SYS_MODE_ACPI && retry < 3000) {
+ ACPI_ERROR((AE_INFO,
+ "Hardware did not enter ACPI mode, retry = %d", retry++));
+ acpi_os_stall(1000);
+ }
if (acpi_hw_get_mode() != ACPI_SYS_MODE_ACPI) {
ACPI_ERROR((AE_INFO,
"Hardware did not enter ACPI mode"));
On poniedziaĆek, 21 czerwca 2010 o 18:03:09 James Bottomley wrote:
> The system in question has a set of root bridges, with the CD rom (which
> is seen) being on bus 0 and the SCSI controller, which doesn't show up
> being on bus 1 (with several other things on busses > 1).
>
> The system uses ACPI to detect the multiple bridges, so it seems that
> this failure in 2.6.35-rc3:
>
> ACPI: Core revision 20100428
> ACPI Error: Hardware did not enter ACPI mode (20100428/evxfevnt-103)
> ACPI Warning: AcpiEnable failed (20100428/utxface-147)
> ACPI: Unable to enable ACPI
>
> Means that we can't detect any PCI bus other than 0 and thus the system
> won't boot.
>
> The full dmesgs for a successful (2.6.34) and failed (2.6.35-rc3) boot
> are below.
>
> James
I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=16271
for your bug report, please add your address to the CC list in there, thanks!
--
Maciej Rutecki
http://www.maciek.unixy.pl
James,
Is this a production system with a production BIOS?
BTW, "Instantly" here isn't actually instantly, it is
"before the return from SMM".
thanks,
Len Brown, Intel Open Source Technology Center
From: Len Brown <[email protected]>
Subject: [PATCH] ACPI: handle systems which asynchoronously enable ACPI mode
Folklore suggested that such systems existed
in the pre-history of ACPI.
However, we removed the SCI_EN polling loop from
acpi_hw_set_mode() in b430acbd7c4b919886fa7fd92eeb7a695f1940d3
because it delayed resume by 3 seconds on boxes
that refused to set SCI_EN.
Matthew removed the call to acpi_enable() from
the suspend resume path.
James found a modern system that still needs to be polled
upon boot.
So here we restore the workaround, except that we
put it in acpi_enable() rather than the low level
acpi_hw_set_mode().
https://bugzilla.kernel.org/show_bug.cgi?id=16271
Signed-off-by: Len Brown <[email protected]>
---
James, What does the IBM system see with this patch?
thanks,
-Len
drivers/acpi/acpica/evxfevnt.c | 19 +++++++++++--------
1 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/acpi/acpica/evxfevnt.c b/drivers/acpi/acpica/evxfevnt.c
index d97b8dc..18b3f14 100644
--- a/drivers/acpi/acpica/evxfevnt.c
+++ b/drivers/acpi/acpica/evxfevnt.c
@@ -70,6 +70,7 @@ acpi_ev_get_gpe_device(struct acpi_gpe_xrupt_info *gpe_xrupt_info,
acpi_status acpi_enable(void)
{
acpi_status status;
+ int retry;
ACPI_FUNCTION_TRACE(acpi_enable);
@@ -98,16 +99,18 @@ acpi_status acpi_enable(void)
/* Sanity check that transition succeeded */
- if (acpi_hw_get_mode() != ACPI_SYS_MODE_ACPI) {
- ACPI_ERROR((AE_INFO,
- "Hardware did not enter ACPI mode"));
- return_ACPI_STATUS(AE_NO_HARDWARE_RESPONSE);
+ for (retry = 0; retry < 30000; ++retry) {
+ if (acpi_hw_get_mode() == ACPI_SYS_MODE_ACPI) {
+ if (retry != 0)
+ ACPI_WARNING((AE_INFO,
+ "Platform took > %d00 usec to enter ACPI mode", retry));
+ return_ACPI_STATUS(AE_OK);
+ }
+ acpi_os_stall(100); /* 100 usec */
}
- ACPI_DEBUG_PRINT((ACPI_DB_INIT,
- "Transition to ACPI mode successful\n"));
-
- return_ACPI_STATUS(AE_OK);
+ ACPI_ERROR((AE_INFO, "Hardware did not enter ACPI mode"));
+ return_ACPI_STATUS(AE_NO_HARDWARE_RESPONSE);
}
ACPI_EXPORT_SYMBOL(acpi_enable)
--
1.7.2.rc0
On Mon, 2010-06-28 at 17:59 -0400, Len Brown wrote:
> Is this a production system with a production BIOS?
It's got a production BIOS, yes ... not sure about the system, it might
be a B model.
> BTW, "Instantly" here isn't actually instantly, it is
> "before the return from SMM".
ENOCONTEXT?
As I explained in the previous email, it takes about 1ms before it
reports as entered ACPI mode.
James
On Mon, 2010-06-28 at 23:49 -0400, Len Brown wrote:
> From: Len Brown <[email protected]>
> Subject: [PATCH] ACPI: handle systems which asynchoronously enable ACPI mode
>
> Folklore suggested that such systems existed
> in the pre-history of ACPI.
>
> However, we removed the SCI_EN polling loop from
> acpi_hw_set_mode() in b430acbd7c4b919886fa7fd92eeb7a695f1940d3
> because it delayed resume by 3 seconds on boxes
> that refused to set SCI_EN.
>
> Matthew removed the call to acpi_enable() from
> the suspend resume path.
>
> James found a modern system that still needs to be polled
> upon boot.
>
> So here we restore the workaround, except that we
> put it in acpi_enable() rather than the low level
> acpi_hw_set_mode().
>
> https://bugzilla.kernel.org/show_bug.cgi?id=16271
>
> Signed-off-by: Len Brown <[email protected]>
> ---
>
> James, What does the IBM system see with this patch?
The output is this:
[ 0.084088] ACPI: Core revision 20100428
[ 0.119451] ACPI Warning: Platform took > 100 usec to enter ACPI mode (20100428/evxfevnt-106)
[ 0.128231] Setting APIC routing to physical flat
So it's very fast ... actually it's behaving like there's some caching
issue and it needs two reads to return the correct value
James