2009-10-12 12:06:58

by Jens Axboe

Subject: pci-express hotplug

Hi,

I'm trying to get pci-express hotplug working in a box here. I don't
really care about the hotplug aspect, I just want the darn pci-e slots
that are designated hotplug slots to actually WORK. When I load pciehp,
I get:

Firmware did not grant requested _OSC control
Firmware did not grant requested _OSC control
Firmware did not grant requested _OSC control
Firmware did not grant requested _OSC control
pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
Firmware did not grant requested _OSC control
pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
Firmware did not grant requested _OSC control
pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
pciehp: PCI Express Hot Plug Controller Driver version: 0.4

and the devices in the hotplug slots stay off. Is this an ACPI/bios
issue? How can I debug this?

--
Jens Axboe


2009-10-12 14:56:12

by Greg KH

Subject: Re: pci-express hotplug

On Mon, Oct 12, 2009 at 02:06:20PM +0200, Jens Axboe wrote:
> Hi,
>
> I'm trying to get pci-express hotplug working in a box here. I don't
> really care about the hotplug aspect, I just want the darn pci-e slots
> that are designated hotplug slots to actually WORK. When I load pciehp,
> I get:
>
> Firmware did not grant requested _OSC control
> Firmware did not grant requested _OSC control
> Firmware did not grant requested _OSC control
> Firmware did not grant requested _OSC control
> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
> Firmware did not grant requested _OSC control
> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
> Firmware did not grant requested _OSC control
> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>
> and the devices in the hotplug slots stay off. Is this an ACPI/bios
> issue? How can I debug this?

Can you try the acpiphp driver instead? That's usually the driver you
want to use for "modern" systems (i.e. anything made in the past 5
years.)

thanks,

greg k-h

2009-10-12 14:57:39

by Jens Axboe

Subject: Re: pci-express hotplug

On Mon, Oct 12 2009, Greg KH wrote:
> On Mon, Oct 12, 2009 at 02:06:20PM +0200, Jens Axboe wrote:
> > Hi,
> >
> > I'm trying to get pci-express hotplug working in a box here. I don't
> > really care about the hotplug aspect, I just want the darn pci-e slots
> > that are designated hotplug slots to actually WORK. When I load pciehp,
> > I get:
> >
> > Firmware did not grant requested _OSC control
> > Firmware did not grant requested _OSC control
> > Firmware did not grant requested _OSC control
> > Firmware did not grant requested _OSC control
> > pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
> > pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
> > Firmware did not grant requested _OSC control
> > pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
> > pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
> > Firmware did not grant requested _OSC control
> > pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
> > pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
> > pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
> > pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
> > pciehp: PCI Express Hot Plug Controller Driver version: 0.4
> >
> > and the devices in the hotplug slots stay off. Is this an ACPI/bios
> > issue? How can I debug this?
>
> Can you try the acpiphp driver instead? That's usually the driver you
> want to use for "modern" systems (i.e. anything made in the past 5
> years.)

I should have mentioned that I tried that too. It doesn't complain, but
I don't see my cards anywhere afterwards. I'm a hotplug newbie, do I
need to do anything else?

--
Jens Axboe

2009-10-12 15:01:16

by Mark Lord

Subject: Re: pci-express hotplug

Jens Axboe wrote:
> On Mon, Oct 12 2009, Greg KH wrote:
>> On Mon, Oct 12, 2009 at 02:06:20PM +0200, Jens Axboe wrote:
>>> Hi,
>>>
>>> I'm trying to get pci-express hotplug working in a box here. I don't
>>> really care about the hotplug aspect, I just want the darn pci-e slots
>>> that are designated hotplug slots to actually WORK. When I load pciehp,
>>> I get:
>>>
>>> Firmware did not grant requested _OSC control
>>> Firmware did not grant requested _OSC control
>>> Firmware did not grant requested _OSC control
>>> Firmware did not grant requested _OSC control
>>> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
>>> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
>>> Firmware did not grant requested _OSC control
>>> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
>>> Firmware did not grant requested _OSC control
>>> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
>>> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
>>> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
>>> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>>>
>>> and the devices in the hotplug slots stay off. Is this an ACPI/bios
>>> issue? How can I debug this?
>> Can you try the acpiphp driver instead? That's usually the driver you
>> want to use for "modern" systems (i.e. anything made in the past 5
>> years.)
>
> I should have mentioned that I tried that too. It doesn't complain, but
> I don't see my cards anywhere afterwards. I'm a hotplug newbie, do I
> need to do anything else?

Tried this yet:

options pciehp pciehp_force=1

??

2009-10-12 15:06:43

by Jens Axboe

Subject: Re: pci-express hotplug

On Mon, Oct 12 2009, Mark Lord wrote:
> Jens Axboe wrote:
>> On Mon, Oct 12 2009, Greg KH wrote:
>>> On Mon, Oct 12, 2009 at 02:06:20PM +0200, Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> I'm trying to get pci-express hotplug working in a box here. I don't
>>>> really care about the hotplug aspect, I just want the darn pci-e slots
>>>> that are designated hotplug slots to actually WORK. When I load pciehp,
>>>> I get:
>>>>
>>>> Firmware did not grant requested _OSC control
>>>> Firmware did not grant requested _OSC control
>>>> Firmware did not grant requested _OSC control
>>>> Firmware did not grant requested _OSC control
>>>> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
>>>> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
>>>> Firmware did not grant requested _OSC control
>>>> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
>>>> Firmware did not grant requested _OSC control
>>>> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
>>>> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
>>>> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
>>>> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>>>>
>>>> and the devices in the hotplug slots stay off. Is this an ACPI/bios
>>>> issue? How can I debug this?
>>> Can you try the acpiphp driver instead? That's usually the driver you
>>> want to use for "modern" systems (i.e. anything made in the past 5
>>> years.)
>>
>> I should have mentioned that I tried that too. It doesn't complain, but
>> I don't see my cards anywhere afterwards. I'm a hotplug newbie, do I
>> need to do anything else?
>
> Tried this yet:
>
> options pciehp pciehp_force=1

Nope, but it does find and register the hotplug slots, so I didn't think
it would make a difference. The _OSC is there.

I'll try tonight, just in case.

--
Jens Axboe

2009-10-12 21:49:34

by Alex Chiang

Subject: Re: pci-express hotplug

> >>> On Mon, Oct 12, 2009 at 02:06:20PM +0200, Jens Axboe wrote:
> >>>> I'm trying to get pci-express hotplug working in a box here. I don't
> >>>> really care about the hotplug aspect, I just want the darn pci-e slots
> >>>> that are designated hotplug slots to actually WORK. When I load pciehp,
> >>>> I get:
> >>>>
> >>>> Firmware did not grant requested _OSC control
> >>>> Firmware did not grant requested _OSC control
> >>>> Firmware did not grant requested _OSC control
> >>>> Firmware did not grant requested _OSC control

This isn't just a benign message. It means the OS asked to take
over control of the slots and firmware really did say, "nope,
sorry".

Which means that this:

> On Mon, Oct 12 2009, Mark Lord wrote:
> > Tried this yet:
> >
> > options pciehp pciehp_force=1

Is generally a bad idea.

Don't do it unless you really know your platform well.

> >> On Mon, Oct 12 2009, Greg KH wrote:
> >>> Can you try the acpiphp driver instead? That's usually the
> >>> driver you want to use for "modern" systems (i.e. anything
> >>> made in the past 5 years.)
> >>
> >> I should have mentioned that I tried that too. It doesn't
> >> complain, but I don't see my cards anywhere afterwards. I'm
> >> a hotplug newbie, do I need to do anything else?

Can you modprobe acpiphp with debug=1? And send the output?

Thanks.

/ac

2009-10-13 03:20:37

by Kenji Kaneshige

Subject: Re: pci-express hotplug

Jens Axboe wrote:
> Hi,
>
> I'm trying to get pci-express hotplug working in a box here. I don't
> really care about the hotplug aspect, I just want the darn pci-e slots
> that are designated hotplug slots to actually WORK. When I load pciehp,
> I get:
>
> Firmware did not grant requested _OSC control
> Firmware did not grant requested _OSC control
> Firmware did not grant requested _OSC control
> Firmware did not grant requested _OSC control
> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
> Firmware did not grant requested _OSC control
> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
> Firmware did not grant requested _OSC control
> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>
> and the devices in the hotplug slots stay off. Is this an ACPI/bios
> issue? How can I debug this?
>

Could you give me the result of "ls -lR /sys/bus/pci/slots/"
after loading pciehp?

Thanks,
Kenji Kaneshige

2009-10-13 08:29:42

by Jens Axboe

Subject: Re: pci-express hotplug

On Mon, Oct 12 2009, Alex Chiang wrote:
> > >>> On Mon, Oct 12, 2009 at 02:06:20PM +0200, Jens Axboe wrote:
> > >>>> I'm trying to get pci-express hotplug working in a box here. I don't
> > >>>> really care about the hotplug aspect, I just want the darn pci-e slots
> > >>>> that are designated hotplug slots to actually WORK. When I load pciehp,
> > >>>> I get:
> > >>>>
> > >>>> Firmware did not grant requested _OSC control
> > >>>> Firmware did not grant requested _OSC control
> > >>>> Firmware did not grant requested _OSC control
> > >>>> Firmware did not grant requested _OSC control
>
> This isn't just a benign message. It means the OS asked to take
> over control of the slots and firmware really did say, "nope,
> sorry".
>
> Which means that this:
>
> > On Mon, Oct 12 2009, Mark Lord wrote:
> > > Tried this yet:
> > >
> > > options pciehp pciehp_force=1
>
> Is generally a bad idea.
>
> Don't do it unless you really know your platform well.

Since I had nothing to lose, I tried it. This is what it prints:

Firmware did not grant requested _OSC control
Firmware did not grant requested _OSC control
Firmware did not grant requested _OSC control
pciehp 0000:00:05.0:pcie04: Bypassing BIOS check for pciehp use on 0000:00:05.0
pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
pciehp 0000:00:05.0:pcie04: Power fault bit 0 set

and modprobe is continually in msleep() afterwards.

> > >> On Mon, Oct 12 2009, Greg KH wrote:
> > >>> Can you try the acpiphp driver instead? That's usually the
> > >>> driver you want to use for "modern" systems (i.e. anything
> > >>> made in the past 5 years.)
> > >>
> > >> I should have mentioned that I tried that too. It doesn't
> > >> complain, but I don't see my cards anywhere afterwards. I'm
> > >> a hotplug newbie, do I need to do anything else?
>
> Can you modprobe acpiphp with debug=1? And send the output?

acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
acpiphp: Slot [1] registered
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
acpiphp: Slot [2] registered
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
acpiphp: Slot [6] registered
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
acpiphp: Slot [7] registered
acpiphp_glue: Bus 0000:87 has 1 slot
acpiphp_glue: Bus 0000:84 has 1 slot
acpiphp_glue: Bus 0000:0b has 1 slot
acpiphp_glue: Bus 0000:08 has 1 slot
acpiphp_glue: Total 4 slots

--
Jens Axboe

2009-10-13 08:32:32

by Jens Axboe

Subject: Re: pci-express hotplug

On Tue, Oct 13 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> Hi,
>>
>> I'm trying to get pci-express hotplug working in a box here. I don't
>> really care about the hotplug aspect, I just want the darn pci-e slots
>> that are designated hotplug slots to actually WORK. When I load pciehp,
>> I get:
>>
>> Firmware did not grant requested _OSC control
>> Firmware did not grant requested _OSC control
>> Firmware did not grant requested _OSC control
>> Firmware did not grant requested _OSC control
>> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
>> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
>> Firmware did not grant requested _OSC control
>> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
>> Firmware did not grant requested _OSC control
>> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
>> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
>> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
>> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>>
>> and the devices in the hotplug slots stay off. Is this an ACPI/bios
>> issue? How can I debug this?
>>
>
> Could you give me the result of "ls -lR /sys/bus/pci/slots/"
> after loading pciehp?

I have attached the result of that ls prior to loading pciehp/acpiphp
(pre-load), after loading pciehp (pciehp-load), and with acpiphp loaded
only as well (acpiphp-load).

--
Jens Axboe


Attachments:
(No filename) (1.59 kB)
acpiphp-load.bz2 (2.01 kB)
pre-load.bz2 (1.94 kB)
pciehp-load.bz2 (2.07 kB)

2009-10-13 10:49:51

by Kenji Kaneshige

Subject: Re: pci-express hotplug

Jens Axboe wrote:
> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> Hi,
>>>
>>> I'm trying to get pci-express hotplug working in a box here. I don't
>>> really care about the hotplug aspect, I just want the darn pci-e slots
>>> that are designated hotplug slots to actually WORK. When I load pciehp,
>>> I get:
>>>
>>> Firmware did not grant requested _OSC control
>>> Firmware did not grant requested _OSC control
>>> Firmware did not grant requested _OSC control
>>> Firmware did not grant requested _OSC control
>>> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
>>> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
>>> Firmware did not grant requested _OSC control
>>> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
>>> Firmware did not grant requested _OSC control
>>> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
>>> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
>>> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
>>> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>>>
>>> and the devices in the hotplug slots stay off. Is this an ACPI/bios
>>> issue? How can I debug this?
>>>
>> Could you give me the result of "ls -lR /sys/bus/pci/slots/"
>> after loading pciehp?
>
> I have attached the result of that ls prior to loading pciehp/acpiphp
> (pre-load), after loading pciehp (pciehp-load), and with acpiphp loaded
> only as well (acpiphp-load).
>

Thank you for the info. From the information, I confirmed that hotplug
slots are detected by pciehp even though _OSC evaluation failed. There
are two ways to take control from the firmware through ACPI control
method. One is _OSC control method, and the other is OSHP control method.
I guess your ACPI firmware has both _OSC and OSHP on DSDT (ACPI Namespace),
and pciehp assumes that it took control through OSHP after the _OSC
evaluation failure. I think this pciehp's behavior is wrong because of
the following reasons and I think pciehp driver mis-detected the hotplug
slots on your environment because of this.

- According to the PCI firmware specification, pciehp driver must use the
result of _OSC, if the platform implements both _OSC and OSHP.
- OSHP control method seems only for SHPC, not for PCI Express native
hotplug. So pciehp must not evaluate OSHP to take control from firmware.

To confirm this, could you send me the dmesg output after loading pciehp
with 'debug_acpi' of pci_hotplug (PCI hotplug core driver) enabled?
For example,

$ su
# echo Y > /sys/module/pci_hotplug/parameters/debug_acpi
# modprobe pciehp
# dmesg

And if it is possible, could you send me DSDT of your platform?

Anyway, my recommendation is using acpiphp on your environment because
your firmware didn't grant control over hotplug control through _OSC.
From the information, acpiphp also detects the hotplug slots successfully.
Please try "echo 1 > /sys/bus/pci/slots/<slot#>/power". It would turn on
the slot and initialize adapter card on the slot.
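
For anyone repeating the power-on step above across several slots, a minimal
sketch follows. It assumes a POSIX shell and root privileges. The function
name power_on_slots and its optional directory argument are hypothetical,
added only so the loop can be tried against a scratch directory. It writes 1
to each slot's power file and then reads the value back, because an accepted
write does not by itself show that the slot powered up.

```shell
#!/bin/sh
# Sketch only: power_on_slots is a hypothetical helper, not an existing tool.
# The default path is the one quoted in this thread; pass another directory
# as the first argument to exercise the loop without touching real hardware.
power_on_slots() {
    slots_dir=${1:-/sys/bus/pci/slots}
    for slot in "$slots_dir"/*; do
        # Skip entries that expose no power attribute.
        [ -e "$slot/power" ] || continue
        # Writing 1 asks the loaded hotplug driver to enable the slot.
        echo 1 > "$slot/power" || echo "slot ${slot##*/}: write failed"
        # Read the state back; a value of 0 means the slot is still off.
        echo "slot ${slot##*/}: power=$(cat "$slot/power")"
    done
}
```

Called with no argument as root, it prints one "slot N: power=..." line per
registered slot.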

Thanks,
Kenji Kaneshige

2009-10-13 11:26:05

by Jens Axboe

Subject: Re: pci-express hotplug

On Tue, Oct 13 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>>> Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> I'm trying to get pci-express hotplug working in a box here. I don't
>>>> really care about the hotplug aspect, I just want the darn pci-e slots
>>>> that are designated hotplug slots to actually WORK. When I load pciehp,
>>>> I get:
>>>>
>>>> Firmware did not grant requested _OSC control
>>>> Firmware did not grant requested _OSC control
>>>> Firmware did not grant requested _OSC control
>>>> Firmware did not grant requested _OSC control
>>>> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
>>>> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
>>>> Firmware did not grant requested _OSC control
>>>> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
>>>> Firmware did not grant requested _OSC control
>>>> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
>>>> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
>>>> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
>>>> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>>>>
>>>> and the devices in the hotplug slots stay off. Is this an ACPI/bios
>>>> issue? How can I debug this?
>>>>
>>> Could you give me the result of "ls -lR /sys/bus/pci/slots/"
>>> after loading pciehp?
>>
>> I have attached the result of that ls prior to loading pciehp/acpiphp
>> (pre-load), after loading pciehp (pciehp-load), and with acpiphp loaded
>> only as well (acpiphp-load).
>>
>
> Thank you for the info. From the information, I confirmed that hotplug
> slots are detected by pciehp even though _OSC evaluation failed. There
> are two ways to take control from the firmware through ACPI control
> method. One is _OSC control method, and the other is OSHP control method.
> I guess your ACPI firmware has both _OSC and OSHP on DSDT (ACPI Namespace),
> and pciehp assumes that it took control through OSHP after the _OSC
> evaluation failure. I think this pciehp's behavior is wrong because of
> the following reasons and I think pciehp driver mis-detected the hotplug
> slots on your environment because of this.
>
> - According to the PCI firmware specification, pciehp driver must use the
> result of _OSC, if the platform implements both _OSC and OSHP.
> - OSHP control method seems only for SHPC, not for PCI Express native hot-
> plug. So pciehp must not evaluate OSHP to take control from firmware.
>
> To confirm this, could you send me the dmesg output after loading pciehp
> with 'debug_acpi' of pci_hotplug (PCI hotplug core driver) enabled?
> For example,
>
> $ su
> # echo Y > /sys/module/pci_hotplug/parameters/debug_acpi
> # modprobe pciehp
> # dmesg

See below.

> And if it is possible, could you send me DSDT of your platform?

Not sure I can do that, I'll check.

> Anyway, my recommendation is using acpiphp on your environment because
> your firmware didn't grant control over hotplug control through _OSC.
> From the information, acpiphp also detects the hotplug slots successfully.
> Please try "echo 1 > /sys/bus/pci/slots/<slot#>/power". It would turn on
> the slot and initialize adapter card on the slot.

It does find the 4 slots correctly. But if I try to turn on the power,
nothing happens and 'power' stays at 0. If I do the same with pciehp, I
get the same hang as described when using pciehp with pciehp_force=1.
But apparently this machine is getting a board replacement very soon, so
it may solve itself. Unless you think it should work and there's
something I can try to check, let's just leave this issue until I
get it upgraded and return from kernel summit / JLS.


acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP5
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:05.0 (\_SB_.IOH0.MRP5)
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP7
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:07.0 (\_SB_.IOH0.MRP7)
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.PEX0
acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0.PEX0 OSHP not found
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0 OSHP not found
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Cannot get control of hotplug hardware for pci 0000:00:1c.0
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPI
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:07.0 (\_SB_.IOH1.MRPI)
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPK
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:09.0 (\_SB_.IOH1.MRPK)
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP5
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:05.0 (\_SB_.IOH0.MRP5)
pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP7
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:07.0 (\_SB_.IOH0.MRP7)
pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.PEX0
acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0.PEX0 OSHP not found
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0 OSHP not found
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Cannot get control of hotplug hardware for pci 0000:00:1c.0
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPI
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:07.0 (\_SB_.IOH1.MRPI)
pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPK
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:09.0 (\_SB_.IOH1.MRPK)
pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
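
A quick way to condense a log like the one above is to tally the three
messages of interest. The sketch below assumes the message strings exactly
as they appear in this output; summarize_hp_control is a hypothetical helper
name, not part of any kernel tooling.

```shell
#!/bin/sh
# Sketch only: counts the hotplug-control messages seen in this thread.
# Reads log text on standard input and prints a single summary line.
summarize_hp_control() {
    awk '
        /Firmware did not grant requested _OSC control/ { osc_denied++ }
        /Gained control for hotplug HW/                 { gained++ }
        /Cannot get control of hotplug hardware/        { failed++ }
        END {
            # Unset awk counters print as 0 with %d.
            printf "osc_denied=%d gained=%d failed=%d\n", osc_denied, gained, failed
        }
    '
}
```

Typical use would be "dmesg | summarize_hp_control" right after loading
pciehp with debug_acpi enabled.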

--
Jens Axboe

2009-10-13 17:28:13

by Alex Chiang

Subject: Re: pci-express hotplug

* Jens Axboe wrote:
> > > >> On Mon, Oct 12 2009, Greg KH wrote:
> > > >>> Can you try the acpiphp driver instead? That's usually the
> > > >>> driver you want to use for "modern" systems (i.e. anything
> > > >>> made in the past 5 years.)
> > > >>
> > > >> I should have mentioned that I tried that too. It doesn't
> > > >> complain, but I don't see my cards anywhere afterwards. I'm
> > > >> a hotplug newbie, do I need to do anything else?
> >
> > Can you modprobe acpiphp with debug=1? And send the output?
>
> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
> acpiphp: Slot [1] registered
> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
> acpiphp: Slot [2] registered
> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
> acpiphp: Slot [6] registered
> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
> acpiphp: Slot [7] registered
> acpiphp_glue: Bus 0000:87 has 1 slot
> acpiphp_glue: Bus 0000:84 has 1 slot
> acpiphp_glue: Bus 0000:0b has 1 slot
> acpiphp_glue: Bus 0000:08 has 1 slot
> acpiphp_glue: Total 4 slots

You mentioned in another mail that you echoed 1 into the various
slots' power files.

Did you do that after modprobing acpiphp with debug=1?

If so, there should be debug output when you try and turn them
on.

Also, quick dummy check, you are trying to power on populated
slots, right? :)

Can you send the output of lspci -vv? And I like the output of
lspci -vt as well... Both before and after loading acpiphp
please.
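
One way to collect the requested before/after output is sketched below.
The helper name capture_lspci and the file naming are hypothetical; lspci -vv,
lspci -vt and the acpiphp debug=1 parameter are the ones mentioned in this
thread.

```shell
#!/bin/sh
# Sketch only: capture_lspci is a hypothetical helper.
# Usage: capture_lspci <tag> [output-dir]
# Saves both views so the two runs can be compared afterwards.
capture_lspci() {
    tag=$1
    out_dir=${2:-.}
    lspci -vv > "$out_dir/lspci-vv.$tag.txt"   # verbose per-device details
    lspci -vt > "$out_dir/lspci-vt.$tag.txt"   # bus topology as a tree
}

# Intended sequence, run as root:
#   capture_lspci before
#   modprobe acpiphp debug=1
#   capture_lspci after
#   diff -u lspci-vt.before.txt lspci-vt.after.txt
```

If powering on a slot worked, the tree view taken afterwards should show
devices behind the bridge that were absent before.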

Thanks.

/ac

2009-10-14 05:27:53

by Kenji Kaneshige

Subject: Re: pci-express hotplug

Jens Axboe wrote:
> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>>>> Jens Axboe wrote:
>>>>> Hi,
>>>>>
>>>>> I'm trying to get pci-express hotplug working in a box here. I don't
>>>>> really care about the hotplug aspect, I just want the darn pci-e slots
>>>>> that are designated hotplug slots to actually WORK. When I load pciehp,
>>>>> I get:
>>>>>
>>>>> Firmware did not grant requested _OSC control
>>>>> Firmware did not grant requested _OSC control
>>>>> Firmware did not grant requested _OSC control
>>>>> Firmware did not grant requested _OSC control
>>>>> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
>>>>> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
>>>>> Firmware did not grant requested _OSC control
>>>>> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>>> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
>>>>> Firmware did not grant requested _OSC control
>>>>> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>>> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
>>>>> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
>>>>> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
>>>>> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>>>>>
>>>>> and the devices in the hotplug slots stay off. Is this an ACPI/bios
>>>>> issue? How can I debug this?
>>>>>
>>>> Could you give me the result of "ls -lR /sys/bus/pci/slots/"
>>>> after loading pciehp?
>>> I have attached the result of that ls prior to loading pciehp/acpiphp
>>> (pre-load), after loading pciehp (pciehp-load), and with acpiphp loaded
>>> only as well (acpiphp-load).
>>>
>> Thank you for the info. From the information, I confirmed that hotplug
>> slots are detected by pciehp even though _OSC evaluation failed. There
>> are two ways to take control from the firmware through ACPI control
>> method. One is _OSC control method, and the other is OSHP control method.
>> I guess your ACPI firmware has both _OSC and OSHP on DSDT (ACPI Namespace),
>> and pciehp assumes that it took control through OSHP after the _OSC
>> evaluation failure. I think this pciehp's behavior is wrong because of
>> the following reasons and I think pciehp driver mis-detected the hotplug
>> slots on your environment because of this.
>>
>> - According to the PCI firmware specification, pciehp driver must use the
>> result of _OSC, if the platform implements both _OSC and OSHP.
>> - OSHP control method seems only for SHPC, not for PCI Express native hot-
>> plug. So pciehp must not evaluate OSHP to take control from firmware.
>>
>> To confirm this, could you send me the dmesg output after loading pciehp
>> with 'debug_acpi' of pci_hotplug (PCI hotplug core driver) enabled?
>> For example,
>>
>> $ su
>> # echo Y > /sys/module/pci_hotplug/parameters/debug_acpi
>> # modprobe pciehp
>> # dmesg
>
> See below.
>
>> And if it is possible, could you send me DSDT of your platform?
>
> Not sure I can do that, I'll check.
>
>> Anyway, my recommendation is using acpiphp on your environment because
>> your firmware didn't grant control over hotplug control through _OSC.
>> From the information, acpiphp also detects the hotplug slots successfully.
>> Please try "echo 1 > /sys/bus/pci/slots/<slot#>/power". It would turn on
>> the slot and initialize adapter card on the slot.
>
> It does find the 4 slots correctly. But if I try to turn on the power,
> nothing happens and 'power' stays at 0. If I do the same with pciehp, I
> get the same hang as described when using pciehp with pciehp_force=1.
> But apparently this machine is getting a board replacement very soon, so
> it may solve itself. Unless you think it should work and there's
> something I can try to check, then let's just leave this issue until I
> get it upgraded and return from kernel summit / JLS.
>

Could you try pciehp with "pciehp_debug" option enabled(*), and give me
the following information?

- "cat /sys/bus/pci/slots/*/*" output
- dmesg output after "echo 1 > /sys/bus/pci/slots/<slot#>/power"

(*) you can enable "pciehp_debug" option as follows

# modprobe pciehp pciehp_debug

I'm not sure, but one possibility is that your hot-plug controller
doesn't have a power controller. Such a slot is usually turned on
automatically when an adapter card is inserted, so re-inserting the
card might make a difference.
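
One way to check this without reloading any driver is to look at the
SltCap line that "lspci -vv" prints for the root port. The following is
a minimal sketch, not something posted in the thread: has_power_controller
is a hypothetical helper name, and it assumes lspci marks the Power
Controller Present capability as "PwrCtrl+" (set) or "PwrCtrl-" (clear).

```shell
#!/bin/sh
# Read an `lspci -vv` dump from stdin and report whether the slot
# advertises a power controller. Prints "yes" if a SltCap line contains
# "PwrCtrl+", otherwise "no" (including when no SltCap line is present).
has_power_controller() {
    if grep 'SltCap:' | grep -qF 'PwrCtrl+'; then
        echo yes
    else
        echo no
    fi
}

# Example, run as root against the first root port from the log:
#   lspci -vv -s 00:05.0 | has_power_controller
```

If this prints "no", the slot has no software-controlled power and
writing to the "power" file would not be expected to do anything.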

By the way, I would like to know if your system has SHPC based hotplug
slots. Could you load shpchp driver and send me the
"ls -lR /sys/bus/pci/slots/" output? And I would like to see the dmesg
output with "debug_acpi" of pci_hotplug enabled. For example,

$ su
# echo Y > /sys/module/pci_hotplug/parameters/debug_acpi
# modprobe shpchp
# ls -lR /sys/bus/pci/slots
# dmesg

>
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
> Firmware did not grant requested _OSC control
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP5
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:05.0 (\_SB_.IOH0.MRP5)
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
> Firmware did not grant requested _OSC control
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP7
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:07.0 (\_SB_.IOH0.MRP7)
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
> Firmware did not grant requested _OSC control
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.PEX0
> acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0.PEX0 OSHP not found
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
> acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0 OSHP not found
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Cannot get control of hotplug hardware for pci 0000:00:1c.0
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPI
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:07.0 (\_SB_.IOH1.MRPI)
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPK
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:09.0 (\_SB_.IOH1.MRPK)
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
> Firmware did not grant requested _OSC control
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP5
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:05.0 (\_SB_.IOH0.MRP5)
> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
> Firmware did not grant requested _OSC control
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP7
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:07.0 (\_SB_.IOH0.MRP7)
> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
> Firmware did not grant requested _OSC control
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.PEX0
> acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0.PEX0 OSHP not found
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
> acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0 OSHP not found
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Cannot get control of hotplug hardware for pci 0000:00:1c.0
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPI
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:07.0 (\_SB_.IOH1.MRPI)
> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPK
> acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:09.0 (\_SB_.IOH1.MRPK)
> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>

It seems your environment has two PCI root bridges: one of them
(IOH0) provides both _OSC and OSHP, and the other (IOH1) provides
only OSHP. The _OSC under IOH0 doesn't grant hot-plug control, so
I think your system expects at least the slots under IOH0 to be
handled by acpiphp. I don't have any idea about the slots under
IOH1 for now.
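
The per-device outcome in a debug_acpi log like the one quoted above
can be tabulated with a short filter. This is only a sketch:
summarize_hp_control is a hypothetical name, and the patterns assume the
exact "Gained control for hotplug HW" and "Cannot get control of hotplug
hardware" wording seen in this log, which may differ in other kernel
versions.

```shell
#!/bin/sh
# Read pci_hotplug debug_acpi output from stdin and print one line per
# PCI device: "<address> gained <ACPI path>" or "<address> failed".
# Repeated messages for the same device are collapsed by sort -u.
summarize_hp_control() {
    sed -n \
        -e 's/.*Gained control for hotplug HW for pci \([0-9a-f:.]*\) (\([^)]*\)).*/\1 gained \2/p' \
        -e 's/.*Cannot get control of hotplug hardware for pci \([0-9a-f:.]*\).*/\1 failed/p' |
        sort -u
}

# Example:
#   dmesg | summarize_hp_control
```

On the log above this would list the four root ports (00:05.0, 00:07.0,
80:07.0, 80:09.0) as "gained" and 00:1c.0 as "failed".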

Thanks,
Kenji Kaneshige

2009-10-14 08:13:48

by Jens Axboe

Subject: Re: pci-express hotplug

On Tue, Oct 13 2009, Alex Chiang wrote:
> * Jens Axboe <[email protected]>:
> > > > >> On Mon, Oct 12 2009, Greg KH wrote:
> > > > >>> Can you try the acpiphp driver instead? That's usually the
> > > > >>> driver you want to use for "modern" systems (i.e. anything
> > > > >>> made in the past 5 years.)
> > > > >>
> > > > >> I should have mentioned that I tried that too. It doesn't
> > > > >> complain, but I don't see my cards anywhere afterwards. I'm
> > > > >> a hotplug newbie, do I need to do anything else?
> > >
> > > Can you modprobe acpiphp with debug=1? And send the output?
> >
> > acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
> > acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
> > acpiphp: Slot [1] registered
> > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
> > acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
> > acpiphp: Slot [2] registered
> > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
> > acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
> > acpiphp: Slot [6] registered
> > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
> > acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
> > acpiphp: Slot [7] registered
> > acpiphp_glue: Bus 0000:87 has 1 slot
> > acpiphp_glue: Bus 0000:84 has 1 slot
> > acpiphp_glue: Bus 0000:0b has 1 slot
> > acpiphp_glue: Bus 0000:08 has 1 slot
> > acpiphp_glue: Total 4 slots
>
> You mentioned in another mail that you echoed 1 into the various
> slots' power files.
>
> Did you do that after modprobing acpiphp with debug=1?
>
> If so, there should be debug output when you try and turn them
> on.

It produces:

acpiphp: enable_slot - physical_slot = 1
acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
acpiphp: enable_slot - physical_slot = 2
acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
acpiphp: enable_slot - physical_slot = 6
acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
acpiphp: enable_slot - physical_slot = 7
acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL

I have a card in one of the slots only this time.

> Also, quick dummy check, you are trying to power on populated
> slots, right? :)

Yes :-)

> Can you send the output of lspci -vv? And I'd like the output of
> lspci -vt as well... Both before and after loading acpiphp
> please.

Sent privately.

--
Jens Axboe

2009-10-14 08:47:55

by Jens Axboe

Subject: Re: pci-express hotplug

On Wed, Oct 14 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>>> Jens Axboe wrote:
>>>> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>>>>> Jens Axboe wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to get pci-express hotplug working in a box here. I don't
>>>>>> really care about the hotplug aspect, I just want the darn pci-e slots
>>>>>> that are designated hotplug slots to actually WORK. When I load pciehp,
>>>>>> I get:
>>>>>>
>>>>>> Firmware did not grant requested _OSC control
>>>>>> Firmware did not grant requested _OSC control
>>>>>> Firmware did not grant requested _OSC control
>>>>>> Firmware did not grant requested _OSC control
>>>>>> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
>>>>>> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
>>>>>> Firmware did not grant requested _OSC control
>>>>>> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>>>> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
>>>>>> Firmware did not grant requested _OSC control
>>>>>> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>>>> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
>>>>>> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
>>>>>> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
>>>>>> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>>>>>>
>>>>>> and the devices in the hotplug slots stay off. Is this an ACPI/bios
>>>>>> issue? How can I debug this?
>>>>>>
>>>>> Could you give me the result of "ls -lR /sys/bus/pci/slots/"
>>>>> after loading pciehp?
>>>> I have attached the result of that ls prior to loading pciehp/acpiphp
>>>> (pre-load), after loading pciehp (pciehp-load), and with acpiphp loaded
>>>> only as well (acpiphp-load).
>>>>
>>> Thank you for the info. From the information, I confirmed that hotplug
>>> slots are detected by pciehp even though _OSC evaluation failed. There
>>> are two ways to take control from the firmware through ACPI control
>>> method. One is _OSC control method, and the other is OSHP control method.
>>> I guess your ACPI firmware has both _OSC and OSHP on DSDT (ACPI Namespace),
>>> and pciehp assumes that it took control through OSHP after the _OSC
>>> evaluation failure. I think this pciehp's behavior is wrong because of
>>> the following reasons and I think pciehp driver mis-detected the hotplug
>>> slots on your environment because of this.
>>>
>>> - According to the PCI firmware specification, pciehp driver must use the
>>> result of _OSC, if the platform implements both _OSC and OSHP.
>>> - OSHP control method seems only for SHPC, not for PCI Express native hot-
>>> plug. So pciehp must not evaluate OSHP to take control from firmware.
>>>
>>> To confirm this, could you send me the dmesg output after loading pciehp
>>> with 'debug_acpi' of pci_hotplug (PCI hotplug core driver) enabled?
>>> For example,
>>>
>>> $ su
>>> # echo Y > /sys/module/pci_hotplug/parameters/debug_acpi
>>> # modprobe pciehp
>>> # dmesg
>>
>> See below.
>>
>>> And if it is possible, could you send me DSDT of your platform?
>>
>> Not sure I can do that, I'll check.
>>
>>> Anyway, my recommendation is using acpiphp on your environment because
>>> your firmware didn't grant control over hotplug control through _OSC.
>>> From the information, acpiphp also detects the hotplug slots successfully.
>>> Please try "echo 1 > /sys/bus/pci/slots/<slot#>/power". It would turn on
>>> the slot and initialize adapter card on the slot.
>>
>> It does find the 4 slots correctly. But if I try to turn on the power,
>> nothing happens and 'power' stays at 0. If I do the same with pciehp, I
>> get the same hang as described when using pciehp with pciehp_force=1.
>> But apparently this machine is getting a board replacement very soon, so
>> it may solve itself. Unless you think it should work and there's
>> something I can try to check, then lets just leave this issue until I
>> get it upgraded and return from kernel summit / JLS.
>>
>
> Could you try pciehp with "pciehp_debug" option enabled(*), and give me
> the following information?

I've attached the output of loading pciehp with the debug option
enabled.

> - "cat /sys/bus/pci/slots/*/*" output

Attached as slots

> - dmesg output after "echo 1 > /sys/bus/pci/slots/<slot#>/power"

# echo 1 > /sys/bus/pci/slots/1/power
pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
[...]

That last line repeats infinitely.

> (*) you can enable "pciehp_debug" option as follows
>
> # modprobe pciehp pciehp_debug
>
> I'm not sure, but I think one of the possibility is that your hot-plug
> controller doesn't support power controller. On such slot, the slot is
> usually turned on automatically at adapter card insertion. So card
> re-insertion might make some difference.

The dmesg seems to suggest it has a power controller. I also tried
removing and replugging the card, which is detected:

pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 8
pciehp 0000:00:05.0:pcie04: Presence/Notify input change
pciehp 0000:00:05.0:pcie04: Card not present on Slot(1)
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 8
pciehp 0000:00:05.0:pcie04: Presence/Notify input change
pciehp 0000:00:05.0:pcie04: Card present on Slot(1)

but the slot still isn't powered on.
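
For reading the "intr_loc" values in these logs: they appear to be the
event bits of the PCI Express Slot Status register, which matches the
messages printed next to them (2 with the power fault, 8 with the
presence change). A minimal sketch of a decoder follows, assuming the
value is printed in hex and that only the five low event bits matter;
decode_intr_loc is a hypothetical helper, not part of pciehp.

```shell
#!/bin/sh
# Decode a pciehp "intr_loc" value (hex digits, without a 0x prefix) into
# the PCIe Slot Status event bits it contains.
decode_intr_loc() {
    val=$((0x$1))
    names=""
    [ $((val & 0x01)) -ne 0 ] && names="$names attention-button-pressed"
    [ $((val & 0x02)) -ne 0 ] && names="$names power-fault-detected"
    [ $((val & 0x04)) -ne 0 ] && names="$names mrl-sensor-changed"
    [ $((val & 0x08)) -ne 0 ] && names="$names presence-detect-changed"
    [ $((val & 0x10)) -ne 0 ] && names="$names command-completed"
    # Strip the leading space left by the first appended name.
    echo "${names# }"
}

# Example:
#   decode_intr_loc 2    # power-fault-detected
#   decode_intr_loc 8    # presence-detect-changed
```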

> By the way, I would like to know if your system has SHPC based hotplug
> slots. Could you load shpchp driver and send me the
> "ls -lR /sys/bus/pci/slots/" output? And I would like to see the dmesg
> output with "debug_acpi" of pci_hotplug enabled. For example,
>
> $ su
> # echo Y > /sys/module/pci_hotplug/parameters/debug_acpi
> # modprobe shpchp

This doesn't dump anything but:

shpchp: Standard Hot Plug PCI Controller Driver version: 0.4

> # ls -lR /sys/bus/pci/slots
> # dmesg

Attached.

--
Jens Axboe


Attachments:
(No filename) (5.88 kB)
ls-lr-slots (2.50 kB)
slots (196.00 B)
pciehp-load-debug (9.18 kB)

2009-10-15 05:42:36

by Kenji Kaneshige

Subject: Re: pci-express hotplug

Jens Axboe wrote:
> On Wed, Oct 14 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>>>> Jens Axboe wrote:
>>>>> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>>>>>> Jens Axboe wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to get pci-express hotplug working in a box here. I don't
>>>>>>> really care about the hotplug aspect, I just want the darn pci-e slots
>>>>>>> that are designated hotplug slots to actually WORK. When I load pciehp,
>>>>>>> I get:
>>>>>>>
>>>>>>> Firmware did not grant requested _OSC control
>>>>>>> Firmware did not grant requested _OSC control
>>>>>>> Firmware did not grant requested _OSC control
>>>>>>> Firmware did not grant requested _OSC control
>>>>>>> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
>>>>>>> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
>>>>>>> Firmware did not grant requested _OSC control
>>>>>>> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>>>>> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
>>>>>>> Firmware did not grant requested _OSC control
>>>>>>> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>>>>> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
>>>>>>> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
>>>>>>> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
>>>>>>> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>>>>>>>
>>>>>>> and the devices in the hotplug slots stay off. Is this an ACPI/bios
>>>>>>> issue? How can I debug this?
>>>>>>>
>>>>>> Could you give me the result of "ls -lR /sys/bus/pci/slots/"
>>>>>> after loading pciehp?
>>>>> I have attached the result of that ls prior to loading pciehp/acpiphp
>>>>> (pre-load), after loading pciehp (pciehp-load), and with acpiphp loaded
>>>>> only as well (acpiphp-load).
>>>>>
>>>> Thank you for the info. From the information, I confirmed that hotplug
>>>> slots are detected by pciehp even though _OSC evaluation failed. There
>>>> are two ways to take control from the firmware through ACPI control
>>>> method. One is _OSC control method, and the other is OSHP control method.
>>>> I guess your ACPI firmware has both _OSC and OSHP on DSDT (ACPI Namespace),
>>>> and pciehp assumes that it took control through OSHP after the _OSC
>>>> evaluation failure. I think this pciehp's behavior is wrong because of
>>>> the following reasons and I think pciehp driver mis-detected the hotplug
>>>> slots on your environment because of this.
>>>>
>>>> - According to the PCI firmware specification, pciehp driver must use the
>>>> result of _OSC, if the platform implements both _OSC and OSHP.
>>>> - OSHP control method seems only for SHPC, not for PCI Express native hot-
>>>> plug. So pciehp must not evaluate OSHP to take control from firmware.
>>>>
>>>> To confirm this, could you send me the dmesg output after loading pciehp
>>>> with 'debug_acpi' of pci_hotplug (PCI hotplug core driver) enabled?
>>>> For example,
>>>>
>>>> $ su
>>>> # echo Y > /sys/module/pci_hotplug/parameters/debug_acpi
>>>> # modprobe pciehp
>>>> # dmesg
>>> See below.
>>>
>>>> And if it is possible, could you send me DSDT of your platform?
>>> Not sure I can do that, I'll check.
>>>
>>>> Anyway, my recommendation is using acpiphp on your environment because
>>>> your firmware didn't grant control over hotplug control through _OSC.
>>>> From the information, acpiphp also detects the hotplug slots successfully.
>>>> Please try "echo 1 > /sys/bus/pci/slots/<slot#>/power". It would turn on
>>>> the slot and initialize adapter card on the slot.
>>> It does find the 4 slots correctly. But if I try to turn on the power,
>>> nothing happens and 'power' stays at 0. If I do the same with pciehp, I
>>> get the same hang as described when using pciehp with pciehp_force=1.
>>> But apparently this machine is getting a board replacement very soon, so
>>> it may solve itself. Unless you think it should work and there's
>>> something I can try to check, then lets just leave this issue until I
>>> get it upgraded and return from kernel summit / JLS.
>>>
>> Could you try pciehp with "pciehp_debug" option enabled(*), and give me
>> the following information?
>
> I've attached the output of loading pciehp with the debug option
> enabled.
>
>> - "cat /sys/bus/pci/slots/*/*" output
>
> Attached as slots
>
>> - dmesg output after "echo 1 > /sys/bus/pci/slots/<slot#>/power"
>
> # echo 1 > /sys/bus/pci/slots/1/power
> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> [...]
>
> That last line repeats infinitely.

Thank you very much for the information.

The direct cause of your slot not being turned on is a power
fault. I guess acpiphp is suffering from the same problem.
Unfortunately, it's difficult for me to analyze the root cause
of this power fault. Please ask the hardware vendor about it. I
hope the board replacement will fix the problem.

By the way, thanks to your report, I noticed several points
that might need to be fixed, listed below. I'll try to improve them.

- The message "Firmware did not grant requested _OSC control" is
confusing, and a similar message is already displayed by the caller
of acpi_pci_osc_control_set(). Therefore, it should be removed.

- If the platform has _OSC control method, OSHP should not be
evaluated.

- (maybe) pciehp must not evaluate OSHP (But your platform seems
to provide OSHP for several PCIe hotplug slots because your
platform provides OSHP even though it doesn't have any SHPC
based PCI/PCI-X hot-plug slots. I need to check PCI firmware
spec again).

- pciehp needs something to prevent a power fault interrupt storm.

Thanks,
Kenji Kaneshige

2009-10-15 09:43:08

by Jens Axboe

Subject: Re: pci-express hotplug

On Thu, Oct 15 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Wed, Oct 14 2009, Kenji Kaneshige wrote:
>>> Jens Axboe wrote:
>>>> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>>>>> Jens Axboe wrote:
>>>>>> On Tue, Oct 13 2009, Kenji Kaneshige wrote:
>>>>>>> Jens Axboe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm trying to get pci-express hotplug working in a box here. I don't
>>>>>>>> really care about the hotplug aspect, I just want the darn pci-e slots
>>>>>>>> that are designated hotplug slots to actually WORK. When I load pciehp,
>>>>>>>> I get:
>>>>>>>>
>>>>>>>> Firmware did not grant requested _OSC control
>>>>>>>> Firmware did not grant requested _OSC control
>>>>>>>> Firmware did not grant requested _OSC control
>>>>>>>> Firmware did not grant requested _OSC control
>>>>>>>> pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
>>>>>>>> pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
>>>>>>>> Firmware did not grant requested _OSC control
>>>>>>>> pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>>>>>> pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
>>>>>>>> Firmware did not grant requested _OSC control
>>>>>>>> pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
>>>>>>>> pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
>>>>>>>> pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
>>>>>>>> pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
>>>>>>>> pciehp: PCI Express Hot Plug Controller Driver version: 0.4
>>>>>>>>
>>>>>>>> and the devices in the hotplug slots stay off. Is this an ACPI/bios
>>>>>>>> issue? How can I debug this?
>>>>>>>>
>>>>>>> Could you give me the result of "ls -lR /sys/bus/pci/slots/"
>>>>>>> after loading pciehp?
>>>>>> I have attached the result of that ls prior to loading pciehp/acpiphp
>>>>>> (pre-load), after loading pciehp (pciehp-load), and with acpiphp loaded
>>>>>> only as well (acpiphp-load).
>>>>>>
>>>>> Thank you for the info. From the information, I confirmed that hotplug
>>>>> slots are detected by pciehp even though _OSC evaluation failed. There
>>>>> are two ways to take control from the firmware through ACPI control
>>>>> method. One is _OSC control method, and the other is OSHP control method.
>>>>> I guess your ACPI firmware has both _OSC and OSHP on DSDT (ACPI Namespace),
>>>>> and pciehp assumes that it took control through OSHP after the _OSC
>>>>> evaluation failure. I think this pciehp's behavior is wrong because of
>>>>> the following reasons and I think pciehp driver mis-detected the hotplug
>>>>> slots on your environment because of this.
>>>>>
>>>>> - According to the PCI firmware specification, pciehp driver must use the
>>>>> result of _OSC, if the platform implements both _OSC and OSHP.
>>>>> - OSHP control method seems only for SHPC, not for PCI Express native hot-
>>>>> plug. So pciehp must not evaluate OSHP to take control from firmware.
>>>>>
>>>>> To confirm this, could you send me the dmesg output after loading pciehp
>>>>> with 'debug_acpi' of pci_hotplug (PCI hotplug core driver) enabled?
>>>>> For example,
>>>>>
>>>>> $ su
>>>>> # echo Y > /sys/module/pci_hotplug/parameters/debug_acpi
>>>>> # modprobe pciehp
>>>>> # dmesg
>>>> See below.
>>>>
>>>>> And if it is possible, could you send me DSDT of your platform?
>>>> Not sure I can do that, I'll check.
>>>>
>>>>> Anyway, my recommendation is using acpiphp on your environment because
>>>>> your firmware didn't grant control over hotplug control through _OSC.
>>>>> From the information, acpiphp also detects the hotplug slots successfully.
>>>>> Please try "echo 1 > /sys/bus/pci/slots/<slot#>/power". It would turn on
>>>>> the slot and initialize adapter card on the slot.
>>>> It does find the 4 slots correctly. But if I try to turn on the power,
>>>> nothing happens and 'power' stays at 0. If I do the same with pciehp, I
>>>> get the same hang as described when using pciehp with pciehp_force=1.
>>>> But apparently this machine is getting a board replacement very soon, so
>>>> it may solve itself. Unless you think it should work and there's
>>>> something I can try to check, then lets just leave this issue until I
>>>> get it upgraded and return from kernel summit / JLS.
>>>>
>>> Could you try pciehp with "pciehp_debug" option enabled(*), and give me
>>> the following information?
>>
>> I've attached the output of loading pciehp with the debug option
>> enabled.
>>
>>> - "cat /sys/bus/pci/slots/*/*" output
>>
>> Attached as slots
>>
>>> - dmesg output after "echo 1 > /sys/bus/pci/slots/<slot#>/power"
>>
>> # echo 1 > /sys/bus/pci/slots/1/power
>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> [...]
>>
>> That last line repeats infinitely.
>
> Thank you very much for information.
>
> The direct cause of the problem that your slot was not turned on
> is power fault. I guess acpiphp is suffering the same problem.
> Unfortunately, it's difficult for me to analyze the root cause
> of this power fault. Please ask the hardware vendor about it. I
> hope board replacement will fix the problem.

OK, I'll try with the new board when back and see what happens. If the
power fault persists, I'll poke the vendor about it.

> By the way, thanks to your report, I noticed the several points
> that might need to be fixed as follows. I'll try to improve that.
>
> - The message "Firmware did not grant requested _OSC control" is
> confusing and similar message is already displayed by the caller
> of acpi_pci_osc_control_set(). Therefore, it should be removed.

It's one of those messages that mean very little to you unless you have
an understanding of how hotplug is supposed to work. So removing it
sounds good.

> - If the platform has _OSC control method, OSHP should not be
> evaluated.
>
> - (maybe) pciehp must not evaluate OSHP (But your platform seems
> to provide OSHP for several PCIe hotplug slots because your
> platform provides OSHP even though it doesn't have any SHPC
> based PCI/PCI-X hot-plug slots. I need to check PCI firmware
> spec again).
>
> - pciehp needs something to prevent power fault interrupt storm.

Definitely, it essentially hangs the box and requires a reboot.

Thanks a lot for looking into these issues, I'll be back with a status
message when I've tried the new board.

--
Jens Axboe

2009-10-20 19:07:06

by Alex Chiang

Subject: Re: pci-express hotplug

* Jens Axboe <[email protected]>:
> On Tue, Oct 13 2009, Alex Chiang wrote:
> > > > Can you modprobe acpiphp with debug=1? And send the output?
> > >
> > > acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> > > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
> > > acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
> > > acpiphp: Slot [1] registered
> > > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
> > > acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
> > > acpiphp: Slot [2] registered
> > > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
> > > acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
> > > acpiphp: Slot [6] registered
> > > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
> > > acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
> > > acpiphp: Slot [7] registered
> > > acpiphp_glue: Bus 0000:87 has 1 slot
> > > acpiphp_glue: Bus 0000:84 has 1 slot
> > > acpiphp_glue: Bus 0000:0b has 1 slot
> > > acpiphp_glue: Bus 0000:08 has 1 slot
> > > acpiphp_glue: Total 4 slots
> >
> > You mentioned in another mail that you echoed 1 into the various
> > slots' power files.
> >
> > Did you do that after modprobing acpiphp with debug=1?
> >
> > If so, there should be debug output when you try and turn them
> > on.
>
> It produces:
>
> acpiphp: enable_slot - physical_slot = 1
> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> acpiphp: enable_slot - physical_slot = 2
> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> acpiphp: enable_slot - physical_slot = 6
> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> acpiphp: enable_slot - physical_slot = 7
> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL

Hm, so for some reason, firmware on your machine is telling us
that it doesn't think cards are present and/or enabled.

Unfortunately, I don't know why your firmware would be saying
that. We could add some more debug printks to see what firmware
thinks about your system... Or we could just wait and see what
happens after you get your hardware replaced.
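
As a rough illustration of what "not ACPI_STA_ALL" can mean, the sketch
below lists which of the low four _STA bits are missing from a raw
status value. It rests on two assumptions: that ACPI_STA_ALL stands for
all four of those bits (0xf), and that a raw _STA value has been
obtained some other way, which this thread does not show. decode_sta is
a hypothetical helper name.

```shell
#!/bin/sh
# Given a raw ACPI _STA value (e.g. 0xb), report which of the low four
# status bits are clear. Bit meanings follow the ACPI _STA definition:
# bit 0 present, bit 1 enabled, bit 2 shown in UI, bit 3 functioning.
decode_sta() {
    sta=$(($1))
    missing=""
    [ $((sta & 0x1)) -eq 0 ] && missing="$missing present"
    [ $((sta & 0x2)) -eq 0 ] && missing="$missing enabled"
    [ $((sta & 0x4)) -eq 0 ] && missing="$missing shown-in-ui"
    [ $((sta & 0x8)) -eq 0 ] && missing="$missing functioning"
    if [ -z "$missing" ]; then
        echo "all set"
    else
        echo "missing:$missing"
    fi
}

# Example:
#   decode_sta 0xf    # all set
#   decode_sta 0x0    # missing: present enabled shown-in-ui functioning
```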

> I have a card in one of the slots only this time.
>
> > Also, quick dummy check, you are trying to power on populated
> > slots, right? :)
>
> Yes :-)
>
> > Can you send the output of lspci -vv? And I'd like the output of
> > lspci -vt as well... Both before and after loading acpiphp
> > please.
>
> Sent privately.

No difference in before and after. Odd.

If you want to poke us again after your hardware swap, please do
so. Sorry for being not so helpful. :-/

/ac

2009-10-26 10:54:17

by Jens Axboe

Subject: Re: pci-express hotplug

On Tue, Oct 20 2009, Alex Chiang wrote:
> * Jens Axboe <[email protected]>:
> > On Tue, Oct 13 2009, Alex Chiang wrote:
> > > > > Can you modprobe acpiphp with debug=1? And send the output?
> > > >
> > > > acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> > > > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
> > > > acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
> > > > acpiphp: Slot [1] registered
> > > > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
> > > > acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
> > > > acpiphp: Slot [2] registered
> > > > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
> > > > acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
> > > > acpiphp: Slot [6] registered
> > > > acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
> > > > acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
> > > > acpiphp: Slot [7] registered
> > > > acpiphp_glue: Bus 0000:87 has 1 slot
> > > > acpiphp_glue: Bus 0000:84 has 1 slot
> > > > acpiphp_glue: Bus 0000:0b has 1 slot
> > > > acpiphp_glue: Bus 0000:08 has 1 slot
> > > > acpiphp_glue: Total 4 slots
> > >
> > > You mentioned in another mail that you echoed 1 into the various
> > > slots' power files.
> > >
> > > Did you do that after modprobing acpiphp with debug=1?
> > >
> > > If so, there should be debug output when you try and turn them
> > > on.
> >
> > It produces:
> >
> > acpiphp: enable_slot - physical_slot = 1
> > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > acpiphp: enable_slot - physical_slot = 2
> > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > acpiphp: enable_slot - physical_slot = 6
> > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > acpiphp: enable_slot - physical_slot = 7
> > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>
> Hm, so for some reason, firmware on your machine is telling us
> that it doesn't think cards are present and/or enabled.
>
> Unfortunately, I don't know why your firmware would be saying
> that. We could add some more debug printks to see what firmware
> thinks about your system... Or we could just wait and see what
> happens after you get your hardware replaced.

New board, the exact same thing happens.

> > I have a card in one of the slots only this time.
> >
> > > Also, quick dummy check, you are trying to power on populated
> > > slots, right? :)
> >
> > Yes :-)
> >
> > > Can you send the output of lspci -vv? And I like the output of
> > > lspci -vt as well... Both before and after loading acpiphp
> > > please.
> >
> > Send privately.
>
> No difference in before and after. Odd.
>
> If you want to poke us again after your hardware swap, please do
> so. Sorry for being not so helpful. :-/

Poke :-)

One more thing I tried was pushing the power button on the slot
manually. With acpiphp, I get the same messages as above. Using pciehp,
I get the same power fault bit interrupt storm. So no difference from
using the sysfs interface or doing it on the box side, doesn't work
either way.

--
Jens Axboe

2009-10-27 02:48:40

by Alex Chiang

Subject: Re: pci-express hotplug

* Jens Axboe <[email protected]>:
> > > acpiphp: enable_slot - physical_slot = 1
> > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > acpiphp: enable_slot - physical_slot = 2
> > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > acpiphp: enable_slot - physical_slot = 6
> > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > acpiphp: enable_slot - physical_slot = 7
> > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> >
> > Hm, so for some reason, firmware on your machine is telling us
> > that it doesn't think cards are present and/or enabled.
> >
> > Unfortunately, I don't know why your firmware would be saying
> > that. We could add some more debug printks to see what firmware
> > thinks about your system... Or we could just wait and see what
> > happens after you get your hardware replaced.

Let's try and find out why firmware is telling us that we didn't
get ACPI_STA_ALL.

Can you please apply this debug patch and send the output? Again,
please modprobe with debug=1.

Thanks,
/ac

---
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 58d25a1..2caa447 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -797,9 +797,13 @@ static int power_on_slot(struct acpiphp_slot *slot)
 	struct list_head *l;
 	int retval = 0;

+	printk("%s\n", __func__);
+
 	/* if already enabled, just skip */
-	if (slot->flags & SLOT_POWEREDON)
+	if (slot->flags & SLOT_POWEREDON) {
+		printk(" slot %ld already powered on\n", slot->sun);
 		goto err_exit;
+	}

 	list_for_each (l, &slot->funcs) {
 		func = list_entry(l, struct acpiphp_func, sibling);
@@ -813,6 +817,8 @@ static int power_on_slot(struct acpiphp_slot *slot)
 				goto err_exit;
 			} else
 				break;
+		} else {
+			printk(" no _PS0\n");
 		}
 	}

@@ -1122,11 +1128,14 @@ static unsigned int get_slot_status(struct acpiphp_slot *slot)
 	struct list_head *l;
 	struct acpiphp_func *func;

+	printk("%s\n", __func__);
+
 	list_for_each (l, &slot->funcs) {
 		func = list_entry(l, struct acpiphp_func, sibling);

 		if (func->flags & FUNC_HAS_STA) {
 			status = acpi_evaluate_integer(func->handle, "_STA", NULL, &sta);
+			printk(" FUNC_HAS_STA status %d _STA %#lx\n", status, sta);
 			if (ACPI_SUCCESS(status) && sta)
 				break;
 		} else {
@@ -1134,6 +1143,7 @@ static unsigned int get_slot_status(struct acpiphp_slot *slot)
 						  PCI_DEVFN(slot->device,
 							    func->function),
 						  PCI_VENDOR_ID, &dvid);
+			printk(" reading config space dvid %#lx\n", dvid);
 			if (dvid != 0xffffffff) {
 				sta = ACPI_STA_ALL;
 				break;

2009-10-27 06:32:11

by Kenji Kaneshige

Subject: Re: pci-express hotplug

Jens Axboe wrote:
> On Tue, Oct 20 2009, Alex Chiang wrote:
>> * Jens Axboe <[email protected]>:
>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>> acpiphp: Slot [1] registered
>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>> acpiphp: Slot [2] registered
>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>> acpiphp: Slot [6] registered
>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>> acpiphp: Slot [7] registered
>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>> acpiphp_glue: Total 4 slots
>>>> You mentioned in another mail that you echoed 1 into the various
>>>> slots' power files.
>>>>
>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>
>>>> If so, there should be debug output when you try and turn them
>>>> on.
>>> It produces:
>>>
>>> acpiphp: enable_slot - physical_slot = 1
>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>> acpiphp: enable_slot - physical_slot = 2
>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>> acpiphp: enable_slot - physical_slot = 6
>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>> acpiphp: enable_slot - physical_slot = 7
>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>> Hm, so for some reason, firmware on your machine is telling us
>> that it doesn't think cards are present and/or enabled.
>>
>> Unfortunately, I don't know why your firmware would be saying
>> that. We could add some more debug printks to see what firmware
>> thinks about your system... Or we could just wait and see what
>> happens after you get your hardware replaced.
>
> New board, the exact same thing happens.
>
>>> I have a card in one of the slots only this time.
>>>
>>>> Also, quick dummy check, you are trying to power on populated
>>>> slots, right? :)
>>> Yes :-)
>>>
>>>> Can you send the output of lspci -vv? And I like the output of
>>>> lspci -vt as well... Both before and after loading acpiphp
>>>> please.
>>> Send privately.
>> No difference in before and after. Odd.
>>
>> If you want to poke us again after your hardware swap, please do
>> so. Sorry for being not so helpful. :-/
>
> Poke :-)
>
> One more thing I tried was pushing the power button on the slot
> manually. With acpiphp, I get the same messages as above. Using pciehp,
> I get the same power fault bit interrupt storm. So no difference from
> using the sysfs interface or doing it on the box side, doesn't work
> either way.
>

I'd like to confirm power fault interrupt storm, just in case.
Could you get /proc/interrupts information after power fault
problem happens and send it to me?

Thanks,
Kenji Kaneshige


2009-10-27 08:26:34

by Jens Axboe

Subject: Re: pci-express hotplug

On Mon, Oct 26 2009, Alex Chiang wrote:
> * Jens Axboe <[email protected]>:
> > > > acpiphp: enable_slot - physical_slot = 1
> > > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > > acpiphp: enable_slot - physical_slot = 2
> > > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > > acpiphp: enable_slot - physical_slot = 6
> > > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > > acpiphp: enable_slot - physical_slot = 7
> > > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > >
> > > Hm, so for some reason, firmware on your machine is telling us
> > > that it doesn't think cards are present and/or enabled.
> > >
> > > Unfortunately, I don't know why your firmware would be saying
> > > that. We could add some more debug printks to see what firmware
> > > thinks about your system... Or we could just wait and see what
> > > happens after you get your hardware replaced.
>
> Let's try and find out why firmware is telling us that we didn't
> get ACPI_STA_ALL.
>
> Can you please apply this debug patch and send the output? Again,
> please modprobe with debug=1.

acpiphp: enable_slot - physical_slot = 1
power_on_slot
no _PS0
no _PS0
no _PS0
no _PS0
no _PS0
no _PS0
no _PS0
no _PS0
get_slot_status
reading config space dvid 0xffffffff
reading config space dvid 0xffffffff
reading config space dvid 0xffffffff
reading config space dvid 0xffffffff
reading config space dvid 0xffffffff
reading config space dvid 0xffffffff
reading config space dvid 0xffffffff
reading config space dvid 0xffffffff
acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL

--
Jens Axboe

2009-10-27 08:27:18

by Jens Axboe

Subject: Re: pci-express hotplug

On Tue, Oct 27 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>> * Jens Axboe <[email protected]>:
>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>> acpiphp: Slot [1] registered
>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>> acpiphp: Slot [2] registered
>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>> acpiphp: Slot [6] registered
>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>> acpiphp: Slot [7] registered
>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>> acpiphp_glue: Total 4 slots
>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>> slots' power files.
>>>>>
>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>
>>>>> If so, there should be debug output when you try and turn them
>>>>> on.
>>>> It produces:
>>>>
>>>> acpiphp: enable_slot - physical_slot = 1
>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>> acpiphp: enable_slot - physical_slot = 2
>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>> acpiphp: enable_slot - physical_slot = 6
>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>> acpiphp: enable_slot - physical_slot = 7
>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>> Hm, so for some reason, firmware on your machine is telling us
>>> that it doesn't think cards are present and/or enabled.
>>>
>>> Unfortunately, I don't know why your firmware would be saying
>>> that. We could add some more debug printks to see what firmware
>>> thinks about your system... Or we could just wait and see what
>>> happens after you get your hardware replaced.
>>
>> New board, the exact same thing happens.
>>
>>>> I have a card in one of the slots only this time.
>>>>
>>>>> Also, quick dummy check, you are trying to power on populated
>>>>> slots, right? :)
>>>> Yes :-)
>>>>
>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>> please.
>>>> Send privately.
>>> No difference in before and after. Odd.
>>>
>>> If you want to poke us again after your hardware swap, please do
>>> so. Sorry for being not so helpful. :-/
>>
>> Poke :-)
>>
>> One more thing I tried was pushing the power button on the slot
>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>> I get the same power fault bit interrupt storm. So no difference from
>> using the sysfs interface or doing it on the box side, doesn't work
>> either way.
>>
>
> I'd like to confirm power fault interrupt storm, just in case.
> Could you get /proc/interrupts information after power fault
> problem happens and send it to me?

The box pretty much hangs when I try to power on a slot with pciehp, so
it's not easy to do... It doesn't hang with acpiphp, but doesn't work
either (see previous reply to Alex).

--
Jens Axboe

2009-10-27 08:34:18

by Jens Axboe

Subject: Re: pci-express hotplug

On Tue, Oct 27 2009, Jens Axboe wrote:
> On Mon, Oct 26 2009, Alex Chiang wrote:
> > * Jens Axboe <[email protected]>:
> > > > > acpiphp: enable_slot - physical_slot = 1
> > > > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > > > acpiphp: enable_slot - physical_slot = 2
> > > > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > > > acpiphp: enable_slot - physical_slot = 6
> > > > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > > > acpiphp: enable_slot - physical_slot = 7
> > > > > acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> > > >
> > > > Hm, so for some reason, firmware on your machine is telling us
> > > > that it doesn't think cards are present and/or enabled.
> > > >
> > > > Unfortunately, I don't know why your firmware would be saying
> > > > that. We could add some more debug printks to see what firmware
> > > > thinks about your system... Or we could just wait and see what
> > > > happens after you get your hardware replaced.
> >
> > Let's try and find out why firmware is telling us that we didn't
> > get ACPI_STA_ALL.
> >
> > Can you please apply this debug patch and send the output? Again,
> > please modprobe with debug=1.
>
> acpiphp: enable_slot - physical_slot = 1
> power_on_slot
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> get_slot_status
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL

Since this is a new board and BIOS, below is the info from loading
acpiphp with debug enabled and acpi debug enabled.

acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
get_slot_status
get_slot_status
acpiphp: Slot [1] registered
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
get_slot_status
get_slot_status
acpiphp: Slot [2] registered
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
get_slot_status
get_slot_status
acpiphp: Slot [6] registered
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
get_slot_status
get_slot_status
acpiphp: Slot [7] registered
acpiphp_glue: Bus 0000:87 has 1 slot
acpiphp_glue: Bus 0000:84 has 1 slot
acpiphp_glue: Bus 0000:0b has 1 slot
acpiphp_glue: Bus 0000:08 has 1 slot
acpiphp_glue: Total 4 slots
acpiphp: Slot [1] unregistered
acpiphp: release_slot - physical_slot = 1
acpiphp: Slot [2] unregistered
acpiphp: release_slot - physical_slot = 2
acpiphp: Slot [6] unregistered
acpiphp: release_slot - physical_slot = 6
acpiphp: Slot [7] unregistered
acpiphp: release_slot - physical_slot = 7

--
Jens Axboe

2009-10-27 08:36:29

by Jens Axboe

Subject: Re: pci-express hotplug

On Tue, Oct 27 2009, Jens Axboe wrote:
> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
> > Jens Axboe wrote:
> >> On Tue, Oct 20 2009, Alex Chiang wrote:
> >>> * Jens Axboe <[email protected]>:
> >>>> On Tue, Oct 13 2009, Alex Chiang wrote:
> >>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
> >>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> >>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
> >>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
> >>>>>> acpiphp: Slot [1] registered
> >>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
> >>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
> >>>>>> acpiphp: Slot [2] registered
> >>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
> >>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
> >>>>>> acpiphp: Slot [6] registered
> >>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
> >>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
> >>>>>> acpiphp: Slot [7] registered
> >>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
> >>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
> >>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
> >>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
> >>>>>> acpiphp_glue: Total 4 slots
> >>>>> You mentioned in another mail that you echoed 1 into the various
> >>>>> slots' power files.
> >>>>>
> >>>>> Did you do that after modprobing acpiphp with debug=1?
> >>>>>
> >>>>> If so, there should be debug output when you try and turn them
> >>>>> on.
> >>>> It produces:
> >>>>
> >>>> acpiphp: enable_slot - physical_slot = 1
> >>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> >>>> acpiphp: enable_slot - physical_slot = 2
> >>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> >>>> acpiphp: enable_slot - physical_slot = 6
> >>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> >>>> acpiphp: enable_slot - physical_slot = 7
> >>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
> >>> Hm, so for some reason, firmware on your machine is telling us
> >>> that it doesn't think cards are present and/or enabled.
> >>>
> >>> Unfortunately, I don't know why your firmware would be saying
> >>> that. We could add some more debug printks to see what firmware
> >>> thinks about your system... Or we could just wait and see what
> >>> happens after you get your hardware replaced.
> >>
> >> New board, the exact same thing happens.
> >>
> >>>> I have a card in one of the slots only this time.
> >>>>
> >>>>> Also, quick dummy check, you are trying to power on populated
> >>>>> slots, right? :)
> >>>> Yes :-)
> >>>>
> >>>>> Can you send the output of lspci -vv? And I like the output of
> >>>>> lspci -vt as well... Both before and after loading acpiphp
> >>>>> please.
> >>>> Send privately.
> >>> No difference in before and after. Odd.
> >>>
> >>> If you want to poke us again after your hardware swap, please do
> >>> so. Sorry for being not so helpful. :-/
> >>
> >> Poke :-)
> >>
> >> One more thing I tried was pushing the power button on the slot
> >> manually. With acpiphp, I get the same messages as above. Using pciehp,
> >> I get the same power fault bit interrupt storm. So no difference from
> >> using the sysfs interface or doing it on the box side, doesn't work
> >> either way.
> >>
> >
> > I'd like to confirm power fault interrupt storm, just in case.
> > Could you get /proc/interrupts information after power fault
> > problem happens and send it to me?
>
> The box pretty much hangs when I try to power on a slot with pciehp, so
> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
> either (see previous reply to Alex).

Ditto new debug info, in case it is of any assistance. Loading pciehp
with pciehp_debug=1.

acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP5
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:05.0 (\_SB_.IOH0.MRP5)
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP7
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:07.0 (\_SB_.IOH0.MRP7)
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.PEX0
acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0.PEX0 OSHP not found
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0 OSHP not found
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Cannot get control of hotplug hardware for pci 0000:00:1c.0
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPI
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:07.0 (\_SB_.IOH1.MRPI)
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPK
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:09.0 (\_SB_.IOH1.MRPK)
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP5
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:05.0 (\_SB_.IOH0.MRP5)
pciehp 0000:00:05.0:pcie04: Hotplug Controller:
pciehp 0000:00:05.0:pcie04: Seg/Bus/Dev/Func/IRQ : 0000:00:05.0 IRQ 75
pciehp 0000:00:05.0:pcie04: Vendor ID : 0x8086
pciehp 0000:00:05.0:pcie04: Device ID : 0x340c
pciehp 0000:00:05.0:pcie04: Subsystem ID : 0x0000
pciehp 0000:00:05.0:pcie04: Subsystem Vendor ID : 0x0000
pciehp 0000:00:05.0:pcie04: PCIe Cap offset : 0x90
pciehp 0000:00:05.0:pcie04: PCI resource [7] : 0x1000@0x1000
pciehp 0000:00:05.0:pcie04: PCI resource [8] : 0x200000@0x91a00000
pciehp 0000:00:05.0:pcie04: PCI resource [9] : 0x200000@0x91c00000
pciehp 0000:00:05.0:pcie04: Slot Capabilities : 0x0008005b
pciehp 0000:00:05.0:pcie04: Physical Slot Number : 1
pciehp 0000:00:05.0:pcie04: Attention Button : yes
pciehp 0000:00:05.0:pcie04: Power Controller : yes
pciehp 0000:00:05.0:pcie04: MRL Sensor : no
pciehp 0000:00:05.0:pcie04: Attention Indicator : yes
pciehp 0000:00:05.0:pcie04: Power Indicator : yes
pciehp 0000:00:05.0:pcie04: Hot-Plug Surprise : no
pciehp 0000:00:05.0:pcie04: EMI Present : no
pciehp 0000:00:05.0:pcie04: Command Completed : yes
pciehp 0000:00:05.0:pcie04: Slot Status : 0x0040
pciehp 0000:00:05.0:pcie04: Slot Control : 0x07c0
pciehp 0000:00:05.0:pcie04: Link Active Reporting supported
pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:05.0:pcie04: HPC vendor_id 8086 device_id 340c ss_vid 0 ss_did 0
pciehp 0000:00:05.0:pcie04: Registering domain:bus:dev=0000:08:00 sun=1
pciehp 0000:00:05.0:pcie04: get_power_status: physical_slot = 1
pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 7c0
pciehp 0000:00:05.0:pcie04: get_attention_status: physical_slot = 1
pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 7c0
pciehp 0000:00:05.0:pcie04: get_latch_status: physical_slot = 1
pciehp 0000:00:05.0:pcie04: get_adapter_status: physical_slot = 1
pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:05.0:pcie04: service driver pciehp loaded
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.MRP7
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:00:07.0 (\_SB_.IOH0.MRP7)
pciehp 0000:00:07.0:pcie04: Hotplug Controller:
pciehp 0000:00:07.0:pcie04: Seg/Bus/Dev/Func/IRQ : 0000:00:07.0 IRQ 76
pciehp 0000:00:07.0:pcie04: Vendor ID : 0x8086
pciehp 0000:00:07.0:pcie04: Device ID : 0x340e
pciehp 0000:00:07.0:pcie04: Subsystem ID : 0x0000
pciehp 0000:00:07.0:pcie04: Subsystem Vendor ID : 0x0000
pciehp 0000:00:07.0:pcie04: PCIe Cap offset : 0x90
pciehp 0000:00:07.0:pcie04: PCI resource [7] : 0x1000@0x2000
pciehp 0000:00:07.0:pcie04: PCI resource [8] : 0x200000@0x91e00000
pciehp 0000:00:07.0:pcie04: PCI resource [9] : 0x200000@0x92000000
pciehp 0000:00:07.0:pcie04: Slot Capabilities : 0x0010005b
pciehp 0000:00:07.0:pcie04: Physical Slot Number : 2
pciehp 0000:00:07.0:pcie04: Attention Button : yes
pciehp 0000:00:07.0:pcie04: Power Controller : yes
pciehp 0000:00:07.0:pcie04: MRL Sensor : no
pciehp 0000:00:07.0:pcie04: Attention Indicator : yes
pciehp 0000:00:07.0:pcie04: Power Indicator : yes
pciehp 0000:00:07.0:pcie04: Hot-Plug Surprise : no
pciehp 0000:00:07.0:pcie04: EMI Present : no
pciehp 0000:00:07.0:pcie04: Command Completed : yes
pciehp 0000:00:07.0:pcie04: Slot Status : 0x0000
pciehp 0000:00:07.0:pcie04: Slot Control : 0x07c0
pciehp 0000:00:07.0:pcie04: Link Active Reporting supported
pciehp 0000:00:07.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
pciehp 0000:00:07.0:pcie04: Registering domain:bus:dev=0000:0b:00 sun=2
pciehp 0000:00:07.0:pcie04: get_power_status: physical_slot = 2
pciehp 0000:00:07.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 7c0
pciehp 0000:00:07.0:pcie04: get_attention_status: physical_slot = 2
pciehp 0000:00:07.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 7c0
pciehp 0000:00:07.0:pcie04: get_latch_status: physical_slot = 2
pciehp 0000:00:07.0:pcie04: get_adapter_status: physical_slot = 2
pciehp 0000:00:07.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:07.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:07.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
pciehp 0000:00:07.0:pcie04: service driver pciehp loaded
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
Firmware did not grant requested _OSC control
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0.PEX0
acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0.PEX0 OSHP not found
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH0
acpi_pcihp: acpi_run_oshp: acpi_run_oshp:\_SB_.IOH0 OSHP not found
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Cannot get control of hotplug hardware for pci 0000:00:1c.0
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPI
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:07.0 (\_SB_.IOH1.MRPI)
pciehp 0000:80:07.0:pcie04: Hotplug Controller:
pciehp 0000:80:07.0:pcie04: Seg/Bus/Dev/Func/IRQ : 0000:80:07.0 IRQ 84
pciehp 0000:80:07.0:pcie04: Vendor ID : 0x8086
pciehp 0000:80:07.0:pcie04: Device ID : 0x340e
pciehp 0000:80:07.0:pcie04: Subsystem ID : 0x0000
pciehp 0000:80:07.0:pcie04: Subsystem Vendor ID : 0x0000
pciehp 0000:80:07.0:pcie04: PCIe Cap offset : 0x90
pciehp 0000:80:07.0:pcie04: PCI resource [7] : 0x1000@0x8000
pciehp 0000:80:07.0:pcie04: PCI resource [8] : 0x200000@0x92700000
pciehp 0000:80:07.0:pcie04: PCI resource [9] : 0x200000@0x92900000
pciehp 0000:80:07.0:pcie04: Slot Capabilities : 0x0030005b
pciehp 0000:80:07.0:pcie04: Physical Slot Number : 6
pciehp 0000:80:07.0:pcie04: Attention Button : yes
pciehp 0000:80:07.0:pcie04: Power Controller : yes
pciehp 0000:80:07.0:pcie04: MRL Sensor : no
pciehp 0000:80:07.0:pcie04: Attention Indicator : yes
pciehp 0000:80:07.0:pcie04: Power Indicator : yes
pciehp 0000:80:07.0:pcie04: Hot-Plug Surprise : no
pciehp 0000:80:07.0:pcie04: EMI Present : no
pciehp 0000:80:07.0:pcie04: Command Completed : yes
pciehp 0000:80:07.0:pcie04: Slot Status : 0x0040
pciehp 0000:80:07.0:pcie04: Slot Control : 0x0740
pciehp 0000:80:07.0:pcie04: Link Active Reporting supported
pciehp 0000:80:07.0:pcie04: Command not completed in 1000 msec
pciehp 0000:80:07.0:pcie04: HPC vendor_id 8086 device_id 340e ss_vid 0 ss_did 0
pciehp 0000:80:07.0:pcie04: Registering domain:bus:dev=0000:84:00 sun=6
pciehp 0000:80:07.0:pcie04: get_power_status: physical_slot = 6
pciehp 0000:80:07.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 740
pciehp 0000:80:07.0:pcie04: get_attention_status: physical_slot = 6
pciehp 0000:80:07.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 740
pciehp 0000:80:07.0:pcie04: get_latch_status: physical_slot = 6
pciehp 0000:80:07.0:pcie04: get_adapter_status: physical_slot = 6
pciehp 0000:80:07.0:pcie04: Command not completed in 1000 msec
pciehp 0000:80:07.0:pcie04: service driver pciehp loaded
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Trying to get hotplug control for \_SB_.IOH1.MRPK
acpi_pcihp: acpi_get_hp_hw_control_from_firmware: Gained control for hotplug HW for pci 0000:80:09.0 (\_SB_.IOH1.MRPK)
pciehp 0000:80:09.0:pcie04: Hotplug Controller:
pciehp 0000:80:09.0:pcie04: Seg/Bus/Dev/Func/IRQ : 0000:80:09.0 IRQ 85
pciehp 0000:80:09.0:pcie04: Vendor ID : 0x8086
pciehp 0000:80:09.0:pcie04: Device ID : 0x3410
pciehp 0000:80:09.0:pcie04: Subsystem ID : 0x0000
pciehp 0000:80:09.0:pcie04: Subsystem Vendor ID : 0x0000
pciehp 0000:80:09.0:pcie04: PCIe Cap offset : 0x90
pciehp 0000:80:09.0:pcie04: PCI resource [7] : 0x1000@0x9000
pciehp 0000:80:09.0:pcie04: PCI resource [8] : 0x200000@0x92b00000
pciehp 0000:80:09.0:pcie04: PCI resource [9] : 0x200000@0x92d00000
pciehp 0000:80:09.0:pcie04: Slot Capabilities : 0x0038005b
pciehp 0000:80:09.0:pcie04: Physical Slot Number : 7
pciehp 0000:80:09.0:pcie04: Attention Button : yes
pciehp 0000:80:09.0:pcie04: Power Controller : yes
pciehp 0000:80:09.0:pcie04: MRL Sensor : no
pciehp 0000:80:09.0:pcie04: Attention Indicator : yes
pciehp 0000:80:09.0:pcie04: Power Indicator : yes
pciehp 0000:80:09.0:pcie04: Hot-Plug Surprise : no
pciehp 0000:80:09.0:pcie04: EMI Present : no
pciehp 0000:80:09.0:pcie04: Command Completed : yes
pciehp 0000:80:09.0:pcie04: Slot Status : 0x0000
pciehp 0000:80:09.0:pcie04: Slot Control : 0x07c0
pciehp 0000:80:09.0:pcie04: Link Active Reporting supported
pciehp 0000:80:09.0:pcie04: Command not completed in 1000 msec
pciehp 0000:80:09.0:pcie04: HPC vendor_id 8086 device_id 3410 ss_vid 0 ss_did 0
pciehp 0000:80:09.0:pcie04: Registering domain:bus:dev=0000:87:00 sun=7
pciehp 0000:80:09.0:pcie04: get_power_status: physical_slot = 7
pciehp 0000:80:09.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 7c0
pciehp 0000:80:09.0:pcie04: get_attention_status: physical_slot = 7
pciehp 0000:80:09.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 7c0
pciehp 0000:80:09.0:pcie04: get_latch_status: physical_slot = 7
pciehp 0000:80:09.0:pcie04: get_adapter_status: physical_slot = 7
pciehp 0000:80:09.0:pcie04: Command not completed in 1000 msec
pciehp 0000:80:09.0:pcie04: Command not completed in 1000 msec
pciehp 0000:80:09.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
pciehp 0000:80:09.0:pcie04: service driver pciehp loaded
pciehp: pcie_port_service_register = 0
pciehp: PCI Express Hot Plug Controller Driver version: 0.4

--
Jens Axboe
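
The "Slot Capabilities" words in the log above (0x0030005b and 0x0038005b) can be decoded by hand to confirm the per-slot summary that pciehp prints. What follows is a minimal sketch in C, assuming the Slot Capabilities bit layout from the PCI Express specification. The macro, struct, and function names are illustrative only and are not taken from the pciehp driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names; bit positions follow the PCIe Slot Capabilities register. */
#define CAP_ATTN_BUTTON  0x00000001u  /* Attention Button Present */
#define CAP_POWER_CTRL   0x00000002u  /* Power Controller Present */
#define CAP_MRL_SENSOR   0x00000004u  /* MRL Sensor Present */
#define CAP_ATTN_IND     0x00000008u  /* Attention Indicator Present */
#define CAP_POWER_IND    0x00000010u  /* Power Indicator Present */
#define CAP_SURPRISE     0x00000020u  /* Hot-Plug Surprise */
#define CAP_EMI          0x00020000u  /* Electromechanical Interlock Present */
#define CAP_NO_CMD_CMPL  0x00040000u  /* set = no Command Completed support */
#define CAP_PSN_SHIFT    19           /* Physical Slot Number lives in bits 31:19 */

struct slot_caps {
	unsigned int physical_slot;
	bool attn_button, power_ctrl, mrl_sensor;
	bool attn_ind, power_ind, surprise, emi, cmd_completed;
};

/* Split a raw Slot Capabilities value into the fields shown in the log. */
static void decode_slot_caps(uint32_t cap, struct slot_caps *out)
{
	assert(out != NULL);
	out->physical_slot = cap >> CAP_PSN_SHIFT;
	out->attn_button   = (cap & CAP_ATTN_BUTTON) != 0;
	out->power_ctrl    = (cap & CAP_POWER_CTRL) != 0;
	out->mrl_sensor    = (cap & CAP_MRL_SENSOR) != 0;
	out->attn_ind      = (cap & CAP_ATTN_IND) != 0;
	out->power_ind     = (cap & CAP_POWER_IND) != 0;
	out->surprise      = (cap & CAP_SURPRISE) != 0;
	out->emi           = (cap & CAP_EMI) != 0;
	/* The register advertises the *absence* of Command Completed support. */
	out->cmd_completed = (cap & CAP_NO_CMD_CMPL) == 0;
}
```

Feeding 0x0030005b through decode_slot_caps() gives physical slot 6 with an attention button, power controller, and both indicators but no MRL sensor, which matches the log for 0000:80:07.0. The value 0x0038005b differs only in the slot number field, giving slot 7.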

2009-10-27 08:46:22

by Kenji Kaneshige

Subject: Re: pci-express hotplug

Jens Axboe wrote:
> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>> * Jens Axboe <[email protected]>:
>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>> acpiphp: Slot [1] registered
>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>> acpiphp: Slot [2] registered
>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>> acpiphp: Slot [6] registered
>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>> acpiphp: Slot [7] registered
>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>> acpiphp_glue: Total 4 slots
>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>> slots' power files.
>>>>>>
>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>
>>>>>> If so, there should be debug output when you try and turn them
>>>>>> on.
>>>>> It produces:
>>>>>
>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>> Hm, so for some reason, firmware on your machine is telling us
>>>> that it doesn't think cards are present and/or enabled.
>>>>
>>>> Unfortunately, I don't know why your firmware would be saying
>>>> that. We could add some more debug printks to see what firmware
>>>> thinks about your system... Or we could just wait and see what
>>>> happens after you get your hardware replaced.
>>> New board, the exact same thing happens.
>>>
>>>>> I have a card in one of the slots only this time.
>>>>>
>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>> slots, right? :)
>>>>> Yes :-)
>>>>>
>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>> please.
>>>>> Send privately.
>>>> No difference in before and after. Odd.
>>>>
>>>> If you want to poke us again after your hardware swap, please do
>>>> so. Sorry for being not so helpful. :-/
>>> Poke :-)
>>>
>>> One more thing I tried was pushing the power button on the slot
>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>> I get the same power fault bit interrupt storm. So no difference from
>>> using the sysfs interface or doing it on the box side, doesn't work
>>> either way.
>>>
>> I'd like to confirm power fault interrupt storm, just in case.
>> Could you get /proc/interrupts information after power fault
>> problem happens and send it to me?
>
> The box pretty much hangs when I try to power on a slot with pciehp, so
> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
> either (see previous reply to Alex).
>

Ok, I'll try to make a debug patch.
Maybe I can send it tomorrow.

Thanks,
Kenji Kaneshige

2009-10-27 15:15:51

by Alex Chiang

Subject: Re: pci-express hotplug

* Jens Axboe <[email protected]>:
>
> Since this is a new board and BIOS, below is the info from loading
> acpiphp with debug enabled and acpi debug enabled.

Thanks. Can you please send your DSDT as well?

You can obtain that with the acpidump tools. If they're not part
of your distro, you can find them on lesswatts.org.

Thanks.

/ac

2009-10-28 06:15:19

by Kenji Kaneshige

Subject: Re: pci-express hotplug

---
drivers/pci/hotplug/pciehp_hpc.c | 8 ++++++++
1 file changed, 8 insertions(+)

Index: 20091026/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- 20091026.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ 20091026/drivers/pci/hotplug/pciehp_hpc.c
@@ -612,6 +612,7 @@ static irqreturn_t pcie_isr(int irq, voi
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct slot *slot = ctrl->slot;
 	u16 detected, intr_loc;
+	static int nr_power_faults = 0;

 	/*
 	 * In order to guarantee that all interrupt events are
@@ -664,6 +665,13 @@ static irqreturn_t pcie_isr(int irq, voi
 	if (intr_loc & PCI_EXP_SLTSTA_PDC)
 		pciehp_handle_presence_change(slot);

+	if ((intr_loc & PCI_EXP_SLTSTA_PFD) && (++nr_power_faults > 100)) {
+		u16 reg16;
+		pciehp_readw(ctrl, PCI_EXP_SLTCTL, &reg16);
+		reg16 &= ~PCI_EXP_SLTCTL_PFDE;
+		pciehp_writew(ctrl, PCI_EXP_SLTCTL, reg16);
+	}
+
 	/* Check Power Fault Detected */
 	if ((intr_loc & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
 		ctrl->power_fault_detected = 1;


Attachments:
pciehp-power-fault-debug.patch (1.05 kB)
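
The throttling idea in the patch above can be tried outside the kernel. The following is only a userspace sketch under stated assumptions: struct fake_ctrl and throttle_power_fault() are hypothetical stand-ins, the counter is kept per controller rather than in a function-level static as in the patch, and no real register access takes place. The two bit values are the Power Fault Detected bit in Slot Status and its enable bit in Slot Control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLTSTA_PFD   0x0002u  /* Slot Status: Power Fault Detected */
#define SLTCTL_PFDE  0x0002u  /* Slot Control: Power Fault Detected Enable */
#define FAULT_LIMIT  100      /* same threshold as the debug patch */

/* Hypothetical stand-in for the controller; it only models Slot Control. */
struct fake_ctrl {
	uint16_t sltctl;
	int nr_power_faults;
};

/*
 * Count power fault events and mask the power fault interrupt once more
 * than FAULT_LIMIT of them have been seen.
 * Returns 1 if this call masked (or re-masked) the interrupt, else 0.
 */
static int throttle_power_fault(struct fake_ctrl *ctrl, uint16_t intr_loc)
{
	assert(ctrl != NULL);
	if ((intr_loc & SLTSTA_PFD) && (++ctrl->nr_power_faults > FAULT_LIMIT)) {
		ctrl->sltctl &= (uint16_t)~SLTCTL_PFDE;
		return 1;
	}
	return 0;
}
```

Starting from a Slot Control value of 0x077b, the first 100 power fault events leave the register alone and the 101st clears bit 1, leaving 0x0779. Events without the power fault bit set do not advance the counter.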

2009-10-28 09:18:15

by Jens Axboe

Subject: Re: pci-express hotplug

On Tue, Oct 27 2009, Alex Chiang wrote:
> * Jens Axboe <[email protected]>:
> >
> > Since this is a new board and BIOS, below is the info from loading
> > acpiphp with debug enabled and acpi debug enabled.
>
> Thanks. Can you please send your DSDT as well?
>
> You can obtain that with the acpidump tools. If they're not part
> of your distro, you can find them on lesswatts.org.

Sent privately.

--
Jens Axboe

2009-10-28 09:23:24

by Jens Axboe

Subject: Re: pci-express hotplug

On Wed, Oct 28 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>> Jens Axboe wrote:
>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>> * Jens Axboe <[email protected]>:
>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>> slots' power files.
>>>>>>>
>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>
>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>> on.
>>>>>> It produces:
>>>>>>
>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>> that it doesn't think cards are present and/or enabled.
>>>>>
>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>> that. We could add some more debug printks to see what firmware
>>>>> thinks about your system... Or we could just wait and see what
>>>>> happens after you get your hardware replaced.
>>>> New board, the exact same thing happens.
>>>>
>>>>>> I have a card in one of the slots only this time.
>>>>>>
>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>> slots, right? :)
>>>>>> Yes :-)
>>>>>>
>>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>> please.
>>>>>> Send privately.
>>>>> No difference in before and after. Odd.
>>>>>
>>>>> If you want to poke us again after your hardware swap, please do
>>>>> so. Sorry for being not so helpful. :-/
>>>> Poke :-)
>>>>
>>>> One more thing I tried was pushing the power button on the slot
>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>> I get the same power fault bit interrupt storm. So no difference from
>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>> either way.
>>>>
>>> I'd like to confirm power fault interrupt storm, just in case.
>>> Could you get /proc/interrupts information after power fault
>>> problem happens and send it to me?
>>
>> The box pretty much hangs when I try to power on a slot with pciehp, so
>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>> either (see previous reply to Alex).
>>
>
> Could you try the attached debugging patch? With this patch, power
> fault interrupt would be disabled after 100 power fault detected (
> I hope so). You can get /proc/interrupts after that.

Here is the output of doing the power on with that patch applied.

pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: Power fault interrupt received
pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
pciehp 0000:00:05.0:pcie04: Link Training Error occurs
pciehp 0000:00:05.0:pcie04: Failed to check link status
pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779

--
Jens Axboe


Attachments:
(No filename) (10.47 kB)
interrupts (25.29 kB)
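
The intr_loc values in the log above are easier to read once split into individual Slot Status event bits. This is a small sketch in C, assuming that pciehp prints intr_loc in hexadecimal (so "10" is 0x10 and "12" is 0x12) and using the Slot Status bit layout from the PCI Express specification. The table and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct slot_event {
	uint16_t bit;
	const char *name;
};

/* Low five bits of the PCIe Slot Status register, lowest bit first. */
static const struct slot_event slot_events[] = {
	{ 0x0001, "attention button pressed" },
	{ 0x0002, "power fault detected" },
	{ 0x0004, "MRL sensor changed" },
	{ 0x0008, "presence detect changed" },
	{ 0x0010, "command completed" },
};

/*
 * Store the names of the events set in intr_loc into out[] (at most max
 * entries) and return how many were stored.
 */
static size_t decode_intr_loc(uint16_t intr_loc, const char *out[], size_t max)
{
	size_t i, n = 0;

	assert(out != NULL);
	for (i = 0; i < sizeof(slot_events) / sizeof(slot_events[0]); i++)
		if ((intr_loc & slot_events[i].bit) && n < max)
			out[n++] = slot_events[i].name;
	return n;
}
```

Read this way, the two "intr_loc 10" lines are command completions for the power-on and LED writes, the long run of "intr_loc 2" lines is the power fault storm, and "intr_loc 12" is a command completion arriving while the power fault bit is still set.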

2009-10-28 19:55:19

by Alex Chiang

Subject: Re: pci-express hotplug

* Jens Axboe <[email protected]>:
> On Tue, Oct 27 2009, Alex Chiang wrote:
> > * Jens Axboe <[email protected]>:
> > >
> > > Since this is a new board and BIOS, below is the info from loading
> > > acpiphp with debug enabled and acpi debug enabled.
> >
> > Thanks. Can you please send your DSDT as well?
> >
> > You can obtain that with the acpidump tools. If they're not part
> > of your distro, you can find them on lesswatts.org.
>
> Sent privately.

Sorry, one more RTT of debug info needed. :-/

Can you send dmesg output and contents of /proc/iomem?

Thanks.

2009-10-28 20:46:55

by Alex Chiang

Subject: Re: pci-express hotplug

* Jens Axboe <[email protected]>:
>
> acpiphp: enable_slot - physical_slot = 1
> power_on_slot
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> get_slot_status
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> reading config space dvid 0xffffffff
> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL

Hm, as Kenji-san writes in an earlier email:

The direct cause of the problem that your slot was not
turned on is power fault. I guess acpiphp is suffering
the same problem. Unfortunately, it's difficult for me
to analyze the root cause of this power fault. Please ask
the hardware vendor about it. I hope board replacement
will fix the problem.

In get_slot_status(), we're trying to read the card's vendor ID,
which is a mandatory PCI config space register. The fact that we
can't even read that suggests something is going wrong well
before we get to this point.

Bjorn wondered on irc if your slots are physically working. Do
you know if they work under Windows? If they do, then it would be
good to find out how your bridges are being programmed, which I
believe you can discover with the Device Manager.

/ac
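
The "dvid 0xffffffff" lines quoted above are the usual sign that nothing answered the config read, since a PCI config read that no device claims returns all ones and 0xffff is not a valid vendor ID. A minimal sketch of that check in C follows. The function name split_dvid() is hypothetical, and the layout assumed is the standard one for the first config dword: vendor ID in the low 16 bits, device ID in the high 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split the first config-space dword into vendor and device ID.
 * Returns false when the vendor ID is 0xffff, i.e. no device responded.
 */
static bool split_dvid(uint32_t dvid, uint16_t *vendor, uint16_t *device)
{
	assert(vendor != NULL && device != NULL);
	*vendor = (uint16_t)(dvid & 0xffffu);
	*device = (uint16_t)(dvid >> 16);
	return *vendor != 0xffffu;
}
```

For comparison, a dword of 0x340c8086 would split into vendor 0x8086 and device 0x340c, the IDs that pciehp reports for the 0000:00:05.0 port itself, while 0xffffffff is rejected.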

2009-10-28 21:39:45

by Alex Chiang

Subject: Re: pci-express hotplug

* Jens Axboe <[email protected]>:
>
> acpiphp: enable_slot - physical_slot = 1
> power_on_slot
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0
> no _PS0

One final thought -- your DSDT doesn't provide any power methods
such as _PS[0-3] (I grepped your DSDT so basing my statement on
more than just the output above), and without those, I'm pretty
sure that there's no way for the OS to communicate to the BIOS
that we want to power those slots on.

So, something funky is going on with your BIOS. This isn't some
weird proto board or something, is it? ;)

/ac

2009-10-29 07:44:40

by Kenji Kaneshige

Subject: Re: pci-express hotplug

Jens Axboe wrote:
> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>> Jens Axboe wrote:
>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>> * Jens Axboe <[email protected]>:
>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>> slots' power files.
>>>>>>>>
>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>
>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>> on.
>>>>>>> It produces:
>>>>>>>
>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>
>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>> that. We could add some more debug printks to see what firmware
>>>>>> thinks about your system... Or we could just wait and see what
>>>>>> happens after you get your hardware replaced.
>>>>> New board, the exact same thing happens.
>>>>>
>>>>>>> I have a card in one of the slots only this time.
>>>>>>>
>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>> slots, right? :)
>>>>>>> Yes :-)
>>>>>>>
>>>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>> please.
>>>>>>> Send privately.
>>>>>> No difference in before and after. Odd.
>>>>>>
>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>> so. Sorry for being not so helpful. :-/
>>>>> Poke :-)
>>>>>
>>>>> One more thing I tried was pushing the power button on the slot
>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>> either way.
>>>>>
>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>> Could you get /proc/interrupts information after power fault
>>>> problem happens and send it to me?
>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>> either (see previous reply to Alex).
>>>
>> Could you try the attached debugging patch? With this patch, power
>> fault interrupt would be disabled after 100 power fault detected (
>> I hope so). You can get /proc/interrupts after that.
>
> Here is the output of doing the power on with that patch applied.
>
> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
> pciehp 0000:00:05.0:pcie04: Failed to check link status
> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>

From the console log, it seems that my debug patch worked as I expected
(power fault event interrupts were disabled after 100 power fault events).
But for some reason, /proc/interrupts shows only 5 pciehp interrupts.
Just in case, did you capture /proc/interrupts after doing the power on?

Thanks,
Kenji Kaneshige
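
A minimal sketch for checking the quoted console log against the "100 power fault events" expectation. It assumes that the intr_loc value printed by pcie_isr is a hexadecimal mask of the PCI Express Slot Status event bits (so 2 is power fault detected, 10 is command completed, and 12 is both). The function names and the bit table are illustrative and are not taken from the driver or the debug patch.

```python
# Slot Status event bits (assumed layout, per the PCI Express Slot Status register).
SLOT_STATUS_BITS = {
    0x01: "attention button pressed",
    0x02: "power fault detected",
    0x04: "MRL sensor changed",
    0x08: "presence detect changed",
    0x10: "command completed",
}

MARKER = "pcie_isr: intr_loc "


def decode_intr_loc(value):
    """Return the names of the event bits set in an intr_loc value."""
    return [name for bit, name in SLOT_STATUS_BITS.items() if value & bit]


def count_power_faults(log_text):
    """Count pcie_isr log lines whose intr_loc has the power-fault bit set."""
    count = 0
    for line in log_text.splitlines():
        pos = line.find(MARKER)
        if pos == -1:
            continue  # not a pcie_isr line
        # Assumption: the value is printed in hex without a 0x prefix.
        value = int(line[pos + len(MARKER):].split()[0], 16)
        if value & 0x02:
            count += 1
    return count
```

Feeding the saved dmesg output to count_power_faults() gives a number that can be compared with the pciehp count reported in /proc/interrupts. Quote prefixes such as "> " do not matter, because the parser searches for the marker anywhere in the line.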


2009-10-29 08:57:52

by Jens Axboe

Subject: Re: pci-express hotplug

On Wed, Oct 28 2009, Alex Chiang wrote:
> * Jens Axboe <[email protected]>:
> >
> > acpiphp: enable_slot - physical_slot = 1
> > power_on_slot
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
>
> One final thought -- your DSDT doesn't provide any power methods
> such as _PS[0-3] (I grepped your DSDT so basing my statement on
> more than just the output above), and without those, I'm pretty
> sure that there's no way for the OS to communicate to the BIOS
> that we want to power those slots on.
>
> So, something funky is going on with your BIOS. This isn't some
> weird proto board or something, is it? ;)

It's pre-production, but not a prototype. I'll take it up with the
vendor.

--
Jens Axboe

2009-10-29 08:58:24

by Jens Axboe

Subject: Re: pci-express hotplug

On Thu, Oct 29 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>> Jens Axboe wrote:
>>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>>> Jens Axboe wrote:
>>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>>> * Jens Axboe <[email protected]>:
>>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>>> slots' power files.
>>>>>>>>>
>>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>>
>>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>>> on.
>>>>>>>> It produces:
>>>>>>>>
>>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>>
>>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>>> that. We could add some more debug printks to see what firmware
>>>>>>> thinks about your system... Or we could just wait and see what
>>>>>>> happens after you get your hardware replaced.
>>>>>> New board, the exact same thing happens.
>>>>>>
>>>>>>>> I have a card in one of the slots only this time.
>>>>>>>>
>>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>>> slots, right? :)
>>>>>>>> Yes :-)
>>>>>>>>
>>>>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>>> please.
>>>>>>>> Send privately.
>>>>>>> No difference in before and after. Odd.
>>>>>>>
>>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>>> so. Sorry for being not so helpful. :-/
>>>>>> Poke :-)
>>>>>>
>>>>>> One more thing I tried was pushing the power button on the slot
>>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>>> either way.
>>>>>>
>>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>>> Could you get /proc/interrupts information after power fault
>>>>> problem happens and send it to me?
>>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>>> either (see previous reply to Alex).
>>>>
>>> Could you try the attached debugging patch? With this patch, power
>>> fault interrupt would be disabled after 100 power fault detected (
>>> I hope so). You can get /proc/interrupts after that.
>>
>> Here is the output of doing the power on with that patch applied.
>>
>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>
>
> From the console log, it seems that my debug patch worked as I expected
> (power fault event interrupts were disabled after 100 power fault events).
> But for some reason, /proc/interrupts indicates only 5 interrupts of
> pciehp. Just in case, did you get /proc/interrupts after doing power on?

Nope, it was captured post the power on attempt and the above log dump.

--
Jens Axboe
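
A minimal sketch for reading the "lnk_status = 1001" line in the quoted log. It assumes the value is the raw PCI Express Link Status register printed in hexadecimal, with the field layout from the PCI Express specification (bit 10 was defined as Link Training Error only in early revisions). The function name and dictionary keys are illustrative.

```python
def decode_link_status(value):
    """Split a raw PCIe Link Status register value into its fields."""
    return {
        "speed_code": value & 0xF,                  # bits 3:0, 1 means 2.5 GT/s
        "negotiated_width": (value >> 4) & 0x3F,    # bits 9:4, number of lanes
        "training_error": bool(value & 0x0400),     # bit 10 (early spec revisions)
        "link_training": bool(value & 0x0800),      # bit 11
        "slot_clock_config": bool(value & 0x1000),  # bit 12
        "dll_link_active": bool(value & 0x2000),    # bit 13
    }


status = decode_link_status(0x1001)
print(status["negotiated_width"], status["dll_link_active"])
```

Under these assumptions 0x1001 decodes to a negotiated width of 0 lanes with Data Link Layer Link Active clear, which is consistent with the "Data Link Layer Link Active not set" message just before it: no link came up behind the slot.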

2009-10-29 09:23:28

by Kenji Kaneshige

Subject: Re: pci-express hotplug

Jens Axboe wrote:
> On Thu, Oct 29 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>>> Jens Axboe wrote:
>>>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>>>> Jens Axboe wrote:
>>>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>>>> * Jens Axboe <[email protected]>:
>>>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>>>> slots' power files.
>>>>>>>>>>
>>>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>>>
>>>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>>>> on.
>>>>>>>>> It produces:
>>>>>>>>>
>>>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>>>
>>>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>>>> that. We could add some more debug printks to see what firmware
>>>>>>>> thinks about your system... Or we could just wait and see what
>>>>>>>> happens after you get your hardware replaced.
>>>>>>> New board, the exact same thing happens.
>>>>>>>
>>>>>>>>> I have a card in one of the slots only this time.
>>>>>>>>>
>>>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>>>> slots, right? :)
>>>>>>>>> Yes :-)
>>>>>>>>>
>>>>>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>>>> please.
>>>>>>>>> Send privately.
>>>>>>>> No difference in before and after. Odd.
>>>>>>>>
>>>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>>>> so. Sorry for being not so helpful. :-/
>>>>>>> Poke :-)
>>>>>>>
>>>>>>> One more thing I tried was pushing the power button on the slot
>>>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>>>> either way.
>>>>>>>
>>>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>>>> Could you get /proc/interrupts information after power fault
>>>>>> problem happens and send it to me?
>>>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>>>> either (see previous reply to Alex).
>>>>>
>>>> Could you try the attached debugging patch? With this patch, power
>>>> fault interrupt would be disabled after 100 power fault detected (
>>>> I hope so). You can get /proc/interrupts after that.
>>> Here is the output of doing the power on with that patch applied.
>>>
>>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>>
>> From the console log, it seems that my debug patch worked as I expected
>> (power fault event interrupts were disabled after 100 power fault events).
>> But for some reason, /proc/interrupts indicates only 5 interrupts of
>> pciehp. Just in case, did you get /proc/interrupts after doing power on?
>
> Nope, it was captured post the power on attempt and the above log dump.
>

Can I confirm that? (sorry for my poor English skill)

The /proc/interrupts output was captured *before* the power on attempt and the log.
Correct?

Thanks,
Kenji Kaneshige
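
A minimal sketch of one way to remove the before/after ambiguity: snapshot the pciehp counters from /proc/interrupts on both sides of the power-on attempt and compare them. It assumes the usual /proc/interrupts layout (an IRQ label ending in a colon, one count column per CPU, then the controller type and handler names). The function name is hypothetical.

```python
def pciehp_interrupt_counts(interrupts_text):
    """Map each IRQ label that has a pciehp handler to its total count over all CPUs."""
    totals = {}
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Skip the CPU header row and anything that is not an "IRQ:" row.
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        counts = []
        for field in fields[1:]:
            if not field.isdigit():
                break  # per-CPU count columns end at the first non-numeric field
            counts.append(int(field))
        description = " ".join(fields[1 + len(counts):])
        if "pciehp" in description:
            totals[fields[0].rstrip(":")] = sum(counts)
    return totals
```

Calling it once on the contents of /proc/interrupts before writing 1 to the slot's power file and once afterwards shows how many pciehp interrupts the attempt itself generated, for example with `pciehp_interrupt_counts(open("/proc/interrupts").read())`.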




2009-10-29 09:24:51

by Jens Axboe

Subject: Re: pci-express hotplug

On Thu, Oct 29 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Thu, Oct 29 2009, Kenji Kaneshige wrote:
>>> Jens Axboe wrote:
>>>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>>>> Jens Axboe wrote:
>>>>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>>>>> Jens Axboe wrote:
>>>>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>>>>> * Jens Axboe <[email protected]>:
>>>>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>>>>> slots' power files.
>>>>>>>>>>>
>>>>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>>>>
>>>>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>>>>> on.
>>>>>>>>>> It produces:
>>>>>>>>>>
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>>>>
>>>>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>>>>> that. We could add some more debug printks to see what firmware
>>>>>>>>> thinks about your system... Or we could just wait and see what
>>>>>>>>> happens after you get your hardware replaced.
>>>>>>>> New board, the exact same thing happens.
>>>>>>>>
>>>>>>>>>> I have a card in one of the slots only this time.
>>>>>>>>>>
>>>>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>>>>> slots, right? :)
>>>>>>>>>> Yes :-)
>>>>>>>>>>
>>>>>>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>>>>> please.
>>>>>>>>>> Send privately.
>>>>>>>>> No difference in before and after. Odd.
>>>>>>>>>
>>>>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>>>>> so. Sorry for being not so helpful. :-/
>>>>>>>> Poke :-)
>>>>>>>>
>>>>>>>> One more thing I tried was pushing the power button on the slot
>>>>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>>>>> either way.
>>>>>>>>
>>>>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>>>>> Could you get /proc/interrupts information after power fault
>>>>>>> problem happens and send it to me?
>>>>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>>>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>>>>> either (see previous reply to Alex).
>>>>>>
>>>>> Could you try the attached debugging patch? With this patch, power
>>>>> fault interrupt would be disabled after 100 power fault detected (
>>>>> I hope so). You can get /proc/interrupts after that.
>>>> Here is the output of doing the power on with that patch applied.
>>>>
>>>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>>>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>>>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>>>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>>>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>>>
>>> From the console log, it seems that my debug patch worked as I expected
>>> (power fault event interrupts were disabled after 100 power fault events).
>>> But for some reason, /proc/interrupts indicates only 5 interrupts for
>>> pciehp. Just in case, did you get /proc/interrupts after doing power on?
>>
>> Nope, it was captured post the power on attempt and the above log dump.
>>
>
> Can I confirm that? (sorry for my poor English skill)
>
> The /proc/interrupts output was captured *before* the power on attempt and the log.
> Correct?

No, the /proc/interrupts output was captured AFTER the power on attempt
and the log capture shown above.
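
For comparing the pciehp count in /proc/interrupts against the number of pcie_isr lines in the log, something like the following could be used. It is only a minimal sketch: the function names are made up for illustration, and it assumes the usual layout of a numeric IRQ label, one counter column per CPU, then the chip and handler names.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts line.
 * Returns -1 for lines without a numeric IRQ label (header, NMI, ...).
 */
long long sum_irq_line(const char *line)
{
	const char *p = line;
	char *end;
	long long total = 0;

	while (*p == ' ' || *p == '\t')
		p++;
	if (!isdigit((unsigned char)*p))
		return -1;
	strtol(p, &end, 10);		/* skip the IRQ number */
	if (*end != ':')
		return -1;
	p = end + 1;

	for (;;) {
		long long v;

		while (*p == ' ' || *p == '\t')
			p++;
		if (!isdigit((unsigned char)*p))
			break;		/* reached the chip/handler names */
		v = strtoll(p, &end, 10);
		if (*end != ' ' && *end != '\t' && *end != '\n' && *end != '\0')
			break;		/* digits glued to text, not a counter */
		total += v;
		p = end;
	}
	return total;
}

/*
 * Print the summed count for every line of `in` that mentions `name`
 * and return the grand total. Lines longer than the buffer are not
 * handled specially (assumption: 4096 bytes is enough for this box).
 */
long long report_irqs(FILE *in, const char *name)
{
	char line[4096];
	long long total = 0;

	while (fgets(line, sizeof(line), in)) {
		long long n;

		if (!strstr(line, name))
			continue;
		n = sum_irq_line(line);
		if (n < 0)
			continue;
		printf("%lld\t%s", n, line);
		total += n;
	}
	return total;
}
```

Calling report_irqs() on a FILE opened from /proc/interrupts with "pciehp" as the name, once before and once after the power on attempt, gives two totals whose difference can be checked against the log.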

--
Jens Axboe

2009-10-29 18:55:49

by Jens Axboe

Subject: Re: pci-express hotplug


Just a note for the archives - after chatting with Alex on irc about
this issue and trying other cards, the likely suspect seems to be the
specific card used and/or the firmware on that card. Hotplug works
otherwise, just not with that card at least.
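
Also for the archives, a note on reading the earlier log. As far as I can tell, the intr_loc value printed by pcie_isr is the hex value of the Slot Status event bits that were set, so "intr_loc 2" is Power Fault Detected, "intr_loc 10" is Command Completed, and "intr_loc 12" is both. A small decoder sketch follows; the bit positions are from the PCI Express Base Specification, while the macro names, the short event names and the helper itself are only illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Slot Status event bits (PCI Express Base Specification). */
#define SLTSTA_ABP	0x0001	/* Attention Button Pressed */
#define SLTSTA_PFD	0x0002	/* Power Fault Detected */
#define SLTSTA_MRLSC	0x0004	/* MRL Sensor Changed */
#define SLTSTA_PDC	0x0008	/* Presence Detect Changed */
#define SLTSTA_CC	0x0010	/* Command Completed */

/*
 * Write a space-separated list of event names for `intr_loc` into
 * `buf` (at most `len` bytes including the terminator). Returns buf.
 */
char *decode_intr_loc(unsigned int intr_loc, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} ev[] = {
		{ SLTSTA_ABP,   "attention-button" },
		{ SLTSTA_PFD,   "power-fault" },
		{ SLTSTA_MRLSC, "mrl-sensor" },
		{ SLTSTA_PDC,   "presence-detect" },
		{ SLTSTA_CC,    "cmd-completed" },
	};
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(ev) / sizeof(ev[0]); i++) {
		if (!(intr_loc & ev[i].bit))
			continue;
		if (buf[0])
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, ev[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

Read that way, the long run of "intr_loc 2" lines is the slot reporting a power fault over and over, which would fit the card being the problem.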

--
Jens Axboe

2009-11-02 05:28:31

by Kenji Kaneshige

Subject: Re: pci-express hotplug

If the firmware doesn't grant native hotplug control through the ACPI
_OSC method, we must not evaluate OSHP.

Signed-off-by: Kenji Kaneshige <[email protected]>

---
drivers/pci/hotplug/acpi_pcihp.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

Index: 20091026/drivers/pci/hotplug/acpi_pcihp.c
===================================================================
--- 20091026.orig/drivers/pci/hotplug/acpi_pcihp.c
+++ 20091026/drivers/pci/hotplug/acpi_pcihp.c
@@ -362,6 +362,8 @@ int acpi_get_hp_hw_control_from_firmware
 			status = acpi_pci_osc_control_set(handle, flags);
 			if (ACPI_SUCCESS(status))
 				goto got_one;
+			if (status == AE_SUPPORT)
+				goto no_control;
 			kfree(string.pointer);
 			string = (struct acpi_buffer){ ACPI_ALLOCATE_BUFFER, NULL };
 		}
@@ -394,10 +396,9 @@ int acpi_get_hp_hw_control_from_firmware
 		if (ACPI_FAILURE(status))
 			break;
 	}
-
+no_control:
 	dbg("Cannot get control of hotplug hardware for pci %s\n",
 	    pci_name(pdev));
-
 	kfree(string.pointer);
 	return -ENODEV;
 got_one:
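
The intended control flow after this change can be sketched in a standalone form as below. This is a simplified model, not the kernel code: the enum, struct and function names are hypothetical, the ACPI calls are replaced by canned results stored per bridge, and it assumes AE_SUPPORT is what the _OSC path reports when the firmware declines to hand over control.

```c
#include <stddef.h>

/* Stand-ins for AE_OK, AE_SUPPORT and any other failure (assumption). */
enum status { ST_OK, ST_SUPPORT, ST_NOT_FOUND };

struct bridge {
	enum status osc;	/* canned result of requesting control via _OSC */
	enum status oshp;	/* canned result of evaluating OSHP */
	int oshp_calls;		/* how many times OSHP was evaluated */
	struct bridge *parent;
};

/* Returns 0 if the OS gets native hotplug control, -1 otherwise. */
int get_hp_control(struct bridge *start)
{
	struct bridge *b;

	/* First pass: ask for control through _OSC, walking up the tree. */
	for (b = start; b; b = b->parent) {
		if (b->osc == ST_OK)
			return 0;
		if (b->osc == ST_SUPPORT)
			return -1;	/* firmware said no: skip OSHP entirely */
	}

	/* Second pass: only reached if _OSC gave no definite answer. */
	for (b = start; b; b = b->parent) {
		b->oshp_calls++;
		if (b->oshp == ST_OK)
			return 0;
	}
	return -1;
}
```

The point of the early return is that an explicit refusal from _OSC is treated as final, so OSHP is never evaluated behind the firmware's back.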


Attachments:
pciehp-fix-power-fault-interrupt-storm-problem.patch (4.44 kB)
pci-hotplug-fix-oshp-evaluation.patch (1.03 kB)