2014-06-17 19:27:41

by Myron Stowe

[permalink] [raw]
Subject: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

During PCIe hot-plug initialization - pciehp_probe - data structures
related to slot capabilities are set up. As part of this set up, ISRs are
put in place to handle slot events and all event bits are cleared out.

This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
Slot Status bit to the event bits that are cleared out during
initialization.

Reference:
PCI-SIG. PCI Express Base Specification Revision 4.0 Version 0.3
(PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).

Signed-off-by: Myron Stowe <[email protected]>
---

drivers/pci/hotplug/pciehp_hpc.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 42914e0..0568416 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
- PCI_EXP_SLTSTA_CC);
+ PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);

/* Disable software notification */
pcie_disable_notification(ctrl);


2014-06-17 21:39:10

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <[email protected]> wrote:
> During PCIe hot-plug initialization - pciehp_probe - data structures
> related to slot capabilities are set up. As part of this set up, ISRs are
> put in place to handle slot events and all event bits are cleared out.
>
> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
> Slot Status bit to the event bits that are cleared out during
> initialization.

I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
notifications for hot-plug and removal"). Prior to that, pcie_isr()
didn't look at the PCI_EXP_SLTSTA_DLLSC bit.

Apparently there's a non-public report of spurious messages like this
at boot-time, with no actual hotplug events:

pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
0000:83:00, cannot hot-add
pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00

Device 0000:83:00.0 was enumerated previously, during the normal PCI
device enumeration. I suspect DLLSC was set by the hardware when the
link came up after power-on, and it remained set until Linux booted.
Then when we take the first pciehp interrupt, we notice DLLSC is set
and think it's a new link state change.

This might be system-specific, because some BIOSes might clear DLLSC
before handing off to the OS. Or my theory might be all wet.

But I think we should clear DLLSC in any case. We clear all the other
RW1C bits in Slot Status, and I think we should clear DLLSC as well.

I'd like to include a bugzilla or mailing list reference for the
spurious message, and I'd also like confirmation that this changes
actually fixes it.

Bjorn

> Reference:
> PCI-SIG. PCI Express Base Specification Revision 4.0 Version 0.3
> (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>
> Signed-off-by: Myron Stowe <[email protected]>
> ---
>
> drivers/pci/hotplug/pciehp_hpc.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 42914e0..0568416 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
> PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
> - PCI_EXP_SLTSTA_CC);
> + PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>
> /* Disable software notification */
> pcie_disable_notification(ctrl);
>

2014-06-17 22:55:42

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:
> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <[email protected]> wrote:
>> During PCIe hot-plug initialization - pciehp_probe - data structures
>> related to slot capabilities are set up. As part of this set up, ISRs are
>> put in place to handle slot events and all event bits are cleared out.
>>
>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>> Slot Status bit to the event bits that are cleared out during
>> initialization.
>
> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
> notifications for hot-plug and removal"). Prior to that, pcie_isr()
> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>
> Apparently there's a non-public report of spurious messages like this
> at boot-time, with no actual hotplug events:
>
> pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
> pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
> 0000:83:00, cannot hot-add
> pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>

I think I have seen that message once in a while in our systems.
Rajat, didn't we talk about this a while ago ?

Guenter

> Device 0000:83:00.0 was enumerated previously, during the normal PCI
> device enumeration. I suspect DLLSC was set by the hardware when the
> link came up after power-on, and it remained set until Linux booted.
> Then when we take the first pciehp interrupt, we notice DLLSC is set
> and think it's a new link state change.
>
> This might be system-specific, because some BIOSes might clear DLLSC
> before handing off to the OS. Or my theory might be all wet.
>
> But I think we should clear DLLSC in any case. We clear all the other
> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>
> I'd like to include a bugzilla or mailing list reference for the
> spurious message, and I'd also like confirmation that this changes
> actually fixes it.
>
> Bjorn
>
>> Reference:
>> PCI-SIG. PCI Express Base Specification Revision 4.0 Version 0.3
>> (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>
>> Signed-off-by: Myron Stowe <[email protected]>
>> ---
>>
>> drivers/pci/hotplug/pciehp_hpc.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index 42914e0..0568416 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>> PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>> PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>> - PCI_EXP_SLTSTA_CC);
>> + PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>
>> /* Disable software notification */
>> pcie_disable_notification(ctrl);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
>
>

2014-06-24 20:34:20

by Rajat Jain

[permalink] [raw]
Subject: RE: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

Hello,

Sorry, I missed this email.

Please see below.

> -----Original Message-----
> From: [email protected] [mailto:linux-pci-
> [email protected]] On Behalf Of Guenter Roeck
> Sent: Tuesday, June 17, 2014 3:56 PM
> To: Bjorn Helgaas; Myron Stowe
> Cc: [email protected]; Kenji Kaneshige; linux-
> [email protected]; Rajat Jain
> Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed
> bit when clearing the Slot Status register's event bits
>
> On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:
> > On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe
> <[email protected]> wrote:
> >> During PCIe hot-plug initialization - pciehp_probe - data structures
> >> related to slot capabilities are set up. As part of this set up,
> >> ISRs are put in place to handle slot events and all event bits are cleared
> out.
> >>
> >> This patch adds the Data Link Layer State Changed
> >> (PCI_EXP_SLTSTA_DLLSC) Slot Status bit to the event bits that are
> >> cleared out during initialization.
> >
> > I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
> > notifications for hot-plug and removal"). Prior to that, pcie_isr()
> > didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
> >
> > Apparently there's a non-public report of spurious messages like this
> > at boot-time, with no actual hotplug events:
> >
> > pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
> > pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
> > 0000:83:00, cannot hot-add
> > pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
> >
>
> I think I have seen that message once in a while in our systems.
> Rajat, didn't we talk about this a while ago ?

Essentially my hiccup was that I was not sure whether the driver should or should not take care of the link change events that have happened BEFORE the driver gets loaded. This has more impact if the pciehp is built as a kernel module.

As an example:

It is common for the platform init code built into the kernel, to take the PCI devices on the board out of reset. And that can happen after the PCI enumeration but before the pciehp driver gets loaded. Thus in this condition, with this patch, the pciehp will ignore the linkup event, thus device will not be visible. The only way (other than a rescan) to do hot-add the device would be to apply-and-remove-reset-signal to the device again. At which point pciehp may give a warning about about an attempt to remove a non-existent card, and then will proceed fine with hot-add.

The patch looks good, only wanted to make sure that we understand and agree that the pciehp should NOT process and link events that have happened before the pciehp was loaded.

Acked-by: Rajat Jain <[email protected]>

Thanks,

Rajat


????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-06-24 21:16:45

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

On Tue, Jun 24, 2014 at 2:34 PM, Rajat Jain <[email protected]> wrote:
> Hello,
>
> Sorry, I missed this email.
>
> Please see below.
>
>> -----Original Message-----
>> From: [email protected] [mailto:linux-pci-
>> [email protected]] On Behalf Of Guenter Roeck
>> Sent: Tuesday, June 17, 2014 3:56 PM
>> To: Bjorn Helgaas; Myron Stowe
>> Cc: [email protected]; Kenji Kaneshige; linux-
>> [email protected]; Rajat Jain
>> Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed
>> bit when clearing the Slot Status register's event bits
>>
>> On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:
>> > On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe
>> <[email protected]> wrote:
>> >> During PCIe hot-plug initialization - pciehp_probe - data structures
>> >> related to slot capabilities are set up. As part of this set up,
>> >> ISRs are put in place to handle slot events and all event bits are cleared
>> out.
>> >>
>> >> This patch adds the Data Link Layer State Changed
>> >> (PCI_EXP_SLTSTA_DLLSC) Slot Status bit to the event bits that are
>> >> cleared out during initialization.
>> >
>> > I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>> > notifications for hot-plug and removal"). Prior to that, pcie_isr()
>> > didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>> >
>> > Apparently there's a non-public report of spurious messages like this
>> > at boot-time, with no actual hotplug events:
>> >
>> > pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>> > pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>> > 0000:83:00, cannot hot-add
>> > pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>> >
>>
>> I think I have seen that message once in a while in our systems.
>> Rajat, didn't we talk about this a while ago ?
>
> Essentially my hiccup was that I was not sure whether the driver should or should not take care of the link change events that have happened BEFORE the driver gets loaded. This has more impact if the pciehp is built as a kernel module.
>
> As an example:
>
> It is common for the platform init code built into the kernel, to take the PCI devices on the board out of reset. And that can happen after the PCI enumeration but before the pciehp driver gets loaded. Thus in this condition, with this patch, the pciehp will ignore the linkup event, thus device will not be visible. The only way (other than a rescan) to do hot-add the device would be to apply-and-remove-reset-signal to the device again. At which point pciehp may give a warning about about an attempt to remove a non-existent card, and then will proceed fine with hot-add.

When you saw these problems, was pciehp a module? We changed it
(c10cc483bf3f in v3.11) so it can't be a module any more, but if there
can still be a significant window between enumeration and pciehp init,
I'd like to understand more about it.

> The patch looks good, only wanted to make sure that we understand and agree that the pciehp should NOT process and link events that have happened before the pciehp was loaded.

Currently (before this patch), we clear the Attention Button Pressed,
Power Fault Detected, MRL Sensor Changed, Presence Detect Changed, and
Command Completed bits, but we *don't* clear Data Link Layer State
Changed.

That asymmetry seems hard to justify. For example, if a card were
inserted after enumeration but before pciehp is initialized, we'd miss
the PDC indication, so I think we would fail to notice the new device.
That seems basically the same as missing the linkup event in your
example. In both cases, I think we *should* notice the new device, so
the fact that we don't is a problem.

But since pciehp can't be a module any more, the window between PCI
enumeration and pciehp initialization should be relatively small. I
think the best way to fix this would be to integrate pciehp more
closely into PCI enumeration, e.g., by doing pcie_init() at the point
where we discover the bridge, before we enumerate any devices below
the bridge. This is somewhat tangled up with acpihp, so I don't know
how complicated it would be to do this.

So I guess my argument is:

- Ignoring events that happen before pciehp init decreases our
dependency on BIOS
- Handling all events consistently is very important
- We currently have a problem with missing pre-pciehp events, but
there is a way to fix this

> Acked-by: Rajat Jain <[email protected]>

Thanks for taking a look at this!

Bjorn

2014-06-25 05:26:22

by Rajat Jain

[permalink] [raw]
Subject: RE: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

Hello,

> >>
> >> On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:
> >> > On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe
> >> <[email protected]> wrote:
> >> >> During PCIe hot-plug initialization - pciehp_probe - data
> >> >> structures related to slot capabilities are set up. As part of
> >> >> this set up, ISRs are put in place to handle slot events and all
> >> >> event bits are cleared
> >> out.
> >> >>
> >> >> This patch adds the Data Link Layer State Changed
> >> >> (PCI_EXP_SLTSTA_DLLSC) Slot Status bit to the event bits that are
> >> >> cleared out during initialization.
> >> >
> >> > I think this is related to e48f1b67f668 ("PCI: pciehp: Use link
> >> > change notifications for hot-plug and removal"). Prior to that,
> >> > pcie_isr() didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
> >> >
> >> > Apparently there's a non-public report of spurious messages like
> >> > this at boot-time, with no actual hotplug events:
> >> >
> >> > pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
> >> > pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists
> >> > at 0000:83:00, cannot hot-add
> >> > pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
> >> >
> >>
> >> I think I have seen that message once in a while in our systems.
> >> Rajat, didn't we talk about this a while ago ?
> >
> > Essentially my hiccup was that I was not sure whether the driver should or
> should not take care of the link change events that have happened BEFORE
> the driver gets loaded. This has more impact if the pciehp is built as a kernel
> module.
> >
> > As an example:
> >
> > It is common for the platform init code built into the kernel, to take the PCI
> devices on the board out of reset. And that can happen after the PCI
> enumeration but before the pciehp driver gets loaded. Thus in this condition,
> with this patch, the pciehp will ignore the linkup event, thus device will not
> be visible. The only way (other than a rescan) to do hot-add the device would
> be to apply-and-remove-reset-signal to the device again. At which point
> pciehp may give a warning about about an attempt to remove a non-existent
> card, and then will proceed fine with hot-add.
>
> When you saw these problems, was pciehp a module?

No, we did not actually hit *this* problem (DLLSC getting set before pciehp init). We had observed another HW specific problem where both link event and presence detect were toggling, hence 2 independent hot-plug interrupts events were coming, thus resulting in spurious messages.

> We changed it
> (c10cc483bf3f in v3.11) so it can't be a module any more, but if there can still
> be a significant window between enumeration and pciehp init, I'd like to
> understand more about it.

Oh, my bad. I did not ever try to actually build it as a module, I just assumed that it'd be possible. I was just thinking of corner cases I guess.

>
> > The patch looks good, only wanted to make sure that we understand and
> agree that the pciehp should NOT process and link events that have
> happened before the pciehp was loaded.
>
> Currently (before this patch), we clear the Attention Button Pressed, Power
> Fault Detected, MRL Sensor Changed, Presence Detect Changed, and
> Command Completed bits, but we *don't* clear Data Link Layer State
> Changed.
>
> That asymmetry seems hard to justify.

Agree.

> For example, if a card were inserted
> after enumeration but before pciehp is initialized, we'd miss the PDC
> indication, so I think we would fail to notice the new device.
> That seems basically the same as missing the linkup event in your example.
> In both cases, I think we *should* notice the new device, so the fact that we
> don't is a problem.

Hummm, ok understood. So that means we have defined a problem statement at least :-). Yes, I understand that the window is very small here. Also I agree with your opinion whatever solution we put it to take care of events that happened before pciehp init, should be consistent for all events (not just link changes). The problem can wait to be solved I think.

>
> But since pciehp can't be a module any more, the window between PCI
> enumeration and pciehp initialization should be relatively small. I think the
> best way to fix this would be to integrate pciehp more closely into PCI
> enumeration, e.g., by doing pcie_init() at the point where we discover the
> bridge, before we enumerate any devices below the bridge. This is
> somewhat tangled up with acpihp, so I don't know how complicated it would
> be to do this.
>
> So I guess my argument is:
>
> - Ignoring events that happen before pciehp init decreases our dependency
> on BIOS

Agree.

> - Handling all events consistently is very important

Agree.

> - We currently have a problem with missing pre-pciehp events, but there is
> a way to fix this

Somewhat agree (I do not understand what that way is currently. But I think it can wait as we don't hit that problem today and also the window where it can occur is very small).

Thanks,

Rajat

>
> > Acked-by: Rajat Jain <[email protected]>
>
> Thanks for taking a look at this!
>
> Bjorn
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-06-25 17:51:56

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

On Tue, Jun 24, 2014 at 11:26 PM, Rajat Jain <[email protected]> wrote:

>> > Essentially my hiccup was that I was not sure whether the driver should or
>> should not take care of the link change events that have happened BEFORE
>> the driver gets loaded. This has more impact if the pciehp is built as a kernel
>> module.
>> >
>> > As an example:
>> >
>> > It is common for the platform init code built into the kernel, to take the PCI
>> devices on the board out of reset. And that can happen after the PCI
>> enumeration but before the pciehp driver gets loaded. Thus in this condition,
>> with this patch, the pciehp will ignore the linkup event, thus device will not
>> be visible. The only way (other than a rescan) to do hot-add the device would
>> be to apply-and-remove-reset-signal to the device again. At which point
>> pciehp may give a warning about about an attempt to remove a non-existent
>> card, and then will proceed fine with hot-add.
>>
>> When you saw these problems, was pciehp a module?
>
> No, we did not actually hit *this* problem (DLLSC getting set before pciehp init). We had observed another HW specific problem where both link event and presence detect were toggling, hence 2 independent hot-plug interrupts events were coming, thus resulting in spurious messages.

OK, this sounds like a slightly different problem that we can work on
later if it turns out to be necessary.

> > For example, if a card were inserted
>> after enumeration but before pciehp is initialized, we'd miss the PDC
>> indication, so I think we would fail to notice the new device.
>> That seems basically the same as missing the linkup event in your example.
>> In both cases, I think we *should* notice the new device, so the fact that we
>> don't is a problem.
> ...

>> - We currently have a problem with missing pre-pciehp events, but there is
>> a way to fix this
>
> Somewhat agree (I do not understand what that way is currently. But I think it can wait as we don't hit that problem today and also the window where it can occur is very small).

My thought was that we could fix this by integrating pciehp into PCI
enumeration, so we're prepared to handle hotplug events on the bridge
before we enumerate any device below the bridge. But I haven't
actually tried this so I don't know whether it's practical.

Bjorn

2014-06-30 16:49:18

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <[email protected]> wrote:
> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <[email protected]> wrote:
>> During PCIe hot-plug initialization - pciehp_probe - data structures
>> related to slot capabilities are set up. As part of this set up, ISRs are
>> put in place to handle slot events and all event bits are cleared out.
>>
>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>> Slot Status bit to the event bits that are cleared out during
>> initialization.
>
> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
> notifications for hot-plug and removal"). Prior to that, pcie_isr()
> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>
> Apparently there's a non-public report of spurious messages like this
> at boot-time, with no actual hotplug events:
>
> pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
> pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
> 0000:83:00, cannot hot-add
> pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>
> Device 0000:83:00.0 was enumerated previously, during the normal PCI
> device enumeration. I suspect DLLSC was set by the hardware when the
> link came up after power-on, and it remained set until Linux booted.
> Then when we take the first pciehp interrupt, we notice DLLSC is set
> and think it's a new link state change.
>
> This might be system-specific, because some BIOSes might clear DLLSC
> before handing off to the OS. Or my theory might be all wet.
>
> But I think we should clear DLLSC in any case. We clear all the other
> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>
> I'd like to include a bugzilla or mailing list reference for the
> spurious message, and I'd also like confirmation that this changes
> actually fixes it.

I just got confirmation from the reporter that the patch fixes the
issue they were encountering.

I've also asked them numerous time if I could make the bug public as I
expect others may be hitting it and could benefit from a BZ and
analysis/explanation. I just repeated my request - if they reply
positively I'll follow through with some documentation.

Myron

>
> Bjorn
>
>> Reference:
>> PCI-SIG. PCI Express Base Specification Revision 4.0 Version 0.3
>> (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>
>> Signed-off-by: Myron Stowe <[email protected]>
>> ---
>>
>> drivers/pci/hotplug/pciehp_hpc.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index 42914e0..0568416 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>> PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>> PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>> - PCI_EXP_SLTSTA_CC);
>> + PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>
>> /* Disable software notification */
>> pcie_disable_notification(ctrl);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-07-01 19:30:18

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <[email protected]> wrote:
> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <[email protected]> wrote:
>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <[email protected]> wrote:
>>> During PCIe hot-plug initialization - pciehp_probe - data structures
>>> related to slot capabilities are set up. As part of this set up, ISRs are
>>> put in place to handle slot events and all event bits are cleared out.
>>>
>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>>> Slot Status bit to the event bits that are cleared out during
>>> initialization.
>>
>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>> notifications for hot-plug and removal"). Prior to that, pcie_isr()
>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>>
>> Apparently there's a non-public report of spurious messages like this
>> at boot-time, with no actual hotplug events:
>>
>> pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>> pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>> 0000:83:00, cannot hot-add
>> pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>>
>> Device 0000:83:00.0 was enumerated previously, during the normal PCI
>> device enumeration. I suspect DLLSC was set by the hardware when the
>> link came up after power-on, and it remained set until Linux booted.
>> Then when we take the first pciehp interrupt, we notice DLLSC is set
>> and think it's a new link state change.
>>
>> This might be system-specific, because some BIOSes might clear DLLSC
>> before handing off to the OS. Or my theory might be all wet.
>>
>> But I think we should clear DLLSC in any case. We clear all the other
>> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>>
>> I'd like to include a bugzilla or mailing list reference for the
>> spurious message, and I'd also like confirmation that this changes
>> actually fixes it.
>
> I just got confirmation from the reporter that the patch fixes the
> issue they were encountering.
>
> I've also asked them numerous time if I could make the bug public as I
> expect others may be hitting it and could benefit from a BZ and
> analysis/explanation. I just repeated my request - if they reply
> positively I'll follow through with some documentation.

I'd be glad to merge this, as soon as we get more details about the
problem it fixes. I think it's good to include some specifics, e.g.,
"Unexpected Link Up event," etc., because it helps other people with
the same problem to find the solution. It's OK if details about
pre-production machines and so on are removed, but it's better if
changes to generic code are supported by public evidence that
everybody can evaluate.

If you want to pull out the public stuff into a kernel.org bugzilla,
that would be fine, too. That would actually be better because then
we don't have to depend on Red Hat maintaining it.

Bjorn

>>> Reference:
>>> PCI-SIG. PCI Express Base Specification Revision 4.0 Version 0.3
>>> (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>>
>>> Signed-off-by: Myron Stowe <[email protected]>
>>> ---
>>>
>>> drivers/pci/hotplug/pciehp_hpc.c | 2 +-
>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>> index 42914e0..0568416 100644
>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>> PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>> PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>>> - PCI_EXP_SLTSTA_CC);
>>> + PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>>
>>> /* Disable software notification */
>>> pcie_disable_notification(ctrl);
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-07-07 20:26:41

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

On Tue, Jul 1, 2014 at 1:29 PM, Bjorn Helgaas <[email protected]> wrote:
> On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <[email protected]> wrote:
>> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <[email protected]> wrote:
>>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <[email protected]> wrote:
>>>> During PCIe hot-plug initialization - pciehp_probe - data structures
>>>> related to slot capabilities are set up. As part of this set up, ISRs are
>>>> put in place to handle slot events and all event bits are cleared out.
>>>>
>>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>>>> Slot Status bit to the event bits that are cleared out during
>>>> initialization.
>>>
>>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>>> notifications for hot-plug and removal"). Prior to that, pcie_isr()
>>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>>>
>>> Apparently there's a non-public report of spurious messages like this
>>> at boot-time, with no actual hotplug events:
>>>
>>> pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>>> pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>>> 0000:83:00, cannot hot-add
>>> pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>>>
>>> Device 0000:83:00.0 was enumerated previously, during the normal PCI
>>> device enumeration. I suspect DLLSC was set by the hardware when the
>>> link came up after power-on, and it remained set until Linux booted.
>>> Then when we take the first pciehp interrupt, we notice DLLSC is set
>>> and think it's a new link state change.
>>>
>>> This might be system-specific, because some BIOSes might clear DLLSC
>>> before handing off to the OS. Or my theory might be all wet.
>>>
>>> But I think we should clear DLLSC in any case. We clear all the other
>>> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>>>
>>> I'd like to include a bugzilla or mailing list reference for the
>>> spurious message, and I'd also like confirmation that this changes
>>> actually fixes it.
>>
>> I just got confirmation from the reporter that the patch fixes the
>> issue they were encountering.
>>
>> I've also asked them numerous time if I could make the bug public as I
>> expect others may be hitting it and could benefit from a BZ and
>> analysis/explanation. I just repeated my request - if they reply
>> positively I'll follow through with some documentation.
>
> I'd be glad to merge this, as soon as we get more details about the
> problem it fixes. I think it's good to include some specifics, e.g.,
> "Unexpected Link Up event," etc., because it helps other people with
> the same problem to find the solution. It's OK if details about
> pre-production machines and so on are removed, but it's better if
> changes to generic code are supported by public evidence that
> everybody can evaluate.
>
> If you want to pull out the public stuff into a kernel.org bugzilla,
> that would be fine, too. That would actually be better because then
> we don't have to depend on Red Hat maintaining it.

Agreed (but I couldn't get the vendor to agree to open up the Red Hat
BZ). I've opened https://bugzilla.kernel.org/show_bug.cgi?id=79611
and included a scrubbed 'dmesg' log there along with an analysis of
how we are hitting the issue.

>
> Bjorn
>
>>>> Reference:
>>>> PCI-SIG. PCI Express Base Specification Revision 4.0 Version 0.3
>>>> (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>>>
>>>> Signed-off-by: Myron Stowe <[email protected]>
>>>> ---
>>>>
>>>> drivers/pci/hotplug/pciehp_hpc.c | 2 +-
>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>>> index 42914e0..0568416 100644
>>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>>> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>>> PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>>> PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>>>> - PCI_EXP_SLTSTA_CC);
>>>> + PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>>>
>>>> /* Disable software notification */
>>>> pcie_disable_notification(ctrl);
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-07-07 20:56:36

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

On Mon, Jul 7, 2014 at 2:26 PM, Myron Stowe <[email protected]> wrote:
> On Tue, Jul 1, 2014 at 1:29 PM, Bjorn Helgaas <[email protected]> wrote:
>> On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <[email protected]> wrote:
>>> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <[email protected]> wrote:
>>>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <[email protected]> wrote:
>>>>> During PCIe hot-plug initialization - pciehp_probe - data structures
>>>>> related to slot capabilities are set up. As part of this set up, ISRs are
>>>>> put in place to handle slot events and all event bits are cleared out.
>>>>>
>>>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>>>>> Slot Status bit to the event bits that are cleared out during
>>>>> initialization.
>>>>
>>>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>>>> notifications for hot-plug and removal"). Prior to that, pcie_isr()
>>>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>>>>
>>>> Apparently there's a non-public report of spurious messages like this
>>>> at boot-time, with no actual hotplug events:
>>>>
>>>> pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>>>> pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>>>> 0000:83:00, cannot hot-add
>>>> pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>>>>
>>>> Device 0000:83:00.0 was enumerated previously, during the normal PCI
>>>> device enumeration. I suspect DLLSC was set by the hardware when the
>>>> link came up after power-on, and it remained set until Linux booted.
>>>> Then when we take the first pciehp interrupt, we notice DLLSC is set
>>>> and think it's a new link state change.
>>>>
>>>> This might be system-specific, because some BIOSes might clear DLLSC
>>>> before handing off to the OS. Or my theory might be all wet.
>>>>
>>>> But I think we should clear DLLSC in any case. We clear all the other
>>>> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>>>>
>>>> I'd like to include a bugzilla or mailing list reference for the
>>>> spurious message, and I'd also like confirmation that this changes
>>>> actually fixes it.
>>>
>>> I just got confirmation from the reporter that the patch fixes the
>>> issue they were encountering.
>>>
>>> I've also asked them numerous time if I could make the bug public as I
>>> expect others may be hitting it and could benefit from a BZ and
>>> analysis/explanation. I just repeated my request - if they reply
>>> positively I'll follow through with some documentation.
>>
>> I'd be glad to merge this, as soon as we get more details about the
>> problem it fixes. I think it's good to include some specifics, e.g.,
>> "Unexpected Link Up event," etc., because it helps other people with
>> the same problem to find the solution. It's OK if details about
>> pre-production machines and so on are removed, but it's better if
>> changes to generic code are supported by public evidence that
>> everybody can evaluate.
>>
>> If you want to pull out the public stuff into a kernel.org bugzilla,
>> that would be fine, too. That would actually be better because then
>> we don't have to depend on Red Hat maintaining it.
>
> Agreed (but I couldn't get the vendor to agree to open up the Red Hat
> BZ). I've opened https://bugzilla.kernel.org/show_bug.cgi?id=79611
> and included a scrubbed 'dmesg' log there along with an analysis of
> how we are hitting the issue.

Beautiful, thanks! I marked this for stable and put this in
pci/hotplug for v3.17.

>>>>> Reference:
>>>>> PCI-SIG. PCI Express Base Specification Revision 4.0 Version 0.3
>>>>> (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>>>>
>>>>> Signed-off-by: Myron Stowe <[email protected]>
>>>>> ---
>>>>>
>>>>> drivers/pci/hotplug/pciehp_hpc.c | 2 +-
>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>>>> index 42914e0..0568416 100644
>>>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>>>> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>>>> PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>>>> PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>>>>> - PCI_EXP_SLTSTA_CC);
>>>>> + PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>>>>
>>>>> /* Disable software notification */
>>>>> pcie_disable_notification(ctrl);
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html