2014-01-05 17:53:09

by Rajat Jain

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link permanently, during removal

Hello Bjorn,

Just checking on the fate of this patch set...

On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <[email protected]> wrote:
> [+cc [email protected] (seems to be Yinghai's preferred email]
>
> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
>> We need future link up events for hot-add, thus don't disable
>> the link permanently during device removal. Also, remove the static
>> functions that are now left unused.
>
> The changelog should mention that this reverts part of 2debd9289997 ("PCI:
> pciehp: Disable/enable link during slot power off/on").

Sure. Do you want me to submit another patch set (bumping up the
version) with this change log, or you'd want to add this change log
while merging?

>
> Yinghai, can you tell us whether this is an issue on your systems?

As Yinghai confirms further down this thread, his issue was confirmed
by Intel to be a bug in the repeater chip.
----------------------------------
Yinghai writes:
> According to HW guys and Intel, that should be bug of repeater.
>
---------------------------------
I don't know about the details of his scenario, except that when the
adapter was disabled the repeater kept on flapping the link up & down
(and hence disabling the link solved the problem then). Yinghai
couldn't test, but I believe with this patch even if we disable
presence detect interrupt, the "adapter present / no present" messages
would (rightly) convert to "Link Up / Link Down" messages (since the
repeater keeps on flapping the link).

Since it is a platform specific bug, I'm not sure what can be done to
remove those messages except may be reduce the verbosity? If you'd
like I could change all the INFO messages to DBG messages.

Please let me know how to proceed further on this. Also, did you get a
chance to look at the subsequent patches of this patch set, I was
wondering if you had any comments there?

Thanks,

Rajat

>
>> Signed-off-by: Rajat Jain <[email protected]>
>> Signed-off-by: Guenter Roeck <[email protected]>
>> ---
>> v3: no change, created by splitting the patch v2 [2/4]
>> v2: (non existent)
>> v1: (non existent)
>>
>> drivers/pci/hotplug/pciehp_hpc.c | 18 ------------------
>> 1 file changed, 18 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index b45b568..ab12555 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -278,11 +278,6 @@ static void pcie_wait_link_active(struct controller *ctrl)
>> __pcie_wait_link_active(ctrl, true);
>> }
>>
>> -static void pcie_wait_link_not_active(struct controller *ctrl)
>> -{
>> - __pcie_wait_link_active(ctrl, false);
>> -}
>> -
>> static bool pci_bus_check_dev(struct pci_bus *bus, int devfn)
>> {
>> u32 l;
>> @@ -383,11 +378,6 @@ static int pciehp_link_enable(struct controller *ctrl)
>> return __pciehp_link_set(ctrl, true);
>> }
>>
>> -static int pciehp_link_disable(struct controller *ctrl)
>> -{
>> - return __pciehp_link_set(ctrl, false);
>> -}
>> -
>> int pciehp_get_attention_status(struct slot *slot, u8 *status)
>> {
>> struct controller *ctrl = slot->ctrl;
>> @@ -620,14 +610,6 @@ int pciehp_power_off_slot(struct slot * slot)
>> u16 cmd_mask;
>> int retval;
>>
>> - /* Disable the link at first */
>> - pciehp_link_disable(ctrl);
>> - /* wait the link is down */
>> - if (ctrl->link_active_reporting)
>> - pcie_wait_link_not_active(ctrl);
>> - else
>> - msleep(1000);
>> ->> slot_cmd = POWER_OFF;
>> cmd_mask = PCI_EXP_SLTCTL_PCC;
>> retval = pcie_write_cmd(ctrl, slot_cmd, cmd_mask);
>> --
>> 1.7.9.5
>>


2014-01-07 00:04:05

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link permanently, during removal

On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <[email protected]> wrote:
> Hello Bjorn,
>
> Just checking on the fate of this patch set...
>
> On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <[email protected]> wrote:
>> [+cc [email protected] (seems to be Yinghai's preferred email]
>>
>> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
>>> We need future link up events for hot-add, thus don't disable
>>> the link permanently during device removal. Also, remove the static
>>> functions that are now left unused.
>>
>> The changelog should mention that this reverts part of 2debd9289997 ("PCI:
>> pciehp: Disable/enable link during slot power off/on").
>
> Sure. Do you want me to submit another patch set (bumping up the
> version) with this change log, or you'd want to add this change log
> while merging?
>
>>
>> Yinghai, can you tell us whether this is an issue on your systems?
>
> As Yinghai confirms further down this thread, his issue was confirmed
> by Intel to be a bug in the repeater chip.
> ----------------------------------
> Yinghai writes:
>> According to HW guys and Intel, that should be bug of repeater.
>>
> ---------------------------------
> I don't know about the details of his scenario, except that when the
> adapter was disabled the repeater kept on flapping the link up & down
> (and hence disabling the link solved the problem then). Yinghai
> couldn't test, but I believe with this patch even if we disable
> presence detect interrupt, the "adapter present / no present" messages
> would (rightly) convert to "Link Up / Link Down" messages (since the
> repeater keeps on flapping the link).
>
> Since it is a platform specific bug, I'm not sure what can be done to
> remove those messages except may be reduce the verbosity? If you'd
> like I could change all the INFO messages to DBG messages.

Even if it's a defect in a particular piece of hardware, I don't want
to regress on that hardware, even if the regression is just extra
messages that we didn't see before.

I think ideally we would add some sort of quirk for that hardware so
it works just as well as it does today. I think extra messages will
lead to a bug reports from users.

Bjorn

2014-01-07 18:20:55

by Rajat Jain

[permalink] [raw]
Subject: RE: [PATCH v3 4/8] pciehp: Don't disable the link permanently, during removal

Hello Bjorn / Yinghai,

> -----Original Message-----
> From: Bjorn Helgaas [mailto:[email protected]]
> Sent: Monday, January 06, 2014 4:04 PM
> To: Rajat Jain
> Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-
> [email protected]; [email protected]; Yinghai Lu; Guenter
> Roeck; Rajat Jain; Yinghai Lu
> Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link permanently,
> during removal
>
> On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <[email protected]>
> wrote:
> > Hello Bjorn,
> >
> > Just checking on the fate of this patch set...
> >
> > On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <[email protected]>
> wrote:
> >> [+cc [email protected] (seems to be Yinghai's preferred email]
> >>
> >> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
> >>> We need future link up events for hot-add, thus don't disable the
> >>> link permanently during device removal. Also, remove the static
> >>> functions that are now left unused.
> >>
> >> The changelog should mention that this reverts part of 2debd9289997
> ("PCI:
> >> pciehp: Disable/enable link during slot power off/on").
> >
> > Sure. Do you want me to submit another patch set (bumping up the
> > version) with this change log, or you'd want to add this change log
> > while merging?
> >
> >>
> >> Yinghai, can you tell us whether this is an issue on your systems?
> >
> > As Yinghai confirms further down this thread, his issue was confirmed
> > by Intel to be a bug in the repeater chip.
> > ----------------------------------
> > Yinghai writes:
> >> According to HW guys and Intel, that should be bug of repeater.
> >>
> > ---------------------------------
> > I don't know about the details of his scenario, except that when the
> > adapter was disabled the repeater kept on flapping the link up & down
> > (and hence disabling the link solved the problem then). Yinghai
> > couldn't test, but I believe with this patch even if we disable
> > presence detect interrupt, the "adapter present / no present" messages
> > would (rightly) convert to "Link Up / Link Down" messages (since the
> > repeater keeps on flapping the link).
> >
> > Since it is a platform specific bug, I'm not sure what can be done to
> > remove those messages except may be reduce the verbosity? If you'd
> > like I could change all the INFO messages to DBG messages.
>
> Even if it's a defect in a particular piece of hardware, I don't want to
> regress on that hardware, even if the regression is just extra messages
> that we didn't see before.
>
> I think ideally we would add some sort of quirk for that hardware so it
> works just as well as it does today. I think extra messages will lead
> to a bug reports from users.

Sure, I can do that. I think what the quirk would have to do is that for that particular platform, don't enable the link-state based hotplug. (Since link-state hotplug will not work if we disable the link permanently as we do today on card removal).

But the question is how to determine that the quirk has to be applied? I think the objective is to apply the quirk to the platforms that have a "PCIe repeater". Since this does not depend on a PCI device / vendor ID, and I think the PCIe repeater is probably not even visible to the pciehp or the PCI subsystem, how do I determine that the quirk has to be applied?

If (hw_has_pcie_repeater)
Don't use link-state hotplug (and disable link permanently during card removal)
Else
Use link-state hotplug (and don't disable the link permanently)


Yinghai: Since I do not have that hardware, I will need some help in testing the patch with the quirk. I was wondering if you'd still have that hardware around and would be able to help me with testing?

Thanks,

Rajat
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-01-09 20:45:56

by Rajat Jain

[permalink] [raw]
Subject: RE: [PATCH v3 4/8] pciehp: Don't disable the link permanently, during removal

Hi Bjorn,

> -----Original Message-----
> From: [email protected] [mailto:linux-pci-
> [email protected]] On Behalf Of Rajat Jain
> Sent: Tuesday, January 07, 2014 10:21 AM
> To: Bjorn Helgaas; Rajat Jain
> Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-
> [email protected]; [email protected]; Yinghai Lu; Guenter
> Roeck; Yinghai Lu
> Subject: RE: [PATCH v3 4/8] pciehp: Don't disable the link permanently,
> during removal
>
> Hello Bjorn / Yinghai,
>
> > -----Original Message-----
> > From: Bjorn Helgaas [mailto:[email protected]]
> > Sent: Monday, January 06, 2014 4:04 PM
> > To: Rajat Jain
> > Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-
> > [email protected]; [email protected]; Yinghai Lu; Guenter
> > Roeck; Rajat Jain; Yinghai Lu
> > Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link
> > permanently, during removal
> >
> > On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <[email protected]>
> > wrote:
> > > Hello Bjorn,
> > >
> > > Just checking on the fate of this patch set...
> > >
> > > On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <[email protected]>
> > wrote:
> > >> [+cc [email protected] (seems to be Yinghai's preferred email]
> > >>
> > >> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
> > >>> We need future link up events for hot-add, thus don't disable the
> > >>> link permanently during device removal. Also, remove the static
> > >>> functions that are now left unused.
> > >>
> > >> The changelog should mention that this reverts part of 2debd9289997
> > ("PCI:
> > >> pciehp: Disable/enable link during slot power off/on").
> > >
> > > Sure. Do you want me to submit another patch set (bumping up the
> > > version) with this change log, or you'd want to add this change log
> > > while merging?
> > >
> > >>
> > >> Yinghai, can you tell us whether this is an issue on your systems?
> > >
> > > As Yinghai confirms further down this thread, his issue was
> > > confirmed by Intel to be a bug in the repeater chip.
> > > ----------------------------------
> > > Yinghai writes:
> > >> According to HW guys and Intel, that should be bug of repeater.
> > >>
> > > ---------------------------------
> > > I don't know about the details of his scenario, except that when
> > > the adapter was disabled the repeater kept on flapping the link up &
> > > down (and hence disabling the link solved the problem then). Yinghai
> > > couldn't test, but I believe with this patch even if we disable
> > > presence detect interrupt, the "adapter present / no present"
> > > messages would (rightly) convert to "Link Up / Link Down" messages
> > > (since the repeater keeps on flapping the link).
> > >
> > > Since it is a platform specific bug, I'm not sure what can be done
> > > to remove those messages except may be reduce the verbosity? If
> > > you'd like I could change all the INFO messages to DBG messages.
> >
> > Even if it's a defect in a particular piece of hardware, I don't want
> > to regress on that hardware, even if the regression is just extra
> > messages that we didn't see before.
> >
> > I think ideally we would add some sort of quirk for that hardware so
> > it works just as well as it does today. I think extra messages will
> > lead to a bug reports from users.
>
> Sure, I can do that. I think what the quirk would have to do is that for
> that particular platform, don't enable the link-state based hotplug.
> (Since link-state hotplug will not work if we disable the link
> permanently as we do today on card removal).
>
> But the question is how to determine that the quirk has to be applied? I
> think the objective is to apply the quirk to the platforms that have a
> "PCIe repeater". Since this does not depend on a PCI device / vendor ID,
> and I think the PCIe repeater is probably not even visible to the pciehp
> or the PCI subsystem, how do I determine that the quirk has to be
> applied?

Any ideas on how do I identify the platforms that may have this problem?

Thanks,

Rajat


>
> If (hw_has_pcie_repeater)
> Don't use link-state hotplug (and disable link permanently during
> card removal) Else
> Use link-state hotplug (and don't disable the link permanently)
>
>
> Yinghai: Since I do not have that hardware, I will need some help in
> testing the patch with the quirk. I was wondering if you'd still have
> that hardware around and would be able to help me with testing?
>
> Thanks,
>
> Rajat
>  {.n + +% lzwm b 맲 r zX \ ) w*jg  ݢj/ z ޖ 2 ޙ
> & )ߡ a   G h  j:+v w ٥
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-01-09 20:58:54

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link permanently, during removal

On Thu, Jan 9, 2014 at 1:45 PM, Rajat Jain <[email protected]> wrote:
> Hi Bjorn,
>
>> -----Original Message-----
>> From: [email protected] [mailto:linux-pci-
>> [email protected]] On Behalf Of Rajat Jain
>> Sent: Tuesday, January 07, 2014 10:21 AM
>> To: Bjorn Helgaas; Rajat Jain
>> Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-
>> [email protected]; [email protected]; Yinghai Lu; Guenter
>> Roeck; Yinghai Lu
>> Subject: RE: [PATCH v3 4/8] pciehp: Don't disable the link permanently,
>> during removal
>>
>> Hello Bjorn / Yinghai,
>>
>> > -----Original Message-----
>> > From: Bjorn Helgaas [mailto:[email protected]]
>> > Sent: Monday, January 06, 2014 4:04 PM
>> > To: Rajat Jain
>> > Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-
>> > [email protected]; [email protected]; Yinghai Lu; Guenter
>> > Roeck; Rajat Jain; Yinghai Lu
>> > Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link
>> > permanently, during removal
>> >
>> > On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <[email protected]>
>> > wrote:
>> > > Hello Bjorn,
>> > >
>> > > Just checking on the fate of this patch set...
>> > >
>> > > On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <[email protected]>
>> > wrote:
>> > >> [+cc [email protected] (seems to be Yinghai's preferred email]
>> > >>
>> > >> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
>> > >>> We need future link up events for hot-add, thus don't disable the
>> > >>> link permanently during device removal. Also, remove the static
>> > >>> functions that are now left unused.
>> > >>
>> > >> The changelog should mention that this reverts part of 2debd9289997
>> > ("PCI:
>> > >> pciehp: Disable/enable link during slot power off/on").
>> > >
>> > > Sure. Do you want me to submit another patch set (bumping up the
>> > > version) with this change log, or you'd want to add this change log
>> > > while merging?
>> > >
>> > >>
>> > >> Yinghai, can you tell us whether this is an issue on your systems?
>> > >
>> > > As Yinghai confirms further down this thread, his issue was
>> > > confirmed by Intel to be a bug in the repeater chip.
>> > > ----------------------------------
>> > > Yinghai writes:
>> > >> According to HW guys and Intel, that should be bug of repeater.
>> > >>
>> > > ---------------------------------
>> > > I don't know about the details of his scenario, except that when
>> > > the adapter was disabled the repeater kept on flapping the link up &
>> > > down (and hence disabling the link solved the problem then). Yinghai
>> > > couldn't test, but I believe with this patch even if we disable
>> > > presence detect interrupt, the "adapter present / no present"
>> > > messages would (rightly) convert to "Link Up / Link Down" messages
>> > > (since the repeater keeps on flapping the link).
>> > >
>> > > Since it is a platform specific bug, I'm not sure what can be done
>> > > to remove those messages except may be reduce the verbosity? If
>> > > you'd like I could change all the INFO messages to DBG messages.
>> >
>> > Even if it's a defect in a particular piece of hardware, I don't want
>> > to regress on that hardware, even if the regression is just extra
>> > messages that we didn't see before.
>> >
>> > I think ideally we would add some sort of quirk for that hardware so
>> > it works just as well as it does today. I think extra messages will
>> > lead to a bug reports from users.
>>
>> Sure, I can do that. I think what the quirk would have to do is that for
>> that particular platform, don't enable the link-state based hotplug.
>> (Since link-state hotplug will not work if we disable the link
>> permanently as we do today on card removal).
>>
>> But the question is how to determine that the quirk has to be applied? I
>> think the objective is to apply the quirk to the platforms that have a
>> "PCIe repeater". Since this does not depend on a PCI device / vendor ID,
>> and I think the PCIe repeater is probably not even visible to the pciehp
>> or the PCI subsystem, how do I determine that the quirk has to be
>> applied?
>
> Any ideas on how do I identify the platforms that may have this problem?

I sure don't know. I suspect you're right that the PCIe repeater is
invisible to software, at least in terms of PCI config space. Maybe
we could use DMI to identify platforms. That's not a very good
solution because we have to come up with a list, but I can't think of
a better way. Yinghai knows more about the platform and might have
better ideas.

Bjorn

>> If (hw_has_pcie_repeater)
>> Don't use link-state hotplug (and disable link permanently during
>> card removal) Else
>> Use link-state hotplug (and don't disable the link permanently)
>>
>>
>> Yinghai: Since I do not have that hardware, I will need some help in
>> testing the patch with the quirk. I was wondering if you'd still have
>> that hardware around and would be able to help me with testing?
>>
>> Thanks,
>>
>> Rajat
>> {.n + +% lzwm b 맲 r zX \ ) w*jg ݢj/ z ޖ 2 ޙ
>> & )ߡ a G h j:+v w ٥

2014-01-13 08:30:42

by Rajat Jain

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link permanently, during removal

Hi Yinghai / Bjorn,


On Thu, Jan 9, 2014 at 12:58 PM, Bjorn Helgaas <[email protected]> wrote:
>>> >
>>> > On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <[email protected]>
>>> > wrote:
>>> > > Hello Bjorn,
>>> > >
>>> > > Just checking on the fate of this patch set...
>>> > >
>>> > > On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <[email protected]>
>>> > wrote:
>>> > >> [+cc [email protected] (seems to be Yinghai's preferred email]
>>> > >>
>>> > >> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
>>> > >>> We need future link up events for hot-add, thus don't disable the
>>> > >>> link permanently during device removal. Also, remove the static
>>> > >>> functions that are now left unused.
>>> > >>
>>> > >> The changelog should mention that this reverts part of 2debd9289997
>>> > ("PCI:
>>> > >> pciehp: Disable/enable link during slot power off/on").
>>> > >
>>> > > Sure. Do you want me to submit another patch set (bumping up the
>>> > > version) with this change log, or you'd want to add this change log
>>> > > while merging?
>>> > >
>>> > >>
>>> > >> Yinghai, can you tell us whether this is an issue on your systems?
>>> > >
>>> > > As Yinghai confirms further down this thread, his issue was
>>> > > confirmed by Intel to be a bug in the repeater chip.
>>> > > ----------------------------------
>>> > > Yinghai writes:
>>> > >> According to HW guys and Intel, that should be bug of repeater.
>>> > >>
>>> > > ---------------------------------
>>> > > I don't know about the details of his scenario, except that when
>>> > > the adapter was disabled the repeater kept on flapping the link up &
>>> > > down (and hence disabling the link solved the problem then). Yinghai
>>> > > couldn't test, but I believe with this patch even if we disable
>>> > > presence detect interrupt, the "adapter present / no present"
>>> > > messages would (rightly) convert to "Link Up / Link Down" messages
>>> > > (since the repeater keeps on flapping the link).
>>> > >
>>> > > Since it is a platform specific bug, I'm not sure what can be done
>>> > > to remove those messages except may be reduce the verbosity? If
>>> > > you'd like I could change all the INFO messages to DBG messages.
>>> >
>>> > Even if it's a defect in a particular piece of hardware, I don't want
>>> > to regress on that hardware, even if the regression is just extra
>>> > messages that we didn't see before.
>>> >
>>> > I think ideally we would add some sort of quirk for that hardware so
>>> > it works just as well as it does today. I think extra messages will
>>> > lead to a bug reports from users.
>>>
>>> Sure, I can do that. I think what the quirk would have to do is that for
>>> that particular platform, don't enable the link-state based hotplug.
>>> (Since link-state hotplug will not work if we disable the link
>>> permanently as we do today on card removal).
>>>
>>> But the question is how to determine that the quirk has to be applied? I
>>> think the objective is to apply the quirk to the platforms that have a
>>> "PCIe repeater". Since this does not depend on a PCI device / vendor ID,
>>> and I think the PCIe repeater is probably not even visible to the pciehp
>>> or the PCI subsystem, how do I determine that the quirk has to be
>>> applied?
>>
>> Any ideas on how do I identify the platforms that may have this problem?
>
> I sure don't know. I suspect you're right that the PCIe repeater is
> invisible to software, at least in terms of PCI config space. Maybe
> we could use DMI to identify platforms. That's not a very good
> solution because we have to come up with a list, but I can't think of
> a better way. Yinghai knows more about the platform and might have
> better ideas.

Yinghai: I am trying to understand what exactly is this platform bug
and how to add a quirk such that this platform remains unaffected. Can
you please help me by suggesting how to decide if this is _the_
platform that has the bug (the pcie repeater).

Bjorn: It seems to be that identification of this platform will be out
PCI code (since the bug seems to be in a pcie repeater chip which is
not a PCI device visible to SW). So even if we find a way to identify
this platform (e.g DMI) , I doubt if you'd want me to add that in the
pciehp code (which is platform independent so far). At best, the only
way out I can see is to provide a knob from the pciehp, that can be
use by the platform code to either enable or disable the link state
hotplug. It could go back towards using a module parameter like
pciehp_use_link_events. Please suggest.

The only other way I can think of, is that I can remove the debug
message altogether (Link up / Link down). (Or the user can change the
verbosity).

Humm, when I think of it, we're trying to address a bug of a chip
which is not a PCI device, into pciehp. I'm praying it doesn't bring
this patch set to a dead end :-)

Thanks,

Rajat

>
> Bjorn
>
>>> If (hw_has_pcie_repeater)
>>> Don't use link-state hotplug (and disable link permanently during
>>> card removal) Else
>>> Use link-state hotplug (and don't disable the link permanently)
>>>
>>>
>>> Yinghai: Since I do not have that hardware, I will need some help in
>>> testing the patch with the quirk. I was wondering if you'd still have
>>> that hardware around and would be able to help me with testing?
>>>
>>> Thanks,
>>>
>>> Rajat
>>> {.n + +% lzwm b 맲 r zX \ ) w*jg ݢj/ z ޖ 2 ޙ
>>> & )ߡ a G h j:+v w ٥

2014-01-13 17:39:19

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link permanently, during removal

On Mon, Jan 13, 2014 at 1:30 AM, Rajat Jain <[email protected]> wrote:

> Yinghai: I am trying to understand what exactly is this platform bug
> and how to add a quirk such that this platform remains unaffected. Can
> you please help me by suggesting how to decide if this is _the_
> platform that has the bug (the pcie repeater).
>
> Bjorn: It seems to be that identification of this platform will be out
> PCI code (since the bug seems to be in a pcie repeater chip which is
> not a PCI device visible to SW). So even if we find a way to identify
> this platform (e.g DMI) , I doubt if you'd want me to add that in the
> pciehp code (which is platform independent so far). At best, the only
> way out I can see is to provide a knob from the pciehp, that can be
> use by the platform code to either enable or disable the link state
> hotplug. It could go back towards using a module parameter like
> pciehp_use_link_events. Please suggest.
>
> The only other way I can think of, is that I can remove the debug
> message altogether (Link up / Link down). (Or the user can change the
> verbosity).

I think it's perfectly fine to add a DMI-based quirk in pciehp. Yes,
it's a bit ugly, but that's just the nature of working around hardware
defects. Identify the platform, emit a diagnostic ("disabling link
state because platform may be buggy"), enable the workaround. That
seems better than requiring the user to figure out what hardware he
has and whether it has a defect.

I would also be OK with adding a pciehp module parameter to explicitly
enable or disable the workaround if that seems necessary. I just want
the common case of correctly working hardware to work without any
switches.

Removing or rate-limiting the link up/down debug message is also fine with me.

Bjorn