Some of the platforms (like Tegra194 and Tegra234) have open slots and
not having an endpoint connected to the slot is not an error.
So, changing the macro from dev_err to dev_info to log the event.
Signed-off-by: Vidya Sagar <[email protected]>
---
drivers/pci/controller/dwc/pcie-designware.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index 650a7f22f9d0..25154555aa7a 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
}
if (retries >= LINK_WAIT_MAX_RETRIES) {
- dev_err(pci->dev, "Phy link never came up\n");
+ dev_info(pci->dev, "Phy link never came up\n");
return -ETIMEDOUT;
}
--
2.17.1
On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>> not having an endpoint connected to the slot is not an error.
>> So, changing the macro from dev_err to dev_info to log the event.
>>
>
> But the link up not happening is an actual error and -ETIMEDOUT is being
> returned. So I don't think the log severity should be changed.
Yes it is an error in the sense it is a timeout, but reporting an error
because nothing is attached to a PCI slot seems a bit noisy. Please note
that a similar change was made by the following commit and it also seems
appropriate here ...
commit 4b16a8227907118e011fb396022da671a52b2272
Author: Manikanta Maddireddy <[email protected]>
Date: Tue Jun 18 23:32:06 2019 +0530
PCI: tegra: Change link retry log level to debug
BTW, we check for error messages in the dmesg output and this is a new
error seen as of Linux v6.0 and so this was flagged in a test. We can
ignore the error, but in this case it seem more appropriate to make this
a info or debug level print.
Jon
--
nvpublic
On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> not having an endpoint connected to the slot is not an error.
> So, changing the macro from dev_err to dev_info to log the event.
>
But the link up not happening is an actual error and -ETIMEDOUT is being
returned. So I don't think the log severity should be changed.
Thanks,
Mani
> Signed-off-by: Vidya Sagar <[email protected]>
> ---
> drivers/pci/controller/dwc/pcie-designware.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> index 650a7f22f9d0..25154555aa7a 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.c
> +++ b/drivers/pci/controller/dwc/pcie-designware.c
> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
> }
>
> if (retries >= LINK_WAIT_MAX_RETRIES) {
> - dev_err(pci->dev, "Phy link never came up\n");
> + dev_info(pci->dev, "Phy link never came up\n");
> return -ETIMEDOUT;
> }
>
> --
> 2.17.1
>
On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > not having an endpoint connected to the slot is not an error.
> > > So, changing the macro from dev_err to dev_info to log the event.
> >
> > But the link up not happening is an actual error and -ETIMEDOUT is being
> > returned. So I don't think the log severity should be changed.
>
> Yes it is an error in the sense it is a timeout, but reporting an error
> because nothing is attached to a PCI slot seems a bit noisy. Please note
> that a similar change was made by the following commit and it also seems
> appropriate here ...
>
> commit 4b16a8227907118e011fb396022da671a52b2272
> Author: Manikanta Maddireddy <[email protected]>
> Date: Tue Jun 18 23:32:06 2019 +0530
>
> PCI: tegra: Change link retry log level to debug
>
>
> BTW, we check for error messages in the dmesg output and this is a new error
> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> error, but in this case it seem more appropriate to make this a info or
> debug level print.
Can you tell whether there's a device present, e.g., via Slot Status
Presence Detect? If there's nothing in the slot, I don't know why we
would print anything at all. If a card is present but there's no
link, that's probably worthy of dev_info() or even dev_err().
I guess if you can tell the slot is empty, there's no point in even
trying to start the link, so you could avoid both the message and the
timeout by not even calling dw_pcie_wait_for_link().
Bjorn
On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
>
> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > not having an endpoint connected to the slot is not an error.
> > > So, changing the macro from dev_err to dev_info to log the event.
> > >
> >
> > But the link up not happening is an actual error and -ETIMEDOUT is being
> > returned. So I don't think the log severity should be changed.
>
> Yes it is an error in the sense it is a timeout, but reporting an error
> because nothing is attached to a PCI slot seems a bit noisy. Please note
> that a similar change was made by the following commit and it also seems
> appropriate here ...
>
I got the concern. But the problem here is, we cannot distinguish if the
error is due to slot empty or an actual issues with link/phy. Also it
should be noted that if the slot is empty, the dwc core anyway waits for
the link to be up. IMO this is a boot time overhead that could be
avoided if the specific instance could be disabled in platform data like
devicetree.
> commit 4b16a8227907118e011fb396022da671a52b2272
> Author: Manikanta Maddireddy <[email protected]>
> Date: Tue Jun 18 23:32:06 2019 +0530
>
> PCI: tegra: Change link retry log level to debug
>
>
> BTW, we check for error messages in the dmesg output and this is a new error
> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> error, but in this case it seem more appropriate to make this a info or
> debug level print.
>
On some of the Qcom based platforms, we avoid this situation by
disabling the specific PCIe node in devicetree for which we know that
there would be no devices connected.
This not only avoids the error message in log but also removes the time
spent waiting for link to be up.
Thanks,
Mani
> Jon
>
> --
> nvpublic
On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > not having an endpoint connected to the slot is not an error.
> > > > So, changing the macro from dev_err to dev_info to log the event.
> > >
> > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > returned. So I don't think the log severity should be changed.
> >
> > Yes it is an error in the sense it is a timeout, but reporting an error
> > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > that a similar change was made by the following commit and it also seems
> > appropriate here ...
> >
> > commit 4b16a8227907118e011fb396022da671a52b2272
> > Author: Manikanta Maddireddy <[email protected]>
> > Date: Tue Jun 18 23:32:06 2019 +0530
> >
> > PCI: tegra: Change link retry log level to debug
> >
> >
> > BTW, we check for error messages in the dmesg output and this is a new error
> > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > error, but in this case it seem more appropriate to make this a info or
> > debug level print.
>
> Can you tell whether there's a device present, e.g., via Slot Status
> Presence Detect? If there's nothing in the slot, I don't know why we
> would print anything at all. If a card is present but there's no
> link, that's probably worthy of dev_info() or even dev_err().
>
I don't think all form factors allow for the PRSNT pin to be wired up,
so we cannot know if the device is actually present in the slot or not all
the time. Maybe we should do if the form factor supports it?
> I guess if you can tell the slot is empty, there's no point in even
> trying to start the link, so you could avoid both the message and the
> timeout by not even calling dw_pcie_wait_for_link().
Right. There is an overhead of waiting for ~1ms during boot.
We workaround this issue by disabling the PCIe instances in devicetree
for which there would be no devices connected.
Thanks,
Mani
>
> Bjorn
Agree with Mani.
Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and
even if the slot form factor supports PRSNT, it is not mandatory to have
a GPIO routed to the PRSNT pin of the slot.
Also, since these are development platforms, we wouldn't want to keep
changing a controller's status in the DT, instead want to know the
device's presence/absence (by way of link up) looking at the log.
In my honest opinion, I don't think the absence of a device in the slot
should be treated as an error.
Thanks,
Vidya Sagar
On 9/14/2022 11:54 AM, Manivannan Sadhasivam wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>>>> not having an endpoint connected to the slot is not an error.
>>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>
>>>> But the link up not happening is an actual error and -ETIMEDOUT is being
>>>> returned. So I don't think the log severity should be changed.
>>>
>>> Yes it is an error in the sense it is a timeout, but reporting an error
>>> because nothing is attached to a PCI slot seems a bit noisy. Please note
>>> that a similar change was made by the following commit and it also seems
>>> appropriate here ...
>>>
>>> commit 4b16a8227907118e011fb396022da671a52b2272
>>> Author: Manikanta Maddireddy <[email protected]>
>>> Date: Tue Jun 18 23:32:06 2019 +0530
>>>
>>> PCI: tegra: Change link retry log level to debug
>>>
>>>
>>> BTW, we check for error messages in the dmesg output and this is a new error
>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
>>> error, but in this case it seem more appropriate to make this a info or
>>> debug level print.
>>
>> Can you tell whether there's a device present, e.g., via Slot Status
>> Presence Detect? If there's nothing in the slot, I don't know why we
>> would print anything at all. If a card is present but there's no
>> link, that's probably worthy of dev_info() or even dev_err().
>>
>
> I don't think all form factors allow for the PRSNT pin to be wired up,
> so we cannot know if the device is actually present in the slot or not all
> the time. Maybe we should do if the form factor supports it?
>
>> I guess if you can tell the slot is empty, there's no point in even
>> trying to start the link, so you could avoid both the message and the
>> timeout by not even calling dw_pcie_wait_for_link().
>
> Right. There is an overhead of waiting for ~1ms during boot.
>
> We workaround this issue by disabling the PCIe instances in devicetree
> for which there would be no devices connected.
>
> Thanks,
> Mani
>
>>
>> Bjorn
On Wed, Sep 14, 2022 at 12:25:51PM +0100, Jon Hunter wrote:
>
> On 14/09/2022 12:18, Manivannan Sadhasivam wrote:
> > On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote:
> > > Agree with Mani.
> > > Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even
> > > if the slot form factor supports PRSNT, it is not mandatory to have a GPIO
> > > routed to the PRSNT pin of the slot.
> >
> > Right.
> >
> > > Also, since these are development platforms, we wouldn't want to keep
> > > changing a controller's status in the DT, instead want to know the device's
> > > presence/absence (by way of link up) looking at the log.
> > > In my honest opinion, I don't think the absence of a device in the slot
> > > should be treated as an error.
> > >
> >
> > As I mentioned earlier, timeout can happen due to an issue with PHY layer
> > also. In those cases, "dev_err()" is relevant.
> >
> > And I also agree that absence of the device should not be treated as an
> > error. But my question is, if you know that the slot is going to be
> > empty always, why cannot you just disable it in dts?
>
> I really don't think that makes sense from a usability perspective. You want
> to allow users to connect PCI cards and for them to work without having to
> update the DTB. I have plenty of open PCI slots on my PC and I would be a
> bit upset if someone told me I need to update the PC firmware each time I
> wanted to use a new slot.
>
My question was, "do you think the slot is going to be empty always".
This can happen with slots exposed through a connector (not a PCIe one) and
users would plug in add-on cards for accessing the slots. In those
cases, the add-on specific devicetree can enable the PCIe instance and
use it.
But from your reply, I can infer that the slot is exposed on a standard
PCIe connector and users would connect a PCIe device any time.
Anyway, I don't strongly object the change and leave it to the
maintainers to decide.
Thanks,
Mani
> Jon
>
> --
> nvpublic
On 14/09/2022 12:43, Manivannan Sadhasivam wrote:
> On Wed, Sep 14, 2022 at 12:25:51PM +0100, Jon Hunter wrote:
>>
>> On 14/09/2022 12:18, Manivannan Sadhasivam wrote:
>>> On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote:
>>>> Agree with Mani.
>>>> Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even
>>>> if the slot form factor supports PRSNT, it is not mandatory to have a GPIO
>>>> routed to the PRSNT pin of the slot.
>>>
>>> Right.
>>>
>>>> Also, since these are development platforms, we wouldn't want to keep
>>>> changing a controller's status in the DT, instead want to know the device's
>>>> presence/absence (by way of link up) looking at the log.
>>>> In my honest opinion, I don't think the absence of a device in the slot
>>>> should be treated as an error.
>>>>
>>>
>>> As I mentioned earlier, timeout can happen due to an issue with PHY layer
>>> also. In those cases, "dev_err()" is relevant.
>>>
>>> And I also agree that absence of the device should not be treated as an
>>> error. But my question is, if you know that the slot is going to be
>>> empty always, why cannot you just disable it in dts?
>>
>> I really don't think that makes sense from a usability perspective. You want
>> to allow users to connect PCI cards and for them to work without having to
>> update the DTB. I have plenty of open PCI slots on my PC and I would be a
>> bit upset if someone told me I need to update the PC firmware each time I
>> wanted to use a new slot.
>>
>
> My question was, "do you think the slot is going to be empty always".
> This can happen with slots exposed through a connector (not a PCIe one) and
> users would plug in add-on cards for accessing the slots. In those
> cases, the add-on specific devicetree can enable the PCIe instance and
> use it.
>
> But from your reply, I can infer that the slot is exposed on a standard
> PCIe connector and users would connect a PCIe device any time.
Correct it is exposed via a standard connector. Yes for PCIe slots that
are not connected to an on-board device or connector, in that case it
would make sense to disable, but this is not the case here.
Jon
--
nvpublic
On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote:
> Agree with Mani.
> Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even
> if the slot form factor supports PRSNT, it is not mandatory to have a GPIO
> routed to the PRSNT pin of the slot.
Right.
> Also, since these are development platforms, we wouldn't want to keep
> changing a controller's status in the DT, instead want to know the device's
> presence/absence (by way of link up) looking at the log.
> In my honest opinion, I don't think the absence of a device in the slot
> should be treated as an error.
>
As I mentioned earlier, timeout can happen due to an issue with PHY layer
also. In those cases, "dev_err()" is relevant.
And I also agree that absence of the device should not be treated as an
error. But my question is, if you know that the slot is going to be
empty always, why cannot you just disable it in dts?
Even if you make the log "dev_info()" there is the wait time for link up
and I'm sure that you wouldn't want it either.
Thanks,
Mani
> Thanks,
> Vidya Sagar
>
> On 9/14/2022 11:54 AM, Manivannan Sadhasivam wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > >
> > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > returned. So I don't think the log severity should be changed.
> > > >
> > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > that a similar change was made by the following commit and it also seems
> > > > appropriate here ...
> > > >
> > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > Author: Manikanta Maddireddy <[email protected]>
> > > > Date: Tue Jun 18 23:32:06 2019 +0530
> > > >
> > > > PCI: tegra: Change link retry log level to debug
> > > >
> > > >
> > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > error, but in this case it seem more appropriate to make this a info or
> > > > debug level print.
> > >
> > > Can you tell whether there's a device present, e.g., via Slot Status
> > > Presence Detect? If there's nothing in the slot, I don't know why we
> > > would print anything at all. If a card is present but there's no
> > > link, that's probably worthy of dev_info() or even dev_err().
> > >
> >
> > I don't think all form factors allow for the PRSNT pin to be wired up,
> > so we cannot know if the device is actually present in the slot or not all
> > the time. Maybe we should do if the form factor supports it?
> >
> > > I guess if you can tell the slot is empty, there's no point in even
> > > trying to start the link, so you could avoid both the message and the
> > > timeout by not even calling dw_pcie_wait_for_link().
> >
> > Right. There is an overhead of waiting for ~1ms during boot.
> >
> > We workaround this issue by disabling the PCIe instances in devicetree
> > for which there would be no devices connected.
> >
> > Thanks,
> > Mani
> >
> > >
> > > Bjorn
On 14/09/2022 12:18, Manivannan Sadhasivam wrote:
> On Wed, Sep 14, 2022 at 04:32:10PM +0530, Vidya Sagar wrote:
>> Agree with Mani.
>> Not all form factors support PRSNT pin (Ex:- M.2 Key-M form factor) and even
>> if the slot form factor supports PRSNT, it is not mandatory to have a GPIO
>> routed to the PRSNT pin of the slot.
>
> Right.
>
>> Also, since these are development platforms, we wouldn't want to keep
>> changing a controller's status in the DT, instead want to know the device's
>> presence/absence (by way of link up) looking at the log.
>> In my honest opinion, I don't think the absence of a device in the slot
>> should be treated as an error.
>>
>
> As I mentioned earlier, timeout can happen due to an issue with PHY layer
> also. In those cases, "dev_err()" is relevant.
>
> And I also agree that absence of the device should not be treated as an
> error. But my question is, if you know that the slot is going to be
> empty always, why cannot you just disable it in dts?
I really don't think that makes sense from a usability perspective. You
want to allow users to connect PCI cards and for them to work without
having to update the DTB. I have plenty of open PCI slots on my PC and I
would be a bit upset if someone told me I need to update the PC firmware
each time I wanted to use a new slot.
Jon
--
nvpublic
Hello,
[...]
> Anyway, I don't strongly object the change and leave it to the
> maintainers to decide.
Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially
since it appears that this information is of more use to the developer (who
most likely has the suitable log level set anyway), and given that there is
no way to reliably detect a presence in a slot on some platforms, this
might otherwise, add to the other messages that normal users don't pay
attention to usually - if this is not to be treated as an error.
Krzysztof
On Wed, Sep 14, 2022 at 09:44:44PM +0900, Krzysztof Wilczyński wrote:
> Hello,
>
> [...]
> > Anyway, I don't strongly object the change and leave it to the
> > maintainers to decide.
>
> Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially
> since it appears that this information is of more use to the developer (who
> most likely has the suitable log level set anyway), and given that there is
> no way to reliably detect a presence in a slot on some platforms, this
> might otherwise, add to the other messages that normal users don't pay
> attention to usually - if this is not to be treated as an error.
>
No, this is clearly not a debug message. As I quoted above, the link up
can fail due to an issue with PHY also. In that case, user has to see
the log to debug/report the issue.
So, either dev_info() or dev_err().
Thanks,
Mani
> Krzysztof
Hello,
> > Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially
> > since it appears that this information is of more use to the developer (who
> > most likely has the suitable log level set anyway), and given that there is
> > no way to reliably detect a presence in a slot on some platforms, this
> > might otherwise, add to the other messages that normal users don't pay
> > attention to usually - if this is not to be treated as an error.
> >
>
> No, this is clearly not a debug message. As I quoted above, the link up
> can fail due to an issue with PHY also. In that case, user has to see
> the log to debug/report the issue.
Apologies! I missed that. Thank you!
> So, either dev_info() or dev_err().
So, there is nothing to do here, then. This stays as dev_err() as per the
change from:
14c4ad125cf9 ("PCI: dwc: Log link speed and width if it comes up")
That said, perhaps adding a comment explaining why this is an error
might help future reference, as the linked commit didn't justify the
change. There also will be redundant messages printed, as per:
https://elixir.bootlin.com/linux/v6.0-rc5/source/drivers/pci/controller/dwc/pcie-fu740.c#L207
Which the linked commit didn't take into account, I suppose.
Krzysztof
On 14/09/2022 15:52, Krzysztof Wilczyński wrote:
> Hello,
>
>>> Perhaps it makes sense to make this a dev_dbg() over dev_info(), especially
>>> since it appears that this information is of more use to the developer (who
>>> most likely has the suitable log level set anyway), and given that there is
>>> no way to reliably detect a presence in a slot on some platforms, this
>>> might otherwise, add to the other messages that normal users don't pay
>>> attention to usually - if this is not to be treated as an error.
>>>
>>
>> No, this is clearly not a debug message. As I quoted above, the link up
>> can fail due to an issue with PHY also. In that case, user has to see
>> the log to debug/report the issue.
>
> Apologies! I missed that. Thank you!
>
>> So, either dev_info() or dev_err().
>
> So, there is nothing to do here, then. This stays as dev_err() as per the
> change from:
>
> 14c4ad125cf9 ("PCI: dwc: Log link speed and width if it comes up")
I am not sure I agree. There is a similar change here ...
commit 4b16a8227907118e011fb396022da671a52b2272
Author: Manikanta Maddireddy <[email protected]>
Date: Tue Jun 18 23:32:06 2019 +0530
PCI: tegra: Change link retry log level to debug
If we have a way to determine if a card/device is connected then dev_err
is appropriate, but if not then dev_dbg/info are appropriate IMO.
Jon
--
nvpublic
On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> not having an endpoint connected to the slot is not an error.
> So, changing the macro from dev_err to dev_info to log the event.
>
> Signed-off-by: Vidya Sagar <[email protected]>
Since the PHY issue that could happen during boot up is rare, it looks
okay to me treating the log as INFO.
Acked-by: Manivannan Sadhasivam <[email protected]>
Thanks,
Mani
> ---
> drivers/pci/controller/dwc/pcie-designware.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> index 650a7f22f9d0..25154555aa7a 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.c
> +++ b/drivers/pci/controller/dwc/pcie-designware.c
> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
> }
>
> if (retries >= LINK_WAIT_MAX_RETRIES) {
> - dev_err(pci->dev, "Phy link never came up\n");
> + dev_info(pci->dev, "Phy link never came up\n");
> return -ETIMEDOUT;
> }
>
> --
> 2.17.1
>
On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
<[email protected]> wrote:
>
> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > not having an endpoint connected to the slot is not an error.
> > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > >
> > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > returned. So I don't think the log severity should be changed.
> > >
> > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > that a similar change was made by the following commit and it also seems
> > > appropriate here ...
> > >
> > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > Author: Manikanta Maddireddy <[email protected]>
> > > Date: Tue Jun 18 23:32:06 2019 +0530
> > >
> > > PCI: tegra: Change link retry log level to debug
> > >
> > >
> > > BTW, we check for error messages in the dmesg output and this is a new error
> > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > error, but in this case it seem more appropriate to make this a info or
> > > debug level print.
> >
> > Can you tell whether there's a device present, e.g., via Slot Status
> > Presence Detect? If there's nothing in the slot, I don't know why we
> > would print anything at all. If a card is present but there's no
> > link, that's probably worthy of dev_info() or even dev_err().
> >
>
> I don't think all form factors allow for the PRSNT pin to be wired up,
> so we cannot know if the device is actually present in the slot or not all
> the time. Maybe we should do if the form factor supports it?
>
> > I guess if you can tell the slot is empty, there's no point in even
> > trying to start the link, so you could avoid both the message and the
> > timeout by not even calling dw_pcie_wait_for_link().
>
> Right. There is an overhead of waiting for ~1ms during boot.
Async probe should mitigate that, right? Saravana is working toward
making that the default instead of opt in, but you could opt in now.
Rob
On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> <[email protected]> wrote:
> >
> > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > >
> > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > returned. So I don't think the log severity should be changed.
> > > >
> > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > that a similar change was made by the following commit and it also seems
> > > > appropriate here ...
> > > >
> > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > Author: Manikanta Maddireddy <[email protected]>
> > > > Date: Tue Jun 18 23:32:06 2019 +0530
> > > >
> > > > PCI: tegra: Change link retry log level to debug
> > > >
> > > >
> > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > error, but in this case it seem more appropriate to make this a info or
> > > > debug level print.
> > >
> > > Can you tell whether there's a device present, e.g., via Slot Status
> > > Presence Detect? If there's nothing in the slot, I don't know why we
> > > would print anything at all. If a card is present but there's no
> > > link, that's probably worthy of dev_info() or even dev_err().
> > >
> >
> > I don't think all form factors allow for the PRSNT pin to be wired up,
> > so we cannot know if the device is actually present in the slot or not all
> > the time. Maybe we should do if the form factor supports it?
> >
> > > I guess if you can tell the slot is empty, there's no point in even
> > > trying to start the link, so you could avoid both the message and the
> > > timeout by not even calling dw_pcie_wait_for_link().
> >
> > Right. There is an overhead of waiting for ~1ms during boot.
>
> Async probe should mitigate that, right? Saravana is working toward
> making that the default instead of opt in, but you could opt in now.
>
No. The delay is due to the DWC core waiting for link up that depends on
the PCIe device to be present on the slot. The driver probe order
doesn't apply here.
Thanks,
Mani
> Rob
Hi,
Just checking if we are good with this patch or does it need any further
modifications?
Thanks,
Vidya Sagar
On 9/15/2022 8:22 PM, Manivannan Sadhasivam wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
>> <[email protected]> wrote:
>>>
>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>>>>>> not having an endpoint connected to the slot is not an error.
>>>>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>>>
>>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being
>>>>>> returned. So I don't think the log severity should be changed.
>>>>>
>>>>> Yes it is an error in the sense it is a timeout, but reporting an error
>>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note
>>>>> that a similar change was made by the following commit and it also seems
>>>>> appropriate here ...
>>>>>
>>>>> commit 4b16a8227907118e011fb396022da671a52b2272
>>>>> Author: Manikanta Maddireddy <[email protected]>
>>>>> Date: Tue Jun 18 23:32:06 2019 +0530
>>>>>
>>>>> PCI: tegra: Change link retry log level to debug
>>>>>
>>>>>
>>>>> BTW, we check for error messages in the dmesg output and this is a new error
>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
>>>>> error, but in this case it seem more appropriate to make this a info or
>>>>> debug level print.
>>>>
>>>> Can you tell whether there's a device present, e.g., via Slot Status
>>>> Presence Detect? If there's nothing in the slot, I don't know why we
>>>> would print anything at all. If a card is present but there's no
>>>> link, that's probably worthy of dev_info() or even dev_err().
>>>>
>>>
>>> I don't think all form factors allow for the PRSNT pin to be wired up,
>>> so we cannot know if the device is actually present in the slot or not all
>>> the time. Maybe we should do if the form factor supports it?
>>>
>>>> I guess if you can tell the slot is empty, there's no point in even
>>>> trying to start the link, so you could avoid both the message and the
>>>> timeout by not even calling dw_pcie_wait_for_link().
>>>
>>> Right. There is an overhead of waiting for ~1ms during boot.
>>
>> Async probe should mitigate that, right? Saravana is working toward
>> making that the default instead of opt in, but you could opt in now.
>>
>
> No. The delay is due to the DWC core waiting for link up that depends on
> the PCIe device to be present on the slot. The driver probe order
> doesn't apply here.
>
> Thanks,
> Mani
>
>> Rob
Hi Bjorn / Lorenzo,
Just checking if there are any more comments for this patch? If not, are
we good to take it?
Thanks,
Vidya Sagar
On 9/26/2022 3:59 PM, Vidya Sagar wrote:
> Hi,
> Just checking if we are good with this patch or does it need any further
> modifications?
>
> Thanks,
> Vidya Sagar
>
> On 9/15/2022 8:22 PM, Manivannan Sadhasivam wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
>>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
>>> <[email protected]> wrote:
>>>>
>>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
>>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
>>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
>>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open
>>>>>>>> slots and
>>>>>>>> not having an endpoint connected to the slot is not an error.
>>>>>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>>>>
>>>>>>> But the link up not happening is an actual error and -ETIMEDOUT
>>>>>>> is being
>>>>>>> returned. So I don't think the log severity should be changed.
>>>>>>
>>>>>> Yes it is an error in the sense it is a timeout, but reporting an
>>>>>> error
>>>>>> because nothing is attached to a PCI slot seems a bit noisy.
>>>>>> Please note
>>>>>> that a similar change was made by the following commit and it also
>>>>>> seems
>>>>>> appropriate here ...
>>>>>>
>>>>>> commit 4b16a8227907118e011fb396022da671a52b2272
>>>>>> Author: Manikanta Maddireddy <[email protected]>
>>>>>> Date: Tue Jun 18 23:32:06 2019 +0530
>>>>>>
>>>>>> PCI: tegra: Change link retry log level to debug
>>>>>>
>>>>>>
>>>>>> BTW, we check for error messages in the dmesg output and this is a
>>>>>> new error
>>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can
>>>>>> ignore the
>>>>>> error, but in this case it seem more appropriate to make this a
>>>>>> info or
>>>>>> debug level print.
>>>>>
>>>>> Can you tell whether there's a device present, e.g., via Slot Status
>>>>> Presence Detect? If there's nothing in the slot, I don't know why we
>>>>> would print anything at all. If a card is present but there's no
>>>>> link, that's probably worthy of dev_info() or even dev_err().
>>>>>
>>>>
>>>> I don't think all form factors allow for the PRSNT pin to be wired up,
>>>> so we cannot know if the device is actually present in the slot or
>>>> not all
>>>> the time. Maybe we should do if the form factor supports it?
>>>>
>>>>> I guess if you can tell the slot is empty, there's no point in even
>>>>> trying to start the link, so you could avoid both the message and the
>>>>> timeout by not even calling dw_pcie_wait_for_link().
>>>>
>>>> Right. There is an overhead of waiting for ~1ms during boot.
>>>
>>> Async probe should mitigate that, right? Saravana is working toward
>>> making that the default instead of opt in, but you could opt in now.
>>>
>>
>> No. The delay is due to the DWC core waiting for link up that depends on
>> the PCIe device to be present on the slot. The driver probe order
>> doesn't apply here.
>>
>> Thanks,
>> Mani
>>
>>> Rob
On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam
<[email protected]> wrote:
>
> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> > On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> > <[email protected]> wrote:
> > >
> > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > > >
> > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > > returned. So I don't think the log severity should be changed.
> > > > >
> > > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > > that a similar change was made by the following commit and it also seems
> > > > > appropriate here ...
> > > > >
> > > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > > Author: Manikanta Maddireddy <[email protected]>
> > > > > Date: Tue Jun 18 23:32:06 2019 +0530
> > > > >
> > > > > PCI: tegra: Change link retry log level to debug
> > > > >
> > > > >
> > > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > > error, but in this case it seem more appropriate to make this a info or
> > > > > debug level print.
> > > >
> > > > Can you tell whether there's a device present, e.g., via Slot Status
> > > > Presence Detect? If there's nothing in the slot, I don't know why we
> > > > would print anything at all. If a card is present but there's no
> > > > link, that's probably worthy of dev_info() or even dev_err().
> > > >
> > >
> > > I don't think all form factors allow for the PRSNT pin to be wired up,
> > > so we cannot know if the device is actually present in the slot or not all
> > > the time. Maybe we should do if the form factor supports it?
> > >
> > > > I guess if you can tell the slot is empty, there's no point in even
> > > > trying to start the link, so you could avoid both the message and the
> > > > timeout by not even calling dw_pcie_wait_for_link().
> > >
> > > Right. There is an overhead of waiting for ~1ms during boot.
> >
> > Async probe should mitigate that, right? Saravana is working toward
> > making that the default instead of opt in, but you could opt in now.
> >
>
> No. The delay is due to the DWC core waiting for link up that depends on
> the PCIe device to be present on the slot.
Yes, I understand that already.
> The driver probe order
> doesn't apply here.
I'm not talking about probe order, but rather async probe enabling
parallel probing. If waiting for the link happens asynchronously, then
other probes can happen in parallel and you won't see the delay (until
you run out of cores or all the other probes are faster).
Rob
On 10/4/2022 6:23 PM, Rob Herring wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam
> <[email protected]> wrote:
>>
>> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
>>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
>>> <[email protected]> wrote:
>>>>
>>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
>>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
>>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
>>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
>>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>>>>>>> not having an endpoint connected to the slot is not an error.
>>>>>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>>>>
>>>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being
>>>>>>> returned. So I don't think the log severity should be changed.
>>>>>>
>>>>>> Yes it is an error in the sense it is a timeout, but reporting an error
>>>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note
>>>>>> that a similar change was made by the following commit and it also seems
>>>>>> appropriate here ...
>>>>>>
>>>>>> commit 4b16a8227907118e011fb396022da671a52b2272
>>>>>> Author: Manikanta Maddireddy <[email protected]>
>>>>>> Date: Tue Jun 18 23:32:06 2019 +0530
>>>>>>
>>>>>> PCI: tegra: Change link retry log level to debug
>>>>>>
>>>>>>
>>>>>> BTW, we check for error messages in the dmesg output and this is a new error
>>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
>>>>>> error, but in this case it seem more appropriate to make this a info or
>>>>>> debug level print.
>>>>>
>>>>> Can you tell whether there's a device present, e.g., via Slot Status
>>>>> Presence Detect? If there's nothing in the slot, I don't know why we
>>>>> would print anything at all. If a card is present but there's no
>>>>> link, that's probably worthy of dev_info() or even dev_err().
>>>>>
>>>>
>>>> I don't think all form factors allow for the PRSNT pin to be wired up,
>>>> so we cannot know if the device is actually present in the slot or not all
>>>> the time. Maybe we should do if the form factor supports it?
>>>>
>>>>> I guess if you can tell the slot is empty, there's no point in even
>>>>> trying to start the link, so you could avoid both the message and the
>>>>> timeout by not even calling dw_pcie_wait_for_link().
>>>>
>>>> Right. There is an overhead of waiting for ~1ms during boot.
>>>
>>> Async probe should mitigate that, right? Saravana is working toward
>>> making that the default instead of opt in, but you could opt in now.
>>>
>>
>> No. The delay is due to the DWC core waiting for link up that depends on
>> the PCIe device to be present on the slot.
>
> Yes, I understand that already.
>
>> The driver probe order
>> doesn't apply here.
>
> I'm not talking about probe order, but rather async probe enabling
> parallel probing. If waiting for the link happens asynchronously, then
> other probes can happen in parallel and you won't see the delay (until
> you run out of cores or all the other probes are faster).
Are you suggesting to break the existing probe of DWC based PCIe
platform drivers into two i.e. sync part that handles the sequence up
until link up and the async part that starts after link is up (or after
LIKUP_TIMEOUT if link doesn't come up).
- Vidya Sagar
>
> Rob
>
Hi Bjorn,
On 13/09/2022 11:12, Vidya Sagar wrote:
> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> not having an endpoint connected to the slot is not an error.
> So, changing the macro from dev_err to dev_info to log the event.
>
> Signed-off-by: Vidya Sagar <[email protected]>
> ---
> drivers/pci/controller/dwc/pcie-designware.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> index 650a7f22f9d0..25154555aa7a 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.c
> +++ b/drivers/pci/controller/dwc/pcie-designware.c
> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
> }
>
> if (retries >= LINK_WAIT_MAX_RETRIES) {
> - dev_err(pci->dev, "Phy link never came up\n");
> + dev_info(pci->dev, "Phy link never came up\n");
> return -ETIMEDOUT;
> }
>
Are you OK to take this change?
Acked-by: Jon Hunter <[email protected]>
Tested-by: Jon Hunter <[email protected]>
Thanks
Jon
--
nvpublic
On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
> Hi Bjorn,
>
> On 13/09/2022 11:12, Vidya Sagar wrote:
> > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > not having an endpoint connected to the slot is not an error.
> > So, changing the macro from dev_err to dev_info to log the event.
> >
> > Signed-off-by: Vidya Sagar <[email protected]>
> > ---
> > drivers/pci/controller/dwc/pcie-designware.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > index 650a7f22f9d0..25154555aa7a 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
> > }
> > if (retries >= LINK_WAIT_MAX_RETRIES) {
> > - dev_err(pci->dev, "Phy link never came up\n");
> > + dev_info(pci->dev, "Phy link never came up\n");
> > return -ETIMEDOUT;
> > }
>
>
> Are you OK to take this change?
When this came up, Lorenzo was in the middle of a big move and I was
covering for him while he was unavailable. But he's back, and I'm
sure he will resolve this soon.
Personally I'm OK either way.
Bjorn
Hi Lorenzo,
On 18/10/2022 17:43, Bjorn Helgaas wrote:
> On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
>> Hi Bjorn,
>>
>> On 13/09/2022 11:12, Vidya Sagar wrote:
>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>> not having an endpoint connected to the slot is not an error.
>>> So, changing the macro from dev_err to dev_info to log the event.
>>>
>>> Signed-off-by: Vidya Sagar <[email protected]>
>>> ---
>>> drivers/pci/controller/dwc/pcie-designware.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
>>> index 650a7f22f9d0..25154555aa7a 100644
>>> --- a/drivers/pci/controller/dwc/pcie-designware.c
>>> +++ b/drivers/pci/controller/dwc/pcie-designware.c
>>> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
>>> }
>>> if (retries >= LINK_WAIT_MAX_RETRIES) {
>>> - dev_err(pci->dev, "Phy link never came up\n");
>>> + dev_info(pci->dev, "Phy link never came up\n");
>>> return -ETIMEDOUT;
>>> }
>>
>>
>> Are you OK to take this change?
>
> When this came up, Lorenzo was in the middle of a big move and I was
> covering for him while he was unavailable. But he's back, and I'm
> sure he will resolve this soon.
>
> Personally I'm OK either way.
>
> Bjorn
Can we come to a conclusion on this?
We have tests that fail when errors/warning messages are reported. We
can choose to ignore these errors, but given that this is not an error
in this case, we were thinking it is better to make it informational.
Thanks
Jon
--
nvpublic
On Wed, Oct 26, 2022 at 12:39:13PM +0100, Jon Hunter wrote:
> Hi Lorenzo,
>
> On 18/10/2022 17:43, Bjorn Helgaas wrote:
> > On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
> > > Hi Bjorn,
> > >
> > > On 13/09/2022 11:12, Vidya Sagar wrote:
> > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > not having an endpoint connected to the slot is not an error.
> > > > So, changing the macro from dev_err to dev_info to log the event.
> > > >
> > > > Signed-off-by: Vidya Sagar <[email protected]>
> > > > ---
> > > > drivers/pci/controller/dwc/pcie-designware.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > > > index 650a7f22f9d0..25154555aa7a 100644
> > > > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > > > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
> > > > }
> > > > if (retries >= LINK_WAIT_MAX_RETRIES) {
> > > > - dev_err(pci->dev, "Phy link never came up\n");
> > > > + dev_info(pci->dev, "Phy link never came up\n");
> > > > return -ETIMEDOUT;
> > > > }
> > >
> > >
> > > Are you OK to take this change?
> >
> > When this came up, Lorenzo was in the middle of a big move and I was
> > covering for him while he was unavailable. But he's back, and I'm
> > sure he will resolve this soon.
> >
> > Personally I'm OK either way.
> >
> > Bjorn
>
>
> Can we come to a conclusion on this?
>
> We have tests that fail when errors/warning messages are reported. We can
> choose to ignore these errors, but given that this is not an error in this
> case, we were thinking it is better to make it informational.
I understood.
We are at v6.1-rc2, this patch is not a fix, we will handle it for the
v6.2 merge window.
Thanks,
Lorenzo
On Mon, Oct 10, 2022 at 1:02 AM Vidya Sagar <[email protected]> wrote:
>
>
>
> On 10/4/2022 6:23 PM, Rob Herring wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam
> > <[email protected]> wrote:
> >>
> >> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> >>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> >>> <[email protected]> wrote:
> >>>>
> >>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> >>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> >>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> >>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> >>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> >>>>>>>> not having an endpoint connected to the slot is not an error.
> >>>>>>>> So, changing the macro from dev_err to dev_info to log the event.
> >>>>>>>
> >>>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being
> >>>>>>> returned. So I don't think the log severity should be changed.
> >>>>>>
> >>>>>> Yes it is an error in the sense it is a timeout, but reporting an error
> >>>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note
> >>>>>> that a similar change was made by the following commit and it also seems
> >>>>>> appropriate here ...
> >>>>>>
> >>>>>> commit 4b16a8227907118e011fb396022da671a52b2272
> >>>>>> Author: Manikanta Maddireddy <[email protected]>
> >>>>>> Date: Tue Jun 18 23:32:06 2019 +0530
> >>>>>>
> >>>>>> PCI: tegra: Change link retry log level to debug
> >>>>>>
> >>>>>>
> >>>>>> BTW, we check for error messages in the dmesg output and this is a new error
> >>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> >>>>>> error, but in this case it seem more appropriate to make this a info or
> >>>>>> debug level print.
> >>>>>
> >>>>> Can you tell whether there's a device present, e.g., via Slot Status
> >>>>> Presence Detect? If there's nothing in the slot, I don't know why we
> >>>>> would print anything at all. If a card is present but there's no
> >>>>> link, that's probably worthy of dev_info() or even dev_err().
> >>>>>
> >>>>
> >>>> I don't think all form factors allow for the PRSNT pin to be wired up,
> >>>> so we cannot know if the device is actually present in the slot or not all
> >>>> the time. Maybe we should do if the form factor supports it?
> >>>>
> >>>>> I guess if you can tell the slot is empty, there's no point in even
> >>>>> trying to start the link, so you could avoid both the message and the
> >>>>> timeout by not even calling dw_pcie_wait_for_link().
> >>>>
> >>>> Right. There is an overhead of waiting for ~1ms during boot.
> >>>
> >>> Async probe should mitigate that, right? Saravana is working toward
> >>> making that the default instead of opt in, but you could opt in now.
> >>>
> >>
> >> No. The delay is due to the DWC core waiting for link up that depends on
> >> the PCIe device to be present on the slot.
> >
> > Yes, I understand that already.
> >
> >> The driver probe order
> >> doesn't apply here.
> >
> > I'm not talking about probe order, but rather async probe enabling
> > parallel probing. If waiting for the link happens asynchronously, then
> > other probes can happen in parallel and you won't see the delay (until
> > you run out of cores or all the other probes are faster).
>
> Are you suggesting to break the existing probe of DWC based PCIe
> platform drivers into two i.e. sync part that handles the sequence up
> until link up and the async part that starts after link is up (or after
> LIKUP_TIMEOUT if link doesn't come up).
No, just make the driver opt-in to async probe. It's 1 flag to set for
the driver. Then the delay in your probe is not blocking other probes
and the whole probe for the driver happens in parallel. Then the delay
is only an issue if it is longer than all the other things
initializing during boot or if you are on a single core system.
Neither is likely true.
Rob
On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> <[email protected]> wrote:
> >
> > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > >
> > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > returned. So I don't think the log severity should be changed.
> > > >
> > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > that a similar change was made by the following commit and it also seems
> > > > appropriate here ...
> > > >
> > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > Author: Manikanta Maddireddy <[email protected]>
> > > > Date: Tue Jun 18 23:32:06 2019 +0530
> > > >
> > > > PCI: tegra: Change link retry log level to debug
> > > >
> > > >
> > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > error, but in this case it seem more appropriate to make this a info or
> > > > debug level print.
> > >
> > > Can you tell whether there's a device present, e.g., via Slot Status
> > > Presence Detect? If there's nothing in the slot, I don't know why we
> > > would print anything at all. If a card is present but there's no
> > > link, that's probably worthy of dev_info() or even dev_err().
> > >
> >
> > I don't think all form factors allow for the PRSNT pin to be wired up,
> > so we cannot know if the device is actually present in the slot or not all
> > the time. Maybe we should do if the form factor supports it?
> >
> > > I guess if you can tell the slot is empty, there's no point in even
> > > trying to start the link, so you could avoid both the message and the
> > > timeout by not even calling dw_pcie_wait_for_link().
> >
> > Right. There is an overhead of waiting for ~1ms during boot.
>
> Async probe should mitigate that, right? Saravana is working toward
> making that the default instead of opt in, but you could opt in now.
I read this as "trying to bring the link up is mandatory because
we can't detect an empty slot, therefore ~1ms delay during boot
is unavoidable" and on that Rob's suggestion applies.
Just to understand where this thread is going. The suggestion
above does not change the aim of this patch, that seems reasonable
to me and it makes sense even if/when what Rob is suggesting is
implemented.
Is that correct ?
Thanks,
Lorenzo
On 26/10/2022 12:39, Jon Hunter wrote:
> Hi Lorenzo,
>
> On 18/10/2022 17:43, Bjorn Helgaas wrote:
>> On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
>>> Hi Bjorn,
>>>
>>> On 13/09/2022 11:12, Vidya Sagar wrote:
>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
>>>> not having an endpoint connected to the slot is not an error.
>>>> So, changing the macro from dev_err to dev_info to log the event.
>>>>
>>>> Signed-off-by: Vidya Sagar <[email protected]>
>>>> ---
>>>> drivers/pci/controller/dwc/pcie-designware.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.c
>>>> b/drivers/pci/controller/dwc/pcie-designware.c
>>>> index 650a7f22f9d0..25154555aa7a 100644
>>>> --- a/drivers/pci/controller/dwc/pcie-designware.c
>>>> +++ b/drivers/pci/controller/dwc/pcie-designware.c
>>>> @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
>>>> }
>>>> if (retries >= LINK_WAIT_MAX_RETRIES) {
>>>> - dev_err(pci->dev, "Phy link never came up\n");
>>>> + dev_info(pci->dev, "Phy link never came up\n");
>>>> return -ETIMEDOUT;
>>>> }
>>>
>>>
>>> Are you OK to take this change?
>>
>> When this came up, Lorenzo was in the middle of a big move and I was
>> covering for him while he was unavailable. But he's back, and I'm
>> sure he will resolve this soon.
>>
>> Personally I'm OK either way.
>>
>> Bjorn
>
>
> Can we come to a conclusion on this?
>
> We have tests that fail when errors/warning messages are reported. We
> can choose to ignore these errors, but given that this is not an error
> in this case, we were thinking it is better to make it informational.
Is there any hardware presence detect available to just avoid even
trying to bring a link up on an disconnected port?
--
Ben Dooks http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius
https://www.codethink.co.uk/privacy.html
On Thu, Oct 27, 2022 at 11:39:02AM +0200, Lorenzo Pieralisi wrote:
> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> > On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> > <[email protected]> wrote:
> > >
> > > On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> > > > > On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> > > > > > On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> > > > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > > > not having an endpoint connected to the slot is not an error.
> > > > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > > >
> > > > > > But the link up not happening is an actual error and -ETIMEDOUT is being
> > > > > > returned. So I don't think the log severity should be changed.
> > > > >
> > > > > Yes it is an error in the sense it is a timeout, but reporting an error
> > > > > because nothing is attached to a PCI slot seems a bit noisy. Please note
> > > > > that a similar change was made by the following commit and it also seems
> > > > > appropriate here ...
> > > > >
> > > > > commit 4b16a8227907118e011fb396022da671a52b2272
> > > > > Author: Manikanta Maddireddy <[email protected]>
> > > > > Date: Tue Jun 18 23:32:06 2019 +0530
> > > > >
> > > > > PCI: tegra: Change link retry log level to debug
> > > > >
> > > > >
> > > > > BTW, we check for error messages in the dmesg output and this is a new error
> > > > > seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> > > > > error, but in this case it seem more appropriate to make this a info or
> > > > > debug level print.
> > > >
> > > > Can you tell whether there's a device present, e.g., via Slot Status
> > > > Presence Detect? If there's nothing in the slot, I don't know why we
> > > > would print anything at all. If a card is present but there's no
> > > > link, that's probably worthy of dev_info() or even dev_err().
> > > >
> > >
> > > I don't think all form factors allow for the PRSNT pin to be wired up,
> > > so we cannot know if the device is actually present in the slot or not all
> > > the time. Maybe we should do if the form factor supports it?
> > >
> > > > I guess if you can tell the slot is empty, there's no point in even
> > > > trying to start the link, so you could avoid both the message and the
> > > > timeout by not even calling dw_pcie_wait_for_link().
> > >
> > > Right. There is an overhead of waiting for ~1ms during boot.
> >
> > Async probe should mitigate that, right? Saravana is working toward
> > making that the default instead of opt in, but you could opt in now.
>
> I read this as "trying to bring the link up is mandatory because
> we can't detect an empty slot, therefore ~1ms delay during boot
> is unavoidable" and on that Rob's suggestion applies.
>
Right. Rob's suggestion came as areply to my worry on waiting for link up
during boot.
> Just to understand where this thread is going. The suggestion
> above does not change the aim of this patch, that seems reasonable
> to me and it makes sense even if/when what Rob is suggesting is
> implemented.
>
> Is that correct ?
>
Yes, Rob's suggestion makes sense to me. And it has to be implemented as a
separate patch but this patch itself is required to demote the error log to
debug.
Thanks,
Mani
> Thanks,
> Lorenzo
--
மணிவண்ணன் சதாசிவம்
On Thu, Oct 27, 2022 at 10:55:34AM +0100, Ben Dooks wrote:
> On 26/10/2022 12:39, Jon Hunter wrote:
> > Hi Lorenzo,
> >
> > On 18/10/2022 17:43, Bjorn Helgaas wrote:
> > > On Tue, Oct 18, 2022 at 07:21:54AM +0100, Jon Hunter wrote:
> > > > Hi Bjorn,
> > > >
> > > > On 13/09/2022 11:12, Vidya Sagar wrote:
> > > > > Some of the platforms (like Tegra194 and Tegra234) have open slots and
> > > > > not having an endpoint connected to the slot is not an error.
> > > > > So, changing the macro from dev_err to dev_info to log the event.
> > > > >
> > > > > Signed-off-by: Vidya Sagar <[email protected]>
> > > > > ---
> > > > > drivers/pci/controller/dwc/pcie-designware.c | 2 +-
> > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c
> > > > > b/drivers/pci/controller/dwc/pcie-designware.c
> > > > > index 650a7f22f9d0..25154555aa7a 100644
> > > > > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > > > > @@ -456,7 +456,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
> > > > > }
> > > > > if (retries >= LINK_WAIT_MAX_RETRIES) {
> > > > > - dev_err(pci->dev, "Phy link never came up\n");
> > > > > + dev_info(pci->dev, "Phy link never came up\n");
> > > > > return -ETIMEDOUT;
> > > > > }
> > > >
> > > >
> > > > Are you OK to take this change?
> > >
> > > When this came up, Lorenzo was in the middle of a big move and I was
> > > covering for him while he was unavailable. But he's back, and I'm
> > > sure he will resolve this soon.
> > >
> > > Personally I'm OK either way.
> > >
> > > Bjorn
> >
> >
> > Can we come to a conclusion on this?
> >
> > We have tests that fail when errors/warning messages are reported. We
> > can choose to ignore these errors, but given that this is not an error
> > in this case, we were thinking it is better to make it informational.
>
> Is there any hardware presence detect available to just avoid even
> trying to bring a link up on an disconnected port?
>
PRSNT pin is not available on all form factors sadly.
Thanks,
Mani
>
> --
> Ben Dooks http://www.codethink.co.uk/
> Senior Engineer Codethink - Providing Genius
>
> https://www.codethink.co.uk/privacy.html
>
--
மணிவண்ணன் சதாசிவம்
On Tue, 13 Sep 2022 15:42:37 +0530, Vidya Sagar wrote:
> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> not having an endpoint connected to the slot is not an error.
> So, changing the macro from dev_err to dev_info to log the event.
>
>
Applied to pci/dwc, thanks!
[1/1] PCI: dwc: Use dev_info for PCIe link down event logging
https://git.kernel.org/lpieralisi/pci/c/8405d8f0956d
Thanks,
Lorenzo