2024-03-01 17:36:44

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH 1/1] PCI: Use the correct bit in Link Training not active check

Besides Link Training bit, pcie_retrain_link() can also be asked to
wait for Data Link Layer Link Active bit (PCIe r6.1 sec 7.5.3.8) using
'use_lt' parameter since the merge commit 1abb47390350 ("Merge branch
'pci/enumeration'").

pcie_retrain_link() first tries to ensure Link Training is not
currently active (see Implementation Note in PCIe r6.1 sec 7.5.3.7)
which must always check Link Training bit regardless of 'use_lt'.
Correct the pcie_wait_for_link_status() parameters to only wait for
the correct bit to ensure there is no ongoing Link Training.

Since waiting for Data Link Layer Link Active bit is only used for the
Target Speed quirk, this only impacts the case when the quirk attempts
to restore speed to higher than 2.5 GT/s (The link is Up at that point
so pcie_retrain_link() will fail).

Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
Signed-off-by: Ilpo Järvinen <[email protected]>
---
drivers/pci/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d8f11a078924..251a0c66c8cb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5016,7 +5016,7 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
* avoid LTSSM race as recommended in Implementation Note at the
* end of PCIe r6.0.1 sec 7.5.3.7.
*/
- rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+ rc = pcie_wait_for_link_status(pdev, true, false);
if (rc)
return rc;

--
2.39.2



2024-03-04 11:21:38

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH 1/1] PCI: Use the correct bit in Link Training not active check

On Fri, 1 Mar 2024, Maciej W. Rozycki wrote:

> On Fri, 1 Mar 2024, Ilpo J?rvinen wrote:
>
> > Besides Link Training bit, pcie_retrain_link() can also be asked to
> > wait for Data Link Layer Link Active bit (PCIe r6.1 sec 7.5.3.8) using
> > 'use_lt' parameter since the merge commit 1abb47390350 ("Merge branch
> > 'pci/enumeration'").
>
> Nope, it was added with commit 680e9c47a229 ("PCI: Add support for
> polling DLLLA to pcie_retrain_link()"), before said merge.

Ah sorry, my wording was not good here, I meant on the line I was
changing in the patch and that line didn't exist in 680e9c47a229 at all.

So yes, DLLLA and use_lt waiting was added in 680e9c47a229 but the merge
commit brought the implementation note related code into
pcie_retrain_link() which I think was mismerged when it comes to use_lt.

> > pcie_retrain_link() first tries to ensure Link Training is not
> > currently active (see Implementation Note in PCIe r6.1 sec 7.5.3.7)
> > which must always check Link Training bit regardless of 'use_lt'.
> > Correct the pcie_wait_for_link_status() parameters to only wait for
> > the correct bit to ensure there is no ongoing Link Training.
>
> You're talking the PCIe spec here and code is talking a bad device case.
>
> > Since waiting for Data Link Layer Link Active bit is only used for the
> > Target Speed quirk, this only impacts the case when the quirk attempts
> > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > so pcie_retrain_link() will fail).
>
> NAK. It's used for both clamping and unclamping and it will break the
> workaround, because the whole point there is to wait until DLLA has been
> set. Using LT is not reliable because it will oscillate in the failure
> case and seeing the bit clear does not mean link has been established.

In pcie_retrain_link(), there are two calls into
pcie_wait_for_link_status() and the second one of them is meant to
implement the link-has-been-established check.

The first wait call comes from e7e39756363a ("PCI/ASPM: Avoid link
retraining race") and is just to ensure the link is not ongoing retraining
to make sure the latest configuration in captured as required by the
implementation note. LT being cleared is exactly what is wanted for that
check because it means that any earlier retraining has ended (a new one
might be starting but that doesn't matter, we saw it cleared so the new
configuration should be in effect for that instance of link retraining).

So my point is, the first check is not even meant to check that link has
been established.

> What are you trying to achieve here and what problem is it to fix?

Actually, I misthought what it breaks so the problem is not well described
above but I still think it is broken:

In the case of quirk, before 2.5GT/s is attempted DLLLA is not set,
right? Then quirk sets 2.5GT/s target speed and calls into
pcie_retrain_link().

The first call into pcie_wait_for_link_status() is made with (..., false,
true) which waits until DLLLA is set but this occurs before OS even
triggered Link Retraining. Since there's no retraining commanded by the
OS, DLLLA will not get set, the wait will fail and error is returned, and
the quirk too returns failure.

It could of course occur that because of the HW retraining loop
(independent of OS control), the link retrains itselfs to 2.5GT/s without
OS asking for it just by OS setting the target speed alone, which is well
possible given the HW behavior in your target speed quirk case is not
fully understood. Even if that's the case, it seems not good to rely on
the HW originating retraining loop triggering the link retraining that
is necessary here.

Maybe this is far fetched thought but perhaps it could explain why you
didn't get the link up with your setup when you tried to test it earlier.

Alternative approach to fix this problem would be to not make the first
call into pcie_wait_for_link_status() at all when use_lt = false.

Of course, I cannot test this with your configuration so I cannot
confirm how the target speed quirk behaves, I just found it by reading the
code. The current code does not make sense because the first wait is
supposed to wait for LT bit, not for DLLLA.

--
i.

2024-03-01 17:22:59

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 1/1] PCI: Use the correct bit in Link Training not active check

On Fri, 1 Mar 2024, Ilpo Järvinen wrote:

> Besides Link Training bit, pcie_retrain_link() can also be asked to
> wait for Data Link Layer Link Active bit (PCIe r6.1 sec 7.5.3.8) using
> 'use_lt' parameter since the merge commit 1abb47390350 ("Merge branch
> 'pci/enumeration'").

Nope, it was added with commit 680e9c47a229 ("PCI: Add support for
polling DLLLA to pcie_retrain_link()"), before said merge.

> pcie_retrain_link() first tries to ensure Link Training is not
> currently active (see Implementation Note in PCIe r6.1 sec 7.5.3.7)
> which must always check Link Training bit regardless of 'use_lt'.
> Correct the pcie_wait_for_link_status() parameters to only wait for
> the correct bit to ensure there is no ongoing Link Training.

You're talking the PCIe spec here and code is talking a bad device case.

> Since waiting for Data Link Layer Link Active bit is only used for the
> Target Speed quirk, this only impacts the case when the quirk attempts
> to restore speed to higher than 2.5 GT/s (The link is Up at that point
> so pcie_retrain_link() will fail).

NAK. It's used for both clamping and unclamping and it will break the
workaround, because the whole point there is to wait until DLLA has been
set. Using LT is not reliable because it will oscillate in the failure
case and seeing the bit clear does not mean link has been established.

What are you trying to achieve here and what problem is it to fix?

Maciej

2024-03-06 15:44:46

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH 1/1] PCI: Use the correct bit in Link Training not active check

On Wed, 6 Mar 2024, Maciej W. Rozycki wrote:
> On Mon, 4 Mar 2024, Ilpo J?rvinen wrote:
>
> > > > Since waiting for Data Link Layer Link Active bit is only used for the
> > > > Target Speed quirk, this only impacts the case when the quirk attempts
> > > > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > > > so pcie_retrain_link() will fail).
> > >
> > > NAK. It's used for both clamping and unclamping and it will break the
> > > workaround, because the whole point there is to wait until DLLA has been
> > > set. Using LT is not reliable because it will oscillate in the failure
> > > case and seeing the bit clear does not mean link has been established.
> >
> > In pcie_retrain_link(), there are two calls into
> > pcie_wait_for_link_status() and the second one of them is meant to
> > implement the link-has-been-established check.
> >
> > The first wait call comes from e7e39756363a ("PCI/ASPM: Avoid link
> > retraining race") and is just to ensure the link is not ongoing retraining
> > to make sure the latest configuration in captured as required by the
> > implementation note. LT being cleared is exactly what is wanted for that
> > check because it means that any earlier retraining has ended (a new one
> > might be starting but that doesn't matter, we saw it cleared so the new
> > configuration should be in effect for that instance of link retraining).
> >
> > So my point is, the first check is not even meant to check that link has
> > been established.
>
> I see what you mean, and I now remember the note in the spec. I had
> concerns about it, but did not do anything about it at that point.
>
> I think we still have no guarantee that LT will be clear at the point we
> set RL, because LT could get reasserted by hardware between our read and
> the setting of RL.
>
> IIUC that doesn't matter really, because the new link
> parameters will be taken into account regardless of whether retraining was
> initiated by hardware in an attempt to do link recovery or triggered by
> software via RL.

I, too, was somewhat worried about having LT never clear for long enough
to successfully sample it during the wait but it's like you say, any new
link training should take account the new Target Speed which should
successfully bring the link up (assuming the quirk works in the first
place) and that should clear LT.

> > Of course, I cannot test this with your configuration so I cannot
> > confirm how the target speed quirk behaves, I just found it by reading the
> > code. The current code does not make sense because the first wait is
> > supposed to wait for LT bit, not for DLLLA.
>
> I think I now understand the problem correctly, and indeed from master
> Linux repo's point of view it's been a defect with the merge referred.
> So I withdraw my objection. Sorry about the confusion.

Thanks, and the confusion was entirely my fault caused my confusing
and partly wrong wording.

> However I think the last paragraph of your commit description:
>
> > Since waiting for Data Link Layer Link Active bit is only used for the
> > Target Speed quirk, this only impacts the case when the quirk attempts
> > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > so pcie_retrain_link() will fail).
>
> is not accurate enough, which contributed to my confusion. In particular
> `pcie_retrain_link' succeeds when the link is up, because it calls
> `pcie_wait_for_link_status' such as to succeed when either LT is clear or
> DLLLA is set.

I know, it was simply wrong because I somehow misthought which way those
use_lt negations worked (and thus thought I found a different problem
than there really was).

I already did major rewriting here to explain out how the code ended up
into its current shape but I'll also consider your explanation below
which is likely better than what I've currently, thanks.

> How about:
>
> This only impacts the Target Speed quirk, which is the only case where
> waiting for Data Link Layer Link Active bit is used. It currently works
> in the problematic case by means of link training getting initiated by
> hardware repeatedly and respecting the new link parameters set by the
> caller, which then make training succeed and bring the link up, setting
> DLLLA and causing pcie_wait_for_link_status() to return success. We are
> not supposed to rely on luck and need to make sure that LT transitioned
> through the inactive state though before we initiate link training by
> hand via RL.
>
> then?
>
> Also for `pcie_retrain_link' I think it would be clearer if we rephrased
> the note as follows:
>
> * Ensure the updated LNKCTL parameters are used during link
> * training by checking that there is no ongoing link training
> * that may have started before link parameters were changed, so
> * as to avoid LTSSM race as recommended in Implementation Note
> * at the end of PCIe r6.0.1 sec 7.5.3.7.
>
> NB I am currently away on holiday until the end of next week. However I
> do have access to my lab and I'll see if I can verify your change tonight
> or another evening this week, and overall thank you for your patience.

Don't worry. I can easily wait even until you're back from the holiday
so no need to push it or anything. :-)

--
i.

2024-03-14 12:40:49

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH 1/1] PCI: Use the correct bit in Link Training not active check

On Wed, 6 Mar 2024, Ilpo J?rvinen wrote:

> On Wed, 6 Mar 2024, Maciej W. Rozycki wrote:
> > On Mon, 4 Mar 2024, Ilpo J?rvinen wrote:
> >
> > > > > Since waiting for Data Link Layer Link Active bit is only used for the
> > > > > Target Speed quirk, this only impacts the case when the quirk attempts
> > > > > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > > > > so pcie_retrain_link() will fail).
> > > >
> > > > NAK. It's used for both clamping and unclamping and it will break the
> > > > workaround, because the whole point there is to wait until DLLA has been
> > > > set. Using LT is not reliable because it will oscillate in the failure
> > > > case and seeing the bit clear does not mean link has been established.
> > >
> > > In pcie_retrain_link(), there are two calls into
> > > pcie_wait_for_link_status() and the second one of them is meant to
> > > implement the link-has-been-established check.
> > >
> > > The first wait call comes from e7e39756363a ("PCI/ASPM: Avoid link
> > > retraining race") and is just to ensure the link is not ongoing retraining
> > > to make sure the latest configuration in captured as required by the
> > > implementation note. LT being cleared is exactly what is wanted for that
> > > check because it means that any earlier retraining has ended (a new one
> > > might be starting but that doesn't matter, we saw it cleared so the new
> > > configuration should be in effect for that instance of link retraining).
> > >
> > > So my point is, the first check is not even meant to check that link has
> > > been established.
> >
> > I see what you mean, and I now remember the note in the spec. I had
> > concerns about it, but did not do anything about it at that point.
> >
> > I think we still have no guarantee that LT will be clear at the point we
> > set RL, because LT could get reasserted by hardware between our read and
> > the setting of RL.
> >
> > IIUC that doesn't matter really, because the new link
> > parameters will be taken into account regardless of whether retraining was
> > initiated by hardware in an attempt to do link recovery or triggered by
> > software via RL.
>
> I, too, was somewhat worried about having LT never clear for long enough
> to successfully sample it during the wait but it's like you say, any new
> link training should take account the new Target Speed which should
> successfully bring the link up (assuming the quirk works in the first
> place) and that should clear LT.

Hi,

One more point to add here, I started to wonder today why that use_lt
parameter is needed at all for pcie_retrain_link()?

Once the Target Speed has been changed to 2.5GT/s which is what the quirk
does before calling retraining, LT too should work "normally" after that.

--
i.

2024-03-14 21:58:32

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 1/1] PCI: Use the correct bit in Link Training not active check

On Thu, 14 Mar 2024, Ilpo Järvinen wrote:

> One more point to add here, I started to wonder today why that use_lt
> parameter is needed at all for pcie_retrain_link()?
>
> Once the Target Speed has been changed to 2.5GT/s which is what the quirk
> does before calling retraining, LT too should work "normally" after that.

We don't know if the link is going to become stable with the TLS update
to 2.5GT/s and we want to ensure that the link has reached the active
state before claiming victory; LT clear does not mean the link is active,
it only means what it means, that is that the link isn't being trained at
the moment.

Also we don't want to reset the TLS to the maximum before the link has
become active.

Does this answer your question?

Maciej

2024-03-06 12:24:01

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 1/1] PCI: Use the correct bit in Link Training not active check

On Mon, 4 Mar 2024, Ilpo Järvinen wrote:

> > > Since waiting for Data Link Layer Link Active bit is only used for the
> > > Target Speed quirk, this only impacts the case when the quirk attempts
> > > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > > so pcie_retrain_link() will fail).
> >
> > NAK. It's used for both clamping and unclamping and it will break the
> > workaround, because the whole point there is to wait until DLLA has been
> > set. Using LT is not reliable because it will oscillate in the failure
> > case and seeing the bit clear does not mean link has been established.
>
> In pcie_retrain_link(), there are two calls into
> pcie_wait_for_link_status() and the second one of them is meant to
> implement the link-has-been-established check.
>
> The first wait call comes from e7e39756363a ("PCI/ASPM: Avoid link
> retraining race") and is just to ensure the link is not ongoing retraining
> to make sure the latest configuration in captured as required by the
> implementation note. LT being cleared is exactly what is wanted for that
> check because it means that any earlier retraining has ended (a new one
> might be starting but that doesn't matter, we saw it cleared so the new
> configuration should be in effect for that instance of link retraining).
>
> So my point is, the first check is not even meant to check that link has
> been established.

I see what you mean, and I now remember the note in the spec. I had
concerns about it, but did not do anything about it at that point.

I think we still have no guarantee that LT will be clear at the point we
set RL, because LT could get reasserted by hardware between our read and
the setting of RL. IIUC that doesn't matter really, because the new link
parameters will be taken into account regardless of whether retraining was
initiated by hardware in an attempt to do link recovery or triggered by
software via RL.

> > What are you trying to achieve here and what problem is it to fix?
>
> Actually, I misthought what it breaks so the problem is not well described
> above but I still think it is broken:
>
> In the case of quirk, before 2.5GT/s is attempted DLLLA is not set,
> right? Then quirk sets 2.5GT/s target speed and calls into
> pcie_retrain_link().

Correct.

> The first call into pcie_wait_for_link_status() is made with (..., false,
> true) which waits until DLLLA is set but this occurs before OS even
> triggered Link Retraining. Since there's no retraining commanded by the
> OS, DLLLA will not get set, the wait will fail and error is returned, and
> the quirk too returns failure.
>
> It could of course occur that because of the HW retraining loop
> (independent of OS control), the link retrains itselfs to 2.5GT/s without
> OS asking for it just by OS setting the target speed alone, which is well
> possible given the HW behavior in your target speed quirk case is not
> fully understood. Even if that's the case, it seems not good to rely on
> the HW originating retraining loop triggering the link retraining that
> is necessary here.

I think it just confirms what I observed above (and which is surely noted
somewhere in the spec) that modified link parameters are taken into
account regardless of how retraining has been initiated and the link gets
established (at 2.5GT/s) at the first call to `pcie_wait_for_link_status'
already and returns successfully seeing DLLLA set.

> Maybe this is far fetched thought but perhaps it could explain why you
> didn't get the link up with your setup when you tried to test it earlier.
>
> Alternative approach to fix this problem would be to not make the first
> call into pcie_wait_for_link_status() at all when use_lt = false.
>
> Of course, I cannot test this with your configuration so I cannot
> confirm how the target speed quirk behaves, I just found it by reading the
> code. The current code does not make sense because the first wait is
> supposed to wait for LT bit, not for DLLLA.

I think I now understand the problem correctly, and indeed from master
Linux repo's point of view it's been a defect with the merge referred.
So I withdraw my objection. Sorry about the confusion.

However I think the last paragraph of your commit description:

> Since waiting for Data Link Layer Link Active bit is only used for the
> Target Speed quirk, this only impacts the case when the quirk attempts
> to restore speed to higher than 2.5 GT/s (The link is Up at that point
> so pcie_retrain_link() will fail).

is not accurate enough, which contributed to my confusion. In particular
`pcie_retrain_link' succeeds when the link is up, because it calls
`pcie_wait_for_link_status' such as to succeed when either LT is clear or
DLLLA is set. How about:

This only impacts the Target Speed quirk, which is the only case where
waiting for Data Link Layer Link Active bit is used. It currently works
in the problematic case by means of link training getting initiated by
hardware repeatedly and respecting the new link parameters set by the
caller, which then make training succeed and bring the link up, setting
DLLLA and causing pcie_wait_for_link_status() to return success. We are
not supposed to rely on luck and need to make sure that LT transitioned
through the inactive state though before we initiate link training by
hand via RL.

then?

Also for `pcie_retrain_link' I think it would be clearer if we rephrased
the note as follows:

* Ensure the updated LNKCTL parameters are used during link
* training by checking that there is no ongoing link training
* that may have started before link parameters were changed, so
* as to avoid LTSSM race as recommended in Implementation Note
* at the end of PCIe r6.0.1 sec 7.5.3.7.

NB I am currently away on holiday until the end of next week. However I
do have access to my lab and I'll see if I can verify your change tonight
or another evening this week, and overall thank you for your patience.

Maciej

2024-03-15 09:59:03

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH 1/1] PCI: Use the correct bit in Link Training not active check

On Thu, 14 Mar 2024, Maciej W. Rozycki wrote:

> On Thu, 14 Mar 2024, Ilpo Järvinen wrote:
>
> > One more point to add here, I started to wonder today why that use_lt
> > parameter is needed at all for pcie_retrain_link()?
> >
> > Once the Target Speed has been changed to 2.5GT/s which is what the quirk
> > does before calling retraining, LT too should work "normally" after that.
>
> We don't know if the link is going to become stable with the TLS update
> to 2.5GT/s and we want to ensure that the link has reached the active
> state before claiming victory; LT clear does not mean the link is active,
> it only means what it means, that is that the link isn't being trained at
> the moment.

LT clear means retraining has ended which is the condition
pcie_retrain_link() should terminate at. It tried and finished retraining
as proven by LT clear.

> Also we don't want to reset the TLS to the maximum before the link has
> become active.

I'm not suggesting to change the if DLLLA check that is within the quirk
so it will remain the same even if pcie_retrain_link() would no longer
have the use_lt parameter.

If LT clear after retraining does not imply DLLLA set, then this again
falls into the quirk territory and IMO the quirk itself should be what
makes the additional call to wait for DLLLA, not pcie_retrain_link().
I suspect though that DL clear does imply DLLLA set (after the target
speed was lowered) so my expectation is that the extra wait wouldn't be
necessary at that point.

> Does this answer your question?

What I'm trying to achieve is having pcie_retrain_link() focus on
retraining and quirk on steps required by the quirk. Currently they're
kind of mixed if we assume the assumption that LT clear doesn't imply
active link is true. That tells quirk would need additional step,
that is, wait for DLLLA after the retraining has completed which is
currently hidden into pcie_retrain_link() rather than explicitly
called by the quirk.

--
i.