2020-04-03 20:08:41

by Lyude Paul

Subject: [PATCH 0/4] drm/dp_mst: drm_dp_check_act_status() fixes

Noticed this while fixing some unrelated issues with NAKs being dropped:
we don't wait nearly long enough to receive ACTs (Allocation Change
Triggers) from MST hubs in some situations. We also take the time to
refactor this function a bit.

This fixes some ACT timeouts I observed on an EVGA MST hub with i915.

Lyude Paul (4):
drm/dp_mst: Improve kdocs for drm_dp_check_act_status()
drm/dp_mst: Reformat drm_dp_check_act_status() a bit
drm/dp_mst: Increase ACT retry timeout to 3s
drm/dp_mst: Print errors on ACT timeouts

drivers/gpu/drm/drm_dp_mst_topology.c | 50 ++++++++++++++++++---------
1 file changed, 34 insertions(+), 16 deletions(-)

--
2.25.1


2020-04-03 20:08:47

by Lyude Paul

Subject: [PATCH 2/4] drm/dp_mst: Reformat drm_dp_check_act_status() a bit

Just add a bit more line wrapping, get rid of some extraneous
whitespace, remove an unneeded goto label, and move around some variable
declarations. No functional changes here.

Signed-off-by: Lyude Paul <[email protected]>
[this isn't a fix, but it's needed for the fix that comes after this]
Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)")
Cc: Sean Paul <[email protected]>
Cc: <[email protected]> # v3.17+
---
drivers/gpu/drm/drm_dp_mst_topology.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 2b9ce965f044..7aaf184a2e5f 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -4473,33 +4473,31 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
*/
int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
{
+ int count = 0, ret;
u8 status;
- int ret;
- int count = 0;

do {
- ret = drm_dp_dpcd_readb(mgr->aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status);
-
+ ret = drm_dp_dpcd_readb(mgr->aux,
+ DP_PAYLOAD_TABLE_UPDATE_STATUS,
+ &status);
if (ret < 0) {
- DRM_DEBUG_KMS("failed to read payload table status %d\n", ret);
- goto fail;
+ DRM_DEBUG_KMS("failed to read payload table status %d\n",
+ ret);
+ return ret;
}

if (status & DP_PAYLOAD_ACT_HANDLED)
break;
count++;
udelay(100);
-
} while (count < 30);

if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
- DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n", status, count);
- ret = -EINVAL;
- goto fail;
+ DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n",
+ status, count);
+ return -EINVAL;
}
return 0;
-fail:
- return ret;
}
EXPORT_SYMBOL(drm_dp_check_act_status);

--
2.25.1
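For illustration, here is a user-space sketch of the loop as reformatted above, run against a fake hub so that the control flow and timing can be checked without hardware. `fake_hub`, `fake_readb()`, `fake_udelay()` and `ACT_HANDLED` are hypothetical stand-ins, not DRM APIs, and the bit value is assumed to match `DP_PAYLOAD_ACT_HANDLED`.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror DP_PAYLOAD_ACT_HANDLED (bit 1 of the status byte). */
#define ACT_HANDLED 0x02

/* Hypothetical stand-in for an MST hub; not a DRM structure. */
struct fake_hub {
	int reads;       /* status reads performed so far */
	int act_after;   /* read number from which ACT_HANDLED is set */
	int fail_at;     /* read number that returns an AUX error, or -1 */
	long delayed_us; /* total time spent busy-waiting */
};

/* Mimics drm_dp_dpcd_readb(): byte count on success, negative errno on failure. */
static int fake_readb(struct fake_hub *hub, unsigned char *status)
{
	int n;

	assert(hub && status);
	n = ++hub->reads;
	if (n == hub->fail_at)
		return -EIO;
	*status = (n >= hub->act_after) ? ACT_HANDLED : 0;
	return 1;
}

/* Stand-in for udelay(): only records how long we would have spun. */
static void fake_udelay(struct fake_hub *hub, long us)
{
	hub->delayed_us += us;
}

/* Same control flow as the reformatted drm_dp_check_act_status(). */
static int check_act_fixed(struct fake_hub *hub)
{
	int count = 0, ret;
	unsigned char status;

	do {
		ret = fake_readb(hub, &status);
		if (ret < 0)
			return ret; /* early return replaces the old "goto fail" */

		if (status & ACT_HANDLED)
			break;
		count++;
		fake_udelay(hub, 100);
	} while (count < 30);

	if (!(status & ACT_HANDLED))
		return -EINVAL;
	return 0;
}
```

With a hub that never raises the bit, this performs 30 reads and 30 delays (3000µs of spinning). The last delay comes after the final read, though, so the window actually sampled is 29 × 100µs = 2900µs, which is the figure quoted in patch 3/4.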

2020-04-03 20:08:54

by Lyude Paul

Subject: [PATCH 4/4] drm/dp_mst: Print errors on ACT timeouts

Although it's not unexpected for drm_dp_check_act_status() to fail due
to DPCD read failures (as the hub may have just been unplugged
suddenly), timeouts are a bit more worrying as they either mean we need
a longer timeout value, or we aren't setting up payload allocations
properly. So, let's start printing errors on timeouts.

Signed-off-by: Lyude Paul <[email protected]>
Cc: Sean Paul <[email protected]>
---
drivers/gpu/drm/drm_dp_mst_topology.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index f313407374ed..3d0d373f6f91 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -4494,6 +4494,10 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
DP_PAYLOAD_TABLE_UPDATE_STATUS,
&status);
if (ret < 0) {
+ /*
+ * Failure here isn't unexpected - the hub may have
+ * just been unplugged
+ */
DRM_DEBUG_KMS("failed to read payload table status %d\n",
ret);
return ret;
@@ -4505,8 +4509,8 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
} while (jiffies < timeout);

if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
- DRM_DEBUG_KMS("failed to get ACT bit %d after %dms\n",
- status, timeout_ms);
+ DRM_ERROR("Failed to get ACT after %dms, last status: %02x\n",
+ timeout_ms, status);
return -EINVAL;
}
return 0;
--
2.25.1
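A rough user-space model of the reporting policy in this patch: AUX read failures stay at debug level and propagate their error code, while a real timeout is reported as an error with the last status byte printed as two hex digits. The `log_sink` type and `report_act_result()` are hypothetical; in the kernel these would be the DRM_DEBUG_KMS()/DRM_ERROR() calls shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror DP_PAYLOAD_ACT_HANDLED. */
#define ACT_HANDLED 0x02

/* Hypothetical log sink standing in for DRM_DEBUG_KMS()/DRM_ERROR(). */
enum log_level { LOG_NONE, LOG_DEBUG, LOG_ERROR };

struct log_sink {
	enum log_level level;
	char msg[96];
};

/*
 * Reporting policy from this patch:
 *  - read_ret < 0: the AUX read failed (the hub may just have been
 *    unplugged), so log at debug level and propagate the error;
 *  - ACT still not handled: log an error with the last status byte as
 *    two hex digits and return -EINVAL.
 */
static int report_act_result(struct log_sink *sink, int read_ret,
			     unsigned char status, int timeout_ms)
{
	assert(sink != NULL);
	sink->level = LOG_NONE;
	sink->msg[0] = '\0';

	if (read_ret < 0) {
		sink->level = LOG_DEBUG;
		snprintf(sink->msg, sizeof(sink->msg),
			 "failed to read payload table status %d", read_ret);
		return read_ret;
	}
	if (!(status & ACT_HANDLED)) {
		sink->level = LOG_ERROR;
		snprintf(sink->msg, sizeof(sink->msg),
			 "Failed to get ACT after %dms, last status: %02x",
			 timeout_ms, status);
		return -EINVAL;
	}
	return 0;
}
```

For example, a last status of 0x01 is logged as `last status: 01`, rather than as the decimal `1` the old debug message would have printed.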

2020-04-03 20:09:18

by Lyude Paul

Subject: [PATCH 3/4] drm/dp_mst: Increase ACT retry timeout to 3s

Currently we only poll for an ACT up to 30 times, with a busy-wait delay
of 100µs between each attempt, giving us a timeout of 2900µs. While
this might seem sensible, it would appear that in certain scenarios it
can take dramatically longer than that for us to receive an ACT. On one
of the EVGA MST hubs that I have available, I observed said hub
sometimes taking longer than a second before signalling the ACT. These
delays mostly seem to occur when previous sideband messages we've sent
are NAKed by the hub. However, it wouldn't be particularly surprising if
it's possible to reproduce times like this simply by introducing branch
devices with large LCTs, since payload allocations have to take effect on
every downstream device up to the payload's target.

So, instead of just retrying 30 times, we poll for the ACT for up to 3s,
and additionally use usleep_range() to avoid a very long and rude
busy-wait. Note that the previous retry count of 30 appears to have been
arbitrarily chosen, as I can't find any mention of a recommended timeout
or retry count for ACTs in the DisplayPort 2.0 specification. The same
goes for the 100µs delay we were previously passing to udelay(), although
I suspect that was just copied from the recommended delay for link
training on SST devices.

Signed-off-by: Lyude Paul <[email protected]>
Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)")
Cc: Sean Paul <[email protected]>
Cc: <[email protected]> # v3.17+
---
drivers/gpu/drm/drm_dp_mst_topology.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 7aaf184a2e5f..f313407374ed 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -4466,17 +4466,30 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
* @mgr: manager to use
*
* Tries waiting for the MST hub to finish updating it's payload table by
- * polling for the ACT handled bit.
+ * polling for the ACT handled bit for up to 3 seconds (yes-some hubs really
+ * take that long).
*
* Returns:
* 0 if the ACT was handled in time, negative error code on failure.
*/
int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
{
- int count = 0, ret;
+ /*
+ * There doesn't seem to be any recommended retry count or timeout in
+ * the MST specification. Since some hubs have been observed to take
+ * over 1 second to update their payload allocations under certain
+ * conditions, we use a rather large timeout value.
+ */
+ const int timeout_ms = 3000;
+ unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
+ int ret;
+ bool retrying = false;
u8 status;

do {
+ if (retrying)
+ usleep_range(100, 1000);
+
ret = drm_dp_dpcd_readb(mgr->aux,
DP_PAYLOAD_TABLE_UPDATE_STATUS,
&status);
@@ -4488,13 +4501,12 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)

if (status & DP_PAYLOAD_ACT_HANDLED)
break;
- count++;
- udelay(100);
- } while (count < 30);
+ retrying = true;
+ } while (jiffies < timeout);

if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
- DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n",
- status, count);
+ DRM_DEBUG_KMS("failed to get ACT bit %d after %dms\n",
+ status, timeout_ms);
return -EINVAL;
}
return 0;
--
2.25.1
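To sanity-check the new loop's behaviour, here is a deterministic user-space simulation of it. The `sim` struct, `sim_readb()` and `sim_usleep_range()` are hypothetical stand-ins for the AUX read, jiffies and usleep_range(); the simulated sleep always takes the upper bound of the range. One deliberate difference from the patch: the loop condition uses a signed-difference comparison rather than `jiffies < timeout`, for the wraparound reason raised in review.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror DP_PAYLOAD_ACT_HANDLED. */
#define ACT_HANDLED 0x02

/* Hypothetical simulation: the clock only moves when we sleep. */
struct sim {
	unsigned long now_us;    /* simulated monotonic time */
	unsigned long act_at_us; /* time at which the hub raises ACT */
	int reads;
	int sleeps;
};

static int sim_readb(struct sim *s, unsigned char *status)
{
	s->reads++;
	*status = (s->now_us >= s->act_at_us) ? ACT_HANDLED : 0;
	return 1;
}

/* Stand-in for usleep_range(): pessimistically sleeps for the maximum. */
static void sim_usleep_range(struct sim *s, unsigned long min_us,
			     unsigned long max_us)
{
	assert(min_us <= max_us);
	s->now_us += max_us;
	s->sleeps++;
}

/* Deadline-based loop from this patch; no sleep before the first read. */
static int check_act_deadline(struct sim *s, int timeout_ms)
{
	unsigned long timeout = s->now_us + (unsigned long)timeout_ms * 1000;
	int retrying = 0, ret;
	unsigned char status;

	do {
		if (retrying)
			sim_usleep_range(s, 100, 1000);

		ret = sim_readb(s, &status);
		if (ret < 0)
			return ret;
		if (status & ACT_HANDLED)
			break;
		retrying = 1;
	} while ((long)(s->now_us - timeout) < 0); /* wrap-safe "now < timeout" */

	if (!(status & ACT_HANDLED))
		return -EINVAL;
	return 0;
}
```

Because the sleep is skipped before the first read, a hub that has already set the ACT bit costs a single AUX read and no sleep. A hub that never sets it is polled 3001 times in this model; with a real usleep_range(100, 1000) the count would depend on how early the scheduler wakes us.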

2020-04-03 20:10:54

by Lyude Paul

Subject: [PATCH 1/4] drm/dp_mst: Improve kdocs for drm_dp_check_act_status()

No functional changes.

Signed-off-by: Lyude Paul <[email protected]>
Cc: Sean Paul <[email protected]>
---
drivers/gpu/drm/drm_dp_mst_topology.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 10d0315af513..2b9ce965f044 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -4462,10 +4462,14 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,


/**
- * drm_dp_check_act_status() - Check ACT handled status.
+ * drm_dp_check_act_status() - Polls for ACT handled status.
* @mgr: manager to use
*
- * Check the payload status bits in the DPCD for ACT handled completion.
+ * Tries waiting for the MST hub to finish updating it's payload table by
+ * polling for the ACT handled bit.
+ *
+ * Returns:
+ * 0 if the ACT was handled in time, negative error code on failure.
*/
int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
{
--
2.25.1

2020-04-06 19:23:30

by Sean Paul

Subject: Re: [PATCH 1/4] drm/dp_mst: Improve kdocs for drm_dp_check_act_status()

On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <[email protected]> wrote:
>
> No functional changes.
>
> Signed-off-by: Lyude Paul <[email protected]>
> Cc: Sean Paul <[email protected]>

Reviewed-by: Sean Paul <[email protected]>

> ---
> drivers/gpu/drm/drm_dp_mst_topology.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 10d0315af513..2b9ce965f044 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -4462,10 +4462,14 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
>
>
> /**
> - * drm_dp_check_act_status() - Check ACT handled status.
> + * drm_dp_check_act_status() - Polls for ACT handled status.
> * @mgr: manager to use
> *
> - * Check the payload status bits in the DPCD for ACT handled completion.
> + * Tries waiting for the MST hub to finish updating it's payload table by
> + * polling for the ACT handled bit.
> + *
> + * Returns:
> + * 0 if the ACT was handled in time, negative error code on failure.
> */
> int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> {
> --
> 2.25.1
>

2020-04-06 19:26:25

by Sean Paul

Subject: Re: [PATCH 2/4] drm/dp_mst: Reformat drm_dp_check_act_status() a bit

On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <[email protected]> wrote:
>
> Just add a bit more line wrapping, get rid of some extraneous
> whitespace, remove an unneeded goto label, and move around some variable
> declarations. No functional changes here.
>
> Signed-off-by: Lyude Paul <[email protected]>
> [this isn't a fix, but it's needed for the fix that comes after this]
> Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)")
> Cc: Sean Paul <[email protected]>
> Cc: <[email protected]> # v3.17+
> ---
> drivers/gpu/drm/drm_dp_mst_topology.c | 22 ++++++++++------------
> 1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 2b9ce965f044..7aaf184a2e5f 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -4473,33 +4473,31 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
> */
> int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> {
> + int count = 0, ret;
> u8 status;
> - int ret;
> - int count = 0;
>
> do {
> - ret = drm_dp_dpcd_readb(mgr->aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status);
> -
> + ret = drm_dp_dpcd_readb(mgr->aux,
> + DP_PAYLOAD_TABLE_UPDATE_STATUS,
> + &status);
> if (ret < 0) {
> - DRM_DEBUG_KMS("failed to read payload table status %d\n", ret);
> - goto fail;
> + DRM_DEBUG_KMS("failed to read payload table status %d\n",
> + ret);
> + return ret;
> }
>
> if (status & DP_PAYLOAD_ACT_HANDLED)
> break;
> count++;
> udelay(100);
> -
> } while (count < 30);
>
> if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
> - DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n", status, count);
> - ret = -EINVAL;
> - goto fail;
> + DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n",

Should we print status in base16 here?

Otherwise:

Reviewed-by: Sean Paul <[email protected]>

> + status, count);
> + return -EINVAL;
> }
> return 0;
> -fail:
> - return ret;
> }
> EXPORT_SYMBOL(drm_dp_check_act_status);
>
> --
> 2.25.1
>

2020-04-06 19:28:47

by Lyude Paul

Subject: Re: [PATCH 2/4] drm/dp_mst: Reformat drm_dp_check_act_status() a bit

On Mon, 2020-04-06 at 15:23 -0400, Sean Paul wrote:
> On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <[email protected]> wrote:
> > Just add a bit more line wrapping, get rid of some extraneous
> > whitespace, remove an unneeded goto label, and move around some variable
> > declarations. No functional changes here.
> >
> > Signed-off-by: Lyude Paul <[email protected]>
> > [this isn't a fix, but it's needed for the fix that comes after this]
> > Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)")
> > Cc: Sean Paul <[email protected]>
> > Cc: <[email protected]> # v3.17+
> > ---
> > drivers/gpu/drm/drm_dp_mst_topology.c | 22 ++++++++++------------
> > 1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 2b9ce965f044..7aaf184a2e5f 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -4473,33 +4473,31 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
> > */
> > int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> > {
> > + int count = 0, ret;
> > u8 status;
> > - int ret;
> > - int count = 0;
> >
> > do {
> > - ret = drm_dp_dpcd_readb(mgr->aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status);
> > -
> > + ret = drm_dp_dpcd_readb(mgr->aux,
> > + DP_PAYLOAD_TABLE_UPDATE_STATUS,
> > + &status);
> > if (ret < 0) {
> > - DRM_DEBUG_KMS("failed to read payload table status %d\n", ret);
> > - goto fail;
> > + DRM_DEBUG_KMS("failed to read payload table status %d\n",
> > + ret);
> > + return ret;
> > }
> >
> > if (status & DP_PAYLOAD_ACT_HANDLED)
> > break;
> > count++;
> > udelay(100);
> > -
> > } while (count < 30);
> >
> > if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
> > - DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n", status, count);
> > - ret = -EINVAL;
> > - goto fail;
> > + DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n",
>
> Should we print status in base16 here?
>
> Otherwise:
>
> Reviewed-by: Sean Paul <[email protected]>

Good point - I'll make sure to fix that before I push the series
>
> > + status, count);
> > + return -EINVAL;
> > }
> > return 0;
> > -fail:
> > - return ret;
> > }
> > EXPORT_SYMBOL(drm_dp_check_act_status);
> >
> > --
> > 2.25.1
> >
--
Cheers,
Lyude Paul (she/her)
Associate Software Engineer at Red Hat

2020-04-06 19:42:44

by Sean Paul

Subject: Re: [PATCH 3/4] drm/dp_mst: Increase ACT retry timeout to 3s

On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <[email protected]> wrote:
>
> Currently we only poll for an ACT up to 30 times, with a busy-wait delay
> of 100µs between each attempt, giving us a timeout of 2900µs. While
> this might seem sensible, it would appear that in certain scenarios it
> can take dramatically longer than that for us to receive an ACT. On one
> of the EVGA MST hubs that I have available, I observed said hub
> sometimes taking longer than a second before signalling the ACT. These
> delays mostly seem to occur when previous sideband messages we've sent
> are NAKed by the hub. However, it wouldn't be particularly surprising if
> it's possible to reproduce times like this simply by introducing branch
> devices with large LCTs, since payload allocations have to take effect on
> every downstream device up to the payload's target.
>
> So, instead of just retrying 30 times, we poll for the ACT for up to 3s,
> and additionally use usleep_range() to avoid a very long and rude
> busy-wait. Note that the previous retry count of 30 appears to have been
> arbitrarily chosen, as I can't find any mention of a recommended timeout
> or retry count for ACTs in the DisplayPort 2.0 specification. The same
> goes for the 100µs delay we were previously passing to udelay(), although
> I suspect that was just copied from the recommended delay for link
> training on SST devices.
>
> Signed-off-by: Lyude Paul <[email protected]>
> Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)")
> Cc: Sean Paul <[email protected]>
> Cc: <[email protected]> # v3.17+
> ---
> drivers/gpu/drm/drm_dp_mst_topology.c | 26 +++++++++++++++++++-------
> 1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 7aaf184a2e5f..f313407374ed 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -4466,17 +4466,30 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
> * @mgr: manager to use
> *
> * Tries waiting for the MST hub to finish updating it's payload table by
> - * polling for the ACT handled bit.
> + * polling for the ACT handled bit for up to 3 seconds (yes-some hubs really
> + * take that long).
> *
> * Returns:
> * 0 if the ACT was handled in time, negative error code on failure.
> */
> int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> {
> - int count = 0, ret;
> + /*
> + * There doesn't seem to be any recommended retry count or timeout in
> + * the MST specification. Since some hubs have been observed to take
> + * over 1 second to update their payload allocations under certain
> + * conditions, we use a rather large timeout value.
> + */
> + const int timeout_ms = 3000;
> + unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
> + int ret;
> + bool retrying = false;
> u8 status;
>
> do {
> + if (retrying)
> + usleep_range(100, 1000);
> +
> ret = drm_dp_dpcd_readb(mgr->aux,
> DP_PAYLOAD_TABLE_UPDATE_STATUS,
> &status);
> @@ -4488,13 +4501,12 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
>
> if (status & DP_PAYLOAD_ACT_HANDLED)
> break;
> - count++;
> - udelay(100);
> - } while (count < 30);
> + retrying = true;
> + } while (jiffies < timeout);

Somewhat academic, but I think there's an overflow possibility here if
timeout is near ulong_max and jiffies overflows during the usleep. In
that case we'll be retrying for a very loong time.

I wish we had i915's wait_for() macro available to all drm...

Sean

>
> if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
> - DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n",
> - status, count);
> + DRM_DEBUG_KMS("failed to get ACT bit %d after %dms\n",
> + status, timeout_ms);
> return -EINVAL;
> }
> return 0;
> --
> 2.25.1
>
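The wraparound concern can be reproduced with plain unsigned arithmetic. The kernel's usual answer is `time_before(jiffies, timeout)` from `<linux/jiffies.h>`, which compares via a signed difference; the sketch below re-implements that idea as the hypothetical `wrap_safe_before()` and contrasts it with the plain `<` used in the patch. Like the kernel, it relies on the usual two's-complement conversion from unsigned long to long.

```c
#include <assert.h>
#include <limits.h>

/* The comparison used in the loop condition under review. */
static int naive_before(unsigned long now, unsigned long deadline)
{
	return now < deadline;
}

/*
 * Signed-difference comparison, the same idea as the kernel's
 * time_before(a, b): correct as long as the two values are less than
 * LONG_MAX ticks apart, even if the counter wrapped past ULONG_MAX
 * in between.
 */
static int wrap_safe_before(unsigned long now, unsigned long deadline)
{
	return (long)(now - deadline) < 0;
}
```

For example, with `deadline = ULONG_MAX - 2` and a counter that has wrapped to 4 during a sleep, `naive_before(4, deadline)` is true (keep polling, effectively forever) while `wrap_safe_before(4, deadline)` is false. The mirror case also exists: if `jiffies + msecs_to_jiffies(timeout_ms)` itself wraps, say to 748 while `now` is `ULONG_MAX - 1`, the plain comparison is false immediately and the loop gives up after a single read, whereas the signed-difference form still reports that the deadline is ahead.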

2020-04-06 19:44:23

by Lyude Paul

Subject: Re: [PATCH 3/4] drm/dp_mst: Increase ACT retry timeout to 3s

On Mon, 2020-04-06 at 15:41 -0400, Sean Paul wrote:
> On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <[email protected]> wrote:
> > Currently we only poll for an ACT up to 30 times, with a busy-wait delay
> > of 100µs between each attempt, giving us a timeout of 2900µs. While
> > this might seem sensible, it would appear that in certain scenarios it
> > can take dramatically longer than that for us to receive an ACT. On one
> > of the EVGA MST hubs that I have available, I observed said hub
> > sometimes taking longer than a second before signalling the ACT. These
> > delays mostly seem to occur when previous sideband messages we've sent
> > are NAKed by the hub. However, it wouldn't be particularly surprising if
> > it's possible to reproduce times like this simply by introducing branch
> > devices with large LCTs, since payload allocations have to take effect on
> > every downstream device up to the payload's target.
> >
> > So, instead of just retrying 30 times, we poll for the ACT for up to 3s,
> > and additionally use usleep_range() to avoid a very long and rude
> > busy-wait. Note that the previous retry count of 30 appears to have been
> > arbitrarily chosen, as I can't find any mention of a recommended timeout
> > or retry count for ACTs in the DisplayPort 2.0 specification. The same
> > goes for the 100µs delay we were previously passing to udelay(), although
> > I suspect that was just copied from the recommended delay for link
> > training on SST devices.
> >
> > Signed-off-by: Lyude Paul <[email protected]>
> > Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)")
> > Cc: Sean Paul <[email protected]>
> > Cc: <[email protected]> # v3.17+
> > ---
> > drivers/gpu/drm/drm_dp_mst_topology.c | 26 +++++++++++++++++++-------
> > 1 file changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 7aaf184a2e5f..f313407374ed 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -4466,17 +4466,30 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
> > * @mgr: manager to use
> > *
> > * Tries waiting for the MST hub to finish updating it's payload table by
> > - * polling for the ACT handled bit.
> > + * polling for the ACT handled bit for up to 3 seconds (yes-some hubs really
> > + * take that long).
> > *
> > * Returns:
> > * 0 if the ACT was handled in time, negative error code on failure.
> > */
> > int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> > {
> > - int count = 0, ret;
> > + /*
> > + * There doesn't seem to be any recommended retry count or timeout in
> > + * the MST specification. Since some hubs have been observed to take
> > + * over 1 second to update their payload allocations under certain
> > + * conditions, we use a rather large timeout value.
> > + */
> > + const int timeout_ms = 3000;
> > + unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
> > + int ret;
> > + bool retrying = false;
> > u8 status;
> >
> > do {
> > + if (retrying)
> > + usleep_range(100, 1000);
> > +
> > ret = drm_dp_dpcd_readb(mgr->aux,
> > DP_PAYLOAD_TABLE_UPDATE_STATUS,
> > &status);
> > @@ -4488,13 +4501,12 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> >
> > if (status & DP_PAYLOAD_ACT_HANDLED)
> > break;
> > - count++;
> > - udelay(100);
> > - } while (count < 30);
> > + retrying = true;
> > + } while (jiffies < timeout);
>
> Somewhat academic, but I think there's an overflow possibility here if
> timeout is near ulong_max and jiffies overflows during the usleep. In
> that case we'll be retrying for a very loong time.
>
> I wish we had i915's wait_for() macro available to all drm...

Maybe we could add it to the kernel library somewhere? I don't see why we'd
need to stop at DRM

>
> Sean
>
> > if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
> > - DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n",
> > - status, count);
> > + DRM_DEBUG_KMS("failed to get ACT bit %d after %dms\n",
> > + status, timeout_ms);
> > return -EINVAL;
> > }
> > return 0;
> > --
> > 2.25.1
> >
--
Cheers,
Lyude Paul (she/her)
Associate Software Engineer at Red Hat
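As a starting point for such a shared helper, a hypothetical poll-until-timeout function might look like the sketch below. It is loosely modelled on what we understand i915's wait_for() to do (sample the deadline before checking the condition, and back off exponentially between polls), but it is not a port of that macro and not an existing kernel API. The clock, sleep and condition are injected through the made-up `poll_ops` struct so that it runs in user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Injected clock, sleep and condition so the helper runs outside the kernel. */
struct poll_ops {
	unsigned long (*now_us)(void *ctx);
	void (*sleep_us)(void *ctx, unsigned long us);
	bool (*cond)(void *ctx);
};

/*
 * Hypothetical generic helper: poll cond() until it holds or timeout_us
 * passes. Expiry is sampled *before* cond() is evaluated, so a condition
 * that became true during the final sleep still counts as success. The
 * sleep doubles from min_sleep_us up to max_sleep_us between polls.
 */
static int poll_until(const struct poll_ops *ops, void *ctx,
		      unsigned long timeout_us,
		      unsigned long min_sleep_us, unsigned long max_sleep_us)
{
	unsigned long end = ops->now_us(ctx) + timeout_us;
	unsigned long wait = min_sleep_us;

	assert(min_sleep_us > 0 && min_sleep_us <= max_sleep_us);
	for (;;) {
		bool expired = (long)(ops->now_us(ctx) - end) >= 0;

		if (ops->cond(ctx))
			return 0;
		if (expired)
			return -ETIMEDOUT;
		ops->sleep_us(ctx, wait);
		wait = (wait * 2 < max_sleep_us) ? wait * 2 : max_sleep_us;
	}
}

/* Example client: a fake hub whose ACT bit appears at a fixed time. */
struct fake_act {
	unsigned long now;
	unsigned long act_at;
	int polls;
};

static unsigned long fake_now(void *ctx)
{
	return ((struct fake_act *)ctx)->now;
}

static void fake_sleep(void *ctx, unsigned long us)
{
	((struct fake_act *)ctx)->now += us;
}

static bool fake_act_handled(void *ctx)
{
	struct fake_act *f = ctx;

	f->polls++;
	return f->now >= f->act_at;
}

static const struct poll_ops fake_ops = {
	.now_us = fake_now,
	.sleep_us = fake_sleep,
	.cond = fake_act_handled,
};
```

Sampling `expired` before calling `cond()` means one last check always happens after the final sleep: with a 5000µs timeout and a hub that raises ACT at 5200µs, `poll_until(&fake_ops, &hub, 5000, 100, 1000)` still returns 0 instead of -ETIMEDOUT.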

2020-04-06 19:44:31

by Sean Paul

Subject: Re: [PATCH 4/4] drm/dp_mst: Print errors on ACT timeouts

On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <[email protected]> wrote:
>
> Although it's not unexpected for drm_dp_check_act_status() to fail due
> to DPCD read failures (as the hub may have just been unplugged
> suddenly), timeouts are a bit more worrying as they either mean we need
> a longer timeout value, or we aren't setting up payload allocations
> properly. So, let's start printing errors on timeouts.
>
> Signed-off-by: Lyude Paul <[email protected]>
> Cc: Sean Paul <[email protected]>

Reviewed-by: Sean Paul <[email protected]>

> ---
> drivers/gpu/drm/drm_dp_mst_topology.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index f313407374ed..3d0d373f6f91 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -4494,6 +4494,10 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> DP_PAYLOAD_TABLE_UPDATE_STATUS,
> &status);
> if (ret < 0) {
> + /*
> + * Failure here isn't unexpected - the hub may have
> + * just been unplugged
> + */
> DRM_DEBUG_KMS("failed to read payload table status %d\n",
> ret);
> return ret;
> @@ -4505,8 +4509,8 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> } while (jiffies < timeout);
>
> if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
> - DRM_DEBUG_KMS("failed to get ACT bit %d after %dms\n",
> - status, timeout_ms);
> + DRM_ERROR("Failed to get ACT after %dms, last status: %02x\n",
> + timeout_ms, status);
> return -EINVAL;
> }
> return 0;
> --
> 2.25.1
>

2020-04-06 19:51:08

by Sean Paul

Subject: Re: [PATCH 3/4] drm/dp_mst: Increase ACT retry timeout to 3s

On Mon, Apr 6, 2020 at 3:43 PM Lyude Paul <[email protected]> wrote:
>
> On Mon, 2020-04-06 at 15:41 -0400, Sean Paul wrote:
> > On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <[email protected]> wrote:
> > > Currently we only poll for an ACT up to 30 times, with a busy-wait delay
> > > of 100µs between each attempt, giving us a timeout of 2900µs. While
> > > this might seem sensible, it would appear that in certain scenarios it
> > > can take dramatically longer than that for us to receive an ACT. On one
> > > of the EVGA MST hubs that I have available, I observed said hub
> > > sometimes taking longer than a second before signalling the ACT. These
> > > delays mostly seem to occur when previous sideband messages we've sent
> > > are NAKed by the hub. However, it wouldn't be particularly surprising if
> > > it's possible to reproduce times like this simply by introducing branch
> > > devices with large LCTs, since payload allocations have to take effect on
> > > every downstream device up to the payload's target.
> > >
> > > So, instead of just retrying 30 times, we poll for the ACT for up to 3s,
> > > and additionally use usleep_range() to avoid a very long and rude
> > > busy-wait. Note that the previous retry count of 30 appears to have been
> > > arbitrarily chosen, as I can't find any mention of a recommended timeout
> > > or retry count for ACTs in the DisplayPort 2.0 specification. The same
> > > goes for the 100µs delay we were previously passing to udelay(), although
> > > I suspect that was just copied from the recommended delay for link
> > > training on SST devices.
> > >
> > > Signed-off-by: Lyude Paul <[email protected]>
> > > Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)")
> > > Cc: Sean Paul <[email protected]>
> > > Cc: <[email protected]> # v3.17+
> > > ---
> > > drivers/gpu/drm/drm_dp_mst_topology.c | 26 +++++++++++++++++++-------
> > > 1 file changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > index 7aaf184a2e5f..f313407374ed 100644
> > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > @@ -4466,17 +4466,30 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
> > > * @mgr: manager to use
> > > *
> > > * Tries waiting for the MST hub to finish updating it's payload table by
> > > - * polling for the ACT handled bit.
> > > + * polling for the ACT handled bit for up to 3 seconds (yes-some hubs really
> > > + * take that long).
> > > *
> > > * Returns:
> > > * 0 if the ACT was handled in time, negative error code on failure.
> > > */
> > > int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> > > {
> > > - int count = 0, ret;
> > > + /*
> > > + * There doesn't seem to be any recommended retry count or timeout in
> > > + * the MST specification. Since some hubs have been observed to take
> > > + * over 1 second to update their payload allocations under certain
> > > + * conditions, we use a rather large timeout value.
> > > + */
> > > + const int timeout_ms = 3000;
> > > + unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
> > > + int ret;
> > > + bool retrying = false;
> > > u8 status;
> > >
> > > do {
> > > + if (retrying)
> > > + usleep_range(100, 1000);
> > > +
> > > ret = drm_dp_dpcd_readb(mgr->aux,
> > > DP_PAYLOAD_TABLE_UPDATE_STATUS,
> > > &status);
> > > @@ -4488,13 +4501,12 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> > >
> > > if (status & DP_PAYLOAD_ACT_HANDLED)
> > > break;
> > > - count++;
> > > - udelay(100);
> > > - } while (count < 30);
> > > + retrying = true;
> > > + } while (jiffies < timeout);
> >
> > Somewhat academic, but I think there's an overflow possibility here if
> > timeout is near ulong_max and jiffies overflows during the usleep. In
> > that case we'll be retrying for a very loong time.
> >
> > I wish we had i915's wait_for() macro available to all drm...
>
> Maybe we could add it to the kernel library somewhere? I don't see why we'd
> need to stop at DRM

So You Want To Build A Bikeshed...

Seriously though, I'd be very happy with that. Alternatively you could
shoehorn this into readx_poll_timeout as well.

Sean

>
> >
> > Sean
> >
> > > if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
> > > - DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n",
> > > - status, count);
> > > + DRM_DEBUG_KMS("failed to get ACT bit %d after %dms\n",
> > > + status, timeout_ms);
> > > return -EINVAL;
> > > }
> > > return 0;
> > > --
> > > 2.25.1
> > >
> --
> Cheers,
> Lyude Paul (she/her)
> Associate Software Engineer at Red Hat
>
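For the readx_poll_timeout() route: `readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)` from `<linux/iopoll.h>` repeatedly evaluates `val = op(addr)` until `cond` holds, sleeping between polls, and yields 0 or -ETIMEDOUT. Since drm_dp_dpcd_readb() reports failures through its return value rather than through the byte it reads, one possible shoehorn is a one-argument wrapper that returns either a negative errno or the status byte, with a condition that also stops on errors. The sketch below models that loop in user space; `get_act_status()`, `poll_act()`, `check_act()` and the simulated hub are hypothetical, and the final read after the deadline reflects our understanding of the macro's behaviour rather than its exact code.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror DP_PAYLOAD_ACT_HANDLED. */
#define ACT_HANDLED 0x02

/* Hypothetical simulated hub and clock. */
struct sim_hub {
	unsigned long now_us;
	unsigned long act_at_us;
	int fail_from; /* read number from which AUX reads fail; 0 = never */
	int reads;
};

/*
 * One-argument "op" in the shape readx_poll_timeout() expects: returns a
 * negative errno if the read fails, otherwise the status byte (>= 0).
 */
static int get_act_status(struct sim_hub *hub)
{
	hub->reads++;
	if (hub->fail_from && hub->reads >= hub->fail_from)
		return -EIO;
	return hub->now_us >= hub->act_at_us ? ACT_HANDLED : 0;
}

/* Stop on success *or* on a read error, so AUX failures are not retried. */
static int act_done(int val)
{
	return val < 0 || (val & ACT_HANDLED);
}

/*
 * User-space model of the readx_poll_timeout() loop: poll, sleep, and
 * after the deadline do one final read before giving up with -ETIMEDOUT.
 */
static int poll_act(struct sim_hub *hub, int *val,
		    unsigned long sleep_us, unsigned long timeout_us)
{
	unsigned long deadline = hub->now_us + timeout_us;

	for (;;) {
		*val = get_act_status(hub);
		if (act_done(*val))
			break;
		if ((long)(hub->now_us - deadline) > 0) {
			*val = get_act_status(hub);
			break;
		}
		hub->now_us += sleep_us; /* stands in for usleep_range() */
	}
	return act_done(*val) ? 0 : -ETIMEDOUT;
}

/* Map the poll result back to the error codes used in this series. */
static int check_act(struct sim_hub *hub)
{
	int val;
	int ret = poll_act(hub, &val, 200, 3000UL * 1000);

	assert(ret == 0 || ret == -ETIMEDOUT);
	if (val < 0)
		return val;     /* AUX failure: hub may have been unplugged */
	if (ret < 0)
		return -EINVAL; /* genuine ACT timeout */
	return 0;
}
```

Folding the error into `val` keeps the two failure modes distinct: a hub that starts failing AUX reads makes `check_act()` return that error (-EIO here) right away instead of spinning for 3s, while a hub that simply never sets the bit still ends in -EINVAL.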

2020-04-06 22:12:54

by Lyude Paul

Subject: Re: [PATCH 2/4] drm/dp_mst: Reformat drm_dp_check_act_status() a bit

On Mon, 2020-04-06 at 15:23 -0400, Sean Paul wrote:
> On Fri, Apr 3, 2020 at 4:08 PM Lyude Paul <[email protected]> wrote:
> > Just add a bit more line wrapping, get rid of some extraneous
> > whitespace, remove an unneeded goto label, and move around some variable
> > declarations. No functional changes here.
> >
> > Signed-off-by: Lyude Paul <[email protected]>
> > [this isn't a fix, but it's needed for the fix that comes after this]
> > Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)")
> > Cc: Sean Paul <[email protected]>
> > Cc: <[email protected]> # v3.17+
> > ---
> > drivers/gpu/drm/drm_dp_mst_topology.c | 22 ++++++++++------------
> > 1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 2b9ce965f044..7aaf184a2e5f 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -4473,33 +4473,31 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
> > */
> > int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr)
> > {
> > + int count = 0, ret;
> > u8 status;
> > - int ret;
> > - int count = 0;
> >
> > do {
> > - ret = drm_dp_dpcd_readb(mgr->aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status);
> > -
> > + ret = drm_dp_dpcd_readb(mgr->aux,
> > + DP_PAYLOAD_TABLE_UPDATE_STATUS,
> > + &status);
> > if (ret < 0) {
> > - DRM_DEBUG_KMS("failed to read payload table status %d\n", ret);
> > - goto fail;
> > + DRM_DEBUG_KMS("failed to read payload table status %d\n",
> > + ret);
> > + return ret;
> > }
> >
> > if (status & DP_PAYLOAD_ACT_HANDLED)
> > break;
> > count++;
> > udelay(100);
> > -
> > } while (count < 30);
> >
> > if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
> > - DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n", status, count);
> > - ret = -EINVAL;
> > - goto fail;
> > + DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n",
>
> Should we print status in base16 here?

jfyi - I realized we don't actually need to do this, because the next patch
already does it, whoops. Just figured I'd point that out

>
> Otherwise:
>
> Reviewed-by: Sean Paul <[email protected]>
>
> > + status, count);
> > + return -EINVAL;
> > }
> > return 0;
> > -fail:
> > - return ret;
> > }
> > EXPORT_SYMBOL(drm_dp_check_act_status);
> >
> > --
> > 2.25.1
> >
--
Cheers,
Lyude Paul (she/her)
Associate Software Engineer at Red Hat