2022-06-08 06:06:44

by Lyude Paul

[permalink] [raw]
Subject: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing payloads if last connected port isn't connected

In the past, we've ran into strange issues regarding errors in response to
trying to destroy payloads after a port has been unplugged. We fixed this
back in:

This is intended to replace the workaround that was added here:

commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by ports in stale topology")

which was intended fix to some of the payload leaks that were observed
before, where we would attempt to determine if the port was still connected
to the topology before updating payloads using
drm_dp_mst_port_downstream_of_branch. This wasn't a particularly good
solution, since one of the points of still having port and mstb validation
is to avoid sending messages to newly disconnected branches wherever
possible - thus the required use of drm_dp_mst_port_downstream_of_branch
would indicate something may be wrong with said validation.

It seems like it may have just been races and luck that made
drm_dp_mst_port_downstream_of_branch work however, as while I was trying to
figure out the true cause of this issue when removing the legacy MST code -
I discovered an important excerpt in section 2.14.2.3.3.6 of the DP 2.0
specs:

"BAD_PARAM - This reply is transmitted when a Message Transaction parameter
is in error; for example, the next port number is invalid or /no device is
connected/ to the port associated with the port number."

Sure enough - removing the calls to drm_dp_mst_port_downstream_of_branch()
and instead checking the ->ddps field of the parent port to see whether we
should release a given payload or not seems to totally fix the issue. This
does actually make sense to me, as it seems the implication is that given a
topology where an MSTB is removed, the payload for the MST parent's port
will be released automatically if that port is also marked as disconnected.
However, if there's another parent in the chain after that which is
connected - payloads must be released there with an ALLOCATE_PAYLOAD
message.

So, let's do that!

Signed-off-by: Lyude Paul <[email protected]>
Cc: Wayne Lin <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Fangzhi Zuo <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Imre Deak <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sean Paul <[email protected]>
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 51 +++++++------------
1 file changed, 17 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index dd314586bac3..70adb8db4335 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -3137,7 +3137,7 @@ static struct drm_dp_mst_port *drm_dp_get_last_connected_port_to_mstb(struct drm
static struct drm_dp_mst_branch *
drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr *mgr,
struct drm_dp_mst_branch *mstb,
- int *port_num)
+ struct drm_dp_mst_port **last_port)
{
struct drm_dp_mst_branch *rmstb = NULL;
struct drm_dp_mst_port *found_port;
@@ -3153,7 +3153,8 @@ drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr *mgr,

if (drm_dp_mst_topology_try_get_mstb(found_port->parent)) {
rmstb = found_port->parent;
- *port_num = found_port->port_num;
+ *last_port = found_port;
+ drm_dp_mst_get_port_malloc(found_port);
} else {
/* Search again, starting from this parent */
mstb = found_port->parent;
@@ -3170,7 +3171,7 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
int pbn)
{
struct drm_dp_sideband_msg_tx *txmsg;
- struct drm_dp_mst_branch *mstb;
+ struct drm_dp_mst_branch *mstb = NULL;
int ret, port_num;
u8 sinks[DRM_DP_MAX_SDP_STREAMS];
int i;
@@ -3178,12 +3179,22 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
port_num = port->port_num;
mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
if (!mstb) {
- mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
- port->parent,
- &port_num);
+ struct drm_dp_mst_port *rport = NULL;
+ bool ddps;

+ mstb = drm_dp_get_last_connected_port_and_mstb(mgr, port->parent, &rport);
if (!mstb)
return -EINVAL;
+
+ ddps = rport->ddps;
+ port_num = rport->port_num;
+ drm_dp_mst_put_port_malloc(rport);
+
+ /* If the port is currently marked as disconnected, don't send a payload message */
+ if (!ddps) {
+ ret = -EINVAL;
+ goto fail_put;
+ }
}

txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
@@ -3384,7 +3395,6 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr, int start_s
struct drm_dp_mst_port *port;
int i, j;
int cur_slots = start_slot;
- bool skip;

mutex_lock(&mgr->payload_lock);
for (i = 0; i < mgr->max_payloads; i++) {
@@ -3399,16 +3409,6 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr, int start_s
port = container_of(vcpi, struct drm_dp_mst_port,
vcpi);

- mutex_lock(&mgr->lock);
- skip = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
- mutex_unlock(&mgr->lock);
-
- if (skip) {
- drm_dbg_kms(mgr->dev,
- "Virtual channel %d is not in current topology\n",
- i);
- continue;
- }
/* Validated ports don't matter if we're releasing
* VCPI
*/
@@ -3509,7 +3509,6 @@ int drm_dp_update_payload_part2(struct drm_dp_mst_topology_mgr *mgr)
struct drm_dp_mst_port *port;
int i;
int ret = 0;
- bool skip;

mutex_lock(&mgr->payload_lock);
for (i = 0; i < mgr->max_payloads; i++) {
@@ -3519,13 +3518,6 @@ int drm_dp_update_payload_part2(struct drm_dp_mst_topology_mgr *mgr)

port = container_of(mgr->proposed_vcpis[i], struct drm_dp_mst_port, vcpi);

- mutex_lock(&mgr->lock);
- skip = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
- mutex_unlock(&mgr->lock);
-
- if (skip)
- continue;
-
drm_dbg_kms(mgr->dev, "payload %d %d\n", i, mgr->payloads[i].payload_state);
if (mgr->payloads[i].payload_state == DP_PAYLOAD_LOCAL) {
ret = drm_dp_create_payload_step2(mgr, port, mgr->proposed_vcpis[i]->vcpi, &mgr->payloads[i]);
@@ -4780,18 +4772,9 @@ EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
struct drm_dp_mst_port *port)
{
- bool skip;
-
if (!port->vcpi.vcpi)
return;

- mutex_lock(&mgr->lock);
- skip = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
- mutex_unlock(&mgr->lock);
-
- if (skip)
- return;
-
drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
port->vcpi.num_slots = 0;
port->vcpi.pbn = 0;
--
2.35.3


2022-07-05 09:06:11

by Lin, Wayne

[permalink] [raw]
Subject: RE: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing payloads if last connected port isn't connected

[Public]



> -----Original Message-----
> From: Lyude Paul <[email protected]>
> Sent: Wednesday, June 8, 2022 3:30 AM
> To: [email protected]; [email protected]; amd-
> [email protected]
> Cc: Lin, Wayne <[email protected]>; Ville Syrjälä
> <[email protected]>; Zuo, Jerry <[email protected]>; Jani Nikula
> <[email protected]>; Imre Deak <[email protected]>; Daniel Vetter
> <[email protected]>; Sean Paul <[email protected]>; David Airlie
> <[email protected]>; Daniel Vetter <[email protected]>; Thomas Zimmermann
> <[email protected]>; Lakha, Bhawanpreet
> <[email protected]>; open list <[email protected]>
> Subject: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing payloads if
> last connected port isn't connected
>
> In the past, we've ran into strange issues regarding errors in response to
> trying to destroy payloads after a port has been unplugged. We fixed this
> back in:
>
> This is intended to replace the workaround that was added here:
>
> commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by
> ports in stale topology")
>
> which was intended fix to some of the payload leaks that were observed
> before, where we would attempt to determine if the port was still
> connected to the topology before updating payloads using
> drm_dp_mst_port_downstream_of_branch. This wasn't a particularly good
> solution, since one of the points of still having port and mstb validation is to
> avoid sending messages to newly disconnected branches wherever possible
> - thus the required use of drm_dp_mst_port_downstream_of_branch
> would indicate something may be wrong with said validation.
>
> It seems like it may have just been races and luck that made
> drm_dp_mst_port_downstream_of_branch work however, as while I was
> trying to figure out the true cause of this issue when removing the legacy
> MST code - I discovered an important excerpt in section 2.14.2.3.3.6 of the DP
> 2.0
> specs:
>
> "BAD_PARAM - This reply is transmitted when a Message Transaction
> parameter is in error; for example, the next port number is invalid or /no
> device is connected/ to the port associated with the port number."
>
> Sure enough - removing the calls to
> drm_dp_mst_port_downstream_of_branch()
> and instead checking the ->ddps field of the parent port to see whether we
> should release a given payload or not seems to totally fix the issue. This does
> actually make sense to me, as it seems the implication is that given a
> topology where an MSTB is removed, the payload for the MST parent's port
> will be released automatically if that port is also marked as disconnected.
> However, if there's another parent in the chain after that which is connected
> - payloads must be released there with an ALLOCATE_PAYLOAD message.
>
> So, let's do that!
>
> Signed-off-by: Lyude Paul <[email protected]>
> Cc: Wayne Lin <[email protected]>
> Cc: Ville Syrjälä <[email protected]>
> Cc: Fangzhi Zuo <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Imre Deak <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Sean Paul <[email protected]>
> ---
> drivers/gpu/drm/display/drm_dp_mst_topology.c | 51 +++++++------------
> 1 file changed, 17 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> index dd314586bac3..70adb8db4335 100644
> --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> @@ -3137,7 +3137,7 @@ static struct drm_dp_mst_port
> *drm_dp_get_last_connected_port_to_mstb(struct drm static struct
> drm_dp_mst_branch * drm_dp_get_last_connected_port_and_mstb(struct
> drm_dp_mst_topology_mgr *mgr,
> struct drm_dp_mst_branch *mstb,
> - int *port_num)
> + struct drm_dp_mst_port **last_port)
> {
> struct drm_dp_mst_branch *rmstb = NULL;
> struct drm_dp_mst_port *found_port;
> @@ -3153,7 +3153,8 @@
> drm_dp_get_last_connected_port_and_mstb(struct
> drm_dp_mst_topology_mgr *mgr,
>
> if (drm_dp_mst_topology_try_get_mstb(found_port-
> >parent)) {
> rmstb = found_port->parent;
> - *port_num = found_port->port_num;
> + *last_port = found_port;
> + drm_dp_mst_get_port_malloc(found_port);
> } else {
> /* Search again, starting from this parent */
> mstb = found_port->parent;
> @@ -3170,7 +3171,7 @@ static int drm_dp_payload_send_msg(struct
> drm_dp_mst_topology_mgr *mgr,
> int pbn)
> {
> struct drm_dp_sideband_msg_tx *txmsg;
> - struct drm_dp_mst_branch *mstb;
> + struct drm_dp_mst_branch *mstb = NULL;
> int ret, port_num;
> u8 sinks[DRM_DP_MAX_SDP_STREAMS];
> int i;
> @@ -3178,12 +3179,22 @@ static int drm_dp_payload_send_msg(struct
> drm_dp_mst_topology_mgr *mgr,
> port_num = port->port_num;
> mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port-
> >parent);
> if (!mstb) {
> - mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> - port->parent,
> - &port_num);
> + struct drm_dp_mst_port *rport = NULL;
> + bool ddps;
>
> + mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> port->parent,
> +&rport);
> if (!mstb)
> return -EINVAL;
> +
> + ddps = rport->ddps;
> + port_num = rport->port_num;
> + drm_dp_mst_put_port_malloc(rport);
> +
> + /* If the port is currently marked as disconnected, don't send
> a payload message */
> + if (!ddps) {
Hi Lyude,

Thanks for driving this!
Shouldn't we still send ALLOCATE_PAYLOAD with PBN 0 to the last connected
Port even its peer device is disconnected? We rely on this "path msg" to update
all payload ID tables along the virtual payload channel.

commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by
ports in stale topology") was trying to skip updating payload for a target which is
no longer existing in the current topology rooted at mgr->mst_primary. I passed
"mgr->mst_primary" to drm_dp_mst_port_downstream_of_branch() previously.
Sorry, I might not fully understand the issue you've seen. Could you elaborate on
this more please?

Thanks!
> + ret = -EINVAL;
> + goto fail_put;
> + }
> }
>
> txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); @@ -3384,7 +3395,6
> @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr
> *mgr, int start_s
> struct drm_dp_mst_port *port;
> int i, j;
> int cur_slots = start_slot;
> - bool skip;
>
> mutex_lock(&mgr->payload_lock);
> for (i = 0; i < mgr->max_payloads; i++) { @@ -3399,16 +3409,6 @@ int
> drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr,
> int start_s
> port = container_of(vcpi, struct drm_dp_mst_port,
> vcpi);
>
> - mutex_lock(&mgr->lock);
> - skip
> = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
> - mutex_unlock(&mgr->lock);
> -
> - if (skip) {
> - drm_dbg_kms(mgr->dev,
> - "Virtual channel %d is not in current
> topology\n",
> - i);
> - continue;
> - }
> /* Validated ports don't matter if we're releasing
> * VCPI
> */
> @@ -3509,7 +3509,6 @@ int drm_dp_update_payload_part2(struct
> drm_dp_mst_topology_mgr *mgr)
> struct drm_dp_mst_port *port;
> int i;
> int ret = 0;
> - bool skip;
>
> mutex_lock(&mgr->payload_lock);
> for (i = 0; i < mgr->max_payloads; i++) { @@ -3519,13 +3518,6 @@ int
> drm_dp_update_payload_part2(struct drm_dp_mst_topology_mgr *mgr)
>
> port = container_of(mgr->proposed_vcpis[i], struct
> drm_dp_mst_port, vcpi);
>
> - mutex_lock(&mgr->lock);
> - skip = !drm_dp_mst_port_downstream_of_branch(port,
> mgr->mst_primary);
> - mutex_unlock(&mgr->lock);
> -
> - if (skip)
> - continue;
> -
> drm_dbg_kms(mgr->dev, "payload %d %d\n", i, mgr-
> >payloads[i].payload_state);
> if (mgr->payloads[i].payload_state == DP_PAYLOAD_LOCAL)
> {
> ret = drm_dp_create_payload_step2(mgr, port, mgr-
> >proposed_vcpis[i]->vcpi, &mgr->payloads[i]); @@ -4780,18 +4772,9 @@
> EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> struct drm_dp_mst_port *port)
> {
> - bool skip;
> -
> if (!port->vcpi.vcpi)
> return;
>
> - mutex_lock(&mgr->lock);
> - skip = !drm_dp_mst_port_downstream_of_branch(port, mgr-
> >mst_primary);
> - mutex_unlock(&mgr->lock);
> -
> - if (skip)
> - return;
> -
> drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
> port->vcpi.num_slots = 0;
> port->vcpi.pbn = 0;
> --
> 2.35.3
--
Wayne Lin

2022-08-02 22:15:37

by Lyude Paul

[permalink] [raw]
Subject: Re: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing payloads if last connected port isn't connected

On Tue, 2022-07-05 at 08:45 +0000, Lin, Wayne wrote:
> [Public]
>
>
>
> > -----Original Message-----
> > From: Lyude Paul <[email protected]>
> > Sent: Wednesday, June 8, 2022 3:30 AM
> > To: [email protected]; [email protected]; amd-
> > [email protected]
> > Cc: Lin, Wayne <[email protected]>; Ville Syrjälä
> > <[email protected]>; Zuo, Jerry <[email protected]>; Jani
> > Nikula
> > <[email protected]>; Imre Deak <[email protected]>; Daniel Vetter
> > <[email protected]>; Sean Paul <[email protected]>; David Airlie
> > <[email protected]>; Daniel Vetter <[email protected]>; Thomas Zimmermann
> > <[email protected]>; Lakha, Bhawanpreet
> > <[email protected]>; open list <[email protected]>
> > Subject: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing payloads if
> > last connected port isn't connected
> >
> > In the past, we've ran into strange issues regarding errors in response to
> > trying to destroy payloads after a port has been unplugged. We fixed this
> > back in:
> >
> > This is intended to replace the workaround that was added here:
> >
> > commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by
> > ports in stale topology")
> >
> > which was intended fix to some of the payload leaks that were observed
> > before, where we would attempt to determine if the port was still
> > connected to the topology before updating payloads using
> > drm_dp_mst_port_downstream_of_branch. This wasn't a particularly good
> > solution, since one of the points of still having port and mstb validation
> > is to
> > avoid sending messages to newly disconnected branches wherever possible
> > - thus the required use of drm_dp_mst_port_downstream_of_branch
> > would indicate something may be wrong with said validation.
> >
> > It seems like it may have just been races and luck that made
> > drm_dp_mst_port_downstream_of_branch work however, as while I was
> > trying to figure out the true cause of this issue when removing the legacy
> > MST code - I discovered an important excerpt in section 2.14.2.3.3.6 of
> > the DP
> > 2.0
> > specs:
> >
> > "BAD_PARAM - This reply is transmitted when a Message Transaction
> > parameter is in error; for example, the next port number is invalid or /no
> > device is connected/ to the port associated with the port number."
> >
> > Sure enough - removing the calls to
> > drm_dp_mst_port_downstream_of_branch()
> > and instead checking the ->ddps field of the parent port to see whether we
> > should release a given payload or not seems to totally fix the issue. This
> > does
> > actually make sense to me, as it seems the implication is that given a
> > topology where an MSTB is removed, the payload for the MST parent's port
> > will be released automatically if that port is also marked as
> > disconnected.
> > However, if there's another parent in the chain after that which is
> > connected
> > - payloads must be released there with an ALLOCATE_PAYLOAD message.
> >
> > So, let's do that!
> >
> > Signed-off-by: Lyude Paul <[email protected]>
> > Cc: Wayne Lin <[email protected]>
> > Cc: Ville Syrjälä <[email protected]>
> > Cc: Fangzhi Zuo <[email protected]>
> > Cc: Jani Nikula <[email protected]>
> > Cc: Imre Deak <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: Sean Paul <[email protected]>
> > ---
> >  drivers/gpu/drm/display/drm_dp_mst_topology.c | 51 +++++++------------
> >  1 file changed, 17 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > index dd314586bac3..70adb8db4335 100644
> > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > @@ -3137,7 +3137,7 @@ static struct drm_dp_mst_port
> > *drm_dp_get_last_connected_port_to_mstb(struct drm  static struct
> > drm_dp_mst_branch *  drm_dp_get_last_connected_port_and_mstb(struct
> > drm_dp_mst_topology_mgr *mgr,
> >                                         struct drm_dp_mst_branch *mstb,
> > -                                       int *port_num)
> > +                                       struct drm_dp_mst_port
> > **last_port)
> >  {
> >         struct drm_dp_mst_branch *rmstb = NULL;
> >         struct drm_dp_mst_port *found_port;
> > @@ -3153,7 +3153,8 @@
> > drm_dp_get_last_connected_port_and_mstb(struct
> > drm_dp_mst_topology_mgr *mgr,
> >
> >                 if (drm_dp_mst_topology_try_get_mstb(found_port-
> > > parent)) {
> >                         rmstb = found_port->parent;
> > -                       *port_num = found_port->port_num;
> > +                       *last_port = found_port;
> > +                       drm_dp_mst_get_port_malloc(found_port);
> >                 } else {
> >                         /* Search again, starting from this parent */
> >                         mstb = found_port->parent;
> > @@ -3170,7 +3171,7 @@ static int drm_dp_payload_send_msg(struct
> > drm_dp_mst_topology_mgr *mgr,
> >                                    int pbn)
> >  {
> >         struct drm_dp_sideband_msg_tx *txmsg;
> > -       struct drm_dp_mst_branch *mstb;
> > +       struct drm_dp_mst_branch *mstb = NULL;
> >         int ret, port_num;
> >         u8 sinks[DRM_DP_MAX_SDP_STREAMS];
> >         int i;
> > @@ -3178,12 +3179,22 @@ static int drm_dp_payload_send_msg(struct
> > drm_dp_mst_topology_mgr *mgr,
> >         port_num = port->port_num;
> >         mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port-
> > > parent);
> >         if (!mstb) {
> > -               mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > -                                                              port-
> > >parent,
> > -                                                              &port_num);
> > +               struct drm_dp_mst_port *rport = NULL;
> > +               bool ddps;
> >
> > +               mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > port->parent,
> > +&rport);
> >                 if (!mstb)
> >                         return -EINVAL;
> > +
> > +               ddps = rport->ddps;
> > +               port_num = rport->port_num;
> > +               drm_dp_mst_put_port_malloc(rport);
> > +
> > +               /* If the port is currently marked as disconnected, don't
> > send
> > a payload message */
> > +               if (!ddps) {
> Hi Lyude,
>
> Thanks for driving this!
> Shouldn't we still send ALLOCATE_PAYLOAD with PBN 0 to the last connected
> Port even its peer device is disconnected? We rely on this "path msg" to
> update
> all payload ID tables along the virtual payload channel.
>

Do you know if there's any devices that break with this change, btw? Would be
super useful to know imho, and if so I might be alright with dropping it
depending on what the answer to the next paragraph is.

> commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by
> ports in stale topology") was trying to skip updating payload for a target
> which is
> no longer existing in the current topology rooted at mgr->mst_primary. I
> passed
> "mgr->mst_primary" to drm_dp_mst_port_downstream_of_branch() previously.
> Sorry, I might not fully understand the issue you've seen. Could you
> elaborate on
> this more please?
>
> Thanks!

I will have to double check this since it's been a month, but basically - the
idea of having the topology references in the first place was to be the one
check for figuring out whether something's in a topology or not. I've been
thinking of maybe trying to replace it at some point, but I think we'd want to
do it all over the helpers instead of just in certain spots.

The other thing I noticed was that when I was rewriting this code, I noticed
it seemed a lot like we had misunderstood the issue that was causing leaks in
the first place. The BAD_PARAM we noticed indicates the payload we're trying
to remove on the other end doesn't exist anymore, meaning the branch device in
question got rid of any payloads it had active in response to the CSN. In
testing though I found that payloads would be automatically released in
situations where the last reachable port was marked as disconnected via a
previous CSN, but was still reachable otherwise, and not in any other
situation. This also seemed to match up with the excerpts in the DP spec that
I found, so I assumed it was probably correct.

Also, I think using the DDPS field instead of trying to traverse the topology
state (which might not have been fully updated yet in response to CSNs) might
be a slightly better idea since DDPS may end up being updated before the port
has been removed from our in-memory topology, which is kind of one of the
reasons I've been considering trying to come up with a better solution then
topology references someday (unfortunately it works 'good enough' for the most
part, so definitely not a priority). This is 100% a guess on my part though.

> > +                       ret = -EINVAL;
> > +                       goto fail_put;
> > +               }
> >         }
> >
> >         txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); @@ -3384,7 +3395,6
> > @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr
> > *mgr, int start_s
> >         struct drm_dp_mst_port *port;
> >         int i, j;
> >         int cur_slots = start_slot;
> > -       bool skip;
> >
> >         mutex_lock(&mgr->payload_lock);
> >         for (i = 0; i < mgr->max_payloads; i++) { @@ -3399,16 +3409,6 @@
> > int
> > drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr,
> > int start_s
> >                         port = container_of(vcpi, struct drm_dp_mst_port,
> >                                             vcpi);
> >
> > -                       mutex_lock(&mgr->lock);
> > -                       skip
> > = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
> > -                       mutex_unlock(&mgr->lock);
> > -
> > -                       if (skip) {
> > -                               drm_dbg_kms(mgr->dev,
> > -                                           "Virtual channel %d is not in
> > current
> > topology\n",
> > -                                           i);
> > -                               continue;
> > -                       }
> >                         /* Validated ports don't matter if we're releasing
> >                          * VCPI
> >                          */
> > @@ -3509,7 +3509,6 @@ int drm_dp_update_payload_part2(struct
> > drm_dp_mst_topology_mgr *mgr)
> >         struct drm_dp_mst_port *port;
> >         int i;
> >         int ret = 0;
> > -       bool skip;
> >
> >         mutex_lock(&mgr->payload_lock);
> >         for (i = 0; i < mgr->max_payloads; i++) { @@ -3519,13 +3518,6 @@
> > int
> > drm_dp_update_payload_part2(struct drm_dp_mst_topology_mgr *mgr)
> >
> >                 port = container_of(mgr->proposed_vcpis[i], struct
> > drm_dp_mst_port, vcpi);
> >
> > -               mutex_lock(&mgr->lock);
> > -               skip = !drm_dp_mst_port_downstream_of_branch(port,
> > mgr->mst_primary);
> > -               mutex_unlock(&mgr->lock);
> > -
> > -               if (skip)
> > -                       continue;
> > -
> >                 drm_dbg_kms(mgr->dev, "payload %d %d\n", i, mgr-
> > > payloads[i].payload_state);
> >                 if (mgr->payloads[i].payload_state == DP_PAYLOAD_LOCAL)
> > {
> >                         ret = drm_dp_create_payload_step2(mgr, port, mgr-
> > > proposed_vcpis[i]->vcpi, &mgr->payloads[i]); @@ -4780,18 +4772,9 @@
> > EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> >  void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr *mgr,
> >                                 struct drm_dp_mst_port *port)
> >  {
> > -       bool skip;
> > -
> >         if (!port->vcpi.vcpi)
> >                 return;
> >
> > -       mutex_lock(&mgr->lock);
> > -       skip = !drm_dp_mst_port_downstream_of_branch(port, mgr-
> > > mst_primary);
> > -       mutex_unlock(&mgr->lock);
> > -
> > -       if (skip)
> > -               return;
> > -
> >         drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
> >         port->vcpi.num_slots = 0;
> >         port->vcpi.pbn = 0;
> > --
> > 2.35.3
> --
> Wayne Lin

--
Cheers,
Lyude Paul (she/her)
Software Engineer at Red Hat


2022-08-10 03:36:27

by Lin, Wayne

[permalink] [raw]
Subject: RE: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing payloads if last connected port isn't connected

[Public]



> -----Original Message-----
> From: Lyude Paul <[email protected]>
> Sent: Wednesday, August 3, 2022 6:12 AM
> To: Lin, Wayne <[email protected]>; [email protected];
> [email protected]; [email protected]
> Cc: Ville Syrjälä <[email protected]>; Zuo, Jerry
> <[email protected]>; Jani Nikula <[email protected]>; Imre Deak
> <[email protected]>; Daniel Vetter <[email protected]>; Sean Paul
> <[email protected]>; David Airlie <[email protected]>; Daniel Vetter
> <[email protected]>; Thomas Zimmermann <[email protected]>; Lakha,
> Bhawanpreet <[email protected]>; open list <linux-
> [email protected]>
> Subject: Re: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing
> payloads if last connected port isn't connected
>
> On Tue, 2022-07-05 at 08:45 +0000, Lin, Wayne wrote:
> > [Public]
> >
> >
> >
> > > -----Original Message-----
> > > From: Lyude Paul <[email protected]>
> > > Sent: Wednesday, June 8, 2022 3:30 AM
> > > To: [email protected]; [email protected];
> > > amd- [email protected]
> > > Cc: Lin, Wayne <[email protected]>; Ville Syrjälä
> > > <[email protected]>; Zuo, Jerry <[email protected]>;
> > > Jani Nikula <[email protected]>; Imre Deak
> > > <[email protected]>; Daniel Vetter <[email protected]>; Sean
> > > Paul <[email protected]>; David Airlie <[email protected]>; Daniel
> > > Vetter <[email protected]>; Thomas Zimmermann
> <[email protected]>;
> > > Lakha, Bhawanpreet <[email protected]>; open list
> > > <[email protected]>
> > > Subject: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing
> > > payloads if last connected port isn't connected
> > >
> > > In the past, we've ran into strange issues regarding errors in
> > > response to trying to destroy payloads after a port has been
> > > unplugged. We fixed this back in:
> > >
> > > This is intended to replace the workaround that was added here:
> > >
> > > commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by
> > > ports in stale topology")
> > >
> > > which was intended fix to some of the payload leaks that were
> > > observed before, where we would attempt to determine if the port was
> > > still connected to the topology before updating payloads using
> > > drm_dp_mst_port_downstream_of_branch. This wasn't a particularly
> > > good solution, since one of the points of still having port and mstb
> > > validation is to avoid sending messages to newly disconnected
> > > branches wherever possible
> > > - thus the required use of drm_dp_mst_port_downstream_of_branch
> > > would indicate something may be wrong with said validation.
> > >
> > > It seems like it may have just been races and luck that made
> > > drm_dp_mst_port_downstream_of_branch work however, as while I
> was
> > > trying to figure out the true cause of this issue when removing the
> > > legacy MST code - I discovered an important excerpt in section
> > > 2.14.2.3.3.6 of the DP
> > > 2.0
> > > specs:
> > >
> > > "BAD_PARAM - This reply is transmitted when a Message Transaction
> > > parameter is in error; for example, the next port number is invalid
> > > or /no device is connected/ to the port associated with the port number."
> > >
> > > Sure enough - removing the calls to
> > > drm_dp_mst_port_downstream_of_branch()
> > > and instead checking the ->ddps field of the parent port to see
> > > whether we should release a given payload or not seems to totally
> > > fix the issue. This does actually make sense to me, as it seems the
> > > implication is that given a topology where an MSTB is removed, the
> > > payload for the MST parent's port will be released automatically if
> > > that port is also marked as disconnected.
> > > However, if there's another parent in the chain after that which is
> > > connected
> > > - payloads must be released there with an ALLOCATE_PAYLOAD message.
> > >
> > > So, let's do that!
> > >
> > > Signed-off-by: Lyude Paul <[email protected]>
> > > Cc: Wayne Lin <[email protected]>
> > > Cc: Ville Syrjälä <[email protected]>
> > > Cc: Fangzhi Zuo <[email protected]>
> > > Cc: Jani Nikula <[email protected]>
> > > Cc: Imre Deak <[email protected]>
> > > Cc: Daniel Vetter <[email protected]>
> > > Cc: Sean Paul <[email protected]>
> > > ---
> > >  drivers/gpu/drm/display/drm_dp_mst_topology.c | 51
> > > +++++++------------
> > >  1 file changed, 17 insertions(+), 34 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > index dd314586bac3..70adb8db4335 100644
> > > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > @@ -3137,7 +3137,7 @@ static struct drm_dp_mst_port
> > > *drm_dp_get_last_connected_port_to_mstb(struct drm  static struct
> > > drm_dp_mst_branch
> *  drm_dp_get_last_connected_port_and_mstb(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >                                         struct drm_dp_mst_branch
> > > *mstb,
> > > -                                       int *port_num)
> > > +                                       struct drm_dp_mst_port
> > > **last_port)
> > >  {
> > >         struct drm_dp_mst_branch *rmstb = NULL;
> > >         struct drm_dp_mst_port *found_port; @@ -3153,7 +3153,8 @@
> > > drm_dp_get_last_connected_port_and_mstb(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >
> > >                 if (drm_dp_mst_topology_try_get_mstb(found_port-
> > > > parent)) {
> > >                         rmstb = found_port->parent;
> > > -                       *port_num = found_port->port_num;
> > > +                       *last_port = found_port;
> > > +                       drm_dp_mst_get_port_malloc(found_port);
> > >                 } else {
> > >                         /* Search again, starting from this parent
> > > */
> > >                         mstb = found_port->parent; @@ -3170,7
> > > +3171,7 @@ static int drm_dp_payload_send_msg(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >                                    int pbn)
> > >  {
> > >         struct drm_dp_sideband_msg_tx *txmsg;
> > > -       struct drm_dp_mst_branch *mstb;
> > > +       struct drm_dp_mst_branch *mstb = NULL;
> > >         int ret, port_num;
> > >         u8 sinks[DRM_DP_MAX_SDP_STREAMS];
> > >         int i;
> > > @@ -3178,12 +3179,22 @@ static int drm_dp_payload_send_msg(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >         port_num = port->port_num;
> > >         mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port-
> > > > parent);
> > >         if (!mstb) {
> > > -               mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > > -                                                              port-
> > > >parent,
> > > -
> > > &port_num);
> > > +               struct drm_dp_mst_port *rport = NULL;
> > > +               bool ddps;
> > >
> > > +               mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > > port->parent,
> > > +&rport);
> > >                 if (!mstb)
> > >                         return -EINVAL;
> > > +
> > > +               ddps = rport->ddps;
> > > +               port_num = rport->port_num;
> > > +               drm_dp_mst_put_port_malloc(rport);
> > > +
> > > +               /* If the port is currently marked as disconnected,
> > > +don't
> > > send
> > > a payload message */
> > > +               if (!ddps) {
> > Hi Lyude,
> >
> > Thanks for driving this!
> > Shouldn't we still send ALLOCATE_PAYLOAD with PBN 0 to the last
> > connected Port even its peer device is disconnected? We rely on this
> > "path msg" to update all payload ID tables along the virtual payload
> > channel.
> >
>
> Do you know if there's any devices that break with this change, btw? Would
> be super useful to know imho, and if so I might be alright with dropping it
> depending on what the answer to the next paragraph is.
>
Hi Lyude,
Thanks for your time and sorry for late response!

It's described in 5.6.1.3 of DP spec 2.0:
"MST branch device, in addition to waiting for the ACK from its immediate
Upstream device, should either wait for the ALLOCATE_PAYLOAD message
transaction with a PBN value equal to 0 from the MST Source device for
de-allocating the time slot assigned to the VC Payload that is routed to the
unplugged DFP or for 2 seconds, whichever occurs first."

> > commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by
> > ports in stale topology") was trying to skip updating payload for a
> > target which is no longer existing in the current topology rooted at
> > mgr->mst_primary. I passed "mgr->mst_primary" to
> > drm_dp_mst_port_downstream_of_branch() previously.
> > Sorry, I might not fully understand the issue you've seen. Could you
> > elaborate on this more please?
> >
> > Thanks!
>
> I will have to double check this since it's been a month, but basically - the idea
> of having the topology references in the first place was to be the one check
> for figuring out whether something's in a topology or not. I've been thinking
> of maybe trying to replace it at some point, but I think we'd want to do it all
> over the helpers instead of just in certain spots.
>
> The other thing I noticed was that when I was rewriting this code, I noticed it
> seemed a lot like we had misunderstood the issue that was causing leaks in
> the first place. The BAD_PARAM we noticed indicates the payload we're
> trying to remove on the other end doesn't exist anymore, meaning the
> branch device in question got rid of any payloads it had active in response to
> the CSN. In testing though I found that payloads would be automatically
> released in situations where the last reachable port was marked as
> disconnected via a previous CSN, but was still reachable otherwise, and not in
> any other situation. This also seemed to match up with the excerpts in the DP
> spec that I found, so I assumed it was probably correct.

IMHO, the main root cause with the commit 3769e4c0af5b ("drm/dp_mst: Avoid
to mess up payload table by ports in stale topology") is like what described in the
commit message. The problem I encountered was when I unplugged the primary
mst branch device from the system, upper layer didn't try to release stale streams
immediately. Instead, it started to gradually release stale streams when I plugged the
mst hub back to the system. In that case, if we didn't do the check to see whether
the current request for deallocating payload is for this time topology instance,
i.e. might be for the stale topology before I unplug, this deallocation will mess up
payload allocation for new topology instance.

As for the CSN, it's a node broadcast request message and not a path message.
Referring to 2.14.6.1 of DP 2.0 spec:
"If the broadcast message is a node request, only the end devices, DP MST
Source or Sink devices (or DP MST Branch device if Source/Sink are not plugged),
process the request."
IMHO, payload should be controlled by source only, by ALLOCATE_PAYLOAD or
CLEAR_PAYLAOD_ID_TABLE message.

>
> Also, I think using the DDPS field instead of trying to traverse the topology
> state (which might not have been fully updated yet in response to CSNs)
> might be a slightly better idea since DDPS may end up being updated before
> the port has been removed from our in-memory topology, which is kind of

Thank you Lyude! Just want to confirm with you the below idea to see if I
understand it correctly.
The flow I thought would be (from Source perspective):
Receive CSN for notifying disconnection event => update physical topology
connection status (e.g. DDPS, put topology krefcount..) => send hotplug event to
userspace => userspace asks deallocating payloads for disconnected stream
sinks => put malloc krefcount of disconnected ports/mstbs => remove ports/mstb
from in-memory topology.
I suppose physical topology connection status is updated before sending hotplug
event to userspace and the in-memory topology still can be referred for stale
connection status before payload deallocation completes, i.e. which will put
malloc krefcount to eventually destroy disconnected devices in topology in-memory.
I mean, ideally, sounds like the topology in-memory should be reliable when
we send ALLOCATE_PAYLOAD as PBN=0. But I understand it definitely is not the
case if we have krefcount leak.

Appreciate for your time and help Lyude!

> one of the reasons I've been considering trying to come up with a better
> solution then topology references someday (unfortunately it works 'good
> enough' for the most part, so definitely not a priority). This is 100% a guess on
> my part though.
>
> > > +                       ret = -EINVAL;
> > > +                       goto fail_put;
> > > +               }
> > >         }
> > >
> > >         txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); @@ -3384,7
> > > +3395,6 @@ int drm_dp_update_payload_part1(struct
> > > drm_dp_mst_topology_mgr *mgr, int start_s
> > >         struct drm_dp_mst_port *port;
> > >         int i, j;
> > >         int cur_slots = start_slot;
> > > -       bool skip;
> > >
> > >         mutex_lock(&mgr->payload_lock);
> > >         for (i = 0; i < mgr->max_payloads; i++) { @@ -3399,16
> > > +3409,6 @@ int drm_dp_update_payload_part1(struct
> > > drm_dp_mst_topology_mgr *mgr, int start_s
> > >                         port = container_of(vcpi, struct
> > > drm_dp_mst_port,
> > >                                             vcpi);
> > >
> > > -                       mutex_lock(&mgr->lock);
> > > -                       skip
> > > = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
> > > -                       mutex_unlock(&mgr->lock);
> > > -
> > > -                       if (skip) {
> > > -                               drm_dbg_kms(mgr->dev,
> > > -                                           "Virtual channel %d is
> > > not in current topology\n",
> > > -                                           i);
> > > -                               continue;
> > > -                       }
> > >                         /* Validated ports don't matter if we're
> > > releasing
> > >                          * VCPI
> > >                          */
> > > @@ -3509,7 +3509,6 @@ int drm_dp_update_payload_part2(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >         struct drm_dp_mst_port *port;
> > >         int i;
> > >         int ret = 0;
> > > -       bool skip;
> > >
> > >         mutex_lock(&mgr->payload_lock);
> > >         for (i = 0; i < mgr->max_payloads; i++) { @@ -3519,13
> > > +3518,6 @@ int drm_dp_update_payload_part2(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >
> > >                 port = container_of(mgr->proposed_vcpis[i], struct
> > > drm_dp_mst_port, vcpi);
> > >
> > > -               mutex_lock(&mgr->lock);
> > > -               skip = !drm_dp_mst_port_downstream_of_branch(port,
> > > mgr->mst_primary);
> > > -               mutex_unlock(&mgr->lock);
> > > -
> > > -               if (skip)
> > > -                       continue;
> > > -
> > >                 drm_dbg_kms(mgr->dev, "payload %d %d\n", i, mgr-
> > > > payloads[i].payload_state);
> > >                 if (mgr->payloads[i].payload_state ==
> > > DP_PAYLOAD_LOCAL) {
> > >                         ret = drm_dp_create_payload_step2(mgr, port,
> > > mgr-
> > > > proposed_vcpis[i]->vcpi, &mgr->payloads[i]); @@ -4780,18 +4772,9
> > > > @@
> > > EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> > >  void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr
> > > *mgr,
> > >                                 struct drm_dp_mst_port *port)
> > >  {
> > > -       bool skip;
> > > -
> > >         if (!port->vcpi.vcpi)
> > >                 return;
> > >
> > > -       mutex_lock(&mgr->lock);
> > > -       skip = !drm_dp_mst_port_downstream_of_branch(port, mgr-
> > > > mst_primary);
> > > -       mutex_unlock(&mgr->lock);
> > > -
> > > -       if (skip)
> > > -               return;
> > > -
> > >         drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
> > >         port->vcpi.num_slots = 0;
> > >         port->vcpi.pbn = 0;
> > > --
> > > 2.35.3
> > --
> > Wayne Lin
>
> --
> Cheers,
> Lyude Paul (she/her)
> Software Engineer at Red Hat
--
Regards,
Wayne Lin

2022-08-10 23:05:44

by Lyude Paul

[permalink] [raw]
Subject: Re: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing payloads if last connected port isn't connected

On Wed, 2022-08-10 at 03:28 +0000, Lin, Wayne wrote:
> Hi Lyude,
> Thanks for your time and sorry for late response!
>
> It's described in 5.6.1.3 of DP spec 2.0:
> "MST branch device, in addition to waiting for the ACK from its immediate
> Upstream device, should either wait for the ALLOCATE_PAYLOAD message
> transaction with a PBN value equal to 0 from the MST Source device for
> de-allocating the time slot assigned to the VC Payload that is routed to the
> unplugged DFP or for 2 seconds, whichever occurs first."

oooh! Thank you for posting this, I totally missed the bit that says "or for 2
seconds, whichever occurs first." That certainly explains a lot.

>
> > > commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by
> > > ports in stale topology") was trying to skip updating payload for a
> > > target which is no longer existing in the current topology rooted at
> > > mgr->mst_primary. I passed "mgr->mst_primary" to
> > > drm_dp_mst_port_downstream_of_branch() previously.
> > > Sorry, I might not fully understand the issue you've seen. Could you
> > > elaborate on this more please?
> > >
> > > Thanks!
> >
> > I will have to double check this since it's been a month, but basically - the idea
> > of having the topology references in the first place was to be the one check
> > for figuring out whether something's in a topology or not. I've been thinking
> > of maybe trying to replace it at some point, but I think we'd want to do it all
> > over the helpers instead of just in certain spots.
> >
> > The other thing I noticed was that when I was rewriting this code, I noticed it
> > seemed a lot like we had misunderstood the issue that was causing leaks in
> > the first place. The BAD_PARAM we noticed indicates the payload we're
> > trying to remove on the other end doesn't exist anymore, meaning the
> > branch device in question got rid of any payloads it had active in response to
> > the CSN. In testing though I found that payloads would be automatically
> > released in situations where the last reachable port was marked as
> > disconnected via a previous CSN, but was still reachable otherwise, and not in
> > any other situation. This also seemed to match up with the excerpts in the DP
> > spec that I found, so I assumed it was probably correct.
>
> IMHO, the main root cause with the commit 3769e4c0af5b ("drm/dp_mst: Avoid
>  to mess up payload table by ports in stale topology") is like what described in the
> commit message. The problem I encountered was when I unplugged the primary
> mst branch device from the system, upper layer didn't try to  release stale streams
> immediately. Instead, it started to gradually release stale streams when I plugged the
> mst hub back to the system. In that case, if we didn't do the check to see whether
> the current request for deallocating payload is for this time topology instance,
> i.e. might be for the stale topology before I unplug, this deallocation will mess up
> payload allocation for new topology instance.
>
> As for the CSN, it's a node broadcast request message and not a path message.
> Referring to 2.14.6.1 of DP 2.0 spec:
> "If the broadcast message is a node request, only the end devices, DP MST
> Source or Sink devices (or DP MST Branch device if Source/Sink are not plugged),
> process the request."
> IMHO, payload should be controlled by source only, by ALLOCATE_PAYLOAD or
> CLEAR_PAYLAOD_ID_TABLE message.
>
> >
> > Also, I think using the DDPS field instead of trying to traverse the topology
> > state (which might not have been fully updated yet in response to CSNs)
> > might be a slightly better idea since DDPS may end up being updated before
> > the port has been removed from our in-memory topology, which is kind of
>
> Thank you Lyude! Just want to confirm with you the below idea to see if I
> understand it correctly.
> The flow I thought would be (from Source perspective):
> Receive CSN for notifying disconnection event => update physical topology
> connection status (e.g. DDPS, put topology krefcount..) => send hotplug event to
> userspace => userspace asks deallocating payloads for disconnected stream
> sinks =>  put malloc krefcount of disconnected ports/mstbs  => remove ports/mstb
> from in-memory topology.
> I suppose physical topology connection status is updated before sending hotplug
> event to userspace and the in-memory topology still can be referred for stale
> connection status before payload deallocation completes, i.e. which will put
> malloc krefcount to eventually destroy disconnected devices in topology in-memory.
> I mean, ideally, sounds like the topology in-memory should be reliable when
> we send ALLOCATE_PAYLOAD as PBN=0. But I understand it definitely is not the
> case if we have krefcount leak.

mhm, I think you made me realize I'm overthinking this a bit now that I've
seen the excerpt you mentioned above, along with the other excerpt about only
the end devices being involved. The main reason I originally foresaw an issue
with this is because the delay with updating the in-memory topology structure
might put us slightly out of sync with the state of the hub on the other end -
causing the hub to spit out an error.

However - based on the excerpts you mentioned I think what I was seeing was
mainly just the 2 second timeout causing things to be released properly - not
specific behavior based on the location in the topology of the branch that was
just unplugged like I originally assumed. I think in that case it probably
does make more sense to go with your fix, so I'll likely drop this and rework
the topology checks you had into this.

>
> Appreciate for your time and help Lyude!
>

no, thank you for your help! :) There aren't a whole ton of people who are
this involved with MST so it's very useful to finally have another pair of
eyes looking at all of this.

>  

--
Cheers,
Lyude Paul (she/her)
Software Engineer at Red Hat