2023-11-29 12:55:14

by John Garry

Subject: Re: [PATCH v4] scsi: libsas: Fix the failure of adding phy with zero-address to port

On 28/11/2023 03:45, yangxingui wrote:
>
> On 2023/11/28 3:28, John Garry wrote:
>> On 24/11/2023 02:27, yangxingui wrote:
>>>> We already do this in sas_ex_join_wide_port(), right?
>>> No. If the address of ex_phy matches dev->parent,
>>> sas_ex_join_wide_port() is not called; sas_add_parent_port() is
>>> called instead, as follows:
>>> static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
>>> {
>>>          struct expander_device *ex = &dev->ex_dev;
>>>          struct ex_phy *ex_phy = &ex->ex_phy[phy_id];
>>>          struct domain_device *child = NULL;
>>>          int res = 0;
>>>
>>>      <...>
>>>          /* Parent and domain coherency */
>>>          if (!dev->parent && sas_phy_match_port_addr(dev->port,
>>> ex_phy)) {
>>>                  sas_add_parent_port(dev, phy_id);
>>>                  return 0;
>>>          }
>>>          if (dev->parent && sas_phy_match_dev_addr(dev->parent,
>>> ex_phy)) {
>>>                  sas_add_parent_port(dev, phy_id);
>>>                  if (ex_phy->routing_attr == TABLE_ROUTING)
>>>                          sas_configure_phy(dev, phy_id,
>>> dev->port->sas_addr, 1);
>>>                  return 0;
>>>          }
>>>      <...>
>>> }
>>>
>>>>
>>>> I am not saying that what we do now does not have a problem - I am
>>>> just trying to understand what currently happens
>>>
>>> OK. Because ex_phy->port is not set when sas_add_parent_port() is
>>> called, the phy is not removed from the parent wide port's phy_list
>>> when it is later deleted from that port, as follows:
>>> static void sas_unregister_devs_sas_addr(struct domain_device *parent,
>>>                                           int phy_id, bool last)
>>> {
>>>      <...>
>>>      // Since ex_phy->port is not set, this branch will not be entered
>>
>> But then how does this ever work? Is it because we follow the path
>> sas_rediscover_dev() -> sas_discover_new() ->
>> sas_ex_discover_devices() -> sas_ex_discover_dev() ->
>> sas_add_parent_port(), and not sas_rediscover_dev() ->
>> sas_discover_new() -> sas_ex_join_wide_port()? If so, is that because
>> ephy->sas_attached_phy == 0 in sas_discover_new() ->
>> sas_ex_join_wide_port() and it fails?
>>
>> BTW, about something mentioned earlier - adding the phy19 with SAS_ADDR
>
> Yes,
> For phy19, when the phy is attached and added to the parent wide port,
> the path is:
> sas_rediscover()
>     ->sas_discover_new()
>         ->sas_ex_discover_devices()
>             ->sas_ex_discover_dev()
>                 -> sas_add_parent_port().

ok, so then the change to set ex_phy->port = ex->parent_port looks ok.
Maybe we can put this in a helper with the sas_port_add_phy() call, as
it is duplicated in sas_ex_join_wide_port()

Do we also need to set ex_phy->phy_state (like sas_ex_join_wide_port())?

> And the path called when it is removed from parent wide port is:
> sas_rediscover()
>     ->sas_unregister_devs_sas_addr() // The sas address of phy19
> becomes 0. Since ex_phy->port is NULL, phy19 is not removed from the
> parent wide port's phy_list.
>
> For phy0, it is connected to a new sata device.
> sas_rediscover()
>     ->sas_discover_new()->sas_ex_phy_discover()
>                             ->sas_ex_phy_discover_helper()
>                                 ->sas_set_ex_phy() // The device type
> is stp. Since the linkrate is 5 and less than 1.5G, sas_address is set
> to 0.

Then when we get the proper linkrate later, will we then rediscover and
set the proper SAS address? I am just wondering if this change is really
required?

BTW, even with the change to set ex_phy->port = ex->parent_port, are we
still joining the host-attached expander phy (19) to a port with SAS
address == 0?

>                         ->sas_ex_discover_devices()
>                             ->sas_ex_discover_dev()
>                                 ->sas_ex_discover_end_dev()
>                                     ->sas_port_alloc() // Create
> port-7:7:0
>                                     ->sas_ex_get_linkrate()
>                                         ->sas_port_add_phy() // Try
> adding phy19 to port->7:7:0, triggering BUG()

Thanks,
John


2023-11-30 03:53:12

by Xingui Yang

Subject: Re: [PATCH v4] scsi: libsas: Fix the failure of adding phy with zero-address to port

Hi, John

On 2023/11/29 20:54, John Garry wrote:
> On 28/11/2023 03:45, yangxingui wrote:
>>
>> On 2023/11/28 3:28, John Garry wrote:
>>> On 24/11/2023 02:27, yangxingui wrote:
>>>>> We already do this in sas_ex_join_wide_port(), right?
>>>> No. If the address of ex_phy matches dev->parent,
>>>> sas_ex_join_wide_port() is not called; sas_add_parent_port() is
>>>> called instead, as follows:
>>>> static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
>>>> {
>>>>          struct expander_device *ex = &dev->ex_dev;
>>>>          struct ex_phy *ex_phy = &ex->ex_phy[phy_id];
>>>>          struct domain_device *child = NULL;
>>>>          int res = 0;
>>>>
>>>>      <...>
>>>>          /* Parent and domain coherency */
>>>>          if (!dev->parent && sas_phy_match_port_addr(dev->port,
>>>> ex_phy)) {
>>>>                  sas_add_parent_port(dev, phy_id);
>>>>                  return 0;
>>>>          }
>>>>          if (dev->parent && sas_phy_match_dev_addr(dev->parent,
>>>> ex_phy)) {
>>>>                  sas_add_parent_port(dev, phy_id);
>>>>                  if (ex_phy->routing_attr == TABLE_ROUTING)
>>>>                          sas_configure_phy(dev, phy_id,
>>>> dev->port->sas_addr, 1);
>>>>                  return 0;
>>>>          }
>>>>      <...>
>>>> }
>>>>
>>>>>
>>>>> I am not saying that what we do now does not have a problem - I am
>>>>> just trying to understand what currently happens
>>>>
>>>> OK. Because ex_phy->port is not set when sas_add_parent_port() is
>>>> called, the phy is not removed from the parent wide port's phy_list
>>>> when it is later deleted from that port, as follows:
>>>> static void sas_unregister_devs_sas_addr(struct domain_device *parent,
>>>>                                           int phy_id, bool last)
>>>> {
>>>>      <...>
>>>>      // Since ex_phy->port is not set, this branch will not be entered
>>>
>>> But then how does this ever work? Is it because we follow the path
>>> sas_rediscover_dev() -> sas_discover_new() ->
>>> sas_ex_discover_devices() -> sas_ex_discover_dev() ->
>>> sas_add_parent_port(), and not sas_rediscover_dev() ->
>>> sas_discover_new() -> sas_ex_join_wide_port()? If so, is that because
>>> ephy->sas_attached_phy == 0 in sas_discover_new() ->
>>> sas_ex_join_wide_port() and it fails?
>>>
>>> BTW, about something mentioned earlier - adding the phy19 with SAS_ADDR
>>
>> Yes,
>> For phy19, when the phy is attached and added to the parent wide port,
>> the path is:
>> sas_rediscover()
>>      ->sas_discover_new()
>>          ->sas_ex_discover_devices()
>>              ->sas_ex_discover_dev()
>>                  -> sas_add_parent_port().
>
> ok, so then the change to set ex_phy->port = ex->parent_port looks ok.
> Maybe we can put this in a helper with the sas_port_add_phy() call, as
> it is duplicated in sas_ex_join_wide_port()
>
> Do we also need to set ex_phy->phy_state (like sas_ex_join_wide_port())?

Well, okay, as follows?
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -856,9 +856,7 @@ static bool sas_ex_join_wide_port(struct domain_device *parent, int phy_id)

                if (!memcmp(phy->attached_sas_addr, ephy->attached_sas_addr,
                            SAS_ADDR_SIZE) && ephy->port) {
-                       sas_port_add_phy(ephy->port, phy->phy);
-                       phy->port = ephy->port;
-                       phy->phy_state = PHY_DEVICE_DISCOVERED;
+                       sas_port_add_ex_phy(ephy->port, phy);
                        return true;
                }
        }
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index e860d5b19880..39ffa60a9a01 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -189,6 +189,13 @@ static inline void sas_phy_set_target(struct asd_sas_phy *p, struct domain_devic
        }
 }

+static inline void sas_port_add_ex_phy(struct sas_port *port, struct ex_phy *ex_phy)
+{
+       sas_port_add_phy(port, ex_phy->phy);
+       ex_phy->port = port;
+       ex_phy->phy_state = PHY_DEVICE_DISCOVERED;
+}
+
 static inline void sas_add_parent_port(struct domain_device *dev, int phy_id)
 {
        struct expander_device *ex = &dev->ex_dev;
@@ -201,8 +208,7 @@ static inline void sas_add_parent_port(struct domain_device *dev, int phy_id)
                BUG_ON(sas_port_add(ex->parent_port));
                sas_port_mark_backlink(ex->parent_port);
        }
-       sas_port_add_phy(ex->parent_port, ex_phy->phy);
+       sas_port_add_ex_phy(ex->parent_port, ex_phy);
 }
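
To illustrate why the helper sets ex_phy->port, here is a minimal
user-space sketch. It assumes simplified stand-in types; the mock_*
functions, add_phy_without_backlink(), add_phy_with_backlink() and
remove_phy() are hypothetical models for illustration only, not the
libsas code. It only shows that a removal path which checks
ex_phy->port cannot drop a phy that was added without the back-pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the libsas types (illustration only). */
enum ex_phy_state {
	PHY_EMPTY,
	PHY_VACANT,
	PHY_NOT_PRESENT,
	PHY_DEVICE_DISCOVERED
};

struct sas_port {
	int num_phys;			/* models the length of phy_list */
};

struct ex_phy {
	struct sas_port *port;		/* back-pointer checked on removal */
	enum ex_phy_state phy_state;
};

/* Hypothetical mocks: only count the members of the port. */
static void mock_port_add_phy(struct sas_port *port) { port->num_phys++; }
static void mock_port_delete_phy(struct sas_port *port) { port->num_phys--; }

/* Models the current tail of sas_add_parent_port(): no back-pointer. */
static void add_phy_without_backlink(struct sas_port *port, struct ex_phy *phy)
{
	(void)phy;
	mock_port_add_phy(port);
}

/* Models the proposed sas_port_add_ex_phy() helper. */
static void add_phy_with_backlink(struct sas_port *port, struct ex_phy *phy)
{
	mock_port_add_phy(port);
	phy->port = port;
	phy->phy_state = PHY_DEVICE_DISCOVERED;
}

/* Models a removal path that is skipped when the back-pointer is NULL. */
static void remove_phy(struct ex_phy *phy)
{
	if (phy->port) {
		mock_port_delete_phy(phy->port);
		phy->port = NULL;
	}
}

static void demo_port_backlink(void)
{
	struct sas_port stale = { 0 }, clean = { 0 };
	struct ex_phy a = { NULL, PHY_EMPTY }, b = { NULL, PHY_EMPTY };

	add_phy_without_backlink(&stale, &a);
	remove_phy(&a);
	assert(stale.num_phys == 1);	/* phy is still on the port's list */

	add_phy_with_backlink(&clean, &b);
	remove_phy(&b);
	assert(clean.num_phys == 0);	/* phy was removed as expected */
}
```

Calling demo_port_backlink() exercises both cases: the phy added
without the back-pointer stays on the port after removal, while the
one added through the helper-style function is dropped.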

>
>> And the path called when it is removed from parent wide port is:
>> sas_rediscover()
>>      ->sas_unregister_devs_sas_addr() // The sas address of phy19
>> becomes 0. Since ex_phy->port is NULL, phy19 is not removed from the
>> parent wide port's phy_list.
>>
>> For phy0, it is connected to a new sata device.
>> sas_rediscover()
>>      ->sas_discover_new()->sas_ex_phy_discover()
>>                              ->sas_ex_phy_discover_helper()
>>                                  ->sas_set_ex_phy() // The device type
>> is stp. Since the linkrate is 5 and less than 1.5G, sas_address is set
>> to 0.
>
> Then when we get the proper linkrate later, will we then rediscover and
> set the proper SAS address? I am just wondering if this change is really
> required?
Yes, but in fact it never reaches that stage. After the address is set
to 0, discovery continues: it creates a new port and tries to add the
other phys that have the same (zero) address to this new port.

>
> BTW, even with the change to set ex_phy->port = ex->parent_port, are we
> still joining the host-attached expander phy (19) to a port with SAS
> address == 0?
Yes. To avoid this situation, the current patch no longer forces the
SAS address to 0 when the device type is not NULL; it keeps the address
obtained by querying the expander.
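
A rough user-space model of that address handling, as I read the
description above. The struct, the enum values and the set_addr_*()
functions are hypothetical names for illustration and are not the
actual sas_set_ex_phy() code; the linkrate value 5 is just the one seen
in this report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SAS_ADDR_SIZE 8

/* Illustrative stand-ins, not the kernel definitions. */
enum model_dev_type { MODEL_NO_DEVICE = 0, MODEL_END_DEVICE = 1 };
enum model_linkrate { MODEL_RATE_BELOW_1_5G = 5, MODEL_RATE_1_5_GBPS = 8 };

struct model_ex_phy {
	enum model_dev_type attached_dev_type;
	enum model_linkrate linkrate;
	uint8_t attached_sas_addr[SAS_ADDR_SIZE];
};

static int model_addr_is_zero(const uint8_t *addr)
{
	static const uint8_t zero[SAS_ADDR_SIZE];

	return memcmp(addr, zero, SAS_ADDR_SIZE) == 0;
}

/* Problem case: any linkrate below 1.5G forces the address to 0. */
static void set_addr_old(struct model_ex_phy *phy, const uint8_t *reported)
{
	if (phy->linkrate < MODEL_RATE_1_5_GBPS)
		memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
	else
		memcpy(phy->attached_sas_addr, reported, SAS_ADDR_SIZE);
}

/* Patched idea: keep the reported address when a device is attached. */
static void set_addr_patched(struct model_ex_phy *phy, const uint8_t *reported)
{
	if (phy->linkrate < MODEL_RATE_1_5_GBPS &&
	    phy->attached_dev_type == MODEL_NO_DEVICE)
		memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
	else
		memcpy(phy->attached_sas_addr, reported, SAS_ADDR_SIZE);
}

static void demo_addr_handling(void)
{
	const uint8_t reported[SAS_ADDR_SIZE] = { 0x50, 0, 0, 0, 0, 0, 0, 0x01 };
	struct model_ex_phy phy = {
		.attached_dev_type = MODEL_END_DEVICE,
		.linkrate = MODEL_RATE_BELOW_1_5G,
	};

	set_addr_old(&phy, reported);
	assert(model_addr_is_zero(phy.attached_sas_addr));

	set_addr_patched(&phy, reported);
	assert(!model_addr_is_zero(phy.attached_sas_addr));
}
```

With the old behaviour the attached device ends up with a zero address
and can later be matched against other zero-address phys; with the
patched behaviour the address reported by the expander is kept.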


Thanks,
Xingui

2023-12-01 09:23:41

by John Garry

Subject: Re: [PATCH v4] scsi: libsas: Fix the failure of adding phy with zero-address to port

On 30/11/2023 03:53, yangxingui wrote:
>>>
>>> For phy19, when the phy is attached and added to the parent wide
>>> port, the path is:
>>> sas_rediscover()
>>>      ->sas_discover_new()
>>>          ->sas_ex_discover_devices()
>>>              ->sas_ex_discover_dev()
>>>                  -> sas_add_parent_port().
>>
>> ok, so then the change to set ex_phy->port = ex->parent_port looks ok.
>> Maybe we can put this in a helper with the sas_port_add_phy() call, as
>> it is duplicated in sas_ex_join_wide_port()
>>
>> Do we also need to set ex_phy->phy_state (like sas_ex_join_wide_port())?
>
> Well, okay, as follows?
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -856,9 +856,7 @@ static bool sas_ex_join_wide_port(struct
> domain_device *parent, int phy_id)
>
>                 if (!memcmp(phy->attached_sas_addr,
> ephy->attached_sas_addr,
>                             SAS_ADDR_SIZE) && ephy->port) {
> -                       sas_port_add_phy(ephy->port, phy->phy);
> -                       phy->port = ephy->port;
> -                       phy->phy_state = PHY_DEVICE_DISCOVERED;
> +                       sas_port_add_ex_phy(ephy->port, phy);
>                         return true;

this looks ok. How about adding this helper and using it in a separate
change?

>                 }
>         }
> diff --git a/drivers/scsi/libsas/sas_internal.h
> b/drivers/scsi/libsas/sas_internal.h
> index e860d5b19880..39ffa60a9a01 100644
> --- a/drivers/scsi/libsas/sas_internal.h
> +++ b/drivers/scsi/libsas/sas_internal.h
> @@ -189,6 +189,13 @@ static inline void sas_phy_set_target(struct
> asd_sas_phy *p, struct domain_devic
>         }
>  }
>
> +static inline void sas_port_add_ex_phy(struct sas_port *port, struct
> ex_phy *ex_phy)
> +{
> +       sas_port_add_phy(port, ex_phy->phy);
> +       ex_phy->port = port;
> +       ex_phy->phy_state = PHY_DEVICE_DISCOVERED;
> +}

I'd prefer sas_expander.c, but sas_add_parent_port() is here... having
said that, sas_add_parent_port() is only used in sas_expander.c

> +
>  static inline void sas_add_parent_port(struct domain_device *dev, int
> phy_id)
>  {
>         struct expander_device *ex = &dev->ex_dev;
> @@ -201,8 +208,7 @@ static inline void sas_add_parent_port(struct
> domain_device *dev, int phy_id)
>                 BUG_ON(sas_port_add(ex->parent_port));
>                 sas_port_mark_backlink(ex->parent_port);
>         }
> -       sas_port_add_phy(ex->parent_port, ex_phy->phy);
> +       sas_port_add_ex_phy(ex->parent_port, ex_phy);
>  }
>
>>
>>> And the path called when it is removed from parent wide port is:
>>> sas_rediscover()
>>>      ->sas_unregister_devs_sas_addr() // The sas address of phy19
>>> becomes 0. Since ex_phy->port is NULL, phy19 is not removed from the
>>> parent wide port's phy_list.
>>>
>>> For phy0, it is connected to a new sata device.
>>> sas_rediscover()
>>>      ->sas_discover_new()->sas_ex_phy_discover()
>>>                              ->sas_ex_phy_discover_helper()
>>>                                  ->sas_set_ex_phy() // The device
>>> type is stp. Since the linkrate is 5 and less than 1.5G, sas_address
>>> is set to 0.
>>
>> Then when we get the proper linkrate later, will we then rediscover
>> and set the proper SAS address? I am just wondering if this change is
>> really required?
> Yes, but in fact it never reaches that stage. After the address is set
> to 0, discovery continues: it creates a new port and tries to add the
> other phys that have the same (zero) address to this new port.

Creating a port for SAS address == 0 and adding phys to it seems
incorrect, right?

>
>>
>> BTW, even with the change to set ex_phy->port = ex->parent_port, are
>> we still joining the host-attached expander phy (19) to a port with
>> SAS address == 0?
> Yes. To avoid this situation, the current patch no longer forces the
> SAS address to 0 when the device type is not NULL; it keeps the address
> obtained by querying the expander.

ok, let me check that again later today.

Thanks,
John


2023-12-04 11:49:25

by Xingui Yang

Subject: Re: [PATCH v4] scsi: libsas: Fix the failure of adding phy with zero-address to port

Hi, John

On 2023/12/1 17:22, John Garry wrote:
> On 30/11/2023 03:53, yangxingui wrote:
>>>>
>>>> For phy19, when the phy is attached and added to the parent wide
>>>> port, the path is:
>>>> sas_rediscover()
>>>>      ->sas_discover_new()
>>>>          ->sas_ex_discover_devices()
>>>>              ->sas_ex_discover_dev()
>>>>                  -> sas_add_parent_port().
>>>
>>> ok, so then the change to set ex_phy->port = ex->parent_port looks
>>> ok. Maybe we can put this in a helper with the sas_port_add_phy()
>>> call, as it is duplicated in sas_ex_join_wide_port()
>>>
>>> Do we also need to set ex_phy->phy_state (like sas_ex_join_wide_port())?
>>
>> Well, okay, as follows?
>> +++ b/drivers/scsi/libsas/sas_expander.c
>> @@ -856,9 +856,7 @@ static bool sas_ex_join_wide_port(struct
>> domain_device *parent, int phy_id)
>>
>>                  if (!memcmp(phy->attached_sas_addr,
>> ephy->attached_sas_addr,
>>                              SAS_ADDR_SIZE) && ephy->port) {
>> -                       sas_port_add_phy(ephy->port, phy->phy);
>> -                       phy->port = ephy->port;
>> -                       phy->phy_state = PHY_DEVICE_DISCOVERED;
>> +                       sas_port_add_ex_phy(ephy->port, phy);
>>                          return true;
>
> this looks ok. How about adding this helper and using it in a separate
> change?
OK, I will do that in the next version.
>
>>                  }
>>          }
>> diff --git a/drivers/scsi/libsas/sas_internal.h
>> b/drivers/scsi/libsas/sas_internal.h
>> index e860d5b19880..39ffa60a9a01 100644
>> --- a/drivers/scsi/libsas/sas_internal.h
>> +++ b/drivers/scsi/libsas/sas_internal.h
>> @@ -189,6 +189,13 @@ static inline void sas_phy_set_target(struct
>> asd_sas_phy *p, struct domain_devic
>>          }
>>   }
>>
>> +static inline void sas_port_add_ex_phy(struct sas_port *port, struct
>> ex_phy *ex_phy)
>> +{
>> +       sas_port_add_phy(port, ex_phy->phy);
>> +       ex_phy->port = port;
>> +       ex_phy->phy_state = PHY_DEVICE_DISCOVERED;
>> +}
>
> I'd prefer sas_expander.c, but sas_add_parent_port() is here... having
> said that, sas_add_parent_port() is only used in sas_expander.c
OK, I will move it to sas_expander.c in the next version.

>
>> +
>>   static inline void sas_add_parent_port(struct domain_device *dev,
>> int phy_id)
>>   {
>>          struct expander_device *ex = &dev->ex_dev;
>> @@ -201,8 +208,7 @@ static inline void sas_add_parent_port(struct
>> domain_device *dev, int phy_id)
>>                  BUG_ON(sas_port_add(ex->parent_port));
>>                  sas_port_mark_backlink(ex->parent_port);
>>          }
>> -       sas_port_add_phy(ex->parent_port, ex_phy->phy);
>> +       sas_port_add_ex_phy(ex->parent_port, ex_phy);
>>   }
>>
>>>
>>>> And the path called when it is removed from parent wide port is:
>>>> sas_rediscover()
>>>>      ->sas_unregister_devs_sas_addr() // The sas address of phy19
>>>> becomes 0. Since ex_phy->port is NULL, phy19 is not removed from the
>>>> parent wide port's phy_list.
>>>>
>>>> For phy0, it is connected to a new sata device.
>>>> sas_rediscover()
>>>>      ->sas_discover_new()->sas_ex_phy_discover()
>>>>                              ->sas_ex_phy_discover_helper()
>>>>                                  ->sas_set_ex_phy() // The device
>>>> type is stp. Since the linkrate is 5 and less than 1.5G, sas_address
>>>> is set to 0.
>>>
>>> Then when we get the proper linkrate later, will we then rediscover
>>> and set the proper SAS address? I am just wondering if this change is
>>> really required?
>> Yes, but in fact it never reaches that stage. After the address is set
>> to 0, discovery continues: it creates a new port and tries to add the
>> other phys that have the same (zero) address to this new port.
>
> Creating a port for SAS address == 0 and adding phys to it seems
> incorrect, right?
Yes. There are three possible ways to avoid creating a port with a zero
address:
1. Use the SAS address obtained by querying the expander instead of the
zero address.
2. Forbid a phy whose address is 0 from creating a port.
3. When the linkrate is below 1.5G, do not let the phy enter
sas_ex_discover_end_dev().

Because the SAS address is valid whenever the device type is not empty,
we currently use the first approach.
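
For comparison, options 2 and 3 could be expressed as simple guards.
This is only a sketch under my own assumptions: the function names and
the threshold constant are hypothetical and do not exist in libsas.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SAS_ADDR_SIZE 8
#define GUARD_RATE_1_5_GBPS 8	/* illustrative threshold value */

static bool guard_addr_is_zero(const uint8_t *addr)
{
	static const uint8_t zero[SAS_ADDR_SIZE];

	return memcmp(addr, zero, SAS_ADDR_SIZE) == 0;
}

/* Option 2: a phy whose attached address is 0 may not create a port. */
static bool option2_may_create_port(const uint8_t *attached_addr)
{
	return !guard_addr_is_zero(attached_addr);
}

/* Option 3: skip end-device discovery while the rate is below 1.5G. */
static bool option3_may_discover_end_dev(int linkrate)
{
	return linkrate >= GUARD_RATE_1_5_GBPS;
}

static void demo_guards(void)
{
	const uint8_t zero_addr[SAS_ADDR_SIZE] = { 0 };
	const uint8_t real_addr[SAS_ADDR_SIZE] = { 0x50, 0, 0, 0, 0, 0, 0, 0x01 };

	assert(!option2_may_create_port(zero_addr));
	assert(option2_may_create_port(real_addr));
	assert(!option3_may_discover_end_dev(5));
	assert(option3_may_discover_end_dev(GUARD_RATE_1_5_GBPS));
}
```

Both guards only stop the zero-address port from being created, while
option 1 keeps the address reported by the expander.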
>
>>
>>>
>>> BTW, even with the change to set ex_phy->port = ex->parent_port, are
>>> we still joining the host-attached expander phy (19) to a port with
>>> SAS address == 0?
>> Yes. To avoid this situation, the current patch no longer forces the
>> SAS address to 0 when the device type is not NULL; it keeps the
>> address obtained by querying the expander.
>
> ok, let me check that again later today.
OK.

Thanks
Xingui