2022-05-06 16:23:52

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [PATCH v2] soundwire: qcom: adjust autoenumeration timeout



On 5/6/22 03:47, Srinivas Kandagatla wrote:
> Currently timeout for autoenumeration during probe and bus reset is set to
> 2 secs which is really a big value. This can have an adverse effect on
> boot time if the slave device is not ready/reset.
> This was the case with wcd938x which was not reset yet but we spent 2
> secs waiting in the soundwire controller probe. Reduce this time to
> 1/10 of Hz which should be good enough time to finish autoenumeration
> if any slaves are available on the bus.

Humm, now that I think of it I am not sure what reducing the timeout does.

It's clear that autoenumeration should be very fast, but if there is
nothing to enumerate what would happen then? It seems that reducing the
timeout value only forces an inconsistent configuration to be exposed
earlier, but that would not result in a functional change where the
missing device would magically appear, would it? Is this change mainly
to make the tests fail faster? If the 'slave device is not ready/reset',
is there a recovery mechanism to recheck later?

Would you mind clarifying what happens after the timeout, and why the
timeout would happen in the first place?

Thanks!

>
> Reported-by: Srinivasa Rao Mandadapu <[email protected]>
> Signed-off-by: Srinivas Kandagatla <[email protected]>
> ---
>
> Changes since v1:
> replaced HZ/10 with 100 as suggested by Pierre
>
> drivers/soundwire/qcom.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c
> index 7367aa88b8ac..d6111f69d320 100644
> --- a/drivers/soundwire/qcom.c
> +++ b/drivers/soundwire/qcom.c
> @@ -105,7 +105,7 @@
>
> #define SWRM_SPECIAL_CMD_ID 0xF
> #define MAX_FREQ_NUM 1
> -#define TIMEOUT_MS (2 * HZ)
> +#define TIMEOUT_MS 100
> #define QCOM_SWRM_MAX_RD_LEN 0x1
> #define QCOM_SDW_MAX_PORTS 14
> #define DEFAULT_CLK_FREQ 9600000


2022-05-09 02:06:31

by Srinivas Kandagatla

[permalink] [raw]
Subject: Re: [PATCH v2] soundwire: qcom: adjust autoenumeration timeout

Thanks Pierre,

On 06/05/2022 15:13, Pierre-Louis Bossart wrote:
>
>
> On 5/6/22 03:47, Srinivas Kandagatla wrote:
>> Currently timeout for autoenumeration during probe and bus reset is set to
>> 2 secs which is really a big value. This can have an adverse effect on
>> boot time if the slave device is not ready/reset.
>> This was the case with wcd938x which was not reset yet but we spent 2
>> secs waiting in the soundwire controller probe. Reduce this time to
>> 1/10 of Hz which should be good enough time to finish autoenumeration
>> if any slaves are available on the bus.
>
> Humm, now that I think of it I am not sure what reducing the timeout does.
>
> It's clear that autoenumeration should be very fast, but if there is
> nothing to enumerate what would happen then? It seems that reducing the
> timeout value only forces an inconsistent configuration to be exposed
> earlier, but that would not result in a functional change where the
> missing device would magically appear, would it? Is this change mainly
> to make the tests fail faster? If the 'slave device is not ready/reset',
> is there a recovery mechanism to recheck later?
>
> Would you mind clarifying what happens after the timeout, and why the
> timeout would happen in the first place?

This issue is mostly present/seen with WCD938x codec due to its Linux
device model.
WCD938x Codec has 3 Linux component drivers
1. TX Component (A soundwire device connected to TX Soundwire Master)
2. RX Component (A soundwire device connected to RX Soundwire Master)
3. Master Component (Linux component framework master for (1) and (2)
and registers ASoC codec)

Also we have only one reset for (1) and (2).

reset line is handled by (3)
There are two possibilities when the WCD938x reset can happen,

1. If reset happens earlier than probing (1) and (2) which is best case.


2. if reset happens after (1) and (2) are probed then SoundWire TX and
RX master will have spend 2 + 2 secs waiting, Which is a long time out
Hence the patch.

TBH, the 2 sec timeout value was just a random number which I added at
the start, we had to come up with some sensible value over the time
anyway for that.

You could say why do we need wait itself in the first place.

The reason we need wait in first place is because, there is a danger of
codec accessing registers even before enumeration is finished. Because
most of the ASoC codec registration happens as part of codec/component
driver probe function rather than status callback.

I hope this answers your questions.

thanks,
--srini



>
> Thanks!
>
>>
>> Reported-by: Srinivasa Rao Mandadapu <[email protected]>
>> Signed-off-by: Srinivas Kandagatla <[email protected]>
>> ---
>>
>> Changes since v1:
>> replaced HZ/10 with 100 as suggested by Pierre
>>
>> drivers/soundwire/qcom.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c
>> index 7367aa88b8ac..d6111f69d320 100644
>> --- a/drivers/soundwire/qcom.c
>> +++ b/drivers/soundwire/qcom.c
>> @@ -105,7 +105,7 @@
>>
>> #define SWRM_SPECIAL_CMD_ID 0xF
>> #define MAX_FREQ_NUM 1
>> -#define TIMEOUT_MS (2 * HZ)
>> +#define TIMEOUT_MS 100
>> #define QCOM_SWRM_MAX_RD_LEN 0x1
>> #define QCOM_SDW_MAX_PORTS 14
>> #define DEFAULT_CLK_FREQ 9600000

2022-05-09 13:43:39

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [PATCH v2] soundwire: qcom: adjust autoenumeration timeout



On 5/7/22 01:52, Srinivas Kandagatla wrote:
> Thanks Pierre,
>
> On 06/05/2022 15:13, Pierre-Louis Bossart wrote:
>>
>>
>> On 5/6/22 03:47, Srinivas Kandagatla wrote:
>>> Currently timeout for autoenumeration during probe and bus reset is
>>> set to
>>> 2 secs which is really a big value. This can have an adverse effect on
>>> boot time if the slave device is not ready/reset.
>>> This was the case with wcd938x which was not reset yet but we spent 2
>>> secs waiting in the soundwire controller probe. Reduce this time to
>>> 1/10 of Hz which should be good enough time to finish autoenumeration
>>> if any slaves are available on the bus.
>>
>> Humm, now that I think of it I am not sure what reducing the timeout
>> does.
>>
>> It's clear that autoenumeration should be very fast, but if there is
>> nothing to enumerate what would happen then? It seems that reducing the
>> timeout value only forces an inconsistent configuration to be exposed
>> earlier, but that would not result in a functional change where the
>> missing device would magically appear, would it? Is this change mainly
>> to make the tests fail faster? If the 'slave device is not ready/reset',
>> is there a recovery mechanism to recheck later?
>>
>> Would you mind clarifying what happens after the timeout, and why the
>> timeout would happen in the first place?
>
> This issue is mostly present/seen with WCD938x codec due to its Linux
> device model.
> WCD938x Codec has 3 Linux component drivers
> 1. TX Component (A soundwire device connected to TX Soundwire Master)
> 2. RX Component (A soundwire device connected to RX Soundwire Master)
> 3. Master Component (Linux component framework master for (1) and  (2)
> and registers ASoC codec)
>
> Also we have only one reset for (1) and (2).
>
> reset line is handled by (3)
> There are two possibilities when the WCD938x reset can happen,
>
> 1. If reset happens earlier than probing (1) and (2) which is best case.
>
>
> 2. if reset happens after (1) and (2) are probed then SoundWire TX and
> RX master will have spend 2 + 2 secs waiting, Which is a long time out
> Hence the patch.
>
> TBH, the 2 sec timeout value was just a random number which I added at
> the start, we had to come up with some sensible value over the time
> anyway for that.
>
> You could say why do we need wait itself in the first place.
>
> The reason we need wait in first place is because, there is a danger of
> codec accessing registers even before enumeration is finished. Because
> most of the ASoC codec registration happens as part of codec/component
> driver probe function rather than status callback.
>
> I hope this answers your questions.


Humm, not really.

First, you're using this TIMEOUT_MS value in 3 unrelated places, and
using 2 different struct completion, which means there are side effects
beyond autoenumeration.

qcom.c- /*

qcom.c- * sleep for 10ms for MSM soundwire variant to allow
broadcast

qcom.c- * command to complete.

qcom.c- */

qcom.c- ret = wait_for_completion_timeout(&swrm->broadcast,

qcom.c: msecs_to_jiffies(TIMEOUT_MS));

--

qcom.c- goto err_clk;

qcom.c- }

qcom.c-

qcom.c- qcom_swrm_init(ctrl);

qcom.c- wait_for_completion_timeout(&ctrl->enumeration,

qcom.c: msecs_to_jiffies(TIMEOUT_MS));

--

qcom.c- if (!swrm_wait_for_frame_gen_enabled(ctrl))

qcom.c- dev_err(ctrl->dev, "link failed to connect\n");

qcom.c-

qcom.c- /* wait for hw enumeration to complete */

qcom.c- wait_for_completion_timeout(&ctrl->enumeration,

qcom.c: msecs_to_jiffies(TIMEOUT_MS));


And then I don't get what you do on a timeout. in the enumeration part,
the timeout value is not checked for, so I guess enumeration proceeds
without the hardware being available? That seems to contradict the
assertion that you don't want to access registers before the hardware is
properly initialized.

And then on broadcast you have this error handling:

ret = wait_for_completion_timeout(&swrm->broadcast,
msecs_to_jiffies(TIMEOUT_MS));
if (!ret)
ret = SDW_CMD_IGNORED;
else
ret = SDW_CMD_OK;

Which is equally confusing since SDW_CMD_IGNORED is really an error, and
the bus layer does not really handle it well - not to mention that I
vaguely recall the qcom hardware having its own definition of IGNORED?


2022-05-09 14:15:22

by Srinivas Kandagatla

[permalink] [raw]
Subject: Re: [PATCH v2] soundwire: qcom: adjust autoenumeration timeout



On 09/05/2022 14:31, Pierre-Louis Bossart wrote:
>
>
> On 5/7/22 01:52, Srinivas Kandagatla wrote:
>> Thanks Pierre,
>>
>> On 06/05/2022 15:13, Pierre-Louis Bossart wrote:
>>>
>>>
>>> On 5/6/22 03:47, Srinivas Kandagatla wrote:
>>>> Currently timeout for autoenumeration during probe and bus reset is
>>>> set to
>>>> 2 secs which is really a big value. This can have an adverse effect on
>>>> boot time if the slave device is not ready/reset.
>>>> This was the case with wcd938x which was not reset yet but we spent 2
>>>> secs waiting in the soundwire controller probe. Reduce this time to
>>>> 1/10 of Hz which should be good enough time to finish autoenumeration
>>>> if any slaves are available on the bus.
>>>
>>> Humm, now that I think of it I am not sure what reducing the timeout
>>> does.
>>>
>>> It's clear that autoenumeration should be very fast, but if there is
>>> nothing to enumerate what would happen then? It seems that reducing the
>>> timeout value only forces an inconsistent configuration to be exposed
>>> earlier, but that would not result in a functional change where the
>>> missing device would magically appear, would it? Is this change mainly
>>> to make the tests fail faster? If the 'slave device is not ready/reset',
>>> is there a recovery mechanism to recheck later?
>>>
>>> Would you mind clarifying what happens after the timeout, and why the
>>> timeout would happen in the first place?
>>
>> This issue is mostly present/seen with WCD938x codec due to its Linux
>> device model.
>> WCD938x Codec has 3 Linux component drivers
>> 1. TX Component (A soundwire device connected to TX Soundwire Master)
>> 2. RX Component (A soundwire device connected to RX Soundwire Master)
>> 3. Master Component (Linux component framework master for (1) and  (2)
>> and registers ASoC codec)
>>
>> Also we have only one reset for (1) and (2).
>>
>> reset line is handled by (3)
>> There are two possibilities when the WCD938x reset can happen,
>>
>> 1. If reset happens earlier than probing (1) and (2) which is best case.
>>
>>
>> 2. if reset happens after (1) and (2) are probed then SoundWire TX and
>> RX master will have spend 2 + 2 secs waiting, Which is a long time out
>> Hence the patch.
>>
>> TBH, the 2 sec timeout value was just a random number which I added at
>> the start, we had to come up with some sensible value over the time
>> anyway for that.
>>
>> You could say why do we need wait itself in the first place.
>>
>> The reason we need wait in first place is because, there is a danger of
>> codec accessing registers even before enumeration is finished. Because
>> most of the ASoC codec registration happens as part of codec/component
>> driver probe function rather than status callback.
>>
>> I hope this answers your questions.
>
>
> Humm, not really.
>
> First, you're using this TIMEOUT_MS value in 3 unrelated places, and
> using 2 different struct completion, which means there are side effects
> beyond autoenumeration.

2 of these 3 are autoenum ones, one is in probe path and other in bus
reset path during pm.

The final one is broadcast timeout, new timeout value should be okay for
that too, given that we need 10ms timeout.

>
> qcom.c- /*
>
> qcom.c- * sleep for 10ms for MSM soundwire variant to allow
> broadcast
>
> qcom.c- * command to complete.
>
> qcom.c- */
>
> qcom.c- ret = wait_for_completion_timeout(&swrm->broadcast,
>
> qcom.c: msecs_to_jiffies(TIMEOUT_MS));
>
> --
>
> qcom.c- goto err_clk;
>
> qcom.c- }
>
> qcom.c-
>
> qcom.c- qcom_swrm_init(ctrl);
>
> qcom.c- wait_for_completion_timeout(&ctrl->enumeration,
>
> qcom.c: msecs_to_jiffies(TIMEOUT_MS));
>
> --
>
> qcom.c- if (!swrm_wait_for_frame_gen_enabled(ctrl))
>
> qcom.c- dev_err(ctrl->dev, "link failed to connect\n");
>
> qcom.c-
>
> qcom.c- /* wait for hw enumeration to complete */
>
> qcom.c- wait_for_completion_timeout(&ctrl->enumeration,
>
> qcom.c: msecs_to_jiffies(TIMEOUT_MS)); >
>
> And then I don't get what you do on a timeout. in the enumeration part,

We can't do much on the timeout.

> the timeout value is not checked for, so I guess enumeration proceeds
> without the hardware being available? That seems to contradict the
> assertion that you don't want to access registers before the hardware is
> properly initialized.

Even if enumeration timeout, we will not access the registers because
the ASoC codec is not registered yet from WCD938x component master.

>
> And then on broadcast you have this error handling:
>
> ret = wait_for_completion_timeout(&swrm->broadcast,
> msecs_to_jiffies(TIMEOUT_MS));
> if (!ret)
> ret = SDW_CMD_IGNORED;
> else
> ret = SDW_CMD_OK;
>
> Which is equally confusing since SDW_CMD_IGNORED is really an error, and
> the bus layer does not really handle it well - not to mention that I
> vaguely recall the qcom hardware having its own definition of IGNORED?
QCom hardware alteast the version based on which this driver was written
did not have any support to report errors type back on register read/writes.

In this case a broad cast read or write did not generate a complete
interrupt its assumed that its ignored, as there is no way to say if its
error or ignored.

--srini




>

2022-05-09 14:27:19

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [PATCH v2] soundwire: qcom: adjust autoenumeration timeout


>>> You could say why do we need wait itself in the first place.
>>>
>>> The reason we need wait in first place is because, there is a danger of
>>> codec accessing registers even before enumeration is finished. Because
>>> most of the ASoC codec registration happens as part of codec/component
>>> driver probe function rather than status callback.
>>>
>>> I hope this answers your questions.
>>
>>
>> Humm, not really.
>>
>> First, you're using this TIMEOUT_MS value in 3 unrelated places, and
>> using 2 different struct completion, which means there are side effects
>> beyond autoenumeration.
>
> 2 of these 3 are autoenum ones, one is in probe path and other in bus
> reset path during pm.
>
> The final one is broadcast timeout, new timeout value should be okay for
> that too, given that we need 10ms timeout.

probably better to make things explicit with a different timeout value
for the broadcast case.

100ms for a broadcast is really eons, it's supposed to be immediate really.

>> And then I don't get what you do on a timeout. in the enumeration part,
>
> We can't do much on the timeout.
>
>> the timeout value is not checked for, so I guess enumeration proceeds
>> without the hardware being available? That seems to contradict the
>> assertion that you don't want to access registers before the hardware is
>> properly initialized.
>
> Even if enumeration timeout, we will not access the registers because
> the ASoC codec is not registered yet from WCD938x component master.

What happens when the codec is registered then? the autoenumeration
still didn't complete, so what prevents the read/writes from failing then?

>> And then on broadcast you have this error handling:
>>
>>         ret = wait_for_completion_timeout(&swrm->broadcast,
>>                           msecs_to_jiffies(TIMEOUT_MS));
>>         if (!ret)
>>             ret = SDW_CMD_IGNORED;
>>         else
>>             ret = SDW_CMD_OK;
>>
>> Which is equally confusing since SDW_CMD_IGNORED is really an error, and
>> the bus layer does not really handle it well - not to mention that I
>> vaguely recall the qcom hardware having its own definition of IGNORED?
> QCom hardware alteast the version based on which this driver was written
> did not have any support to report errors type back on register
> read/writes.
>
> In this case a broad cast read or write did not generate a complete
> interrupt its assumed that its ignored, as there is no way to say if its
> error or ignored.

ok, that should be fine.

The code in bus.c mostly ignores -ENODATA for the suspend cases. For the
bank switch, I forgot that we also ignore it, Bard added a patch in
2021. The only case where we abort a transfer is on a real error.




2022-05-09 14:34:44

by Srinivas Kandagatla

[permalink] [raw]
Subject: Re: [PATCH v2] soundwire: qcom: adjust autoenumeration timeout



On 09/05/2022 15:24, Pierre-Louis Bossart wrote:
>
>>>> You could say why do we need wait itself in the first place.
>>>>
>>>> The reason we need wait in first place is because, there is a danger of
>>>> codec accessing registers even before enumeration is finished. Because
>>>> most of the ASoC codec registration happens as part of codec/component
>>>> driver probe function rather than status callback.
>>>>
>>>> I hope this answers your questions.
>>>
>>>
>>> Humm, not really.
>>>
>>> First, you're using this TIMEOUT_MS value in 3 unrelated places, and
>>> using 2 different struct completion, which means there are side effects
>>> beyond autoenumeration.
>>
>> 2 of these 3 are autoenum ones, one is in probe path and other in bus
>> reset path during pm.
>>
>> The final one is broadcast timeout, new timeout value should be okay for
>> that too, given that we need 10ms timeout.
>
> probably better to make things explicit with a different timeout value
> for the broadcast case.
I agree, we should define a dedicated 10ms timeout for this to avoid
confusion.

>
> 100ms for a broadcast is really eons, it's supposed to be immediate really.
>
>>> And then I don't get what you do on a timeout. in the enumeration part,
>>
>> We can't do much on the timeout.
>>
>>> the timeout value is not checked for, so I guess enumeration proceeds
>>> without the hardware being available? That seems to contradict the
>>> assertion that you don't want to access registers before the hardware is
>>> properly initialized.
>>
>> Even if enumeration timeout, we will not access the registers because
>> the ASoC codec is not registered yet from WCD938x component master.
>
> What happens when the codec is registered then? the autoenumeration
Codec is only registered after reset and when both TX and RX components
are probed.

> still didn't complete, so what prevents the read/writes from failing then?
If codec is reset and registered and for some reason autoenum took more
than 100ms which will be hw bug then :-).
In this case register read/writes will fail.

--srini
>
>>> And then on broadcast you have this error handling:
>>>
>>>         ret = wait_for_completion_timeout(&swrm->broadcast,
>>>                           msecs_to_jiffies(TIMEOUT_MS));
>>>         if (!ret)
>>>             ret = SDW_CMD_IGNORED;
>>>         else
>>>             ret = SDW_CMD_OK;
>>>
>>> Which is equally confusing since SDW_CMD_IGNORED is really an error, and
>>> the bus layer does not really handle it well - not to mention that I
>>> vaguely recall the qcom hardware having its own definition of IGNORED?
>> QCom hardware alteast the version based on which this driver was written
>> did not have any support to report errors type back on register
>> read/writes.
>>
>> In this case a broad cast read or write did not generate a complete
>> interrupt its assumed that its ignored, as there is no way to say if its
>> error or ignored.
>
> ok, that should be fine.
>
> The code in bus.c mostly ignores -ENODATA for the suspend cases. For the
> bank switch, I forgot that we also ignore it, Bard added a patch in
> 2021. The only case where we abort a transfer is on a real error.
>
>
>

2022-05-09 14:36:59

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [PATCH v2] soundwire: qcom: adjust autoenumeration timeout


>>> Even if enumeration timeout, we will not access the registers because
>>> the ASoC codec is not registered yet from WCD938x component master.
>>
>> What happens when the codec is registered then? the autoenumeration
> Codec is only registered after reset and when both TX and RX components
> are probed.
>
>> still didn't complete, so what prevents the read/writes from failing
>> then?
> If codec is reset and registered and for some reason autoenum took more
> than 100ms which will be hw bug then :-).
> In this case register read/writes will fail.

Does this reset result in the 'bus reset' in the SoundWire sence and
restart hence the autoenumeration?

It looks like you have a race between different components and starting
the bus before it's needed, no?

2022-05-09 14:44:42

by Srinivas Kandagatla

[permalink] [raw]
Subject: Re: [PATCH v2] soundwire: qcom: adjust autoenumeration timeout



On 09/05/2022 15:36, Pierre-Louis Bossart wrote:
>
>>>> Even if enumeration timeout, we will not access the registers because
>>>> the ASoC codec is not registered yet from WCD938x component master.
>>>
>>> What happens when the codec is registered then? the autoenumeration
>> Codec is only registered after reset and when both TX and RX components
>> are probed.
>>
>>> still didn't complete, so what prevents the read/writes from failing
>>> then?
>> If codec is reset and registered and for some reason autoenum took more
>> than 100ms which will be hw bug then :-).
>> In this case register read/writes will fail.
>
> Does this reset result in the 'bus reset' in the SoundWire sence and
> restart hence the autoenumeration?
The reset am referring here is codec reset gpio line. on reset device
will show up on the bus which should trigger an autoenumeration.

>
> It looks like you have a race between different components and starting
> the bus before it's needed, no?WCD938x is bit of a odd hw configuration, that is why we are using
Component framework for this codec. This does ensure most part of it.

--srini