To avoid ACP entering into D3 state during slave enumeration and
initialization on two soundwire controller instances for multiple codecs,
increase the runtime suspend delay to 3 seconds.
Signed-off-by: Vijendar Mukunda <[email protected]>
---
sound/soc/amd/ps/acp63.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/amd/ps/acp63.h b/sound/soc/amd/ps/acp63.h
index 833d0b5aa73d..6c8849f2bcec 100644
--- a/sound/soc/amd/ps/acp63.h
+++ b/sound/soc/amd/ps/acp63.h
@@ -51,7 +51,7 @@
#define MIN_BUFFER MAX_BUFFER
/* time in ms for runtime suspend delay */
-#define ACP_SUSPEND_DELAY_MS 2000
+#define ACP_SUSPEND_DELAY_MS 3000
#define ACP63_DMIC_ADDR 2
#define ACP63_PDM_MODE_DEVS 3
--
2.34.1
On 1/11/23 03:02, Vijendar Mukunda wrote:
> To avoid ACP entering into D3 state during slave enumeration and
> initialization on two soundwire controller instances for multiple codecs,
> increase the runtime suspend delay to 3 seconds.
You have a parent PCI device and a set of child devices for each
manager. The parent PCI device cannot suspend before all its children
are also suspended, so shouldn't the delay be modified at the manager level?
Not getting what this delay is and how this would deal with a lengthy
enumeration/initialization process.
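
A minimal userspace sketch of the parent/child rule described above, assuming a simplified model in which a parent may suspend only when none of its children is active. All names here (model_dev, model_init, model_resume, model_suspend) are hypothetical and are not kernel APIs; the real runtime PM core tracks more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device in a parent/child PM hierarchy. */
struct model_dev {
	struct model_dev *parent;
	int active_children;	/* children that are currently not suspended */
	bool suspended;
};

/* Devices start out suspended with no active children. */
static void model_init(struct model_dev *dev, struct model_dev *parent)
{
	dev->parent = parent;
	dev->active_children = 0;
	dev->suspended = true;
}

/* Resume a device; its parent is resumed first and accounts for it. */
static void model_resume(struct model_dev *dev)
{
	if (!dev->suspended)
		return;
	if (dev->parent) {
		model_resume(dev->parent);
		dev->parent->active_children++;
	}
	dev->suspended = false;
}

/* Try to suspend; refuse while any child is still active. */
static bool model_suspend(struct model_dev *dev)
{
	if (dev->suspended)
		return true;
	if (dev->active_children > 0)
		return false;
	dev->suspended = true;
	if (dev->parent) {
		assert(dev->parent->active_children > 0);
		dev->parent->active_children--;
	}
	return true;
}
```

With one PCI parent and two manager children, model_suspend() on the parent keeps returning false until both managers have suspended, whatever delay the parent itself uses.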
>
> Signed-off-by: Vijendar Mukunda <[email protected]>
> ---
> sound/soc/amd/ps/acp63.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sound/soc/amd/ps/acp63.h b/sound/soc/amd/ps/acp63.h
> index 833d0b5aa73d..6c8849f2bcec 100644
> --- a/sound/soc/amd/ps/acp63.h
> +++ b/sound/soc/amd/ps/acp63.h
> @@ -51,7 +51,7 @@
> #define MIN_BUFFER MAX_BUFFER
>
> /* time in ms for runtime suspend delay */
> -#define ACP_SUSPEND_DELAY_MS 2000
> +#define ACP_SUSPEND_DELAY_MS 3000
>
> #define ACP63_DMIC_ADDR 2
> #define ACP63_PDM_MODE_DEVS 3
On 11/01/23 21:32, Pierre-Louis Bossart wrote:
> On 1/11/23 03:02, Vijendar Mukunda wrote:
>> To avoid ACP entering into D3 state during slave enumeration and
>> initialization on two soundwire controller instances for multiple codecs,
>> increase the runtime suspend delay to 3 seconds.
> You have a parent PCI device and a set of child devices for each
> manager. The parent PCI device cannot suspend before all its children
> are also suspended, so shouldn't the delay be modified at the manager level?
>
> Not getting what this delay is and how this would deal with a lengthy
> enumeration/initialization process.
Yes, agreed. Until the child devices are suspended, the parent device will
remain in D0 state. We will rephrase the commit message.
The machine driver node is created by the ACP PCI driver.
We have added a delay in the machine driver to make sure that the
two manager instances complete codec enumeration and
peripheral initialization before the sound card is registered.
Without the delay in the machine driver, the card is registered
early, before codec initialization is completed, and the manager
enters a bad state due to codec read/write failures.
We intend to keep the ACP in D0 state until the sound card
is created and the jack controls are initialized. To handle this, we
increased the runtime suspend delay at the manager level.
>> Signed-off-by: Vijendar Mukunda <[email protected]>
>> ---
>> sound/soc/amd/ps/acp63.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/sound/soc/amd/ps/acp63.h b/sound/soc/amd/ps/acp63.h
>> index 833d0b5aa73d..6c8849f2bcec 100644
>> --- a/sound/soc/amd/ps/acp63.h
>> +++ b/sound/soc/amd/ps/acp63.h
>> @@ -51,7 +51,7 @@
>> #define MIN_BUFFER MAX_BUFFER
>>
>> /* time in ms for runtime suspend delay */
>> -#define ACP_SUSPEND_DELAY_MS 2000
>> +#define ACP_SUSPEND_DELAY_MS 3000
>>
>> #define ACP63_DMIC_ADDR 2
>> #define ACP63_PDM_MODE_DEVS 3
On 1/12/2023 08:54, Pierre-Louis Bossart wrote:
>
>
> On 1/12/23 05:02, Mukunda,Vijendar wrote:
>> On 11/01/23 21:32, Pierre-Louis Bossart wrote:
>>> On 1/11/23 03:02, Vijendar Mukunda wrote:
>>>> To avoid ACP entering into D3 state during slave enumeration and
>>>> initialization on two soundwire controller instances for multiple codecs,
>>>> increase the runtime suspend delay to 3 seconds.
>>> You have a parent PCI device and a set of child devices for each
>>> manager. The parent PCI device cannot suspend before all its children
>>> are also suspended, so shouldn't the delay be modified at the manager level?
>>>
>>> Not getting what this delay is and how this would deal with a lengthy
>>> enumeration/initialization process.
>> Yes agreed. Until Child devices are suspended, parent device will
>> be in D0 state. We will rephrase the commit message.
>>
>> Machine driver node will be created by ACP PCI driver.
>> We have added delay in machine driver to make sure
>> two manager instances completes codec enumeration and
>> peripheral initialization before registering the sound card.
>> Without adding delay in machine driver will result early card
>> registration before codec initialization is completed. Manager
>> will enter in to bad state due to codec read/write failures.
>> We are intended to keep the ACP in D0 state, till sound card
>> is created and jack controls are initialized. To handle, at manager
>> level increased runtime suspend delay.
>
> This doesn't look too good. You should not assume any timing
> dependencies in the machine driver probe. I made that mistake in earlier
> versions and we had to revisit all this to make sure drivers could be
> bound/unbound at any time.
Rather than a timing dependency, could you perhaps prohibit runtime PM
and have a codec make a callback to indicate it's fully initialized and
then allow runtime PM again?
On 1/12/23 05:02, Mukunda,Vijendar wrote:
> On 11/01/23 21:32, Pierre-Louis Bossart wrote:
>> On 1/11/23 03:02, Vijendar Mukunda wrote:
>>> To avoid ACP entering into D3 state during slave enumeration and
>>> initialization on two soundwire controller instances for multiple codecs,
>>> increase the runtime suspend delay to 3 seconds.
>> You have a parent PCI device and a set of child devices for each
>> manager. The parent PCI device cannot suspend before all its children
>> are also suspended, so shouldn't the delay be modified at the manager level?
>>
>> Not getting what this delay is and how this would deal with a lengthy
>> enumeration/initialization process.
> Yes agreed. Until Child devices are suspended, parent device will
> be in D0 state. We will rephrase the commit message.
>
> Machine driver node will be created by ACP PCI driver.
> We have added delay in machine driver to make sure
> two manager instances completes codec enumeration and
> peripheral initialization before registering the sound card.
> Without adding delay in machine driver will result early card
> registration before codec initialization is completed. Manager
> will enter in to bad state due to codec read/write failures.
> We are intended to keep the ACP in D0 state, till sound card
> is created and jack controls are initialized. To handle, at manager
> level increased runtime suspend delay.
This doesn't look too good. You should not assume any timing
dependencies in the machine driver probe. I made that mistake in earlier
versions and we had to revisit all this to make sure drivers could be
bound/unbound at any time.
On 1/12/23 09:29, Limonciello, Mario wrote:
> On 1/12/2023 08:54, Pierre-Louis Bossart wrote:
>>
>>
>> On 1/12/23 05:02, Mukunda,Vijendar wrote:
>>> On 11/01/23 21:32, Pierre-Louis Bossart wrote:
>>>> On 1/11/23 03:02, Vijendar Mukunda wrote:
>>>>> To avoid ACP entering into D3 state during slave enumeration and
>>>>> initialization on two soundwire controller instances for multiple
>>>>> codecs,
>>>>> increase the runtime suspend delay to 3 seconds.
>>>> You have a parent PCI device and a set of child devices for each
>>>> manager. The parent PCI device cannot suspend before all its children
>>>> are also suspended, so shouldn't the delay be modified at the
>>>> manager level?
>>>>
>>>> Not getting what this delay is and how this would deal with a lengthy
>>>> enumeration/initialization process.
>>> Yes agreed. Until Child devices are suspended, parent device will
>>> be in D0 state. We will rephrase the commit message.
>>>
>>> Machine driver node will be created by ACP PCI driver.
>>> We have added delay in machine driver to make sure
>>> two manager instances completes codec enumeration and
>>> peripheral initialization before registering the sound card.
>>> Without adding delay in machine driver will result early card
>>> registration before codec initialization is completed. Manager
>>> will enter in to bad state due to codec read/write failures.
>>> We are intended to keep the ACP in D0 state, till sound card
>>> is created and jack controls are initialized. To handle, at manager
>>> level increased runtime suspend delay.
>>
>> This doesn't look too good. You should not assume any timing
>> dependencies in the machine driver probe. I made that mistake in earlier
>> versions and we had to revisit all this to make sure drivers could be
>> bound/unbound at any time.
>
> Rather than a timing dependency, could you perhaps prohibit runtime PM
> and have a codec make a callback to indicate it's fully initialized and
> then allow runtime PM again?
We already have enumeration and initialization 'struct completion' that
are used by codec drivers to know if the hardware is usable. We also
have pm_runtime_get_sync() in the bus layer to make sure the codec is
resumed before being accessed.
The explanations above confuse card registration and manager
probe/initialization. These are two different things. Maybe there's
indeed a missing part in the SoundWire PM assumptions, but I am not
getting what the issue is.
On 12/01/23 21:35, Pierre-Louis Bossart wrote:
>
> On 1/12/23 09:29, Limonciello, Mario wrote:
>> On 1/12/2023 08:54, Pierre-Louis Bossart wrote:
>>>
>>> On 1/12/23 05:02, Mukunda,Vijendar wrote:
>>>> On 11/01/23 21:32, Pierre-Louis Bossart wrote:
>>>>> On 1/11/23 03:02, Vijendar Mukunda wrote:
>>>>>> To avoid ACP entering into D3 state during slave enumeration and
>>>>>> initialization on two soundwire controller instances for multiple
>>>>>> codecs,
>>>>>> increase the runtime suspend delay to 3 seconds.
>>>>> You have a parent PCI device and a set of child devices for each
>>>>> manager. The parent PCI device cannot suspend before all its children
>>>>> are also suspended, so shouldn't the delay be modified at the
>>>>> manager level?
>>>>>
>>>>> Not getting what this delay is and how this would deal with a lengthy
>>>>> enumeration/initialization process.
>>>> Yes agreed. Until Child devices are suspended, parent device will
>>>> be in D0 state. We will rephrase the commit message.
>>>>
>>>> Machine driver node will be created by ACP PCI driver.
>>>> We have added delay in machine driver to make sure
>>>> two manager instances completes codec enumeration and
>>>> peripheral initialization before registering the sound card.
>>>> Without adding delay in machine driver will result early card
>>>> registration before codec initialization is completed. Manager
>>>> will enter in to bad state due to codec read/write failures.
>>>> We are intended to keep the ACP in D0 state, till sound card
>>>> is created and jack controls are initialized. To handle, at manager
>>>> level increased runtime suspend delay.
>>> This doesn't look too good. You should not assume any timing
>>> dependencies in the machine driver probe. I made that mistake in earlier
>>> versions and we had to revisit all this to make sure drivers could be
>>> bound/unbound at any time.
>> Rather than a timing dependency, could you perhaps prohibit runtime PM
>> and have a codec make a callback to indicate it's fully initialized and
>> then allow runtime PM again?
> We already have enumeration and initialization 'struct completion' that
> are used by codec drivers to know if the hardware is usable. We also
> have pm_runtime_get_sync() is the bus layer to make sure the codec is
> resumed before being accessed.
Instead of walking through the codec list and checking the completion
status for every codec on the link, can we have a solution where, once
all codecs are enumerated and initialized, a variable in the bus instance
is updated to indicate that all peripherals are initialized? Then we
could check this variable in the machine driver.
>
> The explanations above confuse card registration and manager
> probe/initialization. These are two different things. Maybe there's
> indeed a missing part in the SoundWire PM assumptions, but I am not
> getting what the issue is.
We will rephrase the commit message.
At manager level we want to increase the delay to 3s.
>>>>>>> increase the runtime suspend delay to 3 seconds.
>>>>>> You have a parent PCI device and a set of child devices for each
>>>>>> manager. The parent PCI device cannot suspend before all its children
>>>>>> are also suspended, so shouldn't the delay be modified at the
>>>>>> manager level?
>>>>>>
>>>>>> Not getting what this delay is and how this would deal with a lengthy
>>>>>> enumeration/initialization process.
>>>>> Yes agreed. Until Child devices are suspended, parent device will
>>>>> be in D0 state. We will rephrase the commit message.
>>>>>
>>>>> Machine driver node will be created by ACP PCI driver.
>>>>> We have added delay in machine driver to make sure
>>>>> two manager instances completes codec enumeration and
>>>>> peripheral initialization before registering the sound card.
>>>>> Without adding delay in machine driver will result early card
>>>>> registration before codec initialization is completed. Manager
>>>>> will enter in to bad state due to codec read/write failures.
>>>>> We are intended to keep the ACP in D0 state, till sound card
>>>>> is created and jack controls are initialized. To handle, at manager
>>>>> level increased runtime suspend delay.
>>>> This doesn't look too good. You should not assume any timing
>>>> dependencies in the machine driver probe. I made that mistake in earlier
>>>> versions and we had to revisit all this to make sure drivers could be
>>>> bound/unbound at any time.
>>> Rather than a timing dependency, could you perhaps prohibit runtime PM
>>> and have a codec make a callback to indicate it's fully initialized and
>>> then allow runtime PM again?
>> We already have enumeration and initialization 'struct completion' that
>> are used by codec drivers to know if the hardware is usable. We also
>> have pm_runtime_get_sync() is the bus layer to make sure the codec is
>> resumed before being accessed.
> Instead of walking through codec list and checking completion status
> for every codec over the link, can we have some solution where once
> all codecs gets enumerated and initialized, a variable in bus instance
> will be updated to know all peripherals initialized. So that we can
> check this variable in machine driver.
No, because the bus cannot know for sure what codecs to expect on the
platform.
This comes from the design: we first create a bunch of devices based on
ACPI information, which causes the drivers to probe. Then when the bus
starts, codecs that are physically present on the bus will attach and be
initialized in the update_status callback.
It's perfectly acceptable for devices to be exposed in ACPI and not be
present on a board. The bus wouldn't know what is needed.
I am still not clear on what the "early card registration" issue might be.
Can you clarify which codec registers are accessed in that case, and
whether those are supposed to be managed with regmap? One possibility is
that we need to make sure the codec drivers are in regmap cache_only mode
at probe time; that may prevent this sort of uncontrolled register
access. I had a PR on this that I haven't touched in a while, see [1].
I do recall some issues with the codec jacks, where if the card
registration happens too late the codec might have suspended. But we
added pm_runtime_resume_and_get in the set_jack_detect callbacks, so
that was solved.
[1] https://github.com/thesofproject/linux/pull/3941
On Fri, Jan 13, 2023 at 11:33:09AM -0600, Pierre-Louis Bossart wrote:
> I do recall some issues with the codec jacks, where if the card
> registration happens too late the codec might have suspended. But we
> added pm_runtime_resume_and_get in the set_jack_detect callbacks, so
> that was solved.
Right, I would expect that whatever needs the device to be powered on
would be explicitly ensuring that this is done rather than tweaking
timeouts - the timeouts should be more of a performance thing to avoid
bouncing power too much, not a correctness thing.
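
A minimal userspace sketch of the distinction drawn above, assuming a simplified model in which a device suspends only when no explicit reference is held and the autosuspend delay has expired. The names (pm_model, pm_model_get, pm_model_put, pm_model_tick) and the simulated millisecond clock are hypothetical; this is not the kernel runtime PM API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of runtime PM state for one device. */
struct pm_model {
	int usage_count;	/* explicit references held by users */
	long last_busy_ms;	/* time of the last get/put */
	long delay_ms;		/* autosuspend delay */
	bool suspended;
};

/* Take a reference: the device is powered and stays powered. */
static void pm_model_get(struct pm_model *pm, long now_ms)
{
	pm->usage_count++;
	pm->suspended = false;
	pm->last_busy_ms = now_ms;
}

/* Drop a reference; the delay starts counting from here. */
static void pm_model_put(struct pm_model *pm, long now_ms)
{
	assert(pm->usage_count > 0);
	pm->usage_count--;
	pm->last_busy_ms = now_ms;
}

/* Periodic check: suspend only when idle AND the delay has expired. */
static void pm_model_tick(struct pm_model *pm, long now_ms)
{
	if (pm->usage_count == 0 &&
	    now_ms - pm->last_busy_ms >= pm->delay_ms)
		pm->suspended = true;
}
```

In this model a 3000 ms delay still lets the device suspend if initialization takes 3500 ms with no reference held, while a held reference keeps it powered for any duration. The delay then only decides how soon power is dropped after the last put.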
On 14/01/23 01:27, Mark Brown wrote:
> On Fri, Jan 13, 2023 at 11:33:09AM -0600, Pierre-Louis Bossart wrote:
>
>> I do recall some issues with the codec jacks, where if the card
>> registration happens too late the codec might have suspended. But we
>> added pm_runtime_resume_and_get in the set_jack_detect callbacks, so
>> that was solved.
> Right, I would expect that whatever needs the device to be powered on
> would be explicitly ensuring that this is done rather than tweaking
> timeouts - the timeouts should be more of a performance thing to avoid
> bouncing power too much, not a correctness thing.
The machine driver probe is executed in parallel with the manager driver
probe sequence. Because of this, if card registration completes before
all peripherals are enumerated across the multiple links, codec register
writes will fail because the codec device numbers are not yet assigned.
If we understood correctly, your suggestion is that we shouldn't use any
time bounds in the machine driver probe sequence and that, before
registering the sound card, we need to traverse the peripheral
initialization completion status for all the managers.
On 1/16/23 02:35, Mukunda,Vijendar wrote:
> On 14/01/23 01:27, Mark Brown wrote:
>> On Fri, Jan 13, 2023 at 11:33:09AM -0600, Pierre-Louis Bossart wrote:
>>
>>> I do recall some issues with the codec jacks, where if the card
>>> registration happens too late the codec might have suspended. But we
>>> added pm_runtime_resume_and_get in the set_jack_detect callbacks, so
>>> that was solved.
>> Right, I would expect that whatever needs the device to be powered on
>> would be explicitly ensuring that this is done rather than tweaking
>> timeouts - the timeouts should be more of a performance thing to avoid
>> bouncing power too much, not a correctness thing.
> Machine driver probe is executed in parallel with Manager driver
> probe sequence. Because of it, before completion of all peripherals
> enumeration across the multiple links, if card registration is
> completed, codec register writes will fail as Codec device numbers
> are not assigned.
>
> If we understood correctly, as per your suggestion, We shouldn't use any
> time bounds in machine driver probe sequence and before registering the
> sound card, need to traverses through all peripheral initialization completion
> status for all the managers.
What's not clear in your reply is this:
What codec registers are accessed as a result of the machine driver
probe and card registration, and in what part of the card registration?
Are we talking about SoundWire 'standard' registers for device/port
management, about vendor-specific ones that are exposed to userspace, or
about vendor-specific ones entirely configured by the driver/regmap?
You've got to give us more data or understanding of the sequence to
help. Saying there's a race condition doesn't really help if there's
nothing that explains what codec registers are accessed and when.
On 16/01/23 20:32, Pierre-Louis Bossart wrote:
>
> On 1/16/23 02:35, Mukunda,Vijendar wrote:
>> On 14/01/23 01:27, Mark Brown wrote:
>>> On Fri, Jan 13, 2023 at 11:33:09AM -0600, Pierre-Louis Bossart wrote:
>>>
>>>> I do recall some issues with the codec jacks, where if the card
>>>> registration happens too late the codec might have suspended. But we
>>>> added pm_runtime_resume_and_get in the set_jack_detect callbacks, so
>>>> that was solved.
>>> Right, I would expect that whatever needs the device to be powered on
>>> would be explicitly ensuring that this is done rather than tweaking
>>> timeouts - the timeouts should be more of a performance thing to avoid
>>> bouncing power too much, not a correctness thing.
>> Machine driver probe is executed in parallel with Manager driver
>> probe sequence. Because of it, before completion of all peripherals
>> enumeration across the multiple links, if card registration is
>> completed, codec register writes will fail as Codec device numbers
>> are not assigned.
>>
>> If we understood correctly, as per your suggestion, We shouldn't use any
>> time bounds in machine driver probe sequence and before registering the
>> sound card, need to traverses through all peripheral initialization completion
>> status for all the managers.
> What's not clear in your reply is this:
>
> What codec registers are accessed as a result of the machine driver
> probe and card registration, and in what part of the card registration?
>
> Are we talking about SoundWire 'standard' registers for device/port
> management, about vendor specific ones that are exposed to userspace, or
> vendor-specific ones entirely configured by the driver/regmap.
>
> You've got to give us more data or understanding of the sequence to
> help. Saying there's a race condition doesn't really help if there's
> nothing that explains what codec registers are accessed and when.
We have come across a race condition where sound card registration
succeeds before codec enumeration across all the links has completed,
and our manager instance goes into a bad state.
Please refer to the link below for the error logs.
https://pastebin.com/ZYEN928S
On 1/17/23 05:33, Mukunda,Vijendar wrote:
> On 16/01/23 20:32, Pierre-Louis Bossart wrote:
>>
>> On 1/16/23 02:35, Mukunda,Vijendar wrote:
>>> On 14/01/23 01:27, Mark Brown wrote:
>>>> On Fri, Jan 13, 2023 at 11:33:09AM -0600, Pierre-Louis Bossart wrote:
>>>>
>>>>> I do recall some issues with the codec jacks, where if the card
>>>>> registration happens too late the codec might have suspended. But we
>>>>> added pm_runtime_resume_and_get in the set_jack_detect callbacks, so
>>>>> that was solved.
>>>> Right, I would expect that whatever needs the device to be powered on
>>>> would be explicitly ensuring that this is done rather than tweaking
>>>> timeouts - the timeouts should be more of a performance thing to avoid
>>>> bouncing power too much, not a correctness thing.
>>> Machine driver probe is executed in parallel with Manager driver
>>> probe sequence. Because of it, before completion of all peripherals
>>> enumeration across the multiple links, if card registration is
>>> completed, codec register writes will fail as Codec device numbers
>>> are not assigned.
>>>
>>> If we understood correctly, as per your suggestion, We shouldn't use any
>>> time bounds in machine driver probe sequence and before registering the
>>> sound card, need to traverses through all peripheral initialization completion
>>> status for all the managers.
>> What's not clear in your reply is this:
>>
>> What codec registers are accessed as a result of the machine driver
>> probe and card registration, and in what part of the card registration?
>>
>> Are we talking about SoundWire 'standard' registers for device/port
>> management, about vendor specific ones that are exposed to userspace, or
>> vendor-specific ones entirely configured by the driver/regmap.
>>
>> You've got to give us more data or understanding of the sequence to
>> help. Saying there's a race condition doesn't really help if there's
>> nothing that explains what codec registers are accessed and when.
> We have come across a race condition, where sound card registration
> is successful before codec enumerations across all the links gets completed
> and our manager instance going into bad state.
>
> Please refer below link for error logs.
> https://pastebin.com/ZYEN928S
You have two RT1316 register areas that are accessed while the codec is
not even enumerated:
[ 2.755828] rt1316-sdca sdw:0:025d:1316:01:0: ASoC: error at
snd_soc_component_update_bits on sdw:0:025d:1316:01:0 for register:
[0x41080100] -22
[ 2.758904] rt1316-sdca sdw:0:025d:1316:01:0: ASoC: error at
snd_soc_component_update_bits on sdw:0:025d:1316:01:0 for register:
[0x00003004] -110
The last one is clearly listed in the regmap list.
You probably want to reverse-engineer what causes these accesses.
I see this suspicious kcontrol definition that might be related:
SOC_SINGLE("Left I Tag Select", 0x3004, 4, 7, 0),
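
For reference, the two error codes in the log correspond on Linux to EINVAL (-22) and ETIMEDOUT (-110). The sketch below shows which bits of register 0x3004 such a control would touch, assuming the mask is the smallest all-ones value covering max, shifted into place (here shift 4, max 7). The helper names bits_for_max, field_mask and update_bits are hypothetical userspace stand-ins, not the ASoC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent max (same idea as the kernel's fls()). */
static unsigned int bits_for_max(unsigned int max)
{
	unsigned int bits = 0;

	while (max) {
		bits++;
		max >>= 1;
	}
	return bits;
}

/* Mask covering a control field of the given shift and maximum value. */
static uint32_t field_mask(unsigned int shift, unsigned int max)
{
	unsigned int bits = bits_for_max(max);

	assert(bits < 32 && shift < 32);
	return ((1u << bits) - 1u) << shift;
}

/* Read-modify-write: replace only the masked bits of old with val. */
static uint32_t update_bits(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}
```

Under that assumption, field_mask(4, 7) yields 0x70, so setting the control is a read-modify-write of bits 6:4, which needs a reachable device unless the register cache can satisfy it.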
On Tue, Jan 17, 2023 at 05:51:03AM -0600, Pierre-Louis Bossart wrote:
> On 1/17/23 05:33, Mukunda,Vijendar wrote:
> [ 2.758904] rt1316-sdca sdw:0:025d:1316:01:0: ASoC: error at
> snd_soc_component_update_bits on sdw:0:025d:1316:01:0 for register:
> [0x00003004] -110
> The last one is clearly listed in the regmap list.
> You probably want to reverse-engineer what causes these accesses.
> I see this suspicious kcontrol definition that might be related:
> SOC_SINGLE("Left I Tag Select", 0x3004, 4, 7, 0),
Looks like a case for putting the CODEC in cache only mode...
On 1/17/23 06:16, Mark Brown wrote:
> On Tue, Jan 17, 2023 at 05:51:03AM -0600, Pierre-Louis Bossart wrote:
>> On 1/17/23 05:33, Mukunda,Vijendar wrote:
>
>> [ 2.758904] rt1316-sdca sdw:0:025d:1316:01:0: ASoC: error at
>> snd_soc_component_update_bits on sdw:0:025d:1316:01:0 for register:
>> [0x00003004] -110
>
>> The last one is clearly listed in the regmap list.
>
>> You probably want to reverse-engineer what causes these accesses.
>> I see this suspicious kcontrol definition that might be related:
>
>> SOC_SINGLE("Left I Tag Select", 0x3004, 4, 7, 0),
>
> Looks like a case for putting the CODEC in cache only mode...
Right, and I think we'd need to do this during the probe instead of the
hardware initialization (which could happen at a later time).
I started a PR to try and improve regmap handling, see
https://github.com/thesofproject/linux/pull/3941
I was trying to solve the case where codecs become unattached, but
apparently the problem is hardware-related. One of the suggested
improvements was to move the cache_only part earlier to prevent such
accesses. Unfortunately the work isn't complete so that PR is just a
draft at the moment.
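
A minimal userspace sketch of the cache-only idea discussed above, assuming a simplified model: while cache_only is set, writes land in a cache and are marked dirty, and they reach the hardware only on a later sync once the codec is attached. The names (reg_model, model_write, model_sync) are hypothetical. In the kernel the corresponding regmap calls are regcache_cache_only(), regcache_mark_dirty() and regcache_sync(), which this model does not use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NREGS 4

/* Hypothetical model of a register map with a write cache. */
struct reg_model {
	uint32_t cache[MODEL_NREGS];
	uint32_t hw[MODEL_NREGS];	/* stands in for the codec registers */
	bool dirty[MODEL_NREGS];
	bool cache_only;		/* set at probe, cleared on attach */
	bool attached;			/* codec enumerated on the bus */
};

/* Returns 0 on success, -1 if the hardware is needed but not attached. */
static int model_write(struct reg_model *m, unsigned int reg, uint32_t val)
{
	assert(reg < MODEL_NREGS);
	if (m->cache_only) {
		m->cache[reg] = val;
		m->dirty[reg] = true;
		return 0;
	}
	if (!m->attached)
		return -1;
	m->hw[reg] = val;
	m->cache[reg] = val;
	return 0;
}

/* Flush all dirty cached values to the hardware once it is attached. */
static int model_sync(struct reg_model *m)
{
	unsigned int reg;

	if (!m->attached)
		return -1;
	for (reg = 0; reg < MODEL_NREGS; reg++) {
		if (m->dirty[reg]) {
			m->hw[reg] = m->cache[reg];
			m->dirty[reg] = false;
		}
	}
	return 0;
}
```

In this model an early control write fails when cache_only is off and the codec is not attached, and succeeds when cache_only is set from probe time. The value is then written out by model_sync() after attach.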