Here is the situation we've met on an ARM SoC:
We have an ARM SoC with dedicated power input for CPU core and supports different
CPU clocks. CPU core power is supplied by external regulator, it's controlled by
PWM channel from the SoC. DTS has node for pwm-regulator with voltage table
(voltage table is used instead of range apparently to use only fine-tuned
duty-cycle values) and with boot-on and always-on properties. This regulator is
bound to cpu node. The pwm-regulator is inited at very early stage before bootloader
in vendor closed-source code.
When a pwm-regulator is probed in the kernel it gets pwm current state and search
voltage table by dutycycle to figure out the current voltage. If that search failed
then the regulator goes to notrecoverbale state and the core sets the minimal power
for the regulator.
The situation: bootloader sets mean cpu power and mean cpu clock.
but that cpu power is not found in the voltage table (value is between table items)
due to different versions of bootloader and kernel and the regulator core sets
the minimal power but cpu clock stays the same. CPU hangs somewhere during boot.
The core problem as I see it is if regulator is bound to CPU (or some other
complex consumer) it can't be changed except by the consumer at any stage. So
the regulator driver (core part) should wait for the own consumer to init
it properly but regulator can't be in unknown state after probing.
What you think? The least should be done is to report about the situation.
George Stark (1):
regulator: core: add warning for not-recoverable state
drivers/regulator/core.c | 4 ++++
1 file changed, 4 insertions(+)
--
2.25.1
On Mon, Jun 10, 2024 at 03:00:24PM +0300, George Stark wrote:
> Here is the situation we've met on an ARM SoC:
Please don't send cover letters for single patches, if there is anything
that needs saying put it in the changelog of the patch or after the ---
if it's administrative stuff. This reduces mail volume and ensures that
any important information is recorded in the changelog rather than being
lost.
On Mon, Jun 10, 2024 at 03:00:24PM +0300, George Stark wrote:
> The situation: bootloader sets mean cpu power and mean cpu clock.
> but that cpu power is not found in the voltage table (value is between table items)
> due to different versions of bootloader and kernel and the regulator core sets
> the minimal power but cpu clock stays the same. CPU hangs somewhere during boot.
Why not just add this OPP to the table the kernel knows about? Clearly
it's something the vendor set and presumably thinks the device can
actually operate at. As far as I can tell you're only having problems
here because you've got a software defined regulator but haven't given
the software information about this configuration so it's got no idea
what's going on when bootstrapping.
> The core problem as I see it is if regulator is bound to CPU (or some other
> complex consumer) it can't be changed except by the consumer at any stage. So
> the regulator driver (core part) should wait for the own consumer to init
> it properly but regulator can't be in unknown state after probing.
If the regulator is configured outside the constraints configured for it
in the binding then the core will bring the regulator within those
constraints, some systems with regulator configurations fixed in
silicon rely on this for correct performance. Regulators with
unreadable hardware are very much an edge case when it comes to this,
what works for one system will be broken for another one so we just have
to pick a behaviour that will hopefully work more often than it breaks.
We can't rely on consumers setting a voltage since consumers are only
expected to set a voltage if they are actively managing it at runtime,
other consumers should rely on system configuration.
Hello Mark
Thanks for the review. Sorry about the separate cover-letter.
On 6/10/24 18:37, Mark Brown wrote:
> On Mon, Jun 10, 2024 at 03:00:24PM +0300, George Stark wrote:
>
>> The situation: bootloader sets mean cpu power and mean cpu clock.
>> but that cpu power is not found in the voltage table (value is between table items)
>> due to different versions of bootloader and kernel and the regulator core sets
>> the minimal power but cpu clock stays the same. CPU hangs somewhere during boot.
>
> Why not just add this OPP to the table the kernel knows about? Clearly
> it's something the vendor set and presumably thinks the device can
> actually operate at. As far as I can tell you're only having problems
> here because you've got a software defined regulator but haven't given
> the software information about this configuration so it's got no idea
> what's going on when bootstrapping.
Actually we did a similar thing - we modified voltage table adding
duty-cycle that was set by bootloader with fractional voltage value that
is not used in opp table - just to make pwm-regulator happy.
But this issue was very hard to find. Due to deviations of hardware
component's characteristics some devices managed to keep working with
minimal voltage till cpu opp driver got probed and appropriate voltage
is restored. Other devices got stuck at different places with random errors.
>
>> The core problem as I see it is if regulator is bound to CPU (or some other
>> complex consumer) it can't be changed except by the consumer at any stage. So
>> the regulator driver (core part) should wait for the own consumer to init
>> it properly but regulator can't be in unknown state after probing.
>
> If the regulator is configured outside the constraints configured for it
> in the binding then the core will bring the regulator within those
> constraints, some systems with regulator configurations fixed in
> silicon rely on this for correct performance. Regulators with
> unreadable hardware are very much an edge case when it comes to this,
> what works for one system will be broken for another one so we just have
> to pick a behaviour that will hopefully work more often than it breaks.
> We can't rely on consumers setting a voltage since consumers are only
> expected to set a voltage if they are actively managing it at runtime,
> other consumers should rely on system configuration.
Of course such a behavior should be configurable. At the other hand it
may be too much changes for a corner case that's why I proposed only a
warning patch just to simplify detecting the problem.
Actually we already have a hint that says voltage is reset:
rdev_info(rdev, "Setting %d-%duV\n",
rdev->constraints->min_uV,
rdev->constraints->max_uV);
but there's no indication this is due to regulator device error.
Should I consider adding my warning only for "system-critical-regulator"
regulators (cpu power regulator is critical indeed)? Although this
property is never used in mainline bindings.
--
Best regards
George
On Tue, Jun 11, 2024 at 12:33:42AM +0300, George Stark wrote:
> Actually we did a similar thing - we modified voltage table adding
> duty-cycle that was set by bootloader with fractional voltage value that is
> not used in opp table - just to make pwm-regulator happy.
> But this issue was very hard to find. Due to deviations of hardware
> component's characteristics some devices managed to keep working with
> minimal voltage till cpu opp driver got probed and appropriate voltage is
> restored. Other devices got stuck at different places with random errors.
I think the diagnostics you are looking for here are in the PWM
regulator driver. The core does announce that it's bringing the voltage
into range when it does it so there's a hint that the voltage got
changed there.
> Of course such a behavior should be configurable. At the other hand it may
> be too much changes for a corner case that's why I proposed only a
> warning patch just to simplify detecting the problem.
> Actually we already have a hint that says voltage is reset:
> rdev_info(rdev, "Setting %d-%duV\n",
> rdev->constraints->min_uV,
> rdev->constraints->max_uV);
> but there's no indication this is due to regulator device error.
The issue is that it may not be an error, it may be normal operation -
there are some regulators (those for some of the Qualcomm firmware
controlled devices for example) which are purely write only so we've got
no way to tell what the current configuration is and always need to
write out what we want during startup.
> Should I consider adding my warning only for "system-critical-regulator"
> regulators (cpu power regulator is critical indeed)? Although this property
> is never used in mainline bindings.
To the extent that it's an issue it's going to depend on the specific
regulator, the driver for a given regulator is best placed to know if
being able to read back from the hardware is expected and if it should
complain about not being able to.