2020-11-18 14:52:40

by Nishanth Menon

[permalink] [raw]
Subject: [PATCH] regulator: ti-abb: Fix array out of bound read access on the first transition

At the start of driver initialization, we do not know what bias
setting the bootloader has configured the system for and we only know
for certain the very first time we do a transition.

However, since the initial value of the comparison index is -EINVAL,
this negative value results in an array out of bound access on the
very first transition.

Since we don't know what the setting is, we just set the bias
configuration as there is nothing to compare against. This prevents
the array out of bound access.

NOTE: Even though we could use a more relaxed check of "< 0" the only
valid values(ignoring cosmic ray induced bitflips) are -EINVAL, 0+.

Fixes: 40b1936efebd ("regulator: Introduce TI Adaptive Body Bias(ABB) on-chip LDO driver")
Link: https://lore.kernel.org/linux-mm/CA+G9fYuk4imvhyCN7D7T6PMDH6oNp6HDCRiTUKMQ6QXXjBa4ag@mail.gmail.com/
Reported-by: Naresh Kamboju <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Nishanth Menon <[email protected]>
---

Mark,

I will leave it to your descretion if this needs to be tagged for
stable or to drop the Fixes tag - Side effect, if this occurs, will be
an unstable system very hard to track down - but typically occurring
during system boot - Impacts systems: DM3/OMAP3,4,5,DRA7/AM5x.

I would categorize this as "This could be a problem..." problem..
the bug is an out of bound read, and has been around since v3.11 and is
not a catastrophic data corruption kind of issue.

Though theoretically, there is a possibility that the compare may
pass and result in missing bias configuration(and resulting system
will be unstable), I have'nt heard of actual report (but, it will be
surprising if any actual instability was actually tracked down to this
bug). Any ways, I had to go to git full history to pick the exact
commit - I have left it in the patch.


Arnd,

I have left your reviewed-by from the thread, I can fixup
any further commit message comments if you may have.

drivers/regulator/ti-abb-regulator.c | 12 +++++++++++- 1 file changed,
11 insertions(+), 1 deletion(-)

diff --git a/drivers/regulator/ti-abb-regulator.c b/drivers/regulator/ti-abb-regulator.c
index 3e60bff76194..9f0a4d50cead 100644
--- a/drivers/regulator/ti-abb-regulator.c
+++ b/drivers/regulator/ti-abb-regulator.c
@@ -342,8 +342,17 @@ static int ti_abb_set_voltage_sel(struct regulator_dev *rdev, unsigned sel)
return ret;
}

- /* If data is exactly the same, then just update index, no change */
info = &abb->info[sel];
+ /*
+ * When Linux kernel is starting up, we are'nt sure of the
+ * Bias configuration that bootloader has configured.
+ * So, we get to know the actual setting the first time
+ * we are asked to transition.
+ */
+ if (abb->current_info_idx == -EINVAL)
+ goto just_set_abb;
+
+ /* If data is exactly the same, then just update index, no change */
oinfo = &abb->info[abb->current_info_idx];
if (!memcmp(info, oinfo, sizeof(*info))) {
dev_dbg(dev, "%s: Same data new idx=%d, old idx=%d\n", __func__,
@@ -351,6 +360,7 @@ static int ti_abb_set_voltage_sel(struct regulator_dev *rdev, unsigned sel)
goto out;
}

+just_set_abb:
ret = ti_abb_set_opp(rdev, abb, info);

out:
--
2.29.2


2020-11-18 15:40:34

by Andreas Kemnade

[permalink] [raw]
Subject: Re: [PATCH] regulator: ti-abb: Fix array out of bound read access on the first transition

Hi,

On Wed, 18 Nov 2020 08:50:09 -0600
Nishanth Menon <[email protected]> wrote:

> At the start of driver initialization, we do not know what bias
> setting the bootloader has configured the system for and we only know
> for certain the very first time we do a transition.
>
> However, since the initial value of the comparison index is -EINVAL,
> this negative value results in an array out of bound access on the
> very first transition.
>
> Since we don't know what the setting is, we just set the bias
> configuration as there is nothing to compare against. This prevents
> the array out of bound access.
>
> NOTE: Even though we could use a more relaxed check of "< 0" the only
> valid values(ignoring cosmic ray induced bitflips) are -EINVAL, 0+.
>
> Fixes: 40b1936efebd ("regulator: Introduce TI Adaptive Body Bias(ABB) on-chip LDO driver")
> Link: https://lore.kernel.org/linux-mm/CA+G9fYuk4imvhyCN7D7T6PMDH6oNp6HDCRiTUKMQ6QXXjBa4ag@mail.gmail.com/
> Reported-by: Naresh Kamboju <[email protected]>
> Reviewed-by: Arnd Bergmann <[email protected]>
> Signed-off-by: Nishanth Menon <[email protected]>
> ---
>
> Mark,
>
> I will leave it to your descretion if this needs to be tagged for
> stable or to drop the Fixes tag - Side effect, if this occurs, will be
> an unstable system very hard to track down - but typically occurring
> during system boot - Impacts systems: DM3/OMAP3,4,5,DRA7/AM5x.
>
> I would categorize this as "This could be a problem..." problem..
> the bug is an out of bound read, and has been around since v3.11 and is
> not a catastrophic data corruption kind of issue.
>
> Though theoretically, there is a possibility that the compare may
> pass and result in missing bias configuration(and resulting system
> will be unstable), I have'nt heard of actual report (but, it will be
> surprising if any actual instability was actually tracked down to this
> bug). Any ways, I had to go to git full history to pick the exact
> commit - I have left it in the patch.
>
>
Hmm so probably these boot problems which only occur when your debug
cable is not attached?

Is there any connection with commits like this:
ARM: dts: omap36xx: using OPP1G needs to control the abb_ldo

So would the potential problem be more probable by patches like the
that one mentioned above?

Regards,
Andreas

2020-11-18 16:29:45

by Nishanth Menon

[permalink] [raw]
Subject: Re: [PATCH] regulator: ti-abb: Fix array out of bound read access on the first transition

On 16:38-20201118, Andreas Kemnade wrote:
[...]
> > Though theoretically, there is a possibility that the compare may
> > pass and result in missing bias configuration(and resulting system
> > will be unstable), I have'nt heard of actual report (but, it will be
> > surprising if any actual instability was actually tracked down to this
> > bug). Any ways, I had to go to git full history to pick the exact
> > commit - I have left it in the patch.
> >
> >
> Hmm so probably these boot problems which only occur when your debug
> cable is not attached?

If you mean a timing problem introduced by connecting jtag as in a
timing behavior? short answer: NO.

Long winded answer: To understand the bug, we need to understand what
is being programmed, why is there a comparison and then we understand
why this is a bug.

What is bring programmed:

In certain domains, Transistor in the older TI SoCs in OMAP generation
were doped and had specific voltage planes to allow for very fine
grained tuning of transistor switching frequency to be a factor of
vdd (domain voltage) AND a bias voltage. So, if you attempt to switch
the transistors at a specific frequency with either just the voltage
or bias correctly configured, depending on the age of the transistor
and temperature factors, transistors may start misbehaving. This is
basically like a 101 of transistor physics.

Voltage is the other factor that comes into play - Adaptive voltage
control is yet another scheme on TI (and a lot more other SoCs now -
different names, but fundamentally the same thing..) to balance the
correct level so that the transistors dont go and end up in a thermal
runaway scenario. (but not the context of this discussion)

What is the check in the code?
This is basically a check to prevent reconfiguring the ABB (Body bias)
hardware if the configuration that is present on the hardware is the
same as before. this may be the same index, OR could be that multiple
indices have the same programming configuration.

It is the second part (multiple indices have the same programming
configuration) that is of interest for this bug.

Why: The ABB hardware blocks' Bias programming involves a set of steps
that can take too long OR can put the state of the hardware into
intermediate state which may be unstable as the transition of clock
frequencies OR voltage takes place (this is the transients / short
term intermediate configuration "glitches" that we want to avoid).

Now, the problem happens when you compare index[-EINVAL] basically
array negative offset against a valid array, you are comparing
content, so it is not a timing related problem. if it so happens that
the content compared was same (very very very unlikely that after all
linkage etc.. -EINVAL offset will exactly matchup .. but lets say you
use address randomizer of some sort.. then there is a chance that
some unrelated address content may happen to have the same content as
comparison point, yes..).. anyways, if the compare matches, the logic
wrongly assumes.. hey, this was already a configured ABB bias and
ok, lets skip programming it.. Now you end up with freq and voltage
correct and ABB wrongly configured for the duration.. situation
does'nt exactly recover itself till a mismatches content index is
compared.

So, no, unless we are using debug cable to artifically create the
mismatch, you should'nt see a difference of debug vs no debug cable for
this bug at least.
>
> Is there any connection with commits like this:
> ARM: dts: omap36xx: using OPP1G needs to control the abb_ldo

Side note: Help lookup with commit ID..
341afbc9ea39 ("ARM: dts: omap36xx: using OPP1G needs to control the
abb_ldo")

>
> So would the potential problem be more probable by patches like the
> that one mentioned above?

Short answer: no.

Long winded answer:
That is actually fixing the problem where one part of the problem
(enabling ABB is being done for higher OPPs) - we will also need AVS
class 1.5 for a complete solution. Honestly, I have'nt had time to pay
closer attention to what was going on there, but note that bias along
with the correct voltage is the functional state for the transistor
the correct switching frequency. Remember: these are configurations that
TI puts on chip assuming all worst case operational conditions AND
adding a margin on top.. Very very few devices in the broad spectrum of
usage ever come and operate in that "bad space".

So, we may have some level of fall outs in probably corner of the bell
curve in the part distributions in the process.. Chances are very
high that the typical device most folks have are "nominal" and not in
the tail end of the process distribution, so, most of the users can
probably survive having a "non-optimal" configuration.


So the combination probability: you are operating in a corner process
lot in the worst possible condition + your invalid array index
pointed memory content exactly matches the content of valid index
memory content, hence skipping a bias programming... Ummm.. If I
was a betting man, I'd probably bet against it.. But, hey, be a bit
conservative at times, who knows..I think there are few of these
devices that have flown off to space, right?

--
Regards,
Nishanth Menon
Key (0xDDB5849D1736249D) / Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D

2020-11-18 21:04:27

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH] regulator: ti-abb: Fix array out of bound read access on the first transition

On Wed, 18 Nov 2020 08:50:09 -0600, Nishanth Menon wrote:
> At the start of driver initialization, we do not know what bias
> setting the bootloader has configured the system for and we only know
> for certain the very first time we do a transition.
>
> However, since the initial value of the comparison index is -EINVAL,
> this negative value results in an array out of bound access on the
> very first transition.
>
> [...]

Applied to

https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next

Thanks!

[1/1] regulator: ti-abb: Fix array out of bound read access on the first transition
commit: 2ba546ebe0ce2af47833d8912ced9b4a579f13cb

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark