Amlogic G12B and SM1 devices experience CPU stalls and random board
wedges when the system idles and CPU cores clock down to lower opp
points. Recent vendor kernels include a change to remove 100-250MHz
(with no explanation) [0] but other downstream sources also remove
the 500/667MHz points (also with no explanation). Unless 100-667Mhz
opps are removed or the CPU governor forced to performance, stalls
are observed, so let's remove them an improve stability/uptime.
[0] https://github.com/khadas/linux/commit/20e237a4fe9f0302370e24950cb1416e038eee03
Signed-off-by: Christian Hewitt <[email protected]>
---
Numerous people have experienced this issue and I have tested with
only the low opp-points removed and numerous voltage tweaks: but it
makes no difference. With the opp points present an Odroid N2 or
Khadas VIM3 reliably drop off my network after being left idling
overnight with UART showing a CPU stall splat. With the opp points
removed I see weeks of uninterupted uptime. It's beyond my skills
to research what the cause of the stalls might be, but if anyone
ever figures it out we can always restore things. NB: This issue
is not too widely reported in forums, but that's largely because
most of the Amlogic supporting distros have been including this
change picked from my kernel patchset for some time.
.../boot/dts/amlogic/meson-g12b-a311d.dtsi | 40 -------------------
.../boot/dts/amlogic/meson-g12b-s922x.dtsi | 40 -------------------
arch/arm64/boot/dts/amlogic/meson-sm1.dtsi | 20 ----------
3 files changed, 100 deletions(-)
diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi
index d61f43052a34..8e9ad1e51d66 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi
@@ -11,26 +11,6 @@
compatible = "operating-points-v2";
opp-shared;
- opp-100000000 {
- opp-hz = /bits/ 64 <100000000>;
- opp-microvolt = <731000>;
- };
-
- opp-250000000 {
- opp-hz = /bits/ 64 <250000000>;
- opp-microvolt = <731000>;
- };
-
- opp-500000000 {
- opp-hz = /bits/ 64 <500000000>;
- opp-microvolt = <731000>;
- };
-
- opp-667000000 {
- opp-hz = /bits/ 64 <667000000>;
- opp-microvolt = <731000>;
- };
-
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <761000>;
@@ -71,26 +51,6 @@
compatible = "operating-points-v2";
opp-shared;
- opp-100000000 {
- opp-hz = /bits/ 64 <100000000>;
- opp-microvolt = <731000>;
- };
-
- opp-250000000 {
- opp-hz = /bits/ 64 <250000000>;
- opp-microvolt = <731000>;
- };
-
- opp-500000000 {
- opp-hz = /bits/ 64 <500000000>;
- opp-microvolt = <731000>;
- };
-
- opp-667000000 {
- opp-hz = /bits/ 64 <667000000>;
- opp-microvolt = <731000>;
- };
-
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <731000>;
diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi
index 1e5d0ee5d541..44c23c984034 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi
@@ -11,26 +11,6 @@
compatible = "operating-points-v2";
opp-shared;
- opp-100000000 {
- opp-hz = /bits/ 64 <100000000>;
- opp-microvolt = <731000>;
- };
-
- opp-250000000 {
- opp-hz = /bits/ 64 <250000000>;
- opp-microvolt = <731000>;
- };
-
- opp-500000000 {
- opp-hz = /bits/ 64 <500000000>;
- opp-microvolt = <731000>;
- };
-
- opp-667000000 {
- opp-hz = /bits/ 64 <667000000>;
- opp-microvolt = <731000>;
- };
-
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <731000>;
@@ -76,26 +56,6 @@
compatible = "operating-points-v2";
opp-shared;
- opp-100000000 {
- opp-hz = /bits/ 64 <100000000>;
- opp-microvolt = <751000>;
- };
-
- opp-250000000 {
- opp-hz = /bits/ 64 <250000000>;
- opp-microvolt = <751000>;
- };
-
- opp-500000000 {
- opp-hz = /bits/ 64 <500000000>;
- opp-microvolt = <751000>;
- };
-
- opp-667000000 {
- opp-hz = /bits/ 64 <667000000>;
- opp-microvolt = <751000>;
- };
-
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <771000>;
diff --git a/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi b/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi
index 3c07a89bfd27..80737731af3f 100644
--- a/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi
@@ -95,26 +95,6 @@
compatible = "operating-points-v2";
opp-shared;
- opp-100000000 {
- opp-hz = /bits/ 64 <100000000>;
- opp-microvolt = <730000>;
- };
-
- opp-250000000 {
- opp-hz = /bits/ 64 <250000000>;
- opp-microvolt = <730000>;
- };
-
- opp-500000000 {
- opp-hz = /bits/ 64 <500000000>;
- opp-microvolt = <730000>;
- };
-
- opp-667000000 {
- opp-hz = /bits/ 64 <666666666>;
- opp-microvolt = <750000>;
- };
-
opp-1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <770000>;
--
2.17.1
Christian Hewitt <[email protected]> writes:
> Amlogic G12B and SM1 devices experience CPU stalls and random board
> wedges when the system idles and CPU cores clock down to lower opp
> points. Recent vendor kernels include a change to remove 100-250MHz
> (with no explanation) [0] but other downstream sources also remove
> the 500/667MHz points (also with no explanation). Unless 100-667Mhz
> opps are removed or the CPU governor forced to performance, stalls
> are observed, so let's remove them an improve stability/uptime.
>
> [0] https://github.com/khadas/linux/commit/20e237a4fe9f0302370e24950cb1416e038eee03
hehe, not a very helpful changelog in that khadas kernel commit :(
> Signed-off-by: Christian Hewitt <[email protected]>
> ---
> Numerous people have experienced this issue and I have tested with
> only the low opp-points removed and numerous voltage tweaks: but it
> makes no difference. With the opp points present an Odroid N2 or
> Khadas VIM3 reliably drop off my network after being left idling
> overnight with UART showing a CPU stall splat. With the opp points
> removed I see weeks of uninterupted uptime. It's beyond my skills
> to research what the cause of the stalls might be, but if anyone
> ever figures it out we can always restore things. NB: This issue
> is not too widely reported in forums, but that's largely because
> most of the Amlogic supporting distros have been including this
> change picked from my kernel patchset for some time.
Very interesting. I've also noticed instability across suspend resume
on VIM3/VIM3L and only got as far in debugging to noticing it was
DVFS/OPP related, but didn't get much further yet. I'll give this a try
to see if it helps.
Thanks for finding & posting!
Kevin
Christian Hewitt <[email protected]> writes:
> Amlogic G12B and SM1 devices experience CPU stalls and random board
> wedges when the system idles and CPU cores clock down to lower opp
> points. Recent vendor kernels include a change to remove 100-250MHz
> (with no explanation) [0] but other downstream sources also remove
> the 500/667MHz points (also with no explanation). Unless 100-667Mhz
> opps are removed or the CPU governor forced to performance, stalls
> are observed, so let's remove them an improve stability/uptime.
Just curious: what CPUfreq governor do you use by default for the
LibreELEC kernel?
Your patch greatly improves the stability I'm seeing, but doesn't quite
elimitate it.
I'm testing suspend/resume in a loop on VIM3, and with schedutil
(default) or ondemand, it eventually hangs. With either powersave or
performance it's stable.
Kevin
> On 10 Feb 2022, at 5:31 am, Kevin Hilman <[email protected]> wrote:
>
> Christian Hewitt <[email protected]> writes:
>
>> Amlogic G12B and SM1 devices experience CPU stalls and random board
>> wedges when the system idles and CPU cores clock down to lower opp
>> points. Recent vendor kernels include a change to remove 100-250MHz
>> (with no explanation) [0] but other downstream sources also remove
>> the 500/667MHz points (also with no explanation). Unless 100-667Mhz
>> opps are removed or the CPU governor forced to performance, stalls
>> are observed, so let's remove them an improve stability/uptime.
>
> Just curious: what CPUfreq governor do you use by default for the
> LibreELEC kernel?
LE uses ondemand. One of the original clues on the problem us that the
issue isn’t seen in some of the retro-gaming forks on LE's codebase
which use the performance governor (and overclocks, etc.)
> Your patch greatly improves the stability I'm seeing, but doesn't quite
> elimitate it.
>
> I'm testing suspend/resume in a loop on VIM3, and with schedutil
> (default) or ondemand, it eventually hangs. With either powersave or
> performance it's stable.
>
> Kevin
On 09/02/2022 14:55, Christian Hewitt wrote:
> Amlogic G12B and SM1 devices experience CPU stalls and random board
> wedges when the system idles and CPU cores clock down to lower opp
> points. Recent vendor kernels include a change to remove 100-250MHz
> (with no explanation) [0] but other downstream sources also remove
> the 500/667MHz points (also with no explanation). Unless 100-667Mhz
> opps are removed or the CPU governor forced to performance, stalls
> are observed, so let's remove them an improve stability/uptime.
>
> [0] https://github.com/khadas/linux/commit/20e237a4fe9f0302370e24950cb1416e038eee03
>
> Signed-off-by: Christian Hewitt <[email protected]>
> ---
> Numerous people have experienced this issue and I have tested with
> only the low opp-points removed and numerous voltage tweaks: but it
> makes no difference. With the opp points present an Odroid N2 or
> Khadas VIM3 reliably drop off my network after being left idling
> overnight with UART showing a CPU stall splat. With the opp points
> removed I see weeks of uninterupted uptime. It's beyond my skills
> to research what the cause of the stalls might be, but if anyone
> ever figures it out we can always restore things. NB: This issue
> is not too widely reported in forums, but that's largely because
> most of the Amlogic supporting distros have been including this
> change picked from my kernel patchset for some time.
>
> .../boot/dts/amlogic/meson-g12b-a311d.dtsi | 40 -------------------
> .../boot/dts/amlogic/meson-g12b-s922x.dtsi | 40 -------------------
> arch/arm64/boot/dts/amlogic/meson-sm1.dtsi | 20 ----------
> 3 files changed, 100 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi
> index d61f43052a34..8e9ad1e51d66 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi
> @@ -11,26 +11,6 @@
> compatible = "operating-points-v2";
> opp-shared;
>
> - opp-100000000 {
> - opp-hz = /bits/ 64 <100000000>;
> - opp-microvolt = <731000>;
> - };
> -
> - opp-250000000 {
> - opp-hz = /bits/ 64 <250000000>;
> - opp-microvolt = <731000>;
> - };
> -
> - opp-500000000 {
> - opp-hz = /bits/ 64 <500000000>;
> - opp-microvolt = <731000>;
> - };
> -
> - opp-667000000 {
> - opp-hz = /bits/ 64 <667000000>;
> - opp-microvolt = <731000>;
> - };
> -
> opp-1000000000 {
> opp-hz = /bits/ 64 <1000000000>;
> opp-microvolt = <761000>;
> @@ -71,26 +51,6 @@
> compatible = "operating-points-v2";
> opp-shared;
>
> - opp-100000000 {
> - opp-hz = /bits/ 64 <100000000>;
> - opp-microvolt = <731000>;
> - };
> -
> - opp-250000000 {
> - opp-hz = /bits/ 64 <250000000>;
> - opp-microvolt = <731000>;
> - };
> -
> - opp-500000000 {
> - opp-hz = /bits/ 64 <500000000>;
> - opp-microvolt = <731000>;
> - };
> -
> - opp-667000000 {
> - opp-hz = /bits/ 64 <667000000>;
> - opp-microvolt = <731000>;
> - };
> -
> opp-1000000000 {
> opp-hz = /bits/ 64 <1000000000>;
> opp-microvolt = <731000>;
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi
> index 1e5d0ee5d541..44c23c984034 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi
> @@ -11,26 +11,6 @@
> compatible = "operating-points-v2";
> opp-shared;
>
> - opp-100000000 {
> - opp-hz = /bits/ 64 <100000000>;
> - opp-microvolt = <731000>;
> - };
> -
> - opp-250000000 {
> - opp-hz = /bits/ 64 <250000000>;
> - opp-microvolt = <731000>;
> - };
> -
> - opp-500000000 {
> - opp-hz = /bits/ 64 <500000000>;
> - opp-microvolt = <731000>;
> - };
> -
> - opp-667000000 {
> - opp-hz = /bits/ 64 <667000000>;
> - opp-microvolt = <731000>;
> - };
> -
> opp-1000000000 {
> opp-hz = /bits/ 64 <1000000000>;
> opp-microvolt = <731000>;
> @@ -76,26 +56,6 @@
> compatible = "operating-points-v2";
> opp-shared;
>
> - opp-100000000 {
> - opp-hz = /bits/ 64 <100000000>;
> - opp-microvolt = <751000>;
> - };
> -
> - opp-250000000 {
> - opp-hz = /bits/ 64 <250000000>;
> - opp-microvolt = <751000>;
> - };
> -
> - opp-500000000 {
> - opp-hz = /bits/ 64 <500000000>;
> - opp-microvolt = <751000>;
> - };
> -
> - opp-667000000 {
> - opp-hz = /bits/ 64 <667000000>;
> - opp-microvolt = <751000>;
> - };
> -
> opp-1000000000 {
> opp-hz = /bits/ 64 <1000000000>;
> opp-microvolt = <771000>;
> diff --git a/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi b/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi
> index 3c07a89bfd27..80737731af3f 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi
> @@ -95,26 +95,6 @@
> compatible = "operating-points-v2";
> opp-shared;
>
> - opp-100000000 {
> - opp-hz = /bits/ 64 <100000000>;
> - opp-microvolt = <730000>;
> - };
> -
> - opp-250000000 {
> - opp-hz = /bits/ 64 <250000000>;
> - opp-microvolt = <730000>;
> - };
> -
> - opp-500000000 {
> - opp-hz = /bits/ 64 <500000000>;
> - opp-microvolt = <730000>;
> - };
> -
> - opp-667000000 {
> - opp-hz = /bits/ 64 <666666666>;
> - opp-microvolt = <750000>;
> - };
> -
> opp-1000000000 {
> opp-hz = /bits/ 64 <1000000000>;
> opp-microvolt = <770000>;
Can you find if an acceptable set of Fixes tag can be added to permit backporting to the current LTS kernels ?
Neil
Christian Hewitt <[email protected]> writes:
>> On 10 Feb 2022, at 5:31 am, Kevin Hilman <[email protected]> wrote:
>>
>> Christian Hewitt <[email protected]> writes:
>>
>>> Amlogic G12B and SM1 devices experience CPU stalls and random board
>>> wedges when the system idles and CPU cores clock down to lower opp
>>> points. Recent vendor kernels include a change to remove 100-250MHz
>>> (with no explanation) [0] but other downstream sources also remove
>>> the 500/667MHz points (also with no explanation). Unless 100-667Mhz
>>> opps are removed or the CPU governor forced to performance, stalls
>>> are observed, so let's remove them an improve stability/uptime.
>>
>> Just curious: what CPUfreq governor do you use by default for the
>> LibreELEC kernel?
>
> LE uses ondemand. One of the original clues on the problem us that the
> issue isn’t seen in some of the retro-gaming forks on LE's codebase
> which use the performance governor (and overclocks, etc.)
OK, thanks. And does LE ever do full system suspend/resume? Are things
stable for you across multiple suspend/resume cycles on G12B or SM1
devices?
I'm seeing hat with either powersave or performance, repeated
suspend/resume is stable, but with ondemand or schedultil it's not, even
with $SUBJECT patch applied.
If you have some time to test, seeing how long this loop[1] runs with
ondemand vs performance or powersave would be instructive.
Even more interesting... if I set the governor to performance, but set
the suspend OPP to 1GHz[2] (which is what it would be for the powersave
governor), it is also unstable. This suggests (to me) that any sort of
OPP change during the suspend/resume process is going to be unstable.
Now the challenge is to understand why so we can avoid it.
Thanks,
Kevin
[1]
while true; do
echo "=== SUSPEND ==="
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
cat /sys/devices/system/cpu/cpufreq/policy2/scaling_governor
cat /sys/devices/system/cpu/cpufreq/policy2/scaling_cur_freq
echo rtcwake -d rtc0 -m mem -s4
echo "=== RESUME ==="
sleep 4
done
[2]
diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi
index 1e5d0ee5d541..37da8be85288 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi
@@ -14,6 +14,7 @@ cpu_opp_table_0: opp-table-0 {
opp-100000000 {
opp-hz = /bits/ 64 <100000000>;
opp-microvolt = <731000>;
+ opp-suspend;
};
opp-250000000 {
@@ -79,6 +80,7 @@ cpub_opp_table_1: opp-table-1 {
opp-100000000 {
opp-hz = /bits/ 64 <100000000>;
opp-microvolt = <751000>;
+ opp-suspend;
};
opp-250000000 {