2021-02-22 10:37:25

by Markus Reichl

[permalink] [raw]
Subject: [PATCH] Revert "ARM: dts: exynos: Remove 'opp-shared' from Exynos4412 bus OPP-tables"

This reverts commit a23beead41a18c3be3ca409cb52f35bc02e601b9.

I'm running an Odroid-X2 as headless 24/7 server.
With plain stable 5.10.1 I had 54 up days without problems.
With opp-shared removed on kernels before and now on 5.11
my system freezes after some days on disk activity to eMMC
(rsync, apt upgrade).

The spontaneous hangs are not easy to reproduce but testing this
for several months now I am quite confident that there is something
wrong with this patch.

Signed-off-by: Markus Reichl <[email protected]>
---
arch/arm/boot/dts/exynos4412.dtsi | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/arm/boot/dts/exynos4412.dtsi b/arch/arm/boot/dts/exynos4412.dtsi
index a142fe84010b..6246df278431 100644
--- a/arch/arm/boot/dts/exynos4412.dtsi
+++ b/arch/arm/boot/dts/exynos4412.dtsi
@@ -405,6 +405,7 @@

bus_dmc_opp_table: opp-table1 {
compatible = "operating-points-v2";
+ opp-shared;

opp-100000000 {
opp-hz = /bits/ 64 <100000000>;
@@ -431,6 +432,7 @@

bus_acp_opp_table: opp-table2 {
compatible = "operating-points-v2";
+ opp-shared;

opp-100000000 {
opp-hz = /bits/ 64 <100000000>;
@@ -500,6 +502,7 @@

bus_leftbus_opp_table: opp-table3 {
compatible = "operating-points-v2";
+ opp-shared;

opp-100000000 {
opp-hz = /bits/ 64 <100000000>;
@@ -522,6 +525,7 @@

bus_display_opp_table: opp-table4 {
compatible = "operating-points-v2";
+ opp-shared;

opp-160000000 {
opp-hz = /bits/ 64 <160000000>;
@@ -533,6 +537,7 @@

bus_fsys_opp_table: opp-table5 {
compatible = "operating-points-v2";
+ opp-shared;

opp-100000000 {
opp-hz = /bits/ 64 <100000000>;
@@ -544,6 +549,7 @@

bus_peri_opp_table: opp-table6 {
compatible = "operating-points-v2";
+ opp-shared;

opp-50000000 {
opp-hz = /bits/ 64 <50000000>;
--
2.20.1


2021-02-23 09:36:05

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH] Revert "ARM: dts: exynos: Remove 'opp-shared' from Exynos4412 bus OPP-tables"

Hi Markus,

On 22.02.2021 10:54, Markus Reichl wrote:
> This reverts commit a23beead41a18c3be3ca409cb52f35bc02e601b9.
>
> I'm running an Odroid-X2 as headless 24/7 server.
> With plain stable 5.10.1 I had 54 up days without problems.
> With opp-shared removed on kernels before and now on 5.11
> my system freezes after some days on disk activity to eMMC
> (rsync, apt upgrade).
>
> The spontaneous hangs are not easy to reproduce but testing this
> for several months now I am quite confident that there is something
> wrong with this patch.
>
> Signed-off-by: Markus Reichl <[email protected]>

Thanks for the report.

IMHO a straight revert is a bad idea. I would prefer to keep current opp
definitions and disable the affected devfreq devices (probably right bus
would be enough) or try to identify which transitions are responsible
for that issue. I know that it would take some time to identify them,
but that would be the best solution. Reverting leads to incorrect
hardware description, what in turn confuses the driver and framework,
what in turn hides a real problem.

Another problem related to devfreq on Exynos4412 has been introduced
recently by the commit 86ad9a24f21e ("PM / devfreq: Add required OPPs
support to passive governor"). You can see lots of the messages like
this one:

devfreq soc:bus-acp: failed to update devfreq using passive governor

I didn't have time to check what's wrong there, but I consider devfreq
on Exynos a little bit broken, so another solution would be just to
disable it in the exynos_defconfig.

> ---
> arch/arm/boot/dts/exynos4412.dtsi | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm/boot/dts/exynos4412.dtsi b/arch/arm/boot/dts/exynos4412.dtsi
> index a142fe84010b..6246df278431 100644
> --- a/arch/arm/boot/dts/exynos4412.dtsi
> +++ b/arch/arm/boot/dts/exynos4412.dtsi
> @@ -405,6 +405,7 @@
>
> bus_dmc_opp_table: opp-table1 {
> compatible = "operating-points-v2";
> + opp-shared;
>
> opp-100000000 {
> opp-hz = /bits/ 64 <100000000>;
> @@ -431,6 +432,7 @@
>
> bus_acp_opp_table: opp-table2 {
> compatible = "operating-points-v2";
> + opp-shared;
>
> opp-100000000 {
> opp-hz = /bits/ 64 <100000000>;
> @@ -500,6 +502,7 @@
>
> bus_leftbus_opp_table: opp-table3 {
> compatible = "operating-points-v2";
> + opp-shared;
>
> opp-100000000 {
> opp-hz = /bits/ 64 <100000000>;
> @@ -522,6 +525,7 @@
>
> bus_display_opp_table: opp-table4 {
> compatible = "operating-points-v2";
> + opp-shared;
>
> opp-160000000 {
> opp-hz = /bits/ 64 <160000000>;
> @@ -533,6 +537,7 @@
>
> bus_fsys_opp_table: opp-table5 {
> compatible = "operating-points-v2";
> + opp-shared;
>
> opp-100000000 {
> opp-hz = /bits/ 64 <100000000>;
> @@ -544,6 +549,7 @@
>
> bus_peri_opp_table: opp-table6 {
> compatible = "operating-points-v2";
> + opp-shared;
>
> opp-50000000 {
> opp-hz = /bits/ 64 <50000000>;

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-02-23 23:50:08

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH] Revert "ARM: dts: exynos: Remove 'opp-shared' from Exynos4412 bus OPP-tables"

On Tue, Feb 23, 2021 at 10:24:41AM +0100, Marek Szyprowski wrote:
> Hi Markus,
>
> On 22.02.2021 10:54, Markus Reichl wrote:
> > This reverts commit a23beead41a18c3be3ca409cb52f35bc02e601b9.
> >
> > I'm running an Odroid-X2 as headless 24/7 server.
> > With plain stable 5.10.1 I had 54 up days without problems.
> > With opp-shared removed on kernels before and now on 5.11
> > my system freezes after some days on disk activity to eMMC
> > (rsync, apt upgrade).
> >
> > The spontaneous hangs are not easy to reproduce but testing this
> > for several months now I am quite confident that there is something
> > wrong with this patch.
> >
> > Signed-off-by: Markus Reichl <[email protected]>
>
> Thanks for the report.
>
> IMHO a straight revert is a bad idea. I would prefer to keep current opp
> definitions and disable the affected devfreq devices (probably right bus
> would be enough) or try to identify which transitions are responsible
> for that issue. I know that it would take some time to identify them,
> but that would be the best solution. Reverting leads to incorrect
> hardware description, what in turn confuses the driver and framework,
> what in turn hides a real problem.

I agree with this approach. If devfreq is unusable on that platform,
let's try disabling the exynos-bus nodes. It could be enough to help.
The opp-shared does not look like proper fix for this problem, but
rather a incorrect solution which achieves the same result - disabling
frequency/voltage scaling on some buses.

>
> Another problem related to devfreq on Exynos4412 has been introduced
> recently by the commit 86ad9a24f21e ("PM / devfreq: Add required OPPs
> support to passive governor"). You can see lots of the messages like
> this one:
>
> devfreq soc:bus-acp: failed to update devfreq using passive governor
>
> I didn't have time to check what's wrong there, but I consider devfreq
> on Exynos a little bit broken, so another solution would be just to
> disable it in the exynos_defconfig.

Yes, I saw it as well. However defconfig is only defconfig, so customers
still would be affected and still might report bugs for it. Maybe better
to disable all exynos-bus nodes?

Best regards,
Krzysztof