2023-07-05 14:56:52

by Christopher Obbard

Subject: [PATCH v1 1/2] arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4

Some eMMC modules are unstable on ROCK Pi 4 SBCs when running in HS400
mode. This results in block errors after a while or after a "heavy"
operation utilising the eMMC (e.g. resizing a filesystem). An example of
these errors is as follows:

[ 289.171014] mmc1: running CQE recovery
[ 290.048972] mmc1: running CQE recovery
[ 290.054834] mmc1: running CQE recovery
[ 290.060817] mmc1: running CQE recovery
[ 290.061337] blk_update_request: I/O error, dev mmcblk1, sector 1411072 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
[ 290.061370] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:348: I/O error 10 writing to inode 29547 starting block 176466)
[ 290.061484] Buffer I/O error on device mmcblk1p1, logical block 172288
[ 290.061531] Buffer I/O error on device mmcblk1p1, logical block 172289
[ 290.061551] Buffer I/O error on device mmcblk1p1, logical block 172290
[ 290.061574] Buffer I/O error on device mmcblk1p1, logical block 172291
[ 290.061592] Buffer I/O error on device mmcblk1p1, logical block 172292
[ 290.061615] Buffer I/O error on device mmcblk1p1, logical block 172293
[ 290.061632] Buffer I/O error on device mmcblk1p1, logical block 172294
[ 290.061654] Buffer I/O error on device mmcblk1p1, logical block 172295
[ 290.061673] Buffer I/O error on device mmcblk1p1, logical block 172296
[ 290.061695] Buffer I/O error on device mmcblk1p1, logical block 172297

Disabling the Command Queue seems to stop the CQE recovery from running,
but doesn't seem to improve the I/O errors. Until this can be investigated
further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
errors from occurring.

While we are here, set the eMMC maximum clock frequency to 150MHz to
follow the ROCK 4C+.

Fixes: 1b5715c602fd ("arm64: dts: rockchip: add ROCK Pi 4 DTS support")
Signed-off-by: Christopher Obbard <[email protected]>
---

arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
index 907071d4fe80..95efee311ece 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
@@ -645,9 +645,9 @@ &saradc {
 };
 
 &sdhci {
+	max-frequency = <150000000>;
 	bus-width = <8>;
-	mmc-hs400-1_8v;
-	mmc-hs400-enhanced-strobe;
+	mmc-hs200-1_8v;
 	non-removable;
 	status = "okay";
 };
--
2.40.1



2023-07-05 20:44:15

by Folker Schwesinger

Subject: Re: [PATCH v1 1/2] arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4

On Wed Jul 5, 2023 at 4:42 PM CEST, Christopher Obbard wrote:
> There is some instability with some eMMC modules on ROCK Pi 4 SBCs running
> in HS400 mode. This ends up resulting in some block errors after a while
> or after a "heavy" operation utilising the eMMC (e.g. resizing a
> filesystem). An example of these errors is as follows:
>
> [ 289.171014] mmc1: running CQE recovery
> [ 290.048972] mmc1: running CQE recovery
> [ 290.054834] mmc1: running CQE recovery
> [ 290.060817] mmc1: running CQE recovery
> [ 290.061337] blk_update_request: I/O error, dev mmcblk1, sector 1411072 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
> [ 290.061370] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:348: I/O error 10 writing to inode 29547 starting block 176466)
> [ 290.061484] Buffer I/O error on device mmcblk1p1, logical block 172288
> [ 290.061531] Buffer I/O error on device mmcblk1p1, logical block 172289
> [ 290.061551] Buffer I/O error on device mmcblk1p1, logical block 172290
> [ 290.061574] Buffer I/O error on device mmcblk1p1, logical block 172291
> [ 290.061592] Buffer I/O error on device mmcblk1p1, logical block 172292
> [ 290.061615] Buffer I/O error on device mmcblk1p1, logical block 172293
> [ 290.061632] Buffer I/O error on device mmcblk1p1, logical block 172294
> [ 290.061654] Buffer I/O error on device mmcblk1p1, logical block 172295
> [ 290.061673] Buffer I/O error on device mmcblk1p1, logical block 172296
> [ 290.061695] Buffer I/O error on device mmcblk1p1, logical block 172297
>
> Disabling the Command Queue seems to stop the CQE recovery from running,
> but doesn't seem to improve the I/O errors. Until this can be investigated
> further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
> errors from occurring.
>
> While we are here, set the eMMC maximum clock frequency to 150MHz to
> follow the ROCK 4C+.
>
> Fixes: 1b5715c602fd ("arm64: dts: rockchip: add ROCK Pi 4 DTS support")
> Signed-off-by: Christopher Obbard <[email protected]>
> ---
>
> arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
> index 907071d4fe80..95efee311ece 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
> @@ -645,9 +645,9 @@ &saradc {
>  };
>  
>  &sdhci {
> +	max-frequency = <150000000>;
>  	bus-width = <8>;
> -	mmc-hs400-1_8v;
> -	mmc-hs400-enhanced-strobe;
> +	mmc-hs200-1_8v;
>  	non-removable;
>  	status = "okay";
>  };

Works as advertised on a RockPi 4b v1.3 with kernel 6.1.37.

Tested-by: Folker Schwesinger <[email protected]>

Folker

2023-07-16 04:55:27

by Alban Browaeys

Subject: Re: [PATCH v1 1/2] arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4

On Wednesday, 5 July 2023 at 15:42 +0100, Christopher Obbard wrote:
> There is some instability with some eMMC modules on ROCK Pi 4 SBCs
> running in HS400 mode. This ends up resulting in some block errors
> after a while or after a "heavy" operation utilising the eMMC (e.g.
> resizing a filesystem). An example of these errors is as follows:

I did not report my findings upstream back then (because I was using a
non-vanilla Linux kernel), but on my Armbian install I had bisected this
issue to 06653ebc0ad2e0b7d799cd71a5c2933ed2fb7a66 as the first bad
commit.
I believe it was released in 5.10.60 (the first broken version to reach
Armbian was 5.10.63, coming from a working 5.10.43).
Since then, all the rk3399 boards I have checked have HS400 disabled
(down to HS200, which is stable even with the above commit).
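
For reference, a minimal sketch of what disabling HS400 can look like in a
board file. It assumes the board .dts includes a .dtsi that sets the HS400
properties, and that the eMMC controller node carries the &sdhci label as
on the ROCK Pi 4; both are assumptions for illustration, and other boards
may name things differently.

```dts
/*
 * Hypothetical board-level override. Assumes the included .dtsi
 * enabled HS400 on the node labelled "sdhci".
 */
&sdhci {
	/delete-property/ mmc-hs400-1_8v;
	/delete-property/ mmc-hs400-enhanced-strobe;
	mmc-hs200-1_8v;			/* fall back to HS200 at 1.8 V */
	max-frequency = <150000000>;	/* 150 MHz, as in the patch in this thread */
};
```

When the properties are edited directly in the .dtsi, as the patch in this
thread does, the /delete-property/ lines are not needed.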

commit 06653ebc0ad2e0b7d799cd71a5c2933ed2fb7a66
Author: Dmitry Baryshkov <[email protected]>
Date: Thu May 20 01:12:23 2021 +0300

regulator: core: resolve supply for boot-on/always-on regulators

commit 98e48cd9283dbac0e1445ee780889f10b3d1db6a upstream.

For the boot-on/always-on regulators the set_machine_constrainst() is
called before resolving rdev->supply. Thus the code would try to enable
rdev before enabling supplying regulator. Enforce resolving supply
regulator before enabling rdev.

Fixes: aea6cb99703e ("regulator: resolve supply after creating regulator")
Signed-off-by: Dmitry Baryshkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

drivers/regulator/core.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index f192bf19492ed..e20e77e4c159d 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1425,6 +1425,12 @@ static int set_machine_constraints(struct regulator_dev *rdev)
* and we have control then make sure it is enabled.
*/
if (rdev->constraints->always_on || rdev->constraints->boot_on) {
+ /* If we want to enable this regulator, make sure that we know
+ * the supplying regulator.
+ */
+ if (rdev->supply_name && !rdev->supply)
+ return -EPROBE_DEFER;
+
if (rdev->supply) {
ret = regulator_enable(rdev->supply);
if (ret < 0) {


My findings are here:
https://forum.armbian.com/topic/18855-upgrading-to-bullseye-troubleshooting-armbian-21081/?do=findComment&comment=128793
This was on a Kobol Helios64 rk3399 board.

I also told a user to try this fix (reverting commits 06653ebc0ad2e0b7d799cd71a5c2933ed2fb7a66
and aea6cb99703e17019e025aa71643b4d3e0a24413) on an Armbian kernel for a NanoPC-T4,
and it fixed the issue:
https://forum.armbian.com/topic/20002-nanopc-t4-new-kernel-2202-generates-issues-on-mmc2-and-makes-system-not-properly-working/?do=findComment&comment=138052
This was with a kernel above 5.16.8.





I had high expectations that the commit that fixed the double init would fix the issue for good, but sadly it did not.
I believe this would have been the only fix required for 5.16 kernels, but nowadays a revert is not enough.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/regulator/core.c?id=8a866d527ac0441c0eb14a991fa11358b476b11d

regulator: core: Resolve supply name earlier to prevent double-init
Previously, an unresolved regulator supply reference upon calling
regulator_register on an always-on or boot-on regulator caused
set_machine_constraints to be called twice.

This in turn may initialize the regulator twice, leading to voltage
glitches that are timing-dependent. A simple, unrelated configuration
change may be enough to hide this problem, only to be surfaced by
chance.

One such example is the SD-Card voltage regulator in a NanoPI R4S that
would not initialize reliably unless the registration flow was just
complex enough to allow the regulator to properly reset between calls.

Fix this by re-arranging regulator_register, trying resolve the
regulator's supply early enough that set_machine_constraints does not
need to be called twice.

Signed-off-by: Christian Kohlschütter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

The story behind this patch: https://kohlschuetter.github.io/blog/posts/2022/10/28/linux-nanopi-r4s/

It should have worked, because this patch is basically a revert of
commit aea6cb99703e17019e025aa71643b4d3e0a24413 ("regulator: resolve
supply after creating regulator"), except that it keeps what I believe
is now dead code: the second set_machine_constraints() call under
"if (ret == -EPROBE_DEFER)" is of no use now that the regulator supply
is resolved before the first set_machine_constraints() call in
regulator_register().
The only code left from the 5.10.60 breakage is the -EPROBE_DEFER return
in set_machine_constraints() when the regulator supply is not resolved.
But even after removing this leftover, along with the new EPROBE_DEFER
that was added to set_machine_constraints() for "regulator that have no
direct control", I cannot get rid of the filesystem corruption and
errors with HS400 on 6.3.

Still, I have no clue why double init of the eMMC regulators is fine on
most SoCs but not on rk3399.


Cheers,
Alban