2023-03-10 12:13:34

by Álvaro Fernández Rojas

[permalink] [raw]
Subject: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with EHCI/OHCI:
[ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
[ 3.895011] Reserved instruction in kernel code[#1]:
[ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
[ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
[ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
[ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
[ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
[ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
[ 3.932848] $20 : 00000000 00000000 55590000 77d70000
[ 3.938251] $24 : 00000018 00000010
[ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
[ 3.949058] Hi : 00000000
[ 3.952013] Lo : 00000000
[ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
[ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
[ 3.965913] Status: 10008703 KERNEL EXL IE
[ 3.970216] Cause : 00800028 (ExcCode 0a)
[ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
[ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
[ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
[ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
[ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
[ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
[ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
[ 4.044196] ...
[ 4.046706] Call Trace:
[ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
[ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
[ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
[ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
[ 4.069036]
[ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
[ 4.080686]
[ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
[ 4.087374] Kernel panic - not syncing: Fatal exception
[ 4.092753] Rebooting in 1 seconds..

Signed-off-by: Álvaro Fernández Rojas <[email protected]>
---
arch/mips/bmips/dma.c | 5 +++++
arch/mips/bmips/setup.c | 5 +++++
2 files changed, 10 insertions(+)

diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index 33788668cbdb..3769d5efdede 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -5,11 +5,16 @@
#include <asm/bmips.h>
#include <asm/io.h>

+int bmips_dma_sync_enabled = 1;
+
void arch_sync_dma_for_cpu_all(void)
{
void __iomem *cbr = BMIPS_GET_CBR();
u32 cfg;

+ if (!bmips_dma_sync_enabled)
+ return;
+
if (boot_cpu_type() != CPU_BMIPS3300 &&
boot_cpu_type() != CPU_BMIPS4350 &&
boot_cpu_type() != CPU_BMIPS4380)
diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
index e95b3f78e7cd..825e85e14010 100644
--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -35,6 +35,8 @@
#define REG_BCM6328_OTP ((void __iomem *)CKSEG1ADDR(0x1000062c))
#define BCM6328_TP1_DISABLED BIT(9)

+extern int bmips_dma_sync_enabled;
+
static const unsigned long kbase = VMLINUX_LOAD_ADDRESS & 0xfff00000;

struct bmips_quirk {
@@ -104,6 +106,9 @@ static void bcm6358_quirks(void)
* disable SMP for now
*/
bmips_smp_enabled = 0;
+
+ /* ARCH_HAS_SYNC_DMA_FOR_CPU_ALL causes kernel panics on BCM6358 */
+ bmips_dma_sync_enabled = 0;
}

static void bcm6368_quirks(void)
--
2.30.2



2023-03-11 17:32:36

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()



On 3/10/2023 4:13 AM, Álvaro Fernández Rojas wrote:
> arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with EHCI/OHCI:
> [ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
> [ 3.895011] Reserved instruction in kernel code[#1]:
> [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
> [ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
> [ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
> [ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
> [ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
> [ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
> [ 3.932848] $20 : 00000000 00000000 55590000 77d70000
> [ 3.938251] $24 : 00000018 00000010
> [ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
> [ 3.949058] Hi : 00000000
> [ 3.952013] Lo : 00000000
> [ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
> [ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
> [ 3.965913] Status: 10008703 KERNEL EXL IE
> [ 3.970216] Cause : 00800028 (ExcCode 0a)
> [ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
> [ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
> [ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
> [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
> [ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
> [ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
> [ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
> [ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
> [ 4.044196] ...
> [ 4.046706] Call Trace:
> [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
> [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
> [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
> [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
> [ 4.069036]
> [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
> [ 4.080686]
> [ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
> [ 4.087374] Kernel panic - not syncing: Fatal exception
> [ 4.092753] Rebooting in 1 seconds..

Did you pinpoint which specific instruction within
arch_sync_dma_for_cpu_all() is causing the reserved instruction exception?

>
> Signed-off-by: Álvaro Fernández Rojas <[email protected]>
> ---
> arch/mips/bmips/dma.c | 5 +++++
> arch/mips/bmips/setup.c | 5 +++++
> 2 files changed, 10 insertions(+)
>
> diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
> index 33788668cbdb..3769d5efdede 100644
> --- a/arch/mips/bmips/dma.c
> +++ b/arch/mips/bmips/dma.c
> @@ -5,11 +5,16 @@
> #include <asm/bmips.h>
> #include <asm/io.h>
>
> +int bmips_dma_sync_enabled = 1;

bool bmips_dma_sync_disabled;

> +
> void arch_sync_dma_for_cpu_all(void)
> {
> void __iomem *cbr = BMIPS_GET_CBR();
> u32 cfg;
>
> + if (!bmips_dma_sync_enabled)
> + return;

if (bmips_dma_sync_disabled)

> +
> if (boot_cpu_type() != CPU_BMIPS3300 &&
> boot_cpu_type() != CPU_BMIPS4350 &&
> boot_cpu_type() != CPU_BMIPS4380)
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index e95b3f78e7cd..825e85e14010 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -35,6 +35,8 @@
> #define REG_BCM6328_OTP ((void __iomem *)CKSEG1ADDR(0x1000062c))
> #define BCM6328_TP1_DISABLED BIT(9)
>
> +extern int bmips_dma_sync_enabled;
> +
> static const unsigned long kbase = VMLINUX_LOAD_ADDRESS & 0xfff00000;
>
> struct bmips_quirk {
> @@ -104,6 +106,9 @@ static void bcm6358_quirks(void)
> * disable SMP for now
> */
> bmips_smp_enabled = 0;
> +
> + /* ARCH_HAS_SYNC_DMA_FOR_CPU_ALL causes kernel panics on BCM6358 */
> + bmips_dma_sync_enabled = 0;

bmips_dma_sync_disabled = true;

> }
>
> static void bcm6368_quirks(void)

--
Florian

2023-03-11 19:44:27

by Jonas Gorski

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

On Sat, 11 Mar 2023 at 18:32, Florian Fainelli <[email protected]> wrote:
>
>
>
> On 3/10/2023 4:13 AM, Álvaro Fernández Rojas wrote:
> > arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with EHCI/OHCI:
> > [ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
> > [ 3.895011] Reserved instruction in kernel code[#1]:
> > [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
> > [ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
> > [ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
> > [ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
> > [ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
> > [ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
> > [ 3.932848] $20 : 00000000 00000000 55590000 77d70000
> > [ 3.938251] $24 : 00000018 00000010
> > [ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
> > [ 3.949058] Hi : 00000000
> > [ 3.952013] Lo : 00000000
> > [ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
> > [ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
> > [ 3.965913] Status: 10008703 KERNEL EXL IE
> > [ 3.970216] Cause : 00800028 (ExcCode 0a)
> > [ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
> > [ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
> > [ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
> > [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
> > [ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
> > [ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
> > [ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
> > [ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
> > [ 4.044196] ...
> > [ 4.046706] Call Trace:
> > [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
> > [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
> > [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
> > [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
> > [ 4.069036]
> > [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
> > [ 4.080686]
> > [ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
> > [ 4.087374] Kernel panic - not syncing: Fatal exception
> > [ 4.092753] Rebooting in 1 seconds..
>
> Did you pinpoint which specific instruction within
> arch_sync_dma_for_cpu_all() is causing the reserved instruction exception?

It's setup_sigcontext(), not arch_sync_dma_for_cpu_all() that's
causing the exception ;-)

Hand decoding the Code gives me

lw $1, 0xb4($fp)
or $v0, 0, 0
addiu $a0, $s1, 8
sw $v0, 0($a0) <- the code in brackets, so I guess EPC?
sw $v1, 4($a0)

which I assume is this part:

err |= __put_user(regs->cp0_epc, &sc->sc_pc);

(0xb4 is the offset of cp0_epc, 0x8 the offset of sc_pc)

One thing I see is that we do the RAC flush for BMIPS3300, 4350 and
4380, but only initialize it for 3300 [1], but leave it at whatever
state the bootloader did for the other ones. Maybe it has some invalid
config in (that particuar?) 6358 that triggers issues later on after a
flush? E.g. the flush puts it in an error state, and the next time
something triggers a prefetch(write?) (by trying to access userspace)
it generates an error exception.

Just spit balling though.

[1] https://elixir.bootlin.com/linux/latest/source/arch/mips/kernel/smp-bmips.c#L587

Jonas

2023-03-12 07:31:53

by William Zhang

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()



On 03/11/2023 11:44 AM, Jonas Gorski wrote:
> On Sat, 11 Mar 2023 at 18:32, Florian Fainelli <[email protected]> wrote:
>>
>>
>>
>> On 3/10/2023 4:13 AM, Álvaro Fernández Rojas wrote:
>>> arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with EHCI/OHCI:
>>> [ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
>>> [ 3.895011] Reserved instruction in kernel code[#1]:
>>> [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
>>> [ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
>>> [ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
>>> [ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
>>> [ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
>>> [ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
>>> [ 3.932848] $20 : 00000000 00000000 55590000 77d70000
>>> [ 3.938251] $24 : 00000018 00000010
>>> [ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
>>> [ 3.949058] Hi : 00000000
>>> [ 3.952013] Lo : 00000000
>>> [ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
>>> [ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
>>> [ 3.965913] Status: 10008703 KERNEL EXL IE
>>> [ 3.970216] Cause : 00800028 (ExcCode 0a)
>>> [ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
>>> [ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
>>> [ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
>>> [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
>>> [ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
>>> [ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
>>> [ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
>>> [ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
>>> [ 4.044196] ...
>>> [ 4.046706] Call Trace:
>>> [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
>>> [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
>>> [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
>>> [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
>>> [ 4.069036]
>>> [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
>>> [ 4.080686]
>>> [ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
>>> [ 4.087374] Kernel panic - not syncing: Fatal exception
>>> [ 4.092753] Rebooting in 1 seconds..
>>
>> Did you pinpoint which specific instruction within
>> arch_sync_dma_for_cpu_all() is causing the reserved instruction exception?
>
> It's setup_sigcontext(), not arch_sync_dma_for_cpu_all() that's
> causing the exception ;-)
>
> Hand decoding the Code gives me
>
> lw $1, 0xb4($fp)
> or $v0, 0, 0
> addiu $a0, $s1, 8
> sw $v0, 0($a0) <- the code in brackets, so I guess EPC?
> sw $v1, 4($a0)
>
> which I assume is this part:
>
> err |= __put_user(regs->cp0_epc, &sc->sc_pc);
>
> (0xb4 is the offset of cp0_epc, 0x8 the offset of sc_pc)
>
> One thing I see is that we do the RAC flush for BMIPS3300, 4350 and
> 4380, but only initialize it for 3300 [1], but leave it at whatever
> state the bootloader did for the other ones. Maybe it has some invalid
> config in (that particuar?) 6358 that triggers issues later on after a
> flush? E.g. the flush puts it in an error state, and the next time
> something triggers a prefetch(write?) (by trying to access userspace)
> it generates an error exception.
>
Depending on the bootloader but likely bootloader does not use RAC at
all. So agree that RAC may not be properly initialized when the flush
function is called and push the stale data to corrupt memory and cause
problem later on the userspace.

> Just spit balling though.
>
> [1] https://elixir.bootlin.com/linux/latest/source/arch/mips/kernel/smp-bmips.c#L587
>
> Jonas
>


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature

2023-03-12 16:19:08

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()



On 3/11/2023 11:31 PM, William Zhang wrote:
>
>
> On 03/11/2023 11:44 AM, Jonas Gorski wrote:
>> On Sat, 11 Mar 2023 at 18:32, Florian Fainelli <[email protected]>
>> wrote:
>>>
>>>
>>>
>>> On 3/10/2023 4:13 AM, Álvaro Fernández Rojas wrote:
>>>> arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with
>>>> EHCI/OHCI:
>>>> [    3.881739] usb 1-1: new high-speed USB device number 2 using
>>>> ehci-platform
>>>> [    3.895011] Reserved instruction in kernel code[#1]:
>>>> [    3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
>>>> [    3.905829] $ 0   : 00000000 10008700 00000000 77d94060
>>>> [    3.911238] $ 4   : 7fd1f088 00000000 81431cac 81431ca0
>>>> [    3.916641] $ 8   : 00000000 ffffefff 8075cd34 00000000
>>>> [    3.922043] $12   : 806f8d40 f3e812b7 00000000 000d9aaa
>>>> [    3.927446] $16   : 7fd1f068 7fd1f080 7ff559b8 81428470
>>>> [    3.932848] $20   : 00000000 00000000 55590000 77d70000
>>>> [    3.938251] $24   : 00000018 00000010
>>>> [    3.943655] $28   : 81430000 81431e60 81431f28 800157fc
>>>> [    3.949058] Hi    : 00000000
>>>> [    3.952013] Lo    : 00000000
>>>> [    3.955019] epc   : 80015808 setup_sigcontext+0x54/0x24c
>>>> [    3.960464] ra    : 800157fc setup_sigcontext+0x48/0x24c
>>>> [    3.965913] Status: 10008703       KERNEL EXL IE
>>>> [    3.970216] Cause : 00800028 (ExcCode 0a)
>>>> [    3.974340] PrId  : 0002a010 (Broadcom BMIPS4350)
>>>> [    3.979170] Modules linked in: ohci_platform ohci_hcd
>>>> fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug
>>>> usbcore nls_base usb_common
>>>> [    3.992907] Process init (pid: 1, threadinfo=(ptrval),
>>>> task=(ptrval), tls=77e22ec8)
>>>> [    4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068
>>>> 81431edc 7ff559b8 81428470
>>>> [    4.009467]         81431f28 7fd1f080 55590000 77d70000 77d5498c
>>>> 80015c70 806f0000 8063ae74
>>>> [    4.018149]         08100002 81431f28 0000000a 08100002 81431f28
>>>> 0000000a 77d6b418 00000003
>>>> [    4.026831]         ffffffff 80016414 80080734 81431ecc 81431ecc
>>>> 00000001 00000000 04000000
>>>> [    4.035512]         77d54874 00000000 00000000 00000000 00000000
>>>> 00000012 00000002 00000000
>>>> [    4.044196]         ...
>>>> [    4.046706] Call Trace:
>>>> [    4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
>>>> [    4.054356] [<80015c70>] setup_frame+0xdc/0x124
>>>> [    4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
>>>> [    4.064207] [<80011b50>] work_notifysig+0x10/0x18
>>>> [    4.069036]
>>>> [    4.070538] Code: 8fc300b4  00001025  26240008 <ac820000>
>>>> ac830004  3c048063  0c0228aa  24846a00  26240010
>>>> [    4.080686]
>>>> [    4.082517] ---[ end trace 22a8edb41f5f983b ]---
>>>> [    4.087374] Kernel panic - not syncing: Fatal exception
>>>> [    4.092753] Rebooting in 1 seconds..
>>>
>>> Did you pinpoint which specific instruction within
>>> arch_sync_dma_for_cpu_all() is causing the reserved instruction
>>> exception?
>>
>> It's setup_sigcontext(), not arch_sync_dma_for_cpu_all() that's
>> causing the exception ;-)
>>
>> Hand decoding the Code gives me
>>
>> lw $1, 0xb4($fp)
>> or $v0, 0, 0
>> addiu $a0, $s1, 8
>> sw $v0, 0($a0) <- the code in brackets, so I guess EPC?
>> sw $v1, 4($a0)
>>
>> which I assume is this part:
>>
>> err |= __put_user(regs->cp0_epc, &sc->sc_pc);
>>
>> (0xb4 is the offset of cp0_epc, 0x8 the offset of sc_pc)
>>
>> One thing I see is that we do the RAC flush for BMIPS3300, 4350 and
>> 4380, but only initialize it for 3300 [1], but leave it at whatever
>> state the bootloader did for the other ones. Maybe it has some invalid
>> config in (that particuar?) 6358 that triggers issues later on after a
>> flush? E.g. the flush puts it in an error state, and the next time
>> something triggers a prefetch(write?) (by trying to access userspace)
>> it generates an error exception.
>>
> Depending on the bootloader but likely bootloader does not use RAC at
> all.  So agree that RAC may not be properly initialized when the flush
> function is called and push the stale data to corrupt memory and cause
> problem later on the userspace.

Alvaro, could you do the following and let us know the results, at boot
time in bcm6358_quirks():

- issue a read_c0_brcm_config_0() and look whether bit 29 (RAC present)
is set, if it is not set, then we should forcibly disable the use of the
RAC using a flag

- issue a __raw_readl(cbr + BMIPS_RAC_CONFIG) and check whether bits 0
(RAC_I) or 1 (RAC_D) are set, if not, try to set them and see whether it
works

Thanks!
--
Florian

2023-03-12 18:50:36

by Álvaro Fernández Rojas

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

Hi Florian,

I tried what you suggested but it stil panics on EHCI:

[ 0.000000] Linux version 5.15.98 (noltari@atlantis)
(mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
[ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
[ 0.000000] bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x3c1b8041
[ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)

It looks like bit 29 is set so RAC should be present.
And RAC_I seems to be set, but not RAC_D...

BTW, this is what I added to bmips_cpu_setup:

case CPU_BMIPS4350:
cfg = read_c0_brcm_config_0();
pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);

cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
pr_info("bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x%x\n", cfg);
__raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
__raw_readl(cbr + BMIPS_RAC_CONFIG);
break;

Best regards,
Álvaro.


El dom, 12 mar 2023 a las 17:16, Florian Fainelli
(<[email protected]>) escribió:
>
>
>
> On 3/11/2023 11:31 PM, William Zhang wrote:
> >
> >
> > On 03/11/2023 11:44 AM, Jonas Gorski wrote:
> >> On Sat, 11 Mar 2023 at 18:32, Florian Fainelli <[email protected]>
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 3/10/2023 4:13 AM, Álvaro Fernández Rojas wrote:
> >>>> arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with
> >>>> EHCI/OHCI:
> >>>> [ 3.881739] usb 1-1: new high-speed USB device number 2 using
> >>>> ehci-platform
> >>>> [ 3.895011] Reserved instruction in kernel code[#1]:
> >>>> [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
> >>>> [ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
> >>>> [ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
> >>>> [ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
> >>>> [ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
> >>>> [ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
> >>>> [ 3.932848] $20 : 00000000 00000000 55590000 77d70000
> >>>> [ 3.938251] $24 : 00000018 00000010
> >>>> [ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
> >>>> [ 3.949058] Hi : 00000000
> >>>> [ 3.952013] Lo : 00000000
> >>>> [ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
> >>>> [ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
> >>>> [ 3.965913] Status: 10008703 KERNEL EXL IE
> >>>> [ 3.970216] Cause : 00800028 (ExcCode 0a)
> >>>> [ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
> >>>> [ 3.979170] Modules linked in: ohci_platform ohci_hcd
> >>>> fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug
> >>>> usbcore nls_base usb_common
> >>>> [ 3.992907] Process init (pid: 1, threadinfo=(ptrval),
> >>>> task=(ptrval), tls=77e22ec8)
> >>>> [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068
> >>>> 81431edc 7ff559b8 81428470
> >>>> [ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c
> >>>> 80015c70 806f0000 8063ae74
> >>>> [ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28
> >>>> 0000000a 77d6b418 00000003
> >>>> [ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc
> >>>> 00000001 00000000 04000000
> >>>> [ 4.035512] 77d54874 00000000 00000000 00000000 00000000
> >>>> 00000012 00000002 00000000
> >>>> [ 4.044196] ...
> >>>> [ 4.046706] Call Trace:
> >>>> [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
> >>>> [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
> >>>> [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
> >>>> [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
> >>>> [ 4.069036]
> >>>> [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000>
> >>>> ac830004 3c048063 0c0228aa 24846a00 26240010
> >>>> [ 4.080686]
> >>>> [ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
> >>>> [ 4.087374] Kernel panic - not syncing: Fatal exception
> >>>> [ 4.092753] Rebooting in 1 seconds..
> >>>
> >>> Did you pinpoint which specific instruction within
> >>> arch_sync_dma_for_cpu_all() is causing the reserved instruction
> >>> exception?
> >>
> >> It's setup_sigcontext(), not arch_sync_dma_for_cpu_all() that's
> >> causing the exception ;-)
> >>
> >> Hand decoding the Code gives me
> >>
> >> lw $1, 0xb4($fp)
> >> or $v0, 0, 0
> >> addiu $a0, $s1, 8
> >> sw $v0, 0($a0) <- the code in brackets, so I guess EPC?
> >> sw $v1, 4($a0)
> >>
> >> which I assume is this part:
> >>
> >> err |= __put_user(regs->cp0_epc, &sc->sc_pc);
> >>
> >> (0xb4 is the offset of cp0_epc, 0x8 the offset of sc_pc)
> >>
> >> One thing I see is that we do the RAC flush for BMIPS3300, 4350 and
> >> 4380, but only initialize it for 3300 [1], but leave it at whatever
> >> state the bootloader did for the other ones. Maybe it has some invalid
> >> config in (that particuar?) 6358 that triggers issues later on after a
> >> flush? E.g. the flush puts it in an error state, and the next time
> >> something triggers a prefetch(write?) (by trying to access userspace)
> >> it generates an error exception.
> >>
> > Depending on the bootloader but likely bootloader does not use RAC at
> > all. So agree that RAC may not be properly initialized when the flush
> > function is called and push the stale data to corrupt memory and cause
> > problem later on the userspace.
>
> Alvaro, could you do the following and let us know the results, at boot
> time in bcm6358_quirks():
>
> - issue a read_c0_brcm_config_0() and look whether bit 29 (RAC present)
> is set, if it is not set, then we should forcibly disable the use of the
> RAC using a flag
>
> - issue a __raw_readl(cbr + BMIPS_RAC_CONFIG) and check whether bits 0
> (RAC_I) or 1 (RAC_D) are set, if not, try to set them and see whether it
> works
>
> Thanks!
> --
> Florian

2023-03-13 17:37:32

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

On 3/12/23 11:50, Álvaro Fernández Rojas wrote:
> Hi Florian,
>
> I tried what you suggested but it stil panics on EHCI:
>
> [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
> (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
> 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
> [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
> [ 0.000000] bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x3c1b8041
> [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
>
> It looks like bit 29 is set so RAC should be present.
> And RAC_I seems to be set, but not RAC_D...
>
> BTW, this is what I added to bmips_cpu_setup:
>
> case CPU_BMIPS4350:
> cfg = read_c0_brcm_config_0();
> pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);
>
> cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
> pr_info("bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x%x\n", cfg);
> __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
> __raw_readl(cbr + BMIPS_RAC_CONFIG);
> break;

Thanks for running those experiments, I cannot explain what you are
seeing, so there must be some sort of erratum applicable to the
BMIPS4380 revision used on the 6358 somehow...

If you can make the suggested change to use negative logic in order to
disable the RAC flushing, that would work for me, also maybe add a
Fixes: tag so it gets backported to stable trees?

Thanks!
--
Florian


2023-03-13 21:39:40

by Álvaro Fernández Rojas

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

Hi Florian,

I did another test changing from TP1 to TP0 and this is the result:
[ 0.000000] Linux version 5.15.98 (noltari@atlantis)
(mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
[ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
[ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x2a00015
[ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)

And there were no exceptions with EHCI/OHCI as opposed to TP1.
So the issue is only happening when booting from TP1.
Maybe it's due to the fact that BCM6358 has a shared TLB?

Maybe the correct way of solving the issue would be by adding the
following code at bcm6358_quirks():
if (read_c0_brcm_cmt_local() & (1 << 31))
bmips_dma_sync_disabled = 1;

BTW, if I understood it correctly, you want me to reverse the logic,
so bmips_dma_sync_disabled instead of bmips_dma_sync_enabled.
Is this correct?

Best regards,
Álvaro.


El lun, 13 mar 2023 a las 18:37, Florian Fainelli
(<[email protected]>) escribió:
>
> On 3/12/23 11:50, Álvaro Fernández Rojas wrote:
> > Hi Florian,
> >
> > I tried what you suggested but it stil panics on EHCI:
> >
> > [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
> > (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
> > 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
> > [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
> > [ 0.000000] bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x3c1b8041
> > [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
> >
> > It looks like bit 29 is set so RAC should be present.
> > And RAC_I seems to be set, but not RAC_D...
> >
> > BTW, this is what I added to bmips_cpu_setup:
> >
> > case CPU_BMIPS4350:
> > cfg = read_c0_brcm_config_0();
> > pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);
> >
> > cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
> > pr_info("bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x%x\n", cfg);
> > __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
> > __raw_readl(cbr + BMIPS_RAC_CONFIG);
> > break;
>
> Thanks for running those experiments, I cannot explain what you are
> seeing, so there must be some sort of erratum applicable to the
> BMIPS4380 revision used on the 6358 somehow...
>
> If you can make the suggested change to use negative logic in order to
> disable the RAC flushing, that would work for me, also maybe add a
> Fixes: tag so it gets backported to stable trees?
>
> Thanks!
> --
> Florian
>

2023-03-13 21:46:44

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

(please don't top post)

On 3/13/23 14:39, Álvaro Fernández Rojas wrote:
> Hi Florian,
>
> I did another test changing from TP1 to TP0 and this is the result:
> [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
> (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
> 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
> [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
> [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x2a00015
> [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
>
> And there were no exceptions with EHCI/OHCI as opposed to TP1.
> So the issue is only happening when booting from TP1.

Ah, that explains it then, I was just about to ask you which TP was the
kernel booted on.

> Maybe it's due to the fact that BCM6358 has a shared TLB?

I think it has to do with the fact that the BMIPS_RAC_CONFIG_1 is likely
not enabling the RAC since that register pertains to TP1, could you dump
its contents, and if they do not set bit 0 and/or 1, please set them and
try again and see whether it works any better? The RAC provides
substantial performance improvements, it would be a change to keep it
disabled.

>
> Maybe the correct way of solving the issue would be by adding the
> following code at bcm6358_quirks():
> if (read_c0_brcm_cmt_local() & (1 << 31))
> bmips_dma_sync_disabled = 1;
>
> BTW, if I understood it correctly, you want me to reverse the logic,
> so bmips_dma_sync_disabled instead of bmips_dma_sync_enabled.
> Is this correct?

Yes, I want the logic such that we need to set a variable to 1/true
rather setting one to 0, less change to get it wrong IMHO.

>
> Best regards,
> Álvaro.
>
>
> El lun, 13 mar 2023 a las 18:37, Florian Fainelli
> (<[email protected]>) escribió:
>>
>> On 3/12/23 11:50, Álvaro Fernández Rojas wrote:
>>> Hi Florian,
>>>
>>> I tried what you suggested but it stil panics on EHCI:
>>>
>>> [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
>>> (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
>>> 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
>>> [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
>>> [ 0.000000] bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x3c1b8041
>>> [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
>>>
>>> It looks like bit 29 is set so RAC should be present.
>>> And RAC_I seems to be set, but not RAC_D...
>>>
>>> BTW, this is what I added to bmips_cpu_setup:
>>>
>>> case CPU_BMIPS4350:
>>> cfg = read_c0_brcm_config_0();
>>> pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);
>>>
>>> cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
>>> pr_info("bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x%x\n", cfg);
>>> __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
>>> __raw_readl(cbr + BMIPS_RAC_CONFIG);
>>> break;
>>
>> Thanks for running those experiments, I cannot explain what you are
>> seeing, so there must be some sort of erratum applicable to the
>> BMIPS4380 revision used on the 6358 somehow...
>>
>> If you can make the suggested change to use negative logic in order to
>> disable the RAC flushing, that would work for me, also maybe add a
>> Fixes: tag so it gets backported to stable trees?
>>
>> Thanks!
>> --
>> Florian
>>

--
Florian


2023-03-14 18:15:05

by Álvaro Fernández Rojas

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

Hi Florian,

El lun, 13 mar 2023 a las 22:46, Florian Fainelli
(<[email protected]>) escribió:
>
> (please don't top post)
>
> On 3/13/23 14:39, Álvaro Fernández Rojas wrote:
> > Hi Florian,
> >
> > I did another test changing from TP1 to TP0 and this is the result:
> > [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
> > (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
> > 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
> > [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
> > [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x2a00015
> > [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
> >
> > And there were no exceptions with EHCI/OHCI as opposed to TP1.
> > So the issue is only happening when booting from TP1.
>
> Ah, that explains it then, I was just about to ask you which TP was the
> kernel booted on.
>
> > Maybe it's due to the fact that BCM6358 has a shared TLB?
>
> I think it has to do with the fact that the BMIPS_RAC_CONFIG_1 is likely
> not enabling the RAC since that register pertains to TP1, could you dump
> its contents, and if they do not set bit 0 and/or 1, please set them and
> try again and see whether it works any better? The RAC provides
> substantial performance improvements, it would be a change to keep it
> disabled.

This is the code that I added to bmips_cpu_setup():
case CPU_BMIPS4350:
cfg = read_c0_brcm_cmt_local();
pr_info("bmips_cpu_setup: read_c0_brcm_cmt_local() = 0x%x\n", cfg);

cfg = read_c0_brcm_config_0();
pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);

cfg = __raw_readl(cbr + BMIPS_RAC_ADDRESS_RANGE);
pr_info("bmips_cpu_setup: BMIPS_RAC_ADDRESS_RANGE = 0x%x\n", cfg);

cfg = __raw_readl(cbr + BMIPS_L2_CONFIG);
pr_info("bmips_cpu_setup: BMIPS_L2_CONFIG = 0x%x\n", cfg);

cfg = __raw_readl(cbr + BMIPS_LMB_CONTROL);
pr_info("bmips_cpu_setup: BMIPS_LMB_CONTROL = 0x%x\n", cfg);

cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x%x\n", cfg);
__raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x%x\n", cfg);
__raw_readl(cbr + BMIPS_RAC_CONFIG);

cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG_1);
pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x%x\n", cfg);
__raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG_1);
pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x%x\n", cfg);
__raw_readl(cbr + BMIPS_RAC_CONFIG_1);
break;

And this is the result:
[ 0.000000] bmips_cpu_setup: read_c0_brcm_cmt_local() = 0x80000000
[ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
[ 0.000000] bmips_cpu_setup: BMIPS_RAC_ADDRESS_RANGE = 0x277bdab0
[ 0.000000] bmips_cpu_setup: BMIPS_L2_CONFIG = 0x241a0008
[ 0.000000] bmips_cpu_setup: BMIPS_LMB_CONTROL = 0x0
[ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x3c1b8041
[ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x3c1b8041
[ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x3600008
[ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x3600008

As you can see the bit's aren't set and all the registers appear to
have strange values and not the usual ones when initialized by the
bootloader...

>
> >
> > Maybe the correct way of solving the issue would be by adding the
> > following code at bcm6358_quirks():
> > if (read_c0_brcm_cmt_local() & (1 << 31))
> > bmips_dma_sync_disabled = 1;
> >
> > BTW, if I understood it correctly, you want me to reverse the logic,
> > so bmips_dma_sync_disabled instead of bmips_dma_sync_enabled.
> > Is this correct?
>
> Yes, I want the logic such that we need to set a variable to 1/true
> rather setting one to 0, less change to get it wrong IMHO.
>
> >
> > Best regards,
> > Álvaro.
> >
> >
> > El lun, 13 mar 2023 a las 18:37, Florian Fainelli
> > (<[email protected]>) escribió:
> >>
> >> On 3/12/23 11:50, Álvaro Fernández Rojas wrote:
> >>> Hi Florian,
> >>>
> >>> I tried what you suggested but it stil panics on EHCI:
> >>>
> >>> [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
> >>> (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
> >>> 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
> >>> [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
> >>> [ 0.000000] bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x3c1b8041
> >>> [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
> >>>
> >>> It looks like bit 29 is set so RAC should be present.
> >>> And RAC_I seems to be set, but not RAC_D...
> >>>
> >>> BTW, this is what I added to bmips_cpu_setup:
> >>>
> >>> case CPU_BMIPS4350:
> >>> cfg = read_c0_brcm_config_0();
> >>> pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);
> >>>
> >>> cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
> >>> pr_info("bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x%x\n", cfg);
> >>> __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
> >>> __raw_readl(cbr + BMIPS_RAC_CONFIG);
> >>> break;
> >>
> >> Thanks for running those experiments, I cannot explain what you are
> >> seeing, so there must be some sort of erratum applicable to the
> >> BMIPS4380 revision used on the 6358 somehow...
> >>
> >> If you can make the suggested change to use negative logic in order to
> >> disable the RAC flushing, that would work for me, also maybe add a
> >> Fixes: tag so it gets backported to stable trees?
> >>
> >> Thanks!
> >> --
> >> Florian
> >>
>
> --
> Florian
>

Álvaro

2023-03-16 17:41:05

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

On 3/14/23 11:14, Álvaro Fernández Rojas wrote:
> Hi Florian,
>
> El lun, 13 mar 2023 a las 22:46, Florian Fainelli
> (<[email protected]>) escribió:
>>
>> (please don't top post)
>>
>> On 3/13/23 14:39, Álvaro Fernández Rojas wrote:
>>> Hi Florian,
>>>
>>> I did another test changing from TP1 to TP0 and this is the result:
>>> [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
>>> (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
>>> 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
>>> [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
>>> [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x2a00015
>>> [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
>>>
>>> And there were no exceptions with EHCI/OHCI as opposed to TP1.
>>> So the issue is only happening when booting from TP1.
>>
>> Ah, that explains it then, I was just about to ask you which TP was the
>> kernel booted on.
>>
>>> Maybe it's due to the fact that BCM6358 has a shared TLB?
>>
>> I think it has to do with the fact that the BMIPS_RAC_CONFIG_1 is likely
>> not enabling the RAC since that register pertains to TP1, could you dump
>> its contents, and if they do not set bit 0 and/or 1, please set them and
>> try again and see whether it works any better? The RAC provides
>> substantial performance improvements, it would be a change to keep it
>> disabled.
>
> This is the code that I added to bmips_cpu_setup():
> case CPU_BMIPS4350:
> cfg = read_c0_brcm_cmt_local();
> pr_info("bmips_cpu_setup: read_c0_brcm_cmt_local() = 0x%x\n", cfg);
>
> cfg = read_c0_brcm_config_0();
> pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);
>
> cfg = __raw_readl(cbr + BMIPS_RAC_ADDRESS_RANGE);
> pr_info("bmips_cpu_setup: BMIPS_RAC_ADDRESS_RANGE = 0x%x\n", cfg);
>
> cfg = __raw_readl(cbr + BMIPS_L2_CONFIG);
> pr_info("bmips_cpu_setup: BMIPS_L2_CONFIG = 0x%x\n", cfg);
>
> cfg = __raw_readl(cbr + BMIPS_LMB_CONTROL);
> pr_info("bmips_cpu_setup: BMIPS_LMB_CONTROL = 0x%x\n", cfg);
>
> cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
> pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x%x\n", cfg);
> __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
> pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x%x\n", cfg);
> __raw_readl(cbr + BMIPS_RAC_CONFIG);
>
> cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG_1);
> pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x%x\n", cfg);
> __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG_1);
> pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x%x\n", cfg);
> __raw_readl(cbr + BMIPS_RAC_CONFIG_1);
> break;
>
> And this is the result:
> [ 0.000000] bmips_cpu_setup: read_c0_brcm_cmt_local() = 0x80000000

OK, so we are executing from TP1, but we knew that already.

> [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006

bit 31: instruction cache enabled
bit 30: data cache enabled
bit 29: RAC present
bit 25: DSU power enabled
bit 24: data cache power enabled
bit 19: low-latency memory bus (LMB) present
bit 18: concurrent multi threading (CMT) present
bit 17: reserved
bit 12: split instruction cache
bit 2: number of Hi/Lo special registers - 1
bit 1: eDSP present

This seems to match the recommended and default values

> [ 0.000000] bmips_cpu_setup: BMIPS_RAC_ADDRESS_RANGE = 0x277bdab0

That does not look intended, the reset value is supposed to be 0. Can
you apply the same values as the ones programmed in the CPU_BMIPS_3300 case?

> [ 0.000000] bmips_cpu_setup: BMIPS_L2_CONFIG = 0x241a0008


> [ 0.000000] bmips_cpu_setup: BMIPS_LMB_CONTROL = 0x0

LMB not enabled, that's OK.

> [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x3c1b8041
> [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x3c1b8041
> [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x3600008
> [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x3600008
>
> As you can see the bit's aren't set and all the registers appear to
> have strange values and not the usual ones when initialized by the
> bootloader...

Yes, so maybe the best way to go about is indeed to go with your change
such that if we are running on TP1, it is safe to assume that CFE has
not done any sensible initialization, and we have a non functional RAC.

>
>>
>>>
>>> Maybe the correct way of solving the issue would be by adding the
>>> following code at bcm6358_quirks():
>>> if (read_c0_brcm_cmt_local() & (1 << 31))
>>> bmips_dma_sync_disabled = 1;
>>>
>>> BTW, if I understood it correctly, you want me to reverse the logic,
>>> so bmips_dma_sync_disabled instead of bmips_dma_sync_enabled.
>>> Is this correct?
>>
>> Yes, I want the logic such that we need to set a variable to 1/true
>> rather setting one to 0, less change to get it wrong IMHO.
>>
>>>
>>> Best regards,
>>> Álvaro.
>>>
>>>
>>> El lun, 13 mar 2023 a las 18:37, Florian Fainelli
>>> (<[email protected]>) escribió:
>>>>
>>>> On 3/12/23 11:50, Álvaro Fernández Rojas wrote:
>>>>> Hi Florian,
>>>>>
>>>>> I tried what you suggested but it stil panics on EHCI:
>>>>>
>>>>> [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
>>>>> (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
>>>>> 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
>>>>> [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
>>>>> [ 0.000000] bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x3c1b8041
>>>>> [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
>>>>>
>>>>> It looks like bit 29 is set so RAC should be present.
>>>>> And RAC_I seems to be set, but not RAC_D...
>>>>>
>>>>> BTW, this is what I added to bmips_cpu_setup:
>>>>>
>>>>> case CPU_BMIPS4350:
>>>>> cfg = read_c0_brcm_config_0();
>>>>> pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);
>>>>>
>>>>> cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
>>>>> pr_info("bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x%x\n", cfg);
>>>>> __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
>>>>> __raw_readl(cbr + BMIPS_RAC_CONFIG);
>>>>> break;
>>>>
>>>> Thanks for running those experiments, I cannot explain what you are
>>>> seeing, so there must be some sort of erratum applicable to the
>>>> BMIPS4380 revision used on the 6358 somehow...
>>>>
>>>> If you can make the suggested change to use negative logic in order to
>>>> disable the RAC flushing, that would work for me, also maybe add a
>>>> Fixes: tag so it gets backported to stable trees?
>>>>
>>>> Thanks!
>>>> --
>>>> Florian
>>>>
>>
>> --
>> Florian
>>
>
> Álvaro

--
Florian


2023-03-16 18:04:04

by Álvaro Fernández Rojas

[permalink] [raw]
Subject: Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

El jue, 16 mar 2023 a las 18:40, Florian Fainelli
(<[email protected]>) escribió:
>
> On 3/14/23 11:14, Álvaro Fernández Rojas wrote:
> > Hi Florian,
> >
> > El lun, 13 mar 2023 a las 22:46, Florian Fainelli
> > (<[email protected]>) escribió:
> >>
> >> (please don't top post)
> >>
> >> On 3/13/23 14:39, Álvaro Fernández Rojas wrote:
> >>> Hi Florian,
> >>>
> >>> I did another test changing from TP1 to TP0 and this is the result:
> >>> [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
> >>> (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
> >>> 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
> >>> [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
> >>> [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x2a00015
> >>> [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
> >>>
> >>> And there were no exceptions with EHCI/OHCI as opposed to TP1.
> >>> So the issue is only happening when booting from TP1.
> >>
> >> Ah, that explains it then, I was just about to ask you which TP was the
> >> kernel booted on.
> >>
> >>> Maybe it's due to the fact that BCM6358 has a shared TLB?
> >>
> >> I think it has to do with the fact that the BMIPS_RAC_CONFIG_1 is likely
> >> not enabling the RAC since that register pertains to TP1, could you dump
> >> its contents, and if they do not set bit 0 and/or 1, please set them and
> >> try again and see whether it works any better? The RAC provides
> >> substantial performance improvements, it would be a change to keep it
> >> disabled.
> >
> > This is the code that I added to bmips_cpu_setup():
> > case CPU_BMIPS4350:
> > cfg = read_c0_brcm_cmt_local();
> > pr_info("bmips_cpu_setup: read_c0_brcm_cmt_local() = 0x%x\n", cfg);
> >
> > cfg = read_c0_brcm_config_0();
> > pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);
> >
> > cfg = __raw_readl(cbr + BMIPS_RAC_ADDRESS_RANGE);
> > pr_info("bmips_cpu_setup: BMIPS_RAC_ADDRESS_RANGE = 0x%x\n", cfg);
> >
> > cfg = __raw_readl(cbr + BMIPS_L2_CONFIG);
> > pr_info("bmips_cpu_setup: BMIPS_L2_CONFIG = 0x%x\n", cfg);
> >
> > cfg = __raw_readl(cbr + BMIPS_LMB_CONTROL);
> > pr_info("bmips_cpu_setup: BMIPS_LMB_CONTROL = 0x%x\n", cfg);
> >
> > cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
> > pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x%x\n", cfg);
> > __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
> > pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x%x\n", cfg);
> > __raw_readl(cbr + BMIPS_RAC_CONFIG);
> >
> > cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG_1);
> > pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x%x\n", cfg);
> > __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG_1);
> > pr_info("bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x%x\n", cfg);
> > __raw_readl(cbr + BMIPS_RAC_CONFIG_1);
> > break;
> >
> > And this is the result:
> > [ 0.000000] bmips_cpu_setup: read_c0_brcm_cmt_local() = 0x80000000
>
> OK, so we are executing from TP1, but we knew that already.
>
> > [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
>
> bit 31: instruction cache enabled
> bit 30: data cache enabled
> bit 29: RAC present
> bit 25: DSU power enabled
> bit 24: data cache power enabled
> bit 19: low-latency memory bus (LMB) present
> bit 18: concurrent multi threading (CMT) present
> bit 17: reserved
> bit 12: split instruction cache
> bit 2: number of Hi/Lo special registers - 1
> bit 1: eDSP present

I'm curious, is there any bit that indicates a split data cache?

>
> This seems to match the recommended and default values
>
> > [ 0.000000] bmips_cpu_setup: BMIPS_RAC_ADDRESS_RANGE = 0x277bdab0
>
> That does not look intended, the reset value is supposed to be 0. Can
> you apply the same values as the ones programmed in the CPU_BMIPS_3300 case?

I've just tried this but it still panics...

>
> > [ 0.000000] bmips_cpu_setup: BMIPS_L2_CONFIG = 0x241a0008
>
>
> > [ 0.000000] bmips_cpu_setup: BMIPS_LMB_CONTROL = 0x0
>
> LMB not enabled, that's OK.
>
> > [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x3c1b8041
> > [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG = 0x3c1b8041
> > [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x3600008
> > [ 0.000000] bmips_cpu_setup: BMIPS_RAC_CONFIG_1 = 0x3600008
> >
> > As you can see the bit's aren't set and all the registers appear to
> > have strange values and not the usual ones when initialized by the
> > bootloader...
>
> Yes, so maybe the best way to go about is indeed to go with your change
> such that if we are running on TP1, it is safe to assume that CFE has
> not done any sensible initialization, and we have a non functional RAC.

Yes, I will send v2 checking if we're running on TP1.

>
> >
> >>
> >>>
> >>> Maybe the correct way of solving the issue would be by adding the
> >>> following code at bcm6358_quirks():
> >>> if (read_c0_brcm_cmt_local() & (1 << 31))
> >>> bmips_dma_sync_disabled = 1;
> >>>
> >>> BTW, if I understood it correctly, you want me to reverse the logic,
> >>> so bmips_dma_sync_disabled instead of bmips_dma_sync_enabled.
> >>> Is this correct?
> >>
> >> Yes, I want the logic such that we need to set a variable to 1/true
> >> rather setting one to 0, less change to get it wrong IMHO.
> >>
> >>>
> >>> Best regards,
> >>> Álvaro.
> >>>
> >>>
> >>> El lun, 13 mar 2023 a las 18:37, Florian Fainelli
> >>> (<[email protected]>) escribió:
> >>>>
> >>>> On 3/12/23 11:50, Álvaro Fernández Rojas wrote:
> >>>>> Hi Florian,
> >>>>>
> >>>>> I tried what you suggested but it stil panics on EHCI:
> >>>>>
> >>>>> [ 0.000000] Linux version 5.15.98 (noltari@atlantis)
> >>>>> (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.2.0 r22187+1-19817fa3f5)
> >>>>> 12.2.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Sun Mar 12 18:23:28 2023
> >>>>> [ 0.000000] bmips_cpu_setup: read_c0_brcm_config_0() = 0xe30e1006
> >>>>> [ 0.000000] bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x3c1b8041
> >>>>> [ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
> >>>>>
> >>>>> It looks like bit 29 is set so RAC should be present.
> >>>>> And RAC_I seems to be set, but not RAC_D...
> >>>>>
> >>>>> BTW, this is what I added to bmips_cpu_setup:
> >>>>>
> >>>>> case CPU_BMIPS4350:
> >>>>> cfg = read_c0_brcm_config_0();
> >>>>> pr_info("bmips_cpu_setup: read_c0_brcm_config_0() = 0x%x\n", cfg);
> >>>>>
> >>>>> cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
> >>>>> pr_info("bmips_cpu_setup: cbr + BMIPS_RAC_CONFIG = 0x%x\n", cfg);
> >>>>> __raw_writel(cfg | BIT(0) | BIT(1), cbr + BMIPS_RAC_CONFIG);
> >>>>> __raw_readl(cbr + BMIPS_RAC_CONFIG);
> >>>>> break;
> >>>>
> >>>> Thanks for running those experiments, I cannot explain what you are
> >>>> seeing, so there must be some sort of erratum applicable to the
> >>>> BMIPS4380 revision used on the 6358 somehow...
> >>>>
> >>>> If you can make the suggested change to use negative logic in order to
> >>>> disable the RAC flushing, that would work for me, also maybe add a
> >>>> Fixes: tag so it gets backported to stable trees?
> >>>>
> >>>> Thanks!
> >>>> --
> >>>> Florian
> >>>>
> >>
> >> --
> >> Florian
> >>
> >
> > Álvaro
>
> --
> Florian
>

Álvaro

2023-03-16 18:05:29

by Álvaro Fernández Rojas

[permalink] [raw]
Subject: [PATCH v2] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all() for TP1

arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with EHCI/OHCI:
[ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
[ 3.895011] Reserved instruction in kernel code[#1]:
[ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
[ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
[ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
[ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
[ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
[ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
[ 3.932848] $20 : 00000000 00000000 55590000 77d70000
[ 3.938251] $24 : 00000018 00000010
[ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
[ 3.949058] Hi : 00000000
[ 3.952013] Lo : 00000000
[ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
[ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
[ 3.965913] Status: 10008703 KERNEL EXL IE
[ 3.970216] Cause : 00800028 (ExcCode 0a)
[ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
[ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
[ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
[ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
[ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
[ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
[ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
[ 4.044196] ...
[ 4.046706] Call Trace:
[ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
[ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
[ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
[ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
[ 4.069036]
[ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
[ 4.080686]
[ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
[ 4.087374] Kernel panic - not syncing: Fatal exception
[ 4.092753] Rebooting in 1 seconds..

This only happens when booting from TP1 instead of TP0.

Signed-off-by: Álvaro Fernández Rojas <[email protected]>
---
v2: check if we're running on TP1 and invert logic.

arch/mips/bmips/dma.c | 5 +++++
arch/mips/bmips/setup.c | 9 +++++++++
2 files changed, 14 insertions(+)

diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index 33788668cbdb..2e470ef8d30f 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -5,11 +5,16 @@
#include <asm/bmips.h>
#include <asm/io.h>

+int bmips_dma_sync_disable = 0;
+
void arch_sync_dma_for_cpu_all(void)
{
void __iomem *cbr = BMIPS_GET_CBR();
u32 cfg;

+ if (bmips_dma_sync_disable)
+ return;
+
if (boot_cpu_type() != CPU_BMIPS3300 &&
boot_cpu_type() != CPU_BMIPS4350 &&
boot_cpu_type() != CPU_BMIPS4380)
diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
index e95b3f78e7cd..15543e8d5b0c 100644
--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -35,6 +35,8 @@
#define REG_BCM6328_OTP ((void __iomem *)CKSEG1ADDR(0x1000062c))
#define BCM6328_TP1_DISABLED BIT(9)

+extern int bmips_dma_sync_disable;
+
static const unsigned long kbase = VMLINUX_LOAD_ADDRESS & 0xfff00000;

struct bmips_quirk {
@@ -104,6 +106,13 @@ static void bcm6358_quirks(void)
* disable SMP for now
*/
bmips_smp_enabled = 0;
+
+ /*
+ * ARCH_HAS_SYNC_DMA_FOR_CPU_ALL causes kernel panics on BCM6358 when
+ * booting from TP1
+ */
+ if (read_c0_brcm_cmt_local() & (1 << 31))
+ bmips_dma_sync_disable = 1;
}

static void bcm6368_quirks(void)
--
2.30.2


2023-03-17 00:00:41

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH v2] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all() for TP1

On 3/16/23 11:05, Álvaro Fernández Rojas wrote:
> arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with EHCI/OHCI:
> [ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
> [ 3.895011] Reserved instruction in kernel code[#1]:
> [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
> [ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
> [ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
> [ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
> [ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
> [ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
> [ 3.932848] $20 : 00000000 00000000 55590000 77d70000
> [ 3.938251] $24 : 00000018 00000010
> [ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
> [ 3.949058] Hi : 00000000
> [ 3.952013] Lo : 00000000
> [ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
> [ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
> [ 3.965913] Status: 10008703 KERNEL EXL IE
> [ 3.970216] Cause : 00800028 (ExcCode 0a)
> [ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
> [ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
> [ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
> [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
> [ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
> [ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
> [ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
> [ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
> [ 4.044196] ...
> [ 4.046706] Call Trace:
> [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
> [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
> [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
> [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
> [ 4.069036]
> [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
> [ 4.080686]
> [ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
> [ 4.087374] Kernel panic - not syncing: Fatal exception
> [ 4.092753] Rebooting in 1 seconds..
>
> This only happens when booting from TP1 instead of TP0.

Because the bootloader (CFE) is not initializing the Read-ahead cache
properly on the second thread (TP1). Since the RAC was not initialized
properly, we should avoid flushing it at the risk of corrupting the
instruction stream as seen in the trace above.

>
> Signed-off-by: Álvaro Fernández Rojas <[email protected]>
> ---
> v2: check if we're running on TP1 and invert logic.
>
> arch/mips/bmips/dma.c | 5 +++++
> arch/mips/bmips/setup.c | 9 +++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
> index 33788668cbdb..2e470ef8d30f 100644
> --- a/arch/mips/bmips/dma.c
> +++ b/arch/mips/bmips/dma.c
> @@ -5,11 +5,16 @@
> #include <asm/bmips.h>
> #include <asm/io.h>
>
> +int bmips_dma_sync_disable = 0;

bool bmips_rac_flush_disable;

No need for the false, it is initialized to 0 by default.

> +
> void arch_sync_dma_for_cpu_all(void)
> {
> void __iomem *cbr = BMIPS_GET_CBR();
> u32 cfg;
>
> + if (bmips_dma_sync_disable)
> + return;

if (unlikely(bmips_rac_flush_disable))

and move it after the check on the boot_cpu_type() so we do not impact
other BMIPS CPUs.

> +
> if (boot_cpu_type() != CPU_BMIPS3300 &&
> boot_cpu_type() != CPU_BMIPS4350 &&
> boot_cpu_type() != CPU_BMIPS4380)
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index e95b3f78e7cd..15543e8d5b0c 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -35,6 +35,8 @@
> #define REG_BCM6328_OTP ((void __iomem *)CKSEG1ADDR(0x1000062c))
> #define BCM6328_TP1_DISABLED BIT(9)
>
> +extern int bmips_dma_sync_disable;

extern bool bmips_rac_flush_disable;

> +
> static const unsigned long kbase = VMLINUX_LOAD_ADDRESS & 0xfff00000;
>
> struct bmips_quirk {
> @@ -104,6 +106,13 @@ static void bcm6358_quirks(void)
> * disable SMP for now
> */
> bmips_smp_enabled = 0;
> +
> + /*
> + * ARCH_HAS_SYNC_DMA_FOR_CPU_ALL causes kernel panics on BCM6358 when
> + * booting from TP1
> + */
> + if (read_c0_brcm_cmt_local() & (1 << 31))
> + bmips_dma_sync_disable = 1;

bmips_rac_flush_disable = !!(read_c0_brcm_cmt_local() & (1 << 31));

is more compact, but what you have works too.

Sorry for being so insisting about the name, but signaling what this
does is fairly important.
--
Florian


2023-03-17 10:20:21

by Álvaro Fernández Rojas

[permalink] [raw]
Subject: [PATCH v3] mips: bmips: BCM6358: disable RAC flush for TP1

RAC flush causes kernel panics on BCM6358 with EHCI/OHCI when booting from TP1:
[ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
[ 3.895011] Reserved instruction in kernel code[#1]:
[ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
[ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
[ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
[ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
[ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
[ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
[ 3.932848] $20 : 00000000 00000000 55590000 77d70000
[ 3.938251] $24 : 00000018 00000010
[ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
[ 3.949058] Hi : 00000000
[ 3.952013] Lo : 00000000
[ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
[ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
[ 3.965913] Status: 10008703 KERNEL EXL IE
[ 3.970216] Cause : 00800028 (ExcCode 0a)
[ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
[ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
[ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
[ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
[ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
[ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
[ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
[ 4.044196] ...
[ 4.046706] Call Trace:
[ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
[ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
[ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
[ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
[ 4.069036]
[ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
[ 4.080686]
[ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
[ 4.087374] Kernel panic - not syncing: Fatal exception
[ 4.092753] Rebooting in 1 seconds..

Because the bootloader (CFE) is not initializing the Read-ahead cache properly
on the second thread (TP1). Since the RAC was not initialized properly, we
should avoid flushing it at the risk of corrupting the instruction stream as
seen in the trace above.

Signed-off-by: Álvaro Fernández Rojas <[email protected]>
---
v3: add changes suggested by Florian:
- Switch to a bool and remove unneeded initialization.
- Remove if from bcm6358_quirks().
- Improve commit description and bcm6358_quirks() comment.
v2: check if we're running on TP1 and invert logic.

arch/mips/bmips/dma.c | 5 +++++
arch/mips/bmips/setup.c | 8 ++++++++
2 files changed, 13 insertions(+)

diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index 33788668cbdb..3779e7855bd7 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -5,6 +5,8 @@
#include <asm/bmips.h>
#include <asm/io.h>

+bool bmips_rac_flush_disable;
+
void arch_sync_dma_for_cpu_all(void)
{
void __iomem *cbr = BMIPS_GET_CBR();
@@ -15,6 +17,9 @@ void arch_sync_dma_for_cpu_all(void)
boot_cpu_type() != CPU_BMIPS4380)
return;

+ if (unlikely(bmips_rac_flush_disable))
+ return;
+
/* Flush stale data out of the readahead cache */
cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
__raw_writel(cfg | 0x100, cbr + BMIPS_RAC_CONFIG);
diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
index e95b3f78e7cd..549a6392a3d2 100644
--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -35,6 +35,8 @@
#define REG_BCM6328_OTP ((void __iomem *)CKSEG1ADDR(0x1000062c))
#define BCM6328_TP1_DISABLED BIT(9)

+extern bool bmips_rac_flush_disable;
+
static const unsigned long kbase = VMLINUX_LOAD_ADDRESS & 0xfff00000;

struct bmips_quirk {
@@ -104,6 +106,12 @@ static void bcm6358_quirks(void)
* disable SMP for now
*/
bmips_smp_enabled = 0;
+
+ /*
+ * RAC flush causes kernel panics on BCM6358 when booting from TP1
+ * because the bootloader is not initializing it properly.
+ */
+ bmips_rac_flush_disable = !!(read_c0_brcm_cmt_local() & (1 << 31));
}

static void bcm6368_quirks(void)
--
2.30.2


2023-03-17 16:46:03

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH v3] mips: bmips: BCM6358: disable RAC flush for TP1

On 3/17/23 03:20, Álvaro Fernández Rojas wrote:
> RAC flush causes kernel panics on BCM6358 with EHCI/OHCI when booting from TP1:
> [ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
> [ 3.895011] Reserved instruction in kernel code[#1]:
> [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
> [ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
> [ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
> [ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
> [ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
> [ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
> [ 3.932848] $20 : 00000000 00000000 55590000 77d70000
> [ 3.938251] $24 : 00000018 00000010
> [ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
> [ 3.949058] Hi : 00000000
> [ 3.952013] Lo : 00000000
> [ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
> [ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
> [ 3.965913] Status: 10008703 KERNEL EXL IE
> [ 3.970216] Cause : 00800028 (ExcCode 0a)
> [ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
> [ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
> [ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
> [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
> [ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
> [ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
> [ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
> [ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
> [ 4.044196] ...
> [ 4.046706] Call Trace:
> [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
> [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
> [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
> [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
> [ 4.069036]
> [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
> [ 4.080686]
> [ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
> [ 4.087374] Kernel panic - not syncing: Fatal exception
> [ 4.092753] Rebooting in 1 seconds..
>
> Because the bootloader (CFE) is not initializing the Read-ahead cache properly
> on the second thread (TP1). Since the RAC was not initialized properly, we
> should avoid flushing it at the risk of corrupting the instruction stream as
> seen in the trace above.
>
> Signed-off-by: Álvaro Fernández Rojas <[email protected]>

Third time is a charm, Thomas, I believe this deserves adding a:

Fixes: d59098a0e9cb ("MIPS: bmips: use generic dma noncoherent ops")

such that this backports cleanly. Thanks!
--
Florian


2023-03-18 13:38:27

by Thomas Bogendoerfer

[permalink] [raw]
Subject: Re: [PATCH v3] mips: bmips: BCM6358: disable RAC flush for TP1

On Fri, Mar 17, 2023 at 11:20:04AM +0100, ?lvaro Fern?ndez Rojas wrote:
> RAC flush causes kernel panics on BCM6358 with EHCI/OHCI when booting from TP1:
> [ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
> [ 3.895011] Reserved instruction in kernel code[#1]:
> [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
> [ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
> [ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
> [ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
> [ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
> [ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
> [ 3.932848] $20 : 00000000 00000000 55590000 77d70000
> [ 3.938251] $24 : 00000018 00000010
> [ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
> [ 3.949058] Hi : 00000000
> [ 3.952013] Lo : 00000000
> [ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
> [ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
> [ 3.965913] Status: 10008703 KERNEL EXL IE
> [ 3.970216] Cause : 00800028 (ExcCode 0a)
> [ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
> [ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
> [ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
> [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
> [ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
> [ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
> [ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
> [ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
> [ 4.044196] ...
> [ 4.046706] Call Trace:
> [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
> [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
> [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
> [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
> [ 4.069036]
> [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
> [ 4.080686]
> [ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
> [ 4.087374] Kernel panic - not syncing: Fatal exception
> [ 4.092753] Rebooting in 1 seconds..
>
> Because the bootloader (CFE) is not initializing the Read-ahead cache properly
> on the second thread (TP1). Since the RAC was not initialized properly, we
> should avoid flushing it at the risk of corrupting the instruction stream as
> seen in the trace above.
>
> Signed-off-by: ?lvaro Fern?ndez Rojas <[email protected]>
> ---
> v3: add changes suggested by Florian:
> - Switch to a bool and remove unneeded initialization.
> - Remove if from bcm6358_quirks().
> - Improve commit description and bcm6358_quirks() comment.
> v2: check if we're running on TP1 and invert logic.
>
> arch/mips/bmips/dma.c | 5 +++++
> arch/mips/bmips/setup.c | 8 ++++++++
> 2 files changed, 13 insertions(+)

applied to mips-fixes with the Fixes tag added.

Thomas.

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]