2022-11-03 05:21:51

by John Thomson

Subject: [RFC PATCH 0/3] mips: ralink: mt7621: fix too-early kzalloc

ralink mt7621 attempts to use kzalloc before normal memory management is
available.
Before v6.1-rc1, soc_dev_init in mt7621.c silently failed to kzalloc and
returned early, and the kernel continued to boot without the soc device
registered.
Since v6.1-rc1, the kernel crashes before it outputs any console messages.

This was bisected to an mm/slub change (detailed in patch 3).

RFC due to
- probably a (much) better way to do this
- I do not have an mt7621 device with PCIe to test
  drivers/phy/ralink/phy-mt7621-pci.c
- should this reference a commit as Fixes?
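
The ordering problem described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: every name in it (model_zalloc, early_soc_init, late_soc_dev_init, allocator_ready) is hypothetical and not a kernel API. It shows early code recording a pointer without allocating, and a later init step doing the allocation once the allocator can serve requests.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names are kernel APIs. */
struct soc_info { const char *sys_type; };
struct soc_attr { const char *soc_id; const struct soc_info *data; };

static bool allocator_ready;              /* flips once "mm" is up */
static const struct soc_info *saved_info; /* stashed during early boot */
static struct soc_attr *registered;       /* result of the late init */

/* Models an allocator that cannot serve requests during early boot. */
static void *model_zalloc(size_t size)
{
	return allocator_ready ? calloc(1, size) : NULL;
}

/* Early boot: only record the pointer, allocate nothing. */
static void early_soc_init(const struct soc_info *info)
{
	saved_info = info;
}

/* Late init (the role an initcall plays): allocation is now safe. */
static int late_soc_dev_init(void)
{
	struct soc_attr *attr = model_zalloc(sizeof(*attr));

	if (!attr)
		return -1;
	attr->soc_id = "mt7621";
	attr->data = saved_info;
	registered = attr;
	return 0;
}
```

Calling late_soc_dev_init() while allocator_ready is still false fails cleanly in this model; the real kernel gives no such guarantee, which is why the allocation has to move.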





2022-11-03 05:31:28

by John Thomson

Subject: [RFC PATCH 1/3] mips: ralink: mt7621: define MT7621_SYSC_BASE with __iomem

So that MT7621_SYSC_BASE can be used later in multiple functions without
needing to repeat this __iomem declaration each time

Signed-off-by: John Thomson <[email protected]>
---
arch/mips/include/asm/mach-ralink/mt7621.h | 4 +++-
arch/mips/ralink/mt7621.c | 7 +++----
2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/mach-ralink/mt7621.h b/arch/mips/include/asm/mach-ralink/mt7621.h
index 6bbf082dd149..905c5b3ed2bd 100644
--- a/arch/mips/include/asm/mach-ralink/mt7621.h
+++ b/arch/mips/include/asm/mach-ralink/mt7621.h
@@ -7,10 +7,12 @@
#ifndef _MT7621_REGS_H_
#define _MT7621_REGS_H_

+#define IOMEM(x) ((void __iomem *)(KSEG1ADDR(CPHYSADDR(x))))
+
#define MT7621_PALMBUS_BASE 0x1C000000
#define MT7621_PALMBUS_SIZE 0x03FFFFFF

-#define MT7621_SYSC_BASE 0x1E000000
+#define MT7621_SYSC_BASE IOMEM(0x1E000000)

#define SYSC_REG_CHIP_NAME0 0x00
#define SYSC_REG_CHIP_NAME1 0x04
diff --git a/arch/mips/ralink/mt7621.c b/arch/mips/ralink/mt7621.c
index fb0565bc34fd..17dbf28897e0 100644
--- a/arch/mips/ralink/mt7621.c
+++ b/arch/mips/ralink/mt7621.c
@@ -126,7 +126,6 @@ static void soc_dev_init(struct ralink_soc_info *soc_info, u32 rev)

void __init prom_soc_init(struct ralink_soc_info *soc_info)
{
- void __iomem *sysc = (void __iomem *) KSEG1ADDR(MT7621_SYSC_BASE);
unsigned char *name = NULL;
u32 n0;
u32 n1;
@@ -154,8 +153,8 @@ void __init prom_soc_init(struct ralink_soc_info *soc_info)
__sync();
}

- n0 = __raw_readl(sysc + SYSC_REG_CHIP_NAME0);
- n1 = __raw_readl(sysc + SYSC_REG_CHIP_NAME1);
+ n0 = __raw_readl(MT7621_SYSC_BASE + SYSC_REG_CHIP_NAME0);
+ n1 = __raw_readl(MT7621_SYSC_BASE + SYSC_REG_CHIP_NAME1);

if (n0 == MT7621_CHIP_NAME0 && n1 == MT7621_CHIP_NAME1) {
name = "MT7621";
@@ -164,7 +163,7 @@ void __init prom_soc_init(struct ralink_soc_info *soc_info)
panic("mt7621: unknown SoC, n0:%08x n1:%08x\n", n0, n1);
}
ralink_soc = MT762X_SOC_MT7621AT;
- rev = __raw_readl(sysc + SYSC_REG_CHIP_REV);
+ rev = __raw_readl(MT7621_SYSC_BASE + SYSC_REG_CHIP_REV);

snprintf(soc_info->sys_type, RAMIPS_SYS_TYPE_LEN,
"MediaTek %s ver:%u eco:%u",
--
2.37.2
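
A rough userspace model of what the IOMEM() macro in this patch computes may help when reading it. It assumes the usual 32-bit MIPS definitions (CPHYSADDR masks an address down to its low 29 bits, KSEG1 starts at 0xa0000000); the MODEL_* names are hypothetical and the constants should be checked against the arch headers.

```c
#include <stdint.h>

/* Userspace model of the 32-bit MIPS address helpers (assumed values). */
#define MODEL_KSEG1        UINT32_C(0xa0000000)
#define MODEL_CPHYSADDR(a) ((uint32_t)(a) & UINT32_C(0x1fffffff))
#define MODEL_KSEG1ADDR(a) (MODEL_CPHYSADDR(a) | MODEL_KSEG1)

/* Mirrors the shape of the IOMEM() macro from the patch, minus __iomem. */
static uint32_t model_iomem(uint32_t addr)
{
	return MODEL_KSEG1ADDR(MODEL_CPHYSADDR(addr));
}
```

Under these assumptions the mapping is idempotent: the physical address 0x1E000000 and the already-mapped address 0xBE000000 both come out as 0xBE000000, so register offsets can simply be added to the result.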


2022-11-03 05:33:57

by John Thomson

Subject: [RFC PATCH 3/3] mips: ralink: mt7621: do not use kzalloc too early

Following commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting
of kmalloc") mt7621 failed to boot very early, without showing any
console messages.
This exposed the pre-existing bug of mt7621.c using kzalloc before normal
memory management was available.
Prior to this slub change, there existed the unintended protection against
"kmem_cache *s" being NULL as slab_pre_alloc_hook() happened to
return NULL and bailed out of slab_alloc_node().
This allowed mt7621 prom_soc_init to fail in the soc_dev_init kzalloc,
but continue booting without this soc device.

Console output from a DEBUG_ZBOOT vmlinuz kernel loading,
with mm/slub modified to warn on kmem_cache zero or null:

zimage at: 80B842A0 810B4BC0
Uncompressing Linux at load address 80001000
Copy device tree to address 80B80EE0
Now, booting the kernel...

[ 0.000000] Linux version 6.1.0-rc3+ (john@john)
(mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot
2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed
Nov 2 05:10:01 AEST 2022
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416
kmem_cache_alloc+0x5a4/0x5e8
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
[ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000
00000000 80889d04 80c90000
[ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000
00000001 80889cb0 00000000
[ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002
00000002 00000001 6d6f4320
[ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328
00000000 00000000 00000000
[ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000
00000020 80010000 80010000
[ 0.000000] ...
[ 0.000000] Call Trace:
[ 0.000000] [<80008260>] show_stack+0x28/0xf0
[ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
[ 0.000000] [<8002e184>] __warn+0xc4/0xf8
[ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
[ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
[ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
[ 0.000000] [<80928060>] prom_init+0x44/0xf0
[ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
[ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
[ 0.000000]
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled

This early kzalloc was introduced in commit 71b9b5e0130d ("MIPS: ralink:
mt7621: introduce 'soc_device' initialization")

Link: https://lore.kernel.org/linux-mm/[email protected]/
Signed-off-by: John Thomson <[email protected]>
---
arch/mips/ralink/mt7621.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/mips/ralink/mt7621.c b/arch/mips/ralink/mt7621.c
index f2443b833bc3..836965021d5c 100644
--- a/arch/mips/ralink/mt7621.c
+++ b/arch/mips/ralink/mt7621.c
@@ -25,6 +25,7 @@
#define MT7621_MEM_TEST_PATTERN 0xaa5555aa

static u32 detect_magic __initdata;
+struct ralink_soc_info *soc_info_ptr;

int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
{
@@ -147,27 +148,30 @@ static const char __init *mt7621_get_soc_revision(void)
return "E1";
}

-static void soc_dev_init(struct ralink_soc_info *soc_info)
+static int __init mt7621_soc_dev_init(void)
{
struct soc_device *soc_dev;
struct soc_device_attribute *soc_dev_attr;

soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
if (!soc_dev_attr)
- return;
+ return -ENOMEM;

soc_dev_attr->soc_id = "mt7621";
soc_dev_attr->family = "Ralink";
soc_dev_attr->revision = mt7621_get_soc_revision();

- soc_dev_attr->data = soc_info;
+ soc_dev_attr->data = soc_info_ptr;

soc_dev = soc_device_register(soc_dev_attr);
if (IS_ERR(soc_dev)) {
kfree(soc_dev_attr);
- return;
+ return PTR_ERR(soc_dev);
}
+
+ return 0;
}
+device_initcall(mt7621_soc_dev_init);

void __init prom_soc_init(struct ralink_soc_info *soc_info)
{
@@ -209,7 +213,7 @@ void __init prom_soc_init(struct ralink_soc_info *soc_info)

soc_info->mem_detect = mt7621_memory_detect;

- soc_dev_init(soc_info);
+ soc_info_ptr = soc_info;

if (!register_cps_smp_ops())
return;
--
2.37.2
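
The return-value changes in this patch follow the common kernel convention of encoding an errno in a pointer. A minimal userspace sketch of that convention, assuming the top 4095 addresses are never valid pointers; the model_* helpers are hypothetical stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(), not the kernel implementations.

```c
#include <stdint.h>
#include <errno.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in the top, never-valid range of pointers. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical registration step that can fail with an encoded error. */
static void *model_register(int fail)
{
	static int device;

	return fail ? model_err_ptr(-ENODEV) : (void *)&device;
}

/* Same shape as the patched init: propagate the error, or return 0. */
static int model_init(int fail)
{
	void *dev = model_register(fail);

	if (model_is_err(dev))
		return (int)model_ptr_err(dev);
	return 0;
}
```

Returning the decoded error rather than a bare return lets the caller of the init function see why registration failed.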


2022-11-03 06:40:01

by John Thomson

Subject: [RFC PATCH 2/3] mips: ralink: mt7621: soc queries and tests as functions

Move the SoC register value queries and tests to specific functions,
to remove repetition of logic
No functional changes intended

Signed-off-by: John Thomson <[email protected]>
---
arch/mips/ralink/mt7621.c | 86 +++++++++++++++++++++++++++------------
1 file changed, 61 insertions(+), 25 deletions(-)

diff --git a/arch/mips/ralink/mt7621.c b/arch/mips/ralink/mt7621.c
index 17dbf28897e0..f2443b833bc3 100644
--- a/arch/mips/ralink/mt7621.c
+++ b/arch/mips/ralink/mt7621.c
@@ -97,7 +97,57 @@ void __init ralink_of_remap(void)
panic("Failed to remap core resources");
}

-static void soc_dev_init(struct ralink_soc_info *soc_info, u32 rev)
+static const unsigned int __init mt7621_get_soc_name0(void)
+{
+ return __raw_readl(MT7621_SYSC_BASE + SYSC_REG_CHIP_NAME0);
+}
+
+static const unsigned int __init mt7621_get_soc_name1(void)
+{
+ return __raw_readl(MT7621_SYSC_BASE + SYSC_REG_CHIP_NAME1);
+}
+
+static const bool __init mt7621_soc_valid(void)
+{
+ if (mt7621_get_soc_name0() == MT7621_CHIP_NAME0 &&
+ mt7621_get_soc_name1() == MT7621_CHIP_NAME1)
+ return true;
+ else
+ return false;
+}
+
+static const char __init *mt7621_get_soc_id(void)
+{
+ if (mt7621_soc_valid())
+ return "MT7621";
+ else
+ return "invalid";
+}
+
+static const unsigned int __init mt7621_get_soc_rev(void)
+{
+ return __raw_readl(MT7621_SYSC_BASE + SYSC_REG_CHIP_REV);
+}
+
+static const unsigned int __init mt7621_get_soc_ver(void)
+{
+ return (mt7621_get_soc_rev() >> CHIP_REV_VER_SHIFT) & CHIP_REV_VER_MASK;
+}
+
+static const unsigned int __init mt7621_get_soc_eco(void)
+{
+ return (mt7621_get_soc_rev() & CHIP_REV_ECO_MASK);
+}
+
+static const char __init *mt7621_get_soc_revision(void)
+{
+ if (mt7621_get_soc_ver() == 1 && mt7621_get_soc_eco() == 1)
+ return "E2";
+ else
+ return "E1";
+}
+
+static void soc_dev_init(struct ralink_soc_info *soc_info)
{
struct soc_device *soc_dev;
struct soc_device_attribute *soc_dev_attr;
@@ -108,12 +158,7 @@ static void soc_dev_init(struct ralink_soc_info *soc_info, u32 rev)

soc_dev_attr->soc_id = "mt7621";
soc_dev_attr->family = "Ralink";
-
- if (((rev >> CHIP_REV_VER_SHIFT) & CHIP_REV_VER_MASK) == 1 &&
- (rev & CHIP_REV_ECO_MASK) == 1)
- soc_dev_attr->revision = "E2";
- else
- soc_dev_attr->revision = "E1";
+ soc_dev_attr->revision = mt7621_get_soc_revision();

soc_dev_attr->data = soc_info;

@@ -126,11 +171,6 @@ static void soc_dev_init(struct ralink_soc_info *soc_info, u32 rev)

void __init prom_soc_init(struct ralink_soc_info *soc_info)
{
- unsigned char *name = NULL;
- u32 n0;
- u32 n1;
- u32 rev;
-
/* Early detection of CMP support */
mips_cm_probe();
mips_cpc_probe();
@@ -153,27 +193,23 @@ void __init prom_soc_init(struct ralink_soc_info *soc_info)
__sync();
}

- n0 = __raw_readl(MT7621_SYSC_BASE + SYSC_REG_CHIP_NAME0);
- n1 = __raw_readl(MT7621_SYSC_BASE + SYSC_REG_CHIP_NAME1);
-
- if (n0 == MT7621_CHIP_NAME0 && n1 == MT7621_CHIP_NAME1) {
- name = "MT7621";
+ if (mt7621_soc_valid())
soc_info->compatible = "mediatek,mt7621-soc";
- } else {
- panic("mt7621: unknown SoC, n0:%08x n1:%08x\n", n0, n1);
- }
+ else
+ panic("mt7621: unknown SoC, n0:%08x n1:%08x\n",
+ mt7621_get_soc_name0(),
+ mt7621_get_soc_name1());
ralink_soc = MT762X_SOC_MT7621AT;
- rev = __raw_readl(MT7621_SYSC_BASE + SYSC_REG_CHIP_REV);

snprintf(soc_info->sys_type, RAMIPS_SYS_TYPE_LEN,
"MediaTek %s ver:%u eco:%u",
- name,
- (rev >> CHIP_REV_VER_SHIFT) & CHIP_REV_VER_MASK,
- (rev & CHIP_REV_ECO_MASK));
+ mt7621_get_soc_id(),
+ mt7621_get_soc_ver(),
+ mt7621_get_soc_eco());

soc_info->mem_detect = mt7621_memory_detect;

- soc_dev_init(soc_info, rev);
+ soc_dev_init(soc_info);

if (!register_cps_smp_ops())
return;
--
2.37.2
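
For reference, the revision decoding that the removed lines of this diff performed can be sketched in userspace as follows. The shift and mask values are assumptions for illustration (check mt7621.h for the real ones), and the model_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed field layout for illustration; check mt7621.h for real values. */
#define MODEL_REV_VER_SHIFT 8
#define MODEL_REV_VER_MASK  0xf
#define MODEL_REV_ECO_MASK  0xf

static unsigned int model_soc_ver(uint32_t rev)
{
	return (rev >> MODEL_REV_VER_SHIFT) & MODEL_REV_VER_MASK;
}

static unsigned int model_soc_eco(uint32_t rev)
{
	return rev & MODEL_REV_ECO_MASK;
}

/* "E2" only when both decoded fields are 1, as in the pre-patch code. */
static const char *model_soc_revision(uint32_t rev)
{
	if (model_soc_ver(rev) == 1 && model_soc_eco(rev) == 1)
		return "E2";
	return "E1";
}
```

With this layout a raw register value of 0x0103 decodes to ver 1, eco 3 and is reported as "E1", while 0x0101 is reported as "E2". The comparison has to use the decoded ver field: the raw register value 0x0101 is not equal to 1.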


2022-11-03 11:29:36

by John Thomson

Subject: Re: [RFC PATCH 3/3] mips: ralink: mt7621: do not use kzalloc too early

On Thu, 3 Nov 2022, at 05:05, John Thomson wrote:
> Following commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting
> of kmalloc") mt7621 failed to boot very early, without showing any
> console messages.
> This exposed the pre-existing bug of mt7621.c using kzalloc before normal
> memory management was available.
> Prior to this slub change, there existed the unintended protection against
> "kmem_cache *s" being NULL as slab_pre_alloc_hook() happened to
> return NULL and bailed out of slab_alloc_node().
> This allowed mt7621 prom_soc_init to fail in the soc_dev_init kzalloc,
> but continue booting without this soc device.
>
> Console output from a DEBUG_ZBOOT vmlinuz kernel loading,
> with mm/slub modified to warn on kmem_cache zero or null:
>
> zimage at: 80B842A0 810B4BC0
> Uncompressing Linux at load address 80001000
> Copy device tree to address 80B80EE0
> Now, booting the kernel...
>
> [ 0.000000] Linux version 6.1.0-rc3+ (john@john)
> (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot
> 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed
> Nov 2 05:10:01 AEST 2022
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416
> kmem_cache_alloc+0x5a4/0x5e8
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> [ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000
> 00000000 80889d04 80c90000
> [ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000
> 00000001 80889cb0 00000000
> [ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002
> 00000002 00000001 6d6f4320
> [ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328
> 00000000 00000000 00000000
> [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000
> 00000020 80010000 80010000
> [ 0.000000] ...
> [ 0.000000] Call Trace:
> [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> [ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> [ 0.000000] [<8002e184>] __warn+0xc4/0xf8
> [ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> [ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> [ 0.000000]
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> [ 0.000000] printk: bootconsole [early0] enabled
>
> This early kzalloc was introduced in commit 71b9b5e0130d ("MIPS: ralink:
> mt7621: introduce 'soc_device' initialization")
>
> Link:
> https://lore.kernel.org/linux-mm/[email protected]/
> Signed-off-by: John Thomson <[email protected]>
> ---
> arch/mips/ralink/mt7621.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/mips/ralink/mt7621.c b/arch/mips/ralink/mt7621.c
> index f2443b833bc3..836965021d5c 100644
> --- a/arch/mips/ralink/mt7621.c
> +++ b/arch/mips/ralink/mt7621.c
> @@ -25,6 +25,7 @@
> #define MT7621_MEM_TEST_PATTERN 0xaa5555aa
>
> static u32 detect_magic __initdata;
> +struct ralink_soc_info *soc_info_ptr;
>
> int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
> {
> @@ -147,27 +148,30 @@ static const char __init *mt7621_get_soc_revision(void)
> return "E1";
> }
>
> -static void soc_dev_init(struct ralink_soc_info *soc_info)
> +static int __init mt7621_soc_dev_init(void)
> {
> struct soc_device *soc_dev;
> struct soc_device_attribute *soc_dev_attr;
>
> soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
> if (!soc_dev_attr)
> - return;
> + return -ENOMEM;
>
> soc_dev_attr->soc_id = "mt7621";
> soc_dev_attr->family = "Ralink";
> soc_dev_attr->revision = mt7621_get_soc_revision();
>
> - soc_dev_attr->data = soc_info;
> + soc_dev_attr->data = soc_info_ptr;
>
> soc_dev = soc_device_register(soc_dev_attr);
> if (IS_ERR(soc_dev)) {
> kfree(soc_dev_attr);
> - return;
> + return PTR_ERR(soc_dev);
> }
> +
> + return 0;
> }
> +device_initcall(mt7621_soc_dev_init);
>
> void __init prom_soc_init(struct ralink_soc_info *soc_info)
> {
> @@ -209,7 +213,7 @@ void __init prom_soc_init(struct ralink_soc_info *soc_info)
>
> soc_info->mem_detect = mt7621_memory_detect;
>
> - soc_dev_init(soc_info);
> + soc_info_ptr = soc_info;
>
> if (!register_cps_smp_ops())
> return;
> --
> 2.37.2

I backported this to kernel 5.10 as a test.
Without it, there was no /sys/bus/soc.
With it, the drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c driver
panicked in soc_device_match_attr.
This was fixed by adding a sentinel element to the quirk table:
--- a/drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c
+++ b/drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c
@@ -293,7 +293,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct device *dev,
}

static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
- { .soc_id = "mt7621", .revision = "E2" }
+ { .soc_id = "mt7621", .revision = "E2" },
+ { /* sentinel */ }
};

static const struct regmap_config mt7621_pci_phy_regmap_config = {

The same quirk table is present, up to kernel 5.15, in drivers/staging/mt7621-pci/pci-mt7621.c
Should I add commits for these for the stable kernels?

In master, these files are now
drivers/pci/controller/pcie-mt7621.c
drivers/phy/ralink/phy-mt7621-pci.c

Should I add sentinels to the soc_device_attribute quirk tables in all of these files?
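
To illustrate why the sentinel matters, here is a simplified userspace sketch of a table walk that stops at an all-NULL entry. It is not the kernel's soc_device_match(): the struct is cut down to two fields, strcmp() stands in for the kernel's glob matching, and all model_* names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified attribute; the kernel struct has more fields. */
struct model_attr {
	const char *soc_id;
	const char *revision;
};

/* An all-NULL entry marks the end of the table. */
static int model_is_sentinel(const struct model_attr *a)
{
	return !a->soc_id && !a->revision;
}

/* Walks the table until the sentinel; returns the matching entry or NULL. */
static const struct model_attr *
model_match(const struct model_attr *table, const struct model_attr *dev)
{
	for (; !model_is_sentinel(table); table++) {
		if (table->soc_id && strcmp(table->soc_id, dev->soc_id))
			continue;
		if (table->revision && strcmp(table->revision, dev->revision))
			continue;
		return table;
	}
	return NULL;
}

static const struct model_attr model_quirks[] = {
	{ .soc_id = "mt7621", .revision = "E2" },
	{ .soc_id = NULL } /* sentinel */
};
```

On an "E1" device the first entry does not match, so the walk moves on to the next element. Without the sentinel that next element lies past the end of the array, and whatever bytes follow are interpreted as string pointers.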

Cheers,
--
John Thomson

2022-11-03 17:47:58

by Sergio Paracuellos

Subject: Re: [RFC PATCH 3/3] mips: ralink: mt7621: do not use kzalloc too early

Hi John,

Thanks for the patches!

On Thu, Nov 3, 2022 at 12:15 PM John Thomson
<[email protected]> wrote:
>
> On Thu, 3 Nov 2022, at 05:05, John Thomson wrote:
> > Following commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting
> > of kmalloc") mt7621 failed to boot very early, without showing any
> > console messages.
> > This exposed the pre-existing bug of mt7621.c using kzalloc before normal
> > memory management was available.
> > Prior to this slub change, there existed the unintended protection against
> > "kmem_cache *s" being NULL as slab_pre_alloc_hook() happened to
> > return NULL and bailed out of slab_alloc_node().
> > This allowed mt7621 prom_soc_init to fail in the soc_dev_init kzalloc,
> > but continue booting without this soc device.
> >
> > Console output from a DEBUG_ZBOOT vmlinuz kernel loading,
> > with mm/slub modified to warn on kmem_cache zero or null:
> >
> > zimage at: 80B842A0 810B4BC0
> > Uncompressing Linux at load address 80001000
> > Copy device tree to address 80B80EE0
> > Now, booting the kernel...
> >
> > [ 0.000000] Linux version 6.1.0-rc3+ (john@john)
> > (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot
> > 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed
> > Nov 2 05:10:01 AEST 2022
> > [ 0.000000] ------------[ cut here ]------------
> > [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416
> > kmem_cache_alloc+0x5a4/0x5e8
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> > [ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000
> > 00000000 80889d04 80c90000
> > [ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000
> > 00000001 80889cb0 00000000
> > [ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002
> > 00000002 00000001 6d6f4320
> > [ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328
> > 00000000 00000000 00000000
> > [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000
> > 00000020 80010000 80010000
> > [ 0.000000] ...
> > [ 0.000000] Call Trace:
> > [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> > [ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> > [ 0.000000] [<8002e184>] __warn+0xc4/0xf8
> > [ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> > [ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> > [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> > [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> > [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> > [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> > [ 0.000000]
> > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> > [ 0.000000] printk: bootconsole [early0] enabled

The last version I tested on my GnuBee PC1 mt7621 board was v6.0, and
it booted properly.

> >
> > This early kzalloc was introduced in commit 71b9b5e0130d ("MIPS: ralink:
> > mt7621: introduce 'soc_device' initialization")
> >
> > Link:
> > https://lore.kernel.org/linux-mm/[email protected]/
> > Signed-off-by: John Thomson <[email protected]>
> > ---
> > arch/mips/ralink/mt7621.c | 14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/mips/ralink/mt7621.c b/arch/mips/ralink/mt7621.c
> > index f2443b833bc3..836965021d5c 100644
> > --- a/arch/mips/ralink/mt7621.c
> > +++ b/arch/mips/ralink/mt7621.c
> > @@ -25,6 +25,7 @@
> > #define MT7621_MEM_TEST_PATTERN 0xaa5555aa
> >
> > static u32 detect_magic __initdata;
> > +struct ralink_soc_info *soc_info_ptr;
> >
> > int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
> > {
> > @@ -147,27 +148,30 @@ static const char __init *mt7621_get_soc_revision(void)
> > return "E1";
> > }
> >
> > -static void soc_dev_init(struct ralink_soc_info *soc_info)
> > +static int __init mt7621_soc_dev_init(void)
> > {
> > struct soc_device *soc_dev;
> > struct soc_device_attribute *soc_dev_attr;
> >
> > soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
> > if (!soc_dev_attr)
> > - return;
> > + return -ENOMEM;
> >
> > soc_dev_attr->soc_id = "mt7621";
> > soc_dev_attr->family = "Ralink";
> > soc_dev_attr->revision = mt7621_get_soc_revision();
> >
> > - soc_dev_attr->data = soc_info;
> > + soc_dev_attr->data = soc_info_ptr;
> >
> > soc_dev = soc_device_register(soc_dev_attr);
> > if (IS_ERR(soc_dev)) {
> > kfree(soc_dev_attr);
> > - return;
> > + return PTR_ERR(soc_dev);
> > }
> > +
> > + return 0;
> > }
> > +device_initcall(mt7621_soc_dev_init);
> >
> > void __init prom_soc_init(struct ralink_soc_info *soc_info)
> > {
> > @@ -209,7 +213,7 @@ void __init prom_soc_init(struct ralink_soc_info *soc_info)
> >
> > soc_info->mem_detect = mt7621_memory_detect;
> >
> > - soc_dev_init(soc_info);
> > + soc_info_ptr = soc_info;
> >
> > if (!register_cps_smp_ops())
> > return;
> > --
> > 2.37.2

I was trying to quickly add all of them to my tree using b4 with [0],
but I am getting a DNS error with that URL...

I am a bit busy this week, but hopefully next week I'll make time to
test all of your changes and come back to you.

[0]: https://lore.kernel.org/lkml/[email protected]/T/#m75e858f83a3e2e26ca84295d2d09040e14128e71

>
> I backported this to kernel 5.10 as a test.
> Without it, there was no /sys/bus/soc.
> With it, the drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c driver
> panicked in soc_device_match_attr.
> This was fixed by adding a sentinel element to the quirk table:
> --- a/drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c
> +++ b/drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c
> @@ -293,7 +293,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct device *dev,
> }
>
> static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
> - { .soc_id = "mt7621", .revision = "E2" }
> + { .soc_id = "mt7621", .revision = "E2" },
> + { /* sentinel */ }
> };
>
> static const struct regmap_config mt7621_pci_phy_regmap_config = {
>
> The same quirk table is present, up to kernel 5.15, in drivers/staging/mt7621-pci/pci-mt7621.c
> Should I add commits for these for the stable kernels?
>
> In master, these files are now
> drivers/pci/controller/pcie-mt7621.c
> drivers/phy/ralink/phy-mt7621-pci.c
>
> Should I add sentinels to the soc_device_attribute quirk tables in all of these files?

I guess we should add a sentinel in all related files. Please CC me on
your series if you send any patches before I come back to you after
testing.

>
> Cheers,
> --
> John Thomson

Thanks,
Sergio Paracuellos

2022-11-04 13:25:31

by Sergio Paracuellos

Subject: Re: [RFC PATCH 3/3] mips: ralink: mt7621: do not use kzalloc too early

Hi John,

On Thu, Nov 3, 2022 at 6:25 PM Sergio Paracuellos
<[email protected]> wrote:
>
> Hi John,
>
> Thanks for the patches!
>
> On Thu, Nov 3, 2022 at 12:15 PM John Thomson
> <[email protected]> wrote:
> >
> > On Thu, 3 Nov 2022, at 05:05, John Thomson wrote:
> > > Following commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting
> > > of kmalloc") mt7621 failed to boot very early, without showing any
> > > console messages.
> > > This exposed the pre-existing bug of mt7621.c using kzalloc before normal
> > > memory management was available.
> > > Prior to this slub change, there existed the unintended protection against
> > > "kmem_cache *s" being NULL as slab_pre_alloc_hook() happened to
> > > return NULL and bailed out of slab_alloc_node().
> > > This allowed mt7621 prom_soc_init to fail in the soc_dev_init kzalloc,
> > > but continue booting without this soc device.
> > >
> > > Console output from a DEBUG_ZBOOT vmlinuz kernel loading,
> > > with mm/slub modified to warn on kmem_cache zero or null:
> > >
> > > zimage at: 80B842A0 810B4BC0
> > > Uncompressing Linux at load address 80001000
> > > Copy device tree to address 80B80EE0
> > > Now, booting the kernel...
> > >
> > > [ 0.000000] Linux version 6.1.0-rc3+ (john@john)
> > > (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot
> > > 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed
> > > Nov 2 05:10:01 AEST 2022
> > > [ 0.000000] ------------[ cut here ]------------
> > > [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416
> > > kmem_cache_alloc+0x5a4/0x5e8
> > > [ 0.000000] Modules linked in:
> > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> > > [ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000
> > > 00000000 80889d04 80c90000
> > > [ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000
> > > 00000001 80889cb0 00000000
> > > [ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002
> > > 00000002 00000001 6d6f4320
> > > [ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328
> > > 00000000 00000000 00000000
> > > [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000
> > > 00000020 80010000 80010000
> > > [ 0.000000] ...
> > > [ 0.000000] Call Trace:
> > > [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> > > [ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> > > [ 0.000000] [<8002e184>] __warn+0xc4/0xf8
> > > [ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> > > [ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> > > [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> > > [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> > > [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> > > [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> > > [ 0.000000]
> > > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> > > [ 0.000000] printk: bootconsole [early0] enabled
>
> The last version I tested on my GnuBee PC1 mt7621 board was v6.0, and
> it booted properly.

I have verified that with 6.1.0-rc1 the system does not boot, as you pointed out here.
After adding your patches the system boots, but then hits an Oops via
soc_device_match_attr:

[ 20.569959] CPU 0 Unable to handle kernel paging request at virtual
address 675f6b6c, epc == 80403dec, ra == 804ae11c
[ 20.591060] Oops[#1]:
[ 20.595462] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1+ #148
[ 20.608265] $ 0 : 00000000 00000001 82262a00 00000000
[ 20.618615] $ 4 : 675f6b6c 808dea04 00000000 804ae138
[ 20.628983] $ 8 : 00000000 808787ba 00000000 821f4b00
[ 20.639351] $12 : 0000005b 0000005d 0000002d 0000005c
[ 20.649735] $16 : 82253580 807b4034 807b4034 804ae138
[ 20.660087] $20 : fffffff4 82c382b8 809e1094 00000008
[ 20.670455] $24 : 0000002a 0000003f
[ 20.680823] $28 : 82050000 82051c30 80a0d638 804ae11c
[ 20.691190] Hi : 00000037
[ 20.696891] Lo : 5c28f6a0
[ 20.702610] epc : 80403dec glob_match+0x1c/0x240
[ 20.712100] ra : 804ae11c soc_device_match_attr+0xac/0xc8
[ 20.723330] Status: 11000403 KERNEL EXL IE
[ 20.731626] Cause : 40800008 (ExcCode 02)
[ 20.739576] BadVA : 675f6b6c
[ 20.745277] PrId : 0001992f (MIPS 1004Kc)
[ 20.753414] Modules linked in:
[ 20.759448] Process swapper/0 (pid: 1, threadinfo=(ptrval),
task=(ptrval), tls=00000000)
[ 20.775520] Stack : fffffff4 80496ab8 820c6010 828c8518 80950000
ffffffea 80950000 80496b48
[ 20.792106] 00000000 828c8400 820c6010 821f4880 1e160000
821bc754 82253734 7f8268e6
[ 20.808707] 809c6a94 807b4034 804ae138 809c8e88 819a0000
804ae1d8 80a0d638 80438e10
[ 20.825282] 821f3e70 80950000 808c0000 828c8400 820c6000
828c8548 820c6010 80456608
[ 20.841879] 821f3dc0 821d32c0 819a0000 801d8768 821f3dc0
821d32c0 828c8540 80950000
[ 20.858473] ...
[ 20.863298] Call Trace:
[ 20.868137] [<80403dec>] glob_match+0x1c/0x240
[ 20.876955] [<804ae11c>] soc_device_match_attr+0xac/0xc8
[ 20.887500] [<80496b48>] bus_for_each_dev+0x7c/0xc0
[ 20.897176] [<804ae1d8>] soc_device_match+0x98/0xc8
[ 20.906869] [<80456608>] mt7621_pcie_probe+0x90/0x7b8
[ 20.916876] [<8049b46c>] platform_probe+0x54/0x94
[ 20.926206] [<80499058>] really_probe+0x200/0x434
[ 20.935538] [<80499520>] driver_probe_device+0x44/0xd4
[ 20.945732] [<80499ae0>] __driver_attach+0xb8/0x1b0
[ 20.955428] [<80496b48>] bus_for_each_dev+0x7c/0xc0
[ 20.965089] [<80497f18>] bus_add_driver+0x100/0x218
[ 20.974763] [<8049a338>] driver_register+0xd0/0x118
[ 20.984438] [<80001590>] do_one_initcall+0x8c/0x28c
[ 20.994115] [<809e21c8>] kernel_init_freeable+0x254/0x28c
[ 21.004845] [<80781070>] kernel_init+0x24/0x118
[ 21.013830] [<800034f8>] ret_from_kernel_thread+0x14/0x1c
[ 21.024522]
[ 21.027457] Code: 240f005c 2418002a 2419003f <80820000> 24a90001
90a70000 104c006f 24860001 2843005c
[ 21.046810]
[ 21.049830] ---[ end trace 0000000000000000 ]---
[ 21.058935] Kernel panic - not syncing: Fatal exception
[ 21.069310] Rebooting in 1 seconds..

I have fixed this by adding a sentinel in each of the following two files:

drivers/pci/controller/pcie-mt7621.c
drivers/phy/ralink/phy-mt7621-pci.c

sergio@camaron:~/GNUBEE-SERGIO-TEST/linux$ git diff drivers/pci/controller/pcie-mt7621.c drivers/phy/ralink/phy-mt7621-pci.c
diff --git a/drivers/pci/controller/pcie-mt7621.c b/drivers/pci/controller/pcie-mt7621.c
index 4bd1abf26008..ee7aad09d627 100644
--- a/drivers/pci/controller/pcie-mt7621.c
+++ b/drivers/pci/controller/pcie-mt7621.c
@@ -466,7 +466,8 @@ static int mt7621_pcie_register_host(struct pci_host_bridge *host)
}

static const struct soc_device_attribute mt7621_pcie_quirks_match[] = {
- { .soc_id = "mt7621", .revision = "E2" }
+ { .soc_id = "mt7621", .revision = "E2" },
+ { /* sentinel */ }
};

static int mt7621_pcie_probe(struct platform_device *pdev)
diff --git a/drivers/phy/ralink/phy-mt7621-pci.c b/drivers/phy/ralink/phy-mt7621-pci.c
index 5e6530f545b5..85888ab2d307 100644
--- a/drivers/phy/ralink/phy-mt7621-pci.c
+++ b/drivers/phy/ralink/phy-mt7621-pci.c
@@ -280,7 +280,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct device *dev,
}

static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
- { .soc_id = "mt7621", .revision = "E2" }
+ { .soc_id = "mt7621", .revision = "E2" },
+ { /* sentinel */ }
};

static const struct regmap_config mt7621_pci_phy_regmap_config = {

With these two minor changes and your patches, the system boots and
behaves properly.

So FWIW feel free to add my:

Tested-by: Sergio Paracuellos <[email protected]>
Acked-by: Sergio Paracuellos <[email protected]>

Please, let me know if you want me to send any patches or if you are
going to create a complete patchset with all the needed changes.

Thank you very much for doing this!

Best regards,
Sergio Paracuellos

[snip]

2022-11-04 21:52:45

by John Thomson

Subject: Re: [RFC PATCH 3/3] mips: ralink: mt7621: do not use kzalloc too early

On Fri, 4 Nov 2022, at 12:29, Sergio Paracuellos wrote:

> I have verified that with 6.1.0-rc1 the system does not boot, as you pointed out here.
> After adding your patches the system boots, but then hits an Oops via
> soc_device_match_attr:
>
> [ 20.569959] CPU 0 Unable to handle kernel paging request at virtual
> address 675f6b6c, epc == 80403dec, ra == 804ae11c
> [ 20.591060] Oops[#1]:
> [ 20.595462] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1+ #148
> [ 20.608265] $ 0 : 00000000 00000001 82262a00 00000000
> [ 20.618615] $ 4 : 675f6b6c 808dea04 00000000 804ae138
> [ 20.628983] $ 8 : 00000000 808787ba 00000000 821f4b00
> [ 20.639351] $12 : 0000005b 0000005d 0000002d 0000005c
> [ 20.649735] $16 : 82253580 807b4034 807b4034 804ae138
> [ 20.660087] $20 : fffffff4 82c382b8 809e1094 00000008
> [ 20.670455] $24 : 0000002a 0000003f
> [ 20.680823] $28 : 82050000 82051c30 80a0d638 804ae11c
> [ 20.691190] Hi : 00000037
> [ 20.696891] Lo : 5c28f6a0
> [ 20.702610] epc : 80403dec glob_match+0x1c/0x240
> [ 20.712100] ra : 804ae11c soc_device_match_attr+0xac/0xc8
> [ 20.723330] Status: 11000403 KERNEL EXL IE
> [ 20.731626] Cause : 40800008 (ExcCode 02)
> [ 20.739576] BadVA : 675f6b6c
> [ 20.745277] PrId : 0001992f (MIPS 1004Kc)
> [ 20.753414] Modules linked in:
> [ 20.759448] Process swapper/0 (pid: 1, threadinfo=(ptrval),
> task=(ptrval), tls=00000000)
> [ 20.775520] Stack : fffffff4 80496ab8 820c6010 828c8518 80950000
> ffffffea 80950000 80496b48
> [ 20.792106] 00000000 828c8400 820c6010 821f4880 1e160000
> 821bc754 82253734 7f8268e6
> [ 20.808707] 809c6a94 807b4034 804ae138 809c8e88 819a0000
> 804ae1d8 80a0d638 80438e10
> [ 20.825282] 821f3e70 80950000 808c0000 828c8400 820c6000
> 828c8548 820c6010 80456608
> [ 20.841879] 821f3dc0 821d32c0 819a0000 801d8768 821f3dc0
> 821d32c0 828c8540 80950000
> [ 20.858473] ...
> [ 20.863298] Call Trace:
> [ 20.868137] [<80403dec>] glob_match+0x1c/0x240
> [ 20.876955] [<804ae11c>] soc_device_match_attr+0xac/0xc8
> [ 20.887500] [<80496b48>] bus_for_each_dev+0x7c/0xc0
> [ 20.897176] [<804ae1d8>] soc_device_match+0x98/0xc8
> [ 20.906869] [<80456608>] mt7621_pcie_probe+0x90/0x7b8
> [ 20.916876] [<8049b46c>] platform_probe+0x54/0x94
> [ 20.926206] [<80499058>] really_probe+0x200/0x434
> [ 20.935538] [<80499520>] driver_probe_device+0x44/0xd4
> [ 20.945732] [<80499ae0>] __driver_attach+0xb8/0x1b0
> [ 20.955428] [<80496b48>] bus_for_each_dev+0x7c/0xc0
> [ 20.965089] [<80497f18>] bus_add_driver+0x100/0x218
> [ 20.974763] [<8049a338>] driver_register+0xd0/0x118
> [ 20.984438] [<80001590>] do_one_initcall+0x8c/0x28c
> [ 20.994115] [<809e21c8>] kernel_init_freeable+0x254/0x28c
> [ 21.004845] [<80781070>] kernel_init+0x24/0x118
> [ 21.013830] [<800034f8>] ret_from_kernel_thread+0x14/0x1c
> [ 21.024522]
> [ 21.027457] Code: 240f005c 2418002a 2419003f <80820000> 24a90001
> 90a70000 104c006f 24860001 2843005c
> [ 21.046810]
> [ 21.049830] ---[ end trace 0000000000000000 ]---
> [ 21.058935] Kernel panic - not syncing: Fatal exception
> [ 21.069310] Rebooting in 1 seconds..
>
> I have fixed this adding two sentinels in the following files:
>
> drivers/pci/controller/pcie-mt7621.c
> drivers/phy/ralink/phy-mt7621-pci.c
>
> sergio@camaron:~/GNUBEE-SERGIO-TEST/linux$ git diff
> drivers/pci/controller/pcie-mt7621.c
> drivers/phy/ralink/phy-mt7621-pci.c
> diff --git a/drivers/pci/controller/pcie-mt7621.c
> b/drivers/pci/controller/pcie-mt7621.c
> index 4bd1abf26008..ee7aad09d627 100644
> --- a/drivers/pci/controller/pcie-mt7621.c
> +++ b/drivers/pci/controller/pcie-mt7621.c
> @@ -466,7 +466,8 @@ static int mt7621_pcie_register_host(struct
> pci_host_bridge *host)
> }
>
> static const struct soc_device_attribute mt7621_pcie_quirks_match[] = {
> - { .soc_id = "mt7621", .revision = "E2" }
> + { .soc_id = "mt7621", .revision = "E2" },
> + { /* sentinel */ }
> };
>
> static int mt7621_pcie_probe(struct platform_device *pdev)
> diff --git a/drivers/phy/ralink/phy-mt7621-pci.c
> b/drivers/phy/ralink/phy-mt7621-pci.c
> index 5e6530f545b5..85888ab2d307 100644
> --- a/drivers/phy/ralink/phy-mt7621-pci.c
> +++ b/drivers/phy/ralink/phy-mt7621-pci.c
> @@ -280,7 +280,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct
> device *dev,
> }
>
> static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
> - { .soc_id = "mt7621", .revision = "E2" }
> + { .soc_id = "mt7621", .revision = "E2" },
> + { /* sentinel */ }
> };
>
> static const struct regmap_config mt7621_pci_phy_regmap_config = {
>
> With this two minor changes and your patches the system properly boots
> and behaves properly.

Thank you for finding the time to test and verify this.

>
> So FWIW feel free to add my:
>
> Tested-by: Sergio Paracuellos <[email protected]>
> Acked-by: Sergio Paracuellos <[email protected]>
>
> Please, let me know if you want me to send any patches or if you are
> going to create a complete patchset with all the needed changes.

I sent in these two patches with Fixes tags, along with some queries about
getting those pci & phy changes merged before this fix, and possibly into the
5.10 and 5.15 stable trees as well, in case we want this too-early kzalloc
fix backported too. Please let me know what you think.

>
> Thank you very much for doing this!
>
> Best regards,
> Sergio Paracuellos
>
> [snip]

Some more queries here:
- I should add a note to this commit message that the boot failure only
  happens with CONFIG_SLUB=y.
- Should this carry a Fixes reference or not?
  Fixes: 71b9b5e0130d ("MIPS: ralink: mt7621: introduce 'soc_device' initialization")
- I used device_initcall, but postcore_initcall also works fine, and I am
  not sure of the difference here.


Cheers,
--
John Thomson

2022-11-05 07:04:46

by Sergio Paracuellos

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] mips: ralink: mt7621: do not use kzalloc too early

Hi John,

On Fri, Nov 4, 2022 at 10:13 PM John Thomson
<[email protected]> wrote:
>
> On Fri, 4 Nov 2022, at 12:29, Sergio Paracuellos wrote:
>
> > I have verified with 6.1.0-rc1 system does not boot as you was pointed out here.
> > After adding your patches the system boots and got an Oops because
> > soc_device_match_attr:
> >
> > [ 20.569959] CPU 0 Unable to handle kernel paging request at virtual
> > address 675f6b6c, epc == 80403dec, ra == 804ae11c
> > [ 20.591060] Oops[#1]:
> > [ 20.595462] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1+ #148
> > [ 20.608265] $ 0 : 00000000 00000001 82262a00 00000000
> > [ 20.618615] $ 4 : 675f6b6c 808dea04 00000000 804ae138
> > [ 20.628983] $ 8 : 00000000 808787ba 00000000 821f4b00
> > [ 20.639351] $12 : 0000005b 0000005d 0000002d 0000005c
> > [ 20.649735] $16 : 82253580 807b4034 807b4034 804ae138
> > [ 20.660087] $20 : fffffff4 82c382b8 809e1094 00000008
> > [ 20.670455] $24 : 0000002a 0000003f
> > [ 20.680823] $28 : 82050000 82051c30 80a0d638 804ae11c
> > [ 20.691190] Hi : 00000037
> > [ 20.696891] Lo : 5c28f6a0
> > [ 20.702610] epc : 80403dec glob_match+0x1c/0x240
> > [ 20.712100] ra : 804ae11c soc_device_match_attr+0xac/0xc8
> > [ 20.723330] Status: 11000403 KERNEL EXL IE
> > [ 20.731626] Cause : 40800008 (ExcCode 02)
> > [ 20.739576] BadVA : 675f6b6c
> > [ 20.745277] PrId : 0001992f (MIPS 1004Kc)
> > [ 20.753414] Modules linked in:
> > [ 20.759448] Process swapper/0 (pid: 1, threadinfo=(ptrval),
> > task=(ptrval), tls=00000000)
> > [ 20.775520] Stack : fffffff4 80496ab8 820c6010 828c8518 80950000
> > ffffffea 80950000 80496b48
> > [ 20.792106] 00000000 828c8400 820c6010 821f4880 1e160000
> > 821bc754 82253734 7f8268e6
> > [ 20.808707] 809c6a94 807b4034 804ae138 809c8e88 819a0000
> > 804ae1d8 80a0d638 80438e10
> > [ 20.825282] 821f3e70 80950000 808c0000 828c8400 820c6000
> > 828c8548 820c6010 80456608
> > [ 20.841879] 821f3dc0 821d32c0 819a0000 801d8768 821f3dc0
> > 821d32c0 828c8540 80950000
> > [ 20.858473] ...
> > [ 20.863298] Call Trace:
> > [ 20.868137] [<80403dec>] glob_match+0x1c/0x240
> > [ 20.876955] [<804ae11c>] soc_device_match_attr+0xac/0xc8
> > [ 20.887500] [<80496b48>] bus_for_each_dev+0x7c/0xc0
> > [ 20.897176] [<804ae1d8>] soc_device_match+0x98/0xc8
> > [ 20.906869] [<80456608>] mt7621_pcie_probe+0x90/0x7b8
> > [ 20.916876] [<8049b46c>] platform_probe+0x54/0x94
> > [ 20.926206] [<80499058>] really_probe+0x200/0x434
> > [ 20.935538] [<80499520>] driver_probe_device+0x44/0xd4
> > [ 20.945732] [<80499ae0>] __driver_attach+0xb8/0x1b0
> > [ 20.955428] [<80496b48>] bus_for_each_dev+0x7c/0xc0
> > [ 20.965089] [<80497f18>] bus_add_driver+0x100/0x218
> > [ 20.974763] [<8049a338>] driver_register+0xd0/0x118
> > [ 20.984438] [<80001590>] do_one_initcall+0x8c/0x28c
> > [ 20.994115] [<809e21c8>] kernel_init_freeable+0x254/0x28c
> > [ 21.004845] [<80781070>] kernel_init+0x24/0x118
> > [ 21.013830] [<800034f8>] ret_from_kernel_thread+0x14/0x1c
> > [ 21.024522]
> > [ 21.027457] Code: 240f005c 2418002a 2419003f <80820000> 24a90001
> > 90a70000 104c006f 24860001 2843005c
> > [ 21.046810]
> > [ 21.049830] ---[ end trace 0000000000000000 ]---
> > [ 21.058935] Kernel panic - not syncing: Fatal exception
> > [ 21.069310] Rebooting in 1 seconds..
> >
> > I have fixed this adding two sentinels in the following files:
> >
> > drivers/pci/controller/pcie-mt7621.c
> > drivers/phy/ralink/phy-mt7621-pci.c
> >
> > sergio@camaron:~/GNUBEE-SERGIO-TEST/linux$ git diff
> > drivers/pci/controller/pcie-mt7621.c
> > drivers/phy/ralink/phy-mt7621-pci.c
> > diff --git a/drivers/pci/controller/pcie-mt7621.c
> > b/drivers/pci/controller/pcie-mt7621.c
> > index 4bd1abf26008..ee7aad09d627 100644
> > --- a/drivers/pci/controller/pcie-mt7621.c
> > +++ b/drivers/pci/controller/pcie-mt7621.c
> > @@ -466,7 +466,8 @@ static int mt7621_pcie_register_host(struct
> > pci_host_bridge *host)
> > }
> >
> > static const struct soc_device_attribute mt7621_pcie_quirks_match[] = {
> > - { .soc_id = "mt7621", .revision = "E2" }
> > + { .soc_id = "mt7621", .revision = "E2" },
> > + { /* sentinel */ }
> > };
> >
> > static int mt7621_pcie_probe(struct platform_device *pdev)
> > diff --git a/drivers/phy/ralink/phy-mt7621-pci.c
> > b/drivers/phy/ralink/phy-mt7621-pci.c
> > index 5e6530f545b5..85888ab2d307 100644
> > --- a/drivers/phy/ralink/phy-mt7621-pci.c
> > +++ b/drivers/phy/ralink/phy-mt7621-pci.c
> > @@ -280,7 +280,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct
> > device *dev,
> > }
> >
> > static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
> > - { .soc_id = "mt7621", .revision = "E2" }
> > + { .soc_id = "mt7621", .revision = "E2" },
> > + { /* sentinel */ }
> > };
> >
> > static const struct regmap_config mt7621_pci_phy_regmap_config = {
> >
> > With this two minor changes and your patches the system properly boots
> > and behaves properly.
>
> Thank you for finding time, and testing and verifying this.
>
> >
> > So FWIW feel free to add my:
> >
> > Tested-by: Sergio Paracuellos <[email protected]>
> > Acked-by: Sergio Paracuellos <[email protected]>
> >
> > Please, let me know if you want me to send any patches or if you are
> > going to create a complete patchset with all the needed changes.
>
> I sent in these two patches with Fixes tags, and some queries about getting
> those pci & phy changes in before this fix, and also possibly in the 5.10 and 5.15 stable trees,
> in case we want this kzalloc change too early backported as well? Please let me know what you think.

I don't really know. I don't think the kzalloc patches are stable
material, so I don't think there is a real need to backport these
two either. Also, this SoC is used intensively by the OpenWrt community,
which has never reported an issue like this, and they are using both the
5.10 (stable) and 5.15 (development) kernels.

>
> >
> > Thank you very much for doing this!
> >
> > Best regards,
> > Sergio Paracuellos
> >
> > [snip]
>
> Some more queries here:
> I should add a note in this commit message that this boot failure only happens with CONFIG_SLUB=y

It does not hurt to add this, and it will surely be helpful if issues
appear in the future.

> Fixes reference or not?
> Fixes 71b9b5e0130d ("MIPS: ralink: mt7621: introduce 'soc_device' initialization")

I guess it should be there as well.

> I used device_initcall, but postcore_initcall also works fine, and I am not sure of the difference here.

The difference is the execution order at boot: postcore_initcall is
executed earlier than device_initcall. See [0] for details.

Thanks.
Sergio Paracuellos

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/init/main.c
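
To illustrate the ordering point above, here is a small userspace model of initcall levels, not kernel code. The level order (pure, core, postcore, arch, subsys, fs, device, late) follows the kernel's initcall macros; `run_initcalls()`, `early_step()`, `late_step()` and the recording array are hypothetical names introduced only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Initcall levels in the order the kernel runs them at boot. */
enum initcall_level {
	LEVEL_PURE = 0,
	LEVEL_CORE,
	LEVEL_POSTCORE,
	LEVEL_ARCH,
	LEVEL_SUBSYS,
	LEVEL_FS,
	LEVEL_DEVICE,
	LEVEL_LATE,
	LEVEL_COUNT
};

typedef int (*initcall_fn)(void);

struct initcall_entry {
	enum initcall_level level;
	initcall_fn fn;
};

#define MAX_RECORDED 8

/* Records the level of each call in the order it actually ran. */
static int run_order[MAX_RECORDED];
static size_t run_count;

static int early_step(void)
{
	assert(run_count < MAX_RECORDED);
	run_order[run_count++] = LEVEL_POSTCORE;
	return 0;
}

static int late_step(void)
{
	assert(run_count < MAX_RECORDED);
	run_order[run_count++] = LEVEL_DEVICE;
	return 0;
}

/* Run every registered call level by level, lowest level first. */
static void run_initcalls(const struct initcall_entry *calls, size_t n)
{
	assert(calls != NULL);
	for (int level = 0; level < LEVEL_COUNT; level++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == level)
				calls[i].fn();
}

/* Registered in "wrong" order on purpose: the level decides, not the list. */
static const struct initcall_entry demo_calls[] = {
	{ LEVEL_DEVICE, late_step },
	{ LEVEL_POSTCORE, early_step },
};
```

Running run_initcalls(demo_calls, 2) executes early_step() before late_step() even though it is listed second, which mirrors how a postcore_initcall completes before any device_initcall starts.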

>
>
> Cheers,
> --
> John Thomson