2021-06-15 11:13:30

by Naresh Kamboju

Subject: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

The following kernel crash was reported while booting the linux-next
20210615 tag on qemu_arm64 with an allmodconfig build.

[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[ 0.000000] Linux version 5.13.0-rc6-next-20210615 (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1) 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP PREEMPT Tue Jun 15 10:20:51 UTC 2021
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] printk: bootconsole [pl11] enabled
[ 0.000000] efi: UEFI not found.
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
[ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G T 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
[ 0.000000] Hardware name: linux,dummy-virt (DT)
[ 0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
[ 0.000000] pc : __phys_addr_symbol+0x44/0xc0
[ 0.000000] lr : __phys_addr_symbol+0x44/0xc0
[ 0.000000] sp : ffff800014287b00
[ 0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
[ 0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
[ 0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
[ 0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
[ 0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
[ 0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
[ 0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
[ 0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
[ 0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
[ 0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 0.000000] Call trace:
[ 0.000000] __phys_addr_symbol+0x44/0xc0
[ 0.000000] sparse_init_nid+0x98/0x6d0
[ 0.000000] sparse_init+0x460/0x4d4
[ 0.000000] bootmem_init+0x110/0x340
[ 0.000000] setup_arch+0x1b8/0x2e0
[ 0.000000] start_kernel+0x110/0x870
[ 0.000000] __primary_switched+0xa8/0xb0
[ 0.000000] Code: 940ccf23 eb13029f 54000069 940cce60 (d4210000)
[ 0.000000] random: get_random_bytes called from oops_exit+0x54/0xc0 with crng_init=0
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
[ 0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

Reported-by: Naresh Kamboju <[email protected]>

--
Linaro LKFT
https://lkft.linaro.org


2021-06-15 11:52:29

by Will Deacon

Subject: Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

Hi Naresh,

On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> Following kernel crash reported while boot linux next 20210615 tag on qemu_arm64
> with allmodconfig build.
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> [ 0.000000] Linux version 5.13.0-rc6-next-20210615
> (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1)
> 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP
> PREEMPT Tue Jun 15 10:20:51 UTC 2021
> [ 0.000000] Machine model: linux,dummy-virt
> [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> [ 0.000000] printk: bootconsole [pl11] enabled
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] NUMA: No NUMA configuration found
> [ 0.000000] NUMA: Faking a node at [mem
> 0x0000000040000000-0x00000000bfffffff]
> [ 0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
> [ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G T
> 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
> [ 0.000000] Hardware name: linux,dummy-virt (DT)
> [ 0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> [ 0.000000] pc : __phys_addr_symbol+0x44/0xc0
> [ 0.000000] lr : __phys_addr_symbol+0x44/0xc0
> [ 0.000000] sp : ffff800014287b00
> [ 0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
> [ 0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
> [ 0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
> [ 0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
> [ 0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
> [ 0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
> [ 0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
> [ 0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
> [ 0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
> [ 0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> [ 0.000000] Call trace:
> [ 0.000000] __phys_addr_symbol+0x44/0xc0
> [ 0.000000] sparse_init_nid+0x98/0x6d0
> [ 0.000000] sparse_init+0x460/0x4d4
> [ 0.000000] bootmem_init+0x110/0x340
> [ 0.000000] setup_arch+0x1b8/0x2e0
> [ 0.000000] start_kernel+0x110/0x870
> [ 0.000000] __primary_switched+0xa8/0xb0
> [ 0.000000] Code: 940ccf23 eb13029f 54000069 940cce60 (d4210000)
> [ 0.000000] random: get_random_bytes called from
> oops_exit+0x54/0xc0 with crng_init=0
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
> [ 0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal
> exception ]---
>
> Reported-by: Naresh Kamboju <[email protected]>

Thanks for the report, although since this appears to be part of a broader
testing effort, here are some things that I think would make the reports
even more useful:

1. An indication as to whether or not this is a regression (i.e. do you
have a known good build, perhaps even a bisection?)

2. Either a link to the vmlinux, or faddr2line run on the backtrace.
Looking at the above, I can't tell what sparse_init_nid+0x98/0x6d0
actually is.

3. The exact QEMU command-line you are using, so I can try to reproduce
this locally. I think the 0-day bot wraps the repro up in a shell
script for you.

4. Whether or not the issue is reproducible.

5. Information about the toolchain you used to build the kernel (it
happens to be present here because it's in the kernel log, but
generally I think it would be handy to specify that in the report).

Please can you provide that information for this crash? It would really
help in debugging it.

Thanks!

Will

2021-06-15 12:50:00

by Mark Rutland

Subject: Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> Following kernel crash reported while boot linux next 20210615 tag on qemu_arm64
> with allmodconfig build.
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> [ 0.000000] Linux version 5.13.0-rc6-next-20210615
> (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1)
> 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP
> PREEMPT Tue Jun 15 10:20:51 UTC 2021
> [ 0.000000] Machine model: linux,dummy-virt
> [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> [ 0.000000] printk: bootconsole [pl11] enabled
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] NUMA: No NUMA configuration found
> [ 0.000000] NUMA: Faking a node at [mem
> 0x0000000040000000-0x00000000bfffffff]
> [ 0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
> [ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G T
> 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
> [ 0.000000] Hardware name: linux,dummy-virt (DT)
> [ 0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> [ 0.000000] pc : __phys_addr_symbol+0x44/0xc0
> [ 0.000000] lr : __phys_addr_symbol+0x44/0xc0
> [ 0.000000] sp : ffff800014287b00
> [ 0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
> [ 0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
> [ 0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
> [ 0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
> [ 0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
> [ 0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
> [ 0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
> [ 0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
> [ 0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
> [ 0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> [ 0.000000] Call trace:
> [ 0.000000] __phys_addr_symbol+0x44/0xc0
> [ 0.000000] sparse_init_nid+0x98/0x6d0

From the looks of it, this is pgdat_to_phys, as introduced in next
commit:

e1db6ef7336d817c ("mm/sparse: fix check_usemap_section_nr warnings")

It appears that allmodconfig doesn't have CONFIG_NEED_MULTIPLE_NODES=y,
but does have CONFIG_NUMA=y, and so *does* use the dynamically-allocated
node_data array (since contig_page_data is only defined for !NUMA).

I don't think that commit is correct.
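
To illustrate why a dynamically allocated pgdat trips this check, here is a
minimal userspace sketch of the range test that sits behind __pa_symbol().
It is a model under stated assumptions, not the kernel source: the image
bounds are assumed to be the values held in x20 and x24 in the register
dump, the failing pointer is assumed to be the value in x19, and the
PAGE_OFFSET and PHYS_OFFSET values are assumed for a 48-bit VA build with
RAM starting at 0x40000000. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, read off the register dump and boot log in the report. */
#define MODEL_KERNEL_START 0xffff800010000000ULL /* x20: assumed start of image */
#define MODEL_KERNEL_END   0xffff8000145a8000ULL /* x24: assumed end of image */
#define MODEL_PAGE_OFFSET  0xffff000000000000ULL /* assumed base of the linear map */
#define MODEL_PHYS_OFFSET  0x0000000040000000ULL /* first byte of RAM in the log */

/* Model of the bounds check: only kernel image addresses are acceptable. */
static inline bool model_in_kernel_image(uint64_t vaddr)
{
	return vaddr >= MODEL_KERNEL_START && vaddr <= MODEL_KERNEL_END;
}

/* Model of a linear-map translation, the kind used for memblock allocations. */
static inline uint64_t model_linear_to_phys(uint64_t vaddr)
{
	assert(vaddr >= MODEL_PAGE_OFFSET);
	return vaddr - MODEL_PAGE_OFFSET + MODEL_PHYS_OFFSET;
}
```

Under these assumptions, 0xffff00007fc00d40 lies outside the image range, so
the symbol-only check fails, whereas the linear-map translation gives
0xbfc00d40, which matches the start of the NODE_DATA range in the boot log.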

Thanks,
Mark.


2021-06-15 13:23:19

by Mark Rutland

Subject: Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

On Tue, Jun 15, 2021 at 01:47:45PM +0100, Mark Rutland wrote:
> On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> > Following kernel crash reported while boot linux next 20210615 tag on qemu_arm64
> > with allmodconfig build.
> >
> > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> > [ 0.000000] Linux version 5.13.0-rc6-next-20210615
> > (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1)
> > 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP
> > PREEMPT Tue Jun 15 10:20:51 UTC 2021
> > [ 0.000000] Machine model: linux,dummy-virt
> > [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> > [ 0.000000] printk: bootconsole [pl11] enabled
> > [ 0.000000] efi: UEFI not found.
> > [ 0.000000] NUMA: No NUMA configuration found
> > [ 0.000000] NUMA: Faking a node at [mem
> > 0x0000000040000000-0x00000000bfffffff]
> > [ 0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
> > [ 0.000000] ------------[ cut here ]------------
> > [ 0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
> > [ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G T
> > 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
> > [ 0.000000] Hardware name: linux,dummy-virt (DT)
> > [ 0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> > [ 0.000000] pc : __phys_addr_symbol+0x44/0xc0
> > [ 0.000000] lr : __phys_addr_symbol+0x44/0xc0
> > [ 0.000000] sp : ffff800014287b00
> > [ 0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
> > [ 0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
> > [ 0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
> > [ 0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
> > [ 0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
> > [ 0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
> > [ 0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
> > [ 0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
> > [ 0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
> > [ 0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> > [ 0.000000] Call trace:
> > [ 0.000000] __phys_addr_symbol+0x44/0xc0
> > [ 0.000000] sparse_init_nid+0x98/0x6d0
>
> From the looks of it, this is pgdat_to_phys, as introduced in next
> commit:
>
> e1db6ef7336d817c ("mm/sparse: fix check_usemap_section_nr warnings")
>
> It appears that allmodconfig doesn't have CONFIG_NEED_MULTIPLE_NODES=y,
> but does have CONFIG_NUMA=y, and so *does* use the dynamically-allocated
> node_data array (since contig_page_data is only defined for !NUMA).
>
> I don't think that commit is correct.

Looking some more, it looks like that's correct in isolation, but it
clashes with commit:

5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")

... and I reckon it'd be clearer and more robust to define
pgdat_to_phys() in the same ifdefs as contig_page_data so that
these stay in sync, e.g. have:

| #ifdef CONFIG_NUMA
| #define pgdat_to_phys(x) virt_to_phys(x)
| #else /* CONFIG_NUMA */
|
| extern struct pglist_data contig_page_data;
| ...
| #define pgdat_to_phys(x) __pa_symbol(&contig_page_data)
|
| #endif /* CONFIG_NUMA */

... which'd also make clear that contig_page_data is the *only* expected
pglist_data.
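
A compilable userspace sketch of that layout follows. It only models the
idea of keeping the macro next to the object it depends on: the stubs
model_virt_to_phys() and model_pa_symbol() are hypothetical stand-ins for
virt_to_phys() and __pa_symbol(), and they return a tag naming the
translation path rather than a real physical address. CONFIG_NUMA is left
undefined here, so the sketch models a !NUMA build.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

struct pglist_data {
	int node_id;
};

/* Tags naming which translation was used; they stand in for addresses. */
enum { MODEL_VIA_LINEAR_MAP = 1, MODEL_VIA_KERNEL_SYMBOL = 2 };

/* Hypothetical stub for virt_to_phys(): valid for linear-map pointers. */
static inline phys_addr_t model_virt_to_phys(const void *p)
{
	assert(p);
	return MODEL_VIA_LINEAR_MAP;
}

/* Hypothetical stub for __pa_symbol(): valid for kernel image symbols. */
static inline phys_addr_t model_pa_symbol(const void *p)
{
	assert(p);
	return MODEL_VIA_KERNEL_SYMBOL;
}

#ifdef CONFIG_NUMA
/* Per-node data is allocated at boot, so it lives in the linear map. */
#define pgdat_to_phys(x) model_virt_to_phys(x)
#else /* !CONFIG_NUMA */
/* The single static pgdat is a kernel symbol; the argument is ignored. */
static struct pglist_data contig_page_data;
#define pgdat_to_phys(x) model_pa_symbol(&contig_page_data)
#endif /* CONFIG_NUMA */
```

Because both definitions sit under one conditional, changing the condition
cannot leave the translation helper and contig_page_data out of step.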

Thanks,
Mark.


2021-06-15 14:51:41

by Qian Cai

Subject: Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c



On 6/15/2021 9:19 AM, Mark Rutland wrote:
> Looking some more, it looks like that's correct in isolation, but it
> clashes with commit:
>
> 5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")

Just a data point. Reverting the commit alone fixed the same crash for me.

2021-06-15 19:26:00

by Mike Rapoport

Subject: Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

On Tue, Jun 15, 2021 at 10:50:31AM -0400, Qian Cai wrote:
>
>
> On 6/15/2021 9:19 AM, Mark Rutland wrote:
> > Looking some more, it looks like that's correct in isolation, but it
> > clashes with commit:
> >
> > 5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")
>
> Just a data point. Reverting the commit alone fixed the same crash for me.

Yeah, that commit didn't take into account the change in
pgdat_to_phys().

The patch below should fix it. In the long run I think we should get rid of
contig_page_data and allocate NODE_DATA(0) for !NUMA case as well.

Andrew, can you please add this as a fixup to "mm: replace
CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA"?


diff --git a/mm/sparse.c b/mm/sparse.c
index a0e9cdb5bc38..6326cdf36c4f 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -347,7 +347,7 @@ size_t mem_section_usage_size(void)
 
 static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
 {
-#ifndef CONFIG_NEED_MULTIPLE_NODES
+#ifndef CONFIG_NUMA
 	return __pa_symbol(pgdat);
 #else
 	return __pa(pgdat);

--
Sincerely yours,
Mike.
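
The effect of the one-line change can be modelled in plain C. The sketch
below is not kernel code: it defines CONFIG_NUMA and leaves
CONFIG_NEED_MULTIPLE_NODES undefined, which is the combination the thread
describes for this allmodconfig build once the Kconfig symbol was replaced,
and it reports which branch each guard selects. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Modelled configuration: NUMA enabled, the old symbol no longer defined. */
#define CONFIG_NUMA 1
/* CONFIG_NEED_MULTIPLE_NODES is intentionally left undefined. */

/* True if the guard before the fix would route the pgdat to __pa_symbol(). */
static inline bool old_guard_uses_pa_symbol(void)
{
#ifndef CONFIG_NEED_MULTIPLE_NODES
	return true;
#else
	return false;
#endif
}

/* True if the guard after the fix would route the pgdat to __pa_symbol(). */
static inline bool new_guard_uses_pa_symbol(void)
{
#ifndef CONFIG_NUMA
	return true;
#else
	return false;
#endif
}

/* The crash needs the two guards to disagree for the same configuration. */
static inline bool guards_disagree(void)
{
	return old_guard_uses_pa_symbol() != new_guard_uses_pa_symbol();
}
```

In this model the old guard picks the __pa_symbol() branch even though NUMA
is enabled, while the new guard picks __pa(), which is the branch suited to
a pgdat that was allocated at boot.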

2021-06-15 23:36:42

by Stephen Rothwell

Subject: Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

Hi all,

On Tue, 15 Jun 2021 22:21:32 +0300 Mike Rapoport <[email protected]> wrote:
>
> On Tue, Jun 15, 2021 at 10:50:31AM -0400, Qian Cai wrote:
> >
> > On 6/15/2021 9:19 AM, Mark Rutland wrote:
> > > Looking some more, it looks like that's correct in isolation, but it
> > > clashes with commit:
> > >
> > > 5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")
> >
> > Just a data point. Reverting the commit alone fixed the same crash for me.
>
> Yeah, that commit didn't take into account the change in
> pgdat_to_phys().
>
> The patch below should fix it. In the long run I think we should get rid of
> contig_page_data and allocate NODE_DATA(0) for !NUMA case as well.
>
> Andrew, can you please add this as a fixup to "mm: replace
> CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA"?
>
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index a0e9cdb5bc38..6326cdf36c4f 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -347,7 +347,7 @@ size_t mem_section_usage_size(void)
>
> static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
> {
> -#ifndef CONFIG_NEED_MULTIPLE_NODES
> +#ifndef CONFIG_NUMA
> return __pa_symbol(pgdat);
> #else
> return __pa(pgdat);

Added to linux-next today.

--
Cheers,
Stephen Rothwell



2021-06-15 23:41:34

by Miles Chen

Subject: Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

On Wed, 2021-06-16 at 09:34 +1000, Stephen Rothwell wrote:
> Hi all,
>
> On Tue, 15 Jun 2021 22:21:32 +0300 Mike Rapoport <[email protected]> wrote:
> >
> > On Tue, Jun 15, 2021 at 10:50:31AM -0400, Qian Cai wrote:
> > >
> > > On 6/15/2021 9:19 AM, Mark Rutland wrote:
> > > > Looking some more, it looks like that's correct in isolation, but it
> > > > clashes with commit:
> > > >
> > > > 5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")
> > >
> > > Just a data point. Reverting the commit alone fixed the same crash for me.
> >
> > Yeah, that commit didn't take into account the change in
> > pgdat_to_phys().
> >
> > The patch below should fix it. In the long run I think we should get rid of
> > contig_page_data and allocate NODE_DATA(0) for !NUMA case as well.
> >
> > Andrew, can you please add this as a fixup to "mm: replace
> > CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA"?
> >
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index a0e9cdb5bc38..6326cdf36c4f 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -347,7 +347,7 @@ size_t mem_section_usage_size(void)
> >
> > static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
> > {
> > -#ifndef CONFIG_NEED_MULTIPLE_NODES
> > +#ifndef CONFIG_NUMA
> > return __pa_symbol(pgdat);
> > #else
> > return __pa(pgdat);
>
> Added to linux-next today.
>

Sorry for my late response.
Thanks for doing this.

Miles

2021-06-16 00:31:08

by Miles Chen

Subject: Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

On Tue, 2021-06-15 at 14:19 +0100, Mark Rutland wrote:
> On Tue, Jun 15, 2021 at 01:47:45PM +0100, Mark Rutland wrote:
> > On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> > > Following kernel crash reported while boot linux next 20210615 tag on qemu_arm64
> > > with allmodconfig build.
> > >
> > > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> > > [ 0.000000] Linux version 5.13.0-rc6-next-20210615
> > > (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1)
> > > 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP
> > > PREEMPT Tue Jun 15 10:20:51 UTC 2021
> > > [ 0.000000] Machine model: linux,dummy-virt
> > > [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> > > [ 0.000000] printk: bootconsole [pl11] enabled
> > > [ 0.000000] efi: UEFI not found.
> > > [ 0.000000] NUMA: No NUMA configuration found
> > > [ 0.000000] NUMA: Faking a node at [mem
> > > 0x0000000040000000-0x00000000bfffffff]
> > > [ 0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
> > > [ 0.000000] ------------[ cut here ]------------
> > > [ 0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
> > > [ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > > [ 0.000000] Modules linked in:
> > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G T
> > > 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
> > > [ 0.000000] Hardware name: linux,dummy-virt (DT)
> > > [ 0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> > > [ 0.000000] pc : __phys_addr_symbol+0x44/0xc0
> > > [ 0.000000] lr : __phys_addr_symbol+0x44/0xc0
> > > [ 0.000000] sp : ffff800014287b00
> > > [ 0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
> > > [ 0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
> > > [ 0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
> > > [ 0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
> > > [ 0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
> > > [ 0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
> > > [ 0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
> > > [ 0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
> > > [ 0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
> > > [ 0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> > > [ 0.000000] Call trace:
> > > [ 0.000000] __phys_addr_symbol+0x44/0xc0
> > > [ 0.000000] sparse_init_nid+0x98/0x6d0
> >
> > From the looks of it, this is pgdat_to_phys, as introduced in next
> > commit:
> >
> > e1db6ef7336d817c ("mm/sparse: fix check_usemap_section_nr warnings")
> >
> > It appears that allmodconfig doesn't have CONFIG_NEED_MULTIPLE_NODES=y,
> > but does have CONFIG_NUMA=y, and so *does* use the dynamically-allocated
> > node_data array (since contig_page_data is only defined for !NUMA).
> >
> > I don't think that commit is correct.
>
> Looking some more, it looks like that's correct in isolation, but it
> clashes with commit:
>
> 5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")
>
> ... and I reckon it'd be clearer and more robust to define
> pgdat_to_phys() in the same ifdefs as contig_page_data so that
> these stay in sync, e.g. have:
>
> | #ifdef CONFIG_NUMA
> | #define pgdat_to_phys(x) virt_to_phys(x)
> | #else /* CONFIG_NUMA */
> |
> | extern struct pglist_data contig_page_data;
> | ...
> | #define pgdat_to_phys(x) __pa_symbol(&contig_page_data)
> |
> | #endif /* CONFIG_NUMA */
>
> ... which'd also make clear that contig_page_data is the *only* expected
> pglist_data.

Thanks for your suggestion.
It looks clearer; I will submit another patch for this (after the
merge).

Miles


2021-06-17 17:24:40

by Naresh Kamboju

Subject: Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

Hi Will,

On Tue, 15 Jun 2021 at 17:20, Will Deacon <[email protected]> wrote:
>
> Hi Naresh,
>
> On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> > Following kernel crash reported while boot linux next 20210615 tag on qemu_arm64
> > with allmodconfig build.

<trim>

> Thanks for the report, although since this appears to be part of a broader
> testing effort, here are some things that I think would make the reports
> even more useful:
>
> 1. An indication as to whether or not this is a regression (i.e. do you
> have a known good build, perhaps even a bisection?)
>
> 2. Either a link to the vmlinux, or faddr2line run on the backtrace.
> Looking at the above, I can't tell what sparse_init_nid+0x98/0x6d0
> actually is.
>
> 3. The exact QEMU command-line you are using, so I can try to reproduce
> this locally. I think the 0-day bot wraps the repro up in a shell
> script for you.
>
> 4. Whether or not the issue is reproducible.
>
> 5. Information about the toolchain you used to build the kernel (it
> happens to be present here because it's in the kernel log, but
> generally I think it would be handy to specify that in the report).
>
> Please can you provide that information for this crash? It would really
> help in debugging it.

Sorry for the incomplete bug report.

Thanks for sharing these details.
Next time I will include the suggested data points in my email report.

- Naresh