On Mon, 16 May 2022 07:16:22 +0100,
Naresh Kamboju <[email protected]> wrote:
>
> A kernel crash was reported on the arm64 juno-r2 device with the
> kselftest-merge config while booting the Linux next-20220513 kernel [1].
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
> [ 0.000000] Linux version 5.18.0-rc6-next-20220513
> (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU
> Binutils) 2.30.0.20180208) #1 SMP PREEMPT Fri May 13 08:34:42 UTC 2022
> [ 0.000000] Machine model: ARM Juno development board (r2)
> [ 0.000000] earlycon: pl11 at MMIO 0x000000007ff80000 (options '')
> [ 0.000000] printk: bootconsole [pl11] enabled
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] NUMA: No NUMA configuration found
> [ 0.000000] NUMA: Faking a node at [mem
> 0x0000000080000000-0x00000009ffffffff]
> [ 0.000000] NUMA: NODE_DATA [mem 0x9fefce600-0x9fefd0fff]
> [ 0.000000] Zone ranges:
> [ 0.000000] DMA [mem 0x0000000080000000-0x00000000ffffffff]
> [ 0.000000] DMA32 empty
> [ 0.000000] Normal [mem 0x0000000100000000-0x00000009ffffffff]
> [ 0.000000] Movable zone start for each node
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x0000000080000000-0x00000000feffffff]
> [ 0.000000] node 0: [mem 0x0000000880000000-0x00000009ffffffff]
> [ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000009ffffffff]
> [ 0.000000] On node 0, zone Normal: 4096 pages in unavailable ranges
> [ 0.000000] cma: Reserved 32 MiB at 0x00000000fd000000
> [ 0.000000] psci: probing for conduit method from DT.
> [ 0.000000] psci: PSCIv1.1 detected in firmware.
> [ 0.000000] psci: Using standard PSCI v0.2 function IDs
> [ 0.000000] psci: Trusted OS migration not required
> [ 0.000000] psci: SMC Calling Convention v1.0
> [ 0.000000] percpu: Embedded 31 pages/cpu s89632 r8192 d29152 u126976
> [ 0.000000] pcpu-alloc: s89632 r8192 d29152 u126976 alloc=31*4096
> [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5
> [ 0.000000] Detected VIPT I-cache on CPU0
> [ 0.000000] CPU features: detected: ARM erratum 843419
> [ 0.000000] CPU features: detected: ARM erratum 845719
> [ 0.000000] Fallback order for Node 0: 0
> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 2060288
> [ 0.000000] Policy zone: Normal
> [ 0.000000] Kernel command line: console=ttyAMA0,115200n8
> root=/dev/nfs rw
> nfsroot=10.66.16.125:/var/lib/lava/dispatcher/tmp/5021955/extract-nfsrootfs-23zdukp_,tcp,hard,vers=3
> rootwait earlycon=pl011,0x7ff80000 debug systemd.log_target=null
> user_debug=31 androidboot.hardware=juno loglevel=9
> sky2.mac_address=0x00,0x02,0xF7,0x00,0x68,0x3F ip=dhcp
> [ 0.000000] Unknown kernel command line parameters
> "user_debug=31", will be passed to user space.
> [ 0.000000] Dentry cache hash table entries: 1048576 (order: 11,
> 8388608 bytes, linear)
> [ 0.000000] Inode-cache hash table entries: 524288 (order: 10,
> 4194304 bytes, linear)
> [ 0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
> [ 0.000000] Stack Depot early init allocating hash table with
> memblock_alloc, 8388608 bytes
> [ 0.000000] software IO TLB: mapped [mem
> 0x00000000f9000000-0x00000000fd000000] (64MB)
> [ 0.000000] Memory: 8038640K/8372224K available (22784K kernel
> code, 5468K rwdata, 11824K rodata, 11520K init, 11734K bss, 300816K
> reserved, 32768K cma-reserved)
> [ 0.000000] **********************************************************
> [ 0.000000] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
> [ 0.000000] ** **
> [ 0.000000] ** This system shows unhashed kernel memory addresses **
> [ 0.000000] ** via the console, logs, and other interfaces. This **
> [ 0.000000] ** might reduce the security of your system. **
> [ 0.000000] ** **
> [ 0.000000] ** If you see this message and you are not debugging **
> [ 0.000000] ** the kernel, report this immediately to your system **
> [ 0.000000] ** administrator! **
> [ 0.000000] ** **
> [ 0.000000] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
> [ 0.000000] **********************************************************
> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1
> [ 0.000000] ftrace: allocating 69398 entries in 272 pages
> [ 0.000000] ftrace: allocated 272 pages with 2 groups
> [ 0.000000] trace event string verifier disabled
> [ 0.000000] Running RCU self tests
> [ 0.000000] rcu: Preemptible hierarchical RCU implementation.
> [ 0.000000] rcu: RCU event tracing is enabled.
> [ 0.000000] rcu: RCU lockdep checking is enabled.
> [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=6.
> [ 0.000000] Trampoline variant of Tasks RCU enabled.
> [ 0.000000] Rude variant of Tasks RCU enabled.
> [ 0.000000] Tracing variant of Tasks RCU enabled.
> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> is 25 jiffies.
> [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=6
> [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
> [ 0.000000] Root IRQ handler: gic_handle_irq
> [ 0.000000] GIC: Using split EOI/Deactivate mode
> [ 0.000000] Unexpected kernel BRK exception at EL1
Huh. Who inserts random BRKs like this?
> [ 0.000000] Internal error: BRK handler: f20003e8 [#1] PREEMPT SMP
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 5.18.0-rc6-next-20220513 #1
> [ 0.000000] Hardware name: ARM Juno development board (r2) (DT)
> [ 0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 0.000000] pc : gic_dist_config+0x4c/0x68
> [ 0.000000] lr : gic_init_bases+0xd4/0x248
Please provide a disassembly of this function.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
Hi Marc,
Thanks for looking into this report.
On Mon, 16 May 2022 at 12:38, Marc Zyngier <[email protected]> wrote:
>
> On Mon, 16 May 2022 07:16:22 +0100,
> Naresh Kamboju <[email protected]> wrote:
> >
> > A kernel crash was reported on the arm64 juno-r2 device with the
> > kselftest-merge config while booting the Linux next-20220513 kernel [1].
<trim>
>
> Huh. Who inserts random BRKs like this?
>
> > [ 0.000000] Internal error: BRK handler: f20003e8 [#1] PREEMPT SMP
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> > 5.18.0-rc6-next-20220513 #1
> > [ 0.000000] Hardware name: ARM Juno development board (r2) (DT)
> > [ 0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 0.000000] pc : gic_dist_config+0x4c/0x68
> > [ 0.000000] lr : gic_init_bases+0xd4/0x248
>
> Please provide a disassembly of this function.
The objdump snippet is here:
http://ix.io/3XUW
The vmlinux file is available at the URL below.
Please make use of it.
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/juno/lkft/linux-next/1226/
- Naresh
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
On Mon, 16 May 2022 14:58:28 +0100,
Naresh Kamboju <[email protected]> wrote:
>
> Hi Marc,
>
> Thanks for looking into this report.
>
> On Mon, 16 May 2022 at 12:38, Marc Zyngier <[email protected]> wrote:
> >
> > On Mon, 16 May 2022 07:16:22 +0100,
> > Naresh Kamboju <[email protected]> wrote:
> > >
> > > A kernel crash was reported on the arm64 juno-r2 device with the
> > > kselftest-merge config while booting the Linux next-20220513 kernel [1].
>
> <trim>
>
> >
> > Huh. Who inserts random BRKs like this?
> >
> > > [ 0.000000] Internal error: BRK handler: f20003e8 [#1] PREEMPT SMP
> > > [ 0.000000] Modules linked in:
> > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> > > 5.18.0-rc6-next-20220513 #1
> > > [ 0.000000] Hardware name: ARM Juno development board (r2) (DT)
> > > [ 0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [ 0.000000] pc : gic_dist_config+0x4c/0x68
> > > [ 0.000000] lr : gic_init_bases+0xd4/0x248
> >
> > Please provide a disassembly of this function.
>
> The objdump snippet is here:
> http://ix.io/3XUW
That is the wrong function (I wasn't clear that I wanted the faulting
function, not its caller).
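To make the pc/lr distinction concrete: in an arm64 oops, pc points at the
faulting instruction, while lr holds the return address, which normally lies
in the caller. The following is only a minimal Python sketch, assuming the
register lines have the symbol+offset/size form shown in the report. The
names parse_oops_locations and OOPS are illustrative and not part of any
kernel tooling.

```python
import re

# Matches oops register lines such as "pc : gic_dist_config+0x4c/0x68".
# Assumption: symbol names consist of word characters and dots only.
_LOCATION_RE = re.compile(
    r"\b(pc|lr)\s*:\s*([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_oops_locations(log: str) -> dict:
    """Return {"pc": (symbol, offset, size), "lr": (...)} found in an oops log."""
    found = {}
    for reg, symbol, offset, size in _LOCATION_RE.findall(log):
        found[reg] = (symbol, int(offset, 16), int(size, 16))
    return found


# The two register lines from the report, copied from the log above.
OOPS = """\
[ 0.000000] pc : gic_dist_config+0x4c/0x68
[ 0.000000] lr : gic_init_bases+0xd4/0x248
"""

locations = parse_oops_locations(OOPS)
faulting_symbol, faulting_offset, _ = locations["pc"]
caller_symbol = locations["lr"][0]

# The function to disassemble is the one named by pc, not the one named by lr.
print(f"disassemble {faulting_symbol} (fault at +{faulting_offset:#x}), "
      f"not {caller_symbol}")
```

Applied to this report, the sketch picks gic_dist_config (from pc) as the
function whose disassembly is wanted, and gic_init_bases (from lr) as the
caller.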
> The vmlinux file is available at the URL below.
> Please make use of it.
> http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/juno/lkft/linux-next/1226/
ffff8000087f9908 <gic_dist_config>:
ffff8000087f9908: a9bd7bfd stp x29, x30, [sp, #-48]!
ffff8000087f990c: 910003fd mov x29, sp
ffff8000087f9910: a90153f3 stp x19, x20, [sp, #16]
ffff8000087f9914: f90013f5 str x21, [sp, #32]
ffff8000087f9918: 2a0103f3 mov w19, w1
ffff8000087f991c: aa0003f4 mov x20, x0
ffff8000087f9920: aa0203f5 mov x21, x2
ffff8000087f9924: aa1e03e0 mov x0, x30
ffff8000087f9928: 97e0de72 bl ffff8000080312f0 <_mcount>
ffff8000087f992c: 7100827f cmp w19, #0x20
ffff8000087f9930: 54000149 b.ls ffff8000087f9958 <gic_dist_config+0x50> // b.plast
ffff8000087f9934: 52800402 mov w2, #0x20 // #32
ffff8000087f9938: 53027c40 lsr w0, w2, #2
ffff8000087f993c: 91300000 add x0, x0, #0xc00
ffff8000087f9940: 8b000280 add x0, x20, x0
ffff8000087f9944: b900001f str wzr, [x0]
ffff8000087f9948: 11004042 add w2, w2, #0x10
ffff8000087f994c: 6b02027f cmp w19, w2
ffff8000087f9950: 54ffff48 b.hi ffff8000087f9938 <gic_dist_config+0x30> // b.pmore
ffff8000087f9954: d4207d00 brk #0x3e8
What the hell is this??? This function has no WARN_ON and no BUG_ON, and
the allowed values for the BRK immediate are:
#define KPROBES_BRK_IMM 0x004
#define UPROBES_BRK_IMM 0x005
#define KPROBES_BRK_SS_IMM 0x006
#define FAULT_BRK_IMM 0x100
#define KGDB_DYN_DBG_BRK_IMM 0x400
#define KGDB_COMPILED_DBG_BRK_IMM 0x401
#define BUG_BRK_IMM 0x800
#define KASAN_BRK_IMM 0x900
#define KASAN_BRK_MASK 0x0ff
and 0x3e8 isn't one of them. This seems like a GCC 'division by zero'
hack, but there are no divisions by zero here. Your kernel is also
full of these brk #0x3e8 instructions.
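For anyone wanting to cross-check the numbers, here is a rough Python sketch
of the decoding. It assumes the A64 BRK encoding (0xd4200000 with imm16 in
bits [20:5]) and the ESR layout (exception class in bits [31:26], BRK comment
in the low 16 bits). The helper names are made up for illustration, and
treating KASAN_BRK_MASK as "the low 8 bits may vary" is my reading of the
defines above, not something the thread states.

```python
# BRK immediates quoted in the message above.
KNOWN_BRK_IMMS = {
    0x004: "KPROBES_BRK_IMM",
    0x005: "UPROBES_BRK_IMM",
    0x006: "KPROBES_BRK_SS_IMM",
    0x100: "FAULT_BRK_IMM",
    0x400: "KGDB_DYN_DBG_BRK_IMM",
    0x401: "KGDB_COMPILED_DBG_BRK_IMM",
    0x800: "BUG_BRK_IMM",
}
KASAN_BRK_IMM = 0x900
KASAN_BRK_MASK = 0x0FF

ESR_EC_BRK64 = 0x3C  # exception class for a BRK executed in AArch64 state


def brk_imm_from_insn(insn: int):
    """Return imm16 if insn is an A64 BRK instruction, else None."""
    if (insn & 0xFFE0001F) != 0xD4200000:
        return None
    return (insn >> 5) & 0xFFFF


def decode_esr(esr: int):
    """Split a 32-bit ESR value into (exception class, BRK comment field)."""
    return (esr >> 26) & 0x3F, esr & 0xFFFF


def classify_brk(imm: int):
    """Return the matching define name from the list above, or None."""
    if imm in KNOWN_BRK_IMMS:
        return KNOWN_BRK_IMMS[imm]
    # Assumption: KASAN immediates are 0x900 plus anything in the low 8 bits.
    if (imm & ~KASAN_BRK_MASK) == KASAN_BRK_IMM:
        return "KASAN_BRK_IMM"
    return None


# Values taken from the oops and the objdump listing in this thread.
fault_addr = 0xFFFF8000087F9908 + 0x4C   # gic_dist_config + 0x4c
imm = brk_imm_from_insn(0xD4207D00)      # instruction word at that address
ec, comment = decode_esr(0xF20003E8)     # "BRK handler: f20003e8"

print(hex(fault_addr), hex(imm), hex(ec), hex(comment), classify_brk(imm))
```

With the values from this report, pc lands on the brk at ffff8000087f9954,
the instruction word and the ESR agree on the immediate 0x3e8, and that
immediate matches none of the defines listed above.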
What sort of odd options do you have? I can't help but notice that you
have the Rust stuff in your tree. Can you please start by disabling
this, just in case there is an interaction with your toolchain?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.