2021-04-27 17:06:11

by Eddie James

[permalink] [raw]
Subject: PPC476 hangs during tlb flush after calling /init in crash kernel with linux 5.4+

Hi all,

I'm having a problem in simulation and hardware where my PPC476
processor stops executing instructions after callling /init. In my case
this is a bash script. The code descends to flush the TLB, and
somewhere in the loop in _tlbil_pid, the PC goes to
InstructionTLBError47x but does not go any further. This only occurs in
the crash kernel environment, which is using the same kernel,
initramfs, and init script as the main kernel, which executed fine. I
do not see this problem with linux 4.19 or 3.10. I do see it with 5.4
and 5.10. I see a fair amount of refactoring in the PPC memory
management area between 4.19 and 5.4. Can anyone point me in a
direction to debug this further? My stack trace is below as I can run
gdb in simulation.

Thanks,
Eddie


#0 _tlbil_pid () at
/usr/src/kernel/arch/powerpc/mm/nohash/tlb_low.S:123
#1 0xca014864 in local_flush_tlb_mm (mm=<optimized out>)
at /usr/src/kernel/arch/powerpc/mm/nohash/tlb.c:201
#2 tlb_flush (tlb=<optimized out>)
at /usr/src/kernel/arch/powerpc/mm/nohash/tlb.c:395
#3 0xca161e48 in tlb_flush_mmu_tlbonly (tlb=<optimized out>)
at /usr/src/kernel/include/asm-generic/tlb.h:408
#4 tlb_flush_mmu_tlbonly (tlb=<optimized out>)
at /usr/src/kernel/include/asm-generic/tlb.h:403
#5 tlb_flush_mmu (tlb=0xcec2fd18) at
/usr/src/kernel/mm/mmu_gather.c:190
#6 0xca161fa8 in tlb_finish_mmu (tlb=0xcec2fd18, start=<optimized
out>,
end=<optimized out>) at /usr/src/kernel/mm/mmu_gather.c:272
#7 0xca18e070 in shift_arg_pages (shift=<optimized out>,
vma=<optimized out>)
at /usr/src/kernel/fs/exec.c:678
#8 setup_arg_pages (bprm=0xcef1a000, stack_top=<optimized out>,
executable_stack=<optimized out>) at /usr/src/kernel/fs/exec.c:768
#9 0xca1f617c in load_elf_binary (bprm=0xcef1a000)
at /usr/src/kernel/fs/binfmt_elf.c:867
#10 0xca18f3d4 in search_binary_handler (bprm=<optimized out>)
at /usr/src/kernel/fs/exec.c:1691
#11 0xca1f458c in next_terminator (last=<optimized out>,
first=<optimized out>)
at /usr/src/kernel/fs/binfmt_script.c:29
#12 load_script (bprm=0xcef1a000) at
/usr/src/kernel/fs/binfmt_script.c:83
#13 0xca18f3d4 in search_binary_handler (bprm=<optimized out>)
at /usr/src/kernel/fs/exec.c:1691
#14 0xca190104 in acct_arg_size (bprm=<optimized out>, pages=<optimized
out>)
at /usr/src/kernel/fs/exec.c:187
#15 __do_execve_file (fd=<optimized out>, filename=0xcec98000,
argv=...,
envp=..., flags=<optimized out>, file=<optimized out>)
at /usr/src/kernel/fs/exec.c:1872
#16 0xca19059c in __read_once_size (size=<optimized out>,
res=<optimized out>, p=<optimized out>) at
/usr/src/kernel/include/linux/compiler.h:235
#17 set_dumpable (mm=<optimized out>, value=<optimized out>) at
/usr/src/kernel/fs/exec.c:1983


2021-04-27 17:27:12

by Christophe Leroy

[permalink] [raw]
Subject: Re: PPC476 hangs during tlb flush after calling /init in crash kernel with linux 5.4+

Hi Eddies,

Le 27/04/2021 à 19:03, Eddie James a écrit :
> Hi all,
>
> I'm having a problem in simulation and hardware where my PPC476
> processor stops executing instructions after callling /init. In my case
> this is a bash script. The code descends to flush the TLB, and
> somewhere in the loop in _tlbil_pid, the PC goes to
> InstructionTLBError47x but does not go any further. This only occurs in
> the crash kernel environment, which is using the same kernel,
> initramfs, and init script as the main kernel, which executed fine. I
> do not see this problem with linux 4.19 or 3.10. I do see it with 5.4
> and 5.10. I see a fair amount of refactoring in the PPC memory
> management area between 4.19 and 5.4. Can anyone point me in a
> direction to debug this further? My stack trace is below as I can run
> gdb in simulation.

Can you bisect to pin point the culprit commit ?

Assuming the problem is in arch/powerpc/ , you should get the result in approx 10 steps:

[root@po15610vm linux-powerpc]# git bisect start -- arch/powerpc/
[root@po15610vm linux-powerpc]# git bisect bad v5.4
[root@po15610vm linux-powerpc]# git bisect good v4.19
Bisecting: 964 revisions left to test after this (roughly 10 steps)


Christophe

2021-04-27 22:46:26

by Eddie James

[permalink] [raw]
Subject: Re: PPC476 hangs during tlb flush after calling /init in crash kernel with linux 5.4+

On Tue, 2021-04-27 at 19:26 +0200, Christophe Leroy wrote:
> Hi Eddies,
>
> Le 27/04/2021 à 19:03, Eddie James a écrit :
> > Hi all,
> >
> > I'm having a problem in simulation and hardware where my PPC476
> > processor stops executing instructions after callling /init. In my
> > case
> > this is a bash script. The code descends to flush the TLB, and
> > somewhere in the loop in _tlbil_pid, the PC goes to
> > InstructionTLBError47x but does not go any further. This only
> > occurs in
> > the crash kernel environment, which is using the same kernel,
> > initramfs, and init script as the main kernel, which executed fine.
> > I
> > do not see this problem with linux 4.19 or 3.10. I do see it with
> > 5.4
> > and 5.10. I see a fair amount of refactoring in the PPC memory
> > management area between 4.19 and 5.4. Can anyone point me in a
> > direction to debug this further? My stack trace is below as I can
> > run
> > gdb in simulation.
>
> Can you bisect to pin point the culprit commit ?

Hi, thanks for your prompt reply.

Good idea! I have bisected to:

commit 9e849f231c3c72d4c3c1b07c9cd19ae789da0420 (b8-bad,
refs/bisect/bad)
Author: Christophe Leroy <[email protected]>
Date: Thu Feb 21 19:08:40 2019 +0000

powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.

Now that mmu_mapin_ram() is able to handle other blocks
than the one starting at 0, the WII can use it for all
its blocks.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

I also confirmed that reverting this commit resolves the issue in 5.4+.

Now, I don't understand why this is problematic or what is really
happening... Reverting is probably not the desired solution.

Thanks
Eddie


>
> Assuming the problem is in arch/powerpc/ , you should get the result
> in approx 10 steps:
>
> [root@po15610vm linux-powerpc]# git bisect start -- arch/powerpc/
> [root@po15610vm linux-powerpc]# git bisect bad v5.4
> [root@po15610vm linux-powerpc]# git bisect good v4.19
> Bisecting: 964 revisions left to test after this (roughly 10 steps)
>
>
> Christophe

2021-04-28 06:09:51

by Christophe Leroy

[permalink] [raw]
Subject: Re: PPC476 hangs during tlb flush after calling /init in crash kernel with linux 5.4+



Le 28/04/2021 à 00:42, Eddie James a écrit :
> On Tue, 2021-04-27 at 19:26 +0200, Christophe Leroy wrote:
>> Hi Eddies,
>>
>> Le 27/04/2021 à 19:03, Eddie James a écrit :
>>> Hi all,
>>>
>>> I'm having a problem in simulation and hardware where my PPC476
>>> processor stops executing instructions after callling /init. In my
>>> case
>>> this is a bash script. The code descends to flush the TLB, and
>>> somewhere in the loop in _tlbil_pid, the PC goes to
>>> InstructionTLBError47x but does not go any further. This only
>>> occurs in
>>> the crash kernel environment, which is using the same kernel,
>>> initramfs, and init script as the main kernel, which executed fine.
>>> I
>>> do not see this problem with linux 4.19 or 3.10. I do see it with
>>> 5.4
>>> and 5.10. I see a fair amount of refactoring in the PPC memory
>>> management area between 4.19 and 5.4. Can anyone point me in a
>>> direction to debug this further? My stack trace is below as I can
>>> run
>>> gdb in simulation.
>>
>> Can you bisect to pin point the culprit commit ?
>
> Hi, thanks for your prompt reply.
>
> Good idea! I have bisected to:
>
> commit 9e849f231c3c72d4c3c1b07c9cd19ae789da0420 (b8-bad,
> refs/bisect/bad)
> Author: Christophe Leroy <[email protected]>
> Date: Thu Feb 21 19:08:40 2019 +0000
>
> powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
>
> Now that mmu_mapin_ram() is able to handle other blocks
> than the one starting at 0, the WII can use it for all
> its blocks.
>
> Signed-off-by: Christophe Leroy <[email protected]>
> Signed-off-by: Michael Ellerman <[email protected]>
>
> I also confirmed that reverting this commit resolves the issue in 5.4+.
>
> Now, I don't understand why this is problematic or what is really
> happening... Reverting is probably not the desired solution.
>

Can you provide the 'dmesg' or a dump of the logs printed by the kernel at boottime ?

The difference with this commit is that if there are several memblocks, all get mapped. Maybe your
target doesn't like it.

You are talking about simulation, are you using QEMU ? If yes can you provide details so that I can
try and reproduce the issue ?

Thanks
Christophe

2021-04-28 17:35:09

by Eddie James

[permalink] [raw]
Subject: Re: PPC476 hangs during tlb flush after calling /init in crash kernel with linux 5.4+

On Wed, 2021-04-28 at 08:08 +0200, Christophe Leroy wrote:
>
> Le 28/04/2021 à 00:42, Eddie James a écrit :
> > On Tue, 2021-04-27 at 19:26 +0200, Christophe Leroy wrote:
> > > Hi Eddies,
> > >
> > > Le 27/04/2021 à 19:03, Eddie James a écrit :
> > > > Hi all,
> > > >
> > > > I'm having a problem in simulation and hardware where my PPC476
> > > > processor stops executing instructions after callling /init. In
> > > > my
> > > > case
> > > > this is a bash script. The code descends to flush the TLB, and
> > > > somewhere in the loop in _tlbil_pid, the PC goes to
> > > > InstructionTLBError47x but does not go any further. This only
> > > > occurs in
> > > > the crash kernel environment, which is using the same kernel,
> > > > initramfs, and init script as the main kernel, which executed
> > > > fine.
> > > > I
> > > > do not see this problem with linux 4.19 or 3.10. I do see it
> > > > with
> > > > 5.4
> > > > and 5.10. I see a fair amount of refactoring in the PPC memory
> > > > management area between 4.19 and 5.4. Can anyone point me in a
> > > > direction to debug this further? My stack trace is below as I
> > > > can
> > > > run
> > > > gdb in simulation.
> > >
> > > Can you bisect to pin point the culprit commit ?
> >
> > Hi, thanks for your prompt reply.
> >
> > Good idea! I have bisected to:
> >
> > commit 9e849f231c3c72d4c3c1b07c9cd19ae789da0420 (b8-bad,
> > refs/bisect/bad)
> > Author: Christophe Leroy <[email protected]>
> > Date: Thu Feb 21 19:08:40 2019 +0000
> >
> > powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
> >
> > Now that mmu_mapin_ram() is able to handle other blocks
> > than the one starting at 0, the WII can use it for all
> > its blocks.
> >
> > Signed-off-by: Christophe Leroy <[email protected]>
> > Signed-off-by: Michael Ellerman <[email protected]>
> >
> > I also confirmed that reverting this commit resolves the issue in
> > 5.4+.
> >
> > Now, I don't understand why this is problematic or what is really
> > happening... Reverting is probably not the desired solution.
> >
>
> Can you provide the 'dmesg' or a dump of the logs printed by the
> kernel at boottime ?

Yes:

[ 0.000000] Linux version 5.4.85-a19ab63.ppcnf-fsp2 (oe-user@oe-host
) (gcc ve
rsion 8.3.0 (GCC)) #1 Tue Mar 30 15:32:31 UTC 2021
[ 0.000000] Found initrd at 0xcf6a7000:0xcfffb09f
[ 0.000000] Using FSP-2 machine description
[ 0.000000] Found legacy serial port 0 for /plb4/opb/serial@b0020000
[ 0.000000] mem=10b0020000, taddr=10b0020000, irq=0, clk=20833333,
speed=11
5200
[ 0.000000] -----------------------------------------------------
[ 0.000000] phys_mem_size = 0x4000000
[ 0.000000] dcache_bsize = 0x80
[ 0.000000] icache_bsize = 0x20
[ 0.000000] cpu_features = 0x0000000040800120
[ 0.000000] possible = 0x0000000040800120
[ 0.000000] always = 0x0000000000000020
[ 0.000000] cpu_user_features = 0x8c008000 0x00000000
[ 0.000000] mmu_features = 0x00140020
[ 0.000000] physical_start = 0xc000000
[ 0.000000] -----------------------------------------------------
[ 0.000000] Top of RAM: 0x10000000, Total RAM: 0x4000000
[ 0.000000] Memory hole size: 192MB
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x000000000c000000-0x000000000fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x000000000c000000-0x000000000fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x000000000c000000-
0x000000000fffffff]
[ 0.000000] On node 0 totalpages: 16384
[ 0.000000] Normal zone: 160 pages used for memmap
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 16384 pages, LIFO batch:3
[ 0.000000] MMU: Allocated 278528 bytes of context maps for 65535
contexts
[ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages:
16224
[ 0.000000] Kernel command line: console=ttyS0,115200 loglevel=9
rootwait roo
t=/dev/mmcblk0p2 elfcorehdr=209456K maxcpus=1
[ 0.000000] Dentry cache hash table entries: 8192 (order: 3, 32768
bytes, lin
ear)
[ 0.000000] Inode-cache hash table entries: 4096 (order: 2, 16384
bytes, line
ar)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 42028K/65536K available (7136K kernel code, 620K
rwdata,
1916K rodata, 2080K init, 233K bss, 23508K reserved, 0K cma-reserved)
[ 0.000000] Kernel virtual memory layout:
[ 0.000000] * 0xfffdf000..0xfffff000 : fixmap
[ 0.000000] * 0xd1000000..0xfffdf000 : vmalloc & ioremap
[ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=1,
Nodes=1
[ 0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[ 0.000000] UIC0 (32 IRQ sources) at DCR 0x2c0
[ 0.000000] UIC1 (32 IRQ sources) at DCR 0x2c8
[ 0.000000] UIC2 (32 IRQ sources) at DCR 0x350
[ 0.000000] UIC3 (32 IRQ sources) at DCR 0x358
[ 0.000000] UIC4 (32 IRQ sources) at DCR 0x360
[ 0.000000] UIC5 (32 IRQ sources) at DCR 0x368
[ 0.000000] UIC6 (32 IRQ sources) at DCR 0x370
[ 0.000000] UIC7 (32 IRQ sources) at DCR 0x2d0
[ 0.000000] UIC8 (32 IRQ sources) at DCR 0x2d8
[ 0.000000] UIC17 (32 IRQ sources) at DCR 0x320
[ 0.000000] UIC18 (32 IRQ sources) at DCR 0x328
[ 0.000000] UIC19 (32 IRQ sources) at DCR 0x330
[ 0.000000] UIC20 (32 IRQ sources) at DCR 0x338
[ 0.000000] UIC21 (32 IRQ sources) at DCR 0x340
[ 0.000000] UIC22 (32 IRQ sources) at DCR 0x348
[ 0.000000] UIC9 (32 IRQ sources) at DCR 0x2e0
[ 0.000000] UIC10 (32 IRQ sources) at DCR 0x2e8
[ 0.000000] UIC11 (32 IRQ sources) at DCR 0x2f0
[ 0.000000] UIC12 (32 IRQ sources) at DCR 0x2f8
[ 0.000000] UIC13 (32 IRQ sources) at DCR 0x300
[ 0.000000] UIC14 (32 IRQ sources) at DCR 0x308
[ 0.000000] UIC15 (32 IRQ sources) at DCR 0x310
[ 0.000000] UIC16 (32 IRQ sources) at DCR 0x318
[ 0.000000] random: get_random_u32 called from
start_kernel+0x32c/0x500 with
crng_init=0
[ 0.000000] time_init: decrementer frequency = 166.666667 MHz
[ 0.000000] time_init: processor frequency = 1333.333333 MHz
[ 0.000004] clocksource: timebase: mask: 0xffffffffffffffff
max_cycles: 0x267
03d839d, max_idle_ns: 440795207088 ns
[ 0.000007] clocksource: timebase mult[6000000] shift[24] registered
[ 0.000011] clockevent: decrementer mult[2aaaaaac] shift[32] cpu[0]
[ 0.000071] pid_max: default: 32768 minimum: 301
[ 0.000107] Mount-cache hash table entries: 1024 (order: 0, 4096
bytes, linea
r)
[ 0.000111] Mountpoint-cache hash table entries: 1024 (order: 0,
4096 bytes,
linear)
[ 0.000580] devtmpfs: initialized
[ 0.002029] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, ma
x_idle_ns: 7645041785100000 ns
[ 0.002034] futex hash table entries: 256 (order: -1, 3072 bytes,
linear)
[ 0.002146] NET: Registered protocol family 16
[ 0.004958] SCSI subsystem initialized
[ 0.004973] libata version 3.00 loaded.
[ 0.005003] usbcore: registered new interface driver usbfs
[ 0.005019] usbcore: registered new interface driver hub
[ 0.005031] usbcore: registered new device driver usb
[ 0.005043] pps_core: LinuxPPS API ver. 1 registered
[ 0.005045] pps_core: Software ver. 5.3.6 - Copyright 2005-2007
Rodolfo Giome
tti <[email protected]>
[ 0.005052] PTP clock support registered
[ 0.005257] clocksource: Switched to clocksource timebase
[ 0.311619] thermal_sys: Registered thermal governor 'step_wise'
[ 0.311665] NET: Registered protocol family 2
[ 0.311834] tcp_listen_portaddr_hash hash table entries: 512 (order:
0, 4096
bytes, linear)
[ 0.311844] TCP established hash table entries: 1024 (order: 0, 4096
bytes, l
inear)
[ 0.311855] TCP bind hash table entries: 1024 (order: 0, 4096 bytes,
linear)
[ 0.311864] TCP: Hash tables configured (established 1024 bind 1024)
[ 0.311877] UDP hash table entries: 256 (order: 0, 4096 bytes,
linear)
[ 0.311885] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes,
linear)
[ 0.311918] NET: Registered protocol family 1
[ 0.311994] RPC: Registered named UNIX socket transport module.
[ 0.311996] RPC: Registered udp transport module.
[ 0.311999] RPC: Registered tcp transport module.
[ 0.312001] RPC: Registered tcp NFSv4.1 backchannel transport
module.
[ 0.312043] Trying to unpack rootfs image as initramfs...
[ 0.632637] Freeing initrd memory: 9552K
[ 0.634361] shadow_disr_to_uic15: 0x00000020 to uic15_pr
[ 0.635621] workingset: timestamp_bits=30 max_order=14
bucket_order=0
[ 0.638710] NFS: Registering the id_resolver key type
[ 0.638717] Key type id_resolver registered
[ 0.638720] Key type id_legacy registered
[ 0.638730] jffs2: version 2.2. (NAND) (SUMMARY) © 2001-2006 Red
Hat, Inc.
[ 0.638858] fuse: init (API version 7.31)
[ 0.664372] io scheduler mq-deadline registered
[ 0.664375] io scheduler kyber registered
[ 0.664573] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[ 0.665728] printk: console [ttyS0] disabled
[ 0.685664] serial8250.0: ttyS0 at MMIO 0x10b0020000 (irq = 47,
base_baud = 1
302083) is a 16550A
[ 0.693859] printk: console [ttyS0] enabled
[ 0.694150] printk: console [ttyS0] disabled
[ 0.694212] 10b0020000.serial: ttyS0 at MMIO 0x10b0020000 (irq = 47,
base_bau
d = 1302083) is a 16550
[ 0.702632] printk: console [ttyS0] enabled
[ 0.704642] brd: module loaded
[ 0.704730] slram: not enough parameters.
[ 0.704792] PPP generic driver version 2.4.2
[ 0.704875] PPP Deflate Compression module registered
[ 0.704941] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI)
Driver
[ 0.705052] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 0.705148] usbcore: registered new interface driver usb-storage
[ 0.705248] usbcore: registered new interface driver
usbserial_generic
[ 0.705341] usbserial: USB Serial support registered for generic
[ 0.705428] usbcore: registered new interface driver pl2303
[ 0.705508] usbserial: USB Serial support registered for pl2303
[ 0.705592] booke_wdt: powerpc book-e watchdog driver loaded
[ 0.705769] sdhci: Secure Digital Host Controller Interface driver
[ 0.705849] sdhci: Copyright(c) Pierre Ossman
[ 0.705906] sdhci-pltfm: SDHCI platform and OF driver helper
[ 0.706033] mmc0: Invalid maximum block size, assuming 512 bytes
[ 0.706112] mmc0 bounce up to 128 segments into one, max segment
size 65536 b
ytes
[ 0.731266] mmc0: SDHCI controller on 10020c0000.emmc
[10020c0000.emmc] using
DMA
[ 0.731422] oprofile: using timer interrupt.
[ 0.731546] Initializing XFRM netlink socket
[ 0.731789] NET: Registered protocol family 10
[ 0.732013] Segment Routing with IPv6
[ 0.732096] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ 0.732331] NET: Registered protocol family 17
[ 0.732414] sctp: Hash tables configured (bind 1024/1024)
[ 0.732591] Key type dns_resolver registered
[ 0.732698] drmem: No dynamic reconfiguration memory found
[ 0.732888] hctosys: unable to open rtc device (rtc0)
[ 0.734057] Freeing unused kernel memory: 2080K
[ 0.734116] This architecture does not have kernel memory
protection.
[ 0.734199] Run /init as init process

>
> The difference with this commit is that if there are several
> memblocks, all get mapped. Maybe your
> target doesn't like it.
>
> You are talking about simulation, are you using QEMU ? If yes can you
> provide details so that I can
> try and reproduce the issue ?

No, we use Windriver simics for this system. I'm not sure if QEMU has a
model for the PPC476... and I suspect it won't recreate with a more
generic 32 bit ppc machine, though I can try and find out.

Thanks,
Eddie

>
> Thanks
> Christophe

2021-04-28 17:35:29

by Eddie James

[permalink] [raw]
Subject: Re: PPC476 hangs during tlb flush after calling /init in crash kernel with linux 5.4+

On Wed, 2021-04-28 at 08:08 +0200, Christophe Leroy wrote:
>
> Le 28/04/2021 à 00:42, Eddie James a écrit :
> > On Tue, 2021-04-27 at 19:26 +0200, Christophe Leroy wrote:
> > > Hi Eddies,
> > >
> > > Le 27/04/2021 à 19:03, Eddie James a écrit :
> > > > Hi all,
> > > >
> > > > I'm having a problem in simulation and hardware where my PPC476
> > > > processor stops executing instructions after callling /init. In
> > > > my
> > > > case
> > > > this is a bash script. The code descends to flush the TLB, and
> > > > somewhere in the loop in _tlbil_pid, the PC goes to
> > > > InstructionTLBError47x but does not go any further. This only
> > > > occurs in
> > > > the crash kernel environment, which is using the same kernel,
> > > > initramfs, and init script as the main kernel, which executed
> > > > fine.
> > > > I
> > > > do not see this problem with linux 4.19 or 3.10. I do see it
> > > > with
> > > > 5.4
> > > > and 5.10. I see a fair amount of refactoring in the PPC memory
> > > > management area between 4.19 and 5.4. Can anyone point me in a
> > > > direction to debug this further? My stack trace is below as I
> > > > can
> > > > run
> > > > gdb in simulation.
> > >
> > > Can you bisect to pin point the culprit commit ?
> >
> > Hi, thanks for your prompt reply.
> >
> > Good idea! I have bisected to:
> >
> > commit 9e849f231c3c72d4c3c1b07c9cd19ae789da0420 (b8-bad,
> > refs/bisect/bad)
> > Author: Christophe Leroy <[email protected]>
> > Date: Thu Feb 21 19:08:40 2019 +0000
> >
> > powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
> >
> > Now that mmu_mapin_ram() is able to handle other blocks
> > than the one starting at 0, the WII can use it for all
> > its blocks.
> >
> > Signed-off-by: Christophe Leroy <[email protected]>
> > Signed-off-by: Michael Ellerman <[email protected]>
> >
> > I also confirmed that reverting this commit resolves the issue in
> > 5.4+.
> >
> > Now, I don't understand why this is problematic or what is really
> > happening... Reverting is probably not the desired solution.
> >
>
> Can you provide the 'dmesg' or a dump of the logs printed by the
> kernel at boottime ?
>
> The difference with this commit is that if there are several
> memblocks, all get mapped. Maybe your
> target doesn't like it.
>
> You are talking about simulation, are you using QEMU ? If yes can you
> provide details so that I can
> try and reproduce the issue ?

Milton mentioned the kexec debug output might be useful:

lzma_decompress_file: read on uImage of 65536
bytes failed
kernel: 0xb75b3010 kernel_size: 0x4f1231
0000000000000000-0000000040000000 : 0
get base memory ranges:1
usable memory rgns size:1 base:0 size:10000000
exclude_range sorted exclude_range[0] start:0, end:bb7000
setup_memory_ranges memory_range[0] start:bb7001, end:30000000
CRASH MEMORY RANGES
0000000000000000-0000000000010000
0000000000010000-000000000c000000
0000000010000000-000000002d600000
get_crash_notes_per_cpu: crash_notes addr = 2ce84e00, size = 300
Elf header: p_type = 4, p_offset = 0x2ce84e00 p_paddr = 0x2ce84e00
p_vaddr = 0x0
p_filesz = 0x12c p_memsz = 0x12c
vmcoreinfo header: p_type = 4, p_offset = 0x2c8d4000 p_paddr =
0x2c8d4000 p_vadd
r = 0x0 p_filesz = 0x1024 p_memsz = 0x1024
Elf header: p_type = 1, p_offset = 0xcc7e000 p_paddr = 0x0 p_vaddr =
0xc0000000
p_filesz = 0x10000 p_memsz = 0x10000
Elf header: p_type = 1, p_offset = 0x10000 p_paddr = 0x10000 p_vaddr =
0xc001000
0 p_filesz = 0xbff0000 p_memsz = 0xbff0000
Elf header: p_type = 1, p_offset = 0x10000000 p_paddr = 0x10000000
p_vaddr = 0xd
0000000 p_filesz = 0x1d600000 p_memsz = 0x1d600000
Command line after adding elfcorehdr: elfcorehdr=209464K
sym: .data info: 03 other: 00 shndx: 7 value: 0 size: 0
sym: .data value: ffffbf0 addr: fffc032
sym: .data info: 03 other: 00 shndx: 7 value: 0 size: 0
sym: .data value: ffffbf0 addr: fffc03a
sym: sha256_starts info: 12 other: 00 shndx: 2 value: 92c size: 70
sym: sha256_starts value: fffc950 addr: fffc044
sym: sha256_update info: 12 other: 00 shndx: 2 value: 36c4 size: c
sym: sha256_update value: ffff6e8 addr: fffc058
sym: sha256_finish info: 12 other: 00 shndx: 2 value: 36d0 size: 1a4
sym: sha256_finish value: ffff6f4 addr: fffc070
sym: memcmp info: 12 other: 00 shndx: 2 value: 5fc size: 38
sym: memcmp value: fffc620 addr: fffc080
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff898 addr: fffc08e
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8c8 addr: fffc092
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff898 addr: fffc096
sym: printf info: 12 other: 00 shndx: 2 value: 524 size: 68
sym: printf value: fffc548 addr: fffc09c
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8b8 addr: fffc0a2
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8b8 addr: fffc0a6
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8c8 addr: fffc0aa
sym: printf info: 12 other: 00 shndx: 2 value: 524 size: 68
sym: printf value: fffc548 addr: fffc0ac
sym: printf info: 12 other: 00 shndx: 2 value: 524 size: 68
sym: printf value: fffc548 addr: fffc0b8
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8ce addr: fffc0ca
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8ce addr: fffc0d2
sym: printf info: 12 other: 00 shndx: 2 value: 524 size: 68
sym: printf value: fffc548 addr: fffc0d8
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8d0 addr: fffc0de
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8d0 addr: fffc0e2
sym: printf info: 12 other: 00 shndx: 2 value: 524 size: 68
sym: printf value: fffc548 addr: fffc0e4
sym: printf info: 12 other: 00 shndx: 2 value: 524 size: 68
sym: printf value: fffc548 addr: fffc0f0
sym: printf info: 12 other: 00 shndx: 2 value: 524 size: 68
sym: printf value: fffc548 addr: fffc104
sym: _restgpr_28_x info: 12 other: 00 shndx: 2 value: 888 size: 0
sym: _restgpr_28_x value: fffc8ac addr: fffc110
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8e0 addr: fffc11a
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8e0 addr: fffc122
sym: printf info: 12 other: 00 shndx: 2 value: 524 size: 68
sym: printf value: fffc548 addr: fffc128
sym: setup_arch info: 12 other: 00 shndx: 2 value: 91c size: 4
sym: setup_arch value: fffc940 addr: fffc12c
sym: skip_checks info: 11 other: 00 shndx: 8 value: 0 size: 4
sym: skip_checks value: ffffd50 addr: fffc132
sym: skip_checks info: 11 other: 00 shndx: 8 value: 0 size: 4
sym: skip_checks value: ffffd50 addr: fffc136
sym: verify_sha256_digest info: 12 other: 00 shndx: 2 value: 0 size: f0
sym: verify_sha256_digest value: fffc024 addr: fffc140
sym: post_verification_setup_arch info: 12 other: 00 shndx: 2 value:
920 size: 4
sym: post_verification_setup_arch value: fffc944 addr: fffc15c
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8f2 addr: fffc16e
sym: .rodata.str1.1 info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .rodata.str1.1 value: ffff8f2 addr: fffc18a
sym: _restgpr_20_x info: 12 other: 00 shndx: 2 value: 868 size: 0
sym: _restgpr_20_x value: fffc88c addr: fffc1a8
sym: putchar info: 12 other: 00 shndx: 2 value: 928 size: 4
sym: putchar value: fffc94c addr: fffc1cc
sym: putchar info: 12 other: 00 shndx: 2 value: 928 size: 4
sym: putchar value: fffc94c addr: fffc244
sym: __lshrdi3 info: 10 other: 00 shndx: 2 value: 8f8 size: 0
sym: __lshrdi3 value: fffc91c addr: fffc2c4
sym: putchar info: 12 other: 00 shndx: 2 value: 928 size: 4
sym: putchar value: fffc94c addr: fffc4e4
sym: vsprintf info: 12 other: 00 shndx: 2 value: 13c size: 38c
sym: vsprintf value: fffc160 addr: fffc534
sym: vsprintf info: 12 other: 00 shndx: 2 value: 13c size: 38c
sym: vsprintf value: fffc160 addr: fffc59c
sym: my_thread_ptr info: 11 other: 00 shndx: 8 value: 8 size: 4
sym: my_thread_ptr value: ffffd58 addr: fffc76a
sym: my_thread_ptr info: 11 other: 00 shndx: 8 value: 8 size: 4
sym: my_thread_ptr value: ffffd58 addr: fffc76e
sym: stack info: 11 other: 00 shndx: 8 value: 10 size: 4
sym: stack value: ffffd60 addr: fffc776
sym: stack info: 11 other: 00 shndx: 8 value: 10 size: 4
sym: stack value: ffffd60 addr: fffc77a
sym: purgatory info: 12 other: 00 shndx: 2 value: f0 size: 4c
sym: purgatory value: fffc114 addr: fffc784
sym: dt_offset info: 11 other: 00 shndx: 8 value: c size: 4
sym: dt_offset value: ffffd5c addr: fffc79e
sym: dt_offset info: 11 other: 00 shndx: 8 value: c size: 4
sym: dt_offset value: ffffd5c addr: fffc7a2
sym: kernel info: 11 other: 00 shndx: 8 value: 4 size: 4
sym: kernel value: ffffd54 addr: fffc7ba
sym: kernel info: 11 other: 00 shndx: 8 value: 4 size: 4
sym: kernel value: ffffd54 addr: fffc7be
sym: memcpy info: 12 other: 00 shndx: 2 value: 5d8 size: 24
sym: memcpy value: fffc5fc addr: ffff648
sym: memcpy info: 12 other: 00 shndx: 2 value: 5d8 size: 24
sym: memcpy value: fffc5fc addr: ffff65c
sym: sha256_process info: 12 other: 00 shndx: 2 value: 99c size: 2be0
sym: sha256_process value: fffc9c0 addr: ffff66c
sym: sha256_process info: 12 other: 00 shndx: 2 value: 99c size: 2be0
sym: sha256_process value: fffc9c0 addr: ffff6a8
sym: .data info: 03 other: 00 shndx: 7 value: 0 size: 0
sym: .data value: ffffd10 addr: ffff742
sym: .data info: 03 other: 00 shndx: 7 value: 0 size: 0
sym: .data value: ffffd10 addr: ffff74a
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc024 addr: ffff920
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc114 addr: ffff94c
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc160 addr: ffff980
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc4ec addr: ffff9c8
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc548 addr: ffff9e8
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc5b0 addr: ffffa1c
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc5dc addr: ffffa30
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc5fc addr: ffffa44
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc620 addr: ffffa58
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc940 addr: ffffa80
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc944 addr: ffffa94
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc948 addr: ffffaa8
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc94c addr: ffffad0
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc950 addr: ffffaf8
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: fffc9c0 addr: ffffb0c
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: ffff5a0 addr: ffffb68
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: ffff6e8 addr: ffffbbc
sym: .text info: 03 other: 00 shndx: 2 value: 0 size: 0
sym: .text value: ffff6f4 addr: ffffbd0
Modified cmdline:console=ttyS0,115200 loglevel=9 rootwait
root=/dev/mmcblk0p2 el
fcorehdr=209464K maxcpus=1
reserve regions: 4
0: offset: e3fc000, size: 4000
1: offset: c000000, size: c8f000
2: offset: fffc000, size: 4000
3: offset: f6a7000, size: 955000
debug.dtb written
kexec_load: entry = 0xfffc658 flags = 0x1
nr_segments = 6
segment[0].buf = 0xb5cb1010
segment[0].bufsz = 0xb7d568
segment[0].mem = 0xc000000
segment[0].memsz = 0xc7e000
segment[1].buf = 0x100997b0
segment[1].bufsz = 0x10000
segment[1].mem = 0xcc7e000
segment[1].memsz = 0x10000
segment[2].buf = 0x100a97f0
segment[2].bufsz = 0x400
segment[2].mem = 0xcc8e000
segment[2].memsz = 0x1000
segment[3].buf = 0x100ae670
segment[3].bufsz = 0x3c5f
segment[3].mem = 0xe3fc000
segment[3].memsz = 0x4000
segment[4].buf = 0xb6c5e050
segment[4].bufsz = 0x95409f
segment[4].mem = 0xf6a7000
segment[4].memsz = 0x955000
segment[5].buf = 0x100aa8a0
segment[5].bufsz = 0x3d6c
segment[5].mem = 0xfffc000
segment[5].memsz = 0x4000

>
> Thanks
> Christophe