2021-12-14 19:58:49

by Chris Ward

Subject: Kernel crash on ARM64

Please cc me personally on any answers/comments, as I am not currently
subscribed to the LKML. Apologies for the formatting; I first tried to
send this from my business email, which adds HTML that was rejected by
your filtering system.

My team has a problem which is being bounced between Canonical support
and Xilinx support.
We are running kernel 5.4.0-xilinx-v2020.2, built from the sources at
https://github.com/Xilinx/linux-xlnx, with an Ubuntu 20.04 userland on
an ARM64 embedded Linux machine (i.e. not x86-64). When setting up a
file system on a ramdisk, the kernel crashes for ramdisk sizes larger
than 2GB while running 'dd if=/dev/zero ...' in preparation for
issuing mkfs.
The first few lines of the crash message are:
[ 36.082810] cloud-init[858]: 2068-03-21 21:24:22,477 - cc_final_message.py[WARNING]: Used fallback datae
[ 40.413307] overlayfs: filesystem on '/var/lib/docker/check-overlayfs-support002292139/upper' not suppor
[ 40.937166] overlayfs: filesystem on '/var/lib/docker/check-overlayfs-support240114958/upper' not suppor
[ 112.624740] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 112.633516] Mem abort info:
[ 112.636291] ESR = 0x96000004
[ 112.639330] EC = 0x25: DABT (current EL), IL = 32 bits
[ 112.644624] SET = 0, FnV = 0
[ 112.647662] EA = 0, S1PTW = 0
[ 112.650786] Data abort info:
[ 112.653651] ISV = 0, ISS = 0x00000004
[ 112.657470] CM = 0, WnR = 0
[ 112.660423] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000b4f60b000
[ 112.666845] [0000000000000000] pgd=0000000000000000
[ 112.671707] Internal error: Oops: 96000004 [#1] SMP
[ 112.676567] Modules linked in: br_netfilter
[ 112.680747] CPU: 2 PID: 1060 Comm: dd Not tainted 5.4.0-xilinx-v2020.2 #1
[ 112.687521] Hardware name: xlnx,zynqmp (DT)
[ 112.691689] pstate: a0000085 (NzCv daIf -PAN -UAO)
[ 112.696472] pc : __wake_up_common+0x58/0x170
[ 112.700726] lr : __wake_up_common_lock+0x98/0x110
[ 112.705410] sp : ffff800011ad3520
[ 112.708709] x29: ffff800011ad3520 x28: 0000000000000080
[ 112.714003] x27: ffff800011ad3650 x26: 0000000000000000
[ 112.719298] x25: 0000000000000003 x24: 0000000000000000
[ 112.724593] x23: 0000000000000001 x22: ffff800011ad35f0
[ 112.729888] x21: ffff8000110d5d88 x20: 0000000000000001
[ 112.735183] x19: ffff8000110d5d80 x18: fffffe002d6cd0c8
[ 112.740477] x17: ffff000b6c001008 x16: ffff000b6c001028
[ 112.745772] x15: 0001000000000000 x14: 0000000000000000
[ 112.751067] x13: 0000000000000000 x12: 0000000000000000
[ 112.756362] x11: 0000000000000000 x10: 0000000000000000
[ 112.761657] x9 : 0000000000000000 x8 : 0000000000000000
[ 112.766952] x7 : 0000000000000000 x6 : ffffffffffffffe8
[ 112.772247] x5 : ffff800011ad35f0 x4 : ffff800011ad3650
[ 112.777541] x3 : 0000000000000000 x2 : 0000000000000001
[ 112.782836] x1 : 0000000000000000 x0 : 0000000000000000
[ 112.788132] Call trace:
[ 112.790565] __wake_up_common+0x58/0x170
[ 112.794479] __wake_up_common_lock+0x98/0x110
[ 112.798819] __wake_up+0x14/0x20
[ 112.802029] wake_up_bit+0x78/0xa0
[ 112.805416] unlock_buffer+0x2c/0x38
[ 112.808974] end_buffer_async_write+0x98/0x1c0
[ 112.813401] end_bio_bh_io_sync+0x30/0x60
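
For anyone reading the "Mem abort info" lines above, here is a minimal
sketch of how the ESR value 0x96000004 breaks down into the fields the
kernel prints. It assumes the Armv8-A ESR_ELx layout (EC in bits 31:26,
IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and
the fault status code in ISS bits 5:0). The function name decode_esr is
purely illustrative and is not part of any kernel tooling.

```python
def decode_esr(esr: int) -> dict:
    """Split a 32-bit ESR_ELx value into the fields shown in the oops."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "IL": (esr >> 25) & 0x1,       # instruction length: bit 25 (1 = 32-bit)
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # data aborts: 1 = write, 0 = read
        "DFSC": iss & 0x3F,            # data fault status code: ISS bits [5:0]
    }


# Value taken from the "ESR = 0x96000004" line of the crash message.
fields = decode_esr(0x96000004)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

The decoded EC (0x25), ISS (0x4) and WnR (0) agree with what the kernel
printed. A fault status code of 0x04 is, as far as I understand the Arm
encoding, a level 0 translation fault, which fits the "pgd=0000000000000000"
line: a read through a NULL pointer rather than a write.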

I am attaching the full kernel log from boot to crash. Some questions:
1) Is it possible that there is an incompatibility between the Ubuntu
userland and the Xilinx kernel? I believe it is not possible to have an
incompatibility here, which would put the support question squarely in
Xilinx's court.
2) Is there a known problem with this kernel level on ARM64 hardware?
3) Is it likely to be productive to move to a newer Xilinx kernel?
4) If I have to debug this myself, where do I start?

Thanks all!
T J (Chris) Ward, IBM Research.
Scalable Data-Centric Computing - IBM Spectrum MPI
IBM United Kingdom Ltd., Hursley Park, Winchester, Hants, SO21 2JN
011-44-1962-818679
LinkedIn https://www.linkedin.com/in/tjcward/
ResearchGate https://www.researchgate.net/profile/Thomas_Ward16

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU





Attachments:
crash.txt (67.35 kB)