2023-10-16 10:22:40

by Naresh Kamboju

Subject: mm: Unable to handle kernel NULL pointer dereference at virtual address - mmap_region (include/linux/fs.h:580 mm/mmap.c:2946)

The following kernel crash was noticed while running the LTP hugetlb tests and
selftests on qemu-x86_64 and qemu-arm64 with Linux next 6.6.0-rc6-next-20231016.

Reported-by: Linux Kernel Functional Testing <[email protected]>
Reported-by: Naresh Kamboju <[email protected]>

Test Logs:
-----
<1>[ 97.466617] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000d8
<1>[ 97.469156] Mem abort info:
<1>[ 97.469619] ESR = 0x0000000097c08005
<1>[ 97.470362] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 97.471288] SET = 0, FnV = 0
<1>[ 97.472061] EA = 0, S1PTW = 0
<1>[ 97.473341] FSC = 0x05: level 1 translation fault
<1>[ 97.473935] Data abort info:
<1>[ 97.474630] Access size = 8 byte(s)
<1>[ 97.475400] SSE = 0, SRT = 0
<1>[ 97.476583] SF = 1, AR = 0
<1>[ 97.477038] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[ 97.477975] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[ 97.478939] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101c17000
<1>[ 97.479949] [00000000000000d8] pgd=0800000101d5c003, p4d=0800000101d5c003, pud=0000000000000000
<0>[ 97.482922] Internal error: Oops: 0000000097c08005 [#1] PREEMPT SMP
<4>[ 97.484136] Modules linked in: fuse drm backlight dm_mod ip_tables x_tables
<4>[ 97.486054] CPU: 0 PID: 342 Comm: hugemmap13 Not tainted 6.6.0-rc6-next-20231016 #1
<4>[ 97.487075] Hardware name: linux,dummy-virt (DT)
<4>[ 97.487955] pstate: 03400009 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[ 97.488901] pc : mmap_region (include/linux/fs.h:580 mm/mmap.c:2946)
<4>[ 97.490228] lr : mmap_region (mm/mmap.c:2945)
<4>[ 97.490733] sp : ffff80008069bba0
<4>[ 97.491176] x29: ffff80008069bbb0 x28: ffff0000c5d5e4d0 x27: fffffffffffffff4
<4>[ 97.492062] x26: 0000000000000000 x25: 0000000000000002 x24: 0000000000000001
<4>[ 97.492989] x23: 0000000000000001 x22: 0000000000000000 x21: ffff0000c20fcf00
<4>[ 97.493771] x20: 00000002000000fb x19: 00000000fffff000 x18: ffff80008069bc38
<4>[ 97.494568] x17: 0000aaaae6247fff x16: 0000aaaade59cfff x15: 0000aaaade580fff
<4>[ 97.495367] x14: 0000aaaade57ffff x13: 0000000000000000 x12: 00000000fffff000
<4>[ 97.496172] x11: 0000000100000000 x10: 00000000000fffff x9 : 0000000000000000
<4>[ 97.497004] x8 : 0000000000000001 x7 : 00000002000000fb x6 : ffff0000c20fcf00
<4>[ 97.497810] x5 : ffff0000c5d5e4d0 x4 : 00000000000001c4 x3 : ffffb50d82f264f8
<4>[ 97.498577] x2 : 0000000000000000 x1 : 00000000ffe00000 x0 : 0000000000000000
<4>[ 97.499871] Call trace:
<4>[ 97.500288] mmap_region (include/linux/fs.h:580 mm/mmap.c:2946)
<4>[ 97.500814] do_mmap (mm/mmap.c:1379)
<4>[ 97.501243] vm_mmap_pgoff (mm/util.c:546)
<4>[ 97.501711] ksys_mmap_pgoff (mm/mmap.c:1425)
<4>[ 97.502166] __arm64_sys_mmap (arch/arm64/kernel/sys.c:21)
<4>[ 97.502634] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:56)
<4>[ 97.503175] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2))
<4>[ 97.503763] do_el0_svc (arch/arm64/kernel/syscall.c:156)
<4>[ 97.504191] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:679)
<4>[ 97.504640] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
<4>[ 97.505159] el0t_64_sync (arch/arm64/kernel/entry.S:595)
<0>[ 97.505635] Code: 52800037 17fffe9f 93407c1b 17fffed1 (f9406ec0)
All code
========
0: 52800037 mov w23, #0x1 // #1
4: 17fffe9f b 0xfffffffffffffa80
8: 93407c1b sxtw x27, w0
c: 17fffed1 b 0xfffffffffffffb50
10:* f9406ec0 ldr x0, [x22, #216] <-- trapping instruction

Code starting with the faulting instruction
===========================================
0: f9406ec0 ldr x0, [x22, #216]
<4>[ 97.506697] ---[ end trace 0000000000000000 ]---
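
The numbers in the log above can be cross-checked by hand. The following is a minimal Python sketch, not part of any kernel tooling, that splits the ESR value into the fields the oops prints and decodes the trapping instruction word. The function names are made up for illustration, and the decoder only handles the one encoding seen here (64-bit LDR, immediate, unsigned offset).

```python
def decode_esr_data_abort(esr: int) -> dict:
    """Split an arm64 ESR_ELx value into the fields printed in the oops."""
    iss = esr & 0x1FFFFFF           # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class: bits [31:26]
        "il": (esr >> 25) & 0x1,    # 1 means a 32-bit instruction trapped
        "wnr": (iss >> 6) & 0x1,    # 0 = read access, 1 = write access
        "fsc": iss & 0x3F,          # fault status code: ISS bits [5:0]
    }


def decode_ldr_x_unsigned_imm(insn: int) -> dict:
    """Decode a 64-bit 'LDR Xt, [Xn, #imm]' (unsigned-offset form)."""
    # Bits [31:22] are fixed at 0b1111100101 for this encoding.
    if (insn >> 22) & 0x3FF != 0b1111100101:
        raise ValueError("not a 64-bit LDR (immediate, unsigned offset)")
    imm12 = (insn >> 10) & 0xFFF
    return {
        "rt": insn & 0x1F,          # destination register number
        "rn": (insn >> 5) & 0x1F,   # base register number
        "offset": imm12 * 8,        # imm12 is scaled by the 8-byte access size
    }


# Values copied from the oops: ESR, the trapping instruction word, and x22.
esr = decode_esr_data_abort(0x97C08005)
ldr = decode_ldr_x_unsigned_imm(0xF9406EC0)
x22 = 0x0
fault_address = x22 + ldr["offset"]

print(esr)
print(ldr)
print(hex(fault_address))
```

With x22 holding zero, the base register plus the 216-byte offset lands on 0xd8, which matches the reported faulting virtual address.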


Links:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231016/testrun/20616666/suite/log-parser-test/test/check-kernel-oops/log
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231016/testrun/20616666/suite/log-parser-test/tests/

Build:
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2Wpo3Fqa5DhxsWQjZYBnbqMmD8X/vmlinux.xz
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2Wpo3Fqa5DhxsWQjZYBnbqMmD8X/System.map
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2Wpo3Fqa5DhxsWQjZYBnbqMmD8X/

Steps to reproduce:
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2Wpo5DC7b6y3ZyDnxzj6rn5ZNlX/reproducer

# To install tuxrun to your home directory at ~/.local/bin:
# pip3 install -U --user tuxrun==0.49.2
#
# Or install a deb/rpm depending on the running distribution
# See https://tuxmake.org/install-deb/ or
# https://tuxmake.org/install-rpm/
#
# See https://tuxrun.org/ for complete documentation.

tuxrun --runtime podman --device qemu-arm64 --boot-args rw \
  --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2Wpo3Fqa5DhxsWQjZYBnbqMmD8X/Image.gz \
  --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2Wpo3Fqa5DhxsWQjZYBnbqMmD8X/modules.tar.xz \
  --rootfs https://storage.tuxboot.com/debian/bookworm/arm64/rootfs.ext4.xz \
  --parameters SKIPFILE=skipfile-lkft.yaml \
  --image docker.io/linaro/tuxrun-dispatcher:v0.49.2 \
  --tests ltp-hugetlb \
  --timeouts boot=30 ltp-hugetlb=20 \
  --overlay https://storage.tuxboot.com/overlays/debian/bookworm/arm64/ltp/20230516/ltp.tar.xz

--
Linaro LKFT
https://lkft.linaro.org


2023-10-16 11:06:04

by Lorenzo Stoakes

Subject: Re: mm: Unable to handle kernel NULL pointer dereference at virtual address - mmap_region (include/linux/fs.h:580 mm/mmap.c:2946)

On Mon, Oct 16, 2023 at 03:52:07PM +0530, Naresh Kamboju wrote:
> Following kernel crash noticed while running LTP hugetlb and selftests on
> qemu-x86_64 and qemu-arm64 running with Linux next 6.6.0-rc6-next-20231016.
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
> Reported-by: Naresh Kamboju <[email protected]>
>
> Test Logs:
> -----

[snip]

> <4>[ 97.499871] Call trace:
> <4>[ 97.500288] mmap_region (include/linux/fs.h:580 mm/mmap.c:2946)

OK, this is from a patch of mine, and it is an easy fix (an incorrect
assumption that vm->vm_file == file).

I will put a fix forward tonight.

> <4>[ 97.500814] do_mmap (mm/mmap.c:1379)
> <4>[ 97.501243] vm_mmap_pgoff (mm/util.c:546)
> <4>[ 97.501711] ksys_mmap_pgoff (mm/mmap.c:1425)
> <4>[ 97.502166] __arm64_sys_mmap (arch/arm64/kernel/sys.c:21)
> <4>[ 97.502634] invoke_syscall (arch/arm64/include/asm/current.h:19
> arch/arm64/kernel/syscall.c:56)
> <4>[ 97.503175] el0_svc_common.constprop.0
> (include/linux/thread_info.h:127 (discriminator 2)
> arch/arm64/kernel/syscall.c:144 (discriminator 2))
> <4>[ 97.503763] do_el0_svc (arch/arm64/kernel/syscall.c:156)
> <4>[ 97.504191] el0_svc (arch/arm64/include/asm/daifflags.h:28
> arch/arm64/kernel/entry-common.c:133
> arch/arm64/kernel/entry-common.c:144
> arch/arm64/kernel/entry-common.c:679)

[snip]

2023-10-16 16:42:43

by Lorenzo Stoakes

Subject: Re: mm: Unable to handle kernel NULL pointer dereference at virtual address - mmap_region (include/linux/fs.h:580 mm/mmap.c:2946)

On Mon, Oct 16, 2023 at 12:05:37PM +0100, Lorenzo Stoakes wrote:
> On Mon, Oct 16, 2023 at 03:52:07PM +0530, Naresh Kamboju wrote:
> > Following kernel crash noticed while running LTP hugetlb and selftests on
> > qemu-x86_64 and qemu-arm64 running with Linux next 6.6.0-rc6-next-20231016.
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > Reported-by: Naresh Kamboju <[email protected]>
> >
> > Test Logs:
> > -----
>
> [snip]
>
> > <4>[ 97.499871] Call trace:
> > <4>[ 97.500288] mmap_region (include/linux/fs.h:580 mm/mmap.c:2946)
>
> OK this is from a patch of mine, and an easy fix (incorrect assumption about
> vm->vm_file == file).
>
> I will put a fix forward tonight.
>
> > <4>[ 97.500814] do_mmap (mm/mmap.c:1379)
> > <4>[ 97.501243] vm_mmap_pgoff (mm/util.c:546)
> > <4>[ 97.501711] ksys_mmap_pgoff (mm/mmap.c:1425)
> > <4>[ 97.502166] __arm64_sys_mmap (arch/arm64/kernel/sys.c:21)
> > <4>[ 97.502634] invoke_syscall (arch/arm64/include/asm/current.h:19
> > arch/arm64/kernel/syscall.c:56)
> > <4>[ 97.503175] el0_svc_common.constprop.0
> > (include/linux/thread_info.h:127 (discriminator 2)
> > arch/arm64/kernel/syscall.c:144 (discriminator 2))
> > <4>[ 97.503763] do_el0_svc (arch/arm64/kernel/syscall.c:156)
> > <4>[ 97.504191] el0_svc (arch/arm64/include/asm/daifflags.h:28
> > arch/arm64/kernel/entry-common.c:133
> > arch/arm64/kernel/entry-common.c:144
> > arch/arm64/kernel/entry-common.c:679)
>
> [snip]

I have cc'd the people in this thread on it, but for the record, the -fix patch is at
https://lore.kernel.org/all/[email protected]/

2023-10-17 14:15:44

by Dan Carpenter

Subject: Re: mm: Unable to handle kernel NULL pointer dereference at virtual address - mmap_region (include/linux/fs.h:580 mm/mmap.c:2946)

On Mon, Oct 16, 2023 at 05:32:00PM +0100, Lorenzo Stoakes wrote:
> On Mon, Oct 16, 2023 at 12:05:37PM +0100, Lorenzo Stoakes wrote:
> > On Mon, Oct 16, 2023 at 03:52:07PM +0530, Naresh Kamboju wrote:
> > > Following kernel crash noticed while running LTP hugetlb and selftests on
> > > qemu-x86_64 and qemu-arm64 running with Linux next 6.6.0-rc6-next-20231016.
> > >
> > > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > > Reported-by: Naresh Kamboju <[email protected]>
> > >
> > > Test Logs:
> > > -----
> >
> > [snip]
> >
> > > <4>[ 97.499871] Call trace:
> > > <4>[ 97.500288] mmap_region (include/linux/fs.h:580 mm/mmap.c:2946)
> >
> > OK this is from a patch of mine, and an easy fix (incorrect assumption about
> > vm->vm_file == file).
> >
> > I will put a fix forward tonight.
> >
> > > <4>[ 97.500814] do_mmap (mm/mmap.c:1379)
> > > <4>[ 97.501243] vm_mmap_pgoff (mm/util.c:546)
> > > <4>[ 97.501711] ksys_mmap_pgoff (mm/mmap.c:1425)
> > > <4>[ 97.502166] __arm64_sys_mmap (arch/arm64/kernel/sys.c:21)
> > > <4>[ 97.502634] invoke_syscall (arch/arm64/include/asm/current.h:19
> > > arch/arm64/kernel/syscall.c:56)
> > > <4>[ 97.503175] el0_svc_common.constprop.0
> > > (include/linux/thread_info.h:127 (discriminator 2)
> > > arch/arm64/kernel/syscall.c:144 (discriminator 2))
> > > <4>[ 97.503763] do_el0_svc (arch/arm64/kernel/syscall.c:156)
> > > <4>[ 97.504191] el0_svc (arch/arm64/include/asm/daifflags.h:28
> > > arch/arm64/kernel/entry-common.c:133
> > > arch/arm64/kernel/entry-common.c:144
> > > arch/arm64/kernel/entry-common.c:679)
> >
> > [snip]
>
> Have cc-d people in this thread on it, but for the record, -fix patch is at
> https://lore.kernel.org/all/[email protected]/

Smatch also caught this bug. Your patch silences the warning.

mm/mmap.c:2946 mmap_region() error: we previously assumed 'file' could be null (see line 2849)
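
As a rough illustration of the pattern behind that warning, here is a toy Python model. It is not the kernel code and not the -fix patch: the class names, function names, and the guarded variant are all hypothetical, and Python's AttributeError on None stands in for the NULL pointer dereference. It only shows why code that assumes the caller's file is the same object as the VMA's file breaks when the former is absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class File:
    """Toy stand-in for a file object; f_mapping is just a label here."""
    f_mapping: str


@dataclass
class Vma:
    """Toy stand-in for a VMA that may carry its own file reference."""
    vm_file: Optional[File] = None


def mapping_assuming_same_file(file: Optional[File], vma: Vma) -> Optional[str]:
    # Checks the VMA's file but then reads through the caller's 'file',
    # which is only safe if the two are always the same object.
    if vma.vm_file is not None:
        return file.f_mapping  # fails when file is None
    return None


def mapping_guarded(file: Optional[File], vma: Vma) -> Optional[str]:
    # One possible guard (an assumption, not the actual fix): test the
    # same reference that is about to be dereferenced.
    if file is not None:
        return file.f_mapping
    return None


# The VMA has a file attached, but the caller passed no file.
vma = Vma(vm_file=File(f_mapping="backing-mapping"))

try:
    mapping_assuming_same_file(None, vma)
    crashed = False
except AttributeError:
    crashed = True

guarded_result = mapping_guarded(None, vma)
print(crashed, guarded_result)
```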

It's amazing that Naresh was able to hit this after it had only been in
linux-next for less than a day.

regards,
dan carpenter