Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
I am bisecting this problem.
Reported-by: Linux Kernel Functional Testing <[email protected]>
The initial investigation shows that:
GOOD: next-20220603
BAD: next-20220606
Boot log:
Starting kernel ...
The recent changes show,
# git log --oneline next-20220603..next-20220606 -- arch/arm64/
202693ac55e0 (origin/akpm-base, origin/akpm) Merge branch 'mm-everything' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
a83bdd6800e3 Merge branch 'rust-next' of https://github.com/Rust-for-Linux/linux.git
9daba6cb8145 Merge branch 'for-next' of git://github.com/Xilinx/linux-xlnx.git
582d5ed4caf7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
1ec6574a3c0a Merge tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
21873bd66b6e Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
a8fc46f5a417 mm: avoid unnecessary page fault retires on shared memory types
3c59c47d1a6d arm64: Change elfcore for_each_mte_vma() to use VMA iterator
1c826fa748d5 arm64: remove mmap linked list from vdso
54c2cc79194c Merge tag 'usb-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
09a018176ba2 Merge tag 'arm-late-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
96479c09803b Merge tag 'arm-multiplatform-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Test job link,
https://lkft.validation.linaro.org/scheduler/job/5136989#L560
metadata:
git_ref: master
git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git_sha: 40b58e42584bf5bd9230481dc8946f714fb387de
git_describe: next-20220606
kernel_version: 5.19.0-rc1
kernel-config: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/config
build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/556237413
artifact-location: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy
--
Linaro LKFT
https://lkft.linaro.org
On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <[email protected]> wrote:
>
> Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> I am bisecting this problem.
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
>
> The initial investigation shows that:
>
> GOOD: next-20220603
> BAD: next-20220606
>
> Boot log:
> Starting kernel ...
Linux next-20220606 and next-20220607 arm64 boot failed.
The kernel panic log is shown below (printed via earlycon).
Reported-by: Linux Kernel Functional Testing <[email protected]>
[ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
[ 0.000000] Linux version 5.19.0-rc1-next-20220606 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 11.3.0-3) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38) #1 SMP PREEMPT @1654490846
[ 0.000000] Machine model: ARM Juno development board (r2)
[ 0.000000] earlycon: pl11 at MMIO 0x000000007ff80000 (options '')
[ 0.000000] printk: bootconsole [pl11] enabled
[ 0.000000] efi: UEFI not found.
[ 0.000000] earlycon: pl11 at MMIO 0x000000007ff80000 (options '115200n8')
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] console 'pl11' already registered
[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/printk/printk.c:3327 register_console+0x64/0x2ec
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc1-next-20220606 #1
[ 0.000000] Hardware name: ARM Juno development board (r2) (DT)
[ 0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : register_console+0x64/0x2ec
[ 0.000000] lr : register_console+0x64/0x2ec
[ 0.000000] sp : ffff80000a963c80
[ 0.000000] x29: ffff80000a963c80 x28: 00000000820a0018 x27: 0000000000000000
[ 0.000000] x26: 00000000fef770dc x25: 0000000000000000 x24: ffff80000acbc000
[ 0.000000] x23: 0000000000000000 x22: ffff80000a0b1a30 x21: ffff80000ae39250
[ 0.000000] x20: 00000000000050cc x19: ffff80000acbc5e0 x18: ffffffffffffffff
[ 0.000000] x17: 0000000000ffa000 x16: 00000009ff006000 x15: ffff80008a963957
[ 0.000000] x14: 0000000000000000 x13: 6465726574736967 x12: 6572207964616572
[ 0.000000] x11: 6c61202731316c70 x10: ffff80000a9ea6a8 x9 : ffff80000a9926a8
[ 0.000000] x8 : 00000000ffffefff x7 : ffff80000a9ea6a8 x6 : 0000000000000000
[ 0.000000] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[ 0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff80000a979e00
[ 0.000000] Call trace:
[ 0.000000] register_console+0x64/0x2ec
[ 0.000000] of_setup_earlycon+0x254/0x278
[ 0.000000] early_init_dt_scan_chosen_stdout+0x164/0x1a4
[ 0.000000] acpi_boot_table_init+0x1d8/0x218
[ 0.000000] setup_arch+0x28c/0x5f0
[ 0.000000] start_kernel+0xa4/0x748
[ 0.000000] __primary_switched+0xc0/0xc8
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x00000009ffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x9fefd5b40-0x9fefd7fff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000080000000-0x00000000ffffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000000100000000-0x00000009ffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080000000-0x00000000feffffff]
[ 0.000000] node 0: [mem 0x0000000880000000-0x00000009ffffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000009ffffffff]
[ 0.000000] On node 0, zone Normal: 4096 pages in unavailable ranges
[ 0.000000] cma: Reserved 32 MiB at 0x00000000fd000000
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.1 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.0
[ 0.000000] percpu: Embedded 30 pages/cpu s82792 r8192 d31896 u122880
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: ARM erratum 843419
[ 0.000000] CPU features: detected: ARM erratum 845719
[ 0.000000] Fallback order for Node 0: 0
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 2060288
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: console=ttyAMA0,115200n8 root=/dev/nfs rw nfsroot=10.66.16.125:/var/lib/lava/dispatcher/tmp/5143101/extract-nfsrootfs-i9fmnadt,tcp,hard,vers=3,wsize=65536 earlycon=pl011,0x7ff80000 console_msg_format=syslog earlycon default_hugepagesz=2M hugepages=256 sky2.mac_address=0x00,0x02,0xF7,0x00,0x67,0x17 ip=dhcp
<6>[ 0.000000] HugeTLB: can optimize 7 vmemmap pages for hugepages-2048kB
<6>[ 0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
<6>[ 0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
<6>[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
<6>[ 0.000000] software IO TLB: mapped [mem 0x00000000f9000000-0x00000000fd000000] (64MB)
<6>[ 0.000000] Memory: 8062180K/8372224K available (20032K kernel code, 4884K rwdata, 11148K rodata, 11008K init, 951K bss, 277276K reserved, 32768K cma-reserved)
<6>[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1
<6>[ 0.000000] ftrace: allocating 65019 entries in 254 pages
<6>[ 0.000000] ftrace: allocated 254 pages with 7 groups
<6>[ 0.000000] trace event string verifier disabled
<6>[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
<6>[ 0.000000] rcu: RCU event tracing is enabled.
<6>[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=6.
<6>[ 0.000000] Trampoline variant of Tasks RCU enabled.
<6>[ 0.000000] Rude variant of Tasks RCU enabled.
<6>[ 0.000000] Tracing variant of Tasks RCU enabled.
<6>[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
<6>[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=6
<6>[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
<6>[ 0.000000] Root IRQ handler: gic_handle_irq
<6>[ 0.000000] GIC: Using split EOI/Deactivate mode
<6>[ 0.000000] GICv2m: range[mem 0x2c1c0000-0x2c1cffff], SPI[224:255]
<6>[ 0.000000] GICv2m: range[mem 0x2c1d0000-0x2c1dffff], SPI[256:287]
<6>[ 0.000000] GICv2m: range[mem 0x2c1e0000-0x2c1effff], SPI[288:319]
<6>[ 0.000000] GICv2m: range[mem 0x2c1f0000-0x2c1fffff], SPI[320:351]
<6>[ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
<6>[ 0.000000] kfence: initialized - using 2097152 bytes for 255 objects at 0x(____ptrval____)-0x(____ptrval____)
<3>[ 0.000000] timer_sp804: timer clock not found: -517
<3>[ 0.000000] timer_sp804: arm,sp804 clock not found: -2
<3>[ 0.000000] Failed to initialize '/bus@8000000/motherboard-bus@8000000/iofpga-bus@300000000/timer@110000': -22
<3>[ 0.000000] timer_sp804: timer clock not found: -517
<3>[ 0.000000] timer_sp804: arm,sp804 clock not found: -2
<3>[ 0.000000] Failed to initialize '/bus@8000000/motherboard-bus@8000000/iofpga-bus@300000000/timer@120000': -22
<6>[ 0.000000] arch_timer: cp15 and mmio timer(s) running at 50.00MHz (phys/phys).
<6>[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
<6>[ 0.000000] sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
<6>[ 0.009801] Console: colour dummy device 80x25
<6>[ 0.014654] Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=200000)
<6>[ 0.025413] pid_max: default: 32768 minimum: 301
<6>[ 0.030453] LSM: Security Framework initializing
<1>[ 0.035435] Unable to handle kernel paging request at virtual address fffffe00002bc248
<1>[ 0.043654] Mem abort info:
<1>[ 0.046719] ESR = 0x0000000096000004
<1>[ 0.050752] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 0.056355] SET = 0, FnV = 0
<1>[ 0.059683] EA = 0, S1PTW = 0
<1>[ 0.063105] FSC = 0x04: level 0 translation fault
<1>[ 0.068270] Data abort info:
<1>[ 0.071421] ISV = 0, ISS = 0x00000004
<1>[ 0.075539] CM = 0, WnR = 0
<1>[ 0.078780] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082090000
<1>[ 0.085778] [fffffe00002bc248] pgd=0000000000000000, p4d=0000000000000000
<0>[ 0.092881] Internal error: Oops: 96000004 [#1] PREEMPT SMP
<4>[ 0.098730] Modules linked in:
<4>[ 0.102054] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.19.0-rc1-next-20220606 #1
<4>[ 0.111214] Hardware name: ARM Juno development board (r2) (DT)
<4>[ 0.117407] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
<4>[ 0.124652] pc : mem_cgroup_from_obj+0x2c/0x120
<4>[ 0.129462] lr : register_pernet_operations+0xf0/0x59c
<4>[ 0.134878] sp : ffff80000a963d70
<4>[ 0.138458] x29: ffff80000a963d70 x28: 00000000820a0018 x27: 0000000000000000
<4>[ 0.145886] x26: ffff80000a0c7688 x25: ffff80000a0c7688 x24: ffff80000ad5e680
<4>[ 0.153313] x23: ffff80000a963dd8 x22: ffff80000ad5e818 x21: ffff80000a979e00
<4>[ 0.160739] x20: ffff80000af09740 x19: ffff80000ad5e720 x18: 0000000000000014
<4>[ 0.168166] x17: 00000000beabf81a x16: 00000000d8e898a9 x15: 000000005b20ff98
<4>[ 0.175594] x14: 00000000032b2301 x13: 00000000c9e39f56 x12: 0000000014288186
<4>[ 0.183021] x11: 00000000bcf02680 x10: 000000008d09a8d9 x9 : ffff800009146254
<4>[ 0.190446] x8 : ffff80000a963d48 x7 : 0000000000000000 x6 : 0000000000000002
<4>[ 0.197872] x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
<4>[ 0.205299] x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
<4>[ 0.212726] Call trace:
<4>[ 0.215435] mem_cgroup_from_obj+0x2c/0x120
<4>[ 0.219894] register_pernet_subsys+0x3c/0x60
<4>[ 0.224523] net_ns_init+0xe4/0x13c
<4>[ 0.228285] start_kernel+0x6d4/0x748
<4>[ 0.232222] __primary_switched+0xc0/0xc8
<0>[ 0.236513] Code: b25657e4 d34cfc21 d37ae421 8b040022 (f9400443)
<4>[ 0.242886] ---[ end trace 0000000000000000 ]---
<0>[ 0.247788] Kernel panic - not syncing: Attempted to kill the idle task!
<0>[ 0.254772] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
> The recent changes show,
>
> # git log --oneline next-20220603..next-20220606 -- arch/arm64/
> 202693ac55e0 (origin/akpm-base, origin/akpm) Merge branch 'mm-everything' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> a83bdd6800e3 Merge branch 'rust-next' of https://github.com/Rust-for-Linux/linux.git
> 9daba6cb8145 Merge branch 'for-next' of git://github.com/Xilinx/linux-xlnx.git
> 582d5ed4caf7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
> 1ec6574a3c0a Merge tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
> 21873bd66b6e Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> a8fc46f5a417 mm: avoid unnecessary page fault retires on shared memory types
> 3c59c47d1a6d arm64: Change elfcore for_each_mte_vma() to use VMA iterator
> 1c826fa748d5 arm64: remove mmap linked list from vdso
> 54c2cc79194c Merge tag 'usb-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> 09a018176ba2 Merge tag 'arm-late-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> 96479c09803b Merge tag 'arm-multiplatform-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
>
>
> Test job link,
> https://lkft.validation.linaro.org/scheduler/job/5136989#L560
>
>
> metadata:
> git_ref: master
> git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> git_sha: 40b58e42584bf5bd9230481dc8946f714fb387de
> git_describe: next-20220606
> kernel_version: 5.19.0-rc1
> kernel-config: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/config
> build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/556237413
> artifact-location: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy
>
>
On Mon, Jun 6, 2022 at 11:25 PM Stephen Rothwell <[email protected]> wrote:
>
> Hi Naresh,
>
> On Tue, 7 Jun 2022 11:00:39 +0530 Naresh Kamboju <[email protected]> wrote:
> >
> > On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <[email protected]> wrote:
> > >
> > > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > > I am bisecting this problem.
> > >
> > > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > >
> > > The initial investigation shows that:
> > >
> > > GOOD: next-20220603
> > > BAD: next-20220606
> > >
> > > Boot log:
> > > Starting kernel ...
> >
> > Linux next-20220606 and next-20220607 arm64 boot failed.
> > The kernel panic log is shown below (printed via earlycon).
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
>
> Can you test v5.19-rc1, please? If that does not fail, then you could
> bisect between that and next-20220606 ...
>
This is already reported at
https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
the underlying issue (which is calling virt_to_page() on a vmalloc
address).
Hi Shakeel,
> > > Can you test v5.19-rc1, please? If that does not fail, then you could
> > > bisect between that and next-20220606 ...
> > >
> >
> > This is already reported at
> > https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
> > the underlying issue (which is calling virt_to_page() on a vmalloc
> > address).
>
> Sorry, I might be wrong. Just checked the stacktrace again and it
> seems like the failure is happening in early boot in this report.
> Though the error "Unable to handle kernel paging request at virtual
> address" is happening in the function mem_cgroup_from_obj().
>
> Naresh, can you repro the issue if you revert the patch "net: set
> proper memcg for net_init hooks allocations"?
Yes, you are right!
19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
After reverting this single commit, I am able to boot arm64 successfully.
Reported-by: Linux Kernel Functional Testing <[email protected]>
On Mon, Jun 6, 2022 at 11:36 PM Shakeel Butt <[email protected]> wrote:
>
> On Mon, Jun 6, 2022 at 11:25 PM Stephen Rothwell <[email protected]> wrote:
> >
> > Hi Naresh,
> >
> > On Tue, 7 Jun 2022 11:00:39 +0530 Naresh Kamboju <[email protected]> wrote:
> > >
> > > On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <[email protected]> wrote:
> > > >
> > > > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > > > I am bisecting this problem.
> > > >
> > > > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > > >
> > > > The initial investigation shows that:
> > > >
> > > > GOOD: next-20220603
> > > > BAD: next-20220606
> > > >
> > > > Boot log:
> > > > Starting kernel ...
> > >
> > > Linux next-20220606 and next-20220607 arm64 boot failed.
> > > The kernel panic log is shown below (printed via earlycon).
> > >
> > > Reported-by: Linux Kernel Functional Testing <[email protected]>
> >
> > Can you test v5.19-rc1, please? If that does not fail, then you could
> > bisect between that and next-20220606 ...
> >
>
> This is already reported at
> https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
> the underlying issue (which is calling virt_to_page() on a vmalloc
> address).
Sorry, I might be wrong. I just checked the stack trace again, and it
seems the failure in this report happens in early boot, although the
error "Unable to handle kernel paging request at virtual address" is
raised in mem_cgroup_from_obj().
Naresh, can you repro the issue if you revert the patch "net: set
proper memcg for net_init hooks allocations"?
On Tue, 7 Jun 2022 at 19:47, Shakeel Butt <[email protected]> wrote:
>
> On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <[email protected]> wrote:
> >
> > Hi Shakeel,
> >
> > > > > Can you test v5.19-rc1, please? If that does not fail, then you could
> > > > > bisect between that and next-20220606 ...
> > > > >
> > > >
> > > > This is already reported at
> > > > https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
> > > > the underlying issue (which is calling virt_to_page() on a vmalloc
> > > > address).
> > >
> > > Sorry, I might be wrong. Just checked the stacktrace again and it
> > > seems like the failure is happening in early boot in this report.
> > > Though the error "Unable to handle kernel paging request at virtual
> > > address" is happening in the function mem_cgroup_from_obj().
> > >
> > > Naresh, can you repro the issue if you revert the patch "net: set
> > > proper memcg for net_init hooks allocations"?
> >
> > yes. You are right !
> > 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
> > After reverting this single commit I am able to boot arm64 successfully.
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> >
>
> Can you please run scripts/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?
./scripts/faddr2line vmlinux mem_cgroup_from_obj+0x2c/0x120
mem_cgroup_from_obj+0x2c/0x120:
mem_cgroup_from_obj at ??:?
Please find below the artifacts from the build that crashes.
vmlinux: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/vmlinux.xz
System.map: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/System.map
- Naresh
Hi Naresh,
On Tue, 7 Jun 2022 11:00:39 +0530 Naresh Kamboju <[email protected]> wrote:
>
> On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <[email protected]> wrote:
> >
> > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > I am bisecting this problem.
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> >
> > The initial investigation shows that:
> >
> > GOOD: next-20220603
> > BAD: next-20220606
> >
> > Boot log:
> > Starting kernel ...
>
> Linux next-20220606 and next-20220607 arm64 boot failed.
> The kernel panic log is shown below (printed via earlycon).
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
Can you test v5.19-rc1, please? If that does not fail, then you could
bisect between that and next-20220606 ...
--
Cheers,
Stephen Rothwell
Hi Stephen,
On Tue, 7 Jun 2022 at 11:55, Stephen Rothwell <[email protected]> wrote:
>
> Hi Naresh,
>
> On Tue, 7 Jun 2022 11:00:39 +0530 Naresh Kamboju <[email protected]> wrote:
> >
> > On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <[email protected]> wrote:
> > >
> > > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > > I am bisecting this problem.
Bisection identified the first bad commit:
19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
After reverting this single commit, I am able to boot arm64 successfully.
- Naresh
On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <[email protected]> wrote:
>
> Hi Shakeel,
>
> > > > Can you test v5.19-rc1, please? If that does not fail, then you could
> > > > bisect between that and next-20220606 ...
> > > >
> > >
> > > This is already reported at
> > > https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
> > > the underlying issue (which is calling virt_to_page() on a vmalloc
> > > address).
> >
> > Sorry, I might be wrong. Just checked the stacktrace again and it
> > seems like the failure is happening in early boot in this report.
> > Though the error "Unable to handle kernel paging request at virtual
> > address" is happening in the function mem_cgroup_from_obj().
> >
> > Naresh, can you repro the issue if you revert the patch "net: set
> > proper memcg for net_init hooks allocations"?
>
> yes. You are right !
> 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
> After reverting this single commit I am able to boot arm64 successfully.
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
>
Can you please run scripts/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?
Dear ARM developers,
could you please help me find the cause of this problem?
On 6/7/22 18:29, Naresh Kamboju wrote:
> On Tue, 7 Jun 2022 at 19:47, Shakeel Butt <[email protected]> wrote:
>>
>> On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <[email protected]> wrote:
>>>
>>> Hi Shakeel,
>>>
>>>>>> Can you test v5.19-rc1, please? If that does not fail, then you could
>>>>>> bisect between that and next-20220606 ...
>>>>>>
>>>>>
>>>>> This is already reported at
>>>>> https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
>>>>> the underlying issue (which is calling virt_to_page() on a vmalloc
>>>>> address).
>>>>
>>>> Sorry, I might be wrong. Just checked the stacktrace again and it
>>>> seems like the failure is happening in early boot in this report.
>>>> Though the error "Unable to handle kernel paging request at virtual
>>>> address" is happening in the function mem_cgroup_from_obj().
>>>>
>>>> Naresh, can you repro the issue if you revert the patch "net: set
>>>> proper memcg for net_init hooks allocations"?
>>>
>>> yes. You are right !
>>> 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
>>> After reverting this single commit I am able to boot arm64 successfully.
>>>
>>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>>>
>>
>> Can you please run scripts/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?
>
> ./scripts/faddr2line vmlinux mem_cgroup_from_obj+0x2c/0x120
> mem_cgroup_from_obj+0x2c/0x120:
> mem_cgroup_from_obj at ??:?
>
> Please find below the artifacts from the build that crashes.
>
> vmlinux: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/vmlinux.xz
> System.map: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/System.map
Dear Naresh,
thank you very much
mem_cgroup_from_obj():
ffff80000836cf40: d503245f bti c
ffff80000836cf44: d503201f nop
ffff80000836cf48: d503201f nop
ffff80000836cf4c: d503233f paciasp
ffff80000836cf50: d503201f nop
ffff80000836cf54: d2e00021 mov x1, #0x1000000000000 // #281474976710656
ffff80000836cf58: 8b010001 add x1, x0, x1
ffff80000836cf5c: b25657e4 mov x4, #0xfffffc0000000000 // #-4398046511104
ffff80000836cf60: d34cfc21 lsr x1, x1, #12
ffff80000836cf64: d37ae421 lsl x1, x1, #6
ffff80000836cf68: 8b040022 add x2, x1, x4
ffff80000836cf6c: f9400443 ldr x3, [x2, #8]
x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
x0 = 0xffff80000af09740 is the argument of mem_cgroup_from_obj();
according to System.map, it is init_net.
This issue is caused by calling virt_to_page() on the address of the static variable init_net.
Arm64 considers that the addresses of static variables are not valid (linear-map) virtual addresses.
On x86_64 the same API works without any problem.
Unfortunately, I do not understand the cause of the problem.
I do not see any bugs in my patch.
I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
to account for the specified object.
In particular, in this case I wanted to get the memory cgroup of each
network namespace returned by for_each_net().
The first object in this list is the static structure init_net.
On x86_64 I can translate its address to a page:
crash> p &init_net
$1 = (struct net *) 0xffffffff90c7bdc0 <init_net>
crash> vtop 0xffffffff90c7bdc0
VIRTUAL PHYSICAL
ffffffff90c7bdc0 402c7bdc0
PGD DIRECTORY: ffffffff8fe10000
PAGE DIRECTORY: 401e15067
PUD: 401e15ff0 => 401e16063
PMD: 401e16430 => 8000000402c000e3
PAGE: 402c00000 (2MB)
PTE PHYSICAL FLAGS
8000000402c000e3 402c00000 (PRESENT|RW|ACCESSED|DIRTY|PSE|NX)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff227d00b1ec0 402c7b000 0 0 1 17ffffc0001000 reserved
However, as far as I understand, this does not work on arm64.
Could you please help me understand what is wrong here?
Below are a link to my patch:
https://lore.kernel.org/all/[email protected]/
and a quote from my investigation of a similar report:
https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/
> virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)
> WARNING: CPU: 87 PID: 3170 at arch/arm64/mm/physaddr.c:12 __virt_to_phys
...
> Call trace:
> __virt_to_phys
> mem_cgroup_from_obj
> __register_pernet_operations
@@ -1143,7 +1144,13 @@ static int __register_pernet_operations(struct list_head *list,
* setup_net() and cleanup_net() are not possible.
*/
for_each_net(net) {
+ struct mem_cgroup *old, *memcg;
+
+ memcg = mem_cgroup_or_root(get_mem_cgroup_from_obj(net)); <<<< Here
+ old = set_active_memcg(memcg);
error = ops_init(ops, net);
+ set_active_memcg(old);
+ mem_cgroup_put(memcg);
...
+static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p)
+{
+ struct mem_cgroup *memcg;
+
+ rcu_read_lock();
+ do {
+ memcg = mem_cgroup_from_obj(p); <<<<
+ } while (memcg && !css_tryget(&memcg->css));
...
struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
struct folio *folio;
if (mem_cgroup_disabled())
return NULL;
folio = virt_to_folio(p); <<<< here
...
static inline struct folio *virt_to_folio(const void *x)
{
struct page *page = virt_to_page(x); <<< here
... (arm64)
#define virt_to_page(x) pfn_to_page(virt_to_pfn(x))
...
#define virt_to_pfn(x) __phys_to_pfn(__virt_to_phys((unsigned long)(x)))
...
phys_addr_t __virt_to_phys(unsigned long x)
{
WARN(!__is_lm_address(__tag_reset(x)),
"virt_to_phys used for non-linear address: %pK (%pS)\n",
...
virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)
Thank you,
Vasily Averin
On 2022/6/9 10:49, Vasily Averin wrote:
> Dear ARM developers,
> could you please help me to find the reason of this problem?
Hi,
> mem_cgroup_from_obj():
> ffff80000836cf40: d503245f bti c
> ffff80000836cf44: d503201f nop
> ffff80000836cf48: d503201f nop
> ffff80000836cf4c: d503233f paciasp
> ffff80000836cf50: d503201f nop
> ffff80000836cf54: d2e00021 mov x1, #0x1000000000000 // #281474976710656
> ffff80000836cf58: 8b010001 add x1, x0, x1
> ffff80000836cf5c: b25657e4 mov x4, #0xfffffc0000000000 // #-4398046511104
> ffff80000836cf60: d34cfc21 lsr x1, x1, #12
> ffff80000836cf64: d37ae421 lsl x1, x1, #6
> ffff80000836cf68: 8b040022 add x2, x1, x4
> ffff80000836cf6c: f9400443 ldr x3, [x2, #8]
>
> x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
>
> x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> according to System.map it is init_net
>
> This issue is caused by calling virt_to_page() on the address of the static variable init_net.
> Arm64 considers that the addresses of static variables are not valid (linear-map) virtual addresses.
> On x86_64 the same API works without any problem.
>
> Unfortunately I do not understand the cause of the problem.
> I do not see any bugs in my patch.
> I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
> to account for the specified object.
> In particular, in this case I wanted to get the memory cgroup of each
> network namespace returned by for_each_net().
> The first object in this list is the static structure init_net.
root@test:~# cat /proc/kallsyms |grep -w _data
ffff80000a110000 D _data
root@test:~# cat /proc/kallsyms |grep -w _end
ffff80000a500000 B _end
root@test:~# cat /proc/kallsyms |grep -w init_net
ffff80000a4eb980 B init_net
init_net is located in the data section, which on arm64 is mapped as part
of the vmalloc area; see

    map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);

and arm has the same behavior.
We could allocate init_net dynamically, but I think that would change a
lot.
Any better suggestion, Catalin?
On Thu, Jun 09, 2022 at 12:43:00PM +0800, Kefeng Wang wrote:
>
> On 2022/6/9 11:44, Kefeng Wang wrote:
> >
> > On 2022/6/9 10:49, Vasily Averin wrote:
> > > Dear ARM developers,
> > > could you please help me to find the reason of this problem?
> > Hi,
> > > mem_cgroup_from_obj():
> > > ffff80000836cf40: d503245f bti c
> > > ffff80000836cf44: d503201f nop
> > > ffff80000836cf48: d503201f nop
> > > ffff80000836cf4c: d503233f paciasp
> > > ffff80000836cf50: d503201f nop
> > > ffff80000836cf54: d2e00021 mov x1, #0x1000000000000 // #281474976710656
> > > ffff80000836cf58: 8b010001 add x1, x0, x1
> > > ffff80000836cf5c: b25657e4 mov x4, #0xfffffc0000000000 // #-4398046511104
> > > ffff80000836cf60: d34cfc21 lsr x1, x1, #12
> > > ffff80000836cf64: d37ae421 lsl x1, x1, #6
> > > ffff80000836cf68: 8b040022 add x2, x1, x4
> > > ffff80000836cf6c: f9400443 ldr x3, [x2, #8]
> > >
> > > x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> > > x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
> > >
> > > x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> > > according to System.map it is init_net
> > >
> > > This issue is caused by calling virt_to_page() on the address of the
> > > static variable init_net.
> > > Arm64 considers that the addresses of static variables are not valid
> > > (linear-map) virtual addresses.
> > > On x86_64 the same API works without any problem.
> > >
> > > Unfortunately, I do not understand the cause of the problem.
> > > I do not see any bugs in my patch.
> > > I'm using an existing API, mem_cgroup_from_obj(), to find the memory
> > > cgroup used to account for the specified object.
> > > In particular, in this case I wanted to get the memory cgroup of each
> > > network namespace returned by for_each_net().
> > > The first object in this list is the static structure init_net.
> >
> > root@test:~# cat /proc/kallsyms |grep -w _data
> > ffff80000a110000 D _data
> > root@test:~# cat /proc/kallsyms |grep -w _end
> > ffff80000a500000 B _end
> > root@test:~# cat /proc/kallsyms |grep -w init_net
> > ffff80000a4eb980 B init_net
> >
> > init_net is located in the data section, which on arm64 is mapped as
> > part of the vmalloc area; see
> >
> >     map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
> >
> > and the arm has same behavior.
> >
> > We could let init_net be allocated dynamically, but I think it could
> > change a lot.
> >
> > Any better suggestion, Catalin?
>
> Or add a vmalloc check in mem_cgroup_from_obj()?
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 27cebaa53472..fb817e5da5f0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2860,7 +2860,10 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
>  	if (mem_cgroup_disabled())
>  		return NULL;
>  
> -	folio = virt_to_folio(p);
> +	if (unlikely(is_vmalloc_addr(p)))
> +		folio = page_folio(vmalloc_to_page(p));
> +	else
> +		folio = virt_to_folio(p);
>  
>  	/*
>  	 * Slab objects are accounted individually, not per-page.
It sounds right. Later we can add something like mem_cgroup_from_slab_obj()
to use on hot paths and avoid this check.
On 2022/6/9 11:44, Kefeng Wang wrote:
>
> On 2022/6/9 10:49, Vasily Averin wrote:
>> Dear ARM developers,
>> could you please help me find the cause of this problem?
> Hi,
>> mem_cgroup_from_obj():
>> ffff80000836cf40: d503245f bti c
>> ffff80000836cf44: d503201f nop
>> ffff80000836cf48: d503201f nop
>> ffff80000836cf4c: d503233f paciasp
>> ffff80000836cf50: d503201f nop
>> ffff80000836cf54: d2e00021 mov x1, #0x1000000000000 // #281474976710656
>> ffff80000836cf58: 8b010001 add x1, x0, x1
>> ffff80000836cf5c: b25657e4 mov x4, #0xfffffc0000000000 // #-4398046511104
>> ffff80000836cf60: d34cfc21 lsr x1, x1, #12
>> ffff80000836cf64: d37ae421 lsl x1, x1, #6
>> ffff80000836cf68: 8b040022 add x2, x1, x4
>> ffff80000836cf6c: f9400443 ldr x3, [x2, #8]
>>
>> x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
>> x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
>>
>> x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
>> according to System.map it is init_net
>>
>> This issue is caused by calling virt_to_page() on the address of the static
>> variable init_net.
>> Arm64 considers addresses of static variables not to be valid virtual
>> addresses.
>> On x86_64 the same API works without any problem.
>>
>> Unfortunately I do not understand the cause of the problem.
>> I do not see any bugs in my patch.
>> I'm using an existing API, mem_cgroup_from_obj(), to find the memory
>> cgroup used to account for the specified object.
>> In particular, in the current case, I wanted to get the memory cgroup of
>> the network namespace obtained from for_each_net().
>> The first object in this list is the static structure init_net.
>
> root@test:~# cat /proc/kallsyms |grep -w _data
> ffff80000a110000 D _data
> root@test:~# cat /proc/kallsyms |grep -w _end
> ffff80000a500000 B _end
> root@test:~# cat /proc/kallsyms |grep -w init_net
> ffff80000a4eb980 B init_net
>
> init_net lies between _data and _end, and on arm64 that range is mapped in
> the vmalloc area, see
>
>	map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
>
> and arm has the same behavior.
>
> We could allocate init_net dynamically, but I think that would change a lot.
>
> Any better suggestion, Catalin?
Or add a vmalloc check in mem_cgroup_from_obj()?
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 27cebaa53472..fb817e5da5f0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2860,7 +2860,10 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	if (mem_cgroup_disabled())
 		return NULL;
 
-	folio = virt_to_folio(p);
+	if (unlikely(is_vmalloc_addr(p)))
+		folio = page_folio(vmalloc_to_page(p));
+	else
+		folio = virt_to_folio(p);
 
 	/*
 	 * Slab objects are accounted individually, not per-page.
On Thu, Jun 09, 2022 at 11:11:54AM +0100, Will Deacon wrote:
> On Thu, Jun 09, 2022 at 11:44:09AM +0800, Kefeng Wang wrote:
> > On 2022/6/9 10:49, Vasily Averin wrote:
> > > mem_cgroup_from_obj():
> > > ffff80000836cf40: d503245f bti c
> > > ffff80000836cf44: d503201f nop
> > > ffff80000836cf48: d503201f nop
> > > ffff80000836cf4c: d503233f paciasp
> > > ffff80000836cf50: d503201f nop
> > > ffff80000836cf54: d2e00021 mov x1, #0x1000000000000 // #281474976710656
> > > ffff80000836cf58: 8b010001 add x1, x0, x1
> > > ffff80000836cf5c: b25657e4 mov x4, #0xfffffc0000000000 // #-4398046511104
> > > ffff80000836cf60: d34cfc21 lsr x1, x1, #12
> > > ffff80000836cf64: d37ae421 lsl x1, x1, #6
> > > ffff80000836cf68: 8b040022 add x2, x1, x4
> > > ffff80000836cf6c: f9400443 ldr x3, [x2, #8]
> > >
> > > x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> > > x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
> > >
> > > x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> > > according to System.map it is init_net
> > >
> > > This issue is caused by calling virt_to_page() on the address of the static variable init_net.
> > > Arm64 considers addresses of static variables not to be valid virtual addresses.
> > > On x86_64 the same API works without any problem.
>
> This just depends on whether or not the kernel is running out of the linear
> mapping. On arm64, we use the vmalloc area for the kernel image and
> so virt_to_page() won't work, just like it won't work for modules on other
> architectures.
>
> How are module addresses handled by mem_cgroup_from_obj()?
It doesn't look like they are handled in any way. It just expects the
pointer to be a linear map one. Something like below:
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 27cebaa53472..795bf3673fa7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2860,6 +2860,11 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	if (mem_cgroup_disabled())
 		return NULL;
 
+	if (is_module_address((unsigned long)p))
+		return NULL;
+	else if (is_kernel((unsigned long)p))
+		return NULL;
+
 	folio = virt_to_folio(p);
 
 	/*
--
Catalin
On Thu, Jun 09, 2022 at 11:44:09AM +0800, Kefeng Wang wrote:
> On 2022/6/9 10:49, Vasily Averin wrote:
> > mem_cgroup_from_obj():
> > ffff80000836cf40: d503245f bti c
> > ffff80000836cf44: d503201f nop
> > ffff80000836cf48: d503201f nop
> > ffff80000836cf4c: d503233f paciasp
> > ffff80000836cf50: d503201f nop
> > ffff80000836cf54: d2e00021 mov x1, #0x1000000000000 // #281474976710656
> > ffff80000836cf58: 8b010001 add x1, x0, x1
> > ffff80000836cf5c: b25657e4 mov x4, #0xfffffc0000000000 // #-4398046511104
> > ffff80000836cf60: d34cfc21 lsr x1, x1, #12
> > ffff80000836cf64: d37ae421 lsl x1, x1, #6
> > ffff80000836cf68: 8b040022 add x2, x1, x4
> > ffff80000836cf6c: f9400443 ldr x3, [x2, #8]
> >
> > x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> > x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
> >
> > x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> > according to System.map it is init_net
> >
> > This issue is caused by calling virt_to_page() on the address of the static variable init_net.
> > Arm64 considers addresses of static variables not to be valid virtual addresses.
> > On x86_64 the same API works without any problem.
This just depends on whether or not the kernel is running out of the linear
mapping. On arm64, we use the vmalloc area for the kernel image and
so virt_to_page() won't work, just like it won't work for modules on other
architectures.
How are module addresses handled by mem_cgroup_from_obj()?
> > Unfortunately I do not understand the cause of the problem.
> > I do not see any bugs in my patch.
> > I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
> > to account for the specified object.
> > In particular, in the current case, I wanted to get the memory cgroup of the
> > network namespace obtained from for_each_net().
> > The first object in this list is the static structure init_net.
>
> root@test:~# cat /proc/kallsyms |grep -w _data
> ffff80000a110000 D _data
> root@test:~# cat /proc/kallsyms |grep -w _end
> ffff80000a500000 B _end
> root@test:~# cat /proc/kallsyms |grep -w init_net
> ffff80000a4eb980 B init_net
>
> init_net lies between _data and _end, and on arm64 that range is mapped in
> the vmalloc area, see
>
>	map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
>
> and arm has the same behavior.
>
> We could allocate init_net dynamically, but I think that would change a lot.
>
> Any better suggestion, Catalin?
For this specific issue, can you use lm_alias to get a virtual address
suitable for virt_to_page()? My question about modules still applies though.
Will
On Thu, Jun 9, 2022 at 3:26 AM Catalin Marinas <[email protected]> wrote:
>
> On Thu, Jun 09, 2022 at 11:11:54AM +0100, Will Deacon wrote:
> > On Thu, Jun 09, 2022 at 11:44:09AM +0800, Kefeng Wang wrote:
> > > On 2022/6/9 10:49, Vasily Averin wrote:
> > > > mem_cgroup_from_obj():
> > > > ffff80000836cf40: d503245f bti c
> > > > ffff80000836cf44: d503201f nop
> > > > ffff80000836cf48: d503201f nop
> > > > ffff80000836cf4c: d503233f paciasp
> > > > ffff80000836cf50: d503201f nop
> > > > ffff80000836cf54: d2e00021 mov x1, #0x1000000000000 // #281474976710656
> > > > ffff80000836cf58: 8b010001 add x1, x0, x1
> > > > ffff80000836cf5c: b25657e4 mov x4, #0xfffffc0000000000 // #-4398046511104
> > > > ffff80000836cf60: d34cfc21 lsr x1, x1, #12
> > > > ffff80000836cf64: d37ae421 lsl x1, x1, #6
> > > > ffff80000836cf68: 8b040022 add x2, x1, x4
> > > > ffff80000836cf6c: f9400443 ldr x3, [x2, #8]
> > > >
> > > > x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> > > > x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
> > > >
> > > > x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> > > > according to System.map it is init_net
> > > >
> > > > This issue is caused by calling virt_to_page() on the address of the static variable init_net.
> > > > Arm64 considers addresses of static variables not to be valid virtual addresses.
> > > > On x86_64 the same API works without any problem.
> >
> > This just depends on whether or not the kernel is running out of the linear
> > mapping. On arm64, we use the vmalloc area for the kernel image and
> > so virt_to_page() won't work, just like it won't work for modules on other
> > architectures.
> >
> > How are module addresses handled by mem_cgroup_from_obj()?
>
> It doesn't look like they are handled in any way. It just expects the
> pointer to be a linear map one.
Yes, that is correct.
> Something like below:
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 27cebaa53472..795bf3673fa7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2860,6 +2860,11 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
> if (mem_cgroup_disabled())
> return NULL;
>
> + if (is_module_address((unsigned long)p))
> + return NULL;
> + else if (is_kernel((unsigned long)p))
> + return NULL;
> +
How about just an is_vmalloc_addr(p) check? It should cover modules and
also the arm64 case where the kernel image lives in the vmalloc area.
> folio = virt_to_folio(p);
>
> /*
>
> --
> Catalin
On Thu, Jun 09, 2022 at 10:47:35AM -0700, Shakeel Butt wrote:
> On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <[email protected]> wrote:
> >
> [...]
> > +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> > +{
> > + struct folio *folio;
> > +
> > + if (mem_cgroup_disabled())
> > + return NULL;
> > +
> > + if (unlikely(is_vmalloc_addr(p)))
> > + folio = page_folio(vmalloc_to_page(p));
>
> Do we need to check for NULL from vmalloc_to_page(p)?
Idk, can it realistically return NULL after is_vmalloc_addr() returned true?
I would be surprised, but maybe I'm missing something.
On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <[email protected]> wrote:
>
[...]
> +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> +{
> + struct folio *folio;
> +
> + if (mem_cgroup_disabled())
> + return NULL;
> +
> + if (unlikely(is_vmalloc_addr(p)))
> + folio = page_folio(vmalloc_to_page(p));
Do we need to check for NULL from vmalloc_to_page(p)?
> + else
> + folio = virt_to_folio(p);
> +
> + return mem_cgroup_from_obj_folio(folio, p);
> +}
On Tue, Jun 07, 2022 at 11:00:39AM +0530, Naresh Kamboju wrote:
> On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <[email protected]> wrote:
> >
> > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > I am bisecting this problem.
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> >
> > The initial investigation shows that,
> >
> > GOOD: next-20220603
> > BAD: next-20220606
> >
> > Boot log:
> > Starting kernel ...
>
> Linux next-20220606 and next-20220607 arm64 boot failed.
> The kernel panic log shows up after earlycon.
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
Naresh, can you, please, check if the following patch resolves the issue?
(completely untested except for building)
--
From 6a454876c9a1886e3cf8e9b66dae19b326f8901a Mon Sep 17 00:00:00 2001
From: Roman Gushchin <[email protected]>
Date: Thu, 9 Jun 2022 10:03:20 -0700
Subject: [PATCH] mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe
Currently mem_cgroup_from_obj() does not work properly with objects
allocated using vmalloc(). This causes problems when it is called for
static objects belonging to modules or, on arm64, to the kernel image
itself, or for objects allocated using vmalloc() in general.

This patch makes mem_cgroup_from_obj() safe to be called on objects
allocated using vmalloc().

It also introduces mem_cgroup_from_slab_obj(), which is a faster
version to use in places where we know the object is either a slab
object or a generic kernel page (e.g. when adding an object to an LRU
list).

Suggested-by: Kefeng Wang <[email protected]>
Signed-off-by: Roman Gushchin <[email protected]>
---
 include/linux/memcontrol.h |  6 ++++
 mm/list_lru.c              |  2 +-
 mm/memcontrol.c            | 71 +++++++++++++++++++++++++++-----------
 3 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0d7584e2f335..4d31ce55b1c0 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1761,6 +1761,7 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg)
 }
 
 struct mem_cgroup *mem_cgroup_from_obj(void *p);
+struct mem_cgroup *mem_cgroup_from_slab_obj(void *p);
 
 static inline void count_objcg_event(struct obj_cgroup *objcg,
 				     enum vm_event_item idx)
@@ -1858,6 +1859,11 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	return NULL;
 }
 
+static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
+{
+	return NULL;
+}
+
 static inline void count_objcg_event(struct obj_cgroup *objcg,
 				     enum vm_event_item idx)
 {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index ba76428ceece..a05e5bef3b40 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -71,7 +71,7 @@ list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
 	if (!list_lru_memcg_aware(lru))
 		goto out;
 
-	memcg = mem_cgroup_from_obj(ptr);
+	memcg = mem_cgroup_from_slab_obj(ptr);
 	if (!memcg)
 		goto out;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4093062c5c9b..8c408d681377 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -783,7 +783,7 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 	struct lruvec *lruvec;
 
 	rcu_read_lock();
-	memcg = mem_cgroup_from_obj(p);
+	memcg = mem_cgroup_from_slab_obj(p);
 
 	/*
 	 * Untracked pages have no memcg, no lruvec. Update only the
@@ -2833,27 +2833,9 @@ int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
 	return 0;
 }
 
-/*
- * Returns a pointer to the memory cgroup to which the kernel object is charged.
- *
- * A passed kernel object can be a slab object or a generic kernel page, so
- * different mechanisms for getting the memory cgroup pointer should be used.
- * In certain cases (e.g. kernel stacks or large kmallocs with SLUB) the caller
- * can not know for sure how the kernel object is implemented.
- * mem_cgroup_from_obj() can be safely used in such cases.
- *
- * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
- * cgroup_mutex, etc.
- */
-struct mem_cgroup *mem_cgroup_from_obj(void *p)
+static __always_inline
+struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
 {
-	struct folio *folio;
-
-	if (mem_cgroup_disabled())
-		return NULL;
-
-	folio = virt_to_folio(p);
-
 	/*
 	 * Slab objects are accounted individually, not per-page.
 	 * Memcg membership data for each individual object is saved in
@@ -2886,6 +2868,53 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	return page_memcg_check(folio_page(folio, 0));
 }
 
+/*
+ * Returns a pointer to the memory cgroup to which the kernel object is charged.
+ *
+ * A passed kernel object can be a slab object, vmalloc object or a generic
+ * kernel page, so different mechanisms for getting the memory cgroup pointer
+ * should be used.
+ *
+ * In certain cases (e.g. kernel stacks or large kmallocs with SLUB) the caller
+ * can not know for sure how the kernel object is implemented.
+ * mem_cgroup_from_obj() can be safely used in such cases.
+ *
+ * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
+ * cgroup_mutex, etc.
+ */
+struct mem_cgroup *mem_cgroup_from_obj(void *p)
+{
+	struct folio *folio;
+
+	if (mem_cgroup_disabled())
+		return NULL;
+
+	if (unlikely(is_vmalloc_addr(p)))
+		folio = page_folio(vmalloc_to_page(p));
+	else
+		folio = virt_to_folio(p);
+
+	return mem_cgroup_from_obj_folio(folio, p);
+}
+
+/*
+ * Returns a pointer to the memory cgroup to which the kernel object is charged.
+ * Similar to mem_cgroup_from_obj(), but faster and not suitable for objects,
+ * allocated using vmalloc().
+ *
+ * A passed kernel object must be a slab object or a generic kernel page.
+ *
+ * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
+ * cgroup_mutex, etc.
+ */
+struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
+{
+	if (mem_cgroup_disabled())
+		return NULL;
+
+	return mem_cgroup_from_obj_folio(virt_to_folio(p), p);
+}
+
 static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 {
 	struct obj_cgroup *objcg = NULL;
--
2.35.3
On Thu, Jun 09, 2022 at 10:56:09AM -0700, Roman Gushchin wrote:
> On Thu, Jun 09, 2022 at 10:47:35AM -0700, Shakeel Butt wrote:
> > On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <[email protected]> wrote:
> > >
> > [...]
> > > +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> > > +{
> > > + struct folio *folio;
> > > +
> > > + if (mem_cgroup_disabled())
> > > + return NULL;
> > > +
> > > + if (unlikely(is_vmalloc_addr(p)))
> > > + folio = page_folio(vmalloc_to_page(p));
> >
> > Do we need to check for NULL from vmalloc_to_page(p)?
>
> Idk, can it realistically return NULL after is_vmalloc_addr() returned true?
> I would be surprised, but maybe I'm missing something.
is_vmalloc_addr() only checks the range, so a buggy caller could pass an
unmapped address within that range. Maybe a VM_BUG_ON() would be good
enough (though I have no strong opinion either way).
On Thu, Jun 09, 2022 at 07:12:21PM +0000, Shakeel Butt wrote:
> On Thu, Jun 09, 2022 at 10:56:09AM -0700, Roman Gushchin wrote:
> > On Thu, Jun 09, 2022 at 10:47:35AM -0700, Shakeel Butt wrote:
> > > On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <[email protected]> wrote:
> > > >
> > > [...]
> > > > +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> > > > +{
> > > > + struct folio *folio;
> > > > +
> > > > + if (mem_cgroup_disabled())
> > > > + return NULL;
> > > > +
> > > > + if (unlikely(is_vmalloc_addr(p)))
> > > > + folio = page_folio(vmalloc_to_page(p));
> > >
> > > Do we need to check for NULL from vmalloc_to_page(p)?
> >
> > Idk, can it realistically return NULL after is_vmalloc_addr() returned true?
> > I would be surprised, but maybe I'm missing something.
>
> is_vmalloc_addr() only checks the range, so a buggy caller could pass an
> unmapped address within that range. Maybe a VM_BUG_ON() would be good
> enough (though I have no strong opinion either way).
No strong opinion here either, but I don't think we have to be too defensive
here. We'll know anyway: a NULL pointer dereference is unlikely to go
unnoticed. And it's no different from calling mem_cgroup_from_obj() with some
random invalid address now.
Thanks!
On Thu, Jun 09, 2022 at 03:05:08PM -0700, Roman Gushchin wrote:
> On Thu, Jun 09, 2022 at 07:12:21PM +0000, Shakeel Butt wrote:
> > On Thu, Jun 09, 2022 at 10:56:09AM -0700, Roman Gushchin wrote:
> > > On Thu, Jun 09, 2022 at 10:47:35AM -0700, Shakeel Butt wrote:
> > > > On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <[email protected]> wrote:
> > > > >
> > > > [...]
> > > > > +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> > > > > +{
> > > > > + struct folio *folio;
> > > > > +
> > > > > + if (mem_cgroup_disabled())
> > > > > + return NULL;
> > > > > +
> > > > > + if (unlikely(is_vmalloc_addr(p)))
> > > > > + folio = page_folio(vmalloc_to_page(p));
> > > >
> > > > Do we need to check for NULL from vmalloc_to_page(p)?
> > >
> > > Idk, can it realistically return NULL after is_vmalloc_addr() returned true?
> > > I would be surprised, but maybe I'm missing something.
> >
> > is_vmalloc_addr() only checks the range, so a buggy caller could pass an
> > unmapped address within that range. Maybe a VM_BUG_ON() would be good
> > enough (though I have no strong opinion either way).
>
> No strong opinion here either, but I don't think we have to be too defensive
> here. We'll know anyway: a NULL pointer dereference is unlikely to go
> unnoticed. And it's no different from calling mem_cgroup_from_obj() with some
> random invalid address now.
>
Sounds good. You can add my ack when you send the official version of
the patch.
Hi Roman,
On Thu, 9 Jun 2022 at 22:57, Roman Gushchin <[email protected]> wrote:
>
> On Tue, Jun 07, 2022 at 11:00:39AM +0530, Naresh Kamboju wrote:
> > On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <[email protected]> wrote:
> > >
> > > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > > I am bisecting this problem.
> > >
> > > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > >
> > > The initial investigation shows that,
> > >
> > > GOOD: next-20220603
> > > BAD: next-20220606
> > >
> > > Boot log:
> > > Starting kernel ...
> >
> > Linux next-20220606 and next-20220607 arm64 boot failed.
> > The kernel panic log shows up after earlycon.
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
>
> Naresh, can you, please, check if the following patch resolves the issue?
> (completely untested except for building)
I have tested this patch on top of next-20220606 and it boots successfully [1].
Tested-by: Linux Kernel Functional Testing <[email protected]>
> [...]
[1] https://lkft.validation.linaro.org/scheduler/job/5156201
--
Linaro LKFT
https://lkft.linaro.org