Hello,
Kernel: 5.1 (self-compiled, no modules)
Arch: x86_64
Distro: Debian Testing
Issue: I was dumping ext3 and ext4 filesystems and then restoring them to a
separate volume (testing). Afterwards I noticed that khugepaged was stuck at
100% CPU, and it is still stuck there. Is there anything I can do to debug
what is happening prior to a reboot to work around the issue? I have never
seen this behavior prior to 5.1.
$ cat /proc/cmdline
auto BOOT_IMAGE=5.1.0-2 ro root=901 3w-sas.use_msi=1 nohugeiomap
page_poison=1 pcie_aspm=off pti=on rootfstype=ext4 slub_debug=P
zram.enabled=1 zram.num_devices=12 zswap.enabled=1 zswap.compressor=lz4
zswap.zpool=z3fold
$ 5.1 .config attached: config.txt.gz
$ attachment: graphic.JPG -> graph of the processes, dark green ->
khugepaged
$ top
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   77 root      39  19       0      0      0 R 100.0  0.0  23:29.27 khugepaged
    1 root      20   0  171800   7832   4948 S   0.0  0.0   1:25.59 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
    3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0+
    8 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu+
Any thoughts on debugging this before a reboot clears it up?
Thanks,
Justin.
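[Editor's note: for a kernel thread spinning like this, one cheap first look before rebooting is a sysrq backtrace dump of the active CPUs. A sketch, assuming sysrq is available and the commands are run as root; output lands in the kernel log:]

```shell
# Dump backtraces of all active CPUs to the kernel log (needs root).
# The CPU that khugepaged is spinning on will show its current call chain.
echo 1 > /proc/sys/kernel/sysrq   # make sure sysrq is enabled
echo l > /proc/sysrq-trigger      # 'l' = backtrace of active CPUs
dmesg | tail -n 40
```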
On Thu, May 9, 2019 at 5:54 AM Justin Piszcz <[email protected]> wrote:
> [ ... ]
I will disable zcache/zswap for now, but FWIW, from perf top:
PerfTop: 1181 irqs/sec kernel:83.1% exact: 96.9% lost: 73/137
drop: 0/0 [4000Hz cycles], (all, 12 CPUs)
-------------------------------------------------------------------------------
54.56% [kernel] [k] compaction_alloc
29.46% [kernel] [k] __pageblock_pfn_to_page
10.66% [kernel] [k] nmi
0.61% [kernel] [k] _cond_resched
0.32% [kernel] [k] format_decode
0.25% [kernel] [k] __lock_text_start
0.20% [kernel] [k] __vma_adjust
0.16% perf [.] __symbols__insert
0.13% perf [.] rb_insert_color_cached
I tried dropping caches, etc.; I will disable huge pages for now.
Some other posts:
https://bugzilla.redhat.com/show_bug.cgi?id=879801
https://lkml.org/lkml/2012/6/27/565
https://lkml.org/lkml/2011/11/9/252
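[Editor's note: before disabling THP entirely, the compaction counters in /proc/vmstat are worth sampling; values that keep climbing while khugepaged spins would confirm it is stuck in the compaction path. A sketch:]

```shell
# Show compaction- and THP-related counters from /proc/vmstat; run this
# twice a few seconds apart and compare. Steadily climbing compact_*
# values while khugepaged spins point at a compaction loop.
grep -E '^(compact|thp)_' /proc/vmstat || true
n=$(grep -cE '^(compact|thp)_' /proc/vmstat || true)
echo "$n compaction/THP counters reported"
```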
On Thu, May 09, 2019 at 05:54:56AM -0400, Justin Piszcz wrote:
> [ ... ]
> Thoughts on debugging / before reboot to clear this up?
Could you check what khugepaged is doing?
cat /proc/$(pidof khugepaged)/stack
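[Editor's note: sampling the stack a few times in a row is more informative than a single read; a sketch that defaults to khugepaged's pid when present, else this shell, purely for illustration:]

```shell
# Sample a task's kernel stack a few times. A thread that is actually
# running (state R) typically shows only 0xffffffffffffffff, while a
# blocked one shows where it sleeps. Reading another task's stack needs
# root.
pid="$(pidof khugepaged 2>/dev/null || echo $$)"
samples=0
for i in 1 2 3; do
    echo "--- sample $i (pid $pid) ---"
    cat "/proc/$pid/stack" 2>/dev/null || echo "(stack unreadable without root)"
    samples=$((samples + 1))
    sleep 1
done
```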
> -----Original Message-----
> From: Kirill A. Shutemov [mailto:[email protected]]
> Sent: Thursday, May 09, 2019 7:14 AM
> To: Justin Piszcz
> Cc: [email protected]
> Subject: Re: 5.1 kernel: khugepaged stuck at 100%
>
> On Thu, May 09, 2019 at 05:54:56AM -0400, Justin Piszcz wrote:
[ ... ]
> > Thoughts on debugging / before reboot to clear this up?
>
> Could you check what khugepaged is doing?
>
> cat /proc/$(pidof khugepaged)/stack
Hello, the next time I see this (khugepaged at 100%) I will provide the
requested info. However, this time I'm getting a kernel read fault (below)
under 5.1.1; under 5.0.x and earlier I never saw any of these errors:
[54101.281515] BUG: unable to handle kernel paging request at ffffea0002060000
[54101.281519] #PF error: [normal kernel read fault]
[54101.281521] PGD 103ffee067 P4D 103ffee067 PUD 103ffed067 PMD 0
[54101.281524] Oops: 0000 [#1] SMP PTI
[54101.281527] CPU: 0 PID: 77 Comm: khugepaged Tainted: G T 5.1.1 #1
[54101.281528] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 01/16/2015
[54101.281534] RIP: 0010:isolate_freepages_block+0xb9/0x310
[54101.281535] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49 8d 45 79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef 00 00 00 <48> 8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01 0f 84
[54101.281537] RSP: 0018:ffffc900003a7860 EFLAGS: 00010246
[54101.281538] RAX: 0000000000000000 RBX: ffffea0002060000 RCX: ffffc900003a7b69
[54101.281539] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff828c4790
[54101.281540] RBP: 0000000000081800 R08: 0000000000000001 R09: 0000000000000000
[54101.281542] R10: 0000000000000202 R11: ffffffff828c43d0 R12: 0000000000000000
[54101.281543] R13: 0000000000000000 R14: ffffc900003a7af0 R15: 0000000000000000
[54101.281544] FS: 0000000000000000(0000) GS:ffff88903f800000(0000) knlGS:0000000000000000
[54101.281546] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54101.281547] CR2: ffffea0002060000 CR3: 000000000280e002 CR4: 00000000001606f0
[54101.281548] Call Trace:
[54101.281552] compaction_alloc+0x83d/0x8d0
[54101.281555] migrate_pages+0x30d/0x750
[54101.281557] ? isolate_migratepages_block+0xa10/0xa10
[54101.281559] ? __reset_isolation_suitable+0x110/0x110
[54101.281561] compact_zone+0x684/0xa70
[54101.281563] compact_zone_order+0x109/0x150
[54101.281566] ? schedule_timeout+0x1ba/0x290
[54101.281569] ? record_times+0x13/0xa0
[54101.281571] try_to_compact_pages+0x10d/0x220
[54101.281574] __alloc_pages_direct_compact+0x93/0x180
[54101.281576] __alloc_pages_nodemask+0x6c7/0xe20
[54101.281579] ? __wake_up_common_lock+0xb0/0xb0
[54101.281581] khugepaged+0x31f/0x19b0
[54101.281584] ? __wake_up_common_lock+0xb0/0xb0
[54101.281586] ? collapse_shmem.isra.4+0xc20/0xc20
[54101.281589] kthread+0x10a/0x120
[54101.281592] ? __kthread_create_on_node+0x1b0/0x1b0
[54101.281595] ret_from_fork+0x35/0x40
[54101.281597] CR2: ffffea0002060000
[54101.281599] ---[ end trace d9bfd1c296893a64 ]---
[54101.281601] RIP: 0010:isolate_freepages_block+0xb9/0x310
[54101.281602] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49 8d 45 79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef 00 00 00 <48> 8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01 0f 84
[54101.281604] RSP: 0018:ffffc900003a7860 EFLAGS: 00010246
[54101.281605] RAX: 0000000000000000 RBX: ffffea0002060000 RCX: ffffc900003a7b69
[54101.281606] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff828c4790
[54101.281607] RBP: 0000000000081800 R08: 0000000000000001 R09: 0000000000000000
[54101.281608] R10: 0000000000000202 R11: ffffffff828c43d0 R12: 0000000000000000
[54101.281609] R13: 0000000000000000 R14: ffffc900003a7af0 R15: 0000000000000000
[54101.281610] FS: 0000000000000000(0000) GS:ffff88903f800000(0000) knlGS:0000000000000000
[54101.281611] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54101.281613] CR2: ffffea0002060000 CR3: 000000000280e002 CR4: 00000000001606f0
On Tue, May 14, 2019 at 9:16 AM Kirill A. Shutemov <[email protected]> wrote:
> Could you check what khugepaged is doing?
>
> cat /proc/$(pidof khugepaged)/stack
It is doing it again, 10:12am - 2019-05-16
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
77 root 39 19 0 0 0 R 100.0 0.0 92:06.94 khugepaged
Kernel: 5.1.2
$ sudo cat /proc/$(pidof khugepaged)/stack
[<0>] 0xffffffffffffffff
$ perf top
PerfTop: 3716 irqs/sec kernel:92.9% exact: 99.1% lost: 68/68
drop: 0/0 [4000Hz cycles], (all, 12 CPUs)
-------------------------------------------------------------------------------
47.53% [kernel] [k] compaction_alloc
38.88% [kernel] [k] __pageblock_pfn_to_page
6.68% [kernel] [k] nmi
0.58% [kernel] [k] __list_del_entry_valid
0.48% [kernel] [k] format_decode
0.39% [kernel] [k] __rb_insert_augmented
0.25% libdbus-1.so.3.19.9 [.] _dbus_string_hex_decode
0.24% [kernel] [k] entry_SYSCALL_64_after_hwframe
0.20% perf [.] rb_next
0.19% perf [.] __symbols__insert
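[Editor's note: perf top shows where the time goes but not the call chains; a targeted record of khugepaged itself would give the full path into compaction_alloc. A sketch, assuming perf is installed and run as root; it prints the intended command when perf or the target thread is unavailable:]

```shell
# Record ~10 seconds of call-graph samples from khugepaged alone, then
# show the hottest paths. Falls back to printing the intended command
# when perf or the target thread is unavailable (e.g. in a container).
pid="$(pidof khugepaged 2>/dev/null || true)"
if command -v perf >/dev/null 2>&1 && [ -n "$pid" ]; then
    perf record -g -p "$pid" -- sleep 10 || echo "perf record failed (permissions?)"
    perf report --stdio 2>/dev/null | head -n 30 || true
    result=attempted
else
    echo "would run: perf record -g -p <khugepaged pid> -- sleep 10"
    result=skipped
fi
```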
On Thu, May 16, 2019 at 10:14 AM Justin Piszcz <[email protected]> wrote:
> [ ... ]
In case this is asked for later:
# find . -maxdepth 1 -type f | while read f; do echo "$f ---------------------"; cat "$f"; echo " "; done
./environ ---------------------
./auxv ---------------------
./status ---------------------
Name: khugepaged
Umask: 0000
State: R (running)
Tgid: 77
Ngid: 0
Pid: 77
PPid: 2
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups:
NStgid: 77
NSpid: 77
NSpgid: 0
NSsid: 0
Threads: 1
SigQ: 1/257125
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: ffffffffffffffff
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: fff
Cpus_allowed_list: 0-11
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 7728
nonvoluntary_ctxt_switches: 94662
./personality ---------------------
00000000
./limits ---------------------
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 257125 257125 processes
Max open files 1024 4096 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 257125 257125 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
./sched ---------------------
khugepaged (77, #threads: 1)
-------------------------------------------------------------------
se.exec_start : 84565632.684623
se.vruntime : 30287808.462012
se.sum_exec_runtime : 5808238.978150
se.nr_migrations : 946
nr_switches : 102390
nr_voluntary_switches : 7728
nr_involuntary_switches : 94662
se.load.weight : 15360
se.runnable_weight : 15360
se.avg.load_sum : 47454
se.avg.runnable_load_sum : 47454
se.avg.util_sum : 48835831
se.avg.load_avg : 14
se.avg.runnable_load_avg : 14
se.avg.util_avg : 1024
se.avg.last_update_time : 84565632684032
se.avg.util_est.ewma : 25
se.avg.util_est.enqueued : 16
policy : 0
prio : 139
clock-delta : 11
./comm ---------------------
khugepaged
./syscall ---------------------
running
./cmdline ---------------------
./stat ---------------------
77 (khugepaged) R 2 0 0 0 -1 27265088 0 0 0 0 0 580824 0 0 39 19 1 0 725 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 4 0 0 0 0 0 0 0 0 0 0 0 0 0
./statm ---------------------
0 0 0 0 0 0 0
./maps ---------------------
./mem ---------------------
./clear_refs ---------------------
cat: ./clear_refs: Invalid argument
./smaps ---------------------
./smaps_rollup ---------------------
cat: ./smaps_rollup: No such process
./pagemap ---------------------
./wchan ---------------------
0
./stack ---------------------
[<0>] 0xffffffffffffffff
./schedstat ---------------------
5808249976150 707215957352 102392
./latency ---------------------
Latency Top version : v0.1
./cpuset ---------------------
/
./cgroup ---------------------
11:perf_event:/
10:cpuset:/
9:devices:/
8:rdma:/
7:cpu,cpuacct:/
6:blkio:/
5:freezer:/
4:net_cls:/
3:memory:/
2:pids:/
1:name=systemd:/
0::/
./oom_score ---------------------
0
./oom_adj ---------------------
0
./oom_score_adj ---------------------
0
./coredump_filter ---------------------
./io ---------------------
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
./uid_map ---------------------
0 0 4294967295
./gid_map ---------------------
0 0 4294967295
./projid_map ---------------------
0 0 4294967295
./setgroups ---------------------
allow
./timerslack_ns ---------------------
50000
./stack_depth ---------------------
previous stack depth: 16376
stack depth: 16376
On Thu, May 16, 2019 at 10:17 AM Justin Piszcz <[email protected]> wrote:
> [ ... ]
As a workaround in the meantime, I will add the following to lilo/grub:
transparent_hugepage=never
Justin
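[Editor's note: the same workaround can be applied at runtime without a reboot, assuming the usual THP sysfs layout; both writes need root:]

```shell
# Runtime equivalent of transparent_hugepage=never (needs root, takes
# effect immediately, lasts until reboot). khugepaged stops scanning
# once THP is set to "never".
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
cat /sys/kernel/mm/transparent_hugepage/enabled
```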
On Thu, May 16, 2019 at 10:27:50AM -0400, Justin Piszcz wrote:
> [ ... ]
Cc'ing myself since I sometimes observe such behaviour right after a KVM VM
is launched. No luck reproducing it on demand so far, though.
--
Best regards,
Oleksandr Natalenko (post-factum)
Senior Software Maintenance Engineer
On 2019/05/16 16:14, Justin Piszcz <[email protected]> wrote:
> Kernel: 5.1.2
>
> [ ... ]
>
> 47.53% [kernel] [k] compaction_alloc
> 38.88% [kernel] [k] __pageblock_pfn_to_page
> [ ... ]
I have the same problem (kernel 5.1.7), but over here, it's a PHP
process, not khugepaged, which is looping inside compaction_alloc.
This is from "perf report":
100.00% 0.00% php-cgi7.0 php-cgi7.0 [.] 0x000055d0e88bc5ee
|
---0x55d0e88bc5ee
page_fault
__do_page_fault
handle_mm_fault
__handle_mm_fault
do_huge_pmd_anonymous_page
__alloc_pages_nodemask
__alloc_pages_slowpath
__alloc_pages_direct_compact
try_to_compact_pages
compact_zone_order
compact_zone
migrate_pages
compaction_alloc
|
|--24.43%--__pageblock_pfn_to_page
|
--2.22%--_cond_resched
|
--0.89%--rcu_all_qs
ftrace:
<...>-263514 [012] .... 109004.793009: rcu_all_qs <-_cond_resched
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793009: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793010: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: __pageblock_pfn_to_page <-compaction_alloc
<...>-263514 [012] .... 109004.793011: _cond_resched <-compaction_alloc
(Repeating this sequence over and over.)
Nothing useful in /proc/263514/{stack,wchan,syscall}.
What else can I do to collect more information to help fix this?
Max
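[Editor's note: a function trace like the one above can be captured from tracefs; a sketch, assuming tracefs is mounted at the usual /sys/kernel/debug/tracing and run as root:]

```shell
# Trace only the functions the loop bounces between, capture ~5 seconds,
# then show the head of the trace buffer (needs root; tracefs assumed at
# the standard location).
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo function > current_tracer
echo 'compaction_alloc __pageblock_pfn_to_page _cond_resched' > set_ftrace_filter
echo 1 > tracing_on
sleep 5
echo 0 > tracing_on
head -n 60 trace
echo > set_ftrace_filter   # clear the filter afterwards
```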
On 2019/06/06 19:24, Max Kellermann <[email protected]> wrote:
> I have the same problem (kernel 5.1.7), but over here, it's a PHP
> process, not khugepaged, which is looping inside compaction_alloc.
This is what happened an hour later:
kernel tried to execute NX-protected page - exploit attempt? (uid: 33333)
BUG: unable to handle kernel paging request at ffffffffc036f00f
#PF error: [PROT] [INSTR]
PGD 35fa10067 P4D 35fa10067 PUD 35fa12067 PMD 105ba71067 PTE 800000022d28e061
Oops: 0011 [#1] SMP PTI
CPU: 12 PID: 263514 Comm: php-cgi7.0 Not tainted 5.1.7-cmag1-th+ #5
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
RIP: 0010:0xffffffffc036f00f
Code: Bad RIP value.
RSP: 0018:ffffb63c4d547928 EFLAGS: 00010216
RAX: 0000000000000000 RBX: ffffb63c4d547b10 RCX: 0000ffc004d021bd
RDX: ffff9ac83fffc500 RSI: 7fe0026810dee7ff RDI: 7fe0026810dee400
RBP: 7fe0026810dee400 R08: 0000000000000002 R09: 0000000000020300
R10: 00010642641a0d3a R11: 0000000000000001 R12: 7fe0026810dee800
R13: 0000000000000001 R14: 0000000000000000 R15: ffff9ac83fffc500
FS: 00007fa5c1000740(0000) GS:ffff9ad01f600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc036efe5 CR3: 00000008eb8a0005 CR4: 00000000001606e0
Call Trace:
? move_freelist_tail+0xd0/0xd0
? migrate_pages+0xaa/0x780
? isolate_freepages_block+0x380/0x380
? compact_zone+0x6ec/0xca0
? compact_zone_order+0xd8/0x120
? try_to_compact_pages+0xb1/0x260
? __alloc_pages_direct_compact+0x87/0x160
? __alloc_pages_slowpath+0x427/0xd50
? __alloc_pages_nodemask+0x2d6/0x310
? do_huge_pmd_anonymous_page+0x131/0x680
? vma_merge+0x24f/0x3a0
? __handle_mm_fault+0xbca/0x1260
? handle_mm_fault+0x135/0x1b0
? __do_page_fault+0x242/0x4b0
? page_fault+0x8/0x30
? page_fault+0x1e/0x30
Modules linked in:
CR2: ffffffffc036f00f
---[ end trace 0f31edf3041f5d9e ]---
RIP: 0010:0xffffffffc036f00f
Code: Bad RIP value.
RSP: 0018:ffffb63c4d547928 EFLAGS: 00010216
RAX: 0000000000000000 RBX: ffffb63c4d547b10 RCX: 0000ffc004d021bd
RDX: ffff9ac83fffc500 RSI: 7fe0026810dee7ff RDI: 7fe0026810dee400
RBP: 7fe0026810dee400 R08: 0000000000000002 R09: 0000000000020300
R10: 00010642641a0d3a R11: 0000000000000001 R12: 7fe0026810dee800
R13: 0000000000000001 R14: 0000000000000000 R15: ffff9ac83fffc500
FS: 00007fa5c1000740(0000) GS:ffff9ad01f600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc036efe5 CR3: 00000008eb8a0005 CR4: 00000000001606e0
Hi.
On Fri, Jun 07, 2019 at 09:40:52AM +0200, Max Kellermann wrote:
> On 2019/06/06 19:24, Max Kellermann <[email protected]> wrote:
> > I have the same problem (kernel 5.1.7), but over here, it's a PHP
> > process, not khugepaged, which is looping inside compaction_alloc.
>
> This is what happened an hour later:
>
> kernel tried to execute NX-protected page - exploit attempt? (uid: 33333)
> BUG: unable to handle kernel paging request at ffffffffc036f00f
> [ ... ]
Make sure to check if e577c8b64d ("mm, compaction: make sure we isolate
a valid PFN") fixes your issue. It is staged for 5.1.8, BTW.
--
Best regards,
Oleksandr Natalenko (post-factum)
Senior Software Maintenance Engineer
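[Editor's note: assuming the fix landed in 5.1.8 as stated above, its presence in a linux-stable checkout can be confirmed by grepping the tag range for the quoted commit subject; a sketch:]

```shell
# Check whether a linux-stable checkout in the current directory already
# contains the compaction fix between v5.1.7 and v5.1.8; non-empty log
# output means the series carries it. Skips cleanly when no suitable
# checkout is available.
if git rev-parse v5.1.8 >/dev/null 2>&1; then
    git log --oneline v5.1.7..v5.1.8 --grep 'isolate a valid PFN'
    checked=yes
else
    echo "no v5.1.8 tag here; skipping"
    checked=no
fi
```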
On 2019/06/07 09:40, Max Kellermann <[email protected]> wrote:
> On 2019/06/06 19:24, Max Kellermann <[email protected]> wrote:
> > I have the same problem (kernel 5.1.7), but over here, it's a PHP
> > process, not khugepaged, which is looping inside compaction_alloc.
>
> This is what happened an hour later:
>
> kernel tried to execute NX-protected page - exploit attempt? (uid: 33333)
> BUG: unable to handle kernel paging request at ffffffffc036f00f
Another machine in the cluster has just crashed (same kernel, and
hugepage/compaction again):
BUG: unable to handle kernel paging request at ffffd43382060000
#PF error: [normal kernel read fault]
PGD 85fff8067 P4D 85fff8067 PUD 85fff7067 PMD 0
Oops: 0000 [#2] SMP PTI
CPU: 27 PID: 117908 Comm: php-cgi7.0 Tainted: G D 5.1.7-cmag1-th+ #5
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
RIP: 0010:isolate_freepages_block+0xa3/0x380
Code: 10 4c 89 c0 c7 44 24 0c 00 00 00 00 4d 89 de 48 c1 e0 06 f6 c3 1f 48 89 44 24 18 44 89 c8 49 89 d1 41 89 c4 0f 84 03 01 00 00 <49> 8b 07 83 c5 01 a9 00 00 01 00 75 0c 49 8b 47 08 a8 01 0f 84 b8
RSP: 0018:ffffad0d258bf898 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000081800 RCX: ffffad0d258bfb89
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9c08ffffc8e0
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000081bff
R10: 0000000000000000 R11: ffffad0d258bfb89 R12: 0000000000000000
R13: ffffad0d258bfb10 R14: ffffad0d258bfb89 R15: ffffd43382060000
FS: 00007f92c1c84740(0000) GS:ffff9c08df9c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd43382060000 CR3: 0000000966150002 CR4: 00000000001606e0
Call Trace:
compaction_alloc+0x7e6/0x870
? move_freelist_tail+0xd0/0xd0
migrate_pages+0xaa/0x780
? isolate_freepages_block+0x380/0x380
compact_zone+0x6ec/0xca0
compact_zone_order+0xd8/0x120
try_to_compact_pages+0xb1/0x260
__alloc_pages_direct_compact+0x87/0x160
__alloc_pages_slowpath+0x427/0xd50
__alloc_pages_nodemask+0x2d6/0x310
do_huge_pmd_anonymous_page+0x131/0x680
? vma_merge+0x24f/0x3a0
__handle_mm_fault+0xbca/0x1260
handle_mm_fault+0x135/0x1b0
__do_page_fault+0x242/0x4b0
? page_fault+0x8/0x30
page_fault+0x1e/0x30
RIP: 0033:0x55ab2b7205ee
Code: 24 18 01 00 00 41 8b 8c 24 48 01 00 00 8d 51 01 41 3b 94 24 4c 01 00 00 41 89 94 24 48 01 00 00 7e 08 41 89 94 24 4c 01 00 00 <4c> 89 20 49 8b 94 24 38 01 00 00 45 31 db 41 bd 00 10 00 00 41 b9
RSP: 002b:00007ffd79f76750 EFLAGS: 00010202
RAX: 00007f929b800000 RBX: 0000000000000011 RCX: 0000000000000007
RDX: 0000000000000008 RSI: 0000000000200000 RDI: 00007f929b800000
RBP: 0000000000000006 R08: ffffffffffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 00007f92bd000040
R13: 00000000ffffffff R14: 0000000000000200 R15: 00007f929c7ac648
Modules linked in:
CR2: ffffd43382060000
---[ end trace 73412f77c0bd1f72 ]---
RIP: 0010:isolate_freepages_block+0xa3/0x380
Code: 10 4c 89 c0 c7 44 24 0c 00 00 00 00 4d 89 de 48 c1 e0 06 f6 c3 1f 48 89 44 24 18 44 89 c8 49 89 d1 41 89 c4 0f 84 03 01 00 00 <49> 8b 07 83 c5 01 a9 00 00 01 00 75 0c 49 8b 47 08 a8 01 0f 84 b8
RSP: 0018:ffffad0d0894f898 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000082000 RCX: ffffad0d0894fb89
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9c08ffffc8e0
RBP: 0000000000000000 R08: 0000000000000001 R09: 00000000000823ff
R10: 0000000000000000 R11: ffffad0d0894fb89 R12: 0000000000000000
R13: ffffad0d0894fb10 R14: ffffad0d0894fb89 R15: ffffd43382080000
FS: 00007f92c1c84740(0000) GS:ffff9c08df9c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd43382060000 CR3: 0000000966150002 CR4: 00000000001606e0