2024-05-08 18:21:40

by Erhard Furtner

Subject: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

Greetings!

Got that on my dual CPU PowerMac G4 DP shortly after boot. This does not happen every time at bootup though:

[...]
kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 1 PID: 40 Comm: kswapd0 Not tainted 6.8.9-gentoo-PMacG4 #1
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[effe5cc0] [c0784b64] dump_stack_lvl+0x80/0xac (unreliable)
[effe5ce0] [c01aef80] warn_alloc+0x100/0x178
[effe5d40] [c01af34c] __alloc_pages+0x354/0x8ac
[effe5e00] [c01af988] page_frag_alloc_align+0x68/0x17c
[effe5e30] [c063b7b4] __netdev_alloc_skb+0xb4/0x17c
[effe5e60] [be99573c] setup_rx_descbuffer+0x40/0x144 [b43legacy]
[effe5e90] [be99687c] b43legacy_dma_rx+0x20c/0x278 [b43legacy]
[effe5ee0] [be9897f4] b43legacy_interrupt_tasklet+0x580/0x5a4 [b43legacy]
[effe5f50] [c0050b34] tasklet_action_common.isra.0+0xb0/0xe8
[effe5f80] [c07aaf5c] __do_softirq+0x1dc/0x218
[effe5ff0] [c0008820] do_softirq_own_stack+0x34/0x40
[c1c555f0] [c07ff5f0] 0xc07ff5f0
[c1c55610] [c0050384] __irq_exit_rcu+0x6c/0xbc
[c1c55620] [c00507f8] irq_exit+0x10/0x20
[c1c55630] [c00045b4] HardwareInterrupt_virt+0x108/0x10c
--- interrupt: 500 at BIT_flushBits+0x1c/0x58
NIP: c0458840 LR: c0458fdc CTR: 00000000
REGS: c1c55640 TRAP: 0500 Not tainted (6.8.9-gentoo-PMacG4)
MSR: 0220b032 <VEC,EE,FP,ME,IR,DR,RI> CR: 88082202 XER: 00000000

GPR00: c0458f98 c1c55700 c1c2b9a0 c1c5575c 00000000 00000000 00000000 00000000
GPR08: 00000000 00000000 f2c1d0d5 c1c55710 88084202 00000000 f2c1e000 f35f8150
GPR16: 00000000 f34bf238 f34bf30c 00000000 0000001b 00000002 00000000 00000000
GPR24: f35f8150 00000080 f35f7950 f35f7d50 00000000 c07ff5f0 00000052 f35f5be0
NIP [c0458840] BIT_flushBits+0x1c/0x58
LR [c0458fdc] ZSTD_encodeSequences+0x2ac/0x308
--- interrupt: 500
[c1c55700] [c07ff5f0] 0xc07ff5f0 (unreliable)
[c1c55710] [c0458f98] ZSTD_encodeSequences+0x268/0x308
[c1c557b0] [c0452eb0] ZSTD_entropyCompressSeqStore.constprop.0+0x1c4/0x2bc
[c1c55840] [c045357c] ZSTD_compressBlock_internal+0xac/0x144
[c1c55870] [c04543c8] ZSTD_compressContinue_internal+0x734/0x7c0
[c1c55920] [c04559d4] ZSTD_compressEnd+0x2c/0x13c
[c1c55950] [c04574b0] ZSTD_compressStream2+0x1b8/0x508
[c1c559a0] [c045788c] ZSTD_compressStream2_simpleArgs+0x48/0x60
[c1c559e0] [c0457904] ZSTD_compress2+0x60/0x90
[c1c55a10] [c03d0abc] __zstd_compress+0x54/0x78
[c1c55a70] [c03c3794] scomp_acomp_comp_decomp+0xe8/0x16c
[c1c55aa0] [c01c4c80] zswap_store+0x4b4/0x668
[c1c55b20] [c01bc404] swap_writepage+0x3c/0xa8
[c1c55b40] [c0172dd4] pageout+0x11c/0x1ec
[c1c55bc0] [c0174e2c] shrink_folio_list+0x6b0/0x878
[c1c55c40] [c01767e8] evict_folios+0x9d4/0xd08
[c1c55d40] [c0176cec] try_to_shrink_lruvec+0x1d0/0x210
[c1c55da0] [c0176dcc] shrink_one+0xa0/0x134
[c1c55dd0] [c01786a4] shrink_node+0x248/0x844
[c1c55e50] [c0178f68] balance_pgdat+0x2c8/0x614
[c1c55f50] [c01794cc] kswapd+0x218/0x24c
[c1c55fc0] [c006aae4] kthread+0xe4/0xe8
[c1c55ff0] [c0015304] start_kernel_thread+0x10/0x14
Mem-Info:
active_anon:247864 inactive_anon:215615 isolated_anon:0
active_file:9633 inactive_file:12323 isolated_file:0
unevictable:4 dirty:0 writeback:2
slab_reclaimable:1293 slab_unreclaimable:6904
mapped:16876 shmem:155 pagetables:899
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:2224 free_pcp:596 free_cma:0
Node 0 active_anon:991456kB inactive_anon:862460kB active_file:38532kB inactive_file:49292kB unevictable:16kB isolated(anon):0kB isolated(file):0kB mapped:67504kB dirty:0kB writeback:8kB shmem:620kB writeback_tmp:0kB kernel_stack:1536kB pagetables:3596kB sec_pagetables:0kB all_unreclaimable? no
DMA free:1680kB boost:4096kB min:7548kB low:8408kB high:9268kB reserved_highatomic:0KB active_anon:675928kB inactive_anon:5644kB active_file:124kB inactive_file:596kB unevictable:0kB writepending:0kB present:786432kB managed:746656kB mlocked:0kB bounce:0kB free_pcp:2384kB local_pcp:1004kB free_cma:0kB
lowmem_reserve[]: 0 0 1280 1280
DMA: 2*4kB (UM) 1*8kB (U) 1*16kB (U) 18*32kB (UME) 7*64kB (UM) 0*128kB 1*256kB (M) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 1824kB
35907 total pagecache pages
13813 pages in swap cache
Free swap = 8270180kB
[...]

There was no crash though; the G4 was still usable via VNC.

Full dmesg and kernel .config attached.

Regards,
Erhard


Attachments:
config_689_g4 (113.83 kB)
dmesg_689_g4 (47.02 kB)

2024-05-15 20:45:57

by Erhard Furtner

Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Wed, 8 May 2024 20:21:11 +0200
Erhard Furtner <[email protected]> wrote:

> Greetings!
>
> Got that on my dual CPU PowerMac G4 DP shortly after boot. This does not happen every time at bootup though:
>
> [...]
> kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 1 PID: 40 Comm: kswapd0 Not tainted 6.8.9-gentoo-PMacG4 #1
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac

Very similar page allocation failure on the same machine on kernel 6.9.0 too. Seems it can easily be provoked by running a memory stressor, e.g. "stress-ng --vm 2 --vm-bytes 1930M --verify -v":

[...]
kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 41 Comm: kswapd0 Not tainted 6.9.0-gentoo-PMacG4 #1
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[c1c65940] [c07926d4] dump_stack_lvl+0x80/0xac (unreliable)
[c1c65960] [c01b6234] warn_alloc+0x100/0x178
[c1c659c0] [c01b661c] __alloc_pages+0x370/0x8d0
[c1c65a80] [c01c4854] __read_swap_cache_async+0xc0/0x1cc
[c1c65ad0] [c01cb580] zswap_writeback_entry+0x50/0x154
[c1c65be0] [c01cb6f4] shrink_memcg_cb+0x70/0xec
[c1c65c10] [c019518c] __list_lru_walk_one+0xa0/0x154
[c1c65c70] [c01952a4] list_lru_walk_one+0x64/0x7c
[c1c65ca0] [c01cad00] zswap_shrinker_scan+0xac/0xc4
[c1c65cd0] [c018052c] do_shrink_slab+0x18c/0x304
[c1c65d20] [c0180a40] shrink_slab+0x174/0x260
[c1c65da0] [c017cb0c] shrink_one+0xbc/0x134
[c1c65dd0] [c017e3e4] shrink_node+0x238/0x84c
[c1c65e50] [c017ed38] balance_pgdat+0x340/0x650
[c1c65f50] [c017f270] kswapd+0x228/0x25c
[c1c65fc0] [c006bbac] kthread+0xe4/0xe8
[c1c65ff0] [c0015304] start_kernel_thread+0x10/0x14
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 15, objs: 225, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 15, objs: 225, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 15, objs: 225, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 15, objs: 225, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 15, objs: 225, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
node 0: slabs: 33, objs: 165, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 15, objs: 225, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
node 0: slabs: 33, objs: 165, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 15, objs: 225, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
node 0: slabs: 33, objs: 165, free: 0
Mem-Info:
active_anon:340071 inactive_anon:139179 isolated_anon:0
active_file:8297 inactive_file:2506 isolated_file:0
unevictable:4 dirty:1 writeback:18
slab_reclaimable:1377 slab_unreclaimable:7426
mapped:6804 shmem:112 pagetables:946
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:1141 free_pcp:7 free_cma:0
Node 0 active_anon:1360284kB inactive_anon:556716kB active_file:33188kB inactive_file:10024kB unevictable:16kB isolated(anon):0kB isolated(file):0kB mapped:27216kB dirty:4kB writeback:72kB shmem:448kB writeback_tmp:0kB kernel_stack:1560kB pagetables:3784kB sec_pagetables:0kB all_unreclaimable? no
DMA free:56kB boost:7756kB min:11208kB low:12068kB high:12928kB reserved_highatomic:0KB active_anon:635128kB inactive_anon:58260kB active_file:268kB inactive_file:3000kB unevictable:0kB writepending:324kB present:786432kB managed:746644kB mlocked:0kB bounce:0kB free_pcp:28kB local_pcp:28kB free_cma:0kB
lowmem_reserve[]: 0 0 1280 1280
DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
63943 total pagecache pages
53024 pages in swap cache
Free swap = 8057248kB
Total swap = 8388604kB
524288 pages RAM
327680 pages HighMem/MovableOnly
9947 pages reserved
warn_alloc: 396 callbacks suppressed
kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 1 PID: 41 Comm: kswapd0 Not tainted 6.9.0-gentoo-PMacG4 #1
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[c1c65940] [c07926d4] dump_stack_lvl+0x80/0xac (unreliable)
[c1c65960] [c01b6234] warn_alloc+0x100/0x178
[c1c659c0] [c01b661c] __alloc_pages+0x370/0x8d0
[c1c65a80] [c01c4854] __read_swap_cache_async+0xc0/0x1cc
[c1c65ad0] [c01cb580] zswap_writeback_entry+0x50/0x154
[c1c65be0] [c01cb6f4] shrink_memcg_cb+0x70/0xec
[c1c65c10] [c019518c] __list_lru_walk_one+0xa0/0x154
[c1c65c70] [c01952a4] list_lru_walk_one+0x64/0x7c
[c1c65ca0] [c01cad00] zswap_shrinker_scan+0xac/0xc4
[c1c65cd0] [c018052c] do_shrink_slab+0x18c/0x304
[c1c65d20] [c0180a40] shrink_slab+0x174/0x260
[c1c65da0] [c017cb0c] shrink_one+0xbc/0x134
[c1c65dd0] [c017e3e4] shrink_node+0x238/0x84c
[c1c65e50] [c017ed38] balance_pgdat+0x340/0x650
[c1c65f50] [c017f270] kswapd+0x228/0x25c
slab_out_of_memory: 53 callbacks suppressed
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
node 0: slabs: 33, objs: 165, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
node 0: slabs: 33, objs: 165, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
node 0: slabs: 33, objs: 165, free: 0
[c1c65fc0] [c006bbac] kthread+0xe4/0xe8
[c1c65ff0] [c0015304] start_kernel_thread+0x10/0x14
Mem-Info:
active_anon:351976 inactive_anon:123514 isolated_anon:0
active_file:4648 inactive_file:2081 isolated_file:0
unevictable:4 dirty:1 writeback:39
slab_reclaimable:918 slab_unreclaimable:7222
mapped:5359 shmem:21 pagetables:940
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:2563 free_pcp:142 free_cma:0
Node 0 active_anon:1407904kB inactive_anon:494056kB active_file:18592kB inactive_file:8324kB unevictable:16kB isolated(anon):0kB isolated(file):0kB mapped:21436kB dirty:4kB writeback:156kB shmem:84kB writeback_tmp:0kB kernel_stack:1552kB pagetables:3760kB sec_pagetables:0kB all_unreclaimable? no
DMA free:0kB boost:7756kB min:11208kB low:12068kB high:12928kB reserved_highatomic:0KB active_anon:199336kB inactive_anon:491432kB active_file:4612kB inactive_file:5980kB unevictable:0kB writepending:660kB present:786432kB managed:746644kB mlocked:0kB bounce:0kB free_pcp:568kB local_pcp:20kB free_cma:0kB
lowmem_reserve[]: 0 0 1280 1280
DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
45961 total pagecache pages
39207 pages in swap cache
Free swap = 8093096kB
Total swap = 8388604kB
524288 pages RAM
327680 pages HighMem/MovableOnly
9947 pages reserved
warn_alloc: 343 callbacks suppressed
kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 41 Comm: kswapd0 Not tainted 6.9.0-gentoo-PMacG4 #1
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[c1c65940] [c07926d4] dump_stack_lvl+0x80/0xac (unreliable)
[c1c65960] [c01b6234] warn_alloc+0x100/0x178
[c1c659c0] [c01b661c] __alloc_pages+0x370/0x8d0
[c1c65a80] [c01c4854] __read_swap_cache_async+0xc0/0x1cc
[c1c65ad0] [c01cb580] zswap_writeback_entry+0x50/0x154
[c1c65be0] [c01cb6f4] shrink_memcg_cb+0x70/0xec
[c1c65c10] [c019518c] __list_lru_walk_one+0xa0/0x154
slab_out_of_memory: 59 callbacks suppressed
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
node 0: slabs: 33, objs: 165, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
node 0: slabs: 33, objs: 165, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
node 0: slabs: 18, objs: 270, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
node 0: slabs: 33, objs: 165, free: 0
[c1c65c70] [c01952a4] list_lru_walk_one+0x64/0x7c
[c1c65ca0] [c01cad00] zswap_shrinker_scan+0xac/0xc4
[c1c65cd0] [c018052c] do_shrink_slab+0x18c/0x304
[c1c65d20] [c0180a40] shrink_slab+0x174/0x260
[c1c65da0] [c017cb0c] shrink_one+0xbc/0x134
[c1c65dd0] [c017e3e4] shrink_node+0x238/0x84c
[c1c65e50] [c017ed38] balance_pgdat+0x340/0x650
[c1c65f50] [c017f270] kswapd+0x228/0x25c
[c1c65fc0] [c006bbac] kthread+0xe4/0xe8
[c1c65ff0] [c0015304] start_kernel_thread+0x10/0x14
Mem-Info:
active_anon:235002 inactive_anon:240975 isolated_anon:0
active_file:4356 inactive_file:2551 isolated_file:0
unevictable:4 dirty:7 writeback:19
slab_reclaimable:1008 slab_unreclaimable:7218
mapped:5601 shmem:21 pagetables:939
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:1340 free_pcp:23 free_cma:0
Node 0 active_anon:940008kB inactive_anon:963900kB active_file:17424kB inactive_file:10204kB unevictable:16kB isolated(anon):0kB isolated(file):0kB mapped:22404kB dirty:28kB writeback:76kB shmem:84kB writeback_tmp:0kB kernel_stack:1552kB pagetables:3756kB sec_pagetables:0kB all_unreclaimable? no
DMA free:0kB boost:7756kB min:11208kB low:12068kB high:12928kB reserved_highatomic:0KB active_anon:644060kB inactive_anon:36332kB active_file:5276kB inactive_file:5516kB unevictable:0kB writepending:348kB present:786432kB managed:746644kB mlocked:0kB bounce:0kB free_pcp:92kB local_pcp:92kB free_cma:0kB
lowmem_reserve[]: 0 0 1280 1280
DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
116345 total pagecache pages
109413 pages in swap cache
Free swap = 7819300kB
Total swap = 8388604kB
524288 pages RAM
327680 pages HighMem/MovableOnly
9947 pages reserved


I switched from zstd to lzo as the zswap default compressor, so zstd does not show up in the dmesg. But the rest looks pretty similar.
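
(For reference: the zswap compressor can be chosen at build time via CONFIG_ZSWAP_COMPRESSOR_DEFAULT_* in the .config, on the kernel command line with zswap.compressor=, or at runtime by writing to /sys/module/zswap/parameters/compressor.)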

Full dmesg and kernel .config attached.

Regards,
Erhard


Attachments:
config_690_g4 (114.04 kB)
dmesg_690_g4 (57.08 kB)

2024-05-15 22:07:40

by Yu Zhao

Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Wed, May 15, 2024 at 2:45 PM Erhard Furtner <[email protected]> wrote:
>
> On Wed, 8 May 2024 20:21:11 +0200
> Erhard Furtner <[email protected]> wrote:
>
> > Greetings!
> >
> > Got that on my dual CPU PowerMac G4 DP shortly after boot. This does not happen every time at bootup though:
> >
> > [...]
> > kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
> > CPU: 1 PID: 40 Comm: kswapd0 Not tainted 6.8.9-gentoo-PMacG4 #1
> > Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
>
> Very similar page allocation failure on the same machine on kernel 6.9.0 too. Seems it can easily be provoked by running a memory stressor, e.g. "stress-ng --vm 2 --vm-bytes 1930M --verify -v":
>
> [...]
> kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 0 PID: 41 Comm: kswapd0 Not tainted 6.9.0-gentoo-PMacG4 #1
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> [...]
>
>
> I switched from zstd to lzo as the zswap default compressor, so zstd does not show up in the dmesg. But the rest looks pretty similar.
>
> Full dmesg and kernel .config attached.
>
> Regards,
> Erhard

Hi Erhard,

Thanks for the reports. I'll take a look at them and get back to you
in a few days.

2024-06-01 06:02:41

by Yu Zhao

Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Wed, May 15, 2024 at 4:06 PM Yu Zhao <[email protected]> wrote:
>
> On Wed, May 15, 2024 at 2:45 PM Erhard Furtner <[email protected]> wrote:
> >
> > On Wed, 8 May 2024 20:21:11 +0200
> > Erhard Furtner <[email protected]> wrote:
> >
> > > Greetings!
> > >
> > > Got that on my dual CPU PowerMac G4 DP shortly after boot. This does not happen every time at bootup though:
> > >
> > > [...]
> > > kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
> > > CPU: 1 PID: 40 Comm: kswapd0 Not tainted 6.8.9-gentoo-PMacG4 #1
> > > Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> >
> > Very similar page allocation failure on the same machine on kernel 6.9.0 too. Seems it can easily be provoked by running a memory stressor, e.g. "stress-ng --vm 2 --vm-bytes 1930M --verify -v":
> >
> > [...]
> > kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
> > CPU: 0 PID: 41 Comm: kswapd0 Not tainted 6.9.0-gentoo-PMacG4 #1
> > Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> > [...]
> >
> >
> > I switched from zstd to lzo as the zswap default compressor, so zstd does not show up in the dmesg. But the rest looks pretty similar.
> >
> > Full dmesg and kernel .config attached.
> >
> > Regards,
> > Erhard
>
> Hi Erhard,
>
> Thanks for the reports. I'll take a look at them and get back to you
> in a few days.

Hi Erhard,

The OOM kills on both kernel versions seem to be reasonable to me.

Your system has 2GB memory and it uses zswap with zsmalloc (which is
good since it can allocate from the highmem zone) and zstd/lzo (which
doesn't matter much). Somehow -- I couldn't figure out why -- it
splits the 2GB into a 0.75GB DMA zone and a 1.25GB highmem zone:

[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
[ 0.000000] Normal empty
[ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
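
(Quick arithmetic on those ranges, not from the boot log itself: the DMA zone 0x00000000-0x2fffffff is 0x30000000 bytes = 768 MiB ≈ 0.75 GiB, and the HighMem zone 0x30000000-0x7fffffff is 0x50000000 bytes = 1280 MiB = 1.25 GiB, which together account for the 524288 pages = 2 GiB of RAM shown in the Mem-Info dumps above.)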

The kernel can't allocate from the highmem zone -- only userspace and
zsmalloc can. OOM kills were due to the low memory conditions in the
DMA zone where the kernel itself failed to allocate from.
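
To make the distinction concrete, here is a minimal illustrative sketch (not code from this thread) of the two kinds of allocations on a 32-bit kernel: a plain GFP_KERNEL request must be satisfied from lowmem (the DMA zone here, since Normal is empty), while only requests carrying __GFP_HIGHMEM, e.g. GFP_HIGHUSER_MOVABLE as used for user pages and by zsmalloc, may come from HighMem and need a temporary kmap before the kernel touches them:

#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/string.h>

/* Illustrative only: shows why kernel-internal allocations are confined
 * to the ~0.75 GiB of lowmem while user/zsmalloc pages can use HighMem. */
static void highmem_demo(void)
{
	struct page *k = alloc_page(GFP_KERNEL);           /* lowmem (DMA zone) only */
	struct page *u = alloc_page(GFP_HIGHUSER_MOVABLE); /* may come from HighMem  */

	if (u) {
		void *va = kmap_local_page(u);  /* temporary mapping for a HighMem page */
		memset(va, 0, PAGE_SIZE);
		kunmap_local(va);
		__free_page(u);
	}
	if (k)
		__free_page(k);
}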

Do you know a kernel version that doesn't have OOM kills while running
the same workload? If so, could you send that .config to me? If not,
could you try disabling CONFIG_HIGHMEM? (It might not help but I'm out
of ideas at the moment.)

Thanks!

2024-06-01 15:38:23

by David Hildenbrand

Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On 01.06.24 08:01, Yu Zhao wrote:
> On Wed, May 15, 2024 at 4:06 PM Yu Zhao <[email protected]> wrote:
>>
>> On Wed, May 15, 2024 at 2:45 PM Erhard Furtner <[email protected]> wrote:
>>>
>>> On Wed, 8 May 2024 20:21:11 +0200
>>> Erhard Furtner <[email protected]> wrote:
>>>
>>>> Greetings!
>>>>
>>>> Got that on my dual CPU PowerMac G4 DP shortly after boot. This does not happen every time at bootup though:
>>>>
>>>> [...]
>>>> kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
>>>> CPU: 1 PID: 40 Comm: kswapd0 Not tainted 6.8.9-gentoo-PMacG4 #1
>>>> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
>>>
>>> Very similar page allocation failure on the same machine on kernel 6.9.0 too. Seems it can easily be provoked by running a memory stressor, e.g. "stress-ng --vm 2 --vm-bytes 1930M --verify -v":
>>>
>>> [...]
>>> kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
>>> CPU: 0 PID: 41 Comm: kswapd0 Not tainted 6.9.0-gentoo-PMacG4 #1
>>> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
>>> [...]
>>> node 0: slabs: 33, objs: 165, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
>>> node 0: slabs: 18, objs: 270, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
>>> kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
>>> node 0: slabs: 33, objs: 165, free: 0
>>> [c1c65fc0] [c006bbac] kthread+0xe4/0xe8
>>> [c1c65ff0] [c0015304] start_kernel_thread+0x10/0x14
>>> Mem-Info:
>>> active_anon:351976 inactive_anon:123514 isolated_anon:0
>>> active_file:4648 inactive_file:2081 isolated_file:0
>>> unevictable:4 dirty:1 writeback:39
>>> slab_reclaimable:918 slab_unreclaimable:7222
>>> mapped:5359 shmem:21 pagetables:940
>>> sec_pagetables:0 bounce:0
>>> kernel_misc_reclaimable:0
>>> free:2563 free_pcp:142 free_cma:0
>>> Node 0 active_anon:1407904kB inactive_anon:494056kB active_file:18592kB inactive_file:8324kB unevictable:16kB isolated(anon):0kB isolated(file):0kB mapped:21436kB dirty:4kB writeback:156kB shmem:84kB writeback_tmp:0kB kernel_stack:1552kB pagetables:3760kB sec_pagetables:0kB all_unreclaimable? no
>>> DMA free:0kB boost:7756kB min:11208kB low:12068kB high:12928kB reserved_highatomic:0KB active_anon:199336kB inactive_anon:491432kB active_file:4612kB inactive_file:5980kB unevictable:0kB writepending:660kB present:786432kB managed:746644kB mlocked:0kB bounce:0kB free_pcp:568kB local_pcp:20kB free_cma:0kB
>>> lowmem_reserve[]: 0 0 1280 1280
>>> DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> 45961 total pagecache pages
>>> 39207 pages in swap cache
>>> Free swap = 8093096kB
>>> Total swap = 8388604kB
>>> 524288 pages RAM
>>> 327680 pages HighMem/MovableOnly
>>> 9947 pages reserved
>>> warn_alloc: 343 callbacks suppressed
>>> kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
>>> CPU: 0 PID: 41 Comm: kswapd0 Not tainted 6.9.0-gentoo-PMacG4 #1
>>> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
>>> Call Trace:
>>> [c1c65940] [c07926d4] dump_stack_lvl+0x80/0xac (unreliable)
>>> [c1c65960] [c01b6234] warn_alloc+0x100/0x178
>>> [c1c659c0] [c01b661c] __alloc_pages+0x370/0x8d0
>>> [c1c65a80] [c01c4854] __read_swap_cache_async+0xc0/0x1cc
>>> [c1c65ad0] [c01cb580] zswap_writeback_entry+0x50/0x154
>>> [c1c65be0] [c01cb6f4] shrink_memcg_cb+0x70/0xec
>>> [c1c65c10] [c019518c] __list_lru_walk_one+0xa0/0x154
>>> slab_out_of_memory: 59 callbacks suppressed
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
>>> node 0: slabs: 18, objs: 270, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
>>> node 0: slabs: 18, objs: 270, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
>>> node 0: slabs: 18, objs: 270, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
>>> node 0: slabs: 18, objs: 270, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
>>> node 0: slabs: 18, objs: 270, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
>>> kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
>>> node 0: slabs: 33, objs: 165, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
>>> node 0: slabs: 18, objs: 270, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
>>> kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
>>> node 0: slabs: 33, objs: 165, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: skbuff_small_head, object size: 480, buffer size: 544, default order: 1, min order: 0
>>> node 0: slabs: 18, objs: 270, free: 0
>>> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>>> cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default order: 3, min order: 1
>>> kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to disable.
>>> node 0: slabs: 33, objs: 165, free: 0
>>> [c1c65c70] [c01952a4] list_lru_walk_one+0x64/0x7c
>>> [c1c65ca0] [c01cad00] zswap_shrinker_scan+0xac/0xc4
>>> [c1c65cd0] [c018052c] do_shrink_slab+0x18c/0x304
>>> [c1c65d20] [c0180a40] shrink_slab+0x174/0x260
>>> [c1c65da0] [c017cb0c] shrink_one+0xbc/0x134
>>> [c1c65dd0] [c017e3e4] shrink_node+0x238/0x84c
>>> [c1c65e50] [c017ed38] balance_pgdat+0x340/0x650
>>> [c1c65f50] [c017f270] kswapd+0x228/0x25c
>>> [c1c65fc0] [c006bbac] kthread+0xe4/0xe8
>>> [c1c65ff0] [c0015304] start_kernel_thread+0x10/0x14
>>> Mem-Info:
>>> active_anon:235002 inactive_anon:240975 isolated_anon:0
>>> active_file:4356 inactive_file:2551 isolated_file:0
>>> unevictable:4 dirty:7 writeback:19
>>> slab_reclaimable:1008 slab_unreclaimable:7218
>>> mapped:5601 shmem:21 pagetables:939
>>> sec_pagetables:0 bounce:0
>>> kernel_misc_reclaimable:0
>>> free:1340 free_pcp:23 free_cma:0
>>> Node 0 active_anon:940008kB inactive_anon:963900kB active_file:17424kB inactive_file:10204kB unevictable:16kB isolated(anon):0kB isolated(file):0kB mapped:22404kB dirty:28kB writeback:76kB shmem:84kB writeback_tmp:0kB kernel_stack:1552kB pagetables:3756kB sec_pagetables:0kB all_unreclaimable? no
>>> DMA free:0kB boost:7756kB min:11208kB low:12068kB high:12928kB reserved_highatomic:0KB active_anon:644060kB inactive_anon:36332kB active_file:5276kB inactive_file:5516kB unevictable:0kB writepending:348kB present:786432kB managed:746644kB mlocked:0kB bounce:0kB free_pcp:92kB local_pcp:92kB free_cma:0kB
>>> lowmem_reserve[]: 0 0 1280 1280
>>> DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> 116345 total pagecache pages
>>> 109413 pages in swap cache
>>> Free swap = 7819300kB
>>> Total swap = 8388604kB
>>> 524288 pages RAM
>>> 327680 pages HighMem/MovableOnly
>>> 9947 pages reserved
>>>
>>>
>>> I switched from zstd to lzo as zswap default compressor so zstd does not show up on the dmesg. But the rest looks pretty similar.
>>>
>>> Full dmesg and kernel .config attached.
>>>
>>> Regards,
>>> Erhard
>>
>> Hi Erhard,
>>
>> Thanks for the reports. I'll take a look at them and get back to you
>> in a few days.
>
> Hi Erhard,
>
> The OOM kills on both kernel versions seem to be reasonable to me.
>
> Your system has 2GB memory and it uses zswap with zsmalloc (which is
> good since it can allocate from the highmem zone) and zstd/lzo (which
> doesn't matter much). Somehow -- I couldn't figure out why -- it
> splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
>
> [ 0.000000] Zone ranges:
> [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> [ 0.000000] Normal empty
> [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]

That's really odd. But we are messing with "PowerMac3,6", so I don't
really know what's right or wrong ...

--
Cheers,

David / dhildenb


2024-06-02 18:04:15

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Sat, 1 Jun 2024 00:01:48 -0600
Yu Zhao <[email protected]> wrote:

> Hi Erhard,
>
> The OOM kills on both kernel versions seem to be reasonable to me.
>
> Your system has 2GB memory and it uses zswap with zsmalloc (which is
> good since it can allocate from the highmem zone) and zstd/lzo (which
> doesn't matter much). Somehow -- I couldn't figure out why -- it
> splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
>
> [ 0.000000] Zone ranges:
> [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> [ 0.000000] Normal empty
> [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
>
> The kernel can't allocate from the highmem zone -- only userspace and
> zsmalloc can. OOM kills were due to the low memory conditions in the
> DMA zone where the kernel itself failed to allocate from.
>
> Do you know a kernel version that doesn't have OOM kills while running
> the same workload? If so, could you send that .config to me? If not,
> could you try disabling CONFIG_HIGHMEM? (It might not help but I'm out
> of ideas at the moment.)
>
> Thanks!

Hi Yu!

Thanks for looking into this.

The reason for this 0.25GB DMA / 1.75GB highmem split is beyond my knowledge. I can only say that it has been like this at least since kernel v4.14.x (see the dmesg in an old bug report of mine at https://bugzilla.kernel.org/show_bug.cgi?id=201723); I guess earlier kernel versions behave the same.

Without CONFIG_HIGHMEM the memory layout looks like this:

Total memory = 768MB; using 2048kB for hash table
[...]
Top of RAM: 0x30000000, Total RAM: 0x30000000
Memory hole size: 0MB
Zone ranges:
DMA [mem 0x0000000000000000-0x000000002fffffff]
Normal empty
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000000000000-0x000000002fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000002fffffff]
percpu: Embedded 29 pages/cpu s28448 r8192 d82144 u118784
pcpu-alloc: s28448 r8192 d82144 u118784 alloc=29*4096
pcpu-alloc: [0] 0 [0] 1
Kernel command line: ro root=/dev/sda5 slub_debug=FZP page_poison=1 [email protected]/eth0,[email protected]/A8:A1:59:16:4F:EA debug
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
Built 1 zonelists, mobility grouping on. Total pages: 194880
mem auto-init: stack:all(pattern), heap alloc:off, heap free:off
Kernel virtual memory layout:
* 0xffbdf000..0xfffff000 : fixmap
* 0xff8f4000..0xffbdf000 : early ioremap
* 0xf1000000..0xff8f4000 : vmalloc & ioremap
* 0xb0000000..0xc0000000 : modules
Memory: 761868K/786432K available (7760K kernel code, 524K rwdata, 4528K rodata, 1100K init, 253K bss, 24564K reserved, 0K cma-reserved)
[...]

With only 768 MB RAM and the 2048K hash table I get pretty much the same "kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL),nodemask=(null),cpuset=/,mems_allowed=0" as with the HIGHMEM-enabled kernel when running "stress-ng --vm 2 --vm-bytes 1930M --verify -v".

I tried the workload on v6.6.32 LTS where the issue shows up too. But v6.1.92 LTS seems ok! Triple checked v6.1.92 to be sure.

Attached please find kernel v6.9.3 dmesg (without HIGHMEM) and kernel v6.1.92 .config.

Regards,
Erhard


Attachments:
(No filename) (3.27 kB)
dmesg_693_g4_highmem-disabled (50.08 kB)
config_6192_g4 (98.12 kB)
Download all attachments

2024-06-02 20:39:08

by Yu Zhao

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Sun, Jun 2, 2024 at 12:03 PM Erhard Furtner <[email protected]> wrote:
>
> On Sat, 1 Jun 2024 00:01:48 -0600
> Yu Zhao <[email protected]> wrote:
>
> > Hi Erhard,
> >
> > The OOM kills on both kernel versions seem to be reasonable to me.
> >
> > Your system has 2GB memory and it uses zswap with zsmalloc (which is
> > good since it can allocate from the highmem zone) and zstd/lzo (which
> > doesn't matter much). Somehow -- I couldn't figure out why -- it
> > splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
> >
> > [ 0.000000] Zone ranges:
> > [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> > [ 0.000000] Normal empty
> > [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
> >
> > The kernel can't allocate from the highmem zone -- only userspace and
> > zsmalloc can. OOM kills were due to the low memory conditions in the
> > DMA zone where the kernel itself failed to allocate from.
> >
> > Do you know a kernel version that doesn't have OOM kills while running
> > the same workload? If so, could you send that .config to me? If not,
> > could you try disabling CONFIG_HIGHMEM? (It might not help but I'm out
> > of ideas at the moment.)
> >
> > Thanks!
>
> Hi Yu!
>
> Thanks for looking into this.
>
> The reason for this 0.25GB DMA / 1.75GB highmem split is beyond my knowledge. I can only tell this much that it's like this at least since kernel v4.14.x (dmesg of an old bugreport of mine at https://bugzilla.kernel.org/show_bug.cgi?id=201723), I guess earlier kernel versions too.
>
> Without CONFIG_HIGHMEM the memory layout looks like this:
>
> Total memory = 768MB; using 2048kB for hash table
> [...]
> Top of RAM: 0x30000000, Total RAM: 0x30000000
> Memory hole size: 0MB
> Zone ranges:
> DMA [mem 0x0000000000000000-0x000000002fffffff]
> Normal empty
> Movable zone start for each node
> Early memory node ranges
> node 0: [mem 0x0000000000000000-0x000000002fffffff]
> Initmem setup node 0 [mem 0x0000000000000000-0x000000002fffffff]
> percpu: Embedded 29 pages/cpu s28448 r8192 d82144 u118784
> pcpu-alloc: s28448 r8192 d82144 u118784 alloc=29*4096
> pcpu-alloc: [0] 0 [0] 1
> Kernel command line: ro root=/dev/sda5 slub_debug=FZP page_poison=1 [email protected]/eth0,[email protected]/A8:A1:59:16:4F:EA debug
> Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
> Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
> Built 1 zonelists, mobility grouping on. Total pages: 194880
> mem auto-init: stack:all(pattern), heap alloc:off, heap free:off
> Kernel virtual memory layout:
> * 0xffbdf000..0xfffff000 : fixmap
> * 0xff8f4000..0xffbdf000 : early ioremap
> * 0xf1000000..0xff8f4000 : vmalloc & ioremap
> * 0xb0000000..0xc0000000 : modules
> Memory: 761868K/786432K available (7760K kernel code, 524K rwdata, 4528K rodata, 1100K init, 253K bss, 24564K reserved, 0K cma-reserved)
> [...]
>
> With only 768 MB RAM and 2048K hashtable I get pretty much the same "kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL),nodemask=(null),cpuset=/,mems_allowed=0" as with the HIGHMEM enabled kernel at
> running "stress-ng --vm 2 --vm-bytes 1930M --verify -v".
>
> I tried the workload on v6.6.32 LTS where the issue shows up too. But v6.1.92 LTS seems ok! Triple checked v6.1.92 to be sure.
>
> Attached please find kernel v6.9.3 dmesg (without HIGHMEM) and kernel v6.1.92 .config.

Thanks.

I compared the .config between v6.8.9 (you attached previously) and
v6.1.92 -- I didn't see any major differences (both have ZONE_DMA,
HIGHMEM, MGLRU and zswap/zsmalloc). Either there is something broken
between v6.1.92 and v6.6.32 (as you mentioned above), or it's just a
kernel allocation bloat which puts the DMA zone (0.25GB) under too
heavy pressure. The latter isn't uncommon when upgrading to a newer
version of the kernel.

Could you please attach the dmesg from v6.1.92? I want to compare the
dmesgs between the two kernel versions as well -- that might provide
some hints.

2024-06-02 21:37:30

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Sun, 2 Jun 2024 14:38:18 -0600
Yu Zhao <[email protected]> wrote:

> I compared the .config between v6.8.9 (you attached previously) and
> v6.1.92 -- I didn't see any major differences (both have ZONE_DMA,
> HIGHMEM, MGLRU and zswap/zsmalloc). Either there is something broken
> between v6.1.92 and v6.6.32 (as you mentioned above), or it's just a
> kernel allocation bloat which puts the DMA zone (0.25GB) under too
> heavy pressure. The latter isn't uncommon when upgrading to a newer
> version of the kernel.
>
> Could you please attach the dmesg from v6.1.92? I want to compare the
> dmegs between the two kernel versions as well -- that might provide
> some hints.

No problem, attached please find the v6.1.92 dmesg and also the v6.9.3 .config and dmesg.

I stripped down the v6.9.3 .config a bit (to ease 'make oldconfig' for older kernels) from my originally posted v6.8.9 .config and used that as a starting point for v6.6.32 and v6.1.92.

Also I started a git bisect. Let's see if I get something meaningful out of this...

Regards,
Erhard


Attachments:
(No filename) (1.05 kB)
config_693_g4- (101.96 kB)
dmesg_6192_g4 (39.74 kB)
dmesg_693_g4 (45.26 kB)
Download all attachments

2024-06-03 22:13:20

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Sun, 2 Jun 2024 20:03:32 +0200
Erhard Furtner <[email protected]> wrote:

> On Sat, 1 Jun 2024 00:01:48 -0600
> Yu Zhao <[email protected]> wrote:
>
> > The OOM kills on both kernel versions seem to be reasonable to me.
> >
> > Your system has 2GB memory and it uses zswap with zsmalloc (which is
> > good since it can allocate from the highmem zone) and zstd/lzo (which
> > doesn't matter much). Somehow -- I couldn't figure out why -- it
> > splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
> >
> > [ 0.000000] Zone ranges:
> > [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> > [ 0.000000] Normal empty
> > [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
> >
> > The kernel can't allocate from the highmem zone -- only userspace and
> > zsmalloc can. OOM kills were due to the low memory conditions in the
> > DMA zone where the kernel itself failed to allocate from.
> >
> > Do you know a kernel version that doesn't have OOM kills while running
> > the same workload? If so, could you send that .config to me? If not,
> > could you try disabling CONFIG_HIGHMEM? (It might not help but I'm out
> > of ideas at the moment.)

Ok, the bisect I did actually revealed something meaningful:

# git bisect good
b8cf32dc6e8c75b712cbf638e0fd210101c22f17 is the first bad commit
commit b8cf32dc6e8c75b712cbf638e0fd210101c22f17
Author: Yosry Ahmed <[email protected]>
Date: Tue Jun 20 19:46:44 2023 +0000

mm: zswap: multiple zpools support

Support using multiple zpools of the same type in zswap, for concurrency
purposes. A fixed number of 32 zpools is suggested by this commit, which
was determined empirically. It can be later changed or made into a config
option if needed.

On a setup with zswap and zsmalloc, comparing a single zpool to 32 zpools
shows improvements in the zsmalloc lock contention, especially on the swap
out path.

The following shows the perf analysis of the swapout path when 10
workloads are simultaneously reclaiming and refaulting tmpfs pages. There
are some improvements on the swap in path as well, but less significant.

1 zpool:

|--28.99%--zswap_frontswap_store
|
<snip>
|
|--8.98%--zpool_map_handle
| |
| --8.98%--zs_zpool_map
| |
| --8.95%--zs_map_object
| |
| --8.38%--_raw_spin_lock
| |
| --7.39%--queued_spin_lock_slowpath
|
|--8.82%--zpool_malloc
| |
| --8.82%--zs_zpool_malloc
| |
| --8.80%--zs_malloc
| |
| |--7.21%--_raw_spin_lock
| | |
| | --6.81%--queued_spin_lock_slowpath
<snip>

32 zpools:

|--16.73%--zswap_frontswap_store
|
<snip>
|
|--1.81%--zpool_malloc
| |
| --1.81%--zs_zpool_malloc
| |
| --1.79%--zs_malloc
| |
| --0.73%--obj_malloc
|
|--1.06%--zswap_update_total_size
|
|--0.59%--zpool_map_handle
| |
| --0.59%--zs_zpool_map
| |
| --0.57%--zs_map_object
| |
| --0.51%--_raw_spin_lock
<snip>

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Suggested-by: Yu Zhao <[email protected]>
Acked-by: Chris Li (Google) <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Tested-by: Nhat Pham <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/zswap.c | 81 +++++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 54 insertions(+), 27 deletions(-)


'bad' bisects were the ones where the "kswapd0: page allocation failure:" showed up when running the workload; 'good' bisects were the cases where I only got the kernel's OOM reaper killing the workload. In the good cases the machine stayed usable via VNC; in the bad cases, with the issue showing up, the machine crashed and rebooted >80% of the time shortly after the issue appeared in dmesg (via netconsole). I triple checked the good cases to be sure that only the OOM reaper showed up and not the kswapd0: page allocation failure.

Bisect.log attached.

Regards,
Erhard


Attachments:
(No filename) (5.12 kB)
bisect.log (3.43 kB)
Download all attachments

2024-06-03 23:24:52

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Mon, Jun 3, 2024 at 3:13 PM Erhard Furtner <[email protected]> wrote:
>
> On Sun, 2 Jun 2024 20:03:32 +0200
> Erhard Furtner <[email protected]> wrote:
>
> > On Sat, 1 Jun 2024 00:01:48 -0600
> > Yu Zhao <[email protected]> wrote:
> >
> > > The OOM kills on both kernel versions seem to be reasonable to me.
> > >
> > > Your system has 2GB memory and it uses zswap with zsmalloc (which is
> > > good since it can allocate from the highmem zone) and zstd/lzo (which
> > > doesn't matter much). Somehow -- I couldn't figure out why -- it
> > > splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
> > >
> > > [ 0.000000] Zone ranges:
> > > [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> > > [ 0.000000] Normal empty
> > > [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
> > >
> > > The kernel can't allocate from the highmem zone -- only userspace and
> > > zsmalloc can. OOM kills were due to the low memory conditions in the
> > > DMA zone where the kernel itself failed to allocate from.
> > >
> > > Do you know a kernel version that doesn't have OOM kills while running
> > > the same workload? If so, could you send that .config to me? If not,
> > > could you try disabling CONFIG_HIGHMEM? (It might not help but I'm out
> > > of ideas at the moment.)
>
> Ok, the bisect I did actually revealed something meaningful:
>
> # git bisect good
> b8cf32dc6e8c75b712cbf638e0fd210101c22f17 is the first bad commit
> commit b8cf32dc6e8c75b712cbf638e0fd210101c22f17
> Author: Yosry Ahmed <[email protected]>
> Date: Tue Jun 20 19:46:44 2023 +0000
>
> mm: zswap: multiple zpools support

Thanks for bisecting. Taking a look at the thread, it seems like you
have a very limited area of memory to allocate kernel memory from. One
possible reason why that commit can cause an issue is because we will
have multiple instances of the zsmalloc slab caches 'zspage' and
'zs_handle', which may contribute to fragmentation in slab memory.

Do you have /proc/slabinfo from a good and a bad run by any chance?

Also, could you check if the attached patch helps? It makes sure that
even when we use multiple zsmalloc zpools, we will use a single slab
cache of each type.
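
For readers following along without the attachment, the idea is roughly the sketch below. This is only an illustration of the approach (one global pair of caches instead of per-pool ones), borrowing the names ZS_HANDLE_SIZE and struct zspage from mm/zsmalloc.c; it is not the attached patch.

/*
 * Sketch only, not the attached patch: create the "zs_handle" and
 * "zspage" kmem caches once, globally, so that 32 zswap zpools no
 * longer mean 32 sets of partially filled slabs.  Assumes it lives in
 * mm/zsmalloc.c, where ZS_HANDLE_SIZE and struct zspage are defined.
 */
static struct kmem_cache *zs_handle_cache;
static struct kmem_cache *zspage_cache;

static int __init zs_shared_caches_init(void)
{
        zs_handle_cache = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
                                            0, 0, NULL);
        zspage_cache = kmem_cache_create("zspage", sizeof(struct zspage),
                                         0, 0, NULL);
        if (!zs_handle_cache || !zspage_cache)
                return -ENOMEM;
        return 0;
}

/*
 * zs_create_pool() would then reference these shared caches instead of
 * calling kmem_cache_create() once per pool.
 */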


Attachments:
0001-mm-zsmalloc-share-slab-caches-for-all-zsmalloc-zpool.patch (3.66 kB)

2024-06-04 11:45:31

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Mon, 3 Jun 2024 16:24:02 -0700
Yosry Ahmed <[email protected]> wrote:

> Thanks for bisecting. Taking a look at the thread, it seems like you
> have a very limited area of memory to allocate kernel memory from. One
> possible reason why that commit can cause an issue is because we will
> have multiple instances of the zsmalloc slab caches 'zspage' and
> 'zs_handle', which may contribute to fragmentation in slab memory.
>
> Do you have /proc/slabinfo from a good and a bad run by any chance?
>
> Also, could you check if the attached patch helps? It makes sure that
> even when we use multiple zsmalloc zpools, we will use a single slab
> cache of each type.

Thanks for looking into this! I got you 'cat /proc/slabinfo' from a good HEAD, from a bad HEAD and from the bad HEAD + your patch applied.

Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was b8cf32dc6e8c75b712cbf638e0fd210101c22f17, both of which I got from my bisect.log. I got the slabinfo shortly after boot and a 2nd time shortly before the OOM or the kswapd0: page allocation failure happens. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M --verify -v) manually shortly before the 2 GiB of RAM were exhausted and got the slabinfo then.

The patch applied to git b8cf32dc6e8c75b712cbf638e0fd210101c22f17 unfortunately didn't make a difference, I got the kswapd0: page allocation failure nevertheless.

Regards,
Erhard


Attachments:
(No filename) (1.39 kB)
slabinfo_patch-boot.txt (23.17 kB)
slabinfo_patch-be.txt (23.17 kB)
slabinfo_bad-boot.txt (29.70 kB)
slabinfo_good-boot.txt (23.17 kB)
slabinfo_bad-be.txt (29.70 kB)
slabinfo_good-be.txt (23.17 kB)
Download all attachments

2024-06-04 16:58:36

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, Jun 4, 2024 at 4:45 AM Erhard Furtner <[email protected]> wrote:
>
> On Mon, 3 Jun 2024 16:24:02 -0700
> Yosry Ahmed <[email protected]> wrote:
>
> > Thanks for bisecting. Taking a look at the thread, it seems like you
> > have a very limited area of memory to allocate kernel memory from. One
> > possible reason why that commit can cause an issue is because we will
> > have multiple instances of the zsmalloc slab caches 'zspage' and
> > 'zs_handle', which may contribute to fragmentation in slab memory.
> >
> > Do you have /proc/slabinfo from a good and a bad run by any chance?
> >
> > Also, could you check if the attached patch helps? It makes sure that
> > even when we use multiple zsmalloc zpools, we will use a single slab
> > cache of each type.
>
> Thanks for looking into this! I got you 'cat /proc/slabinfo' from a good HEAD, from a bad HEAD and from the bad HEAD + your patch applied.
>
> Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was b8cf32dc6e8c75b712cbf638e0fd210101c22f17 which I got both from my bisect.log. I got the slabinfo shortly after boot and a 2nd time shortly before the OOM or the kswapd0: page allocation failure happens. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M --verify -v) manually shortly before the 2 GiB RAM exhausted and got the slabinfo then.
>
> The patch applied to git b8cf32dc6e8c75b712cbf638e0fd210101c22f17 unfortunately didn't make a difference, I got the kswapd0: page allocation failure nevertheless.

Thanks for trying this out. The patch reduces the amount of wasted
memory due to the 'zs_handle' and 'zspage' caches by an order of
magnitude, but it was a small number to begin with (~250K).

I cannot think of other reasons why having multiple zsmalloc pools
will end up using more memory in the 0.25GB zone that the kernel
allocations can be made from.

The number of zpools can be made configurable or determined at runtime
by the size of the machine, but I don't want to do this without
understanding the problem here first. Adding other zswap and zsmalloc
folks in case they have any ideas.
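
Purely as an illustration of the "configurable" option, the minimal form would be something like the sketch below. The parameter name is invented here -- no such zswap knob exists today -- and a truly dynamic count would also need the fixed-size zpools[] array in struct zswap_pool replaced.

/*
 * Hypothetical sketch: expose the zpool count as a module parameter
 * instead of the hard-coded ZSWAP_NR_ZPOOLS.  "nr_zpools" is an
 * invented name, not an existing zswap option.
 */
static unsigned int zswap_nr_zpools = 32;
module_param_named(nr_zpools, zswap_nr_zpools, uint, 0444);
MODULE_PARM_DESC(nr_zpools, "number of zsmalloc pools used by zswap");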

>
> Regards,
> Erhard

2024-06-04 17:19:15

by Yu Zhao

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, Jun 4, 2024 at 10:12 AM Yosry Ahmed <[email protected]> wrote:
>
> On Tue, Jun 4, 2024 at 4:45 AM Erhard Furtner <[email protected]> wrote:
> >
> > On Mon, 3 Jun 2024 16:24:02 -0700
> > Yosry Ahmed <[email protected]> wrote:
> >
> > > Thanks for bisecting. Taking a look at the thread, it seems like you
> > > have a very limited area of memory to allocate kernel memory from. One
> > > possible reason why that commit can cause an issue is because we will
> > > have multiple instances of the zsmalloc slab caches 'zspage' and
> > > 'zs_handle', which may contribute to fragmentation in slab memory.
> > >
> > > Do you have /proc/slabinfo from a good and a bad run by any chance?
> > >
> > > Also, could you check if the attached patch helps? It makes sure that
> > > even when we use multiple zsmalloc zpools, we will use a single slab
> > > cache of each type.
> >
> > Thanks for looking into this! I got you 'cat /proc/slabinfo' from a good HEAD, from a bad HEAD and from the bad HEAD + your patch applied.
> >
> > Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was b8cf32dc6e8c75b712cbf638e0fd210101c22f17 which I got both from my bisect.log. I got the slabinfo shortly after boot and a 2nd time shortly before the OOM or the kswapd0: page allocation failure happens. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M --verify -v) manually shortly before the 2 GiB RAM exhausted and got the slabinfo then.
> >
> > The patch applied to git b8cf32dc6e8c75b712cbf638e0fd210101c22f17 unfortunately didn't make a difference, I got the kswapd0: page allocation failure nevertheless.
>
> Thanks for trying this out. The patch reduces the amount of wasted
> memory due to the 'zs_handle' and 'zspage' caches by an order of
> magnitude, but it was a small number to begin with (~250K).
>
> I cannot think of other reasons why having multiple zsmalloc pools
> will end up using more memory in the 0.25GB zone that the kernel
> allocations can be made from.
>
> The number of zpools can be made configurable or determined at runtime
> by the size of the machine, but I don't want to do this without
> understanding the problem here first. Adding other zswap and zsmalloc
> folks in case they have any ideas.

Hi Erhard,

If it's not too much trouble, could you "grep nr_zspages /proc/vmstat"
on kernels before and after the bad commit? It'd be great if you could
run the grep command right before the OOM kills.

The overall internal fragmentation of multiple zsmalloc pools might be
higher than a single one. I suspect this might be the cause.

Thank you.

2024-06-04 17:42:48

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, Jun 4, 2024 at 10:19 AM Yu Zhao <[email protected]> wrote:
>
> On Tue, Jun 4, 2024 at 10:12 AM Yosry Ahmed <[email protected]> wrote:
> >
> > On Tue, Jun 4, 2024 at 4:45 AM Erhard Furtner <[email protected]> wrote:
> > >
> > > On Mon, 3 Jun 2024 16:24:02 -0700
> > > Yosry Ahmed <[email protected]> wrote:
> > >
> > > > Thanks for bisecting. Taking a look at the thread, it seems like you
> > > > have a very limited area of memory to allocate kernel memory from. One
> > > > possible reason why that commit can cause an issue is because we will
> > > > have multiple instances of the zsmalloc slab caches 'zspage' and
> > > > 'zs_handle', which may contribute to fragmentation in slab memory.
> > > >
> > > > Do you have /proc/slabinfo from a good and a bad run by any chance?
> > > >
> > > > Also, could you check if the attached patch helps? It makes sure that
> > > > even when we use multiple zsmalloc zpools, we will use a single slab
> > > > cache of each type.
> > >
> > > Thanks for looking into this! I got you 'cat /proc/slabinfo' from a good HEAD, from a bad HEAD and from the bad HEAD + your patch applied.
> > >
> > > Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was b8cf32dc6e8c75b712cbf638e0fd210101c22f17 which I got both from my bisect.log. I got the slabinfo shortly after boot and a 2nd time shortly before the OOM or the kswapd0: page allocation failure happens. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M --verify -v) manually shortly before the 2 GiB RAM exhausted and got the slabinfo then.
> > >
> > > The patch applied to git b8cf32dc6e8c75b712cbf638e0fd210101c22f17 unfortunately didn't make a difference, I got the kswapd0: page allocation failure nevertheless.
> >
> > Thanks for trying this out. The patch reduces the amount of wasted
> > memory due to the 'zs_handle' and 'zspage' caches by an order of
> > magnitude, but it was a small number to begin with (~250K).
> >
> > I cannot think of other reasons why having multiple zsmalloc pools
> > will end up using more memory in the 0.25GB zone that the kernel
> > allocations can be made from.
> >
> > The number of zpools can be made configurable or determined at runtime
> > by the size of the machine, but I don't want to do this without
> > understanding the problem here first. Adding other zswap and zsmalloc
> > folks in case they have any ideas.
>
> Hi Erhard,
>
> If it's not too much trouble, could you "grep nr_zspages /proc/vmstat"
> on kernels before and after the bad commit? It'd be great if you could
> run the grep command right before the OOM kills.
>
> The overall internal fragmentation of multiple zsmalloc pools might be
> higher than a single one. I suspect this might be the cause.

I thought about the internal fragmentation of pools, but zsmalloc
should have access to highmem, and if I understand correctly the
problem here is that we are running out of space in the DMA zone when
making kernel allocations.

Do you suspect zsmalloc is allocating memory from the DMA zone
initially, even though it has access to highmem?

>
> Thank you.

2024-06-04 17:54:28

by Yu Zhao

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, Jun 4, 2024 at 11:34 AM Yosry Ahmed <[email protected]> wrote:
>
> On Tue, Jun 4, 2024 at 10:19 AM Yu Zhao <[email protected]> wrote:
> >
> > On Tue, Jun 4, 2024 at 10:12 AM Yosry Ahmed <[email protected]> wrote:
> > >
> > > On Tue, Jun 4, 2024 at 4:45 AM Erhard Furtner <[email protected]> wrote:
> > > >
> > > > On Mon, 3 Jun 2024 16:24:02 -0700
> > > > Yosry Ahmed <[email protected]> wrote:
> > > >
> > > > > Thanks for bisecting. Taking a look at the thread, it seems like you
> > > > > have a very limited area of memory to allocate kernel memory from. One
> > > > > possible reason why that commit can cause an issue is because we will
> > > > > have multiple instances of the zsmalloc slab caches 'zspage' and
> > > > > 'zs_handle', which may contribute to fragmentation in slab memory.
> > > > >
> > > > > Do you have /proc/slabinfo from a good and a bad run by any chance?
> > > > >
> > > > > Also, could you check if the attached patch helps? It makes sure that
> > > > > even when we use multiple zsmalloc zpools, we will use a single slab
> > > > > cache of each type.
> > > >
> > > > Thanks for looking into this! I got you 'cat /proc/slabinfo' from a good HEAD, from a bad HEAD and from the bad HEAD + your patch applied.
> > > >
> > > > Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was b8cf32dc6e8c75b712cbf638e0fd210101c22f17 which I got both from my bisect.log. I got the slabinfo shortly after boot and a 2nd time shortly before the OOM or the kswapd0: page allocation failure happens. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M --verify -v) manually shortly before the 2 GiB RAM exhausted and got the slabinfo then.
> > > >
> > > > The patch applied to git b8cf32dc6e8c75b712cbf638e0fd210101c22f17 unfortunately didn't make a difference, I got the kswapd0: page allocation failure nevertheless.
> > >
> > > Thanks for trying this out. The patch reduces the amount of wasted
> > > memory due to the 'zs_handle' and 'zspage' caches by an order of
> > > magnitude, but it was a small number to begin with (~250K).
> > >
> > > I cannot think of other reasons why having multiple zsmalloc pools
> > > will end up using more memory in the 0.25GB zone that the kernel
> > > allocations can be made from.
> > >
> > > The number of zpools can be made configurable or determined at runtime
> > > by the size of the machine, but I don't want to do this without
> > > understanding the problem here first. Adding other zswap and zsmalloc
> > > folks in case they have any ideas.
> >
> > Hi Erhard,
> >
> > If it's not too much trouble, could you "grep nr_zspages /proc/vmstat"
> > on kernels before and after the bad commit? It'd be great if you could
> > run the grep command right before the OOM kills.
> >
> > The overall internal fragmentation of multiple zsmalloc pools might be
> > higher than a single one. I suspect this might be the cause.
>
> I thought about the internal fragmentation of pools, but zsmalloc
> should have access to highmem, and if I understand correctly the
> problem here is that we are running out of space in the DMA zone when
> making kernel allocations.
>
> Do you suspect zsmalloc is allocating memory from the DMA zone
> initially, even though it has access to highmem?

There was a lot of user memory in the DMA zone. So at a point the
highmem zone was full and allocation fallback happened.

The problem with zone fallback is that recent allocations go into
lower zones, meaning they are further back on the LRU list. This
applies to both user memory and zsmalloc memory -- the latter has a
writeback LRU. On top of this, neither the zswap shrinker nor the
zsmalloc shrinker (compaction) is zone aware. So page reclaim might
have trouble hitting the right target zone.

We can't really tell how zspages are distributed across zones, but the
overall number might be helpful. It'd be great if someone could make
nr_zspages per zone :)

2024-06-04 18:02:33

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, Jun 4, 2024 at 10:54 AM Yu Zhao <[email protected]> wrote:
>
> On Tue, Jun 4, 2024 at 11:34 AM Yosry Ahmed <[email protected]> wrote:
> >
> > On Tue, Jun 4, 2024 at 10:19 AM Yu Zhao <[email protected]> wrote:
> > >
> > > On Tue, Jun 4, 2024 at 10:12 AM Yosry Ahmed <[email protected]> wrote:
> > > >
> > > > On Tue, Jun 4, 2024 at 4:45 AM Erhard Furtner <[email protected]> wrote:
> > > > >
> > > > > On Mon, 3 Jun 2024 16:24:02 -0700
> > > > > Yosry Ahmed <[email protected]> wrote:
> > > > >
> > > > > > Thanks for bisecting. Taking a look at the thread, it seems like you
> > > > > > have a very limited area of memory to allocate kernel memory from. One
> > > > > > possible reason why that commit can cause an issue is because we will
> > > > > > have multiple instances of the zsmalloc slab caches 'zspage' and
> > > > > > 'zs_handle', which may contribute to fragmentation in slab memory.
> > > > > >
> > > > > > Do you have /proc/slabinfo from a good and a bad run by any chance?
> > > > > >
> > > > > > Also, could you check if the attached patch helps? It makes sure that
> > > > > > even when we use multiple zsmalloc zpools, we will use a single slab
> > > > > > cache of each type.
> > > > >
> > > > > Thanks for looking into this! I got you 'cat /proc/slabinfo' from a good HEAD, from a bad HEAD and from the bad HEAD + your patch applied.
> > > > >
> > > > > Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was b8cf32dc6e8c75b712cbf638e0fd210101c22f17 which I got both from my bisect.log. I got the slabinfo shortly after boot and a 2nd time shortly before the OOM or the kswapd0: page allocation failure happens. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M --verify -v) manually shortly before the 2 GiB RAM exhausted and got the slabinfo then.
> > > > >
> > > > > The patch applied to git b8cf32dc6e8c75b712cbf638e0fd210101c22f17 unfortunately didn't make a difference, I got the kswapd0: page allocation failure nevertheless.
> > > >
> > > > Thanks for trying this out. The patch reduces the amount of wasted
> > > > memory due to the 'zs_handle' and 'zspage' caches by an order of
> > > > magnitude, but it was a small number to begin with (~250K).
> > > >
> > > > I cannot think of other reasons why having multiple zsmalloc pools
> > > > will end up using more memory in the 0.25GB zone that the kernel
> > > > allocations can be made from.
> > > >
> > > > The number of zpools can be made configurable or determined at runtime
> > > > by the size of the machine, but I don't want to do this without
> > > > understanding the problem here first. Adding other zswap and zsmalloc
> > > > folks in case they have any ideas.
> > >
> > > Hi Erhard,
> > >
> > > If it's not too much trouble, could you "grep nr_zspages /proc/vmstat"
> > > on kernels before and after the bad commit? It'd be great if you could
> > > run the grep command right before the OOM kills.
> > >
> > > The overall internal fragmentation of multiple zsmalloc pools might be
> > > higher than a single one. I suspect this might be the cause.
> >
> > I thought about the internal fragmentation of pools, but zsmalloc
> > should have access to highmem, and if I understand correctly the
> > problem here is that we are running out of space in the DMA zone when
> > making kernel allocations.
> >
> > Do you suspect zsmalloc is allocating memory from the DMA zone
> > initially, even though it has access to highmem?
>
> There was a lot of user memory in the DMA zone. So at a point the
> highmem zone was full and allocation fallback happened.
>
> The problem with zone fallback is that recent allocations go into
> lower zones, meaning they are further back on the LRU list. This
> applies to both user memory and zsmalloc memory -- the latter has a
> writeback LRU. On top of this, neither the zswap shrinker nor the
> zsmalloc shrinker (compaction) is zone aware. So page reclaim might
> have trouble hitting the right target zone.

I see what you mean. In this case, yeah I think the internal
fragmentation in the zsmalloc pools may be the reason behind the
problem.

How many CPUs does this machine have? I am wondering if 32 can be
overkill for small machines, perhaps the number of pools should be
min(nr_cpus, 32)?

Alternatively, the number of pools should scale with the memory size
in some way, such that we only increase fragmentation when it's
tolerable.
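
One possible shape of such scaling, purely as a sketch: one zpool per ~1 GB of RAM, clamped to [1, 32]. Both the heuristic and the function name below are made up for illustration; nothing like this exists in the kernel.

/*
 * Hypothetical sketch: scale the zpool count with RAM size.  The
 * one-zpool-per-GB heuristic and the name zswap_calc_nr_zpools() are
 * illustrative only.
 */
static unsigned int zswap_calc_nr_zpools(void)
{
        unsigned long gbs = totalram_pages() >> (30 - PAGE_SHIFT);

        return clamp_t(unsigned int, gbs, 1, 32);
}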

>
> We can't really tell how zspages are distributed across zones, but the
> overall number might be helpful. It'd be great if someone could make
> nr_zspages per zone :)

2024-06-04 20:52:20

by Vlastimil Babka (SUSE)

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On 6/4/24 1:24 AM, Yosry Ahmed wrote:
> On Mon, Jun 3, 2024 at 3:13 PM Erhard Furtner <[email protected]> wrote:
>>
>> On Sun, 2 Jun 2024 20:03:32 +0200
>> Erhard Furtner <[email protected]> wrote:
>>
>> > On Sat, 1 Jun 2024 00:01:48 -0600
>> > Yu Zhao <[email protected]> wrote:
>> >
>> > > The OOM kills on both kernel versions seem to be reasonable to me.
>> > >
>> > > Your system has 2GB memory and it uses zswap with zsmalloc (which is
>> > > good since it can allocate from the highmem zone) and zstd/lzo (which
>> > > doesn't matter much). Somehow -- I couldn't figure out why -- it
>> > > splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
>> > >
>> > > [ 0.000000] Zone ranges:
>> > > [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
>> > > [ 0.000000] Normal empty
>> > > [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
>> > >
>> > > The kernel can't allocate from the highmem zone -- only userspace and
>> > > zsmalloc can. OOM kills were due to the low memory conditions in the
>> > > DMA zone where the kernel itself failed to allocate from.
>> > >
>> > > Do you know a kernel version that doesn't have OOM kills while running
>> > > the same workload? If so, could you send that .config to me? If not,
>> > > could you try disabling CONFIG_HIGHMEM? (It might not help but I'm out
>> > > of ideas at the moment.)
>>
>> Ok, the bisect I did actually revealed something meaningful:
>>
>> # git bisect good
>> b8cf32dc6e8c75b712cbf638e0fd210101c22f17 is the first bad commit
>> commit b8cf32dc6e8c75b712cbf638e0fd210101c22f17
>> Author: Yosry Ahmed <[email protected]>
>> Date: Tue Jun 20 19:46:44 2023 +0000
>>
>> mm: zswap: multiple zpools support
>
> Thanks for bisecting. Taking a look at the thread, it seems like you
> have a very limited area of memory to allocate kernel memory from. One
> possible reason why that commit can cause an issue is because we will
> have multiple instances of the zsmalloc slab caches 'zspage' and
> 'zs_handle', which may contribute to fragmentation in slab memory.
>
> Do you have /proc/slabinfo from a good and a bad run by any chance?
>
> Also, could you check if the attached patch helps? It makes sure that
> even when we use multiple zsmalloc zpools, we will use a single slab
> cache of each type.

As for reducing slab fragmentation/footprint, I would also recommend these
changes to .config:

CONFIG_SLAB_MERGE_DEFAULT=y - this will unify the separate zpool caches as
well (the patch still makes sense, though), and many other caches too
CONFIG_RANDOM_KMALLOC_CACHES=n - no 16 separate copies of the kmalloc caches

although the slabinfo output doesn't seem to show
CONFIG_RANDOM_KMALLOC_CACHES in action, weirdly. It was enabled in the
config attached to the first mail.

Both these changes mean giving up some mitigation against potential
vulnerabilities. But it's not perfect anyway and memory seems really
tight here.

2024-06-04 20:56:36

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, Jun 4, 2024 at 1:52 PM Vlastimil Babka (SUSE) <[email protected]> wrote:
>
> On 6/4/24 1:24 AM, Yosry Ahmed wrote:
> > On Mon, Jun 3, 2024 at 3:13 PM Erhard Furtner <[email protected]> wrote:
> >>
> >> On Sun, 2 Jun 2024 20:03:32 +0200
> >> Erhard Furtner <[email protected]> wrote:
> >>
> >> > On Sat, 1 Jun 2024 00:01:48 -0600
> >> > Yu Zhao <[email protected]> wrote:
> >> >
> >> > > The OOM kills on both kernel versions seem to be reasonable to me.
> >> > >
> >> > > Your system has 2GB memory and it uses zswap with zsmalloc (which is
> >> > > good since it can allocate from the highmem zone) and zstd/lzo (which
> >> > > doesn't matter much). Somehow -- I couldn't figure out why -- it
> >> > > splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
> >> > >
> >> > > [ 0.000000] Zone ranges:
> >> > > [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> >> > > [ 0.000000] Normal empty
> >> > > [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
> >> > >
> >> > > The kernel can't allocate from the highmem zone -- only userspace and
> >> > > zsmalloc can. OOM kills were due to the low memory conditions in the
> >> > > DMA zone where the kernel itself failed to allocate from.
> >> > >
> >> > > Do you know a kernel version that doesn't have OOM kills while running
> >> > > the same workload? If so, could you send that .config to me? If not,
> >> > > could you try disabling CONFIG_HIGHMEM? (It might not help but I'm out
> >> > > of ideas at the moment.)
> >>
> >> Ok, the bisect I did actually revealed something meaningful:
> >>
> >> # git bisect good
> >> b8cf32dc6e8c75b712cbf638e0fd210101c22f17 is the first bad commit
> >> commit b8cf32dc6e8c75b712cbf638e0fd210101c22f17
> >> Author: Yosry Ahmed <[email protected]>
> >> Date: Tue Jun 20 19:46:44 2023 +0000
> >>
> >> mm: zswap: multiple zpools support
> >
> > Thanks for bisecting. Taking a look at the thread, it seems like you
> > have a very limited area of memory to allocate kernel memory from. One
> > possible reason why that commit can cause an issue is because we will
> > have multiple instances of the zsmalloc slab caches 'zspage' and
> > 'zs_handle', which may contribute to fragmentation in slab memory.
> >
> > Do you have /proc/slabinfo from a good and a bad run by any chance?
> >
> > Also, could you check if the attached patch helps? It makes sure that
> > even when we use multiple zsmalloc zpools, we will use a single slab
> > cache of each type.
>
> As for reducing slab fragmentation/footprint, I would also recommend these
> changes to .config:
>
> CONFIG_SLAB_MERGE_DEFAULT=y - this will unify the separate zpool caches as
> well (but the patch still makes sense), but also many others
> CONFIG_RANDOM_KMALLOC_CACHES=n - no 16 separate copies of kmalloc caches

Yeah, I did send that patch separately, but I think the problem here
is probably fragmentation in the zsmalloc pools themselves, not the
slab caches used by them.

>
> although the slabinfo output doesn't seem to show
> CONFIG_RANDOM_KMALLOC_CACHES in action, weirdly. It was enabled in the
> config attached to the first mail.
>
> Both these changes mean giving up some mitigation against potentai
> lvulnerabilities. But it's not perfect anyway and the memory seems really
> tight here.

I think we may be able to fix the problem here if we address the
zsmalloc fragmentation. In regards to slab caches, the patch proposed
above should avoid the replication without enabling slab cache merging
in general.

Thanks for chiming in!

2024-06-04 21:00:51

by Vlastimil Babka (SUSE)

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On 6/4/24 8:01 PM, Yosry Ahmed wrote:
> On Tue, Jun 4, 2024 at 10:54 AM Yu Zhao <[email protected]> wrote:
>> There was a lot of user memory in the DMA zone. So at a point the
>> highmem zone was full and allocation fallback happened.
>>
>> The problem with zone fallback is that recent allocations go into
>> lower zones, meaning they are further back on the LRU list. This
>> applies to both user memory and zsmalloc memory -- the latter has a
>> writeback LRU. On top of this, neither the zswap shrinker nor the
>> zsmalloc shrinker (compaction) is zone aware. So page reclaim might
>> have trouble hitting the right target zone.
>
> I see what you mean. In this case, yeah I think the internal
> fragmentation in the zsmalloc pools may be the reason behind the
> problem.
>
> How many CPUs does this machine have? I am wondering if 32 can be
> overkill for small machines, perhaps the number of pools should be
> min(nr_cpus, 32)?
>
> Alternatively, the number of pools should scale with the memory size
> in some way, such that we only increase fragmentation when it's
> tolerable.

Sounds like a good idea to me, maybe a combination of both. No point in
scaling up if there's no benefit and only the downside of more memory
consumption.

2024-06-04 21:10:45

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, 4 Jun 2024 11:01:39 -0700
Yosry Ahmed <[email protected]> wrote:

> How many CPUs does this machine have? I am wondering if 32 can be
> overkill for small machines, perhaps the number of pools should be
> min(nr_cpus, 32)?

This PowerMac G4 DP has 2 CPUs. Not much for a desktop machine by today's standards, but some SoCs have less. ;)

# lscpu
Architecture: ppc
CPU op-mode(s): 32-bit
Byte Order: Big Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Model name: 7455, altivec supported
Model: 3.3 (pvr 8001 0303)
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
BogoMIPS: 83.78
Caches (sum of all):
L1d: 64 KiB (2 instances)
L1i: 64 KiB (2 instances)
L2: 512 KiB (2 instances)
L3: 4 MiB (2 instances)

Regards,
Erhard

2024-06-04 22:25:50

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, 4 Jun 2024 11:18:25 -0600
Yu Zhao <[email protected]> wrote:

> Hi Erhard,
>
> If it's not too much trouble, could you "grep nr_zspages /proc/vmstat"
> on kernels before and after the bad commit? It'd be great if you could
> run the grep command right before the OOM kills.
>
> The overall internal fragmentation of multiple zsmalloc pools might be
> higher than a single one. I suspect this might be the cause.
>
> Thank you.

I used watch -n1 'grep nr_zspages /proc/vmstat' to get the readings and repeated this 3 times to check whether the reported values differ much.

Bad commit was b8cf32dc6e8c75b712cbf638e0fd210101c22f17 "mm: zswap: multiple zpools support"; the next commit in git log after the bad one would be 42c06a0e8ebe95b81e5fb41c6556ff22d9255b0c "mm: kill frontswap".

With the bad commit I got 2440, 2337, 3245 as nr_zspages and the kswapd0: page allocation error.

The commit in git log before the bad one would be bfaa4a0ce1bbc1b2b67de7e4c2e1679495f7b905 "scsi: gvp11: Remove unused gvp11_setup() function".

With that kernel I got 25537, 11321, 16087 as nr_zspages and the OOM reaper.

Tomorrow I could also check the .config changes Vlastimil suggested (CONFIG_SLAB_MERGE_DEFAULT=y, no CONFIG_RANDOM_KMALLOC_CACHES) and report back if that's of interest.

Regards,
Erhard

2024-06-05 03:04:17

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, Jun 4, 2024 at 2:10 PM Erhard Furtner <[email protected]> wrote:
>
> On Tue, 4 Jun 2024 11:01:39 -0700
> Yosry Ahmed <[email protected]> wrote:
>
> > How many CPUs does this machine have? I am wondering if 32 can be an
> > overkill for small machines, perhaps the number of pools should be
> > min(nr_cpus, 32)?
>
> This PowerMac G4 DP got 2 CPUs. Not much for a desktop machine by todays standards but some SoCs have less. ;)
>
> # lscpu
> Architecture: ppc
> CPU op-mode(s): 32-bit
> Byte Order: Big Endian
> CPU(s): 2
> On-line CPU(s) list: 0,1
> Model name: 7455, altivec supported
> Model: 3.3 (pvr 8001 0303)
> Thread(s) per core: 1
> Core(s) per socket: 1
> Socket(s): 2
> BogoMIPS: 83.78
> Caches (sum of all):
> L1d: 64 KiB (2 instances)
> L1i: 64 KiB (2 instances)
> L2: 512 KiB (2 instances)
> L3: 4 MiB (2 instances)
>
> Regards,
> Erhard

Could you check if the attached patch helps? It basically changes the
number of zpools from 32 to min(32, nr_cpus).


Attachments:
0001-mm-zswap-do-not-scale-the-number-of-zpools-unnecessa.patch (3.84 kB)

2024-06-05 23:04:58

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Tue, 4 Jun 2024 20:03:27 -0700
Yosry Ahmed <[email protected]> wrote:

> Could you check if the attached patch helps? It basically changes the
> number of zpools from 32 to min(32, nr_cpus).

Thanks! The patch does not fix the issue but it helps.

That means I still see the 'kswapd0: page allocation failure' in dmesg, a 'stress-ng-vm: page allocation failure' later on, another kswapd0 error after that, etc., _but_ the machine keeps running the workload, stays usable via VNC and no longer hard-crashes.

Without patch kswapd0 error and hard crash (need to power-cycle) <3min. With patch several kswapd0 errors but running for 2 hrs now. I double checked this to be sure.

The patch did not apply cleanly on v6.9.3 so I applied it on v6.10-rc2. dmesg of the current v6.10-rc2 run attached.

Regards,
Erhard


Attachments:
(No filename) (844.00 B)
dmesg_610-rc2_g4 (68.78 kB)
Download all attachments

2024-06-05 23:42:19

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Wed, Jun 5, 2024 at 4:04 PM Erhard Furtner <[email protected]> wrote:
>
> On Tue, 4 Jun 2024 20:03:27 -0700
> Yosry Ahmed <[email protected]> wrote:
>
> > Could you check if the attached patch helps? It basically changes the
> > number of zpools from 32 to min(32, nr_cpus).
>
> Thanks! The patch does not fix the issue but it helps.
>
> Means I still get to see the 'kswapd0: page allocation failure' in the dmesg, a 'stress-ng-vm: page allocation failure' later on, another kswapd0 error later on, etc. _but_ the machine keeps running the workload, stays usable via VNC and I get no hard crash any longer.
>
> Without patch kswapd0 error and hard crash (need to power-cycle) <3min. With patch several kswapd0 errors but running for 2 hrs now. I double checked this to be sure.

Thanks for trying this out. This is interesting, so even two zpools is
too much fragmentation for your use case.

I think there are multiple ways to go forward here:
(a) Make the number of zpools a config option, leave the default as
32, but allow special use cases to set it to 1 or similar. This is
probably not preferable because it is not clear to users how to set
it, but the idea is that no one will have to set it except special use
cases such as Erhard's (who will want to set it to 1 in this case).

(b) Make the number of zpools scale linearly with the number of CPUs.
Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
approach is that with a large number of CPUs, too many zpools will
start having diminishing returns. Fragmentation will keep increasing,
while the scalability/concurrency gains will diminish.

(c) Make the number of zpools scale logarithmically with the number of
CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
of zpools from increasing too much and close to the status quo. The
problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
will actually give a nr_zpools > nr_cpus. So we will need to come up
with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).

(d) Make the number of zpools scale linearly with memory. This makes
more sense than scaling with CPUs because increasing the number of
zpools increases fragmentation, so it makes sense to limit it by the
available memory. This is also more consistent with other magic
numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).

The problem is that unlike zswap trees, the zswap pool is not
connected to the swapfile size, so we don't have an indication for how
much memory will be in the zswap pool. We can scale the number of
zpools with the entire memory on the machine during boot, but this
seems like it would be difficult to figure out, and will not take into
consideration memory hotplugging and the zswap global limit changing.

(e) A creative mix of the above.

(f) Something else (probably simpler).

I am personally leaning toward (c), but I want to hear the opinions of
other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?

In the long-term, I think we may want to address the lock contention
in zsmalloc itself instead of zswap spawning multiple zpools.
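
To put rough numbers on the options above, here is a small stand-alone C
program (purely illustrative; the clamp to a minimum of 1 zpool is an
assumption of the example, not something proposed in this thread) that
evaluates min(32, nr_cpus), nr_cpus/4 and 4 * log2(nr_cpus) for a few CPU
counts:

/* Illustration only: candidate zpool-count formulas from the discussion. */
#include <stdio.h>

static unsigned int ilog2_u(unsigned int x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

int main(void)
{
	const unsigned int cpus[] = { 2, 4, 16, 64, 256 };

	printf("%8s %12s %12s %16s\n",
	       "nr_cpus", "min(32,n)", "n/4 (>=1)", "4*log2(n) (>=1)");
	for (unsigned int i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++) {
		unsigned int n = cpus[i];
		unsigned int n_min = n < 32 ? n : 32;     /* min(32, nr_cpus), as in the patch above */
		unsigned int n_lin = n / 4 ? n / 4 : 1;   /* option (b) */
		unsigned int n_log = 4 * ilog2_u(n);      /* option (c) */

		if (!n_log)
			n_log = 1;
		printf("%8u %12u %12u %16u\n", n, n_min, n_lin, n_log);
	}
	return 0;
}

For nr_cpus = 2 the log2 variant yields 4 zpools, more than the number of
CPUs, which is exactly the corner case mentioned for option (c).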

>
> The patch did not apply cleanly on v6.9.3 so I applied it on v6.10-rc2. dmesg of the current v6.10-rc2 run attached.
>
> Regards,
> Erhard

2024-06-05 23:59:05

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Wed, Jun 5, 2024 at 4:53 PM Yu Zhao <[email protected]> wrote:
>
> On Wed, Jun 5, 2024 at 5:42 PM Yosry Ahmed <[email protected]> wrote:
> >
> > On Wed, Jun 5, 2024 at 4:04 PM Erhard Furtner <[email protected]> wrote:
> > >
> > > On Tue, 4 Jun 2024 20:03:27 -0700
> > > Yosry Ahmed <[email protected]> wrote:
> > >
> > > > Could you check if the attached patch helps? It basically changes the
> > > > number of zpools from 32 to min(32, nr_cpus).
> > >
> > > Thanks! The patch does not fix the issue but it helps.
> > >
> > > Means I still get to see the 'kswapd0: page allocation failure' in the dmesg, a 'stress-ng-vm: page allocation failure' later on, another kswapd0 error later on, etc. _but_ the machine keeps running the workload, stays usable via VNC and I get no hard crash any longer.
> > >
> > > Without patch kswapd0 error and hard crash (need to power-cycle) <3min. With patch several kswapd0 errors but running for 2 hrs now. I double checked this to be sure.
> >
> > Thanks for trying this out. This is interesting, so even two zpools is
> > too much fragmentation for your use case.
>
> Now I'm a little bit skeptical that the problem is due to fragmentation.
>
> > I think there are multiple ways to go forward here:
> > (a) Make the number of zpools a config option, leave the default as
> > 32, but allow special use cases to set it to 1 or similar. This is
> > probably not preferable because it is not clear to users how to set
> > it, but the idea is that no one will have to set it except special use
> > cases such as Erhard's (who will want to set it to 1 in this case).
> >
> > (b) Make the number of zpools scale linearly with the number of CPUs.
> > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > approach is that with a large number of CPUs, too many zpools will
> > start having diminishing returns. Fragmentation will keep increasing,
> > while the scalability/concurrency gains will diminish.
> >
> > (c) Make the number of zpools scale logarithmically with the number of
> > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > of zpools from increasing too much and close to the status quo. The
> > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > will actually give a nr_zpools > nr_cpus. So we will need to come up
> > with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
> >
> > (d) Make the number of zpools scale linearly with memory. This makes
> > more sense than scaling with CPUs because increasing the number of
> > zpools increases fragmentation, so it makes sense to limit it by the
> > available memory. This is also more consistent with other magic
> > numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).
> >
> > The problem is that unlike zswap trees, the zswap pool is not
> > connected to the swapfile size, so we don't have an indication for how
> > much memory will be in the zswap pool. We can scale the number of
> > zpools with the entire memory on the machine during boot, but this
> > seems like it would be difficult to figure out, and will not take into
> > consideration memory hotplugging and the zswap global limit changing.
> >
> > (e) A creative mix of the above.
> >
> > (f) Something else (probably simpler).
> >
> > I am personally leaning toward (c), but I want to hear the opinions of
> > other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?
>
> I double checked that commit and didn't find anything wrong. If we are
> all in the mood of getting to the bottom, can we try using only 1
> zpool while there are 2 available? I.e.,

Erhard, do you mind checking if Yu's diff below to use a single zpool
fixes the problem completely? There is also an attached patch that
does the same thing if this is easier to apply for you.

>
> static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
> {
> - return entry->pool->zpools[hash_ptr(entry, ilog2(ZSWAP_NR_ZPOOLS))];
> + return entry->pool->zpools[0];
> }
>
> > In the long-term, I think we may want to address the lock contention
> > in zsmalloc itself instead of zswap spawning multiple zpools.
> >
> > >
> > > The patch did not apply cleanly on v6.9.3 so I applied it on v6.10-rc2. dmesg of the current v6.10-rc2 run attached.
> > >
> > > Regards,
> > > Erhard


Attachments:
0001-mm-zswap-set-ZSWAP_NR_ZPOOLS-to-1.patch (824.00 B)

2024-06-06 00:14:44

by Yu Zhao

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Wed, Jun 5, 2024 at 5:42 PM Yosry Ahmed <[email protected]> wrote:
>
> On Wed, Jun 5, 2024 at 4:04 PM Erhard Furtner <[email protected]> wrote:
> >
> > On Tue, 4 Jun 2024 20:03:27 -0700
> > Yosry Ahmed <[email protected]> wrote:
> >
> > > Could you check if the attached patch helps? It basically changes the
> > > number of zpools from 32 to min(32, nr_cpus).
> >
> > Thanks! The patch does not fix the issue but it helps.
> >
> > Means I still get to see the 'kswapd0: page allocation failure' in the dmesg, a 'stress-ng-vm: page allocation failure' later on, another kswapd0 error later on, etc. _but_ the machine keeps running the workload, stays usable via VNC and I get no hard crash any longer.
> >
> > Without patch kswapd0 error and hard crash (need to power-cycle) <3min. With patch several kswapd0 errors but running for 2 hrs now. I double checked this to be sure.
>
> Thanks for trying this out. This is interesting, so even two zpools is
> too much fragmentation for your use case.

Now I'm a little bit skeptical that the problem is due to fragmentation.

> I think there are multiple ways to go forward here:
> (a) Make the number of zpools a config option, leave the default as
> 32, but allow special use cases to set it to 1 or similar. This is
> probably not preferable because it is not clear to users how to set
> it, but the idea is that no one will have to set it except special use
> cases such as Erhard's (who will want to set it to 1 in this case).
>
> (b) Make the number of zpools scale linearly with the number of CPUs.
> Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> approach is that with a large number of CPUs, too many zpools will
> start having diminishing returns. Fragmentation will keep increasing,
> while the scalability/concurrency gains will diminish.
>
> (c) Make the number of zpools scale logarithmically with the number of
> CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> of zpools from increasing too much and close to the status quo. The
> problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> will actually give a nr_zpools > nr_cpus. So we will need to come up
> with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
>
> (d) Make the number of zpools scale linearly with memory. This makes
> more sense than scaling with CPUs because increasing the number of
> zpools increases fragmentation, so it makes sense to limit it by the
> available memory. This is also more consistent with other magic
> numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).
>
> The problem is that unlike zswap trees, the zswap pool is not
> connected to the swapfile size, so we don't have an indication for how
> much memory will be in the zswap pool. We can scale the number of
> zpools with the entire memory on the machine during boot, but this
> seems like it would be difficult to figure out, and will not take into
> consideration memory hotplugging and the zswap global limit changing.
>
> (e) A creative mix of the above.
>
> (f) Something else (probably simpler).
>
> I am personally leaning toward (c), but I want to hear the opinions of
> other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?

I double-checked that commit and didn't find anything wrong. If we are
all in the mood to get to the bottom of this, can we try using only 1
zpool while there are 2 available? I.e.,

static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
{
- return entry->pool->zpools[hash_ptr(entry, ilog2(ZSWAP_NR_ZPOOLS))];
+ return entry->pool->zpools[0];
}
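
For anyone following along: the hashed index above spreads entries across the
ZSWAP_NR_ZPOOLS zpools by hashing the entry pointer, while the one-line change
pins every entry to zpools[0]. A throwaway user-space illustration of the two
behaviours (the index below is a crude stand-in and is NOT the kernel's
hash_ptr()):

/* Illustration only: pointer-derived pool index vs. a single pinned pool. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_ZPOOLS  2
#define NR_ENTRIES 1000

int main(void)
{
	unsigned int hashed[NR_ZPOOLS] = { 0 };
	unsigned int pinned[NR_ZPOOLS] = { 0 };
	void *entries[NR_ENTRIES];

	for (int i = 0; i < NR_ENTRIES; i++) {
		entries[i] = malloc(64);	/* stand-in for a zswap_entry */
		hashed[((uintptr_t)entries[i] >> 6) % NR_ZPOOLS]++;
		pinned[0]++;
	}
	for (int i = 0; i < NR_ZPOOLS; i++)
		printf("pool %d: hashed=%u pinned=%u\n", i, hashed[i], pinned[i]);
	for (int i = 0; i < NR_ENTRIES; i++)
		free(entries[i]);
	return 0;
}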

> In the long-term, I think we may want to address the lock contention
> in zsmalloc itself instead of zswap spawning multiple zpools.
>
> >
> > The patch did not apply cleanly on v6.9.3 so I applied it on v6.10-rc2. dmesg of the current v6.10-rc2 run attached.
> >
> > Regards,
> > Erhard

2024-06-06 02:49:56

by Chengming Zhou

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On 2024/6/6 07:41, Yosry Ahmed wrote:
> On Wed, Jun 5, 2024 at 4:04 PM Erhard Furtner <[email protected]> wrote:
>>
>> On Tue, 4 Jun 2024 20:03:27 -0700
>> Yosry Ahmed <[email protected]> wrote:
>>
>>> Could you check if the attached patch helps? It basically changes the
>>> number of zpools from 32 to min(32, nr_cpus).
>>
>> Thanks! The patch does not fix the issue but it helps.
>>
>> Means I still get to see the 'kswapd0: page allocation failure' in the dmesg, a 'stress-ng-vm: page allocation failure' later on, another kswapd0 error later on, etc. _but_ the machine keeps running the workload, stays usable via VNC and I get no hard crash any longer.
>>
>> Without patch kswapd0 error and hard crash (need to power-cycle) <3min. With patch several kswapd0 errors but running for 2 hrs now. I double checked this to be sure.
>
> Thanks for trying this out. This is interesting, so even two zpools is
> too much fragmentation for your use case.
>
> I think there are multiple ways to go forward here:
> (a) Make the number of zpools a config option, leave the default as
> 32, but allow special use cases to set it to 1 or similar. This is
> probably not preferable because it is not clear to users how to set
> it, but the idea is that no one will have to set it except special use
> cases such as Erhard's (who will want to set it to 1 in this case).
>
> (b) Make the number of zpools scale linearly with the number of CPUs.
> Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> approach is that with a large number of CPUs, too many zpools will
> start having diminishing returns. Fragmentation will keep increasing,
> while the scalability/concurrency gains will diminish.
>
> (c) Make the number of zpools scale logarithmically with the number of
> CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> of zpools from increasing too much and close to the status quo. The
> problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> will actually give a nr_zpools > nr_cpus. So we will need to come up
> with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
>
> (d) Make the number of zpools scale linearly with memory. This makes
> more sense than scaling with CPUs because increasing the number of
> zpools increases fragmentation, so it makes sense to limit it by the
> available memory. This is also more consistent with other magic
> numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).
>
> The problem is that unlike zswap trees, the zswap pool is not
> connected to the swapfile size, so we don't have an indication for how
> much memory will be in the zswap pool. We can scale the number of
> zpools with the entire memory on the machine during boot, but this
> seems like it would be difficult to figure out, and will not take into
> consideration memory hotplugging and the zswap global limit changing.
>
> (e) A creative mix of the above.
>
> (f) Something else (probably simpler).
>
> I am personally leaning toward (c), but I want to hear the opinions of
> other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?
>
> In the long-term, I think we may want to address the lock contention
> in zsmalloc itself instead of zswap spawning multiple zpools.
>

Agreed, I think we should try to improve the locking scalability of zsmalloc.
I have some thoughts to share, no code or test data yet:

1. First, we can change the pool-global lock to a per-class lock, which
is more fine-grained.
2. Actually, we only need to take the per-zspage lock for malloc/free,
and only need to take the class lock when a zspage's fullness group changes.
3. If this is not enough, we can use fewer fullness groups, so the
class lock needs to be taken less often. (will need some test data)

More comments are welcome. Thanks!
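
To illustrate what point 1 above would mean in practice, here is a rough,
generic sketch of per-class locking versus a single pool-wide lock. This is
user-space pthread code with invented names; it is nothing like the real
zsmalloc structures and ignores everything else (fullness lists, migration)
that the real locks protect:

/* Generic illustration of per-class vs. pool-wide locking. */
#include <pthread.h>
#include <stdio.h>

#define NR_CLASSES 8

struct toy_class {
	pthread_mutex_t lock;		/* per-class: only same-class ops contend */
	unsigned long objs;
};

struct toy_pool {
	pthread_mutex_t pool_lock;	/* pool-wide: every op contends here */
	struct toy_class classes[NR_CLASSES];
};

static void alloc_fine_grained(struct toy_pool *p, int class_idx)
{
	struct toy_class *c = &p->classes[class_idx];

	pthread_mutex_lock(&c->lock);	/* other classes proceed in parallel */
	c->objs++;
	pthread_mutex_unlock(&c->lock);
}

static void alloc_coarse(struct toy_pool *p, int class_idx)
{
	pthread_mutex_lock(&p->pool_lock);	/* serializes all classes */
	p->classes[class_idx].objs++;
	pthread_mutex_unlock(&p->pool_lock);
}

int main(void)
{
	struct toy_pool pool;

	pthread_mutex_init(&pool.pool_lock, NULL);
	for (int i = 0; i < NR_CLASSES; i++) {
		pthread_mutex_init(&pool.classes[i].lock, NULL);
		pool.classes[i].objs = 0;
	}
	alloc_fine_grained(&pool, 3);
	alloc_coarse(&pool, 3);
	printf("class 3 objs: %lu\n", pool.classes[3].objs);
	return 0;
}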

2024-06-06 03:12:12

by Michael Ellerman

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

David Hildenbrand <[email protected]> writes:
> On 01.06.24 08:01, Yu Zhao wrote:
>> On Wed, May 15, 2024 at 4:06 PM Yu Zhao <[email protected]> wrote:
...
>>
>> Your system has 2GB memory and it uses zswap with zsmalloc (which is
>> good since it can allocate from the highmem zone) and zstd/lzo (which
>> doesn't matter much). Somehow -- I couldn't figure out why -- it
>> splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
>>
>> [ 0.000000] Zone ranges:
>> [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
>> [ 0.000000] Normal empty
>> [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
>
> That's really odd. But we are messing with "PowerMac3,6", so I don't
> really know what's right or wrong ...

The DMA zone exists because 9739ab7eda45 ("powerpc: enable a 30-bit
ZONE_DMA for 32-bit pmac") selects it.

It's 768MB (not 0.25GB) because it's clamped at max_low_pfn:

#ifdef CONFIG_ZONE_DMA
	max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
				      1UL << (zone_dma_bits - PAGE_SHIFT));
#endif

Which comes eventually from CONFIG_LOWMEM_SIZE, which defaults to 768MB.

I think it's 768MB because the user:kernel split is 3G:1G, and then the
kernel needs some of that 1G virtual space for vmalloc/ioremap/highmem,
so it splits it 768M:256M.

Then ZONE_NORMAL is empty because it is also limited to max_low_pfn:

max_zone_pfns[ZONE_NORMAL] = max_low_pfn;

The rest of RAM is highmem.
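
Plugging in the numbers (assuming zone_dma_bits = 30 for the 30-bit ZONE_DMA,
4K pages, and the 768MB CONFIG_LOWMEM_SIZE default), a quick stand-alone check
reproduces the 768MB DMA zone from the boot log:

/* Stand-alone arithmetic check of the ZONE_DMA clamping described above. */
#include <stdio.h>

int main(void)
{
	const unsigned long page_shift = 12;			/* 4K pages */
	const unsigned long zone_dma_bits = 30;			/* 30-bit ZONE_DMA */
	const unsigned long lowmem_bytes = 768UL << 20;		/* CONFIG_LOWMEM_SIZE default */
	const unsigned long max_low_pfn = lowmem_bytes >> page_shift;
	const unsigned long dma_limit_pfn = 1UL << (zone_dma_bits - page_shift);
	const unsigned long dma_pfn = max_low_pfn < dma_limit_pfn ? max_low_pfn : dma_limit_pfn;

	printf("30-bit DMA limit : %lu MB\n", (dma_limit_pfn << page_shift) >> 20);	/* 1024 */
	printf("max_low_pfn cap  : %lu MB\n", (max_low_pfn << page_shift) >> 20);	/*  768 */
	printf("ZONE_DMA size    : %lu MB\n", (dma_pfn << page_shift) >> 20);		/*  768 */
	return 0;
}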

So I think that's all behaving as expected, but I don't know 32-bit /
highmem stuff that well so I could be wrong.

cheers

2024-06-06 03:38:54

by Yu Zhao

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Wed, Jun 5, 2024 at 9:12 PM Michael Ellerman <[email protected]> wrote:
>
> David Hildenbrand <[email protected]> writes:
> > On 01.06.24 08:01, Yu Zhao wrote:
> >> On Wed, May 15, 2024 at 4:06 PM Yu Zhao <[email protected]> wrote:
> ...
> >>
> >> Your system has 2GB memory and it uses zswap with zsmalloc (which is
> >> good since it can allocate from the highmem zone) and zstd/lzo (which
> >> doesn't matter much). Somehow -- I couldn't figure out why -- it
> >> splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
> >>
> >> [ 0.000000] Zone ranges:
> >> [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> >> [ 0.000000] Normal empty
> >> [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
> >
> > That's really odd. But we are messing with "PowerMac3,6", so I don't
> > really know what's right or wrong ...
>
> The DMA zone exists because 9739ab7eda45 ("powerpc: enable a 30-bit
> ZONE_DMA for 32-bit pmac") selects it.
>
> It's 768MB (not 0.25GB) because it's clamped at max_low_pfn:

Right. (I meant 0.75GB.)

> #ifdef CONFIG_ZONE_DMA
> max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
> 1UL << (zone_dma_bits - PAGE_SHIFT));
> #endif
>
> Which comes eventually from CONFIG_LOWMEM_SIZE, which defaults to 768MB.

I see. I grep'ed VMSPLIT which is used on x86 and arm but apparently
not on powerpc.

> I think it's 768MB because the user:kernel split is 3G:1G, and then the
> kernel needs some of that 1G virtual space for vmalloc/ioremap/highmem,
> so it splits it 768M:256M.
>
> Then ZONE_NORMAL is empty because it is also limited to max_low_pfn:
>
> max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
>
> The rest of RAM is highmem.
>
> So I think that's all behaving as expected, but I don't know 32-bit /
> highmem stuff that well so I could be wrong.

Yes, the three zones work as intended.

Erhard,

Since your system only has 2GB of memory, I'd try the 2G:2G split, which
would in theory allow both the kernel and userspace to use all of it.

CONFIG_LOWMEM_SIZE_BOOL=y
CONFIG_LOWMEM_SIZE=0x7000000

(Michael, please correct me if the above wouldn't work.)

2024-06-06 04:32:16

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On (24/06/06 10:49), Chengming Zhou wrote:
> > Thanks for trying this out. This is interesting, so even two zpools is
> > too much fragmentation for your use case.
> >
> > I think there are multiple ways to go forward here:
> > (a) Make the number of zpools a config option, leave the default as
> > 32, but allow special use cases to set it to 1 or similar. This is
> > probably not preferable because it is not clear to users how to set
> > it, but the idea is that no one will have to set it except special use
> > cases such as Erhard's (who will want to set it to 1 in this case).
> >
> > (b) Make the number of zpools scale linearly with the number of CPUs.
> > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > approach is that with a large number of CPUs, too many zpools will
> > start having diminishing returns. Fragmentation will keep increasing,
> > while the scalability/concurrency gains will diminish.
> >
> > (c) Make the number of zpools scale logarithmically with the number of
> > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > of zpools from increasing too much and close to the status quo. The
> > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > will actually give a nr_zpools > nr_cpus. So we will need to come up
> > with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
> >
> > (d) Make the number of zpools scale linearly with memory. This makes
> > more sense than scaling with CPUs because increasing the number of
> > zpools increases fragmentation, so it makes sense to limit it by the
> > available memory. This is also more consistent with other magic
> > numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).
> >
> > The problem is that unlike zswap trees, the zswap pool is not
> > connected to the swapfile size, so we don't have an indication for how
> > much memory will be in the zswap pool. We can scale the number of
> > zpools with the entire memory on the machine during boot, but this
> > seems like it would be difficult to figure out, and will not take into
> > consideration memory hotplugging and the zswap global limit changing.
> >
> > (e) A creative mix of the above.
> >
> > (f) Something else (probably simpler).
> >
> > I am personally leaning toward (c), but I want to hear the opinions of
> > other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?
> >
> > In the long-term, I think we may want to address the lock contention
> > in zsmalloc itself instead of zswap spawning multiple zpools.

Sorry, I'm sure I'm not following this discussion closely enough,
has the lock contention been demonstrated/proved somehow? lock-stats?

> Agree, I think we should try to improve locking scalability of zsmalloc.
> I have some thoughts to share, no code or test data yet:
>
> 1. First, we can change the pool global lock to per-class lock, which
> is more fine-grained.

Commit c0547d0b6a4b6 "zsmalloc: consolidate zs_pool's migrate_lock
and size_class's locks" [1] claimed no significant difference
between class->lock and pool->lock.

[1] https://lkml.kernel.org/r/[email protected]

2024-06-06 04:47:18

by Chengming Zhou

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On 2024/6/6 12:31, Sergey Senozhatsky wrote:
> On (24/06/06 10:49), Chengming Zhou wrote:
>>> Thanks for trying this out. This is interesting, so even two zpools is
>>> too much fragmentation for your use case.
>>>
>>> I think there are multiple ways to go forward here:
>>> (a) Make the number of zpools a config option, leave the default as
>>> 32, but allow special use cases to set it to 1 or similar. This is
>>> probably not preferable because it is not clear to users how to set
>>> it, but the idea is that no one will have to set it except special use
>>> cases such as Erhard's (who will want to set it to 1 in this case).
>>>
>>> (b) Make the number of zpools scale linearly with the number of CPUs.
>>> Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
>>> approach is that with a large number of CPUs, too many zpools will
>>> start having diminishing returns. Fragmentation will keep increasing,
>>> while the scalability/concurrency gains will diminish.
>>>
>>> (c) Make the number of zpools scale logarithmically with the number of
>>> CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
>>> of zpools from increasing too much and close to the status quo. The
>>> problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
>>> will actually give a nr_zpools > nr_cpus. So we will need to come up
>>> with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
>>>
>>> (d) Make the number of zpools scale linearly with memory. This makes
>>> more sense than scaling with CPUs because increasing the number of
>>> zpools increases fragmentation, so it makes sense to limit it by the
>>> available memory. This is also more consistent with other magic
>>> numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).
>>>
>>> The problem is that unlike zswap trees, the zswap pool is not
>>> connected to the swapfile size, so we don't have an indication for how
>>> much memory will be in the zswap pool. We can scale the number of
>>> zpools with the entire memory on the machine during boot, but this
>>> seems like it would be difficult to figure out, and will not take into
>>> consideration memory hotplugging and the zswap global limit changing.
>>>
>>> (e) A creative mix of the above.
>>>
>>> (f) Something else (probably simpler).
>>>
>>> I am personally leaning toward (c), but I want to hear the opinions of
>>> other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?
>>>
>>> In the long-term, I think we may want to address the lock contention
>>> in zsmalloc itself instead of zswap spawning multiple zpools.
>
> Sorry, I'm sure I'm not following this discussion closely enough,
> has the lock contention been demonstrated/proved somehow? lock-stats?

Yosry has some stats in his commit b8cf32dc6e8c ("mm: zswap: multiple zpools support"),
and I have also seen some locking contention when using zram to test kernel building,
since zram still has only one pool.

>
>> Agree, I think we should try to improve locking scalability of zsmalloc.
>> I have some thoughts to share, no code or test data yet:
>>
>> 1. First, we can change the pool global lock to per-class lock, which
>> is more fine-grained.
>
> Commit c0547d0b6a4b6 "zsmalloc: consolidate zs_pool's migrate_lock
> and size_class's locks" [1] claimed no significant difference
> between class->lock and pool->lock.

Ok, I haven't looked into the history much; that seems to have been preparation
for introducing reclaim into zsmalloc? Not sure. But now that the reclaim code
in zsmalloc is gone, should we change back to the per-class lock, which is
obviously more fine-grained than the pool lock? Actually, I have just done it,
will test to get some data later.

Thanks.

>
> [1] https://lkml.kernel.org/r/[email protected]

2024-06-06 05:43:49

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On (24/06/06 12:46), Chengming Zhou wrote:
> >> Agree, I think we should try to improve locking scalability of zsmalloc.
> >> I have some thoughts to share, no code or test data yet:
> >>
> >> 1. First, we can change the pool global lock to per-class lock, which
> >> is more fine-grained.
> >
> > Commit c0547d0b6a4b6 "zsmalloc: consolidate zs_pool's migrate_lock
> > and size_class's locks" [1] claimed no significant difference
> > between class->lock and pool->lock.
>
> Ok, I haven't looked into the history much, that seems preparation of trying
> to introduce reclaim in the zsmalloc? Not sure. But now with the reclaim code
> in zsmalloc has gone, should we change back to the per-class lock? Which is

Well, the point that commit made was that Nhat (and Johannes?) were
unable to detect any impact of pool->lock on a variety of cases. So
we went on with code simplification.

> obviously more fine-grained than the pool lock. Actually, I have just done it,
> will test to get some data later.

Thanks, we'll need data on this. I'm happy to take the patch, but
jumping back and forth between class->lock and pool->lock merely
"for obvious reasons" is not what I'm extremely excited about.

2024-06-06 05:56:52

by Chengming Zhou

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On 2024/6/6 13:43, Sergey Senozhatsky wrote:
> On (24/06/06 12:46), Chengming Zhou wrote:
>>>> Agree, I think we should try to improve locking scalability of zsmalloc.
>>>> I have some thoughts to share, no code or test data yet:
>>>>
>>>> 1. First, we can change the pool global lock to per-class lock, which
>>>> is more fine-grained.
>>>
>>> Commit c0547d0b6a4b6 "zsmalloc: consolidate zs_pool's migrate_lock
>>> and size_class's locks" [1] claimed no significant difference
>>> between class->lock and pool->lock.
>>
>> Ok, I haven't looked into the history much, that seems preparation of trying
>> to introduce reclaim in the zsmalloc? Not sure. But now with the reclaim code
>> in zsmalloc has gone, should we change back to the per-class lock? Which is
>
> Well, the point that commit made was that Nhat (and Johannes?) were
> unable to detect any impact of pool->lock on a variety of cases. So
> we went on with code simplification.

Right, the code is simpler.

>
>> obviously more fine-grained than the pool lock. Actually, I have just done it,
>> will test to get some data later.
>
> Thanks, we'll need data on this. I'm happy to take the patch, but
> jumping back and forth between class->lock and pool->lock merely
> "for obvious reasons" is not what I'm extremely excited about.

Yeah, agree, we need test data.

2024-06-06 07:25:14

by Vlastimil Babka (SUSE)

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On 6/6/24 1:41 AM, Yosry Ahmed wrote:
> On Wed, Jun 5, 2024 at 4:04 PM Erhard Furtner <[email protected]> wrote:
>
> I am personally leaning toward (c), but I want to hear the opinions of
> other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?

Setting aside the zpool commit, which might have just pushed the machine over
the edge (it was probably close to it already), I've noticed a more general
problem: there are GFP_KERNEL allocations failing from kswapd. Those
could probably use __GFP_NOMEMALLOC (or a scoped variant, is there one?)
since it's the case of "allocating memory to free memory". Or use mempools
if the progress (success will lead to freeing memory) is really guaranteed.

Another interesting data point could be to see if traditional reclaim
behaves any better on this machine than MGLRU. I saw in the config:

CONFIG_LRU_GEN=y
CONFIG_LRU_GEN_ENABLED=y

So disabling at least the second one would revert to the traditional reclaim
and we could see if it handles such a constrained system better or not.

> In the long-term, I think we may want to address the lock contention
> in zsmalloc itself instead of zswap spawning multiple zpools.
>
>>
>> The patch did not apply cleanly on v6.9.3 so I applied it on v6.10-rc2. dmesg of the current v6.10-rc2 run attached.
>>
>> Regards,
>> Erhard
>


2024-06-06 12:08:52

by Michael Ellerman

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

Yu Zhao <[email protected]> writes:
> On Wed, Jun 5, 2024 at 9:12 PM Michael Ellerman <[email protected]> wrote:
>>
>> David Hildenbrand <[email protected]> writes:
>> > On 01.06.24 08:01, Yu Zhao wrote:
>> >> On Wed, May 15, 2024 at 4:06 PM Yu Zhao <[email protected]> wrote:
>> ...
>> >>
>> >> Your system has 2GB memory and it uses zswap with zsmalloc (which is
>> >> good since it can allocate from the highmem zone) and zstd/lzo (which
>> >> doesn't matter much). Somehow -- I couldn't figure out why -- it
>> >> splits the 2GB into a 0.25GB DMA zone and a 1.75GB highmem zone:
>> >>
>> >> [ 0.000000] Zone ranges:
>> >> [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
>> >> [ 0.000000] Normal empty
>> >> [ 0.000000] HighMem [mem 0x0000000030000000-0x000000007fffffff]
>> >
>> > That's really odd. But we are messing with "PowerMac3,6", so I don't
>> > really know what's right or wrong ...
>>
>> The DMA zone exists because 9739ab7eda45 ("powerpc: enable a 30-bit
>> ZONE_DMA for 32-bit pmac") selects it.
>>
>> It's 768MB (not 0.25GB) because it's clamped at max_low_pfn:
>
> Right. (I meant 0.75GB.)
>
>> #ifdef CONFIG_ZONE_DMA
>> max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
>> 1UL << (zone_dma_bits - PAGE_SHIFT));
>> #endif
>>
>> Which comes eventually from CONFIG_LOWMEM_SIZE, which defaults to 768MB.
>
> I see. I grep'ed VMSPLIT which is used on x86 and arm but apparently
> not on powerpc.

Those VMSPLIT configs are nice, on powerpc it's all done manually :}

>> I think it's 768MB because the user:kernel split is 3G:1G, and then the
>> kernel needs some of that 1G virtual space for vmalloc/ioremap/highmem,
>> so it splits it 768M:256M.
>>
>> Then ZONE_NORMAL is empty because it is also limited to max_low_pfn:
>>
>> max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
>>
>> The rest of RAM is highmem.
>>
>> So I think that's all behaving as expected, but I don't know 32-bit /
>> highmem stuff that well so I could be wrong.
>
> Yes, the three zones work as intended.
>
> Erhard,
>
> Since your system only has 2GB memory, I'd try the 2G:2G split, which
> would in theory allow both the kernel and userspace to all memory.
>
> CONFIG_LOWMEM_SIZE_BOOL=y
> CONFIG_LOWMEM_SIZE=0x7000000
>
> (Michael, please correct me if the above wouldn't work.)

It's a bit more complicated; in order to increase LOWMEM_SIZE you need
to adjust all the other variables to make space.

To get 2G of user virtual space I think you need:

CONFIG_ADVANCED_OPTIONS=y
CONFIG_LOWMEM_SIZE_BOOL=y
CONFIG_LOWMEM_SIZE=0x60000000
CONFIG_PAGE_OFFSET_BOOL=y
CONFIG_PAGE_OFFSET=0x90000000
CONFIG_KERNEL_START_BOOL=y
CONFIG_KERNEL_START=0x90000000
CONFIG_PHYSICAL_START=0x00000000
CONFIG_TASK_SIZE_BOOL=y
CONFIG_TASK_SIZE=0x80000000

Which results in 1.5GB of lowmem.

Or if you want to map all 2G of RAM directly in the kernel without
highmem, but limit user virtual space to 1.5G:

CONFIG_ADVANCED_OPTIONS=y
CONFIG_LOWMEM_SIZE_BOOL=y
CONFIG_LOWMEM_SIZE=0x80000000
CONFIG_PAGE_OFFSET_BOOL=y
CONFIG_PAGE_OFFSET=0x70000000
CONFIG_KERNEL_START_BOOL=y
CONFIG_KERNEL_START=0x70000000
CONFIG_PHYSICAL_START=0x00000000
CONFIG_TASK_SIZE_BOOL=y
CONFIG_TASK_SIZE=0x60000000

You can also reclaim another 256MB of virtual space if you disable
CONFIG_MODULES.
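
For reference, a quick stand-alone calculation of what those two layouts imply,
derived purely from the hex values above (the 256MB gap between TASK_SIZE and
PAGE_OFFSET is presumably the module area just mentioned, and the leftover
kernel virtual space is roughly what remains for vmalloc/ioremap):

/* Derive the splits implied by the two configs above; arithmetic only. */
#include <stdio.h>

static void show(const char *name, unsigned long long task_size,
		 unsigned long long page_offset, unsigned long long lowmem)
{
	printf("%s:\n", name);
	printf("  user virtual space (TASK_SIZE)      : %4llu MB\n", task_size >> 20);
	printf("  gap below PAGE_OFFSET (modules etc.): %4llu MB\n",
	       (page_offset - task_size) >> 20);
	printf("  directly mapped lowmem (LOWMEM_SIZE): %4llu MB\n", lowmem >> 20);
	printf("  remaining kernel virtual space      : %4llu MB\n",
	       (0x100000000ULL - page_offset - lowmem) >> 20);
}

int main(void)
{
	show("2G user / 1.5G lowmem", 0x80000000ULL, 0x90000000ULL, 0x60000000ULL);
	show("1.5G user / 2G lowmem, no highmem", 0x60000000ULL, 0x70000000ULL, 0x80000000ULL);
	return 0;
}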

Those configs do boot on qemu. But I don't have easy access to my 32-bit
machine to test if they boot on actual hardware.

cheers

2024-06-06 13:32:49

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Thu, 6 Jun 2024 09:24:56 +0200
"Vlastimil Babka (SUSE)" <[email protected]> wrote:

> Besides the zpool commit which might have just pushed the machine over the
> edge, but it was probably close to it already. I've noticed a more general
> problem that there are GFP_KERNEL allocations failing from kswapd. Those
> could probably use be __GFP_NOMEMALLOC (or scoped variant, is there one?)
> since it's the case of "allocating memory to free memory". Or use mempools
> if the progress (success will lead to freeing memory) is really guaranteed.
>
> Another interesting data point could be to see if traditional reclaim
> behaves any better on this machine than MGLRU. I saw in the config:
>
> CONFIG_LRU_GEN=y
> CONFIG_LRU_GEN_ENABLED=y
>
> So disabling at least the second one would revert to the traditional reclaim
> and we could see if it handles such a constrained system better or not.

I set RANDOM_KMALLOC_CACHES=n and LRU_GEN_ENABLED=n but still hit the issue.

dmesg looks a bit different (unpatched v6.10-rc2).

Regards,
Erhard


Attachments:
(No filename) (1.04 kB)
dmesg_610-rc2_g4_lru (41.87 kB)
Download all attachments

2024-06-06 13:38:14

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Wed, 5 Jun 2024 16:58:11 -0700
Yosry Ahmed <[email protected]> wrote:

> On Wed, Jun 5, 2024 at 4:53 PM Yu Zhao <[email protected]> wrote:
> >
> > On Wed, Jun 5, 2024 at 5:42 PM Yosry Ahmed <[email protected]> wrote:
> > >
> > > On Wed, Jun 5, 2024 at 4:04 PM Erhard Furtner <[email protected]> wrote:
> > > >
> > > > On Tue, 4 Jun 2024 20:03:27 -0700
> > > > Yosry Ahmed <[email protected]> wrote:
> > > >
> > > > > Could you check if the attached patch helps? It basically changes the
> > > > > number of zpools from 32 to min(32, nr_cpus).
> > > >
> > > > Thanks! The patch does not fix the issue but it helps.
> > > >
> > > > Means I still get to see the 'kswapd0: page allocation failure' in the dmesg, a 'stress-ng-vm: page allocation failure' later on, another kswapd0 error later on, etc. _but_ the machine keeps running the workload, stays usable via VNC and I get no hard crash any longer.
> > > >
> > > > Without patch kswapd0 error and hard crash (need to power-cycle) <3min. With patch several kswapd0 errors but running for 2 hrs now. I double checked this to be sure.
> > >
> > > Thanks for trying this out. This is interesting, so even two zpools is
> > > too much fragmentation for your use case.
> >
> > Now I'm a little bit skeptical that the problem is due to fragmentation.
> >
> > > I think there are multiple ways to go forward here:
> > > (a) Make the number of zpools a config option, leave the default as
> > > 32, but allow special use cases to set it to 1 or similar. This is
> > > probably not preferable because it is not clear to users how to set
> > > it, but the idea is that no one will have to set it except special use
> > > cases such as Erhard's (who will want to set it to 1 in this case).
> > >
> > > (b) Make the number of zpools scale linearly with the number of CPUs.
> > > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > > approach is that with a large number of CPUs, too many zpools will
> > > start having diminishing returns. Fragmentation will keep increasing,
> > > while the scalability/concurrency gains will diminish.
> > >
> > > (c) Make the number of zpools scale logarithmically with the number of
> > > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > > of zpools from increasing too much and close to the status quo. The
> > > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > > will actually give a nr_zpools > nr_cpus. So we will need to come up
> > > with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
> > >
> > > (d) Make the number of zpools scale linearly with memory. This makes
> > > more sense than scaling with CPUs because increasing the number of
> > > zpools increases fragmentation, so it makes sense to limit it by the
> > > available memory. This is also more consistent with other magic
> > > numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).
> > >
> > > The problem is that unlike zswap trees, the zswap pool is not
> > > connected to the swapfile size, so we don't have an indication for how
> > > much memory will be in the zswap pool. We can scale the number of
> > > zpools with the entire memory on the machine during boot, but this
> > > seems like it would be difficult to figure out, and will not take into
> > > consideration memory hotplugging and the zswap global limit changing.
> > >
> > > (e) A creative mix of the above.
> > >
> > > (f) Something else (probably simpler).
> > >
> > > I am personally leaning toward (c), but I want to hear the opinions of
> > > other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?
> >
> > I double checked that commit and didn't find anything wrong. If we are
> > all in the mood of getting to the bottom, can we try using only 1
> > zpool while there are 2 available? I.e.,
>
> Erhard, do you mind checking if Yu's diff below to use a single zpool
> fixes the problem completely? There is also an attached patch that
> does the same thing if this is easier to apply for you.

No, setting ZSWAP_NR_ZPOOLS to 1 does not fix the problem unfortunately (that being the only patch applied on v6.10-rc2).

Trying to alter the lowmem and virtual mem limits next as Michael suggested.

Regards,
Erhard

2024-06-06 16:06:35

by Erhard Furtner

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Thu, 06 Jun 2024 22:08:40 +1000
Michael Ellerman <[email protected]> wrote:

> It's a bit more complicated, in order to increase LOWMEM_SIZE you need
> to adjust all the other variables to make space.
>
> To get 2G of user virtual space I think you need:
>
> CONFIG_ADVANCED_OPTIONS=y
> CONFIG_LOWMEM_SIZE_BOOL=y
> CONFIG_LOWMEM_SIZE=0x60000000
> CONFIG_PAGE_OFFSET_BOOL=y
> CONFIG_PAGE_OFFSET=0x90000000
> CONFIG_KERNEL_START_BOOL=y
> CONFIG_KERNEL_START=0x90000000
> CONFIG_PHYSICAL_START=0x00000000
> CONFIG_TASK_SIZE_BOOL=y
> CONFIG_TASK_SIZE=0x80000000
>
> Which results in 1.5GB of lowmem.

Booting this config on the G4 worked, but the issue showed up anyhow.

> Or if you want to map all 2G of RAM directly in the kernel without
> highmem, but limit user virtual space to 1.5G:
>
> CONFIG_ADVANCED_OPTIONS=y
> CONFIG_LOWMEM_SIZE_BOOL=y
> CONFIG_LOWMEM_SIZE=0x80000000
> CONFIG_PAGE_OFFSET_BOOL=y
> CONFIG_PAGE_OFFSET=0x70000000
> CONFIG_KERNEL_START_BOOL=y
> CONFIG_KERNEL_START=0x70000000
> CONFIG_PHYSICAL_START=0x00000000
> CONFIG_TASK_SIZE_BOOL=y
> CONFIG_TASK_SIZE=0x60000000

This actually did the trick!

Also I disabled highmem via HIGHMEM=n. With this config the machine did run the "stress-ng --vm 2 --vm-bytes 1930M --verify -v" load for about 2 hrs without hitting the issue.

> You can also reclaim another 256MB of virtual space if you disable
> CONFIG_MODULES.

Did not try that 'cause the 2nd config worked.

Working 2G_no-highmem .config and the dmesgs of both configs are attached.

Regards,
Erhard


Attachments:
(No filename) (1.55 kB)
dmesg_610-rc2_g4_2G_no-highmem (38.50 kB)
config_610-rc2_g4_2G_no-highmem (101.82 kB)
dmesg_610-rc2_g4_1.5G_lowmem (44.46 kB)
Download all attachments

2024-06-06 16:43:15

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Thu, Jun 6, 2024 at 6:28 AM Erhard Furtner <[email protected]> wrote:
>
> On Wed, 5 Jun 2024 16:58:11 -0700
> Yosry Ahmed <[email protected]> wrote:
>
> > On Wed, Jun 5, 2024 at 4:53 PM Yu Zhao <[email protected]> wrote:
> > >
> > > On Wed, Jun 5, 2024 at 5:42 PM Yosry Ahmed <[email protected]> wrote:
> > > >
> > > > On Wed, Jun 5, 2024 at 4:04 PM Erhard Furtner <[email protected]> wrote:
> > > > >
> > > > > On Tue, 4 Jun 2024 20:03:27 -0700
> > > > > Yosry Ahmed <[email protected]> wrote:
> > > > >
> > > > > > Could you check if the attached patch helps? It basically changes the
> > > > > > number of zpools from 32 to min(32, nr_cpus).
> > > > >
> > > > > Thanks! The patch does not fix the issue but it helps.
> > > > >
> > > > > Means I still get to see the 'kswapd0: page allocation failure' in the dmesg, a 'stress-ng-vm: page allocation failure' later on, another kswapd0 error later on, etc. _but_ the machine keeps running the workload, stays usable via VNC and I get no hard crash any longer.
> > > > >
> > > > > Without patch kswapd0 error and hard crash (need to power-cycle) <3min. With patch several kswapd0 errors but running for 2 hrs now. I double checked this to be sure.
> > > >
> > > > Thanks for trying this out. This is interesting, so even two zpools is
> > > > too much fragmentation for your use case.
> > >
> > > Now I'm a little bit skeptical that the problem is due to fragmentation.
> > >
> > > > I think there are multiple ways to go forward here:
> > > > (a) Make the number of zpools a config option, leave the default as
> > > > 32, but allow special use cases to set it to 1 or similar. This is
> > > > probably not preferable because it is not clear to users how to set
> > > > it, but the idea is that no one will have to set it except special use
> > > > cases such as Erhard's (who will want to set it to 1 in this case).
> > > >
> > > > (b) Make the number of zpools scale linearly with the number of CPUs.
> > > > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > > > approach is that with a large number of CPUs, too many zpools will
> > > > start having diminishing returns. Fragmentation will keep increasing,
> > > > while the scalability/concurrency gains will diminish.
> > > >
> > > > (c) Make the number of zpools scale logarithmically with the number of
> > > > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > > > of zpools from increasing too much and close to the status quo. The
> > > > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > > > will actually give a nr_zpools > nr_cpus. So we will need to come up
> > > > with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
> > > >
> > > > (d) Make the number of zpools scale linearly with memory. This makes
> > > > more sense than scaling with CPUs because increasing the number of
> > > > zpools increases fragmentation, so it makes sense to limit it by the
> > > > available memory. This is also more consistent with other magic
> > > > numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).
> > > >
> > > > The problem is that unlike zswap trees, the zswap pool is not
> > > > connected to the swapfile size, so we don't have an indication for how
> > > > much memory will be in the zswap pool. We can scale the number of
> > > > zpools with the entire memory on the machine during boot, but this
> > > > seems like it would be difficult to figure out, and will not take into
> > > > consideration memory hotplugging and the zswap global limit changing.
> > > >
> > > > (e) A creative mix of the above.
> > > >
> > > > (f) Something else (probably simpler).
> > > >
> > > > I am personally leaning toward (c), but I want to hear the opinions of
> > > > other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?
> > >
> > > I double checked that commit and didn't find anything wrong. If we are
> > > all in the mood of getting to the bottom, can we try using only 1
> > > zpool while there are 2 available? I.e.,
> >
> > Erhard, do you mind checking if Yu's diff below to use a single zpool
> > fixes the problem completely? There is also an attached patch that
> > does the same thing if this is easier to apply for you.
>
> No, setting ZSWAP_NR_ZPOOLS to 1 does not fix the problem unfortunately (that being the only patch applied on v6.10-rc2).

This confirms Yu's theory that the zpools fragmentation is not the
main reason for the problem. As Vlastimil said, the setup is already
tight on memory and that commit may have just pushed it over the edge.
Since setting ZSWAP_NR_ZPOOLS to 1 (which effectively reverts the
commit) does not help in v6.10-rc2, something else that came after the
commit would have pushed it over the edge anyway.

>
> Trying to alter the lowmem and virtual mem limits next as Michael suggested.

I saw that this worked. So it seems like we don't need to worry about
the number of zpools, for now at least :)

Thanks for helping with the testing, and thanks to everyone else who
helped on this thread.

>
> Regards,
> Erhard

2024-06-06 16:53:53

by Vlastimil Babka (SUSE)

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On 6/6/24 3:32 PM, Erhard Furtner wrote:
> On Thu, 6 Jun 2024 09:24:56 +0200
> "Vlastimil Babka (SUSE)" <[email protected]> wrote:
>
>> Besides the zpool commit which might have just pushed the machine over the
>> edge, but it was probably close to it already. I've noticed a more general
>> problem that there are GFP_KERNEL allocations failing from kswapd. Those
>> could probably use be __GFP_NOMEMALLOC (or scoped variant, is there one?)
>> since it's the case of "allocating memory to free memory". Or use mempools
>> if the progress (success will lead to freeing memory) is really guaranteed.
>>
>> Another interesting data point could be to see if traditional reclaim
>> behaves any better on this machine than MGLRU. I saw in the config:
>>
>> CONFIG_LRU_GEN=y
>> CONFIG_LRU_GEN_ENABLED=y
>>
>> So disabling at least the second one would revert to the traditional reclaim
>> and we could see if it handles such a constrained system better or not.
>
> I set RANDOM_KMALLOC_CACHES=n and LRU_GEN_ENABLED=n but still hit the issue.
>
> dmesg looks a bit different (unpatched v6.10-rc2).

What caught my eye (it's also in some of the previous dmesgs with MGLRU) is
that in one case there's:

DMA free:0kB

That means many allocations went through that are allowed to just ignore all
reserves, and depleted everything. That would mean __GFP_MEMALLOC or
PF_MEMALLOC, which I suggested earlier for the GFP_KERNEL failure, is being
used somewhere, but not leading to the expected memory freeing.

> Regards,
> Erhard


2024-06-06 17:21:34

by Takero Funaki

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Thu, Jun 6, 2024 at 8:42, Yosry Ahmed <[email protected]> wrote:

> I think there are multiple ways to go forward here:
> (a) Make the number of zpools a config option, leave the default as
> 32, but allow special use cases to set it to 1 or similar. This is
> probably not preferable because it is not clear to users how to set
> it, but the idea is that no one will have to set it except special use
> cases such as Erhard's (who will want to set it to 1 in this case).
>
> (b) Make the number of zpools scale linearly with the number of CPUs.
> Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> approach is that with a large number of CPUs, too many zpools will
> start having diminishing returns. Fragmentation will keep increasing,
> while the scalability/concurrency gains will diminish.
>
> (c) Make the number of zpools scale logarithmically with the number of
> CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> of zpools from increasing too much and close to the status quo. The
> problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> will actually give a nr_zpools > nr_cpus. So we will need to come up
> with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
>

I just posted a patch to limit the number of zpools, with some
theoretical background explained in the code comments. I believe that
scaling linearly at 2 * nr_cpus is sufficient to reduce contention, but the
scale could be reduced further; it is unlikely that all CPUs are trying to
allocate/free zswap entries at the same time.
How many concurrent accesses were the original 32 zpools supposed to
handle? I think it was for 16 CPUs or more, or would nr_cpus/4 be
enough?

--

<[email protected]>

2024-06-06 17:46:11

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Thu, Jun 6, 2024 at 10:14 AM Takero Funaki <[email protected]> wrote:
>
> On Thu, Jun 6, 2024 at 8:42, Yosry Ahmed <[email protected]> wrote:
>
> > I think there are multiple ways to go forward here:
> > (a) Make the number of zpools a config option, leave the default as
> > 32, but allow special use cases to set it to 1 or similar. This is
> > probably not preferable because it is not clear to users how to set
> > it, but the idea is that no one will have to set it except special use
> > cases such as Erhard's (who will want to set it to 1 in this case).
> >
> > (b) Make the number of zpools scale linearly with the number of CPUs.
> > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > approach is that with a large number of CPUs, too many zpools will
> > start having diminishing returns. Fragmentation will keep increasing,
> > while the scalability/concurrency gains will diminish.
> >
> > (c) Make the number of zpools scale logarithmically with the number of
> > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > of zpools from increasing too much and close to the status quo. The
> > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > will actually give a nr_zpools > nr_cpus. So we will need to come up
> > with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
> >
>
> I just posted a patch to limit the number of zpools, with some
> theoretical background explained in the code comments. I believe that
> 2 * CPU linearly is sufficient to reduce contention, but the scale can
> be reduced further. All CPUs are trying to allocate/free zswap is
> unlikely to happen.
> How many concurrent accesses were the original 32 zpools supposed to
> handle? I think it was for 16 cpu or more. or nr_cpus/4 would be
> enough?

We use 32 zpools on machines with 100s of CPUs. Two zpools per CPU is
an overkill imo.

I have further comments that I will leave on the patch, but I mainly
think this should be driven by real data, not theoretical possibility
of lock contention.

>
> --
>
> <[email protected]>

2024-06-06 17:55:49

by Yu Zhao

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Thu, Jun 6, 2024 at 11:42 AM Yosry Ahmed <[email protected]> wrote:
>
> On Thu, Jun 6, 2024 at 10:14 AM Takero Funaki <[email protected]> wrote:
> >
> > 2024年6月6日(木) 8:42 Yosry Ahmed <[email protected]>:
> >
> > > I think there are multiple ways to go forward here:
> > > (a) Make the number of zpools a config option, leave the default as
> > > 32, but allow special use cases to set it to 1 or similar. This is
> > > probably not preferable because it is not clear to users how to set
> > > it, but the idea is that no one will have to set it except special use
> > > cases such as Erhard's (who will want to set it to 1 in this case).
> > >
> > > (b) Make the number of zpools scale linearly with the number of CPUs.
> > > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > > approach is that with a large number of CPUs, too many zpools will
> > > start having diminishing returns. Fragmentation will keep increasing,
> > > while the scalability/concurrency gains will diminish.
> > >
> > > (c) Make the number of zpools scale logarithmically with the number of
> > > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > > of zpools from increasing too much and close to the status quo. The
> > > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > > will actually give a nr_zpools > nr_cpus. So we will need to come up
> > > with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
> > >
> >
> > I just posted a patch to limit the number of zpools, with some
> > theoretical background explained in the code comments. I believe that
> > scaling linearly at 2 * nr_cpus is sufficient to reduce contention,
> > though the factor could be reduced further; it is unlikely that all
> > CPUs allocate/free zswap entries at the same time.
> > How many concurrent accesses were the original 32 zpools meant to
> > handle? I assume they were sized for 16 CPUs or more; would nr_cpus/4
> > be enough?
>
> We use 32 zpools on machines with 100s of CPUs. Two zpools per CPU is
> overkill, imo.

Not to choose a camp; just a friendly note on why I strongly disagree
with the N zpools per CPU approach:
1. It is fundamentally flawed to assume the system is linear;
2. Nonlinear systems usually have diminishing returns.

For Google data centers, using nr_cpus as the scaling factor had long
passed the acceptable ROI threshold. Per-CPU data, especially when
compounded per memcg or even per process, is probably the number-one
overhead in terms of DRAM efficiency.



> I have further comments that I will leave on the patch, but I mainly
> think this should be driven by real data, not by the theoretical
> possibility of lock contention.

2024-06-06 18:04:02

by Yosry Ahmed

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Thu, Jun 6, 2024 at 10:55 AM Yu Zhao <[email protected]> wrote:
>
> On Thu, Jun 6, 2024 at 11:42 AM Yosry Ahmed <[email protected]> wrote:
> >
> > On Thu, Jun 6, 2024 at 10:14 AM Takero Funaki <[email protected]> wrote:
> > >
> > > 2024年6月6日(木) 8:42 Yosry Ahmed <[email protected]>:
> > >
> > > > I think there are multiple ways to go forward here:
> > > > (a) Make the number of zpools a config option, leave the default as
> > > > 32, but allow special use cases to set it to 1 or similar. This is
> > > > probably not preferable because it is not clear to users how to set
> > > > it, but the idea is that no one will have to set it except special use
> > > > cases such as Erhard's (who will want to set it to 1 in this case).
> > > >
> > > > (b) Make the number of zpools scale linearly with the number of CPUs.
> > > > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > > > approach is that with a large number of CPUs, too many zpools will
> > > > start having diminishing returns. Fragmentation will keep increasing,
> > > > while the scalability/concurrency gains will diminish.
> > > >
> > > > (c) Make the number of zpools scale logarithmically with the number of
> > > > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > > > of zpools from increasing too much and close to the status quo. The
> > > > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > > > will actually give a nr_zpools > nr_cpus. So we will need to come up
> > > > with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
> > > >
> > >
> > > I just posted a patch to limit the number of zpools, with some
> > > theoretical background explained in the code comments. I believe that
> > > scaling linearly at 2 * nr_cpus is sufficient to reduce contention,
> > > though the factor could be reduced further; it is unlikely that all
> > > CPUs allocate/free zswap entries at the same time.
> > > How many concurrent accesses were the original 32 zpools meant to
> > > handle? I assume they were sized for 16 CPUs or more; would nr_cpus/4
> > > be enough?
> >
> > We use 32 zpools on machines with 100s of CPUs. Two zpools per CPU is
> > overkill, imo.
>
> Not to choose a camp; just a friendly note on why I strongly disagree
> with the N zpools per CPU approach:
> 1. It is fundamentally flawed to assume the system is linear;
> 2. Nonlinear systems usually have diminishing returns.
>
> For Google data centers, using nr_cpus as the scaling factor had long
> passed the acceptable ROI threshold. Per-CPU data, especially when
> compounded per memcg or even per process, is probably the number-one
> overhead in terms of DRAM efficiency.

100% agreed. If you look at option (b) above, I specifically called
out that scaling the number of zpools linearly with the number of
CPUs has diminishing returns :)

2024-06-07 09:41:22

by Nhat Pham

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On Thu, Jun 6, 2024 at 6:43 AM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (24/06/06 12:46), Chengming Zhou wrote:
> > >> Agree, I think we should try to improve locking scalability of zsmalloc.
> > >> I have some thoughts to share, no code or test data yet:
> > >>
> > >> 1. First, we can change the pool global lock to per-class lock, which
> > >> is more fine-grained.
> > >
> > > Commit c0547d0b6a4b6 "zsmalloc: consolidate zs_pool's migrate_lock
> > > and size_class's locks" [1] claimed no significant difference
> > > between class->lock and pool->lock.
> >
> > Ok, I haven't looked into the history much, that seems preparation of trying
> > to introduce reclaim in the zsmalloc? Not sure. But now with the reclaim code
> > in zsmalloc has gone, should we change back to the per-class lock? Which is
>
> Well, the point that commit made was that Nhat (and Johannes?) were
> unable to detect any impact of pool->lock on a variety of cases. So
> we went on with code simplification.

Yeah, we benchmarked it before zsmalloc writeback was introduced (the
patch removing the class lock was a prep patch in that series). We
weren't able to detect any regression at the time from just using a
global pool lock.

>
> > obviously more fine-grained than the pool lock. Actually, I have just done it,
> > will test to get some data later.
>
> Thanks, we'll need data on this. I'm happy to take the patch, but
> jumping back and forth between class->lock and pool->lock merely
> "for obvious reasons" is not what I'm extremely excited about.

FWIW, I do think it'd be nice if we could make the locking more
granular - the pool lock is now essentially a global lock, and we're
just getting around that by replicating the (z)pools themselves.

Personally, I'm not super convinced about class locks. We're
essentially relying on the post-compression size of the data to
load-balance the queries - I can imagine a scenario where a workload
has a concentrated distribution of post-compression data (i.e. its
pages are compressed to similar-ish sizes), and we're once again
contending for a few locks.

That said, I'll let the data tell the story :) We don't need a perfect
solution, just a good enough solution for now.
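
For readers skimming the thread, a structural sketch of the two locking
schemes being weighed; the field and type names are simplified
assumptions, not zsmalloc's actual layout:

/*
 * Illustrative shapes of the two locking schemes discussed above.
 * Simplified names; not zsmalloc's real structures.
 */
#include <linux/spinlock.h>

#define TOY_NR_CLASSES	256	/* nominal; merged classes reduce this */

/* Scheme A (status quo): one pool lock -- every alloc/free serializes
 * on it, which is why the zpools themselves get replicated. */
struct toy_pool_global {
	spinlock_t lock;
	/* ... per-class metadata, zspage lists, stats ... */
};

/* Scheme B (per-class locks): contention now depends on how the
 * post-compression sizes of stored pages are distributed across
 * classes -- a concentrated distribution keeps a few locks hot. */
struct toy_size_class {
	spinlock_t lock;
	/* ... this class's zspage lists ... */
};

struct toy_pool_per_class {
	struct toy_size_class classes[TOY_NR_CLASSES];
};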

2024-06-07 11:23:34

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

On (24/06/07 10:40), Nhat Pham wrote:
> Personally, I'm not super convinced about class locks. We're
> essentially relying on the post-compression size of the data to
> load-balance the queries - I can imagine a scenario where a workload
> has a concentrated distribution of post-compression data (i.e its
> pages are compressed to similar-ish sizes), and we're once again
> contending for a (few) lock(s) again.
>
> That said, I'll let the data tell the story :) We don't need a perfect
> solution, just a good enough solution for now.

Speaking of size class locks:

One thing to mention is that zsmalloc merges size classes; we have
never documented/claimed 256 size classes, and the actual number is
always much lower. Each such "cluster" (a group of merged size
classes) holds a range of object sizes (e.g. 3504-3584 bytes). The
wider a cluster's size range, the more likely (size class) lock
contention becomes.

Setting CONFIG_ZSMALLOC_CHAIN_SIZE to 10 or higher configures the
zsmalloc pool with more size class clusters (which means that each
cluster covers a narrower size interval).
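
To make the cluster-width point concrete, a toy userspace model; the
16-byte class step, the merge factors, and the sample sizes are
assumptions for illustration, not zsmalloc's real parameters. Wide
clusters funnel similarly sized objects onto one lock; narrow clusters
spread them out:

/*
 * Toy model of the "cluster" idea above (not zsmalloc internals).
 * Nominal size classes are 16 bytes apart here; merging M adjacent
 * classes into one cluster means one lock covers an M * 16 byte size
 * range.  A larger CONFIG_ZSMALLOC_CHAIN_SIZE roughly corresponds to
 * less merging, i.e. more clusters with narrower ranges.
 */
#include <stdio.h>

#define CLASS_STEP	16	/* assumed nominal spacing of size classes */

static unsigned int cluster_of(unsigned int size, unsigned int merge)
{
	/* Which merged cluster (and hence which lock) this size maps to. */
	return (size / CLASS_STEP) / merge;
}

int main(void)
{
	/* Compressed-page sizes concentrated around 3.5K, as in the
	 * "concentrated distribution" scenario discussed earlier. */
	unsigned int sizes[] = { 3504, 3511, 3520, 3533, 3560, 3577 };
	unsigned int merges[] = { 5, 1 };	/* wide vs narrow clusters */

	for (unsigned int m = 0; m < 2; m++) {
		printf("merge factor %u (cluster width %u bytes):\n",
		       merges[m], merges[m] * CLASS_STEP);
		for (unsigned int i = 0; i < 6; i++)
			printf("  size %u -> cluster %u\n",
			       sizes[i], cluster_of(sizes[i], merges[m]));
	}
	return 0;
}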