2023-09-29 12:46:51

by Bagas Sanjaya

Subject: Fwd: [6.5.5] System slowdown during compilation workload and RIP: lazy_rcu_shrink_scan

Hi,

I noticed a bug report on Bugzilla [1]. Quoting from it:

> With kernel 6.5.5, I noticed a sudden system slowdown during a heavy compilation workload even though plenty of free RAM was still available (only around 9 GB used out of 48 GB in total). I first suspected an OOM condition, but as stated, plenty of free RAM was still available. The system froze for a couple of seconds, became responsive again for a few seconds, and this cycle repeated until I rebooted the system. While the compilation workload could finish (albeit slowly), the system became next to unusable for anything else.
>
> In the logs, I've seen the following:
>
> [ 4394.309138] ------------[ cut here ]------------
> [ 4394.309141] ------------[ cut here ]------------
> [ 4394.309143] WARNING: CPU: 11 PID: 436221 at kernel/rcu/tree_nocb.h:124 call_rcu+0xa85/0x14e0
> [ 4394.309142] WARNING: CPU: 8 PID: 366 at kernel/rcu/tree_nocb.h:104 lazy_rcu_shrink_scan+0x311/0x340
> [ 4394.309150] Modules linked in:
> [ 4394.309151] Modules linked in:
> [ 4394.309151] vfat
> [ 4394.309152] vfat
> [ 4394.309153] fat
> [ 4394.309153] fat
> [ 4394.309155] snd_hda_codec_realtek
> [ 4394.309155] snd_hda_codec_realtek
> [ 4394.309156] snd_hda_codec_generic
> [ 4394.309156] snd_hda_codec_generic ledtrig_audio
> [ 4394.309158] ledtrig_audio intel_rapl_msr
> [ 4394.309159] intel_rapl_msr
> [ 4394.309160] snd_hda_codec_hdmi
> [ 4394.309160] snd_hda_codec_hdmi
> [ 4394.309162] intel_rapl_common
> [ 4394.309162] intel_rapl_common
> [ 4394.309163] sb_edac
> [ 4394.309163] sb_edac x86_pkg_temp_thermal
> [ 4394.309165] x86_pkg_temp_thermal intel_powerclamp
> [ 4394.309166] intel_powerclamp snd_hda_intel
> [ 4394.309167] snd_hda_intel snd_intel_dspcfg
> [ 4394.309168] snd_intel_dspcfg snd_hda_codec
> [ 4394.309170] snd_hda_codec snd_hwdep
> [ 4394.309171] snd_hwdep
> [ 4394.309172] coretemp
> [ 4394.309172] coretemp
> [ 4394.309173] crct10dif_pclmul
> [ 4394.309173] crct10dif_pclmul snd_hda_core
> [ 4394.309174] snd_hda_core crc32_pclmul
> [ 4394.309175] crc32_pclmul
> [ 4394.309177] polyval_clmulni
> [ 4394.309177] polyval_clmulni polyval_generic
> [ 4394.309178] polyval_generic gf128mul
> [ 4394.309179] gf128mul
> [ 4394.309180] ghash_clmulni_intel
> [ 4394.309180] ghash_clmulni_intel sha512_ssse3
> [ 4394.309181] sha512_ssse3 snd_pcm
> [ 4394.309182] snd_pcm
> [ 4394.309184] sha256_ssse3
> [ 4394.309184] sha256_ssse3
> [ 4394.309185] sha1_ssse3
> [ 4394.309185] sha1_ssse3
> [ 4394.309186] aesni_intel
> [ 4394.309186] aesni_intel
> [ 4394.309188] snd_timer
> [ 4394.309188] snd_timer crypto_simd
> [ 4394.309189] crypto_simd cryptd
> [ 4394.309190] cryptd i2c_i801
> [ 4394.309191] i2c_i801 acpi_cpufreq
> [ 4394.309192] acpi_cpufreq i2c_smbus
> [ 4394.309194] i2c_smbus snd
> [ 4394.309195] snd igb
> [ 4394.309196] igb lpc_ich
> [ 4394.309197] lpc_ich
> [ 4394.309198] soundcore mei_wdt
> [ 4394.309199] soundcore mei_wdt
> [ 4394.309200] razerkbd(O)
> [ 4394.309201] razerkbd(O)
> [ 4394.309201] mousedev sch_fq_codel
> [ 4394.309202] mousedev sch_fq_codel
> [ 4394.309203] usbip_host usbip_core
> [ 4394.309204] usbip_host usbip_core
> [ 4394.309206] pkcs8_key_parser crypto_user
> [ 4394.309207] pkcs8_key_parser crypto_user
> [ 4394.309208] fuse loop
> [ 4394.309209] fuse
> [ 4394.309210] zram
> [ 4394.309210] loop zram
> [ 4394.309211] bpf_preload ip_tables
> [ 4394.309212] bpf_preload
> [ 4394.309214] x_tables
> [ 4394.309214] ip_tables x_tables
> [ 4394.309215] ext4
> [ 4394.309216] ext4
> [ 4394.309216] crc32c_generic mbcache
> [ 4394.309218] crc32c_generic mbcache
> [ 4394.309219] crc16
> [ 4394.309220] crc16
> [ 4394.309220] jbd2 usbhid
> [ 4394.309221] jbd2 usbhid
> [ 4394.309222] amdgpu mfd_core
> [ 4394.309223] amdgpu
> [ 4394.309224] drm_buddy drm_suballoc_helper
> [ 4394.309225] mfd_core
> [ 4394.309226] crc32c_intel
> [ 4394.309226] drm_buddy drm_suballoc_helper
> [ 4394.309227] video drm_ttm_helper
> [ 4394.309228] crc32c_intel
> [ 4394.309229] ttm i2c_algo_bit
> [ 4394.309230] video drm_ttm_helper
> [ 4394.309231] drm_display_helper cec
> [ 4394.309232] ttm i2c_algo_bit
> [ 4394.309234] xhci_pci gpu_sched
> [ 4394.309235] drm_display_helper
> [ 4394.309236] xhci_pci_renesas
> [ 4394.309236] cec xhci_pci
> [ 4394.309237] amdxcp wmi
> [ 4394.309238] gpu_sched xhci_pci_renesas
>
> [ 4394.309240] amdxcp wmi
> [ 4394.309240] CPU: 8 PID: 366 Comm: kswapd0 Tainted: G O 6.5.5-2.1-cachyos-lto #1 cc6c6175d85fa32a794326555b105a9657ffd076
> [ 4394.309244] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
> [ 4394.309242] CPU: 11 PID: 436221 Comm: cc1plus Tainted: G O 6.5.5-2.1-cachyos-lto #1 cc6c6175d85fa32a794326555b105a9657ffd076
> [ 4394.309245] RIP: 0010:lazy_rcu_shrink_scan+0x311/0x340
> [ 4394.309246] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
> [ 4394.309248] Code: 03 00 00 75 25 4c 89 e7 e8 ac 81 87 00 f0 41 ff 8e 74 01 00 00 41 f6 86 e8 00 00 00 04 0f 85 1a fe ff ff 0f 0b e9 13 fe ff ff <0f> 0b eb d7 48 c7 c7 20 ff ec 9c e8 ff d1 86 00 31 c0 49 83 fd 01
> [ 4394.309248] RIP: 0010:call_rcu+0xa85/0x14e0
> [ 4394.309250] RSP: 0000:ffff8e82a3fdba88 EFLAGS: 00010093
> [ 4394.309251] Code: 02 00 00 00 0f 85 f0 00 00 00 f6 83 e8 00 00 00 01 74 ac 48 8b 83 80 00 00 00 48 83 38 00 0f 94 c0 89 44 24 08 e9 06 f9 ff ff <0f> 0b 0f ae e8 e9 d3 f6 ff ff 0f 0b e9 cc f6 ff ff 4c 89 ef e8 62
>
> [ 4394.309253] RAX: 0000000000000008 RBX: 000000000000000b RCX: 0000000000000000
> [ 4394.309254] RSP: 0000:ffff8e7a8a033388 EFLAGS: 00010002
> [ 4394.309255] RDX: ffff8e84cf8b0148 RSI: 0000000000000286 RDI: ffff8e84cf8f0140
>
> [ 4394.309257] RBP: 0000000000000001 R08: ffff8e84cf82c540 R09: 0000000000000094
> [ 4394.309258] RAX: 000000000000000b RBX: ffff8e84cf8eff40 RCX: 0000000000001388
> [ 4394.309258] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e84cf8f0140
> [ 4394.309260] RDX: 40000077ed480115 RSI: ffff8e84cf8effb8 RDI: ffff8e7db6814da8
> [ 4394.309260] R13: 0000000000000006 R14: ffff8e84cf8eff40 R15: 0000000000000202
> [ 4394.309262] FS: 0000000000000000(0000) GS:ffff8e84cf800000(0000) knlGS:0000000000000000
> [ 4394.309262] RBP: 0000000000000002 R08: 0000000000000040 R09: ffff8e7db6814d90
> [ 4394.309264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4394.309264] R10: 000000000000000c R11: ffff8e843519d4f0 R12: ffff8e84cf8f0148
> [ 4394.309266] CR2: 00007f86c9ef0000 CR3: 0000000163681001 CR4: 00000000001706e0
> [ 4394.309266] R13: 0000000000000002 R14: 0000000000000000 R15: ffff8e7db6814da8
> [ 4394.309268] Call Trace:
> [ 4394.309268] FS: 00007f1566065b40(0000) GS:ffff8e84cf8c0000(0000) knlGS:0000000000000000
> [ 4394.309270] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4394.309271] <TASK>
> [ 4394.309272] CR2: 00007f1548773000 CR3: 00000007a5401003 CR4: 00000000001706e0
> [ 4394.309274] Call Trace:
> [ 4394.309276] <TASK>
> [ 4394.309274] ? __warn+0x9e/0x160
> [ 4394.309278] ? lazy_rcu_shrink_scan+0x311/0x340
> [ 4394.309278] ? __warn+0x9e/0x160
> [ 4394.309282] ? call_rcu+0xa85/0x14e0
> [ 4394.309281] ? report_bug+0x147/0x180
> [ 4394.309284] ? handle_bug+0x48/0x80
> [ 4394.309285] ? report_bug+0x147/0x180
> [ 4394.309287] ? exc_invalid_op+0x16/0x40
> [ 4394.309289] ? handle_bug+0x48/0x80
> [ 4394.309290] ? asm_exc_invalid_op+0x16/0x20
> [ 4394.309292] ? exc_invalid_op+0x16/0x40
> [ 4394.309294] ? lazy_rcu_shrink_scan+0x311/0x340
> [ 4394.309295] ? asm_exc_invalid_op+0x16/0x20
> [ 4394.309297] ? lazy_rcu_shrink_scan+0x10e/0x340
> [ 4394.309299] shrink_slab+0x95a/0xae0
> [ 4394.309300] ? call_rcu+0xa85/0x14e0
> [ 4394.309303] shrink_node+0x1b48/0x29c0
> [ 4394.309303] ? sysvec_apic_timer_interrupt+0x11/0x80
> [ 4394.309308] kswapd+0x9fd/0x13e0
> [ 4394.309307] xas_store+0xb7a/0x1480
> [ 4394.309311] ? schedule+0x61a/0x1d20
> [ 4394.309313] shadow_lru_isolate+0x103/0x1c0
> [ 4394.309314] kthread+0x1bd/0x260
> [ 4394.309317] ? check_move_unevictable_folios+0x4a0/0x4a0
> [ 4394.309318] ? xas_store+0x1480/0x1480
> [ 4394.309321] ? count_shadow_nodes+0x460/0x460
> [ 4394.309321] ? __blk_mq_get_driver_tag+0x180/0x180
> [ 4394.309325] ret_from_fork+0x30/0x40
> [ 4394.309324] __list_lru_walk_one+0x10f/0x1c0
> [ 4394.309328] ? __blk_mq_get_driver_tag+0x180/0x180
> [ 4394.309331] ret_from_fork_asm+0x11/0x20
> [ 4394.309331] ? count_shadow_nodes+0x460/0x460
> [ 4394.309334] </TASK>
> [ 4394.309334] list_lru_walk_one_irq+0x53/0x80
> [ 4394.309335] ---[ end trace 0000000000000000 ]---
> [ 4394.309338] shrink_slab+0x95a/0xae0
> [ 4394.309341] shrink_node+0x1b48/0x29c0
> [ 4394.309344] ? __rmqueue_pcplist+0x2e4/0x1660
> [ 4394.309350] do_try_to_free_pages+0x21b/0x700
> [ 4394.309355] try_to_free_pages+0x193/0xb80
> [ 4394.309361] __folio_alloc+0xa65/0x6e80
> [ 4394.309364] ? __alloc_pages+0x19b/0x3580
> [ 4394.309370] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 4394.309374] ? __pte_offset_map_lock+0x143/0x200
> [ 4394.309379] do_pte_missing+0x647/0x2da0
> [ 4394.309385] ? task_tick_fair+0x745/0x2880
> [ 4394.309390] handle_mm_fault+0x97d/0xdc0
> [ 4394.309395] do_user_addr_fault+0x66a/0x2f80
> [ 4394.309401] exc_page_fault+0x66/0x160
> [ 4394.309405] asm_exc_page_fault+0x22/0x30
> [ 4394.309408] RIP: 0033:0x7f1560da1c5d
> [ 4394.309425] Code: 4c 17 f0 c3 66 0f 1f 84 00 00 00 00 00 48 8b 4c 16 f8 48 8b 36 48 89 37 48 89 4c 17 f8 c3 c5 fe 6f 54 16 e0 c5 fe 6f 5c 16 c0 <c5> fe 7f 07 c5 fe 7f 4f 20 c5 fe 7f 54 17 e0 c5 fe 7f 5c 17 c0 c5
> [ 4394.309426] RSP: 002b:00007ffcbe93e318 EFLAGS: 00010246
> [ 4394.309428] RAX: 00007f1548773000 RBX: 00007f1548773000 RCX: 000000000000001f
> [ 4394.309430] RDX: 0000000000000080 RSI: 00007f155c0dc880 RDI: 00007f1548773000
> [ 4394.309431] RBP: 00007ffcbe93e370 R08: 0000000000000000 R09: 00000007f1548773
> [ 4394.309433] R10: 00007f1548773000 R11: 000000000000a795 R12: 00007f155c0dc880
> [ 4394.309434] R13: 0000000000000026 R14: 0000000000000026 R15: 0000000000000003
> [ 4394.309437] </TASK>
> [ 4394.309438] ---[ end trace 0000000000000000 ]---
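
The excerpt above interleaves two reports that fired at almost the same time: one on CPU 8 (kswapd0) and one on CPU 11 (cc1plus). As a reading aid, here is a minimal Python sketch that pulls the CPU, PID, source location and symbol out of each WARNING line. The regular expression and the names `WARNING_RE`, `KernelWarning` and `parse_warnings` are illustrative assumptions, not part of any kernel tooling.

```python
import re
from typing import List, NamedTuple

# Hypothetical pattern matching the "WARNING: CPU: ... at file:line symbol+off/len"
# header lines seen in the quoted dmesg excerpt.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


class KernelWarning(NamedTuple):
    cpu: int
    pid: int
    file: str
    line: int
    symbol: str


def parse_warnings(log_text: str) -> List[KernelWarning]:
    """Return one record per WARNING header found in the log text."""
    found = []
    for m in WARNING_RE.finditer(log_text):
        found.append(
            KernelWarning(
                cpu=int(m.group("cpu")),
                pid=int(m.group("pid")),
                file=m.group("file"),
                line=int(m.group("line")),
                symbol=m.group("symbol"),
            )
        )
    return found


# The two header lines copied from the quoted report.
SAMPLE = """\
> [ 4394.309143] WARNING: CPU: 11 PID: 436221 at kernel/rcu/tree_nocb.h:124 call_rcu+0xa85/0x14e0
> [ 4394.309142] WARNING: CPU: 8 PID: 366 at kernel/rcu/tree_nocb.h:104 lazy_rcu_shrink_scan+0x311/0x340
"""

if __name__ == "__main__":
    for w in parse_warnings(SAMPLE):
        print(f"CPU {w.cpu} PID {w.pid}: {w.file}:{w.line} in {w.symbol}")
```

Both warnings come from kernel/rcu/tree_nocb.h (lines 124 and 104), which makes it easier to see that the remaining interleaved lines belong to two separate stack dumps.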

See Bugzilla for the full thread and attached full dmesg output.

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217948

--
An old man doll... just what I always wanted! - Clara


2023-09-29 15:35:34

by Marcus Seyfarth

Subject: Re: Fwd: [6.5.5] System slowdown during compilation workload and RIP: lazy_rcu_shrink_scan

> This seems to be a heavily patched kernel.
> Does this problem also appear with a vanilla 6.5 kernel?

Indeed, CachyOS comes with additional patches. I haven't found an easy
way to try out a vanilla kernel yet (there is
https://aur.archlinux.org/packages/linux-mainline - but that is
already on 6.6-rc3). As CachyOS also makes use of ananicy and uksmd, I
don't know whether testing with a vanilla kernel that doesn't support
these extra features is the best idea for the stability of the system.

I can offer a new data point, however, which looks quite different:
I've attached a new trace from today of an OOM that recovered
successfully (that is, without the freezes afterwards) with the same
kernel in use. The relevant part is:

[29. Sep 12:14] Qt bearer threa invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=200
[ +0,000007] CPU: 25 PID: 1126 Comm: Qt bearer threa Tainted: G O 6.5.5-2.1-cachyos-lto #1 ae9643c86e4447bdd5b0d7da31c14411335d3e8d
[ +0,000003] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
[ +0,000001] Call Trace:
[ +0,000002] <TASK>
[ +0,000003] dump_header+0x51/0x260
[ +0,000005] oom_kill_process+0x92/0x1a0
[ +0,000003] out_of_memory+0x227/0x320
[ +0,000002] __folio_alloc+0x2e46/0x6ee0
[ +0,000005] ? blk_mq_flush_plug_list+0xaa/0xa00
[ +0,000005] __filemap_get_folio+0x1e2/0x460
[ +0,000002] filemap_fault+0x56c/0x1260
[ +0,000004] do_pte_missing+0x194/0x2da0
[ +0,000004] ? ____fput+0x550/0x2d60
[ +0,000002] ? rtnl_dump_all+0xff/0x120
[ +0,000004] ? free_unref_page+0x237/0xc20
[ +0,000003] ? __wake_up+0xe4/0x1c0
[ +0,000004] handle_mm_fault+0x976/0xe00
[ +0,000003] do_user_addr_fault+0x8ca/0x2f80
[ +0,000002] ? do_syscall_64+0x68/0x80
[ +0,000005] exc_page_fault+0x66/0x160
[ +0,000003] asm_exc_page_fault+0x22/0x30
[ +0,000005] RIP: 0033:0x7f5243289003
[ +0,000014] Code: Unable to access opcode bytes at 0x7f5243288fd9.
[ +0,000001] RSP: 002b:00007f52227fae98 EFLAGS: 00010206
[ +0,000002] RAX: 00007f5272534d40 RBX: 00007f520c012930 RCX: 0000000000000055
[ +0,000002] RDX: 0000000000000005 RSI: 00007f520c00e270 RDI: 00007f520c012950
[ +0,000001] RBP: 0000557650c0c690 R08: 00007f520c011460 R09: 00000007f520c00e
[ +0,000001] R10: 00007f520c000058 R11: 0000000000000003 R12: 00007f52227faf28
[ +0,000001] R13: 00007f520c00f4e0 R14: 0000000000000000 R15: 00007f520c00f4f8
[ +0,000001] </TASK>
[ +0,000001] Mem-Info:
[ +0,000001] active_anon:1474442 inactive_anon:9717963 isolated_anon:0
active_file:10365 inactive_file:6709 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:100405 slab_unreclaimable:112907
mapped:2811 shmem:1465 pagetables:58902
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:57924 free_pcp:4870 free_cma:0
[ +0,000004] Node 0 active_anon:5897768kB inactive_anon:38871852kB active_file:41460kB inactive_file:26836kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:11244kB dirty:0>
[ +0,000003] DMA free:15360kB boost:0kB min:8kB low:200kB high:392kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepen>
[ +0,000003] lowmem_reserve[]: 0 1762 47954 47954
[ +0,000003] DMA32 free:185848kB boost:0kB min:1056kB low:24244kB high:47432kB reserved_highatomic:64KB active_anon:204464kB inactive_anon:1429212kB active_file:0kB inactive_file:0kB un>
[ +0,000002] lowmem_reserve[]: 0 0 46191 46191
[ +0,000002] Normal free:30488kB boost:0kB min:26964kB low:618252kB high:1209540kB reserved_highatomic:4352KB active_anon:5693304kB inactive_anon:37442640kB active_file:41652kB inactive>
[ +0,000003] lowmem_reserve[]: 0 0 0 0
[ +0,000002] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (M) 1*2048kB (M) 3*4096kB (M) = 15360kB
[ +0,000006] DMA32: 176*4kB (UM) 99*8kB (UM) 486*16kB (UME) 174*32kB (UME) 134*64kB (UME) 65*128kB (UME) 8*256kB (UME) 5*512kB (M) 40*1024kB (ME) 11*2048kB (UM) 21*4096kB (M) = 185848kB
[ +0,000008] Normal: 308*4kB (UME) 301*8kB (UME) 344*16kB (UME) 409*32kB (UME) 35*64kB (UME) 7*128kB (UM) 3*256kB (U) 1*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 26648kB
[ +0,000007] 20391 total pagecache pages
[ +0,000001] 104 pages in swap cache
[ +0,000001] Free swap = 64kB
[ +0,000000] Total swap = 12293372kB
[ +0,000001] 12542844 pages RAM
[ +0,000000] 0 pages HighMem/MovableOnly
[ +0,000001] 249387 pages reserved
[ +0,000000] 0 pages hwpoisoned
[ +0,000001] Tasks state (memory values in pages):
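
To put the Mem-Info counters into perspective, here is a small worked calculation in Python. It is only a sketch: the 4 KiB page size is an assumption (the x86-64 default), and the helper names are made up for illustration. The page counters, converted to kB, agree with the "Node 0" line of the dump, and the figures suggest that this second event was a genuine out-of-memory situation (anonymous memory close to total RAM, swap almost fully used), unlike the first report.

```python
PAGE_KB = 4  # assumption: 4 KiB base pages, the x86-64 default


def pages_to_kb(pages: int) -> int:
    """Convert a page count from Mem-Info into kB."""
    return pages * PAGE_KB


def kb_to_gib(kb: int) -> float:
    """Convert kB (1024 bytes each) into GiB."""
    return kb / (1024 * 1024)


def buddy_total_kb(free_lists: dict) -> int:
    """Sum a buddy free-list line given {block_size_kb: block_count}."""
    return sum(size_kb * count for size_kb, count in free_lists.items())


# Values copied from the Mem-Info excerpt above.
active_anon_pages = 1474442
inactive_anon_pages = 9717963
total_ram_pages = 12542844
free_swap_kb = 64
total_swap_kb = 12293372

anon_kb = pages_to_kb(active_anon_pages + inactive_anon_pages)
ram_kb = pages_to_kb(total_ram_pages)
swap_used_kb = total_swap_kb - free_swap_kb

# DMA zone free list: 1*1024kB + 1*2048kB + 3*4096kB
dma_free_kb = buddy_total_kb({1024: 1, 2048: 1, 4096: 3})

if __name__ == "__main__":
    print(f"anonymous memory: {kb_to_gib(anon_kb):.2f} GiB")
    print(f"total RAM:        {kb_to_gib(ram_kb):.2f} GiB")
    print(f"swap in use:      {kb_to_gib(swap_used_kb):.2f} GiB")
    print(f"DMA free lists:   {dma_free_kb} kB")
```

With these inputs, anonymous memory comes to roughly 42.7 GiB out of about 47.85 GiB of RAM, and all but 64 kB of the swap space is in use.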


> What if you disable RCU_LAZY?
I will try that out over the next few days; it is enabled by default
on CachyOS.
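
Before rebuilding, it may help to confirm how the running kernel was configured. The following is a minimal Python sketch, assuming the kernel exposes its configuration through /proc/config.gz (which requires CONFIG_IKCONFIG_PROC) or that the distribution installs /boot/config-<release>; neither location is guaranteed to exist. The function names `rcu_lazy_state` and `read_running_config` are hypothetical helpers written for this example.

```python
import gzip
import platform
import re
from pathlib import Path
from typing import Optional


def rcu_lazy_state(config_text: str) -> str:
    """Classify CONFIG_RCU_LAZY in a kernel .config as enabled/disabled/absent."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == "# CONFIG_RCU_LAZY is not set":
            return "disabled"
        # fullmatch avoids confusing this with longer option names
        # that merely start with CONFIG_RCU_LAZY.
        m = re.fullmatch(r"CONFIG_RCU_LAZY=(\w+)", line)
        if m:
            return "enabled" if m.group(1) == "y" else "disabled"
    return "absent"


def read_running_config() -> Optional[str]:
    """Return the running kernel's config text, or None if it cannot be found."""
    proc = Path("/proc/config.gz")  # assumption: CONFIG_IKCONFIG_PROC=y
    if proc.exists():
        with gzip.open(proc, "rt") as fh:
            return fh.read()
    boot = Path("/boot") / f"config-{platform.release()}"  # distro convention
    if boot.exists():
        return boot.read_text()
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not found in the assumed locations")
    else:
        print(f"CONFIG_RCU_LAZY: {rcu_lazy_state(text)}")
```

If this reports "enabled", testing the suggestion above would mean rebuilding the kernel with CONFIG_RCU_LAZY unset and repeating the compilation workload.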


Attachments:
dmesg2.txt (108.89 kB)

2023-09-30 17:39:53

by Tor Vic

Subject: Re: Fwd: [6.5.5] System slowdown during compilation workload and RIP: lazy_rcu_shrink_scan



On 9/29/23 09:19, Bagas Sanjaya wrote:
> Hi,
>
> I noticed a bug report on Bugzilla [1]. Quoting from it:
>
>> With kernel 6.5.5, I noticed a sudden system slowdown during a heavy compilation workload even though plenty of free RAM was still available (only around 9 GB used out of 48 GB in total). I first suspected an OOM condition, but as stated, plenty of free RAM was still available. The system froze for a couple of seconds, became responsive again for a few seconds, and this cycle repeated until I rebooted the system. While the compilation workload could finish (albeit slowly), the system became next to unusable for anything else.
>>
>> In the logs, I've seen the following:
>>
>> [ 4394.309138] ------------[ cut here ]------------
>> [ 4394.309141] ------------[ cut here ]------------
>> [ 4394.309143] WARNING: CPU: 11 PID: 436221 at kernel/rcu/tree_nocb.h:124 call_rcu+0xa85/0x14e0
>> [ 4394.309142] WARNING: CPU: 8 PID: 366 at kernel/rcu/tree_nocb.h:104 lazy_rcu_shrink_scan+0x311/0x340
>> [ 4394.309150] Modules linked in:
>> [ 4394.309151] Modules linked in:
>> [ 4394.309151] vfat
>> [ 4394.309152] vfat
>> [ 4394.309153] fat
>> [ 4394.309153] fat
>> [ 4394.309155] snd_hda_codec_realtek
>> [ 4394.309155] snd_hda_codec_realtek
>> [ 4394.309156] snd_hda_codec_generic
>> [ 4394.309156] snd_hda_codec_generic ledtrig_audio
>> [ 4394.309158] ledtrig_audio intel_rapl_msr
>> [ 4394.309159] intel_rapl_msr
>> [ 4394.309160] snd_hda_codec_hdmi
>> [ 4394.309160] snd_hda_codec_hdmi
>> [ 4394.309162] intel_rapl_common
>> [ 4394.309162] intel_rapl_common
>> [ 4394.309163] sb_edac
>> [ 4394.309163] sb_edac x86_pkg_temp_thermal
>> [ 4394.309165] x86_pkg_temp_thermal intel_powerclamp
>> [ 4394.309166] intel_powerclamp snd_hda_intel
>> [ 4394.309167] snd_hda_intel snd_intel_dspcfg
>> [ 4394.309168] snd_intel_dspcfg snd_hda_codec
>> [ 4394.309170] snd_hda_codec snd_hwdep
>> [ 4394.309171] snd_hwdep
>> [ 4394.309172] coretemp
>> [ 4394.309172] coretemp
>> [ 4394.309173] crct10dif_pclmul
>> [ 4394.309173] crct10dif_pclmul snd_hda_core
>> [ 4394.309174] snd_hda_core crc32_pclmul
>> [ 4394.309175] crc32_pclmul
>> [ 4394.309177] polyval_clmulni
>> [ 4394.309177] polyval_clmulni polyval_generic
>> [ 4394.309178] polyval_generic gf128mul
>> [ 4394.309179] gf128mul
>> [ 4394.309180] ghash_clmulni_intel
>> [ 4394.309180] ghash_clmulni_intel sha512_ssse3
>> [ 4394.309181] sha512_ssse3 snd_pcm
>> [ 4394.309182] snd_pcm
>> [ 4394.309184] sha256_ssse3
>> [ 4394.309184] sha256_ssse3
>> [ 4394.309185] sha1_ssse3
>> [ 4394.309185] sha1_ssse3
>> [ 4394.309186] aesni_intel
>> [ 4394.309186] aesni_intel
>> [ 4394.309188] snd_timer
>> [ 4394.309188] snd_timer crypto_simd
>> [ 4394.309189] crypto_simd cryptd
>> [ 4394.309190] cryptd i2c_i801
>> [ 4394.309191] i2c_i801 acpi_cpufreq
>> [ 4394.309192] acpi_cpufreq i2c_smbus
>> [ 4394.309194] i2c_smbus snd
>> [ 4394.309195] snd igb
>> [ 4394.309196] igb lpc_ich
>> [ 4394.309197] lpc_ich
>> [ 4394.309198] soundcore mei_wdt
>> [ 4394.309199] soundcore mei_wdt
>> [ 4394.309200] razerkbd(O)
>> [ 4394.309201] razerkbd(O)
>> [ 4394.309201] mousedev sch_fq_codel
>> [ 4394.309202] mousedev sch_fq_codel
>> [ 4394.309203] usbip_host usbip_core
>> [ 4394.309204] usbip_host usbip_core
>> [ 4394.309206] pkcs8_key_parser crypto_user
>> [ 4394.309207] pkcs8_key_parser crypto_user
>> [ 4394.309208] fuse loop
>> [ 4394.309209] fuse
>> [ 4394.309210] zram
>> [ 4394.309210] loop zram
>> [ 4394.309211] bpf_preload ip_tables
>> [ 4394.309212] bpf_preload
>> [ 4394.309214] x_tables
>> [ 4394.309214] ip_tables x_tables
>> [ 4394.309215] ext4
>> [ 4394.309216] ext4
>> [ 4394.309216] crc32c_generic mbcache
>> [ 4394.309218] crc32c_generic mbcache
>> [ 4394.309219] crc16
>> [ 4394.309220] crc16
>> [ 4394.309220] jbd2 usbhid
>> [ 4394.309221] jbd2 usbhid
>> [ 4394.309222] amdgpu mfd_core
>> [ 4394.309223] amdgpu
>> [ 4394.309224] drm_buddy drm_suballoc_helper
>> [ 4394.309225] mfd_core
>> [ 4394.309226] crc32c_intel
>> [ 4394.309226] drm_buddy drm_suballoc_helper
>> [ 4394.309227] video drm_ttm_helper
>> [ 4394.309228] crc32c_intel
>> [ 4394.309229] ttm i2c_algo_bit
>> [ 4394.309230] video drm_ttm_helper
>> [ 4394.309231] drm_display_helper cec
>> [ 4394.309232] ttm i2c_algo_bit
>> [ 4394.309234] xhci_pci gpu_sched
>> [ 4394.309235] drm_display_helper
>> [ 4394.309236] xhci_pci_renesas
>> [ 4394.309236] cec xhci_pci
>> [ 4394.309237] amdxcp wmi
>> [ 4394.309238] gpu_sched xhci_pci_renesas
>>
>> [ 4394.309240] amdxcp wmi
>> [ 4394.309240] CPU: 8 PID: 366 Comm: kswapd0 Tainted: G O 6.5.5-2.1-cachyos-lto #1 cc6c6175d85fa32a794326555b105a9657ffd076

This seems to be a heavily patched kernel.
Does this problem also appear with a vanilla 6.5 kernel?

>> [ 4394.309244] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
>> [ 4394.309242] CPU: 11 PID: 436221 Comm: cc1plus Tainted: G O 6.5.5-2.1-cachyos-lto #1 cc6c6175d85fa32a794326555b105a9657ffd076
>> [ 4394.309245] RIP: 0010:lazy_rcu_shrink_scan+0x311/0x340
>> [ 4394.309246] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
>> [ 4394.309248] Code: 03 00 00 75 25 4c 89 e7 e8 ac 81 87 00 f0 41 ff 8e 74 01 00 00 41 f6 86 e8 00 00 00 04 0f 85 1a fe ff ff 0f 0b e9 13 fe ff ff <0f> 0b eb d7 48 c7 c7 20 ff ec 9c e8 ff d1 86 00 31 c0 49 83 fd 01
>> [ 4394.309248] RIP: 0010:call_rcu+0xa85/0x14e0
>> [ 4394.309250] RSP: 0000:ffff8e82a3fdba88 EFLAGS: 00010093
>> [ 4394.309251] Code: 02 00 00 00 0f 85 f0 00 00 00 f6 83 e8 00 00 00 01 74 ac 48 8b 83 80 00 00 00 48 83 38 00 0f 94 c0 89 44 24 08 e9 06 f9 ff ff <0f> 0b 0f ae e8 e9 d3 f6 ff ff 0f 0b e9 cc f6 ff ff 4c 89 ef e8 62
>>
>> [ 4394.309253] RAX: 0000000000000008 RBX: 000000000000000b RCX: 0000000000000000
>> [ 4394.309254] RSP: 0000:ffff8e7a8a033388 EFLAGS: 00010002
>> [ 4394.309255] RDX: ffff8e84cf8b0148 RSI: 0000000000000286 RDI: ffff8e84cf8f0140
>>
>> [ 4394.309257] RBP: 0000000000000001 R08: ffff8e84cf82c540 R09: 0000000000000094
>> [ 4394.309258] RAX: 000000000000000b RBX: ffff8e84cf8eff40 RCX: 0000000000001388
>> [ 4394.309258] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e84cf8f0140
>> [ 4394.309260] RDX: 40000077ed480115 RSI: ffff8e84cf8effb8 RDI: ffff8e7db6814da8
>> [ 4394.309260] R13: 0000000000000006 R14: ffff8e84cf8eff40 R15: 0000000000000202
>> [ 4394.309262] FS: 0000000000000000(0000) GS:ffff8e84cf800000(0000) knlGS:0000000000000000
>> [ 4394.309262] RBP: 0000000000000002 R08: 0000000000000040 R09: ffff8e7db6814d90
>> [ 4394.309264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 4394.309264] R10: 000000000000000c R11: ffff8e843519d4f0 R12: ffff8e84cf8f0148
>> [ 4394.309266] CR2: 00007f86c9ef0000 CR3: 0000000163681001 CR4: 00000000001706e0
>> [ 4394.309266] R13: 0000000000000002 R14: 0000000000000000 R15: ffff8e7db6814da8
>> [ 4394.309268] Call Trace:
>> [ 4394.309268] FS: 00007f1566065b40(0000) GS:ffff8e84cf8c0000(0000) knlGS:0000000000000000
>> [ 4394.309270] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 4394.309271] <TASK>
>> [ 4394.309272] CR2: 00007f1548773000 CR3: 00000007a5401003 CR4: 00000000001706e0
>> [ 4394.309274] Call Trace:
>> [ 4394.309276] <TASK>
>> [ 4394.309274] ? __warn+0x9e/0x160
>> [ 4394.309278] ? lazy_rcu_shrink_scan+0x311/0x340

What if you disable RCU_LAZY?

>> [ 4394.309278] ? __warn+0x9e/0x160
>> [ 4394.309282] ? call_rcu+0xa85/0x14e0
>> [ 4394.309281] ? report_bug+0x147/0x180
>> [ 4394.309284] ? handle_bug+0x48/0x80
>> [ 4394.309285] ? report_bug+0x147/0x180
>> [ 4394.309287] ? exc_invalid_op+0x16/0x40
>> [ 4394.309289] ? handle_bug+0x48/0x80
>> [ 4394.309290] ? asm_exc_invalid_op+0x16/0x20
>> [ 4394.309292] ? exc_invalid_op+0x16/0x40
>> [ 4394.309294] ? lazy_rcu_shrink_scan+0x311/0x340
>> [ 4394.309295] ? asm_exc_invalid_op+0x16/0x20
>> [ 4394.309297] ? lazy_rcu_shrink_scan+0x10e/0x340
>> [ 4394.309299] shrink_slab+0x95a/0xae0
>> [ 4394.309300] ? call_rcu+0xa85/0x14e0
>> [ 4394.309303] shrink_node+0x1b48/0x29c0
>> [ 4394.309303] ? sysvec_apic_timer_interrupt+0x11/0x80
>> [ 4394.309308] kswapd+0x9fd/0x13e0
>> [ 4394.309307] xas_store+0xb7a/0x1480
>> [ 4394.309311] ? schedule+0x61a/0x1d20
>> [ 4394.309313] shadow_lru_isolate+0x103/0x1c0
>> [ 4394.309314] kthread+0x1bd/0x260
>> [ 4394.309317] ? check_move_unevictable_folios+0x4a0/0x4a0
>> [ 4394.309318] ? xas_store+0x1480/0x1480
>> [ 4394.309321] ? count_shadow_nodes+0x460/0x460
>> [ 4394.309321] ? __blk_mq_get_driver_tag+0x180/0x180
>> [ 4394.309325] ret_from_fork+0x30/0x40
>> [ 4394.309324] __list_lru_walk_one+0x10f/0x1c0
>> [ 4394.309328] ? __blk_mq_get_driver_tag+0x180/0x180
>> [ 4394.309331] ret_from_fork_asm+0x11/0x20
>> [ 4394.309331] ? count_shadow_nodes+0x460/0x460
>> [ 4394.309334] </TASK>
>> [ 4394.309334] list_lru_walk_one_irq+0x53/0x80
>> [ 4394.309335] ---[ end trace 0000000000000000 ]---
>> [ 4394.309338] shrink_slab+0x95a/0xae0
>> [ 4394.309341] shrink_node+0x1b48/0x29c0
>> [ 4394.309344] ? __rmqueue_pcplist+0x2e4/0x1660
>> [ 4394.309350] do_try_to_free_pages+0x21b/0x700
>> [ 4394.309355] try_to_free_pages+0x193/0xb80
>> [ 4394.309361] __folio_alloc+0xa65/0x6e80
>> [ 4394.309364] ? __alloc_pages+0x19b/0x3580
>> [ 4394.309370] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>> [ 4394.309374] ? __pte_offset_map_lock+0x143/0x200
>> [ 4394.309379] do_pte_missing+0x647/0x2da0
>> [ 4394.309385] ? task_tick_fair+0x745/0x2880
>> [ 4394.309390] handle_mm_fault+0x97d/0xdc0
>> [ 4394.309395] do_user_addr_fault+0x66a/0x2f80
>> [ 4394.309401] exc_page_fault+0x66/0x160
>> [ 4394.309405] asm_exc_page_fault+0x22/0x30
>> [ 4394.309408] RIP: 0033:0x7f1560da1c5d
>> [ 4394.309425] Code: 4c 17 f0 c3 66 0f 1f 84 00 00 00 00 00 48 8b 4c 16 f8 48 8b 36 48 89 37 48 89 4c 17 f8 c3 c5 fe 6f 54 16 e0 c5 fe 6f 5c 16 c0 <c5> fe 7f 07 c5 fe 7f 4f 20 c5 fe 7f 54 17 e0 c5 fe 7f 5c 17 c0 c5
>> [ 4394.309426] RSP: 002b:00007ffcbe93e318 EFLAGS: 00010246
>> [ 4394.309428] RAX: 00007f1548773000 RBX: 00007f1548773000 RCX: 000000000000001f
>> [ 4394.309430] RDX: 0000000000000080 RSI: 00007f155c0dc880 RDI: 00007f1548773000
>> [ 4394.309431] RBP: 00007ffcbe93e370 R08: 0000000000000000 R09: 00000007f1548773
>> [ 4394.309433] R10: 00007f1548773000 R11: 000000000000a795 R12: 00007f155c0dc880
>> [ 4394.309434] R13: 0000000000000026 R14: 0000000000000026 R15: 0000000000000003
>> [ 4394.309437] </TASK>
>> [ 4394.309438] ---[ end trace 0000000000000000 ]---
>
> See Bugzilla for the full thread and attached full dmesg output.
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217948
>