2024-04-14 08:50:31

by Sergei Trofimovich

Subject: 6.8 to 6.9.0-rc3: kernel NULL pointer dereference in pick_next_task_fair+0x89

Hi kernel/sched/ maintainers!

Over the past few days my machines started Oopsing when a nightly kernel
build starts. I don't have a reliable reproducer. The builder is supposed
to run under the `idle` CPU scheduling policy.

Which debugging options should I try to get a better clue about what
causes the crash?

Thank you!

I'm adding a few backtraces in the hope that they are useful.

Most recent one:

<1>[161961.133291] BUG: kernel NULL pointer dereference, address: 00000000000000a0
<1>[161961.133296] #PF: supervisor read access in kernel mode
<1>[161961.133298] #PF: error_code(0x0000) - not-present page
<6>[161961.133299] PGD 0 P4D 0
<4>[161961.133301] Oops: 0000 [#1] PREEMPT SMP NOPTI
<4>[161961.133303] CPU: 1 PID: 1181910 Comm: as Not tainted 6.9.0-rc3 #1-NixOS
<4>[161961.133305] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ULTRA/X570 AORUS ULTRA, BIOS F32 GK 01/19/2021
<4>[161961.133306] RIP: 0010:pick_next_task_fair+0x89/0x5a0
<4>[161961.133311] Code: 00 00 00 49 81 bc 24 b0 02 00 00 a0 89 7a 95 75 5e 4d 89 f7 eb 27 4c 89 ff e8 d3 9f ff ff 84 c0 75 3f 4c 89 ff e8 57 11 ff ff <4c> 8b b8 a0 00 00 00 48 89 c3 4d 85 ff 0f 84 01 01 00 00 49 8b 47
<4>[161961.133312] RSP: 0018:ffffbeece7437d90 EFLAGS: 00010082
<4>[161961.133314] RAX: 0000000000000000 RBX: ffffa2647e2b5380 RCX: 0000000000000400
<4>[161961.133315] RDX: 9b9d1e7d431adc5a RSI: 00000000000004d6 RDI: 00000000000000d6
<4>[161961.133315] RBP: ffffa2647e2b5380 R08: 00000000000000d6 R09: 0000000000000002
<4>[161961.133316] R10: 00000000fa83b2da R11: 0000000000da2bc9 R12: ffffa24eb3b31200
<4>[161961.133317] R13: ffffbeece7437e18 R14: ffffa2647e2b5480 R15: ffffa2647e2b5480
<4>[161961.133318] FS: 0000000000000000(0000) GS:ffffa2647e280000(0000) knlGS:0000000000000000
<4>[161961.133319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[161961.133319] CR2: 00000000000000a0 CR3: 000000183ea20000 CR4: 0000000000f50ef0
<4>[161961.133320] PKRU: 55555554
<4>[161961.133321] Call Trace:
<4>[161961.133322] <TASK>
<4>[161961.133326] ? __die+0x23/0x70
<4>[161961.133329] ? page_fault_oops+0x173/0x580
<4>[161961.133332] ? exc_page_fault+0x71/0x150
<4>[161961.133335] ? asm_exc_page_fault+0x26/0x30
<4>[161961.133341] ? pick_next_task_fair+0x89/0x5a0
<4>[161961.133343] __schedule+0x184/0x1540
<4>[161961.133346] ? sysvec_irq_work+0xe/0x80
<4>[161961.133347] ? __mod_memcg_state+0x84/0x100
<4>[161961.133350] ? refill_stock+0x1a/0x30
<4>[161961.133352] do_task_dead+0x42/0x50
<4>[161961.133355] do_exit+0x7a9/0xaa0
<4>[161961.133358] do_group_exit+0x30/0x80
<4>[161961.133360] __x64_sys_exit_group+0x18/0x20
<4>[161961.133362] do_syscall_64+0xba/0x210
<4>[161961.133364] entry_SYSCALL_64_after_hwframe+0x72/0x7a
<4>[161961.133367] RIP: 0033:0x7ffff7d3e3ed
<4>[161961.133403] Code: Unable to access opcode bytes at 0x7ffff7d3e3c3.
<4>[161961.133403] RSP: 002b:00007ffffffe42f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
<4>[161961.133405] RAX: ffffffffffffffda RBX: 00007ffff7e46fa8 RCX: 00007ffff7d3e3ed
<4>[161961.133405] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
<4>[161961.133406] RBP: 0000000000000001 R08: 00007ffffffe4298 R09: 0000000000000006
<4>[161961.133406] R10: 0000000000000038 R11: 0000000000000202 R12: 0000000000000000
<4>[161961.133407] R13: 0000000000000000 R14: 00007ffff7e45680 R15: 00007ffff7e46fc0
<4>[161961.133408] </TASK>
<4>[161961.133409] Modules linked in: tcp_diag inet_diag nhpoly1305_avx2 nhpoly1305_sse2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 cast5_avx_x86_64 cast5_generic cast_common blowfish_generic blowfish_x86_64 blowfish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic xts twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common aes_generic lrw snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ctr af_packet ccm algif_aead crypto_null des3_ede_x86_64 cbc des_generic libdes algif_skcipher dummy cmac md4 algif_hash af_alg msr nft_chain_nat xt_MASQUERADE nf_nat nls_utf8 nls_cp866 vfat fat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 amdgpu ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat iwlmvm tcp_bbr sch_fq nf_tables mac80211 tls atkbd snd_hda_codec_hdmi libps2 intel_rapl_msr serio libarc4 edac_mce_amd vivaldi_fmap edac_core
<4>[161961.133452] snd_hda_intel snd_intel_dspcfg amd_atl snd_intel_sdw_acpi btusb intel_rapl_common snd_hda_codec amdxcp drm_exec crc32_pclmul polyval_clmulni btrtl gpu_sched polyval_generic gf128mul drm_buddy btintel loop btbcm ghash_clmulni_intel drm_suballoc_helper cpufreq_ondemand btmtk sha512_ssse3 drm_ttm_helper iwlwifi snd_hda_core sha512_generic bluetooth ttm sha256_ssse3 snd_hwdep tun sha1_ssse3 wmi_bmof gigabyte_wmi drm_display_helper snd_pcm tap mxm_wmi aesni_intel crypto_simd cryptd ecdh_generic input_leds ecc led_class cec snd_timer macvlan evdev mousedev crc16 libaes joydev cfg80211 mac_hid sp5100_tco video rapl snd bridge acpi_cpufreq k10temp igb backlight soundcore stp watchdog ptp llc tiny_power_button pps_core i2c_piix4 rfkill dca i2c_algo_bit thermal wmi button kvm_amd ccp rng_core kvm fuse configfs efi_pstore nfnetlink zstd zram efivarfs dmi_sysfs ip_tables x_tables autofs4 sd_mod hid_generic usbhid hid ahci libahci libata xhci_pci xhci_pci_renesas nvme firmware_class nvme_core xhci_hcd scsi_mod t10_pi
<4>[161961.133500] crc64_rocksoft crc64 crc_t10dif crct10dif_generic scsi_common crct10dif_pclmul crct10dif_common rtc_cmos dm_mod dax btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq
<4>[161961.133508] CR2: 00000000000000a0
<4>[161961.133510] ---[ end trace 0000000000000000 ]---

Previous crashes:

<1>[70053.502250] BUG: kernel NULL pointer dereference, address: 00000000000000a0
<1>[70053.502255] #PF: supervisor read access in kernel mode
<1>[70053.502257] #PF: error_code(0x0000) - not-present page
<6>[70053.502258] PGD 1e1ccec067 P4D 1e1ccec067 PUD 196e968067 PMD 0
<4>[70053.502261] Oops: 0000 [#1] PREEMPT SMP NOPTI
<4>[70053.502263] CPU: 3 PID: 3124643 Comm: strip Not tainted 6.8.2 #1-NixOS
<4>[70053.502265] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ULTRA/X570 AORUS ULTRA, BIOS F32 GK 01/19/2021
<4>[70053.502266] RIP: 0010:pick_next_task_fair+0x89/0x5a0
<4>[70053.502271] Code: 00 00 00 49 81 bc 24 b0 02 00 00 e0 b8 99 9b 75 5e 4d 89 f7 eb 27 4c 89 ff e8 83 9e ff ff 84 c0 75 3f 4c 89 ff e8 67 11 ff ff <4c> 8b b8 a0 00 00 00 48 89 c3 4d 85 ff 0f 84 01 01 00 00 49 8b 47
<4>[70053.502272] RSP: 0000:ffffa5e16860fe30 EFLAGS: 00010082
<4>[70053.502273] RAX: 0000000000000000 RBX: ffff98ff3e3b3640 RCX: 0000000000000400
<4>[70053.502274] RDX: a4bd7da6e65033f8 RSI: 00000000000004e4 RDI: 00000000000000e4
<4>[70053.502275] RBP: ffff98ff3e3b3640 R08: 00000000000000e4 R09: 0000000000000002
<4>[70053.502276] R10: ffffffff9c006110 R11: 00000000000075da R12: ffff98e0893091c0
<4>[70053.502276] R13: ffffa5e16860feb8 R14: ffff98ff3e3b3740 R15: ffff98ff3e3b3740
<4>[70053.502277] FS: 0000000000000000(0000) GS:ffff98ff3e380000(0000) knlGS:0000000000000000
<4>[70053.502278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[70053.502279] CR2: 00000000000000a0 CR3: 000000055953e000 CR4: 0000000000f50ef0
<4>[70053.502280] PKRU: 55555554
<4>[70053.502280] Call Trace:
<4>[70053.502282] <TASK>
<4>[70053.502286] ? __die+0x23/0x70
<4>[70053.502289] ? page_fault_oops+0x171/0x4e0
<4>[70053.502291] ? __slab_free+0xdf/0x360
<4>[70053.502294] ? exc_page_fault+0x72/0x160
<4>[70053.502297] ? asm_exc_page_fault+0x26/0x30
<4>[70053.502302] ? pick_next_task_fair+0x89/0x5a0
<4>[70053.502303] __schedule+0x185/0x1550
<4>[70053.502306] ? sched_clock+0x10/0x30
<4>[70053.502308] ? __do_softirq+0x17a/0x2ca
<4>[70053.502310] schedule+0x32/0xd0
<4>[70053.502311] irqentry_exit_to_user_mode+0x1dc/0x230
<4>[70053.502313] asm_sysvec_apic_timer_interrupt+0x1a/0x20
<4>[70053.502315] RIP: 0033:0x7ffff7fdbe06
<4>[70053.502341] Code: c5 08 31 c0 80 fa 3d 74 1f 84 d2 74 1b 0f 1f 80 00 00 00 00 83 c0 01 48 63 d0 41 0f b6 14 11 84 d2 74 c1 80 fa 3d 75 ec 84 d2 <74> b8 44 8d 40 01 4d 63 c0 4d 01 c8 4d 85 ed 0f 84 bf 00 00 00 4c
<4>[70053.502342] RSP: 002b:00007fffffff67f0 EFLAGS: 00000202
<4>[70053.502343] RAX: 000000000000000b RBX: ffff800008004b57 RCX: 00007fffffff981b
<4>[70053.502344] RDX: 000000000000003d RSI: 000000000000004e RDI: 00007ffff7ffca71
<4>[70053.502344] RBP: 00007ffff7ffb440 R08: 00007fffffff9828 R09: 00007fffffff986a
<4>[70053.502345] R10: 000000000000037f R11: 0000000000000064 R12: 00007ffff7ffca71
<4>[70053.502346] R13: 00007fffffff6c28 R14: 00007ffff7fc9000 R15: 00007ffff7fc95b0
<4>[70053.502347] </TASK>
<4>[70053.502348] Modules linked in: tcp_diag inet_diag nhpoly1305_avx2 nhpoly1305_sse2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 cast5_avx_x86_64 cast5_generic cast_common blowfish_generic blowfish_x86_64 blowfish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic xts twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common aes_generic lrw snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ctr af_packet ccm algif_aead crypto_null des3_ede_x86_64 cbc des_generic libdes algif_skcipher dummy cmac md4 algif_hash af_alg msr nft_chain_nat nls_utf8 xt_MASQUERADE nls_cp866 nf_nat vfat fat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 amdgpu ip6t_rpfilter ipt_rpfilter iwlmvm xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat mac80211 nf_tables tcp_bbr sch_fq libarc4 snd_hda_codec_hdmi intel_rapl_msr tls btusb btrtl atkbd snd_hda_intel btintel edac_mce_amd libps2
<4>[70053.502381] btbcm snd_intel_dspcfg serio snd_intel_sdw_acpi btmtk edac_core iwlwifi vivaldi_fmap intel_rapl_common amdxcp drm_exec snd_hda_codec bluetooth gpu_sched crc32_pclmul drm_buddy polyval_clmulni polyval_generic gf128mul snd_hda_core ghash_clmulni_intel drm_suballoc_helper loop sha512_ssse3 sha512_generic drm_ttm_helper ttm snd_hwdep cpufreq_ondemand sha256_ssse3 tun drm_display_helper snd_pcm sha1_ssse3 gigabyte_wmi ecdh_generic wmi_bmof mxm_wmi cec input_leds tap igb aesni_intel ecc snd_timer video mousedev led_class evdev crypto_simd sp5100_tco macvlan cfg80211 joydev crc16 mac_hid cryptd libaes ptp rapl snd pps_core watchdog bridge acpi_cpufreq tiny_power_button soundcore backlight dca k10temp stp i2c_piix4 i2c_algo_bit rfkill llc thermal wmi button kvm_amd ccp rng_core kvm irqbypass fuse efi_pstore configfs nfnetlink zstd zram efivarfs dmi_sysfs ip_tables x_tables autofs4 sd_mod hid_generic usbhid hid ahci libahci libata xhci_pci xhci_pci_renesas nvme firmware_class nvme_core xhci_hcd scsi_mod t10_pi
<4>[70053.502423] crc64_rocksoft crc64 crc_t10dif crct10dif_generic scsi_common crct10dif_pclmul crct10dif_common rtc_cmos dm_mod dax btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq
<4>[70053.502431] CR2: 00000000000000a0
<4>[70053.502433] ---[ end trace 0000000000000000 ]---

Another one:

<1>[626079.354590] BUG: kernel NULL pointer dereference, address: 00000000000000a0
<1>[626079.354596] #PF: supervisor read access in kernel mode
<1>[626079.354597] #PF: error_code(0x0000) - not-present page
<6>[626079.354598] PGD 2c985c067 P4D 2c985c067 PUD 2686ef067 PMD 0
<4>[626079.354602] Oops: 0000 [#2] PREEMPT SMP NOPTI
<4>[626079.354603] CPU: 3 PID: 21065 Comm: pahole Tainted: G D 6.8.2 #1-NixOS
<4>[626079.354605] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ULTRA/X570 AORUS ULTRA, BIOS F32 GK 01/19/2021
<4>[626079.354606] RIP: 0010:pick_next_task_fair+0x89/0x5a0
<4>[626079.354612] Code: 00 00 00 49 81 bc 24 b0 02 00 00 e0 b8 19 ba 75 5e 4d 89 f7 eb 27 4c 89 ff e8 83 9e ff ff 84 c0 75 3f 4c 89 ff e8 67 11 ff ff <4c> 8b b8 a0 00 00 00 48 89 c3 4d 85 ff 0f 84 01 01 00 00 49 8b 47
<4>[626079.354613] RSP: 0000:ffffb5d93a157e30 EFLAGS: 00010082
<4>[626079.354614] RAX: 0000000000000000 RBX: ffff9bd83e3b3640 RCX: 0000000000000800
<4>[626079.354615] RDX: e55c7ab36e3e44c0 RSI: 0000000000000850 RDI: 0000000000000050
<4>[626079.354616] RBP: ffff9bd83e3b3640 R08: 0000000000000050 R09: 0000000000000002
<4>[626079.354617] R10: ffff9bd83e3b3790 R11: 0000000000000000 R12: ffff9bbe12582380
<4>[626079.354617] R13: ffffb5d93a157eb8 R14: ffff9bd83e3b3740 R15: ffff9bd83e3b3740
<4>[626079.354618] FS: 00007ffff7b37740(0000) GS:ffff9bd83e380000(0000) knlGS:0000000000000000
<4>[626079.354619] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[626079.354620] CR2: 00000000000000a0 CR3: 000000038926a000 CR4: 0000000000f50ef0
<4>[626079.354621] PKRU: 55555554
<4>[626079.354622] Call Trace:
<4>[626079.354624] <TASK>
<4>[626079.354628] ? __die+0x23/0x70
<4>[626079.354631] ? page_fault_oops+0x171/0x4e0
<4>[626079.354634] ? exc_page_fault+0x72/0x160
<4>[626079.354637] ? asm_exc_page_fault+0x26/0x30
<4>[626079.354641] ? pick_next_task_fair+0x89/0x5a0
<4>[626079.354643] __schedule+0x185/0x1550
<4>[626079.354645] ? sched_clock+0x10/0x30
<4>[626079.354647] ? __do_softirq+0x17a/0x2ca
<4>[626079.354650] schedule+0x32/0xd0
<4>[626079.354651] irqentry_exit_to_user_mode+0x1dc/0x230
<4>[626079.354653] asm_sysvec_apic_timer_interrupt+0x1a/0x20
<4>[626079.354654] RIP: 0033:0x7ffff7e5a550
<4>[626079.354685] Code: 00 0c 00 00 00 eb cb e8 1e 8f ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 89 f5 53 48 89 fb 48 83 ec 08 8b 47 54 39 c6 <73> 11 66 0f 1f 44 00 00 48 8b 5b 48 8b 43 54 39 c5 72 f5 89 ea 29
<4>[626079.354686] RSP: 002b:00007fffffff64c0 EFLAGS: 00000202
<4>[626079.354687] RAX: 0000000000000000 RBX: 00000000004151a0 RCX: 0000000000008df2
<4>[626079.354688] RDX: 00000000000ca7b2 RSI: 00000000000ca7bf RDI: 00000000004151a0
<4>[626079.354688] RBP: 00000000000ca7bf R08: 000000007fffffff R09: 000000000001a6f7
<4>[626079.354689] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffff7e99b34
<4>[626079.354690] R13: 00007ffff7148e34 R14: 00007ffff7148d64 R15: 00000000000ca7bf
<4>[626079.354691] </TASK>
<4>[626079.354691] Modules linked in: uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 snd_usb_audio videodev snd_usbmidi_lib snd_ump videobuf2_common snd_rawmidi mc nhpoly1305_avx2 nhpoly1305_sse2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 cast5_avx_x86_64 cast5_generic cast_common blowfish_generic blowfish_x86_64 blowfish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic xts twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common aes_generic lrw tcp_diag inet_diag snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ctr af_packet ccm algif_aead crypto_null des3_ede_x86_64 cbc des_generic libdes algif_skcipher dummy cmac md4 algif_hash af_alg msr nft_chain_nat xt_MASQUERADE nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 amdgpu ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat iwlmvm nls_utf8 nls_cp866 tcp_bbr
<4>[626079.354725] vfat nf_tables sch_fq fat mac80211 tls btusb edac_mce_amd btrtl atkbd btintel snd_hda_codec_hdmi edac_core btbcm intel_rapl_msr btmtk libps2 serio libarc4 snd_hda_intel intel_rapl_common vivaldi_fmap gigabyte_wmi mxm_wmi wmi_bmof snd_intel_dspcfg crc32_pclmul bluetooth polyval_clmulni snd_intel_sdw_acpi polyval_generic amdxcp loop drm_exec gf128mul iwlwifi snd_hda_codec ghash_clmulni_intel gpu_sched cpufreq_ondemand sha512_ssse3 sha512_generic drm_buddy tun drm_suballoc_helper sha256_ssse3 drm_ttm_helper ecdh_generic tap input_leds snd_hda_core sha1_ssse3 aesni_intel ecc mousedev evdev ttm macvlan snd_hwdep led_class crypto_simd crc16 igb joydev mac_hid libaes bridge drm_display_helper snd_pcm cryptd cfg80211 stp snd_timer rapl cec ptp sp5100_tco llc snd acpi_cpufreq pps_core tiny_power_button watchdog video soundcore backlight dca k10temp rfkill i2c_algo_bit i2c_piix4 kvm_amd wmi thermal button ccp rng_core kvm irqbypass fuse configfs efi_pstore nfnetlink zstd zram efivarfs dmi_sysfs ip_tables x_tables
<4>[626079.354766] autofs4 sd_mod hid_generic usbhid hid ahci libahci libata xhci_pci xhci_pci_renesas nvme firmware_class nvme_core xhci_hcd scsi_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic scsi_common crct10dif_pclmul crct10dif_common rtc_cmos dm_mod dax btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq
<4>[626079.354780] CR2: 00000000000000a0
<4>[626079.354781] ---[ end trace 0000000000000000 ]---

--

Sergei


2024-04-15 10:59:22

by Peter Zijlstra

Subject: Re: 6.8 to 6.9.0-rc3: kernel NULL pointer dereference in pick_next_task_fair+0x89

On Sun, Apr 14, 2024 at 09:50:18AM +0100, Sergei Trofimovich wrote:
> Hi kernel/sched/ maintainers!
>
> Over past few days my machines started OOpsing when a nightly kernel
> build starts. I don't have a reliable reproducer. The builder should use
> `idle` CPU scheduling policy.
>

Thanks for the report, some of us have been chasing this ghost for a
little while and so far it has proven elusive because it's very hard to
reliably reproduce.

I'll go try your approach.

2024-04-16 06:21:43

by Igor Raits

Subject: Re: 6.8 to 6.9.0-rc3: kernel NULL pointer dereference in pick_next_task_fair+0x89

Hello all,

We also see this issue quite frequently these days; however, it is
partly hidden behind a printk issue: the full stack trace can't be
printed because of a deadlock.

If you have any patch to try out, we are happy to test it.

2024-04-16 16:23:33

by Peter Zijlstra

Subject: Re: 6.8 to 6.9.0-rc3: kernel NULL pointer dereference in pick_next_task_fair+0x89

On Tue, Apr 16, 2024 at 08:18:28AM +0200, Igor Raits wrote:
> Hello all,
>
> We also see this issue quite frequently these days, however it gets
> slightly hidden behind printk issue so that full stacktrace can't be
> printed due to the deadlock.

Who is we and how do you make it go bang?

Having a semi reliable reproducer in hand would be a tremendous help.


Anyway, I've had:

schedtool -D -e bash -c "while :; do make O=defconfig-build clean; make O=defconfig-build -j64; done"

running for an hour or so now, but no luck :/

2024-04-16 16:46:46

by Igor Raits

Subject: Re: 6.8 to 6.9.0-rc3: kernel NULL pointer dereference in pick_next_task_fair+0x89

On Tue, Apr 16, 2024 at 6:23 PM Peter Zijlstra wrote:
>
> On Tue, Apr 16, 2024 at 08:18:28AM +0200, Igor Raits wrote:
> > Hello all,
> >
> > We also see this issue quite frequently these days, however it gets
> > slightly hidden behind printk issue so that full stacktrace can't be
> > printed due to the deadlock.
>
> Who is we and how do you make it go bang?

We = our company, in production :) We simply run lots of parallel
instances of a Java application inside a container that consumes a lot
of CPU, and once it finishes, a new container is started with the same
app processing a different load. I don't have anything more specific
than "lots of Java processes spawning in parallel and consuming a lot
of CPU" somehow triggering this. Sadly, the stack trace can't be
printed because of the printk deadlock, but this is what the VM looks
like:

PID: 34302 TASK: ffff9db7a020b400 CPU: 0 COMMAND: "C2 CompilerThre"
[exception RIP: native_queued_spin_lock_slowpath+496]
RIP: ffffffff994d81a0 RSP: ffffadef43a67d48 RFLAGS: 00000002
RAX: 0000000000180101 RBX: ffff9dc443c34080 RCX: 0000000000040000
RDX: 0000000000000000 RSI: 0000000000000100 RDI: ffff9dc443c73280
RBP: ffff9dc443c73280 R8: 0000000000000006 R9: fffffffffffffffc
R10: 0000000000000006 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000007aaf R15: ffff9db600a5e800
CS: 0010 SS: 0000
#0 [ffffadef43a67d68] _raw_spin_lock at ffffffff994d7de5
#1 [ffffadef43a67d70] raw_spin_rq_lock_nested at ffffffff9894be19
#2 [ffffadef43a67d88] load_balance at ffffffff98966c5d
#3 [ffffadef43a67e68] rebalance_domains at ffffffff98967d5a
#4 [ffffadef43a67ed8] __do_softirq at ffffffff994d8a08
#5 [ffffadef43a67f28] irq_exit_rcu at ffffffff9890f0e6
#6 [ffffadef43a67f38] sysvec_apic_timer_interrupt at ffffffff994c2aec
#7 [ffffadef43a67f50] asm_sysvec_apic_timer_interrupt at ffffffff99600e06
RIP: 00007f8d3168e9da RSP: 00007f8d27ffdb90 RFLAGS: 00000246
RAX: 0000000000000000 RBX: 00007f8c80992cc0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007f8d31518820
RBP: 00007f8d27ffdbb0 R8: 00007f8cc80b7200 R9: 0000000000000000
R10: 00007f8d284195a0 R11: 0b99c62094d7ff00 R12: 00007f8d27ffdbd7
R13: 00007f8d27ffdd60 R14: 00007f8d314b2998 R15: 00007f8d2801c910
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

PID: 37 TASK: ffff9db600a6ce00 CPU: 1 COMMAND: "ksoftirqd/1"
[exception RIP: native_queued_spin_lock_slowpath+114]
RIP: ffffffff994d8022 RSP: ffffadef4016f840 RFLAGS: 00000002
RAX: 0000000000000001 RBX: ffff9db9ccab0000 RCX: ffff9db9ccab0428
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9dc443c73280
RBP: ffff9dc443c73280 R8: ffff9dc443c72c28 R9: ffff9dc443c72c28
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9dc443c73280
R13: ffff9db9ccab0cd4 R14: 0000000000000006 R15: 0000000000000001
CS: 0010 SS: 0018
#0 [ffffadef4016f860] _raw_spin_lock at ffffffff994d7de5
#1 [ffffadef4016f868] raw_spin_rq_lock_nested at ffffffff9894be19
#2 [ffffadef4016f880] try_to_wake_up at ffffffff98956aff
#3 [ffffadef4016f8d8] kick_pool at ffffffff989281a9
#4 [ffffadef4016f8f0] __queue_work at ffffffff9892b8c7
#5 [ffffadef4016f938] queue_work_on at ffffffff9892bb04
#6 [ffffadef4016f948] soft_cursor at ffffffff98f00540
#7 [ffffadef4016f9a0] bit_cursor at ffffffff98f000b5
#8 [ffffadef4016fa68] hide_cursor at ffffffff98fca13b
#9 [ffffadef4016fa78] vt_console_print at ffffffff98fcc761
#10 [ffffadef4016fae0] console_flush_all at ffffffff98990d7b
#11 [ffffadef4016fb58] console_unlock at ffffffff98990fe6
#12 [ffffadef4016fb90] vprintk_emit at ffffffff98991f94
#13 [ffffadef4016fbd8] _printk at ffffffff9898e248
#14 [ffffadef4016fc38] show_fault_oops at ffffffff988930ce
#15 [ffffadef4016fc90] page_fault_oops at ffffffff988931ce
#16 [ffffadef4016fce8] exc_page_fault at ffffffff994c3495
#17 [ffffadef4016fd10] asm_exc_page_fault at ffffffff99600bb2
[exception RIP: pick_next_task_fair+251]
RIP: ffffffff9896769b RSP: ffffadef4016fdc8 RFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff9dc443c73280 RCX: 0000000000000806
RDX: 1272dd0281c36788 RSI: 0000000000000806 RDI: 009b28f004606f7f
RBP: ffff9db67ca46800 R8: 0000000000000000 R9: 0000000000000000
R10: 000000000000014b R11: 0000000000000000 R12: ffffadef4016fe60
R13: ffff9db600a6ce00 R14: ffff9dc443c73300 R15: ffff9db67ca46c00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#18 [ffffadef4016fdf8] pick_next_task at ffffffff9894f06a
#19 [ffffadef4016fe58] __schedule at ffffffff994d08ed
#20 [ffffadef4016fec0] schedule at ffffffff994d0f7e
#21 [ffffadef4016fed8] smpboot_thread_fn at ffffffff98940d57
#22 [ffffadef4016fef8] kthread at ffffffff989354cf
#23 [ffffadef4016ff30] ret_from_fork at ffffffff9884077d
#24 [ffffadef4016ff50] ret_from_fork_asm at ffffffff988041bb

PID: 33793 TASK: ffff9db792449a00 CPU: 2 COMMAND: "java"
[exception RIP: native_queued_spin_lock_slowpath+639]
RIP: ffffffff994d822f RSP: ffffadef44027928 RFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff9dc443cb4080 RCX: 00000000000c0000
RDX: 0000000000000004 RSI: 0000000000140100 RDI: ffff9dc443c73280
RBP: ffff9dc443c73280 R8: 0000000000000006 R9: fffffffffffffffc
R10: 0000000000000006 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000002 R14: ffff9dc443cb3280 R15: 0000000000000000
CS: 0010 SS: 0018
#0 [ffffadef44027948] _raw_spin_lock at ffffffff994d7de5
#1 [ffffadef44027950] raw_spin_rq_lock_nested at ffffffff9894be19
#2 [ffffadef44027968] load_balance at ffffffff98966c5d
#3 [ffffadef44027a48] newidle_balance at ffffffff98967323
#4 [ffffadef44027aa8] pick_next_task_fair at ffffffff989675e1
#5 [ffffadef44027ae0] pick_next_task at ffffffff9894f06a
#6 [ffffadef44027b40] __schedule at ffffffff994d08ed
#7 [ffffadef44027ba8] schedule at ffffffff994d0f7e
#8 [ffffadef44027bc0] schedule_timeout at ffffffff994d6c58
#9 [ffffadef44027c10] wait_woken at ffffffff98977750
#10 [ffffadef44027c28] sk_wait_data at ffffffff99240f26
#11 [ffffadef44027c88] tcp_recvmsg_locked at ffffffff99332b5f
#12 [ffffadef44027d18] tcp_recvmsg at ffffffff99333d61
#13 [ffffadef44027d88] inet_recvmsg at ffffffff99371aa2
#14 [ffffadef44027dc8] sock_recvmsg at ffffffff99237fc5
#15 [ffffadef44027df0] __sys_recvfrom at ffffffff9923a5d0
#16 [ffffadef44027f20] __x64_sys_recvfrom at ffffffff9923a670
#17 [ffffadef44027f28] do_syscall_64 at ffffffff994bd52e
#18 [ffffadef44027f50] entry_SYSCALL_64_after_hwframe at ffffffff99600130
RIP: 00007f93ea30f66e RSP: 00007f93ea5e54b0 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f93e804c1b0 RCX: 00007f93ea30f66e
RDX: 0000000000000005 RSI: 00007f93ea5e5590 RDI: 000000000000005e
RBP: 00007f93ea5e5560 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f93ea5e5510 R14: 00007f93ea5e5590 R15: 00007f93ea5ff5c0
ORIG_RAX: 000000000000002d CS: 0033 SS: 002b

PID: 33821 TASK: ffff9db6c49d0000 CPU: 3 COMMAND: "C2 CompilerThre"
[exception RIP: native_queued_spin_lock_slowpath+639]
RIP: ffffffff994d822f RSP: ffffadef44baba98 RFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff9dc443cf4080 RCX: 0000000000100000
RDX: 0000000000000002 RSI: 00000000000c0100 RDI: ffff9dc443c73280
RBP: ffff9dc443c73280 R8: 0000000000000006 R9: fffffffffffffffc
R10: 0000000000000006 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000003 R14: ffff9dc443cf3280 R15: 0000000000000000
CS: 0010 SS: 0018
#0 [ffffadef44babab8] _raw_spin_lock at ffffffff994d7de5
#1 [ffffadef44babac0] raw_spin_rq_lock_nested at ffffffff9894be19
#2 [ffffadef44babad8] load_balance at ffffffff98966c5d
#3 [ffffadef44babbb8] newidle_balance at ffffffff98967323
#4 [ffffadef44babc18] pick_next_task_fair at ffffffff989675e1
#5 [ffffadef44babc50] pick_next_task at ffffffff9894f06a
#6 [ffffadef44babcb0] __schedule at ffffffff994d08ed
#7 [ffffadef44babd18] schedule at ffffffff994d0f7e
#8 [ffffadef44babd30] futex_wait_queue at ffffffff989eca00
#9 [ffffadef44babd50] __futex_wait at ffffffff989ed0b9
#10 [ffffadef44babe10] futex_wait at ffffffff989ed1a7
#11 [ffffadef44babea0] do_futex at ffffffff989e9388
#12 [ffffadef44babeb0] __x64_sys_futex at ffffffff989e9a43
#13 [ffffadef44babf28] do_syscall_64 at ffffffff994bd52e
#14 [ffffadef44babf50] entry_SYSCALL_64_after_hwframe at ffffffff99600130
RIP: 00007f93ea28679a RSP: 00007f93bd0fe990 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f93bd0feaa0 RCX: 00007f93ea28679a
RDX: 0000000000000000 RSI: 0000000000000089 RDI: 00007f93e4419b7c
RBP: 0000000000000000 R8: 0000000000000000 R9: 00000000ffffffff
R10: 00007f93bd0feaa0 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f93e4419b7c R14: 0000000000000000 R15: 00007f93e4419b28
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b

PID: 34261 TASK: ffff9db62162b400 CPU: 4 COMMAND: "java"
[exception RIP: native_queued_spin_lock_slowpath+639]
RIP: ffffffff994d822f RSP: ffffadef445efd48 RFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff9dc443d34080 RCX: 0000000000140000
RDX: 0000000000000000 RSI: 0000000000040100 RDI: ffff9dc443c73280
RBP: ffff9dc443c73280 R8: 0000000000000006 R9: fffffffffffffffc
R10: 0000000000000006 R11: 000000000000002c R12: 0000000000000000
R13: 0000000000000004 R14: 000000000001a23e R15: ffff9db600a70e00
CS: 0010 SS: 0000
#0 [ffffadef445efd68] _raw_spin_lock at ffffffff994d7de5
#1 [ffffadef445efd70] raw_spin_rq_lock_nested at ffffffff9894be19
#2 [ffffadef445efd88] load_balance at ffffffff98966c5d
#3 [ffffadef445efe68] rebalance_domains at ffffffff98967d5a
#4 [ffffadef445efed8] __do_softirq at ffffffff994d8a08
#5 [ffffadef445eff28] irq_exit_rcu at ffffffff9890f0e6
#6 [ffffadef445eff38] sysvec_apic_timer_interrupt at ffffffff994c2aec
#7 [ffffadef445eff50] asm_sysvec_apic_timer_interrupt at ffffffff99600e06
RIP: 00007f8d1224ed4b RSP: 00007f8d3191c9c0 RFLAGS: 00000293
RAX: 00000007593ce380 RBX: 0000000000000022 RCX: 0000000000000188
RDX: 0000000000000022 RSI: 00000007414076a8 RDI: 00000000000002e4
RBP: 00000007595fad38 R8: 00000007593ce090 R9: 00000000000020fe
R10: 00007f8d31cb4000 R11: 00000000e81a149f R12: 0000000000000000
R13: 00000007593ce210 R14: 00000007593d5028 R15: 00007f8d28027800
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

PID: 35079 TASK: ffff9db70682ce00 CPU: 5 COMMAND: "C2 CompilerThre"
[exception RIP: native_queued_spin_lock_slowpath+639]
RIP: ffffffff994d822f RSP: ffffadef4478fa98 RFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff9dc443d74080 RCX: 0000000000180000
RDX: 0000000000000003 RSI: 0000000000100100 RDI: ffff9dc443c73280
RBP: ffff9dc443c73280 R8: 0000000000000006 R9: fffffffffffffffc
R10: 0000000000000006 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000005 R14: ffff9dc443d73280 R15: 0000000000000000
CS: 0010 SS: 0018
#0 [ffffadef4478fab8] _raw_spin_lock at ffffffff994d7de5
#1 [ffffadef4478fac0] raw_spin_rq_lock_nested at ffffffff9894be19
#2 [ffffadef4478fad8] load_balance at ffffffff98966c5d
#3 [ffffadef4478fbb8] newidle_balance at ffffffff98967323
#4 [ffffadef4478fc18] pick_next_task_fair at ffffffff989675e1
#5 [ffffadef4478fc50] pick_next_task at ffffffff9894f06a
#6 [ffffadef4478fcb0] __schedule at ffffffff994d08ed
#7 [ffffadef4478fd18] schedule at ffffffff994d0f7e
#8 [ffffadef4478fd30] futex_wait_queue at ffffffff989eca00
#9 [ffffadef4478fd50] __futex_wait at ffffffff989ed0b9
#10 [ffffadef4478fe10] futex_wait at ffffffff989ed1a7
#11 [ffffadef4478fea0] do_futex at ffffffff989e9388
#12 [ffffadef4478feb0] __x64_sys_futex at ffffffff989e9a43
#13 [ffffadef4478ff28] do_syscall_64 at ffffffff994bd52e
#14 [ffffadef4478ff50] entry_SYSCALL_64_after_hwframe at ffffffff99600130
RIP: 00007f1a7188679a RSP: 00007f1a505fe990 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f1a505feaa0 RCX: 00007f1a7188679a
RDX: 0000000000000000 RSI: 0000000000000089 RDI: 00007f1a6c419b7c
RBP: 0000000000000000 R8: 0000000000000000 R9: 00000000ffffffff
R10: 00007f1a505feaa0 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f1a6c419b7c R14: 0000000000000000 R15: 00007f1a6c419b28
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b


> Having a semi reliable reproducer in hand would be a tremendous help.

We don't have a specific reproducer, but it happens on one specific
type of VM workload and does not crash anywhere else. That workload
can be described as Java + Ruby + containers + load + parallelism + NFS.

>
> Anyway, I've had:
>
> schedtool -D -e bash -c "while :; do make O=defconfig-build clean; make O=defconfig-build -j64; done"
>
> running for an hour or so now, but no luck :/