Hello
the perf_fuzzer has turned up a repeatable crash on my haswell system.
addr2line is not being very helpful, it points to DECLARE_PER_CPU_FIRST.
I'll investigate more when I have the chance.
Vince
[96289.009646] BUG: kernel NULL pointer dereference, address: 0000000000000150
[96289.017094] #PF: supervisor read access in kernel mode
[96289.022588] #PF: error_code(0x0000) - not-present page
[96289.028069] PGD 0 P4D 0
[96289.030796] Oops: 0000 [#1] SMP PTI
[96289.034549] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.11.0-rc5+ #151
[96289.043059] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[96289.050946] RIP: 0010:intel_pmu_drain_pebs_nhm+0x464/0x5f0
[96289.056817] Code: 09 00 00 0f b6 c0 49 39 c4 74 2a 48 63 82 78 09 00 00 48 01 c5 48 39 6c 24 08 76 17 0f b6 05 14 70 3f 01 83 e0 0f 3c 03 77 a4 <48> 8b 85 90 00 00 00 eb 9f 31 ed 83 eb 01 83 fb 01 0f 85 30 ff ff
[96289.076876] RSP: 0000:ffffffff822039e0 EFLAGS: 00010097
[96289.082468] RAX: 0000000000000002 RBX: 0000000000000155 RCX: 0000000000000008
[96289.090095] RDX: ffff88811ac118a0 RSI: ffffffff82203980 RDI: ffffffff82203980
[96289.097746] RBP: 00000000000000c0 R08: 0000000000000000 R09: 0000000000000000
[96289.105376] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[96289.113008] R13: ffffffff82203bc0 R14: ffff88801c3cf800 R15: ffffffff829814a0
[96289.120671] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[96289.129346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[96289.135526] CR2: 0000000000000150 CR3: 000000000220c003 CR4: 00000000001706f0
[96289.143159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[96289.150803] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[96289.158414] Call Trace:
[96289.161041] ? update_blocked_averages+0x532/0x620
[96289.166152] ? update_group_capacity+0x25/0x1d0
[96289.171025] ? cpumask_next_and+0x19/0x20
[96289.175339] ? update_sd_lb_stats.constprop.0+0x702/0x820
[96289.181105] intel_pmu_drain_pebs_buffer+0x33/0x50
[96289.186259] ? x86_pmu_commit_txn+0xbc/0xf0
[96289.190749] ? _raw_spin_lock_irqsave+0x1d/0x30
[96289.195603] ? timerqueue_add+0x64/0xb0
[96289.199720] ? update_load_avg+0x6c/0x5e0
[96289.204001] ? enqueue_task_fair+0x98/0x5a0
[96289.208464] ? timerqueue_del+0x1e/0x40
[96289.212556] ? uncore_msr_read_counter+0x10/0x20
[96289.217513] intel_pmu_pebs_disable+0x12a/0x130
[96289.222324] x86_pmu_stop+0x48/0xa0
[96289.226076] x86_pmu_del+0x40/0x160
[96289.229813] event_sched_out.isra.0+0x81/0x1e0
[96289.234602] group_sched_out.part.0+0x4f/0xc0
[96289.239257] __perf_event_disable+0xef/0x1d0
[96289.243831] event_function+0x8c/0xd0
[96289.247785] remote_function+0x3e/0x50
[96289.251797] flush_smp_call_function_queue+0x11b/0x1a0
[96289.257268] flush_smp_call_function_from_idle+0x38/0x60
[96289.262944] do_idle+0x15f/0x240
[96289.266421] cpu_startup_entry+0x19/0x20
[96289.270639] start_kernel+0x7df/0x804
[96289.274558] ? apply_microcode_early.cold+0xc/0x27
[96289.279678] secondary_startup_64_no_verify+0xb0/0xbb
[96289.285078] Modules linked in: nf_tables libcrc32c nfnetlink intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi x86_pkg_temp_thermal ledtrig_audio intel_powerclamp snd_hda_intel coretemp snd_intel_dspcfg snd_hda_codec snd_hda_core kvm_intel kvm snd_hwdep irqbypass at24 snd_pcm tpm_tis crct10dif_pclmul snd_timer crc32_pclmul regmap_i2c wmi_bmof sg tpm_tis_core snd ghash_clmulni_intel tpm iTCO_wdt aesni_intel soundcore rng_core iTCO_vendor_support crypto_simd mei_me mei cryptd pcspkr evdev glue_helper binfmt_misc ip_tables x_tables autofs4 sr_mod sd_mod t10_pi cdrom i915 iosf_mbi ahci i2c_algo_bit libahci drm_kms_helper xhci_pci ehci_pci ehci_hcd libata xhci_hcd lpc_ich usbcore i2c_i801 drm crc32c_intel e1000e mfd_core scsi_mod usb_common i2c_smbus wmi fan thermal video button
[96289.362498] CR2: 0000000000000150
[96289.366070] ---[ end trace 80c577f99562015f ]---
[96289.371007] RIP: 0010:intel_pmu_drain_pebs_nhm+0x464/0x5f0
[96289.376868] Code: 09 00 00 0f b6 c0 49 39 c4 74 2a 48 63 82 78 09 00 00 48 01 c5 48 39 6c 24 08 76 17 0f b6 05 14 70 3f 01 83 e0 0f 3c 03 77 a4 <48> 8b 85 90 00 00 00 eb 9f 31 ed 83 eb 01 83 fb 01 0f 85 30 ff ff
[96289.396981] RSP: 0000:ffffffff822039e0 EFLAGS: 00010097
[96289.402573] RAX: 0000000000000002 RBX: 0000000000000155 RCX: 0000000000000008
[96289.410226] RDX: ffff88811ac118a0 RSI: ffffffff82203980 RDI: ffffffff82203980
[96289.417841] RBP: 00000000000000c0 R08: 0000000000000000 R09: 0000000000000000
[96289.425461] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[96289.433122] R13: ffffffff82203bc0 R14: ffff88801c3cf800 R15: ffffffff829814a0
[96289.440774] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[96289.449374] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[96289.455507] CR2: 0000000000000150 CR3: 000000000220c003 CR4: 00000000001706f0
[96289.463119] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[96289.470764] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[96289.478408] Kernel panic - not syncing: Attempted to kill the idle task!
[96289.485598] Kernel Offset: disabled
[96289.489355] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
On Thu, 28 Jan 2021, Vince Weaver wrote:
> the perf_fuzzer has turned up a repeatable crash on my haswell system.
>
> addr2line is not being very helpful, it points to DECLARE_PER_CPU_FIRST.
> I'll investigate more when I have the chance.
so I poked around some more.
This seems to be caused in
__intel_pmu_pebs_event()
get_next_pebs_record_by_bit() ds.c line 1639
get_pebs_status(at) ds.c line 1317
return ((struct pebs_record_nhm *)n)->status;
where "n" has the value of 0xc0 rather than a proper pointer.
this does seem to be repetable, but fairly deep in a fuzzing run so I
don't have a quick reproducer.
Vince
> [96289.009646] BUG: kernel NULL pointer dereference, address: 0000000000000150
> [96289.017094] #PF: supervisor read access in kernel mode
> [96289.022588] #PF: error_code(0x0000) - not-present page
> [96289.028069] PGD 0 P4D 0
> [96289.030796] Oops: 0000 [#1] SMP PTI
> [96289.034549] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.11.0-rc5+ #151
> [96289.043059] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [96289.050946] RIP: 0010:intel_pmu_drain_pebs_nhm+0x464/0x5f0
> [96289.056817] Code: 09 00 00 0f b6 c0 49 39 c4 74 2a 48 63 82 78 09 00 00 48 01 c5 48 39 6c 24 08 76 17 0f b6 05 14 70 3f 01 83 e0 0f 3c 03 77 a4 <48> 8b 85 90 00 00 00 eb 9f 31 ed 83 eb 01 83 fb 01 0f 85 30 ff ff
> [96289.076876] RSP: 0000:ffffffff822039e0 EFLAGS: 00010097
> [96289.082468] RAX: 0000000000000002 RBX: 0000000000000155 RCX: 0000000000000008
> [96289.090095] RDX: ffff88811ac118a0 RSI: ffffffff82203980 RDI: ffffffff82203980
> [96289.097746] RBP: 00000000000000c0 R08: 0000000000000000 R09: 0000000000000000
> [96289.105376] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [96289.113008] R13: ffffffff82203bc0 R14: ffff88801c3cf800 R15: ffffffff829814a0
> [96289.120671] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
> [96289.129346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [96289.135526] CR2: 0000000000000150 CR3: 000000000220c003 CR4: 00000000001706f0
> [96289.143159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [96289.150803] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> [96289.158414] Call Trace:
> [96289.161041] ? update_blocked_averages+0x532/0x620
> [96289.166152] ? update_group_capacity+0x25/0x1d0
> [96289.171025] ? cpumask_next_and+0x19/0x20
> [96289.175339] ? update_sd_lb_stats.constprop.0+0x702/0x820
> [96289.181105] intel_pmu_drain_pebs_buffer+0x33/0x50
> [96289.186259] ? x86_pmu_commit_txn+0xbc/0xf0
> [96289.190749] ? _raw_spin_lock_irqsave+0x1d/0x30
> [96289.195603] ? timerqueue_add+0x64/0xb0
> [96289.199720] ? update_load_avg+0x6c/0x5e0
> [96289.204001] ? enqueue_task_fair+0x98/0x5a0
> [96289.208464] ? timerqueue_del+0x1e/0x40
> [96289.212556] ? uncore_msr_read_counter+0x10/0x20
> [96289.217513] intel_pmu_pebs_disable+0x12a/0x130
> [96289.222324] x86_pmu_stop+0x48/0xa0
> [96289.226076] x86_pmu_del+0x40/0x160
> [96289.229813] event_sched_out.isra.0+0x81/0x1e0
> [96289.234602] group_sched_out.part.0+0x4f/0xc0
> [96289.239257] __perf_event_disable+0xef/0x1d0
> [96289.243831] event_function+0x8c/0xd0
> [96289.247785] remote_function+0x3e/0x50
> [96289.251797] flush_smp_call_function_queue+0x11b/0x1a0
> [96289.257268] flush_smp_call_function_from_idle+0x38/0x60
> [96289.262944] do_idle+0x15f/0x240
> [96289.266421] cpu_startup_entry+0x19/0x20
> [96289.270639] start_kernel+0x7df/0x804
> [96289.274558] ? apply_microcode_early.cold+0xc/0x27
> [96289.279678] secondary_startup_64_no_verify+0xb0/0xbb
> [96289.285078] Modules linked in: nf_tables libcrc32c nfnetlink intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi x86_pkg_temp_thermal ledtrig_audio intel_powerclamp snd_hda_intel coretemp snd_intel_dspcfg snd_hda_codec snd_hda_core kvm_intel kvm snd_hwdep irqbypass at24 snd_pcm tpm_tis crct10dif_pclmul snd_timer crc32_pclmul regmap_i2c wmi_bmof sg tpm_tis_core snd ghash_clmulni_intel tpm iTCO_wdt aesni_intel soundcore rng_core iTCO_vendor_support crypto_simd mei_me mei cryptd pcspkr evdev glue_helper binfmt_misc ip_tables x_tables autofs4 sr_mod sd_mod t10_pi cdrom i915 iosf_mbi ahci i2c_algo_bit libahci drm_kms_helper xhci_pci ehci_pci ehci_hcd libata xhci_hcd lpc_ich usbcore i2c_i801 drm crc32c_intel e1000e mfd_core scsi_mod usb_common i2c_smbus wmi fan thermal video button
> [96289.362498] CR2: 0000000000000150
> [96289.366070] ---[ end trace 80c577f99562015f ]---
> [96289.371007] RIP: 0010:intel_pmu_drain_pebs_nhm+0x464/0x5f0
> [96289.376868] Code: 09 00 00 0f b6 c0 49 39 c4 74 2a 48 63 82 78 09 00 00 48 01 c5 48 39 6c 24 08 76 17 0f b6 05 14 70 3f 01 83 e0 0f 3c 03 77 a4 <48> 8b 85 90 00 00 00 eb 9f 31 ed 83 eb 01 83 fb 01 0f 85 30 ff ff
> [96289.396981] RSP: 0000:ffffffff822039e0 EFLAGS: 00010097
> [96289.402573] RAX: 0000000000000002 RBX: 0000000000000155 RCX: 0000000000000008
> [96289.410226] RDX: ffff88811ac118a0 RSI: ffffffff82203980 RDI: ffffffff82203980
> [96289.417841] RBP: 00000000000000c0 R08: 0000000000000000 R09: 0000000000000000
> [96289.425461] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [96289.433122] R13: ffffffff82203bc0 R14: ffff88801c3cf800 R15: ffffffff829814a0
> [96289.440774] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
> [96289.449374] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [96289.455507] CR2: 0000000000000150 CR3: 000000000220c003 CR4: 00000000001706f0
> [96289.463119] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [96289.470764] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> [96289.478408] Kernel panic - not syncing: Attempted to kill the idle task!
> [96289.485598] Kernel Offset: disabled
> [96289.489355] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
>
Kan, do you have time to look at this?
On Thu, Jan 28, 2021 at 02:49:47PM -0500, Vince Weaver wrote:
> On Thu, 28 Jan 2021, Vince Weaver wrote:
>
> > the perf_fuzzer has turned up a repeatable crash on my haswell system.
> >
> > addr2line is not being very helpful, it points to DECLARE_PER_CPU_FIRST.
> > I'll investigate more when I have the chance.
>
> so I poked around some more.
>
> This seems to be caused in
>
> __intel_pmu_pebs_event()
> get_next_pebs_record_by_bit() ds.c line 1639
> get_pebs_status(at) ds.c line 1317
> return ((struct pebs_record_nhm *)n)->status;
>
> where "n" has the value of 0xc0 rather than a proper pointer.
>
> this does seem to be repetable, but fairly deep in a fuzzing run so I
> don't have a quick reproducer.
>
> Vince
>
>
> > [96289.009646] BUG: kernel NULL pointer dereference, address: 0000000000000150
> > [96289.017094] #PF: supervisor read access in kernel mode
> > [96289.022588] #PF: error_code(0x0000) - not-present page
> > [96289.028069] PGD 0 P4D 0
> > [96289.030796] Oops: 0000 [#1] SMP PTI
> > [96289.034549] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.11.0-rc5+ #151
> > [96289.043059] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> > [96289.050946] RIP: 0010:intel_pmu_drain_pebs_nhm+0x464/0x5f0
> > [96289.056817] Code: 09 00 00 0f b6 c0 49 39 c4 74 2a 48 63 82 78 09 00 00 48 01 c5 48 39 6c 24 08 76 17 0f b6 05 14 70 3f 01 83 e0 0f 3c 03 77 a4 <48> 8b 85 90 00 00 00 eb 9f 31 ed 83 eb 01 83 fb 01 0f 85 30 ff ff
> > [96289.076876] RSP: 0000:ffffffff822039e0 EFLAGS: 00010097
> > [96289.082468] RAX: 0000000000000002 RBX: 0000000000000155 RCX: 0000000000000008
> > [96289.090095] RDX: ffff88811ac118a0 RSI: ffffffff82203980 RDI: ffffffff82203980
> > [96289.097746] RBP: 00000000000000c0 R08: 0000000000000000 R09: 0000000000000000
> > [96289.105376] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> > [96289.113008] R13: ffffffff82203bc0 R14: ffff88801c3cf800 R15: ffffffff829814a0
> > [96289.120671] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
> > [96289.129346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [96289.135526] CR2: 0000000000000150 CR3: 000000000220c003 CR4: 00000000001706f0
> > [96289.143159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [96289.150803] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> > [96289.158414] Call Trace:
> > [96289.161041] ? update_blocked_averages+0x532/0x620
> > [96289.166152] ? update_group_capacity+0x25/0x1d0
> > [96289.171025] ? cpumask_next_and+0x19/0x20
> > [96289.175339] ? update_sd_lb_stats.constprop.0+0x702/0x820
> > [96289.181105] intel_pmu_drain_pebs_buffer+0x33/0x50
> > [96289.186259] ? x86_pmu_commit_txn+0xbc/0xf0
> > [96289.190749] ? _raw_spin_lock_irqsave+0x1d/0x30
> > [96289.195603] ? timerqueue_add+0x64/0xb0
> > [96289.199720] ? update_load_avg+0x6c/0x5e0
> > [96289.204001] ? enqueue_task_fair+0x98/0x5a0
> > [96289.208464] ? timerqueue_del+0x1e/0x40
> > [96289.212556] ? uncore_msr_read_counter+0x10/0x20
> > [96289.217513] intel_pmu_pebs_disable+0x12a/0x130
> > [96289.222324] x86_pmu_stop+0x48/0xa0
> > [96289.226076] x86_pmu_del+0x40/0x160
> > [96289.229813] event_sched_out.isra.0+0x81/0x1e0
> > [96289.234602] group_sched_out.part.0+0x4f/0xc0
> > [96289.239257] __perf_event_disable+0xef/0x1d0
> > [96289.243831] event_function+0x8c/0xd0
> > [96289.247785] remote_function+0x3e/0x50
> > [96289.251797] flush_smp_call_function_queue+0x11b/0x1a0
> > [96289.257268] flush_smp_call_function_from_idle+0x38/0x60
> > [96289.262944] do_idle+0x15f/0x240
> > [96289.266421] cpu_startup_entry+0x19/0x20
> > [96289.270639] start_kernel+0x7df/0x804
> > [96289.274558] ? apply_microcode_early.cold+0xc/0x27
> > [96289.279678] secondary_startup_64_no_verify+0xb0/0xbb
> > [96289.285078] Modules linked in: nf_tables libcrc32c nfnetlink intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi x86_pkg_temp_thermal ledtrig_audio intel_powerclamp snd_hda_intel coretemp snd_intel_dspcfg snd_hda_codec snd_hda_core kvm_intel kvm snd_hwdep irqbypass at24 snd_pcm tpm_tis crct10dif_pclmul snd_timer crc32_pclmul regmap_i2c wmi_bmof sg tpm_tis_core snd ghash_clmulni_intel tpm iTCO_wdt aesni_intel soundcore rng_core iTCO_vendor_support crypto_simd mei_me mei cryptd pcspkr evdev glue_helper binfmt_misc ip_tables x_tables autofs4 sr_mod sd_mod t10_pi cdrom i915 iosf_mbi ahci i2c_algo_bit libahci drm_kms_helper xhci_pci ehci_pci ehci_hcd libata xhci_hcd lpc_ich usbcore i2c_i801 drm crc32c_intel e1000e mfd_core scsi_mod usb_common i2c_smbus wmi fan thermal video button
> > [96289.362498] CR2: 0000000000000150
> > [96289.366070] ---[ end trace 80c577f99562015f ]---
> > [96289.371007] RIP: 0010:intel_pmu_drain_pebs_nhm+0x464/0x5f0
> > [96289.376868] Code: 09 00 00 0f b6 c0 49 39 c4 74 2a 48 63 82 78 09 00 00 48 01 c5 48 39 6c 24 08 76 17 0f b6 05 14 70 3f 01 83 e0 0f 3c 03 77 a4 <48> 8b 85 90 00 00 00 eb 9f 31 ed 83 eb 01 83 fb 01 0f 85 30 ff ff
> > [96289.396981] RSP: 0000:ffffffff822039e0 EFLAGS: 00010097
> > [96289.402573] RAX: 0000000000000002 RBX: 0000000000000155 RCX: 0000000000000008
> > [96289.410226] RDX: ffff88811ac118a0 RSI: ffffffff82203980 RDI: ffffffff82203980
> > [96289.417841] RBP: 00000000000000c0 R08: 0000000000000000 R09: 0000000000000000
> > [96289.425461] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> > [96289.433122] R13: ffffffff82203bc0 R14: ffff88801c3cf800 R15: ffffffff829814a0
> > [96289.440774] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
> > [96289.449374] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [96289.455507] CR2: 0000000000000150 CR3: 000000000220c003 CR4: 00000000001706f0
> > [96289.463119] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [96289.470764] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> > [96289.478408] Kernel panic - not syncing: Attempted to kill the idle task!
> > [96289.485598] Kernel Offset: disabled
> > [96289.489355] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> >
> >
>
On 2/11/2021 9:53 AM, Peter Zijlstra wrote:
>
> Kan, do you have time to look at this?
>
> On Thu, Jan 28, 2021 at 02:49:47PM -0500, Vince Weaver wrote:
>> On Thu, 28 Jan 2021, Vince Weaver wrote:
>>
>>> the perf_fuzzer has turned up a repeatable crash on my haswell system.
>>>
>>> addr2line is not being very helpful, it points to DECLARE_PER_CPU_FIRST.
>>> I'll investigate more when I have the chance.
>>
>> so I poked around some more.
>>
>> This seems to be caused in
>>
>> __intel_pmu_pebs_event()
>> get_next_pebs_record_by_bit() ds.c line 1639
>> get_pebs_status(at) ds.c line 1317
>> return ((struct pebs_record_nhm *)n)->status;
>>
>> where "n" has the value of 0xc0 rather than a proper pointer.
>>
The issue is pretty strange. I haven't found the root cause yet.
The base->status (aka "n") was just used in the intel_pmu_drain_pebs_nhm().
for (at = base; at < top; at += x86_pmu.pebs_record_size) {
struct pebs_record_nhm *p = at;
u64 pebs_status;
pebs_status = p->status & cpuc->pebs_enabled;
pebs_status &= mask;
Then it seems to be modified to 0xc0, and crash at get_pebs_status()
based on your investigation.
The "base" is a local variable. The ds->pebs_buffer_base is assigned to
"base" at the beginning of the function. After that, no one change it.
>> this does seem to be repetable,
I'd like to reproduce it on my machine.
Is this issue only found in a Haswell client machine?
To reproduce the issue, can I use ./perf_fuzzer under
perf_event_tests/fuzzer?
Do I need to apply any parameters with ./perf_fuzzer?
Usually how long does it take to reproduce the issue?
Thanks,
Kan
>> but fairly deep in a fuzzing run so I
>> don't have a quick reproducer. >> Vince
>>
>>
>>> [96289.009646] BUG: kernel NULL pointer dereference, address: 0000000000000150
>>> [96289.017094] #PF: supervisor read access in kernel mode
>>> [96289.022588] #PF: error_code(0x0000) - not-present page
>>> [96289.028069] PGD 0 P4D 0
>>> [96289.030796] Oops: 0000 [#1] SMP PTI
>>> [96289.034549] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.11.0-rc5+ #151
>>> [96289.043059] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>>> [96289.050946] RIP: 0010:intel_pmu_drain_pebs_nhm+0x464/0x5f0
>>> [96289.056817] Code: 09 00 00 0f b6 c0 49 39 c4 74 2a 48 63 82 78 09 00 00 48 01 c5 48 39 6c 24 08 76 17 0f b6 05 14 70 3f 01 83 e0 0f 3c 03 77 a4 <48> 8b 85 90 00 00 00 eb 9f 31 ed 83 eb 01 83 fb 01 0f 85 30 ff ff
>>> [96289.076876] RSP: 0000:ffffffff822039e0 EFLAGS: 00010097
>>> [96289.082468] RAX: 0000000000000002 RBX: 0000000000000155 RCX: 0000000000000008
>>> [96289.090095] RDX: ffff88811ac118a0 RSI: ffffffff82203980 RDI: ffffffff82203980
>>> [96289.097746] RBP: 00000000000000c0 R08: 0000000000000000 R09: 0000000000000000
>>> [96289.105376] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
>>> [96289.113008] R13: ffffffff82203bc0 R14: ffff88801c3cf800 R15: ffffffff829814a0
>>> [96289.120671] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
>>> [96289.129346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [96289.135526] CR2: 0000000000000150 CR3: 000000000220c003 CR4: 00000000001706f0
>>> [96289.143159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [96289.150803] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>> [96289.158414] Call Trace:
>>> [96289.161041] ? update_blocked_averages+0x532/0x620
>>> [96289.166152] ? update_group_capacity+0x25/0x1d0
>>> [96289.171025] ? cpumask_next_and+0x19/0x20
>>> [96289.175339] ? update_sd_lb_stats.constprop.0+0x702/0x820
>>> [96289.181105] intel_pmu_drain_pebs_buffer+0x33/0x50
>>> [96289.186259] ? x86_pmu_commit_txn+0xbc/0xf0
>>> [96289.190749] ? _raw_spin_lock_irqsave+0x1d/0x30
>>> [96289.195603] ? timerqueue_add+0x64/0xb0
>>> [96289.199720] ? update_load_avg+0x6c/0x5e0
>>> [96289.204001] ? enqueue_task_fair+0x98/0x5a0
>>> [96289.208464] ? timerqueue_del+0x1e/0x40
>>> [96289.212556] ? uncore_msr_read_counter+0x10/0x20
>>> [96289.217513] intel_pmu_pebs_disable+0x12a/0x130
>>> [96289.222324] x86_pmu_stop+0x48/0xa0
>>> [96289.226076] x86_pmu_del+0x40/0x160
>>> [96289.229813] event_sched_out.isra.0+0x81/0x1e0
>>> [96289.234602] group_sched_out.part.0+0x4f/0xc0
>>> [96289.239257] __perf_event_disable+0xef/0x1d0
>>> [96289.243831] event_function+0x8c/0xd0
>>> [96289.247785] remote_function+0x3e/0x50
>>> [96289.251797] flush_smp_call_function_queue+0x11b/0x1a0
>>> [96289.257268] flush_smp_call_function_from_idle+0x38/0x60
>>> [96289.262944] do_idle+0x15f/0x240
>>> [96289.266421] cpu_startup_entry+0x19/0x20
>>> [96289.270639] start_kernel+0x7df/0x804
>>> [96289.274558] ? apply_microcode_early.cold+0xc/0x27
>>> [96289.279678] secondary_startup_64_no_verify+0xb0/0xbb
>>> [96289.285078] Modules linked in: nf_tables libcrc32c nfnetlink intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi x86_pkg_temp_thermal ledtrig_audio intel_powerclamp snd_hda_intel coretemp snd_intel_dspcfg snd_hda_codec snd_hda_core kvm_intel kvm snd_hwdep irqbypass at24 snd_pcm tpm_tis crct10dif_pclmul snd_timer crc32_pclmul regmap_i2c wmi_bmof sg tpm_tis_core snd ghash_clmulni_intel tpm iTCO_wdt aesni_intel soundcore rng_core iTCO_vendor_support crypto_simd mei_me mei cryptd pcspkr evdev glue_helper binfmt_misc ip_tables x_tables autofs4 sr_mod sd_mod t10_pi cdrom i915 iosf_mbi ahci i2c_algo_bit libahci drm_kms_helper xhci_pci ehci_pci ehci_hcd libata xhci_hcd lpc_ich usbcore i2c_i801 drm crc32c_intel e1000e mfd_core scsi_mod usb_common i2c_smbus wmi fan thermal video button
>>> [96289.362498] CR2: 0000000000000150
>>> [96289.366070] ---[ end trace 80c577f99562015f ]---
>>> [96289.371007] RIP: 0010:intel_pmu_drain_pebs_nhm+0x464/0x5f0
>>> [96289.376868] Code: 09 00 00 0f b6 c0 49 39 c4 74 2a 48 63 82 78 09 00 00 48 01 c5 48 39 6c 24 08 76 17 0f b6 05 14 70 3f 01 83 e0 0f 3c 03 77 a4 <48> 8b 85 90 00 00 00 eb 9f 31 ed 83 eb 01 83 fb 01 0f 85 30 ff ff
>>> [96289.396981] RSP: 0000:ffffffff822039e0 EFLAGS: 00010097
>>> [96289.402573] RAX: 0000000000000002 RBX: 0000000000000155 RCX: 0000000000000008
>>> [96289.410226] RDX: ffff88811ac118a0 RSI: ffffffff82203980 RDI: ffffffff82203980
>>> [96289.417841] RBP: 00000000000000c0 R08: 0000000000000000 R09: 0000000000000000
>>> [96289.425461] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
>>> [96289.433122] R13: ffffffff82203bc0 R14: ffff88801c3cf800 R15: ffffffff829814a0
>>> [96289.440774] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
>>> [96289.449374] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [96289.455507] CR2: 0000000000000150 CR3: 000000000220c003 CR4: 00000000001706f0
>>> [96289.463119] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [96289.470764] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>> [96289.478408] Kernel panic - not syncing: Attempted to kill the idle task!
>>> [96289.485598] Kernel Offset: disabled
>>> [96289.489355] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>>>
>>>
>>
On Thu, 11 Feb 2021, Liang, Kan wrote:
> > On Thu, Jan 28, 2021 at 02:49:47PM -0500, Vince Weaver wrote:
> I'd like to reproduce it on my machine.
> Is this issue only found in a Haswell client machine?
>
> To reproduce the issue, can I use ./perf_fuzzer under perf_event_tests/fuzzer?
> Do I need to apply any parameters with ./perf_fuzzer?
>
> Usually how long does it take to reproduce the issue?
On my machine if I run the commands
echo 1 > /proc/sys/kernel/nmi_watchdog
echo 0 > /proc/sys/kernel/perf_event_paranoid
echo 1000 > /proc/sys/kernel/perf_event_max_sample_rate
./perf_fuzzer -s 30000 -r 1611784483
it is repeatable within a minute, but because of the nature of the fuzzer
it probably won't work for you because the random events will diverge
based on the different configs of the system.
I can try to generate a simple reproducer, I've just been extremely busy
here at work and haven't had the chance.
If you want to try to reproduce it the hard way, run the
./fast_repro99.sh
script in the perf_fuzzer directory. It will start fuzzing. My machine
turned up the issue within a day or so.
Vince
On 2/11/2021 5:14 PM, Vince Weaver wrote:
> On Thu, 11 Feb 2021, Liang, Kan wrote:
>
>>> On Thu, Jan 28, 2021 at 02:49:47PM -0500, Vince Weaver wrote:
>> I'd like to reproduce it on my machine.
>> Is this issue only found in a Haswell client machine?
>>
>> To reproduce the issue, can I use ./perf_fuzzer under perf_event_tests/fuzzer?
>> Do I need to apply any parameters with ./perf_fuzzer?
>>
>> Usually how long does it take to reproduce the issue?
>
> On my machine if I run the commands
> echo 1 > /proc/sys/kernel/nmi_watchdog
> echo 0 > /proc/sys/kernel/perf_event_paranoid
> echo 1000 > /proc/sys/kernel/perf_event_max_sample_rate
> ./perf_fuzzer -s 30000 -r 1611784483
>
> it is repeatable within a minute, but because of the nature of the fuzzer
> it probably won't work for you because the random events will diverge
> based on the different configs of the system.
>
> I can try to generate a simple reproducer, I've just been extremely busy
> here at work and haven't had the chance.
>
> If you want to try to reproduce it the hard way, run the
> ./fast_repro99.sh
> script in the perf_fuzzer directory. It will start fuzzing. My machine
> turned up the issue within a day or so.
>
Sorry for the late response. Just want to let you know I'm still trying
to reproduce the issue.
I only have a Haswell server on hand. I run the ./fast_repro99.sh on the
machine for 3 days, but I didn't observe any crash.
Now, I'm looking for a HSW client in my lab. I will check if I can
reproduce it on a client machine.
If you have a simple reproducer, please let me know.
Thanks,
Kan
On 2/11/2021 9:53 AM, Peter Zijlstra wrote:
>
> Kan, do you have time to look at this?
>
> On Thu, Jan 28, 2021 at 02:49:47PM -0500, Vince Weaver wrote:
>> On Thu, 28 Jan 2021, Vince Weaver wrote:
>>
>>> the perf_fuzzer has turned up a repeatable crash on my haswell system.
>>>
>>> addr2line is not being very helpful, it points to DECLARE_PER_CPU_FIRST.
>>> I'll investigate more when I have the chance.
>>
>> so I poked around some more.
>>
>> This seems to be caused in
>>
>> __intel_pmu_pebs_event()
>> get_next_pebs_record_by_bit() ds.c line 1639
>> get_pebs_status(at) ds.c line 1317
>> return ((struct pebs_record_nhm *)n)->status;
>>
>> where "n" has the value of 0xc0 rather than a proper pointer.
>>
I think I find the suspicious patch.
The commt id 01330d7288e00 ("perf/x86: Allow zero PEBS status with only
single active event")
https://lore.kernel.org/lkml/[email protected]/
The patch is an SW workaround for some old CPUs (HSW and earlier), which
may set 0 to the PEBS status. It adds a check in the
intel_pmu_drain_pebs_nhm(). It tries to minimize the impact of the
defect by avoiding dropping the PEBS records which have PEBS status 0.
But, it doesn't correct the PEBS status, which may bring problems,
especially for the large PEBS.
It's possible that all the PEBS records in a large PEBS have the PEBS
status 0. If so, the first get_next_pebs_record_by_bit() in the
__intel_pmu_pebs_event() returns NULL. The at = NULL. Since it's a large
PEBS, the 'count' parameter must > 1. The second
get_next_pebs_record_by_bit() will crash.
Could you please revert the patch and check whether it fixes your issue?
Thanks,
Kan
On Mon, 1 Mar 2021, Liang, Kan wrote:
> https://lore.kernel.org/lkml/[email protected]/
> The patch is an SW workaround for some old CPUs (HSW and earlier), which may
> set 0 to the PEBS status. It adds a check in the intel_pmu_drain_pebs_nhm().
> It tries to minimize the impact of the defect by avoiding dropping the PEBS
> records which have PEBS status 0.
> But, it doesn't correct the PEBS status, which may bring problems,
> especially for the large PEBS.
> It's possible that all the PEBS records in a large PEBS have the PEBS status
> 0. If so, the first get_next_pebs_record_by_bit() in the
> __intel_pmu_pebs_event() returns NULL. The at = NULL. Since it's a large PEBS,
> the 'count' parameter must > 1. The second get_next_pebs_record_by_bit() will
> crash.
>
> Could you please revert the patch and check whether it fixes your issue?
I've reverted that patch and my test-case no longer triggers the issue.
I'll restart a longer fuzzing run to see if any other issues turn up.
Thanks,
Vince
Hello
on my Haswell machine the perf_fuzzer managed to trigger this message:
[117248.075892] unchecked MSR access error: WRMSR to 0x3f1 (tried to write 0x0400000000000000) at rIP: 0xffffffff8106e4f4 (native_write_msr+0x4/0x20)
[117248.089957] Call Trace:
[117248.092685] intel_pmu_pebs_enable_all+0x31/0x40
[117248.097737] intel_pmu_enable_all+0xa/0x10
[117248.102210] __perf_event_task_sched_in+0x2df/0x2f0
[117248.107511] finish_task_switch.isra.0+0x15f/0x280
[117248.112765] schedule_tail+0xc/0x40
[117248.116562] ret_from_fork+0x8/0x30
that shouldn't be possible, should it? MSR 0x3f1 is MSR_IA32_PEBS_ENABLE
this is on recent-git with the patch causing the pebs-related crash
reverted.
Vince
On Wed, Mar 3, 2021 at 10:16 AM Vince Weaver <[email protected]> wrote:
>
> Hello
>
> on my Haswell machine the perf_fuzzer managed to trigger this message:
>
> [117248.075892] unchecked MSR access error: WRMSR to 0x3f1 (tried to write 0x0400000000000000) at rIP: 0xffffffff8106e4f4 (native_write_msr+0x4/0x20)
> [117248.089957] Call Trace:
> [117248.092685] intel_pmu_pebs_enable_all+0x31/0x40
> [117248.097737] intel_pmu_enable_all+0xa/0x10
> [117248.102210] __perf_event_task_sched_in+0x2df/0x2f0
> [117248.107511] finish_task_switch.isra.0+0x15f/0x280
> [117248.112765] schedule_tail+0xc/0x40
> [117248.116562] ret_from_fork+0x8/0x30
>
> that shouldn't be possible, should it? MSR 0x3f1 is MSR_IA32_PEBS_ENABLE
>
Not possible, bit 58 is not defined in PEBS_ENABLE, AFAIK.
>
> this is on recent-git with the patch causing the pebs-related crash
> reverted.
>
> Vince
On 3/3/2021 2:28 PM, Stephane Eranian wrote:
> On Wed, Mar 3, 2021 at 10:16 AM Vince Weaver <[email protected]> wrote:
>>
>> Hello
>>
>> on my Haswell machine the perf_fuzzer managed to trigger this message:
>>
>> [117248.075892] unchecked MSR access error: WRMSR to 0x3f1 (tried to write 0x0400000000000000) at rIP: 0xffffffff8106e4f4 (native_write_msr+0x4/0x20)
>> [117248.089957] Call Trace:
>> [117248.092685] intel_pmu_pebs_enable_all+0x31/0x40
>> [117248.097737] intel_pmu_enable_all+0xa/0x10
>> [117248.102210] __perf_event_task_sched_in+0x2df/0x2f0
>> [117248.107511] finish_task_switch.isra.0+0x15f/0x280
>> [117248.112765] schedule_tail+0xc/0x40
>> [117248.116562] ret_from_fork+0x8/0x30
>>
>> that shouldn't be possible, should it? MSR 0x3f1 is MSR_IA32_PEBS_ENABLE
>>
> Not possible, bit 58 is not defined in PEBS_ENABLE, AFAIK.
>
>>
>> this is on recent-git with the patch causing the pebs-related crash
>> reverted.
>>
We never use bit 58. It should be a new issue.
Is it repeatable?
Thanks,
Kan
On Wed, 3 Mar 2021, Liang, Kan wrote:
> We never use bit 58. It should be a new issue.
> Is it repeatable?
yes, it's repeatable.
(which I'm glad to see because it looks suspiciously like a memory bit
flip)
Though since it's a WARN_ONCE I have to reboot each time I want to test.
If I get a chance I'll try to come up with a reduced test case but
probably won't have time for that today.
Vince
On 3/3/2021 3:22 PM, Vince Weaver wrote:
> On Wed, 3 Mar 2021, Liang, Kan wrote:
>
>> We never use bit 58. It should be a new issue.
Actually, KVM uses it. They create a fake event called VLBR_EVENT, which
uses bit 58. It's introduced from the commit 097e4311cda9 ("perf/x86:
Add constraint to create guest LBR event without hw counter").
Since it's a fake event, it doesn't support PEBS. Perf should reject it
if it sets the precise_ip.
The below patch should fix the MSR access error.
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5bac48d..1ea3c67 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3659,6 +3659,10 @@ static int intel_pmu_hw_config(struct perf_event
*event)
return ret;
if (event->attr.precise_ip) {
+
+ if ((event->attr.config & INTEL_ARCH_EVENT_MASK) ==
INTEL_FIXED_VLBR_EVENT)
+ return -EINVAL;
+
if (!(event->attr.freq || (event->attr.wakeup_events &&
!event->attr.watermark))) {
event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD;
if (!(event->attr.sample_type &
Thanks,
Kan