Hi all,
I'm seeing a system crash while running perf_fuzzer with an upstream kernel
on an Intel machine. I have hit the crash twice (I don't have the log of the
first crash, so I don't know whether the reason is the same for both).
I've attached my .config to this mail.
System configuration:
========
$ uname -r
5.0.0-rc3-ravi-pfuzzer+
$ gcc --version
gcc (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
$ lscpu
CPU(s): 64
On-line CPU(s) list: 0-3 (<===== I kept only 4 CPUs online)
Off-line CPU(s) list: 4-63
Model name: Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz
========
I fuzz only hardware PMU events, and I disabled perf_fuzzer's manipulation
of the /proc/sys/kernel/perf* files:
========
$ git diff
diff --git a/fuzzer/perf_fuzzer.c b/fuzzer/perf_fuzzer.c
index 7110fd6..6dcb307 100644
--- a/fuzzer/perf_fuzzer.c
+++ b/fuzzer/perf_fuzzer.c
@@ -711,7 +711,7 @@ int main(int argc, char **argv) {
}
break;
case 6: if (type&TYPE_ACCESS) {
- access_random_file();
+ //access_random_file();
}
break;
case 7: if (type&TYPE_FORK) {
diff --git a/fuzzer/trinity_files/perf_event_open.c b/fuzzer/trinity_files/perf_event_open.c
index 2ef3739..d5f4948 100644
--- a/fuzzer/trinity_files/perf_event_open.c
+++ b/fuzzer/trinity_files/perf_event_open.c
@@ -615,6 +615,8 @@ static int random_event_type(void)
int type=0;
+ return PERF_TYPE_HARDWARE;
+
switch (rand() % 8) {
case 0:
type = PERF_TYPE_HARDWARE;
========
And I'm running the fuzzer in a loop as *root*. (Let me know if running
as root is harmful ;-) ).
========
# while true; do ./perf_fuzzer; done
========
I started the fuzzer at: Fri Jan 25 10:28:44 IST 2019
dmesg from while the system was still running normally:
[Fri Jan 25 10:28:53 2019] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[Fri Jan 25 10:29:08 2019] perf: interrupt took too long (3136 > 3126), lowering kernel.perf_event_max_sample_rate to 63750
[Fri Jan 25 10:29:11 2019] perf: interrupt took too long (4140 > 3920), lowering kernel.perf_event_max_sample_rate to 48250
[Fri Jan 25 10:29:11 2019] perf: interrupt took too long (5231 > 5175), lowering kernel.perf_event_max_sample_rate to 38000
[Fri Jan 25 10:29:11 2019] perf: interrupt took too long (6736 > 6538), lowering kernel.perf_event_max_sample_rate to 29500
[Fri Jan 25 10:32:44 2019] ------------[ cut here ]------------
[Fri Jan 25 10:32:44 2019] perfevents: irq loop stuck!
[Fri Jan 25 10:32:44 2019] WARNING: CPU: 1 PID: 0 at arch/x86/events/intel/core.c:2440 intel_pmu_handle_irq+0x158/0x170
[Fri Jan 25 10:32:44 2019] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif input_leds kvm_intel cdc_ether usbnet kvm mii irqbypass sch_fq_codel ipmi_si ib_iser ioatdma ipmi_devintf rdma_cm ipmi_msghandler lpc_ich dca wmi mac_hid iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul drm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper be2net ahci megaraid_sas libahci hid_generic usbhid hid
[Fri Jan 25 10:32:44 2019] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc3-ravi-pfuzzer+ #1
[Fri Jan 25 10:32:44 2019] Hardware name: IBM CPU PLANAR -[8722xyz]-/00FL808, BIOS -[KOE162DUS-2.30]- 08/27/2018
[Fri Jan 25 10:32:44 2019] RIP: 0010:intel_pmu_handle_irq+0x158/0x170
[Fri Jan 25 10:32:44 2019] Code: 10 e8 4c 36 bf 00 e9 06 ff ff ff 80 3d b0 07 8a 01 00 74 0a e8 f9 df ff ff e9 78 ff ff ff 48 c7 c7 82 cc ca 8c e8 08 d7 07 00 <0f> 0b e8 11 a2 ff ff c6 05 8a 07 8a 01 01 eb da 0f 1f 84 00 00 00
[Fri Jan 25 10:32:44 2019] RSP: 0000:fffffe0000032df8 EFLAGS: 00010082
[Fri Jan 25 10:32:44 2019] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000002
[Fri Jan 25 10:32:44 2019] RDX: 0000000000000003 RSI: ffffffff8ccacc9e RDI: 0000000000000000
[Fri Jan 25 10:32:44 2019] RBP: fffffe0000032e28 R08: 0000000000000000 R09: 0000000000000000
[Fri Jan 25 10:32:44 2019] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[Fri Jan 25 10:32:44 2019] R13: 0000000000000000 R14: fffffe0000032ef8 R15: ffff95b3ff84f3a0
[Fri Jan 25 10:32:44 2019] FS: 0000000000000000(0000) GS:ffff95b3ff840000(0000) knlGS:0000000000000000
[Fri Jan 25 10:32:44 2019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Jan 25 10:32:44 2019] CR2: 00007ffdc4de0238 CR3: 0000001fd220e003 CR4: 00000000000606e0
[Fri Jan 25 10:32:44 2019] Call Trace:
[Fri Jan 25 10:32:44 2019] <NMI>
[Fri Jan 25 10:32:44 2019] perf_event_nmi_handler+0x33/0x50
[Fri Jan 25 10:32:44 2019] nmi_handle+0x6e/0x120
[Fri Jan 25 10:32:44 2019] default_do_nmi+0x44/0x120
[Fri Jan 25 10:32:44 2019] do_nmi+0x119/0x180
[Fri Jan 25 10:32:44 2019] end_repeat_nmi+0x16/0x50
[Fri Jan 25 10:32:44 2019] RIP: 0010:intel_pmu_lbr_enable_all+0x2d/0xf0
[Fri Jan 25 10:32:44 2019] Code: 66 90 55 48 c7 c2 a0 f3 00 00 48 89 d0 48 89 e5 41 54 53 65 48 03 05 5a e0 3f 74 8b 80 58 09 00 00 85 c0 75 05 5b 41 5c 5d c3 <65> 48 03 15 43 e0 3f 74 40 84 ff 75 7c 48 8b 82 68 0c 00 00 48 85
[Fri Jan 25 10:32:44 2019] RSP: 0000:ffff95b3ff843e48 EFLAGS: 00000002
[Fri Jan 25 10:32:44 2019] RAX: 0000000000000001 RBX: ffff95b3ff84f3a0 RCX: 00000000000003f1
[Fri Jan 25 10:32:44 2019] RDX: 000000000000f3a0 RSI: 0000000000000003 RDI: 0000000000000000
[Fri Jan 25 10:32:44 2019] RBP: ffff95b3ff843e58 R08: fffffe0000036000 R09: 0000000000000005
[Fri Jan 25 10:32:44 2019] R10: ffff95b3ff84fac0 R11: 0000000000000005 R12: 0000000000000000
[Fri Jan 25 10:32:44 2019] R13: ffff95b3ff84f3a0 R14: ffff95b3ff84f5d4 R15: 0000000000000005
[Fri Jan 25 10:32:44 2019] ? intel_pmu_lbr_enable_all+0x2d/0xf0
[Fri Jan 25 10:32:44 2019] ? intel_pmu_lbr_enable_all+0x2d/0xf0
[Fri Jan 25 10:32:44 2019] </NMI>
[Fri Jan 25 10:32:44 2019] <IRQ>
[Fri Jan 25 10:32:44 2019] __intel_pmu_enable_all+0x2c/0x80
[Fri Jan 25 10:32:44 2019] intel_pmu_enable_all+0x10/0x20
[Fri Jan 25 10:32:44 2019] x86_pmu_enable+0x27f/0x310
[Fri Jan 25 10:32:44 2019] perf_pmu_enable.part.100+0xd/0x10
[Fri Jan 25 10:32:44 2019] ctx_resched+0x8f/0xd0
[Fri Jan 25 10:32:44 2019] __perf_event_enable+0x193/0x1e0
[Fri Jan 25 10:32:44 2019] event_function+0x8e/0xc0
[Fri Jan 25 10:32:44 2019] remote_function+0x41/0x50
[Fri Jan 25 10:32:44 2019] flush_smp_call_function_queue+0x68/0x100
[Fri Jan 25 10:32:44 2019] generic_smp_call_function_single_interrupt+0x13/0x30
[Fri Jan 25 10:32:44 2019] smp_call_function_single_interrupt+0x3e/0xe0
[Fri Jan 25 10:32:44 2019] call_function_single_interrupt+0xf/0x20
[Fri Jan 25 10:32:44 2019] </IRQ>
[Fri Jan 25 10:32:44 2019] RIP: 0010:cpuidle_enter_state+0xc2/0x460
[Fri Jan 25 10:32:44 2019] Code: 66 90 31 ff e8 2f 19 8a ff 80 7d c7 00 74 17 9c 58 66 66 90 66 90 f6 c4 02 0f 85 69 03 00 00 31 ff e8 f2 25 90 ff fb 66 66 90 <66> 66 90 45 85 ed 0f 88 37 03 00 00 4c 8b 45 d0 4c 2b 45 b8 48 ba
[Fri Jan 25 10:32:44 2019] RSP: 0000:ffffb1d70c527e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[Fri Jan 25 10:32:44 2019] RAX: ffff95b3ff862a00 RBX: ffffffff8d1449c0 RCX: 000000000000001f
[Fri Jan 25 10:32:44 2019] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
[Fri Jan 25 10:32:44 2019] RBP: ffffb1d70c527e90 R08: ffffd016493f3b42 R09: ffff95b3ff861a64
[Fri Jan 25 10:32:44 2019] R10: ffffb1d70c527e18 R11: 0000000000000034 R12: ffffd1aaefa40090
[Fri Jan 25 10:32:44 2019] R13: 0000000000000005 R14: 0000000000000005 R15: ffffffff8d1449c0
[Fri Jan 25 10:32:44 2019] ? cpuidle_enter_state+0xa1/0x460
[Fri Jan 25 10:32:44 2019] cpuidle_enter+0x17/0x20
[Fri Jan 25 10:32:44 2019] call_cpuidle+0x23/0x40
[Fri Jan 25 10:32:44 2019] do_idle+0x201/0x280
[Fri Jan 25 10:32:44 2019] cpu_startup_entry+0x1d/0x20
[Fri Jan 25 10:32:44 2019] start_secondary+0x1ab/0x200
[Fri Jan 25 10:32:44 2019] secondary_startup_64+0xa4/0xb0
[Fri Jan 25 10:32:44 2019] ---[ end trace 826a05dfaf1c2406 ]---
[Fri Jan 25 10:32:44 2019] CPU#1: ctrl: 0000000000000000
[Fri Jan 25 10:32:44 2019] CPU#1: status: 0000000000000002
[Fri Jan 25 10:32:44 2019] CPU#1: overflow: 0000000000000000
[Fri Jan 25 10:32:44 2019] CPU#1: fixed: 0000000000000bb0
[Fri Jan 25 10:32:44 2019] CPU#1: pebs: 0000000000000000
[Fri Jan 25 10:32:44 2019] CPU#1: debugctl: 0000000000000000
[Fri Jan 25 10:32:44 2019] CPU#1: active: 0000000600000007
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC0 ctrl: 0000000000410300
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC0 count: 0000800000000001
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC0 left: 00007fffffffffff
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC1 ctrl: 00000000004200c5
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC1 count: 0000ffffffffffff
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC1 left: 0000000000000002
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC2 ctrl: 000000000053003c
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC2 count: 000080000014f025
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC2 left: 00007fffffeb0fdb
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC3 ctrl: 000000000010412e
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC3 count: 0000fffffffffffe
[Fri Jan 25 10:32:44 2019] CPU#1: gen-PMC3 left: 0000000000000002
[Fri Jan 25 10:32:44 2019] CPU#1: fixed-PMC0 count: 0000800000000001
[Fri Jan 25 10:32:44 2019] CPU#1: fixed-PMC1 count: 0000ffffbf106c64
[Fri Jan 25 10:32:44 2019] CPU#1: fixed-PMC2 count: 000080000013cae9
[Fri Jan 25 10:32:44 2019] core: clearing PMU state on CPU#1
[Fri Jan 25 10:33:02 2019] perf: interrupt took too long (8592 > 8420), lowering kernel.perf_event_max_sample_rate to 23250
[Fri Jan 25 10:33:02 2019] perf: interrupt took too long (10793 > 10740), lowering kernel.perf_event_max_sample_rate to 18500
[Fri Jan 25 10:33:02 2019] perf: interrupt took too long (13764 > 13491), lowering kernel.perf_event_max_sample_rate to 14500
[Fri Jan 25 10:41:01 2019] perf: interrupt took too long (17219 > 17205), lowering kernel.perf_event_max_sample_rate to 11500
[Fri Jan 25 10:45:01 2019] perf: interrupt took too long (21567 > 21523), lowering kernel.perf_event_max_sample_rate to 9250
[Fri Jan 25 10:45:01 2019] perf: interrupt took too long (27058 > 26958), lowering kernel.perf_event_max_sample_rate to 7250
[Fri Jan 25 10:45:01 2019] perf: interrupt took too long (34097 > 33822), lowering kernel.perf_event_max_sample_rate to 5750
[Fri Jan 25 10:45:01 2019] perf: interrupt took too long (42633 > 42621), lowering kernel.perf_event_max_sample_rate to 4500
[Fri Jan 25 10:45:01 2019] perf: interrupt took too long (53320 > 53291), lowering kernel.perf_event_max_sample_rate to 3750
[Fri Jan 25 10:47:45 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 3.193 msecs
[Fri Jan 25 10:47:48 2019] perf: interrupt took too long (76120 > 66650), lowering kernel.perf_event_max_sample_rate to 2500
[Fri Jan 25 10:47:49 2019] perf: interrupt took too long (105206 > 95150), lowering kernel.perf_event_max_sample_rate to 1750
[Fri Jan 25 10:47:49 2019] perf: interrupt took too long (132931 > 131507), lowering kernel.perf_event_max_sample_rate to 1500
dmesg from the console when the system crashed:
[ 1432.176374] general protection fault: 0000 [#1] SMP PTI
[ 1432.182253] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 5.0.0-rc3-ravi-pfuzzer+ #1
[ 1432.192088] Hardware name: IBM CPU PLANAR -[8722xyz]-/00FL808, BIOS -[KOE162DUS-2.30]- 08/27/2018
[ 1432.206120] RIP: 0010:perf_prepare_sample+0x8f/0x510
[ 1432.211679] Code: ff ff 41 f6 c4 01 0f 85 22 02 00 00 41 f6 c4 20 74 26 4d 85 e4 0f 88 0c 01 00 00 4c 89 f6 4c 89 ff e8 f5 fe ff ff 49 89 45 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 41 f7 c4 00 04 00 00 74
[ 1432.232699] RSP: 0000:ffff95b3ff843a78 EFLAGS: 00010082
[ 1432.238548] RAX: 8d1217eb826cce00 RBX: ffff95b3ff843ad8 RCX: 000000000000001f
[ 1432.246536] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
[ 1432.254522] RBP: ffff95b3ff843ab0 R08: ffffd016493f3b42 R09: 0000000000000000
[ 1432.262508] R10: ffff95b3ff843a08 R11: 0000000000000000 R12: 80000000000306e5
[ 1432.270495] R13: ffff95b3ff843bc0 R14: ffff95b3ff843b18 R15: ffff95b3f6b65800
[ 1432.278483] FS: 0000000000000000(0000) GS:ffff95b3ff840000(0000) knlGS:0000000000000000
[ 1432.287539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1432.293969] CR2: 000055bf7f768c90 CR3: 0000001fd220e005 CR4: 00000000000606e0
[ 1432.301956] Call Trace:
[ 1432.304697] <IRQ>
[ 1432.306956] ? intel_pmu_drain_bts_buffer+0x194/0x230
[ 1432.312615] intel_pmu_drain_bts_buffer+0x160/0x230
[ 1432.318078] ? tick_nohz_irq_exit+0x31/0x40
[ 1432.322765] ? smp_call_function_single_interrupt+0x48/0xe0
[ 1432.328993] ? call_function_single_interrupt+0xf/0x20
[ 1432.334745] ? call_function_single_interrupt+0xa/0x20
[ 1432.340501] ? x86_schedule_events+0x1a0/0x2f0
[ 1432.345475] ? x86_pmu_commit_txn+0xb4/0x100
[ 1432.350258] ? find_busiest_group+0x47/0x5d0
[ 1432.355039] ? perf_event_set_state.part.42+0x12/0x50
[ 1432.360694] ? perf_mux_hrtimer_restart+0x40/0xb0
[ 1432.365960] intel_pmu_disable_event+0xae/0x100
[ 1432.371031] ? intel_pmu_disable_event+0xae/0x100
[ 1432.376297] x86_pmu_stop+0x7a/0xb0
[ 1432.380201] x86_pmu_del+0x57/0x120
[ 1432.384105] event_sched_out.isra.101+0x83/0x180
[ 1432.389272] group_sched_out.part.103+0x57/0xe0
[ 1432.394343] ctx_sched_out+0x188/0x240
[ 1432.398539] ctx_resched+0xa8/0xd0
[ 1432.402345] __perf_event_enable+0x193/0x1e0
[ 1432.407125] event_function+0x8e/0xc0
[ 1432.411222] remote_function+0x41/0x50
[ 1432.415418] flush_smp_call_function_queue+0x68/0x100
[ 1432.421071] generic_smp_call_function_single_interrupt+0x13/0x30
[ 1432.427893] smp_call_function_single_interrupt+0x3e/0xe0
[ 1432.433936] call_function_single_interrupt+0xf/0x20
[ 1432.440204] </IRQ>
[ 1432.443267] RIP: 0010:cpuidle_enter_state+0xc2/0x460
[ 1432.449536] Code: 66 90 31 ff e8 2f 19 8a ff 80 7d c7 00 74 17 9c 58 66 66 90 66 90 f6 c4 02 0f 85 69 03 00 00 31 ff e8 f2 25 90 ff fb 66 66 90 <66> 66 90 45 85 ed 0f 88 37 03 00 00 4c 8b 45 d0 4c 2b 45 b8 48 ba
[ 1432.472009] RSP: 0000:ffffb1d70c527e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[ 1432.481218] RAX: ffff95b3ff862a00 RBX: ffffffff8d1449c0 RCX: 000000000000001f
[ 1432.489931] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
[ 1432.498633] RBP: ffffb1d70c527e90 R08: ffffd016493f3b42 R09: ffff95b3ff861a64
[ 1432.507337] R10: ffffb1d70c527e18 R11: 0000000000000050 R12: ffffd1aaefa40090
[ 1432.516024] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff8d1449c0
[ 1432.524693] ? cpuidle_enter_state+0xa1/0x460
[ 1432.530249] cpuidle_enter+0x17/0x20
[ 1432.534925] call_cpuidle+0x23/0x40
[ 1432.539486] do_idle+0x201/0x280
[ 1432.543758] cpu_startup_entry+0x1d/0x20
[ 1432.548811] start_secondary+0x1ab/0x200
[ 1432.553856] secondary_startup_64+0xa4/0xb0
[ 1432.559177] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif input_leds kvm_intel cdc_ether usbnet kvm mii irqbypass sch_fq_codel ipmi_si ib_iser ioatdma ipmi_devintf rdma_cm ipmi_msghandler lpc_ich dca wmi mac_hid iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul drm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper be2net ahci megaraid_sas libahci hid_generic usbhid hid
[ 1432.637312] ---[ end trace 826a05dfaf1c2407 ]---
[ 1432.643261] RIP: 0010:perf_prepare_sample+0x8f/0x510
[ 1432.649594] Code: ff ff 41 f6 c4 01 0f 85 22 02 00 00 41 f6 c4 20 74 26 4d 85 e4 0f 88 0c 01 00 00 4c 89 f6 4c 89 ff e8 f5 fe ff ff 49 89 45 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 41 f7 c4 00 04 00 00 74
[ 1432.672236] RSP: 0000:ffff95b3ff843a78 EFLAGS: 00010082
[ 1432.678918] RAX: 8d1217eb826cce00 RBX: ffff95b3ff843ad8 RCX: 000000000000001f
[ 1432.687752] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
[ 1432.696591] RBP: ffff95b3ff843ab0 R08: ffffd016493f3b42 R09: 0000000000000000
[ 1432.705428] R10: ffff95b3ff843a08 R11: 0000000000000000 R12: 80000000000306e5
[ 1432.714268] R13: ffff95b3ff843bc0 R14: ffff95b3ff843b18 R15: ffff95b3f6b65800
[ 1432.723109] FS: 0000000000000000(0000) GS:ffff95b3ff840000(0000) knlGS:0000000000000000
[ 1432.733031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1432.740334] CR2: 000055bf7f768c90 CR3: 0000001fd220e005 CR4: 00000000000606e0
[ 1432.749210] Kernel panic - not syncing: Fatal exception in interrupt
[ 1432.757265] Kernel Offset: 0xac00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1432.770166] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
[ 1432.779582] ------------[ cut here ]------------
[ 1432.785700] sched: Unexpected reschedule of offline CPU#0!
[ 1432.792802] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x3f/0x50
[ 1432.803993] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif input_leds kvm_intel cdc_ether usbnet kvm mii irqbypass sch_fq_codel ipmi_si ib_iser ioatdma ipmi_devintf rdma_cm ipmi_msghandler lpc_ich dca wmi mac_hid iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul drm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper be2net ahci megaraid_sas libahci hid_generic usbhid hid
[ 1432.884112] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D W 5.0.0-rc3-ravi-pfuzzer+ #1
[ 1432.895062] Hardware name: IBM CPU PLANAR -[8722xyz]-/00FL808, BIOS -[KOE162DUS-2.30]- 08/27/2018
[ 1432.911307] RIP: 0010:native_smp_send_reschedule+0x3f/0x50
[ 1432.918548] Code: c0 84 c0 74 17 48 8b 05 4f 64 15 01 be fd 00 00 00 48 8b 40 30 e8 31 c4 ba 00 5d c3 89 fe 48 c7 c7 10 8c cb 8c e8 01 65 03 00 <0f> 0b 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[ 1432.941803] RSP: 0000:ffff95b3ff843460 EFLAGS: 00010082
[ 1432.948781] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 1432.957893] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff95b3ff8563a0
[ 1432.967013] RBP: ffff95b3ff843460 R08: 0000000000000001 R09: 0000000000cdcdcd
[ 1432.976127] R10: 000000000005538e R11: 0000000000000001 R12: ffff95b3f67696c0
[ 1432.985244] R13: ffff95b3ff822a00 R14: ffff95b3ff843518 R15: ffff95b3ff822a00
[ 1432.994364] FS: 0000000000000000(0000) GS:ffff95b3ff840000(0000) knlGS:0000000000000000
[ 1433.004567] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1433.012158] CR2: 000055bf7f768c90 CR3: 0000001fd220e005 CR4: 00000000000606e0
[ 1433.021312] Call Trace:
[ 1433.025226] <IRQ>
[ 1433.028630] resched_curr+0x6c/0xd0
[ 1433.033660] check_preempt_curr+0x54/0x90
[ 1433.039247] ttwu_do_wakeup+0x1e/0x150
[ 1433.044514] ttwu_do_activate+0x77/0x80
[ 1433.049855] try_to_wake_up+0x1d6/0x4b0
[ 1433.055170] ? mga_dirty_update+0x1ed/0x320 [mgag200]
[ 1433.061823] default_wake_function+0x12/0x20
[ 1433.067596] autoremove_wake_function+0x12/0x40
[ 1433.073646] __wake_up_common+0x8c/0x130
[ 1433.078998] __wake_up_common_lock+0x80/0xc0
[ 1433.084721] __wake_up+0x13/0x20
[ 1433.089269] wake_up_klogd_work_func+0x40/0x60
[ 1433.095165] irq_work_run_list+0x55/0x80
[ 1433.100472] ? tick_sched_do_timer+0x60/0x60
[ 1433.106178] irq_work_tick+0x40/0x50
[ 1433.111105] update_process_times+0x42/0x60
[ 1433.116696] tick_sched_handle+0x29/0x60
[ 1433.121987] tick_sched_timer+0x3c/0x80
[ 1433.127170] __hrtimer_run_queues+0x106/0x270
[ 1433.132925] hrtimer_interrupt+0x116/0x240
[ 1433.138372] smp_apic_timer_interrupt+0x6f/0x150
[ 1433.144382] apic_timer_interrupt+0xf/0x20
[ 1433.149802] RIP: 0010:panic+0x25b/0x2a4
[ 1433.154921] Code: 83 c3 64 eb a6 83 3d df 4b 88 01 00 74 05 e8 00 87 02 00 48 c7 c6 e0 01 51 8d 48 c7 c7 08 2a cc 8c e8 40 92 06 00 fb 66 66 90 <66> 66 90 31 db e8 af d7 0d 00 4c 39 eb 7c 1d 41 83 f4 01 48 8b 05
[ 1433.177603] RSP: 0000:ffff95b3ff8438b8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
[ 1433.186893] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
[ 1433.195683] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff95b3ff8563a0
[ 1433.204434] RBP: ffff95b3ff843930 R08: 0000000000000001 R09: 0000000000cdcdcd
[ 1433.221896] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 1433.221896] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1433.230589] ? apic_timer_interrupt+0xa/0x20
[ 1433.236060] ? panic+0x257/0x2a4
[ 1433.240357] oops_end+0xe4/0xf0
[ 1433.244540] die+0x42/0x50
[ 1433.248220] do_general_protection+0xcc/0x160
[ 1433.253710] general_protection+0x1e/0x30
[ 1433.258808] RIP: 0010:perf_prepare_sample+0x8f/0x510
[ 1433.264972] Code: ff ff 41 f6 c4 01 0f 85 22 02 00 00 41 f6 c4 20 74 26 4d 85 e4 0f 88 0c 01 00 00 4c 89 f6 4c 89 ff e8 f5 fe ff ff 49 89 45 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 41 f7 c4 00 04 00 00 74
[ 1433.287267] RSP: 0000:ffff95b3ff843a78 EFLAGS: 00010082
[ 1433.293772] RAX: 8d1217eb826cce00 RBX: ffff95b3ff843ad8 RCX: 000000000000001f
[ 1433.302419] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
[ 1433.311056] RBP: ffff95b3ff843ab0 R08: ffffd016493f3b42 R09: 0000000000000000
[ 1433.319685] R10: ffff95b3ff843a08 R11: 0000000000000000 R12: 80000000000306e5
[ 1433.328307] R13: ffff95b3ff843bc0 R14: ffff95b3ff843b18 R15: ffff95b3f6b65800
[ 1433.336927] ? perf_prepare_sample+0x29b/0x510
[ 1433.342545] ? intel_pmu_drain_bts_buffer+0x194/0x230
[ 1433.348849] intel_pmu_drain_bts_buffer+0x160/0x230
[ 1433.354963] ? tick_nohz_irq_exit+0x31/0x40
[ 1433.360301] ? smp_call_function_single_interrupt+0x48/0xe0
[ 1433.367202] ? call_function_single_interrupt+0xf/0x20
[ 1433.373620] ? call_function_single_interrupt+0xa/0x20
[ 1433.380031] ? x86_schedule_events+0x1a0/0x2f0
[ 1433.385662] ? x86_pmu_commit_txn+0xb4/0x100
[ 1433.391096] ? find_busiest_group+0x47/0x5d0
[ 1433.396522] ? perf_event_set_state.part.42+0x12/0x50
[ 1433.402829] ? perf_mux_hrtimer_restart+0x40/0xb0
[ 1433.408757] intel_pmu_disable_event+0xae/0x100
[ 1433.414492] ? intel_pmu_disable_event+0xae/0x100
[ 1433.420424] x86_pmu_stop+0x7a/0xb0
[ 1433.424995] x86_pmu_del+0x57/0x120
[ 1433.429550] event_sched_out.isra.101+0x83/0x180
[ 1433.435381] group_sched_out.part.103+0x57/0xe0
[ 1433.441104] ctx_sched_out+0x188/0x240
[ 1433.445963] ctx_resched+0xa8/0xd0
[ 1433.450430] __perf_event_enable+0x193/0x1e0
[ 1433.455870] event_function+0x8e/0xc0
[ 1433.460625] remote_function+0x41/0x50
[ 1433.465477] flush_smp_call_function_queue+0x68/0x100
[ 1433.471784] generic_smp_call_function_single_interrupt+0x13/0x30
[ 1433.479244] smp_call_function_single_interrupt+0x3e/0xe0
[ 1433.485917] call_function_single_interrupt+0xf/0x20
[ 1433.492094] </IRQ>
[ 1433.495056] RIP: 0010:cpuidle_enter_state+0xc2/0x460
[ 1433.501231] Code: 66 90 31 ff e8 2f 19 8a ff 80 7d c7 00 74 17 9c 58 66 66 90 66 90 f6 c4 02 0f 85 69 03 00 00 31 ff e8 f2 25 90 ff fb 66 66 90 <66> 66 90 45 85 ed 0f 88 37 03 00 00 4c 8b 45 d0 4c 2b 45 b8 48 ba
[ 1433.523542] RSP: 0000:ffffb1d70c527e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[ 1433.532691] RAX: ffff95b3ff862a00 RBX: ffffffff8d1449c0 RCX: 000000000000001f
[ 1433.541355] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
[ 1433.550023] RBP: ffffb1d70c527e90 R08: ffffd016493f3b42 R09: ffff95b3ff861a64
[ 1433.558700] R10: ffffb1d70c527e18 R11: 0000000000000050 R12: ffffd1aaefa40090
[ 1433.567376] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff8d1449c0
[ 1433.576065] ? cpuidle_enter_state+0xa1/0x460
[ 1433.581651] cpuidle_enter+0x17/0x20
[ 1433.586361] call_cpuidle+0x23/0x40
[ 1433.590943] do_idle+0x201/0x280
[ 1433.595219] cpu_startup_entry+0x1d/0x20
[ 1433.600256] start_secondary+0x1ab/0x200
[ 1433.605284] secondary_startup_64+0xa4/0xb0
[ 1433.610599] ---[ end trace 826a05dfaf1c2408 ]---
[ 1433.616404] ------------[ cut here ]------------
[ 1433.622203] sched: Unexpected reschedule of offline CPU#2!
[ 1433.628979] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x3f/0x50
[ 1433.639842] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif input_leds kvm_intel cdc_ether usbnet kvm mii irqbypass sch_fq_codel ipmi_si ib_iser ioatdma ipmi_devintf rdma_cm ipmi_msghandler lpc_ich dca wmi mac_hid iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul drm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper be2net ahci megaraid_sas libahci hid_generic usbhid hid
[ 1433.717991] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D W 5.0.0-rc3-ravi-pfuzzer+ #1
[ 1433.728625] Hardware name: IBM CPU PLANAR -[8722xyz]-/00FL808, BIOS -[KOE162DUS-2.30]- 08/27/2018
[ 1433.744299] RIP: 0010:native_smp_send_reschedule+0x3f/0x50
[ 1433.751279] Code: c0 84 c0 74 17 48 8b 05 4f 64 15 01 be fd 00 00 00 48 8b 40 30 e8 31 c4 ba 00 5d c3 89 fe 48 c7 c7 10 8c cb 8c e8 01 65 03 00 <0f> 0b 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[ 1433.774036] RSP: 0000:ffff95b3ff843460 EFLAGS: 00010082
[ 1433.780773] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000006
[ 1433.789657] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff95b3ff8563a0
[ 1433.798540] RBP: ffff95b3ff843460 R08: 0000000000000001 R09: 0000000000cdcdcd
[ 1433.807431] R10: 0000000000100000 R11: 0000000000000001 R12: ffff95c3f82b5b00
[ 1433.816326] R13: ffff95b3ff8a2a00 R14: ffff95b3ff843518 R15: ffff95b3ff8a2a00
[ 1433.825224] FS: 0000000000000000(0000) GS:ffff95b3ff840000(0000) knlGS:0000000000000000
[ 1433.835206] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1433.842569] CR2: 000055bf7f768c90 CR3: 0000001fd220e005 CR4: 00000000000606e0
[ 1433.851498] Call Trace:
[ 1433.855190] <IRQ>
[ 1433.858386] resched_curr+0x6c/0xd0
[ 1433.863229] check_preempt_curr+0x54/0x90
[ 1433.868651] ttwu_do_wakeup+0x1e/0x150
[ 1433.873772] ttwu_do_activate+0x77/0x80
[ 1433.878978] try_to_wake_up+0x1d6/0x4b0
[ 1433.884189] ? mga_dirty_update+0x1ed/0x320 [mgag200]
[ 1433.890742] default_wake_function+0x12/0x20
[ 1433.896400] autoremove_wake_function+0x12/0x40
[ 1433.902336] __wake_up_common+0x8c/0x130
[ 1433.907585] __wake_up_common_lock+0x80/0xc0
[ 1433.913215] __wake_up+0x13/0x20
[ 1433.917668] wake_up_klogd_work_func+0x40/0x60
[ 1433.923468] irq_work_run_list+0x55/0x80
[ 1433.928694] ? tick_sched_do_timer+0x60/0x60
[ 1433.934299] irq_work_tick+0x40/0x50
[ 1433.939125] update_process_times+0x42/0x60
[ 1433.944637] tick_sched_handle+0x29/0x60
[ 1433.949858] tick_sched_timer+0x3c/0x80
[ 1433.954981] __hrtimer_run_queues+0x106/0x270
[ 1433.960687] hrtimer_interrupt+0x116/0x240
[ 1433.966101] smp_apic_timer_interrupt+0x6f/0x150
[ 1433.972099] apic_timer_interrupt+0xf/0x20
[ 1433.977515] RIP: 0010:panic+0x25b/0x2a4
[ 1433.982634] Code: 83 c3 64 eb a6 83 3d df 4b 88 01 00 74 05 e8 00 87 02 00 48 c7 c6 e0 01 51 8d 48 c7 c7 08 2a cc 8c e8 40 92 06 00 fb 66 66 90 <66> 66 90 31 db e8 af d7 0d 00 4c 39 eb 7c 1d 41 83 f4 01 48 8b 05
[ 1434.005315] RSP: 0000:ffff95b3ff8438b8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
[ 1434.014609] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
[ 1434.023395] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff95b3ff8563a0
[ 1434.032160] RBP: ffff95b3ff843930 R08: 0000000000000001 R09: 0000000000cdcdcd
[ 1434.040903] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 1434.049620] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1434.058311] ? apic_timer_interrupt+0xa/0x20
[ 1434.063781] ? panic+0x257/0x2a4
[ 1434.068072] oops_end+0xe4/0xf0
[ 1434.072253] die+0x42/0x50
[ 1434.075927] do_general_protection+0xcc/0x160
[ 1434.081426] general_protection+0x1e/0x30
[ 1434.086523] RIP: 0010:perf_prepare_sample+0x8f/0x510
[ 1434.092683] Code: ff ff 41 f6 c4 01 0f 85 22 02 00 00 41 f6 c4 20 74 26 4d 85 e4 0f 88 0c 01 00 00 4c 89 f6 4c 89 ff e8 f5 fe ff ff 49 89 45 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 41 f7 c4 00 04 00 00 74
[ 1434.114974] RSP: 0000:ffff95b3ff843a78 EFLAGS: 00010082
[ 1434.121474] RAX: 8d1217eb826cce00 RBX: ffff95b3ff843ad8 RCX: 000000000000001f
[ 1434.130119] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
[ 1434.138755] RBP: ffff95b3ff843ab0 R08: ffffd016493f3b42 R09: 0000000000000000
[ 1434.147383] R10: ffff95b3ff843a08 R11: 0000000000000000 R12: 80000000000306e5
[ 1434.156003] R13: ffff95b3ff843bc0 R14: ffff95b3ff843b18 R15: ffff95b3f6b65800
[ 1434.164622] ? perf_prepare_sample+0x29b/0x510
[ 1434.170239] ? intel_pmu_drain_bts_buffer+0x194/0x230
[ 1434.176543] intel_pmu_drain_bts_buffer+0x160/0x230
[ 1434.182655] ? tick_nohz_irq_exit+0x31/0x40
[ 1434.187991] ? smp_call_function_single_interrupt+0x48/0xe0
[ 1434.194890] ? call_function_single_interrupt+0xf/0x20
[ 1434.201311] ? call_function_single_interrupt+0xa/0x20
[ 1434.207712] ? x86_schedule_events+0x1a0/0x2f0
[ 1434.213344] ? x86_pmu_commit_txn+0xb4/0x100
[ 1434.218780] ? find_busiest_group+0x47/0x5d0
[ 1434.224212] ? perf_event_set_state.part.42+0x12/0x50
[ 1434.230528] ? perf_mux_hrtimer_restart+0x40/0xb0
[ 1434.236447] intel_pmu_disable_event+0xae/0x100
[ 1434.242181] ? intel_pmu_disable_event+0xae/0x100
[ 1434.248104] x86_pmu_stop+0x7a/0xb0
[ 1434.252674] x86_pmu_del+0x57/0x120
[ 1434.257230] event_sched_out.isra.101+0x83/0x180
[ 1434.263062] group_sched_out.part.103+0x57/0xe0
[ 1434.268793] ctx_sched_out+0x188/0x240
[ 1434.273654] ctx_resched+0xa8/0xd0
[ 1434.278122] __perf_event_enable+0x193/0x1e0
[ 1434.283563] event_function+0x8e/0xc0
[ 1434.288319] remote_function+0x41/0x50
[ 1434.293170] flush_smp_call_function_queue+0x68/0x100
[ 1434.299485] generic_smp_call_function_single_interrupt+0x13/0x30
[ 1434.306951] smp_call_function_single_interrupt+0x3e/0xe0
[ 1434.313626] call_function_single_interrupt+0xf/0x20
[ 1434.319802] </IRQ>
[ 1434.322763] RIP: 0010:cpuidle_enter_state+0xc2/0x460
[ 1434.328936] Code: 66 90 31 ff e8 2f 19 8a ff 80 7d c7 00 74 17 9c 58 66 66 90 66 90 f6 c4 02 0f 85 69 03 00 00 31 ff e8 f2 25 90 ff fb 66 66 90 <66> 66 90 45 85 ed 0f 88 37 03 00 00 4c 8b 45 d0 4c 2b 45 b8 48 ba
[ 1434.351246] RSP: 0000:ffffb1d70c527e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[ 1434.360394] RAX: ffff95b3ff862a00 RBX: ffffffff8d1449c0 RCX: 000000000000001f
[ 1434.369055] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
[ 1434.377721] RBP: ffffb1d70c527e90 R08: ffffd016493f3b42 R09: ffff95b3ff861a64
[ 1434.386398] R10: ffffb1d70c527e18 R11: 0000000000000050 R12: ffffd1aaefa40090
[ 1434.395071] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff8d1449c0
[ 1434.403760] ? cpuidle_enter_state+0xa1/0x460
[ 1434.409346] cpuidle_enter+0x17/0x20
[ 1434.414054] call_cpuidle+0x23/0x40
[ 1434.418644] do_idle+0x201/0x280
[ 1434.422918] cpu_startup_entry+0x1d/0x20
[ 1434.427955] start_secondary+0x1ab/0x200
[ 1434.432983] secondary_startup_64+0xa4/0xb0
[ 1434.438301] ---[ end trace 826a05dfaf1c2409 ]---
Thanks,
Ravi
On Fri, 25 Jan 2019, Ravi Bangoria wrote:
> I'm seeing a system crash while running perf_fuzzer with an upstream kernel
> on an Intel machine. I have hit the crash twice (I don't have the log of the
> first crash, so I don't know whether the reason is the same for both).
> I've attached my .config to this mail.
> type = PERF_TYPE_HARDWARE;
> ========
>
> And I'm running the fuzzer in a loop as *root*. (Let me know if running
> as root is harmful ;-) ).
There's a known issue related to Intel BTS events that you can trigger
with the perf_fuzzer, even as a normal user. I reported it a few months
ago, but I don't think it ever got resolved. The traces you're getting look
similar to the ones I posted back then.
It's hard to track down, as it doesn't seem to be a simple issue; rather,
it looks like the BTS event handling is somehow stomping over memory it
shouldn't.
Vince
> [Fri Jan 25 10:28:53 2019] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
> [Fri Jan 25 10:29:08 2019] perf: interrupt took too long (3136 > 3126), lowering kernel.perf_event_max_sample_rate to 63750
> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (4140 > 3920), lowering kernel.perf_event_max_sample_rate to 48250
> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (5231 > 5175), lowering kernel.perf_event_max_sample_rate to 38000
> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (6736 > 6538), lowering kernel.perf_event_max_sample_rate to 29500
These are fairly normal.
> [Fri Jan 25 10:32:44 2019] ------------[ cut here ]------------
> [Fri Jan 25 10:32:44 2019] perfevents: irq loop stuck!
I believe it's always possible to cause an irq loop. This happens when
the PMU is programmed to cause PMIs on multiple counters
too quickly. Maybe we should just recover from it without printing such
scary messages.
Right now the scary message is justified because the handler resets the
complete PMU. Perhaps we need to be a bit more selective and reset only
the events that loop.
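For reference, the message comes from the loop guard in
intel_pmu_handle_irq(); it looks roughly like this (paraphrased sketch,
not the verbatim upstream code):

again:
	status = intel_pmu_get_status();
	if (!status)
		goto done;

	if (++loops > 100) {		/* cap on PMI handler iterations */
		static bool warned;

		if (!warned) {
			WARN(1, "perfevents: irq loop stuck!\n");
			perf_event_print_debug();	/* the CPU#1 dump above */
			warned = true;
		}
		intel_pmu_reset();	/* clears the complete PMU state */
		goto done;
	}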
> [Fri Jan 25 10:32:44 2019] WARNING: CPU: 1 PID: 0 at arch/x86/events/intel/core.c:2440 intel_pmu_handle_irq+0x158/0x170
This looks independent.
I would apply the following patch (cut'n'pasted, so it may need to be applied manually)
and then run with
cd /sys/kernel/debug/tracing
echo 50000 > buffer_size_kb
echo default_do_nmi > set_graph_function
echo 1 > events/msr/enable
echo 'msr != 0xc0000100 && msr != 0x6e0' > events/msr/write_msr/filter
echo function_graph > current_tracer
echo printk:traceoff > set_ftrace_filter
echo 1 > tracing_on
and then collect the trace from /sys/kernel/debug/tracing/trace
after the oops. This should show the context in which it happens.
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 3c022e33c109..8afc997110e0 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -1,7 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
-ifdef CONFIG_FUNCTION_TRACER
-CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
-endif
obj-y := core.o ring_buffer.o callchain.o
On Fri, Jan 25, 2019 at 12:16:27PM +0530, Ravi Bangoria wrote:
SNIP
> [ 1432.176374] general protection fault: 0000 [#1] SMP PTI
> [ 1432.182253] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 5.0.0-rc3-ravi-pfuzzer+ #1
> [ 1432.192088] Hardware name: IBM CPU PLANAR -[8722xyz]-/00FL808, BIOS -[KOE162DUS-2.30]- 08/27/2018
> [ 1432.206120] RIP: 0010:perf_prepare_sample+0x8f/0x510
> [ 1432.211679] Code: ff ff 41 f6 c4 01 0f 85 22 02 00 00 41 f6 c4 20 74 26 4d 85 e4 0f 88 0c 01 00 00 4c 89 f6 4c 89 ff e8 f5 fe ff ff 49 89 45 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 41 f7 c4 00 04 00 00 74
> [ 1432.232699] RSP: 0000:ffff95b3ff843a78 EFLAGS: 00010082
> [ 1432.238548] RAX: 8d1217eb826cce00 RBX: ffff95b3ff843ad8 RCX: 000000000000001f
> [ 1432.246536] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
> [ 1432.254522] RBP: ffff95b3ff843ab0 R08: ffffd016493f3b42 R09: 0000000000000000
> [ 1432.262508] R10: ffff95b3ff843a08 R11: 0000000000000000 R12: 80000000000306e5
> [ 1432.270495] R13: ffff95b3ff843bc0 R14: ffff95b3ff843b18 R15: ffff95b3f6b65800
> [ 1432.278483] FS: 0000000000000000(0000) GS:ffff95b3ff840000(0000) knlGS:0000000000000000
> [ 1432.287539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1432.293969] CR2: 000055bf7f768c90 CR3: 0000001fd220e005 CR4: 00000000000606e0
> [ 1432.301956] Call Trace:
> [ 1432.304697] <IRQ>
> [ 1432.306956] ? intel_pmu_drain_bts_buffer+0x194/0x230
> [ 1432.312615] intel_pmu_drain_bts_buffer+0x160/0x230
> [ 1432.318078] ? tick_nohz_irq_exit+0x31/0x40
> [ 1432.322765] ? smp_call_function_single_interrupt+0x48/0xe0
> [ 1432.328993] ? call_function_single_interrupt+0xf/0x20
> [ 1432.334745] ? call_function_single_interrupt+0xa/0x20
> [ 1432.340501] ? x86_schedule_events+0x1a0/0x2f0
> [ 1432.345475] ? x86_pmu_commit_txn+0xb4/0x100
> [ 1432.350258] ? find_busiest_group+0x47/0x5d0
> [ 1432.355039] ? perf_event_set_state.part.42+0x12/0x50
> [ 1432.360694] ? perf_mux_hrtimer_restart+0x40/0xb0
> [ 1432.365960] intel_pmu_disable_event+0xae/0x100
> [ 1432.371031] ? intel_pmu_disable_event+0xae/0x100
> [ 1432.376297] x86_pmu_stop+0x7a/0xb0
> [ 1432.380201] x86_pmu_del+0x57/0x120
> [ 1432.384105] event_sched_out.isra.101+0x83/0x180
> [ 1432.389272] group_sched_out.part.103+0x57/0xe0
> [ 1432.394343] ctx_sched_out+0x188/0x240
> [ 1432.398539] ctx_resched+0xa8/0xd0
> [ 1432.402345] __perf_event_enable+0x193/0x1e0
> [ 1432.407125] event_function+0x8e/0xc0
> [ 1432.411222] remote_function+0x41/0x50
> [ 1432.415418] flush_smp_call_function_queue+0x68/0x100
> [ 1432.421071] generic_smp_call_function_single_interrupt+0x13/0x30
> [ 1432.427893] smp_call_function_single_interrupt+0x3e/0xe0
> [ 1432.433936] call_function_single_interrupt+0xf/0x20
> [ 1432.440204] </IRQ>
hi,
I finally managed to reproduce this one ;-)
I also found a reproducer for the crash on my server, and now I'm running
the fuzzer to see if the patch below cures it as well.. so far so good ;-)
the crash happens when I create an 'almost bts' event with:
event = branch-instructions:up
sample_period = 2
sample_type = CALLCHAIN
after I open the event, I change the sample_period to '1' via ioctl,
which makes the event 'full bts', but no bts checks are performed,
so the bts drain crashes
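for illustration, a minimal sketch of such a reproducer (untested as
written; I'm reading ':up' as a user-space-only event here, and the exact
scheduling of the event may matter):

/* open an 'almost bts' event, then flip its period to 1 via ioctl */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	unsigned long long period = 1;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size           = sizeof(attr);
	attr.type           = PERF_TYPE_HARDWARE;
	attr.config         = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
	attr.sample_period  = 2;		/* != 1, so not bts at open time */
	attr.sample_type    = PERF_SAMPLE_CALLCHAIN;
	attr.exclude_kernel = 1;		/* count user space only */

	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	/* the period becomes 1 -> the event now matches the bts conditions,
	 * but none of the bts sanity checks have been run on it */
	if (ioctl(fd, PERF_EVENT_IOC_PERIOD, &period) < 0)
		perror("PERF_EVENT_IOC_PERIOD");

	for (;;)
		getppid();			/* generate branches/samples */
}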
the patch below adds a check_period pmu callback.. I need to check if there's
a better way to do this, but so far it fixes the crash for me
if you guys could check this patch, that'd be great
thanks,
jirka
---
arch/x86/events/core.c | 6 ++++++
arch/x86/events/intel/core.c | 9 +++++++++
arch/x86/events/perf_event.h | 2 ++
include/linux/perf_event.h | 5 +++++
kernel/events/core.c | 31 +++++++++++++++++++++++++++++++
5 files changed, 53 insertions(+)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a19712e20..e5db4fad517e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2278,6 +2278,11 @@ void perf_check_microcode(void)
x86_pmu.check_microcode();
}
+static int x86_pmu_check_period(struct perf_event *event)
+{
+ return x86_pmu.check_period ? x86_pmu.check_period(event) : 0;
+}
+
static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
@@ -2302,6 +2307,7 @@ static struct pmu pmu = {
.event_idx = x86_pmu_event_idx,
.sched_task = x86_pmu_sched_task,
.task_ctx_size = sizeof(struct x86_perf_task_context),
+ .check_period = x86_pmu_check_period,
};
void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 40e12cfc87f6..125930e328c8 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3584,6 +3584,11 @@ static void intel_pmu_sched_task(struct perf_event_context *ctx,
intel_pmu_lbr_sched_task(ctx, sched_in);
}
+static int intel_pmu_check_period(struct perf_event *event)
+{
+ return intel_pmu_has_bts(event) ? -EINVAL : 0;
+}
+
PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -3663,6 +3668,8 @@ static __initconst const struct x86_pmu core_pmu = {
.cpu_prepare = intel_pmu_cpu_prepare,
.cpu_starting = intel_pmu_cpu_starting,
.cpu_dying = intel_pmu_cpu_dying,
+
+ .check_period = intel_pmu_check_period,
};
static struct attribute *intel_pmu_attrs[];
@@ -3705,6 +3712,8 @@ static __initconst const struct x86_pmu intel_pmu = {
.cpu_dying = intel_pmu_cpu_dying,
.guest_get_msrs = intel_guest_get_msrs,
.sched_task = intel_pmu_sched_task,
+
+ .check_period = intel_pmu_check_period,
};
static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b7031bfc..170b58d48710 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -646,6 +646,8 @@ struct x86_pmu {
* Intel host/guest support (KVM)
*/
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
+
+ int (*check_period) (struct perf_event *event);
};
struct x86_perf_task_context {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a79e59fc3b7d..749a5c3110e2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -447,6 +447,11 @@ struct pmu {
* Filter events for PMU-specific reasons.
*/
int (*filter_match) (struct perf_event *event); /* optional */
+
+ /*
+ * Check period for PERF_EVENT_IOC_PERIOD ioctl.
+ */
+ int (*check_period) (struct perf_event *event); /* optional */
};
enum perf_addr_filter_action_t {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 280a72b3a553..22ec63a0782e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4969,6 +4969,26 @@ static void __perf_event_period(struct perf_event *event,
}
}
+static int check_period(struct perf_event *event, u64 value)
+{
+ u64 sample_period_attr = event->attr.sample_period;
+ u64 sample_period_hw = event->hw.sample_period;
+ int ret;
+
+ if (event->attr.freq) {
+ event->attr.sample_freq = value;
+ } else {
+ event->attr.sample_period = value;
+ event->hw.sample_period = value;
+ }
+
+ ret = event->pmu->check_period(event);
+
+ event->attr.sample_period = sample_period_attr;
+ event->hw.sample_period = sample_period_hw;
+ return ret;
+}
+
static int perf_event_period(struct perf_event *event, u64 __user *arg)
{
u64 value;
@@ -4985,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
if (event->attr.freq && value > sysctl_perf_event_sample_rate)
return -EINVAL;
+ if (check_period(event, value))
+ return -EINVAL;
+
event_function_call(event, __perf_event_period, &value);
return 0;
@@ -9601,6 +9624,11 @@ static int perf_pmu_nop_int(struct pmu *pmu)
return 0;
}
+static int perf_event_nop_int(struct perf_event *event)
+{
+ return 0;
+}
+
static DEFINE_PER_CPU(unsigned int, nop_txn_flags);
static void perf_pmu_start_txn(struct pmu *pmu, unsigned int flags)
@@ -9901,6 +9929,9 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
pmu->pmu_disable = perf_pmu_nop_void;
}
+ if (!pmu->check_period)
+ pmu->check_period = perf_event_nop_int;
+
if (!pmu->event_idx)
pmu->event_idx = perf_event_idx_default;
--
2.17.2
Jiri Olsa <[email protected]> writes:
>
> the patch below adds a check_period pmu callback.. I need to check if there's
> a better way to do this, but so far it fixes the crash for me
>
> if you guys could check this patch, that'd be great
There's already a limit_period callback; perhaps that could
be extended. But OK, we can do it this way too.
I suspect there are some other cases that need this callback, not
just BTS, e.g. the checks in hsw_hw_config.
-Andi
On Wed, Jan 30, 2019 at 12:39:47PM -0800, Andi Kleen wrote:
> Jiri Olsa <[email protected]> writes:
> >
> > the patch below adds a check_period pmu callback.. I need to check if there's
> > a better way to do this, but so far it fixes the crash for me
> >
> > if you guys could check this patch, that'd be great
>
> There's already a limit_period callback; perhaps that could
> be extended. But OK, we can do it this way too.
right, we call it within x86_perf_event_set_period to limit
the period, but I guess we should include this check after
changing the period:
if (event->attr.sample_period && x86_pmu.limit_period) {
if (x86_pmu.limit_period(event, event->attr.sample_period) >
event->attr.sample_period)
return -EINVAL;
}
>
> I suspect there are some other cases that need this callback, not
> just BTS, e.g. the checks in hsw_hw_config
ok, the sample_period values for checkpointed events.. we should
check those as well, I'll add that
jirka
On Wed, Jan 30, 2019 at 07:36:48PM +0100, Jiri Olsa wrote:
> On Fri, Jan 25, 2019 at 12:16:27PM +0530, Ravi Bangoria wrote:
>
> SNIP
>
> > [ 1432.176374] general protection fault: 0000 [#1] SMP PTI
> > [ 1432.182253] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 5.0.0-rc3-ravi-pfuzzer+ #1
> > [ 1432.192088] Hardware name: IBM CPU PLANAR -[8722xyz]-/00FL808, BIOS -[KOE162DUS-2.30]- 08/27/2018
> > [ 1432.206120] RIP: 0010:perf_prepare_sample+0x8f/0x510
> > [ 1432.211679] Code: ff ff 41 f6 c4 01 0f 85 22 02 00 00 41 f6 c4 20 74 26 4d 85 e4 0f 88 0c 01 00 00 4c 89 f6 4c 89 ff e8 f5 fe ff ff 49 89 45 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 41 f7 c4 00 04 00 00 74
> > [ 1432.232699] RSP: 0000:ffff95b3ff843a78 EFLAGS: 00010082
> > [ 1432.238548] RAX: 8d1217eb826cce00 RBX: ffff95b3ff843ad8 RCX: 000000000000001f
> > [ 1432.246536] RDX: 0000000000000000 RSI: 00000000355563e5 RDI: 0000000000000000
> > [ 1432.254522] RBP: ffff95b3ff843ab0 R08: ffffd016493f3b42 R09: 0000000000000000
> > [ 1432.262508] R10: ffff95b3ff843a08 R11: 0000000000000000 R12: 80000000000306e5
> > [ 1432.270495] R13: ffff95b3ff843bc0 R14: ffff95b3ff843b18 R15: ffff95b3f6b65800
> > [ 1432.278483] FS: 0000000000000000(0000) GS:ffff95b3ff840000(0000) knlGS:0000000000000000
> > [ 1432.287539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1432.293969] CR2: 000055bf7f768c90 CR3: 0000001fd220e005 CR4: 00000000000606e0
> > [ 1432.301956] Call Trace:
> > [ 1432.304697] <IRQ>
> > [ 1432.306956] ? intel_pmu_drain_bts_buffer+0x194/0x230
> > [ 1432.312615] intel_pmu_drain_bts_buffer+0x160/0x230
> > [ 1432.318078] ? tick_nohz_irq_exit+0x31/0x40
> > [ 1432.322765] ? smp_call_function_single_interrupt+0x48/0xe0
> > [ 1432.328993] ? call_function_single_interrupt+0xf/0x20
> > [ 1432.334745] ? call_function_single_interrupt+0xa/0x20
> > [ 1432.340501] ? x86_schedule_events+0x1a0/0x2f0
> > [ 1432.345475] ? x86_pmu_commit_txn+0xb4/0x100
> > [ 1432.350258] ? find_busiest_group+0x47/0x5d0
> > [ 1432.355039] ? perf_event_set_state.part.42+0x12/0x50
> > [ 1432.360694] ? perf_mux_hrtimer_restart+0x40/0xb0
> > [ 1432.365960] intel_pmu_disable_event+0xae/0x100
> > [ 1432.371031] ? intel_pmu_disable_event+0xae/0x100
> > [ 1432.376297] x86_pmu_stop+0x7a/0xb0
> > [ 1432.380201] x86_pmu_del+0x57/0x120
> > [ 1432.384105] event_sched_out.isra.101+0x83/0x180
> > [ 1432.389272] group_sched_out.part.103+0x57/0xe0
> > [ 1432.394343] ctx_sched_out+0x188/0x240
> > [ 1432.398539] ctx_resched+0xa8/0xd0
> > [ 1432.402345] __perf_event_enable+0x193/0x1e0
> > [ 1432.407125] event_function+0x8e/0xc0
> > [ 1432.411222] remote_function+0x41/0x50
> > [ 1432.415418] flush_smp_call_function_queue+0x68/0x100
> > [ 1432.421071] generic_smp_call_function_single_interrupt+0x13/0x30
> > [ 1432.427893] smp_call_function_single_interrupt+0x3e/0xe0
> > [ 1432.433936] call_function_single_interrupt+0xf/0x20
> > [ 1432.440204] </IRQ>
>
> hi,
> I finally managed to reproduce this one ;-)
ugh.. the fuzzer managed to reproduce it again even with the patch below :-\
there must be some other way the event can become bts with a callchain
jirka
>
> I also found a reproducer for the crash on my server, and now I'm running
> the fuzzer to see if the patch below cures it as well.. so far so good ;-)
>
> the crash happens when I create an 'almost bts' event with:
> event = branch-instructions:up
> sample_period = 2
> sample_type = CALLCHAIN
>
> after I open the event, I change the sample_period to '1' via ioctl,
> which makes the event 'full bts', but no bts checks are performed,
> so the bts drain crashes
>
> the patch below adds a check_period pmu callback.. I need to check if there's
> a better way to do this, but so far it fixes the crash for me
>
> if you guys could check this patch, that'd be great
>
> thanks,
> jirka
>
>
> ---
> arch/x86/events/core.c | 6 ++++++
> arch/x86/events/intel/core.c | 9 +++++++++
> arch/x86/events/perf_event.h | 2 ++
> include/linux/perf_event.h | 5 +++++
> kernel/events/core.c | 31 +++++++++++++++++++++++++++++++
> 5 files changed, 53 insertions(+)
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 374a19712e20..e5db4fad517e 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2278,6 +2278,11 @@ void perf_check_microcode(void)
> x86_pmu.check_microcode();
> }
>
> +static int x86_pmu_check_period(struct perf_event *event)
> +{
> + return x86_pmu.check_period ? x86_pmu.check_period(event) : 0;
> +}
> +
> static struct pmu pmu = {
> .pmu_enable = x86_pmu_enable,
> .pmu_disable = x86_pmu_disable,
> @@ -2302,6 +2307,7 @@ static struct pmu pmu = {
> .event_idx = x86_pmu_event_idx,
> .sched_task = x86_pmu_sched_task,
> .task_ctx_size = sizeof(struct x86_perf_task_context),
> + .check_period = x86_pmu_check_period,
> };
>
> void arch_perf_update_userpage(struct perf_event *event,
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 40e12cfc87f6..125930e328c8 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3584,6 +3584,11 @@ static void intel_pmu_sched_task(struct perf_event_context *ctx,
> intel_pmu_lbr_sched_task(ctx, sched_in);
> }
>
> +static int intel_pmu_check_period(struct perf_event *event)
> +{
> + return intel_pmu_has_bts(event) ? -EINVAL : 0;
> +}
> +
> PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
>
> PMU_FORMAT_ATTR(ldlat, "config1:0-15");
> @@ -3663,6 +3668,8 @@ static __initconst const struct x86_pmu core_pmu = {
> .cpu_prepare = intel_pmu_cpu_prepare,
> .cpu_starting = intel_pmu_cpu_starting,
> .cpu_dying = intel_pmu_cpu_dying,
> +
> + .check_period = intel_pmu_check_period,
> };
>
> static struct attribute *intel_pmu_attrs[];
> @@ -3705,6 +3712,8 @@ static __initconst const struct x86_pmu intel_pmu = {
> .cpu_dying = intel_pmu_cpu_dying,
> .guest_get_msrs = intel_guest_get_msrs,
> .sched_task = intel_pmu_sched_task,
> +
> + .check_period = intel_pmu_check_period,
> };
>
> static __init void intel_clovertown_quirk(void)
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 78d7b7031bfc..170b58d48710 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -646,6 +646,8 @@ struct x86_pmu {
> * Intel host/guest support (KVM)
> */
> struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
> +
> + int (*check_period) (struct perf_event *event);
> };
>
> struct x86_perf_task_context {
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index a79e59fc3b7d..749a5c3110e2 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -447,6 +447,11 @@ struct pmu {
> * Filter events for PMU-specific reasons.
> */
> int (*filter_match) (struct perf_event *event); /* optional */
> +
> + /*
> + * Check period for PERF_EVENT_IOC_PERIOD ioctl.
> + */
> + int (*check_period) (struct perf_event *event); /* optional */
> };
>
> enum perf_addr_filter_action_t {
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 280a72b3a553..22ec63a0782e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4969,6 +4969,26 @@ static void __perf_event_period(struct perf_event *event,
> }
> }
>
> +static int check_period(struct perf_event *event, u64 value)
> +{
> + u64 sample_period_attr = event->attr.sample_period;
> + u64 sample_period_hw = event->hw.sample_period;
> + int ret;
> +
> + if (event->attr.freq) {
> + event->attr.sample_freq = value;
> + } else {
> + event->attr.sample_period = value;
> + event->hw.sample_period = value;
> + }
> +
> + ret = event->pmu->check_period(event);
> +
> + event->attr.sample_period = sample_period_attr;
> + event->hw.sample_period = sample_period_hw;
> + return ret;
> +}
> +
> static int perf_event_period(struct perf_event *event, u64 __user *arg)
> {
> u64 value;
> @@ -4985,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
> if (event->attr.freq && value > sysctl_perf_event_sample_rate)
> return -EINVAL;
>
> + if (check_period(event, value))
> + return -EINVAL;
> +
> event_function_call(event, __perf_event_period, &value);
>
> return 0;
> @@ -9601,6 +9624,11 @@ static int perf_pmu_nop_int(struct pmu *pmu)
> return 0;
> }
>
> +static int perf_event_nop_int(struct perf_event *event)
> +{
> + return 0;
> +}
> +
> static DEFINE_PER_CPU(unsigned int, nop_txn_flags);
>
> static void perf_pmu_start_txn(struct pmu *pmu, unsigned int flags)
> @@ -9901,6 +9929,9 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
> pmu->pmu_disable = perf_pmu_nop_void;
> }
>
> + if (!pmu->check_period)
> + pmu->check_period = perf_event_nop_int;
> +
> if (!pmu->event_idx)
> pmu->event_idx = perf_event_idx_default;
>
> --
> 2.17.2
>
Hi Andi,
On 1/25/19 9:30 PM, Andi Kleen wrote:
>> [Fri Jan 25 10:28:53 2019] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
>> [Fri Jan 25 10:29:08 2019] perf: interrupt took too long (3136 > 3126), lowering kernel.perf_event_max_sample_rate to 63750
>> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (4140 > 3920), lowering kernel.perf_event_max_sample_rate to 48250
>> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (5231 > 5175), lowering kernel.perf_event_max_sample_rate to 38000
>> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (6736 > 6538), lowering kernel.perf_event_max_sample_rate to 29500
>
> These are fairly normal.
I understand that the throttling mechanism is designed to do exactly this.
But I've observed that, every time I run the fuzzer, max_sample_rate gets
throttled down to 250 (which is CONFIG_HZ, I guess). Doesn't this
mean the interrupt time is somehow increasing gradually? Is that fine?
Here is a sample dmesg:
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (2928 > 2500), lowering kernel.perf_event_max_sample_rate to 68250
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (4363 > 3660), lowering kernel.perf_event_max_sample_rate to 45750
[Thu Jan 31 09:25:40 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 2.183 msecs
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (21382 > 5453), lowering kernel.perf_event_max_sample_rate to 9250
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (34548 > 26727), lowering kernel.perf_event_max_sample_rate to 5750
[Thu Jan 31 09:25:40 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 3.509 msecs
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (61682 > 43185), lowering kernel.perf_event_max_sample_rate to 3000
[Thu Jan 31 09:25:40 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 3.593 msecs
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (89206 > 77102), lowering kernel.perf_event_max_sample_rate to 2000
[Thu Jan 31 09:25:40 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 3.619 msecs
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (120188 > 111507), lowering kernel.perf_event_max_sample_rate to 1500
[Thu Jan 31 09:25:40 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 3.782 msecs
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (171065 > 150235), lowering kernel.perf_event_max_sample_rate to 1000
[Thu Jan 31 09:25:40 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 4.066 msecs
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (226815 > 213831), lowering kernel.perf_event_max_sample_rate to 750
[Thu Jan 31 09:25:40 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 5.364 msecs
[Thu Jan 31 09:25:40 2019] perf: interrupt took too long (300844 > 283518), lowering kernel.perf_event_max_sample_rate to 500
[Thu Jan 31 09:33:43 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 6.136 msecs
[Thu Jan 31 09:50:35 2019] perf: interrupt took too long (378352 > 376055), lowering kernel.perf_event_max_sample_rate to 500
[Thu Jan 31 09:53:47 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 6.456 msecs
[Thu Jan 31 09:57:31 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 11.002 msecs
[Thu Jan 31 10:01:30 2019] perf: interrupt took too long (478447 > 472940), lowering kernel.perf_event_max_sample_rate to 250
[Thu Jan 31 12:28:31 2019] perf: interrupt took too long (601630 > 598058), lowering kernel.perf_event_max_sample_rate to 250
[Thu Jan 31 12:28:31 2019] perf: interrupt took too long (754288 > 752037), lowering kernel.perf_event_max_sample_rate to 250
[Thu Jan 31 12:43:13 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 12.781 msecs
[Thu Jan 31 12:43:13 2019] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 13.583 msecs
Thanks,
Ravi
On Wed, Jan 30, 2019 at 07:36:48PM +0100, Jiri Olsa wrote:
SNIP
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 280a72b3a553..22ec63a0782e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4969,6 +4969,26 @@ static void __perf_event_period(struct perf_event *event,
> }
> }
>
> +static int check_period(struct perf_event *event, u64 value)
> +{
> + u64 sample_period_attr = event->attr.sample_period;
> + u64 sample_period_hw = event->hw.sample_period;
> + int ret;
> +
> + if (event->attr.freq) {
> + event->attr.sample_freq = value;
> + } else {
> + event->attr.sample_period = value;
> + event->hw.sample_period = value;
> + }
hm, I think we need to check the period without changing the event,
because we don't disable the PMU, so it might get picked up by the BTS code
will check
jirka
> +
> + ret = event->pmu->check_period(event);
> +
> + event->attr.sample_period = sample_period_attr;
> + event->hw.sample_period = sample_period_hw;
> + return ret;
> +}
> +
> static int perf_event_period(struct perf_event *event, u64 __user *arg)
> {
> u64 value;
> @@ -4985,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
> if (event->attr.freq && value > sysctl_perf_event_sample_rate)
> return -EINVAL;
>
> + if (check_period(event, value))
> + return -EINVAL;
> +
> event_function_call(event, __perf_event_period, &value);
>
> return 0;
> @@ -9601,6 +9624,11 @@ static int perf_pmu_nop_int(struct pmu *pmu)
> return 0;
> }
>
> +static int perf_event_nop_int(struct perf_event *event)
> +{
> + return 0;
> +}
> +
> static DEFINE_PER_CPU(unsigned int, nop_txn_flags);
>
> static void perf_pmu_start_txn(struct pmu *pmu, unsigned int flags)
> @@ -9901,6 +9929,9 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
> pmu->pmu_disable = perf_pmu_nop_void;
> }
>
> + if (!pmu->check_period)
> + pmu->check_period = perf_event_nop_int;
> +
> if (!pmu->event_idx)
> pmu->event_idx = perf_event_idx_default;
>
> --
> 2.17.2
>
On Thu, Jan 31, 2019 at 01:28:34PM +0530, Ravi Bangoria wrote:
> Hi Andi,
>
> On 1/25/19 9:30 PM, Andi Kleen wrote:
> >> [Fri Jan 25 10:28:53 2019] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
> >> [Fri Jan 25 10:29:08 2019] perf: interrupt took too long (3136 > 3126), lowering kernel.perf_event_max_sample_rate to 63750
> >> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (4140 > 3920), lowering kernel.perf_event_max_sample_rate to 48250
> >> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (5231 > 5175), lowering kernel.perf_event_max_sample_rate to 38000
> >> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (6736 > 6538), lowering kernel.perf_event_max_sample_rate to 29500
> >
> > These are fairly normal.
>
> I understand that the throttling mechanism is designed to do exactly this.
> But I've observed that, every time I run the fuzzer, max_sample_rate gets
> throttled down to 250 (which is CONFIG_HZ, I guess). Doesn't this
> mean the interrupt time is somehow increasing gradually? Is that fine?
It's more that the throttling mechanism is a feedback controller,
and it takes multiple tries to zoom in on the truly
needed value.
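To see why it converges step by step, here is a small user-space model of
that controller. This is a sketch only: the constants and helper names are
simplified from perf_sample_event_took() in kernel/events/core.c, not copied
verbatim.

#include <stdio.h>

#define HZ              250             /* CONFIG_HZ on the reporting machine */
#define NSEC_PER_SEC    1000000000ULL
#define MAX_PERCENT     25              /* kernel.perf_cpu_time_max_percent default */
#define NR_SAMPLES      8               /* decay window, illustrative */

static unsigned long long running_len;          /* decaying sum of PMI times (ns) */
static unsigned long long allowed_ns = 2500;    /* per-sample budget at rate 100000 */
static unsigned int rate = 100000;              /* perf_event_max_sample_rate */

static void sample_took(unsigned long long ns)
{
        unsigned long long avg, per_tick;

        /* decaying average of the last few PMI durations */
        running_len -= running_len / NR_SAMPLES;
        running_len += ns;
        avg = running_len / NR_SAMPLES;

        if (avg <= allowed_ns)
                return;

        printf("perf: interrupt took too long (%llu > %llu), ", avg, allowed_ns);

        /* add 25% headroom, so the next warning needs a clearly slower PMI */
        allowed_ns = avg + avg / 4;

        /*
         * How many samples of this cost fit in MAX_PERCENT of one tick?
         * With avg 2928 this gives 273 samples/tick, i.e. rate 68250,
         * matching the first line of the dmesg above.
         */
        per_tick = (NSEC_PER_SEC / HZ) * MAX_PERCENT / 100 / allowed_ns;
        if (per_tick < 1)
                per_tick = 1;   /* floor: one sample per tick, i.e. rate == HZ */
        rate = (unsigned int)(per_tick * HZ);

        printf("lowering kernel.perf_event_max_sample_rate to %u\n", rate);
}

int main(void)
{
        unsigned long long t;

        /* feed it PMIs that keep getting slower, as under the fuzzer */
        for (t = 2600; t < 3000000; t += t / 3)
                sample_took(t);
        return 0;
}

Note that the rate is only ever stepped down, never raised automatically,
and it bottoms out at one sample per tick. So any run where the average PMI
cost climbs past roughly half a millisecond ends at rate == HZ == 250, which
is exactly the floor observed above.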
You can measure the PMI time by enabling the nmi:nmi_handler
trace point, which reports it directly. From what I've seen
it's a long-tail distribution with regular large outliers.
Most of the PMIs are not that slow; just an occasional
few are.
When I did some investigation on this a couple of years back,
the outliers were either due to call stack processing
or due to flushing the perf ring buffer. There were some
fixes for the call stack case back then, but I'm sure more could
be done.
For the call stack processing there isn't much more we can do, I think
(other than switching to call stack LBR only),
but I suspect the buffer flushing problem could be improved further.
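(The LBR call stack mode is selected from the perf tool side, e.g.:

  perf record --call-graph lbr -- <workload>

on CPUs that support call-stack LBRs, i.e. Haswell and later.)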
It's relatively easy to investigate with a variant of the ftrace
recipe I posted earlier (but you need to fix the Makefile first
to enable ftrace for all of perf). Just add an ftrace trigger on the
nmi_handler trace point to stop tracing when the nmi_handler
time exceeds a threshold, and look at the traces.
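Something along these lines should do it. This is a sketch, assuming the
usual tracefs layout and the delta_ns field of the nmi:nmi_handler event;
the 5 ms threshold is arbitrary:

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/nmi/nmi_handler/enable
  # echo 'traceoff if delta_ns > 5000000' > events/nmi/nmi_handler/trigger
  # echo 1 > tracing_on
  (run the fuzzer until the trigger fires and tracing stops)
  # cat trace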
-Andi
On Fri, Jan 25, 2019 at 8:02 AM Andi Kleen <[email protected]> wrote:
>
> > [Fri Jan 25 10:28:53 2019] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
> > [Fri Jan 25 10:29:08 2019] perf: interrupt took too long (3136 > 3126), lowering kernel.perf_event_max_sample_rate to 63750
> > [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (4140 > 3920), lowering kernel.perf_event_max_sample_rate to 48250
> > [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (5231 > 5175), lowering kernel.perf_event_max_sample_rate to 38000
> > [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (6736 > 6538), lowering kernel.perf_event_max_sample_rate to 29500
>
> These are fairly normal.
>
> > [Fri Jan 25 10:32:44 2019] ------------[ cut here ]------------
> > [Fri Jan 25 10:32:44 2019] perfevents: irq loop stuck!
>
> I believe it's always possible to cause an irq loop. This happens when
> the PMU is programmed to cause PMIs on multiple counters
> too quickly. Maybe we should just recover from it without printing such
> scary messages.
Yeah, a stuck loop looks really scary inside an NMI handler.
Should I just go ahead and send a patch to remove this warning,
or perhaps turn it into a pr_info()?
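For context, the warning comes from a retry guard in intel_pmu_handle_irq().
Below is a user-space mock of its shape, with stubbed PMU accessors; it is
simplified from arch/x86/events/intel/core.c, not copied verbatim:

#include <stdio.h>

/* Stubs standing in for the real PMU accessors, so the shape is runnable. */
static int fake_pmi_budget = 150;       /* pretend the PMU keeps re-raising */

static unsigned long long intel_pmu_get_status(void)
{
        return fake_pmi_budget-- > 0 ? 1 : 0;   /* non-zero: overflow pending */
}

static void handle_overflowed_counters(unsigned long long status)
{
        (void)status;           /* drain BTS/PEBS, reprogram counters, ... */
}

static void intel_pmu_reset(void)
{
        fake_pmi_budget = 0;    /* the "heavy hammer": clear the whole PMU */
}

static int handle_irq_shape(void)
{
        unsigned long long status;
        int loops = 0, handled = 0;

again:
        status = intel_pmu_get_status();
        if (!status)
                return handled;

        /*
         * If counters are programmed to re-raise PMIs faster than the
         * handler can clear them, this loop never drains; after 100
         * rounds the real code WARNs ("perfevents: irq loop stuck!")
         * and resets the PMU.
         */
        if (++loops > 100) {
                fprintf(stderr, "perfevents: irq loop stuck!\n");
                intel_pmu_reset();
                return handled;
        }

        handled++;
        handle_overflowed_counters(status);
        goto again;             /* another counter may have overflowed meanwhile */
}

int main(void)
{
        printf("handled %d rounds\n", handle_irq_shape());
        return 0;
}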
>
> Right now the scary message is justified because it resets the complete
> PMU. Perhaps we need to be a bit more selective and reset only
> the events that loop.
>
> > [Fri Jan 25 10:32:44 2019] WARNING: CPU: 1 PID: 0 at arch/x86/events/intel/core.c:2440 intel_pmu_handle_irq+0x158/0x170
>
> This looks independent.
>
> I would apply the following patch (cut'n'pasted, so may need manual apply)
> and then run with
>
I would like to help, as we have been seeing this warning for a rather long time,
but unfortunately the reproducer provided by Ravi doesn't trigger any warning
or crash here. Maybe I'm not using the right hardware to trigger it?
[ 0.132136] Performance Events: PEBS fmt2+, Broadwell events, 16-deep LBR, full-width counters, Intel PMU driver.
[ 0.133003] ... version: 3
[ 0.134001] ... bit width: 48
[ 0.135001] ... generic registers: 4
[ 0.136001] ... value mask: 0000ffffffffffff
[ 0.137001] ... max period: 00007fffffffffff
[ 0.138001] ... fixed-purpose events: 3
[ 0.139001] ... event mask: 000000070000000f
Thanks!
> Yeah, a stuck loop looks really scary inside an NMI handler.
> Should I just go ahead and send a patch to remove this warning,
> or perhaps turn it into a pr_info()?
Not at this point. Would need to fix the PMU reset first to be
more selective.
-Andi
On Thu, Jan 31, 2019 at 09:27:11AM +0100, Jiri Olsa wrote:
> On Wed, Jan 30, 2019 at 07:36:48PM +0100, Jiri Olsa wrote:
>
> SNIP
>
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 280a72b3a553..22ec63a0782e 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -4969,6 +4969,26 @@ static void __perf_event_period(struct perf_event *event,
> > }
> > }
> >
> > +static int check_period(struct perf_event *event, u64 value)
> > +{
> > + u64 sample_period_attr = event->attr.sample_period;
> > + u64 sample_period_hw = event->hw.sample_period;
> > + int ret;
> > +
> > + if (event->attr.freq) {
> > + event->attr.sample_freq = value;
> > + } else {
> > + event->attr.sample_period = value;
> > + event->hw.sample_period = value;
> > + }
>
> hm, I think we need to check the period without changing the event,
> because we don't disable the PMU, so it might get picked up by the BTS code
>
> will check
with the attached patch I have not triggered the fuzzer crash
for over a day now; could you guys try?
thanks,
jirka
---
arch/x86/events/core.c | 14 ++++++++++++++
arch/x86/events/intel/core.c | 9 +++++++++
arch/x86/events/perf_event.h | 16 ++++++++++++++--
include/linux/perf_event.h | 5 +++++
kernel/events/core.c | 16 ++++++++++++++++
5 files changed, 58 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a19712e20..b684f0294f35 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2278,6 +2278,19 @@ void perf_check_microcode(void)
x86_pmu.check_microcode();
}
+static int x86_pmu_check_period(struct perf_event *event, u64 value)
+{
+ if (x86_pmu.check_period && x86_pmu.check_period(event, value))
+ return -EINVAL;
+
+ if (value && x86_pmu.limit_period) {
+ if (x86_pmu.limit_period(event, value) > value)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
@@ -2302,6 +2315,7 @@ static struct pmu pmu = {
.event_idx = x86_pmu_event_idx,
.sched_task = x86_pmu_sched_task,
.task_ctx_size = sizeof(struct x86_perf_task_context),
+ .check_period = x86_pmu_check_period,
};
void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 8b4e3020aba2..a3fbbd33beef 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3586,6 +3586,11 @@ static void intel_pmu_sched_task(struct perf_event_context *ctx,
intel_pmu_lbr_sched_task(ctx, sched_in);
}
+static int intel_pmu_check_period(struct perf_event *event, u64 value)
+{
+ return intel_pmu_has_bts_period(event, value) ? -EINVAL : 0;
+}
+
PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -3665,6 +3670,8 @@ static __initconst const struct x86_pmu core_pmu = {
.cpu_prepare = intel_pmu_cpu_prepare,
.cpu_starting = intel_pmu_cpu_starting,
.cpu_dying = intel_pmu_cpu_dying,
+
+ .check_period = intel_pmu_check_period,
};
static struct attribute *intel_pmu_attrs[];
@@ -3707,6 +3714,8 @@ static __initconst const struct x86_pmu intel_pmu = {
.cpu_dying = intel_pmu_cpu_dying,
.guest_get_msrs = intel_guest_get_msrs,
.sched_task = intel_pmu_sched_task,
+
+ .check_period = intel_pmu_check_period,
};
static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b7031bfc..d46fd6754d92 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -646,6 +646,11 @@ struct x86_pmu {
* Intel host/guest support (KVM)
*/
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
+
+ /*
+ * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
+ */
+ int (*check_period) (struct perf_event *event, u64 period);
};
struct x86_perf_task_context {
@@ -857,7 +862,7 @@ static inline int amd_pmu_init(void)
#ifdef CONFIG_CPU_SUP_INTEL
-static inline bool intel_pmu_has_bts(struct perf_event *event)
+static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period)
{
struct hw_perf_event *hwc = &event->hw;
unsigned int hw_event, bts_event;
@@ -868,7 +873,14 @@ static inline bool intel_pmu_has_bts(struct perf_event *event)
hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
- return hw_event == bts_event && hwc->sample_period == 1;
+ return hw_event == bts_event && period == 1;
+}
+
+static inline bool intel_pmu_has_bts(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ return intel_pmu_has_bts_period(event, hwc->sample_period);
}
int intel_pmu_save_and_restart(struct perf_event *event);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f2f9f8592d42..0a7d4d6a3660 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -447,6 +447,11 @@ struct pmu {
* Filter events for PMU-specific reasons.
*/
int (*filter_match) (struct perf_event *event); /* optional */
+
+ /*
+ * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
+ */
+ int (*check_period) (struct perf_event *event, u64 value); /* optional */
};
enum perf_addr_filter_action_t {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 214c434ddc1b..edd92dc556fb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4969,6 +4969,11 @@ static void __perf_event_period(struct perf_event *event,
}
}
+static int perf_event_check_period(struct perf_event *event, u64 value)
+{
+ return event->pmu->check_period(event, value);
+}
+
static int perf_event_period(struct perf_event *event, u64 __user *arg)
{
u64 value;
@@ -4985,6 +4990,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
if (event->attr.freq && value > sysctl_perf_event_sample_rate)
return -EINVAL;
+ if (perf_event_check_period(event, value))
+ return -EINVAL;
+
event_function_call(event, __perf_event_period, &value);
return 0;
@@ -9601,6 +9609,11 @@ static int perf_pmu_nop_int(struct pmu *pmu)
return 0;
}
+static int perf_event_nop_int(struct perf_event *event, u64 value)
+{
+ return 0;
+}
+
static DEFINE_PER_CPU(unsigned int, nop_txn_flags);
static void perf_pmu_start_txn(struct pmu *pmu, unsigned int flags)
@@ -9901,6 +9914,9 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
pmu->pmu_disable = perf_pmu_nop_void;
}
+ if (!pmu->check_period)
+ pmu->check_period = perf_event_nop_int;
+
if (!pmu->event_idx)
pmu->event_idx = perf_event_idx_default;
--
2.17.2
Hi Jiri,
On 2/1/19 1:13 PM, Jiri Olsa wrote:
> On Thu, Jan 31, 2019 at 09:27:11AM +0100, Jiri Olsa wrote:
>> On Wed, Jan 30, 2019 at 07:36:48PM +0100, Jiri Olsa wrote:
>>
>> SNIP
>>
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index 280a72b3a553..22ec63a0782e 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -4969,6 +4969,26 @@ static void __perf_event_period(struct perf_event *event,
>>> }
>>> }
>>>
>>> +static int check_period(struct perf_event *event, u64 value)
>>> +{
>>> + u64 sample_period_attr = event->attr.sample_period;
>>> + u64 sample_period_hw = event->hw.sample_period;
>>> + int ret;
>>> +
>>> + if (event->attr.freq) {
>>> + event->attr.sample_freq = value;
>>> + } else {
>>> + event->attr.sample_period = value;
>>> + event->hw.sample_period = value;
>>> + }
>>
>> hm, I think we need to check the period without changing the event,
>> because we don't disable the PMU, so it might get picked up by the BTS code
>>
>> will check
>
> with the attached patch I have not triggered the fuzzer crash
> for over a day now; could you guys try?
I ran the fuzzer for a couple of hours but didn't see any crash with
your previous patch.
I'll try this newer one as well.
Thanks.
On Fri, 1 Feb 2019, Jiri Olsa wrote:
> with the attached patch I have not triggered the fuzzer crash
> for over a day now; could you guys try?
I've just started fuzzing with the patch applied. Often it takes a few
hours to trigger the bug.
An added question about this bug: it appeared that the crash was triggered
by the BTS driver overwriting kernel memory. Was the data being written
user-controllable? Meaning, is this a security issue being fixed, or
just a crashing issue?
Vince Weaver
[email protected]
On Fri, Feb 01, 2019 at 11:27:28AM -0500, Vince Weaver wrote:
> On Fri, 1 Feb 2019, Jiri Olsa wrote:
>
> > with the attached patch I have not triggered the fuzzer crash
> > for over a day now; could you guys try?
>
> I've just started fuzzing with the patch applied. Often it takes a few
> hours to trigger the bug.
cool, thanks
>
> An added question about this bug: it appeared that the crash was triggered
> by the BTS driver overwriting kernel memory. Was the data being written
> user-controllable? Meaning, is this a security issue being fixed, or
> just a crashing issue?
yea, I have an example that can trigger it immediately
jirka
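For the record, a trigger along these lines can be reconstructed from the
sequence described in the changelog below. This is a sketch, not necessarily
Jiri's actual program; the uapi names are standard, but error handling is
omitted:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        struct perf_event_attr attr;
        __u64 period = 1;
        volatile unsigned long i;
        int fd;

        /* the 'almost' BTS event: -e cpu/branch-instructions/up -c 2 -g */
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
        attr.sample_period = 2; /* not 1, so the init-time BTS checks don't apply */
        attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN;      /* -g */
        attr.precise_ip = 1;    /* the 'p' modifier */
        attr.exclude_kernel = 1;        /* the 'u' modifier */

        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);

        /*
         * Flip the period to 1, turning the event into a BTS event after
         * the fact; with the check_period patch applied this ioctl is
         * refused with EINVAL instead.
         */
        ioctl(fd, PERF_EVENT_IOC_PERIOD, &period);

        for (i = 0; i < (1UL << 28); i++)
                ;       /* retire branches until the counter overflows */

        close(fd);
        return 0;
}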
On 2/1/19 1:24 PM, Ravi Bangoria wrote:
> I ran the fuzzer for a couple of hours but didn't see any crash with
> your previous patch.
>
> I'll try this newer one as well.
I ran the fuzzer for ~8 hrs and no lockup so far.
Thanks.
On Sat, Feb 02, 2019 at 08:54:35AM +0530, Ravi Bangoria wrote:
>
>
> On 2/1/19 1:24 PM, Ravi Bangoria wrote:
> > I ran the fuzzer for a couple of hours but didn't see any crash with
> > your previous patch.
> >
> > I'll try this newer one as well.
>
> I ran the fuzzer for ~8 hrs and no lockup so far.
yep, looks good; mine's been running for 2 days now
I'll put together a changelog and send a proper patch
jirka
On Fri, 1 Feb 2019, Jiri Olsa wrote:
> >
> > I've just started fuzzing with the patch applied. Often it takes a few
> > hours to trigger the bug.
>
> cool, thanks
I let it run overnight and no crash.
> > An added question about this bug: it appeared that the crash was triggered
> > by the BTS driver overwriting kernel memory. Was the data being written
> > user-controllable? Meaning, is this a security issue being fixed, or
> > just a crashing issue?
>
> yea, I have an example that can trigger it immediately
I mean: the crash is happening because data structures are getting
overwritten by the BTS driver. Depending on who and what is doing this,
this could be a security issue (i.e., if it was raw BTS data with
partially userspace-controlled values). Though even if this were the case,
it would probably be hard to exploit.
Vince
On Sat, Feb 02, 2019 at 12:58:07PM -0500, Vince Weaver wrote:
> On Fri, 1 Feb 2019, Jiri Olsa wrote:
>
> > >
> > > I've just started fuzzing with the patch applied. Often it takes a few
> > > hours to trigger the bug.
> >
> > cool, thanks
>
> I let it run overnight and no crash.
>
> > > An added question about this bug: it appeared that the crash was triggered
> > > by the BTS driver overwriting kernel memory. Was the data being written
> > > user-controllable? Meaning, is this a security issue being fixed, or
> > > just a crashing issue?
> >
> > yea, I have an example that can trigger it immediately
>
> I mean: the crash is happening because data structures are getting
> overwritten by the BTS driver. Depending on who and what is doing this,
> this could be a security issue (i.e., if it was raw BTS data with
> partially userspace-controlled values). Though even if this were the case,
> it would probably be hard to exploit.
well, I'm not sure about other exploits,
but a single program can crash the server
attaching the complete patch (and cc-ing Ingo)
jirka
---
Vince (and later Ravi) reported a crash in the BTS code during
fuzzing, with the following backtrace:
general protection fault: 0000 [#1] SMP PTI
...
RIP: 0010:perf_prepare_sample+0x8f/0x510
...
Call Trace:
<IRQ>
? intel_pmu_drain_bts_buffer+0x194/0x230
intel_pmu_drain_bts_buffer+0x160/0x230
? tick_nohz_irq_exit+0x31/0x40
? smp_call_function_single_interrupt+0x48/0xe0
? call_function_single_interrupt+0xf/0x20
? call_function_single_interrupt+0xa/0x20
? x86_schedule_events+0x1a0/0x2f0
? x86_pmu_commit_txn+0xb4/0x100
? find_busiest_group+0x47/0x5d0
? perf_event_set_state.part.42+0x12/0x50
? perf_mux_hrtimer_restart+0x40/0xb0
intel_pmu_disable_event+0xae/0x100
? intel_pmu_disable_event+0xae/0x100
x86_pmu_stop+0x7a/0xb0
x86_pmu_del+0x57/0x120
event_sched_out.isra.101+0x83/0x180
group_sched_out.part.103+0x57/0xe0
ctx_sched_out+0x188/0x240
ctx_resched+0xa8/0xd0
__perf_event_enable+0x193/0x1e0
event_function+0x8e/0xc0
remote_function+0x41/0x50
flush_smp_call_function_queue+0x68/0x100
generic_smp_call_function_single_interrupt+0x13/0x30
smp_call_function_single_interrupt+0x3e/0xe0
call_function_single_interrupt+0xf/0x20
</IRQ>
The reason is that while the event init code does several checks
for BTS events and prevents several unwanted config bits for
BTS events (like precise_ip), PERF_EVENT_IOC_PERIOD allows
creating a BTS event without those checks being done.
The following sequence will cause the crash:
- create an 'almost' BTS event with precise_ip and callchains
(perf command line -e option equivalent):
-e cpu/branch-instructions/up -c 2 -g
- change the period of that event to '1', which turns
it into a BTS event, with precise_ip and callchains
That will immediately cause a crash in the perf_prepare_sample
function, because precise_ip events are expected to come
in with callchain data initialized, but that's not the
case for the intel_pmu_drain_bts_buffer caller.
Add a check_period callback to be called before the period
is changed via PERF_EVENT_IOC_PERIOD; it denies the change
if the event would become a BTS event. Also add the
limit_period check.
Cc: Vince Weaver <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Reported-by: Vince Weaver <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
arch/x86/events/core.c | 14 ++++++++++++++
arch/x86/events/intel/core.c | 9 +++++++++
arch/x86/events/perf_event.h | 16 ++++++++++++++--
include/linux/perf_event.h | 5 +++++
kernel/events/core.c | 16 ++++++++++++++++
5 files changed, 58 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a19712e20..b684f0294f35 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2278,6 +2278,19 @@ void perf_check_microcode(void)
x86_pmu.check_microcode();
}
+static int x86_pmu_check_period(struct perf_event *event, u64 value)
+{
+ if (x86_pmu.check_period && x86_pmu.check_period(event, value))
+ return -EINVAL;
+
+ if (value && x86_pmu.limit_period) {
+ if (x86_pmu.limit_period(event, value) > value)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
@@ -2302,6 +2315,7 @@ static struct pmu pmu = {
.event_idx = x86_pmu_event_idx,
.sched_task = x86_pmu_sched_task,
.task_ctx_size = sizeof(struct x86_perf_task_context),
+ .check_period = x86_pmu_check_period,
};
void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 40e12cfc87f6..d6532ae099f7 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3584,6 +3584,11 @@ static void intel_pmu_sched_task(struct perf_event_context *ctx,
intel_pmu_lbr_sched_task(ctx, sched_in);
}
+static int intel_pmu_check_period(struct perf_event *event, u64 value)
+{
+ return intel_pmu_has_bts_period(event, value) ? -EINVAL : 0;
+}
+
PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -3663,6 +3668,8 @@ static __initconst const struct x86_pmu core_pmu = {
.cpu_prepare = intel_pmu_cpu_prepare,
.cpu_starting = intel_pmu_cpu_starting,
.cpu_dying = intel_pmu_cpu_dying,
+
+ .check_period = intel_pmu_check_period,
};
static struct attribute *intel_pmu_attrs[];
@@ -3705,6 +3712,8 @@ static __initconst const struct x86_pmu intel_pmu = {
.cpu_dying = intel_pmu_cpu_dying,
.guest_get_msrs = intel_guest_get_msrs,
.sched_task = intel_pmu_sched_task,
+
+ .check_period = intel_pmu_check_period,
};
static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b7031bfc..d46fd6754d92 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -646,6 +646,11 @@ struct x86_pmu {
* Intel host/guest support (KVM)
*/
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
+
+ /*
+ * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
+ */
+ int (*check_period) (struct perf_event *event, u64 period);
};
struct x86_perf_task_context {
@@ -857,7 +862,7 @@ static inline int amd_pmu_init(void)
#ifdef CONFIG_CPU_SUP_INTEL
-static inline bool intel_pmu_has_bts(struct perf_event *event)
+static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period)
{
struct hw_perf_event *hwc = &event->hw;
unsigned int hw_event, bts_event;
@@ -868,7 +873,14 @@ static inline bool intel_pmu_has_bts(struct perf_event *event)
hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
- return hw_event == bts_event && hwc->sample_period == 1;
+ return hw_event == bts_event && period == 1;
+}
+
+static inline bool intel_pmu_has_bts(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ return intel_pmu_has_bts_period(event, hwc->sample_period);
}
int intel_pmu_save_and_restart(struct perf_event *event);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index dec882fc451b..fb1de91e1e0b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -447,6 +447,11 @@ struct pmu {
* Filter events for PMU-specific reasons.
*/
int (*filter_match) (struct perf_event *event); /* optional */
+
+ /*
+ * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
+ */
+ int (*check_period) (struct perf_event *event, u64 value); /* optional */
};
enum perf_addr_filter_action_t {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 280a72b3a553..45ac90d3ba24 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4969,6 +4969,11 @@ static void __perf_event_period(struct perf_event *event,
}
}
+static int perf_event_check_period(struct perf_event *event, u64 value)
+{
+ return event->pmu->check_period(event, value);
+}
+
static int perf_event_period(struct perf_event *event, u64 __user *arg)
{
u64 value;
@@ -4985,6 +4990,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
if (event->attr.freq && value > sysctl_perf_event_sample_rate)
return -EINVAL;
+ if (perf_event_check_period(event, value))
+ return -EINVAL;
+
event_function_call(event, __perf_event_period, &value);
return 0;
@@ -9601,6 +9609,11 @@ static int perf_pmu_nop_int(struct pmu *pmu)
return 0;
}
+static int perf_event_nop_int(struct perf_event *event, u64 value)
+{
+ return 0;
+}
+
static DEFINE_PER_CPU(unsigned int, nop_txn_flags);
static void perf_pmu_start_txn(struct pmu *pmu, unsigned int flags)
@@ -9901,6 +9914,9 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
pmu->pmu_disable = perf_pmu_nop_void;
}
+ if (!pmu->check_period)
+ pmu->check_period = perf_event_nop_int;
+
if (!pmu->event_idx)
pmu->event_idx = perf_event_idx_default;
--
2.17.2
On Mon, Feb 04, 2019 at 01:35:32PM +0100, Jiri Olsa wrote:
> Vince (and later Ravi) reported a crash in the BTS code during
> fuzzing, with the following backtrace:
>
> general protection fault: 0000 [#1] SMP PTI
> ...
> RIP: 0010:perf_prepare_sample+0x8f/0x510
> ...
> Call Trace:
> <IRQ>
> ? intel_pmu_drain_bts_buffer+0x194/0x230
> intel_pmu_drain_bts_buffer+0x160/0x230
> ? tick_nohz_irq_exit+0x31/0x40
> ? smp_call_function_single_interrupt+0x48/0xe0
> ? call_function_single_interrupt+0xf/0x20
> ? call_function_single_interrupt+0xa/0x20
> ? x86_schedule_events+0x1a0/0x2f0
> ? x86_pmu_commit_txn+0xb4/0x100
> ? find_busiest_group+0x47/0x5d0
> ? perf_event_set_state.part.42+0x12/0x50
> ? perf_mux_hrtimer_restart+0x40/0xb0
> intel_pmu_disable_event+0xae/0x100
> ? intel_pmu_disable_event+0xae/0x100
> x86_pmu_stop+0x7a/0xb0
> x86_pmu_del+0x57/0x120
> event_sched_out.isra.101+0x83/0x180
> group_sched_out.part.103+0x57/0xe0
> ctx_sched_out+0x188/0x240
> ctx_resched+0xa8/0xd0
> __perf_event_enable+0x193/0x1e0
> event_function+0x8e/0xc0
> remote_function+0x41/0x50
> flush_smp_call_function_queue+0x68/0x100
> generic_smp_call_function_single_interrupt+0x13/0x30
> smp_call_function_single_interrupt+0x3e/0xe0
> call_function_single_interrupt+0xf/0x20
> </IRQ>
>
> The reason is that while the event init code does several checks
> for BTS events and prevents several unwanted config bits for
> BTS events (like precise_ip), PERF_EVENT_IOC_PERIOD allows
> creating a BTS event without those checks being done.
>
> The following sequence will cause the crash:
> - create an 'almost' BTS event with precise_ip and callchains
> (perf command line -e option equivalent):
>
> -e cpu/branch-instructions/up -c 2 -g
>
> - change the period of that event to '1', which turns
> it into a BTS event, with precise_ip and callchains
>
> That will immediately cause a crash in the perf_prepare_sample
> function, because precise_ip events are expected to come
> in with callchain data initialized, but that's not the
> case for the intel_pmu_drain_bts_buffer caller.
>
> Add a check_period callback to be called before the period
> is changed via PERF_EVENT_IOC_PERIOD; it denies the change
> if the event would become a BTS event. Also add the
> limit_period check.
>
> Cc: Vince Weaver <[email protected]>
> Cc: Ravi Bangoria <[email protected]>
> Reported-by: Vince Weaver <[email protected]>
> Signed-off-by: Jiri Olsa <[email protected]>
Thanks Jiri!
Commit-ID: 81ec3f3c4c4d78f2d3b6689c9816bfbdf7417dbb
Gitweb: https://git.kernel.org/tip/81ec3f3c4c4d78f2d3b6689c9816bfbdf7417dbb
Author: Jiri Olsa <[email protected]>
AuthorDate: Mon, 4 Feb 2019 13:35:32 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 11 Feb 2019 11:46:43 +0100
perf/x86: Add check_period PMU callback
Vince (and later Ravi) reported crashes in the BTS code during
fuzzing with the following backtrace:
general protection fault: 0000 [#1] SMP PTI
...
RIP: 0010:perf_prepare_sample+0x8f/0x510
...
Call Trace:
<IRQ>
? intel_pmu_drain_bts_buffer+0x194/0x230
intel_pmu_drain_bts_buffer+0x160/0x230
? tick_nohz_irq_exit+0x31/0x40
? smp_call_function_single_interrupt+0x48/0xe0
? call_function_single_interrupt+0xf/0x20
? call_function_single_interrupt+0xa/0x20
? x86_schedule_events+0x1a0/0x2f0
? x86_pmu_commit_txn+0xb4/0x100
? find_busiest_group+0x47/0x5d0
? perf_event_set_state.part.42+0x12/0x50
? perf_mux_hrtimer_restart+0x40/0xb0
intel_pmu_disable_event+0xae/0x100
? intel_pmu_disable_event+0xae/0x100
x86_pmu_stop+0x7a/0xb0
x86_pmu_del+0x57/0x120
event_sched_out.isra.101+0x83/0x180
group_sched_out.part.103+0x57/0xe0
ctx_sched_out+0x188/0x240
ctx_resched+0xa8/0xd0
__perf_event_enable+0x193/0x1e0
event_function+0x8e/0xc0
remote_function+0x41/0x50
flush_smp_call_function_queue+0x68/0x100
generic_smp_call_function_single_interrupt+0x13/0x30
smp_call_function_single_interrupt+0x3e/0xe0
call_function_single_interrupt+0xf/0x20
</IRQ>
The reason is that while the event init code does several checks
for BTS events and prevents several unwanted config bits for
BTS events (like precise_ip), PERF_EVENT_IOC_PERIOD allows
creating a BTS event without those checks being done.
The following sequence will cause the crash: if we create an
'almost' BTS event with precise_ip and callchains, and then
change its period to 1, turning it into a BTS event, it will
crash the perf_prepare_sample() function, because precise_ip
events are expected to come in with callchain data initialized,
but that's not the case for the intel_pmu_drain_bts_buffer()
caller.
Add a check_period callback to be called before the period
is changed via PERF_EVENT_IOC_PERIOD; it denies the change
if the event would become a BTS event. Also add the
limit_period check.
Reported-by: Vince Weaver <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/core.c | 14 ++++++++++++++
arch/x86/events/intel/core.c | 9 +++++++++
arch/x86/events/perf_event.h | 16 ++++++++++++++--
include/linux/perf_event.h | 5 +++++
kernel/events/core.c | 16 ++++++++++++++++
5 files changed, 58 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a19712e20..b684f0294f35 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2278,6 +2278,19 @@ void perf_check_microcode(void)
x86_pmu.check_microcode();
}
+static int x86_pmu_check_period(struct perf_event *event, u64 value)
+{
+ if (x86_pmu.check_period && x86_pmu.check_period(event, value))
+ return -EINVAL;
+
+ if (value && x86_pmu.limit_period) {
+ if (x86_pmu.limit_period(event, value) > value)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable = x86_pmu_disable,
@@ -2302,6 +2315,7 @@ static struct pmu pmu = {
.event_idx = x86_pmu_event_idx,
.sched_task = x86_pmu_sched_task,
.task_ctx_size = sizeof(struct x86_perf_task_context),
+ .check_period = x86_pmu_check_period,
};
void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index daafb893449b..730978dff63f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3587,6 +3587,11 @@ static void intel_pmu_sched_task(struct perf_event_context *ctx,
intel_pmu_lbr_sched_task(ctx, sched_in);
}
+static int intel_pmu_check_period(struct perf_event *event, u64 value)
+{
+ return intel_pmu_has_bts_period(event, value) ? -EINVAL : 0;
+}
+
PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -3667,6 +3672,8 @@ static __initconst const struct x86_pmu core_pmu = {
.cpu_starting = intel_pmu_cpu_starting,
.cpu_dying = intel_pmu_cpu_dying,
.cpu_dead = intel_pmu_cpu_dead,
+
+ .check_period = intel_pmu_check_period,
};
static struct attribute *intel_pmu_attrs[];
@@ -3711,6 +3718,8 @@ static __initconst const struct x86_pmu intel_pmu = {
.guest_get_msrs = intel_guest_get_msrs,
.sched_task = intel_pmu_sched_task,
+
+ .check_period = intel_pmu_check_period,
};
static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b7031bfc..d46fd6754d92 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -646,6 +646,11 @@ struct x86_pmu {
* Intel host/guest support (KVM)
*/
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
+
+ /*
+ * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
+ */
+ int (*check_period) (struct perf_event *event, u64 period);
};
struct x86_perf_task_context {
@@ -857,7 +862,7 @@ static inline int amd_pmu_init(void)
#ifdef CONFIG_CPU_SUP_INTEL
-static inline bool intel_pmu_has_bts(struct perf_event *event)
+static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period)
{
struct hw_perf_event *hwc = &event->hw;
unsigned int hw_event, bts_event;
@@ -868,7 +873,14 @@ static inline bool intel_pmu_has_bts(struct perf_event *event)
hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
- return hw_event == bts_event && hwc->sample_period == 1;
+ return hw_event == bts_event && period == 1;
+}
+
+static inline bool intel_pmu_has_bts(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ return intel_pmu_has_bts_period(event, hwc->sample_period);
}
int intel_pmu_save_and_restart(struct perf_event *event);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1d5c551a5add..e1a051724f7e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -447,6 +447,11 @@ struct pmu {
* Filter events for PMU-specific reasons.
*/
int (*filter_match) (struct perf_event *event); /* optional */
+
+ /*
+ * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
+ */
+ int (*check_period) (struct perf_event *event, u64 value); /* optional */
};
enum perf_addr_filter_action_t {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e5ede6918050..26d6edab051a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4963,6 +4963,11 @@ static void __perf_event_period(struct perf_event *event,
}
}
+static int perf_event_check_period(struct perf_event *event, u64 value)
+{
+ return event->pmu->check_period(event, value);
+}
+
static int perf_event_period(struct perf_event *event, u64 __user *arg)
{
u64 value;
@@ -4979,6 +4984,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
if (event->attr.freq && value > sysctl_perf_event_sample_rate)
return -EINVAL;
+ if (perf_event_check_period(event, value))
+ return -EINVAL;
+
event_function_call(event, __perf_event_period, &value);
return 0;
@@ -9391,6 +9399,11 @@ static int perf_pmu_nop_int(struct pmu *pmu)
return 0;
}
+static int perf_event_nop_int(struct perf_event *event, u64 value)
+{
+ return 0;
+}
+
static DEFINE_PER_CPU(unsigned int, nop_txn_flags);
static void perf_pmu_start_txn(struct pmu *pmu, unsigned int flags)
@@ -9691,6 +9704,9 @@ got_cpu_context:
pmu->pmu_disable = perf_pmu_nop_void;
}
+ if (!pmu->check_period)
+ pmu->check_period = perf_event_nop_int;
+
if (!pmu->event_idx)
pmu->event_idx = perf_event_idx_default;
On Fri 2019-01-25 08:00:56, Andi Kleen wrote:
> > [Fri Jan 25 10:28:53 2019] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
> > [Fri Jan 25 10:29:08 2019] perf: interrupt took too long (3136 > 3126), lowering kernel.perf_event_max_sample_rate to 63750
> > [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (4140 > 3920), lowering kernel.perf_event_max_sample_rate to 48250
> > [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (5231 > 5175), lowering kernel.perf_event_max_sample_rate to 38000
> > [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (6736 > 6538), lowering kernel.perf_event_max_sample_rate to 29500
>
> These are fairly normal.
Unfortunately, they are. Could we set up our defaults so that they
don't normally happen?
pavel@amd:~/g/unicsy_demo$ dmesg | grep "took too"
[ 761.507893] perf: interrupt took too long (2516 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
[ 4736.674595] perf: interrupt took too long (3209 > 3145), lowering kernel.perf_event_max_sample_rate to 62250
pavel@amd:~/g/unicsy_demo$
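For reference, the CPU-time budget behind these messages comes from the
kernel.perf_cpu_time_max_percent sysctl (default 25). Assuming the documented
semantics, where 0 disables the time limit, the two usual ways to quiet them
are:

  # sysctl kernel.perf_cpu_time_max_percent=0
  # sysctl kernel.perf_event_max_sample_rate=100000

The first stops the automatic lowering entirely (at the risk of sampling NMIs
eating more CPU time); the second merely restores the rate after it has been
lowered.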
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html