2019-10-22 09:26:17

by Peter Zijlstra

Subject: [PATCH 2/3] perf: Optimize perf_init_event()

Andi reported that he was hitting the linear search in
perf_init_event() a lot. Make more aggressive use of the IDR lookup to
avoid hitting the linear search.

With the exception of PERF_TYPE_SOFTWARE (which relies on a hideous hack),
we can put everything in the IDR. On top of that, we can alias
TYPE_HARDWARE and TYPE_HW_CACHE to TYPE_RAW on the lookup side.

This greatly reduces the chances of hitting the linear search.

Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Kan <[email protected]>
---
kernel/events/core.c | 41 ++++++++++++++++++++++++++++++-----------
1 file changed, 30 insertions(+), 11 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10067,7 +10067,7 @@ static struct lock_class_key cpuctx_lock

int perf_pmu_register(struct pmu *pmu, const char *name, int type)
{
- int cpu, ret;
+ int cpu, ret, max = PERF_TYPE_MAX;

mutex_lock(&pmus_lock);
ret = -ENOMEM;
@@ -10080,12 +10080,17 @@ int perf_pmu_register(struct pmu *pmu, c
goto skip_type;
pmu->name = name;

- if (type < 0) {
- type = idr_alloc(&pmu_idr, pmu, PERF_TYPE_MAX, 0, GFP_KERNEL);
- if (type < 0) {
- ret = type;
+ if (type != PERF_TYPE_SOFTWARE) {
+ if (type >= 0)
+ max = type;
+
+ ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL);
+ if (ret < 0)
goto free_pdc;
- }
+
+ WARN_ON(type >= 0 && ret != type);
+
+ type = ret;
}
pmu->type = type;

@@ -10175,7 +10180,7 @@ int perf_pmu_register(struct pmu *pmu, c
put_device(pmu->dev);

free_idr:
- if (pmu->type >= PERF_TYPE_MAX)
+ if (pmu->type != PERF_TYPE_SOFTWARE)
idr_remove(&pmu_idr, pmu->type);

free_pdc:
@@ -10197,7 +10202,7 @@ void perf_pmu_unregister(struct pmu *pmu
synchronize_rcu();

free_percpu(pmu->pmu_disable_count);
- if (pmu->type >= PERF_TYPE_MAX)
+ if (pmu->type != PERF_TYPE_SOFTWARE)
idr_remove(&pmu_idr, pmu->type);
if (pmu_bus_running) {
if (pmu->nr_addr_filters)
@@ -10267,9 +10272,8 @@ static int perf_try_init_event(struct pm

static struct pmu *perf_init_event(struct perf_event *event)
{
+ int idx, type, ret;
struct pmu *pmu;
- int idx;
- int ret;

idx = srcu_read_lock(&pmus_srcu);

@@ -10282,12 +10286,27 @@ static struct pmu *perf_init_event(struc
}

rcu_read_lock();
- pmu = idr_find(&pmu_idr, event->attr.type);
+ /*
+ * PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
+ * are often aliases for PERF_TYPE_RAW.
+ */
+ type = event->attr.type;
+ if (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE)
+ type = PERF_TYPE_RAW;
+
+again:
+ pmu = idr_find(&pmu_idr, type);
rcu_read_unlock();
if (pmu) {
ret = perf_try_init_event(pmu, event);
+ if (ret == -ENOENT && event->attr.type != type) {
+ type = event->attr.type;
+ goto again;
+ }
+
if (ret)
pmu = ERR_PTR(ret);
+
goto unlock;
}




2019-10-27 05:22:23

by Chen, Rong A

Subject: [perf] 06e0dbcfd3: phoronix-test-suite.mbw.0.mib_s 12.6% improvement

Greeting,

FYI, we noticed a 12.6% improvement of phoronix-test-suite.mbw.0.mib_s due to commit:


commit: 06e0dbcfd33c53ac0046e5a1f93f7b8d71c40fc7 ("[PATCH 2/3] perf: Optimize perf_init_event()")
url: https://github.com/0day-ci/linux/commits/Peter-Zijlstra/Various-optimizations-for-event-creation/20191024-170638


in testcase: phoronix-test-suite
on test machine: 16 threads Intel(R) Xeon(R) CPU X5570 @ 2.93GHz with 48G memory
with following parameters:

test: mbw-1.0.0
cpufreq_governor: performance

test-description: The Phoronix Test Suite is a comprehensive testing and benchmarking platform that provides an extensible framework to which new tests can be easily added.
test-url: http://www.phoronix-test-suite.com/





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.6/debian-x86_64-phoronix/lkp-nhm-2ep1/mbw-1.0.0/phoronix-test-suite

commit:
c204d011d5 ("perf: Optimize perf_install_in_event()")
06e0dbcfd3 ("perf: Optimize perf_init_event()")

c204d011d597993a 06e0dbcfd33c53ac0046e5a1f93
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:4 75% 3:4 dmesg.BUG:scheduling_while_atomic
:4 180% 7:4 perf-profile.children.cycles-pp.error_entry
%stddev %change %stddev
\ | \
4480 ± 2% +12.6% 5044 phoronix-test-suite.mbw.0.mib_s
4905 +6.5% 5224 ± 2% phoronix-test-suite.mbw.1.mib_s
123842 ± 6% -25.8% 91945 ± 24% numa-meminfo.node1.AnonHugePages
21915 ± 24% -85.6% 3149 ± 9% vmstat.system.in
5945 ± 3% +25.9% 7484 ± 13% slabinfo.filp.num_objs
1205 ± 2% -22.8% 930.75 slabinfo.kmalloc-2k.active_objs
1224 ± 3% -21.9% 956.75 slabinfo.kmalloc-2k.num_objs
5763 -0.8% 5718 proc-vmstat.nr_kernel_stack
3199 -1.6% 3147 proc-vmstat.numa_other
2812 ±158% +874.4% 27405 ± 90% proc-vmstat.numa_pages_migrated
2812 ±158% +874.4% 27405 ± 90% proc-vmstat.pgmigrate_success
1152705 ± 86% -92.4% 87218 ± 47% cpuidle.C1.time
58509 ± 90% -91.8% 4816 ± 52% cpuidle.C1.usage
8002146 ± 95% -98.3% 139101 ± 42% cpuidle.C1E.time
132689 ± 99% -98.9% 1406 ± 36% cpuidle.C1E.usage
7.89e+08 ± 13% -91.1% 70529991 ± 26% cpuidle.C3.time
900688 ± 9% -91.9% 72507 ± 29% cpuidle.C3.usage
48753838 ± 82% +2009.0% 1.028e+09 ± 48% cpuidle.C6.time
48056 ± 76% +171.2% 130330 ± 33% cpuidle.C6.usage
9943122 ± 97% -97.9% 206756 ± 46% cpuidle.POLL.time
18354 ± 58% -59.9% 7364 ± 39% cpuidle.POLL.usage
71992 ± 7% -59.0% 29486 ± 35% interrupts.0:IO-APIC.2-edge.timer
59.25 ± 31% +124.9% 133.25 ± 63% interrupts.37:PCI-MSI.524291-edge.eth0-rx-2
71992 ± 7% -59.0% 29486 ± 35% interrupts.CPU0.0:IO-APIC.2-edge.timer
27089 ± 56% -79.1% 5658 ± 66% interrupts.CPU0.LOC:Local_timer_interrupts
13.50 ±164% +1431.5% 206.75 ± 96% interrupts.CPU0.TLB:TLB_shootdowns
79092 ± 13% -88.4% 9162 ± 40% interrupts.CPU1.LOC:Local_timer_interrupts
641.25 ± 34% +154.9% 1634 ± 20% interrupts.CPU1.RES:Rescheduling_interrupts
75334 ± 14% -85.2% 11173 ± 45% interrupts.CPU10.LOC:Local_timer_interrupts
194.25 ± 86% -77.0% 44.75 ± 91% interrupts.CPU10.RES:Rescheduling_interrupts
75022 ± 16% -88.4% 8685 ± 36% interrupts.CPU11.LOC:Local_timer_interrupts
568.75 ±115% -94.2% 33.00 ± 43% interrupts.CPU11.RES:Rescheduling_interrupts
75043 ± 16% -83.8% 12142 ± 61% interrupts.CPU12.LOC:Local_timer_interrupts
74930 ± 16% -85.3% 10999 ± 33% interrupts.CPU13.LOC:Local_timer_interrupts
1.25 ±131% +12840.0% 161.75 ±169% interrupts.CPU13.TLB:TLB_shootdowns
59.25 ± 31% +124.9% 133.25 ± 63% interrupts.CPU14.37:PCI-MSI.524291-edge.eth0-rx-2
75615 ± 14% -84.7% 11539 ± 27% interrupts.CPU14.LOC:Local_timer_interrupts
501.50 ± 56% -94.6% 27.25 ± 25% interrupts.CPU14.RES:Rescheduling_interrupts
78040 ± 11% -88.0% 9388 ± 45% interrupts.CPU15.LOC:Local_timer_interrupts
74913 ± 16% -86.2% 10367 ± 26% interrupts.CPU2.LOC:Local_timer_interrupts
76688 ± 16% -89.6% 7975 ± 29% interrupts.CPU3.LOC:Local_timer_interrupts
75758 ± 14% -85.6% 10937 ± 19% interrupts.CPU4.LOC:Local_timer_interrupts
75875 ± 17% -87.6% 9380 ± 13% interrupts.CPU5.LOC:Local_timer_interrupts
74896 ± 15% -84.4% 11680 ± 34% interrupts.CPU6.LOC:Local_timer_interrupts
76311 ± 13% -81.9% 13845 ± 36% interrupts.CPU7.LOC:Local_timer_interrupts
75970 ± 16% -84.6% 11704 ± 49% interrupts.CPU8.LOC:Local_timer_interrupts
77046 ± 15% -90.8% 7092 ± 16% interrupts.CPU9.LOC:Local_timer_interrupts
1167625 ± 15% -86.1% 161731 ± 28% interrupts.LOC:Local_timer_interrupts
12892 ± 8% -52.1% 6173 ± 4% softirqs.CPU0.SCHED
31018 ± 4% -51.8% 14957 ± 14% softirqs.CPU0.TIMER
10756 ± 6% -43.7% 6051 ± 9% softirqs.CPU1.SCHED
30437 ± 10% -61.5% 11724 ± 18% softirqs.CPU1.TIMER
9770 ± 6% -66.5% 3268 ± 19% softirqs.CPU10.SCHED
27722 ± 7% -60.0% 11091 ± 19% softirqs.CPU10.TIMER
10417 ± 7% -66.0% 3541 ± 10% softirqs.CPU11.SCHED
30529 ± 11% -65.2% 10636 ± 11% softirqs.CPU11.TIMER
10187 ± 5% -66.2% 3440 ± 15% softirqs.CPU12.SCHED
28111 ± 6% -57.9% 11835 ± 32% softirqs.CPU12.TIMER
9790 ± 8% -67.8% 3157 ± 17% softirqs.CPU13.SCHED
29315 ± 7% -53.0% 13765 ± 22% softirqs.CPU13.TIMER
9583 ± 7% -63.6% 3488 ± 25% softirqs.CPU14.SCHED
29318 ± 10% -49.5% 14793 ± 12% softirqs.CPU14.TIMER
10303 ± 9% -69.8% 3107 ± 8% softirqs.CPU15.SCHED
35959 ± 16% -66.2% 12147 ± 15% softirqs.CPU15.TIMER
10358 ± 4% -57.9% 4358 ± 25% softirqs.CPU2.SCHED
28406 ± 10% -58.0% 11921 ± 23% softirqs.CPU2.TIMER
10459 ± 9% -57.7% 4429 ± 7% softirqs.CPU3.SCHED
27497 ± 11% -59.5% 11143 ± 7% softirqs.CPU3.TIMER
9960 ± 12% -65.8% 3409 ± 19% softirqs.CPU4.SCHED
27949 ± 5% -60.0% 11190 ± 6% softirqs.CPU4.TIMER
10009 ± 9% -62.2% 3778 ± 9% softirqs.CPU5.SCHED
27090 ± 6% -56.4% 11808 ± 7% softirqs.CPU5.TIMER
9224 ± 12% -69.8% 2783 ± 22% softirqs.CPU6.SCHED
26628 ± 13% -65.6% 9157 ± 20% softirqs.CPU6.TIMER
10475 ± 11% -68.9% 3254 ± 5% softirqs.CPU7.SCHED
30529 ± 15% -55.8% 13502 ± 20% softirqs.CPU7.TIMER
10705 ± 13% -69.8% 3230 ± 13% softirqs.CPU8.SCHED
27870 ± 9% -62.1% 10558 ± 22% softirqs.CPU8.TIMER
9213 ± 10% -65.0% 3224 ± 9% softirqs.CPU9.SCHED
29536 ± 10% -67.5% 9606 ± 9% softirqs.CPU9.TIMER
66127 ± 12% -37.1% 41572 ± 13% softirqs.RCU
164112 ± 5% -63.0% 60699 ± 8% softirqs.SCHED
467926 ± 6% -59.4% 189843 ± 10% softirqs.TIMER
16.76 ± 23% -100.0% 0.00 perf-stat.i.MPKI
5.023e+08 ± 19% -100.0% 0.00 perf-stat.i.branch-instructions
2.35 ± 6% -2.3 0.00 perf-stat.i.branch-miss-rate%
18797300 ± 12% -100.0% 0.00 perf-stat.i.branch-misses
30.32 ± 4% -30.3 0.00 perf-stat.i.cache-miss-rate%
17828018 ± 15% -100.0% 0.00 perf-stat.i.cache-misses
28884105 ± 9% -100.0% 0.00 perf-stat.i.cache-references
3.14 ± 5% -100.0% 0.00 perf-stat.i.cpi
6.031e+09 ± 15% -100.0% 0.00 perf-stat.i.cpu-cycles
5052 ± 26% -100.0% 0.00 perf-stat.i.cycles-between-cache-misses
0.09 ± 21% -0.1 0.00 perf-stat.i.dTLB-load-miss-rate%
747496 ± 4% -100.0% 0.00 perf-stat.i.dTLB-load-misses
9.926e+08 ± 7% -100.0% 0.00 perf-stat.i.dTLB-loads
0.18 ± 4% -0.2 0.00 perf-stat.i.dTLB-store-miss-rate%
713615 ± 3% -100.0% 0.00 perf-stat.i.dTLB-store-misses
8.47e+08 ± 3% -100.0% 0.00 perf-stat.i.dTLB-stores
0.02 ± 13% -0.0 0.00 perf-stat.i.iTLB-load-miss-rate%
291714 ± 14% -100.0% 0.00 perf-stat.i.iTLB-load-misses
2.416e+09 ± 15% -100.0% 0.00 perf-stat.i.iTLB-loads
2.39e+09 ± 15% -100.0% 0.00 perf-stat.i.instructions
9722 ± 2% -100.0% 0.00 perf-stat.i.instructions-per-iTLB-miss
0.37 ± 4% -100.0% 0.00 perf-stat.i.ipc
12.36 ± 19% -100.0% 0.00 perf-stat.overall.MPKI
3.81 ± 12% -3.8 0.00 perf-stat.overall.branch-miss-rate%
61.30 ± 5% -61.3 0.00 perf-stat.overall.cache-miss-rate%
2.53 -100.0% 0.00 perf-stat.overall.cpi
350.57 ± 22% -100.0% 0.00 perf-stat.overall.cycles-between-cache-misses
0.08 ± 7% -0.1 0.00 perf-stat.overall.dTLB-load-miss-rate%
0.08 ± 5% -0.1 0.00 perf-stat.overall.dTLB-store-miss-rate%
0.01 ± 2% -0.0 0.00 perf-stat.overall.iTLB-load-miss-rate%
8204 ± 2% -100.0% 0.00 perf-stat.overall.instructions-per-iTLB-miss
0.40 -100.0% 0.00 perf-stat.overall.ipc
4.932e+08 ± 19% -100.0% 0.00 perf-stat.ps.branch-instructions
18451216 ± 12% -100.0% 0.00 perf-stat.ps.branch-misses
17425026 ± 15% -100.0% 0.00 perf-stat.ps.cache-misses
28269421 ± 9% -100.0% 0.00 perf-stat.ps.cache-references
5.942e+09 ± 15% -100.0% 0.00 perf-stat.ps.cpu-cycles
732958 ± 3% -100.0% 0.00 perf-stat.ps.dTLB-load-misses
9.783e+08 ± 7% -100.0% 0.00 perf-stat.ps.dTLB-loads
700749 ± 2% -100.0% 0.00 perf-stat.ps.dTLB-store-misses
8.398e+08 ± 3% -100.0% 0.00 perf-stat.ps.dTLB-stores
285931 ± 14% -100.0% 0.00 perf-stat.ps.iTLB-load-misses
2.375e+09 ± 14% -100.0% 0.00 perf-stat.ps.iTLB-loads
2.349e+09 ± 15% -100.0% 0.00 perf-stat.ps.instructions
1.334e+11 ± 7% -100.0% 0.00 perf-stat.total.instructions
17.55 ±116% -13.5 4.06 ±173% perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_remove_from_context.perf_event_release_kernel.perf_release
16.45 ± 63% -11.8 4.62 ±173% perf-profile.calltrace.cycles-pp.task_work_run.do_exit.do_group_exit.get_signal.do_signal
16.45 ± 63% -11.8 4.62 ±173% perf-profile.calltrace.cycles-pp.__fput.task_work_run.do_exit.do_group_exit.get_signal
16.45 ± 63% -11.8 4.62 ±173% perf-profile.calltrace.cycles-pp.perf_release.__fput.task_work_run.do_exit.do_group_exit
16.45 ± 63% -11.8 4.62 ±173% perf-profile.calltrace.cycles-pp.perf_event_release_kernel.perf_release.__fput.task_work_run.do_exit
12.50 ±101% -8.4 4.06 ±173% perf-profile.calltrace.cycles-pp.event_function_call.perf_remove_from_context.perf_event_release_kernel.perf_release.__fput
12.50 ±101% -8.4 4.06 ±173% perf-profile.calltrace.cycles-pp.perf_remove_from_context.perf_event_release_kernel.perf_release.__fput.task_work_run
7.63 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.drm_client_buffer_vmap.drm_fb_helper_dirty_work.process_one_work.worker_thread.kthread
7.63 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.drm_gem_vmap.drm_client_buffer_vmap.drm_fb_helper_dirty_work.process_one_work.worker_thread
7.63 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.drm_gem_vram_object_vmap.drm_gem_vmap.drm_client_buffer_vmap.drm_fb_helper_dirty_work.process_one_work
7.63 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.drm_gem_vram_kmap.drm_gem_vram_object_vmap.drm_gem_vmap.drm_client_buffer_vmap.drm_fb_helper_dirty_work
7.63 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.ttm_bo_kmap.drm_gem_vram_kmap.drm_gem_vram_object_vmap.drm_gem_vmap.drm_client_buffer_vmap
7.63 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.__ioremap_caller.ttm_bo_kmap.drm_gem_vram_kmap.drm_gem_vram_object_vmap.drm_gem_vmap
7.62 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.on_each_cpu.flush_tlb_kernel_range.pmd_free_pte_page.ioremap_page_range.__ioremap_caller
7.62 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.smp_call_function_many.on_each_cpu.flush_tlb_kernel_range.pmd_free_pte_page.ioremap_page_range
7.62 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.ioremap_page_range.__ioremap_caller.ttm_bo_kmap.drm_gem_vram_kmap.drm_gem_vram_object_vmap
7.62 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.pmd_free_pte_page.ioremap_page_range.__ioremap_caller.ttm_bo_kmap.drm_gem_vram_kmap
7.62 ± 73% -7.6 0.00 perf-profile.calltrace.cycles-pp.flush_tlb_kernel_range.pmd_free_pte_page.ioremap_page_range.__ioremap_caller.ttm_bo_kmap
8.46 ± 74% -4.9 3.56 ±173% perf-profile.calltrace.cycles-pp.drm_fb_helper_dirty_work.process_one_work.worker_thread.kthread.ret_from_fork
4.87 ± 57% -1.5 3.38 ±173% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__tick_broadcast_oneshot_control.intel_idle.cpuidle_enter_state
4.88 ± 57% -1.5 3.40 ±173% perf-profile.calltrace.cycles-pp._raw_spin_lock.__tick_broadcast_oneshot_control.intel_idle.cpuidle_enter_state.cpuidle_enter
4.97 ± 57% -1.5 3.49 ±173% perf-profile.calltrace.cycles-pp.__tick_broadcast_oneshot_control.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle
26.94 ± 51% -14.7 12.21 ±110% perf-profile.children.cycles-pp.smp_call_function_single
16.45 ± 63% -11.8 4.62 ±173% perf-profile.children.cycles-pp.perf_release
16.45 ± 63% -11.8 4.62 ±173% perf-profile.children.cycles-pp.perf_event_release_kernel
16.46 ± 63% -11.8 4.66 ±172% perf-profile.children.cycles-pp.__fput
16.46 ± 63% -11.8 4.68 ±171% perf-profile.children.cycles-pp.task_work_run
9.22 ± 41% -8.6 0.60 ±161% perf-profile.children.cycles-pp.on_each_cpu
9.22 ± 41% -8.6 0.60 ±160% perf-profile.children.cycles-pp.smp_call_function_many
12.50 ±101% -8.4 4.06 ±173% perf-profile.children.cycles-pp.perf_remove_from_context
7.63 ± 73% -7.6 0.04 ±173% perf-profile.children.cycles-pp.drm_client_buffer_vmap
7.63 ± 73% -7.6 0.04 ±173% perf-profile.children.cycles-pp.drm_gem_vmap
7.63 ± 73% -7.6 0.04 ±173% perf-profile.children.cycles-pp.drm_gem_vram_object_vmap
7.63 ± 73% -7.6 0.04 ±173% perf-profile.children.cycles-pp.drm_gem_vram_kmap
7.63 ± 73% -7.6 0.04 ±173% perf-profile.children.cycles-pp.ttm_bo_kmap
7.63 ± 73% -7.6 0.04 ±173% perf-profile.children.cycles-pp.__ioremap_caller
7.62 ± 73% -7.6 0.03 ±173% perf-profile.children.cycles-pp.pmd_free_pte_page
7.62 ± 73% -7.6 0.03 ±173% perf-profile.children.cycles-pp.flush_tlb_kernel_range
7.62 ± 73% -7.6 0.03 ±173% perf-profile.children.cycles-pp.ioremap_page_range
8.46 ± 74% -4.9 3.56 ±173% perf-profile.children.cycles-pp.drm_fb_helper_dirty_work
3.50 ± 71% -2.8 0.71 ±173% perf-profile.children.cycles-pp.irq_work_run
3.50 ± 71% -2.8 0.71 ±173% perf-profile.children.cycles-pp.printk
3.50 ± 71% -2.8 0.74 ±173% perf-profile.children.cycles-pp.irq_work_run_list
3.39 ± 75% -2.7 0.71 ±173% perf-profile.children.cycles-pp.irq_work_interrupt
3.39 ± 75% -2.7 0.71 ±173% perf-profile.children.cycles-pp.smp_irq_work_interrupt
4.90 ± 57% -1.5 3.39 ±173% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
4.99 ± 57% -1.5 3.52 ±173% perf-profile.children.cycles-pp.__tick_broadcast_oneshot_control
0.02 ±173% +7.0 6.99 ±114% perf-profile.children.cycles-pp.do_filp_open
0.02 ±173% +7.0 6.99 ±114% perf-profile.children.cycles-pp.path_openat
0.00 +7.0 6.98 ±114% perf-profile.children.cycles-pp.do_sys_open
9.18 ± 41% -8.6 0.60 ±161% perf-profile.self.cycles-pp.smp_call_function_many
3.50 ± 71% -3.5 0.00 perf-profile.self.cycles-pp.vprintk_emit
4.90 ± 57% -1.5 3.39 ±173% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath



phoronix-test-suite.mbw.0.mib_s

6000 +-+------------------------------------------------------------------+
| |
5000 O-O O O O O O O O O O O O O O O O O O O O O O |
| .+.|
|.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+..+.+ + + + +.+.+.+.+.+.+ |
4000 +-+ : : : : : |
| : : : : : |
3000 +-+ : : : : : : : : |
| : : : : : : : : |
2000 +-+ : : : : : : : : |
| : : : : : : : : |
| : : : : : : : : |
1000 +-+ : : : : |
| : : : : |
0 +-+------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
(No filename) (20.04 kB)
config-5.4.0-rc3-00149-g06e0dbcfd33c5 (203.86 kB)
job-script (7.00 kB)
job.yaml (4.62 kB)
reproduce (263.00 B)

2019-10-27 19:56:33

by Andi Kleen

Subject: Re: [perf] 06e0dbcfd3: phoronix-test-suite.mbw.0.mib_s 12.6% improvement

On Sun, Oct 27, 2019 at 01:18:12PM +0800, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a 12.6% improvement of phoronix-test-suite.mbw.0.mib_s due to commit:

Wow! Sadly it's a false positive: it comes from lowering perf overhead
rather than improving the workload itself. Still seems like a good thing.

Note that there is a perf user tool change coming soon that will likely
improve it even more (using affinity to optimize all perf IPIs).

-Andi