2014-01-02 06:13:18

by Fengguang Wu

Subject: perf-stat changes after "Use hrtimers for event multiplexing"

Greetings,

We noticed many perf-stat changes between commit 9e6302056f ("perf: Use
hrtimers for event multiplexing") and its parent commit ab573844e.
Are these expected changes?

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
152917 +842.9% 1441897 TOTAL interrupts.0:IO-APIC-edge.timer
545996 +478.0% 3155637 TOTAL interrupts.LOC
182281 +12.3% 204718 TOTAL softirqs.SCHED
1.986e+08 -96.4% 7105919 TOTAL perf-stat.node-store-misses
107241719 -99.7% 317525 TOTAL perf-stat.node-prefetch-misses
1.938e+08 -90.7% 17930426 TOTAL perf-stat.node-load-misses
2590 +247.8% 9009 TOTAL vmstat.system.in
4.549e+12 +158.3% 1.175e+13 TOTAL perf-stat.stalled-cycles-backend
6.807e+12 +149.1% 1.696e+13 TOTAL perf-stat.stalled-cycles-frontend
1.753e+08 -50.8% 86339289 TOTAL perf-stat.node-prefetches
8.326e+11 +45.0% 1.207e+12 TOTAL perf-stat.cpu-cycles
37932143 +32.2% 50146025 TOTAL perf-stat.iTLB-load-misses
4.738e+11 +30.1% 6.165e+11 TOTAL perf-stat.iTLB-loads
2.56e+11 +30.1% 3.33e+11 TOTAL perf-stat.L1-icache-loads
4.951e+11 +24.6% 6.169e+11 TOTAL perf-stat.instructions
7.85e+08 +7.5% 8.439e+08 TOTAL perf-stat.LLC-prefetch-misses
1.891e+12 +22.8% 2.322e+12 TOTAL perf-stat.ref-cycles
4.344e+08 -20.3% 3.462e+08 TOTAL perf-stat.node-loads
2.836e+11 +17.4% 3.328e+11 TOTAL perf-stat.branch-loads
9.506e+10 +24.5% 1.183e+11 TOTAL perf-stat.branch-load-misses
2.803e+11 +18.4% 3.319e+11 TOTAL perf-stat.branch-instructions
7.988e+10 +20.9% 9.658e+10 TOTAL perf-stat.bus-cycles
2.041e+09 +22.2% 2.495e+09 TOTAL perf-stat.branch-misses
229145 -17.3% 189601 TOTAL perf-stat.cpu-migrations
1.782e+11 +17.9% 2.1e+11 TOTAL perf-stat.dTLB-loads
4.702e+08 -14.8% 4.006e+08 TOTAL perf-stat.LLC-load-misses
1.418e+11 +17.4% 1.666e+11 TOTAL perf-stat.L1-dcache-loads
1.838e+09 +16.1% 2.133e+09 TOTAL perf-stat.LLC-stores
2.428e+09 +11.3% 2.702e+09 TOTAL perf-stat.LLC-loads
2.788e+11 +8.6% 3.029e+11 TOTAL perf-stat.dTLB-stores
8.66e+08 +10.8% 9.594e+08 TOTAL perf-stat.LLC-prefetches
1.117e+09 +10.5% 1.234e+09 TOTAL perf-stat.dTLB-store-misses
1.705e+09 +5.3% 1.796e+09 TOTAL perf-stat.L1-dcache-store-misses
5.671e+09 +6.1% 6.015e+09 TOTAL perf-stat.L1-dcache-load-misses
8.794e+10 +3.6% 9.109e+10 TOTAL perf-stat.L1-dcache-stores
3.46e+09 +4.6% 3.618e+09 TOTAL perf-stat.cache-references
8.696e+08 +1.8% 8.849e+08 TOTAL perf-stat.cache-misses
1613129 +2.6% 1655724 TOTAL perf-stat.context-switches

All of the changes happen on one of our test boxes, which has a DX58SO
baseboard and a 4-core CPU. The boot dmesg and kconfig are attached.
We can test more boxes if necessary.

Below is a more detailed list of changes.

ab573844e3058ee 9e6302056f8029f438e853432 <= commits being compared
--------------- -------------------------
81917 ~ 1% +773.9% 715856 ~ 6% fat/micro/dd-write/1HDD-cfq-btrfs-100dd <= test box, case and params
71000 ~ 1% +922.6% 726041 ~18% fat/micro/dd-write/1HDD-cfq-ext4-100dd <= test box, case and params
^^^^stddev ^^^^stddev
152917 +842.9% 1441897 TOTAL interrupts.0:IO-APIC-edge.timer <= metric for this chunk of compares
~~~~~~ ~~~~~~~ ~~~~~~~ value1, percent of change, value2
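
The annotated chunk above can be reproduced with a little arithmetic. This is a minimal sketch, assuming the TOTAL row is the sum of the per-test values and the percentage is the relative change from the first commit to the second. The name `percent_change` is made up for illustration and is not part of the test system.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Per-test values for interrupts.0:IO-APIC-edge.timer, copied from the chunk above.
parent = {"btrfs-100dd": 81917, "ext4-100dd": 71000}
patched = {"btrfs-100dd": 715856, "ext4-100dd": 726041}

total_before = sum(parent.values())
total_after = sum(patched.values())

# Prints: 152917 +842.9% 1441897 TOTAL
print(f"{total_before} {percent_change(total_before, total_after):+.1f}% {total_after} TOTAL")
```

The same function applied to each row gives the per-test percentages (+773.9% and +922.6%).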

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
280034 ~ 1% +466.0% 1584966 ~ 2% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
265962 ~ 1% +490.6% 1570671 ~ 6% fat/micro/dd-write/1HDD-cfq-ext4-100dd
545996 +478.0% 3155637 TOTAL interrupts.LOC

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
90939 ~ 2% +13.6% 103315 ~ 3% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
91342 ~ 0% +11.0% 101402 ~ 0% fat/micro/dd-write/1HDD-cfq-ext4-100dd
182281 +12.3% 204718 TOTAL softirqs.SCHED

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
93997892 ~31% -93.8% 5850557 ~37% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
1.046e+08 ~ 7% -98.8% 1255361 ~138% fat/micro/dd-write/1HDD-cfq-ext4-100dd
1.986e+08 -96.4% 7105919 TOTAL perf-stat.node-store-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
58318221 ~10% -100.0% 18569 ~130% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
48923497 ~15% -99.4% 298955 ~193% fat/micro/dd-write/1HDD-cfq-ext4-100dd
107241719 -99.7% 317525 TOTAL perf-stat.node-prefetch-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
83665740 ~ 6% -82.0% 15059253 ~144% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
1.102e+08 ~ 7% -97.4% 2871173 ~41% fat/micro/dd-write/1HDD-cfq-ext4-100dd
1.938e+08 -90.7% 17930426 TOTAL perf-stat.node-load-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1529 ~ 2% +208.9% 4724 ~ 2% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
1061 ~ 3% +303.9% 4285 ~ 8% fat/micro/dd-write/1HDD-cfq-ext4-100dd
2590 +247.8% 9009 TOTAL vmstat.system.in

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
2.891e+12 ~ 1% +146.6% 7.13e+12 ~ 2% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
1.658e+12 ~ 3% +178.6% 4.62e+12 ~ 7% fat/micro/dd-write/1HDD-cfq-ext4-100dd
4.549e+12 +158.3% 1.175e+13 TOTAL perf-stat.stalled-cycles-backend

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
4.371e+12 ~ 1% +140.0% 1.049e+13 ~ 2% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
2.436e+12 ~ 3% +165.5% 6.467e+12 ~ 7% fat/micro/dd-write/1HDD-cfq-ext4-100dd
6.807e+12 +149.1% 1.696e+13 TOTAL perf-stat.stalled-cycles-frontend

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.753e+08 ~ 2% -50.8% 86339289 ~ 2% fat/micro/dd-write/1HDD-cfq-ext4-100dd
1.753e+08 -50.8% 86339289 TOTAL perf-stat.node-prefetches

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
8.326e+11 ~ 6% +45.0% 1.207e+12 ~ 7% fat/micro/dd-write/1HDD-cfq-ext4-100dd
8.326e+11 +45.0% 1.207e+12 TOTAL perf-stat.cpu-cycles

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
19697591 ~ 2% +32.9% 26168712 ~ 0% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
18234552 ~ 3% +31.5% 23977313 ~ 4% fat/micro/dd-write/1HDD-cfq-ext4-100dd
37932143 +32.2% 50146025 TOTAL perf-stat.iTLB-load-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
4.738e+11 ~ 5% +30.1% 6.165e+11 ~ 7% fat/micro/dd-write/1HDD-cfq-ext4-100dd
4.738e+11 +30.1% 6.165e+11 TOTAL perf-stat.iTLB-loads

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
2.56e+11 ~ 5% +30.1% 3.33e+11 ~ 7% fat/micro/dd-write/1HDD-cfq-ext4-100dd
2.56e+11 +30.1% 3.33e+11 TOTAL perf-stat.L1-icache-loads

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
4.951e+11 ~10% +24.6% 6.169e+11 ~ 6% fat/micro/dd-write/1HDD-cfq-ext4-100dd
4.951e+11 +24.6% 6.169e+11 TOTAL perf-stat.instructions

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
6.504e+08 ~ 7% +16.1% 7.548e+08 ~ 4% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
1.345e+08 ~10% -33.8% 89058216 ~ 4% fat/micro/dd-write/1HDD-cfq-ext4-100dd
7.85e+08 +7.5% 8.439e+08 TOTAL perf-stat.LLC-prefetch-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.055e+12 ~ 1% +5.8% 1.116e+12 ~ 1% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
8.363e+11 ~ 5% +44.2% 1.206e+12 ~ 6% fat/micro/dd-write/1HDD-cfq-ext4-100dd
1.891e+12 +22.8% 2.322e+12 TOTAL perf-stat.ref-cycles

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
2.241e+08 ~ 3% -18.6% 1.826e+08 ~ 2% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
2.103e+08 ~ 1% -22.2% 1.636e+08 ~ 3% fat/micro/dd-write/1HDD-cfq-ext4-100dd
4.344e+08 -20.3% 3.462e+08 TOTAL perf-stat.node-loads

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.856e+11 ~ 0% +4.4% 1.938e+11 ~ 1% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
9.797e+10 ~12% +41.9% 1.39e+11 ~ 9% fat/micro/dd-write/1HDD-cfq-ext4-100dd
2.836e+11 +17.4% 3.328e+11 TOTAL perf-stat.branch-loads

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
9.506e+10 ~ 6% +24.5% 1.183e+11 ~ 5% fat/micro/dd-write/1HDD-cfq-ext4-100dd
9.506e+10 +24.5% 1.183e+11 TOTAL perf-stat.branch-load-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.824e+11 ~ 0% +4.9% 1.914e+11 ~ 0% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
9.791e+10 ~ 6% +43.5% 1.405e+11 ~ 9% fat/micro/dd-write/1HDD-cfq-ext4-100dd
2.803e+11 +18.4% 3.319e+11 TOTAL perf-stat.branch-instructions

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
4.411e+10 ~ 1% +5.1% 4.636e+10 ~ 0% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
3.576e+10 ~ 5% +40.4% 5.022e+10 ~ 6% fat/micro/dd-write/1HDD-cfq-ext4-100dd
7.988e+10 +20.9% 9.658e+10 TOTAL perf-stat.bus-cycles

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
2.041e+09 ~ 0% +22.2% 2.495e+09 ~ 1% fat/micro/dd-write/1HDD-cfq-ext4-100dd
2.041e+09 +22.2% 2.495e+09 TOTAL perf-stat.branch-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
113694 ~ 1% -15.1% 96524 ~ 2% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
115451 ~ 2% -19.4% 93077 ~ 2% fat/micro/dd-write/1HDD-cfq-ext4-100dd
229145 -17.3% 189601 TOTAL perf-stat.cpu-migrations

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.782e+11 ~ 4% +17.9% 2.1e+11 ~ 3% fat/micro/dd-write/1HDD-cfq-ext4-100dd
1.782e+11 +17.9% 2.1e+11 TOTAL perf-stat.dTLB-loads

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
2.345e+08 ~ 3% -9.5% 2.123e+08 ~ 3% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
2.357e+08 ~ 4% -20.1% 1.883e+08 ~ 1% fat/micro/dd-write/1HDD-cfq-ext4-100dd
4.702e+08 -14.8% 4.006e+08 TOTAL perf-stat.LLC-load-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.418e+11 ~ 2% +17.4% 1.666e+11 ~ 3% fat/micro/dd-write/1HDD-cfq-ext4-100dd
1.418e+11 +17.4% 1.666e+11 TOTAL perf-stat.L1-dcache-loads

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.838e+09 ~ 3% +16.1% 2.133e+09 ~ 1% fat/micro/dd-write/1HDD-cfq-ext4-100dd
1.838e+09 +16.1% 2.133e+09 TOTAL perf-stat.LLC-stores

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.275e+09 ~ 2% +8.3% 1.38e+09 ~ 1% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
1.153e+09 ~ 2% +14.6% 1.322e+09 ~ 2% fat/micro/dd-write/1HDD-cfq-ext4-100dd
2.428e+09 +11.3% 2.702e+09 TOTAL perf-stat.LLC-loads

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.698e+11 ~ 1% +4.4% 1.773e+11 ~ 0% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
1.09e+11 ~ 2% +15.2% 1.256e+11 ~ 2% fat/micro/dd-write/1HDD-cfq-ext4-100dd
2.788e+11 +8.6% 3.029e+11 TOTAL perf-stat.dTLB-stores

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
8.66e+08 ~ 3% +10.8% 9.594e+08 ~ 0% fat/micro/dd-write/1HDD-cfq-ext4-100dd
8.66e+08 +10.8% 9.594e+08 TOTAL perf-stat.LLC-prefetches

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.117e+09 ~ 2% +10.5% 1.234e+09 ~ 4% fat/micro/dd-write/1HDD-cfq-ext4-100dd
1.117e+09 +10.5% 1.234e+09 TOTAL perf-stat.dTLB-store-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1.705e+09 ~ 1% +5.3% 1.796e+09 ~ 2% fat/micro/dd-write/1HDD-cfq-ext4-100dd
1.705e+09 +5.3% 1.796e+09 TOTAL perf-stat.L1-dcache-store-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
5.671e+09 ~ 2% +6.1% 6.015e+09 ~ 2% fat/micro/dd-write/1HDD-cfq-ext4-100dd
5.671e+09 +6.1% 6.015e+09 TOTAL perf-stat.L1-dcache-load-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
8.794e+10 ~ 0% +3.6% 9.109e+10 ~ 2% fat/micro/dd-write/1HDD-cfq-ext4-100dd
8.794e+10 +3.6% 9.109e+10 TOTAL perf-stat.L1-dcache-stores

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
3.46e+09 ~ 1% +4.6% 3.618e+09 ~ 1% fat/micro/dd-write/1HDD-cfq-ext4-100dd
3.46e+09 +4.6% 3.618e+09 TOTAL perf-stat.cache-references

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
8.696e+08 ~ 0% +1.8% 8.849e+08 ~ 0% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
8.696e+08 +1.8% 8.849e+08 TOTAL perf-stat.cache-misses

ab573844e3058ee 9e6302056f8029f438e853432
--------------- -------------------------
1613129 ~ 2% +2.6% 1655724 ~ 1% fat/micro/dd-write/1HDD-cfq-btrfs-100dd
1613129 +2.6% 1655724 TOTAL perf-stat.context-switches


perf-stat.stalled-cycles-frontend

1.2e+13 ++---------------------------------------------------------------+
| O O |
1.1e+13 ++ O O O O O O O |
1e+13 O+ O O O O O O O O O O
| |
9e+12 ++ |
8e+12 ++ |
| |
7e+12 ++ |
6e+12 ++ |
| |
5e+12 ++ |
4e+12 *+. ..*..*...*..*...*..*..*...*..*...*..*..*...*..*...*..*...*..*
| *. |
3e+12 ++---------------------------------------------------------------+


perf-stat.stalled-cycles-backend

7.5e+12 ++---------------O------O--------------------------O------O------+
7e+12 ++ O O O O O O O O |
O O O O O O O O
6.5e+12 ++ |
6e+12 ++ |
| |
5.5e+12 ++ |
5e+12 ++ |
4.5e+12 ++ |
| |
4e+12 ++ |
3.5e+12 ++ |
| .*... .*.. ..*.. |
3e+12 *+. ..*..*...*. *. *...*..*. *..*...*..*...*..*...*..*
2.5e+12 ++-*-------------------------------------------------------------+


vmstat.system.in

5500 ++------------------------------------------------------------------+
| O O |
5000 O+ O O O O O O O O O |
4500 ++ O O O O O O O O
| |
4000 ++ |
3500 ++ |
| |
3000 ++ |
2500 ++ |
| |
2000 ++ |
1500 *+..*..*... .*...*..*... ..*..*...*..*...*...*..*...*..*... .*...*
| *. *. *. |
1000 ++------------------------------------------------------------------+


perf-stat.node-prefetch-misses

9e+07 ++-----------------------------------------------------------------+
| * |
8e+07 ++ : + *.. |
7e+07 *+ *. : + : . |
| .. + .. : *.. ..*.. .*.. : * *...*
6e+07 ++ + : . .*. .. : + .. |
5e+07 ++ * *..* *. * *...* + ..* |
| *. |
4e+07 ++ |
3e+07 ++ |
| |
2e+07 ++ |
1e+07 ++ |
| |
0 O+--O--O---O--O---O--O---O--O---O--O---O--O---O--O---O--O---O------+


Attachments:
(No filename) (19.06 kB)
dmesg (67.34 kB)
x86_64-lkp (78.69 kB)

2014-01-04 19:02:44

by Peter Zijlstra

Subject: Re: perf-stat changes after "Use hrtimers for event multiplexing"

On Thu, Jan 02, 2014 at 02:12:42PM +0800, Fengguang Wu wrote:
> Greetings,
>
> We noticed many perf-stat changes between commit 9e6302056f ("perf: Use
> hrtimers for event multiplexing") and its parent commit ab573844e.
> Are these expected changes?
>
> ab573844e3058ee 9e6302056f8029f438e853432
> --------------- -------------------------
> 152917 +842.9% 1441897 TOTAL interrupts.0:IO-APIC-edge.timer
> 545996 +478.0% 3155637 TOTAL interrupts.LOC
> 182281 +12.3% 204718 TOTAL softirqs.SCHED
> 1.986e+08 -96.4% 7105919 TOTAL perf-stat.node-store-misses
> 107241719 -99.7% 317525 TOTAL perf-stat.node-prefetch-misses
> 1.938e+08 -90.7% 17930426 TOTAL perf-stat.node-load-misses
> 2590 +247.8% 9009 TOTAL vmstat.system.in
> 4.549e+12 +158.3% 1.175e+13 TOTAL perf-stat.stalled-cycles-backend
> 6.807e+12 +149.1% 1.696e+13 TOTAL perf-stat.stalled-cycles-frontend
> 1.753e+08 -50.8% 86339289 TOTAL perf-stat.node-prefetches
> 8.326e+11 +45.0% 1.207e+12 TOTAL perf-stat.cpu-cycles
> 37932143 +32.2% 50146025 TOTAL perf-stat.iTLB-load-misses
> 4.738e+11 +30.1% 6.165e+11 TOTAL perf-stat.iTLB-loads
> 2.56e+11 +30.1% 3.33e+11 TOTAL perf-stat.L1-icache-loads
> 4.951e+11 +24.6% 6.169e+11 TOTAL perf-stat.instructions
> 7.85e+08 +7.5% 8.439e+08 TOTAL perf-stat.LLC-prefetch-misses
> 1.891e+12 +22.8% 2.322e+12 TOTAL perf-stat.ref-cycles
> 4.344e+08 -20.3% 3.462e+08 TOTAL perf-stat.node-loads
> 2.836e+11 +17.4% 3.328e+11 TOTAL perf-stat.branch-loads
> 9.506e+10 +24.5% 1.183e+11 TOTAL perf-stat.branch-load-misses
> 2.803e+11 +18.4% 3.319e+11 TOTAL perf-stat.branch-instructions
> 7.988e+10 +20.9% 9.658e+10 TOTAL perf-stat.bus-cycles
> 2.041e+09 +22.2% 2.495e+09 TOTAL perf-stat.branch-misses
> 229145 -17.3% 189601 TOTAL perf-stat.cpu-migrations
> 1.782e+11 +17.9% 2.1e+11 TOTAL perf-stat.dTLB-loads
> 4.702e+08 -14.8% 4.006e+08 TOTAL perf-stat.LLC-load-misses
> 1.418e+11 +17.4% 1.666e+11 TOTAL perf-stat.L1-dcache-loads
> 1.838e+09 +16.1% 2.133e+09 TOTAL perf-stat.LLC-stores
> 2.428e+09 +11.3% 2.702e+09 TOTAL perf-stat.LLC-loads
> 2.788e+11 +8.6% 3.029e+11 TOTAL perf-stat.dTLB-stores
> 8.66e+08 +10.8% 9.594e+08 TOTAL perf-stat.LLC-prefetches
> 1.117e+09 +10.5% 1.234e+09 TOTAL perf-stat.dTLB-store-misses
> 1.705e+09 +5.3% 1.796e+09 TOTAL perf-stat.L1-dcache-store-misses
> 5.671e+09 +6.1% 6.015e+09 TOTAL perf-stat.L1-dcache-load-misses
> 8.794e+10 +3.6% 9.109e+10 TOTAL perf-stat.L1-dcache-stores
> 3.46e+09 +4.6% 3.618e+09 TOTAL perf-stat.cache-references
> 8.696e+08 +1.8% 8.849e+08 TOTAL perf-stat.cache-misses
> 1613129 +2.6% 1655724 TOTAL perf-stat.context-switches
>
> All of the changes happen in one of our test box, which has a DX58SO
> baseboard and 4-core CPU. The boot dmesg and kconfig are attached.
> We can test more boxes if necessary.

How do you run perf stat? Curious that you notice this now; it's a fairly
old commit.

IIRC we did have a few wobbles with that, but I cannot remember much
detail.

The biggest difference between before and after that patch is that we now
rotate even while the core is 'idle'. So if you do something like 'perf stat
-a' and have significant idle time, it does indeed make a difference.
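
As a toy illustration of that difference (a deliberately simplified model, not the kernel's actual code): assume two event groups compete for one counter, time is cut into equal slots, and a rotation can happen at the end of each slot. The function `running_share` and the workload shape are invented for this sketch.

```python
def running_share(busy_slots, rotate_when_idle, groups=2):
    """Return the fraction of slots each event group spent on the PMU.

    busy_slots       -- list of booleans, True where the CPU is busy in that slot
    rotate_when_idle -- True models timer-driven rotation in every slot,
                        False models rotation only in busy slots
    """
    on_pmu = 0                    # index of the group currently scheduled
    slots_run = [0] * groups
    for busy in busy_slots:
        slots_run[on_pmu] += 1
        if busy or rotate_when_idle:
            on_pmu = (on_pmu + 1) % groups
    return [n / len(busy_slots) for n in slots_run]

# Toy workload: 4 busy slots followed by 16 idle slots (a mostly idle box).
workload = [True] * 4 + [False] * 16
print(running_share(workload, rotate_when_idle=False))  # [0.9, 0.1]
print(running_share(workload, rotate_when_idle=True))   # [0.5, 0.5]
```

In this model, whichever group happens to be scheduled when the CPU goes idle keeps the counter for the whole idle period unless rotation also happens while idle, so the groups' running times end up very uneven.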

2014-01-05 01:14:29

by Fengguang Wu

Subject: Re: perf-stat changes after "Use hrtimers for event multiplexing"

On Sat, Jan 04, 2014 at 08:02:28PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 02, 2014 at 02:12:42PM +0800, Fengguang Wu wrote:
> > Greetings,
> >
> > We noticed many perf-stat changes between commit 9e6302056f ("perf: Use
> > hrtimers for event multiplexing") and its parent commit ab573844e.
> > Are these expected changes?
> >
> > ab573844e3058ee 9e6302056f8029f438e853432
> > --------------- -------------------------
> > 152917 +842.9% 1441897 TOTAL interrupts.0:IO-APIC-edge.timer
> > 545996 +478.0% 3155637 TOTAL interrupts.LOC
> > 182281 +12.3% 204718 TOTAL softirqs.SCHED
> > 1.986e+08 -96.4% 7105919 TOTAL perf-stat.node-store-misses
> > 107241719 -99.7% 317525 TOTAL perf-stat.node-prefetch-misses
> > 1.938e+08 -90.7% 17930426 TOTAL perf-stat.node-load-misses
> > 2590 +247.8% 9009 TOTAL vmstat.system.in
> > 4.549e+12 +158.3% 1.175e+13 TOTAL perf-stat.stalled-cycles-backend
> > 6.807e+12 +149.1% 1.696e+13 TOTAL perf-stat.stalled-cycles-frontend
> > 1.753e+08 -50.8% 86339289 TOTAL perf-stat.node-prefetches
> > 8.326e+11 +45.0% 1.207e+12 TOTAL perf-stat.cpu-cycles
> > 37932143 +32.2% 50146025 TOTAL perf-stat.iTLB-load-misses
> > 4.738e+11 +30.1% 6.165e+11 TOTAL perf-stat.iTLB-loads
> > 2.56e+11 +30.1% 3.33e+11 TOTAL perf-stat.L1-icache-loads
> > 4.951e+11 +24.6% 6.169e+11 TOTAL perf-stat.instructions
> > 7.85e+08 +7.5% 8.439e+08 TOTAL perf-stat.LLC-prefetch-misses
> > 1.891e+12 +22.8% 2.322e+12 TOTAL perf-stat.ref-cycles
> > 4.344e+08 -20.3% 3.462e+08 TOTAL perf-stat.node-loads
> > 2.836e+11 +17.4% 3.328e+11 TOTAL perf-stat.branch-loads
> > 9.506e+10 +24.5% 1.183e+11 TOTAL perf-stat.branch-load-misses
> > 2.803e+11 +18.4% 3.319e+11 TOTAL perf-stat.branch-instructions
> > 7.988e+10 +20.9% 9.658e+10 TOTAL perf-stat.bus-cycles
> > 2.041e+09 +22.2% 2.495e+09 TOTAL perf-stat.branch-misses
> > 229145 -17.3% 189601 TOTAL perf-stat.cpu-migrations
> > 1.782e+11 +17.9% 2.1e+11 TOTAL perf-stat.dTLB-loads
> > 4.702e+08 -14.8% 4.006e+08 TOTAL perf-stat.LLC-load-misses
> > 1.418e+11 +17.4% 1.666e+11 TOTAL perf-stat.L1-dcache-loads
> > 1.838e+09 +16.1% 2.133e+09 TOTAL perf-stat.LLC-stores
> > 2.428e+09 +11.3% 2.702e+09 TOTAL perf-stat.LLC-loads
> > 2.788e+11 +8.6% 3.029e+11 TOTAL perf-stat.dTLB-stores
> > 8.66e+08 +10.8% 9.594e+08 TOTAL perf-stat.LLC-prefetches
> > 1.117e+09 +10.5% 1.234e+09 TOTAL perf-stat.dTLB-store-misses
> > 1.705e+09 +5.3% 1.796e+09 TOTAL perf-stat.L1-dcache-store-misses
> > 5.671e+09 +6.1% 6.015e+09 TOTAL perf-stat.L1-dcache-load-misses
> > 8.794e+10 +3.6% 9.109e+10 TOTAL perf-stat.L1-dcache-stores
> > 3.46e+09 +4.6% 3.618e+09 TOTAL perf-stat.cache-references
> > 8.696e+08 +1.8% 8.849e+08 TOTAL perf-stat.cache-misses
> > 1613129 +2.6% 1655724 TOTAL perf-stat.context-switches
> >
> > All of the changes happen in one of our test box, which has a DX58SO
> > baseboard and 4-core CPU. The boot dmesg and kconfig are attached.
> > We can test more boxes if necessary.
>
> How do you run perf stat?

perf stat -a $(-e hardware, cache, software events)

> Curious that you notice this now, its a fairly old commit.

Yeah, we are feeding old kernels to the 0day performance test system, too. :)

> IIRC we did have a few wobbles with that, but I cannot remember much
> detail.
>
> The biggest difference between before and after that patch is that we'd
> rotate while the core is 'idle'. So if you do something like 'perf stat
> -a' and have significant idle time it does indeed make a difference.

It is 'perf stat -a'; the CPU is mostly idle because it's an IO workload.

BTW, we found another commit that changed some perf-stat output:

2f7f73a520 ("perf/x86: Fix shared register mutual exclusion enforcement")

Compared to its parent commit:

069e0c3c4058147 2f7f73a52078b667d64df16ea
--------------- -------------------------
1.308e+08 ~26% -77.8% 29029594 ~12% fat/micro/dd-write/1HDD-deadline-xfs-10dd
1.308e+08 -77.8% 29029594 TOTAL perf-stat.LLC-prefetch-misses

069e0c3c4058147 2f7f73a52078b667d64df16ea
--------------- -------------------------
97086131 ~ 7% -71.0% 28127157 ~11% fat/micro/dd-write/1HDD-deadline-xfs-10dd
97086131 -71.0% 28127157 TOTAL perf-stat.node-prefetches

069e0c3c4058147 2f7f73a52078b667d64df16ea
--------------- -------------------------
1.4e+08 ~ 3% -56.6% 60744486 ~ 9% fat/micro/dd-write/1HDD-deadline-xfs-10dd
1.4e+08 -56.6% 60744486 TOTAL perf-stat.LLC-load-misses

069e0c3c4058147 2f7f73a52078b667d64df16ea
--------------- -------------------------
6.967e+08 ~ 0% -49.6% 3.513e+08 ~ 6% fat/micro/dd-write/1HDD-deadline-xfs-10dd
6.967e+08 -49.6% 3.513e+08 TOTAL perf-stat.node-stores

069e0c3c4058147 2f7f73a52078b667d64df16ea
--------------- -------------------------
1.933e+09 ~ 1% -43.0% 1.103e+09 ~ 2% fat/micro/dd-write/1HDD-deadline-xfs-10dd
1.933e+09 -43.0% 1.103e+09 TOTAL perf-stat.LLC-stores

069e0c3c4058147 2f7f73a52078b667d64df16ea
--------------- -------------------------
7.013e+08 ~ 5% -55.5% 3.118e+08 ~ 4% fat/micro/dd-write/1HDD-deadline-btrfs-100dd
6.775e+09 ~ 1% -20.4% 5.391e+09 ~ 1% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
7.477e+09 -23.7% 5.703e+09 TOTAL perf-stat.LLC-store-misses

069e0c3c4058147 2f7f73a52078b667d64df16ea
--------------- -------------------------
2.294e+09 ~ 1% -10.0% 2.065e+09 ~ 0% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
2.294e+09 -10.0% 2.065e+09 TOTAL perf-stat.LLC-prefetches

069e0c3c4058147 2f7f73a52078b667d64df16ea
--------------- -------------------------
8.685e+09 ~ 0% -10.0% 7.814e+09 ~ 1% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
8.685e+09 -10.0% 7.814e+09 TOTAL perf-stat.cache-misses

069e0c3c4058147 2f7f73a52078b667d64df16ea
--------------- -------------------------
1.591e+12 ~ 0% -8.7% 1.453e+12 ~ 1% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
1.591e+12 -8.7% 1.453e+12 TOTAL perf-stat.dTLB-loads


Thanks,
Fengguang

2014-01-07 09:52:55

by Stephane Eranian

Subject: Re: perf-stat changes after "Use hrtimers for event multiplexing"

Hi,

With the hrtimer patch, you will get more regular multiplexing when you
have idle cores during your benchmark. Without the patch, multiplexing
was piggybacked on the timer tick, and on a tickless kernel the timer tick
does not occur when a core is idle. Thus, the quality of the results with
hrtimers should be improved.
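
To make the effect of multiplexing concrete: when more events are requested than the PMU has counters, each event only counts for part of the time, and perf reports an estimate scaled by the ratio of enabled time to running time. The sketch below shows that scaling with made-up numbers; `scale_count` is a hypothetical helper, not a perf API.

```python
def scale_count(raw, time_enabled, time_running):
    """Estimate a full-interval count from a multiplexed counter.

    raw          -- events counted while the event was on the PMU
    time_enabled -- time the event was enabled, in ns
    time_running -- time the event was actually counting, in ns
    """
    if time_running == 0:
        return None  # the event never got onto the PMU; nothing to extrapolate
    return raw * time_enabled / time_running

# Made-up example: the event was on the PMU for a quarter of a 10 s run.
print(scale_count(1_000_000, 10_000_000_000, 2_500_000_000))  # 4000000.0
```

The estimate is only as good as the assumption that the event rate during the running slices is representative of the whole interval, which is why rotating regularly, including on idle cores, changes the reported numbers.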



2014-01-07 13:20:45

by Fengguang Wu

Subject: Re: perf-stat changes after "Use hrtimers for event multiplexing"

Hi Stephane,

On Tue, Jan 07, 2014 at 10:52:50AM +0100, Stephane Eranian wrote:
> Hi,
>
> With the hrtitmer patch, you will get more regular multiplexing when
> you have idle cores during your benchmark.
> Without the patch, multiplexing was piggybacked on timer tick. The
> timer tick does not occur when a core is idle
> when using a tickless kernel. Thus, the quality of the results with
> hrtimers should be improved.

OK, got it. Thanks for the explanations!

Thanks,
Fengguang

>
> On Sun, Jan 5, 2014 at 2:14 AM, Fengguang Wu wrote:
> > On Sat, Jan 04, 2014 at 08:02:28PM +0100, Peter Zijlstra wrote:
> >> On Thu, Jan 02, 2014 at 02:12:42PM +0800, Fengguang Wu wrote:
> >> > Greetings,
> >> >
> >> > We noticed many perf-stat changes between commit 9e6302056f ("perf: Use
> >> > hrtimers for event multiplexing") and its parent commit ab573844e.
> >> > Are these expected changes?
> >> >
> >> > ab573844e3058ee 9e6302056f8029f438e853432
> >> > --------------- -------------------------
> >> > 152917 +842.9% 1441897 TOTAL interrupts.0:IO-APIC-edge.timer
> >> > 545996 +478.0% 3155637 TOTAL interrupts.LOC
> >> > 182281 +12.3% 204718 TOTAL softirqs.SCHED
> >> > 1.986e+08 -96.4% 7105919 TOTAL perf-stat.node-store-misses
> >> > 107241719 -99.7% 317525 TOTAL perf-stat.node-prefetch-misses
> >> > 1.938e+08 -90.7% 17930426 TOTAL perf-stat.node-load-misses
> >> > 2590 +247.8% 9009 TOTAL vmstat.system.in
> >> > 4.549e+12 +158.3% 1.175e+13 TOTAL perf-stat.stalled-cycles-backend
> >> > 6.807e+12 +149.1% 1.696e+13 TOTAL perf-stat.stalled-cycles-frontend
> >> > 1.753e+08 -50.8% 86339289 TOTAL perf-stat.node-prefetches
> >> > 8.326e+11 +45.0% 1.207e+12 TOTAL perf-stat.cpu-cycles
> >> > 37932143 +32.2% 50146025 TOTAL perf-stat.iTLB-load-misses
> >> > 4.738e+11 +30.1% 6.165e+11 TOTAL perf-stat.iTLB-loads
> >> > 2.56e+11 +30.1% 3.33e+11 TOTAL perf-stat.L1-icache-loads
> >> > 4.951e+11 +24.6% 6.169e+11 TOTAL perf-stat.instructions
> >> > 7.85e+08 +7.5% 8.439e+08 TOTAL perf-stat.LLC-prefetch-misses
> >> > 1.891e+12 +22.8% 2.322e+12 TOTAL perf-stat.ref-cycles
> >> > 4.344e+08 -20.3% 3.462e+08 TOTAL perf-stat.node-loads
> >> > 2.836e+11 +17.4% 3.328e+11 TOTAL perf-stat.branch-loads
> >> > 9.506e+10 +24.5% 1.183e+11 TOTAL perf-stat.branch-load-misses
> >> > 2.803e+11 +18.4% 3.319e+11 TOTAL perf-stat.branch-instructions
> >> > 7.988e+10 +20.9% 9.658e+10 TOTAL perf-stat.bus-cycles
> >> > 2.041e+09 +22.2% 2.495e+09 TOTAL perf-stat.branch-misses
> >> > 229145 -17.3% 189601 TOTAL perf-stat.cpu-migrations
> >> > 1.782e+11 +17.9% 2.1e+11 TOTAL perf-stat.dTLB-loads
> >> > 4.702e+08 -14.8% 4.006e+08 TOTAL perf-stat.LLC-load-misses
> >> > 1.418e+11 +17.4% 1.666e+11 TOTAL perf-stat.L1-dcache-loads
> >> > 1.838e+09 +16.1% 2.133e+09 TOTAL perf-stat.LLC-stores
> >> > 2.428e+09 +11.3% 2.702e+09 TOTAL perf-stat.LLC-loads
> >> > 2.788e+11 +8.6% 3.029e+11 TOTAL perf-stat.dTLB-stores
> >> > 8.66e+08 +10.8% 9.594e+08 TOTAL perf-stat.LLC-prefetches
> >> > 1.117e+09 +10.5% 1.234e+09 TOTAL perf-stat.dTLB-store-misses
> >> > 1.705e+09 +5.3% 1.796e+09 TOTAL perf-stat.L1-dcache-store-misses
> >> > 5.671e+09 +6.1% 6.015e+09 TOTAL perf-stat.L1-dcache-load-misses
> >> > 8.794e+10 +3.6% 9.109e+10 TOTAL perf-stat.L1-dcache-stores
> >> > 3.46e+09 +4.6% 3.618e+09 TOTAL perf-stat.cache-references
> >> > 8.696e+08 +1.8% 8.849e+08 TOTAL perf-stat.cache-misses
> >> > 1613129 +2.6% 1655724 TOTAL perf-stat.context-switches
> >> >
> >> > All of the changes happen on one of our test boxes, which has a DX58SO
> >> > baseboard and 4-core CPU. The boot dmesg and kconfig are attached.
> >> > We can test more boxes if necessary.
> >>
> >> How do you run perf stat?
> >
> > perf stat -a $(-e hardware, cache, software events)
> >
> >> Curious that you notice this now, it's a fairly old commit.
> >
> > Yeah, we are feeding old kernels to the 0day performance test system, too. :)
> >
> >> IIRC we did have a few wobbles with that, but I cannot remember much
> >> detail.
> >>
> >> The biggest difference between before and after that patch is that we'd
> >> rotate while the core is 'idle'. So if you do something like 'perf stat
> >> -a' and have significant idle time it does indeed make a difference.
> >
> > It is 'perf stat -a'; the CPU is mostly idle because it's an IO workload.
> >
> > btw, we found another commit that changed some perf-stat output:
> >
> > 2f7f73a520 ("perf/x86: Fix shared register mutual exclusion enforcement")
> >
> > Compared to its parent commit:
> >
> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
> > --------------- -------------------------
> > 1.308e+08 ~26% -77.8% 29029594 ~12% fat/micro/dd-write/1HDD-deadline-xfs-10dd
> > 1.308e+08 -77.8% 29029594 TOTAL perf-stat.LLC-prefetch-misses
> >
> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
> > --------------- -------------------------
> > 97086131 ~ 7% -71.0% 28127157 ~11% fat/micro/dd-write/1HDD-deadline-xfs-10dd
> > 97086131 -71.0% 28127157 TOTAL perf-stat.node-prefetches
> >
> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
> > --------------- -------------------------
> > 1.4e+08 ~ 3% -56.6% 60744486 ~ 9% fat/micro/dd-write/1HDD-deadline-xfs-10dd
> > 1.4e+08 -56.6% 60744486 TOTAL perf-stat.LLC-load-misses
> >
> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
> > --------------- -------------------------
> > 6.967e+08 ~ 0% -49.6% 3.513e+08 ~ 6% fat/micro/dd-write/1HDD-deadline-xfs-10dd
> > 6.967e+08 -49.6% 3.513e+08 TOTAL perf-stat.node-stores
> >
> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
> > --------------- -------------------------
> > 1.933e+09 ~ 1% -43.0% 1.103e+09 ~ 2% fat/micro/dd-write/1HDD-deadline-xfs-10dd
> > 1.933e+09 -43.0% 1.103e+09 TOTAL perf-stat.LLC-stores
> >
> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
> > --------------- -------------------------
> > 7.013e+08 ~ 5% -55.5% 3.118e+08 ~ 4% fat/micro/dd-write/1HDD-deadline-btrfs-100dd
> > 6.775e+09 ~ 1% -20.4% 5.391e+09 ~ 1% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
> > 7.477e+09 -23.7% 5.703e+09 TOTAL perf-stat.LLC-store-misses
> >
> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
> > --------------- -------------------------
> > 2.294e+09 ~ 1% -10.0% 2.065e+09 ~ 0% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
> > 2.294e+09 -10.0% 2.065e+09 TOTAL perf-stat.LLC-prefetches
> >
> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
> > --------------- -------------------------
> > 8.685e+09 ~ 0% -10.0% 7.814e+09 ~ 1% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
> > 8.685e+09 -10.0% 7.814e+09 TOTAL perf-stat.cache-misses
> >
> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
> > --------------- -------------------------
> > 1.591e+12 ~ 0% -8.7% 1.453e+12 ~ 1% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
> > 1.591e+12 -8.7% 1.453e+12 TOTAL perf-stat.dTLB-loads
> >
> >
> > Thanks,
> > Fengguang

2014-01-07 14:26:11

by Stephane Eranian

Subject: Re: perf-stat changes after "Use hrtimers for event multiplexing"

On Tue, Jan 7, 2014 at 2:20 PM, Fengguang Wu <[email protected]> wrote:
> Hi Stephane,
>
> On Tue, Jan 07, 2014 at 10:52:50AM +0100, Stephane Eranian wrote:
>> Hi,
>>
>> With the hrtimer patch, you will get more regular multiplexing when
>> you have idle cores during your benchmark.
>> Without the patch, multiplexing was piggybacked on the timer tick, and
>> the timer tick does not occur while a core is idle on a tickless
>> kernel. Thus, the quality of the results with hrtimers should be
>> improved.
>
In other words, if your measurement was not multiplexing events OR
your benchmark did not incur idle time, you should not see any difference
from the previous kernel, which had no hrtimer-based multiplexing.
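
To make that concrete, here is a toy model of counter multiplexing. It is
only a sketch, not kernel code. It assumes the usual scaling of a
multiplexed count, raw * time_enabled / time_running, and it models a
tickless kernel by rotating event groups only in busy time slots unless
rotate_when_idle is set. The names scaled_count and simulate, and the
slot-based timeline, are made up for illustration.

```python
def scaled_count(raw, time_enabled, time_running):
    """Estimate a full-period count from a multiplexed event's raw count."""
    if time_running == 0:
        return 0.0  # the event never got onto the PMU
    return raw * time_enabled / time_running


def simulate(activity, n_groups, rotate_when_idle):
    """Toy multiplexing model (an assumption, not the kernel's algorithm).

    activity[i] is the true number of events in time slot i; 0 means the
    CPU was idle in that slot. Every group counts the same underlying
    event, so a perfect estimate for each group is sum(activity).
    """
    raw = [0] * n_groups
    running = [0] * n_groups
    current = 0
    for events in activity:
        raw[current] += events
        running[current] += 1
        # Tick-driven rotation: no tick, hence no rotation, in idle slots.
        # Timer-driven rotation: rotate in every slot.
        if events > 0 or rotate_when_idle:
            current = (current + 1) % n_groups
    enabled = len(activity)
    return [scaled_count(raw[g], enabled, running[g]) for g in range(n_groups)]


activity = [4, 4, 0, 0, 0, 0, 0, 0]  # short burst, then idle; true total is 8
tick_like = simulate(activity, n_groups=2, rotate_when_idle=False)
timer_like = simulate(activity, n_groups=2, rotate_when_idle=True)
print("rotate on busy slots only:", tick_like)
print("rotate on every slot:     ", timer_like)
```

With this input the first group stays on the PMU for the whole idle
stretch when rotation follows busy slots only, so the two estimates come
out as roughly 4.57 and 32 against a true total of 8. Rotating in idle
slots as well gives 8 and 8. With a single group, or with no idle slots,
both modes give the same answer, which is the point above.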

> OK, got it. Thanks for the explanations!
>
> Thanks,
> Fengguang
>
>>
>> On Sun, Jan 5, 2014 at 2:14 AM, Fengguang Wu <[email protected]> wrote:
>> > On Sat, Jan 04, 2014 at 08:02:28PM +0100, Peter Zijlstra wrote:
>> >> On Thu, Jan 02, 2014 at 02:12:42PM +0800, [email protected] wrote:
>> >> > Greetings,
>> >> >
>> >> > We noticed many perf-stat changes between commit 9e6302056f ("perf: Use
>> >> > hrtimers for event multiplexing") and its parent commit ab573844e.
>> >> > Are these expected changes?
>> >> >
>> >> > ab573844e3058ee 9e6302056f8029f438e853432
>> >> > --------------- -------------------------
>> >> > 152917 +842.9% 1441897 TOTAL interrupts.0:IO-APIC-edge.timer
>> >> > 545996 +478.0% 3155637 TOTAL interrupts.LOC
>> >> > 182281 +12.3% 204718 TOTAL softirqs.SCHED
>> >> > 1.986e+08 -96.4% 7105919 TOTAL perf-stat.node-store-misses
>> >> > 107241719 -99.7% 317525 TOTAL perf-stat.node-prefetch-misses
>> >> > 1.938e+08 -90.7% 17930426 TOTAL perf-stat.node-load-misses
>> >> > 2590 +247.8% 9009 TOTAL vmstat.system.in
>> >> > 4.549e+12 +158.3% 1.175e+13 TOTAL perf-stat.stalled-cycles-backend
>> >> > 6.807e+12 +149.1% 1.696e+13 TOTAL perf-stat.stalled-cycles-frontend
>> >> > 1.753e+08 -50.8% 86339289 TOTAL perf-stat.node-prefetches
>> >> > 8.326e+11 +45.0% 1.207e+12 TOTAL perf-stat.cpu-cycles
>> >> > 37932143 +32.2% 50146025 TOTAL perf-stat.iTLB-load-misses
>> >> > 4.738e+11 +30.1% 6.165e+11 TOTAL perf-stat.iTLB-loads
>> >> > 2.56e+11 +30.1% 3.33e+11 TOTAL perf-stat.L1-icache-loads
>> >> > 4.951e+11 +24.6% 6.169e+11 TOTAL perf-stat.instructions
>> >> > 7.85e+08 +7.5% 8.439e+08 TOTAL perf-stat.LLC-prefetch-misses
>> >> > 1.891e+12 +22.8% 2.322e+12 TOTAL perf-stat.ref-cycles
>> >> > 4.344e+08 -20.3% 3.462e+08 TOTAL perf-stat.node-loads
>> >> > 2.836e+11 +17.4% 3.328e+11 TOTAL perf-stat.branch-loads
>> >> > 9.506e+10 +24.5% 1.183e+11 TOTAL perf-stat.branch-load-misses
>> >> > 2.803e+11 +18.4% 3.319e+11 TOTAL perf-stat.branch-instructions
>> >> > 7.988e+10 +20.9% 9.658e+10 TOTAL perf-stat.bus-cycles
>> >> > 2.041e+09 +22.2% 2.495e+09 TOTAL perf-stat.branch-misses
>> >> > 229145 -17.3% 189601 TOTAL perf-stat.cpu-migrations
>> >> > 1.782e+11 +17.9% 2.1e+11 TOTAL perf-stat.dTLB-loads
>> >> > 4.702e+08 -14.8% 4.006e+08 TOTAL perf-stat.LLC-load-misses
>> >> > 1.418e+11 +17.4% 1.666e+11 TOTAL perf-stat.L1-dcache-loads
>> >> > 1.838e+09 +16.1% 2.133e+09 TOTAL perf-stat.LLC-stores
>> >> > 2.428e+09 +11.3% 2.702e+09 TOTAL perf-stat.LLC-loads
>> >> > 2.788e+11 +8.6% 3.029e+11 TOTAL perf-stat.dTLB-stores
>> >> > 8.66e+08 +10.8% 9.594e+08 TOTAL perf-stat.LLC-prefetches
>> >> > 1.117e+09 +10.5% 1.234e+09 TOTAL perf-stat.dTLB-store-misses
>> >> > 1.705e+09 +5.3% 1.796e+09 TOTAL perf-stat.L1-dcache-store-misses
>> >> > 5.671e+09 +6.1% 6.015e+09 TOTAL perf-stat.L1-dcache-load-misses
>> >> > 8.794e+10 +3.6% 9.109e+10 TOTAL perf-stat.L1-dcache-stores
>> >> > 3.46e+09 +4.6% 3.618e+09 TOTAL perf-stat.cache-references
>> >> > 8.696e+08 +1.8% 8.849e+08 TOTAL perf-stat.cache-misses
>> >> > 1613129 +2.6% 1655724 TOTAL perf-stat.context-switches
>> >> >
>> >> > All of the changes happen in one of our test box, which has a DX58SO
>> >> > baseboard and 4-core CPU. The boot dmesg and kconfig are attached.
>> >> > We can test more boxes if necessary.
>> >>
>> >> How do you run perf stat?
>> >
>> > perf stat -a $(-e hardware, cache, software events)
>> >
>> >> Curious that you notice this now, its a fairly old commit.
>> >
>> > Yeah, we are feeding old kernels to the 0day performance test system, too. :)
>> >
>> >> IIRC we did have a few wobbles with that, but I cannot remember much
>> >> detail.
>> >>
>> >> The biggest difference between before and after that patch is that we'd
>> >> rotate while the core is 'idle'. So if you do something like 'perf stat
>> >> -a' and have significant idle time it does indeed make a difference.
>> >
>> > It is 'perf stat -a'; the CPU is mostly idle because it's an IO workload.
>> >
>> > btw, we find another commit that changed some perf-stat output:
>> >
>> > 2f7f73a520 ("perf/x86: Fix shared register mutual exclusion enforcement")
>> >
>> > Comparing to its parent commit:
>> >
>> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
>> > --------------- -------------------------
>> > 1.308e+08 ~26% -77.8% 29029594 ~12% fat/micro/dd-write/1HDD-deadline-xfs-10dd
>> > 1.308e+08 -77.8% 29029594 TOTAL perf-stat.LLC-prefetch-misses
>> >
>> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
>> > --------------- -------------------------
>> > 97086131 ~ 7% -71.0% 28127157 ~11% fat/micro/dd-write/1HDD-deadline-xfs-10dd
>> > 97086131 -71.0% 28127157 TOTAL perf-stat.node-prefetches
>> >
>> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
>> > --------------- -------------------------
>> > 1.4e+08 ~ 3% -56.6% 60744486 ~ 9% fat/micro/dd-write/1HDD-deadline-xfs-10dd
>> > 1.4e+08 -56.6% 60744486 TOTAL perf-stat.LLC-load-misses
>> >
>> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
>> > --------------- -------------------------
>> > 6.967e+08 ~ 0% -49.6% 3.513e+08 ~ 6% fat/micro/dd-write/1HDD-deadline-xfs-10dd
>> > 6.967e+08 -49.6% 3.513e+08 TOTAL perf-stat.node-stores
>> >
>> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
>> > --------------- -------------------------
>> > 1.933e+09 ~ 1% -43.0% 1.103e+09 ~ 2% fat/micro/dd-write/1HDD-deadline-xfs-10dd
>> > 1.933e+09 -43.0% 1.103e+09 TOTAL perf-stat.LLC-stores
>> >
>> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
>> > --------------- -------------------------
>> > 7.013e+08 ~ 5% -55.5% 3.118e+08 ~ 4% fat/micro/dd-write/1HDD-deadline-btrfs-100dd
>> > 6.775e+09 ~ 1% -20.4% 5.391e+09 ~ 1% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
>> > 7.477e+09 -23.7% 5.703e+09 TOTAL perf-stat.LLC-store-misses
>> >
>> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
>> > --------------- -------------------------
>> > 2.294e+09 ~ 1% -10.0% 2.065e+09 ~ 0% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
>> > 2.294e+09 -10.0% 2.065e+09 TOTAL perf-stat.LLC-prefetches
>> >
>> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
>> > --------------- -------------------------
>> > 8.685e+09 ~ 0% -10.0% 7.814e+09 ~ 1% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
>> > 8.685e+09 -10.0% 7.814e+09 TOTAL perf-stat.cache-misses
>> >
>> > 069e0c3c4058147 2f7f73a52078b667d64df16ea
>> > --------------- -------------------------
>> > 1.591e+12 ~ 0% -8.7% 1.453e+12 ~ 1% lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
>> > 1.591e+12 -8.7% 1.453e+12 TOTAL perf-stat.dTLB-loads
>> >
>> >
>> > Thanks,
>> > Fengguang
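
For anyone re-checking the tables quoted in this thread: the middle column
looks like the plain relative change between the two commits' totals. That
is inferred from the numbers themselves, not from the 0day tooling, and
pct_change is a hypothetical helper name. A small sketch:

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Rows copied from the tables quoted in this thread (old total, new total).
rows = [
    ("interrupts.0:IO-APIC-edge.timer", 152917, 1441897),  # hrtimer commit
    ("interrupts.LOC", 545996, 3155637),                   # hrtimer commit
    ("perf-stat.node-prefetches", 97086131, 28127157),     # 2f7f73a520
]

for name, old, new in rows:
    print(f"{old:>10} {pct_change(old, new):+.1f}% {new:>10} TOTAL {name}")
```

Rows whose totals are printed in rounded form (for example 1.986e+08) can
only be reproduced approximately this way.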