It turns out that perf_event on Intel seems to use the INST_RETIRED.ALL
event interchangeably with the "Fixed Counter 0" event.
The two are not equivalent, however. The Fixed Counter 0 event is
deterministic, while INST_RETIRED.ALL has a bug where it counts
extra events due to hardware interrupts.
Having a user-accessible deterministic instructions event would be really
useful. So is there a way we can specify we want an event to run on Fixed
Counter 0? I think there is code already that does this for Fixed Counter
2 for similar reasons.
For an example of this happening in real life, take the
./retired_instr.all.x86_64 from my deterministic benchmark that
I'll be presenting at the ISPASS conference next week.
(can be found here git://github.com/deater/deterministic.git )
If you run this benchmark with the same event listed 5 times on an Ivy
Bridge machine you get the results below. Notice that the last one is the
"proper" deterministic result and thus the one that ran on Fixed Counter 0.
$ perf stat -e instructions:u,instructions:u,instructions:u,instructions:u,instructions:u ./retired_instr.all.x86_64
...
Performance counter stats for './retired_instr.all.x86_64':
227,010,687 instructions:u # 0.00 insns per cycle
227,010,687 instructions:u # 0.00 insns per cycle
227,010,687 instructions:u # 0.00 insns per cycle
227,010,687 instructions:u # 0.00 insns per cycle
227,000,723 instructions:u # 0.00 insns per cycle
1.902648316 seconds time elapsed
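To make the outlier easy to spot in a longer run, here is a minimal sketch that parses the perf stat lines above and groups the five events by the count they reported. The helper names (parse_counts, group_by_value) are hypothetical and exist only for this illustration; the input text is copied from the output above.

```python
import re

# Output lines copied from the perf stat run above.
PERF_OUTPUT = """\
227,010,687 instructions:u # 0.00 insns per cycle
227,010,687 instructions:u # 0.00 insns per cycle
227,010,687 instructions:u # 0.00 insns per cycle
227,010,687 instructions:u # 0.00 insns per cycle
227,000,723 instructions:u # 0.00 insns per cycle
"""


def parse_counts(text, event="instructions:u"):
    """Return the integer count from every line that reports `event`."""
    pattern = re.compile(
        r"^\s*([\d,]+)\s+" + re.escape(event) + r"\b", re.MULTILINE
    )
    return [int(m.group(1).replace(",", "")) for m in pattern.finditer(text)]


def group_by_value(counts):
    """Map each distinct count to the positions (0-based) that reported it."""
    groups = {}
    for index, value in enumerate(counts):
        groups.setdefault(value, []).append(index)
    return groups


if __name__ == "__main__":
    counts = parse_counts(PERF_OUTPUT)
    for value, positions in group_by_value(counts).items():
        print(f"{value:,} reported by event position(s) {positions}")
```

The grouping only shows which events disagree. It cannot tell which hardware counter each event was scheduled on, so the mapping of the odd one out to Fixed Counter 0 remains an inference.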
Thanks,
Vince Weaver
http://www.eece.maine.edu/~vweaver/
On Mon, Apr 15, 2013 at 6:13 PM, Vince Weaver wrote:
>
> It turns out that perf_event for intel seems to use the INST_RETIRED.ALL
> event interchangeably with the "Fixed Counter 0" event.
>
Yes, it does.
> It turns out they are not equivalent. The Fixed Counter 0 event turns out
> to be deterministic, while INST_RETIRED.ALL has a bug where it counts
> extra events due to hardware interrupts.
>
Never heard of that problem. I know there was another problem due to leaking
during priv level transitions. It would take a few instructions or cycles to
realize you were no longer at user level when doing event:u.
Interrupts should impact fixed and generic counters the same way.
> Having a user-accessible deterministic instructions event would be really
> useful. So is there a way we can specify we want an event to run on Fixed
> Counter 0? I think there is code already that does this for Fixed Counter
> 2 for similar reasons.
>
No. Fixed counter 2 (ref-cycles) is handled that way for a different reason:
it measures an event that does not exist on generic counters,
unhalted_reference_cycles.
> For an example of this happening in real life, take the
> ./retired_instr.all.x86_64 from my deterministic benchmark that
> I'll be presenting at the ISPASS conference next week.
> (can be found here git://github.com/deater/deterministic.git )
>
> If you run this benchmark with the same event listed 5 times on an Ivy
> Bridge machine you get these results, notice the last one is the "proper"
> deterministic result and thus the one that ran on Fixed Counter 0.
>
Are you sure that the 5th event stayed in fixed counter 0 all along?
> $ perf stat -e instructions:u,instructions:u,instructions:u,instructions:u,instructions:u ./retired_instr.all.x86_64
> ...
> Performance counter stats for './retired_instr.all.x86_64':
>
> 227,010,687 instructions:u # 0.00 insns per cycle
> 227,010,687 instructions:u # 0.00 insns per cycle
> 227,010,687 instructions:u # 0.00 insns per cycle
> 227,010,687 instructions:u # 0.00 insns per cycle
> 227,000,723 instructions:u # 0.00 insns per cycle
>
> 1.902648316 seconds time elapsed
>
> Thanks,
>
> Vince Weaver
> http://www.eece.maine.edu/~vweaver/
On Mon, 15 Apr 2013, Stephane Eranian wrote:
> Never heard of that problem. I know there was another problem due to leaking
> during priv level transitions. It would take a few instr or cycles to realize
> you were not in user level any more when doing event:u.
>
> Interrupt should impact fixed and generic counters the same way.
Some people inside Intel were reproducing my "deterministic event" work
and they informed me of this issue.
> Are you sure that the 5th event stayed in fixed counter 0 all along?
No, but is there any way to enforce that currently using perf?
The results are about what I'd expect. The generic instructions:u
events are overcounting by roughly 20,008 for page faults (as expected)
and 650 for hardware interrupts (also as expected), whereas the
Fixed Counter 0 event is overcounting by 10,000 (for page faults?) and
undercounting a bit, possibly due to a supposedly known issue with
the counts for rep-prefixed string instructions that apparently only
happens on Fixed Counter 0.
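The arithmetic behind those figures can be sketched as follows. The two counts come from the perf stat output quoted in this thread, and the overcount figures are the rough ones stated above. The derived values are only as good as those approximations, and the variable names are chosen for this illustration only.

```python
# Counts reported by the perf stat run quoted in this thread.
GENERIC_COUNT = 227_010_687  # reported by four of the five instructions:u events
OUTLIER_COUNT = 227_000_723  # the event presumed to have run on Fixed Counter 0

# Rough overcount figures for the generic counters, as stated in the message.
GENERIC_PAGE_FAULT_OVERCOUNT = 20_008
GENERIC_INTERRUPT_OVERCOUNT = 650

# Observed difference between the generic result and the outlier.
observed_gap = GENERIC_COUNT - OUTLIER_COUNT

# Total overcount attributed to the generic counters.
generic_overcount = GENERIC_PAGE_FAULT_OVERCOUNT + GENERIC_INTERRUPT_OVERCOUNT

# If the generic figures are taken at face value, this is the net amount by
# which the outlier would still exceed the true count (indicative only).
implied_outlier_net_overcount = generic_overcount - observed_gap

if __name__ == "__main__":
    print(f"observed gap:                  {observed_gap:,}")
    print(f"generic overcount (stated):    {generic_overcount:,}")
    print(f"implied outlier net overcount: {implied_outlier_net_overcount:,}")
```

Because the inputs are rounded estimates, the last value is an order-of-magnitude check on the explanation and not a measurement.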
> > $ perf stat -e instructions:u,instructions:u,instructions:u,instructions:u,instructions:u ./retired_instr.all.x86_64
> > ...
> > Performance counter stats for './retired_instr.all.x86_64':
> >
> > 227,010,687 instructions:u # 0.00 insns per cycle
> > 227,010,687 instructions:u # 0.00 insns per cycle
> > 227,010,687 instructions:u # 0.00 insns per cycle
> > 227,010,687 instructions:u # 0.00 insns per cycle
> > 227,000,723 instructions:u # 0.00 insns per cycle
> >
> > 1.902648316 seconds time elapsed
Vince Weaver
http://www.eece.maine.edu/~vweaver/