2019-12-04 17:03:10

by David Laight

Subject: Running an Ivy Bridge cpu at fixed frequency

Is there any way to persuade the intel_pstate driver to make an Ivy bridge (i7-3770)
cpu run at a fixed frequency?
It is really difficult to compare code execution times when the cpu clock speed
keeps changing.
I thought I'd managed by setting the 'scaling_max_freq' to 1.7GHz, but even that
doesn't seem to be working now.
It would also be nice to run a little faster than that - but without it 'randomly'
going to 'turbo' frequencies (which it is doing even after I've set no_turbo to 1).

An alternative would be a variable frequency TSC - might give more consistent values.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2019-12-04 17:58:57

by Andy Lutomirski

Subject: Re: Running an Ivy Bridge cpu at fixed frequency

On Wed, Dec 4, 2019 at 9:01 AM David Laight <[email protected]> wrote:
>
> Is there any way to persuade the intel_pstate driver to make an Ivy bridge (i7-3770)
> cpu run at a fixed frequency?
> It is really difficult to compare code execution times when the cpu clock speed
> keeps changing.
> I thought I'd managed by setting the 'scaling_max_freq' to 1.7GHz, but even that
> doesn't seem to be working now.
> It would also be nice to run a little faster than that - but without it 'randomly'
> going to 'turbo' frequencies (which it is doing even after I've set no_turbo to 1).
>

I don't remember. I'm sure I could figure out what MSR to write, but
that's not the answer you're looking for. Someone else will know :)

> An alternative would be a variable frequency TSC - might give more consistent values.

You can quite easily use perf to count cycles. I never really
finished it, but this is a tiny little library that should do exactly
what you need. It's a bit messy.

https://git.kernel.org/pub/scm/linux/kernel/git/luto/misc-tests.git/tree/tight_loop/perf_self_monitor.c
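For anyone following along, here is a minimal sketch of counting cycles with perf_event_open(2) through the plain read(2) interface. It assumes Linux with glibc, which has no wrapper for this syscall. The helpers count_cycles() and busy_loop() are hypothetical names for illustration and are not part of Andy's library.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Example workload; volatile stops the compiler deleting the loop. */
static void busy_loop(void)
{
	volatile unsigned long sink = 0;

	for (unsigned long i = 0; i < 1000000; i++)
		sink += i;
}

/*
 * Count user-space core cycles spent in fn() for the calling thread.
 * Returns 0 on success, -1 if the counter can't be opened or read, or
 * was never actually scheduled onto the PMU.
 */
static int count_cycles(void (*fn)(void), uint64_t *cycles)
{
	struct perf_event_attr attr;
	struct { uint64_t value, enabled, running; } rf;
	ssize_t n;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;		/* start it explicitly below */
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;
	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
			   PERF_FORMAT_TOTAL_TIME_RUNNING;

	/* pid 0 = this thread, cpu -1 = any cpu, no group, no flags */
	fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return -1;

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	fn();
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	n = read(fd, &rf, sizeof(rf));
	close(fd);
	if (n != (ssize_t)sizeof(rf) || rf.running == 0)
		return -1;
	*cycles = rf.value;
	return 0;
}
```

Setting exclude_kernel = 1 keeps this usable at the common default perf_event_paranoid setting. To include time spent inside a syscall such as recvmsg(), you would need exclude_kernel = 0 together with a more permissive perf_event_paranoid, or root.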

2019-12-05 09:46:30

by Peter Zijlstra

Subject: Re: Running an Ivy Bridge cpu at fixed frequency

On Wed, Dec 04, 2019 at 05:01:32PM +0000, David Laight wrote:
> Is there any way to persuade the intel_pstate driver to make an Ivy bridge (i7-3770)
> cpu run at a fixed frequency?

You can: use the performance governor and set scaling_{min,max}_freq to
base_frequency (I _think_; I never quite remember whether that is the max
non-turbo P-state).

If that doesn't work, simply put it at cpuinfo_min_freq. It's slow, but
it's guaranteed stable.
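The recipe can be sketched in C as below. This assumes root and the usual cpufreq sysfs layout. write_attr() and pin_cpu_freq() are hypothetical helpers, and the value is in kHz, which is what the scaling_*_freq files expect.

```c
#include <stdio.h>

/*
 * Write one value to a sysfs attribute. With stdio buffering the real
 * write(2) happens in fclose(), so that is where sysfs rejections
 * (e.g. EINVAL) are reported.
 */
static int write_attr(const char *dir, const char *name, const char *val)
{
	char path[512];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Select the performance governor and clamp both scaling limits to khz.
 * On some kernels one of the two limit writes can be refused the first
 * time round (min would exceed max), so the pair is tried twice.
 */
static int pin_cpu_freq(const char *cpufreq_dir, unsigned long khz)
{
	char buf[32];
	int pass, rc = -1;

	if (write_attr(cpufreq_dir, "scaling_governor", "performance"))
		return -1;
	snprintf(buf, sizeof(buf), "%lu", khz);
	for (pass = 0; pass < 2 && rc != 0; pass++) {
		int a = write_attr(cpufreq_dir, "scaling_max_freq", buf);
		int b = write_attr(cpufreq_dir, "scaling_min_freq", buf);

		rc = (a || b) ? -1 : 0;
	}
	return rc;
}
```

For example, pin_cpu_freq("/sys/devices/system/cpu/cpu0/cpufreq", 1700000) asks for 1.7 GHz on cpu0. The call has to be repeated for every CPU, and there is no guarantee that intel_pstate will honour it exactly on a given part.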

> It is really difficult to compare code execution times when the cpu clock speed
> keeps changing.

As Andy already wrote, perf is really good for this.

Find attached, it probably is less shiny than what Andy handed you, but
contains all the bits required to frob something.

> I thought I'd managed by setting the 'scaling_max_freq' to 1.7GHz, but even that
> doesn't seem to be working now.

You also have to set the min I think, and select the performance
governor, otherwise it's too tempted to be 'smart' about stuff.

> It would also be nice to run a little faster than that - but without it 'randomly'
> going to 'turbo' frequencies (which it is doing even after I've set no_turbo to 1).
>
> An alternative would be a variable frequency TSC - might give more consistent values.

perf :-)


Attachments:
(No filename) (1.24 kB)
spinlocks.tar.bz2 (28.33 kB)

2019-12-05 15:55:25

by David Laight

Subject: RE: Running an Ivy Bridge cpu at fixed frequency

From: Peter Zijlstra
> Sent: 05 December 2019 09:46
> As Andy already wrote, perf is really good for this.
>
> Find attached, it probably is less shiny than what Andy handed you, but
> contains all the bits required to frob something.

You are in a maze of incomplete documentation all disjoint.

The x86 instruction set doc (eg 325462.pdf) defines the rdpmc instruction, tells you
how many counters each cpu type has, but doesn't even contain a reference
to how they are incremented.
I guess there are some processor-specific MSRs for that.

perf_event_open(2) tells you a few things, but doesn't actually say what anything is.
It contains all but the last 'if' clause of this function, without really saying
what any of it does - or why you might do it this way.

static inline u64 mmap_read_self(void *addr)
{
	struct perf_event_mmap_page *pc = addr;
	u32 seq, idx, time_mult = 0, time_shift = 0, width = 0;
	u64 count, cyc = 0, time_offset = 0, enabled, running, delta;
	s64 pmc = 0;

	/* seqcount read: retry if the kernel rewrote the page meanwhile */
	do {
		seq = pc->lock;
		barrier();

		enabled = pc->time_enabled;
		running = pc->time_running;

		/* event was not always scheduled: need TSC -> time conversion */
		if (pc->cap_user_time && enabled != running) {
			cyc = rdtsc();
			time_mult = pc->time_mult;
			time_shift = pc->time_shift;
			time_offset = pc->time_offset;
		}

		/* idx != 0: counter is live in hardware, read it directly */
		idx = pc->index;
		count = pc->offset;
		if (pc->cap_user_rdpmc && idx) {
			width = pc->pmc_width;
			pmc = rdpmc(idx - 1);
		}

		barrier();
	} while (pc->lock != seq);

	if (width) {	/* only non-zero if rdpmc() was actually executed */
		pmc <<= 64 - width;
		pmc >>= 64 - width; /* shift right signed */
		count += pmc;
	}

	/* scale by enabled/running to estimate the unmultiplexed total */
	if (enabled != running) {
		u64 quot, rem;

		quot = (cyc >> time_shift);
		rem = cyc & (((u64)1 << time_shift) - 1);
		delta = time_offset + quot * time_mult +
			((rem * time_mult) >> time_shift);

		enabled += delta;
		if (idx)
			running += delta;

		if (running) {	/* never ran at all: nothing to scale */
			quot = count / running;
			rem = count % running;
			count = quot * enabled + (rem * enabled) / running;
		}
	}

	return count;
}
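The final 'if' clause is easier to follow when pulled apart. The sketch below uses hypothetical helper names. cyc_to_time() is the TSC-to-perf-time conversion, offset + ((cyc * mult) >> shift), and scale_count() computes count * enabled / running. Both split one operand into a quotient and a remainder so the full product is never formed. The results are exact, not approximations.

```c
#include <stdint.h>

/*
 * offset + ((cyc * mult) >> shift), computed as
 * quot * mult + ((rem * mult) >> shift) with cyc = quot << shift | rem.
 * rem < 2^shift, so rem * mult fits in 64 bits for shift <= 32.
 */
static uint64_t cyc_to_time(uint64_t cyc, uint32_t mult, unsigned shift,
			    uint64_t offset)
{
	uint64_t quot = cyc >> shift;
	uint64_t rem = cyc & (((uint64_t)1 << shift) - 1);

	return offset + quot * mult + ((rem * mult) >> shift);
}

/*
 * count * enabled / running, rounded down, without forming
 * count * enabled directly (rem * enabled can still overflow if
 * enabled and running are both very large).
 */
static uint64_t scale_count(uint64_t count, uint64_t enabled, uint64_t running)
{
	uint64_t quot = count / running;
	uint64_t rem = count % running;

	return quot * enabled + (rem * enabled) / running;
}
```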

AFAICT:
1) The last clause is scaling the count up to allow for time when the hardware counter
couldn't be allocated.
I'm not convinced that is useful, better to ignore the entire measurement.
Half this got deleted from the man page, leaving strange 'set but unused' variables.

2) The hardware counters are disabled while the process is asleep.
On wake a different pmc counter might be used (maybe on a different cpu).
The new cpu might not even have a counter available.

3) If you don't want to scale up for missing periods it is probably enough to do:
	do {
		seq = pc->lock;
		barrier();
		idx = pc->index;
		if (!idx)
			return -1;
		count = pc->offset + rdpmc(idx - 1);
		barrier();
	} while (seq != pc->lock);
	return (unsigned int)count;

Not tried it yet :-)

David


2019-12-05 18:10:25

by Andy Lutomirski

Subject: Re: Running an Ivy Bridge cpu at fixed frequency



> On Dec 5, 2019, at 7:54 AM, David Laight <[email protected]> wrote:
>
> From: Peter Zijlstra
>> Sent: 05 December 2019 09:46
>> As Andy already wrote, perf is really good for this.
>> Find attached, it probably is less shiny than what Andy handed you, but
>> contains all the bits required to frob something.
>
> You are in a maze of incomplete documentation all disjoint.

I don’t see any documentation. Maybe you shouldn’t have turned your flashlight on.

> ...
> Not tried it yet :-)

Use my version :). I just throw out the sample if we were preempted or if it was otherwise suspicious.

—Andy


2019-12-06 10:16:27

by Peter Zijlstra

Subject: Re: Running an Ivy Bridge cpu at fixed frequency

On Thu, Dec 05, 2019 at 03:53:55PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 05 December 2019 09:46
> > As Andy already wrote, perf is really good for this.
> >
> > Find attached, it probably is less shiny than what Andy handed you, but
> > contains all the bits required to frob something.
>
> You are in a maze of incomplete documentation all disjoint.

I'm sure..

> The x86 instruction set doc (eg 325462.pdf) defines the rdpmc instruction, tells you
> how many counters each cpu type has, but doesn't even contain a reference
> to how they are incremented.

There's volume 3, chapter 18 (performance monitoring overview), which
should explain how the counters work, and chapter 19, which lists many
of the available events.

TL;DR: they're (48-bit) signed counters that increment and raise an
interrupt when the sign flips. This means we set them to '-period' and
then, upon read (either early or on interrupt), compute the delta and
accumulate it elsewhere.
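Here is a sketch of that arithmetic with hypothetical helper names. It assumes a 48-bit counter as described here; the actual width is reported in pmc_width. Programming -period makes the counter reach zero, which is the point where the sign flips, after exactly period events. Deltas between two raw reads are taken modulo the counter width.

```c
#include <stdint.h>

/* Mask covering the low `width` bits (width in 1..64). */
static uint64_t counter_mask(unsigned width)
{
	return width == 64 ? ~(uint64_t)0 : ((uint64_t)1 << width) - 1;
}

/* Raw value to program so the counter overflows after `period` events. */
static uint64_t program_value(uint64_t period, unsigned width)
{
	return (0 - period) & counter_mask(width);
}

/* Interpret a width-bit raw counter value as a signed number. */
static int64_t sign_extend(uint64_t raw, unsigned width)
{
	uint64_t sign = (uint64_t)1 << (width - 1);

	raw &= counter_mask(width);
	return (int64_t)((raw ^ sign) - sign);
}

/* Events counted between two raw reads, modulo the counter width. */
static uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned width)
{
	return (now - prev) & counter_mask(width);
}
```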

> perf_event_open(2) tells you a few things, but doesn't actually say what anything is.
> It contains all but the last 'if' clause of this function, without really saying
> what any of it does - or why you might do it this way.

I don't actually know what's in that manpage. But it really shouldn't be
too hard to understand.

It's a seqcount-protected set of values: there's the RDPMC counter index
and the counter offset. If idx != 0, the counter is actually programmed
and we must RDPMC; the result of that is added to the offset.

The whole counter scaling crud is just that: crud you can mostly forget
about if you want to quickly hack something together. See
mmap_read_pinned() for the simplified (and much faster) version that
ignores all that.


> AFAICT:
> 1) The last clause is scaling the count up to allow for time when the hardware counter
> couldn't be allocated.
> I'm not convinced that is useful, better to ignore the entire measurement.
> Half this got deleted from the man page, leaving strange 'set but unused' variables.

Depending on the use case, sure. I don't have a use for it either. I know
other people find it useful.

> 2) The hardware counters are disabled while the process is asleep.
> On wake a different pmc counter might be used (maybe on a different cpu).
> The new cpu might not even have a counter available.

Right, but if this is all you're running that is unlikely to happen.

> 3) If you don't want to scale up for missing periods it is probably enough to do:
> 	do {
> 		seq = pc->lock;
> 		barrier();
> 		idx = pc->index;
> 		if (!idx)
> 			return -1;
> 		count = pc->offset + rdpmc(idx - 1);
> 		barrier();
> 	} while (seq != pc->lock);
> 	return (unsigned int)count;

You still need to do the rdpmc sign-extension crud, but see
mmap_read_pinned(), which does just about that.

As the name suggests it relies on using perf_event_attr::pinned = 1.
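Below is a self-contained sketch along those lines. It is not a copy of mmap_read_pinned(), which isn't reproduced in the thread. It opens a pinned, user-space-only cycles event for the calling thread, maps only the control page, and reads it with the seqcount loop from point 3, truncated to 32 bits. The function names are hypothetical and the rdpmc part is x86-only. read_cycles32() returns -1 whenever the page says rdpmc can't be used; treat that as "discard this sample".

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Compiler barrier: force the page fields to be re-read. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Open a pinned user-space cycles counter for this thread and map its
 * control page read-only. Returns NULL if perf is unavailable.
 */
static struct perf_event_mmap_page *open_pinned_cycles(void)
{
	struct perf_event_attr attr;
	void *page;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.pinned = 1;	/* never multiplexed, so no scaling needed */
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return NULL;
	page = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_READ,
		    MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps the event alive */
	return page == MAP_FAILED ? NULL : page;
}

/* Low 32 bits of the cycle count, or -1 if rdpmc can't be used now. */
static int64_t read_cycles32(const struct perf_event_mmap_page *pc)
{
#if defined(__x86_64__) || defined(__i386__)
	uint32_t seq, idx, lo, hi;
	uint64_t count;

	do {
		seq = pc->lock;
		barrier();
		idx = pc->index;
		if (!pc->cap_user_rdpmc || !idx)
			return -1;
		__asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(idx - 1));
		/* truncating to 32 bits makes sign extension unnecessary */
		count = (uint64_t)pc->offset + ((uint64_t)hi << 32 | lo);
		barrier();
	} while (pc->lock != seq);
	return (uint32_t)count;
#else
	(void)pc;
	return -1;	/* this sketch only knows x86 rdpmc */
#endif
}
```

Typical use: call read_cycles32() before and after the code under test and subtract the two values as uint32_t.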

2019-12-06 13:08:26

by David Laight

Subject: RE: Running an Ivy Bridge cpu at fixed frequency

From: Peter Zijlstra
> Sent: 06 December 2019 10:16
> To: David Laight <[email protected]>
...
> The whole counter scaling crud is just that, crud you can mostly forget
> about if you want to quickly hack something together. See
> mmap_read_pinned() for the simplified (and much faster version) that
> ignores all that.

I noticed that version later :-(
The 'seqcount' is interesting, since it only protects against updates
that happen while the process itself is in kernel space.
It doesn't allow arbitrary kernel updates of the memory area.

...
> You still need to do the rdpmc sign extent crud, but see
> mmap_read_pinned() that does just about that.

Actually, for what I'm doing I can truncate the counter to 32 bits
and not worry about when it wraps.
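Here is a small sketch of why that works, with hypothetical helper names. Unsigned 32-bit subtraction gives the right elapsed count across a single wrap, provided the interval is under 2^32 cycles, which is a little over a second at 3.4 GHz. The counter is at least 32 bits wide, so adding the raw rdpmc value to offset without sign extension leaves the low 32 bits unchanged.

```c
#include <stdint.h>

/* Cycles between two 32-bit truncated reads; correct across one wrap. */
static uint32_t elapsed32(uint32_t start, uint32_t end)
{
	return end - start;
}

/*
 * offset + raw truncated to 32 bits. Sign-extending raw first would only
 * change bits above the counter width (>= 32), so the result is the same.
 */
static uint32_t count32(int64_t offset, uint64_t raw)
{
	return (uint32_t)((uint64_t)offset + raw);
}
```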

Anyway, I've now got some histograms of the elapsed cycle counts
for recvfrom() and recvmsg() with, and without, some of the
HARDENED_USERCOPY costs.

David


2019-12-06 14:48:13

by Alexey Klimov

Subject: Re: Running an Ivy Bridge cpu at fixed frequency

On Wed, Dec 4, 2019 at 5:32 PM David Laight <[email protected]> wrote:
>
> Is there any way to persuade the intel_pstate driver to make an Ivy bridge (i7-3770)
> cpu run at a fixed frequency?
> It is really difficult to compare code execution times when the cpu clock speed
> keeps changing.
> I thought I'd managed by setting the 'scaling_max_freq' to 1.7GHz, but even that
> doesn't seem to be working now.
> It would also be nice to run a little faster than that - but without it 'randomly'
> going to 'turbo' frequencies (which it is doing even after I've set no_turbo to 1).
>
> An alternative would be a variable frequency TSC - might give more consistent values.

Have you tried the intel_pstate=passive parameter on the kernel command line?
You'll then be able to fix the frequency using governors or sysfs.
I'm not sure that this is what you're looking for, and I personally don't
know whether 'passive' mode works on Ivy Bridge.

Best regards,
Alexey