2022-05-27 12:59:14

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v2] perf metrics: Add literal for system TSC frequency

Such a literal is useful to calculate things like the average frequency
[1]. The TSC frequency isn't exposed by sysfs although some experimental
drivers look to add it [2]. This change computes the value using the
frequency in /proc/cpuinfo which is accurate at least on Intel
processors.

v2. Adds warnings to make clear if things have changed/broken on future
Intel platforms. It also adds caching and an Intel specific that a
value is computed.

[1] https://github.com/intel/perfmon-metrics/blob/5ad9ef7056f31075e8178b9f1fb732af183b2c8d/SKX/metrics/perf/skx_metric_perf.json#L11
[2] https://github.com/trailofbits/tsc_freq_khz

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/expr.c | 15 +++++++++++++
tools/perf/util/expr.c | 50 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)

diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index 5c0032fe93ae..45afe4f24859 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -1,8 +1,10 @@
// SPDX-License-Identifier: GPL-2.0
#include "util/debug.h"
#include "util/expr.h"
+#include "util/header.h"
#include "util/smt.h"
#include "tests.h"
+#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <linux/zalloc.h>
@@ -69,6 +71,11 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
double val, num_cpus, num_cores, num_dies, num_packages;
int ret;
struct expr_parse_ctx *ctx;
+ bool is_intel = false;
+ char buf[128];
+
+ if (!get_cpuid(buf, sizeof(buf)))
+ is_intel = strstr(buf, "Intel") != NULL;

TEST_ASSERT_EQUAL("ids_union", test_ids_union(), 0);

@@ -175,6 +182,14 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
if (num_dies) // Some platforms do not have CPU die support, for example s390
TEST_ASSERT_VAL("#num_dies >= #num_packages", num_dies >= num_packages);

+ if (is_intel) {
+ double system_tsc_freq;
+
+ TEST_ASSERT_VAL("#system_tsc_freq", expr__parse(&system_tsc_freq, ctx,
+ "#system_tsc_freq") == 0);
+ TEST_ASSERT_VAL("!isnan(#system_tsc_freq)", !isnan(system_tsc_freq));
+ }
+
/*
* Source count returns the number of events aggregating in a leader
* event including the leader. Check parsing yields an id.
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 675f318ce7c1..f33aeb1e6faa 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -402,6 +402,51 @@ double expr_id_data__source_count(const struct expr_id_data *data)
return data->val.source_count;
}

+/*
+ * Derive the TSC frequency in Hz from the /proc/cpuinfo, for example:
+ * ...
+ * model name : Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
+ * ...
+ * will return 3000000000.
+ */
+static double system_tsc_freq(void)
+{
+ static double result;
+ static bool computed;
+ FILE *cpuinfo;
+ char *line = NULL;
+ size_t len = 0;
+
+ if (computed)
+ return result;
+
+ computed = true;
+ result = NAN;
+ cpuinfo = fopen("/proc/cpuinfo", "r");
+ if (!cpuinfo) {
+ pr_err("Failed to read /proc/cpuinfo for TSC frequency");
+ return NAN;
+ }
+ while (getline(&line, &len, cpuinfo) > 0) {
+ if (!strncmp(line, "model name", 10)) {
+ char *pos = strstr(line + 11, " @ ");
+
+ if (pos && sscanf(pos, " @ %lfGHz", &result) == 1) {
+ result *= 1000000000;
+ goto out;
+ }
+ }
+ }
+
+out:
+ if (isnan(result))
+ pr_err("Failed to find TSC frequency in /proc/cpuinfo");
+
+ free(line);
+ fclose(cpuinfo);
+ return result;
+}
+
double expr__get_literal(const char *literal)
{
static struct cpu_topology *topology;
@@ -417,6 +462,11 @@ double expr__get_literal(const char *literal)
goto out;
}

+ if (!strcasecmp("#system_tsc_freq", literal)) {
+ result = system_tsc_freq();
+ goto out;
+ }
+
/*
* Assume that topology strings are consistent, such as CPUs "0-1"
* wouldn't be listed as "0,1", and so after deduplication the number of
--
2.36.1.124.g0e6072fb45-goog



2022-05-28 18:33:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2] perf metrics: Add literal for system TSC frequency

On Thu, May 26, 2022 at 09:04:07PM -0700, Ian Rogers wrote:
> Such a literal is useful to calculate things like the average frequency
> [1]. The TSC frequency isn't exposed by sysfs although some experimental
> drivers look to add it [2]. This change computes the value using the
> frequency in /proc/cpuinfo which is accurate at least on Intel
> processors.
>
> v2. Adds warnings to make clear if things have changed/broken on future
> Intel platforms. It also adds caching and an Intel specific that a
> value is computed.
>
> [1] https://github.com/intel/perfmon-metrics/blob/5ad9ef7056f31075e8178b9f1fb732af183b2c8d/SKX/metrics/perf/skx_metric_perf.json#L11
> [2] https://github.com/trailofbits/tsc_freq_khz

This all seems bonghits inspired... and perf actually does expose the
tsc frequency. What do you think is in perf_event_mmap_page::time_* ?

>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/tests/expr.c | 15 +++++++++++++
> tools/perf/util/expr.c | 50 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 65 insertions(+)
>
> diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
> index 5c0032fe93ae..45afe4f24859 100644
> --- a/tools/perf/tests/expr.c
> +++ b/tools/perf/tests/expr.c
> @@ -1,8 +1,10 @@
> // SPDX-License-Identifier: GPL-2.0
> #include "util/debug.h"
> #include "util/expr.h"
> +#include "util/header.h"
> #include "util/smt.h"
> #include "tests.h"
> +#include <math.h>
> #include <stdlib.h>
> #include <string.h>
> #include <linux/zalloc.h>
> @@ -69,6 +71,11 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
> double val, num_cpus, num_cores, num_dies, num_packages;
> int ret;
> struct expr_parse_ctx *ctx;
> + bool is_intel = false;
> + char buf[128];
> +
> + if (!get_cpuid(buf, sizeof(buf)))
> + is_intel = strstr(buf, "Intel") != NULL;
>
> TEST_ASSERT_EQUAL("ids_union", test_ids_union(), 0);
>
> @@ -175,6 +182,14 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
> if (num_dies) // Some platforms do not have CPU die support, for example s390
> TEST_ASSERT_VAL("#num_dies >= #num_packages", num_dies >= num_packages);
>
> + if (is_intel) {
> + double system_tsc_freq;
> +
> + TEST_ASSERT_VAL("#system_tsc_freq", expr__parse(&system_tsc_freq, ctx,
> + "#system_tsc_freq") == 0);
> + TEST_ASSERT_VAL("!isnan(#system_tsc_freq)", !isnan(system_tsc_freq));
> + }
> +
> /*
> * Source count returns the number of events aggregating in a leader
> * event including the leader. Check parsing yields an id.
> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> index 675f318ce7c1..f33aeb1e6faa 100644
> --- a/tools/perf/util/expr.c
> +++ b/tools/perf/util/expr.c
> @@ -402,6 +402,51 @@ double expr_id_data__source_count(const struct expr_id_data *data)
> return data->val.source_count;
> }
>
> +/*
> + * Derive the TSC frequency in Hz from the /proc/cpuinfo, for example:
> + * ...
> + * model name : Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
> + * ...
> + * will return 3000000000.
> + */
> +static double system_tsc_freq(void)
> +{
> + static double result;
> + static bool computed;
> + FILE *cpuinfo;
> + char *line = NULL;
> + size_t len = 0;
> +
> + if (computed)
> + return result;
> +
> + computed = true;
> + result = NAN;
> + cpuinfo = fopen("/proc/cpuinfo", "r");
> + if (!cpuinfo) {
> + pr_err("Failed to read /proc/cpuinfo for TSC frequency");
> + return NAN;
> + }
> + while (getline(&line, &len, cpuinfo) > 0) {
> + if (!strncmp(line, "model name", 10)) {
> + char *pos = strstr(line + 11, " @ ");
> +
> + if (pos && sscanf(pos, " @ %lfGHz", &result) == 1) {
> + result *= 1000000000;
> + goto out;
> + }
> + }
> + }
> +
> +out:
> + if (isnan(result))
> + pr_err("Failed to find TSC frequency in /proc/cpuinfo");
> +
> + free(line);
> + fclose(cpuinfo);
> + return result;
> +}
> +
> double expr__get_literal(const char *literal)
> {
> static struct cpu_topology *topology;
> @@ -417,6 +462,11 @@ double expr__get_literal(const char *literal)
> goto out;
> }
>
> + if (!strcasecmp("#system_tsc_freq", literal)) {
> + result = system_tsc_freq();
> + goto out;
> + }
> +
> /*
> * Assume that topology strings are consistent, such as CPUs "0-1"
> * wouldn't be listed as "0,1", and so after deduplication the number of
> --
> 2.36.1.124.g0e6072fb45-goog
>

2022-05-28 19:12:44

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v2] perf metrics: Add literal for system TSC frequency

On Fri, May 27, 2022 at 7:54 AM Andi Kleen <[email protected]> wrote:
>
>
> On 5/27/2022 2:49 AM, Peter Zijlstra wrote:
> > On Thu, May 26, 2022 at 09:04:07PM -0700, Ian Rogers wrote:
> >> Such a literal is useful to calculate things like the average frequency
> >> [1]. The TSC frequency isn't exposed by sysfs although some experimental
> >> drivers look to add it [2]. This change computes the value using the
> >> frequency in /proc/cpuinfo which is accurate at least on Intel
> >> processors.
>
> It would be better to use CPUID if available which is far more accurate.
> Also I believe newer Intel CPUs drop the frequency in the brand string.

But on v1 you also said it was broken for certain architectures. The
point of the tests and warnings is to identify this case.

> BTW it also has the problem that if perf script is run on some other
> system to compute metrics it won't work.

Yep. We don't yet support recording then reporting metrics on a
different system. There would be a bunch to do to support this.

> >
> > This all seems bonghits inspired... and perf actually does expose the
> > tsc frequency. What do you think is in perf_event_mmap_page::time_* ?
>
>
> That's not really available to perf stat, which is the primary metrics user.

So we have 3 approaches:

1) read /proc/cpuinfo: pros: easy in comparison to other approaches.
cons: will break in the future, will need a cross system strategy
2) use cpuid: pros: can be made to work across systems cons: not
available for all Intel architectures
3) mmap: pros: more stable API than cpuinfo vendor string. cons: could
end up with new events purely to gather a TSC frequency and associated
complexity around permissions, etc.

I feel we can bike shed and go between the 3 approaches almost
endlessly. I'd rather get something in and iterate. How the approach
breaks can motivate the next steps and the tests added here will
capture when it does break, not just when a metric is first used. I
mentioned this on V1 and requested feedback, this is just implementing
what I said I'd do to V1.

Thanks,
Ian

> -Andi
>
>
>

2022-05-28 19:45:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2] perf metrics: Add literal for system TSC frequency

On Sat, May 28, 2022 at 07:50:40AM -0700, Ian Rogers wrote:
> On Sat, May 28, 2022 at 7:02 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Fri, May 27, 2022 at 07:54:19AM -0700, Andi Kleen wrote:
> >
> > > > This all seems bonghits inspired... and perf actually does expose the
> > > > tsc frequency. What do you think is in perf_event_mmap_page::time_* ?
> > >
> > >
> > > That's not really available to perf stat, which is the primary metrics user.
> >
> > Why not? You can mmap any perf-fd (even software events) and these
> > fields should be filled out.
> >
> > It should work on any x86 CPU that has a TSC. The only caveat is that
> > the kernel must not have marked the TSC unstable.
> >
> > It could even work for virt -- all you need is for virt to use
> > native_sched_clock() instead of the paravirt nonsense.
>
> It will at least fail if inherit is enabled, no?

For per-task events, yes, I suppose it will. I'd forgotten about that
restriction on perf_mmap() :/

2022-05-28 19:57:23

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v2] perf metrics: Add literal for system TSC frequency

On Sat, May 28, 2022 at 7:02 AM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, May 27, 2022 at 07:54:19AM -0700, Andi Kleen wrote:
>
> > > This all seems bonghits inspired... and perf actually does expose the
> > > tsc frequency. What do you think is in perf_event_mmap_page::time_* ?
> >
> >
> > That's not really available to perf stat, which is the primary metrics user.
>
> Why not? You can mmap any perf-fd (even software events) and these
> fields should be filled out.
>
> It should work on any x86 CPU that has a TSC. The only caveat is that
> the kernel must not have marked the TSC unstable.
>
> It could even work for virt -- all you need is for virt to use
> native_sched_clock() instead of the paravirt nonsense.

It will at least fail if inherit is enabled, no?

Thanks,
Ian

2022-05-28 20:37:58

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2] perf metrics: Add literal for system TSC frequency


On 5/27/2022 2:49 AM, Peter Zijlstra wrote:
> On Thu, May 26, 2022 at 09:04:07PM -0700, Ian Rogers wrote:
>> Such a literal is useful to calculate things like the average frequency
>> [1]. The TSC frequency isn't exposed by sysfs although some experimental
>> drivers look to add it [2]. This change computes the value using the
>> frequency in /proc/cpuinfo which is accurate at least on Intel
>> processors.

It would be better to use CPUID if available which is far more accurate.
Also I believe newer Intel CPUs drop the frequency in the brand string.

BTW it also has the problem that if perf script is run on some other
system to compute metrics it won't work.

>
> This all seems bonghits inspired... and perf actually does expose the
> tsc frequency. What do you think is in perf_event_mmap_page::time_* ?


That's not really available to perf stat, which is the primary metrics user.


-Andi




2022-05-28 20:42:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2] perf metrics: Add literal for system TSC frequency

On Fri, May 27, 2022 at 07:54:19AM -0700, Andi Kleen wrote:

> > This all seems bonghits inspired... and perf actually does expose the
> > tsc frequency. What do you think is in perf_event_mmap_page::time_* ?
>
>
> That's not really available to perf stat, which is the primary metrics user.

Why not? You can mmap any perf-fd (even software events) and these
fields should be filled out.

It should work on any x86 CPU that has a TSC. The only caveat is that
the kernel must not have marked the TSC unstable.

It could even work for virt -- all you need is for virt to use
native_sched_clock() instead of the paravirt nonsense.