LinuxLists.cc - [PATCH 0/3]measure SMI cost

2017-03-23 18:36:54

Subject: [PATCH 0/3]measure SMI cost

From: Kan Liang <[email protected]>

Currently, there is no way to measure the time cost in System management
mode (SMM) by perf.

Intel perfmon supports FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. Once it sets,
the PMU core counters will freeze on SMI handler. But it will not have an
effect on free running counters. E.g. APERF counter.
The cost of SMI can be measured by (aperf - cycles).

A new sysfs entry /sys/device/cpu/freeze_on_smi is introduced to set
FREEZE_WHILE_SMM bit in IA32_DEBUGCTL.

A new --smi-cost mode in perf stat is implemented to measure the SMI cost
by calculating cycles and aperf results. In practice, the percentages of
SMI cycles should be more useful than absolute value. So the output will be
the percentage of SMI cycles and SMI#.

Here is an example output.

Performance counter stats for 'sudo echo ':

SMI cycles% SMI#
0.1% 1

0.010858678 seconds time elapsed

Kan Liang (3):
perf/x86: add sysfs entry to freeze counter on SMI
tools lib api fs: Add sysfs__write_int function
perf stat: Add support to measure SMI cost

arch/x86/events/core.c | 10 +++++++
arch/x86/events/intel/core.c | 48 ++++++++++++++++++++++++++++++++++
arch/x86/events/perf_event.h | 3 +++
arch/x86/include/asm/msr-index.h | 1 +
tools/lib/api/fs/fs.c | 29 ++++++++++++++++++++
tools/lib/api/fs/fs.h | 4 +++
tools/perf/Documentation/perf-stat.txt | 9 +++++++
tools/perf/builtin-stat.c | 48 ++++++++++++++++++++++++++++++++++
tools/perf/util/stat-shadow.c | 33 +++++++++++++++++++++++
tools/perf/util/stat.c | 2 ++
tools/perf/util/stat.h | 2 ++
11 files changed, 189 insertions(+)

--
2.7.4

2017-03-23 18:36:57

by Liang, Kan

[permalink] [raw]

Subject: [PATCH 3/3] perf stat: Add support to measure SMI cost

From: Kan Liang <[email protected]>

Implementing a new --smi-cost mode in perf stat to measure SMI cost.
During the measurement, the /sys/device/cpu/freeze_on_smi will be set.
The measurement can be done with one counter and two free running MSR
counters (IA32_APERF and SMI_COUNT).

The formula to caculate SMI cost is as below.
The percentages of SMI cycles = (aperf - cycles) / aperf

metric_only will be set by default unless the user explicitly disable
it.

Here is an example output.

Performance counter stats for 'sudo echo ':

SMI cycles% SMI#
0.1% 1

0.010858678 seconds time elapsed

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/Documentation/perf-stat.txt | 9 +++++++
tools/perf/builtin-stat.c | 48 ++++++++++++++++++++++++++++++++++
tools/perf/util/stat-shadow.c | 33 +++++++++++++++++++++++
tools/perf/util/stat.c | 2 ++
tools/perf/util/stat.h | 2 ++
5 files changed, 94 insertions(+)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 9785481..d8df8e9 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -236,6 +236,15 @@ To interpret the results it is usually needed to know on which
CPUs the workload runs on. If needed the CPUs can be forced using
taskset.

+--smi-cost::
+Measure SMI cost if support and SMI is detected
+
+During the measurement, the /sys/device/cpu/freeze_on_smi will be
+set to freeze cycles counter on SMI. The aperf event will not be
+effected by freeze_on_smi.
+
+The percentages of SMI cycles = (aperf - cycles) / aperf
+
EXAMPLES
--------

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f53f449..f10aad6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -76,6 +76,7 @@
#define DEFAULT_SEPARATOR " "
#define CNTR_NOT_SUPPORTED "<not supported>"
#define CNTR_NOT_COUNTED "<not counted>"
+#define FREEZE_ON_SMI_PATH "devices/cpu/freeze_on_smi"

static void print_counters(struct timespec *ts, int argc, const char **argv);

@@ -112,6 +113,14 @@ static const char * topdown_attrs[] = {
NULL,
};

+static const char *smi_cost_attrs = {
+ "{"
+ "msr/aperf/,"
+ "msr/smi/,"
+ "cycles"
+ "}"
+};
+
static struct perf_evlist *evsel_list;

static struct target target = {
@@ -127,6 +136,8 @@ static bool null_run = false;
static int detailed_run = 0;
static bool transaction_run;
static bool topdown_run = false;
+static bool smi_cost = false;
+static bool smi_reset = false;
static bool big_num = true;
static int big_num_opt = -1;
static const char *csv_sep = NULL;
@@ -1670,6 +1681,8 @@ static const struct option stat_options[] = {
"Only print computed metrics. No raw values", enable_metric_only),
OPT_BOOLEAN(0, "topdown", &topdown_run,
"measure topdown level 1 statistics"),
+ OPT_BOOLEAN(0, "smi-cost", &smi_cost,
+ "measure SMI cost"),
OPT_END()
};

@@ -2048,6 +2061,38 @@ static int add_default_attributes(void)
return 0;
}

+ if (smi_cost) {
+ int smi;
+
+ if (sysfs__read_int(FREEZE_ON_SMI_PATH, &smi) < 0) {
+ fprintf(stderr, "freeze_on_smi is not supported.\n");
+ return -1;
+ }
+
+ if (!smi) {
+ if (sysfs__write_int(FREEZE_ON_SMI_PATH, 1) < 0) {
+ fprintf(stderr, "Failed to set freeze_on_smi.\n");
+ return -1;
+ }
+ smi_reset = true;
+ }
+
+ if (pmu_have_event("msr", "aperf") &&
+ pmu_have_event("msr", "smi")) {
+ if (!force_metric_only)
+ metric_only = true;
+ err = parse_events(evsel_list, smi_cost_attrs, NULL);
+ } else {
+ fprintf(stderr, "To measure SMI cost, it needs msr/aperf/, msr/smi/ and cpu/cycles/ events support\n");
+ return -1;
+ }
+ if (err) {
+ fprintf(stderr, "Cannot set up SMI cost events\n");
+ return -1;
+ }
+ return 0;
+ }
+
if (topdown_run) {
char *str = NULL;
bool warn = false;
@@ -2629,6 +2674,9 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
perf_stat__exit_aggr_mode();
perf_evlist__free_stats(evsel_list);
out:
+ if (smi_cost && smi_reset)
+ sysfs__write_int(FREEZE_ON_SMI_PATH, 0);
+
perf_evlist__delete(evsel_list);
return status;
}
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 8a2bbd2..dda7877 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -41,6 +41,8 @@ static struct stats runtime_topdown_slots_issued[NUM_CTX][MAX_NR_CPUS];
static struct stats runtime_topdown_slots_retired[NUM_CTX][MAX_NR_CPUS];
static struct stats runtime_topdown_fetch_bubbles[NUM_CTX][MAX_NR_CPUS];
static struct stats runtime_topdown_recovery_bubbles[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_smi_num_stats[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_aperf_stats[NUM_CTX][MAX_NR_CPUS];
static bool have_frontend_stalled;

struct stats walltime_nsecs_stats;
@@ -92,6 +94,9 @@ void perf_stat__reset_shadow_stats(void)
memset(runtime_topdown_slots_issued, 0, sizeof(runtime_topdown_slots_issued));
memset(runtime_topdown_fetch_bubbles, 0, sizeof(runtime_topdown_fetch_bubbles));
memset(runtime_topdown_recovery_bubbles, 0, sizeof(runtime_topdown_recovery_bubbles));
+
+ memset(runtime_smi_num_stats, 0, sizeof(runtime_smi_num_stats));
+ memset(runtime_aperf_stats, 0, sizeof(runtime_aperf_stats));
}

/*
@@ -143,6 +148,10 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 *count,
update_stats(&runtime_dtlb_cache_stats[ctx][cpu], count[0]);
else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
update_stats(&runtime_itlb_cache_stats[ctx][cpu], count[0]);
+ else if (perf_stat_evsel__is(counter, SMI_NUM))
+ update_stats(&runtime_smi_num_stats[ctx][cpu], count[0]);
+ else if (perf_stat_evsel__is(counter, APERF))
+ update_stats(&runtime_aperf_stats[ctx][cpu], count[0]);
}

/* used for get_ratio_color() */
@@ -423,6 +432,25 @@ static double td_be_bound(int ctx, int cpu)
return sanitize_val(1.0 - sum);
}

+static void print_smi_cost(int cpu, struct perf_evsel *evsel,
+ struct perf_stat_output_ctx *out)
+{
+ double smi_num, aperf, cycles, cost = 0.0;
+ int ctx = evsel_context(evsel);
+ const char *color = NULL;
+
+ smi_num = avg_stats(&runtime_smi_num_stats[ctx][cpu]);
+ aperf = avg_stats(&runtime_aperf_stats[ctx][cpu]);
+ cycles = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+
+ cost = (aperf - cycles) / aperf * 100.00;
+
+ if (cost > 10)
+ color = PERF_COLOR_RED;
+ out->print_metric(out->ctx, color, "%8.1f%%", "SMI cycles%", cost);
+ out->print_metric(out->ctx, NULL, "%4.0f", "SMI#", smi_num);
+}
+
void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
double avg, int cpu,
struct perf_stat_output_ctx *out)
@@ -628,6 +656,11 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
}
snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
+ } else if (perf_stat_evsel__is(evsel, SMI_NUM)) {
+ if (avg_stats(&runtime_smi_num_stats[ctx][cpu]))
+ print_smi_cost(cpu, evsel, out);
+ else
+ print_metric(ctxp, NULL, NULL, " no SMI detected", 0);
} else {
print_metric(ctxp, NULL, NULL, NULL, 0);
}
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 0d51334..8ae3160f 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -84,6 +84,8 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
ID(TOPDOWN_SLOTS_RETIRED, topdown-slots-retired),
ID(TOPDOWN_FETCH_BUBBLES, topdown-fetch-bubbles),
ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
+ ID(SMI_NUM, msr/smi/),
+ ID(APERF, msr/aperf/),
};
#undef ID

diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index c29bb94..c3be07d 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -22,6 +22,8 @@ enum perf_stat_evsel_id {
PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_RETIRED,
PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_BUBBLES,
PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
+ PERF_STAT_EVSEL_ID__SMI_NUM,
+ PERF_STAT_EVSEL_ID__APERF,
PERF_STAT_EVSEL_ID__MAX,
};

--
2.7.4

2017-03-23 18:37:11

by Liang, Kan

[permalink] [raw]

Subject: [PATCH 2/3] tools lib api fs: Add sysfs__write_int function

From: Kan Liang <[email protected]>

Adding sysfs__write_int function to ease up writing int to sysfs.
New interface is:

int sysfs__write_int(const char *entry, int value);

Also, introducing filename__write_int which is useful for new helpers to
write sysctl values.

Signed-off-by: Kan Liang <[email protected]>
---
tools/lib/api/fs/fs.c | 29 +++++++++++++++++++++++++++++
tools/lib/api/fs/fs.h | 4 ++++
2 files changed, 33 insertions(+)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 809c772..de4f74e 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -387,6 +387,22 @@ int filename__read_str(const char *filename, char **buf, size_t *sizep)
return err;
}

+int filename__write_int(const char *filename, int value)
+{
+ int fd = open(filename, O_WRONLY), err = -1;
+ char buf[64];
+
+ if (fd < 0)
+ return err;
+
+ sprintf(buf, "%d", value);
+ if (write(fd, buf, sizeof(buf)) == sizeof(buf))
+ err = 0;
+
+ close(fd);
+ return err;
+}
+
int procfs__read_str(const char *entry, char **buf, size_t *sizep)
{
char path[PATH_MAX];
@@ -480,3 +496,16 @@ int sysctl__read_int(const char *sysctl, int *value)

return filename__read_int(path, value);
}
+
+int sysfs__write_int(const char *entry, int value)
+{
+ char path[PATH_MAX];
+ const char *sysfs = sysfs__mountpoint();
+
+ if (!sysfs)
+ return -1;
+
+ snprintf(path, sizeof(path), "%s/%s", sysfs, entry);
+
+ return filename__write_int(path, value);
+}
diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
index 956c211..4560534 100644
--- a/tools/lib/api/fs/fs.h
+++ b/tools/lib/api/fs/fs.h
@@ -31,6 +31,8 @@ int filename__read_int(const char *filename, int *value);
int filename__read_ull(const char *filename, unsigned long long *value);
int filename__read_str(const char *filename, char **buf, size_t *sizep);

+int filename__write_int(const char *filename, int value);
+
int procfs__read_str(const char *entry, char **buf, size_t *sizep);

int sysctl__read_int(const char *sysctl, int *value);
@@ -38,4 +40,6 @@ int sysfs__read_int(const char *entry, int *value);
int sysfs__read_ull(const char *entry, unsigned long long *value);
int sysfs__read_str(const char *entry, char **buf, size_t *sizep);
int sysfs__read_bool(const char *entry, bool *value);
+
+int sysfs__write_int(const char *entry, int value);
#endif /* __API_FS__ */
--
2.7.4

2017-03-23 18:37:29

by Liang, Kan

[permalink] [raw]

Subject: [PATCH 1/3] perf/x86: add sysfs entry to freeze counter on SMI

From: Kan Liang <[email protected]>

When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all performance
counters will be effected. There is no way to do per-counter freeze
on smi. So it should not use the per-event interface (e.g. ioctl or
event attribute) to set FREEZE_WHILE_SMM bit.

Adds sysfs entry /sys/device/cpu/freeze_on_smi to set FREEZE_WHILE_SMM
bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages
while in SMM.
Value has to be 0 or 1. It will be applied to all possible cpus.

Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/core.c | 10 +++++++++
arch/x86/events/intel/core.c | 48 ++++++++++++++++++++++++++++++++++++++++
arch/x86/events/perf_event.h | 3 +++
arch/x86/include/asm/msr-index.h | 1 +
4 files changed, 62 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 349d4d1..c16fb50 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1750,6 +1750,8 @@ ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event)
return ret;
}

+static struct attribute_group x86_pmu_attr_group;
+
static int __init init_hw_perf_events(void)
{
struct x86_pmu_quirk *quirk;
@@ -1813,6 +1815,14 @@ static int __init init_hw_perf_events(void)
x86_pmu_events_group.attrs = tmp;
}

+ if (x86_pmu.attrs) {
+ struct attribute **tmp;
+
+ tmp = merge_attr(x86_pmu_attr_group.attrs, x86_pmu.attrs);
+ if (!WARN_ON(!tmp))
+ x86_pmu_attr_group.attrs = tmp;
+ }
+
pr_info("... version: %d\n", x86_pmu.version);
pr_info("... bit width: %d\n", x86_pmu.cntval_bits);
pr_info("... generic registers: %d\n", x86_pmu.num_counters);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4244bed..a99a4ea 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3595,6 +3595,52 @@ static struct attribute *hsw_events_attrs[] = {
NULL
};

+static ssize_t freeze_on_smi_show(struct device *cdev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%d\n", x86_pmu.attr_freeze_on_smi);
+}
+
+static ssize_t freeze_on_smi_store(struct device *cdev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned long val;
+ u64 debugctlmsr;
+ ssize_t ret;
+ int cpu;
+
+ ret = kstrtoul(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ if (val > 1)
+ return -EINVAL;
+
+ if (x86_pmu.attr_freeze_on_smi == val)
+ return count;
+
+ for_each_possible_cpu(cpu) {
+ rdmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR, &debugctlmsr);
+ if (val)
+ wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR, debugctlmsr | DEBUGCTLMSR_FREEZE_WHILE_SMM);
+ else
+ wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR, debugctlmsr & ~DEBUGCTLMSR_FREEZE_WHILE_SMM);
+ }
+
+ x86_pmu.attr_freeze_on_smi = val;
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(freeze_on_smi);
+
+static struct attribute *intel_pmu_attrs[] = {
+ &dev_attr_freeze_on_smi.attr,
+ NULL,
+};
+
__init int intel_pmu_init(void)
{
union cpuid10_edx edx;
@@ -3641,6 +3687,8 @@ __init int intel_pmu_init(void)

x86_pmu.max_pebs_events = min_t(unsigned, MAX_PEBS_EVENTS, x86_pmu.num_counters);

+
+ x86_pmu.attrs = intel_pmu_attrs;
/*
* Quirk: v2 perfmon does not report fixed-purpose events, so
* assume at least 3 events, when not running in a hypervisor:
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index bcbb1d2..110cb9b0 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -561,6 +561,9 @@ struct x86_pmu {
ssize_t (*events_sysfs_show)(char *page, u64 config);
struct attribute **cpu_events;

+ int attr_freeze_on_smi;
+ struct attribute **attrs;
+
/*
* CPU Hotplug hooks
*/
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d8b5f8a..26c861f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -134,6 +134,7 @@
#define DEBUGCTLMSR_BTS_OFF_OS (1UL << 9)
#define DEBUGCTLMSR_BTS_OFF_USR (1UL << 10)
#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1UL << 11)
+#define DEBUGCTLMSR_FREEZE_WHILE_SMM (1UL << 14)

#define MSR_PEBS_FRONTEND 0x000003f7

--
2.7.4

2017-03-23 20:31:59

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 1/3] perf/x86: add sysfs entry to freeze counter on SMI

On Thu, Mar 23, 2017 at 11:25:49AM -0700, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all performance
> counters will be effected. There is no way to do per-counter freeze
> on smi. So it should not use the per-event interface (e.g. ioctl or
> event attribute) to set FREEZE_WHILE_SMM bit.
>
> Adds sysfs entry /sys/device/cpu/freeze_on_smi to set FREEZE_WHILE_SMM
> bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages
> while in SMM.
> Value has to be 0 or 1. It will be applied to all possible cpus.

So is there ever a good reason to not set this?

2017-03-23 20:33:11

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 1/3] perf/x86: add sysfs entry to freeze counter on SMI

On Thu, Mar 23, 2017 at 11:25:49AM -0700, [email protected] wrote:
> + for_each_possible_cpu(cpu) {
> + rdmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR, &debugctlmsr);
> + if (val)
> + wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR, debugctlmsr | DEBUGCTLMSR_FREEZE_WHILE_SMM);
> + else
> + wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR, debugctlmsr & ~DEBUGCTLMSR_FREEZE_WHILE_SMM);
> + }

No; that's just disgusting. Also {rd,wr}msr_on_cpu() should die, exactly
because people end up writing crap like this.

2017-03-23 20:48:13

by Liang, Kan

[permalink] [raw]

Subject: RE: [PATCH 1/3] perf/x86: add sysfs entry to freeze counter on SMI

> On Thu, Mar 23, 2017 at 11:25:49AM -0700, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all
> performance
> > counters will be effected. There is no way to do per-counter freeze on
> > smi. So it should not use the per-event interface (e.g. ioctl or event
> > attribute) to set FREEZE_WHILE_SMM bit.
> >
> > Adds sysfs entry /sys/device/cpu/freeze_on_smi to set
> FREEZE_WHILE_SMM
> > bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages
> > while in SMM.
> > Value has to be 0 or 1. It will be applied to all possible cpus.
>
> So is there ever a good reason to not set this?

For me, I don't see any drawbacks to set it unconditionally.
But I'm not sure if there is someone else who may want the counter
running in SMI.

If there is no objection, I will set the FREEZE_WHILE_SMM bit
unconditionally in next version.

Thanks,
Kan

2017-03-23 22:23:08

by Andi Kleen

[permalink] [raw]

Subject: Re: [PATCH 1/3] perf/x86: add sysfs entry to freeze counter on SMI

On Thu, Mar 23, 2017 at 09:31:38PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 23, 2017 at 11:25:49AM -0700, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all performance
> > counters will be effected. There is no way to do per-counter freeze
> > on smi. So it should not use the per-event interface (e.g. ioctl or
> > event attribute) to set FREEZE_WHILE_SMM bit.
> >
> > Adds sysfs entry /sys/device/cpu/freeze_on_smi to set FREEZE_WHILE_SMM
> > bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages
> > while in SMM.
> > Value has to be 0 or 1. It will be applied to all possible cpus.
>
> So is there ever a good reason to not set this?

That means SMIs become invisible to most performance counters.

I don't think that's a good default. If the SMI takes 1% of my
cycles I want to see it.

The masking trick is mainly useful when doing --smi-cost

-Andi

2017-03-24 08:17:53

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 1/3] perf/x86: add sysfs entry to freeze counter on SMI

On Thu, Mar 23, 2017 at 03:23:03PM -0700, Andi Kleen wrote:
> On Thu, Mar 23, 2017 at 09:31:38PM +0100, Peter Zijlstra wrote:
> > On Thu, Mar 23, 2017 at 11:25:49AM -0700, [email protected] wrote:
> > > From: Kan Liang <[email protected]>
> > >
> > > When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all performance
> > > counters will be effected. There is no way to do per-counter freeze
> > > on smi. So it should not use the per-event interface (e.g. ioctl or
> > > event attribute) to set FREEZE_WHILE_SMM bit.
> > >
> > > Adds sysfs entry /sys/device/cpu/freeze_on_smi to set FREEZE_WHILE_SMM
> > > bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages
> > > while in SMM.
> > > Value has to be 0 or 1. It will be applied to all possible cpus.
> >
> > So is there ever a good reason to not set this?
>
> That means SMIs become invisible to most performance counters.
>
> I don't think that's a good default. If the SMI takes 1% of my
> cycles I want to see it.
>
> The masking trick is mainly useful when doing --smi-cost

Changelog should spell this out though. It adds a knob, so it should say
why it needs be a knob.

2017-03-24 08:34:03

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [PATCH 1/3] perf/x86: add sysfs entry to freeze counter on SMI

On Thu, 23 Mar 2017, Peter Zijlstra wrote:
> On Thu, Mar 23, 2017 at 11:25:49AM -0700, [email protected] wrote:
> > + for_each_possible_cpu(cpu) {
> > + rdmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR, &debugctlmsr);
> > + if (val)
> > + wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR, debugctlmsr | DEBUGCTLMSR_FREEZE_WHILE_SMM);
> > + else
> > + wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR, debugctlmsr & ~DEBUGCTLMSR_FREEZE_WHILE_SMM);
> > + }
>
> No; that's just disgusting. Also {rd,wr}msr_on_cpu() should die, exactly
> because people end up writing crap like this.

Aside of that this is completely broken against other users of DEBUGCTLMSR
because it's not atomic vs. the other modifications.

Thanks,

tglx

2017-03-24 08:44:56

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [PATCH 0/3]measure SMI cost

On Thu, 23 Mar 2017, [email protected] wrote:

> From: Kan Liang <[email protected]>
>
> Currently, there is no way to measure the time cost in System management
> mode (SMM) by perf.
>
> Intel perfmon supports FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. Once it sets,
> the PMU core counters will freeze on SMI handler. But it will not have an
> effect on free running counters. E.g. APERF counter.
> The cost of SMI can be measured by (aperf - cycles).
>
> A new sysfs entry /sys/device/cpu/freeze_on_smi is introduced to set
> FREEZE_WHILE_SMM bit in IA32_DEBUGCTL.
>
> A new --smi-cost mode in perf stat is implemented to measure the SMI cost
> by calculating cycles and aperf results. In practice, the percentages of
> SMI cycles should be more useful than absolute value.

That's only true for performance oriented analysis, but for analyzing the
root cause of latencies the actual cycles are definitely interesting.

Thanks,

tglx

2017-03-24 11:41:03

by Andi Kleen

[permalink] [raw]

Subject: Re: [PATCH 0/3]measure SMI cost

> > A new --smi-cost mode in perf stat is implemented to measure the SMI cost
> > by calculating cycles and aperf results. In practice, the percentages of
> > SMI cycles should be more useful than absolute value.
>
> That's only true for performance oriented analysis, but for analyzing the
> root cause of latencies the actual cycles are definitely interesting.

perf stat also prints the absolute cycles of course (unless you do --metric-only)

It cannot print individual cycles (per SMI occurrence), the only
way to do that would be to poll constantly.

-Andi

2017-03-24 12:08:04

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [PATCH 0/3]measure SMI cost

On Fri, 24 Mar 2017, Andi Kleen wrote:

> > > A new --smi-cost mode in perf stat is implemented to measure the SMI cost
> > > by calculating cycles and aperf results. In practice, the percentages of
> > > SMI cycles should be more useful than absolute value.
> >
> > That's only true for performance oriented analysis, but for analyzing the
> > root cause of latencies the actual cycles are definitely interesting.
>
> perf stat also prints the absolute cycles of course (unless you do --metric-only)

So much for the theory. From the patch:

+ if (!force_metric_only)
+ metric_only = true;

> It cannot print individual cycles (per SMI occurrence), the only
> way to do that would be to poll constantly.

I'm well aware of that.

Thanks,

tglx

2017-03-24 14:14:31

by Liang, Kan

[permalink] [raw]

Subject: RE: [PATCH 0/3]measure SMI cost

>
> > > > A new --smi-cost mode in perf stat is implemented to measure the
> > > > SMI cost by calculating cycles and aperf results. In practice, the
> > > > percentages of SMI cycles should be more useful than absolute value.
> > >
> > > That's only true for performance oriented analysis, but for
> > > analyzing the root cause of latencies the actual cycles are definitely
> interesting.
> >
> > perf stat also prints the absolute cycles of course (unless you do
> > --metric-only)
>
> So much for the theory. From the patch:
>
> + if (!force_metric_only)
> + metric_only = true;
>

The metric_only will be set by default in the patch. If user wants to get
the actual cycles, they can apply --no-metric-only.

Do you want me to make it clear in the changelog? Or you just don't like
that "metric_only=true" is set by default?

Thanks,
Kan

2017-03-24 14:15:33

by Liang, Kan

[permalink] [raw]

Subject: RE: [PATCH 1/3] perf/x86: add sysfs entry to freeze counter on SMI

> On Thu, 23 Mar 2017, Peter Zijlstra wrote:
> > On Thu, Mar 23, 2017 at 11:25:49AM -0700, [email protected] wrote:
> > > + for_each_possible_cpu(cpu) {
> > > + rdmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR,
> &debugctlmsr);
> > > + if (val)
> > > + wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR,
> debugctlmsr | DEBUGCTLMSR_FREEZE_WHILE_SMM);
> > > + else
> > > + wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR,
> debugctlmsr & ~DEBUGCTLMSR_FREEZE_WHILE_SMM);
> > > + }
> >
> > No; that's just disgusting. Also {rd,wr}msr_on_cpu() should die,
> > exactly because people end up writing crap like this.
>
> Aside of that this is completely broken against other users of
> DEBUGCTLMSR because it's not atomic vs. the other modifications.
>

OK. I will change it.
I guess I need a way/function which can atomically rd,wr msr on all cpus.
Are there existing alternative ways/functions to do that?

Thanks,
Kan

2017-03-24 14:23:45

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 1/3] perf/x86: add sysfs entry to freeze counter on SMI

On Fri, Mar 24, 2017 at 02:15:16PM +0000, Liang, Kan wrote:
>
>
> > On Thu, 23 Mar 2017, Peter Zijlstra wrote:
> > > On Thu, Mar 23, 2017 at 11:25:49AM -0700, [email protected] wrote:
> > > > + for_each_possible_cpu(cpu) {
> > > > + rdmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR,
> > &debugctlmsr);
> > > > + if (val)
> > > > + wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR,
> > debugctlmsr | DEBUGCTLMSR_FREEZE_WHILE_SMM);
> > > > + else
> > > > + wrmsrl_on_cpu(cpu, MSR_IA32_DEBUGCTLMSR,
> > debugctlmsr & ~DEBUGCTLMSR_FREEZE_WHILE_SMM);
> > > > + }
> > >
> > > No; that's just disgusting. Also {rd,wr}msr_on_cpu() should die,
> > > exactly because people end up writing crap like this.
> >
> > Aside of that this is completely broken against other users of
> > DEBUGCTLMSR because it's not atomic vs. the other modifications.
> >
>
> OK. I will change it.
> I guess I need a way/function which can atomically rd,wr msr on all cpus.
> Are there existing alternative ways/functions to do that?

No; also don't forget hotplug.

2017-03-28 03:38:49

by Elliott, Robert (Servers)

[permalink] [raw]

Subject: RE: [PATCH 2/3] tools lib api fs: Add sysfs__write_int function

> -----Original Message-----
> From: [email protected] [mailto:linux-kernel-
> [email protected]] On Behalf Of [email protected]
> Sent: Thursday, March 23, 2017 1:26 PM
> Subject: [PATCH 2/3] tools lib api fs: Add sysfs__write_int function
...
> diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
...
> +
> +int sysfs__write_int(const char *entry, int value)
> +{
> + char path[PATH_MAX];
> + const char *sysfs = sysfs__mountpoint();
> +
> + if (!sysfs)
> + return -1;
> +
> + snprintf(path, sizeof(path), "%s/%s", sysfs, entry);
> +
> + return filename__write_int(path, value);

In the unlikely event of an overflow, it would be safer to confirm that
the string fit into the path array (by using scnprintf()?) before trying
to open that path.

2017-03-31 21:51:19

by Stephane Eranian

[permalink] [raw]

Subject: Re: [PATCH 0/3]measure SMI cost

On Thu, Mar 23, 2017 at 11:25 AM, <[email protected]> wrote:
> From: Kan Liang <[email protected]>
>
> Currently, there is no way to measure the time cost in System management
> mode (SMM) by perf.
>
> Intel perfmon supports FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. Once it sets,
> the PMU core counters will freeze on SMI handler. But it will not have an
> effect on free running counters. E.g. APERF counter.
> The cost of SMI can be measured by (aperf - cycles).
>
> A new sysfs entry /sys/device/cpu/freeze_on_smi is introduced to set
> FREEZE_WHILE_SMM bit in IA32_DEBUGCTL.
>
> A new --smi-cost mode in perf stat is implemented to measure the SMI cost
> by calculating cycles and aperf results. In practice, the percentages of
> SMI cycles should be more useful than absolute value. So the output will be
> the percentage of SMI cycles and SMI#.
>
You are talking about the percentage of what cycles?
Wallclock, unhalted_ref_cycles, unhalted_core_cycles?
I
> Here is an example output.
>
> Performance counter stats for 'sudo echo ':
>
> SMI cycles% SMI#
> 0.1% 1
>
> 0.010858678 seconds time elapsed
>
>
> Kan Liang (3):
> perf/x86: add sysfs entry to freeze counter on SMI
> tools lib api fs: Add sysfs__write_int function
> perf stat: Add support to measure SMI cost
>
> arch/x86/events/core.c | 10 +++++++
> arch/x86/events/intel/core.c | 48 ++++++++++++++++++++++++++++++++++
> arch/x86/events/perf_event.h | 3 +++
> arch/x86/include/asm/msr-index.h | 1 +
> tools/lib/api/fs/fs.c | 29 ++++++++++++++++++++
> tools/lib/api/fs/fs.h | 4 +++
> tools/perf/Documentation/perf-stat.txt | 9 +++++++
> tools/perf/builtin-stat.c | 48 ++++++++++++++++++++++++++++++++++
> tools/perf/util/stat-shadow.c | 33 +++++++++++++++++++++++
> tools/perf/util/stat.c | 2 ++
> tools/perf/util/stat.h | 2 ++
> 11 files changed, 189 insertions(+)
>
> --
> 2.7.4
>

2017-04-01 01:41:13

by Liang, Kan

[permalink] [raw]

Subject: RE: [PATCH 0/3]measure SMI cost

> On Thu, Mar 23, 2017 at 11:25 AM, <[email protected]> wrote:
> > From: Kan Liang <[email protected]>
> >
> > Currently, there is no way to measure the time cost in System
> > management mode (SMM) by perf.
> >
> > Intel perfmon supports FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. Once
> it
> > sets, the PMU core counters will freeze on SMI handler. But it will
> > not have an effect on free running counters. E.g. APERF counter.
> > The cost of SMI can be measured by (aperf - cycles).
> >
> > A new sysfs entry /sys/device/cpu/freeze_on_smi is introduced to set
> > FREEZE_WHILE_SMM bit in IA32_DEBUGCTL.
> >
> > A new --smi-cost mode in perf stat is implemented to measure the SMI
> > cost by calculating cycles and aperf results. In practice, the
> > percentages of SMI cycles should be more useful than absolute value.
> > So the output will be the percentage of SMI cycles and SMI#.
> >
> You are talking about the percentage of what cycles?
> Wallclock, unhalted_ref_cycles, unhalted_core_cycles?

Unhalted core cycles.

Thanks,
Kan