From: Kan Liang <[email protected]>
Some metric groups, e.g. Page_Walks_Utilization, will never count when
NMI watchdog is enabled.
$echo 1 > /proc/sys/kernel/nmi_watchdog
$perf stat -M Page_Walks_Utilization
Performance counter stats for 'system wide':
<not counted> itlb_misses.walk_pending (0.00%)
<not counted> dtlb_load_misses.walk_pending (0.00%)
<not counted> dtlb_store_misses.walk_pending (0.00%)
<not counted> ept.walk_pending (0.00%)
<not counted> cycles (0.00%)
2.343460588 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
The events in group usually have to be from the same PMU. Try
reorganizing the group.
A metric group is a weak group, which relies on group validation
code in the kernel to determine whether to be opened as a group or
a non-group. However, group validation code may return false-positives,
especially when NMI watchdog is enabled. (The metric group is allowed
as a group but will never be scheduled.)
The attempt to fix the group validation code has been rejected.
https://lore.kernel.org/lkml/[email protected]/
Because we cannot accurately predict whether the group can be scheduled
as a group, only by checking current status.
This patch set provides another solution to mitigate the issue.
Add "MetricConstraint" in event list, which provides a hint for perf tool,
e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
metric group to non-group (standalone metrics) if NMI watchdog is enabled.
After applying the patch,
$echo 1 > /proc/sys/kernel/nmi_watchdog
$perf stat -M Page_Walks_Utilization
Splitting metric group Page_Walks_Utilization into standalone metrics.
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Performance counter stats for 'system wide':
18,253,454 itlb_misses.walk_pending # 0.0
Page_Walks_Utilization (50.55%)
78,051,525 dtlb_load_misses.walk_pending (50.55%)
29,213,063 dtlb_store_misses.walk_pending (50.55%)
0 ept.walk_pending (50.55%)
2,542,132,364 cycles (49.92%)
1.037095993 seconds time elapsed
Kan Liang (5):
perf jevents: Support metric constraint
perf metricgroup: Factor out metricgroup__add_metric_weak_group()
perf util: Factor out sysctl__nmi_watchdog_enabled()
perf metricgroup: Support metric constraint
perf vendor events: Add NO_NMI_WATCHDOG metric constraint
.../arch/x86/cascadelakex/clx-metrics.json | 3 +-
.../pmu-events/arch/x86/skylake/skl-metrics.json | 3 +-
.../pmu-events/arch/x86/skylakex/skx-metrics.json | 3 +-
tools/perf/pmu-events/jevents.c | 19 +++--
tools/perf/pmu-events/jevents.h | 2 +-
tools/perf/pmu-events/pmu-events.h | 1 +
tools/perf/util/metricgroup.c | 97 ++++++++++++++++------
tools/perf/util/stat-display.c | 6 +-
tools/perf/util/util.c | 18 ++++
tools/perf/util/util.h | 2 +
10 files changed, 116 insertions(+), 38 deletions(-)
--
2.7.4
From: Kan Liang <[email protected]>
Factor out metricgroup__add_metric_weak_group() which add metrics into a
weak group. The change can improve code readability. Because following
patch will introduce a function which add standalone metrics.
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/metricgroup.c | 57 +++++++++++++++++++++++++------------------
1 file changed, 33 insertions(+), 24 deletions(-)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 02aee94..1cd042c 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -399,13 +399,42 @@ void metricgroup__print(bool metrics, bool metricgroups, char *filter,
strlist__delete(metriclist);
}
+static void metricgroup__add_metric_weak_group(struct strbuf *events,
+ const char **ids,
+ int idnum)
+{
+ bool no_group = false;
+ int i;
+
+ for (i = 0; i < idnum; i++) {
+ pr_debug("found event %s\n", ids[i]);
+ /*
+ * Duration time maps to a software event and can make
+ * groups not count. Always use it outside a
+ * group.
+ */
+ if (!strcmp(ids[i], "duration_time")) {
+ if (i > 0)
+ strbuf_addf(events, "}:W,");
+ strbuf_addf(events, "duration_time");
+ no_group = true;
+ continue;
+ }
+ strbuf_addf(events, "%s%s",
+ i == 0 || no_group ? "{" : ",",
+ ids[i]);
+ no_group = false;
+ }
+ if (!no_group)
+ strbuf_addf(events, "}:W");
+}
+
static int metricgroup__add_metric(const char *metric, struct strbuf *events,
struct list_head *group_list)
{
struct pmu_events_map *map = perf_pmu__find_map(NULL);
struct pmu_event *pe;
- int ret = -EINVAL;
- int i, j;
+ int i, ret = -EINVAL;
if (!map)
return 0;
@@ -422,7 +451,6 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
const char **ids;
int idnum;
struct egroup *eg;
- bool no_group = false;
pr_debug("metric expr %s for %s\n", pe->metric_expr, pe->metric_name);
@@ -431,27 +459,8 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
continue;
if (events->len > 0)
strbuf_addf(events, ",");
- for (j = 0; j < idnum; j++) {
- pr_debug("found event %s\n", ids[j]);
- /*
- * Duration time maps to a software event and can make
- * groups not count. Always use it outside a
- * group.
- */
- if (!strcmp(ids[j], "duration_time")) {
- if (j > 0)
- strbuf_addf(events, "}:W,");
- strbuf_addf(events, "duration_time");
- no_group = true;
- continue;
- }
- strbuf_addf(events, "%s%s",
- j == 0 || no_group ? "{" : ",",
- ids[j]);
- no_group = false;
- }
- if (!no_group)
- strbuf_addf(events, "}:W");
+
+ metricgroup__add_metric_weak_group(events, ids, idnum);
eg = malloc(sizeof(struct egroup));
if (!eg) {
--
2.7.4
On Wed, Feb 19, 2020 at 11:08:35AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Some metric groups, e.g. Page_Walks_Utilization, will never count when
> NMI watchdog is enabled.
>
> $echo 1 > /proc/sys/kernel/nmi_watchdog
> $perf stat -M Page_Walks_Utilization
>
> Performance counter stats for 'system wide':
>
> <not counted> itlb_misses.walk_pending (0.00%)
> <not counted> dtlb_load_misses.walk_pending (0.00%)
> <not counted> dtlb_store_misses.walk_pending (0.00%)
> <not counted> ept.walk_pending (0.00%)
> <not counted> cycles (0.00%)
>
> 2.343460588 seconds time elapsed
>
> Some events weren't counted. Try disabling the NMI watchdog:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> The events in group usually have to be from the same PMU. Try
> reorganizing the group.
>
> A metric group is a weak group, which relies on group validation
> code in the kernel to determine whether to be opened as a group or
> a non-group. However, group validation code may return false-positives,
> especially when NMI watchdog is enabled. (The metric group is allowed
> as a group but will never be scheduled.)
>
> The attempt to fix the group validation code has been rejected.
> https://lore.kernel.org/lkml/[email protected]/
> Because we cannot accurately predict whether the group can be scheduled
> as a group, only by checking current status.
>
> This patch set provides another solution to mitigate the issue.
> Add "MetricConstraint" in event list, which provides a hint for perf tool,
> e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
> metric group to non-group (standalone metrics) if NMI watchdog is enabled.
the problem is in the missing counter, that's taken by NMI watchdog, right?
and it's problem for any metric that won't fit to the available
counters.. shouldn't we rather do this workaround for any metric
that wouldn't fit in available counters? not just for chosen
ones..?
thanks,
jirka
On 2/20/2020 6:39 AM, Jiri Olsa wrote:
> On Wed, Feb 19, 2020 at 11:08:35AM -0800, [email protected] wrote:
>> From: Kan Liang <[email protected]>
>>
>> Some metric groups, e.g. Page_Walks_Utilization, will never count when
>> NMI watchdog is enabled.
>>
>> $echo 1 > /proc/sys/kernel/nmi_watchdog
>> $perf stat -M Page_Walks_Utilization
>>
>> Performance counter stats for 'system wide':
>>
>> <not counted> itlb_misses.walk_pending (0.00%)
>> <not counted> dtlb_load_misses.walk_pending (0.00%)
>> <not counted> dtlb_store_misses.walk_pending (0.00%)
>> <not counted> ept.walk_pending (0.00%)
>> <not counted> cycles (0.00%)
>>
>> 2.343460588 seconds time elapsed
>>
>> Some events weren't counted. Try disabling the NMI watchdog:
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> perf stat ...
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>> The events in group usually have to be from the same PMU. Try
>> reorganizing the group.
>>
>> A metric group is a weak group, which relies on group validation
>> code in the kernel to determine whether to be opened as a group or
>> a non-group. However, group validation code may return false-positives,
>> especially when NMI watchdog is enabled. (The metric group is allowed
>> as a group but will never be scheduled.)
>>
>> The attempt to fix the group validation code has been rejected.
>> https://lore.kernel.org/lkml/[email protected]/
>> Because we cannot accurately predict whether the group can be scheduled
>> as a group, only by checking current status.
>>
>> This patch set provides another solution to mitigate the issue.
>> Add "MetricConstraint" in event list, which provides a hint for perf tool,
>> e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
>> metric group to non-group (standalone metrics) if NMI watchdog is enabled.
>
> the problem is in the missing counter, that's taken by NMI watchdog, right?
>
> and it's problem for any metric that won't fit to the available
> counters.. shouldn't we rather do this workaround for any metric
> that wouldn't fit in available counters?
I think current perf already did this.
All metric groups are weak group. Kernel (validate_group()) tells perf
tool whether a metric group fit to available counters.
If yes, the metric group will be scheduled as a group.
If no, perf tool will not using a group and re-try. The code is as below.
try_again:
if (create_perf_stat_counter(counter, &stat_config, &target) < 0) {
/* Weak group failed. Reset the group. */
if ((errno == EINVAL || errno == EBADF) &&
counter->leader != counter &&
counter->weak_group) {
counter = perf_evlist__reset_weak_group(evsel_list, counter);
goto try_again;
}
This patch set is to workaroud the false-positives from the kernel.
Kernel only validate the group itself. It assumes that all counters are
available. So, when any counter is permanently occupied by others, e.g.
watchdog, the validate_group() may return false-positives.
? not just for chosen
> ones..?
For now, I think we only need to workaround the Page_Walks_Utilization
metric group. Because it has 5 events, and one of the events is cycles.
The cycles has to be scheduled on fixed counter 2. But it's already
occupied by watchdog.
The false-positives of validate_group() will trigger the issue (metric
group never be scheduled.).
For other metric groups, even they have cycles, the issue should not be
triggered.
For example, if they have 4 or less events, the cycles can be scheduled
to GP counter instead.
If they have 6 or more events, the weak group will be reject anyway.
Perf tool will open it as non-group (standalone metrics).
I think we only need to apply the constraint for the
Page_Walks_Utilization metric group for now.
Thanks,
Kan
>
> thanks,
> jirka
>
> For other metric groups, even they have cycles, the issue should not be
> triggered.
> For example, if they have 4 or less events, the cycles can be scheduled to
> GP counter instead.
> If they have 6 or more events, the weak group will be reject anyway.
> Perf tool will open it as non-group (standalone metrics).
Technically it can also happen for 9 events with Hyper Threading off or
on Icelake (8 generic counters)
I didn't think we had any of those, but please double check.
-Andi
On 2/20/2020 11:43 AM, Andi Kleen wrote:
>> For other metric groups, even they have cycles, the issue should not be
>> triggered.
>> For example, if they have 4 or less events, the cycles can be scheduled to
>> GP counter instead.
>> If they have 6 or more events, the weak group will be reject anyway.
>> Perf tool will open it as non-group (standalone metrics).
>
> Technically it can also happen for 9 events with Hyper Threading off or
> on Icelake (8 generic counters)
>
> I didn't think we had any of those, but please double check.
>
I checked all public metrics groups. Right, we don't have such metrics
group with 8 GP events + 1 cycles.
We only need to add watchdog constraint for Page_Walks_Utilization for now.
Thanks,
Kan
On Thu, Feb 20, 2020 at 11:03:35AM -0500, Liang, Kan wrote:
>
>
> On 2/20/2020 6:39 AM, Jiri Olsa wrote:
> > On Wed, Feb 19, 2020 at 11:08:35AM -0800, [email protected] wrote:
> > > From: Kan Liang <[email protected]>
> > >
> > > Some metric groups, e.g. Page_Walks_Utilization, will never count when
> > > NMI watchdog is enabled.
> > >
> > > $echo 1 > /proc/sys/kernel/nmi_watchdog
> > > $perf stat -M Page_Walks_Utilization
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > <not counted> itlb_misses.walk_pending (0.00%)
> > > <not counted> dtlb_load_misses.walk_pending (0.00%)
> > > <not counted> dtlb_store_misses.walk_pending (0.00%)
> > > <not counted> ept.walk_pending (0.00%)
> > > <not counted> cycles (0.00%)
> > >
> > > 2.343460588 seconds time elapsed
> > >
> > > Some events weren't counted. Try disabling the NMI watchdog:
> > > echo 0 > /proc/sys/kernel/nmi_watchdog
> > > perf stat ...
> > > echo 1 > /proc/sys/kernel/nmi_watchdog
> > > The events in group usually have to be from the same PMU. Try
> > > reorganizing the group.
> > >
> > > A metric group is a weak group, which relies on group validation
> > > code in the kernel to determine whether to be opened as a group or
> > > a non-group. However, group validation code may return false-positives,
> > > especially when NMI watchdog is enabled. (The metric group is allowed
> > > as a group but will never be scheduled.)
> > >
> > > The attempt to fix the group validation code has been rejected.
> > > https://lore.kernel.org/lkml/[email protected]/
> > > Because we cannot accurately predict whether the group can be scheduled
> > > as a group, only by checking current status.
> > >
> > > This patch set provides another solution to mitigate the issue.
> > > Add "MetricConstraint" in event list, which provides a hint for perf tool,
> > > e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
> > > metric group to non-group (standalone metrics) if NMI watchdog is enabled.
> >
> > the problem is in the missing counter, that's taken by NMI watchdog, right?
> >
> > and it's problem for any metric that won't fit to the available
> > counters.. shouldn't we rather do this workaround for any metric
> > that wouldn't fit in available counters?
>
> I think current perf already did this.
> All metric groups are weak group. Kernel (validate_group()) tells perf tool
> whether a metric group fit to available counters.
> If yes, the metric group will be scheduled as a group.
> If no, perf tool will not using a group and re-try. The code is as below.
>
> try_again:
> if (create_perf_stat_counter(counter, &stat_config, &target) < 0) {
>
> /* Weak group failed. Reset the group. */
> if ((errno == EINVAL || errno == EBADF) &&
> counter->leader != counter &&
> counter->weak_group) {
> counter = perf_evlist__reset_weak_group(evsel_list, counter);
> goto try_again;
> }
>
>
> This patch set is to workaroud the false-positives from the kernel.
ah so validate_group will say the group can go on, but then
the pmu add will fail because the countters are occupied
thanks for explanation, looks good
jirka