This short patch series improves the existing TSX Force Abort (TFA) support by
adding a guarantee to users each time the allow_tsx_force_abort (TFA) sysctl
control knob is modified.
The current TFA support in perf_events operates as follows:
- TFA=1
  The PMU has priority over TSX: if PMC3 is needed, then TSX transactions
  are forced to abort. The PMU has access to PMC3 and can schedule events on it.
- TFA=0
  TSX has priority over the PMU: if PMC3 is needed for an event, then the event
  must be scheduled on another counter. PMC3 is not available to the PMU.
(A minimal model of this scheduling rule is sketched right after this list.)
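To make the two modes concrete, here is a minimal, self-contained model in
ordinary userspace C (names such as usable_counters() are purely illustrative
and are not kernel code) of the counter set the event scheduler may pick from
in each mode:

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define NUM_COUNTERS	4
	#define PMC3		3

	/* One bit per general-purpose counter the PMU scheduler may use. */
	static uint64_t usable_counters(bool tfa)
	{
		uint64_t mask = (1ULL << NUM_COUNTERS) - 1;	/* PMC0-PMC3 */

		/*
		 * TFA=0: TSX has priority, so PMC3 is carved out of the set
		 * of counters events may be scheduled on.
		 * TFA=1: the PMU keeps PMC3 and the microcode force-aborts
		 * TSX transactions instead.
		 */
		if (!tfa)
			mask &= ~(1ULL << PMC3);

		return mask;
	}

	int main(void)
	{
		printf("TFA=1 counters: 0x%llx\n",
		       (unsigned long long)usable_counters(true));
		printf("TFA=0 counters: 0x%llx\n",
		       (unsigned long long)usable_counters(false));
		return 0;
	}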
When a sysadmin modifies TFA, the current code base changes neither the events
being measured at the time nor the actual MSR controlling TFA. If the kernel
transitions from TFA=1 to TFA=0, nothing happens until the events are
descheduled on a context switch, multiplexing, or termination of the measurement.
That means TSX transactions keep aborting until then, and there is no easy way
to evaluate how long this can take.
This patch series addresses the issue by rescheduling the events as part of the
sysctl change. That way, there is a guarantee that no perf_events event is still
running on PMC3 by the time the write() syscall (from the echo) returns, and
that TSX transactions may succeed from then on. Similarly, when transitioning
from TFA=0 to TFA=1, the events are rescheduled and can use PMC3 immediately if
needed, and TSX transactions systematically abort, by the time the write()
syscall returns.
To make this work, the patch uses an existing reschedule function in the generic
code, ctx_resched(). In V2, we export a new function called perf_ctx_resched()
which takes care of locking the contexts and invoking ctx_resched().
The patch adds an x86_get_pmu() helper, which is less than ideal, but I am open
to suggestions here.
In V2, we also switched from kstrtoul() to kstrtobool() and added the proper
get_online_cpus()/put_online_cpus() protection.
Signed-off-by: Stephane Eranian <[email protected]>
Stephane Eranian (2):
perf/core: add perf_ctx_resched() as global function
perf/x86/intel: force resched when TFA sysctl is modified
arch/x86/events/core.c | 4 +++
arch/x86/events/intel/core.c | 53 ++++++++++++++++++++++++++++++++++--
arch/x86/events/perf_event.h | 1 +
include/linux/perf_event.h | 14 ++++++++++
kernel/events/core.c | 18 ++++++------
5 files changed, 79 insertions(+), 11 deletions(-)
--
2.21.0.392.gf8f6787159e-goog
This patch adds perf_ctx_resched(), a global function that can be called
to force rescheduling of events based on event types. The function locks
both cpuctx and task_ctx internally. This will be used by a subsequent patch.
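For context, here is a rough, illustrative sketch (not part of this patch) of
the kind of per-CPU call site the function is intended for; the actual user is
the TFA sysctl handler in the next patch of this series, and resched_this_cpu()
below is a made-up name:

	static void resched_this_cpu(struct pmu *pmu)
	{
		struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);

		/* lock cpuctx/task_ctx and reschedule all event types */
		perf_ctx_resched(cpuctx, cpuctx->task_ctx, EVENT_ALL | EVENT_CPU);
	}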
Signed-off-by: Stephane Eranian <[email protected]>
Change-Id: Icbc05e5f461fd6e091b46778fe62b23f308e2be7
---
include/linux/perf_event.h | 14 ++++++++++++++
kernel/events/core.c | 18 +++++++++---------
2 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 085a95e2582a..ee8a275df0ed 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -822,6 +822,15 @@ struct bpf_perf_event_data_kern {
struct perf_event *event;
};
+enum event_type_t {
+ EVENT_FLEXIBLE = 0x1,
+ EVENT_PINNED = 0x2,
+ EVENT_TIME = 0x4,
+ /* see ctx_resched() for details */
+ EVENT_CPU = 0x8,
+ EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
+};
+
#ifdef CONFIG_CGROUP_PERF
/*
@@ -888,6 +897,11 @@ extern void perf_sched_cb_dec(struct pmu *pmu);
extern void perf_sched_cb_inc(struct pmu *pmu);
extern int perf_event_task_disable(void);
extern int perf_event_task_enable(void);
+
+extern void perf_ctx_resched(struct perf_cpu_context *cpuctx,
+ struct perf_event_context *task_ctx,
+ enum event_type_t event_type);
+
extern int perf_event_refresh(struct perf_event *event, int refresh);
extern void perf_event_update_userpage(struct perf_event *event);
extern int perf_event_release_kernel(struct perf_event *event);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dfc4bab0b02b..30474064ec22 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -354,15 +354,6 @@ static void event_function_local(struct perf_event *event, event_f func, void *d
(PERF_SAMPLE_BRANCH_KERNEL |\
PERF_SAMPLE_BRANCH_HV)
-enum event_type_t {
- EVENT_FLEXIBLE = 0x1,
- EVENT_PINNED = 0x2,
- EVENT_TIME = 0x4,
- /* see ctx_resched() for details */
- EVENT_CPU = 0x8,
- EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
-};
-
/*
* perf_sched_events : >0 events exist
* perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
@@ -2477,6 +2468,15 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
perf_pmu_enable(cpuctx->ctx.pmu);
}
+void perf_ctx_resched(struct perf_cpu_context *cpuctx,
+ struct perf_event_context *task_ctx,
+ enum event_type_t event_type)
+{
+ perf_ctx_lock(cpuctx, task_ctx);
+ ctx_resched(cpuctx, task_ctx, event_type);
+ perf_ctx_unlock(cpuctx, task_ctx);
+}
+
/*
* Cross CPU call to install and enable a performance event
*
--
2.21.0.392.gf8f6787159e-goog
This patch provides a guarantee to the sysadmin that when TFA is disabled, no PMU
event is using PMC3 by the time the echo command returns. Vice versa, when TFA
is enabled, the PMU can use PMC3 immediately (to eliminate possible multiplexing).
$ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches
1.000123979 125,768,725,208 branches
1.000562520 125,631,000,456 branches
1.000942898 125,487,114,291 branches
1.001333316 125,323,363,620 branches
2.004721306 125,514,968,546 branches
2.005114560 125,511,110,861 branches
2.005482722 125,510,132,724 branches
2.005851245 125,508,967,086 branches
3.006323475 125,166,570,648 branches
3.006709247 125,165,650,056 branches
3.007086605 125,164,639,142 branches
3.007459298 125,164,402,912 branches
4.007922698 125,045,577,140 branches
4.008310775 125,046,804,324 branches
4.008670814 125,048,265,111 branches
4.009039251 125,048,677,611 branches
5.009503373 125,122,240,217 branches
5.009897067 125,122,450,517 branches
Then on another connection, sysadmin does:
$ echo 1 >/sys/devices/cpu/allow_tsx_force_abort
Then perf stat adjusts the events immediately:
5.010286029 125,121,393,483 branches
5.010646308 125,120,556,786 branches
6.011113588 124,963,351,832 branches
6.011510331 124,964,267,566 branches
6.011889913 124,964,829,130 branches
6.012262996 124,965,841,156 branches
7.012708299 124,419,832,234 branches [79.69%]
7.012847908 124,416,363,853 branches [79.73%]
7.013225462 124,400,723,712 branches [79.73%]
7.013598191 124,376,154,434 branches [79.70%]
8.014089834 124,250,862,693 branches [74.98%]
8.014481363 124,267,539,139 branches [74.94%]
8.014856006 124,259,519,786 branches [74.98%]
8.014980848 124,225,457,969 branches [75.04%]
9.015464576 124,204,235,423 branches [75.03%]
9.015858587 124,204,988,490 branches [75.04%]
9.016243680 124,220,092,486 branches [74.99%]
9.016620104 124,231,260,146 branches [74.94%]
And vice versa if the sysadmin does:
$ echo 0 >/sys/devices/cpu/allow_tsx_force_abort
Events are again spread over the 4 counters:
10.017096277 124,276,230,565 branches [74.96%]
10.017237209 124,228,062,171 branches [75.03%]
10.017478637 124,178,780,626 branches [75.03%]
10.017853402 124,198,316,177 branches [75.03%]
11.018334423 124,602,418,933 branches [85.40%]
11.018722584 124,602,921,320 branches [85.42%]
11.019095621 124,603,956,093 branches [85.42%]
11.019467742 124,595,273,783 branches [85.42%]
12.019945736 125,110,114,864 branches
12.020330764 125,109,334,472 branches
12.020688740 125,109,818,865 branches
12.021054020 125,108,594,014 branches
13.021516774 125,109,164,018 branches
13.021903640 125,108,794,510 branches
13.022270770 125,107,756,978 branches
13.022630819 125,109,380,471 branches
14.023114989 125,133,140,817 branches
14.023501880 125,133,785,858 branches
14.023868339 125,133,852,700 branches
Signed-off-by: Stephane Eranian <[email protected]>
Change-Id: Ib443265edce31b93ca4d10fe7695c05d00a7178e
---
arch/x86/events/core.c | 4 +++
arch/x86/events/intel/core.c | 53 ++++++++++++++++++++++++++++++++++--
arch/x86/events/perf_event.h | 1 +
3 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 87b50f4be201..fdd106267fd2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -661,6 +661,10 @@ static inline int is_x86_event(struct perf_event *event)
return event->pmu == &pmu;
}
+struct pmu *x86_get_pmu(void)
+{
+ return &pmu;
+}
/*
* Event scheduler state:
*
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1403c05e25e2..20698e84c388 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4156,6 +4156,53 @@ static ssize_t freeze_on_smi_store(struct device *cdev,
return count;
}
+static void update_tfa_sched(void *ignored)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ struct pmu *pmu = x86_get_pmu();
+ struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+ struct perf_event_context *task_ctx = cpuctx->task_ctx;
+
+ /*
+ * check if PMC3 is used
+ * and if so force schedule out for all event types all contexts
+ */
+ if (test_bit(3, cpuc->active_mask))
+ perf_ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
+}
+
+static ssize_t show_sysctl_tfa(struct device *cdev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ return snprintf(buf, 40, "%d\n", allow_tsx_force_abort);
+}
+
+static ssize_t set_sysctl_tfa(struct device *cdev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ bool val;
+ ssize_t ret;
+
+ ret = kstrtobool(buf, &val);
+ if (ret)
+ return ret;
+
+ /* no change */
+ if (val == allow_tsx_force_abort)
+ return count;
+
+ allow_tsx_force_abort = val;
+
+ get_online_cpus();
+ on_each_cpu(update_tfa_sched, NULL, 1);
+ put_online_cpus();
+
+ return count;
+}
+
+
static DEVICE_ATTR_RW(freeze_on_smi);
static ssize_t branches_show(struct device *cdev,
@@ -4188,7 +4235,9 @@ static struct attribute *intel_pmu_caps_attrs[] = {
NULL
};
-static DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);
+static DEVICE_ATTR(allow_tsx_force_abort, 0644,
+ show_sysctl_tfa,
+ set_sysctl_tfa);
static struct attribute *intel_pmu_attrs[] = {
&dev_attr_freeze_on_smi.attr,
@@ -4697,7 +4746,7 @@ __init int intel_pmu_init(void)
x86_pmu.get_event_constraints = tfa_get_event_constraints;
x86_pmu.enable_all = intel_tfa_pmu_enable_all;
x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
- intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr;
+ intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr;
}
pr_cont("Skylake events, ");
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index b4d41829da4f..49ba8171ad7a 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -713,6 +713,7 @@ static struct perf_pmu_events_ht_attr event_attr_##v = { \
.event_str_ht = ht, \
}
+struct pmu *x86_get_pmu(void);
extern struct x86_pmu x86_pmu __read_mostly;
static inline bool x86_pmu_has_lbr_callstack(void)
--
2.21.0.392.gf8f6787159e-goog
On Mon, Apr 08, 2019 at 10:32:51AM -0700, Stephane Eranian wrote:
> This patch add perf_ctx_resched() a global function that can be called
> to force rescheduling of events based on event types. The function locks
> both cpuctx and task_ctx internally. This will be used by a subsequent patch.
I took out the task_ctx and event_type arguments.
---
Subject: perf/core: Add perf_ctx_resched() as global function
From: Stephane Eranian <[email protected]>
Date: Mon, 8 Apr 2019 10:32:51 -0700
This patch adds perf_ctx_resched(), a global function that can be called
to force rescheduling of events. The function locks both cpuctx and
task_ctx internally. This will be used by a subsequent patch.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Stephane Eranian <[email protected]>
[peterz: simplified calling convention]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/perf_event.h | 3 +++
kernel/events/core.c | 9 +++++++++
2 files changed, 12 insertions(+)
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -888,6 +888,9 @@ extern void perf_sched_cb_dec(struct pmu
extern void perf_sched_cb_inc(struct pmu *pmu);
extern int perf_event_task_disable(void);
extern int perf_event_task_enable(void);
+
+extern void perf_ctx_resched(struct perf_cpu_context *cpuctx);
+
extern int perf_event_refresh(struct perf_event *event, int refresh);
extern void perf_event_update_userpage(struct perf_event *event);
extern int perf_event_release_kernel(struct perf_event *event);
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2478,6 +2478,15 @@ static void ctx_resched(struct perf_cpu_
perf_pmu_enable(cpuctx->ctx.pmu);
}
+void perf_ctx_resched(struct perf_cpu_context *cpuctx)
+{
+ struct perf_event_context *task_ctx = cpuctx->task_ctx;
+
+ perf_ctx_lock(cpuctx, task_ctx);
+ ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
+ perf_ctx_unlock(cpuctx, task_ctx);
+}
+
/*
* Cross CPU call to install and enable a performance event
*
On Mon, Apr 15, 2019 at 05:45:37PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 10:32:51AM -0700, Stephane Eranian wrote:
> > This patch add perf_ctx_resched() a global function that can be called
> > to force rescheduling of events based on event types. The function locks
> > both cpuctx and task_ctx internally. This will be used by a subsequent patch.
>
> I took out the task_ctx and event_type arguments.
> +void perf_ctx_resched(struct perf_cpu_context *cpuctx)
> +{
> + struct perf_event_context *task_ctx = cpuctx->task_ctx;
> +
> + perf_ctx_lock(cpuctx, task_ctx);
> + ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
> + perf_ctx_unlock(cpuctx, task_ctx);
> +}
Hmm... I might just rework the thing to:
perf_pmu_resched(struct pmu *pmu);
As that is what we want; esp. in the light of that whole core context
rewrite that i'm still sitting on :/
On Mon, Apr 08, 2019 at 10:32:52AM -0700, Stephane Eranian wrote:
> +static ssize_t set_sysctl_tfa(struct device *cdev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + bool val;
> + ssize_t ret;
> +
> + ret = kstrtobool(buf, &val);
> + if (ret)
> + return ret;
> +
> + /* no change */
> + if (val == allow_tsx_force_abort)
> + return count;
> +
> + allow_tsx_force_abort = val;
> +
> + get_online_cpus();
> + on_each_cpu(update_tfa_sched, NULL, 1);
> + put_online_cpus();
> +
> + return count;
> +}
So we care about concurrent writing to that file?
On Mon, Apr 15, 2019 at 8:57 AM Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Apr 08, 2019 at 10:32:52AM -0700, Stephane Eranian wrote:
> > +static ssize_t set_sysctl_tfa(struct device *cdev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + bool val;
> > + ssize_t ret;
> > +
> > + ret = kstrtobool(buf, &val);
> > + if (ret)
> > + return ret;
> > +
> > + /* no change */
> > + if (val == allow_tsx_force_abort)
> > + return count;
> > +
> > + allow_tsx_force_abort = val;
> > +
> > + get_online_cpus();
> > + on_each_cpu(update_tfa_sched, NULL, 1);
> > + put_online_cpus();
> > +
> > + return count;
> > +}
>
> So we care about concurrent writing to that file?
Not likely, but we care about seeing the effects on event scheduling
before the sysctl returns.
Commit-ID: c68d224e5ed15605e651e2482c6ffd95915ddf58
Gitweb: https://git.kernel.org/tip/c68d224e5ed15605e651e2482c6ffd95915ddf58
Author: Stephane Eranian <[email protected]>
AuthorDate: Mon, 8 Apr 2019 10:32:51 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 16 Apr 2019 12:19:34 +0200
perf/core: Add perf_pmu_resched() as global function
This patch adds perf_pmu_resched(), a global function that can be called
to force rescheduling of events for a given PMU. The function locks
both cpuctx and task_ctx internally. This will be used by a subsequent
patch.
Signed-off-by: Stephane Eranian <[email protected]>
[ Simplified the calling convention. ]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/perf_event.h | 3 +++
kernel/events/core.c | 10 ++++++++++
2 files changed, 13 insertions(+)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 085a95e2582a..f3864e1c5569 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -888,6 +888,9 @@ extern void perf_sched_cb_dec(struct pmu *pmu);
extern void perf_sched_cb_inc(struct pmu *pmu);
extern int perf_event_task_disable(void);
extern int perf_event_task_enable(void);
+
+extern void perf_pmu_resched(struct pmu *pmu);
+
extern int perf_event_refresh(struct perf_event *event, int refresh);
extern void perf_event_update_userpage(struct perf_event *event);
extern int perf_event_release_kernel(struct perf_event *event);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 30a572e4c6f1..abbd4b3b96c2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2478,6 +2478,16 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
perf_pmu_enable(cpuctx->ctx.pmu);
}
+void perf_pmu_resched(struct pmu *pmu)
+{
+ struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+ struct perf_event_context *task_ctx = cpuctx->task_ctx;
+
+ perf_ctx_lock(cpuctx, task_ctx);
+ ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
+ perf_ctx_unlock(cpuctx, task_ctx);
+}
+
/*
* Cross CPU call to install and enable a performance event
*
Commit-ID: f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
Gitweb: https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
Author: Stephane Eranian <[email protected]>
AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
perf/x86/intel: Force resched when TFA sysctl is modified
This patch provides a guarantee to the sysadmin that when TFA is disabled, no PMU
event is using PMC3 by the time the echo command returns. Vice versa, when TFA
is enabled, the PMU can use PMC3 immediately (to eliminate possible multiplexing).
$ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches
1.000123979 125,768,725,208 branches
1.000562520 125,631,000,456 branches
1.000942898 125,487,114,291 branches
1.001333316 125,323,363,620 branches
2.004721306 125,514,968,546 branches
2.005114560 125,511,110,861 branches
2.005482722 125,510,132,724 branches
2.005851245 125,508,967,086 branches
3.006323475 125,166,570,648 branches
3.006709247 125,165,650,056 branches
3.007086605 125,164,639,142 branches
3.007459298 125,164,402,912 branches
4.007922698 125,045,577,140 branches
4.008310775 125,046,804,324 branches
4.008670814 125,048,265,111 branches
4.009039251 125,048,677,611 branches
5.009503373 125,122,240,217 branches
5.009897067 125,122,450,517 branches
Then on another connection, sysadmin does:
$ echo 1 >/sys/devices/cpu/allow_tsx_force_abort
Then perf stat adjusts the events immediately:
5.010286029 125,121,393,483 branches
5.010646308 125,120,556,786 branches
6.011113588 124,963,351,832 branches
6.011510331 124,964,267,566 branches
6.011889913 124,964,829,130 branches
6.012262996 124,965,841,156 branches
7.012708299 124,419,832,234 branches [79.69%]
7.012847908 124,416,363,853 branches [79.73%]
7.013225462 124,400,723,712 branches [79.73%]
7.013598191 124,376,154,434 branches [79.70%]
8.014089834 124,250,862,693 branches [74.98%]
8.014481363 124,267,539,139 branches [74.94%]
8.014856006 124,259,519,786 branches [74.98%]
8.014980848 124,225,457,969 branches [75.04%]
9.015464576 124,204,235,423 branches [75.03%]
9.015858587 124,204,988,490 branches [75.04%]
9.016243680 124,220,092,486 branches [74.99%]
9.016620104 124,231,260,146 branches [74.94%]
And vice versa if the sysadmin does:
$ echo 0 >/sys/devices/cpu/allow_tsx_force_abort
Events are again spread over the 4 counters:
10.017096277 124,276,230,565 branches [74.96%]
10.017237209 124,228,062,171 branches [75.03%]
10.017478637 124,178,780,626 branches [75.03%]
10.017853402 124,198,316,177 branches [75.03%]
11.018334423 124,602,418,933 branches [85.40%]
11.018722584 124,602,921,320 branches [85.42%]
11.019095621 124,603,956,093 branches [85.42%]
11.019467742 124,595,273,783 branches [85.42%]
12.019945736 125,110,114,864 branches
12.020330764 125,109,334,472 branches
12.020688740 125,109,818,865 branches
12.021054020 125,108,594,014 branches
13.021516774 125,109,164,018 branches
13.021903640 125,108,794,510 branches
13.022270770 125,107,756,978 branches
13.022630819 125,109,380,471 branches
14.023114989 125,133,140,817 branches
14.023501880 125,133,785,858 branches
14.023868339 125,133,852,700 branches
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/core.c | 4 ++++
arch/x86/events/intel/core.c | 50 ++++++++++++++++++++++++++++++++++++++++++--
arch/x86/events/perf_event.h | 1 +
3 files changed, 53 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 87b50f4be201..fdd106267fd2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -661,6 +661,10 @@ static inline int is_x86_event(struct perf_event *event)
return event->pmu == &pmu;
}
+struct pmu *x86_get_pmu(void)
+{
+ return &pmu;
+}
/*
* Event scheduler state:
*
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1bb59c4c59f2..8265b5026a19 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4156,6 +4156,50 @@ done:
return count;
}
+static void update_tfa_sched(void *ignored)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+ /*
+ * check if PMC3 is used
+ * and if so force schedule out for all event types all contexts
+ */
+ if (test_bit(3, cpuc->active_mask))
+ perf_pmu_resched(x86_get_pmu());
+}
+
+static ssize_t show_sysctl_tfa(struct device *cdev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ return snprintf(buf, 40, "%d\n", allow_tsx_force_abort);
+}
+
+static ssize_t set_sysctl_tfa(struct device *cdev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ bool val;
+ ssize_t ret;
+
+ ret = kstrtobool(buf, &val);
+ if (ret)
+ return ret;
+
+ /* no change */
+ if (val == allow_tsx_force_abort)
+ return count;
+
+ allow_tsx_force_abort = val;
+
+ get_online_cpus();
+ on_each_cpu(update_tfa_sched, NULL, 1);
+ put_online_cpus();
+
+ return count;
+}
+
+
static DEVICE_ATTR_RW(freeze_on_smi);
static ssize_t branches_show(struct device *cdev,
@@ -4188,7 +4232,9 @@ static struct attribute *intel_pmu_caps_attrs[] = {
NULL
};
-static DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);
+static DEVICE_ATTR(allow_tsx_force_abort, 0644,
+ show_sysctl_tfa,
+ set_sysctl_tfa);
static struct attribute *intel_pmu_attrs[] = {
&dev_attr_freeze_on_smi.attr,
@@ -4697,7 +4743,7 @@ __init int intel_pmu_init(void)
x86_pmu.get_event_constraints = tfa_get_event_constraints;
x86_pmu.enable_all = intel_tfa_pmu_enable_all;
x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
- intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr;
+ intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr;
}
pr_cont("Skylake events, ");
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index e544d83ea4b4..9e474a5f3b86 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -713,6 +713,7 @@ static struct perf_pmu_events_ht_attr event_attr_##v = { \
.event_str_ht = ht, \
}
+struct pmu *x86_get_pmu(void);
extern struct x86_pmu x86_pmu __read_mostly;
static inline bool x86_pmu_has_lbr_callstack(void)
On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
> Commit-ID: f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> Gitweb: https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> Author: Stephane Eranian <[email protected]>
> AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
>
> perf/x86/intel: Force resched when TFA sysctl is modified
What's TFA? Tuna-fish-alarm? Nowhere in the commit or in the code does
it ever say what a TFA is or why we'd want to resched when it is modified.
Vince
On Tue, Apr 16, 2019 at 12:28:00PM -0400, Vince Weaver wrote:
> On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
>
> > Commit-ID: f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Gitweb: https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Author: Stephane Eranian <[email protected]>
> > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> >
> > perf/x86/intel: Force resched when TFA sysctl is modified
>
> What's TFA? Tuna-fish-alarm? Nowhere in the commit or in the code does
> it ever say what a TFA is or why we'd want to resched when it is modified.
See commit: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
Author: Peter Zijlstra (Intel) <[email protected]>
Date: Tue Mar 5 22:23:18 2019 +0100
perf/x86/intel: Implement support for TSX Force Abort
Skylake (and later) will receive a microcode update to address a TSX
errata. This microcode will, on execution of a TSX instruction
(speculative or not) use (clobber) PMC3. This update will also provide
a new MSR to change this behaviour along with a CPUID bit to enumerate
the presence of this new MSR.
When the MSR gets set; the microcode will no longer use PMC3 but will
Force Abort every TSX transaction (upon executing COMMIT).
When TSX Force Abort (TFA) is allowed (default); the MSR gets set when
PMC3 gets scheduled and cleared when, after scheduling, PMC3 is
unused.
When TFA is not allowed; clear PMC3 from all constraints such that it
will not get used.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
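To make the behaviour described in that commit a bit more tangible, here is a
small conceptual C model (illustrative only; allow_tfa, commit_counter() and
constrain() are made-up names, not the actual kernel identifiers):

	#include <stdbool.h>
	#include <stdint.h>

	static bool allow_tfa;		/* models allow_tsx_force_abort */
	static bool force_abort_msr;	/* models the new MSR bit */

	/*
	 * TFA allowed: the MSR bit is set when PMC3 gets scheduled (and
	 * cleared once PMC3 is unused again), so the microcode stops
	 * clobbering PMC3 and force-aborts TSX transactions instead.
	 */
	static void commit_counter(int cntr)
	{
		if (allow_tfa && cntr == 3)
			force_abort_msr = true;
	}

	/*
	 * TFA not allowed: the MSR is never touched; PMC3 is simply removed
	 * from every event constraint so it can never be used.
	 */
	static uint64_t constrain(uint64_t idxmsk)
	{
		if (!allow_tfa)
			idxmsk &= ~(1ULL << 3);
		return idxmsk;
	}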
* Vince Weaver <[email protected]> wrote:
> On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
>
> > Commit-ID: f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Gitweb: https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Author: Stephane Eranian <[email protected]>
> > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> >
> > perf/x86/intel: Force resched when TFA sysctl is modified
>
> What's TFA? Tuna-fish-alarm?
Heh, I wish! :-)
> [...] Nowhere in the commit or in the code does it ever say what a TFA
> is or why we'd want to resched when it is modified.
Yeah, it's the TSX-Force-Abort acronym - Intel has numbed our general
dislike of random acronyms ...
Peter and I usually fix such changelog context omissions, but this one
slipped through. :-/
The commit is too deep down perf/core already to rebase it just for the
changelog, but if we are going to rebase it for some functional reason
I'll take care of it next time around.
TFA. (Thanks For your Assistance. :-)
Ingo
On Tue, Apr 16, 2019 at 11:06 PM Ingo Molnar <[email protected]> wrote:
>
>
> * Vince Weaver <[email protected]> wrote:
>
> > On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
> >
> > > Commit-ID: f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > > Gitweb: https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > > Author: Stephane Eranian <[email protected]>
> > > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> > > Committer: Ingo Molnar <[email protected]>
> > > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> > >
> > > perf/x86/intel: Force resched when TFA sysctl is modified
> >
> > What's TFA? Tuna-fish-alarm?
>
> Heh, I wish! :-)
>
Sorry about the confusion. I was just trying to mimic the function names
that Peter used in the code. It is hard to fit the whole sysctl name in
the title, anyway.
> > [...] Nowhere in the commit or in the code does it ever say what a TFA
> > is or why we'd want to resched when it is modified.
>
> Yeah, it's the TSX-Force-Abort acronym - Intel has numbed our general
> dislike to random acrynyms ...
>
> Peter and me usually fix such changelog context omissions, but this one
> slipped through. :-/
>
> The commit is too deep down perf/core already to rebase it just for the
> changelog, but if we are going to rebase it for some functional reason
> I'll take care of it next time around.
>
> TFA. (Thanks For your Assistance. :-)
>
> Ingo