This short patch series improves the existing TFA support by providing users
with a guarantee each time the allow_tsx_force_abort (TFA) sysctl control knob
is modified.
The current TFA support in perf_events operates as follows:
- TFA=1
  The PMU has priority over TSX: if PMC3 is needed, TSX transactions are forced
  to abort. The PMU has access to PMC3 and can schedule events on it.
- TFA=0
  TSX has priority over the PMU. If PMC3 is needed for an event, the event must
  be scheduled on another counter; PMC3 is not available to perf_events
  (sketched below).
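To make the TFA=0 case concrete, the effect on event scheduling can be pictured
as the constraint code masking PMC3 out of the set of counters an event may
use. This is only a hypothetical illustration; the tfa_allowed_counters()
helper below does not exist in the tree:

	/*
	 * Hypothetical sketch: with TFA=0, drop PMC3 from the set of generic
	 * counters the scheduler may pick for an event; with TFA=1 all four
	 * generic counters remain available.
	 */
	static u64 tfa_allowed_counters(u64 cntr_mask)
	{
		if (!allow_tsx_force_abort)
			cntr_mask &= ~(1ULL << 3);	/* PMC3 reserved for TSX */

		return cntr_mask;
	}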
When a sysadmin modifies TFA, the current code base does not change anything
about the events being measured at the time, nor does it update the actual MSR
controlling TFA. If the kernel transitions from TFA=1 to TFA=0, nothing happens
until the events are descheduled on context switch, multiplexing, or
termination of the measurement. That means TSX transactions keep failing until
then, and there is no easy way to evaluate how long that can take.
This patch series addresses the issue by rescheduling the events as part of the
sysctl change. That way, it is guaranteed that no perf_events events are running
on PMC3 by the time the write() syscall (from the echo) returns, and that TSX
transactions may succeed from then on. Similarly, when transitioning from TFA=0
to TFA=1, by the time the write() syscall returns the events have been
rescheduled, can use PMC3 immediately if needed, and TSX transactions
systematically abort.
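Concretely, the flow on a sysctl write is as follows (see patch 3 for the
actual code):

	echo 0/1 > /sys/devices/cpu/allow_tsx_force_abort
	  -> set_sysctl_tfa()                  /* updates allow_tsx_force_abort */
	       -> on_each_cpu(update_tfa_sched, NULL, 1)
	            -> if PMC3 is in use on that CPU:
	                    perf_ctx_resched() /* kick events off/onto PMC3 */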
To make this work, the series reuses an existing reschedule function from the
generic code and makes it, along with the context locking helpers, visible
outside the generic code, all to avoid code duplication. Given there is no good
way to find the struct pmu if you do not already have it, the series adds an
x86_get_pmu() helper, which is less than ideal, but I am open to suggestions
here.
Signed-off-by: Stephane Eranian <[email protected]>
Stephane Eranian (3):
perf/core: make perf_ctx_*lock() global inline functions
perf/core: make ctx_resched() a global function
perf/x86/intel: force resched when TFA sysctl is modified
arch/x86/events/core.c | 4 +++
arch/x86/events/intel/core.c | 60 ++++++++++++++++++++++++++++++++++--
arch/x86/events/perf_event.h | 1 +
include/linux/perf_event.h | 28 +++++++++++++++++
kernel/events/core.c | 37 ++++------------------
5 files changed, 97 insertions(+), 33 deletions(-)
--
2.21.0.392.gf8f6787159e-goog
This patch renames ctx_resched() to perf_ctx_resched() and makes the function
globally accessible. This prepares for the next patch, which needs to call this
function from arch-specific code.
Signed-off-by: Stephane Eranian <[email protected]>
---
include/linux/perf_event.h | 12 ++++++++++++
kernel/events/core.c | 21 ++++++---------------
2 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 514de997696b..150cfd493ad2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -829,6 +829,15 @@ struct bpf_perf_event_data_kern {
struct perf_event *event;
};
+enum event_type_t {
+ EVENT_FLEXIBLE = 0x1,
+ EVENT_PINNED = 0x2,
+ EVENT_TIME = 0x4,
+ /* see ctx_resched() for details */
+ EVENT_CPU = 0x8,
+ EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
+};
+
#ifdef CONFIG_CGROUP_PERF
/*
@@ -895,6 +904,9 @@ extern void perf_sched_cb_dec(struct pmu *pmu);
extern void perf_sched_cb_inc(struct pmu *pmu);
extern int perf_event_task_disable(void);
extern int perf_event_task_enable(void);
+extern void perf_ctx_resched(struct perf_cpu_context *cpuctx,
+ struct perf_event_context *task_ctx,
+ enum event_type_t event_type);
extern int perf_event_refresh(struct perf_event *event, int refresh);
extern void perf_event_update_userpage(struct perf_event *event);
extern int perf_event_release_kernel(struct perf_event *event);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 429bf6d8be95..48b955a2b7f1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -338,15 +338,6 @@ static void event_function_local(struct perf_event *event, event_f func, void *d
(PERF_SAMPLE_BRANCH_KERNEL |\
PERF_SAMPLE_BRANCH_HV)
-enum event_type_t {
- EVENT_FLEXIBLE = 0x1,
- EVENT_PINNED = 0x2,
- EVENT_TIME = 0x4,
- /* see ctx_resched() for details */
- EVENT_CPU = 0x8,
- EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
-};
-
/*
* perf_sched_events : >0 events exist
* perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
@@ -2430,9 +2421,9 @@ static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
* event_type is a bit mask of the types of events involved. For CPU events,
* event_type is only either EVENT_PINNED or EVENT_FLEXIBLE.
*/
-static void ctx_resched(struct perf_cpu_context *cpuctx,
- struct perf_event_context *task_ctx,
- enum event_type_t event_type)
+void perf_ctx_resched(struct perf_cpu_context *cpuctx,
+ struct perf_event_context *task_ctx,
+ enum event_type_t event_type)
{
enum event_type_t ctx_event_type;
bool cpu_event = !!(event_type & EVENT_CPU);
@@ -2520,7 +2511,7 @@ static int __perf_install_in_context(void *info)
if (reprogram) {
ctx_sched_out(ctx, cpuctx, EVENT_TIME);
add_event_to_ctx(event, ctx);
- ctx_resched(cpuctx, task_ctx, get_event_type(event));
+ perf_ctx_resched(cpuctx, task_ctx, get_event_type(event));
} else {
add_event_to_ctx(event, ctx);
}
@@ -2664,7 +2655,7 @@ static void __perf_event_enable(struct perf_event *event,
if (ctx->task)
WARN_ON_ONCE(task_ctx != ctx);
- ctx_resched(cpuctx, task_ctx, get_event_type(event));
+ perf_ctx_resched(cpuctx, task_ctx, get_event_type(event));
}
/*
@@ -3782,7 +3773,7 @@ static void perf_event_enable_on_exec(int ctxn)
*/
if (enabled) {
clone_ctx = unclone_ctx(ctx);
- ctx_resched(cpuctx, ctx, event_type);
+ perf_ctx_resched(cpuctx, ctx, event_type);
} else {
ctx_sched_in(ctx, cpuctx, EVENT_TIME, current);
}
--
2.21.0.392.gf8f6787159e-goog
This patch provides a guarantee to the sysadmin that, when TFA is disabled, no
PMU event is using PMC3 by the time the echo command returns. Vice versa, when
TFA is enabled, the PMU can use PMC3 immediately (to eliminate possible
multiplexing).
$ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches
1.000123979 125,768,725,208 branches
1.000562520 125,631,000,456 branches
1.000942898 125,487,114,291 branches
1.001333316 125,323,363,620 branches
2.004721306 125,514,968,546 branches
2.005114560 125,511,110,861 branches
2.005482722 125,510,132,724 branches
2.005851245 125,508,967,086 branches
3.006323475 125,166,570,648 branches
3.006709247 125,165,650,056 branches
3.007086605 125,164,639,142 branches
3.007459298 125,164,402,912 branches
4.007922698 125,045,577,140 branches
4.008310775 125,046,804,324 branches
4.008670814 125,048,265,111 branches
4.009039251 125,048,677,611 branches
5.009503373 125,122,240,217 branches
5.009897067 125,122,450,517 branches
Then, on another connection, the sysadmin does:
$ echo 0 >/sys/devices/cpu/allow_tsx_force_abort
perf stat adjusts the events immediately:
5.010286029 125,121,393,483 branches
5.010646308 125,120,556,786 branches
6.011113588 124,963,351,832 branches
6.011510331 124,964,267,566 branches
6.011889913 124,964,829,130 branches
6.012262996 124,965,841,156 branches
7.012708299 124,419,832,234 branches [79.69%]
7.012847908 124,416,363,853 branches [79.73%]
7.013225462 124,400,723,712 branches [79.73%]
7.013598191 124,376,154,434 branches [79.70%]
8.014089834 124,250,862,693 branches [74.98%]
8.014481363 124,267,539,139 branches [74.94%]
8.014856006 124,259,519,786 branches [74.98%]
8.014980848 124,225,457,969 branches [75.04%]
9.015464576 124,204,235,423 branches [75.03%]
9.015858587 124,204,988,490 branches [75.04%]
9.016243680 124,220,092,486 branches [74.99%]
9.016620104 124,231,260,146 branches [74.94%]
And vice versa, if the sysadmin does:
$ echo 1 >/sys/devices/cpu/allow_tsx_force_abort
the events are again spread over the 4 counters:
10.017096277 124,276,230,565 branches [74.96%]
10.017237209 124,228,062,171 branches [75.03%]
10.017478637 124,178,780,626 branches [75.03%]
10.017853402 124,198,316,177 branches [75.03%]
11.018334423 124,602,418,933 branches [85.40%]
11.018722584 124,602,921,320 branches [85.42%]
11.019095621 124,603,956,093 branches [85.42%]
11.019467742 124,595,273,783 branches [85.42%]
12.019945736 125,110,114,864 branches
12.020330764 125,109,334,472 branches
12.020688740 125,109,818,865 branches
12.021054020 125,108,594,014 branches
13.021516774 125,109,164,018 branches
13.021903640 125,108,794,510 branches
13.022270770 125,107,756,978 branches
13.022630819 125,109,380,471 branches
14.023114989 125,133,140,817 branches
14.023501880 125,133,785,858 branches
14.023868339 125,133,852,700 branches
Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/events/core.c | 4 +++
arch/x86/events/intel/core.c | 60 ++++++++++++++++++++++++++++++++++--
arch/x86/events/perf_event.h | 1 +
3 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 12d7d591843e..314173f89cc8 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -677,6 +677,10 @@ static inline int is_x86_event(struct perf_event *event)
return event->pmu == &pmu;
}
+struct pmu *x86_get_pmu(void)
+{
+ return &pmu;
+}
/*
* Event scheduler state:
*
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a4b7711ef0ee..8d356c2096bc 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4483,6 +4483,60 @@ static ssize_t freeze_on_smi_store(struct device *cdev,
return count;
}
+static void update_tfa_sched(void *ignored)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ struct pmu *pmu = x86_get_pmu();
+ struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+ struct perf_event_context *task_ctx = cpuctx->task_ctx;
+
+ /* prevent any changes to the two contexts */
+ perf_ctx_lock(cpuctx, task_ctx);
+
+ /*
+ * check if PMC3 is used
+ * and if so force schedule out for all event types all contexts
+ */
+ if (test_bit(3, cpuc->active_mask))
+ perf_ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
+
+ perf_ctx_unlock(cpuctx, task_ctx);
+}
+
+static ssize_t show_sysctl_tfa(struct device *cdev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ return snprintf(buf, 40, "%d\n", allow_tsx_force_abort);
+}
+
+static ssize_t set_sysctl_tfa(struct device *cdev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned long val;
+ ssize_t ret;
+
+ ret = kstrtoul(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ /* looking for boolean value */
+ if (val > 2)
+ return -EINVAL;
+
+ /* no change */
+ if (val == allow_tsx_force_abort)
+ return count;
+
+ allow_tsx_force_abort ^= 1;
+
+ on_each_cpu(update_tfa_sched, NULL, 1);
+
+ return count;
+}
+
+
static DEVICE_ATTR_RW(freeze_on_smi);
static ssize_t branches_show(struct device *cdev,
@@ -4515,7 +4569,9 @@ static struct attribute *intel_pmu_caps_attrs[] = {
NULL
};
-static DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);
+static DEVICE_ATTR(allow_tsx_force_abort, 0644,
+ show_sysctl_tfa,
+ set_sysctl_tfa);
static struct attribute *intel_pmu_attrs[] = {
&dev_attr_freeze_on_smi.attr,
@@ -5026,7 +5082,7 @@ __init int intel_pmu_init(void)
x86_pmu.get_event_constraints = tfa_get_event_constraints;
x86_pmu.enable_all = intel_tfa_pmu_enable_all;
x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
- intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr;
+ intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr;
}
pr_cont("Skylake events, ");
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index fff8868f92a8..437255603810 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -785,6 +785,7 @@ static struct perf_pmu_events_ht_attr event_attr_##v = { \
.event_str_ht = ht, \
}
+struct pmu *x86_get_pmu(void);
extern struct x86_pmu x86_pmu __read_mostly;
static inline bool x86_pmu_has_lbr_callstack(void)
--
2.21.0.392.gf8f6787159e-goog
This patch makes the perf_ctx_lock()/perf_ctx_unlock() inline functions
available throughout the perf_events code, and not just in kernel/events/core.c.
This will help with the next patch.
Signed-off-by: Stephane Eranian <[email protected]>
---
include/linux/perf_event.h | 16 ++++++++++++++++
kernel/events/core.c | 16 ----------------
2 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2a1405e907ec..514de997696b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1283,6 +1283,22 @@ perf_event_addr_filters(struct perf_event *event)
return ifh;
}
+static inline void perf_ctx_lock(struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ raw_spin_lock(&cpuctx->ctx.lock);
+ if (ctx)
+ raw_spin_lock(&ctx->lock);
+}
+
+static inline void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ if (ctx)
+ raw_spin_unlock(&ctx->lock);
+ raw_spin_unlock(&cpuctx->ctx.lock);
+}
+
extern void perf_event_addr_filters_sync(struct perf_event *event);
extern int perf_output_begin(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 833f1bccf25a..429bf6d8be95 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -148,22 +148,6 @@ __get_cpu_context(struct perf_event_context *ctx)
return this_cpu_ptr(ctx->pmu->pmu_cpu_context);
}
-static void perf_ctx_lock(struct perf_cpu_context *cpuctx,
- struct perf_event_context *ctx)
-{
- raw_spin_lock(&cpuctx->ctx.lock);
- if (ctx)
- raw_spin_lock(&ctx->lock);
-}
-
-static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
- struct perf_event_context *ctx)
-{
- if (ctx)
- raw_spin_unlock(&ctx->lock);
- raw_spin_unlock(&cpuctx->ctx.lock);
-}
-
#define TASK_TOMBSTONE ((void *)-1L)
static bool is_kernel_event(struct perf_event *event)
--
2.21.0.392.gf8f6787159e-goog
On Thu, Apr 04, 2019 at 11:32:19AM -0700, Stephane Eranian wrote:
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index a4b7711ef0ee..8d356c2096bc 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4483,6 +4483,60 @@ static ssize_t freeze_on_smi_store(struct device *cdev,
> return count;
> }
>
> +static void update_tfa_sched(void *ignored)
> +{
> + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> + struct pmu *pmu = x86_get_pmu();
> + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
> + struct perf_event_context *task_ctx = cpuctx->task_ctx;
> +
> + /* prevent any changes to the two contexts */
> + perf_ctx_lock(cpuctx, task_ctx);
> +
> + /*
> + * check if PMC3 is used
> + * and if so force schedule out for all event types all contexts
> + */
> + if (test_bit(3, cpuc->active_mask))
> + perf_ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
> +
> + perf_ctx_unlock(cpuctx, task_ctx);
I'm not particularly happy with exporting all that. Can't we create this
new perf_ctx_resched() to include the locking and everything. Then the
above reduces to:
if (test_bit(3, cpuc->active_mask))
perf_ctx_resched(cpuctx);
And we don't get to export the tricky bits.
> +}
> +
> +static ssize_t show_sysctl_tfa(struct device *cdev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + return snprintf(buf, 40, "%d\n", allow_tsx_force_abort);
> +}
> +
> +static ssize_t set_sysctl_tfa(struct device *cdev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + unsigned long val;
> + ssize_t ret;
> +
> + ret = kstrtoul(buf, 0, &val);
You want kstrtobool()
> + if (ret)
> + return ret;
> +
> + /* looking for boolean value */
> + if (val > 2)
> + return -EINVAL;
> +
> + /* no change */
> + if (val == allow_tsx_force_abort)
> + return count;
> +
> + allow_tsx_force_abort ^= 1;
allow_tsx_force_abort = val;
is simpler
> +
> + on_each_cpu(update_tfa_sched, NULL, 1);
> +
> + return count;
> +}
On Fri, Apr 5, 2019 at 12:13 AM Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Apr 04, 2019 at 11:32:19AM -0700, Stephane Eranian wrote:
> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index a4b7711ef0ee..8d356c2096bc 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -4483,6 +4483,60 @@ static ssize_t freeze_on_smi_store(struct device *cdev,
> > return count;
> > }
> >
> > +static void update_tfa_sched(void *ignored)
> > +{
> > + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > + struct pmu *pmu = x86_get_pmu();
> > + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
> > + struct perf_event_context *task_ctx = cpuctx->task_ctx;
> > +
> > + /* prevent any changes to the two contexts */
> > + perf_ctx_lock(cpuctx, task_ctx);
> > +
> > + /*
> > + * check if PMC3 is used
> > + * and if so force schedule out for all event types all contexts
> > + */
> > + if (test_bit(3, cpuc->active_mask))
> > + perf_ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
> > +
> > + perf_ctx_unlock(cpuctx, task_ctx);
>
> I'm not particularly happy with exporting all that. Can't we create this
> new perf_ctx_resched() to include the locking and everything. Then the
> above reduces to:
>
> if (test_bit(3, cpuc->active_mask))
> perf_ctx_resched(cpuctx);
>
> And we don't get to export the tricky bits.
>
The only reason I exported the locking is to protect
cpuc->active_mask. But if you
think there is no race, then sure, we can just export a new
perf_ctx_resched() that
does the locking and invokes the ctx_resched() function.
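Something along these lines then (untested sketch, reusing ctx_resched() and
the locking helpers as they are today):

	/* in kernel/events/core.c, ctx_resched() can then stay static */
	void perf_ctx_resched(struct perf_cpu_context *cpuctx)
	{
		struct perf_event_context *task_ctx = cpuctx->task_ctx;

		perf_ctx_lock(cpuctx, task_ctx);
		ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
		perf_ctx_unlock(cpuctx, task_ctx);
	}

and the x86 side only keeps the test_bit(3, cpuc->active_mask) check.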
> > +}
> > +
> > +static ssize_t show_sysctl_tfa(struct device *cdev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + return snprintf(buf, 40, "%d\n", allow_tsx_force_abort);
> > +}
> > +
> > +static ssize_t set_sysctl_tfa(struct device *cdev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + unsigned long val;
> > + ssize_t ret;
> > +
> > + ret = kstrtoul(buf, 0, &val);
>
> You want kstrtobool()
>
ok.
> > + if (ret)
> > + return ret;
> > +
> > + /* looking for boolean value */
> > + if (val > 2)
> > + return -EINVAL;
> > +
> > + /* no change */
> > + if (val == allow_tsx_force_abort)
> > + return count;
> > +
> > + allow_tsx_force_abort ^= 1;
>
> allow_tsx_force_abort = val;
>
> is simpler
>
ok.
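So the store side for V2 would look something like this (untested, assuming
allow_tsx_force_abort becomes a bool):

	static ssize_t set_sysctl_tfa(struct device *cdev,
				      struct device_attribute *attr,
				      const char *buf, size_t count)
	{
		bool val;
		ssize_t ret;

		ret = kstrtobool(buf, &val);
		if (ret)
			return ret;

		/* no change */
		if (val == allow_tsx_force_abort)
			return count;

		allow_tsx_force_abort = val;

		/* make the change effective before returning to the user */
		on_each_cpu(update_tfa_sched, NULL, 1);

		return count;
	}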
> > +
> > + on_each_cpu(update_tfa_sched, NULL, 1);
> > +
> > + return count;
> > +}
On Fri, Apr 05, 2019 at 10:00:03AM -0700, Stephane Eranian wrote:
> > > +static void update_tfa_sched(void *ignored)
> > > +{
> > > + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > > + struct pmu *pmu = x86_get_pmu();
> > > + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
> > > + struct perf_event_context *task_ctx = cpuctx->task_ctx;
> > > +
> > > + /* prevent any changes to the two contexts */
> > > + perf_ctx_lock(cpuctx, task_ctx);
> > > +
> > > + /*
> > > + * check if PMC3 is used
> > > + * and if so force schedule out for all event types all contexts
> > > + */
> > > + if (test_bit(3, cpuc->active_mask))
> > > + perf_ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
> > > +
> > > + perf_ctx_unlock(cpuctx, task_ctx);
> >
> > I'm not particularly happy with exporting all that. Can't we create this
> > new perf_ctx_resched() to include the locking and everything. Then the
> > above reduces to:
> >
> > if (test_bit(3, cpuc->active_mask))
> > perf_ctx_resched(cpuctx);
> >
> > And we don't get to export the tricky bits.
> >
> The only reason I exported the locking is to protect
> cpuc->active_mask. But if you
> think there is no race, then sure, we can just export a new
> perf_ctx_resched() that
> does the locking and invokes the ctx_resched() function.
It doesn't matter if it races: if it was used and isn't anymore, it's
a pointless reschedule; if it isn't used and we don't reschedule, it
cannot be used because we've already set the flag.
On Fri, Apr 5, 2019 at 1:26 PM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Apr 05, 2019 at 10:00:03AM -0700, Stephane Eranian wrote:
>
> > > > +static void update_tfa_sched(void *ignored)
> > > > +{
> > > > + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > > > + struct pmu *pmu = x86_get_pmu();
> > > > + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
> > > > + struct perf_event_context *task_ctx = cpuctx->task_ctx;
> > > > +
> > > > + /* prevent any changes to the two contexts */
> > > > + perf_ctx_lock(cpuctx, task_ctx);
> > > > +
> > > > + /*
> > > > + * check if PMC3 is used
> > > > + * and if so force schedule out for all event types all contexts
> > > > + */
> > > > + if (test_bit(3, cpuc->active_mask))
> > > > + perf_ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
> > > > +
> > > > + perf_ctx_unlock(cpuctx, task_ctx);
> > >
> > > I'm not particularly happy with exporting all that. Can't we create this
> > > new perf_ctx_resched() to include the locking and everything. Then the
> > > above reduces to:
> > >
> > > if (test_bit(3, cpuc->active_mask))
> > > perf_ctx_resched(cpuctx);
> > >
> > > And we don't get to export the tricky bits.
> > >
> > The only reason I exported the locking is to protect
> > cpuc->active_mask. But if you
> > think there is no race, then sure, we can just export a new
> > perf_ctx_resched() that
> > does the locking and invokes the ctx_resched() function.
>
> It doesn't matter if it races: if it was used and isn't anymore, it's
> a pointless reschedule; if it isn't used and we don't reschedule, it
> cannot be used because we've already set the flag.
True. I will post V2 shortly.