2024-01-15 17:01:52

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 0/3] Fixing i915 PMU use after free after driver unbind

From: Tvrtko Ursulin <[email protected]>

Hi Peter, all,

This is an early RFC to outline a newly discovered problem in the current
handling of driver unbind with active perf fds.

The sequence is basically this:

1. Open a perf fd
2. Unbind a driver
3. Close the dangling fd

Or a slightly more evil variant:

1. Open a perf fd
2. Unbind a driver
3. Bind the driver again
4. Close the dangling fd

I thought we had this covered by recording the unbound status (pmu->closed in
i915_pmu.c) and making sure the struct i915_pmu (and struct perf_pmu) remain
active until the last event is closed (via internal reference counting). But
what I missed until now are two things:

1)
core.c: _free_event() will dereference event->pmu _after_ event->destroy().

KASAN catches this easily and patches 1 & 2 are the attempt to fix that.

2)
A more evil case where pmu->cpu_pmu_context per-cpu allocation gets re-used
_before_ the old perf fd is closed.

There things can nicely explode on the list_del_init inside event_sched_out on
list_del_init(&event->active_list); (with list debugging turned on of course).

Most easily reproducible by simply re-binding i915, which happens to grab the
same per-cpu block and then the new perf_pmu_register zaps the list_head which
the old event will try to unlink itself from.

This is what the third patch attempts to deal with. It is a bit incomplete
though, as I was unsure what is the best approach to fix and so thought to send
it out early for some guidance.

Cc: Peter Zijlstra <[email protected]>
Cc: Umesh Nerlige Ramappa <[email protected]>
Cc: Aravind Iddamsetty <[email protected]>

Tvrtko Ursulin (3):
perf: Add new late event free callback
drm/i915/pmu: Move i915 reference drop to new event->free()
perf: Reference count struct perf_cpu_pmu_context to fix driver unbind

drivers/gpu/drm/i915/i915_pmu.c | 4 ++--
include/linux/perf_event.h | 2 ++
kernel/events/core.c | 34 ++++++++++++++++++++++++---------
3 files changed, 29 insertions(+), 11 deletions(-)

--
2.40.1



2024-01-15 17:01:56

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 1/3] perf: Add new late event free callback

From: Tvrtko Ursulin <[email protected]>

This allows drivers to implement a PMU with support for unbinding the
device, for example by making event->pmu reference counted on the driver
side and its lifetime matching the struct perf_event init/free.

Otherwise, if an open perf fd is kept past driver unbind, the perf code
can dereference the potentially freed struct pmu from the _free_event
steps which follow the existing destroy callback.

TODO/FIXME/QQQ:

A simpler version could be to simply move the ->destroy() callback to
later in _free_event(). However a comment there claims there are steps
which need to run after the existing destroy callbacks, hence I opted for
an initially cautious approach.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Umesh Nerlige Ramappa <[email protected]>
Cc: Aravind Iddamsetty <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 13 +++++++++++--
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5547ba68e6e4..a567d2d98be1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -799,6 +799,7 @@ struct perf_event {
struct perf_event *aux_event;

void (*destroy)(struct perf_event *);
+ void (*free)(struct perf_event *);
struct rcu_head rcu_head;

struct pid_namespace *ns;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a64165af45c1..4b62d2201ca7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5242,6 +5242,9 @@ static void _free_event(struct perf_event *event)
exclusive_event_destroy(event);
module_put(event->pmu->module);

+ if (event->free)
+ event->free(event);
+
call_rcu(&event->rcu_head, free_event_rcu);
}

@@ -11662,8 +11665,12 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
event_has_any_exclude_flag(event))
ret = -EINVAL;

- if (ret && event->destroy)
- event->destroy(event);
+ if (ret) {
+ if (event->destroy)
+ event->destroy(event);
+ if (event->free)
+ event->free(event);
+ }
}

if (ret)
@@ -12090,6 +12097,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
perf_detach_cgroup(event);
if (event->destroy)
event->destroy(event);
+ if (event->free)
+ event->free(event);
module_put(pmu->module);
err_ns:
if (event->hw.target)
--
2.40.1


2024-01-15 17:02:30

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 3/3] perf: Reference count struct perf_cpu_pmu_context to fix driver unbind

From: Tvrtko Ursulin <[email protected]>

If a PMU driver is a PCI driver which can be unbound at runtime there is
currently an use after free issue with CPU contexts when an active perf fd
is kept around.

Specifically, when perf_pmu_unregister() calls free_pmu_context() on the
PMU provided per cpu struct perf_cpu_pmu_context storage, any call path
which ends up in event_sched_out() (such as __perf_remove_from_context()
via perf_event_release_kernel()) will dereference a freed event->pmu_ctx.
Furthermore if the same percpu area has in the meantime been re-allocated,
the use after free will corrupt someone elses per cpu storage area.

To fix it we attempt to add reference counting to struct
perf_cpu_pmu_context such that the object remains until the last user
is done with it.

TODO/FIXME/QQQ:

1)

I am really not sure about the locking here. I *think* I needed a per
struct pmu counter and by looking at what find_get_pmu_context does when
it takes a slot from driver provided pmu->cpu_pmu_context under the
ctx->lock, it looked like that should be sufficient. Maybe even if not
atomic_t. Or maybe ctx->lock is not enough.

2)

I believe pmu->pmu_disable_count will need a similar treatment, but as
I wasn't sure of the locking model, or even if this all makes sense on
the high level I left it out for now. Like does the idea to reference
count even flies or a completely different solution will be needed.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Umesh Nerlige Ramappa <[email protected]>
Cc: Aravind Iddamsetty <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 21 ++++++++++++++-------
2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a567d2d98be1..bd1c8f3c1736 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -317,6 +317,7 @@ struct pmu {

int __percpu *pmu_disable_count;
struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
+ atomic_t cpu_pmu_context_refcount;
atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
int task_ctx_nr;
int hrtimer_interval_ms;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4b62d2201ca7..0c95aecf560a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4873,6 +4873,9 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
atomic_inc(&epc->refcount);
}
raw_spin_unlock_irq(&ctx->lock);
+
+ atomic_inc(&pmu->cpu_pmu_context_refcount);
+
return epc;
}

@@ -4928,6 +4931,12 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
return epc;
}

+static void put_pmu_context(struct pmu *pmu)
+{
+ if (atomic_dec_and_test(&pmu->cpu_pmu_context_refcount))
+ free_percpu(pmu->cpu_pmu_context);
+}
+
static void get_pmu_ctx(struct perf_event_pmu_context *epc)
{
WARN_ON_ONCE(!atomic_inc_not_zero(&epc->refcount));
@@ -4967,8 +4976,10 @@ static void put_pmu_ctx(struct perf_event_pmu_context *epc)

raw_spin_unlock_irqrestore(&ctx->lock, flags);

- if (epc->embedded)
+ if (epc->embedded) {
+ put_pmu_context(epc->pmu);
return;
+ }

call_rcu(&epc->rcu_head, free_epc_rcu);
}
@@ -11347,11 +11358,6 @@ static int perf_event_idx_default(struct perf_event *event)
return 0;
}

-static void free_pmu_context(struct pmu *pmu)
-{
- free_percpu(pmu->cpu_pmu_context);
-}
-
/*
* Let userspace know that this PMU supports address range filtering:
*/
@@ -11573,6 +11579,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
pmu->event_idx = perf_event_idx_default;

list_add_rcu(&pmu->entry, &pmus);
+ atomic_set(&pmu->cpu_pmu_context_refcount, 1);
atomic_set(&pmu->exclusive_cnt, 0);
ret = 0;
unlock:
@@ -11615,7 +11622,7 @@ void perf_pmu_unregister(struct pmu *pmu)
device_del(pmu->dev);
put_device(pmu->dev);
}
- free_pmu_context(pmu);
+ put_pmu_context(pmu);
mutex_unlock(&pmus_lock);
}
EXPORT_SYMBOL_GPL(perf_pmu_unregister);
--
2.40.1


2024-01-15 17:02:31

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 2/3] drm/i915/pmu: Move i915 reference drop to new event->free()

From: Tvrtko Ursulin <[email protected]>

Avoids use after free in the perf core code on the event destruction
path, after the PCI driver has been unbound with the active perf file
descriptors.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Umesh Nerlige Ramappa <[email protected]>
Cc: Aravind Iddamsetty <[email protected]>
---
drivers/gpu/drm/i915/i915_pmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 21eb0c5b320d..010763a5bc39 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -514,7 +514,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
return HRTIMER_RESTART;
}

-static void i915_pmu_event_destroy(struct perf_event *event)
+static void i915_pmu_event_free(struct perf_event *event)
{
struct i915_pmu *pmu = event_to_pmu(event);
struct drm_i915_private *i915 = pmu_to_i915(pmu);
@@ -630,7 +630,7 @@ static int i915_pmu_event_init(struct perf_event *event)

if (!event->parent) {
drm_dev_get(&i915->drm);
- event->destroy = i915_pmu_event_destroy;
+ event->free = i915_pmu_event_free;
}

return 0;
--
2.40.1