2023-07-14 08:59:44

by Asahi Lina

Subject: [PATCH 0/3] DRM scheduler documentation & bug fixes

Based on the previous discussion while I was writing the Rust
abstractions for the DRM scheduler, it looks like we're overdue for some
documentation.

This series first attempts to document what I've learned about the
scheduler and what I believe should be the *intended* lifetime
semantics, and then fixes a few bugs where the current code falls short
of those semantics:

1. DRM scheduler fences must be able to outlive the scheduler that
issued them; we cannot require the scheduler to outlive its fences.
This is non-negotiable. The whole point of these fences is to
decouple the underlying hardware/driver from consumers, such as
dma-bufs with an attached fence. If this requirement were not met,
we'd have to somehow keep the scheduler and all the driver
components associated with it alive for as long as a dma-buf with an
attached drm_sched fence is alive, which could be indefinitely, even
after the hardware that produced that dma-buf is long gone. Consider,
for example, using a hot-pluggable GPU to write to a dma-buf in main
memory, which gets presented on an integrated display controller, and
then the GPU is unplugged. That buffer could potentially live
forever; we can't block GPU driver cleanup on that.

2. Make the DRM scheduler properly clean up jobs on shutdown, so that
we can support the use case of tearing down the scheduler with
in-flight jobs. This is important to cleanly support the firmware
scheduling use case, where the DRM scheduler is attached to a file
(which we want to be able to tear down quickly when userspace closes
it) while the firmware could continue to (attempt to) run in-flight
jobs after that point. The major missing codepath to make this work is
detaching jobs from their HW fences on scheduler shutdown, so implement
that (a rough sketch of what I mean follows below this list). This also
makes writing a safe Rust abstraction plausible, since otherwise we'd
have to add a huge pile of complexity on that side to enforce the
invariant that the scheduler outlives its jobs (including setting up a
workqueue to handle scheduler teardown and other craziness, which is an
unacceptable level of complexity for what should be a lightweight
abstraction).
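
To make the shape of this concrete, here is a rough, untested sketch of
the missing teardown path (field and helper names are approximate and
the error code is only an example; patch #3 has the real version):

/* Sketch only: detach in-flight jobs from their HW fences at
 * drm_sched_fini() time and signal their scheduler fences with an
 * error, so consumers never end up waiting on a fence that nobody
 * will signal anymore.
 */
static void drm_sched_detach_pending_jobs(struct drm_gpu_scheduler *sched)
{
        struct drm_sched_job *s_job, *tmp;

        spin_lock(&sched->job_list_lock);
        list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
                struct drm_sched_fence *s_fence = s_job->s_fence;

                /* Stop watching the HW fence and drop our reference;
                 * the firmware can keep running (or abort) the job on
                 * its own without the scheduler sticking around. */
                if (s_fence->parent) {
                        dma_fence_remove_callback(s_fence->parent, &s_job->cb);
                        dma_fence_put(s_fence->parent);
                        s_fence->parent = NULL;
                }

                /* Consumers must still see completion, so signal the
                 * finished fence with an error instead of leaving it
                 * dangling forever. */
                dma_fence_set_error(&s_fence->finished, -ECANCELED);
                drm_sched_fence_finished(s_fence);
        }
        spin_unlock(&sched->job_list_lock);
}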

I believe there *may* still be at least one UAF-type bug related to case
2 above, but it's very hard to trigger and I wasn't able to figure out
what caused it the one time I saw it recently. Other than that, things
look quite robust on the Asahi driver with these patches, even when
deliberately trying to break things, e.g. by killing GPU consumers in a
tight loop. If we agree this is a good way forward, I think this is a
good start even if there's still a bug lurking somewhere.

Aside (but related to the previous discussion): the can_run_job callback
is gone; I'm using fences returned from prepare() now, and that works
well (it actually fixes a corner case related to wait contexts that I'd
missed), so hopefully that's OK with everyone ^^
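
For reference, this is the shape of what I mean: a hypothetical
driver-side sketch where my_queue_full(), my_queue_space_fence() and the
other my_* names are made up, and only the drm_sched_backend_ops hooks
are real:

/* Hypothetical driver sketch: instead of a can_run_job yes/no check,
 * .prepare_job returns a fence the scheduler waits on (in its own
 * work item, with a proper wait context) before calling .run_job.
 * The hook is called repeatedly until it returns NULL.
 */
static struct dma_fence *
my_sched_prepare_job(struct drm_sched_job *sched_job,
                     struct drm_sched_entity *s_entity)
{
        struct my_queue *queue = to_my_queue(s_entity);

        /* Firmware queue is full: hand back a fence that signals once
         * a previously submitted job retires and a slot frees up. */
        if (my_queue_full(queue))
                return my_queue_space_fence(queue);

        /* Nothing left to wait for; the job can run now. */
        return NULL;
}

static const struct drm_sched_backend_ops my_sched_ops = {
        .prepare_job  = my_sched_prepare_job,
        .run_job      = my_sched_run_job,
        .timedout_job = my_sched_timedout_job,
        .free_job     = my_sched_free_job,
};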

Changes from the previous version of patch #2: explicitly signal
detached job fences with an error. I'd missed that and I think it's what
was causing us some rare lockups due to fences never getting signaled.

Signed-off-by: Asahi Lina <[email protected]>
---
Asahi Lina (3):
drm/scheduler: Add more documentation
drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name
drm/scheduler: Clean up jobs when the scheduler is torn down.
drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
drivers/gpu/drm/scheduler/sched_fence.c | 4 +-
drivers/gpu/drm/scheduler/sched_main.c | 90 ++++++++++++++++++++++++++++++--
include/drm/gpu_scheduler.h | 5 ++
4 files changed, 99 insertions(+), 7 deletions(-)
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230714-drm-sched-fixes-94bea043bbe7

Thank you,
~~ Lina



2023-07-14 09:00:08

by Asahi Lina

Subject: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

A signaled scheduler fence can outlive its scheduler, since fences are
independently reference counted. Therefore, we can't reference the
scheduler in the get_timeline_name() implementation.

Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
dma-bufs reference fences from GPU schedulers that no longer exist.

Signed-off-by: Asahi Lina <[email protected]>
---
drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
drivers/gpu/drm/scheduler/sched_fence.c | 4 +++-
include/drm/gpu_scheduler.h | 5 +++++
3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index b2bbc8a68b30..17f35b0b005a 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -389,7 +389,12 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)

/*
* Fence is from the same scheduler, only need to wait for
- * it to be scheduled
+ * it to be scheduled.
+ *
+ * Note: s_fence->sched could have been freed and reallocated
+ * as another scheduler. This false positive case is okay, as if
+ * the old scheduler was freed all of its jobs must have
+ * signaled their completion fences.
*/
fence = dma_fence_get(&s_fence->scheduled);
dma_fence_put(entity->dependency);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index ef120475e7c6..06a0eebcca10 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -68,7 +68,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence)
static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
{
struct drm_sched_fence *fence = to_drm_sched_fence(f);
- return (const char *)fence->sched->name;
+ return (const char *)fence->sched_name;
}

static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
@@ -216,6 +216,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
unsigned seq;

fence->sched = entity->rq->sched;
+ strlcpy(fence->sched_name, entity->rq->sched->name,
+ sizeof(fence->sched_name));
seq = atomic_inc_return(&entity->fence_seq);
dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
&fence->lock, entity->fence_context, seq);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e95b4837e5a3..4fa9523bd47d 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -305,6 +305,11 @@ struct drm_sched_fence {
* @lock: the lock used by the scheduled and the finished fences.
*/
spinlock_t lock;
+ /**
+ * @sched_name: the name of the scheduler that owns this fence. We
+ * keep a copy here since fences can outlive their scheduler.
+ */
+ char sched_name[16];
/**
* @owner: job owner for debugging
*/

--
2.40.1


2023-07-14 10:23:55

by Asahi Lina

Subject: Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

On 14/07/2023 18.51, Christian König wrote:
> Am 14.07.23 um 11:44 schrieb Asahi Lina:
>> On 14/07/2023 17.43, Christian König wrote:
>>> Am 14.07.23 um 10:21 schrieb Asahi Lina:
>>>> A signaled scheduler fence can outlive its scheduler, since fences are
>>>> independencly reference counted. Therefore, we can't reference the
>>>> scheduler in the get_timeline_name() implementation.
>>>>
>>>> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
>>>> dma-bufs reference fences from GPU schedulers that no longer exist.
>>>>
>>>> Signed-off-by: Asahi Lina <[email protected]>
>>>> ---
>>>>    drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
>>>>    drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
>>>>    include/drm/gpu_scheduler.h              | 5 +++++
>>>>    3 files changed, 14 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index b2bbc8a68b30..17f35b0b005a 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -389,7 +389,12 @@ static bool
>>>> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>>>>               /*
>>>>             * Fence is from the same scheduler, only need to wait for
>>>> -         * it to be scheduled
>>>> +         * it to be scheduled.
>>>> +         *
>>>> +         * Note: s_fence->sched could have been freed and reallocated
>>>> +         * as another scheduler. This false positive case is okay,
>>>> as if
>>>> +         * the old scheduler was freed all of its jobs must have
>>>> +         * signaled their completion fences.
>>>
>>> This is outright nonsense. As long as an entity for a scheduler exists
>>> it is not allowed to free up this scheduler.
>>>
>>> So this function can't be called like this.
>>>
>>>>             */
>>>>            fence = dma_fence_get(&s_fence->scheduled);
>>>>            dma_fence_put(entity->dependency);
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c
>>>> b/drivers/gpu/drm/scheduler/sched_fence.c
>>>> index ef120475e7c6..06a0eebcca10 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>> @@ -68,7 +68,7 @@ static const char
>>>> *drm_sched_fence_get_driver_name(struct dma_fence *fence)
>>>>    static const char *drm_sched_fence_get_timeline_name(struct
>>>> dma_fence *f)
>>>>    {
>>>>        struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>>> -    return (const char *)fence->sched->name;
>>>> +    return (const char *)fence->sched_name;
>>>>    }
>>>>       static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
>>>> @@ -216,6 +216,8 @@ void drm_sched_fence_init(struct drm_sched_fence
>>>> *fence,
>>>>        unsigned seq;
>>>>           fence->sched = entity->rq->sched;
>>>> +    strlcpy(fence->sched_name, entity->rq->sched->name,
>>>> +        sizeof(fence->sched_name));
>>>>        seq = atomic_inc_return(&entity->fence_seq);
>>>>        dma_fence_init(&fence->scheduled,
>>>> &drm_sched_fence_ops_scheduled,
>>>>                   &fence->lock, entity->fence_context, seq);
>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>> index e95b4837e5a3..4fa9523bd47d 100644
>>>> --- a/include/drm/gpu_scheduler.h
>>>> +++ b/include/drm/gpu_scheduler.h
>>>> @@ -305,6 +305,11 @@ struct drm_sched_fence {
>>>>             * @lock: the lock used by the scheduled and the finished
>>>> fences.
>>>>             */
>>>>        spinlock_t            lock;
>>>> +        /**
>>>> +         * @sched_name: the name of the scheduler that owns this
>>>> fence. We
>>>> +     * keep a copy here since fences can outlive their scheduler.
>>>> +         */
>>>> +    char sched_name[16];
>>>
>>> This just mitigates the problem, but doesn't fix it.
>>
>> Could you point out any remaining issues so we can fix them? Right now
>> this absolutely *is* broken and this fixes the breakage I observed. If
>> there are other bugs remaining, I'd like to know what they are so I
>> can fix them.
>>
>>> The real issue is that the hw fence is kept around much longer than
>>> that.
>>
>> As far as I know, the whole point of scheduler fences is to decouple
>> the hw fences from the consumers.
>
> Well yes and no. The decoupling is for the signaling, it's not
> decoupling the lifetime.

When I spoke with Daniel, I understood that the intent was also to
decouple the lifetime.

>> I already talked with Daniel about this. The current behavior is
>> broken. These fences can live forever. It is broken to require that
>> they outlive the driver that produced them.
>>
>>> Additional to that I'm not willing to increase the scheduler fence size
>>> like that just to decouple them from the scheduler.
>>
>> Did you read my explanation on the cover letter as to how this is just
>> broken right now? We need to fix this. If you have a better suggestion
>> I'll take it. Doing nothing is not an option.
>
> Well this isn't broken at all. This works exactly like intended, you
> just want to use it for something it wasn't made for.
>
> That scheduler fences could be changed to outlive the scheduler which
> issued them is possible, but this is certainly a new requirement.
>
> Especially since we need to grab additional references to make sure that
> the module isn't unloaded in such a case.

Yes, that's a remaining issue. The fences need to grab a module
reference to make sure drm_sched doesn't get unloaded until they're all
really gone. I can add that in v2.
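
Something along these lines is what I have in mind for v2 (an untested
sketch; the exact hook points in sched_fence.c are approximate):

/* Sketch: pin the gpu-sched module for as long as any of its fences
 * exists, so the fence ops stay valid even after the scheduler and
 * the driver that used it are gone.
 */
struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
                                              void *owner)
{
        struct drm_sched_fence *fence;

        fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
        if (!fence)
                return NULL;

        /* Take a module reference on behalf of the new fence. */
        if (!try_module_get(THIS_MODULE)) {
                kmem_cache_free(sched_fence_slab, fence);
                return NULL;
        }

        fence->owner = owner;
        spin_lock_init(&fence->lock);

        return fence;
}

static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
{
        struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
        struct drm_sched_fence *fence = to_drm_sched_fence(f);

        kmem_cache_free(sched_fence_slab, fence);
        /* The last fence reference is gone; drm_sched can be unloaded
         * again once all fences have been freed. */
        module_put(THIS_MODULE);
}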

It would also be desirable to drop the hw fence as soon as it signals,
instead of keeping a reference to it forever.

~~ Lina


2023-07-14 10:25:18

by Christian König

Subject: Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

Am 14.07.23 um 11:44 schrieb Asahi Lina:
> On 14/07/2023 17.43, Christian König wrote:
>> Am 14.07.23 um 10:21 schrieb Asahi Lina:
>>> A signaled scheduler fence can outlive its scheduler, since fences are
>>> independencly reference counted. Therefore, we can't reference the
>>> scheduler in the get_timeline_name() implementation.
>>>
>>> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
>>> dma-bufs reference fences from GPU schedulers that no longer exist.
>>>
>>> Signed-off-by: Asahi Lina <[email protected]>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
>>>    drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
>>>    include/drm/gpu_scheduler.h              | 5 +++++
>>>    3 files changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index b2bbc8a68b30..17f35b0b005a 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -389,7 +389,12 @@ static bool
>>> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>>>               /*
>>>             * Fence is from the same scheduler, only need to wait for
>>> -         * it to be scheduled
>>> +         * it to be scheduled.
>>> +         *
>>> +         * Note: s_fence->sched could have been freed and reallocated
>>> +         * as another scheduler. This false positive case is okay,
>>> as if
>>> +         * the old scheduler was freed all of its jobs must have
>>> +         * signaled their completion fences.
>>
>> This is outright nonsense. As long as an entity for a scheduler exists
>> it is not allowed to free up this scheduler.
>>
>> So this function can't be called like this.
>>
>>>             */
>>>            fence = dma_fence_get(&s_fence->scheduled);
>>>            dma_fence_put(entity->dependency);
>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c
>>> b/drivers/gpu/drm/scheduler/sched_fence.c
>>> index ef120475e7c6..06a0eebcca10 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>> @@ -68,7 +68,7 @@ static const char
>>> *drm_sched_fence_get_driver_name(struct dma_fence *fence)
>>>    static const char *drm_sched_fence_get_timeline_name(struct
>>> dma_fence *f)
>>>    {
>>>        struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>> -    return (const char *)fence->sched->name;
>>> +    return (const char *)fence->sched_name;
>>>    }
>>>       static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
>>> @@ -216,6 +216,8 @@ void drm_sched_fence_init(struct drm_sched_fence
>>> *fence,
>>>        unsigned seq;
>>>           fence->sched = entity->rq->sched;
>>> +    strlcpy(fence->sched_name, entity->rq->sched->name,
>>> +        sizeof(fence->sched_name));
>>>        seq = atomic_inc_return(&entity->fence_seq);
>>>        dma_fence_init(&fence->scheduled,
>>> &drm_sched_fence_ops_scheduled,
>>>                   &fence->lock, entity->fence_context, seq);
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index e95b4837e5a3..4fa9523bd47d 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -305,6 +305,11 @@ struct drm_sched_fence {
>>>             * @lock: the lock used by the scheduled and the finished
>>> fences.
>>>             */
>>>        spinlock_t            lock;
>>> +        /**
>>> +         * @sched_name: the name of the scheduler that owns this
>>> fence. We
>>> +     * keep a copy here since fences can outlive their scheduler.
>>> +         */
>>> +    char sched_name[16];
>>
>> This just mitigates the problem, but doesn't fix it.
>
> Could you point out any remaining issues so we can fix them? Right now
> this absolutely *is* broken and this fixes the breakage I observed. If
> there are other bugs remaining, I'd like to know what they are so I
> can fix them.
>
>> The real issue is that the hw fence is kept around much longer than
>> that.
>
> As far as I know, the whole point of scheduler fences is to decouple
> the hw fences from the consumers.

Well yes and no. The decoupling is for the signaling, it's not
decoupling the lifetime.

> I already talked with Daniel about this. The current behavior is
> broken. These fences can live forever. It is broken to require that
> they outlive the driver that produced them.
>
>> Additional to that I'm not willing to increase the scheduler fence size
>> like that just to decouple them from the scheduler.
>
> Did you read my explanation on the cover letter as to how this is just
> broken right now? We need to fix this. If you have a better suggestion
> I'll take it. Doing nothing is not an option.

Well, this isn't broken at all. It works exactly as intended; you just
want to use it for something it wasn't made for.

Changing scheduler fences to outlive the scheduler which issued them is
possible, but that is certainly a new requirement.

Especially since we need to grab additional references to make sure that
the module isn't unloaded in such a case.

Christian.

>
> ~~ Lina
>


2023-07-14 10:31:53

by Asahi Lina

Subject: Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

On 14/07/2023 18.57, Christian König wrote:
> Am 14.07.23 um 11:49 schrieb Asahi Lina:
>> On 14/07/2023 17.43, Christian König wrote:
>>> Am 14.07.23 um 10:21 schrieb Asahi Lina:
>>>> A signaled scheduler fence can outlive its scheduler, since fences are
>>>> independencly reference counted. Therefore, we can't reference the
>>>> scheduler in the get_timeline_name() implementation.
>>>>
>>>> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
>>>> dma-bufs reference fences from GPU schedulers that no longer exist.
>>>>
>>>> Signed-off-by: Asahi Lina <[email protected]>
>>>> ---
>>>>    drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
>>>>    drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
>>>>    include/drm/gpu_scheduler.h              | 5 +++++
>>>>    3 files changed, 14 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index b2bbc8a68b30..17f35b0b005a 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -389,7 +389,12 @@ static bool
>>>> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>>>>               /*
>>>>             * Fence is from the same scheduler, only need to wait for
>>>> -         * it to be scheduled
>>>> +         * it to be scheduled.
>>>> +         *
>>>> +         * Note: s_fence->sched could have been freed and reallocated
>>>> +         * as another scheduler. This false positive case is okay,
>>>> as if
>>>> +         * the old scheduler was freed all of its jobs must have
>>>> +         * signaled their completion fences.
>>>
>>> This is outright nonsense. As long as an entity for a scheduler exists
>>> it is not allowed to free up this scheduler.
>>>
>>> So this function can't be called like this.
>>
>> As I already explained, the fences can outlive their scheduler. That
>> means *this* entity certainly exists for *this* scheduler, but the
>> *dependency* fence might have come from a past scheduler that was
>> already destroyed along with all of its entities, and its address reused.
>
> Well this is function is not about fences, this function is a callback
> for the entity.

That deals with dependency fences, which could have come from any
arbitrary source, including another entity and another scheduler.

>>
>> Christian, I'm really getting tired of your tone. I don't appreciate
>> being told my comments are "outright nonsense" when you don't even
>> take the time to understand what the issue is and what I'm trying to
>> do/document. If you aren't interested in working with me, I'm just
>> going to give up on drm_sched, wait until Rust gets workqueue support,
>> and reimplement it in Rust. You can keep your broken fence lifetime
>> semantics and I'll do my own thing.
>
> I'm certainly trying to help here, but you seem to have unrealistic
> expectations.

I don't think expecting not to be told my changes are "outright
nonsense" is an unrealistic expectation. If you think it is, maybe
that's yet another indicator of the culture problems the kernel
community has...

> I perfectly understand what you are trying to do, but you don't seem to
> understand that this functionality here isn't made for your use case.

I do, that's why I'm trying to change things. Right now, this
functionality isn't even properly documented, which is why I thought it
could be used for my use case, and slowly discovered otherwise. Daniel
suggested documenting it, then fixing the mismatches between
documentation and reality, which is what I'm doing here.

> We can adjust the functionality to better match your requirements, but
> you can't say it is broken because it doesn't work when you use it not
> in the way it is intended to be used.

I'm saying the idea that a random dma-buf holds onto a chain of
references that prevents unloading a driver module that wrote into it
(and keeps a bunch of random unrelated objects alive) is a broken state
of affairs. It may or may not trickle down to actual problems for users
(I would bet it does in some cases but I don't know for sure), but it's
a situation that doesn't make any sense.

I know I'm triggering actual breakage with my new use case due to this,
which is why I'm trying to improve things. But the current state of
affairs just doesn't make any sense even if it isn't causing kernel
oopses today with other drivers.

> You can go ahead and try to re-implement the functionality in Rust, but
> then I would reject that pointing out that this should probably be an
> extension to the existing code.

You keep rejecting my attempts at extending the existing code...

~~ Lina


2023-07-14 11:21:46

by Christian König

Subject: Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

Am 14.07.23 um 12:07 schrieb Asahi Lina:
> On 14/07/2023 18.51, Christian König wrote:
>> Am 14.07.23 um 11:44 schrieb Asahi Lina:
>>> On 14/07/2023 17.43, Christian König wrote:
>>>> Am 14.07.23 um 10:21 schrieb Asahi Lina:
>>>>> A signaled scheduler fence can outlive its scheduler, since fences
>>>>> are
>>>>> independencly reference counted. Therefore, we can't reference the
>>>>> scheduler in the get_timeline_name() implementation.
>>>>>
>>>>> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
>>>>> dma-bufs reference fences from GPU schedulers that no longer exist.
>>>>>
>>>>> Signed-off-by: Asahi Lina <[email protected]>
>>>>> ---
>>>>>     drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
>>>>>     drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
>>>>>     include/drm/gpu_scheduler.h              | 5 +++++
>>>>>     3 files changed, 14 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> index b2bbc8a68b30..17f35b0b005a 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> @@ -389,7 +389,12 @@ static bool
>>>>> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>>>>>                /*
>>>>>              * Fence is from the same scheduler, only need to wait
>>>>> for
>>>>> -         * it to be scheduled
>>>>> +         * it to be scheduled.
>>>>> +         *
>>>>> +         * Note: s_fence->sched could have been freed and
>>>>> reallocated
>>>>> +         * as another scheduler. This false positive case is okay,
>>>>> as if
>>>>> +         * the old scheduler was freed all of its jobs must have
>>>>> +         * signaled their completion fences.
>>>>
>>>> This is outright nonsense. As long as an entity for a scheduler exists
>>>> it is not allowed to free up this scheduler.
>>>>
>>>> So this function can't be called like this.
>>>>
>>>>>              */
>>>>>             fence = dma_fence_get(&s_fence->scheduled);
>>>>>             dma_fence_put(entity->dependency);
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c
>>>>> b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>> index ef120475e7c6..06a0eebcca10 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>> @@ -68,7 +68,7 @@ static const char
>>>>> *drm_sched_fence_get_driver_name(struct dma_fence *fence)
>>>>>     static const char *drm_sched_fence_get_timeline_name(struct
>>>>> dma_fence *f)
>>>>>     {
>>>>>         struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>>>> -    return (const char *)fence->sched->name;
>>>>> +    return (const char *)fence->sched_name;
>>>>>     }
>>>>>        static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
>>>>> @@ -216,6 +216,8 @@ void drm_sched_fence_init(struct drm_sched_fence
>>>>> *fence,
>>>>>         unsigned seq;
>>>>>            fence->sched = entity->rq->sched;
>>>>> +    strlcpy(fence->sched_name, entity->rq->sched->name,
>>>>> +        sizeof(fence->sched_name));
>>>>>         seq = atomic_inc_return(&entity->fence_seq);
>>>>>         dma_fence_init(&fence->scheduled,
>>>>> &drm_sched_fence_ops_scheduled,
>>>>>                    &fence->lock, entity->fence_context, seq);
>>>>> diff --git a/include/drm/gpu_scheduler.h
>>>>> b/include/drm/gpu_scheduler.h
>>>>> index e95b4837e5a3..4fa9523bd47d 100644
>>>>> --- a/include/drm/gpu_scheduler.h
>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>> @@ -305,6 +305,11 @@ struct drm_sched_fence {
>>>>>              * @lock: the lock used by the scheduled and the finished
>>>>> fences.
>>>>>              */
>>>>>         spinlock_t            lock;
>>>>> +        /**
>>>>> +         * @sched_name: the name of the scheduler that owns this
>>>>> fence. We
>>>>> +     * keep a copy here since fences can outlive their scheduler.
>>>>> +         */
>>>>> +    char sched_name[16];
>>>>
>>>> This just mitigates the problem, but doesn't fix it.
>>>
>>> Could you point out any remaining issues so we can fix them? Right now
>>> this absolutely *is* broken and this fixes the breakage I observed. If
>>> there are other bugs remaining, I'd like to know what they are so I
>>> can fix them.
>>>
>>>> The real issue is that the hw fence is kept around much longer than
>>>> that.
>>>
>>> As far as I know, the whole point of scheduler fences is to decouple
>>> the hw fences from the consumers.
>>
>> Well yes and no. The decoupling is for the signaling, it's not
>> decoupling the lifetime.
>
> When I spoke with Daniel I understood the intent was also to decouple
> the lifetime.

Yes, we discussed that a long long time ago as well.

We *wanted* to decouple the dma_fence lifetime; it's just not done that
way because of problems with that approach.

>
>>> I already talked with Daniel about this. The current behavior is
>>> broken. These fences can live forever. It is broken to require that
>>> they outlive the driver that produced them.
>>>
>>>> Additional to that I'm not willing to increase the scheduler fence
>>>> size
>>>> like that just to decouple them from the scheduler.
>>>
>>> Did you read my explanation on the cover letter as to how this is just
>>> broken right now? We need to fix this. If you have a better suggestion
>>> I'll take it. Doing nothing is not an option.
>>
>> Well this isn't broken at all. This works exactly like intended, you
>> just want to use it for something it wasn't made for.
>>
>> That scheduler fences could be changed to outlive the scheduler which
>> issued them is possible, but this is certainly a new requirement.
>>
>> Especially since we need to grab additional references to make sure that
>> the module isn't unloaded in such a case.
>
> Yes, that's a remaining issue. The fences need to grab a module
> reference to make sure drm_sched doesn't get unloaded until they're
> all really gone. I can add that in v2.

You also need to come up with an idea to prevent races with the deadline
handling. See drm_sched_fence_set_deadline_finished().

>
> It would also be desirable to drop the hw fence as soon as it signals,
> instead of keeping a reference to it forever.

Yeah, agree. Problem here again is that this is easier said than done in
a non-racy way.

Christian.

>
> ~~ Lina
>


2023-07-14 11:44:40

by Christian König

Subject: Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

Am 14.07.23 um 12:06 schrieb Asahi Lina:
> On 14/07/2023 18.57, Christian König wrote:
>> Am 14.07.23 um 11:49 schrieb Asahi Lina:
>>> On 14/07/2023 17.43, Christian König wrote:
>>>> Am 14.07.23 um 10:21 schrieb Asahi Lina:
>>>>> A signaled scheduler fence can outlive its scheduler, since fences
>>>>> are
>>>>> independencly reference counted. Therefore, we can't reference the
>>>>> scheduler in the get_timeline_name() implementation.
>>>>>
>>>>> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
>>>>> dma-bufs reference fences from GPU schedulers that no longer exist.
>>>>>
>>>>> Signed-off-by: Asahi Lina <[email protected]>
>>>>> ---
>>>>>     drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
>>>>>     drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
>>>>>     include/drm/gpu_scheduler.h              | 5 +++++
>>>>>     3 files changed, 14 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> index b2bbc8a68b30..17f35b0b005a 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> @@ -389,7 +389,12 @@ static bool
>>>>> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>>>>>                /*
>>>>>              * Fence is from the same scheduler, only need to wait
>>>>> for
>>>>> -         * it to be scheduled
>>>>> +         * it to be scheduled.
>>>>> +         *
>>>>> +         * Note: s_fence->sched could have been freed and
>>>>> reallocated
>>>>> +         * as another scheduler. This false positive case is okay,
>>>>> as if
>>>>> +         * the old scheduler was freed all of its jobs must have
>>>>> +         * signaled their completion fences.
>>>>
>>>> This is outright nonsense. As long as an entity for a scheduler exists
>>>> it is not allowed to free up this scheduler.
>>>>
>>>> So this function can't be called like this.
>>>
>>> As I already explained, the fences can outlive their scheduler. That
>>> means *this* entity certainly exists for *this* scheduler, but the
>>> *dependency* fence might have come from a past scheduler that was
>>> already destroyed along with all of its entities, and its address
>>> reused.
>>
>> Well this is function is not about fences, this function is a callback
>> for the entity.
>
> That deals with dependency fences, which could have come from any
> arbitrary source, including another entity and another scheduler.

No, they can't. Signaling must happen before things are released, even
if we allow decoupling the dma_fence from its issuer.

>
>>>
>>> Christian, I'm really getting tired of your tone. I don't appreciate
>>> being told my comments are "outright nonsense" when you don't even
>>> take the time to understand what the issue is and what I'm trying to
>>> do/document. If you aren't interested in working with me, I'm just
>>> going to give up on drm_sched, wait until Rust gets workqueue support,
>>> and reimplement it in Rust. You can keep your broken fence lifetime
>>> semantics and I'll do my own thing.
>>
>> I'm certainly trying to help here, but you seem to have unrealistic
>> expectations.
>
> I don't think expecting not to be told my changes are "outright
> nonsense" is an unrealistic expectation. If you think it is, maybe
> that's yet another indicator of the culture problems the kernel
> community has...

Well, I'm just pointing out that you don't seem to understand the
background here, and think this is a bug rather than intentional
behavior.

>
>> I perfectly understand what you are trying to do, but you don't seem to
>> understand that this functionality here isn't made for your use case.
>
> I do, that's why I'm trying to change things. Right now, this
> functionality isn't even properly documented, which is why I thought
> it could be used for my use case, and slowly discovered otherwise.
> Daniel suggested documenting it, then fixing the mismatches between
> documentation and reality, which is what I'm doing here.

Well, I've known Daniel for something like 10-15 years now, and I'm
pretty sure he meant that you should document the existing state,
because otherwise this goes against the usual patch submission approach.

>
>> We can adjust the functionality to better match your requirements, but
>> you can't say it is broken because it doesn't work when you use it not
>> in the way it is intended to be used.
>
> I'm saying the idea that a random dma-buf holds onto a chain of
> references that prevents unloading a driver module that wrote into it
> (and keeps a bunch of random unrelated objects alive) is a broken
> state of affairs.

Well no, this is intentional design. Otherwise the module, and with it
the operations pointer the fences rely on, goes away. We already
discussed that over 10 years ago when Maarten came up with the initial
dma_fence design.

The resulting problems are very well known, and I completely agree that
they are undesirable, but this is how the framework works, and not just
the scheduler but the rest of the DMA-buf framework as well.

> It may or may not trickle down to actual problems for users (I would
> bet it does in some cases but I don't know for sure), but it's a
> situation that doesn't make any sense.
>
> I know I'm triggering actual breakage with my new use case due to
> this, which is why I'm trying to improve things. But the current state
> of affairs just doesn't make any sense even if it isn't causing kernel
> oopses today with other drivers.
>
>> You can go ahead and try to re-implement the functionality in Rust, but
>> then I would reject that pointing out that this should probably be an
>> extension to the existing code.
>
> You keep rejecting my attempts at extending the existing code...

Well, I will try to improve here and push you in the right direction
instead.

Regards,
Christian.

>
> ~~ Lina
>


2023-07-14 12:24:09

by Asahi Lina

Subject: Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

On 14/07/2023 19.18, Christian König wrote:
> Am 14.07.23 um 12:06 schrieb Asahi Lina:
>> On 14/07/2023 18.57, Christian König wrote:
>>> Am 14.07.23 um 11:49 schrieb Asahi Lina:
>>>> On 14/07/2023 17.43, Christian König wrote:
>>>>> Am 14.07.23 um 10:21 schrieb Asahi Lina:
>>>>>> A signaled scheduler fence can outlive its scheduler, since fences
>>>>>> are
>>>>>> independencly reference counted. Therefore, we can't reference the
>>>>>> scheduler in the get_timeline_name() implementation.
>>>>>>
>>>>>> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
>>>>>> dma-bufs reference fences from GPU schedulers that no longer exist.
>>>>>>
>>>>>> Signed-off-by: Asahi Lina <[email protected]>
>>>>>> ---
>>>>>>     drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
>>>>>>     drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
>>>>>>     include/drm/gpu_scheduler.h              | 5 +++++
>>>>>>     3 files changed, 14 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>> index b2bbc8a68b30..17f35b0b005a 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>> @@ -389,7 +389,12 @@ static bool
>>>>>> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>>>>>>                /*
>>>>>>              * Fence is from the same scheduler, only need to wait
>>>>>> for
>>>>>> -         * it to be scheduled
>>>>>> +         * it to be scheduled.
>>>>>> +         *
>>>>>> +         * Note: s_fence->sched could have been freed and
>>>>>> reallocated
>>>>>> +         * as another scheduler. This false positive case is okay,
>>>>>> as if
>>>>>> +         * the old scheduler was freed all of its jobs must have
>>>>>> +         * signaled their completion fences.
>>>>>
>>>>> This is outright nonsense. As long as an entity for a scheduler exists
>>>>> it is not allowed to free up this scheduler.
>>>>>
>>>>> So this function can't be called like this.
>>>>
>>>> As I already explained, the fences can outlive their scheduler. That
>>>> means *this* entity certainly exists for *this* scheduler, but the
>>>> *dependency* fence might have come from a past scheduler that was
>>>> already destroyed along with all of its entities, and its address
>>>> reused.
>>>
>>> Well this is function is not about fences, this function is a callback
>>> for the entity.
>>
>> That deals with dependency fences, which could have come from any
>> arbitrary source, including another entity and another scheduler.
>
> No, they can't. Signaling is certainly mandatory to happen before things
> are released even if we allow to decouple the dma_fence from it's issuer.

That's exactly what I'm saying in my comment: the fence must be signaled
if its creator no longer exists, so it's okay to inadvertently wait on
its scheduled fence instead of its finished fence (if that one was
intended), since everything has to be signaled at that point anyway.

>>
>>>>
>>>> Christian, I'm really getting tired of your tone. I don't appreciate
>>>> being told my comments are "outright nonsense" when you don't even
>>>> take the time to understand what the issue is and what I'm trying to
>>>> do/document. If you aren't interested in working with me, I'm just
>>>> going to give up on drm_sched, wait until Rust gets workqueue support,
>>>> and reimplement it in Rust. You can keep your broken fence lifetime
>>>> semantics and I'll do my own thing.
>>>
>>> I'm certainly trying to help here, but you seem to have unrealistic
>>> expectations.
>>
>> I don't think expecting not to be told my changes are "outright
>> nonsense" is an unrealistic expectation. If you think it is, maybe
>> that's yet another indicator of the culture problems the kernel
>> community has...
>
> Well I'm just pointing out that you don't seem to understand the
> background of the things and just think this is a bug instead of
> intentional behavior.

I made a change, explained why that change works with a portion of the
existing code by updating a comment, and you called that nonsense. It's
not even a bug; I'm trying to explain why this part isn't a bug even
under the expectation that fences don't outlive the scheduler. I went
through the code trying to find problems this approach would cause, ran
into this tricky case, thought about it for a while, realized it wasn't
a problem, and figured it needed a comment.

>>> I perfectly understand what you are trying to do, but you don't seem to
>>> understand that this functionality here isn't made for your use case.
>>
>> I do, that's why I'm trying to change things. Right now, this
>> functionality isn't even properly documented, which is why I thought
>> it could be used for my use case, and slowly discovered otherwise.
>> Daniel suggested documenting it, then fixing the mismatches between
>> documentation and reality, which is what I'm doing here.
>
> Well I know Daniel for something like 10-15 years or so, I'm pretty sure
> that he meant that you document the existing state because otherwise
> this goes against usual patch submission approaches.
>
>>
>>> We can adjust the functionality to better match your requirements, but
>>> you can't say it is broken because it doesn't work when you use it not
>>> in the way it is intended to be used.
>>
>> I'm saying the idea that a random dma-buf holds onto a chain of
>> references that prevents unloading a driver module that wrote into it
>> (and keeps a bunch of random unrelated objects alive) is a broken
>> state of affairs.
>
> Well no, this is intentional design. Otherwise the module and with it
> the operations pointer the fences rely on go away.

But this is a drm_sched fence, not a driver fence. That's the point:
they should be decoupled. The driver should be free to unload, with only
drm_sched needing to stay loaded so its fences remain valid. Except
that's not what happens right now: the drm_sched fence hangs onto the hw
fence, and that whole chain is expected to keep the scheduler alive for
things not to go boom.
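
To illustrate (a deliberately naive, made-up helper; the signaling races
would still need to be handled), the scheduler fence could let go of the
hw fence as soon as it has signaled:

/* Naive illustration only: once the HW fence has signaled, the
 * scheduler fence no longer needs to keep it (and with it the
 * driver's fence code) alive for the rest of its own lifetime.
 */
static void drm_sched_fence_drop_parent(struct drm_sched_fence *s_fence)
{
        struct dma_fence *parent = s_fence->parent;

        if (parent && dma_fence_is_signaled(parent)) {
                /* Everything consumers still need lives in the
                 * scheduler fence itself, so drop the HW fence. */
                s_fence->parent = NULL;
                dma_fence_put(parent);
        }
}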

> We already discussed
> that over 10 years ago when Marten came up with the initial dma_fence
> design.
>
> The resulting problems are very well known and I completely agree that
> they are undesirable, but this is how the framework works and not just
> the scheduler but the rest of the DMA-buf framework as well.

So it's undesirable but you don't want me to change things...

>
>> It may or may not trickle down to actual problems for users (I would
>> bet it does in some cases but I don't know for sure), but it's a
>> situation that doesn't make any sense.
>>
>> I know I'm triggering actual breakage with my new use case due to
>> this, which is why I'm trying to improve things. But the current state
>> of affairs just doesn't make any sense even if it isn't causing kernel
>> oopses today with other drivers.
>>
>>> You can go ahead and try to re-implement the functionality in Rust, but
>>> then I would reject that pointing out that this should probably be an
>>> extension to the existing code.
>>
>> You keep rejecting my attempts at extending the existing code...
>
> Well I will try to improve here and push you into the right direction
> instead.

What is the right direction?

So far it's looking more and more like waiting until we get workqueues
in Rust, writing a trivial scheduler in the driver, and giving up on
this whole drm_sched thing. Please tell me if there is a better way,
because so far all you've done is tell me my attempts are not the right
way, which has demotivated me from working on drm_sched at all.

~~ Lina