2022-06-01 18:34:20

by Christian König

Subject: Re: [Linaro-mm-sig] Re: [PATCH] dma-fence: allow dma fence to have their own lock

On 01.06.22 at 15:22, Daniel Vetter wrote:
> On Wed, Jun 01, 2022 at 02:45:42PM +0200, Christian König wrote:
>> On 31.05.22 at 04:51, Sergey Senozhatsky wrote:
>>> On (22/05/30 16:55), Christian König wrote:
>>>> On 30.05.22 at 16:22, Sergey Senozhatsky wrote:
>>>>> [SNIP]
>>>>> So the `lock` should have at least the same lifespan as the DMA fence
>>>>> that borrows it, which is impossible to guarantee in our case.
>>>> Nope, that's not correct. The lock should have at least the same lifespan as the
>>>> context of the DMA fence.
>>> How does one know when it's safe to release the context? DMA fence
>>> objects are still transparently refcount-ed and "live their own lives",
>>> how does one synchronize lifespans?
>> Well, you don't.
>>
>> If you have a dynamic context structure you need to reference count that as
>> well. In other words every time you create a fence in your context you need
>> to increment the reference count and every time a fence is released you
>> decrement it.
>>
>> If you have a static context structure like most drivers have then you must
>> make sure that all fences at least signal before you unload your driver. We
>> still somewhat have a race when you try to unload a driver and the fence_ops
>> structure suddenly disappears, but we currently live with that.
>>
>> Apart from that you are right, fences can live forever and we need to deal
>> with that.
> Yeah, this entire thing is a bit of an "oops we might have screwed up" moment.
> I think the cleanest way is to essentially do what the drm/sched code
> does, which is split the gpu job into the public dma_fence (which can live
> forever) and the internal job fence (which has to deal with all the
> resource refcounting issues). And then make sure that only ever the public
> fence escapes to places where the fence can live forever (dma_resv,
> drm_syncobj, sync_file as our uapi container objects are the prominent
> cases really).
>
> It sucks a bit.

It's actually not that bad.

See, after signaling, the dma_fence_ops structure is mostly used only for
debugging, I think, e.g. for the timeline name etc.

Christian.

> -Daniel
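
For illustration only, the reference counting described above for a dynamic
context could look roughly like the following user-space sketch. It is not
kernel code and not taken from any driver: every name in it (my_ctx,
my_fence, my_ctx_put, and so on) is hypothetical, a plain int stands in for
the spinlock the fences borrow, and C11 atomics stand in for kref. The only
point is that each fence takes a context reference on creation and drops it
on release, so the borrowed lock cannot go away while a fence still exists.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these names are kernel API. */

static atomic_int live_contexts; /* demonstration aid: contexts not yet freed */

struct my_ctx {
    atomic_int refcount; /* one ref for the owner plus one per live fence */
    int lock;            /* stand-in for the spinlock the fences borrow */
};

struct my_fence {
    atomic_int refcount;
    struct my_ctx *ctx;  /* counted reference to the context */
    int *lock;           /* borrowed pointer to ctx->lock */
};

static struct my_ctx *my_ctx_create(void)
{
    struct my_ctx *ctx = calloc(1, sizeof(*ctx));

    if (!ctx)
        return NULL;
    atomic_init(&ctx->refcount, 1); /* the creator's reference */
    atomic_fetch_add(&live_contexts, 1);
    return ctx;
}

static void my_ctx_put(struct my_ctx *ctx)
{
    int old = atomic_fetch_sub(&ctx->refcount, 1);

    assert(old > 0);
    if (old == 1) { /* last reference dropped: the lock may go away now */
        atomic_fetch_sub(&live_contexts, 1);
        free(ctx);
    }
}

static struct my_fence *my_fence_create(struct my_ctx *ctx)
{
    struct my_fence *f = calloc(1, sizeof(*f));

    if (!f)
        return NULL;
    atomic_init(&f->refcount, 1);
    atomic_fetch_add(&ctx->refcount, 1); /* fence pins its context */
    f->ctx = ctx;
    f->lock = &ctx->lock;
    return f;
}

static void my_fence_put(struct my_fence *f)
{
    if (atomic_fetch_sub(&f->refcount, 1) == 1) {
        struct my_ctx *ctx = f->ctx;

        free(f);
        my_ctx_put(ctx); /* drop the reference taken at creation */
    }
}
```

In this sketch the owner can call my_ctx_put() right after creating a fence;
the context, and with it the lock, is only freed once the last fence has been
released through my_fence_put().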