2022-05-31 13:52:58

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] dma-fence: allow dma fence to have their own lock

On (22/05/30 16:55), Christian K?nig wrote:
> Am 30.05.22 um 16:22 schrieb Sergey Senozhatsky:
> > [SNIP]
> > So the `lock` should have at least same lifespan as the DMA fence
> > that borrows it, which is impossible to guarantee in our case.
>
> Nope, that's not correct. The lock should have at least same lifespan as the
> context of the DMA fence.

How does one know when it's safe to release the context? DMA fence
objects are still transparently refcount-ed and "live their own lives",
how does one synchronize lifespans?


2022-06-01 19:14:01

by Christian König

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] dma-fence: allow dma fence to have their own lock

Am 31.05.22 um 04:51 schrieb Sergey Senozhatsky:
> On (22/05/30 16:55), Christian König wrote:
>> Am 30.05.22 um 16:22 schrieb Sergey Senozhatsky:
>>> [SNIP]
>>> So the `lock` should have at least same lifespan as the DMA fence
>>> that borrows it, which is impossible to guarantee in our case.
>> Nope, that's not correct. The lock should have at least same lifespan as the
>> context of the DMA fence.
> How does one know when it's safe to release the context? DMA fence
> objects are still transparently refcount-ed and "live their own lives",
> how does one synchronize lifespans?

Well, you don't.

If you have a dynamic context structure you need to reference count that
as well. In other words every time you create a fence in your context
you need to increment the reference count and every time a fence is
release you decrement it.

If you have a static context structure like most drivers have then you
must make sure that all fences at least signal before you unload your
driver. We still somewhat have a race when you try to unload a driver
and the fence_ops structure suddenly disappear, but we currently live
with that.

Apart from that you are right, fences can live forever and we need to
deal with that.

Regards,
Christian.

> _______________________________________________
> Linaro-mm-sig mailing list -- [email protected]
> To unsubscribe send an email to [email protected]


2022-06-01 19:52:18

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] dma-fence: allow dma fence to have their own lock

On Wed, Jun 01, 2022 at 02:45:42PM +0200, Christian K?nig wrote:
> Am 31.05.22 um 04:51 schrieb Sergey Senozhatsky:
> > On (22/05/30 16:55), Christian K?nig wrote:
> > > Am 30.05.22 um 16:22 schrieb Sergey Senozhatsky:
> > > > [SNIP]
> > > > So the `lock` should have at least same lifespan as the DMA fence
> > > > that borrows it, which is impossible to guarantee in our case.
> > > Nope, that's not correct. The lock should have at least same lifespan as the
> > > context of the DMA fence.
> > How does one know when it's safe to release the context? DMA fence
> > objects are still transparently refcount-ed and "live their own lives",
> > how does one synchronize lifespans?
>
> Well, you don't.
>
> If you have a dynamic context structure you need to reference count that as
> well. In other words every time you create a fence in your context you need
> to increment the reference count and every time a fence is release you
> decrement it.
>
> If you have a static context structure like most drivers have then you must
> make sure that all fences at least signal before you unload your driver. We
> still somewhat have a race when you try to unload a driver and the fence_ops
> structure suddenly disappear, but we currently live with that.
>
> Apart from that you are right, fences can live forever and we need to deal
> with that.

Yeah this entire thing is a bit an "oops we might have screwed up" moment.
I think the cleanest way is to essentially do what the drm/sched codes
does, which is split the gpu job into the public dma_fence (which can live
forever) and the internal job fence (which has to deal with all the
resource refcounting issues). And then make sure that only ever the public
fence escapes to places where the fence can live forever (dma_resv,
drm_syncobj, sync_file as our uapi container objects are the prominent
cases really).

It sucks a bit.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2022-06-01 21:32:20

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] Re: [PATCH] dma-fence: allow dma fence to have their own lock

On (22/06/01 14:45), Christian K?nig wrote:
> Am 31.05.22 um 04:51 schrieb Sergey Senozhatsky:
> > On (22/05/30 16:55), Christian K?nig wrote:
> > > Am 30.05.22 um 16:22 schrieb Sergey Senozhatsky:
> > > > [SNIP]
> > > > So the `lock` should have at least same lifespan as the DMA fence
> > > > that borrows it, which is impossible to guarantee in our case.
> > > Nope, that's not correct. The lock should have at least same lifespan as the
> > > context of the DMA fence.
> > How does one know when it's safe to release the context? DMA fence
> > objects are still transparently refcount-ed and "live their own lives",
> > how does one synchronize lifespans?
>
> Well, you don't.
>
> If you have a dynamic context structure you need to reference count that as
> well. In other words every time you create a fence in your context you need
> to increment the reference count and every time a fence is release you
> decrement it.

OK then fence release should be able to point back to its "context"
structure. Either a "private" data in dma fence or we need to "embed"
fence into another object (refcounted) that owns the lock and provide
dma fence ops->release callback, which can container_of() to the object
that dma fence is embedded into.

I think you are suggesting the latter. Thanks for clarifications.

The limiting factor of this approach is that now our ops->release() is
under the same "pressure" as dma_fence_put()->dma_fence_release() are.
dma_fence_put() and dma_fence_release() can be called from any context,
as far as I understand, e.g. IRQ, however our normal object ->release
can schedule, we do things like synchronize_rcu() and so on. Nothing is
impossible, just saying that even this approach is not 100% perfect and
may need additional workarounds.

> If you have a static context structure like most drivers have then you must
> make sure that all fences at least signal before you unload your driver. We
> still somewhat have a race when you try to unload a driver and the fence_ops
> structure suddenly disappear, but we currently live with that.

Hmm, indeed... I didn't consider fence_ops case.

> Apart from that you are right, fences can live forever and we need to deal
> with that.

OK. I see.