2023-09-14 19:05:23

by Thomas Hellström

Subject: Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

On Thu, 2023-09-14 at 17:27 +0200, Danilo Krummrich wrote:
> On 9/14/23 13:32, Thomas Hellström wrote:
> >
> > On 9/14/23 12:57, Danilo Krummrich wrote:
> > > On 9/13/23 14:16, Danilo Krummrich wrote:
> > >
> > > <snip>
> > >
> > > > > > And validate() can remove it while still holding all dma-
> > > > > > resv locks,
> > > > > > neat!
> > > > > > However, what if two tasks are trying to lock the VA space
> > > > > > concurrently? What
> > > > > > do we do when the drm_gpuvm_bo's refcount drops to zero in
> > > > > > drm_gpuva_unlink()?
> > > > > > Are we guaranteed that at this point of time the
> > > > > > drm_gpuvm_bo is not
> > > > > > on the
> > > > > > evicted list? Because otherwise we would call
> > > > > > drm_gpuvm_bo_destroy()
> > > > > > with the
> > > > > > dma-resv lock held, which wouldn't be allowed, since
> > > > > > drm_gpuvm_bo_destroy()
> > > > > > might drop the last reference to the drm_gem_object and
> > > > > > hence we'd
> > > > > > potentially
> > > > > > free the dma-resv lock while holding it, at least if it's
> > > > > > an external
> > > > > > object.
> > > > >
> > > > > Easiest way in this scheme is to think of the lists as being
> > > > > protected
> > > > > by the vm's resv lock. That means anybody calling unlink()
> > > > > must also
> > > > > hold the vm's resv lock. (Which is OK from an UAF point of
> > > > > view, but
> > > > > perhaps not from a locking inversion POV from an async list
> > > > > update).
> > > >
> > > > This would mean that on unlink() we'd need to hold the VM's
> > > > resv lock and the
> > > > corresponding GEM's resv lock (in case they're not the same
> > > > anyways) because the
> > > > VM's resv lock would protect the external / evicted object
> > > > lists and the GEM
> > > > objects resv lock protects the GEM's list of drm_gpuvm_bos and
> > > > the
> > > > drm_gpuvm_bo's list of drm_gpuvas.
> > >
> > > As mentioned below the same applies for drm_gpuvm_bo_put() since
> > > it might
> > > destroy the vm_bo, which includes removing the vm_bo from
> > > external / evicted
> > > object lists and the GEMs list of vm_bos.
> > >
> > > As mentioned, if the GEM's dma-resv is different from the VM's
> > > dma-resv we need
> > > to take both locks. Ultimately, this would mean we need a
> > > drm_exec loop, because
> > > we can't know the order in which to take these locks. Doing a
> > > full drm_exec loop
> > > just to put() a vm_bo doesn't sound reasonable to me.
> > >
> > > Can we instead just have an internal mutex for locking the lists
> > > such that we
> > > avoid taking and dropping the spinlocks, which we use currently,
> > > in a loop?
> >
> > You'd have the same locking inversion problem with a mutex, right?
> > Since in the eviction path you have resv->mutex, from exec you have
> > resv->mutex->resv because validate would attempt to grab resv.
>
> Both lists, evict and extobj, would need to have a separate mutex,
> not a common one.
> We'd also need a dedicated GEM gpuva lock. Then the only rule would
> be that you can't
> hold the dma-resv lock when calling put(). Which I admit is not that
> nice.
>
> With the current spinlock solution drivers wouldn't need to worry
> about anything locking
> related though. So maybe I come back to your proposal of having a
> switch for external
> locking with dma-resv locks entirely. Such that with external dma-
> resv locking I skip
> all the spinlocks and add lockdep checks instead.
>
> I think that makes the most sense in terms of taking advantage of
> external dma-resv locking
> where possible and on the other hand having a self-contained solution
> if not. This should
> get all concerns out of the way, yours, Christian's and Boris'.

If we need additional locks, then yes, I'd prefer the opt-in/opt-out spinlock
solution, and check back after a while to see if we can remove either
option once most pitfalls have been hit.
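
Roughly what I have in mind, building on the gpuvm_cond_spin_lock() snippet
quoted further down (the resv_protected_lists flag and the drm_gpuvm_resv()
accessor are just placeholder names here, not necessarily what V4 ends up
with):

static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
				 spinlock_t *lock)
{
	if (gpuvm->resv_protected_lists) {
		/* Opt-out mode: no spinlock, just assert that the
		 * caller holds the VM's dma-resv. */
		dma_resv_assert_held(drm_gpuvm_resv(gpuvm));
	} else {
		spin_lock(lock);
	}
}

static void gpuvm_cond_spin_unlock(const struct drm_gpuvm *gpuvm,
				   spinlock_t *lock)
{
	if (!gpuvm->resv_protected_lists)
		spin_unlock(lock);
}

That keeps the spinlock for drivers that need async list updates, while the
resv-protected mode gets lockdep coverage essentially for free.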

Thanks,
/Thomas


>
> >
> > That said, xe currently indeed does the vm+bo exec dance on vma
> > put.
> >
> > One reason why that seemingly horrible construct is good, is that
> > when evicting an extobj and you need to access individual vmas to
> > Zap page table entries or TLB flush, those VMAs are not allowed to
> > go away (we're not refcounting them). Holding the bo resv on gpuva
> > put prevents that from happening. Possibly one could use another
> > mutex to protect the gem->vm_bo list to achieve the same, but we'd
> > need to hold it on gpuva put.
> >
> > /Thomas
> >
> >
> > >
> > > - Danilo
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > > For extobjs an outer lock would be enough in case of
> > > > > > > > Xe, but I
> > > > > > > > really would not
> > > > > > > > like to add even more complexity just to get the
> > > > > > > > spinlock out of
> > > > > > > > the way in case
> > > > > > > > the driver already has an outer lock protecting this
> > > > > > > > path.
> > > > > > >
> > > > > > > I must disagree here. These spinlocks and atomic
> > > > > > > operations are
> > > > > > > pretty
> > > > > > > costly and as discussed earlier this type of locking was
> > > > > > > the reason
> > > > > > > (at
> > > > > > > least according to the commit message) that made
> > > > > > > Christian drop the
> > > > > > > XArray
> > > > > > > use in drm_exec for the same set of objects: "The locking
> > > > > > > overhead
> > > > > > > is
> > > > > > > unecessary and measurable". IMHO the spinlock is the
> > > > > > > added
> > > > > > > complexity and a
> > > > > > > single wide lock following the drm locking guidelines set
> > > > > > > out by
> > > > > > > Daniel and
> > > > > > > David should really be the default choice with an opt-in
> > > > > > > for a
> > > > > > > spinlock if
> > > > > > > needed for async and pushing out to a wq is not an
> > > > > > > option.
> > > > > >
> > > > > > For the external object list an outer lock would work as
> > > > > > long as it's
> > > > > > not the
> > > > > > dma-resv lock of the corresponding GEM object, since here
> > > > > > we actually
> > > > > > need to
> > > > > > remove the list entry from the external object list on
> > > > > > drm_gpuvm_bo_destroy().
> > > > > > It's just a bit weird design wise that drivers would need
> > > > > > to take
> > > > > > this outer
> > > > > > lock on:
> > > > > >
> > > > > > - drm_gpuvm_bo_extobj_add()
> > > > > > - drm_gpuvm_bo_destroy()        (and hence also
> > > > > > drm_gpuvm_bo_put())
> > > > > > - drm_gpuva_unlink()            (because it needs to call
> > > > > > drm_gpuvm_bo_put())
> > > > > > - drm_gpuvm_exec_lock()
> > > > > > - drm_gpuvm_exec_lock_array()
> > > > > > - drm_gpuvm_prepare_range()
> > > > > >
> > > > > > Given that it seems reasonable to do all the required
> > > > > > locking
> > > > > > internally.
> > > > >
> > > > >  From a design POV, there has been a clear direction in XE to
> > > > > make
> > > > > things similar to mmap() / munmap(), so this outer lock,
> > > > > which in Xe is
> > > > > an rwsem, is used in a similar way as the mmap_lock. It's
> > > > > protecting
> > > > > the page-table structures and vma rb tree, the userptr
> > > > > structures and
> > > > > the extobj list. Basically it's taken early in the exec
> > > > > IOCTL, the
> > > > > VM_BIND ioctl, the compute rebind worker and the pagefault
> > > > > handler, so
> > > > > all of the above are just asserting that it is taken in the
> > > > > correct
> > > > > mode.
> > > > >
> > > > > But strictly with this scheme one could also use the vm's
> > > > > dma_resv for
> > > > > the extobj list since with drm_exec, it's locked before
> > > > > traversing the
> > > > > list.
> > > > >
> > > > > The whole point of this scheme is to rely on locks that you
> > > > > already are
> > > > > supposed to be holding for various reasons and is simple to
> > > > > comprehend.
> > > >
> > > > I don't agree that we're supposed to hold the VM's resv lock
> > > > anyways for
> > > > functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but
> > > > I'm fine using it
> > > > for that purpose nevertheless.
> > > >
> > > > >
> > > > > >
> > > > > > In order to at least place lockdep checks, the driver would
> > > > > > need to
> > > > > > supply the
> > > > > > corresponding lock's lockdep_map, because the GPUVM
> > > > > > otherwise doesn't
> > > > > > know about
> > > > > > the lock.
> > > > >
> > > > > Yes, that sounds reasonable. One lockdep map per list.
> > > >
> > > > I'd really like to avoid that, especially now that everything
> > > > got simpler. We
> > > > should define the actual locks to take instead.
> > > >
> > > > >
> > > > > >
> > > > > > Out of curiosity, what is the overhead of a spin_lock()
> > > > > > that doesn't
> > > > > > need to
> > > > > > spin?
> > > > >
> > > > > I guess it's hard to tell exactly, but it is much lower on
> > > > > modern x86
> > > > > than what it used to be. Not sure about ARM, which is the
> > > > > other
> > > > > architecture important to us. I figure if there is little
> > > > > cache-line
> > > > > bouncing the main overhead comes from the implied barriers.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > A pretty simple way that would not add much code would be
> > > > > > >
> > > > > > > static void gpuvm_cond_spin_lock(const struct drm_gpuvm
> > > > > > > *gpuvm,
> > > > > > > spinlock_t
> > > > > > > *lock)
> > > > > > >
> > > > > > > {
> > > > > > >
> > > > > > >      if (!gpuvm->resv_protected_lists)
> > > > > > >          spin_lock(lock);
> > > > > > >
> > > > > > > }
> > > > > > >
> > > > > > > > > For such drivers, that would require anybody calling
> > > > > > > > > unlink to
> > > > > > > > > hold the vm's
> > > > > > > > > resv, though.
> > > > > > > > In V4 I want to go back to having a dedicated lock for
> > > > > > > > the GEMs
> > > > > > > > gpuva list (or
> > > > > > > > VM_BO list to be more precise). We can't just use the
> > > > > > > > dma-resv
> > > > > > > > lock for that
> > > > > > > > with VM_BO abstractions, because on destruction of a
> > > > > > > > VM_BO we
> > > > > > > > otherwise wouldn't
> > > > > > > > be allowed to already hold the dma-resv lock. That's
> > > > > > > > the fix I
> > > > > > > > was referring to
> > > > > > > > earlier.
> > > > > > >
> > > > > > > Yeah, I can see the need for a dedicated lock for the
> > > > > > > GEM's gpuva
> > > > > > > list, but
> > > > > > > holding the vm's dma-resv lock across the unlink
> > > > > > > shouldn't be a
> > > > > > > problem. We
> > > > > > > may free the object and a pointer to the vm's resv during
> > > > > > > unlink
> > > > > > > but we
> > > > > > > don't free the vm's resv.  It'd be a matter of ensuring
> > > > > > > that any
> > > > > > > calls to
> > > > > > > unlink from *within* drm_gpuvm allows it to be held.
> > > > > >
> > > > > > Drivers calling unlink() from the fence signaling path
> > > > > > can't use the
> > > > > > VM's
> > > > > > dma-resv lock.
> > > > >
> > > > > Yes, that made me a bit curious because in the current
> > > > > version the code
> > > > > required the object's dma_resv for unlink() which can't be
> > > > > grabbed
> > > > > either from the fence signaling path. So are there any
> > > > > drivers actually
> > > > > wanting to do that? If so, they will either need to resort to
> > > > > the
> > > > > current spinlock solution or they will need to call unlink
> > > > > from a
> > > > > workqueue item.
> > > >
> > > > As Boris already mentioned we have the dma-resv lock by default
> > > > or a driver
> > > > specific GEM gpuva lock as opt-in. Now, we can get rid of the
> > > > latter.
> > > >
> > > > > >
> > > > > > Also, what if the object is an external object? We can't
> > > > > > use the VM's
> > > > > > dma-resv
> > > > > > lock here.
> > > > >
> > > > > Why? Typically (sync) unlink is only ever called from an
> > > > > unbind-like
> > > > > operation where it should be trivial to grab the vm's resv.
> > > > > Or, for
> > > > > that matter any outer lock protecting the extobj list. Rule
> > > > > would be
> > > > > the drm_gpuvm_bo::entry::extobj  and
> > > > > drm_gpuvm_bo::entry::evict would
> > > > > be protected by either the vm's dma_resv (or possibly an
> > > > > outer lock in
> > > > > the case of the extobj list).
> > > >
> > > > Outer lock wouldn't have been working for updates in the async
> > > > path, but
> > > > shouldn't be relevant anymore. We could use the VM's resv for
> > > > that.
> > > >
> > > > >
> > > > > >   And we can't have the GEM objs dma-resv lock held when
> > > > > > calling
> > > > > > unlink(), since unlink() calls drm_gpuvm_bo_put(), which if
> > > > > > the
> > > > > > refcount drops
> > > > > > to zero calls drm_gpuvm_bo_destroy() and
> > > > > > drm_gpuvm_bo_destroy() might
> > > > > > drop the
> > > > > > last reference of the GEM object.
> > > > >
> > > > > Yes, but this is a different problem as to what exactly
> > > > > protects
> > > > > drm_gpuvm_bo::entry::gem. Either as you suggest an internal
> > > > > per bo list
> > > > > lock, or if we want to keep the bo's dma_resv we need to
> > > > > ensure that
> > > > > the caller of dma_resv_unlock(obj->resv) actually refcounts
> > > > > its obj
> > > > > pointer, and doesn't implicitly rely on the gpuvm_bo's
> > > > > refcount (I know
> > > > > Boris didn't like that, but requiring an explicit refcount
> > > > > for a
> > > > > pointer you dereference unless you're under a lock that
> > > > > ensures keeping
> > > > > the object alive is pretty much required?) But anyway for the
> > > > > drm_gpuvm_bo::entry::gem list protection (bo resv or internal
> > > > > spinlock)
> > > > > I don't have a strong preference.
> > > >
> > > > We can keep the GEM objects dma-resv lock, however as mentioned
> > > > above
> > > > drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both
> > > > the VM's resv lock
> > > > and the GEM's resv lock in case they differ.
> > > >
> > >
> > > > > > >
> > >
> >
>


2023-09-15 03:53:23

by Danilo Krummrich

Subject: Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

On 9/14/23 19:13, Thomas Hellström wrote:
> On Thu, 2023-09-14 at 17:27 +0200, Danilo Krummrich wrote:
>> On 9/14/23 13:32, Thomas Hellström wrote:
>>>
>>> On 9/14/23 12:57, Danilo Krummrich wrote:
>>>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>>>
>>>> <snip>
>>>>
>>>>>>> And validate() can remove it while still holding all dma-
>>>>>>> resv locks,
>>>>>>> neat!
>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>> concurrently? What
>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>> drm_gpuva_unlink()?
>>>>>>> Are we guaranteed that at this point of time the
>>>>>>> drm_gpuvm_bo is not
>>>>>>> on the
>>>>>>> evicted list? Because otherwise we would call
>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>> with the
>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>> might drop the last reference to the drm_gem_object and
>>>>>>> hence we'd
>>>>>>> potentially
>>>>>>> free the dma-resv lock while holding it, at least if it's
>>>>>>> an external
>>>>>>> object.
>>>>>>
>>>>>> Easiest way in this scheme is to think of the lists as being
>>>>>> protected
>>>>>> by the vm's resv lock. That means anybody calling unlink()
>>>>>> must also
>>>>>> hold the vm's resv lock. (Which is OK from an UAF point of
>>>>>> view, but
>>>>> perhaps not from a locking inversion POV from an async list
>>>>>> update).
>>>>>
>>>>> This would mean that on unlink() we'd need to hold the VM's
>>>>> resv lock and the
>>>>> corresponding GEM's resv lock (in case they're not the same
>>>>> anyways) because the
>>>>> VM's resv lock would protect the external / evicted object
>>>>> lists and the GEM
>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and
>>>>> the
>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>
>>>> As mentioned below the same applies for drm_gpuvm_bo_put() since
>>>> it might
>>>> destroy the vm_bo, which includes removing the vm_bo from
>>>> external / evicted
>>>> object lists and the GEMs list of vm_bos.
>>>>
>>>> As mentioned, if the GEM's dma-resv is different from the VM's
>>>> dma-resv we need
>>>> to take both locks. Ultimately, this would mean we need a
>>>> drm_exec loop, because
>>>> we can't know the order in which to take these locks. Doing a
>>>> full drm_exec loop
>>>> just to put() a vm_bo doesn't sound reasonable to me.
>>>>
>>>> Can we instead just have an internal mutex for locking the lists
>>>> such that we
>>>> avoid taking and dropping the spinlocks, which we use currently,
>>>> in a loop?
>>>
>>> You'd have the same locking inversion problem with a mutex, right?
>>> Since in the eviction path you have resv->mutex, from exec you have
>>> resv->mutex->resv because validate would attempt to grab resv.
>>
>> Both lists, evict and extobj, would need to have a separate mutex,
>> not a common one.
>> We'd also need a dedicated GEM gpuva lock. Then the only rule would
>> be that you can't
>> hold the dma-resv lock when calling put(). Which I admit is not that
>> nice.
>>
>> With the current spinlock solution drivers wouldn't need to worry
>> about anything locking
>> related though. So maybe I come back to your proposal of having a
>> switch for external
>> locking with dma-resv locks entirely. Such that with external dma-
>> resv locking I skip
>> all the spinlocks and add lockdep checks instead.
>>
>> I think that makes the most sense in terms of taking advantage of
>> external dma-resv locking
>> where possible and on the other hand having a self-contained solution
>> if not. This should
>> get all concerns out of the way, yours, Christian's and Boris'.
>
> If we need additional locks, then yes, I'd prefer the opt-in/opt-out spinlock
> solution, and check back after a while to see if we can remove either
> option once most pitfalls have been hit.

Sounds good, I'll prepare this for a V4.

- Danilo

>
> Thanks,
> /Thomas
>
>
>>
>>>
>>> That said, xe currently indeed does the vm+bo exec dance on vma
>>> put.
>>>
>>> One reason why that seemingly horrible construct is good, is that
>>> when evicting an extobj and you need to access individual vmas to
>>> zap page table entries or TLB flush, those VMAs are not allowed to
>>> go away (we're not refcounting them). Holding the bo resv on gpuva
>>> put prevents that from happening. Possibly one could use another
>>> mutex to protect the gem->vm_bo list to achieve the same, but we'd
>>> need to hold it on gpuva put.
>>>
>>> /Thomas
>>>
>>>
>>>>
>>>> - Danilo
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> For extobjs an outer lock would be enough in case of
>>>>>>>>> Xe, but I
>>>>>>>>> really would not
>>>>>>>>> like to add even more complexity just to get the
>>>>>>>>> spinlock out of
>>>>>>>>> the way in case
>>>>>>>>> the driver already has an outer lock protecting this
>>>>>>>>> path.
>>>>>>>>
>>>>>>>> I must disagree here. These spinlocks and atomic
>>>>>>>> operations are
>>>>>>>> pretty
>>>>>>>> costly and as discussed earlier this type of locking was
>>>>>>>> the reason
>>>>>>>> (at
>>>>>>>> least according to the commit message) that made
>>>>>>>> Christian drop the
>>>>>>>> XArray
>>>>>>>> use in drm_exec for the same set of objects: "The locking
>>>>>>>> overhead
>>>>>>>> is
>>>>>>>> unecessary and measurable". IMHO the spinlock is the
>>>>>>>> added
>>>>>>>> complexity and a
>>>>>>>> single wide lock following the drm locking guidelines set
>>>>>>>> out by
>>>>>>>> Daniel and
>>>>>>>> David should really be the default choice with an opt-in
>>>>>>>> for a
>>>>>>>> spinlock if
>>>>>>>> needed for async and pushing out to a wq is not an
>>>>>>>> option.
>>>>>>>
>>>>>>> For the external object list an outer lock would work as
>>>>>>> long as it's
>>>>>>> not the
>>>>>>> dma-resv lock of the corresponding GEM object, since here
>>>>>>> we actually
>>>>>>> need to
>>>>>>> remove the list entry from the external object list on
>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>> It's just a bit weird design wise that drivers would need
>>>>>>> to take
>>>>>>> this outer
>>>>>>> lock on:
>>>>>>>
>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also
>>>>>>> drm_gpuvm_bo_put())
>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>> drm_gpuvm_bo_put())
>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>
>>>>>>> Given that it seems reasonable to do all the required
>>>>>>> locking
>>>>>>> internally.
>>>>>>
>>>>>>  From a design POV, there has been a clear direction in XE to
>>>>>> make
>>>>>> things similar to mmap() / munmap(), so this outer lock,
>>>>>> which in Xe is
>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's
>>>>>> protecting
>>>>>> the page-table structures and vma rb tree, the userptr
>>>>>> structures and
>>>>>> the extobj list. Basically it's taken early in the exec
>>>>>> IOCTL, the
>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault
>>>>>> handler, so
>>>>>> all of the above are just asserting that it is taken in the
>>>>>> correct
>>>>>> mode.
>>>>>>
>>>>>> But strictly with this scheme one could also use the vm's
>>>>>> dma_resv for
>>>>>> the extobj list since with drm_exec, it's locked before
>>>>>> traversing the
>>>>>> list.
>>>>>>
>>>>>> The whole point of this scheme is to rely on locks that you
>>>>>> already are
>>>>>> supposed to be holding for various reasons and is simple to
>>>>>> comprehend.
>>>>>
>>>>> I don't agree that we're supposed to hold the VM's resv lock
>>>>> anyways for
>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but
>>>>> I'm fine using it
>>>>> for that purpose nevertheless.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> In order to at least place lockdep checks, the driver would
>>>>>>> need to
>>>>>>> supply the
>>>>>>> corresponding lock's lockdep_map, because the GPUVM
>>>>>>> otherwise doesn't
>>>>>>> know about
>>>>>>> the lock.
>>>>>>
>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>
>>>>> I'd really like to avoid that, especially now that everything
>>>>> got simpler. We
>>>>> should define the actual locks to take instead.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Out of curiosity, what is the overhead of a spin_lock()
>>>>>>> that doesn't
>>>>>>> need to
>>>>>>> spin?
>>>>>>
>>>>>> I guess it's hard to tell exactly, but it is much lower on
>>>>>> modern x86
>>>>>> than what it used to be. Not sure about ARM, which is the
>>>>>> other
>>>>>> architecture important to us. I figure if there is little
>>>>>> cache-line
>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>
>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm
>>>>>>>> *gpuvm,
>>>>>>>> spinlock_t
>>>>>>>> *lock)
>>>>>>>>
>>>>>>>> {
>>>>>>>>
>>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>>          spin_lock(lock);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>>>> For such drivers, that would require anybody calling
>>>>>>>>>> unlink to
>>>>>>>>>> hold the vm's
>>>>>>>>>> resv, though.
>>>>>>>>> In V4 I want to go back to having a dedicated lock for
>>>>>>>>> the GEMs
>>>>>>>>> gpuva list (or
>>>>>>>>> VM_BO list to be more precise). We can't just use the
>>>>>>>>> dma-resv
>>>>>>>>> lock for that
>>>>>>>>> with VM_BO abstractions, because on destruction of a
>>>>>>>>> VM_BO we
>>>>>>>>> otherwise wouldn't
>>>>>>>>> be allowed to already hold the dma-resv lock. That's
>>>>>>>>> the fix I
>>>>>>>>> was referring to
>>>>>>>>> earlier.
>>>>>>>>
>>>>>>>> Yeah, I can see the need for a dedicated lock for the
>>>>>>>> GEM's gpuva
>>>>>>>> list, but
>>>>>>>> holding the vm's dma-resv lock across the unlink
>>>>>>>> shouldn't be a
>>>>>>>> problem. We
>>>>>>>> may free the object and a pointer to the vm's resv during
>>>>>>>> unlink
>>>>>>>> but we
>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring
>>>>>>>> that any
>>>>>>>> calls to
>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>
>>>>>>> Drivers calling unlink() from the fence signaling path
>>>>>>> can't use the
>>>>>>> VM's
>>>>>>> dma-resv lock.
>>>>>>
>>>>>> Yes, that made me a bit curious because in the current
>>>>>> version the code
>>>>>> required the object's dma_resv for unlink() which can't be
>>>>>> grabbed
>>>>>> either from the fence signaling path. So are there any
>>>>>> drivers actually
>>>>>> wanting to do that? If so, they will either need to resort to
>>>>>> the
>>>>>> current spinlock solution or they will need to call unlink
>>>>>> from a
>>>>>> workqueue item.
>>>>>
>>>>> As Boris already mentioned we have the dma-resv lock by default
>>>>> or a driver
>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the
>>>>> latter.
>>>>>
>>>>>>>
>>>>>>> Also, what if the object is an external object? We can't
>>>>>>> use the VM's
>>>>>>> dma-resv
>>>>>>> lock here.
>>>>>>
>>>>>> Why? Typically (sync) unlink is only ever called from an
>>>>>> unbind-like
>>>>>> operation where it should be trivial to grab the vm's resv.
>>>>>> Or, for
>>>>>> that matter any outer lock protecting the extobj list. Rule
>>>>>> would be
>>>>>> the drm_gpuvm_bo::entry::extobj  and
>>>>>> drm_gpuvm_bo::entry::evict would
>>>>>> be protected by either the vm's dma_resv (or possibly an
>>>>>> outer lock in
>>>>>> the case of the extobj list).
>>>>>
>>>>> Outer lock wouldn't have been working for updates in the async
>>>>> path, but
>>>>> shouldn't be relevant anymore. We could use the VM's resv for
>>>>> that.
>>>>>
>>>>>>
>>>>>>>   And we can't have the GEM objs dma-resv lock held when
>>>>>>> calling
>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if
>>>>>>> the
>>>>>>> refcount drops
>>>>>>> to zero calls drm_gpuvm_bo_destroy() and
>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>> drop the
>>>>>>> last reference of the GEM object.
>>>>>>
>>>>>> Yes, but this is a different problem as to what exactly
>>>>>> protects
>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal
>>>>>> per bo list
>>>>>> lock, or if we want to keep the bo's dma_resv we need to
>>>>>> ensure that
>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts
>>>>>> its obj
>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's
>>>>>> refcount (I know
>>>>>> Boris didn't like that, but requiring an explicit refcount
>>>>>> for a
>>>>>> pointer you dereference unless you're under a lock that
>>>>>> ensures keeping
>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal
>>>>>> spinlock)
>>>>>> I don't have a strong preference.
>>>>>
>>>>> We can keep the GEM objects dma-resv lock, however as mentioned
>>>>> above
>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both
>>>>> the VM's resv lock
>>>>> and the GEM's resv lock in case they differ.
>>>>>
>>>>
>>>>>>>>
>>>>
>>>
>>
>

2023-09-18 13:25:35

by Danilo Krummrich

Subject: Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

On 9/14/23 19:15, Danilo Krummrich wrote:
> On 9/14/23 19:13, Thomas Hellström wrote:
>> On Thu, 2023-09-14 at 17:27 +0200, Danilo Krummrich wrote:
>>> On 9/14/23 13:32, Thomas Hellström wrote:
>>>>
>>>> On 9/14/23 12:57, Danilo Krummrich wrote:
>>>>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>>>>
>>>>> <snip>
>>>>>
>>>>>>>> And validate() can remove it while still holding all dma-
>>>>>>>> resv locks,
>>>>>>>> neat!
>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>> concurrently? What
>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>> drm_gpuva_unlink()?
>>>>>>>> Are we guaranteed that at this point of time the
>>>>>>>> drm_gpuvm_bo is not
>>>>>>>> on the
>>>>>>>> evicted list? Because otherwise we would call
>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>> with the
>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>> might drop the last reference to the drm_gem_object and
>>>>>>>> hence we'd
>>>>>>>> potentially
>>>>>>>> free the dma-resv lock while holding it, at least if it's
>>>>>>>> an external
>>>>>>>> object.
>>>>>>>
>>>>>>> Easiest way in this scheme is to think of the lists as being
>>>>>>> protected
>>>>>>> by the vm's resv lock. That means anybody calling unlink()
>>>>>>> must also
>>>>>>> hold the vm's resv lock. (Which is OK from an UAF point of
>>>>>>> view, but
>>>>>> perhaps not from a locking inversion POV from an async list
>>>>>>> update).
>>>>>>
>>>>>> This would mean that on unlink() we'd need to hold the VM's
>>>>>> resv lock and the
>>>>>> corresponding GEM's resv lock (in case they're not the same
>>>>>> anyways) because the
>>>>>> VM's resv lock would protect the external / evicted object
>>>>>> lists and the GEM
>>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and
>>>>>> the
>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>
>>>>> As mentioned below the same applies for drm_gpuvm_bo_put() since
>>>>> it might
>>>>> destroy the vm_bo, which includes removing the vm_bo from
>>>>> external / evicted
>>>>> object lists and the GEMs list of vm_bos.
>>>>>
>>>>> As mentioned, if the GEM's dma-resv is different from the VM's
>>>>> dma-resv we need
>>>>> to take both locks. Ultimately, this would mean we need a
>>>>> drm_exec loop, because
>>>>> we can't know the order in which to take these locks. Doing a
>>>>> full drm_exec loop
>>>>> just to put() a vm_bo doesn't sound reasonable to me.
>>>>>
>>>>> Can we instead just have an internal mutex for locking the lists
>>>>> such that we
>>>>> avoid taking and dropping the spinlocks, which we use currently,
>>>>> in a loop?
>>>>
>>>> You'd have the same locking inversion problem with a mutex, right?
>>>> Since in the eviction path you have resv->mutex, from exec you have
>>>> resv->mutex->resv because validate would attempt to grab resv.
>>>
>>> Both lists, evict and extobj, would need to have a separate mutex,
>>> not a common one.
>>> We'd also need a dedicated GEM gpuva lock. Then the only rule would
>>> be that you can't
>>> hold the dma-resv lock when calling put(). Which I admit is not that
>>> nice.
>>>
>>> With the current spinlock solution drivers wouldn't need to worry
>>> about anything locking
>>> related though. So maybe I come back to your proposal of having a
>>> switch for external
>>> locking with dma-resv locks entirely. Such that with external dma-
>>> resv locking I skip
>>> all the spinlocks and add lockdep checks instead.
>>>
>>> I think that makes the most sense in terms of taking advantage of
>>> external dma-resv locking
>>> where possible and on the other hand having a self-contained solution
>>> if not. This should
>>> get all concerns out of the way, yours, Christian's and Boris'.
>>
>> If we need additional locks, then yes, I'd prefer the opt-in/opt-out spinlock
>> solution, and check back after a while to see if we can remove either
>> option once most pitfalls have been hit.
>
> Sounds good, I'll prepare this for a V4.

I was considering getting rid of the spinlocks by using SRCU for both the external and
the evicted object lists instead. That would spare us taking and dropping the spinlock in
every iteration step of a list walk, limiting it to a single srcu_read_{lock,unlock}()
pair per walk, plus, obviously, the list_add_rcu() and list_del_rcu() variants as
accessors. The accessors would probably still need a spinlock to protect against
concurrent list_add_rcu()/list_del_rcu() calls, but I don't think those are a concern.
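
Roughly what I mean (not tested; the function names, the extobj list / lock
fields, the srcu_struct placement and the drm_gpuvm_bo::obj naming below are
placeholders rather than final API):

static void
drm_gpuvm_bo_extobj_add_rcu(struct drm_gpuvm *gpuvm, struct drm_gpuvm_bo *vm_bo)
{
	/* Writers (add/remove) still serialize with a spinlock, but it's
	 * taken once per add/remove instead of once per iteration step. */
	spin_lock(&gpuvm->extobj.lock);
	list_add_rcu(&vm_bo->list.entry.extobj, &gpuvm->extobj.list);
	spin_unlock(&gpuvm->extobj.lock);
}

static int
drm_gpuvm_prepare_extobjs(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
			  unsigned int num_fences)
{
	struct drm_gpuvm_bo *vm_bo;
	int idx, ret = 0;

	/* A single srcu_read_lock()/unlock() pair per list walk; SRCU read
	 * sections may sleep, hence taking dma-resv locks inside is fine. */
	idx = srcu_read_lock(&gpuvm->list_srcu);
	list_for_each_entry_rcu(vm_bo, &gpuvm->extobj.list, list.entry.extobj,
				srcu_read_lock_held(&gpuvm->list_srcu)) {
		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
		if (ret)
			break;
	}
	srcu_read_unlock(&gpuvm->list_srcu, idx);

	return ret;
}

Removal would be list_del_rcu() under the same spinlock, and the vm_bo would
only be freed after an SRCU grace period (synchronize_srcu() or call_srcu()).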

Any concerns from your side with this variant?

>
> - Danilo
>
>>
>> Thanks,
>> /Thomas
>>
>>
>>>
>>>>
>>>> That said, xe currently indeed does the vm+bo exec dance on vma
>>>> put.
>>>>
>>>> One reason why that seemingly horrible construct is good, is that
>>>> when evicting an extobj and you need to access individual vmas to
>>>> zap page table entries or TLB flush, those VMAs are not allowed to
>>>> go away (we're not refcounting them). Holding the bo resv on gpuva
>>>> put prevents that from happening. Possibly one could use another
>>>> mutex to protect the gem->vm_bo list to achieve the same, but we'd
>>>> need to hold it on gpuva put.
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>>
>>>>> - Danilo
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For extobjs an outer lock would be enough in case of
>>>>>>>>>> Xe, but I
>>>>>>>>>> really would not
>>>>>>>>>> like to add even more complexity just to get the
>>>>>>>>>> spinlock out of
>>>>>>>>>> the way in case
>>>>>>>>>> the driver already has an outer lock protecting this
>>>>>>>>>> path.
>>>>>>>>>
>>>>>>>>> I must disagree here. These spinlocks and atomic
>>>>>>>>> operations are
>>>>>>>>> pretty
>>>>>>>>> costly and as discussed earlier this type of locking was
>>>>>>>>> the reason
>>>>>>>>> (at
>>>>>>>>> least according to the commit message) that made
>>>>>>>>> Christian drop the
>>>>>>>>> XArray
>>>>>>>>> use in drm_exec for the same set of objects: "The locking
>>>>>>>>> overhead
>>>>>>>>> is
>>>>>>>>> unecessary and measurable". IMHO the spinlock is the
>>>>>>>>> added
>>>>>>>>> complexity and a
>>>>>>>>> single wide lock following the drm locking guidelines set
>>>>>>>>> out by
>>>>>>>>> Daniel and
>>>>>>>>> David should really be the default choice with an opt-in
>>>>>>>>> for a
>>>>>>>>> spinlock if
>>>>>>>>> needed for async and pushing out to a wq is not an
>>>>>>>>> option.
>>>>>>>>
>>>>>>>> For the external object list an outer lock would work as
>>>>>>>> long as it's
>>>>>>>> not the
>>>>>>>> dma-resv lock of the corresponding GEM object, since here
>>>>>>>> we actually
>>>>>>>> need to
>>>>>>>> remove the list entry from the external object list on
>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>> It's just a bit weird design wise that drivers would need
>>>>>>>> to take
>>>>>>>> this outer
>>>>>>>> lock on:
>>>>>>>>
>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also
>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>
>>>>>>>> Given that it seems reasonable to do all the required
>>>>>>>> locking
>>>>>>>> internally.
>>>>>>>
>>>>>>>   From a design POV, there has been a clear direction in XE to
>>>>>>> make
>>>>>>> things similar to mmap() / munmap(), so this outer lock,
>>>>>>> which in Xe is
>>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's
>>>>>>> protecting
>>>>>>> the page-table structures and vma rb tree, the userptr
>>>>>>> structures and
>>>>>>> the extobj list. Basically it's taken early in the exec
>>>>>>> IOCTL, the
>>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault
>>>>>>> handler, so
>>>>>>> all of the above are just asserting that it is taken in the
>>>>>>> correct
>>>>>>> mode.
>>>>>>>
>>>>>>> But strictly with this scheme one could also use the vm's
>>>>>>> dma_resv for
>>>>>>> the extobj list since with drm_exec, it's locked before
>>>>>>> traversing the
>>>>>>> list.
>>>>>>>
>>>>>>> The whole point of this scheme is to rely on locks that you
>>>>>>> already are
>>>>>>> supposed to be holding for various reasons and is simple to
>>>>>>> comprehend.
>>>>>>
>>>>>> I don't agree that we're supposed to hold the VM's resv lock
>>>>>> anyways for
>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but
>>>>>> I'm fine using it
>>>>>> for that purpose nevertheless.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> In order to at least place lockdep checks, the driver would
>>>>>>>> need to
>>>>>>>> supply the
>>>>>>>> corresponding lock's lockdep_map, because the GPUVM
>>>>>>>> otherwise doesn't
>>>>>>>> know about
>>>>>>>> the lock.
>>>>>>>
>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>
>>>>>> I'd really like to avoid that, especially now that everything
>>>>>> got simpler. We
>>>>>> should define the actual locks to take instead.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Out of curiosity, what is the overhead of a spin_lock()
>>>>>>>> that doesn't
>>>>>>>> need to
>>>>>>>> spin?
>>>>>>>
>>>>>>> I guess it's hard to tell exactly, but it is much lower on
>>>>>>> modern x86
>>>>>>> than what it used to be. Not sure about ARM, which is the
>>>>>>> other
>>>>>>> architecture important to us. I figure if there is little
>>>>>>> cache-line
>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>
>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm
>>>>>>>>> *gpuvm,
>>>>>>>>> spinlock_t
>>>>>>>>> *lock)
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>
>>>>>>>>>       if (!gpuvm->resv_protected_lists)
>>>>>>>>>           spin_lock(lock);
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>>> For such drivers, that would require anybody calling
>>>>>>>>>>> unlink to
>>>>>>>>>>> hold the vm's
>>>>>>>>>>> resv, though.
>>>>>>>>>> In V4 I want to go back to having a dedicated lock for
>>>>>>>>>> the GEMs
>>>>>>>>>> gpuva list (or
>>>>>>>>>> VM_BO list to be more precise). We can't just use the
>>>>>>>>>> dma-resv
>>>>>>>>>> lock for that
>>>>>>>>>> with VM_BO abstractions, because on destruction of a
>>>>>>>>>> VM_BO we
>>>>>>>>>> otherwise wouldn't
>>>>>>>>>> be allowed to already hold the dma-resv lock. That's
>>>>>>>>>> the fix I
>>>>>>>>>> was referring to
>>>>>>>>>> earlier.
>>>>>>>>>
>>>>>>>>> Yeah, I can see the need for a dedicated lock for the
>>>>>>>>> GEM's gpuva
>>>>>>>>> list, but
>>>>>>>>> holding the vm's dma-resv lock across the unlink
>>>>>>>>> shouldn't be a
>>>>>>>>> problem. We
>>>>>>>>> may free the object and a pointer to the vm's resv during
>>>>>>>>> unlink
>>>>>>>>> but we
>>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring
>>>>>>>>> that any
>>>>>>>>> calls to
>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>
>>>>>>>> Drivers calling unlink() from the fence signaling path
>>>>>>>> can't use the
>>>>>>>> VM's
>>>>>>>> dma-resv lock.
>>>>>>>
>>>>>>> Yes, that made me a bit curious because in the current
>>>>>>> version the code
>>>>>>> required the object's dma_resv for unlink() which can't be
>>>>>>> grabbed
>>>>>>> either from the fence signaling path. So are there any
>>>>>>> drivers actually
>>>>>>> wanting to do that? If so, they will either need to resort to
>>>>>>> the
>>>>>>> current spinlock solution or they will need to call unlink
>>>>>>> from a
>>>>>>> workqueue item.
>>>>>>
>>>>>> As Boris already mentioned we have the dma-resv lock by default
>>>>>> or a driver
>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the
>>>>>> latter.
>>>>>>
>>>>>>>>
>>>>>>>> Also, what if the object is an external object? We can't
>>>>>>>> use the VM's
>>>>>>>> dma-resv
>>>>>>>> lock here.
>>>>>>>
>>>>>>> Why? Typically (sync) unlink is only ever called from an
>>>>>>> unbind-like
>>>>>>> operation where it should be trivial to grab the vm's resv.
>>>>>>> Or, for
>>>>>>> that matter any outer lock protecting the extobj list. Rule
>>>>>>> would be
>>>>>>> the drm_gpuvm_bo::entry::extobj  and
>>>>>>> drm_gpuvm_bo::entry::evict would
>>>>>>> be protected by either the vm's dma_resv (or possibly an
>>>>>>> outer lock in
>>>>>>> the case of the extobj list).
>>>>>>
>>>>>> Outer lock wouldn't have been working for updates in the async
>>>>>> path, but
>>>>>> shouldn't be relevant anymore. We could use the VM's resv for
>>>>>> that.
>>>>>>
>>>>>>>
>>>>>>>>    And we can't have the GEM objs dma-resv lock held when
>>>>>>>> calling
>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if
>>>>>>>> the
>>>>>>>> refcount drops
>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and
>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>> drop the
>>>>>>>> last reference of the GEM object.
>>>>>>>
>>>>>>> Yes, but this is a different problem as to what exactly
>>>>>>> protects
>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal
>>>>>>> per bo list
>>>>>>> lock, or if we want to keep the bo's dma_resv we need to
>>>>>>> ensure that
>>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts
>>>>>>> its obj
>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's
>>>>>>> refcount (I know
>>>>>>> Boris didn't like that, but requiring an explicit refcount
>>>>>>> for a
>>>>>>> pointer you dereference unless you're under a lock that
>>>>>>> ensures keeping
>>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal
>>>>>>> spinlock)
>>>>>>> I don't have a strong preference.
>>>>>>
>>>>>> We can keep the GEM objects dma-resv lock, however as mentioned
>>>>>> above
>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both
>>>>>> the VM's resv lock
>>>>>> and the GEM's resv lock in case they differ.
>>>>>>
>>>>>
>>>>>>>>>
>>>>>
>>>>
>>>
>>