2023-09-28 23:21:34

by Danilo Krummrich

Subject: [PATCH drm-misc-next v5 0/6] [RFC] DRM GPUVM features

Currently GPUVM offers common infrastructure to track GPU VA allocations
and mappings, generically connect GPU VA mappings to their backing
buffers and perform more complex mapping operations on the GPU VA space.

However, there are more design patterns commonly used by drivers, which
can potentially be generalized in order to make GPUVM represent the
basis of a VM implementation. In this context, this patch series aims at
generalizing the following elements.

1) Provide a common dma-resv for GEM objects not being used outside of
this GPU-VM.

2) Provide tracking of external GEM objects (GEM objects which are
shared with other GPU-VMs).

3) Provide functions to efficiently lock all GEM objects' dma-resv the
GPU-VM contains mappings of.

4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
of, such that validation of evicted GEM objects is accelerated.

5) Provide some convenience functions for common patterns.

The implementation introduces struct drm_gpuvm_bo, which serves as abstraction
combining a struct drm_gpuvm and struct drm_gem_object, similar to what
amdgpu does with struct amdgpu_bo_vm. While this adds a bit of complexity, it
improves the efficiency of tracking external and evicted GEM objects.
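For illustration only, a minimal sketch of how a driver could embed struct
drm_gpuvm_bo through the optional vm_bo_alloc() / vm_bo_free() callbacks added
by this series; the my_* names are placeholders and not part of the series:

  struct my_vm_bo {
          struct drm_gpuvm_bo base;
          /* driver-private per VM / BO state, e.g. page table bookkeeping */
  };

  static struct drm_gpuvm_bo *my_vm_bo_alloc(void)
  {
          struct my_vm_bo *vm_bo = kzalloc(sizeof(*vm_bo), GFP_KERNEL);

          return vm_bo ? &vm_bo->base : NULL;
  }

  static void my_vm_bo_free(struct drm_gpuvm_bo *vm_bo)
  {
          kfree(container_of(vm_bo, struct my_vm_bo, base));
  }

  static const struct drm_gpuvm_ops my_gpuvm_ops = {
          .vm_bo_alloc = my_vm_bo_alloc,
          .vm_bo_free = my_vm_bo_free,
  };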

This patch series is also available at [1].

[1] https://gitlab.freedesktop.org/nouvelles/kernel/-/commits/gpuvm-next

Changes in V2:
==============
- rename 'drm_gpuva_manager' -> 'drm_gpuvm' which generally leads to more
consistent naming
- properly separate commits (introduce common dma-resv, drm_gpuvm_bo
abstraction, etc.)
- remove maple tree for tracking external objects, use a list drm_gpuvm_bos
per drm_gpuvm instead
- rework dma-resv locking helpers (Thomas)
- add a locking helper for a given range of the VA space (Christian)
- make the GPUVA manager buildable as module, rather than drm_exec
builtin (Christian)

Changes in V3:
==============
- rename missing function and files (Boris)
- warn if vm_obj->obj != obj in drm_gpuva_link() (Boris)
- don't expose drm_gpuvm_bo_destroy() (Boris)
- unlink VM_BO from GEM in drm_gpuvm_bo_destroy() rather than
drm_gpuva_unlink() and link within drm_gpuvm_bo_obtain() to keep
drm_gpuvm_bo instances unique
- add internal locking to external and evicted object lists to support drivers
updating the VA space from within the fence signalling critical path (Boris)
- unlink external objects and evicted objects from the GPUVM's list in
drm_gpuvm_bo_destroy()
- add more documentation and fix some kernel doc issues

Changes in V4:
==============
- add a drm_gpuvm_resv() helper (Boris)
- add a drm_gpuvm::<list_name>::local_list field (Boris)
- remove drm_gpuvm_bo_get_unless_zero() helper (Boris)
- fix missing NULL assignment in get_next_vm_bo_from_list() (Boris)
- keep a drm_gem_object reference on potential vm_bo destroy (alternatively we
could free the vm_bo and drop the vm_bo's drm_gem_object reference through
async work)
- introduce DRM_GPUVM_RESV_PROTECTED flag to indicate external locking through
the corresponding dma-resv locks to optimize for drivers already holding
them when needed; add the corresponding lock_assert_held() calls (Thomas)
- make drm_gpuvm_bo_evict() per vm_bo and add a drm_gpuvm_bo_gem_evict()
helper (Thomas)
- pass a drm_gpuvm_bo in drm_gpuvm_ops::vm_bo_validate() (Thomas)
- documentation fixes

Changes in V5:
==============
- use a root drm_gem_object provided by the driver as a base for the VM's
common dma-resv (Christian)
- provide a helper to allocate a "dummy" root GEM object in case a driver
specific root GEM object isn't available
- add a dedicated patch for nouveau to make use of the GPUVM's shared dma-resv
- improve documentation (Boris)
- the following patches are removed from the series, since they already landed
in drm-misc-next
- f72c2db47080 ("drm/gpuvm: rename struct drm_gpuva_manager to struct drm_gpuvm")
- fe7acaa727e1 ("drm/gpuvm: allow building as module")
- 78f54469b871 ("drm/nouveau: uvmm: rename 'umgr' to 'base'")

Danilo Krummrich (6):
drm/gpuvm: add common dma-resv per struct drm_gpuvm
drm/gpuvm: add drm_gpuvm_flags to drm_gpuvm
drm/gpuvm: add an abstraction for a VM / BO combination
drm/gpuvm: track/lock/validate external/evicted objects
drm/nouveau: make use of the GPUVM's shared dma-resv
drm/nouveau: use GPUVM common infrastructure

drivers/gpu/drm/drm_gpuvm.c | 1036 +++++++++++++++++++++--
drivers/gpu/drm/nouveau/nouveau_bo.c | 15 +-
drivers/gpu/drm/nouveau/nouveau_bo.h | 5 +
drivers/gpu/drm/nouveau/nouveau_exec.c | 52 +-
drivers/gpu/drm/nouveau/nouveau_exec.h | 4 -
drivers/gpu/drm/nouveau/nouveau_gem.c | 10 +-
drivers/gpu/drm/nouveau/nouveau_sched.h | 4 +-
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 183 ++--
drivers/gpu/drm/nouveau/nouveau_uvmm.h | 1 -
include/drm/drm_gem.h | 32 +-
include/drm/drm_gpuvm.h | 465 +++++++++-
11 files changed, 1625 insertions(+), 182 deletions(-)


base-commit: a4ead6e37e3290cff399e2598d75e98777b69b37
--
2.41.0


2023-09-29 01:21:21

by Danilo Krummrich

Subject: [PATCH drm-misc-next v5 3/6] drm/gpuvm: add an abstraction for a VM / BO combination

This patch adds an abstraction layer between the drm_gpuva mappings of
a particular drm_gem_object and this GEM object itself. The abstraction
represents a combination of a drm_gem_object and drm_gpuvm. The
drm_gem_object holds a list of drm_gpuvm_bo structures (the structure
representing this abstraction), while each drm_gpuvm_bo contains a list of
mappings of this GEM object.

This has multiple advantages:

1) We can use the drm_gpuvm_bo structure to attach it to various lists
of the drm_gpuvm. This is useful for tracking external and evicted
objects per VM, which is introduced in subsequent patches.

2) Finding mappings of a certain drm_gem_object mapped in a certain
drm_gpuvm becomes much cheaper.

3) Drivers can derive and extend the structure to easily represent
driver specific states of a BO for a certain GPUVM.

The idea of this abstraction was taken from amdgpu, hence the credit for
this idea goes to the developers of amdgpu.
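
As a rough sketch of the resulting lifecycle in a driver's map path (closely
mirroring the updated documentation further below; any driver-side helpers
around it are placeholders):

  struct drm_gpuvm_bo *vm_bo;

  /* Returns the single unique vm_bo for this (gpuvm, obj) pair; creates
   * it and links it to the GEM's gpuva list if it doesn't exist yet.
   */
  vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);
  if (IS_ERR(vm_bo))
          return PTR_ERR(vm_bo);

  drm_gpuva_map(gpuvm, va, &op->map);
  drm_gpuva_link(va, vm_bo);      /* takes its own vm_bo reference */

  /* Drop the obtain() reference; the linked mapping keeps the vm_bo
   * alive until drm_gpuva_unlink() removes the last mapping.
   */
  drm_gpuvm_bo_put(vm_bo);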

Cc: Christian König <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
---
drivers/gpu/drm/drm_gpuvm.c | 334 +++++++++++++++++++++----
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++--
include/drm/drm_gem.h | 32 +--
include/drm/drm_gpuvm.h | 177 ++++++++++++-
4 files changed, 523 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index 6368dfdbe9dd..27100423154b 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -70,6 +70,18 @@
* &drm_gem_object, such as the &drm_gem_object containing the root page table,
* but it can also be a 'dummy' object, which can be allocated with
* drm_gpuvm_root_object_alloc().
+ *
+ * In order to connect a struct drm_gpuva to its backing &drm_gem_object each
+ * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and each
+ * &drm_gpuvm_bo contains a list of &drm_gpuva structures.
+ *
+ * A &drm_gpuvm_bo is an abstraction that represents a combination of a
+ * &drm_gpuvm and a &drm_gem_object. Every such combination should be unique.
+ * This is ensured by the API through drm_gpuvm_bo_obtain() and
+ * drm_gpuvm_bo_obtain_prealloc() which first look into the corresponding
+ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
+ * particular combination. If not existent a new instance is created and linked
+ * to the &drm_gem_object.
*/

/**
@@ -395,21 +407,28 @@
/**
* DOC: Locking
*
- * Generally, the GPU VA manager does not take care of locking itself, it is
- * the drivers responsibility to take care about locking. Drivers might want to
- * protect the following operations: inserting, removing and iterating
- * &drm_gpuva objects as well as generating all kinds of operations, such as
- * split / merge or prefetch.
- *
- * The GPU VA manager also does not take care of the locking of the backing
- * &drm_gem_object buffers GPU VA lists by itself; drivers are responsible to
- * enforce mutual exclusion using either the GEMs dma_resv lock or alternatively
- * a driver specific external lock. For the latter see also
- * drm_gem_gpuva_set_lock().
- *
- * However, the GPU VA manager contains lockdep checks to ensure callers of its
- * API hold the corresponding lock whenever the &drm_gem_objects GPU VA list is
- * accessed by functions such as drm_gpuva_link() or drm_gpuva_unlink().
+ * In terms of managing &drm_gpuva entries DRM GPUVM does not take care of
+ * locking itself, it is the drivers responsibility to take care about locking.
+ * Drivers might want to protect the following operations: inserting, removing
+ * and iterating &drm_gpuva objects as well as generating all kinds of
+ * operations, such as split / merge or prefetch.
+ *
+ * DRM GPUVM also does not take care of the locking of the backing
+ * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo abstractions by
+ * itself; drivers are responsible to enforce mutual exclusion using either the
+ * GEMs dma_resv lock or alternatively a driver specific external lock. For the
+ * latter see also drm_gem_gpuva_set_lock().
+ *
+ * However, DRM GPUVM contains lockdep checks to ensure callers of its API hold
+ * the corresponding lock whenever the &drm_gem_objects GPU VA list is accessed
+ * by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but also
+ * drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put().
+ *
+ * The latter is required since on creation and destruction of a &drm_gpuvm_bo
+ * the &drm_gpuvm_bo is attached to / removed from the &drm_gem_objects gpuva list.
+ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
+ * &drm_gem_object must be able to observe previous creations and destructions
+ * of &drm_gpuvm_bos in order to keep instances unique.
*/

/**
@@ -439,6 +458,7 @@
* {
* struct drm_gpuva_ops *ops;
* struct drm_gpuva_op *op
+ * struct drm_gpuvm_bo *vm_bo;
*
* driver_lock_va_space();
* ops = drm_gpuvm_sm_map_ops_create(gpuvm, addr, range,
@@ -446,6 +466,10 @@
* if (IS_ERR(ops))
* return PTR_ERR(ops);
*
+ * vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);
+ * if (IS_ERR(vm_bo))
+ * return PTR_ERR(vm_bo);
+ *
* drm_gpuva_for_each_op(op, ops) {
* struct drm_gpuva *va;
*
@@ -458,7 +482,7 @@
*
* driver_vm_map();
* drm_gpuva_map(gpuvm, va, &op->map);
- * drm_gpuva_link(va);
+ * drm_gpuva_link(va, vm_bo);
*
* break;
* case DRM_GPUVA_OP_REMAP: {
@@ -485,11 +509,11 @@
* driver_vm_remap();
* drm_gpuva_remap(prev, next, &op->remap);
*
- * drm_gpuva_unlink(va);
* if (prev)
- * drm_gpuva_link(prev);
+ * drm_gpuva_link(prev, va->vm_bo);
* if (next)
- * drm_gpuva_link(next);
+ * drm_gpuva_link(next, va->vm_bo);
+ * drm_gpuva_unlink(va);
*
* break;
* }
@@ -505,6 +529,7 @@
* break;
* }
* }
+ * drm_gpuvm_bo_put(vm_bo);
* driver_unlock_va_space();
*
* return 0;
@@ -514,6 +539,7 @@
*
* struct driver_context {
* struct drm_gpuvm *gpuvm;
+ * struct drm_gpuvm_bo *vm_bo;
* struct drm_gpuva *new_va;
* struct drm_gpuva *prev_va;
* struct drm_gpuva *next_va;
@@ -534,6 +560,7 @@
* struct drm_gem_object *obj, u64 offset)
* {
* struct driver_context ctx;
+ * struct drm_gpuvm_bo *vm_bo;
* struct drm_gpuva_ops *ops;
* struct drm_gpuva_op *op;
* int ret = 0;
@@ -543,16 +570,23 @@
* ctx.new_va = kzalloc(sizeof(*ctx.new_va), GFP_KERNEL);
* ctx.prev_va = kzalloc(sizeof(*ctx.prev_va), GFP_KERNEL);
* ctx.next_va = kzalloc(sizeof(*ctx.next_va), GFP_KERNEL);
- * if (!ctx.new_va || !ctx.prev_va || !ctx.next_va) {
+ * ctx.vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
+ * if (!ctx.new_va || !ctx.prev_va || !ctx.next_va || !ctx.vm_bo) {
* ret = -ENOMEM;
* goto out;
* }
*
+ * // Typically protected with a driver specific GEM gpuva lock
+ * // used in the fence signaling path for drm_gpuva_link() and
+ * // drm_gpuva_unlink(), hence pre-allocate.
+ * ctx.vm_bo = drm_gpuvm_bo_obtain_prealloc(ctx.vm_bo);
+ *
* driver_lock_va_space();
* ret = drm_gpuvm_sm_map(gpuvm, &ctx, addr, range, obj, offset);
* driver_unlock_va_space();
*
* out:
+ * drm_gpuvm_bo_put(ctx.vm_bo);
* kfree(ctx.new_va);
* kfree(ctx.prev_va);
* kfree(ctx.next_va);
@@ -565,7 +599,7 @@
*
* drm_gpuva_map(ctx->vm, ctx->new_va, &op->map);
*
- * drm_gpuva_link(ctx->new_va);
+ * drm_gpuva_link(ctx->new_va, ctx->vm_bo);
*
* // prevent the new GPUVA from being freed in
* // driver_mapping_create()
@@ -577,22 +611,23 @@
* int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
* {
* struct driver_context *ctx = __ctx;
+ * struct drm_gpuva *va = op->remap.unmap->va;
*
* drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
*
- * drm_gpuva_unlink(op->remap.unmap->va);
- * kfree(op->remap.unmap->va);
- *
* if (op->remap.prev) {
- * drm_gpuva_link(ctx->prev_va);
+ * drm_gpuva_link(ctx->prev_va, va->vm_bo);
* ctx->prev_va = NULL;
* }
*
* if (op->remap.next) {
- * drm_gpuva_link(ctx->next_va);
+ * drm_gpuva_link(ctx->next_va, va->vm_bo);
* ctx->next_va = NULL;
* }
*
+ * drm_gpuva_unlink(va);
+ * kfree(va);
+ *
* return 0;
* }
*
@@ -771,6 +806,194 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
}
EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);

+/**
+ * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
+ * @gpuvm: The &drm_gpuvm the @obj is mapped in.
+ * @obj: The &drm_gem_object being mapped in the @gpuvm.
+ *
+ * If provided by the driver, this function uses the &drm_gpuvm_ops
+ * vm_bo_alloc() callback to allocate.
+ *
+ * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
+ */
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
+ struct drm_gem_object *obj)
+{
+ const struct drm_gpuvm_ops *ops = gpuvm->ops;
+ struct drm_gpuvm_bo *vm_bo;
+
+ if (ops && ops->vm_bo_alloc)
+ vm_bo = ops->vm_bo_alloc();
+ else
+ vm_bo = kzalloc(sizeof(*vm_bo), GFP_KERNEL);
+
+ if (unlikely(!vm_bo))
+ return NULL;
+
+ vm_bo->vm = gpuvm;
+ vm_bo->obj = obj;
+
+ kref_init(&vm_bo->kref);
+ INIT_LIST_HEAD(&vm_bo->list.gpuva);
+ INIT_LIST_HEAD(&vm_bo->list.entry.gem);
+
+ drm_gem_object_get(obj);
+
+ return vm_bo;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_create);
+
+static void
+drm_gpuvm_bo_destroy(struct kref *kref)
+{
+ struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
+ kref);
+ struct drm_gpuvm *gpuvm = vm_bo->vm;
+ const struct drm_gpuvm_ops *ops = gpuvm->ops;
+ struct drm_gem_object *obj = vm_bo->obj;
+ bool lock = !drm_gpuvm_resv_protected(gpuvm);
+
+ drm_gem_gpuva_assert_lock_held(obj);
+ if (!lock)
+ drm_gpuvm_resv_assert_held(gpuvm);
+
+ list_del(&vm_bo->list.entry.gem);
+
+ drm_gem_object_put(obj);
+
+ if (ops && ops->vm_bo_free)
+ ops->vm_bo_free(vm_bo);
+ else
+ kfree(vm_bo);
+}
+
+/**
+ * drm_gpuvm_bo_put() - drop a struct drm_gpuvm_bo reference
+ * @vm_bo: the &drm_gpuvm_bo to release the reference of
+ *
+ * This releases a reference to @vm_bo.
+ *
+ * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
+ * includes removing it from the GEMs gpuva list. Hence, if a call to this
+ * function can potentially let the reference count drop to zero, the caller must
+ * hold the dma-resv or driver specific GEM gpuva lock.
+ */
+void
+drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
+{
+ if (vm_bo)
+ kref_put(&vm_bo->kref, drm_gpuvm_bo_destroy);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
+
+static struct drm_gpuvm_bo *
+__drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
+ struct drm_gem_object *obj)
+{
+ struct drm_gpuvm_bo *vm_bo;
+
+ drm_gem_gpuva_assert_lock_held(obj);
+
+ drm_gem_for_each_gpuvm_bo(vm_bo, obj)
+ if (vm_bo->vm == gpuvm)
+ return vm_bo;
+
+ return NULL;
+}
+
+/**
+ * drm_gpuvm_bo_find() - find the &drm_gpuvm_bo for the given
+ * &drm_gpuvm and &drm_gem_object
+ * @gpuvm: The &drm_gpuvm the @obj is mapped in.
+ * @obj: The &drm_gem_object being mapped in the @gpuvm.
+ *
+ * Find the &drm_gpuvm_bo representing the combination of the given
+ * &drm_gpuvm and &drm_gem_object. If found, increases the reference
+ * count of the &drm_gpuvm_bo accordingly.
+ *
+ * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
+ */
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
+ struct drm_gem_object *obj)
+{
+ struct drm_gpuvm_bo *vm_bo = __drm_gpuvm_bo_find(gpuvm, obj);
+
+ return vm_bo ? drm_gpuvm_bo_get(vm_bo) : NULL;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_find);
+
+/**
+ * drm_gpuvm_bo_obtain() - obtains an instance of the &drm_gpuvm_bo for the
+ * given &drm_gpuvm and &drm_gem_object
+ * @gpuvm: The &drm_gpuvm the @obj is mapped in.
+ * @obj: The &drm_gem_object being mapped in the @gpuvm.
+ *
+ * Find the &drm_gpuvm_bo representing the combination of the given
+ * &drm_gpuvm and &drm_gem_object. If found, increases the reference
+ * count of the &drm_gpuvm_bo accordingly. If not found, allocates a new
+ * &drm_gpuvm_bo.
+ *
+ * A new &drm_gpuvm_bo is added to the GEMs gpuva list.
+ *
+ * Returns: a pointer to the &drm_gpuvm_bo on success, an ERR_PTR on failure
+ */
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_obtain(struct drm_gpuvm *gpuvm,
+ struct drm_gem_object *obj)
+{
+ struct drm_gpuvm_bo *vm_bo;
+
+ vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
+ if (vm_bo)
+ return vm_bo;
+
+ vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
+ if (!vm_bo)
+ return ERR_PTR(-ENOMEM);
+
+ list_add_tail(&vm_bo->list.entry.gem, &obj->gpuva.list);
+
+ return vm_bo;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain);
+
+/**
+ * drm_gpuvm_bo_obtain_prealloc() - obtains an instance of the &drm_gpuvm_bo
+ * for the given &drm_gpuvm and &drm_gem_object
+ * @__vm_bo: A pre-allocated struct drm_gpuvm_bo.
+ *
+ * Find the &drm_gpuvm_bo representing the combination of the given
+ * &drm_gpuvm and &drm_gem_object. If found, increases the reference
+ * count of the found &drm_gpuvm_bo accordingly, while the @__vm_bo reference
+ * count is decreased. If not found @__vm_bo is returned without further
+ * increase of the reference count.
+ *
+ * A new &drm_gpuvm_bo is added to the GEMs gpuva list.
+ *
+ * Returns: a pointer to the found &drm_gpuvm_bo or @__vm_bo if no existing
+ * &drm_gpuvm_bo was found
+ */
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
+{
+ struct drm_gpuvm *gpuvm = __vm_bo->vm;
+ struct drm_gem_object *obj = __vm_bo->obj;
+ struct drm_gpuvm_bo *vm_bo;
+
+ vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
+ if (vm_bo) {
+ drm_gpuvm_bo_put(__vm_bo);
+ return vm_bo;
+ }
+
+ list_add_tail(&__vm_bo->list.entry.gem, &obj->gpuva.list);
+
+ return __vm_bo;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
+
static int
__drm_gpuva_insert(struct drm_gpuvm *gpuvm,
struct drm_gpuva *va)
@@ -860,24 +1083,33 @@ EXPORT_SYMBOL_GPL(drm_gpuva_remove);
/**
* drm_gpuva_link() - link a &drm_gpuva
* @va: the &drm_gpuva to link
+ * @vm_bo: the &drm_gpuvm_bo to add the &drm_gpuva to
*
- * This adds the given &va to the GPU VA list of the &drm_gem_object it is
- * associated with.
+ * This adds the given &va to the GPU VA list of the &drm_gpuvm_bo and the
+ * &drm_gpuvm_bo to the &drm_gem_object it is associated with.
+ *
+ * For every &drm_gpuva entry added to the &drm_gpuvm_bo an additional
+ * reference of the latter is taken.
*
* This function expects the caller to protect the GEM's GPUVA list against
- * concurrent access using the GEMs dma_resv lock.
+ * concurrent access using either the GEMs dma_resv lock or a driver specific
+ * lock set through drm_gem_gpuva_set_lock().
*/
void
-drm_gpuva_link(struct drm_gpuva *va)
+drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuvm_bo *vm_bo)
{
struct drm_gem_object *obj = va->gem.obj;

if (unlikely(!obj))
return;

+ WARN_ON(obj != vm_bo->obj);
drm_gem_gpuva_assert_lock_held(obj);

- list_add_tail(&va->gem.entry, &obj->gpuva.list);
+ drm_gpuvm_bo_get(vm_bo);
+
+ va->vm_bo = vm_bo;
+ list_add_tail(&va->gem.entry, &vm_bo->list.gpuva);
}
EXPORT_SYMBOL_GPL(drm_gpuva_link);

@@ -888,13 +1120,22 @@ EXPORT_SYMBOL_GPL(drm_gpuva_link);
* This removes the given &va from the GPU VA list of the &drm_gem_object it is
* associated with.
*
+ * This removes the given &va from the GPU VA list of the &drm_gpuvm_bo and
+ * the &drm_gpuvm_bo from the &drm_gem_object it is associated with in case
+ * this call unlinks the last &drm_gpuva from the &drm_gpuvm_bo.
+ *
+ * For every &drm_gpuva entry removed from the &drm_gpuvm_bo a reference of
+ * the latter is dropped.
+ *
* This function expects the caller to protect the GEM's GPUVA list against
- * concurrent access using the GEMs dma_resv lock.
+ * concurrent access using either the GEMs dma_resv lock or a driver specific
+ * lock set through drm_gem_gpuva_set_lock().
*/
void
drm_gpuva_unlink(struct drm_gpuva *va)
{
struct drm_gem_object *obj = va->gem.obj;
+ struct drm_gpuvm_bo *vm_bo = va->vm_bo;

if (unlikely(!obj))
return;
@@ -902,6 +1143,11 @@ drm_gpuva_unlink(struct drm_gpuva *va)
drm_gem_gpuva_assert_lock_held(obj);

list_del_init(&va->gem.entry);
+ va->vm_bo = NULL;
+
+ drm_gem_object_get(obj);
+ drm_gpuvm_bo_put(vm_bo);
+ drm_gem_object_put(obj);
}
EXPORT_SYMBOL_GPL(drm_gpuva_unlink);

@@ -1046,10 +1292,10 @@ drm_gpuva_remap(struct drm_gpuva *prev,
struct drm_gpuva *next,
struct drm_gpuva_op_remap *op)
{
- struct drm_gpuva *curr = op->unmap->va;
- struct drm_gpuvm *gpuvm = curr->vm;
+ struct drm_gpuva *va = op->unmap->va;
+ struct drm_gpuvm *gpuvm = va->vm;

- drm_gpuva_remove(curr);
+ drm_gpuva_remove(va);

if (op->prev) {
drm_gpuva_init_from_op(prev, op->prev);
@@ -1693,9 +1939,8 @@ drm_gpuvm_prefetch_ops_create(struct drm_gpuvm *gpuvm,
EXPORT_SYMBOL_GPL(drm_gpuvm_prefetch_ops_create);

/**
- * drm_gpuvm_gem_unmap_ops_create() - creates the &drm_gpuva_ops to unmap a GEM
- * @gpuvm: the &drm_gpuvm representing the GPU VA space
- * @obj: the &drm_gem_object to unmap
+ * drm_gpuvm_bo_unmap_ops_create() - creates the &drm_gpuva_ops to unmap a GEM
+ * @vm_bo: the &drm_gpuvm_bo abstraction
*
* This function creates a list of operations to perform unmapping for every
* GPUVA attached to a GEM.
@@ -1712,15 +1957,14 @@ EXPORT_SYMBOL_GPL(drm_gpuvm_prefetch_ops_create);
* Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
*/
struct drm_gpuva_ops *
-drm_gpuvm_gem_unmap_ops_create(struct drm_gpuvm *gpuvm,
- struct drm_gem_object *obj)
+drm_gpuvm_bo_unmap_ops_create(struct drm_gpuvm_bo *vm_bo)
{
struct drm_gpuva_ops *ops;
struct drm_gpuva_op *op;
struct drm_gpuva *va;
int ret;

- drm_gem_gpuva_assert_lock_held(obj);
+ drm_gem_gpuva_assert_lock_held(vm_bo->obj);

ops = kzalloc(sizeof(*ops), GFP_KERNEL);
if (!ops)
@@ -1728,8 +1972,8 @@ drm_gpuvm_gem_unmap_ops_create(struct drm_gpuvm *gpuvm,

INIT_LIST_HEAD(&ops->list);

- drm_gem_for_each_gpuva(va, obj) {
- op = gpuva_op_alloc(gpuvm);
+ drm_gpuvm_bo_for_each_va(va, vm_bo) {
+ op = gpuva_op_alloc(vm_bo->vm);
if (!op) {
ret = -ENOMEM;
goto err_free_ops;
@@ -1743,10 +1987,10 @@ drm_gpuvm_gem_unmap_ops_create(struct drm_gpuvm *gpuvm,
return ops;

err_free_ops:
- drm_gpuva_ops_free(gpuvm, ops);
+ drm_gpuva_ops_free(vm_bo->vm, ops);
return ERR_PTR(ret);
}
-EXPORT_SYMBOL_GPL(drm_gpuvm_gem_unmap_ops_create);
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_unmap_ops_create);

/**
* drm_gpuva_ops_free() - free the given &drm_gpuva_ops
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 93ad2ba7ec8b..4e46f850e65f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -62,6 +62,8 @@ struct bind_job_op {
enum vm_bind_op op;
u32 flags;

+ struct drm_gpuvm_bo *vm_bo;
+
struct {
u64 addr;
u64 range;
@@ -1113,22 +1115,28 @@ bind_validate_region(struct nouveau_job *job)
}

static void
-bind_link_gpuvas(struct drm_gpuva_ops *ops, struct nouveau_uvma_prealloc *new)
+bind_link_gpuvas(struct bind_job_op *bop)
{
+ struct nouveau_uvma_prealloc *new = &bop->new;
+ struct drm_gpuvm_bo *vm_bo = bop->vm_bo;
+ struct drm_gpuva_ops *ops = bop->ops;
struct drm_gpuva_op *op;

drm_gpuva_for_each_op(op, ops) {
switch (op->op) {
case DRM_GPUVA_OP_MAP:
- drm_gpuva_link(&new->map->va);
+ drm_gpuva_link(&new->map->va, vm_bo);
break;
- case DRM_GPUVA_OP_REMAP:
+ case DRM_GPUVA_OP_REMAP: {
+ struct drm_gpuva *va = op->remap.unmap->va;
+
if (op->remap.prev)
- drm_gpuva_link(&new->prev->va);
+ drm_gpuva_link(&new->prev->va, va->vm_bo);
if (op->remap.next)
- drm_gpuva_link(&new->next->va);
- drm_gpuva_unlink(op->remap.unmap->va);
+ drm_gpuva_link(&new->next->va, va->vm_bo);
+ drm_gpuva_unlink(va);
break;
+ }
case DRM_GPUVA_OP_UNMAP:
drm_gpuva_unlink(op->unmap.va);
break;
@@ -1150,10 +1158,18 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)

list_for_each_op(op, &bind_job->ops) {
if (op->op == OP_MAP) {
- op->gem.obj = drm_gem_object_lookup(job->file_priv,
- op->gem.handle);
- if (!op->gem.obj)
+ struct drm_gem_object *obj;
+
+ obj = drm_gem_object_lookup(job->file_priv,
+ op->gem.handle);
+ if (!(op->gem.obj = obj))
return -ENOENT;
+
+ dma_resv_lock(obj->resv, NULL);
+ op->vm_bo = drm_gpuvm_bo_obtain(&uvmm->base, obj);
+ dma_resv_unlock(obj->resv);
+ if (IS_ERR(op->vm_bo))
+ return PTR_ERR(op->vm_bo);
}

ret = bind_validate_op(job, op);
@@ -1364,7 +1380,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
case OP_UNMAP_SPARSE:
case OP_MAP:
case OP_UNMAP:
- bind_link_gpuvas(op->ops, &op->new);
+ bind_link_gpuvas(op);
break;
default:
break;
@@ -1511,6 +1527,12 @@ nouveau_uvmm_bind_job_free_work_fn(struct work_struct *work)
if (!IS_ERR_OR_NULL(op->ops))
drm_gpuva_ops_free(&uvmm->base, op->ops);

+ if (!IS_ERR_OR_NULL(op->vm_bo)) {
+ dma_resv_lock(obj->resv, NULL);
+ drm_gpuvm_bo_put(op->vm_bo);
+ dma_resv_unlock(obj->resv);
+ }
+
if (obj)
drm_gem_object_put(obj);
}
@@ -1776,15 +1798,18 @@ void
nouveau_uvmm_bo_map_all(struct nouveau_bo *nvbo, struct nouveau_mem *mem)
{
struct drm_gem_object *obj = &nvbo->bo.base;
+ struct drm_gpuvm_bo *vm_bo;
struct drm_gpuva *va;

dma_resv_assert_held(obj->resv);

- drm_gem_for_each_gpuva(va, obj) {
- struct nouveau_uvma *uvma = uvma_from_va(va);
+ drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
+ drm_gpuvm_bo_for_each_va(va, vm_bo) {
+ struct nouveau_uvma *uvma = uvma_from_va(va);

- nouveau_uvma_map(uvma, mem);
- drm_gpuva_invalidate(va, false);
+ nouveau_uvma_map(uvma, mem);
+ drm_gpuva_invalidate(va, false);
+ }
}
}

@@ -1792,15 +1817,18 @@ void
nouveau_uvmm_bo_unmap_all(struct nouveau_bo *nvbo)
{
struct drm_gem_object *obj = &nvbo->bo.base;
+ struct drm_gpuvm_bo *vm_bo;
struct drm_gpuva *va;

dma_resv_assert_held(obj->resv);

- drm_gem_for_each_gpuva(va, obj) {
- struct nouveau_uvma *uvma = uvma_from_va(va);
+ drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
+ drm_gpuvm_bo_for_each_va(va, vm_bo) {
+ struct nouveau_uvma *uvma = uvma_from_va(va);

- nouveau_uvma_unmap(uvma);
- drm_gpuva_invalidate(va, true);
+ nouveau_uvma_unmap(uvma);
+ drm_gpuva_invalidate(va, true);
+ }
}
}

diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index bc9f6aa2f3fe..7147978d82d8 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -571,7 +571,7 @@ int drm_gem_evict(struct drm_gem_object *obj);
* drm_gem_gpuva_init() - initialize the gpuva list of a GEM object
* @obj: the &drm_gem_object
*
- * This initializes the &drm_gem_object's &drm_gpuva list.
+ * This initializes the &drm_gem_object's &drm_gpuvm_bo list.
*
* Calling this function is only necessary for drivers intending to support the
* &drm_driver_feature DRIVER_GEM_GPUVA.
@@ -584,28 +584,28 @@ static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
}

/**
- * drm_gem_for_each_gpuva() - iternator to walk over a list of gpuvas
- * @entry__: &drm_gpuva structure to assign to in each iteration step
- * @obj__: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ * drm_gem_for_each_gpuvm_bo() - iterator to walk over a list of &drm_gpuvm_bo
+ * @entry__: &drm_gpuvm_bo structure to assign to in each iteration step
+ * @obj__: the &drm_gem_object the &drm_gpuvm_bo to walk are associated with
*
- * This iterator walks over all &drm_gpuva structures associated with the
- * &drm_gpuva_manager.
+ * This iterator walks over all &drm_gpuvm_bo structures associated with the
+ * &drm_gem_object.
*/
-#define drm_gem_for_each_gpuva(entry__, obj__) \
- list_for_each_entry(entry__, &(obj__)->gpuva.list, gem.entry)
+#define drm_gem_for_each_gpuvm_bo(entry__, obj__) \
+ list_for_each_entry(entry__, &(obj__)->gpuva.list, list.entry.gem)

/**
- * drm_gem_for_each_gpuva_safe() - iternator to safely walk over a list of
- * gpuvas
- * @entry__: &drm_gpuva structure to assign to in each iteration step
- * @next__: &next &drm_gpuva to store the next step
- * @obj__: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ * drm_gem_for_each_gpuvm_bo_safe() - iterator to safely walk over a list of
+ * &drm_gpuvm_bo
+ * @entry__: &drm_gpuvm_bo structure to assign to in each iteration step
+ * @next__: &next &drm_gpuvm_bo to store the next step
+ * @obj__: the &drm_gem_object the &drm_gpuvm_bo to walk are associated with
*
- * This iterator walks over all &drm_gpuva structures associated with the
+ * This iterator walks over all &drm_gpuvm_bo structures associated with the
* &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
* it is safe against removal of elements.
*/
-#define drm_gem_for_each_gpuva_safe(entry__, next__, obj__) \
- list_for_each_entry_safe(entry__, next__, &(obj__)->gpuva.list, gem.entry)
+#define drm_gem_for_each_gpuvm_bo_safe(entry__, next__, obj__) \
+ list_for_each_entry_safe(entry__, next__, &(obj__)->gpuva.list, list.entry.gem)

#endif /* __DRM_GEM_H__ */
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index 13539f32c2e2..7ab479153a00 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -26,12 +26,14 @@
*/

#include <linux/list.h>
+#include <linux/dma-resv.h>
#include <linux/rbtree.h>
#include <linux/types.h>

#include <drm/drm_gem.h>

struct drm_gpuvm;
+struct drm_gpuvm_bo;
struct drm_gpuvm_ops;

/**
@@ -72,6 +74,12 @@ struct drm_gpuva {
*/
struct drm_gpuvm *vm;

+ /**
+ * @vm_bo: the &drm_gpuvm_bo abstraction for the mapped
+ * &drm_gem_object
+ */
+ struct drm_gpuvm_bo *vm_bo;
+
/**
* @flags: the &drm_gpuva_flags for this mapping
*/
@@ -107,7 +115,7 @@ struct drm_gpuva {
struct drm_gem_object *obj;

/**
- * @entry: the &list_head to attach this object to a &drm_gem_object
+ * @entry: the &list_head to attach this object to a &drm_gpuvm_bo
*/
struct list_head entry;
} gem;
@@ -140,7 +148,7 @@ struct drm_gpuva {
int drm_gpuva_insert(struct drm_gpuvm *gpuvm, struct drm_gpuva *va);
void drm_gpuva_remove(struct drm_gpuva *va);

-void drm_gpuva_link(struct drm_gpuva *va);
+void drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuvm_bo *vm_bo);
void drm_gpuva_unlink(struct drm_gpuva *va);

struct drm_gpuva *drm_gpuva_find(struct drm_gpuvm *gpuvm,
@@ -187,10 +195,16 @@ static inline bool drm_gpuva_invalidated(struct drm_gpuva *va)
* enum drm_gpuvm_flags - flags for struct drm_gpuvm
*/
enum drm_gpuvm_flags {
+ /**
+ * @DRM_GPUVM_RESV_PROTECTED: GPUVM is protected externally by the
+ * GPUVM's &dma_resv lock
+ */
+ DRM_GPUVM_RESV_PROTECTED = (1 << 0),
+
/**
* @DRM_GPUVM_USERBITS: user defined bits
*/
- DRM_GPUVM_USERBITS = (1 << 0),
+ DRM_GPUVM_USERBITS = (1 << 1),
};

/**
@@ -272,6 +286,19 @@ bool drm_gpuvm_interval_empty(struct drm_gpuvm *gpuvm, u64 addr, u64 range);
struct drm_gem_object *
drm_gpuvm_root_object_alloc(struct drm_device *drm);

+/**
+ * drm_gpuvm_resv_protected() - indicates whether &DRM_GPUVM_RESV_PROTECTED is
+ * set
+ * @gpuvm: the &drm_gpuvm
+ *
+ * Returns: true if &DRM_GPUVM_RESV_PROTECTED is set, false otherwise.
+ */
+static inline bool
+drm_gpuvm_resv_protected(struct drm_gpuvm *gpuvm)
+{
+ return gpuvm->flags & DRM_GPUVM_RESV_PROTECTED;
+}
+
/**
* drm_gpuvm_resv() - returns the &drm_gpuvm's &dma_resv
* @gpuvm__: the &drm_gpuvm
@@ -290,6 +317,12 @@ drm_gpuvm_root_object_alloc(struct drm_device *drm);
*/
#define drm_gpuvm_resv_obj(gpuvm__) ((gpuvm__)->r_obj)

+#define drm_gpuvm_resv_held(gpuvm__) \
+ dma_resv_held(drm_gpuvm_resv(gpuvm__))
+
+#define drm_gpuvm_resv_assert_held(gpuvm__) \
+ dma_resv_assert_held(drm_gpuvm_resv(gpuvm__))
+
#define drm_gpuvm_resv_held(gpuvm__) \
dma_resv_held(drm_gpuvm_resv(gpuvm__))

@@ -374,6 +407,117 @@ __drm_gpuva_next(struct drm_gpuva *va)
#define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)

+/**
+ * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
+ * &drm_gem_object combination
+ *
+ * This structure is an abstraction representing a &drm_gpuvm and
+ * &drm_gem_object combination. It serves as an indirection to accelerate
+ * iterating all &drm_gpuvas within a &drm_gpuvm backed by the same
+ * &drm_gem_object.
+ *
+ * Furthermore, it is used to cache evicted GEM objects for a certain GPU-VM to
+ * accelerate validation.
+ *
+ * Typically, drivers want to create an instance of a struct drm_gpuvm_bo once
+ * a GEM object is mapped first in a GPU-VM and release the instance once the
+ * last mapping of the GEM object in this GPU-VM is unmapped.
+ */
+struct drm_gpuvm_bo {
+
+ /**
+ * @vm: The &drm_gpuvm the @obj is mapped in.
+ */
+ struct drm_gpuvm *vm;
+
+ /**
+ * @obj: The &drm_gem_object being mapped in the @vm.
+ */
+ struct drm_gem_object *obj;
+
+ /**
+ * @kref: The reference count for this &drm_gpuvm_bo.
+ */
+ struct kref kref;
+
+ /**
+ * @list: Structure containing all &list_heads.
+ */
+ struct {
+ /**
+ * @gpuva: The list of linked &drm_gpuvas.
+ */
+ struct list_head gpuva;
+
+ /**
+ * @entry: Structure containing all &list_heads serving as
+ * entry.
+ */
+ struct {
+ /**
+ * @gem: List entry to attach to the &drm_gem_objects
+ * gpuva list.
+ */
+ struct list_head gem;
+ } entry;
+ } list;
+};
+
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
+ struct drm_gem_object *obj);
+
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_obtain(struct drm_gpuvm *gpuvm,
+ struct drm_gem_object *obj);
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *vm_bo);
+
+/**
+ * drm_gpuvm_bo_get() - acquire a struct drm_gpuvm_bo reference
+ * @vm_bo: the &drm_gpuvm_bo to acquire the reference of
+ *
+ * This function acquires an additional reference to @vm_bo. It is illegal to
+ * call this without already holding a reference. No locks required.
+ */
+static inline struct drm_gpuvm_bo *
+drm_gpuvm_bo_get(struct drm_gpuvm_bo *vm_bo)
+{
+ kref_get(&vm_bo->kref);
+ return vm_bo;
+}
+
+void drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo);
+
+struct drm_gpuvm_bo *
+drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
+ struct drm_gem_object *obj);
+
+/**
+ * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
+ * @va__: &drm_gpuva structure to assign to in each iteration step
+ * @vm_bo__: the &drm_gpuvm_bo the &drm_gpuva to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gpuvm_bo.
+ */
+#define drm_gpuvm_bo_for_each_va(va__, vm_bo__) \
+ list_for_each_entry(va__, &(vm_bo__)->list.gpuva, gem.entry)
+
+/**
+ * drm_gpuvm_bo_for_each_va_safe() - iterator to safely walk over a list of
+ * &drm_gpuva
+ * @va__: &drm_gpuva structure to assign to in each iteration step
+ * @next__: &next &drm_gpuva to store the next step
+ * @vm_bo__: the &drm_gpuvm_bo the &drm_gpuva to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gpuvm_bo. It is implemented with list_for_each_entry_safe(), hence
+ * it is safe against removal of elements.
+ */
+#define drm_gpuvm_bo_for_each_va_safe(va__, next__, vm_bo__) \
+ list_for_each_entry_safe(va__, next__, &(vm_bo__)->list.gpuva, gem.entry)
+
/**
* enum drm_gpuva_op_type - GPU VA operation type
*
@@ -643,8 +787,7 @@ drm_gpuvm_prefetch_ops_create(struct drm_gpuvm *gpuvm,
u64 addr, u64 range);

struct drm_gpuva_ops *
-drm_gpuvm_gem_unmap_ops_create(struct drm_gpuvm *gpuvm,
- struct drm_gem_object *obj);
+drm_gpuvm_bo_unmap_ops_create(struct drm_gpuvm_bo *vm_bo);

void drm_gpuva_ops_free(struct drm_gpuvm *gpuvm,
struct drm_gpuva_ops *ops);
@@ -688,6 +831,30 @@ struct drm_gpuvm_ops {
*/
void (*op_free)(struct drm_gpuva_op *op);

+ /**
+ * @vm_bo_alloc: called when the &drm_gpuvm allocates
+ * a struct drm_gpuvm_bo
+ *
+ * Some drivers may want to embed struct drm_gpuvm_bo into driver
+ * specific structures. By implementing this callback drivers can
+ * allocate memory accordingly.
+ *
+ * This callback is optional.
+ */
+ struct drm_gpuvm_bo *(*vm_bo_alloc)(void);
+
+ /**
+ * @vm_bo_free: called when the &drm_gpuvm frees a
+ * struct drm_gpuvm_bo
+ *
+ * Some drivers may want to embed struct drm_gpuvm_bo into driver
+ * specific structures. By implementing this callback drivers can
+ * free the previously allocated memory accordingly.
+ *
+ * This callback is optional.
+ */
+ void (*vm_bo_free)(struct drm_gpuvm_bo *vm_bo);
+
/**
* @sm_step_map: called from &drm_gpuvm_sm_map to finally insert the
* mapping once all previous steps were completed
--
2.41.0

2023-09-29 01:26:30

by Danilo Krummrich

Subject: [PATCH drm-misc-next v5 5/6] drm/nouveau: make use of the GPUVM's shared dma-resv

DRM GEM objects private to a single GPUVM can use a shared dma-resv.
Make use of the shared dma-resv of GPUVM rather than a driver specific
one.

The shared dma-resv originates from a "root" GEM object serving as
container for the dma-resv to make it compatible with drm_exec.

In order to make sure the object providing the shared dma-resv can't be
freed up before the objects making use of it, let every such GEM object
take a reference on it.
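
As a minimal sketch of this pattern (the nouveau specifics follow below; struct
my_bo, my_bo_create() and the r_obj field are placeholders here):

  static struct my_bo *
  my_bo_new_private(struct drm_device *drm, struct drm_gpuvm *gpuvm, size_t size)
  {
          /* VM-private BOs share the GPUVM's dma-resv instead of getting
           * their own one.
           */
          struct my_bo *bo = my_bo_create(drm, size, drm_gpuvm_resv(gpuvm));

          if (IS_ERR(bo))
                  return bo;

          /* Keep the resv-providing root object alive for as long as any
           * BO derives its dma-resv from it; the reference is dropped
           * again on BO destruction.
           */
          bo->r_obj = drm_gpuvm_resv_obj(gpuvm);
          drm_gem_object_get(bo->r_obj);

          return bo;
  }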

Signed-off-by: Danilo Krummrich <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 11 +++++++++--
drivers/gpu/drm/nouveau/nouveau_bo.h | 5 +++++
drivers/gpu/drm/nouveau/nouveau_gem.c | 10 ++++++++--
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 7 ++-----
drivers/gpu/drm/nouveau/nouveau_uvmm.h | 1 -
5 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 19cab37ac69c..dbb3facfd23d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -148,10 +148,17 @@ nouveau_bo_del_ttm(struct ttm_buffer_object *bo)
* If nouveau_bo_new() allocated this buffer, the GEM object was never
* initialized, so don't attempt to release it.
*/
- if (bo->base.dev)
+ if (bo->base.dev) {
+ /* Gem objects not being shared with other VMs get their
+ * dma_resv from a root GEM object.
+ */
+ if (nvbo->no_share)
+ drm_gem_object_put(nvbo->r_obj);
+
drm_gem_object_release(&bo->base);
- else
+ } else {
dma_resv_fini(&bo->base._resv);
+ }

kfree(nvbo);
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index 07f671cf895e..70c551921a9e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -26,6 +26,11 @@ struct nouveau_bo {
struct list_head entry;
int pbbo_index;
bool validate_mapped;
+
+ /* Root GEM object we derive the dma_resv from in case this BO is not
+ * shared between VMs.
+ */
+ struct drm_gem_object *r_obj;
bool no_share;

/* GPU address space is independent of CPU word size */
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index c0b10d8d3d03..7715baf85c7e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -111,7 +111,8 @@ nouveau_gem_object_open(struct drm_gem_object *gem, struct drm_file *file_priv)
if (vmm->vmm.object.oclass < NVIF_CLASS_VMM_NV50)
return 0;

- if (nvbo->no_share && uvmm && &uvmm->resv != nvbo->bo.base.resv)
+ if (nvbo->no_share && uvmm &&
+ drm_gpuvm_resv(&uvmm->base) != nvbo->bo.base.resv)
return -EPERM;

ret = ttm_bo_reserve(&nvbo->bo, false, false, NULL);
@@ -245,7 +246,7 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 size, int align, uint32_t domain,
if (unlikely(!uvmm))
return -EINVAL;

- resv = &uvmm->resv;
+ resv = drm_gpuvm_resv(&uvmm->base);
}

if (!(domain & (NOUVEAU_GEM_DOMAIN_VRAM | NOUVEAU_GEM_DOMAIN_GART)))
@@ -288,6 +289,11 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 size, int align, uint32_t domain,
if (drm->client.device.info.family >= NV_DEVICE_INFO_V0_TESLA)
nvbo->valid_domains &= domain;

+ if (nvbo->no_share) {
+ nvbo->r_obj = drm_gpuvm_resv_obj(&uvmm->base);
+ drm_gem_object_get(nvbo->r_obj);
+ }
+
*pnvbo = nvbo;
return 0;
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 4e46f850e65f..436b0ac74ffe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -1841,7 +1841,6 @@ nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
int ret;

mutex_init(&uvmm->mutex);
- dma_resv_init(&uvmm->resv);
mt_init_flags(&uvmm->region_mt, MT_FLAGS_LOCK_EXTERN);
mt_set_external_lock(&uvmm->region_mt, &uvmm->mutex);

@@ -1884,14 +1883,14 @@ nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
kernel_managed_addr, kernel_managed_size,
NULL, 0, &cli->uvmm.vmm.vmm);
if (ret)
- goto out_free_gpuva_mgr;
+ goto out_gpuvm_fini;

cli->uvmm.vmm.cli = cli;
mutex_unlock(&cli->mutex);

return 0;

-out_free_gpuva_mgr:
+out_gpuvm_fini:
drm_gpuvm_destroy(&uvmm->base);
out_unlock:
mutex_unlock(&cli->mutex);
@@ -1949,6 +1948,4 @@ nouveau_uvmm_fini(struct nouveau_uvmm *uvmm)
nouveau_vmm_fini(&uvmm->vmm);
drm_gpuvm_destroy(&uvmm->base);
mutex_unlock(&cli->mutex);
-
- dma_resv_fini(&uvmm->resv);
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.h b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
index a308c59760a5..878cc7958483 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
@@ -12,7 +12,6 @@ struct nouveau_uvmm {
struct nouveau_vmm vmm;
struct maple_tree region_mt;
struct mutex mutex;
- struct dma_resv resv;

u64 kernel_managed_addr;
u64 kernel_managed_size;
--
2.41.0

2023-09-29 04:04:13

by Danilo Krummrich

Subject: [PATCH drm-misc-next v5 6/6] drm/nouveau: use GPUVM common infrastructure

GPUVM provides common infrastructure to track external and evicted GEM
objects as well as locking and validation helpers.

Especially external and evicted object tracking is a huge improvement
compared to the current brute force approach of iterating all mappings
in order to lock and validate the GPUVM's GEM objects. Hence, make use of
it.
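
A rough sketch of the hook-up this requires on the driver side (mirroring the
nouveau changes below; the my_* names are placeholders):

  static int my_vm_bo_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
  {
          /* Driver-specific revalidation of the evicted backing BO. */
          return my_bo_validate(vm_bo->obj);
  }

  static const struct drm_gpuvm_ops my_gpuvm_ops = {
          .vm_bo_validate = my_vm_bo_validate,
  };

  /* From the driver's move / eviction notifier, mark (or unmark) the GEM
   * object as evicted in all VMs it is mapped in:
   *
   *      drm_gpuvm_bo_gem_evict(obj, evicted);
   */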

Signed-off-by: Danilo Krummrich <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 4 +-
drivers/gpu/drm/nouveau/nouveau_exec.c | 52 +++----------
drivers/gpu/drm/nouveau/nouveau_exec.h | 4 -
drivers/gpu/drm/nouveau/nouveau_sched.h | 4 +-
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 99 ++++++++++++++++---------
5 files changed, 80 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index dbb3facfd23d..62371fe39e96 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1067,17 +1067,18 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict,
{
struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
struct nouveau_bo *nvbo = nouveau_bo(bo);
+ struct drm_gem_object *obj = &bo->base;
struct ttm_resource *old_reg = bo->resource;
struct nouveau_drm_tile *new_tile = NULL;
int ret = 0;

-
if (new_reg->mem_type == TTM_PL_TT) {
ret = nouveau_ttm_tt_bind(bo->bdev, bo->ttm, new_reg);
if (ret)
return ret;
}

+ drm_gpuvm_bo_gem_evict(obj, evict);
nouveau_bo_move_ntfy(bo, new_reg);
ret = ttm_bo_wait_ctx(bo, ctx);
if (ret)
@@ -1142,6 +1143,7 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict,
out_ntfy:
if (ret) {
nouveau_bo_move_ntfy(bo, bo->resource);
+ drm_gpuvm_bo_gem_evict(obj, !evict);
}
return ret;
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c b/drivers/gpu/drm/nouveau/nouveau_exec.c
index b4239af29e5a..ba6913a3efb6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_exec.c
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
@@ -1,7 +1,5 @@
// SPDX-License-Identifier: MIT

-#include <drm/drm_exec.h>
-
#include "nouveau_drv.h"
#include "nouveau_gem.h"
#include "nouveau_mem.h"
@@ -91,9 +89,6 @@ nouveau_exec_job_submit(struct nouveau_job *job)
struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
struct nouveau_cli *cli = job->cli;
struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
- struct drm_exec *exec = &job->exec;
- struct drm_gem_object *obj;
- unsigned long index;
int ret;

ret = nouveau_fence_new(&exec_job->fence);
@@ -101,52 +96,29 @@ nouveau_exec_job_submit(struct nouveau_job *job)
return ret;

nouveau_uvmm_lock(uvmm);
- drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
- DRM_EXEC_IGNORE_DUPLICATES);
- drm_exec_until_all_locked(exec) {
- struct drm_gpuva *va;
-
- drm_gpuvm_for_each_va(va, &uvmm->base) {
- if (unlikely(va == &uvmm->base.kernel_alloc_node))
- continue;
-
- ret = drm_exec_prepare_obj(exec, va->gem.obj, 1);
- drm_exec_retry_on_contention(exec);
- if (ret)
- goto err_uvmm_unlock;
- }
+ job->vm_exec.vm = &uvmm->base;
+ ret = drm_gpuvm_exec_lock(&job->vm_exec, 1, false);
+ if (ret) {
+ nouveau_uvmm_unlock(uvmm);
+ return ret;
}
nouveau_uvmm_unlock(uvmm);

- drm_exec_for_each_locked_object(exec, index, obj) {
- struct nouveau_bo *nvbo = nouveau_gem_object(obj);
-
- ret = nouveau_bo_validate(nvbo, true, false);
- if (ret)
- goto err_exec_fini;
+ ret = drm_gpuvm_exec_validate(&job->vm_exec);
+ if (ret) {
+ drm_gpuvm_exec_unlock(&job->vm_exec);
+ return ret;
}

return 0;
-
-err_uvmm_unlock:
- nouveau_uvmm_unlock(uvmm);
-err_exec_fini:
- drm_exec_fini(exec);
- return ret;
-
}

static void
nouveau_exec_job_armed_submit(struct nouveau_job *job)
{
- struct drm_exec *exec = &job->exec;
- struct drm_gem_object *obj;
- unsigned long index;
-
- drm_exec_for_each_locked_object(exec, index, obj)
- dma_resv_add_fence(obj->resv, job->done_fence, job->resv_usage);
-
- drm_exec_fini(exec);
+ drm_gpuvm_exec_resv_add_fence(&job->vm_exec, job->done_fence,
+ job->resv_usage, job->resv_usage);
+ drm_gpuvm_exec_unlock(&job->vm_exec);
}

static struct dma_fence *
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.h b/drivers/gpu/drm/nouveau/nouveau_exec.h
index 778cacd90f65..b815de2428f3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_exec.h
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.h
@@ -3,16 +3,12 @@
#ifndef __NOUVEAU_EXEC_H__
#define __NOUVEAU_EXEC_H__

-#include <drm/drm_exec.h>
-
#include "nouveau_drv.h"
#include "nouveau_sched.h"

struct nouveau_exec_job_args {
struct drm_file *file_priv;
struct nouveau_sched_entity *sched_entity;
-
- struct drm_exec exec;
struct nouveau_channel *chan;

struct {
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h b/drivers/gpu/drm/nouveau/nouveau_sched.h
index 27ac19792597..54379af6f925 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.h
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
@@ -5,7 +5,7 @@

#include <linux/types.h>

-#include <drm/drm_exec.h>
+#include <drm/drm_gpuvm.h>
#include <drm/gpu_scheduler.h>

#include "nouveau_drv.h"
@@ -54,7 +54,7 @@ struct nouveau_job {
struct drm_file *file_priv;
struct nouveau_cli *cli;

- struct drm_exec exec;
+ struct drm_gpuvm_exec vm_exec;
enum dma_resv_usage resv_usage;
struct dma_fence *done_fence;

diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 436b0ac74ffe..ba0f7fcb6f7c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -438,8 +438,9 @@ nouveau_uvma_region_complete(struct nouveau_uvma_region *reg)
static void
op_map_prepare_unwind(struct nouveau_uvma *uvma)
{
+ struct drm_gpuva *va = &uvma->va;
nouveau_uvma_gem_put(uvma);
- drm_gpuva_remove(&uvma->va);
+ drm_gpuva_remove(va);
nouveau_uvma_free(uvma);
}

@@ -468,6 +469,7 @@ nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
break;
case DRM_GPUVA_OP_REMAP: {
struct drm_gpuva_op_remap *r = &op->remap;
+ struct drm_gpuva *va = r->unmap->va;

if (r->next)
op_map_prepare_unwind(new->next);
@@ -475,7 +477,7 @@ nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
if (r->prev)
op_map_prepare_unwind(new->prev);

- op_unmap_prepare_unwind(r->unmap->va);
+ op_unmap_prepare_unwind(va);
break;
}
case DRM_GPUVA_OP_UNMAP:
@@ -634,6 +636,7 @@ nouveau_uvmm_sm_prepare(struct nouveau_uvmm *uvmm,
goto unwind;
}
}
+
break;
}
case DRM_GPUVA_OP_REMAP: {
@@ -1146,13 +1149,44 @@ bind_link_gpuvas(struct bind_job_op *bop)
}
}

+static int
+bind_lock_extra(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
+{
+ struct nouveau_uvmm_bind_job *bind_job = vm_exec->extra.priv;
+ struct drm_exec *exec = &vm_exec->exec;
+ struct bind_job_op *op;
+ int ret;
+
+ list_for_each_op(op, &bind_job->ops) {
+ struct drm_gpuva_op *va_op;
+
+ if (IS_ERR_OR_NULL(op->ops))
+ continue;
+
+ drm_gpuva_for_each_op(va_op, op->ops) {
+ struct drm_gem_object *obj = op_gem_obj(va_op);
+
+ if (unlikely(!obj))
+ continue;
+
+ if (va_op->op != DRM_GPUVA_OP_UNMAP)
+ continue;
+
+ ret = drm_exec_prepare_obj(exec, obj, num_fences);
+ if (ret)
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static int
nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
{
struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
struct nouveau_uvmm_bind_job *bind_job = to_uvmm_bind_job(job);
struct nouveau_sched_entity *entity = job->entity;
- struct drm_exec *exec = &job->exec;
struct bind_job_op *op;
int ret;

@@ -1170,6 +1204,8 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
dma_resv_unlock(obj->resv);
if (IS_ERR(op->vm_bo))
return PTR_ERR(op->vm_bo);
+
+ drm_gpuvm_bo_extobj_add(op->vm_bo);
}

ret = bind_validate_op(job, op);
@@ -1192,6 +1228,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
* unwind all GPU VA space changes on failure.
*/
nouveau_uvmm_lock(uvmm);
+
list_for_each_op(op, &bind_job->ops) {
switch (op->op) {
case OP_MAP_SPARSE:
@@ -1303,30 +1340,13 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
}
}

- drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
- DRM_EXEC_IGNORE_DUPLICATES);
- drm_exec_until_all_locked(exec) {
- list_for_each_op(op, &bind_job->ops) {
- struct drm_gpuva_op *va_op;
+ job->vm_exec.vm = &uvmm->base;
+ job->vm_exec.extra.fn = bind_lock_extra;
+ job->vm_exec.extra.priv = bind_job;

- if (IS_ERR_OR_NULL(op->ops))
- continue;
-
- drm_gpuva_for_each_op(va_op, op->ops) {
- struct drm_gem_object *obj = op_gem_obj(va_op);
-
- if (unlikely(!obj))
- continue;
-
- ret = drm_exec_prepare_obj(exec, obj, 1);
- drm_exec_retry_on_contention(exec);
- if (ret) {
- op = list_last_op(&bind_job->ops);
- goto unwind;
- }
- }
- }
- }
+ ret = drm_gpuvm_exec_lock(&job->vm_exec, 1, false);
+ if (ret)
+ goto unwind_continue;

list_for_each_op(op, &bind_job->ops) {
struct drm_gpuva_op *va_op;
@@ -1426,21 +1446,16 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job)
}

nouveau_uvmm_unlock(uvmm);
- drm_exec_fini(exec);
+ drm_gpuvm_exec_unlock(&job->vm_exec);
return ret;
}

static void
nouveau_uvmm_bind_job_armed_submit(struct nouveau_job *job)
{
- struct drm_exec *exec = &job->exec;
- struct drm_gem_object *obj;
- unsigned long index;
-
- drm_exec_for_each_locked_object(exec, index, obj)
- dma_resv_add_fence(obj->resv, job->done_fence, job->resv_usage);
-
- drm_exec_fini(exec);
+ drm_gpuvm_exec_resv_add_fence(&job->vm_exec, job->done_fence,
+ job->resv_usage, job->resv_usage);
+ drm_gpuvm_exec_unlock(&job->vm_exec);
}

static struct dma_fence *
@@ -1832,6 +1847,18 @@ nouveau_uvmm_bo_unmap_all(struct nouveau_bo *nvbo)
}
}

+static int
+nouveau_uvmm_bo_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
+{
+ struct nouveau_bo *nvbo = nouveau_gem_object(vm_bo->obj);
+
+ return nouveau_bo_validate(nvbo, true, false);
+}
+
+static const struct drm_gpuvm_ops gpuvm_ops = {
+ .vm_bo_validate = nouveau_uvmm_bo_validate,
+};
+
int
nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
u64 kernel_managed_addr, u64 kernel_managed_size)
@@ -1874,7 +1901,7 @@ nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
NOUVEAU_VA_SPACE_START,
NOUVEAU_VA_SPACE_END,
kernel_managed_addr, kernel_managed_size,
- NULL);
+ &gpuvm_ops);
/* GPUVM takes care from here on. */
drm_gem_object_put(r_obj);

--
2.41.0

2023-09-30 06:04:01

by Danilo Krummrich

Subject: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

Currently the DRM GPUVM offers common infrastructure to track GPU VA
allocations and mappings, generically connect GPU VA mappings to their
backing buffers and perform more complex mapping operations on the GPU VA
space.

However, there are more design patterns commonly used by drivers, which
can potentially be generalized in order to make the DRM GPUVM represent
a basis for GPU-VM implementations. In this context, this patch aims
at generalizing the following elements.

1) Provide a common dma-resv for GEM objects not being used outside of
this GPU-VM.

2) Provide tracking of external GEM objects (GEM objects which are
shared with other GPU-VMs).

3) Provide functions to efficiently lock all GEM objects' dma-resv the
GPU-VM contains mappings of.

4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
of, such that validation of evicted GEM objects is accelerated.

5) Provide some convenience functions for common patterns.

Big thanks to Boris Brezillon for his help to figure out locking for
drivers updating the GPU VA space within the fence signalling path.
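
A rough sketch of the submit-path pattern the new helpers enable (the job
structure, my_* helpers and fence usage are placeholders; see the nouveau
conversion in patch 6 for a real user):

  static int my_job_submit(struct my_job *job)
  {
          struct drm_gpuvm_exec *vm_exec = &job->vm_exec;
          int ret;

          vm_exec->vm = job->gpuvm;

          /* Lock the VM's common dma-resv and all external objects. */
          ret = drm_gpuvm_exec_lock(vm_exec, 1, false);
          if (ret)
                  return ret;

          /* Re-validate evicted objects via the vm_bo_validate() callback. */
          ret = drm_gpuvm_exec_validate(vm_exec);
          if (ret)
                  goto out_unlock;

          ret = my_run_job(job);
          if (ret)
                  goto out_unlock;

          drm_gpuvm_exec_resv_add_fence(vm_exec, job->done_fence,
                                        job->resv_usage, job->resv_usage);
  out_unlock:
          drm_gpuvm_exec_unlock(vm_exec);
          return ret;
  }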

Suggested-by: Matthew Brost <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
---
drivers/gpu/drm/drm_gpuvm.c | 642 ++++++++++++++++++++++++++++++++++++
include/drm/drm_gpuvm.h | 240 ++++++++++++++
2 files changed, 882 insertions(+)

diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index 27100423154b..770bb3d68d1f 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -82,6 +82,21 @@
* &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
* particular combination. If not existent a new instance is created and linked
* to the &drm_gem_object.
+ *
+ * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
+ * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
+ * lists are maintained in order to accelerate locking of dma-resv locks and
+ * validation of evicted objects bound in a &drm_gpuvm. For instance, all
+ * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
+ * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
+ * order to validate all evicted &drm_gem_objects. It is also possible to lock
+ * additional &drm_gem_objects by providing the corresponding parameters to
+ * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
+ * use of helper functions such as drm_gpuvm_prepare_range() or
+ * drm_gpuvm_prepare_objects().
+ *
+ * Every bound &drm_gem_object is treated as an external object when its &dma_resv
+ * structure is different than the &drm_gpuvm's common &dma_resv structure.
*/

/**
@@ -429,6 +444,20 @@
* Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
* &drm_gem_object must be able to observe previous creations and destructions
* of &drm_gpuvm_bos in order to keep instances unique.
+ *
+ * The &drm_gpuvm's lists for keeping track of external and evicted objects are
+ * protected against concurrent insertion / removal and iteration internally.
+ *
+ * However, drivers still need to protect concurrent calls to functions
+ * iterating those lists, namely drm_gpuvm_prepare_objects() and
+ * drm_gpuvm_validate().
+ *
+ * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate
+ * that the corresponding &dma_resv locks are held in order to protect the
+ * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and
+ * the corresponding lockdep checks are enabled. This is an optimization for
+ * drivers which are capable of taking the corresponding &dma_resv locks and
+ * hence do not require internal locking.
*/

/**
@@ -641,6 +670,195 @@
* }
*/

+/**
+ * get_next_vm_bo_from_list() - get the next vm_bo element
+ * @__gpuvm: The GPU VM
+ * @__list_name: The name of the list we're iterating on
+ * @__local_list: A pointer to the local list used to store already iterated items
+ * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
+ *
+ * This helper is here to provide lockless list iteration. Lockless as in, the
+ * iterator releases the lock immediately after picking the first element from
+ * the list, so list insertion and deletion can happen concurrently.
+ *
+ * Elements popped from the original list are kept in a local list, so removal
+ * and is_empty checks can still happen while we're iterating the list.
+ */
+#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
+ ({ \
+ struct drm_gpuvm_bo *__vm_bo = NULL; \
+ \
+ drm_gpuvm_bo_put(__prev_vm_bo); \
+ \
+ spin_lock(&(__gpuvm)->__list_name.lock); \
+ if (!(__gpuvm)->__list_name.local_list) \
+ (__gpuvm)->__list_name.local_list = __local_list; \
+ else \
+ WARN_ON((__gpuvm)->__list_name.local_list != __local_list); \
+ \
+ while (!list_empty(&(__gpuvm)->__list_name.list)) { \
+ __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
+ struct drm_gpuvm_bo, \
+ list.entry.__list_name); \
+ if (kref_get_unless_zero(&__vm_bo->kref)) { \
+ list_move_tail(&(__vm_bo)->list.entry.__list_name, \
+ __local_list); \
+ break; \
+ } else { \
+ list_del_init(&(__vm_bo)->list.entry.__list_name); \
+ __vm_bo = NULL; \
+ } \
+ } \
+ spin_unlock(&(__gpuvm)->__list_name.lock); \
+ \
+ __vm_bo; \
+ })
+
+/**
+ * for_each_vm_bo_in_list() - internal vm_bo list iterator
+ * @__gpuvm: The GPU VM
+ * @__list_name: The name of the list we're iterating on
+ * @__local_list: A pointer to the local list used to store already iterated items
+ * @__vm_bo: The current &drm_gpuvm_bo element of the iteration
+ *
+ * This helper is here to provide lockless list iteration. Lockless as in, the
+ * iterator releases the lock immediately after picking the first element from the
+ * list, hence list insertion and deletion can happen concurrently.
+ *
+ * It is not allowed to re-assign the vm_bo pointer from inside this loop.
+ *
+ * Typical use:
+ *
+ * struct drm_gpuvm_bo *vm_bo;
+ * LIST_HEAD(my_local_list);
+ *
+ * ret = 0;
+ * for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
+ * ret = do_something_with_vm_bo(..., vm_bo);
+ * if (ret)
+ * break;
+ * }
+ * drm_gpuvm_bo_put(vm_bo);
+ * restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
+ *
+ *
+ * Only used for internal list iterations, not meant to be exposed to the outside
+ * world.
+ */
+#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
+ for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
+ __local_list, NULL); \
+ __vm_bo; \
+ __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
+ __local_list, __vm_bo))
+
+static inline void
+__restore_vm_bo_list(struct drm_gpuvm *gpuvm, spinlock_t *lock,
+ struct list_head *list, struct list_head **local_list)
+{
+ /* Merge back the two lists, moving local list elements to the
+ * head to preserve previous ordering, in case it matters.
+ */
+ spin_lock(lock);
+ if (*local_list) {
+ list_splice(*local_list, list);
+ *local_list = NULL;
+ }
+ spin_unlock(lock);
+}
+
+/**
+ * restore_vm_bo_list() - move vm_bo elements back to their original list
+ * @__gpuvm: The GPU VM
+ * @__list_name: The name of the list we're iterating on
+ *
+ * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
+ * to restore the original state and let new iterations take place.
+ */
+#define restore_vm_bo_list(__gpuvm, __list_name) \
+ __restore_vm_bo_list((__gpuvm), &(__gpuvm)->__list_name.lock, \
+ &(__gpuvm)->__list_name.list, \
+ &(__gpuvm)->__list_name.local_list)
+
+static inline void
+cond_spin_lock(spinlock_t *lock, bool cond)
+{
+ if (cond)
+ spin_lock(lock);
+}
+
+static inline void
+cond_spin_unlock(spinlock_t *lock, bool cond)
+{
+ if (cond)
+ spin_unlock(lock);
+}
+
+static inline void
+__drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
+ struct list_head *entry, struct list_head *list)
+{
+ cond_spin_lock(lock, !!lock);
+ if (list_empty(entry))
+ list_add_tail(entry, list);
+ cond_spin_unlock(lock, !!lock);
+}
+
+/**
+ * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
+ * @__vm_bo: the &drm_gpuvm_bo
+ * @__list_name: the name of the list to insert into
+ * @__lock: whether to lock with the internal spinlock
+ *
+ * Inserts the given @__vm_bo into the list specified by @__list_name.
+ */
+#define drm_gpuvm_bo_list_add(__vm_bo, __list_name, __lock) \
+ __drm_gpuvm_bo_list_add((__vm_bo)->vm, \
+ __lock ? &(__vm_bo)->vm->__list_name.lock : \
+ NULL, \
+ &(__vm_bo)->list.entry.__list_name, \
+ &(__vm_bo)->vm->__list_name.list)
+
+static inline void
+__drm_gpuvm_bo_list_del(struct drm_gpuvm *gpuvm, spinlock_t *lock,
+ struct list_head *entry, bool init)
+{
+ cond_spin_lock(lock, !!lock);
+ if (init) {
+ if (!list_empty(entry))
+ list_del_init(entry);
+ } else {
+ list_del(entry);
+ }
+ cond_spin_unlock(lock, !!lock);
+}
+
+/**
+ * drm_gpuvm_bo_list_del_init() - remove a vm_bo from the given list
+ * @__vm_bo: the &drm_gpuvm_bo
+ * @__list_name: the name of the list to remove from
+ * @__lock: whether to lock with the internal spinlock
+ *
+ * Removes the given @__vm_bo from the list specified by @__list_name.
+ */
+#define drm_gpuvm_bo_list_del_init(__vm_bo, __list_name, __lock) \
+ __drm_gpuvm_bo_list_del((__vm_bo)->vm, \
+ __lock ? &(__vm_bo)->vm->__list_name.lock : \
+ NULL, \
+ &(__vm_bo)->list.entry.__list_name, \
+ true)
+
+/**
+ * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
+ * @__vm_bo: the &drm_gpuvm_bo
+ * @__list_name: the name of the list to remove from
+ * @__lock: whether to lock with the internal spinlock
+ *
+ * Removes the given @__vm_bo from the list specified by @__list_name.
+ */
+#define drm_gpuvm_bo_list_del(__vm_bo, __list_name, __lock) \
+ __drm_gpuvm_bo_list_del((__vm_bo)->vm, \
+ __lock ? &(__vm_bo)->vm->__list_name.lock : \
+ NULL, \
+ &(__vm_bo)->list.entry.__list_name, \
+ false)
+
#define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)

#define GPUVA_START(node) ((node)->va.addr)
@@ -760,6 +978,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_gem_object *r_obj,
gpuvm->rb.tree = RB_ROOT_CACHED;
INIT_LIST_HEAD(&gpuvm->rb.list);

+ INIT_LIST_HEAD(&gpuvm->extobj.list);
+ spin_lock_init(&gpuvm->extobj.lock);
+
+ INIT_LIST_HEAD(&gpuvm->evict.list);
+ spin_lock_init(&gpuvm->evict.lock);
+
drm_gpuvm_check_overflow(start_offset, range);
gpuvm->mm_start = start_offset;
gpuvm->mm_range = range;
@@ -802,10 +1026,373 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
"GPUVA tree is not empty, potentially leaking memory.\n");

+ WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
+ WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
+
drm_gem_object_put(gpuvm->r_obj);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);

+static int
+__drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
+ struct drm_exec *exec,
+ unsigned int num_fences)
+{
+ struct drm_gpuvm_bo *vm_bo;
+ LIST_HEAD(extobjs);
+ int ret = 0;
+
+ for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
+ ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
+ if (ret)
+ break;
+ }
+ /* Drop ref in case we break out of the loop. */
+ drm_gpuvm_bo_put(vm_bo);
+ restore_vm_bo_list(gpuvm, extobj);
+
+ return ret;
+}
+
+static int
+drm_gpuvm_prepare_objects_locked(struct drm_gpuvm *gpuvm,
+ struct drm_exec *exec,
+ unsigned int num_fences)
+{
+ struct drm_gpuvm_bo *vm_bo;
+ int ret = 0;
+
+ drm_gpuvm_resv_assert_held(gpuvm);
+ list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
+ ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+/**
+ * drm_gpuvm_prepare_objects() - prepare all associated BOs
+ * @gpuvm: the &drm_gpuvm
+ * @exec: the &drm_exec locking context
+ * @num_fences: the amount of &dma_fences to reserve
+ *
+ * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
+ * &drm_gpuvm contains mappings of.
+ *
+ * Using this function directly, it is the driver's responsibility to call
+ * drm_exec_init() and drm_exec_fini() accordingly.
+ *
+ * Note: This function is safe against concurrent insertion and removal of
+ * external objects; however, it is not safe against concurrent usage itself.
+ *
+ * Drivers need to make sure to protect this case either with an outer VM lock
+ * or by calling drm_gpuvm_prepare_vm() before this function within the
+ * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
+ * mutual exclusion.
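+ *
+ * For the latter, a minimal open coded sketch (error unwinding omitted for
+ * brevity) could look like this:
+ *
+ *	drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
+ *
+ *	drm_exec_until_all_locked(exec) {
+ *		ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
+ *		drm_exec_retry_on_contention(exec);
+ *		if (ret)
+ *			break;
+ *
+ *		ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
+ *		drm_exec_retry_on_contention(exec);
+ *		if (ret)
+ *			break;
+ *	}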
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
+ struct drm_exec *exec,
+ unsigned int num_fences)
+{
+ if (drm_gpuvm_resv_protected(gpuvm))
+ return drm_gpuvm_prepare_objects_locked(gpuvm, exec,
+ num_fences);
+ else
+ return __drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
+
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
+
+/**
+ * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
+ * @gpuvm: the &drm_gpuvm
+ * @exec: the &drm_exec locking context
+ * @addr: the start address within the VA space
+ * @range: the range to iterate within the VA space
+ * @num_fences: the amount of &dma_fences to reserve
+ *
+ * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
+ * and @addr + @range.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
+ u64 addr, u64 range, unsigned int num_fences)
+{
+ struct drm_gpuva *va;
+ u64 end = addr + range;
+ int ret;
+
+ drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
+ struct drm_gem_object *obj = va->gem.obj;
+
+ ret = drm_exec_prepare_obj(exec, obj, num_fences);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
+
+/**
+ * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ * @num_fences: the amount of &dma_fences to reserve
+ * @interruptible: sleep interruptible if waiting
+ *
+ * Acquires all dma-resv locks of all &drm_gem_objects the given
+ * &drm_gpuvm contains mappings of.
+ *
+ * Additionally, when calling this function with struct drm_gpuvm_exec::extra
+ * being set, the driver receives the given @fn callback to lock additional
+ * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
+ * would call drm_exec_prepare_obj() from within this callback.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
+ unsigned int num_fences,
+ bool interruptible)
+{
+ struct drm_gpuvm *gpuvm = vm_exec->vm;
+ struct drm_exec *exec = &vm_exec->exec;
+ uint32_t flags;
+ int ret;
+
+ flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
+ DRM_EXEC_IGNORE_DUPLICATES;
+
+ drm_exec_init(exec, flags);
+
+ drm_exec_until_all_locked(exec) {
+ ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
+ drm_exec_retry_on_contention(exec);
+ if (ret)
+ goto err;
+
+ ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
+ drm_exec_retry_on_contention(exec);
+ if (ret)
+ goto err;
+
+ if (vm_exec->extra.fn) {
+ ret = vm_exec->extra.fn(vm_exec, num_fences);
+ drm_exec_retry_on_contention(exec);
+ if (ret)
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ drm_exec_fini(exec);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
+
+static int
+fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
+{
+ struct {
+ struct drm_gem_object **objs;
+ unsigned int num_objs;
+ } *args = vm_exec->extra.priv;
+
+ return drm_exec_prepare_array(&vm_exec->exec, args->objs,
+ args->num_objs, num_fences);
+}
+
+/**
+ * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ * @objs: additional &drm_gem_objects to lock
+ * @num_objs: the number of additional &drm_gem_objects to lock
+ * @num_fences: the amount of &dma_fences to reserve
+ * @interruptible: sleep interruptible if waiting
+ *
+ * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
+ * contains mappings of, plus the ones given through @objs.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
+ struct drm_gem_object **objs,
+ unsigned int num_objs,
+ unsigned int num_fences,
+ bool interruptible)
+{
+ struct {
+ struct drm_gem_object **objs;
+ unsigned int num_objs;
+ } args;
+
+ args.objs = objs;
+ args.num_objs = num_objs;
+
+ vm_exec->extra.fn = fn_lock_array;
+ vm_exec->extra.priv = &args;
+
+ return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
+
+/**
+ * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ * @addr: the start address within the VA space
+ * @range: the range to iterate within the VA space
+ * @num_fences: the amount of &dma_fences to reserve
+ * @interruptible: sleep interruptible if waiting
+ *
+ * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
+ * @addr + @range.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
+ u64 addr, u64 range,
+ unsigned int num_fences,
+ bool interruptible)
+{
+ struct drm_gpuvm *gpuvm = vm_exec->vm;
+ struct drm_exec *exec = &vm_exec->exec;
+ uint32_t flags;
+ int ret;
+
+ flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
+ DRM_EXEC_IGNORE_DUPLICATES;
+
+ drm_exec_init(exec, flags);
+
+ drm_exec_until_all_locked(exec) {
+ ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
+ num_fences);
+ drm_exec_retry_on_contention(exec);
+ if (ret)
+ goto err;
+ }
+
+ return ret;
+
+err:
+ drm_exec_fini(exec);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
+
+static int
+__drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
+{
+ const struct drm_gpuvm_ops *ops = gpuvm->ops;
+ struct drm_gpuvm_bo *vm_bo;
+ LIST_HEAD(evict);
+ int ret = 0;
+
+ for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
+ ret = ops->vm_bo_validate(vm_bo, exec);
+ if (ret)
+ break;
+ }
+ /* Drop ref in case we break out of the loop. */
+ drm_gpuvm_bo_put(vm_bo);
+ restore_vm_bo_list(gpuvm, evict);
+
+ return ret;
+}
+
+static int
+drm_gpuvm_validate_locked(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
+{
+ const struct drm_gpuvm_ops *ops = gpuvm->ops;
+ struct drm_gpuvm_bo *vm_bo, *next;
+ int ret = 0;
+
+#ifdef CONFIG_LOCKDEP
+ drm_gpuvm_resv_assert_held(gpuvm);
+ list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj)
+ dma_resv_assert_held(vm_bo->obj->resv);
+#endif
+
+ /* Iterate list safely, drivers typically remove the current entry from
+ * their drm_gpuvm_ops::vm_bo_validate callback. Drivers might also
+ * re-add the entry on failure; this is safe since on failure we break
+ * out of the loop.
+ */
+ list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
+ list.entry.evict) {
+ ret = ops->vm_bo_validate(vm_bo, exec);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+/**
+ * drm_gpuvm_validate() - validate all BOs marked as evicted
+ * @gpuvm: the &drm_gpuvm to validate evicted BOs
+ * @exec: the &drm_exec instance used for locking the GPUVM
+ *
+ * Calls the &drm_gpuvm_ops::vm_bo_validate callback for all evicted buffer
+ * objects being mapped in the given &drm_gpuvm.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
+{
+ const struct drm_gpuvm_ops *ops = gpuvm->ops;
+
+ if (unlikely(!ops || !ops->vm_bo_validate))
+ return -ENOTSUPP;
+
+ if (drm_gpuvm_resv_protected(gpuvm))
+ return drm_gpuvm_validate_locked(gpuvm, exec);
+ else
+ return __drm_gpuvm_validate(gpuvm, exec);
+
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
+
+/**
+ * drm_gpuvm_resv_add_fence - add fence to private and all extobj
+ * dma-resv
+ * @gpuvm: the &drm_gpuvm to add a fence to
+ * @exec: the &drm_exec locking context
+ * @fence: fence to add
+ * @private_usage: private dma-resv usage
+ * @extobj_usage: extobj dma-resv usage
+ */
+void
+drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
+ struct drm_exec *exec,
+ struct dma_fence *fence,
+ enum dma_resv_usage private_usage,
+ enum dma_resv_usage extobj_usage)
+{
+ struct drm_gem_object *obj;
+ unsigned long index;
+
+ drm_exec_for_each_locked_object(exec, index, obj) {
+ dma_resv_assert_held(obj->resv);
+ dma_resv_add_fence(obj->resv, fence,
+ drm_gpuvm_is_extobj(gpuvm, obj) ?
+ extobj_usage : private_usage);
+ }
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
+
/**
* drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
* @gpuvm: The &drm_gpuvm the @obj is mapped in.
@@ -838,6 +1425,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
INIT_LIST_HEAD(&vm_bo->list.gpuva);
INIT_LIST_HEAD(&vm_bo->list.entry.gem);

+ INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
+ INIT_LIST_HEAD(&vm_bo->list.entry.evict);
+
drm_gem_object_get(obj);

return vm_bo;
@@ -858,6 +1448,9 @@ drm_gpuvm_bo_destroy(struct kref *kref)
if (!lock)
drm_gpuvm_resv_assert_held(gpuvm);

+ drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
+ drm_gpuvm_bo_list_del(vm_bo, evict, lock);
+
list_del(&vm_bo->list.entry.gem);

drm_gem_object_put(obj);
@@ -994,6 +1587,55 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);

+/**
+ * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
+ * extobj list
+ * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
+ *
+ * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is not on the
+ * list already and the corresponding &drm_gem_object actually is an external
+ * object.
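+ *
+ * A typical call site (sketched with hypothetical driver code) is right after
+ * obtaining the &drm_gpuvm_bo in the driver's map / bind path:
+ *
+ *	vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);
+ *	if (IS_ERR(vm_bo))
+ *		return PTR_ERR(vm_bo);
+ *
+ *	drm_gpuvm_bo_extobj_add(vm_bo);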
+ */
+void
+drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
+{
+ struct drm_gpuvm *gpuvm = vm_bo->vm;
+ bool lock = !drm_gpuvm_resv_protected(gpuvm);
+
+ if (!lock)
+ drm_gpuvm_resv_assert_held(gpuvm);
+
+ if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
+ drm_gpuvm_bo_list_add(vm_bo, extobj, lock);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
+
+/**
+ * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / from the
+ * &drm_gpuvm's evicted list
+ * @vm_bo: the &drm_gpuvm_bo to add or remove
+ * @evict: indicates whether the object is evicted
+ *
+ * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvm's evicted list.
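+ *
+ * Drivers typically toggle this state from their eviction / move path, for
+ * instance (illustrative sketch, using the drm_gpuvm_bo_gem_evict() helper):
+ *
+ *	drm_gpuvm_bo_gem_evict(obj, true);
+ *
+ * and, once the backing &drm_gem_object has been validated again:
+ *
+ *	drm_gpuvm_bo_gem_evict(obj, false);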
+ */
+void
+drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
+{
+ struct drm_gem_object *obj = vm_bo->obj;
+
+ dma_resv_assert_held(obj->resv);
+
+ /* Always lock list transactions, even if DRM_GPUVM_RESV_PROTECTED is
+ * set. This is required to protect multiple concurrent calls to
+ * drm_gpuvm_bo_evict() with BOs with different dma_resv.
+ */
+ if (evict)
+ drm_gpuvm_bo_list_add(vm_bo, evict, true);
+ else
+ drm_gpuvm_bo_list_del_init(vm_bo, evict, true);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
+
static int
__drm_gpuva_insert(struct drm_gpuvm *gpuvm,
struct drm_gpuva *va)
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index 7ab479153a00..905f9eaf4fa3 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -31,6 +31,7 @@
#include <linux/types.h>

#include <drm/drm_gem.h>
+#include <drm/drm_exec.h>

struct drm_gpuvm;
struct drm_gpuvm_bo;
@@ -272,6 +273,50 @@ struct drm_gpuvm {
* @r_obj: Root GEM object; representing the GPUVM's common &dma_resv.
*/
struct drm_gem_object *r_obj;
+
+ /**
+ * @extobj: structure holding the extobj list
+ */
+ struct {
+ /**
+ * @list: &list_head storing &drm_gpuvm_bos serving as
+ * external objects
+ */
+ struct list_head list;
+
+ /**
+ * @local_list: pointer to the local list temporarily storing
+ * entries from the external object list
+ */
+ struct list_head *local_list;
+
+ /**
+ * @lock: spinlock to protect the extobj list
+ */
+ spinlock_t lock;
+ } extobj;
+
+ /**
+ * @evict: structure holding the evict list and evict list lock
+ */
+ struct {
+ /**
+ * @list: &list_head storing &drm_gpuvm_bos currently being
+ * evicted
+ */
+ struct list_head list;
+
+ /**
+ * @local_list: pointer to the local list temporarily storing
+ * entries from the evicted object list
+ */
+ struct list_head *local_list;
+
+ /**
+ * @lock: spinlock to protect the evict list
+ */
+ spinlock_t lock;
+ } evict;
};

void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_gem_object *r_obj,
@@ -329,6 +374,22 @@ drm_gpuvm_resv_protected(struct drm_gpuvm *gpuvm)
#define drm_gpuvm_resv_assert_held(gpuvm__) \
dma_resv_assert_held(drm_gpuvm_resv(gpuvm__))

+/**
+ * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
+ * external object
+ * @gpuvm: the &drm_gpuvm to check
+ * @obj: the &drm_gem_object to check
+ *
+ * Returns: true if the &drm_gem_object &dma_resv differs from the
+ * &drm_gpuvm's &dma_resv, false otherwise.
+ */
+static inline bool
+drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
+ struct drm_gem_object *obj)
+{
+ return obj && obj->resv != drm_gpuvm_resv(gpuvm);
+}
+
static inline struct drm_gpuva *
__drm_gpuva_next(struct drm_gpuva *va)
{
@@ -407,6 +468,140 @@ __drm_gpuva_next(struct drm_gpuva *va)
#define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)

+/**
+ * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
+ *
+ * This structure should be created on the stack as &drm_exec should be.
+ *
+ * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
+ */
+struct drm_gpuvm_exec {
+ /**
+ * @exec: the &drm_exec structure
+ */
+ struct drm_exec exec;
+
+ /**
+ * @vm: the &drm_gpuvm whose DMA reservations should be locked
+ */
+ struct drm_gpuvm *vm;
+
+ /**
+ * @extra: Callback and corresponding private data for the driver to
+ * lock arbitrary additional &drm_gem_objects.
+ */
+ struct {
+ /**
+ * @fn: The driver callback to lock additional &drm_gem_objects.
+ */
+ int (*fn)(struct drm_gpuvm_exec *vm_exec,
+ unsigned int num_fences);
+
+ /**
+ * @priv: driver private data for the @fn callback
+ */
+ void *priv;
+ } extra;
+};
+
+/**
+ * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
+ * @gpuvm: the &drm_gpuvm
+ * @exec: the &drm_exec context
+ * @num_fences: the amount of &dma_fences to reserve
+ *
+ * Calls drm_exec_prepare_obj() for the GPUVM's root &drm_gem_object.
+ *
+ * Using this function directly, it is the driver's responsibility to call
+ * drm_exec_init() and drm_exec_fini() accordingly.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static inline int
+drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
+ struct drm_exec *exec,
+ unsigned int num_fences)
+{
+ return drm_exec_prepare_obj(exec, gpuvm->r_obj, num_fences);
+}
+
+int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
+ struct drm_exec *exec,
+ unsigned int num_fences);
+
+int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
+ struct drm_exec *exec,
+ u64 addr, u64 range,
+ unsigned int num_fences);
+
+int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
+ unsigned int num_fences,
+ bool interruptible);
+
+int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
+ struct drm_gem_object **objs,
+ unsigned int num_objs,
+ unsigned int num_fences,
+ bool interruptible);
+
+int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
+ u64 addr, u64 range,
+ unsigned int num_fences,
+ bool interruptible);
+
+/**
+ * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ *
+ * Releases all dma-resv locks of all &drm_gem_objects previously acquired
+ * through drm_gpuvm_exec_lock() or its variants.
+ */
+static inline void
+drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
+{
+ drm_exec_fini(&vm_exec->exec);
+}
+
+int drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec);
+void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
+ struct drm_exec *exec,
+ struct dma_fence *fence,
+ enum dma_resv_usage private_usage,
+ enum dma_resv_usage extobj_usage);
+
+/**
+ * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj dma-resv
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ * @fence: fence to add
+ * @private_usage: private dma-resv usage
+ * @extobj_usage: extobj dma-resv usage
+ *
+ * See drm_gpuvm_resv_add_fence().
+ */
+static inline void
+drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
+ struct dma_fence *fence,
+ enum dma_resv_usage private_usage,
+ enum dma_resv_usage extobj_usage)
+{
+ drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
+ private_usage, extobj_usage);
+}
+
+/**
+ * drm_gpuvm_exec_validate() - validate all BOs marked as evicted
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ *
+ * See drm_gpuvm_validate().
+ */
+static inline int
+drm_gpuvm_exec_validate(struct drm_gpuvm_exec *vm_exec)
+{
+ return drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
+}
+
/**
* struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
* &drm_gem_object combination
@@ -459,6 +654,18 @@ struct drm_gpuvm_bo {
* gpuva list.
*/
struct list_head gem;
+
+ /**
+ * @extobj: List entry to attach to the &drm_gpuvm's
+ * extobj list.
+ */
+ struct list_head extobj;
+
+ /**
+ * @evict: List entry to attach to the &drm_gpuvm's evict
+ * list.
+ */
+ struct list_head evict;
} entry;
} list;
};
@@ -493,6 +700,27 @@ struct drm_gpuvm_bo *
drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
struct drm_gem_object *obj);

+void drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict);
+
+/**
+ * drm_gpuvm_bo_gem_evict() - add / remove all &drm_gpuvm_bos of a &drm_gem_object
+ * to / from the corresponding evicted lists
+ * @obj: the &drm_gem_object
+ * @evict: indicates whether @obj is evicted
+ *
+ * See drm_gpuvm_bo_evict().
+ */
+static inline void
+drm_gpuvm_bo_gem_evict(struct drm_gem_object *obj, bool evict)
+{
+ struct drm_gpuvm_bo *vm_bo;
+
+ drm_gem_gpuva_assert_lock_held(obj);
+ drm_gem_for_each_gpuvm_bo(vm_bo, obj)
+ drm_gpuvm_bo_evict(vm_bo, evict);
+}
+
+void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
+
/**
* drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
* @va__: &drm_gpuva structure to assign to in each iteration step
@@ -855,6 +1083,18 @@ struct drm_gpuvm_ops {
*/
void (*vm_bo_free)(struct drm_gpuvm_bo *vm_bo);

+ /**
+ * @vm_bo_validate: called from drm_gpuvm_validate()
+ *
+ * Drivers receive this callback for every evicted &drm_gem_object being
+ * mapped in the corresponding &drm_gpuvm.
+ *
+ * Typically, drivers would call their driver-specific variant of
+ * ttm_bo_validate() from within this callback.
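+ *
+ * A driver implementation (all driver specific names are hypothetical)
+ * might boil down to something like:
+ *
+ *	static int my_vm_bo_validate(struct drm_gpuvm_bo *vm_bo,
+ *				     struct drm_exec *exec)
+ *	{
+ *		struct my_bo *bo = to_my_bo(vm_bo->obj);
+ *
+ *		return my_bo_validate(bo, exec);
+ *	}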
+ */
+ int (*vm_bo_validate)(struct drm_gpuvm_bo *vm_bo,
+ struct drm_exec *exec);
+
/**
* @sm_step_map: called from &drm_gpuvm_sm_map to finally insert the
* mapping once all previous steps were completed
--
2.41.0

2023-10-02 11:21:25

by kernel test robot

Subject: Re: [PATCH drm-misc-next v5 3/6] drm/gpuvm: add an abstraction for a VM / BO combination

Hi Danilo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on a4ead6e37e3290cff399e2598d75e98777b69b37]

url: https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-gpuvm-add-common-dma-resv-per-struct-drm_gpuvm/20230929-031831
base: a4ead6e37e3290cff399e2598d75e98777b69b37
patch link: https://lore.kernel.org/r/20230928191624.13703-4-dakr%40redhat.com
patch subject: [PATCH drm-misc-next v5 3/6] drm/gpuvm: add an abstraction for a VM / BO combination
reproduce: (https://download.01.org/0day-ci/archive/20231002/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

>> ./include/drm/drm_gpuvm.h:464: warning: Function parameter or member 'vm' not described in 'drm_gpuvm_bo'

vim +464 ./include/drm/drm_gpuvm.h

427
428 /**
429 * @gpuvm: The &drm_gpuvm the @obj is mapped in.
430 */
431 struct drm_gpuvm *vm;
432
433 /**
434 * @obj: The &drm_gem_object being mapped in the @gpuvm.
435 */
436 struct drm_gem_object *obj;
437
438 /**
439 * @kref: The reference count for this &drm_gpuvm_bo.
440 */
441 struct kref kref;
442
443 /**
444 * @list: Structure containing all &list_heads.
445 */
446 struct {
447 /**
448 * @gpuva: The list of linked &drm_gpuvas.
449 */
450 struct list_head gpuva;
451
452 /**
453 * @entry: Structure containing all &list_heads serving as
454 * entry.
455 */
456 struct {
457 /**
458 * @gem: List entry to attach to the &drm_gem_objects
459 * gpuva list.
460 */
461 struct list_head gem;
462 } entry;
463 } list;
> 464 };
465

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-10-02 19:37:06

by kernel test robot

Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

Hi Danilo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on a4ead6e37e3290cff399e2598d75e98777b69b37]

url: https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-gpuvm-add-common-dma-resv-per-struct-drm_gpuvm/20230929-031831
base: a4ead6e37e3290cff399e2598d75e98777b69b37
patch link: https://lore.kernel.org/r/20230928191624.13703-5-dakr%40redhat.com
patch subject: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
reproduce: (https://download.01.org/0day-ci/archive/20231002/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

>> ./include/drm/drm_gpuvm.h:563: warning: Function parameter or member 'vm_exec' not described in 'drm_gpuvm_exec_unlock'
>> ./include/drm/drm_gpuvm.h:563: warning: expecting prototype for drm_gpuvm_lock(). Prototype was for drm_gpuvm_exec_unlock() instead
>> ./include/drm/drm_gpuvm.h:601: warning: expecting prototype for drm_gpuvm_exec_resv_add_fence(). Prototype was for drm_gpuvm_exec_validate() instead

vim +563 ./include/drm/drm_gpuvm.h

527
528 int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
529 struct drm_exec *exec,
530 unsigned int num_fences);
531
532 int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
533 struct drm_exec *exec,
534 u64 addr, u64 range,
535 unsigned int num_fences);
536
537 int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
538 unsigned int num_fences,
539 bool interruptible);
540
541 int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
542 struct drm_gem_object **objs,
543 unsigned int num_objs,
544 unsigned int num_fences,
545 bool interruptible);
546
547 int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
548 u64 addr, u64 range,
549 unsigned int num_fences,
550 bool interruptible);
551
552 /**
553 * drm_gpuvm_lock() - lock all dma-resv of all assoiciated BOs
554 * @gpuvm: the &drm_gpuvm
555 *
556 * Releases all dma-resv locks of all &drm_gem_objects previously acquired
557 * through drm_gpuvm_lock() or its variants.
558 *
559 * Returns: 0 on success, negative error code on failure.
560 */
561 static inline void
562 drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
> 563 {
564 drm_exec_fini(&vm_exec->exec);
565 }
566
567 int drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec);
568 void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
569 struct drm_exec *exec,
570 struct dma_fence *fence,
571 enum dma_resv_usage private_usage,
572 enum dma_resv_usage extobj_usage);
573
574 /**
575 * drm_gpuvm_exec_resv_add_fence()
576 * @vm_exec: the &drm_gpuvm_exec abstraction
577 * @fence: fence to add
578 * @private_usage: private dma-resv usage
579 * @extobj_usage: extobj dma-resv usage
580 *
581 * See drm_gpuvm_resv_add_fence().
582 */
583 static inline void
584 drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
585 struct dma_fence *fence,
586 enum dma_resv_usage private_usage,
587 enum dma_resv_usage extobj_usage)
588 {
589 drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
590 private_usage, extobj_usage);
591 }
592
593 /**
594 * drm_gpuvm_exec_resv_add_fence()
595 * @vm_exec: the &drm_gpuvm_exec abstraction
596 *
597 * See drm_gpuvm_validate().
598 */
599 static inline int
600 drm_gpuvm_exec_validate(struct drm_gpuvm_exec *vm_exec)
> 601 {
602 return drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
603 }
604

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-10-03 08:36:39

by Thomas Hellström

Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

Hi, Danilo,

On 9/28/23 21:16, Danilo Krummrich wrote:
> Currently the DRM GPUVM offers common infrastructure to track GPU VA
> allocations and mappings, generically connect GPU VA mappings to their
> backing buffers and perform more complex mapping operations on the GPU VA
> space.
>
> However, there are more design patterns commonly used by drivers, which
> can potentially be generalized in order to make the DRM GPUVM represent
> a basis for GPU-VM implementations. In this context, this patch aims
> at generalizing the following elements.
>
> 1) Provide a common dma-resv for GEM objects not being used outside of
> this GPU-VM.
>
> 2) Provide tracking of external GEM objects (GEM objects which are
> shared with other GPU-VMs).
>
> 3) Provide functions to efficiently lock all GEM objects dma-resv the
> GPU-VM contains mappings of.
>
> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
> of, such that validation of evicted GEM objects is accelerated.
>
> 5) Provide some convinience functions for common patterns.
>
> Big thanks to Boris Brezillon for his help to figure out locking for
> drivers updating the GPU VA space within the fence signalling path.
>
> Suggested-by: Matthew Brost <[email protected]>
> Signed-off-by: Danilo Krummrich <[email protected]>
> ---
> drivers/gpu/drm/drm_gpuvm.c | 642 ++++++++++++++++++++++++++++++++++++
> include/drm/drm_gpuvm.h | 240 ++++++++++++++
> 2 files changed, 882 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 27100423154b..770bb3d68d1f 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -82,6 +82,21 @@
> * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
> * particular combination. If not existent a new instance is created and linked
> * to the &drm_gem_object.
> + *
> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
> + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
> + * list are maintained in order to accelerate locking of dma-resv locks and
> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
> + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
> + * additional &drm_gem_objects by providing the corresponding parameters to
> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
> + * use of helper functions such as drm_gpuvm_prepare_range() or
> + * drm_gpuvm_prepare_objects().
> + *
> + * Every bound &drm_gem_object is treated as external object when its &dma_resv
> + * structure is different than the &drm_gpuvm's common &dma_resv structure.
> */
>
> /**
> @@ -429,6 +444,20 @@
> * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
> * &drm_gem_object must be able to observe previous creations and destructions
> * of &drm_gpuvm_bos in order to keep instances unique.
> + *
> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
> + * protected against concurrent insertion / removal and iteration internally.
> + *
> + * However, drivers still need ensure to protect concurrent calls to functions
> + * iterating those lists, namely drm_gpuvm_prepare_objects() and
> + * drm_gpuvm_validate().
> + *
> + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate
> + * that the corresponding &dma_resv locks are held in order to protect the
> + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and
> + * the corresponding lockdep checks are enabled. This is an optimization for
> + * drivers which are capable of taking the corresponding &dma_resv locks and
> + * hence do not require internal locking.
> */
>
> /**
> @@ -641,6 +670,195 @@
> * }
> */
>
> +/**
> + * get_next_vm_bo_from_list() - get the next vm_bo element
> + * @__gpuvm: The GPU VM
> + * @__list_name: The name of the list we're iterating on
> + * @__local_list: A pointer to the local list used to store already iterated items
> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from
> + * the list, so list insertion deletion can happen concurrently.
> + *
> + * Elements popped from the original list are kept in a local list, so removal
> + * and is_empty checks can still happen while we're iterating the list.
> + */
> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
> + ({ \
> + struct drm_gpuvm_bo *__vm_bo = NULL; \
> + \
> + drm_gpuvm_bo_put(__prev_vm_bo); \
> + \
> + spin_lock(&(__gpuvm)->__list_name.lock); \

Here we unconditionally take the spinlocks while iterating, and the main
point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?


> + if (!(__gpuvm)->__list_name.local_list) \
> + (__gpuvm)->__list_name.local_list = __local_list; \
> + else \
> + WARN_ON((__gpuvm)->__list_name.local_list != __local_list); \
> + \
> + while (!list_empty(&(__gpuvm)->__list_name.list)) { \
> + __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
> + struct drm_gpuvm_bo, \
> + list.entry.__list_name); \
> + if (kref_get_unless_zero(&__vm_bo->kref)) {
And unnecessarily grab a reference in the RESV_PROTECTED case.
> \
> + list_move_tail(&(__vm_bo)->list.entry.__list_name, \
> + __local_list); \
> + break; \
> + } else { \
> + list_del_init(&(__vm_bo)->list.entry.__list_name); \
> + __vm_bo = NULL; \
> + } \
> + } \
> + spin_unlock(&(__gpuvm)->__list_name.lock); \
> + \
> + __vm_bo; \
> + })

IMHO this lockless list iteration looks very complex and should be
pretty difficult to maintain while moving forward, also since it pulls
the gpuvm_bos off the list, list iteration needs to be protected by an
outer lock anyway. Also from what I understand from Boris, the extobj
list would typically not need the fine-grained locking; only the evict
list? Also it seems that if we are to maintain two modes here, for
reasonably clean code we'd need two separate instances of
get_next_bo_from_list().

For the !RESV_PROTECTED case, perhaps one would want to consider the
solution used currently in xe, where the VM maintains two evict lists.
One protected by a spinlock and one protected by the VM resv. When the
VM resv is locked to begin list traversal, the spinlock is locked *once*
and the spinlock-protected list is looped over and copied into the resv
protected one. For traversal, the resv protected one is used.

If that works with all concerns raised so far,  list traversal would be
greatly simplified, and no need for a separate RESV_PROTECTED mode.
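
Roughly something like this (all names made up, purely to illustrate the
idea):

	/* With the VM resv held, once per traversal: */
	spin_lock(&vm->evict.lock);
	list_splice_init(&vm->evict.staging_list, &vm->evict.list);
	spin_unlock(&vm->evict.lock);

	/* Traversal then iterates vm->evict.list under the VM resv only. */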

Also some inline comments below.


> +
> +/**
> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from the
> + * list, hence list insertion and deletion can happen concurrently.
> + *
> + * It is not allowed to re-assign the vm_bo pointer from inside this loop.
> + *
> + * Typical use:
> + *
> + * struct drm_gpuvm_bo *vm_bo;
> + * LIST_HEAD(my_local_list);
> + *
> + * ret = 0;
> + * for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
> + * ret = do_something_with_vm_bo(..., vm_bo);
> + * if (ret)
> + * break;
> + * }
> + * drm_gpuvm_bo_put(vm_bo);
> + * restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
> + *
> + *
> + * Only used for internal list iterations, not meant to be exposed to the outside
> + * world.
> + */
> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
> + for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
> + __local_list, NULL); \
> + __vm_bo; \
> + __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
> + __local_list, __vm_bo))
> +
> +static inline void
> +__restore_vm_bo_list(struct drm_gpuvm *gpuvm, spinlock_t *lock,
> + struct list_head *list, struct list_head **local_list)
> +{
> + /* Merge back the two lists, moving local list elements to the
> + * head to preserve previous ordering, in case it matters.
> + */
> + spin_lock(lock);
> + if (*local_list) {
> + list_splice(*local_list, list);
> + *local_list = NULL;
> + }
> + spin_unlock(lock);
> +}
> +
> +/**
> + * restore_vm_bo_list() - move vm_bo elements back to their original list
> + * @__gpuvm: The GPU VM
> + * @__list_name: The name of the list we're iterating on
> + *
> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
> + * to restore the original state and let new iterations take place.
> + */
> +#define restore_vm_bo_list(__gpuvm, __list_name) \
> + __restore_vm_bo_list((__gpuvm), &(__gpuvm)->__list_name.lock, \
> + &(__gpuvm)->__list_name.list, \
> + &(__gpuvm)->__list_name.local_list)
> +
> +static inline void
> +cond_spin_lock(spinlock_t *lock, bool cond)
> +{
> + if (cond)
> + spin_lock(lock);
> +}
> +
> +static inline void
> +cond_spin_unlock(spinlock_t *lock, bool cond)
> +{
> + if (cond)
> + spin_unlock(lock);
> +}
> +
> +static inline void
> +__drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
> + struct list_head *entry, struct list_head *list)
> +{
> + cond_spin_lock(lock, !!lock);
> + if (list_empty(entry))
> + list_add_tail(entry, list);
> + cond_spin_unlock(lock, !!lock);
> +}
> +
> +/**
> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
> + * @__vm_bo: the &drm_gpuvm_bo
> + * @__list_name: the name of the list to insert into
> + * @__lock: whether to lock with the internal spinlock
> + *
> + * Inserts the given @__vm_bo into the list specified by @__list_name.
> + */
> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name, __lock) \
> + __drm_gpuvm_bo_list_add((__vm_bo)->vm, \
> + __lock ? &(__vm_bo)->vm->__list_name.lock : \
> + NULL, \
> + &(__vm_bo)->list.entry.__list_name, \
> + &(__vm_bo)->vm->__list_name.list)
> +
> +static inline void
> +__drm_gpuvm_bo_list_del(struct drm_gpuvm *gpuvm, spinlock_t *lock,
> + struct list_head *entry, bool init)
> +{
> + cond_spin_lock(lock, !!lock);
> + if (init) {
> + if (!list_empty(entry))
> + list_del_init(entry);
> + } else {
> + list_del(entry);
> + }
> + cond_spin_unlock(lock, !!lock);
> +}
> +
> +/**
> + * drm_gpuvm_bo_list_del_init() - remove a vm_bo from the given list
> + * @__vm_bo: the &drm_gpuvm_bo
> + * @__list_name: the name of the list to insert into
> + * @__lock: whether to lock with the internal spinlock
> + *
> + * Removes the given @__vm_bo from the list specified by @__list_name.
> + */
> +#define drm_gpuvm_bo_list_del_init(__vm_bo, __list_name, __lock) \
> + __drm_gpuvm_bo_list_del((__vm_bo)->vm, \
> + __lock ? &(__vm_bo)->vm->__list_name.lock : \
> + NULL, \
> + &(__vm_bo)->list.entry.__list_name, \
> + true)
> +
> +/**
> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
> + * @__vm_bo: the &drm_gpuvm_bo
> + * @__list_name: the name of the list to insert into
> + * @__lock: whether to lock with the internal spinlock
> + *
> + * Removes the given @__vm_bo from the list specified by @__list_name.
> + */
> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name, __lock) \
> + __drm_gpuvm_bo_list_del((__vm_bo)->vm, \
> + __lock ? &(__vm_bo)->vm->__list_name.lock : \
> + NULL, \
> + &(__vm_bo)->list.entry.__list_name, \
> + false)
> +
> #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
>
> #define GPUVA_START(node) ((node)->va.addr)
> @@ -760,6 +978,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_gem_object *r_obj,
> gpuvm->rb.tree = RB_ROOT_CACHED;
> INIT_LIST_HEAD(&gpuvm->rb.list);
>
> + INIT_LIST_HEAD(&gpuvm->extobj.list);
> + spin_lock_init(&gpuvm->extobj.lock);
> +
> + INIT_LIST_HEAD(&gpuvm->evict.list);
> + spin_lock_init(&gpuvm->evict.lock);
> +
> drm_gpuvm_check_overflow(start_offset, range);
> gpuvm->mm_start = start_offset;
> gpuvm->mm_range = range;
> @@ -802,10 +1026,373 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
> WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
> "GPUVA tree is not empty, potentially leaking memory.\n");
>
> + WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
> + WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
> +
> drm_gem_object_put(gpuvm->r_obj);
> }
> EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>
> +static int
> +__drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> + struct drm_exec *exec,
> + unsigned int num_fences)
> +{
> + struct drm_gpuvm_bo *vm_bo;
> + LIST_HEAD(extobjs);
> + int ret = 0;
> +
> + for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
> + ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
> + if (ret)
> + break;
> + }
> + /* Drop ref in case we break out of the loop. */
> + drm_gpuvm_bo_put(vm_bo);
> + restore_vm_bo_list(gpuvm, extobj);
> +
> + return ret;
> +}
> +
> +static int
> +drm_gpuvm_prepare_objects_locked(struct drm_gpuvm *gpuvm,
> + struct drm_exec *exec,
> + unsigned int num_fences)
> +{
> + struct drm_gpuvm_bo *vm_bo;
> + int ret = 0;
> +
> + drm_gpuvm_resv_assert_held(gpuvm);
> + list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
> + ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
> + if (ret)
> + break;
> + }
> +
> + return ret;
> +}
> +
> +/**
> + * drm_gpuvm_prepare_objects() - prepare all assoiciated BOs
> + * @gpuvm: the &drm_gpuvm
> + * @exec: the &drm_exec locking context
> + * @num_fences: the amount of &dma_fences to reserve
> + *
> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
> + * &drm_gpuvm contains mappings of.
> + *
> + * Using this function directly, it is the drivers responsibility to call
> + * drm_exec_init() and drm_exec_fini() accordingly.
> + *
> + * Note: This function is safe against concurrent insertion and removal of
> + * external objects, however it is not safe against concurrent usage itself.
> + *
> + * Drivers need to make sure to protect this case with either an outer VM lock
> + * or by calling drm_gpuvm_prepare_vm() before this function within the
> + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
> + * mutual exclusion.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> + struct drm_exec *exec,
> + unsigned int num_fences)
> +{
> + if (drm_gpuvm_resv_protected(gpuvm))
> + return drm_gpuvm_prepare_objects_locked(gpuvm, exec,
> + num_fences);
> + else
> + return __drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
> +
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
> +
> +/**
> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
> + * @gpuvm: the &drm_gpuvm
> + * @exec: the &drm_exec locking context
> + * @addr: the start address within the VA space
> + * @range: the range to iterate within the VA space
> + * @num_fences: the amount of &dma_fences to reserve
> + *
> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
> + * and @addr + @range.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
> + u64 addr, u64 range, unsigned int num_fences)
> +{
> + struct drm_gpuva *va;
> + u64 end = addr + range;
> + int ret;
> +
> + drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
> + struct drm_gem_object *obj = va->gem.obj;
> +
> + ret = drm_exec_prepare_obj(exec, obj, num_fences);
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
> +
> +/**
> + * drm_gpuvm_exec_lock() - lock all dma-resv of all assoiciated BOs
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + * @num_fences: the amount of &dma_fences to reserve
> + * @interruptible: sleep interruptible if waiting
> + *
> + * Acquires all dma-resv locks of all &drm_gem_objects the given
> + * &drm_gpuvm contains mappings of.
> + *
> + * Addionally, when calling this function with struct drm_gpuvm_exec::extra
> + * being set the driver receives the given @fn callback to lock additional
> + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
> + * would call drm_exec_prepare_obj() from within this callback.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> + unsigned int num_fences,
> + bool interruptible)
> +{
> + struct drm_gpuvm *gpuvm = vm_exec->vm;
> + struct drm_exec *exec = &vm_exec->exec;
> + uint32_t flags;
> + int ret;
> +
> + flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0 |
> + DRM_EXEC_IGNORE_DUPLICATES;
> +
> + drm_exec_init(exec, flags);
> +
> + drm_exec_until_all_locked(exec) {
> + ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
> + drm_exec_retry_on_contention(exec);
> + if (ret)
> + goto err;
> +
> + ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
> + drm_exec_retry_on_contention(exec);
> + if (ret)
> + goto err;
> +
> + if (vm_exec->extra.fn) {
> + ret = vm_exec->extra.fn(vm_exec, num_fences);
> + drm_exec_retry_on_contention(exec);
> + if (ret)
> + goto err;
> + }
> + }
> +
> + return 0;
> +
> +err:
> + drm_exec_fini(exec);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
> +
> +static int
> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
> +{
> + struct {
> + struct drm_gem_object **objs;
> + unsigned int num_objs;
> + } *args = vm_exec->extra.priv;
> +
> + return drm_exec_prepare_array(&vm_exec->exec, args->objs,
> + args->num_objs, num_fences);
> +}
> +
> +/**
> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all assoiciated BOs
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + * @objs: additional &drm_gem_objects to lock
> + * @num_objs: the number of additional &drm_gem_objects to lock
> + * @num_fences: the amount of &dma_fences to reserve
> + * @interruptible: sleep interruptible if waiting
> + *
> + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
> + * contains mappings of, plus the ones given through @objs.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> + struct drm_gem_object **objs,
> + unsigned int num_objs,
> + unsigned int num_fences,
> + bool interruptible)
> +{
> + struct {
> + struct drm_gem_object **objs;
> + unsigned int num_objs;
> + } args;
> +
> + args.objs = objs;
> + args.num_objs = num_objs;
> +
> + vm_exec->extra.fn = fn_lock_array;
> + vm_exec->extra.priv = &args;
> +
> + return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
> +
> +/**
> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + * @addr: the start address within the VA space
> + * @range: the range to iterate within the VA space
> + * @num_fences: the amount of &dma_fences to reserve
> + * @interruptible: sleep interruptible if waiting
> + *
> + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
> + * @addr + @range.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> + u64 addr, u64 range,
> + unsigned int num_fences,
> + bool interruptible)
> +{
> + struct drm_gpuvm *gpuvm = vm_exec->vm;
> + struct drm_exec *exec = &vm_exec->exec;
> + uint32_t flags;
> + int ret;
> +
> + flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0 |
> + DRM_EXEC_IGNORE_DUPLICATES;
> +
> + drm_exec_init(exec, flags);
> +
> + drm_exec_until_all_locked(exec) {
> + ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
> + num_fences);
> + drm_exec_retry_on_contention(exec);
> + if (ret)
> + goto err;
> + }
> +
> + return ret;
> +
> +err:
> + drm_exec_fini(exec);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
> +
> +static int
> +__drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
> +{
> + const struct drm_gpuvm_ops *ops = gpuvm->ops;
> + struct drm_gpuvm_bo *vm_bo;
> + LIST_HEAD(evict);
> + int ret = 0;
> +
> + for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
> + ret = ops->vm_bo_validate(vm_bo, exec);
> + if (ret)
> + break;
> + }
> + /* Drop ref in case we break out of the loop. */
> + drm_gpuvm_bo_put(vm_bo);
> + restore_vm_bo_list(gpuvm, evict);
> +
> + return ret;
> +}
> +
> +static int
> +drm_gpuvm_validate_locked(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
> +{
> + const struct drm_gpuvm_ops *ops = gpuvm->ops;
> + struct drm_gpuvm_bo *vm_bo, *next;
> + int ret = 0;
> +
> +#ifdef CONFIG_LOCKDEP
> + drm_gpuvm_resv_assert_held(gpuvm);
> + list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj)
> + dma_resv_assert_held(vm_bo->obj->resv);
> +#endif
> +
> + /* Iterate list safely, drivers typically remove the current entry from
> + * their drm_gpuvm_ops::vm_bo_validate callback. Drivers might also
> + * re-add the entry on failure; this is safe since on failure we break
> + * out of the loop.
> + */
> + list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
> + list.entry.evict) {
> + ret = ops->vm_bo_validate(vm_bo, exec);
> + if (ret)
> + break;
> + }
> +
> + return ret;
> +}
> +
> +/**
> + * drm_gpuvm_validate() - validate all BOs marked as evicted
> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
> + * @exec: the &drm_exec instance used for locking the GPUVM
> + *
> + * Calls the &drm_gpuvm_ops::vm_bo_validate callback for all evicted buffer
> + * objects being mapped in the given &drm_gpuvm.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
> +{
> + const struct drm_gpuvm_ops *ops = gpuvm->ops;
> +
> + if (unlikely(!ops || !ops->vm_bo_validate))
> + return -ENOTSUPP;
> +
> + if (drm_gpuvm_resv_protected(gpuvm))
> + return drm_gpuvm_validate_locked(gpuvm, exec);
> + else
> + return __drm_gpuvm_validate(gpuvm, exec);
> +
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
> +
> +/**
> + * drm_gpuvm_resv_add_fence - add fence to private and all extobj
> + * dma-resv
> + * @gpuvm: the &drm_gpuvm to add a fence to
> + * @exec: the &drm_exec locking context
> + * @fence: fence to add
> + * @private_usage: private dma-resv usage
> + * @extobj_usage: extobj dma-resv usage
> + */
> +void
> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> + struct drm_exec *exec,
> + struct dma_fence *fence,
> + enum dma_resv_usage private_usage,
> + enum dma_resv_usage extobj_usage)
> +{
> + struct drm_gem_object *obj;
> + unsigned long index;
> +
> + drm_exec_for_each_locked_object(exec, index, obj) {
> + dma_resv_assert_held(obj->resv);
> + dma_resv_add_fence(obj->resv, fence,
> + drm_gpuvm_is_extobj(gpuvm, obj) ?
> + private_usage : extobj_usage);
> + }
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
> +
> /**
> * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
> * @gpuvm: The &drm_gpuvm the @obj is mapped in.
> @@ -838,6 +1425,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
> INIT_LIST_HEAD(&vm_bo->list.gpuva);
> INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>
> + INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
> + INIT_LIST_HEAD(&vm_bo->list.entry.evict);
> +
> drm_gem_object_get(obj);
>
> return vm_bo;
> @@ -858,6 +1448,9 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> if (!lock)
> drm_gpuvm_resv_assert_held(gpuvm);
>
> + drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
> + drm_gpuvm_bo_list_del(vm_bo, evict, lock);
> +
> list_del(&vm_bo->list.entry.gem);
>
> drm_gem_object_put(obj);
> @@ -994,6 +1587,55 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
> }
> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>
> +/**
> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
> + * extobj list
> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's the extobj list.
> + *
> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
> + * already and if the corresponding &drm_gem_object is an external object,
> + * actually.
> + */
> +void
> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
> +{
> + struct drm_gpuvm *gpuvm = vm_bo->vm;
> + bool lock = !drm_gpuvm_resv_protected(gpuvm);
> +
> + if (!lock)
> + drm_gpuvm_resv_assert_held(gpuvm);
> +
> + if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
> + drm_gpuvm_bo_list_add(vm_bo, extobj, lock);
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
> +
> +/**
> + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / from the &drm_gpuvms
> + * evicted list
> + * @vm_bo: the &drm_gpuvm_bo to add or remove
> + * @evict: indicates whether the object is evicted
> + *
> + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms evicted list.
> + */
> +void
> +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
> +{
> + struct drm_gem_object *obj = vm_bo->obj;
> +
> + dma_resv_assert_held(obj->resv);
> +
> + /* Always lock list transactions, even if DRM_GPUVM_RESV_PROTECTED is
> + * set. This is required to protect multiple concurrent calls to
> + * drm_gpuvm_bo_evict() with BOs with different dma_resv.
> + */

This doesn't work. The RESV_PROTECTED case requires the evicted flag we
discussed before. The list is either protected by the spinlock or the
resv. Otherwise a list add could race with a list removal elsewhere.
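A rough sketch of how drm_gpuvm_bo_evict() could use such an evicted flag
(the vm_bo->evicted field and the exact flow are assumptions for
illustration, not code from this series):

void
drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
{
	struct drm_gpuvm *gpuvm = vm_bo->vm;
	struct drm_gem_object *obj = vm_bo->obj;
	bool lock = !drm_gpuvm_resv_protected(gpuvm);

	dma_resv_assert_held(obj->resv);

	/* Only record the new state here; in the RESV_PROTECTED case external
	 * objects would be moved on / off the evict list later, while the VM's
	 * resv is held (e.g. at validate time), so the list is consistently
	 * protected by a single lock and an add cannot race with a removal.
	 */
	vm_bo->evicted = evict;
	if (!lock && drm_gpuvm_is_extobj(gpuvm, obj))
		return;

	if (evict)
		drm_gpuvm_bo_list_add(vm_bo, evict, lock);
	else
		drm_gpuvm_bo_list_del_init(vm_bo, evict, lock);
}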

Thanks,

Thomas


2023-10-03 09:11:53

by Thomas Hellström

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

Hi Again,


On 10/3/23 10:36, Thomas Hellström wrote:
> Hi, Danilo,
>
> On 9/28/23 21:16, Danilo Krummrich wrote:
>> Currently the DRM GPUVM offers common infrastructure to track GPU VA
>> allocations and mappings, generically connect GPU VA mappings to their
>> backing buffers and perform more complex mapping operations on the
>> GPU VA
>> space.
>>
>> However, there are more design patterns commonly used by drivers, which
>> can potentially be generalized in order to make the DRM GPUVM represent
>> a basis for GPU-VM implementations. In this context, this patch aims
>> at generalizing the following elements.
>>
>> 1) Provide a common dma-resv for GEM objects not being used outside of
>>     this GPU-VM.
>>
>> 2) Provide tracking of external GEM objects (GEM objects which are
>>     shared with other GPU-VMs).
>>
>> 3) Provide functions to efficiently lock all GEM objects dma-resv the
>>     GPU-VM contains mappings of.
>>
>> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
>>     of, such that validation of evicted GEM objects is accelerated.
>>
>> 5) Provide some convenience functions for common patterns.
>>
>> Big thanks to Boris Brezillon for his help to figure out locking for
>> drivers updating the GPU VA space within the fence signalling path.
>>
>> Suggested-by: Matthew Brost <[email protected]>
>> Signed-off-by: Danilo Krummrich <[email protected]>
>> ---
>>   drivers/gpu/drm/drm_gpuvm.c | 642 ++++++++++++++++++++++++++++++++++++
>>   include/drm/drm_gpuvm.h     | 240 ++++++++++++++
>>   2 files changed, 882 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>> index 27100423154b..770bb3d68d1f 100644
>> --- a/drivers/gpu/drm/drm_gpuvm.c
>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>> @@ -82,6 +82,21 @@
>>    * &drm_gem_object list of &drm_gpuvm_bos for an existing instance
>> of this
>>    * particular combination. If not existent a new instance is
>> created and linked
>>    * to the &drm_gem_object.
>> + *
>> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm,
>> are also used
>> + * as entry for the &drm_gpuvm's lists of external and evicted
>> objects. Those
>> + * lists are maintained in order to accelerate locking of dma-resv
>> locks and
>> + * validation of evicted objects bound in a &drm_gpuvm. For
>> instance, all
>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked
>> by calling
>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>> drm_gpuvm_validate() in
>> + * order to validate all evicted &drm_gem_objects. It is also
>> possible to lock
>> + * additional &drm_gem_objects by providing the corresponding
>> parameters to
>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop
>> while making
>> + * use of helper functions such as drm_gpuvm_prepare_range() or
>> + * drm_gpuvm_prepare_objects().
>> + *
>> + * Every bound &drm_gem_object is treated as external object when
>> its &dma_resv
>> + * structure is different than the &drm_gpuvm's common &dma_resv
>> structure.
>>    */
>>     /**
>> @@ -429,6 +444,20 @@
>>    * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>> &drm_gpuvm and
>>    * &drm_gem_object must be able to observe previous creations and
>> destructions
>>    * of &drm_gpuvm_bos in order to keep instances unique.
>> + *
>> + * The &drm_gpuvm's lists for keeping track of external and evicted
>> objects are
>> + * protected against concurrent insertion / removal and iteration
>> internally.
>> + *
>> + * However, drivers still need to protect concurrent calls to
>> functions
>> + * iterating those lists, namely drm_gpuvm_prepare_objects() and
>> + * drm_gpuvm_validate().
>> + *
>> + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag
>> to indicate
>> + * that the corresponding &dma_resv locks are held in order to
>> protect the
>> + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is
>> disabled and
>> + * the corresponding lockdep checks are enabled. This is an
>> optimization for
>> + * drivers which are capable of taking the corresponding &dma_resv
>> locks and
>> + * hence do not require internal locking.
>>    */
>>     /**
>> @@ -641,6 +670,195 @@
>>    *    }
>>    */
>>   +/**
>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>> + * @__gpuvm: The GPU VM
>> + * @__list_name: The name of the list we're iterating on
>> + * @__local_list: A pointer to the local list used to store already
>> iterated items
>> + * @__prev_vm_bo: The previous element we got from
>> drm_gpuvm_get_next_cached_vm_bo()
>> + *
>> + * This helper is here to provide lockless list iteration. Lockless
>> as in, the
>> + * iterator releases the lock immediately after picking the first
>> element from
>> + * the list, so list insertion and deletion can happen concurrently.
>> + *
>> + * Elements popped from the original list are kept in a local list,
>> so removal
>> + * and is_empty checks can still happen while we're iterating the list.
>> + */
>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
>> +	({									\
>> +		struct drm_gpuvm_bo *__vm_bo = NULL;				\
>> +										\
>> +		drm_gpuvm_bo_put(__prev_vm_bo);					\
>> +										\
>> +		spin_lock(&(__gpuvm)->__list_name.lock);			\
>
> Here we unconditionally take the spinlocks while iterating, and the
> main point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?

Never mind, I missed this code wasn't used in the RESV_PROTECTED case.

>
>
>> +		if (!(__gpuvm)->__list_name.local_list)				\
>> +			(__gpuvm)->__list_name.local_list = __local_list;	\
>> +		else								\
>> +			WARN_ON((__gpuvm)->__list_name.local_list != __local_list);	\
>> +										\
>> +		while (!list_empty(&(__gpuvm)->__list_name.list)) {		\
>> +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
>> +						   struct drm_gpuvm_bo,		\
>> +						   list.entry.__list_name);	\
>> +			if (kref_get_unless_zero(&__vm_bo->kref)) {		\
> And unnecessarily grab a reference in the RESV_PROTECTED case.

Same here.

>> +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
>> +					       __local_list);			\
>> +				break;						\
>> +			} else {						\
>> +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
>> +				__vm_bo = NULL;					\
>> +			}							\
>> +		}								\
>> +		spin_unlock(&(__gpuvm)->__list_name.lock);			\
>> +										\
>> +		__vm_bo;							\
>> +	})
>
> IMHO this lockless list iteration looks very complex and should be
> pretty difficult to maintain while moving forward, also since it pulls
> the gpuvm_bos off the list, list iteration needs to be protected by an
> outer lock anyway. Also from what I understand from Boris, the extobj
> list would typically not need the fine-grained locking; only the evict
> list? Also it seems that if we are to maintain two modes here, for
> reasonably clean code we'd need two separate instances of
> get_next_bo_from_list().
(And we indeed do, sort of),
>
> For the !RESV_PROTECTED case, perhaps one would want to consider the
> solution used currently in xe, where the VM maintains two evict lists.
> One protected by a spinlock and one protected by the VM resv. When the
> VM resv is locked to begin list traversal, the spinlock is locked
> *once* and the spinlock-protected list is looped over and copied into
> the resv protected one. For traversal, the resv protected one is used.
>
> If that works with all concerns raised so far,  list traversal would
> be greatly simplified, and no need for a separate RESV_PROTECTED mode.

Although this doesn't work with async removal from lists. Not sure if
that's still a use-case though.
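To illustrate the two-list scheme, here is a minimal sketch with made-up
names (my_vm, my_vm_bo and my_vm_bo_validate() are placeholders, this is not
the Xe code, and removal is ignored for simplicity):

/* Producers, e.g. the eviction path, only ever take the spinlock. */
void my_vm_add_evicted(struct my_vm *vm, struct my_vm_bo *vm_bo)
{
	spin_lock(&vm->evict.lock);
	if (list_empty(&vm_bo->evict_link))
		list_add_tail(&vm_bo->evict_link, &vm->evict.staging);
	spin_unlock(&vm->evict.lock);
}

/* With the VM resv held, the spinlock is taken once to splice the staging
 * list into the resv-protected list; traversal then needs no further
 * spinlocking.
 */
int my_vm_validate_evicted(struct my_vm *vm, struct drm_exec *exec)
{
	struct my_vm_bo *vm_bo;
	int ret = 0;

	dma_resv_assert_held(vm->resv);

	spin_lock(&vm->evict.lock);
	list_splice_tail_init(&vm->evict.staging, &vm->evict.resv_list);
	spin_unlock(&vm->evict.lock);

	list_for_each_entry(vm_bo, &vm->evict.resv_list, evict_link) {
		ret = my_vm_bo_validate(vm_bo, exec);
		if (ret)
			break;
	}

	return ret;
}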

Thanks,

Thomas



>
> Also some inline comments below.
>
>
>> +
>> +/**
>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>> + *
>> + * This helper is here to provide lockless list iteration. Lockless
>> as in, the
>> + * iterator releases the lock immediately after picking the first
>> element from the
>> + * list, hence list insertion and deletion can happen concurrently.
>> + *
>> + * It is not allowed to re-assign the vm_bo pointer from inside this
>> loop.
>> + *
>> + * Typical use:
>> + *
>> + *    struct drm_gpuvm_bo *vm_bo;
>> + *    LIST_HEAD(my_local_list);
>> + *
>> + *    ret = 0;
>> + *    for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list,
>> vm_bo) {
>> + *        ret = do_something_with_vm_bo(..., vm_bo);
>> + *        if (ret)
>> + *            break;
>> + *    }
>> + *    drm_gpuvm_bo_put(vm_bo);
>> + *    restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>> + *
>> + *
>> + * Only used for internal list iterations, not meant to be exposed
>> to the outside
>> + * world.
>> + */
>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list,
>> __vm_bo)    \
>> +    for (__vm_bo = get_next_vm_bo_from_list(__gpuvm,
>> __list_name,        \
>> +                        __local_list, NULL);        \
>> +         __vm_bo;                                \
>> +         __vm_bo = get_next_vm_bo_from_list(__gpuvm,
>> __list_name,        \
>> +                        __local_list, __vm_bo))
>> +
>> +static inline void
>> +__restore_vm_bo_list(struct drm_gpuvm *gpuvm, spinlock_t *lock,
>> +             struct list_head *list, struct list_head **local_list)
>> +{
>> +    /* Merge back the two lists, moving local list elements to the
>> +     * head to preserve previous ordering, in case it matters.
>> +     */
>> +    spin_lock(lock);
>> +    if (*local_list) {
>> +        list_splice(*local_list, list);
>> +        *local_list = NULL;
>> +    }
>> +    spin_unlock(lock);
>> +}
>> +
>> +/**
>> + * restore_vm_bo_list() - move vm_bo elements back to their original
>> list
>> + * @__gpuvm: The GPU VM
>> + * @__list_name: The name of the list we're iterating on
>> + *
>> + * When we're done iterating a vm_bo list, we should call
>> restore_vm_bo_list()
>> + * to restore the original state and let new iterations take place.
>> + */
>> +#define restore_vm_bo_list(__gpuvm, __list_name)            \
>> +    __restore_vm_bo_list((__gpuvm), &(__gpuvm)->__list_name.lock,    \
>> +                 &(__gpuvm)->__list_name.list,        \
>> +                 &(__gpuvm)->__list_name.local_list)
>> +
>> +static inline void
>> +cond_spin_lock(spinlock_t *lock, bool cond)
>> +{
>> +    if (cond)
>> +        spin_lock(lock);
>> +}
>> +
>> +static inline void
>> +cond_spin_unlock(spinlock_t *lock, bool cond)
>> +{
>> +    if (cond)
>> +        spin_unlock(lock);
>> +}
>> +
>> +static inline void
>> +__drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
>> +            struct list_head *entry, struct list_head *list)
>> +{
>> +    cond_spin_lock(lock, !!lock);
>> +    if (list_empty(entry))
>> +        list_add_tail(entry, list);
>> +    cond_spin_unlock(lock, !!lock);
>> +}
>> +
>> +/**
>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>> + * @__vm_bo: the &drm_gpuvm_bo
>> + * @__list_name: the name of the list to insert into
>> + * @__lock: whether to lock with the internal spinlock
>> + *
>> + * Inserts the given @__vm_bo into the list specified by @__list_name.
>> + */
>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name,
>> __lock)            \
>> + __drm_gpuvm_bo_list_add((__vm_bo)->vm,                    \
>> +                __lock ? &(__vm_bo)->vm->__list_name.lock :    \
>> +                     NULL,                    \
>> + &(__vm_bo)->list.entry.__list_name,        \
>> +                &(__vm_bo)->vm->__list_name.list)
>> +
>> +static inline void
>> +__drm_gpuvm_bo_list_del(struct drm_gpuvm *gpuvm, spinlock_t *lock,
>> +            struct list_head *entry, bool init)
>> +{
>> +    cond_spin_lock(lock, !!lock);
>> +    if (init) {
>> +        if (!list_empty(entry))
>> +            list_del_init(entry);
>> +    } else {
>> +        list_del(entry);
>> +    }
>> +    cond_spin_unlock(lock, !!lock);
>> +}
>> +
>> +/**
>> + * drm_gpuvm_bo_list_del_init() - remove a vm_bo from the given list
>> + * @__vm_bo: the &drm_gpuvm_bo
>> + * @__list_name: the name of the list to insert into
>> + * @__lock: whether to lock with the internal spinlock
>> + *
>> + * Removes the given @__vm_bo from the list specified by @__list_name.
>> + */
>> +#define drm_gpuvm_bo_list_del_init(__vm_bo, __list_name,
>> __lock)        \
>> + __drm_gpuvm_bo_list_del((__vm_bo)->vm,                    \
>> +                __lock ? &(__vm_bo)->vm->__list_name.lock :    \
>> +                     NULL,                    \
>> + &(__vm_bo)->list.entry.__list_name,        \
>> +                true)
>> +
>> +/**
>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
>> + * @__vm_bo: the &drm_gpuvm_bo
>> + * @__list_name: the name of the list to insert into
>> + * @__lock: whether to lock with the internal spinlock
>> + *
>> + * Removes the given @__vm_bo from the list specified by @__list_name.
>> + */
>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name,
>> __lock)            \
>> + __drm_gpuvm_bo_list_del((__vm_bo)->vm,                    \
>> +                __lock ? &(__vm_bo)->vm->__list_name.lock :    \
>> +                     NULL,                    \
>> + &(__vm_bo)->list.entry.__list_name,        \
>> +                false)
>> +
>>   #define to_drm_gpuva(__node)    container_of((__node), struct
>> drm_gpuva, rb.node)
>>     #define GPUVA_START(node) ((node)->va.addr)
>> @@ -760,6 +978,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>> drm_gem_object *r_obj,
>>       gpuvm->rb.tree = RB_ROOT_CACHED;
>>       INIT_LIST_HEAD(&gpuvm->rb.list);
>>   +    INIT_LIST_HEAD(&gpuvm->extobj.list);
>> +    spin_lock_init(&gpuvm->extobj.lock);
>> +
>> +    INIT_LIST_HEAD(&gpuvm->evict.list);
>> +    spin_lock_init(&gpuvm->evict.lock);
>> +
>>       drm_gpuvm_check_overflow(start_offset, range);
>>       gpuvm->mm_start = start_offset;
>>       gpuvm->mm_range = range;
>> @@ -802,10 +1026,373 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>>       WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>            "GPUVA tree is not empty, potentially leaking memory.\n");
>>   +    WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be
>> empty.\n");
>> +    WARN(!list_empty(&gpuvm->evict.list), "Evict list should be
>> empty.\n");
>> +
>>       drm_gem_object_put(gpuvm->r_obj);
>>   }
>>   EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>   +static int
>> +__drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>> +                struct drm_exec *exec,
>> +                unsigned int num_fences)
>> +{
>> +    struct drm_gpuvm_bo *vm_bo;
>> +    LIST_HEAD(extobjs);
>> +    int ret = 0;
>> +
>> +    for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>> +        ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>> +        if (ret)
>> +            break;
>> +    }
>> +    /* Drop ref in case we break out of the loop. */
>> +    drm_gpuvm_bo_put(vm_bo);
>> +    restore_vm_bo_list(gpuvm, extobj);
>> +
>> +    return ret;
>> +}
>> +
>> +static int
>> +drm_gpuvm_prepare_objects_locked(struct drm_gpuvm *gpuvm,
>> +                 struct drm_exec *exec,
>> +                 unsigned int num_fences)
>> +{
>> +    struct drm_gpuvm_bo *vm_bo;
>> +    int ret = 0;
>> +
>> +    drm_gpuvm_resv_assert_held(gpuvm);
>> +    list_for_each_entry(vm_bo, &gpuvm->extobj.list,
>> list.entry.extobj) {
>> +        ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>> +        if (ret)
>> +            break;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +/**
>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>> + * @gpuvm: the &drm_gpuvm
>> + * @exec: the &drm_exec locking context
>> + * @num_fences: the amount of &dma_fences to reserve
>> + *
>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
>> + * &drm_gpuvm contains mappings of.
>> + *
>> + * Using this function directly, it is the driver's responsibility to
>> call
>> + * drm_exec_init() and drm_exec_fini() accordingly.
>> + *
>> + * Note: This function is safe against concurrent insertion and
>> removal of
>> + * external objects, however it is not safe against concurrent usage
>> itself.
>> + *
>> + * Drivers need to make sure to protect this case with either an
>> outer VM lock
>> + * or by calling drm_gpuvm_prepare_vm() before this function within the
>> + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv
>> lock ensures
>> + * mutual exclusion.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>> +              struct drm_exec *exec,
>> +              unsigned int num_fences)
>> +{
>> +    if (drm_gpuvm_resv_protected(gpuvm))
>> +        return drm_gpuvm_prepare_objects_locked(gpuvm, exec,
>> +                            num_fences);
>> +    else
>> +        return __drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>> +
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>> +
>> +/**
>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given
>> range
>> + * @gpuvm: the &drm_gpuvm
>> + * @exec: the &drm_exec locking context
>> + * @addr: the start address within the VA space
>> + * @range: the range to iterate within the VA space
>> + * @num_fences: the amount of &dma_fences to reserve
>> + *
>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped
>> between @addr
>> + * and @addr + @range.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
>> +            u64 addr, u64 range, unsigned int num_fences)
>> +{
>> +    struct drm_gpuva *va;
>> +    u64 end = addr + range;
>> +    int ret;
>> +
>> +    drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>> +        struct drm_gem_object *obj = va->gem.obj;
>> +
>> +        ret = drm_exec_prepare_obj(exec, obj, num_fences);
>> +        if (ret)
>> +            return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>> +
>> +/**
>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>> + * @num_fences: the amount of &dma_fences to reserve
>> + * @interruptible: sleep interruptible if waiting
>> + *
>> + * Acquires all dma-resv locks of all &drm_gem_objects the given
>> + * &drm_gpuvm contains mappings of.
>> + *
>> + * Additionally, when calling this function with struct
>> drm_gpuvm_exec::extra
>> + * being set the driver receives the given @fn callback to lock
>> additional
>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>> Typically, drivers
>> + * would call drm_exec_prepare_obj() from within this callback.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>> +            unsigned int num_fences,
>> +            bool interruptible)
>> +{
>> +    struct drm_gpuvm *gpuvm = vm_exec->vm;
>> +    struct drm_exec *exec = &vm_exec->exec;
>> +    uint32_t flags;
>> +    int ret;
>> +
>> +    flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0 |
>> +        DRM_EXEC_IGNORE_DUPLICATES;
>> +
>> +    drm_exec_init(exec, flags);
>> +
>> +    drm_exec_until_all_locked(exec) {
>> +        ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>> +        drm_exec_retry_on_contention(exec);
>> +        if (ret)
>> +            goto err;
>> +
>> +        ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>> +        drm_exec_retry_on_contention(exec);
>> +        if (ret)
>> +            goto err;
>> +
>> +        if (vm_exec->extra.fn) {
>> +            ret = vm_exec->extra.fn(vm_exec, num_fences);
>> +            drm_exec_retry_on_contention(exec);
>> +            if (ret)
>> +                goto err;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +
>> +err:
>> +    drm_exec_fini(exec);
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>> +
>> +static int
>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>> +{
>> +    struct {
>> +        struct drm_gem_object **objs;
>> +        unsigned int num_objs;
>> +    } *args = vm_exec->extra.priv;
>> +
>> +    return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>> +                      args->num_objs, num_fences);
>> +}
>> +
>> +/**
>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all
>> associated BOs
>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>> + * @objs: additional &drm_gem_objects to lock
>> + * @num_objs: the number of additional &drm_gem_objects to lock
>> + * @num_fences: the amount of &dma_fences to reserve
>> + * @interruptible: sleep interruptible if waiting
>> + *
>> + * Acquires all dma-resv locks of all &drm_gem_objects the given
>> &drm_gpuvm
>> + * contains mappings of, plus the ones given through @objs.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>> +              struct drm_gem_object **objs,
>> +              unsigned int num_objs,
>> +              unsigned int num_fences,
>> +              bool interruptible)
>> +{
>> +    struct {
>> +        struct drm_gem_object **objs;
>> +        unsigned int num_objs;
>> +    } args;
>> +
>> +    args.objs = objs;
>> +    args.num_objs = num_objs;
>> +
>> +    vm_exec->extra.fn = fn_lock_array;
>> +    vm_exec->extra.priv = &args;
>> +
>> +    return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>> +
>> +/**
>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a
>> given range
>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>> + * @addr: the start address within the VA space
>> + * @range: the range to iterate within the VA space
>> + * @num_fences: the amount of &dma_fences to reserve
>> + * @interruptible: sleep interruptible if waiting
>> + *
>> + * Acquires all dma-resv locks of all &drm_gem_objects mapped
>> between @addr and
>> + * @addr + @range.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>> +              u64 addr, u64 range,
>> +              unsigned int num_fences,
>> +              bool interruptible)
>> +{
>> +    struct drm_gpuvm *gpuvm = vm_exec->vm;
>> +    struct drm_exec *exec = &vm_exec->exec;
>> +    uint32_t flags;
>> +    int ret;
>> +
>> +    flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0 |
>> +        DRM_EXEC_IGNORE_DUPLICATES;
>> +
>> +    drm_exec_init(exec, flags);
>> +
>> +    drm_exec_until_all_locked(exec) {
>> +        ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>> +                          num_fences);
>> +        drm_exec_retry_on_contention(exec);
>> +        if (ret)
>> +            goto err;
>> +    }
>> +
>> +    return ret;
>> +
>> +err:
>> +    drm_exec_fini(exec);
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>> +
>> +static int
>> +__drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
>> +{
>> +    const struct drm_gpuvm_ops *ops = gpuvm->ops;
>> +    struct drm_gpuvm_bo *vm_bo;
>> +    LIST_HEAD(evict);
>> +    int ret = 0;
>> +
>> +    for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>> +        ret = ops->vm_bo_validate(vm_bo, exec);
>> +        if (ret)
>> +            break;
>> +    }
>> +    /* Drop ref in case we break out of the loop. */
>> +    drm_gpuvm_bo_put(vm_bo);
>> +    restore_vm_bo_list(gpuvm, evict);
>> +
>> +    return ret;
>> +}
>> +
>> +static int
>> +drm_gpuvm_validate_locked(struct drm_gpuvm *gpuvm, struct drm_exec
>> *exec)
>> +{
>> +    const struct drm_gpuvm_ops *ops = gpuvm->ops;
>> +    struct drm_gpuvm_bo *vm_bo, *next;
>> +    int ret = 0;
>> +
>> +#ifdef CONFIG_LOCKDEP
>> +    drm_gpuvm_resv_assert_held(gpuvm);
>> +    list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj)
>> +        dma_resv_assert_held(vm_bo->obj->resv);
>> +#endif
>> +
>> +    /* Iterate list safely, drivers typically remove the current
>> entry from
>> +     * their drm_gpuvm_ops::vm_bo_validate callback. Drivers might also
>> +     * re-add the entry on failure; this is safe since on failure we
>> break
>> +     * out of the loop.
>> +     */
>> +    list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
>> +                 list.entry.evict) {
>> +        ret = ops->vm_bo_validate(vm_bo, exec);
>> +        if (ret)
>> +            break;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +/**
>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>> + * @exec: the &drm_exec instance used for locking the GPUVM
>> + *
>> + * Calls the &drm_gpuvm_ops::vm_bo_validate callback for all evicted
>> buffer
>> + * objects being mapped in the given &drm_gpuvm.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
>> +{
>> +    const struct drm_gpuvm_ops *ops = gpuvm->ops;
>> +
>> +    if (unlikely(!ops || !ops->vm_bo_validate))
>> +        return -ENOTSUPP;
>> +
>> +    if (drm_gpuvm_resv_protected(gpuvm))
>> +        return drm_gpuvm_validate_locked(gpuvm, exec);
>> +    else
>> +        return __drm_gpuvm_validate(gpuvm, exec);
>> +
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>> +
>> +/**
>> + * drm_gpuvm_resv_add_fence - add fence to private and all extobj
>> + * dma-resv
>> + * @gpuvm: the &drm_gpuvm to add a fence to
>> + * @exec: the &drm_exec locking context
>> + * @fence: fence to add
>> + * @private_usage: private dma-resv usage
>> + * @extobj_usage: extobj dma-resv usage
>> + */
>> +void
>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>> +             struct drm_exec *exec,
>> +             struct dma_fence *fence,
>> +             enum dma_resv_usage private_usage,
>> +             enum dma_resv_usage extobj_usage)
>> +{
>> +    struct drm_gem_object *obj;
>> +    unsigned long index;
>> +
>> +    drm_exec_for_each_locked_object(exec, index, obj) {
>> +        dma_resv_assert_held(obj->resv);
>> +        dma_resv_add_fence(obj->resv, fence,
>> +                   drm_gpuvm_is_extobj(gpuvm, obj) ?
>> +                   private_usage : extobj_usage);
>> +    }
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>> +
>>   /**
>>    * drm_gpuvm_bo_create() - create a new instance of struct
>> drm_gpuvm_bo
>>    * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>> @@ -838,6 +1425,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>>       INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>       INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>   +    INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>> +    INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>> +
>>       drm_gem_object_get(obj);
>>         return vm_bo;
>> @@ -858,6 +1448,9 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>       if (!lock)
>>           drm_gpuvm_resv_assert_held(gpuvm);
>>   +    drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
>> +    drm_gpuvm_bo_list_del(vm_bo, evict, lock);
>> +
>>       list_del(&vm_bo->list.entry.gem);
>>         drm_gem_object_put(obj);
>> @@ -994,6 +1587,55 @@ drm_gpuvm_bo_obtain_prealloc(struct
>> drm_gpuvm_bo *__vm_bo)
>>   }
>>   EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>   +/**
>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its
>> &drm_gpuvm's
>> + * extobj list
>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj
>> list.
>> + *
>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on
>> the list
>> + * already and if the corresponding &drm_gem_object actually is an
>> external object.
>> + */
>> +void
>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>> +{
>> +    struct drm_gpuvm *gpuvm = vm_bo->vm;
>> +    bool lock = !drm_gpuvm_resv_protected(gpuvm);
>> +
>> +    if (!lock)
>> +        drm_gpuvm_resv_assert_held(gpuvm);
>> +
>> +    if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>> +        drm_gpuvm_bo_list_add(vm_bo, extobj, lock);
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>> +
>> +/**
>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / from the
>> &drm_gpuvm's
>> + * evicted list
>> + * @vm_bo: the &drm_gpuvm_bo to add or remove
>> + * @evict: indicates whether the object is evicted
>> + *
>> + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvm's
>> evicted list.
>> + */
>> +void
>> +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
>> +{
>> +    struct drm_gem_object *obj = vm_bo->obj;
>> +
>> +    dma_resv_assert_held(obj->resv);
>> +
>> +    /* Always lock list transactions, even if
>> DRM_GPUVM_RESV_PROTECTED is
>> +     * set. This is required to protect multiple concurrent calls to
>> +     * drm_gpuvm_bo_evict() with BOs with different dma_resv.
>> +     */
>
> This doesn't work. The RESV_PROTECTED case requires the evicted flag
> we discussed before. The list is either protected by the spinlock or
> the resv. Otherwise a list add could race with a list removal elsewhere.
>
> Thanks,
>
> Thomas
>
>

2023-10-03 10:06:32

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

Hello Thomas,

On Tue, 3 Oct 2023 10:36:10 +0200
Thomas Hellström <[email protected]> wrote:

> > +/**
> > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already iterated items
> > + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> > + *
> > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > + * iterator releases the lock immediately after picking the first element from
> > + * the list, so list insertion and deletion can happen concurrently.
> > + *
> > + * Elements popped from the original list are kept in a local list, so removal
> > + * and is_empty checks can still happen while we're iterating the list.
> > + */
> > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
> > + ({ \
> > + struct drm_gpuvm_bo *__vm_bo = NULL; \
> > + \
> > + drm_gpuvm_bo_put(__prev_vm_bo); \
> > + \
> > + spin_lock(&(__gpuvm)->__list_name.lock); \
>
> Here we unconditionally take the spinlocks while iterating, and the main
> point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?
>
>
> > + if (!(__gpuvm)->__list_name.local_list) \
> > + (__gpuvm)->__list_name.local_list = __local_list; \
> > + else \
> > + WARN_ON((__gpuvm)->__list_name.local_list != __local_list); \
> > + \
> > + while (!list_empty(&(__gpuvm)->__list_name.list)) { \
> > + __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
> > + struct drm_gpuvm_bo, \
> > + list.entry.__list_name); \
> > + if (kref_get_unless_zero(&__vm_bo->kref)) {
> And unnecessarily grab a reference in the RESV_PROTECTED case.
> > \
> > + list_move_tail(&(__vm_bo)->list.entry.__list_name, \
> > + __local_list); \
> > + break; \
> > + } else { \
> > + list_del_init(&(__vm_bo)->list.entry.__list_name); \
> > + __vm_bo = NULL; \
> > + } \
> > + } \
> > + spin_unlock(&(__gpuvm)->__list_name.lock); \
> > + \
> > + __vm_bo; \
> > + })
>
> IMHO this lockless list iteration looks very complex and should be
> pretty difficult to maintain while moving forward, also since it pulls
> the gpuvm_bos off the list, list iteration needs to be protected by an
> outer lock anyway.

As being partly responsible for this convoluted list iterator, I must
say I agree with you. There's so many ways this can go wrong if the
user doesn't call it the right way, or doesn't protect concurrent list
iterations with a separate lock (luckily, this is a private iterator). I
mean, it works, so there's certainly a way to get it right, but gosh,
this is so far from the simple API I had hoped for.

> Also from what I understand from Boris, the extobj
> list would typically not need the fine-grained locking; only the evict
> list?

Right, I'm adding the gpuvm_bo to extobj list in the ioctl path, when
the GEM and VM resvs are held, and I'm deferring the drm_gpuvm_bo_put()
call to a work that's not in the dma-signalling path. This being said,
I'm still not comfortable with the

gem = drm_gem_object_get(vm_bo->gem);
dma_resv_lock(gem->resv);
drm_gpuvm_bo_put(vm_bo);
dma_resv_unlock(gem->resv);
drm_gem_object_put(gem);

dance that's needed to avoid a UAF when the gpuvm_bo is the last GEM
owner, not to mention that drm_gpuva_unlink() calls drm_gpuvm_bo_put()
after making sure the GEM gpuvm_list lock is held, but this lock might
differ from the resv lock (custom locking so we can call
gpuvm_unlink() in the dma-signalling path). So we now have paths where
drm_gpuvm_bo_put() is called with the resv lock held, and others where
it is not, and that only works because we're relying on the fact that
those drm_gpuvm_bo_put() calls won't make the refcount drop to zero,
because the deferred vm_bo_put() work still owns a vm_bo ref.

All these tiny details add to the overall complexity of this common
layer, and to me, that's not any better than the
get_next_vm_bo_from_list() complexity you were complaining about (might
be even worse, because this sort of thing leaks to users).

Having an internal lock partly solves that, in that the locking of the
extobj list is now entirely orthogonal to the GEM that's being removed
from this list, and we can lock/unlock internally without forcing the
caller to take weird actions to make sure things don't explode. Don't
get me wrong, I get that this locking overhead is not acceptable for
Xe, but I feel like we're turning drm_gpuvm into a white elephant that
only a few people will get right.

This is just my personal view on this, and I certainly don't want to
block or delay the merging of this patchset, but I thought I'd share my
concerns. As someone who's been following the evolution of this
drm_gpuva/vm series for weeks, and who's still sometimes getting lost,
I can't imagine how new drm_gpuvm users would feel...

> Also it seems that if we are to maintain two modes here, for
> reasonably clean code we'd need two separate instances of
> get_next_bo_from_list().
>
> For the !RESV_PROTECTED case, perhaps one would want to consider the
> solution used currently in xe, where the VM maintains two evict lists.
> One protected by a spinlock and one protected by the VM resv. When the
> VM resv is locked to begin list traversal, the spinlock is locked *once*
> and the spinlock-protected list is looped over and copied into the resv
> protected one. For traversal, the resv protected one is used.

Oh, so you do have the same sort of trick where you move the entire
list to another list, such that you can let other paths update the list
while you're iterating your own snapshot. That's interesting...

Regards,

Boris

2023-10-03 12:26:34

by Thomas Hellström

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

Hi, Boris,

On Tue, 2023-10-03 at 12:05 +0200, Boris Brezillon wrote:
> Hello Thomas,
>
> On Tue, 3 Oct 2023 10:36:10 +0200
> Thomas Hellström <[email protected]> wrote:
>
> > > +/**
> > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > + * @__gpuvm: The GPU VM
> > > + * @__list_name: The name of the list we're iterating on
> > > + * @__local_list: A pointer to the local list used to store
> > > already iterated items
> > > + * @__prev_vm_bo: The previous element we got from
> > > drm_gpuvm_get_next_cached_vm_bo()
> > > + *
> > > + * This helper is here to provide lockless list iteration.
> > > Lockless as in, the
> > > + * iterator releases the lock immediately after picking the
> > > first element from
> > > + * the list, so list insertion deletion can happen concurrently.
> > > + *
> > > + * Elements popped from the original list are kept in a local
> > > list, so removal
> > > + * and is_empty checks can still happen while we're iterating
> > > the list.
> > > + */
> > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> > > +	({									\
> > > +		struct drm_gpuvm_bo *__vm_bo = NULL;				\
> > > +										\
> > > +		drm_gpuvm_bo_put(__prev_vm_bo);					\
> > > +										\
> > > +		spin_lock(&(__gpuvm)->__list_name.lock);			\
> >
> > Here we unconditionally take the spinlocks while iterating, and the
> > main point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?
> >
> > > +		if (!(__gpuvm)->__list_name.local_list)				\
> > > +			(__gpuvm)->__list_name.local_list = __local_list;	\
> > > +		else								\
> > > +			WARN_ON((__gpuvm)->__list_name.local_list != __local_list);	\
> > > +										\
> > > +		while (!list_empty(&(__gpuvm)->__list_name.list)) {		\
> > > +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> > > +						   struct drm_gpuvm_bo,		\
> > > +						   list.entry.__list_name);	\
> > > +			if (kref_get_unless_zero(&__vm_bo->kref)) {		\
> >
> > And unnecessarily grab a reference in the RESV_PROTECTED case.
> >
> > > +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> > > +					       __local_list);			\
> > > +				break;						\
> > > +			} else {						\
> > > +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > > +				__vm_bo = NULL;					\
> > > +			}							\
> > > +		}								\
> > > +		spin_unlock(&(__gpuvm)->__list_name.lock);			\
> > > +										\
> > > +		__vm_bo;							\
> > > +	})
> >
> > IMHO this lockless list iteration looks very complex and should be
> > pretty difficult to maintain while moving forward, also since it
> > pulls
> > the gpuvm_bos off the list, list iteration needs to be protected by
> > an
> > outer lock anyway.
>
> As being partly responsible for this convoluted list iterator, I must
> say I agree with you. There's so many ways this can go wrong if the
> user doesn't call it the right way, or doesn't protect concurrent
> list
> iterations with a separate lock (luckily, this is a private
> iterator). I
> mean, it works, so there's certainly a way to get it right, but gosh,
> this is so far from the simple API I had hoped for.
>
> > Also from what I understand from Boris, the extobj
> > list would typically not need the fine-grained locking; only the
> > evict
> > list?
>
> Right, I'm adding the gpuvm_bo to extobj list in the ioctl path, when
> the GEM and VM resvs are held, and I'm deferring the
> drm_gpuvm_bo_put()
> call to a work that's not in the dma-signalling path. This being
> said,
> I'm still not comfortable with the
>
> gem = drm_gem_object_get(vm_bo->gem);
> dma_resv_lock(gem->resv);
> drm_gpuvm_bo_put(vm_bo);
> dma_resv_unlock(gem->resv);
> drm_gem_object_put(gem);
>
> dance that's needed to avoid a UAF when the gpuvm_bo is the last GEM
> owner, not to mention that drm_gpuva_unlink() calls
> drm_gpuvm_bo_put()
> after making sure the GEM gpuvm_list lock is held, but this lock
> might
> differ from the resv lock (custom locking so we can call
> gpuvm_unlink() in the dma-signalling path). So we now have paths
> where
> drm_gpuvm_bo_put() are called with the resv lock held, and others
> where
> they are not, and that only works because we're relying on the the
> fact
> those drm_gpuvm_bo_put() calls won't make the refcount drop to zero,
> because the deferred vm_bo_put() work still owns a vm_bo ref.

I'm not sure I follow to 100% here, but in the code snippet above it's
pretty clear to me that it needs to hold an explicit gem object
reference when calling dma_resv_unlock(gem->resv). Each time you copy a
referenced pointer (here from vm_bo->gem to gem) you need to up the
refcount unless you make sure (by locks or other means) that the source
of the copy has a strong refcount and stays alive, so that's no weird
action to me. Could possibly add a drm_gpuvm_bo_get_gem() to access the
gem member (and that also takes a refcount) for driver users to avoid
the potential pitfall.
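Roughly something like the following; the helper is only a suggestion here
and not part of the series:

static inline struct drm_gem_object *
drm_gpuvm_bo_get_gem(struct drm_gpuvm_bo *vm_bo)
{
	struct drm_gem_object *obj = vm_bo->obj;

	/* Hand out the GEM object with an extra reference taken, so the
	 * caller can keep using it after dropping its vm_bo reference.
	 */
	drm_gem_object_get(obj);
	return obj;
}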

>
> All these tiny details add to the overall complexity of this common
> layer, and to me, that's not any better than the
> get_next_vm_bo_from_list() complexity you were complaining about
> (might
> be even worth, because this sort of things leak to users).
>
> Having an internal lock partly solves that, in that the locking of
> the
> extobj list is now entirely orthogonal to the GEM that's being
> removed
> from this list, and we can lock/unlock internally without forcing the
> caller to take weird actions to make sure things don't explode. Don't
> get me wrong, I get that this locking overhead is not acceptable for
> Xe, but I feel like we're turning drm_gpuvm into a white elephant
> that
> only few people will get right.

I tend to agree, but to me the big complication comes from the async
(dma signalling path) state updates.

Let's say for example we have a lower level lock for the gem object's
gpuvm_bo list. Some drivers grab it from the dma fence signalling path,
other drivers need to access all vm's of a bo to grab their dma_resv
locks using a WW transaction. There will be problems, although probably
solvable.

>
> This is just my personal view on this, and I certainly don't want to
> block or delay the merging of this patchset, but I thought I'd share
> my
> concerns. As someone who's been following the evolution of this
> drm_gpuva/vm series for weeks, and who's still sometimes getting
> lost,
> I can't imagine how new drm_gpuvm users would feel...


>
> > Also it seems that if we are to maintain two modes here, for
> > reasonably clean code we'd need two separate instances of
> > get_next_bo_from_list().
> >
> > For the !RESV_PROTECTED case, perhaps one would want to consider
> > the
> > solution used currently in xe, where the VM maintains two evict
> > lists.
> > One protected by a spinlock and one protected by the VM resv. When
> > the
> > VM resv is locked to begin list traversal, the spinlock is locked
> > *once*
> > and the spinlock-protected list is looped over and copied into the
> > resv
> > protected one. For traversal, the resv protected one is used.
>
> Oh, so you do have the same sort of trick where you move the entire
> list to another list, such that you can let other paths update the
> list
> while you're iterating your own snapshot. That's interesting...

Yes, it's instead of the "evicted" bool suggested here. I thought the
latter would be simpler. Although that remains to be seen after all
use-cases are implemented.

But in general I think the concept of copying from a staging list to
another with different protection rather than traversing the first list
and unlocking between items is a good way of solving the locking
inversion problem with minimal overhead. We use it also for O(1)
userptr validation.

/Thomas


>
> Regards,
>
> Boris

2023-10-03 14:22:13

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

On Tue, 03 Oct 2023 14:25:56 +0200
Thomas Hellström <[email protected]> wrote:

> > > > +/**
> > > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > > + * @__gpuvm: The GPU VM
> > > > + * @__list_name: The name of the list we're iterating on
> > > > + * @__local_list: A pointer to the local list used to store
> > > > already iterated items
> > > > + * @__prev_vm_bo: The previous element we got from
> > > > drm_gpuvm_get_next_cached_vm_bo()
> > > > + *
> > > > + * This helper is here to provide lockless list iteration.
> > > > Lockless as in, the
> > > > + * iterator releases the lock immediately after picking the
> > > > first element from
> > > > + * the list, so list insertion deletion can happen concurrently.
> > > > + *
> > > > + * Elements popped from the original list are kept in a local
> > > > list, so removal
> > > > + * and is_empty checks can still happen while we're iterating
> > > > the list.
> > > > + */
> > > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> > > > +	({									\
> > > > +		struct drm_gpuvm_bo *__vm_bo = NULL;				\
> > > > +										\
> > > > +		drm_gpuvm_bo_put(__prev_vm_bo);					\
> > > > +										\
> > > > +		spin_lock(&(__gpuvm)->__list_name.lock);			\
> > >
> > > Here we unconditionally take the spinlocks while iterating, and the
> > > main point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?
> > >
> > > > +		if (!(__gpuvm)->__list_name.local_list)				\
> > > > +			(__gpuvm)->__list_name.local_list = __local_list;	\
> > > > +		else								\
> > > > +			WARN_ON((__gpuvm)->__list_name.local_list != __local_list);	\
> > > > +										\
> > > > +		while (!list_empty(&(__gpuvm)->__list_name.list)) {		\
> > > > +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> > > > +						   struct drm_gpuvm_bo,		\
> > > > +						   list.entry.__list_name);	\
> > > > +			if (kref_get_unless_zero(&__vm_bo->kref)) {		\
> > >
> > > And unnecessarily grab a reference in the RESV_PROTECTED case.
> > >
> > > > +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> > > > +					       __local_list);			\
> > > > +				break;						\
> > > > +			} else {						\
> > > > +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > > > +				__vm_bo = NULL;					\
> > > > +			}							\
> > > > +		}								\
> > > > +		spin_unlock(&(__gpuvm)->__list_name.lock);			\
> > > > +										\
> > > > +		__vm_bo;							\
> > > > +	})
> > >
> > > IMHO this lockless list iteration looks very complex and should be
> > > pretty difficult to maintain while moving forward, also since it
> > > pulls
> > > the gpuvm_bos off the list, list iteration needs to be protected by
> > > an
> > > outer lock anyway.
> >
> > As being partly responsible for this convoluted list iterator, I must
> > say I agree with you. There's so many ways this can go wrong if the
> > user doesn't call it the right way, or doesn't protect concurrent
> > list
> > iterations with a separate lock (luckily, this is a private
> > iterator). I
> > mean, it works, so there's certainly a way to get it right, but gosh,
> > this is so far from the simple API I had hoped for.
> >
> > > Also from what I understand from Boris, the extobj
> > > list would typically not need the fine-grained locking; only the
> > > evict
> > > list?
> >
> > Right, I'm adding the gpuvm_bo to extobj list in the ioctl path, when
> > the GEM and VM resvs are held, and I'm deferring the
> > drm_gpuvm_bo_put()
> > call to a work that's not in the dma-signalling path. This being
> > said,
> > I'm still not comfortable with the
> >
> > gem = drm_gem_object_get(vm_bo->gem);
> > dma_resv_lock(gem->resv);
> > drm_gpuvm_bo_put(vm_bo);
> > dma_resv_unlock(gem->resv);
> > drm_gem_object_put(gem);
> >
> > dance that's needed to avoid a UAF when the gpuvm_bo is the last GEM
> > owner, not to mention that drm_gpuva_unlink() calls
> > drm_gpuvm_bo_put()
> > after making sure the GEM gpuvm_list lock is held, but this lock
> > might
> > differ from the resv lock (custom locking so we can call
> > gpuvm_unlink() in the dma-signalling path). So we now have paths
> > where
> > drm_gpuvm_bo_put() are called with the resv lock held, and others
> > where
> > they are not, and that only works because we're relying on the the
> > fact
> > those drm_gpuvm_bo_put() calls won't make the refcount drop to zero,
> > because the deferred vm_bo_put() work still owns a vm_bo ref.
>
> I'm not sure I follow to 100% here, but in the code snippet above it's
> pretty clear to me that it needs to hold an explicit gem object
> reference when calling dma_resv_unlock(gem->resv). Each time you copy a
> referenced pointer (here from vm_bo->gem to gem) you need to up the
> refcount unless you make sure (by locks or other means) that the source
> of the copy has a strong refcount and stays alive, so that's no weird
> action to me. Could possibly add a drm_gpuvm_bo_get_gem() to access the
> gem member (and that also takes a refcount) for driver users to avoid
> the potential pitfall.

Except this is only needed because of the GEM-resv-must-be-held locking
constraint that was added on vm_bo_put(). I mean, the usual way we do
object un-referencing is by calling _put() and letting the internal
logic undo things when the refcount drops to zero. If the object needs
to be removed from some list, it's normally the responsibility of the
destruction method to lock the list, remove the object and unlock the
list. Now, we have a refcounted object that's referenced by vm_bo, and
whose lock needs to be taken when the destruction happens, which leads
to this weird dance described above, whereas in normal situations we'd
just call drm_gpuvm_bo_put(vm_bo) and let drm_gpuvm do its thing.

>
> >
> > All these tiny details add to the overall complexity of this common
> > layer, and to me, that's not any better than the
> > get_next_vm_bo_from_list() complexity you were complaining about
> > (might
> > be even worth, because this sort of things leak to users).
> >
> > Having an internal lock partly solves that, in that the locking of
> > the
> > extobj list is now entirely orthogonal to the GEM that's being
> > removed
> > from this list, and we can lock/unlock internally without forcing the
> > caller to take weird actions to make sure things don't explode. Don't
> > get me wrong, I get that this locking overhead is not acceptable for
> > Xe, but I feel like we're turning drm_gpuvm into a white elephant
> > that
> > only few people will get right.
>
> I tend to agree, but to me the big complication comes from the async
> (dma signalling path) state updates.

I don't deny updating the VM state from the dma signalling path adds
some amount of complexity, but the fact we're trying to use dma_resv
locks for everything, including protection of internal datasets doesn't
help. Anyway, I think both of us are biased when it comes to judging
which approach adds the most complexity :P.

Also note that, right now, the only thing I'd like to be able to update
from the dma signalling path is the VM mapping tree. Everything else
(drm_gpuva_[un]link(), add/remove extobj), we could do outside this
path:

- for MAP operations, we could call drm_gpuva_link() in the ioctl path
(we'd just need to initialize the drm_gpuva object)
- for MAP operations, we're already calling drm_gpuvm_bo_obtain() from
the ioctl path
- for UNMAP operations, we could add the drm_gpuva_unlink() call to the
VM op cleanup worker

The only problem we'd have is that drm_gpuva_link() needs to be called
inside drm_gpuvm_ops::sm_step_remap() when a remap with next/prev !=
NULL occurs, otherwise we lose track of these mappings.

>
> Let's say for example we have a lower level lock for the gem object's
> gpuvm_bo list. Some drivers grab it from the dma fence signalling path,
> other drivers need to access all vm's of a bo to grab their dma_resv
> locks using a WW transaction. There will be problems, although probably
> solvable.

To me, the gpuvm extobj vm_bo list is just an internal list and has an
internal lock associated. The lock that's protecting the GEM vm_bo list
is a bit different in that the driver gets to decide when a vm_bo is
inserted/removed by calling drm_gpuvm_[un]link(), and can easily make
sure the lock is held when this happens, while the gpuvm internal lists
are kinda transparently updated (for instance, the first caller of
drm_gpuvm_bo_obtain() adds the vm_bo to the extobj list and the last vm_bo
owner calling drm_gpuvm_bo_put() removes it from this list, which is
certainly not obvious based on the name of these functions).

If we want to let drivers iterate over the extobj/evict lists, and
assuming they are considered internal lists maintained by the core and
protected with an internal lock, we should indeed provide iteration
helpers that:

1/ make sure all the necessary external locks are held (VM resv, I
guess)
2/ make sure the internal lock is not held during iteration (the sort
of snapshot list trick you're using for the evict list in Xe)

> > > Also it seems that if we are to maintain two modes here, for
> > > reasonably clean code we'd need two separate instances of
> > > get_next_bo_from_list().
> > >
> > > For the !RESV_PROTECTED case, perhaps one would want to consider
> > > the
> > > solution used currently in xe, where the VM maintains two evict
> > > lists.
> > > One protected by a spinlock and one protected by the VM resv. When
> > > the
> > > VM resv is locked to begin list traversal, the spinlock is locked
> > > *once*
> > > and the spinlock-protected list is looped over and copied into the
> > > resv
> > > protected one. For traversal, the resv protected one is used.
> >
> > Oh, so you do have the same sort of trick where you move the entire
> > list to another list, such that you can let other paths update the
> > list
> > while you're iterating your own snapshot. That's interesting...
>
> Yes, it's instead of the "evicted" bool suggested here. I thought the
> latter would be simpler. Although that remains to be seen after all
> use-cases are implemented.
>
> But in general I think the concept of copying from a staging list to
> another with different protection rather than traversing the first list
> and unlocking between items is a good way of solving the locking
> inversion problem with minimal overhead. We use it also for O(1)
> userptr validation.

That's more or less the idea behind get_next_vm_bo_from_list() except
it's dequeuing one element at a time, instead of moving all items at
once. Note that, if you allow concurrent removal protected only by the
spinlock, you still need to take/release this spinlock when iterating
over elements of this snapshot list, because all the remover needs to
remove an element is the element itself, and it doesn't care in which
list it's currently inserted (real or snapshot/staging list), so you'd
be iterating over a moving target if you don't protect the iteration
with the spinlock.
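
For illustration, the remover side is roughly this (all it has is the vm_bo,
and list_del_init() works no matter which of the two lists the element
currently sits on, hence the iterator must take the same lock per element):

spin_lock(&gpuvm->evict.lock);
list_del_init(&vm_bo->list.entry.evict);
spin_unlock(&gpuvm->evict.lock);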

2023-10-03 16:57:20

by Danilo Krummrich

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

It seems like we're mostly aligned on this series, except for the key
controversy we're discussing for a few versions now: locking of the internal
lists. Hence, let's just re-iterate the options we have to get this out of the
way.

(1) The spinlock dance. This basically works for every use case, updating the VA
space from the IOCTL, from the fence signaling path or anywhere else.
However, it has the downside of requiring spin_lock() / spin_unlock() for
*each* list element when locking all external objects and validating all
evicted objects. Typically, the amount of extobjs and evicted objects
shouldn't be excessive, but there might be exceptions, e.g. Xe.

(2) The dma-resv lock dance. This is convenient for drivers updating the VA
space from a VM_BIND ioctl() and is especially efficient if such drivers
have a huge amount of external and/or evicted objects to manage. However,
the downsides are that it requires a few tricks in drivers updating the VA
space from the fence signaling path (e.g. job_run()). Design wise, I'm still
skeptical that it is a good idea to protect internal data structures with
external locks in a way that it's not clear to callers that a certain
function would access one of those resources and hence needs protection.
E.g. it is counter intuitive that drm_gpuvm_bo_put() would require both the
dma-resv lock of the corresponding object and the VM's dma-resv lock held.
(Additionally, there were some concerns from amdgpu regarding flexibility in
terms of using GPUVM for non-VM_BIND uAPIs and compute, however, AFAICS
those discussions did not complete and to me it's still unclear why it
wouldn't work.)

(3) Simply use an internal mutex per list. This adds a tiny (IMHO negligible)
overhead for drivers updating the VA space from a VM_BIND ioctl(), namely
a *single* mutex_lock()/mutex_unlock() when locking all external objects
and validating all evicted objects. And it still requires some tricks for
drivers updating the VA space from the fence signaling path. However, it's
as simple as it can be and hence way less error prone as well as
self-contained and hence easy to use. Additionally, it's flexible in a way
that we don't have any expectations on drivers to already hold certain locks
that the driver in some situation might not be able to acquire in the first
place.

(4) Arbitrary combinations of the above. For instance, the current V5 implements
both (1) and (2) (as either one or the other). But also (1) and (3) (as in
(1) in addition to (3)) would be an option, where a driver could opt-in for
the spinlock dance in case it updates the VA space from the fence signaling
path.

I also considered a few other options; however, they don't seem to be
flexible enough. For instance, as of now we could use SRCU for the external
object list. However, this falls apart once a driver wants to remove and re-add
extobjs for the same VM_BO instance. (For the same reason it wouldn't work for
evicted objects.)

Personally, after seeing the weird implications of (1), (2) and a combination of
both, I tend to go with (3). Optionally, with an opt-in for (1). The reason for
the latter is that with (3) the weirdness of (1) on its own mostly disappears.

Please let me know what you think, and, of course, other ideas than the
mentioned ones above are still welcome.

- Danilo

On Tue, Oct 03, 2023 at 04:21:43PM +0200, Boris Brezillon wrote:
> On Tue, 03 Oct 2023 14:25:56 +0200
> Thomas Hellström <[email protected]> wrote:
>
> > > > > +/**
> > > > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > > > + * @__gpuvm: The GPU VM
> > > > > + * @__list_name: The name of the list we're iterating on
> > > > > + * @__local_list: A pointer to the local list used to store
> > > > > already iterated items
> > > > > + * @__prev_vm_bo: The previous element we got from
> > > > > drm_gpuvm_get_next_cached_vm_bo()
> > > > > + *
> > > > > + * This helper is here to provide lockless list iteration.
> > > > > Lockless as in, the
> > > > > + * iterator releases the lock immediately after picking the
> > > > > first element from
> > > > > + * the list, so list insertion deletion can happen concurrently.
> > > > > + *
> > > > > + * Elements popped from the original list are kept in a local
> > > > > list, so removal
> > > > > + * and is_empty checks can still happen while we're iterating
> > > > > the list.
> > > > > + */
> > > > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
> > > > > +        ({ \
> > > > > +                struct drm_gpuvm_bo *__vm_bo = NULL; \
> > > > > + \
> > > > > +                drm_gpuvm_bo_put(__prev_vm_bo); \
> > > > > + \
> > > > > +                spin_lock(&(__gpuvm)->__list_name.lock); \
> > > >
> > > > Here we unconditionally take the spinlocks while iterating, and the main
> > > > point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?
> > > >
> > > > > +                if (!(__gpuvm)->__list_name.local_list) \
> > > > > +                        (__gpuvm)->__list_name.local_list = __local_list; \
> > > > > +                else \
> > > > > +                        WARN_ON((__gpuvm)->__list_name.local_list != __local_list); \
> > > > > + \
> > > > > +                while (!list_empty(&(__gpuvm)->__list_name.list)) { \
> > > > > +                        __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
> > > > > +                                                   struct drm_gpuvm_bo, \
> > > > > +                                                   list.entry.__list_name); \
> > > > > +                        if (kref_get_unless_zero(&__vm_bo->kref)) { \
> > > >
> > > > And unnecessarily grab a reference in the RESV_PROTECTED case.
> > > >
> > > > > +                                list_move_tail(&(__vm_bo)->list.entry.__list_name, \
> > > > > +                                               __local_list); \
> > > > > +                                break; \
> > > > > +                        } else { \
> > > > > +                                list_del_init(&(__vm_bo)->list.entry.__list_name); \
> > > > > +                                __vm_bo = NULL; \
> > > > > +                        } \
> > > > > +                } \
> > > > > +                spin_unlock(&(__gpuvm)->__list_name.lock); \
> > > > > + \
> > > > > +                __vm_bo; \
> > > > > +        })
> > > >
> > > > IMHO this lockless list iteration looks very complex and should be
> > > > pretty difficult to maintain while moving forward, also since it
> > > > pulls
> > > > the gpuvm_bos off the list, list iteration needs to be protected by
> > > > an
> > > > outer lock anyway.
> > >
> > > As being partly responsible for this convoluted list iterator, I must
> > > say I agree with you. There's so many ways this can go wrong if the
> > > user doesn't call it the right way, or doesn't protect concurrent
> > > list
> > > iterations with a separate lock (luckily, this is a private
> > > iterator). I
> > > mean, it works, so there's certainly a way to get it right, but gosh,
> > > this is so far from the simple API I had hoped for.
> > >
> > > > Also from what I understand from Boris, the extobj
> > > > list would typically not need the fine-grained locking; only the
> > > > evict
> > > > list?
> > >
> > > Right, I'm adding the gpuvm_bo to extobj list in the ioctl path, when
> > > the GEM and VM resvs are held, and I'm deferring the
> > > drm_gpuvm_bo_put()
> > > call to a work that's not in the dma-signalling path. This being
> > > said,
> > > I'm still not comfortable with the
> > >
> > > gem = drm_gem_object_get(vm_bo->gem);
> > > dma_resv_lock(gem->resv);
> > > drm_gpuvm_bo_put(vm_bo);
> > > dma_resv_unlock(gem->resv);
> > > drm_gem_object_put(gem);
> > >
> > > dance that's needed to avoid a UAF when the gpuvm_bo is the last GEM
> > > owner, not to mention that drm_gpuva_unlink() calls
> > > drm_gpuvm_bo_put()
> > > after making sure the GEM gpuvm_list lock is held, but this lock
> > > might
> > > differ from the resv lock (custom locking so we can call
> > > gpuvm_unlink() in the dma-signalling path). So we now have paths
> > > where
> > > drm_gpuvm_bo_put() are called with the resv lock held, and others
> > > where
> > > they are not, and that only works because we're relying on the the
> > > fact
> > > those drm_gpuvm_bo_put() calls won't make the refcount drop to zero,
> > > because the deferred vm_bo_put() work still owns a vm_bo ref.
> >
> > I'm not sure I follow to 100% here, but in the code snippet above it's
> > pretty clear to me that it needs to hold an explicit gem object
> > reference when calling dma_resv_unlock(gem->resv). Each time you copy a
> > referenced pointer (here from vm_bo->gem to gem) you need to up the
> > refcount unless you make sure (by locks or other means) that the source
> > of the copy has a strong refcount and stays alive, so that's no weird
> > action to me. Could possibly add a drm_gpuvm_bo_get_gem() to access the
> > gem member (and that also takes a refcount) for driver users to avoid
> > the potential pitfall.
>
> Except this is only needed because of the GEM-resv-must-be-held locking
> constraint that was added on vm_bo_put(). I mean, the usual way we do
> object un-referencing is by calling _put() and letting the internal
> logic undo things when the refcount drops to zero. If the object needs
> to be removed from some list, it's normally the responsibility of the
> destruction method to lock the list, remove the object and unlock the
> list. Now, we have a refcounted object that's referenced by vm_bo, and
> whose lock needs to be taken when the destruction happens, which leads
> to this weird dance described above, when, in normal situations, we'd
> just call drm_gpuvm_bo_put(vm_bo) and let drm_gpuvm do its thing.
>
> >
> > >
> > > All these tiny details add to the overall complexity of this common
> > > layer, and to me, that's not any better than the
> > > get_next_vm_bo_from_list() complexity you were complaining about
> > > (might
> > > be even worth, because this sort of things leak to users).
> > >
> > > Having an internal lock partly solves that, in that the locking of
> > > the
> > > extobj list is now entirely orthogonal to the GEM that's being
> > > removed
> > > from this list, and we can lock/unlock internally without forcing the
> > > caller to take weird actions to make sure things don't explode. Don't
> > > get me wrong, I get that this locking overhead is not acceptable for
> > > Xe, but I feel like we're turning drm_gpuvm into a white elephant
> > > that
> > > only few people will get right.
> >
> > I tend to agree, but to me the big complication comes from the async
> > (dma signalling path) state updates.
>
> I don't deny updating the VM state from the dma signalling path adds
> some amount of complexity, but the fact we're trying to use dma_resv
> locks for everything, including protection of internal datasets doesn't
> help. Anyway, I think both of us are biased when it comes to judging
> which approach adds the most complexity :P.
>
> Also note that, right now, the only thing I'd like to be able to update
> from the dma signalling path is the VM mapping tree. Everything else
> (drm_gpuva_[un]link(), add/remove extobj), we could do outside this
> path:
>
> - for MAP operations, we could call drm_gpuva_link() in the ioctl path
> (we'd just need to initialize the drm_gpuva object)
> - for MAP operations, we're already calling drm_gpuvm_bo_obtain() from
> the ioctl path
> - for UNMAP operations, we could add the drm_gpuva_unlink() call to the
> VM op cleanup worker
>
> The only problem we'd have is that drm_gpuva_link() needs to be called
> inside drm_gpuvm_ops::sm_step_remap() when a remap with next/prev !=
> NULL occurs, otherwise we lose track of these mappings.
>
> >
> > Let's say for example we have a lower level lock for the gem object's
> > gpuvm_bo list. Some drivers grab it from the dma fence signalling path,
> > other drivers need to access all vm's of a bo to grab their dma_resv
> > locks using a WW transaction. There will be problems, although probably
> > solveable.
>
> To me, the gpuvm extobj vm_bo list is just an internal list and has an
> internal lock associated. The lock that's protecting the GEM vm_bo list
> is a bit different in that the driver gets to decide when a vm_bo is
> inserted/removed by calling drm_gpuvm_[un]link(), and can easily make
> sure the lock is held when this happens, while the gpuvm internal lists
> are kinda transparently updated (for instance, the first caller of
> drm_gpuvm_bo_obtain() adds the vm_bo to the extobj and the last vm_bo
> owner calling drm_gpuvm_bo_put() removes it from this list, which is
> certainly not obvious based on the name of these functions).
>
> If we want to let drivers iterate over the extobj/evict lists, and
> assuming they are considered internal lists maintained by the core and
> protected with an internal lock, we should indeed provide iteration
> helpers that:
>
> 1/ make sure all the necessary external locks are held (VM resv, I
> guess)
> 2/ make sure the internal lock is not held during iteration (the sort
> of snapshot list trick you're using for the evict list in Xe)
>
> > > > Also it seems that if we are to maintain two modes here, for
> > > > reasonably clean code we'd need two separate instances of
> > > > get_next_bo_from_list().
> > > >
> > > > For the !RESV_PROTECTED case, perhaps one would want to consider
> > > > the
> > > > solution used currently in xe, where the VM maintains two evict
> > > > lists.
> > > > One protected by a spinlock and one protected by the VM resv. When
> > > > the
> > > > VM resv is locked to begin list traversal, the spinlock is locked
> > > > *once*
> > > > and the spinlock-protected list is looped over and copied into the
> > > > resv
> > > > protected one. For traversal, the resv protected one is used.
> > >
> > > Oh, so you do have the same sort of trick where you move the entire
> > > list to another list, such that you can let other paths update the
> > > list
> > > while you're iterating your own snapshot. That's interesting...
> >
> > Yes, it's instead of the "evicted" bool suggested here. I thought the
> > latter would be simpler. Although that remains to be seen after all
> > use-cases are implemented.
> >
> > But in general I think the concept of copying from a staging list to
> > another with different protection rather than traversing the first list
> > and unlocking between items is a good way of solving the locking
> > inversion problem with minimal overhead. We use it also for O(1)
> > userptr validation.
>
> That's more or less the idea behind get_next_vm_bo_from_list() except
> it's dequeuing one element at a time, instead of moving all items at
> once. Note that, if you allow concurrent removal protected only by the
> spinlock, you still need to take/release this spinlock when iterating
> over elements of this snapshot list, because all the remover needs to
> remove an element is the element itself, and it doesn't care in which
> list it's currently inserted (real or snapshot/staging list), so you'd
> be iterating over a moving target if you don't protect the iteration
> with the spinlock.
>

2023-10-03 17:38:41

by Thomas Hellström

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

Hi, Danilo

On Tue, 2023-10-03 at 18:55 +0200, Danilo Krummrich wrote:
> It seems like we're mostly aligned on this series, except for the key
> controversy we're discussing for a few versions now: locking of the
> internal
> lists. Hence, let's just re-iterate the options we have to get this
> out of the
> way.
>
> (1) The spinlock dance. This basically works for every use case,
> updating the VA
>     space from the IOCTL, from the fence signaling path or anywhere
> else.
>     However, it has the downside of requiring spin_lock() /
> spin_unlock() for
>     *each* list element when locking all external objects and
> validating all
>     evicted objects. Typically, the amount of extobjs and evicted
> objects
>     shouldn't be excessive, but there might be exceptions, e.g. Xe.
>
> (2) The dma-resv lock dance. This is convinient for drivers updating
> the VA
>     space from a VM_BIND ioctl() and is especially efficient if such
> drivers
>     have a huge amount of external and/or evicted objects to manage.
> However,
>     the downsides are that it requires a few tricks in drivers
> updating the VA
>     space from the fence signaling path (e.g. job_run()). Design
> wise, I'm still
>     skeptical that it is a good idea to protect internal data
> structures with
>     external locks in a way that it's not clear to callers that a
> certain
>     function would access one of those resources and hence needs
> protection.
>     E.g. it is counter intuitive that drm_gpuvm_bo_put() would
> require both the
>     dma-resv lock of the corresponding object and the VM's dma-resv
> lock held.
>     (Additionally, there were some concerns from amdgpu regarding
> flexibility in
>     terms of using GPUVM for non-VM_BIND uAPIs and compute, however,
> AFAICS
>     those discussions did not complete and to me it's still unclear
> why it
>     wouldn't work.)
>
> (3) Simply use an internal mutex per list. This adds a tiny (IMHO
> negligible)
>     overhead for drivers updating the VA space from a VM_BIND
> ioctl(), namely
>     a *single* mutex_lock()/mutex_unlock() when locking all external
> objects
>     and validating all evicted objects. And it still requires some
> tricks for
>     drivers updating the VA space from the fence signaling path.
> However, it's
>     as simple as it can be and hence way less error prone as well as
>     self-contained and hence easy to use. Additionally, it's flexible
> in a way
>     that we don't have any expections on drivers to already hold
> certain locks
>     that the driver in some situation might not be able to acquire in
> the first
>     place.

Such an overhead is fully OK IMO. But didn't we conclude at some point
that using a mutex in this way isn't possible, due to the fact that
validate() needs to be able to lock dma_resv, and we'd then have a
dma_resv -> mutex -> dma_resv locking order?


>
> (4) Arbitrary combinations of the above. For instance, the current V5
> implements
>     both (1) and (2) (as either one or the other). But also (1) and
> (3) (as in
>     (1) additionally to (3)) would be an option, where a driver could
> opt-in for
>     the spinlock dance in case it updates the VA space from the fence
> signaling
>     path.
>
> I also considered a few other options as well, however, they don't
> seem to be
> flexible enough. For instance, as by now we could use SRCU for the
> external
> object list. However, this falls apart once a driver wants to remove
> and re-add
> extobjs for the same VM_BO instance. (For the same reason it wouldn't
> work for
> evicted objects.)
>
> Personally, after seeing the weird implications of (1), (2) and a
> combination of
> both, I tend to go with (3). Optionally, with an opt-in for (1). The
> reason for
> the latter is that with (3) the weirdness of (1) by its own mostly
> disappears.
>
> Please let me know what you think, and, of course, other ideas than
> the
> mentioned ones above are still welcome.

Personally, after converting xe to version 5, I think it's pretty
convenient for the driver (although I had to add the evict trick), so I
think I'd vote for this, even if not currently using the opt-in for
(1).

/Thomas


>
> - Danilo
>
> On Tue, Oct 03, 2023 at 04:21:43PM +0200, Boris Brezillon wrote:
> > On Tue, 03 Oct 2023 14:25:56 +0200
> > Thomas Hellström <[email protected]> wrote:
> >
> > > > > > +/**
> > > > > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > > > > + * @__gpuvm: The GPU VM
> > > > > > + * @__list_name: The name of the list we're iterating on
> > > > > > + * @__local_list: A pointer to the local list used to
> > > > > > store
> > > > > > already iterated items
> > > > > > + * @__prev_vm_bo: The previous element we got from
> > > > > > drm_gpuvm_get_next_cached_vm_bo()
> > > > > > + *
> > > > > > + * This helper is here to provide lockless list iteration.
> > > > > > Lockless as in, the
> > > > > > + * iterator releases the lock immediately after picking
> > > > > > the
> > > > > > first element from
> > > > > > + * the list, so list insertion deletion can happen
> > > > > > concurrently.
> > > > > > + *
> > > > > > + * Elements popped from the original list are kept in a
> > > > > > local
> > > > > > list, so removal
> > > > > > + * and is_empty checks can still happen while we're
> > > > > > iterating
> > > > > > the list.
> > > > > > + */
> > > > > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
> > > > > > +        ({ \
> > > > > > +                struct drm_gpuvm_bo *__vm_bo = NULL; \
> > > > > > + \
> > > > > > +                drm_gpuvm_bo_put(__prev_vm_bo); \
> > > > > > + \
> > > > > > +                spin_lock(&(__gpuvm)->__list_name.lock); \
> > > > >
> > > > > Here we unconditionally take the spinlocks while iterating, and the main
> > > > > point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?
> > > > >
> > > > > > +                if (!(__gpuvm)->__list_name.local_list) \
> > > > > > +                        (__gpuvm)->__list_name.local_list = __local_list; \
> > > > > > +                else \
> > > > > > +                        WARN_ON((__gpuvm)->__list_name.local_list != __local_list); \
> > > > > > + \
> > > > > > +                while (!list_empty(&(__gpuvm)->__list_name.list)) { \
> > > > > > +                        __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
> > > > > > +                                                   struct drm_gpuvm_bo, \
> > > > > > +                                                   list.entry.__list_name); \
> > > > > > +                        if (kref_get_unless_zero(&__vm_bo->kref)) { \
> > > > >
> > > > > And unnecessarily grab a reference in the RESV_PROTECTED case.
> > > > >
> > > > > > +                                list_move_tail(&(__vm_bo)->list.entry.__list_name, \
> > > > > > +                                               __local_list); \
> > > > > > +                                break; \
> > > > > > +                        } else { \
> > > > > > +                                list_del_init(&(__vm_bo)->list.entry.__list_name); \
> > > > > > +                                __vm_bo = NULL; \
> > > > > > +                        } \
> > > > > > +                } \
> > > > > > +                spin_unlock(&(__gpuvm)->__list_name.lock); \
> > > > > > + \
> > > > > > +                __vm_bo; \
> > > > > > +        })
> > > > >
> > > > > IMHO this lockless list iteration looks very complex and
> > > > > should be
> > > > > pretty difficult to maintain while moving forward, also since
> > > > > it
> > > > > pulls
> > > > > the gpuvm_bos off the list, list iteration needs to be
> > > > > protected by
> > > > > an
> > > > > outer lock anyway. 
> > > >
> > > > As being partly responsible for this convoluted list iterator,
> > > > I must
> > > > say I agree with you. There's so many ways this can go wrong if
> > > > the
> > > > user doesn't call it the right way, or doesn't protect
> > > > concurrent
> > > > list
> > > > iterations with a separate lock (luckily, this is a private
> > > > iterator). I
> > > > mean, it works, so there's certainly a way to get it right, but
> > > > gosh,
> > > > this is so far from the simple API I had hoped for.
> > > >  
> > > > > Also from what I understand from Boris, the extobj
> > > > > list would typically not need the fine-grained locking; only
> > > > > the
> > > > > evict
> > > > > list? 
> > > >
> > > > Right, I'm adding the gpuvm_bo to extobj list in the ioctl
> > > > path, when
> > > > the GEM and VM resvs are held, and I'm deferring the
> > > > drm_gpuvm_bo_put()
> > > > call to a work that's not in the dma-signalling path. This
> > > > being
> > > > said,
> > > > I'm still not comfortable with the
> > > >
> > > > gem = drm_gem_object_get(vm_bo->gem);
> > > > dma_resv_lock(gem->resv);
> > > > drm_gpuvm_bo_put(vm_bo);
> > > > dma_resv_unlock(gem->resv);
> > > > drm_gem_object_put(gem);
> > > >
> > > > dance that's needed to avoid a UAF when the gpuvm_bo is the
> > > > last GEM
> > > > owner, not to mention that drm_gpuva_unlink() calls
> > > > drm_gpuvm_bo_put()
> > > > after making sure the GEM gpuvm_list lock is held, but this
> > > > lock
> > > > might
> > > > differ from the resv lock (custom locking so we can call
> > > > gpuvm_unlink() in the dma-signalling path). So we now have
> > > > paths
> > > > where
> > > > drm_gpuvm_bo_put() are called with the resv lock held, and
> > > > others
> > > > where
> > > > they are not, and that only works because we're relying on the
> > > > the
> > > > fact
> > > > those drm_gpuvm_bo_put() calls won't make the refcount drop to
> > > > zero,
> > > > because the deferred vm_bo_put() work still owns a vm_bo ref. 
> > >
> > > I'm not sure I follow to 100% here, but in the code snippet above
> > > it's
> > > pretty clear to me that it needs to hold an explicit gem object
> > > reference when calling dma_resv_unlock(gem->resv). Each time you
> > > copy a
> > > referenced pointer (here from vm_bo->gem to gem) you need to up
> > > the
> > > refcount unless you make sure (by locks or other means) that the
> > > source
> > > of the copy has a strong refcount and stays alive, so that's no
> > > weird
> > > action to me. Could possibly add a drm_gpuvm_bo_get_gem() to
> > > access the
> > > gem member (and that also takes a refcount) for driver users to
> > > avoid
> > > the potential pitfall.
> >
> > Except this is only needed because of the GEM-resv-must-be-held
> > locking
> > constraint that was added on vm_bo_put(). I mean, the usual way we
> > do
> > object un-referencing is by calling _put() and letting the internal
> > logic undo things when the refcount drops to zero. If the object
> > needs
> > to be removed from some list, it's normally the responsibility of
> > the
> > destruction method to lock the list, remove the object and unlock
> > the
> > list. Now, we have a refcounted object that's referenced by vm_bo,
> > and
> > whose lock needs to be taken when the destruction happens, which
> > leads
> > to this weird dance described above, when, in normal situations,
> > we'd
> > just call drm_gpuvm_bo_put(vm_bo) and let drm_gpuvm do its thing.
> >
> > >
> > > >
> > > > All these tiny details add to the overall complexity of this
> > > > common
> > > > layer, and to me, that's not any better than the
> > > > get_next_vm_bo_from_list() complexity you were complaining
> > > > about
> > > > (might
> > > > be even worth, because this sort of things leak to users).
> > > >
> > > > Having an internal lock partly solves that, in that the locking
> > > > of
> > > > the
> > > > extobj list is now entirely orthogonal to the GEM that's being
> > > > removed
> > > > from this list, and we can lock/unlock internally without
> > > > forcing the
> > > > caller to take weird actions to make sure things don't explode.
> > > > Don't
> > > > get me wrong, I get that this locking overhead is not
> > > > acceptable for
> > > > Xe, but I feel like we're turning drm_gpuvm into a white
> > > > elephant
> > > > that
> > > > only few people will get right. 
> > >
> > > I tend to agree, but to me the big complication comes from the
> > > async
> > > (dma signalling path) state updates.
> >
> > I don't deny updating the VM state from the dma signalling path
> > adds
> > some amount of complexity, but the fact we're trying to use
> > dma_resv
> > locks for everything, including protection of internal datasets
> > doesn't
> > help. Anyway, I think both of us are biased when it comes to
> > judging
> > which approach adds the most complexity :P.
> >
> > Also note that, right now, the only thing I'd like to be able to
> > update
> > from the dma signalling path is the VM mapping tree. Everything
> > else
> > (drm_gpuva_[un]link(), add/remove extobj), we could do outside this
> > path:
> >
> > - for MAP operations, we could call drm_gpuva_link() in the ioctl
> > path
> >   (we'd just need to initialize the drm_gpuva object)
> > - for MAP operations, we're already calling drm_gpuvm_bo_obtain()
> > from
> >   the ioctl path
> > - for UNMAP operations, we could add the drm_gpuva_unlink() call to
> > the
> >   VM op cleanup worker
> >
> > The only problem we'd have is that drm_gpuva_link() needs to be
> > called
> > inside drm_gpuvm_ops::sm_step_remap() when a remap with next/prev
> > !=
> > NULL occurs, otherwise we lose track of these mappings.
> >
> > >
> > > Let's say for example we have a lower level lock for the gem
> > > object's
> > > gpuvm_bo list. Some drivers grab it from the dma fence signalling
> > > path,
> > > other drivers need to access all vm's of a bo to grab their
> > > dma_resv
> > > locks using a WW transaction. There will be problems, although
> > > probably
> > > solveable.
> >
> > To me, the gpuvm extobj vm_bo list is just an internal list and has
> > an
> > internal lock associated. The lock that's protecting the GEM vm_bo
> > list
> > is a bit different in that the driver gets to decide when a vm_bo
> > is
> > inserted/removed by calling drm_gpuvm_[un]link(), and can easily
> > make
> > sure the lock is held when this happens, while the gpuvm internal
> > lists
> > are kinda transparently updated (for instance, the first caller of
> > drm_gpuvm_bo_obtain() adds the vm_bo to the extobj and the last
> > vm_bo
> > owner calling drm_gpuvm_bo_put() removes it from this list, which
> > is
> > certainly not obvious based on the name of these functions).
> >
> > If we want to let drivers iterate over the extobj/evict lists, and
> > assuming they are considered internal lists maintained by the core
> > and
> > protected with an internal lock, we should indeed provide iteration
> > helpers that:
> >
> > 1/ make sure all the necessary external locks are held (VM resv, I
> >    guess)
> > 2/ make sure the internal lock is not held during iteration (the
> > sort
> >    of snapshot list trick you're using for the evict list in Xe)
> >
> > > > > Also it seems that if we are to maintain two modes here, for
> > > > > reasonably clean code we'd need two separate instances of
> > > > > get_next_bo_from_list().
> > > > >
> > > > > For the !RESV_PROTECTED case, perhaps one would want to
> > > > > consider
> > > > > the
> > > > > solution used currently in xe, where the VM maintains two
> > > > > evict
> > > > > lists.
> > > > > One protected by a spinlock and one protected by the VM resv.
> > > > > When
> > > > > the
> > > > > VM resv is locked to begin list traversal, the spinlock is
> > > > > locked
> > > > > *once*
> > > > > and the spinlock-protected list is looped over and copied
> > > > > into the
> > > > > resv
> > > > > protected one. For traversal, the resv protected one is
> > > > > used. 
> > > >
> > > > Oh, so you do have the same sort of trick where you move the
> > > > entire
> > > > list to another list, such that you can let other paths update
> > > > the
> > > > list
> > > > while you're iterating your own snapshot. That's
> > > > interesting... 
> > >
> > > Yes, it's instead of the "evicted" bool suggested here. I thought
> > > the
> > > latter would be simpler. Although that remains to be seen after
> > > all
> > > use-cases are implemented.
> > >
> > > But in general I think the concept of copying from a staging list
> > > to
> > > another with different protection rather than traversing the
> > > first list
> > > and unlocking between items is a good way of solving the locking
> > > inversion problem with minimal overhead. We use it also for O(1)
> > > userptr validation.
> >
> > That's more or less the idea behind get_next_vm_bo_from_list()
> > except
> > it's dequeuing one element at a time, instead of moving all items
> > at
> > once. Note that, if you allow concurrent removal protected only by
> > the
> > spinlock, you still need to take/release this spinlock when
> > iterating
> > over elements of this snapshot list, because all the remover needs
> > to
> > remove an element is the element itself, and it doesn't care in
> > which
> > list it's currently inserted (real or snapshot/staging list), so
> > you'd
> > be iterating over a moving target if you don't protect the
> > iteration
> > with the spinlock.
> >
>

2023-10-03 18:57:49

by Thomas Hellström

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects


On 10/3/23 18:55, Danilo Krummrich wrote:
> It seems like we're mostly aligned on this series, except for the key
> controversy we're discussing for a few versions now: locking of the internal
> lists. Hence, let's just re-iterate the options we have to get this out of the
> way.
>
> (1) The spinlock dance. This basically works for every use case, updating the VA
> space from the IOCTL, from the fence signaling path or anywhere else.
> However, it has the downside of requiring spin_lock() / spin_unlock() for
> *each* list element when locking all external objects and validating all
> evicted objects. Typically, the amount of extobjs and evicted objects
> shouldn't be excessive, but there might be exceptions, e.g. Xe.
>
> (2) The dma-resv lock dance. This is convinient for drivers updating the VA
> space from a VM_BIND ioctl() and is especially efficient if such drivers
> have a huge amount of external and/or evicted objects to manage. However,
> the downsides are that it requires a few tricks in drivers updating the VA
> space from the fence signaling path (e.g. job_run()). Design wise, I'm still
> skeptical that it is a good idea to protect internal data structures with
> external locks in a way that it's not clear to callers that a certain
> function would access one of those resources and hence needs protection.
> E.g. it is counter intuitive that drm_gpuvm_bo_put() would require both the
> dma-resv lock of the corresponding object and the VM's dma-resv lock held.
> (Additionally, there were some concerns from amdgpu regarding flexibility in
> terms of using GPUVM for non-VM_BIND uAPIs and compute, however, AFAICS
> those discussions did not complete and to me it's still unclear why it
> wouldn't work.)
>
> (3) Simply use an internal mutex per list. This adds a tiny (IMHO negligible)
> overhead for drivers updating the VA space from a VM_BIND ioctl(), namely
> a *single* mutex_lock()/mutex_unlock() when locking all external objects
> and validating all evicted objects. And it still requires some tricks for
> drivers updating the VA space from the fence signaling path. However, it's
> as simple as it can be and hence way less error prone as well as
> self-contained and hence easy to use. Additionally, it's flexible in a way
> that we don't have any expections on drivers to already hold certain locks
> that the driver in some situation might not be able to acquire in the first
> place.
>
> (4) Arbitrary combinations of the above. For instance, the current V5 implements
> both (1) and (2) (as either one or the other). But also (1) and (3) (as in
> (1) additionally to (3)) would be an option, where a driver could opt-in for
> the spinlock dance in case it updates the VA space from the fence signaling
> path.
>
> I also considered a few other options as well, however, they don't seem to be
> flexible enough. For instance, as by now we could use SRCU for the external
> object list. However, this falls apart once a driver wants to remove and re-add
> extobjs for the same VM_BO instance. (For the same reason it wouldn't work for
> evicted objects.)
>
> Personally, after seeing the weird implications of (1), (2) and a combination of
> both, I tend to go with (3). Optionally, with an opt-in for (1). The reason for
> the latter is that with (3) the weirdness of (1) by its own mostly disappears.
>
> Please let me know what you think, and, of course, other ideas than the
> mentioned ones above are still welcome.
>
> - Danilo
>
Here are the locking principles Daniel put together and Dave once called
out for us to apply when reviewing DRM code. These were prompted by very
fragile and hard-to-understand locking patterns in the i915 driver, and I
think the xe vm_bind locking design was made with these in mind (not sure
exactly who wrote what, though, so I can't say for sure).

https://blog.ffwll.ch/2022/07/locking-engineering.html
https://blog.ffwll.ch/2022/08/locking-hierarchy.html

At least to me, this motivates using the resv design unless we strictly
need lower-level locks that are taken in the eviction paths or userptr
invalidation paths, but it doesn't rule out spinlocks or lock-dropping
tricks where these are really necessary. It does, however, pretty much
rule out RCU / SRCU from what I can tell.

It also calls for documenting how individual members of structs are
protected whenever possible.

Thanks,
Thomas


2023-10-04 12:58:30

by Danilo Krummrich

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

On 10/3/23 11:11, Thomas Hellström wrote:

<snip>

>>> +
>>> +/**
>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / from the &drm_gpuvms
>>> + * evicted list
>>> + * @vm_bo: the &drm_gpuvm_bo to add or remove
>>> + * @evict: indicates whether the object is evicted
>>> + *
>>> + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms evicted list.
>>> + */
>>> +void
>>> +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
>>> +{
>>> +    struct drm_gem_object *obj = vm_bo->obj;
>>> +
>>> +    dma_resv_assert_held(obj->resv);
>>> +
>>> +    /* Always lock list transactions, even if DRM_GPUVM_RESV_PROTECTED is
>>> +     * set. This is required to protect multiple concurrent calls to
>>> +     * drm_gpuvm_bo_evict() with BOs with different dma_resv.
>>> +     */
>>
>> This doesn't work. The RESV_PROTECTED case requires the evicted flag we discussed before. The list is either protected by the spinlock or the resv. Otherwise a list add could race with a list removal elsewhere.

I think it does, unless I'm missing something, but it might be a bit subtle.

Concurrent drm_gpuvm_bo_evict() are protected by the spinlock. Additionally, when
drm_gpuvm_bo_evict() is called we hold the dma-resv of the corresponding GEM object.

In drm_gpuvm_validate() I assert that we hold *all* dma-resv locks, which implies
that no one can call drm_gpuvm_bo_evict() on any of the VM's objects, and no one can
add a new one and directly call drm_gpuvm_bo_evict() on it either.
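
Spelled out as a sketch (drm_exec boilerplate omitted; exact signatures may
differ):

/* eviction side, e.g. the driver's move/eviction callback; the object's
 * dma-resv is held */
dma_resv_lock(obj->resv, NULL);
drm_gpuvm_bo_evict(vm_bo, true); /* takes the internal evict list spinlock */
dma_resv_unlock(obj->resv);

/* validation side: *all* of the VM's dma-resv locks are held here, hence no
 * concurrent drm_gpuvm_bo_evict() can run on any of the VM's objects */
ret = drm_gpuvm_validate(gpuvm, exec);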

>>
>> Thanks,
>>
>> Thomas
>>
>>
>

2023-10-04 13:36:35

by Danilo Krummrich

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

On 10/3/23 19:37, Thomas Hellström wrote:
> Hi, Danilo
>
> On Tue, 2023-10-03 at 18:55 +0200, Danilo Krummrich wrote:
>> It seems like we're mostly aligned on this series, except for the key
>> controversy we're discussing for a few versions now: locking of the
>> internal
>> lists. Hence, let's just re-iterate the options we have to get this
>> out of the
>> way.
>>
>> (1) The spinlock dance. This basically works for every use case,
>> updating the VA
>>     space from the IOCTL, from the fence signaling path or anywhere
>> else.
>>     However, it has the downside of requiring spin_lock() /
>> spin_unlock() for
>>     *each* list element when locking all external objects and
>> validating all
>>     evicted objects. Typically, the amount of extobjs and evicted
>> objects
>>     shouldn't be excessive, but there might be exceptions, e.g. Xe.
>>
>> (2) The dma-resv lock dance. This is convinient for drivers updating
>> the VA
>>     space from a VM_BIND ioctl() and is especially efficient if such
>> drivers
>>     have a huge amount of external and/or evicted objects to manage.
>> However,
>>     the downsides are that it requires a few tricks in drivers
>> updating the VA
>>     space from the fence signaling path (e.g. job_run()). Design
>> wise, I'm still
>>     skeptical that it is a good idea to protect internal data
>> structures with
>>     external locks in a way that it's not clear to callers that a
>> certain
>>     function would access one of those resources and hence needs
>> protection.
>>     E.g. it is counter intuitive that drm_gpuvm_bo_put() would
>> require both the
>>     dma-resv lock of the corresponding object and the VM's dma-resv
>> lock held.
>>     (Additionally, there were some concerns from amdgpu regarding
>> flexibility in
>>     terms of using GPUVM for non-VM_BIND uAPIs and compute, however,
>> AFAICS
>>     those discussions did not complete and to me it's still unclear
>> why it
>>     wouldn't work.)
>>
>> (3) Simply use an internal mutex per list. This adds a tiny (IMHO
>> negligible)
>>     overhead for drivers updating the VA space from a VM_BIND
>> ioctl(), namely
>>     a *single* mutex_lock()/mutex_unlock() when locking all external
>> objects
>>     and validating all evicted objects. And it still requires some
>> tricks for
>>     drivers updating the VA space from the fence signaling path.
>> However, it's
>>     as simple as it can be and hence way less error prone as well as
>>     self-contained and hence easy to use. Additionally, it's flexible
>> in a way
>>     that we don't have any expections on drivers to already hold
>> certain locks
>>     that the driver in some situation might not be able to acquire in
>> the first
>>     place.
>
> Such an overhead is fully OK IMO, But didn't we conclude at some point
> that using a mutex in this way isn't possible due to the fact that
> validate() needs to be able to lock dma_resv, and then we have
> dma_resv()->mutex->dma_resv()?

Oh, yes. I already forgot about it. I think it would work for protecting the
evicted list. But it breaks with the external object list, because we'd hold
the mutex while acquiring the dma-resv locks. Hence, there'd be a potential
lock inversion when drm_gpuvm_bo_put() is called with the corresponding
dma-resv lock held. Then this option is indeed gone as well, unfortunately.
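
Spelled out, the inversion would look like this (sketch only; the drm_exec
retry loop and error handling are omitted, list/lock names are illustrative):

/* path A: preparing all external objects for a VM_BIND job */
mutex_lock(&gpuvm->extobj_lock);
list_for_each_entry(vm_bo, &gpuvm->extobj_list, list.entry.extobj)
	drm_exec_prepare_obj(exec, vm_bo->obj, num_fences); /* takes obj->resv */
mutex_unlock(&gpuvm->extobj_lock);

/* path B: dropping the last vm_bo reference with the GEM's dma-resv held */
dma_resv_lock(obj->resv, NULL);
drm_gpuvm_bo_put(vm_bo); /* would need to take gpuvm->extobj_lock */
dma_resv_unlock(obj->resv);

Path A acquires dma-resv locks while holding the mutex, path B would acquire
the mutex while holding a dma-resv lock.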

>
>
>>
>> (4) Arbitrary combinations of the above. For instance, the current V5
>> implements
>>     both (1) and (2) (as either one or the other). But also (1) and
>> (3) (as in
>>     (1) additionally to (3)) would be an option, where a driver could
>> opt-in for
>>     the spinlock dance in case it updates the VA space from the fence
>> signaling
>>     path.
>>
>> I also considered a few other options as well, however, they don't
>> seem to be
>> flexible enough. For instance, as by now we could use SRCU for the
>> external
>> object list. However, this falls apart once a driver wants to remove
>> and re-add
>> extobjs for the same VM_BO instance. (For the same reason it wouldn't
>> work for
>> evicted objects.)
>>
>> Personally, after seeing the weird implications of (1), (2) and a
>> combination of
>> both, I tend to go with (3). Optionally, with an opt-in for (1). The
>> reason for
>> the latter is that with (3) the weirdness of (1) by its own mostly
>> disappears.
>>
>> Please let me know what you think, and, of course, other ideas than
>> the
>> mentioned ones above are still welcome.
>
> Personally, after converting xe to version 5, I think it's pretty
> convenient for the driver, (although had to add the evict trick), so I

With the evict trick you mean a drm_gpuvm_bo::evicted field? I don't think we
necessarily need it (see my previous reply). But I agree it'd be a bit
cleaner locking-wise.

My only concern with that is that it would restrict the context in which
the evict list is useful, because it implies that in order to even see the
actual state of the evict list all external objects must be locked first.

What if a driver wants to lock and validate only a certain range of
the VA space? Surely, it can just call validate() for each drm_gpuva's BO,
but depending on the size of the range we might still want to accelerate
it using the evicted list.

Honestly, I don't know if there are drivers with this need, but Christian's
concerns about this way of updating the evict list seemed to go in this
direction.

> think I'd vote for this, even if not currently using the opt-in for
> (1).

Yeah, also due to the lack of other options, I think we need to stick with
what V5 already does, either with or without a drm_gpuvm_bo::evicted field.

Keeping the dma-resv locking scheme, I think we'd want some helpers around
drm_gpuvm_bo_put() for the drm_exec dance that is required for external
objects. Maybe add a drm_gpuvm_bo_put_locked() which can be called with the
dma-resv locks held and let drm_gpuvm_bo_put() do the drm_exec dance?
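
Roughly something like this, as a sketch of the idea only (helper names such
as drm_gpuvm_bo_put_locked(), drm_gpuvm_resv_obj() and the internal release
function are assumptions, error handling omitted):

void drm_gpuvm_bo_put_locked(struct drm_gpuvm_bo *vm_bo)
{
	dma_resv_assert_held(drm_gpuvm_resv(vm_bo->vm));
	dma_resv_assert_held(vm_bo->obj->resv);
	kref_put(&vm_bo->kref, drm_gpuvm_bo_destroy);
}

void drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
{
	struct drm_gem_object *obj = vm_bo->obj;
	struct drm_exec exec;

	drm_gem_object_get(obj);
	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
	drm_exec_until_all_locked(&exec) {
		drm_exec_lock_obj(&exec, drm_gpuvm_resv_obj(vm_bo->vm));
		drm_exec_retry_on_contention(&exec);
		drm_exec_lock_obj(&exec, obj);
		drm_exec_retry_on_contention(&exec);
	}
	drm_gpuvm_bo_put_locked(vm_bo);
	drm_exec_fini(&exec);
	drm_gem_object_put(obj);
}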

>
> /Thomas
>
>
>>
>> - Danilo
>>
>> On Tue, Oct 03, 2023 at 04:21:43PM +0200, Boris Brezillon wrote:
>>> On Tue, 03 Oct 2023 14:25:56 +0200
>>> Thomas Hellström <[email protected]> wrote:
>>>
>>>>>>> +/**
>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>> + * @__local_list: A pointer to the local list used to
>>>>>>> store
>>>>>>> already iterated items
>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>> drm_gpuvm_get_next_cached_vm_bo()
>>>>>>> + *
>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>> Lockless as in, the
>>>>>>> + * iterator releases the lock immediately after picking
>>>>>>> the
>>>>>>> first element from
>>>>>>> + * the list, so list insertion deletion can happen
>>>>>>> concurrently.
>>>>>>> + *
>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>> local
>>>>>>> list, so removal
>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>> iterating
>>>>>>> the list.
>>>>>>> + */
>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>> +        ({ \
>>>>>>> +                struct drm_gpuvm_bo *__vm_bo = NULL; \
>>>>>>> + \
>>>>>>> +                drm_gpuvm_bo_put(__prev_vm_bo); \
>>>>>>> + \
>>>>>>> +                spin_lock(&(__gpuvm)->__list_name.lock); \
>>>>>>
>>>>>> Here we unconditionally take the spinlocks while iterating, and the main
>>>>>> point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?
>>>>>>
>>>>>>> +                if (!(__gpuvm)->__list_name.local_list) \
>>>>>>> +                        (__gpuvm)->__list_name.local_list = __local_list; \
>>>>>>> +                else \
>>>>>>> +                        WARN_ON((__gpuvm)->__list_name.local_list != __local_list); \
>>>>>>> + \
>>>>>>> +                while (!list_empty(&(__gpuvm)->__list_name.list)) { \
>>>>>>> +                        __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>> +                                                   struct drm_gpuvm_bo, \
>>>>>>> +                                                   list.entry.__list_name); \
>>>>>>> +                        if (kref_get_unless_zero(&__vm_bo->kref)) { \
>>>>>>
>>>>>> And unnecessarily grab a reference in the RESV_PROTECTED case.
>>>>>>
>>>>>>> +                                list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>> +                                               __local_list); \
>>>>>>> +                                break; \
>>>>>>> +                        } else { \
>>>>>>> +                                list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>> +                                __vm_bo = NULL; \
>>>>>>> +                        } \
>>>>>>> +                } \
>>>>>>> +                spin_unlock(&(__gpuvm)->__list_name.lock); \
>>>>>>> + \
>>>>>>> +                __vm_bo; \
>>>>>>> +        })
>>>>>>
>>>>>> IMHO this lockless list iteration looks very complex and
>>>>>> should be
>>>>>> pretty difficult to maintain while moving forward, also since
>>>>>> it
>>>>>> pulls
>>>>>> the gpuvm_bos off the list, list iteration needs to be
>>>>>> protected by
>>>>>> an
>>>>>> outer lock anyway.
>>>>>
>>>>> As being partly responsible for this convoluted list iterator,
>>>>> I must
>>>>> say I agree with you. There's so many ways this can go wrong if
>>>>> the
>>>>> user doesn't call it the right way, or doesn't protect
>>>>> concurrent
>>>>> list
>>>>> iterations with a separate lock (luckily, this is a private
>>>>> iterator). I
>>>>> mean, it works, so there's certainly a way to get it right, but
>>>>> gosh,
>>>>> this is so far from the simple API I had hoped for.
>>>>>
>>>>>> Also from what I understand from Boris, the extobj
>>>>>> list would typically not need the fine-grained locking; only
>>>>>> the
>>>>>> evict
>>>>>> list?
>>>>>
>>>>> Right, I'm adding the gpuvm_bo to extobj list in the ioctl
>>>>> path, when
>>>>> the GEM and VM resvs are held, and I'm deferring the
>>>>> drm_gpuvm_bo_put()
>>>>> call to a work that's not in the dma-signalling path. This
>>>>> being
>>>>> said,
>>>>> I'm still not comfortable with the
>>>>>
>>>>> gem = drm_gem_object_get(vm_bo->gem);
>>>>> dma_resv_lock(gem->resv);
>>>>> drm_gpuvm_bo_put(vm_bo);
>>>>> dma_resv_unlock(gem->resv);
>>>>> drm_gem_object_put(gem);
>>>>>
>>>>> dance that's needed to avoid a UAF when the gpuvm_bo is the
>>>>> last GEM
>>>>> owner, not to mention that drm_gpuva_unlink() calls
>>>>> drm_gpuvm_bo_put()
>>>>> after making sure the GEM gpuvm_list lock is held, but this
>>>>> lock
>>>>> might
>>>>> differ from the resv lock (custom locking so we can call
>>>>> gpuvm_unlink() in the dma-signalling path). So we now have
>>>>> paths
>>>>> where
>>>>> drm_gpuvm_bo_put() are called with the resv lock held, and
>>>>> others
>>>>> where
>>>>> they are not, and that only works because we're relying on the
>>>>> the
>>>>> fact
>>>>> those drm_gpuvm_bo_put() calls won't make the refcount drop to
>>>>> zero,
>>>>> because the deferred vm_bo_put() work still owns a vm_bo ref.
>>>>
>>>> I'm not sure I follow to 100% here, but in the code snippet above
>>>> it's
>>>> pretty clear to me that it needs to hold an explicit gem object
>>>> reference when calling dma_resv_unlock(gem->resv). Each time you
>>>> copy a
>>>> referenced pointer (here from vm_bo->gem to gem) you need to up
>>>> the
>>>> refcount unless you make sure (by locks or other means) that the
>>>> source
>>>> of the copy has a strong refcount and stays alive, so that's no
>>>> weird
>>>> action to me. Could possibly add a drm_gpuvm_bo_get_gem() to
>>>> access the
>>>> gem member (and that also takes a refcount) for driver users to
>>>> avoid
>>>> the potential pitfall.
>>>
>>> Except this is only needed because of the GEM-resv-must-be-held locking
>>> constraint that was added on vm_bo_put(). I mean, the usual way we do
>>> object un-referencing is by calling _put() and letting the internal
>>> logic undo things when the refcount drops to zero. If the object needs
>>> to be removed from some list, it's normally the responsibility of the
>>> destruction method to lock the list, remove the object and unlock the
>>> list. Now, we have a refcounted object that's referenced by vm_bo, and
>>> whose lock needs to be taken when the destruction happens, which leads
>>> to this weird dance described above, when, in normal situations, we'd
>>> just call drm_gpuvm_bo_put(vm_bo) and let drm_gpuvm do its thing.
>>>
>>>>
>>>>>
>>>>> All these tiny details add to the overall complexity of this common
>>>>> layer, and to me, that's not any better than the
>>>>> get_next_vm_bo_from_list() complexity you were complaining about
>>>>> (might be even worse, because this sort of thing leaks to users).
>>>>>
>>>>> Having an internal lock partly solves that, in that the locking of the
>>>>> extobj list is now entirely orthogonal to the GEM that's being removed
>>>>> from this list, and we can lock/unlock internally without forcing the
>>>>> caller to take weird actions to make sure things don't explode. Don't
>>>>> get me wrong, I get that this locking overhead is not acceptable for
>>>>> Xe, but I feel like we're turning drm_gpuvm into a white elephant that
>>>>> only a few people will get right.
>>>>
>>>> I tend to agree, but to me the big complication comes from the async
>>>> (dma signalling path) state updates.
>>>
>>> I don't deny updating the VM state from the dma signalling path adds
>>> some amount of complexity, but the fact we're trying to use dma_resv
>>> locks for everything, including protection of internal datasets, doesn't
>>> help. Anyway, I think both of us are biased when it comes to judging
>>> which approach adds the most complexity :P.
>>>
>>> Also note that, right now, the only thing I'd like to be able to update
>>> from the dma signalling path is the VM mapping tree. Everything else
>>> (drm_gpuva_[un]link(), add/remove extobj), we could do outside this
>>> path:
>>>
>>> - for MAP operations, we could call drm_gpuva_link() in the ioctl path
>>>   (we'd just need to initialize the drm_gpuva object)
>>> - for MAP operations, we're already calling drm_gpuvm_bo_obtain() from
>>>   the ioctl path
>>> - for UNMAP operations, we could add the drm_gpuva_unlink() call to the
>>>   VM op cleanup worker
>>>
>>> The only problem we'd have is that drm_gpuva_link() needs to be called
>>> inside drm_gpuvm_ops::sm_step_remap() when a remap with next/prev !=
>>> NULL occurs, otherwise we lose track of these mappings.
>>>
>>>>
>>>> Let's say for example we have a lower level lock for the gem object's
>>>> gpuvm_bo list. Some drivers grab it from the dma fence signalling path,
>>>> other drivers need to access all vm's of a bo to grab their dma_resv
>>>> locks using a WW transaction. There will be problems, although probably
>>>> solvable.
>>>
>>> To me, the gpuvm extobj vm_bo list is just an internal list and has an
>>> internal lock associated. The lock that's protecting the GEM vm_bo list
>>> is a bit different in that the driver gets to decide when a vm_bo is
>>> inserted/removed by calling drm_gpuvm_[un]link(), and can easily make
>>> sure the lock is held when this happens, while the gpuvm internal lists
>>> are kinda transparently updated (for instance, the first caller of
>>> drm_gpuvm_bo_obtain() adds the vm_bo to the extobj and the last vm_bo
>>> owner calling drm_gpuvm_bo_put() removes it from this list, which is
>>> certainly not obvious based on the name of these functions).
>>>
>>> If we want to let drivers iterate over the extobj/evict lists, and
>>> assuming they are considered internal lists maintained by the core and
>>> protected with an internal lock, we should indeed provide iteration
>>> helpers that:
>>>
>>> 1/ make sure all the necessary external locks are held (VM resv, I
>>>    guess)
>>> 2/ make sure the internal lock is not held during iteration (the sort
>>>    of snapshot list trick you're using for the evict list in Xe)
>>>
>>>>>> Also it seems that if we are to maintain two modes here, for
>>>>>> reasonably clean code we'd need two separate instances of
>>>>>> get_next_bo_from_list().
>>>>>>
>>>>>> For the !RESV_PROTECTED case, perhaps one would want to consider the
>>>>>> solution used currently in xe, where the VM maintains two evict
>>>>>> lists. One protected by a spinlock and one protected by the VM resv.
>>>>>> When the VM resv is locked to begin list traversal, the spinlock is
>>>>>> locked *once* and the spinlock-protected list is looped over and
>>>>>> copied into the resv protected one. For traversal, the resv protected
>>>>>> one is used.
>>>>>
>>>>> Oh, so you do have the same sort of trick where you move the entire
>>>>> list to another list, such that you can let other paths update the
>>>>> list while you're iterating your own snapshot. That's interesting...
>>>>
>>>> Yes, it's instead of the "evicted" bool suggested here. I thought the
>>>> latter would be simpler. Although that remains to be seen after all
>>>> use-cases are implemented.
>>>>
>>>> But in general I think the concept of copying from a staging list to
>>>> another with different protection rather than traversing the first list
>>>> and unlocking between items is a good way of solving the locking
>>>> inversion problem with minimal overhead. We use it also for O(1)
>>>> userptr validation.
>>>
>>> That's more or less the idea behind get_next_vm_bo_from_list() except
>>> it's dequeuing one element at a time, instead of moving all items at
>>> once. Note that, if you allow concurrent removal protected only by the
>>> spinlock, you still need to take/release this spinlock when iterating
>>> over elements of this snapshot list, because all the remover needs to
>>> remove an element is the element itself, and it doesn't care in which
>>> list it's currently inserted (real or snapshot/staging list), so you'd
>>> be iterating over a moving target if you don't protect the iteration
>>> with the spinlock.
>>>
>>
>
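
(A minimal sketch of the "copy under the spinlock, iterate the copy" pattern
discussed above. All names, i.e. struct my_vm_evict and the my_vm_* helper,
are hypothetical assumptions; this is neither the actual Xe nor the GPUVM
implementation.)

#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical per-VM eviction bookkeeping; not part of the GPUVM API. */
struct my_vm_evict {
	spinlock_t lock;		/* protects 'list' */
	struct list_head list;		/* filled from the eviction path */
	struct list_head staging;	/* protected by the VM resv */
};

/*
 * Caller holds the VM resv. The spinlock is taken exactly once to move
 * everything onto the resv-protected staging list, which is then walked
 * without the spinlock. As noted above, this only works if nothing removes
 * elements from the staging list concurrently; otherwise the iteration
 * itself needs the spinlock as well.
 */
static void my_vm_validate_evicted(struct my_vm_evict *evict)
{
	struct list_head *entry, *tmp;

	spin_lock(&evict->lock);
	list_splice_init(&evict->list, &evict->staging);
	spin_unlock(&evict->lock);

	list_for_each_safe(entry, tmp, &evict->staging) {
		/* validate the BO backing this entry */
	}
}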

2023-10-04 15:30:05

by Thomas Hellström

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects


On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote:
> On 10/3/23 11:11, Thomas Hellström wrote:
>
> <snip>
>
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to /
> > > > from the &drm_gpuvms
> > > > + * evicted list
> > > > + * @vm_bo: the &drm_gpuvm_bo to add or remove
> > > > + * @evict: indicates whether the object is evicted
> > > > + *
> > > > + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms
> > > > evicted list.
> > > > + */
> > > > +void
> > > > +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
> > > > +{
> > > > +    struct drm_gem_object *obj = vm_bo->obj;
> > > > +
> > > > +    dma_resv_assert_held(obj->resv);
> > > > +
> > > > +    /* Always lock list transactions, even if
> > > > DRM_GPUVM_RESV_PROTECTED is
> > > > +     * set. This is required to protect multiple concurrent
> > > > calls to
> > > > +     * drm_gpuvm_bo_evict() with BOs with different dma_resv.
> > > > +     */
> > >
> > > This doesn't work. The RESV_PROTECTED case requires the evicted
> > > flag we discussed before. The list is either protected by the
> > > spinlock or the resv. Otherwise a list add could race with a list
> > > removal elsewhere.
>
> I think it does unless I miss something, but it might be a bit subtle
> though.
>
> Concurrent drm_gpuvm_bo_evict() are protected by the spinlock.
> Additionally, when
> drm_gpuvm_bo_evict() is called we hold the dma-resv of the
> corresponding GEM object.
>
> In drm_gpuvm_validate() I assert that we hold *all* dma-resv, which
> implies that no
> one can call drm_gpuvm_bo_evict() on any of the VM's objects and no
> one can add a new
> one and directly call drm_gpuvm_bo_evict() on it either.

But translated into how the data (the list in this case) is protected
it becomes

"Either the spinlock and the bo resv of a single list item OR the bo
resvs of all bos that can potentially be on the list",

while this is certainly possible to assert, any new / future code that
manipulates the evict list will probably get this wrong and as a result
the code becomes pretty fragile. I think drm_gpuvm_bo_destroy() already
gets it wrong in that it, while holding a single resv, doesn't take the
spinlock.

So I think that needs fixing, and if keeping that protection I think it
needs to be documented with the list member and ideally an assert. But
also note that lockdep_assert_held will typically give false positives for
dma_resv locks; as long as the first dma_resv lock locked in a drm_exec
sequence remains locked, lockdep thinks *all* dma_resv locks are held
(or something along those lines), so the resv lockdep asserts are
currently pretty useless.

/Thomas
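
(For illustration only, the kind of documentation plus assert asked for above
could look roughly like the sketch below. The helper name is made up;
drm_gpuvm_resv_protected() and the evict list's spinlock come from the patch
under review, and the dma_resv branch is subject to the lockdep caveat just
described.)

#include <drm/drm_gpuvm.h>
#include <linux/dma-resv.h>
#include <linux/lockdep.h>

/* Sketch of an assert documenting how the evict list is protected. */
static void my_assert_evict_list_protected(struct drm_gpuvm *gpuvm,
					   struct drm_gem_object *obj)
{
	if (drm_gpuvm_resv_protected(gpuvm))
		/* Subject to the drm_exec lockdep caveat described above. */
		dma_resv_assert_held(obj->resv);
	else
		lockdep_assert_held(&gpuvm->evict.lock);
}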



>
> > >
> > > Thanks,
> > >
> > > Thomas
> > >
> > >
> >
>

2023-10-04 17:18:31

by Danilo Krummrich

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

On 10/4/23 17:29, Thomas Hellström wrote:
>
> On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote:
>> On 10/3/23 11:11, Thomas Hellström wrote:
>>
>> <snip>
>>
>>>>> +
>>>>> +/**
>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to /
>>>>> from the &drm_gpuvms
>>>>> + * evicted list
>>>>> + * @vm_bo: the &drm_gpuvm_bo to add or remove
>>>>> + * @evict: indicates whether the object is evicted
>>>>> + *
>>>>> + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms
>>>>> evicted list.
>>>>> + */
>>>>> +void
>>>>> +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
>>>>> +{
>>>>> +    struct drm_gem_object *obj = vm_bo->obj;
>>>>> +
>>>>> +    dma_resv_assert_held(obj->resv);
>>>>> +
>>>>> +    /* Always lock list transactions, even if
>>>>> DRM_GPUVM_RESV_PROTECTED is
>>>>> +     * set. This is required to protect multiple concurrent
>>>>> calls to
>>>>> +     * drm_gpuvm_bo_evict() with BOs with different dma_resv.
>>>>> +     */
>>>>
>>>> This doesn't work. The RESV_PROTECTED case requires the evicted
>>>> flag we discussed before. The list is either protected by the
>>>> spinlock or the resv. Otherwise a list add could race with a list
>>>> removal elsewhere.
>>
>> I think it does unless I miss something, but it might be a bit subtle
>> though.
>>
>> Concurrent drm_gpuvm_bo_evict() are protected by the spinlock.
>> Additionally, when
>> drm_gpuvm_bo_evict() is called we hold the dma-resv of the
>> corresponding GEM object.
>>
>> In drm_gpuvm_validate() I assert that we hold *all* dma-resv, which
>> implies that no
>> one can call drm_gpuvm_bo_evict() on any of the VM's objects and no
>> one can add a new
>> one and directly call drm_gpuvm_bo_evict() on it either.
>
> But translated into how the data (the list in this case) is protected
> it becomes
>
> "Either the spinlock and the bo resv of a single list item OR the bo
> resvs of all bos that can potentially be on the list",
>
> while this is certainly possible to assert, any new / future code that
> manipulates the evict list will probably get this wrong and as a result
> the code becomes pretty fragile. I think drm_gpuvm_bo_destroy() already
> gets it wrong in that it, while holding a single resv, doesn't take the
> spinlock.

That's true and I don't like it either. Unfortunately, with the dma-resv
locking scheme we can't really protect the evict list properly without the
drm_gpuvm_bo::evicted trick.

But as pointed out in my other reply, I'm a bit worried about the
drm_gpuvm_bo::evicted trick being too restrictive, but maybe it's fine
doing it in the RESV_PROTECTED case.

>
> So I think that needs fixing, and if keeping that protection I think it
> needs to be documented with the list member and ideally an assert. But
> also note that lockdep_assert_held will typically give false true for
> dma_resv locks; as long as the first dma_resv lock locked in a drm_exec
> sequence remains locked, lockdep thinks *all* dma_resv locks are held.
> (or something along those lines), so the resv lockdep asserts are
> currently pretty useless.
>
> /Thomas
>
>
>
>>
>>>>
>>>> Thanks,
>>>>
>>>> Thomas
>>>>
>>>>
>>>
>>
>

2023-10-04 17:57:59

by Thomas Hellström

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

On Wed, 2023-10-04 at 19:17 +0200, Danilo Krummrich wrote:
> On 10/4/23 17:29, Thomas Hellström wrote:
> >
> > On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote:
> > > On 10/3/23 11:11, Thomas Hellström wrote:
> > >
> > > <snip>
> > >
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to
> > > > > > /
> > > > > > from the &drm_gpuvms
> > > > > > + * evicted list
> > > > > > + * @vm_bo: the &drm_gpuvm_bo to add or remove
> > > > > > + * @evict: indicates whether the object is evicted
> > > > > > + *
> > > > > > + * Adds a &drm_gpuvm_bo to or removes it from the
> > > > > > &drm_gpuvms
> > > > > > evicted list.
> > > > > > + */
> > > > > > +void
> > > > > > +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
> > > > > > +{
> > > > > > +    struct drm_gem_object *obj = vm_bo->obj;
> > > > > > +
> > > > > > +    dma_resv_assert_held(obj->resv);
> > > > > > +
> > > > > > +    /* Always lock list transactions, even if
> > > > > > DRM_GPUVM_RESV_PROTECTED is
> > > > > > +     * set. This is required to protect multiple
> > > > > > concurrent
> > > > > > calls to
> > > > > > +     * drm_gpuvm_bo_evict() with BOs with different
> > > > > > dma_resv.
> > > > > > +     */
> > > > >
> > > > > This doesn't work. The RESV_PROTECTED case requires the
> > > > > evicted
> > > > > flag we discussed before. The list is either protected by the
> > > > > spinlock or the resv. Otherwise a list add could race with a
> > > > > list
> > > > > removal elsewhere.
> > >
> > > I think it does unless I miss something, but it might be a bit
> > > subtle
> > > though.
> > >
> > > Concurrent drm_gpuvm_bo_evict() are protected by the spinlock.
> > > Additionally, when
> > > drm_gpuvm_bo_evict() is called we hold the dma-resv of the
> > > corresponding GEM object.
> > >
> > > In drm_gpuvm_validate() I assert that we hold *all* dma-resv,
> > > which
> > > implies that no
> > > one can call drm_gpuvm_bo_evict() on any of the VM's objects and
> > > no
> > > one can add a new
> > > one and directly call drm_gpuvm_bo_evict() on it either.
> >
> > But translated into how the data (the list in this case) is
> > protected
> > it becomes
> >
> > "Either the spinlock and the bo resv of a single list item OR the
> > bo
> > resvs of all bos that can potentially be on the list",
> >
> > while this is certainly possible to assert, any new / future code
> > that
> > manipulates the evict list will probably get this wrong and as a
> > result
> > the code becomes pretty fragile. I think drm_gpuvm_bo_destroy()
> > already
> > gets it wrong in that it, while holding a single resv, doesn't take
> > the
> > spinlock.
>
> That's true and I don't like it either. Unfortunately, with the dma-
> resv
> locking scheme we can't really protect the evict list without the
> drm_gpuvm_bo::evicted trick properly.
>
> But as pointed out in my other reply, I'm a bit worried about the
> drm_gpuvm_bo::evicted trick being too restrictive, but maybe it's
> fine
> doing it in the RESV_PROTECTED case.

Ah, indeed. I misread that as discussing the current code rather than
the drm_gpuvm_bo::evicted trick. If validating only a subset, or a
range, then the drm_gpuvm_bo::evicted trick would be valid only
for that subset.

But the current code would break because the condition of locking "the
resvs of all bos that can potentially be on the list" doesn't hold
anymore, and you'd get list corruption.

What *would* work, though, is the solution currently in xe: the
original evict list, and a staging evict list whose items are copied
over on validation. The staging evict list would be protected by the
spinlock, the original evict list by the resv, and they'd use separate
list heads in the drm_gpuvm_bo, but that is yet another complication.

But I think if this becomes an issue, those VMs (perhaps OpenGL UMD
VMs) only wanting to validate a subset, would simply initially rely on
the current non-RESV solution. It looks like it's only a matter of
flipping the flag on a per-vm basis.

/Thomas
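
(A rough sketch of the "separate list heads" variant just described. The
structures and helper below use hypothetical names and are not the actual Xe
or GPUVM code.)

#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical layout; both list heads are assumed to be initialized
 * with INIT_LIST_HEAD() at creation time.
 */
struct my_vm {
	spinlock_t evict_lock;
	struct list_head evict_staging;	/* protected by evict_lock */
	struct list_head evict_list;	/* protected by the VM resv */
};

struct my_vm_bo {
	struct list_head evict_staging_link;	/* protected by evict_lock */
	struct list_head evict_link;		/* protected by the VM resv */
};

/* Called with the VM resv held; drains the staging list exactly once. */
static void my_vm_collect_evicted(struct my_vm *vm)
{
	struct my_vm_bo *vm_bo, *next;

	spin_lock(&vm->evict_lock);
	list_for_each_entry_safe(vm_bo, next, &vm->evict_staging,
				 evict_staging_link) {
		list_del_init(&vm_bo->evict_staging_link);
		list_move_tail(&vm_bo->evict_link, &vm->evict_list);
	}
	spin_unlock(&vm->evict_lock);
}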


>
> >
> > So I think that needs fixing, and if keeping that protection I
> > think it
> > needs to be documented with the list member and ideally an assert.
> > But
> > also note that lockdep_assert_held will typically give false true
> > for
> > dma_resv locks; as long as the first dma_resv lock locked in a
> > drm_exec
> > sequence  remains locked, lockdep thinks *all* dma_resv locks are
> > held.
> > (or something along those lines), so the resv lockdep asserts are
> > currently pretty useless.
> >
> > /Thomas
> >
> >
> >
> > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Thomas
> > > > >
> > > > >
> > > >
> > >
> >
>

2023-10-04 18:25:18

by Danilo Krummrich

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects



On 10/4/23 19:57, Thomas Hellström wrote:
> On Wed, 2023-10-04 at 19:17 +0200, Danilo Krummrich wrote:
>> On 10/4/23 17:29, Thomas Hellström wrote:
>>>
>>> On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote:
>>>> On 10/3/23 11:11, Thomas Hellström wrote:
>>>>
>>>> <snip>
>>>>
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to
>>>>>>> /
>>>>>>> from the &drm_gpuvms
>>>>>>> + * evicted list
>>>>>>> + * @vm_bo: the &drm_gpuvm_bo to add or remove
>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>> + *
>>>>>>> + * Adds a &drm_gpuvm_bo to or removes it from the
>>>>>>> &drm_gpuvms
>>>>>>> evicted list.
>>>>>>> + */
>>>>>>> +void
>>>>>>> +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
>>>>>>> +{
>>>>>>> +    struct drm_gem_object *obj = vm_bo->obj;
>>>>>>> +
>>>>>>> +    dma_resv_assert_held(obj->resv);
>>>>>>> +
>>>>>>> +    /* Always lock list transactions, even if
>>>>>>> DRM_GPUVM_RESV_PROTECTED is
>>>>>>> +     * set. This is required to protect multiple
>>>>>>> concurrent
>>>>>>> calls to
>>>>>>> +     * drm_gpuvm_bo_evict() with BOs with different
>>>>>>> dma_resv.
>>>>>>> +     */
>>>>>>
>>>>>> This doesn't work. The RESV_PROTECTED case requires the
>>>>>> evicted
>>>>>> flag we discussed before. The list is either protected by the
>>>>>> spinlock or the resv. Otherwise a list add could race with a
>>>>>> list
>>>>>> removal elsewhere.
>>>>
>>>> I think it does unless I miss something, but it might be a bit
>>>> subtle
>>>> though.
>>>>
>>>> Concurrent drm_gpuvm_bo_evict() are protected by the spinlock.
>>>> Additionally, when
>>>> drm_gpuvm_bo_evict() is called we hold the dma-resv of the
>>>> corresponding GEM object.
>>>>
>>>> In drm_gpuvm_validate() I assert that we hold *all* dma-resv,
>>>> which
>>>> implies that no
>>>> one can call drm_gpuvm_bo_evict() on any of the VM's objects and
>>>> no
>>>> one can add a new
>>>> one and directly call drm_gpuvm_bo_evict() on it either.
>>>
>>> But translated into how the data (the list in this case) is
>>> protected
>>> it becomes
>>>
>>> "Either the spinlock and the bo resv of a single list item OR the
>>> bo
>>> resvs of all bos that can potentially be on the list",
>>>
>>> while this is certainly possible to assert, any new / future code
>>> that
>>> manipulates the evict list will probably get this wrong and as a
>>> result
>>> the code becomes pretty fragile. I think drm_gpuvm_bo_destroy()
>>> already
>>> gets it wrong in that it, while holding a single resv, doesn't take
>>> the
>>> spinlock.
>>
>> That's true and I don't like it either. Unfortunately, with the dma-
>> resv
>> locking scheme we can't really protect the evict list without the
>> drm_gpuvm_bo::evicted trick properly.
>>
>> But as pointed out in my other reply, I'm a bit worried about the
>> drm_gpuvm_bo::evicted trick being too restrictive, but maybe it's
>> fine
>> doing it in the RESV_PROTECTED case.
>
> Ah, indeed. I misread that as discussing the current code rather than
> the drm_gpuvm_bo::evicted trick. If validating only a subset, or a
> range, then with the drm_gpuvm_bo::evicted trick would be valid only
> for that subset.
>
> But the current code would break because the condition of locking "the
> resvs of all bos that can potentially be on the list" doesn't hold
> anymore, and you'd get list corruption.
>
> What *would* work, though, is the solution currently in xe, The
> original evict list, and a staging evict list whose items are copied
> over on validation. The staging evict list being protected by the
> spinlock, the original evict list by the resv, and they'd use separate
> list heads in the drm_gpuvm_bo, but that is yet another complication.
>
> But I think if this becomes an issue, those VMs (perhaps OpenGL UMD
> VMs) only wanting to validate a subset, would simply initially rely on
> the current non-RESV solution. It looks like it's only a matter of
> flipping the flag on a per-vm basis.

If such a driver locks a range it can also just validate all locked
objects I guess.

And for everything else, we still have the spinlock protected variant,
where drivers can freely move things around by just taking the spinlock.

I think I will go ahead and add drm_gpuvm_bo::evicted, plus the helpers
I mentioned.
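
(A rough sketch of what the drm_gpuvm_bo::evicted idea could boil down to for
the DRM_GPUVM_RESV_PROTECTED case. The field and helper names below are
assumptions, not the final API.)

#include <linux/list.h>
#include <linux/types.h>

/* Assumed vm_bo fields; list_evict is initialized at vm_bo creation. */
struct my_vm_bo {
	struct list_head list_evict;	/* protected by the VM resv */
	bool evicted;			/* protected by the BO's dma-resv */
};

/* Eviction path: only the BO's dma-resv is held, so just flip the flag. */
static void my_vm_bo_set_evicted(struct my_vm_bo *vm_bo, bool evict)
{
	vm_bo->evicted = evict;
}

/* Validation path: all dma-resvs are held, so the list can be synced. */
static void my_vm_bo_sync_evicted(struct my_vm_bo *vm_bo,
				  struct list_head *evict_list)
{
	if (vm_bo->evicted)
		list_move_tail(&vm_bo->list_evict, evict_list);
	else
		list_del_init(&vm_bo->list_evict);
}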

>
> /Thomas
>
>
>>
>>>
>>> So I think that needs fixing, and if keeping that protection I
>>> think it
>>> needs to be documented with the list member and ideally an assert.
>>> But
>>> also note that lockdep_assert_held will typically give false true
>>> for
>>> dma_resv locks; as long as the first dma_resv lock locked in a
>>> drm_exec
>>> sequence  remains locked, lockdep thinks *all* dma_resv locks are
>>> held.
>>> (or something along those lines), so the resv lockdep asserts are
>>> currently pretty useless.
>>>
>>> /Thomas
>>>
>>>
>>>
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

2023-10-05 13:59:21

by Thomas Hellström

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects


On 9/28/23 21:16, Danilo Krummrich wrote:
> Currently the DRM GPUVM offers common infrastructure to track GPU VA
> allocations and mappings, generically connect GPU VA mappings to their
> backing buffers and perform more complex mapping operations on the GPU VA
> space.
>
> However, there are more design patterns commonly used by drivers, which
> can potentially be generalized in order to make the DRM GPUVM represent
> a basis for GPU-VM implementations. In this context, this patch aims
> at generalizing the following elements.
>
> 1) Provide a common dma-resv for GEM objects not being used outside of
> this GPU-VM.
>
> 2) Provide tracking of external GEM objects (GEM objects which are
> shared with other GPU-VMs).
>
> 3) Provide functions to efficiently lock all GEM objects dma-resv the
> GPU-VM contains mappings of.
>
> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
> of, such that validation of evicted GEM objects is accelerated.
>
> 5) Provide some convinience functions for common patterns.
>
> Big thanks to Boris Brezillon for his help to figure out locking for
> drivers updating the GPU VA space within the fence signalling path.
>
> Suggested-by: Matthew Brost <[email protected]>
> Signed-off-by: Danilo Krummrich <[email protected]>
> ---
> drivers/gpu/drm/drm_gpuvm.c | 642 ++++++++++++++++++++++++++++++++++++
> include/drm/drm_gpuvm.h | 240 ++++++++++++++
> 2 files changed, 882 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 27100423154b..770bb3d68d1f 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -82,6 +82,21 @@
> * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
> * particular combination. If not existent a new instance is created and linked
> * to the &drm_gem_object.
> + *
> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
> + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
> + * list are maintained in order to accelerate locking of dma-resv locks and
> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
> + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
> + * additional &drm_gem_objects by providing the corresponding parameters to
> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
> + * use of helper functions such as drm_gpuvm_prepare_range() or
> + * drm_gpuvm_prepare_objects().
> + *
> + * Every bound &drm_gem_object is treated as external object when its &dma_resv
> + * structure is different than the &drm_gpuvm's common &dma_resv structure.
> */
>
> /**
> @@ -429,6 +444,20 @@
> * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
> * &drm_gem_object must be able to observe previous creations and destructions
> * of &drm_gpuvm_bos in order to keep instances unique.
> + *
> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
> + * protected against concurrent insertion / removal and iteration internally.
> + *
> + * However, drivers still need to ensure protection of concurrent calls to functions
> + * iterating those lists, namely drm_gpuvm_prepare_objects() and
> + * drm_gpuvm_validate().
> + *
> + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate
> + * that the corresponding &dma_resv locks are held in order to protect the
> + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and
> + * the corresponding lockdep checks are enabled. This is an optimization for
> + * drivers which are capable of taking the corresponding &dma_resv locks and
> + * hence do not require internal locking.
> */
>
> /**
> @@ -641,6 +670,195 @@
> * }
> */
>
> +/**
> + * get_next_vm_bo_from_list() - get the next vm_bo element
> + * @__gpuvm: The GPU VM
> + * @__list_name: The name of the list we're iterating on
> + * @__local_list: A pointer to the local list used to store already iterated items
> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from
> + * the list, so list insertion deletion can happen concurrently.
> + *
> + * Elements popped from the original list are kept in a local list, so removal
> + * and is_empty checks can still happen while we're iterating the list.
> + */
> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
> + ({ \
> + struct drm_gpuvm_bo *__vm_bo = NULL; \
> + \
> + drm_gpuvm_bo_put(__prev_vm_bo); \
> + \
> + spin_lock(&(__gpuvm)->__list_name.lock); \
> + if (!(__gpuvm)->__list_name.local_list) \
> + (__gpuvm)->__list_name.local_list = __local_list; \
> + else \
> + WARN_ON((__gpuvm)->__list_name.local_list != __local_list); \
> + \
> + while (!list_empty(&(__gpuvm)->__list_name.list)) { \
> + __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
> + struct drm_gpuvm_bo, \
> + list.entry.__list_name); \
> + if (kref_get_unless_zero(&__vm_bo->kref)) { \
> + list_move_tail(&(__vm_bo)->list.entry.__list_name, \
> + __local_list); \
> + break; \
> + } else { \
> + list_del_init(&(__vm_bo)->list.entry.__list_name); \
> + __vm_bo = NULL; \
> + } \
> + } \
> + spin_unlock(&(__gpuvm)->__list_name.lock); \
> + \
> + __vm_bo; \
> + })
> +
> +/**
> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from the
> + * list, hence list insertion and deletion can happen concurrently.
> + *
> + * It is not allowed to re-assign the vm_bo pointer from inside this loop.
> + *
> + * Typical use:
> + *
> + * struct drm_gpuvm_bo *vm_bo;
> + * LIST_HEAD(my_local_list);
> + *
> + * ret = 0;
> + * for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
> + * ret = do_something_with_vm_bo(..., vm_bo);
> + * if (ret)
> + * break;
> + * }
> + * drm_gpuvm_bo_put(vm_bo);
> + * restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
> + *
> + *
> + * Only used for internal list iterations, not meant to be exposed to the outside
> + * world.
> + */
> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
> + for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
> + __local_list, NULL); \
> + __vm_bo; \
> + __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
> + __local_list, __vm_bo))
> +
> +static inline void
> +__restore_vm_bo_list(struct drm_gpuvm *gpuvm, spinlock_t *lock,
> + struct list_head *list, struct list_head **local_list)
s/static inline void/static void/?  In .c files, the compiler is
typically trusted to inline where needed.

/Thomas


2023-10-05 14:38:17

by Thomas Hellström

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 0/6] [RFC] DRM GPUVM features

Hi, Danilo

On 9/28/23 21:16, Danilo Krummrich wrote:
> Currently GPUVM offers common infrastructure to track GPU VA allocations
> and mappings, generically connect GPU VA mappings to their backing
> buffers and perform more complex mapping operations on the GPU VA space.
>
> However, there are more design patterns commonly used by drivers, which
> can potentially be generalized in order to make GPUVM represent the
> basis of a VM implementation. In this context, this patch series aims at
> generalizing the following elements.
>
> 1) Provide a common dma-resv for GEM objects not being used outside of
> this GPU-VM.
>
> 2) Provide tracking of external GEM objects (GEM objects which are
> shared with other GPU-VMs).
>
> 3) Provide functions to efficiently lock all GEM objects dma-resv the
> GPU-VM contains mappings of.
>
> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
> of, such that validation of evicted GEM objects is accelerated.
>
> 5) Provide some convinience functions for common patterns.
>
> The implementation introduces struct drm_gpuvm_bo, which serves as abstraction
> combining a struct drm_gpuvm and struct drm_gem_object, similar to what
> amdgpu does with struct amdgpu_bo_vm. While this adds a bit of complexity it
> improves the efficiency of tracking external and evicted GEM objects.
>
> This patch series is also available at [3].
>
> [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/commits/gpuvm-next
>
> Changes in V2:
> ==============
> - rename 'drm_gpuva_manager' -> 'drm_gpuvm' which generally leads to more
> consistent naming
> - properly separate commits (introduce common dma-resv, drm_gpuvm_bo
> abstraction, etc.)
> - remove maple tree for tracking external objects, use a list drm_gpuvm_bos
> per drm_gpuvm instead
> - rework dma-resv locking helpers (Thomas)
> - add a locking helper for a given range of the VA space (Christian)
> - make the GPUVA manager buildable as module, rather than drm_exec
> builtin (Christian)
>
> Changes in V3:
> ==============
> - rename missing function and files (Boris)
> - warn if vm_obj->obj != obj in drm_gpuva_link() (Boris)
> - don't expose drm_gpuvm_bo_destroy() (Boris)
> - unlink VM_BO from GEM in drm_gpuvm_bo_destroy() rather than
> drm_gpuva_unlink() and link within drm_gpuvm_bo_obtain() to keep
> drm_gpuvm_bo instances unique
> - add internal locking to external and evicted object lists to support drivers
> updating the VA space from within the fence signalling critical path (Boris)
> - unlink external objects and evicted objects from the GPUVM's list in
> drm_gpuvm_bo_destroy()
> - add more documentation and fix some kernel doc issues
>
> Changes in V4:
> ==============
> - add a drm_gpuvm_resv() helper (Boris)
> - add a drm_gpuvm::<list_name>::local_list field (Boris)
> - remove drm_gpuvm_bo_get_unless_zero() helper (Boris)
> - fix missing NULL assignment in get_next_vm_bo_from_list() (Boris)
> - keep a drm_gem_object reference on potential vm_bo destroy (alternatively we
> could free the vm_bo and drop the vm_bo's drm_gem_object reference through
> async work)
> - introduce DRM_GPUVM_RESV_PROTECTED flag to indicate external locking through
> the corresponding dma-resv locks to optimize for drivers already holding
> them when needed; add the corresponding lock_assert_held() calls (Thomas)
> - make drm_gpuvm_bo_evict() per vm_bo and add a drm_gpuvm_bo_gem_evict()
> helper (Thomas)
> - pass a drm_gpuvm_bo in drm_gpuvm_ops::vm_bo_validate() (Thomas)
> - documentation fixes
>
> Changes in V5:
> ==============
> - use a root drm_gem_object provided by the driver as a base for the VM's
> common dma-resv (Christian)
> - provide a helper to allocate a "dummy" root GEM object in case a driver
> specific root GEM object isn't available
> - add a dedicated patch for nouveau to make use of the GPUVM's shared dma-resv
> - improve documentation (Boris)
> - the following patches are removed from the series, since they already landed
> in drm-misc-next
> - f72c2db47080 ("drm/gpuvm: rename struct drm_gpuva_manager to struct drm_gpuvm")
> - fe7acaa727e1 ("drm/gpuvm: allow building as module")
> - 78f54469b871 ("drm/nouveau: uvmm: rename 'umgr' to 'base'")
>
> Danilo Krummrich (6):
> drm/gpuvm: add common dma-resv per struct drm_gpuvm
> drm/gpuvm: add drm_gpuvm_flags to drm_gpuvm
> drm/gpuvm: add an abstraction for a VM / BO combination
> drm/gpuvm: track/lock/validate external/evicted objects
> drm/nouveau: make use of the GPUVM's shared dma-resv
> drm/nouveau: use GPUVM common infrastructure
>
> drivers/gpu/drm/drm_gpuvm.c | 1036 +++++++++++++++++++++--
> drivers/gpu/drm/nouveau/nouveau_bo.c | 15 +-
> drivers/gpu/drm/nouveau/nouveau_bo.h | 5 +
> drivers/gpu/drm/nouveau/nouveau_exec.c | 52 +-
> drivers/gpu/drm/nouveau/nouveau_exec.h | 4 -
> drivers/gpu/drm/nouveau/nouveau_gem.c | 10 +-
> drivers/gpu/drm/nouveau/nouveau_sched.h | 4 +-
> drivers/gpu/drm/nouveau/nouveau_uvmm.c | 183 ++--
> drivers/gpu/drm/nouveau/nouveau_uvmm.h | 1 -
> include/drm/drm_gem.h | 32 +-
> include/drm/drm_gpuvm.h | 465 +++++++++-
> 11 files changed, 1625 insertions(+), 182 deletions(-)
>
>
> base-commit: a4ead6e37e3290cff399e2598d75e98777b69b37

One comment I had before on the GPUVM code in general was the licensing,
but I'm not sure there was a reply. Is it possible to have this code
dual MIT / GPLV2?

Thanks,

Thomas



2023-10-05 15:52:58

by Thomas Hellström

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 3/6] drm/gpuvm: add an abstraction for a VM / BO combination

Hi,

On 9/28/23 21:16, Danilo Krummrich wrote:
> This patch adds an abstraction layer between the drm_gpuva mappings of
NIT: imperative:  s/This patch adds/Add/
> a particular drm_gem_object and this GEM object itself. The abstraction
> represents a combination of a drm_gem_object and drm_gpuvm. The
> drm_gem_object holds a list of drm_gpuvm_bo structures (the structure
> representing this abstraction), while each drm_gpuvm_bo contains list of
> mappings of this GEM object.
>
> This has multiple advantages:
>
> 1) We can use the drm_gpuvm_bo structure to attach it to various lists
> of the drm_gpuvm. This is useful for tracking external and evicted
> objects per VM, which is introduced in subsequent patches.
>
> 2) Finding mappings of a certain drm_gem_object mapped in a certain
> drm_gpuvm becomes much cheaper.
>
> 3) Drivers can derive and extend the structure to easily represent
> driver specific states of a BO for a certain GPUVM.
>
> The idea of this abstraction was taken from amdgpu, hence the credit for
> this idea goes to the developers of amdgpu.
>
> Cc: Christian König <[email protected]>
> Signed-off-by: Danilo Krummrich <[email protected]>
> ---
> drivers/gpu/drm/drm_gpuvm.c | 334 +++++++++++++++++++++----
> drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++--
> include/drm/drm_gem.h | 32 +--
> include/drm/drm_gpuvm.h | 177 ++++++++++++-
> 4 files changed, 523 insertions(+), 84 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 6368dfdbe9dd..27100423154b 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -70,6 +70,18 @@
> * &drm_gem_object, such as the &drm_gem_object containing the root page table,
> * but it can also be a 'dummy' object, which can be allocated with
> * drm_gpuvm_root_object_alloc().
> + *
> + * In order to connect a struct drm_gpuva to its backing &drm_gem_object, each
> + * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and each
> + * &drm_gpuvm_bo contains a list of &drm_gpuva structures.
> + *
> + * A &drm_gpuvm_bo is an abstraction that represents a combination of a
> + * &drm_gpuvm and a &drm_gem_object. Every such combination should be unique.
> + * This is ensured by the API through drm_gpuvm_bo_obtain() and
> + * drm_gpuvm_bo_obtain_prealloc() which first look into the corresponding
> + * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
> + * particular combination. If not existent a new instance is created and linked
> + * to the &drm_gem_object.
> */
>
> /**
> @@ -395,21 +407,28 @@
> /**
> * DOC: Locking
> *
> - * Generally, the GPU VA manager does not take care of locking itself, it is
> - * the drivers responsibility to take care about locking. Drivers might want to
> - * protect the following operations: inserting, removing and iterating
> - * &drm_gpuva objects as well as generating all kinds of operations, such as
> - * split / merge or prefetch.
> - *
> - * The GPU VA manager also does not take care of the locking of the backing
> - * &drm_gem_object buffers GPU VA lists by itself; drivers are responsible to
> - * enforce mutual exclusion using either the GEMs dma_resv lock or alternatively
> - * a driver specific external lock. For the latter see also
> - * drm_gem_gpuva_set_lock().
> - *
> - * However, the GPU VA manager contains lockdep checks to ensure callers of its
> - * API hold the corresponding lock whenever the &drm_gem_objects GPU VA list is
> - * accessed by functions such as drm_gpuva_link() or drm_gpuva_unlink().
> + * In terms of managing &drm_gpuva entries DRM GPUVM does not take care of
> + * locking itself, it is the drivers responsibility to take care about locking.
> + * Drivers might want to protect the following operations: inserting, removing
> + * and iterating &drm_gpuva objects as well as generating all kinds of
> + * operations, such as split / merge or prefetch.
> + *
> + * DRM GPUVM also does not take care of the locking of the backing
> + * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo abstractions by
> + * itself; drivers are responsible to enforce mutual exclusion using either the
> + * GEMs dma_resv lock or alternatively a driver specific external lock. For the
> + * latter see also drm_gem_gpuva_set_lock().
> + *
> + * However, DRM GPUVM contains lockdep checks to ensure callers of its API hold
> + * the corresponding lock whenever the &drm_gem_objects GPU VA list is accessed
> + * by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but also
> + * drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put().
> + *
> + * The latter is required since on creation and destruction of a &drm_gpuvm_bo
> + * the &drm_gpuvm_bo is attached / removed from the &drm_gem_objects gpuva list.
> + * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
> + * &drm_gem_object must be able to observe previous creations and destructions
> + * of &drm_gpuvm_bos in order to keep instances unique.
> */
>
> /**
> @@ -439,6 +458,7 @@
> * {
> * struct drm_gpuva_ops *ops;
> * struct drm_gpuva_op *op
> + * struct drm_gpuvm_bo *vm_bo;
> *
> * driver_lock_va_space();
> * ops = drm_gpuvm_sm_map_ops_create(gpuvm, addr, range,
> @@ -446,6 +466,10 @@
> * if (IS_ERR(ops))
> * return PTR_ERR(ops);
> *
> + * vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);
> + * if (IS_ERR(vm_bo))
> + * return PTR_ERR(vm_bo);
> + *
> * drm_gpuva_for_each_op(op, ops) {
> * struct drm_gpuva *va;
> *
> @@ -458,7 +482,7 @@
> *
> * driver_vm_map();
> * drm_gpuva_map(gpuvm, va, &op->map);
> - * drm_gpuva_link(va);
> + * drm_gpuva_link(va, vm_bo);
> *
> * break;
> * case DRM_GPUVA_OP_REMAP: {
> @@ -485,11 +509,11 @@
> * driver_vm_remap();
> * drm_gpuva_remap(prev, next, &op->remap);
> *
> - * drm_gpuva_unlink(va);
> * if (prev)
> - * drm_gpuva_link(prev);
> + * drm_gpuva_link(prev, va->vm_bo);
> * if (next)
> - * drm_gpuva_link(next);
> + * drm_gpuva_link(next, va->vm_bo);
> + * drm_gpuva_unlink(va);
> *
> * break;
> * }
> @@ -505,6 +529,7 @@
> * break;
> * }
> * }
> + * drm_gpuvm_bo_put(vm_bo);
> * driver_unlock_va_space();
> *
> * return 0;
> @@ -514,6 +539,7 @@
> *
> * struct driver_context {
> * struct drm_gpuvm *gpuvm;
> + * struct drm_gpuvm_bo *vm_bo;
> * struct drm_gpuva *new_va;
> * struct drm_gpuva *prev_va;
> * struct drm_gpuva *next_va;
> @@ -534,6 +560,7 @@
> * struct drm_gem_object *obj, u64 offset)
> * {
> * struct driver_context ctx;
> + * struct drm_gpuvm_bo *vm_bo;
> * struct drm_gpuva_ops *ops;
> * struct drm_gpuva_op *op;
> * int ret = 0;
> @@ -543,16 +570,23 @@
> * ctx.new_va = kzalloc(sizeof(*ctx.new_va), GFP_KERNEL);
> * ctx.prev_va = kzalloc(sizeof(*ctx.prev_va), GFP_KERNEL);
> * ctx.next_va = kzalloc(sizeof(*ctx.next_va), GFP_KERNEL);
> - * if (!ctx.new_va || !ctx.prev_va || !ctx.next_va) {
> + * ctx.vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
> + * if (!ctx.new_va || !ctx.prev_va || !ctx.next_va || !vm_bo) {
> * ret = -ENOMEM;
> * goto out;
> * }
> *
> + * // Typically protected with a driver specific GEM gpuva lock
> + * // used in the fence signaling path for drm_gpuva_link() and
> + * // drm_gpuva_unlink(), hence pre-allocate.
> + * ctx.vm_bo = drm_gpuvm_bo_obtain_prealloc(ctx.vm_bo);
> + *
> * driver_lock_va_space();
> * ret = drm_gpuvm_sm_map(gpuvm, &ctx, addr, range, obj, offset);
> * driver_unlock_va_space();
> *
> * out:
> + * drm_gpuvm_bo_put(ctx.vm_bo);
> * kfree(ctx.new_va);
> * kfree(ctx.prev_va);
> * kfree(ctx.next_va);
> @@ -565,7 +599,7 @@
> *
> * drm_gpuva_map(ctx->vm, ctx->new_va, &op->map);
> *
> - * drm_gpuva_link(ctx->new_va);
> + * drm_gpuva_link(ctx->new_va, ctx->vm_bo);
> *
> * // prevent the new GPUVA from being freed in
> * // driver_mapping_create()
> @@ -577,22 +611,23 @@
> * int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
> * {
> * struct driver_context *ctx = __ctx;
> + * struct drm_gpuva *va = op->remap.unmap->va;
> *
> * drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
> *
> - * drm_gpuva_unlink(op->remap.unmap->va);
> - * kfree(op->remap.unmap->va);
> - *
> * if (op->remap.prev) {
> - * drm_gpuva_link(ctx->prev_va);
> + * drm_gpuva_link(ctx->prev_va, va->vm_bo);
> * ctx->prev_va = NULL;
> * }
> *
> * if (op->remap.next) {
> - * drm_gpuva_link(ctx->next_va);
> + * drm_gpuva_link(ctx->next_va, va->vm_bo);
> * ctx->next_va = NULL;
> * }
> *
> + * drm_gpuva_unlink(va);
> + * kfree(va);
> + *
> * return 0;
> * }
> *
> @@ -771,6 +806,194 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
> }
> EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>
> +/**
> + * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
> + * @gpuvm: The &drm_gpuvm the @obj is mapped in.
> + * @obj: The &drm_gem_object being mapped in the @gpuvm.
> + *
> + * If provided by the driver, this function uses the &drm_gpuvm_ops
> + * vm_bo_alloc() callback to allocate.
> + *
> + * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
> + */
> +struct drm_gpuvm_bo *
> +drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
> + struct drm_gem_object *obj)
> +{
> + const struct drm_gpuvm_ops *ops = gpuvm->ops;
> + struct drm_gpuvm_bo *vm_bo;
> +
> + if (ops && ops->vm_bo_alloc)
> + vm_bo = ops->vm_bo_alloc();
> + else
> + vm_bo = kzalloc(sizeof(*vm_bo), GFP_KERNEL);
> +
> + if (unlikely(!vm_bo))
> + return NULL;
> +
> + vm_bo->vm = gpuvm;
> + vm_bo->obj = obj;
> +
> + kref_init(&vm_bo->kref);
> + INIT_LIST_HEAD(&vm_bo->list.gpuva);
> + INIT_LIST_HEAD(&vm_bo->list.entry.gem);
> +
> + drm_gem_object_get(obj);
> +
> + return vm_bo;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_create);
> +
> +static void
> +drm_gpuvm_bo_destroy(struct kref *kref)
> +{
> + struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
> + kref);
> + struct drm_gpuvm *gpuvm = vm_bo->vm;
> + const struct drm_gpuvm_ops *ops = gpuvm->ops;
> + struct drm_gem_object *obj = vm_bo->obj;
> + bool lock = !drm_gpuvm_resv_protected(gpuvm);
> +
> + drm_gem_gpuva_assert_lock_held(obj);
> + if (!lock)
> + drm_gpuvm_resv_assert_held(gpuvm);
> +
> + list_del(&vm_bo->list.entry.gem);
> +
> + drm_gem_object_put(obj);
> +
> + if (ops && ops->vm_bo_free)
> + ops->vm_bo_free(vm_bo);
> + else
> + kfree(vm_bo);
> +}
> +
> +/**
> + * drm_gpuvm_bo_put() - drop a struct drm_gpuvm_bo reference
> + * @vm_bo: the &drm_gpuvm_bo to release the reference of
> + *
> + * This releases a reference to @vm_bo.
> + *
> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
> + * includes removing it from the GEMs gpuva list. Hence, if a call to this
> + * function can potentially let the reference count drop to zero, the caller must
> + * hold the dma-resv or driver specific GEM gpuva lock.
> + */
> +void
> +drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> +{
> + if (vm_bo)
> + kref_put(&vm_bo->kref, drm_gpuvm_bo_destroy);
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
> +
> +static struct drm_gpuvm_bo *
> +__drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> + struct drm_gem_object *obj)
> +{
> + struct drm_gpuvm_bo *vm_bo;
> +
> + drm_gem_gpuva_assert_lock_held(obj);
> +
> + drm_gem_for_each_gpuvm_bo(vm_bo, obj)
> + if (vm_bo->vm == gpuvm)
> + return vm_bo;
> +
> + return NULL;
> +}
> +
> +/**
> + * drm_gpuvm_bo_find() - find the &drm_gpuvm_bo for the given
> + * &drm_gpuvm and &drm_gem_object
> + * @gpuvm: The &drm_gpuvm the @obj is mapped in.
> + * @obj: The &drm_gem_object being mapped in the @gpuvm.
> + *
> + * Find the &drm_gpuvm_bo representing the combination of the given
> + * &drm_gpuvm and &drm_gem_object. If found, increases the reference
> + * count of the &drm_gpuvm_bo accordingly.
> + *
> + * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
> + */
> +struct drm_gpuvm_bo *
> +drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> + struct drm_gem_object *obj)
> +{
> + struct drm_gpuvm_bo *vm_bo = __drm_gpuvm_bo_find(gpuvm, obj);
> +
> + return vm_bo ? drm_gpuvm_bo_get(vm_bo) : NULL;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_find);
> +
> +/**
> + * drm_gpuvm_bo_obtain() - obtains an instance of the &drm_gpuvm_bo for the
> + * given &drm_gpuvm and &drm_gem_object
> + * @gpuvm: The &drm_gpuvm the @obj is mapped in.
> + * @obj: The &drm_gem_object being mapped in the @gpuvm.
> + *
> + * Find the &drm_gpuvm_bo representing the combination of the given
> + * &drm_gpuvm and &drm_gem_object. If found, increases the reference
> + * count of the &drm_gpuvm_bo accordingly. If not found, allocates a new
> + * &drm_gpuvm_bo.
> + *
> + * A new &drm_gpuvm_bo is added to the GEMs gpuva list.
> + *
> + * Returns: a pointer to the &drm_gpuvm_bo on success, an ERR_PTR on failure
> + */
> +struct drm_gpuvm_bo *
> +drm_gpuvm_bo_obtain(struct drm_gpuvm *gpuvm,
> + struct drm_gem_object *obj)
> +{
> + struct drm_gpuvm_bo *vm_bo;
> +
> + vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
> + if (vm_bo)
> + return vm_bo;
> +
> + vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
> + if (!vm_bo)
> + return ERR_PTR(-ENOMEM);
> +
> + list_add_tail(&vm_bo->list.entry.gem, &obj->gpuva.list);
> +
> + return vm_bo;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain);
> +
> +/**
> + * drm_gpuvm_bo_obtain_prealloc() - obtains an instance of the &drm_gpuvm_bo
> + * for the given &drm_gpuvm and &drm_gem_object
> + * @__vm_bo: A pre-allocated struct drm_gpuvm_bo.
> + *
> + * Find the &drm_gpuvm_bo representing the combination of the given
> + * &drm_gpuvm and &drm_gem_object. If found, increases the reference
> + * count of the found &drm_gpuvm_bo accordingly, while the @__vm_bo reference
> + * count is decreased. If not found @__vm_bo is returned without further
> + * increase of the reference count.
> + *
> + * A new &drm_gpuvm_bo is added to the GEMs gpuva list.
> + *
> + * Returns: a pointer to the found &drm_gpuvm_bo or @__vm_bo if no existing
> + * &drm_gpuvm_bo was found
> + */
> +struct drm_gpuvm_bo *
> +drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
> +{
> + struct drm_gpuvm *gpuvm = __vm_bo->vm;
> + struct drm_gem_object *obj = __vm_bo->obj;
> + struct drm_gpuvm_bo *vm_bo;
> +
> + vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
> + if (vm_bo) {
> + drm_gpuvm_bo_put(__vm_bo);
> + return vm_bo;
> + }
> +
> + list_add_tail(&__vm_bo->list.entry.gem, &obj->gpuva.list);
> +
> + return __vm_bo;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
> +
> static int
> __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
> struct drm_gpuva *va)
> @@ -860,24 +1083,33 @@ EXPORT_SYMBOL_GPL(drm_gpuva_remove);
> /**
> * drm_gpuva_link() - link a &drm_gpuva
> * @va: the &drm_gpuva to link
> + * @vm_bo: the &drm_gpuvm_bo to add the &drm_gpuva to
> *
> - * This adds the given &va to the GPU VA list of the &drm_gem_object it is
> - * associated with.
> + * This adds the given &va to the GPU VA list of the &drm_gpuvm_bo and the
> + * &drm_gpuvm_bo to the &drm_gem_object it is associated with.
> + *
> + * For every &drm_gpuva entry added to the &drm_gpuvm_bo an additional
> + * reference of the latter is taken.
> *
> * This function expects the caller to protect the GEM's GPUVA list against
> - * concurrent access using the GEMs dma_resv lock.
> + * concurrent access using either the GEMs dma_resv lock or a driver specific
> + * lock set through drm_gem_gpuva_set_lock().
> */
> void
> -drm_gpuva_link(struct drm_gpuva *va)
> +drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuvm_bo *vm_bo)
> {
> struct drm_gem_object *obj = va->gem.obj;
>
> if (unlikely(!obj))
> return;
>
> + WARN_ON(obj != vm_bo->obj);
> drm_gem_gpuva_assert_lock_held(obj);
>
> - list_add_tail(&va->gem.entry, &obj->gpuva.list);
> + drm_gpuvm_bo_get(vm_bo);
> +
> + va->vm_bo = vm_bo;
> + list_add_tail(&va->gem.entry, &vm_bo->list.gpuva);
> }
> EXPORT_SYMBOL_GPL(drm_gpuva_link);
>
> @@ -888,13 +1120,22 @@ EXPORT_SYMBOL_GPL(drm_gpuva_link);
> * This removes the given &va from the GPU VA list of the &drm_gem_object it is
> * associated with.
> *
> + * This removes the given &va from the GPU VA list of the &drm_gpuvm_bo and
> + * the &drm_gpuvm_bo from the &drm_gem_object it is associated with in case
> + * this call unlinks the last &drm_gpuva from the &drm_gpuvm_bo.
> + *
> + * For every &drm_gpuva entry removed from the &drm_gpuvm_bo a reference of
> + * the latter is dropped.
> + *
> * This function expects the caller to protect the GEM's GPUVA list against
> - * concurrent access using the GEMs dma_resv lock.
> + * concurrent access using either the GEMs dma_resv lock or a driver specific
> + * lock set through drm_gem_gpuva_set_lock().
> */
> void
> drm_gpuva_unlink(struct drm_gpuva *va)
> {
> struct drm_gem_object *obj = va->gem.obj;
Can we ditch va->gem.obj now and replace it with an accessor to the 
vm_bo's pointer?
> + struct drm_gpuvm_bo *vm_bo = va->vm_bo;
>
> if (unlikely(!obj))
> return;
> @@ -902,6 +1143,11 @@ drm_gpuva_unlink(struct drm_gpuva *va)
> drm_gem_gpuva_assert_lock_held(obj);
>
> list_del_init(&va->gem.entry);
> + va->vm_bo = NULL;
> +
> + drm_gem_object_get(obj);
> + drm_gpuvm_bo_put(vm_bo);
> + drm_gem_object_put(obj);

This get->put dance is unnecessary? If the caller is required to hold a
lock on obj it is also required to hold a reference on obj.

Besides, if the vm_bo's reference on obj is otherwise the last one, it
will still be freed before the function exits.

/Thomas
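
(A hedged sketch of the accessor suggested above; the helper name is made up
and drm_gpuvm does not provide it in this revision.)

#include <drm/drm_gpuvm.h>

/* Hypothetical accessor: resolve the GEM object through the vm_bo instead
 * of keeping a duplicated va->gem.obj pointer.
 */
static inline struct drm_gem_object *
my_gpuva_gem_obj(struct drm_gpuva *va)
{
	return va->vm_bo ? va->vm_bo->obj : NULL;
}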


2023-10-08 22:50:12

by Danilo Krummrich

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 0/6] [RFC] DRM GPUVM features

Hi Thomas,

On 10/5/23 11:35, Thomas Hellström wrote:
> Hi, Danilo
>
> On 9/28/23 21:16, Danilo Krummrich wrote:
>> Currently GPUVM offers common infrastructure to track GPU VA allocations
>> and mappings, generically connect GPU VA mappings to their backing
>> buffers and perform more complex mapping operations on the GPU VA space.
>>
>> However, there are more design patterns commonly used by drivers, which
>> can potentially be generalized in order to make GPUVM represent the
>> basis of a VM implementation. In this context, this patch series aims at
>> generalizing the following elements.
>>
>> 1) Provide a common dma-resv for GEM objects not being used outside of
>>     this GPU-VM.
>>
>> 2) Provide tracking of external GEM objects (GEM objects which are
>>     shared with other GPU-VMs).
>>
>> 3) Provide functions to efficiently lock all GEM objects dma-resv the
>>     GPU-VM contains mappings of.
>>
>> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
>>     of, such that validation of evicted GEM objects is accelerated.
>>
>> 5) Provide some convinience functions for common patterns.
>>
>> The implementation introduces struct drm_gpuvm_bo, which serves as abstraction
>> combining a struct drm_gpuvm and struct drm_gem_object, similar to what
>> amdgpu does with struct amdgpu_bo_vm. While this adds a bit of complexity it
>> improves the efficiency of tracking external and evicted GEM objects.
>>
>> This patch series is also available at [3].
>>
>> [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/commits/gpuvm-next
>>
>> Changes in V2:
>> ==============
>>    - rename 'drm_gpuva_manager' -> 'drm_gpuvm' which generally leads to more
>>      consistent naming
>>    - properly separate commits (introduce common dma-resv, drm_gpuvm_bo
>>      abstraction, etc.)
>>    - remove maple tree for tracking external objects, use a list drm_gpuvm_bos
>>      per drm_gpuvm instead
>>    - rework dma-resv locking helpers (Thomas)
>>    - add a locking helper for a given range of the VA space (Christian)
>>    - make the GPUVA manager buildable as module, rather than drm_exec
>>      builtin (Christian)
>>
>> Changes in V3:
>> ==============
>>    - rename missing function and files (Boris)
>>    - warn if vm_obj->obj != obj in drm_gpuva_link() (Boris)
>>    - don't expose drm_gpuvm_bo_destroy() (Boris)
>>    - unlink VM_BO from GEM in drm_gpuvm_bo_destroy() rather than
>>      drm_gpuva_unlink() and link within drm_gpuvm_bo_obtain() to keep
>>      drm_gpuvm_bo instances unique
>>    - add internal locking to external and evicted object lists to support drivers
>>      updating the VA space from within the fence signalling critical path (Boris)
>>    - unlink external objects and evicted objects from the GPUVM's list in
>>      drm_gpuvm_bo_destroy()
>>    - add more documentation and fix some kernel doc issues
>>
>> Changes in V4:
>> ==============
>>    - add a drm_gpuvm_resv() helper (Boris)
>>    - add a drm_gpuvm::<list_name>::local_list field (Boris)
>>    - remove drm_gpuvm_bo_get_unless_zero() helper (Boris)
>>    - fix missing NULL assignment in get_next_vm_bo_from_list() (Boris)
>>    - keep a drm_gem_object reference on potential vm_bo destroy (alternatively we
>>      could free the vm_bo and drop the vm_bo's drm_gem_object reference through
>>      async work)
>>    - introduce DRM_GPUVM_RESV_PROTECTED flag to indicate external locking through
>>      the corresponding dma-resv locks to optimize for drivers already holding
>>      them when needed; add the corresponding lock_assert_held() calls (Thomas)
>>    - make drm_gpuvm_bo_evict() per vm_bo and add a drm_gpuvm_bo_gem_evict()
>>      helper (Thomas)
>>    - pass a drm_gpuvm_bo in drm_gpuvm_ops::vm_bo_validate() (Thomas)
>>    - documentation fixes
>>
>> Changes in V5:
>> ==============
>>    - use a root drm_gem_object provided by the driver as a base for the VM's
>>      common dma-resv (Christian)
>>    - provide a helper to allocate a "dummy" root GEM object in case a driver
>>      specific root GEM object isn't available
>>    - add a dedicated patch for nouveau to make use of the GPUVM's shared dma-resv
>>    - improve documentation (Boris)
>>    - the following patches are removed from the series, since they already landed
>>      in drm-misc-next
>>      - f72c2db47080 ("drm/gpuvm: rename struct drm_gpuva_manager to struct drm_gpuvm")
>>      - fe7acaa727e1 ("drm/gpuvm: allow building as module")
>>      - 78f54469b871 ("drm/nouveau: uvmm: rename 'umgr' to 'base'")
>>
>> Danilo Krummrich (6):
>>    drm/gpuvm: add common dma-resv per struct drm_gpuvm
>>    drm/gpuvm: add drm_gpuvm_flags to drm_gpuvm
>>    drm/gpuvm: add an abstraction for a VM / BO combination
>>    drm/gpuvm: track/lock/validate external/evicted objects
>>    drm/nouveau: make use of the GPUVM's shared dma-resv
>>    drm/nouveau: use GPUVM common infrastructure
>>
>>   drivers/gpu/drm/drm_gpuvm.c             | 1036 +++++++++++++++++++++--
>>   drivers/gpu/drm/nouveau/nouveau_bo.c    |   15 +-
>>   drivers/gpu/drm/nouveau/nouveau_bo.h    |    5 +
>>   drivers/gpu/drm/nouveau/nouveau_exec.c  |   52 +-
>>   drivers/gpu/drm/nouveau/nouveau_exec.h  |    4 -
>>   drivers/gpu/drm/nouveau/nouveau_gem.c   |   10 +-
>>   drivers/gpu/drm/nouveau/nouveau_sched.h |    4 +-
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c  |  183 ++--
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h  |    1 -
>>   include/drm/drm_gem.h                   |   32 +-
>>   include/drm/drm_gpuvm.h                 |  465 +++++++++-
>>   11 files changed, 1625 insertions(+), 182 deletions(-)
>>
>>
>> base-commit: a4ead6e37e3290cff399e2598d75e98777b69b37
>
> One comment I had earlier on the GPUVM code in general was about the licensing, but I'm not sure there was a reply. Is it possible to have this code dual MIT / GPLv2?

Personally, I don't have any objections. Please feel free to send a patch to change it; I'm happy to ACK it.
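
For reference, a dual-licensing change of this kind would normally just switch the SPDX tags at the top of the affected files; the lines below are only an illustration, assuming the usual kernel SPDX conventions (the exact files and current tags would of course need checking in the tree):

    // SPDX-License-Identifier: GPL-2.0 OR MIT       /* C sources, e.g. drivers/gpu/drm/drm_gpuvm.c */
    /* SPDX-License-Identifier: GPL-2.0 OR MIT */    /* headers, e.g. include/drm/drm_gpuvm.h */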

- Danilo

>
> Thanks,
>
> Thomas
>
>
>

2023-10-08 23:13:10

by Danilo Krummrich

[permalink] [raw]
Subject: Re: [PATCH drm-misc-next v5 3/6] drm/gpuvm: add an abstraction for a VM / BO combination

On 10/5/23 13:51, Thomas Hellström wrote:
> Hi,
>
> On 9/28/23 21:16, Danilo Krummrich wrote:
>> This patch adds an abstraction layer between the drm_gpuva mappings of
> NIT: imperative:  s/This patch adds/Add/
>> a particular drm_gem_object and this GEM object itself. The abstraction
>> represents a combination of a drm_gem_object and drm_gpuvm. The
>> drm_gem_object holds a list of drm_gpuvm_bo structures (the structure
>> representing this abstraction), while each drm_gpuvm_bo contains a list of
>> mappings of this GEM object.
>>
>> This has multiple advantages:
>>
>> 1) We can use the drm_gpuvm_bo structure to attach it to various lists
>>     of the drm_gpuvm. This is useful for tracking external and evicted
>>     objects per VM, which is introduced in subsequent patches.
>>
>> 2) Finding mappings of a certain drm_gem_object mapped in a certain
>>     drm_gpuvm becomes much cheaper.
>>
>> 3) Drivers can derive and extend the structure to easily represent
>>     driver specific states of a BO for a certain GPUVM.
>>
>> The idea of this abstraction was taken from amdgpu, hence the credit for
>> this idea goes to the developers of amdgpu.
>>
>> Cc: Christian König <[email protected]>
>> Signed-off-by: Danilo Krummrich <[email protected]>
>> ---
>>   drivers/gpu/drm/drm_gpuvm.c            | 334 +++++++++++++++++++++----
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c |  64 +++--
>>   include/drm/drm_gem.h                  |  32 +--
>>   include/drm/drm_gpuvm.h                | 177 ++++++++++++-
>>   4 files changed, 523 insertions(+), 84 deletions(-)
>>
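
To illustrate point 3) of the commit message above: a driver that wants per-VM BO state of its own would typically embed struct drm_gpuvm_bo in a driver structure and hook up the vm_bo_alloc() / vm_bo_free() callbacks that drm_gpuvm_bo_create() and drm_gpuvm_bo_destroy() in the diff below consult. A minimal sketch (struct driver_vm_bo and its extra field are hypothetical, not part of this series):

    struct driver_vm_bo {
            struct drm_gpuvm_bo base;       /* embedded base object */
            struct list_head evict_link;    /* hypothetical driver-private state */
    };

    static struct drm_gpuvm_bo *driver_vm_bo_alloc(void)
    {
            struct driver_vm_bo *dbo;

            dbo = kzalloc(sizeof(*dbo), GFP_KERNEL);
            if (!dbo)
                    return NULL;

            return &dbo->base;
    }

    static void driver_vm_bo_free(struct drm_gpuvm_bo *vm_bo)
    {
            kfree(container_of(vm_bo, struct driver_vm_bo, base));
    }

    static const struct drm_gpuvm_ops driver_gpuvm_ops = {
            .vm_bo_alloc = driver_vm_bo_alloc,
            .vm_bo_free = driver_vm_bo_free,
    };

drm_gpuvm_bo_obtain() then hands back a struct drm_gpuvm_bo pointer which the driver can upcast with container_of() to reach its private state.
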
>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>> index 6368dfdbe9dd..27100423154b 100644
>> --- a/drivers/gpu/drm/drm_gpuvm.c
>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>> @@ -70,6 +70,18 @@
>>    * &drm_gem_object, such as the &drm_gem_object containing the root page table,
>>    * but it can also be a 'dummy' object, which can be allocated with
>>    * drm_gpuvm_root_object_alloc().
>> + *
>> + * In order to connect a struct drm_gpuva to its backing &drm_gem_object, each
>> + * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and each
>> + * &drm_gpuvm_bo contains a list of &drm_gpuva structures.
>> + *
>> + * A &drm_gpuvm_bo is an abstraction that represents a combination of a
>> + * &drm_gpuvm and a &drm_gem_object. Every such combination should be unique.
>> + * This is ensured by the API through drm_gpuvm_bo_obtain() and
>> + * drm_gpuvm_bo_obtain_prealloc() which first look into the corresponding
>> + * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
>> + * particular combination. If none exists, a new instance is created and linked
>> + * to the &drm_gem_object.
>>    */
>>   /**
>> @@ -395,21 +407,28 @@
>>   /**
>>    * DOC: Locking
>>    *
>> - * Generally, the GPU VA manager does not take care of locking itself, it is
>> - * the drivers responsibility to take care about locking. Drivers might want to
>> - * protect the following operations: inserting, removing and iterating
>> - * &drm_gpuva objects as well as generating all kinds of operations, such as
>> - * split / merge or prefetch.
>> - *
>> - * The GPU VA manager also does not take care of the locking of the backing
>> - * &drm_gem_object buffers GPU VA lists by itself; drivers are responsible to
>> - * enforce mutual exclusion using either the GEMs dma_resv lock or alternatively
>> - * a driver specific external lock. For the latter see also
>> - * drm_gem_gpuva_set_lock().
>> - *
>> - * However, the GPU VA manager contains lockdep checks to ensure callers of its
>> - * API hold the corresponding lock whenever the &drm_gem_objects GPU VA list is
>> - * accessed by functions such as drm_gpuva_link() or drm_gpuva_unlink().
>> + * In terms of managing &drm_gpuva entries, DRM GPUVM does not take care of
>> + * locking itself; it is the driver's responsibility to take care of locking.
>> + * Drivers might want to protect the following operations: inserting, removing
>> + * and iterating &drm_gpuva objects as well as generating all kinds of
>> + * operations, such as split / merge or prefetch.
>> + *
>> + * DRM GPUVM also does not take care of the locking of the backing
>> + * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo abstractions by
>> + * itself; drivers are responsible to enforce mutual exclusion using either the
>> + * GEMs dma_resv lock or alternatively a driver specific external lock. For the
>> + * latter see also drm_gem_gpuva_set_lock().
>> + *
>> + * However, DRM GPUVM contains lockdep checks to ensure callers of its API hold
>> + * the corresponding lock whenever the &drm_gem_objects GPU VA list is accessed
>> + * by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but also
>> + * drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put().
>> + *
>> + * The latter is required since on creation and destruction of a &drm_gpuvm_bo
>> + * the &drm_gpuvm_bo is attached to / removed from the &drm_gem_objects gpuva list.
>> + * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
>> + * &drm_gem_object must be able to observe previous creations and destructions
>> + * of &drm_gpuvm_bos in order to keep instances unique.
>>    */
>>   /**
>> @@ -439,6 +458,7 @@
>>    *    {
>>    *        struct drm_gpuva_ops *ops;
>>    *        struct drm_gpuva_op *op
>> + *        struct drm_gpuvm_bo *vm_bo;
>>    *
>>    *        driver_lock_va_space();
>>    *        ops = drm_gpuvm_sm_map_ops_create(gpuvm, addr, range,
>> @@ -446,6 +466,10 @@
>>    *        if (IS_ERR(ops))
>>    *            return PTR_ERR(ops);
>>    *
>> + *        vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);
>> + *        if (IS_ERR(vm_bo))
>> + *            return PTR_ERR(vm_bo);
>> + *
>>    *        drm_gpuva_for_each_op(op, ops) {
>>    *            struct drm_gpuva *va;
>>    *
>> @@ -458,7 +482,7 @@
>>    *
>>    *                driver_vm_map();
>>    *                drm_gpuva_map(gpuvm, va, &op->map);
>> - *                drm_gpuva_link(va);
>> + *                drm_gpuva_link(va, vm_bo);
>>    *
>>    *                break;
>>    *            case DRM_GPUVA_OP_REMAP: {
>> @@ -485,11 +509,11 @@
>>    *                driver_vm_remap();
>>    *                drm_gpuva_remap(prev, next, &op->remap);
>>    *
>> - *                drm_gpuva_unlink(va);
>>    *                if (prev)
>> - *                    drm_gpuva_link(prev);
>> + *                    drm_gpuva_link(prev, va->vm_bo);
>>    *                if (next)
>> - *                    drm_gpuva_link(next);
>> + *                    drm_gpuva_link(next, va->vm_bo);
>> + *                drm_gpuva_unlink(va);
>>    *
>>    *                break;
>>    *            }
>> @@ -505,6 +529,7 @@
>>    *                break;
>>    *            }
>>    *        }
>> + *        drm_gpuvm_bo_put(vm_bo);
>>    *        driver_unlock_va_space();
>>    *
>>    *        return 0;
>> @@ -514,6 +539,7 @@
>>    *
>>    *    struct driver_context {
>>    *        struct drm_gpuvm *gpuvm;
>> + *        struct drm_gpuvm_bo *vm_bo;
>>    *        struct drm_gpuva *new_va;
>>    *        struct drm_gpuva *prev_va;
>>    *        struct drm_gpuva *next_va;
>> @@ -534,6 +560,7 @@
>>    *                  struct drm_gem_object *obj, u64 offset)
>>    *    {
>>    *        struct driver_context ctx;
>> + *        struct drm_gpuvm_bo *vm_bo;
>>    *        struct drm_gpuva_ops *ops;
>>    *        struct drm_gpuva_op *op;
>>    *        int ret = 0;
>> @@ -543,16 +570,23 @@
>>    *        ctx.new_va = kzalloc(sizeof(*ctx.new_va), GFP_KERNEL);
>>    *        ctx.prev_va = kzalloc(sizeof(*ctx.prev_va), GFP_KERNEL);
>>    *        ctx.next_va = kzalloc(sizeof(*ctx.next_va), GFP_KERNEL);
>> - *        if (!ctx.new_va || !ctx.prev_va || !ctx.next_va) {
>> + *        ctx.vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
>> + *        if (!ctx.new_va || !ctx.prev_va || !ctx.next_va || !ctx.vm_bo) {
>>    *            ret = -ENOMEM;
>>    *            goto out;
>>    *        }
>>    *
>> + *        // Typically protected with a driver specific GEM gpuva lock
>> + *        // used in the fence signaling path for drm_gpuva_link() and
>> + *        // drm_gpuva_unlink(), hence pre-allocate.
>> + *        ctx.vm_bo = drm_gpuvm_bo_obtain_prealloc(ctx.vm_bo);
>> + *
>>    *        driver_lock_va_space();
>>    *        ret = drm_gpuvm_sm_map(gpuvm, &ctx, addr, range, obj, offset);
>>    *        driver_unlock_va_space();
>>    *
>>    *    out:
>> + *        drm_gpuvm_bo_put(ctx.vm_bo);
>>    *        kfree(ctx.new_va);
>>    *        kfree(ctx.prev_va);
>>    *        kfree(ctx.next_va);
>> @@ -565,7 +599,7 @@
>>    *
>>    *        drm_gpuva_map(ctx->vm, ctx->new_va, &op->map);
>>    *
>> - *        drm_gpuva_link(ctx->new_va);
>> + *        drm_gpuva_link(ctx->new_va, ctx->vm_bo);
>>    *
>>    *        // prevent the new GPUVA from being freed in
>>    *        // driver_mapping_create()
>> @@ -577,22 +611,23 @@
>>    *    int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
>>    *    {
>>    *        struct driver_context *ctx = __ctx;
>> + *        struct drm_gpuva *va = op->remap.unmap->va;
>>    *
>>    *        drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
>>    *
>> - *        drm_gpuva_unlink(op->remap.unmap->va);
>> - *        kfree(op->remap.unmap->va);
>> - *
>>    *        if (op->remap.prev) {
>> - *            drm_gpuva_link(ctx->prev_va);
>> + *            drm_gpuva_link(ctx->prev_va, va->vm_bo);
>>    *            ctx->prev_va = NULL;
>>    *        }
>>    *
>>    *        if (op->remap.next) {
>> - *            drm_gpuva_link(ctx->next_va);
>> + *            drm_gpuva_link(ctx->next_va, va->vm_bo);
>>    *            ctx->next_va = NULL;
>>    *        }
>>    *
>> + *        drm_gpuva_unlink(va);
>> + *        kfree(va);
>> + *
>>    *        return 0;
>>    *    }
>>    *
>> @@ -771,6 +806,194 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>>   }
>>   EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>> +/**
>> + * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
>> + * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>> + * @obj: The &drm_gem_object being mapped in the @gpuvm.
>> + *
>> + * If provided by the driver, this function uses the &drm_gpuvm_ops
>> + * vm_bo_alloc() callback to allocate.
>> + *
>> + * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
>> + */
>> +struct drm_gpuvm_bo *
>> +drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>> +            struct drm_gem_object *obj)
>> +{
>> +    const struct drm_gpuvm_ops *ops = gpuvm->ops;
>> +    struct drm_gpuvm_bo *vm_bo;
>> +
>> +    if (ops && ops->vm_bo_alloc)
>> +        vm_bo = ops->vm_bo_alloc();
>> +    else
>> +        vm_bo = kzalloc(sizeof(*vm_bo), GFP_KERNEL);
>> +
>> +    if (unlikely(!vm_bo))
>> +        return NULL;
>> +
>> +    vm_bo->vm = gpuvm;
>> +    vm_bo->obj = obj;
>> +
>> +    kref_init(&vm_bo->kref);
>> +    INIT_LIST_HEAD(&vm_bo->list.gpuva);
>> +    INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>> +
>> +    drm_gem_object_get(obj);
>> +
>> +    return vm_bo;
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_create);
>> +
>> +static void
>> +drm_gpuvm_bo_destroy(struct kref *kref)
>> +{
>> +    struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
>> +                          kref);
>> +    struct drm_gpuvm *gpuvm = vm_bo->vm;
>> +    const struct drm_gpuvm_ops *ops = gpuvm->ops;
>> +    struct drm_gem_object *obj = vm_bo->obj;
>> +    bool lock = !drm_gpuvm_resv_protected(gpuvm);
>> +
>> +    drm_gem_gpuva_assert_lock_held(obj);
>> +    if (!lock)
>> +        drm_gpuvm_resv_assert_held(gpuvm);
>> +
>> +    list_del(&vm_bo->list.entry.gem);
>> +
>> +    drm_gem_object_put(obj);
>> +
>> +    if (ops && ops->vm_bo_free)
>> +        ops->vm_bo_free(vm_bo);
>> +    else
>> +        kfree(vm_bo);
>> +}
>> +
>> +/**
>> + * drm_gpuvm_bo_put() - drop a struct drm_gpuvm_bo reference
>> + * @vm_bo: the &drm_gpuvm_bo to release the reference of
>> + *
>> + * This releases a reference to @vm_bo.
>> + *
>> + * If the reference count drops to zero, the &drm_gpuvm_bo is destroyed, which
>> + * includes removing it from the GEMs gpuva list. Hence, if a call to this
>> + * function can potentially let the reference count drop to zero, the caller must
>> + * hold the dma-resv or driver specific GEM gpuva lock.
>> + */
>> +void
>> +drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>> +{
>> +    if (vm_bo)
>> +        kref_put(&vm_bo->kref, drm_gpuvm_bo_destroy);
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>> +
>> +static struct drm_gpuvm_bo *
>> +__drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>> +            struct drm_gem_object *obj)
>> +{
>> +    struct drm_gpuvm_bo *vm_bo;
>> +
>> +    drm_gem_gpuva_assert_lock_held(obj);
>> +
>> +    drm_gem_for_each_gpuvm_bo(vm_bo, obj)
>> +        if (vm_bo->vm == gpuvm)
>> +            return vm_bo;
>> +
>> +    return NULL;
>> +}
>> +
>> +/**
>> + * drm_gpuvm_bo_find() - find the &drm_gpuvm_bo for the given
>> + * &drm_gpuvm and &drm_gem_object
>> + * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>> + * @obj: The &drm_gem_object being mapped in the @gpuvm.
>> + *
>> + * Find the &drm_gpuvm_bo representing the combination of the given
>> + * &drm_gpuvm and &drm_gem_object. If found, increases the reference
>> + * count of the &drm_gpuvm_bo accordingly.
>> + *
>> + * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
>> + */
>> +struct drm_gpuvm_bo *
>> +drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>> +          struct drm_gem_object *obj)
>> +{
>> +    struct drm_gpuvm_bo *vm_bo = __drm_gpuvm_bo_find(gpuvm, obj);
>> +
>> +    return vm_bo ? drm_gpuvm_bo_get(vm_bo) : NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_find);
>> +
>> +/**
>> + * drm_gpuvm_bo_obtain() - obtains an instance of the &drm_gpuvm_bo for the
>> + * given &drm_gpuvm and &drm_gem_object
>> + * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>> + * @obj: The &drm_gem_object being mapped in the @gpuvm.
>> + *
>> + * Find the &drm_gpuvm_bo representing the combination of the given
>> + * &drm_gpuvm and &drm_gem_object. If found, increases the reference
>> + * count of the &drm_gpuvm_bo accordingly. If not found, allocates a new
>> + * &drm_gpuvm_bo.
>> + *
>> + * A new &drm_gpuvm_bo is added to the GEMs gpuva list.
>> + *
>> + * Returns: a pointer to the &drm_gpuvm_bo on success, an ERR_PTR on failure
>> + */
>> +struct drm_gpuvm_bo *
>> +drm_gpuvm_bo_obtain(struct drm_gpuvm *gpuvm,
>> +            struct drm_gem_object *obj)
>> +{
>> +    struct drm_gpuvm_bo *vm_bo;
>> +
>> +    vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
>> +    if (vm_bo)
>> +        return vm_bo;
>> +
>> +    vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
>> +    if (!vm_bo)
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    list_add_tail(&vm_bo->list.entry.gem, &obj->gpuva.list);
>> +
>> +    return vm_bo;
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain);
>> +
>> +/**
>> + * drm_gpuvm_bo_obtain_prealloc() - obtains an instance of the &drm_gpuvm_bo
>> + * for the given &drm_gpuvm and &drm_gem_object
>> + * @__vm_bo: A pre-allocated struct drm_gpuvm_bo.
>> + *
>> + * Find the &drm_gpuvm_bo representing the combination of the given
>> + * &drm_gpuvm and &drm_gem_object. If found, increases the reference
>> + * count of the found &drm_gpuvm_bo accordingly, while the @__vm_bo reference
>> + * count is decreased. If not found, @__vm_bo is returned without further
>> + * increase of the reference count.
>> + *
>> + * A new &drm_gpuvm_bo is added to the GEMs gpuva list.
>> + *
>> + * Returns: a pointer to the found &drm_gpuvm_bo or @__vm_bo if no existing
>> + * &drm_gpuvm_bo was found
>> + */
>> +struct drm_gpuvm_bo *
>> +drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>> +{
>> +    struct drm_gpuvm *gpuvm = __vm_bo->vm;
>> +    struct drm_gem_object *obj = __vm_bo->obj;
>> +    struct drm_gpuvm_bo *vm_bo;
>> +
>> +    vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
>> +    if (vm_bo) {
>> +        drm_gpuvm_bo_put(__vm_bo);
>> +        return vm_bo;
>> +    }
>> +
>> +    list_add_tail(&__vm_bo->list.entry.gem, &obj->gpuva.list);
>> +
>> +    return __vm_bo;
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>> +
>>   static int
>>   __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>              struct drm_gpuva *va)
>> @@ -860,24 +1083,33 @@ EXPORT_SYMBOL_GPL(drm_gpuva_remove);
>>   /**
>>    * drm_gpuva_link() - link a &drm_gpuva
>>    * @va: the &drm_gpuva to link
>> + * @vm_bo: the &drm_gpuvm_bo to add the &drm_gpuva to
>>    *
>> - * This adds the given &va to the GPU VA list of the &drm_gem_object it is
>> - * associated with.
>> + * This adds the given &va to the GPU VA list of the &drm_gpuvm_bo and the
>> + * &drm_gpuvm_bo to the &drm_gem_object it is associated with.
>> + *
>> + * For every &drm_gpuva entry added to the &drm_gpuvm_bo an additional
>> + * reference of the latter is taken.
>>    *
>>    * This function expects the caller to protect the GEM's GPUVA list against
>> - * concurrent access using the GEMs dma_resv lock.
>> + * concurrent access using either the GEMs dma_resv lock or a driver specific
>> + * lock set through drm_gem_gpuva_set_lock().
>>    */
>>   void
>> -drm_gpuva_link(struct drm_gpuva *va)
>> +drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuvm_bo *vm_bo)
>>   {
>>       struct drm_gem_object *obj = va->gem.obj;
>>       if (unlikely(!obj))
>>           return;
>> +    WARN_ON(obj != vm_bo->obj);
>>       drm_gem_gpuva_assert_lock_held(obj);
>> -    list_add_tail(&va->gem.entry, &obj->gpuva.list);
>> +    drm_gpuvm_bo_get(vm_bo);
>> +
>> +    va->vm_bo = vm_bo;
>> +    list_add_tail(&va->gem.entry, &vm_bo->list.gpuva);
>>   }
>>   EXPORT_SYMBOL_GPL(drm_gpuva_link);
>> @@ -888,13 +1120,22 @@ EXPORT_SYMBOL_GPL(drm_gpuva_link);
>>    * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>>    * associated with.
>>    *
>> + * This removes the given &va from the GPU VA list of the &drm_gpuvm_bo and
>> + * the &drm_gpuvm_bo from the &drm_gem_object it is associated with in case
>> + * this call unlinks the last &drm_gpuva from the &drm_gpuvm_bo.
>> + *
>> + * For every &drm_gpuva entry removed from the &drm_gpuvm_bo a reference of
>> + * the latter is dropped.
>> + *
>>    * This function expects the caller to protect the GEM's GPUVA list against
>> - * concurrent access using the GEMs dma_resv lock.
>> + * concurrent access using either the GEMs dma_resv lock or a driver specific
>> + * lock set through drm_gem_gpuva_set_lock().
>>    */
>>   void
>>   drm_gpuva_unlink(struct drm_gpuva *va)
>>   {
>>       struct drm_gem_object *obj = va->gem.obj;
> Can we ditch va->gem.obj now and replace it with an accessor to the vm_bo's pointer?

Theoretically, drm_gpuvm could be used to track mappings and create drm_gpuva_ops to
map / unmap stuff only. Not sure if anyone ever does that though.

If we decide to drop it and use the vm_bo's pointer instead, the drm_gpuva must always
carry a pointer to the vm_bo since the drm_gpuvm_sm_map() function family requires the
BO pointer (if any) to work correctly. Also I'm not sure about all the corresponding
lifecycle implications yet.

Either way, I'd prefer to approach this in a separate patch if we decide to do so.
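
For the record, such an accessor would presumably boil down to something like the sketch below (the helper name is made up, and whether the NULL case can happen depends on the lifecycle questions mentioned above):

    static inline struct drm_gem_object *
    drm_gpuva_gem_obj(struct drm_gpuva *va)
    {
            return va->vm_bo ? va->vm_bo->obj : NULL;
    }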

>> +    struct drm_gpuvm_bo *vm_bo = va->vm_bo;
>>       if (unlikely(!obj))
>>           return;
>> @@ -902,6 +1143,11 @@ drm_gpuva_unlink(struct drm_gpuva *va)
>>       drm_gem_gpuva_assert_lock_held(obj);
>>       list_del_init(&va->gem.entry);
>> +    va->vm_bo = NULL;
>> +
>> +    drm_gem_object_get(obj);
>> +    drm_gpuvm_bo_put(vm_bo);
>> +    drm_gem_object_put(obj);
>
> This get->put dance is unnecessary? If the caller is required to hold a lock on obj, it is also required to hold a reference on obj.

I think I had in mind the case where the driver has a separate (external) GEM gpuva lock to
protect the GEM's VM_BO list and the VM_BO's drm_gpuva list when writing this.
However, in that case we don't need to keep a reference on the GEM object either, I guess...

Will remove it.
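
For clarity, the agreed change would presumably leave the function looking roughly like this (a sketch, not the code as posted in this series):

    void
    drm_gpuva_unlink(struct drm_gpuva *va)
    {
            struct drm_gem_object *obj = va->gem.obj;
            struct drm_gpuvm_bo *vm_bo = va->vm_bo;

            if (unlikely(!obj))
                    return;

            drm_gem_gpuva_assert_lock_held(obj);
            list_del_init(&va->gem.entry);

            va->vm_bo = NULL;
            /* No extra drm_gem_object_get()/_put() pair needed: the caller
             * holds the GEM's gpuva lock and thus a reference on obj.
             */
            drm_gpuvm_bo_put(vm_bo);
    }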

>
> Besides, if the vm_bo's reference on obj is otherwise the last one, it will still be freed before the function exits.
>
> /Thomas
>
>