2014-07-09 12:29:09

by Maarten Lankhorst

Subject: [PATCH 00/17] Convert TTM to the new fence interface.

This series applies on top of the driver-core-next branch of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git

Before converting ttm to the new fence interface I had to fix some
drivers to require a reservation before poking at fence_obj.
After flipping the switch, RCU becomes available instead, and
the extra reservations can be dropped again. :-)
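
For reference, the interim pattern those driver fixes enforce looks roughly
like the sketch below; the helper name and error handling are made up, only
the ttm calls (with the argument order used elsewhere in this series) are
real:

    /* hypothetical helper: wait for a bo to go idle, pre-RCU style */
    static int example_bo_wait_idle(struct ttm_buffer_object *bo, bool intr)
    {
            int ret;

            ret = ttm_bo_reserve(bo, intr, false, false, NULL);
            if (ret)
                    return ret;

            /* sync_obj may only be inspected while the bo is reserved */
            ret = ttm_bo_wait(bo, false, intr, false);

            ttm_bo_unreserve(bo);
            return ret;
    }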

I've done at least basic testing on all the drivers I've converted
at some point, but more testing is definitely welcome!

---

Maarten Lankhorst (17):
drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers
drm/ttm: kill off some members to ttm_validate_buffer
drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence
drm/ttm: call ttm_bo_wait while inside a reservation
drm/ttm: kill fence_lock
drm/nouveau: rework to new fence interface
drm/radeon: add timeout argument to radeon_fence_wait_seq
drm/radeon: use common fence implementation for fences
drm/qxl: rework to new fence interface
drm/vmwgfx: get rid of different types of fence_flags entirely
drm/vmwgfx: rework to new fence interface
drm/ttm: flip the switch, and convert to dma_fence
drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep
drm/radeon: use rcu waits in some ioctls
drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab
drm/ttm: use rcu in core ttm

drivers/gpu/drm/nouveau/core/core/event.c | 4
drivers/gpu/drm/nouveau/nouveau_bo.c | 59 +---
drivers/gpu/drm/nouveau/nouveau_display.c | 25 +-
drivers/gpu/drm/nouveau/nouveau_fence.c | 431 +++++++++++++++++++----------
drivers/gpu/drm/nouveau/nouveau_fence.h | 22 +
drivers/gpu/drm/nouveau/nouveau_gem.c | 55 +---
drivers/gpu/drm/nouveau/nv04_fence.c | 4
drivers/gpu/drm/nouveau/nv10_fence.c | 4
drivers/gpu/drm/nouveau/nv17_fence.c | 2
drivers/gpu/drm/nouveau/nv50_fence.c | 2
drivers/gpu/drm/nouveau/nv84_fence.c | 11 -
drivers/gpu/drm/qxl/Makefile | 2
drivers/gpu/drm/qxl/qxl_cmd.c | 7
drivers/gpu/drm/qxl/qxl_debugfs.c | 16 +
drivers/gpu/drm/qxl/qxl_drv.h | 20 -
drivers/gpu/drm/qxl/qxl_fence.c | 91 ------
drivers/gpu/drm/qxl/qxl_kms.c | 1
drivers/gpu/drm/qxl/qxl_object.c | 2
drivers/gpu/drm/qxl/qxl_object.h | 6
drivers/gpu/drm/qxl/qxl_release.c | 172 ++++++++++--
drivers/gpu/drm/qxl/qxl_ttm.c | 93 ------
drivers/gpu/drm/radeon/radeon.h | 15 -
drivers/gpu/drm/radeon/radeon_cs.c | 10 +
drivers/gpu/drm/radeon/radeon_device.c | 60 ++++
drivers/gpu/drm/radeon/radeon_display.c | 21 +
drivers/gpu/drm/radeon/radeon_fence.c | 283 +++++++++++++++----
drivers/gpu/drm/radeon/radeon_gem.c | 19 +
drivers/gpu/drm/radeon/radeon_object.c | 8 -
drivers/gpu/drm/radeon/radeon_ttm.c | 34 --
drivers/gpu/drm/radeon/radeon_uvd.c | 10 -
drivers/gpu/drm/radeon/radeon_vm.c | 16 +
drivers/gpu/drm/ttm/ttm_bo.c | 187 ++++++-------
drivers/gpu/drm/ttm/ttm_bo_util.c | 28 --
drivers/gpu/drm/ttm/ttm_bo_vm.c | 3
drivers/gpu/drm/ttm/ttm_execbuf_util.c | 146 +++-------
drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 47 ---
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 1
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 24 --
drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 329 ++++++++++++----------
drivers/gpu/drm/vmwgfx/vmwgfx_fence.h | 35 +-
drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 43 +--
include/drm/ttm/ttm_bo_api.h | 7
include/drm/ttm/ttm_bo_driver.h | 29 --
include/drm/ttm/ttm_execbuf_util.h | 22 +
44 files changed, 1256 insertions(+), 1150 deletions(-)
delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c



2014-07-09 12:29:16

by Maarten Lankhorst

Subject: [PATCH 01/17] drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers

It seems some drivers really want this as a parameter,
like vmwgfx.
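
For illustration, a caller can now pick either behaviour; the list and
ticket names below are made up, only ttm_eu_reserve_buffers() is real:

    /* interruptible: may return -ERESTARTSYS if a signal arrives */
    ret = ttm_eu_reserve_buffers(&ticket, &validate_list, true);

    /* uninterruptible: replaces e.g. vmwgfx's retry loop on -ERESTARTSYS */
    ret = ttm_eu_reserve_buffers(&ticket, &validate_list, false);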

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/qxl/qxl_release.c | 2 +-
drivers/gpu/drm/radeon/radeon_object.c | 2 +-
drivers/gpu/drm/radeon/radeon_uvd.c | 2 +-
drivers/gpu/drm/radeon/radeon_vm.c | 2 +-
drivers/gpu/drm/ttm/ttm_execbuf_util.c | 22 +++++++++++++---------
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 7 ++-----
drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 2 +-
include/drm/ttm/ttm_execbuf_util.h | 9 +++++----
8 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 14e776f1d14e..2b43e5deb051 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -159,7 +159,7 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
if (list_is_singular(&release->bos))
return 0;

- ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos);
+ ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos, !no_intr);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 6c717b257d6d..a3ed725ea641 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -438,7 +438,7 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
u64 bytes_moved = 0, initial_bytes_moved;
u64 bytes_moved_threshold = radeon_bo_get_threshold_for_moves(rdev);

- r = ttm_eu_reserve_buffers(ticket, head);
+ r = ttm_eu_reserve_buffers(ticket, head, true);
if (unlikely(r != 0)) {
return r;
}
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c
index a4ad270e8261..67b2a367df40 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -620,7 +620,7 @@ static int radeon_uvd_send_msg(struct radeon_device *rdev,
INIT_LIST_HEAD(&head);
list_add(&tv.head, &head);

- r = ttm_eu_reserve_buffers(&ticket, &head);
+ r = ttm_eu_reserve_buffers(&ticket, &head, true);
if (r)
return r;

diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index eecff6bbd341..4c68852c3e72 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -364,7 +364,7 @@ static int radeon_vm_clear_bo(struct radeon_device *rdev,
INIT_LIST_HEAD(&head);
list_add(&tv.head, &head);

- r = ttm_eu_reserve_buffers(&ticket, &head);
+ r = ttm_eu_reserve_buffers(&ticket, &head, true);
if (r)
return r;

diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index e8dac8758528..39a11bbd2bac 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -112,7 +112,7 @@ EXPORT_SYMBOL(ttm_eu_backoff_reservation);
*/

int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
- struct list_head *list)
+ struct list_head *list, bool intr)
{
struct ttm_bo_global *glob;
struct ttm_validate_buffer *entry;
@@ -140,7 +140,7 @@ retry:
if (entry->reserved)
continue;

- ret = __ttm_bo_reserve(bo, true, (ticket == NULL), true,
+ ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), true,
ticket);

if (ret == -EDEADLK) {
@@ -153,13 +153,17 @@ retry:
ttm_eu_backoff_reservation_locked(list);
spin_unlock(&glob->lru_lock);
ttm_eu_list_ref_sub(list);
- ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
- ticket);
- if (unlikely(ret != 0)) {
- if (ret == -EINTR)
- ret = -ERESTARTSYS;
- goto err_fini;
- }
+
+ if (intr) {
+ ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
+ ticket);
+ if (unlikely(ret != 0)) {
+ if (ret == -EINTR)
+ ret = -ERESTARTSYS;
+ goto err_fini;
+ }
+ } else
+ ww_mutex_lock_slow(&bo->resv->lock, ticket);

entry->reserved = true;
if (unlikely(atomic_read(&bo->cpu_writers) > 0)) {
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 87df0b3674fd..5d7d2e00296b 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -2465,7 +2465,7 @@ int vmw_execbuf_process(struct drm_file *file_priv,
if (unlikely(ret != 0))
goto out_err_nores;

- ret = ttm_eu_reserve_buffers(&ticket, &sw_context->validate_nodes);
+ ret = ttm_eu_reserve_buffers(&ticket, &sw_context->validate_nodes, true);
if (unlikely(ret != 0))
goto out_err;

@@ -2655,10 +2655,7 @@ void __vmw_execbuf_release_pinned_bo(struct vmw_private *dev_priv,
query_val.bo = ttm_bo_reference(dev_priv->dummy_query_bo);
list_add_tail(&query_val.head, &validate_list);

- do {
- ret = ttm_eu_reserve_buffers(&ticket, &validate_list);
- } while (ret == -ERESTARTSYS);
-
+ ret = ttm_eu_reserve_buffers(&ticket, &validate_list, false);
if (unlikely(ret != 0)) {
vmw_execbuf_unpin_panic(dev_priv);
goto out_no_reserve;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 01d68f0a69dc..873613a16f72 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -1215,7 +1215,7 @@ vmw_resource_check_buffer(struct vmw_resource *res,
INIT_LIST_HEAD(&val_list);
val_buf->bo = ttm_bo_reference(&res->backup->base);
list_add_tail(&val_buf->head, &val_list);
- ret = ttm_eu_reserve_buffers(NULL, &val_list);
+ ret = ttm_eu_reserve_buffers(NULL, &val_list, interruptible);
if (unlikely(ret != 0))
goto out_no_reserve;

diff --git a/include/drm/ttm/ttm_execbuf_util.h b/include/drm/ttm/ttm_execbuf_util.h
index 16db7d01a336..fd95fd569ca3 100644
--- a/include/drm/ttm/ttm_execbuf_util.h
+++ b/include/drm/ttm/ttm_execbuf_util.h
@@ -73,6 +73,7 @@ extern void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
* @ticket: [out] ww_acquire_ctx filled in by call, or NULL if only
* non-blocking reserves should be tried.
* @list: thread private list of ttm_validate_buffer structs.
+ * @intr: should the wait be interruptible
*
* Tries to reserve bos pointed to by the list entries for validation.
* If the function returns 0, all buffers are marked as "unfenced",
@@ -84,9 +85,9 @@ extern void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
* CPU write reservations to be cleared, and for other threads to
* unreserve their buffers.
*
- * This function may return -ERESTART or -EAGAIN if the calling process
- * receives a signal while waiting. In that case, no buffers on the list
- * will be reserved upon return.
+ * If intr is set to true, this function may return -ERESTARTSYS if the
+ * calling process receives a signal while waiting. In that case, no
+ * buffers on the list will be reserved upon return.
*
* Buffers reserved by this function should be unreserved by
* a call to either ttm_eu_backoff_reservation() or
@@ -95,7 +96,7 @@ extern void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
*/

extern int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
- struct list_head *list);
+ struct list_head *list, bool intr);

/**
* function ttm_eu_fence_buffer_objects.

2014-07-09 12:29:19

by Maarten Lankhorst

Subject: [PATCH 02/17] drm/ttm: kill off some members to ttm_validate_buffer

This reorders the list during reservation to keep track of which buffers
are reserved: all entries before the current one are always reserved, so
backing off on contention is a simple reverse walk.

This gets rid of some bookkeeping that's no longer needed,
and simplifies the code somewhat.
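
The new loop is roughly shaped like the sketch below (simplified: the
slow-path locking, error returns and the cpu_writers check are left out);
__ttm_bo_reserve() and ttm_eu_backoff_reservation_reverse() are the real
functions from the diff:

    list_for_each_entry(entry, list, head) {
            ret = __ttm_bo_reserve(entry->bo, intr, (ticket == NULL), true,
                                   ticket);
            if (!ret)
                    continue;

            /* everything before 'entry' is reserved, so back those off */
            ttm_eu_backoff_reservation_reverse(list, entry);

            /* ... slow-path reserve entry->bo here ... */

            /* move the contended entry to the front: the resumed walk then
             * revisits the backed-off entries but never this one */
            list_del(&entry->head);
            list_add(&entry->head, list);
    }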

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/qxl/qxl_release.c | 1
drivers/gpu/drm/ttm/ttm_execbuf_util.c | 142 +++++++++++--------------------
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 1
include/drm/ttm/ttm_execbuf_util.h | 3 -
4 files changed, 50 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 2b43e5deb051..e85c4d274dc0 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -350,7 +350,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)

ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
- entry->reserved = false;
}
spin_unlock(&bdev->fence_lock);
spin_unlock(&glob->lru_lock);
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 39a11bbd2bac..6db47a72667e 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -32,20 +32,12 @@
#include <linux/sched.h>
#include <linux/module.h>

-static void ttm_eu_backoff_reservation_locked(struct list_head *list)
+static void ttm_eu_backoff_reservation_reverse(struct list_head *list,
+ struct ttm_validate_buffer *entry)
{
- struct ttm_validate_buffer *entry;
-
- list_for_each_entry(entry, list, head) {
+ list_for_each_entry_continue_reverse(entry, list, head) {
struct ttm_buffer_object *bo = entry->bo;
- if (!entry->reserved)
- continue;

- entry->reserved = false;
- if (entry->removed) {
- ttm_bo_add_to_lru(bo);
- entry->removed = false;
- }
__ttm_bo_unreserve(bo);
}
}
@@ -56,27 +48,9 @@ static void ttm_eu_del_from_lru_locked(struct list_head *list)

list_for_each_entry(entry, list, head) {
struct ttm_buffer_object *bo = entry->bo;
- if (!entry->reserved)
- continue;
+ unsigned put_count = ttm_bo_del_from_lru(bo);

- if (!entry->removed) {
- entry->put_count = ttm_bo_del_from_lru(bo);
- entry->removed = true;
- }
- }
-}
-
-static void ttm_eu_list_ref_sub(struct list_head *list)
-{
- struct ttm_validate_buffer *entry;
-
- list_for_each_entry(entry, list, head) {
- struct ttm_buffer_object *bo = entry->bo;
-
- if (entry->put_count) {
- ttm_bo_list_ref_sub(bo, entry->put_count, true);
- entry->put_count = 0;
- }
+ ttm_bo_list_ref_sub(bo, put_count, true);
}
}

@@ -91,11 +65,18 @@ void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,

entry = list_first_entry(list, struct ttm_validate_buffer, head);
glob = entry->bo->glob;
+
spin_lock(&glob->lru_lock);
- ttm_eu_backoff_reservation_locked(list);
+ list_for_each_entry(entry, list, head) {
+ struct ttm_buffer_object *bo = entry->bo;
+
+ ttm_bo_add_to_lru(bo);
+ __ttm_bo_unreserve(bo);
+ }
+ spin_unlock(&glob->lru_lock);
+
if (ticket)
ww_acquire_fini(ticket);
- spin_unlock(&glob->lru_lock);
}
EXPORT_SYMBOL(ttm_eu_backoff_reservation);

@@ -121,64 +102,55 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
if (list_empty(list))
return 0;

- list_for_each_entry(entry, list, head) {
- entry->reserved = false;
- entry->put_count = 0;
- entry->removed = false;
- }
-
entry = list_first_entry(list, struct ttm_validate_buffer, head);
glob = entry->bo->glob;

if (ticket)
ww_acquire_init(ticket, &reservation_ww_class);
-retry:
+
list_for_each_entry(entry, list, head) {
struct ttm_buffer_object *bo = entry->bo;

- /* already slowpath reserved? */
- if (entry->reserved)
- continue;
-
ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), true,
ticket);
+ if (!ret && unlikely(atomic_read(&bo->cpu_writers) > 0)) {
+ __ttm_bo_unreserve(bo);

- if (ret == -EDEADLK) {
- /* uh oh, we lost out, drop every reservation and try
- * to only reserve this buffer, then start over if
- * this succeeds.
- */
- BUG_ON(ticket == NULL);
- spin_lock(&glob->lru_lock);
- ttm_eu_backoff_reservation_locked(list);
- spin_unlock(&glob->lru_lock);
- ttm_eu_list_ref_sub(list);
-
- if (intr) {
- ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
- ticket);
- if (unlikely(ret != 0)) {
- if (ret == -EINTR)
- ret = -ERESTARTSYS;
- goto err_fini;
- }
- } else
- ww_mutex_lock_slow(&bo->resv->lock, ticket);
-
- entry->reserved = true;
- if (unlikely(atomic_read(&bo->cpu_writers) > 0)) {
- ret = -EBUSY;
- goto err;
- }
- goto retry;
- } else if (ret)
- goto err;
-
- entry->reserved = true;
- if (unlikely(atomic_read(&bo->cpu_writers) > 0)) {
ret = -EBUSY;
- goto err;
}
+
+ if (!ret)
+ continue;
+
+ /* uh oh, we lost out, drop every reservation and try
+ * to only reserve this buffer, then start over if
+ * this succeeds.
+ */
+ ttm_eu_backoff_reservation_reverse(list, entry);
+
+ if (ret == -EDEADLK && intr) {
+ ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
+ ticket);
+ } else if (ret == -EDEADLK) {
+ ww_mutex_lock_slow(&bo->resv->lock, ticket);
+ ret = 0;
+ }
+
+ if (unlikely(ret != 0)) {
+ if (ret == -EINTR)
+ ret = -ERESTARTSYS;
+ if (ticket) {
+ ww_acquire_done(ticket);
+ ww_acquire_fini(ticket);
+ }
+ return ret;
+ }
+
+ /* move this item to the front of the list,
+ * forces correct iteration of the loop without keeping track
+ */
+ list_del(&entry->head);
+ list_add(&entry->head, list);
}

if (ticket)
@@ -186,20 +158,7 @@ retry:
spin_lock(&glob->lru_lock);
ttm_eu_del_from_lru_locked(list);
spin_unlock(&glob->lru_lock);
- ttm_eu_list_ref_sub(list);
return 0;
-
-err:
- spin_lock(&glob->lru_lock);
- ttm_eu_backoff_reservation_locked(list);
- spin_unlock(&glob->lru_lock);
- ttm_eu_list_ref_sub(list);
-err_fini:
- if (ticket) {
- ww_acquire_done(ticket);
- ww_acquire_fini(ticket);
- }
- return ret;
}
EXPORT_SYMBOL(ttm_eu_reserve_buffers);

@@ -229,7 +188,6 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
bo->sync_obj = driver->sync_obj_ref(sync_obj);
ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
- entry->reserved = false;
}
spin_unlock(&bdev->fence_lock);
spin_unlock(&glob->lru_lock);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 5d7d2e00296b..f8b25bc4e634 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -346,7 +346,6 @@ static int vmw_bo_to_validate_list(struct vmw_sw_context *sw_context,
++sw_context->cur_val_buf;
val_buf = &vval_buf->base;
val_buf->bo = ttm_bo_reference(bo);
- val_buf->reserved = false;
list_add_tail(&val_buf->head, &sw_context->validate_nodes);
vval_buf->validate_as_mob = validate_as_mob;
}
diff --git a/include/drm/ttm/ttm_execbuf_util.h b/include/drm/ttm/ttm_execbuf_util.h
index fd95fd569ca3..8490cb8ee0d8 100644
--- a/include/drm/ttm/ttm_execbuf_util.h
+++ b/include/drm/ttm/ttm_execbuf_util.h
@@ -48,9 +48,6 @@
struct ttm_validate_buffer {
struct list_head head;
struct ttm_buffer_object *bo;
- bool reserved;
- bool removed;
- int put_count;
void *old_sync_obj;
};

2014-07-09 12:29:33

by Maarten Lankhorst

Subject: [PATCH 04/17] drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence

This will ensure we always hold the required lock when calling those functions.
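
The requirement is enforced with lockdep_assert_held(), so a call site is
now expected to look roughly like this (hypothetical caller; only the
nouveau/ttm functions are real):

    ret = ttm_bo_reserve(&nvbo->bo, true, false, false, NULL);
    if (ret)
            return ret;
    nouveau_bo_fence(nvbo, fence);  /* reservation held, assert satisfied */
    ttm_bo_unreserve(&nvbo->bo);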
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 2 ++
drivers/gpu/drm/nouveau/nouveau_display.c | 17 +++++++++++++----
2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b6dc85c614be..33eb7164525a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1431,6 +1431,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
struct nouveau_fence *new_fence = nouveau_fence_ref(fence);
struct nouveau_fence *old_fence = NULL;

+ lockdep_assert_held(&nvbo->bo.resv->lock.base);
+
spin_lock(&nvbo->bo.bdev->fence_lock);
old_fence = nvbo->bo.sync_obj;
nvbo->bo.sync_obj = new_fence;
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 47ad74255bf1..826b66c44235 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -716,6 +716,9 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
}

mutex_lock(&chan->cli->mutex);
+ ret = ttm_bo_reserve(&new_bo->bo, true, false, false, NULL);
+ if (ret)
+ goto fail_unpin;

/* synchronise rendering channel with the kernel's channel */
spin_lock(&new_bo->bo.bdev->fence_lock);
@@ -723,12 +726,18 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
spin_unlock(&new_bo->bo.bdev->fence_lock);
ret = nouveau_fence_sync(fence, chan);
nouveau_fence_unref(&fence);
- if (ret)
+ if (ret) {
+ ttm_bo_unreserve(&new_bo->bo);
goto fail_unpin;
+ }

- ret = ttm_bo_reserve(&old_bo->bo, true, false, false, NULL);
- if (ret)
- goto fail_unpin;
+ if (new_bo != old_bo) {
+ ttm_bo_unreserve(&new_bo->bo);
+
+ ret = ttm_bo_reserve(&old_bo->bo, true, false, false, NULL);
+ if (ret)
+ goto fail_unpin;
+ }

/* Initialize a page flip struct */
*s = (struct nouveau_page_flip_state)

2014-07-09 12:29:38

by Maarten Lankhorst

Subject: [PATCH 05/17] drm/ttm: call ttm_bo_wait while inside a reservation

This is the last remaining function that doesn't rely entirely on the
reservation lock to fence off access to a buffer.
---
drivers/gpu/drm/ttm/ttm_bo.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 4ab9f7171c4f..d7d34336f108 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -502,17 +502,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
if (ret)
return ret;

- /*
- * remove sync_obj with ttm_bo_wait, the wait should be
- * finished, and no new wait object should have been added.
- */
- spin_lock(&bdev->fence_lock);
- ret = ttm_bo_wait(bo, false, false, true);
- WARN_ON(ret);
- spin_unlock(&bdev->fence_lock);
- if (ret)
- return ret;
-
spin_lock(&glob->lru_lock);
ret = __ttm_bo_reserve(bo, false, true, false, 0);

@@ -528,8 +517,16 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
spin_unlock(&glob->lru_lock);
return 0;
}
- } else
- spin_unlock(&bdev->fence_lock);
+
+ /*
+ * remove sync_obj with ttm_bo_wait, the wait should be
+ * finished, and no new wait object should have been added.
+ */
+ spin_lock(&bdev->fence_lock);
+ ret = ttm_bo_wait(bo, false, false, true);
+ WARN_ON(ret);
+ }
+ spin_unlock(&bdev->fence_lock);

if (ret || unlikely(list_empty(&bo->ddestroy))) {
__ttm_bo_unreserve(bo);
@@ -1539,6 +1536,8 @@ int ttm_bo_wait(struct ttm_buffer_object *bo,
void *sync_obj;
int ret = 0;

+ lockdep_assert_held(&bo->resv->lock.base);
+
if (likely(bo->sync_obj == NULL))
return 0;

2014-07-09 12:29:25

by Maarten Lankhorst

Subject: [PATCH 03/17] drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep

Apart from some code inside ttm itself and nouveau_bo_vma_del,
this is the only place where ttm_bo_wait is used without a reservation.
Fix this so we can remove the fence_lock later on.

After the switch to rcu the reservation lock will be
removed again.
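
In short, the ioctl now samples the fence under the reservation and does
any blocking wait after dropping it (sketch of the hunk below, with the
fence_lock handling left out since it disappears later in the series):

    ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
    if (!ret) {
            /* non-blocking check; take a reference if still busy */
            ret = ttm_bo_wait(&nvbo->bo, true, true, true);
            if (!no_wait && ret)
                    fence = nouveau_fence_ref(nvbo->bo.sync_obj);
            ttm_bo_unreserve(&nvbo->bo);
    }

    if (fence) {
            /* blocking wait without holding the reservation */
            ret = nouveau_fence_wait(fence, true, no_wait);
            nouveau_fence_unref(&fence);
    }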

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_gem.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index c90c0dc0afe8..6e1c58a880fe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -886,17 +886,31 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
struct drm_gem_object *gem;
struct nouveau_bo *nvbo;
bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
- int ret = -EINVAL;
+ int ret;
+ struct nouveau_fence *fence = NULL;

gem = drm_gem_object_lookup(dev, file_priv, req->handle);
if (!gem)
return -ENOENT;
nvbo = nouveau_gem_object(gem);

- spin_lock(&nvbo->bo.bdev->fence_lock);
- ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait);
- spin_unlock(&nvbo->bo.bdev->fence_lock);
+ ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
+ if (!ret) {
+ spin_lock(&nvbo->bo.bdev->fence_lock);
+ ret = ttm_bo_wait(&nvbo->bo, true, true, true);
+ if (!no_wait && ret)
+ fence = nouveau_fence_ref(nvbo->bo.sync_obj);
+ spin_unlock(&nvbo->bo.bdev->fence_lock);
+
+ ttm_bo_unreserve(&nvbo->bo);
+ }
drm_gem_object_unreference_unlocked(gem);
+
+ if (fence) {
+ ret = nouveau_fence_wait(fence, true, no_wait);
+ nouveau_fence_unref(&fence);
+ }
+
return ret;
}

2014-07-09 12:29:44

by Maarten Lankhorst

Subject: [PATCH 06/17] drm/ttm: kill fence_lock

No users are left, kill it off! :D
Conversion to the reservation api is next on the list; after
that, the functionality can be restored with rcu.
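
With the spinlock gone, sync_obj is protected by the reservation alone;
nouveau's validate_sync() for instance shrinks to roughly this (the caller
already holds the reservation):

    static int
    validate_sync(struct nouveau_channel *chan, struct nouveau_bo *nvbo)
    {
            /* no fence_lock needed any more, nvbo is reserved by the caller */
            struct nouveau_fence *fence = nvbo->bo.sync_obj;

            return fence ? nouveau_fence_sync(fence, chan) : 0;
    }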

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 25 +++-------
drivers/gpu/drm/nouveau/nouveau_display.c | 6 --
drivers/gpu/drm/nouveau/nouveau_gem.c | 16 +-----
drivers/gpu/drm/qxl/qxl_cmd.c | 2 -
drivers/gpu/drm/qxl/qxl_fence.c | 4 --
drivers/gpu/drm/qxl/qxl_object.h | 2 -
drivers/gpu/drm/qxl/qxl_release.c | 2 -
drivers/gpu/drm/radeon/radeon_display.c | 12 +++--
drivers/gpu/drm/radeon/radeon_object.c | 2 -
drivers/gpu/drm/ttm/ttm_bo.c | 75 +++++++----------------------
drivers/gpu/drm/ttm/ttm_bo_util.c | 5 --
drivers/gpu/drm/ttm/ttm_bo_vm.c | 3 -
drivers/gpu/drm/ttm/ttm_execbuf_util.c | 2 -
drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 4 --
drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 17 ++-----
include/drm/ttm/ttm_bo_api.h | 5 --
include/drm/ttm/ttm_bo_driver.h | 3 -
17 files changed, 45 insertions(+), 140 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 33eb7164525a..e98af2e9a1cb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1196,9 +1196,7 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict, bool intr,
}

/* Fallback to software copy. */
- spin_lock(&bo->bdev->fence_lock);
ret = ttm_bo_wait(bo, true, intr, no_wait_gpu);
- spin_unlock(&bo->bdev->fence_lock);
if (ret == 0)
ret = ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);

@@ -1425,26 +1423,19 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
ttm_pool_unpopulate(ttm);
}

+static void
+nouveau_bo_fence_unref(void **sync_obj)
+{
+ nouveau_fence_unref((struct nouveau_fence **)sync_obj);
+}
+
void
nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
{
- struct nouveau_fence *new_fence = nouveau_fence_ref(fence);
- struct nouveau_fence *old_fence = NULL;
-
lockdep_assert_held(&nvbo->bo.resv->lock.base);

- spin_lock(&nvbo->bo.bdev->fence_lock);
- old_fence = nvbo->bo.sync_obj;
- nvbo->bo.sync_obj = new_fence;
- spin_unlock(&nvbo->bo.bdev->fence_lock);
-
- nouveau_fence_unref(&old_fence);
-}
-
-static void
-nouveau_bo_fence_unref(void **sync_obj)
-{
- nouveau_fence_unref((struct nouveau_fence **)sync_obj);
+ nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
+ nvbo->bo.sync_obj = nouveau_fence_ref(fence);
}

static void *
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 826b66c44235..7928f8f07334 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -721,11 +721,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
goto fail_unpin;

/* synchronise rendering channel with the kernel's channel */
- spin_lock(&new_bo->bo.bdev->fence_lock);
- fence = nouveau_fence_ref(new_bo->bo.sync_obj);
- spin_unlock(&new_bo->bo.bdev->fence_lock);
- ret = nouveau_fence_sync(fence, chan);
- nouveau_fence_unref(&fence);
+ ret = nouveau_fence_sync(new_bo->bo.sync_obj, chan);
if (ret) {
ttm_bo_unreserve(&new_bo->bo);
goto fail_unpin;
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 6e1c58a880fe..6cd5298cbb53 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -105,9 +105,7 @@ nouveau_gem_object_unmap(struct nouveau_bo *nvbo, struct nouveau_vma *vma)
list_del(&vma->head);

if (mapped) {
- spin_lock(&nvbo->bo.bdev->fence_lock);
fence = nouveau_fence_ref(nvbo->bo.sync_obj);
- spin_unlock(&nvbo->bo.bdev->fence_lock);
}

if (fence) {
@@ -432,17 +430,11 @@ retry:
static int
validate_sync(struct nouveau_channel *chan, struct nouveau_bo *nvbo)
{
- struct nouveau_fence *fence = NULL;
+ struct nouveau_fence *fence = nvbo->bo.sync_obj;
int ret = 0;

- spin_lock(&nvbo->bo.bdev->fence_lock);
- fence = nouveau_fence_ref(nvbo->bo.sync_obj);
- spin_unlock(&nvbo->bo.bdev->fence_lock);
-
- if (fence) {
+ if (fence)
ret = nouveau_fence_sync(fence, chan);
- nouveau_fence_unref(&fence);
- }

return ret;
}
@@ -661,9 +653,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
data |= r->vor;
}

- spin_lock(&nvbo->bo.bdev->fence_lock);
ret = ttm_bo_wait(&nvbo->bo, false, false, false);
- spin_unlock(&nvbo->bo.bdev->fence_lock);
if (ret) {
NV_ERROR(cli, "reloc wait_idle failed: %d\n", ret);
break;
@@ -896,11 +886,9 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,

ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
if (!ret) {
- spin_lock(&nvbo->bo.bdev->fence_lock);
ret = ttm_bo_wait(&nvbo->bo, true, true, true);
if (!no_wait && ret)
fence = nouveau_fence_ref(nvbo->bo.sync_obj);
- spin_unlock(&nvbo->bo.bdev->fence_lock);

ttm_bo_unreserve(&nvbo->bo);
}
diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index eb89653a7a17..45fad7b45486 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -628,9 +628,7 @@ static int qxl_reap_surf(struct qxl_device *qdev, struct qxl_bo *surf, bool stal
if (stall)
mutex_unlock(&qdev->surf_evict_mutex);

- spin_lock(&surf->tbo.bdev->fence_lock);
ret = ttm_bo_wait(&surf->tbo, true, true, !stall);
- spin_unlock(&surf->tbo.bdev->fence_lock);

if (stall)
mutex_lock(&qdev->surf_evict_mutex);
diff --git a/drivers/gpu/drm/qxl/qxl_fence.c b/drivers/gpu/drm/qxl/qxl_fence.c
index ae59e91cfb9a..c7248418117d 100644
--- a/drivers/gpu/drm/qxl/qxl_fence.c
+++ b/drivers/gpu/drm/qxl/qxl_fence.c
@@ -60,9 +60,6 @@ int qxl_fence_remove_release(struct qxl_fence *qfence, uint32_t rel_id)
{
void *ret;
int retval = 0;
- struct qxl_bo *bo = container_of(qfence, struct qxl_bo, fence);
-
- spin_lock(&bo->tbo.bdev->fence_lock);

ret = radix_tree_delete(&qfence->tree, rel_id);
if (ret == qfence)
@@ -71,7 +68,6 @@ int qxl_fence_remove_release(struct qxl_fence *qfence, uint32_t rel_id)
DRM_DEBUG("didn't find fence in radix tree for %d\n", rel_id);
retval = -ENOENT;
}
- spin_unlock(&bo->tbo.bdev->fence_lock);
return retval;
}

diff --git a/drivers/gpu/drm/qxl/qxl_object.h b/drivers/gpu/drm/qxl/qxl_object.h
index d458a140c024..98395b223ad0 100644
--- a/drivers/gpu/drm/qxl/qxl_object.h
+++ b/drivers/gpu/drm/qxl/qxl_object.h
@@ -76,12 +76,10 @@ static inline int qxl_bo_wait(struct qxl_bo *bo, u32 *mem_type,
}
return r;
}
- spin_lock(&bo->tbo.bdev->fence_lock);
if (mem_type)
*mem_type = bo->tbo.mem.mem_type;
if (bo->tbo.sync_obj)
r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
- spin_unlock(&bo->tbo.bdev->fence_lock);
ttm_bo_unreserve(&bo->tbo);
return r;
}
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index e85c4d274dc0..4045ba873ab8 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -337,7 +337,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
glob = bo->glob;

spin_lock(&glob->lru_lock);
- spin_lock(&bdev->fence_lock);

list_for_each_entry(entry, &release->bos, head) {
bo = entry->bo;
@@ -351,7 +350,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
}
- spin_unlock(&bdev->fence_lock);
spin_unlock(&glob->lru_lock);
ww_acquire_fini(&release->ticket);
}
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 13896edcf0b6..fb3c08dced85 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -544,10 +544,16 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc,
obj = new_radeon_fb->obj;
work->new_rbo = gem_to_radeon_bo(obj);

- spin_lock(&work->new_rbo->tbo.bdev->fence_lock);
- if (work->new_rbo->tbo.sync_obj)
+ if (work->new_rbo->tbo.sync_obj) {
+ int ret = ttm_bo_reserve(&work->new_rbo->tbo, true, false, false, NULL);
+ if (ret) {
+ drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base);
+ kfree(work);
+ return ret;
+ }
work->fence = radeon_fence_ref(work->new_rbo->tbo.sync_obj);
- spin_unlock(&work->new_rbo->tbo.bdev->fence_lock);
+ ttm_bo_unreserve(&work->new_rbo->tbo);
+ }

/* We borrow the event spin lock for protecting flip_work */
spin_lock_irqsave(&crtc->dev->event_lock, flags);
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index a3ed725ea641..8538aebb6580 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -734,12 +734,10 @@ int radeon_bo_wait(struct radeon_bo *bo, u32 *mem_type, bool no_wait)
r = ttm_bo_reserve(&bo->tbo, true, no_wait, false, NULL);
if (unlikely(r != 0))
return r;
- spin_lock(&bo->tbo.bdev->fence_lock);
if (mem_type)
*mem_type = bo->tbo.mem.mem_type;
if (bo->tbo.sync_obj)
r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
- spin_unlock(&bo->tbo.bdev->fence_lock);
ttm_bo_unreserve(&bo->tbo);
return r;
}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index d7d34336f108..ce0434377223 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -414,24 +414,20 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
spin_lock(&glob->lru_lock);
ret = __ttm_bo_reserve(bo, false, true, false, 0);

- spin_lock(&bdev->fence_lock);
- (void) ttm_bo_wait(bo, false, false, true);
- if (!ret && !bo->sync_obj) {
- spin_unlock(&bdev->fence_lock);
- put_count = ttm_bo_del_from_lru(bo);
+ if (!ret) {
+ (void) ttm_bo_wait(bo, false, false, true);

- spin_unlock(&glob->lru_lock);
- ttm_bo_cleanup_memtype_use(bo);
+ if (!bo->sync_obj) {
+ put_count = ttm_bo_del_from_lru(bo);

- ttm_bo_list_ref_sub(bo, put_count, true);
+ spin_unlock(&glob->lru_lock);
+ ttm_bo_cleanup_memtype_use(bo);

- return;
- }
- if (bo->sync_obj)
- sync_obj = driver->sync_obj_ref(bo->sync_obj);
- spin_unlock(&bdev->fence_lock);
+ ttm_bo_list_ref_sub(bo, put_count, true);

- if (!ret) {
+ return;
+ }
+ sync_obj = driver->sync_obj_ref(bo->sync_obj);

/*
* Make NO_EVICT bos immediately available to
@@ -480,7 +476,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
int put_count;
int ret;

- spin_lock(&bdev->fence_lock);
ret = ttm_bo_wait(bo, false, false, true);

if (ret && !no_wait_gpu) {
@@ -492,7 +487,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
* no new sync objects can be attached.
*/
sync_obj = driver->sync_obj_ref(bo->sync_obj);
- spin_unlock(&bdev->fence_lock);

__ttm_bo_unreserve(bo);
spin_unlock(&glob->lru_lock);
@@ -522,11 +516,9 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
* remove sync_obj with ttm_bo_wait, the wait should be
* finished, and no new wait object should have been added.
*/
- spin_lock(&bdev->fence_lock);
ret = ttm_bo_wait(bo, false, false, true);
WARN_ON(ret);
}
- spin_unlock(&bdev->fence_lock);

if (ret || unlikely(list_empty(&bo->ddestroy))) {
__ttm_bo_unreserve(bo);
@@ -664,9 +656,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, bool interruptible,
struct ttm_placement placement;
int ret = 0;

- spin_lock(&bdev->fence_lock);
ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu);
- spin_unlock(&bdev->fence_lock);

if (unlikely(ret != 0)) {
if (ret != -ERESTARTSYS) {
@@ -963,7 +953,6 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
{
int ret = 0;
struct ttm_mem_reg mem;
- struct ttm_bo_device *bdev = bo->bdev;

lockdep_assert_held(&bo->resv->lock.base);

@@ -972,9 +961,7 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
* Have the driver move function wait for idle when necessary,
* instead of doing it here.
*/
- spin_lock(&bdev->fence_lock);
ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu);
- spin_unlock(&bdev->fence_lock);
if (ret)
return ret;
mem.num_pages = bo->num_pages;
@@ -1474,7 +1461,6 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
bdev->glob = glob;
bdev->need_dma32 = need_dma32;
bdev->val_seq = 0;
- spin_lock_init(&bdev->fence_lock);
mutex_lock(&glob->device_list_mutex);
list_add_tail(&bdev->device_list, &glob->device_list);
mutex_unlock(&glob->device_list_mutex);
@@ -1532,7 +1518,6 @@ int ttm_bo_wait(struct ttm_buffer_object *bo,
bool lazy, bool interruptible, bool no_wait)
{
struct ttm_bo_driver *driver = bo->bdev->driver;
- struct ttm_bo_device *bdev = bo->bdev;
void *sync_obj;
int ret = 0;

@@ -1541,53 +1526,33 @@ int ttm_bo_wait(struct ttm_buffer_object *bo,
if (likely(bo->sync_obj == NULL))
return 0;

- while (bo->sync_obj) {
-
+ if (bo->sync_obj) {
if (driver->sync_obj_signaled(bo->sync_obj)) {
- void *tmp_obj = bo->sync_obj;
- bo->sync_obj = NULL;
+ driver->sync_obj_unref(&bo->sync_obj);
clear_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
- spin_unlock(&bdev->fence_lock);
- driver->sync_obj_unref(&tmp_obj);
- spin_lock(&bdev->fence_lock);
- continue;
+ return 0;
}

if (no_wait)
return -EBUSY;

sync_obj = driver->sync_obj_ref(bo->sync_obj);
- spin_unlock(&bdev->fence_lock);
ret = driver->sync_obj_wait(sync_obj,
lazy, interruptible);
- if (unlikely(ret != 0)) {
- driver->sync_obj_unref(&sync_obj);
- spin_lock(&bdev->fence_lock);
- return ret;
- }
- spin_lock(&bdev->fence_lock);
- if (likely(bo->sync_obj == sync_obj)) {
- void *tmp_obj = bo->sync_obj;
- bo->sync_obj = NULL;
+
+ if (likely(ret == 0)) {
clear_bit(TTM_BO_PRIV_FLAG_MOVING,
&bo->priv_flags);
- spin_unlock(&bdev->fence_lock);
- driver->sync_obj_unref(&sync_obj);
- driver->sync_obj_unref(&tmp_obj);
- spin_lock(&bdev->fence_lock);
- } else {
- spin_unlock(&bdev->fence_lock);
- driver->sync_obj_unref(&sync_obj);
- spin_lock(&bdev->fence_lock);
+ driver->sync_obj_unref(&bo->sync_obj);
}
+ driver->sync_obj_unref(&sync_obj);
}
- return 0;
+ return ret;
}
EXPORT_SYMBOL(ttm_bo_wait);

int ttm_bo_synccpu_write_grab(struct ttm_buffer_object *bo, bool no_wait)
{
- struct ttm_bo_device *bdev = bo->bdev;
int ret = 0;

/*
@@ -1597,9 +1562,7 @@ int ttm_bo_synccpu_write_grab(struct ttm_buffer_object *bo, bool no_wait)
ret = ttm_bo_reserve(bo, true, no_wait, false, 0);
if (unlikely(ret != 0))
return ret;
- spin_lock(&bdev->fence_lock);
ret = ttm_bo_wait(bo, false, true, no_wait);
- spin_unlock(&bdev->fence_lock);
if (likely(ret == 0))
atomic_inc(&bo->cpu_writers);
ttm_bo_unreserve(bo);
@@ -1656,9 +1619,7 @@ static int ttm_bo_swapout(struct ttm_mem_shrink *shrink)
* Wait for GPU, then move to system cached.
*/

- spin_lock(&bo->bdev->fence_lock);
ret = ttm_bo_wait(bo, false, false, false);
- spin_unlock(&bo->bdev->fence_lock);

if (unlikely(ret != 0))
goto out;
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1df856f78568..23db594e55c0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -466,12 +466,10 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
drm_vma_node_reset(&fbo->vma_node);
atomic_set(&fbo->cpu_writers, 0);

- spin_lock(&bdev->fence_lock);
if (bo->sync_obj)
fbo->sync_obj = driver->sync_obj_ref(bo->sync_obj);
else
fbo->sync_obj = NULL;
- spin_unlock(&bdev->fence_lock);
kref_init(&fbo->list_kref);
kref_init(&fbo->kref);
fbo->destroy = &ttm_transfered_destroy;
@@ -657,7 +655,6 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
struct ttm_buffer_object *ghost_obj;
void *tmp_obj = NULL;

- spin_lock(&bdev->fence_lock);
if (bo->sync_obj) {
tmp_obj = bo->sync_obj;
bo->sync_obj = NULL;
@@ -665,7 +662,6 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
bo->sync_obj = driver->sync_obj_ref(sync_obj);
if (evict) {
ret = ttm_bo_wait(bo, false, false, false);
- spin_unlock(&bdev->fence_lock);
if (tmp_obj)
driver->sync_obj_unref(&tmp_obj);
if (ret)
@@ -688,7 +684,6 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
*/

set_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
- spin_unlock(&bdev->fence_lock);
if (tmp_obj)
driver->sync_obj_unref(&tmp_obj);

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 0ce48e5a9cb4..d05437f219e9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -45,10 +45,8 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
struct vm_area_struct *vma,
struct vm_fault *vmf)
{
- struct ttm_bo_device *bdev = bo->bdev;
int ret = 0;

- spin_lock(&bdev->fence_lock);
if (likely(!test_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags)))
goto out_unlock;

@@ -82,7 +80,6 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
VM_FAULT_NOPAGE;

out_unlock:
- spin_unlock(&bdev->fence_lock);
return ret;
}

diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 6db47a72667e..108730e9147b 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -180,7 +180,6 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
glob = bo->glob;

spin_lock(&glob->lru_lock);
- spin_lock(&bdev->fence_lock);

list_for_each_entry(entry, list, head) {
bo = entry->bo;
@@ -189,7 +188,6 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
}
- spin_unlock(&bdev->fence_lock);
spin_unlock(&glob->lru_lock);
if (ticket)
ww_acquire_fini(ticket);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 6327cfc36805..4a36bb1dc525 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -829,11 +829,7 @@ static void vmw_move_notify(struct ttm_buffer_object *bo,
*/
static void vmw_swap_notify(struct ttm_buffer_object *bo)
{
- struct ttm_bo_device *bdev = bo->bdev;
-
- spin_lock(&bdev->fence_lock);
ttm_bo_wait(bo, false, false, false);
- spin_unlock(&bdev->fence_lock);
}


diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 873613a16f72..48e47a100dea 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -567,12 +567,12 @@ static int vmw_user_dmabuf_synccpu_grab(struct vmw_user_dma_buffer *user_bo,
int ret;

if (flags & drm_vmw_synccpu_allow_cs) {
- struct ttm_bo_device *bdev = bo->bdev;
-
- spin_lock(&bdev->fence_lock);
- ret = ttm_bo_wait(bo, false, true,
- !!(flags & drm_vmw_synccpu_dontblock));
- spin_unlock(&bdev->fence_lock);
+ ret = ttm_bo_reserve(bo, true, !!(flags & drm_vmw_synccpu_dontblock), false, 0);
+ if (!ret) {
+ ret = ttm_bo_wait(bo, false, true,
+ !!(flags & drm_vmw_synccpu_dontblock));
+ ttm_bo_unreserve(bo);
+ }
return ret;
}

@@ -1429,12 +1429,10 @@ void vmw_fence_single_bo(struct ttm_buffer_object *bo,
else
driver->sync_obj_ref(fence);

- spin_lock(&bdev->fence_lock);

old_fence_obj = bo->sync_obj;
bo->sync_obj = fence;

- spin_unlock(&bdev->fence_lock);

if (old_fence_obj)
vmw_fence_obj_unreference(&old_fence_obj);
@@ -1475,7 +1473,6 @@ void vmw_resource_move_notify(struct ttm_buffer_object *bo,

if (mem->mem_type != VMW_PL_MOB) {
struct vmw_resource *res, *n;
- struct ttm_bo_device *bdev = bo->bdev;
struct ttm_validate_buffer val_buf;

val_buf.bo = bo;
@@ -1491,9 +1488,7 @@ void vmw_resource_move_notify(struct ttm_buffer_object *bo,
list_del_init(&res->mob_head);
}

- spin_lock(&bdev->fence_lock);
(void) ttm_bo_wait(bo, false, false, false);
- spin_unlock(&bdev->fence_lock);
}
}

diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 7526c5bf5610..67df9d7c06cc 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -227,10 +227,7 @@ struct ttm_buffer_object {
struct list_head io_reserve_lru;

/**
- * Members protected by struct buffer_object_device::fence_lock
- * In addition, setting sync_obj to anything else
- * than NULL requires bo::reserved to be held. This allows for
- * checking NULL while reserved but not holding the mentioned lock.
+ * Members protected by a bo reservation.
*/

void *sync_obj;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index a5183da3ef92..0aa6caa59415 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -518,8 +518,6 @@ struct ttm_bo_global {
*
* @driver: Pointer to a struct ttm_bo_driver struct setup by the driver.
* @man: An array of mem_type_managers.
- * @fence_lock: Protects the synchronizing members on *all* bos belonging
- * to this device.
* @vma_manager: Address space manager
* lru_lock: Spinlock that protects the buffer+device lru lists and
* ddestroy lists.
@@ -539,7 +537,6 @@ struct ttm_bo_device {
struct ttm_bo_global *glob;
struct ttm_bo_driver *driver;
struct ttm_mem_type_manager man[TTM_NUM_MEM_TYPES];
- spinlock_t fence_lock;

/*
* Protected by internal locks.

2014-07-09 12:29:49

by Maarten Lankhorst

Subject: [PATCH 07/17] drm/nouveau: rework to new fence interface

From: Maarten Lankhorst <[email protected]>

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/nouveau/core/core/event.c | 4
drivers/gpu/drm/nouveau/nouveau_bo.c | 6
drivers/gpu/drm/nouveau/nouveau_display.c | 4
drivers/gpu/drm/nouveau/nouveau_fence.c | 435 ++++++++++++++++++++---------
drivers/gpu/drm/nouveau/nouveau_fence.h | 20 +
drivers/gpu/drm/nouveau/nouveau_gem.c | 17 -
drivers/gpu/drm/nouveau/nv04_fence.c | 4
drivers/gpu/drm/nouveau/nv10_fence.c | 4
drivers/gpu/drm/nouveau/nv17_fence.c | 2
drivers/gpu/drm/nouveau/nv50_fence.c | 2
drivers/gpu/drm/nouveau/nv84_fence.c | 11 -
11 files changed, 330 insertions(+), 179 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c
index ae81d3b5d8b7..5ddc28ec7660 100644
--- a/drivers/gpu/drm/nouveau/core/core/event.c
+++ b/drivers/gpu/drm/nouveau/core/core/event.c
@@ -139,14 +139,14 @@ nouveau_event_ref(struct nouveau_eventh *handler, struct nouveau_eventh **ref)
void
nouveau_event_trigger(struct nouveau_event *event, u32 types, int index)
{
- struct nouveau_eventh *handler;
+ struct nouveau_eventh *handler, *next;
unsigned long flags;

if (WARN_ON(index >= event->index_nr))
return;

spin_lock_irqsave(&event->list_lock, flags);
- list_for_each_entry(handler, &event->list[index], head) {
+ list_for_each_entry_safe(handler, next, &event->list[index], head) {
if (!test_bit(NVKM_EVENT_ENABLE, &handler->flags))
continue;
if (!(handler->types & types))
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index e98af2e9a1cb..84aba3fa1bd0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -959,7 +959,7 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
}

mutex_lock_nested(&chan->cli->mutex, SINGLE_DEPTH_NESTING);
- ret = nouveau_fence_sync(bo->sync_obj, chan);
+ ret = nouveau_fence_sync(nouveau_bo(bo), chan);
if (ret == 0) {
ret = drm->ttm.move(chan, bo, &bo->mem, new_mem);
if (ret == 0) {
@@ -1432,10 +1432,12 @@ nouveau_bo_fence_unref(void **sync_obj)
void
nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
{
- lockdep_assert_held(&nvbo->bo.resv->lock.base);
+ struct reservation_object *resv = nvbo->bo.resv;

nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
nvbo->bo.sync_obj = nouveau_fence_ref(fence);
+
+ reservation_object_add_excl_fence(resv, &fence->base);
}

static void *
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 7928f8f07334..2c4798750b20 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -660,7 +660,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan,
spin_unlock_irqrestore(&dev->event_lock, flags);

/* Synchronize with the old framebuffer */
- ret = nouveau_fence_sync(old_bo->bo.sync_obj, chan);
+ ret = nouveau_fence_sync(old_bo, chan);
if (ret)
goto fail;

@@ -721,7 +721,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
goto fail_unpin;

/* synchronise rendering channel with the kernel's channel */
- ret = nouveau_fence_sync(new_bo->bo.sync_obj, chan);
+ ret = nouveau_fence_sync(new_bo, chan);
if (ret) {
ttm_bo_unreserve(&new_bo->bo);
goto fail_unpin;
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ab5ea3b0d666..d24f8ce4341a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -32,91 +32,139 @@
#include "nouveau_drm.h"
#include "nouveau_dma.h"
#include "nouveau_fence.h"
+#include <trace/events/fence.h>

#include <engine/fifo.h>

-struct fence_work {
- struct work_struct base;
- struct list_head head;
- void (*func)(void *);
- void *data;
-};
+static const struct fence_ops nouveau_fence_ops_uevent;
+static const struct fence_ops nouveau_fence_ops_legacy;

static void
nouveau_fence_signal(struct nouveau_fence *fence)
{
- struct fence_work *work, *temp;
+ fence_signal_locked(&fence->base);
+ list_del(&fence->head);
+
+ if (fence->base.ops == &nouveau_fence_ops_uevent &&
+ fence->event.head.next) {
+ struct nouveau_event *event;

- list_for_each_entry_safe(work, temp, &fence->work, head) {
- schedule_work(&work->base);
- list_del(&work->head);
+ list_del(&fence->event.head);
+ fence->event.head.next = NULL;
+
+ event = container_of(fence->base.lock, typeof(*event), list_lock);
+ nouveau_event_put(&fence->event);
}

- fence->channel = NULL;
- list_del(&fence->head);
+ fence_put(&fence->base);
+}
+
+static struct nouveau_fence *
+nouveau_local_fence(struct fence *fence, struct nouveau_drm *drm) {
+ struct nouveau_fence_priv *priv = (void*)drm->fence;
+ struct nouveau_fence *f = container_of(fence,
+ struct nouveau_fence,
+ base);
+
+ if (fence->ops != &nouveau_fence_ops_legacy &&
+ fence->ops != &nouveau_fence_ops_uevent)
+ return NULL;
+
+ if (fence->context < priv->context_base ||
+ fence->context >= priv->context_base + priv->contexts)
+ return NULL;
+
+ return f;
}

void
nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
{
struct nouveau_fence *fence, *fnext;
- spin_lock(&fctx->lock);
- list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
+
+ spin_lock_irq(fctx->lock);
+ list_for_each_entry_safe(fence, fnext, &fctx->pending, head)
nouveau_fence_signal(fence);
- }
- spin_unlock(&fctx->lock);
+ spin_unlock_irq(fctx->lock);
}

void
-nouveau_fence_context_new(struct nouveau_fence_chan *fctx)
+nouveau_fence_context_new(struct nouveau_channel *chan, struct nouveau_fence_chan *fctx)
{
+ struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device);
+ struct nouveau_fifo_chan *fifo = (void*)chan->object;
+
+ fctx->lock = &pfifo->uevent->list_lock;
INIT_LIST_HEAD(&fctx->flip);
INIT_LIST_HEAD(&fctx->pending);
- spin_lock_init(&fctx->lock);
+
+ snprintf(fctx->name, sizeof(fctx->name) - 1, "nouveau channel %i", fifo->chid);
}

+struct nouveau_fence_work {
+ struct work_struct work;
+ struct fence_cb cb;
+ void (*func)(void *);
+ void *data;
+};
+
static void
nouveau_fence_work_handler(struct work_struct *kwork)
{
- struct fence_work *work = container_of(kwork, typeof(*work), base);
+ struct nouveau_fence_work *work = container_of(kwork, typeof(*work), work);
work->func(work->data);
kfree(work);
}

+static void nouveau_fence_work_cb(struct fence *fence, struct fence_cb *cb)
+{
+ struct nouveau_fence_work *work = container_of(cb, typeof(*work), cb);
+
+ schedule_work(&work->work);
+}
+
+/*
+ * In an ideal world, read would not assume the channel context is still alive.
+ * This function may be called from another device, running into free memory as a
+ * result. The drm node should still be there, so we can derive the index from
+ * the fence context.
+ */
+static bool nouveau_fence_is_signaled(struct fence *f)
+{
+ struct nouveau_fence *fence = container_of(f, struct nouveau_fence, base);
+ struct nouveau_channel *chan = fence->channel;
+ struct nouveau_fence_chan *fctx = chan->fence;
+
+ return (int)(fctx->read(chan) - fence->base.seqno) >= 0;
+}
+
void
nouveau_fence_work(struct nouveau_fence *fence,
void (*func)(void *), void *data)
{
- struct nouveau_channel *chan = fence->channel;
- struct nouveau_fence_chan *fctx;
- struct fence_work *work = NULL;
+ struct nouveau_fence_work *work;

- if (nouveau_fence_done(fence)) {
- func(data);
- return;
- }
+ if (fence_is_signaled(&fence->base))
+ goto err;

- fctx = chan->fence;
work = kmalloc(sizeof(*work), GFP_KERNEL);
if (!work) {
WARN_ON(nouveau_fence_wait(fence, false, false));
- func(data);
- return;
+ goto err;
}

- spin_lock(&fctx->lock);
- if (!fence->channel) {
- spin_unlock(&fctx->lock);
- kfree(work);
- func(data);
- return;
- }
-
- INIT_WORK(&work->base, nouveau_fence_work_handler);
+ INIT_WORK(&work->work, nouveau_fence_work_handler);
work->func = func;
work->data = data;
- list_add(&work->head, &fence->work);
- spin_unlock(&fctx->lock);
+
+ if (fence_add_callback(&fence->base, &work->cb, nouveau_fence_work_cb) < 0)
+ goto err_free;
+ return;
+
+err_free:
+ kfree(work);
+err:
+ func(data);
}

static void
@@ -125,33 +173,45 @@ nouveau_fence_update(struct nouveau_channel *chan)
struct nouveau_fence_chan *fctx = chan->fence;
struct nouveau_fence *fence, *fnext;

- spin_lock(&fctx->lock);
+ u32 seq = fctx->read(chan);
+
list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
- if (fctx->read(chan) < fence->sequence)
+ if ((int)(seq - fence->base.seqno) < 0)
break;

nouveau_fence_signal(fence);
- nouveau_fence_unref(&fence);
}
- spin_unlock(&fctx->lock);
}

int
nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
{
struct nouveau_fence_chan *fctx = chan->fence;
+ struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device);
+ struct nouveau_fifo_chan *fifo = (void*)chan->object;
+ struct nouveau_fence_priv *priv = (void*)chan->drm->fence;
int ret;

fence->channel = chan;
fence->timeout = jiffies + (15 * HZ);
- fence->sequence = ++fctx->sequence;

+ if (priv->uevent)
+ fence_init(&fence->base, &nouveau_fence_ops_uevent,
+ &pfifo->uevent->list_lock,
+ priv->context_base + fifo->chid, ++fctx->sequence);
+ else
+ fence_init(&fence->base, &nouveau_fence_ops_legacy,
+ &pfifo->uevent->list_lock,
+ priv->context_base + fifo->chid, ++fctx->sequence);
+
+ trace_fence_emit(&fence->base);
ret = fctx->emit(fence);
if (!ret) {
- kref_get(&fence->kref);
- spin_lock(&fctx->lock);
+ fence_get(&fence->base);
+ spin_lock_irq(fctx->lock);
+ nouveau_fence_update(chan);
list_add_tail(&fence->head, &fctx->pending);
- spin_unlock(&fctx->lock);
+ spin_unlock_irq(fctx->lock);
}

return ret;
@@ -160,104 +220,71 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
bool
nouveau_fence_done(struct nouveau_fence *fence)
{
- if (fence->channel)
+ if (fence->base.ops == &nouveau_fence_ops_legacy ||
+ fence->base.ops == &nouveau_fence_ops_uevent) {
+ struct nouveau_fence_chan *fctx;
+ unsigned long flags;
+
+ if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
+ return true;
+
+ fctx = fence->channel->fence;
+ spin_lock_irqsave(fctx->lock, flags);
nouveau_fence_update(fence->channel);
- return !fence->channel;
+ spin_unlock_irqrestore(fctx->lock, flags);
+ }
+ return fence_is_signaled(&fence->base);
}

-static int
-nouveau_fence_wait_uevent_handler(void *data, u32 type, int index)
+static long
+nouveau_fence_wait_legacy(struct fence *f, bool intr, long wait)
{
- struct nouveau_fence_priv *priv = data;
- wake_up_all(&priv->waiting);
- return NVKM_EVENT_KEEP;
-}
+ struct nouveau_fence *fence = container_of(f, typeof(*fence), base);
+ unsigned long sleep_time = NSEC_PER_MSEC / 1000;
+ unsigned long t = jiffies, timeout = t + wait;

-static int
-nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr)
+ while (!nouveau_fence_done(fence)) {
+ ktime_t kt;

-{
- struct nouveau_channel *chan = fence->channel;
- struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device);
- struct nouveau_fence_priv *priv = chan->drm->fence;
- struct nouveau_eventh *handler;
- int ret = 0;
+ t = jiffies;

- ret = nouveau_event_new(pfifo->uevent, 1, 0,
- nouveau_fence_wait_uevent_handler,
- priv, &handler);
- if (ret)
- return ret;
+ if (wait != MAX_SCHEDULE_TIMEOUT && time_after_eq(t, timeout)) {
+ __set_current_state(TASK_RUNNING);
+ return 0;
+ }

- nouveau_event_get(handler);
+ __set_current_state(intr ? TASK_INTERRUPTIBLE :
+ TASK_UNINTERRUPTIBLE);

- if (fence->timeout) {
- unsigned long timeout = fence->timeout - jiffies;
-
- if (time_before(jiffies, fence->timeout)) {
- if (intr) {
- ret = wait_event_interruptible_timeout(
- priv->waiting,
- nouveau_fence_done(fence),
- timeout);
- } else {
- ret = wait_event_timeout(priv->waiting,
- nouveau_fence_done(fence),
- timeout);
- }
- }
+ kt = ktime_set(0, sleep_time);
+ schedule_hrtimeout(&kt, HRTIMER_MODE_REL);
+ sleep_time *= 2;
+ if (sleep_time > NSEC_PER_MSEC)
+ sleep_time = NSEC_PER_MSEC;

- if (ret >= 0) {
- fence->timeout = jiffies + ret;
- if (time_after_eq(jiffies, fence->timeout))
- ret = -EBUSY;
- }
- } else {
- if (intr) {
- ret = wait_event_interruptible(priv->waiting,
- nouveau_fence_done(fence));
- } else {
- wait_event(priv->waiting, nouveau_fence_done(fence));
- }
+ if (intr && signal_pending(current))
+ return -ERESTARTSYS;
}

- nouveau_event_ref(NULL, &handler);
- if (unlikely(ret < 0))
- return ret;
+ __set_current_state(TASK_RUNNING);

- return 0;
+ return timeout - t;
}

-int
-nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
+static int
+nouveau_fence_wait_busy(struct nouveau_fence *fence, bool intr)
{
- struct nouveau_channel *chan = fence->channel;
- struct nouveau_fence_priv *priv = chan ? chan->drm->fence : NULL;
- unsigned long sleep_time = NSEC_PER_MSEC / 1000;
- ktime_t t;
int ret = 0;

- while (priv && priv->uevent && lazy && !nouveau_fence_done(fence)) {
- ret = nouveau_fence_wait_uevent(fence, intr);
- if (ret < 0)
- return ret;
- }
-
while (!nouveau_fence_done(fence)) {
- if (fence->timeout && time_after_eq(jiffies, fence->timeout)) {
+ if (time_after_eq(jiffies, fence->timeout)) {
ret = -EBUSY;
break;
}

- __set_current_state(intr ? TASK_INTERRUPTIBLE :
- TASK_UNINTERRUPTIBLE);
- if (lazy) {
- t = ktime_set(0, sleep_time);
- schedule_hrtimeout(&t, HRTIMER_MODE_REL);
- sleep_time *= 2;
- if (sleep_time > NSEC_PER_MSEC)
- sleep_time = NSEC_PER_MSEC;
- }
+ __set_current_state(intr ?
+ TASK_INTERRUPTIBLE :
+ TASK_UNINTERRUPTIBLE);

if (intr && signal_pending(current)) {
ret = -ERESTARTSYS;
@@ -270,36 +297,79 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
}

int
-nouveau_fence_sync(struct nouveau_fence *fence, struct nouveau_channel *chan)
+nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
+{
+ long ret;
+
+ if (!lazy)
+ return nouveau_fence_wait_busy(fence, intr);
+
+ ret = fence_wait_timeout(&fence->base, intr, 15 * HZ);
+ if (ret < 0)
+ return ret;
+ else if (!ret)
+ return -EBUSY;
+ else
+ return 0;
+}
+
+int
+nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan)
{
struct nouveau_fence_chan *fctx = chan->fence;
- struct nouveau_channel *prev;
- int ret = 0;
+ struct fence *fence = NULL;
+ struct reservation_object *resv = nvbo->bo.resv;
+ struct reservation_object_list *fobj;
+ int ret = 0, i;
+
+ fence = nvbo->bo.sync_obj;
+ if (fence && fence_is_signaled(fence)) {
+ nouveau_fence_unref((struct nouveau_fence **)
+ &nvbo->bo.sync_obj);
+ fence = NULL;
+ }
+
+ if (fence) {
+ struct nouveau_fence *f = container_of(fence,
+ struct nouveau_fence,
+ base);
+ struct nouveau_channel *prev = f->channel;

- prev = fence ? fence->channel : NULL;
- if (prev) {
- if (unlikely(prev != chan && !nouveau_fence_done(fence))) {
- ret = fctx->sync(fence, prev, chan);
+ if (prev != chan) {
+ ret = fctx->sync(f, prev, chan);
if (unlikely(ret))
- ret = nouveau_fence_wait(fence, true, false);
+ ret = nouveau_fence_wait(f, true, true);
}
}

- return ret;
-}
+ if (ret)
+ return ret;

-static void
-nouveau_fence_del(struct kref *kref)
-{
- struct nouveau_fence *fence = container_of(kref, typeof(*fence), kref);
- kfree(fence);
+ fence = reservation_object_get_excl(resv);
+ if (fence && !nouveau_local_fence(fence, chan->drm))
+ ret = fence_wait(fence, true);
+
+ fobj = reservation_object_get_list(resv);
+ if (!fobj || ret)
+ return ret;
+
+ for (i = 0; i < fobj->shared_count && !ret; ++i) {
+ fence = rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(resv));
+
+ /* should always be true, for now */
+ if (!nouveau_local_fence(fence, chan->drm))
+ ret = fence_wait(fence, true);
+ }
+
+ return ret;
}

void
nouveau_fence_unref(struct nouveau_fence **pfence)
{
if (*pfence)
- kref_put(&(*pfence)->kref, nouveau_fence_del);
+ fence_put(&(*pfence)->base);
*pfence = NULL;
}

@@ -307,7 +377,7 @@ struct nouveau_fence *
nouveau_fence_ref(struct nouveau_fence *fence)
{
if (fence)
- kref_get(&fence->kref);
+ fence_get(&fence->base);
return fence;
}

@@ -325,9 +395,7 @@ nouveau_fence_new(struct nouveau_channel *chan, bool sysmem,
if (!fence)
return -ENOMEM;

- INIT_LIST_HEAD(&fence->work);
fence->sysmem = sysmem;
- kref_init(&fence->kref);

ret = nouveau_fence_emit(fence, chan);
if (ret)
@@ -336,3 +404,86 @@ nouveau_fence_new(struct nouveau_channel *chan, bool sysmem,
*pfence = fence;
return ret;
}
+
+
+static bool nouveau_fence_no_signaling(struct fence *f)
+{
+ /*
+ * This needs uevents to work correctly, but fence_add_callback relies on
+ * being able to enable signaling. It will still get signaled eventually,
+ * just not right away.
+ */
+ if (nouveau_fence_is_signaled(f))
+ return false;
+
+ return true;
+}
+
+static const char *nouveau_fence_get_get_driver_name(struct fence *fence)
+{
+ return "nouveau";
+}
+
+static const char *nouveau_fence_get_timeline_name(struct fence *f)
+{
+ struct nouveau_fence *fence =
+ container_of(f, struct nouveau_fence, base);
+ struct nouveau_fence_chan *fctx = fence->channel->fence;
+
+ return fctx ? fctx->name : "dead channel";
+}
+
+static const struct fence_ops nouveau_fence_ops_legacy = {
+ .get_driver_name = nouveau_fence_get_get_driver_name,
+ .get_timeline_name = nouveau_fence_get_timeline_name,
+ .enable_signaling = nouveau_fence_no_signaling,
+ .signaled = nouveau_fence_is_signaled,
+ .wait = nouveau_fence_wait_legacy,
+ .release = NULL
+};
+
+static int
+nouveau_fence_wait_uevent_handler(void *priv, u32 types, int index)
+{
+ struct nouveau_fence *fence = priv;
+
+ if (nouveau_fence_is_signaled(&fence->base))
+ nouveau_fence_signal(fence);
+
+ /*
+ * NVKM_EVENT_DROP is never appropriate here, nouveau_fence_signal
+ * will unlink and free the event if needed.
+ */
+ return NVKM_EVENT_KEEP;
+}
+
+static bool nouveau_fence_enable_signaling(struct fence *f)
+{
+ struct nouveau_fence *fence = container_of(f, struct nouveau_fence, base);
+ struct nouveau_event *event = container_of(f->lock, struct nouveau_event, list_lock);
+ struct nouveau_eventh *handler = &fence->event;
+
+ handler->event = event;
+ handler->func = nouveau_fence_wait_uevent_handler;
+ handler->priv = fence;
+ handler->types = 1;
+
+ nouveau_event_get(handler);
+ if (nouveau_fence_is_signaled(f)) {
+ nouveau_event_put(handler);
+ return false;
+ }
+
+ list_add_tail(&handler->head, &event->list[0]);
+
+ return true;
+}
+
+static const struct fence_ops nouveau_fence_ops_uevent = {
+ .get_driver_name = nouveau_fence_get_get_driver_name,
+ .get_timeline_name = nouveau_fence_get_timeline_name,
+ .enable_signaling = nouveau_fence_enable_signaling,
+ .signaled = nouveau_fence_is_signaled,
+ .wait = fence_default_wait,
+ .release = NULL
+};
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index c57bb61da58c..1989ec22e66e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -1,18 +1,21 @@
#ifndef __NOUVEAU_FENCE_H__
#define __NOUVEAU_FENCE_H__

+#include <linux/fence.h>
+
struct nouveau_drm;
+struct nouveau_bo;

struct nouveau_fence {
+ struct fence base;
+
struct list_head head;
- struct list_head work;
- struct kref kref;
+ struct nouveau_eventh event;

bool sysmem;

struct nouveau_channel *channel;
unsigned long timeout;
- u32 sequence;
};

int nouveau_fence_new(struct nouveau_channel *, bool sysmem,
@@ -25,7 +28,7 @@ int nouveau_fence_emit(struct nouveau_fence *, struct nouveau_channel *);
bool nouveau_fence_done(struct nouveau_fence *);
void nouveau_fence_work(struct nouveau_fence *, void (*)(void *), void *);
int nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
-int nouveau_fence_sync(struct nouveau_fence *, struct nouveau_channel *);
+int nouveau_fence_sync(struct nouveau_bo *, struct nouveau_channel *);

struct nouveau_fence_chan {
struct list_head pending;
@@ -38,8 +41,10 @@ struct nouveau_fence_chan {
int (*emit32)(struct nouveau_channel *, u64, u32);
int (*sync32)(struct nouveau_channel *, u64, u32);

- spinlock_t lock;
+ spinlock_t *lock;
u32 sequence;
+ u32 context;
+ char name[24];
};

struct nouveau_fence_priv {
@@ -49,13 +54,14 @@ struct nouveau_fence_priv {
int (*context_new)(struct nouveau_channel *);
void (*context_del)(struct nouveau_channel *);

- wait_queue_head_t waiting;
bool uevent;
+
+ u32 contexts, context_base;
};

#define nouveau_fence(drm) ((struct nouveau_fence_priv *)(drm)->fence)

-void nouveau_fence_context_new(struct nouveau_fence_chan *);
+void nouveau_fence_context_new(struct nouveau_channel *, struct nouveau_fence_chan *);
void nouveau_fence_context_del(struct nouveau_fence_chan *);

int nv04_fence_create(struct nouveau_drm *);
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 6cd5298cbb53..a61530becfb9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -428,18 +428,6 @@ retry:
}

static int
-validate_sync(struct nouveau_channel *chan, struct nouveau_bo *nvbo)
-{
- struct nouveau_fence *fence = nvbo->bo.sync_obj;
- int ret = 0;
-
- if (fence)
- ret = nouveau_fence_sync(fence, chan);
-
- return ret;
-}
-
-static int
validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo,
uint64_t user_pbbo_ptr)
@@ -468,9 +456,10 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
return ret;
}

- ret = validate_sync(chan, nvbo);
+ ret = nouveau_fence_sync(nvbo, chan);
if (unlikely(ret)) {
- NV_ERROR(cli, "fail post-validate sync\n");
+ if (ret != -ERESTARTSYS)
+ NV_ERROR(cli, "fail post-validate sync\n");
return ret;
}

diff --git a/drivers/gpu/drm/nouveau/nv04_fence.c b/drivers/gpu/drm/nouveau/nv04_fence.c
index 94eadd1dd10a..997c54122ed9 100644
--- a/drivers/gpu/drm/nouveau/nv04_fence.c
+++ b/drivers/gpu/drm/nouveau/nv04_fence.c
@@ -43,7 +43,7 @@ nv04_fence_emit(struct nouveau_fence *fence)
int ret = RING_SPACE(chan, 2);
if (ret == 0) {
BEGIN_NV04(chan, NvSubSw, 0x0150, 1);
- OUT_RING (chan, fence->sequence);
+ OUT_RING (chan, fence->base.seqno);
FIRE_RING (chan);
}
return ret;
@@ -77,7 +77,7 @@ nv04_fence_context_new(struct nouveau_channel *chan)
{
struct nv04_fence_chan *fctx = kzalloc(sizeof(*fctx), GFP_KERNEL);
if (fctx) {
- nouveau_fence_context_new(&fctx->base);
+ nouveau_fence_context_new(chan, &fctx->base);
fctx->base.emit = nv04_fence_emit;
fctx->base.sync = nv04_fence_sync;
fctx->base.read = nv04_fence_read;
diff --git a/drivers/gpu/drm/nouveau/nv10_fence.c b/drivers/gpu/drm/nouveau/nv10_fence.c
index 06f434f03fba..e8f73f7f31ef 100644
--- a/drivers/gpu/drm/nouveau/nv10_fence.c
+++ b/drivers/gpu/drm/nouveau/nv10_fence.c
@@ -36,7 +36,7 @@ nv10_fence_emit(struct nouveau_fence *fence)
int ret = RING_SPACE(chan, 2);
if (ret == 0) {
BEGIN_NV04(chan, 0, NV10_SUBCHAN_REF_CNT, 1);
- OUT_RING (chan, fence->sequence);
+ OUT_RING (chan, fence->base.seqno);
FIRE_RING (chan);
}
return ret;
@@ -74,7 +74,7 @@ nv10_fence_context_new(struct nouveau_channel *chan)
if (!fctx)
return -ENOMEM;

- nouveau_fence_context_new(&fctx->base);
+ nouveau_fence_context_new(chan, &fctx->base);
fctx->base.emit = nv10_fence_emit;
fctx->base.read = nv10_fence_read;
fctx->base.sync = nv10_fence_sync;
diff --git a/drivers/gpu/drm/nouveau/nv17_fence.c b/drivers/gpu/drm/nouveau/nv17_fence.c
index 22aa9963ea6f..e404bab31e9d 100644
--- a/drivers/gpu/drm/nouveau/nv17_fence.c
+++ b/drivers/gpu/drm/nouveau/nv17_fence.c
@@ -83,7 +83,7 @@ nv17_fence_context_new(struct nouveau_channel *chan)
if (!fctx)
return -ENOMEM;

- nouveau_fence_context_new(&fctx->base);
+ nouveau_fence_context_new(chan, &fctx->base);
fctx->base.emit = nv10_fence_emit;
fctx->base.read = nv10_fence_read;
fctx->base.sync = nv17_fence_sync;
diff --git a/drivers/gpu/drm/nouveau/nv50_fence.c b/drivers/gpu/drm/nouveau/nv50_fence.c
index 0ee363840035..19f6fccb84a1 100644
--- a/drivers/gpu/drm/nouveau/nv50_fence.c
+++ b/drivers/gpu/drm/nouveau/nv50_fence.c
@@ -47,7 +47,7 @@ nv50_fence_context_new(struct nouveau_channel *chan)
if (!fctx)
return -ENOMEM;

- nouveau_fence_context_new(&fctx->base);
+ nouveau_fence_context_new(chan, &fctx->base);
fctx->base.emit = nv10_fence_emit;
fctx->base.read = nv10_fence_read;
fctx->base.sync = nv17_fence_sync;
diff --git a/drivers/gpu/drm/nouveau/nv84_fence.c b/drivers/gpu/drm/nouveau/nv84_fence.c
index 9fd475c89820..8a06727b23d1 100644
--- a/drivers/gpu/drm/nouveau/nv84_fence.c
+++ b/drivers/gpu/drm/nouveau/nv84_fence.c
@@ -89,7 +89,7 @@ nv84_fence_emit(struct nouveau_fence *fence)
else
addr += fctx->vma.offset;

- return fctx->base.emit32(chan, addr, fence->sequence);
+ return fctx->base.emit32(chan, addr, fence->base.seqno);
}

static int
@@ -105,7 +105,7 @@ nv84_fence_sync(struct nouveau_fence *fence,
else
addr += fctx->vma.offset;

- return fctx->base.sync32(chan, addr, fence->sequence);
+ return fctx->base.sync32(chan, addr, fence->base.seqno);
}

static u32
@@ -149,12 +149,14 @@ nv84_fence_context_new(struct nouveau_channel *chan)
if (!fctx)
return -ENOMEM;

- nouveau_fence_context_new(&fctx->base);
+ nouveau_fence_context_new(chan, &fctx->base);
fctx->base.emit = nv84_fence_emit;
fctx->base.sync = nv84_fence_sync;
fctx->base.read = nv84_fence_read;
fctx->base.emit32 = nv84_fence_emit32;
fctx->base.sync32 = nv84_fence_sync32;
+ fctx->base.sequence = nv84_fence_read(chan);
+ fctx->base.context = priv->base.context_base + fifo->chid;

ret = nouveau_bo_vma_add(priv->bo, client->vm, &fctx->vma);
if (ret == 0) {
@@ -239,7 +241,8 @@ nv84_fence_create(struct nouveau_drm *drm)
priv->base.context_new = nv84_fence_context_new;
priv->base.context_del = nv84_fence_context_del;

- init_waitqueue_head(&priv->base.waiting);
+ priv->base.contexts = pfifo->max + 1;
+ priv->base.context_base = fence_context_alloc(priv->base.contexts);
priv->base.uevent = true;

ret = nouveau_bo_new(drm->dev, 16 * (pfifo->max + 1), 0,

2014-07-09 12:29:55

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 08/17] drm/radeon: add timeout argument to radeon_fence_wait_seq

This makes it possible to wait for a specific amount of time,
rather than waiting indefinitely.
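
For illustration, a bounded wait through the new helper looks roughly like
the caller sketch below (not part of the patch; passing MAX_SCHEDULE_TIMEOUT
keeps the old wait-forever behaviour):

	uint64_t seq[RADEON_NUM_RINGS] = {};
	long r;

	seq[fence->ring] = fence->seq;
	r = radeon_fence_wait_seq_timeout(rdev, seq, true, 5 * HZ);
	if (r < 0)
		return r;		/* interrupted, or -EDEADLK on lockup */
	if (r == 0)
		return -EBUSY;		/* the full 5 seconds passed */
	/* r > 0: the fence signaled with r jiffies of the timeout left */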

Signed-off-by: Maarten Lankhorst <[email protected]>
Reviewed-by: Christian König <[email protected]>
---
drivers/gpu/drm/radeon/radeon_fence.c | 60 ++++++++++++++++++++++-----------
1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 913787085dfa..6435719fd45b 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -283,28 +283,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
}

/**
- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for a specific sequence numbers
*
* @rdev: radeon device pointer
* @target_seq: sequence number(s) we want to wait for
* @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
*
* Wait for the requested sequence number(s) to be written by any ring
* (all asics). Sequnce number array is indexed by ring id.
* @intr selects whether to use interruptable (true) or non-interruptable
* (false) sleep when waiting for the sequence number. Helper function
* for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns remaining time if the sequence number has passed, 0 when
+ * the wait timed out, or an error for all other cases.
* -EDEADLK is returned when a GPU lockup has been detected.
*/
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
- bool intr)
+static long radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+ u64 *target_seq, bool intr,
+ long timeout)
{
uint64_t last_seq[RADEON_NUM_RINGS];
bool signaled;
- int i, r;
+ int i;

while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+ long r, waited;
+
+ waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+ timeout : RADEON_FENCE_JIFFIES_TIMEOUT;

/* Save current sequence values, used to check for GPU lockups */
for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -319,11 +326,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
if (intr) {
r = wait_event_interruptible_timeout(rdev->fence_queue, (
(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
- || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+ || rdev->needs_reset), waited);
} else {
r = wait_event_timeout(rdev->fence_queue, (
(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
- || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+ || rdev->needs_reset), waited);
}

for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -337,6 +344,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
if (unlikely(r < 0))
return r;

+ timeout -= waited - r;
+
+ /*
+ * If this is a timed wait and the wait completely timed out, just return.
+ */
+ if (!timeout)
+ break;
+
if (unlikely(!signaled)) {
if (rdev->needs_reset)
return -EDEADLK;
@@ -379,14 +394,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
}
}
}
- return 0;
+ return timeout;
}

/**
* radeon_fence_wait - wait for a fence to signal
*
* @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
*
* Wait for the requested fence to signal (all asics).
* @intr selects whether to use interruptable (true) or non-interruptable
@@ -396,7 +411,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
int radeon_fence_wait(struct radeon_fence *fence, bool intr)
{
uint64_t seq[RADEON_NUM_RINGS] = {};
- int r;
+ long r;

if (fence == NULL) {
WARN(1, "Querying an invalid fence : %p !\n", fence);
@@ -407,9 +422,10 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
return 0;

- r = radeon_fence_wait_seq(fence->rdev, seq, intr);
- if (r)
+ r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+ if (r < 0) {
return r;
+ }

fence->seq = RADEON_FENCE_SIGNALED_SEQ;
return 0;
@@ -434,7 +450,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
{
uint64_t seq[RADEON_NUM_RINGS];
unsigned i, num_rings = 0;
- int r;
+ long r;

for (i = 0; i < RADEON_NUM_RINGS; ++i) {
seq[i] = 0;
@@ -455,8 +471,8 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
if (num_rings == 0)
return -ENOENT;

- r = radeon_fence_wait_seq(rdev, seq, intr);
- if (r) {
+ r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+ if (r < 0) {
return r;
}
return 0;
@@ -475,6 +491,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
{
uint64_t seq[RADEON_NUM_RINGS] = {};
+ long r;

seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
@@ -482,7 +499,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
already the last emited fence */
return -ENOENT;
}
- return radeon_fence_wait_seq(rdev, seq, false);
+ r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+ if (r < 0)
+ return r;
+ return 0;
}

/**
@@ -498,18 +518,18 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
{
uint64_t seq[RADEON_NUM_RINGS] = {};
- int r;
+ long r;

seq[ring] = rdev->fence_drv[ring].sync_seq[ring];
if (!seq[ring])
return 0;

- r = radeon_fence_wait_seq(rdev, seq, false);
- if (r) {
+ r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+ if (r < 0) {
if (r == -EDEADLK)
return -EDEADLK;

- dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n",
+ dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%ld)\n",
ring, r);
}
return 0;

2014-07-09 12:30:03

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 09/17] drm/radeon: use common fence implementation for fences

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/radeon/radeon.h | 15 +-
drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
3 files changed, 248 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 29d9cc04c04e..03a5567f2c2f 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
#include <linux/wait.h>
#include <linux/list.h>
#include <linux/kref.h>
+#include <linux/fence.h>

#include <ttm/ttm_bo_api.h>
#include <ttm/ttm_bo_driver.h>
@@ -116,9 +117,6 @@ extern int radeon_deep_color;
#define RADEONFB_CONN_LIMIT 4
#define RADEON_BIOS_NUM_SCRATCH 8

-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ 0LL
-
/* internal ring indices */
/* r1xx+ has gfx CP ring */
#define RADEON_RING_TYPE_GFX_INDEX 0
@@ -350,12 +348,15 @@ struct radeon_fence_driver {
};

struct radeon_fence {
+ struct fence base;
+
struct radeon_device *rdev;
- struct kref kref;
/* protected by radeon_fence.lock */
uint64_t seq;
/* RB, DMA, etc. */
unsigned ring;
+
+ wait_queue_t fence_wake;
};

int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2268,6 +2269,7 @@ struct radeon_device {
struct radeon_mman mman;
struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS];
wait_queue_head_t fence_queue;
+ unsigned fence_context;
struct mutex ring_lock;
struct radeon_ring ring[RADEON_NUM_RINGS];
bool ib_pool_ready;
@@ -2358,11 +2360,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);

/*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
* Registers read & write functions.
*/
#define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 03686fab842d..86699df7c8f3 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1213,6 +1213,7 @@ int radeon_device_init(struct radeon_device *rdev,
for (i = 0; i < RADEON_NUM_RINGS; i++) {
rdev->ring[i].idx = i;
}
+ rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);

DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1607,6 +1608,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
return 0;
}

+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
+{
+ uint32_t mask = 0;
+ int i;
+
+ if (!rdev->ddev->irq_enabled)
+ return mask;
+
+ /*
+ * increase refcount on sw interrupts for all rings to stop
+ * enabling interrupts in radeon_fence_enable_signaling during
+ * gpu reset.
+ */
+
+ for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+ if (!rdev->ring[i].ready)
+ continue;
+
+ atomic_inc(&rdev->irq.ring_int[i]);
+ mask |= 1 << i;
+ }
+ return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+ unsigned long irqflags;
+ int i;
+
+ if (!mask)
+ return;
+
+ /*
+ * undo refcount increase, and reset irqs to correct value.
+ */
+
+ for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+ if (!(mask & (1 << i)))
+ continue;
+
+ atomic_dec(&rdev->irq.ring_int[i]);
+ }
+
+ spin_lock_irqsave(&rdev->irq.lock, irqflags);
+ radeon_irq_set(rdev);
+ spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
/**
* radeon_gpu_reset - reset the asic
*
@@ -1624,6 +1673,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)

int i, r;
int resched;
+ uint32_t sw_mask;

down_write(&rdev->exclusive_lock);

@@ -1637,6 +1687,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
radeon_save_bios_scratch_regs(rdev);
/* block TTM */
resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+ sw_mask = radeon_gpu_mask_sw_irq(rdev);
radeon_pm_suspend(rdev);
radeon_suspend(rdev);

@@ -1686,13 +1737,20 @@ retry:
radeon_pm_resume(rdev);
drm_helper_resume_force_mode(rdev->ddev);

+ radeon_gpu_unmask_sw_irq(rdev, sw_mask);
ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
if (r) {
/* bad news, how to tell it to userspace ? */
dev_info(rdev->dev, "GPU reset failed\n");
}

- up_write(&rdev->exclusive_lock);
+ /*
+ * force all waiters to recheck, some may have been
+ * added while the exclusive_lock was unavailable
+ */
+ downgrade_write(&rdev->exclusive_lock);
+ wake_up_all(&rdev->fence_queue);
+ up_read(&rdev->exclusive_lock);
return r;
}

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 6435719fd45b..81c98f6ff0ca 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@
#include "radeon.h"
#include "radeon_trace.h"

+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+ ({ \
+ struct radeon_fence *__f; \
+ __f = container_of((p), struct radeon_fence, base); \
+ __f->base.ops == &radeon_fence_ops ? __f : NULL; \
+ })
+
/*
* Fences
* Fences mark an event in the GPUs pipeline and are used
@@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
struct radeon_fence **fence,
int ring)
{
+ u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
/* we are protected by the ring emission mutex */
*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
if ((*fence) == NULL) {
return -ENOMEM;
}
- kref_init(&((*fence)->kref));
- (*fence)->rdev = rdev;
- (*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
(*fence)->ring = ring;
+ fence_init(&(*fence)->base, &radeon_fence_ops,
+ &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
+ (*fence)->rdev = rdev;
+ (*fence)->seq = seq;
radeon_fence_ring_emit(rdev, ring, *fence);
trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
return 0;
}

/**
- * radeon_fence_process - process a fence
+ * radeon_fence_check_signaled - callback from fence_queue
*
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
- *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
*/
-void radeon_fence_process(struct radeon_device *rdev, int ring)
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+ struct radeon_fence *fence;
+ u64 seq;
+
+ fence = container_of(wait, struct radeon_fence, fence_wake);
+
+ seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+ if (seq >= fence->seq) {
+ int ret = fence_signal_locked(&fence->base);
+
+ if (!ret)
+ FENCE_TRACE(&fence->base, "signaled from irq context\n");
+ else
+ FENCE_TRACE(&fence->base, "was already signaled\n");
+
+ radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+ __remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+ fence_put(&fence->base);
+ } else
+ FENCE_TRACE(&fence->base, "pending\n");
+ return 0;
+}
+
+static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
{
uint64_t seq, last_seq, last_emitted;
unsigned count_loop = 0;
@@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
}
} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);

- if (wake)
- wake_up_all(&rdev->fence_queue);
+ return wake;
}

/**
- * radeon_fence_destroy - destroy a fence
+ * radeon_fence_process - process a fence
*
- * @kref: fence kref
+ * @rdev: radeon_device pointer
+ * @ring: ring index the fence is associated with
*
- * Frees the fence object (all asics).
+ * Checks the current fence value and wakes the fence queue
+ * if the sequence number has increased (all asics).
*/
-static void radeon_fence_destroy(struct kref *kref)
+void radeon_fence_process(struct radeon_device *rdev, int ring)
{
- struct radeon_fence *fence;
-
- fence = container_of(kref, struct radeon_fence, kref);
- kfree(fence);
+ if (__radeon_fence_process(rdev, ring))
+ wake_up_all(&rdev->fence_queue);
}

/**
@@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
return false;
}

+static bool __radeon_fence_signaled(struct fence *f)
+{
+ struct radeon_fence *fence = to_radeon_fence(f);
+ struct radeon_device *rdev = fence->rdev;
+ unsigned ring = fence->ring;
+ u64 seq = fence->seq;
+
+ if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+ return true;
+ }
+
+ if (down_read_trylock(&rdev->exclusive_lock)) {
+ radeon_fence_process(rdev, ring);
+ up_read(&rdev->exclusive_lock);
+
+ if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+ return true;
+ }
+ }
+ return false;
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @fence: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+ struct radeon_fence *fence = to_radeon_fence(f);
+ struct radeon_device *rdev = fence->rdev;
+
+ if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+ !rdev->ddev->irq_enabled)
+ return false;
+
+ radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+
+ if (down_read_trylock(&rdev->exclusive_lock)) {
+ if (__radeon_fence_process(rdev, fence->ring))
+ wake_up_all_locked(&rdev->fence_queue);
+
+ up_read(&rdev->exclusive_lock);
+ }
+
+ /* did fence get signaled after we enabled the sw irq? */
+ if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+ radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+ return false;
+ }
+
+ fence->fence_wake.flags = 0;
+ fence->fence_wake.private = NULL;
+ fence->fence_wake.func = radeon_fence_check_signaled;
+ __add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
+ fence_get(f);
+
+ return true;
+}
+
/**
* radeon_fence_signaled - check if a fence has signaled
*
@@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
if (!fence) {
return true;
}
- if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
- return true;
- }
+
if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
- fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+ int ret;
+
+ ret = fence_signal(&fence->base);
+ if (!ret)
+ FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
return true;
}
return false;
@@ -413,21 +511,18 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
uint64_t seq[RADEON_NUM_RINGS] = {};
long r;

- if (fence == NULL) {
- WARN(1, "Querying an invalid fence : %p !\n", fence);
- return -EINVAL;
- }
-
- seq[fence->ring] = fence->seq;
- if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+ if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
return 0;

+ seq[fence->ring] = fence->seq;
r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
if (r < 0) {
return r;
}

- fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+ r = fence_signal(&fence->base);
+ if (!r)
+ FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
return 0;
}

@@ -459,12 +554,13 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
continue;
}

+ if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
+ /* already signaled */
+ return 0;
+ }
+
seq[i] = fences[i]->seq;
++num_rings;
-
- /* test if something was allready signaled */
- if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
- return 0;
}

/* nothing to wait for ? */
@@ -545,7 +641,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
*/
struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
{
- kref_get(&fence->kref);
+ fence_get(&fence->base);
return fence;
}

@@ -561,9 +657,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
struct radeon_fence *tmp = *fence;

*fence = NULL;
- if (tmp) {
- kref_put(&tmp->kref, radeon_fence_destroy);
- }
+ if (tmp)
+ fence_put(&tmp->base);
}

/**
@@ -872,3 +967,51 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
return 0;
#endif
}
+
+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+ struct radeon_fence *fence = to_radeon_fence(f);
+ u64 target_seq[RADEON_NUM_RINGS] = {};
+ struct radeon_device *rdev = fence->rdev;
+ long r;
+
+ target_seq[fence->ring] = fence->seq;
+
+ down_read(&rdev->exclusive_lock);
+ r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
+
+ if (r > 0 && !fence_signal(&fence->base))
+ FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
+
+ up_read(&rdev->exclusive_lock);
+ return r;
+
+}
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+ return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+ struct radeon_fence *fence = to_radeon_fence(f);
+ switch (fence->ring) {
+ case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+ case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+ case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+ case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+ case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+ case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+ default: WARN_ON_ONCE(1); return "radeon.unk";
+ }
+}
+
+static const struct fence_ops radeon_fence_ops = {
+ .get_driver_name = radeon_fence_get_driver_name,
+ .get_timeline_name = radeon_fence_get_timeline_name,
+ .enable_signaling = radeon_fence_enable_signaling,
+ .signaled = __radeon_fence_signaled,
+ .wait = __radeon_fence_wait,
+ .release = NULL,
+};

2014-07-09 12:30:08

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 10/17] drm/qxl: rework to new fence interface

Final driver! \o/

This is not a proper dma_fence because the hardware may never signal
anything, so don't use dma-buf with qxl, ever.
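
To make the consequence concrete, here is a rough sketch of the two ways of
waiting on such a release fence (invented caller; cb and release_done are
made-up names, and a real caller would use one path or the other):

	struct fence *f = &release->base;
	struct fence_cb cb;	/* assumed to outlive the fence */

	/*
	 * In-driver waits go through qxl_fence_ops.wait (qxl_fence_wait in
	 * this patch), which keeps kicking the device with notify_oom and
	 * garbage collection, so they make forward progress without an
	 * interrupt.
	 */
	fence_wait(f, false);

	/*
	 * A callback-based waiter, which is what a dma-buf importer would
	 * rely on, gets no such help: enable_signaling is a no-op, so
	 * release_done() only runs once qxl_release_free() signals the
	 * fence, and the hardware may never cause that on its own.
	 */
	fence_add_callback(f, &cb, release_done);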

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/qxl/Makefile | 2
drivers/gpu/drm/qxl/qxl_cmd.c | 5 -
drivers/gpu/drm/qxl/qxl_debugfs.c | 12 ++-
drivers/gpu/drm/qxl/qxl_drv.h | 22 ++---
drivers/gpu/drm/qxl/qxl_fence.c | 87 -------------------
drivers/gpu/drm/qxl/qxl_kms.c | 2
drivers/gpu/drm/qxl/qxl_object.c | 2
drivers/gpu/drm/qxl/qxl_release.c | 166 ++++++++++++++++++++++++++++++++-----
drivers/gpu/drm/qxl/qxl_ttm.c | 97 ++++++++++++----------
9 files changed, 220 insertions(+), 175 deletions(-)
delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c

diff --git a/drivers/gpu/drm/qxl/Makefile b/drivers/gpu/drm/qxl/Makefile
index ea046ba691d2..ac0d74852e11 100644
--- a/drivers/gpu/drm/qxl/Makefile
+++ b/drivers/gpu/drm/qxl/Makefile
@@ -4,6 +4,6 @@

ccflags-y := -Iinclude/drm

-qxl-y := qxl_drv.o qxl_kms.o qxl_display.o qxl_ttm.o qxl_fb.o qxl_object.o qxl_gem.o qxl_cmd.o qxl_image.o qxl_draw.o qxl_debugfs.o qxl_irq.o qxl_dumb.o qxl_ioctl.o qxl_fence.o qxl_release.o
+qxl-y := qxl_drv.o qxl_kms.o qxl_display.o qxl_ttm.o qxl_fb.o qxl_object.o qxl_gem.o qxl_cmd.o qxl_image.o qxl_draw.o qxl_debugfs.o qxl_irq.o qxl_dumb.o qxl_ioctl.o qxl_release.o

obj-$(CONFIG_DRM_QXL)+= qxl.o
diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index 45fad7b45486..97823644d347 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -620,11 +620,6 @@ static int qxl_reap_surf(struct qxl_device *qdev, struct qxl_bo *surf, bool stal
if (ret == -EBUSY)
return -EBUSY;

- if (surf->fence.num_active_releases > 0 && stall == false) {
- qxl_bo_unreserve(surf);
- return -EBUSY;
- }
-
if (stall)
mutex_unlock(&qdev->surf_evict_mutex);

diff --git a/drivers/gpu/drm/qxl/qxl_debugfs.c b/drivers/gpu/drm/qxl/qxl_debugfs.c
index c3c2bbdc6674..0d144e0646d6 100644
--- a/drivers/gpu/drm/qxl/qxl_debugfs.c
+++ b/drivers/gpu/drm/qxl/qxl_debugfs.c
@@ -57,11 +57,21 @@ qxl_debugfs_buffers_info(struct seq_file *m, void *data)
struct qxl_device *qdev = node->minor->dev->dev_private;
struct qxl_bo *bo;

+ spin_lock(&qdev->release_lock);
list_for_each_entry(bo, &qdev->gem.objects, list) {
+ struct reservation_object_list *fobj;
+ int rel;
+
+ rcu_read_lock();
+ fobj = rcu_dereference(bo->tbo.resv->fence);
+ rel = fobj ? fobj->shared_count : 0;
+ rcu_read_unlock();
+
seq_printf(m, "size %ld, pc %d, sync obj %p, num releases %d\n",
(unsigned long)bo->gem_base.size, bo->pin_count,
- bo->tbo.sync_obj, bo->fence.num_active_releases);
+ bo->tbo.sync_obj, rel);
}
+ spin_unlock(&qdev->release_lock);
return 0;
}

diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 36ed40ba773f..d547cbdebeb4 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -31,6 +31,7 @@
* Definitions taken from spice-protocol, plus kernel driver specific bits.
*/

+#include <linux/fence.h>
#include <linux/workqueue.h>
#include <linux/firmware.h>
#include <linux/platform_device.h>
@@ -95,13 +96,6 @@ enum {
QXL_INTERRUPT_IO_CMD |\
QXL_INTERRUPT_CLIENT_MONITORS_CONFIG)

-struct qxl_fence {
- struct qxl_device *qdev;
- uint32_t num_active_releases;
- uint32_t *release_ids;
- struct radix_tree_root tree;
-};
-
struct qxl_bo {
/* Protected by gem.mutex */
struct list_head list;
@@ -113,13 +107,13 @@ struct qxl_bo {
unsigned pin_count;
void *kptr;
int type;
+
/* Constant after initialization */
struct drm_gem_object gem_base;
bool is_primary; /* is this now a primary surface */
bool hw_surf_alloc;
struct qxl_surface surf;
uint32_t surface_id;
- struct qxl_fence fence; /* per bo fence - list of releases */
struct qxl_release *surf_create;
};
#define gem_to_qxl_bo(gobj) container_of((gobj), struct qxl_bo, gem_base)
@@ -191,6 +185,8 @@ enum {
* spice-protocol/qxl_dev.h */
#define QXL_MAX_RES 96
struct qxl_release {
+ struct fence base;
+
int id;
int type;
uint32_t release_offset;
@@ -284,7 +280,11 @@ struct qxl_device {
uint8_t slot_gen_bits;
uint64_t va_slot_mask;

+ /* XXX: when rcu becomes available, release_lock can be killed */
+ spinlock_t release_lock;
+ spinlock_t fence_lock;
struct idr release_idr;
+ uint32_t release_seqno;
spinlock_t release_idr_lock;
struct mutex async_io_mutex;
unsigned int last_sent_io_cmd;
@@ -561,10 +561,4 @@ qxl_surface_lookup(struct drm_device *dev, int surface_id);
void qxl_surface_evict(struct qxl_device *qdev, struct qxl_bo *surf, bool freeing);
int qxl_update_surface(struct qxl_device *qdev, struct qxl_bo *surf);

-/* qxl_fence.c */
-void qxl_fence_add_release_locked(struct qxl_fence *qfence, uint32_t rel_id);
-int qxl_fence_remove_release(struct qxl_fence *qfence, uint32_t rel_id);
-int qxl_fence_init(struct qxl_device *qdev, struct qxl_fence *qfence);
-void qxl_fence_fini(struct qxl_fence *qfence);
-
#endif
diff --git a/drivers/gpu/drm/qxl/qxl_fence.c b/drivers/gpu/drm/qxl/qxl_fence.c
deleted file mode 100644
index c7248418117d..000000000000
--- a/drivers/gpu/drm/qxl/qxl_fence.c
+++ /dev/null
@@ -1,87 +0,0 @@
-/*
- * Copyright 2013 Red Hat Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- *
- * Authors: Dave Airlie
- * Alon Levy
- */
-
-
-#include "qxl_drv.h"
-
-/* QXL fencing-
-
- When we submit operations to the GPU we pass a release reference to the GPU
- with them, the release reference is then added to the release ring when
- the GPU is finished with that particular operation and has removed it from
- its tree.
-
- So we have can have multiple outstanding non linear fences per object.
-
- From a TTM POV we only care if the object has any outstanding releases on
- it.
-
- we wait until all outstanding releases are processeed.
-
- sync object is just a list of release ids that represent that fence on
- that buffer.
-
- we just add new releases onto the sync object attached to the object.
-
- This currently uses a radix tree to store the list of release ids.
-
- For some reason every so often qxl hw fails to release, things go wrong.
-*/
-/* must be called with the fence lock held */
-void qxl_fence_add_release_locked(struct qxl_fence *qfence, uint32_t rel_id)
-{
- radix_tree_insert(&qfence->tree, rel_id, qfence);
- qfence->num_active_releases++;
-}
-
-int qxl_fence_remove_release(struct qxl_fence *qfence, uint32_t rel_id)
-{
- void *ret;
- int retval = 0;
-
- ret = radix_tree_delete(&qfence->tree, rel_id);
- if (ret == qfence)
- qfence->num_active_releases--;
- else {
- DRM_DEBUG("didn't find fence in radix tree for %d\n", rel_id);
- retval = -ENOENT;
- }
- return retval;
-}
-
-
-int qxl_fence_init(struct qxl_device *qdev, struct qxl_fence *qfence)
-{
- qfence->qdev = qdev;
- qfence->num_active_releases = 0;
- INIT_RADIX_TREE(&qfence->tree, GFP_ATOMIC);
- return 0;
-}
-
-void qxl_fence_fini(struct qxl_fence *qfence)
-{
- kfree(qfence->release_ids);
- qfence->num_active_releases = 0;
-}
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index fd88eb4a3f79..a9e7c30e92c5 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -223,6 +223,8 @@ static int qxl_device_init(struct qxl_device *qdev,

idr_init(&qdev->release_idr);
spin_lock_init(&qdev->release_idr_lock);
+ spin_lock_init(&qdev->release_lock);
+ spin_lock_init(&qdev->fence_lock);

idr_init(&qdev->surf_id_idr);
spin_lock_init(&qdev->surf_id_idr_lock);
diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
index b95f144f0b49..9981962451d7 100644
--- a/drivers/gpu/drm/qxl/qxl_object.c
+++ b/drivers/gpu/drm/qxl/qxl_object.c
@@ -36,7 +36,6 @@ static void qxl_ttm_bo_destroy(struct ttm_buffer_object *tbo)
qdev = (struct qxl_device *)bo->gem_base.dev->dev_private;

qxl_surface_evict(qdev, bo, false);
- qxl_fence_fini(&bo->fence);
mutex_lock(&qdev->gem.mutex);
list_del_init(&bo->list);
mutex_unlock(&qdev->gem.mutex);
@@ -99,7 +98,6 @@ int qxl_bo_create(struct qxl_device *qdev,
bo->type = domain;
bo->pin_count = pinned ? 1 : 0;
bo->surface_id = 0;
- qxl_fence_init(qdev, &bo->fence);
INIT_LIST_HEAD(&bo->list);

if (surf)
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 4045ba873ab8..9731d2540a40 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -21,6 +21,7 @@
*/
#include "qxl_drv.h"
#include "qxl_object.h"
+#include <trace/events/fence.h>

/*
* drawable cmd cache - allocate a bunch of VRAM pages, suballocate
@@ -39,6 +40,88 @@
static const int release_size_per_bo[] = { RELEASE_SIZE, SURFACE_RELEASE_SIZE, RELEASE_SIZE };
static const int releases_per_bo[] = { RELEASES_PER_BO, SURFACE_RELEASES_PER_BO, RELEASES_PER_BO };

+static const char *qxl_get_driver_name(struct fence *fence)
+{
+ return "qxl";
+}
+
+static const char *qxl_get_timeline_name(struct fence *fence)
+{
+ return "release";
+}
+
+static bool qxl_nop_signaling(struct fence *fence)
+{
+ /* fences are always automatically signaled, so just pretend we did this.. */
+ return true;
+}
+
+static long qxl_fence_wait(struct fence *fence, bool intr, signed long timeout)
+{
+ struct qxl_device *qdev;
+ struct qxl_release *release;
+ int count = 0, sc = 0;
+ bool have_drawable_releases;
+ unsigned long cur, end = jiffies + timeout;
+
+ qdev = container_of(fence->lock, struct qxl_device, release_lock);
+ release = container_of(fence, struct qxl_release, base);
+ have_drawable_releases = release->type == QXL_RELEASE_DRAWABLE;
+
+retry:
+ sc++;
+
+ if (fence_is_signaled_locked(fence))
+ goto signaled;
+
+ qxl_io_notify_oom(qdev);
+
+ for (count = 0; count < 11; count++) {
+ if (!qxl_queue_garbage_collect(qdev, true))
+ break;
+
+ if (fence_is_signaled_locked(fence))
+ goto signaled;
+ }
+
+ if (fence_is_signaled_locked(fence))
+ goto signaled;
+
+ if (have_drawable_releases || sc < 4) {
+ if (sc > 2)
+ /* back off */
+ usleep_range(500, 1000);
+
+ if (time_after(jiffies, end))
+ return 0;
+
+ if (have_drawable_releases && sc > 300) {
+ FENCE_WARN(fence, "failed to wait on release %d "
+ "after spincount %d\n",
+ fence->context & ~0xf0000000, sc);
+ goto signaled;
+ }
+ goto retry;
+ }
+ /*
+ * yeah, original sync_obj_wait gave up after 3 spins when
+ * have_drawable_releases is not set.
+ */
+
+signaled:
+ cur = jiffies;
+ if (time_after(cur, end))
+ return 0;
+ return end - cur;
+}
+
+static const struct fence_ops qxl_fence_ops = {
+ .get_driver_name = qxl_get_driver_name,
+ .get_timeline_name = qxl_get_timeline_name,
+ .enable_signaling = qxl_nop_signaling,
+ .wait = qxl_fence_wait,
+};
+
static uint64_t
qxl_release_alloc(struct qxl_device *qdev, int type,
struct qxl_release **ret)
@@ -46,13 +129,13 @@ qxl_release_alloc(struct qxl_device *qdev, int type,
struct qxl_release *release;
int handle;
size_t size = sizeof(*release);
- int idr_ret;

release = kmalloc(size, GFP_KERNEL);
if (!release) {
DRM_ERROR("Out of memory\n");
return 0;
}
+ release->base.ops = NULL;
release->type = type;
release->release_offset = 0;
release->surface_release_id = 0;
@@ -60,44 +143,59 @@ qxl_release_alloc(struct qxl_device *qdev, int type,

idr_preload(GFP_KERNEL);
spin_lock(&qdev->release_idr_lock);
- idr_ret = idr_alloc(&qdev->release_idr, release, 1, 0, GFP_NOWAIT);
+ handle = idr_alloc(&qdev->release_idr, release, 1, 0, GFP_NOWAIT);
+ release->base.seqno = ++qdev->release_seqno;
spin_unlock(&qdev->release_idr_lock);
idr_preload_end();
- handle = idr_ret;
- if (idr_ret < 0)
- goto release_fail;
+ if (handle < 0) {
+ kfree(release);
+ *ret = NULL;
+ return handle;
+ }
*ret = release;
QXL_INFO(qdev, "allocated release %lld\n", handle);
release->id = handle;
-release_fail:
-
return handle;
}

+static void
+qxl_release_free_list(struct qxl_release *release)
+{
+ while (!list_empty(&release->bos)) {
+ struct ttm_validate_buffer *entry;
+
+ entry = container_of(release->bos.next,
+ struct ttm_validate_buffer, head);
+
+ list_del(&entry->head);
+ kfree(entry);
+ }
+}
+
void
qxl_release_free(struct qxl_device *qdev,
struct qxl_release *release)
{
- struct qxl_bo_list *entry, *tmp;
QXL_INFO(qdev, "release %d, type %d\n", release->id,
release->type);

if (release->surface_release_id)
qxl_surface_id_dealloc(qdev, release->surface_release_id);

- list_for_each_entry_safe(entry, tmp, &release->bos, tv.head) {
- struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
- QXL_INFO(qdev, "release %llx\n",
- drm_vma_node_offset_addr(&entry->tv.bo->vma_node)
- - DRM_FILE_OFFSET);
- qxl_fence_remove_release(&bo->fence, release->id);
- qxl_bo_unref(&bo);
- kfree(entry);
- }
spin_lock(&qdev->release_idr_lock);
idr_remove(&qdev->release_idr, release->id);
spin_unlock(&qdev->release_idr_lock);
- kfree(release);
+
+ if (release->base.ops) {
+ WARN_ON(list_empty(&release->bos));
+ qxl_release_free_list(release);
+
+ fence_signal(&release->base);
+ fence_put(&release->base);
+ } else {
+ qxl_release_free_list(release);
+ kfree(release);
+ }
}

static int qxl_release_bo_alloc(struct qxl_device *qdev,
@@ -142,6 +240,10 @@ static int qxl_release_validate_bo(struct qxl_bo *bo)
return ret;
}

+ ret = reservation_object_reserve_shared(bo->tbo.resv);
+ if (ret)
+ return ret;
+
/* allocate a surface for reserved + validated buffers */
ret = qxl_bo_check_id(bo->gem_base.dev->dev_private, bo);
if (ret)
@@ -199,6 +301,8 @@ int qxl_alloc_surface_release_reserved(struct qxl_device *qdev,

/* stash the release after the create command */
idr_ret = qxl_release_alloc(qdev, QXL_RELEASE_SURFACE_CMD, release);
+ if (idr_ret < 0)
+ return idr_ret;
bo = qxl_bo_ref(to_qxl_bo(entry->tv.bo));

(*release)->release_offset = create_rel->release_offset + 64;
@@ -239,6 +343,11 @@ int qxl_alloc_release_reserved(struct qxl_device *qdev, unsigned long size,
}

idr_ret = qxl_release_alloc(qdev, type, release);
+ if (idr_ret < 0) {
+ if (rbo)
+ *rbo = NULL;
+ return idr_ret;
+ }

mutex_lock(&qdev->release_mutex);
if (qdev->current_release_bo_offset[cur_idx] + 1 >= releases_per_bo[cur_idx]) {
@@ -319,12 +428,13 @@ void qxl_release_unmap(struct qxl_device *qdev,

void qxl_release_fence_buffer_objects(struct qxl_release *release)
{
- struct ttm_validate_buffer *entry;
struct ttm_buffer_object *bo;
struct ttm_bo_global *glob;
struct ttm_bo_device *bdev;
struct ttm_bo_driver *driver;
struct qxl_bo *qbo;
+ struct ttm_validate_buffer *entry;
+ struct qxl_device *qdev;

/* if only one object on the release its the release itself
since these objects are pinned no need to reserve */
@@ -333,23 +443,35 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)

bo = list_first_entry(&release->bos, struct ttm_validate_buffer, head)->bo;
bdev = bo->bdev;
+ qdev = container_of(bdev, struct qxl_device, mman.bdev);
+
+ /*
+ * Since we never really allocated a context and we don't want to conflict,
+ * set the highest bits. This will break if we really allow exporting of dma-bufs.
+ */
+ fence_init(&release->base, &qxl_fence_ops, &qdev->release_lock,
+ release->id | 0xf0000000, release->base.seqno);
+ trace_fence_emit(&release->base);
+
driver = bdev->driver;
glob = bo->glob;

spin_lock(&glob->lru_lock);
+ /* acquire release_lock to protect bo->resv->fence and its contents */
+ spin_lock(&qdev->release_lock);

list_for_each_entry(entry, &release->bos, head) {
bo = entry->bo;
qbo = to_qxl_bo(bo);

if (!entry->bo->sync_obj)
- entry->bo->sync_obj = &qbo->fence;
-
- qxl_fence_add_release_locked(&qbo->fence, release->id);
+ entry->bo->sync_obj = qbo;

+ reservation_object_add_shared_fence(bo->resv, &release->base);
ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
}
+ spin_unlock(&qdev->release_lock);
spin_unlock(&glob->lru_lock);
ww_acquire_fini(&release->ticket);
}
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index 71a1baeac14e..6230251fa5b0 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -355,67 +355,67 @@ static int qxl_bo_move(struct ttm_buffer_object *bo,
return ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);
}

+static bool qxl_sync_obj_signaled(void *sync_obj);

static int qxl_sync_obj_wait(void *sync_obj,
bool lazy, bool interruptible)
{
- struct qxl_fence *qfence = (struct qxl_fence *)sync_obj;
- int count = 0, sc = 0;
- struct qxl_bo *bo = container_of(qfence, struct qxl_bo, fence);
-
- if (qfence->num_active_releases == 0)
- return 0;
+ struct qxl_bo *bo = (struct qxl_bo *)sync_obj;
+ struct qxl_device *qdev = bo->gem_base.dev->dev_private;
+ struct reservation_object_list *fobj;
+ int count = 0, sc = 0, num_release = 0;
+ bool have_drawable_releases;

retry:
if (sc == 0) {
if (bo->type == QXL_GEM_DOMAIN_SURFACE)
- qxl_update_surface(qfence->qdev, bo);
+ qxl_update_surface(qdev, bo);
} else if (sc >= 1) {
- qxl_io_notify_oom(qfence->qdev);
+ qxl_io_notify_oom(qdev);
}

sc++;

for (count = 0; count < 10; count++) {
- bool ret;
- ret = qxl_queue_garbage_collect(qfence->qdev, true);
- if (ret == false)
- break;
-
- if (qfence->num_active_releases == 0)
+ if (qxl_sync_obj_signaled(sync_obj))
return 0;
+
+ if (!qxl_queue_garbage_collect(qdev, true))
+ break;
}

- if (qfence->num_active_releases) {
- bool have_drawable_releases = false;
- void **slot;
- struct radix_tree_iter iter;
- int release_id;
+ have_drawable_releases = false;
+ num_release = 0;

- radix_tree_for_each_slot(slot, &qfence->tree, &iter, 0) {
- struct qxl_release *release;
+ spin_lock(&qdev->release_lock);
+ fobj = bo->tbo.resv->fence;
+ for (count = 0; fobj && count < fobj->shared_count; count++) {
+ struct qxl_release *release;

- release_id = iter.index;
- release = qxl_release_from_id_locked(qfence->qdev, release_id);
- if (release == NULL)
- continue;
+ release = container_of(fobj->shared[count],
+ struct qxl_release, base);

- if (release->type == QXL_RELEASE_DRAWABLE)
- have_drawable_releases = true;
- }
+ if (fence_is_signaled(&release->base))
+ continue;
+
+ num_release++;
+
+ if (release->type == QXL_RELEASE_DRAWABLE)
+ have_drawable_releases = true;
+ }
+ spin_unlock(&qdev->release_lock);
+
+ qxl_queue_garbage_collect(qdev, true);

- qxl_queue_garbage_collect(qfence->qdev, true);
-
- if (have_drawable_releases || sc < 4) {
- if (sc > 2)
- /* back off */
- usleep_range(500, 1000);
- if (have_drawable_releases && sc > 300) {
- WARN(1, "sync obj %d still has outstanding releases %d %d %d %ld %d\n", sc, bo->surface_id, bo->is_primary, bo->pin_count, (unsigned long)bo->gem_base.size, qfence->num_active_releases);
- return -EBUSY;
- }
- goto retry;
+ if (have_drawable_releases || sc < 4) {
+ if (sc > 2)
+ /* back off */
+ usleep_range(500, 1000);
+ if (have_drawable_releases && sc > 300) {
+ WARN(1, "sync obj %d still has outstanding releases %d %d %d %ld %d\n", sc, bo->surface_id, bo->is_primary, bo->pin_count, (unsigned long)bo->gem_base.size, num_release);
+ return -EBUSY;
}
+ goto retry;
}
return 0;
}
@@ -437,8 +437,21 @@ static void *qxl_sync_obj_ref(void *sync_obj)

static bool qxl_sync_obj_signaled(void *sync_obj)
{
- struct qxl_fence *qfence = (struct qxl_fence *)sync_obj;
- return (qfence->num_active_releases == 0);
+ struct qxl_bo *qbo = (struct qxl_bo *)sync_obj;
+ struct qxl_device *qdev = qbo->gem_base.dev->dev_private;
+ struct reservation_object_list *fobj;
+ bool ret = true;
+ unsigned i;
+
+ spin_lock(&qdev->release_lock);
+ fobj = qbo->tbo.resv->fence;
+ for (i = 0; fobj && i < fobj->shared_count; ++i) {
+ ret = fence_is_signaled(fobj->shared[i]);
+ if (!ret)
+ break;
+ }
+ spin_unlock(&qdev->release_lock);
+ return ret;
}

static void qxl_bo_move_notify(struct ttm_buffer_object *bo,
@@ -475,8 +488,6 @@ static struct ttm_bo_driver qxl_bo_driver = {
.move_notify = &qxl_bo_move_notify,
};

-
-
int qxl_ttm_init(struct qxl_device *qdev)
{
int r;

2014-07-09 12:30:12

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 11/17] drm/vmwgfx: get rid of different types of fence_flags entirely

Only one type was ever used. This is needed to simplify the fence
support in the next commit.
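
Since DRM_VMW_FENCE_FLAG_EXEC was the only flag ever passed, the mask
carries no information and the wait/signaled calls simply lose the
argument, roughly:

	/* before */
	vmw_fence_obj_wait(fence, DRM_VMW_FENCE_FLAG_EXEC, lazy,
			   interruptible, VMW_FENCE_WAIT_TIMEOUT);

	/* after */
	vmw_fence_obj_wait(fence, lazy, interruptible,
			   VMW_FENCE_WAIT_TIMEOUT);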

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 5 +--
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 1 -
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 14 ++-------
drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 50 ++++++++++++-------------------
drivers/gpu/drm/vmwgfx/vmwgfx_fence.h | 8 +----
5 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 4a36bb1dc525..f15718cc631d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -792,15 +792,12 @@ static int vmw_sync_obj_flush(void *sync_obj)

static bool vmw_sync_obj_signaled(void *sync_obj)
{
- return vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj,
- DRM_VMW_FENCE_FLAG_EXEC);
-
+ return vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj);
}

static int vmw_sync_obj_wait(void *sync_obj, bool lazy, bool interruptible)
{
return vmw_fence_obj_wait((struct vmw_fence_obj *) sync_obj,
- DRM_VMW_FENCE_FLAG_EXEC,
lazy, interruptible,
VMW_FENCE_WAIT_TIMEOUT);
}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 6b252a887ae2..f217e9723b9e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -332,7 +332,6 @@ struct vmw_sw_context{
uint32_t *cmd_bounce;
uint32_t cmd_bounce_size;
struct list_head resource_list;
- uint32_t fence_flags;
struct ttm_buffer_object *cur_query_bo;
struct list_head res_relocations;
uint32_t *buf_start;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index f8b25bc4e634..db30b790ad24 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -350,8 +350,6 @@ static int vmw_bo_to_validate_list(struct vmw_sw_context *sw_context,
vval_buf->validate_as_mob = validate_as_mob;
}

- sw_context->fence_flags |= DRM_VMW_FENCE_FLAG_EXEC;
-
if (p_val_node)
*p_val_node = val_node;

@@ -2308,13 +2306,9 @@ int vmw_execbuf_fence_commands(struct drm_file *file_priv,

if (p_handle != NULL)
ret = vmw_user_fence_create(file_priv, dev_priv->fman,
- sequence,
- DRM_VMW_FENCE_FLAG_EXEC,
- p_fence, p_handle);
+ sequence, p_fence, p_handle);
else
- ret = vmw_fence_create(dev_priv->fman, sequence,
- DRM_VMW_FENCE_FLAG_EXEC,
- p_fence);
+ ret = vmw_fence_create(dev_priv->fman, sequence, p_fence);

if (unlikely(ret != 0 && !synced)) {
(void) vmw_fallback_wait(dev_priv, false, false,
@@ -2387,8 +2381,7 @@ vmw_execbuf_copy_fence_user(struct vmw_private *dev_priv,
ttm_ref_object_base_unref(vmw_fp->tfile,
fence_handle, TTM_REF_USAGE);
DRM_ERROR("Fence copy error. Syncing.\n");
- (void) vmw_fence_obj_wait(fence, fence->signal_mask,
- false, false,
+ (void) vmw_fence_obj_wait(fence, false, false,
VMW_FENCE_WAIT_TIMEOUT);
}
}
@@ -2438,7 +2431,6 @@ int vmw_execbuf_process(struct drm_file *file_priv,
sw_context->fp = vmw_fpriv(file_priv);
sw_context->cur_reloc = 0;
sw_context->cur_val_buf = 0;
- sw_context->fence_flags = 0;
INIT_LIST_HEAD(&sw_context->resource_list);
sw_context->cur_query_bo = dev_priv->pinned_bo;
sw_context->last_query_ctx = NULL;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 436b013b4231..05b9eea8e875 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -207,9 +207,7 @@ void vmw_fence_manager_takedown(struct vmw_fence_manager *fman)
}

static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
- struct vmw_fence_obj *fence,
- u32 seqno,
- uint32_t mask,
+ struct vmw_fence_obj *fence, u32 seqno,
void (*destroy) (struct vmw_fence_obj *fence))
{
unsigned long irq_flags;
@@ -220,7 +218,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
INIT_LIST_HEAD(&fence->seq_passed_actions);
fence->fman = fman;
fence->signaled = 0;
- fence->signal_mask = mask;
kref_init(&fence->kref);
fence->destroy = destroy;
init_waitqueue_head(&fence->queue);
@@ -356,7 +353,7 @@ static bool vmw_fence_goal_check_locked(struct vmw_fence_obj *fence)
u32 goal_seqno;
__le32 __iomem *fifo_mem;

- if (fence->signaled & DRM_VMW_FENCE_FLAG_EXEC)
+ if (fence->signaled)
return false;

fifo_mem = fence->fman->dev_priv->mmio_virt;
@@ -386,7 +383,7 @@ rerun:
list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
if (seqno - fence->seqno < VMW_FENCE_WRAP) {
list_del_init(&fence->head);
- fence->signaled |= DRM_VMW_FENCE_FLAG_EXEC;
+ fence->signaled = 1;
INIT_LIST_HEAD(&action_list);
list_splice_init(&fence->seq_passed_actions,
&action_list);
@@ -417,8 +414,7 @@ rerun:
}
}

-bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence,
- uint32_t flags)
+bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
{
struct vmw_fence_manager *fman = fence->fman;
unsigned long irq_flags;
@@ -428,28 +424,25 @@ bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence,
signaled = fence->signaled;
spin_unlock_irqrestore(&fman->lock, irq_flags);

- flags &= fence->signal_mask;
- if ((signaled & flags) == flags)
+ if (signaled)
return 1;

- if ((signaled & DRM_VMW_FENCE_FLAG_EXEC) == 0)
- vmw_fences_update(fman);
+ vmw_fences_update(fman);

spin_lock_irqsave(&fman->lock, irq_flags);
signaled = fence->signaled;
spin_unlock_irqrestore(&fman->lock, irq_flags);

- return ((signaled & flags) == flags);
+ return signaled;
}

-int vmw_fence_obj_wait(struct vmw_fence_obj *fence,
- uint32_t flags, bool lazy,
+int vmw_fence_obj_wait(struct vmw_fence_obj *fence, bool lazy,
bool interruptible, unsigned long timeout)
{
struct vmw_private *dev_priv = fence->fman->dev_priv;
long ret;

- if (likely(vmw_fence_obj_signaled(fence, flags)))
+ if (likely(vmw_fence_obj_signaled(fence)))
return 0;

vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
@@ -458,12 +451,12 @@ int vmw_fence_obj_wait(struct vmw_fence_obj *fence,
if (interruptible)
ret = wait_event_interruptible_timeout
(fence->queue,
- vmw_fence_obj_signaled(fence, flags),
+ vmw_fence_obj_signaled(fence),
timeout);
else
ret = wait_event_timeout
(fence->queue,
- vmw_fence_obj_signaled(fence, flags),
+ vmw_fence_obj_signaled(fence),
timeout);

vmw_seqno_waiter_remove(dev_priv);
@@ -497,7 +490,6 @@ static void vmw_fence_destroy(struct vmw_fence_obj *fence)

int vmw_fence_create(struct vmw_fence_manager *fman,
uint32_t seqno,
- uint32_t mask,
struct vmw_fence_obj **p_fence)
{
struct ttm_mem_global *mem_glob = vmw_mem_glob(fman->dev_priv);
@@ -515,7 +507,7 @@ int vmw_fence_create(struct vmw_fence_manager *fman,
goto out_no_object;
}

- ret = vmw_fence_obj_init(fman, fence, seqno, mask,
+ ret = vmw_fence_obj_init(fman, fence, seqno,
vmw_fence_destroy);
if (unlikely(ret != 0))
goto out_err_init;
@@ -559,7 +551,6 @@ static void vmw_user_fence_base_release(struct ttm_base_object **p_base)
int vmw_user_fence_create(struct drm_file *file_priv,
struct vmw_fence_manager *fman,
uint32_t seqno,
- uint32_t mask,
struct vmw_fence_obj **p_fence,
uint32_t *p_handle)
{
@@ -586,7 +577,7 @@ int vmw_user_fence_create(struct drm_file *file_priv,
}

ret = vmw_fence_obj_init(fman, &ufence->fence, seqno,
- mask, vmw_user_fence_destroy);
+ vmw_user_fence_destroy);
if (unlikely(ret != 0)) {
kfree(ufence);
goto out_no_object;
@@ -647,13 +638,12 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
kref_get(&fence->kref);
spin_unlock_irq(&fman->lock);

- ret = vmw_fence_obj_wait(fence, fence->signal_mask,
- false, false,
+ ret = vmw_fence_obj_wait(fence, false, false,
VMW_FENCE_WAIT_TIMEOUT);

if (unlikely(ret != 0)) {
list_del_init(&fence->head);
- fence->signaled |= DRM_VMW_FENCE_FLAG_EXEC;
+ fence->signaled = 1;
INIT_LIST_HEAD(&action_list);
list_splice_init(&fence->seq_passed_actions,
&action_list);
@@ -716,14 +706,14 @@ int vmw_fence_obj_wait_ioctl(struct drm_device *dev, void *data,

timeout = jiffies;
if (time_after_eq(timeout, (unsigned long)arg->kernel_cookie)) {
- ret = ((vmw_fence_obj_signaled(fence, arg->flags)) ?
+ ret = ((vmw_fence_obj_signaled(fence)) ?
0 : -EBUSY);
goto out;
}

timeout = (unsigned long)arg->kernel_cookie - timeout;

- ret = vmw_fence_obj_wait(fence, arg->flags, arg->lazy, true, timeout);
+ ret = vmw_fence_obj_wait(fence, arg->lazy, true, timeout);

out:
ttm_base_object_unref(&base);
@@ -760,10 +750,10 @@ int vmw_fence_obj_signaled_ioctl(struct drm_device *dev, void *data,
fence = &(container_of(base, struct vmw_user_fence, base)->fence);
fman = fence->fman;

- arg->signaled = vmw_fence_obj_signaled(fence, arg->flags);
+ arg->signaled = vmw_fence_obj_signaled(fence);
spin_lock_irq(&fman->lock);

- arg->signaled_flags = fence->signaled;
+ arg->signaled_flags = arg->flags;
arg->passed_seqno = dev_priv->last_read_seqno;
spin_unlock_irq(&fman->lock);

@@ -908,7 +898,7 @@ static void vmw_fence_obj_add_action(struct vmw_fence_obj *fence,
spin_lock_irqsave(&fman->lock, irq_flags);

fman->pending_actions[action->type]++;
- if (fence->signaled & DRM_VMW_FENCE_FLAG_EXEC) {
+ if (fence->signaled) {
struct list_head action_list;

INIT_LIST_HEAD(&action_list);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
index faf2e7873860..8c18d32bd1c3 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
@@ -56,7 +56,6 @@ struct vmw_fence_obj {
struct vmw_fence_manager *fman;
struct list_head head;
uint32_t signaled;
- uint32_t signal_mask;
struct list_head seq_passed_actions;
void (*destroy)(struct vmw_fence_obj *fence);
wait_queue_head_t queue;
@@ -74,10 +73,9 @@ vmw_fence_obj_reference(struct vmw_fence_obj *fence);

extern void vmw_fences_update(struct vmw_fence_manager *fman);

-extern bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence,
- uint32_t flags);
+extern bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence);

-extern int vmw_fence_obj_wait(struct vmw_fence_obj *fence, uint32_t flags,
+extern int vmw_fence_obj_wait(struct vmw_fence_obj *fence,
bool lazy,
bool interruptible, unsigned long timeout);

@@ -85,13 +83,11 @@ extern void vmw_fence_obj_flush(struct vmw_fence_obj *fence);

extern int vmw_fence_create(struct vmw_fence_manager *fman,
uint32_t seqno,
- uint32_t mask,
struct vmw_fence_obj **p_fence);

extern int vmw_user_fence_create(struct drm_file *file_priv,
struct vmw_fence_manager *fman,
uint32_t sequence,
- uint32_t mask,
struct vmw_fence_obj **p_fence,
uint32_t *p_handle);

2014-07-09 12:30:27

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 13/17] drm/ttm: flip the switch, and convert to dma_fence


---
drivers/gpu/drm/nouveau/nouveau_bo.c | 48 +-------
drivers/gpu/drm/nouveau/nouveau_fence.c | 24 +---
drivers/gpu/drm/nouveau/nouveau_fence.h | 2
drivers/gpu/drm/nouveau/nouveau_gem.c | 16 ++-
drivers/gpu/drm/qxl/qxl_debugfs.c | 6 +
drivers/gpu/drm/qxl/qxl_drv.h | 2
drivers/gpu/drm/qxl/qxl_kms.c | 1
drivers/gpu/drm/qxl/qxl_object.h | 4 -
drivers/gpu/drm/qxl/qxl_release.c | 3 -
drivers/gpu/drm/qxl/qxl_ttm.c | 104 ------------------
drivers/gpu/drm/radeon/radeon_cs.c | 10 +-
drivers/gpu/drm/radeon/radeon_display.c | 25 +++-
drivers/gpu/drm/radeon/radeon_object.c | 4 -
drivers/gpu/drm/radeon/radeon_ttm.c | 34 ------
drivers/gpu/drm/radeon/radeon_uvd.c | 8 +
drivers/gpu/drm/radeon/radeon_vm.c | 14 ++
drivers/gpu/drm/ttm/ttm_bo.c | 171 +++++++++++++++++++++---------
drivers/gpu/drm/ttm/ttm_bo_util.c | 23 +---
drivers/gpu/drm/ttm/ttm_execbuf_util.c | 10 --
drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 40 -------
drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 14 +-
include/drm/ttm/ttm_bo_api.h | 2
include/drm/ttm/ttm_bo_driver.h | 26 -----
include/drm/ttm/ttm_execbuf_util.h | 10 +-
24 files changed, 208 insertions(+), 393 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 84aba3fa1bd0..5b8ccc39a282 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -92,13 +92,13 @@ nv10_bo_get_tile_region(struct drm_device *dev, int i)

static void
nv10_bo_put_tile_region(struct drm_device *dev, struct nouveau_drm_tile *tile,
- struct nouveau_fence *fence)
+ struct fence *fence)
{
struct nouveau_drm *drm = nouveau_drm(dev);

if (tile) {
spin_lock(&drm->tile.lock);
- tile->fence = nouveau_fence_ref(fence);
+ tile->fence = nouveau_fence_ref((struct nouveau_fence *)fence);
tile->used = false;
spin_unlock(&drm->tile.lock);
}
@@ -965,7 +965,8 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
if (ret == 0) {
ret = nouveau_fence_new(chan, false, &fence);
if (ret == 0) {
- ret = ttm_bo_move_accel_cleanup(bo, fence,
+ ret = ttm_bo_move_accel_cleanup(bo,
+ &fence->base,
evict,
no_wait_gpu,
new_mem);
@@ -1151,8 +1152,9 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo,
{
struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
struct drm_device *dev = drm->dev;
+ struct fence *fence = reservation_object_get_excl(bo->resv);

- nv10_bo_put_tile_region(dev, *old_tile, bo->sync_obj);
+ nv10_bo_put_tile_region(dev, *old_tile, fence);
*old_tile = new_tile;
}

@@ -1423,47 +1425,14 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
ttm_pool_unpopulate(ttm);
}

-static void
-nouveau_bo_fence_unref(void **sync_obj)
-{
- nouveau_fence_unref((struct nouveau_fence **)sync_obj);
-}
-
void
nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
{
struct reservation_object *resv = nvbo->bo.resv;

- nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
- nvbo->bo.sync_obj = nouveau_fence_ref(fence);
-
reservation_object_add_excl_fence(resv, &fence->base);
}

-static void *
-nouveau_bo_fence_ref(void *sync_obj)
-{
- return nouveau_fence_ref(sync_obj);
-}
-
-static bool
-nouveau_bo_fence_signalled(void *sync_obj)
-{
- return nouveau_fence_done(sync_obj);
-}
-
-static int
-nouveau_bo_fence_wait(void *sync_obj, bool lazy, bool intr)
-{
- return nouveau_fence_wait(sync_obj, lazy, intr);
-}
-
-static int
-nouveau_bo_fence_flush(void *sync_obj)
-{
- return 0;
-}
-
struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
.ttm_tt_populate = &nouveau_ttm_tt_populate,
@@ -1474,11 +1443,6 @@ struct ttm_bo_driver nouveau_bo_driver = {
.move_notify = nouveau_bo_move_ntfy,
.move = nouveau_bo_move,
.verify_access = nouveau_bo_verify_access,
- .sync_obj_signaled = nouveau_bo_fence_signalled,
- .sync_obj_wait = nouveau_bo_fence_wait,
- .sync_obj_flush = nouveau_bo_fence_flush,
- .sync_obj_unref = nouveau_bo_fence_unref,
- .sync_obj_ref = nouveau_bo_fence_ref,
.fault_reserve_notify = &nouveau_ttm_fault_reserve_notify,
.io_mem_reserve = &nouveau_ttm_io_mem_reserve,
.io_mem_free = &nouveau_ttm_io_mem_free,
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index d24f8ce4341a..9f92ad37637d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -139,17 +139,18 @@ static bool nouveau_fence_is_signaled(struct fence *f)
}

void
-nouveau_fence_work(struct nouveau_fence *fence,
+nouveau_fence_work(struct fence *fence,
void (*func)(void *), void *data)
{
struct nouveau_fence_work *work;

- if (fence_is_signaled(&fence->base))
+ if (fence_is_signaled(fence))
goto err;

work = kmalloc(sizeof(*work), GFP_KERNEL);
if (!work) {
- WARN_ON(nouveau_fence_wait(fence, false, false));
+ WARN_ON(nouveau_fence_wait((struct nouveau_fence *)fence,
+ false, false));
goto err;
}

@@ -157,7 +158,7 @@ nouveau_fence_work(struct nouveau_fence *fence,
work->func = func;
work->data = data;

- if (fence_add_callback(&fence->base, &work->cb, nouveau_fence_work_cb) < 0)
+ if (fence_add_callback(fence, &work->cb, nouveau_fence_work_cb) < 0)
goto err_free;
return;

@@ -322,14 +323,9 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan)
struct reservation_object_list *fobj;
int ret = 0, i;

- fence = nvbo->bo.sync_obj;
- if (fence && fence_is_signaled(fence)) {
- nouveau_fence_unref((struct nouveau_fence **)
- &nvbo->bo.sync_obj);
- fence = NULL;
- }
+ fence = reservation_object_get_excl(resv);

- if (fence) {
+ if (fence && !fence_is_signaled(fence)) {
struct nouveau_fence *f = container_of(fence,
struct nouveau_fence,
base);
@@ -345,12 +341,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan)
if (ret)
return ret;

- fence = reservation_object_get_excl(resv);
- if (fence && !nouveau_local_fence(fence, chan->drm))
- ret = fence_wait(fence, true);
-
fobj = reservation_object_get_list(resv);
- if (!fobj || ret)
+ if (!fobj)
return ret;

for (i = 0; i < fobj->shared_count && !ret; ++i) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 1989ec22e66e..41abc8a44e3c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -26,7 +26,7 @@ void nouveau_fence_unref(struct nouveau_fence **);

int nouveau_fence_emit(struct nouveau_fence *, struct nouveau_channel *);
bool nouveau_fence_done(struct nouveau_fence *);
-void nouveau_fence_work(struct nouveau_fence *, void (*)(void *), void *);
+void nouveau_fence_work(struct fence *, void (*)(void *), void *);
int nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
int nouveau_fence_sync(struct nouveau_bo *, struct nouveau_channel *);

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index a61530becfb9..4beaa897adad 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -100,13 +100,12 @@ static void
nouveau_gem_object_unmap(struct nouveau_bo *nvbo, struct nouveau_vma *vma)
{
const bool mapped = nvbo->bo.mem.mem_type != TTM_PL_SYSTEM;
- struct nouveau_fence *fence = NULL;
+ struct fence *fence = NULL;

list_del(&vma->head);

- if (mapped) {
- fence = nouveau_fence_ref(nvbo->bo.sync_obj);
- }
+ if (mapped)
+ fence = reservation_object_get_excl(nvbo->bo.resv);

if (fence) {
nouveau_fence_work(fence, nouveau_gem_object_delete, vma);
@@ -116,7 +115,6 @@ nouveau_gem_object_unmap(struct nouveau_bo *nvbo, struct nouveau_vma *vma)
nouveau_vm_put(vma);
kfree(vma);
}
- nouveau_fence_unref(&fence);
}

void
@@ -876,8 +874,12 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
if (!ret) {
ret = ttm_bo_wait(&nvbo->bo, true, true, true);
- if (!no_wait && ret)
- fence = nouveau_fence_ref(nvbo->bo.sync_obj);
+ if (!no_wait && ret) {
+ struct fence *excl;
+
+ excl = reservation_object_get_excl(nvbo->bo.resv);
+ fence = nouveau_fence_ref((struct nouveau_fence *)excl);
+ }

ttm_bo_unreserve(&nvbo->bo);
}
diff --git a/drivers/gpu/drm/qxl/qxl_debugfs.c b/drivers/gpu/drm/qxl/qxl_debugfs.c
index 0d144e0646d6..a4a63fd84803 100644
--- a/drivers/gpu/drm/qxl/qxl_debugfs.c
+++ b/drivers/gpu/drm/qxl/qxl_debugfs.c
@@ -67,9 +67,9 @@ qxl_debugfs_buffers_info(struct seq_file *m, void *data)
rel = fobj ? fobj->shared_count : 0;
rcu_read_unlock();

- seq_printf(m, "size %ld, pc %d, sync obj %p, num releases %d\n",
- (unsigned long)bo->gem_base.size, bo->pin_count,
- bo->tbo.sync_obj, rel);
+ seq_printf(m, "size %ld, pc %d, num releases %d\n",
+ (unsigned long)bo->gem_base.size,
+ bo->pin_count, rel);
}
spin_unlock(&qdev->release_lock);
return 0;
diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index d547cbdebeb4..74e2117ee0e6 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -280,9 +280,7 @@ struct qxl_device {
uint8_t slot_gen_bits;
uint64_t va_slot_mask;

- /* XXX: when rcu becomes available, release_lock can be killed */
spinlock_t release_lock;
- spinlock_t fence_lock;
struct idr release_idr;
uint32_t release_seqno;
spinlock_t release_idr_lock;
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index a9e7c30e92c5..7234561e09d9 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -224,7 +224,6 @@ static int qxl_device_init(struct qxl_device *qdev,
idr_init(&qdev->release_idr);
spin_lock_init(&qdev->release_idr_lock);
spin_lock_init(&qdev->release_lock);
- spin_lock_init(&qdev->fence_lock);

idr_init(&qdev->surf_id_idr);
spin_lock_init(&qdev->surf_id_idr_lock);
diff --git a/drivers/gpu/drm/qxl/qxl_object.h b/drivers/gpu/drm/qxl/qxl_object.h
index 98395b223ad0..9da7becbdb34 100644
--- a/drivers/gpu/drm/qxl/qxl_object.h
+++ b/drivers/gpu/drm/qxl/qxl_object.h
@@ -78,8 +78,8 @@ static inline int qxl_bo_wait(struct qxl_bo *bo, u32 *mem_type,
}
if (mem_type)
*mem_type = bo->tbo.mem.mem_type;
- if (bo->tbo.sync_obj)
- r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
+
+ r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
ttm_bo_unreserve(&bo->tbo);
return r;
}
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 9731d2540a40..15158c5a5b3a 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -464,9 +464,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
bo = entry->bo;
qbo = to_qxl_bo(bo);

- if (!entry->bo->sync_obj)
- entry->bo->sync_obj = qbo;
-
reservation_object_add_shared_fence(bo->resv, &release->base);
ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index 6230251fa5b0..99b7ee110a98 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -355,105 +355,6 @@ static int qxl_bo_move(struct ttm_buffer_object *bo,
return ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);
}

-static bool qxl_sync_obj_signaled(void *sync_obj);
-
-static int qxl_sync_obj_wait(void *sync_obj,
- bool lazy, bool interruptible)
-{
- struct qxl_bo *bo = (struct qxl_bo *)sync_obj;
- struct qxl_device *qdev = bo->gem_base.dev->dev_private;
- struct reservation_object_list *fobj;
- int count = 0, sc = 0, num_release = 0;
- bool have_drawable_releases;
-
-retry:
- if (sc == 0) {
- if (bo->type == QXL_GEM_DOMAIN_SURFACE)
- qxl_update_surface(qdev, bo);
- } else if (sc >= 1) {
- qxl_io_notify_oom(qdev);
- }
-
- sc++;
-
- for (count = 0; count < 10; count++) {
- if (qxl_sync_obj_signaled(sync_obj))
- return 0;
-
- if (!qxl_queue_garbage_collect(qdev, true))
- break;
- }
-
- have_drawable_releases = false;
- num_release = 0;
-
- spin_lock(&qdev->release_lock);
- fobj = bo->tbo.resv->fence;
- for (count = 0; fobj && count < fobj->shared_count; count++) {
- struct qxl_release *release;
-
- release = container_of(fobj->shared[count],
- struct qxl_release, base);
-
- if (fence_is_signaled(&release->base))
- continue;
-
- num_release++;
-
- if (release->type == QXL_RELEASE_DRAWABLE)
- have_drawable_releases = true;
- }
- spin_unlock(&qdev->release_lock);
-
- qxl_queue_garbage_collect(qdev, true);
-
- if (have_drawable_releases || sc < 4) {
- if (sc > 2)
- /* back off */
- usleep_range(500, 1000);
- if (have_drawable_releases && sc > 300) {
- WARN(1, "sync obj %d still has outstanding releases %d %d %d %ld %d\n", sc, bo->surface_id, bo->is_primary, bo->pin_count, (unsigned long)bo->gem_base.size, num_release);
- return -EBUSY;
- }
- goto retry;
- }
- return 0;
-}
-
-static int qxl_sync_obj_flush(void *sync_obj)
-{
- return 0;
-}
-
-static void qxl_sync_obj_unref(void **sync_obj)
-{
- *sync_obj = NULL;
-}
-
-static void *qxl_sync_obj_ref(void *sync_obj)
-{
- return sync_obj;
-}
-
-static bool qxl_sync_obj_signaled(void *sync_obj)
-{
- struct qxl_bo *qbo = (struct qxl_bo *)sync_obj;
- struct qxl_device *qdev = qbo->gem_base.dev->dev_private;
- struct reservation_object_list *fobj;
- bool ret = true;
- unsigned i;
-
- spin_lock(&qdev->release_lock);
- fobj = qbo->tbo.resv->fence;
- for (i = 0; fobj && i < fobj->shared_count; ++i) {
- ret = fence_is_signaled(fobj->shared[i]);
- if (!ret)
- break;
- }
- spin_unlock(&qdev->release_lock);
- return ret;
-}
-
static void qxl_bo_move_notify(struct ttm_buffer_object *bo,
struct ttm_mem_reg *new_mem)
{
@@ -480,11 +381,6 @@ static struct ttm_bo_driver qxl_bo_driver = {
.verify_access = &qxl_verify_access,
.io_mem_reserve = &qxl_ttm_io_mem_reserve,
.io_mem_free = &qxl_ttm_io_mem_free,
- .sync_obj_signaled = &qxl_sync_obj_signaled,
- .sync_obj_wait = &qxl_sync_obj_wait,
- .sync_obj_flush = &qxl_sync_obj_flush,
- .sync_obj_unref = &qxl_sync_obj_unref,
- .sync_obj_ref = &qxl_sync_obj_ref,
.move_notify = &qxl_bo_move_notify,
};

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 71a143461478..dfd3f389776c 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -228,11 +228,17 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
int i;

for (i = 0; i < p->nrelocs; i++) {
+ struct reservation_object *resv;
+ struct fence *fence;
+
if (!p->relocs[i].robj)
continue;

+ resv = p->relocs[i].robj->tbo.resv;
+ fence = reservation_object_get_excl(resv);
+
radeon_semaphore_sync_to(p->ib.semaphore,
- p->relocs[i].robj->tbo.sync_obj);
+ (struct radeon_fence *)fence);
}
}

@@ -402,7 +408,7 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo

ttm_eu_fence_buffer_objects(&parser->ticket,
&parser->validated,
- parser->ib.fence);
+ &parser->ib.fence->base);
} else if (backoff) {
ttm_eu_backoff_reservation(&parser->ticket,
&parser->validated);
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index fb3c08dced85..7e7b6b6064db 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -518,6 +518,7 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc,
struct radeon_framebuffer *new_radeon_fb;
struct drm_gem_object *obj;
struct radeon_flip_work *work;
+ struct fence *fence;
unsigned long flags;

work = kzalloc(sizeof *work, GFP_KERNEL);
@@ -544,15 +545,21 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc,
obj = new_radeon_fb->obj;
work->new_rbo = gem_to_radeon_bo(obj);

- if (work->new_rbo->tbo.sync_obj) {
- int ret = ttm_bo_reserve(&work->new_rbo->tbo, true, false, false, NULL);
- if (ret) {
- drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base);
- kfree(work);
- return ret;
- }
- work->fence = radeon_fence_ref(work->new_rbo->tbo.sync_obj);
- ttm_bo_unreserve(&work->new_rbo->tbo);
+ /* XXX: Hack, bo should really be pinned at this point */
+ do {
+ rcu_read_lock();
+ fence = rcu_dereference(work->new_rbo->tbo.resv->fence_excl);
+ if (fence)
+ work->fence = (struct radeon_fence *)fence_get_rcu(fence);
+ rcu_read_unlock();
+ } while (fence && !work->fence);
+
+ if (fence && !fence->ops->signaled) {
+ /*
+ * make sure if this fence doesn't belong to this
+ * device that it will still signal completion
+ */
+ fence_enable_sw_signaling(fence);
}

/* We borrow the event spin lock for protecting flip_work */
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 8538aebb6580..53104f80d382 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -736,8 +736,8 @@ int radeon_bo_wait(struct radeon_bo *bo, u32 *mem_type, bool no_wait)
return r;
if (mem_type)
*mem_type = bo->tbo.mem.mem_type;
- if (bo->tbo.sync_obj)
- r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
+
+ r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
ttm_bo_unreserve(&bo->tbo);
return r;
}
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index c8a8a5144ec1..715e29f984c1 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -265,12 +265,12 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
BUILD_BUG_ON((PAGE_SIZE % RADEON_GPU_PAGE_SIZE) != 0);

/* sync other rings */
- fence = bo->sync_obj;
+ fence = (struct radeon_fence *)reservation_object_get_excl(bo->resv);
r = radeon_copy(rdev, old_start, new_start,
new_mem->num_pages * (PAGE_SIZE / RADEON_GPU_PAGE_SIZE), /* GPU pages */
&fence);
/* FIXME: handle copy error */
- r = ttm_bo_move_accel_cleanup(bo, (void *)fence,
+ r = ttm_bo_move_accel_cleanup(bo, &fence->base,
evict, no_wait_gpu, new_mem);
radeon_fence_unref(&fence);
return r;
@@ -483,31 +483,6 @@ static void radeon_ttm_io_mem_free(struct ttm_bo_device *bdev, struct ttm_mem_re
{
}

-static int radeon_sync_obj_wait(void *sync_obj, bool lazy, bool interruptible)
-{
- return radeon_fence_wait((struct radeon_fence *)sync_obj, interruptible);
-}
-
-static int radeon_sync_obj_flush(void *sync_obj)
-{
- return 0;
-}
-
-static void radeon_sync_obj_unref(void **sync_obj)
-{
- radeon_fence_unref((struct radeon_fence **)sync_obj);
-}
-
-static void *radeon_sync_obj_ref(void *sync_obj)
-{
- return radeon_fence_ref((struct radeon_fence *)sync_obj);
-}
-
-static bool radeon_sync_obj_signaled(void *sync_obj)
-{
- return radeon_fence_signaled((struct radeon_fence *)sync_obj);
-}
-
/*
* TTM backend functions.
*/
@@ -685,11 +660,6 @@ static struct ttm_bo_driver radeon_bo_driver = {
.evict_flags = &radeon_evict_flags,
.move = &radeon_bo_move,
.verify_access = &radeon_verify_access,
- .sync_obj_signaled = &radeon_sync_obj_signaled,
- .sync_obj_wait = &radeon_sync_obj_wait,
- .sync_obj_flush = &radeon_sync_obj_flush,
- .sync_obj_unref = &radeon_sync_obj_unref,
- .sync_obj_ref = &radeon_sync_obj_ref,
.move_notify = &radeon_bo_move_notify,
.fault_reserve_notify = &radeon_bo_fault_reserve_notify,
.io_mem_reserve = &radeon_ttm_io_mem_reserve,
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c
index 67b2a367df40..b35655e2e35e 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -356,6 +356,7 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
{
int32_t *msg, msg_type, handle;
unsigned img_size = 0;
+ struct fence *f;
void *ptr;

int i, r;
@@ -365,8 +366,9 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
return -EINVAL;
}

- if (bo->tbo.sync_obj) {
- r = radeon_fence_wait(bo->tbo.sync_obj, false);
+ f = reservation_object_get_excl(bo->tbo.resv);
+ if (f) {
+ r = radeon_fence_wait((struct radeon_fence *)f, false);
if (r) {
DRM_ERROR("Failed waiting for UVD message (%d)!\n", r);
return r;
@@ -649,7 +651,7 @@ static int radeon_uvd_send_msg(struct radeon_device *rdev,
r = radeon_ib_schedule(rdev, &ib, NULL);
if (r)
goto err;
- ttm_eu_fence_buffer_objects(&ticket, &head, ib.fence);
+ ttm_eu_fence_buffer_objects(&ticket, &head, &ib.fence->base);

if (fence)
*fence = radeon_fence_ref(ib.fence);
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index 4c68852c3e72..d57dc7c63d0e 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -388,7 +388,7 @@ static int radeon_vm_clear_bo(struct radeon_device *rdev,
if (r)
goto error;

- ttm_eu_fence_buffer_objects(&ticket, &head, ib.fence);
+ ttm_eu_fence_buffer_objects(&ticket, &head, &ib.fence->base);
radeon_ib_free(rdev, &ib);

return 0;
@@ -644,7 +644,12 @@ int radeon_vm_update_page_directory(struct radeon_device *rdev,
incr, R600_PTE_VALID);

if (ib.length_dw != 0) {
- radeon_semaphore_sync_to(ib.semaphore, pd->tbo.sync_obj);
+ struct fence *fence;
+
+ fence = reservation_object_get_excl(pd->tbo.resv);
+ radeon_semaphore_sync_to(ib.semaphore,
+ (struct radeon_fence *)fence);
+
radeon_semaphore_sync_to(ib.semaphore, vm->last_id_use);
r = radeon_ib_schedule(rdev, &ib, NULL);
if (r) {
@@ -772,8 +777,11 @@ static void radeon_vm_update_ptes(struct radeon_device *rdev,
struct radeon_bo *pt = vm->page_tables[pt_idx].bo;
unsigned nptes;
uint64_t pte;
+ struct fence *fence;

- radeon_semaphore_sync_to(ib->semaphore, pt->tbo.sync_obj);
+ fence = reservation_object_get_excl(pt->tbo.resv);
+ radeon_semaphore_sync_to(ib->semaphore,
+ (struct radeon_fence *)fence);

if ((addr & ~mask) == (end & ~mask))
nptes = end - addr;
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ce0434377223..31c4a6dd722d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -40,6 +40,7 @@
#include <linux/file.h>
#include <linux/module.h>
#include <linux/atomic.h>
+#include <linux/reservation.h>

#define TTM_ASSERT_LOCKED(param)
#define TTM_DEBUG(fmt, arg...)
@@ -141,7 +142,6 @@ static void ttm_bo_release_list(struct kref *list_kref)
BUG_ON(atomic_read(&bo->list_kref.refcount));
BUG_ON(atomic_read(&bo->kref.refcount));
BUG_ON(atomic_read(&bo->cpu_writers));
- BUG_ON(bo->sync_obj != NULL);
BUG_ON(bo->mem.mm_node != NULL);
BUG_ON(!list_empty(&bo->lru));
BUG_ON(!list_empty(&bo->ddestroy));
@@ -402,12 +402,30 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
ww_mutex_unlock (&bo->resv->lock);
}

+static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
+{
+ struct reservation_object_list *fobj;
+ struct fence *fence;
+ int i;
+
+ fobj = reservation_object_get_list(bo->resv);
+ fence = reservation_object_get_excl(bo->resv);
+ if (fence && !fence->ops->signaled)
+ fence_enable_sw_signaling(fence);
+
+ for (i = 0; fobj && i < fobj->shared_count; ++i) {
+ fence = rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(bo->resv));
+
+ if (!fence->ops->signaled)
+ fence_enable_sw_signaling(fence);
+ }
+}
+
static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
{
struct ttm_bo_device *bdev = bo->bdev;
struct ttm_bo_global *glob = bo->glob;
- struct ttm_bo_driver *driver = bdev->driver;
- void *sync_obj = NULL;
int put_count;
int ret;

@@ -415,9 +433,7 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
ret = __ttm_bo_reserve(bo, false, true, false, 0);

if (!ret) {
- (void) ttm_bo_wait(bo, false, false, true);
-
- if (!bo->sync_obj) {
+ if (!ttm_bo_wait(bo, false, false, true)) {
put_count = ttm_bo_del_from_lru(bo);

spin_unlock(&glob->lru_lock);
@@ -426,8 +442,8 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
ttm_bo_list_ref_sub(bo, put_count, true);

return;
- }
- sync_obj = driver->sync_obj_ref(bo->sync_obj);
+ } else
+ ttm_bo_flush_all_fences(bo);

/*
* Make NO_EVICT bos immediately available to
@@ -446,14 +462,70 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
list_add_tail(&bo->ddestroy, &bdev->ddestroy);
spin_unlock(&glob->lru_lock);

- if (sync_obj) {
- driver->sync_obj_flush(sync_obj);
- driver->sync_obj_unref(&sync_obj);
- }
schedule_delayed_work(&bdev->wq,
((HZ / 100) < 1) ? 1 : HZ / 100);
}

+static int ttm_bo_unreserve_and_wait(struct ttm_buffer_object *bo,
+ bool interruptible)
+{
+ struct ttm_bo_global *glob = bo->glob;
+ struct reservation_object_list *fobj;
+ struct fence *excl = NULL;
+ struct fence **shared = NULL;
+ u32 shared_count = 0, i;
+ int ret = 0;
+
+ fobj = reservation_object_get_list(bo->resv);
+ if (fobj && fobj->shared_count) {
+ shared = kmalloc(sizeof(*shared) * fobj->shared_count,
+ GFP_KERNEL);
+
+ if (!shared) {
+ ret = -ENOMEM;
+ __ttm_bo_unreserve(bo);
+ spin_unlock(&glob->lru_lock);
+ return ret;
+ }
+
+ for (i = 0; i < fobj->shared_count; ++i) {
+ if (!fence_is_signaled(fobj->shared[i])) {
+ fence_get(fobj->shared[i]);
+ shared[shared_count++] = fobj->shared[i];
+ }
+ }
+ if (!shared_count) {
+ kfree(shared);
+ shared = NULL;
+ }
+ }
+
+ excl = reservation_object_get_excl(bo->resv);
+ if (excl && !fence_is_signaled(excl))
+ fence_get(excl);
+ else
+ excl = NULL;
+
+ __ttm_bo_unreserve(bo);
+ spin_unlock(&glob->lru_lock);
+
+ if (excl) {
+ ret = fence_wait(excl, interruptible);
+ fence_put(excl);
+ }
+
+ if (shared_count > 0) {
+ for (i = 0; i < shared_count; ++i) {
+ if (!ret)
+ ret = fence_wait(shared[i], interruptible);
+ fence_put(shared[i]);
+ }
+ kfree(shared);
+ }
+
+ return ret;
+}
+
/**
* function ttm_bo_cleanup_refs_and_unlock
* If bo idle, remove from delayed- and lru lists, and unref.
@@ -470,8 +542,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
bool interruptible,
bool no_wait_gpu)
{
- struct ttm_bo_device *bdev = bo->bdev;
- struct ttm_bo_driver *driver = bdev->driver;
struct ttm_bo_global *glob = bo->glob;
int put_count;
int ret;
@@ -479,20 +549,7 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
ret = ttm_bo_wait(bo, false, false, true);

if (ret && !no_wait_gpu) {
- void *sync_obj;
-
- /*
- * Take a reference to the fence and unreserve,
- * at this point the buffer should be dead, so
- * no new sync objects can be attached.
- */
- sync_obj = driver->sync_obj_ref(bo->sync_obj);
-
- __ttm_bo_unreserve(bo);
- spin_unlock(&glob->lru_lock);
-
- ret = driver->sync_obj_wait(sync_obj, false, interruptible);
- driver->sync_obj_unref(&sync_obj);
+ ret = ttm_bo_unreserve_and_wait(bo, interruptible);
if (ret)
return ret;

@@ -1513,41 +1570,51 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)

EXPORT_SYMBOL(ttm_bo_unmap_virtual);

-
int ttm_bo_wait(struct ttm_buffer_object *bo,
bool lazy, bool interruptible, bool no_wait)
{
- struct ttm_bo_driver *driver = bo->bdev->driver;
- void *sync_obj;
- int ret = 0;
-
- lockdep_assert_held(&bo->resv->lock.base);
+ struct reservation_object_list *fobj;
+ struct reservation_object *resv;
+ struct fence *excl;
+ long timeout = 15 * HZ;
+ int i;

- if (likely(bo->sync_obj == NULL))
- return 0;
+ resv = bo->resv;
+ fobj = reservation_object_get_list(resv);
+ excl = reservation_object_get_excl(resv);
+ if (excl) {
+ if (!fence_is_signaled(excl)) {
+ if (no_wait)
+ return -EBUSY;

- if (bo->sync_obj) {
- if (driver->sync_obj_signaled(bo->sync_obj)) {
- driver->sync_obj_unref(&bo->sync_obj);
- clear_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
- return 0;
+ timeout = fence_wait_timeout(excl,
+ interruptible, timeout);
}
+ }

- if (no_wait)
- return -EBUSY;
+ for (i = 0; fobj && timeout > 0 && i < fobj->shared_count; ++i) {
+ struct fence *fence;
+ fence = rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(resv));

- sync_obj = driver->sync_obj_ref(bo->sync_obj);
- ret = driver->sync_obj_wait(sync_obj,
- lazy, interruptible);
+ if (!fence_is_signaled(fence)) {
+ if (no_wait)
+ return -EBUSY;

- if (likely(ret == 0)) {
- clear_bit(TTM_BO_PRIV_FLAG_MOVING,
- &bo->priv_flags);
- driver->sync_obj_unref(&bo->sync_obj);
+ timeout = fence_wait_timeout(fence,
+ interruptible, timeout);
}
- driver->sync_obj_unref(&sync_obj);
}
- return ret;
+
+ if (timeout < 0)
+ return timeout;
+
+ if (timeout == 0)
+ return -EBUSY;
+
+ reservation_object_add_excl_fence(resv, NULL);
+ clear_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
+ return 0;
}
EXPORT_SYMBOL(ttm_bo_wait);

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 23db594e55c0..fe806c1ded9e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -37,6 +37,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/module.h>
+#include <linux/reservation.h>

void ttm_bo_free_old_node(struct ttm_buffer_object *bo)
{
@@ -444,8 +445,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
struct ttm_buffer_object **new_obj)
{
struct ttm_buffer_object *fbo;
- struct ttm_bo_device *bdev = bo->bdev;
- struct ttm_bo_driver *driver = bdev->driver;
int ret;

fbo = kmalloc(sizeof(*fbo), GFP_KERNEL);
@@ -466,10 +465,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
drm_vma_node_reset(&fbo->vma_node);
atomic_set(&fbo->cpu_writers, 0);

- if (bo->sync_obj)
- fbo->sync_obj = driver->sync_obj_ref(bo->sync_obj);
- else
- fbo->sync_obj = NULL;
kref_init(&fbo->list_kref);
kref_init(&fbo->kref);
fbo->destroy = &ttm_transfered_destroy;
@@ -642,28 +637,20 @@ void ttm_bo_kunmap(struct ttm_bo_kmap_obj *map)
EXPORT_SYMBOL(ttm_bo_kunmap);

int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
- void *sync_obj,
+ struct fence *fence,
bool evict,
bool no_wait_gpu,
struct ttm_mem_reg *new_mem)
{
struct ttm_bo_device *bdev = bo->bdev;
- struct ttm_bo_driver *driver = bdev->driver;
struct ttm_mem_type_manager *man = &bdev->man[new_mem->mem_type];
struct ttm_mem_reg *old_mem = &bo->mem;
int ret;
struct ttm_buffer_object *ghost_obj;
- void *tmp_obj = NULL;

- if (bo->sync_obj) {
- tmp_obj = bo->sync_obj;
- bo->sync_obj = NULL;
- }
- bo->sync_obj = driver->sync_obj_ref(sync_obj);
+ reservation_object_add_excl_fence(bo->resv, fence);
if (evict) {
ret = ttm_bo_wait(bo, false, false, false);
- if (tmp_obj)
- driver->sync_obj_unref(&tmp_obj);
if (ret)
return ret;

@@ -684,13 +671,13 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
*/

set_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
- if (tmp_obj)
- driver->sync_obj_unref(&tmp_obj);

ret = ttm_buffer_object_transfer(bo, &ghost_obj);
if (ret)
return ret;

+ reservation_object_add_excl_fence(ghost_obj->resv, fence);
+
/**
* If we're not moving to fixed memory, the TTM object
* needs to stay alive. Otherwhise hang it on the ghost
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 108730e9147b..adafc0f8ec06 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -163,7 +163,7 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
EXPORT_SYMBOL(ttm_eu_reserve_buffers);

void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
- struct list_head *list, void *sync_obj)
+ struct list_head *list, struct fence *fence)
{
struct ttm_validate_buffer *entry;
struct ttm_buffer_object *bo;
@@ -183,18 +183,12 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,

list_for_each_entry(entry, list, head) {
bo = entry->bo;
- entry->old_sync_obj = bo->sync_obj;
- bo->sync_obj = driver->sync_obj_ref(sync_obj);
+ reservation_object_add_excl_fence(bo->resv, fence);
ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
}
spin_unlock(&glob->lru_lock);
if (ticket)
ww_acquire_fini(ticket);
-
- list_for_each_entry(entry, list, head) {
- if (entry->old_sync_obj)
- driver->sync_obj_unref(&entry->old_sync_obj);
- }
}
EXPORT_SYMBOL(ttm_eu_fence_buffer_objects);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index f15718cc631d..656c88485e14 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -768,41 +768,6 @@ static int vmw_ttm_fault_reserve_notify(struct ttm_buffer_object *bo)
}

/**
- * FIXME: We're using the old vmware polling method to sync.
- * Do this with fences instead.
- */
-
-static void *vmw_sync_obj_ref(void *sync_obj)
-{
-
- return (void *)
- vmw_fence_obj_reference((struct vmw_fence_obj *) sync_obj);
-}
-
-static void vmw_sync_obj_unref(void **sync_obj)
-{
- vmw_fence_obj_unreference((struct vmw_fence_obj **) sync_obj);
-}
-
-static int vmw_sync_obj_flush(void *sync_obj)
-{
- vmw_fence_obj_flush((struct vmw_fence_obj *) sync_obj);
- return 0;
-}
-
-static bool vmw_sync_obj_signaled(void *sync_obj)
-{
- return vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj);
-}
-
-static int vmw_sync_obj_wait(void *sync_obj, bool lazy, bool interruptible)
-{
- return vmw_fence_obj_wait((struct vmw_fence_obj *) sync_obj,
- lazy, interruptible,
- VMW_FENCE_WAIT_TIMEOUT);
-}
-
-/**
* vmw_move_notify - TTM move_notify_callback
*
* @bo: The TTM buffer object about to move.
@@ -839,11 +804,6 @@ struct ttm_bo_driver vmw_bo_driver = {
.evict_flags = vmw_evict_flags,
.move = NULL,
.verify_access = vmw_verify_access,
- .sync_obj_signaled = vmw_sync_obj_signaled,
- .sync_obj_wait = vmw_sync_obj_wait,
- .sync_obj_flush = vmw_sync_obj_flush,
- .sync_obj_unref = vmw_sync_obj_unref,
- .sync_obj_ref = vmw_sync_obj_ref,
.move_notify = vmw_move_notify,
.swap_notify = vmw_swap_notify,
.fault_reserve_notify = &vmw_ttm_fault_reserve_notify,
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 6688a6341486..20a1a866ceeb 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -1419,22 +1419,16 @@ void vmw_fence_single_bo(struct ttm_buffer_object *bo,
struct vmw_fence_obj *fence)
{
struct ttm_bo_device *bdev = bo->bdev;
- struct vmw_fence_obj *old_fence_obj;
+
struct vmw_private *dev_priv =
container_of(bdev, struct vmw_private, bdev);

if (fence == NULL) {
vmw_execbuf_fence_commands(NULL, dev_priv, &fence, NULL);
+ reservation_object_add_excl_fence(bo->resv, &fence->base);
+ fence_put(&fence->base);
} else
- vmw_fence_obj_reference(fence);
-
- reservation_object_add_excl_fence(bo->resv, &fence->base);
-
- old_fence_obj = bo->sync_obj;
- bo->sync_obj = fence;
-
- if (old_fence_obj)
- vmw_fence_obj_unreference(&old_fence_obj);
+ reservation_object_add_excl_fence(bo->resv, &fence->base);
}

/**
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 67df9d7c06cc..3b630f3153d0 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -163,7 +163,6 @@ struct ttm_tt;
* @lru: List head for the lru list.
* @ddestroy: List head for the delayed destroy list.
* @swap: List head for swap LRU list.
- * @sync_obj: Pointer to a synchronization object.
* @priv_flags: Flags describing buffer object internal state.
* @vma_node: Address space manager node.
* @offset: The current GPU offset, which can have different meanings
@@ -230,7 +229,6 @@ struct ttm_buffer_object {
* Members protected by a bo reservation.
*/

- void *sync_obj;
unsigned long priv_flags;

struct drm_vma_offset_node vma_node;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 0aa6caa59415..71a345ee92d5 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -309,11 +309,6 @@ struct ttm_mem_type_manager {
* @move: Callback for a driver to hook in accelerated functions to
* move a buffer.
* If set to NULL, a potentially slow memcpy() move is used.
- * @sync_obj_signaled: See ttm_fence_api.h
- * @sync_obj_wait: See ttm_fence_api.h
- * @sync_obj_flush: See ttm_fence_api.h
- * @sync_obj_unref: See ttm_fence_api.h
- * @sync_obj_ref: See ttm_fence_api.h
*/

struct ttm_bo_driver {
@@ -415,23 +410,6 @@ struct ttm_bo_driver {
int (*verify_access) (struct ttm_buffer_object *bo,
struct file *filp);

- /**
- * In case a driver writer dislikes the TTM fence objects,
- * the driver writer can replace those with sync objects of
- * his / her own. If it turns out that no driver writer is
- * using these. I suggest we remove these hooks and plug in
- * fences directly. The bo driver needs the following functionality:
- * See the corresponding functions in the fence object API
- * documentation.
- */
-
- bool (*sync_obj_signaled) (void *sync_obj);
- int (*sync_obj_wait) (void *sync_obj,
- bool lazy, bool interruptible);
- int (*sync_obj_flush) (void *sync_obj);
- void (*sync_obj_unref) (void **sync_obj);
- void *(*sync_obj_ref) (void *sync_obj);
-
/* hook to notify driver about a driver move so it
* can do tiling things */
void (*move_notify)(struct ttm_buffer_object *bo,
@@ -1031,7 +1009,7 @@ extern void ttm_bo_free_old_node(struct ttm_buffer_object *bo);
* ttm_bo_move_accel_cleanup.
*
* @bo: A pointer to a struct ttm_buffer_object.
- * @sync_obj: A sync object that signals when moving is complete.
+ * @fence: A fence object that signals when moving is complete.
* @evict: This is an evict move. Don't return until the buffer is idle.
* @no_wait_gpu: Return immediately if the GPU is busy.
* @new_mem: struct ttm_mem_reg indicating where to move.
@@ -1045,7 +1023,7 @@ extern void ttm_bo_free_old_node(struct ttm_buffer_object *bo);
*/

extern int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
- void *sync_obj,
+ struct fence *fence,
bool evict, bool no_wait_gpu,
struct ttm_mem_reg *new_mem);
/**
diff --git a/include/drm/ttm/ttm_execbuf_util.h b/include/drm/ttm/ttm_execbuf_util.h
index 8490cb8ee0d8..ff11a424f752 100644
--- a/include/drm/ttm/ttm_execbuf_util.h
+++ b/include/drm/ttm/ttm_execbuf_util.h
@@ -39,16 +39,11 @@
*
* @head: list head for thread-private list.
* @bo: refcounted buffer object pointer.
- * @reserved: Indicates whether @bo has been reserved for validation.
- * @removed: Indicates whether @bo has been removed from lru lists.
- * @put_count: Number of outstanding references on bo::list_kref.
- * @old_sync_obj: Pointer to a sync object about to be unreferenced
*/

struct ttm_validate_buffer {
struct list_head head;
struct ttm_buffer_object *bo;
- void *old_sync_obj;
};

/**
@@ -100,7 +95,7 @@ extern int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
*
* @ticket: ww_acquire_ctx from reserve call
* @list: thread private list of ttm_validate_buffer structs.
- * @sync_obj: The new sync object for the buffers.
+ * @fence: The new exclusive fence for the buffers.
*
* This function should be called when command submission is complete, and
* it will add a new sync object to bos pointed to by entries on @list.
@@ -109,6 +104,7 @@ extern int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
*/

extern void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
- struct list_head *list, void *sync_obj);
+ struct list_head *list,
+ struct fence *fence);

#endif

2014-07-09 12:30:33

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 14/17] drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep

With the conversion to the reservation API, this should be safe.

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_gem.c | 28 ++++++++++++----------------
1 file changed, 12 insertions(+), 16 deletions(-)
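
For reference, a minimal sketch (not part of this patch; the helper name
example_resv_wait is made up) of the return-value convention this and the
later RCU-wait patches rely on: reservation_object_wait_timeout_rcu()
returns the remaining timeout on success, 0 on timeout, and a negative
error code otherwise, so ioctl paths translate that into 0 / -EBUSY / errno.

static int example_resv_wait(struct reservation_object *resv,
                             bool write, bool intr)
{
        long lret;

        lret = reservation_object_wait_timeout_rcu(resv, write, intr, 30 * HZ);
        if (lret == 0)
                return -EBUSY;  /* timed out, fences still pending */
        if (lret < 0)
                return lret;    /* e.g. -ERESTARTSYS from a signal */
        return 0;               /* all relevant fences signaled in time */
}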

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 4beaa897adad..c2ca894f6507 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -863,33 +863,29 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
struct drm_gem_object *gem;
struct nouveau_bo *nvbo;
bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
+ bool write = !!(req->flags & NOUVEAU_GEM_CPU_PREP_WRITE);
int ret;
- struct nouveau_fence *fence = NULL;

gem = drm_gem_object_lookup(dev, file_priv, req->handle);
if (!gem)
return -ENOENT;
nvbo = nouveau_gem_object(gem);

- ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
- if (!ret) {
- ret = ttm_bo_wait(&nvbo->bo, true, true, true);
- if (!no_wait && ret) {
- struct fence *excl;
-
- excl = reservation_object_get_excl(nvbo->bo.resv);
- fence = nouveau_fence_ref((struct nouveau_fence *)excl);
- }
+ if (no_wait)
+ ret = reservation_object_test_signaled_rcu(nvbo->bo.resv, write) ? 0 : -EBUSY;
+ else {
+ long lret;

- ttm_bo_unreserve(&nvbo->bo);
+ lret = reservation_object_wait_timeout_rcu(nvbo->bo.resv, write, true, 30 * HZ);
+ if (!lret)
+ ret = -EBUSY;
+ else if (lret > 0)
+ ret = 0;
+ else
+ ret = lret;
}
drm_gem_object_unreference_unlocked(gem);

- if (fence) {
- ret = nouveau_fence_wait(fence, true, no_wait);
- nouveau_fence_unref(&fence);
- }
-
return ret;
}

2014-07-09 12:30:44

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 16/17] drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 20a1a866ceeb..79e950df3018 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -567,13 +567,16 @@ static int vmw_user_dmabuf_synccpu_grab(struct vmw_user_dma_buffer *user_bo,
int ret;

if (flags & drm_vmw_synccpu_allow_cs) {
- ret = ttm_bo_reserve(bo, true, !!(flags & drm_vmw_synccpu_dontblock), false, 0);
- if (!ret) {
- ret = ttm_bo_wait(bo, false, true,
- !!(flags & drm_vmw_synccpu_dontblock));
- ttm_bo_unreserve(bo);
- }
- return ret;
+ long lret;
+ if (flags & drm_vmw_synccpu_dontblock)
+ return reservation_object_test_signaled_rcu(bo->resv, true) ? 0 : -EBUSY;
+
+ lret = reservation_object_wait_timeout_rcu(bo->resv, true, true, MAX_SCHEDULE_TIMEOUT);
+ if (!lret)
+ return -EBUSY;
+ else if (lret < 0)
+ return lret;
+ return 0;
}

ret = ttm_bo_synccpu_write_grab

2014-07-09 12:30:38

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 15/17] drm/radeon: use rcu waits in some ioctls

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/radeon/radeon_gem.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index d09650c1d720..7ba883843668 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -107,9 +107,12 @@ static int radeon_gem_set_domain(struct drm_gem_object *gobj,
}
if (domain == RADEON_GEM_DOMAIN_CPU) {
/* Asking for cpu access wait for object idle */
- r = radeon_bo_wait(robj, NULL, false);
- if (r) {
- printk(KERN_ERR "Failed to wait for object !\n");
+ r = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);
+ if (!r)
+ r = -EBUSY;
+
+ if (r < 0 && r != -EINTR) {
+ printk(KERN_ERR "Failed to wait for object: %i\n", r);
return r;
}
}
@@ -357,14 +360,20 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
struct drm_radeon_gem_wait_idle *args = data;
struct drm_gem_object *gobj;
struct radeon_bo *robj;
- int r;
+ int r = 0;
+ long ret;

gobj = drm_gem_object_lookup(dev, filp, args->handle);
if (gobj == NULL) {
return -ENOENT;
}
robj = gem_to_radeon_bo(gobj);
- r = radeon_bo_wait(robj, NULL, false);
+ ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);
+ if (ret == 0)
+ r = -EBUSY;
+ else if (ret < 0)
+ r = ret;
+
/* callback hw specific functions if any */
if (rdev->asic->ioctl_wait_idle)
robj->rdev->asic->ioctl_wait_idle(rdev, robj);

2014-07-09 12:30:49

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 17/17] drm/ttm: use rcu in core ttm

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/ttm/ttm_bo.c | 76 +++++++-----------------------------------
1 file changed, 13 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 31c4a6dd722d..6fe1f4bf37ed 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -466,66 +466,6 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
((HZ / 100) < 1) ? 1 : HZ / 100);
}

-static int ttm_bo_unreserve_and_wait(struct ttm_buffer_object *bo,
- bool interruptible)
-{
- struct ttm_bo_global *glob = bo->glob;
- struct reservation_object_list *fobj;
- struct fence *excl = NULL;
- struct fence **shared = NULL;
- u32 shared_count = 0, i;
- int ret = 0;
-
- fobj = reservation_object_get_list(bo->resv);
- if (fobj && fobj->shared_count) {
- shared = kmalloc(sizeof(*shared) * fobj->shared_count,
- GFP_KERNEL);
-
- if (!shared) {
- ret = -ENOMEM;
- __ttm_bo_unreserve(bo);
- spin_unlock(&glob->lru_lock);
- return ret;
- }
-
- for (i = 0; i < fobj->shared_count; ++i) {
- if (!fence_is_signaled(fobj->shared[i])) {
- fence_get(fobj->shared[i]);
- shared[shared_count++] = fobj->shared[i];
- }
- }
- if (!shared_count) {
- kfree(shared);
- shared = NULL;
- }
- }
-
- excl = reservation_object_get_excl(bo->resv);
- if (excl && !fence_is_signaled(excl))
- fence_get(excl);
- else
- excl = NULL;
-
- __ttm_bo_unreserve(bo);
- spin_unlock(&glob->lru_lock);
-
- if (excl) {
- ret = fence_wait(excl, interruptible);
- fence_put(excl);
- }
-
- if (shared_count > 0) {
- for (i = 0; i < shared_count; ++i) {
- if (!ret)
- ret = fence_wait(shared[i], interruptible);
- fence_put(shared[i]);
- }
- kfree(shared);
- }
-
- return ret;
-}
-
/**
* function ttm_bo_cleanup_refs_and_unlock
* If bo idle, remove from delayed- and lru lists, and unref.
@@ -549,9 +489,19 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
ret = ttm_bo_wait(bo, false, false, true);

if (ret && !no_wait_gpu) {
- ret = ttm_bo_unreserve_and_wait(bo, interruptible);
- if (ret)
- return ret;
+ long lret;
+ ww_mutex_unlock(&bo->resv->lock);
+ spin_unlock(&glob->lru_lock);
+
+ lret = reservation_object_wait_timeout_rcu(bo->resv,
+ true,
+ interruptible,
+ 30 * HZ);
+
+ if (lret < 0)
+ return lret;
+ else if (lret == 0)
+ return -EBUSY;

spin_lock(&glob->lru_lock);
ret = __ttm_bo_reserve(bo, false, true, false, 0);

2014-07-09 12:30:24

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH 12/17] drm/vmwgfx: rework to new fence interface

Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 2
drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 299 ++++++++++++++++++------------
drivers/gpu/drm/vmwgfx/vmwgfx_fence.h | 29 ++-
drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 9 -
4 files changed, 200 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index db30b790ad24..f3f8caa09cc8 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -2360,7 +2360,7 @@ vmw_execbuf_copy_fence_user(struct vmw_private *dev_priv,
BUG_ON(fence == NULL);

fence_rep.handle = fence_handle;
- fence_rep.seqno = fence->seqno;
+ fence_rep.seqno = fence->base.seqno;
vmw_update_seqno(dev_priv, &dev_priv->fifo);
fence_rep.passed_seqno = dev_priv->last_read_seqno;
}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 05b9eea8e875..77f416b7552c 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -46,6 +46,7 @@ struct vmw_fence_manager {
bool goal_irq_on; /* Protected by @goal_irq_mutex */
bool seqno_valid; /* Protected by @lock, and may not be set to true
without the @goal_irq_mutex held. */
+ unsigned ctx;
};

struct vmw_user_fence {
@@ -80,6 +81,12 @@ struct vmw_event_fence_action {
uint32_t *tv_usec;
};

+static struct vmw_fence_manager *
+fman_from_fence(struct vmw_fence_obj *fence)
+{
+ return container_of(fence->base.lock, struct vmw_fence_manager, lock);
+}
+
/**
* Note on fencing subsystem usage of irqs:
* Typically the vmw_fences_update function is called
@@ -102,25 +109,130 @@ struct vmw_event_fence_action {
* objects with actions attached to them.
*/

-static void vmw_fence_obj_destroy_locked(struct kref *kref)
+static void vmw_fence_obj_destroy(struct fence *f)
{
struct vmw_fence_obj *fence =
- container_of(kref, struct vmw_fence_obj, kref);
+ container_of(f, struct vmw_fence_obj, base);

- struct vmw_fence_manager *fman = fence->fman;
- unsigned int num_fences;
+ struct vmw_fence_manager *fman = fman_from_fence(fence);
+ unsigned long irq_flags;

+ spin_lock_irqsave(&fman->lock, irq_flags);
list_del_init(&fence->head);
- num_fences = --fman->num_fence_objects;
- spin_unlock_irq(&fman->lock);
- if (fence->destroy)
- fence->destroy(fence);
- else
- kfree(fence);
+ --fman->num_fence_objects;
+ spin_unlock_irqrestore(&fman->lock, irq_flags);
+ fence->destroy(fence);
+}

- spin_lock_irq(&fman->lock);
+static const char *vmw_fence_get_driver_name(struct fence *f)
+{
+ return "vmwgfx";
+}
+
+static const char *vmw_fence_get_timeline_name(struct fence *f)
+{
+ return "svga";
+}
+
+static bool vmw_fence_enable_signaling(struct fence *f)
+{
+ struct vmw_fence_obj *fence =
+ container_of(f, struct vmw_fence_obj, base);
+
+ struct vmw_fence_manager *fman = fman_from_fence(fence);
+
+ __le32 __iomem *fifo_mem = fman->dev_priv->mmio_virt;
+ u32 seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);
+ if (seqno - fence->base.seqno < VMW_FENCE_WRAP)
+ return false;
+
+ vmw_fifo_ping_host(fman->dev_priv, SVGA_SYNC_GENERIC);
+
+ return true;
+}
+
+struct vmwgfx_wait_cb {
+ struct fence_cb base;
+ struct task_struct *task;
+};
+
+static void
+vmwgfx_wait_cb(struct fence *fence, struct fence_cb *cb)
+{
+ struct vmwgfx_wait_cb *wait =
+ container_of(cb, struct vmwgfx_wait_cb, base);
+
+ wake_up_process(wait->task);
}

+static void __vmw_fences_update(struct vmw_fence_manager *fman);
+
+static long vmw_fence_wait(struct fence *f, bool intr, signed long timeout)
+{
+ struct vmw_fence_obj *fence =
+ container_of(f, struct vmw_fence_obj, base);
+
+ struct vmw_fence_manager *fman = fman_from_fence(fence);
+ struct vmw_private *dev_priv = fman->dev_priv;
+ struct vmwgfx_wait_cb cb;
+ long ret = timeout;
+ unsigned long irq_flags;
+
+ if (likely(vmw_fence_obj_signaled(fence)))
+ return timeout;
+
+ vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
+ vmw_seqno_waiter_add(dev_priv);
+
+ spin_lock_irqsave(f->lock, irq_flags);
+
+ if (intr && signal_pending(current)) {
+ ret = -ERESTARTSYS;
+ goto out;
+ }
+
+ cb.base.func = vmwgfx_wait_cb;
+ cb.task = current;
+ list_add(&cb.base.node, &f->cb_list);
+
+ while (ret > 0) {
+ __vmw_fences_update(fman);
+ if (test_bit(FENCE_FLAG_SIGNALED_BIT, &f->flags))
+ break;
+
+ if (intr)
+ __set_current_state(TASK_INTERRUPTIBLE);
+ else
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ spin_unlock_irqrestore(f->lock, irq_flags);
+
+ ret = schedule_timeout(ret);
+
+ spin_lock_irqsave(f->lock, irq_flags);
+ if (ret > 0 && intr && signal_pending(current))
+ ret = -ERESTARTSYS;
+ }
+
+ if (!list_empty(&cb.base.node))
+ list_del(&cb.base.node);
+ __set_current_state(TASK_RUNNING);
+
+out:
+ spin_unlock_irqrestore(f->lock, irq_flags);
+
+ vmw_seqno_waiter_remove(dev_priv);
+
+ return ret;
+}
+
+static struct fence_ops vmw_fence_ops = {
+ .get_driver_name = vmw_fence_get_driver_name,
+ .get_timeline_name = vmw_fence_get_timeline_name,
+ .enable_signaling = vmw_fence_enable_signaling,
+ .wait = vmw_fence_wait,
+ .release = vmw_fence_obj_destroy,
+};
+

/**
* Execute signal actions on fences recently signaled.
@@ -186,6 +298,7 @@ struct vmw_fence_manager *vmw_fence_manager_init(struct vmw_private *dev_priv)
fman->event_fence_action_size =
ttm_round_pot(sizeof(struct vmw_event_fence_action));
mutex_init(&fman->goal_irq_mutex);
+ fman->ctx = fence_context_alloc(1);

return fman;
}
@@ -211,16 +324,12 @@ static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
void (*destroy) (struct vmw_fence_obj *fence))
{
unsigned long irq_flags;
- unsigned int num_fences;
int ret = 0;

- fence->seqno = seqno;
+ fence_init(&fence->base, &vmw_fence_ops, &fman->lock,
+ fman->ctx, seqno);
INIT_LIST_HEAD(&fence->seq_passed_actions);
- fence->fman = fman;
- fence->signaled = 0;
- kref_init(&fence->kref);
fence->destroy = destroy;
- init_waitqueue_head(&fence->queue);

spin_lock_irqsave(&fman->lock, irq_flags);
if (unlikely(fman->fifo_down)) {
@@ -228,7 +337,7 @@ static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
goto out_unlock;
}
list_add_tail(&fence->head, &fman->fence_list);
- num_fences = ++fman->num_fence_objects;
+ ++fman->num_fence_objects;

out_unlock:
spin_unlock_irqrestore(&fman->lock, irq_flags);
@@ -236,38 +345,6 @@ out_unlock:

}

-struct vmw_fence_obj *vmw_fence_obj_reference(struct vmw_fence_obj *fence)
-{
- if (unlikely(fence == NULL))
- return NULL;
-
- kref_get(&fence->kref);
- return fence;
-}
-
-/**
- * vmw_fence_obj_unreference
- *
- * Note that this function may not be entered with disabled irqs since
- * it may re-enable them in the destroy function.
- *
- */
-void vmw_fence_obj_unreference(struct vmw_fence_obj **fence_p)
-{
- struct vmw_fence_obj *fence = *fence_p;
- struct vmw_fence_manager *fman;
-
- if (unlikely(fence == NULL))
- return;
-
- fman = fence->fman;
- *fence_p = NULL;
- spin_lock_irq(&fman->lock);
- BUG_ON(atomic_read(&fence->kref.refcount) == 0);
- kref_put(&fence->kref, vmw_fence_obj_destroy_locked);
- spin_unlock_irq(&fman->lock);
-}
-
static void vmw_fences_perform_actions(struct vmw_fence_manager *fman,
struct list_head *list)
{
@@ -323,7 +400,7 @@ static bool vmw_fence_goal_new_locked(struct vmw_fence_manager *fman,
list_for_each_entry(fence, &fman->fence_list, head) {
if (!list_empty(&fence->seq_passed_actions)) {
fman->seqno_valid = true;
- iowrite32(fence->seqno,
+ iowrite32(fence->base.seqno,
fifo_mem + SVGA_FIFO_FENCE_GOAL);
break;
}
@@ -350,27 +427,27 @@ static bool vmw_fence_goal_new_locked(struct vmw_fence_manager *fman,
*/
static bool vmw_fence_goal_check_locked(struct vmw_fence_obj *fence)
{
+ struct vmw_fence_manager *fman = fman_from_fence(fence);
u32 goal_seqno;
__le32 __iomem *fifo_mem;

- if (fence->signaled)
+ if (fence_is_signaled_locked(&fence->base))
return false;

- fifo_mem = fence->fman->dev_priv->mmio_virt;
+ fifo_mem = fman->dev_priv->mmio_virt;
goal_seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE_GOAL);
- if (likely(fence->fman->seqno_valid &&
- goal_seqno - fence->seqno < VMW_FENCE_WRAP))
+ if (likely(fman->seqno_valid &&
+ goal_seqno - fence->base.seqno < VMW_FENCE_WRAP))
return false;

- iowrite32(fence->seqno, fifo_mem + SVGA_FIFO_FENCE_GOAL);
- fence->fman->seqno_valid = true;
+ iowrite32(fence->base.seqno, fifo_mem + SVGA_FIFO_FENCE_GOAL);
+ fman->seqno_valid = true;

return true;
}

-void vmw_fences_update(struct vmw_fence_manager *fman)
+static void __vmw_fences_update(struct vmw_fence_manager *fman)
{
- unsigned long flags;
struct vmw_fence_obj *fence, *next_fence;
struct list_head action_list;
bool needs_rerun;
@@ -379,32 +456,25 @@ void vmw_fences_update(struct vmw_fence_manager *fman)

seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);
rerun:
- spin_lock_irqsave(&fman->lock, flags);
list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
- if (seqno - fence->seqno < VMW_FENCE_WRAP) {
+ if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
list_del_init(&fence->head);
- fence->signaled = 1;
+ fence_signal_locked(&fence->base);
INIT_LIST_HEAD(&action_list);
list_splice_init(&fence->seq_passed_actions,
&action_list);
vmw_fences_perform_actions(fman, &action_list);
- wake_up_all(&fence->queue);
} else
break;
}

- needs_rerun = vmw_fence_goal_new_locked(fman, seqno);
-
- if (!list_empty(&fman->cleanup_list))
- (void) schedule_work(&fman->work);
- spin_unlock_irqrestore(&fman->lock, flags);
-
/*
* Rerun if the fence goal seqno was updated, and the
* hardware might have raced with that update, so that
* we missed a fence_goal irq.
*/

+ needs_rerun = vmw_fence_goal_new_locked(fman, seqno);
if (unlikely(needs_rerun)) {
new_seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);
if (new_seqno != seqno) {
@@ -412,75 +482,58 @@ rerun:
goto rerun;
}
}
+
+ if (!list_empty(&fman->cleanup_list))
+ (void) schedule_work(&fman->work);
}

-bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
+void vmw_fences_update(struct vmw_fence_manager *fman)
{
- struct vmw_fence_manager *fman = fence->fman;
unsigned long irq_flags;
- uint32_t signaled;

spin_lock_irqsave(&fman->lock, irq_flags);
- signaled = fence->signaled;
+ __vmw_fences_update(fman);
spin_unlock_irqrestore(&fman->lock, irq_flags);
+}
+
+bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
+{
+ struct vmw_fence_manager *fman = fman_from_fence(fence);

- if (signaled)
+ if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
return 1;

vmw_fences_update(fman);

- spin_lock_irqsave(&fman->lock, irq_flags);
- signaled = fence->signaled;
- spin_unlock_irqrestore(&fman->lock, irq_flags);
-
- return signaled;
+ return fence_is_signaled(&fence->base);
}

int vmw_fence_obj_wait(struct vmw_fence_obj *fence, bool lazy,
bool interruptible, unsigned long timeout)
{
- struct vmw_private *dev_priv = fence->fman->dev_priv;
- long ret;
+ long ret = fence_wait_timeout(&fence->base, interruptible, timeout);

- if (likely(vmw_fence_obj_signaled(fence)))
+ if (likely(ret > 0))
return 0;
-
- vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
- vmw_seqno_waiter_add(dev_priv);
-
- if (interruptible)
- ret = wait_event_interruptible_timeout
- (fence->queue,
- vmw_fence_obj_signaled(fence),
- timeout);
+ else if (ret == 0)
+ return -EBUSY;
else
- ret = wait_event_timeout
- (fence->queue,
- vmw_fence_obj_signaled(fence),
- timeout);
-
- vmw_seqno_waiter_remove(dev_priv);
-
- if (unlikely(ret == 0))
- ret = -EBUSY;
- else if (likely(ret > 0))
- ret = 0;
-
- return ret;
+ return ret;
}

void vmw_fence_obj_flush(struct vmw_fence_obj *fence)
{
- struct vmw_private *dev_priv = fence->fman->dev_priv;
+ struct vmw_private *dev_priv = fman_from_fence(fence)->dev_priv;

vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
}

static void vmw_fence_destroy(struct vmw_fence_obj *fence)
{
- struct vmw_fence_manager *fman = fence->fman;
+ struct vmw_fence_manager *fman = fman_from_fence(fence);
+
+ fence_free(&fence->base);

- kfree(fence);
/*
* Free kernel space accounting.
*/
@@ -527,7 +580,7 @@ static void vmw_user_fence_destroy(struct vmw_fence_obj *fence)
{
struct vmw_user_fence *ufence =
container_of(fence, struct vmw_user_fence, fence);
- struct vmw_fence_manager *fman = fence->fman;
+ struct vmw_fence_manager *fman = fman_from_fence(fence);

ttm_base_object_kfree(ufence, base);
/*
@@ -620,7 +673,6 @@ out_no_object:

void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
{
- unsigned long irq_flags;
struct list_head action_list;
int ret;

@@ -629,13 +681,13 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
* restart when we've released the fman->lock.
*/

- spin_lock_irqsave(&fman->lock, irq_flags);
+ spin_lock_irq(&fman->lock);
fman->fifo_down = true;
while (!list_empty(&fman->fence_list)) {
struct vmw_fence_obj *fence =
list_entry(fman->fence_list.prev, struct vmw_fence_obj,
head);
- kref_get(&fence->kref);
+ fence_get(&fence->base);
spin_unlock_irq(&fman->lock);

ret = vmw_fence_obj_wait(fence, false, false,
@@ -643,20 +695,18 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)

if (unlikely(ret != 0)) {
list_del_init(&fence->head);
- fence->signaled = 1;
+ fence_signal(&fence->base);
INIT_LIST_HEAD(&action_list);
list_splice_init(&fence->seq_passed_actions,
&action_list);
vmw_fences_perform_actions(fman, &action_list);
- wake_up_all(&fence->queue);
}

- spin_lock_irq(&fman->lock);
-
BUG_ON(!list_empty(&fence->head));
- kref_put(&fence->kref, vmw_fence_obj_destroy_locked);
+ fence_put(&fence->base);
+ spin_lock_irq(&fman->lock);
}
- spin_unlock_irqrestore(&fman->lock, irq_flags);
+ spin_unlock_irq(&fman->lock);
}

void vmw_fence_fifo_up(struct vmw_fence_manager *fman)
@@ -748,12 +798,12 @@ int vmw_fence_obj_signaled_ioctl(struct drm_device *dev, void *data,
}

fence = &(container_of(base, struct vmw_user_fence, base)->fence);
- fman = fence->fman;
+ fman = fman_from_fence(fence);

arg->signaled = vmw_fence_obj_signaled(fence);
- spin_lock_irq(&fman->lock);

arg->signaled_flags = arg->flags;
+ spin_lock_irq(&fman->lock);
arg->passed_seqno = dev_priv->last_read_seqno;
spin_unlock_irq(&fman->lock);

@@ -866,7 +916,7 @@ static void vmw_event_fence_action_cleanup(struct vmw_fence_action *action)
{
struct vmw_event_fence_action *eaction =
container_of(action, struct vmw_event_fence_action, action);
- struct vmw_fence_manager *fman = eaction->fence->fman;
+ struct vmw_fence_manager *fman = fman_from_fence(eaction->fence);
unsigned long irq_flags;

spin_lock_irqsave(&fman->lock, irq_flags);
@@ -890,7 +940,7 @@ static void vmw_event_fence_action_cleanup(struct vmw_fence_action *action)
static void vmw_fence_obj_add_action(struct vmw_fence_obj *fence,
struct vmw_fence_action *action)
{
- struct vmw_fence_manager *fman = fence->fman;
+ struct vmw_fence_manager *fman = fman_from_fence(fence);
unsigned long irq_flags;
bool run_update = false;

@@ -898,7 +948,7 @@ static void vmw_fence_obj_add_action(struct vmw_fence_obj *fence,
spin_lock_irqsave(&fman->lock, irq_flags);

fman->pending_actions[action->type]++;
- if (fence->signaled) {
+ if (fence_is_signaled_locked(&fence->base)) {
struct list_head action_list;

INIT_LIST_HEAD(&action_list);
@@ -950,7 +1000,7 @@ int vmw_event_fence_action_queue(struct drm_file *file_priv,
bool interruptible)
{
struct vmw_event_fence_action *eaction;
- struct vmw_fence_manager *fman = fence->fman;
+ struct vmw_fence_manager *fman = fman_from_fence(fence);
struct vmw_fpriv *vmw_fp = vmw_fpriv(file_priv);
unsigned long irq_flags;

@@ -990,7 +1040,8 @@ static int vmw_event_fence_action_create(struct drm_file *file_priv,
bool interruptible)
{
struct vmw_event_fence_pending *event;
- struct drm_device *dev = fence->fman->dev_priv->dev;
+ struct vmw_fence_manager *fman = fman_from_fence(fence);
+ struct drm_device *dev = fman->dev_priv->dev;
unsigned long irq_flags;
int ret;

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
index 8c18d32bd1c3..26a4add39208 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
@@ -27,6 +27,8 @@

#ifndef _VMWGFX_FENCE_H_

+#include <linux/fence.h>
+
#define VMW_FENCE_WAIT_TIMEOUT (5*HZ)

struct vmw_private;
@@ -50,15 +52,11 @@ struct vmw_fence_action {
};

struct vmw_fence_obj {
- struct kref kref;
- u32 seqno;
+ struct fence base;

- struct vmw_fence_manager *fman;
struct list_head head;
- uint32_t signaled;
struct list_head seq_passed_actions;
void (*destroy)(struct vmw_fence_obj *fence);
- wait_queue_head_t queue;
};

extern struct vmw_fence_manager *
@@ -66,10 +64,23 @@ vmw_fence_manager_init(struct vmw_private *dev_priv);

extern void vmw_fence_manager_takedown(struct vmw_fence_manager *fman);

-extern void vmw_fence_obj_unreference(struct vmw_fence_obj **fence_p);
-
-extern struct vmw_fence_obj *
-vmw_fence_obj_reference(struct vmw_fence_obj *fence);
+static inline void
+vmw_fence_obj_unreference(struct vmw_fence_obj **fence_p)
+{
+ struct vmw_fence_obj *fence = *fence_p;
+
+ *fence_p = NULL;
+ if (fence)
+ fence_put(&fence->base);
+}
+
+static inline struct vmw_fence_obj *
+vmw_fence_obj_reference(struct vmw_fence_obj *fence)
+{
+ if (fence)
+ fence_get(&fence->base);
+ return fence;
+}

extern void vmw_fences_update(struct vmw_fence_manager *fman);

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 48e47a100dea..6688a6341486 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -1419,21 +1419,20 @@ void vmw_fence_single_bo(struct ttm_buffer_object *bo,
struct vmw_fence_obj *fence)
{
struct ttm_bo_device *bdev = bo->bdev;
- struct ttm_bo_driver *driver = bdev->driver;
struct vmw_fence_obj *old_fence_obj;
struct vmw_private *dev_priv =
container_of(bdev, struct vmw_private, bdev);

- if (fence == NULL)
+ if (fence == NULL) {
vmw_execbuf_fence_commands(NULL, dev_priv, &fence, NULL);
- else
- driver->sync_obj_ref(fence);
+ } else
+ vmw_fence_obj_reference(fence);

+ reservation_object_add_excl_fence(bo->resv, &fence->base);

old_fence_obj = bo->sync_obj;
bo->sync_obj = fence;

-
if (old_fence_obj)
vmw_fence_obj_unreference(&old_fence_obj);
}

2014-07-09 13:00:38

by Deucher, Alexander

[permalink] [raw]
Subject: RE: [PATCH 09/17] drm/radeon: use common fence implementation for fences



> -----Original Message-----
> From: Maarten Lankhorst [mailto:[email protected]]
> Sent: Wednesday, July 09, 2014 8:30 AM
> To: [email protected]
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected];
> [email protected]; Deucher, Alexander; Koenig, Christian
> Subject: [PATCH 09/17] drm/radeon: use common fence implementation for
> fences
>
> Signed-off-by: Maarten Lankhorst <[email protected]>
> ---
> drivers/gpu/drm/radeon/radeon.h | 15 +-
> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
> drivers/gpu/drm/radeon/radeon_fence.c | 223
> ++++++++++++++++++++++++++------
> 3 files changed, 248 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h
> b/drivers/gpu/drm/radeon/radeon.h
> index 29d9cc04c04e..03a5567f2c2f 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -64,6 +64,7 @@
> #include <linux/wait.h>
> #include <linux/list.h>
> #include <linux/kref.h>
> +#include <linux/fence.h>
>
> #include <ttm/ttm_bo_api.h>
> #include <ttm/ttm_bo_driver.h>
> @@ -116,9 +117,6 @@ extern int radeon_deep_color;
> #define RADEONFB_CONN_LIMIT 4
> #define RADEON_BIOS_NUM_SCRATCH 8
>
> -/* fence seq are set to this number when signaled */
> -#define RADEON_FENCE_SIGNALED_SEQ 0LL
> -
> /* internal ring indices */
> /* r1xx+ has gfx CP ring */
> #define RADEON_RING_TYPE_GFX_INDEX 0
> @@ -350,12 +348,15 @@ struct radeon_fence_driver {
> };
>
> struct radeon_fence {
> + struct fence base;
> +
> struct radeon_device *rdev;
> - struct kref kref;
> /* protected by radeon_fence.lock */
> uint64_t seq;
> /* RB, DMA, etc. */
> unsigned ring;
> +
> + wait_queue_t fence_wake;
> };
>
> int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
> @@ -2268,6 +2269,7 @@ struct radeon_device {
> struct radeon_mman mman;
> struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS];
> wait_queue_head_t fence_queue;
> + unsigned fence_context;
> struct mutex ring_lock;
> struct radeon_ring ring[RADEON_NUM_RINGS];
> bool ib_pool_ready;
> @@ -2358,11 +2360,6 @@ u32 cik_mm_rdoorbell(struct radeon_device
> *rdev, u32 index);
> void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
>
> /*
> - * Cast helper
> - */
> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
> -
> -/*
> * Registers read & write functions.
> */
> #define RREG8(reg) readb((rdev->rmmio) + (reg))
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c
> b/drivers/gpu/drm/radeon/radeon_device.c
> index 03686fab842d..86699df7c8f3 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1213,6 +1213,7 @@ int radeon_device_init(struct radeon_device
> *rdev,
> for (i = 0; i < RADEON_NUM_RINGS; i++) {
> rdev->ring[i].idx = i;
> }
> + rdev->fence_context =
> fence_context_alloc(RADEON_NUM_RINGS);
>
> DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X
> 0x%04X:0x%04X).\n",
> radeon_family_name[rdev->family], pdev->vendor, pdev-
> >device,
> @@ -1607,6 +1608,54 @@ int radeon_resume_kms(struct drm_device *dev,
> bool resume, bool fbcon)
> return 0;
> }
>
> +static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
> +{
> + uint32_t mask = 0;
> + int i;
> +
> + if (!rdev->ddev->irq_enabled)
> + return mask;
> +
> + /*
> + * increase refcount on sw interrupts for all rings to stop
> + * enabling interrupts in radeon_fence_enable_signaling during
> + * gpu reset.
> + */
> +
> + for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> + if (!rdev->ring[i].ready)
> + continue;
> +
> + atomic_inc(&rdev->irq.ring_int[i]);
> + mask |= 1 << i;
> + }
> + return mask;
> +}
> +
> +static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev,
> uint32_t mask)
> +{
> + unsigned long irqflags;
> + int i;
> +
> + if (!mask)
> + return;
> +
> + /*
> + * undo refcount increase, and reset irqs to correct value.
> + */
> +
> + for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> + if (!(mask & (1 << i)))
> + continue;
> +
> + atomic_dec(&rdev->irq.ring_int[i]);
> + }
> +
> + spin_lock_irqsave(&rdev->irq.lock, irqflags);
> + radeon_irq_set(rdev);
> + spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
> +}
> +
> /**
> * radeon_gpu_reset - reset the asic
> *
> @@ -1624,6 +1673,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>
> int i, r;
> int resched;
> + uint32_t sw_mask;
>
> down_write(&rdev->exclusive_lock);
>
> @@ -1637,6 +1687,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
> radeon_save_bios_scratch_regs(rdev);
> /* block TTM */
> resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
> + sw_mask = radeon_gpu_mask_sw_irq(rdev);
> radeon_pm_suspend(rdev);
> radeon_suspend(rdev);
>
> @@ -1686,13 +1737,20 @@ retry:
> radeon_pm_resume(rdev);
> drm_helper_resume_force_mode(rdev->ddev);
>
> + radeon_gpu_unmask_sw_irq(rdev, sw_mask);
> ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev,
> resched);
> if (r) {
> /* bad news, how to tell it to userspace ? */
> dev_info(rdev->dev, "GPU reset failed\n");
> }
>
> - up_write(&rdev->exclusive_lock);
> + /*
> + * force all waiters to recheck, some may have been
> + * added while the exclusive_lock was unavailable
> + */
> + downgrade_write(&rdev->exclusive_lock);
> + wake_up_all(&rdev->fence_queue);
> + up_read(&rdev->exclusive_lock);
> return r;
> }
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c
> b/drivers/gpu/drm/radeon/radeon_fence.c
> index 6435719fd45b..81c98f6ff0ca 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -39,6 +39,15 @@
> #include "radeon.h"
> #include "radeon_trace.h"
>
> +static const struct fence_ops radeon_fence_ops;
> +
> +#define to_radeon_fence(p) \
> + ({ \
> + struct radeon_fence *__f; \
> + __f = container_of((p), struct radeon_fence, base); \
> + __f->base.ops == &radeon_fence_ops ? __f : NULL; \
> + })
> +
> /*
> * Fences
> * Fences mark an event in the GPUs pipeline and are used
> @@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device
> *rdev,
> struct radeon_fence **fence,
> int ring)
> {
> + u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
> +
> /* we are protected by the ring emission mutex */
> *fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
> if ((*fence) == NULL) {
> return -ENOMEM;
> }
> - kref_init(&((*fence)->kref));
> - (*fence)->rdev = rdev;
> - (*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
> (*fence)->ring = ring;
> + fence_init(&(*fence)->base, &radeon_fence_ops,
> + &rdev->fence_queue.lock, rdev->fence_context + ring,
> seq);
> + (*fence)->rdev = rdev;
> + (*fence)->seq = seq;
> radeon_fence_ring_emit(rdev, ring, *fence);
> trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
> return 0;
> }
>
> /**
> - * radeon_fence_process - process a fence
> + * radeon_fence_check_signaled - callback from fence_queue
> *
> - * @rdev: radeon_device pointer
> - * @ring: ring index the fence is associated with
> - *
> - * Checks the current fence value and wakes the fence queue
> - * if the sequence number has increased (all asics).
> + * this function is called with fence_queue lock held, which is also used
> + * for the fence locking itself, so unlocked variants are used for
> + * fence_signal, and remove_wait_queue.
> */
> -void radeon_fence_process(struct radeon_device *rdev, int ring)
> +static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned
> mode, int flags, void *key)
> +{
> + struct radeon_fence *fence;
> + u64 seq;
> +
> + fence = container_of(wait, struct radeon_fence, fence_wake);
> +
> + seq = atomic64_read(&fence->rdev->fence_drv[fence-
> >ring].last_seq);
> + if (seq >= fence->seq) {
> + int ret = fence_signal_locked(&fence->base);
> +
> + if (!ret)
> + FENCE_TRACE(&fence->base, "signaled from irq
> context\n");
> + else
> + FENCE_TRACE(&fence->base, "was already
> signaled\n");
> +
> + radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
> + __remove_wait_queue(&fence->rdev->fence_queue,
> &fence->fence_wake);
> + fence_put(&fence->base);
> + } else
> + FENCE_TRACE(&fence->base, "pending\n");
> + return 0;
> +}
> +
> +static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
> {
> uint64_t seq, last_seq, last_emitted;
> unsigned count_loop = 0;
> @@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device
> *rdev, int ring)
> }
> } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) >
> seq);
>
> - if (wake)
> - wake_up_all(&rdev->fence_queue);
> + return wake;
> }
>
> /**
> - * radeon_fence_destroy - destroy a fence
> + * radeon_fence_process - process a fence
> *
> - * @kref: fence kref
> + * @rdev: radeon_device pointer
> + * @ring: ring index the fence is associated with
> *
> - * Frees the fence object (all asics).
> + * Checks the current fence value and wakes the fence queue
> + * if the sequence number has increased (all asics).
> */
> -static void radeon_fence_destroy(struct kref *kref)
> +void radeon_fence_process(struct radeon_device *rdev, int ring)
> {
> - struct radeon_fence *fence;
> -
> - fence = container_of(kref, struct radeon_fence, kref);
> - kfree(fence);
> + if (__radeon_fence_process(rdev, ring))
> + wake_up_all(&rdev->fence_queue);
> }
>
> /**
> @@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct
> radeon_device *rdev,
> return false;
> }
>
> +static bool __radeon_fence_signaled(struct fence *f)
> +{
> + struct radeon_fence *fence = to_radeon_fence(f);
> + struct radeon_device *rdev = fence->rdev;
> + unsigned ring = fence->ring;
> + u64 seq = fence->seq;
> +
> + if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
> + return true;
> + }
> +
> + if (down_read_trylock(&rdev->exclusive_lock)) {
> + radeon_fence_process(rdev, ring);
> + up_read(&rdev->exclusive_lock);
> +
> + if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq)
> {
> + return true;
> + }
> + }
> + return false;
> +}
> +
> +/**
> + * radeon_fence_enable_signaling - enable signalling on fence
> + * @fence: fence
> + *
> + * This function is called with fence_queue lock held, and adds a callback
> + * to fence_queue that checks if this fence is signaled, and if so it
> + * signals the fence and removes itself.
> + */
> +static bool radeon_fence_enable_signaling(struct fence *f)
> +{
> + struct radeon_fence *fence = to_radeon_fence(f);
> + struct radeon_device *rdev = fence->rdev;
> +
> + if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >=
> fence->seq ||
> + !rdev->ddev->irq_enabled)
> + return false;
> +
> + radeon_irq_kms_sw_irq_get(rdev, fence->ring);
> +
> + if (down_read_trylock(&rdev->exclusive_lock)) {
> + if (__radeon_fence_process(rdev, fence->ring))
> + wake_up_all_locked(&rdev->fence_queue);
> +
> + up_read(&rdev->exclusive_lock);
> + }
> +
> + /* did fence get signaled after we enabled the sw irq? */
> + if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >=
> fence->seq) {
> + radeon_irq_kms_sw_irq_put(rdev, fence->ring);
> + return false;
> + }
> +
> + fence->fence_wake.flags = 0;
> + fence->fence_wake.private = NULL;
> + fence->fence_wake.func = radeon_fence_check_signaled;
> + __add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
> + fence_get(f);
> +
> + return true;
> +}
> +
> /**
> * radeon_fence_signaled - check if a fence has signaled
> *
> @@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence
> *fence)
> if (!fence) {
> return true;
> }
> - if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
> - return true;
> - }
> +
> if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence-
> >ring)) {
> - fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> + int ret;
> +
> + ret = fence_signal(&fence->base);
> + if (!ret)
> + FENCE_TRACE(&fence->base, "signaled from
> radeon_fence_signaled\n");
> return true;
> }
> return false;
> @@ -413,21 +511,18 @@ int radeon_fence_wait(struct radeon_fence
> *fence, bool intr)
> uint64_t seq[RADEON_NUM_RINGS] = {};
> long r;
>
> - if (fence == NULL) {
> - WARN(1, "Querying an invalid fence : %p !\n", fence);
> - return -EINVAL;
> - }
> -
> - seq[fence->ring] = fence->seq;
> - if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
> + if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
> return 0;
>
> + seq[fence->ring] = fence->seq;
> r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr,
> MAX_SCHEDULE_TIMEOUT);
> if (r < 0) {
> return r;
> }
>
> - fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> + r = fence_signal(&fence->base);
> + if (!r)
> + FENCE_TRACE(&fence->base, "signaled from
> fence_wait\n");
> return 0;
> }
>
> @@ -459,12 +554,13 @@ int radeon_fence_wait_any(struct radeon_device
> *rdev,
> continue;
> }
>
> + if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]-
> >base.flags)) {
> + /* already signaled */
> + return 0;
> + }
> +
> seq[i] = fences[i]->seq;
> ++num_rings;
> -
> - /* test if something was allready signaled */
> - if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
> - return 0;
> }
>
> /* nothing to wait for ? */
> @@ -545,7 +641,7 @@ int radeon_fence_wait_empty(struct radeon_device
> *rdev, int ring)
> */
> struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
> {
> - kref_get(&fence->kref);
> + fence_get(&fence->base);
> return fence;
> }
>
> @@ -561,9 +657,8 @@ void radeon_fence_unref(struct radeon_fence
> **fence)
> struct radeon_fence *tmp = *fence;
>
> *fence = NULL;
> - if (tmp) {
> - kref_put(&tmp->kref, radeon_fence_destroy);
> - }
> + if (tmp)
> + fence_put(&tmp->base);
> }
>
> /**
> @@ -872,3 +967,51 @@ int radeon_debugfs_fence_init(struct
> radeon_device *rdev)
> return 0;
> #endif
> }
> +
> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
> +{
> + struct radeon_fence *fence = to_radeon_fence(f);
> + u64 target_seq[RADEON_NUM_RINGS] = {};
> + struct radeon_device *rdev = fence->rdev;
> + long r;
> +
> + target_seq[fence->ring] = fence->seq;
> +
> + down_read(&rdev->exclusive_lock);
> + r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr,
> timeout);
> +
> + if (r > 0 && !fence_signal(&fence->base))
> + FENCE_TRACE(&fence->base, "signaled from
> __radeon_fence_wait\n");
> +
> + up_read(&rdev->exclusive_lock);
> + return r;
> +
> +}
> +
> +static const char *radeon_fence_get_driver_name(struct fence *fence)
> +{
> + return "radeon";
> +}
> +
> +static const char *radeon_fence_get_timeline_name(struct fence *f)
> +{
> + struct radeon_fence *fence = to_radeon_fence(f);
> + switch (fence->ring) {
> + case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
> + case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
> + case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
> + case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
> + case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
> + case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";

Radeon supports VCE rings on newer asics. Probably want to add the case for those here too.

Alex

> + default: WARN_ON_ONCE(1); return "radeon.unk";
> + }
> +}
> +
> +static const struct fence_ops radeon_fence_ops = {
> + .get_driver_name = radeon_fence_get_driver_name,
> + .get_timeline_name = radeon_fence_get_timeline_name,
> + .enable_signaling = radeon_fence_enable_signaling,
> + .signaled = __radeon_fence_signaled,
> + .wait = __radeon_fence_wait,
> + .release = NULL,
> +};


2014-07-09 13:21:30

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [PATCH 00/17] Convert TTM to the new fence interface.

On 09-07-14 15:09, Mike Lothian wrote:
> Hi Maarten
>
> Will this stop the stuttering I've been seeing with DRI3 and PRIME? Or will
> other patches / plumbing be required
>
No, that testing was with the whole series, including the parts that synchronized intel with radeon (iirc).
Although it might if you're lucky: I noticed that I missed an int to long conversion, which resulted in a success
being reported as an error, disabling graphics acceleration entirely.

The series here simply converts the drivers to a common fence infrastructure and shouldn't cause any regressions
or major behavioral changes. A separate series is needed to synchronize intel and radeon, and for that series the
support on the intel side is a hack. It should be possible to get the radeon/nouveau changes upstreamed, but this
conversion is required for that.
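
For illustration only, a minimal sketch of that kind of int-vs-long truncation (hypothetical helpers, not the
actual offending code; it assumes the fence_wait_timeout() API, which returns the remaining jiffies as a long and
can legitimately be as large as MAX_SCHEDULE_TIMEOUT):

	#include <linux/sched.h>	/* MAX_SCHEDULE_TIMEOUT */
	#include <linux/fence.h>	/* struct fence, fence_wait_timeout() */

	/* Buggy: fence_wait_timeout() returns a long; on immediate success with
	 * MAX_SCHEDULE_TIMEOUT it can return LONG_MAX, which truncates to -1 in
	 * an int on 64-bit, so a successful wait comes back as an error code.
	 */
	static int buggy_wait(struct fence *f, bool intr)
	{
		int r = fence_wait_timeout(f, intr, MAX_SCHEDULE_TIMEOUT);

		return r < 0 ? r : 0;
	}

	/* Fixed: keep the result in a long, only then collapse it to 0/-errno. */
	static int fixed_wait(struct fence *f, bool intr)
	{
		long r = fence_wait_timeout(f, intr, MAX_SCHEDULE_TIMEOUT);

		return r < 0 ? r : 0;
	}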

~Maarten

2014-07-09 13:23:55

by Maarten Lankhorst

[permalink] [raw]
Subject: [PATCH v2 09/17] drm/radeon: use common fence implementation for fences

On 09-07-14 14:57, Deucher, Alexander wrote:
>> <snip>
>> +static const char *radeon_fence_get_timeline_name(struct fence *f)
>> +{
>> + struct radeon_fence *fence = to_radeon_fence(f);
>> + switch (fence->ring) {
>> + case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
>> + case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
>> + case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
>> + case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
>> + case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
>> + case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
> Radeon supports VCE rings on newer asics. Probably want to add the case for those here too.
>
> Alex
>
Indeed, how about this?
----------8<-------
Signed-off-by: Maarten Lankhorst <[email protected]>
---
drivers/gpu/drm/radeon/radeon.h | 15 +--
drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
drivers/gpu/drm/radeon/radeon_fence.c | 225 +++++++++++++++++++++++++++------
3 files changed, 250 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 29d9cc04c04e..03a5567f2c2f 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
#include <linux/wait.h>
#include <linux/list.h>
#include <linux/kref.h>
+#include <linux/fence.h>

#include <ttm/ttm_bo_api.h>
#include <ttm/ttm_bo_driver.h>
@@ -116,9 +117,6 @@ extern int radeon_deep_color;
#define RADEONFB_CONN_LIMIT 4
#define RADEON_BIOS_NUM_SCRATCH 8

-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ 0LL
-
/* internal ring indices */
/* r1xx+ has gfx CP ring */
#define RADEON_RING_TYPE_GFX_INDEX 0
@@ -350,12 +348,15 @@ struct radeon_fence_driver {
};

struct radeon_fence {
+ struct fence base;
+
struct radeon_device *rdev;
- struct kref kref;
/* protected by radeon_fence.lock */
uint64_t seq;
/* RB, DMA, etc. */
unsigned ring;
+
+ wait_queue_t fence_wake;
};

int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2268,6 +2269,7 @@ struct radeon_device {
struct radeon_mman mman;
struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS];
wait_queue_head_t fence_queue;
+ unsigned fence_context;
struct mutex ring_lock;
struct radeon_ring ring[RADEON_NUM_RINGS];
bool ib_pool_ready;
@@ -2358,11 +2360,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);

/*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
* Registers read & write functions.
*/
#define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 03686fab842d..86699df7c8f3 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1213,6 +1213,7 @@ int radeon_device_init(struct radeon_device *rdev,
for (i = 0; i < RADEON_NUM_RINGS; i++) {
rdev->ring[i].idx = i;
}
+ rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);

DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1607,6 +1608,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
return 0;
}

+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
+{
+ uint32_t mask = 0;
+ int i;
+
+ if (!rdev->ddev->irq_enabled)
+ return mask;
+
+ /*
+ * increase refcount on sw interrupts for all rings to stop
+ * enabling interrupts in radeon_fence_enable_signaling during
+ * gpu reset.
+ */
+
+ for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+ if (!rdev->ring[i].ready)
+ continue;
+
+ atomic_inc(&rdev->irq.ring_int[i]);
+ mask |= 1 << i;
+ }
+ return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+ unsigned long irqflags;
+ int i;
+
+ if (!mask)
+ return;
+
+ /*
+ * undo refcount increase, and reset irqs to correct value.
+ */
+
+ for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+ if (!(mask & (1 << i)))
+ continue;
+
+ atomic_dec(&rdev->irq.ring_int[i]);
+ }
+
+ spin_lock_irqsave(&rdev->irq.lock, irqflags);
+ radeon_irq_set(rdev);
+ spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
/**
* radeon_gpu_reset - reset the asic
*
@@ -1624,6 +1673,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)

int i, r;
int resched;
+ uint32_t sw_mask;

down_write(&rdev->exclusive_lock);

@@ -1637,6 +1687,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
radeon_save_bios_scratch_regs(rdev);
/* block TTM */
resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+ sw_mask = radeon_gpu_mask_sw_irq(rdev);
radeon_pm_suspend(rdev);
radeon_suspend(rdev);

@@ -1686,13 +1737,20 @@ retry:
radeon_pm_resume(rdev);
drm_helper_resume_force_mode(rdev->ddev);

+ radeon_gpu_unmask_sw_irq(rdev, sw_mask);
ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
if (r) {
/* bad news, how to tell it to userspace ? */
dev_info(rdev->dev, "GPU reset failed\n");
}

- up_write(&rdev->exclusive_lock);
+ /*
+ * force all waiters to recheck, some may have been
+ * added while the exclusive_lock was unavailable
+ */
+ downgrade_write(&rdev->exclusive_lock);
+ wake_up_all(&rdev->fence_queue);
+ up_read(&rdev->exclusive_lock);
return r;
}

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 6435719fd45b..763b7928026d 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@
#include "radeon.h"
#include "radeon_trace.h"

+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+ ({ \
+ struct radeon_fence *__f; \
+ __f = container_of((p), struct radeon_fence, base); \
+ __f->base.ops == &radeon_fence_ops ? __f : NULL; \
+ })
+
/*
* Fences
* Fences mark an event in the GPUs pipeline and are used
@@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
struct radeon_fence **fence,
int ring)
{
+ u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
/* we are protected by the ring emission mutex */
*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
if ((*fence) == NULL) {
return -ENOMEM;
}
- kref_init(&((*fence)->kref));
- (*fence)->rdev = rdev;
- (*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
(*fence)->ring = ring;
+ fence_init(&(*fence)->base, &radeon_fence_ops,
+ &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
+ (*fence)->rdev = rdev;
+ (*fence)->seq = seq;
radeon_fence_ring_emit(rdev, ring, *fence);
trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
return 0;
}

/**
- * radeon_fence_process - process a fence
+ * radeon_fence_check_signaled - callback from fence_queue
*
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
- *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
*/
-void radeon_fence_process(struct radeon_device *rdev, int ring)
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+ struct radeon_fence *fence;
+ u64 seq;
+
+ fence = container_of(wait, struct radeon_fence, fence_wake);
+
+ seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+ if (seq >= fence->seq) {
+ int ret = fence_signal_locked(&fence->base);
+
+ if (!ret)
+ FENCE_TRACE(&fence->base, "signaled from irq context\n");
+ else
+ FENCE_TRACE(&fence->base, "was already signaled\n");
+
+ radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+ __remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+ fence_put(&fence->base);
+ } else
+ FENCE_TRACE(&fence->base, "pending\n");
+ return 0;
+}
+
+static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
{
uint64_t seq, last_seq, last_emitted;
unsigned count_loop = 0;
@@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
}
} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);

- if (wake)
- wake_up_all(&rdev->fence_queue);
+ return wake;
}

/**
- * radeon_fence_destroy - destroy a fence
+ * radeon_fence_process - process a fence
*
- * @kref: fence kref
+ * @rdev: radeon_device pointer
+ * @ring: ring index the fence is associated with
*
- * Frees the fence object (all asics).
+ * Checks the current fence value and wakes the fence queue
+ * if the sequence number has increased (all asics).
*/
-static void radeon_fence_destroy(struct kref *kref)
+void radeon_fence_process(struct radeon_device *rdev, int ring)
{
- struct radeon_fence *fence;
-
- fence = container_of(kref, struct radeon_fence, kref);
- kfree(fence);
+ if (__radeon_fence_process(rdev, ring))
+ wake_up_all(&rdev->fence_queue);
}

/**
@@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
return false;
}

+static bool __radeon_fence_signaled(struct fence *f)
+{
+ struct radeon_fence *fence = to_radeon_fence(f);
+ struct radeon_device *rdev = fence->rdev;
+ unsigned ring = fence->ring;
+ u64 seq = fence->seq;
+
+ if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+ return true;
+ }
+
+ if (down_read_trylock(&rdev->exclusive_lock)) {
+ radeon_fence_process(rdev, ring);
+ up_read(&rdev->exclusive_lock);
+
+ if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+ return true;
+ }
+ }
+ return false;
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @fence: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+ struct radeon_fence *fence = to_radeon_fence(f);
+ struct radeon_device *rdev = fence->rdev;
+
+ if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+ !rdev->ddev->irq_enabled)
+ return false;
+
+ radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+
+ if (down_read_trylock(&rdev->exclusive_lock)) {
+ if (__radeon_fence_process(rdev, fence->ring))
+ wake_up_all_locked(&rdev->fence_queue);
+
+ up_read(&rdev->exclusive_lock);
+ }
+
+ /* did fence get signaled after we enabled the sw irq? */
+ if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+ radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+ return false;
+ }
+
+ fence->fence_wake.flags = 0;
+ fence->fence_wake.private = NULL;
+ fence->fence_wake.func = radeon_fence_check_signaled;
+ __add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
+ fence_get(f);
+
+ return true;
+}
+
/**
* radeon_fence_signaled - check if a fence has signaled
*
@@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
if (!fence) {
return true;
}
- if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
- return true;
- }
+
if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
- fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+ int ret;
+
+ ret = fence_signal(&fence->base);
+ if (!ret)
+ FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
return true;
}
return false;
@@ -413,21 +511,18 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
uint64_t seq[RADEON_NUM_RINGS] = {};
long r;

- if (fence == NULL) {
- WARN(1, "Querying an invalid fence : %p !\n", fence);
- return -EINVAL;
- }
-
- seq[fence->ring] = fence->seq;
- if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+ if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
return 0;

+ seq[fence->ring] = fence->seq;
r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
if (r < 0) {
return r;
}

- fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+ r = fence_signal(&fence->base);
+ if (!r)
+ FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
return 0;
}

@@ -459,12 +554,13 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
continue;
}

+ if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
+ /* already signaled */
+ return 0;
+ }
+
seq[i] = fences[i]->seq;
++num_rings;
-
- /* test if something was allready signaled */
- if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
- return 0;
}

/* nothing to wait for ? */
@@ -545,7 +641,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
*/
struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
{
- kref_get(&fence->kref);
+ fence_get(&fence->base);
return fence;
}

@@ -561,9 +657,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
struct radeon_fence *tmp = *fence;

*fence = NULL;
- if (tmp) {
- kref_put(&tmp->kref, radeon_fence_destroy);
- }
+ if (tmp)
+ fence_put(&tmp->base);
}

/**
@@ -872,3 +967,53 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
return 0;
#endif
}
+
+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+ struct radeon_fence *fence = to_radeon_fence(f);
+ u64 target_seq[RADEON_NUM_RINGS] = {};
+ struct radeon_device *rdev = fence->rdev;
+ long r;
+
+ target_seq[fence->ring] = fence->seq;
+
+ down_read(&rdev->exclusive_lock);
+ r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
+
+ if (r > 0 && !fence_signal(&fence->base))
+ FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
+
+ up_read(&rdev->exclusive_lock);
+ return r;
+
+}
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+ return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+ struct radeon_fence *fence = to_radeon_fence(f);
+ switch (fence->ring) {
+ case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+ case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+ case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+ case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+ case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+ case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+ case TN_RING_TYPE_VCE1_INDEX: return "radeon.vce1";
+ case TN_RING_TYPE_VCE2_INDEX: return "radeon.vce2";
+ default: WARN_ON_ONCE(1); return "radeon.unk";
+ }
+}
+
+static const struct fence_ops radeon_fence_ops = {
+ .get_driver_name = radeon_fence_get_driver_name,
+ .get_timeline_name = radeon_fence_get_timeline_name,
+ .enable_signaling = radeon_fence_enable_signaling,
+ .signaled = __radeon_fence_signaled,
+ .wait = __radeon_fence_wait,
+ .release = NULL,
+};
--
2.0.0

2014-07-10 17:27:45

by Alex Deucher

[permalink] [raw]
Subject: Re: [PATCH v2 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 9, 2014 at 9:23 AM, Maarten Lankhorst
<[email protected]> wrote:
> On 09-07-14 14:57, Deucher, Alexander wrote:
>>> <snip>
>>> +static const char *radeon_fence_get_timeline_name(struct fence *f)
>>> +{
>>> + struct radeon_fence *fence = to_radeon_fence(f);
>>> + switch (fence->ring) {
>>> + case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
>>> + case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
>>> + case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
>>> + case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
>>> + case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
>>> + case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
>> Radeon supports VCE rings on newer asics. Probably want to add the case for those here too.
>>
>> Alex
>>
> Indeed, how about this?

Looks good. I'll let Christian comment on the rest of the changes.

Alex

> ----------8<-------
> Signed-off-by: Maarten Lankhorst <[email protected]>
> ---
> drivers/gpu/drm/radeon/radeon.h | 15 +--
> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
> drivers/gpu/drm/radeon/radeon_fence.c | 225 +++++++++++++++++++++++++++------
> 3 files changed, 250 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 29d9cc04c04e..03a5567f2c2f 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -64,6 +64,7 @@
> #include <linux/wait.h>
> #include <linux/list.h>
> #include <linux/kref.h>
> +#include <linux/fence.h>
>
> #include <ttm/ttm_bo_api.h>
> #include <ttm/ttm_bo_driver.h>
> @@ -116,9 +117,6 @@ extern int radeon_deep_color;
> #define RADEONFB_CONN_LIMIT 4
> #define RADEON_BIOS_NUM_SCRATCH 8
>
> -/* fence seq are set to this number when signaled */
> -#define RADEON_FENCE_SIGNALED_SEQ 0LL
> -
> /* internal ring indices */
> /* r1xx+ has gfx CP ring */
> #define RADEON_RING_TYPE_GFX_INDEX 0
> @@ -350,12 +348,15 @@ struct radeon_fence_driver {
> };
>
> struct radeon_fence {
> + struct fence base;
> +
> struct radeon_device *rdev;
> - struct kref kref;
> /* protected by radeon_fence.lock */
> uint64_t seq;
> /* RB, DMA, etc. */
> unsigned ring;
> +
> + wait_queue_t fence_wake;
> };
>
> int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
> @@ -2268,6 +2269,7 @@ struct radeon_device {
> struct radeon_mman mman;
> struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS];
> wait_queue_head_t fence_queue;
> + unsigned fence_context;
> struct mutex ring_lock;
> struct radeon_ring ring[RADEON_NUM_RINGS];
> bool ib_pool_ready;
> @@ -2358,11 +2360,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
> void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
>
> /*
> - * Cast helper
> - */
> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
> -
> -/*
> * Registers read & write functions.
> */
> #define RREG8(reg) readb((rdev->rmmio) + (reg))
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 03686fab842d..86699df7c8f3 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1213,6 +1213,7 @@ int radeon_device_init(struct radeon_device *rdev,
> for (i = 0; i < RADEON_NUM_RINGS; i++) {
> rdev->ring[i].idx = i;
> }
> + rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
>
> DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
> radeon_family_name[rdev->family], pdev->vendor, pdev->device,
> @@ -1607,6 +1608,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
> return 0;
> }
>
> +static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
> +{
> + uint32_t mask = 0;
> + int i;
> +
> + if (!rdev->ddev->irq_enabled)
> + return mask;
> +
> + /*
> + * increase refcount on sw interrupts for all rings to stop
> + * enabling interrupts in radeon_fence_enable_signaling during
> + * gpu reset.
> + */
> +
> + for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> + if (!rdev->ring[i].ready)
> + continue;
> +
> + atomic_inc(&rdev->irq.ring_int[i]);
> + mask |= 1 << i;
> + }
> + return mask;
> +}
> +
> +static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
> +{
> + unsigned long irqflags;
> + int i;
> +
> + if (!mask)
> + return;
> +
> + /*
> + * undo refcount increase, and reset irqs to correct value.
> + */
> +
> + for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> + if (!(mask & (1 << i)))
> + continue;
> +
> + atomic_dec(&rdev->irq.ring_int[i]);
> + }
> +
> + spin_lock_irqsave(&rdev->irq.lock, irqflags);
> + radeon_irq_set(rdev);
> + spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
> +}
> +
> /**
> * radeon_gpu_reset - reset the asic
> *
> @@ -1624,6 +1673,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>
> int i, r;
> int resched;
> + uint32_t sw_mask;
>
> down_write(&rdev->exclusive_lock);
>
> @@ -1637,6 +1687,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
> radeon_save_bios_scratch_regs(rdev);
> /* block TTM */
> resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
> + sw_mask = radeon_gpu_mask_sw_irq(rdev);
> radeon_pm_suspend(rdev);
> radeon_suspend(rdev);
>
> @@ -1686,13 +1737,20 @@ retry:
> radeon_pm_resume(rdev);
> drm_helper_resume_force_mode(rdev->ddev);
>
> + radeon_gpu_unmask_sw_irq(rdev, sw_mask);
> ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
> if (r) {
> /* bad news, how to tell it to userspace ? */
> dev_info(rdev->dev, "GPU reset failed\n");
> }
>
> - up_write(&rdev->exclusive_lock);
> + /*
> + * force all waiters to recheck, some may have been
> + * added while the exclusive_lock was unavailable
> + */
> + downgrade_write(&rdev->exclusive_lock);
> + wake_up_all(&rdev->fence_queue);
> + up_read(&rdev->exclusive_lock);
> return r;
> }
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index 6435719fd45b..763b7928026d 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -39,6 +39,15 @@
> #include "radeon.h"
> #include "radeon_trace.h"
>
> +static const struct fence_ops radeon_fence_ops;
> +
> +#define to_radeon_fence(p) \
> + ({ \
> + struct radeon_fence *__f; \
> + __f = container_of((p), struct radeon_fence, base); \
> + __f->base.ops == &radeon_fence_ops ? __f : NULL; \
> + })
> +
> /*
> * Fences
> * Fences mark an event in the GPUs pipeline and are used
> @@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
> struct radeon_fence **fence,
> int ring)
> {
> + u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
> +
> /* we are protected by the ring emission mutex */
> *fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
> if ((*fence) == NULL) {
> return -ENOMEM;
> }
> - kref_init(&((*fence)->kref));
> - (*fence)->rdev = rdev;
> - (*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
> (*fence)->ring = ring;
> + fence_init(&(*fence)->base, &radeon_fence_ops,
> + &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
> + (*fence)->rdev = rdev;
> + (*fence)->seq = seq;
> radeon_fence_ring_emit(rdev, ring, *fence);
> trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
> return 0;
> }
>
> /**
> - * radeon_fence_process - process a fence
> + * radeon_fence_check_signaled - callback from fence_queue
> *
> - * @rdev: radeon_device pointer
> - * @ring: ring index the fence is associated with
> - *
> - * Checks the current fence value and wakes the fence queue
> - * if the sequence number has increased (all asics).
> + * this function is called with fence_queue lock held, which is also used
> + * for the fence locking itself, so unlocked variants are used for
> + * fence_signal, and remove_wait_queue.
> */
> -void radeon_fence_process(struct radeon_device *rdev, int ring)
> +static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
> +{
> + struct radeon_fence *fence;
> + u64 seq;
> +
> + fence = container_of(wait, struct radeon_fence, fence_wake);
> +
> + seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
> + if (seq >= fence->seq) {
> + int ret = fence_signal_locked(&fence->base);
> +
> + if (!ret)
> + FENCE_TRACE(&fence->base, "signaled from irq context\n");
> + else
> + FENCE_TRACE(&fence->base, "was already signaled\n");
> +
> + radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
> + __remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
> + fence_put(&fence->base);
> + } else
> + FENCE_TRACE(&fence->base, "pending\n");
> + return 0;
> +}
> +
> +static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
> {
> uint64_t seq, last_seq, last_emitted;
> unsigned count_loop = 0;
> @@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
> }
> } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>
> - if (wake)
> - wake_up_all(&rdev->fence_queue);
> + return wake;
> }
>
> /**
> - * radeon_fence_destroy - destroy a fence
> + * radeon_fence_process - process a fence
> *
> - * @kref: fence kref
> + * @rdev: radeon_device pointer
> + * @ring: ring index the fence is associated with
> *
> - * Frees the fence object (all asics).
> + * Checks the current fence value and wakes the fence queue
> + * if the sequence number has increased (all asics).
> */
> -static void radeon_fence_destroy(struct kref *kref)
> +void radeon_fence_process(struct radeon_device *rdev, int ring)
> {
> - struct radeon_fence *fence;
> -
> - fence = container_of(kref, struct radeon_fence, kref);
> - kfree(fence);
> + if (__radeon_fence_process(rdev, ring))
> + wake_up_all(&rdev->fence_queue);
> }
>
> /**
> @@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
> return false;
> }
>
> +static bool __radeon_fence_signaled(struct fence *f)
> +{
> + struct radeon_fence *fence = to_radeon_fence(f);
> + struct radeon_device *rdev = fence->rdev;
> + unsigned ring = fence->ring;
> + u64 seq = fence->seq;
> +
> + if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
> + return true;
> + }
> +
> + if (down_read_trylock(&rdev->exclusive_lock)) {
> + radeon_fence_process(rdev, ring);
> + up_read(&rdev->exclusive_lock);
> +
> + if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
> + return true;
> + }
> + }
> + return false;
> +}
> +
> +/**
> + * radeon_fence_enable_signaling - enable signalling on fence
> + * @fence: fence
> + *
> + * This function is called with fence_queue lock held, and adds a callback
> + * to fence_queue that checks if this fence is signaled, and if so it
> + * signals the fence and removes itself.
> + */
> +static bool radeon_fence_enable_signaling(struct fence *f)
> +{
> + struct radeon_fence *fence = to_radeon_fence(f);
> + struct radeon_device *rdev = fence->rdev;
> +
> + if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
> + !rdev->ddev->irq_enabled)
> + return false;
> +
> + radeon_irq_kms_sw_irq_get(rdev, fence->ring);
> +
> + if (down_read_trylock(&rdev->exclusive_lock)) {
> + if (__radeon_fence_process(rdev, fence->ring))
> + wake_up_all_locked(&rdev->fence_queue);
> +
> + up_read(&rdev->exclusive_lock);
> + }
> +
> + /* did fence get signaled after we enabled the sw irq? */
> + if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
> + radeon_irq_kms_sw_irq_put(rdev, fence->ring);
> + return false;
> + }
> +
> + fence->fence_wake.flags = 0;
> + fence->fence_wake.private = NULL;
> + fence->fence_wake.func = radeon_fence_check_signaled;
> + __add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
> + fence_get(f);
> +
> + return true;
> +}
> +
> /**
> * radeon_fence_signaled - check if a fence has signaled
> *
> @@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
> if (!fence) {
> return true;
> }
> - if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
> - return true;
> - }
> +
> if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
> - fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> + int ret;
> +
> + ret = fence_signal(&fence->base);
> + if (!ret)
> + FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
> return true;
> }
> return false;
> @@ -413,21 +511,18 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
> uint64_t seq[RADEON_NUM_RINGS] = {};
> long r;
>
> - if (fence == NULL) {
> - WARN(1, "Querying an invalid fence : %p !\n", fence);
> - return -EINVAL;
> - }
> -
> - seq[fence->ring] = fence->seq;
> - if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
> + if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
> return 0;
>
> + seq[fence->ring] = fence->seq;
> r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> if (r < 0) {
> return r;
> }
>
> - fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> + r = fence_signal(&fence->base);
> + if (!r)
> + FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
> return 0;
> }
>
> @@ -459,12 +554,13 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
> continue;
> }
>
> + if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
> + /* already signaled */
> + return 0;
> + }
> +
> seq[i] = fences[i]->seq;
> ++num_rings;
> -
> - /* test if something was allready signaled */
> - if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
> - return 0;
> }
>
> /* nothing to wait for ? */
> @@ -545,7 +641,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
> */
> struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
> {
> - kref_get(&fence->kref);
> + fence_get(&fence->base);
> return fence;
> }
>
> @@ -561,9 +657,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
> struct radeon_fence *tmp = *fence;
>
> *fence = NULL;
> - if (tmp) {
> - kref_put(&tmp->kref, radeon_fence_destroy);
> - }
> + if (tmp)
> + fence_put(&tmp->base);
> }
>
> /**
> @@ -872,3 +967,53 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
> return 0;
> #endif
> }
> +
> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
> +{
> + struct radeon_fence *fence = to_radeon_fence(f);
> + u64 target_seq[RADEON_NUM_RINGS] = {};
> + struct radeon_device *rdev = fence->rdev;
> + long r;
> +
> + target_seq[fence->ring] = fence->seq;
> +
> + down_read(&rdev->exclusive_lock);
> + r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
> +
> + if (r > 0 && !fence_signal(&fence->base))
> + FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
> +
> + up_read(&rdev->exclusive_lock);
> + return r;
> +
> +}
> +
> +static const char *radeon_fence_get_driver_name(struct fence *fence)
> +{
> + return "radeon";
> +}
> +
> +static const char *radeon_fence_get_timeline_name(struct fence *f)
> +{
> + struct radeon_fence *fence = to_radeon_fence(f);
> + switch (fence->ring) {
> + case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
> + case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
> + case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
> + case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
> + case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
> + case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
> + case TN_RING_TYPE_VCE1_INDEX: return "radeon.vce1";
> + case TN_RING_TYPE_VCE2_INDEX: return "radeon.vce2";
> + default: WARN_ON_ONCE(1); return "radeon.unk";
> + }
> +}
> +
> +static const struct fence_ops radeon_fence_ops = {
> + .get_driver_name = radeon_fence_get_driver_name,
> + .get_timeline_name = radeon_fence_get_timeline_name,
> + .enable_signaling = radeon_fence_enable_signaling,
> + .signaled = __radeon_fence_signaled,
> + .wait = __radeon_fence_wait,
> + .release = NULL,
> +};
> --
> 2.0.0
>
>

2014-07-10 21:37:35

by Thomas Hellstrom

[permalink] [raw]
Subject: Re: [PATCH 00/17] Convert TTM to the new fence interface.


On 2014-07-09 14:29, Maarten Lankhorst wrote:
> This series applies on top of the driver-core-next branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
>
> Before converting ttm to the new fence interface I had to fix some
> drivers to require a reservation before poking with fence_obj.
> After flipping the switch RCU becomes available instead, and
> the extra reservations can be dropped again. :-)
>
> I've done at least basic testing on all the drivers I've converted
> at some point, but more testing is definitely welcomed!

I'm currently on vacation for the next couple of weeks, so I can't test
or review but otherwise

Acked-by: Thomas Hellstrom <[email protected]>

2014-07-22 04:05:39

by Dave Airlie

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
> Signed-off-by: Maarten Lankhorst <[email protected]>
> ---
> drivers/gpu/drm/radeon/radeon.h | 15 +-
> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
> 3 files changed, 248 insertions(+), 50 deletions(-)
>

From what I can see this is still suffering from the problem that we
need to find a proper solution to,

My summary of the issues after talking to Jerome and Ben and
re-reading things is:

We really need to work out a better interface into the drivers to be
able to avoid random atomic entrypoints,

I'm sure you have some ideas and I think you really need to
investigate them to move this thing forward,
even if it means some issues with android sync pts.

but neither of the two major drivers seems to want the interface as-is, so
something needs to give

My major question is why we need an atomic callback here at all, what
scenario does it cover?

Surely we can use a workqueue based callback to ask a driver to check
its signalling, is it really
that urgent?

Dave.

2014-07-22 08:43:26

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 22.07.2014 06:05, schrieb Dave Airlie:
> On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
>> Signed-off-by: Maarten Lankhorst <[email protected]>
>> ---
>> drivers/gpu/drm/radeon/radeon.h | 15 +-
>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
>> 3 files changed, 248 insertions(+), 50 deletions(-)
>>
> From what I can see this is still suffering from the problem that we
> need to find a proper solution to,
>
> My summary of the issues after talking to Jerome and Ben and
> re-reading things is:
>
> We really need to work out a better interface into the drivers to be
> able to avoid random atomic entrypoints,

Which is exactly what I criticized from the very beginning. Good
to know that I'm not the only one thinking that this isn't such a good idea.

> I'm sure you have some ideas and I think you really need to
> investigate them to move this thing forward,
> even it if means some issues with android sync pts.

Actually I think that TTM's fence interface already gave quite a good
hint of how it might look. I can only guess that this won't fit with
the Android stuff; otherwise I can't see a good reason why we didn't
stick with that.

> but none of the two major drivers seem to want the interface as-is so
> something needs to give
>
> My major question is why we need an atomic callback here at all, what
> scenario does it cover?

Agree totally. As far as I can see all current uses of the interface are
of the kind of waiting for a fence to signal.

No need for any callback from one driver into another, especially not in
atomic context. If a driver needs such functionality it should just
start up a kernel thread and do its waiting there.

This obviously shouldn't be an obstacle for pure hardware
implementations where one driver signals a semaphore another driver is
waiting for, or a high signal on an interrupt line directly wired
between two chips. And I think this is a completely different topic and
not necessarily part of the common fence interface we should currently
focus on.

Christian.

> Surely we can use a workqueue based callback to ask a driver to check
> its signalling, is it really
> that urgent?
>
> Dave.

2014-07-22 11:46:03

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
> Am 22.07.2014 06:05, schrieb Dave Airlie:
> >On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
> >>Signed-off-by: Maarten Lankhorst <[email protected]>
> >>---
> >> drivers/gpu/drm/radeon/radeon.h | 15 +-
> >> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
> >> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
> >> 3 files changed, 248 insertions(+), 50 deletions(-)
> >>
> > From what I can see this is still suffering from the problem that we
> >need to find a proper solution to,
> >
> >My summary of the issues after talking to Jerome and Ben and
> >re-reading things is:
> >
> >We really need to work out a better interface into the drivers to be
> >able to avoid random atomic entrypoints,
>
> Which is exactly what I criticized from the very first beginning. Good to
> know that I'm not the only one thinking that this isn't such a good idea.

I guess I've lost context a bit, but which atomic entry point are we
talking about? Afaics the only one that's mandatory is the
fence->signaled callback to check whether a fence really has been
signalled. It's used internally by the fence code to avoid spurious
wakeups. Afaik that should be doable already on any hardware. If that's
not the case then we can always track the signalled state in software and
double-check in a worker thread before updating the sw state. And wrap
this all up into a special fence class if there's more than one driver
needing this.
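
A rough sketch of that software-tracked fallback (all driver-side names are
made up for illustration; struct fence and fence_signal are the real
interface):

#include <linux/fence.h>
#include <linux/workqueue.h>

struct sw_fence {
	struct fence base;
	bool hw_completed;		/* written only by the poll worker */
	struct delayed_work poll;
};

/* hypothetical hardware check, may sleep and read registers */
bool my_hw_fence_completed(struct sw_fence *sf);

static bool sw_fence_signaled(struct fence *f)
{
	struct sw_fence *sf = container_of(f, struct sw_fence, base);

	/* cheap and safe from atomic context: only reads the sw state */
	return ACCESS_ONCE(sf->hw_completed);
}

static void sw_fence_poll(struct work_struct *work)
{
	struct sw_fence *sf = container_of(to_delayed_work(work),
					   struct sw_fence, poll);

	if (my_hw_fence_completed(sf)) {
		ACCESS_ONCE(sf->hw_completed) = true;
		fence_signal(&sf->base);
	} else {
		schedule_delayed_work(&sf->poll, msecs_to_jiffies(1));
	}
}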

There is nothing else that forces callbacks from atomic contexts upon you.
You can use them if you see it fit, but really if it doesn't suit your
driver you can just ignore that part and do process based waits
everywhere.

> >I'm sure you have some ideas and I think you really need to
> >investigate them to move this thing forward,
> >even it if means some issues with android sync pts.
>
> Actually I think that TTMs fence interface already gave quite a good hint
> how it might look like. I can only guess that this won't fit with the
> Android stuff, otherwise I can't see a good reason why we didn't stick with
> that.

Well the current plan for i915<->radeon sync from Maarten is to use these
atomic callbacks on the i915 side. So android didn't figure into this at
all. Actually with android the entire implementation is kinda the
platform's problem; the generic parts just give you a userspace interface
and some means to stack up fences.

> >but none of the two major drivers seem to want the interface as-is so
> >something needs to give
> >
> >My major question is why we need an atomic callback here at all, what
> >scenario does it cover?
>
> Agree totally. As far as I can see all current uses of the interface are of
> the kind of waiting for a fence to signal.
>
> No need for any callback from one driver into another, especially not in
> atomic context. If a driver needs such a functionality it should just start
> up a kernel thread and do it's waiting there.
>
> This obviously shouldn't be an obstacle for pure hardware implementations
> where one driver signals a semaphore another driver is waiting for, or a
> high signal on an interrupt line directly wired between two chips. And I
> think this is a completely different topic and not necessarily part of the
> common fence interface we should currently focus on.

It's for mixed hw/sw stuff where we want to poke the hw from the irq
context (if possible) since someone forgot the wire. At least on the i915
side it boils down to one mmio write, and it's fairly pointless to launch
a thread for that.

So I haven't dug into ttm details but from the i915 side the current stuff
and atomic semantics make sense. Maybe we just need to wrap a bit more
insulation around ttm-based drivers.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 11:52:10

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

Hey,

op 22-07-14 06:05, Dave Airlie schreef:
> On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
>> Signed-off-by: Maarten Lankhorst <[email protected]>
>> ---
>> drivers/gpu/drm/radeon/radeon.h | 15 +-
>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
>> 3 files changed, 248 insertions(+), 50 deletions(-)
>>
> From what I can see this is still suffering from the problem that we
> need to find a proper solution to,
>
> My summary of the issues after talking to Jerome and Ben and
> re-reading things is:
>
> We really need to work out a better interface into the drivers to be
> able to avoid random atomic entrypoints,
> I'm sure you have some ideas and I think you really need to
> investigate them to move this thing forward,
> even it if means some issues with android sync pts.
>
> but none of the two major drivers seem to want the interface as-is so
> something needs to give
wait_queue_t (which radeon uses for fence_queue) uses atomic entrypoints too, the most common
one being autoremove_wake_function, which wakes up the thread it was initialized from, and removes
itself from the wait_queue_t list, in atomic fashion. It's used by __wait_event_interruptible_locked;
if something internally wanted to add some arbitrary callback it could already happen...
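
For reference, autoremove_wake_function is roughly this in current kernels;
it runs under the wait queue's spinlock, i.e. in atomic context, and a custom
callback like radeon_fence_check_signaled above follows the same pattern:

int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
	/* wake the task that queued this entry... */
	int ret = default_wake_function(wait, mode, sync, key);

	/* ...and unlink the entry so it only fires once */
	if (ret)
		list_del_init(&wait->task_list);
	return ret;
}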

> My major question is why we need an atomic callback here at all, what
> scenario does it cover?
An atomic callback could do something like schedule_work(&work) (which nouveau_fence_work already does right now!).
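
A minimal sketch of that pattern (hypothetical names, loosely modelled on
nouveau_fence_work; struct fence_cb and fence_add_callback are the real
interface):

#include <linux/fence.h>
#include <linux/workqueue.h>
#include <linux/slab.h>

struct deferred_fence_work {
	struct fence_cb cb;
	struct work_struct work;
	void (*func)(void *data);	/* the real work, may sleep */
	void *data;
};

static void deferred_fence_work_handler(struct work_struct *work)
{
	struct deferred_fence_work *dfw =
		container_of(work, struct deferred_fence_work, work);

	dfw->func(dfw->data);		/* runs in process context */
	kfree(dfw);
}

static void deferred_fence_cb(struct fence *fence, struct fence_cb *cb)
{
	struct deferred_fence_work *dfw =
		container_of(cb, struct deferred_fence_work, cb);

	/* may be called from atomic context; just punt to a workqueue */
	schedule_work(&dfw->work);
}

It would be set up with INIT_WORK(&dfw->work, deferred_fence_work_handler) and
registered via fence_add_callback(fence, &dfw->cb, deferred_fence_cb).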

I've also added some more experimental things in my unsubmitted branch, in a codepath that's taken when synchronization is used with multiple GPUs:

Nouveau: I write the new seqno to the GART fence, which I added a GPU wait for using SEMAPHORE_TRIGGER.ACQUIRE_GE.
radeon: I write to a memory location to unblock the execution ring, this will probably be replaced by a call to the GPU scheduler.
i915: write to the EXCC (condition code) register to unblock the ring operation when it's waiting for the condition code.

But I want to emphasize that this is a hack, and driver maintainers will probably NACK it. I think I will only submit the one for nouveau, where it is sane because that hardware schedules contexts itself.
Even so that part is not final and will probably go through a few iterations before submission.


> Surely we can use a workqueue based callback to ask a driver to check
> its signalling, is it really
> that urgent?
Nothing prevents a driver from using that approach, even with those changes.

Driver maintainers can still NACK the use of fence_add_callback if they want to,
or choose not to export fences to outside the driver. As long as fences are
not exported, nothing will change for them compared to the current situation.

~Maarten

2014-07-22 11:52:32

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
> > Am 22.07.2014 06:05, schrieb Dave Airlie:
> > >On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
> > >>Signed-off-by: Maarten Lankhorst <[email protected]>
> > >>---
> > >> drivers/gpu/drm/radeon/radeon.h | 15 +-
> > >> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
> > >> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
> > >> 3 files changed, 248 insertions(+), 50 deletions(-)
> > >>
> > > From what I can see this is still suffering from the problem that we
> > >need to find a proper solution to,
> > >
> > >My summary of the issues after talking to Jerome and Ben and
> > >re-reading things is:
> > >
> > >We really need to work out a better interface into the drivers to be
> > >able to avoid random atomic entrypoints,
> >
> > Which is exactly what I criticized from the very first beginning. Good to
> > know that I'm not the only one thinking that this isn't such a good idea.
>
> I guess I've lost context a bit, but which atomic entry point are we
> talking about? Afaics the only one that's mandatory is the is
> fence->signaled callback to check whether a fence really has been
> signalled. It's used internally by the fence code to avoid spurious
> wakeups. Afaik that should be doable already on any hardware. If that's
> not the case then we can always track the signalled state in software and
> double-check in a worker thread before updating the sw state. And wrap
> this all up into a special fence class if there's more than one driver
> needing this.
>
> There is nothing else that forces callbacks from atomic contexts upon you.
> You can use them if you see it fit, but really if it doesn't suit your
> driver you can just ignore that part and do process based waits
> everywhere.

Aside: The fence-process-callback has already been implemented by nouveau
with the struct fence_work in nouveau_fence.c. Would make loads of sense
to move that code into the driver core and adapt it to Maarten's struct
fence once this has all landed.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 11:57:29

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
> > Am 22.07.2014 06:05, schrieb Dave Airlie:
> > >On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
> > >>Signed-off-by: Maarten Lankhorst <[email protected]>
> > >>---
> > >> drivers/gpu/drm/radeon/radeon.h | 15 +-
> > >> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
> > >> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
> > >> 3 files changed, 248 insertions(+), 50 deletions(-)
> > >>
> > > From what I can see this is still suffering from the problem that we
> > >need to find a proper solution to,
> > >
> > >My summary of the issues after talking to Jerome and Ben and
> > >re-reading things is:
> > >
> > >We really need to work out a better interface into the drivers to be
> > >able to avoid random atomic entrypoints,
> >
> > Which is exactly what I criticized from the very first beginning. Good to
> > know that I'm not the only one thinking that this isn't such a good idea.
>
> I guess I've lost context a bit, but which atomic entry point are we
> talking about? Afaics the only one that's mandatory is the is
> fence->signaled callback to check whether a fence really has been
> signalled. It's used internally by the fence code to avoid spurious
> wakeups. Afaik that should be doable already on any hardware. If that's
> not the case then we can always track the signalled state in software and
> double-check in a worker thread before updating the sw state. And wrap
> this all up into a special fence class if there's more than one driver
> needing this.

One thing I've forgotten: The i915 scheduler that's floating around runs
its bottom half from irq context. So I really want to be able to check
fence state from irq context and I also want to make it possible
(possible! not mandatory) to register callbacks which are run from any
context asap after the fence is signalled.

If the radeon hw/driver doesn't want to cope with that complexity we can
fully insulate it with the sw tracked fence state if you don't like
Maarten's radeon implementation. But forcing everyone to forgo this just
because you don't like it and don't want to use it in radeon doesn't sound
right.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 12:20:14

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 22.07.2014 13:57, schrieb Daniel Vetter:
> On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
>> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
>>> Am 22.07.2014 06:05, schrieb Dave Airlie:
>>>> On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
>>>>> Signed-off-by: Maarten Lankhorst <[email protected]>
>>>>> ---
>>>>> drivers/gpu/drm/radeon/radeon.h | 15 +-
>>>>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
>>>>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
>>>>> 3 files changed, 248 insertions(+), 50 deletions(-)
>>>>>
>>>> From what I can see this is still suffering from the problem that we
>>>> need to find a proper solution to,
>>>>
>>>> My summary of the issues after talking to Jerome and Ben and
>>>> re-reading things is:
>>>>
>>>> We really need to work out a better interface into the drivers to be
>>>> able to avoid random atomic entrypoints,
>>> Which is exactly what I criticized from the very first beginning. Good to
>>> know that I'm not the only one thinking that this isn't such a good idea.
>> I guess I've lost context a bit, but which atomic entry point are we
>> talking about? Afaics the only one that's mandatory is the is
>> fence->signaled callback to check whether a fence really has been
>> signalled. It's used internally by the fence code to avoid spurious
>> wakeups. Afaik that should be doable already on any hardware. If that's
>> not the case then we can always track the signalled state in software and
>> double-check in a worker thread before updating the sw state. And wrap
>> this all up into a special fence class if there's more than one driver
>> needing this.
> One thing I've forgotten: The i915 scheduler that's floating around runs
> its bottom half from irq context. So I really want to be able to check
> fence state from irq context and I also want to make it possible
> (possible! not mandatory) to register callbacks which are run from any
> context asap after the fence is signalled.

NAK, that's just the bad design I've talked about. Checking fence state
inside the same driver from interrupt context is OK, because it's the
driver's interrupt that we are talking about here.

Checking fence status from another driver's interrupt context is what
really concerns me here, because your driver doesn't have the slightest
idea whether the called driver is really capable of checking the fence right now.

> If the radeon hw/driver doesn't want to cope with that complexity we can
> fully insolate it with the sw tracked fence state if you don't like
> Maarten's radeon implementation. But forcing everyone to forgoe this just
> because you don't like it and don't want to use it in radeon doesn't sound
> right.

While it's clearly a hack, Maarten's solution for radeon would indeed
work, but that's not really the point here.

It's just that I think leaking interrupt context from one driver into
another driver is a really, really bad idea from a design point of view.

And calling into a driver while in atomic context to check for a fence
being signaled doesn't sound like a good idea either, because that limits
way too much what the called driver can do to check the status of a
fence.

Christian.

2014-07-22 13:26:48

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 02:19:57PM +0200, Christian König wrote:
> Am 22.07.2014 13:57, schrieb Daniel Vetter:
> >On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
> >>On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
> >>>Am 22.07.2014 06:05, schrieb Dave Airlie:
> >>>>On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
> >>>>>Signed-off-by: Maarten Lankhorst <[email protected]>
> >>>>>---
> >>>>> drivers/gpu/drm/radeon/radeon.h | 15 +-
> >>>>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
> >>>>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
> >>>>> 3 files changed, 248 insertions(+), 50 deletions(-)
> >>>>>
> >>>> From what I can see this is still suffering from the problem that we
> >>>>need to find a proper solution to,
> >>>>
> >>>>My summary of the issues after talking to Jerome and Ben and
> >>>>re-reading things is:
> >>>>
> >>>>We really need to work out a better interface into the drivers to be
> >>>>able to avoid random atomic entrypoints,
> >>>Which is exactly what I criticized from the very first beginning. Good to
> >>>know that I'm not the only one thinking that this isn't such a good idea.
> >>I guess I've lost context a bit, but which atomic entry point are we
> >>talking about? Afaics the only one that's mandatory is the is
> >>fence->signaled callback to check whether a fence really has been
> >>signalled. It's used internally by the fence code to avoid spurious
> >>wakeups. Afaik that should be doable already on any hardware. If that's
> >>not the case then we can always track the signalled state in software and
> >>double-check in a worker thread before updating the sw state. And wrap
> >>this all up into a special fence class if there's more than one driver
> >>needing this.
> >One thing I've forgotten: The i915 scheduler that's floating around runs
> >its bottom half from irq context. So I really want to be able to check
> >fence state from irq context and I also want to make it possible
> >(possible! not mandatory) to register callbacks which are run from any
> >context asap after the fence is signalled.
>
> NAK, that's just the bad design I've talked about. Checking fence state
> inside the same driver from interrupt context is OK, because it's the
> drivers interrupt that we are talking about here.
>
> Checking fence status from another drivers interrupt context is what really
> concerns me here, cause your driver doesn't have the slightest idea if the
> called driver is really capable of checking the fence right now.

I guess my mail hasn't been clear then. If you don't like it we could add
a bit of glue to insulate the madness and bad design i915 might do from
radeon. That imo doesn't invalidate the overall fence interfaces.

So what about the following:
- fence->enable_signaling is restricted to being called from process
context. We don't use anything different yet, so this would boil down to adding a
WARN_ON(in_interrupt()) or so to fence_enable_sw_signaling (sketched below).

- Make fence->signaled optional (already the case) and don't implement it
in radeon (i.e. reduce this patch here). Only downside is that radeon
needs to correctly (i.e. without races or so) call fence_signal. And the
cross-driver synchronization might be a bit less efficient. Note that
you can call fence_signal from wherever you want to, so hopefully that
doesn't restrict your implementation.

End result: No one calls into radeon from interrupt context, and this is
guaranteed.
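
For the first point that would roughly mean the following (the body below is
today's fence_enable_sw_signaling with only the WARN_ON added):

void fence_enable_sw_signaling(struct fence *fence)
{
	unsigned long flags;

	/* new: enabling signaling is a process-context-only operation */
	WARN_ON(in_interrupt());

	if (!test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags) &&
	    !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
		spin_lock_irqsave(fence->lock, flags);
		if (!fence->ops->enable_signaling(fence))
			fence_signal_locked(fence);
		spin_unlock_irqrestore(fence->lock, flags);
	}
}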

Would that be something you can agree to?

Like I've said I think restricting the insanity other people are willing
to live with just because you don't like it isn't right. But it is
certainly right for you to insist on not being forced into any such
design. I think the above would achieve this.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 13:45:51

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 22.07.2014 15:26, schrieb Daniel Vetter:
> On Tue, Jul 22, 2014 at 02:19:57PM +0200, Christian König wrote:
>> Am 22.07.2014 13:57, schrieb Daniel Vetter:
>>> On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
>>>> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
>>>>> Am 22.07.2014 06:05, schrieb Dave Airlie:
>>>>>> On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
>>>>>>> Signed-off-by: Maarten Lankhorst <[email protected]>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/radeon/radeon.h | 15 +-
>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
>>>>>>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
>>>>>>> 3 files changed, 248 insertions(+), 50 deletions(-)
>>>>>>>
>>>>>> From what I can see this is still suffering from the problem that we
>>>>>> need to find a proper solution to,
>>>>>>
>>>>>> My summary of the issues after talking to Jerome and Ben and
>>>>>> re-reading things is:
>>>>>>
>>>>>> We really need to work out a better interface into the drivers to be
>>>>>> able to avoid random atomic entrypoints,
>>>>> Which is exactly what I criticized from the very first beginning. Good to
>>>>> know that I'm not the only one thinking that this isn't such a good idea.
>>>> I guess I've lost context a bit, but which atomic entry point are we
>>>> talking about? Afaics the only one that's mandatory is the is
>>>> fence->signaled callback to check whether a fence really has been
>>>> signalled. It's used internally by the fence code to avoid spurious
>>>> wakeups. Afaik that should be doable already on any hardware. If that's
>>>> not the case then we can always track the signalled state in software and
>>>> double-check in a worker thread before updating the sw state. And wrap
>>>> this all up into a special fence class if there's more than one driver
>>>> needing this.
>>> One thing I've forgotten: The i915 scheduler that's floating around runs
>>> its bottom half from irq context. So I really want to be able to check
>>> fence state from irq context and I also want to make it possible
>>> (possible! not mandatory) to register callbacks which are run from any
>>> context asap after the fence is signalled.
>> NAK, that's just the bad design I've talked about. Checking fence state
>> inside the same driver from interrupt context is OK, because it's the
>> drivers interrupt that we are talking about here.
>>
>> Checking fence status from another drivers interrupt context is what really
>> concerns me here, cause your driver doesn't have the slightest idea if the
>> called driver is really capable of checking the fence right now.
> I guess my mail hasn't been clear then. If you don't like it we could add
> a bit of glue to insulate the madness and bad design i915 might do from
> radeon. That imo doesn't invalidate the overall fence interfaces.
>
> So what about the following:
> - fence->enabling_signaling is restricted to be called from process
> context. We don't use any different yet, so would boild down to adding a
> WARN_ON(in_interrupt) or so to fence_enable_sw_signalling.
>
> - Make fence->signaled optional (already the case) and don't implement it
> in readon (i.e. reduce this patch here). Only downside is that radeon
> needs to correctly (i.e. without races or so) call fence_signal. And the
> cross-driver synchronization might be a bit less efficient. Note that
> you can call fence_signal from wherever you want to, so hopefully that
> doesn't restrict your implementation.
>
> End result: No one calls into radeon from interrupt context, and this is
> guaranteed.
>
> Would that be something you can agree to?

No, the whole enable_signaling stuff should go away. No callback from
the driver into the fence code, only the other way around.

fence->signaled as well as fence->wait should become mandatory and only
be called from process context without holding any locks, neither atomic
ones nor any mutex/semaphore (rcu might be ok).

> Like I've said I think restricting the insanity other people are willing
> to live with just because you don't like it isn't right. But it is
> certainly right for you to insist on not being forced into any such
> design. I think the above would achieve this.

I don't think so. If it's just me I would say that I'm just too cautious
and the idea is still safe to apply to the whole kernel.

But since Dave, Jerome and Ben seem to have similar concerns I think we
need to agree on a minimal and safe interface for all drivers.

Christian.

2014-07-22 14:05:47

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 22-07-14 14:19, Christian König schreef:
> Am 22.07.2014 13:57, schrieb Daniel Vetter:
>> On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
>>> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
>>>> Am 22.07.2014 06:05, schrieb Dave Airlie:
>>>>> On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
>>>>>> Signed-off-by: Maarten Lankhorst <[email protected]>
>>>>>> ---
>>>>>> drivers/gpu/drm/radeon/radeon.h | 15 +-
>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
>>>>>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
>>>>>> 3 files changed, 248 insertions(+), 50 deletions(-)
>>>>>>
>>>>> From what I can see this is still suffering from the problem that we
>>>>> need to find a proper solution to,
>>>>>
>>>>> My summary of the issues after talking to Jerome and Ben and
>>>>> re-reading things is:
>>>>>
>>>>> We really need to work out a better interface into the drivers to be
>>>>> able to avoid random atomic entrypoints,
>>>> Which is exactly what I criticized from the very first beginning. Good to
>>>> know that I'm not the only one thinking that this isn't such a good idea.
>>> I guess I've lost context a bit, but which atomic entry point are we
>>> talking about? Afaics the only one that's mandatory is the is
>>> fence->signaled callback to check whether a fence really has been
>>> signalled. It's used internally by the fence code to avoid spurious
>>> wakeups. Afaik that should be doable already on any hardware. If that's
>>> not the case then we can always track the signalled state in software and
>>> double-check in a worker thread before updating the sw state. And wrap
>>> this all up into a special fence class if there's more than one driver
>>> needing this.
>> One thing I've forgotten: The i915 scheduler that's floating around runs
>> its bottom half from irq context. So I really want to be able to check
>> fence state from irq context and I also want to make it possible
>> (possible! not mandatory) to register callbacks which are run from any
>> context asap after the fence is signalled.
>
> NAK, that's just the bad design I've talked about. Checking fence state inside the same driver from interrupt context is OK, because it's the drivers interrupt that we are talking about here.
>
> Checking fence status from another drivers interrupt context is what really concerns me here, cause your driver doesn't have the slightest idea if the called driver is really capable of checking the fence right now.
I think there is a use case for having atomic context allowed with fence_is_signaled, but I don't think there is one for interrupt context, so it's good with me if fence_is_signaled cannot be called in interrupt context, or with irqs disabled.

fence_enable_sw_signaling disables interrupts because it holds fence->lock, so in theory it could be called from any context including interrupts. But no sane driver author does that, or at least I hope not...

Would a sanity check like the one below be enough to allay your fears?
8<-------

diff --git a/include/linux/fence.h b/include/linux/fence.h
index d174585b874b..c1a4519ba2f5 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -143,6 +143,7 @@ struct fence_cb {
* the second time will be a noop since it was already signaled.
*
* Notes on signaled:
+ * Called with interrupts enabled, and never from interrupt context.
* May set fence->status if returning true.
*
* Notes on wait:
@@ -268,15 +269,29 @@ fence_is_signaled_locked(struct fence *fence)
static inline bool
fence_is_signaled(struct fence *fence)
{
+ bool ret;
+
if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return true;

- if (fence->ops->signaled && fence->ops->signaled(fence)) {
+ if (!fence->ops->signaled)
+ return false;
+
+ if (config_enabled(CONFIG_PROVE_LOCKING))
+ WARN_ON(in_interrupt() || irqs_disabled());
+
+ if (config_enabled(CONFIG_DEBUG_ATOMIC_SLEEP))
+ preempt_disable();
+
+ ret = fence->ops->signaled(fence);
+
+ if (config_enabled(CONFIG_DEBUG_ATOMIC_SLEEP))
+ preempt_enable();
+
+ if (ret)
fence_signal(fence);
- return true;
- }

- return false;
+ return ret;
}

/**
8<--------

>> If the radeon hw/driver doesn't want to cope with that complexity we can
>> fully insolate it with the sw tracked fence state if you don't like
>> Maarten's radeon implementation. But forcing everyone to forgoe this just
>> because you don't like it and don't want to use it in radeon doesn't sound
>> right.
>
> While it's clearly a hack Maarten's solution for radeon would indeed work, but that's not really the point here.
>
> It's just that I think leaking interrupt context from one driver into another driver is just a really really bad idea from a design point of view.
>
> And calling into a driver while in atomic context to check for a fence being signaled doesn't sounds like a good idea either, cause that limits way to much what the called driver can do for checking the status of a fence.
No, you really shouldn't be doing much in the check anyway; it's meant to be a lightweight check. If you're not ready yet because of a lockup, simply return not signaled yet.

~Maarten

2014-07-22 14:24:28

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet.
It's not only the lockup case from radeon I have in mind here. For
userspace queues it might be necessary to call copy_from_user to figure
out if a fence is signaled or not.

Returning false all the time is probably not a good idea either.

Christian.

2014-07-22 14:27:10

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 22-07-14 16:24, Christian König schreef:
>> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet.
> It's not only the lockup case from radeon I have in mind here. For userspace queues it might be necessary to call copy_from_user to figure out if a fence is signaled or not.
>
> Returning false all the time is probably not a good idea either.
Having userspace implement a fence sounds like an awful idea; why would you want to do that?

A fence could be exported to userspace, but that would only mean userspace can wait for it to be signaled with an interface like poll...

~Maarten

2014-07-22 14:39:59

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 22.07.2014 16:27, schrieb Maarten Lankhorst:
> op 22-07-14 16:24, Christian König schreef:
>>> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet.
>> It's not only the lockup case from radeon I have in mind here. For userspace queues it might be necessary to call copy_from_user to figure out if a fence is signaled or not.
>>
>> Returning false all the time is probably not a good idea either.
> Having userspace implement a fence sounds like an awful idea, why would you want to do that?

Marketing moves in mysterious ways. Don't ask me, but that's the direction
it's currently moving with userspace queues and IOMMU etc...

> A fence could be exported to userspace, but that would only mean it can wait for it to be signaled with an interface like poll..

Yeah agree totally, but the point for the fence interface is that I
can't predict what's necessary to check if a fence is signaled or not on
future hardware.

For the currently available radeon hardware I can say that reading a
value from a kernel page is pretty much all you need. But on older
hardware that meant reading from a register, which can become very tricky
if the hardware is powered off or currently inside a reset cycle.
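
On current hardware that lightweight check is essentially just a seqno
compare against a value the GPU writes into a kernel page, roughly
(illustrative names only):

static bool example_fence_signaled(struct fence *f)
{
	struct example_fence *fence = container_of(f, struct example_fence, base);

	/* last_completed_seq lives in a CPU-visible page the GPU writes to */
	return atomic64_read(&fence->drv->last_completed_seq) >= fence->seq;
}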

Because of this I would avoid any such interface if it's not absolutely
required by some use case, and currently I don't see that requirement,
because the functionality you want to achieve could be implemented
without it.

Christian.

>
> ~Maarten
>

2014-07-22 14:44:24

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 22-07-14 15:45, Christian König schreef:
> Am 22.07.2014 15:26, schrieb Daniel Vetter:
>> On Tue, Jul 22, 2014 at 02:19:57PM +0200, Christian König wrote:
>>> Am 22.07.2014 13:57, schrieb Daniel Vetter:
>>>> On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
>>>>> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
>>>>>> Am 22.07.2014 06:05, schrieb Dave Airlie:
>>>>>>> On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
>>>>>>>> Signed-off-by: Maarten Lankhorst <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/gpu/drm/radeon/radeon.h | 15 +-
>>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
>>>>>>>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
>>>>>>>> 3 files changed, 248 insertions(+), 50 deletions(-)
>>>>>>>>
>>>>>>> From what I can see this is still suffering from the problem that we
>>>>>>> need to find a proper solution to,
>>>>>>>
>>>>>>> My summary of the issues after talking to Jerome and Ben and
>>>>>>> re-reading things is:
>>>>>>>
>>>>>>> We really need to work out a better interface into the drivers to be
>>>>>>> able to avoid random atomic entrypoints,
>>>>>> Which is exactly what I criticized from the very first beginning. Good to
>>>>>> know that I'm not the only one thinking that this isn't such a good idea.
>>>>> I guess I've lost context a bit, but which atomic entry point are we
>>>>> talking about? Afaics the only one that's mandatory is the is
>>>>> fence->signaled callback to check whether a fence really has been
>>>>> signalled. It's used internally by the fence code to avoid spurious
>>>>> wakeups. Afaik that should be doable already on any hardware. If that's
>>>>> not the case then we can always track the signalled state in software and
>>>>> double-check in a worker thread before updating the sw state. And wrap
>>>>> this all up into a special fence class if there's more than one driver
>>>>> needing this.
>>>> One thing I've forgotten: The i915 scheduler that's floating around runs
>>>> its bottom half from irq context. So I really want to be able to check
>>>> fence state from irq context and I also want to make it possible
>>>> (possible! not mandatory) to register callbacks which are run from any
>>>> context asap after the fence is signalled.
>>> NAK, that's just the bad design I've talked about. Checking fence state
>>> inside the same driver from interrupt context is OK, because it's the
>>> drivers interrupt that we are talking about here.
>>>
>>> Checking fence status from another drivers interrupt context is what really
>>> concerns me here, cause your driver doesn't have the slightest idea if the
>>> called driver is really capable of checking the fence right now.
>> I guess my mail hasn't been clear then. If you don't like it we could add
>> a bit of glue to insulate the madness and bad design i915 might do from
>> radeon. That imo doesn't invalidate the overall fence interfaces.
>>
>> So what about the following:
>> - fence->enabling_signaling is restricted to be called from process
>> context. We don't use any different yet, so would boild down to adding a
>> WARN_ON(in_interrupt) or so to fence_enable_sw_signalling.
>>
>> - Make fence->signaled optional (already the case) and don't implement it
>> in readon (i.e. reduce this patch here). Only downside is that radeon
>> needs to correctly (i.e. without races or so) call fence_signal. And the
>> cross-driver synchronization might be a bit less efficient. Note that
>> you can call fence_signal from wherever you want to, so hopefully that
>> doesn't restrict your implementation.
>>
>> End result: No one calls into radeon from interrupt context, and this is
>> guaranteed.
>>
>> Would that be something you can agree to?
>
> No, the whole enable_signaling stuff should go away. No callback from the driver into the fence code, only the other way around.
>
> fence->signaled as well as fence->wait should become mandatory and only called from process context without holding any locks, neither atomic nor any mutex/semaphore (rcu might be ok).
fence->wait is mandatory, and already requires sleeping.

If .signaled is not implemented there is no guarantee the fence will be
signaled any time soon; this is also why enable_signaling exists, to
allow the driver to flush. I get that it doesn't apply to radeon and nouveau,
but for other drivers, like vmwgfx, that could be necessary.
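
A rough sketch of what such a flush-on-demand enable_signaling could look
like (hypothetical driver, illustrative names only):

static bool example_enable_signaling(struct fence *f)
{
	struct example_fence *fence = container_of(f, struct example_fence, base);

	if (example_seq_passed(fence))
		return false;	/* already done, fence core treats it as signaled */

	/*
	 * The hardware only raises a completion interrupt after an explicit
	 * flush, so kick one off now that someone actually waits on the fence.
	 */
	example_hw_flush(fence->dev, fence->seq);
	return true;
}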

Ironically that is also a part of the ttm fence, except it was called flush there.

I would also like to note that ttm_bo_wait is currently also a function that uses is_signaled from atomic context...

For the more complicated locking worries: Lockdep is your friend, use PROVE_LOCKING and find bugs before they trigger. ;-)

>> Like I've said I think restricting the insanity other people are willing
>> to live with just because you don't like it isn't right. But it is
>> certainly right for you to insist on not being forced into any such
>> design. I think the above would achieve this.
>
> I don't think so. If it's just me I would say that I'm just to cautious and the idea is still save to apply to the whole kernel.
>
> But since Dave, Jerome and Ben seems to have similar concerns I think we need to agree to a minimum and save interface for all drivers.
>
> Christian.
>

2014-07-22 14:47:22

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 22-07-14 16:39, Christian K?nig schreef:
> Am 22.07.2014 16:27, schrieb Maarten Lankhorst:
>> op 22-07-14 16:24, Christian König schreef:
>>>> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet.
>>> It's not only the lockup case from radeon I have in mind here. For userspace queues it might be necessary to call copy_from_user to figure out if a fence is signaled or not.
>>>
>>> Returning false all the time is probably not a good idea either.
>> Having userspace implement a fence sounds like an awful idea, why would you want to do that?
>
> Marketing moves in mysterious ways. Don't ask me, but that the direction it currently moves with userspace queues and IOMMU etc...
>
>> A fence could be exported to userspace, but that would only mean it can wait for it to be signaled with an interface like poll..
>
> Yeah agree totally, but the point for the fence interface is that I can't predict what's necessary to check if a fence is signaled or not on future hardware.
>
> For the currently available radeon hardware I can say that reading a value from a kernel page is pretty much all you need. But for older hardware that was reading from a register which might become very tricky if the hardware is power off or currently inside a reset cycle.
>
> Because off this I would avoid any such interface if it's not absolutely required by some use case, and currently I don't see this requirement because the functionality you want to archive could be implemented without this.
Oh? I've already done that in radeon_fence, there is no way enable_signaling will fiddle with hardware registers during a reset cycle.
I've also made sure that __radeon_fence_is_signaled grabs exclusive_lock in read mode before touching any hw state.

Older hardware also doesn't implement optimus, so I think power off is not much of a worry for them, if you could point me at the checking done for that I could make sure that this is the case.

~Maarten

2014-07-22 15:02:42

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 22.07.2014 16:44, schrieb Maarten Lankhorst:
> op 22-07-14 15:45, Christian König schreef:
>> Am 22.07.2014 15:26, schrieb Daniel Vetter:
>>> On Tue, Jul 22, 2014 at 02:19:57PM +0200, Christian König wrote:
>>>> Am 22.07.2014 13:57, schrieb Daniel Vetter:
>>>>> On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
>>>>>> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
>>>>>>> Am 22.07.2014 06:05, schrieb Dave Airlie:
>>>>>>>> On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
>>>>>>>>> Signed-off-by: Maarten Lankhorst <[email protected]>
>>>>>>>>> ---
>>>>>>>>> drivers/gpu/drm/radeon/radeon.h | 15 +-
>>>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
>>>>>>>>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
>>>>>>>>> 3 files changed, 248 insertions(+), 50 deletions(-)
>>>>>>>>>
>>>>>>>> From what I can see this is still suffering from the problem that we
>>>>>>>> need to find a proper solution to,
>>>>>>>>
>>>>>>>> My summary of the issues after talking to Jerome and Ben and
>>>>>>>> re-reading things is:
>>>>>>>>
>>>>>>>> We really need to work out a better interface into the drivers to be
>>>>>>>> able to avoid random atomic entrypoints,
>>>>>>> Which is exactly what I criticized from the very first beginning. Good to
>>>>>>> know that I'm not the only one thinking that this isn't such a good idea.
>>>>>> I guess I've lost context a bit, but which atomic entry point are we
>>>>>> talking about? Afaics the only one that's mandatory is the is
>>>>>> fence->signaled callback to check whether a fence really has been
>>>>>> signalled. It's used internally by the fence code to avoid spurious
>>>>>> wakeups. Afaik that should be doable already on any hardware. If that's
>>>>>> not the case then we can always track the signalled state in software and
>>>>>> double-check in a worker thread before updating the sw state. And wrap
>>>>>> this all up into a special fence class if there's more than one driver
>>>>>> needing this.
>>>>> One thing I've forgotten: The i915 scheduler that's floating around runs
>>>>> its bottom half from irq context. So I really want to be able to check
>>>>> fence state from irq context and I also want to make it possible
>>>>> (possible! not mandatory) to register callbacks which are run from any
>>>>> context asap after the fence is signalled.
>>>> NAK, that's just the bad design I've talked about. Checking fence state
>>>> inside the same driver from interrupt context is OK, because it's the
>>>> drivers interrupt that we are talking about here.
>>>>
>>>> Checking fence status from another drivers interrupt context is what really
>>>> concerns me here, cause your driver doesn't have the slightest idea if the
>>>> called driver is really capable of checking the fence right now.
>>> I guess my mail hasn't been clear then. If you don't like it we could add
>>> a bit of glue to insulate the madness and bad design i915 might do from
>>> radeon. That imo doesn't invalidate the overall fence interfaces.
>>>
>>> So what about the following:
>>> - fence->enable_signaling is restricted to be called from process
>>> context. We don't use anything different yet, so it would boil down to adding a
>>> WARN_ON(in_interrupt) or so to fence_enable_sw_signaling.
>>>
>>> - Make fence->signaled optional (already the case) and don't implement it
>>> in radeon (i.e. reduce this patch here). Only downside is that radeon
>>> needs to correctly (i.e. without races or so) call fence_signal. And the
>>> cross-driver synchronization might be a bit less efficient. Note that
>>> you can call fence_signal from wherever you want to, so hopefully that
>>> doesn't restrict your implementation.
>>>
>>> End result: No one calls into radeon from interrupt context, and this is
>>> guaranteed.
>>>
>>> Would that be something you can agree to?
>> No, the whole enable_signaling stuff should go away. No callback from the driver into the fence code, only the other way around.
>>
>> fence->signaled as well as fence->wait should become mandatory and only called from process context without holding any locks, neither atomic nor any mutex/semaphore (rcu might be ok).
> fence->wait is mandatory, and already requires sleeping.
>
> If .signaled is not implemented there is no guarantee the fence will be
> signaled sometime soon, this is also why enable_signaling exists, to
> allow the driver to flush. I get it that it doesn't apply to radeon and nouveau,
> but for other drivers that could be necessary, like vmwgfx.
>
> Ironically that is also a part of the ttm fence, except it was called flush there.

Then call it flush again and make it optional like in TTM.

> I would also like to note that ttm_bo_wait is currently also a function that uses is_signaled from atomic context...

I know, but TTM is only called from inside a single driver, so there are
no inter-driver needs here. We currently even call the internal fence
implementation from interrupt context as well, and on more than one
occasion assume that TTM only uses radeon fences.

Christian.

> For the more complicated locking worries: Lockdep is your friend, use PROVE_LOCKING and find bugs before they trigger. ;-)
>
>>> Like I've said I think restricting the insanity other people are willing
>>> to live with just because you don't like it isn't right. But it is
>>> certainly right for you to insist on not being forced into any such
>>> design. I think the above would achieve this.
>> I don't think so. If it's just me I would say that I'm just to cautious and the idea is still save to apply to the whole kernel.
>>
>> But since Dave, Jerome and Ben seems to have similar concerns I think we need to agree to a minimum and save interface for all drivers.
>>
>> Christian.
>>

2014-07-22 15:17:09

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 22.07.2014 16:47, schrieb Maarten Lankhorst:
> op 22-07-14 16:39, Christian König schreef:
>> Am 22.07.2014 16:27, schrieb Maarten Lankhorst:
>>> op 22-07-14 16:24, Christian König schreef:
>>>>> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet.
>>>> It's not only the lockup case from radeon I have in mind here. For userspace queues it might be necessary to call copy_from_user to figure out if a fence is signaled or not.
>>>>
>>>> Returning false all the time is probably not a good idea either.
>>> Having userspace implement a fence sounds like an awful idea, why would you want to do that?
>> Marketing moves in mysterious ways. Don't ask me, but that the direction it currently moves with userspace queues and IOMMU etc...
>>
>>> A fence could be exported to userspace, but that would only mean it can wait for it to be signaled with an interface like poll..
>> Yeah agree totally, but the point for the fence interface is that I can't predict what's necessary to check if a fence is signaled or not on future hardware.
>>
>> For the currently available radeon hardware I can say that reading a value from a kernel page is pretty much all you need. But for older hardware that was reading from a register which might become very tricky if the hardware is power off or currently inside a reset cycle.
>>
>> Because off this I would avoid any such interface if it's not absolutely required by some use case, and currently I don't see this requirement because the functionality you want to archive could be implemented without this.
> Oh? I've already done that in radeon_fence, there is no way enable_signaling will fiddle with hardware registers during a reset cycle.
> I've also made sure that __radeon_fence_is_signaled grabs exclusive_lock in read mode before touching any hw state.
>
> Older hardware also doesn't implement optimus, so I think power off is not much of a worry for them, if you could point me at the checking done for that I could make sure that this is the case.

I'm not talking about any specific radeon hardware or use case here. As
far as I can see you indeed solved all driver problems with the current
interface design.

The question I'm raising is whether the current interface design needs to be as
complex as it is. And my answer to this is a clear *no*, so why do you
want to stick with this design? I still haven't understood that.

If it's just to support a further feature of direct synchronization in
interrupt context between different drivers then I must clearly say that
this is a NAK, cause you add complexity to the kernel that isn't necessary.

Christian.

>
> ~Maarten
>

2014-07-22 15:17:58

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 3:45 PM, Christian König
<[email protected]> wrote:
>> Would that be something you can agree to?
>
>
> No, the whole enable_signaling stuff should go away. No callback from the
> driver into the fence code, only the other way around.
>
> fence->signaled as well as fence->wait should become mandatory and only
> called from process context without holding any locks, neither atomic nor
> any mutex/semaphore (rcu might be ok).

So for the enable_signaling, that's optional already. It's only for
drivers that don't want to keep interrupts enabled all the time. You
can opt out of that easily.

Wrt holding no locks at all while calling into any fence functions,
that's just not going to work out. The point here is to make different
drivers work together, and we can't rework all the ttm and i915 code to
work locklessly in all cases where they need to wait for someone to
complete rendering. Or at least I don't think that's feasible. So if
you insist that no one might call into radeon code then we simply need
to exclude radeon from participating in any shared fencing. But that's
a bit pointless.

>> Like I've said I think restricting the insanity other people are willing
>> to live with just because you don't like it isn't right. But it is
>> certainly right for you to insist on not being forced into any such
>> design. I think the above would achieve this.
>
>
> I don't think so. If it's just me I would say that I'm just too cautious and
> the idea is still safe to apply to the whole kernel.
>
> But since Dave, Jerome and Ben seem to have similar concerns I think we
> need to agree on a minimal and safe interface for all drivers.

Well I haven't yet seen a proposal that actually works. From an intel
pov I don't care that much since we don't care about desktop prime, so
if radeon/nouveau don't want to do that, meh. Imo the design as-is is
fairly sound, and as simple as it can get given the requirements. I
haven't heard an argument convincing me otherwise, so I guess we
won't have prime support on linux that actually works, ever.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 15:18:50

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Hey,

op 22-07-14 17:02, Christian König schreef:
> Am 22.07.2014 16:44, schrieb Maarten Lankhorst:
>> op 22-07-14 15:45, Christian König schreef:
>>> Am 22.07.2014 15:26, schrieb Daniel Vetter:
>>>> On Tue, Jul 22, 2014 at 02:19:57PM +0200, Christian König wrote:
>>>>> Am 22.07.2014 13:57, schrieb Daniel Vetter:
>>>>>> On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
>>>>>>> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
>>>>>>>> Am 22.07.2014 06:05, schrieb Dave Airlie:
>>>>>>>>> On 9 July 2014 22:29, Maarten Lankhorst <[email protected]> wrote:
>>>>>>>>>> Signed-off-by: Maarten Lankhorst <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> drivers/gpu/drm/radeon/radeon.h | 15 +-
>>>>>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++-
>>>>>>>>>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------
>>>>>>>>>> 3 files changed, 248 insertions(+), 50 deletions(-)
>>>>>>>>>>
>>>>>>>>> From what I can see this is still suffering from the problem that we
>>>>>>>>> need to find a proper solution to,
>>>>>>>>>
>>>>>>>>> My summary of the issues after talking to Jerome and Ben and
>>>>>>>>> re-reading things is:
>>>>>>>>>
>>>>>>>>> We really need to work out a better interface into the drivers to be
>>>>>>>>> able to avoid random atomic entrypoints,
>>>>>>>> Which is exactly what I criticized from the very first beginning. Good to
>>>>>>>> know that I'm not the only one thinking that this isn't such a good idea.
>>>>>>> I guess I've lost context a bit, but which atomic entry point are we
>>>>>>> talking about? Afaics the only one that's mandatory is the is
>>>>>>> fence->signaled callback to check whether a fence really has been
>>>>>>> signalled. It's used internally by the fence code to avoid spurious
>>>>>>> wakeups. Afaik that should be doable already on any hardware. If that's
>>>>>>> not the case then we can always track the signalled state in software and
>>>>>>> double-check in a worker thread before updating the sw state. And wrap
>>>>>>> this all up into a special fence class if there's more than one driver
>>>>>>> needing this.
>>>>>> One thing I've forgotten: The i915 scheduler that's floating around runs
>>>>>> its bottom half from irq context. So I really want to be able to check
>>>>>> fence state from irq context and I also want to make it possible
>>>>>> (possible! not mandatory) to register callbacks which are run from any
>>>>>> context asap after the fence is signalled.
>>>>> NAK, that's just the bad design I've talked about. Checking fence state
>>>>> inside the same driver from interrupt context is OK, because it's the
>>>>> drivers interrupt that we are talking about here.
>>>>>
>>>>> Checking fence status from another drivers interrupt context is what really
>>>>> concerns me here, cause your driver doesn't have the slightest idea if the
>>>>> called driver is really capable of checking the fence right now.
>>>> I guess my mail hasn't been clear then. If you don't like it we could add
>>>> a bit of glue to insulate the madness and bad design i915 might do from
>>>> radeon. That imo doesn't invalidate the overall fence interfaces.
>>>>
>>>> So what about the following:
>>>> - fence->enable_signaling is restricted to be called from process
>>>> context. We don't use anything different yet, so it would boil down to adding a
>>>> WARN_ON(in_interrupt) or so to fence_enable_sw_signaling.
>>>>
>>>> - Make fence->signaled optional (already the case) and don't implement it
>>>> in radeon (i.e. reduce this patch here). Only downside is that radeon
>>>> needs to correctly (i.e. without races or so) call fence_signal. And the
>>>> cross-driver synchronization might be a bit less efficient. Note that
>>>> you can call fence_signal from wherever you want to, so hopefully that
>>>> doesn't restrict your implementation.
>>>>
>>>> End result: No one calls into radeon from interrupt context, and this is
>>>> guaranteed.
>>>>
>>>> Would that be something you can agree to?
>>> No, the whole enable_signaling stuff should go away. No callback from the driver into the fence code, only the other way around.
>>>
>>> fence->signaled as well as fence->wait should become mandatory and only called from process context without holding any locks, neither atomic nor any mutex/semaphore (rcu might be ok).
>> fence->wait is mandatory, and already requires sleeping.
>>
>> If .signaled is not implemented there is no guarantee the fence will be
>> signaled sometime soon, this is also why enable_signaling exists, to
>> allow the driver to flush. I get it that it doesn't apply to radeon and nouveau,
>> but for other drivers that could be necessary, like vmwgfx.
>>
>> Ironically that is also a part of the ttm fence, except it was called flush there.
>
> Then call it flush again and make it optional like in TTM.
You've posted a lot of concerns, but I haven't seen you come up with any scenario that could create a lockup that lockdep wouldn't warn about.

>> I would also like to note that ttm_bo_wait is currently also a function that uses is_signaled from atomic context...
>
> I know, but TTM is only called from inside a single driver, no inter driver needs here. We currently even call the internal fence implementation from interrupt context as well and at more than one occasion assume that TTM only uses radeon fences.
This is no longer true when you start synchronizing with other drivers. The TTM core will see the intel fences and treat them no differently. That's the entire reason for this conversion. It's also needed to remove the need to pin dma-buf when exporting.

~Maarten.

2014-07-22 15:19:58

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 4:39 PM, Christian König
<[email protected]> wrote:
> Am 22.07.2014 16:27, schrieb Maarten Lankhorst:
>
>> op 22-07-14 16:24, Christian König schreef:
>>>>
>>>> No, you really shouldn't be doing much in the check anyway, it's meant
>>>> to be a lightweight check. If you're not ready yet because of a lockup
>>>> simply return not signaled yet.
>>>
>>> It's not only the lockup case from radeon I have in mind here. For
>>> userspace queues it might be necessary to call copy_from_user to figure out
>>> if a fence is signaled or not.
>>>
>>> Returning false all the time is probably not a good idea either.
>>
>> Having userspace implement a fence sounds like an awful idea, why would
>> you want to do that?
>
>
> Marketing moves in mysterious ways. Don't ask me, but that the direction it
> currently moves with userspace queues and IOMMU etc...

Fence-based syncing between userspace queues submitted stuff through
doorbells and anything submitted by the general simply won't work.
Which is why I think the doorbell is a stupid interface since I just
don't see cameras and v4l devices implementing all that complexity to
get a pure userspace side sync solution.

But that's a different problem really, and I guess marketing will
eventually figure this one out, too.
-Daniel

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 15:35:28

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 22.07.2014 17:17, schrieb Daniel Vetter:
> On Tue, Jul 22, 2014 at 3:45 PM, Christian König
> <[email protected]> wrote:
>>> Would that be something you can agree to?
>>
>> No, the whole enable_signaling stuff should go away. No callback from the
>> driver into the fence code, only the other way around.
>>
>> fence->signaled as well as fence->wait should become mandatory and only
>> called from process context without holding any locks, neither atomic nor
>> any mutex/semaphore (rcu might be ok).
> So for the enable_signaling, that's optional already. It's only for
> drivers that don't want to keep interrupts enabled all the time. You
> can opt out of that easily.
>
> Wrt holding no locks at all while calling into any fence functions,
> that's just not going to work out. The point here is to make different
> drivers work together and we can rework all the ttm and i915 code to
> work locklessly in all cases where they need to wait for someone to
> complete rendering. Or at least I don't think that's feasible. So if
> you insist that no one might call into radeon code then we simply need
> to exclude radeon from participating in any shared fencing. But that's
> a bit pointless.
>
>>> Like I've said I think restricting the insanity other people are willing
>>> to live with just because you don't like it isn't right. But it is
>>> certainly right for you to insist on not being forced into any such
>>> design. I think the above would achieve this.
>>
>> I don't think so. If it's just me I would say that I'm just to cautious and
>> the idea is still save to apply to the whole kernel.
>>
>> But since Dave, Jerome and Ben seems to have similar concerns I think we
>> need to agree to a minimum and save interface for all drivers.
> Well I haven't yet seen a proposal that actually works.

How about this:

Drivers exporting fences need to provide a fence->signaled and a
fence->wait function, everything else like fence->enable_signaling or
calling fence_signaled() from the driver is optional.

Drivers wanting to use exported fences don't call fence->signaled or
fence->wait in atomic or interrupt context, and not with holding any
global locking primitives (like mmap_sem etc...). Holding locking
primitives local to the driver is ok, as long as they don't conflict
with anything possible used by their own fence implementation.
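
As a caller-side sketch of that contract (the helper name and the debug checks are illustrative, not an existing API):

#include <linux/fence.h>
#include <linux/hardirq.h>
#include <linux/sched.h>

static long cross_driver_fence_wait(struct fence *f, bool intr)
{
        /* per the proposed rules: process context only, nothing atomic held */
        WARN_ON(in_interrupt());
        might_sleep();

        if (fence_is_signaled(f))
                return 0;

        /* only driver-local locks may still be held by the caller here */
        return fence_wait_timeout(f, intr, MAX_SCHEDULE_TIMEOUT);
}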

Christian.

> From an intel
> pov I don't care that much since we don't care about desktop prime, so
> if radeon/nouveau don't want to do that, meh. Imo the design as-is is
> fairly sound, and as simple as it can get given the requirements. I
> haven't heard an argument convincing me otherwise, so I guess we
> won't have prime support on linux that actually works, ever.
> -Daniel

2014-07-22 15:42:19

by Alex Deucher

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 11:19 AM, Daniel Vetter <[email protected]> wrote:
> On Tue, Jul 22, 2014 at 4:39 PM, Christian König
> <[email protected]> wrote:
>> Am 22.07.2014 16:27, schrieb Maarten Lankhorst:
>>
>>> op 22-07-14 16:24, Christian König schreef:
>>>>>
>>>>> No, you really shouldn't be doing much in the check anyway, it's meant
>>>>> to be a lightweight check. If you're not ready yet because of a lockup
>>>>> simply return not signaled yet.
>>>>
>>>> It's not only the lockup case from radeon I have in mind here. For
>>>> userspace queues it might be necessary to call copy_from_user to figure out
>>>> if a fence is signaled or not.
>>>>
>>>> Returning false all the time is probably not a good idea either.
>>>
>>> Having userspace implement a fence sounds like an awful idea, why would
>>> you want to do that?
>>
>>
>> Marketing moves in mysterious ways. Don't ask me, but that the direction it
>> currently moves with userspace queues and IOMMU etc...
>
> Fence-based syncing between userspace queues submitted stuff through
> doorbells and anything submitted by the general simply wont work.
> Which is why I think the doorbell is a stupid interface since I just
> don't see cameras and v4l devices implementing all that complexity to
> get a pure userspace side sync solution.
>

Like it or not this is what a lot of application writers want (look at
mantle and metal and similar new APIs or android syncpts). Having
queues and fences in userspace allows the application to structure
things to best fit their own task graphs. The app can decide how to
deal with dependencies and synchronization explicitly instead of
blocking the queues in the kernel for everyone. Anyway, this is
getting off topic.

Alex

2014-07-22 15:43:00

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 5:35 PM, Christian König
<[email protected]> wrote:
> Drivers exporting fences need to provide a fence->signaled and a fence->wait
> function, everything else like fence->enable_signaling or calling
> fence_signaled() from the driver is optional.
>
> Drivers wanting to use exported fences don't call fence->signaled or
> fence->wait in atomic or interrupt context, and not with holding any global
> locking primitives (like mmap_sem etc...). Holding locking primitives local
> to the driver is ok, as long as they don't conflict with anything possible
> used by their own fence implementation.

Well that's almost what we have right now, with the exception that
drivers are allowed (and actually required, for correctness when updating
fences) to hold the ww_mutexes for dma-bufs (or other buffer objects). Locking
correctness is enforced with some extremely nasty lockdep annotations
+ additional debugging infrastructure enabled with
CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
dma-buf ww_mutexes while updating fences or waiting for them. And
obviously for ->wait we need non-atomic context, not just
non-interrupt.

Agreed that any shared locks are out of the way (especially stuff like
dev->struct_mutex or other non-strictly driver-private stuff, i915 is
really bad here still).

So from the core fence framework I think we already have exactly this,
and we only need to adjust the radeon implementation a bit to make it
less risky and invasive to the radeon driver logic.
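
For reference, the "update fences under the buffer's ww_mutex" rule boils down to something like the sketch below. The helper is illustrative; reservation_object_add_excl_fence and the resv->lock ww_mutex are as I read them from this series.

#include <linux/reservation.h>
#include <linux/fence.h>

static int publish_render_fence(struct reservation_object *resv,
                                struct fence *f,
                                struct ww_acquire_ctx *ctx)
{
        int ret;

        ret = ww_mutex_lock_interruptible(&resv->lock, ctx);
        if (ret)        /* -EDEADLK: back off, unlock the others and retry */
                return ret;

        reservation_object_add_excl_fence(resv, f);
        ww_mutex_unlock(&resv->lock);
        return 0;
}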
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 15:48:23

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 5:42 PM, Alex Deucher <[email protected]> wrote:
>> Fence-based syncing between userspace queues submitted stuff through
>> doorbells and anything submitted by the general simply wont work.
>> Which is why I think the doorbell is a stupid interface since I just
>> don't see cameras and v4l devices implementing all that complexity to
>> get a pure userspace side sync solution.
>>
>
> Like it or not this is what a lot of application writers want (look at
> mantle and metal and similar new APIs or android synpts). Having
> queues and fences in userspace allows the application to structure
> things to best fit their own task graphs. The app can decide how to
> deal with dependencies and synchronization explicitly instead of
> blocking the queues in the kernel for everyone. Anyway, this is
> getting off topic.

Well there's explicit fences as used in opencl and android syncpts. My
plan is actually to support that in i915 using Maarten's struct fence
stuff (and there's just a very trivial patch for the android stuff in
merging needed to get there). What doesn't work is fences created
behind the kernel's back purely in userspace by giving shared memory
locations special meaning. Those get the kernel completely out of the
picture (as opposed to android syncpts, which just make sync
explicit).

I guess long-term we might need something like gpu futexes to make
that pure userspace syncing integrate a bit better, but imo that's (at
least for now) out of scope. For fences here I have the goal of one
internal representation used by both implicit syncing (dma-buf on
classic linux, e.g. prime) and explicit fencing on android or opencl
or something like that.

We don't have the code yet ready, but that's the direction i915 will
move towards for the near future. Jesse is working on some patches
already.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 15:59:40

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 22.07.2014 17:42, schrieb Daniel Vetter:
> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
> <[email protected]> wrote:
>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>> function, everything else like fence->enable_signaling or calling
>> fence_signaled() from the driver is optional.
>>
>> Drivers wanting to use exported fences don't call fence->signaled or
>> fence->wait in atomic or interrupt context, and not with holding any global
>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>> to the driver is ok, as long as they don't conflict with anything possible
>> used by their own fence implementation.
> Well that's almost what we have right now with the exception that
> drivers are allowed (actually must for correctness when updating
> fences) the ww_mutexes for dma-bufs (or other buffer objects).

In this case sorry for so much noise. I really haven't looked in so much
detail into anything but Maarten's Radeon patches.

But how does that then work right now? My impression was that it's
mandatory for drivers to call fence_signaled()?

> Locking
> correctness is enforced with some extremely nasty lockdep annotations
> + additional debugging infrastructure enabled with
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
> dma-buf ww_mutexes while updating fences or waiting for them. And
> obviously for ->wait we need non-atomic context, not just
> non-interrupt.

Sounds mostly reasonable, but instead of holding the dma-buf ww_mutex,
wouldn't RCU be more appropriate here? E.g. aren't we just interested
that the currently assigned fence is signaled at some point?

Something like grab ww_mutexes, grab a reference to the current fence
object, release ww_mutex, wait for fence, release reference to the fence
object.
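
Spelled out, that pattern would be roughly the following (the my_bo container is hypothetical; the ww_mutex and fence calls are the real ones):

#include <linux/ww_mutex.h>
#include <linux/fence.h>

struct my_bo {                          /* hypothetical */
        struct ww_mutex lock;
        struct fence *excl_fence;       /* protected by lock */
};

static long my_bo_wait_idle(struct my_bo *bo)
{
        struct fence *f = NULL;
        long ret = 0;

        ww_mutex_lock(&bo->lock, NULL);         /* NULL ctx: plain mutex behaviour */
        if (bo->excl_fence)
                f = fence_get(bo->excl_fence);  /* take a ref under the lock */
        ww_mutex_unlock(&bo->lock);             /* drop it before sleeping */

        if (f) {
                ret = fence_wait(f, true);      /* interruptible, no locks held */
                fence_put(f);
        }
        return ret;
}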


> Agreed that any shared locks are out of the way (especially stuff like
> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
> really bad here still).

Yeah, that's also a point I've wanted to note on Maarten's patch. Radeon
grabs the read side of its exclusive semaphore while waiting for fences
(because it assumes that the fence it waits for is a Radeon fence).

Assuming that we need to wait in both directions with Prime (e.g. Intel
driver needs to wait for Radeon to finish rendering and Radeon needs to
wait for Intel to finish displaying), this might become a perfect
example of locking inversion.

> So from the core fence framework I think we already have exactly this,
> and we only need to adjust the radeon implementation a bit to make it
> less risky and invasive to the radeon driver logic.

Agree. Well the biggest problem I see is that exclusive semaphore I need
to take when anything calls into the driver. For the fence code I need
to move that down into the fence->signaled handler, cause that now can
be called from outside the driver.

Maarten solved this by telling the driver in the lockup handler (where
we grab the write side of the exclusive lock) that all interrupts are
already enabled, so that fence->signaled hopefully wouldn't mess with
the hardware at all. While this probably works, it just leaves me with a
feeling that we are doing something wrong here.

Christian.

> -Daniel

2014-07-22 16:21:56

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 5:59 PM, Christian König
<[email protected]> wrote:
> Am 22.07.2014 17:42, schrieb Daniel Vetter:
>
>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>> <[email protected]> wrote:
>>>
>>> Drivers exporting fences need to provide a fence->signaled and a
>>> fence->wait
>>> function, everything else like fence->enable_signaling or calling
>>> fence_signaled() from the driver is optional.
>>>
>>> Drivers wanting to use exported fences don't call fence->signaled or
>>> fence->wait in atomic or interrupt context, and not with holding any
>>> global
>>> locking primitives (like mmap_sem etc...). Holding locking primitives
>>> local
>>> to the driver is ok, as long as they don't conflict with anything
>>> possible
>>> used by their own fence implementation.
>>
>> Well that's almost what we have right now with the exception that
>> drivers are allowed (actually must for correctness when updating
>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>
>
> In this case sorry for so much noise. I really haven't looked in so much
> detail into anything but Maarten's Radeon patches.
>
> But how does that then work right now? My impression was that it's mandatory
> for drivers to call fence_signaled()?

Maybe I've mixed things up a bit in my description. There is
fence_signal which the implementor/exporter of a fence must call when
the fence is completed. If the exporter has an ->enable_signaling
callback it can delay that call to fence_signal for as long as it
wishes as long as enable_signaling isn't called yet. But that's just
the optimization to not require irqs to be turned on all the time.

The other function is fence_is_signaled, which is used by code that is
interested in the fence state, together with fence_wait if it wants to
block and not just wants to know the momentary fence state. All the
other functions (the stuff that adds callbacks and the various _locked
and other versions) are just for fancy special cases.
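
On the exporter side that split looks roughly like this, reusing the hypothetical my_fence/my_fence_ops sketch from earlier in the thread; fence_init(), fence_context_alloc() and fence_signal() are the calls from the fence patches, with the signatures as I read them there.

#include <linux/fence.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_fence_lock);
static unsigned my_fence_context;       /* set up once via fence_context_alloc(1) */

/* called when a job is queued */
static void my_emit_fence(struct my_fence *mf, u32 seqno)
{
        fence_init(&mf->base, &my_fence_ops, &my_fence_lock,
                   my_fence_context, seqno);
}

/* called from wherever completion is noticed (irq, polling, ...);
 * fence_signal() itself may be called from any context */
static void my_job_completed(struct my_fence *mf)
{
        fence_signal(&mf->base);
}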

>> Locking
>> correctness is enforced with some extremely nasty lockdep annotations
>> + additional debugging infrastructure enabled with
>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
>> dma-buf ww_mutexes while updating fences or waiting for them. And
>> obviously for ->wait we need non-atomic context, not just
>> non-interrupt.
>
>
> Sounds mostly reasonable, but for holding the dma-buf ww_mutex, wouldn't be
> an RCU be more appropriate here? E.g. aren't we just interested that the
> current assigned fence at some point is signaled?

Yeah, as an optimization you can get the set of currently attached
fences to a dma-buf with just rcu. But if you update the set of fences
attached to a dma-buf (e.g. radeon blits the newly rendered frame to a
dma-buf exported by i915 for scanout on i915) then you need a write
lock on that buffer. Which is what the ww_mutex is for, to make sure
that you don't deadlock with i915 doing concurrent ops on the same
underlying buffer.

> Something like grab ww_mutexes, grab a reference to the current fence
> object, release ww_mutex, wait for fence, release reference to the fence
> object.

Yeah, if the only thing you want to do is wait for fences, then the
rcu-protected fence ref grabbing + lockless waiting is all you need.
But e.g. in an execbuf you also need to update fences and maybe deep
down in the reservation code you notice that you need to evict some
stuff and so need to wait on some other guy to finish, and it's too
complicated to drop and reacquire all the locks. Or you simply need to
do a blocking wait on other gpus (because there's no direct hw sync
mechanism) and again dropping locks would needlessly complicate the
code. So I think we should allow this, just to avoid too hairy/brittle
(and almost certainly little-tested) code in drivers.

Afaik this is also the same way ttm currently handles things wrt
buffer reservation and eviction.

>> Agreed that any shared locks are out of the way (especially stuff like
>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>> really bad here still).
>
>
> Yeah that's also an point I've wanted to note on Maartens patch. Radeon
> grabs the read side of it's exclusive semaphore while waiting for fences
> (because it assumes that the fence it waits for is a Radeon fence).
>
> Assuming that we need to wait in both directions with Prime (e.g. Intel
> driver needs to wait for Radeon to finish rendering and Radeon needs to wait
> for Intel to finish displaying), this might become a perfect example of
> locking inversion.

fence updates are atomic on a dma-buf, protected by ww_mutex. The neat
trick of ww_mutex is that they enforce a global ordering, so in your
scenario either i915 or radeon would be first and you can't deadlock.
There is no way to interleave anything even if you have lots of
buffers shared between i915/radeon. Wrt deadlocking it's exactly the
same guarantees as the magic ttm provides for just one driver with
concurrent command submission since it's the same idea.

>> So from the core fence framework I think we already have exactly this,
>> and we only need to adjust the radeon implementation a bit to make it
>> less risky and invasive to the radeon driver logic.
>
>
> Agree. Well the biggest problem I see is that exclusive semaphore I need to
> take when anything calls into the driver. For the fence code I need to move
> that down into the fence->signaled handler, cause that now can be called
> from outside the driver.
>
> Maarten solved this by telling the driver in the lockup handler (where we
> grab the write side of the exclusive lock) that all interrupts are already
> enabled, so that fence->signaled hopefully wouldn't mess with the hardware
> at all. While this probably works, it just leaves me with a feeling that we
> are doing something wrong here.

I'm not versed in the details of radeon, but on i915 we can attach a
memory location and cookie value to each fence and just do a memory
fetch to figure out whether the fence has passed or not. So no locking
needed at all. Of course the fence itself needs to lock a reference
onto that memory location, which is a neat piece of integration work
that we still need to tackle in some cases - there's conflicting patch
series all over this ;-)
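
The "memory location + cookie" check is essentially just this (illustrative, not actual i915 code):

#include <linux/fence.h>

struct my_seqno_fence {                 /* hypothetical */
        struct fence base;
        u32 *hw_seqno;                  /* pinned page the gpu writes to */
        u32 seqno;                      /* the cookie */
};

static bool my_seqno_fence_signaled(struct fence *f)
{
        struct my_seqno_fence *sf =
                container_of(f, struct my_seqno_fence, base);

        /* one coherent read, no locking, usable from any context */
        return (s32)(ACCESS_ONCE(*sf->hw_seqno) - sf->seqno) >= 0;
}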

But like I've said fence->signaled is optional so you don't need this
necessarily, as long as radeon eventually calls fence_signal once
the fence has completed.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 16:40:04

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

> Maybe I've mixed things up a bit in my description. There is
> fence_signal which the implementor/exporter of a fence must call when
> the fence is completed. If the exporter has an ->enable_signaling
> callback it can delay that call to fence_signal for as long as it
> wishes as long as enable_signaling isn't called yet. But that's just
> the optimization to not required irqs to be turned on all the time.
>
> The other function is fence_is_signaled, which is used by code that is
> interested in the fence state, together with fence_wait if it wants to
> block and not just wants to know the momentary fence state. All the
> other functions (the stuff that adds callbacks and the various _locked
> and other versions) are just for fancy special cases.
Well that's rather bad, cause IRQs aren't reliable enough on Radeon HW
for such a thing. Especially on Prime systems and Macs.

That's why we have this fancy HZ/2 timeout on all fence wait operations
to manually check if the fence is signaled or not.

To guarantee that a fence is signaled after enable_signaling is called
we would need to fire up a kernel thread which periodically calls
fence->signaled.
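
That periodic fallback wouldn't necessarily need a dedicated thread; delayed work on the exporter's side would do. A sketch with hypothetical names, relying on the fact that fence_is_signaled() calls ->signaled and then fence_signal() for you:

#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/fence.h>

struct my_fence_poller {                /* hypothetical */
        struct delayed_work work;
        struct fence *fence;            /* holds a reference */
};

static void my_fence_poll(struct work_struct *work)
{
        struct my_fence_poller *p =
                container_of(work, struct my_fence_poller, work.work);

        if (fence_is_signaled(p->fence)) {
                fence_put(p->fence);
                kfree(p);
                return;
        }
        /* the irq may have been lost: check again in HZ/2, like radeon's waits */
        schedule_delayed_work(&p->work, HZ / 2);
}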

Christian.

Am 22.07.2014 18:21, schrieb Daniel Vetter:
> On Tue, Jul 22, 2014 at 5:59 PM, Christian König
> <[email protected]> wrote:
>> Am 22.07.2014 17:42, schrieb Daniel Vetter:
>>
>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>> <[email protected]> wrote:
>>>> Drivers exporting fences need to provide a fence->signaled and a
>>>> fence->wait
>>>> function, everything else like fence->enable_signaling or calling
>>>> fence_signaled() from the driver is optional.
>>>>
>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>> fence->wait in atomic or interrupt context, and not with holding any
>>>> global
>>>> locking primitives (like mmap_sem etc...). Holding locking primitives
>>>> local
>>>> to the driver is ok, as long as they don't conflict with anything
>>>> possible
>>>> used by their own fence implementation.
>>> Well that's almost what we have right now with the exception that
>>> drivers are allowed (actually must for correctness when updating
>>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>>
>> In this case sorry for so much noise. I really haven't looked in so much
>> detail into anything but Maarten's Radeon patches.
>>
>> But how does that then work right now? My impression was that it's mandatory
>> for drivers to call fence_signaled()?
> Maybe I've mixed things up a bit in my description. There is
> fence_signal which the implementor/exporter of a fence must call when
> the fence is completed. If the exporter has an ->enable_signaling
> callback it can delay that call to fence_signal for as long as it
> wishes as long as enable_signaling isn't called yet. But that's just
> the optimization to not required irqs to be turned on all the time.
>
> The other function is fence_is_signaled, which is used by code that is
> interested in the fence state, together with fence_wait if it wants to
> block and not just wants to know the momentary fence state. All the
> other functions (the stuff that adds callbacks and the various _locked
> and other versions) are just for fancy special cases.
>
>>> Locking
>>> correctness is enforced with some extremely nasty lockdep annotations
>>> + additional debugging infrastructure enabled with
>>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
>>> dma-buf ww_mutexes while updating fences or waiting for them. And
>>> obviously for ->wait we need non-atomic context, not just
>>> non-interrupt.
>>
>> Sounds mostly reasonable, but for holding the dma-buf ww_mutex, wouldn't be
>> an RCU be more appropriate here? E.g. aren't we just interested that the
>> current assigned fence at some point is signaled?
> Yeah, as an optimization you can get the set of currently attached
> fences to a dma-buf with just rcu. But if you update the set of fences
> attached to a dma-buf (e.g. radeon blits the newly rendered frame to a
> dma-buf exported by i915 for scanout on i915) then you need a write
> lock on that buffer. Which is what the ww_mutex is for, to make sure
> that you don't deadlock with i915 doing concurrent ops on the same
> underlying buffer.
>
>> Something like grab ww_mutexes, grab a reference to the current fence
>> object, release ww_mutex, wait for fence, release reference to the fence
>> object.
> Yeah, if the only thing you want to do is wait for fences, then the
> rcu-protected fence ref grabbing + lockless waiting is all you need.
> But e.g. in an execbuf you also need to update fences and maybe deep
> down in the reservation code you notice that you need to evict some
> stuff and so need to wait on some other guy to finish, and it's too
> complicated to drop and reacquire all the locks. Or you simply need to
> do a blocking wait on other gpus (because there's no direct hw sync
> mechanism) and again dropping locks would needlessly complicate the
> code. So I think we should allow this just to avoid too hairy/brittle
> (and almost definitely little tested code) in drivers.
>
> Afaik this is also the same way ttm currently handles things wrt
> buffer reservation and eviction.
>
>>> Agreed that any shared locks are out of the way (especially stuff like
>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>> really bad here still).
>>
>> Yeah that's also an point I've wanted to note on Maartens patch. Radeon
>> grabs the read side of it's exclusive semaphore while waiting for fences
>> (because it assumes that the fence it waits for is a Radeon fence).
>>
>> Assuming that we need to wait in both directions with Prime (e.g. Intel
>> driver needs to wait for Radeon to finish rendering and Radeon needs to wait
>> for Intel to finish displaying), this might become a perfect example of
>> locking inversion.
> fence updates are atomic on a dma-buf, protected by ww_mutex. The neat
> trick of ww_mutex is that they enforce a global ordering, so in your
> scenario either i915 or radeon would be first and you can't deadlock.
> There is no way to interleave anything even if you have lots of
> buffers shared between i915/radeon. Wrt deadlocking it's exactly the
> same guarantees as the magic ttm provides for just one driver with
> concurrent command submission since it's the same idea.
>
>>> So from the core fence framework I think we already have exactly this,
>>> and we only need to adjust the radeon implementation a bit to make it
>>> less risky and invasive to the radeon driver logic.
>>
>> Agree. Well the biggest problem I see is that exclusive semaphore I need to
>> take when anything calls into the driver. For the fence code I need to move
>> that down into the fence->signaled handler, cause that now can be called
>> from outside the driver.
>>
>> Maarten solved this by telling the driver in the lockup handler (where we
>> grab the write side of the exclusive lock) that all interrupts are already
>> enabled, so that fence->signaled hopefully wouldn't mess with the hardware
>> at all. While this probably works, it just leaves me with a feeling that we
>> are doing something wrong here.
> I'm not versed in the details of radeon, but on i915 we can attach a
> memory location and cookie value to each fence and just do a memory
> fetch to figure out whether the fence has passed or not. So no locking
> needed at all. Of course the fence itself needs to lock a reference
> onto that memory location, which is a neat piece of integration work
> that we still need to tackle in some cases - there's conflicting patch
> series all over this ;-)
>
> But like I've said fence->signaled is optional so you don't need this
> necessarily, as long as radeon eventually calls fence_signaled once
> the fence has completed.
> -Daniel

2014-07-22 16:43:09

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 6:21 PM, Daniel Vetter <[email protected]> wrote:
> But like I've said fence->signaled is optional so you don't need this
> necessarily, as long as radeon eventually calls fence_signaled once
> the fence has completed.

Actually I've chatted a bit with Maarten about the different ways we
could restrict both the calling context and the implementations for
fence callbacks to avoid surprises. One is certainly that we need
WARN_ON(in_interrupt) around the wait, enable_signaling and add
callback stuff.

But we also talked about ensuring that the ->signaled callback never
sleeps by wrapping it in a preempt_disable/enable section. Currently
sleeping is allowed in ->signaled, which the radeon implementation
here does. I think it would be reasonable to forbid this and drop
__radeon_fence_signaled.
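
In core code those two restrictions could look roughly like this (a sketch of the idea, not an actual patch):

#include <linux/fence.h>
#include <linux/hardirq.h>

static bool checked_fence_is_signaled(struct fence *f)
{
        bool ret = false;

        if (f->ops->signaled) {
                preempt_disable();      /* any sleep in ->signaled now splats */
                ret = f->ops->signaled(f);
                preempt_enable();
        }
        if (ret)
                fence_signal(f);
        return ret || test_bit(FENCE_FLAG_SIGNALED_BIT, &f->flags);
}

static signed long checked_fence_wait(struct fence *f, bool intr,
                                      signed long timeout)
{
        WARN_ON(in_interrupt());        /* wait/enable_signaling: process ctx only */
        return fence_wait_timeout(f, intr, timeout);
}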
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 16:52:46

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 6:39 PM, Christian König
<[email protected]> wrote:
>> Maybe I've mixed things up a bit in my description. There is
>> fence_signal which the implementor/exporter of a fence must call when
>> the fence is completed. If the exporter has an ->enable_signaling
>> callback it can delay that call to fence_signal for as long as it
>> wishes as long as enable_signaling isn't called yet. But that's just
>> the optimization to not required irqs to be turned on all the time.
>>
>> The other function is fence_is_signaled, which is used by code that is
>> interested in the fence state, together with fence_wait if it wants to
>> block and not just wants to know the momentary fence state. All the
>> other functions (the stuff that adds callbacks and the various _locked
>> and other versions) are just for fancy special cases.
>
> Well that's rather bad, cause IRQs aren't reliable enough on Radeon HW for
> such a thing. Especially on Prime systems and Macs.
>
> That's why we have this fancy HZ/2 timeout on all fence wait operations to
> manually check if the fence is signaled or not.
>
> To guarantee that a fence is signaled after enable_signaling is called we
> would need to fire up a kernel thread which periodically calls
> fence->signaled.

We actually have seen similar fun on some i915 platforms. I wonder
whether we shouldn't have something in the fence core for this given
how common it is. Currently we have the same trick with regular wakeups
on platforms with unreliable interrupts, but I haven't yet looked at
how we'll do this with callbacks once we add the scheduler and fences.
It might be though that we've finally fixed these coherency issues
between the interrupt and the fence write for real.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-22 19:13:25

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, 22 Jul 2014 17:48:18 +0200
Daniel Vetter <[email protected]> wrote:

> On Tue, Jul 22, 2014 at 5:42 PM, Alex Deucher <[email protected]> wrote:
> >> Fence-based syncing between userspace queues submitted stuff through
> >> doorbells and anything submitted by the general simply wont work.
> >> Which is why I think the doorbell is a stupid interface since I just
> >> don't see cameras and v4l devices implementing all that complexity to
> >> get a pure userspace side sync solution.
> >>
> >
> > Like it or not this is what a lot of application writers want (look at
> > mantle and metal and similar new APIs or android synpts). Having
> > queues and fences in userspace allows the application to structure
> > things to best fit their own task graphs. The app can decide how to
> > deal with dependencies and synchronization explicitly instead of
> > blocking the queues in the kernel for everyone. Anyway, this is
> > getting off topic.
>
> Well there's explicit fences as used in opencl and android syncpts. My
> plan is actually to support that in i915 using Maarten's struct fence
> stuff (and there's just a very trivial patch for the android stuff in
> merging needed to get there). What doesn't work is fences created
> behind the kernel's back purely in userspace by giving shared memory
> locations special meaning. Those get the kernel completely out of the
> picture (as opposed to android syncpts, which just make sync
> explicit).
>
> I guess long-term we might need something like gpu futexes to make
> that pure userspace syncing integrate a bit better, but imo that's (at
> least for now) out of scope. For fences here I have the goal of one

Yeah, with a little kernel help you could have a mostly kernel
independent sync mechanism using just shared mem in userspace. The
kernel would just need to signal any interested clients when something
happened (even if it didn't know what) and let userspace sort out the
rest. I think that would be a nice thing to provide at some point, as
it could allow for some fine grained CPU/GPU algorithms that use
lightweight synchronization with and without busy looping on the CPU
side.

But all of that is definitely a lower priority than getting explicit
fencing exported to userspace to work right, both for intra-driver
sync and inter-driver sync.

> internally representation used by both implicit syncing (dma-buf on
> classic linux, e.g. prime) and explicit fencing on android or opencl
> or something like that.
>
> We don't have the code yet ready, but that's the direction i915 will
> move towards for the near future. Jesse is working on some patches
> already.

Yeah I'd like to get some feedback from Maarten on my bits so I can get
them ready for upstream. I still need to add documentation and tests,
but I'd like to make sure the interfaces and internals get acked first.

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center

2014-07-23 06:41:11

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 22-07-14 17:59, Christian König schreef:
> Am 22.07.2014 17:42, schrieb Daniel Vetter:
>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>> <[email protected]> wrote:
>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>> function, everything else like fence->enable_signaling or calling
>>> fence_signaled() from the driver is optional.
>>>
>>> Drivers wanting to use exported fences don't call fence->signaled or
>>> fence->wait in atomic or interrupt context, and not with holding any global
>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>> to the driver is ok, as long as they don't conflict with anything possible
>>> used by their own fence implementation.
>> Well that's almost what we have right now with the exception that
>> drivers are allowed (actually must for correctness when updating
>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>
> In this case sorry for so much noise. I really haven't looked in so much detail into anything but Maarten's Radeon patches.
>
> But how does that then work right now? My impression was that it's mandatory for drivers to call fence_signaled()?
It's only mandatory to call fence_signal() if the .enable_signaling callback has been called; otherwise you can get away with never signaling a fence at all before dropping the last refcount to it.
This allows you to keep interrupts disabled when you don't need them.
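
The pattern this enables, roughly (hypothetical my_* names; it assumes all these fences were initialized with dev->fence_lock as their fence lock, so fence_signal_locked() is legal here): only fences whose .enable_signaling has run sit on the pending list, so the irq can stay masked otherwise.

#include <linux/fence.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct my_irq_fence {                   /* hypothetical */
        struct fence base;
        struct list_head node;          /* on dev->pending after enable_signaling */
        u32 seqno;
};

struct my_device {                      /* hypothetical */
        spinlock_t fence_lock;          /* also passed to fence_init() */
        struct list_head pending;
};

/* irq handler / periodic check: signal everything that completed */
static void my_signal_completed(struct my_device *dev, u32 completed)
{
        struct my_irq_fence *mf, *tmp;
        unsigned long flags;

        spin_lock_irqsave(&dev->fence_lock, flags);
        list_for_each_entry_safe(mf, tmp, &dev->pending, node) {
                if ((s32)(completed - mf->seqno) < 0)
                        break;
                list_del_init(&mf->node);
                fence_signal_locked(&mf->base);
        }
        spin_unlock_irqrestore(&dev->fence_lock, flags);
}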
>> Locking
>> correctness is enforced with some extremely nasty lockdep annotations
>> + additional debugging infrastructure enabled with
>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
>> dma-buf ww_mutexes while updating fences or waiting for them. And
>> obviously for ->wait we need non-atomic context, not just
>> non-interrupt.
>
> Sounds mostly reasonable, but for holding the dma-buf ww_mutex, wouldn't be an RCU be more appropriate here? E.g. aren't we just interested that the current assigned fence at some point is signaled?
You can wait with RCU, without holding the ww_mutex, by calling reservation_object_wait_timeout_rcu on ttm_bo->resv.
If you don't want to block you could test with reservation_object_test_signaled_rcu.
Or if you want a copy of all fences without taking locks, try reservation_object_get_fences_rcu. (It might be out of date by the time the function returns if you don't hold the ww_mutex; if you do hold the ww_mutex you probably don't need to call this function.)

I didn't add non-rcu versions, but using the RCU functions would work with ww_mutex held too, probably with some small overhead.
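
Usage of those calls, for reference (signatures as I read them from this series; my_buffer is a hypothetical wrapper around the reservation_object):

#include <linux/reservation.h>

struct my_buffer {                              /* hypothetical */
        struct reservation_object *resv;
};

static long my_buffer_sync(struct my_buffer *buf, bool nonblock)
{
        if (nonblock)
                /* peek at the exclusive fence only: no locks, no sleeping */
                return reservation_object_test_signaled_rcu(buf->resv, false) ?
                       0 : -EBUSY;

        /* wait for all fences (shared + exclusive), interruptible, 10s cap;
         * the return value roughly follows wait_event_timeout() */
        return reservation_object_wait_timeout_rcu(buf->resv, true, true,
                                                   10 * HZ);
}

/* reservation_object_get_fences_rcu(resv, &excl, &count, &shared) would
 * instead hand back referenced copies of all fences; the caller has to
 * fence_put() them and free the shared array when done. */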
> Something like grab ww_mutexes, grab a reference to the current fence object, release ww_mutex, wait for fence, release reference to the fence object.
This is what I do currently. :-) The reservation_object that's embedded in TTM gets shared with the dma-buf, so there will be no special case needed for dma-buf at all, all objects can simply be shared and the synchronization is handled in the same way.

ttm_bo_reserve and friends automatically end up locking the dma-buf too, and lockdep works on it.

>
>> Agreed that any shared locks are out of the way (especially stuff like
>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>> really bad here still).
>
> Yeah that's also an point I've wanted to note on Maartens patch. Radeon grabs the read side of it's exclusive semaphore while waiting for fences (because it assumes that the fence it waits for is a Radeon fence).
>
> Assuming that we need to wait in both directions with Prime (e.g. Intel driver needs to wait for Radeon to finish rendering and Radeon needs to wait for Intel to finish displaying), this might become a perfect example of locking inversion.
In the preliminary patches where I can sync radeon with other GPUs I've been very careful in all the places that call into fences, to make sure that radeon wouldn't try to handle lockups for a different (possibly also radeon) card.

This is also why fence_is_signaled should never block, and why it trylocks the exclusive_lock. :-) I think lockdep would complain if I grabbed exclusive_lock while blocking in is_signaled.
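
That non-blocking style of ->signaled is roughly the following (a paraphrased sketch, not the actual __radeon_fence_is_signaled; it assumes the driver's reset lock is an rw_semaphore as in these patches, and the my_* names are made up):

static bool my_nonblocking_signaled(struct fence *f)
{
        struct my_fence *mf = to_my_fence(f);           /* hypothetical */
        bool ret;

        if (!down_read_trylock(&mf->dev->exclusive_lock))
                return false;   /* reset in progress: just report "not yet" */

        ret = my_seqno_passed(mf->dev, mf->seqno);      /* hypothetical hw check */
        up_read(&mf->dev->exclusive_lock);
        return ret;
}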

>> So from the core fence framework I think we already have exactly this,
>> and we only need to adjust the radeon implementation a bit to make it
>> less risky and invasive to the radeon driver logic.
>
> Agree. Well the biggest problem I see is that exclusive semaphore I need to take when anything calls into the driver. For the fence code I need to move that down into the fence->signaled handler, cause that now can be called from outside the driver.
>
> Maarten solved this by telling the driver in the lockup handler (where we grab the write side of the exclusive lock) that all interrupts are already enabled, so that fence->signaled hopefully wouldn't mess with the hardware at all. While this probably works, it just leaves me with a feeling that we are doing something wrong here.
There is unfortunately no global mechanism to say 'this card is locked up, please don't call into any of my fences'; I don't associate fences with devices, and radeon doesn't keep a global list of fences.
If all of that existed, it would complicate the interface and its callers a lot, and I'd like to keep things simple.
So I did the best I could, and simply prevented the fence calls from fiddling with the hardware. Fortunately a GPU lockup is not a common event. :-)

~Maarten

2014-07-23 06:52:28

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 08:40, schrieb Maarten Lankhorst:
> op 22-07-14 17:59, Christian König schreef:
>> Am 22.07.2014 17:42, schrieb Daniel Vetter:
>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>> <[email protected]> wrote:
>>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>>> function, everything else like fence->enable_signaling or calling
>>>> fence_signaled() from the driver is optional.
>>>>
>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>> fence->wait in atomic or interrupt context, and not with holding any global
>>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>>> to the driver is ok, as long as they don't conflict with anything possible
>>>> used by their own fence implementation.
>>> Well that's almost what we have right now with the exception that
>>> drivers are allowed (actually must for correctness when updating
>>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>> In this case sorry for so much noise. I really haven't looked in so much detail into anything but Maarten's Radeon patches.
>>
>> But how does that then work right now? My impression was that it's mandatory for drivers to call fence_signaled()?
> It's only mandatory to call fence_signal() if the .enable_signaling callback has been called; otherwise you can get away with never signaling a fence at all before dropping the last refcount to it.
> This allows you to keep interrupts disabled when you don't need them.

Can we somehow avoid the need to call fence_signal() at all? The
interrupts, at least on radeon, are way too unreliable for such a thing.
Can enable_signaling fail? What's the reason for fence_signaled() in
the first place?

>>> Agreed that any shared locks are out of the way (especially stuff like
>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>> really bad here still).
>> Yeah that's also an point I've wanted to note on Maartens patch. Radeon grabs the read side of it's exclusive semaphore while waiting for fences (because it assumes that the fence it waits for is a Radeon fence).
>>
>> Assuming that we need to wait in both directions with Prime (e.g. Intel driver needs to wait for Radeon to finish rendering and Radeon needs to wait for Intel to finish displaying), this might become a perfect example of locking inversion.
> In the preliminary patches where I can sync radeon with other GPU's I've been very careful in all the places that call into fences, to make sure that radeon wouldn't try to handle lockups for a different (possibly also radeon) card.

That's actually not such a good idea.

In case of a lockup we need to handle the lockup, because otherwise it
could happen that radeon waits for the lockup to be resolved while the
lockup handling needs to wait for a fence that's never signaled because
of the lockup.

Christian.

>
> This is also why fence_is_signaled should never block, and why it trylocks the exclusive_lock. :-) I think lockdep would complain if I grabbed exclusive_lock while blocking in is_signaled.
>
>>> So from the core fence framework I think we already have exactly this,
>>> and we only need to adjust the radeon implementation a bit to make it
>>> less risky and invasive to the radeon driver logic.
>> Agree. Well the biggest problem I see is that exclusive semaphore I need to take when anything calls into the driver. For the fence code I need to move that down into the fence->signaled handler, cause that now can be called from outside the driver.
>>
>> Maarten solved this by telling the driver in the lockup handler (where we grab the write side of the exclusive lock) that all interrupts are already enabled, so that fence->signaled hopefully wouldn't mess with the hardware at all. While this probably works, it just leaves me with a feeling that we are doing something wrong here.
> There is unfortunately no global mechanism to say 'this card is locked up, please don't call into any of my fences', and I don't associate fences with devices, and radeon doesn't keep a global list of fences.
> If all of that existed, it would complicate the interface and its callers a lot, while I like to keep things simple.
> So I did the best I could, and simply prevented the fence calls from fiddling with the hardware. Fortunately gpu lockup is not a common operation. :-)
>
> ~Maarten
>
>

2014-07-23 07:02:13

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 8:52 AM, Christian König
<[email protected]> wrote:
>> In the preliminary patches where I can sync radeon with other GPU's I've
>> been very careful in all the places that call into fences, to make sure that
>> radeon wouldn't try to handle lockups for a different (possibly also radeon)
>> card.
>
> That's actually not such a good idea.
>
> In case of a lockup we need to handle the lockup cause otherwise it could
> happen that radeon waits for the lockup to be resolved and the lockup
> handling needs to wait for a fence that's never signaled because of the
> lockup.

I thought the plan for now is that each driver handles lockups itself.
So if any batch gets stuck for too long (whether it's our own gpu that's
stuck or whether we're somehow stuck on a fence from a 2nd gpu doesn't
matter) the driver steps in with a reset and signals completion to all
its own fences that have been in that pile-up. As long as each driver
participating in fencing has means to abort/reset we'll eventually get
unstuck.

Essentially every driver has to guarantee that assuming dependent
fences all complete eventually that it _will_ complete its own fences
no matter what.

For now this should be good enough, but for arb_robustness or people
who care a bit about their compute results we need reliable
notification to userspace that a reset happened. I think we could add
a new "aborted" fence state for that case and then propagate that. But
given how tricky the code to compute reset victims in i915 is already
I think we should leave this out for now. And even later on make it
strictly opt-in.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 07:06:44

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 23-07-14 08:52, Christian König schreef:
> Am 23.07.2014 08:40, schrieb Maarten Lankhorst:
>> op 22-07-14 17:59, Christian König schreef:
>>> Am 22.07.2014 17:42, schrieb Daniel Vetter:
>>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>>> <[email protected]> wrote:
>>>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>>>> function, everything else like fence->enable_signaling or calling
>>>>> fence_signaled() from the driver is optional.
>>>>>
>>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>>> fence->wait in atomic or interrupt context, and not with holding any global
>>>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>>>> to the driver is ok, as long as they don't conflict with anything possible
>>>>> used by their own fence implementation.
>>>> Well that's almost what we have right now with the exception that
>>>> drivers are allowed (actually must for correctness when updating
>>>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>>> In this case sorry for so much noise. I really haven't looked in so much detail into anything but Maarten's Radeon patches.
>>>
>>> But how does that then work right now? My impression was that it's mandatory for drivers to call fence_signaled()?
>> It's only mandatory to call fence_signal() if the .enable_signaling callback has been called, else you can get away with never calling signaling a fence at all before dropping the last refcount to it.
>> This allows you to keep interrupts disabled when you don't need them.
>
> Can we somehow avoid the need to call fence_signal() at all? The interrupts at least on radeon are way to unreliable for such a thing. Can enable_signalling fail? What's the reason for fence_signaled() in the first place?
It doesn't need to be completely reliable, or finish immediately.

And any time wake_up_all(&rdev->fence_queue) is called, all the fences that were enabled will be rechecked.
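
As a sketch of that recheck-on-wakeup idea (all example_* names and the list-based bookkeeping are invented, not the actual radeon implementation):

#include <linux/fence.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct example_device {
	spinlock_t fence_lock;		/* also used as fence->lock for every fence */
	struct list_head enabled_fences;
	wait_queue_head_t fence_queue;
	u64 last_signaled_seq;		/* updated from the hw fence page */
};

struct example_fence {
	struct fence base;
	struct list_head node;
	struct example_device *edev;
	u64 seq;
};

static bool example_seq_passed(struct example_device *edev, u64 seq)
{
	return edev->last_signaled_seq >= seq;
}

/* .enable_signaling: only from this point on does the driver have to deliver
 * fence_signal(); before that a fence may simply be dropped unsignaled.
 * Called by the fence core with fence->lock (== fence_lock here) held. */
static bool example_fence_enable_signaling(struct fence *fence)
{
	struct example_fence *f = container_of(fence, struct example_fence, base);

	if (example_seq_passed(f->edev, f->seq))
		return false;			/* already done, nothing to enable */

	list_add_tail(&f->node, &f->edev->enabled_fences);
	return true;
}

/* Called from the irq handler and from any waiter: every call rechecks all
 * fences for which signaling was enabled, so a lost interrupt only delays
 * signaling until the next wakeup. */
static void example_fence_process(struct example_device *edev)
{
	struct example_fence *f, *tmp;
	unsigned long flags;

	spin_lock_irqsave(&edev->fence_lock, flags);
	list_for_each_entry_safe(f, tmp, &edev->enabled_fences, node) {
		if (example_seq_passed(edev, f->seq)) {
			list_del_init(&f->node);
			fence_signal_locked(&f->base);
		}
	}
	spin_unlock_irqrestore(&edev->fence_lock, flags);

	wake_up_all(&edev->fence_queue);
}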

>>>> Agreed that any shared locks are out of the way (especially stuff like
>>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>>> really bad here still).
>>> Yeah that's also an point I've wanted to note on Maartens patch. Radeon grabs the read side of it's exclusive semaphore while waiting for fences (because it assumes that the fence it waits for is a Radeon fence).
>>>
>>> Assuming that we need to wait in both directions with Prime (e.g. Intel driver needs to wait for Radeon to finish rendering and Radeon needs to wait for Intel to finish displaying), this might become a perfect example of locking inversion.
>> In the preliminary patches where I can sync radeon with other GPU's I've been very careful in all the places that call into fences, to make sure that radeon wouldn't try to handle lockups for a different (possibly also radeon) card.
>
> That's actually not such a good idea.
>
> In case of a lockup we need to handle the lockup cause otherwise it could happen that radeon waits for the lockup to be resolved and the lockup handling needs to wait for a fence that's never signaled because of the lockup.
The lockup handling calls radeon_fence_wait, not the generic fence_wait. It doesn't call the exported wait function that takes the exclusive_lock in read mode.
And lockdep should have complained if I screwed that up. :-)

~Maarten

2014-07-23 07:09:23

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 9:06 AM, Maarten Lankhorst
<[email protected]> wrote:
>> Can we somehow avoid the need to call fence_signal() at all? The interrupts at least on radeon are way to unreliable for such a thing. Can enable_signalling fail? What's the reason for fence_signaled() in the first place?
> It doesn't need to be completely reliable, or finish immediately.
>
> And any time wake_up_all(&rdev->fence_queue) is called all the fences that were enabled will be rechecked.

I raised this already somewhere else, but should we have some common
infrastructure in the core fence code to recheck fences periodically?
radeon doesn't seem to be the only hw where this isn't reliable
enough. Of course timer-based rechecking would only work if the driver
provides the fence->signaled callback to recheck actual fence state.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 07:16:04

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 09:09, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 9:06 AM, Maarten Lankhorst
> <[email protected]> wrote:
>>> Can we somehow avoid the need to call fence_signal() at all? The interrupts at least on radeon are way to unreliable for such a thing. Can enable_signalling fail? What's the reason for fence_signaled() in the first place?
>> It doesn't need to be completely reliable, or finish immediately.
>>
>> And any time wake_up_all(&rdev->fence_queue) is called all the fences that were enabled will be rechecked.
> I raised this already somewhere else, but should we have some common
> infrastructure in the core fence code to recheck fences periodically?
> radeon doesn't seem to be the only hw where this isn't reliable
> enough. Of course timer-based rechecking would only work if the driver
> provides the fence->signalled callback to recheck actual fence state.

Yeah, agree. The proposal won't work reliably at all with radeon.

Interrupts are accumulated before they are sent to the CPU, e.g. you can
get one interrupt for multiple finished fences. If it's just the
interrupt for the last submitted fence that gets lost, you are completely
screwed because you won't get another interrupt.

I had that problem multiple times while working on UVD support,
resulting in the driver thinking that it can't submit more jobs because
none of the interrupts for the already submitted fences came through.

Apart from that interrupts on Macs usually don't work at all, so we
really need a solution where calling fence_signaled() is completely
optional.

Christian.

> -Daniel

2014-07-23 07:27:16

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 09:06, schrieb Maarten Lankhorst:
> op 23-07-14 08:52, Christian König schreef:
>> Am 23.07.2014 08:40, schrieb Maarten Lankhorst:
>>> op 22-07-14 17:59, Christian König schreef:
>>>> Am 22.07.2014 17:42, schrieb Daniel Vetter:
>>>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>>>> <[email protected]> wrote:
>>>>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>>>>> function, everything else like fence->enable_signaling or calling
>>>>>> fence_signaled() from the driver is optional.
>>>>>>
>>>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>>>> fence->wait in atomic or interrupt context, and not with holding any global
>>>>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>>>>> to the driver is ok, as long as they don't conflict with anything possible
>>>>>> used by their own fence implementation.
>>>>> Well that's almost what we have right now with the exception that
>>>>> drivers are allowed (actually must for correctness when updating
>>>>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>>>> In this case sorry for so much noise. I really haven't looked in so much detail into anything but Maarten's Radeon patches.
>>>>
>>>> But how does that then work right now? My impression was that it's mandatory for drivers to call fence_signaled()?
>>> It's only mandatory to call fence_signal() if the .enable_signaling callback has been called, else you can get away with never calling signaling a fence at all before dropping the last refcount to it.
>>> This allows you to keep interrupts disabled when you don't need them.
>> Can we somehow avoid the need to call fence_signal() at all? The interrupts at least on radeon are way to unreliable for such a thing. Can enable_signalling fail? What's the reason for fence_signaled() in the first place?
> It doesn't need to be completely reliable, or finish immediately.
>
> And any time wake_up_all(&rdev->fence_queue) is called all the fences that were enabled will be rechecked.
>
>>>>> Agreed that any shared locks are out of the way (especially stuff like
>>>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>>>> really bad here still).
>>>> Yeah that's also an point I've wanted to note on Maartens patch. Radeon grabs the read side of it's exclusive semaphore while waiting for fences (because it assumes that the fence it waits for is a Radeon fence).
>>>>
>>>> Assuming that we need to wait in both directions with Prime (e.g. Intel driver needs to wait for Radeon to finish rendering and Radeon needs to wait for Intel to finish displaying), this might become a perfect example of locking inversion.
>>> In the preliminary patches where I can sync radeon with other GPU's I've been very careful in all the places that call into fences, to make sure that radeon wouldn't try to handle lockups for a different (possibly also radeon) card.
>> That's actually not such a good idea.
>>
>> In case of a lockup we need to handle the lockup cause otherwise it could happen that radeon waits for the lockup to be resolved and the lockup handling needs to wait for a fence that's never signaled because of the lockup.
> The lockup handling calls radeon_fence_wait, not the generic fence_wait. It doesn't call the exported wait function that takes the exclusive_lock in read mode.
> And lockdep should have complained if I screwed that up. :-)

You screwed it up and lockdep didn't warn you about it :-P

It's not a locking problem I'm talking about here. Radeon's lockup
handling kicks in when anything calls into the driver from the outside;
if you have a fence wait function that's called from the outside but
doesn't handle lockups, you essentially rely on somebody else calling
another radeon function for the lockup to be resolved.

Christian.

>
> ~Maarten
>

2014-07-23 07:31:53

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 9:26 AM, Christian König
<[email protected]> wrote:
> It's not a locking problem I'm talking about here. Radeons lockup handling
> kicks in when anything calls into the driver from the outside, if you have a
> fence wait function that's called from the outside but doesn't handle
> lockups you essentially rely on somebody else calling another radeon
> function for the lockup to be resolved.

So you don't have a timer in radeon that periodically checks whether
progress is still being made? That's the approach we're using in i915,
together with some tricks to kick any stuck waiters so that we can
reliably step in and grab locks for the reset.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 07:32:29

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 23-07-14 09:15, Christian König schreef:
> Am 23.07.2014 09:09, schrieb Daniel Vetter:
>> On Wed, Jul 23, 2014 at 9:06 AM, Maarten Lankhorst
>> <[email protected]> wrote:
>>>> Can we somehow avoid the need to call fence_signal() at all? The interrupts at least on radeon are way to unreliable for such a thing. Can enable_signalling fail? What's the reason for fence_signaled() in the first place?
>>> It doesn't need to be completely reliable, or finish immediately.
>>>
>>> And any time wake_up_all(&rdev->fence_queue) is called all the fences that were enabled will be rechecked.
>> I raised this already somewhere else, but should we have some common
>> infrastructure in the core fence code to recheck fences periodically?
>> radeon doesn't seem to be the only hw where this isn't reliable
>> enough. Of course timer-based rechecking would only work if the driver
>> provides the fence->signalled callback to recheck actual fence state.
>
> Yeah, agree. The proposal won't work reliable at all with radeon.
>
> Interrupts are accumulated before sending them to the CPU, e.g. you can get one interrupt for multiple fences finished. If it's just the interrupt for the last fence submitted that gets lost you are completely screwed up because you won't get another interrupt.
>
> I had that problem multiple times while working on UVD support, resulting in the driver thinking that it can't submit more jobs because non of the interrupts for the already submitted fence cam through.
Yeah, but all the fences for which .enable_signaling has been called will get signaled from a single interrupt, or whenever any waiter calls radeon_fence_process.

> Apart from that interrupts on Macs usually don't work at all, so we really need a solution where calling fence_signaled() is completely optional.
I haven't had a problem with interrupts on my mbp after d1f9809ed1315c4cdc5760cf2f59626fd3276952, but it should be trivial to start a timer that periodically does wake_up_all and gets its timeout reset in a call to radeon_fence_process. It could either be added as a work item or as a normal timer (disabled during gpu lockup recovery to prevent the checks from fiddling with things they shouldn't).
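
A rough sketch of such a fallback poller, done as a delayed work item; all example_* names are made up, and locking around 'active' is omitted:

#include <linux/workqueue.h>
#include <linux/wait.h>
#include <linux/jiffies.h>

#define EXAMPLE_FENCE_POLL_JIFFIES	(HZ / 2)

struct example_fence_poller {
	struct delayed_work work;
	wait_queue_head_t *fence_queue;
	bool active;			/* cleared around lockup recovery */
};

/* Push the next check further out; also called from the driver's fence
 * processing path so the poll only fires when nothing else made progress. */
static void example_fence_poll_rearm(struct example_fence_poller *p)
{
	if (p->active)
		mod_delayed_work(system_wq, &p->work, EXAMPLE_FENCE_POLL_JIFFIES);
}

static void example_fence_poll_work(struct work_struct *work)
{
	struct example_fence_poller *p =
		container_of(work, struct example_fence_poller, work.work);

	/* Waking the queue makes all waiters recheck their fences even if an
	 * interrupt got lost. */
	wake_up_all(p->fence_queue);
	example_fence_poll_rearm(p);
}

static void example_fence_poll_start(struct example_fence_poller *p,
				     wait_queue_head_t *fence_queue)
{
	p->fence_queue = fence_queue;
	p->active = true;
	INIT_DELAYED_WORK(&p->work, example_fence_poll_work);
	example_fence_poll_rearm(p);
}

static void example_fence_poll_stop(struct example_fence_poller *p)
{
	p->active = false;
	cancel_delayed_work_sync(&p->work);
}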

~Maarten

2014-07-23 07:37:20

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 09:31, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 9:26 AM, Christian König
> <[email protected]> wrote:
>> It's not a locking problem I'm talking about here. Radeons lockup handling
>> kicks in when anything calls into the driver from the outside, if you have a
>> fence wait function that's called from the outside but doesn't handle
>> lockups you essentially rely on somebody else calling another radeon
>> function for the lockup to be resolved.
> So you don't have a timer in radeon that periodically checks whether
> progress is still being made? That's the approach we're using in i915,
> together with some tricks to kick any stuck waiters so that we can
> reliably step in and grab locks for the reset.

We tried this approach, but it didn't work at all.

I already considered trying it again because of the upcoming fence
implementation, but on reflection: when a driver is forced to change
its handling because of the fence implementation, that's just another
hint that there is something wrong here.

Christian.

> -Daniel

2014-07-23 07:42:11

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 09:32, schrieb Maarten Lankhorst:
> op 23-07-14 09:15, Christian König schreef:
>> Am 23.07.2014 09:09, schrieb Daniel Vetter:
>>> On Wed, Jul 23, 2014 at 9:06 AM, Maarten Lankhorst
>>> <[email protected]> wrote:
>>>>> Can we somehow avoid the need to call fence_signal() at all? The interrupts at least on radeon are way to unreliable for such a thing. Can enable_signalling fail? What's the reason for fence_signaled() in the first place?
>>>> It doesn't need to be completely reliable, or finish immediately.
>>>>
>>>> And any time wake_up_all(&rdev->fence_queue) is called all the fences that were enabled will be rechecked.
>>> I raised this already somewhere else, but should we have some common
>>> infrastructure in the core fence code to recheck fences periodically?
>>> radeon doesn't seem to be the only hw where this isn't reliable
>>> enough. Of course timer-based rechecking would only work if the driver
>>> provides the fence->signalled callback to recheck actual fence state.
>> Yeah, agree. The proposal won't work reliable at all with radeon.
>>
>> Interrupts are accumulated before sending them to the CPU, e.g. you can get one interrupt for multiple fences finished. If it's just the interrupt for the last fence submitted that gets lost you are completely screwed up because you won't get another interrupt.
>>
>> I had that problem multiple times while working on UVD support, resulting in the driver thinking that it can't submit more jobs because non of the interrupts for the already submitted fence cam through.
> Yeah but all the fences that have .enable_signaling will get signaled from a single interrupt, or when any waiter calls radeon_fence_process.

You still need to check if the fence is really signaled, because
radeon_fence_process might wake up the wait queue because of something
completely different.

Apart from that, you once again rely on somebody else calling
radeon_fence_process. This will probably work most of the time, but
it's not 100% reliable.

>
>> Apart from that interrupts on Macs usually don't work at all, so we really need a solution where calling fence_signaled() is completely optional.
> I haven't had a problem with interrupts on my mbp after d1f9809ed1315c4cdc5760cf2f59626fd3276952, but it should be trivial to start a timer that periodically does wake_up_all, and gets its timeout reset in a call to radeon_fence_process. It could either be added as a work item, or as a normal timer (disabled during gpu lockup recovery to prevent checks from fiddling with things it shouldn't).

That will probably work, but it again sounds like we are forcing the
driver to fit the fence implementation instead of the other way around.

What is that fence_signaled() call needed for in the first place?

Christian.

>
> ~Maarten
>

2014-07-23 07:52:06

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 23-07-14 09:37, Christian König schreef:
> Am 23.07.2014 09:31, schrieb Daniel Vetter:
>> On Wed, Jul 23, 2014 at 9:26 AM, Christian König
>> <[email protected]> wrote:
>>> It's not a locking problem I'm talking about here. Radeons lockup handling
>>> kicks in when anything calls into the driver from the outside, if you have a
>>> fence wait function that's called from the outside but doesn't handle
>>> lockups you essentially rely on somebody else calling another radeon
>>> function for the lockup to be resolved.
>> So you don't have a timer in radeon that periodically checks whether
>> progress is still being made? That's the approach we're using in i915,
>> together with some tricks to kick any stuck waiters so that we can
>> reliably step in and grab locks for the reset.
>
> We tried this approach, but it didn't worked at all.
>
> I already considered trying it again because of the upcoming fence implementation, but reconsidering that when a driver is forced to change it's handling because of the fence implementation that's just another hint that there is something wrong here.
As far as I can tell it wouldn't need to be reworked for the current fence implementation, only once you want to allow callers outside of radeon. :-)
Doing a GPU lockup recovery in the wait function would be messy even right now; you would hit a deadlock in ttm_bo_delayed_delete -> ttm_bo_cleanup_refs_and_unlock.

Regardless of the fence implementation, why would it be a good idea to do a full lockup recovery when some other driver is
calling your wait function? That doesn't seem to be a nice thing to do, so I think a timeout is the best error you could return here;
other drivers have to deal with that anyway.

~Maarten

2014-07-23 07:58:39

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

> Regardless of the fence implementation, why would it be a good idea to do a full lockup recovery when some other driver is
> calling your wait function? That doesn't seem to be a nice thing to do, so I think a timeout is the best error you could return here,
> other drivers have to deal with that anyway.
The problem is that we need to guarantee that the lockup will be
resolved eventually.

Just imagine an application using prime is locking up Radeon and because
of that gets killed by the user. Nothing else in the system would use
the Radeon hardware any more, and so radeon only gets called by another
driver waiting patiently for radeon to finish rendering, which never
happens because the whole thing is locked up and we don't get a chance
to recover.

Christian.

Am 23.07.2014 09:51, schrieb Maarten Lankhorst:
> op 23-07-14 09:37, Christian König schreef:
>> Am 23.07.2014 09:31, schrieb Daniel Vetter:
>>> On Wed, Jul 23, 2014 at 9:26 AM, Christian König
>>> <[email protected]> wrote:
>>>> It's not a locking problem I'm talking about here. Radeons lockup handling
>>>> kicks in when anything calls into the driver from the outside, if you have a
>>>> fence wait function that's called from the outside but doesn't handle
>>>> lockups you essentially rely on somebody else calling another radeon
>>>> function for the lockup to be resolved.
>>> So you don't have a timer in radeon that periodically checks whether
>>> progress is still being made? That's the approach we're using in i915,
>>> together with some tricks to kick any stuck waiters so that we can
>>> reliably step in and grab locks for the reset.
>> We tried this approach, but it didn't worked at all.
>>
>> I already considered trying it again because of the upcoming fence implementation, but reconsidering that when a driver is forced to change it's handling because of the fence implementation that's just another hint that there is something wrong here.
> As far as I can tell it wouldn't need to be reworked for the fence implementation currently, only the moment you want to allow callers outside of radeon. :-)
> Doing a GPU lockup recovery in the wait function would be messy even right now, you would hit a deadlock in ttm_bo_delayed_delete -> ttm_bo_cleanup_refs_and_unlock.
>
> Regardless of the fence implementation, why would it be a good idea to do a full lockup recovery when some other driver is
> calling your wait function? That doesn't seem to be a nice thing to do, so I think a timeout is the best error you could return here,
> other drivers have to deal with that anyway.
>
> ~Maarten
>

2014-07-23 08:01:59

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 9:37 AM, Christian König
<[email protected]> wrote:
> Am 23.07.2014 09:31, schrieb Daniel Vetter:
>> On Wed, Jul 23, 2014 at 9:26 AM, Christian König
>> <[email protected]> wrote:
>>>
>>> It's not a locking problem I'm talking about here. Radeons lockup
>>> handling
>>> kicks in when anything calls into the driver from the outside, if you
>>> have a
>>> fence wait function that's called from the outside but doesn't handle
>>> lockups you essentially rely on somebody else calling another radeon
>>> function for the lockup to be resolved.
>>
>> So you don't have a timer in radeon that periodically checks whether
>> progress is still being made? That's the approach we're using in i915,
>> together with some tricks to kick any stuck waiters so that we can
>> reliably step in and grab locks for the reset.
>
>
> We tried this approach, but it didn't worked at all.
>
> I already considered trying it again because of the upcoming fence
> implementation, but reconsidering that when a driver is forced to change
> it's handling because of the fence implementation that's just another hint
> that there is something wrong here.

Out of curiosity: What's the blocker for using a timer/scheduled work
to reset radeon? Getting this right on i915 has been fairly tricky and
we now have an elaborate multi-stage state machine to get the driver
through a reset. So always interested in different solutions.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 08:07:20

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 9:58 AM, Christian König
<[email protected]> wrote:
> Just imagine an application using prime is locking up Radeon and because of
> that gets killed by the user. Nothing else in the system would use the
> Radeon hardware any more and so radeon gets only called by another driver
> waiting patiently for radeon to finish rendering which never happens because
> the whole thing is locked up and we don't get a chance to recover.

But isn't that possible already without fences? X hangs radeon, user
crashes X for unrelated reasons before radeon will notice the hang.
Then no one uses radeon any longer and the hang stays undetected.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 08:20:18

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 10:07, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 9:58 AM, Christian König
> <[email protected]> wrote:
>> Just imagine an application using prime is locking up Radeon and because of
>> that gets killed by the user. Nothing else in the system would use the
>> Radeon hardware any more and so radeon gets only called by another driver
>> waiting patiently for radeon to finish rendering which never happens because
>> the whole thing is locked up and we don't get a chance to recover.
> But isn't that possible already without fences? X hangs radeon, user
> crashes X for unrelated reasons before radeon will notice the hang.
> Then no one uses radeon any longer and the hang stays undetected.

Yeah, especially with multimedia applications. But I don't really care
about this problem, because the next time an application tries to use the
block in question we actually do the reset and everything is fine.

In your example we would do the reset when the next X server starts,
before that point nobody would care because nobody uses the hardware.

An additional problem here is that resets are something perfectly normal
for radeon. For example UVD can "crash" when you feed it invalid
bitstream data (ok, actually it sends an interrupt and stops any
processing so that the driver can investigate). To continue processing
you need to go through a rather complicated reset procedure.

Christian.

> -Daniel

2014-07-23 08:26:14

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 23-07-14 10:20, Christian König schreef:
> Am 23.07.2014 10:07, schrieb Daniel Vetter:
>> On Wed, Jul 23, 2014 at 9:58 AM, Christian König
>> <[email protected]> wrote:
>>> Just imagine an application using prime is locking up Radeon and because of
>>> that gets killed by the user. Nothing else in the system would use the
>>> Radeon hardware any more and so radeon gets only called by another driver
>>> waiting patiently for radeon to finish rendering which never happens because
>>> the whole thing is locked up and we don't get a chance to recover.
>> But isn't that possible already without fences? X hangs radeon, user
>> crashes X for unrelated reasons before radeon will notice the hang.
>> Then no one uses radeon any longer and the hang stays undetected.
>
> Yeah, especially with multimedia application. But I don't really care about this problem because the next time an application tries to use the block in question we actually do the reset and everything is fine.
>
> In your example we would do the reset when the next X server starts, before that point nobody would care because nobody uses the hardware.
>
> An additional problem here is that resets are something perfect normal for radeon. For example UVD can "crash" when you feed it with invalid bitstream data, (ok actually it send an interrupt and stops any processing for the driver to investigate). To continue processing you need to go through a rather complicated reset procedure.
In this case, if the sync was to i915, the i915 lockup procedure would take care of itself. It wouldn't fix radeon, but it would at least unblock your Intel card again. I haven't specifically added a special case to attempt to unblock external fences, but I've considered it. :-)

~Maarten

2014-07-23 08:32:11

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 10:01, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 9:37 AM, Christian König
> <[email protected]> wrote:
>> Am 23.07.2014 09:31, schrieb Daniel Vetter:
>>> On Wed, Jul 23, 2014 at 9:26 AM, Christian König
>>> <[email protected]> wrote:
>>>> It's not a locking problem I'm talking about here. Radeons lockup
>>>> handling
>>>> kicks in when anything calls into the driver from the outside, if you
>>>> have a
>>>> fence wait function that's called from the outside but doesn't handle
>>>> lockups you essentially rely on somebody else calling another radeon
>>>> function for the lockup to be resolved.
>>> So you don't have a timer in radeon that periodically checks whether
>>> progress is still being made? That's the approach we're using in i915,
>>> together with some tricks to kick any stuck waiters so that we can
>>> reliably step in and grab locks for the reset.
>>
>> We tried this approach, but it didn't worked at all.
>>
>> I already considered trying it again because of the upcoming fence
>> implementation, but reconsidering that when a driver is forced to change
>> it's handling because of the fence implementation that's just another hint
>> that there is something wrong here.
> Out of curiosity: What's the blocker for using a timer/scheduled work
> to reset radeon? Getting this right on i915 has been fairly tricky and
> we now have an elaborate multi-stage state machine to get the driver
> through a reset. So always interested in different solutions.

IIRC we would have needed a quite advanced multi-stage state machine as
well, and that was just too much overhead at this point.

One major problem was the power management in use back then, but that
got replaced by DPM in the meantime. So it might be a good idea to try
again.

What we currently do is mark the driver as "needs reset" and
return -EAGAIN, and then the next IOCTL starts the reset procedure
before doing anything else.
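
For illustration, the mark-and-retry scheme could look roughly like this; the exdev_* names and fields are invented, not the actual radeon code:

#include <linux/errno.h>
#include <linux/mutex.h>

struct exdev {
	struct mutex reset_mutex;
	bool needs_reset;	/* set by whoever detects the lockup */
};

static int exdev_gpu_reset(struct exdev *dev)
{
	/* Driver-specific recovery: save ring contents, reinitialize the
	 * blocks, resubmit or force-signal pending fences, ... */
	return 0;
}

/* Called at the top of every ioctl, before touching the hardware. */
static int exdev_ioctl_prologue(struct exdev *dev)
{
	int r = 0;

	mutex_lock(&dev->reset_mutex);
	if (dev->needs_reset) {
		r = exdev_gpu_reset(dev);
		if (!r)
			dev->needs_reset = false;
	}
	mutex_unlock(&dev->reset_mutex);

	return r;
}

/* In the wait path, when a timeout suggests a lockup, the driver would do
 * roughly:
 *
 *	dev->needs_reset = true;
 *	return -EAGAIN;		(the caller retries, which triggers the reset)
 */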

Christian.

> -Daniel

2014-07-23 08:42:18

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 10:25 AM, Maarten Lankhorst
<[email protected]> wrote:
> In this case if the sync was to i915 the i915 lockup procedure would take care of itself. It wouldn't fix radeon, but it would at least unblock your intel card again. I haven't specifically added a special case to attempt to unblock external fences, but I've considered it. :-)

Actually the i915 reset stuff relies crucially on being able to kick
all waiters holding driver locks. Since the current fence code only
exposes an opaque wait function without exposing the underlying wait
queue, we won't be able to sleep on both the fence queue and the reset
queue. So it would pose a problem if we added fence_wait calls to our
driver.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 08:46:39

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 10:42, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 10:25 AM, Maarten Lankhorst
> <[email protected]> wrote:
>> In this case if the sync was to i915 the i915 lockup procedure would take care of itself. It wouldn't fix radeon, but it would at least unblock your intel card again. I haven't specifically added a special case to attempt to unblock external fences, but I've considered it. :-)
> Actually the i915 reset stuff relies crucially on being able to kick
> all waiters holding driver locks. Since the current fence code only
> exposes an opaque wait function without exposing the underlying wait
> queue we won't be able to sleep on both the fence queue and the reset
> queue. So would pose a problem if we add fence_wait calls to our
> driver.

And apart from that, I really think that I misunderstood Maarten. But his
explanation sounds like i915 would do a reset because Radeon is locked
up, right?

Well, if that's really the case then I would question the interface even
more, because that is really nonsense.

Christian.

> -Daniel

2014-07-23 08:55:05

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 10:46 AM, Christian König
<[email protected]> wrote:
> Am 23.07.2014 10:42, schrieb Daniel Vetter:
>
>> On Wed, Jul 23, 2014 at 10:25 AM, Maarten Lankhorst
>> <[email protected]> wrote:
>>>
>>> In this case if the sync was to i915 the i915 lockup procedure would take
>>> care of itself. It wouldn't fix radeon, but it would at least unblock your
>>> intel card again. I haven't specifically added a special case to attempt to
>>> unblock external fences, but I've considered it. :-)
>>
>> Actually the i915 reset stuff relies crucially on being able to kick
>> all waiters holding driver locks. Since the current fence code only
>> exposes an opaque wait function without exposing the underlying wait
>> queue we won't be able to sleep on both the fence queue and the reset
>> queue. So would pose a problem if we add fence_wait calls to our
>> driver.
>
>
> And apart from that I really think that I misunderstood Maarten. But his
> explanation sounds like i915 would do a reset because Radeon is locked up,
> right?
>
> Well if that's really the case then I would question the interface even
> more, cause that is really nonsense.

I disagree - the entire point of fences is that we can do multi-gpu
work asynchronously. So by the time we notice that radeon's dead we
have accepted the batch from userspace already. The only way to get
rid of it again is through our reset machinery, which also tells
userspace that we couldn't execute the batch. Whether we actually need
to do a hw reset depends upon whether we've committed the batch to the
hw already. Atm that's always the case, but the scheduler will change
that. So I have no issues with intel doing a reset when other drivers
don't signal fences.

Also this isn't a problem with the interface really, but with the
current implementation for radeon. And getting cross-driver reset
notifications right will require more work either way.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 09:27:58

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 10:54, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 10:46 AM, Christian König
> <[email protected]> wrote:
>> Am 23.07.2014 10:42, schrieb Daniel Vetter:
>>
>>> On Wed, Jul 23, 2014 at 10:25 AM, Maarten Lankhorst
>>> <[email protected]> wrote:
>>>> In this case if the sync was to i915 the i915 lockup procedure would take
>>>> care of itself. It wouldn't fix radeon, but it would at least unblock your
>>>> intel card again. I haven't specifically added a special case to attempt to
>>>> unblock external fences, but I've considered it. :-)
>>> Actually the i915 reset stuff relies crucially on being able to kick
>>> all waiters holding driver locks. Since the current fence code only
>>> exposes an opaque wait function without exposing the underlying wait
>>> queue we won't be able to sleep on both the fence queue and the reset
>>> queue. So would pose a problem if we add fence_wait calls to our
>>> driver.
>>
>> And apart from that I really think that I misunderstood Maarten. But his
>> explanation sounds like i915 would do a reset because Radeon is locked up,
>> right?
>>
>> Well if that's really the case then I would question the interface even
>> more, cause that is really nonsense.
> I disagree - the entire point of fences is that we can do multi-gpu
> work asynchronously. So by the time we'll notice that radeon's dead we
> have accepted the batch from userspace already. The only way to get
> rid of it again is through our reset machinery, which also tells
> userspace that we couldn't execute the batch. Whether we actually need
> to do a hw reset depends upon whether we've committed the batch to the
> hw already. Atm that's always the case, but the scheduler will change
> that. So I have no issues with intel doing a reset when other drivers
> don't signal fences.

You submit a job to the hardware and then block the job to wait for
radeon to be finished? Well, then this would indeed require a hardware
reset, but wouldn't that make the whole problem even worse?

I mean currently we block one userspace process to wait for other
hardware to be finished with a buffer, but what you are describing here
blocks the whole hardware to wait for other hardware, which in the end
blocks all userspace processes accessing the hardware.

Talking about alternative approaches wouldn't it be simpler to just
offload the waiting to a different kernel or userspace thread?

Christian.

>
> Also this isn't a problem with the interface really, but with the
> current implementation for radeon. And getting cross-driver reset
> notifications right will require more work either way.
> -Daniel

2014-07-23 09:30:46

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 11:27 AM, Christian König
<[email protected]> wrote:
> You submit a job to the hardware and then block the job to wait for radeon
> to be finished? Well than this would indeed require a hardware reset, but
> wouldn't that make the whole problem even worse?
>
> I mean currently we block one userspace process to wait for other hardware
> to be finished with a buffer, but what you are describing here blocks the
> whole hardware to wait for other hardware which in the end blocks all
> userspace process accessing the hardware.

There is nothing new here with prime - if one context hangs the gpu it
blocks everyone else on i915.

> Talking about alternative approaches wouldn't it be simpler to just offload
> the waiting to a different kernel or userspace thread?

Well this is exactly what we'll do once we have the scheduler. But
this is an orthogonal issue imo.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 09:36:57

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 11:30, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 11:27 AM, Christian König
> <[email protected]> wrote:
>> You submit a job to the hardware and then block the job to wait for radeon
>> to be finished? Well than this would indeed require a hardware reset, but
>> wouldn't that make the whole problem even worse?
>>
>> I mean currently we block one userspace process to wait for other hardware
>> to be finished with a buffer, but what you are describing here blocks the
>> whole hardware to wait for other hardware which in the end blocks all
>> userspace process accessing the hardware.
> There is nothing new here with prime - if one context hangs the gpu it
> blocks everyone else on i915.
>
>> Talking about alternative approaches wouldn't it be simpler to just offload
>> the waiting to a different kernel or userspace thread?
> Well this is exactly what we'll do once we have the scheduler. But
> this is an orthogonal issue imo.

Mhm, could we have the scheduler first?

Because that sounds like reducing the necessary fence interface to just a
fence->wait function.

Christian.

> -Daniel

2014-07-23 09:38:34

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 23-07-14 11:36, Christian König schreef:
> Am 23.07.2014 11:30, schrieb Daniel Vetter:
>> On Wed, Jul 23, 2014 at 11:27 AM, Christian König
>> <[email protected]> wrote:
>>> You submit a job to the hardware and then block the job to wait for radeon
>>> to be finished? Well than this would indeed require a hardware reset, but
>>> wouldn't that make the whole problem even worse?
>>>
>>> I mean currently we block one userspace process to wait for other hardware
>>> to be finished with a buffer, but what you are describing here blocks the
>>> whole hardware to wait for other hardware which in the end blocks all
>>> userspace process accessing the hardware.
>> There is nothing new here with prime - if one context hangs the gpu it
>> blocks everyone else on i915.
>>
>>> Talking about alternative approaches wouldn't it be simpler to just offload
>>> the waiting to a different kernel or userspace thread?
>> Well this is exactly what we'll do once we have the scheduler. But
>> this is an orthogonal issue imo.
>
> Mhm, could have the scheduler first?
>
> Cause that sounds like reducing the necessary fence interface to just a fence->wait function.
You would also lose benefits like having a 'perf timechart' for GPUs.

~Maarten

2014-07-23 09:39:15

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 11:36 AM, Christian König
<[email protected]> wrote:
> Am 23.07.2014 11:30, schrieb Daniel Vetter:
>
>> On Wed, Jul 23, 2014 at 11:27 AM, Christian König
>> <[email protected]> wrote:
>>>
>>> You submit a job to the hardware and then block the job to wait for
>>> radeon
>>> to be finished? Well than this would indeed require a hardware reset, but
>>> wouldn't that make the whole problem even worse?
>>>
>>> I mean currently we block one userspace process to wait for other
>>> hardware
>>> to be finished with a buffer, but what you are describing here blocks the
>>> whole hardware to wait for other hardware which in the end blocks all
>>> userspace process accessing the hardware.
>>
>> There is nothing new here with prime - if one context hangs the gpu it
>> blocks everyone else on i915.
>>
>>> Talking about alternative approaches wouldn't it be simpler to just
>>> offload
>>> the waiting to a different kernel or userspace thread?
>>
>> Well this is exactly what we'll do once we have the scheduler. But
>> this is an orthogonal issue imo.
>
>
> Mhm, could have the scheduler first?
>
> Cause that sounds like reducing the necessary fence interface to just a
> fence->wait function.

The scheduler needs to keep track of a lot of fences, so I think we'll
have to register callbacks, not a simple wait function. We must keep
track of all the non-i915 fences for all outstanding batches. Also, the
scheduler doesn't eliminate the hw queue, it only keeps it much shorter so
that we can sneak in higher-priority things.

Really, scheduler or not is orthogonal.
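
A sketch of what such callback-based dependency tracking could look like, using fence_add_callback; the sched_job_* names are invented and this is not an actual i915 scheduler design:

#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/fence.h>
#include <linux/slab.h>

struct sched_job;

struct sched_dep {
	struct fence_cb cb;
	struct sched_job *job;
};

struct sched_job {
	/* Starts at 1; sched_job_commit() drops the initial count once all
	 * dependencies have been registered. */
	atomic_t deps_pending;
	void (*run)(struct sched_job *job);	/* queue the batch to the hw ring */
};

/* May be called from the signaling driver's irq context, so ->run() has to
 * be atomic-safe or defer to a workqueue. */
static void sched_dep_signaled(struct fence *fence, struct fence_cb *cb)
{
	struct sched_dep *dep = container_of(cb, struct sched_dep, cb);
	struct sched_job *job = dep->job;

	kfree(dep);
	if (atomic_dec_and_test(&job->deps_pending))
		job->run(job);
}

static int sched_job_add_dependency(struct sched_job *job, struct fence *fence)
{
	struct sched_dep *dep = kzalloc(sizeof(*dep), GFP_KERNEL);
	int r;

	if (!dep)
		return -ENOMEM;

	dep->job = job;
	atomic_inc(&job->deps_pending);

	r = fence_add_callback(fence, &dep->cb, sched_dep_signaled);
	if (r) {	/* -ENOENT: already signaled, nothing to wait for */
		atomic_dec(&job->deps_pending);
		kfree(dep);
	}
	return r == -ENOENT ? 0 : r;
}

static void sched_job_commit(struct sched_job *job)
{
	if (atomic_dec_and_test(&job->deps_pending))
		job->run(job);
}
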
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 09:40:04

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 11:38, schrieb Maarten Lankhorst:
> op 23-07-14 11:36, Christian König schreef:
>> Am 23.07.2014 11:30, schrieb Daniel Vetter:
>>> On Wed, Jul 23, 2014 at 11:27 AM, Christian König
>>> <[email protected]> wrote:
>>>> You submit a job to the hardware and then block the job to wait for radeon
>>>> to be finished? Well than this would indeed require a hardware reset, but
>>>> wouldn't that make the whole problem even worse?
>>>>
>>>> I mean currently we block one userspace process to wait for other hardware
>>>> to be finished with a buffer, but what you are describing here blocks the
>>>> whole hardware to wait for other hardware which in the end blocks all
>>>> userspace process accessing the hardware.
>>> There is nothing new here with prime - if one context hangs the gpu it
>>> blocks everyone else on i915.
>>>
>>>> Talking about alternative approaches wouldn't it be simpler to just offload
>>>> the waiting to a different kernel or userspace thread?
>>> Well this is exactly what we'll do once we have the scheduler. But
>>> this is an orthogonal issue imo.
>> Mhm, could have the scheduler first?
>>
>> Cause that sounds like reducing the necessary fence interface to just a fence->wait function.
> You would also lose benefits like having a 'perf timechart' for gpu's.

I can live with that, if it reduces the complexity of the fence interface.

Christian.

>
> ~Maarten
>

2014-07-23 09:44:27

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 11:39 AM, Daniel Vetter <[email protected]> wrote:
> The scheduler needs to keep track of a lot of fences, so I think we'll
> have to register callbacks, not a simple wait function. We must keep
> track of all the non-i915 fences for all oustanding batches. Also, the
> scheduler doesn't eliminate the hw queue, only keep it much slower so
> that we can sneak in higher priority things.
>
> Really, scheduler or not is orthogonal.

Also see my other comment about interactions between wait_fence and
the i915 reset logic. We can't actually use it from within the
scheduler code since that would deadlock.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 09:47:18

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Tue, Jul 22, 2014 at 9:14 PM, Jesse Barnes <[email protected]> wrote:
>> We don't have the code yet ready, but that's the direction i915 will
>> move towards for the near future. Jesse is working on some patches
>> already.
>
> Yeah I'd like to get some feedback from Maarten on my bits so I can get
> them ready for upstream. I still need to add documentation and tests,
> but I'd like to make sure the interfaces and internals get acked first.

Review works better if you supply a pointer to the patches ;-) I asked
Maarten whether he looked at it and he said he didn't know where ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 09:48:09

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 11:44, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 11:39 AM, Daniel Vetter <[email protected]> wrote:
>> The scheduler needs to keep track of a lot of fences, so I think we'll
>> have to register callbacks, not a simple wait function. We must keep
>> track of all the non-i915 fences for all oustanding batches. Also, the
>> scheduler doesn't eliminate the hw queue, only keep it much slower so
>> that we can sneak in higher priority things.
>>
>> Really, scheduler or not is orthogonal.
> Also see my other comment about interactions between wait_fence and
> the i915 reset logic. We can't actually use it from within the
> scheduler code since that would deadlock.

Yeah, I see. You would need some way to abort the waiting on other
devices' fences in case of a lockup.

What about a userspace thread to offload waiting and command submission to?

Just playing with ideas right now,
Christian.

> -Daniel

2014-07-23 09:52:59

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 11:47 AM, Christian König
<[email protected]> wrote:
> Am 23.07.2014 11:44, schrieb Daniel Vetter:
>> On Wed, Jul 23, 2014 at 11:39 AM, Daniel Vetter <[email protected]>
>> wrote:
>>>
>>> The scheduler needs to keep track of a lot of fences, so I think we'll
>>> have to register callbacks, not a simple wait function. We must keep
>>> track of all the non-i915 fences for all oustanding batches. Also, the
>>> scheduler doesn't eliminate the hw queue, only keep it much slower so
>>> that we can sneak in higher priority things.
>>>
>>> Really, scheduler or not is orthogonal.
>>
>> Also see my other comment about interactions between wait_fence and
>> the i915 reset logic. We can't actually use it from within the
>> scheduler code since that would deadlock.
>
>
> Yeah, I see. You would need some way to abort the waiting on other devices
> fences in case of a lockup.
>
> What about an userspace thread to offload waiting and command submission to?

That's what your android guys currently do. They hate it. And google
explicitly created their syncpts stuff to move all that into the
kernel. That one does explicit fencing, but the end result is still
that you have fences as deps between different drivers.

The other problem is that dri/prime is running under an implicitly
sync'ed model, so there's no clear point/responsibility for who would
actually do the waiting. You'll end up with synchronous behaviour
since the rendering sooner or later needs to align perfectly with
client/compositor IPC.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 09:55:39

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 23-07-14 11:47, Christian König schreef:
> Am 23.07.2014 11:44, schrieb Daniel Vetter:
>> On Wed, Jul 23, 2014 at 11:39 AM, Daniel Vetter <[email protected]> wrote:
>>> The scheduler needs to keep track of a lot of fences, so I think we'll
>>> have to register callbacks, not a simple wait function. We must keep
>>> track of all the non-i915 fences for all oustanding batches. Also, the
>>> scheduler doesn't eliminate the hw queue, only keep it much slower so
>>> that we can sneak in higher priority things.
>>>
>>> Really, scheduler or not is orthogonal.
>> Also see my other comment about interactions between wait_fence and
>> the i915 reset logic. We can't actually use it from within the
>> scheduler code since that would deadlock.
>
> Yeah, I see. You would need some way to abort the waiting on other devices fences in case of a lockup.
>
> What about an userspace thread to offload waiting and command submission to?
You would still need enable_signaling, else polling on the dma-buf wouldn't work. ;-)
You can't wait synchronously on multiple shared fences; you need to poll for that.
And the dma-buf would still have fences belonging to both drivers, so those fences would still be called into from outside their driver.
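
(As a rough illustration, and not code from this series: the sketch below
shows why a poll-style waiter depends on ->enable_signaling. Only
fence_add_callback()/struct fence_cb are from the proposed fence API; the
shared_wait structures and function names are made up.)

#include <linux/fence.h>
#include <linux/completion.h>
#include <linux/atomic.h>

struct shared_wait {
	struct completion done;
	atomic_t pending;		/* fences still outstanding */
};

struct shared_wait_cb {
	struct fence_cb base;
	struct shared_wait *wait;
};

/* May run from the producer's irq handler, so only do atomic work here. */
static void shared_wait_cb_func(struct fence *fence, struct fence_cb *cb)
{
	struct shared_wait_cb *scb = container_of(cb, struct shared_wait_cb, base);

	if (atomic_dec_and_test(&scb->wait->pending))
		complete(&scb->wait->done);
}

static void wait_shared_fences(struct fence **fences,
			       struct shared_wait_cb *cbs, unsigned count)
{
	struct shared_wait wait;
	unsigned i;

	if (!count)
		return;

	init_completion(&wait.done);
	atomic_set(&wait.pending, count);

	for (i = 0; i < count; i++) {
		cbs[i].wait = &wait;
		/* fence_add_callback() invokes ->enable_signaling(); a
		 * non-zero return means the fence already signaled. */
		if (fence_add_callback(fences[i], &cbs[i].base,
				       shared_wait_cb_func) &&
		    atomic_dec_and_test(&wait.pending))
			complete(&wait.done);
	}

	/* A real implementation would use the interruptible variant and
	 * call fence_remove_callback() on early exit. */
	wait_for_completion(&wait.done);
}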

~Maarten

2014-07-23 10:13:17

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 11:55, schrieb Maarten Lankhorst:
> op 23-07-14 11:47, Christian König schreef:
>> Am 23.07.2014 11:44, schrieb Daniel Vetter:
>>> On Wed, Jul 23, 2014 at 11:39 AM, Daniel Vetter <[email protected]> wrote:
>>>> The scheduler needs to keep track of a lot of fences, so I think we'll
>>>> have to register callbacks, not a simple wait function. We must keep
>>>> track of all the non-i915 fences for all oustanding batches. Also, the
>>>> scheduler doesn't eliminate the hw queue, only keep it much slower so
>>>> that we can sneak in higher priority things.
>>>>
>>>> Really, scheduler or not is orthogonal.
>>> Also see my other comment about interactions between wait_fence and
>>> the i915 reset logic. We can't actually use it from within the
>>> scheduler code since that would deadlock.
>> Yeah, I see. You would need some way to abort the waiting on other devices fences in case of a lockup.
>>
>> What about an userspace thread to offload waiting and command submission to?
> You would still need enable_signaling, else polling on the dma-buf wouldn't work. ;-)
> Can't wait synchronously with multiple shared fences, need to poll for that.

No you don't. Just make a list of the fences you need to wait for and
wait for each one after another. But having a thread for each command
submission context doesn't sound like the best solution anyway.
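
(Just to illustrate that point, assuming the fence_wait() helper from this
series; the function below is made up.) Waiting on each fence in turn is
enough because the total wait is bounded by the slowest fence - the other
fences keep making progress on their hardware while we block on the first:

static int wait_all_fences(struct fence **fences, unsigned count, bool intr)
{
	unsigned i;

	for (i = 0; i < count; i++) {
		long r = fence_wait(fences[i], intr);

		if (r < 0)
			return r;	/* e.g. -ERESTARTSYS when interrupted */
	}

	return 0;
}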

> And the dma-buf would still have fences belonging to both drivers, and it would still call from outside the driver.

Calling from outside the driver is fine as long as the driver can do
everything necessary to complete its work and isn't forced into any
ugly hacks or things that are not 100% reliable.

So I don't see much of an alternative to integrating recovery code for
missed interrupts and some kind of lockup handling into the fence code
as well.

Christian.

>
> ~Maarten
>

2014-07-23 10:52:53

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 12:13 PM, Christian König
<[email protected]> wrote:
>
>> And the dma-buf would still have fences belonging to both drivers, and it
>> would still call from outside the driver.
>
>
> Calling from outside the driver is fine as long as the driver can do
> everything necessary to complete it's work and isn't forced into any ugly
> hacks and things that are not 100% reliable.
>
> So I don't see much other approach as integrating recovery code for not
> firing interrupts and some kind of lockup handling into the fence code as
> well.

That approach doesn't really work all that well, since every driver has
its own reset semantics. And we're trying to move away from global
reset to fine-grained reset, so stop-the-world reset is out of
fashion, at least for i915. As you said, reset is normal on gpus and
we're trying to make it less invasive. I really don't see a point in
imposing a reset scheme upon all drivers, and I think you have about
as much motivation to convert radeon to the scheme used by i915 as
I'll have for converting to the one used by radeon. If it would fit at
all.

I guess for radeon we just have to add tons of insulation by punting
all callbacks to work items so that radeon can do whatever it wants.
Plus start a delayed_work-based fallback when ->enable_signalling is
called, to make sure we work on platforms that lack interrupts.
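
(A minimal sketch of that fallback, with made-up driver names; only
->enable_signaling, schedule_delayed_work() and msecs_to_jiffies() are
existing interfaces.)

#include <linux/fence.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

struct my_device {
	struct delayed_work fence_check_work;	/* polls rings, signals fences */
	/* ... hw state ... */
};

struct my_fence {
	struct fence base;
	struct my_device *dev;
	int ring;
	u32 seq;
};

/* hypothetical driver helpers, assumed to exist elsewhere */
bool my_seq_passed(struct my_device *dev, int ring, u32 seq);
void my_enable_fence_irq(struct my_device *dev, int ring);

static bool my_fence_enable_signaling(struct fence *f)
{
	struct my_fence *fence = container_of(f, struct my_fence, base);
	struct my_device *dev = fence->dev;

	if (my_seq_passed(dev, fence->ring, fence->seq))
		return false;	/* already signaled, nothing to enable */

	my_enable_fence_irq(dev, fence->ring);

	/*
	 * Fallback for unreliable interrupts: recheck from a work item so
	 * a missed irq only delays fence_signal() instead of losing it.
	 * The work handler would rearm itself until all fences passed.
	 */
	schedule_delayed_work(&dev->fence_check_work, msecs_to_jiffies(10));

	return true;
}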
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 12:35:57

by Rob Clark

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 2:52 AM, Christian König
<[email protected]> wrote:
> Am 23.07.2014 08:40, schrieb Maarten Lankhorst:
>
>> op 22-07-14 17:59, Christian König schreef:
>>>
>>> Am 22.07.2014 17:42, schrieb Daniel Vetter:
>>>>
>>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>>> <[email protected]> wrote:
>>>>>
>>>>> Drivers exporting fences need to provide a fence->signaled and a
>>>>> fence->wait
>>>>> function, everything else like fence->enable_signaling or calling
>>>>> fence_signaled() from the driver is optional.
>>>>>
>>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>>> fence->wait in atomic or interrupt context, and not with holding any
>>>>> global
>>>>> locking primitives (like mmap_sem etc...). Holding locking primitives
>>>>> local
>>>>> to the driver is ok, as long as they don't conflict with anything
>>>>> possible
>>>>> used by their own fence implementation.
>>>>
>>>> Well that's almost what we have right now with the exception that
>>>> drivers are allowed (actually must for correctness when updating
>>>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>>>
>>> In this case sorry for so much noise. I really haven't looked in so much
>>> detail into anything but Maarten's Radeon patches.
>>>
>>> But how does that then work right now? My impression was that it's
>>> mandatory for drivers to call fence_signaled()?
>>
>> It's only mandatory to call fence_signal() if the .enable_signaling
>> callback has been called, else you can get away with never calling signaling
>> a fence at all before dropping the last refcount to it.
>> This allows you to keep interrupts disabled when you don't need them.
>
>
> Can we somehow avoid the need to call fence_signal() at all? The interrupts
> at least on radeon are way to unreliable for such a thing. Can
> enable_signalling fail? What's the reason for fence_signaled() in the first
> place?
>

The device you are sharing with may not be able to do hw<->hw
signalling... think about buffer sharing with a camera, for example.

You probably want your ->enable_signalling() to enable whatever
workaround periodic polling you need to do to catch missed irqs (and
then call fence->signal() once you detect the fence has passed).

fwiw, I haven't had a chance to read this whole thread yet, but I
expect that a lot of the SoC devices, especially ones with separate
kms-only display and gpu drivers, will want a callback from the gpu's
irq to bang a few display controller registers. I agree that in general
callbacks from atomic context are probably something you want to avoid,
but in this particular case I think it is worth the extra complexity.
Nothing is stopping a driver from using a callback that just chucks
something on a workqueue, whereas the inverse is not possible.
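
(Rough sketch of that last point, with made-up names; struct fence_cb and
fence_add_callback() are from the proposed API, the workqueue calls are
standard kernel interfaces.)

#include <linux/fence.h>
#include <linux/workqueue.h>

struct flip_pending {
	struct work_struct work;
	struct fence_cb cb;
	/* ... display controller state to program ... */
};

/* Process context: free to take mutexes, sleep, bang registers. */
static void flip_pending_worker(struct work_struct *work)
{
	struct flip_pending *flip = container_of(work, struct flip_pending, work);

	/* program the display controller now that rendering has finished */
	(void)flip;
}

/* Runs from wherever the producer calls fence_signal(), possibly hard irq
 * context, so it only queues the work item and returns. */
static void flip_fence_cb(struct fence *fence, struct fence_cb *cb)
{
	struct flip_pending *flip = container_of(cb, struct flip_pending, cb);

	schedule_work(&flip->work);
}

/* Setup would be roughly:
 *	INIT_WORK(&flip->work, flip_pending_worker);
 *	fence_add_callback(fence, &flip->cb, flip_fence_cb);
 */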

BR,
-R

>
>>>> Agreed that any shared locks are out of the way (especially stuff like
>>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>>> really bad here still).
>>>
>>> Yeah that's also an point I've wanted to note on Maartens patch. Radeon
>>> grabs the read side of it's exclusive semaphore while waiting for fences
>>> (because it assumes that the fence it waits for is a Radeon fence).
>>>
>>> Assuming that we need to wait in both directions with Prime (e.g. Intel
>>> driver needs to wait for Radeon to finish rendering and Radeon needs to wait
>>> for Intel to finish displaying), this might become a perfect example of
>>> locking inversion.
>>
>> In the preliminary patches where I can sync radeon with other GPU's I've
>> been very careful in all the places that call into fences, to make sure that
>> radeon wouldn't try to handle lockups for a different (possibly also radeon)
>> card.
>
>
> That's actually not such a good idea.
>
> In case of a lockup we need to handle the lockup cause otherwise it could
> happen that radeon waits for the lockup to be resolved and the lockup
> handling needs to wait for a fence that's never signaled because of the
> lockup.
>
> Christian.
>
>
>>
>> This is also why fence_is_signaled should never block, and why it trylocks
>> the exclusive_lock. :-) I think lockdep would complain if I grabbed
>> exclusive_lock while blocking in is_signaled.
>>
>>>> So from the core fence framework I think we already have exactly this,
>>>> and we only need to adjust the radeon implementation a bit to make it
>>>> less risky and invasive to the radeon driver logic.
>>>
>>> Agree. Well the biggest problem I see is that exclusive semaphore I need
>>> to take when anything calls into the driver. For the fence code I need to
>>> move that down into the fence->signaled handler, cause that now can be
>>> called from outside the driver.
>>>
>>> Maarten solved this by telling the driver in the lockup handler (where we
>>> grab the write side of the exclusive lock) that all interrupts are already
>>> enabled, so that fence->signaled hopefully wouldn't mess with the hardware
>>> at all. While this probably works, it just leaves me with a feeling that we
>>> are doing something wrong here.
>>
>> There is unfortunately no global mechanism to say 'this card is locked up,
>> please don't call into any of my fences', and I don't associate fences with
>> devices, and radeon doesn't keep a global list of fences.
>> If all of that existed, it would complicate the interface and its callers
>> a lot, while I like to keep things simple.
>> So I did the best I could, and simply prevented the fence calls from
>> fiddling with the hardware. Fortunately gpu lockup is not a common
>> operation. :-)
>>
>> ~Maarten
>>
>>
>
> _______________________________________________
> dri-devel mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

2014-07-23 12:36:20

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Am 23.07.2014 12:52, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 12:13 PM, Christian König
> <[email protected]> wrote:
>>> And the dma-buf would still have fences belonging to both drivers, and it
>>> would still call from outside the driver.
>>
>> Calling from outside the driver is fine as long as the driver can do
>> everything necessary to complete it's work and isn't forced into any ugly
>> hacks and things that are not 100% reliable.
>>
>> So I don't see much other approach as integrating recovery code for not
>> firing interrupts and some kind of lockup handling into the fence code as
>> well.
> That approach doesn't really work at that well since every driver has
> it's own reset semantics. And we're trying to move away from global
> reset to fine-grained reset. So stop-the-world reset is out of
> fashion, at least for i915. As you said, reset is normal in gpus and
> we're trying to make reset less invasive. I really don't see a point
> in imposing a reset scheme upon all drivers and I think you have about
> as much motivation to convert radeon to the scheme used by i915 as
> I'll have for converting to the one used by radeon. If it would fit at
> all.
Oh my! No, I didn't want to suggest any global reset infrastructure.

My idea was more that the fence framework provides a
fence->process_signaling callback that is called periodically after
enable_signaling has been called, to trigger manual signal processing
in the driver.

This would be suitable both as a fallback in case interrupts don't
work and as a chance for any driver to do necessary lockup handling.

Christian.

> I guess for radeon we just have to add tons of insulation by punting
> all callbacks to work items so that radeon can do whatever it wants.
> Plus start a delayed_work based fallback when ->enable_signalling is
> called to make sure we work on platforms that lack interrupts.
> -Daniel

2014-07-23 12:42:10

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, Jul 23, 2014 at 2:36 PM, Christian König
<[email protected]> wrote:
> My idea was more that the fence framework provides a
> fence->process_signaling callback that is periodically called after
> enable_signaling is called to trigger manual signal processing in the
> driver.
>
> This would both be suitable as a fallback in case of not working interrupts
> as well as a chance for any driver to do necessary lockup handling.

Imo that should be an implementation detail of the fence provider. So
in ->enable_signaling radeon needs to arm a delayed work to regularly
check fences and signal them if the irq failed. If it's a common need
we might provide some shared code for this (e.g. a struct
unreliable_fence or so). But this shouldn't be mandatory, since not all
gpus are broken like that.

And if we force other drivers to care about this special case, that imo
leaks the abstraction out of radeon (or any other driver with
unreliable interrupts).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-07-23 13:17:03

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 23-07-14 14:36, Christian König schreef:
> Am 23.07.2014 12:52, schrieb Daniel Vetter:
>> On Wed, Jul 23, 2014 at 12:13 PM, Christian König
>> <[email protected]> wrote:
>>>> And the dma-buf would still have fences belonging to both drivers, and it
>>>> would still call from outside the driver.
>>>
>>> Calling from outside the driver is fine as long as the driver can do
>>> everything necessary to complete it's work and isn't forced into any ugly
>>> hacks and things that are not 100% reliable.
>>>
>>> So I don't see much other approach as integrating recovery code for not
>>> firing interrupts and some kind of lockup handling into the fence code as
>>> well.
>> That approach doesn't really work at that well since every driver has
>> it's own reset semantics. And we're trying to move away from global
>> reset to fine-grained reset. So stop-the-world reset is out of
>> fashion, at least for i915. As you said, reset is normal in gpus and
>> we're trying to make reset less invasive. I really don't see a point
>> in imposing a reset scheme upon all drivers and I think you have about
>> as much motivation to convert radeon to the scheme used by i915 as
>> I'll have for converting to the one used by radeon. If it would fit at
>> all.
> Oh my! No, I didn't wanted to suggest any global reset infrastructure.
>
> My idea was more that the fence framework provides a fence->process_signaling callback that is periodically called after enable_signaling is called to trigger manual signal processing in the driver.
>
> This would both be suitable as a fallback in case of not working interrupts as well as a chance for any driver to do necessary lockup handling.
I managed to do it without needing it to be part of the interface? I'm not sure whether radeon_fence_driver_recheck needs exclusive_lock, but if so it's a small change..

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 7fbfd41479f1..51b646b9c8bb 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -345,6 +345,9 @@ struct radeon_fence_driver {
uint64_t sync_seq[RADEON_NUM_RINGS];
atomic64_t last_seq;
bool initialized;
+ struct delayed_work work;
+ struct radeon_device *rdev;
+ unsigned ring;
};

struct radeon_fence_cb {
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index da83f36dd708..955c825946ad 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -231,6 +231,9 @@ static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
}
} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);

+ if (!wake && last_seq < last_emitted)
+ schedule_delayed_work(&rdev->fence_drv[ring].work, msecs_to_jiffies(10));
+
return wake;
}

@@ -815,6 +818,14 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
return 0;
}

+static void radeon_fence_driver_recheck(struct work_struct *work)
+{
+ struct radeon_fence_driver *drv = container_of(work, struct radeon_fence_driver, work.work);
+
+ DRM_ERROR("omg, working!\n");
+ radeon_fence_process(drv->rdev, drv->ring);
+}
+
/**
* radeon_fence_driver_init_ring - init the fence driver
* for the requested ring.
@@ -836,6 +847,10 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
rdev->fence_drv[ring].sync_seq[i] = 0;
atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
rdev->fence_drv[ring].initialized = false;
+
+ rdev->fence_drv[ring].ring = ring;
+ rdev->fence_drv[ring].rdev = rdev;
+ INIT_DELAYED_WORK(&rdev->fence_drv[ring].work, radeon_fence_driver_recheck);
}

/**
@@ -880,6 +895,7 @@ void radeon_fence_driver_fini(struct radeon_device *rdev)
for (ring = 0; ring < RADEON_NUM_RINGS; ring++) {
if (!rdev->fence_drv[ring].initialized)
continue;
+ cancel_delayed_work_sync(&rdev->fence_drv[ring].work);
r = radeon_fence_wait_empty(rdev, ring);
if (r) {
/* no need to trigger GPU reset as we are unloading */
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 16807afab362..85391ddd3ce9 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -331,7 +331,7 @@ void radeon_irq_kms_sw_irq_get(struct radeon_device *rdev, int ring)
{
unsigned long irqflags;

- if (!rdev->ddev->irq_enabled)
+// if (!rdev->ddev->irq_enabled)
return;

if (atomic_inc_return(&rdev->irq.ring_int[ring]) == 1) {
@@ -355,7 +355,7 @@ void radeon_irq_kms_sw_irq_put(struct radeon_device *rdev, int ring)
{
unsigned long irqflags;

- if (!rdev->ddev->irq_enabled)
+// if (!rdev->ddev->irq_enabled)
return;

if (atomic_dec_and_test(&rdev->irq.ring_int[ring])) {

2014-07-23 14:05:32

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

op 23-07-14 15:16, Maarten Lankhorst schreef:
> op 23-07-14 14:36, Christian König schreef:
>> Am 23.07.2014 12:52, schrieb Daniel Vetter:
>>> On Wed, Jul 23, 2014 at 12:13 PM, Christian König
>>> <[email protected]> wrote:
>>>>> And the dma-buf would still have fences belonging to both drivers, and it
>>>>> would still call from outside the driver.
>>>> Calling from outside the driver is fine as long as the driver can do
>>>> everything necessary to complete it's work and isn't forced into any ugly
>>>> hacks and things that are not 100% reliable.
>>>>
>>>> So I don't see much other approach as integrating recovery code for not
>>>> firing interrupts and some kind of lockup handling into the fence code as
>>>> well.
>>> That approach doesn't really work at that well since every driver has
>>> it's own reset semantics. And we're trying to move away from global
>>> reset to fine-grained reset. So stop-the-world reset is out of
>>> fashion, at least for i915. As you said, reset is normal in gpus and
>>> we're trying to make reset less invasive. I really don't see a point
>>> in imposing a reset scheme upon all drivers and I think you have about
>>> as much motivation to convert radeon to the scheme used by i915 as
>>> I'll have for converting to the one used by radeon. If it would fit at
>>> all.
>> Oh my! No, I didn't wanted to suggest any global reset infrastructure.
>>
>> My idea was more that the fence framework provides a fence->process_signaling callback that is periodically called after enable_signaling is called to trigger manual signal processing in the driver.
>>
>> This would both be suitable as a fallback in case of not working interrupts as well as a chance for any driver to do necessary lockup handling.
> I managed to do it without needing it to be part of the interface? I'm not sure whether radeon_fence_driver_recheck needs exclusive_lock, but if so it's a small change..
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 7fbfd41479f1..51b646b9c8bb 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -345,6 +345,9 @@ struct radeon_fence_driver {
> uint64_t sync_seq[RADEON_NUM_RINGS];
> atomic64_t last_seq;
> bool initialized;
> + struct delayed_work work;
> + struct radeon_device *rdev;
> + unsigned ring;
> };
>
> struct radeon_fence_cb {
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index da83f36dd708..955c825946ad 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -231,6 +231,9 @@ static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
> }
> } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>
> + if (!wake && last_seq < last_emitted)
> + schedule_delayed_work(&rdev->fence_drv[ring].work, msecs_to_jiffies(10));
> +
>
When trying this: if (seq < last_emitted) is probably a better check.

~Maarten

2014-07-23 15:36:46

by Jesse Barnes

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On Wed, 23 Jul 2014 11:47:15 +0200
Daniel Vetter <[email protected]> wrote:

> On Tue, Jul 22, 2014 at 9:14 PM, Jesse Barnes <[email protected]> wrote:
> >> We don't have the code yet ready, but that's the direction i915 will
> >> move towards for the near future. Jesse is working on some patches
> >> already.
> >
> > Yeah I'd like to get some feedback from Maarten on my bits so I can get
> > them ready for upstream. I still need to add documentation and tests,
> > but I'd like to make sure the interfaces and internals get acked first.
>
> Review works better if you supply a pointer to the patches ;-) I asked
> Maarten whether he looked at it and he said he didn't know where ...

Oh, I provided it on IRC earlier; figured Maarten was just busy. :)

Tree is android-dma-buf-i915-fences in my fdo linux repo.

--
Jesse Barnes, Intel Open Source Technology Center

2014-07-24 13:47:55

by Christian König

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

Hi Maarten,

Try to implement this as a replacement for specifying
RADEON_FENCE_JIFFIES_TIMEOUT on the wait_event_* calls, and reset the
timeout every time radeon_fence_process is called, not only when one of
the sequences increments.
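
(A rough sketch of what that could look like; the helper name below is
made up, while mod_delayed_work(), system_wq and
RADEON_FENCE_JIFFIES_TIMEOUT are existing interfaces.)

/*
 * Called from radeon_fence_process(): push the watchdog back on every
 * call, so the delayed work only fires after a full timeout with no
 * progress at all, instead of polling every few milliseconds.
 */
static void radeon_fence_schedule_check(struct radeon_device *rdev, int ring)
{
	mod_delayed_work(system_wq, &rdev->fence_drv[ring].work,
			 RADEON_FENCE_JIFFIES_TIMEOUT);
}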

I don't have the time right now to look deeper into it or help with the
patch, but the general approach sounds valid to me.

Regards,
Christian.

Am 23.07.2014 16:05, schrieb Maarten Lankhorst:
> op 23-07-14 15:16, Maarten Lankhorst schreef:
>> op 23-07-14 14:36, Christian König schreef:
>>> Am 23.07.2014 12:52, schrieb Daniel Vetter:
>>>> On Wed, Jul 23, 2014 at 12:13 PM, Christian König
>>>> <[email protected]> wrote:
>>>>>> And the dma-buf would still have fences belonging to both drivers, and it
>>>>>> would still call from outside the driver.
>>>>> Calling from outside the driver is fine as long as the driver can do
>>>>> everything necessary to complete it's work and isn't forced into any ugly
>>>>> hacks and things that are not 100% reliable.
>>>>>
>>>>> So I don't see much other approach as integrating recovery code for not
>>>>> firing interrupts and some kind of lockup handling into the fence code as
>>>>> well.
>>>> That approach doesn't really work at that well since every driver has
>>>> it's own reset semantics. And we're trying to move away from global
>>>> reset to fine-grained reset. So stop-the-world reset is out of
>>>> fashion, at least for i915. As you said, reset is normal in gpus and
>>>> we're trying to make reset less invasive. I really don't see a point
>>>> in imposing a reset scheme upon all drivers and I think you have about
>>>> as much motivation to convert radeon to the scheme used by i915 as
>>>> I'll have for converting to the one used by radeon. If it would fit at
>>>> all.
>>> Oh my! No, I didn't wanted to suggest any global reset infrastructure.
>>>
>>> My idea was more that the fence framework provides a fence->process_signaling callback that is periodically called after enable_signaling is called to trigger manual signal processing in the driver.
>>>
>>> This would both be suitable as a fallback in case of not working interrupts as well as a chance for any driver to do necessary lockup handling.
>> I managed to do it without needing it to be part of the interface? I'm not sure whether radeon_fence_driver_recheck needs exclusive_lock, but if so it's a small change..
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>> index 7fbfd41479f1..51b646b9c8bb 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -345,6 +345,9 @@ struct radeon_fence_driver {
>> uint64_t sync_seq[RADEON_NUM_RINGS];
>> atomic64_t last_seq;
>> bool initialized;
>> + struct delayed_work work;
>> + struct radeon_device *rdev;
>> + unsigned ring;
>> };
>>
>> struct radeon_fence_cb {
>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>> index da83f36dd708..955c825946ad 100644
>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>> @@ -231,6 +231,9 @@ static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
>> }
>> } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>>
>> + if (!wake && last_seq < last_emitted)
>> + schedule_delayed_work(&rdev->fence_drv[ring].work, msecs_to_jiffies(10));
>> +
>>
> When trying this: if (seq < last_emitted) is probably a better check.
>
> ~Maarten
>