2023-03-08 15:53:40

by Rob Clark

Subject: [PATCH v10 00/15] dma-fence: Deadline awareness

From: Rob Clark <[email protected]>

This series adds a deadline hint to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline. IGT tests utilizing these can be found at:

https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline


v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
support, and sync_file EPOLLPRI based on experience with perf/
freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline
guarantee, switch to using u64 ns values in UABI (still absolute
CLOCK_MONOTONIC values), drop syncobj related cap and driver
feature flag in favor of allowing count_handles==0 for probing
kernel support.
v6: Re-work vblank helper to calculate time of _start_ of vblank,
and work correctly if the last vblank event was more than a
frame ago. Add (mostly unrelated) drm/msm patch which also
uses the vblank helper. Use dma_fence_chain_contained(). More
verbose syncobj UABI comments. Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v7: Fix kbuild complaints about vblank helper. Add more docs.
v8: Add patch to surface sync_file UAPI, and more docs updates.
v9: Drop (E)POLLPRI support. I still like it, but it is not essential and
can always be revived later. Fix doc build warning.
v10: Update 11/15 to handle multiple CRTCs

Rob Clark (15):
dma-buf/dma-fence: Add deadline awareness
dma-buf/fence-array: Add fence deadline support
dma-buf/fence-chain: Add fence deadline support
dma-buf/dma-resv: Add a way to set fence deadline
dma-buf/sync_file: Surface sync-file uABI
dma-buf/sync_file: Add SET_DEADLINE ioctl
dma-buf/sw_sync: Add fence deadline support
drm/scheduler: Add fence deadline support
drm/syncobj: Add deadline support for syncobj waits
drm/vblank: Add helper to get next vblank time
drm/atomic-helper: Set fence deadline for vblank
drm/msm: Add deadline based boost support
drm/msm: Add wait-boost support
drm/msm/atomic: Switch to vblank_start helper
drm/i915: Add deadline based boost support

Documentation/driver-api/dma-buf.rst | 16 ++++-
drivers/dma-buf/dma-fence-array.c | 11 ++++
drivers/dma-buf/dma-fence-chain.c | 12 ++++
drivers/dma-buf/dma-fence.c | 60 ++++++++++++++++++
drivers/dma-buf/dma-resv.c | 22 +++++++
drivers/dma-buf/sw_sync.c | 81 +++++++++++++++++++++++++
drivers/dma-buf/sync_debug.h | 2 +
drivers/dma-buf/sync_file.c | 19 ++++++
drivers/gpu/drm/drm_atomic_helper.c | 37 +++++++++++
drivers/gpu/drm/drm_syncobj.c | 64 +++++++++++++++----
drivers/gpu/drm/drm_vblank.c | 53 +++++++++++++---
drivers/gpu/drm/i915/i915_request.c | 20 ++++++
drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -----
drivers/gpu/drm/msm/msm_atomic.c | 8 ++-
drivers/gpu/drm/msm/msm_drv.c | 12 ++--
drivers/gpu/drm/msm/msm_fence.c | 74 ++++++++++++++++++++++
drivers/gpu/drm/msm/msm_fence.h | 20 ++++++
drivers/gpu/drm/msm/msm_gem.c | 5 ++
drivers/gpu/drm/msm/msm_kms.h | 8 ---
drivers/gpu/drm/scheduler/sched_fence.c | 46 ++++++++++++++
drivers/gpu/drm/scheduler/sched_main.c | 2 +-
include/drm/drm_vblank.h | 1 +
include/drm/gpu_scheduler.h | 17 ++++++
include/linux/dma-fence.h | 22 +++++++
include/linux/dma-resv.h | 2 +
include/uapi/drm/drm.h | 17 ++++++
include/uapi/drm/msm_drm.h | 14 ++++-
include/uapi/linux/sync_file.h | 59 +++++++++++-------
28 files changed, 640 insertions(+), 79 deletions(-)

--
2.39.2



2023-03-08 15:53:44

by Rob Clark

Subject: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

From: Rob Clark <[email protected]>

Add a way to hint to the fence signaler of an upcoming deadline, such as
vblank, which the fence waiter would prefer not to miss. This is to aid
the fence signaler in making power management decisions, like boosting
frequency as the deadline approaches, and to make it aware of missed
deadlines so that they can be factored into the frequency scaling.
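As a rough illustration of how a signaler might consume these hints, here is a userspace C sketch of per-timeline bookkeeping that keeps only the soonest deadline and treats a deadline in the past as immediate, matching the set_deadline documentation in this patch. `struct timeline_state`, the helpers and the boost-window policy are hypothetical, not kernel APIs; the locking a real callback would need (it can be called from any context) is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-timeline bookkeeping in a fence signaler; not a kernel API. */
struct timeline_state {
	bool has_deadline;
	int64_t deadline_ns;	/* soonest pending deadline, CLOCK_MONOTONIC ns */
};

/* Record a hint, keeping only the soonest one, as the set_deadline docs ask. */
static void timeline_set_deadline(struct timeline_state *t, int64_t deadline_ns)
{
	if (!t->has_deadline || deadline_ns < t->deadline_ns) {
		t->deadline_ns = deadline_ns;
		t->has_deadline = true;
	}
}

/*
 * Nanoseconds left until the soonest deadline, or -1 if none is pending.
 * A deadline that is already in the past counts as immediate (0).
 */
static int64_t timeline_time_left(const struct timeline_state *t, int64_t now_ns)
{
	if (!t->has_deadline)
		return -1;
	return t->deadline_ns > now_ns ? t->deadline_ns - now_ns : 0;
}

/* Hypothetical policy: boost once less than boost_window_ns remains. */
static bool timeline_should_boost(const struct timeline_state *t,
				  int64_t now_ns, int64_t boost_window_ns)
{
	int64_t left = timeline_time_left(t, now_ns);

	return left >= 0 && left < boost_window_ns;
}
```

Since the hint carries no guarantee, the policy is entirely up to the signaler; the boost window here is just one possible choice.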

v2: Drop dma_fence::deadline and related logic to filter duplicate
deadlines, to avoid increasing dma_fence size. The fence-context
implementation will need similar logic to track deadlines of all
the fences on the same timeline. [ckoenig]
v3: Clarify locking wrt. set_deadline callback
v4: Clarify in docs comment that this is a hint
v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v6: More docs
v7: Fix typo, clarify past deadlines

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Christian König <[email protected]>
Acked-by: Pekka Paalanen <[email protected]>
Reviewed-by: Bagas Sanjaya <[email protected]>
---
Documentation/driver-api/dma-buf.rst | 6 +++
drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
include/linux/dma-fence.h | 22 +++++++++++
3 files changed, 87 insertions(+)

diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 622b8156d212..183e480d8cea 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
.. kernel-doc:: drivers/dma-buf/dma-fence.c
:doc: fence signalling annotation

+DMA Fence Deadline Hints
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/dma-buf/dma-fence.c
+ :doc: deadline hints
+
DMA Fences Functions Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 0de0482cd36e..f177c56269bb 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
}
EXPORT_SYMBOL(dma_fence_wait_any_timeout);

+/**
+ * DOC: deadline hints
+ *
+ * In an ideal world, it would be possible to pipeline a workload sufficiently
+ * that a utilization based device frequency governor could arrive at a minimum
+ * frequency that meets the requirements of the use-case, in order to minimize
+ * power consumption. But in the real world there are many workloads which
+ * defy this ideal. For example, but not limited to:
+ *
+ * * Workloads that ping-pong between device and CPU, with alternating periods
+ * of CPU waiting for device, and device waiting on CPU. This can result in
+ * devfreq and cpufreq seeing idle time in their respective domains and,
+ * as a result, reducing frequency.
+ *
+ * * Workloads that interact with a periodic time based deadline, such as double
+ * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
+ * missing a vblank deadline results in an *increase* in idle time on the GPU
+ * (since it has to wait an additional vblank period), sending a signal to
+ * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
+ * needed.
+ *
+ * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
+ * The deadline hint provides a way for the waiting driver, or userspace, to
+ * convey an appropriate sense of urgency to the signaling driver.
+ *
+ * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
+ * facing APIs). The time could either be some point in the future (such as
+ * the vblank based deadline for page-flipping, or the start of a compositor's
+ * composition cycle), or the current time to indicate an immediate deadline
+ * hint (Ie. forward progress cannot be made until this fence is signaled).
+ *
+ * Multiple deadlines may be set on a given fence, even in parallel. See the
+ * documentation for &dma_fence_ops.set_deadline.
+ *
+ * The deadline hint is just that, a hint. The driver that created the fence
+ * may react by increasing frequency, making different scheduling choices, etc.
+ * Or doing nothing at all.
+ */
+
+/**
+ * dma_fence_set_deadline - set desired fence-wait deadline hint
+ * @fence: the fence that is to be waited on
+ * @deadline: the time by which the waiter hopes for the fence to be
+ * signaled
+ *
+ * Give the fence signaler a hint about an upcoming deadline, such as
+ * vblank, by which point the waiter would prefer the fence to be
+ * signaled. This is intended to give feedback to the fence signaler
+ * to aid in power management decisions, such as boosting GPU frequency
+ * if a periodic vblank deadline is approaching but the fence is not
+ * yet signaled.
+ */
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+ if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
+ fence->ops->set_deadline(fence, deadline);
+}
+EXPORT_SYMBOL(dma_fence_set_deadline);
+
/**
* dma_fence_describe - Dump fence describtion into seq_file
* @fence: the 6fence to describe
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..d54b595a0fe0 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -257,6 +257,26 @@ struct dma_fence_ops {
*/
void (*timeline_value_str)(struct dma_fence *fence,
char *str, int size);
+
+ /**
+ * @set_deadline:
+ *
+ * Callback to allow a fence waiter to inform the fence signaler of
+ * an upcoming deadline, such as vblank, by which point the waiter
+ * would prefer the fence to be signaled. This is intended to
+ * give feedback to the fence signaler to aid in power management
+ * decisions, such as boosting GPU frequency.
+ *
+ * This is called without &dma_fence.lock held; it can be called
+ * multiple times and from any context. Locking is up to the callee
+ * if it has some state to manage. If multiple deadlines are set,
+ * the expectation is to track the soonest one. If the deadline is
+ * before the current time, it should be interpreted as an immediate
+ * deadline.
+ *
+ * This callback is optional.
+ */
+ void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
};

void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
@@ -583,6 +603,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
return ret < 0 ? ret : 0;
}

+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
+
struct dma_fence *dma_fence_get_stub(void);
struct dma_fence *dma_fence_allocate_private_stub(void);
u64 dma_fence_context_alloc(unsigned num);


2023-03-08 15:53:48

by Rob Clark

Subject: [PATCH v10 02/15] dma-buf/fence-array: Add fence deadline support

From: Rob Clark <[email protected]>

Propagate the deadline to all the fences in the array.
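The fan-out can be modelled in userspace C as below. The container keeps no deadline of its own and only forwards the hint to its members, which is consistent with patch 01 not adding a deadline field to dma_fence; the member-side guard mirrors dma_fence_set_deadline() skipping already-signaled fences. All types here are hypothetical stand-ins, not the dma-fence-array code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical leaf fence: records the soonest deadline it was given. */
struct leaf_fence {
	int signaled;
	int64_t deadline_ns;	/* INT64_MAX means "no deadline set" */
};

/* Like dma_fence_set_deadline(): do nothing for an already-signaled fence. */
static void leaf_set_deadline(struct leaf_fence *f, int64_t deadline_ns)
{
	if (f->signaled)
		return;
	if (deadline_ns < f->deadline_ns)
		f->deadline_ns = deadline_ns;
}

/* Hypothetical container: stores nothing, just forwards to every member. */
struct fence_array {
	struct leaf_fence **fences;
	size_t num_fences;
};

static void array_set_deadline(struct fence_array *a, int64_t deadline_ns)
{
	for (size_t i = 0; i < a->num_fences; i++)
		leaf_set_deadline(a->fences[i], deadline_ns);
}
```

Each member applies its own "keep the soonest" rule, so a member that already holds an earlier hint is unaffected.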

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Christian König <[email protected]>
---
drivers/dma-buf/dma-fence-array.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 5c8a7084577b..9b3ce8948351 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -123,12 +123,23 @@ static void dma_fence_array_release(struct dma_fence *fence)
dma_fence_free(fence);
}

+static void dma_fence_array_set_deadline(struct dma_fence *fence,
+ ktime_t deadline)
+{
+ struct dma_fence_array *array = to_dma_fence_array(fence);
+ unsigned i;
+
+ for (i = 0; i < array->num_fences; ++i)
+ dma_fence_set_deadline(array->fences[i], deadline);
+}
+
const struct dma_fence_ops dma_fence_array_ops = {
.get_driver_name = dma_fence_array_get_driver_name,
.get_timeline_name = dma_fence_array_get_timeline_name,
.enable_signaling = dma_fence_array_enable_signaling,
.signaled = dma_fence_array_signaled,
.release = dma_fence_array_release,
+ .set_deadline = dma_fence_array_set_deadline,
};
EXPORT_SYMBOL(dma_fence_array_ops);



2023-03-08 15:53:52

by Rob Clark

Subject: [PATCH v10 03/15] dma-buf/fence-chain: Add fence deadline support

From: Rob Clark <[email protected]>

Propagate the deadline to all the fences in the chain.

v2: Use dma_fence_chain_contained [Tvrtko]

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Christian König <[email protected]> for this one.
---
drivers/dma-buf/dma-fence-chain.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index a0d920576ba6..9663ba1bb6ac 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -206,6 +206,17 @@ static void dma_fence_chain_release(struct dma_fence *fence)
dma_fence_free(fence);
}

+
+static void dma_fence_chain_set_deadline(struct dma_fence *fence,
+ ktime_t deadline)
+{
+ dma_fence_chain_for_each(fence, fence) {
+ struct dma_fence *f = dma_fence_chain_contained(fence);
+
+ dma_fence_set_deadline(f, deadline);
+ }
+}
+
const struct dma_fence_ops dma_fence_chain_ops = {
.use_64bit_seqno = true,
.get_driver_name = dma_fence_chain_get_driver_name,
@@ -213,6 +224,7 @@ const struct dma_fence_ops dma_fence_chain_ops = {
.enable_signaling = dma_fence_chain_enable_signaling,
.signaled = dma_fence_chain_signaled,
.release = dma_fence_chain_release,
+ .set_deadline = dma_fence_chain_set_deadline,
};
EXPORT_SYMBOL(dma_fence_chain_ops);



2023-03-08 15:53:56

by Rob Clark

Subject: [PATCH v10 04/15] dma-buf/dma-resv: Add a way to set fence deadline

From: Rob Clark <[email protected]>

Add a way to set a deadline on remaining resv fences according to the
requested usage.
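The usage filter can be sketched in userspace C. This assumes, as documented for enum dma_resv_usage, that the enum is ordered KERNEL < WRITE < READ < BOOKKEEP and that requesting one usage also selects fences of every lower usage (e.g. READ also covers WRITE and KERNEL fences). The enum copy and `struct resv_entry` are hypothetical stand-ins, and the real function walks the fences with dma_resv_for_each_fence_unlocked() rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local copy of the dma_resv_usage ordering. */
enum resv_usage { USAGE_KERNEL, USAGE_WRITE, USAGE_READ, USAGE_BOOKKEEP };

/* Hypothetical fence slot in a reservation object. */
struct resv_entry {
	enum resv_usage usage;
	int64_t deadline_ns;	/* soonest hint received; INT64_MAX = none */
};

/*
 * Apply the hint to every entry whose usage is <= the requested usage.
 * Returns how many entries were selected by the filter.
 */
static size_t resv_set_deadline(struct resv_entry *entries, size_t n,
				enum resv_usage usage, int64_t deadline_ns)
{
	size_t selected = 0;

	for (size_t i = 0; i < n; i++) {
		if (entries[i].usage > usage)
			continue;	/* filtered out by @usage */
		if (deadline_ns < entries[i].deadline_ns)
			entries[i].deadline_ns = deadline_ns;
		selected++;
	}
	return selected;
}
```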

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Christian König <[email protected]>
---
drivers/dma-buf/dma-resv.c | 22 ++++++++++++++++++++++
include/linux/dma-resv.h | 2 ++
2 files changed, 24 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 1c76aed8e262..2a594b754af1 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -684,6 +684,28 @@ long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage,
}
EXPORT_SYMBOL_GPL(dma_resv_wait_timeout);

+/**
+ * dma_resv_set_deadline - Set a deadline on a reservation object's fences
+ * @obj: the reservation object
+ * @usage: controls which fences to include, see enum dma_resv_usage.
+ * @deadline: the requested deadline (MONOTONIC)
+ *
+ * May be called without holding the dma_resv lock. Sets @deadline on
+ * all fences filtered by @usage.
+ */
+void dma_resv_set_deadline(struct dma_resv *obj, enum dma_resv_usage usage,
+ ktime_t deadline)
+{
+ struct dma_resv_iter cursor;
+ struct dma_fence *fence;
+
+ dma_resv_iter_begin(&cursor, obj, usage);
+ dma_resv_for_each_fence_unlocked(&cursor, fence) {
+ dma_fence_set_deadline(fence, deadline);
+ }
+ dma_resv_iter_end(&cursor);
+}
+EXPORT_SYMBOL_GPL(dma_resv_set_deadline);

/**
* dma_resv_test_signaled - Test if a reservation object's fences have been
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 0637659a702c..8d0e34dad446 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -479,6 +479,8 @@ int dma_resv_get_singleton(struct dma_resv *obj, enum dma_resv_usage usage,
int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage,
bool intr, unsigned long timeout);
+void dma_resv_set_deadline(struct dma_resv *obj, enum dma_resv_usage usage,
+ ktime_t deadline);
bool dma_resv_test_signaled(struct dma_resv *obj, enum dma_resv_usage usage);
void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq);



2023-03-08 15:54:08

by Rob Clark

Subject: [PATCH v10 05/15] dma-buf/sync_file: Surface sync-file uABI

From: Rob Clark <[email protected]>

We had all of the internal driver APIs, but not the all-important
userspace uABI, in the dma-buf doc. Fix that. And re-arrange the
comments slightly as otherwise the comments for the ioctl nr defines
would not show up.
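The two-call SYNC_IOC_FILE_INFO protocol that this patch moves into the struct documentation can be sketched from userspace as follows. It assumes the installed kernel uapi headers provide <linux/sync_file.h>; `get_fence_infos()` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Fetch per-fence details of a sync_file. On success returns 0, stores a
 * calloc'd array in *out (caller frees, may be NULL if empty) and its
 * length in *count; on failure returns -errno and leaves *out untouched.
 */
static int get_fence_infos(int fd, struct sync_fence_info **out, uint32_t *count)
{
	struct sync_file_info info;
	struct sync_fence_info *fences;

	/* Call 1: num_fences == 0 asks the kernel only for the fence count. */
	memset(&info, 0, sizeof(info));
	if (ioctl(fd, SYNC_IOC_FILE_INFO, &info) < 0)
		return -errno;

	if (info.num_fences == 0) {
		*out = NULL;
		*count = 0;
		return 0;
	}

	fences = calloc(info.num_fences, sizeof(*fences));
	if (!fences)
		return -ENOMEM;

	/* Call 2: num_fences > 0 plus a user pointer fills in the array. */
	info.sync_fence_info = (uint64_t)(uintptr_t)fences;
	if (ioctl(fd, SYNC_IOC_FILE_INFO, &info) < 0) {
		int err = -errno;

		free(fences);
		return err;
	}

	*out = fences;
	*count = info.num_fences;
	return 0;
}
```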

v2: Fix docs build warning coming from newly including the uabi header
in the docs build

Signed-off-by: Rob Clark <[email protected]>
Acked-by: Pekka Paalanen <[email protected]>
---
Documentation/driver-api/dma-buf.rst | 10 ++++++--
include/uapi/linux/sync_file.h | 37 +++++++++++-----------------
2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 183e480d8cea..ff3f8da296af 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -203,8 +203,8 @@ DMA Fence unwrap
.. kernel-doc:: include/linux/dma-fence-unwrap.h
:internal:

-DMA Fence uABI/Sync File
-~~~~~~~~~~~~~~~~~~~~~~~~
+DMA Fence Sync File
+~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/sync_file.c
:export:
@@ -212,6 +212,12 @@ DMA Fence uABI/Sync File
.. kernel-doc:: include/linux/sync_file.h
:internal:

+DMA Fence Sync File uABI
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: include/uapi/linux/sync_file.h
+ :internal:
+
Indefinite DMA Fences
~~~~~~~~~~~~~~~~~~~~~

diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
index ee2dcfb3d660..7e42a5b7558b 100644
--- a/include/uapi/linux/sync_file.h
+++ b/include/uapi/linux/sync_file.h
@@ -16,12 +16,16 @@
#include <linux/types.h>

/**
- * struct sync_merge_data - data passed to merge ioctl
+ * struct sync_merge_data - SYNC_IOC_MERGE: merge two fences
* @name: name of new fence
* @fd2: file descriptor of second fence
* @fence: returns the fd of the new fence to userspace
* @flags: merge_data flags
* @pad: padding for 64-bit alignment, should always be zero
+ *
+ * Creates a new fence containing copies of the sync_pts in both
+ * the calling fd and sync_merge_data.fd2. Returns the new fence's
+ * fd in sync_merge_data.fence
*/
struct sync_merge_data {
char name[32];
@@ -34,8 +38,8 @@ struct sync_merge_data {
/**
* struct sync_fence_info - detailed fence information
* @obj_name: name of parent sync_timeline
-* @driver_name: name of driver implementing the parent
-* @status: status of the fence 0:active 1:signaled <0:error
+ * @driver_name: name of driver implementing the parent
+ * @status: status of the fence 0:active 1:signaled <0:error
* @flags: fence_info flags
* @timestamp_ns: timestamp of status change in nanoseconds
*/
@@ -48,14 +52,19 @@ struct sync_fence_info {
};

/**
- * struct sync_file_info - data returned from fence info ioctl
+ * struct sync_file_info - SYNC_IOC_FILE_INFO: get detailed information on a sync_file
* @name: name of fence
* @status: status of fence. 1: signaled 0:active <0:error
* @flags: sync_file_info flags
* @num_fences number of fences in the sync_file
* @pad: padding for 64-bit alignment, should always be zero
- * @sync_fence_info: pointer to array of structs sync_fence_info with all
+ * @sync_fence_info: pointer to array of struct &sync_fence_info with all
* fences in the sync_file
+ *
+ * Takes a struct sync_file_info. If num_fences is 0, the field is updated
+ * with the actual number of fences. If num_fences is > 0, the system will
+ * use the pointer provided on sync_fence_info to return up to num_fences of
+ * struct sync_fence_info, with detailed fence information.
*/
struct sync_file_info {
char name[32];
@@ -69,30 +78,14 @@ struct sync_file_info {

#define SYNC_IOC_MAGIC '>'

-/**
+/*
* Opcodes 0, 1 and 2 were burned during a API change to avoid users of the
* old API to get weird errors when trying to handling sync_files. The API
* change happened during the de-stage of the Sync Framework when there was
* no upstream users available.
*/

-/**
- * DOC: SYNC_IOC_MERGE - merge two fences
- *
- * Takes a struct sync_merge_data. Creates a new fence containing copies of
- * the sync_pts in both the calling fd and sync_merge_data.fd2. Returns the
- * new fence's fd in sync_merge_data.fence
- */
#define SYNC_IOC_MERGE _IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
-
-/**
- * DOC: SYNC_IOC_FILE_INFO - get detailed information on a sync_file
- *
- * Takes a struct sync_file_info. If num_fences is 0, the field is updated
- * with the actual number of fences. If num_fences is > 0, the system will
- * use the pointer provided on sync_fence_info to return up to num_fences of
- * struct sync_fence_info, with detailed fence information.
- */
#define SYNC_IOC_FILE_INFO _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)

#endif /* _UAPI_LINUX_SYNC_H */


2023-03-08 15:54:13

by Rob Clark

Subject: [PATCH v10 07/15] dma-buf/sw_sync: Add fence deadline support

From: Rob Clark <[email protected]>

This consists of simply storing the earliest deadline, and adding an
ioctl to retrieve the deadline. This can be used in conjunction with
the SET_DEADLINE ioctl on a fence fd for testing. Ie. create various
sw_sync fences, merge them into a fence-array, set deadline on the
fence-array and confirm that it is propagated properly to each fence.
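The ioctl contract (deadline_ns and pad must be zero on input, the soonest deadline is returned, -ENOENT when none has been set) can be modelled in userspace C. `struct get_deadline_args` mirrors the layout of struct sw_sync_get_deadline in this patch, while `struct mock_pt` and both functions are hypothetical stand-ins; the fd lookup and locking done by sw_sync_ioctl_get_deadline() are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local mirror of struct sw_sync_get_deadline's layout. */
struct get_deadline_args {
	uint64_t deadline_ns;	/* out: absolute CLOCK_MONOTONIC ns */
	uint32_t pad;		/* must be zero */
	int32_t fence_fd;	/* in: sw_sync fence fd (unused by this model) */
};

/* Hypothetical stand-in for struct sync_pt plus SW_SYNC_HAS_DEADLINE_BIT. */
struct mock_pt {
	bool has_deadline;
	uint64_t deadline_ns;
};

/* Keep the soonest deadline, like timeline_fence_set_deadline(). */
static void mock_pt_set_deadline(struct mock_pt *pt, uint64_t deadline_ns)
{
	if (!pt->has_deadline || deadline_ns < pt->deadline_ns) {
		pt->deadline_ns = deadline_ns;
		pt->has_deadline = true;
	}
}

/* Mirrors the checks in sw_sync_ioctl_get_deadline(). */
static int mock_get_deadline(const struct mock_pt *pt, struct get_deadline_args *args)
{
	if (args->deadline_ns || args->pad)
		return -EINVAL;		/* output field and pad must start at zero */
	if (!pt)
		return -EINVAL;		/* fd is not a sw_sync fence */
	if (!pt->has_deadline)
		return -ENOENT;		/* no deadline has been set yet */
	args->deadline_ns = pt->deadline_ns;
	return 0;
}
```

Note that reusing the same argument struct for a second query fails with -EINVAL, because deadline_ns is no longer zero.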

v2: Switch UABI to express deadline as u64
v3: More verbose UAPI docs, show how to convert from timespec
v4: Better comments, track the soonest deadline, as a normal fence
implementation would, return an error if no deadline set.

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Christian König <[email protected]>
Acked-by: Pekka Paalanen <[email protected]>
---
drivers/dma-buf/sw_sync.c | 81 ++++++++++++++++++++++++++++++++++++
drivers/dma-buf/sync_debug.h | 2 +
2 files changed, 83 insertions(+)

diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 348b3a9170fa..f53071bca3af 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -52,12 +52,33 @@ struct sw_sync_create_fence_data {
__s32 fence; /* fd of new fence */
};

+/**
+ * struct sw_sync_get_deadline - get the deadline hint of a sw_sync fence
+ * @deadline_ns: absolute time of the deadline
+ * @pad: must be zero
+ * @fence_fd: the sw_sync fence fd (in)
+ *
+ * Return the earliest deadline set on the fence. The timebase for the
+ * deadline is CLOCK_MONOTONIC (same as vblank). If there is no deadline
+ * set on the fence, this ioctl will return -ENOENT.
+ */
+struct sw_sync_get_deadline {
+ __u64 deadline_ns;
+ __u32 pad;
+ __s32 fence_fd;
+};
+
#define SW_SYNC_IOC_MAGIC 'W'

#define SW_SYNC_IOC_CREATE_FENCE _IOWR(SW_SYNC_IOC_MAGIC, 0,\
struct sw_sync_create_fence_data)

#define SW_SYNC_IOC_INC _IOW(SW_SYNC_IOC_MAGIC, 1, __u32)
+#define SW_SYNC_GET_DEADLINE _IOWR(SW_SYNC_IOC_MAGIC, 2, \
+ struct sw_sync_get_deadline)
+
+
+#define SW_SYNC_HAS_DEADLINE_BIT DMA_FENCE_FLAG_USER_BITS

static const struct dma_fence_ops timeline_fence_ops;

@@ -171,6 +192,22 @@ static void timeline_fence_timeline_value_str(struct dma_fence *fence,
snprintf(str, size, "%d", parent->value);
}

+static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+ struct sync_pt *pt = dma_fence_to_sync_pt(fence);
+ unsigned long flags;
+
+ spin_lock_irqsave(fence->lock, flags);
+ if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
+ if (ktime_before(deadline, pt->deadline))
+ pt->deadline = deadline;
+ } else {
+ pt->deadline = deadline;
+ set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
+ }
+ spin_unlock_irqrestore(fence->lock, flags);
+}
+
static const struct dma_fence_ops timeline_fence_ops = {
.get_driver_name = timeline_fence_get_driver_name,
.get_timeline_name = timeline_fence_get_timeline_name,
@@ -179,6 +216,7 @@ static const struct dma_fence_ops timeline_fence_ops = {
.release = timeline_fence_release,
.fence_value_str = timeline_fence_value_str,
.timeline_value_str = timeline_fence_timeline_value_str,
+ .set_deadline = timeline_fence_set_deadline,
};

/**
@@ -387,6 +425,46 @@ static long sw_sync_ioctl_inc(struct sync_timeline *obj, unsigned long arg)
return 0;
}

+static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long arg)
+{
+ struct sw_sync_get_deadline data;
+ struct dma_fence *fence;
+ struct sync_pt *pt;
+ int ret = 0;
+
+ if (copy_from_user(&data, (void __user *)arg, sizeof(data)))
+ return -EFAULT;
+
+ if (data.deadline_ns || data.pad)
+ return -EINVAL;
+
+ fence = sync_file_get_fence(data.fence_fd);
+ if (!fence)
+ return -EINVAL;
+
+ pt = dma_fence_to_sync_pt(fence);
+ if (!pt)
+ return -EINVAL;
+
+ spin_lock(fence->lock);
+ if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
+ data.deadline_ns = ktime_to_ns(pt->deadline);
+ } else {
+ ret = -ENOENT;
+ }
+ spin_unlock(fence->lock);
+
+ dma_fence_put(fence);
+
+ if (ret)
+ return ret;
+
+ if (copy_to_user((void __user *)arg, &data, sizeof(data)))
+ return -EFAULT;
+
+ return 0;
+}
+
static long sw_sync_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
@@ -399,6 +477,9 @@ static long sw_sync_ioctl(struct file *file, unsigned int cmd,
case SW_SYNC_IOC_INC:
return sw_sync_ioctl_inc(obj, arg);

+ case SW_SYNC_GET_DEADLINE:
+ return sw_sync_ioctl_get_deadline(obj, arg);
+
default:
return -ENOTTY;
}
diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
index 6176e52ba2d7..a1bdd62efccd 100644
--- a/drivers/dma-buf/sync_debug.h
+++ b/drivers/dma-buf/sync_debug.h
@@ -55,11 +55,13 @@ static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
* @base: base fence object
* @link: link on the sync timeline's list
* @node: node in the sync timeline's tree
+ * @deadline: the earliest fence deadline hint
*/
struct sync_pt {
struct dma_fence base;
struct list_head link;
struct rb_node node;
+ ktime_t deadline;
};

extern const struct file_operations sw_sync_debugfs_fops;


2023-03-08 15:54:20

by Rob Clark

Subject: [PATCH v10 06/15] dma-buf/sync_file: Add SET_DEADLINE ioctl

From: Rob Clark <[email protected]>

The initial purpose is for igt tests, but this would also be useful for
compositors that wait until close to the vblank deadline to make decisions
about which frame to show.

The igt tests can be found at:

https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline
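A userspace caller might compute the deadline and issue the new ioctl roughly as follows, using the CLOCK_MONOTONIC conversion shown in the struct sync_set_deadline documentation in this patch. `struct set_deadline_args` and `MY_SYNC_IOC_SET_DEADLINE` are hypothetical local mirrors of the definitions this patch adds (headers that already carry SYNC_IOC_SET_DEADLINE make them unnecessary), and `set_fence_deadline_in()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <time.h>

/* Hypothetical mirror of struct sync_set_deadline / SYNC_IOC_SET_DEADLINE. */
struct set_deadline_args {
	uint64_t deadline_ns;	/* absolute CLOCK_MONOTONIC time */
	uint64_t pad;		/* must be zero */
};
#define MY_SYNC_IOC_SET_DEADLINE _IOW('>', 5, struct set_deadline_args)

/* Convert a timespec to nanoseconds, widening before the multiply. */
static uint64_t timespec_to_ns(const struct timespec *t)
{
	return (uint64_t)t->tv_sec * 1000000000ull + (uint64_t)t->tv_nsec;
}

/*
 * Hint that the fence behind fence_fd should ideally signal
 * ns_until_deadline from now. Returns the ioctl() result (-1 on error).
 */
static int set_fence_deadline_in(int fence_fd, uint64_t ns_until_deadline)
{
	struct set_deadline_args args = { 0 };	/* pad stays zero */
	struct timespec now;

	if (clock_gettime(CLOCK_MONOTONIC, &now) < 0)
		return -1;
	args.deadline_ns = timespec_to_ns(&now) + ns_until_deadline;
	return ioctl(fence_fd, MY_SYNC_IOC_SET_DEADLINE, &args);
}
```

Widening tv_sec to 64 bits before multiplying keeps the conversion from overflowing on targets where long is 32 bits.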

v2: Clarify the timebase, add link to igt tests
v3: Use u64 value in ns to express deadline.
v4: More doc

Signed-off-by: Rob Clark <[email protected]>
Acked-by: Pekka Paalanen <[email protected]>
---
drivers/dma-buf/dma-fence.c | 3 ++-
drivers/dma-buf/sync_file.c | 19 +++++++++++++++++++
include/uapi/linux/sync_file.h | 22 ++++++++++++++++++++++
3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f177c56269bb..74e36f6d05b0 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -933,7 +933,8 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
* the GPU's devfreq to reduce frequency, when in fact the opposite is what is
* needed.
*
- * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
+ * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline
+ * (or indirectly via userspace facing ioctls like &sync_set_deadline).
* The deadline hint provides a way for the waiting driver, or userspace, to
* convey an appropriate sense of urgency to the signaling driver.
*
diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index af57799c86ce..418021cfb87c 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -350,6 +350,22 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
return ret;
}

+static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
+ unsigned long arg)
+{
+ struct sync_set_deadline ts;
+
+ if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
+ return -EFAULT;
+
+ if (ts.pad)
+ return -EINVAL;
+
+ dma_fence_set_deadline(sync_file->fence, ns_to_ktime(ts.deadline_ns));
+
+ return 0;
+}
+
static long sync_file_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
@@ -362,6 +378,9 @@ static long sync_file_ioctl(struct file *file, unsigned int cmd,
case SYNC_IOC_FILE_INFO:
return sync_file_ioctl_fence_info(sync_file, arg);

+ case SYNC_IOC_SET_DEADLINE:
+ return sync_file_ioctl_set_deadline(sync_file, arg);
+
default:
return -ENOTTY;
}
diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
index 7e42a5b7558b..d61752dca4c6 100644
--- a/include/uapi/linux/sync_file.h
+++ b/include/uapi/linux/sync_file.h
@@ -76,6 +76,27 @@ struct sync_file_info {
__u64 sync_fence_info;
};

+/**
+ * struct sync_set_deadline - SYNC_IOC_SET_DEADLINE - set a deadline hint on a fence
+ * @deadline_ns: absolute time of the deadline
+ * @pad: must be zero
+ *
+ * Allows userspace to set a deadline on a fence, see &dma_fence_set_deadline
+ *
+ * The timebase for the deadline is CLOCK_MONOTONIC (same as vblank). For
+ * example
+ *
+ * clock_gettime(CLOCK_MONOTONIC, &t);
+ * deadline_ns = (t.tv_sec * 1000000000L) + t.tv_nsec + ns_until_deadline
+ */
+struct sync_set_deadline {
+ __u64 deadline_ns;
+ /* Not strictly needed for alignment but gives some possibility
+ * for future extension:
+ */
+ __u64 pad;
+};
+
#define SYNC_IOC_MAGIC '>'

/*
@@ -87,5 +108,6 @@ struct sync_file_info {

#define SYNC_IOC_MERGE _IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
#define SYNC_IOC_FILE_INFO _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
+#define SYNC_IOC_SET_DEADLINE _IOW(SYNC_IOC_MAGIC, 5, struct sync_set_deadline)

#endif /* _UAPI_LINUX_SYNC_H */


2023-03-08 15:54:25

by Rob Clark

Subject: [PATCH v10 08/15] drm/scheduler: Add fence deadline support

As the finished fence is the one that is exposed to userspace, and
therefore the one that other operations, like atomic update, would
block on, we need to propagate the deadline from the finished
fence to the actual hw fence.
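The publish/check handshake between drm_sched_fence_set_parent() and drm_sched_fence_set_deadline_finished() can be modelled with C11 atomics. This is a userspace sketch with hypothetical types, not the scheduler code: the patch uses a spinlock plus smp_store_release()/smp_load_acquire(), whereas the sketch uses sequentially consistent operations throughout, which in the C11 memory model is what rules out both racing threads missing each other's store. Each side publishes its own value first and then reads the other's, so at least one of them forwards the deadline.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NO_DEADLINE INT64_MAX

/* Hypothetical stand-in for the driver's hw fence. */
struct hw_fence {
	_Atomic int64_t deadline_ns;		/* soonest hint received */
};

/* Hypothetical stand-in for the scheduler's finished fence. */
struct sched_fence {
	_Atomic int64_t deadline_ns;		/* soonest hint; NO_DEADLINE if none */
	_Atomic(struct hw_fence *) parent;	/* NULL until the job runs */
};

static void hw_fence_init(struct hw_fence *hw)
{
	atomic_init(&hw->deadline_ns, NO_DEADLINE);
}

static void sched_fence_init(struct sched_fence *f)
{
	atomic_init(&f->deadline_ns, NO_DEADLINE);
	atomic_init(&f->parent, NULL);
}

/* Lock-free "keep the minimum"; returns 1 if deadline_ns was stored. */
static int store_min(_Atomic int64_t *slot, int64_t deadline_ns)
{
	int64_t cur = atomic_load(slot);

	while (deadline_ns < cur) {
		if (atomic_compare_exchange_weak(slot, &cur, deadline_ns))
			return 1;
	}
	return 0;
}

static void hw_fence_set_deadline(struct hw_fence *hw, int64_t deadline_ns)
{
	store_min(&hw->deadline_ns, deadline_ns);
}

/* Waiter side: record the hint, then forward it if a parent is published. */
static void sched_fence_set_deadline(struct sched_fence *f, int64_t deadline_ns)
{
	struct hw_fence *parent;

	if (!store_min(&f->deadline_ns, deadline_ns))
		return;		/* an earlier (or equal) hint is already recorded */

	parent = atomic_load(&f->parent);
	if (parent)
		hw_fence_set_deadline(parent, deadline_ns);
}

/* Scheduler side: publish the parent, then forward any recorded hint. */
static void sched_fence_set_parent(struct sched_fence *f, struct hw_fence *hw)
{
	int64_t deadline_ns;

	atomic_store(&f->parent, hw);
	deadline_ns = atomic_load(&f->deadline_ns);
	if (deadline_ns != NO_DEADLINE)
		hw_fence_set_deadline(hw, deadline_ns);
}
```

Because the hw fence also keeps only the minimum, forwarding the same deadline from both sides is harmless.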

v2: Split into drm_sched_fence_set_parent() (ckoenig)
v3: Ensure a thread calling drm_sched_fence_set_deadline_finished() sees
fence->parent set before drm_sched_fence_set_parent() does this
test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT).

Signed-off-by: Rob Clark <[email protected]>
Acked-by: Luben Tuikov <[email protected]>
---
drivers/gpu/drm/scheduler/sched_fence.c | 46 +++++++++++++++++++++++++
drivers/gpu/drm/scheduler/sched_main.c | 2 +-
include/drm/gpu_scheduler.h | 17 +++++++++
3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 7fd869520ef2..fe9c6468e440 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -123,6 +123,37 @@ static void drm_sched_fence_release_finished(struct dma_fence *f)
dma_fence_put(&fence->scheduled);
}

+static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
+ ktime_t deadline)
+{
+ struct drm_sched_fence *fence = to_drm_sched_fence(f);
+ struct dma_fence *parent;
+ unsigned long flags;
+
+ spin_lock_irqsave(&fence->lock, flags);
+
+ /* If we already have an earlier deadline, keep it: */
+ if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
+ ktime_before(fence->deadline, deadline)) {
+ spin_unlock_irqrestore(&fence->lock, flags);
+ return;
+ }
+
+ fence->deadline = deadline;
+ set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
+
+ spin_unlock_irqrestore(&fence->lock, flags);
+
+ /*
+ * smp_load_acquire() to ensure that if we are racing another
+ * thread calling drm_sched_fence_set_parent(), that we see
+ * the parent set before it calls test_bit(HAS_DEADLINE_BIT)
+ */
+ parent = smp_load_acquire(&fence->parent);
+ if (parent)
+ dma_fence_set_deadline(parent, deadline);
+}
+
static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
.get_driver_name = drm_sched_fence_get_driver_name,
.get_timeline_name = drm_sched_fence_get_timeline_name,
@@ -133,6 +164,7 @@ static const struct dma_fence_ops drm_sched_fence_ops_finished = {
.get_driver_name = drm_sched_fence_get_driver_name,
.get_timeline_name = drm_sched_fence_get_timeline_name,
.release = drm_sched_fence_release_finished,
+ .set_deadline = drm_sched_fence_set_deadline_finished,
};

struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
@@ -147,6 +179,20 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
}
EXPORT_SYMBOL(to_drm_sched_fence);

+void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
+ struct dma_fence *fence)
+{
+ /*
+ * smp_store_release() to ensure another thread racing us
+ * in drm_sched_fence_set_deadline_finished() sees the
+ * fence's parent set before test_bit()
+ */
+ smp_store_release(&s_fence->parent, dma_fence_get(fence));
+ if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT,
+ &s_fence->finished.flags))
+ dma_fence_set_deadline(fence, s_fence->deadline);
+}
+
struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
void *owner)
{
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 4e6ad6e122bc..007f98c48f8d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1019,7 +1019,7 @@ static int drm_sched_main(void *param)
drm_sched_fence_scheduled(s_fence);

if (!IS_ERR_OR_NULL(fence)) {
- s_fence->parent = dma_fence_get(fence);
+ drm_sched_fence_set_parent(s_fence, fence);
/* Drop for original kref_init of the fence */
dma_fence_put(fence);

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 9db9e5e504ee..99584e457153 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -41,6 +41,15 @@
*/
#define DRM_SCHED_FENCE_DONT_PIPELINE DMA_FENCE_FLAG_USER_BITS

+/**
+ * DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT - A fence deadline hint has been set
+ *
+ * Because a deadline hint can be set before the backing hw fence is
+ * created, we need to keep track of whether a deadline has already been
+ * set.
+ */
+#define DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT (DMA_FENCE_FLAG_USER_BITS + 1)
+
enum dma_resv_usage;
struct dma_resv;
struct drm_gem_object;
@@ -280,6 +289,12 @@ struct drm_sched_fence {
*/
struct dma_fence finished;

+ /**
+ * @deadline: deadline set on &drm_sched_fence.finished which
+ * potentially needs to be propagated to &drm_sched_fence.parent
+ */
+ ktime_t deadline;
+
/**
* @parent: the fence returned by &drm_sched_backend_ops.run_job
* when scheduling the job on hardware. We signal the
@@ -568,6 +583,8 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
enum drm_sched_priority priority);
bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);

+void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
+ struct dma_fence *fence);
struct drm_sched_fence *drm_sched_fence_alloc(
struct drm_sched_entity *s_entity, void *owner);
void drm_sched_fence_init(struct drm_sched_fence *fence,
--
2.39.2
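The ordering problem this patch solves can be modelled in userspace. The sketch below is a hypothetical C11 model, not the scheduler code: `struct model_sched_fence`, `struct model_hw_fence` and all `model_*` helpers are invented for illustration, a small atomic spinlock stands in for `fence->lock`, and plain integer nanoseconds stand in for `ktime_t`. The patch pairs smp_store_release() with smp_load_acquire(); the sketch deliberately uses sequentially consistent C11 atomics instead, because each side stores one variable and then loads the other, and in C11 terms only seq_cst ordering rules out both loads missing the other side's store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal spinlock standing in for spin_lock_irqsave()/spin_unlock_irqrestore(). */
static void model_lock(atomic_bool *l)
{
	while (atomic_exchange_explicit(l, true, memory_order_acquire))
		;	/* spin */
}

static void model_unlock(atomic_bool *l)
{
	atomic_store_explicit(l, false, memory_order_release);
}

/* Hypothetical backing hw fence: keeps the earliest deadline (0 == none). */
struct model_hw_fence {
	atomic_bool lock;
	int64_t deadline;
};

static void model_hw_init(struct model_hw_fence *hw)
{
	atomic_init(&hw->lock, false);
	hw->deadline = 0;
}

static void model_hw_set_deadline(struct model_hw_fence *hw, int64_t deadline)
{
	model_lock(&hw->lock);
	if (hw->deadline == 0 || deadline < hw->deadline)
		hw->deadline = deadline;
	model_unlock(&hw->lock);
}

/* Hypothetical stand-in for struct drm_sched_fence. */
struct model_sched_fence {
	atomic_bool lock;                        /* fence->lock */
	atomic_bool has_deadline;                /* HAS_DEADLINE_BIT */
	_Atomic int64_t deadline;                /* fence->deadline */
	_Atomic(struct model_hw_fence *) parent; /* fence->parent */
};

static void model_sched_init(struct model_sched_fence *f)
{
	atomic_init(&f->lock, false);
	atomic_init(&f->has_deadline, false);
	atomic_init(&f->deadline, 0);
	atomic_init(&f->parent, NULL);
}

/* Models drm_sched_fence_set_deadline_finished(). */
static void model_sched_set_deadline(struct model_sched_fence *f, int64_t deadline)
{
	struct model_hw_fence *parent;

	model_lock(&f->lock);
	/* If we already have an earlier deadline, keep it. */
	if (atomic_load(&f->has_deadline) && atomic_load(&f->deadline) < deadline) {
		model_unlock(&f->lock);
		return;
	}
	atomic_store(&f->deadline, deadline);
	atomic_store(&f->has_deadline, true);	/* seq_cst store of the flag ... */
	model_unlock(&f->lock);

	parent = atomic_load(&f->parent);	/* ... then seq_cst load of parent */
	if (parent)
		model_hw_set_deadline(parent, deadline);
}

/* Models drm_sched_fence_set_parent(). */
static void model_sched_set_parent(struct model_sched_fence *f,
				   struct model_hw_fence *hw)
{
	atomic_store(&f->parent, hw);		/* seq_cst store of parent ... */
	if (atomic_load(&f->has_deadline))	/* ... then seq_cst load of the flag */
		model_hw_set_deadline(hw, atomic_load(&f->deadline));
}
```

If both paths win the race, the same deadline is forwarded twice; that is harmless because the hw side keeps only the earliest value, which matches the cover letter's point that filtering out later deadlines is left to the fence implementation.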


2023-03-08 15:54:42

by Rob Clark

Subject: [PATCH v10 10/15] drm/vblank: Add helper to get next vblank time

From: Rob Clark <[email protected]>

Will be used in the next commit to set a deadline on fences that an
atomic update is waiting on.

v2: Calculate time at *start* of vblank period, not end
v3: Fix kbuild complaints

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Mario Kleiner <[email protected]>
---
drivers/gpu/drm/drm_vblank.c | 53 ++++++++++++++++++++++++++++++------
include/drm/drm_vblank.h | 1 +
2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 2ff31717a3de..299fa2a19a90 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -844,10 +844,9 @@ bool drm_crtc_vblank_helper_get_vblank_timestamp(struct drm_crtc *crtc,
EXPORT_SYMBOL(drm_crtc_vblank_helper_get_vblank_timestamp);

/**
- * drm_get_last_vbltimestamp - retrieve raw timestamp for the most recent
- * vblank interval
- * @dev: DRM device
- * @pipe: index of CRTC whose vblank timestamp to retrieve
+ * drm_crtc_get_last_vbltimestamp - retrieve raw timestamp for the most
+ * recent vblank interval
+ * @crtc: CRTC whose vblank timestamp to retrieve
* @tvblank: Pointer to target time which should receive the timestamp
* @in_vblank_irq:
* True when called from drm_crtc_handle_vblank(). Some drivers
@@ -865,10 +864,9 @@ EXPORT_SYMBOL(drm_crtc_vblank_helper_get_vblank_timestamp);
* True if timestamp is considered to be very precise, false otherwise.
*/
static bool
-drm_get_last_vbltimestamp(struct drm_device *dev, unsigned int pipe,
- ktime_t *tvblank, bool in_vblank_irq)
+drm_crtc_get_last_vbltimestamp(struct drm_crtc *crtc, ktime_t *tvblank,
+ bool in_vblank_irq)
{
- struct drm_crtc *crtc = drm_crtc_from_index(dev, pipe);
bool ret = false;

/* Define requested maximum error on timestamps (nanoseconds). */
@@ -876,8 +874,6 @@ drm_get_last_vbltimestamp(struct drm_device *dev, unsigned int pipe,

/* Query driver if possible and precision timestamping enabled. */
if (crtc && crtc->funcs->get_vblank_timestamp && max_error > 0) {
- struct drm_crtc *crtc = drm_crtc_from_index(dev, pipe);
-
ret = crtc->funcs->get_vblank_timestamp(crtc, &max_error,
tvblank, in_vblank_irq);
}
@@ -891,6 +887,15 @@ drm_get_last_vbltimestamp(struct drm_device *dev, unsigned int pipe,
return ret;
}

+static bool
+drm_get_last_vbltimestamp(struct drm_device *dev, unsigned int pipe,
+ ktime_t *tvblank, bool in_vblank_irq)
+{
+ struct drm_crtc *crtc = drm_crtc_from_index(dev, pipe);
+
+ return drm_crtc_get_last_vbltimestamp(crtc, tvblank, in_vblank_irq);
+}
+
/**
* drm_crtc_vblank_count - retrieve "cooked" vblank counter value
* @crtc: which counter to retrieve
@@ -980,6 +985,36 @@ u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
}
EXPORT_SYMBOL(drm_crtc_vblank_count_and_time);

+/**
+ * drm_crtc_next_vblank_start - calculate the time of the next vblank
+ * @crtc: the crtc for which to calculate next vblank time
+ * @vblanktime: pointer to time to receive the next vblank timestamp.
+ *
+ * Calculate the expected time of the start of the next vblank period,
+ * based on the time of the previous vblank and the frame duration.
+ */
+int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime)
+{
+ unsigned int pipe = drm_crtc_index(crtc);
+ struct drm_vblank_crtc *vblank = &crtc->dev->vblank[pipe];
+ struct drm_display_mode *mode = &vblank->hwmode;
+ u64 vblank_start;
+
+ if (!vblank->framedur_ns || !vblank->linedur_ns)
+ return -EINVAL;
+
+ if (!drm_crtc_get_last_vbltimestamp(crtc, vblanktime, false))
+ return -EINVAL;
+
+ vblank_start = DIV_ROUND_DOWN_ULL(
+ (u64)vblank->framedur_ns * mode->crtc_vblank_start,
+ mode->crtc_vtotal);
+ *vblanktime = ktime_add(*vblanktime, ns_to_ktime(vblank_start));
+
+ return 0;
+}
+EXPORT_SYMBOL(drm_crtc_next_vblank_start);
+
static void send_vblank_event(struct drm_device *dev,
struct drm_pending_vblank_event *e,
u64 seq, ktime_t now)
diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
index 733a3e2d1d10..7f3957943dd1 100644
--- a/include/drm/drm_vblank.h
+++ b/include/drm/drm_vblank.h
@@ -230,6 +230,7 @@ bool drm_dev_has_vblank(const struct drm_device *dev);
u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
ktime_t *vblanktime);
+int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime);
void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
struct drm_pending_vblank_event *e);
void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
--
2.39.2
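The arithmetic in drm_crtc_next_vblank_start() can be checked by hand. The helper treats the last vblank timestamp as the start of active scanout, so the start of the next vblank lies `framedur_ns * crtc_vblank_start / crtc_vtotal` later, rounded down. The C sketch below reproduces that outside the kernel. `next_vblank_start_ns()` and `frame_duration_ns()` are hypothetical helpers; the frame-duration formula is an assumption about how the kernel derives `framedur_ns` for progressive modes, and the worked numbers assume the standard CEA 1920x1080@60 timing (148.5 MHz pixel clock, htotal 2200, vtotal 1125, vblank starting at line 1080).

```c
#include <assert.h>
#include <stdint.h>

/* Frame duration in ns for a progressive mode: htotal * vtotal pixels
 * at clock_khz kHz (assumed to match the kernel's framedur_ns). */
static int64_t frame_duration_ns(uint32_t htotal, uint32_t vtotal, uint32_t clock_khz)
{
	if (clock_khz == 0)
		return 0;
	return (int64_t)(((uint64_t)htotal * vtotal * 1000000u) / clock_khz);
}

/* Hypothetical userspace mirror of drm_crtc_next_vblank_start().
 * last_vbl_ns: timestamp of the last vblank (start of active scanout).
 * Returns 0 and fills *next_ns, or -1 if the timing constants are unknown. */
static int next_vblank_start_ns(int64_t last_vbl_ns, int64_t framedur_ns,
				uint32_t vblank_start, uint32_t vtotal,
				int64_t *next_ns)
{
	uint64_t offset;

	if (framedur_ns <= 0 || vtotal == 0)
		return -1;

	/* Same rounding as DIV_ROUND_DOWN_ULL(): truncating division. */
	offset = ((uint64_t)framedur_ns * vblank_start) / vtotal;
	*next_ns = last_vbl_ns + (int64_t)offset;
	return 0;
}
```

For 1080p60 this gives `framedur_ns` = 16666666 and an offset of 15999999 ns, so a last vblank at 1.000000000 s puts the next vblank start at 1.015999999 s: the deadline lands roughly 0.67 ms before the end of the frame period, which is the time the vblank interval itself occupies.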


2023-03-08 15:55:15

by Rob Clark

Subject: [PATCH v10 09/15] drm/syncobj: Add deadline support for syncobj waits

From: Rob Clark <[email protected]>

Add a new flag to let userspace provide a deadline as a hint for syncobj
and timeline waits. This gives a hint to the driver signaling the
backing fences about how soon userspace needs it to complete work, so it
can adjust GPU frequency accordingly. An immediate deadline can be
given to provide something equivalent to i915 "wait boost".

v2: Use absolute u64 ns value for deadline hint, drop cap and driver
feature flag in favor of allowing count_handles==0 as a way for
userspace to probe kernel for support of new flag
v3: More verbose comments about UAPI

Signed-off-by: Rob Clark <[email protected]>
---
drivers/gpu/drm/drm_syncobj.c | 64 ++++++++++++++++++++++++++++-------
include/uapi/drm/drm.h | 17 ++++++++++
2 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0c2be8360525..a85e9464f07b 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -126,6 +126,11 @@
* synchronize between the two.
* This requirement is inherited from the Vulkan fence API.
*
+ * If &DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE is set, the ioctl will also set
+ * a fence deadline hint on the backing fences before waiting, to provide the
+ * fence signaler with an appropriate sense of urgency. The deadline is
+ * specified as an absolute &CLOCK_MONOTONIC value in units of ns.
+ *
* Similarly, &DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT takes an array of syncobj
* handles as well as an array of u64 points and does a host-side wait on all
* of syncobj fences at the given points simultaneously.
@@ -973,7 +978,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
uint32_t count,
uint32_t flags,
signed long timeout,
- uint32_t *idx)
+ uint32_t *idx,
+ ktime_t *deadline)
{
struct syncobj_wait_entry *entries;
struct dma_fence *fence;
@@ -1053,6 +1059,15 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
}

+ if (deadline) {
+ for (i = 0; i < count; ++i) {
+ fence = entries[i].fence;
+ if (!fence)
+ continue;
+ dma_fence_set_deadline(fence, *deadline);
+ }
+ }
+
do {
set_current_state(TASK_INTERRUPTIBLE);

@@ -1151,7 +1166,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
struct drm_file *file_private,
struct drm_syncobj_wait *wait,
struct drm_syncobj_timeline_wait *timeline_wait,
- struct drm_syncobj **syncobjs, bool timeline)
+ struct drm_syncobj **syncobjs, bool timeline,
+ ktime_t *deadline)
{
signed long timeout = 0;
uint32_t first = ~0;
@@ -1162,7 +1178,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
NULL,
wait->count_handles,
wait->flags,
- timeout, &first);
+ timeout, &first,
+ deadline);
if (timeout < 0)
return timeout;
wait->first_signaled = first;
@@ -1172,7 +1189,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
u64_to_user_ptr(timeline_wait->points),
timeline_wait->count_handles,
timeline_wait->flags,
- timeout, &first);
+ timeout, &first,
+ deadline);
if (timeout < 0)
return timeout;
timeline_wait->first_signaled = first;
@@ -1243,17 +1261,22 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
{
struct drm_syncobj_wait *args = data;
struct drm_syncobj **syncobjs;
+ unsigned possible_flags;
+ ktime_t t, *tp = NULL;
int ret = 0;

if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
return -EOPNOTSUPP;

- if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
- DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
+ possible_flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE;
+
+ if (args->flags & ~possible_flags)
return -EINVAL;

if (args->count_handles == 0)
- return -EINVAL;
+ return 0;

ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
@@ -1262,8 +1285,13 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
if (ret < 0)
return ret;

+ if (args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE) {
+ t = ns_to_ktime(args->deadline_ns);
+ tp = &t;
+ }
+
ret = drm_syncobj_array_wait(dev, file_private,
- args, NULL, syncobjs, false);
+ args, NULL, syncobjs, false, tp);

drm_syncobj_array_free(syncobjs, args->count_handles);

@@ -1276,18 +1304,23 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
{
struct drm_syncobj_timeline_wait *args = data;
struct drm_syncobj **syncobjs;
+ unsigned possible_flags;
+ ktime_t t, *tp = NULL;
int ret = 0;

if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
return -EOPNOTSUPP;

- if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
- DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
- DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE))
+ possible_flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE |
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE;
+
+ if (args->flags & ~possible_flags)
return -EINVAL;

if (args->count_handles == 0)
- return -EINVAL;
+ return 0;

ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
@@ -1296,8 +1329,13 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
if (ret < 0)
return ret;

+ if (args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE) {
+ t = ns_to_ktime(args->deadline_ns);
+ tp = &t;
+ }
+
ret = drm_syncobj_array_wait(dev, file_private,
- NULL, args, syncobjs, true);
+ NULL, args, syncobjs, true, tp);

drm_syncobj_array_free(syncobjs, args->count_handles);

diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 642808520d92..bff0509ac8b6 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -887,6 +887,7 @@ struct drm_syncobj_transfer {
#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2) /* wait for time point to become available */
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE (1 << 3) /* set fence deadline to deadline_ns */
struct drm_syncobj_wait {
__u64 handles;
/* absolute timeout */
@@ -895,6 +896,14 @@ struct drm_syncobj_wait {
__u32 flags;
__u32 first_signaled; /* only valid when not waiting all */
__u32 pad;
+ /**
+ * @deadline_ns - fence deadline hint
+ *
+ * Deadline hint, in absolute CLOCK_MONOTONIC, to set on backing
+ * fence(s) if the DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE flag is
+ * set.
+ */
+ __u64 deadline_ns;
};

struct drm_syncobj_timeline_wait {
@@ -907,6 +916,14 @@ struct drm_syncobj_timeline_wait {
__u32 flags;
__u32 first_signaled; /* only valid when not waiting all */
__u32 pad;
+ /**
+ * @deadline_ns - fence deadline hint
+ *
+ * Deadline hint, in absolute CLOCK_MONOTONIC, to set on backing
+ * fence(s) if the DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE flag is
+ * set.
+ */
+ __u64 deadline_ns;
};


--
2.39.2
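The UAPI change relies on two properties. First, `deadline_ns` is appended after the existing fields, so the old 32-byte layout of `struct drm_syncobj_wait` survives as a prefix (the DRM ioctl core zero-fills the tail when older userspace passes the smaller struct). Second, `count_handles == 0` now succeeds, which lets userspace probe for the new flag. The C sketch below uses a local mirror of the structure, `struct syncobj_wait_mirror`, plus hypothetical helpers `syncobj_wait_args()` and `syncobj_deadline_probe_args()`; the field list follows the upstream header, the flag values are copied from the patch, and no ioctl is issued here.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Local mirror of struct drm_syncobj_wait after this patch (illustration
 * only; real code should include drm.h instead). */
struct syncobj_wait_mirror {
	uint64_t handles;        /* user pointer to a uint32_t handle array */
	int64_t timeout_nsec;    /* absolute CLOCK_MONOTONIC timeout */
	uint32_t count_handles;
	uint32_t flags;
	uint32_t first_signaled; /* only valid when not waiting all */
	uint32_t pad;
	uint64_t deadline_ns;    /* new: absolute CLOCK_MONOTONIC deadline hint */
};

#define MIRROR_WAIT_FLAGS_WAIT_ALL      (1u << 0)
#define MIRROR_WAIT_FLAGS_WAIT_DEADLINE (1u << 3)

/* Appending the new field leaves the old 32-byte layout as a prefix. */
_Static_assert(offsetof(struct syncobj_wait_mirror, deadline_ns) == 32,
	       "old layout must be a prefix");
_Static_assert(sizeof(struct syncobj_wait_mirror) == 40, "new size");

static int64_t monotonic_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: wait-all arguments whose deadline hint and timeout
 * are both absolute CLOCK_MONOTONIC times derived from now_ns. */
static struct syncobj_wait_mirror
syncobj_wait_args(const uint32_t *handles, uint32_t count, int64_t now_ns,
		  int64_t deadline_in_ns, int64_t timeout_in_ns)
{
	struct syncobj_wait_mirror args = {
		.handles = (uint64_t)(uintptr_t)handles,
		.timeout_nsec = now_ns + timeout_in_ns,
		.count_handles = count,
		.flags = MIRROR_WAIT_FLAGS_WAIT_ALL | MIRROR_WAIT_FLAGS_WAIT_DEADLINE,
		.deadline_ns = (uint64_t)(now_ns + deadline_in_ns),
	};
	return args;
}

/* Probe arguments: no handles, only the new flag. A kernel with this patch
 * returns 0; an older kernel fails with -EINVAL (unknown flag). */
static struct syncobj_wait_mirror syncobj_deadline_probe_args(void)
{
	struct syncobj_wait_mirror args = {
		.flags = MIRROR_WAIT_FLAGS_WAIT_DEADLINE,
	};
	return args;
}
```

In real code both argument sets would be passed to `DRM_IOCTL_SYNCOBJ_WAIT` (for example through libdrm's `drmIoctl()`), with `now_ns` taken from `monotonic_now_ns()` just before the call.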


2023-03-08 15:55:15

by Rob Clark

Subject: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

From: Rob Clark <[email protected]>

For an atomic commit updating one or more CRTCs (ie. a pageflip), calculate
the time of the soonest next vblank, and inform the fence(s) of that deadline.

v2: Comment typo fix (danvet)
v3: If there are multiple CRTCs, consider the time of the soonest vblank

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
---
drivers/gpu/drm/drm_atomic_helper.c | 37 +++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index d579fd8f7cb8..28e3f2c8917e 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1511,6 +1511,41 @@ void drm_atomic_helper_commit_modeset_enables(struct drm_device *dev,
}
EXPORT_SYMBOL(drm_atomic_helper_commit_modeset_enables);

+/*
+ * For atomic updates, calculate the time of the soonest next vblank among the
+ * CRTCs in the update, and inform all the fences of the deadline.
+ */
+static void set_fence_deadline(struct drm_device *dev,
+ struct drm_atomic_state *state)
+{
+ struct drm_crtc *crtc;
+ struct drm_crtc_state *new_crtc_state;
+ struct drm_plane *plane;
+ struct drm_plane_state *new_plane_state;
+ ktime_t vbltime = 0;
+ int i;
+
+ for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
+ ktime_t v;
+
+ if (drm_crtc_next_vblank_start(crtc, &v))
+ continue;
+
+ if (!vbltime || ktime_before(v, vbltime))
+ vbltime = v;
+ }
+
+ /* If no CRTCs updated, then nothing to do: */
+ if (!vbltime)
+ return;
+
+ for_each_new_plane_in_state (state, plane, new_plane_state, i) {
+ if (!new_plane_state->fence)
+ continue;
+ dma_fence_set_deadline(new_plane_state->fence, vbltime);
+ }
+}
+
/**
* drm_atomic_helper_wait_for_fences - wait for fences stashed in plane state
* @dev: DRM device
@@ -1540,6 +1575,8 @@ int drm_atomic_helper_wait_for_fences(struct drm_device *dev,
struct drm_plane_state *new_plane_state;
int i, ret;

+ set_fence_deadline(dev, state);
+
for_each_new_plane_in_state(state, plane, new_plane_state, i) {
if (!new_plane_state->fence)
continue;
--
2.39.2
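The deadline selection in set_fence_deadline() reduces to: take the earliest next-vblank time over the CRTCs in the commit, skip any CRTC whose vblank timing is unknown, and do nothing if no CRTC produced a time. The C sketch below models that with hypothetical types (`struct crtc_vblank_result`, `struct plane_fence_model`) in place of the DRM iterators, and uses 0 as the "no deadline" sentinel exactly as the patch does with `vbltime`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CRTC result of drm_crtc_next_vblank_start(). */
struct crtc_vblank_result {
	int err;          /* non-zero: timing unknown, CRTC is skipped */
	int64_t next_ns;  /* start of next vblank, valid when err == 0 */
};

/* Earliest next-vblank time across CRTCs, or 0 if none was available. */
static int64_t soonest_vblank_ns(const struct crtc_vblank_result *r, size_t n)
{
	int64_t vbltime = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].err)
			continue;
		if (!vbltime || r[i].next_ns < vbltime)
			vbltime = r[i].next_ns;
	}
	return vbltime;
}

/* Hypothetical plane state: whether it has a fence, and the deadline set on it. */
struct plane_fence_model {
	int has_fence;
	int64_t deadline;
};

static void apply_deadline(struct plane_fence_model *planes, size_t n, int64_t vbltime)
{
	if (!vbltime)
		return;  /* no CRTC timing available: nothing to do */
	for (size_t i = 0; i < n; i++) {
		if (!planes[i].has_fence)
			continue;
		planes[i].deadline = vbltime;
	}
}
```

Using the soonest vblank is the conservative choice for multi-CRTC commits: a single shared deadline that is early enough for every display the commit touches.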


2023-03-08 15:55:15

by Rob Clark

Subject: [PATCH v10 12/15] drm/msm: Add deadline based boost support

From: Rob Clark <[email protected]>

Track the nearest deadline on a fence timeline and set a timer to expire
shortly before it, to trigger a boost if the fence has not yet been signaled.

v2: rebase

Signed-off-by: Rob Clark <[email protected]>
---
drivers/gpu/drm/msm/msm_fence.c | 74 +++++++++++++++++++++++++++++++++
drivers/gpu/drm/msm/msm_fence.h | 20 +++++++++
2 files changed, 94 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index 56641408ea74..51b461f32103 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -8,6 +8,35 @@

#include "msm_drv.h"
#include "msm_fence.h"
+#include "msm_gpu.h"
+
+static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
+{
+ struct msm_drm_private *priv = fctx->dev->dev_private;
+ return priv->gpu;
+}
+
+static enum hrtimer_restart deadline_timer(struct hrtimer *t)
+{
+ struct msm_fence_context *fctx = container_of(t,
+ struct msm_fence_context, deadline_timer);
+
+ kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
+
+ return HRTIMER_NORESTART;
+}
+
+static void deadline_work(struct kthread_work *work)
+{
+ struct msm_fence_context *fctx = container_of(work,
+ struct msm_fence_context, deadline_work);
+
+ /* If deadline fence has already passed, nothing to do: */
+ if (msm_fence_completed(fctx, fctx->next_deadline_fence))
+ return;
+
+ msm_devfreq_boost(fctx2gpu(fctx), 2);
+}


struct msm_fence_context *
@@ -36,6 +65,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
fctx->completed_fence = fctx->last_fence;
*fctx->fenceptr = fctx->last_fence;

+ hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+ fctx->deadline_timer.function = deadline_timer;
+
+ kthread_init_work(&fctx->deadline_work, deadline_work);
+
+ fctx->next_deadline = ktime_get();
+
return fctx;
}

@@ -62,6 +98,8 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
spin_lock_irqsave(&fctx->spinlock, flags);
if (fence_after(fence, fctx->completed_fence))
fctx->completed_fence = fence;
+ if (msm_fence_completed(fctx, fctx->next_deadline_fence))
+ hrtimer_cancel(&fctx->deadline_timer);
spin_unlock_irqrestore(&fctx->spinlock, flags);
}

@@ -92,10 +130,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
return msm_fence_completed(f->fctx, f->base.seqno);
}

+static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+ struct msm_fence *f = to_msm_fence(fence);
+ struct msm_fence_context *fctx = f->fctx;
+ unsigned long flags;
+ ktime_t now;
+
+ spin_lock_irqsave(&fctx->spinlock, flags);
+ now = ktime_get();
+
+ if (ktime_after(now, fctx->next_deadline) ||
+ ktime_before(deadline, fctx->next_deadline)) {
+ fctx->next_deadline = deadline;
+ fctx->next_deadline_fence =
+ max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
+
+ /*
+ * Set timer to trigger boost 3ms before deadline, or
+ * if we are already less than 3ms before the deadline
+ * schedule boost work immediately.
+ */
+ deadline = ktime_sub(deadline, ms_to_ktime(3));
+
+ if (ktime_after(now, deadline)) {
+ kthread_queue_work(fctx2gpu(fctx)->worker,
+ &fctx->deadline_work);
+ } else {
+ hrtimer_start(&fctx->deadline_timer, deadline,
+ HRTIMER_MODE_ABS);
+ }
+ }
+
+ spin_unlock_irqrestore(&fctx->spinlock, flags);
+}
+
static const struct dma_fence_ops msm_fence_ops = {
.get_driver_name = msm_fence_get_driver_name,
.get_timeline_name = msm_fence_get_timeline_name,
.signaled = msm_fence_signaled,
+ .set_deadline = msm_fence_set_deadline,
};

struct dma_fence *
diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
index 7f1798c54cd1..cdaebfb94f5c 100644
--- a/drivers/gpu/drm/msm/msm_fence.h
+++ b/drivers/gpu/drm/msm/msm_fence.h
@@ -52,6 +52,26 @@ struct msm_fence_context {
volatile uint32_t *fenceptr;

spinlock_t spinlock;
+
+ /*
+ * TODO this doesn't really deal with multiple deadlines, like
+ * if userspace got multiple frames ahead.. OTOH atomic updates
+ * don't queue, so maybe that is ok
+ */
+
+ /** next_deadline: Time of next deadline */
+ ktime_t next_deadline;
+
+ /**
+ * next_deadline_fence:
+ *
+ * Fence value for next pending deadline. The deadline timer is
+ * canceled when this fence is signaled.
+ */
+ uint32_t next_deadline_fence;
+
+ struct hrtimer deadline_timer;
+ struct kthread_work deadline_work;
};

struct msm_fence_context * msm_fence_context_alloc(struct drm_device *dev,
--
2.39.2
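The decision msm_fence_set_deadline() makes under `fctx->spinlock` can be isolated as a pure function. A new deadline replaces the tracked one if the tracked one has already passed or the new one is earlier. The boost is then either queued immediately (less than 3 ms left) or armed on the hrtimer for 3 ms before the deadline. The C sketch below is a hypothetical model of that logic: names are invented, integer nanoseconds replace `ktime_t`, and the hrtimer and kthread work are replaced by a returned action.

```c
#include <assert.h>
#include <stdint.h>

#define BOOST_MARGIN_NS 3000000LL  /* boost 3 ms before the deadline */

enum boost_action { BOOST_NONE, BOOST_NOW, BOOST_TIMER };

struct deadline_model {
	int64_t next_deadline;        /* fctx->next_deadline */
	uint32_t next_deadline_fence; /* fctx->next_deadline_fence */
	int64_t timer_expires;        /* where deadline_timer would be armed */
};

static enum boost_action model_set_deadline(struct deadline_model *m, int64_t now,
					    int64_t deadline, uint32_t seqno)
{
	int64_t boost_at;

	/* Keep the tracked deadline unless it already passed or the new one is earlier. */
	if (!(now > m->next_deadline || deadline < m->next_deadline))
		return BOOST_NONE;

	m->next_deadline = deadline;
	if (seqno > m->next_deadline_fence)
		m->next_deadline_fence = seqno;

	boost_at = deadline - BOOST_MARGIN_NS;
	if (now > boost_at)
		return BOOST_NOW;   /* under 3 ms left: queue deadline_work now */

	m->timer_expires = boost_at;
	return BOOST_TIMER;         /* arm deadline_timer at boost_at */
}
```

As the TODO in msm_fence.h notes, only one deadline is tracked per timeline: a later deadline arriving while an earlier one is still pending is simply ignored.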


2023-03-08 15:55:28

by Rob Clark

Subject: [PATCH v10 13/15] drm/msm: Add wait-boost support

From: Rob Clark <[email protected]>

Add a way for various userspace waits to signal urgency.

Signed-off-by: Rob Clark <[email protected]>
---
drivers/gpu/drm/msm/msm_drv.c | 12 ++++++++----
drivers/gpu/drm/msm/msm_gem.c | 5 +++++
include/uapi/drm/msm_drm.h | 14 ++++++++++++--
3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index aca48c868c14..f6764a86b2da 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -46,6 +46,7 @@
* - 1.8.0 - Add MSM_BO_CACHED_COHERENT for supported GPUs (a6xx)
* - 1.9.0 - Add MSM_SUBMIT_FENCE_SN_IN
* - 1.10.0 - Add MSM_SUBMIT_BO_NO_IMPLICIT
+ * - 1.11.0 - Add wait boost (MSM_WAIT_FENCE_BOOST, MSM_PREP_BOOST)
*/
#define MSM_VERSION_MAJOR 1
#define MSM_VERSION_MINOR 10
@@ -899,7 +900,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
}

static int wait_fence(struct msm_gpu_submitqueue *queue, uint32_t fence_id,
- ktime_t timeout)
+ ktime_t timeout, uint32_t flags)
{
struct dma_fence *fence;
int ret;
@@ -929,6 +930,9 @@ static int wait_fence(struct msm_gpu_submitqueue *queue, uint32_t fence_id,
if (!fence)
return 0;

+ if (flags & MSM_WAIT_FENCE_BOOST)
+ dma_fence_set_deadline(fence, ktime_get());
+
ret = dma_fence_wait_timeout(fence, true, timeout_to_jiffies(&timeout));
if (ret == 0) {
ret = -ETIMEDOUT;
@@ -949,8 +953,8 @@ static int msm_ioctl_wait_fence(struct drm_device *dev, void *data,
struct msm_gpu_submitqueue *queue;
int ret;

- if (args->pad) {
- DRM_ERROR("invalid pad: %08x\n", args->pad);
+ if (args->flags & ~MSM_WAIT_FENCE_FLAGS) {
+ DRM_ERROR("invalid flags: %08x\n", args->flags);
return -EINVAL;
}

@@ -961,7 +965,7 @@ static int msm_ioctl_wait_fence(struct drm_device *dev, void *data,
if (!queue)
return -ENOENT;

- ret = wait_fence(queue, args->fence, to_ktime(args->timeout));
+ ret = wait_fence(queue, args->fence, to_ktime(args->timeout), args->flags);

msm_submitqueue_put(queue);

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 1dee0d18abbb..dd4a0d773f6e 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -846,6 +846,11 @@ int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op, ktime_t *timeout)
op & MSM_PREP_NOSYNC ? 0 : timeout_to_jiffies(timeout);
long ret;

+ if (op & MSM_PREP_BOOST) {
+ dma_resv_set_deadline(obj->resv, dma_resv_usage_rw(write),
+ ktime_get());
+ }
+
ret = dma_resv_wait_timeout(obj->resv, dma_resv_usage_rw(write),
true, remain);
if (ret == 0)
diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
index 329100016e7c..dbf0d6f43fa9 100644
--- a/include/uapi/drm/msm_drm.h
+++ b/include/uapi/drm/msm_drm.h
@@ -151,8 +151,13 @@ struct drm_msm_gem_info {
#define MSM_PREP_READ 0x01
#define MSM_PREP_WRITE 0x02
#define MSM_PREP_NOSYNC 0x04
+#define MSM_PREP_BOOST 0x08

-#define MSM_PREP_FLAGS (MSM_PREP_READ | MSM_PREP_WRITE | MSM_PREP_NOSYNC)
+#define MSM_PREP_FLAGS (MSM_PREP_READ | \
+ MSM_PREP_WRITE | \
+ MSM_PREP_NOSYNC | \
+ MSM_PREP_BOOST | \
+ 0)

struct drm_msm_gem_cpu_prep {
__u32 handle; /* in */
@@ -286,6 +291,11 @@ struct drm_msm_gem_submit {

};

+#define MSM_WAIT_FENCE_BOOST 0x00000001
+#define MSM_WAIT_FENCE_FLAGS ( \
+ MSM_WAIT_FENCE_BOOST | \
+ 0)
+
/* The normal way to synchronize with the GPU is just to CPU_PREP on
* a buffer if you need to access it from the CPU (other cmdstream
* submission from same or other contexts, PAGE_FLIP ioctl, etc, all
@@ -295,7 +305,7 @@ struct drm_msm_gem_submit {
*/
struct drm_msm_wait_fence {
__u32 fence; /* in */
- __u32 pad;
+ __u32 flags; /* in, bitmask of MSM_WAIT_FENCE_x */
struct drm_msm_timespec timeout; /* in */
__u32 queueid; /* in, submitqueue id */
};
--
2.39.2
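Turning the former `pad` field into `flags` is ABI-safe because the old kernel rejected any non-zero `pad`, so existing userspace always passes 0. The new kernel keeps rejecting every bit outside MSM_WAIT_FENCE_FLAGS, leaving room for more flags later; the trailing `| 0` in the mask macros just lets each flag sit on its own line. A minimal C sketch of that check follows; `check_wait_fence_flags()` is a hypothetical helper and the flag values are copied from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MSM_WAIT_FENCE_BOOST 0x00000001
#define MSM_WAIT_FENCE_FLAGS ( \
		MSM_WAIT_FENCE_BOOST | \
		0)

/* Hypothetical helper mirroring the check in msm_ioctl_wait_fence():
 * returns 0 for known flags and -EINVAL if any unknown bit is set.
 * *boost tells the caller whether to set an immediate deadline
 * (the patch passes ktime_get() to dma_fence_set_deadline()). */
static int check_wait_fence_flags(uint32_t flags, int *boost)
{
	if (flags & ~(uint32_t)MSM_WAIT_FENCE_FLAGS)
		return -EINVAL;
	*boost = !!(flags & MSM_WAIT_FENCE_BOOST);
	return 0;
}
```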


2023-03-08 15:55:36

by Rob Clark

Subject: [PATCH v10 14/15] drm/msm/atomic: Switch to vblank_start helper

From: Rob Clark <[email protected]>

Drop our custom thing and switch to drm_crtc_next_vblank_start() for
calculating the time of the start of the next vblank period.

Signed-off-by: Rob Clark <[email protected]>
---
drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 ---------------
drivers/gpu/drm/msm/msm_atomic.c | 8 +++++---
drivers/gpu/drm/msm/msm_kms.h | 8 --------
3 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index a683bd9b5a04..43996aecaf8c 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -411,20 +411,6 @@ static void dpu_kms_disable_commit(struct msm_kms *kms)
pm_runtime_put_sync(&dpu_kms->pdev->dev);
}

-static ktime_t dpu_kms_vsync_time(struct msm_kms *kms, struct drm_crtc *crtc)
-{
- struct drm_encoder *encoder;
-
- drm_for_each_encoder_mask(encoder, crtc->dev, crtc->state->encoder_mask) {
- ktime_t vsync_time;
-
- if (dpu_encoder_vsync_time(encoder, &vsync_time) == 0)
- return vsync_time;
- }
-
- return ktime_get();
-}
-
static void dpu_kms_prepare_commit(struct msm_kms *kms,
struct drm_atomic_state *state)
{
@@ -953,7 +939,6 @@ static const struct msm_kms_funcs kms_funcs = {
.irq = dpu_core_irq,
.enable_commit = dpu_kms_enable_commit,
.disable_commit = dpu_kms_disable_commit,
- .vsync_time = dpu_kms_vsync_time,
.prepare_commit = dpu_kms_prepare_commit,
.flush_commit = dpu_kms_flush_commit,
.wait_flush = dpu_kms_wait_flush,
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
index 1686fbb611fd..c5e71c05f038 100644
--- a/drivers/gpu/drm/msm/msm_atomic.c
+++ b/drivers/gpu/drm/msm/msm_atomic.c
@@ -186,8 +186,7 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)
struct msm_kms *kms = priv->kms;
struct drm_crtc *async_crtc = NULL;
unsigned crtc_mask = get_crtc_mask(state);
- bool async = kms->funcs->vsync_time &&
- can_do_async(state, &async_crtc);
+ bool async = can_do_async(state, &async_crtc);

trace_msm_atomic_commit_tail_start(async, crtc_mask);

@@ -231,7 +230,9 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)

kms->pending_crtc_mask |= crtc_mask;

- vsync_time = kms->funcs->vsync_time(kms, async_crtc);
+ if (drm_crtc_next_vblank_start(async_crtc, &vsync_time))
+ goto fallback;
+
wakeup_time = ktime_sub(vsync_time, ms_to_ktime(1));

msm_hrtimer_queue_work(&timer->work, wakeup_time,
@@ -253,6 +254,7 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)
return;
}

+fallback:
/*
* If there is any async flush pending on updated crtcs, fold
* them into the current flush.
diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
index f8ed7588928c..086a3f1ff956 100644
--- a/drivers/gpu/drm/msm/msm_kms.h
+++ b/drivers/gpu/drm/msm/msm_kms.h
@@ -59,14 +59,6 @@ struct msm_kms_funcs {
void (*enable_commit)(struct msm_kms *kms);
void (*disable_commit)(struct msm_kms *kms);

- /**
- * If the kms backend supports async commit, it should implement
- * this method to return the time of the next vsync. This is
- * used to determine a time slightly before vsync, for the async
- * commit timer to run and complete an async commit.
- */
- ktime_t (*vsync_time)(struct msm_kms *kms, struct drm_crtc *crtc);
-
/**
* Prepare for atomic commit. This is called after any previous
* (async or otherwise) commit has completed.
--
2.39.2


2023-03-08 15:56:14

by Rob Clark

Subject: [PATCH v10 15/15] drm/i915: Add deadline based boost support

From: Rob Clark <[email protected]>

I expect this patch to be replaced by someone who knows i915 better.

Signed-off-by: Rob Clark <[email protected]>
---
drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
return i915_request_enable_breadcrumb(to_request(fence));
}

+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+ struct i915_request *rq = to_request(fence);
+
+ if (i915_request_completed(rq))
+ return;
+
+ if (i915_request_started(rq))
+ return;
+
+ /*
+ * TODO something more clever for deadlines that are in the
+ * future. I think probably track the nearest deadline in
+ * rq->timeline and set timer to trigger boost accordingly?
+ */
+
+ intel_rps_boost(rq);
+}
+
static signed long i915_fence_wait(struct dma_fence *fence,
bool interruptible,
signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
.signaled = i915_fence_signaled,
.wait = i915_fence_wait,
.release = i915_fence_release,
+ .set_deadline = i915_fence_set_deadline,
};

static void irq_execute_cb(struct irq_work *wrk)
--
2.39.2


2023-03-09 10:21:59

by Pekka Paalanen

Subject: Re: [PATCH v10 00/15] dma-fence: Deadline awareness

On Wed, 8 Mar 2023 07:52:51 -0800
Rob Clark <[email protected]> wrote:

> From: Rob Clark <[email protected]>
>
> This series adds a deadline hint to fences, so realtime deadlines
> such as vblank can be communicated to the fence signaller for power/
> frequency management decisions.
>
> This is partially inspired by a trick i915 does, but implemented
> via dma-fence for a couple of reasons:
>
> 1) To continue to be able to use the atomic helpers
> 2) To support cases where display and gpu are different drivers
>
> This iteration adds a dma-fence ioctl to set a deadline (both to
> support igt-tests, and compositors which delay decisions about which
> client buffer to display), and a sw_sync ioctl to read back the
> deadline. IGT tests utilizing these can be found at:
>
> https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline
>
>
> v1: https://patchwork.freedesktop.org/series/93035/
> v2: Move filtering out of later deadlines to fence implementation
> to avoid increasing the size of dma_fence
> v3: Add support in fence-array and fence-chain; Add some uabi to
> support igt tests and userspace compositors.
> v4: Rebase, address various comments, and add syncobj deadline
> support, and sync_file EPOLLPRI based on experience with perf/
> freq issues with clvk compute workloads on i915 (anv)
> v5: Clarify that this is a hint as opposed to a more hard deadline
> guarantee, switch to using u64 ns values in UABI (still absolute
> CLOCK_MONOTONIC values), drop syncobj related cap and driver
> feature flag in favor of allowing count_handles==0 for probing
> kernel support.
> v6: Re-work vblank helper to calculate time of _start_ of vblank,
> and work correctly if the last vblank event was more than a
> frame ago. Add (mostly unrelated) drm/msm patch which also
> uses the vblank helper. Use dma_fence_chain_contained(). More
> verbose syncobj UABI comments. Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> v7: Fix kbuild complaints about vblank helper. Add more docs.
> v8: Add patch to surface sync_file UAPI, and more docs updates.
> v9: Drop (E)POLLPRI support.. I still like it, but not essential and
> it can always be revived later. Fix doc build warning.
> v10: Update 11/15 to handle multiple CRTCs

Hi Rob,

it is very nice to keep revision numbers and list the changes in each
patch. If I looked at series v8 last, and I now see series v10, and I
look at a patch that lists changes done in v7, how do I know if that
change was made between series v8 and v10 or earlier?

At least in some previous revision, the series might have been v8 while a
patch had its new changes listed as v5 (because it was the 5th time that
one patch was changed) instead of v8.

Am I expected to keep track of vN of each individual patch
independently?


Thanks,
pq



2023-03-10 15:53:41

by Jonas Ådahl

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> From: Rob Clark <[email protected]>
>
> Add a way to hint to the fence signaler of an upcoming deadline, such as
> vblank, which the fence waiter would prefer not to miss. This is to aid
> the fence signaler in making power management decisions, like boosting
> frequency as the deadline approaches and awareness of missing deadlines
> so that can be factored in to the frequency scaling.
>
> v2: Drop dma_fence::deadline and related logic to filter duplicate
> deadlines, to avoid increasing dma_fence size. The fence-context
> implementation will need similar logic to track deadlines of all
> the fences on the same timeline. [ckoenig]
> v3: Clarify locking wrt. set_deadline callback
> v4: Clarify in docs comment that this is a hint
> v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> v6: More docs
> v7: Fix typo, clarify past deadlines
>
> Signed-off-by: Rob Clark <[email protected]>
> Reviewed-by: Christian König <[email protected]>
> Acked-by: Pekka Paalanen <[email protected]>
> Reviewed-by: Bagas Sanjaya <[email protected]>
> ---

Hi Rob!

> Documentation/driver-api/dma-buf.rst | 6 +++
> drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> include/linux/dma-fence.h | 22 +++++++++++
> 3 files changed, 87 insertions(+)
>
> diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> index 622b8156d212..183e480d8cea 100644
> --- a/Documentation/driver-api/dma-buf.rst
> +++ b/Documentation/driver-api/dma-buf.rst
> @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> .. kernel-doc:: drivers/dma-buf/dma-fence.c
> :doc: fence signalling annotation
>
> +DMA Fence Deadline Hints
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> + :doc: deadline hints
> +
> DMA Fences Functions Reference
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 0de0482cd36e..f177c56269bb 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> }
> EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>
> +/**
> + * DOC: deadline hints
> + *
> + * In an ideal world, it would be possible to pipeline a workload sufficiently
> + * that a utilization based device frequency governor could arrive at a minimum
> + * frequency that meets the requirements of the use-case, in order to minimize
> + * power consumption. But in the real world there are many workloads which
> + * defy this ideal. For example, but not limited to:
> + *
> + * * Workloads that ping-pong between device and CPU, with alternating periods
> + * of CPU waiting for device, and device waiting on CPU. This can result in
> + * devfreq and cpufreq seeing idle time in their respective domains and in
> + * result reduce frequency.
> + *
> + * * Workloads that interact with a periodic time based deadline, such as double
> + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> + * missing a vblank deadline results in an *increase* in idle time on the GPU
> + * (since it has to wait an additional vblank period), sending a signal to
> + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> + * needed.

This is the use case where I'd like a better understanding of how
this series intends to work, as the problematic scheduling behavior
triggered by missed deadlines has plagued compositing display servers
for a long time.

I apologize, I'm not a GPU driver developer, nor an OpenGL driver
developer, so I will need some hand holding when it comes to
understanding exactly what piece of software is responsible for
communicating what piece of information.

> + *
> + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> + * The deadline hint provides a way for the waiting driver, or userspace, to
> + * convey an appropriate sense of urgency to the signaling driver.
> + *
> + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> + * facing APIs). The time could either be some point in the future (such as
> + * the vblank based deadline for page-flipping, or the start of a compositor's
> + * composition cycle), or the current time to indicate an immediate deadline
> + * hint (Ie. forward progress cannot be made until this fence is signaled).

Is it guaranteed that a GPU driver will use the actual start of the
vblank as the effective deadline? I have some memories of seeing
something about vblank evasion while browsing driver code, which I might
have misunderstood, but I have yet to find out whether this is something
userspace can actually rely on.

Can userspace set a deadline that targets the next vblank deadline
before GPU work has been flushed e.g. at the start of a paint cycle, and
still be sure that the kernel has the information it needs to know it should
increase its clock speed in time for when the work is actually
flushed? Or does this deadline need to be set at the end?

What I'm more or less trying to ask is, will a mode setting compositor
be able to tell the kernel to boost its clocks at the time it knows is
best, and how will it in practice achieve this?

For example, relying on the atomic mode setting commit to set the
deadline is fundamentally flawed, since userspace will at times want to
purposefully delay committing until as late as possible, without that
increasing the risk of missing the deadline because the kernel did not
speed up clocks at the right time for GPU work that was already
flushed long ago.

Relying on commits also has no effect on GPU work queued by
a compositor drawing only to dma-bufs that are never intended to be
presented using mode setting. How can we make sure a compositor can
provide hints that the kernel will know to respect despite the
compositor not being drm master?


Jonas

> + *
> + * Multiple deadlines may be set on a given fence, even in parallel. See the
> + * documentation for &dma_fence_ops.set_deadline.
> + *
> + * The deadline hint is just that, a hint. The driver that created the fence
> + * may react by increasing frequency, making different scheduling choices, etc.
> + * Or doing nothing at all.
> + */
> +
> +/**
> + * dma_fence_set_deadline - set desired fence-wait deadline hint
> + * @fence: the fence that is to be waited on
> + * @deadline: the time by which the waiter hopes for the fence to be
> + * signaled
> + *
> + * Give the fence signaler a hint about an upcoming deadline, such as
> + * vblank, by which point the waiter would prefer the fence to be
> + * signaled by. This is intended to give feedback to the fence signaler
> + * to aid in power management decisions, such as boosting GPU frequency
> + * if a periodic vblank deadline is approaching but the fence is not
> + * yet signaled..
> + */
> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> +{
> + if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> + fence->ops->set_deadline(fence, deadline);
> +}
> +EXPORT_SYMBOL(dma_fence_set_deadline);
> +
> /**
> * dma_fence_describe - Dump fence describtion into seq_file
> * @fence: the 6fence to describe
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 775cdc0b4f24..d54b595a0fe0 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -257,6 +257,26 @@ struct dma_fence_ops {
> */
> void (*timeline_value_str)(struct dma_fence *fence,
> char *str, int size);
> +
> + /**
> + * @set_deadline:
> + *
> + * Callback to allow a fence waiter to inform the fence signaler of
> + * an upcoming deadline, such as vblank, by which point the waiter
> + * would prefer the fence to be signaled by. This is intended to
> + * give feedback to the fence signaler to aid in power management
> + * decisions, such as boosting GPU frequency.
> + *
> + * This is called without &dma_fence.lock held, it can be called
> + * multiple times and from any context. Locking is up to the callee
> + * if it has some state to manage. If multiple deadlines are set,
> + * the expectation is to track the soonest one. If the deadline is
> + * before the current time, it should be interpreted as an immediate
> + * deadline.
> + *
> + * This callback is optional.
> + */
> + void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> };
>
> void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> @@ -583,6 +603,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
> return ret < 0 ? ret : 0;
> }
>
> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
> +
> struct dma_fence *dma_fence_get_stub(void);
> struct dma_fence *dma_fence_allocate_private_stub(void);
> u64 dma_fence_context_alloc(unsigned num);
> --
> 2.39.2
>
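The @set_deadline contract quoted above (callable repeatedly and from any context without &dma_fence.lock held, track the soonest deadline, treat a past deadline as immediate) can be modeled in plain C as below. This is a sketch under stated assumptions: struct hyp_timeline and hyp_timeline_set_deadline() are hypothetical, and a pthread mutex stands in for the driver's own lock. In a kernel driver that lock would need to be a spinlock, since the callback may be invoked from atomic context.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-timeline state a fence signaler might keep. */
struct hyp_timeline {
	pthread_mutex_t lock;   /* stand-in for the driver's own lock */
	bool has_deadline;
	int64_t deadline_ns;    /* soonest deadline seen so far */
};

/*
 * Model of a set_deadline callback. It takes its own lock because it is
 * called without the fence lock, keeps only the soonest deadline, and
 * clamps a deadline in the past to "now" (an immediate deadline).
 * Returns true if the tracked deadline moved earlier, i.e. the driver
 * may want to re-evaluate its clocks.
 */
static bool hyp_timeline_set_deadline(struct hyp_timeline *tl,
				      int64_t deadline_ns, int64_t now_ns)
{
	bool earlier = false;

	if (deadline_ns < now_ns)
		deadline_ns = now_ns;

	pthread_mutex_lock(&tl->lock);
	if (!tl->has_deadline || deadline_ns < tl->deadline_ns) {
		tl->deadline_ns = deadline_ns;
		tl->has_deadline = true;
		earlier = true;
	}
	pthread_mutex_unlock(&tl->lock);

	return earlier;
}
```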

2023-03-10 17:39:19

by Rob Clark

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
>
> On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > From: Rob Clark <[email protected]>
> >
> > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > vblank, which the fence waiter would prefer not to miss. This is to aid
> > the fence signaler in making power management decisions, like boosting
> > frequency as the deadline approaches and awareness of missing deadlines
> > so that can be factored in to the frequency scaling.
> >
> > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > deadlines, to avoid increasing dma_fence size. The fence-context
> > implementation will need similar logic to track deadlines of all
> > the fences on the same timeline. [ckoenig]
> > v3: Clarify locking wrt. set_deadline callback
> > v4: Clarify in docs comment that this is a hint
> > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > v6: More docs
> > v7: Fix typo, clarify past deadlines
> >
> > Signed-off-by: Rob Clark <[email protected]>
> > Reviewed-by: Christian König <[email protected]>
> > Acked-by: Pekka Paalanen <[email protected]>
> > Reviewed-by: Bagas Sanjaya <[email protected]>
> > ---
>
> Hi Rob!
>
> > Documentation/driver-api/dma-buf.rst | 6 +++
> > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > include/linux/dma-fence.h | 22 +++++++++++
> > 3 files changed, 87 insertions(+)
> >
> > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > index 622b8156d212..183e480d8cea 100644
> > --- a/Documentation/driver-api/dma-buf.rst
> > +++ b/Documentation/driver-api/dma-buf.rst
> > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > :doc: fence signalling annotation
> >
> > +DMA Fence Deadline Hints
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > + :doc: deadline hints
> > +
> > DMA Fences Functions Reference
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 0de0482cd36e..f177c56269bb 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > }
> > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> >
> > +/**
> > + * DOC: deadline hints
> > + *
> > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > + * that a utilization based device frequency governor could arrive at a minimum
> > + * frequency that meets the requirements of the use-case, in order to minimize
> > + * power consumption. But in the real world there are many workloads which
> > + * defy this ideal. For example, but not limited to:
> > + *
> > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > + * result reduce frequency.
> > + *
> > + * * Workloads that interact with a periodic time based deadline, such as double
> > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > + * (since it has to wait an additional vblank period), sending a signal to
> > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > + * needed.
>
> This is the use case where I'd like a better understanding of how
> this series intends to work, as the problematic scheduling behavior
> triggered by missed deadlines has plagued compositing display servers
> for a long time.
>
> I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> developer, so I will need some hand holding when it comes to
> understanding exactly what piece of software is responsible for
> communicating what piece of information.
>
> > + *
> > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > + * convey an appropriate sense of urgency to the signaling driver.
> > + *
> > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > + * facing APIs). The time could either be some point in the future (such as
> > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > + * composition cycle), or the current time to indicate an immediate deadline
> > + * hint (Ie. forward progress cannot be made until this fence is signaled).
>
> Is it guaranteed that a GPU driver will use the actual start of the
> vblank as the effective deadline? I have some memories of seeing
> something about vblank evasion while browsing driver code, which I might
> have misunderstood, but I have yet to find out whether this is something
> userspace can actually rely on.

I guess you mean s/GPU driver/display driver/ ? It makes things more
clear if we talk about them separately even if they happen to be the
same device.

Assuming that is what you mean, nothing strongly defines what the
deadline is. In practice there is probably some buffering in the
display controller. For example, with block-based (including
bandwidth-compressed) formats, you need to buffer up a row of blocks to
efficiently linearize for scanout. So you probably need to latch some
time before you start sending pixel data to the display. But details
like this are heavily implementation dependent. I think the most
reasonable thing to target is start of vblank.
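To make "target start of vblank" concrete, the arithmetic for finding the next start of vblank from the last vblank timestamp might look like the sketch below. The cover letter notes that v6 reworked the series' vblank helper to cope with a last vblank event that is more than a frame old, and the whole-frame division here covers that case. hyp_next_vblank_start_ns() is a hypothetical illustration, not the DRM helper, and it ignores any latching margin the display controller may need.

```c
#include <stdint.h>

/*
 * Hypothetical helper: start of the next vblank strictly after now_ns,
 * given the timestamp of the last observed vblank start and the frame
 * duration. Still correct if several frames have elapsed since the last
 * vblank event.
 */
static int64_t hyp_next_vblank_start_ns(int64_t last_vblank_ns,
					int64_t frame_ns, int64_t now_ns)
{
	if (now_ns < last_vblank_ns)
		return last_vblank_ns;

	/* Whole frames elapsed since the last vblank, plus one. */
	int64_t frames = (now_ns - last_vblank_ns) / frame_ns + 1;

	return last_vblank_ns + frames * frame_ns;
}
```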

Also, keep in mind the deadline hint is just that. It won't magically
make the GPU finish by that deadline, but it gives the GPU driver
information about lateness so it can realize if it needs to clock up.

> Can userspace set a deadline that targets the next vblank deadline
> before GPU work has been flushed e.g. at the start of a paint cycle, and
> still be sure that the kernel has the information it needs to know it should
> increase its clock speed in time for when the work is actually
> flushed? Or does this deadline need to be set at the end?

You need a fence to set the deadline, and for that, the work needs to be
flushed. But you can't associate a deadline with work that the kernel
is unaware of anyway.

> What I'm more or less trying to ask is, will a mode setting compositor
> be able to tell the kernel to boost its clocks at the time it knows is
> best, and how will it in practice achieve this?

The anticipated usage for a compositor is that, when you receive a
<buf, fence> pair from an app, you immediately set a deadline for
upcoming start-of-vblank on the fence fd passed from the app. (Or for
implicit sync you can use DMA_BUF_IOCTL_EXPORT_SYNC_FILE). For the
composite step, no need to set a deadline as this is already done on
the kernel side in drm_atomic_helper_wait_for_fences().

> For example, relying on the atomic mode setting commit to set the
> deadline is fundamentally flawed, since userspace will at times want to
> purposefully delay committing until as late as possible, without that
> increasing the risk of missing the deadline because the kernel did not
> speed up clocks at the right time for GPU work that was already
> flushed long ago.

Right, this is the point for exposing the ioctl to userspace.

> Relying on commits also has no effect on GPU work queued by
> a compositor drawing only to dma-bufs that are never intended to be
> presented using mode setting. How can we make sure a compositor can
> provide hints that the kernel will know to respect despite the
> compositor not being drm master?

It doesn't matter if there are indirect dependencies. Even if the
compositor completely ignores deadline hints and fancy tricks like
delaying composite decisions, the indirect dependency (app rendering)
will delay the direct dependency (compositor rendering) of the page
flip. So the driver will still see whether it is late or early
compared to the deadline, allowing it to adjust freq in the
appropriate direction for the next frame.

BR,
-R

>
> Jonas
>
> > + *
> > + * Multiple deadlines may be set on a given fence, even in parallel. See the
> > + * documentation for &dma_fence_ops.set_deadline.
> > + *
> > + * The deadline hint is just that, a hint. The driver that created the fence
> > + * may react by increasing frequency, making different scheduling choices, etc.
> > + * Or doing nothing at all.
> > + */
> > +
> > +/**
> > + * dma_fence_set_deadline - set desired fence-wait deadline hint
> > + * @fence: the fence that is to be waited on
> > + * @deadline: the time by which the waiter hopes for the fence to be
> > + * signaled
> > + *
> > + * Give the fence signaler a hint about an upcoming deadline, such as
> > + * vblank, by which point the waiter would prefer the fence to be
> > + * signaled by. This is intended to give feedback to the fence signaler
> > + * to aid in power management decisions, such as boosting GPU frequency
> > + * if a periodic vblank deadline is approaching but the fence is not
> > + * yet signaled..
> > + */
> > +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> > +{
> > + if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> > + fence->ops->set_deadline(fence, deadline);
> > +}
> > +EXPORT_SYMBOL(dma_fence_set_deadline);
> > +
> > /**
> > * dma_fence_describe - Dump fence describtion into seq_file
> > * @fence: the 6fence to describe
> > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > index 775cdc0b4f24..d54b595a0fe0 100644
> > --- a/include/linux/dma-fence.h
> > +++ b/include/linux/dma-fence.h
> > @@ -257,6 +257,26 @@ struct dma_fence_ops {
> > */
> > void (*timeline_value_str)(struct dma_fence *fence,
> > char *str, int size);
> > +
> > + /**
> > + * @set_deadline:
> > + *
> > + * Callback to allow a fence waiter to inform the fence signaler of
> > + * an upcoming deadline, such as vblank, by which point the waiter
> > + * would prefer the fence to be signaled by. This is intended to
> > + * give feedback to the fence signaler to aid in power management
> > + * decisions, such as boosting GPU frequency.
> > + *
> > + * This is called without &dma_fence.lock held, it can be called
> > + * multiple times and from any context. Locking is up to the callee
> > + * if it has some state to manage. If multiple deadlines are set,
> > + * the expectation is to track the soonest one. If the deadline is
> > + * before the current time, it should be interpreted as an immediate
> > + * deadline.
> > + *
> > + * This callback is optional.
> > + */
> > + void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> > };
> >
> > void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > @@ -583,6 +603,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
> > return ret < 0 ? ret : 0;
> > }
> >
> > +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
> > +
> > struct dma_fence *dma_fence_get_stub(void);
> > struct dma_fence *dma_fence_allocate_private_stub(void);
> > u64 dma_fence_context_alloc(unsigned num);
> > --
> > 2.39.2
> >
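The feedback loop Rob describes, where the driver sees whether a frame's fence was late or early relative to its deadline and adjusts frequency for the next frame, can be illustrated with a toy governor. Everything here is hypothetical: struct hyp_gov, the frequency-index model and the slack threshold are invented for the sketch, and real devfreq governors are considerably more involved.

```c
#include <stdint.h>

/* Toy governor state; all fields are hypothetical. */
struct hyp_gov {
	unsigned int freq_idx;   /* index into an OPP-like frequency table */
	unsigned int max_idx;    /* highest valid index */
	int64_t slack_ns;        /* "comfortably early" threshold */
};

/*
 * Called when a frame's fence signals. Late frames step the frequency
 * up; frames that finish with more than slack_ns to spare step it down;
 * anything in between keeps the current frequency.
 */
static void hyp_gov_frame_done(struct hyp_gov *g, int64_t signaled_ns,
			       int64_t deadline_ns)
{
	int64_t early_by = deadline_ns - signaled_ns;

	if (early_by < 0) {
		/* Missed the deadline: clock up. */
		if (g->freq_idx < g->max_idx)
			g->freq_idx++;
	} else if (early_by > g->slack_ns) {
		/* Plenty of headroom: room to save power. */
		if (g->freq_idx > 0)
			g->freq_idx--;
	}
}
```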

2023-03-15 13:53:14

by Jonas Ådahl

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> >
> > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > From: Rob Clark <[email protected]>
> > >
> > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > the fence signaler in making power management decisions, like boosting
> > > frequency as the deadline approaches and awareness of missing deadlines
> > > so that can be factored in to the frequency scaling.
> > >
> > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > implementation will need similar logic to track deadlines of all
> > > the fences on the same timeline. [ckoenig]
> > > v3: Clarify locking wrt. set_deadline callback
> > > v4: Clarify in docs comment that this is a hint
> > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > v6: More docs
> > > v7: Fix typo, clarify past deadlines
> > >
> > > Signed-off-by: Rob Clark <[email protected]>
> > > Reviewed-by: Christian König <[email protected]>
> > > Acked-by: Pekka Paalanen <[email protected]>
> > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > ---
> >
> > Hi Rob!
> >
> > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > include/linux/dma-fence.h | 22 +++++++++++
> > > 3 files changed, 87 insertions(+)
> > >
> > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > index 622b8156d212..183e480d8cea 100644
> > > --- a/Documentation/driver-api/dma-buf.rst
> > > +++ b/Documentation/driver-api/dma-buf.rst
> > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > :doc: fence signalling annotation
> > >
> > > +DMA Fence Deadline Hints
> > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > + :doc: deadline hints
> > > +
> > > DMA Fences Functions Reference
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > index 0de0482cd36e..f177c56269bb 100644
> > > --- a/drivers/dma-buf/dma-fence.c
> > > +++ b/drivers/dma-buf/dma-fence.c
> > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > }
> > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > >
> > > +/**
> > > + * DOC: deadline hints
> > > + *
> > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > + * that a utilization based device frequency governor could arrive at a minimum
> > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > + * power consumption. But in the real world there are many workloads which
> > > + * defy this ideal. For example, but not limited to:
> > > + *
> > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > + * result reduce frequency.
> > > + *
> > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > + * (since it has to wait an additional vblank period), sending a signal to
> > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > + * needed.
> >
> > This is the use case where I'd like a better understanding of how
> > this series intends to work, as the problematic scheduling behavior
> > triggered by missed deadlines has plagued compositing display servers
> > for a long time.
> >
> > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > developer, so I will need some hand holding when it comes to
> > understanding exactly what piece of software is responsible for
> > communicating what piece of information.
> >
> > > + *
> > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > + * convey an appropriate sense of urgency to the signaling driver.
> > > + *
> > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > + * facing APIs). The time could either be some point in the future (such as
> > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > + * composition cycle), or the current time to indicate an immediate deadline
> > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> >
> > Is it guaranteed that a GPU driver will use the actual start of the
> > vblank as the effective deadline? I have some memories of seeing
> > something about vblank evasion while browsing driver code, which I might
> > have misunderstood, but I have yet to find out whether this is something
> > userspace can actually rely on.
>
> I guess you mean s/GPU driver/display driver/ ? It makes things more
> clear if we talk about them separately even if they happen to be the
> same device.

Sure, sorry about being unclear about that.

>
> Assuming that is what you mean, nothing strongly defines what the
> deadline is. In practice there is probably some buffering in the
> display controller. For example, with block-based (including
> bandwidth-compressed) formats, you need to buffer up a row of blocks to
> efficiently linearize for scanout. So you probably need to latch some
> time before you start sending pixel data to the display. But details
> like this are heavily implementation dependent. I think the most
> reasonable thing to target is start of vblank.

Having the driver expose those details would be quite useful for
userspace though, so that it can delay committing updates until late, but
not too late. Setting the deadline to the vblank seems easy enough, but it
isn't enough for scheduling the actual commit.
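Suppose a display driver did expose how long before vblank it latches new state. Nothing in this series does; latch_ns below is exactly the hypothetical information being asked for. A compositor could then derive both a fence deadline and a latest commit time from it. The struct, the function and the commit margin are illustrative assumptions only.

```c
#include <stdint.h>

/* Hypothetical per-frame plan derived from vblank timing. */
struct hyp_frame_plan {
	int64_t fence_deadline_ns;  /* deadline hint for the client fence */
	int64_t commit_by_ns;       /* latest time to issue the atomic commit */
};

/*
 * vblank_start_ns: start of the target vblank (CLOCK_MONOTONIC ns).
 * latch_ns: assumed time before vblank at which the display latches state.
 * commit_margin_ns: compositor-chosen allowance for the commit to reach
 * the hardware.
 */
static struct hyp_frame_plan hyp_plan_frame(int64_t vblank_start_ns,
					    int64_t latch_ns,
					    int64_t commit_margin_ns)
{
	struct hyp_frame_plan p;

	/* Rendering must be finished before the display latches the buffer. */
	p.fence_deadline_ns = vblank_start_ns - latch_ns;
	/* The commit itself needs time to get to the hardware before that. */
	p.commit_by_ns = p.fence_deadline_ns - commit_margin_ns;
	return p;
}
```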

>
> Also, keep in mind the deadline hint is just that. It won't magically
> make the GPU finish by that deadline, but it gives the GPU driver
> information about lateness so it can realize if it needs to clock up.

Sure, even if the GPU ramped up clocks to the max, if the job queue is
too large, it won't magically invent more cycles to squeeze in.

>
> > Can userspace set a deadline that targets the next vblank deadline
> > before GPU work has been flushed e.g. at the start of a paint cycle, and
> > still be sure that the kernel has the information it needs to know it should
> > increase its clock speed in time for when the work is actually
> > flushed? Or does this deadline need to be set at the end?
>
> You need a fence to set the deadline, and for that, the work needs to be
> flushed. But you can't associate a deadline with work that the kernel
> is unaware of anyway.

That makes sense, but it might also be a bit inadequate to have it as the
only way to tell the kernel it should speed things up. Even with the
trick i915 does, with GNOME Shell, we still end up with the feedback
loop this series aims to mitigate. Triple buffering, i.e. delaying
or dropping the first frame, is so far the best workaround, apart from
other tricks that make the kernel ramp up its clocks. Ideally, we
should not have to choose between latency and frame drops.

>
> > What I'm more or less trying to ask is, will a mode setting compositor
> > be able to tell the kernel to boost its clocks at the time it knows is
> > best, and how will it in practice achieve this?
>
> The anticipated usage for a compositor is that, when you receive a
> <buf, fence> pair from an app, you immediately set a deadline for
> upcoming start-of-vblank on the fence fd passed from the app. (Or for
> implicit sync you can use DMA_BUF_IOCTL_EXPORT_SYNC_FILE). For the
> composite step, no need to set a deadline as this is already done on
> the kernel side in drm_atomic_helper_wait_for_fences().

So it sounds like the new uapi will help compositors that do not draw
with the intention of page flipping anything, and compositors that
deliberately delay the commit. I suppose with proper target presentation
time integration EGL/Vulkan WSI can set deadlines themselves as well. All that
sounds like a welcome improvement, but I'm not convinced it's enough to
solve the problems we currently face.

>
> > For example, relying on the atomic mode setting commit to set the
> > deadline is fundamentally flawed, since userspace will at times want to
> > purposefully delay committing until as late as possible, without that
> > increasing the risk of missing the deadline because the kernel did not
> > speed up clocks at the right time for GPU work that was already
> > flushed long ago.
>
> Right, this is the point for exposing the ioctl to userspace.
>
> > Relying on commits also has no effect on GPU work queued by
> > a compositor drawing only to dma-bufs that are never intended to be
> > presented using mode setting. How can we make sure a compositor can
> > provide hints that the kernel will know to respect despite the
> > compositor not being drm master?
>
> It doesn't matter if there are indirect dependencies. Even if the
> compositor completely ignores deadline hints and fancy tricks like
> delaying composite decisions, the indirect dependency (app rendering)
> will delay the direct dependency (compositor rendering) of the page
> flip. So the driver will still see whether it is late or early
> compared to the deadline, allowing it to adjust freq in the
> appropriate direction for the next frame.

Is it expected that WSIs will set their own deadlines, or should that
be the job of the compositor? For example, by having compositors use
DMA_BUF_IOCTL_EXPORT_SYNC_FILE, which you mentioned, to set a deadline
matching the vsync the buffer will ideally be committed to?


Jonas

>
> BR,
> -R
>
> >
> > Jonas
> >
> > > + *
> > > + * Multiple deadlines may be set on a given fence, even in parallel. See the
> > > + * documentation for &dma_fence_ops.set_deadline.
> > > + *
> > > + * The deadline hint is just that, a hint. The driver that created the fence
> > > + * may react by increasing frequency, making different scheduling choices, etc.
> > > + * Or doing nothing at all.
> > > + */
> > > +
> > > +/**
> > > + * dma_fence_set_deadline - set desired fence-wait deadline hint
> > > + * @fence: the fence that is to be waited on
> > > + * @deadline: the time by which the waiter hopes for the fence to be
> > > + * signaled
> > > + *
> > > + * Give the fence signaler a hint about an upcoming deadline, such as
> > > + * vblank, by which point the waiter would prefer the fence to be
> > > + * signaled by. This is intended to give feedback to the fence signaler
> > > + * to aid in power management decisions, such as boosting GPU frequency
> > > + * if a periodic vblank deadline is approaching but the fence is not
> > > + * yet signaled.
> > > + */
> > > +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> > > +{
> > > + if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> > > + fence->ops->set_deadline(fence, deadline);
> > > +}
> > > +EXPORT_SYMBOL(dma_fence_set_deadline);
> > > +
> > > /**
> > > * dma_fence_describe - Dump fence describtion into seq_file
> > > * @fence: the 6fence to describe
> > > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > > index 775cdc0b4f24..d54b595a0fe0 100644
> > > --- a/include/linux/dma-fence.h
> > > +++ b/include/linux/dma-fence.h
> > > @@ -257,6 +257,26 @@ struct dma_fence_ops {
> > > */
> > > void (*timeline_value_str)(struct dma_fence *fence,
> > > char *str, int size);
> > > +
> > > + /**
> > > + * @set_deadline:
> > > + *
> > > + * Callback to allow a fence waiter to inform the fence signaler of
> > > + * an upcoming deadline, such as vblank, by which point the waiter
> > > + * would prefer the fence to be signaled by. This is intended to
> > > + * give feedback to the fence signaler to aid in power management
> > > + * decisions, such as boosting GPU frequency.
> > > + *
> > > + * This is called without &dma_fence.lock held, it can be called
> > > + * multiple times and from any context. Locking is up to the callee
> > > + * if it has some state to manage. If multiple deadlines are set,
> > > + * the expectation is to track the soonest one. If the deadline is
> > > + * before the current time, it should be interpreted as an immediate
> > > + * deadline.
> > > + *
> > > + * This callback is optional.
> > > + */
> > > + void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> > > };
> > >
> > > void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > > @@ -583,6 +603,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
> > > return ret < 0 ? ret : 0;
> > > }
> > >
> > > +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
> > > +
> > > struct dma_fence *dma_fence_get_stub(void);
> > > struct dma_fence *dma_fence_allocate_private_stub(void);
> > > u64 dma_fence_context_alloc(unsigned num);
> > > --
> > > 2.39.2
> > >
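For readers mapping the contract quoted above onto code, here is a minimal userspace model; it is not kernel code. The names model_fence, model_timeline, timeline_set_deadline and monotonic_now_ns are hypothetical, int64_t nanoseconds stand in for ktime_t, and a pthread mutex stands in for driver-side locking. It mirrors the wrapper in the patch (call the optional set_deadline op only if the fence has not signaled) and shows one way a callee might follow the documented rules: keep only the soonest deadline, and treat a deadline in the past as immediate (here by clamping it to the current time). Because the real callback may be called from any context, an in-kernel implementation would need a lock that is safe there (e.g. a spinlock), not a sleeping lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Userspace stand-in for ktime_t: absolute CLOCK_MONOTONIC nanoseconds. */
typedef int64_t model_ktime_t;

struct model_fence;

struct model_fence_ops {
	/* Optional, like dma_fence_ops.set_deadline; may be NULL. */
	void (*set_deadline)(struct model_fence *fence, model_ktime_t deadline);
};

/* Hypothetical per-timeline state a fence signaler might keep. */
struct model_timeline {
	pthread_mutex_t lock;       /* locking is up to the callee */
	bool has_deadline;
	model_ktime_t deadline;     /* soonest deadline seen so far */
	model_ktime_t (*now)(void); /* clock source */
};

struct model_fence {
	const struct model_fence_ops *ops;
	struct model_timeline *timeline;
	bool signaled;
};

static model_ktime_t monotonic_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (model_ktime_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Mirrors the patch: forward the hint only if a callback exists and the
 * fence has not signaled yet. */
static void model_fence_set_deadline(struct model_fence *fence,
				     model_ktime_t deadline)
{
	if (fence->ops->set_deadline && !fence->signaled)
		fence->ops->set_deadline(fence, deadline);
}

/* Example callback: remember only the soonest deadline, and treat a
 * deadline in the past as an immediate one. */
static void timeline_set_deadline(struct model_fence *fence,
				  model_ktime_t deadline)
{
	struct model_timeline *tl = fence->timeline;
	model_ktime_t now = tl->now();

	if (deadline < now)
		deadline = now;

	pthread_mutex_lock(&tl->lock);
	if (!tl->has_deadline || deadline < tl->deadline) {
		tl->deadline = deadline;
		tl->has_deadline = true;
		/* A real driver might arm a timer or boost clocks here. */
	}
	pthread_mutex_unlock(&tl->lock);
}

static const struct model_fence_ops timeline_fence_ops = {
	.set_deadline = timeline_set_deadline,
};
```

In this model, setting a later deadline after a sooner one leaves the tracked value unchanged, and once the fence has signaled the wrapper does not call into the signaler at all.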

2023-03-15 16:20:20

by Rob Clark

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
>
> On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > >
> > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > From: Rob Clark <[email protected]>
> > > >
> > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > the fence signaler in making power management decisions, like boosting
> > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > so that can be factored in to the frequency scaling.
> > > >
> > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > implementation will need similar logic to track deadlines of all
> > > > the fences on the same timeline. [ckoenig]
> > > > v3: Clarify locking wrt. set_deadline callback
> > > > v4: Clarify in docs comment that this is a hint
> > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > v6: More docs
> > > > v7: Fix typo, clarify past deadlines
> > > >
> > > > Signed-off-by: Rob Clark <[email protected]>
> > > > Reviewed-by: Christian König <[email protected]>
> > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > ---
> > >
> > > Hi Rob!
> > >
> > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > 3 files changed, 87 insertions(+)
> > > >
> > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > index 622b8156d212..183e480d8cea 100644
> > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > :doc: fence signalling annotation
> > > >
> > > > +DMA Fence Deadline Hints
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > + :doc: deadline hints
> > > > +
> > > > DMA Fences Functions Reference
> > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > >
> > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > index 0de0482cd36e..f177c56269bb 100644
> > > > --- a/drivers/dma-buf/dma-fence.c
> > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > }
> > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > >
> > > > +/**
> > > > + * DOC: deadline hints
> > > > + *
> > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > + * power consumption. But in the real world there are many workloads which
> > > > + * defy this ideal. For example, but not limited to:
> > > > + *
> > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > + * result reduce frequency.
> > > > + *
> > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > + * needed.
> > >
> > > This is the use case I'd like to get some better understanding about how
> > > this series intends to work, as the problematic scheduling behavior
> > > triggered by missed deadlines has plagued compositing display servers
> > > for a long time.
> > >
> > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > developer, so I will need some hand holding when it comes to
> > > understanding exactly what piece of software is responsible for
> > > communicating what piece of information.
> > >
> > > > + *
> > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > + *
> > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > + * facing APIs). The time could either be some point in the future (such as
> > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > >
> > > Is it guaranteed that a GPU driver will use the actual start of the
> > > vblank as the effective deadline? I have some memory of seeing
> > > something about vblank evasion while browsing driver code, which I might
> > > have misunderstood, but I have yet to find out whether this is
> > > something userspace can rely on.
> >
> > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > clear if we talk about them separately even if they happen to be the
> > same device.
>
> Sure, sorry about being unclear about that.
>
> >
> > Assuming that is what you mean, nothing strongly defines what the
> > deadline is. In practice there is probably some buffering in the
> > display controller. For ex, block based (including bandwidth
> > compressed) formats, you need to buffer up a row of blocks to
> > efficiently linearize for scanout. So you probably need to latch some
> > time before you start sending pixel data to the display. But details
> > like this are heavily implementation dependent. I think the most
> > reasonable thing to target is start of vblank.
>
> The driver exposing those details would be quite useful for userspace
> though, so that it can delay committing updates until late, but not too
> late. Setting a deadline to be the vblank seems easy enough, but it
> isn't enough for scheduling the actual commit.

I'm not entirely sure how that would even work.. but OTOH I think you
are talking about something on the order of 100us? But that is a bit
of another topic.

> >
> > Also, keep in mind the deadline hint is just that. It won't magically
> > make the GPU finish by that deadline, but it gives the GPU driver
> > information about lateness so it can realize if it needs to clock up.
>
> Sure, even if the GPU ramped up clocks to the max, if the job queue is
> too large, it won't magically invent more cycles to squeeze in.
>
> >
> > > Can userspace set a deadline that targets the next vblank deadline
> > > before GPU work has been flushed, e.g. at the start of a paint cycle, and
> > > still be sure that the kernel has the information it needs to know it should
> > > increase its clock speed in time for when the work is actually
> > > flushed? Or does this deadline need to be set at the end?
> >
> > You need a fence to set the deadline, and for that work needs to be
> > flushed. But you can't associate a deadline with work that the kernel
> > is unaware of anyways.
>
> That makes sense, but it might also be a bit inadequate to have it as the
> only way to tell the kernel it should speed things up. Even with the
> trick i915 does, with GNOME Shell, we still end up with the feedback
> loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> or dropping the first frame, is so far the best workaround that works,
> apart from other tricks that make the kernel ramp up its clocks.
> Ideally, we should not have to choose between latency and frame drops.

Before you have a fence, the thing you want to be speeding up is the
CPU, not the GPU. There are existing mechanisms for that.

TBF I'm of the belief that there is still a need for input based cpu
boost (and early wake-up trigger for GPU).. we have something like
this in CrOS kernel. That is a bit of a different topic, but my point
is that fence deadlines are just one of several things we need to
optimize power/perf and responsiveness, rather than the single thing
that solves every problem under the sun ;-)

> >
> > > What I'm more or less trying to ask is, will a mode setting compositor
> > > be able to tell the kernel to boost its clocks at the time it knows is
> > > best, and how will it in practice achieve this?
> >
> > The anticipated usage for a compositor is that, when you receive a
> > <buf, fence> pair from an app, you immediately set a deadline for
> > upcoming start-of-vblank on the fence fd passed from the app. (Or for
> > implicit sync you can use DMA_BUF_IOCTL_EXPORT_SYNC_FILE). For the
> > composite step, no need to set a deadline as this is already done on
> > the kernel side in drm_atomic_helper_wait_for_fences().
>
> So it sounds like the new uapi will help compositors that do not draw
> with the intention of page flipping anything, and compositors that
> deliberately delay the commit. I suppose with proper target presentation
> time integration EGL/Vulkan WSI can set deadlines as well. All that
> sounds like a welcome improvement, but I'm not convinced it's enough to
> solve the problems we currently face.

Yeah, like I mentioned this addresses one issue, giving the GPU kernel
driver better information for freq mgmt. But there probably isn't one
single solution for everything.
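To make the lateness feedback mentioned in this thread concrete (the deadline hint "gives the GPU driver information about lateness so it can realize if it needs to clock up"), here is a toy C sketch under stated assumptions: a fixed refresh period, double buffering, and a hypothetical one-step-per-frame frequency policy. None of these functions come from a real driver. utilization_pct() shows why a purely utilization-based governor is misled: at 60 Hz a 17 ms frame misses the 16.67 ms vblank, the GPU then idles until the next one, and measured utilization falls to roughly 50% (versus roughly 59% for a 10 ms frame that makes it). next_freq_level() instead steps on whether the fence signaled before or after its deadline.

```c
#include <stdint.h>

/* Double buffering at a fixed refresh period: a frame that misses a vblank
 * holds the GPU idle until the next one, so its interval is rounded up to a
 * whole number of refresh periods. */
static int64_t frame_interval_ns(int64_t render_ns, int64_t period_ns)
{
	int64_t periods = (render_ns + period_ns - 1) / period_ns;

	if (periods < 1)
		periods = 1;
	return periods * period_ns;
}

/* Busy percentage as a purely utilization-based governor would see it. */
static int utilization_pct(int64_t render_ns, int64_t period_ns)
{
	return (int)(render_ns * 100 / frame_interval_ns(render_ns, period_ns));
}

/* Hypothetical deadline-aware policy: clock up one level when the fence
 * signaled after its deadline, down one level when it signaled earlier than
 * the deadline by more than slack_ns, otherwise keep the current level. */
static int next_freq_level(int level, int max_level, int64_t signaled_ns,
			   int64_t deadline_ns, int64_t slack_ns)
{
	if (signaled_ns > deadline_ns)
		return level < max_level ? level + 1 : level;
	if (deadline_ns - signaled_ns > slack_ns)
		return level > 0 ? level - 1 : level;
	return level;
}
```

The point of the contrast is the direction of the signal: after a missed vblank the utilization number goes down, while the lateness check says "clock up".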

> >
> > > For example relying on the atomic mode setting commit setting the
> > > deadline is fundamentally flawed, since user space will at times want to
> > > purposefully delay committing until as late as possible, without doing
> > > so causing an increased risk of missing the deadline due to the kernel
> > > not speeding up clocks at the right time for GPU work that has already
> > > been flushed long ago.
> >
> > Right, this is the point for exposing the ioctl to userspace.
> >
> > > Relying on commits also has no effect on GPU work queued by
> > > a compositor drawing only to dma-bufs that are never intended to be
> > > presented using mode setting. How can we make sure a compositor can
> > > provide hints that the kernel will know to respect despite the
> > > compositor not being drm master?
> >
> > It doesn't matter if there are indirect dependencies. Even if the
> > compositor completely ignores deadline hints and fancy tricks like
> > delaying composite decisions, the indirect dependency (app rendering)
> > will delay the direct dependency (compositor rendering) of the page
> > flip. So the driver will still see whether it is late or early
> > compared to the deadline, allowing it to adjust freq in the
> > appropriate direction for the next frame.
>
> Is it expected that WSIs will set their own deadlines, or should that
> be the job of the compositor? For example, by having compositors use
> DMA_BUF_IOCTL_EXPORT_SYNC_FILE, which you mentioned, to set a deadline
> matching the vsync the buffer will ideally be committed to?
>

I'm kind of assuming compositors, but if the WSI somehow has more
information about ideal presentation time, then I suppose it could be
in the WSI? I'll defer to folks who spend more time on WSI and
compositors to hash out the details ;-)
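A minimal sketch of the compositor-side step described earlier in the thread: computing an absolute CLOCK_MONOTONIC deadline for the upcoming start of vblank, to be set on the fence that came with a client buffer. It assumes the compositor already knows one start-of-vblank timestamp and the refresh period; next_vblank_deadline_ns() is a hypothetical helper, and handing the result to the kernel (e.g. through the deadline ioctl added elsewhere in this series) is not shown.

```c
#include <stdint.h>

/* Hypothetical helper: the first start-of-vblank strictly after now_ns,
 * given one known start-of-vblank timestamp and the refresh period (all
 * CLOCK_MONOTONIC nanoseconds). It also works when the known vblank is
 * several frames old. */
static int64_t next_vblank_deadline_ns(int64_t last_vblank_ns,
				       int64_t period_ns, int64_t now_ns)
{
	int64_t frames;

	if (now_ns < last_vblank_ns)
		return last_vblank_ns; /* the known vblank is still ahead */

	frames = (now_ns - last_vblank_ns) / period_ns + 1;
	return last_vblank_ns + frames * period_ns;
}
```

For a 60 Hz mode a caller would pass period_ns = 16666667 and take now_ns from clock_gettime(CLOCK_MONOTONIC, ...), matching the timebase the patch documents for userspace-facing APIs.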

BR,
-R


2023-03-16 09:27:05

by Jonas Ådahl

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
> >
> > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > > >
> > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > From: Rob Clark <[email protected]>
> > > > >
> > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > > the fence signaler in making power management decisions, like boosting
> > > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > > so that can be factored in to the frequency scaling.
> > > > >
> > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > > implementation will need similar logic to track deadlines of all
> > > > > the fences on the same timeline. [ckoenig]
> > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > v4: Clarify in docs comment that this is a hint
> > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > v6: More docs
> > > > > v7: Fix typo, clarify past deadlines
> > > > >
> > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > Reviewed-by: Christian König <[email protected]>
> > > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > > ---
> > > >
> > > > Hi Rob!
> > > >
> > > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > > 3 files changed, 87 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > > index 622b8156d212..183e480d8cea 100644
> > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > :doc: fence signalling annotation
> > > > >
> > > > > +DMA Fence Deadline Hints
> > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > +
> > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > + :doc: deadline hints
> > > > > +
> > > > > DMA Fences Functions Reference
> > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > >
> > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > > }
> > > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > >
> > > > > +/**
> > > > > + * DOC: deadline hints
> > > > > + *
> > > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > > + * power consumption. But in the real world there are many workloads which
> > > > > + * defy this ideal. For example, but not limited to:
> > > > > + *
> > > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > > + * result reduce frequency.
> > > > > + *
> > > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > > + * needed.
> > > >
> > > > This is the use case I'd like to get some better understanding about how
> > > > this series intends to work, as the problematic scheduling behavior
> > > > triggered by missed deadlines has plagued compositing display servers
> > > > for a long time.
> > > >
> > > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > > developer, so I will need some hand holding when it comes to
> > > > understanding exactly what piece of software is responsible for
> > > > communicating what piece of information.
> > > >
> > > > > + *
> > > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > > + *
> > > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > > + * facing APIs). The time could either be some point in the future (such as
> > > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > > >
> > > > Is it guaranteed that a GPU driver will use the actual start of the
> > > > vblank as the effective deadline? I have some memory of seeing
> > > > something about vblank evasion while browsing driver code, which I might
> > > > have misunderstood, but I have yet to find out whether this is
> > > > something userspace can rely on.
> > >
> > > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > > clear if we talk about them separately even if they happen to be the
> > > same device.
> >
> > Sure, sorry about being unclear about that.
> >
> > >
> > > Assuming that is what you mean, nothing strongly defines what the
> > > deadline is. In practice there is probably some buffering in the
> > > display controller. For ex, block based (including bandwidth
> > > compressed) formats, you need to buffer up a row of blocks to
> > > efficiently linearize for scanout. So you probably need to latch some
> > > time before you start sending pixel data to the display. But details
> > > like this are heavily implementation dependent. I think the most
> > > reasonable thing to target is start of vblank.
> >
> > The driver exposing those details would be quite useful for userspace
> > though, so that it can delay committing updates until late, but not too
> > late. Setting a deadline to be the vblank seems easy enough, but it
> > isn't enough for scheduling the actual commit.
>
> I'm not entirely sure how that would even work.. but OTOH I think you
> are talking about something on the order of 100us? But that is a bit
> of another topic.

Yes, something like that. But yeah, it's not really related. Scheduling
commits closer to the deadline has more complex behavior than that too,
e.g. the need for real time scheduling, and knowing how long it usually
takes to create and submit a commit and for the kernel to process it.
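As a rough sketch of the "knowing how long it usually takes" part: a compositor could keep a small history of measured commit durations and start each commit no later than the vblank start minus the worst recent duration and a safety margin. The struct and function names, the history length of 8 and the margin are all assumptions for illustration; real-time scheduling of the committing thread, which Jonas also mentions, is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define COMMIT_HISTORY 8 /* assumed history length */

/* Hypothetical ring buffer of recent commit durations, in nanoseconds. */
struct commit_estimator {
	int64_t samples[COMMIT_HISTORY];
	size_t count; /* number of valid samples, at most COMMIT_HISTORY */
	size_t next;  /* next slot to overwrite */
};

static void commit_estimator_add(struct commit_estimator *e, int64_t duration_ns)
{
	e->samples[e->next] = duration_ns;
	e->next = (e->next + 1) % COMMIT_HISTORY;
	if (e->count < COMMIT_HISTORY)
		e->count++;
}

/* Worst (longest) duration among the recorded samples, 0 if none. */
static int64_t commit_estimator_worst(const struct commit_estimator *e)
{
	int64_t worst = 0;

	for (size_t i = 0; i < e->count; i++)
		if (e->samples[i] > worst)
			worst = e->samples[i];
	return worst;
}

/* Latest time to start building and submitting the commit for a vblank. */
static int64_t latest_commit_start_ns(const struct commit_estimator *e,
				      int64_t vblank_start_ns, int64_t margin_ns)
{
	return vblank_start_ns - commit_estimator_worst(e) - margin_ns;
}
```

Using the worst recent sample rather than the average trades a little latency for fewer missed vblanks; old outliers drop out once the ring buffer wraps.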

>

8-< *snip* 8-<

> > >
> > > You need a fence to set the deadline, and for that work needs to be
> > > flushed. But you can't associate a deadline with work that the kernel
> > > is unaware of anyways.
> >
> > That makes sense, but it might also be a bit inadequate to have it as the
> > only way to tell the kernel it should speed things up. Even with the
> > trick i915 does, with GNOME Shell, we still end up with the feedback
> > loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> > or dropping the first frame, is so far the best workaround that works,
> > apart from other tricks that make the kernel ramp up its clocks.
> > Ideally, we should not have to choose between latency and frame drops.
>
> Before you have a fence, the thing you want to be speeding up is the
> CPU, not the GPU. There are existing mechanisms for that.

Is there no benefit to let the GPU know earlier that it should speed up,
so that when the job queue arrives, it's already up to speed?

>
> TBF I'm of the belief that there is still a need for input based cpu
> boost (and early wake-up trigger for GPU).. we have something like
> this in CrOS kernel. That is a bit of a different topic, but my point
> is that fence deadlines are just one of several things we need to
> optimize power/perf and responsiveness, rather than the single thing
> that solves every problem under the sun ;-)

Perhaps, but I believe it's a bit of a back channel for intent; the piece
of the puzzle that has the information to know whether there is a need to
actually speed up is the compositor, not the kernel.

For example, pressing 'p' while a terminal is focused does not need high
frequency clocks; it just needs the terminal emulator to draw a 'p' and
the compositor to composite that update. Pressing <Super>, however, may
trigger a non-trivial animation moving a lot of stuff around on screen,
maybe triggering Wayland clients to draw and what not, and the compositor
should arguably have the ability to "warn" the kernel about the upcoming
flood of work before it is already knocking on its doorstep.

>

8-< *snip* 8-<

> >
> > Is it expected that WSIs will set their own deadlines, or should that
> > be the job of the compositor? For example, by having compositors use
> > DMA_BUF_IOCTL_EXPORT_SYNC_FILE, which you mentioned, to set a deadline
> > matching the vsync the buffer will ideally be committed to?
> >
>
> I'm kind of assuming compositors, but if the WSI somehow has more
> information about ideal presentation time, then I suppose it could be
> in the WSI? I'll defer to folks who spend more time on WSI and
> compositors to hash out the details ;-)

With my compositor developer hat on, it might be best to let it be up to
the compositor, it's the one that knows if a client's content will
actually end up anywhere visible.


Jonas

>
> BR,
> -R

2023-03-16 16:29:22

by Rob Clark

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl <[email protected]> wrote:
>
> On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
> > >
> > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > > > >
> > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > From: Rob Clark <[email protected]>
> > > > > >
> > > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > > > the fence signaler in making power management decisions, like boosting
> > > > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > > > so that can be factored in to the frequency scaling.
> > > > > >
> > > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > > > implementation will need similar logic to track deadlines of all
> > > > > > the fences on the same timeline. [ckoenig]
> > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > v6: More docs
> > > > > > v7: Fix typo, clarify past deadlines
> > > > > >
> > > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > > Reviewed-by: Christian König <[email protected]>
> > > > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > > > ---
> > > > >
> > > > > Hi Rob!
> > > > >
> > > > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > > > 3 files changed, 87 insertions(+)
> > > > > >
> > > > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > :doc: fence signalling annotation
> > > > > >
> > > > > > +DMA Fence Deadline Hints
> > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > +
> > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > + :doc: deadline hints
> > > > > > +
> > > > > > DMA Fences Functions Reference
> > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > >
> > > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > > > }
> > > > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > >
> > > > > > +/**
> > > > > > + * DOC: deadline hints
> > > > > > + *
> > > > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > > > + * power consumption. But in the real world there are many workloads which
> > > > > > + * defy this ideal. For example, but not limited to:
> > > > > > + *
> > > > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > > > + * result reduce frequency.
> > > > > > + *
> > > > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > > > + * needed.
> > > > >
> > > > > This is the use case I'd like to get some better understanding about how
> > > > > this series intends to work, as the problematic scheduling behavior
> > > > > triggered by missed deadlines has plagued compositing display servers
> > > > > for a long time.
> > > > >
> > > > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > > > developer, so I will need some hand holding when it comes to
> > > > > understanding exactly what piece of software is responsible for
> > > > > communicating what piece of information.
> > > > >
> > > > > > + *
> > > > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > > > + *
> > > > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > > > + * facing APIs). The time could either be some point in the future (such as
> > > > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > > > >
> > > > > Is it guaranteed that a GPU driver will use the actual start of the
> > > > > vblank as the effective deadline? I have some memories of seeing
> > > > > something about vblank evasion while browsing driver code, which I
> > > > > might have misunderstood, but I have yet to find out whether this is
> > > > > something userspace can rely on.
> > > >
> > > > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > > > clear if we talk about them separately even if they happen to be the
> > > > same device.
> > >
> > > Sure, sorry about being unclear about that.
> > >
> > > >
> > > > Assuming that is what you mean, nothing strongly defines what the
> > > > deadline is. In practice there is probably some buffering in the
> > > > display controller. For ex, block based (including bandwidth
> > > > compressed) formats, you need to buffer up a row of blocks to
> > > > efficiently linearize for scanout. So you probably need to latch some
> > > > time before you start sending pixel data to the display. But details
> > > > like this are heavily implementation dependent. I think the most
> > > > reasonable thing to target is start of vblank.
> > >
> > > The driver exposing those details would be quite useful for userspace
> > > though, so that it can delay committing updates until late, but not too
> > > late. Setting a deadline to be the vblank seems easy enough, but it
> > > isn't enough for scheduling the actual commit.
> >
> > I'm not entirely sure how that would even work.. but OTOH I think you
> > are talking about something on the order of 100us? But that is a bit
> > of another topic.
>
> Yes, something like that. But yea, it's not really related. Scheduling
> commits closer to the deadline has more complex behavior than that too,
> e.g. the need for real time scheduling, and knowing how long it usually
> takes to create and commit and for the kernel to process.
>
> >
>
> 8-< *snip* 8-<
>
> > > >
> > > > You need a fence to set the deadline, and for that work needs to be
> > > > flushed. But you can't associate a deadline with work that the kernel
> > > > is unaware of anyways.
> > >
> > > That makes sense, but it might also be a bit inadequate to have it as the
> > > only way to tell the kernel it should speed things up. Even with the
> > > trick i915 does, with GNOME Shell, we still end up with the feedback
> > > loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> > > or dropping the first frame, is so far the best workaround that works,
> > > apart from other tricks that make the kernel ramp up its clock.
> > > Ideally, we should not have to choose between latency and frame drops.
> >
> > Before you have a fence, the thing you want to be speeding up is the
> > CPU, not the GPU. There are existing mechanisms for that.
>
> Is there no benefit to let the GPU know earlier that it should speed up,
> so that when the job queue arrives, it's already up to speed?

Downstream we have an input notifier that resumes the GPU so we can
pipeline the 1-2ms it takes to boot up the GPU with userspace. But we
wait to boost freq until we have cmdstream to submit, since that
doesn't take as long. What needs help initially after input is all
the stuff that happens on the CPU before the GPU can start to do
anything ;-)

Btw, I guess I haven't made this clear: the dma-fence deadline is trying
to help the steady-state situation rather than the input-latency
situation. It might take a frame or two of missed deadlines for
gpufreq to arrive at a good steady-state freq.
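To make the double-buffering case from the doc comment concrete, here is a toy model in C. It is a hypothetical illustration, not kernel or governor code: it assumes a fixed 60 Hz vblank period, that a frame which misses vblank simply occupies the next whole period, and a made-up 90% utilization threshold for the naive governor. It shows how a missed vblank roughly halves the utilization a governor observes, while a deadline-aware policy treats the miss itself as the signal to boost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model (hypothetical): with double buffering and vblank-synced flips,
 * a frame needing busy_ns of GPU time occupies a whole number of periods.
 */
static uint64_t periods_per_frame(uint64_t busy_ns, uint64_t period_ns)
{
	assert(period_ns > 0);
	if (busy_ns == 0)
		return 1;
	return (busy_ns + period_ns - 1) / period_ns;	/* round up */
}

/* Utilization in percent, as a utilization-based governor would see it. */
static unsigned int utilization_pct(uint64_t busy_ns, uint64_t period_ns)
{
	uint64_t window = periods_per_frame(busy_ns, period_ns) * period_ns;

	return (unsigned int)(busy_ns * 100 / window);
}

/*
 * Naive policy: boost only above an assumed 90% utilization.
 * Deadline-aware policy: additionally boost whenever the frame missed vblank.
 */
static bool should_boost(uint64_t busy_ns, uint64_t period_ns,
			 bool deadline_aware)
{
	bool missed = periods_per_frame(busy_ns, period_ns) > 1;

	if (deadline_aware && missed)
		return true;
	return utilization_pct(busy_ns, period_ns) > 90;
}
```

With 17 ms of GPU work per frame at a 16.67 ms period, the frame spans two periods and the observed utilization drops to about 50%, so the naive threshold never fires even though the GPU is too slow for the display.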

> >
> > TBF I'm of the belief that there is still a need for input based cpu
> > boost (and early wake-up trigger for GPU).. we have something like
> > this in CrOS kernel. That is a bit of a different topic, but my point
> > is that fence deadlines are just one of several things we need to
> > optimize power/perf and responsiveness, rather than the single thing
> > that solves every problem under the sun ;-)
>
> Perhaps; but I believe it's a bit of a back channel of intent; the piece
> of the puzzle that has the information to know whether there is a need
> to actually speed up is the compositor, not the kernel.
>
> For example, pressing 'p' while a terminal is focused does not need high
> frequency clocks, it just needs the terminal emulator to draw a 'p' and
> the compositor to composite that update. Pressing <Super> may however
> trigger a non-trivial animation moving a lot of stuff around on screen,
> maybe triggering Wayland clients to draw and what not, and should most
> arguably have the ability to "warn" the kernel about the upcoming flood
> of work before it is already knocking on its doorstep.

The super key is problematic, but not for the reason you think. It is
because it is a case where we should boost on key-up instead of
key-down.. and the second key-up event comes after the cpu-boost is
already in its cool-down period. But even if suboptimal in cases
like this, it is still useful for touch/stylus cases where the
slightest of lag is much more perceptible.

This is getting off topic but I kinda favor coming up with some sort
of static definition that userspace could give the kernel to let the
kernel know what input to boost on. Or maybe something could be done
with BPF?

> >
>
> 8-< *snip* 8-<
>
> > >
> > > Is it expected that WSIs will set their own deadlines, or should that
> > > be the job of the compositor? For example, by compositors using
> > > DMA_BUF_IOCTL_EXPORT_SYNC_FILE, which you mentioned, to set a
> > > deadline matching the vsync the buffer will ideally be committed to?
> > >
> >
> > I'm kind of assuming compositors, but if the WSI somehow has more
> > information about ideal presentation time, then I suppose it could be
> > in the WSI? I'll defer to folks who spend more time on WSI and
> > compositors to hash out the details ;-)
>
> With my compositor developer hat on, it might be best to leave it up to
> the compositor, since it's the one that knows whether a client's content
> will actually end up anywhere visible.
>

wfm
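As a sketch of the compositor side, the helper below computes the absolute CLOCK_MONOTONIC time, in nanoseconds, of the next vblank start from the timestamp of the last observed vblank and the frame period, and stays correct if that timestamp is several frames old. The function names are hypothetical and the fixed-period assumption ignores VRR; the resulting value is the kind of absolute deadline a compositor could hand to the kernel through the sync_file SET_DEADLINE ioctl from this series (or it could pass the current time to request an immediate deadline).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in ns, the clock the deadline UABI uses. */
static uint64_t monotonic_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: start of the first vblank strictly after now_ns,
 * given the start time of the last observed vblank and a fixed period.
 * Works even if last_vblank_ns is more than one frame in the past.
 */
static uint64_t next_vblank_start_ns(uint64_t last_vblank_ns,
				     uint64_t period_ns, uint64_t now_ns)
{
	assert(period_ns > 0);
	if (now_ns < last_vblank_ns)
		return last_vblank_ns;	/* known vblank is still in the future */

	uint64_t elapsed_periods = (now_ns - last_vblank_ns) / period_ns;

	return last_vblank_ns + (elapsed_periods + 1) * period_ns;
}
```

A compositor would typically call `next_vblank_start_ns(last, period, monotonic_now_ns())` right before picking which client buffers to latch for the upcoming frame.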

BR,
-R

>
> Jonas
>
> >
> > BR,
> > -R

2023-03-16 21:23:10

by Rob Clark

Subject: Re: [PATCH v10 00/15] dma-fence: Deadline awareness

On Wed, Mar 8, 2023 at 7:53 AM Rob Clark <[email protected]> wrote:
>
> From: Rob Clark <[email protected]>
>
> This series adds a deadline hint to fences, so realtime deadlines
> such as vblank can be communicated to the fence signaller for power/
> frequency management decisions.
>
> This is partially inspired by a trick i915 does, but implemented
> via dma-fence for a couple of reasons:
>
> 1) To continue to be able to use the atomic helpers
> 2) To support cases where display and gpu are different drivers
>
> This iteration adds a dma-fence ioctl to set a deadline (both to
> support igt-tests, and compositors which delay decisions about which
> client buffer to display), and a sw_sync ioctl to read back the
> deadline. IGT tests utilizing these can be found at:
>
> https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline
>

jfwiw, mesa side of this:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21973

BR,
-R

>
> v1: https://patchwork.freedesktop.org/series/93035/
> v2: Move filtering out of later deadlines to fence implementation
> to avoid increasing the size of dma_fence
> v3: Add support in fence-array and fence-chain; Add some uabi to
> support igt tests and userspace compositors.
> v4: Rebase, address various comments, and add syncobj deadline
> support, and sync_file EPOLLPRI based on experience with perf/
> freq issues with clvk compute workloads on i915 (anv)
> v5: Clarify that this is a hint as opposed to a more hard deadline
> guarantee, switch to using u64 ns values in UABI (still absolute
> CLOCK_MONOTONIC values), drop syncobj related cap and driver
> feature flag in favor of allowing count_handles==0 for probing
> kernel support.
> v6: Re-work vblank helper to calculate time of _start_ of vblank,
> and work correctly if the last vblank event was more than a
> frame ago. Add (mostly unrelated) drm/msm patch which also
> uses the vblank helper. Use dma_fence_chain_contained(). More
> verbose syncobj UABI comments. Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> v7: Fix kbuild complaints about vblank helper. Add more docs.
> v8: Add patch to surface sync_file UAPI, and more docs updates.
> v9: Drop (E)POLLPRI support.. I still like it, but not essential and
> it can always be revived later. Fix doc build warning.
> v10: Update 11/15 to handle multiple CRTCs
>
> Rob Clark (15):
> dma-buf/dma-fence: Add deadline awareness
> dma-buf/fence-array: Add fence deadline support
> dma-buf/fence-chain: Add fence deadline support
> dma-buf/dma-resv: Add a way to set fence deadline
> dma-buf/sync_file: Surface sync-file uABI
> dma-buf/sync_file: Add SET_DEADLINE ioctl
> dma-buf/sw_sync: Add fence deadline support
> drm/scheduler: Add fence deadline support
> drm/syncobj: Add deadline support for syncobj waits
> drm/vblank: Add helper to get next vblank time
> drm/atomic-helper: Set fence deadline for vblank
> drm/msm: Add deadline based boost support
> drm/msm: Add wait-boost support
> drm/msm/atomic: Switch to vblank_start helper
> drm/i915: Add deadline based boost support
>
> Documentation/driver-api/dma-buf.rst | 16 ++++-
> drivers/dma-buf/dma-fence-array.c | 11 ++++
> drivers/dma-buf/dma-fence-chain.c | 12 ++++
> drivers/dma-buf/dma-fence.c | 60 ++++++++++++++++++
> drivers/dma-buf/dma-resv.c | 22 +++++++
> drivers/dma-buf/sw_sync.c | 81 +++++++++++++++++++++++++
> drivers/dma-buf/sync_debug.h | 2 +
> drivers/dma-buf/sync_file.c | 19 ++++++
> drivers/gpu/drm/drm_atomic_helper.c | 37 +++++++++++
> drivers/gpu/drm/drm_syncobj.c | 64 +++++++++++++++----
> drivers/gpu/drm/drm_vblank.c | 53 +++++++++++++---
> drivers/gpu/drm/i915/i915_request.c | 20 ++++++
> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -----
> drivers/gpu/drm/msm/msm_atomic.c | 8 ++-
> drivers/gpu/drm/msm/msm_drv.c | 12 ++--
> drivers/gpu/drm/msm/msm_fence.c | 74 ++++++++++++++++++++++
> drivers/gpu/drm/msm/msm_fence.h | 20 ++++++
> drivers/gpu/drm/msm/msm_gem.c | 5 ++
> drivers/gpu/drm/msm/msm_kms.h | 8 ---
> drivers/gpu/drm/scheduler/sched_fence.c | 46 ++++++++++++++
> drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> include/drm/drm_vblank.h | 1 +
> include/drm/gpu_scheduler.h | 17 ++++++
> include/linux/dma-fence.h | 22 +++++++
> include/linux/dma-resv.h | 2 +
> include/uapi/drm/drm.h | 17 ++++++
> include/uapi/drm/msm_drm.h | 14 ++++-
> include/uapi/linux/sync_file.h | 59 +++++++++++-------
> 28 files changed, 640 insertions(+), 79 deletions(-)
>
> --
> 2.39.2
>
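The v2 note above (filtering out later deadlines in the fence implementation rather than growing struct dma_fence) can be pictured with a small, hypothetical C sketch: the fence context keeps only the earliest pending deadline for its timeline and ignores hints that are later than it. The names are assumptions and this is a simplified illustration, not the msm or drm/scheduler code from the series; a real implementation would also need locking and would recompute the deadline as fences signal rather than simply clearing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-timeline state: instead of storing a deadline in every
 * fence, the fence context remembers only the earliest pending deadline.
 */
struct timeline_deadline {
	bool has_deadline;
	uint64_t earliest_ns;	/* absolute CLOCK_MONOTONIC ns */
};

/*
 * Record a deadline hint. Later deadlines are filtered out, since an earlier
 * one already implies at least as much urgency. Returns true if the hint
 * changed the effective deadline (e.g. a boost timer would need re-arming).
 */
static bool timeline_set_deadline(struct timeline_deadline *tl,
				  uint64_t deadline_ns)
{
	if (tl->has_deadline && deadline_ns >= tl->earliest_ns)
		return false;
	tl->has_deadline = true;
	tl->earliest_ns = deadline_ns;
	return true;
}

/* Simplification: drop the deadline once the covering fences have signaled. */
static void timeline_clear_deadline(struct timeline_deadline *tl)
{
	tl->has_deadline = false;
	tl->earliest_ns = 0;
}
```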

2023-03-16 22:27:35

by Sebastian Wick

[permalink] [raw]
Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Thu, Mar 16, 2023 at 5:29 PM Rob Clark <[email protected]> wrote:
>
> On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl <[email protected]> wrote:
> >
> > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > From: Rob Clark <[email protected]>
> > > > > > >
> > > > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > > > > the fence signaler in making power management decisions, like boosting
> > > > > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > > > > so that can be factored in to the frequency scaling.
> > > > > > >
> > > > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > > > > implementation will need similar logic to track deadlines of all
> > > > > > > the fences on the same timeline. [ckoenig]
> > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > v6: More docs
> > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > >
> > > > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > > > Reviewed-by: Christian König <[email protected]>
> > > > > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > > > > ---
> > > > > >
> > > > > > Hi Rob!
> > > > > >
> > > > > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > > > > 3 files changed, 87 insertions(+)
> > > > > > >
> > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > :doc: fence signalling annotation
> > > > > > >
> > > > > > > +DMA Fence Deadline Hints
> > > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > +
> > > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > + :doc: deadline hints
> > > > > > > +
> > > > > > > DMA Fences Functions Reference
> > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > >
> > > > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > > > > }
> > > > > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > > >
> > > > > > > +/**
> > > > > > > + * DOC: deadline hints
> > > > > > > + *
> > > > > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > > > > + * power consumption. But in the real world there are many workloads which
> > > > > > > + * defy this ideal. For example, but not limited to:
> > > > > > > + *
> > > > > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > > > > + * result reduce frequency.
> > > > > > > + *
> > > > > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > > > > + * needed.
> > > > > >
> > > > > > This is the use case I'd like to get some better understanding about how
> > > > > > this series intends to work, as the problematic scheduling behavior
> > > > > > triggered by missed deadlines has plagued compositing display servers
> > > > > > for a long time.
> > > > > >
> > > > > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > > > > developer, so I will need some hand holding when it comes to
> > > > > > understanding exactly what piece of software is responsible for
> > > > > > communicating what piece of information.
> > > > > >
> > > > > > > + *
> > > > > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > > > > + *
> > > > > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > > > > + * facing APIs). The time could either be some point in the future (such as
> > > > > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > > > > >
> > > > > > Is it guaranteed that a GPU driver will use the actual start of the
> > > > > > vblank as the effective deadline? I have some memories of seeing
> > > > > > something about vblank evasion while browsing driver code, which I
> > > > > > might have misunderstood, but I have yet to find out whether this is
> > > > > > something userspace can rely on.
> > > > >
> > > > > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > > > > clear if we talk about them separately even if they happen to be the
> > > > > same device.
> > > >
> > > > Sure, sorry about being unclear about that.
> > > >
> > > > >
> > > > > Assuming that is what you mean, nothing strongly defines what the
> > > > > deadline is. In practice there is probably some buffering in the
> > > > > display controller. For ex, block based (including bandwidth
> > > > > compressed) formats, you need to buffer up a row of blocks to
> > > > > efficiently linearize for scanout. So you probably need to latch some
> > > > > time before you start sending pixel data to the display. But details
> > > > > like this are heavily implementation dependent. I think the most
> > > > > reasonable thing to target is start of vblank.
> > > >
> > > > The driver exposing those details would be quite useful for userspace
> > > > though, so that it can delay committing updates until late, but not too
> > > > late. Setting a deadline to be the vblank seems easy enough, but it
> > > > isn't enough for scheduling the actual commit.
> > >
> > > I'm not entirely sure how that would even work.. but OTOH I think you
> > > are talking about something on the order of 100us? But that is a bit
> > > of another topic.
> >
> > Yes, something like that. But yea, it's not really related. Scheduling
> > commits closer to the deadline has more complex behavior than that too,
> > e.g. the need for real time scheduling, and knowing how long it usually
> > takes to create and commit and for the kernel to process.

Vblank can be really long, especially with VRR where the additional
time you get to finish the frame comes from making vblank longer.
Using the start of vblank as a deadline makes VRR useless. It really
would be nice to have some feedback about the actual deadline from the
kernel, maybe in `struct drm_event_vblank`.

But yes, sorry, off topic...
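To put rough numbers on the VRR point, here is a hypothetical C sketch that assumes a simplified VRR model: after a flip, the earliest next refresh comes one max-rate period later, and the panel can stretch vblank until one min-rate period has passed. A deadline at the start of vblank corresponds to the earliest time, while the frame could actually land as late as the latest one. The model ignores low framerate compensation and any driver-specific latching margin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flip window for a simplified VRR panel. */
struct vrr_window {
	uint64_t earliest_ns;	/* next refresh at max rate (start of vblank) */
	uint64_t latest_ns;	/* vblank stretched out to the min rate */
};

static struct vrr_window vrr_flip_window(uint64_t last_flip_ns,
					 unsigned int min_hz,
					 unsigned int max_hz)
{
	struct vrr_window w;

	assert(min_hz > 0 && min_hz <= max_hz);
	w.earliest_ns = last_flip_ns + 1000000000ull / max_hz;
	w.latest_ns = last_flip_ns + 1000000000ull / min_hz;
	return w;
}
```

For a 48-144 Hz panel this gives roughly 6.9 ms versus 20.8 ms after the previous flip, so a start-of-vblank deadline asks for the frame about 14 ms earlier than VRR would actually require.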

> > >
> >
> > 8-< *snip* 8-<
> >
> > > > >
> > > > > You need a fence to set the deadline, and for that work needs to be
> > > > > flushed. But you can't associate a deadline with work that the kernel
> > > > > is unaware of anyways.
> > > >
> > > > That makes sense, but it might also be a bit inadequate to have it as the
> > > > only way to tell the kernel it should speed things up. Even with the
> > > > trick i915 does, with GNOME Shell, we still end up with the feedback
> > > > loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> > > > or dropping the first frame, is so far the best workaround that works,
> > > > apart from other tricks that make the kernel ramp up its clock.
> > > > Ideally, we should not have to choose between latency and frame drops.
> > >
> > > Before you have a fence, the thing you want to be speeding up is the
> > > CPU, not the GPU. There are existing mechanisms for that.
> >
> > Is there no benefit to let the GPU know earlier that it should speed up,
> > so that when the job queue arrives, it's already up to speed?
>
> Downstream we have an input notifier that resumes the GPU so we can
> pipeline the 1-2ms it takes to boot up the GPU with userspace. But we
> wait to boost freq until we have cmdstream to submit, since that
> doesn't take as long. What needs help initially after input is all
> the stuff that happens on the CPU before the GPU can start to do
> anything ;-)
>
> Btw, I guess I haven't made this clear: the dma-fence deadline is trying
> to help the steady-state situation rather than the input-latency
> situation. It might take a frame or two of missed deadlines for
> gpufreq to arrive at a good steady-state freq.

The mutter issue is also about a suboptimal steady state.

Truth be told, I'm not sure whether this fence deadline idea fixes the
issue we're seeing, or at least helps sometimes. It might, it might
not. What annoys me is that the compositor *knows*, before any work is
submitted, that some work will be submitted and when it has to finish.
We could maximize the chances of getting everything right, but having to
wait for a fence to materialize in the compositor before doing anything
about it is suboptimal.

> > >
> > > TBF I'm of the belief that there is still a need for input based cpu
> > > boost (and early wake-up trigger for GPU).. we have something like
> > > this in CrOS kernel. That is a bit of a different topic, but my point
> > > is that fence deadlines are just one of several things we need to
> > > optimize power/perf and responsiveness, rather than the single thing
> > > that solves every problem under the sun ;-)
> >
> > Perhaps; but I believe it's a bit of a back channel of intent; the piece
> > of the puzzle that has the information to know whether there is a need
> > to actually speed up is the compositor, not the kernel.
> >
> > For example, pressing 'p' while a terminal is focused does not need high
> > frequency clocks, it just needs the terminal emulator to draw a 'p' and
> > the compositor to composite that update. Pressing <Super> may however
> > trigger a non-trivial animation moving a lot of stuff around on screen,
> > maybe triggering Wayland clients to draw and what not, and should most
> > arguably have the ability to "warn" the kernel about the upcoming flood
> > of work before it is already knocking on its doorstep.
>
> The super key is problematic, but not for the reason you think. It is
> because it is a case where we should boost on key-up instead of
> key-down.. and the second key-up event comes after the cpu-boost is
> already in its cool-down period. But even if suboptimal in cases
> like this, it is still useful for touch/stylus cases where the
> slightest of lag is much more perceptible.
>
> This is getting off topic but I kinda favor coming up with some sort
> of static definition that userspace could give the kernel to let the
> kernel know what input to boost on. Or maybe something could be done
> with BPF?

Why? Do you think user space is so slow that it can't process the
input events and then do a syscall? We need to have all input devices
that can affect the system open anyway, and we know more about how they
affect behavior than the kernel ever can.

>
> > >
> >
> > 8-< *snip* 8-<
> >
> > > >
> > > > Is it expected that WSIs will set their own deadlines, or should that
> > > > be the job of the compositor? For example, by compositors using
> > > > DMA_BUF_IOCTL_EXPORT_SYNC_FILE, which you mentioned, to set a
> > > > deadline matching the vsync the buffer will ideally be committed to?
> > > >
> > >
> > > I'm kind of assuming compositors, but if the WSI somehow has more
> > > information about ideal presentation time, then I suppose it could be
> > > in the WSI? I'll defer to folks who spend more time on WSI and
> > > compositors to hash out the details ;-)
> >
> > With my compositor developer hat on, it might be best to leave it up to
> > the compositor, since it's the one that knows whether a client's content
> > will actually end up anywhere visible.
> >
>
> wfm
>
> BR,
> -R
>
> >
> > Jonas
> >
> > >
> > > BR,
> > > -R
>


2023-03-16 22:59:58

by Rob Clark

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Thu, Mar 16, 2023 at 3:22 PM Sebastian Wick
<[email protected]> wrote:
>
> On Thu, Mar 16, 2023 at 5:29 PM Rob Clark <[email protected]> wrote:
> >
> > On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl <[email protected]> wrote:
> > >
> > > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
> > > > >
> > > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > > From: Rob Clark <[email protected]>
> > > > > > > >
> > > > > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > > > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > > > > > the fence signaler in making power management decisions, like boosting
> > > > > > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > > > > > so that can be factored in to the frequency scaling.
> > > > > > > >
> > > > > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > > > > > implementation will need similar logic to track deadlines of all
> > > > > > > > the fences on the same timeline. [ckoenig]
> > > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > > v6: More docs
> > > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > > >
> > > > > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > > > > Reviewed-by: Christian König <[email protected]>
> > > > > > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > > > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > > > > > ---
> > > > > > >
> > > > > > > Hi Rob!
> > > > > > >
> > > > > > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > > > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > > > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > > > > > 3 files changed, 87 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > :doc: fence signalling annotation
> > > > > > > >
> > > > > > > > +DMA Fence Deadline Hints
> > > > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > +
> > > > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > + :doc: deadline hints
> > > > > > > > +
> > > > > > > > DMA Fences Functions Reference
> > > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > >
> > > > > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > > > > > }
> > > > > > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + * DOC: deadline hints
> > > > > > > > + *
> > > > > > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > > > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > > > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > > > > > + * power consumption. But in the real world there are many workloads which
> > > > > > > > + * defy this ideal. For example, but not limited to:
> > > > > > > > + *
> > > > > > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > > > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > > > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > > > > > + * result reduce frequency.
> > > > > > > > + *
> > > > > > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > > > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > > > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > > > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > > > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > > > > > + * needed.
> > > > > > >
> > > > > > > This is the use case I'd like to get some better understanding about how
> > > > > > > this series intends to work, as the problematic scheduling behavior
> > > > > > > triggered by missed deadlines has plagued compositing display servers
> > > > > > > for a long time.
> > > > > > >
> > > > > > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > > > > > developer, so I will need some hand holding when it comes to
> > > > > > > understanding exactly what piece of software is responsible for
> > > > > > > communicating what piece of information.
> > > > > > >
> > > > > > > > + *
> > > > > > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > > > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > > > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > > > > > + *
> > > > > > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > > > > > + * facing APIs). The time could either be some point in the future (such as
> > > > > > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > > > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > > > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > > > > > >
> > > > > > > Is it guaranteed that a GPU driver will use the actual start of the
> > > > > > > vblank as the effective deadline? I have some memories of seing
> > > > > > > something about vblank evasion browsing driver code, which I might have
> > > > > > > misunderstood, but I have yet to find whether this is something
> > > > > > > userspace can actually expect to be something it can rely on.
> > > > > >
> > > > > > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > > > > > clear if we talk about them separately even if they happen to be the
> > > > > > same device.
> > > > >
> > > > > Sure, sorry about being unclear about that.
> > > > >
> > > > > >
> > > > > > Assuming that is what you mean, nothing strongly defines what the
> > > > > > deadline is. In practice there is probably some buffering in the
> > > > > > display controller. For ex, block based (including bandwidth
> > > > > > compressed) formats, you need to buffer up a row of blocks to
> > > > > > efficiently linearize for scanout. So you probably need to latch some
> > > > > > time before you start sending pixel data to the display. But details
> > > > > > like this are heavily implementation dependent. I think the most
> > > > > > reasonable thing to target is start of vblank.
> > > > >
> > > > > The driver exposing those details would be quite useful for userspace
> > > > > though, so that it can delay committing updates to late, but not too
> > > > > late. Setting a deadline to be the vblank seems easy enough, but it
> > > > > isn't enough for scheduling the actual commit.
> > > >
> > > > I'm not entirely sure how that would even work.. but OTOH I think you
> > > > are talking about something on the order of 100us? But that is a bit
> > > > of another topic.
> > >
> > > Yes, something like that. But yea, it's not really related. Scheduling
> > > commits closer to the deadline has more complex behavior than that too,
> > > e.g. the need for real time scheduling, and knowing how long it usually
> > > takes to create and commit and for the kernel to process.
>
> Vblank can be really long, especially with VRR where the additional
> time you get to finish the frame comes from making vblank longer.
> Using the start of vblank as a deadline makes VRR useless. It really
> would be nice to have some feedback about the actual deadline from the
> kernel, maybe in `struct drm_event_vblank`.

Note that here we are only talking about the difference between
start/end of vblank and the deadline for the hw to latch a change for
the next frame (which I _expect_ generally amounts to however long
it takes to slurp in a row of tiles).
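To put a rough number on that, here is a sketch that estimates the latch lead time from standard mode timings. It assumes, hypothetically, that the display controller must fetch exactly one row of tiles before scanout begins and that a tile is 8 lines tall. Neither assumption comes from the thread, and real hardware differs.

```c
#include <stdint.h>

/* Hypothetical estimate of how long before scanout a change must be
 * latched: the time to fetch one row of tiles. htotal is the mode's
 * horizontal total (pixels per line, including blanking); tile_height
 * is an assumption, real display controllers differ. */
static uint64_t latch_lead_ns(uint32_t htotal, uint64_t pixel_clock_hz,
                              uint32_t tile_height)
{
    /* One scanline lasts htotal / pixel_clock seconds and a row of tiles
     * spans tile_height scanlines. Multiply before dividing to keep
     * precision in integer arithmetic. */
    return (uint64_t)tile_height * htotal * 1000000000ULL / pixel_clock_hz;
}
```

For a 1920x1080@60 CEA mode (htotal 2200, 148.5 MHz pixel clock), `latch_lead_ns(2200, 148500000, 8)` gives 118518 ns, roughly 119 µs. That is in the same ballpark as the ~100 µs figure mentioned earlier in the thread.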

> But yes, sorry, off topic...
>
> > > >
> > >
> > > 8-< *snip* 8-<
> > >
> > > > > >
> > > > > > You need a fence to set the deadline, and for that work needs to be
> > > > > > flushed. But you can't associate a deadline with work that the kernel
> > > > > > is unaware of anyways.
> > > > >
> > > > > That makes sense, but it might also a bit inadequate to have it as the
> > > > > only way to tell the kernel it should speed things up. Even with the
> > > > > trick i915 does, with GNOME Shell, we still end up with the feedback
> > > > > loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> > > > > or dropping the first frame is so far the best work around that works,
> > > > > except doing other tricks that makes the kernel to ramp up its clock.
> > > > > Having to rely on choosing between latency and frame drops should
> > > > > ideally not have to be made.
> > > >
> > > > Before you have a fence, the thing you want to be speeding up is the
> > > > CPU, not the GPU. There are existing mechanisms for that.
> > >
> > > Is there no benefit to let the GPU know earlier that it should speed up,
> > > so that when the job queue arrives, it's already up to speed?
> >
> > Downstream we have input notifier that resumes the GPU so we can
> > pipeline the 1-2ms it takes to boot up the GPU with userspace. But we
> > wait to boost freq until we have cmdstream to submit, since that
> > doesn't take as long. What needs help initially after input is all
> > the stuff that happens on the CPU before the GPU can start to do
> > anything ;-)
> >
> > Btw, I guess I haven't made this clear, dma-fence deadline is trying
> > to help the steady-state situation, rather than the input-latency
> > situation. It might take a frame or two of missed deadlines for
> > gpufreq to arrive at a good steady-state freq.
>
> The mutter issue also is about a suboptimal steady-state.
>
> Truth be told, I'm not sure if this fence deadline idea fixes the
> issue we're seeing or at least helps sometimes. It might, it might
> not. What annoys me is that the compositor *knows* before any work is
> submitted that some work will be submitted and when it has to finish.
> We could maximize the chances to get everything right but having to
> wait for a fence to materialize in the compositor to do anything about
> it is suboptimal.

Why would the app not immediately send the fence+buf to the compositor
as soon as it is submitted to the kernel on the client process side?

At any rate, it really doesn't matter how early the kernel finds out
about the deadline, since the point is to let the kernel driver know
whether it is missing the deadline, so that it doesn't misinterpret stall
time spent waiting for the _next_ vblank after the one we wanted as idle time.
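Here is a toy model of the misinterpretation being avoided. A purely utilization-based governor lowers frequency when it sees idle time. A deadline-aware governor instead treats a missed deadline as a reason to step up. This is a hypothetical sketch: the OPP table and the 90%/50% thresholds are made up, and it does not reflect how msm, i915 or devfreq actually implement their governors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical OPP table in MHz, lowest to highest. */
static const uint32_t opp_mhz[] = { 200, 300, 400, 500, 600 };
#define NUM_OPPS (sizeof(opp_mhz) / sizeof(opp_mhz[0]))

static uint32_t opp_freq_mhz(unsigned idx)
{
    return opp_mhz[idx];
}

/* Pick the next OPP index from the busy time seen over one sampling
 * period. Thresholds are illustrative only. */
static unsigned next_opp(unsigned cur, uint64_t busy_ns, uint64_t period_ns,
                         bool missed_deadline)
{
    /* A missed deadline means the idle time was spent waiting for the
     * next vblank, not spare capacity: step up instead of down. */
    if (missed_deadline)
        return cur + 1 < NUM_OPPS ? cur + 1 : cur;

    uint64_t util_pct = busy_ns * 100 / period_ns;
    if (util_pct >= 90 && cur + 1 < NUM_OPPS)
        return cur + 1;
    if (util_pct < 50 && cur > 0)
        return cur - 1;
    return cur;
}
```

Take a double-buffered frame that took 10 ms of GPU time but missed a 60 Hz vblank, so the period became about 33.3 ms. It looks 30% utilized, so the naive path drops from 400 to 300 MHz. With the missed-deadline signal, the governor raises the frequency to 500 MHz instead.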

> > > >
> > > > TBF I'm of the belief that there is still a need for input based cpu
> > > > boost (and early wake-up trigger for GPU).. we have something like
> > > > this in CrOS kernel. That is a bit of a different topic, but my point
> > > > is that fence deadlines are just one of several things we need to
> > > > optimize power/perf and responsiveness, rather than the single thing
> > > > that solves every problem under the sun ;-)
> > >
> > > Perhaps; but I believe it's a bit of a back channel of intent; the piece
> > > of the puzzle that has the information to know whether there is need
> > > actually speed up is the compositor, not the kernel.
> > >
> > > For example, pressing 'p' while a terminal is focused does not need high
> > > frequency clocks, it just needs the terminal emulator to draw a 'p' and
> > > the compositor to composite that update. Pressing <Super> may however
> > > trigger a non-trivial animation moving a lot of stuff around on screen,
> > > maybe triggering Wayland clients to draw and what not, and should most
> > > arguably have the ability to "warn" the kernel about the upcoming flood
> > > of work before it is already knocking on its door step.
> >
> > The super key is problematic, but not for the reason you think. It is
> > because it is a case where we should boost on key-up instead of
> > key-down.. and the second key-up event comes after the cpu-boost is
> > already in it's cool-down period. But even if suboptimal in cases
> > like this, it is still useful for touch/stylus cases where the
> > slightest of lag is much more perceptible.
> >
> > This is getting off topic but I kinda favor coming up with some sort
> > of static definition that userspace could give the kernel to let the
> > kernel know what input to boost on. Or maybe something could be done
> > with BPF?
>
> Why? Do you think user space is so slow that it can't process the
> input events and then do a syscall? We need to have all input devices
> open anyway that can affect the system and know more about how they
> affect behavior than the kernel can ever know.

Again, this is getting off into a different topic. But my gut feeling is
that the shorter the path to the input CPU freq boost, the better, since
however many extra cycles you add will be cycles spent with the CPU (and
probably DDR) at its lowest frequency.

BR,
-R

> >
> > > >
> > >
> > > 8-< *snip* 8-<
> > >
> > > > >
> > > > > Is it expected that WSI's will set their own deadlines, or should that
> > > > > be the job of the compositor? For example by using compositors using
> > > > > DMA_BUF_IOCTL_EXPORT_SYNC_FILE that you mentioned, using it to set a
> > > > > deadline matching the vsync it most ideally will be committed to?
> > > > >
> > > >
> > > > I'm kind of assuming compositors, but if the WSI somehow has more
> > > > information about ideal presentation time, then I suppose it could be
> > > > in the WSI? I'll defer to folks who spend more time on WSI and
> > > > compositors to hash out the details ;-)
> > >
> > > With my compositor developer hat on, it might be best to let it be up to
> > > the compositor, it's the one that knows if a client's content will
> > > actually end up anywhere visible.
> > >
> >
> > wfm
> >
> > BR,
> > -R
> >
> > >
> > > Jonas
> > >
> > > >
> > > > BR,
> > > > -R
> >
>

2023-03-17 09:11:49

by Michel Dänzer

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On 3/16/23 23:22, Sebastian Wick wrote:
> On Thu, Mar 16, 2023 at 5:29 PM Rob Clark <[email protected]> wrote:
>> On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl <[email protected]> wrote:
>>> On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
>>>> On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
>>>>> On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
>>>>>> On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
>>>>>>>
>>>>>>>> + *
>>>>>>>> + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
>>>>>>>> + * The deadline hint provides a way for the waiting driver, or userspace, to
>>>>>>>> + * convey an appropriate sense of urgency to the signaling driver.
>>>>>>>> + *
>>>>>>>> + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
>>>>>>>> + * facing APIs). The time could either be some point in the future (such as
>>>>>>>> + * the vblank based deadline for page-flipping, or the start of a compositor's
>>>>>>>> + * composition cycle), or the current time to indicate an immediate deadline
>>>>>>>> + * hint (Ie. forward progress cannot be made until this fence is signaled).
>>>>>>>
>>>>>>> Is it guaranteed that a GPU driver will use the actual start of the
>>>>>>> vblank as the effective deadline? I have some memories of seing
>>>>>>> something about vblank evasion browsing driver code, which I might have
>>>>>>> misunderstood, but I have yet to find whether this is something
>>>>>>> userspace can actually expect to be something it can rely on.
>>>>>>
>>>>>> I guess you mean s/GPU driver/display driver/ ? It makes things more
>>>>>> clear if we talk about them separately even if they happen to be the
>>>>>> same device.
>>>>>
>>>>> Sure, sorry about being unclear about that.
>>>>>
>>>>>>
>>>>>> Assuming that is what you mean, nothing strongly defines what the
>>>>>> deadline is. In practice there is probably some buffering in the
>>>>>> display controller. For ex, block based (including bandwidth
>>>>>> compressed) formats, you need to buffer up a row of blocks to
>>>>>> efficiently linearize for scanout. So you probably need to latch some
>>>>>> time before you start sending pixel data to the display. But details
>>>>>> like this are heavily implementation dependent. I think the most
>>>>>> reasonable thing to target is start of vblank.
>>>>>
>>>>> The driver exposing those details would be quite useful for userspace
>>>>> though, so that it can delay committing updates to late, but not too
>>>>> late. Setting a deadline to be the vblank seems easy enough, but it
>>>>> isn't enough for scheduling the actual commit.
>>>>
>>>> I'm not entirely sure how that would even work.. but OTOH I think you
>>>> are talking about something on the order of 100us? But that is a bit
>>>> of another topic.
>>>
>>> Yes, something like that. But yea, it's not really related. Scheduling
>>> commits closer to the deadline has more complex behavior than that too,
>>> e.g. the need for real time scheduling, and knowing how long it usually
>>> takes to create and commit and for the kernel to process.
>
> Vblank can be really long, especially with VRR where the additional
> time you get to finish the frame comes from making vblank longer.
> Using the start of vblank as a deadline makes VRR useless.

Not really. We normally still want to aim for the start of vblank with VRR, which would result in the maximum refresh rate. Missing that target just incurs less of a penalty than with a fixed refresh rate.
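The difference in penalty can be made concrete with simplified arithmetic. With fixed refresh, a frame that misses the start of vblank waits for the next refresh. With VRR, vblank is extended and the frame is scanned out roughly as soon as it is ready. This is an idealized sketch: times are in µs, measured from the previous flip, and frames later than the panel's maximum frame time are not modelled.

```c
#include <stdint.h>

/* Fixed refresh: a frame ready at ready_us is shown at the next refresh
 * boundary, i.e. ready_us rounded up to a multiple of period_us. */
static uint64_t fixed_present_us(uint64_t ready_us, uint64_t period_us)
{
    return (ready_us + period_us - 1) / period_us * period_us;
}

/* Idealized VRR: the panel cannot refresh sooner than min_period_us (its
 * maximum refresh rate); otherwise vblank is stretched until the frame
 * is ready. The upper bound (minimum refresh rate) is ignored here. */
static uint64_t vrr_present_us(uint64_t ready_us, uint64_t min_period_us)
{
    return ready_us > min_period_us ? ready_us : min_period_us;
}
```

A frame ready at 17 ms misses a fixed 60 Hz target (16.667 ms) and is shown at 33.334 ms. On an idealized 144 Hz-capable VRR panel the same frame is shown at 17 ms. Aiming for the start of vblank still pays off, but a miss costs only the overrun rather than a whole frame.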


--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer


2023-03-17 10:23:50

by Jonas Ådahl

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Thu, Mar 16, 2023 at 09:28:55AM -0700, Rob Clark wrote:
> On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl <[email protected]> wrote:
> >
> > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > From: Rob Clark <[email protected]>
> > > > > > >
> > > > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > > > > the fence signaler in making power management decisions, like boosting
> > > > > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > > > > so that can be factored in to the frequency scaling.
> > > > > > >
> > > > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > > > > implementation will need similar logic to track deadlines of all
> > > > > > > the fences on the same timeline. [ckoenig]
> > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > v6: More docs
> > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > >
> > > > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > > > Reviewed-by: Christian König <[email protected]>
> > > > > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > > > > ---
> > > > > >
> > > > > > Hi Rob!
> > > > > >
> > > > > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > > > > 3 files changed, 87 insertions(+)
> > > > > > >
> > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > :doc: fence signalling annotation
> > > > > > >
> > > > > > > +DMA Fence Deadline Hints
> > > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > +
> > > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > + :doc: deadline hints
> > > > > > > +
> > > > > > > DMA Fences Functions Reference
> > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > >
> > > > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > > > > }
> > > > > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > > >
> > > > > > > +/**
> > > > > > > + * DOC: deadline hints
> > > > > > > + *
> > > > > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > > > > + * power consumption. But in the real world there are many workloads which
> > > > > > > + * defy this ideal. For example, but not limited to:
> > > > > > > + *
> > > > > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > > > > + * result reduce frequency.
> > > > > > > + *
> > > > > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > > > > + * needed.
> > > > > >
> > > > > > This is the use case I'd like to get some better understanding about how
> > > > > > this series intends to work, as the problematic scheduling behavior
> > > > > > triggered by missed deadlines has plagued compositing display servers
> > > > > > for a long time.
> > > > > >
> > > > > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > > > > developer, so I will need some hand holding when it comes to
> > > > > > understanding exactly what piece of software is responsible for
> > > > > > communicating what piece of information.
> > > > > >
> > > > > > > + *
> > > > > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > > > > + *
> > > > > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > > > > + * facing APIs). The time could either be some point in the future (such as
> > > > > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > > > > >
> > > > > > Is it guaranteed that a GPU driver will use the actual start of the
> > > > > > vblank as the effective deadline? I have some memories of seing
> > > > > > something about vblank evasion browsing driver code, which I might have
> > > > > > misunderstood, but I have yet to find whether this is something
> > > > > > userspace can actually expect to be something it can rely on.
> > > > >
> > > > > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > > > > clear if we talk about them separately even if they happen to be the
> > > > > same device.
> > > >
> > > > Sure, sorry about being unclear about that.
> > > >
> > > > >
> > > > > Assuming that is what you mean, nothing strongly defines what the
> > > > > deadline is. In practice there is probably some buffering in the
> > > > > display controller. For ex, block based (including bandwidth
> > > > > compressed) formats, you need to buffer up a row of blocks to
> > > > > efficiently linearize for scanout. So you probably need to latch some
> > > > > time before you start sending pixel data to the display. But details
> > > > > like this are heavily implementation dependent. I think the most
> > > > > reasonable thing to target is start of vblank.
> > > >
> > > > The driver exposing those details would be quite useful for userspace
> > > > though, so that it can delay committing updates to late, but not too
> > > > late. Setting a deadline to be the vblank seems easy enough, but it
> > > > isn't enough for scheduling the actual commit.
> > >
> > > I'm not entirely sure how that would even work.. but OTOH I think you
> > > are talking about something on the order of 100us? But that is a bit
> > > of another topic.
> >
> > Yes, something like that. But yea, it's not really related. Scheduling
> > commits closer to the deadline has more complex behavior than that too,
> > e.g. the need for real time scheduling, and knowing how long it usually
> > takes to create and commit and for the kernel to process.
> >
> > >
> >
> > 8-< *snip* 8-<
> >
> > > > >
> > > > > You need a fence to set the deadline, and for that work needs to be
> > > > > flushed. But you can't associate a deadline with work that the kernel
> > > > > is unaware of anyways.
> > > >
> > > > That makes sense, but it might also a bit inadequate to have it as the
> > > > only way to tell the kernel it should speed things up. Even with the
> > > > trick i915 does, with GNOME Shell, we still end up with the feedback
> > > > loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> > > > or dropping the first frame is so far the best work around that works,
> > > > except doing other tricks that makes the kernel to ramp up its clock.
> > > > Having to rely on choosing between latency and frame drops should
> > > > ideally not have to be made.
> > >
> > > Before you have a fence, the thing you want to be speeding up is the
> > > CPU, not the GPU. There are existing mechanisms for that.
> >
> > Is there no benefit to let the GPU know earlier that it should speed up,
> > so that when the job queue arrives, it's already up to speed?
>
> Downstream we have input notifier that resumes the GPU so we can
> pipeline the 1-2ms it takes to boot up the GPU with userspace. But we
> wait to boost freq until we have cmdstream to submit, since that
> doesn't take as long. What needs help initially after input is all
> the stuff that happens on the CPU before the GPU can start to do
> anything ;-)

How do you deal with boosting CPU speeds downstream? Does the input
notifier do that too?

>
> Btw, I guess I haven't made this clear, dma-fence deadline is trying
> to help the steady-state situation, rather than the input-latency
> situation. It might take a frame or two of missed deadlines for
> gpufreq to arrive at a good steady-state freq.

I'm just not sure it will help. Missed deadlines set at commit haven't
been enough in the past to let the kernel understand it should speed
things up before the next frame (which will be a whole frame late
without triple buffering, which should be a last resort), so I don't
see how adding a userspace hook to do the same thing will help.

I think input latency and steady-state target frequency are tightly
linked here; what we should aim for is to provide enough information at the
right time so that it does *not* take a frame or two of missed
deadlines to arrive at the target frequency, as those missed deadlines
mean stuttering and/or lag.
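Here is why the ramp-up cost matters. If a governor can only raise frequency one step per missed deadline, the number of bad frames grows with the distance to the required OPP. A hint that arrived early enough could jump straight there instead. This is a hypothetical sketch: the one-step-per-frame policy and the OPP values are assumptions, not a description of any particular driver.

```c
#include <stdint.h>

/* Count the frames that miss their deadline while a governor raising
 * frequency by one OPP per missed deadline climbs from opp_mhz[start]
 * to the first OPP of at least required_mhz. If even the top OPP is too
 * slow, this only counts the steps taken to reach it. */
static unsigned frames_missed_stepwise(const uint32_t *opp_mhz, unsigned n,
                                       unsigned start, uint32_t required_mhz)
{
    unsigned missed = 0;
    unsigned cur = start;

    while (opp_mhz[cur] < required_mhz && cur + 1 < n) {
        missed++; /* this frame was rendered too slowly */
        cur++;    /* the governor reacts by a single step */
    }
    return missed;
}
```

With OPPs of 200/300/400/500/600 MHz and a workload needing 500 MHz, starting from 200 MHz costs three missed frames. Starting at the right OPP costs none.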

I do understand that it helps with deliberately late commits, but we
don't do that yet; we intend to once there is kernel UAPI that lets us do
so without negative consequences.
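A deliberately late commit amounts to working backwards from the target vblank. The sketch below computes when a compositor might start composing and which deadline it would attach to the resulting fence. `render_estimate_ns` and `slack_ns` are hypothetical inputs, for example derived from past frame timings. All times are absolute CLOCK_MONOTONIC nanoseconds, matching the convention in the patch documentation; `now_ns` would typically come from `clock_gettime(CLOCK_MONOTONIC, ...)`.

```c
#include <stdint.h>

struct commit_plan {
    uint64_t start_ns;    /* when to begin composing the frame */
    uint64_t deadline_ns; /* deadline hint to set on the GPU fence */
};

/* Hypothetical planning helper: target the start of the next vblank and
 * begin composing just early enough to make it. If that start time has
 * already passed, begin immediately. */
static struct commit_plan plan_late_commit(uint64_t next_vblank_ns,
                                           uint64_t render_estimate_ns,
                                           uint64_t slack_ns,
                                           uint64_t now_ns)
{
    struct commit_plan p;
    uint64_t budget = render_estimate_ns + slack_ns;

    p.deadline_ns = next_vblank_ns;
    p.start_ns = next_vblank_ns > budget ? next_vblank_ns - budget : 0;
    if (p.start_ns < now_ns)
        p.start_ns = now_ns;
    return p;
}
```

The deadline stays pinned to the vblank even when the start slips. In that case the GPU sees an increasingly urgent hint rather than a silently moved target.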

>
> > >
> > > TBF I'm of the belief that there is still a need for input based cpu
> > > boost (and early wake-up trigger for GPU).. we have something like
> > > this in CrOS kernel. That is a bit of a different topic, but my point
> > > is that fence deadlines are just one of several things we need to
> > > optimize power/perf and responsiveness, rather than the single thing
> > > that solves every problem under the sun ;-)
> >
> > Perhaps; but I believe it's a bit of a back channel of intent; the piece
> > of the puzzle that has the information to know whether there is need
> > actually speed up is the compositor, not the kernel.
> >
> > For example, pressing 'p' while a terminal is focused does not need high
> > frequency clocks, it just needs the terminal emulator to draw a 'p' and
> > the compositor to composite that update. Pressing <Super> may however
> > trigger a non-trivial animation moving a lot of stuff around on screen,
> > maybe triggering Wayland clients to draw and what not, and should most
> > arguably have the ability to "warn" the kernel about the upcoming flood
> > of work before it is already knocking on its door step.
>
> The super key is problematic, but not for the reason you think. It is
> because it is a case where we should boost on key-up instead of
> key-down.. and the second key-up event comes after the cpu-boost is
> already in it's cool-down period. But even if suboptimal in cases
> like this, it is still useful for touch/stylus cases where the
> slightest of lag is much more perceptible.

Other keys are even more problematic. Alt, for example, does nothing;
Alt + Tab does some light rendering; but Alt + KeyAboveTab will,
depending on the currently active applications, suddenly trigger N Wayland
surfaces to start rendering at the same time.

>
> This is getting off topic but I kinda favor coming up with some sort
> of static definition that userspace could give the kernel to let the
> kernel know what input to boost on. Or maybe something could be done
> with BPF?

I have a hard time seeing how any static information can be enough; what
is expected to happen depends too much on context. And can a BPF
program really help? Unless BPF programs can pull some internal kernel
strings to speed things up whenever userspace wants, I don't see how it
is that much better.

I don't think userspace is necessarily too slow to actively participate
in providing direct scheduling hints either. Input processing can, for
example, be offloaded to a real-time scheduled thread, and hints about
future expectations from the rendering, windowing and layout subsystems
will be significantly easier to plumb to a real-time input thread than
to translate into static information or BPF programs.
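Offloading input processing to a real-time thread, as suggested above, can be done with standard POSIX thread attributes. This is a minimal sketch. SCHED_FIFO normally requires CAP_SYS_NICE or an RLIMIT_RTPRIO allowance, so the code falls back to a normal thread when permission is denied. `input_thread_main` is a hypothetical placeholder for the loop that reads input events and forwards scheduling hints.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int input_thread_ran;

/* Hypothetical placeholder for the input loop (reading events and
 * forwarding hints about upcoming work). Here it only records that it
 * ran. */
static void *input_thread_main(void *arg)
{
    (void)arg;
    atomic_store(&input_thread_ran, 1);
    return NULL;
}

/* Start the input thread with SCHED_FIFO if permitted, otherwise with the
 * default policy. Returns 0 on success or a pthread error code;
 * *is_realtime reports which policy was used. */
static int start_input_thread(pthread_t *thread, int *is_realtime)
{
    pthread_attr_t attr;
    struct sched_param param = { .sched_priority = 1 };
    int ret;

    pthread_attr_init(&attr);
    /* Without EXPLICIT_SCHED the policy below would be ignored. */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &param);

    ret = pthread_create(thread, &attr, input_thread_main, NULL);
    pthread_attr_destroy(&attr);

    *is_realtime = (ret == 0);
    if (ret == EPERM) /* not allowed to use real-time scheduling */
        ret = pthread_create(thread, NULL, input_thread_main, NULL);
    return ret;
}
```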


Jonas

2023-03-17 15:10:05

by Sebastian Wick

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Thu, Mar 16, 2023 at 11:59 PM Rob Clark <[email protected]> wrote:
>
> On Thu, Mar 16, 2023 at 3:22 PM Sebastian Wick
> <[email protected]> wrote:
> >
> > On Thu, Mar 16, 2023 at 5:29 PM Rob Clark <[email protected]> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl <[email protected]> wrote:
> > > >
> > > > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
> > > > > >
> > > > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > > > From: Rob Clark <[email protected]>
> > > > > > > > >
> > > > > > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > > > > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > > > > > > the fence signaler in making power management decisions, like boosting
> > > > > > > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > > > > > > so that can be factored in to the frequency scaling.
> > > > > > > > >
> > > > > > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > > > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > > > > > > implementation will need similar logic to track deadlines of all
> > > > > > > > > the fences on the same timeline. [ckoenig]
> > > > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > > > v6: More docs
> > > > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > > > >
> > > > > > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > > > > > Reviewed-by: Christian König <[email protected]>
> > > > > > > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > > > > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > > > > > > ---
> > > > > > > >
> > > > > > > > Hi Rob!
> > > > > > > >
> > > > > > > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > > > > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > > > > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > > > > > > 3 files changed, 87 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > > :doc: fence signalling annotation
> > > > > > > > >
> > > > > > > > > +DMA Fence Deadline Hints
> > > > > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > > +
> > > > > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > > + :doc: deadline hints
> > > > > > > > > +
> > > > > > > > > DMA Fences Functions Reference
> > > > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > > > > > > }
> > > > > > > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > > > > >
> > > > > > > > > +/**
> > > > > > > > > + * DOC: deadline hints
> > > > > > > > > + *
> > > > > > > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > > > > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > > > > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > > > > > > + * power consumption. But in the real world there are many workloads which
> > > > > > > > > + * defy this ideal. For example, but not limited to:
> > > > > > > > > + *
> > > > > > > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > > > > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > > > > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > > > > > > + * result reduce frequency.
> > > > > > > > > + *
> > > > > > > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > > > > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > > > > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > > > > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > > > > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > > > > > > + * needed.
> > > > > > > >
> > > > > > > > This is the use case I'd like to get some better understanding about how
> > > > > > > > this series intends to work, as the problematic scheduling behavior
> > > > > > > > triggered by missed deadlines has plagued compositing display servers
> > > > > > > > for a long time.
> > > > > > > >
> > > > > > > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > > > > > > developer, so I will need some hand holding when it comes to
> > > > > > > > understanding exactly what piece of software is responsible for
> > > > > > > > communicating what piece of information.
> > > > > > > >
> > > > > > > > > + *
> > > > > > > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > > > > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > > > > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > > > > > > + *
> > > > > > > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > > > > > > + * facing APIs). The time could either be some point in the future (such as
> > > > > > > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > > > > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > > > > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > > > > > > >
> > > > > > > > Is it guaranteed that a GPU driver will use the actual start of the
> > > > > > > > vblank as the effective deadline? I have some memories of seeing
> > > > > > > > something about vblank evasion browsing driver code, which I might have
> > > > > > > > misunderstood, but I have yet to find whether this is something
> > > > > > > > userspace can actually expect to be something it can rely on.
> > > > > > >
> > > > > > > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > > > > > > clear if we talk about them separately even if they happen to be the
> > > > > > > same device.
> > > > > >
> > > > > > Sure, sorry about being unclear about that.
> > > > > >
> > > > > > >
> > > > > > > Assuming that is what you mean, nothing strongly defines what the
> > > > > > > deadline is. In practice there is probably some buffering in the
> > > > > > > display controller. For ex, block based (including bandwidth
> > > > > > > compressed) formats, you need to buffer up a row of blocks to
> > > > > > > efficiently linearize for scanout. So you probably need to latch some
> > > > > > > time before you start sending pixel data to the display. But details
> > > > > > > like this are heavily implementation dependent. I think the most
> > > > > > > reasonable thing to target is start of vblank.
> > > > > >
> > > > > > The driver exposing those details would be quite useful for userspace
> > > > > > though, so that it can delay committing updates too late, but not too
> > > > > > late. Setting a deadline to be the vblank seems easy enough, but it
> > > > > > isn't enough for scheduling the actual commit.
> > > > >
> > > > > I'm not entirely sure how that would even work.. but OTOH I think you
> > > > > are talking about something on the order of 100us? But that is a bit
> > > > > of another topic.
> > > >
> > > > Yes, something like that. But yea, it's not really related. Scheduling
> > > > commits closer to the deadline has more complex behavior than that too,
> > > > e.g. the need for real time scheduling, and knowing how long it usually
> > > > takes to create and commit and for the kernel to process.
> >
> > Vblank can be really long, especially with VRR where the additional
> > time you get to finish the frame comes from making vblank longer.
> > Using the start of vblank as a deadline makes VRR useless. It really
> > would be nice to have some feedback about the actual deadline from the
> > kernel, maybe in `struct drm_event_vblank`.
>
> note that here we are only talking about the difference between
> start/end of vblank and the deadline for the hw to latch a change for
> the next frame. (Which I _expect_ generally amounts to however long
> it takes to slurp in a row of tiles)
>
> > But yes, sorry, off topic...
> >
> > > > >
> > > >
> > > > 8-< *snip* 8-<
> > > >
> > > > > > >
> > > > > > > You need a fence to set the deadline, and for that work needs to be
> > > > > > > flushed. But you can't associate a deadline with work that the kernel
> > > > > > > is unaware of anyways.
> > > > > >
> > > > > > That makes sense, but it might also be a bit inadequate to have it as the
> > > > > > only way to tell the kernel it should speed things up. Even with the
> > > > > > trick i915 does, with GNOME Shell, we still end up with the feedback
> > > > > > loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> > > > > > or dropping the first frame is so far the best work around that works,
> > > > > > except doing other tricks that makes the kernel to ramp up its clock.
> > > > > > Having to rely on choosing between latency and frame drops should
> > > > > > ideally not have to be made.
> > > > >
> > > > > Before you have a fence, the thing you want to be speeding up is the
> > > > > CPU, not the GPU. There are existing mechanisms for that.
> > > >
> > > > Is there no benefit to let the GPU know earlier that it should speed up,
> > > > so that when the job queue arrives, it's already up to speed?
> > >
> > > Downstream we have input notifier that resumes the GPU so we can
> > > pipeline the 1-2ms it takes to boot up the GPU with userspace. But we
> > > wait to boost freq until we have cmdstream to submit, since that
> > > doesn't take as long. What needs help initially after input is all
> > > the stuff that happens on the CPU before the GPU can start to do
> > > anything ;-)
> > >
> > > Btw, I guess I haven't made this clear, dma-fence deadline is trying
> > > to help the steady-state situation, rather than the input-latency
> > > situation. It might take a frame or two of missed deadlines for
> > > gpufreq to arrive at a good steady-state freq.
> >
> > The mutter issue also is about a suboptimal steady-state.
> >
> > Truth be told, I'm not sure if this fence deadline idea fixes the
> > issue we're seeing or at least helps sometimes. It might, it might
> > not. What annoys me is that the compositor *knows* before any work is
> > submitted that some work will be submitted and when it has to finish.
> > We could maximize the chances to get everything right but having to
> > wait for a fence to materialize in the compositor to do anything about
> > it is suboptimal.
>
> Why would the app not immediately send the fence+buf to the compositor
> as soon as it is submitted to the kernel on client process side?

Some apps just are not good at this. Reading back work from the GPU,
taking a lot of CPU time to create the GPU work, etc.

The other obvious offender: frame callbacks. Committing a buffer only
happens after receiving a frame callback in FIFO/vsync mode which we
try to schedule as close to the deadline as possible.

The idea that clients are able to submit all GPU work early, then
immediately commit so that it shows up in the compositor well before
the deadline, is very idealized. We're trying to get there, but we
only have control over the WSI, so bad apps will still be bad apps.

> At any rate, it really doesn't matter how early the kernel finds out
> about the deadline, since the point is to let the kernel driver know
> if it is missing the deadline so that it doesn't mis-interpret stall
> time waiting for the _next_ vblank after the one we wanted.

That's a good point! Let's see how well this works in practice and how
we can improve on that in the future.
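To make Rob's point about stall time concrete, here is a toy sketch of a utilization-based governor that treats a missed deadline as a reason to boost rather than to back off. Everything in it (struct toy_gov, toy_gov_update(), the 60%/90% thresholds, the step sizes) is hypothetical and not taken from msm, i915 or the devfreq core; it only illustrates why knowing that a deadline was missed matters more than how early the deadline was set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy governor state; not taken from any real driver. */
struct toy_gov {
	unsigned int freq_mhz;
	unsigned int min_mhz;
	unsigned int max_mhz;
};

/*
 * Called once per sampling window. busy_ns/idle_ns are the device busy and
 * idle time within the window; missed_deadline is true if a fence carrying
 * a deadline hint signaled after that deadline during the window.
 */
static void toy_gov_update(struct toy_gov *g, uint64_t busy_ns,
			   uint64_t idle_ns, bool missed_deadline)
{
	uint64_t total = busy_ns + idle_ns;
	unsigned int util_pct = total ? (unsigned int)(busy_ns * 100 / total) : 0;

	assert(g->min_mhz <= g->freq_mhz && g->freq_mhz <= g->max_mhz);

	if (missed_deadline) {
		/* The idle time is (at least partly) the stall waiting for
		 * the *next* vblank, so don't read it as spare capacity:
		 * step halfway towards max instead of scaling down. */
		g->freq_mhz += (g->max_mhz - g->freq_mhz) / 2;
	} else if (util_pct > 90) {
		g->freq_mhz += g->freq_mhz / 4;
		if (g->freq_mhz > g->max_mhz)
			g->freq_mhz = g->max_mhz;
	} else if (util_pct < 60) {
		g->freq_mhz -= g->freq_mhz / 4;
		if (g->freq_mhz < g->min_mhz)
			g->freq_mhz = g->min_mhz;
	}
}
```

With the same 37% utilization, the plain utilization path drops 300 MHz to 225 MHz, while the missed-deadline path raises it to 550 MHz.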

> > > > >
> > > > > TBF I'm of the belief that there is still a need for input based cpu
> > > > > boost (and early wake-up trigger for GPU).. we have something like
> > > > > this in CrOS kernel. That is a bit of a different topic, but my point
> > > > > is that fence deadlines are just one of several things we need to
> > > > > optimize power/perf and responsiveness, rather than the single thing
> > > > > that solves every problem under the sun ;-)
> > > >
> > > > Perhaps; but I believe it's a bit of a back channel of intent; the piece
> > > > of the puzzle that has the information to know whether there is need
> > > > actually speed up is the compositor, not the kernel.
> > > >
> > > > For example, pressing 'p' while a terminal is focused does not need high
> > > > frequency clocks, it just needs the terminal emulator to draw a 'p' and
> > > > the compositor to composite that update. Pressing <Super> may however
> > > > trigger a non-trivial animation moving a lot of stuff around on screen,
> > > > maybe triggering Wayland clients to draw and what not, and should most
> > > > arguably have the ability to "warn" the kernel about the upcoming flood
> > > > of work before it is already knocking on its door step.
> > >
> > > The super key is problematic, but not for the reason you think. It is
> > > because it is a case where we should boost on key-up instead of
> > > key-down.. and the second key-up event comes after the cpu-boost is
> > > already in it's cool-down period. But even if suboptimal in cases
> > > like this, it is still useful for touch/stylus cases where the
> > > slightest of lag is much more perceptible.
> > >
> > > This is getting off topic but I kinda favor coming up with some sort
> > > of static definition that userspace could give the kernel to let the
> > > kernel know what input to boost on. Or maybe something could be done
> > > with BPF?
> >
> > Why? Do you think user space is so slow that it can't process the
> > input events and then do a syscall? We need to have all input devices
> > open anyway that can affect the system and know more about how they
> > affect behavior than the kernel can ever know.
>
> Again this is getting off into a different topic. But my gut feel is
> that the shorter the path to input cpu freq boost, the better.. since
> however many extra cycles you add, they will be cycles with cpu (and
> probably ddr) at lowest freq

On the one hand, sure, that makes sense in theory. On the other hand,
we won't know for sure until we try it, and I suspect an RT thread in
user space will be fast enough.

> BR,
> -R
>
> > >
> > > > >
> > > >
> > > > 8-< *snip* 8-<
> > > >
> > > > > >
> > > > > > Is it expected that WSI's will set their own deadlines, or should that
> > > > > > be the job of the compositor? For example by using compositors using
> > > > > > DMA_BUF_IOCTL_EXPORT_SYNC_FILE that you mentioned, using it to set a
> > > > > > deadline matching the vsync it most ideally will be committed to?
> > > > > >
> > > > >
> > > > > I'm kind of assuming compositors, but if the WSI somehow has more
> > > > > information about ideal presentation time, then I suppose it could be
> > > > > in the WSI? I'll defer to folks who spend more time on WSI and
> > > > > compositors to hash out the details ;-)
> > > >
> > > > With my compositor developer hat on, it might be best to let it be up to
> > > > the compositor, it's the one that knows if a client's content will
> > > > actually end up anywhere visible.
> > > >
> > >
> > > wfm
> > >
> > > BR,
> > > -R
> > >
> > > >
> > > > Jonas
> > > >
> > > > >
> > > > > BR,
> > > > > -R
> > >
> >
>


2023-03-17 16:00:22

by Rob Clark

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Fri, Mar 17, 2023 at 3:23 AM Jonas Ådahl <[email protected]> wrote:
>
> On Thu, Mar 16, 2023 at 09:28:55AM -0700, Rob Clark wrote:
> > On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl <[email protected]> wrote:
> > >
> > > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
> > > > >
> > > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > > From: Rob Clark <[email protected]>
> > > > > > > >
> > > > > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > > > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > > > > > the fence signaler in making power management decisions, like boosting
> > > > > > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > > > > > so that can be factored in to the frequency scaling.
> > > > > > > >
> > > > > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > > > > > implementation will need similar logic to track deadlines of all
> > > > > > > > the fences on the same timeline. [ckoenig]
> > > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > > v6: More docs
> > > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > > >
> > > > > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > > > > Reviewed-by: Christian König <[email protected]>
> > > > > > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > > > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > > > > > ---
> > > > > > >
> > > > > > > Hi Rob!
> > > > > > >
> > > > > > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > > > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > > > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > > > > > 3 files changed, 87 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > :doc: fence signalling annotation
> > > > > > > >
> > > > > > > > +DMA Fence Deadline Hints
> > > > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > +
> > > > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > + :doc: deadline hints
> > > > > > > > +
> > > > > > > > DMA Fences Functions Reference
> > > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > >
> > > > > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > > > > > }
> > > > > > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + * DOC: deadline hints
> > > > > > > > + *
> > > > > > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > > > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > > > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > > > > > + * power consumption. But in the real world there are many workloads which
> > > > > > > > + * defy this ideal. For example, but not limited to:
> > > > > > > > + *
> > > > > > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > > > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > > > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > > > > > + * result reduce frequency.
> > > > > > > > + *
> > > > > > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > > > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > > > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > > > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > > > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > > > > > + * needed.
> > > > > > >
> > > > > > > This is the use case I'd like to get some better understanding about how
> > > > > > > this series intends to work, as the problematic scheduling behavior
> > > > > > > triggered by missed deadlines has plagued compositing display servers
> > > > > > > for a long time.
> > > > > > >
> > > > > > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > > > > > developer, so I will need some hand holding when it comes to
> > > > > > > understanding exactly what piece of software is responsible for
> > > > > > > communicating what piece of information.
> > > > > > >
> > > > > > > > + *
> > > > > > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > > > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > > > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > > > > > + *
> > > > > > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > > > > > + * facing APIs). The time could either be some point in the future (such as
> > > > > > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > > > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > > > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > > > > > >
> > > > > > > Is it guaranteed that a GPU driver will use the actual start of the
> > > > > > > vblank as the effective deadline? I have some memories of seeing
> > > > > > > something about vblank evasion browsing driver code, which I might have
> > > > > > > misunderstood, but I have yet to find whether this is something
> > > > > > > userspace can actually expect to be something it can rely on.
> > > > > >
> > > > > > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > > > > > clear if we talk about them separately even if they happen to be the
> > > > > > same device.
> > > > >
> > > > > Sure, sorry about being unclear about that.
> > > > >
> > > > > >
> > > > > > Assuming that is what you mean, nothing strongly defines what the
> > > > > > deadline is. In practice there is probably some buffering in the
> > > > > > display controller. For ex, block based (including bandwidth
> > > > > > compressed) formats, you need to buffer up a row of blocks to
> > > > > > efficiently linearize for scanout. So you probably need to latch some
> > > > > > time before you start sending pixel data to the display. But details
> > > > > > like this are heavily implementation dependent. I think the most
> > > > > > reasonable thing to target is start of vblank.
> > > > >
> > > > > The driver exposing those details would be quite useful for userspace
> > > > > though, so that it can delay committing updates too late, but not too
> > > > > late. Setting a deadline to be the vblank seems easy enough, but it
> > > > > isn't enough for scheduling the actual commit.
> > > >
> > > > I'm not entirely sure how that would even work.. but OTOH I think you
> > > > are talking about something on the order of 100us? But that is a bit
> > > > of another topic.
> > >
> > > Yes, something like that. But yea, it's not really related. Scheduling
> > > commits closer to the deadline has more complex behavior than that too,
> > > e.g. the need for real time scheduling, and knowing how long it usually
> > > takes to create and commit and for the kernel to process.
> > >
> > > >
> > >
> > > 8-< *snip* 8-<
> > >
> > > > > >
> > > > > > You need a fence to set the deadline, and for that work needs to be
> > > > > > flushed. But you can't associate a deadline with work that the kernel
> > > > > > is unaware of anyways.
> > > > >
> > > > > That makes sense, but it might also be a bit inadequate to have it as the
> > > > > only way to tell the kernel it should speed things up. Even with the
> > > > > trick i915 does, with GNOME Shell, we still end up with the feedback
> > > > > loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> > > > > or dropping the first frame is so far the best work around that works,
> > > > > except doing other tricks that makes the kernel to ramp up its clock.
> > > > > Having to rely on choosing between latency and frame drops should
> > > > > ideally not have to be made.
> > > >
> > > > Before you have a fence, the thing you want to be speeding up is the
> > > > CPU, not the GPU. There are existing mechanisms for that.
> > >
> > > Is there no benefit to let the GPU know earlier that it should speed up,
> > > so that when the job queue arrives, it's already up to speed?
> >
> > Downstream we have input notifier that resumes the GPU so we can
> > pipeline the 1-2ms it takes to boot up the GPU with userspace. But we
> > wait to boost freq until we have cmdstream to submit, since that
> > doesn't take as long. What needs help initially after input is all
> > the stuff that happens on the CPU before the GPU can start to do
> > anything ;-)
>
> How do you deal with boosting CPU speeds downstream? Does the input
> notifier do that too?

Yes.. actually currently downstream (depending on device) we have 1 to
3 input notifiers, one for CPU boost, one for early-PSR-exit, and one
to get a head start on booting up the GPU.
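The arrangement described above, with one input event fanning out to independent hooks, can be sketched as a table of callbacks. This is a hypothetical illustration only: the hook names and the counting context are invented here and say nothing about how the downstream CrOS notifiers are actually registered or implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context recording what each hook did. */
struct input_boost_ctx {
	int cpu_boosts;
	int psr_early_exits;
	int gpu_early_wakes;
};

typedef void (*input_hook_fn)(struct input_boost_ctx *ctx);

/* Each hook is independent; none waits on or depends on another. */
static void hook_cpu_boost(struct input_boost_ctx *ctx)      { ctx->cpu_boosts++; }
static void hook_psr_early_exit(struct input_boost_ctx *ctx) { ctx->psr_early_exits++; }
static void hook_gpu_early_wake(struct input_boost_ctx *ctx) { ctx->gpu_early_wakes++; }

/* A device that only needs a CPU boost would list just the first hook. */
static const input_hook_fn input_hooks[] = {
	hook_cpu_boost,
	hook_psr_early_exit,
	hook_gpu_early_wake,
};

/* Run every registered hook for a single input event. */
static void on_input_event(struct input_boost_ctx *ctx)
{
	assert(ctx != NULL);
	for (size_t i = 0; i < sizeof(input_hooks) / sizeof(input_hooks[0]); i++)
		input_hooks[i](ctx);
}
```

Keeping the hooks separate lets each device enable only the subset it needs, which matches the "1 to 3 notifiers, depending on device" description.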

> >
> > Btw, I guess I haven't made this clear, dma-fence deadline is trying
> > to help the steady-state situation, rather than the input-latency
> > situation. It might take a frame or two of missed deadlines for
> > gpufreq to arrive at a good steady-state freq.
>
> I'm just not sure it will help. Missed deadlines set at commit haven't
> been enough in the past to let the kernel understand it should speed
> things up before the next frame (which will be a whole frame late
> without any triple buffering which should be a last resort), so I don't
> see how it will help by adding a userspace hook to do the same thing.

So a deadline is just a superset of "right now" and "sometime in the
future".. and this has been useful enough for i915 that they have both
forms: when waiting on the GPU via i915-specific ioctls, and when
pageflipping (assuming userspace isn't deferring the composition
decision and is instead just pushing it all down to the kernel). But
this breaks down in a few cases:

1) non-pageflip use cases (for ex. ping-ponging between cpu and gpu)
when you wait by polling on a fence fd, or via drm_syncobj, instead
of DRM_IOCTL_I915_GEM_WAIT
2) when userspace decides late in the frame not to pageflip because
the app fence isn't signaled yet

And this is all done in a way that doesn't help in situations where
you have separate kms and render devices, or where the kms driver
doesn't bypass the atomic helpers (ie. uses
drm_atomic_helper_wait_for_fences()). So the technique has already
proven to be useful. This series just extends it beyond driver-specific
primitives to generic ones (ie. dma_fence/drm_syncobj).
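From userspace, the two forms above boil down to picking an absolute CLOCK_MONOTONIC time: "now" for an immediate deadline, or the next vblank for a pageflip. A minimal sketch follows. It assumes last_vblank_ns marks the start of a past vblank (what a given driver reports in its vblank events may differ), and it assumes the struct sync_set_deadline / SYNC_IOC_SET_DEADLINE names from <linux/sync_file.h>; if the installed headers don't provide them, the wrapper simply compiles out. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined(__has_include)
#  if __has_include(<linux/sync_file.h>)
#    include <linux/sync_file.h>
#  endif
#endif

/* Current CLOCK_MONOTONIC time in ns; using it as the deadline means "now". */
static uint64_t mono_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Start of the first vblank strictly after now_ns, given the start of some
 * earlier vblank and the frame period (all CLOCK_MONOTONIC ns). Still works
 * if last_vblank_ns is several frames in the past.
 */
static uint64_t next_vblank_ns(uint64_t last_vblank_ns, uint64_t period_ns,
			       uint64_t now_ns)
{
	assert(period_ns > 0);
	if (now_ns < last_vblank_ns)
		return last_vblank_ns;
	uint64_t frames = (now_ns - last_vblank_ns) / period_ns + 1;
	return last_vblank_ns + frames * period_ns;
}

#ifdef SYNC_IOC_SET_DEADLINE
#include <sys/ioctl.h>

/* Hint the signaler of the fence behind sync_file_fd; pad is zeroed. */
static int set_fence_deadline(int sync_file_fd, uint64_t deadline_ns)
{
	struct sync_set_deadline arg = { .deadline_ns = deadline_ns };

	return ioctl(sync_file_fd, SYNC_IOC_SET_DEADLINE, &arg);
}
#endif
```

A compositor that decides late whether to flip could call set_fence_deadline() on a fence exported from the client buffer (e.g. via DMA_BUF_IOCTL_EXPORT_SYNC_FILE, mentioned earlier in the thread), passing either mono_now_ns() or next_vblank_ns(...).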

> I think input latency and steady state target frequency here are tightly
> linked; what we should aim for is to provide enough information at the
> right time so that it does *not* take a frame or two of missed
> deadlines to arrive at the target frequency, as those missed deadlines
> mean stuttering and/or lag.

If you have some magic way for a gl/vk driver to accurately predict
how many cycles it will take to execute a sequence of draws, I'm all
ears.

Realistically, the best solution on sudden input is to overshoot and
let freqs settle back down.
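"Overshoot and let freqs settle back down" could look roughly like the following toy policy: jump to the top frequency on input, then each sampling period rise immediately to whatever the normal governor wants but fall only halfway towards it. The struct and function names, and the halving decay, are hypothetical choices for illustration, not an existing kernel policy.

```c
#include <assert.h>

/* Hypothetical per-device frequency state, in MHz. */
struct boost_state {
	unsigned int freq;
	unsigned int max;
};

/* Sudden input: jump straight to the top OPP instead of predicting cost. */
static void boost_on_input(struct boost_state *s)
{
	s->freq = s->max;
}

/*
 * Each sampling period, given the frequency the normal governor would pick
 * on its own (target): rise to it immediately, but fall only halfway.
 */
static void boost_on_sample(struct boost_state *s, unsigned int target)
{
	assert(target <= s->max);
	if (s->freq <= target)
		s->freq = target;
	else
		s->freq = target + (s->freq - target) / 2;
}
```

Starting from 800 MHz with a 300 MHz target, the sequence is 550, 425, 362, ... and reaches 300 after a handful of samples, so an over-estimate costs a few periods of extra power rather than missed frames.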

But there is a lot more to input latency than GPU freq. In UI
workloads, even fullscreen animation, I don't really see the GPU going
above the 2nd lowest OPP, even on relatively small things like a618.
UI input latency (touch scrolling, on-screen stylus / low-latency-ink,
animations) is a separate issue from what this series addresses, and
doesn't have much to do with GPU freq.

> I do understand that it helps with the deliberately late commit; we
> don't do that yet, but intend to once there is kernel uapi that lets us
> do so without negative consequences.
>
> >
> > > >
> > > > TBF I'm of the belief that there is still a need for input based cpu
> > > > boost (and early wake-up trigger for GPU).. we have something like
> > > > this in CrOS kernel. That is a bit of a different topic, but my point
> > > > is that fence deadlines are just one of several things we need to
> > > > optimize power/perf and responsiveness, rather than the single thing
> > > > that solves every problem under the sun ;-)
> > >
> > > Perhaps; but I believe it's a bit of a back channel of intent; the piece
> > > of the puzzle that has the information to know whether there is need
> > > actually speed up is the compositor, not the kernel.
> > >
> > > For example, pressing 'p' while a terminal is focused does not need high
> > > frequency clocks, it just needs the terminal emulator to draw a 'p' and
> > > the compositor to composite that update. Pressing <Super> may however
> > > trigger a non-trivial animation moving a lot of stuff around on screen,
> > > maybe triggering Wayland clients to draw and what not, and should most
> > > arguably have the ability to "warn" the kernel about the upcoming flood
> > > of work before it is already knocking on its door step.
> >
> > The super key is problematic, but not for the reason you think. It is
> > because it is a case where we should boost on key-up instead of
> > key-down.. and the second key-up event comes after the cpu-boost is
> > already in it's cool-down period. But even if suboptimal in cases
> > like this, it is still useful for touch/stylus cases where the
> > slightest of lag is much more perceptible.
>
> Other keys are even more problematic. Alt, for example, does nothing,
> Alt + Tab does some light rendering, but Alt + KeyAboveTab will,
> depending on the current active applications, suddenly trigger N Wayland
> surfaces to start rendering at the same time.
>
> >
> > This is getting off topic but I kinda favor coming up with some sort
> > of static definition that userspace could give the kernel to let the
> > kernel know what input to boost on. Or maybe something could be done
> > with BPF?
>
> I have a hard time seeing how any static information can be enough; it
> depends too much on context what is expected to happen. And can a BPF
> program really help? Unless BPF programs pull some internal kernel
> strings to speed things up whenever userspace wants, I don't see how it
> is that much better.
>
> I don't think userspace is necessarily too slow to actively participate
> in providing direct scheduling hints either. Input processing can, for
> example, be offloaded to a real time scheduled thread, and any hints
> about future expectations from rendering, windowing and layout
> subsystems will be significantly easier to plumb to a real time input
> thread than translated into static information or BPF programs.

I mean, the kernel side input handler is called from irq context long
before even the scheduler gets involved..

But I think you are over-thinking the Alt + SomeOtherKey case. The
important thing isn't what the other key is, it is just to know that
Alt is a modifier key (ie. handle it on key-up instead of key-down).
No need to over-complicate things. It's probably enough to give the
kernel a list of modifier+key combos that do _something_..
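The simplest form of the static information suggested above is just a table of which key codes are modifiers: boost on release for those, on press for everything else. A hypothetical sketch of that classification (the table and function names are invented; the numeric key codes are the ones from linux/input-event-codes.h, and 0/1/2 is the evdev release/press/autorepeat convention). It deliberately ignores which combos actually do something, which a fuller version would add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subset of key codes, same values as linux/input-event-codes.h. */
enum {
	K_TAB = 15, K_P = 25, K_LEFTCTRL = 29, K_LEFTSHIFT = 42,
	K_RIGHTSHIFT = 54, K_LEFTALT = 56, K_RIGHTCTRL = 97,
	K_RIGHTALT = 100, K_LEFTMETA = 125, K_RIGHTMETA = 126,
};

/* The static table userspace would hand over (hypothetical format). */
static const unsigned int modifier_keys[] = {
	K_LEFTCTRL, K_RIGHTCTRL, K_LEFTSHIFT, K_RIGHTSHIFT,
	K_LEFTALT, K_RIGHTALT, K_LEFTMETA, K_RIGHTMETA,
};

static bool is_modifier(unsigned int code)
{
	for (size_t i = 0; i < sizeof(modifier_keys) / sizeof(modifier_keys[0]); i++)
		if (modifier_keys[i] == code)
			return true;
	return false;
}

/* value: 0 = release, 1 = press, 2 = autorepeat (evdev convention). */
static bool should_boost(unsigned int code, int value)
{
	assert(value >= 0 && value <= 2);
	if (value == 2)
		return false;		/* autorepeat never triggers a boost */
	return is_modifier(code) ? value == 0 : value == 1;
}
```

With this table, pressing Alt alone does nothing, pressing Tab while Alt is held boosts on the Tab press, and a lone Super press boosts on release, which is when an overview animation would start.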

And like I've said before, keyboard input is the least problematic in
terms of latency. It is a _lot_ easier to notice lag with touch
scrolling or stylus (on screen). (In the latter case, I think wayland
has some catching up to do compared to CrOS or android.. you really
need a way to allow the app to do front-buffer rendering to an overlay
for the stylus case, because even a 16ms delay is _very_
noticeable.)

BR,
-R

2023-03-21 13:26:00

by Jonas Ådahl

Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Fri, Mar 17, 2023 at 08:59:48AM -0700, Rob Clark wrote:
> On Fri, Mar 17, 2023 at 3:23 AM Jonas Ådahl <[email protected]> wrote:
> >
> > On Thu, Mar 16, 2023 at 09:28:55AM -0700, Rob Clark wrote:
> > > On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl <[email protected]> wrote:
> > > >
> > > > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
> > > > > >
> > > > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > > > From: Rob Clark <[email protected]>
> > > > > > > > >
> > > > > > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > > > > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > > > > > > the fence signaler in making power management decisions, like boosting
> > > > > > > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > > > > > > so that can be factored in to the frequency scaling.
> > > > > > > > >
> > > > > > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > > > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > > > > > > implementation will need similar logic to track deadlines of all
> > > > > > > > > the fences on the same timeline. [ckoenig]
> > > > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > > > v6: More docs
> > > > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > > > >
> > > > > > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > > > > > Reviewed-by: Christian König <[email protected]>
> > > > > > > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > > > > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > > > > > > ---
> > > > > > > >
> > > > > > > > Hi Rob!
> > > > > > > >
> > > > > > > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > > > > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > > > > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > > > > > > 3 files changed, 87 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > > :doc: fence signalling annotation
> > > > > > > > >
> > > > > > > > > +DMA Fence Deadline Hints
> > > > > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > > +
> > > > > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > > + :doc: deadline hints
> > > > > > > > > +
> > > > > > > > > DMA Fences Functions Reference
> > > > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > > > > > > }
> > > > > > > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > > > > >
> > > > > > > > > +/**
> > > > > > > > > + * DOC: deadline hints
> > > > > > > > > + *
> > > > > > > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > > > > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > > > > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > > > > > > + * power consumption. But in the real world there are many workloads which
> > > > > > > > > + * defy this ideal. For example, but not limited to:
> > > > > > > > > + *
> > > > > > > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > > > > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > > > > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > > > > > > + * result reduce frequency.
> > > > > > > > > + *
> > > > > > > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > > > > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > > > > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > > > > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > > > > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > > > > > > + * needed.
> > > > > > > >
> > > > > > > > This is the use case I'd like to get some better understanding about how
> > > > > > > > this series intends to work, as the problematic scheduling behavior
> > > > > > > > triggered by missed deadlines has plagued compositing display servers
> > > > > > > > for a long time.
> > > > > > > >
> > > > > > > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > > > > > > developer, so I will need some hand holding when it comes to
> > > > > > > > understanding exactly what piece of software is responsible for
> > > > > > > > communicating what piece of information.
> > > > > > > >
> > > > > > > > > + *
> > > > > > > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > > > > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > > > > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > > > > > > + *
> > > > > > > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > > > > > > + * facing APIs). The time could either be some point in the future (such as
> > > > > > > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > > > > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > > > > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > > > > > > >
> > > > > > > > Is it guaranteed that a GPU driver will use the actual start of the
> > > > > > > > vblank as the effective deadline? I have some memories of seeing
> > > > > > > > something about vblank evasion while browsing driver code, which I might have
> > > > > > > > misunderstood, but I have yet to find whether this is something
> > > > > > > > userspace can actually expect to be something it can rely on.
> > > > > > >
> > > > > > > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > > > > > > clear if we talk about them separately even if they happen to be the
> > > > > > > same device.
> > > > > >
> > > > > > Sure, sorry about being unclear about that.
> > > > > >
> > > > > > >
> > > > > > > Assuming that is what you mean, nothing strongly defines what the
> > > > > > > deadline is. In practice there is probably some buffering in the
> > > > > > > display controller. For ex, block based (including bandwidth
> > > > > > > compressed) formats, you need to buffer up a row of blocks to
> > > > > > > efficiently linearize for scanout. So you probably need to latch some
> > > > > > > time before you start sending pixel data to the display. But details
> > > > > > > like this are heavily implementation dependent. I think the most
> > > > > > > reasonable thing to target is start of vblank.
> > > > > >
> > > > > > The driver exposing those details would be quite useful for userspace
> > > > > > though, so that it can delay committing updates until late, but not too
> > > > > > late. Setting a deadline to be the vblank seems easy enough, but it
> > > > > > isn't enough for scheduling the actual commit.
> > > > >
> > > > > I'm not entirely sure how that would even work.. but OTOH I think you
> > > > > are talking about something on the order of 100us? But that is a bit
> > > > > of another topic.
> > > >
> > > > Yes, something like that. But yea, it's not really related. Scheduling
> > > > commits closer to the deadline has more complex behavior than that too,
> > > > e.g. the need for real time scheduling, and knowing how long it usually
> > > > takes to create and commit and for the kernel to process.
> > > >
> > > > >
> > > >
> > > > 8-< *snip* 8-<
> > > >
> > > > > > >
> > > > > > > You need a fence to set the deadline, and for that work needs to be
> > > > > > > flushed. But you can't associate a deadline with work that the kernel
> > > > > > > is unaware of anyways.
> > > > > >
> > > > > > That makes sense, but it might also be a bit inadequate to have it as the
> > > > > > only way to tell the kernel it should speed things up. Even with the
> > > > > > trick i915 does, with GNOME Shell, we still end up with the feedback
> > > > > > loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> > > > > > or dropping the first frame, is so far the best workaround that works,
> > > > > > other than tricks that make the kernel ramp up its clocks.
> > > > > > Ideally, we should not have to choose between latency and frame
> > > > > > drops.
> > > > >
> > > > > Before you have a fence, the thing you want to be speeding up is the
> > > > > CPU, not the GPU. There are existing mechanisms for that.
> > > >
> > > > Is there no benefit to let the GPU know earlier that it should speed up,
> > > > so that when the job queue arrives, it's already up to speed?
> > >
> > > Downstream we have an input notifier that resumes the GPU so we can
> > > pipeline the 1-2ms it takes to boot up the GPU with userspace. But we
> > > wait to boost freq until we have cmdstream to submit, since that
> > > doesn't take as long. What needs help initially after input is all
> > > the stuff that happens on the CPU before the GPU can start to do
> > > anything ;-)
> >
> > How do you deal with boosting CPU speeds downstream? Does the input
> > notifier do that too?
>
> Yes.. actually currently downstream (depending on device) we have 1 to
> 3 input notifiers, one for CPU boost, one for early-PSR-exit, and one
> to get a head start on booting up the GPU.

Would be really nice to upstream these, one way or the other, be it
actually input event based, or via some uapi to just poke the kernel. I
realize it's not related to this thread, so this is just me wishing
things into the void.

>
> > >
> > > Btw, I guess I haven't made this clear, dma-fence deadline is trying
> > > to help the steady-state situation, rather than the input-latency
> > > situation. It might take a frame or two of missed deadlines for
> > > gpufreq to arrive at a good steady-state freq.
> >
> > I'm just not sure it will help. Missed deadlines set at commit haven't
> > been enough in the past to let the kernel understand it should speed
> > things up before the next frame (which will be a whole frame late
> > without any triple buffering which should be a last resort), so I don't
> > see how it will help by adding a userspace hook to do the same thing.
>
> So deadline is just a superset of "right now" and "sometime in the
> future".. and this has been useful enough for i915 that they have both
> forms, when waiting on GPU via i915 specific ioctls and when pageflip
> (assuming userspace isn't deferring composition decision and instead
> just pushing it all down to the kernel). But this breaks down in a
> few cases:
>
> 1) non pageflip (for ex. ping-ponging between cpu and gpu) use cases
> when you wait via polling on fence fd or wait via drm_syncobj instead
> of DRM_IOCTL_I915_GEM_WAIT
> 2) when userspace decides late in frame to not pageflip because app
> fence isn't signaled yet

It breaks down in practice today, because we enter the low-freq
feedback loop that triple buffering currently works around.
That happens even with non-delayed page flipping and a single pipeline
source (compositor-only rendering), or when only using already-signaled
client buffers when compositing.

Anyway, I don't doubt its usefulness, just a bit pessimistic.
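The low-frequency feedback loop discussed here, which the patch's DOC comment also describes, can be reproduced with a toy model. This sketch rests on three assumptions. First, a simple utilization-threshold governor that raises frequency above 90% busy and lowers it below 60%. Second, a fixed per-frame GPU cost. Third, double buffering, where a frame that misses vblank waits for the next one. It does not model any real devfreq governor. `deadline_aware_step()` is a hypothetical illustration of factoring a missed deadline into the decision.

```c
#include <stdbool.h>

/* GPU busy time per frame in ms: Mcycles / MHz gives seconds. */
double busy_ms(double work_mcycles, double freq_mhz)
{
	return work_mcycles / freq_mhz * 1000.0;
}

/* Number of vblank periods a frame occupies with double buffering:
 * a frame not finished by the next vblank waits for the one after. */
int vblanks_used(double busy, double period_ms)
{
	int n = 1;
	while (busy > n * period_ms)
		n++;
	return n;
}

/* Fraction of the frame interval during which the GPU was busy. */
double utilization(double busy, double period_ms)
{
	return busy / (vblanks_used(busy, period_ms) * period_ms);
}

/* Plain utilization governor: +1 raise, -1 lower, 0 hold. */
int governor_step(double util)
{
	if (util > 0.90)
		return +1;
	if (util < 0.60)
		return -1;
	return 0;
}

/* Hypothetical deadline-aware variant: a missed deadline forces a raise. */
int deadline_aware_step(double util, bool missed_deadline)
{
	return missed_deadline ? +1 : governor_step(util);
}
```

At 550 MHz, a 10M-cycle frame takes about 18.2 ms and misses a 16.67 ms vblank. The GPU is then busy for only about 55% of the resulting 33.3 ms interval, so the plain governor lowers the frequency, which is the opposite of what is needed. At 700 MHz the same frame fits in one interval at about 86% busy.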

>
> And this is all done in a way that doesn't help for situations where
> you have separate kms and render devices. Or the kms driver doesn't
> bypass atomic helpers (ie. uses drm_atomic_helper_wait_for_fences()).
> So the technique has already proven to be useful. This series just
> extends it beyond driver specific primitives (ie.
> dma_fence/drm_syncobj)
>
> > I think input latency and steady state target frequency here are tightly
> > linked; what we should aim for is to provide enough information at the
> > right time so that it does *not* take a frame or two of missed
> > deadlines to arrive at the target frequency, as those missed deadlines
> > mean stuttering and/or lag.
>
> If you have some magic way for a gl/vk driver to accurately predict
> how many cycles it will take to execute a sequence of draws, I'm all
> ears.
>
> Realistically, the best solution on sudden input is to overshoot and
> let freqs settle back down.
>
> But there is a lot more to input latency than GPU freq. In UI
> workloads, even fullscreen animation, I don't really see the GPU going
> above the 2nd lowest OPP even on relatively small things like a618.
> UI input latency (touch scrolling, on-screen stylus / low-latency-ink,
> animations) are a separate issue from what this series addresses, and
> don't have much to do with GPU freq.
>
> > I do understand that it helps with the deliberately late commit. We
> > don't do that yet, but intend to once there is kernel uapi that lets us do
> > so without negative consequences.
> >
> > >
> > > > >
> > > > > TBF I'm of the belief that there is still a need for input based cpu
> > > > > boost (and early wake-up trigger for GPU).. we have something like
> > > > > this in CrOS kernel. That is a bit of a different topic, but my point
> > > > > is that fence deadlines are just one of several things we need to
> > > > > optimize power/perf and responsiveness, rather than the single thing
> > > > > that solves every problem under the sun ;-)
> > > >
> > > > Perhaps; but I believe it's a bit of a back channel of intent; the piece
> > > > of the puzzle that has the information to know whether there is a need to
> > > > actually speed up is the compositor, not the kernel.
> > > >
> > > > For example, pressing 'p' while a terminal is focused does not need high
> > > > frequency clocks, it just needs the terminal emulator to draw a 'p' and
> > > > the compositor to composite that update. Pressing <Super> may however
> > > > trigger a non-trivial animation moving a lot of stuff around on screen,
> > > > maybe triggering Wayland clients to draw and what not, and should
> > > > arguably have the ability to "warn" the kernel about the upcoming flood
> > > > of work before it is already knocking on its doorstep.
> > >
> > > The super key is problematic, but not for the reason you think. It is
> > > because it is a case where we should boost on key-up instead of
> > > key-down.. and the second key-up event comes after the cpu-boost is
> > > already in its cool-down period. But even if suboptimal in cases
> > > like this, it is still useful for touch/stylus cases where the
> > > slightest of lag is much more perceptible.
> >
> > Other keys are even more problematic. Alt, for example, does nothing,
> > Alt + Tab does some light rendering, but Alt + KeyAboveTab will,
> > depending on the current active applications, suddenly trigger N Wayland
> > surfaces to start rendering at the same time.
> >
> > >
> > > This is getting off topic but I kinda favor coming up with some sort
> > > of static definition that userspace could give the kernel to let the
> > > kernel know what input to boost on. Or maybe something could be done
> > > with BPF?
> >
> > I have a hard time seeing how any static information can be enough; what
> > is expected to happen depends too much on context. And can a BPF
> > program really help? Unless BPF programs can pull some internal kernel
> > strings to speed things up whenever userspace wants, I don't see how it
> > is that much better.
> >
> > I don't think userspace is necessarily too slow to actively participate
> > in providing direct scheduling hints either. Input processing can, for
> > example, be offloaded to a real-time scheduled thread, and any hints
> > about future expectations from rendering, windowing and layout
> > subsystems will be significantly easier to plumb to a real-time input
> > thread than to translate into static information or BPF programs.
>
> I mean, the kernel side input handler is called from irq context long
> before even the scheduler gets involved..
>
> But I think you are over-thinking the Alt + SomeOtherKey case. The
> important thing isn't what the other key is, it is just to know that
> Alt is a modifier key (ie. handle it on key-up instead of key-down).
> No need to over-complicate things. It's probably enough to give the
> kernel a list of modifier+key combos that do _something_..

Perhaps I'm overthinking it; it just seems so unnecessary to
complicate the kernel so that it can predict when GUI animations
will happen, instead of the GUI itself signaling it when it is actually
beneficial. All it would take (naively) is uapi for the three kinds of
boosts that downstream now does automatically from input events.

>
> And like I've said before, keyboard input is the least problematic in
> terms of latency. It is a _lot_ easier to notice lag with touch
> scrolling or stylus (on screen). (In the latter case, I think Wayland
> has some catching up to do compared to CrOS or Android.. you really
> need a way to allow the app to do front buffer rendering to an overlay
> for the stylus case, because even a 16ms delay is _very_
> noticeable.)

Sure, but here too userspace (an rt thread in the compositor) is probably a
good enough place to predict when to boost, since it is the one that
proxies e.g. the stylus input events to the application.

Front buffering on the other hand is a very different topic ;)


Jonas

>
> BR,
> -R

2023-03-21 14:35:01

by Rob Clark

[permalink] [raw]
Subject: Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

On Tue, Mar 21, 2023 at 6:24 AM Jonas Ådahl <[email protected]> wrote:
>
> On Fri, Mar 17, 2023 at 08:59:48AM -0700, Rob Clark wrote:
> > On Fri, Mar 17, 2023 at 3:23 AM Jonas Ådahl <[email protected]> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 09:28:55AM -0700, Rob Clark wrote:
> > > > On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl <[email protected]> wrote:
> > > > >
> > > > > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > > > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl <[email protected]> wrote:
> > > > > > >
> > > > > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > > > > From: Rob Clark <[email protected]>
> > > > > > > > > >
> > > > > > > > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > > > > > > > vblank, which the fence waiter would prefer not to miss. This is to aid
> > > > > > > > > > the fence signaler in making power management decisions, like boosting
> > > > > > > > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > > > > > > > so that can be factored in to the frequency scaling.
> > > > > > > > > >
> > > > > > > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > > > > > > deadlines, to avoid increasing dma_fence size. The fence-context
> > > > > > > > > > implementation will need similar logic to track deadlines of all
> > > > > > > > > > the fences on the same timeline. [ckoenig]
> > > > > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > > > > v6: More docs
> > > > > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > > > > > > Reviewed-by: Christian König <[email protected]>
> > > > > > > > > > Acked-by: Pekka Paalanen <[email protected]>
> > > > > > > > > > Reviewed-by: Bagas Sanjaya <[email protected]>
> > > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > Hi Rob!
> > > > > > > > >
> > > > > > > > > > Documentation/driver-api/dma-buf.rst | 6 +++
> > > > > > > > > > drivers/dma-buf/dma-fence.c | 59 ++++++++++++++++++++++++++++
> > > > > > > > > > include/linux/dma-fence.h | 22 +++++++++++
> > > > > > > > > > 3 files changed, 87 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> > > > > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > > > > .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > > > :doc: fence signalling annotation
> > > > > > > > > >
> > > > > > > > > > +DMA Fence Deadline Hints
> > > > > > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > > > +
> > > > > > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > > > + :doc: deadline hints
> > > > > > > > > > +
> > > > > > > > > > DMA Fences Functions Reference
> > > > > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > > > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> > > > > > > > > > }
> > > > > > > > > > EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > > > > > >
> > > > > > > > > > +/**
> > > > > > > > > > + * DOC: deadline hints
> > > > > > > > > > + *
> > > > > > > > > > + * In an ideal world, it would be possible to pipeline a workload sufficiently
> > > > > > > > > > + * that a utilization based device frequency governor could arrive at a minimum
> > > > > > > > > > + * frequency that meets the requirements of the use-case, in order to minimize
> > > > > > > > > > + * power consumption. But in the real world there are many workloads which
> > > > > > > > > > + * defy this ideal. For example, but not limited to:
> > > > > > > > > > + *
> > > > > > > > > > + * * Workloads that ping-pong between device and CPU, with alternating periods
> > > > > > > > > > + * of CPU waiting for device, and device waiting on CPU. This can result in
> > > > > > > > > > + * devfreq and cpufreq seeing idle time in their respective domains and in
> > > > > > > > > > + * result reduce frequency.
> > > > > > > > > > + *
> > > > > > > > > > + * * Workloads that interact with a periodic time based deadline, such as double
> > > > > > > > > > + * buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
> > > > > > > > > > + * missing a vblank deadline results in an *increase* in idle time on the GPU
> > > > > > > > > > + * (since it has to wait an additional vblank period), sending a signal to
> > > > > > > > > > + * the GPU's devfreq to reduce frequency, when in fact the opposite is what is
> > > > > > > > > > + * needed.
> > > > > > > > >
> > > > > > > > > This is the use case I'd like to get some better understanding about how
> > > > > > > > > this series intends to work, as the problematic scheduling behavior
> > > > > > > > > triggered by missed deadlines has plagued compositing display servers
> > > > > > > > > for a long time.
> > > > > > > > >
> > > > > > > > > I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> > > > > > > > > developer, so I will need some hand holding when it comes to
> > > > > > > > > understanding exactly what piece of software is responsible for
> > > > > > > > > communicating what piece of information.
> > > > > > > > >
> > > > > > > > > > + *
> > > > > > > > > > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > > > > > > > > > + * The deadline hint provides a way for the waiting driver, or userspace, to
> > > > > > > > > > + * convey an appropriate sense of urgency to the signaling driver.
> > > > > > > > > > + *
> > > > > > > > > > + * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
> > > > > > > > > > + * facing APIs). The time could either be some point in the future (such as
> > > > > > > > > > + * the vblank based deadline for page-flipping, or the start of a compositor's
> > > > > > > > > > + * composition cycle), or the current time to indicate an immediate deadline
> > > > > > > > > > + * hint (Ie. forward progress cannot be made until this fence is signaled).
> > > > > > > > >
> > > > > > > > > Is it guaranteed that a GPU driver will use the actual start of the
> > > > > > > > > vblank as the effective deadline? I have some memories of seeing
> > > > > > > > > something about vblank evasion while browsing driver code, which I might have
> > > > > > > > > misunderstood, but I have yet to find whether this is something
> > > > > > > > > userspace can actually expect to be something it can rely on.
> > > > > > > >
> > > > > > > > I guess you mean s/GPU driver/display driver/ ? It makes things more
> > > > > > > > clear if we talk about them separately even if they happen to be the
> > > > > > > > same device.
> > > > > > >
> > > > > > > Sure, sorry about being unclear about that.
> > > > > > >
> > > > > > > >
> > > > > > > > Assuming that is what you mean, nothing strongly defines what the
> > > > > > > > deadline is. In practice there is probably some buffering in the
> > > > > > > > display controller. For ex, block based (including bandwidth
> > > > > > > > compressed) formats, you need to buffer up a row of blocks to
> > > > > > > > efficiently linearize for scanout. So you probably need to latch some
> > > > > > > > time before you start sending pixel data to the display. But details
> > > > > > > > like this are heavily implementation dependent. I think the most
> > > > > > > > reasonable thing to target is start of vblank.
> > > > > > >
> > > > > > > The driver exposing those details would be quite useful for userspace
> > > > > > > though, so that it can delay committing updates until late, but not too
> > > > > > > late. Setting a deadline to be the vblank seems easy enough, but it
> > > > > > > isn't enough for scheduling the actual commit.
> > > > > >
> > > > > > I'm not entirely sure how that would even work.. but OTOH I think you
> > > > > > are talking about something on the order of 100us? But that is a bit
> > > > > > of another topic.
> > > > >
> > > > > Yes, something like that. But yea, it's not really related. Scheduling
> > > > > commits closer to the deadline has more complex behavior than that too,
> > > > > e.g. the need for real time scheduling, and knowing how long it usually
> > > > > takes to create and commit and for the kernel to process.
> > > > >
> > > > > >
> > > > >
> > > > > 8-< *snip* 8-<
> > > > >
> > > > > > > >
> > > > > > > > You need a fence to set the deadline, and for that work needs to be
> > > > > > > > flushed. But you can't associate a deadline with work that the kernel
> > > > > > > > is unaware of anyways.
> > > > > > >
> > > > > > > That makes sense, but it might also be a bit inadequate to have it as the
> > > > > > > only way to tell the kernel it should speed things up. Even with the
> > > > > > > trick i915 does, with GNOME Shell, we still end up with the feedback
> > > > > > > loop this series aims to mitigate. Doing triple buffering, i.e. delaying
> > > > > > > or dropping the first frame, is so far the best workaround that works,
> > > > > > > other than tricks that make the kernel ramp up its clocks.
> > > > > > > Ideally, we should not have to choose between latency and frame
> > > > > > > drops.
> > > > > >
> > > > > > Before you have a fence, the thing you want to be speeding up is the
> > > > > > CPU, not the GPU. There are existing mechanisms for that.
> > > > >
> > > > > Is there no benefit to let the GPU know earlier that it should speed up,
> > > > > so that when the job queue arrives, it's already up to speed?
> > > >
> > > > Downstream we have an input notifier that resumes the GPU so we can
> > > > pipeline the 1-2ms it takes to boot up the GPU with userspace. But we
> > > > wait to boost freq until we have cmdstream to submit, since that
> > > > doesn't take as long. What needs help initially after input is all
> > > > the stuff that happens on the CPU before the GPU can start to do
> > > > anything ;-)
> > >
> > > How do you deal with boosting CPU speeds downstream? Does the input
> > > notifier do that too?
> >
> > Yes.. actually currently downstream (depending on device) we have 1 to
> > 3 input notifiers, one for CPU boost, one for early-PSR-exit, and one
> > to get a head start on booting up the GPU.
>
> Would be really nice to upstream these, one way or the other, be it
> actually input event based, or via some uapi to just poke the kernel. I
> realize it's not related to this thread, so this is just me wishing
> things into the void.

There was a drm/input_helper proposed maybe a year or so back, mainly
for the early-PSR-exit, but I was planning to build on that for early
GPU wake-up. I guess we should revisit it. It might not be the right
place for CPU boost, but it solves some problems, so it's a start.

As far as uapi goes, I think sysfs already gives you everything, or at
least most of what you need. For ex,
/sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq .. on the GPU
side, for drivers using devfreq (e.g. panfrost/msm/etc) there is a
similar sysfs interface. I'm not sure what sort of knobs are available on
Intel/AMD.

BR,
-R

> >
> > > >
> > > > Btw, I guess I haven't made this clear, dma-fence deadline is trying
> > > > to help the steady-state situation, rather than the input-latency
> > > > situation. It might take a frame or two of missed deadlines for
> > > > gpufreq to arrive at a good steady-state freq.
> > >
> > > I'm just not sure it will help. Missed deadlines set at commit haven't
> > > been enough in the past to let the kernel understand it should speed
> > > things up before the next frame (which will be a whole frame late
> > > without any triple buffering which should be a last resort), so I don't
> > > see how it will help by adding a userspace hook to do the same thing.
> >
> > So deadline is just a superset of "right now" and "sometime in the
> > future".. and this has been useful enough for i915 that they have both
> > forms, when waiting on GPU via i915 specific ioctls and when pageflip
> > (assuming userspace isn't deferring composition decision and instead
> > just pushing it all down to the kernel). But this breaks down in a
> > few cases:
> >
> > 1) non pageflip (for ex. ping-ponging between cpu and gpu) use cases
> > when you wait via polling on fence fd or wait via drm_syncobj instead
> > of DRM_IOCTL_I915_GEM_WAIT
> > 2) when userspace decides late in frame to not pageflip because app
> > fence isn't signaled yet
>
> It breaks down in practice today, because we enter the low-freq
> feedback loop that triple buffering currently works around.
> That happens even with non-delayed page flipping and a single pipeline
> source (compositor-only rendering), or when only using already-signaled
> client buffers when compositing.
>
> Anyway, I don't doubt its usefulness, just a bit pessimistic.
>
> >
> > And this is all done in a way that doesn't help for situations where
> > you have separate kms and render devices. Or the kms driver doesn't
> > bypass atomic helpers (ie. uses drm_atomic_helper_wait_for_fences()).
> > So the technique has already proven to be useful. This series just
> > extends it beyond driver specific primitives (ie.
> > dma_fence/drm_syncobj)
> >
> > > I think input latency and steady state target frequency here is tightly
> > > linked; what we should aim for is to provide enough information at the
> > > right time so that it does *not* take a frame or two of missed
> > > deadlines to arrive at the target frequency, as those missed deadlines
> > > mean stuttering and/or lag.
> >
> > If you have some magic way for a gl/vk driver to accurately predict
> > how many cycles it will take to execute a sequence of draws, I'm all
> > ears.
> >
> > Realistically, the best solution on sudden input is to overshoot and
> > let freqs settle back down.
> >
> > But there is a lot more to input latency than GPU freq. In UI
> > workloads, even fullscreen animation, I don't really see the GPU going
> > above the 2nd lowest OPP even on relatively small things like a618.
> > UI input latency (touch scrolling, on-screen stylus / low-latency-ink,
> > animations) are a separate issue from what this series addresses, and
> > don't have much to do with GPU freq.
> >
> > > That it helps with the deliberately late commit I do understand, but we
> > > don't do that yet, but intend to when there is kernel uapi that lets us do
> > > so without negative consequences.
> > >
> > > >
> > > > > >
> > > > > > TBF I'm of the belief that there is still a need for input based cpu
> > > > > > boost (and early wake-up trigger for GPU).. we have something like
> > > > > > this in CrOS kernel. That is a bit of a different topic, but my point
> > > > > > is that fence deadlines are just one of several things we need to
> > > > > > optimize power/perf and responsiveness, rather than the single thing
> > > > > > that solves every problem under the sun ;-)
> > > > >
> > > > > Perhaps; but I believe it's a bit of a back channel of intent; the piece
> > > > > of the puzzle that has the information to know whether there is a need
> > > > > to actually speed up is the compositor, not the kernel.
> > > > >
> > > > > For example, pressing 'p' while a terminal is focused does not need high
> > > > > frequency clocks, it just needs the terminal emulator to draw a 'p' and
> > > > > the compositor to composite that update. Pressing <Super> may however
> > > > > trigger a non-trivial animation moving a lot of stuff around on screen,
> > > > > maybe triggering Wayland clients to draw and what not, and should
> > > > > arguably have the ability to "warn" the kernel about the upcoming flood
> > > > > of work before it is already knocking on its door step.
> > > >
> > > > The super key is problematic, but not for the reason you think. It is
> > > > because it is a case where we should boost on key-up instead of
> > > > key-down.. and the second key-up event comes after the cpu-boost is
> > > > already in its cool-down period. But even if suboptimal in cases
> > > > like this, it is still useful for touch/stylus cases where the
> > > > slightest of lag is much more perceptible.
> > >
> > > Other keys are even more problematic. Alt, for example, does nothing,
> > > Alt + Tab does some light rendering, but Alt + KeyAboveTab will,
> > > depending on the current active applications, suddenly trigger N Wayland
> > > surfaces to start rendering at the same time.
> > >
> > > >
> > > > This is getting off topic but I kinda favor coming up with some sort
> > > > of static definition that userspace could give the kernel to let the
> > > > kernel know what input to boost on. Or maybe something could be done
> > > > with BPF?
> > >
> > > I have a hard time seeing how any static information can be enough; it
> > > depends too much on context what is expected to happen. And can a BPF
> > > program really help? Unless BPF programs can pull some internal kernel
> > > strings to speed things up whenever userspace wants, I don't see how it
> > > is that much better.
> > >
> > > I don't think userspace is necessarily too slow to actively participate
> > > in providing direct scheduling hints either. Input processing can, for
> > > example, be offloaded to a real-time scheduled thread, and hints about
> > > future expectations from the rendering, windowing and layout
> > > subsystems will be significantly easier to plumb to a real-time input
> > > thread than to translate into static information or BPF programs.
> >
> > I mean, the kernel side input handler is called from irq context long
> > before even the scheduler gets involved..
> >
> > But I think you are over-thinking the Alt + SomeOtherKey case. The
> > important thing isn't what the other key is, it is just to know that
> > Alt is a modifier key (ie. handle it on key-up instead of key-down).
> > No need to over-complicate things. It's probably enough to give the
> > kernel a list of modifier+key combos that do _something_..
>
> Perhaps I'm over thinking it, it just seems all so unnecessary to
> complicate the kernel so that it's able to predict when GUI animations
> will happen instead of the GUI itself doing it when it is actually
> beneficial. All it'd take (naively) is uapi for the three kinds of boosts
> that downstream now applies automatically from input events.
>
> >
> > And like I've said before, keyboard input is the least problematic in
> > terms of latency. It is a _lot_ easier to notice lag with touch
> > scrolling or stylus (on screen). (The latter case, I think wayland
> > has some catching up to do compared to CrOS or android.. you really
> > need a way to allow the app to do front buffer rendering to an overlay
> > for the stylus case, because even just 16ms delay is _very_
> > noticeable.)
>
> Sure, but here too userspace (rt thread in the compositor) is probably a
> good enough place to predict when to boost since it will be the one
> proxying e.g. the stylus input events to the application.
>
> Front buffering on the other hand is a very different topic ;)
>
>
> Jonas
>
> >
> > BR,
> > -R
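Rob's point above, that a deadline generalizes both "right now" (an i915-style wait boost) and "sometime in the future", can be illustrated with a small model. The sketch below is a simplification with hypothetical function names, not the kernel's vblank helper: it assumes the caller knows the start time of the last vblank and the frame period, both in CLOCK_MONOTONIC nanoseconds, and it still yields a future vblank when the last recorded one is several frames old.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not drm_crtc_next_vblank_start(): return the start
 * of the first vblank that begins strictly after now_ns, given the last
 * observed vblank start and the frame period (CLOCK_MONOTONIC ns).
 */
static int64_t next_vblank_start_ns(int64_t last_vblank_ns,
				    int64_t period_ns, int64_t now_ns)
{
	assert(period_ns > 0);
	if (now_ns < last_vblank_ns)
		return last_vblank_ns;
	/* Whole frames elapsed since the last vblank, plus one. */
	int64_t frames = (now_ns - last_vblank_ns) / period_ns + 1;
	return last_vblank_ns + frames * period_ns;
}

/*
 * "Right now" (wait boost) and "next vblank" are both just deadlines;
 * only the value handed to the fence signaller differs.
 */
static int64_t pick_deadline_ns(int boost_now, int64_t last_vblank_ns,
				int64_t period_ns, int64_t now_ns)
{
	if (boost_now)
		return now_ns;
	return next_vblank_start_ns(last_vblank_ns, period_ns, now_ns);
}
```

For example, with a 60 Hz period of 16666667 ns and a last vblank at 1000 ns, a query at 50001006 ns returns 66667668 ns, four frames after the recorded vblank.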

2023-03-27 19:13:22

by Matt Turner

Subject: Re: [PATCH v10 00/15] dma-fence: Deadline awareness

On Wed, Mar 8, 2023 at 10:53 AM Rob Clark <[email protected]> wrote:
>
> From: Rob Clark <[email protected]>
>
> This series adds a deadline hint to fences, so realtime deadlines
> such as vblank can be communicated to the fence signaller for power/
> frequency management decisions.
>
> This is partially inspired by a trick i915 does, but implemented
> via dma-fence for a couple of reasons:
>
> 1) To continue to be able to use the atomic helpers
> 2) To support cases where display and gpu are different drivers
>
> This iteration adds a dma-fence ioctl to set a deadline (both to
> support igt-tests, and compositors which delay decisions about which
> client buffer to display), and a sw_sync ioctl to read back the
> deadline. IGT tests utilizing these can be found at:


I read through the series and didn't spot anything. Have a rather weak

Reviewed-by: Matt Turner <[email protected]>

Thanks!
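The cover letter quoted above says the UABI takes deadlines as absolute CLOCK_MONOTONIC values expressed as u64 nanoseconds. A minimal sketch of how userspace might compute such a value before handing it to the new SET_DEADLINE ioctl follows; the helper names are hypothetical, and the ioctl call itself is deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* Convert a normalized CLOCK_MONOTONIC timespec to u64 nanoseconds. */
static uint64_t timespec_to_ns(const struct timespec *ts)
{
	assert(ts->tv_sec >= 0 && ts->tv_nsec >= 0 &&
	       ts->tv_nsec < (long)NSEC_PER_SEC);
	return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/*
 * Hypothetical helper: absolute deadline delta_ns from now on
 * CLOCK_MONOTONIC. Returns 0 on success, -1 if the clock read fails.
 */
static int deadline_from_now(uint64_t delta_ns, uint64_t *deadline_ns)
{
	struct timespec now;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	*deadline_ns = timespec_to_ns(&now) + delta_ns;
	return 0;
}
```

Using an absolute time rather than a relative timeout means the hint stays correct even if the caller is delayed between computing it and issuing the ioctl.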

2023-03-28 14:05:04

by Tvrtko Ursulin

Subject: Re: [PATCH v10 07/15] dma-buf/sw_sync: Add fence deadline support


On 08/03/2023 15:52, Rob Clark wrote:
> From: Rob Clark <[email protected]>
>
> This consists of simply storing the most recent deadline, and adding an
> ioctl to retrieve the deadline. This can be used in conjunction with
> the SET_DEADLINE ioctl on a fence fd for testing. Ie. create various
> sw_sync fences, merge them into a fence-array, set deadline on the
> fence-array and confirm that it is propagated properly to each fence.
>
> v2: Switch UABI to express deadline as u64
> v3: More verbose UAPI docs, show how to convert from timespec
> v4: Better comments, track the soonest deadline, as a normal fence
> implementation would, return an error if no deadline set.
>
> Signed-off-by: Rob Clark <[email protected]>
> Reviewed-by: Christian König <[email protected]>
> Acked-by: Pekka Paalanen <[email protected]>
> ---
> drivers/dma-buf/sw_sync.c | 81 ++++++++++++++++++++++++++++++++++++
> drivers/dma-buf/sync_debug.h | 2 +
> 2 files changed, 83 insertions(+)
>
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index 348b3a9170fa..f53071bca3af 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -52,12 +52,33 @@ struct sw_sync_create_fence_data {
> __s32 fence; /* fd of new fence */
> };
>
> +/**
> + * struct sw_sync_get_deadline - get the deadline hint of a sw_sync fence
> + * @deadline_ns: absolute time of the deadline
> + * @pad: must be zero
> + * @fence_fd: the sw_sync fence fd (in)
> + *
> + * Return the earliest deadline set on the fence. The timebase for the
> + * deadline is CLOCK_MONOTONIC (same as vblank). If there is no deadline

Mentioning vblank reads odd since this is drivers/dma-buf/. Dunno.

> + * set on the fence, this ioctl will return -ENOENT.
> + */
> +struct sw_sync_get_deadline {
> + __u64 deadline_ns;
> + __u32 pad;
> + __s32 fence_fd;
> +};
> +
> #define SW_SYNC_IOC_MAGIC 'W'
>
> #define SW_SYNC_IOC_CREATE_FENCE _IOWR(SW_SYNC_IOC_MAGIC, 0,\
> struct sw_sync_create_fence_data)
>
> #define SW_SYNC_IOC_INC _IOW(SW_SYNC_IOC_MAGIC, 1, __u32)
> +#define SW_SYNC_GET_DEADLINE _IOWR(SW_SYNC_IOC_MAGIC, 2, \
> + struct sw_sync_get_deadline)
> +
> +
> +#define SW_SYNC_HAS_DEADLINE_BIT DMA_FENCE_FLAG_USER_BITS
>
> static const struct dma_fence_ops timeline_fence_ops;
>
> @@ -171,6 +192,22 @@ static void timeline_fence_timeline_value_str(struct dma_fence *fence,
> snprintf(str, size, "%d", parent->value);
> }
>
> +static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> +{
> + struct sync_pt *pt = dma_fence_to_sync_pt(fence);
> + unsigned long flags;
> +
> + spin_lock_irqsave(fence->lock, flags);
> + if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
> + if (ktime_before(deadline, pt->deadline))
> + pt->deadline = deadline;
> + } else {
> + pt->deadline = deadline;
> + set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);

FWIW could use __set_bit to avoid needless atomic under spinlock.

> + }
> + spin_unlock_irqrestore(fence->lock, flags);
> +}
> +
> static const struct dma_fence_ops timeline_fence_ops = {
> .get_driver_name = timeline_fence_get_driver_name,
> .get_timeline_name = timeline_fence_get_timeline_name,
> @@ -179,6 +216,7 @@ static const struct dma_fence_ops timeline_fence_ops = {
> .release = timeline_fence_release,
> .fence_value_str = timeline_fence_value_str,
> .timeline_value_str = timeline_fence_timeline_value_str,
> + .set_deadline = timeline_fence_set_deadline,
> };
>
> /**
> @@ -387,6 +425,46 @@ static long sw_sync_ioctl_inc(struct sync_timeline *obj, unsigned long arg)
> return 0;
> }
>
> +static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long arg)
> +{
> + struct sw_sync_get_deadline data;
> + struct dma_fence *fence;
> + struct sync_pt *pt;
> + int ret = 0;
> +
> + if (copy_from_user(&data, (void __user *)arg, sizeof(data)))
> + return -EFAULT;
> +
> + if (data.deadline_ns || data.pad)
> + return -EINVAL;
> +
> + fence = sync_file_get_fence(data.fence_fd);
> + if (!fence)
> + return -EINVAL;
> +
> + pt = dma_fence_to_sync_pt(fence);
> + if (!pt)
> + return -EINVAL;
> +
> + spin_lock(fence->lock);

This may need to be _irq.

> + if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
> + data.deadline_ns = ktime_to_ns(pt->deadline);
> + } else {
> + ret = -ENOENT;
> + }
> + spin_unlock(fence->lock);
> +
> + dma_fence_put(fence);
> +
> + if (ret)
> + return ret;
> +
> + if (copy_to_user((void __user *)arg, &data, sizeof(data)))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> static long sw_sync_ioctl(struct file *file, unsigned int cmd,
> unsigned long arg)
> {
> @@ -399,6 +477,9 @@ static long sw_sync_ioctl(struct file *file, unsigned int cmd,
> case SW_SYNC_IOC_INC:
> return sw_sync_ioctl_inc(obj, arg);
>
> + case SW_SYNC_GET_DEADLINE:
> + return sw_sync_ioctl_get_deadline(obj, arg);
> +
> default:
> return -ENOTTY;
> }
> diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
> index 6176e52ba2d7..a1bdd62efccd 100644
> --- a/drivers/dma-buf/sync_debug.h
> +++ b/drivers/dma-buf/sync_debug.h
> @@ -55,11 +55,13 @@ static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
> * @base: base fence object
> * @link: link on the sync timeline's list
> * @node: node in the sync timeline's tree
> + * @deadline: the earliest fence deadline hint
> */
> struct sync_pt {
> struct dma_fence base;
> struct list_head link;
> struct rb_node node;
> + ktime_t deadline;
> };
>
> extern const struct file_operations sw_sync_debugfs_fops;

Regards,

Tvrtko
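The sw_sync semantics under review (keep only the soonest deadline, and report -ENOENT when none was ever set) can be modelled in plain userspace C. This is a hypothetical model with made-up names, not the driver code: a bool stands in for SW_SYNC_HAS_DEADLINE_BIT, and the locking the kernel performs under fence->lock is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-sync_pt state the patch adds. */
struct model_pt {
	bool has_deadline;	/* plays the role of SW_SYNC_HAS_DEADLINE_BIT */
	int64_t deadline_ns;	/* plays the role of pt->deadline */
};

/*
 * Like timeline_fence_set_deadline(): the first deadline is stored,
 * later ones only replace it if they are sooner. In the kernel this
 * runs under fence->lock, which is why a non-atomic __set_bit() would
 * be enough there.
 */
static void model_set_deadline(struct model_pt *pt, int64_t deadline_ns)
{
	if (!pt->has_deadline || deadline_ns < pt->deadline_ns) {
		pt->deadline_ns = deadline_ns;
		pt->has_deadline = true;
	}
}

/* Like sw_sync_ioctl_get_deadline(): -ENOENT if no deadline was set. */
static int model_get_deadline(const struct model_pt *pt, int64_t *out_ns)
{
	assert(pt != NULL && out_ns != NULL);
	if (!pt->has_deadline)
		return -ENOENT;
	*out_ns = pt->deadline_ns;
	return 0;
}
```

Setting 200, then 100, then 300 leaves 100 as the reported deadline, which is the "soonest deadline wins" behaviour the v4 changelog describes.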

2023-03-28 14:26:09

by Tvrtko Ursulin

Subject: Re: [PATCH v10 09/15] drm/syncobj: Add deadline support for syncobj waits


On 08/03/2023 15:53, Rob Clark wrote:
> From: Rob Clark <[email protected]>
>
> Add a new flag to let userspace provide a deadline as a hint for syncobj
> and timeline waits. This gives a hint to the driver signaling the
> backing fences about how soon userspace needs it to complete work, so it
> can addjust GPU frequency accordingly. An immediate deadline can be

adjust

> given to provide something equivalent to i915 "wait boost".
>
> v2: Use absolute u64 ns value for deadline hint, drop cap and driver
> feature flag in favor of allowing count_handles==0 as a way for
> userspace to probe kernel for support of new flag
> v3: More verbose comments about UAPI
>
> Signed-off-by: Rob Clark <[email protected]>
> ---
> drivers/gpu/drm/drm_syncobj.c | 64 ++++++++++++++++++++++++++++-------
> include/uapi/drm/drm.h | 17 ++++++++++
> 2 files changed, 68 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 0c2be8360525..a85e9464f07b 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -126,6 +126,11 @@
> * synchronize between the two.
> * This requirement is inherited from the Vulkan fence API.
> *
> + * If &DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE is set, the ioctl will also set
> + * a fence deadline hint on the backing fences before waiting, to provide the
> + * fence signaler with an appropriate sense of urgency. The deadline is
> + * specified as an absolute &CLOCK_MONOTONIC value in units of ns.
> + *
> * Similarly, &DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT takes an array of syncobj
> * handles as well as an array of u64 points and does a host-side wait on all
> * of syncobj fences at the given points simultaneously.
> @@ -973,7 +978,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
> uint32_t count,
> uint32_t flags,
> signed long timeout,
> - uint32_t *idx)
> + uint32_t *idx,
> + ktime_t *deadline)
> {
> struct syncobj_wait_entry *entries;
> struct dma_fence *fence;
> @@ -1053,6 +1059,15 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
> drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
> }
>
> + if (deadline) {
> + for (i = 0; i < count; ++i) {
> + fence = entries[i].fence;
> + if (!fence)
> + continue;
> + dma_fence_set_deadline(fence, *deadline);
> + }
> + }
> +
> do {
> set_current_state(TASK_INTERRUPTIBLE);
>
> @@ -1151,7 +1166,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
> struct drm_file *file_private,
> struct drm_syncobj_wait *wait,
> struct drm_syncobj_timeline_wait *timeline_wait,
> - struct drm_syncobj **syncobjs, bool timeline)
> + struct drm_syncobj **syncobjs, bool timeline,
> + ktime_t *deadline)
> {
> signed long timeout = 0;
> uint32_t first = ~0;
> @@ -1162,7 +1178,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
> NULL,
> wait->count_handles,
> wait->flags,
> - timeout, &first);
> + timeout, &first,
> + deadline);
> if (timeout < 0)
> return timeout;
> wait->first_signaled = first;
> @@ -1172,7 +1189,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
> u64_to_user_ptr(timeline_wait->points),
> timeline_wait->count_handles,
> timeline_wait->flags,
> - timeout, &first);
> + timeout, &first,
> + deadline);
> if (timeout < 0)
> return timeout;
> timeline_wait->first_signaled = first;
> @@ -1243,17 +1261,22 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
> {
> struct drm_syncobj_wait *args = data;
> struct drm_syncobj **syncobjs;
> + unsigned possible_flags;
> + ktime_t t, *tp = NULL;
> int ret = 0;
>
> if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
> return -EOPNOTSUPP;
>
> - if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
> - DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
> + possible_flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
> + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
> + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE;
> +
> + if (args->flags & ~possible_flags)
> return -EINVAL;
>
> if (args->count_handles == 0)
> - return -EINVAL;
> + return 0;
>
> ret = drm_syncobj_array_find(file_private,
> u64_to_user_ptr(args->handles),
> @@ -1262,8 +1285,13 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
> if (ret < 0)
> return ret;
>
> + if (args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE) {
> + t = ns_to_ktime(args->deadline_ns);
> + tp = &t;
> + }
> +
> ret = drm_syncobj_array_wait(dev, file_private,
> - args, NULL, syncobjs, false);
> + args, NULL, syncobjs, false, tp);
>
> drm_syncobj_array_free(syncobjs, args->count_handles);
>
> @@ -1276,18 +1304,23 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
> {
> struct drm_syncobj_timeline_wait *args = data;
> struct drm_syncobj **syncobjs;
> + unsigned possible_flags;
> + ktime_t t, *tp = NULL;
> int ret = 0;
>
> if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
> return -EOPNOTSUPP;
>
> - if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
> - DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
> - DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE))
> + possible_flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
> + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
> + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE |
> + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE;
> +
> + if (args->flags & ~possible_flags)
> return -EINVAL;
>
> if (args->count_handles == 0)
> - return -EINVAL;
> + return -0;
>
> ret = drm_syncobj_array_find(file_private,
> u64_to_user_ptr(args->handles),
> @@ -1296,8 +1329,13 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
> if (ret < 0)
> return ret;
>
> + if (args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE) {
> + t = ns_to_ktime(args->deadline_ns);
> + tp = &t;
> + }
> +
> ret = drm_syncobj_array_wait(dev, file_private,
> - NULL, args, syncobjs, true);
> + NULL, args, syncobjs, true, tp);
>
> drm_syncobj_array_free(syncobjs, args->count_handles);
>
> diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
> index 642808520d92..bff0509ac8b6 100644
> --- a/include/uapi/drm/drm.h
> +++ b/include/uapi/drm/drm.h
> @@ -887,6 +887,7 @@ struct drm_syncobj_transfer {
> #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
> #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
> #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2) /* wait for time point to become available */
> +#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE (1 << 3) /* set fence deadline based to deadline_ns */

s/based// ?

> struct drm_syncobj_wait {
> __u64 handles;
> /* absolute timeout */
> @@ -895,6 +896,14 @@ struct drm_syncobj_wait {
> __u32 flags;
> __u32 first_signaled; /* only valid when not waiting all */
> __u32 pad;
> + /**
> + * @deadline_ns - fence deadline hint
> + *
> + * Deadline hint, in absolute CLOCK_MONOTONIC, to set on backing
> + * fence(s) if the DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE flag is
> + * set.
> + */
> + __u64 deadline_ns;
> };
>
> struct drm_syncobj_timeline_wait {
> @@ -907,6 +916,14 @@ struct drm_syncobj_timeline_wait {
> __u32 flags;
> __u32 first_signaled; /* only valid when not waiting all */
> __u32 pad;
> + /**
> + * @deadline_ns - fence deadline hint
> + *
> + * Deadline hint, in absolute CLOCK_MONOTONIC, to set on backing
> + * fence(s) if the DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE flag is
> + * set.
> + */
> + __u64 deadline_ns;
> };
>
>

FWIW,

Reviewed-by: Tvrtko Ursulin <[email protected]>

Regards,

Tvrtko
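The syncobj UAPI change reviewed above does two things. It accepts DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE, and it makes count_handles == 0 return success so that userspace can probe for the new flag (per the v2 changelog). The following is a userspace model of those checks for the non-timeline ioctl. The function names are hypothetical, and the flag values are copied from the drm.h hunk; the timeline ioctl additionally accepts DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the include/uapi/drm/drm.h hunk in this patch. */
#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL        (1u << 0)
#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1u << 1)
#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE   (1u << 3)

/*
 * Hypothetical model of the argument checks at the top of
 * drm_syncobj_wait_ioctl(), without (patched == false) and with
 * (patched == true) this series. 0 means the arguments are accepted.
 */
static int model_wait_checks(uint32_t flags, uint32_t count_handles,
			     bool patched)
{
	uint32_t possible = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
			    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT;

	if (patched)
		possible |= DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE;
	if (flags & ~possible)
		return -EINVAL;
	if (count_handles == 0)
		return patched ? 0 : -EINVAL;
	return 0;
}

/* Probe: an empty wait carrying the new flag only succeeds if supported. */
static bool model_probe_deadline_support(bool patched)
{
	int ret = model_wait_checks(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE,
				    0, patched);

	assert(ret == 0 || ret == -EINVAL);
	return ret == 0;
}
```

On a kernel without the series, the probe is rejected at the flag check with -EINVAL; with it applied, the ioctl returns 0 before looking up any handles, so no syncobj is needed to probe.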

2023-03-31 20:48:15

by Nathan Chancellor

Subject: Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

Hi Rob,

On Wed, Mar 08, 2023 at 07:53:02AM -0800, Rob Clark wrote:
> From: Rob Clark <[email protected]>
>
> For an atomic commit updating a single CRTC (ie. a pageflip) calculate
> the next vblank time, and inform the fence(s) of that deadline.
>
> v2: Comment typo fix (danvet)
> v3: If there are multiple CRTCs, consider the time of the soonest vblank
>
> Signed-off-by: Rob Clark <[email protected]>
> Reviewed-by: Daniel Vetter <[email protected]>
> Signed-off-by: Rob Clark <[email protected]>

I apologize if this has already been reported or fixed, I searched lore
but did not find anything.

This change as commit d39e48ca80c0 ("drm/atomic-helper: Set fence
deadline for vblank") in -next causes a hang while running LTP's
read_all test on /proc on my Ampere Altra system (it seems it is hanging
on a pagemap file?). Additionally, I have this splat in dmesg, which
seems related based on the call stack.

[ 20.542591] fbcon: Taking over console
[ 20.550772] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000074
[ 20.550776] Mem abort info:
[ 20.550777] ESR = 0x0000000096000004
[ 20.550779] EC = 0x25: DABT (current EL), IL = 32 bits
[ 20.550781] SET = 0, FnV = 0
[ 20.550782] EA = 0, S1PTW = 0
[ 20.550784] FSC = 0x04: level 0 translation fault
[ 20.550785] Data abort info:
[ 20.550786] ISV = 0, ISS = 0x00000004
[ 20.550788] CM = 0, WnR = 0
[ 20.550789] user pgtable: 4k pages, 48-bit VAs, pgdp=0000080009d16000
[ 20.550791] [0000000000000074] pgd=0000000000000000, p4d=0000000000000000
[ 20.550796] Internal error: Oops: 0000000096000004 [#1] SMP
[ 20.550800] Modules linked in: ip6table_nat tun nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc vfat fat xfs snd_usb_audio snd_hwdep snd_usbmidi_lib snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore joydev mc ipmi_ssif ipmi_devintf ipmi_msghandler arm_spe_pmu arm_cmn arm_dsu_pmu arm_dmc620_pmu cppc_cpufreq loop zram crct10dif_ce polyval_ce nvme polyval_generic ghash_ce sbsa_gwdt igb nvme_core ast nvme_common i2c_algo_bit xgene_hwmon gpio_dwapb scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
[ 20.550869] CPU: 12 PID: 469 Comm: kworker/12:1 Not tainted 6.3.0-rc2-00008-gd39e48ca80c0 #1
[ 20.550872] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
[ 20.550875] Workqueue: events fbcon_register_existing_fbs
[ 20.550884] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 20.550888] pc : drm_crtc_next_vblank_start+0x2c/0x98
[ 20.550894] lr : drm_atomic_helper_wait_for_fences+0x90/0x240
[ 20.550898] sp : ffff80000d583960
[ 20.550900] x29: ffff80000d583960 x28: ffff07ff8fc187b0 x27: 0000000000000000
[ 20.550904] x26: ffff07ff99c08c00 x25: 0000000000000038 x24: ffff07ff99c0c000
[ 20.550908] x23: 0000000000000001 x22: 0000000000000038 x21: 0000000000000000
[ 20.550912] x20: ffff07ff9640a280 x19: 0000000000000000 x18: ffffffffffffffff
[ 20.550915] x17: 0000000000000000 x16: ffffb24d2eece1c0 x15: 0000003038303178
[ 20.550919] x14: 3032393100000048 x13: 0000000000000000 x12: 0000000000000000
[ 20.550923] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffb24d2eeeaca0
[ 20.550926] x8 : ffff80000d583628 x7 : 0000080077783000 x6 : 0000000000000000
[ 20.550930] x5 : ffff80000d584000 x4 : ffff07ff99c0c000 x3 : 0000000000000130
[ 20.550934] x2 : 0000000000000000 x1 : ffff80000d5839c0 x0 : ffff07ff99c0cc08
[ 20.550937] Call trace:
[ 20.550939] drm_crtc_next_vblank_start+0x2c/0x98
[ 20.550942] drm_atomic_helper_wait_for_fences+0x90/0x240
[ 20.550946] drm_atomic_helper_commit+0xb0/0x188
[ 20.550949] drm_atomic_commit+0xb0/0xf0
[ 20.550953] drm_client_modeset_commit_atomic+0x218/0x280
[ 20.550957] drm_client_modeset_commit_locked+0x64/0x1a0
[ 20.550961] drm_client_modeset_commit+0x38/0x68
[ 20.550965] __drm_fb_helper_restore_fbdev_mode_unlocked+0xb0/0xf8
[ 20.550970] drm_fb_helper_set_par+0x44/0x88
[ 20.550973] fbcon_init+0x1e0/0x4a8
[ 20.550976] visual_init+0xbc/0x118
[ 20.550981] do_bind_con_driver.isra.0+0x194/0x3a0
[ 20.550984] do_take_over_console+0x50/0x70
[ 20.550987] do_fbcon_takeover+0x74/0xf8
[ 20.550989] do_fb_registered+0x13c/0x158
[ 20.550992] fbcon_register_existing_fbs+0x78/0xc0
[ 20.550995] process_one_work+0x1ec/0x478
[ 20.551000] worker_thread+0x74/0x418
[ 20.551002] kthread+0xec/0x100
[ 20.551005] ret_from_fork+0x10/0x20
[ 20.551011] Code: f9400004 b9409013 f940a082 9ba30a73 (b9407662)
[ 20.551013] ---[ end trace 0000000000000000 ]---

If there is any additional information that I can provide or patches I
can test, I am more than happy to do so.

Cheers,
Nathan

# bad: [4b0f4525dc4fe8af17b3daefe585f0c2eb0fe0a5] Add linux-next specific files for 20230331
# good: [b2bc47e9b2011a183f9d3d3454a294a938082fb9] Merge tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect start '4b0f4525dc4fe8af17b3daefe585f0c2eb0fe0a5' 'b2bc47e9b2011a183f9d3d3454a294a938082fb9'
# good: [ed5f95f3349003d74a4a11b27b0f05d6794c382a] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect good ed5f95f3349003d74a4a11b27b0f05d6794c382a
# bad: [85f7d1bfa30a05df2c9d8a0e9f6b1f23b4a6f13b] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
git bisect bad 85f7d1bfa30a05df2c9d8a0e9f6b1f23b4a6f13b
# bad: [fbd0f79f200f8e5cb73fb3d7b788de09a8f33a6f] Merge branch 'msm-next' of https://gitlab.freedesktop.org/drm/msm.git
git bisect bad fbd0f79f200f8e5cb73fb3d7b788de09a8f33a6f
# good: [90031bc33f7525f0cc7a9ef0b1df62a1a4463382] Merge tag 'amd-drm-next-6.4-2023-03-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
git bisect good 90031bc33f7525f0cc7a9ef0b1df62a1a4463382
# good: [d4e04817db670083aed73de1fadd3b21758e69ba] drm/amdgpu: Return from switch early for EEPROM I2C address
git bisect good d4e04817db670083aed73de1fadd3b21758e69ba
# good: [70e360f9b548d99f959668d4f047d1363d42fe8e] drm: exynos: dsi: Consolidate component and bridge
git bisect good 70e360f9b548d99f959668d4f047d1363d42fe8e
# bad: [0b43595d0cbb06736d1e572e79e29a410a273573] Merge branch 'drm-next' of https://gitlab.freedesktop.org/agd5f/linux
git bisect bad 0b43595d0cbb06736d1e572e79e29a410a273573
# good: [fbb3b3500f76ec8b741bd2d0e761ca3e856ad924] dt-bindings: display: boe,tv101wum-nl6: document rotation
git bisect good fbb3b3500f76ec8b741bd2d0e761ca3e856ad924
# bad: [82bbec189ab34873688484cd14189a5392946fbb] Merge v6.3-rc4 into drm-next
git bisect bad 82bbec189ab34873688484cd14189a5392946fbb
# bad: [d39e48ca80c0960b039cb38633957f0040f63e1a] drm/atomic-helper: Set fence deadline for vblank
git bisect bad d39e48ca80c0960b039cb38633957f0040f63e1a
# good: [d7d5a21dd6b4706c04fbba5d25db8da5f25aab68] dma-buf/dma-resv: Add a way to set fence deadline
git bisect good d7d5a21dd6b4706c04fbba5d25db8da5f25aab68
# good: [f3823da7e4ba7d4781375c2bb786a8a78efc6591] drm/scheduler: Add fence deadline support
git bisect good f3823da7e4ba7d4781375c2bb786a8a78efc6591
# good: [b2c077d001b612b1f34f7e528b2dc6072bd6794e] drm/vblank: Add helper to get next vblank time
git bisect good b2c077d001b612b1f34f7e528b2dc6072bd6794e
# first bad commit: [d39e48ca80c0960b039cb38633957f0040f63e1a] drm/atomic-helper: Set fence deadline for vblank
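The faulting address in the oops above (0x74) is small, which is typical of reading a structure field through a NULL base pointer. Assuming, without this being established in the thread, that the new vblank helper was reached on a device with no usable vblank state, a defensive guard might look like the sketch below. All types and names here are hypothetical stand-ins, not the real DRM structures or API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real DRM structures. */
struct fake_vblank_state {
	int64_t last_vblank_ns;
	int64_t period_ns;
};

struct fake_drm_dev {
	unsigned int num_crtcs;
	/* NULL when the driver never set up vblank support. */
	struct fake_vblank_state *vblank;
};

/* Next-vblank estimate, or -EINVAL when there is no vblank state. */
static int fake_next_vblank_start(const struct fake_drm_dev *dev,
				  unsigned int pipe, int64_t *out_ns)
{
	assert(dev != NULL && out_ns != NULL);
	/* Bail out instead of dereferencing a missing per-CRTC array. */
	if (dev->vblank == NULL || pipe >= dev->num_crtcs)
		return -EINVAL;

	const struct fake_vblank_state *vb = &dev->vblank[pipe];

	if (vb->period_ns <= 0)
		return -EINVAL;
	*out_ns = vb->last_vblank_ns + vb->period_ns;
	return 0;
}
```

Because the deadline is only a hint, a caller in the commit path could simply skip setting it when such a helper returns an error, rather than failing the commit.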

2023-03-31 22:17:08

by Rob Clark

Subject: Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

On Fri, Mar 31, 2023 at 1:44 PM Nathan Chancellor <[email protected]> wrote:
>
> Hi Rob,
>
> On Wed, Mar 08, 2023 at 07:53:02AM -0800, Rob Clark wrote:
> > From: Rob Clark <[email protected]>
> >
> > For an atomic commit updating a single CRTC (ie. a pageflip) calculate
> > the next vblank time, and inform the fence(s) of that deadline.
> >
> > v2: Comment typo fix (danvet)
> > v3: If there are multiple CRTCs, consider the time of the soonest vblank
> >
> > Signed-off-by: Rob Clark <[email protected]>
> > Reviewed-by: Daniel Vetter <[email protected]>
> > Signed-off-by: Rob Clark <[email protected]>
>
> I apologize if this has already been reported or fixed, I searched lore
> but did not find anything.
>
> This change as commit d39e48ca80c0 ("drm/atomic-helper: Set fence
> deadline for vblank") in -next causes a hang while running LTP's
> read_all test on /proc on my Ampere Altra system (it seems it is hanging
> on a pagemap file?). Additionally, I have this splat in dmesg, which
> seems related based on the call stack.

Hi, I'm not familiar with this hardware.. do you know which drm driver
is used? I can't tell from the call-stack.

BR,
-R


> [ 20.542591] fbcon: Taking over console
> [ 20.550772] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000074
> [ 20.550776] Mem abort info:
> [ 20.550777] ESR = 0x0000000096000004
> [ 20.550779] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 20.550781] SET = 0, FnV = 0
> [ 20.550782] EA = 0, S1PTW = 0
> [ 20.550784] FSC = 0x04: level 0 translation fault
> [ 20.550785] Data abort info:
> [ 20.550786] ISV = 0, ISS = 0x00000004
> [ 20.550788] CM = 0, WnR = 0
> [ 20.550789] user pgtable: 4k pages, 48-bit VAs, pgdp=0000080009d16000
> [ 20.550791] [0000000000000074] pgd=0000000000000000, p4d=0000000000000000
> [ 20.550796] Internal error: Oops: 0000000096000004 [#1] SMP
> [ 20.550800] Modules linked in: ip6table_nat tun nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc vfat fat xfs snd_usb_audio snd_hwdep snd_usbmidi_lib snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore joydev mc ipmi_ssif ipmi_devintf ipmi_msghandler arm_spe_pmu arm_cmn arm_dsu_pmu arm_dmc620_pmu cppc_cpufreq loop zram crct10dif_ce polyval_ce nvme polyval_generic ghash_ce sbsa_gwdt igb nvme_core ast nvme_common i2c_algo_bit xgene_hwmon gpio_dwapb scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
> [ 20.550869] CPU: 12 PID: 469 Comm: kworker/12:1 Not tainted 6.3.0-rc2-00008-gd39e48ca80c0 #1
> [ 20.550872] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
> [ 20.550875] Workqueue: events fbcon_register_existing_fbs
> [ 20.550884] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 20.550888] pc : drm_crtc_next_vblank_start+0x2c/0x98
> [ 20.550894] lr : drm_atomic_helper_wait_for_fences+0x90/0x240
> [ 20.550898] sp : ffff80000d583960
> [ 20.550900] x29: ffff80000d583960 x28: ffff07ff8fc187b0 x27: 0000000000000000
> [ 20.550904] x26: ffff07ff99c08c00 x25: 0000000000000038 x24: ffff07ff99c0c000
> [ 20.550908] x23: 0000000000000001 x22: 0000000000000038 x21: 0000000000000000
> [ 20.550912] x20: ffff07ff9640a280 x19: 0000000000000000 x18: ffffffffffffffff
> [ 20.550915] x17: 0000000000000000 x16: ffffb24d2eece1c0 x15: 0000003038303178
> [ 20.550919] x14: 3032393100000048 x13: 0000000000000000 x12: 0000000000000000
> [ 20.550923] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffb24d2eeeaca0
> [ 20.550926] x8 : ffff80000d583628 x7 : 0000080077783000 x6 : 0000000000000000
> [ 20.550930] x5 : ffff80000d584000 x4 : ffff07ff99c0c000 x3 : 0000000000000130
> [ 20.550934] x2 : 0000000000000000 x1 : ffff80000d5839c0 x0 : ffff07ff99c0cc08
> [ 20.550937] Call trace:
> [ 20.550939] drm_crtc_next_vblank_start+0x2c/0x98
> [ 20.550942] drm_atomic_helper_wait_for_fences+0x90/0x240
> [ 20.550946] drm_atomic_helper_commit+0xb0/0x188
> [ 20.550949] drm_atomic_commit+0xb0/0xf0
> [ 20.550953] drm_client_modeset_commit_atomic+0x218/0x280
> [ 20.550957] drm_client_modeset_commit_locked+0x64/0x1a0
> [ 20.550961] drm_client_modeset_commit+0x38/0x68
> [ 20.550965] __drm_fb_helper_restore_fbdev_mode_unlocked+0xb0/0xf8
> [ 20.550970] drm_fb_helper_set_par+0x44/0x88
> [ 20.550973] fbcon_init+0x1e0/0x4a8
> [ 20.550976] visual_init+0xbc/0x118
> [ 20.550981] do_bind_con_driver.isra.0+0x194/0x3a0
> [ 20.550984] do_take_over_console+0x50/0x70
> [ 20.550987] do_fbcon_takeover+0x74/0xf8
> [ 20.550989] do_fb_registered+0x13c/0x158
> [ 20.550992] fbcon_register_existing_fbs+0x78/0xc0
> [ 20.550995] process_one_work+0x1ec/0x478
> [ 20.551000] worker_thread+0x74/0x418
> [ 20.551002] kthread+0xec/0x100
> [ 20.551005] ret_from_fork+0x10/0x20
> [ 20.551011] Code: f9400004 b9409013 f940a082 9ba30a73 (b9407662)
> [ 20.551013] ---[ end trace 0000000000000000 ]---
>
> If there is any additional information that I can provide or patches I
> can test, I am more than happy to do so.
>
> Cheers,
> Nathan
>
> # bad: [4b0f4525dc4fe8af17b3daefe585f0c2eb0fe0a5] Add linux-next specific files for 20230331
> # good: [b2bc47e9b2011a183f9d3d3454a294a938082fb9] Merge tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> git bisect start '4b0f4525dc4fe8af17b3daefe585f0c2eb0fe0a5' 'b2bc47e9b2011a183f9d3d3454a294a938082fb9'
> # good: [ed5f95f3349003d74a4a11b27b0f05d6794c382a] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> git bisect good ed5f95f3349003d74a4a11b27b0f05d6794c382a
> # bad: [85f7d1bfa30a05df2c9d8a0e9f6b1f23b4a6f13b] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
> git bisect bad 85f7d1bfa30a05df2c9d8a0e9f6b1f23b4a6f13b
> # bad: [fbd0f79f200f8e5cb73fb3d7b788de09a8f33a6f] Merge branch 'msm-next' of https://gitlab.freedesktop.org/drm/msm.git
> git bisect bad fbd0f79f200f8e5cb73fb3d7b788de09a8f33a6f
> # good: [90031bc33f7525f0cc7a9ef0b1df62a1a4463382] Merge tag 'amd-drm-next-6.4-2023-03-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
> git bisect good 90031bc33f7525f0cc7a9ef0b1df62a1a4463382
> # good: [d4e04817db670083aed73de1fadd3b21758e69ba] drm/amdgpu: Return from switch early for EEPROM I2C address
> git bisect good d4e04817db670083aed73de1fadd3b21758e69ba
> # good: [70e360f9b548d99f959668d4f047d1363d42fe8e] drm: exynos: dsi: Consolidate component and bridge
> git bisect good 70e360f9b548d99f959668d4f047d1363d42fe8e
> # bad: [0b43595d0cbb06736d1e572e79e29a410a273573] Merge branch 'drm-next' of https://gitlab.freedesktop.org/agd5f/linux
> git bisect bad 0b43595d0cbb06736d1e572e79e29a410a273573
> # good: [fbb3b3500f76ec8b741bd2d0e761ca3e856ad924] dt-bindings: display: boe,tv101wum-nl6: document rotation
> git bisect good fbb3b3500f76ec8b741bd2d0e761ca3e856ad924
> # bad: [82bbec189ab34873688484cd14189a5392946fbb] Merge v6.3-rc4 into drm-next
> git bisect bad 82bbec189ab34873688484cd14189a5392946fbb
> # bad: [d39e48ca80c0960b039cb38633957f0040f63e1a] drm/atomic-helper: Set fence deadline for vblank
> git bisect bad d39e48ca80c0960b039cb38633957f0040f63e1a
> # good: [d7d5a21dd6b4706c04fbba5d25db8da5f25aab68] dma-buf/dma-resv: Add a way to set fence deadline
> git bisect good d7d5a21dd6b4706c04fbba5d25db8da5f25aab68
> # good: [f3823da7e4ba7d4781375c2bb786a8a78efc6591] drm/scheduler: Add fence deadline support
> git bisect good f3823da7e4ba7d4781375c2bb786a8a78efc6591
> # good: [b2c077d001b612b1f34f7e528b2dc6072bd6794e] drm/vblank: Add helper to get next vblank time
> git bisect good b2c077d001b612b1f34f7e528b2dc6072bd6794e
> # first bad commit: [d39e48ca80c0960b039cb38633957f0040f63e1a] drm/atomic-helper: Set fence deadline for vblank

2023-03-31 23:47:10

by Nathan Chancellor

Subject: Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

On Fri, Mar 31, 2023 at 03:14:30PM -0700, Rob Clark wrote:
> On Fri, Mar 31, 2023 at 1:44 PM Nathan Chancellor <[email protected]> wrote:
> >
> > Hi Rob,
> >
> > On Wed, Mar 08, 2023 at 07:53:02AM -0800, Rob Clark wrote:
> > > From: Rob Clark <[email protected]>
> > >
> > > For an atomic commit updating a single CRTC (i.e. a pageflip), calculate
> > > the next vblank time, and inform the fence(s) of that deadline.
> > >
> > > v2: Comment typo fix (danvet)
> > > v3: If there are multiple CRTCs, consider the time of the soonest vblank
> > >
> > > Signed-off-by: Rob Clark <[email protected]>
> > > Reviewed-by: Daniel Vetter <[email protected]>
> > > Signed-off-by: Rob Clark <[email protected]>
> >
> > I apologize if this has already been reported or fixed; I searched lore
> > but did not find anything.
> >
> > This change as commit d39e48ca80c0 ("drm/atomic-helper: Set fence
> > deadline for vblank") in -next causes a hang while running LTP's
> > read_all test on /proc on my Ampere Altra system (it seems it is hanging
> > on a pagemap file?). Additionally, I have this splat in dmesg, which
> > seems related based on the call stack.
>
> Hi, I'm not familiar with this hardware. Do you know which drm driver
> is used? I can't tell from the call-stack.

I think it is drivers/gpu/drm/ast, as I see ast in lsmod?

> > [ 20.542591] fbcon: Taking over console
> > [ 20.550772] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000074
> > [ 20.550776] Mem abort info:
> > [ 20.550777] ESR = 0x0000000096000004
> > [ 20.550779] EC = 0x25: DABT (current EL), IL = 32 bits
> > [ 20.550781] SET = 0, FnV = 0
> > [ 20.550782] EA = 0, S1PTW = 0
> > [ 20.550784] FSC = 0x04: level 0 translation fault
> > [ 20.550785] Data abort info:
> > [ 20.550786] ISV = 0, ISS = 0x00000004
> > [ 20.550788] CM = 0, WnR = 0
> > [ 20.550789] user pgtable: 4k pages, 48-bit VAs, pgdp=0000080009d16000
> > [ 20.550791] [0000000000000074] pgd=0000000000000000, p4d=0000000000000000
> > [ 20.550796] Internal error: Oops: 0000000096000004 [#1] SMP
> > [ 20.550800] Modules linked in: ip6table_nat tun nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc vfat fat xfs snd_usb_audio snd_hwdep snd_usbmidi_lib snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore joydev mc ipmi_ssif ipmi_devintf ipmi_msghandler arm_spe_pmu arm_cmn arm_dsu_pmu arm_dmc620_pmu cppc_cpufreq loop zram crct10dif_ce polyval_ce nvme polyval_generic ghash_ce sbsa_gwdt igb nvme_core ast nvme_common i2c_algo_bit xgene_hwmon gpio_dwapb scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
> > [ 20.550869] CPU: 12 PID: 469 Comm: kworker/12:1 Not tainted 6.3.0-rc2-00008-gd39e48ca80c0 #1
> > [ 20.550872] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
> > [ 20.550875] Workqueue: events fbcon_register_existing_fbs
> > [ 20.550884] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 20.550888] pc : drm_crtc_next_vblank_start+0x2c/0x98
> > [ 20.550894] lr : drm_atomic_helper_wait_for_fences+0x90/0x240
> > [ 20.550898] sp : ffff80000d583960
> > [ 20.550900] x29: ffff80000d583960 x28: ffff07ff8fc187b0 x27: 0000000000000000
> > [ 20.550904] x26: ffff07ff99c08c00 x25: 0000000000000038 x24: ffff07ff99c0c000
> > [ 20.550908] x23: 0000000000000001 x22: 0000000000000038 x21: 0000000000000000
> > [ 20.550912] x20: ffff07ff9640a280 x19: 0000000000000000 x18: ffffffffffffffff
> > [ 20.550915] x17: 0000000000000000 x16: ffffb24d2eece1c0 x15: 0000003038303178
> > [ 20.550919] x14: 3032393100000048 x13: 0000000000000000 x12: 0000000000000000
> > [ 20.550923] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffb24d2eeeaca0
> > [ 20.550926] x8 : ffff80000d583628 x7 : 0000080077783000 x6 : 0000000000000000
> > [ 20.550930] x5 : ffff80000d584000 x4 : ffff07ff99c0c000 x3 : 0000000000000130
> > [ 20.550934] x2 : 0000000000000000 x1 : ffff80000d5839c0 x0 : ffff07ff99c0cc08
> > [ 20.550937] Call trace:
> > [ 20.550939] drm_crtc_next_vblank_start+0x2c/0x98
> > [ 20.550942] drm_atomic_helper_wait_for_fences+0x90/0x240
> > [ 20.550946] drm_atomic_helper_commit+0xb0/0x188
> > [ 20.550949] drm_atomic_commit+0xb0/0xf0
> > [ 20.550953] drm_client_modeset_commit_atomic+0x218/0x280
> > [ 20.550957] drm_client_modeset_commit_locked+0x64/0x1a0
> > [ 20.550961] drm_client_modeset_commit+0x38/0x68
> > [ 20.550965] __drm_fb_helper_restore_fbdev_mode_unlocked+0xb0/0xf8
> > [ 20.550970] drm_fb_helper_set_par+0x44/0x88
> > [ 20.550973] fbcon_init+0x1e0/0x4a8
> > [ 20.550976] visual_init+0xbc/0x118
> > [ 20.550981] do_bind_con_driver.isra.0+0x194/0x3a0
> > [ 20.550984] do_take_over_console+0x50/0x70
> > [ 20.550987] do_fbcon_takeover+0x74/0xf8
> > [ 20.550989] do_fb_registered+0x13c/0x158
> > [ 20.550992] fbcon_register_existing_fbs+0x78/0xc0
> > [ 20.550995] process_one_work+0x1ec/0x478
> > [ 20.551000] worker_thread+0x74/0x418
> > [ 20.551002] kthread+0xec/0x100
> > [ 20.551005] ret_from_fork+0x10/0x20
> > [ 20.551011] Code: f9400004 b9409013 f940a082 9ba30a73 (b9407662)
> > [ 20.551013] ---[ end trace 0000000000000000 ]---
> >
> > If there is any additional information that I can provide or patches I
> > can test, I am more than happy to do so.
> >
> > Cheers,
> > Nathan
> >
> > # bad: [4b0f4525dc4fe8af17b3daefe585f0c2eb0fe0a5] Add linux-next specific files for 20230331
> > # good: [b2bc47e9b2011a183f9d3d3454a294a938082fb9] Merge tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> > git bisect start '4b0f4525dc4fe8af17b3daefe585f0c2eb0fe0a5' 'b2bc47e9b2011a183f9d3d3454a294a938082fb9'
> > # good: [ed5f95f3349003d74a4a11b27b0f05d6794c382a] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> > git bisect good ed5f95f3349003d74a4a11b27b0f05d6794c382a
> > # bad: [85f7d1bfa30a05df2c9d8a0e9f6b1f23b4a6f13b] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
> > git bisect bad 85f7d1bfa30a05df2c9d8a0e9f6b1f23b4a6f13b
> > # bad: [fbd0f79f200f8e5cb73fb3d7b788de09a8f33a6f] Merge branch 'msm-next' of https://gitlab.freedesktop.org/drm/msm.git
> > git bisect bad fbd0f79f200f8e5cb73fb3d7b788de09a8f33a6f
> > # good: [90031bc33f7525f0cc7a9ef0b1df62a1a4463382] Merge tag 'amd-drm-next-6.4-2023-03-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
> > git bisect good 90031bc33f7525f0cc7a9ef0b1df62a1a4463382
> > # good: [d4e04817db670083aed73de1fadd3b21758e69ba] drm/amdgpu: Return from switch early for EEPROM I2C address
> > git bisect good d4e04817db670083aed73de1fadd3b21758e69ba
> > # good: [70e360f9b548d99f959668d4f047d1363d42fe8e] drm: exynos: dsi: Consolidate component and bridge
> > git bisect good 70e360f9b548d99f959668d4f047d1363d42fe8e
> > # bad: [0b43595d0cbb06736d1e572e79e29a410a273573] Merge branch 'drm-next' of https://gitlab.freedesktop.org/agd5f/linux
> > git bisect bad 0b43595d0cbb06736d1e572e79e29a410a273573
> > # good: [fbb3b3500f76ec8b741bd2d0e761ca3e856ad924] dt-bindings: display: boe,tv101wum-nl6: document rotation
> > git bisect good fbb3b3500f76ec8b741bd2d0e761ca3e856ad924
> > # bad: [82bbec189ab34873688484cd14189a5392946fbb] Merge v6.3-rc4 into drm-next
> > git bisect bad 82bbec189ab34873688484cd14189a5392946fbb
> > # bad: [d39e48ca80c0960b039cb38633957f0040f63e1a] drm/atomic-helper: Set fence deadline for vblank
> > git bisect bad d39e48ca80c0960b039cb38633957f0040f63e1a
> > # good: [d7d5a21dd6b4706c04fbba5d25db8da5f25aab68] dma-buf/dma-resv: Add a way to set fence deadline
> > git bisect good d7d5a21dd6b4706c04fbba5d25db8da5f25aab68
> > # good: [f3823da7e4ba7d4781375c2bb786a8a78efc6591] drm/scheduler: Add fence deadline support
> > git bisect good f3823da7e4ba7d4781375c2bb786a8a78efc6591
> > # good: [b2c077d001b612b1f34f7e528b2dc6072bd6794e] drm/vblank: Add helper to get next vblank time
> > git bisect good b2c077d001b612b1f34f7e528b2dc6072bd6794e
> > # first bad commit: [d39e48ca80c0960b039cb38633957f0040f63e1a] drm/atomic-helper: Set fence deadline for vblank

2023-04-01 15:39:56

by Rob Clark

Subject: Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

On Fri, Mar 31, 2023 at 4:30 PM Nathan Chancellor <[email protected]> wrote:
>
> On Fri, Mar 31, 2023 at 03:14:30PM -0700, Rob Clark wrote:
> > On Fri, Mar 31, 2023 at 1:44 PM Nathan Chancellor <[email protected]> wrote:
> > >
> > > Hi Rob,
> > >
> > > On Wed, Mar 08, 2023 at 07:53:02AM -0800, Rob Clark wrote:
> > > > From: Rob Clark <[email protected]>
> > > >
> > > > For an atomic commit updating a single CRTC (i.e. a pageflip), calculate
> > > > the next vblank time, and inform the fence(s) of that deadline.
> > > >
> > > > v2: Comment typo fix (danvet)
> > > > v3: If there are multiple CRTCs, consider the time of the soonest vblank
> > > >
> > > > Signed-off-by: Rob Clark <[email protected]>
> > > > Reviewed-by: Daniel Vetter <[email protected]>
> > > > Signed-off-by: Rob Clark <[email protected]>
> > >
> > > I apologize if this has already been reported or fixed; I searched lore
> > > but did not find anything.
> > >
> > > This change as commit d39e48ca80c0 ("drm/atomic-helper: Set fence
> > > deadline for vblank") in -next causes a hang while running LTP's
> > > read_all test on /proc on my Ampere Altra system (it seems it is hanging
> > > on a pagemap file?). Additionally, I have this splat in dmesg, which
> > > seems related based on the call stack.
> >
> > Hi, I'm not familiar with this hardware. Do you know which drm driver
> > is used? I can't tell from the call-stack.
>
> I think it is drivers/gpu/drm/ast, as I see ast in lsmod?

Ok, assuming my theory is correct, this should fix it:

https://patchwork.freedesktop.org/series/115992/

BR,
-R


2023-04-04 17:23:41

by Dmitry Baryshkov

Subject: Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

On 08/03/2023 17:53, Rob Clark wrote:
> From: Rob Clark <[email protected]>
>
> For an atomic commit updating a single CRTC (i.e. a pageflip), calculate
> the next vblank time, and inform the fence(s) of that deadline.
>
> v2: Comment typo fix (danvet)
> v3: If there are multiple CRTCs, consider the time of the soonest vblank
>
> Signed-off-by: Rob Clark <[email protected]>
> Reviewed-by: Daniel Vetter <[email protected]>
> Signed-off-by: Rob Clark <[email protected]>
> ---
> drivers/gpu/drm/drm_atomic_helper.c | 37 +++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)

As I started playing with hotplug on RB5 (sm8250, DSI-HDMI bridge), I
found that this patch introduces the following backtrace on HDMI
hotplug. Is there anything that I can do to debug/fix the issue? The
warning seems harmless, but it would probably still be good to fix
it. With addresses decoded:

[ 31.151348] ------------[ cut here ]------------
[ 31.157043] msm_dpu ae01000.display-controller: drm_WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev))
[ 31.157177] WARNING: CPU: 0 PID: 13 at drivers/gpu/drm/drm_vblank.c:728 drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728)
[ 31.180629] Modules linked in:
[ 31.184106] CPU: 0 PID: 13 Comm: kworker/0:1 Not tainted 6.3.0-rc2-00008-gd39e48ca80c0 #542
[ 31.193358] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
[ 31.200796] Workqueue: events lt9611uxc_hpd_work
[ 31.205990] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 31.213722] pc : drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728)
[ 31.222032] lr : drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728)
[ 31.230341] sp : ffff8000080bb8d0
[ 31.234061] x29: ffff8000080bb900 x28: 0000000000000038 x27: ffff61a7956b8d60
[ 31.242051] x26: 0000000000000000 x25: 0000000000000000 x24: ffff8000080bb9c4
[ 31.250038] x23: 0000000000000001 x22: ffffbf0033b94ef0 x21: ffff61a7957901d0
[ 31.258029] x20: ffff61a795710000 x19: ffff61a78128b000 x18: fffffffffffec278
[ 31.266014] x17: 0040000000000465 x16: 0000000000000020 x15: 0000000000000060
[ 31.274001] x14: 0000000000000001 x13: ffffbf00354550e0 x12: 0000000000000825
[ 31.281989] x11: 00000000000002b7 x10: ffffbf00354b1208 x9 : ffffbf00354550e0
[ 31.289976] x8 : 00000000ffffefff x7 : ffffbf00354ad0e0 x6 : 00000000000002b7
[ 31.297963] x5 : ffff61a8feebbe48 x4 : 40000000fffff2b7 x3 : ffffa2a8c9f64000
[ 31.305947] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff61a780283100
[ 31.313934] Call trace:
[ 31.316719] drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728)
[ 31.324646] drm_crtc_vblank_helper_get_vblank_timestamp (drivers/gpu/drm/drm_vblank.c:843)
[ 31.331528] drm_crtc_get_last_vbltimestamp (drivers/gpu/drm/drm_vblank.c:884)
[ 31.337170] drm_crtc_next_vblank_start (drivers/gpu/drm/drm_vblank.c:1006)
[ 31.342430] drm_atomic_helper_wait_for_fences (drivers/gpu/drm/drm_atomic_helper.c:1531 drivers/gpu/drm/drm_atomic_helper.c:1578)
[ 31.348561] drm_atomic_helper_commit (drivers/gpu/drm/drm_atomic_helper.c:2007)
[ 31.353724] drm_atomic_commit (drivers/gpu/drm/drm_atomic.c:1444)
[ 31.358127] drm_client_modeset_commit_atomic (drivers/gpu/drm/drm_client_modeset.c:1045)
[ 31.364146] drm_client_modeset_commit_locked (drivers/gpu/drm/drm_client_modeset.c:1148)
[ 31.370071] drm_client_modeset_commit (drivers/gpu/drm/drm_client_modeset.c:1174)
[ 31.375233] drm_fb_helper_set_par (drivers/gpu/drm/drm_fb_helper.c:254 drivers/gpu/drm/drm_fb_helper.c:229 drivers/gpu/drm/drm_fb_helper.c:1644)
[ 31.380108] drm_fb_helper_hotplug_event (drivers/gpu/drm/drm_fb_helper.c:2302 (discriminator 4))
[ 31.385456] drm_fb_helper_output_poll_changed (drivers/gpu/drm/drm_fb_helper.c:2331)
[ 31.391376] drm_kms_helper_hotplug_event (drivers/gpu/drm/drm_probe_helper.c:697)
[ 31.396825] drm_bridge_connector_hpd_cb (drivers/gpu/drm/drm_bridge_connector.c:129)
[ 31.402175] drm_bridge_hpd_notify (drivers/gpu/drm/drm_bridge.c:1315)
[ 31.406954] lt9611uxc_hpd_work (drivers/gpu/drm/bridge/lontium-lt9611uxc.c:185)
[ 31.411450] process_one_work (kernel/workqueue.c:2395)
[ 31.415949] worker_thread (include/linux/list.h:292 kernel/workqueue.c:2538)
[ 31.426843] kthread (kernel/kthread.c:376)
[ 31.437182] ret_from_fork (arch/arm64/kernel/entry.S:871)
[ 31.447828] irq event stamp: 44642
[ 31.458284] hardirqs last enabled at (44641): __up_console_sem (arch/arm64/include/asm/irqflags.h:182 (discriminator 1) arch/arm64/include/asm/irqflags.h:202 (discriminator 1) kernel/printk/printk.c:345 (discriminator 1))
[ 31.474540] hardirqs last disabled at (44642): el1_dbg (arch/arm64/kernel/entry-common.c:335 arch/arm64/kernel/entry-common.c:406)
[ 31.489882] softirqs last enabled at (42912): _stext (arch/arm64/include/asm/current.h:19 arch/arm64/include/asm/preempt.h:13 kernel/softirq.c:415 kernel/softirq.c:600)
[ 31.505256] softirqs last disabled at (42907): ____do_softirq (arch/arm64/kernel/irq.c:81)
[ 31.521139] ---[ end trace 0000000000000000 ]---



--
With best wishes
Dmitry

2023-04-04 19:25:57

by Daniel Vetter

Subject: Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

On Tue, Apr 04, 2023 at 08:22:05PM +0300, Dmitry Baryshkov wrote:
> On 08/03/2023 17:53, Rob Clark wrote:
> > From: Rob Clark <[email protected]>
> >
> > For an atomic commit updating a single CRTC (i.e. a pageflip), calculate
> > the next vblank time, and inform the fence(s) of that deadline.
> >
> > v2: Comment typo fix (danvet)
> > v3: If there are multiple CRTCs, consider the time of the soonest vblank
> >
> > Signed-off-by: Rob Clark <[email protected]>
> > Reviewed-by: Daniel Vetter <[email protected]>
> > Signed-off-by: Rob Clark <[email protected]>
> > ---
> > drivers/gpu/drm/drm_atomic_helper.c | 37 +++++++++++++++++++++++++++++
> > 1 file changed, 37 insertions(+)
>
> As I started playing with hotplug on RB5 (sm8250, DSI-HDMI bridge), I found
> that this patch introduces the following backtrace on HDMI hotplug. Is there
> anything that I can do to debug/fix the issue? The warning seems harmless,
> but it would probably still be good to fix it. With addresses decoded:

A bit of a shot in the dark, but does the below help?


diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index f21b5a74176c..6640d80d84f3 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1528,6 +1528,9 @@ static void set_fence_deadline(struct drm_device *dev,
 	for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
 		ktime_t v;

+		if (drm_atomic_crtc_needs_modeset(new_crtc_state))
+			continue;
+
 		if (drm_crtc_next_vblank_start(crtc, &v))
 			continue;

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 78a8c51a4abf..7ae38e8e27e8 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -1001,6 +1001,9 @@ int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime)
 	struct drm_display_mode *mode = &vblank->hwmode;
 	u64 vblank_start;

+	if (!drm_dev_has_vblank(crtc->dev))
+		return -EINVAL;
+
 	if (!vblank->framedur_ns || !vblank->linedur_ns)
 		return -EINVAL;


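The effect of the two checks in the diff on deadline selection can be modelled in plain C. The sketch below is a hypothetical userspace model, not kernel code: struct model_crtc and its fields stand in for the real CRTC state, and its needs_modeset and dev_has_vblank flags play the roles of drm_atomic_crtc_needs_modeset() and drm_dev_has_vblank(). Following the commit message, it picks the soonest vblank across the CRTCs in the commit; any CRTC whose vblank time cannot be computed is skipped, so the deadline remains a best-effort hint.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one CRTC in an atomic commit. */
struct model_crtc {
	bool dev_has_vblank;     /* driver initialized vblank support */
	bool needs_modeset;      /* this commit does a full modeset on it */
	int64_t framedur_ns;     /* 0 until a mode has been programmed */
	int64_t next_vblank_ns;  /* precomputed start of the next vblank */
};

/* Early returns modelled on the new drm_dev_has_vblank() check and the
 * existing framedur_ns check in drm_crtc_next_vblank_start(). */
static int model_next_vblank_start(const struct model_crtc *c, int64_t *t)
{
	if (!c->dev_has_vblank)
		return -EINVAL;
	if (c->framedur_ns == 0)
		return -EINVAL;
	*t = c->next_vblank_ns;
	return 0;
}

/*
 * Choose the soonest vblank among CRTCs that are not being modeset and
 * whose timing is known. Returns false if no CRTC qualifies, in which
 * case no deadline would be set on the fences.
 */
static bool model_soonest_deadline(const struct model_crtc *crtcs, size_t n,
				   int64_t *deadline)
{
	bool found = false;

	assert(crtcs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int64_t v;

		if (crtcs[i].needs_modeset)
			continue;
		if (model_next_vblank_start(&crtcs[i], &v))
			continue;
		if (!found || v < *deadline) {
			*deadline = v;
			found = true;
		}
	}
	return found;
}
```

The explicit found flag is a choice of this sketch: it keeps "no usable CRTC" distinct from any particular timestamp value, so a commit touching only CRTCs that are being modeset simply sets no deadline.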

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2023-04-04 21:56:20

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

On 04/04/2023 22:16, Daniel Vetter wrote:
> On Tue, Apr 04, 2023 at 08:22:05PM +0300, Dmitry Baryshkov wrote:
>> On 08/03/2023 17:53, Rob Clark wrote:
>>> From: Rob Clark <[email protected]>
>>>
>>> For an atomic commit updating a single CRTC (ie. a pageflip) calculate
>>> the next vblank time, and inform the fence(s) of that deadline.
>>>
>>> v2: Comment typo fix (danvet)
>>> v3: If there are multiple CRTCs, consider the time of the soonest vblank
>>>
>>> Signed-off-by: Rob Clark <[email protected]>
>>> Reviewed-by: Daniel Vetter <[email protected]>
>>> Signed-off-by: Rob Clark <[email protected]>
>>> ---
>>> drivers/gpu/drm/drm_atomic_helper.c | 37 +++++++++++++++++++++++++++++
>>> 1 file changed, 37 insertions(+)
>>
>> As I started playing with hotplug on RB5 (sm8250, DSI-HDMI bridge), I found
>> that this patch introduces the following backtrace on HDMI hotplug. Is there
>> anything that I can do to debug/fix the issue? The warning seems harmless,
>> but it would probably be good to still fix it. With addresses decoded:
>
> Bit of a shot in the dark, but does the below help?

This indeed seems to fix the issue. I'm not sure about the possible side
effects, but, if you were to send the patch:

Tested-by: Dmitry Baryshkov <[email protected]>
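For readers following the thread, here is how the quoted commit message's deadline selection fits together with the `drm_atomic_crtc_needs_modeset()` skip from the diff quoted below. The commit computes one deadline per commit, taken from the soonest vblank across the CRTCs it touches, and CRTCs undergoing a full modeset are skipped. This is only a simplified userspace model, not the kernel code. `struct sketch_crtc_state` and `sketch_pick_deadline()` are hypothetical names. The two booleans stand in for `drm_atomic_crtc_needs_modeset()` and for `drm_crtc_next_vblank_start()` returning an error. In the kernel, the chosen time would then be handed to `dma_fence_set_deadline()` for each plane's fence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for per-CRTC state in an atomic commit. */
struct sketch_crtc_state {
	bool needs_modeset;     /* models drm_atomic_crtc_needs_modeset() */
	bool has_vblank_timing; /* false models drm_crtc_next_vblank_start() failing */
	int64_t next_vblank_ns; /* absolute CLOCK_MONOTONIC time in ns */
};

/*
 * Sketch only: store the soonest usable next-vblank time across all CRTCs
 * in *deadline_ns and return true; return false (leaving *deadline_ns
 * untouched) when no CRTC yields a usable time, i.e. no hint is set.
 */
static bool sketch_pick_deadline(const struct sketch_crtc_state *crtcs,
				 size_t n, int64_t *deadline_ns)
{
	bool found = false;
	int64_t soonest = 0;

	for (size_t i = 0; i < n; i++) {
		/* A CRTC being fully modeset may have no valid vblank timing. */
		if (crtcs[i].needs_modeset)
			continue;
		/* Next vblank could not be computed for this CRTC: skip it. */
		if (!crtcs[i].has_vblank_timing)
			continue;
		/* Keep the earliest vblank seen so far. */
		if (!found || crtcs[i].next_vblank_ns < soonest) {
			soonest = crtcs[i].next_vblank_ns;
			found = true;
		}
	}

	if (found)
		*deadline_ns = soonest;
	return found;
}
```

The sketch takes the minimum rather than, say, the first CRTC's time. That way a commit spanning several outputs is hinted with the earliest point at which any of them needs its new frame.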

>
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index f21b5a74176c..6640d80d84f3 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1528,6 +1528,9 @@ static void set_fence_deadline(struct drm_device *dev,
> for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> ktime_t v;
>
> + if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> + continue;
> +
> if (drm_crtc_next_vblank_start(crtc, &v))
> continue;
>
> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> index 78a8c51a4abf..7ae38e8e27e8 100644
> --- a/drivers/gpu/drm/drm_vblank.c
> +++ b/drivers/gpu/drm/drm_vblank.c
> @@ -1001,6 +1001,9 @@ int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime)
> struct drm_display_mode *mode = &vblank->hwmode;
> u64 vblank_start;
>
> + if (!drm_dev_has_vblank(crtc->dev))
> + return -EINVAL;
> +
> if (!vblank->framedur_ns || !vblank->linedur_ns)
> return -EINVAL;
>
>
>>
>> [ 31.151348] ------------[ cut here ]------------
>> [ 31.157043] msm_dpu ae01000.display-controller:
>> drm_WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev))
>> [ 31.157177] WARNING: CPU: 0 PID: 13 at drivers/gpu/drm/drm_vblank.c:728
>> drm_crtc_vblank_helper_get_vblank_timestamp_internal
>> (drivers/gpu/drm/drm_vblank.c:728)
>> [ 31.180629] Modules linked in:
>> [ 31.184106] CPU: 0 PID: 13 Comm: kworker/0:1 Not tainted
>> 6.3.0-rc2-00008-gd39e48ca80c0 #542
>> [ 31.193358] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
>> [ 31.200796] Workqueue: events lt9611uxc_hpd_work
>> [ 31.205990] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
>> BTYPE=--)
>> [ 31.213722] pc : drm_crtc_vblank_helper_get_vblank_timestamp_internal
>> (drivers/gpu/drm/drm_vblank.c:728)
>> [ 31.222032] lr : drm_crtc_vblank_helper_get_vblank_timestamp_internal
>> (drivers/gpu/drm/drm_vblank.c:728)
>> [ 31.230341] sp : ffff8000080bb8d0
>> [ 31.234061] x29: ffff8000080bb900 x28: 0000000000000038 x27:
>> ffff61a7956b8d60
>> [ 31.242051] x26: 0000000000000000 x25: 0000000000000000 x24:
>> ffff8000080bb9c4
>> [ 31.250038] x23: 0000000000000001 x22: ffffbf0033b94ef0 x21:
>> ffff61a7957901d0
>> [ 31.258029] x20: ffff61a795710000 x19: ffff61a78128b000 x18:
>> fffffffffffec278
>> [ 31.266014] x17: 0040000000000465 x16: 0000000000000020 x15:
>> 0000000000000060
>> [ 31.274001] x14: 0000000000000001 x13: ffffbf00354550e0 x12:
>> 0000000000000825
>> [ 31.281989] x11: 00000000000002b7 x10: ffffbf00354b1208 x9 :
>> ffffbf00354550e0
>> [ 31.289976] x8 : 00000000ffffefff x7 : ffffbf00354ad0e0 x6 :
>> 00000000000002b7
>> [ 31.297963] x5 : ffff61a8feebbe48 x4 : 40000000fffff2b7 x3 :
>> ffffa2a8c9f64000
>> [ 31.305947] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
>> ffff61a780283100
>> [ 31.313934] Call trace:
>> [ 31.316719] drm_crtc_vblank_helper_get_vblank_timestamp_internal
>> (drivers/gpu/drm/drm_vblank.c:728)
>> [ 31.324646] drm_crtc_vblank_helper_get_vblank_timestamp
>> (drivers/gpu/drm/drm_vblank.c:843)
>> [ 31.331528] drm_crtc_get_last_vbltimestamp
>> (drivers/gpu/drm/drm_vblank.c:884)
>> [ 31.337170] drm_crtc_next_vblank_start
>> (drivers/gpu/drm/drm_vblank.c:1006)
>> [ 31.342430] drm_atomic_helper_wait_for_fences
>> (drivers/gpu/drm/drm_atomic_helper.c:1531
>> drivers/gpu/drm/drm_atomic_helper.c:1578)
>> [ 31.348561] drm_atomic_helper_commit
>> (drivers/gpu/drm/drm_atomic_helper.c:2007)
>> [ 31.353724] drm_atomic_commit (drivers/gpu/drm/drm_atomic.c:1444)
>> [ 31.358127] drm_client_modeset_commit_atomic
>> (drivers/gpu/drm/drm_client_modeset.c:1045)
>> [ 31.364146] drm_client_modeset_commit_locked
>> (drivers/gpu/drm/drm_client_modeset.c:1148)
>> [ 31.370071] drm_client_modeset_commit
>> (drivers/gpu/drm/drm_client_modeset.c:1174)
>> [ 31.375233] drm_fb_helper_set_par (drivers/gpu/drm/drm_fb_helper.c:254
>> drivers/gpu/drm/drm_fb_helper.c:229 drivers/gpu/drm/drm_fb_helper.c:1644)
>> [ 31.380108] drm_fb_helper_hotplug_event
>> (drivers/gpu/drm/drm_fb_helper.c:2302 (discriminator 4))
>> [ 31.385456] drm_fb_helper_output_poll_changed
>> (drivers/gpu/drm/drm_fb_helper.c:2331)
>> [ 31.391376] drm_kms_helper_hotplug_event
>> (drivers/gpu/drm/drm_probe_helper.c:697)
>> [ 31.396825] drm_bridge_connector_hpd_cb
>> (drivers/gpu/drm/drm_bridge_connector.c:129)
>> [ 31.402175] drm_bridge_hpd_notify (drivers/gpu/drm/drm_bridge.c:1315)
>> [ 31.406954] lt9611uxc_hpd_work
>> (drivers/gpu/drm/bridge/lontium-lt9611uxc.c:185)
>> [ 31.411450] process_one_work (kernel/workqueue.c:2395)
>> [ 31.415949] worker_thread (include/linux/list.h:292
>> kernel/workqueue.c:2538)
>> [ 31.426843] kthread (kernel/kthread.c:376)
>> [ 31.437182] ret_from_fork (arch/arm64/kernel/entry.S:871)
>> [ 31.447828] irq event stamp: 44642
>> [ 31.458284] hardirqs last enabled at (44641): __up_console_sem
>> (arch/arm64/include/asm/irqflags.h:182 (discriminator 1)
>> arch/arm64/include/asm/irqflags.h:202 (discriminator 1)
>> kernel/printk/printk.c:345 (discriminator 1))
>> [ 31.474540] hardirqs last disabled at (44642): el1_dbg
>> (arch/arm64/kernel/entry-common.c:335 arch/arm64/kernel/entry-common.c:406)
>> [ 31.489882] softirqs last enabled at (42912): _stext
>> (arch/arm64/include/asm/current.h:19 arch/arm64/include/asm/preempt.h:13
>> kernel/softirq.c:415 kernel/softirq.c:600)
>> [ 31.505256] softirqs last disabled at (42907): ____do_softirq
>> (arch/arm64/kernel/irq.c:81)
>> [ 31.521139] ---[ end trace 0000000000000000 ]---
>>
>>
>>
>> --
>> With best wishes
>> Dmitry
>>
>

--
With best wishes
Dmitry
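The drm_vblank.c half of the diff can be modelled as a standalone sketch. It bails out early when the device has no vblank support or when frame/line durations are unknown, and otherwise derives the next vblank start from the last vblank timestamp. Several assumptions apply. The last timestamp is taken to mark the start of scanout of the current frame. Vblank is assumed to begin `framedur_ns * crtc_vblank_start / crtc_vtotal` nanoseconds later. In this sketch the device-level check runs before any per-CRTC state is touched. `struct sketch_vblank` and `sketch_next_vblank_start()` are hypothetical names. The field names mirror `struct drm_vblank_crtc` and `struct drm_display_mode`, but this is not the kernel implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for per-CRTC vblank state plus hwmode. */
struct sketch_vblank {
	uint64_t framedur_ns;       /* duration of one frame */
	uint64_t linedur_ns;        /* duration of one scanline */
	uint32_t crtc_vblank_start; /* scanline at which vblank begins */
	uint32_t crtc_vtotal;       /* total scanlines per frame */
	int64_t last_vblank_ns;     /* assumed: start of scanout of current frame */
};

/* Sketch only: returns 0 and fills *vblanktime_ns, or -EINVAL. */
static int sketch_next_vblank_start(bool dev_has_vblank,
				    const struct sketch_vblank *vblank,
				    int64_t *vblanktime_ns)
{
	uint64_t offset;

	/* Check device-level vblank support before using per-CRTC state. */
	if (!dev_has_vblank || !vblank)
		return -EINVAL;

	/* Timing not known yet (e.g. CRTC never enabled). */
	if (!vblank->framedur_ns || !vblank->linedur_ns || !vblank->crtc_vtotal)
		return -EINVAL;

	/* Fraction of the frame that elapses before vblank starts. */
	offset = vblank->framedur_ns * vblank->crtc_vblank_start /
		 vblank->crtc_vtotal;
	*vblanktime_ns = vblank->last_vblank_ns + (int64_t)offset;
	return 0;
}
```

Take a frame with a 1080-line active area in 1125 total lines at roughly 60 Hz (`framedur_ns = 16666667`). The sketch places the next vblank start 16,000,000 ns after the last timestamp.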

2023-04-05 08:00:08

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

On Wed, Apr 05, 2023 at 12:53:29AM +0300, Dmitry Baryshkov wrote:
> On 04/04/2023 22:16, Daniel Vetter wrote:
> > On Tue, Apr 04, 2023 at 08:22:05PM +0300, Dmitry Baryshkov wrote:
> > > On 08/03/2023 17:53, Rob Clark wrote:
> > > > From: Rob Clark <[email protected]>
> > > >
> > > > For an atomic commit updating a single CRTC (ie. a pageflip) calculate
> > > > the next vblank time, and inform the fence(s) of that deadline.
> > > >
> > > > v2: Comment typo fix (danvet)
> > > > v3: If there are multiple CRTCs, consider the time of the soonest vblank
> > > >
> > > > Signed-off-by: Rob Clark <[email protected]>
> > > > Reviewed-by: Daniel Vetter <[email protected]>
> > > > Signed-off-by: Rob Clark <[email protected]>
> > > > ---
> > > > drivers/gpu/drm/drm_atomic_helper.c | 37 +++++++++++++++++++++++++++++
> > > > 1 file changed, 37 insertions(+)
> > >
> > > As I started playing with hotplug on RB5 (sm8250, DSI-HDMI bridge), I found
> > > that this patch introduces the following backtrace on HDMI hotplug. Is there
> > > anything that I can do to debug/fix the issue? The warning seems harmless,
> > > but it would probably be good to still fix it. With addresses decoded:
> >
> > Bit of a shot in the dark, but does the below help?
>
> This indeed seems to fix the issue. I'm not sure about the possible side
> effects, but, if you were to send the patch:
>
> Tested-by: Dmitry Baryshkov <[email protected]>

Thanks for the quick feedback. I already discussed this with Rob on IRC
yesterday (and landed his more thorough version of the drm_vblank.c
fix to drm-misc-next). I'll polish the drm_atomic_helper.c part asap and
send it out. It would be great if you could then retest to make sure all
the pieces still work together for your case.
-Daniel

>
> >
> >
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index f21b5a74176c..6640d80d84f3 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1528,6 +1528,9 @@ static void set_fence_deadline(struct drm_device *dev,
> > for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> > ktime_t v;
> >
> > + if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> > + continue;
> > +
> > if (drm_crtc_next_vblank_start(crtc, &v))
> > continue;
> >
> > diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> > index 78a8c51a4abf..7ae38e8e27e8 100644
> > --- a/drivers/gpu/drm/drm_vblank.c
> > +++ b/drivers/gpu/drm/drm_vblank.c
> > @@ -1001,6 +1001,9 @@ int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime)
> > struct drm_display_mode *mode = &vblank->hwmode;
> > u64 vblank_start;
> >
> > + if (!drm_dev_has_vblank(crtc->dev))
> > + return -EINVAL;
> > +
> > if (!vblank->framedur_ns || !vblank->linedur_ns)
> > return -EINVAL;
> >
> >
>
> --
> With best wishes
> Dmitry
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch