Add an atomic helper to implement dirtyfb support. This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).
To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.
Signed-off-by: Rob Clark <[email protected]>
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.
I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place. But we kinda needa a stop-gap solution.
I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit. (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.) The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things. Hence adding an atomic dirtyfb helper.
I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution. Depending on how far off that is, a stop-
gap solution could be useful.
drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
drivers/gpu/drm/msm/msm_fb.c | 1 +
include/drm/drm_atomic_helper.h | 4 +++
include/drm/drm_plane.h | 9 +++++
5 files changed, 84 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index c35654591c12..a578dc681b27 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
if (state->fb)
drm_framebuffer_get(state->fb);
+ state->dirty = false;
state->fence = NULL;
state->commit = NULL;
}
@@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
}
EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
+/**
+ * drm_atomic_helper_dirtyfb - helper for dirtyfb
+ *
+ * A helper to implement drm_framebuffer_funcs::dirty
+ */
+int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
+ struct drm_file *file_priv, unsigned flags,
+ unsigned color, struct drm_clip_rect *clips,
+ unsigned num_clips)
+{
+ struct drm_modeset_acquire_ctx ctx;
+ struct drm_atomic_state *state;
+ struct drm_plane *plane;
+ int ret = 0;
+
+ /*
+ * When called from ioctl, we are interruptable, but not when
+ * called internally (ie. defio worker)
+ */
+ drm_modeset_acquire_init(&ctx,
+ file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
+
+ state = drm_atomic_state_alloc(fb->dev);
+ if (!state) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ state->acquire_ctx = &ctx;
+
+retry:
+ drm_for_each_plane(plane, fb->dev) {
+ struct drm_plane_state *plane_state;
+
+ if (plane->state->fb != fb)
+ continue;
+
+ plane_state = drm_atomic_get_plane_state(state, plane);
+ if (IS_ERR(plane_state)) {
+ ret = PTR_ERR(plane_state);
+ goto out;
+ }
+
+ plane_state->dirty = true;
+ }
+
+ ret = drm_atomic_nonblocking_commit(state);
+
+out:
+ if (ret == -EDEADLK) {
+ drm_atomic_state_clear(state);
+ ret = drm_modeset_backoff(&ctx);
+ if (!ret)
+ goto retry;
+ }
+
+ drm_atomic_state_put(state);
+
+ drm_modeset_drop_locks(&ctx);
+ drm_modeset_acquire_fini(&ctx);
+
+ return ret;
+
+}
+EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
+
/**
* __drm_atomic_helper_private_duplicate_state - copy atomic private state
* @obj: CRTC object
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
index bf5f8c39f34d..bb55a048e98b 100644
--- a/drivers/gpu/drm/msm/msm_atomic.c
+++ b/drivers/gpu/drm/msm/msm_atomic.c
@@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
* Figure out what fence to wait for:
*/
for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
- if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
+ bool sync_fb = new_plane_state->fb &&
+ ((new_plane_state->fb != old_plane_state->fb) ||
+ new_plane_state->dirty);
+ if (sync_fb) {
struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
index 0e0c87252ab0..a5d882a34a33 100644
--- a/drivers/gpu/drm/msm/msm_fb.c
+++ b/drivers/gpu/drm/msm/msm_fb.c
@@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
.create_handle = msm_framebuffer_create_handle,
.destroy = msm_framebuffer_destroy,
+ .dirty = drm_atomic_helper_dirtyfb,
};
#ifdef CONFIG_DEBUG_FS
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 26aaba58d6ce..9b7a95c2643d 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
u16 *red, u16 *green, u16 *blue,
uint32_t size,
struct drm_modeset_acquire_ctx *ctx);
+int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
+ struct drm_file *file_priv, unsigned flags,
+ unsigned color, struct drm_clip_rect *clips,
+ unsigned num_clips);
void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
struct drm_private_state *state);
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index f7bf4a48b1c3..296fa22bda7a 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -76,6 +76,15 @@ struct drm_plane_state {
*/
struct drm_framebuffer *fb;
+ /**
+ * @dirty:
+ *
+ * Flag that indicates the fb contents have changed even though the
+ * fb has not. This is mostly a stop-gap solution until we have
+ * atomic dirty-rect(s) property.
+ */
+ bool dirty;
+
/**
* @fence:
*
--
2.14.3
On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
> Add an atomic helper to implement dirtyfb support. This is needed to
> support DSI command-mode panels with x11 userspace (ie. when we can't
> rely on pageflips to trigger a flush to the panel).
>
> To signal to the driver that the async atomic update needs to
> synchronize with fences, even though the fb didn't change, the
> drm_atomic_state::dirty flag is added.
>
> Signed-off-by: Rob Clark <[email protected]>
> ---
> Background: there are a number of different folks working on getting
> upstream kernel working on various different phones/tablets with qcom
> SoC's.. many of them have command mode panels, so we kind of need a
> way to support the legacy dirtyfb ioctl for x11 support.
>
> I know there is work on a proprer non-legacy atomic property for
> userspace to communicate dirty-rect(s) to the kernel, so this can
> be improved from triggering a full-frame flush once that is in
> place. But we kinda needa a stop-gap solution.
>
> I had considered an in-driver solution for this, but things get a
> bit tricky if userspace ands up combining dirtyfb ioctls with page-
> flips, because we need to synchronize setting various CTL.FLUSH bits
> with setting the CTL.START bit. (ie. really all we need to do for
> cmd mode panels is bang CTL.START, but is this ends up racing with
> pageflips setting FLUSH bits, then bad things.) The easiest soln
> is to wrap this up as an atomic commit and rely on the worker to
> serialize things. Hence adding an atomic dirtyfb helper.
>
> I guess at least the helper, with some small addition to translate
> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> rect property solution. Depending on how far off that is, a stop-
> gap solution could be useful.
Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel
>
> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
> drivers/gpu/drm/msm/msm_fb.c | 1 +
> include/drm/drm_atomic_helper.h | 4 +++
> include/drm/drm_plane.h | 9 +++++
> 5 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index c35654591c12..a578dc681b27 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
> if (state->fb)
> drm_framebuffer_get(state->fb);
>
> + state->dirty = false;
> state->fence = NULL;
> state->commit = NULL;
> }
> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> }
> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>
> +/**
> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
> + *
> + * A helper to implement drm_framebuffer_funcs::dirty
> + */
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> + struct drm_file *file_priv, unsigned flags,
> + unsigned color, struct drm_clip_rect *clips,
> + unsigned num_clips)
> +{
> + struct drm_modeset_acquire_ctx ctx;
> + struct drm_atomic_state *state;
> + struct drm_plane *plane;
> + int ret = 0;
> +
> + /*
> + * When called from ioctl, we are interruptable, but not when
> + * called internally (ie. defio worker)
> + */
> + drm_modeset_acquire_init(&ctx,
> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
> +
> + state = drm_atomic_state_alloc(fb->dev);
> + if (!state) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + state->acquire_ctx = &ctx;
> +
> +retry:
> + drm_for_each_plane(plane, fb->dev) {
> + struct drm_plane_state *plane_state;
> +
> + if (plane->state->fb != fb)
> + continue;
> +
> + plane_state = drm_atomic_get_plane_state(state, plane);
> + if (IS_ERR(plane_state)) {
> + ret = PTR_ERR(plane_state);
> + goto out;
> + }
> +
> + plane_state->dirty = true;
> + }
> +
> + ret = drm_atomic_nonblocking_commit(state);
> +
> +out:
> + if (ret == -EDEADLK) {
> + drm_atomic_state_clear(state);
> + ret = drm_modeset_backoff(&ctx);
> + if (!ret)
> + goto retry;
> + }
> +
> + drm_atomic_state_put(state);
> +
> + drm_modeset_drop_locks(&ctx);
> + drm_modeset_acquire_fini(&ctx);
> +
> + return ret;
> +
> +}
> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
> +
> /**
> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
> * @obj: CRTC object
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> index bf5f8c39f34d..bb55a048e98b 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
> * Figure out what fence to wait for:
> */
> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
> + bool sync_fb = new_plane_state->fb &&
> + ((new_plane_state->fb != old_plane_state->fb) ||
> + new_plane_state->dirty);
> + if (sync_fb) {
> struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
> struct msm_gem_object *msm_obj = to_msm_bo(obj);
> struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
> index 0e0c87252ab0..a5d882a34a33 100644
> --- a/drivers/gpu/drm/msm/msm_fb.c
> +++ b/drivers/gpu/drm/msm/msm_fb.c
> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
> static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
> .create_handle = msm_framebuffer_create_handle,
> .destroy = msm_framebuffer_destroy,
> + .dirty = drm_atomic_helper_dirtyfb,
> };
>
> #ifdef CONFIG_DEBUG_FS
> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> index 26aaba58d6ce..9b7a95c2643d 100644
> --- a/include/drm/drm_atomic_helper.h
> +++ b/include/drm/drm_atomic_helper.h
> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> u16 *red, u16 *green, u16 *blue,
> uint32_t size,
> struct drm_modeset_acquire_ctx *ctx);
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> + struct drm_file *file_priv, unsigned flags,
> + unsigned color, struct drm_clip_rect *clips,
> + unsigned num_clips);
> void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
> struct drm_private_state *state);
>
> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> index f7bf4a48b1c3..296fa22bda7a 100644
> --- a/include/drm/drm_plane.h
> +++ b/include/drm/drm_plane.h
> @@ -76,6 +76,15 @@ struct drm_plane_state {
> */
> struct drm_framebuffer *fb;
>
> + /**
> + * @dirty:
> + *
> + * Flag that indicates the fb contents have changed even though the
> + * fb has not. This is mostly a stop-gap solution until we have
> + * atomic dirty-rect(s) property.
> + */
> + bool dirty;
> +
> /**
> * @fence:
> *
> --
> 2.14.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Hi,
On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
>> Add an atomic helper to implement dirtyfb support. This is needed to
>> support DSI command-mode panels with x11 userspace (ie. when we can't
>> rely on pageflips to trigger a flush to the panel).
>>
>> To signal to the driver that the async atomic update needs to
>> synchronize with fences, even though the fb didn't change, the
>> drm_atomic_state::dirty flag is added.
>>
>> Signed-off-by: Rob Clark <[email protected]>
>> ---
>> Background: there are a number of different folks working on getting
>> upstream kernel working on various different phones/tablets with qcom
>> SoC's.. many of them have command mode panels, so we kind of need a
>> way to support the legacy dirtyfb ioctl for x11 support.
>>
>> I know there is work on a proprer non-legacy atomic property for
>> userspace to communicate dirty-rect(s) to the kernel, so this can
>> be improved from triggering a full-frame flush once that is in
>> place. But we kinda needa a stop-gap solution.
>>
>> I had considered an in-driver solution for this, but things get a
>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> flips, because we need to synchronize setting various CTL.FLUSH bits
>> with setting the CTL.START bit. (ie. really all we need to do for
>> cmd mode panels is bang CTL.START, but is this ends up racing with
>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>> is to wrap this up as an atomic commit and rely on the worker to
>> serialize things. Hence adding an atomic dirtyfb helper.
>>
>> I guess at least the helper, with some small addition to translate
>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> rect property solution. Depending on how far off that is, a stop-
>> gap solution could be useful.
> Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
> -Daniel
I've asked Deepak to RFC the core changes suggested for the full dirty
blob on dri-devel. It builds on DisplayLink's suggestion, with a simple
helper to get to the desired coordinates.
One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context,
the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict
the damage region that can be fully ignored by the driver, new content
is indicated by a new framebuffer.
We could do the same for frontbuffer rendering: Either set a dirty flag
like you do here, or provide a content_age state member. Since we clear
the dirty flag on state copies, I guess that would be sufficient. The
blob rectangles would then become a hint to restrict the damage region.
Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.
I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.
/Thomas
On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
> Hi,
>
> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
> > > Add an atomic helper to implement dirtyfb support. This is needed to
> > > support DSI command-mode panels with x11 userspace (ie. when we can't
> > > rely on pageflips to trigger a flush to the panel).
> > >
> > > To signal to the driver that the async atomic update needs to
> > > synchronize with fences, even though the fb didn't change, the
> > > drm_atomic_state::dirty flag is added.
> > >
> > > Signed-off-by: Rob Clark <[email protected]>
> > > ---
> > > Background: there are a number of different folks working on getting
> > > upstream kernel working on various different phones/tablets with qcom
> > > SoC's.. many of them have command mode panels, so we kind of need a
> > > way to support the legacy dirtyfb ioctl for x11 support.
> > >
> > > I know there is work on a proprer non-legacy atomic property for
> > > userspace to communicate dirty-rect(s) to the kernel, so this can
> > > be improved from triggering a full-frame flush once that is in
> > > place. But we kinda needa a stop-gap solution.
> > >
> > > I had considered an in-driver solution for this, but things get a
> > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> > > flips, because we need to synchronize setting various CTL.FLUSH bits
> > > with setting the CTL.START bit. (ie. really all we need to do for
> > > cmd mode panels is bang CTL.START, but is this ends up racing with
> > > pageflips setting FLUSH bits, then bad things.) The easiest soln
> > > is to wrap this up as an atomic commit and rely on the worker to
> > > serialize things. Hence adding an atomic dirtyfb helper.
> > >
> > > I guess at least the helper, with some small addition to translate
> > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> > > rect property solution. Depending on how far off that is, a stop-
> > > gap solution could be useful.
> > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
> > -Daniel
>
> I've asked Deepak to RFC the core changes suggested for the full dirty blob
> on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
> get to the desired coordinates.
>
> One thing to perhaps discuss is how we would like to fit this with
> front-buffer rendering and the dirty ioctl. In the page-flip context, the
> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
> damage region that can be fully ignored by the driver, new content is
> indicated by a new framebuffer.
>
> We could do the same for frontbuffer rendering: Either set a dirty flag like
> you do here, or provide a content_age state member. Since we clear the dirty
> flag on state copies, I guess that would be sufficient. The blob rectangles
> would then become a hint to restrict the damage region.
I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.
And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).
So from that pov the new dirty flag is kinda not necessary imo.
> Another approach would be to have the presence of dirty rects without
> framebuffer change to indicate frontbuffer rendering.
>
> I think I like the first approach best, although it may be tempting for
> user-space apps to just set the dirty bit instead of providing the full
> damage region.
Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On 04/04/2018 10:43 AM, Daniel Vetter wrote:
> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>> Hi,
>>
>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>> rely on pageflips to trigger a flush to the panel).
>>>>
>>>> To signal to the driver that the async atomic update needs to
>>>> synchronize with fences, even though the fb didn't change, the
>>>> drm_atomic_state::dirty flag is added.
>>>>
>>>> Signed-off-by: Rob Clark <[email protected]>
>>>> ---
>>>> Background: there are a number of different folks working on getting
>>>> upstream kernel working on various different phones/tablets with qcom
>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>
>>>> I know there is work on a proprer non-legacy atomic property for
>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>> be improved from triggering a full-frame flush once that is in
>>>> place. But we kinda needa a stop-gap solution.
>>>>
>>>> I had considered an in-driver solution for this, but things get a
>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>
>>>> I guess at least the helper, with some small addition to translate
>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>> rect property solution. Depending on how far off that is, a stop-
>>>> gap solution could be useful.
>>> Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>>> -Daniel
>> I've asked Deepak to RFC the core changes suggested for the full dirty blob
>> on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>> get to the desired coordinates.
>>
>> One thing to perhaps discuss is how we would like to fit this with
>> front-buffer rendering and the dirty ioctl. In the page-flip context, the
>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>> damage region that can be fully ignored by the driver, new content is
>> indicated by a new framebuffer.
>>
>> We could do the same for frontbuffer rendering: Either set a dirty flag like
>> you do here, or provide a content_age state member. Since we clear the dirty
>> flag on state copies, I guess that would be sufficient. The blob rectangles
>> would then become a hint to restrict the damage region.
> I'm not entirely following here - I thought for frontbuffer rendering the
> dirty rects have always just been a hint, and that the driver was always
> free to re-upload the entire buffer to the screen.
>
> And through a helper like Rob's proposing here (and have floated around in
> different versions already) we'd essentially map a frontbuffer dirtyfb
> call to a fake flip with dirty rect. Manual upload drivers already need to
> upload the entire screen if they get a flip, since some userspace uses
> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>
> So from that pov the new dirty flag is kinda not necessary imo.
>
>> Another approach would be to have the presence of dirty rects without
>> framebuffer change to indicate frontbuffer rendering.
>>
>> I think I like the first approach best, although it may be tempting for
>> user-space apps to just set the dirty bit instead of providing the full
>> damage region.
> Or I'm not following you here, because I don't quite see the difference
> between dirtyfb and a flip.
> -Daniel
OK, let me rephrase:
From the driver's point-of-view, in the atomic world, new content and
the need for manual upload is indicated by a change in fb attached to
the plane.
With Rob's patch here, (correct me if I'm wrong) in addition new content
and the need for manual upload is identified by the dirty flag, (since
the fb stays the same and the driver thus never identifies a page-flip).
In both these situations, clip rects can provide a hint to restrict the
dirty region.
Now I was thinking about the preferred way for user-space to communicate
front buffer rendering through the atomic ioctl:
1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointing to the
same underlying buffer object.
Are you saying that people are already using 3) and we should keep using
that?
/Thomas
On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
> > > Hi,
> > >
> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
> > > > > Add an atomic helper to implement dirtyfb support. This is needed to
> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
> > > > > rely on pageflips to trigger a flush to the panel).
> > > > >
> > > > > To signal to the driver that the async atomic update needs to
> > > > > synchronize with fences, even though the fb didn't change, the
> > > > > drm_atomic_state::dirty flag is added.
> > > > >
> > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > ---
> > > > > Background: there are a number of different folks working on getting
> > > > > upstream kernel working on various different phones/tablets with qcom
> > > > > SoC's.. many of them have command mode panels, so we kind of need a
> > > > > way to support the legacy dirtyfb ioctl for x11 support.
> > > > >
> > > > > I know there is work on a proprer non-legacy atomic property for
> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
> > > > > be improved from triggering a full-frame flush once that is in
> > > > > place. But we kinda needa a stop-gap solution.
> > > > >
> > > > > I had considered an in-driver solution for this, but things get a
> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
> > > > > with setting the CTL.START bit. (ie. really all we need to do for
> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
> > > > > pageflips setting FLUSH bits, then bad things.) The easiest soln
> > > > > is to wrap this up as an atomic commit and rely on the worker to
> > > > > serialize things. Hence adding an atomic dirtyfb helper.
> > > > >
> > > > > I guess at least the helper, with some small addition to translate
> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> > > > > rect property solution. Depending on how far off that is, a stop-
> > > > > gap solution could be useful.
> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
> > > > -Daniel
> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
> > > get to the desired coordinates.
> > >
> > > One thing to perhaps discuss is how we would like to fit this with
> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
> > > damage region that can be fully ignored by the driver, new content is
> > > indicated by a new framebuffer.
> > >
> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
> > > you do here, or provide a content_age state member. Since we clear the dirty
> > > flag on state copies, I guess that would be sufficient. The blob rectangles
> > > would then become a hint to restrict the damage region.
> > I'm not entirely following here - I thought for frontbuffer rendering the
> > dirty rects have always just been a hint, and that the driver was always
> > free to re-upload the entire buffer to the screen.
> >
> > And through a helper like Rob's proposing here (and have floated around in
> > different versions already) we'd essentially map a frontbuffer dirtyfb
> > call to a fake flip with dirty rect. Manual upload drivers already need to
> > upload the entire screen if they get a flip, since some userspace uses
> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
> >
> > So from that pov the new dirty flag is kinda not necessary imo.
> >
> > > Another approach would be to have the presence of dirty rects without
> > > framebuffer change to indicate frontbuffer rendering.
> > >
> > > I think I like the first approach best, although it may be tempting for
> > > user-space apps to just set the dirty bit instead of providing the full
> > > damage region.
> > Or I'm not following you here, because I don't quite see the difference
> > between dirtyfb and a flip.
> > -Daniel
>
> OK, let me rephrase:
>
> From the driver's point-of-view, in the atomic world, new content and the
> need for manual upload is indicated by a change in fb attached to the plane.
>
> With Rob's patch here, (correct me if I'm wrong) in addition new content and
> the need for manual upload is identified by the dirty flag, (since the fb
> stays the same and the driver thus never identifies a page-flip).
Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the same
way as a dirtyfb with the entire buffer as the dirty rectangle.
> In both these situations, clip rects can provide a hint to restrict the
> dirty region.
>
> Now I was thinking about the preferred way for user-space to communicate
> front buffer rendering through the atomic ioctl:
>
> 1) Expose a dirty (or content_age property)
> 2) Attach a clip-rect blob property.
> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
> underlying buffer object.
>
> Are you saying that people are already using 3) and we should keep using
> that?
I'm saying they're using 3b), flip the same buffer wrapped in the same
drm_framebuffer, and expect it to work.
The only advantage dirtyfb has is that it allows you to supply the
optimized upload rectangles, but at the cost of a funny api (it's working
on the fb, not the plane/crtc you want to upload) and lack of drm_event to
confirm when exactly you uploaded your stuff. But imo they should be the
same underlying operation.
Also note that atomic helpers don't optimize out plane flips for same
buffers. We only optimize out some of the waiting, in a failed attempt at
making cursors stall less, but that's not fixed with the async plane
update stuff. And we can obviously optimize out the prepare/cleanup hooks,
because the buffer should be pinned already.
So imo for a quick fix I think we need:
1) Fix drivers (or at least msm) to upload buffers for manual upload on
all flips (well, all screen updates). Probably should do the fence waiting
unconditionally too (and handle cursors with the new async stuff).
2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
commit, which currently means we're throwing the dirty rect optimization
into the wind. But that could easily be added to the drm_plane_state,
without exposing it to userspace as a property just yet.
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
> Add an atomic helper to implement dirtyfb support. This is needed to
> support DSI command-mode panels with x11 userspace (ie. when we can't
> rely on pageflips to trigger a flush to the panel).
>
> To signal to the driver that the async atomic update needs to
> synchronize with fences, even though the fb didn't change, the
> drm_atomic_state::dirty flag is added.
>
> Signed-off-by: Rob Clark <[email protected]>
> ---
> Background: there are a number of different folks working on getting
> upstream kernel working on various different phones/tablets with qcom
> SoC's.. many of them have command mode panels, so we kind of need a
> way to support the legacy dirtyfb ioctl for x11 support.
>
> I know there is work on a proprer non-legacy atomic property for
> userspace to communicate dirty-rect(s) to the kernel, so this can
> be improved from triggering a full-frame flush once that is in
> place. But we kinda needa a stop-gap solution.
>
> I had considered an in-driver solution for this, but things get a
> bit tricky if userspace ands up combining dirtyfb ioctls with page-
> flips, because we need to synchronize setting various CTL.FLUSH bits
> with setting the CTL.START bit. (ie. really all we need to do for
> cmd mode panels is bang CTL.START, but is this ends up racing with
> pageflips setting FLUSH bits, then bad things.) The easiest soln
> is to wrap this up as an atomic commit and rely on the worker to
> serialize things. Hence adding an atomic dirtyfb helper.
>
> I guess at least the helper, with some small addition to translate
> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> rect property solution. Depending on how far off that is, a stop-
> gap solution could be useful.
>
> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
> drivers/gpu/drm/msm/msm_fb.c | 1 +
> include/drm/drm_atomic_helper.h | 4 +++
> include/drm/drm_plane.h | 9 +++++
> 5 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index c35654591c12..a578dc681b27 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
> if (state->fb)
> drm_framebuffer_get(state->fb);
>
> + state->dirty = false;
> state->fence = NULL;
> state->commit = NULL;
> }
> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> }
> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>
> +/**
> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
> + *
> + * A helper to implement drm_framebuffer_funcs::dirty
> + */
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> + struct drm_file *file_priv, unsigned flags,
> + unsigned color, struct drm_clip_rect *clips,
> + unsigned num_clips)
> +{
> + struct drm_modeset_acquire_ctx ctx;
> + struct drm_atomic_state *state;
> + struct drm_plane *plane;
> + int ret = 0;
> +
> + /*
> + * When called from ioctl, we are interruptable, but not when
> + * called internally (ie. defio worker)
> + */
> + drm_modeset_acquire_init(&ctx,
> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
> +
> + state = drm_atomic_state_alloc(fb->dev);
> + if (!state) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + state->acquire_ctx = &ctx;
> +
> +retry:
> + drm_for_each_plane(plane, fb->dev) {
> + struct drm_plane_state *plane_state;
> +
> + if (plane->state->fb != fb)
> + continue;
> +
> + plane_state = drm_atomic_get_plane_state(state, plane);
> + if (IS_ERR(plane_state)) {
> + ret = PTR_ERR(plane_state);
> + goto out;
> + }
> +
> + plane_state->dirty = true;
> + }
> +
> + ret = drm_atomic_nonblocking_commit(state);
> +
> +out:
> + if (ret == -EDEADLK) {
> + drm_atomic_state_clear(state);
> + ret = drm_modeset_backoff(&ctx);
> + if (!ret)
> + goto retry;
> + }
> +
> + drm_atomic_state_put(state);
> +
> + drm_modeset_drop_locks(&ctx);
> + drm_modeset_acquire_fini(&ctx);
> +
> + return ret;
> +
> +}
> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
> +
> /**
> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
> * @obj: CRTC object
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> index bf5f8c39f34d..bb55a048e98b 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
> * Figure out what fence to wait for:
> */
> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
> + bool sync_fb = new_plane_state->fb &&
> + ((new_plane_state->fb != old_plane_state->fb) ||
> + new_plane_state->dirty);
Why do you have this optimization even here? Imo flipping to the same fb
should result in the fb getting fully uploaded, whether you're doing a
legacy page_flip, and atomic one or just a plane update.
Iirc some userspace does use that as essentially a full-plane frontbuffer
rendering flush already. IOW I don't think we need your
plane_state->dirty, it's implied to always be true - why would userspace
do a flip otherwise?
The helper itself to map dirtyfb to a nonblocking atomic commit looks
reasonable, but misses a bunch of the trickery discussed with Noralf and
others I think.
-Daniel
> + if (sync_fb) {
> struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
> struct msm_gem_object *msm_obj = to_msm_bo(obj);
> struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
> index 0e0c87252ab0..a5d882a34a33 100644
> --- a/drivers/gpu/drm/msm/msm_fb.c
> +++ b/drivers/gpu/drm/msm/msm_fb.c
> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
> static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
> .create_handle = msm_framebuffer_create_handle,
> .destroy = msm_framebuffer_destroy,
> + .dirty = drm_atomic_helper_dirtyfb,
> };
>
> #ifdef CONFIG_DEBUG_FS
> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> index 26aaba58d6ce..9b7a95c2643d 100644
> --- a/include/drm/drm_atomic_helper.h
> +++ b/include/drm/drm_atomic_helper.h
> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> u16 *red, u16 *green, u16 *blue,
> uint32_t size,
> struct drm_modeset_acquire_ctx *ctx);
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> + struct drm_file *file_priv, unsigned flags,
> + unsigned color, struct drm_clip_rect *clips,
> + unsigned num_clips);
> void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
> struct drm_private_state *state);
>
> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> index f7bf4a48b1c3..296fa22bda7a 100644
> --- a/include/drm/drm_plane.h
> +++ b/include/drm/drm_plane.h
> @@ -76,6 +76,15 @@ struct drm_plane_state {
> */
> struct drm_framebuffer *fb;
>
> + /**
> + * @dirty:
> + *
> + * Flag that indicates the fb contents have changed even though the
> + * fb has not. This is mostly a stop-gap solution until we have
> + * atomic dirty-rect(s) property.
> + */
> + bool dirty;
> +
> /**
> * @fence:
> *
> --
> 2.14.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
> > Add an atomic helper to implement dirtyfb support. This is needed to
> > support DSI command-mode panels with x11 userspace (ie. when we can't
> > rely on pageflips to trigger a flush to the panel).
> >
> > To signal to the driver that the async atomic update needs to
> > synchronize with fences, even though the fb didn't change, the
> > drm_atomic_state::dirty flag is added.
> >
> > Signed-off-by: Rob Clark <[email protected]>
> > ---
> > Background: there are a number of different folks working on getting
> > upstream kernel working on various different phones/tablets with qcom
> > SoC's.. many of them have command mode panels, so we kind of need a
> > way to support the legacy dirtyfb ioctl for x11 support.
> >
> > I know there is work on a proprer non-legacy atomic property for
> > userspace to communicate dirty-rect(s) to the kernel, so this can
> > be improved from triggering a full-frame flush once that is in
> > place. But we kinda needa a stop-gap solution.
> >
> > I had considered an in-driver solution for this, but things get a
> > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> > flips, because we need to synchronize setting various CTL.FLUSH bits
> > with setting the CTL.START bit. (ie. really all we need to do for
> > cmd mode panels is bang CTL.START, but is this ends up racing with
> > pageflips setting FLUSH bits, then bad things.) The easiest soln
> > is to wrap this up as an atomic commit and rely on the worker to
> > serialize things. Hence adding an atomic dirtyfb helper.
> >
> > I guess at least the helper, with some small addition to translate
> > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> > rect property solution. Depending on how far off that is, a stop-
> > gap solution could be useful.
> >
> > drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
> > drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
> > drivers/gpu/drm/msm/msm_fb.c | 1 +
> > include/drm/drm_atomic_helper.h | 4 +++
> > include/drm/drm_plane.h | 9 +++++
> > 5 files changed, 84 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index c35654591c12..a578dc681b27 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
> > if (state->fb)
> > drm_framebuffer_get(state->fb);
> >
> > + state->dirty = false;
> > state->fence = NULL;
> > state->commit = NULL;
> > }
> > @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> > }
> > EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
> >
> > +/**
> > + * drm_atomic_helper_dirtyfb - helper for dirtyfb
> > + *
> > + * A helper to implement drm_framebuffer_funcs::dirty
> > + */
> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> > + struct drm_file *file_priv, unsigned flags,
> > + unsigned color, struct drm_clip_rect *clips,
> > + unsigned num_clips)
> > +{
> > + struct drm_modeset_acquire_ctx ctx;
> > + struct drm_atomic_state *state;
> > + struct drm_plane *plane;
> > + int ret = 0;
> > +
> > + /*
> > + * When called from ioctl, we are interruptable, but not when
> > + * called internally (ie. defio worker)
> > + */
> > + drm_modeset_acquire_init(&ctx,
> > + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
> > +
> > + state = drm_atomic_state_alloc(fb->dev);
> > + if (!state) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > + state->acquire_ctx = &ctx;
> > +
> > +retry:
> > + drm_for_each_plane(plane, fb->dev) {
> > + struct drm_plane_state *plane_state;
> > +
> > + if (plane->state->fb != fb)
> > + continue;
> > +
> > + plane_state = drm_atomic_get_plane_state(state, plane);
> > + if (IS_ERR(plane_state)) {
> > + ret = PTR_ERR(plane_state);
> > + goto out;
> > + }
> > +
> > + plane_state->dirty = true;
> > + }
> > +
> > + ret = drm_atomic_nonblocking_commit(state);
> > +
> > +out:
> > + if (ret == -EDEADLK) {
> > + drm_atomic_state_clear(state);
> > + ret = drm_modeset_backoff(&ctx);
> > + if (!ret)
> > + goto retry;
> > + }
> > +
> > + drm_atomic_state_put(state);
> > +
> > + drm_modeset_drop_locks(&ctx);
> > + drm_modeset_acquire_fini(&ctx);
> > +
> > + return ret;
> > +
> > +}
> > +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
> > +
> > /**
> > * __drm_atomic_helper_private_duplicate_state - copy atomic private state
> > * @obj: CRTC object
> > diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> > index bf5f8c39f34d..bb55a048e98b 100644
> > --- a/drivers/gpu/drm/msm/msm_atomic.c
> > +++ b/drivers/gpu/drm/msm/msm_atomic.c
> > @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
> > * Figure out what fence to wait for:
> > */
> > for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> > - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
> > + bool sync_fb = new_plane_state->fb &&
> > + ((new_plane_state->fb != old_plane_state->fb) ||
> > + new_plane_state->dirty);
>
> Why do you have this optimization even here? Imo flipping to the same fb
> should result in the fb getting fully uploaded, whether you're doing a
> legacy page_flip, and atomic one or just a plane update.
>
> Iirc some userspace does use that as essentially a full-plane frontbuffer
> rendering flush already. IOW I don't think we need your
> plane_state->dirty, it's implied to always be true - why would userspace
> do a flip otherwise?
>
> The helper itself to map dirtyfb to a nonblocking atomic commit looks
> reasonable, but misses a bunch of the trickery discussed with Noralf and
> others I think.
Ok, I've done some history digging:
- i915 and nouveau unconditionally wait for fences, even for same-fb
flips.
- no idea what amdgpu and vmwgfx are doing, they're not using
plane_state->fence for implicit fences.
- most arm-soc drivers do have this "optimization" in their code, and it
even managed to get into the new drm_gem_fb_prepare_fb helper (which I
reviewed, or well claimed to have ... oops). Afaict it goes back to the
original msm atomic code, and was then dutifully copypasted all over the
place.
If folks are ok I'll do a patch series to align drivers with i915/nouveau.
Well, any driver using reservation_object_get_excl_rcu +
drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
no idea what they're doing or whether they might have the same bug.
From looking at at least the various prepare_fb callbacks I don't see any
other drivers doing funny stuff around implicit fences.
-Daniel
> > + if (sync_fb) {
> > struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
> > struct msm_gem_object *msm_obj = to_msm_bo(obj);
> > struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
> > diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
> > index 0e0c87252ab0..a5d882a34a33 100644
> > --- a/drivers/gpu/drm/msm/msm_fb.c
> > +++ b/drivers/gpu/drm/msm/msm_fb.c
> > @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
> > static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
> > .create_handle = msm_framebuffer_create_handle,
> > .destroy = msm_framebuffer_destroy,
> > + .dirty = drm_atomic_helper_dirtyfb,
> > };
> >
> > #ifdef CONFIG_DEBUG_FS
> > diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> > index 26aaba58d6ce..9b7a95c2643d 100644
> > --- a/include/drm/drm_atomic_helper.h
> > +++ b/include/drm/drm_atomic_helper.h
> > @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> > u16 *red, u16 *green, u16 *blue,
> > uint32_t size,
> > struct drm_modeset_acquire_ctx *ctx);
> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> > + struct drm_file *file_priv, unsigned flags,
> > + unsigned color, struct drm_clip_rect *clips,
> > + unsigned num_clips);
> > void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
> > struct drm_private_state *state);
> >
> > diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> > index f7bf4a48b1c3..296fa22bda7a 100644
> > --- a/include/drm/drm_plane.h
> > +++ b/include/drm/drm_plane.h
> > @@ -76,6 +76,15 @@ struct drm_plane_state {
> > */
> > struct drm_framebuffer *fb;
> >
> > + /**
> > + * @dirty:
> > + *
> > + * Flag that indicates the fb contents have changed even though the
> > + * fb has not. This is mostly a stop-gap solution until we have
> > + * atomic dirty-rect(s) property.
> > + */
> > + bool dirty;
> > +
> > /**
> > * @fence:
> > *
> > --
> > 2.14.3
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On 04/04/2018 11:56 AM, Daniel Vetter wrote:
> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>> Hi,
>>>>
>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
>>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>
>>>>>> To signal to the driver that the async atomic update needs to
>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>
>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>> ---
>>>>>> Background: there are a number of different folks working on getting
>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>
>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>
>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>
>>>>>> I guess at least the helper, with some small addition to translate
>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>> gap solution could be useful.
>>>>> Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>>>>> -Daniel
>>>> I've asked Deepak to RFC the core changes suggested for the full dirty blob
>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>>>> get to the desired coordinates.
>>>>
>>>> One thing to perhaps discuss is how we would like to fit this with
>>>> front-buffer rendering and the dirty ioctl. In the page-flip context, the
>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>>>> damage region that can be fully ignored by the driver, new content is
>>>> indicated by a new framebuffer.
>>>>
>>>> We could do the same for frontbuffer rendering: Either set a dirty flag like
>>>> you do here, or provide a content_age state member. Since we clear the dirty
>>>> flag on state copies, I guess that would be sufficient. The blob rectangles
>>>> would then become a hint to restrict the damage region.
>>> I'm not entirely following here - I thought for frontbuffer rendering the
>>> dirty rects have always just been a hint, and that the driver was always
>>> free to re-upload the entire buffer to the screen.
>>>
>>> And through a helper like Rob's proposing here (and have floated around in
>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>> call to a fake flip with dirty rect. Manual upload drivers already need to
>>> upload the entire screen if they get a flip, since some userspace uses
>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>
>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>
>>>> Another approach would be to have the presence of dirty rects without
>>>> framebuffer change to indicate frontbuffer rendering.
>>>>
>>>> I think I like the first approach best, although it may be tempting for
>>>> user-space apps to just set the dirty bit instead of providing the full
>>>> damage region.
>>> Or I'm not following you here, because I don't quite see the difference
>>> between dirtyfb and a flip.
>>> -Daniel
>> OK, let me rephrase:
>>
>> From the driver's point-of-view, in the atomic world, new content and the
>> need for manual upload is indicated by a change in fb attached to the plane.
>>
>> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>> the need for manual upload is identified by the dirty flag, (since the fb
>> stays the same and the driver thus never identifies a page-flip).
> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> (atomic or not) should result in the entire buffer getting uploaded. The
> dirty flag is kinda redundant, a flip with the same buffer works the same
> way as a dirtyfb with the entire buffer as the dirty rectangle.
>
>> In both these situations, clip rects can provide a hint to restrict the
>> dirty region.
>>
>> Now I was thinking about the preferred way for user-space to communicate
>> front buffer rendering through the atomic ioctl:
>>
>> 1) Expose a dirty (or content_age property)
>> 2) Attach a clip-rect blob property.
>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>> underlying buffer object.
>>
>> Are you saying that people are already using 3) and we should keep using
>> that?
> I'm saying they're using 3b), flip the same buffer wrapped in the same
> drm_framebuffer, and expect it to work.
>
> The only advantage dirtyfb has is that it allows you to supply the
> optimized upload rectangles, but at the cost of a funny api (it's working
> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
> confirm when exactly you uploaded your stuff. But imo they should be the
> same underlying operation.
>
> Also note that atomic helpers don't optimize out plane flips for same
> buffers. We only optimize out some of the waiting, in a failed attempt at
> making cursors stall less, but that's not fixed with the async plane
> update stuff. And we can obviously optimize out the prepare/cleanup hooks,
> because the buffer should be pinned already.
>
I'm a bit confused.
Apparently we have different opinions in when an uploading driver should
assume altered plane content and the need for re-upload.
vmwgfx is from what I know currently assuming that this happens only on
changed fb attachment (what we call a page-flip) whereas if I understand
you correctly it should happen on each atomic state commit?
If we should assume the latter, then it has odd implications, let's say
you have 8 screens up, and you pan one of them on the large fb, why
would you upload the contents of the other 7?
Likewise for cursors, why would you want to upload the cursor image on
each cursor move?
So in my POW, option 1) is the option that aligns with the current
vmwgfx implementation and from what I can tell, what Rob has implemented
in his patch.
/Thomas
Op 04-04-18 om 12:21 schreef Daniel Vetter:
> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>> rely on pageflips to trigger a flush to the panel).
>>>
>>> To signal to the driver that the async atomic update needs to
>>> synchronize with fences, even though the fb didn't change, the
>>> drm_atomic_state::dirty flag is added.
>>>
>>> Signed-off-by: Rob Clark <[email protected]>
>>> ---
>>> Background: there are a number of different folks working on getting
>>> upstream kernel working on various different phones/tablets with qcom
>>> SoC's.. many of them have command mode panels, so we kind of need a
>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>
>>> I know there is work on a proprer non-legacy atomic property for
>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>> be improved from triggering a full-frame flush once that is in
>>> place. But we kinda needa a stop-gap solution.
>>>
>>> I had considered an in-driver solution for this, but things get a
>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>> with setting the CTL.START bit. (ie. really all we need to do for
>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>> is to wrap this up as an atomic commit and rely on the worker to
>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>
>>> I guess at least the helper, with some small addition to translate
>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>> rect property solution. Depending on how far off that is, a stop-
>>> gap solution could be useful.
>>>
>>> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>>> include/drm/drm_atomic_helper.h | 4 +++
>>> include/drm/drm_plane.h | 9 +++++
>>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>> index c35654591c12..a578dc681b27 100644
>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>> if (state->fb)
>>> drm_framebuffer_get(state->fb);
>>>
>>> + state->dirty = false;
>>> state->fence = NULL;
>>> state->commit = NULL;
>>> }
>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>> }
>>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>
>>> +/**
>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>> + *
>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>> + */
>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>> + struct drm_file *file_priv, unsigned flags,
>>> + unsigned color, struct drm_clip_rect *clips,
>>> + unsigned num_clips)
>>> +{
>>> + struct drm_modeset_acquire_ctx ctx;
>>> + struct drm_atomic_state *state;
>>> + struct drm_plane *plane;
>>> + int ret = 0;
>>> +
>>> + /*
>>> + * When called from ioctl, we are interruptable, but not when
>>> + * called internally (ie. defio worker)
>>> + */
>>> + drm_modeset_acquire_init(&ctx,
>>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>> +
>>> + state = drm_atomic_state_alloc(fb->dev);
>>> + if (!state) {
>>> + ret = -ENOMEM;
>>> + goto out;
>>> + }
>>> + state->acquire_ctx = &ctx;
>>> +
>>> +retry:
>>> + drm_for_each_plane(plane, fb->dev) {
>>> + struct drm_plane_state *plane_state;
>>> +
>>> + if (plane->state->fb != fb)
>>> + continue;
>>> +
>>> + plane_state = drm_atomic_get_plane_state(state, plane);
>>> + if (IS_ERR(plane_state)) {
>>> + ret = PTR_ERR(plane_state);
>>> + goto out;
>>> + }
>>> +
>>> + plane_state->dirty = true;
>>> + }
>>> +
>>> + ret = drm_atomic_nonblocking_commit(state);
>>> +
>>> +out:
>>> + if (ret == -EDEADLK) {
>>> + drm_atomic_state_clear(state);
>>> + ret = drm_modeset_backoff(&ctx);
>>> + if (!ret)
>>> + goto retry;
>>> + }
>>> +
>>> + drm_atomic_state_put(state);
>>> +
>>> + drm_modeset_drop_locks(&ctx);
>>> + drm_modeset_acquire_fini(&ctx);
>>> +
>>> + return ret;
>>> +
>>> +}
>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>> +
>>> /**
>>> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>> * @obj: CRTC object
>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>> index bf5f8c39f34d..bb55a048e98b 100644
>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>> * Figure out what fence to wait for:
>>> */
>>> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>> + bool sync_fb = new_plane_state->fb &&
>>> + ((new_plane_state->fb != old_plane_state->fb) ||
>>> + new_plane_state->dirty);
>> Why do you have this optimization even here? Imo flipping to the same fb
>> should result in the fb getting fully uploaded, whether you're doing a
>> legacy page_flip, and atomic one or just a plane update.
>>
>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>> rendering flush already. IOW I don't think we need your
>> plane_state->dirty, it's implied to always be true - why would userspace
>> do a flip otherwise?
>>
>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>> others I think.
> Ok, I've done some history digging:
>
> - i915 and nouveau unconditionally wait for fences, even for same-fb
> flips.
> - no idea what amdgpu and vmwgfx are doing, they're not using
> plane_state->fence for implicit fences.
I thought plane_state->fence was used for explicit fences, so its use by drivers
would interfere with it? I don't think fencing would work on msm or vc4..
> - most arm-soc drivers do have this "optimization" in their code, and it
> even managed to get into the new drm_gem_fb_prepare_fb helper (which I
> reviewed, or well claimed to have ... oops). Afaict it goes back to the
> original msm atomic code, and was then dutifully copypasted all over the
> place.
>
> If folks are ok I'll do a patch series to align drivers with i915/nouveau.
> Well, any driver using reservation_object_get_excl_rcu +
> drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
> no idea what they're doing or whether they might have the same bug.
>
> From looking at at least the various prepare_fb callbacks I don't see any
> other drivers doing funny stuff around implicit fences.
> -Daniel
>
>>> + if (sync_fb) {
>>> struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
>>> struct msm_gem_object *msm_obj = to_msm_bo(obj);
>>> struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
>>> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
>>> index 0e0c87252ab0..a5d882a34a33 100644
>>> --- a/drivers/gpu/drm/msm/msm_fb.c
>>> +++ b/drivers/gpu/drm/msm/msm_fb.c
>>> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
>>> static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>>> .create_handle = msm_framebuffer_create_handle,
>>> .destroy = msm_framebuffer_destroy,
>>> + .dirty = drm_atomic_helper_dirtyfb,
>>> };
>>>
>>> #ifdef CONFIG_DEBUG_FS
>>> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
>>> index 26aaba58d6ce..9b7a95c2643d 100644
>>> --- a/include/drm/drm_atomic_helper.h
>>> +++ b/include/drm/drm_atomic_helper.h
>>> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>> u16 *red, u16 *green, u16 *blue,
>>> uint32_t size,
>>> struct drm_modeset_acquire_ctx *ctx);
>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>> + struct drm_file *file_priv, unsigned flags,
>>> + unsigned color, struct drm_clip_rect *clips,
>>> + unsigned num_clips);
>>> void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
>>> struct drm_private_state *state);
>>>
>>> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
>>> index f7bf4a48b1c3..296fa22bda7a 100644
>>> --- a/include/drm/drm_plane.h
>>> +++ b/include/drm/drm_plane.h
>>> @@ -76,6 +76,15 @@ struct drm_plane_state {
>>> */
>>> struct drm_framebuffer *fb;
>>>
>>> + /**
>>> + * @dirty:
>>> + *
>>> + * Flag that indicates the fb contents have changed even though the
>>> + * fb has not. This is mostly a stop-gap solution until we have
>>> + * atomic dirty-rect(s) property.
>>> + */
>>> + bool dirty;
>>> +
>>> /**
>>> * @fence:
>>> *
>>> --
>>> 2.14.3
>>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
On Wed, Apr 4, 2018 at 6:21 AM, Daniel Vetter <[email protected]> wrote:
> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>> > Add an atomic helper to implement dirtyfb support. This is needed to
>> > support DSI command-mode panels with x11 userspace (ie. when we can't
>> > rely on pageflips to trigger a flush to the panel).
>> >
>> > To signal to the driver that the async atomic update needs to
>> > synchronize with fences, even though the fb didn't change, the
>> > drm_atomic_state::dirty flag is added.
>> >
>> > Signed-off-by: Rob Clark <[email protected]>
>> > ---
>> > Background: there are a number of different folks working on getting
>> > upstream kernel working on various different phones/tablets with qcom
>> > SoC's.. many of them have command mode panels, so we kind of need a
>> > way to support the legacy dirtyfb ioctl for x11 support.
>> >
>> > I know there is work on a proprer non-legacy atomic property for
>> > userspace to communicate dirty-rect(s) to the kernel, so this can
>> > be improved from triggering a full-frame flush once that is in
>> > place. But we kinda needa a stop-gap solution.
>> >
>> > I had considered an in-driver solution for this, but things get a
>> > bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> > flips, because we need to synchronize setting various CTL.FLUSH bits
>> > with setting the CTL.START bit. (ie. really all we need to do for
>> > cmd mode panels is bang CTL.START, but is this ends up racing with
>> > pageflips setting FLUSH bits, then bad things.) The easiest soln
>> > is to wrap this up as an atomic commit and rely on the worker to
>> > serialize things. Hence adding an atomic dirtyfb helper.
>> >
>> > I guess at least the helper, with some small addition to translate
>> > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> > rect property solution. Depending on how far off that is, a stop-
>> > gap solution could be useful.
>> >
>> > drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>> > drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>> > drivers/gpu/drm/msm/msm_fb.c | 1 +
>> > include/drm/drm_atomic_helper.h | 4 +++
>> > include/drm/drm_plane.h | 9 +++++
>> > 5 files changed, 84 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>> > index c35654591c12..a578dc681b27 100644
>> > --- a/drivers/gpu/drm/drm_atomic_helper.c
>> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
>> > @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>> > if (state->fb)
>> > drm_framebuffer_get(state->fb);
>> >
>> > + state->dirty = false;
>> > state->fence = NULL;
>> > state->commit = NULL;
>> > }
>> > @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>> > }
>> > EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>> >
>> > +/**
>> > + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>> > + *
>> > + * A helper to implement drm_framebuffer_funcs::dirty
>> > + */
>> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>> > + struct drm_file *file_priv, unsigned flags,
>> > + unsigned color, struct drm_clip_rect *clips,
>> > + unsigned num_clips)
>> > +{
>> > + struct drm_modeset_acquire_ctx ctx;
>> > + struct drm_atomic_state *state;
>> > + struct drm_plane *plane;
>> > + int ret = 0;
>> > +
>> > + /*
>> > + * When called from ioctl, we are interruptable, but not when
>> > + * called internally (ie. defio worker)
>> > + */
>> > + drm_modeset_acquire_init(&ctx,
>> > + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>> > +
>> > + state = drm_atomic_state_alloc(fb->dev);
>> > + if (!state) {
>> > + ret = -ENOMEM;
>> > + goto out;
>> > + }
>> > + state->acquire_ctx = &ctx;
>> > +
>> > +retry:
>> > + drm_for_each_plane(plane, fb->dev) {
>> > + struct drm_plane_state *plane_state;
>> > +
>> > + if (plane->state->fb != fb)
>> > + continue;
>> > +
>> > + plane_state = drm_atomic_get_plane_state(state, plane);
>> > + if (IS_ERR(plane_state)) {
>> > + ret = PTR_ERR(plane_state);
>> > + goto out;
>> > + }
>> > +
>> > + plane_state->dirty = true;
>> > + }
>> > +
>> > + ret = drm_atomic_nonblocking_commit(state);
>> > +
>> > +out:
>> > + if (ret == -EDEADLK) {
>> > + drm_atomic_state_clear(state);
>> > + ret = drm_modeset_backoff(&ctx);
>> > + if (!ret)
>> > + goto retry;
>> > + }
>> > +
>> > + drm_atomic_state_put(state);
>> > +
>> > + drm_modeset_drop_locks(&ctx);
>> > + drm_modeset_acquire_fini(&ctx);
>> > +
>> > + return ret;
>> > +
>> > +}
>> > +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>> > +
>> > /**
>> > * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>> > * @obj: CRTC object
>> > diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>> > index bf5f8c39f34d..bb55a048e98b 100644
>> > --- a/drivers/gpu/drm/msm/msm_atomic.c
>> > +++ b/drivers/gpu/drm/msm/msm_atomic.c
>> > @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>> > * Figure out what fence to wait for:
>> > */
>> > for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>> > - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>> > + bool sync_fb = new_plane_state->fb &&
>> > + ((new_plane_state->fb != old_plane_state->fb) ||
>> > + new_plane_state->dirty);
>>
>> Why do you have this optimization even here? Imo flipping to the same fb
>> should result in the fb getting fully uploaded, whether you're doing a
>> legacy page_flip, and atomic one or just a plane update.
>>
>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>> rendering flush already. IOW I don't think we need your
>> plane_state->dirty, it's implied to always be true - why would userspace
>> do a flip otherwise?
>>
>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>> others I think.
>
> Ok, I've done some history digging:
>
> - i915 and nouveau unconditionally wait for fences, even for same-fb
> flips.
> - no idea what amdgpu and vmwgfx are doing, they're not using
> plane_state->fence for implicit fences.
> - most arm-soc drivers do have this "optimization" in their code, and it
> even managed to get into the new drm_gem_fb_prepare_fb helper (which I
> reviewed, or well claimed to have ... oops). Afaict it goes back to the
> original msm atomic code, and was then dutifully copypasted all over the
> place.
>
> If folks are ok I'll do a patch series to align drivers with i915/nouveau.
> Well, any driver using reservation_object_get_excl_rcu +
> drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
> no idea what they're doing or whether they might have the same bug.
if you do a patch series, I think do it the other way around and align
to msm ;-)
I think otherwise we could end up stalling while x11 does front buffer
rendering for something unrelated (maybe I was hitting that with
cursor updates?)
BR,
-R
> From looking at at least the various prepare_fb callbacks I don't see any
> other drivers doing funny stuff around implicit fences.
> -Daniel
>
>> > + if (sync_fb) {
>> > struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
>> > struct msm_gem_object *msm_obj = to_msm_bo(obj);
>> > struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
>> > diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
>> > index 0e0c87252ab0..a5d882a34a33 100644
>> > --- a/drivers/gpu/drm/msm/msm_fb.c
>> > +++ b/drivers/gpu/drm/msm/msm_fb.c
>> > @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
>> > static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>> > .create_handle = msm_framebuffer_create_handle,
>> > .destroy = msm_framebuffer_destroy,
>> > + .dirty = drm_atomic_helper_dirtyfb,
>> > };
>> >
>> > #ifdef CONFIG_DEBUG_FS
>> > diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
>> > index 26aaba58d6ce..9b7a95c2643d 100644
>> > --- a/include/drm/drm_atomic_helper.h
>> > +++ b/include/drm/drm_atomic_helper.h
>> > @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>> > u16 *red, u16 *green, u16 *blue,
>> > uint32_t size,
>> > struct drm_modeset_acquire_ctx *ctx);
>> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>> > + struct drm_file *file_priv, unsigned flags,
>> > + unsigned color, struct drm_clip_rect *clips,
>> > + unsigned num_clips);
>> > void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
>> > struct drm_private_state *state);
>> >
>> > diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
>> > index f7bf4a48b1c3..296fa22bda7a 100644
>> > --- a/include/drm/drm_plane.h
>> > +++ b/include/drm/drm_plane.h
>> > @@ -76,6 +76,15 @@ struct drm_plane_state {
>> > */
>> > struct drm_framebuffer *fb;
>> >
>> > + /**
>> > + * @dirty:
>> > + *
>> > + * Flag that indicates the fb contents have changed even though the
>> > + * fb has not. This is mostly a stop-gap solution until we have
>> > + * atomic dirty-rect(s) property.
>> > + */
>> > + bool dirty;
>> > +
>> > /**
>> > * @fence:
>> > *
>> > --
>> > 2.14.3
>> >
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
<[email protected]> wrote:
> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>> rely on pageflips to trigger a flush to the panel).
>>>>
>>>> To signal to the driver that the async atomic update needs to
>>>> synchronize with fences, even though the fb didn't change, the
>>>> drm_atomic_state::dirty flag is added.
>>>>
>>>> Signed-off-by: Rob Clark <[email protected]>
>>>> ---
>>>> Background: there are a number of different folks working on getting
>>>> upstream kernel working on various different phones/tablets with qcom
>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>
>>>> I know there is work on a proprer non-legacy atomic property for
>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>> be improved from triggering a full-frame flush once that is in
>>>> place. But we kinda needa a stop-gap solution.
>>>>
>>>> I had considered an in-driver solution for this, but things get a
>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>
>>>> I guess at least the helper, with some small addition to translate
>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>> rect property solution. Depending on how far off that is, a stop-
>>>> gap solution could be useful.
>>>>
>>>> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>>>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>>>> include/drm/drm_atomic_helper.h | 4 +++
>>>> include/drm/drm_plane.h | 9 +++++
>>>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>> index c35654591c12..a578dc681b27 100644
>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>> if (state->fb)
>>>> drm_framebuffer_get(state->fb);
>>>>
>>>> + state->dirty = false;
>>>> state->fence = NULL;
>>>> state->commit = NULL;
>>>> }
>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>> }
>>>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>
>>>> +/**
>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>> + *
>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>> + */
>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>> + struct drm_file *file_priv, unsigned flags,
>>>> + unsigned color, struct drm_clip_rect *clips,
>>>> + unsigned num_clips)
>>>> +{
>>>> + struct drm_modeset_acquire_ctx ctx;
>>>> + struct drm_atomic_state *state;
>>>> + struct drm_plane *plane;
>>>> + int ret = 0;
>>>> +
>>>> + /*
>>>> + * When called from ioctl, we are interruptable, but not when
>>>> + * called internally (ie. defio worker)
>>>> + */
>>>> + drm_modeset_acquire_init(&ctx,
>>>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>> +
>>>> + state = drm_atomic_state_alloc(fb->dev);
>>>> + if (!state) {
>>>> + ret = -ENOMEM;
>>>> + goto out;
>>>> + }
>>>> + state->acquire_ctx = &ctx;
>>>> +
>>>> +retry:
>>>> + drm_for_each_plane(plane, fb->dev) {
>>>> + struct drm_plane_state *plane_state;
>>>> +
>>>> + if (plane->state->fb != fb)
>>>> + continue;
>>>> +
>>>> + plane_state = drm_atomic_get_plane_state(state, plane);
>>>> + if (IS_ERR(plane_state)) {
>>>> + ret = PTR_ERR(plane_state);
>>>> + goto out;
>>>> + }
>>>> +
>>>> + plane_state->dirty = true;
>>>> + }
>>>> +
>>>> + ret = drm_atomic_nonblocking_commit(state);
>>>> +
>>>> +out:
>>>> + if (ret == -EDEADLK) {
>>>> + drm_atomic_state_clear(state);
>>>> + ret = drm_modeset_backoff(&ctx);
>>>> + if (!ret)
>>>> + goto retry;
>>>> + }
>>>> +
>>>> + drm_atomic_state_put(state);
>>>> +
>>>> + drm_modeset_drop_locks(&ctx);
>>>> + drm_modeset_acquire_fini(&ctx);
>>>> +
>>>> + return ret;
>>>> +
>>>> +}
>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>> +
>>>> /**
>>>> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>> * @obj: CRTC object
>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>> * Figure out what fence to wait for:
>>>> */
>>>> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>> + bool sync_fb = new_plane_state->fb &&
>>>> + ((new_plane_state->fb != old_plane_state->fb) ||
>>>> + new_plane_state->dirty);
>>> Why do you have this optimization even here? Imo flipping to the same fb
>>> should result in the fb getting fully uploaded, whether you're doing a
>>> legacy page_flip, and atomic one or just a plane update.
>>>
>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>> rendering flush already. IOW I don't think we need your
>>> plane_state->dirty, it's implied to always be true - why would userspace
>>> do a flip otherwise?
>>>
>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>> others I think.
>> Ok, I've done some history digging:
>>
>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>> flips.
>> - no idea what amdgpu and vmwgfx are doing, they're not using
>> plane_state->fence for implicit fences.
> I thought plane_state->fence was used for explicit fences, so its use by drivers
> would interfere with it? I don't think fencing would work on msm or vc4..
for implicit fencing we fish out the implicit fence and stuff it in
plane_state->fence..
BR,
-R
>> - most arm-soc drivers do have this "optimization" in their code, and it
>> even managed to get into the new drm_gem_fb_prepare_fb helper (which I
>> reviewed, or well claimed to have ... oops). Afaict it goes back to the
>> original msm atomic code, and was then dutifully copypasted all over the
>> place.
>>
>> If folks are ok I'll do a patch series to align drivers with i915/nouveau.
>> Well, any driver using reservation_object_get_excl_rcu +
>> drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
>> no idea what they're doing or whether they might have the same bug.
>>
>> From looking at at least the various prepare_fb callbacks I don't see any
>> other drivers doing funny stuff around implicit fences.
>> -Daniel
>>
>>>> + if (sync_fb) {
>>>> struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
>>>> struct msm_gem_object *msm_obj = to_msm_bo(obj);
>>>> struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
>>>> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
>>>> index 0e0c87252ab0..a5d882a34a33 100644
>>>> --- a/drivers/gpu/drm/msm/msm_fb.c
>>>> +++ b/drivers/gpu/drm/msm/msm_fb.c
>>>> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
>>>> static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>>>> .create_handle = msm_framebuffer_create_handle,
>>>> .destroy = msm_framebuffer_destroy,
>>>> + .dirty = drm_atomic_helper_dirtyfb,
>>>> };
>>>>
>>>> #ifdef CONFIG_DEBUG_FS
>>>> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
>>>> index 26aaba58d6ce..9b7a95c2643d 100644
>>>> --- a/include/drm/drm_atomic_helper.h
>>>> +++ b/include/drm/drm_atomic_helper.h
>>>> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>> u16 *red, u16 *green, u16 *blue,
>>>> uint32_t size,
>>>> struct drm_modeset_acquire_ctx *ctx);
>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>> + struct drm_file *file_priv, unsigned flags,
>>>> + unsigned color, struct drm_clip_rect *clips,
>>>> + unsigned num_clips);
>>>> void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
>>>> struct drm_private_state *state);
>>>>
>>>> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
>>>> index f7bf4a48b1c3..296fa22bda7a 100644
>>>> --- a/include/drm/drm_plane.h
>>>> +++ b/include/drm/drm_plane.h
>>>> @@ -76,6 +76,15 @@ struct drm_plane_state {
>>>> */
>>>> struct drm_framebuffer *fb;
>>>>
>>>> + /**
>>>> + * @dirty:
>>>> + *
>>>> + * Flag that indicates the fb contents have changed even though the
>>>> + * fb has not. This is mostly a stop-gap solution until we have
>>>> + * atomic dirty-rect(s) property.
>>>> + */
>>>> + bool dirty;
>>>> +
>>>> /**
>>>> * @fence:
>>>> *
>>>> --
>>>> 2.14.3
>>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
>
>
On Wed, Apr 4, 2018 at 5:56 AM, Daniel Vetter <[email protected]> wrote:
> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>> > > Hi,
>> > >
>> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
>> > > > > Add an atomic helper to implement dirtyfb support. This is needed to
>> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
>> > > > > rely on pageflips to trigger a flush to the panel).
>> > > > >
>> > > > > To signal to the driver that the async atomic update needs to
>> > > > > synchronize with fences, even though the fb didn't change, the
>> > > > > drm_atomic_state::dirty flag is added.
>> > > > >
>> > > > > Signed-off-by: Rob Clark <[email protected]>
>> > > > > ---
>> > > > > Background: there are a number of different folks working on getting
>> > > > > upstream kernel working on various different phones/tablets with qcom
>> > > > > SoC's.. many of them have command mode panels, so we kind of need a
>> > > > > way to support the legacy dirtyfb ioctl for x11 support.
>> > > > >
>> > > > > I know there is work on a proprer non-legacy atomic property for
>> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
>> > > > > be improved from triggering a full-frame flush once that is in
>> > > > > place. But we kinda needa a stop-gap solution.
>> > > > >
>> > > > > I had considered an in-driver solution for this, but things get a
>> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
>> > > > > with setting the CTL.START bit. (ie. really all we need to do for
>> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
>> > > > > pageflips setting FLUSH bits, then bad things.) The easiest soln
>> > > > > is to wrap this up as an atomic commit and rely on the worker to
>> > > > > serialize things. Hence adding an atomic dirtyfb helper.
>> > > > >
>> > > > > I guess at least the helper, with some small addition to translate
>> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> > > > > rect property solution. Depending on how far off that is, a stop-
>> > > > > gap solution could be useful.
>> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>> > > > -Daniel
>> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
>> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>> > > get to the desired coordinates.
>> > >
>> > > One thing to perhaps discuss is how we would like to fit this with
>> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
>> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>> > > damage region that can be fully ignored by the driver, new content is
>> > > indicated by a new framebuffer.
>> > >
>> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
>> > > you do here, or provide a content_age state member. Since we clear the dirty
>> > > flag on state copies, I guess that would be sufficient. The blob rectangles
>> > > would then become a hint to restrict the damage region.
>> > I'm not entirely following here - I thought for frontbuffer rendering the
>> > dirty rects have always just been a hint, and that the driver was always
>> > free to re-upload the entire buffer to the screen.
>> >
>> > And through a helper like Rob's proposing here (and have floated around in
>> > different versions already) we'd essentially map a frontbuffer dirtyfb
>> > call to a fake flip with dirty rect. Manual upload drivers already need to
>> > upload the entire screen if they get a flip, since some userspace uses
>> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
>> >
>> > So from that pov the new dirty flag is kinda not necessary imo.
>> >
>> > > Another approach would be to have the presence of dirty rects without
>> > > framebuffer change to indicate frontbuffer rendering.
>> > >
>> > > I think I like the first approach best, although it may be tempting for
>> > > user-space apps to just set the dirty bit instead of providing the full
>> > > damage region.
>> > Or I'm not following you here, because I don't quite see the difference
>> > between dirtyfb and a flip.
>> > -Daniel
>>
>> OK, let me rephrase:
>>
>> From the driver's point-of-view, in the atomic world, new content and the
>> need for manual upload is indicated by a change in fb attached to the plane.
>>
>> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>> the need for manual upload is identified by the dirty flag, (since the fb
>> stays the same and the driver thus never identifies a page-flip).
>
> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> (atomic or not) should result in the entire buffer getting uploaded. The
> dirty flag is kinda redundant, a flip with the same buffer works the same
> way as a dirtyfb with the entire buffer as the dirty rectangle.
Userspace could do a pageflip, but (with buffer-age extension) only
re-render part of the frame. So there is a use-case for full frame
pageflip with hints about what region(s) changed since previous frame.
(Of course userspace could screw that up and get different results on
"command" vs "video" style display.. in that case, it gets to keep
both pieces)
BR,
-R
>> In both these situations, clip rects can provide a hint to restrict the
>> dirty region.
>>
>> Now I was thinking about the preferred way for user-space to communicate
>> front buffer rendering through the atomic ioctl:
>>
>> 1) Expose a dirty (or content_age property)
>> 2) Attach a clip-rect blob property.
>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>> underlying buffer object.
>>
>> Are you saying that people are already using 3) and we should keep using
>> that?
>
> I'm saying they're using 3b), flip the same buffer wrapped in the same
> drm_framebuffer, and expect it to work.
>
> The only advantage dirtyfb has is that it allows you to supply the
> optimized upload rectangles, but at the cost of a funny api (it's working
> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
> confirm when exactly you uploaded your stuff. But imo they should be the
> same underlying operation.
>
> Also note that atomic helpers don't optimize out plane flips for same
> buffers. We only optimize out some of the waiting, in a failed attempt at
> making cursors stall less, but that's not fixed with the async plane
> update stuff. And we can obviously optimize out the prepare/cleanup hooks,
> because the buffer should be pinned already.
>
> So imo for a quick fix I think we need:
> 1) Fix drivers (or at least msm) to upload buffers for manual upload on
> all flips (well, all screen updates). Probably should do the fence waiting
> unconditionally too (and handle cursors with the new async stuff).
> 2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
> commit, which currently means we're throwing the dirty rect optimization
> into the wind. But that could easily be added to the drm_plane_state,
> without exposing it to userspace as a property just yet.
>
> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
On 04/04/2018 12:28 PM, Thomas Hellstrom wrote:
> On 04/04/2018 11:56 AM, Daniel Vetter wrote:
>> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>>> Hi,
>>>>>
>>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]>
>>>>>> wrote:
>>>>>>> Add an atomic helper to implement dirtyfb support. This is
>>>>>>> needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we
>>>>>>> can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on
>>>>>>> getting
>>>>>>> upstream kernel working on various different phones/tablets with
>>>>>>> qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH
>>>>>>> bits
>>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic
>>>>>>> dirty-
>>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>> Adding Noralf, who iirc already posted the full dirty helpers
>>>>>> already somewhere.
>>>>>> -Daniel
>>>>> I've asked Deepak to RFC the core changes suggested for the full
>>>>> dirty blob
>>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple
>>>>> helper to
>>>>> get to the desired coordinates.
>>>>>
>>>>> One thing to perhaps discuss is how we would like to fit this with
>>>>> front-buffer rendering and the dirty ioctl. In the page-flip
>>>>> context, the
>>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to
>>>>> restrict the
>>>>> damage region that can be fully ignored by the driver, new content is
>>>>> indicated by a new framebuffer.
>>>>>
>>>>> We could do the same for frontbuffer rendering: Either set a dirty
>>>>> flag like
>>>>> you do here, or provide a content_age state member. Since we clear
>>>>> the dirty
>>>>> flag on state copies, I guess that would be sufficient. The blob
>>>>> rectangles
>>>>> would then become a hint to restrict the damage region.
>>>> I'm not entirely following here - I thought for frontbuffer
>>>> rendering the
>>>> dirty rects have always just been a hint, and that the driver was
>>>> always
>>>> free to re-upload the entire buffer to the screen.
>>>>
>>>> And through a helper like Rob's proposing here (and have floated
>>>> around in
>>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>>> call to a fake flip with dirty rect. Manual upload drivers already
>>>> need to
>>>> upload the entire screen if they get a flip, since some userspace uses
>>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>>
>>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>>
>>>>> Another approach would be to have the presence of dirty rects without
>>>>> framebuffer change to indicate frontbuffer rendering.
>>>>>
>>>>> I think I like the first approach best, although it may be
>>>>> tempting for
>>>>> user-space apps to just set the dirty bit instead of providing the
>>>>> full
>>>>> damage region.
>>>> Or I'm not following you here, because I don't quite see the
>>>> difference
>>>> between dirtyfb and a flip.
>>>> -Daniel
>>> OK, let me rephrase:
>>>
>>> From the driver's point-of-view, in the atomic world, new content
>>> and the
>>> need for manual upload is indicated by a change in fb attached to
>>> the plane.
>>>
>>> With Rob's patch here, (correct me if I'm wrong) in addition new
>>> content and
>>> the need for manual upload is identified by the dirty flag, (since
>>> the fb
>>> stays the same and the driver thus never identifies a page-flip).
>> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>> (atomic or not) should result in the entire buffer getting uploaded. The
>> dirty flag is kinda redundant, a flip with the same buffer works the
>> same
>> way as a dirtyfb with the entire buffer as the dirty rectangle.
>>
>>> In both these situations, clip rects can provide a hint to restrict the
>>> dirty region.
>>>
>>> Now I was thinking about the preferred way for user-space to
>>> communicate
>>> front buffer rendering through the atomic ioctl:
>>>
>>> 1) Expose a dirty (or content_age property)
>>> 2) Attach a clip-rect blob property.
>>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to
>>> the same
>>> underlying buffer object.
>>>
>>> Are you saying that people are already using 3) and we should keep
>>> using
>>> that?
>> I'm saying they're using 3b), flip the same buffer wrapped in the same
>> drm_framebuffer, and expect it to work.
>>
>> The only advantage dirtyfb has is that it allows you to supply the
>> optimized upload rectangles, but at the cost of a funny api (it's
>> working
>> on the fb, not the plane/crtc you want to upload) and lack of
>> drm_event to
>> confirm when exactly you uploaded your stuff. But imo they should be the
>> same underlying operation.
>>
>> Also note that atomic helpers don't optimize out plane flips for same
>> buffers. We only optimize out some of the waiting, in a failed
>> attempt at
>> making cursors stall less, but that's not fixed with the async plane
>> update stuff. And we can obviously optimize out the prepare/cleanup
>> hooks,
>> because the buffer should be pinned already.
>>
>
> I'm a bit confused.
>
> Apparently we have different opinions in when an uploading driver
> should assume altered plane content and the need for re-upload.
> vmwgfx is from what I know currently assuming that this happens only
> on changed fb attachment (what we call a page-flip) whereas if I
> understand you correctly it should happen on each atomic state commit?
>
> If we should assume the latter, then it has odd implications, let's
> say you have 8 screens up, and you pan one of them on the large fb,
> why would you upload the contents of the other 7?
>
> Likewise for cursors, why would you want to upload the cursor image
> on each cursor move?
>
> So in my POW, option 1) is the option that aligns with the current
> vmwgfx implementation and from what I can tell, what Rob has
> implemented in his patch.
>
Hmm.
After doing some apparently well needed reading up on the code it looks
like vmwgfx is actually doing a full upload on each plane state change,
only those planes that actually got changed are referenced in the
update. So that takes care of the panning example, assuming user-space
is smart enough to leave the unchanged planes / crtcs out of the update.
However, the cursor example still holds, and IMHO we should have a
better way to define content change than plane state update...
/Thomas
> /Thomas
>
>
Op 04-04-18 om 13:37 schreef Rob Clark:
> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
> <[email protected]> wrote:
>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>
>>>>> To signal to the driver that the async atomic update needs to
>>>>> synchronize with fences, even though the fb didn't change, the
>>>>> drm_atomic_state::dirty flag is added.
>>>>>
>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>> ---
>>>>> Background: there are a number of different folks working on getting
>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>
>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>> be improved from triggering a full-frame flush once that is in
>>>>> place. But we kinda needa a stop-gap solution.
>>>>>
>>>>> I had considered an in-driver solution for this, but things get a
>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>
>>>>> I guess at least the helper, with some small addition to translate
>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>> gap solution could be useful.
>>>>>
>>>>> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>>>>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>>>>> include/drm/drm_atomic_helper.h | 4 +++
>>>>> include/drm/drm_plane.h | 9 +++++
>>>>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>> index c35654591c12..a578dc681b27 100644
>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>> if (state->fb)
>>>>> drm_framebuffer_get(state->fb);
>>>>>
>>>>> + state->dirty = false;
>>>>> state->fence = NULL;
>>>>> state->commit = NULL;
>>>>> }
>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>> }
>>>>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>
>>>>> +/**
>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>> + *
>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>> + */
>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>> + struct drm_file *file_priv, unsigned flags,
>>>>> + unsigned color, struct drm_clip_rect *clips,
>>>>> + unsigned num_clips)
>>>>> +{
>>>>> + struct drm_modeset_acquire_ctx ctx;
>>>>> + struct drm_atomic_state *state;
>>>>> + struct drm_plane *plane;
>>>>> + int ret = 0;
>>>>> +
>>>>> + /*
>>>>> + * When called from ioctl, we are interruptable, but not when
>>>>> + * called internally (ie. defio worker)
>>>>> + */
>>>>> + drm_modeset_acquire_init(&ctx,
>>>>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>> +
>>>>> + state = drm_atomic_state_alloc(fb->dev);
>>>>> + if (!state) {
>>>>> + ret = -ENOMEM;
>>>>> + goto out;
>>>>> + }
>>>>> + state->acquire_ctx = &ctx;
>>>>> +
>>>>> +retry:
>>>>> + drm_for_each_plane(plane, fb->dev) {
>>>>> + struct drm_plane_state *plane_state;
>>>>> +
>>>>> + if (plane->state->fb != fb)
>>>>> + continue;
>>>>> +
>>>>> + plane_state = drm_atomic_get_plane_state(state, plane);
>>>>> + if (IS_ERR(plane_state)) {
>>>>> + ret = PTR_ERR(plane_state);
>>>>> + goto out;
>>>>> + }
>>>>> +
>>>>> + plane_state->dirty = true;
>>>>> + }
>>>>> +
>>>>> + ret = drm_atomic_nonblocking_commit(state);
>>>>> +
>>>>> +out:
>>>>> + if (ret == -EDEADLK) {
>>>>> + drm_atomic_state_clear(state);
>>>>> + ret = drm_modeset_backoff(&ctx);
>>>>> + if (!ret)
>>>>> + goto retry;
>>>>> + }
>>>>> +
>>>>> + drm_atomic_state_put(state);
>>>>> +
>>>>> + drm_modeset_drop_locks(&ctx);
>>>>> + drm_modeset_acquire_fini(&ctx);
>>>>> +
>>>>> + return ret;
>>>>> +
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>> +
>>>>> /**
>>>>> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>> * @obj: CRTC object
>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>> * Figure out what fence to wait for:
>>>>> */
>>>>> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>> + bool sync_fb = new_plane_state->fb &&
>>>>> + ((new_plane_state->fb != old_plane_state->fb) ||
>>>>> + new_plane_state->dirty);
>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>> legacy page_flip, and atomic one or just a plane update.
>>>>
>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>> rendering flush already. IOW I don't think we need your
>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>> do a flip otherwise?
>>>>
>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>> others I think.
>>> Ok, I've done some history digging:
>>>
>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>> flips.
>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>> plane_state->fence for implicit fences.
>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>> would interfere with it? I don't think fencing would work on msm or vc4..
> for implicit fencing we fish out the implicit fence and stuff it in
> plane_state->fence..
What happens when userspace passes a fence fd to in_fence_fd?
On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
<[email protected]> wrote:
> Op 04-04-18 om 13:37 schreef Rob Clark:
>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>> <[email protected]> wrote:
>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>
>>>>>> To signal to the driver that the async atomic update needs to
>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>
>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>> ---
>>>>>> Background: there are a number of different folks working on getting
>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>
>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>
>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>
>>>>>> I guess at least the helper, with some small addition to translate
>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>> gap solution could be useful.
>>>>>>
>>>>>> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>>>>>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>>>>>> include/drm/drm_atomic_helper.h | 4 +++
>>>>>> include/drm/drm_plane.h | 9 +++++
>>>>>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>> if (state->fb)
>>>>>> drm_framebuffer_get(state->fb);
>>>>>>
>>>>>> + state->dirty = false;
>>>>>> state->fence = NULL;
>>>>>> state->commit = NULL;
>>>>>> }
>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>> }
>>>>>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>
>>>>>> +/**
>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>> + *
>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>> + */
>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>> + struct drm_file *file_priv, unsigned flags,
>>>>>> + unsigned color, struct drm_clip_rect *clips,
>>>>>> + unsigned num_clips)
>>>>>> +{
>>>>>> + struct drm_modeset_acquire_ctx ctx;
>>>>>> + struct drm_atomic_state *state;
>>>>>> + struct drm_plane *plane;
>>>>>> + int ret = 0;
>>>>>> +
>>>>>> + /*
>>>>>> + * When called from ioctl, we are interruptable, but not when
>>>>>> + * called internally (ie. defio worker)
>>>>>> + */
>>>>>> + drm_modeset_acquire_init(&ctx,
>>>>>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>> +
>>>>>> + state = drm_atomic_state_alloc(fb->dev);
>>>>>> + if (!state) {
>>>>>> + ret = -ENOMEM;
>>>>>> + goto out;
>>>>>> + }
>>>>>> + state->acquire_ctx = &ctx;
>>>>>> +
>>>>>> +retry:
>>>>>> + drm_for_each_plane(plane, fb->dev) {
>>>>>> + struct drm_plane_state *plane_state;
>>>>>> +
>>>>>> + if (plane->state->fb != fb)
>>>>>> + continue;
>>>>>> +
>>>>>> + plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>> + if (IS_ERR(plane_state)) {
>>>>>> + ret = PTR_ERR(plane_state);
>>>>>> + goto out;
>>>>>> + }
>>>>>> +
>>>>>> + plane_state->dirty = true;
>>>>>> + }
>>>>>> +
>>>>>> + ret = drm_atomic_nonblocking_commit(state);
>>>>>> +
>>>>>> +out:
>>>>>> + if (ret == -EDEADLK) {
>>>>>> + drm_atomic_state_clear(state);
>>>>>> + ret = drm_modeset_backoff(&ctx);
>>>>>> + if (!ret)
>>>>>> + goto retry;
>>>>>> + }
>>>>>> +
>>>>>> + drm_atomic_state_put(state);
>>>>>> +
>>>>>> + drm_modeset_drop_locks(&ctx);
>>>>>> + drm_modeset_acquire_fini(&ctx);
>>>>>> +
>>>>>> + return ret;
>>>>>> +
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>> +
>>>>>> /**
>>>>>> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>> * @obj: CRTC object
>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>> * Figure out what fence to wait for:
>>>>>> */
>>>>>> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>> + bool sync_fb = new_plane_state->fb &&
>>>>>> + ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>> + new_plane_state->dirty);
>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>
>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>> rendering flush already. IOW I don't think we need your
>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>> do a flip otherwise?
>>>>>
>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>> others I think.
>>>> Ok, I've done some history digging:
>>>>
>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>> flips.
>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>> plane_state->fence for implicit fences.
>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>> would interfere with it? I don't think fencing would work on msm or vc4..
>> for implicit fencing we fish out the implicit fence and stuff it in
>> plane_state->fence..
> What happens when userspace passes a fence fd to in_fence_fd?
mixing implicit sync and explicit sync is undefined
BR,
-R
On Wed, Apr 04, 2018 at 01:46:37PM +0200, Thomas Hellstrom wrote:
> On 04/04/2018 12:28 PM, Thomas Hellstrom wrote:
> > On 04/04/2018 11:56 AM, Daniel Vetter wrote:
> > > On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
> > > > On 04/04/2018 10:43 AM, Daniel Vetter wrote:
> > > > > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> > > > > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark
> > > > > > > <[email protected]> wrote:
> > > > > > > > Add an atomic helper to implement dirtyfb
> > > > > > > > support.? This is needed to
> > > > > > > > support DSI command-mode panels with x11
> > > > > > > > userspace (ie. when we can't
> > > > > > > > rely on pageflips to trigger a flush to the panel).
> > > > > > > >
> > > > > > > > To signal to the driver that the async atomic update needs to
> > > > > > > > synchronize with fences, even though the fb didn't change, the
> > > > > > > > drm_atomic_state::dirty flag is added.
> > > > > > > >
> > > > > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > > > > ---
> > > > > > > > Background: there are a number of different
> > > > > > > > folks working on getting
> > > > > > > > upstream kernel working on various different
> > > > > > > > phones/tablets with qcom
> > > > > > > > SoC's.. many of them have command mode panels, so we kind of need a
> > > > > > > > way to support the legacy dirtyfb ioctl for x11 support.
> > > > > > > >
> > > > > > > > I know there is work on a proprer non-legacy atomic property for
> > > > > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
> > > > > > > > be improved from triggering a full-frame flush once that is in
> > > > > > > > place.? But we kinda needa a stop-gap solution.
> > > > > > > >
> > > > > > > > I had considered an in-driver solution for this, but things get a
> > > > > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> > > > > > > > flips, because we need to synchronize setting
> > > > > > > > various CTL.FLUSH bits
> > > > > > > > with setting the CTL.START bit.? (ie. really all we need to do for
> > > > > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
> > > > > > > > pageflips setting FLUSH bits, then bad things.)? The easiest soln
> > > > > > > > is to wrap this up as an atomic commit and rely on the worker to
> > > > > > > > serialize things.? Hence adding an atomic dirtyfb helper.
> > > > > > > >
> > > > > > > > I guess at least the helper, with some small addition to translate
> > > > > > > > and pass-thru the dirty rect(s) is useful to the
> > > > > > > > final atomic dirty-
> > > > > > > > rect property solution.? Depending on how far off that is, a stop-
> > > > > > > > gap solution could be useful.
> > > > > > > Adding Noralf, who iirc already posted the full
> > > > > > > dirty helpers already somewhere.
> > > > > > > -Daniel
> > > > > > I've asked Deepak to RFC the core changes suggested for
> > > > > > the full dirty blob
> > > > > > on dri-devel. It builds on DisplayLink's suggestion,
> > > > > > with a simple helper to
> > > > > > get to the desired coordinates.
> > > > > >
> > > > > > One thing to perhaps discuss is how we would like to fit this with
> > > > > > front-buffer rendering and the dirty ioctl. In the
> > > > > > page-flip context, the
> > > > > > dirty rects, like egl's swapbuffer_with_damage is a hint
> > > > > > to restrict the
> > > > > > damage region that can be fully ignored by the driver, new content is
> > > > > > indicated by a new framebuffer.
> > > > > >
> > > > > > We could do the same for frontbuffer rendering: Either
> > > > > > set a dirty flag like
> > > > > > you do here, or provide a content_age state member.
> > > > > > Since we clear the dirty
> > > > > > flag on state copies, I guess that would be sufficient.
> > > > > > The blob rectangles
> > > > > > would then become a hint to restrict the damage region.
> > > > > I'm not entirely following here - I thought for frontbuffer
> > > > > rendering the
> > > > > dirty rects have always just been a hint, and that the
> > > > > driver was always
> > > > > free to re-upload the entire buffer to the screen.
> > > > >
> > > > > And through a helper like Rob's proposing here (and have
> > > > > floated around in
> > > > > different versions already) we'd essentially map a frontbuffer dirtyfb
> > > > > call to a fake flip with dirty rect. Manual upload drivers
> > > > > already need to
> > > > > upload the entire screen if they get a flip, since some userspace uses
> > > > > that to flush out frontbuffer rendering (instead of calling dirtyfb).
> > > > >
> > > > > So from that pov the new dirty flag is kinda not necessary imo.
> > > > >
> > > > > > Another approach would be to have the presence of dirty rects without
> > > > > > framebuffer change to indicate frontbuffer rendering.
> > > > > >
> > > > > > I think I like the first approach best, although it may
> > > > > > be tempting for
> > > > > > user-space apps to just set the dirty bit instead of
> > > > > > providing the full
> > > > > > damage region.
> > > > > Or I'm not following you here, because I don't quite see the
> > > > > difference
> > > > > between dirtyfb and a flip.
> > > > > -Daniel
> > > > OK, let me rephrase:
> > > >
> > > > ?From the driver's point-of-view, in the atomic world, new
> > > > content and the
> > > > need for manual upload is indicated by a change in fb attached
> > > > to the plane.
> > > >
> > > > With Rob's patch here, (correct me if I'm wrong) in addition new
> > > > content and
> > > > the need for manual upload is identified by the dirty flag,
> > > > (since the fb
> > > > stays the same and the driver thus never identifies a page-flip).
> > > Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> > > (atomic or not) should result in the entire buffer getting uploaded. The
> > > dirty flag is kinda redundant, a flip with the same buffer works the
> > > same
> > > way as a dirtyfb with the entire buffer as the dirty rectangle.
> > >
> > > > In both these situations, clip rects can provide a hint to restrict the
> > > > dirty region.
> > > >
> > > > Now I was thinking about the preferred way for user-space to
> > > > communicate
> > > > front buffer rendering through the atomic ioctl:
> > > >
> > > > 1) Expose a dirty (or content_age property)
> > > > 2) Attach a clip-rect blob property.
> > > > 3) Fake a page-flip by ping-ponging two frame-buffers pointing
> > > > to the same
> > > > underlying buffer object.
> > > >
> > > > Are you saying that people are already using 3) and we should
> > > > keep using
> > > > that?
> > > I'm saying they're using 3b), flip the same buffer wrapped in the same
> > > drm_framebuffer, and expect it to work.
> > >
> > > The only advantage dirtyfb has is that it allows you to supply the
> > > optimized upload rectangles, but at the cost of a funny api (it's
> > > working
> > > on the fb, not the plane/crtc you want to upload) and lack of
> > > drm_event to
> > > confirm when exactly you uploaded your stuff. But imo they should be the
> > > same underlying operation.
> > >
> > > Also note that atomic helpers don't optimize out plane flips for same
> > > buffers. We only optimize out some of the waiting, in a failed
> > > attempt at
> > > making cursors stall less, but that's not fixed with the async plane
> > > update stuff. And we can obviously optimize out the prepare/cleanup
> > > hooks,
> > > because the buffer should be pinned already.
> > >
> >
> > I'm a bit confused.
> >
> > Apparently we have different opinions in when an uploading driver should
> > assume altered plane content and the need for re-upload.
> > vmwgfx is from what I know currently assuming that this happens only on
> > changed fb attachment (what we call a page-flip) whereas if I understand
> > you correctly it should happen on each atomic state commit?
> >
> > If we should assume the latter, then it has odd implications, let's say
> > you have 8 screens up, and you pan one of them on the large fb, why
> > would you upload the contents of the other 7?
> >
> > Likewise for cursors,? why would you want to upload the cursor image on
> > each cursor move?
> >
> > So in my POW, option 1) is the option that aligns with the current
> > vmwgfx implementation and from what I can tell, what Rob has implemented
> > in his patch.
> >
>
> Hmm.
>
> After doing some apparently well needed reading up on the code it looks like
> vmwgfx is actually doing a full upload on each plane state change,
> only those planes that actually got changed are referenced in the update. So
> that takes care of the panning example, assuming user-space is smart enough
> to leave the unchanged planes / crtcs out of the update.
>
> However, the cursor example still holds, and IMHO we should have a better
> way to define content change than plane state update...
We're definitely inconsistent here. I guess for manual upload drivers on
real hw panning isn't a note-worthy special case, since if you pan you
need to upload the entire damaged area on the screen anyway. But virtual
hw drivers like vmwgfx are different this way, since panning only means
you transmit the new position to the host driver, but don't have to upload
the entire thing.
And the legacy cursor ioctl seems to be the only thing with that
expectation (although I think funny experiments on i915 used gpu
rendering for that too).
Simply for robustness reasons I'd say that with maybe the exception of
cursors, we should assume that a flip must upload the entire buffer that's
getting flipped, since that's how it works on most drivers.
And then we can add the dirty rectangle stuff so that clever userspace and
clever drives can optimize this all again. And preferrably the cursor
trick would be implemented by having an empty dirty rect, instead of one
that spans the entire fb.
-Daniel
>
> /Thomas
>
>
>
> > /Thomas
> >
> >
>
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Wed, Apr 04, 2018 at 07:40:32AM -0400, Rob Clark wrote:
> On Wed, Apr 4, 2018 at 5:56 AM, Daniel Vetter <[email protected]> wrote:
> > On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
> >> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
> >> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
> >> > > Hi,
> >> > >
> >> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
> >> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
> >> > > > > Add an atomic helper to implement dirtyfb support. This is needed to
> >> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
> >> > > > > rely on pageflips to trigger a flush to the panel).
> >> > > > >
> >> > > > > To signal to the driver that the async atomic update needs to
> >> > > > > synchronize with fences, even though the fb didn't change, the
> >> > > > > drm_atomic_state::dirty flag is added.
> >> > > > >
> >> > > > > Signed-off-by: Rob Clark <[email protected]>
> >> > > > > ---
> >> > > > > Background: there are a number of different folks working on getting
> >> > > > > upstream kernel working on various different phones/tablets with qcom
> >> > > > > SoC's.. many of them have command mode panels, so we kind of need a
> >> > > > > way to support the legacy dirtyfb ioctl for x11 support.
> >> > > > >
> >> > > > > I know there is work on a proprer non-legacy atomic property for
> >> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
> >> > > > > be improved from triggering a full-frame flush once that is in
> >> > > > > place. But we kinda needa a stop-gap solution.
> >> > > > >
> >> > > > > I had considered an in-driver solution for this, but things get a
> >> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> >> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
> >> > > > > with setting the CTL.START bit. (ie. really all we need to do for
> >> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
> >> > > > > pageflips setting FLUSH bits, then bad things.) The easiest soln
> >> > > > > is to wrap this up as an atomic commit and rely on the worker to
> >> > > > > serialize things. Hence adding an atomic dirtyfb helper.
> >> > > > >
> >> > > > > I guess at least the helper, with some small addition to translate
> >> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> >> > > > > rect property solution. Depending on how far off that is, a stop-
> >> > > > > gap solution could be useful.
> >> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
> >> > > > -Daniel
> >> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
> >> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
> >> > > get to the desired coordinates.
> >> > >
> >> > > One thing to perhaps discuss is how we would like to fit this with
> >> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
> >> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
> >> > > damage region that can be fully ignored by the driver, new content is
> >> > > indicated by a new framebuffer.
> >> > >
> >> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
> >> > > you do here, or provide a content_age state member. Since we clear the dirty
> >> > > flag on state copies, I guess that would be sufficient. The blob rectangles
> >> > > would then become a hint to restrict the damage region.
> >> > I'm not entirely following here - I thought for frontbuffer rendering the
> >> > dirty rects have always just been a hint, and that the driver was always
> >> > free to re-upload the entire buffer to the screen.
> >> >
> >> > And through a helper like Rob's proposing here (and have floated around in
> >> > different versions already) we'd essentially map a frontbuffer dirtyfb
> >> > call to a fake flip with dirty rect. Manual upload drivers already need to
> >> > upload the entire screen if they get a flip, since some userspace uses
> >> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
> >> >
> >> > So from that pov the new dirty flag is kinda not necessary imo.
> >> >
> >> > > Another approach would be to have the presence of dirty rects without
> >> > > framebuffer change to indicate frontbuffer rendering.
> >> > >
> >> > > I think I like the first approach best, although it may be tempting for
> >> > > user-space apps to just set the dirty bit instead of providing the full
> >> > > damage region.
> >> > Or I'm not following you here, because I don't quite see the difference
> >> > between dirtyfb and a flip.
> >> > -Daniel
> >>
> >> OK, let me rephrase:
> >>
> >> From the driver's point-of-view, in the atomic world, new content and the
> >> need for manual upload is indicated by a change in fb attached to the plane.
> >>
> >> With Rob's patch here, (correct me if I'm wrong) in addition new content and
> >> the need for manual upload is identified by the dirty flag, (since the fb
> >> stays the same and the driver thus never identifies a page-flip).
> >
> > Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> > (atomic or not) should result in the entire buffer getting uploaded. The
> > dirty flag is kinda redundant, a flip with the same buffer works the same
> > way as a dirtyfb with the entire buffer as the dirty rectangle.
>
> Userspace could do a pageflip, but (with buffer-age extension) only
> re-render part of the frame. So there is a use-case for full frame
> pageflip with hints about what region(s) changed since previous frame.
>
> (Of course userspace could screw that up and get different results on
> "command" vs "video" style display.. in that case, it gets to keep
> both pieces)
I'm not against dirty on flips with the same buffer as an optimization.
Disagreement here is around whether a flip to the same buffer means:
a) nothing changed, upload nothing
b) entire buffer might have changed
I'm proposing we standardize on b) (with maybe the exception of legacy
cursor ioctls because those are special) and add the dirty rects as an
option to optimize this more.
Either way you'd always have to follow the implicit fence (which is what
your msm hunk seems to be doing, but only for dirtyfb).
-Daniel
>
> BR,
> -R
>
>
> >> In both these situations, clip rects can provide a hint to restrict the
> >> dirty region.
> >>
> >> Now I was thinking about the preferred way for user-space to communicate
> >> front buffer rendering through the atomic ioctl:
> >>
> >> 1) Expose a dirty (or content_age property)
> >> 2) Attach a clip-rect blob property.
> >> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
> >> underlying buffer object.
> >>
> >> Are you saying that people are already using 3) and we should keep using
> >> that?
> >
> > I'm saying they're using 3b), flip the same buffer wrapped in the same
> > drm_framebuffer, and expect it to work.
> >
> > The only advantage dirtyfb has is that it allows you to supply the
> > optimized upload rectangles, but at the cost of a funny api (it's working
> > on the fb, not the plane/crtc you want to upload) and lack of drm_event to
> > confirm when exactly you uploaded your stuff. But imo they should be the
> > same underlying operation.
> >
> > Also note that atomic helpers don't optimize out plane flips for same
> > buffers. We only optimize out some of the waiting, in a failed attempt at
> > making cursors stall less, but that's not fixed with the async plane
> > update stuff. And we can obviously optimize out the prepare/cleanup hooks,
> > because the buffer should be pinned already.
> >
> > So imo for a quick fix I think we need:
> > 1) Fix drivers (or at least msm) to upload buffers for manual upload on
> > all flips (well, all screen updates). Probably should do the fence waiting
> > unconditionally too (and handle cursors with the new async stuff).
> > 2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
> > commit, which currently means we're throwing the dirty rect optimization
> > into the wind. But that could easily be added to the drm_plane_state,
> > without exposing it to userspace as a property just yet.
> >
> > Cheers, Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Wed, Apr 04, 2018 at 07:35:58AM -0400, Rob Clark wrote:
> On Wed, Apr 4, 2018 at 6:21 AM, Daniel Vetter <[email protected]> wrote:
> > On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
> >> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
> >> > Add an atomic helper to implement dirtyfb support. This is needed to
> >> > support DSI command-mode panels with x11 userspace (ie. when we can't
> >> > rely on pageflips to trigger a flush to the panel).
> >> >
> >> > To signal to the driver that the async atomic update needs to
> >> > synchronize with fences, even though the fb didn't change, the
> >> > drm_atomic_state::dirty flag is added.
> >> >
> >> > Signed-off-by: Rob Clark <[email protected]>
> >> > ---
> >> > Background: there are a number of different folks working on getting
> >> > upstream kernel working on various different phones/tablets with qcom
> >> > SoC's.. many of them have command mode panels, so we kind of need a
> >> > way to support the legacy dirtyfb ioctl for x11 support.
> >> >
> >> > I know there is work on a proprer non-legacy atomic property for
> >> > userspace to communicate dirty-rect(s) to the kernel, so this can
> >> > be improved from triggering a full-frame flush once that is in
> >> > place. But we kinda needa a stop-gap solution.
> >> >
> >> > I had considered an in-driver solution for this, but things get a
> >> > bit tricky if userspace ands up combining dirtyfb ioctls with page-
> >> > flips, because we need to synchronize setting various CTL.FLUSH bits
> >> > with setting the CTL.START bit. (ie. really all we need to do for
> >> > cmd mode panels is bang CTL.START, but is this ends up racing with
> >> > pageflips setting FLUSH bits, then bad things.) The easiest soln
> >> > is to wrap this up as an atomic commit and rely on the worker to
> >> > serialize things. Hence adding an atomic dirtyfb helper.
> >> >
> >> > I guess at least the helper, with some small addition to translate
> >> > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> >> > rect property solution. Depending on how far off that is, a stop-
> >> > gap solution could be useful.
> >> >
> >> > drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
> >> > drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
> >> > drivers/gpu/drm/msm/msm_fb.c | 1 +
> >> > include/drm/drm_atomic_helper.h | 4 +++
> >> > include/drm/drm_plane.h | 9 +++++
> >> > 5 files changed, 84 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> >> > index c35654591c12..a578dc681b27 100644
> >> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> >> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> >> > @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
> >> > if (state->fb)
> >> > drm_framebuffer_get(state->fb);
> >> >
> >> > + state->dirty = false;
> >> > state->fence = NULL;
> >> > state->commit = NULL;
> >> > }
> >> > @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> >> > }
> >> > EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
> >> >
> >> > +/**
> >> > + * drm_atomic_helper_dirtyfb - helper for dirtyfb
> >> > + *
> >> > + * A helper to implement drm_framebuffer_funcs::dirty
> >> > + */
> >> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> >> > + struct drm_file *file_priv, unsigned flags,
> >> > + unsigned color, struct drm_clip_rect *clips,
> >> > + unsigned num_clips)
> >> > +{
> >> > + struct drm_modeset_acquire_ctx ctx;
> >> > + struct drm_atomic_state *state;
> >> > + struct drm_plane *plane;
> >> > + int ret = 0;
> >> > +
> >> > + /*
> >> > + * When called from ioctl, we are interruptable, but not when
> >> > + * called internally (ie. defio worker)
> >> > + */
> >> > + drm_modeset_acquire_init(&ctx,
> >> > + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
> >> > +
> >> > + state = drm_atomic_state_alloc(fb->dev);
> >> > + if (!state) {
> >> > + ret = -ENOMEM;
> >> > + goto out;
> >> > + }
> >> > + state->acquire_ctx = &ctx;
> >> > +
> >> > +retry:
> >> > + drm_for_each_plane(plane, fb->dev) {
> >> > + struct drm_plane_state *plane_state;
> >> > +
> >> > + if (plane->state->fb != fb)
> >> > + continue;
> >> > +
> >> > + plane_state = drm_atomic_get_plane_state(state, plane);
> >> > + if (IS_ERR(plane_state)) {
> >> > + ret = PTR_ERR(plane_state);
> >> > + goto out;
> >> > + }
> >> > +
> >> > + plane_state->dirty = true;
> >> > + }
> >> > +
> >> > + ret = drm_atomic_nonblocking_commit(state);
> >> > +
> >> > +out:
> >> > + if (ret == -EDEADLK) {
> >> > + drm_atomic_state_clear(state);
> >> > + ret = drm_modeset_backoff(&ctx);
> >> > + if (!ret)
> >> > + goto retry;
> >> > + }
> >> > +
> >> > + drm_atomic_state_put(state);
> >> > +
> >> > + drm_modeset_drop_locks(&ctx);
> >> > + drm_modeset_acquire_fini(&ctx);
> >> > +
> >> > + return ret;
> >> > +
> >> > +}
> >> > +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
> >> > +
> >> > /**
> >> > * __drm_atomic_helper_private_duplicate_state - copy atomic private state
> >> > * @obj: CRTC object
> >> > diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> >> > index bf5f8c39f34d..bb55a048e98b 100644
> >> > --- a/drivers/gpu/drm/msm/msm_atomic.c
> >> > +++ b/drivers/gpu/drm/msm/msm_atomic.c
> >> > @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
> >> > * Figure out what fence to wait for:
> >> > */
> >> > for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> >> > - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
> >> > + bool sync_fb = new_plane_state->fb &&
> >> > + ((new_plane_state->fb != old_plane_state->fb) ||
> >> > + new_plane_state->dirty);
> >>
> >> Why do you have this optimization even here? Imo flipping to the same fb
> >> should result in the fb getting fully uploaded, whether you're doing a
> >> legacy page_flip, and atomic one or just a plane update.
> >>
> >> Iirc some userspace does use that as essentially a full-plane frontbuffer
> >> rendering flush already. IOW I don't think we need your
> >> plane_state->dirty, it's implied to always be true - why would userspace
> >> do a flip otherwise?
> >>
> >> The helper itself to map dirtyfb to a nonblocking atomic commit looks
> >> reasonable, but misses a bunch of the trickery discussed with Noralf and
> >> others I think.
> >
> > Ok, I've done some history digging:
> >
> > - i915 and nouveau unconditionally wait for fences, even for same-fb
> > flips.
> > - no idea what amdgpu and vmwgfx are doing, they're not using
> > plane_state->fence for implicit fences.
> > - most arm-soc drivers do have this "optimization" in their code, and it
> > even managed to get into the new drm_gem_fb_prepare_fb helper (which I
> > reviewed, or well claimed to have ... oops). Afaict it goes back to the
> > original msm atomic code, and was then dutifully copypasted all over the
> > place.
> >
> > If folks are ok I'll do a patch series to align drivers with i915/nouveau.
> > Well, any driver using reservation_object_get_excl_rcu +
> > drm_atomic_set_fence_for_plane combo, since amdgpu and vmwgfx don't I have
> > no idea what they're doing or whether they might have the same bug.
>
> if you do a patch series, I think do it the other way around and align
> to msm ;-)
The problem is that this would break i915 userspace afaik, which I think
is using a frontbuffer page flip to flush out uploads (and re-enable self
refresh and stuff like that). So that we can't do easily.
> I think otherwise we could end up stalling while x11 does front buffer
> rendering for something unrelated (maybe I was hitting that with
> cursor updates?)
Why would you stall on something unrelated if we unconditionally fish out
the fence? Especially cursor rendering isn't done through the gpu at all
afaik.
-Daniel
>
> BR,
> -R
>
> > From looking at at least the various prepare_fb callbacks I don't see any
> > other drivers doing funny stuff around implicit fences.
> > -Daniel
> >
> >> > + if (sync_fb) {
> >> > struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
> >> > struct msm_gem_object *msm_obj = to_msm_bo(obj);
> >> > struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
> >> > diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
> >> > index 0e0c87252ab0..a5d882a34a33 100644
> >> > --- a/drivers/gpu/drm/msm/msm_fb.c
> >> > +++ b/drivers/gpu/drm/msm/msm_fb.c
> >> > @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
> >> > static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
> >> > .create_handle = msm_framebuffer_create_handle,
> >> > .destroy = msm_framebuffer_destroy,
> >> > + .dirty = drm_atomic_helper_dirtyfb,
> >> > };
> >> >
> >> > #ifdef CONFIG_DEBUG_FS
> >> > diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> >> > index 26aaba58d6ce..9b7a95c2643d 100644
> >> > --- a/include/drm/drm_atomic_helper.h
> >> > +++ b/include/drm/drm_atomic_helper.h
> >> > @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> >> > u16 *red, u16 *green, u16 *blue,
> >> > uint32_t size,
> >> > struct drm_modeset_acquire_ctx *ctx);
> >> > +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> >> > + struct drm_file *file_priv, unsigned flags,
> >> > + unsigned color, struct drm_clip_rect *clips,
> >> > + unsigned num_clips);
> >> > void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
> >> > struct drm_private_state *state);
> >> >
> >> > diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> >> > index f7bf4a48b1c3..296fa22bda7a 100644
> >> > --- a/include/drm/drm_plane.h
> >> > +++ b/include/drm/drm_plane.h
> >> > @@ -76,6 +76,15 @@ struct drm_plane_state {
> >> > */
> >> > struct drm_framebuffer *fb;
> >> >
> >> > + /**
> >> > + * @dirty:
> >> > + *
> >> > + * Flag that indicates the fb contents have changed even though the
> >> > + * fb has not. This is mostly a stop-gap solution until we have
> >> > + * atomic dirty-rect(s) property.
> >> > + */
> >> > + bool dirty;
> >> > +
> >> > /**
> >> > * @fence:
> >> > *
> >> > --
> >> > 2.14.3
> >> >
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Op 04-04-18 om 14:05 schreef Rob Clark:
> On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
> <[email protected]> wrote:
>> Op 04-04-18 om 13:37 schreef Rob Clark:
>>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>>> <[email protected]> wrote:
>>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on getting
>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>>>
>>>>>>> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>>>>>>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>>>>>>> include/drm/drm_atomic_helper.h | 4 +++
>>>>>>> include/drm/drm_plane.h | 9 +++++
>>>>>>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>>> if (state->fb)
>>>>>>> drm_framebuffer_get(state->fb);
>>>>>>>
>>>>>>> + state->dirty = false;
>>>>>>> state->fence = NULL;
>>>>>>> state->commit = NULL;
>>>>>>> }
>>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>>> }
>>>>>>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>>
>>>>>>> +/**
>>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>>> + *
>>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>>> + */
>>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>>> + struct drm_file *file_priv, unsigned flags,
>>>>>>> + unsigned color, struct drm_clip_rect *clips,
>>>>>>> + unsigned num_clips)
>>>>>>> +{
>>>>>>> + struct drm_modeset_acquire_ctx ctx;
>>>>>>> + struct drm_atomic_state *state;
>>>>>>> + struct drm_plane *plane;
>>>>>>> + int ret = 0;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * When called from ioctl, we are interruptable, but not when
>>>>>>> + * called internally (ie. defio worker)
>>>>>>> + */
>>>>>>> + drm_modeset_acquire_init(&ctx,
>>>>>>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>>> +
>>>>>>> + state = drm_atomic_state_alloc(fb->dev);
>>>>>>> + if (!state) {
>>>>>>> + ret = -ENOMEM;
>>>>>>> + goto out;
>>>>>>> + }
>>>>>>> + state->acquire_ctx = &ctx;
>>>>>>> +
>>>>>>> +retry:
>>>>>>> + drm_for_each_plane(plane, fb->dev) {
>>>>>>> + struct drm_plane_state *plane_state;
>>>>>>> +
>>>>>>> + if (plane->state->fb != fb)
>>>>>>> + continue;
>>>>>>> +
>>>>>>> + plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>>> + if (IS_ERR(plane_state)) {
>>>>>>> + ret = PTR_ERR(plane_state);
>>>>>>> + goto out;
>>>>>>> + }
>>>>>>> +
>>>>>>> + plane_state->dirty = true;
>>>>>>> + }
>>>>>>> +
>>>>>>> + ret = drm_atomic_nonblocking_commit(state);
>>>>>>> +
>>>>>>> +out:
>>>>>>> + if (ret == -EDEADLK) {
>>>>>>> + drm_atomic_state_clear(state);
>>>>>>> + ret = drm_modeset_backoff(&ctx);
>>>>>>> + if (!ret)
>>>>>>> + goto retry;
>>>>>>> + }
>>>>>>> +
>>>>>>> + drm_atomic_state_put(state);
>>>>>>> +
>>>>>>> + drm_modeset_drop_locks(&ctx);
>>>>>>> + drm_modeset_acquire_fini(&ctx);
>>>>>>> +
>>>>>>> + return ret;
>>>>>>> +
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>>> +
>>>>>>> /**
>>>>>>> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>>> * @obj: CRTC object
>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>>> * Figure out what fence to wait for:
>>>>>>> */
>>>>>>> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>>> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>>> + bool sync_fb = new_plane_state->fb &&
>>>>>>> + ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>>> + new_plane_state->dirty);
>>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>>
>>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>>> rendering flush already. IOW I don't think we need your
>>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>>> do a flip otherwise?
>>>>>>
>>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>>> others I think.
>>>>> Ok, I've done some history digging:
>>>>>
>>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>>> flips.
>>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>>> plane_state->fence for implicit fences.
>>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>>> would interfere with it? I don't think fencing would work on msm or vc4..
>>> for implicit fencing we fish out the implicit fence and stuff it in
>>> plane_state->fence..
>> What happens when userspace passes a fence fd to in_fence_fd?
> mixing implicit sync and explicit sync is undefined
So a drivers implicit sync may override the explicit sync? Would make more sense if the fences were merged.
On Wed, Apr 4, 2018 at 2:05 PM, Rob Clark <[email protected]> wrote:
> On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
> <[email protected]> wrote:
>> Op 04-04-18 om 13:37 schreef Rob Clark:
>>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>>> <[email protected]> wrote:
>>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on getting
>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>>>
>>>>>>> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>>>>>>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>>>>>>> include/drm/drm_atomic_helper.h | 4 +++
>>>>>>> include/drm/drm_plane.h | 9 +++++
>>>>>>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>>> if (state->fb)
>>>>>>> drm_framebuffer_get(state->fb);
>>>>>>>
>>>>>>> + state->dirty = false;
>>>>>>> state->fence = NULL;
>>>>>>> state->commit = NULL;
>>>>>>> }
>>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>>> }
>>>>>>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>>
>>>>>>> +/**
>>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>>> + *
>>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>>> + */
>>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>>> + struct drm_file *file_priv, unsigned flags,
>>>>>>> + unsigned color, struct drm_clip_rect *clips,
>>>>>>> + unsigned num_clips)
>>>>>>> +{
>>>>>>> + struct drm_modeset_acquire_ctx ctx;
>>>>>>> + struct drm_atomic_state *state;
>>>>>>> + struct drm_plane *plane;
>>>>>>> + int ret = 0;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * When called from ioctl, we are interruptable, but not when
>>>>>>> + * called internally (ie. defio worker)
>>>>>>> + */
>>>>>>> + drm_modeset_acquire_init(&ctx,
>>>>>>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>>> +
>>>>>>> + state = drm_atomic_state_alloc(fb->dev);
>>>>>>> + if (!state) {
>>>>>>> + ret = -ENOMEM;
>>>>>>> + goto out;
>>>>>>> + }
>>>>>>> + state->acquire_ctx = &ctx;
>>>>>>> +
>>>>>>> +retry:
>>>>>>> + drm_for_each_plane(plane, fb->dev) {
>>>>>>> + struct drm_plane_state *plane_state;
>>>>>>> +
>>>>>>> + if (plane->state->fb != fb)
>>>>>>> + continue;
>>>>>>> +
>>>>>>> + plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>>> + if (IS_ERR(plane_state)) {
>>>>>>> + ret = PTR_ERR(plane_state);
>>>>>>> + goto out;
>>>>>>> + }
>>>>>>> +
>>>>>>> + plane_state->dirty = true;
>>>>>>> + }
>>>>>>> +
>>>>>>> + ret = drm_atomic_nonblocking_commit(state);
>>>>>>> +
>>>>>>> +out:
>>>>>>> + if (ret == -EDEADLK) {
>>>>>>> + drm_atomic_state_clear(state);
>>>>>>> + ret = drm_modeset_backoff(&ctx);
>>>>>>> + if (!ret)
>>>>>>> + goto retry;
>>>>>>> + }
>>>>>>> +
>>>>>>> + drm_atomic_state_put(state);
>>>>>>> +
>>>>>>> + drm_modeset_drop_locks(&ctx);
>>>>>>> + drm_modeset_acquire_fini(&ctx);
>>>>>>> +
>>>>>>> + return ret;
>>>>>>> +
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>>> +
>>>>>>> /**
>>>>>>> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>>> * @obj: CRTC object
>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>>> * Figure out what fence to wait for:
>>>>>>> */
>>>>>>> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>>> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>>> + bool sync_fb = new_plane_state->fb &&
>>>>>>> + ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>>> + new_plane_state->dirty);
>>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>>
>>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>>> rendering flush already. IOW I don't think we need your
>>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>>> do a flip otherwise?
>>>>>>
>>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>>> others I think.
>>>>> Ok, I've done some history digging:
>>>>>
>>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>>> flips.
>>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>>> plane_state->fence for implicit fences.
>>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>>> would interfere with it? I don't think fencing would work on msm or vc4..
>>> for implicit fencing we fish out the implicit fence and stuff it in
>>> plane_state->fence..
>> What happens when userspace passes a fence fd to in_fence_fd?
>
> mixing implicit sync and explicit sync is undefined
It's very well-defined, explicit sync wins. See the kerneldoc for
drm_atomic_set_fence_for_plane. Since a pile of drivers don't use that
the reality might be undefined though :-)
-Daniel
>
> BR,
> -R
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Op 04-04-18 om 14:26 schreef Daniel Vetter:
> On Wed, Apr 4, 2018 at 2:05 PM, Rob Clark <[email protected]> wrote:
>> On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
>> <[email protected]> wrote:
>>> Op 04-04-18 om 13:37 schreef Rob Clark:
>>>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>>>> <[email protected]> wrote:
>>>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>>
>>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>>
>>>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>>>> ---
>>>>>>>> Background: there are a number of different folks working on getting
>>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>>
>>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>>>
>>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>>>
>>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>>>> gap solution could be useful.
>>>>>>>>
>>>>>>>> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>>>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>>>>>>>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>>>>>>>> include/drm/drm_atomic_helper.h | 4 +++
>>>>>>>> include/drm/drm_plane.h | 9 +++++
>>>>>>>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>>>> if (state->fb)
>>>>>>>> drm_framebuffer_get(state->fb);
>>>>>>>>
>>>>>>>> + state->dirty = false;
>>>>>>>> state->fence = NULL;
>>>>>>>> state->commit = NULL;
>>>>>>>> }
>>>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>>>> }
>>>>>>>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>>>> + *
>>>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>>>> + */
>>>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>>>> + struct drm_file *file_priv, unsigned flags,
>>>>>>>> + unsigned color, struct drm_clip_rect *clips,
>>>>>>>> + unsigned num_clips)
>>>>>>>> +{
>>>>>>>> + struct drm_modeset_acquire_ctx ctx;
>>>>>>>> + struct drm_atomic_state *state;
>>>>>>>> + struct drm_plane *plane;
>>>>>>>> + int ret = 0;
>>>>>>>> +
>>>>>>>> + /*
>>>>>>>> + * When called from ioctl, we are interruptable, but not when
>>>>>>>> + * called internally (ie. defio worker)
>>>>>>>> + */
>>>>>>>> + drm_modeset_acquire_init(&ctx,
>>>>>>>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>>>> +
>>>>>>>> + state = drm_atomic_state_alloc(fb->dev);
>>>>>>>> + if (!state) {
>>>>>>>> + ret = -ENOMEM;
>>>>>>>> + goto out;
>>>>>>>> + }
>>>>>>>> + state->acquire_ctx = &ctx;
>>>>>>>> +
>>>>>>>> +retry:
>>>>>>>> + drm_for_each_plane(plane, fb->dev) {
>>>>>>>> + struct drm_plane_state *plane_state;
>>>>>>>> +
>>>>>>>> + if (plane->state->fb != fb)
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>>>> + if (IS_ERR(plane_state)) {
>>>>>>>> + ret = PTR_ERR(plane_state);
>>>>>>>> + goto out;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + plane_state->dirty = true;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + ret = drm_atomic_nonblocking_commit(state);
>>>>>>>> +
>>>>>>>> +out:
>>>>>>>> + if (ret == -EDEADLK) {
>>>>>>>> + drm_atomic_state_clear(state);
>>>>>>>> + ret = drm_modeset_backoff(&ctx);
>>>>>>>> + if (!ret)
>>>>>>>> + goto retry;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + drm_atomic_state_put(state);
>>>>>>>> +
>>>>>>>> + drm_modeset_drop_locks(&ctx);
>>>>>>>> + drm_modeset_acquire_fini(&ctx);
>>>>>>>> +
>>>>>>>> + return ret;
>>>>>>>> +
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>>>> +
>>>>>>>> /**
>>>>>>>> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>>>> * @obj: CRTC object
>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>>>> * Figure out what fence to wait for:
>>>>>>>> */
>>>>>>>> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>>>> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>>>> + bool sync_fb = new_plane_state->fb &&
>>>>>>>> + ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>>>> + new_plane_state->dirty);
>>>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>>>
>>>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>>>> rendering flush already. IOW I don't think we need your
>>>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>>>> do a flip otherwise?
>>>>>>>
>>>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>>>> others I think.
>>>>>> Ok, I've done some history digging:
>>>>>>
>>>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>>>> flips.
>>>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>>>> plane_state->fence for implicit fences.
>>>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>>>> would interfere with it? I don't think fencing would work on msm or vc4..
>>>> for implicit fencing we fish out the implicit fence and stuff it in
>>>> plane_state->fence..
>>> What happens when userspace passes a fence fd to in_fence_fd?
>> mixing implicit sync and explicit sync is undefined
> It's very well-defined, explicit sync wins. See the kerneldoc for
> drm_atomic_set_fence_for_plane. Since a pile of drivers don't use that
> the reality might be undefined though :-)
Ah right, missed that part.
~Maarten
On Wed, Apr 4, 2018 at 8:16 AM, Daniel Vetter <[email protected]> wrote:
> On Wed, Apr 04, 2018 at 07:40:32AM -0400, Rob Clark wrote:
>> On Wed, Apr 4, 2018 at 5:56 AM, Daniel Vetter <[email protected]> wrote:
>> > On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>> >> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>> >> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>> >> > > Hi,
>> >> > >
>> >> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>> >> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
>> >> > > > > Add an atomic helper to implement dirtyfb support. This is needed to
>> >> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
>> >> > > > > rely on pageflips to trigger a flush to the panel).
>> >> > > > >
>> >> > > > > To signal to the driver that the async atomic update needs to
>> >> > > > > synchronize with fences, even though the fb didn't change, the
>> >> > > > > drm_atomic_state::dirty flag is added.
>> >> > > > >
>> >> > > > > Signed-off-by: Rob Clark <[email protected]>
>> >> > > > > ---
>> >> > > > > Background: there are a number of different folks working on getting
>> >> > > > > upstream kernel working on various different phones/tablets with qcom
>> >> > > > > SoC's.. many of them have command mode panels, so we kind of need a
>> >> > > > > way to support the legacy dirtyfb ioctl for x11 support.
>> >> > > > >
>> >> > > > > I know there is work on a proprer non-legacy atomic property for
>> >> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
>> >> > > > > be improved from triggering a full-frame flush once that is in
>> >> > > > > place. But we kinda needa a stop-gap solution.
>> >> > > > >
>> >> > > > > I had considered an in-driver solution for this, but things get a
>> >> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> >> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
>> >> > > > > with setting the CTL.START bit. (ie. really all we need to do for
>> >> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
>> >> > > > > pageflips setting FLUSH bits, then bad things.) The easiest soln
>> >> > > > > is to wrap this up as an atomic commit and rely on the worker to
>> >> > > > > serialize things. Hence adding an atomic dirtyfb helper.
>> >> > > > >
>> >> > > > > I guess at least the helper, with some small addition to translate
>> >> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> >> > > > > rect property solution. Depending on how far off that is, a stop-
>> >> > > > > gap solution could be useful.
>> >> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>> >> > > > -Daniel
>> >> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
>> >> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>> >> > > get to the desired coordinates.
>> >> > >
>> >> > > One thing to perhaps discuss is how we would like to fit this with
>> >> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
>> >> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>> >> > > damage region that can be fully ignored by the driver, new content is
>> >> > > indicated by a new framebuffer.
>> >> > >
>> >> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
>> >> > > you do here, or provide a content_age state member. Since we clear the dirty
>> >> > > flag on state copies, I guess that would be sufficient. The blob rectangles
>> >> > > would then become a hint to restrict the damage region.
>> >> > I'm not entirely following here - I thought for frontbuffer rendering the
>> >> > dirty rects have always just been a hint, and that the driver was always
>> >> > free to re-upload the entire buffer to the screen.
>> >> >
>> >> > And through a helper like Rob's proposing here (and have floated around in
>> >> > different versions already) we'd essentially map a frontbuffer dirtyfb
>> >> > call to a fake flip with dirty rect. Manual upload drivers already need to
>> >> > upload the entire screen if they get a flip, since some userspace uses
>> >> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
>> >> >
>> >> > So from that pov the new dirty flag is kinda not necessary imo.
>> >> >
>> >> > > Another approach would be to have the presence of dirty rects without
>> >> > > framebuffer change to indicate frontbuffer rendering.
>> >> > >
>> >> > > I think I like the first approach best, although it may be tempting for
>> >> > > user-space apps to just set the dirty bit instead of providing the full
>> >> > > damage region.
>> >> > Or I'm not following you here, because I don't quite see the difference
>> >> > between dirtyfb and a flip.
>> >> > -Daniel
>> >>
>> >> OK, let me rephrase:
>> >>
>> >> From the driver's point-of-view, in the atomic world, new content and the
>> >> need for manual upload is indicated by a change in fb attached to the plane.
>> >>
>> >> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>> >> the need for manual upload is identified by the dirty flag, (since the fb
>> >> stays the same and the driver thus never identifies a page-flip).
>> >
>> > Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>> > (atomic or not) should result in the entire buffer getting uploaded. The
>> > dirty flag is kinda redundant, a flip with the same buffer works the same
>> > way as a dirtyfb with the entire buffer as the dirty rectangle.
>>
>> Userspace could do a pageflip, but (with buffer-age extension) only
>> re-render part of the frame. So there is a use-case for full frame
>> pageflip with hints about what region(s) changed since previous frame.
>>
>> (Of course userspace could screw that up and get different results on
>> "command" vs "video" style display.. in that case, it gets to keep
>> both pieces)
>
> I'm not against dirty on flips with the same buffer as an optimization.
> Disagreement here is around whether a flip to the same buffer means:
> a) nothing changed, upload nothing
> b) entire buffer might have changed
>
> I'm proposing we standardize on b) (with maybe the exception of legacy
> cursor ioctls because those are special) and add the dirty rects as an
> option to optimize this more.
ok, I can agree w/ flip to same buffer, if FB_ID is in the incoming
property list.. I just want a way to differentiate that from "plane
got added to new state for unrelated reasons".. so new_fb != old_fb
isn't a good test for "should we synchronize", but new plane state
isn't either.
Keeping the dirty flag (or sequence count or whatever mechanism) to
indicate that an FB_ID that happened to be the same fb was part of
what userspace passed in would do the trick.
BR,
-R
> Either way you'd always have to follow the implicit fence (which is what
> your msm hunk seems to be doing, but only for dirtyfb).
> -Daniel
>
>>
>> BR,
>> -R
>>
>>
>> >> In both these situations, clip rects can provide a hint to restrict the
>> >> dirty region.
>> >>
>> >> Now I was thinking about the preferred way for user-space to communicate
>> >> front buffer rendering through the atomic ioctl:
>> >>
>> >> 1) Expose a dirty (or content_age property)
>> >> 2) Attach a clip-rect blob property.
>> >> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>> >> underlying buffer object.
>> >>
>> >> Are you saying that people are already using 3) and we should keep using
>> >> that?
>> >
>> > I'm saying they're using 3b), flip the same buffer wrapped in the same
>> > drm_framebuffer, and expect it to work.
>> >
>> > The only advantage dirtyfb has is that it allows you to supply the
>> > optimized upload rectangles, but at the cost of a funny api (it's working
>> > on the fb, not the plane/crtc you want to upload) and lack of drm_event to
>> > confirm when exactly you uploaded your stuff. But imo they should be the
>> > same underlying operation.
>> >
>> > Also note that atomic helpers don't optimize out plane flips for same
>> > buffers. We only optimize out some of the waiting, in a failed attempt at
>> > making cursors stall less, but that's not fixed with the async plane
>> > update stuff. And we can obviously optimize out the prepare/cleanup hooks,
>> > because the buffer should be pinned already.
>> >
>> > So imo for a quick fix I think we need:
>> > 1) Fix drivers (or at least msm) to upload buffers for manual upload on
>> > all flips (well, all screen updates). Probably should do the fence waiting
>> > unconditionally too (and handle cursors with the new async stuff).
>> > 2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
>> > commit, which currently means we're throwing the dirty rect optimization
>> > into the wind. But that could easily be added to the drm_plane_state,
>> > without exposing it to userspace as a property just yet.
>> >
>> > Cheers, Daniel
>> > --
>> > Daniel Vetter
>> > Software Engineer, Intel Corporation
>> > http://blog.ffwll.ch
>> _______________________________________________
>> dri-devel mailing list
>> [email protected]
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
On Wed, Apr 4, 2018 at 8:26 AM, Daniel Vetter <[email protected]> wrote:
> On Wed, Apr 4, 2018 at 2:05 PM, Rob Clark <[email protected]> wrote:
>> On Wed, Apr 4, 2018 at 7:49 AM, Maarten Lankhorst
>> <[email protected]> wrote:
>>> Op 04-04-18 om 13:37 schreef Rob Clark:
>>>> On Wed, Apr 4, 2018 at 6:36 AM, Maarten Lankhorst
>>>> <[email protected]> wrote:
>>>>> Op 04-04-18 om 12:21 schreef Daniel Vetter:
>>>>>> On Wed, Apr 04, 2018 at 12:03:00PM +0200, Daniel Vetter wrote:
>>>>>>> On Tue, Apr 03, 2018 at 06:42:23PM -0400, Rob Clark wrote:
>>>>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>>
>>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>>
>>>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>>>> ---
>>>>>>>> Background: there are a number of different folks working on getting
>>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>>
>>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>>>
>>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>>>
>>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>>>> gap solution could be useful.
>>>>>>>>
>>>>>>>> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
>>>>>>>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>>>>>>>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>>>>>>>> include/drm/drm_atomic_helper.h | 4 +++
>>>>>>>> include/drm/drm_plane.h | 9 +++++
>>>>>>>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> index c35654591c12..a578dc681b27 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>>>>>> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>>>>>>>> if (state->fb)
>>>>>>>> drm_framebuffer_get(state->fb);
>>>>>>>>
>>>>>>>> + state->dirty = false;
>>>>>>>> state->fence = NULL;
>>>>>>>> state->commit = NULL;
>>>>>>>> }
>>>>>>>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
>>>>>>>> }
>>>>>>>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>>>>>>>> + *
>>>>>>>> + * A helper to implement drm_framebuffer_funcs::dirty
>>>>>>>> + */
>>>>>>>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>>>>>>>> + struct drm_file *file_priv, unsigned flags,
>>>>>>>> + unsigned color, struct drm_clip_rect *clips,
>>>>>>>> + unsigned num_clips)
>>>>>>>> +{
>>>>>>>> + struct drm_modeset_acquire_ctx ctx;
>>>>>>>> + struct drm_atomic_state *state;
>>>>>>>> + struct drm_plane *plane;
>>>>>>>> + int ret = 0;
>>>>>>>> +
>>>>>>>> + /*
>>>>>>>> + * When called from ioctl, we are interruptable, but not when
>>>>>>>> + * called internally (ie. defio worker)
>>>>>>>> + */
>>>>>>>> + drm_modeset_acquire_init(&ctx,
>>>>>>>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>>>>>>>> +
>>>>>>>> + state = drm_atomic_state_alloc(fb->dev);
>>>>>>>> + if (!state) {
>>>>>>>> + ret = -ENOMEM;
>>>>>>>> + goto out;
>>>>>>>> + }
>>>>>>>> + state->acquire_ctx = &ctx;
>>>>>>>> +
>>>>>>>> +retry:
>>>>>>>> + drm_for_each_plane(plane, fb->dev) {
>>>>>>>> + struct drm_plane_state *plane_state;
>>>>>>>> +
>>>>>>>> + if (plane->state->fb != fb)
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + plane_state = drm_atomic_get_plane_state(state, plane);
>>>>>>>> + if (IS_ERR(plane_state)) {
>>>>>>>> + ret = PTR_ERR(plane_state);
>>>>>>>> + goto out;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + plane_state->dirty = true;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + ret = drm_atomic_nonblocking_commit(state);
>>>>>>>> +
>>>>>>>> +out:
>>>>>>>> + if (ret == -EDEADLK) {
>>>>>>>> + drm_atomic_state_clear(state);
>>>>>>>> + ret = drm_modeset_backoff(&ctx);
>>>>>>>> + if (!ret)
>>>>>>>> + goto retry;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + drm_atomic_state_put(state);
>>>>>>>> +
>>>>>>>> + drm_modeset_drop_locks(&ctx);
>>>>>>>> + drm_modeset_acquire_fini(&ctx);
>>>>>>>> +
>>>>>>>> + return ret;
>>>>>>>> +
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>>>>>>>> +
>>>>>>>> /**
>>>>>>>> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
>>>>>>>> * @obj: CRTC object
>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> index bf5f8c39f34d..bb55a048e98b 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>>>>>>>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>>>>>>>> * Figure out what fence to wait for:
>>>>>>>> */
>>>>>>>> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
>>>>>>>> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
>>>>>>>> + bool sync_fb = new_plane_state->fb &&
>>>>>>>> + ((new_plane_state->fb != old_plane_state->fb) ||
>>>>>>>> + new_plane_state->dirty);
>>>>>>> Why do you have this optimization even here? Imo flipping to the same fb
>>>>>>> should result in the fb getting fully uploaded, whether you're doing a
>>>>>>> legacy page_flip, and atomic one or just a plane update.
>>>>>>>
>>>>>>> Iirc some userspace does use that as essentially a full-plane frontbuffer
>>>>>>> rendering flush already. IOW I don't think we need your
>>>>>>> plane_state->dirty, it's implied to always be true - why would userspace
>>>>>>> do a flip otherwise?
>>>>>>>
>>>>>>> The helper itself to map dirtyfb to a nonblocking atomic commit looks
>>>>>>> reasonable, but misses a bunch of the trickery discussed with Noralf and
>>>>>>> others I think.
>>>>>> Ok, I've done some history digging:
>>>>>>
>>>>>> - i915 and nouveau unconditionally wait for fences, even for same-fb
>>>>>> flips.
>>>>>> - no idea what amdgpu and vmwgfx are doing, they're not using
>>>>>> plane_state->fence for implicit fences.
>>>>> I thought plane_state->fence was used for explicit fences, so its use by drivers
>>>>> would interfere with it? I don't think fencing would work on msm or vc4..
>>>> for implicit fencing we fish out the implicit fence and stuff it in
>>>> plane_state->fence..
>>> What happens when userspace passes a fence fd to in_fence_fd?
>>
>> mixing implicit sync and explicit sync is undefined
>
> It's very well-defined, explicit sync wins. See the kerneldoc for
> drm_atomic_set_fence_for_plane. Since a pile of drivers don't use that
> the reality might be undefined though :-)
ahh, msm_atomic does do drm_atomic_set_fence_for_plane() for
fished-out-fences so if that dtrt, it should be good..
BR,
-R
> -Daniel
>
>>
>> BR,
>> -R
>> _______________________________________________
>> dri-devel mailing list
>> [email protected]
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Wed, Apr 4, 2018 at 3:24 PM, Rob Clark <[email protected]> wrote:
> On Wed, Apr 4, 2018 at 8:16 AM, Daniel Vetter <[email protected]> wrote:
>> On Wed, Apr 04, 2018 at 07:40:32AM -0400, Rob Clark wrote:
>>> On Wed, Apr 4, 2018 at 5:56 AM, Daniel Vetter <[email protected]> wrote:
>>> > On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>>> >> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>> >> > On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>> >> > > Hi,
>>> >> > >
>>> >> > > On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>> >> > > > On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
>>> >> > > > > Add an atomic helper to implement dirtyfb support. This is needed to
>>> >> > > > > support DSI command-mode panels with x11 userspace (ie. when we can't
>>> >> > > > > rely on pageflips to trigger a flush to the panel).
>>> >> > > > >
>>> >> > > > > To signal to the driver that the async atomic update needs to
>>> >> > > > > synchronize with fences, even though the fb didn't change, the
>>> >> > > > > drm_atomic_state::dirty flag is added.
>>> >> > > > >
>>> >> > > > > Signed-off-by: Rob Clark <[email protected]>
>>> >> > > > > ---
>>> >> > > > > Background: there are a number of different folks working on getting
>>> >> > > > > upstream kernel working on various different phones/tablets with qcom
>>> >> > > > > SoC's.. many of them have command mode panels, so we kind of need a
>>> >> > > > > way to support the legacy dirtyfb ioctl for x11 support.
>>> >> > > > >
>>> >> > > > > I know there is work on a proprer non-legacy atomic property for
>>> >> > > > > userspace to communicate dirty-rect(s) to the kernel, so this can
>>> >> > > > > be improved from triggering a full-frame flush once that is in
>>> >> > > > > place. But we kinda needa a stop-gap solution.
>>> >> > > > >
>>> >> > > > > I had considered an in-driver solution for this, but things get a
>>> >> > > > > bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>> >> > > > > flips, because we need to synchronize setting various CTL.FLUSH bits
>>> >> > > > > with setting the CTL.START bit. (ie. really all we need to do for
>>> >> > > > > cmd mode panels is bang CTL.START, but is this ends up racing with
>>> >> > > > > pageflips setting FLUSH bits, then bad things.) The easiest soln
>>> >> > > > > is to wrap this up as an atomic commit and rely on the worker to
>>> >> > > > > serialize things. Hence adding an atomic dirtyfb helper.
>>> >> > > > >
>>> >> > > > > I guess at least the helper, with some small addition to translate
>>> >> > > > > and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>> >> > > > > rect property solution. Depending on how far off that is, a stop-
>>> >> > > > > gap solution could be useful.
>>> >> > > > Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>>> >> > > > -Daniel
>>> >> > > I've asked Deepak to RFC the core changes suggested for the full dirty blob
>>> >> > > on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>>> >> > > get to the desired coordinates.
>>> >> > >
>>> >> > > One thing to perhaps discuss is how we would like to fit this with
>>> >> > > front-buffer rendering and the dirty ioctl. In the page-flip context, the
>>> >> > > dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>>> >> > > damage region that can be fully ignored by the driver, new content is
>>> >> > > indicated by a new framebuffer.
>>> >> > >
>>> >> > > We could do the same for frontbuffer rendering: Either set a dirty flag like
>>> >> > > you do here, or provide a content_age state member. Since we clear the dirty
>>> >> > > flag on state copies, I guess that would be sufficient. The blob rectangles
>>> >> > > would then become a hint to restrict the damage region.
>>> >> > I'm not entirely following here - I thought for frontbuffer rendering the
>>> >> > dirty rects have always just been a hint, and that the driver was always
>>> >> > free to re-upload the entire buffer to the screen.
>>> >> >
>>> >> > And through a helper like Rob's proposing here (and have floated around in
>>> >> > different versions already) we'd essentially map a frontbuffer dirtyfb
>>> >> > call to a fake flip with dirty rect. Manual upload drivers already need to
>>> >> > upload the entire screen if they get a flip, since some userspace uses
>>> >> > that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>> >> >
>>> >> > So from that pov the new dirty flag is kinda not necessary imo.
>>> >> >
>>> >> > > Another approach would be to have the presence of dirty rects without
>>> >> > > framebuffer change to indicate frontbuffer rendering.
>>> >> > >
>>> >> > > I think I like the first approach best, although it may be tempting for
>>> >> > > user-space apps to just set the dirty bit instead of providing the full
>>> >> > > damage region.
>>> >> > Or I'm not following you here, because I don't quite see the difference
>>> >> > between dirtyfb and a flip.
>>> >> > -Daniel
>>> >>
>>> >> OK, let me rephrase:
>>> >>
>>> >> From the driver's point-of-view, in the atomic world, new content and the
>>> >> need for manual upload is indicated by a change in fb attached to the plane.
>>> >>
>>> >> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>>> >> the need for manual upload is identified by the dirty flag, (since the fb
>>> >> stays the same and the driver thus never identifies a page-flip).
>>> >
>>> > Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>>> > (atomic or not) should result in the entire buffer getting uploaded. The
>>> > dirty flag is kinda redundant, a flip with the same buffer works the same
>>> > way as a dirtyfb with the entire buffer as the dirty rectangle.
>>>
>>> Userspace could do a pageflip, but (with buffer-age extension) only
>>> re-render part of the frame. So there is a use-case for full frame
>>> pageflip with hints about what region(s) changed since previous frame.
>>>
>>> (Of course userspace could screw that up and get different results on
>>> "command" vs "video" style display.. in that case, it gets to keep
>>> both pieces)
>>
>> I'm not against dirty on flips with the same buffer as an optimization.
>> Disagreement here is around whether a flip to the same buffer means:
>> a) nothing changed, upload nothing
>> b) entire buffer might have changed
>>
>> I'm proposing we standardize on b) (with maybe the exception of legacy
>> cursor ioctls because those are special) and add the dirty rects as an
>> option to optimize this more.
>
> ok, I can agree w/ flip to same buffer, if FB_ID is in the incoming
> property list.. I just want a way to differentiate that from "plane
> got added to new state for unrelated reasons".. so new_fb != old_fb
> isn't a good test for "should we synchronize", but new plane state
> isn't either.
Why do you add planes for unrelated reasons? Also I'm not clear on how
looking at FB_ID will work for all the legacy ioctls (plane_update,
page_flip, setcrtc can also be used to do a blocking flip, and is
used).
Atomic core doesn't add any unecessary planes. Atomic helpers add only
the planes for all the crtc you're touching anyway, so maybe we need
to add a flag there. Anything else is up to your driver code.
-Daniel
> Keeping the dirty flag (or sequence count or whatever mechanism) to
> indicate that an FB_ID that happened to be the same fb was part of
> what userspace passed in would do the trick.
>
> BR,
> -R
>
>> Either way you'd always have to follow the implicit fence (which is what
>> your msm hunk seems to be doing, but only for dirtyfb).
>> -Daniel
>>
>>>
>>> BR,
>>> -R
>>>
>>>
>>> >> In both these situations, clip rects can provide a hint to restrict the
>>> >> dirty region.
>>> >>
>>> >> Now I was thinking about the preferred way for user-space to communicate
>>> >> front buffer rendering through the atomic ioctl:
>>> >>
>>> >> 1) Expose a dirty (or content_age property)
>>> >> 2) Attach a clip-rect blob property.
>>> >> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>>> >> underlying buffer object.
>>> >>
>>> >> Are you saying that people are already using 3) and we should keep using
>>> >> that?
>>> >
>>> > I'm saying they're using 3b), flip the same buffer wrapped in the same
>>> > drm_framebuffer, and expect it to work.
>>> >
>>> > The only advantage dirtyfb has is that it allows you to supply the
>>> > optimized upload rectangles, but at the cost of a funny api (it's working
>>> > on the fb, not the plane/crtc you want to upload) and lack of drm_event to
>>> > confirm when exactly you uploaded your stuff. But imo they should be the
>>> > same underlying operation.
>>> >
>>> > Also note that atomic helpers don't optimize out plane flips for same
>>> > buffers. We only optimize out some of the waiting, in a failed attempt at
>>> > making cursors stall less, but that's not fixed with the async plane
>>> > update stuff. And we can obviously optimize out the prepare/cleanup hooks,
>>> > because the buffer should be pinned already.
>>> >
>>> > So imo for a quick fix I think we need:
>>> > 1) Fix drivers (or at least msm) to upload buffers for manual upload on
>>> > all flips (well, all screen updates). Probably should do the fence waiting
>>> > unconditionally too (and handle cursors with the new async stuff).
>>> > 2) Tiny helper that remaps a dirtyfb call into a nonblocking atomic
>>> > commit, which currently means we're throwing the dirty rect optimization
>>> > into the wind. But that could easily be added to the drm_plane_state,
>>> > without exposing it to userspace as a property just yet.
>>> >
>>> > Cheers, Daniel
>>> > --
>>> > Daniel Vetter
>>> > Software Engineer, Intel Corporation
>>> > http://blog.ffwll.ch
>>> _______________________________________________
>>> dri-devel mailing list
>>> [email protected]
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Den 04.04.2018 00.42, skrev Rob Clark:
> Add an atomic helper to implement dirtyfb support. This is needed to
> support DSI command-mode panels with x11 userspace (ie. when we can't
> rely on pageflips to trigger a flush to the panel).
>
> To signal to the driver that the async atomic update needs to
> synchronize with fences, even though the fb didn't change, the
> drm_atomic_state::dirty flag is added.
>
> Signed-off-by: Rob Clark <[email protected]>
> ---
> Background: there are a number of different folks working on getting
> upstream kernel working on various different phones/tablets with qcom
> SoC's.. many of them have command mode panels, so we kind of need a
> way to support the legacy dirtyfb ioctl for x11 support.
>
> I know there is work on a proprer non-legacy atomic property for
> userspace to communicate dirty-rect(s) to the kernel, so this can
> be improved from triggering a full-frame flush once that is in
> place. But we kinda needa a stop-gap solution.
>
> I had considered an in-driver solution for this, but things get a
> bit tricky if userspace ands up combining dirtyfb ioctls with page-
> flips, because we need to synchronize setting various CTL.FLUSH bits
> with setting the CTL.START bit. (ie. really all we need to do for
> cmd mode panels is bang CTL.START, but is this ends up racing with
> pageflips setting FLUSH bits, then bad things.) The easiest soln
> is to wrap this up as an atomic commit and rely on the worker to
> serialize things. Hence adding an atomic dirtyfb helper.
>
> I guess at least the helper, with some small addition to translate
> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
> rect property solution. Depending on how far off that is, a stop-
> gap solution could be useful.
I have been waiting for someone to address this since I'm not skilled
enough myself to tackle the atomic machinery. It would be be nice to do
all flushing during commit since that would mean that the tinydrm drivers
could be driven solely through drm_simple_display_pipe_funcs. I wouldn't
have to wire through the dirty callback like I do now.
I see that you use a nonblocking commit. What happens if a new dirtyfb
comes in before the previous is done?
If we could save the dirty clips, then I could use this in tinydrm.
A full flush over SPI takes ~50ms so I need to shave off where I can.
This works for me in my tinydrm world:
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index c64225274807..218cb36757fa 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -28,6 +28,7 @@
#include <drm/drm_mode_object.h>
#include <drm/drm_color_mgmt.h>
+struct drm_clip_rect;
struct drm_crtc;
struct drm_printer;
struct drm_modeset_acquire_ctx;
@@ -85,6 +86,9 @@ struct drm_plane_state {
*/
bool dirty;
+ struct drm_clip_rect *dirty_clips;
+ unsigned int num_dirty_clips;
+
/**
* @fence:
*
diff --git a/drivers/gpu/drm/drm_atomic_helper.c
b/drivers/gpu/drm/drm_atomic_helper.c
index 8066c508ab50..e00b8247b7c5 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3521,6 +3521,8 @@ void
__drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
drm_framebuffer_get(state->fb);
state->dirty = false;
+ state->dirty_clips = NULL;
+ state->num_dirty_clips = 0;
state->fence = NULL;
state->commit = NULL;
}
@@ -3567,6 +3569,8 @@ void
__drm_atomic_helper_plane_destroy_state(struct drm_plane_state *state)
if (state->commit)
drm_crtc_commit_put(state->commit);
+
+ kfree(state->dirty_clips);
}
EXPORT_SYMBOL(__drm_atomic_helper_plane_destroy_state);
@@ -3877,8 +3881,20 @@ int drm_atomic_helper_dirtyfb(struct
drm_framebuffer *fb,
struct drm_modeset_acquire_ctx ctx;
struct drm_atomic_state *state;
struct drm_plane *plane;
+ bool fb_found = false;
int ret = 0;
+ /* fbdev can flush even when we don't want it to */
+ drm_for_each_plane(plane, fb->dev) {
+ if (plane->state->fb == fb) {
+ fb_found = true;
+ break;
+ }
+ }
+
+ if (!fb_found)
+ return 0;
+
/*
* When called from ioctl, we are interruptable, but not when
* called internally (ie. defio worker)
@@ -3907,6 +3923,25 @@ int drm_atomic_helper_dirtyfb(struct
drm_framebuffer *fb,
}
plane_state->dirty = true;
+ if (clips && num_clips)
+ plane_state->dirty_clips = kmemdup(clips,
num_clips * sizeof(*clips),
+ GFP_KERNEL);
+ else
+ plane_state->dirty_clips =
kzalloc(sizeof(*clips), GFP_KERNEL);
+
+ if (!plane_state->dirty_clips) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (clips && num_clips) {
+ plane_state->num_dirty_clips = num_clips;
+ } else {
+ /* TODO: Maybe choose better max values */
+ plane_state->dirty_clips->x2 = ~0;
+ plane_state->dirty_clips->y2 = ~0;
+ plane_state->num_dirty_clips = 1;
+ }
}
ret = drm_atomic_nonblocking_commit(state);
diff --git a/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
b/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
index e68b528ae64d..1a0c72c496f6 100644
--- a/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
+++ b/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
@@ -121,12 +121,14 @@ void tinydrm_display_pipe_update(struct
drm_simple_display_pipe *pipe,
struct drm_plane_state *old_state)
{
struct tinydrm_device *tdev = pipe_to_tinydrm(pipe);
- struct drm_framebuffer *fb = pipe->plane.state->fb;
+ struct drm_plane_state *new_state = pipe->plane.state;
+ struct drm_framebuffer *fb = new_state->fb;
struct drm_crtc *crtc = &tdev->pipe.crtc;
- if (fb && (fb != old_state->fb)) {
+ if (fb && ((fb != old_state->fb) || new_state->dirty_clips)) {
if (tdev->fb_dirty)
- tdev->fb_dirty(fb, NULL, 0, 0, NULL, 0);
+ tdev->fb_dirty(fb, NULL, 0, 0,
new_state->dirty_clips,
+ new_state->num_dirty_clips);
}
if (crtc->state->event) {
diff --git a/drivers/gpu/drm/tinydrm/mipi-dbi.c
b/drivers/gpu/drm/tinydrm/mipi-dbi.c
index 4d1fb31a781f..49dee24dde60 100644
--- a/drivers/gpu/drm/tinydrm/mipi-dbi.c
+++ b/drivers/gpu/drm/tinydrm/mipi-dbi.c
@@ -9,6 +9,7 @@
* (at your option) any later version.
*/
+#include <drm/drm_atomic_helper.h>
#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/tinydrm/mipi-dbi.h>
#include <drm/tinydrm/tinydrm-helpers.h>
@@ -254,7 +257,7 @@ static int mipi_dbi_fb_dirty(struct drm_framebuffer *fb,
static const struct drm_framebuffer_funcs mipi_dbi_fb_funcs = {
.destroy = drm_gem_fb_destroy,
.create_handle = drm_gem_fb_create_handle,
- .dirty = tinydrm_fb_dirty,
+ .dirty = drm_atomic_helper_dirtyfb,
};
/**
I did some brief testing a few months back with PRIME buffers imported
from vc4 and it looked like X11 sometimes did a page flip followed by a
dirtyfb. This kills performance for me since I flush on both. Do you know
any details on how/where X11 handles this?
Noralf.
> drivers/gpu/drm/drm_atomic_helper.c | 66 +++++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
> drivers/gpu/drm/msm/msm_fb.c | 1 +
> include/drm/drm_atomic_helper.h | 4 +++
> include/drm/drm_plane.h | 9 +++++
> 5 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index c35654591c12..a578dc681b27 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -3504,6 +3504,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
> if (state->fb)
> drm_framebuffer_get(state->fb);
>
> + state->dirty = false;
> state->fence = NULL;
> state->commit = NULL;
> }
> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> }
> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>
> +/**
> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
> + *
> + * A helper to implement drm_framebuffer_funcs::dirty
> + */
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> + struct drm_file *file_priv, unsigned flags,
> + unsigned color, struct drm_clip_rect *clips,
> + unsigned num_clips)
> +{
> + struct drm_modeset_acquire_ctx ctx;
> + struct drm_atomic_state *state;
> + struct drm_plane *plane;
> + int ret = 0;
> +
> + /*
> + * When called from ioctl, we are interruptable, but not when
> + * called internally (ie. defio worker)
> + */
> + drm_modeset_acquire_init(&ctx,
> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
> +
> + state = drm_atomic_state_alloc(fb->dev);
> + if (!state) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + state->acquire_ctx = &ctx;
> +
> +retry:
> + drm_for_each_plane(plane, fb->dev) {
> + struct drm_plane_state *plane_state;
> +
> + if (plane->state->fb != fb)
> + continue;
> +
> + plane_state = drm_atomic_get_plane_state(state, plane);
> + if (IS_ERR(plane_state)) {
> + ret = PTR_ERR(plane_state);
> + goto out;
> + }
> +
> + plane_state->dirty = true;
> + }
> +
> + ret = drm_atomic_nonblocking_commit(state);
> +
> +out:
> + if (ret == -EDEADLK) {
> + drm_atomic_state_clear(state);
> + ret = drm_modeset_backoff(&ctx);
> + if (!ret)
> + goto retry;
> + }
> +
> + drm_atomic_state_put(state);
> +
> + drm_modeset_drop_locks(&ctx);
> + drm_modeset_acquire_fini(&ctx);
> +
> + return ret;
> +
> +}
> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
> +
> /**
> * __drm_atomic_helper_private_duplicate_state - copy atomic private state
> * @obj: CRTC object
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> index bf5f8c39f34d..bb55a048e98b 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
> * Figure out what fence to wait for:
> */
> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> - if ((new_plane_state->fb != old_plane_state->fb) && new_plane_state->fb) {
> + bool sync_fb = new_plane_state->fb &&
> + ((new_plane_state->fb != old_plane_state->fb) ||
> + new_plane_state->dirty);
> + if (sync_fb) {
> struct drm_gem_object *obj = msm_framebuffer_bo(new_plane_state->fb, 0);
> struct msm_gem_object *msm_obj = to_msm_bo(obj);
> struct dma_fence *fence = reservation_object_get_excl_rcu(msm_obj->resv);
> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
> index 0e0c87252ab0..a5d882a34a33 100644
> --- a/drivers/gpu/drm/msm/msm_fb.c
> +++ b/drivers/gpu/drm/msm/msm_fb.c
> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct drm_framebuffer *fb)
> static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
> .create_handle = msm_framebuffer_create_handle,
> .destroy = msm_framebuffer_destroy,
> + .dirty = drm_atomic_helper_dirtyfb,
> };
>
> #ifdef CONFIG_DEBUG_FS
> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> index 26aaba58d6ce..9b7a95c2643d 100644
> --- a/include/drm/drm_atomic_helper.h
> +++ b/include/drm/drm_atomic_helper.h
> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct drm_crtc *crtc,
> u16 *red, u16 *green, u16 *blue,
> uint32_t size,
> struct drm_modeset_acquire_ctx *ctx);
> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> + struct drm_file *file_priv, unsigned flags,
> + unsigned color, struct drm_clip_rect *clips,
> + unsigned num_clips);
> void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj,
> struct drm_private_state *state);
>
> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> index f7bf4a48b1c3..296fa22bda7a 100644
> --- a/include/drm/drm_plane.h
> +++ b/include/drm/drm_plane.h
> @@ -76,6 +76,15 @@ struct drm_plane_state {
> */
> struct drm_framebuffer *fb;
>
> + /**
> + * @dirty:
> + *
> + * Flag that indicates the fb contents have changed even though the
> + * fb has not. This is mostly a stop-gap solution until we have
> + * atomic dirty-rect(s) property.
> + */
> + bool dirty;
> +
> /**
> * @fence:
> *
On Wed, Apr 4, 2018 at 7:41 PM, Noralf Trønnes <[email protected]> wrote:
>
>
> Den 04.04.2018 00.42, skrev Rob Clark:
>>
>> Add an atomic helper to implement dirtyfb support. This is needed to
>> support DSI command-mode panels with x11 userspace (ie. when we can't
>> rely on pageflips to trigger a flush to the panel).
>>
>> To signal to the driver that the async atomic update needs to
>> synchronize with fences, even though the fb didn't change, the
>> drm_atomic_state::dirty flag is added.
>>
>> Signed-off-by: Rob Clark <[email protected]>
>> ---
>> Background: there are a number of different folks working on getting
>> upstream kernel working on various different phones/tablets with qcom
>> SoC's.. many of them have command mode panels, so we kind of need a
>> way to support the legacy dirtyfb ioctl for x11 support.
>>
>> I know there is work on a proprer non-legacy atomic property for
>> userspace to communicate dirty-rect(s) to the kernel, so this can
>> be improved from triggering a full-frame flush once that is in
>> place. But we kinda needa a stop-gap solution.
>>
>> I had considered an in-driver solution for this, but things get a
>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> flips, because we need to synchronize setting various CTL.FLUSH bits
>> with setting the CTL.START bit. (ie. really all we need to do for
>> cmd mode panels is bang CTL.START, but is this ends up racing with
>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>> is to wrap this up as an atomic commit and rely on the worker to
>> serialize things. Hence adding an atomic dirtyfb helper.
>>
>> I guess at least the helper, with some small addition to translate
>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> rect property solution. Depending on how far off that is, a stop-
>> gap solution could be useful.
>
>
> I have been waiting for someone to address this since I'm not skilled
> enough myself to tackle the atomic machinery. It would be be nice to do
> all flushing during commit since that would mean that the tinydrm drivers
> could be driven solely through drm_simple_display_pipe_funcs. I wouldn't
> have to wire through the dirty callback like I do now.
>
> I see that you use a nonblocking commit. What happens if a new dirtyfb
> comes in before the previous is done?
We could reuse the same trick we're doing in the fbdev code, of
pushing the commit to a worker and doing it in a blocking fashion. Or
at least wait for it to finish (can be done with the crtc_state->event
stuff). That way userspace can pile us up in dirtyfb calls, but we'd
do at most one per frame. More isn't really useful anyway.
> If we could save the dirty clips, then I could use this in tinydrm.
> A full flush over SPI takes ~50ms so I need to shave off where I can.
>
> This works for me in my tinydrm world:
One thing I thought you've had around somewhere is some additional
tracking code, so we don't have to take all the plane mutexes in the
->dirtyfb callback. Does that exist, or was that just a figment of my
imagination?
> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> index c64225274807..218cb36757fa 100644
> --- a/include/drm/drm_plane.h
> +++ b/include/drm/drm_plane.h
> @@ -28,6 +28,7 @@
> #include <drm/drm_mode_object.h>
> #include <drm/drm_color_mgmt.h>
>
> +struct drm_clip_rect;
> struct drm_crtc;
> struct drm_printer;
> struct drm_modeset_acquire_ctx;
> @@ -85,6 +86,9 @@ struct drm_plane_state {
> */
> bool dirty;
>
> + struct drm_clip_rect *dirty_clips;
> + unsigned int num_dirty_clips;
> +
> /**
> * @fence:
> *
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c
> b/drivers/gpu/drm/drm_atomic_helper.c
> index 8066c508ab50..e00b8247b7c5 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -3521,6 +3521,8 @@ void __drm_atomic_helper_plane_duplicate_state(struct
> drm_plane *plane,
> drm_framebuffer_get(state->fb);
>
> state->dirty = false;
> + state->dirty_clips = NULL;
> + state->num_dirty_clips = 0;
> state->fence = NULL;
> state->commit = NULL;
> }
> @@ -3567,6 +3569,8 @@ void __drm_atomic_helper_plane_destroy_state(struct
> drm_plane_state *state)
>
> if (state->commit)
> drm_crtc_commit_put(state->commit);
> +
> + kfree(state->dirty_clips);
> }
> EXPORT_SYMBOL(__drm_atomic_helper_plane_destroy_state);
>
> @@ -3877,8 +3881,20 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer
> *fb,
> struct drm_modeset_acquire_ctx ctx;
> struct drm_atomic_state *state;
> struct drm_plane *plane;
> + bool fb_found = false;
> int ret = 0;
>
> + /* fbdev can flush even when we don't want it to */
> + drm_for_each_plane(plane, fb->dev) {
> + if (plane->state->fb == fb) {
> + fb_found = true;
> + break;
> + }
> + }
> +
> + if (!fb_found)
> + return 0;
> +
> /*
> * When called from ioctl, we are interruptable, but not when
> * called internally (ie. defio worker)
> @@ -3907,6 +3923,25 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer
> *fb,
> }
>
> plane_state->dirty = true;
> + if (clips && num_clips)
> + plane_state->dirty_clips = kmemdup(clips, num_clips
> * sizeof(*clips),
> + GFP_KERNEL);
> + else
> + plane_state->dirty_clips = kzalloc(sizeof(*clips),
> GFP_KERNEL);
> +
> + if (!plane_state->dirty_clips) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + if (clips && num_clips) {
> + plane_state->num_dirty_clips = num_clips;
> + } else {
> + /* TODO: Maybe choose better max values */
> + plane_state->dirty_clips->x2 = ~0;
> + plane_state->dirty_clips->y2 = ~0;
> + plane_state->num_dirty_clips = 1;
Either we fill this out for all legacy paths, or we just require that
drivers upload everything when num_clips == 0. The trouble with having
num_clips == 0 imply a full upload is that we can't do a pure pan
without having to set a dummy clip rect of 0 size. Which is also
silly.
So I'm leaning towards going through all the legacy ioctl paths (well,
at least the ones where we agree it should result in a full upload,
cursor probably excluded) and setting up a clip rect with max values
like above for all of them.
And we'd need to remove the ->dirty bit then I think, since it'd be
entirely redundant.
Cheers, Daniel
> + }
> }
>
> ret = drm_atomic_nonblocking_commit(state);
> diff --git a/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
> b/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
> index e68b528ae64d..1a0c72c496f6 100644
> --- a/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
> +++ b/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
> @@ -121,12 +121,14 @@ void tinydrm_display_pipe_update(struct
> drm_simple_display_pipe *pipe,
> struct drm_plane_state *old_state)
> {
> struct tinydrm_device *tdev = pipe_to_tinydrm(pipe);
> - struct drm_framebuffer *fb = pipe->plane.state->fb;
> + struct drm_plane_state *new_state = pipe->plane.state;
> + struct drm_framebuffer *fb = new_state->fb;
> struct drm_crtc *crtc = &tdev->pipe.crtc;
>
> - if (fb && (fb != old_state->fb)) {
> + if (fb && ((fb != old_state->fb) || new_state->dirty_clips)) {
> if (tdev->fb_dirty)
> - tdev->fb_dirty(fb, NULL, 0, 0, NULL, 0);
> + tdev->fb_dirty(fb, NULL, 0, 0,
> new_state->dirty_clips,
> + new_state->num_dirty_clips);
> }
>
> if (crtc->state->event) {
> diff --git a/drivers/gpu/drm/tinydrm/mipi-dbi.c
> b/drivers/gpu/drm/tinydrm/mipi-dbi.c
> index 4d1fb31a781f..49dee24dde60 100644
> --- a/drivers/gpu/drm/tinydrm/mipi-dbi.c
> +++ b/drivers/gpu/drm/tinydrm/mipi-dbi.c
> @@ -9,6 +9,7 @@
> * (at your option) any later version.
> */
>
> +#include <drm/drm_atomic_helper.h>
> #include <drm/drm_gem_framebuffer_helper.h>
> #include <drm/tinydrm/mipi-dbi.h>
> #include <drm/tinydrm/tinydrm-helpers.h>
> @@ -254,7 +257,7 @@ static int mipi_dbi_fb_dirty(struct drm_framebuffer *fb,
> static const struct drm_framebuffer_funcs mipi_dbi_fb_funcs = {
> .destroy = drm_gem_fb_destroy,
> .create_handle = drm_gem_fb_create_handle,
> - .dirty = tinydrm_fb_dirty,
> + .dirty = drm_atomic_helper_dirtyfb,
> };
>
> /**
>
>
> I did some brief testing a few months back with PRIME buffers imported
> from vc4 and it looked like X11 sometimes did a page flip followed by a
> dirtyfb. This kills performance for me since I flush on both. Do you know
> any details on how/where X11 handles this?
>
> Noralf.
>
>
>> drivers/gpu/drm/drm_atomic_helper.c | 66
>> +++++++++++++++++++++++++++++++++++++
>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>> include/drm/drm_atomic_helper.h | 4 +++
>> include/drm/drm_plane.h | 9 +++++
>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c
>> b/drivers/gpu/drm/drm_atomic_helper.c
>> index c35654591c12..a578dc681b27 100644
>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>> @@ -3504,6 +3504,7 @@ void
>> __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>> if (state->fb)
>> drm_framebuffer_get(state->fb);
>> + state->dirty = false;
>> state->fence = NULL;
>> state->commit = NULL;
>> }
>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct
>> drm_crtc *crtc,
>> }
>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>> +/**
>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>> + *
>> + * A helper to implement drm_framebuffer_funcs::dirty
>> + */
>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>> + struct drm_file *file_priv, unsigned flags,
>> + unsigned color, struct drm_clip_rect *clips,
>> + unsigned num_clips)
>> +{
>> + struct drm_modeset_acquire_ctx ctx;
>> + struct drm_atomic_state *state;
>> + struct drm_plane *plane;
>> + int ret = 0;
>> +
>> + /*
>> + * When called from ioctl, we are interruptable, but not when
>> + * called internally (ie. defio worker)
>> + */
>> + drm_modeset_acquire_init(&ctx,
>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>> +
>> + state = drm_atomic_state_alloc(fb->dev);
>> + if (!state) {
>> + ret = -ENOMEM;
>> + goto out;
>> + }
>> + state->acquire_ctx = &ctx;
>> +
>> +retry:
>> + drm_for_each_plane(plane, fb->dev) {
>> + struct drm_plane_state *plane_state;
>> +
>> + if (plane->state->fb != fb)
>> + continue;
>> +
>> + plane_state = drm_atomic_get_plane_state(state, plane);
>> + if (IS_ERR(plane_state)) {
>> + ret = PTR_ERR(plane_state);
>> + goto out;
>> + }
>> +
>> + plane_state->dirty = true;
>> + }
>> +
>> + ret = drm_atomic_nonblocking_commit(state);
>> +
>> +out:
>> + if (ret == -EDEADLK) {
>> + drm_atomic_state_clear(state);
>> + ret = drm_modeset_backoff(&ctx);
>> + if (!ret)
>> + goto retry;
>> + }
>> +
>> + drm_atomic_state_put(state);
>> +
>> + drm_modeset_drop_locks(&ctx);
>> + drm_modeset_acquire_fini(&ctx);
>> +
>> + return ret;
>> +
>> +}
>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>> +
>> /**
>> * __drm_atomic_helper_private_duplicate_state - copy atomic private
>> state
>> * @obj: CRTC object
>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c
>> b/drivers/gpu/drm/msm/msm_atomic.c
>> index bf5f8c39f34d..bb55a048e98b 100644
>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>> * Figure out what fence to wait for:
>> */
>> for_each_oldnew_plane_in_state(state, plane, old_plane_state,
>> new_plane_state, i) {
>> - if ((new_plane_state->fb != old_plane_state->fb) &&
>> new_plane_state->fb) {
>> + bool sync_fb = new_plane_state->fb &&
>> + ((new_plane_state->fb != old_plane_state->fb) ||
>> + new_plane_state->dirty);
>> + if (sync_fb) {
>> struct drm_gem_object *obj =
>> msm_framebuffer_bo(new_plane_state->fb, 0);
>> struct msm_gem_object *msm_obj = to_msm_bo(obj);
>> struct dma_fence *fence =
>> reservation_object_get_excl_rcu(msm_obj->resv);
>> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
>> index 0e0c87252ab0..a5d882a34a33 100644
>> --- a/drivers/gpu/drm/msm/msm_fb.c
>> +++ b/drivers/gpu/drm/msm/msm_fb.c
>> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct
>> drm_framebuffer *fb)
>> static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>> .create_handle = msm_framebuffer_create_handle,
>> .destroy = msm_framebuffer_destroy,
>> + .dirty = drm_atomic_helper_dirtyfb,
>> };
>> #ifdef CONFIG_DEBUG_FS
>> diff --git a/include/drm/drm_atomic_helper.h
>> b/include/drm/drm_atomic_helper.h
>> index 26aaba58d6ce..9b7a95c2643d 100644
>> --- a/include/drm/drm_atomic_helper.h
>> +++ b/include/drm/drm_atomic_helper.h
>> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct
>> drm_crtc *crtc,
>> u16 *red, u16 *green, u16 *blue,
>> uint32_t size,
>> struct drm_modeset_acquire_ctx
>> *ctx);
>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>> + struct drm_file *file_priv, unsigned flags,
>> + unsigned color, struct drm_clip_rect *clips,
>> + unsigned num_clips);
>> void __drm_atomic_helper_private_obj_duplicate_state(struct
>> drm_private_obj *obj,
>> struct
>> drm_private_state *state);
>> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
>> index f7bf4a48b1c3..296fa22bda7a 100644
>> --- a/include/drm/drm_plane.h
>> +++ b/include/drm/drm_plane.h
>> @@ -76,6 +76,15 @@ struct drm_plane_state {
>> */
>> struct drm_framebuffer *fb;
>> + /**
>> + * @dirty:
>> + *
>> + * Flag that indicates the fb contents have changed even though
>> the
>> + * fb has not. This is mostly a stop-gap solution until we have
>> + * atomic dirty-rect(s) property.
>> + */
>> + bool dirty;
>> +
>> /**
>> * @fence:
>> *
>
>
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Den 04.04.2018 19.56, skrev Daniel Vetter:
> On Wed, Apr 4, 2018 at 7:41 PM, Noralf Trønnes <[email protected]> wrote:
>>
>> Den 04.04.2018 00.42, skrev Rob Clark:
>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>> rely on pageflips to trigger a flush to the panel).
>>>
>>> To signal to the driver that the async atomic update needs to
>>> synchronize with fences, even though the fb didn't change, the
>>> drm_atomic_state::dirty flag is added.
>>>
>>> Signed-off-by: Rob Clark <[email protected]>
>>> ---
>>> Background: there are a number of different folks working on getting
>>> upstream kernel working on various different phones/tablets with qcom
>>> SoC's.. many of them have command mode panels, so we kind of need a
>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>
>>> I know there is work on a proprer non-legacy atomic property for
>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>> be improved from triggering a full-frame flush once that is in
>>> place. But we kinda needa a stop-gap solution.
>>>
>>> I had considered an in-driver solution for this, but things get a
>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>> with setting the CTL.START bit. (ie. really all we need to do for
>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>> is to wrap this up as an atomic commit and rely on the worker to
>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>
>>> I guess at least the helper, with some small addition to translate
>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>> rect property solution. Depending on how far off that is, a stop-
>>> gap solution could be useful.
>>
>> I have been waiting for someone to address this since I'm not skilled
>> enough myself to tackle the atomic machinery. It would be be nice to do
>> all flushing during commit since that would mean that the tinydrm drivers
>> could be driven solely through drm_simple_display_pipe_funcs. I wouldn't
>> have to wire through the dirty callback like I do now.
>>
>> I see that you use a nonblocking commit. What happens if a new dirtyfb
>> comes in before the previous is done?
> We could reuse the same trick we're doing in the fbdev code, of
> pushing the commit to a worker and doing it in a blocking fashion. Or
> at least wait for it to finish (can be done with the crtc_state->event
> stuff). That way userspace can pile us up in dirtyfb calls, but we'd
> do at most one per frame. More isn't really useful anyway.
>
>> If we could save the dirty clips, then I could use this in tinydrm.
>> A full flush over SPI takes ~50ms so I need to shave off where I can.
>>
>> This works for me in my tinydrm world:
> One thing I thought you've had around somewhere is some additional
> tracking code, so we don't have to take all the plane mutexes in the
> ->dirtyfb callback. Does that exist, or was that just a figment of my
> imagination?
I haven't looked at this at all since I know way too little about how
atomic works and after the discussion started with damage on page flips,
I knew I just had to wait for someone other than me to do this. And I
hardly know anything about how the multitude of userspace operates.
So I'm sorry, but I can't add much to this discussion, I fall off rather
quickly when the details and corner cases are discussed.
All I can do is state the needs of tinydrm :-)
Noralf.
On Wed, Apr 4, 2018 at 1:41 PM, Noralf Trønnes <[email protected]> wrote:
>
>
> Den 04.04.2018 00.42, skrev Rob Clark:
>>
>> Add an atomic helper to implement dirtyfb support. This is needed to
>> support DSI command-mode panels with x11 userspace (ie. when we can't
>> rely on pageflips to trigger a flush to the panel).
>>
>> To signal to the driver that the async atomic update needs to
>> synchronize with fences, even though the fb didn't change, the
>> drm_atomic_state::dirty flag is added.
>>
>> Signed-off-by: Rob Clark <[email protected]>
>> ---
>> Background: there are a number of different folks working on getting
>> upstream kernel working on various different phones/tablets with qcom
>> SoC's.. many of them have command mode panels, so we kind of need a
>> way to support the legacy dirtyfb ioctl for x11 support.
>>
>> I know there is work on a proprer non-legacy atomic property for
>> userspace to communicate dirty-rect(s) to the kernel, so this can
>> be improved from triggering a full-frame flush once that is in
>> place. But we kinda needa a stop-gap solution.
>>
>> I had considered an in-driver solution for this, but things get a
>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>> flips, because we need to synchronize setting various CTL.FLUSH bits
>> with setting the CTL.START bit. (ie. really all we need to do for
>> cmd mode panels is bang CTL.START, but is this ends up racing with
>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>> is to wrap this up as an atomic commit and rely on the worker to
>> serialize things. Hence adding an atomic dirtyfb helper.
>>
>> I guess at least the helper, with some small addition to translate
>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>> rect property solution. Depending on how far off that is, a stop-
>> gap solution could be useful.
>
>
> I have been waiting for someone to address this since I'm not skilled
> enough myself to tackle the atomic machinery. It would be be nice to do
> all flushing during commit since that would mean that the tinydrm drivers
> could be driven solely through drm_simple_display_pipe_funcs. I wouldn't
> have to wire through the dirty callback like I do now.
>
> I see that you use a nonblocking commit. What happens if a new dirtyfb
> comes in before the previous is done?
I'm relying on the workqueue for committing the async part of
non-blocking commits to serialize things. This was actually something
I pretty badly need for msm (otherwise pageflip + dirtyfb causes races
for settings various FLUSH/START bits which need to happen in a
certain order)
This was what killed my earlier lazy plan of handling dirtyfb in drm/msm ;-)
> If we could save the dirty clips, then I could use this in tinydrm.
> A full flush over SPI takes ~50ms so I need to shave off where I can.
>
> This works for me in my tinydrm world:
>
> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> index c64225274807..218cb36757fa 100644
> --- a/include/drm/drm_plane.h
> +++ b/include/drm/drm_plane.h
> @@ -28,6 +28,7 @@
> #include <drm/drm_mode_object.h>
> #include <drm/drm_color_mgmt.h>
>
> +struct drm_clip_rect;
> struct drm_crtc;
> struct drm_printer;
> struct drm_modeset_acquire_ctx;
> @@ -85,6 +86,9 @@ struct drm_plane_state {
> */
> bool dirty;
>
> + struct drm_clip_rect *dirty_clips;
> + unsigned int num_dirty_clips;
> +
> /**
> * @fence:
> *
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c
> b/drivers/gpu/drm/drm_atomic_helper.c
> index 8066c508ab50..e00b8247b7c5 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -3521,6 +3521,8 @@ void __drm_atomic_helper_plane_duplicate_state(struct
> drm_plane *plane,
> drm_framebuffer_get(state->fb);
>
> state->dirty = false;
> + state->dirty_clips = NULL;
> + state->num_dirty_clips = 0;
> state->fence = NULL;
> state->commit = NULL;
> }
> @@ -3567,6 +3569,8 @@ void __drm_atomic_helper_plane_destroy_state(struct
> drm_plane_state *state)
>
> if (state->commit)
> drm_crtc_commit_put(state->commit);
> +
> + kfree(state->dirty_clips);
> }
> EXPORT_SYMBOL(__drm_atomic_helper_plane_destroy_state);
>
> @@ -3877,8 +3881,20 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer
> *fb,
> struct drm_modeset_acquire_ctx ctx;
> struct drm_atomic_state *state;
> struct drm_plane *plane;
> + bool fb_found = false;
> int ret = 0;
>
> + /* fbdev can flush even when we don't want it to */
> + drm_for_each_plane(plane, fb->dev) {
> + if (plane->state->fb == fb) {
> + fb_found = true;
> + break;
> + }
> + }
> +
> + if (!fb_found)
> + return 0;
> +
> /*
> * When called from ioctl, we are interruptable, but not when
> * called internally (ie. defio worker)
> @@ -3907,6 +3923,25 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer
> *fb,
> }
>
> plane_state->dirty = true;
> + if (clips && num_clips)
> + plane_state->dirty_clips = kmemdup(clips, num_clips
> * sizeof(*clips),
> + GFP_KERNEL);
> + else
> + plane_state->dirty_clips = kzalloc(sizeof(*clips),
> GFP_KERNEL);
> +
> + if (!plane_state->dirty_clips) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + if (clips && num_clips) {
> + plane_state->num_dirty_clips = num_clips;
> + } else {
> + /* TODO: Maybe choose better max values */
> + plane_state->dirty_clips->x2 = ~0;
> + plane_state->dirty_clips->y2 = ~0;
> + plane_state->num_dirty_clips = 1;
> + }
> }
>
> ret = drm_atomic_nonblocking_commit(state);
> diff --git a/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
> b/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
> index e68b528ae64d..1a0c72c496f6 100644
> --- a/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
> +++ b/drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c
> @@ -121,12 +121,14 @@ void tinydrm_display_pipe_update(struct
> drm_simple_display_pipe *pipe,
> struct drm_plane_state *old_state)
> {
> struct tinydrm_device *tdev = pipe_to_tinydrm(pipe);
> - struct drm_framebuffer *fb = pipe->plane.state->fb;
> + struct drm_plane_state *new_state = pipe->plane.state;
> + struct drm_framebuffer *fb = new_state->fb;
> struct drm_crtc *crtc = &tdev->pipe.crtc;
>
> - if (fb && (fb != old_state->fb)) {
> + if (fb && ((fb != old_state->fb) || new_state->dirty_clips)) {
> if (tdev->fb_dirty)
> - tdev->fb_dirty(fb, NULL, 0, 0, NULL, 0);
> + tdev->fb_dirty(fb, NULL, 0, 0,
> new_state->dirty_clips,
> + new_state->num_dirty_clips);
> }
>
> if (crtc->state->event) {
> diff --git a/drivers/gpu/drm/tinydrm/mipi-dbi.c
> b/drivers/gpu/drm/tinydrm/mipi-dbi.c
> index 4d1fb31a781f..49dee24dde60 100644
> --- a/drivers/gpu/drm/tinydrm/mipi-dbi.c
> +++ b/drivers/gpu/drm/tinydrm/mipi-dbi.c
> @@ -9,6 +9,7 @@
> * (at your option) any later version.
> */
>
> +#include <drm/drm_atomic_helper.h>
> #include <drm/drm_gem_framebuffer_helper.h>
> #include <drm/tinydrm/mipi-dbi.h>
> #include <drm/tinydrm/tinydrm-helpers.h>
> @@ -254,7 +257,7 @@ static int mipi_dbi_fb_dirty(struct drm_framebuffer *fb,
> static const struct drm_framebuffer_funcs mipi_dbi_fb_funcs = {
> .destroy = drm_gem_fb_destroy,
> .create_handle = drm_gem_fb_create_handle,
> - .dirty = tinydrm_fb_dirty,
> + .dirty = drm_atomic_helper_dirtyfb,
> };
>
> /**
>
>
> I did some brief testing a few months back with PRIME buffers imported
> from vc4 and it looked like X11 sometimes did a page flip followed by a
> dirtyfb. This kills performance for me since I flush on both. Do you know
> any details on how/where X11 handles this?
>
From vague memory in x11 there is some sort mechanism to track damage
and flush to kernel when xserver becomes idle. So pageflip followed
by dirtyfb doesn't seem too surpising. I'm not sure how vmwgfx
handles this, I guess they'd have the same issue..
BR,
-R
> Noralf.
>
>
>> drivers/gpu/drm/drm_atomic_helper.c | 66
>> +++++++++++++++++++++++++++++++++++++
>> drivers/gpu/drm/msm/msm_atomic.c | 5 ++-
>> drivers/gpu/drm/msm/msm_fb.c | 1 +
>> include/drm/drm_atomic_helper.h | 4 +++
>> include/drm/drm_plane.h | 9 +++++
>> 5 files changed, 84 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c
>> b/drivers/gpu/drm/drm_atomic_helper.c
>> index c35654591c12..a578dc681b27 100644
>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>> @@ -3504,6 +3504,7 @@ void
>> __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>> if (state->fb)
>> drm_framebuffer_get(state->fb);
>> + state->dirty = false;
>> state->fence = NULL;
>> state->commit = NULL;
>> }
>> @@ -3847,6 +3848,71 @@ int drm_atomic_helper_legacy_gamma_set(struct
>> drm_crtc *crtc,
>> }
>> EXPORT_SYMBOL(drm_atomic_helper_legacy_gamma_set);
>> +/**
>> + * drm_atomic_helper_dirtyfb - helper for dirtyfb
>> + *
>> + * A helper to implement drm_framebuffer_funcs::dirty
>> + */
>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>> + struct drm_file *file_priv, unsigned flags,
>> + unsigned color, struct drm_clip_rect *clips,
>> + unsigned num_clips)
>> +{
>> + struct drm_modeset_acquire_ctx ctx;
>> + struct drm_atomic_state *state;
>> + struct drm_plane *plane;
>> + int ret = 0;
>> +
>> + /*
>> + * When called from ioctl, we are interruptable, but not when
>> + * called internally (ie. defio worker)
>> + */
>> + drm_modeset_acquire_init(&ctx,
>> + file_priv ? DRM_MODESET_ACQUIRE_INTERRUPTIBLE : 0);
>> +
>> + state = drm_atomic_state_alloc(fb->dev);
>> + if (!state) {
>> + ret = -ENOMEM;
>> + goto out;
>> + }
>> + state->acquire_ctx = &ctx;
>> +
>> +retry:
>> + drm_for_each_plane(plane, fb->dev) {
>> + struct drm_plane_state *plane_state;
>> +
>> + if (plane->state->fb != fb)
>> + continue;
>> +
>> + plane_state = drm_atomic_get_plane_state(state, plane);
>> + if (IS_ERR(plane_state)) {
>> + ret = PTR_ERR(plane_state);
>> + goto out;
>> + }
>> +
>> + plane_state->dirty = true;
>> + }
>> +
>> + ret = drm_atomic_nonblocking_commit(state);
>> +
>> +out:
>> + if (ret == -EDEADLK) {
>> + drm_atomic_state_clear(state);
>> + ret = drm_modeset_backoff(&ctx);
>> + if (!ret)
>> + goto retry;
>> + }
>> +
>> + drm_atomic_state_put(state);
>> +
>> + drm_modeset_drop_locks(&ctx);
>> + drm_modeset_acquire_fini(&ctx);
>> +
>> + return ret;
>> +
>> +}
>> +EXPORT_SYMBOL(drm_atomic_helper_dirtyfb);
>> +
>> /**
>> * __drm_atomic_helper_private_duplicate_state - copy atomic private
>> state
>> * @obj: CRTC object
>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c
>> b/drivers/gpu/drm/msm/msm_atomic.c
>> index bf5f8c39f34d..bb55a048e98b 100644
>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>> @@ -201,7 +201,10 @@ int msm_atomic_commit(struct drm_device *dev,
>> * Figure out what fence to wait for:
>> */
>> for_each_oldnew_plane_in_state(state, plane, old_plane_state,
>> new_plane_state, i) {
>> - if ((new_plane_state->fb != old_plane_state->fb) &&
>> new_plane_state->fb) {
>> + bool sync_fb = new_plane_state->fb &&
>> + ((new_plane_state->fb != old_plane_state->fb) ||
>> + new_plane_state->dirty);
>> + if (sync_fb) {
>> struct drm_gem_object *obj =
>> msm_framebuffer_bo(new_plane_state->fb, 0);
>> struct msm_gem_object *msm_obj = to_msm_bo(obj);
>> struct dma_fence *fence =
>> reservation_object_get_excl_rcu(msm_obj->resv);
>> diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
>> index 0e0c87252ab0..a5d882a34a33 100644
>> --- a/drivers/gpu/drm/msm/msm_fb.c
>> +++ b/drivers/gpu/drm/msm/msm_fb.c
>> @@ -62,6 +62,7 @@ static void msm_framebuffer_destroy(struct
>> drm_framebuffer *fb)
>> static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
>> .create_handle = msm_framebuffer_create_handle,
>> .destroy = msm_framebuffer_destroy,
>> + .dirty = drm_atomic_helper_dirtyfb,
>> };
>> #ifdef CONFIG_DEBUG_FS
>> diff --git a/include/drm/drm_atomic_helper.h
>> b/include/drm/drm_atomic_helper.h
>> index 26aaba58d6ce..9b7a95c2643d 100644
>> --- a/include/drm/drm_atomic_helper.h
>> +++ b/include/drm/drm_atomic_helper.h
>> @@ -183,6 +183,10 @@ int drm_atomic_helper_legacy_gamma_set(struct
>> drm_crtc *crtc,
>> u16 *red, u16 *green, u16 *blue,
>> uint32_t size,
>> struct drm_modeset_acquire_ctx
>> *ctx);
>> +int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>> + struct drm_file *file_priv, unsigned flags,
>> + unsigned color, struct drm_clip_rect *clips,
>> + unsigned num_clips);
>> void __drm_atomic_helper_private_obj_duplicate_state(struct
>> drm_private_obj *obj,
>> struct
>> drm_private_state *state);
>> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
>> index f7bf4a48b1c3..296fa22bda7a 100644
>> --- a/include/drm/drm_plane.h
>> +++ b/include/drm/drm_plane.h
>> @@ -76,6 +76,15 @@ struct drm_plane_state {
>> */
>> struct drm_framebuffer *fb;
>> + /**
>> + * @dirty:
>> + *
>> + * Flag that indicates the fb contents have changed even though
>> the
>> + * fb has not. This is mostly a stop-gap solution until we have
>> + * atomic dirty-rect(s) property.
>> + */
>> + bool dirty;
>> +
>> /**
>> * @fence:
>> *
>
>
On 04/04/2018 11:56 AM, Daniel Vetter wrote:
> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>> Hi,
>>>>
>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]> wrote:
>>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>
>>>>>> To signal to the driver that the async atomic update needs to
>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>
>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>> ---
>>>>>> Background: there are a number of different folks working on getting
>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>
>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>
>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>
>>>>>> I guess at least the helper, with some small addition to translate
>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>> gap solution could be useful.
>>>>> Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
>>>>> -Daniel
>>>> I've asked Deepak to RFC the core changes suggested for the full dirty blob
>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
>>>> get to the desired coordinates.
>>>>
>>>> One thing to perhaps discuss is how we would like to fit this with
>>>> front-buffer rendering and the dirty ioctl. In the page-flip context, the
>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
>>>> damage region that can be fully ignored by the driver, new content is
>>>> indicated by a new framebuffer.
>>>>
>>>> We could do the same for frontbuffer rendering: Either set a dirty flag like
>>>> you do here, or provide a content_age state member. Since we clear the dirty
>>>> flag on state copies, I guess that would be sufficient. The blob rectangles
>>>> would then become a hint to restrict the damage region.
>>> I'm not entirely following here - I thought for frontbuffer rendering the
>>> dirty rects have always just been a hint, and that the driver was always
>>> free to re-upload the entire buffer to the screen.
>>>
>>> And through a helper like Rob's proposing here (and have floated around in
>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>> call to a fake flip with dirty rect. Manual upload drivers already need to
>>> upload the entire screen if they get a flip, since some userspace uses
>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>
>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>
>>>> Another approach would be to have the presence of dirty rects without
>>>> framebuffer change to indicate frontbuffer rendering.
>>>>
>>>> I think I like the first approach best, although it may be tempting for
>>>> user-space apps to just set the dirty bit instead of providing the full
>>>> damage region.
>>> Or I'm not following you here, because I don't quite see the difference
>>> between dirtyfb and a flip.
>>> -Daniel
>> OK, let me rephrase:
>>
>> From the driver's point-of-view, in the atomic world, new content and the
>> need for manual upload is indicated by a change in fb attached to the plane.
>>
>> With Rob's patch here, (correct me if I'm wrong) in addition new content and
>> the need for manual upload is identified by the dirty flag, (since the fb
>> stays the same and the driver thus never identifies a page-flip).
> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
> (atomic or not) should result in the entire buffer getting uploaded. The
> dirty flag is kinda redundant, a flip with the same buffer works the same
> way as a dirtyfb with the entire buffer as the dirty rectangle.
>
>> In both these situations, clip rects can provide a hint to restrict the
>> dirty region.
>>
>> Now I was thinking about the preferred way for user-space to communicate
>> front buffer rendering through the atomic ioctl:
>>
>> 1) Expose a dirty (or content_age property)
>> 2) Attach a clip-rect blob property.
>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
>> underlying buffer object.
>>
>> Are you saying that people are already using 3) and we should keep using
>> that?
> I'm saying they're using 3b), flip the same buffer wrapped in the same
> drm_framebuffer, and expect it to work.
>
> The only advantage dirtyfb has is that it allows you to supply the
> optimized upload rectangles, but at the cost of a funny api (it's working
> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
> confirm when exactly you uploaded your stuff. But imo they should be the
> same underlying operation.
>
FWIW, vmwgfx has always treated a dirtyfb as a pure front-buffer like
rendering operation without any synchronization,
We've guaranteed that only the rects that are present are uploaded, but
only xf86-video-vmware has taken advantage of this to keep
CPU- and GPU rendered content apart.
I think we've at some point run into problems with number of cliprects,
(Old KDE lock screen?) and use multiple dirtyfb for the same
update...
IIRC the reason for working with the fb, is that it's much easier for
user-space, which doesn't have to care where planes are scanning out and
why.
"Present my new content on any screen that's affected".
/Thomas
On Thu, Apr 5, 2018 at 3:30 PM, Thomas Hellstrom <[email protected]> wrote:
> On 04/04/2018 11:56 AM, Daniel Vetter wrote:
>>
>> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>>>
>>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>>>
>>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>>>
>>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on getting
>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>>
>>>>>> Adding Noralf, who iirc already posted the full dirty helpers already
>>>>>> somewhere.
>>>>>> -Daniel
>>>>>
>>>>> I've asked Deepak to RFC the core changes suggested for the full dirty
>>>>> blob
>>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple
>>>>> helper to
>>>>> get to the desired coordinates.
>>>>>
>>>>> One thing to perhaps discuss is how we would like to fit this with
>>>>> front-buffer rendering and the dirty ioctl. In the page-flip context,
>>>>> the
>>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict
>>>>> the
>>>>> damage region that can be fully ignored by the driver, new content is
>>>>> indicated by a new framebuffer.
>>>>>
>>>>> We could do the same for frontbuffer rendering: Either set a dirty flag
>>>>> like
>>>>> you do here, or provide a content_age state member. Since we clear the
>>>>> dirty
>>>>> flag on state copies, I guess that would be sufficient. The blob
>>>>> rectangles
>>>>> would then become a hint to restrict the damage region.
>>>>
>>>> I'm not entirely following here - I thought for frontbuffer rendering
>>>> the
>>>> dirty rects have always just been a hint, and that the driver was always
>>>> free to re-upload the entire buffer to the screen.
>>>>
>>>> And through a helper like Rob's proposing here (and have floated around
>>>> in
>>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>>> call to a fake flip with dirty rect. Manual upload drivers already need
>>>> to
>>>> upload the entire screen if they get a flip, since some userspace uses
>>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>>
>>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>>
>>>>> Another approach would be to have the presence of dirty rects without
>>>>> framebuffer change to indicate frontbuffer rendering.
>>>>>
>>>>> I think I like the first approach best, although it may be tempting for
>>>>> user-space apps to just set the dirty bit instead of providing the full
>>>>> damage region.
>>>>
>>>> Or I'm not following you here, because I don't quite see the difference
>>>> between dirtyfb and a flip.
>>>> -Daniel
>>>
>>> OK, let me rephrase:
>>>
>>> From the driver's point-of-view, in the atomic world, new content and
>>> the
>>> need for manual upload is indicated by a change in fb attached to the
>>> plane.
>>>
>>> With Rob's patch here, (correct me if I'm wrong) in addition new content
>>> and
>>> the need for manual upload is identified by the dirty flag, (since the fb
>>> stays the same and the driver thus never identifies a page-flip).
>>
>> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>> (atomic or not) should result in the entire buffer getting uploaded. The
>> dirty flag is kinda redundant, a flip with the same buffer works the same
>> way as a dirtyfb with the entire buffer as the dirty rectangle.
>>
>>> In both these situations, clip rects can provide a hint to restrict the
>>> dirty region.
>>>
>>> Now I was thinking about the preferred way for user-space to communicate
>>> front buffer rendering through the atomic ioctl:
>>>
>>> 1) Expose a dirty (or content_age property)
>>> 2) Attach a clip-rect blob property.
>>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the
>>> same
>>> underlying buffer object.
>>>
>>> Are you saying that people are already using 3) and we should keep using
>>> that?
>>
>> I'm saying they're using 3b), flip the same buffer wrapped in the same
>> drm_framebuffer, and expect it to work.
>>
>> The only advantage dirtyfb has is that it allows you to supply the
>> optimized upload rectangles, but at the cost of a funny api (it's working
>> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
>> confirm when exactly you uploaded your stuff. But imo they should be the
>> same underlying operation.
>>
>
> FWIW, vmwgfx has always treated a dirtyfb as a pure front-buffer like
> rendering operation without any synchronization,
> We've guaranteed that only the rects that are present are uploaded, but only
> xf86-video-vmware has taken advantage of this to keep
> CPU- and GPU rendered content apart.
> I think we've at some point run into problems with number of cliprects, (Old
> KDE lock screen?) and use multiple dirtyfb for the same
> update...
Ok, if we can hit this in practices then I think it's ok to have the
limit. Just need to make sure userspace actually condenses the
cliprects down to something within the limit, since with atomic flips
you can't just do multiple updates - that would tear badly.
Wrt not syncing: I think general use pretty clearly says lots of
userspace renders into buffers with gpus (not even necessarily your
own) and then expects dirtyfb or a flip to upload all the bits.
Otherwise Rob Clark wouldn't need his patch. Given that I think we
need to make general semantics follow that requirement - I don't
expect it'll harm vmwgfx since it doesn't render into the frontbuffer
anyway?
> IIRC the reason for working with the fb, is that it's much easier for
> user-space, which doesn't have to care where planes are scanning out and
> why.
> "Present my new content on any screen that's affected".
Yeah, dirtyfb makes tons of sense for frontbuffer-rendering X, which
also doesn't do per-scanout pixmaps. But for atomic flips you really
want to flip on the crtc (or well plane), since otherwise with
multiple planes it comes up all teared up. vmwgfx doesn't care I guess
since you only have 1 primary plane, but all the SoC chips very much
do.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >>> 1) Expose a dirty (or content_age property)
> >>> 2) Attach a clip-rect blob property.
> >>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the
> >>> same
> >>> underlying buffer object.
> >>>
> >>> Are you saying that people are already using 3) and we should keep
> using
> >>> that?
> >>
> >> I'm saying they're using 3b), flip the same buffer wrapped in the same
> >> drm_framebuffer, and expect it to work.
> >>
> >> The only advantage dirtyfb has is that it allows you to supply the
> >> optimized upload rectangles, but at the cost of a funny api (it's working
> >> on the fb, not the plane/crtc you want to upload) and lack of drm_event
> to
> >> confirm when exactly you uploaded your stuff. But imo they should be
> the
> >> same underlying operation.
> >>
> >
> > FWIW, vmwgfx has always treated a dirtyfb as a pure front-buffer like
> > rendering operation without any synchronization,
> > We've guaranteed that only the rects that are present are uploaded, but
> only
> > xf86-video-vmware has taken advantage of this to keep
> > CPU- and GPU rendered content apart.
> > I think we've at some point run into problems with number of cliprects,
> (Old
> > KDE lock screen?) and use multiple dirtyfb for the same
> > update...
>
> Ok, if we can hit this in practices then I think it's ok to have the
> limit. Just need to make sure userspace actually condenses the
> cliprects down to something within the limit, since with atomic flips
> you can't just do multiple updates - that would tear badly.
So I think the conclusion is to have damage clip rect limit with proper
documentation stating limitation of doing multiple updates.
>
> Wrt not syncing: I think general use pretty clearly says lots of
> userspace renders into buffers with gpus (not even necessarily your
> own) and then expects dirtyfb or a flip to upload all the bits.
> Otherwise Rob Clark wouldn't need his patch. Given that I think we
> need to make general semantics follow that requirement - I don't
> expect it'll harm vmwgfx since it doesn't render into the frontbuffer
> anyway?
>
> > IIRC the reason for working with the fb, is that it's much easier for
> > user-space, which doesn't have to care where planes are scanning out and
> > why.
> > "Present my new content on any screen that's affected".
>
> Yeah, dirtyfb makes tons of sense for frontbuffer-rendering X, which
> also doesn't do per-scanout pixmaps. But for atomic flips you really
> want to flip on the crtc (or well plane), since otherwise with
> multiple planes it comes up all teared up. vmwgfx doesn't care I guess
> since you only have 1 primary plane, but all the SoC chips very much
> do.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - https://urldefense.proofpoint.com/v2/url?u=http-
> 3A__blog.ffwll.ch&d=DwIBaQ&c=uilaK90D4TOVoH58JNXRgQ&r=zOOG28inJK
> 0762SxAf-cyZdStnd2NQpRu98lJP2iYGw&m=XKgN7GPElFapBWdozPSC-
> 9rcfKmDvQC1QHhsFghexWc&s=SH9q5tw-
> UqpUBJVrr2v1mLpRo28Aau7iJ3YWlrgbPmU&e=
On Thu, Apr 5, 2018 at 9:30 AM, Thomas Hellstrom <[email protected]> wrote:
> On 04/04/2018 11:56 AM, Daniel Vetter wrote:
>>
>> On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:
>>>
>>> On 04/04/2018 10:43 AM, Daniel Vetter wrote:
>>>>
>>>> On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 04/04/2018 08:58 AM, Daniel Vetter wrote:
>>>>>>
>>>>>> On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Add an atomic helper to implement dirtyfb support. This is needed to
>>>>>>> support DSI command-mode panels with x11 userspace (ie. when we can't
>>>>>>> rely on pageflips to trigger a flush to the panel).
>>>>>>>
>>>>>>> To signal to the driver that the async atomic update needs to
>>>>>>> synchronize with fences, even though the fb didn't change, the
>>>>>>> drm_atomic_state::dirty flag is added.
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark <[email protected]>
>>>>>>> ---
>>>>>>> Background: there are a number of different folks working on getting
>>>>>>> upstream kernel working on various different phones/tablets with qcom
>>>>>>> SoC's.. many of them have command mode panels, so we kind of need a
>>>>>>> way to support the legacy dirtyfb ioctl for x11 support.
>>>>>>>
>>>>>>> I know there is work on a proprer non-legacy atomic property for
>>>>>>> userspace to communicate dirty-rect(s) to the kernel, so this can
>>>>>>> be improved from triggering a full-frame flush once that is in
>>>>>>> place. But we kinda needa a stop-gap solution.
>>>>>>>
>>>>>>> I had considered an in-driver solution for this, but things get a
>>>>>>> bit tricky if userspace ands up combining dirtyfb ioctls with page-
>>>>>>> flips, because we need to synchronize setting various CTL.FLUSH bits
>>>>>>> with setting the CTL.START bit. (ie. really all we need to do for
>>>>>>> cmd mode panels is bang CTL.START, but is this ends up racing with
>>>>>>> pageflips setting FLUSH bits, then bad things.) The easiest soln
>>>>>>> is to wrap this up as an atomic commit and rely on the worker to
>>>>>>> serialize things. Hence adding an atomic dirtyfb helper.
>>>>>>>
>>>>>>> I guess at least the helper, with some small addition to translate
>>>>>>> and pass-thru the dirty rect(s) is useful to the final atomic dirty-
>>>>>>> rect property solution. Depending on how far off that is, a stop-
>>>>>>> gap solution could be useful.
>>>>>>
>>>>>> Adding Noralf, who iirc already posted the full dirty helpers already
>>>>>> somewhere.
>>>>>> -Daniel
>>>>>
>>>>> I've asked Deepak to RFC the core changes suggested for the full dirty
>>>>> blob
>>>>> on dri-devel. It builds on DisplayLink's suggestion, with a simple
>>>>> helper to
>>>>> get to the desired coordinates.
>>>>>
>>>>> One thing to perhaps discuss is how we would like to fit this with
>>>>> front-buffer rendering and the dirty ioctl. In the page-flip context,
>>>>> the
>>>>> dirty rects, like egl's swapbuffer_with_damage is a hint to restrict
>>>>> the
>>>>> damage region that can be fully ignored by the driver, new content is
>>>>> indicated by a new framebuffer.
>>>>>
>>>>> We could do the same for frontbuffer rendering: Either set a dirty flag
>>>>> like
>>>>> you do here, or provide a content_age state member. Since we clear the
>>>>> dirty
>>>>> flag on state copies, I guess that would be sufficient. The blob
>>>>> rectangles
>>>>> would then become a hint to restrict the damage region.
>>>>
>>>> I'm not entirely following here - I thought for frontbuffer rendering
>>>> the
>>>> dirty rects have always just been a hint, and that the driver was always
>>>> free to re-upload the entire buffer to the screen.
>>>>
>>>> And through a helper like Rob's proposing here (and have floated around
>>>> in
>>>> different versions already) we'd essentially map a frontbuffer dirtyfb
>>>> call to a fake flip with dirty rect. Manual upload drivers already need
>>>> to
>>>> upload the entire screen if they get a flip, since some userspace uses
>>>> that to flush out frontbuffer rendering (instead of calling dirtyfb).
>>>>
>>>> So from that pov the new dirty flag is kinda not necessary imo.
>>>>
>>>>> Another approach would be to have the presence of dirty rects without
>>>>> framebuffer change to indicate frontbuffer rendering.
>>>>>
>>>>> I think I like the first approach best, although it may be tempting for
>>>>> user-space apps to just set the dirty bit instead of providing the full
>>>>> damage region.
>>>>
>>>> Or I'm not following you here, because I don't quite see the difference
>>>> between dirtyfb and a flip.
>>>> -Daniel
>>>
>>> OK, let me rephrase:
>>>
>>> From the driver's point-of-view, in the atomic world, new content and
>>> the
>>> need for manual upload is indicated by a change in fb attached to the
>>> plane.
>>>
>>> With Rob's patch here, (correct me if I'm wrong) in addition new content
>>> and
>>> the need for manual upload is identified by the dirty flag, (since the fb
>>> stays the same and the driver thus never identifies a page-flip).
>>
>> Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
>> (atomic or not) should result in the entire buffer getting uploaded. The
>> dirty flag is kinda redundant, a flip with the same buffer works the same
>> way as a dirtyfb with the entire buffer as the dirty rectangle.
>>
>>> In both these situations, clip rects can provide a hint to restrict the
>>> dirty region.
>>>
>>> Now I was thinking about the preferred way for user-space to communicate
>>> front buffer rendering through the atomic ioctl:
>>>
>>> 1) Expose a dirty (or content_age property)
>>> 2) Attach a clip-rect blob property.
>>> 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the
>>> same
>>> underlying buffer object.
>>>
>>> Are you saying that people are already using 3) and we should keep using
>>> that?
>>
>> I'm saying they're using 3b), flip the same buffer wrapped in the same
>> drm_framebuffer, and expect it to work.
>>
>> The only advantage dirtyfb has is that it allows you to supply the
>> optimized upload rectangles, but at the cost of a funny api (it's working
>> on the fb, not the plane/crtc you want to upload) and lack of drm_event to
>> confirm when exactly you uploaded your stuff. But imo they should be the
>> same underlying operation.
>>
>
> FWIW, vmwgfx has always treated a dirtyfb as a pure front-buffer like
> rendering operation without any synchronization,
I guess this works for you because dirtyfb msg to host and rendering
msgs to host end up serialized?
> We've guaranteed that only the rects that are present are uploaded, but only
> xf86-video-vmware has taken advantage of this to keep
> CPU- and GPU rendered content apart.
> I think we've at some point run into problems with number of cliprects, (Old
> KDE lock screen?) and use multiple dirtyfb for the same
> update...
fwiw, I haven't really looked at how it works on msm yet.. my memories
from omap where that we could set a single clip-rect on tx to panel
but we'd need to block tx of contents of next clip-rect until previous
was finished, so we kinda have a limit of 1.. and I expect msm (or
most other dsi drivers) are similar.. but I'm expecting that
serializing everything on atomic commit worker helps here
BR,
-R
>
> IIRC the reason for working with the fb, is that it's much easier for
> user-space, which doesn't have to care where planes are scanning out and
> why.
> "Present my new content on any screen that's affected".
>
> /Thomas
>
>