2021-03-08 08:26:12

by Paul Cercueil

Subject: [PATCH v2 0/5] Add option to mmap GEM buffers cached

Rework of my previous patchset which added support for GEM buffers
backed by non-coherent memory to the ingenic-drm driver.

Having GEM buffers backed by non-coherent memory is interesting in
the particular case where rendering to a non-coherent buffer and then
syncing the data cache is faster than rendering to a write-combine
buffer, and (by extension) much faster than using a shadow buffer.
This is true for instance on some Ingenic SoCs, where even simple
blits (e.g. memcpy) are about three times faster using this method.

For the record, the previous patchset was accepted for 5.10 then had
to be reverted, as it conflicted with some changes made to the DMA API.

This new patchset is pretty different as it adds the functionality to
the DRM core. The first three patches add variants to existing functions
but with the "non-coherent memory" twist, exported as GPL symbols. The
fourth patch adds a function to be used with the damage helpers.
Finally, the last patch adds support for non-coherent GEM buffers to the
ingenic-drm driver. The functionality is enabled through a module
parameter, and is disabled by default.
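
To give an idea of how a driver opts in, here is a rough sketch of the kind
of glue patch 5 adds. The parameter name and the assumption that
drm_gem_cma_dumb_create_noncoherent() takes the same arguments as
drm_gem_cma_dumb_create() are mine, not necessarily the exact ingenic-drm
code:

#include <linux/module.h>

#include <drm/drm_gem_cma_helper.h>

/* Hypothetical module parameter gating the non-coherent path. */
static bool cached_gem_buffers;
module_param(cached_gem_buffers, bool, 0400);
MODULE_PARM_DESC(cached_gem_buffers,
		 "Use fully cached GEM buffers backed by non-coherent memory");

static int example_dumb_create(struct drm_file *file_priv,
			       struct drm_device *drm,
			       struct drm_mode_create_dumb *args)
{
	/* Select the non-coherent variant only when the option is set. */
	if (cached_gem_buffers)
		return drm_gem_cma_dumb_create_noncoherent(file_priv, drm, args);

	return drm_gem_cma_dumb_create(file_priv, drm, args);
}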

Cheers,
-Paul

Paul Cercueil (5):
drm: Add and export function drm_gem_cma_create_noncoherent
drm: Add and export function drm_gem_cma_dumb_create_noncoherent
drm: Add and export function drm_gem_cma_mmap_noncoherent
drm: Add and export function drm_gem_cma_sync_data
drm/ingenic: Add option to alloc cached GEM buffers

drivers/gpu/drm/drm_gem_cma_helper.c | 223 +++++++++++++++++++---
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++-
drivers/gpu/drm/ingenic/ingenic-drm.h | 4 +
drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 +-
include/drm/drm_gem_cma_helper.h | 13 ++
5 files changed, 273 insertions(+), 30 deletions(-)

--
2.30.1


2021-03-08 08:26:38

by Paul Cercueil

Subject: [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data

This function can be used by drivers that use damage clips and have
CMA GEM objects backed by non-coherent memory. Calling this function
in a plane's .atomic_update ensures that all the data in the backing
memory have been written to RAM.
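
For illustration, a minimal sketch of how a plane's .atomic_update could use
it; the plane callback and the driver around it are hypothetical:

#include <drm/drm_gem_cma_helper.h>
#include <drm/drm_plane.h>

static void example_plane_atomic_update(struct drm_plane *plane,
					struct drm_plane_state *oldstate)
{
	struct drm_plane_state *state = plane->state;
	struct drm_device *drm = plane->dev;

	if (state && state->fb) {
		/* Write back CPU caches for the damaged lines before scanout. */
		drm_gem_cma_sync_data(drm->dev, oldstate, state);

		/* ... then program the hardware with the new framebuffer ... */
	}
}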

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/gpu/drm/drm_gem_cma_helper.c | 43 ++++++++++++++++++++++++++++
include/drm/drm_gem_cma_helper.h | 5 ++++
2 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index e39b0464e19d..fdae54a18670 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -18,9 +18,14 @@
#include <linux/slab.h>

#include <drm/drm.h>
+#include <drm/drm_damage_helper.h>
#include <drm/drm_device.h>
#include <drm/drm_drv.h>
+#include <drm/drm_fourcc.h>
+#include <drm/drm_fb_cma_helper.h>
+#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_plane.h>
#include <drm/drm_vma_manager.h>

/**
@@ -684,3 +689,41 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *dev,
return obj;
}
EXPORT_SYMBOL(drm_gem_cma_prime_import_sg_table_vmap);
+
+/**
+ * drm_gem_cma_sync_data - Sync GEM object to non-coherent backing memory
+ * @dev: DRM device
+ * @old_state: Old plane state
+ * @state: New plane state
+ *
+ * This function can be used by drivers that use damage clips and have
+ * CMA GEM objects backed by non-coherent memory. Calling this function
+ * in a plane's .atomic_update ensures that all the data in the backing
+ * memory have been written to RAM.
+ */
+void drm_gem_cma_sync_data(struct device *dev,
+			   struct drm_plane_state *old_state,
+			   struct drm_plane_state *state)
+{
+	const struct drm_format_info *finfo = state->fb->format;
+	struct drm_atomic_helper_damage_iter iter;
+	unsigned int offset, i;
+	struct drm_rect clip;
+	dma_addr_t daddr;
+
+	drm_atomic_helper_damage_iter_init(&iter, old_state, state);
+
+	drm_atomic_for_each_plane_damage(&iter, &clip) {
+		for (i = 0; i < finfo->num_planes; i++) {
+			daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
+
+			/* Ignore x1/x2 values, invalidate complete lines */
+			offset = clip.y1 * state->fb->pitches[i];
+
+			dma_sync_single_for_device(dev, daddr + offset,
+				(clip.y2 - clip.y1) * state->fb->pitches[i],
+				DMA_TO_DEVICE);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_sync_data);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 6a3f7e1312cc..cdd3fb456916 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -7,6 +7,7 @@
#include <drm/drm_gem.h>

struct drm_mode_create_dumb;
+struct drm_plane_state;

/**
* struct drm_gem_cma_object - GEM object backed by CMA memory allocations
@@ -190,4 +191,8 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *drm,
struct dma_buf_attachment *attach,
struct sg_table *sgt);

+void drm_gem_cma_sync_data(struct device *dev,
+ struct drm_plane_state *old_state,
+ struct drm_plane_state *state);
+
#endif /* __DRM_GEM_CMA_HELPER_H__ */
--
2.30.1

2021-03-08 08:47:03

by Thomas Zimmermann

Subject: Re: [PATCH v2 0/5] Add option to mmap GEM buffers cached

Hi Paul,

having individual functions for each mode only makes sense if the
decision is at compile time. But in patch 5, you're working around your
earlier design by introducing in-driver helpers that select the correct
CMA function.

In SHMEM helpers we have the flag map_wc in the GEM structure that
selects the pages caching mode (wc vs uncached). I think CMA should use
this design as well. Have a map_noncoherent flag in the CMA GEM object
and set it from the driver's implementation of gem_create_object.
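
Something along these lines, roughly; note that drm_gem_cma_object has no
map_noncoherent field today, so this is only a sketch of the suggested
design (the flag and the cached_gem_buffers parameter are hypothetical):

#include <linux/slab.h>

#include <drm/drm_gem_cma_helper.h>

static struct drm_gem_object *
example_gem_create_object(struct drm_device *drm, size_t size)
{
	struct drm_gem_cma_object *obj;

	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
	if (!obj)
		return NULL;

	/* Hypothetical field: decide the mapping mode per object, at allocation time. */
	obj->map_noncoherent = cached_gem_buffers;

	return &obj->base;
}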

And in the long run, we could try to consolidate all drivers/helpers
mapping flags in struct drm_gem_object.

Best regards
Thomas

On 07.03.21 at 21:28, Paul Cercueil wrote:
> Rework of my previous patchset which added support for GEM buffers
> backed by non-coherent memory to the ingenic-drm driver.
>
> Having GEM buffers backed by non-coherent memory is interesting in
> the particular case where it is faster to render to a non-coherent
> buffer then sync the data cache, than to render to a write-combine
> buffer, and (by extension) much faster than using a shadow buffer.
> This is true for instance on some Ingenic SoCs, where even simple
> blits (e.g. memcpy) are about three times faster using this method.
>
> For the record, the previous patchset was accepted for 5.10 then had
> to be reverted, as it conflicted with some changes made to the DMA API.
>
> This new patchset is pretty different as it adds the functionality to
> the DRM core. The first three patches add variants to existing functions
> but with the "non-coherent memory" twist, exported as GPL symbols. The
> fourth patch adds a function to be used with the damage helpers.
> Finally, the last patch adds support for non-coherent GEM buffers to the
> ingenic-drm driver. The functionality is enabled through a module
> parameter, and is disabled by default.
>
> Cheers,
> -Paul
>
> Paul Cercueil (5):
> drm: Add and export function drm_gem_cma_create_noncoherent
> drm: Add and export function drm_gem_cma_dumb_create_noncoherent
> drm: Add and export function drm_gem_cma_mmap_noncoherent
> drm: Add and export function drm_gem_cma_sync_data
> drm/ingenic: Add option to alloc cached GEM buffers
>
> drivers/gpu/drm/drm_gem_cma_helper.c | 223 +++++++++++++++++++---
> drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++-
> drivers/gpu/drm/ingenic/ingenic-drm.h | 4 +
> drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 +-
> include/drm/drm_gem_cma_helper.h | 13 ++
> 5 files changed, 273 insertions(+), 30 deletions(-)
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



2021-03-08 10:13:06

by Paul Cercueil

Subject: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent

This function can be used by drivers that need to mmap dumb buffers
created with non-coherent backing memory.

v2: Use dma_to_phys() since cma_obj->paddr isn't a phys_addr_t but a
dma_addr_t.
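
For context, this is the path exercised when userspace maps a dumb buffer
through the raw ioctl interface; a rough sketch in which the function name,
drm_fd, dumb_handle and size are made up and error handling is kept minimal:

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include <drm/drm.h>

static void *map_dumb_buffer(int drm_fd, uint32_t dumb_handle, size_t size)
{
	struct drm_mode_map_dumb map = { .handle = dumb_handle };
	void *ptr;

	/* Ask DRM for the fake mmap offset of the dumb buffer... */
	if (ioctl(drm_fd, DRM_IOCTL_MODE_MAP_DUMB, &map))
		return NULL;

	/* ...and map it; with this series the mapping can be fully cached. */
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   drm_fd, map.offset);

	return ptr == MAP_FAILED ? NULL : ptr;
}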

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/gpu/drm/drm_gem_cma_helper.c | 67 +++++++++++++++++++++++++---
include/drm/drm_gem_cma_helper.h | 1 +
2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index d100c5f9c140..e39b0464e19d 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -10,6 +10,7 @@
*/

#include <linux/dma-buf.h>
+#include <linux/dma-direct.h>
#include <linux/dma-mapping.h>
#include <linux/export.h>
#include <linux/mm.h>
@@ -42,10 +43,20 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
.vm_ops = &drm_gem_cma_vm_ops,
};

+static const struct drm_gem_object_funcs drm_gem_cma_noncoherent_funcs = {
+	.free = drm_gem_cma_free_object,
+	.print_info = drm_gem_cma_print_info,
+	.get_sg_table = drm_gem_cma_get_sg_table,
+	.vmap = drm_gem_cma_vmap,
+	.mmap = drm_gem_cma_mmap_noncoherent,
+	.vm_ops = &drm_gem_cma_vm_ops,
+};
+
/**
* __drm_gem_cma_create - Create a GEM CMA object without allocating memory
* @drm: DRM device
* @size: size of the object to allocate
+ * @noncoherent: if true, the buffer will be backed by non-coherent memory
*
* This function creates and initializes a GEM CMA object of the given size,
* but doesn't allocate any memory to back the object.
@@ -55,7 +66,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
* error code on failure.
*/
static struct drm_gem_cma_object *
-__drm_gem_cma_create(struct drm_device *drm, size_t size)
+__drm_gem_cma_create(struct drm_device *drm, size_t size, bool noncoherent)
{
struct drm_gem_cma_object *cma_obj;
struct drm_gem_object *gem_obj;
@@ -68,8 +79,12 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
if (!gem_obj)
return ERR_PTR(-ENOMEM);

-	if (!gem_obj->funcs)
-		gem_obj->funcs = &drm_gem_cma_default_funcs;
+	if (!gem_obj->funcs) {
+		if (noncoherent)
+			gem_obj->funcs = &drm_gem_cma_noncoherent_funcs;
+		else
+			gem_obj->funcs = &drm_gem_cma_default_funcs;
+	}

cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);

@@ -100,7 +115,7 @@ drm_gem_cma_create_with_cache_param(struct drm_device *drm,

size = round_up(size, PAGE_SIZE);

- cma_obj = __drm_gem_cma_create(drm, size);
+ cma_obj = __drm_gem_cma_create(drm, size, noncoherent);
if (IS_ERR(cma_obj))
return cma_obj;

@@ -503,7 +518,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(-EINVAL);

/* Create a CMA GEM buffer. */
- cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
+ cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, false);
if (IS_ERR(cma_obj))
return ERR_CAST(cma_obj);

@@ -579,6 +594,48 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
}
EXPORT_SYMBOL_GPL(drm_gem_cma_mmap);

+/**
+ * drm_gem_cma_mmap_noncoherent - memory-map a CMA GEM object with
+ * non-coherent cache attribute
+ * @obj: GEM object
+ * @vma: VMA for the area to be mapped
+ *
+ * Just like drm_gem_cma_mmap, but for a GEM object backed by non-coherent
+ * memory.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj,
+				 struct vm_area_struct *vma)
+{
+	struct drm_gem_cma_object *cma_obj;
+	unsigned long pfn;
+	int ret;
+
+	/*
+	 * Clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and set the
+	 * vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want to map
+	 * the whole buffer.
+	 */
+	vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
+	vma->vm_flags &= ~VM_PFNMAP;
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+
+	cma_obj = to_drm_gem_cma_obj(obj);
+
+	pfn = PHYS_PFN(dma_to_phys(cma_obj->base.dev->dev, cma_obj->paddr));
+
+	ret = remap_pfn_range(vma, vma->vm_start, pfn,
+			      vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
+	if (ret)
+		drm_gem_vm_close(vma);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_mmap_noncoherent);
+
/**
* drm_gem_cma_prime_import_sg_table_vmap - PRIME import another driver's
* scatter/gather table and get the virtual address of the buffer
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 6b44e7492a63..6a3f7e1312cc 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -107,6 +107,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
struct sg_table *sgt);
int drm_gem_cma_vmap(struct drm_gem_object *obj, struct dma_buf_map *map);
int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
+int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj, struct vm_area_struct *vma);

/**
* DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE - CMA GEM driver operations
--
2.30.1

2021-03-10 19:06:32

by Paul Cercueil

Subject: Re: [PATCH v2 0/5] Add option to mmap GEM buffers cached

Hi Thomas,

On Mon 8 March 2021 at 9:41, Thomas Zimmermann <[email protected]> wrote:
> Hi Paul,
>
> having individual functions for each mode only makes sense if the
> decision is at compile time. But in patch 5, you're working around
> your earlier design by introducing in-driver helpers that select the
> correct CMA function.
>
> In SHMEM helpers we have the flag map_wc in the GEM structure that
> selects the pages caching mode (wc vs uncached). I think CMA should
> use this design as well. Have a map_noncoherent flag in the CMA GEM
> object and set it from the driver's implementation of
> gem_create_object.

Is that a new addition? That severely reduces the patchset size, which
is perfect.

I'll send a V3 then.

Cheers,
-Paul

> And in the long run, we could try to consolidate all drivers/helpers
> mapping flags in struct drm_gem_object.
>
> Best regards
> Thomas
>
> On 07.03.21 at 21:28, Paul Cercueil wrote:
>> Rework of my previous patchset which added support for GEM buffers
>> backed by non-coherent memory to the ingenic-drm driver.
>>
>> Having GEM buffers backed by non-coherent memory is interesting in
>> the particular case where it is faster to render to a non-coherent
>> buffer then sync the data cache, than to render to a write-combine
>> buffer, and (by extension) much faster than using a shadow buffer.
>> This is true for instance on some Ingenic SoCs, where even simple
>> blits (e.g. memcpy) are about three times faster using this method.
>>
>> For the record, the previous patchset was accepted for 5.10 then had
>> to be reverted, as it conflicted with some changes made to the DMA
>> API.
>>
>> This new patchset is pretty different as it adds the functionality to
>> the DRM core. The first three patches add variants to existing
>> functions
>> but with the "non-coherent memory" twist, exported as GPL symbols.
>> The
>> fourth patch adds a function to be used with the damage helpers.
>> Finally, the last patch adds support for non-coherent GEM buffers to
>> the
>> ingenic-drm driver. The functionality is enabled through a module
>> parameter, and is disabled by default.
>>
>> Cheers,
>> -Paul
>>
>> Paul Cercueil (5):
>> drm: Add and export function drm_gem_cma_create_noncoherent
>> drm: Add and export function drm_gem_cma_dumb_create_noncoherent
>> drm: Add and export function drm_gem_cma_mmap_noncoherent
>> drm: Add and export function drm_gem_cma_sync_data
>> drm/ingenic: Add option to alloc cached GEM buffers
>>
>> drivers/gpu/drm/drm_gem_cma_helper.c | 223
>> +++++++++++++++++++---
>> drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++-
>> drivers/gpu/drm/ingenic/ingenic-drm.h | 4 +
>> drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 +-
>> include/drm/drm_gem_cma_helper.h | 13 ++
>> 5 files changed, 273 insertions(+), 30 deletions(-)
>>
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
>


2021-03-11 07:54:23

by Thomas Zimmermann

Subject: Re: [PATCH v2 0/5] Add option to mmap GEM buffers cached

Hi

On 10.03.21 at 20:02, Paul Cercueil wrote:
> Hi Thomas,
>
> On Mon 8 March 2021 at 9:41, Thomas Zimmermann <[email protected]> wrote:
>> Hi Paul,
>>
>> having individual functions for each mode only makes sense if the
>> decision is at compile time. But in patch 5, you're working around
>> your earlier design by introducing in-driver helpers that select the
>> correct CMA function.
>>
>> In SHMEM helpers we have the flag map_wc in the GEM structure that
>> selects the pages caching mode (wc vs uncached). I think CMA should

Re-reading this, it should rather be WC and cached.

>> use this design as well. Have a map_noncoherent flag in the CMA GEM
>> object and set it from the driver's implementation of gem_create_object.
>
> Is that a new addition? That severely reduces the patchset size, which
> is perfect.

It was added a few months ago, around the time you sent the first
version of the patches at hand.

Originally, SHMEM used write combining by default, and several drivers
used the default mapping flags instead (so usually cached). IIRC I
streamlined everything and flipped the default. Most drivers can use
cached mappings and only some require WC.
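
For reference, the SHMEM pattern referred to looks roughly like this in a
driver that still needs WC (the driver and function names are made up):

#include <linux/slab.h>

#include <drm/drm_gem_shmem_helper.h>

static struct drm_gem_object *
example_shmem_gem_create_object(struct drm_device *dev, size_t size)
{
	struct drm_gem_shmem_object *shmem;

	shmem = kzalloc(sizeof(*shmem), GFP_KERNEL);
	if (!shmem)
		return NULL;

	/* This hardware wants write-combined mappings rather than cached ones. */
	shmem->map_wc = true;

	return &shmem->base;
}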

Recently there was also a patchset that added support for cached video
RAM (or something like that). So at some point we could store these
flags in drm_gem_object. For now, I'd just put a flag into
drm_gem_cma_object.

>
> I'll send a V3 then.

Great!

Best regards
Thomas

>
> Cheers,
> -Paul
>
>> And in the long run, we could try to consolidate all drivers/helpers
>> mapping flags in struct drm_gem_object.
>>
>> Best regards
>> Thomas
>>
>> On 07.03.21 at 21:28, Paul Cercueil wrote:
>>> Rework of my previous patchset which added support for GEM buffers
>>> backed by non-coherent memory to the ingenic-drm driver.
>>>
>>> Having GEM buffers backed by non-coherent memory is interesting in
>>> the particular case where it is faster to render to a non-coherent
>>> buffer then sync the data cache, than to render to a write-combine
>>> buffer, and (by extension) much faster than using a shadow buffer.
>>> This is true for instance on some Ingenic SoCs, where even simple
>>> blits (e.g. memcpy) are about three times faster using this method.
>>>
>>> For the record, the previous patchset was accepted for 5.10 then had
>>> to be reverted, as it conflicted with some changes made to the DMA API.
>>>
>>> This new patchset is pretty different as it adds the functionality to
>>> the DRM core. The first three patches add variants to existing functions
>>> but with the "non-coherent memory" twist, exported as GPL symbols. The
>>> fourth patch adds a function to be used with the damage helpers.
>>> Finally, the last patch adds support for non-coherent GEM buffers to the
>>> ingenic-drm driver. The functionality is enabled through a module
>>> parameter, and is disabled by default.
>>>
>>> Cheers,
>>> -Paul
>>>
>>> Paul Cercueil (5):
>>>    drm: Add and export function drm_gem_cma_create_noncoherent
>>>    drm: Add and export function drm_gem_cma_dumb_create_noncoherent
>>>    drm: Add and export function drm_gem_cma_mmap_noncoherent
>>>    drm: Add and export function drm_gem_cma_sync_data
>>>    drm/ingenic: Add option to alloc cached GEM buffers
>>>
>>>   drivers/gpu/drm/drm_gem_cma_helper.c      | 223 +++++++++++++++++++---
>>>   drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  49 ++++-
>>>   drivers/gpu/drm/ingenic/ingenic-drm.h     |   4 +
>>>   drivers/gpu/drm/ingenic/ingenic-ipu.c     |  14 +-
>>>   include/drm/drm_gem_cma_helper.h          |  13 ++
>>>   5 files changed, 273 insertions(+), 30 deletions(-)
>>>
>>
>> --
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Software Solutions Germany GmbH
>> Maxfeldstr. 5, 90409 Nürnberg, Germany
>> (HRB 36809, AG Nürnberg)
>> Geschäftsführer: Felix Imendörffer
>>
>
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



2021-03-11 12:31:18

by Christoph Hellwig

Subject: Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent

> +int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj,
> + struct vm_area_struct *vma)
> +{
> + struct drm_gem_cma_object *cma_obj;
> + unsigned long pfn;
> + int ret;
> +
> + /*
> + * Clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and set the
> + * vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want to map
> + * the whole buffer.
> + */
> + vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
> + vma->vm_flags &= ~VM_PFNMAP;
> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +
> + cma_obj = to_drm_gem_cma_obj(obj);
> +
> + pfn = PHYS_PFN(dma_to_phys(cma_obj->base.dev->dev, cma_obj->paddr));
> +
> + ret = remap_pfn_range(vma, vma->vm_start, pfn,
> + vma->vm_end - vma->vm_start,
> + vma->vm_page_prot);

dma_to_phys must not be used by drivers.

I have a proper helper for this waiting for users:

http://git.infradead.org/users/hch/misc.git/commitdiff/96a546e7229ec53aadbdb7936d1e5e6cb5958952

If you can confirm the helper works for you I can try to still sneak
it to Linus for 5.12 to ease the merge pain.

2021-03-11 12:31:33

by Christoph Hellwig

Subject: Re: [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data

On Sun, Mar 07, 2021 at 08:28:34PM +0000, Paul Cercueil wrote:
> + drm_atomic_for_each_plane_damage(&iter, &clip) {
> + for (i = 0; i < finfo->num_planes; i++) {
> + daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
> +
> + /* Ignore x1/x2 values, invalidate complete lines */
> + offset = clip.y1 * state->fb->pitches[i];
> +
> + dma_sync_single_for_device(dev, daddr + offset,
> + (clip.y2 - clip.y1) * state->fb->pitches[i],
> + DMA_TO_DEVICE);

Are these helpers only ever used to transfer data to the device and
never from it? If so please clearly document that.

2021-03-11 12:34:49

by Paul Cercueil

Subject: Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent

Hi Christoph,

On Thu 11 March 2021 at 12:26, Christoph Hellwig <[email protected]> wrote:
>> +int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj,
>> + struct vm_area_struct *vma)
>> +{
>> + struct drm_gem_cma_object *cma_obj;
>> + unsigned long pfn;
>> + int ret;
>> +
>> + /*
>> + * Clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and
>> set the
>> + * vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want
>> to map
>> + * the whole buffer.
>> + */
>> + vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
>> + vma->vm_flags &= ~VM_PFNMAP;
>> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>> +
>> + cma_obj = to_drm_gem_cma_obj(obj);
>> +
>> + pfn = PHYS_PFN(dma_to_phys(cma_obj->base.dev->dev,
>> cma_obj->paddr));
>> +
>> + ret = remap_pfn_range(vma, vma->vm_start, pfn,
>> + vma->vm_end - vma->vm_start,
>> + vma->vm_page_prot);
>
> dma_to_phys must not be used by drivers.
>
> I have a proper helper for this waiting for users:
>
> http://git.infradead.org/users/hch/misc.git/commitdiff/96a546e7229ec53aadbdb7936d1e5e6cb5958952
>
> If you can confirm the helpers works for you I can try to still sneak
> it to Linus for 5.12 to ease the merge pain.

I can try. How do I get a page pointer from a dma_addr_t?

-Paul


2021-03-11 12:39:19

by Christoph Hellwig

Subject: Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent

On Thu, Mar 11, 2021 at 12:32:27PM +0000, Paul Cercueil wrote:
> > dma_to_phys must not be used by drivers.
> >
> > I have a proper helper for this waiting for users:
> >
> > http://git.infradead.org/users/hch/misc.git/commitdiff/96a546e7229ec53aadbdb7936d1e5e6cb5958952
> >
> > If you can confirm the helpers works for you I can try to still sneak
> > it to Linus for 5.12 to ease the merge pain.
>
> I can try. How do I get a page pointer from a dma_addr_t?

You don't - you get it from using virt_to_page on the pointer returned
from dma_alloc_noncoherent. That being said, to keep the API sane I
should probably add a wrapper that does that for you.
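
i.e., roughly this on the allocation side, using the <linux/dma-mapping.h>
API; a fragment that assumes a struct device *dev and a size from the
caller:

	void *vaddr;
	dma_addr_t dma_handle;
	struct page *page;

	/* Cached, non-coherent backing memory for the buffer. */
	vaddr = dma_alloc_noncoherent(dev, size, &dma_handle,
				      DMA_TO_DEVICE, GFP_KERNEL);
	if (!vaddr)
		return -ENOMEM;

	/* The returned kernel address maps straight back to its pages. */
	page = virt_to_page(vaddr);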

2021-03-12 11:54:26

by Paul Cercueil

Subject: Re: [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data



On Thu 11 March 2021 at 12:28, Christoph Hellwig <[email protected]> wrote:
> On Sun, Mar 07, 2021 at 08:28:34PM +0000, Paul Cercueil wrote:
>> + drm_atomic_for_each_plane_damage(&iter, &clip) {
>> + for (i = 0; i < finfo->num_planes; i++) {
>> + daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
>> +
>> + /* Ignore x1/x2 values, invalidate complete lines */
>> + offset = clip.y1 * state->fb->pitches[i];
>> +
>> + dma_sync_single_for_device(dev, daddr + offset,
>> + (clip.y2 - clip.y1) * state->fb->pitches[i],
>> + DMA_TO_DEVICE);
>
> Are these helpers only ever used to transfer data to the device and
> never from it? If so please clearly document that.

Yes. In the DRM world, are there cases where we transfer data from the
device? I assume these cases are handled by v4l2 instead.

-Paul


2021-03-12 11:54:32

by Paul Cercueil

Subject: Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent



On Thu 11 March 2021 at 12:36, Christoph Hellwig <[email protected]> wrote:
> On Thu, Mar 11, 2021 at 12:32:27PM +0000, Paul Cercueil wrote:
>> > dma_to_phys must not be used by drivers.
>> >
>> > I have a proper helper for this waiting for users:
>> >
>> >
>> http://git.infradead.org/users/hch/misc.git/commitdiff/96a546e7229ec53aadbdb7936d1e5e6cb5958952
>> >
>> > If you can confirm the helpers works for you I can try to still
>> sneak
>> > it to Linus for 5.12 to ease the merge pain.
>>
>> I can try. How do I get a page pointer from a dma_addr_t?
>
> You don't - you get it from using virt_to_page on the pointer returned
> from dma_alloc_noncoherent. That beind said to keep the API sane I
> should probably add a wrapper that does that for you.

I tested using:

ret = dma_mmap_pages(cma_obj->base.dev->dev,
		     vma, vma->vm_end - vma->vm_start,
		     virt_to_page(cma_obj->vaddr));

It works fine.

I think I can use remap_pfn_range() for now, and switch to your new API
once it's available in drm-misc-next.

Cheers,
-Paul


2021-03-12 16:39:08

by Christoph Hellwig

Subject: Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent

On Thu, Mar 11, 2021 at 04:12:55PM +0000, Paul Cercueil wrote:
> ret = dma_mmap_pages(cma_obj->base.dev->dev,
> vma, vma->vm_end - vma->vm_start,
> virt_to_page(cma_obj->vaddr));
>
> It works fine.
>
> I think I can use remap_pfn_range() for now, and switch to your new API once
> it's available in drm-misc-next.

No, drivers must not use dma_to_phys, and they also must not include
dma-direct.h.

2021-03-15 07:45:32

by Thomas Zimmermann

Subject: Re: [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data

Hi

On 11.03.21 at 13:33, Paul Cercueil wrote:
>
>
> On Thu 11 March 2021 at 12:28, Christoph Hellwig <[email protected]> wrote:
>> On Sun, Mar 07, 2021 at 08:28:34PM +0000, Paul Cercueil wrote:
>>>  +    drm_atomic_for_each_plane_damage(&iter, &clip) {
>>>  +        for (i = 0; i < finfo->num_planes; i++) {
>>>  +            daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
>>>  +
>>>  +            /* Ignore x1/x2 values, invalidate complete lines */
>>>  +            offset = clip.y1 * state->fb->pitches[i];
>>>  +
>>>  +            dma_sync_single_for_device(dev, daddr + offset,
>>>  +                       (clip.y2 - clip.y1) * state->fb->pitches[i],
>>>  +                       DMA_TO_DEVICE);
>>
>> Are these helpers only ever used to transfer data to the device and
>> never from it?  If so please clearly document that.
>
> Yes. In the DRM world, are there cases where we transfer data from the
> device? I assume these cases are handled by v4l2 instead.

Software rendering (i.e., anything wrt dumb buffers) likely reads back
framebuffer content during blit operations. For devices where this is a
slow operation (e.g., PCI read) we set struct
drm_mode_config.prefer_shadow to hint renderers to use shadow buffering.
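
For such devices the hint is a one-liner in the driver's mode-config setup,
roughly:

	/* Reading back from this framebuffer is slow; ask for a shadow FB. */
	drm->mode_config.prefer_shadow = 1;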

Best regards
Thomas

>
> -Paul
>
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



2021-03-15 12:02:41

by Paul Cercueil

Subject: Re: [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data

Hi Thomas,

On Mon 15 March 2021 at 8:43, Thomas Zimmermann <[email protected]> wrote:
> Hi
>
>> On 11.03.21 at 13:33, Paul Cercueil wrote:
>>
>>
>> On Thu 11 March 2021 at 12:28, Christoph Hellwig <[email protected]> wrote:
>>> On Sun, Mar 07, 2021 at 08:28:34PM +0000, Paul Cercueil wrote:
>>>> + drm_atomic_for_each_plane_damage(&iter, &clip) {
>>>> + for (i = 0; i < finfo->num_planes; i++) {
>>>> + daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
>>>> +
>>>> + /* Ignore x1/x2 values, invalidate complete lines */
>>>> + offset = clip.y1 * state->fb->pitches[i];
>>>> +
>>>> + dma_sync_single_for_device(dev, daddr + offset,
>>>> + (clip.y2 - clip.y1) *
>>>> state->fb->pitches[i],
>>>> + DMA_TO_DEVICE);
>>>
>>> Are these helpers only ever used to transfer data to the device and
>>> never from it? If so please clearly document that.
>>
>> Yes. In the DRM world, are there cases where we transfer data from
>> the device? I assume these cases are handled by v4l2 instead.
>
> Software rendering (i.e., anything wrt dumb buffers) likely reads
> back framebuffer content during blit operations. For devices where
> this is a slow operation (e.g., PCI read) we set struct
> drm_mode_config.prefer_shadow to hint renderers to use shadow
> buffering.

This has been brought up a few times already. I answered that in the
cover letter. In my case, *writes* (e.g. dumb memcpy) are also slower
with a write-combine buffer than with a non-coherent buffer + cache
sync. So a shadow buffer does nothing for me.

Cheers,
-Paul