2021-05-23 17:05:49

by Paul Cercueil

[permalink] [raw]
Subject: [PATCH v5 0/3] Add option to mmap GEM buffers cached

V5 of my patchset which adds the option for having GEM buffers backed by
non-coherent memory.

Changes from V4:

- [2/3]:
- Rename to drm_fb_cma_sync_non_coherent
- Invert loops for better cache locality
- Only sync BOs that have the non-coherent flag
- Properly sort includes
- Move to drm_fb_cma_helper.c to avoid circular dependency

- [3/3]:
- Fix drm_atomic_get_new_plane_state() used to retrieve the old
state
- Use custom drm_gem_fb_create()
- Only check damage clips and sync DMA buffers if non-coherent
buffers are used

Cheers,
-Paul

Paul Cercueil (3):
drm: Add support for GEM buffers backed by non-coherent memory
drm: Add and export function drm_fb_cma_sync_non_coherent
drm/ingenic: Add option to alloc cached GEM buffers

drivers/gpu/drm/drm_fb_cma_helper.c | 46 ++++++++++++++++++
drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++----
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 59 +++++++++++++++++++++--
drivers/gpu/drm/ingenic/ingenic-drm.h | 1 +
drivers/gpu/drm/ingenic/ingenic-ipu.c | 21 ++++++--
include/drm/drm_fb_cma_helper.h | 4 ++
include/drm/drm_gem_cma_helper.h | 3 ++
7 files changed, 156 insertions(+), 16 deletions(-)

--
2.30.2


2021-05-23 17:05:49

by Paul Cercueil

[permalink] [raw]
Subject: [PATCH v5 2/3] drm: Add and export function drm_fb_cma_sync_non_coherent

This function can be used by drivers that use damage clips and have
CMA GEM objects backed by non-coherent memory. Calling this function
in a plane's .atomic_update ensures that all the data in the backing
memory have been written to RAM.

v3: - Only sync data if using GEM objects backed by non-coherent memory.
- Use a drm_device pointer instead of device pointer in prototype

v5: - Rename to drm_fb_cma_sync_non_coherent
- Invert loops for better cache locality
- Only sync BOs that have the non-coherent flag
- Move to drm_fb_cma_helper.c to avoid circular dependency

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/gpu/drm/drm_fb_cma_helper.c | 46 +++++++++++++++++++++++++++++
include/drm/drm_fb_cma_helper.h | 4 +++
2 files changed, 50 insertions(+)

diff --git a/drivers/gpu/drm/drm_fb_cma_helper.c b/drivers/gpu/drm/drm_fb_cma_helper.c
index cb2349ad338d..69c57273b184 100644
--- a/drivers/gpu/drm/drm_fb_cma_helper.c
+++ b/drivers/gpu/drm/drm_fb_cma_helper.c
@@ -9,12 +9,14 @@
* Copyright (C) 2012 Red Hat
*/

+#include <drm/drm_damage_helper.h>
#include <drm/drm_fb_cma_helper.h>
#include <drm/drm_fourcc.h>
#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_cma_helper.h>
#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/drm_plane.h>
+#include <linux/dma-mapping.h>
#include <linux/module.h>

/**
@@ -97,3 +99,47 @@ dma_addr_t drm_fb_cma_get_gem_addr(struct drm_framebuffer *fb,
return paddr;
}
EXPORT_SYMBOL_GPL(drm_fb_cma_get_gem_addr);
+
+/**
+ * drm_fb_cma_sync_non_coherent - Sync GEM object to non-coherent backing
+ * memory
+ * @drm: DRM device
+ * @old_state: Old plane state
+ * @state: New plane state
+ *
+ * This function can be used by drivers that use damage clips and have
+ * CMA GEM objects backed by non-coherent memory. Calling this function
+ * in a plane's .atomic_update ensures that all the data in the backing
+ * memory have been written to RAM.
+ */
+void drm_fb_cma_sync_non_coherent(struct drm_device *drm,
+ struct drm_plane_state *old_state,
+ struct drm_plane_state *state)
+{
+ const struct drm_format_info *finfo = state->fb->format;
+ struct drm_atomic_helper_damage_iter iter;
+ const struct drm_gem_cma_object *cma_obj;
+ unsigned int offset, i;
+ struct drm_rect clip;
+ dma_addr_t daddr;
+ size_t nb_bytes;
+
+ for (i = 0; i < finfo->num_planes; i++) {
+ cma_obj = drm_fb_cma_get_gem_obj(state->fb, i);
+ if (!cma_obj->map_noncoherent)
+ continue;
+
+ daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
+ drm_atomic_helper_damage_iter_init(&iter, old_state, state);
+
+ drm_atomic_for_each_plane_damage(&iter, &clip) {
+ /* Ignore x1/x2 values, invalidate complete lines */
+ offset = clip.y1 * state->fb->pitches[i];
+
+ nb_bytes = (clip.y2 - clip.y1) * state->fb->pitches[i];
+ dma_sync_single_for_device(drm->dev, daddr + offset,
+ nb_bytes, DMA_TO_DEVICE);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(drm_fb_cma_sync_non_coherent);
diff --git a/include/drm/drm_fb_cma_helper.h b/include/drm/drm_fb_cma_helper.h
index 795aea1d0a25..3a177632b17e 100644
--- a/include/drm/drm_fb_cma_helper.h
+++ b/include/drm/drm_fb_cma_helper.h
@@ -14,5 +14,9 @@ dma_addr_t drm_fb_cma_get_gem_addr(struct drm_framebuffer *fb,
struct drm_plane_state *state,
unsigned int plane);

+void drm_fb_cma_sync_non_coherent(struct drm_device *drm,
+ struct drm_plane_state *old_state,
+ struct drm_plane_state *state);
+
#endif

--
2.30.2

2021-05-23 17:05:49

by Paul Cercueil

[permalink] [raw]
Subject: [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory

Having GEM buffers backed by non-coherent memory is interesting in the
particular case where it is faster to render to a non-coherent buffer
then sync the data cache, than to render to a write-combine buffer, and
(by extension) much faster than using a shadow buffer. This is true for
instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
are about three times faster using this method.

Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
can be set by the drivers when they create the dumb buffer.

Since this really only applies to software rendering, disable this flag
as soon as the CMA objects are exported via PRIME.

v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
the objects are mapped, and use the new dma_mmap_pages function.

v4: Make sure map_noncoherent is always disabled when creating GEM
objects meant to be used with dma-buf.

Signed-off-by: Paul Cercueil <[email protected]>
Acked-by: Thomas Zimmermann <[email protected]>
---
drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
include/drm/drm_gem_cma_helper.h | 3 +++
2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 7942cf05cd93..235c7a63da2b 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
* __drm_gem_cma_create - Create a GEM CMA object without allocating memory
* @drm: DRM device
* @size: size of the object to allocate
+ * @private: true if used for internal purposes
*
* This function creates and initializes a GEM CMA object of the given size,
* but doesn't allocate any memory to back the object.
@@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
* error code on failure.
*/
static struct drm_gem_cma_object *
-__drm_gem_cma_create(struct drm_device *drm, size_t size)
+__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
{
struct drm_gem_cma_object *cma_obj;
struct drm_gem_object *gem_obj;
- int ret;
+ int ret = 0;

if (drm->driver->gem_create_object)
gem_obj = drm->driver->gem_create_object(drm, size);
@@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)

cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);

- ret = drm_gem_object_init(drm, gem_obj, size);
+ if (private) {
+ drm_gem_private_object_init(drm, gem_obj, size);
+
+ /* Always use writecombine for dma-buf mappings */
+ cma_obj->map_noncoherent = false;
+ } else {
+ ret = drm_gem_object_init(drm, gem_obj, size);
+ }
if (ret)
goto error;

@@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,

size = round_up(size, PAGE_SIZE);

- cma_obj = __drm_gem_cma_create(drm, size);
+ cma_obj = __drm_gem_cma_create(drm, size, false);
if (IS_ERR(cma_obj))
return cma_obj;

- cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
- GFP_KERNEL | __GFP_NOWARN);
+ if (cma_obj->map_noncoherent) {
+ cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
+ &cma_obj->paddr,
+ DMA_TO_DEVICE,
+ GFP_KERNEL | __GFP_NOWARN);
+ } else {
+ cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
+ GFP_KERNEL | __GFP_NOWARN);
+ }
if (!cma_obj->vaddr) {
drm_dbg(drm, "failed to allocate buffer with size %zu\n",
size);
@@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(-EINVAL);

/* Create a CMA GEM buffer. */
- cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
+ cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
if (IS_ERR(cma_obj))
return ERR_CAST(cma_obj);

@@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)

cma_obj = to_drm_gem_cma_obj(obj);

- ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
- cma_obj->paddr, vma->vm_end - vma->vm_start);
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+ if (!cma_obj->map_noncoherent)
+ vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+ ret = dma_mmap_pages(cma_obj->base.dev->dev,
+ vma, vma->vm_end - vma->vm_start,
+ virt_to_page(cma_obj->vaddr));
if (ret)
drm_gem_vm_close(vma);

diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 0a9711caa3e8..cd13508acbc1 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -16,6 +16,7 @@ struct drm_mode_create_dumb;
* more than one entry but they are guaranteed to have contiguous
* DMA addresses.
* @vaddr: kernel virtual address of the backing memory
+ * @map_noncoherent: if true, the GEM object is backed by non-coherent memory
*/
struct drm_gem_cma_object {
struct drm_gem_object base;
@@ -24,6 +25,8 @@ struct drm_gem_cma_object {

/* For objects with DMA memory allocated by GEM CMA */
void *vaddr;
+
+ bool map_noncoherent;
};

#define to_drm_gem_cma_obj(gem_obj) \
--
2.30.2

2021-05-23 17:05:57

by Paul Cercueil

[permalink] [raw]
Subject: [PATCH v5 3/3] drm/ingenic: Add option to alloc cached GEM buffers

Alloc GEM buffers backed by noncoherent memory on SoCs where it is
actually faster than write-combine.

This dramatically speeds up software rendering on these SoCs, even for
tasks where write-combine memory should in theory be faster (e.g. simple
blits).

v3: The option is now selected per-SoC instead of being a module
parameter.

v5: - Fix drm_atomic_get_new_plane_state() used to retrieve the old
state
- Use custom drm_gem_fb_create()
- Only check damage clips and sync DMA buffers if non-coherent
buffers are used

Signed-off-by: Paul Cercueil <[email protected]>
---
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 59 +++++++++++++++++++++--
drivers/gpu/drm/ingenic/ingenic-drm.h | 1 +
drivers/gpu/drm/ingenic/ingenic-ipu.c | 21 ++++++--
3 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 389cad59e090..5244f4763477 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -9,6 +9,7 @@
#include <linux/component.h>
#include <linux/clk.h>
#include <linux/dma-mapping.h>
+#include <linux/io.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/of_device.h>
@@ -23,6 +24,7 @@
#include <drm/drm_color_mgmt.h>
#include <drm/drm_crtc.h>
#include <drm/drm_crtc_helper.h>
+#include <drm/drm_damage_helper.h>
#include <drm/drm_drv.h>
#include <drm/drm_encoder.h>
#include <drm/drm_gem_cma_helper.h>
@@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
struct jz_soc_info {
bool needs_dev_clk;
bool has_osd;
+ bool map_noncoherent;
unsigned int max_width, max_height;
const u32 *formats_f0, *formats_f1;
unsigned int num_formats_f0, num_formats_f1;
@@ -410,6 +413,9 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane,
old_plane_state->fb->format->format != new_plane_state->fb->format->format))
crtc_state->mode_changed = true;

+ if (priv->soc_info->map_noncoherent)
+ drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
return 0;
}

@@ -526,6 +532,13 @@ void ingenic_drm_plane_config(struct device *dev,
}
}

+bool ingenic_drm_map_noncoherent(const struct device *dev)
+{
+ const struct ingenic_drm *priv = dev_get_drvdata(dev);
+
+ return priv->soc_info->map_noncoherent;
+}
+
static void ingenic_drm_update_palette(struct ingenic_drm *priv,
const struct drm_color_lut *lut)
{
@@ -544,8 +557,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
struct drm_atomic_state *state)
{
struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
- struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
- plane);
+ struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+ struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
struct drm_crtc_state *crtc_state;
struct ingenic_dma_hwdesc *hwdesc;
unsigned int width, height, cpp, offset;
@@ -553,6 +566,9 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
u32 fourcc;

if (newstate && newstate->fb) {
+ if (priv->soc_info->map_noncoherent)
+ drm_fb_cma_sync_non_coherent(&priv->drm, oldstate, newstate);
+
crtc_state = newstate->crtc->state;

addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -742,6 +758,33 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc)
regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
}

+static struct drm_framebuffer *
+ingenic_drm_gem_fb_create(struct drm_device *drm, struct drm_file *file,
+ const struct drm_mode_fb_cmd2 *mode_cmd)
+{
+ struct ingenic_drm *priv = drm_device_get_priv(drm);
+
+ if (priv->soc_info->map_noncoherent)
+ return drm_gem_fb_create_with_dirty(drm, file, mode_cmd);
+
+ return drm_gem_fb_create(drm, file, mode_cmd);
+}
+
+static struct drm_gem_object *
+ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
+{
+ struct ingenic_drm *priv = drm_device_get_priv(drm);
+ struct drm_gem_cma_object *obj;
+
+ obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+ if (!obj)
+ return ERR_PTR(-ENOMEM);
+
+ obj->map_noncoherent = priv->soc_info->map_noncoherent;
+
+ return &obj->base;
+}
+
DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);

static const struct drm_driver ingenic_drm_driver_data = {
@@ -754,6 +797,7 @@ static const struct drm_driver ingenic_drm_driver_data = {
.patchlevel = 0,

.fops = &ingenic_drm_fops,
+ .gem_create_object = ingenic_drm_gem_create_object,
DRM_GEM_CMA_DRIVER_OPS,

.irq_handler = ingenic_drm_irq_handler,
@@ -804,7 +848,7 @@ static const struct drm_encoder_helper_funcs ingenic_drm_encoder_helper_funcs =
};

static const struct drm_mode_config_funcs ingenic_drm_mode_config_funcs = {
- .fb_create = drm_gem_fb_create,
+ .fb_create = ingenic_drm_gem_fb_create,
.output_poll_changed = drm_fb_helper_output_poll_changed,
.atomic_check = drm_atomic_helper_check,
.atomic_commit = drm_atomic_helper_commit,
@@ -961,6 +1005,9 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
return ret;
}

+ if (soc_info->map_noncoherent)
+ drm_plane_enable_fb_damage_clips(&priv->f1);
+
drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);

ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -989,6 +1036,9 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
return ret;
}

+ if (soc_info->map_noncoherent)
+ drm_plane_enable_fb_damage_clips(&priv->f0);
+
if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
ret = component_bind_all(dev, drm);
if (ret) {
@@ -1245,6 +1295,7 @@ static const u32 jz4770_formats_f0[] = {
static const struct jz_soc_info jz4740_soc_info = {
.needs_dev_clk = true,
.has_osd = false,
+ .map_noncoherent = false,
.max_width = 800,
.max_height = 600,
.formats_f1 = jz4740_formats,
@@ -1255,6 +1306,7 @@ static const struct jz_soc_info jz4740_soc_info = {
static const struct jz_soc_info jz4725b_soc_info = {
.needs_dev_clk = false,
.has_osd = true,
+ .map_noncoherent = false,
.max_width = 800,
.max_height = 600,
.formats_f1 = jz4725b_formats_f1,
@@ -1266,6 +1318,7 @@ static const struct jz_soc_info jz4725b_soc_info = {
static const struct jz_soc_info jz4770_soc_info = {
.needs_dev_clk = false,
.has_osd = true,
+ .map_noncoherent = true,
.max_width = 1280,
.max_height = 720,
.formats_f1 = jz4770_formats_f1,
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h b/drivers/gpu/drm/ingenic/ingenic-drm.h
index 1b4347f7f084..22654ac1dde1 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm.h
+++ b/drivers/gpu/drm/ingenic/ingenic-drm.h
@@ -184,6 +184,7 @@ struct platform_driver;
void ingenic_drm_plane_config(struct device *dev,
struct drm_plane *plane, u32 fourcc);
void ingenic_drm_plane_disable(struct device *dev, struct drm_plane *plane);
+bool ingenic_drm_map_noncoherent(const struct device *dev);

extern struct platform_driver *ingenic_ipu_driver_ptr;

diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c
index 3b1091e7c0cd..61b6d9fdbba1 100644
--- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
+++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
@@ -20,10 +20,13 @@

#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>
+#include <drm/drm_damage_helper.h>
#include <drm/drm_drv.h>
#include <drm/drm_fb_cma_helper.h>
#include <drm/drm_fourcc.h>
#include <drm/drm_gem_atomic_helper.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/drm_plane.h>
#include <drm/drm_plane_helper.h>
#include <drm/drm_property.h>
@@ -285,8 +288,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
struct drm_atomic_state *state)
{
struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
- struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
- plane);
+ struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+ struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
const struct drm_format_info *finfo;
u32 ctrl, stride = 0, coef_index = 0, format = 0;
bool needs_modeset, upscaling_w, upscaling_h;
@@ -317,6 +320,9 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
}

+ if (ingenic_drm_map_noncoherent(ipu->master))
+ drm_fb_cma_sync_non_coherent(ipu->drm, oldstate, newstate);
+
/* New addresses will be committed in vblank handler... */
ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
if (finfo->num_planes > 1)
@@ -541,7 +547,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,

if (!new_plane_state->crtc ||
!crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
- return 0;
+ goto out_check_damage;

/* Plane must be fully visible */
if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
@@ -558,7 +564,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
return -EINVAL;

if (!osd_changed(new_plane_state, old_plane_state))
- return 0;
+ goto out_check_damage;

crtc_state->mode_changed = true;

@@ -592,6 +598,10 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
ipu->denom_w = denom_w;
ipu->denom_h = denom_h;

+out_check_damage:
+ if (ingenic_drm_map_noncoherent(ipu->master))
+ drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
return 0;
}

@@ -773,6 +783,9 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d)
return err;
}

+ if (ingenic_drm_map_noncoherent(master))
+ drm_plane_enable_fb_damage_clips(plane);
+
/*
* Sharpness settings range is [0,32]
* 0 : nearest-neighbor
--
2.30.2

2021-05-23 18:42:51

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v5 2/3] drm: Add and export function drm_fb_cma_sync_non_coherent

Hi Paul,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.13-rc2 next-20210521]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Paul-Cercueil/Add-option-to-mmap-GEM-buffers-cached/20210524-010735
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4d7620341eda38573a73ab63c33423534fa38eb9
config: powerpc-randconfig-r013-20210524 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project b4fd512c36ca344a3ff69350219e8b0a67e9472a)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# https://github.com/0day-ci/linux/commit/2dea3091811eb397337113498cf675a383412c59
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Paul-Cercueil/Add-option-to-mmap-GEM-buffers-cached/20210524-010735
git checkout 2dea3091811eb397337113498cf675a383412c59
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:43:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(insb, (unsigned long p, void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:84:1: note: expanded from here
__do_insb
^
arch/powerpc/include/asm/io.h:556:56: note: expanded from macro '__do_insb'
#define __do_insb(p, b, n) readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
In file included from include/linux/dma-buf.h:16:
In file included from include/linux/dma-buf-map.h:9:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:86:1: note: expanded from here
__do_insw
^
arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw'
#define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
In file included from include/linux/dma-buf.h:16:
In file included from include/linux/dma-buf-map.h:9:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:88:1: note: expanded from here
__do_insl
^
arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl'
#define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
In file included from include/linux/dma-buf.h:16:
In file included from include/linux/dma-buf-map.h:9:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:90:1: note: expanded from here
__do_outsb
^
arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb'
#define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
In file included from include/linux/dma-buf.h:16:
In file included from include/linux/dma-buf-map.h:9:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:92:1: note: expanded from here
__do_outsw
^
arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw'
#define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
In file included from include/linux/dma-buf.h:16:
In file included from include/linux/dma-buf-map.h:9:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:94:1: note: expanded from here
__do_outsl
^
arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl'
#define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from drivers/gpu/drm/pl111/pl111_display.c:18:
>> include/drm/drm_fb_cma_helper.h:17:42: warning: declaration of 'struct drm_device' will not be visible outside of this function [-Wvisibility]
void drm_fb_cma_sync_non_coherent(struct drm_device *drm,
^
7 warnings generated.


vim +17 include/drm/drm_fb_cma_helper.h

9
10 struct drm_gem_cma_object *drm_fb_cma_get_gem_obj(struct drm_framebuffer *fb,
11 unsigned int plane);
12
13 dma_addr_t drm_fb_cma_get_gem_addr(struct drm_framebuffer *fb,
14 struct drm_plane_state *state,
15 unsigned int plane);
16
> 17 void drm_fb_cma_sync_non_coherent(struct drm_device *drm,
18 struct drm_plane_state *old_state,
19 struct drm_plane_state *state);
20

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (8.77 kB)
.config.gz (41.69 kB)
Download all attachments

2021-05-23 19:09:14

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [PATCH v5 0/3] Add option to mmap GEM buffers cached

Hi

Am 23.05.21 um 19:04 schrieb Paul Cercueil:
> V5 of my patchset which adds the option for having GEM buffers backed by
> non-coherent memory.
>
> Changes from V4:
>
> - [2/3]:
> - Rename to drm_fb_cma_sync_non_coherent
> - Invert loops for better cache locality
> - Only sync BOs that have the non-coherent flag
> - Properly sort includes
> - Move to drm_fb_cma_helper.c to avoid circular dependency

I'm pretty sure it's still not the right place. That would be something
like drm_gem_cma_atomic_helper.c, but creating a new file just for a
single function doesn't make sense.

>
> - [3/3]:
> - Fix drm_atomic_get_new_plane_state() used to retrieve the old
> state
> - Use custom drm_gem_fb_create()

It's often a better choice to express such differences via different
data structures (i.e., different instances of drm_mode_config_funcs) but
it's not a big deal either.

Please go ahaed and merge if no one objects. All patches:

Acked-by: Thomas Zimmermann <[email protected]>

Best regards
Thomas

> - Only check damage clips and sync DMA buffers if non-coherent
> buffers are used
>
> Cheers,
> -Paul
>
> Paul Cercueil (3):
> drm: Add support for GEM buffers backed by non-coherent memory
> drm: Add and export function drm_fb_cma_sync_non_coherent
> drm/ingenic: Add option to alloc cached GEM buffers
>
> drivers/gpu/drm/drm_fb_cma_helper.c | 46 ++++++++++++++++++
> drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++----
> drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 59 +++++++++++++++++++++--
> drivers/gpu/drm/ingenic/ingenic-drm.h | 1 +
> drivers/gpu/drm/ingenic/ingenic-ipu.c | 21 ++++++--
> include/drm/drm_fb_cma_helper.h | 4 ++
> include/drm/drm_gem_cma_helper.h | 3 ++
> 7 files changed, 156 insertions(+), 16 deletions(-)
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


Attachments:
OpenPGP_signature (855.00 B)
OpenPGP digital signature

2021-05-23 19:23:03

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 0/3] Add option to mmap GEM buffers cached

Hi Thomas,

Le dim., mai 23 2021 at 21:05:30 +0200, Thomas Zimmermann
<[email protected]> a ?crit :
> Hi
>
> Am 23.05.21 um 19:04 schrieb Paul Cercueil:
>> V5 of my patchset which adds the option for having GEM buffers
>> backed by
>> non-coherent memory.
>>
>> Changes from V4:
>>
>> - [2/3]:
>> - Rename to drm_fb_cma_sync_non_coherent
>> - Invert loops for better cache locality
>> - Only sync BOs that have the non-coherent flag
>> - Properly sort includes
>> - Move to drm_fb_cma_helper.c to avoid circular dependency
>
> I'm pretty sure it's still not the right place. That would be
> something like drm_gem_cma_atomic_helper.c, but creating a new file
> just for a single function doesn't make sense.

drm_fb_cma_sync_non_coherent calls drm_fb_cma_* functions, so it's a
better match than its former location (which wasn't good as it created
a circular dependency between drm.ko and drm-kms-helper.ko).

Do you have a better idea?

>>
>> - [3/3]:
>> - Fix drm_atomic_get_new_plane_state() used to retrieve the old
>> state
>> - Use custom drm_gem_fb_create()
>
> It's often a better choice to express such differences via different
> data structures (i.e., different instances of drm_mode_config_funcs)
> but it's not a big deal either.

The different drm_mode_config_funcs instances already exist in
drm_gem_framebuffer_helper.c but are static, and drm_gem_fb_create()
and drm_gem_fb_create_with_dirty() are just tiny wrappers around
drm_gem_fb_create_with_funcs() with the corresponding
drm_mode_config_funcs instance. I didn't want to copy them to
ingenic-drm-drv.c, but maybe I can export the symbols and use
drm_gem_fb_create_with_funcs() directly?

> Please go ahaed and merge if no one objects. All patches:
>
> Acked-by: Thomas Zimmermann <[email protected]>

Thanks!

Cheers,
-Paul

> Best regards
> Thomas
>
>> - Only check damage clips and sync DMA buffers if non-coherent
>> buffers are used
>>
>> Cheers,
>> -Paul
>>
>> Paul Cercueil (3):
>> drm: Add support for GEM buffers backed by non-coherent memory
>> drm: Add and export function drm_fb_cma_sync_non_coherent
>> drm/ingenic: Add option to alloc cached GEM buffers
>>
>> drivers/gpu/drm/drm_fb_cma_helper.c | 46 ++++++++++++++++++
>> drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++----
>> drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 59
>> +++++++++++++++++++++--
>> drivers/gpu/drm/ingenic/ingenic-drm.h | 1 +
>> drivers/gpu/drm/ingenic/ingenic-ipu.c | 21 ++++++--
>> include/drm/drm_fb_cma_helper.h | 4 ++
>> include/drm/drm_gem_cma_helper.h | 3 ++
>> 7 files changed, 156 insertions(+), 16 deletions(-)
>>
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 N?rnberg, Germany
> (HRB 36809, AG N?rnberg)
> Gesch?ftsf?hrer: Felix Imend?rffer
>


2021-05-25 10:36:36

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [PATCH v5 0/3] Add option to mmap GEM buffers cached

Hi

Am 23.05.21 um 21:19 schrieb Paul Cercueil:
> Hi Thomas,
>
> Le dim., mai 23 2021 at 21:05:30 +0200, Thomas Zimmermann
> <[email protected]> a écrit :
>> Hi
>>
>> Am 23.05.21 um 19:04 schrieb Paul Cercueil:
>>> V5 of my patchset which adds the option for having GEM buffers backed
by
>>> non-coherent memory.
>>>
>>> Changes from V4:
>>>
>>> - [2/3]:
>>>      - Rename to drm_fb_cma_sync_non_coherent
>>>      - Invert loops for better cache locality
>>>      - Only sync BOs that have the non-coherent flag
>>>      - Properly sort includes
>>>      - Move to drm_fb_cma_helper.c to avoid circular dependency
>>
>> I'm pretty sure it's still not the right place. That would be
>> something like drm_gem_cma_atomic_helper.c, but creating a new file
>> just for a single function doesn't make sense.
>
> drm_fb_cma_sync_non_coherent calls drm_fb_cma_* functions, so it's a
> better match than its former location (which wasn't good as it created a
> circular dependency between drm.ko and drm-kms-helper.ko).
>
> Do you have a better idea?

No, it was more of a rant than a review comment. Please go ahead and
merge the patchset.

Best regards
Thomas


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


Attachments:
OpenPGP_signature (855.00 B)
OpenPGP digital signature

2021-05-27 10:43:36

by Tomi Valkeinen

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory

On 23/05/2021 20:04, Paul Cercueil wrote:
> Having GEM buffers backed by non-coherent memory is interesting in the
> particular case where it is faster to render to a non-coherent buffer
> then sync the data cache, than to render to a write-combine buffer, and
> (by extension) much faster than using a shadow buffer. This is true for
> instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
> are about three times faster using this method.
>
> Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
> can be set by the drivers when they create the dumb buffer.
>
> Since this really only applies to software rendering, disable this flag
> as soon as the CMA objects are exported via PRIME.
>
> v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
> the objects are mapped, and use the new dma_mmap_pages function.
>
> v4: Make sure map_noncoherent is always disabled when creating GEM
> objects meant to be used with dma-buf.
>
> Signed-off-by: Paul Cercueil <[email protected]>
> Acked-by: Thomas Zimmermann <[email protected]>
> ---
> drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
> include/drm/drm_gem_cma_helper.h | 3 +++
> 2 files changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 7942cf05cd93..235c7a63da2b 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
> * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
> * @drm: DRM device
> * @size: size of the object to allocate
> + * @private: true if used for internal purposes
> *
> * This function creates and initializes a GEM CMA object of the given size,
> * but doesn't allocate any memory to back the object.
> @@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
> * error code on failure.
> */
> static struct drm_gem_cma_object *
> -__drm_gem_cma_create(struct drm_device *drm, size_t size)
> +__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
> {
> struct drm_gem_cma_object *cma_obj;
> struct drm_gem_object *gem_obj;
> - int ret;
> + int ret = 0;
>
> if (drm->driver->gem_create_object)
> gem_obj = drm->driver->gem_create_object(drm, size);
> @@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
>
> cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
>
> - ret = drm_gem_object_init(drm, gem_obj, size);
> + if (private) {
> + drm_gem_private_object_init(drm, gem_obj, size);
> +
> + /* Always use writecombine for dma-buf mappings */
> + cma_obj->map_noncoherent = false;
> + } else {
> + ret = drm_gem_object_init(drm, gem_obj, size);
> + }
> if (ret)
> goto error;
>
> @@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
>
> size = round_up(size, PAGE_SIZE);
>
> - cma_obj = __drm_gem_cma_create(drm, size);
> + cma_obj = __drm_gem_cma_create(drm, size, false);
> if (IS_ERR(cma_obj))
> return cma_obj;
>
> - cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> - GFP_KERNEL | __GFP_NOWARN);
> + if (cma_obj->map_noncoherent) {
> + cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
> + &cma_obj->paddr,
> + DMA_TO_DEVICE,
> + GFP_KERNEL | __GFP_NOWARN);
> + } else {
> + cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> + GFP_KERNEL | __GFP_NOWARN);
> + }
> if (!cma_obj->vaddr) {
> drm_dbg(drm, "failed to allocate buffer with size %zu\n",
> size);
> @@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
> return ERR_PTR(-EINVAL);
>
> /* Create a CMA GEM buffer. */
> - cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
> + cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
> if (IS_ERR(cma_obj))
> return ERR_CAST(cma_obj);
>
> @@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
>
> cma_obj = to_drm_gem_cma_obj(obj);
>
> - ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
> - cma_obj->paddr, vma->vm_end - vma->vm_start);
> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> + if (!cma_obj->map_noncoherent)
> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> + ret = dma_mmap_pages(cma_obj->base.dev->dev,
> + vma, vma->vm_end - vma->vm_start,
> + virt_to_page(cma_obj->vaddr));

This breaks mmap on TI's J7 EVM (tidss driver). All DRM apps just die
when doing mmap. Changing these lines back to dma_mmap_wc() makes it work.

Is dma_alloc_wc() even compatible with dma_mmap_pages()?

Tomi

2021-05-27 21:48:25

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory

Hi Tomi,

Le jeu., mai 27 2021 at 13:40:07 +0300, Tomi Valkeinen
<[email protected]> a ?crit :
> On 23/05/2021 20:04, Paul Cercueil wrote:
>> Having GEM buffers backed by non-coherent memory is interesting in
>> the
>> particular case where it is faster to render to a non-coherent buffer
>> then sync the data cache, than to render to a write-combine buffer,
>> and
>> (by extension) much faster than using a shadow buffer. This is true
>> for
>> instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
>> are about three times faster using this method.
>>
>> Add a 'map_noncoherent' flag to the drm_gem_cma_object structure,
>> which
>> can be set by the drivers when they create the dumb buffer.
>>
>> Since this really only applies to software rendering, disable this
>> flag
>> as soon as the CMA objects are exported via PRIME.
>>
>> v3: New patch. Now uses a simple 'map_noncoherent' flag to control
>> how
>> the objects are mapped, and use the new dma_mmap_pages function.
>>
>> v4: Make sure map_noncoherent is always disabled when creating GEM
>> objects meant to be used with dma-buf.
>>
>> Signed-off-by: Paul Cercueil <[email protected]>
>> Acked-by: Thomas Zimmermann <[email protected]>
>> ---
>> drivers/gpu/drm/drm_gem_cma_helper.c | 38
>> +++++++++++++++++++++-------
>> include/drm/drm_gem_cma_helper.h | 3 +++
>> 2 files changed, 32 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c
>> b/drivers/gpu/drm/drm_gem_cma_helper.c
>> index 7942cf05cd93..235c7a63da2b 100644
>> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
>> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
>> @@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs
>> drm_gem_cma_default_funcs = {
>> * __drm_gem_cma_create - Create a GEM CMA object without
>> allocating memory
>> * @drm: DRM device
>> * @size: size of the object to allocate
>> + * @private: true if used for internal purposes
>> *
>> * This function creates and initializes a GEM CMA object of the
>> given size,
>> * but doesn't allocate any memory to back the object.
>> @@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs
>> drm_gem_cma_default_funcs = {
>> * error code on failure.
>> */
>> static struct drm_gem_cma_object *
>> -__drm_gem_cma_create(struct drm_device *drm, size_t size)
>> +__drm_gem_cma_create(struct drm_device *drm, size_t size, bool
>> private)
>> {
>> struct drm_gem_cma_object *cma_obj;
>> struct drm_gem_object *gem_obj;
>> - int ret;
>> + int ret = 0;
>>  if (drm->driver->gem_create_object)
>> gem_obj = drm->driver->gem_create_object(drm, size);
>> @@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm,
>> size_t size)
>>  cma_obj = container_of(gem_obj, struct drm_gem_cma_object,
>> base);
>> - ret = drm_gem_object_init(drm, gem_obj, size);
>> + if (private) {
>> + drm_gem_private_object_init(drm, gem_obj, size);
>> +
>> + /* Always use writecombine for dma-buf mappings */
>> + cma_obj->map_noncoherent = false;
>> + } else {
>> + ret = drm_gem_object_init(drm, gem_obj, size);
>> + }
>> if (ret)
>> goto error;
>> @@ -111,12 +119,19 @@ struct drm_gem_cma_object
>> *drm_gem_cma_create(struct drm_device *drm,
>>  size = round_up(size, PAGE_SIZE);
>> - cma_obj = __drm_gem_cma_create(drm, size);
>> + cma_obj = __drm_gem_cma_create(drm, size, false);
>> if (IS_ERR(cma_obj))
>> return cma_obj;
>> - cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
>> - GFP_KERNEL | __GFP_NOWARN);
>> + if (cma_obj->map_noncoherent) {
>> + cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
>> + &cma_obj->paddr,
>> + DMA_TO_DEVICE,
>> + GFP_KERNEL | __GFP_NOWARN);
>> + } else {
>> + cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
>> + GFP_KERNEL | __GFP_NOWARN);
>> + }
>> if (!cma_obj->vaddr) {
>> drm_dbg(drm, "failed to allocate buffer with size %zu\n",
>> size);
>> @@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct
>> drm_device *dev,
>> return ERR_PTR(-EINVAL);
>>  /* Create a CMA GEM buffer. */
>> - cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
>> + cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
>> if (IS_ERR(cma_obj))
>> return ERR_CAST(cma_obj);
>> @@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object
>> *obj, struct vm_area_struct *vma)
>>  cma_obj = to_drm_gem_cma_obj(obj);
>> - ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
>> - cma_obj->paddr, vma->vm_end - vma->vm_start);
>> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>> + if (!cma_obj->map_noncoherent)
>> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
>> +
>> + ret = dma_mmap_pages(cma_obj->base.dev->dev,
>> + vma, vma->vm_end - vma->vm_start,
>> + virt_to_page(cma_obj->vaddr));
>
> This breaks mmap on TI's J7 EVM (tidss driver). All DRM apps just die
> when doing mmap. Changing these lines back to dma_mmap_wc() makes it
> work.
>
> Is dma_alloc_wc() even compatible with dma_mmap_pages()?

My bad, dma_mmap_wc() is not just a regular mmap with writecombine page
protection.

The solution, I guess, would be to call dma_mmap_wc() if
(!cma_obj->map_noncoherent). I can send a patch later this week, unless
you already have one?

Cheers,
-Paul


2021-05-28 00:19:16

by Paul Cercueil

[permalink] [raw]
Subject: [PATCH] drm: Fix for GEM buffers with write-combine memory

The previous commit wrongly assumed that dma_mmap_wc() could be replaced
by pgprot_writecombine() + dma_mmap_pages(). It did work on my setup,
but did not work everywhere.

Use dma_mmap_wc() when the buffer has the write-combine cache attribute,
and dma_mmap_pages() when it has the non-coherent cache attribute.

Signed-off-by: Paul Cercueil <[email protected]>
Reported-by: Tomi Valkeinen <[email protected]>
Fixes: cf8ccbc72d61 ("drm: Add support for GEM buffers backed by non-coherent memory")
---
drivers/gpu/drm/drm_gem_cma_helper.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 235c7a63da2b..4c3772651954 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -514,13 +514,17 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)

cma_obj = to_drm_gem_cma_obj(obj);

- vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
- if (!cma_obj->map_noncoherent)
- vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+ if (cma_obj->map_noncoherent) {
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+
+ ret = dma_mmap_pages(cma_obj->base.dev->dev,
+ vma, vma->vm_end - vma->vm_start,
+ virt_to_page(cma_obj->vaddr));
+ } else {
+ ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
+ cma_obj->paddr, vma->vm_end - vma->vm_start);

- ret = dma_mmap_pages(cma_obj->base.dev->dev,
- vma, vma->vm_end - vma->vm_start,
- virt_to_page(cma_obj->vaddr));
+ }
if (ret)
drm_gem_vm_close(vma);

--
2.30.2

2021-05-28 06:26:22

by Tomi Valkeinen

[permalink] [raw]
Subject: Re: [PATCH] drm: Fix for GEM buffers with write-combine memory

On 28/05/2021 02:03, Paul Cercueil wrote:
> The previous commit wrongly assumed that dma_mmap_wc() could be replaced
> by pgprot_writecombine() + dma_mmap_pages(). It did work on my setup,
> but did not work everywhere.
>
> Use dma_mmap_wc() when the buffer has the write-combine cache attribute,
> and dma_mmap_pages() when it has the non-coherent cache attribute.
>
> Signed-off-by: Paul Cercueil <[email protected]>
> Reported-by: Tomi Valkeinen <[email protected]>
> Fixes: cf8ccbc72d61 ("drm: Add support for GEM buffers backed by non-coherent memory")
> ---
> drivers/gpu/drm/drm_gem_cma_helper.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 235c7a63da2b..4c3772651954 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -514,13 +514,17 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
>
> cma_obj = to_drm_gem_cma_obj(obj);
>
> - vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> - if (!cma_obj->map_noncoherent)
> - vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> + if (cma_obj->map_noncoherent) {
> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +
> + ret = dma_mmap_pages(cma_obj->base.dev->dev,
> + vma, vma->vm_end - vma->vm_start,
> + virt_to_page(cma_obj->vaddr));
> + } else {
> + ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
> + cma_obj->paddr, vma->vm_end - vma->vm_start);
>
> - ret = dma_mmap_pages(cma_obj->base.dev->dev,
> - vma, vma->vm_end - vma->vm_start,
> - virt_to_page(cma_obj->vaddr));
> + }
> if (ret)
> drm_gem_vm_close(vma);
>
>

Reviewed-by: Tomi Valkeinen <[email protected]>

and

Tested-by: Tomi Valkeinen <[email protected]>

Thanks!

Btw, the kernel-doc for drm_gem_cma_create doesn't quite match, as it
says wc is always used.

Tomi

2021-05-28 12:13:04

by Paul Cercueil

[permalink] [raw]
Subject: Re: [PATCH] drm: Fix for GEM buffers with write-combine memory

Hi Tomi,

Le ven., mai 28 2021 at 08:59:15 +0300, Tomi Valkeinen
<[email protected]> a ?crit :
> On 28/05/2021 02:03, Paul Cercueil wrote:
>> The previous commit wrongly assumed that dma_mmap_wc() could be
>> replaced
>> by pgprot_writecombine() + dma_mmap_pages(). It did work on my setup,
>> but did not work everywhere.
>>
>> Use dma_mmap_wc() when the buffer has the write-combine cache
>> attribute,
>> and dma_mmap_pages() when it has the non-coherent cache attribute.
>>
>> Signed-off-by: Paul Cercueil <[email protected]>
>> Reported-by: Tomi Valkeinen <[email protected]>
>> Fixes: cf8ccbc72d61 ("drm: Add support for GEM buffers backed by
>> non-coherent memory")
>> ---
>> drivers/gpu/drm/drm_gem_cma_helper.c | 16 ++++++++++------
>> 1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c
>> b/drivers/gpu/drm/drm_gem_cma_helper.c
>> index 235c7a63da2b..4c3772651954 100644
>> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
>> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
>> @@ -514,13 +514,17 @@ int drm_gem_cma_mmap(struct drm_gem_object
>> *obj, struct vm_area_struct *vma)
>>  cma_obj = to_drm_gem_cma_obj(obj);
>> - vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>> - if (!cma_obj->map_noncoherent)
>> - vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
>> + if (cma_obj->map_noncoherent) {
>> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>> +
>> + ret = dma_mmap_pages(cma_obj->base.dev->dev,
>> + vma, vma->vm_end - vma->vm_start,
>> + virt_to_page(cma_obj->vaddr));
>> + } else {
>> + ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
>> + cma_obj->paddr, vma->vm_end - vma->vm_start);
>> - ret = dma_mmap_pages(cma_obj->base.dev->dev,
>> - vma, vma->vm_end - vma->vm_start,
>> - virt_to_page(cma_obj->vaddr));
>> + }
>> if (ret)
>> drm_gem_vm_close(vma);
>> 
>
> Reviewed-by: Tomi Valkeinen <[email protected]>
>
> and
>
> Tested-by: Tomi Valkeinen <[email protected]>

Thanks, I applied it now.

>
> Thanks!
>
> Btw, the kernel-doc for drm_gem_cma_create doesn't quite match, as it
> says wc is always used.

Alright, I'll send a patch later for this one too.

Cheers,
-Paul