2023-06-20 10:02:59

by Sui Jingfeng

Subject: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

From: Sui Jingfeng <[email protected]>

Loongson CPUs maintain cache coherency in hardware, which means that the
data in the CPU cache is identical to the data in main system memory. On
the peripheral side, most Loongson chips define their peripherals as DMA
coherent by default, so device drivers do not need to maintain coherency
between the processor and an I/O device manually.

There are exceptions: on the LS2K1000 SoC, some peripherals can be
configured as DMA non-coherent, but no released firmware shipping such a
configuration exists in the market. Peripherals of older LS2K1000
revisions are also DMA non-coherent, but those parts are nearly obsolete.
So these are corner cases.

Nevertheless, the kernel still needs to probe this at runtime, because the
Vivante GPU IP has been integrated into various platforms. Hence, this
patch adds runtime detection code to determine whether a specific GPU is
DMA coherent; if it is, the driver takes advantage of it. On Loongson
platforms, when a buffer is accessed by both the GPU and the CPU, the
driver should prefer ETNA_BO_CACHED over ETNA_BO_WC.

This patch also adds a new parameter, ETNAVIV_PARAM_GPU_COHERENT, which
lets userspace know whether this feature is available, because a
write-combined BO is still preferred in some cases, especially where no
CPU reads are needed, for example when uploading a compiled shader binary.
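
For illustration, userspace could query the new parameter roughly as in the
sketch below (error handling trimmed; the etna_gpu_is_coherent() helper is
purely illustrative and the header install path may differ between kernel
uapi and libdrm):

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/etnaviv_drm.h>

/* Returns non-zero when the kernel reports a DMA coherent GPU. */
static int etna_gpu_is_coherent(int fd, uint32_t pipe)
{
	struct drm_etnaviv_param req;

	memset(&req, 0, sizeof(req));
	req.pipe = pipe;
	req.param = ETNAVIV_PARAM_GPU_COHERENT;

	/* Older kernels reject the unknown param; treat that as not coherent. */
	if (ioctl(fd, DRM_IOCTL_ETNAVIV_GET_PARAM, &req))
		return 0;

	return req.value != 0;
}
```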

Cc: Lucas Stach <[email protected]>
Cc: Christian Gmeiner <[email protected]>
Cc: Philipp Zabel <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Sui Jingfeng <[email protected]>
---
drivers/gpu/drm/etnaviv/etnaviv_drv.c | 35 +++++++++++++++++++++
drivers/gpu/drm/etnaviv/etnaviv_drv.h | 6 ++++
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 22 ++++++++++---
drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 7 ++++-
drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 +++
include/uapi/drm/etnaviv_drm.h | 1 +
6 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
index 0a365e96d371..d8e788aa16cb 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
@@ -5,7 +5,9 @@

#include <linux/component.h>
#include <linux/dma-mapping.h>
+#include <linux/dma-map-ops.h>
#include <linux/module.h>
+#include <linux/of_address.h>
#include <linux/of_platform.h>
#include <linux/uaccess.h>

@@ -24,6 +26,34 @@
#include "etnaviv_pci_drv.h"
#include "etnaviv_perfmon.h"

+static struct device_node *etnaviv_of_first_available_node(void)
+{
+ struct device_node *core_node;
+
+ for_each_compatible_node(core_node, NULL, "vivante,gc") {
+ if (of_device_is_available(core_node))
+ return core_node;
+ }
+
+ return NULL;
+}
+
+static bool etnaviv_is_dma_coherent(struct device *dev)
+{
+ struct device_node *np;
+ bool coherent;
+
+ np = etnaviv_of_first_available_node();
+ if (np) {
+ coherent = of_dma_is_coherent(np);
+ of_node_put(np);
+ } else {
+ coherent = dev_is_dma_coherent(dev);
+ }
+
+ return coherent;
+}
+
/*
* etnaviv private data construction and destructions:
*/
@@ -52,6 +82,11 @@ etnaviv_alloc_private(struct device *dev, struct drm_device *drm)
return ERR_PTR(-ENOMEM);
}

+ priv->dma_coherent = etnaviv_is_dma_coherent(dev);
+
+ if (priv->dma_coherent)
+ drm_info(drm, "%s is dma coherent\n", dev_name(dev));
+
return priv;
}

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
index 9cd72948cfad..644e5712c050 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
@@ -46,6 +46,12 @@ struct etnaviv_drm_private {
struct xarray active_contexts;
u32 next_context_id;

+ /*
+ * If true, the GPU is capable of snooping cpu cache. Here, it
+ * also means that cache coherency is enforced by the hardware.
+ */
+ bool dma_coherent;
+
/* list of GEM objects: */
struct mutex gem_lock;
struct list_head gem_list;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index b5f73502e3dd..39bdc3774f2d 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -343,6 +343,7 @@ void *etnaviv_gem_vmap(struct drm_gem_object *obj)
static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
{
struct page **pages;
+ pgprot_t prot;

lockdep_assert_held(&obj->lock);

@@ -350,8 +351,19 @@ static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
if (IS_ERR(pages))
return NULL;

- return vmap(pages, obj->base.size >> PAGE_SHIFT,
- VM_MAP, pgprot_writecombine(PAGE_KERNEL));
+ switch (obj->flags) {
+ case ETNA_BO_CACHED:
+ prot = PAGE_KERNEL;
+ break;
+ case ETNA_BO_UNCACHED:
+ prot = pgprot_noncached(PAGE_KERNEL);
+ break;
+ case ETNA_BO_WC:
+ default:
+ prot = pgprot_writecombine(PAGE_KERNEL);
+ }
+
+ return vmap(pages, obj->base.size >> PAGE_SHIFT, VM_MAP, prot);
}

static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
@@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
{
struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
struct drm_device *dev = obj->dev;
+ struct etnaviv_drm_private *priv = dev->dev_private;
bool write = !!(op & ETNA_PREP_WRITE);
int ret;

@@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
return ret == 0 ? -ETIMEDOUT : ret;
}

- if (etnaviv_obj->flags & ETNA_BO_CACHED) {
+ if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
etnaviv_op_to_dma_dir(op));
etnaviv_obj->last_cpu_prep_op = op;
@@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
{
struct drm_device *dev = obj->dev;
struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
+ struct etnaviv_drm_private *priv = dev->dev_private;

- if (etnaviv_obj->flags & ETNA_BO_CACHED) {
+ if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
/* fini without a prep is almost certainly a userspace error */
WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
index 3524b5811682..754126992264 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
@@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
struct dma_buf_attachment *attach, struct sg_table *sgt)
{
+ struct etnaviv_drm_private *priv = dev->dev_private;
struct etnaviv_gem_object *etnaviv_obj;
size_t size = PAGE_ALIGN(attach->dmabuf->size);
+ u32 cache_flags = ETNA_BO_WC;
int ret, npages;

- ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
+ if (priv->dma_coherent)
+ cache_flags = ETNA_BO_CACHED;
+
+ ret = etnaviv_gem_new_private(dev, size, cache_flags,
&etnaviv_gem_prime_ops, &etnaviv_obj);
if (ret < 0)
return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index d6a21e97feb1..d99ac675ce8b 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -164,6 +164,10 @@ int etnaviv_gpu_get_param(struct etnaviv_gpu *gpu, u32 param, u64 *value)
*value = gpu->identity.eco_id;
break;

+ case ETNAVIV_PARAM_GPU_COHERENT:
+ *value = priv->dma_coherent;
+ break;
+
default:
DBG("%s: invalid param: %u", dev_name(gpu->dev), param);
return -EINVAL;
diff --git a/include/uapi/drm/etnaviv_drm.h b/include/uapi/drm/etnaviv_drm.h
index af024d90453d..76baf45d7158 100644
--- a/include/uapi/drm/etnaviv_drm.h
+++ b/include/uapi/drm/etnaviv_drm.h
@@ -77,6 +77,7 @@ struct drm_etnaviv_timespec {
#define ETNAVIV_PARAM_GPU_PRODUCT_ID 0x1c
#define ETNAVIV_PARAM_GPU_CUSTOMER_ID 0x1d
#define ETNAVIV_PARAM_GPU_ECO_ID 0x1e
+#define ETNAVIV_PARAM_GPU_COHERENT 0x1f

#define ETNA_MAX_PIPES 4

--
2.25.1



2023-06-21 10:12:25

by Lucas Stach

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

On Tuesday, 2023-06-20 at 17:47 +0800, Sui Jingfeng wrote:
> From: Sui Jingfeng <[email protected]>
>
> Loongson CPUs maintain cache coherency by hardware, which means that the
> data in the CPU cache is identical to the data in main system memory. As
> for the peripheral device, most of Loongson chips chose to define the
> peripherals as DMA coherent by default, device drivers do not need to
> maintain the coherency between a processor and an I/O device manually.
>
> There are exceptions, for LS2K1000 SoC, part of peripheral device can be
> configured as DMA non-coherent. But there is no released version of such
> firmware exist in the market. Peripherals of older LS2K1000 is also DMA
> non-coherent, but they are nearly outdated. So, those are trivial cases.
>
> Nevertheless, kernel space still need to do the probe work, because vivante
> GPU IP has been integrated into various platform. Hence, this patch add
> runtime detection code to probe if a specific GPU is DMA coherent, If the
> answer is yes, we are going to utilize such features. On Loongson platform,
> When a buffer is accessed by both the GPU and the CPU, the driver should
> prefer ETNA_BO_CACHED over ETNA_BO_WC.
>
> This patch also add a new parameter: etnaviv_param_gpu_coherent, which
> allow userspace to know if such a feature is available. Because
> write-combined BO is still preferred in some case, especially where don't
> need CPU read, for example, uploading compiled shader bin.
>
> Cc: Lucas Stach <[email protected]>
> Cc: Christian Gmeiner <[email protected]>
> Cc: Philipp Zabel <[email protected]>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Signed-off-by: Sui Jingfeng <[email protected]>
> ---
> drivers/gpu/drm/etnaviv/etnaviv_drv.c | 35 +++++++++++++++++++++
> drivers/gpu/drm/etnaviv/etnaviv_drv.h | 6 ++++
> drivers/gpu/drm/etnaviv/etnaviv_gem.c | 22 ++++++++++---
> drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 7 ++++-
> drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 +++
> include/uapi/drm/etnaviv_drm.h | 1 +
> 6 files changed, 70 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> index 0a365e96d371..d8e788aa16cb 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> @@ -5,7 +5,9 @@
>
> #include <linux/component.h>
> #include <linux/dma-mapping.h>
> +#include <linux/dma-map-ops.h>
> #include <linux/module.h>
> +#include <linux/of_address.h>
> #include <linux/of_platform.h>
> #include <linux/uaccess.h>
>
> @@ -24,6 +26,34 @@
> #include "etnaviv_pci_drv.h"
> #include "etnaviv_perfmon.h"
>
> +static struct device_node *etnaviv_of_first_available_node(void)
> +{
> + struct device_node *core_node;
> +
> + for_each_compatible_node(core_node, NULL, "vivante,gc") {
> + if (of_device_is_available(core_node))
> + return core_node;
> + }
> +
> + return NULL;
> +}
> +
> +static bool etnaviv_is_dma_coherent(struct device *dev)
> +{
> + struct device_node *np;
> + bool coherent;
> +
> + np = etnaviv_of_first_available_node();
> + if (np) {
> + coherent = of_dma_is_coherent(np);
> + of_node_put(np);
> + } else {
> + coherent = dev_is_dma_coherent(dev);
> + }

This whole dance shouldn't be needed. We transfer the DMA capabilities
from the first node to the virtual master device in the platform device
case, so dev_is_dma_coherent(dev) should always return the right thing.
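
A minimal sketch of that transfer as it could look in etnaviv_pdev_probe(),
where pdev is the virtual master platform device (illustrative only, the
exact mainline call site may differ):

```c
	struct device_node *first_node;

	first_node = etnaviv_of_first_available_node();
	if (first_node) {
		/*
		 * Propagate the DMA configuration of the first available GC
		 * core node (dma-ranges, "dma-coherent", ...) to the virtual
		 * master, so that dev_is_dma_coherent(&pdev->dev) later
		 * reports the right thing.
		 */
		of_dma_configure(&pdev->dev, first_node, true);
		of_node_put(first_node);
	}
```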

> +
> + return coherent;
> +}
> +
> /*
> * etnaviv private data construction and destructions:
> */
> @@ -52,6 +82,11 @@ etnaviv_alloc_private(struct device *dev, struct drm_device *drm)
> return ERR_PTR(-ENOMEM);
> }
>
> + priv->dma_coherent = etnaviv_is_dma_coherent(dev);
> +
> + if (priv->dma_coherent)
> + drm_info(drm, "%s is dma coherent\n", dev_name(dev));
> +
> return priv;
> }
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> index 9cd72948cfad..644e5712c050 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> @@ -46,6 +46,12 @@ struct etnaviv_drm_private {
> struct xarray active_contexts;
> u32 next_context_id;
>
> + /*
> + * If true, the GPU is capable of snooping cpu cache. Here, it
> + * also means that cache coherency is enforced by the hardware.
> + */
> + bool dma_coherent;
> +
No need for this, I think. Just use dev_is_dma_coherent() where you
need to know this.

> /* list of GEM objects: */
> struct mutex gem_lock;
> struct list_head gem_list;
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index b5f73502e3dd..39bdc3774f2d 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -343,6 +343,7 @@ void *etnaviv_gem_vmap(struct drm_gem_object *obj)
> static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
> {
> struct page **pages;
> + pgprot_t prot;
>
> lockdep_assert_held(&obj->lock);
>
> @@ -350,8 +351,19 @@ static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
> if (IS_ERR(pages))
> return NULL;
>
> - return vmap(pages, obj->base.size >> PAGE_SHIFT,
> - VM_MAP, pgprot_writecombine(PAGE_KERNEL));
> + switch (obj->flags) {

switch (obj->flags & ETNA_BO_CACHE_MASK)

> + case ETNA_BO_CACHED:
> + prot = PAGE_KERNEL;
> + break;
> + case ETNA_BO_UNCACHED:
> + prot = pgprot_noncached(PAGE_KERNEL);
> + break;
> + case ETNA_BO_WC:
> + default:
> + prot = pgprot_writecombine(PAGE_KERNEL);
> + }
> +
> + return vmap(pages, obj->base.size >> PAGE_SHIFT, VM_MAP, prot);

While that change makes sense it should also be in a separate patch, as
it's a valid change on its own, even if for non-coherent devices.

> }
>
> static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
> @@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> {
> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> struct drm_device *dev = obj->dev;
> + struct etnaviv_drm_private *priv = dev->dev_private;
> bool write = !!(op & ETNA_PREP_WRITE);
> int ret;
>
> @@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> return ret == 0 ? -ETIMEDOUT : ret;
> }
>
> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {

Why do you need this? Isn't dma_sync_sgtable_for_cpu a no-op on your
platform when the device is coherent?
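
For reference, the dma-direct sync path boils down to roughly the following
(simplified from kernel/dma/direct.c), i.e. a coherent device skips the
cache maintenance entirely:

```c
void dma_direct_sync_sg_for_cpu(struct device *dev,
		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));

		/* Cache maintenance is done only for non-coherent devices. */
		if (!dev_is_dma_coherent(dev))
			arch_sync_dma_for_cpu(paddr, sg->length, dir);

		/* Bounce buffer handling happens only when SWIOTLB is in use. */
		if (unlikely(is_swiotlb_buffer(dev, paddr)))
			swiotlb_sync_single_for_cpu(dev, paddr, sg->length, dir);
	}
}
```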

> dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
> etnaviv_op_to_dma_dir(op));
> etnaviv_obj->last_cpu_prep_op = op;
> @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
> {
> struct drm_device *dev = obj->dev;
> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> + struct etnaviv_drm_private *priv = dev->dev_private;
>
> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
> /* fini without a prep is almost certainly a userspace error */
> WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
> dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> index 3524b5811682..754126992264 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
> struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
> struct dma_buf_attachment *attach, struct sg_table *sgt)
> {
> + struct etnaviv_drm_private *priv = dev->dev_private;
> struct etnaviv_gem_object *etnaviv_obj;
> size_t size = PAGE_ALIGN(attach->dmabuf->size);
> + u32 cache_flags = ETNA_BO_WC;
> int ret, npages;
>
> - ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
> + if (priv->dma_coherent)
> + cache_flags = ETNA_BO_CACHED;
> +
Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
from WC to CACHED as necessary by adding something like this:

/*
* Upgrade WC to CACHED when the device is hardware coherent and the
* platform doesn't allow mixing cached and writecombined mappings to
* the same memory area.
*/
if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;

Regards,
Lucas

> + ret = etnaviv_gem_new_private(dev, size, cache_flags,
> &etnaviv_gem_prime_ops, &etnaviv_obj);
> if (ret < 0)
> return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> index d6a21e97feb1..d99ac675ce8b 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> @@ -164,6 +164,10 @@ int etnaviv_gpu_get_param(struct etnaviv_gpu *gpu, u32 param, u64 *value)
> *value = gpu->identity.eco_id;
> break;
>
> + case ETNAVIV_PARAM_GPU_COHERENT:
> + *value = priv->dma_coherent;
> + break;
> +
> default:
> DBG("%s: invalid param: %u", dev_name(gpu->dev), param);
> return -EINVAL;
> diff --git a/include/uapi/drm/etnaviv_drm.h b/include/uapi/drm/etnaviv_drm.h
> index af024d90453d..76baf45d7158 100644
> --- a/include/uapi/drm/etnaviv_drm.h
> +++ b/include/uapi/drm/etnaviv_drm.h
> @@ -77,6 +77,7 @@ struct drm_etnaviv_timespec {
> #define ETNAVIV_PARAM_GPU_PRODUCT_ID 0x1c
> #define ETNAVIV_PARAM_GPU_CUSTOMER_ID 0x1d
> #define ETNAVIV_PARAM_GPU_ECO_ID 0x1e
> +#define ETNAVIV_PARAM_GPU_COHERENT 0x1f
>
> #define ETNA_MAX_PIPES 4
>


2023-06-21 14:48:26

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device


On 2023/6/21 18:00, Lucas Stach wrote:
>> +static bool etnaviv_is_dma_coherent(struct device *dev)
>> +{
>> + struct device_node *np;
>> + bool coherent;
>> +
>> + np = etnaviv_of_first_available_node();
>> + if (np) {
>> + coherent = of_dma_is_coherent(np);
>> + of_node_put(np);
>> + } else {
>> + coherent = dev_is_dma_coherent(dev);
>> + }
> This whole dance shouldn't be needed. We transfer the DMA capabilities
> from the first node to the virtual master device in the platform device
> case, so dev_is_dma_coherent(dev) should always return the right thing.
>
OK, I'm fine if this is OK on your platform.
>> +
>> + return coherent;
>> +}
>> +

--
Jingfeng


2023-06-21 14:50:39

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device


On 2023/6/21 18:00, Lucas Stach wrote:
>> /* list of GEM objects: */
>> struct mutex gem_lock;
>> struct list_head gem_list;
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> index b5f73502e3dd..39bdc3774f2d 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> @@ -343,6 +343,7 @@ void *etnaviv_gem_vmap(struct drm_gem_object *obj)
>> static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
>> {
>> struct page **pages;
>> + pgprot_t prot;
>>
>> lockdep_assert_held(&obj->lock);
>>
>> @@ -350,8 +351,19 @@ static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
>> if (IS_ERR(pages))
>> return NULL;
>>
>> - return vmap(pages, obj->base.size >> PAGE_SHIFT,
>> - VM_MAP, pgprot_writecombine(PAGE_KERNEL));
>> + switch (obj->flags) {
> switch (obj->flags & ETNA_BO_CACHE_MASK)
>
This is certainly acceptable!

--
Jingfeng


2023-06-21 14:51:56

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/21 18:00, Lucas Stach wrote:
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>> index 9cd72948cfad..644e5712c050 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>> @@ -46,6 +46,12 @@ struct etnaviv_drm_private {
>> struct xarray active_contexts;
>> u32 next_context_id;
>>
>> + /*
>> + * If true, the GPU is capable of snooping cpu cache. Here, it
>> + * also means that cache coherency is enforced by the hardware.
>> + */
>> + bool dma_coherent;
>> +
> No need for this, I think. Just use dev_is_dma_coherent() where you
> need to know this.
>
No, we want this value cached by the driver.

We only need to call dev_is_dma_coherent() once!

We need to reuse this variable in other places.

--
Jingfeng


2023-06-21 15:20:49

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi

On 2023/6/21 18:00, Lucas Stach wrote:
>> + case ETNA_BO_CACHED:
>> + prot = PAGE_KERNEL;
>> + break;
>> + case ETNA_BO_UNCACHED:
>> + prot = pgprot_noncached(PAGE_KERNEL);
>> + break;
>> + case ETNA_BO_WC:
>> + default:
>> + prot = pgprot_writecombine(PAGE_KERNEL);
>> + }
>> +
>> + return vmap(pages, obj->base.size >> PAGE_SHIFT, VM_MAP, prot);
> While that change makes sense it should also be in a separate patch, as
> it's a valid change on its own, even if for non-coherent devices.
>
Accepted with pleasure.

I will prepare the patch carefully.

Thanks for the positive comments.


--
Jingfeng


2023-06-21 15:35:50

by Lucas Stach

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

On Wednesday, 2023-06-21 at 22:44 +0800, Sui Jingfeng wrote:
> Hi,
>
> On 2023/6/21 18:00, Lucas Stach wrote:
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > > index 9cd72948cfad..644e5712c050 100644
> > > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > > @@ -46,6 +46,12 @@ struct etnaviv_drm_private {
> > > struct xarray active_contexts;
> > > u32 next_context_id;
> > >
> > > + /*
> > > + * If true, the GPU is capable of snooping cpu cache. Here, it
> > > + * also means that cache coherency is enforced by the hardware.
> > > + */
> > > + bool dma_coherent;
> > > +
> > No need for this, I think. Just use dev_is_dma_coherent() where you
> > need to know this.
> >
> No, we want this value cached by the driver.
>
Why? dev_is_dma_coherent() is a header-only function with a single
pointer chasing operation. Your cache is also a single pointer chasing
access, just that we now need storage for this information in both
struct device and struct etnaviv_gpu.
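
For reference, the helper in include/linux/dma-map-ops.h is essentially just
(modulo config #ifdefs):

```c
static inline bool dev_is_dma_coherent(struct device *dev)
{
	return dev->dma_coherent;
}
```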

Regards,
Lucas

> We only need call  dev_is_dma_coherent() once!
>
> We need to reuse this variable on other places.
>


2023-06-21 15:41:05

by Lucas Stach

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

On Wednesday, 2023-06-21 at 23:00 +0800, Sui Jingfeng wrote:
> On 2023/6/21 18:00, Lucas Stach wrote:
> > > static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
> > > @@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> > > {
> > > struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> > > struct drm_device *dev = obj->dev;
> > > + struct etnaviv_drm_private *priv = dev->dev_private;
> > > bool write = !!(op & ETNA_PREP_WRITE);
> > > int ret;
> > >
> > > @@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> > > return ret == 0 ? -ETIMEDOUT : ret;
> > > }
> > >
> > > - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
> > Why do you need this? Isn't dma_sync_sgtable_for_cpu a no-op on your
> > platform when the device is coherent?
> >
> I need this to show that our hardware is truly dma-coherent!
>
> I have tested that the driver still works like a charm without adding
> this code '!priv->dma_coherent'.
>
>
> But I'm expressing the idea that a truly dma-coherent just device don't
> need this.
>
> I don't care if it is a no-op.
>
> It is now, it may not in the future.

And that's exactly the point. If it ever turns into something more than
a no-op on your platform, then that's probably for a good reason and a
driver should not assume that it knows better than the DMA API
implementation what is or is not required on a specific platform to
make DMA work.

>
> Even it is, the overhead of function call itself still get involved.
>
cpu_prep/fini aren't total fast paths, you already synchronized with
the GPU here, potentially waiting for jobs to finish, etc. If your
platform no-ops this then the function call will be in the noise.

> Also, we want to try flush the write buffer with the CPU manually.
>
>
> Currently, we want the absolute correctness in the concept,
>
> not only the rendering results.

And if you want absolute correctness then calling dma_sync_sgtable_* is
the right thing to do, as it can do much more than just manage caches. 

Right now it also provides SWIOTLB translation if needed.
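
For reference, dma_sync_sgtable_for_cpu() is a thin wrapper from
include/linux/dma-mapping.h that feeds the whole sg_table into the generic
sync path, which in turn handles caches, SWIOTLB and whatever else the
platform requires:

```c
static inline void dma_sync_sgtable_for_cpu(struct device *dev,
		struct sg_table *sgt, enum dma_data_direction dir)
{
	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->orig_nents, dir);
}
```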

Regards,
Lucas

2023-06-21 15:43:14

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device


On 2023/6/21 18:00, Lucas Stach wrote:
>> static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
>> @@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
>> {
>> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>> struct drm_device *dev = obj->dev;
>> + struct etnaviv_drm_private *priv = dev->dev_private;
>> bool write = !!(op & ETNA_PREP_WRITE);
>> int ret;
>>
>> @@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
>> return ret == 0 ? -ETIMEDOUT : ret;
>> }
>>
>> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
> Why do you need this? Isn't dma_sync_sgtable_for_cpu a no-op on your
> platform when the device is coherent?
>
I need this to show that our hardware is truly dma-coherent!

I have tested that the driver still works like a charm without adding
the '!priv->dma_coherent' check.

But I'm expressing the idea that a truly dma-coherent device simply
doesn't need this.

I don't care whether it is a no-op.

It is now; it may not be in the future.

Even if it is, the overhead of the function call itself is still incurred.

Also, we want to be able to flush the write buffer from the CPU manually.

Currently, we want absolute correctness in concept, not only correct
rendering results.

--
Jingfeng


2023-06-21 15:47:01

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/21 18:00, Lucas Stach wrote:
>> dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
>> etnaviv_op_to_dma_dir(op));
>> etnaviv_obj->last_cpu_prep_op = op;
>> @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>> {
>> struct drm_device *dev = obj->dev;
>> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>> + struct etnaviv_drm_private *priv = dev->dev_private;
>>
>> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>> /* fini without a prep is almost certainly a userspace error */
>> WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>> dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>> index 3524b5811682..754126992264 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>> @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
>> struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
>> struct dma_buf_attachment *attach, struct sg_table *sgt)
>> {
>> + struct etnaviv_drm_private *priv = dev->dev_private;
>> struct etnaviv_gem_object *etnaviv_obj;
>> size_t size = PAGE_ALIGN(attach->dmabuf->size);
>> + u32 cache_flags = ETNA_BO_WC;
>> int ret, npages;
>>
>> - ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
>> + if (priv->dma_coherent)
>> + cache_flags = ETNA_BO_CACHED;
>> +
> Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
> from WC to CACHED as necessary by adding something like this:

I understand you are a professional in the Vivante GPU driver domain.

I respect your reviews and instruction.

But I'm really reluctant to agree with this; is there any room to
negotiate?

> /*
> * Upgrade WC to CACHED when the device is hardware coherent and the
> * platform doesn't allow mixing cached and writecombined mappings to
> * the same memory area.
> */
> if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
> dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
> flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;

This is policy, not mechanism.

Which cache property to use is the user-space program's choice.

But you are overriding WC with a CACHED mapping. This is not correct
conceptually!

Your approach forbids any possibility of using a WC BO anywhere.

My approach needs only one check, while your approach needs at least
three checks plus a bunch of bit-wise operations and a function call
(&, ==, &&, &, ~, &),

and this nasty judgement happens every time you create a BO.

Please keep our original implementation, it's simple and clear. Please?


--
Jingfeng


2023-06-21 16:18:11

by Lucas Stach

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

On Wednesday, 2023-06-21 at 23:30 +0800, Sui Jingfeng wrote:
> Hi,
>
> On 2023/6/21 18:00, Lucas Stach wrote:
> > > dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
> > > etnaviv_op_to_dma_dir(op));
> > > etnaviv_obj->last_cpu_prep_op = op;
> > > @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
> > > {
> > > struct drm_device *dev = obj->dev;
> > > struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> > > + struct etnaviv_drm_private *priv = dev->dev_private;
> > >
> > > - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > /* fini without a prep is almost certainly a userspace error */
> > > WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
> > > dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > index 3524b5811682..754126992264 100644
> > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
> > > struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
> > > struct dma_buf_attachment *attach, struct sg_table *sgt)
> > > {
> > > + struct etnaviv_drm_private *priv = dev->dev_private;
> > > struct etnaviv_gem_object *etnaviv_obj;
> > > size_t size = PAGE_ALIGN(attach->dmabuf->size);
> > > + u32 cache_flags = ETNA_BO_WC;
> > > int ret, npages;
> > >
> > > - ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
> > > + if (priv->dma_coherent)
> > > + cache_flags = ETNA_BO_CACHED;
> > > +
> > Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
> > from WC to CACHED as necessary by adding something like this:
>
> I understand you are a profession person in vivante GPU driver domain.
>
> I respect you reviews and instruction.
>
> But, I'm really reluctant to agree with this, is there any space to
> negotiate?
>
> > /*
> > * Upgrade WC to CACHED when the device is hardware coherent and the
> > * platform doesn't allow mixing cached and writecombined mappings to
> > * the same memory area.
> > */
> > if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
> > dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
> > flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;
>
> This is policy, not a mechanism.
>
> Using what cache property is a user-space program's choice.
>
> While you are override the WC with CACHED mapping. This is not correct
> in the concept!
>
Please explain why you think that this isn't correct. If using WC
mappings cause a potential loss of coherency on your platform, then we
can not allow the userspace driver to use WC mappings.

As I would like to keep the option of WC mappings, I've asked you if
there are ways to prepare the cache in a way that WC mappings aren't
causing any troubles on your platform. You told me that this might be
possible but needs confirmation from a HW engineer and such
confirmation could take a long time.

With that in mind, our only option right now is to upgrade the mappings
to cached in order to not lay out traps for the userspace driver.

> you approach forbidden any possibility to use the WC BO at anywhere.
>
>
> My approach need only check once, while you approach need at least 3
> check plus
>
> so much bit-wise logic operations,  plus a function call  (&, ==, &&, 
> &, ~, &) .
>
> and every time you create a BO. This nasty judgement happens.
>
BO creation again is not a fast path. You are committing to allocate
new memory, which is a few orders of magnitude more costly than the few
instructions needed for those comparisons.

>
> Please keep our original implement, it's simple and clear, Please?
>

It isn't as simple and clear for the userspace interface. It allows
userspace to use WC mappings that would potentially cause loss of
coherency between CPU and GPU, which isn't acceptable.

Regards,
Lucas

2023-06-21 16:19:43

by Lucas Stach

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

On Wednesday, 2023-06-21 at 23:54 +0800, Sui Jingfeng wrote:
> Hi,
>
> On 2023/6/21 23:33, Lucas Stach wrote:
> > Am Mittwoch, dem 21.06.2023 um 23:00 +0800 schrieb Sui Jingfeng:
> > > On 2023/6/21 18:00, Lucas Stach wrote:
> > > > > static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
> > > > > @@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> > > > > {
> > > > > struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> > > > > struct drm_device *dev = obj->dev;
> > > > > + struct etnaviv_drm_private *priv = dev->dev_private;
> > > > > bool write = !!(op & ETNA_PREP_WRITE);
> > > > > int ret;
> > > > >
> > > > > @@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> > > > > return ret == 0 ? -ETIMEDOUT : ret;
> > > > > }
> > > > >
> > > > > - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > > > + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > > Why do you need this? Isn't dma_sync_sgtable_for_cpu a no-op on your
> > > > platform when the device is coherent?
> > > >
> > > I need this to show that our hardware is truly dma-coherent!
> > >
> > > I have tested that the driver still works like a charm without adding
> > > this code '!priv->dma_coherent'.
> > >
> > >
> > > But I'm expressing the idea that a truly dma-coherent just device don't
> > > need this.
> > >
> > > I don't care if it is a no-op.
> > >
> > > It is now, it may not in the future.
> > And that's exactly the point. If it ever turns into something more than
> > a no-op on your platform, then that's probably for a good reason and a
> > driver should not assume that it knows better than the DMA API
> > implementation what is or is not required on a specific platform to
> > make DMA work.
> >
> > > Even it is, the overhead of function call itself still get involved.
> > >
> > cpu_prep/fini aren't total fast paths, you already synchronized with
> > the GPU here, potentially waiting for jobs to finish, etc. If your
> > platform no-ops this then the function call will be in the noise.
> >
> > > Also, we want to try flush the write buffer with the CPU manually.
> > >
> > >
> > > Currently, we want the absolute correctness in the concept,
> > >
> > > not only the rendering results.
> > And if you want absolute correctness then calling dma_sync_sgtable_* is
> > the right thing to do, as it can do much more than just manage caches.
>
> For our hardware, cached mapping don't need calling dma_sync_sgtable_*.
>
> This is the the right thing to do. The hardware already guarantee it for
> use.
>
And as the HW guarantees it on your platform, your platform
implementation makes this function effectively a no-op. Skipping the
call to this function is breaking the DMA API abstraction, as now the
driver is second guessing the DMA API implementation. I really see no
reason to do this.

>
> We may only want to call it for WC mapping BO,  please don't tangle all
> of this together.
>
> We simply want to do the right thing.
>
> > Right now it also provides SWIOTLB translation if needed.
>
> SWIOTLB introduce the bounce buffer, slower the performance.
>
> We don't need it. It should be avoid.

Sure. If your platform doesn't need it, that's totally fine. But you
can't guarantee that all platforms with coherent Vivante GPUs don't
need this. If it isn't needed the DMA API implementation will skip it
just fine at almost no cost, so the driver really shouldn't try to be
more clever than the platform DMA API implementation.

Regards,
Lucas

2023-06-21 16:33:10

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/21 23:33, Lucas Stach wrote:
> Am Mittwoch, dem 21.06.2023 um 23:00 +0800 schrieb Sui Jingfeng:
>> On 2023/6/21 18:00, Lucas Stach wrote:
>>>> static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
>>>> @@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
>>>> {
>>>> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>>> struct drm_device *dev = obj->dev;
>>>> + struct etnaviv_drm_private *priv = dev->dev_private;
>>>> bool write = !!(op & ETNA_PREP_WRITE);
>>>> int ret;
>>>>
>>>> @@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
>>>> return ret == 0 ? -ETIMEDOUT : ret;
>>>> }
>>>>
>>>> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>> Why do you need this? Isn't dma_sync_sgtable_for_cpu a no-op on your
>>> platform when the device is coherent?
>>>
>> I need this to show that our hardware is truly dma-coherent!
>>
>> I have tested that the driver still works like a charm without adding
>> this code '!priv->dma_coherent'.
>>
>>
>> But I'm expressing the idea that a truly dma-coherent just device don't
>> need this.
>>
>> I don't care if it is a no-op.
>>
>> It is now, it may not in the future.
> And that's exactly the point. If it ever turns into something more than
> a no-op on your platform, then that's probably for a good reason and a
> driver should not assume that it knows better than the DMA API
> implementation what is or is not required on a specific platform to
> make DMA work.
>
>> Even it is, the overhead of function call itself still get involved.
>>
> cpu_prep/fini aren't total fast paths, you already synchronized with
> the GPU here, potentially waiting for jobs to finish, etc. If your
> platform no-ops this then the function call will be in the noise.
>
>> Also, we want to try flush the write buffer with the CPU manually.
>>
>>
>> Currently, we want the absolute correctness in the concept,
>>
>> not only the rendering results.
> And if you want absolute correctness then calling dma_sync_sgtable_* is
> the right thing to do, as it can do much more than just manage caches.

For our hardware, cached mappings don't need dma_sync_sgtable_* calls.

This is the right thing to do. The hardware already guarantees it for
us.

We may only want to call it for WC-mapped BOs; please don't tangle all
of this together.

We simply want to do the right thing.

> Right now it also provides SWIOTLB translation if needed.

SWIOTLB introduces a bounce buffer, which slows down performance.

We don't need it. It should be avoided.

 I know you know everything. No sugar-coated bullets please.

>
> Regards,
> Lucas

--
Jingfeng


2023-06-21 16:37:27

by Lucas Stach

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

On Wednesday, 2023-06-21 at 23:41 +0800, Sui Jingfeng wrote:
> On 2023/6/21 23:23, Lucas Stach wrote:
> > Am Mittwoch, dem 21.06.2023 um 22:44 +0800 schrieb Sui Jingfeng:
> > > Hi,
> > >
> > > On 2023/6/21 18:00, Lucas Stach wrote:
> > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > > > > index 9cd72948cfad..644e5712c050 100644
> > > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > > > > @@ -46,6 +46,12 @@ struct etnaviv_drm_private {
> > > > > struct xarray active_contexts;
> > > > > u32 next_context_id;
> > > > >
> > > > > + /*
> > > > > + * If true, the GPU is capable of snooping cpu cache. Here, it
> > > > > + * also means that cache coherency is enforced by the hardware.
> > > > > + */
> > > > > + bool dma_coherent;
> > > > > +
> > > > No need for this, I think. Just use dev_is_dma_coherent() where you
> > > > need to know this.
> > > >
> > > No, we want this value cached by the driver.
> > >
> > Why? dev_is_dma_coherent() is a header-only function with a single
> > pointer chasing operation. Your cache is also a single pointer chasing
> > access, just that we now need storage for this information in both
> > struct device and struct etnaviv_gpu.
>
>
> You don't need store it in struct etnaviv_gpu.
>
> As this variable is shared across the device, so it is better to be put
> in the struct etnaviv_drm_private.
>
> I don't think another 4 bytes allocation is something what we can't pay for.
>
>
> My patch doesn't mentioned that it need to store it inside of struct
> etnaviv_gpu, do I?

You are right, I was mistaken about the etnaviv struct this is added
to. However there is still the fundamental question: what's the gain of
this cache? The information is already available in struct device and
will be accessed with the same amount of loads if you care that much
about micro-optimization.

Regards,
Lucas


2023-06-21 16:39:28

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device


On 2023/6/21 23:23, Lucas Stach wrote:
> Am Mittwoch, dem 21.06.2023 um 22:44 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/21 18:00, Lucas Stach wrote:
>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>>>> index 9cd72948cfad..644e5712c050 100644
>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>>>> @@ -46,6 +46,12 @@ struct etnaviv_drm_private {
>>>> struct xarray active_contexts;
>>>> u32 next_context_id;
>>>>
>>>> + /*
>>>> + * If true, the GPU is capable of snooping cpu cache. Here, it
>>>> + * also means that cache coherency is enforced by the hardware.
>>>> + */
>>>> + bool dma_coherent;
>>>> +
>>> No need for this, I think. Just use dev_is_dma_coherent() where you
>>> need to know this.
>>>
>> No, we want this value cached by the driver.
>>
> Why? dev_is_dma_coherent() is a header-only function with a single
> pointer chasing operation. Your cache is also a single pointer chasing
> access, just that we now need storage for this information in both
> struct device and struct etnaviv_gpu.


You don't need to store it in struct etnaviv_gpu.

As this variable is shared across the device, it is better placed in
struct etnaviv_drm_private.

I don't think another 4 bytes of storage is something we can't pay for.

My patch doesn't mention that it needs to be stored inside struct
etnaviv_gpu, does it?

> Regards,
> Lucas
>
>> We only need call  dev_is_dma_coherent() once!
>>
>> We need to reuse this variable on other places.
>>
--
Jingfeng


2023-06-21 16:56:18

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/22 00:12, Lucas Stach wrote:
> Am Mittwoch, dem 21.06.2023 um 23:41 +0800 schrieb Sui Jingfeng:
>> On 2023/6/21 23:23, Lucas Stach wrote:
>>> Am Mittwoch, dem 21.06.2023 um 22:44 +0800 schrieb Sui Jingfeng:
>>>> Hi,
>>>>
>>>> On 2023/6/21 18:00, Lucas Stach wrote:
>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>>>>>> index 9cd72948cfad..644e5712c050 100644
>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>>>>>> @@ -46,6 +46,12 @@ struct etnaviv_drm_private {
>>>>>> struct xarray active_contexts;
>>>>>> u32 next_context_id;
>>>>>>
>>>>>> + /*
>>>>>> + * If true, the GPU is capable of snooping cpu cache. Here, it
>>>>>> + * also means that cache coherency is enforced by the hardware.
>>>>>> + */
>>>>>> + bool dma_coherent;
>>>>>> +
>>>>> No need for this, I think. Just use dev_is_dma_coherent() where you
>>>>> need to know this.
>>>>>
>>>> No, we want this value cached by the driver.
>>>>
>>> Why? dev_is_dma_coherent() is a header-only function with a single
>>> pointer chasing operation. Your cache is also a single pointer chasing
>>> access, just that we now need storage for this information in both
>>> struct device and struct etnaviv_gpu.
>>
>> You don't need store it in struct etnaviv_gpu.
>>
>> As this variable is shared across the device, so it is better to be put
>> in the struct etnaviv_drm_private.
>>
>> I don't think another 4 bytes allocation is something what we can't pay for.
>>
>>
>> My patch doesn't mentioned that it need to store it inside of struct
>> etnaviv_gpu, do I?
> You are right, I was mistaken about the etnaviv struct this is added
> to. However there is still the fundamental question: what's the gain of
> this cache?

Clarity and brevity.

Your approach needs to dereference the struct device *dev pointer every
time you fetch the value.

The variable name is shorter, so typing it is less time-consuming.

> The information is already available in struct device and
> will be accessed with the same amount of loads if you care that much
> about micro-optimization.

I don't want to call it everywhere, it's too long.

What if the function you recommend gets expanded by some programmer someday?

> Regards,
> Lucas

--
Jingfeng


2023-06-21 17:09:28

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi

On 2023/6/21 23:58, Lucas Stach wrote:
>> you approach forbidden any possibility to use the WC BO at anywhere.
>>
>>
>> My approach need only check once, while you approach need at least 3
>> check plus
>>
>> so much bit-wise logic operations,  plus a function call  (&, ==, &&,
>> &, ~, &) .
>>
>> and every time you create a BO. This nasty judgement happens.
>>
> BO creation again is not a fast path. You are committing to allocate
> new memory, which is a few orders of magnitude more costly than the few
> instructions needed for those comparisons.
>
What's wrong with your point here is that we are not going to make
things worse just because they are already slow.

We would like to save every single bit of performance.

It's about beauty, and beauty and correctness are everything.

My implementation is good both from the perspective of beauty and from
the perspective of performance.

Whether BO creation is fast or not is irrelevant to the point we are
discussing.

You keep introducing non-critical factors to disturb the discussion,
leading people in the wrong direction.

--
Jingfeng


2023-06-21 17:17:15

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/22 00:12, Lucas Stach wrote:
> Am Mittwoch, dem 21.06.2023 um 23:41 +0800 schrieb Sui Jingfeng:
>> On 2023/6/21 23:23, Lucas Stach wrote:
>>> Am Mittwoch, dem 21.06.2023 um 22:44 +0800 schrieb Sui Jingfeng:
>>>> Hi,
>>>>
>>>> On 2023/6/21 18:00, Lucas Stach wrote:
>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>>>>>> index 9cd72948cfad..644e5712c050 100644
>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>>>>>> @@ -46,6 +46,12 @@ struct etnaviv_drm_private {
>>>>>> struct xarray active_contexts;
>>>>>> u32 next_context_id;
>>>>>>
>>>>>> + /*
>>>>>> + * If true, the GPU is capable of snooping cpu cache. Here, it
>>>>>> + * also means that cache coherency is enforced by the hardware.
>>>>>> + */
>>>>>> + bool dma_coherent;
>>>>>> +
>>>>> No need for this, I think. Just use dev_is_dma_coherent() where you
>>>>> need to know this.
>>>>>
>>>> No, we want this value cached by the driver.
>>>>
>>> Why? dev_is_dma_coherent() is a header-only function with a single
>>> pointer chasing operation. Your cache is also a single pointer chasing
>>> access, just that we now need storage for this information in both
>>> struct device and struct etnaviv_gpu.
>>
>> You don't need store it in struct etnaviv_gpu.
>>
>> As this variable is shared across the device, so it is better to be put
>> in the struct etnaviv_drm_private.
>>
>> I don't think another 4 bytes allocation is something what we can't pay for.
>>
>>
>> My patch doesn't mentioned that it need to store it inside of struct
>> etnaviv_gpu, do I?
> You are right, I was mistaken about the etnaviv struct this is added
> to. However there is still the fundamental question: what's the gain of
> this cache? The information is already available in struct device and
> will be accessed with the same amount of loads if you care that much
> about micro-optimization.

Sometimes, in some functions it is more convenient (easier) to fetch
'struct etnaviv_drm_private *priv'

than 'struct device *dev'; I think this is obvious.

> Regards,
> Lucas

--
Jingfeng


2023-06-21 17:42:25

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/22 00:07, Lucas Stach wrote:
> And as the HW guarantees it on your platform, your platform
> implementation makes this function effectively a no-op. Skipping the
> call to this function is breaking the DMA API abstraction, as now the
> driver is second guessing the DMA API implementation. I really see no
> reason to do this.

It is for the same reason that you chose the word 'effectively', not
'definitely'.

We don't want to waste the CPU's time running the dma_sync_sg_for_cpu()
function


```

void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
            int nelems, enum dma_data_direction dir)
{
    const struct dma_map_ops *ops = get_dma_ops(dev);

    BUG_ON(!valid_dma_direction(dir));
    if (dma_map_direct(dev, ops))
        dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
    else if (ops->sync_sg_for_cpu)
        ops->sync_sg_for_cpu(dev, sg, nelems, dir);
    debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
}

```

every time we run this:


```

int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
{
    struct drm_device *dev = obj->dev;
    struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
    struct etnaviv_drm_private *priv = dev->dev_private;

    if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
        /* fini without a prep is almost certainly a userspace error */
        WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
        dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op));
        etnaviv_obj->last_cpu_prep_op = 0;
    }

    return 0;
}

```


But this is acceptable, because we can skip the GEM_CPU_PREP and
GEM_CPU_FINI ioctls entirely in userspace for cached buffers, as they
are totally unneeded for cached mappings on our platform.

We'll leave this for WC mappings only,

OK?

--
Jingfeng


2023-06-21 17:50:55

by Sui Jingfeng

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/21 23:58, Lucas Stach wrote:
> Am Mittwoch, dem 21.06.2023 um 23:30 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/21 18:00, Lucas Stach wrote:
>>>> dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
>>>> etnaviv_op_to_dma_dir(op));
>>>> etnaviv_obj->last_cpu_prep_op = op;
>>>> @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>>>> {
>>>> struct drm_device *dev = obj->dev;
>>>> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>>> + struct etnaviv_drm_private *priv = dev->dev_private;
>>>>
>>>> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>> /* fini without a prep is almost certainly a userspace error */
>>>> WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>>>> dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>> index 3524b5811682..754126992264 100644
>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>> @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
>>>> struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
>>>> struct dma_buf_attachment *attach, struct sg_table *sgt)
>>>> {
>>>> + struct etnaviv_drm_private *priv = dev->dev_private;
>>>> struct etnaviv_gem_object *etnaviv_obj;
>>>> size_t size = PAGE_ALIGN(attach->dmabuf->size);
>>>> + u32 cache_flags = ETNA_BO_WC;
>>>> int ret, npages;
>>>>
>>>> - ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
>>>> + if (priv->dma_coherent)
>>>> + cache_flags = ETNA_BO_CACHED;
>>>> +
>>> Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
>>> from WC to CACHED as necessary by adding something like this:
>> I understand you are a profession person in vivante GPU driver domain.
>>
>> I respect you reviews and instruction.
>>
>> But, I'm really reluctant to agree with this, is there any space to
>> negotiate?
>>
>>> /*
>>> * Upgrade WC to CACHED when the device is hardware coherent and the
>>> * platform doesn't allow mixing cached and writecombined mappings to
>>> * the same memory area.
>>> */
>>> if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
>>> dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
>>> flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;
>> This is policy, not a mechanism.
>>
>> Using what cache property is a user-space program's choice.
>>
>> While you are override the WC with CACHED mapping. This is not correct
>> in the concept!
>>
> Please explain why you think that this isn't correct.

Again,

this is a user-space matter!

this is a user-space matter!

this is a user-space matter!

I have explained this several times.

Making the decision for the user-space program is wrong.


> It allows
> userspace to use WC mappings that would potentially cause loss of
> coherency between CPU and GPU, which isn't acceptable.

Before making WC work correctly, you need a development environment
where the userspace program can tune the BO cache mapping easily,
either via an environment variable or a configuration file.

With your implementation, we don't have the opportunity to do that
debugging and development.

The kernel is hard-coding the policy.


> Regards,
> Lucas

--
Jingfeng


2023-06-21 18:07:31

by Lucas Stach

Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

On Thursday, 2023-06-22 at 01:21 +0800, Sui Jingfeng wrote:
> Hi,
>
> On 2023/6/21 23:58, Lucas Stach wrote:
> > Am Mittwoch, dem 21.06.2023 um 23:30 +0800 schrieb Sui Jingfeng:
> > > Hi,
> > >
> > > On 2023/6/21 18:00, Lucas Stach wrote:
> > > > > dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
> > > > > etnaviv_op_to_dma_dir(op));
> > > > > etnaviv_obj->last_cpu_prep_op = op;
> > > > > @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
> > > > > {
> > > > > struct drm_device *dev = obj->dev;
> > > > > struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> > > > > + struct etnaviv_drm_private *priv = dev->dev_private;
> > > > >
> > > > > - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > > > + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > > > /* fini without a prep is almost certainly a userspace error */
> > > > > WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
> > > > > dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
> > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > > > index 3524b5811682..754126992264 100644
> > > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > > > @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
> > > > > struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
> > > > > struct dma_buf_attachment *attach, struct sg_table *sgt)
> > > > > {
> > > > > + struct etnaviv_drm_private *priv = dev->dev_private;
> > > > > struct etnaviv_gem_object *etnaviv_obj;
> > > > > size_t size = PAGE_ALIGN(attach->dmabuf->size);
> > > > > + u32 cache_flags = ETNA_BO_WC;
> > > > > int ret, npages;
> > > > >
> > > > > - ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
> > > > > + if (priv->dma_coherent)
> > > > > + cache_flags = ETNA_BO_CACHED;
> > > > > +
> > > > Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
> > > > from WC to CACHED as necessary by adding something like this:
> > > I understand you are a profession person in vivante GPU driver domain.
> > >
> > > I respect you reviews and instruction.
> > >
> > > But, I'm really reluctant to agree with this, is there any space to
> > > negotiate?
> > >
> > > > /*
> > > > * Upgrade WC to CACHED when the device is hardware coherent and the
> > > > * platform doesn't allow mixing cached and writecombined mappings to
> > > > * the same memory area.
> > > > */
> > > > if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
> > > > dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
> > > > flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;
> > > This is policy, not a mechanism.
> > >
> > > Using what cache property is a user-space program's choice.
> > >
> > > While you are override the WC with CACHED mapping. This is not correct
> > > in the concept!
> > >
> > Please explain why you think that this isn't correct.
>
> Again,
>
> this is user-space things!
>
> this is user-space things!
>
> this is user-space things!
>
> I have explained several times.
>
> made the decision for the user-space program is wrong.
>
This mode of communication isn't helpful. Please stop it.

As I tried to explain to you multiple times: if userspace can break
coherency by selecting the wrong mapping type then this is something
the kernel should prevent.
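
For reference, a minimal sketch of that upgrade as a standalone helper
(the function name is made up; note the bitwise OR when adding
ETNA_BO_CACHED back in, the snippet quoted earlier presumably meant "|"
rather than "&" there, since "&" would clear all the other flag bits):

```
#include <linux/dma-map-ops.h>	/* dev_is_dma_coherent() */
#include <drm/drm_cache.h>	/* drm_arch_can_wc_memory() */

static u32 etnaviv_upgrade_cache_flags(struct device *dev, u32 flags)
{
	/*
	 * Upgrade WC to CACHED when the device is hardware coherent and
	 * the platform doesn't allow mixing cached and write-combined
	 * mappings of the same memory area.
	 */
	if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
	    dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
		flags = (flags & ~ETNA_BO_CACHE_MASK) | ETNA_BO_CACHED;

	return flags;
}
```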

>
> > It allows
> > userspace to use WC mappings that would potentially cause loss of
> > coherency between CPU and GPU, which isn't acceptable.
>
> Before made the WC works correctly,  you need the developing environment.
>
> userspace program can tune the BO cache mapping easily.
>
> Either environment or supply a conf file.
>
>
> While with your implement, we don't have the opportunity to do debugging
> and the development.

You can always add a patch to your local kernel to re-allow WC mappings
while you work on making them behave as expected on your platform. With
the mainline kernel there is no way that the kernel driver will allow
broken coherency.

And as I also mentioned before, there is a clear upgrade path here:
once WC mappings work as expected on your platform we can easily drop
the upgrading from the kernel driver again. The userspace driver can
already be changed to use CACHED BOs where beneficial on your platform
in the meantime.

Regards,
Lucas

2023-06-21 18:07:41

by Lucas Stach

[permalink] [raw]
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Am Donnerstag, dem 22.06.2023 um 01:31 +0800 schrieb Sui Jingfeng:
> Hi,
>
> On 2023/6/22 00:07, Lucas Stach wrote:
> > And as the HW guarantees it on your platform, your platform
> > implementation makes this function effectively a no-op. Skipping the
> > call to this function is breaking the DMA API abstraction, as now the
> > driver is second guessing the DMA API implementation. I really see no
> > reason to do this.
>
> It is the same reason you chose the word 'effectively', not 'difinitely'.
>
> We don't want waste the CPU's time,
>
>
>  to running the dma_sync_sg_for_cpu funcion() function
>
>
> ```
>
> void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>             int nelems, enum dma_data_direction dir)
> {
>     const struct dma_map_ops *ops = get_dma_ops(dev);
>
>     BUG_ON(!valid_dma_direction(dir));
>     if (dma_map_direct(dev, ops))
>         dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
>     else if (ops->sync_sg_for_cpu)
>         ops->sync_sg_for_cpu(dev, sg, nelems, dir);
>     debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
> }
>
> ```
>
>
>  to running the this:
>
>
> ```
>
> int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
> {
>     struct drm_device *dev = obj->dev;
>     struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>     struct etnaviv_drm_private *priv = dev->dev_private;
>
>     if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>         /* fini without a prep is almost certainly a userspace error */
>         WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>         dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
> etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op));
>         etnaviv_obj->last_cpu_prep_op = 0;
>     }
>
>     return 0;
> }
>
> ```
>
My judgment as the maintainer of this driver is that the small CPU
overhead of calling this function is very well worth it, if the
alternative is breaking the DMA API abstractions.

>
> But, this is acceptable, because we can kill the GEM_CPU_PREP and
> GEM_CPU_FINI ioctl entirely
>
> at userspace for cached buffer, as this is totally not needed for cached
> mapping on our platform.
>
And that statement isn't true either. The CPU_PREP/FINI ioctls also
provide fence synchronization between CPU and GPU. There are a few very
specific cases where skipping those ioctls is acceptable (mostly when
the userspace driver explicitly wants unsynchronized access), but in
most cases they are required for correctness.

Regards,
Lucas

2023-06-22 20:56:28

by Sui Jingfeng

[permalink] [raw]
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi

On 2023/6/21 23:58, Lucas Stach wrote:
> Am Mittwoch, dem 21.06.2023 um 23:30 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/21 18:00, Lucas Stach wrote:
>>>> dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
>>>> etnaviv_op_to_dma_dir(op));
>>>> etnaviv_obj->last_cpu_prep_op = op;
>>>> @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>>>> {
>>>> struct drm_device *dev = obj->dev;
>>>> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>>> + struct etnaviv_drm_private *priv = dev->dev_private;
>>>>
>>>> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>> /* fini without a prep is almost certainly a userspace error */
>>>> WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>>>> dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>> index 3524b5811682..754126992264 100644
>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>> @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
>>>> struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
>>>> struct dma_buf_attachment *attach, struct sg_table *sgt)
>>>> {
>>>> + struct etnaviv_drm_private *priv = dev->dev_private;
>>>> struct etnaviv_gem_object *etnaviv_obj;
>>>> size_t size = PAGE_ALIGN(attach->dmabuf->size);
>>>> + u32 cache_flags = ETNA_BO_WC;
>>>> int ret, npages;
>>>>
>>>> - ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
>>>> + if (priv->dma_coherent)
>>>> + cache_flags = ETNA_BO_CACHED;
>>>> +
>>> Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
>>> from WC to CACHED as necessary by adding something like this:
>> I understand you are a profession person in vivante GPU driver domain.
>>
>> I respect you reviews and instruction.
>>
>> But, I'm really reluctant to agree with this, is there any space to
>> negotiate?
>>
>>> /*
>>> * Upgrade WC to CACHED when the device is hardware coherent and the
>>> * platform doesn't allow mixing cached and writecombined mappings to
>>> * the same memory area.
>>> */
>>> if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
>>> dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
>>> flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;
>> This is policy, not a mechanism.
>>
>> Using what cache property is a user-space program's choice.
>>
>> While you are override the WC with CACHED mapping. This is not correct
>> in the concept!
>>
> Please explain why you think that this isn't correct.

This is NOT always correct, because:

When etnaviv_gem_prime_import_sg_table() is called, drm/etnaviv is
importing a buffer from another driver (for example drm/loongson or
drm/ingenic). drm/etnaviv is the importer, and drm/loongson or
drm/ingenic is the exporter.

If drm/etnaviv uses the WC mapping by default, this may cause *cache
aliasing*, because the imported buffer originally belongs to the KMS
driver side, and the KMS driver may choose a cached mapping by default.

For drm/ingenic (jz4770), the BO will be cached, but their hardware
cannot guarantee coherency, so they have to flush the cache manually
when doing the framebuffer update (damage/dirty update), because a
cached mapping is faster than a WC mapping.

For drm/loongson, shared buffers have to be pinned into the GTT. By
default we use cached mappings for buffers in system RAM; we are lucky
that the hardware maintains cache coherency for us.

But drm/etnaviv chooses the WC mapping for the imported buffer blindly.
When doing the import, *drm/etnaviv knows nothing about the original
buffer's caching property.* This is the problem.

Currently, I guess drm/etnaviv only works with drm/imx, because drm/imx
chooses WC as the cache property of the dumb buffer by default. The
cache properties of the two mappings match, so it works. But there is no
communication between the KMS driver and the RO driver.

I think drm/etnaviv will also be wrong on ARM64 platforms where cache
coherency is maintained by the hardware: if the KMS driver uses a cached
mapping while drm/etnaviv uses a WC mapping, there will still be
problems.

I will go and find the hardware, and do the testing for you.


> If using WC
> mappings cause a potential loss of coherency on your platform, then we
> can not allow the userspace driver to use WC mappings.

I have explained, with my broken English, that the loss of coherency may
be because the KMS-RO framework has a defect, maybe a software-side bug.
Please don't jump to a conclusion too fast.

An Xorg DDX driver (an EXA-based DDX, for example) falls back to the CPU
for software rendering. That is even faster than the hardware-accelerated
implementation, especially where the CPU is fast and the GPU is weak.
LS7A1000 (GC1000) + [email protected] X4 core is such a case.

For discrete on-board VRAM we will use the WC mapping; it is not shared
with other drivers, the CPU uses it by itself, and that is definitely OK.
In practice, it works like a charm.

For buffers in system RAM and the GTT, we will use cached mappings for
faster CPU software rendering, hard-coded by the driver. We benefit a lot
from the cached mapping.

When using a write-combined mapping for a buffer on LoongArch and
Loongson MIPS CPUs, CPU writes are fine. But when you want to read from
such a buffer with the CPU, it just boils down to an uncached read. On
our platform, an uncached read serves as a kind of SYNC: it flushes the
data in the CPU's write buffer. This causes a performance drop.

But we can still use the write-combined mapping in circumstances where
no CPU read is needed. So allowing write-combine provides one more
choice to user-space; it is just "use it at your own risk".


> As I would like to keep the option of WC mappings,

Yeah, I know the reason.

On the platforms you love (imx6q, imx8q, for example), the hardware
doesn't maintain cache coherency. So if you fall back to software
rendering, you need to flush the cache frequently, and the performance
drops very much, right?


--
Jingfeng


2023-06-23 12:17:30

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

On 2023-06-20 10:47, Sui Jingfeng wrote:
> From: Sui Jingfeng <[email protected]>
>
> Loongson CPUs maintain cache coherency by hardware, which means that the
> data in the CPU cache is identical to the data in main system memory. As
> for the peripheral device, most of Loongson chips chose to define the
> peripherals as DMA coherent by default, device drivers do not need to
> maintain the coherency between a processor and an I/O device manually.
>
> There are exceptions, for LS2K1000 SoC, part of peripheral device can be
> configured as DMA non-coherent. But there is no released version of such
> firmware exist in the market. Peripherals of older LS2K1000 is also DMA
> non-coherent, but they are nearly outdated. So, those are trivial cases.
>
> Nevertheless, kernel space still need to do the probe work, because vivante
> GPU IP has been integrated into various platform. Hence, this patch add
> runtime detection code to probe if a specific GPU is DMA coherent, If the
> answer is yes, we are going to utilize such features. On Loongson platform,
> When a buffer is accessed by both the GPU and the CPU, the driver should
> prefer ETNA_BO_CACHED over ETNA_BO_WC.
>
> This patch also add a new parameter: etnaviv_param_gpu_coherent, which
> allow userspace to know if such a feature is available. Because
> write-combined BO is still preferred in some case, especially where don't
> need CPU read, for example, uploading compiled shader bin.
>
> Cc: Lucas Stach <[email protected]>
> Cc: Christian Gmeiner <[email protected]>
> Cc: Philipp Zabel <[email protected]>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Signed-off-by: Sui Jingfeng <[email protected]>
> ---
> drivers/gpu/drm/etnaviv/etnaviv_drv.c | 35 +++++++++++++++++++++
> drivers/gpu/drm/etnaviv/etnaviv_drv.h | 6 ++++
> drivers/gpu/drm/etnaviv/etnaviv_gem.c | 22 ++++++++++---
> drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 7 ++++-
> drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 +++
> include/uapi/drm/etnaviv_drm.h | 1 +
> 6 files changed, 70 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> index 0a365e96d371..d8e788aa16cb 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> @@ -5,7 +5,9 @@
>
> #include <linux/component.h>
> #include <linux/dma-mapping.h>
> +#include <linux/dma-map-ops.h>

/*
* This header is for implementations of dma_map_ops and related code.
* It should not be included in drivers just using the DMA API.
*/

> #include <linux/module.h>
> +#include <linux/of_address.h>
> #include <linux/of_platform.h>
> #include <linux/uaccess.h>
>
> @@ -24,6 +26,34 @@
> #include "etnaviv_pci_drv.h"
> #include "etnaviv_perfmon.h"
>
> +static struct device_node *etnaviv_of_first_available_node(void)
> +{
> + struct device_node *core_node;
> +
> + for_each_compatible_node(core_node, NULL, "vivante,gc") {
> + if (of_device_is_available(core_node))
> + return core_node;
> + }
> +
> + return NULL;
> +}
> +
> +static bool etnaviv_is_dma_coherent(struct device *dev)
> +{
> + struct device_node *np;
> + bool coherent;
> +
> + np = etnaviv_of_first_available_node();
> + if (np) {
> + coherent = of_dma_is_coherent(np);
> + of_node_put(np);
> + } else {
> + coherent = dev_is_dma_coherent(dev);
> + }

Please use device_get_dma_attr() like other well-behaved drivers.

> +
> + return coherent;
> +}
> +
> /*
> * etnaviv private data construction and destructions:
> */
> @@ -52,6 +82,11 @@ etnaviv_alloc_private(struct device *dev, struct drm_device *drm)
> return ERR_PTR(-ENOMEM);
> }
>
> + priv->dma_coherent = etnaviv_is_dma_coherent(dev);
> +
> + if (priv->dma_coherent)
> + drm_info(drm, "%s is dma coherent\n", dev_name(dev));

I'm pretty sure the end-user doesn't care.

> +
> return priv;
> }
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> index 9cd72948cfad..644e5712c050 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> @@ -46,6 +46,12 @@ struct etnaviv_drm_private {
> struct xarray active_contexts;
> u32 next_context_id;
>
> + /*
> + * If true, the GPU is capable of snooping cpu cache. Here, it
> + * also means that cache coherency is enforced by the hardware.
> + */
> + bool dma_coherent;
> +
> /* list of GEM objects: */
> struct mutex gem_lock;
> struct list_head gem_list;
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index b5f73502e3dd..39bdc3774f2d 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -343,6 +343,7 @@ void *etnaviv_gem_vmap(struct drm_gem_object *obj)
> static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
> {
> struct page **pages;
> + pgprot_t prot;
>
> lockdep_assert_held(&obj->lock);
>
> @@ -350,8 +351,19 @@ static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
> if (IS_ERR(pages))
> return NULL;
>
> - return vmap(pages, obj->base.size >> PAGE_SHIFT,
> - VM_MAP, pgprot_writecombine(PAGE_KERNEL));
> + switch (obj->flags) {
> + case ETNA_BO_CACHED:
> + prot = PAGE_KERNEL;
> + break;
> + case ETNA_BO_UNCACHED:
> + prot = pgprot_noncached(PAGE_KERNEL);
> + break;
> + case ETNA_BO_WC:
> + default:
> + prot = pgprot_writecombine(PAGE_KERNEL);
> + }
> +
> + return vmap(pages, obj->base.size >> PAGE_SHIFT, VM_MAP, prot);
> }
>
> static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
> @@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> {
> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> struct drm_device *dev = obj->dev;
> + struct etnaviv_drm_private *priv = dev->dev_private;
> bool write = !!(op & ETNA_PREP_WRITE);
> int ret;
>
> @@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> return ret == 0 ? -ETIMEDOUT : ret;
> }
>
> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {

TBH the existing condition already looks a bit sketchy - even if
userspace has a non-cacheable mapping, a DMA sync may need to do things
other than cache maintenance - but this change certainly isn't making
things any better. If you can demonstrate a *measurable* and significant
overhead from calling dma_sync_sgtable_*() on your platform when the
device is coherent and nothing is bounce-buffered, then the first thing
we can do is look at making that quicker (for everyone). Otherwise, this
seems like the famous bad kind of premature optimisation.
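
A throwaway instrumentation sketch to get such a number could look like
this (debug only, the helper name is made up; it would be called with
the object's sg_table from e.g. etnaviv_gem_cpu_prep()):

```
#include <linux/dma-mapping.h>
#include <linux/ktime.h>

/* Debug helper: time a single dma_sync_sgtable_for_cpu() call. */
static void etnaviv_time_sync_for_cpu(struct device *dev,
				      struct sg_table *sgt,
				      enum dma_data_direction dir)
{
	ktime_t start = ktime_get();

	dma_sync_sgtable_for_cpu(dev, sgt, dir);

	pr_info("etnaviv: dma_sync_sgtable_for_cpu() took %lld ns\n",
		ktime_to_ns(ktime_sub(ktime_get(), start)));
}
```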

Thanks,
Robin.

> dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
> etnaviv_op_to_dma_dir(op));
> etnaviv_obj->last_cpu_prep_op = op;
> @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
> {
> struct drm_device *dev = obj->dev;
> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> + struct etnaviv_drm_private *priv = dev->dev_private;
>
> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
> /* fini without a prep is almost certainly a userspace error */
> WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
> dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> index 3524b5811682..754126992264 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
> struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
> struct dma_buf_attachment *attach, struct sg_table *sgt)
> {
> + struct etnaviv_drm_private *priv = dev->dev_private;
> struct etnaviv_gem_object *etnaviv_obj;
> size_t size = PAGE_ALIGN(attach->dmabuf->size);
> + u32 cache_flags = ETNA_BO_WC;
> int ret, npages;
>
> - ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
> + if (priv->dma_coherent)
> + cache_flags = ETNA_BO_CACHED;
> +
> + ret = etnaviv_gem_new_private(dev, size, cache_flags,
> &etnaviv_gem_prime_ops, &etnaviv_obj);
> if (ret < 0)
> return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> index d6a21e97feb1..d99ac675ce8b 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> @@ -164,6 +164,10 @@ int etnaviv_gpu_get_param(struct etnaviv_gpu *gpu, u32 param, u64 *value)
> *value = gpu->identity.eco_id;
> break;
>
> + case ETNAVIV_PARAM_GPU_COHERENT:
> + *value = priv->dma_coherent;
> + break;
> +
> default:
> DBG("%s: invalid param: %u", dev_name(gpu->dev), param);
> return -EINVAL;
> diff --git a/include/uapi/drm/etnaviv_drm.h b/include/uapi/drm/etnaviv_drm.h
> index af024d90453d..76baf45d7158 100644
> --- a/include/uapi/drm/etnaviv_drm.h
> +++ b/include/uapi/drm/etnaviv_drm.h
> @@ -77,6 +77,7 @@ struct drm_etnaviv_timespec {
> #define ETNAVIV_PARAM_GPU_PRODUCT_ID 0x1c
> #define ETNAVIV_PARAM_GPU_CUSTOMER_ID 0x1d
> #define ETNAVIV_PARAM_GPU_ECO_ID 0x1e
> +#define ETNAVIV_PARAM_GPU_COHERENT 0x1f
>
> #define ETNA_MAX_PIPES 4
>

2023-06-23 12:43:35

by Sui Jingfeng

[permalink] [raw]
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/23 19:52, Robin Murphy wrote:
> On 2023-06-20 10:47, Sui Jingfeng wrote:
>> From: Sui Jingfeng <[email protected]>
>>
>> Loongson CPUs maintain cache coherency by hardware, which means that the
>> data in the CPU cache is identical to the data in main system memory. As
>> for the peripheral device, most of Loongson chips chose to define the
>> peripherals as DMA coherent by default, device drivers do not need to
>> maintain the coherency between a processor and an I/O device manually.
>>
>> There are exceptions, for LS2K1000 SoC, part of peripheral device can be
>> configured as DMA non-coherent. But there is no released version of such
>> firmware exist in the market. Peripherals of older LS2K1000 is also DMA
>> non-coherent, but they are nearly outdated. So, those are trivial cases.
>>
>> Nevertheless, kernel space still need to do the probe work, because
>> vivante
>> GPU IP has been integrated into various platform. Hence, this patch add
>> runtime detection code to probe if a specific GPU is DMA coherent, If
>> the
>> answer is yes, we are going to utilize such features. On Loongson
>> platform,
>> When a buffer is accessed by both the GPU and the CPU, the driver should
>> prefer ETNA_BO_CACHED over ETNA_BO_WC.
>>
>> This patch also add a new parameter: etnaviv_param_gpu_coherent, which
>> allow userspace to know if such a feature is available. Because
>> write-combined BO is still preferred in some case, especially where
>> don't
>> need CPU read, for example, uploading compiled shader bin.
>>
>> Cc: Lucas Stach <[email protected]>
>> Cc: Christian Gmeiner <[email protected]>
>> Cc: Philipp Zabel <[email protected]>
>> Cc: Bjorn Helgaas <[email protected]>
>> Cc: Daniel Vetter <[email protected]>
>> Signed-off-by: Sui Jingfeng <[email protected]>
>> ---
>>   drivers/gpu/drm/etnaviv/etnaviv_drv.c       | 35 +++++++++++++++++++++
>>   drivers/gpu/drm/etnaviv/etnaviv_drv.h       |  6 ++++
>>   drivers/gpu/drm/etnaviv/etnaviv_gem.c       | 22 ++++++++++---
>>   drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c |  7 ++++-
>>   drivers/gpu/drm/etnaviv/etnaviv_gpu.c       |  4 +++
>>   include/uapi/drm/etnaviv_drm.h              |  1 +
>>   6 files changed, 70 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
>> b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
>> index 0a365e96d371..d8e788aa16cb 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
>> @@ -5,7 +5,9 @@
>>     #include <linux/component.h>
>>   #include <linux/dma-mapping.h>
>> +#include <linux/dma-map-ops.h>
>
> /*
>  * This header is for implementations of dma_map_ops and related code.
>  * It should not be included in drivers just using the DMA API.
>  */
>
1) The driver cannot be compiled without including linux/dma-map-ops.h.

You may be right, but Lucas suggested using the dev_is_dma_coherent()
function, and dev_is_dma_coherent() is defined in dma-map-ops.h.

For our platform it is:


```

static inline bool dev_is_dma_coherent(struct device *dev)
{
    return true;
}

```

2) drm/exynos and drm/msm also include this header:


```

$ cd drivers/gpu/drm

$ find . -name "*.c" -type f | xargs grep "dma-map-ops.h"

./exynos/exynos_drm_dma.c:#include <linux/dma-map-ops.h>
./msm/msm_gem.c:#include <linux/dma-map-ops.h>

```


>>   #include <linux/module.h>
>> +#include <linux/of_address.h>
>>   #include <linux/of_platform.h>
>>   #include <linux/uaccess.h>
>>   @@ -24,6 +26,34 @@
>>   #include "etnaviv_pci_drv.h"
>>   #include "etnaviv_perfmon.h"
>>   +static struct device_node *etnaviv_of_first_available_node(void)
>> +{
>> +    struct device_node *core_node;
>> +
>> +    for_each_compatible_node(core_node, NULL, "vivante,gc") {
>> +        if (of_device_is_available(core_node))
>> +            return core_node;
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static bool etnaviv_is_dma_coherent(struct device *dev)
>> +{
>> +    struct device_node *np;
>> +    bool coherent;
>> +
>> +    np = etnaviv_of_first_available_node();
>> +    if (np) {
>> +        coherent = of_dma_is_coherent(np);
>> +        of_node_put(np);
>> +    } else {
>> +        coherent = dev_is_dma_coherent(dev);
>> +    }
>
> Please use device_get_dma_attr() like other well-behaved drivers.
>
OK, I agree with you.


```

priv->dma_coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;

```

Lucas, does the above code snippet look fine to you?
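
If that is acceptable, the whole probe helper could shrink to a sketch
like this (assuming the OF walk over "vivante,gc" nodes is dropped;
device_get_dma_attr() is declared in linux/property.h):

```
#include <linux/property.h>

/* Sketch: derive coherency from the firmware description. */
static bool etnaviv_is_dma_coherent(struct device *dev)
{
	return device_get_dma_attr(dev) == DEV_DMA_COHERENT;
}
```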


>> +
>> +    return coherent;
>> +}
>> +
>>   /*
>>    * etnaviv private data construction and destructions:
>>    */
>> @@ -52,6 +82,11 @@ etnaviv_alloc_private(struct device *dev, struct
>> drm_device *drm)
>>           return ERR_PTR(-ENOMEM);
>>       }
>>   +    priv->dma_coherent = etnaviv_is_dma_coherent(dev);
>> +
>> +    if (priv->dma_coherent)
>> +        drm_info(drm, "%s is dma coherent\n", dev_name(dev));
>
> I'm pretty sure the end-user doesn't care.
>
OK, I will delete it in the next version.

>> +
>>       return priv;
>>   }
>>   diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>> b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>> index 9cd72948cfad..644e5712c050 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
>> @@ -46,6 +46,12 @@ struct etnaviv_drm_private {
>>       struct xarray active_contexts;
>>       u32 next_context_id;
>>   +    /*
>> +     * If true, the GPU is capable of snooping cpu cache. Here, it
>> +     * also means that cache coherency is enforced by the hardware.
>> +     */
>> +    bool dma_coherent;
>> +
>>       /* list of GEM objects: */
>>       struct mutex gem_lock;
>>       struct list_head gem_list;
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> index b5f73502e3dd..39bdc3774f2d 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> @@ -343,6 +343,7 @@ void *etnaviv_gem_vmap(struct drm_gem_object *obj)
>>   static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
>>   {
>>       struct page **pages;
>> +    pgprot_t prot;
>>         lockdep_assert_held(&obj->lock);
>>   @@ -350,8 +351,19 @@ static void *etnaviv_gem_vmap_impl(struct
>> etnaviv_gem_object *obj)
>>       if (IS_ERR(pages))
>>           return NULL;
>>   -    return vmap(pages, obj->base.size >> PAGE_SHIFT,
>> -            VM_MAP, pgprot_writecombine(PAGE_KERNEL));
>> +    switch (obj->flags) {
>> +    case ETNA_BO_CACHED:
>> +        prot = PAGE_KERNEL;
>> +        break;
>> +    case ETNA_BO_UNCACHED:
>> +        prot = pgprot_noncached(PAGE_KERNEL);
>> +        break;
>> +    case ETNA_BO_WC:
>> +    default:
>> +        prot = pgprot_writecombine(PAGE_KERNEL);
>> +    }
>> +
>> +    return vmap(pages, obj->base.size >> PAGE_SHIFT, VM_MAP, prot);
>>   }
>>     static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
>> @@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object
>> *obj, u32 op,
>>   {
>>       struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>       struct drm_device *dev = obj->dev;
>> +    struct etnaviv_drm_private *priv = dev->dev_private;
>>       bool write = !!(op & ETNA_PREP_WRITE);
>>       int ret;
>>   @@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object
>> *obj, u32 op,
>>               return ret == 0 ? -ETIMEDOUT : ret;
>>       }
>>   -    if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>> +    if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>
> TBH the existing condition already looks a bit sketchy - even if
> userspace has a non-cacheable mapping, a DMA sync may need to do
> things other than cache maintenance - but this change certainly isn't
> making things any better. If you can demonstrate a *measurable* and
> significant overhead from calling dma_sync_sgtable_*() on your
> platform when the device is coherent and nothing is bounce-buffered,
> then the first thing we can do is look at making that quicker (for
> everyone). Otherwise, this seems like the famous bad kind of premature
> optimisation.
>
OK, I agree with you; both you and Lucas told me to remove this.

I will remove it in the next version.

It doesn't hurt the performance much: on our platform, the performance
is GPU bound.


> Thanks,
> Robin.
>
>> dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
>>                        etnaviv_op_to_dma_dir(op));
>>           etnaviv_obj->last_cpu_prep_op = op;
>> @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>>   {
>>       struct drm_device *dev = obj->dev;
>>       struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>> +    struct etnaviv_drm_private *priv = dev->dev_private;
>>   -    if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>> +    if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>           /* fini without a prep is almost certainly a userspace
>> error */
>>           WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>>           dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>> b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>> index 3524b5811682..754126992264 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>> @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops
>> etnaviv_gem_prime_ops = {
>>   struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct
>> drm_device *dev,
>>       struct dma_buf_attachment *attach, struct sg_table *sgt)
>>   {
>> +    struct etnaviv_drm_private *priv = dev->dev_private;
>>       struct etnaviv_gem_object *etnaviv_obj;
>>       size_t size = PAGE_ALIGN(attach->dmabuf->size);
>> +    u32 cache_flags = ETNA_BO_WC;
>>       int ret, npages;
>>   -    ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
>> +    if (priv->dma_coherent)
>> +        cache_flags = ETNA_BO_CACHED;
>> +
>> +    ret = etnaviv_gem_new_private(dev, size, cache_flags,
>>                         &etnaviv_gem_prime_ops, &etnaviv_obj);
>>       if (ret < 0)
>>           return ERR_PTR(ret);
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
>> b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
>> index d6a21e97feb1..d99ac675ce8b 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
>> @@ -164,6 +164,10 @@ int etnaviv_gpu_get_param(struct etnaviv_gpu
>> *gpu, u32 param, u64 *value)
>>           *value = gpu->identity.eco_id;
>>           break;
>>   +    case ETNAVIV_PARAM_GPU_COHERENT:
>> +        *value = priv->dma_coherent;
>> +        break;
>> +
>>       default:
>>           DBG("%s: invalid param: %u", dev_name(gpu->dev), param);
>>           return -EINVAL;
>> diff --git a/include/uapi/drm/etnaviv_drm.h
>> b/include/uapi/drm/etnaviv_drm.h
>> index af024d90453d..76baf45d7158 100644
>> --- a/include/uapi/drm/etnaviv_drm.h
>> +++ b/include/uapi/drm/etnaviv_drm.h
>> @@ -77,6 +77,7 @@ struct drm_etnaviv_timespec {
>>   #define ETNAVIV_PARAM_GPU_PRODUCT_ID                0x1c
>>   #define ETNAVIV_PARAM_GPU_CUSTOMER_ID               0x1d
>>   #define ETNAVIV_PARAM_GPU_ECO_ID                    0x1e
>> +#define ETNAVIV_PARAM_GPU_COHERENT                  0x1f
>>     #define ETNA_MAX_PIPES 4

--
Jingfeng


2023-06-24 16:18:50

by Sui Jingfeng

[permalink] [raw]
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/22 01:45, Lucas Stach wrote:
>> Again,
>>
>> this is user-space things!
>>
>> this is user-space things!
>>
>> this is user-space things!
>>
>> I have explained several times.
>>
>> made the decision for the user-space program is wrong.
>>
> This mode of communication isn't helpful. Please stop it.
>
> As I tried to explain to you multiple times: if userspace can break
> coherency by selecting the wrong mapping type then this is something
> the kernel should prevent.
>
This is still a user-space thing.

My point is that we want to prevent it in userspace.

Please don't forget that sometimes a user or a programmer wants to do
the "wrong" thing, either for debugging or out of curiosity about what
would happen if we do something wrong. Sometimes the wrong rendering is
even more interesting.

The most important point is that my patch is trying to increase
flexibility and possibilities, while your idea does the reverse.


--
Jingfeng


2023-06-25 04:15:51

by Sui Jingfeng

[permalink] [raw]
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/22 01:45, Lucas Stach wrote:
> Am Donnerstag, dem 22.06.2023 um 01:21 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/21 23:58, Lucas Stach wrote:
>>> Am Mittwoch, dem 21.06.2023 um 23:30 +0800 schrieb Sui Jingfeng:
>>>> Hi,
>>>>
>>>> On 2023/6/21 18:00, Lucas Stach wrote:
>>>>>> dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
>>>>>> etnaviv_op_to_dma_dir(op));
>>>>>> etnaviv_obj->last_cpu_prep_op = op;
>>>>>> @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>>>>>> {
>>>>>> struct drm_device *dev = obj->dev;
>>>>>> struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>>>>> + struct etnaviv_drm_private *priv = dev->dev_private;
>>>>>>
>>>>>> - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>>>> + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>>>> /* fini without a prep is almost certainly a userspace error */
>>>>>> WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>>>>>> dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>>>> index 3524b5811682..754126992264 100644
>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>>>> @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
>>>>>> struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
>>>>>> struct dma_buf_attachment *attach, struct sg_table *sgt)
>>>>>> {
>>>>>> + struct etnaviv_drm_private *priv = dev->dev_private;
>>>>>> struct etnaviv_gem_object *etnaviv_obj;
>>>>>> size_t size = PAGE_ALIGN(attach->dmabuf->size);
>>>>>> + u32 cache_flags = ETNA_BO_WC;
>>>>>> int ret, npages;
>>>>>>
>>>>>> - ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
>>>>>> + if (priv->dma_coherent)
>>>>>> + cache_flags = ETNA_BO_CACHED;
>>>>>> +
>>>>> Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
>>>>> from WC to CACHED as necessary by adding something like this:
>>>> I understand you are a profession person in vivante GPU driver domain.
>>>>
>>>> I respect you reviews and instruction.
>>>>
>>>> But, I'm really reluctant to agree with this, is there any space to
>>>> negotiate?
>>>>
>>>>> /*
>>>>> * Upgrade WC to CACHED when the device is hardware coherent and the
>>>>> * platform doesn't allow mixing cached and writecombined mappings to
>>>>> * the same memory area.
>>>>> */
>>>>> if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
>>>>> dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
>>>>> flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;
>>>> This is policy, not a mechanism.
>>>>
>>>> Using what cache property is a user-space program's choice.
>>>>
>>>> While you are override the WC with CACHED mapping. This is not correct
>>>> in the concept!
>>>>
>>> Please explain why you think that this isn't correct.
>> Again,
>>
>> this is user-space things!
>>
>> this is user-space things!
>>
>> this is user-space things!
>>
>> I have explained several times.
>>
>> made the decision for the user-space program is wrong.
>>
> This mode of communication isn't helpful. Please stop it.
>
> As I tried to explain to you multiple times: if userspace can break
> coherency by selecting the wrong mapping type then this is something
> the kernel should prevent.

You are right overall.

This is the one benefit for which the WC mapping is preferred over the
cached mapping: as you already told me, the WC mapping doesn't *pollute
the CPU's cache*.

If we can make sure that a BO is *only* going to be used by the GPU,
then we can still choose WC as the cache property of this BO, since the
cache property is a CPU-side thing. (See the userspace sketch below.)

>>> It allows
>>> userspace to use WC mappings that would potentially cause loss of
>>> coherency between CPU and GPU, which isn't acceptable.
>> Before made the WC works correctly,  you need the developing environment.
>>
>> userspace program can tune the BO cache mapping easily.
>>
>> Either environment or supply a conf file.
>>
>>
>> While with your implement, we don't have the opportunity to do debugging
>> and the development.
> You can always add a patch to your local kernel to re-allow WC mappings
> while you work on making them behave as expected on your platform.


We are doing this work for *upstream*.


> With
> the mainline kernel there is no way that the kernel driver will allow
> broken coherency.


A buffer used solely by the GPU won't break coherency.


> And as I also mentioned before, there is a clear upgrade path here:
> once WC mappings work as expected on your platform we can easily drop
> the upgrading from the kernel driver again. The userspace driver can
> already be changed to use CACHED BOs where beneficial on your platform
> in the meantime.

For our platform, I think the problem is that the GPU always writes to
the shared L3 cache, even if you use the WC mapping.

As I already said, the WC mapping only influences the CPU side.

How can I control the GPU when a BO has a WC mapping? Does the GPU even
know whether a BO is WC-mapped or not? How can I tell the GPU not to
write to the shared L3 cache but to write to system RAM directly?

On our platform, the coherency between the CPU and a peripheral I/O
device's cache is maintained by hardware, and here that hardware is the
shared L3 cache.

I guess the current problem is that the GPU doesn't listen to me: it
still writes to the CPU's caches even when you choose the WC mapping.
Then, if you want to read the rendered image with the CPU, the CPU goes
the uncached path, and I think the CPU will get wrong data.

> Regards,
> Lucas

--
Jingfeng


2023-06-25 04:16:42

by Sui Jingfeng

[permalink] [raw]
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

On 2023/6/22 01:53, Lucas Stach wrote:
> Am Donnerstag, dem 22.06.2023 um 01:31 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/22 00:07, Lucas Stach wrote:
>>> And as the HW guarantees it on your platform, your platform
>>> implementation makes this function effectively a no-op. Skipping the
>>> call to this function is breaking the DMA API abstraction, as now the
>>> driver is second guessing the DMA API implementation. I really see no
>>> reason to do this.
>> It is the same reason you chose the word 'effectively', not 'difinitely'.
>>
>> We don't want waste the CPU's time,
>>
>>
>>  to running the dma_sync_sg_for_cpu funcion() function
>>
>>
>> ```
>>
>> void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>>             int nelems, enum dma_data_direction dir)
>> {
>>     const struct dma_map_ops *ops = get_dma_ops(dev);
>>
>>     BUG_ON(!valid_dma_direction(dir));
>>     if (dma_map_direct(dev, ops))
>>         dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
>>     else if (ops->sync_sg_for_cpu)
>>         ops->sync_sg_for_cpu(dev, sg, nelems, dir);
>>     debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
>> }
>>
>> ```
>>
>>
>>  to running the this:
>>
>>
>> ```
>>
>> int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>> {
>>     struct drm_device *dev = obj->dev;
>>     struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>     struct etnaviv_drm_private *priv = dev->dev_private;
>>
>>     if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>         /* fini without a prep is almost certainly a userspace error */
>>         WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>>         dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>> etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op));
>>         etnaviv_obj->last_cpu_prep_op = 0;
>>     }
>>
>>     return 0;
>> }
>>
>> ```
>>
> My judgment as the maintainer of this driver is that the small CPU
> overhead of calling this function is very well worth it, if the
> alternative is breaking the DMA API abstractions.
>
>> But, this is acceptable, because we can kill the GEM_CPU_PREP and
>> GEM_CPU_FINI ioctl entirely
>>
>> at userspace for cached buffer, as this is totally not needed for cached
>> mapping on our platform.
>>
> And that statement isn't true either.

Yes, you are right here, I admit.

I have suffered from such a problem in the past when developing
xf86-video-loongson. The root cause, I think, is that the CPU doesn't
know when the GPU has finished rendering, or that some data still
resides in the GPU's cache.

We have to call etna_bo_cpu_prep(etna_bo, DRM_ETNA_PREP_READ) to make
sure the data fetched by the CPU is up to date.

I realized this issue five months ago, see [1] for reference. I just
forgot about it during this debate with you.


[1]
https://gitlab.freedesktop.org/longxin2019/xf86-video-loongson/-/commit/95f9596eb19223c3109ea1f32c3e086fd1d43bd8



> The CPU_PREP/FINI ioctls also
> provide fence synchronization between CPU and GPU.

You are correct here.

> There are a few very
> specific cases where skipping those ioctls is acceptable (mostly when
> the userspace driver explicitly wants unsynchronized access), but in
> most cases they are required for correctness.

OK, you are extremely correct.
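
To make that concrete, a minimal userspace sketch of the pattern (using
the standard libdrm-etnaviv calls; the read_back() helper itself is made
up for illustration):

```
#include <string.h>
#include <etnaviv_drmif.h>	/* etna_bo_map(), etna_bo_cpu_prep(), etna_bo_cpu_fini() */

/* Wait for the GPU, then read a rendered buffer back with the CPU. */
static int read_back(struct etna_bo *bo, void *dst, size_t len)
{
	void *ptr = etna_bo_map(bo);
	int ret;

	if (!ptr)
		return -1;

	ret = etna_bo_cpu_prep(bo, DRM_ETNA_PREP_READ);	/* fence sync */
	if (ret)
		return ret;

	memcpy(dst, ptr, len);		/* data is now safe to read */
	etna_bo_cpu_fini(bo);		/* end of CPU access */

	return 0;
}
```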

> Regards,
> Lucas

--
Jingfeng


2023-06-26 11:23:42

by Lucas Stach

[permalink] [raw]
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Hi,

Am Sonntag, dem 25.06.2023 um 11:51 +0800 schrieb Sui Jingfeng:
> Hi,
>
> On 2023/6/22 01:45, Lucas Stach wrote:
> > Am Donnerstag, dem 22.06.2023 um 01:21 +0800 schrieb Sui Jingfeng:
> > > Hi,
> > >
> > > On 2023/6/21 23:58, Lucas Stach wrote:
> > > > Am Mittwoch, dem 21.06.2023 um 23:30 +0800 schrieb Sui Jingfeng:
> > > > > Hi,
> > > > >
> > > > > On 2023/6/21 18:00, Lucas Stach wrote:
> > > > > > > dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
> > > > > > > etnaviv_op_to_dma_dir(op));
> > > > > > > etnaviv_obj->last_cpu_prep_op = op;
> > > > > > > @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
> > > > > > > {
> > > > > > > struct drm_device *dev = obj->dev;
> > > > > > > struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> > > > > > > + struct etnaviv_drm_private *priv = dev->dev_private;
> > > > > > >
> > > > > > > - if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > > > > > + if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > > > > > /* fini without a prep is almost certainly a userspace error */
> > > > > > > WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
> > > > > > > dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
> > > > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > > > > > index 3524b5811682..754126992264 100644
> > > > > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > > > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > > > > > @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
> > > > > > > struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
> > > > > > > struct dma_buf_attachment *attach, struct sg_table *sgt)
> > > > > > > {
> > > > > > > + struct etnaviv_drm_private *priv = dev->dev_private;
> > > > > > > struct etnaviv_gem_object *etnaviv_obj;
> > > > > > > size_t size = PAGE_ALIGN(attach->dmabuf->size);
> > > > > > > + u32 cache_flags = ETNA_BO_WC;
> > > > > > > int ret, npages;
> > > > > > >
> > > > > > > - ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
> > > > > > > + if (priv->dma_coherent)
> > > > > > > + cache_flags = ETNA_BO_CACHED;
> > > > > > > +
> > > > > > Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
> > > > > > from WC to CACHED as necessary by adding something like this:
> > > > > I understand you are a profession person in vivante GPU driver domain.
> > > > >
> > > > > I respect you reviews and instruction.
> > > > >
> > > > > But, I'm really reluctant to agree with this, is there any space to
> > > > > negotiate?
> > > > >
> > > > > > /*
> > > > > > * Upgrade WC to CACHED when the device is hardware coherent and the
> > > > > > * platform doesn't allow mixing cached and writecombined mappings to
> > > > > > * the same memory area.
> > > > > > */
> > > > > > if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
> > > > > > dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
> > > > > > flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;
> > > > > This is policy, not a mechanism.
> > > > >
> > > > > Using what cache property is a user-space program's choice.
> > > > >
> > > > > While you are override the WC with CACHED mapping. This is not correct
> > > > > in the concept!
> > > > >
> > > > Please explain why you think that this isn't correct.
> > > Again,
> > >
> > > this is user-space things!
> > >
> > > this is user-space things!
> > >
> > > this is user-space things!
> > >
> > > I have explained several times.
> > >
> > > made the decision for the user-space program is wrong.
> > >
> > This mode of communication isn't helpful. Please stop it.
> >
> > As I tried to explain to you multiple times: if userspace can break
> > coherency by selecting the wrong mapping type then this is something
> > the kernel should prevent.
>
> You are right in overall.
>
>
> This is the only one benefit which WC mapping is preferred over the
> cached mapping.
>
> As you already told me,  the WC mapping don't *pollute CPU's cache*.
>
>
> If we can make sure that a BO is *only* going to be used by the GPU,
>
> then we still can choose WC mapping as the cache property of this BO.
>
> As the cache property is CPU side thing.
>
But if it only used by the GPU, then WC won't help you either as the
CPU is never touching the memory. The benefit of WC is that the CPU can
write around the cache, so if it uploads data to the GPU this data
won't pollute the cache. If the buffer is only used by the GPU, then
there is no risk of cache pollution through the CPU, right?

>
> > > > It allows
> > > > userspace to use WC mappings that would potentially cause loss of
> > > > coherency between CPU and GPU, which isn't acceptable.
> > > Before made the WC works correctly,  you need the developing environment.
> > >
> > > userspace program can tune the BO cache mapping easily.
> > >
> > > Either environment or supply a conf file.
> > >
> > >
> > > While with your implement, we don't have the opportunity to do debugging
> > > and the development.
> > You can always add a patch to your local kernel to re-allow WC mappings
> > while you work on making them behave as expected on your platform.
>
>
> We are doing the things about the *upstream*.
>
Upstream is not about doing experiments. You can always do those
experiments in your development kernel and if they prove to be
beneficial, we can think about how to integrate them with the upstream
kernel without the risk of inadvertently breaking things.

>
> > With
> > the mainline kernel there is no way that the kernel driver will allow
> > broken coherency.
>
>
> A buffer is used by the GPU solely won't break the coherency.
>
>
> > And as I also mentioned before, there is a clear upgrade path here:
> > once WC mappings work as expected on your platform we can easily drop
> > the upgrading from the kernel driver again. The userspace driver can
> > already be changed to use CACHED BOs where beneficial on your platform
> > in the meantime.
>
> For our platform, I think the problem is that the GPU always write to L3
> share cache,
>
> even you use the WC mapping.

As long as that is the case, I think forcing the CPU to go through the
same cache by upgrading the BOs to cached is the right thing to do.
>
> As I already said, the WC mapping only influence the CPU side.
>
> How can I control the GPU when a BO is WC mapping ?

Hm, newer GPUs allow using different AXI attributes (which include the
cacheability), controlled via a pagetable setting, but I'm not sure if
the GC1000 on your platform supports this.

Can you ask your hardware guys if there is a way to set the PCI "no-
snoop" transaction flag from the GPU and if that will cause the GPU
memory accesses to bypass the L3 cache?

>
> Does the GPU know that whether a BO  is WC mapping or not ?
>
> How can I let(or tell) the GPU that
>
> please not write to shared L3 cache,  write to the system RAM directly?
>
> On out platform, the coherency between the CPU and peripheral IO
> device's cache
>
> is maintained by hardware. While at here.  the hardware is the shared L3
> cache.
>
>
> I guess the current problem is that the GPU don't listen to me,
>
> he still write to CPU cache's even you choose the WC mapping.
>
> Then if you want to read the rendered image by CPU,
>
> the CPU go the un-cached path. Then I think the CPU will get wrong data.
>
Yes, that sounds plausible and I don't see much of a way around this
other than forcing the CPU to use the same path through the cache.

Regards,
Lucas