2014-06-27 11:22:42

by Alexandre Courbot

[permalink] [raw]
Subject: [PATCH v3 0/2] drm: nouveau: memory coherency for ARM

v2 was doing some pretty nasty things with the DMA API, so I took a different
approach for this v3.

As suggested, this version uses ttm_dma_populate() to populate BOs. The reason
for doing this was that it would entitle us to using the DMA sync functions,
but since the memory returned is already coherent anyway, we do not even
need to call these functions anymore.

So this series has turned into 2 small patches:

- The first attempts to make it more obvious that Nouveau can use different
ways to populate TTs, and make it possible to choose which method to use from
a single place.
- The second leverages this work to select the DMA allocator to populate TTs
on ARM.

Doing this solves all our coherency problems with Nouveau on Tegra, and
hopefully makes the code easier to read in the process.

Alexandre Courbot (2):
drm/nouveau: cleanup TTM population logic
drm/nouveau: use DMA TT population method on ARM

drivers/gpu/drm/nouveau/nouveau_bo.c | 63 ++++++++++++++++++-----------------
drivers/gpu/drm/nouveau/nouveau_drm.h | 11 ++++++
drivers/gpu/drm/nouveau/nouveau_ttm.c | 17 ++++++++++
3 files changed, 61 insertions(+), 30 deletions(-)

--
2.0.0


2014-06-27 11:22:48

by Alexandre Courbot

[permalink] [raw]
Subject: [PATCH v3 1/2] drm/nouveau: cleanup TTM population logic

TTM pages can currently be populated using 3 different methods, but the
code does not make it clear which one is used. The same complex
conditions are tested in various parts of the code, which makes it
difficult to follow and error prone.

This patch introduces an enum into the nouveau_drm struct that indicates
which population method will be used for TT-backed BOs. It is set once
and for all during TTM initialization so the same conditions need not be
tested again and again. This will also allow us to add new non-default
TTM population cases by changing a single piece of code in
nouveau_ttm_init() instead of all over nouveau_bo.c

Signed-off-by: Alexandre Courbot <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 63 ++++++++++++++++++-----------------
drivers/gpu/drm/nouveau/nouveau_drm.h | 11 ++++++
drivers/gpu/drm/nouveau/nouveau_ttm.c | 15 +++++++++
3 files changed, 59 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 4db886f9f793..82c6b479abcb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -28,7 +28,6 @@
*/

#include <core/engine.h>
-#include <linux/swiotlb.h>

#include <subdev/fb.h>
#include <subdev/vm.h>
@@ -467,11 +466,11 @@ static struct ttm_tt *
nouveau_ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size,
uint32_t page_flags, struct page *dummy_read)
{
-#if __OS_HAS_AGP
+#if defined(TTM_HAS_AGP)
struct nouveau_drm *drm = nouveau_bdev(bdev);
struct drm_device *dev = drm->dev;

- if (drm->agp.stat == ENABLED) {
+ if (drm->ttm.populate_method == AGP) {
return ttm_agp_tt_create(bdev, dev->agp->bridge, size,
page_flags, dummy_read);
}
@@ -524,21 +523,23 @@ nouveau_bo_init_mem_type(struct ttm_bo_device *bdev, uint32_t type,
if (nv_device(drm->device)->card_type >= NV_50)
man->func = &nouveau_gart_manager;
else
- if (drm->agp.stat != ENABLED)
+ if (drm->ttm.populate_method != AGP)
man->func = &nv04_gart_manager;
else
man->func = &ttm_bo_manager_func;

- if (drm->agp.stat == ENABLED) {
- man->flags = TTM_MEMTYPE_FLAG_MAPPABLE;
+ man->flags = TTM_MEMTYPE_FLAG_MAPPABLE;
+
+ if (drm->ttm.populate_method != AGP)
+ man->flags |= TTM_MEMTYPE_FLAG_CMA;
+
+ if (drm->ttm.populate_method == POOL) {
+ man->available_caching = TTM_PL_MASK_CACHING;
+ man->default_caching = TTM_PL_FLAG_CACHED;
+ } else {
man->available_caching = TTM_PL_FLAG_UNCACHED |
TTM_PL_FLAG_WC;
man->default_caching = TTM_PL_FLAG_WC;
- } else {
- man->flags = TTM_MEMTYPE_FLAG_MAPPABLE |
- TTM_MEMTYPE_FLAG_CMA;
- man->available_caching = TTM_PL_MASK_CACHING;
- man->default_caching = TTM_PL_FLAG_CACHED;
}

break;
@@ -1249,13 +1250,11 @@ nouveau_ttm_io_mem_reserve(struct ttm_bo_device *bdev, struct ttm_mem_reg *mem)
/* System memory */
return 0;
case TTM_PL_TT:
-#if __OS_HAS_AGP
- if (drm->agp.stat == ENABLED) {
+ if (drm->ttm.populate_method == AGP) {
mem->bus.offset = mem->start << PAGE_SHIFT;
mem->bus.base = drm->agp.base;
mem->bus.is_iomem = !dev->agp->cant_use_aperture;
}
-#endif
if (nv_device(drm->device)->card_type < NV_50 || !node->memtype)
/* untiled */
break;
@@ -1359,17 +1358,20 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm)
device = nv_device(drm->device);
dev = drm->dev;

-#if __OS_HAS_AGP
- if (drm->agp.stat == ENABLED) {
+ switch (drm->ttm.populate_method) {
+ case AGP:
+#if defined(TTM_HAS_AGP)
return ttm_agp_tt_populate(ttm);
- }
+#else
+ return 0;
#endif
+ case DMA:
+ return ttm_dma_populate(ttm_dma, dev->dev);

-#ifdef CONFIG_SWIOTLB
- if (swiotlb_nr_tbl()) {
- return ttm_dma_populate((void *)ttm, dev->dev);
+ case POOL:
+ /* POOL case is handled below */
+ break;
}
-#endif

r = ttm_pool_populate(ttm);
if (r) {
@@ -1409,19 +1411,20 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
device = nv_device(drm->device);
dev = drm->dev;

-#if __OS_HAS_AGP
- if (drm->agp.stat == ENABLED) {
+ switch (drm->ttm.populate_method) {
+ case AGP:
+#if defined(TTM_HAS_AGP)
ttm_agp_tt_unpopulate(ttm);
- return;
- }
#endif
-
-#ifdef CONFIG_SWIOTLB
- if (swiotlb_nr_tbl()) {
- ttm_dma_unpopulate((void *)ttm, dev->dev);
return;
+ case DMA:
+ ttm_dma_unpopulate(ttm_dma, dev->dev);
+ return;
+
+ case POOL:
+ /* POOL case is handled below */
+ break;
}
-#endif

for (i = 0; i < ttm->num_pages; i++) {
if (ttm_dma->dma_address[i]) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.h b/drivers/gpu/drm/nouveau/nouveau_drm.h
index 7efbafaf7c1d..0d4b9eff8863 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.h
@@ -103,6 +103,17 @@ struct nouveau_drm {
struct ttm_mem_reg *, struct ttm_mem_reg *);
struct nouveau_channel *chan;
int mtrr;
+ /*
+ * How TTM objects are populated:
+ * POOL: use ttm_pool_populate()
+ * AGP: use ttm_agp_tt_populate()
+ * DMA: use ttm_dma_populate()
+ */
+ enum {
+ POOL,
+ AGP,
+ DMA,
+ } populate_method;
} ttm;

/* GEM interface support */
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index ab0228f640a5..70893416d4b6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -24,6 +24,10 @@
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

+#include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>
+#include <linux/kconfig.h>
+
#include <subdev/fb.h>
#include <subdev/vm.h>
#include <subdev/instmem.h>
@@ -375,6 +379,17 @@ nouveau_ttm_init(struct nouveau_drm *drm)
DMA_BIT_MASK(32));
}

+ drm->ttm.populate_method = POOL;
+
+#ifdef CONFIG_SWIOTLB
+ if (swiotlb_nr_tbl())
+ drm->ttm.populate_method = DMA;
+#endif
+#if defined(TTM_HAS_AGP)
+ if (drm->agp.stat == ENABLED)
+ drm->ttm.populate_method = AGP;
+#endif
+
ret = nouveau_ttm_global_init(drm);
if (ret)
return ret;
--
2.0.0

2014-06-27 11:23:03

by Alexandre Courbot

[permalink] [raw]
Subject: [PATCH v3 2/2] drm/nouveau: use DMA TT population method on ARM

Cached memory accesses between the CPU and the GPU are not coherent on
ARM. Use the DMA TTM allocator on this architecture to obtain coherent
memory.

Signed-off-by: Alexandre Courbot <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 70893416d4b6..d77fc8a5167d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -385,6 +385,8 @@ nouveau_ttm_init(struct nouveau_drm *drm)
if (swiotlb_nr_tbl())
drm->ttm.populate_method = DMA;
#endif
+ if (IS_ENABLED(CONFIG_ARM))
+ drm->ttm.populate_method = DMA;
#if defined(TTM_HAS_AGP)
if (drm->agp.stat == ENABLED)
drm->ttm.populate_method = AGP;
--
2.0.0

2014-07-03 10:05:30

by Alexandre Courbot

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] drm: nouveau: memory coherency for ARM

While this series is more correct from the DMA API point of view, it
is also much heavier as it strictly disables the use of any cache on
all user-space mapped BOs, and is also much more restricted in terms
of which memory it can use.

I have a v4 in the works that lets us use TTM for user-mapped objects
and only falls back to using the DMA API for kernel-only objects
(fences & PBs basically). Will push it shortly but wanted to signal
this revision as deprecated in the meantime.

On Fri, Jun 27, 2014 at 8:22 PM, Alexandre Courbot <[email protected]> wrote:
> v2 was doing some pretty nasty things with the DMA API, so I took a different
> approach for this v3.
>
> As suggested, this version uses ttm_dma_populate() to populate BOs. The reason
> for doing this was that it would entitle us to using the DMA sync functions,
> but since the memory returned is already coherent anyway, we do not even
> need to call these functions anymore.
>
> So this series has turned into 2 small patches:
>
> - The first attempts to make it more obvious that Nouveau can use different
> ways to populate TTs, and make it possible to choose which method to use from
> a single place.
> - The second leverages this work to select the DMA allocator to populate TTs
> on ARM.
>
> Doing this solves all our coherency problems with Nouveau on Tegra, and
> hopefully makes the code easier to read in the process.
>
> Alexandre Courbot (2):
> drm/nouveau: cleanup TTM population logic
> drm/nouveau: use DMA TT population method on ARM
>
> drivers/gpu/drm/nouveau/nouveau_bo.c | 63 ++++++++++++++++++-----------------
> drivers/gpu/drm/nouveau/nouveau_drm.h | 11 ++++++
> drivers/gpu/drm/nouveau/nouveau_ttm.c | 17 ++++++++++
> 3 files changed, 61 insertions(+), 30 deletions(-)
>
> --
> 2.0.0
>