LinuxLists.cc - [PATCH 1/2] drm/gem: don't force writecombine mmap'ing

2019-07-16 16:46:41

Subject: [PATCH 1/2] drm/gem: don't force writecombine mmap'ing

From: Rob Clark <[email protected]>

The driver should be in control of this.

Signed-off-by: Rob Clark <[email protected]>
---
It is possible that this was masking bugs (ie. not setting appropriate
pgprot) in drivers. I don't have a particularly good idea for tracking
those down (since I don't have the hw for most drivers). Unless someone
has a better idea, maybe land this and let driver maintainers fix any
potential fallout in their drivers?

This is necessary for the next patch to fix VGEM brokenness on arm.

drivers/gpu/drm/drm_gem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 8a55f71325b1..7d6242cc69f2 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1110,7 +1110,7 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,

vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
vma->vm_private_data = obj;
- vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);

/* Take a ref for this mapping of the object, so that the fault
--
2.21.0

2019-07-16 16:48:04

by Rob Clark

[permalink] [raw]

Subject: [PATCH 2/2] drm/vgem: use normal cached mmap'ings

From: Rob Clark <[email protected]>

Since there is no real device associated with vgem, it is impossible to
end up with appropriate dev->dma_ops, meaning that we have no way to
invalidate the shmem pages allocated by vgem. So, at least on platforms
without drm_cflush_pages(), we end up with corruption when cache lines
from previous usage of vgem bo pages get evicted to memory.

The only sane option is to use cached mappings.

Signed-off-by: Rob Clark <[email protected]>
---
Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
it is. And that doesn't really help for drivers that don't attach/
detach for each use.

But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
need to care too much about use of cached mmap'ings.

drivers/gpu/drm/vgem/vgem_drv.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index 11a8f99ba18c..ccf0c3fbd586 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -259,9 +259,6 @@ static int vgem_mmap(struct file *filp, struct vm_area_struct *vma)
if (ret)
return ret;

- /* Keep the WC mmaping set by drm_gem_mmap() but our pages
- * are ordinary and not special.
- */
vma->vm_flags = flags | VM_DONTEXPAND | VM_DONTDUMP;
return 0;
}
@@ -382,7 +379,7 @@ static void *vgem_prime_vmap(struct drm_gem_object *obj)
if (IS_ERR(pages))
return NULL;

- return vmap(pages, n_pages, 0, pgprot_writecombine(PAGE_KERNEL));
+ return vmap(pages, n_pages, 0, PAGE_KERNEL);
}

static void vgem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
@@ -411,7 +408,7 @@ static int vgem_prime_mmap(struct drm_gem_object *obj,
fput(vma->vm_file);
vma->vm_file = get_file(obj->filp);
vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
- vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);

return 0;
}
--
2.21.0

2019-07-16 17:00:53

by Chris Wilson

[permalink] [raw]

Subject: Re: [PATCH 2/2] drm/vgem: use normal cached mmap'ings

Quoting Rob Clark (2019-07-16 17:42:15)
> From: Rob Clark <[email protected]>
>
> Since there is no real device associated with vgem, it is impossible to
> end up with appropriate dev->dma_ops, meaning that we have no way to
> invalidate the shmem pages allocated by vgem. So, at least on platforms
> without drm_cflush_pages(), we end up with corruption when cache lines
> from previous usage of vgem bo pages get evicted to memory.
>
> The only sane option is to use cached mappings.
>
> Signed-off-by: Rob Clark <[email protected]>
> ---
> Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
> although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
> it is. And that doesn't really help for drivers that don't attach/
> detach for each use.
>
> But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
> need to care too much about use of cached mmap'ings.

Sadly this regresses with i915 interop.

Starting subtest: 4KiB-tiny-vgem-blt-early-read-child
(gem_concurrent_blit:8309) CRITICAL: Test assertion failure function dmabuf_cmp_bo, file ../tests/i915/gem_concurrent_all.c:408:
(gem_concurrent_blit:8309) CRITICAL: Failed assertion: v[((y)*(b->width) + (((y) + pass)%(b->width)))] == val
(gem_concurrent_blit:8309) CRITICAL: error: 0 != 0xdeadbeef

and igt/prime_vgem

Can you please cc intel-gfx so CI can pick up these changes?
-Chris

2019-07-16 17:05:15

by Rob Clark

[permalink] [raw]

Subject: Re: [PATCH 2/2] drm/vgem: use normal cached mmap'ings

On Tue, Jul 16, 2019 at 10:01 AM Chris Wilson <[email protected]> wrote:
>
> Quoting Rob Clark (2019-07-16 17:42:15)
> > From: Rob Clark <[email protected]>
> >
> > Since there is no real device associated with vgem, it is impossible to
> > end up with appropriate dev->dma_ops, meaning that we have no way to
> > invalidate the shmem pages allocated by vgem. So, at least on platforms
> > without drm_cflush_pages(), we end up with corruption when cache lines
> > from previous usage of vgem bo pages get evicted to memory.
> >
> > The only sane option is to use cached mappings.
> >
> > Signed-off-by: Rob Clark <[email protected]>
> > ---
> > Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
> > although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
> > it is. And that doesn't really help for drivers that don't attach/
> > detach for each use.
> >
> > But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
> > need to care too much about use of cached mmap'ings.
>
> Sadly this regresses with i915 interop.
>
> Starting subtest: 4KiB-tiny-vgem-blt-early-read-child
> (gem_concurrent_blit:8309) CRITICAL: Test assertion failure function dmabuf_cmp_bo, file ../tests/i915/gem_concurrent_all.c:408:
> (gem_concurrent_blit:8309) CRITICAL: Failed assertion: v[((y)*(b->width) + (((y) + pass)%(b->width)))] == val
> (gem_concurrent_blit:8309) CRITICAL: error: 0 != 0xdeadbeef
>
> and igt/prime_vgem
>
> Can you please cc intel-gfx so CI can pick up these changes?
> -Chris

I suppose CI is actually reading the imported VGEM bo from GPU? I can
try to wire up the attach/detach dma_sync, which might help..

BR,
-R

2019-07-19 09:12:13

by Daniel Vetter

[permalink] [raw]

Subject: Re: [PATCH 2/2] drm/vgem: use normal cached mmap'ings

On Tue, Jul 16, 2019 at 09:42:15AM -0700, Rob Clark wrote:
> From: Rob Clark <[email protected]>
>
> Since there is no real device associated with vgem, it is impossible to
> end up with appropriate dev->dma_ops, meaning that we have no way to
> invalidate the shmem pages allocated by vgem. So, at least on platforms
> without drm_cflush_pages(), we end up with corruption when cache lines
> from previous usage of vgem bo pages get evicted to memory.
>
> The only sane option is to use cached mappings.
>
> Signed-off-by: Rob Clark <[email protected]>
> ---
> Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
> although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
> it is. And that doesn't really help for drivers that don't attach/
> detach for each use.
>
> But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
> need to care too much about use of cached mmap'ings.

Isn't this going to horribly break testing buffer sharing with SoC
devices? I'd assume they all expect writecombining mode to make sure stuff
is coherent?

Also could we get away with this by simply extending drm_cflush_pages for
those arm platforms where we do have a clflush instruction? Yes I know
that'll get people screaming, I'll shrug :-)

If all we need patch 1/2 for is this vgem patch then the auditing needed for
patch 1 doesn't look appealing ...
-Daniel

>
> drivers/gpu/drm/vgem/vgem_drv.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
> index 11a8f99ba18c..ccf0c3fbd586 100644
> --- a/drivers/gpu/drm/vgem/vgem_drv.c
> +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> @@ -259,9 +259,6 @@ static int vgem_mmap(struct file *filp, struct vm_area_struct *vma)
> if (ret)
> return ret;
>
> - /* Keep the WC mmaping set by drm_gem_mmap() but our pages
> - * are ordinary and not special.
> - */
> vma->vm_flags = flags | VM_DONTEXPAND | VM_DONTDUMP;
> return 0;
> }
> @@ -382,7 +379,7 @@ static void *vgem_prime_vmap(struct drm_gem_object *obj)
> if (IS_ERR(pages))
> return NULL;
>
> - return vmap(pages, n_pages, 0, pgprot_writecombine(PAGE_KERNEL));
> + return vmap(pages, n_pages, 0, PAGE_KERNEL);
> }
>
> static void vgem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
> @@ -411,7 +408,7 @@ static int vgem_prime_mmap(struct drm_gem_object *obj,
> fput(vma->vm_file);
> vma->vm_file = get_file(obj->filp);
> vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
> - vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>
> return 0;
> }
> --
> 2.21.0
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2019-07-19 18:38:44

by Rob Clark

[permalink] [raw]

Subject: Re: [PATCH 2/2] drm/vgem: use normal cached mmap'ings

On Fri, Jul 19, 2019 at 2:09 AM Daniel Vetter <[email protected]> wrote:
>
> On Tue, Jul 16, 2019 at 09:42:15AM -0700, Rob Clark wrote:
> > From: Rob Clark <[email protected]>
> >
> > Since there is no real device associated with vgem, it is impossible to
> > end up with appropriate dev->dma_ops, meaning that we have no way to
> > invalidate the shmem pages allocated by vgem. So, at least on platforms
> > without drm_cflush_pages(), we end up with corruption when cache lines
> > from previous usage of vgem bo pages get evicted to memory.
> >
> > The only sane option is to use cached mappings.
> >
> > Signed-off-by: Rob Clark <[email protected]>
> > ---
> > Possibly we could dma_sync_*_for_{device,cpu}() on dmabuf attach/detach,
> > although the ->gem_prime_{pin,unpin}() API isn't quite ideal for that as
> > it is. And that doesn't really help for drivers that don't attach/
> > detach for each use.
> >
> > But AFAICT vgem is mainly used for dmabuf testing, so maybe we don't
> > need to care too much about use of cached mmap'ings.
>
> Isn't this going to horribly break testing buffer sharing with SoC
> devices? I'd assume they all expect writecombining mode to make sure stuff
> is coherent?
>
> Also could we get away with this by simply extending drm_cflush_pages for
> those arm platforms where we do have a clflush instruction? Yes I know
> that'll get people screaming, I'll shrug :-)
>
> If all we need patch 1/2 for is this vgem patch then the auditing needed for
> patch 1 doesn't look appealing ...

I think we should go w/ the simpler approach in that keeps WC (but
kinda relies on an implementation detail of dma-mapping, ie.
dev->dma_ops==NULL => dma_direct

IMO the first patch in this series is probably a thing we should try
to do somehow, it is a bit rude that core helpers are forcing WC. But
not sure about how to land that smoothly. Perhaps something worth
adding to the TODO list at any rate.

BR,
-R

> -Daniel
>
> >
> > drivers/gpu/drm/vgem/vgem_drv.c | 7 ++-----
> > 1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
> > index 11a8f99ba18c..ccf0c3fbd586 100644
> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> > @@ -259,9 +259,6 @@ static int vgem_mmap(struct file *filp, struct vm_area_struct *vma)
> > if (ret)
> > return ret;
> >
> > - /* Keep the WC mmaping set by drm_gem_mmap() but our pages
> > - * are ordinary and not special.
> > - */
> > vma->vm_flags = flags | VM_DONTEXPAND | VM_DONTDUMP;
> > return 0;
> > }
> > @@ -382,7 +379,7 @@ static void *vgem_prime_vmap(struct drm_gem_object *obj)
> > if (IS_ERR(pages))
> > return NULL;
> >
> > - return vmap(pages, n_pages, 0, pgprot_writecombine(PAGE_KERNEL));
> > + return vmap(pages, n_pages, 0, PAGE_KERNEL);
> > }
> >
> > static void vgem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
> > @@ -411,7 +408,7 @@ static int vgem_prime_mmap(struct drm_gem_object *obj,
> > fput(vma->vm_file);
> > vma->vm_file = get_file(obj->filp);
> > vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
> > - vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> > + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> >
> > return 0;
> > }
> > --
> > 2.21.0
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch