2009-07-24 07:57:32

by Thomas Hellstrom

[permalink] [raw]
Subject: [PATCH 1/2] x86: Export kmap_atomic_prot() needed for TTM.

This functionality is needed to kmap_atomic() highmem pages that may
potentially have or are about to set up other mappings with
non-standard caching attributes.

Signed-off-by: Thomas Hellstrom <[email protected]>
---
arch/x86/mm/highmem_32.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 58f621e..2112ed5 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -103,6 +103,7 @@ EXPORT_SYMBOL(kmap);
EXPORT_SYMBOL(kunmap);
EXPORT_SYMBOL(kmap_atomic);
EXPORT_SYMBOL(kunmap_atomic);
+EXPORT_SYMBOL(kmap_atomic_prot);

void __init set_highmem_pages_init(void)
{
--
1.6.1.3


2009-07-24 07:57:35

by Thomas Hellstrom

[permalink] [raw]
Subject: [PATCH 2/2] ttm: Fix ttm in-kernel copying of pages with non-standard caching attributes.

For x86 this affected highmem pages only, since they were always kmapped
cache-coherent, and this is fixed using kmap_atomic_prot().

For other architectures that may not modify the linear kernel map we
resort to vmap() for now, since kmap_atomic_prot() generally uses the
linear kernel map for lowmem pages. This of course comes with a
performance impact and should be optimized when possible.

Signed-off-by: Thomas Hellstrom <[email protected]>
---
drivers/gpu/drm/ttm/ttm_bo_util.c | 63 ++++++++++++++++++++++++++++++------
1 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 3e5d0c4..ce2e6f3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -136,7 +136,8 @@ static int ttm_copy_io_page(void *dst, void *src, unsigned long page)
}

static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src,
- unsigned long page)
+ unsigned long page,
+ pgprot_t prot)
{
struct page *d = ttm_tt_get_page(ttm, page);
void *dst;
@@ -145,17 +146,35 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src,
return -ENOMEM;

src = (void *)((unsigned long)src + (page << PAGE_SHIFT));
- dst = kmap(d);
+
+#ifdef CONFIG_X86
+ dst = kmap_atomic_prot(d, KM_USER0, prot);
+#else
+ if (prot != PAGE_KERNEL)
+ dst = vmap(&d, 1, 0, prot);
+ else
+ dst = kmap(d);
+#endif
if (!dst)
return -ENOMEM;

memcpy_fromio(dst, src, PAGE_SIZE);
- kunmap(d);
+
+#ifdef CONFIG_X86
+ kunmap_atomic(dst, KM_USER0);
+#else
+ if (prot != PAGE_KERNEL)
+ vunmap(dst);
+ else
+ kunmap(d);
+#endif
+
return 0;
}

static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void *dst,
- unsigned long page)
+ unsigned long page,
+ pgprot_t prot)
{
struct page *s = ttm_tt_get_page(ttm, page);
void *src;
@@ -164,12 +183,28 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void *dst,
return -ENOMEM;

dst = (void *)((unsigned long)dst + (page << PAGE_SHIFT));
- src = kmap(s);
+#ifdef CONFIG_X86
+ src = kmap_atomic_prot(s, KM_USER0, prot);
+#else
+ if (prot != PAGE_KERNEL)
+ src = vmap(&s, 1, 0, prot);
+ else
+ src = kmap(s);
+#endif
if (!src)
return -ENOMEM;

memcpy_toio(dst, src, PAGE_SIZE);
- kunmap(s);
+
+#ifdef CONFIG_X86
+ kunmap_atomic(src, KM_USER0);
+#else
+ if (prot != PAGE_KERNEL)
+ vunmap(src);
+ else
+ kunmap(s);
+#endif
+
return 0;
}

@@ -214,11 +249,17 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,

for (i = 0; i < new_mem->num_pages; ++i) {
page = i * dir + add;
- if (old_iomap == NULL)
- ret = ttm_copy_ttm_io_page(ttm, new_iomap, page);
- else if (new_iomap == NULL)
- ret = ttm_copy_io_ttm_page(ttm, old_iomap, page);
- else
+ if (old_iomap == NULL) {
+ pgprot_t prot = ttm_io_prot(old_mem->placement,
+ PAGE_KERNEL);
+ ret = ttm_copy_ttm_io_page(ttm, new_iomap, page,
+ prot);
+ } else if (new_iomap == NULL) {
+ pgprot_t prot = ttm_io_prot(new_mem->placement,
+ PAGE_KERNEL);
+ ret = ttm_copy_io_ttm_page(ttm, old_iomap, page,
+ prot);
+ } else
ret = ttm_copy_io_page(new_iomap, old_iomap, page);
if (ret)
goto out1;
--
1.6.1.3

2009-07-30 15:59:17

by Pekka Paalanen

[permalink] [raw]
Subject: Re: [PATCH 2/2] ttm: Fix ttm in-kernel copying of pages with non-standard caching attributes.

Hi,

since I see this patch in Linus' tree, and I likely have to patch
TTM in Nouveau's compat-branch to compile with older kernels,
I have a question below.

(The Nouveau kernel tree's compat branch offers drm.ko, ttm.ko and
nouveau.ko to be built against kernels 2.6.28 and later.)

On Fri, 24 Jul 2009 09:57:34 +0200
Thomas Hellstrom <[email protected]> wrote:

> For x86 this affected highmem pages only, since they were always kmapped
> cache-coherent, and this is fixed using kmap_atomic_prot().
>
> For other architectures that may not modify the linear kernel map we
> resort to vmap() for now, since kmap_atomic_prot() generally uses the
> linear kernel map for lowmem pages. This of course comes with a
> performance impact and should be optimized when possible.
>
> Signed-off-by: Thomas Hellstrom <[email protected]>
> ---
> drivers/gpu/drm/ttm/ttm_bo_util.c | 63 ++++++++++++++++++++++++++++++------
> 1 files changed, 52 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index 3e5d0c4..ce2e6f3 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -136,7 +136,8 @@ static int ttm_copy_io_page(void *dst, void *src, unsigned long page)
> }
>
> static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src,
> - unsigned long page)
> + unsigned long page,
> + pgprot_t prot)
> {
> struct page *d = ttm_tt_get_page(ttm, page);
> void *dst;
> @@ -145,17 +146,35 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src,
> return -ENOMEM;
>
> src = (void *)((unsigned long)src + (page << PAGE_SHIFT));
> - dst = kmap(d);
> +
> +#ifdef CONFIG_X86
> + dst = kmap_atomic_prot(d, KM_USER0, prot);
> +#else
> + if (prot != PAGE_KERNEL)
> + dst = vmap(&d, 1, 0, prot);
> + else
> + dst = kmap(d);
> +#endif

What are the implications of choosing the non-CONFIG_X86 path
even on x86?

Is kmap_atomic_prot() simply an optimization allowed by the x86
arch, and the alternate way also works, although it uses the
precious vmalloc address space?

Since kmap_atomic_prot() is not exported on earlier kernels,
I'm tempted to just do the non-CONFIG_X86 path.

> if (!dst)
> return -ENOMEM;
>
> memcpy_fromio(dst, src, PAGE_SIZE);
> - kunmap(d);
> +
> +#ifdef CONFIG_X86
> + kunmap_atomic(dst, KM_USER0);
> +#else
> + if (prot != PAGE_KERNEL)
> + vunmap(dst);
> + else
> + kunmap(d);
> +#endif
> +
> return 0;
> }
>
> static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void *dst,
> - unsigned long page)
> + unsigned long page,
> + pgprot_t prot)
> {
> struct page *s = ttm_tt_get_page(ttm, page);
> void *src;
> @@ -164,12 +183,28 @@ static int ttm_copy_ttm_io_page(struct ttm_tt *ttm, void *dst,
> return -ENOMEM;
>
> dst = (void *)((unsigned long)dst + (page << PAGE_SHIFT));
> - src = kmap(s);
> +#ifdef CONFIG_X86
> + src = kmap_atomic_prot(s, KM_USER0, prot);
> +#else
> + if (prot != PAGE_KERNEL)
> + src = vmap(&s, 1, 0, prot);
> + else
> + src = kmap(s);
> +#endif
> if (!src)
> return -ENOMEM;
>
> memcpy_toio(dst, src, PAGE_SIZE);
> - kunmap(s);
> +
> +#ifdef CONFIG_X86
> + kunmap_atomic(src, KM_USER0);
> +#else
> + if (prot != PAGE_KERNEL)
> + vunmap(src);
> + else
> + kunmap(s);
> +#endif
> +
> return 0;
> }
>
> @@ -214,11 +249,17 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
>
> for (i = 0; i < new_mem->num_pages; ++i) {
> page = i * dir + add;
> - if (old_iomap == NULL)
> - ret = ttm_copy_ttm_io_page(ttm, new_iomap, page);
> - else if (new_iomap == NULL)
> - ret = ttm_copy_io_ttm_page(ttm, old_iomap, page);
> - else
> + if (old_iomap == NULL) {
> + pgprot_t prot = ttm_io_prot(old_mem->placement,
> + PAGE_KERNEL);
> + ret = ttm_copy_ttm_io_page(ttm, new_iomap, page,
> + prot);
> + } else if (new_iomap == NULL) {
> + pgprot_t prot = ttm_io_prot(new_mem->placement,
> + PAGE_KERNEL);
> + ret = ttm_copy_io_ttm_page(ttm, old_iomap, page,
> + prot);
> + } else
> ret = ttm_copy_io_page(new_iomap, old_iomap, page);
> if (ret)
> goto out1;
> --
> 1.6.1.3


Thanks.

--
Pekka Paalanen
http://www.iki.fi/pq/

2009-07-31 09:00:15

by Thomas Hellstrom

[permalink] [raw]
Subject: Re: [PATCH 2/2] ttm: Fix ttm in-kernel copying of pages with non-standard caching attributes.

Pekka Paalanen wrote:
> Hi,
>
> since I see this patch in Linus' tree, and I likely have to patch
> TTM in Nouveau's compat-branch to compile with older kernels,
> I have a question below.
>
> (The Nouveau kernel tree's compat branch offers drm.ko, ttm.ko and
> nouveau.ko to be built against kernels 2.6.28 and later.)
>
> On Fri, 24 Jul 2009 09:57:34 +0200
> Thomas Hellstrom <[email protected]> wrote:
>
>
>> For x86 this affected highmem pages only, since they were always kmapped
>> cache-coherent, and this is fixed using kmap_atomic_prot().
>>
>> For other architectures that may not modify the linear kernel map we
>> resort to vmap() for now, since kmap_atomic_prot() generally uses the
>> linear kernel map for lowmem pages. This of course comes with a
>> performance impact and should be optimized when possible.
>>
>> Signed-off-by: Thomas Hellstrom <[email protected]>
>> ---
>> drivers/gpu/drm/ttm/ttm_bo_util.c | 63 ++++++++++++++++++++++++++++++------
>> 1 files changed, 52 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> index 3e5d0c4..ce2e6f3 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> @@ -136,7 +136,8 @@ static int ttm_copy_io_page(void *dst, void *src, unsigned long page)
>> }
>>
>> static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src,
>> - unsigned long page)
>> + unsigned long page,
>> + pgprot_t prot)
>> {
>> struct page *d = ttm_tt_get_page(ttm, page);
>> void *dst;
>> @@ -145,17 +146,35 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src,
>> return -ENOMEM;
>>
>> src = (void *)((unsigned long)src + (page << PAGE_SHIFT));
>> - dst = kmap(d);
>> +
>> +#ifdef CONFIG_X86
>> + dst = kmap_atomic_prot(d, KM_USER0, prot);
>> +#else
>> + if (prot != PAGE_KERNEL)
>> + dst = vmap(&d, 1, 0, prot);
>> + else
>> + dst = kmap(d);
>> +#endif
>>
>
> What are the implications of choosing the non-CONFIG_X86 path
> even on x86?
>

The only implication is a slowdown if dealing with highmem pages or
pages with
a non standard caching policy. Also you need the patch I just posted to
dri-devel / lkml to make it compile.
I should've done more thorough testing of the non-x86 path.

> Is kmap_atomic_prot() simply an optimization allowed by the x86
> arch, and the alternate way also works, although it uses the
> precious vmalloc address space?
>

Exactly, although it's only using one page out of vmalloc space and for
the time it
takes to copy a page to / from io.

> Since kmap_atomic_prot() is not exported on earlier kernels,
> I'm tempted to just do the non-CONFIG_X86 path.
>
For compat I think that should be fine. If your driver is using
accelerated copy to / from
VRAM, you shouldn't even hit this path.

/Thomas




2009-07-31 09:31:26

by Pekka Paalanen

[permalink] [raw]
Subject: Re: [PATCH 2/2] ttm: Fix ttm in-kernel copying of pages with non-standard caching attributes.

On Fri, 31 Jul 2009 10:59:57 +0200
Thomas Hellstr?m <[email protected]> wrote:

> Pekka Paalanen wrote:
> >> @@ -145,17 +146,35 @@ static int ttm_copy_io_ttm_page(struct ttm_tt *ttm, void *src,
> >> return -ENOMEM;
> >>
> >> src = (void *)((unsigned long)src + (page << PAGE_SHIFT));
> >> - dst = kmap(d);
> >> +
> >> +#ifdef CONFIG_X86
> >> + dst = kmap_atomic_prot(d, KM_USER0, prot);
> >> +#else
> >> + if (prot != PAGE_KERNEL)
> >> + dst = vmap(&d, 1, 0, prot);
> >> + else
> >> + dst = kmap(d);
> >> +#endif
> >>
> >
> > What are the implications of choosing the non-CONFIG_X86 path
> > even on x86?
> >
>
> The only implication is a slowdown if dealing with highmem pages or
> pages with
> a non standard caching policy. Also you need the patch I just posted to
> dri-devel / lkml to make it compile.
> I should've done more thorough testing of the non-x86 path.
>
> > Is kmap_atomic_prot() simply an optimization allowed by the x86
> > arch, and the alternate way also works, although it uses the
> > precious vmalloc address space?
> >
>
> Exactly, although it's only using one page out of vmalloc space and for
> the time it
> takes to copy a page to / from io.
>
> > Since kmap_atomic_prot() is not exported on earlier kernels,
> > I'm tempted to just do the non-CONFIG_X86 path.
> >
> For compat I think that should be fine. If your driver is using
> accelerated copy to / from
> VRAM, you shouldn't even hit this path.

Okay, thank you very much.

--
Pekka Paalanen
http://www.iki.fi/pq/