2005-01-06 17:18:12

by [email protected]

[permalink] [raw]
Subject: chasing the four level page table

The DRM driver contains this routine:

drivers/char/drm/drm_memory.h

static inline unsigned long
drm_follow_page (void *vaddr)
{
pgd_t *pgd = pgd_offset_k((unsigned long) vaddr);
pud_t *pud = pud_offset(pgd, (unsigned long) vaddr);
pmd_t *pmd = pmd_offset(pud, (unsigned long) vaddr);
pte_t *ptep = pte_offset_kernel(pmd, (unsigned long) vaddr);
return pte_pfn(*ptep) << PAGE_SHIFT;
}

No other driver needs to chase the page table like this so there is
probably some other way to achieve this. Can someone who knows more
about the VM system tell me if there is a way to eliminate this code?

If there are any VM/AGP experts with some free time, drm_memory.h
could use some rewriting to make it pass sparse checks.

--
Jon Smirl
[email protected]


2005-01-06 18:16:35

by Andi Kleen

[permalink] [raw]
Subject: Re: chasing the four level page table

Jon Smirl <[email protected]> writes:

> The DRM driver contains this routine:
>
> drivers/char/drm/drm_memory.h
>
> static inline unsigned long
> drm_follow_page (void *vaddr)
> {
> pgd_t *pgd = pgd_offset_k((unsigned long) vaddr);
> pud_t *pud = pud_offset(pgd, (unsigned long) vaddr);
> pmd_t *pmd = pmd_offset(pud, (unsigned long) vaddr);
> pte_t *ptep = pte_offset_kernel(pmd, (unsigned long) vaddr);
> return pte_pfn(*ptep) << PAGE_SHIFT;
> }
>
> No other driver needs to chase the page table like this so there is
> probably some other way to achieve this. Can someone who knows more
> about the VM system tell me if there is a way to eliminate this code?

Yes, you should use get_user_pages() instead if you access real memory.
If you try to find hardware mappings using that there is no ready
function for you right now, although I guess it could be added.

The function is also not quite correct, it should already least take
the page_table_lock (depending on where you call it from) and check
p*_none() on each level.

-Andi

2005-01-06 18:40:59

by [email protected]

[permalink] [raw]
Subject: Re: chasing the four level page table

On Thu, 06 Jan 2005 19:12:15 +0100, Andi Kleen <[email protected]> wrote:
> Yes, you should use get_user_pages() instead if you access real memory.
> If you try to find hardware mappings using that there is no ready
> function for you right now, although I guess it could be added.

drm_follow_page is used like this:

offset = drm_follow_page(pt) | ((unsigned long) pt & ~PAGE_MASK);
map = drm_lookup_map(offset, size, dev);
if (map && map->type == _DRM_AGP) {
vunmap(pt);
return;
}

I think pt is a user space address. In DRM AGP memory is mapped into
kernel and user space so the user space address is being converted
into a kernel space one. The kernel space one is used to verify that
the address is a valid mapping to AGP space, if so the page is
unmapped. I didn't write this code so I'm not 100% sure of what is
going on.

On Thu, 06 Jan 2005 19:12:15 +0100, Andi Kleen <[email protected]> wrote:
> Jon Smirl <[email protected]> writes:
>
> > The DRM driver contains this routine:
> >
> > drivers/char/drm/drm_memory.h
> >
> > static inline unsigned long
> > drm_follow_page (void *vaddr)
> > {
> > pgd_t *pgd = pgd_offset_k((unsigned long) vaddr);
> > pud_t *pud = pud_offset(pgd, (unsigned long) vaddr);
> > pmd_t *pmd = pmd_offset(pud, (unsigned long) vaddr);
> > pte_t *ptep = pte_offset_kernel(pmd, (unsigned long) vaddr);
> > return pte_pfn(*ptep) << PAGE_SHIFT;
> > }
> >
> > No other driver needs to chase the page table like this so there is
> > probably some other way to achieve this. Can someone who knows more
> > about the VM system tell me if there is a way to eliminate this code?
>
> Yes, you should use get_user_pages() instead if you access real memory.
> If you try to find hardware mappings using that there is no ready
> function for you right now, although I guess it could be added.
>
> The function is also not quite correct, it should already least take
> the page_table_lock (depending on where you call it from) and check
> p*_none() on each level.
>
> -Andi
>

--
Jon Smirl
[email protected]

2005-01-06 19:43:08

by Andi Kleen

[permalink] [raw]
Subject: Re: chasing the four level page table

On Thu, Jan 06, 2005 at 01:36:46PM -0500, Jon Smirl wrote:
> On Thu, 06 Jan 2005 19:12:15 +0100, Andi Kleen <[email protected]> wrote:
> > Yes, you should use get_user_pages() instead if you access real memory.
> > If you try to find hardware mappings using that there is no ready
> > function for you right now, although I guess it could be added.
>
> drm_follow_page is used like this:
>
> offset = drm_follow_page(pt) | ((unsigned long) pt & ~PAGE_MASK);
> map = drm_lookup_map(offset, size, dev);
> if (map && map->type == _DRM_AGP) {
> vunmap(pt);
> return;
> }
>
> I think pt is a user space address. In DRM AGP memory is mapped into
> kernel and user space so the user space address is being converted
> into a kernel space one. The kernel space one is used to verify that
> the address is a valid mapping to AGP space, if so the page is
> unmapped. I didn't write this code so I'm not 100% sure of what is
> going on.

You can't use get_user_pages in this case because the AGP aperture
can be above mem_map. If none of the callers take page_table_lock
already you would need to add that too. I guess from the context the lock
is not taken, but better double check.

Perhaps we should add a get_user_phys() or somesuch for this.

-Andi

2005-01-06 20:15:37

by [email protected]

[permalink] [raw]
Subject: Re: chasing the four level page table

On 6 Jan 2005 20:38:27 +0100, Andi Kleen <[email protected]> wrote:
> You can't use get_user_pages in this case because the AGP aperture
> can be above mem_map. If none of the callers take page_table_lock
> already you would need to add that too. I guess from the context the lock
> is not taken, but better double check.
>
> Perhaps we should add a get_user_phys() or somesuch for this.

No where in DRM is page_table_lock being taken. Also, no other device
driver takes page_table_lock either, so that probably implies that DRM
shouldn't start doing it to. Best solution would probably be add an mm
function for get_user_phys() that takes the lock internally. If you
add the function I'll convert DRM to use it.

--
Jon Smirl
[email protected]

2005-01-06 21:51:05

by Dave Jones

[permalink] [raw]
Subject: Re: chasing the four level page table

On Thu, Jan 06, 2005 at 03:05:49PM -0500, Jon Smirl wrote:
> On 6 Jan 2005 20:38:27 +0100, Andi Kleen <[email protected]> wrote:
> > You can't use get_user_pages in this case because the AGP aperture
> > can be above mem_map. If none of the callers take page_table_lock
> > already you would need to add that too. I guess from the context the lock
> > is not taken, but better double check.
> >
> > Perhaps we should add a get_user_phys() or somesuch for this.
>
> No where in DRM is page_table_lock being taken. Also, no other device
> driver takes page_table_lock either, so that probably implies that DRM
> shouldn't start doing it to.

No other device driver is also doing such lowlevel stuff with
page tables directly afaics. drivers/char/drm seem to be the only drivers
using [pgd|pmd|pte]_offset() routines.

Dave

2005-01-08 05:23:16

by [email protected]

[permalink] [raw]
Subject: Re: chasing the four level page table

On Thu, 6 Jan 2005 16:41:59 -0500, Dave Jones <[email protected]> wrote:
> No other device driver is also doing such lowlevel stuff with
> page tables directly afaics. drivers/char/drm seem to be the only drivers
> using [pgd|pmd|pte]_offset() routines.

On 6 Jan 2005 20:38:27 +0100, Andi Kleen <[email protected]> wrote:
> Perhaps we should add a get_user_phys() or somesuch for this.

I think this is a case where the memory manager is missing a function
that DRM needs. If there was a get_user_phys() function DRM wouldn't
need to walk the page tables.

--
Jon Smirl
[email protected]

2005-01-15 02:54:41

by H. Peter Anvin

[permalink] [raw]
Subject: Re: chasing the four level page table

Followup to: <[email protected]>
By author: Jon Smirl <[email protected]>
In newsgroup: linux.dev.kernel
>
> On Thu, 6 Jan 2005 16:41:59 -0500, Dave Jones <[email protected]> wrote:
> > No other device driver is also doing such lowlevel stuff with
> > page tables directly afaics. drivers/char/drm seem to be the only drivers
> > using [pgd|pmd|pte]_offset() routines.
>
> On 6 Jan 2005 20:38:27 +0100, Andi Kleen <[email protected]> wrote:
> > Perhaps we should add a get_user_phys() or somesuch for this.
>
> I think this is a case where the memory manager is missing a function
> that DRM needs. If there was a get_user_phys() function DRM wouldn't
> need to walk the page tables.
>

FWIW, the Nvidia device driver wrapper also has this issue.

There seems to be at least two classes of device drivers -- graphics
and RDMA -- which have a genuine need to DMA user pages, after
appropriate locking, of course.

At that point we're better off having the mm export the right
functionality to keep device driver authors from doing it wrong.

-hpa

2005-01-15 03:38:18

by Roland Dreier

[permalink] [raw]
Subject: Re: chasing the four level page table

ak> Perhaps we should add a get_user_phys() or somesuch for this.

hpa> There seems to be at least two classes of device drivers --
hpa> graphics and RDMA -- which have a genuine need to DMA user
hpa> pages, after appropriate locking, of course.

I'm working on InfiniBand drivers, which will be doing RDMA. I'm
probably missing something, but get_user_pages() seems to be all I
need -- I don't see how get_user_phys() would help me. Of course I
need to do dma_map_sg() or the like to get an address I can pass to
the InfiniBand device but I don't see anything wrong with that.

There are some issues around fork() and copy-on-write, but those
really require more access to vma handling than page tables. What
would be nice would be an equivalent to do_mlock() that also sets or
clears the VM_DONTCOPY flag. This is because an RDMA application
wants to do something like mlock() to keep any pages to be DMAed
present, but even after doing mlock(), if the application forks and
touches one of these locked pages, the COW will move the page to a new
physical address.

- Roland

2005-01-15 04:24:57

by Andi Kleen

[permalink] [raw]
Subject: Re: chasing the four level page table

[email protected] (H. Peter Anvin) writes:
>
> There seems to be at least two classes of device drivers -- graphics
> and RDMA -- which have a genuine need to DMA user pages, after
> appropriate locking, of course.

And block devices with O_DIRECT and network devices. We supported that
fine for years with get_user_pages.

The issue with DRM is just that it does something more different:

it wants to get at a AGP page outside get_user_pages doesn't work for
this because the AGP hole is often outside mem_map. For that a
nice helper is missing.

I'm not 100% we really want a helper because it's rather obscure
requirement, unlikely to be useful for others, and it may be better
to keep it in DRM.

-Andi

2005-01-15 05:59:21

by [email protected]

[permalink] [raw]
Subject: Re: chasing the four level page table

On Sat, 15 Jan 2005 05:24:54 +0100, Andi Kleen <[email protected]> wrote:
> it wants to get at a AGP page outside get_user_pages doesn't work for
> this because the AGP hole is often outside mem_map. For that a
> nice helper is missing.
>
> I'm not 100% we really want a helper because it's rather obscure
> requirement, unlikely to be useful for others, and it may be better
> to keep it in DRM.

Wouldn't it be better as a helper where the memory management people
maintain it? For example I only work on x86, are there cross platform
issues? Also, the DRM code is missing the page_table_lock around the
calls too.

--
Jon Smirl
[email protected]