2006-10-31 09:19:13

by Guillermo Marcus

Subject: mmaping a kernel buffer to user space

Hi all,

I recently ran into the following situation while developing a PCI
driver. The driver allocates memory for a PCI device using
pci_alloc_consistent, as this memory is going to be used to perform DMA
transfers. To pass the data from/to the user application, I mmap the
buffer into userspace. However, if I try to use remap_pfn_range
(>=2.6.10) or the older remap_page_range (<=2.6.9) for mmaping, user
space ends up with a new buffer: these functions do not support mapping
RAM, so accesses to the VMA page-fault and new pages are allocated by
default. Therefore, I had to implement the nopage method and map one
page at a time as they fault.

From my point of view, however, this is unnecessary. The memory is
already allocated, the memory is locked because it is consistent, and
mapping the pages one by one may be a (very small) performance and
stability issue. Why can't I simply mmap it all at once? Am I missing
some function? More importantly, why can't remap_{pfn/page}_range
handle it?


Best wishes,
Guillermo Marcus

Note: I am using kernel 2.6.9 for these tests, as it is required by my
current setup. Maybe this issue has already been addressed in a newer
kernel. If that is the case, please let me know.
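
For reference, the user-space side of such a scheme is an ordinary mmap()
on the driver's file descriptor. What follows is only a minimal sketch: the
helper name map_dma_buffer is hypothetical, and exposing the buffer at file
offset 0 is an assumption, not something taken from the driver discussed
here.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of the driver's buffer, shared and
 * read/write. `fd` is an open descriptor for the device node.
 * Returns NULL if the mapping cannot be established. */
void *map_dma_buffer(int fd, size_t len)
{
	/* Offset 0 is an assumption about how the driver lays out its mappings. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

Whether the pages behind that mapping are populated up front or one fault
at a time is invisible to the application; it only changes what the driver
has to do in its mmap handler.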


2006-10-31 16:01:07

by Jiri Slaby

Subject: Re: mmaping a kernel buffer to user space

Guillermo Marcus wrote:
> Hi all,
>
> I recently ran into the following situation while developing a PCI
> driver. The driver allocates memory for a PCI device using
> pci_alloc_consistent, as this memory is going to be used to perform DMA
> transfers. To pass the data from/to the user application, I mmap the
> buffer into userspace. However, if I try to use remap_pfn_range
> (>=2.6.10) or the older remap_page_range (<=2.6.9) for mmaping, user
> space ends up with a new buffer: these functions do not support mapping
> RAM, so accesses to the VMA page-fault and new pages are allocated by
> default. Therefore, I had to implement the nopage method and map one
> page at a time as they fault.
>
> From my point of view, however, this is unnecessary. The memory is
> already allocated, the memory is locked because it is consistent, and
> mapping the pages one by one may be a (very small) performance and
> stability issue. Why can't I simply mmap it all at once? Am I missing
> some function? More importantly, why can't remap_{pfn/page}_range
> handle it?

Piece of code please. pci_alloc_consistent calls __get_free_pages, and there
should be no problem with mmaping this area.

regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

2006-10-31 16:15:20

by Rolf Offermanns

Subject: Re: mmaping a kernel buffer to user space

Guillermo Marcus wrote:
> I recently ran into the following situation while developing a PCI
> driver. The driver allocates memory for a PCI device using
> pci_alloc_consistent, as this memory is going to be used to perform DMA
> transfers. To pass the data from/to the user application, I mmap the
> buffer into userspace. However, if I try to use remap_pfn_range
> (>=2.6.10) or the older remap_page_range (<=2.6.9) for mmaping, user
> space ends up with a new buffer: these functions do not support mapping
> RAM, so accesses to the VMA page-fault and new pages are allocated by
> default. Therefore, I had to implement the nopage method and map one
> page at a time as they fault.
>
> From my point of view, however, this is unnecessary. The memory is
> already allocated, the memory is locked because it is consistent, and
> mapping the pages one by one may be a (very small) performance and
> stability issue. Why can't I simply mmap it all at once? Am I missing
> some function? More importantly, why can't remap_{pfn/page}_range
> handle it?
>
Here is what I did some time ago:

-> Reserve mem at boot time (mem=realmem-size_of_mem_you_need) / bigphysmem
-> I used the highmem allocator from the LDD2/3 examples to get a pointer
   to this reserved memory at runtime.
-> Use ioremap() to remap the memory to kernelspace
-> Do some magic (I don't remember the background, sorry) with the VMA
   flags in your mmap() function:

    vma->vm_flags |= VM_RESERVED;
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

and then do a remap_pfn_range() as usual.

HTH,
Rolf


2006-10-31 16:25:57

by Guillermo Marcus

Subject: Re: mmaping a kernel buffer to user space

Hi Jiri,

The fact that it does not work with RAM is well documented in LDD3,
pages 430 ff. It says (and I tested) that remap_xxx_range does not work
in this case. They suggest a method using nopage, similar to the one I
implemented.

I do not see why remap_xxx_range has the limitation, but it is there.
The question is then: can the limitation be removed, or can we implement
a new function that maps RAM all at once without the need for a nopage
implementation?

In any case, here is the code.

Best Wishes,
Guillermo


/*************************************************/

To allocate (inside an ioctl cmd):

...
    retptr = pci_alloc_consistent(privdata->pdev, kmem_handle->size,
                                  &(kmem_entry->dma_handle));
    if (retptr == NULL)
        goto kmem_alloc_mem_fail;
    kmem_entry->cpua = (unsigned long)retptr;
    kmem_entry->size = kmem_handle->size;
    kmem_entry->id = atomic_inc_return(&privdata->kmem_count) - 1;
...


To mmap (inside the mmap fops; this DOES NOT work):

...
    /* Map the buffer to the VMA */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,10)
    /* remap_pfn_range() takes a page frame number */
    ret = remap_pfn_range(
            vma,
            vma->vm_start,
            __pa(kmem_entry->cpua) >> PAGE_SHIFT,
            kmem_entry->size,
            vma->vm_page_prot);
#else
    /* remap_page_range() takes a physical address */
    ret = remap_page_range(
            vma,
            vma->vm_start,
            __pa(kmem_entry->cpua),
            kmem_entry->size,
            vma->vm_page_prot);
#endif
...


Forcing me to do this:

(in mmap):

...
    /* Set VM operations */
    vma->vm_private_data = kmem_entry;
    vma->vm_ops = &pcidriver_vm_operations;
    pcidriver_vm_open(vma);
...

(and the nopage vm_ops):

...
/* Maps physical memory to user space */
struct page *pcidriver_vm_nopage(struct vm_area_struct *vma,
                                 unsigned long address, int *type)
{
    pcidriver_kmem_entry_t *kmem_entry;
    struct page *page = NOPAGE_SIGBUS;
    unsigned long pfn_offset, pfn;

    /* Get private data for the page */
    kmem_entry = vma->vm_private_data;

    /* All checks were done during the mmap_kmem call, so we can safely
     * just map the offset of the vm area to the offset of the region,
     * which is guaranteed to be contiguous */

    /* pfn_offset is a byte offset here; the shift below yields the pfn */
    pfn_offset = (address) - (vma->vm_start);
    pfn = (__pa(kmem_entry->cpua) + pfn_offset) >> PAGE_SHIFT;

    if (!pfn_valid(pfn)) {
        mod_info("Invalid pfn in nopage() - 0x%lx \n", pfn);
        return NOPAGE_SIGBUS;
    }

    page = pfn_to_page(pfn);

    /* Take a reference on the page before handing it to the VM */
    get_page(page);
    if (type)
        *type = VM_FAULT_MINOR;

    return page;
}
...

/*************************************************/
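
The address arithmetic in the nopage handler above can be checked in
isolation. The following is a user-space model only, not kernel code:
PAGE_SHIFT is fixed at 12 (4 KiB pages) as an assumption, and fault_to_pfn
is a hypothetical name standing in for the two lines that compute pfn.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Model of the handler's calculation: the faulting address's byte offset
 * into the VMA is added to the buffer's physical base address, and the
 * sum is shifted down to a page frame number. */
uint64_t fault_to_pfn(uint64_t buf_phys, uint64_t vm_start, uint64_t address)
{
	uint64_t byte_offset = address - vm_start;

	return (buf_phys + byte_offset) >> PAGE_SHIFT;
}
```

Because the buffer is physically contiguous, a fault N pages into the VMA
resolves to the base pfn plus N, whatever the offset within that page.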




2006-10-31 16:33:35

by Guillermo Marcus

Subject: Re: mmaping a kernel buffer to user space

Hi Rolf,

Thanks for your comments. Unfortunately, given some of the platforms we
want to support, I cannot reserve memory at boot time, so I had to find
some other way. Besides, it is interesting in any case to see how to
present a buffer allocated using the DMA interfaces (pci_alloc or
dma_alloc) to user space.

All the best,
Guillermo

> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2006-10-31 16:49:48

by Jiri Slaby

Subject: Re: mmaping a kernel buffer to user space

Guillermo Marcus wrote:
> Hi Jiri,
>
> The fact that it does not work with RAM is well documented in LDD3,
> pages 430 ff. It says (and I tested) that remap_xxx_range does not work
> in this case. They suggest a method using nopage, similar to the one I
> implemented.

Could somebody confirm that this still holds?

> I do not see why remap_xxx_range has the limitation, but it is there.
> The question is then: can the limitation be removed, or can we implement
> a new function that maps RAM all at once without the need for a nopage
> implementation?
>
> In any case, here is the code.

Hmm, interesting. I used remap_pfn_range for this purpose today and it worked (I
double-checked this). I should probably do the rework :(.

regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

2006-10-31 17:08:20

by Franck Bui-Huu

Subject: Re: mmaping a kernel buffer to user space

Hi,

Jiri Slaby wrote:
> Guillermo Marcus wrote:
>> Hi Jiri,
>>
>> The fact that it does not work with RAM is well documented in LDD3,
>> pages 430 ff. It says (and I tested) that remap_xxx_range does not work
>> in this case. They suggest a method using nopage, similar to the one I
>> implemented.
>
> Could somebody confirm that this still holds?

Apparently this restriction was removed in 2.6.15, when the
VM_PFNMAP flag was introduced; see commit
6aab341e0a28aff100a09831c5300a2994b8b986

Why there was such a restriction before 2.6.15 I haven't investigated
yet, but any hints would be appreciated.

Thanks
Franck

2006-10-31 19:23:03

by Russell King

Subject: Re: mmaping a kernel buffer to user space

On Tue, Oct 31, 2006 at 05:00:59PM +0100, Jiri Slaby wrote:
> Piece of code please. pci_alloc_consistent calls __get_free_pages, and there
> should be no problem with mmaping this area.

That is an implementation detail which is not portable to other
architectures. Please don't encourage people to use non-portable
implementation details.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-10-31 19:37:00

by Jiri Slaby

Subject: Re: mmaping a kernel buffer to user space

Russell King wrote:
> On Tue, Oct 31, 2006 at 05:00:59PM +0100, Jiri Slaby wrote:
>> Piece of code please. pci_alloc_consistent calls __get_free_pages, and there
>> should be no problem with mmaping this area.
>
> That is an implementation detail which is not portable to other
> architectures. Please don't encourage people to use non-portable
> implementation details.

Sorry, I stand corrected, thanks,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

2006-10-31 19:44:21

by Miguel Ojeda

Subject: Re: mmaping a kernel buffer to user space

On 10/31/06, Jiri Slaby wrote:
> Guillermo Marcus wrote:
> > Hi Jiri,
> >
> > The fact that it does not work with RAM is well documented in LDD3,
> > pages 430 ff. It says (and I tested) that remap_xxx_range does not work
> > in this case. They suggest a method using nopage, similar to the one I
> > implemented.
>
> Could somebody confirm that this still holds?
>

Hum, I also tried it some days ago and it didn't work for me, so I
read LDD3 and found the explanation of this limitation of
remap_pfn_range(). I then heard that this changed in 2.6.15 because of
the new flag, so I have been in the same situation.

If it is possible to remap a kernel buffer to userspace with
remap_pfn_range, what is the right way to do it?

> > I do not see why remap_xxx_range has the limitation, but it is there.
> > The question is then: can the limitation be removed, or can we implement
> > a new function that maps RAM all at once without the need for a nopage
> > implementation?
> >
> > In any case, here is the code.
>
> Hmm, interesting. I used remap_pfn_range for this purpose today and it worked (I
> double-checked this). I should probably do the rework :(.
>
> regards,
>

2006-11-01 11:12:08

by Rolf Offermanns

Subject: Re: mmaping a kernel buffer to user space

Guillermo Marcus <marcus <at> ti.uni-mannheim.de> writes:
> Note: I am using kernel 2.6.9 for these tests, as it is required by my
> current setup. Maybe this issue has already been addressed in a newer
> kernel. If that is the case, please let me know.

Have a look at this article:

"The evolution of driver page remapping"
http://lwn.net/Articles/162860/

It should make things clearer.

The "API changes in the 2.6 kernel series" page is also a very good read:
http://lwn.net/Articles/2.6-kernel-api/

HTH,
Rolf

2006-11-01 12:55:33

by Guillermo Marcus

Subject: Re: mmaping a kernel buffer to user space

Rolf Offermanns wrote:
> Guillermo Marcus <marcus <at> ti.uni-mannheim.de> writes:
>> Note: I am using kernel 2.6.9 for these tests, as it is required by my
>> current setup. Maybe this issue has already been addressed in a newer
>> kernel. If that is the case, please let me know.
>
> Have a look at this article:
>
> "The evolution of driver page remapping"
> http://lwn.net/Articles/162860/
>
> It should make things clearer.
>
> The "API changes in the 2.6 kernel series" page is also a very good read:
> http://lwn.net/Articles/2.6-kernel-api/
>
> HTH,
> Rolf

Thanks for the links!

Yes, it looks like a step in the right direction. However, the article
says about vm_insert_page(): "...What it does require is that the page
be an order-zero allocation obtained for this purpose...", which makes
it unusable for this case as well (mmaping a pci_alloc_consistent
buffer).

I think the limitation (being order zero) is related to the page
counting, as I understand that for higher-order allocations only the
first page's counter is incremented (not every page's). If that were a
problem, I guess I would also see a problem with my workaround, and I
see none (yet). So I may try a newer kernel and see if I can use it
to walk the pages in mmap without using nopage().
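
To make "order zero" concrete: the page allocator hands out blocks of
2^order contiguous pages, so only a buffer that fits in a single page is an
order-zero allocation. The user-space sketch below mirrors the kind of
rounding the kernel's get_order() performs; the PAGE_SHIFT of 12 is an
assumption and size_to_order is a hypothetical name, not a kernel function.

```c
#include <stddef.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Smallest order such that (PAGE_SIZE << order) >= size.
 * Intended for realistic buffer sizes; a size of 0 yields 0, and very
 * large sizes (above SIZE_MAX / 2) are outside what this sketch handles. */
unsigned int size_to_order(size_t size)
{
	unsigned int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

On architectures where pci_alloc_consistent is backed by
__get_free_pages, as mentioned earlier in the thread, any DMA buffer larger
than one page would presumably come from an allocation of order 1 or more,
which is why the order-zero requirement matters here.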

My suggestion would be to add two functions, pci_map_consistent() and
dma_map_coherent(), to address this issue, along with their
corresponding unmaps. That would make sure all that is needed is done,
is clean and consistent with the pci_ and dma_ APIs, and fills an mmap
requirement not covered by the other functions.

Best wishes,
Guillermo

2006-11-01 14:00:28

by yogeshwar sonawane

Subject: Re: mmaping a kernel buffer to user space

On 11/1/06, Guillermo Marcus Martinez wrote:
> Rolf Offermanns wrote:
> > Guillermo Marcus <marcus <at> ti.uni-mannheim.de> writes:
> >> Note: I am using kernel 2.6.9 for these tests, as it is required by my
> >> current setup. Maybe this issue has already been addressed in a newer
> >> kernel. If that is the case, please let me know.
> >
> > Have a look at this article:
> >
> > "The evolution of driver page remapping"
> > http://lwn.net/Articles/162860/
> >
> > It should make things clearer.
> >
> > The "API changes in the 2.6 kernel series" page is also a very good read:
> > http://lwn.net/Articles/2.6-kernel-api/
> >
> > HTH,
> > Rolf
>
> Thanks for the links!
>
> Yes, it looks like a step in the right direction. However, the article
> says about vm_insert_page(): "...What it does require is that the page
> be an order-zero allocation obtained for this purpose...", which makes
> it unusable for this case as well (mmaping a pci_alloc_consistent
> buffer).
>
> I think the limitation (being order zero) is related to the page
> counting, as I understand that for higher-order allocations only the
> first page's counter is incremented (not every page's). If that were a
> problem, I guess I would also see a problem with my workaround, and I
> see none (yet). So I may try a newer kernel and see if I can use it
> to walk the pages in mmap without using nopage().

Setting the PG_reserved bit on all allocated pages and then calling
remap_page/pfn_range() will do the job for 2.6.9.
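
As a rough illustration of what "all allocated pages" means: the bit has to
be set once per page that the buffer covers, not once per buffer. The
user-space sketch below only counts and addresses those pages. In a 2.6.9
driver the per-page step would presumably be something like
SetPageReserved(virt_to_page(addr)), with a matching ClearPageReserved()
before the buffer is freed; that is an assumption, not code from this
thread. The PAGE_SHIFT of 12 and both helper names are likewise
assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */
#define PAGE_SIZE  ((uintptr_t)1 << PAGE_SHIFT)

/* Number of pages covered by a page-aligned buffer of `size` bytes. */
size_t buffer_page_count(size_t size)
{
	return (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Address of the i-th page of a buffer starting at page-aligned `start`. */
uintptr_t buffer_page_addr(uintptr_t start, size_t i)
{
	return start + i * PAGE_SIZE;
}
```

A driver would loop i from 0 to buffer_page_count(size) - 1 and apply the
per-page step to each address, once after allocating and once before
freeing.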

>
> My suggestion would be to add two functions, pci_map_consistent() and
> dma_map_coherent(), to address this issue, along with their
> corresponding unmaps. That would make sure all that is needed is done,
> is clean and consistent with the pci_ and dma_ APIs, and fills an mmap
> requirement not covered by the other functions.
>
> Best wishes,
> Guillermo
>

2006-11-01 15:03:46

by Guillermo Marcus

Subject: Re: mmaping a kernel buffer to user space

yogeshwar sonawane wrote:
> On 11/1/06, Guillermo Marcus Martinez wrote:
>> Rolf Offermanns wrote:
>> > Guillermo Marcus <marcus <at> ti.uni-mannheim.de> writes:
>> >> Note: I am using kernel 2.6.9 for these tests, as it is required by my
>> >> current setup. Maybe this issue has already been addressed in a newer
>> >> kernel. If that is the case, please let me know.
>> >
>> > Have a look at this article:
>> >
>> > "The evolution of driver page remapping"
>> > http://lwn.net/Articles/162860/
>> >
>> > It should make things clearer.
>> >
>> > The "API changes in the 2.6 kernel series" page is also a very good read:
>> > http://lwn.net/Articles/2.6-kernel-api/
>> >
>> > HTH,
>> > Rolf
>>
>> Thanks for the links!
>>
>> Yes, it looks like a step in the right direction. However, the article
>> says about vm_insert_page(): "...What it does require is that the page
>> be an order-zero allocation obtained for this purpose...", which makes
>> it unusable for this case as well (mmaping a pci_alloc_consistent
>> buffer).
>>
>> I think the limitation (being order zero) is related to the page
>> counting, as I understand that for higher-order allocations only the
>> first page's counter is incremented (not every page's). If that were a
>> problem, I guess I would also see a problem with my workaround, and I
>> see none (yet). So I may try a newer kernel and see if I can use it
>> to walk the pages in mmap without using nopage().
>
> Setting the PG_reserved bit on all allocated pages and then calling
> remap_page/pfn_range() will do the job for 2.6.9.
>

I will give it a try. I guess it may not be equivalent to setting
VM_RESERVED before calling remap_page/pfn_range(). Is this platform
specific, or is it the intended behavior/usage of remap_page/pfn_range()?


>>
>> My suggestion would be to add two functions, pci_map_consistent() and
>> dma_map_coherent(), to address this issue, along with their
>> corresponding unmaps. That would make sure all that is needed is done,
>> is clean and consistent with the pci_ and dma_ APIs, and fills an mmap
>> requirement not covered by the other functions.
>>
>> Best wishes,
>> Guillermo
>>


2006-11-02 08:31:18

by Russell King

Subject: Re: mmaping a kernel buffer to user space

On Wed, Nov 01, 2006 at 01:58:17PM +0100, Guillermo Marcus Martinez wrote:
> My suggestion would be to add two functions, pci_map_consistent() and
> dma_map_coherent(), to address this issue, along with their
> corresponding unmaps. That would make sure all that is needed is done,
> is clean and consistent with the pci_ and dma_ APIs, and fills an mmap
> requirement not covered by the other functions.

You might want to look through include/asm-arm/dma-mapping.h to see if
an architecture already has considered that and the interface they
implemented.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-11-02 11:32:22

by Guillermo Marcus

Subject: Re: mmaping a kernel buffer to user space



Russell King wrote:
> On Wed, Nov 01, 2006 at 01:58:17PM +0100, Guillermo Marcus Martinez wrote:
>> My suggestion would be to add two functions, pci_map_consistent() and
>> dma_map_coherent(), to address this issue, along with their
>> corresponding unmaps. That would make sure all that is needed is done,
>> is clean and consistent with the pci_ and dma_ APIs, and fills an mmap
>> requirement not covered by the other functions.
>
> You might want to look through include/asm-arm/dma-mapping.h to see if
> an architecture already has considered that and the interface they
> implemented.
>

Nice! Thanks. I think the issue of mapping a coherent area to user space
is fairly general. Shouldn't this be promoted to be part of the general
DMA API (that is, not a platform-specific function)?