2002-01-04 20:33:32

by Steffen Persvold

Subject: Short question about the mmap method

Hi lkml readers,

I have a question regarding drivers implementing the mmap and nopage methods. In some references
I've read that pages in kernel-allocated memory (whether allocated with kmalloc, vmalloc or
__get_free_pages) should be set to reserved (mem_map_reserve or set_bit(PG_reserved, &page->flags))
before they can be mmap'ed, to guarantee that they can't be swapped out. Is this true?

The reason I ask is that I have a test driver that allocates some pages with vmalloc(), reserves
them with mem_map_reserve(), and uses the "nopage" method to give a userspace app access to them.
When the userspace app accesses the page, the "nopage" function is invoked as expected and
page->count is incremented (by the nopage function). When the application exits, the kernel should
decrement page->count again, but it seems that it doesn't because the page is reserved. This
causes a giant memory leak when the driver is unloaded, since the pages are not put back on the
free list unless page->count drops to zero (I do mem_map_unreserve before vfree). If I avoid using
mem_map_reserve (and _unreserve), the page count is 1 at the time I unload the driver and everything
goes fine.
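
The reference counting described above can be illustrated with a small user-space model. This is
only a sketch, not kernel code: every toy_* name is hypothetical, and the rule that the unmap path
skips reserved pages is an assumption taken from the behaviour observed in this report.

```c
#include <stdbool.h>

/* Toy stand-in for the two fields of struct page that matter here. */
struct toy_page {
    int  count;     /* models page->count */
    bool reserved;  /* models PG_reserved in page->flags */
};

/* Driver allocates the page: the allocator holds one reference. */
void toy_alloc(struct toy_page *p, bool reserve)
{
    p->count = 1;
    p->reserved = reserve;
}

/* Driver's nopage handler: takes a reference for the new user mapping. */
void toy_nopage(struct toy_page *p)
{
    p->count++;
}

/* Process exit tears down the mapping.
 * ASSUMPTION: reserved pages are skipped, so their count is not dropped. */
void toy_unmap(struct toy_page *p)
{
    if (p->reserved)
        return;
    p->count--;
}

/* Driver unload: clear the reserved bit, then drop the allocator's
 * reference. Returns true if the page went back to the free list. */
bool toy_vfree(struct toy_page *p)
{
    p->reserved = false;
    p->count--;
    return p->count == 0;
}

/* Runs the whole life cycle once; returns true if the page leaks. */
bool toy_lifecycle_leaks(bool reserve)
{
    struct toy_page p;

    toy_alloc(&p, reserve);
    toy_nopage(&p);
    toy_unmap(&p);
    return !toy_vfree(&p);
}
```

In this model toy_lifecycle_leaks(true) reports a leak, because the reference taken in the nopage
handler is never dropped, while toy_lifecycle_leaks(false) does not.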

The kernel I'm using is 2.4.9-12 from Red Hat 7.2.

One more question:

When getting the "struct page" for pages allocated with vmalloc() I use the same method as
drivers/media/video/bttv-driver.c (walking the page table). Somewhere (I think Rubini) I read that
init_mm.page_table_lock should be held while walking the page table. Is this true, or can the
walk safely be done without the lock (bttv-driver.c doesn't take it)?
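
The lookup in question descends from the top-level table to the page-table entry and takes the
page from that entry. The sketch below models such a walk in user space with a two-level table.
All toy_* names are hypothetical, and the 10+10+12 bit address split is an assumption borrowed
from 32-bit x86 without PAE; the real driver code uses the kernel's own page-table accessors.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PTRS       1024   /* entries per table level */

/* Bottom level: one entry per page, holding a frame number. */
struct toy_pte_table {
    uint32_t pfn[TOY_PTRS];
    uint8_t  present[TOY_PTRS];
};

/* Top level: pointers to bottom-level tables, NULL if absent. */
struct toy_pgd {
    struct toy_pte_table *tables[TOY_PTRS];
};

/* Returns the page frame number backing vaddr, or -1 if unmapped. */
int64_t toy_vaddr_to_pfn(const struct toy_pgd *pgd, uint32_t vaddr)
{
    uint32_t top = vaddr >> 22;                               /* top 10 bits */
    uint32_t mid = (vaddr >> TOY_PAGE_SHIFT) & (TOY_PTRS - 1); /* next 10 bits */
    const struct toy_pte_table *pt = pgd->tables[top];

    if (pt == NULL)
        return -1;            /* no second-level table here */
    if (!pt->present[mid])
        return -1;            /* entry exists but page not present */
    return pt->pfn[mid];
}
```

Each level is checked before it is dereferenced, so a hole in the mapping yields -1 instead of a
bad pointer. The model says nothing about locking, which is exactly the open question here.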

Any ideas & comments are appreciated.

Thanks,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency


2002-01-04 21:00:24

by Tommy Reynolds

Subject: Re: Short question about the mmap method

Uttered "Steffen Persvold" <[email protected]>, spoke thus:

> Hi lkml readers,
>
> I have a question regarding drivers implementing the mmap and nopage methods.
> In some references I've read that pages in kernel allocated memory (either
> allocated with kmalloc, vmalloc or __get_free_pages) should be set to reserved
> (mem_map_reserve or set_bit(PG_reserved, &page->flags)) before they can be
> mmap'ed to guarantee that they can't be swapped out. Is this true?

[kv]malloc memory is _never_ subject to paging and can be mmap'ed with a
vengeance without resorting to mucking about with marking pages or the like.

You're working too hard ;-)

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- --
Tommy Reynolds | mailto: <[email protected]>
Red Hat, Inc., Embedded Development Services | Phone: +1.256.704.9286
307 Wynn Drive NW, Huntsville, AL 35805 USA | FAX: +1.256.837.3839
Senior Software Developer | Mobile: +1.919.641.2923



2002-01-05 01:15:26

by Steffen Persvold

Subject: Re: Short question about the mmap method

Tommy Reynolds wrote:
>
> Uttered "Steffen Persvold" <[email protected]>, spoke thus:
>
> > Hi lkml readers,
> >
> > I have a question regarding drivers implementing the mmap and nopage methods.
> > In some references I've read that pages in kernel allocated memory (either
> > allocated with kmalloc, vmalloc or __get_free_pages) should be set to reserved
> > (mem_map_reserve or set_bit(PG_reserved, &page->flags)) before they can be
> > mmap'ed to guarantee that they can't be swapped out. Is this true?
>
> [kv]malloc memory is _never_ subject to paging and can be mmap'ed with a
> vengeance without resorting to mucking about with marking pages or the like.
>
> You're working too hard ;-)
>

OK, thanks. But I found out that if you want to use remap_page_range on kmalloc'ed memory, you need
to set the reserved bit first. Without it, the mapping just doesn't work. When using the nopage
method, no reserving is necessary.


What about my question regarding taking the init_mm.page_table_lock spinlock before traversing the
page table (for vmalloc'ed memory)? Any ideas there?

Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency

2002-01-05 03:04:15

by Steffen Persvold

Subject: Re: Short question about the mmap method

Steffen Persvold wrote:
>
> Tommy Reynolds wrote:
> >
> > Uttered "Steffen Persvold" <[email protected]>, spoke thus:
> >
> > > Hi lkml readers,
> > >
> > > I have a question regarding drivers implementing the mmap and nopage methods.
> > > In some references I've read that pages in kernel allocated memory (either
> > > allocated with kmalloc, vmalloc or __get_free_pages) should be set to reserved
> > > (mem_map_reserve or set_bit(PG_reserved, &page->flags)) before they can be
> > > mmap'ed to guarantee that they can't be swapped out. Is this true?
> >
> > [kv]malloc memory is _never_ subject to paging and can be mmap'ed with a
> > vengeance without resorting to mucking about with marking pages or the like.
> >
> > You're working too hard ;-)
> >

Another thing: when allocating memory with vmalloc, how can I be sure that the pages I get are
addressable within 4GB (i.e. I want to call pci_map_sg on this buffer for my 32-bit PCI device
without having to use bounce buffers)? On systems with less than 4GB of physical memory there's no
problem, but what happens if you have more (let's say an IA64 server with 16GB of RAM) and don't
have an IOMMU (unlike alpha and sparc, which do)?

I noticed a vmalloc_32 in linux/vmalloc.h (the comment says "32bit PA addressable pages - eg for PCI
32bit devices"), but is it platform independent? I see that it uses only GFP_KERNEL, while
vmalloc uses GFP_KERNEL | __GFP_HIGHMEM. I guess the same issue applies to __get_free_pages too.
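
Whichever allocator is used, the property being asked for is easy to state: every byte of the
buffer must have a bus address that fits in 32 bits. A minimal user-space sketch of that check
follows. fits_dma32 is a hypothetical helper, not a kernel function, and it treats bus addresses
as equal to physical addresses, which assumes there is no IOMMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest address a 32-bit PCI device can generate. */
#define DMA32_LIMIT UINT64_C(0xFFFFFFFF)

/* True if the range [phys, phys + len) lies entirely at or below
 * DMA32_LIMIT. Hypothetical helper for illustration only. */
bool fits_dma32(uint64_t phys, uint64_t len)
{
    if (len == 0)
        return true;              /* an empty range touches no address */
    if (phys > DMA32_LIMIT)
        return false;             /* starts above the limit */
    /* Compare against the remaining room to avoid overflow in phys + len. */
    return len - 1 <= DMA32_LIMIT - phys;
}
```

Since a vmalloc'ed buffer is not physically contiguous, a check like this would have to be applied
to each page separately, for example to each entry of the scatterlist before mapping it.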

Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency

2002-01-05 04:07:04

by Steffen Persvold

Subject: Re: Short question about the mmap method

Steffen Persvold wrote:
>
> Another thing: when allocating memory with vmalloc, how can I be sure that the pages I get are
> addressable within 4GB (i.e. I want to call pci_map_sg on this buffer for my 32-bit PCI device
> without having to use bounce buffers)? On systems with less than 4GB of physical memory there's no
> problem, but what happens if you have more (let's say an IA64 server with 16GB of RAM) and don't
> have an IOMMU (unlike alpha and sparc, which do)?
>
> I noticed a vmalloc_32 in linux/vmalloc.h (the comment says "32bit PA addressable pages - eg for PCI
> 32bit devices"), but is it platform independent? I see that it uses only GFP_KERNEL, while
> vmalloc uses GFP_KERNEL | __GFP_HIGHMEM. I guess the same issue applies to __get_free_pages too.
>

Hmm, it helps to look back at old threads (which I was actually involved in).

Whatever happened to Jens Axboe's "zone_dma32" patch? Why wasn't it included in the main 2.4.6
kernel tree (it seemed like a good idea to me)?

Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency

2002-01-05 17:49:59

by Douglas Gilbert

Subject: Re: Short question about the mmap method

Steffen Persvold <[email protected]> wrote:

> I have a question regarding drivers implementing the
> mmap and nopage methods. In some references
> I've read that pages in kernel allocated memory (either
> allocated with kmalloc, vmalloc or __get_free_pages)
> should be set to reserved (mem_map_reserve or
> set_bit(PG_reserved, &page->flags)) before they can be
> mmap'ed to guarantee that they can't be swapped out.
> Is this true?

Steffen,
I recently implemented the mmap() call in the SCSI generic
(sg) driver. See that driver in lk 2.4.17 or go to
http://www.torque.net/sg
and download version 3.1.22 of the sg driver from the table.
It will run on any kernel in the 2.4 series. It should
answer most of your questions (at least about mmap-ing
memory obtained from __get_free_pages() which is a bit
tricky).

Doug Gilbert

2002-01-06 10:41:49

by Roman Zippel

Subject: Re: Short question about the mmap method

Hi,

Steffen Persvold wrote:

> I have a question regarding drivers implementing the mmap and nopage methods. In some references
> I've read that pages in kernel allocated memory (either allocated with kmalloc, vmalloc or
> __get_free_pages) should be set to reserved (mem_map_reserve or set_bit(PG_reserved, &page->flags))
> before they can be mmap'ed to guarantee that they can't be swapped out. Is this true?
>
> The reason I ask is that I have a test driver that allocates some pages with vmalloc(), reserves
> them with mem_map_reserve(), and uses the "nopage" method to give a userspace app access to them.
> When the userspace app accesses the page, the "nopage" function is invoked as expected and the
> page->count is incremented (by the nopage function).

You must do only one of the two: either set PG_reserved or increment the
page count, not both. If you increment the page count, the kernel will try
to free memory by removing the page from the process, and will bring it
back later by calling the nopage function again.
If you have the pages allocated all the time anyway, it's better to set
PG_reserved, but these pages can only be freed again when all files are
closed. You probably also want to set VM_RESERVED in vma->vm_flags, so
the kernel won't even look at these pages while scanning the process
for freeable memory.

> When getting the "struct page" for pages allocated with vmalloc() I use the same method as
> drivers/media/video/bttv-driver.c (checking the page table). Somewhere (I think Rubini) I read that
> the init_mm.page_table_lock should be held before checking the page table. Is this true, or can it
> safely be done without doing that (the bttv-driver.c doesn't)?

Currently this lock isn't needed, since no one will remove or change this
mapping until you call vfree().

bye, Roman