2001-10-01 11:33:41

by Bernd Harries

Subject: Re: __get_free_pages(): is the MEM really mine?

Ingo Molnar wrote:

> > Is there a guarantee that the n - 1 pages above the 1st one are not
> > donated to other programs while my driver uses them?
>
> yes. The 2MB block of 512 x 4k pages (we should perhaps call it a 'order 9
> page') is yours.

I think I have to demonstrate to you how my driver behaves in reality.

Too bad the driver does not currently allow any open() without at least
a PLX RDK Lite evaluation board... It would be possible to modify it to
allow opens even if there is no card. Or to allocate a 4 MB buffer for the
minor 31 device as well, which is my dummy test minor that needs no HW.

Of course you couldn't use the PLX DMA engine then. But you could still mmap
the RAM to user space.

An alternative to sending you a driver (which could make your box
temporarily unstable) is to let you use my Linux box at home. Damn, why
didn't I let you log in from Oldenburg... I forgot about that possibility.
I already took a PLX eval board home with me on Friday, because here I
already have the real RSC cards.

What do you think?


> > I'll move the code to init_module later once it is stable.
>
> even init_module() can be executed much later: eg. kmod removes the module
> because it's unused, and it's reinserted later. So generally it's really
> unrobust to expect a 9th order allocation to succeed at module_init()
> time.

For our application (a dedicated system) I could even guarantee that.

> the fundamental issue is not the lazyness of Linux VM developers. 99.9% of
> all allocations are order 0. 99.9% of the remaining allocations are order
> 1 or 2.

I wonder why only I see problems so far. Maybe it's because I also mmap()
that RAM to user space?



> (later on we could even add support to grow and shrink the size of the
> physical memory pool (within certain boundaries), so it could be sized
> boot-time.)
>
> would anything like this be useful? Since it's a completely separate pool
> (in fact it wont even show up in the normal memory statistics), it does
> not disturb the existing VM in any way.

It wouldn't even be needed at the moment. The order-9 get_free_pages() does
not explicitly fail. Not even during later open()s. If it did, I would
simply add more RAM. (well, let the company pay for it) 256 MB are in and
that is enough so far.

Later I will load the module explicitly right after boot and then it's
almost sure I will get the RAM.

Well, as I said, get_free_pages doesn't even fail! It just seems to allow
others to use the RAM before I free it again... Or it corrupts some kernel
structs during munmap(), which certainly decrements the usage counter of the
upper pages to 0 again.

For now I'll try to reproduce the instability without using any DMA hardware.

Thanks,

--
Bernd Harries

[email protected] http://bharries.freeyellow.com
[email protected] Tel. +49 421 809 7343 priv. | MSB First!
[email protected] +49 421 457 3966 offi. | Linux-m68k
[email protected] +49 172 139 6054 handy | Medusa T40



2001-10-05 12:53:49

by Hugh Dickins

Subject: Re: __get_free_pages(): is the MEM really mine?

On Mon, 1 Oct 2001, Bernd Harries wrote:
>
> I wonder why only I see problems so far. Maybe it's because I also mmap()
> that RAM to user space?

Probably.

munmap() will handle each order-0-page of your order-9
allocation separately. __get_free_pages gave you count 1 on the
first of those order-0-pages, leaving count 0 on the rest. I don't
know whether you're following the mmap-makes-all-pages-present
model (using remap_page_range), or the fault-page-by-page model
(supplying your own nopage function). But either way it sounds like
you bump each page count by 1 when you map it in, and then when it's
unmapped the count goes down to 0 on all the later order-0-pages,
so they get freed before you're done with them.
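
To make the counting concrete, here is a rough userspace model of that
sequence. It is not kernel code: struct toy_page and the toy_* helpers are
made-up names, and the model assumes, as described above, that the
allocation sets count 1 only on the first order-0-page and that each
map/unmap adds and drops one reference per page.

```c
#include <assert.h>
#include <stddef.h>

#define ORDER  9
#define NPAGES (1u << ORDER)   /* 512 order-0 pages in one order-9 block */

/* Toy stand-in for the per-page reference count (hypothetical type). */
struct toy_page { int count; };

/* Model of the allocation: only the first page starts with count 1. */
static void toy_alloc_block(struct toy_page *pages)
{
    for (size_t i = 0; i < NPAGES; i++)
        pages[i].count = (i == 0) ? 1 : 0;
}

/* Mapping a page into user space takes one reference. */
static void toy_map(struct toy_page *p)
{
    p->count++;
}

/* Unmapping drops it; returns 1 if the count reached 0 (page "freed"). */
static int toy_unmap(struct toy_page *p)
{
    return --p->count == 0;
}

/* Map then unmap every page; return how many pages munmap would free. */
static size_t toy_map_unmap_all(struct toy_page *pages)
{
    size_t freed = 0;

    for (size_t i = 0; i < NPAGES; i++)
        toy_map(&pages[i]);
    for (size_t i = 0; i < NPAGES; i++)
        freed += (size_t)toy_unmap(&pages[i]);
    return freed;
}
```

In this model, calling toy_alloc_block() and then toy_map_unmap_all() reports
every page except the first as freed, while the driver still believes it
owns the whole block.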

Either you should force page count 1 on each of the order-0-pages
before you mmap them in (and raise count to 2); or you should set
the Reserved bit on each of them, and clear it before freeing (see use
of mem_map_reserve and mem_map_unreserve in various drivers/sound
sources using remap_page_range; there's also a couple of examples
of the nopage method down there too).
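
Both options can be tried out in a small userspace model. This is only a
sketch, not kernel code: struct toy_page and the toy_* helpers are
hypothetical names, and the reserved flag is simplified to mean just "never
freed, whatever the count", which is all the model needs.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 512u   /* one order-9 block of order-0 pages */

/* Toy stand-in for a page: reference count plus a reserved flag. */
struct toy_page { int count; int reserved; };

/* Model of the allocation: count 1 on the first page, 0 on the rest. */
static void toy_alloc_block(struct toy_page *pages)
{
    for (size_t i = 0; i < NPAGES; i++) {
        pages[i].count = (i == 0) ? 1 : 0;
        pages[i].reserved = 0;
    }
}

/* Option 1: give every page its own base reference before mapping. */
static void toy_force_count_one(struct toy_page *pages)
{
    for (size_t i = 0; i < NPAGES; i++)
        pages[i].count = 1;
}

/* Option 2: set (on = 1) or clear (on = 0) the reserved flag on each page. */
static void toy_set_reserved(struct toy_page *pages, int on)
{
    for (size_t i = 0; i < NPAGES; i++)
        pages[i].reserved = on;
}

/*
 * Map then unmap every page. A page counts as freed only if its count
 * drops to 0 and it is not reserved (simplifying assumption of this model).
 */
static size_t toy_map_unmap_all(struct toy_page *pages)
{
    size_t freed = 0;

    for (size_t i = 0; i < NPAGES; i++)
        pages[i].count++;
    for (size_t i = 0; i < NPAGES; i++) {
        pages[i].count--;
        if (pages[i].count == 0 && !pages[i].reserved)
            freed++;
    }
    return freed;
}
```

With either toy_force_count_one() or toy_set_reserved(pages, 1) applied right
after toy_alloc_block(), the map/unmap cycle frees nothing in this model, so
the block is still intact when the driver finally releases it.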

Hugh

2001-10-05 13:32:22

by Bernd Harries

Subject: Re: __get_free_pages(): is the MEM really mine?

Hugh Dickins wrote:


> I don't
> know whether you're following the mmap-makes-all-pages-present
> model (using remap_page_range), or the fault-page-by-page model
> (supplying your own nopage function).

The nopage method. In Alessandro Rubini's book (p.391) I read that I can't use remap_page_range() on pages obtained by get_free_page().

> But either way it sounds like
> you bump each page count by 1 when you map it in, and then when
> it's unmapped the count goes down to 0 on all the later
> order-0-pages,

Exactly that happens in the version I use on minor 26 today.

> so they get freed before you're done with them.

Hmm, the only thing that happens to them after munmap() is
free_pages(). I don't access the pages anymore. But maybe some code in free_pages does? Or decrements count to -1?

> Either you should force page count 1 on each of the
> order-0-pages before you mmap them in

Yes, I do that in the version used in minor 27 today right after the allocation.

> (and raise count to 2);

by get_page(), right?

> or you should set
> the Reserved bit on each them, and clear it before freeing
> (see use of mem_map_reserve and mem_map_unreserve in various
> drivers/sound
> sources using remap_page_range; there's also a couple of
> examples of the nopage method down there too).

Ok, thanks a lot. So it's definitely insufficient how my minor 26 version handles the pages, right? If so, that's a statement I can live with.

And it was never meant that I could simply mmap the upper pages to userspace directly, without 'touching' each page, was it?

Ciao,
--
Bernd Harries

[email protected] http://bharries.freeyellow.com
[email protected] Tel. +49 421 809 7343 priv. | MSB First!
[email protected] +49 421 457 3966 offi. | Linux-m68k
[email protected] +49 172 139 6054 handy | Medusa T40

2001-10-05 15:25:57

by Hugh Dickins

Subject: Re: __get_free_pages(): is the MEM really mine?

On Fri, 5 Oct 2001, Bernd Harries wrote:
> Hugh Dickins wrote:
>
> > I don't
> > know whether you're following the mmap-makes-all-pages-present
> > model (using remap_page_range), or the fault-page-by-page model
> > (supplying your own nopage function).
>
> The nopage method. In Alessandro Rubini's book (p.391) I read, that
> I can't use remap_page_range() on pages optained by get_free_page().

I just looked that up. Rubini is right that remap_page_range only
works as you'd want on reserved pages, and pages which fail the
VALID_PAGE(page) test (I'm trying to avoid saying "invalid pages"),
and there is a good reason for that. But Rubini omits to mention
mem_map_reserve, which can be used (on pages you own exclusively)
to mark a page as temporarily reserved, so remap_page_range will
then work as you'd want on it (with mem_map_unreserve to undo later).

The mem_map_reserve, remap_page_range model is commoner in drivers
than the nopage model; but it is somewhat deprecated now, Linus for
one certainly preferring the nopage model; and the VM_RESERVED vma
flag can give pages that immunity from swap_out which mem_map_reserve
also confers. You're not wrong to follow the nopage model.

> Hmm, the only thing that happens to them after munmap() is
> free_pages(). I don't access the pages anymore. But maybe some code in free_pages does? Or decrements count to -1?

I've forgotten by now what your precise symptoms were. But either
pages would be freed twice and allocated twice; or they would hit a
BUG() statement in second free or second allocation; neither good.

> > Either you should force page count 1 on each of the
> > order-0-pages before you mmap them in (and raise count to 2);
>
> by get_page(), right?

Fine; and I expect you'll need to undo it later by appropriate put_page()s.

> Ok, thanks a lot. So it's definitely insufficient how my minor 26 version handles the pages, right? If so, that's a statement I can live with.
>
> And it was never ment that I could simply mmap the upper pages to userspace directly, without 'touching' each page, was it?

Probably all the drivers which use higher order allocations are using
the older, mem_map_reserve + remap_page_range method; the reserved
bit preserves a page against freeing whatever its page count. Maybe
you're the first to use the nopage method on a higher order allocation
(or maybe not, and there are already drivers working around it).

I wouldn't claim the way it currently works is ideal design: I think
you've hit a not entirely satisfactory but easily worked-around oddity.

Hugh