2000-10-30 15:58:19

by Richard B. Johnson

Subject: kmalloc() allocation.


Hello,
How much memory would it be reasonable for kmalloc() to be able
to allocate to a module?

Oct 30 10:48:31 chaos kernel: kmalloc: Size (524288) too large

Using Version 2.2.17, I can't allocate more than 64k! I need
to allocate at least 1/2 megabyte and preferably more (like 2 megabytes).

There are 256 megabytes of SDRAM available. I don't think it's
reasonable that a 1/2 megabyte allocation would fail, especially
since it's the first module being installed.

The attempt to allocate is memory of type GFP_KERNEL.


Any advice?

Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.



2000-10-30 16:06:22

by Tigran Aivazian

Subject: Re: kmalloc() allocation.

Hi Dick,

Sorry, I thought you knew this already :) The maximum for kmalloc is 128K
and is defined in mm/slab.c. It is trivial to "enhance" slab.c to support
more but it is in practice not very useful because requesting too much
physically-contiguous (which kmalloc is all about) memory is impossible
except at very early stages after boot (due to obvious fragmentation).

So, if you don't need physically contiguous (and fast) allocations perhaps
you could make use of vmalloc()/vfree() instead? There must be also some
"exotic" allocation APIs like bootmem but I know nothing of them so I stop
here.

Regards,
Tigran
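
A rough illustration of why the size matters, as a user-space sketch rather than kernel code: kmalloc() memory is physically contiguous, so a large request has to be satisfied from a run of adjacent free pages. Assuming 4 KiB pages (as on i386) and a hypothetical helper order_for_size(), the power-of-two page count needed for a given size could be computed like this:

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on i386. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Smallest order n such that (SKETCH_PAGE_SIZE << n) >= size.
 * A request of this order needs 2^n physically adjacent free pages.
 */
static unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

Under these assumptions the 524288-byte request from the log line is order 7, i.e. 128 adjacent pages, while a 128K request is order 5.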


On Mon, 30 Oct 2000, Richard B. Johnson wrote:

>
> Hello,
> How much memory would it be reasonable for kmalloc() to be able
> to allocate to a module?
>
> Oct 30 10:48:31 chaos kernel: kmalloc: Size (524288) too large
>
> Using Version 2.2.17, I can't allocate more than 64k! I need
> to allocate at least 1/2 megabyte and preferably more (like 2 megabytes).
>
> There are 256 megabytes of SDRAM available. I don't think it's
> reasonable that a 1/2 megabyte allocation would fail, especially
> since it's the first module being installed.
>
> The attempt to allocate is memory of type GFP_KERNEL.
>
>
> Any advice?
>
> Cheers,
> Dick Johnson
>
> Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).
>
> "Memory is like gasoline. You use it up when you are running. Of
> course you get it all back when you reboot..."; Actual explanation
> obtained from the Micro$oft help desk.
>
>

2000-10-30 16:06:52

by John Levon

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Richard B. Johnson wrote:

>
> Hello,
> How much memory would it be reasonable for kmalloc() to be able
> to allocate to a module?
>
> Oct 30 10:48:31 chaos kernel: kmalloc: Size (524288) too large
>
> Using Version 2.2.17, I can't allocate more than 64k! I need
> to allocate at least 1/2 megabyte and preferably more (like 2 megabytes).
>
> There are 256 megabytes of SDRAM available. I don't think it's
> reasonable that a 1/2 megabyte allocation would fail, especially
> since it's the first module being installed.
>
> The attempt to allocate is memory of type GFP_KERNEL.

Why do you need physically-contiguous memory ? Can you not just use
vmalloc()/vfree()

john

--
"It's not that the suggestions are not good ideas. That problem is that
committees cannot say no to good ideas, while the one thing that matters
above all in any design task is saying no to almost everything."
- Vern Schryver

2000-10-30 16:29:41

by Richard B. Johnson

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Tigran Aivazian wrote:

> Hi Dick,
>
> Sorry, I thought you knew this already :) The maximum for kmalloc is 128K
> and is defined in mm/slab.c. It is trivial to "enhance" slab.c to support
> more but it is in practice not very useful because requesting too much
> physically-contiguous (which kmalloc is all about) memory is impossible
> except at very early stages after boot (due to obvious fragmentation).
>
> So, if you don't need physically contiguous (and fast) allocations perhaps
> you could make use of vmalloc()/vfree() instead? There must be also some
> "exotic" allocation APIs like bootmem but I know nothing of them so I stop
> here.
>
> Regards,
> Tigran
>
>

Okay. Looks like I need a linked-list so I can use noncontiguous memory.



Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-10-30 16:38:23

by Tigran Aivazian

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Richard B. Johnson wrote:
> > So, if you don't need physically contiguous (and fast) allocations perhaps
> > you could make use of vmalloc()/vfree() instead? There must be also some
> > "exotic" allocation APIs like bootmem but I know nothing of them so I stop
> > here.
>
> Okay. Looks like I need a linked-list so I can use noncontiguous memory.

Just to remind, I was talking of physically and not just virtually
contiguous. vmalloc will still give you a virtually-contiguous chunk. But
if by "I need a linked-list" you mean that each node of the list may be
talking to some hardware but the hardware won't know about the whole list,
then you still need to use physically-contiguous allocator like
__get_free_pages() for each data node, i.e. if your hardware actually
needs physically contiguous chunk to talk to. Also, in this case, using
vmalloc() to allocate just the "linkage/admin overhead" is silly, just
using kmalloc or even creating a private slab object cache is probably a
better idea.

Regards,
Tigran
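
The layout described above (small bookkeeping nodes, each pointing at its own physically contiguous data block) might be sketched as follows. This is a user-space model only: malloc() stands in for both kmalloc() (the node) and __get_free_pages() (the data block), and the names buf_node, alloc_node_list and NODE_BYTES are made up for illustration.

```c
#include <stdlib.h>
#include <stddef.h>

#define NODE_BYTES 4096UL   /* assumption: one page per data node */

struct buf_node {
    struct buf_node *next;
    size_t len;             /* bytes of the transfer held in data */
    unsigned char *data;    /* stand-in for a __get_free_pages() block */
};

static void free_node_list(struct buf_node *head)
{
    while (head) {
        struct buf_node *next = head->next;

        free(head->data);
        free(head);
        head = next;
    }
}

/* Build a list covering total bytes; NULL on failure or if total is 0. */
static struct buf_node *alloc_node_list(size_t total)
{
    struct buf_node *head = NULL, **tail = &head;

    while (total > 0) {
        struct buf_node *n = malloc(sizeof(*n)); /* stand-in for kmalloc() */

        if (!n)
            goto fail;
        n->next = NULL;
        n->len = total < NODE_BYTES ? total : NODE_BYTES;
        n->data = malloc(NODE_BYTES);
        if (!n->data) {
            free(n);
            goto fail;
        }
        *tail = n;
        tail = &n->next;
        total -= n->len;
    }
    return head;
fail:
    free_node_list(head);
    return NULL;
}
```

Each node is only as large as one contiguous block, so no single allocation grows with the total transfer size.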

2000-10-30 16:41:03

by Rik van Riel

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Richard B. Johnson wrote:

> How much memory would it be reasonable for kmalloc() to be able
> to allocate to a module?

> There are 256 megabytes of SDRAM available. I don't think it's
> reasonable that a 1/2 megabyte allocation would fail, especially
> since it's the first module being installed.

If you write the defragmentation code for the VM, I'll
be happy to bump up the limit a bit ...

Until then, please be modest with the amount of physically
contiguous pages you try to allocate ;)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/

2000-10-30 16:42:03

by Richard B. Johnson

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, John Levon wrote:

> On Mon, 30 Oct 2000, Richard B. Johnson wrote:
>
> >
> > Hello,
> > How much memory would it be reasonable for kmalloc() to be able
> > to allocate to a module?
> >
> > Oct 30 10:48:31 chaos kernel: kmalloc: Size (524288) too large
> >
> > Using Version 2.2.17, I can't allocate more than 64k! I need
> > to allocate at least 1/2 megabyte and preferably more (like 2 megabytes).
> >
> > There are 256 megabytes of SDRAM available. I don't think it's
> > reasonable that a 1/2 megabyte allocation would fail, especially
> > since it's the first module being installed.
> >
> > The attempt to allocate is memory of type GFP_KERNEL.
>
> Why do you need physically-contiguous memory ? Can you not just use
> vmalloc()/vfree()
>

Well, maybe there is a better way, but the following must happen:
I need a non-paged buffer that has already been allocated, so it is
available during an interrupt.

I get an interrupt, at which time I have to copy up to 4 megabytes
from a memory-mapped PCI window into this RAM. These data represent
a 'snap-shot' of the output of an ADC during the past ~20 us (yes
it's fast). Once I have copied the data, I can then re-enable
the ADC from within the ISR, i.e., allow the image data to change.

The ISR then executes wake_up_interruptible() to notify a caller
sleeping in poll().

Once the caller is awakened, he read()s the device and the data
are copied, using copy_to_user(), into its buffers.

Now, I could set up a linked-list of buffers and use vmalloc()
if the buffers were allocated from non-paged RAM. I don't think
they are. These buffers must be present during an interrupt.

However, I could possibly use kmalloc() to initialize a linked-list
so they don't have to be contiguous.

Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-10-30 16:51:34

by Jeff Garzik

Subject: Re: kmalloc() allocation.

"Richard B. Johnson" wrote:
> Now, I could set up a linked-list of buffers and use vmalloc()
> if the buffers were allocated from non-paged RAM. I don't think
> they are. These buffers must be present during an interrupt.

Non-paged RAM? I'm not sure what you mean by that.

Both kmalloc and vmalloc allocate pages, but neither will allocate pages
that the system will swap out (page out). [vk]malloc pages are always
around during an interrupt.

Jeff




--
Jeff Garzik | "Mind if I drive?" -Sam
Building 1024 | "Not if you don't mind me clawing at the
MandrakeSoft | dash and shrieking like a cheerleader."
| -Max

2000-10-30 16:55:33

by Tigran Aivazian

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Jeff Garzik wrote:

> "Richard B. Johnson" wrote:
> > Now, I could set up a linked-list of buffers and use vmalloc()
> > if the buffers were allocated from non-paged RAM. I don't think
> > they are. These buffers must be present during an interrupt.
>
> Non-paged RAM? I'm not sure what you mean by that.
>
> Both kmalloc and vmalloc allocate pages, but neither will allocate pages
> that the system will swap out (page out). [vk]malloc pages are always
> around during an interrupt.
>

Jeff, I was going to tell him that but in the previous sentence he was
talking about userspace supplied buffers and those are certainly not
pinned.

Regards,
Tigran

2000-10-30 17:10:45

by Jeff Garzik

Subject: Re: kmalloc() allocation.

Tigran Aivazian wrote:
>
> On Mon, 30 Oct 2000, Jeff Garzik wrote:
>
> > "Richard B. Johnson" wrote:
> > > Now, I could set up a linked-list of buffers and use vmalloc()
> > > if the buffers were allocated from non-paged RAM. I don't think
> > > they are. These buffers must be present during an interrupt.
> >
> > Non-paged RAM? I'm not sure what you mean by that.
> >
> > Both kmalloc and vmalloc allocate pages, but neither will allocate pages
> > that the system will swap out (page out). [vk]malloc pages are always
> > around during an interrupt.

> Jeff, I was going to tell him that but in the previous sentence he was
> talking about userspace supplied buffers and those are certainly not
> pinned.

Well the problem sounds really strange then. Why are kmalloc/vmalloc
being talked about at all, if we are dealing with userspace-supplied
buffers?

IF copy_to_user is being used here, userspace buffers in kernel space
are pointless. Any userspace buffer supplied to read(2) must by design
be a different buffer than the 2nd arg of copy_to_user. If copy_to_user
is being used, then there is a "copy" taking place...

Richard, if you want to read directly into userspace buffers, kiobufs
are the way to go... If you don't want to ever swap them in and out,
you can mlock(2) them. Or simply allocate the memory in the driver, and
mmap those buffers. Much easier than read(2), and it eliminates any
copy step.

Jeff



--
Jeff Garzik | "Mind if I drive?" -Sam
Building 1024 | "Not if you don't mind me clawing at the
MandrakeSoft | dash and shrieking like a cheerleader."
| -Max

2000-10-30 17:59:20

by Richard B. Johnson

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Tigran Aivazian wrote:

> On Mon, 30 Oct 2000, Richard B. Johnson wrote:
> > > So, if you don't need physically contiguous (and fast) allocations perhaps
> > > you could make use of vmalloc()/vfree() instead? There must be also some
> > > "exotic" allocation APIs like bootmem but I know nothing of them so I stop
> > > here.
> >
> > Okay. Looks like I need a linked-list so I can use noncontiguous memory.
>
> Just to remind, I was talking of physically and not just virtually
> contiguous. vmalloc will still give you a virtually-contiguous chunk. But
> if by "I need a linked-list" you mean that each node of the list may be
> talking to some hardware but the hardware won't know about the whole list,
> then you still need to use physically-contiguous allocator like
> __get_free_pages() for each data node, i.e. if your hardware actually
> needs physically contiguous chunk to talk to. Also, in this case, using
> vmalloc() to allocate just the "linkage/admin overhead" is silly, just
> using kmalloc or even creating a private slab object cache is probably a
> better idea.
>
> Regards,
> Tigran
>

If I can only get 128k bytes of RAM that is still present during
an interrupt, because of a kmalloc() limitation, then I need to
allocate multiple buffers and keep their pointers in a list, right?

It doesn't actually have to be a linked list, the buffers are
never deallocated until the module is removed.

char *ram_128k[16];

16 buffers with 128k in each.

In the interrupt, I just write to the previously-allocated 16 buffers.
In the read(), I just read from them.


Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.
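
The fixed table of sixteen 128k buffers described in the message above could be driven by one helper that maps a linear offset to a (buffer, offset) pair. The following is a user-space sketch under stated assumptions: calloc() stands in for kmalloc(), plain memcpy() stands in for both the copy from the PCI window and copy_to_user(), and chunks_init(), chunks_free() and chunks_copy() are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (128UL * 1024)   /* per-buffer size from the message */
#define NUM_CHUNKS 16               /* 16 x 128k = 2 megabytes */

static char *ram_128k[NUM_CHUNKS];

/* Allocate every chunk once, up front. Returns 0 on success, -1 on failure. */
static int chunks_init(void)
{
    for (int i = 0; i < NUM_CHUNKS; i++) {
        ram_128k[i] = calloc(1, CHUNK_SIZE);    /* stand-in for kmalloc() */
        if (!ram_128k[i]) {
            while (i-- > 0) {
                free(ram_128k[i]);
                ram_128k[i] = NULL;
            }
            return -1;
        }
    }
    return 0;
}

static void chunks_free(void)
{
    for (int i = 0; i < NUM_CHUNKS; i++) {
        free(ram_128k[i]);
        ram_128k[i] = NULL;
    }
}

/*
 * Copy len bytes between a flat buffer and the chunk table, starting at
 * linear offset pos. to_chunks != 0 stores into the chunks (the interrupt
 * side); to_chunks == 0 loads from them (the read() side).
 * Returns the number of bytes copied, clipped at the end of the table.
 */
static size_t chunks_copy(char *flat, size_t pos, size_t len, int to_chunks)
{
    const size_t total = CHUNK_SIZE * NUM_CHUNKS;
    size_t done = 0;

    if (pos >= total)
        return 0;
    if (len > total - pos)
        len = total - pos;

    while (done < len) {
        size_t idx = (pos + done) / CHUNK_SIZE;
        size_t off = (pos + done) % CHUNK_SIZE;
        size_t n = CHUNK_SIZE - off;

        if (n > len - done)
            n = len - done;
        if (to_chunks)
            memcpy(ram_128k[idx] + off, flat + done, n);
        else
            memcpy(flat + done, ram_128k[idx] + off, n);
        done += n;
    }
    return done;
}
```

Because the table is filled once at init time, the copy path itself never allocates, which is the property the interrupt handler needs.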


2000-10-30 18:07:34

by Richard B. Johnson

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Jeff Garzik wrote:

> "Richard B. Johnson" wrote:
> > Now, I could set up a linked-list of buffers and use vmalloc()
> > if the buffers were allocated from non-paged RAM. I don't think
> > they are. These buffers must be present during an interrupt.
>
> Non-paged RAM? I'm not sure what you mean by that.
>
> Both kmalloc and vmalloc allocate pages, but neither will allocate pages
> that the system will swap out (page out). [vk]malloc pages are always
> around during an interrupt.
>
> Jeff

Hmm, vmalloc() doesn't seem to have the size limitation. Are you sure
that it's present during an interrupt? I can't page-fault during the
interrupt.



Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-10-30 18:08:44

by Richard B. Johnson

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Jeff Garzik wrote:

> Tigran Aivazian wrote:
> >
> > On Mon, 30 Oct 2000, Jeff Garzik wrote:
> >
> > > "Richard B. Johnson" wrote:
> > > > Now, I could set up a linked-list of buffers and use vmalloc()
> > > > if the buffers were allocated from non-paged RAM. I don't think
> > > > they are. These buffers must be present during an interrupt.
> > >
> > > Non-paged RAM? I'm not sure what you mean by that.
> > >
> > > Both kmalloc and vmalloc allocate pages, but neither will allocate pages
> > > that the system will swap out (page out). [vk]malloc pages are always
> > > around during an interrupt.
>
> > Jeff, I was going to tell him that but in the previous sentence he was
> > talking about userspace supplied buffers and those are certainly not
> > pinned.
>
> Well the problem sounds really strange then. Why are kmalloc/vmalloc
> being talked about at all, if we are dealing with userspace-supplied
> buffers?
>

No, not user-supplied buffers. Just a fixed kernel allocation large
enough to take an entire image which can be up to and including 2
megabytes.



Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-10-30 18:12:24

by Jeff Garzik

Subject: Re: kmalloc() allocation.

"Richard B. Johnson" wrote:
>
> On Mon, 30 Oct 2000, Jeff Garzik wrote:
>
> > "Richard B. Johnson" wrote:
> > > Now, I could set up a linked-list of buffers and use vmalloc()
> > > if the buffers were allocated from non-paged RAM. I don't think
> > > they are. These buffers must be present during an interrupt.
> >
> > Non-paged RAM? I'm not sure what you mean by that.
> >
> > Both kmalloc and vmalloc allocate pages, but neither will allocate pages
> > that the system will swap out (page out). [vk]malloc pages are always
> > around during an interrupt.
> >
> > Jeff
>
> Hmm, vmalloc() doesn't seem to have the size limitation. Are you sure
> that it's present during an interrupt? I can't page-fault during the
> interrupt.

vmalloc'd memory does have a size limitation, though it's larger than
kmalloc's limit. AFAIK vmalloc'd memory is a collection of pages
remapped in the page tables to be virtually contiguous, implying that
it is present during an interrupt.

--
Jeff Garzik | "Mind if I drive?" -Sam
Building 1024 | "Not if you don't mind me clawing at the
MandrakeSoft | dash and shrieking like a cheerleader."
| -Max
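
The "collection of pages remapped to be virtually contiguous" idea can be shown with a toy translation table. This is only a model, not how the kernel stores its page tables: the struct toy_vm_area, the 4 KiB page size and the frame numbers a caller supplies are all assumptions for illustration.

```c
#include <stddef.h>

#define PG_SHIFT 12                 /* assumption: 4 KiB pages */
#define PG_SIZE  (1UL << PG_SHIFT)

/* A toy "vmalloc area": page i of the virtual range maps to frame[i]. */
struct toy_vm_area {
    unsigned long virt_base;        /* page-aligned start of the range */
    size_t nr_pages;
    const unsigned long *frame;     /* physical frame number per page */
};

/*
 * Translate a virtual address inside the area to a physical address.
 * Returns 0 on success, -1 if addr lies outside the area.
 */
static int toy_virt_to_phys(const struct toy_vm_area *a,
                            unsigned long addr, unsigned long *phys)
{
    unsigned long off;

    if (addr < a->virt_base)
        return -1;
    off = addr - a->virt_base;
    if ((off >> PG_SHIFT) >= a->nr_pages)
        return -1;
    *phys = (a->frame[off >> PG_SHIFT] << PG_SHIFT) | (off & (PG_SIZE - 1));
    return 0;
}
```

Consecutive virtual pages can land on unrelated frames, so the range looks linear to the CPU while a device that sees physical addresses would not see one linear block.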

2000-10-30 18:21:25

by Alan Cox

Subject: Re: kmalloc() allocation.

> How much memory would it be reasonable for kmalloc() to be able
> to allocate to a module?

64K probably less. kmalloc allocates physically linear spaces. vmalloc will
happily grab you 2Mb of space but it will not be physically linear

2000-10-30 18:38:17

by Richard B. Johnson

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Alan Cox wrote:

> > How much memory would it be reasonable for kmalloc() to be able
> > to allocate to a module?
>
> 64K probably less. kmalloc allocates physically linear spaces. vmalloc will
> happily grab you 2Mb of space but it will not be physically linear
>

Okay. Thanks.


Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-10-31 05:28:47

by H. Peter Anvin

Subject: Re: kmalloc() allocation.

Followup to: <[email protected]>
By author: "Richard B. Johnson" <[email protected]>
In newsgroup: linux.dev.kernel
>
> > 64K probably less. kmalloc allocates physically linear spaces. vmalloc will
> > happily grab you 2Mb of space but it will not be physically linear
> >
>
> Okay. Thanks.
>

FWIW, vmalloc()-allocated pages are definitely pinned-down and
available to interrupts. However, you should keep in mind that the
vmalloc() call *itself* is quite expensive on SMP machines (have to
interrupt all CPUs and flush their TLBs!!) so if you're using
vmalloc(), be careful with the number of calls you make. Of course,
this is usually not a problem.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-10-31 06:11:27

by Brian Gerst

Subject: Re: kmalloc() allocation.

"H. Peter Anvin" wrote:
>
> Followup to: <[email protected]>
> By author: "Richard B. Johnson" <[email protected]>
> In newsgroup: linux.dev.kernel
> >
> > > 64K probably less. kmalloc allocates physically linear spaces. vmalloc will
> > > happily grab you 2Mb of space but it will not be physically linear
> > >
> >
> > Okay. Thanks.
> >
>
> FWIW, vmalloc()-allocated pages are definitely pinned-down and
> available to interrupts. However, you should keep in mind that the
> vmalloc() call *itself* is quite expensive on SMP machines (have to
> interrupt all CPUs and flush their TLBs!!) so if you're using
> vmalloc(), be careful with the number of calls you make. Of course,
> this is usually not a problem.

This was just changed in 2.4 so that vmalloced pages are faulted in on
demand.

--

Brian Gerst

2000-10-31 07:06:12

by Mike Galbraith

Subject: Re: kmalloc() allocation.

On Mon, 30 Oct 2000, Rik van Riel wrote:

> On Mon, 30 Oct 2000, Richard B. Johnson wrote:
>
> > How much memory would it be reasonable for kmalloc() to be able
> > to allocate to a module?
>
> > There are 256 megabytes of SDRAM available. I don't think it's
> > reasonable that a 1/2 megabyte allocation would fail, especially
> > since it's the first module being installed.
>
> If you write the defragmentation code for the VM, I'll
> be happy to bump up the limit a bit ...

Hmm.. Bill Hawes wrote a memory defragger a long time ago. I have a
copy of it lying around if you want to take a look at it.

-Mike

2000-10-31 08:45:21

by Andi Kleen

Subject: Re: kmalloc() allocation.

On Tue, Oct 31, 2000 at 01:11:29AM -0500, Brian Gerst wrote:
> This was just changed in 2.4 so that vmalloced pages are faulted in on
> demand.

Could you explain how it handles the vmalloc() -- vfree() -- vmalloc() of same
virtual space but different physical race ?

-Andi

2000-10-31 08:49:31

by Tigran Aivazian

Subject: Re: kmalloc() allocation.

On Tue, 31 Oct 2000, Brian Gerst wrote:

> "H. Peter Anvin" wrote:
> >
> > Followup to: <[email protected]>
> > By author: "Richard B. Johnson" <[email protected]>
> > In newsgroup: linux.dev.kernel
> > >
> > > > 64K probably less. kmalloc allocates physically linear spaces. vmalloc will
> > > > happily grab you 2Mb of space but it will not be physically linear
> > > >
> > >
> > > Okay. Thanks.
> > >
> >
> > FWIW, vmalloc()-allocated pages are definitely pinned-down and
> > available to interrupts. However, you should keep in mind that the
> > vmalloc() call *itself* is quite expensive on SMP machines (have to
> > interrupt all CPUs and flush their TLBs!!) so if you're using
> > vmalloc(), be careful with the number of calls you make. Of course,
> > this is usually not a problem.
>
> This was just changed in 2.4 so that vmalloced pages are faulted in on
> demand.

what do you mean?! That is, of course, impossible because it would break
all existing software, so I won't even bother checking the code, safely
assuming that you perhaps meant something else, ok?

Thanks,
Tigran

2000-10-31 08:54:32

by Andi Kleen

Subject: Re: kmalloc() allocation.

On Tue, Oct 31, 2000 at 08:49:02AM +0000, Tigran Aivazian wrote:
>
> what do you mean?! That is, of course, impossible because it would break
> all existing software, so I won't even bother checking the code, safely
> assuming that you perhaps meant something else, ok?

He refers to faulting into the page table from a master table, not faulting
from disk.

-Andi
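
"Faulting into the page table from a master table" can be modelled in a few lines. This is a toy, not the 2.4 implementation: the table size, the struct toy_pgd and the function toy_lookup() are assumptions, and a zero entry simply means "not present".

```c
#include <stddef.h>

#define TOY_ENTRIES 8   /* assumption: tiny table for illustration */

/* 0 means "not present"; any other value stands for a mapping. */
struct toy_pgd {
    unsigned long entry[TOY_ENTRIES];
};

static struct toy_pgd master_pgd;   /* updated when a kernel mapping is made */

/*
 * Look up idx in a per-process table. On a miss, copy the entry from the
 * master table (the "fault" path) and count it. Returns the entry, or 0 if
 * the master table has no mapping either (a genuine bad access).
 */
static unsigned long toy_lookup(struct toy_pgd *pgd, size_t idx,
                                unsigned int *faults)
{
    if (idx >= TOY_ENTRIES)
        return 0;
    if (pgd->entry[idx] == 0 && master_pgd.entry[idx] != 0) {
        pgd->entry[idx] = master_pgd.entry[idx];
        (*faults)++;
    }
    return pgd->entry[idx];
}
```

The data is resident the whole time; only the per-process copy of the mapping is filled in late. Note that this toy never refreshes an entry once copied, so a later change in the master table goes unnoticed. Whether the real code has a comparable problem is not something the sketch establishes.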

2000-10-31 09:08:00

by Tigran Aivazian

Subject: Re: kmalloc() allocation.

On Tue, 31 Oct 2000, Andi Kleen wrote:

> On Tue, Oct 31, 2000 at 08:49:02AM +0000, Tigran Aivazian wrote:
> >
> > what do you mean?! That is, of course, impossible because it would break
> > all existing software, so I won't even bother checking the code, safely
> > assuming that you perhaps meant something else, ok?
>
> He refers to faulting into the page table from a master table, not faulting
> from disk.
>

Ah, ok then. Thanks Andi, I was a bit worried that the world has changed
too radically for me to catch up :)

Regards,
Tigran

2000-10-31 09:25:36

by Andi Kleen

Subject: Re: kmalloc() allocation.

On Tue, Oct 31, 2000 at 09:07:29AM +0000, Tigran Aivazian wrote:
> On Tue, 31 Oct 2000, Andi Kleen wrote:
>
> > On Tue, Oct 31, 2000 at 08:49:02AM +0000, Tigran Aivazian wrote:
> > >
> > > what do you mean?! That is, of course, impossible because it would break
> > > all existing software, so I won't even bother checking the code, safely
> > > assuming that you perhaps meant something else, ok?
> >
> > He refers to faulting into the page table from a master table, not faulting
> > from disk.
> >
>
> Ah, ok then. Thanks Andi, I was a bit worried that the world has changed
> too radically for me to catch up :)

Well, unless I'm missing something major the new method is racy (it does
not handle vmalloc-vfree-vmalloc of same area on a different CPU)

-Andi

2000-10-31 10:49:19

by Ingo Oeser

Subject: Re: kmalloc() allocation.

On Mon, Oct 30, 2000 at 02:40:16PM -0200, Rik van Riel wrote:
> > There are 256 megabytes of SDRAM available. I don't think it's
> > reasonable that a 1/2 megabyte allocation would fail, especially
> > since it's the first module being installed.
> If you write the defragmentation code for the VM, I'll
> be happy to bump up the limit a bit ...

Should become easier once we start doing physical page scannings.

We could record physically contiguous freeable areas on the fly
then. If someone asks for them later, we recheck whether they
still exist and free (inactive_clean) or remap (active or
inactive_dirty) the whole area, whether they are used or not.

This could still be improved by using up smallest fit areas
first for kmalloc() based on these areas.

But beware: We just have a good hint here, which needs to be
rechecked every time we allocate such areas to become a
guarantee.

Rik: What do you think about this (physical cont. area cache) for 2.5?

Regards

Ingo Oeser
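
The two ideas in this message (prefer the smallest recorded area that fits, and treat every record as a hint to be rechecked) might look like this in a user-space model. Everything here is hypothetical: struct area_hint, the page_free[] array standing in for the real page state, and the function names.

```c
#include <stddef.h>

struct area_hint {
    size_t start;   /* first page frame of the recorded run */
    size_t pages;   /* length of the run when it was recorded */
};

/* Recheck a hint: every page of the run must still be free (nonzero). */
static int run_still_free(const unsigned char *page_free,
                          const struct area_hint *h)
{
    for (size_t i = 0; i < h->pages; i++)
        if (!page_free[h->start + i])
            return 0;
    return 1;
}

/*
 * Pick the smallest recorded run of at least want pages that still passes
 * the recheck. Returns the index into hints[], or -1 if nothing usable
 * is left.
 */
static int pick_best_fit(const struct area_hint *hints, size_t n,
                         size_t want, const unsigned char *page_free)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (hints[i].pages < want)
            continue;
        if (best >= 0 && hints[i].pages >= hints[(size_t)best].pages)
            continue;
        if (!run_still_free(page_free, &hints[i]))
            continue;
        best = (int)i;
    }
    return best;
}
```

A stale hint costs only a failed recheck; the caller falls through to the next larger area or gets -1.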
--
Feel the power of the penguin - run [email protected]
<esc>:x

2000-10-31 13:00:07

by Brian Gerst

Subject: Re: kmalloc() allocation.

Andi Kleen wrote:
>
> On Tue, Oct 31, 2000 at 01:11:29AM -0500, Brian Gerst wrote:
> > This was just changed in 2.4 so that vmalloced pages are faulted in on
> > demand.
>
> Could you explain how it handles the vmalloc() -- vfree() -- vmalloc() of same
> virtual space but different physical race ?

As far as I can tell (I didn't write the code), vfree didn't change.
It's only vmalloc that's lazy now.

--

Brian Gerst

2000-10-31 13:36:37

by Rik van Riel

Subject: Re: kmalloc() allocation.

On Tue, 31 Oct 2000, Ingo Oeser wrote:
> On Mon, Oct 30, 2000 at 02:40:16PM -0200, Rik van Riel wrote:

> > If you write the defragmentation code for the VM, I'll
> > be happy to bump up the limit a bit ...
>
> Should become easier once we start doing physical page scannings.
>
> We could record physically contiguous freeable areas on the fly
> then. If someone asks for them later, we recheck whether they
> still exist and free (inactive_clean) or remap (active or
> inactive_dirty) the whole area, whether they are used or not.
>
> This could still be improved by using up smallest fit areas
> first for kmalloc() based on these areas.

> Rik: What do you think about this (physical cont. area cache) for 2.5?

http://www.surriel.com/zone-alloc.html

cheers,

Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/

2000-10-31 14:01:02

by Richard B. Johnson

Subject: Re: kmalloc() allocation.

On Tue, 31 Oct 2000, Rik van Riel wrote:

> On Tue, 31 Oct 2000, Ingo Oeser wrote:
> > On Mon, Oct 30, 2000 at 02:40:16PM -0200, Rik van Riel wrote:
>
> > > If you write the defragmentation code for the VM, I'll
> > > be happy to bump up the limit a bit ...
> >
> > Should become easier once we start doing physical page scannings.
> >
> > We could record physically contiguous freeable areas on the fly
> > then. If someone asks for them later, we recheck whether they
> > still exist and free (inactive_clean) or remap (active or
> > inactive_dirty) the whole area, whether they are used or not.
> >
> > This could still be improved by using up smallest fit areas
> > first for kmalloc() based on these areas.
>
> > Rik: What do you think about this (physical cont. area cache) for 2.5?
>
> http://www.surriel.com/zone-alloc.html
>
> cheers,
>
> Rik
> --

Since Linux is starting to be used in many 'strange' non-desktop
environments, maybe it's time to provide a hook to reserve the
top N kilobytes of RAM for strange buffers. Like:

append="..,reserve=2M".

Upon startup, a pointer, valid when using the kernel DS, could be
initialized to point to the beginning of this area. This is essentially
zero overhead for the kernel because it just points to one longword
greater than the RAM the kernel will use.

In the event that this is too much work, then an additional entry could
be made in the GDT to address this area, and the resulting segment
number could be included in a kernel header file. To access it, code
would do:

    pushl   %ds
    movl    $RESERVE_MEM, %eax
    movw    %ax, %ds
    .....
    # %ds:0 now addresses the beginning of the reserved area.
    popl    %ds

This 'free' area could be used for all kinds of stuff including helping
to relocate/debug some complex things.

The cost to performance is zero. A GDT entry on Intel is 8 bytes.


Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.
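
For the proposed "reserve=2M" option, the parsing side is simple. A minimal user-space sketch, assuming the option takes a decimal number with an optional K or M suffix; the option itself is only a proposal in this thread, and parse_reserve() is a made-up name:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse a "reserve=<n>[K|M]" token. Returns the size in bytes, or 0 if the
 * token does not start with "reserve=" or carries no number.
 */
static unsigned long parse_reserve(const char *arg)
{
    static const char key[] = "reserve=";
    char *end;
    unsigned long val;

    if (strncmp(arg, key, sizeof(key) - 1) != 0)
        return 0;
    arg += sizeof(key) - 1;
    val = strtoul(arg, &end, 10);
    if (end == arg)
        return 0;
    switch (*end) {
    case 'M': case 'm':
        val <<= 20;
        break;
    case 'K': case 'k':
        val <<= 10;
        break;
    default:
        break;
    }
    return val;
}
```

With this, parse_reserve("reserve=2M") yields 2097152 bytes, the size of the buffer asked for at the start of the thread.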


2000-10-31 14:45:25

by afei

Subject: Re: kmalloc() allocation.



On Tue, 31 Oct 2000, Ingo Oeser wrote:

> On Mon, Oct 30, 2000 at 02:40:16PM -0200, Rik van Riel wrote:
> > > There are 256 megabytes of SDRAM available. I don't think it's
> > > reasonable that a 1/2 megabyte allocation would fail, especially
> > > since it's the first module being installed.
> > If you write the defragmentation code for the VM, I'll
> > be happy to bump up the limit a bit ...
>
> Should become easier once we start doing physical page scannings.
>
> We could record physically contiguous freeable areas on the fly
> then. If someone asks for them later, we recheck whether they
> still exist and free (inactive_clean) or remap (active or
> inactive_dirty) the whole area, whether they are used or not.

I am confused. Why cannot one simply audit the memory usage and always
have an up-to-date list of free memory pages? When a page is allocated,
the allocator should make a call to move that page outside of the
freelist; and when it is free, just move it back to the free list. Is it
because of the overhead?

Fei
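
The distinction running through this thread, between how many pages are free and how many adjacent pages are free, can be seen in a small user-space model. The byte-per-page map and both function names are assumptions for illustration only.

```c
#include <stddef.h>

/* Count the free pages in a map where nonzero means "free". */
static size_t count_free(const unsigned char *map, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        if (map[i])
            total++;
    return total;
}

/* Length of the longest run of adjacent free pages. */
static size_t longest_free_run(const unsigned char *map, size_t n)
{
    size_t best = 0, run = 0;

    for (size_t i = 0; i < n; i++) {
        run = map[i] ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}
```

A map with every other page free has half its pages available yet no run longer than one page, so an accurate free list alone does not make a large contiguous request succeed.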
>
> This could still be improved by using up smallest fit areas
> first for kmalloc() based on these areas.
>
> But beware: We just have a good hint here, which needs to be
> rechecked every time we allocate such areas to become a
> guarantee.
>
> Rik: What do you think about this (physical cont. area cache) for 2.5?
>
> Regards
>
> Ingo Oeser
> --
> Feel the power of the penguin - run [email protected]
> <esc>:x

2000-10-31 14:49:35

by Benjamin LaHaise

Subject: Re: kmalloc() allocation.

On Tue, 31 Oct 2000, Brian Gerst wrote:

> Andi Kleen wrote:
> >
> > On Tue, Oct 31, 2000 at 01:11:29AM -0500, Brian Gerst wrote:
> > > This was just changed in 2.4 so that vmalloced pages are faulted in on
> > > demand.
> >
> > Could you explain how it handles the vmalloc() -- vfree() -- vmalloc() of same
> > virtual space but different physical race ?
>
> As far as I can tell (I didn't write the code), vfree didn't change.
> It's only vmalloc that's lazy now.

The code for vmalloc allocates the pages at vmalloc time, not after. The
TLB is populated lazily, but most definitely not the page tables.

-ben

2000-10-31 15:03:07

by Alan Cox

Subject: Re: kmalloc() allocation.

> The code for vmalloc allocates the pages at vmalloc time, not after. The
> TLB is populated lazily, but most definitely not the page tables.

Is the lazy TLB population interrupt-safe, or do I need to change any driver
that uses vmalloced memory from an IRQ?

2000-10-31 15:18:09

by Ingo Oeser

[permalink] [raw]
Subject: Re: kmalloc() allocation.

On Tue, Oct 31, 2000 at 11:35:46AM -0200, Rik van Riel wrote:
> > Rik: What do you think about this (physical cont. area cache) for 2.5?
^^^^^^^^^^^^^^^^^^^^^^^^^ == PCAC
>
> http://www.surriel.com/zone-alloc.html

I read it when you first published it, but didn't notice you were
still working on it ;-)

My approach is still different. We get the HINT for free. And
your zones only shift this problem from the page level to the
mem_zone level.

I thought about something like this:

/* Adds a physically continuous area of pages to the PCAC.
 * To be implemented later, once we decide on a data structure
 * for this that can do fast unique inserts and retrieval in
 * O(N) or better. (Hashes?)
 */
void add_phys_cont_chunk(struct phys_page *start, size_t area_size);

/* Adds page(s) to the pool from which we prefer to kmalloc() small
 * things and to vmalloc() things. This gets us close to a best-fit
 * strategy instead of something like the first fit we have now.
 */
void add_small_phys_area(struct phys_page *start, size_t area_size);

/* Gets a chunk of at least area_size pages and removes it from
 * the PCAC. Returns NULL if none is found.
 *
 * To be implemented later along with the above routine.
 */
struct phys_page *get_phys_cont_chunk(size_t area_size);

#define suitable(p) (moveable(p) || freeable(p)) /* refine this */
#define MIN_PHYS_CHUNK 2 /* tune this */

/* in the physical page scan that transfers the REFERENCED bit */

size_t area_size = 0;
struct phys_page *p, *chunk_start = NULL;

p = first_phys_page;

while (p != last_phys_page) {
	if (area_size) {
		if (suitable(p)) {

			/* expand current chunk */
			area_size++;

		} else {

			/* insert last chunk */
			if (area_size >= MIN_PHYS_CHUNK)
				add_phys_cont_chunk(chunk_start, area_size);
			else
				add_small_phys_area(chunk_start, area_size);
			area_size = 0;
		}
	} else {
		if (suitable(p)) {

			/* start new chunk */
			area_size = 1;
			chunk_start = p;
		}
	}
	p = p->next;
}

/* insert the chunk that was still open when the scan ended */
if (area_size >= MIN_PHYS_CHUNK)
	add_phys_cont_chunk(chunk_start, area_size);
else if (area_size)
	add_small_phys_area(chunk_start, area_size);
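
A minimal user-space sketch of the run-detection scan above, for anyone
who wants to try it outside the kernel. Everything here is an assumption
made for illustration: struct sim_page, struct run, scan_runs() and
MAX_RUNS are hypothetical names, a flat array stands in for the linked
list of physical pages, and suitable() is reduced to a single flag.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_PHYS_CHUNK 2   /* same tunable as in the pseudocode */
#define MAX_RUNS 16        /* hypothetical cap for this sketch */

/* Hypothetical stand-in for struct phys_page: one flag per page. */
struct sim_page {
    int suitable;          /* 1 if the page is freeable or movable */
};

/* One recorded run of suitable pages: [start, start + len). */
struct run {
    size_t start;
    size_t len;
};

/*
 * Scan npages pages and record every maximal run of suitable pages
 * that is at least MIN_PHYS_CHUNK long. Returns the number of runs
 * stored in out (at most MAX_RUNS). Shorter runs are simply skipped
 * here; the pseudocode would pass them to add_small_phys_area().
 */
size_t scan_runs(const struct sim_page *pages, size_t npages,
                 struct run *out)
{
    size_t nruns = 0, len = 0, start = 0;

    assert(pages != NULL && out != NULL);

    /* i == npages is one step past the end and flushes the last run. */
    for (size_t i = 0; i <= npages; i++) {
        int ok = (i < npages) && pages[i].suitable;

        if (ok) {
            if (len == 0)
                start = i;      /* start new chunk */
            len++;              /* expand current chunk */
        } else if (len) {
            if (len >= MIN_PHYS_CHUNK && nruns < MAX_RUNS) {
                out[nruns].start = start;
                out[nruns].len = len;
                nruns++;
            }
            len = 0;
        }
    }
    return nruns;
}
```

The extra loop iteration at i == npages is one way to make sure a run
that reaches the end of memory is still recorded.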

And later, when we need a physically continuous area >= MIN_PHYS_CHUNK:

/* look up the PCAC for a hint */
struct phys_page *s = get_phys_cont_chunk(area_size);

if (s) {
	size_t done = 0;
	struct phys_page *p = s;

	/* lock down page tables */
	while (done < area_size) {
		if (!free_or_move_page(p))
			break;
		p = p->next;
		done++;
	}
	/* unlock page tables */
	if (done == area_size)
		return s; /* hey, it worked! */

	/* stale hint: retry the lookup or try it the old way */
} else {
	/* no hints, try it the old way or fail */
}
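
The re-check step can also be tried in user space. The following is only
a sketch under stated assumptions: struct hint_page, sim_free_or_move()
and claim_hint() are hypothetical names, a page is "freed or moved" by
setting a flag, and the roll-back on failure is an addition of this
sketch that the pseudocode above does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page state for this sketch. */
struct hint_page {
    int suitable;   /* still freeable or movable? */
    int claimed;    /* already handed to a caller */
};

/* Stand-in for free_or_move_page(): only suitable, unclaimed pages succeed. */
static int sim_free_or_move(struct hint_page *p)
{
    if (!p->suitable || p->claimed)
        return 0;
    p->claimed = 1;
    return 1;
}

/*
 * Try to claim len pages starting at index start. The hint may be
 * stale, so every page is rechecked. Returns 1 on success; on failure
 * it releases what it had already claimed and returns 0.
 */
int claim_hint(struct hint_page *pages, size_t start, size_t len)
{
    size_t done;

    assert(pages != NULL);

    for (done = 0; done < len; done++)
        if (!sim_free_or_move(&pages[start + done]))
            break;

    if (done == len)
        return 1;           /* the hint was still good */

    while (done--)          /* undo the partial claim */
        pages[start + done].claimed = 0;
    return 0;
}
```

Counting upwards and comparing against len at the end keeps the success
test independent of how the loop was left.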


Hope it doesn't sound too stupid ;-)

Regards

Ingo Oeser
--
Feel the power of the penguin - run [email protected]
<esc>:x

2000-10-31 15:57:27

by Pauline Middelink

[permalink] [raw]
Subject: Re: kmalloc() allocation.

On Tue, 31 Oct 2000 around 08:59:53 -0500, Richard B. Johnson wrote:
[snip]

> Since Linux is starting to be used in many 'strange' non-desktop
> environments, maybe it's time to provide a hook to reserve the
> top N kilobytes of RAM for strange buffers. Like:
>
> append="..,reserve=2M".
>
> Upon startup, a pointer, valid when using the kernel DS, could be
> initialized to point to the beginning of this area. This is essentially
> zero overhead for the kernel because it just points to one longword
> greater than the RAM the kernel will use.

Please look at bigphysarea. It allocates a piece of memory at boot time
and has a small allocator on top of it to hand it out to drivers. Mostly
video framegrabbers at this time... But the interface and all is there...

http://www.polyware.nl/~middelink/En/hob-v4l.html
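
To illustrate the general idea of a small allocator sitting on top of a
region reserved at boot, here is a user-space sketch. It is not the
bigphysarea interface: pool_alloc_pages(), pool_free_pages() and
POOL_PAGES are hypothetical names, and a byte array of per-page flags
stands in for the reserved memory.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_PAGES 16          /* hypothetical pool size, in pages */

static unsigned char pool_used[POOL_PAGES];  /* 1 = page handed out */

/*
 * First-fit search for count consecutive free pages.
 * Returns the index of the first page, or -1 if no run is large enough.
 */
int pool_alloc_pages(size_t count)
{
    size_t run = 0;

    if (count == 0 || count > POOL_PAGES)
        return -1;

    for (size_t i = 0; i < POOL_PAGES; i++) {
        run = pool_used[i] ? 0 : run + 1;
        if (run == count) {
            size_t first = i + 1 - count;

            for (size_t j = first; j <= i; j++)
                pool_used[j] = 1;       /* mark the whole run as used */
            return (int)first;
        }
    }
    return -1;
}

/* Give back count pages starting at index first. */
void pool_free_pages(int first, size_t count)
{
    assert(first >= 0 && (size_t)first + count <= POOL_PAGES);
    for (size_t j = 0; j < count; j++)
        pool_used[(size_t)first + j] = 0;
}
```

Because the pool is set aside before anything else can fragment it, a
large contiguous request can succeed here even when the general page
allocator could no longer satisfy it.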

Kind regards,
Pauline Middelink
--
PGP Key fingerprint = DE 6B D0 D9 19 AD A7 A0 58 A3 06 9D B6 34 39 E2
For more details look at my website http://www.polyware.nl/~middelink

2000-10-31 15:59:37

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: kmalloc() allocation.

On Tue, 31 Oct 2000, Alan Cox wrote:

> > The code for vmalloc allocates the pages at vmalloc time, not after. The
> > TLB is populated lazily, but most definitely not the page tables.
>
> Is the lazy tlb population interrupt safe or do I need to change any driver
> using vmalloced memory from an IRQ ?

It should be safe since it's just copying pgd/pmd pointers into the
per-process page tables; the pte's are still shared.
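
As a rough user-space model of that fix-up (a sketch only, not the
kernel's fault handler): the allocator installs a table pointer in one
reference directory, and a process that faults on a missing slot copies
the pointer into its own directory. master_pgd, struct process,
install_mapping() and fixup_fault() are hypothetical names chosen for
this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PGD_ENTRIES 8   /* hypothetical, tiny for illustration */

/* A "page table" here is just an opaque object the slots point at. */
struct pte_table { int dummy; };

/* Reference directory that the allocator updates. */
static struct pte_table *master_pgd[PGD_ENTRIES];

/* Per-process directory: may lag behind the reference directory. */
struct process {
    struct pte_table *pgd[PGD_ENTRIES];
};

/* Allocation side: install the table in the reference directory only. */
void install_mapping(size_t slot, struct pte_table *table)
{
    assert(slot < PGD_ENTRIES);
    master_pgd[slot] = table;
}

/*
 * Fault side: if the reference directory has an entry for this slot,
 * copy the pointer into the process. Returns 1 if the slot is now
 * current, 0 if the address is really unmapped.
 */
int fixup_fault(struct process *proc, size_t slot)
{
    assert(proc != NULL && slot < PGD_ENTRIES);
    if (master_pgd[slot] == NULL)
        return 0;
    proc->pgd[slot] = master_pgd[slot];  /* pointer copy, no allocation */
    return 1;
}
```

In this model the fix-up only copies a pointer and never allocates,
which matches the reason given above for why it should be safe from
interrupt context; the table itself stays shared between all processes.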

That said, reading vmalloc.c leads to the discovery that
vmalloc_area_pages will currently race on SMP (the pmd/pte allocation
routines are not SMP safe). Untested/obvious patch below. Ultimately
we'll have to move the locking into pmd_alloc/pte_alloc, but I'm not sure
if that's appropriate so close to 2.4.

-ben


--- v2.4.0-test10-pre7/mm/vmalloc.c	Mon Oct 30 16:02:27 2000
+++ test-10-7/mm/vmalloc.c	Tue Oct 31 10:58:47 2000
@@ -121,7 +121,11 @@
 	if (end > PGDIR_SIZE)
 		end = PGDIR_SIZE;
 	do {
-		pte_t * pte = pte_alloc_kernel(pmd, address);
+		pte_t * pte;
+
+		lock_kernel();
+		pte = pte_alloc_kernel(pmd, address);
+		unlock_kernel();
 		if (!pte)
 			return -ENOMEM;
 		if (alloc_area_pte(pte, address, end - address, gfp_mask, prot))
@@ -142,8 +146,10 @@
 	flush_cache_all();
 	do {
 		pmd_t *pmd;
-
+
+		lock_kernel();
 		pmd = pmd_alloc_kernel(dir, address);
+		unlock_kernel();
 		if (!pmd)
 			return -ENOMEM;


2000-10-31 16:12:09

by Rik van Riel

[permalink] [raw]
Subject: Re: kmalloc() allocation.

On Tue, 31 Oct 2000, Ingo Oeser wrote:
> On Tue, Oct 31, 2000 at 11:35:46AM -0200, Rik van Riel wrote:
> > > Rik: What do you think about this (physical cont. area cache) for 2.5?
> ^^^^^^^^^^^^^^^^^^^^^^^^^ == PCAC
> >
> > http://www.surriel.com/zone-alloc.html
>
> I read it when you first published it, but didn't notice you were
> still working on it ;-)
>
> My approach is still different. We get the HINT for free. And
> your zones only shift this problem from the page level to the
> mem_zone level.

It's a nice idea, but you still want to be sure you won't
allocate e.g. page tables randomly in the middle of the
PCACs ;)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/

2000-10-31 18:22:56

by Ingo Oeser

[permalink] [raw]
Subject: Re: kmalloc() allocation.

On Tue, Oct 31, 2000 at 02:11:24PM -0200, Rik van Riel wrote:
[PCAC]
> It's a nice idea, but you still want to be sure you won't
> allocate e.g. page tables randomly in the middle of the
> PCACs ;)

Yes. That's why we check later whether our hint is still true.

If we cannot free or move all pages in this area, we retry by
looking up the PCAC again (which I forgot to show, because I was
in a hurry :-/ ), or try the old method and might fail there.

So we only have an idea where we MIGHT have a physical area of
this size, but no idea whether it is STILL freeable or
movable.

That's why I didn't care about ANY locking[1]. The only thing
I take for granted is that all struct phys_page are linked
together and represent the whole system's physical memory,
including holes. Anything that breaks these assumptions must be
fixed in my code.

Once you have implemented physical page scanning, I'll try to
implement this. I just needed to know whether you like the idea
or whether it is total crap ;-)

Another problem is that the PCAC needs memory to store these
areas. One possibility is to store it in struct phys_page and
have a global hash table for the sizes. But these are details for
2.5 and not for now ;-)

Regards

Ingo Oeser

[1] Later I will have to lock the PCAC-related structures, of course.
--
Feel the power of the penguin - run [email protected]
<esc>:x