2008-02-20 12:23:50

by Andi Kleen

Subject: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Document huge memory/cache overhead of memory controller in Kconfig

I was a little surprised that 2.6.25-rc* increased struct page for the memory
controller. At least on many x86-64 machines it will not fit into a single
cache line now anymore and also costs considerable amounts of RAM.
At an earlier review I remember asking for an external data structure for this.

It's also far from obvious that an innocent-looking Kconfig option with a
single-line Kconfig description has such a negative effect.

This patch attempts to document these disadvantages at least so that users
configuring their kernel can make an informed decision.

Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

Index: linux/init/Kconfig
===================================================================
--- linux.orig/init/Kconfig
+++ linux/init/Kconfig
@@ -394,6 +394,14 @@ config CGROUP_MEM_CONT
 	  Provides a memory controller that manages both page cache and
 	  RSS memory.

+	  Note that setting this option increases fixed memory overhead
+	  associated with each page of memory in the system by 4/8 bytes
+	  and also increases cache misses because struct page on many 64bit
+	  systems will not fit into a single cache line anymore.
+
+	  Only enable when you're ok with these trade offs and really
+	  sure you need the memory controller.
+
 config PROC_PID_CPUSET
 	bool "Include legacy /proc/<pid>/cpuset file"
 	depends on CPUSETS


2008-02-20 13:00:54

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Andi Kleen wrote:
> Document huge memory/cache overhead of memory controller in Kconfig
>
> I was a little surprised that 2.6.25-rc* increased struct page for the memory
> controller. At least on many x86-64 machines it will not fit into a single
> cache line now anymore and also costs considerable amounts of RAM.

The size of struct page earlier was 56 bytes on x86_64 and with 64 bytes it
won't fit into the cacheline anymore? Please also look at
http://lwn.net/Articles/234974/
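
The numbers being argued over here are easy to check. A minimal sketch in Python, assuming 4 KiB pages and 64-byte cache lines (neither is stated in the thread) and taking the 56- and 64-byte struct page sizes from the reply above; the function names are made up for illustration:

```python
"""Back-of-the-envelope check of the struct page growth discussed above.

Assumptions (not from the thread): 4 KiB pages, 64-byte cache lines.
The 56- and 64-byte sizes are the x86_64 figures quoted in the reply.
"""

MiB = 1 << 20
GiB = 1 << 30
PAGE_SIZE = 4096   # assumed page size in bytes
CACHE_LINE = 64    # assumed cache line size in bytes


def fits_in_cache_line(struct_page_bytes, cache_line=CACHE_LINE):
    """True if one struct page is no larger than a single cache line."""
    return struct_page_bytes <= cache_line


def mem_map_bytes(ram_bytes, struct_page_bytes, page_size=PAGE_SIZE):
    """Total size of the struct page array needed to describe ram_bytes."""
    return (ram_bytes // page_size) * struct_page_bytes


if __name__ == "__main__":
    for size in (56, 64):
        total = mem_map_bytes(4 * GiB, size)
        print(size, "bytes per page:",
              "fits" if fits_in_cache_line(size) else "does not fit",
              "in one cache line;", total // MiB, "MiB of mem_map per 4 GiB")
```

Under these assumptions a 64-byte struct page is still no larger than one cache line, and the extra 8 bytes per page add up to 8 MiB of mem_map for every 4 GiB of RAM.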

> At an earlier review I remember asking for an external data structure for this.
>
> It's also far from obvious that an innocent-looking Kconfig option with a
> single-line Kconfig description has such a negative effect.
>
> This patch attempts to document these disadvantages at least so that users
> configuring their kernel can make an informed decision.
>
> Cc: [email protected]
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig
> +++ linux/init/Kconfig
> @@ -394,6 +394,14 @@ config CGROUP_MEM_CONT
> Provides a memory controller that manages both page cache and
> RSS memory.
>
> + Note that setting this option increases fixed memory overhead
> + associated with each page of memory in the system by 4/8 bytes
> + and also increases cache misses because struct page on many 64bit
> + systems will not fit into a single cache line anymore.
> +
> + Only enable when you're ok with these trade offs and really
> + sure you need the memory controller.
> +

Looks good

Acked-by: Balbir Singh <[email protected]>

> config PROC_PID_CPUSET
> bool "Include legacy /proc/<pid>/cpuset file"
> depends on CPUSETS


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-20 15:12:19

by John Stoffel

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

>>>>> "Balbir" == Balbir Singh <[email protected]> writes:

Balbir> Andi Kleen wrote:
>> Document huge memory/cache overhead of memory controller in Kconfig
>>
>> I was a little surprised that 2.6.25-rc* increased struct page for the memory
>> controller. At least on many x86-64 machines it will not fit into a single
>> cache line now anymore and also costs considerable amounts of RAM.

Balbir> The size of struct page earlier was 56 bytes on x86_64 and with 64 bytes it
Balbir> won't fit into the cacheline anymore? Please also look at
Balbir> http://lwn.net/Articles/234974/

>> At an earlier review I remember asking for an external data structure for this.
>>
>> It's also far from obvious that an innocent-looking Kconfig option with a
>> single-line Kconfig description has such a negative effect.
>>
>> This patch attempts to document these disadvantages at least so that users
>> configuring their kernel can make an informed decision.
>>
>> Cc: [email protected]
>>
>> Signed-off-by: Andi Kleen <[email protected]>
>>
>> Index: linux/init/Kconfig
>> ===================================================================
>> --- linux.orig/init/Kconfig
>> +++ linux/init/Kconfig
>> @@ -394,6 +394,14 @@ config CGROUP_MEM_CONT
>> Provides a memory controller that manages both page cache and
>> RSS memory.
>>
>> + Note that setting this option increases fixed memory overhead
>> + associated with each page of memory in the system by 4/8 bytes
>> + and also increases cache misses because struct page on many 64bit
>> + systems will not fit into a single cache line anymore.
>> +
>> + Only enable when you're ok with these trade offs and really
>> + sure you need the memory controller.
>> +

I know this is a pedantic comment, but why the heck is it called such
a generic term as "Memory Controller" which doesn't give any
indication of what it does.

Shouldn't it be something like "Memory Quota Controller", or "Memory
Limits Controller"?

Also, the Kconfig name "CGROUP_MEM_CONT" is just wrong, it should be
"CGROUP_MEM_CONTROLLER", just spell it out so it's clear what's up.

It took me a bunch of reading of Documentation/controllers/memory.txt
to even start to understand what the purpose of this was. The
document could also use a re-writing to include a clear introduction
at the top to explain "what" a memory controller is.

Something which talks about limits, resource management, quotas, etc
would be nice.

Thanks,
John

2008-02-20 15:30:07

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

John Stoffel wrote:
> I know this is a pedantic comment, but why the heck is it called such
> a generic term as "Memory Controller" which doesn't give any
> indication of what it does.
>
> Shouldn't it be something like "Memory Quota Controller", or "Memory
> Limits Controller"?
>

It's called the memory controller since it controls the amount of memory that a
user can allocate (via limits). The generic term for any resource manager
plugged into cgroups is a controller. If you look through some of the references
in the document, we've listed our plans to support other categories of memory as
well. Hence it's called a memory controller

> Also, the Kconfig name "CGROUP_MEM_CONT" is just wrong, it should be
> "CGROUP_MEM_CONTROLLER", just spell it out so it's clear what's up.
>

This has some history as well. Control groups was called containers earlier.
That way a name like CGROUP_MEM_CONT could stand for cgroup memory container or
cgroup memory controller.

> It took me a bunch of reading of Documentation/controllers/memory.txt
> to even start to understand what the purpose of this was. The
> document could also use a re-writing to include a clear introduction
> at the top to explain "what" a memory controller is.
>
> Something which talks about limits, resource management, quotas, etc
> would be nice.
>


The references, especially reference [1], contain a lot of details on limits,
guarantees, etc. Since they've been documented in the past on lkml, I decided
to keep them out of the documentation and mention them as references. If it's
going to help to add that terminology, I can create another document describing
what resource management means and what the commonly used terms mean.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-20 15:49:23

by Jan Engelhardt

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig


On Feb 20 2008 20:50, Balbir Singh wrote:
>John Stoffel wrote:
>> I know this is a pedantic comment, but why the heck is it called such
>> a generic term as "Memory Controller" which doesn't give any
>> indication of what it does.
>>
>> Shouldn't it be something like "Memory Quota Controller", or "Memory
>> Limits Controller"?
>
>It's called the memory controller since it controls the amount of
>memory that a user can allocate (via limits). The generic term for
>any resource manager plugged into cgroups is a controller.

For ordinary desktop people, memory controller is what developers
know as MMU or sometimes even some other mysterious piece of silicon
inside the heavy box.

>If you look through some of the references in the document, we've
>listed our plans to support other categories of memory as well.
>Hence it's called a memory controller
>
>> Also, the Kconfig name "CGROUP_MEM_CONT" is just wrong, it should
>> be "CGROUP_MEM_CONTROLLER", just spell it out so it's clear what's
>> up.
>
>This has some history as well. Control groups was called containers
>earlier. That way a name like CGROUP_MEM_CONT could stand for cgroup
>memory container or cgroup memory controller.

CONT is shorthand for "continue" ;-) (SIGCONT, f.ex.), ctrl or ctrlr
it is for controllers (comes from Solaris iirc.)

2008-02-20 16:27:37

by John Stoffel

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

>>>>> "Jan" == Jan Engelhardt <[email protected]> writes:

Jan> On Feb 20 2008 20:50, Balbir Singh wrote:
>> John Stoffel wrote:
>>> I know this is a pedantic comment, but why the heck is it called such
>>> a generic term as "Memory Controller" which doesn't give any
>>> indication of what it does.
>>>
>>> Shouldn't it be something like "Memory Quota Controller", or "Memory
>>> Limits Controller"?
>>
>> It's called the memory controller since it controls the amount of
>> memory that a user can allocate (via limits). The generic term for
>> any resource manager plugged into cgroups is a controller.

Jan> For ordinary desktop people, memory controller is what developers
Jan> know as MMU or sometimes even some other mysterious piece of
Jan> silicon inside the heavy box.

That's what was confusing me at first. I was wondering why we needed
a memory controller when we already had one in Linux!

Also, controlling a resource is more a matter of limits or quotas, not
controls. Well, I'll actually back off on that, since controls does
have a history in other industries.

But for computers, limits is an expected and understood term, and for
filesystems it's quotas. So in this case, I *still* think you should
be using the term "Memory Quota Controller" instead. It just makes it
clearer to a larger audience what you mean.

>> If you look through some of the references in the document, we've
>> listed our plans to support other categories of memory as well.
>> Hence it's called a memory controller
>>
>>> Also, the Kconfig name "CGROUP_MEM_CONT" is just wrong, it should
>>> be "CGROUP_MEM_CONTROLLER", just spell it out so it's clear what's
>>> up.

>> This has some history as well. Control groups was called containers
>> earlier. That way a name like CGROUP_MEM_CONT could stand for
>> cgroup memory container or cgroup memory controller.

Jan> CONT is shorthand for "continue" ;-) (SIGCONT, f.ex.), ctrl or
Jan> ctrlr it is for controllers (comes from Solaris iirc.)

Right, CTLR would be more regular shorthand for CONTROLLER.

Basically, I think you're overloading a commonly used term for your
own uses and when it's exposed to regular users, it will cause
confusion.

Thanks,
John

2008-02-20 16:34:53

by John Stoffel

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

>>>>> "Balbir" == Balbir Singh <[email protected]> writes:

Balbir> John Stoffel wrote:
>> I know this is a pedantic comment, but why the heck is it called such
>> a generic term as "Memory Controller" which doesn't give any
>> indication of what it does.
>>
>> Shouldn't it be something like "Memory Quota Controller", or "Memory
>> Limits Controller"?
>>

Balbir> It's called the memory controller since it controls the amount
Balbir> of memory that a user can allocate (via limits).

Ding! See how you mention limits here? That should be part of the
generic term in the Kconfig to make it crystal clear what you mean by
a memory controller.

Balbir> The generic term for any resource manager plugged into
Balbir> cgroups is a controller.

The general term for managing resources is limits or quotas. Not
controllers.

Balbir> If you look through some of the references in the document,
Balbir> we've listed our plans to support other categories of memory
Balbir> as well. Hence it's called a memory controller

Still don't buy it, sorry. :]

>> Also, the Kconfig name "CGROUP_MEM_CONT" is just wrong, it should be
>> "CGROUP_MEM_CONTROLLER", just spell it out so it's clear what's up.
>>

Balbir> This has some history as well. Control groups was called
Balbir> containers earlier. That way a name like CGROUP_MEM_CONT
Balbir> could stand for cgroup memory container or cgroup memory
Balbir> controller.

>> It took me a bunch of reading of Documentation/controllers/memory.txt
>> to even start to understand what the purpose of this was. The
>> document could also use a re-writing to include a clear introduction
>> at the top to explain "what" a memory controller is.
>>
>> Something which talks about limits, resource management, quotas, etc
>> would be nice.
>>

Balbir> The references, specially reference [1] contains a lot of
Balbir> details on limits, guarantees, etc. Since they've been
Balbir> documented in the past on lkml, I decided to keep them out of
Balbir> the documentation and mention them as references. If it's
Balbir> going to help to add that terminology; I can create another
Balbir> document describing what resource management means and what
Balbir> the commonly used terms mean.

Well, I think you first need to set up a new directory called
Documentation/cgroups/ and then you can put an introduction.txt and
your controllers.txt files there.

But controllers is just too generic a term. For example, if I'm
talking about a controller on my desktop, does that mean I'm talking
about:

SCSI, IDE, memory, USB, Firewire or serial ports?

I've got all of them on my main system. Again, I think you're
overloading a very generic term in a very non-obvious way and it needs
to be clarified for the regular developers and users.

Thanks,
John

2008-02-20 16:35:30

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

John Stoffel wrote:
>>>>>> "Jan" == Jan Engelhardt <[email protected]> writes:
>
> Jan> On Feb 20 2008 20:50, Balbir Singh wrote:
>>> John Stoffel wrote:
>>>> I know this is a pedantic comment, but why the heck is it called such
>>>> a generic term as "Memory Controller" which doesn't give any
>>>> indication of what it does.
>>>>
>>>> Shouldn't it be something like "Memory Quota Controller", or "Memory
>>>> Limits Controller"?
>>> It's called the memory controller since it controls the amount of
>>> memory that a user can allocate (via limits). The generic term for
>>> any resource manager plugged into cgroups is a controller.
>
> Jan> For ordinary desktop people, memory controller is what developers
> Jan> know as MMU or sometimes even some other mysterious piece of
> Jan> silicon inside the heavy box.
>
> That's what was confusing me at first. I was wondering why we needed
> a memory controller when we already had one in Linux!
>
> Also, controlling a resource is more a matter of limits or quotas, not
> controls. Well, I'll actually back off on that, since controls does
> have a history in other industries.
>
> But for computers, limits is an expected and understood term, and for
> filesystems it's quotas. So in this case, I *still* think you should
> be using the term "Memory Quota Controller" instead. It just makes it
> clearer to a larger audience what you mean.
>

Memory Quota sounds very confusing to me. Usually a quota implies limits, but in
a true framework, one can also implement guarantees and shares.

>>> If you look through some of the references in the document, we've
>>> listed our plans to support other categories of memory as well.
>>> Hence it's called a memory controller
>>>
>>>> Also, the Kconfig name "CGROUP_MEM_CONT" is just wrong, it should
>>>> be "CGROUP_MEM_CONTROLLER", just spell it out so it's clear what's
>>>> up.
>
>>> This has some history as well. Control groups was called containers
>>> earlier. That way a name like CGROUP_MEM_CONT could stand for
>>> cgroup memory container or cgroup memory controller.
>
> Jan> CONT is shorthand for "continue" ;-) (SIGCONT, f.ex.), ctrl or
> Jan> ctrlr it is for controllers (comes from Solaris iirc.)
>
> Right, CTLR would be more regular shorthand for CONTROLLER.
>
> Basically, I think you're overloading a commonly used term for your
> own uses and when it's exposed to regular users, it will cause
> confusion.
>

OK, I'll queue a patch and try to explain various terms used by resource management.

> Thanks,
> John


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-20 16:55:33

by Ray Lee

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

On Wed, Feb 20, 2008 at 7:20 AM, Balbir Singh <[email protected]> wrote:
> John Stoffel wrote:
> > I know this is a pedantic comment, but why the heck is it called such
> > a generic term as "Memory Controller" which doesn't give any
> > indication of what it does.
> >
> > Shouldn't it be something like "Memory Quota Controller", or "Memory
> > Limits Controller"?
> >
>
> It's called the memory controller since it controls the amount of memory that a
> user can allocate (via limits). The generic term for any resource manager
> plugged into cgroups is a controller. If you look through some of the references
> in the document, we've listed our plans to support other categories of memory as
> well. Hence it's called a memory controller

While logical, the term is too generic. Memory [Allocation] Governor
might be closer. Memory Quota Controller actually matches the already
established terminology (quotas).

Regardless, Andi's point remains: At minimum, the kconfig text needs
to be clear for distributors and end-users as to why they'd want to
enable this, or what reasons would cause them to not enable it.

2008-02-20 16:56:58

by Andi Kleen

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig


> I know this is a pedantic comment, but why the heck is it called such
> a generic term as "Memory Controller" which doesn't give any
> indication of what it does.

I don't think it's pedantic. I would agree with you in fact
that the Kconfig description is not very helpful, even with
my warning added.

-Andi

2008-02-20 16:58:44

by Andi Kleen

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig


> OK, I'll queue a patch and try to explain various terms used by resource management.

Don't make it too verbose or nobody will read it. It should
be more like a one paragraph abstract on a scientific paper
about the linux memory controller.

But I think it should include some variant of the warning that
was in the original patch in this thread (that could be the
second paragraph)

-Andi

2008-02-20 18:19:39

by Pavel Machek

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Hi!

> >> I know this is a pedantic comment, but why the heck is it called such
> >> a generic term as "Memory Controller" which doesn't give any
> >> indication of what it does.
> >>
> >> Shouldn't it be something like "Memory Quota Controller", or "Memory
> >> Limits Controller"?
> >
> >It's called the memory controller since it controls the amount of
> >memory that a user can allocate (via limits). The generic term for
> >any resource manager plugged into cgroups is a controller.
>
> For ordinary desktop people, memory controller is what developers
> know as MMU or sometimes even some other mysterious piece of silicon
> inside the heavy box.

Actually I'd guess 'memory controller' == 'DRAM controller' == part of
northbridge that talks to DRAM.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-20 18:28:21

by Jan Engelhardt

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig


On Feb 20 2008 18:19, Pavel Machek wrote:
>>
>> For ordinary desktop people, memory controller is what developers
>> know as MMU or sometimes even some other mysterious piece of silicon
>> inside the heavy box.
>
>Actually I'd guess 'memory controller' == 'DRAM controller' == part of
>northbridge that talks to DRAM.

Yeah that must have been it when Windows says it found a new controller
after changing the mainboard underneath.

2008-02-20 18:51:13

by Pavel Machek

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

On Wed 2008-02-20 19:28:03, Jan Engelhardt wrote:
>
> On Feb 20 2008 18:19, Pavel Machek wrote:
> >>
> >> For ordinary desktop people, memory controller is what developers
> >> know as MMU or sometimes even some other mysterious piece of silicon
> >> inside the heavy box.
> >
> >Actually I'd guess 'memory controller' == 'DRAM controller' == part of
> >northbridge that talks to DRAM.
>
> Yeah that must have been it when Windows says it found a new controller
> after changing the mainboard underneath.

Just for fun... this option really has to be renamed:

Memory controller
~~~~~~~~~~~~~~~~~
From Wikipedia, the free encyclopedia

The memory controller is a chip on a computer's motherboard or CPU die
which manages the flow of data going to and from the memory.

Most computers based on an Intel processor have a memory controller
implemented on their motherboard's north bridge, though some modern
microprocessors, such as AMD's Athlon 64 and Opteron processors, IBM's
POWER5, and Sun Microsystems UltraSPARC T1 have a memory controller on
the CPU die to reduce the memory latency.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-21 05:11:18

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Nick Piggin wrote:
> On Wednesday 20 February 2008 23:52, Balbir Singh wrote:
>> Andi Kleen wrote:
>>> Document huge memory/cache overhead of memory controller in Kconfig
>>>
>>> I was a little surprised that 2.6.25-rc* increased struct page for the
>>> memory controller. At least on many x86-64 machines it will not fit into
>>> a single cache line now anymore and also costs considerable amounts of
>>> RAM.
>> The size of struct page earlier was 56 bytes on x86_64 and with 64 bytes it
>> won't fit into the cacheline anymore? Please also look at
>> http://lwn.net/Articles/234974/
>
> BTW. We'll probably want to increase the width of some counters
> in struct page at some point for 64-bit, so then it really will
> go over with the memory controller!
>

Hmm...

> Actually, an external data structure is a pretty good idea. We
> could probably do it easily with a radix tree (pfn->memory
> controller). And that might be a better option for distros.
>

I'll put it on my long list of TODOs. I started looking at it again yesterday and
here are my early thoughts:

1. We could create something similar to mem_map, we would need to handle 4
different ways of creating mem_map.
2. On x86 with 64 GB ram, if we decided to use vmalloc space, we would need 64
MB of vmalloc'ed memory

I have not explored your latest suggestion of pfn <-> memory controller mapping
yet. I'll explore it and see how that goes.
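
The 64 MB figure in point 2 is consistent with one pointer-sized entry per page frame. A rough check in Python, assuming 4 KiB pages and 4-byte entries on 32-bit x86 (the message does not spell out either; `aux_map_bytes` is a made-up helper):

```python
"""Size of a flat per-page side array, as in point 2 above.

Assumptions (not stated in the thread): 4 KiB pages and one 4-byte
entry (a 32-bit pointer) per page frame.
"""

MiB = 1 << 20
GiB = 1 << 30
PAGE_SIZE = 4096      # assumed page size in bytes
ENTRY_BYTES_32 = 4    # assumed size of one entry on 32-bit x86


def aux_map_bytes(ram_bytes, entry_bytes=ENTRY_BYTES_32, page_size=PAGE_SIZE):
    """Bytes needed for an array holding one entry per page frame."""
    return (ram_bytes // page_size) * entry_bytes


if __name__ == "__main__":
    # 64 GiB / 4 KiB = 16 Mi page frames, times 4 bytes each.
    print(aux_map_bytes(64 * GiB) // MiB, "MiB")
```

With 8-byte entries, as on a 64-bit kernel, the same array would be twice as large.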

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-21 05:51:50

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Nick Piggin wrote:
>> 1. We could create something similar to mem_map, we would need to handle 4
>> different ways of creating mem_map.
>> 2. On x86 with 64 GB ram, if we decided to use vmalloc space, we would need
>> 64 MB of vmalloc'ed memory
>
> That's going to be a big job. You could probably do it quite easily for
> flatmem (just store an offset into the start of your page array), and
> maybe even sparsemem (add some "extra" information to the extents).
>
>> I have not explored your latest suggestion of pfn <-> memory controller
>> mapping yet. I'll explore it and see how that goes.
>
> If you did that using a radix-tree, then it could be a runtime option
> without having to use vmalloc. And you wouldn't have to care about
> memory models. I'd say it will be the fastest way to get a prototype
> running.
>

OK, I'll explore and prototype the radix tree based approach and see how that goes.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-21 06:50:38

by Kamezawa Hiroyuki

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

On Wed, 20 Feb 2008 21:45:13 +0530
Balbir Singh <[email protected]> wrote:

> > But for computers, limits is an expected and understood term, and for
> > filesystems it's quotas. So in this case, I *still* think you should
> > be using the term "Memory Quota Controller" instead. It just makes it
> > clearer to a larger audience what you mean.
> >
>
> Memory Quota sounds very confusing to me. Usually a quota implies limits, but in
> a true framework, one can also implement guarantees and shares.
>
This "cgroup memory controller" is called the "Memory Resource Controller"
in my office ;)

How about Memory Resource Controller?


-Kame

2008-02-21 06:56:54

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

KAMEZAWA Hiroyuki wrote:
> On Wed, 20 Feb 2008 21:45:13 +0530
> Balbir Singh <[email protected]> wrote:
>
>>> But for computers, limits is an expected and understood term, and for
>>> filesystems it's quotas. So in this case, I *still* think you should
>>> be using the term "Memory Quota Controller" instead. It just makes it
>>> clearer to a larger audience what you mean.
>>>
>> Memory Quota sounds very confusing to me. Usually a quota implies limits, but in
>> a true framework, one can also implement guarantees and shares.
>>
> This "cgroup memory controller" is called the "Memory Resource Controller"
> in my office ;)
>
> How about Memory Resource Controller?

That is a good name and, believe it or not, I was thinking of the same name.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-21 10:36:06

by Andi Kleen

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Nick Piggin wrote:
> On Wednesday 20 February 2008 23:52, Balbir Singh wrote:
>> Andi Kleen wrote:
>>> Document huge memory/cache overhead of memory controller in Kconfig
>>>
>>> I was a little surprised that 2.6.25-rc* increased struct page for the
>>> memory controller. At least on many x86-64 machines it will not fit into
>>> a single cache line now anymore and also costs considerable amounts of
>>> RAM.
>> The size of struct page earlier was 56 bytes on x86_64 and with 64 bytes it
>> won't fit into the cacheline anymore? Please also look at
>> http://lwn.net/Articles/234974/
>
> BTW. We'll probably want to increase the width of some counters
> in struct page at some point for 64-bit,

You mean change count to atomic64_t? Do you have real evidence
the 32bit counter is a problem?

> so then it really will
> go over with the memory controller!

Not sure how they are related? The count and the memory controller
data would be always separate.

BTW if the memory controllers were limited in number it would
be also possible on 64bit to encode them in the high bits of
->flags. I assume 16bit or so could be spared in there. Probably
would not be enough though.
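
The idea of stashing a controller ID in the top bits of ->flags can be illustrated with plain bit arithmetic. A sketch only, in Python rather than kernel C, assuming a 64-bit flags word whose top 16 bits are free; the helper names are hypothetical and nothing here reflects the actual struct page layout:

```python
"""Pack a small controller ID into the high bits of a 64-bit flags word.

Assumptions: a 64-bit word with its top 16 bits unused. Which bits are
really free depends on the kernel configuration; this is illustration only.
"""

FLAGS_BITS = 64
ID_BITS = 16                        # "16bit or so", per the message above
ID_SHIFT = FLAGS_BITS - ID_BITS     # the ID occupies bits 48..63
ID_MASK = ((1 << ID_BITS) - 1) << ID_SHIFT
WORD_MASK = (1 << FLAGS_BITS) - 1


def set_controller_id(flags, cid):
    """Return flags with the ID field replaced by cid; low bits untouched."""
    if not 0 <= cid < (1 << ID_BITS):
        raise ValueError("controller id does not fit in %d bits" % ID_BITS)
    return ((flags & ~ID_MASK) | (cid << ID_SHIFT)) & WORD_MASK


def get_controller_id(flags):
    """Extract the ID field from the high bits of flags."""
    return (flags & ID_MASK) >> ID_SHIFT


if __name__ == "__main__":
    packed = set_controller_id(0x2F, 1234)
    print(hex(packed), get_controller_id(packed))
```

A 16-bit field caps the number of distinguishable controllers at 65536, which is the limitation the message alludes to.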

> Actually, an external data structure is a pretty good idea. We
> could probably do it easily with a radix tree (pfn->memory
> controller). And that might be a better option for distros.

I would think just a separate vmalloc()ed array for the counters
would be easy enough. That array could be allocated the first time
the memory controller is used (so making it zero cost for
distribution kernels when it is not used at all) and then also on
memory hotplug etc. If we assume most memory will be in
memory controllers that is also more efficient (in terms of
memory and of cache consumption) than any kind
of tree.

Balbir mentioned one reason they didn't do that earlier was
that they worried about the limited vmalloc space on 32bit,
but I don't think that's a good reason against it. That is because
vmalloc on 32bit is limited because of the limited direct
mapped kernel memory, but increasing mem_map size eats
the same limited resource. So rather the 32bit vmalloc
reservation can be just increased by the same amount as the
mem_map increase would be (ok modulo hotplug, but that
is difficult anyways on 32bit).

Another issue is that it will slightly increase TLB/cache
cost of the memory controller, but I think that would be a fair
trade off for it being zero cost when disabled but compiled
in.

Doing it with vmalloc should be easy enough. I can do such
a patch later unless someone beats me to it...

-Andi

2008-02-21 10:42:37

by Andi Kleen

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig


> 1. We could create something similar to mem_map, we would need to handle 4

4? At least x86 mainline only has two ways now. flatmem and vmemmap.

> different ways of creating mem_map.

Well, it would be only a single way to create the "aux memory controller
map" (or however it will be called). Basically just a call to a single
function from a few different places.

> 2. On x86 with 64 GB ram,

First, i386 with 64GB just doesn't work, at least not with the default 3:1
split. Just calculate for yourself how much of the lowmem area is left
after the mem_map for 64GB is allocated. A typical rule of thumb is that 16GB
is the realistic limit for 32bit x86 kernels. Worrying about
anything more does not make much sense.
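
The calculation suggested above can be sketched as follows. The inputs are assumptions, not figures from the thread: 4 KiB pages, a 32-byte struct page on 32-bit x86, and roughly 896 MiB of lowmem under the default 3:1 split; `lowmem_left` is a made-up helper.

```python
"""How much 32-bit lowmem remains once mem_map has been allocated.

Assumptions (not from the thread): 4 KiB pages, 32-byte struct page on
32-bit x86, about 896 MiB of directly mapped lowmem with a 3:1 split.
"""

MiB = 1 << 20
GiB = 1 << 30
PAGE_SIZE = 4096          # assumed page size in bytes
STRUCT_PAGE_32BIT = 32    # assumed sizeof(struct page) on 32-bit x86
LOWMEM = 896 * MiB        # assumed lowmem size with the default 3:1 split


def lowmem_left(ram_bytes, struct_page_bytes=STRUCT_PAGE_32BIT,
                lowmem=LOWMEM, page_size=PAGE_SIZE):
    """Lowmem remaining after a mem_map covering ram_bytes is carved out."""
    mem_map = (ram_bytes // page_size) * struct_page_bytes
    return lowmem - mem_map


if __name__ == "__main__":
    for ram in (16 * GiB, 64 * GiB):
        print(ram // GiB, "GiB RAM:", lowmem_left(ram) // MiB, "MiB lowmem left")
```

Under these assumptions mem_map alone takes 512 MiB for 64 GiB of RAM, leaving 384 MiB of lowmem for everything else, against 768 MiB left at 16 GiB.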

> if we decided to use vmalloc space, we would need 64
> MB of vmalloc'ed memory

Yes and if you increase mem_map you need exactly the same space
in lowmem too. So increasing the vmalloc reservation for this is
equivalent. Just make sure you use highmem backed vmalloc.

-Andi

2008-02-21 11:09:30

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Andi Kleen wrote:
> Nick Piggin wrote:
>> On Wednesday 20 February 2008 23:52, Balbir Singh wrote:
>>> Andi Kleen wrote:
>>>> Document huge memory/cache overhead of memory controller in Kconfig
>>>>
>>>> I was a little surprised that 2.6.25-rc* increased struct page for the
>>>> memory controller. At least on many x86-64 machines it will not fit into
>>>> a single cache line now anymore and also costs considerable amounts of
>>>> RAM.
>>> The size of struct page earlier was 56 bytes on x86_64 and with 64 bytes it
>>> won't fit into the cacheline anymore? Please also look at
>>> http://lwn.net/Articles/234974/
>> BTW. We'll probably want to increase the width of some counters
>> in struct page at some point for 64-bit,
>
> You mean change count to atomic64_t? Do you have real evidence
> the 32bit counter is a problem?
>
>> so then it really will
>> go over with the memory controller!
>
> Not sure how they are related? The count and the memory controller
> data would be always separate.
>
> BTW if the memory controllers were limited in number it would
> be also possible on 64bit to encode them in the high bits of
> ->flags. I assume 16bit or so could be spared in there. Probably
> would not be enough though.
>
>> Actually, an external data structure is a pretty good idea. We
>> could probably do it easily with a radix tree (pfn->memory
>> controller). And that might be a better option for distros.
>
> I would think just a separate vmalloc()ed array for the counters
> would be easy enough. That array could be allocated the first time
> the memory controller is used (so making it zero cost for
> distribution kernels when it is not used at all) and then also on
> memory hotplug etc. If we assume most memory will be in
> memory controllers that is also more efficient (in terms of
> memory and of cache consumption) than any kind
> of tree.
>
> Balbir mentioned one reason they didn't do that earlier was
> that they worried about the limited vmalloc space on 32bit,
> but I don't think that's a good reason against it. That is because
> vmalloc on 32bit is limited because of the limited direct
> mapped kernel memory, but increasing mem_map size eats that
> the same limited resource. So rather the 32bit vmalloc
> reservation can be just increased by the same amount as the
> mem_map increase would be (ok modulo hotplug, but that
> is difficult anyways on 32bit)
>
> Another issue is that it will slightly increase TLB/cache
> cost of the memory controller, but I think that would be a fair
> trade off for it being zero cost when disabled but compiled
> in.
>
> Doing it with vmalloc should be easy enough. I can do such
> a patch later unless someone beats me to it...
>

I'll get to it, but I have too many things on my plate at the moment. KAMEZAWA
also wanted to look at it. I looked through some vmalloc() internals yesterday
and I am worried about allocating all the memory on a single node in a NUMA
system and changing VMALLOC_XXXX on every architecture to provide more vmalloc
space. I might be missing something obvious.



--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
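
The "allocate the array the first time the memory controller is used" idea quoted above can be illustrated with a small userspace sketch. It is not kernel code: `calloc()` stands in for `vmalloc()`, and `struct aux_map` and every function name here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical auxiliary map: one owner pointer per page frame number. */
struct aux_map {
    void **owner;          /* NULL until the controller is first used */
    unsigned long nr_pfns; /* number of page frames covered */
};

/* Allocate the array on first use; returns 0 on success, -1 on failure. */
static int aux_map_enable(struct aux_map *map)
{
    assert(map != NULL);
    if (map->owner)
        return 0; /* already allocated */
    /* calloc() stands in for vmalloc() plus zeroing in this sketch. */
    map->owner = calloc(map->nr_pfns, sizeof(*map->owner));
    return map->owner ? 0 : -1;
}

/* Record which controller group owns a page frame. */
static int aux_map_set(struct aux_map *map, unsigned long pfn, void *group)
{
    if (pfn >= map->nr_pfns || aux_map_enable(map) != 0)
        return -1;
    map->owner[pfn] = group;
    return 0;
}

/* Look up the owner; NULL if unset or if the controller was never used. */
static void *aux_map_get(const struct aux_map *map, unsigned long pfn)
{
    if (!map->owner || pfn >= map->nr_pfns)
        return NULL;
    return map->owner[pfn];
}

/* Release the array again. */
static void aux_map_free(struct aux_map *map)
{
    free(map->owner);
    map->owner = NULL;
}
```

Until `aux_map_set()` is called for the first time, the map costs only the two fields of the struct, which is the "zero cost when compiled in but unused" property argued for above. The sketch does not address the NUMA placement or vmalloc space concerns raised in this message.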

2008-02-21 14:46:39

by KOSAKI Motohiro

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Hi

> > >> For ordinary desktop people, memory controller is what developers
> > >> know as MMU or sometimes even some other mysterious piece of silicon
> > >> inside the heavy box.
> > >
> > >Actually I'd guess 'memory controller' == 'DRAM controller' == part of
> > >northbridge that talks to DRAM.
> >
> > Yeah that must have been it when Windows says it found a new controller
> > after changing the mainboard underneath.
>
> Just for fun... this option really has to be renamed:

I think one reason many people are easily confused is the bad menu
hierarchy.
I propose moving mem-cgroup to be a child of cgroup and resource counter
(i.e. following its "depends on").

If you don't mind, please try the following patch.
It may look better than before.

---
init/Kconfig | 52 ++++++++++++++++++++++++++--------------------------
1 file changed, 26 insertions(+), 26 deletions(-)

Index: b/init/Kconfig
===================================================================
--- a/init/Kconfig 2008-02-17 16:44:46.000000000 +0900
+++ b/init/Kconfig 2008-02-21 23:33:51.000000000 +0900
@@ -311,6 +311,32 @@ config CPUSETS

Say N if unsure.

+config PROC_PID_CPUSET
+ bool "Include legacy /proc/<pid>/cpuset file"
+ depends on CPUSETS
+ default y
+
+config CGROUP_CPUACCT
+ bool "Simple CPU accounting cgroup subsystem"
+ depends on CGROUPS
+ help
+ Provides a simple Resource Controller for monitoring the
+ total CPU consumed by the tasks in a cgroup
+
+config RESOURCE_COUNTERS
+ bool "Resource counters"
+ help
+ This option enables controller independent resource accounting
+ infrastructure that works with cgroups
+ depends on CGROUPS
+
+config CGROUP_MEM_CONT
+ bool "Memory controller for cgroups"
+ depends on CGROUPS && RESOURCE_COUNTERS
+ help
+ Provides a memory controller that manages both page cache and
+ RSS memory.
+
config GROUP_SCHED
bool "Group CPU scheduler"
default y
@@ -352,20 +378,6 @@ config CGROUP_SCHED

endchoice

-config CGROUP_CPUACCT
- bool "Simple CPU accounting cgroup subsystem"
- depends on CGROUPS
- help
- Provides a simple Resource Controller for monitoring the
- total CPU consumed by the tasks in a cgroup
-
-config RESOURCE_COUNTERS
- bool "Resource counters"
- help
- This option enables controller independent resource accounting
- infrastructure that works with cgroups
- depends on CGROUPS
-
config SYSFS_DEPRECATED
bool "Create deprecated sysfs files"
depends on SYSFS
@@ -387,18 +399,6 @@ config SYSFS_DEPRECATED
If you are using a distro that was released in 2006 or later,
it should be safe to say N here.

-config CGROUP_MEM_CONT
- bool "Memory controller for cgroups"
- depends on CGROUPS && RESOURCE_COUNTERS
- help
- Provides a memory controller that manages both page cache and
- RSS memory.
-
-config PROC_PID_CPUSET
- bool "Include legacy /proc/<pid>/cpuset file"
- depends on CPUSETS
- default y
-
config RELAY
bool "Kernel->user space relay support (formerly relayfs)"
help

2008-02-21 14:57:18

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

KOSAKI Motohiro wrote:
> Hi
>
>> > >> For ordinary desktop people, memory controller is what developers
>> > >> know as MMU or sometimes even some other mysterious piece of silicon
>> > >> inside the heavy box.
>> > >
>> > >Actually I'd guess 'memory controller' == 'DRAM controller' == part of
>> > >northbridge that talks to DRAM.
>> >
>> > Yeah that must have been it when Windows says it found a new controller
>> > after changing the mainboard underneath.
>>
>> Just for fun... this option really has to be renamed:
>
> I think one reason of many people easy confusion is caused by bad menu
> hierarchy.
> I popose mem-cgroup move to child of cgroup and resource counter
> (= obey denend on).
>
> if you don't mind, please try to following patch.
> may be, looks good than before.
>

Sure makes sense

> ---
> init/Kconfig | 52 ++++++++++++++++++++++++++--------------------------
> 1 file changed, 26 insertions(+), 26 deletions(-)
>
> Index: b/init/Kconfig
> ===================================================================
> --- a/init/Kconfig 2008-02-17 16:44:46.000000000 +0900
> +++ b/init/Kconfig 2008-02-21 23:33:51.000000000 +0900
> @@ -311,6 +311,32 @@ config CPUSETS
>
> Say N if unsure.
>
> +config PROC_PID_CPUSET
> + bool "Include legacy /proc/<pid>/cpuset file"
> + depends on CPUSETS
> + default y
> +
> +config CGROUP_CPUACCT
> + bool "Simple CPU accounting cgroup subsystem"
> + depends on CGROUPS
> + help
> + Provides a simple Resource Controller for monitoring the
> + total CPU consumed by the tasks in a cgroup
> +
> +config RESOURCE_COUNTERS
> + bool "Resource counters"
> + help
> + This option enables controller independent resource accounting
> + infrastructure that works with cgroups
> + depends on CGROUPS
> +
> +config CGROUP_MEM_CONT
> + bool "Memory controller for cgroups"
> + depends on CGROUPS && RESOURCE_COUNTERS
> + help
> + Provides a memory controller that manages both page cache and
> + RSS memory.
> +

We have some more changes planned for the help text, as well as renames,
including calling the component a memory resource controller. The menu changes
make sense, so feel free to push them.

Acked-by: Balbir Singh <[email protected]>

> config GROUP_SCHED
> bool "Group CPU scheduler"
> default y
> @@ -352,20 +378,6 @@ config CGROUP_SCHED
>
> endchoice
>
> -config CGROUP_CPUACCT
> - bool "Simple CPU accounting cgroup subsystem"
> - depends on CGROUPS
> - help
> - Provides a simple Resource Controller for monitoring the
> - total CPU consumed by the tasks in a cgroup
> -
> -config RESOURCE_COUNTERS
> - bool "Resource counters"
> - help
> - This option enables controller independent resource accounting
> - infrastructure that works with cgroups
> - depends on CGROUPS
> -
> config SYSFS_DEPRECATED
> bool "Create deprecated sysfs files"
> depends on SYSFS
> @@ -387,18 +399,6 @@ config SYSFS_DEPRECATED
> If you are using a distro that was released in 2006 or later,
> it should be safe to say N here.
>
> -config CGROUP_MEM_CONT
> - bool "Memory controller for cgroups"
> - depends on CGROUPS && RESOURCE_COUNTERS
> - help
> - Provides a memory controller that manages both page cache and
> - RSS memory.
> -
> -config PROC_PID_CPUSET
> - bool "Include legacy /proc/<pid>/cpuset file"
> - depends on CPUSETS
> - default y
> -
> config RELAY
> bool "Kernel->user space relay support (formerly relayfs)"
> help


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-21 23:55:53

by Pavel Machek

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Hi!

> > > >> For ordinary desktop people, memory controller is what developers
> > > >> know as MMU or sometimes even some other mysterious piece of silicon
> > > >> inside the heavy box.
> > > >
> > > >Actually I'd guess 'memory controller' == 'DRAM controller' == part of
> > > >northbridge that talks to DRAM.
> > >
> > > Yeah that must have been it when Windows says it found a new controller
> > > after changing the mainboard underneath.
> >
> > Just for fun... this option really has to be renamed:
>
> I think one reason of many people easy confusion is caused by bad menu
> hierarchy.
> I popose mem-cgroup move to child of cgroup and resource counter
> (= obey denend on).

> +config CGROUP_MEM_CONT
> + bool "Memory controller for cgroups"

Memory _resource_ controller for cgroups?

> + depends on CGROUPS && RESOURCE_COUNTERS
> + help
> + Provides a memory controller that manages both page cache and

Same here.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-22 03:12:15

by KOSAKI Motohiro

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Hi

> > I think one reason of many people easy confusion is caused by bad menu
> > hierarchy.
> > I popose mem-cgroup move to child of cgroup and resource counter
> > (= obey denend on).
>
> > +config CGROUP_MEM_CONT
> > + bool "Memory controller for cgroups"
>
> Memory _resource_ controller for cgroups?

Ahhh,
my proposal only changes the menu hierarchy.
I don't know the best name, and I hope to avoid a rename discussion ;-)

Thanks.


- kosaki

2008-02-22 03:18:07

by Nick Piggin

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

On Wednesday 20 February 2008 23:52, Balbir Singh wrote:
> Andi Kleen wrote:
> > Document huge memory/cache overhead of memory controller in Kconfig
> >
> > I was a little surprised that 2.6.25-rc* increased struct page for the
> > memory controller. At least on many x86-64 machines it will not fit into
> > a single cache line now anymore and also costs considerable amounts of
> > RAM.
>
> The size of struct page earlier was 56 bytes on x86_64 and with 64 bytes it
> won't fit into the cacheline anymore? Please also look at
> http://lwn.net/Articles/234974/

BTW. We'll probably want to increase the width of some counters
in struct page at some point for 64-bit, so then it really will
go over with the memory controller!

Actually, an external data structure is a pretty good idea. We
could probably do it easily with a radix tree (pfn->memory
controller). And that might be a better option for distros.
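
A pfn-to-controller radix tree of the kind mentioned above could look roughly like the following userspace sketch. It does not use the kernel's own radix tree implementation; the fan-out of 64 slots, the fixed height of 4, and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_SHIFT 6                  /* 64 slots per node (assumed fan-out) */
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)
#define HEIGHT    4                  /* covers pfns below 2^24 */

struct rnode {
    void *slots[MAP_SIZE]; /* child nodes, or owner pointers at the last level */
};

/* Slot index for pfn at a given level (level 0 is the root). */
static unsigned long slot_index(unsigned long pfn, int level)
{
    return (pfn >> ((HEIGHT - 1 - level) * MAP_SHIFT)) & MAP_MASK;
}

/* Insert pfn -> owner, allocating interior nodes on demand. */
static int rtree_insert(struct rnode *root, unsigned long pfn, void *owner)
{
    struct rnode *node = root;
    int level;

    assert(pfn < (1UL << (HEIGHT * MAP_SHIFT)));
    for (level = 0; level < HEIGHT - 1; level++) {
        unsigned long idx = slot_index(pfn, level);

        if (!node->slots[idx]) {
            node->slots[idx] = calloc(1, sizeof(struct rnode));
            if (!node->slots[idx])
                return -1;
        }
        node = node->slots[idx];
    }
    node->slots[slot_index(pfn, HEIGHT - 1)] = owner;
    return 0;
}

/* Look up the owner of pfn; NULL if nothing was inserted for it. */
static void *rtree_lookup(const struct rnode *root, unsigned long pfn)
{
    const struct rnode *node = root;
    int level;

    if (pfn >= (1UL << (HEIGHT * MAP_SHIFT)))
        return NULL;
    for (level = 0; level < HEIGHT - 1; level++) {
        node = node->slots[slot_index(pfn, level)];
        if (!node)
            return NULL;
    }
    return node->slots[slot_index(pfn, HEIGHT - 1)];
}
```

The tree only allocates nodes for pfn ranges that are actually tracked, which is what makes it attractive for kernels where the controller is compiled in but rarely used. Each lookup walks several nodes, though, so it touches more cache lines than a flat array would.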

2008-02-22 04:46:18

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Andi Kleen wrote:
>> 1. We could create something similar to mem_map, we would need to handle 4
>
> 4? At least x86 mainline only has two ways now. flatmem and vmemmap.
>
>> different ways of creating mem_map.
>
> Well it would be only a single way to create the "aux memory controller
> map" (or however it will be called). Basically just a call to single
> function from a few different places.
>
>> 2. On x86 with 64 GB ram,
>
> First i386 with 64GB just doesn't work, at least not with default 3:1
> split. Just calculate it yourself how much of the lowmem area is left
> after the 64GB mem_map is allocated. Typical rule of thumb is that 16GB
> is the realistic limit for 32bit x86 kernels. Worrying about
> anything more does not make much sense.
>

I understand what you say Andi, but nothing in the kernel stops us from
supporting 64GB. Should a framework like memory controller make an assumption
that not more than 16GB will be configured on an x86 box?

>> if we decided to use vmalloc space, we would need 64
>> MB of vmalloc'ed memory
>
> Yes and if you increase mem_map you need exactly the same space
> in lowmem too. So increasing the vmalloc reservation for this is
> equivalent. Just make sure you use highmem backed vmalloc.
>

I see two problems with using vmalloc. One, the reservation needs to be done
across architectures. Two, a big vmalloc chunk is not node aware; if all the
pages come from the same node, we have a penalty to pay on a NUMA system.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-22 06:57:33

by Kamezawa Hiroyuki

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

On Thu, 21 Feb 2008 16:33:33 +0530
Balbir Singh <[email protected]> wrote:

> > Another issue is that it will slightly increase TLB/cache
> > cost of the memory controller, but I think that would be a fair
> > trade off for it being zero cost when disabled but compiled
> > in.
> >
> > Doing it with vmalloc should be easy enough. I can do such
> > a patch later unless someone beats me to it...
> >
>
> I'll get to it, but I have too many things on my plate at the moment. KAMEZAWA
> also wanted to look at it. I looked through some vmalloc() internals yesterday
> and I am worried about allocating all the memory on a single node in a NUMA
> system and changing VMALLOC_XXXX on every architecture to provide more vmalloc
> space. I might be missing something obvious.
>

I'll post a patch series to do that later (I'm debugging it now...).
I'd be glad if people (including you) would look at it and give me advice.

Regards,
-Kame

2008-02-22 07:11:00

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

KAMEZAWA Hiroyuki wrote:
> On Thu, 21 Feb 2008 16:33:33 +0530
> Balbir Singh <[email protected]> wrote:
>
>>> Another issue is that it will slightly increase TLB/cache
>>> cost of the memory controller, but I think that would be a fair
>>> trade off for it being zero cost when disabled but compiled
>>> in.
>>>
>>> Doing it with vmalloc should be easy enough. I can do such
>>> a patch later unless someone beats me to it...
>>>
>> I'll get to it, but I have too many things on my plate at the moment. KAMEZAWA
>> also wanted to look at it. I looked through some vmalloc() internals yesterday
>> and I am worried about allocating all the memory on a single node in a NUMA
>> system and changing VMALLOC_XXXX on every architecture to provide more vmalloc
>> space. I might be missing something obvious.
>>
>
> I'll post a series of patch to do that later (it's under debug now...)
> I'm glad if people (including you) look it and give me advices.
>

Thank you so much for your help. I'll definitely look at it and review/test them.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-22 09:49:43

by Andi Kleen

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Balbir Singh wrote:
> Andi Kleen wrote:
>>> 1. We could create something similar to mem_map, we would need to handle 4
>> 4? At least x86 mainline only has two ways now. flatmem and vmemmap.
>>
>>> different ways of creating mem_map.
>> Well it would be only a single way to create the "aux memory controller
>> map" (or however it will be called). Basically just a call to single
>> function from a few different places.
>>
>>> 2. On x86 with 64 GB ram,
>> First i386 with 64GB just doesn't work, at least not with default 3:1
>> split. Just calculate it yourself how much of the lowmem area is left
>> after the 64GB mem_map is allocated. Typical rule of thumb is that 16GB
>> is the realistic limit for 32bit x86 kernels. Worrying about
>> anything more does not make much sense.
>>
>
> I understand what you say Andi, but nothing in the kernel stops us from
> supporting 64GB.

Well, in practice it just won't work, at least at the default page offset.

> Should a framework like memory controller make an assumption
> that not more than 16GB will be configured on an x86 box?

It doesn't need to. Just increase __VMALLOC_RESERVE by the
respective amount (end_pfn * sizeof(unsigned long))

Then 64GB still won't work in practice, but at least you made no such
assumption in theory @)

Also there is the issue of memory hotplug. In theory later
memory hotplugs could fill up vmalloc. Luckily x86 BIOSes
are supposed to declare how much memory they plan to hot-add later
using the SRAT memory hotplug area (in fact the old non-sparsemem
hotadd implementation even relied on that). It would
be possible to adjust __VMALLOC_RESERVE at boot even for that. I suspect
this issue could also just be ignored at first; it is unlikely
to be serious.


>>> if we decided to use vmalloc space, we would need 64
>>> MB of vmalloc'ed memory
>> Yes and if you increase mem_map you need exactly the same space
>> in lowmem too. So increasing the vmalloc reservation for this is
>> equivalent. Just make sure you use highmem backed vmalloc.
>>
>
> I see two problems with using vmalloc. One, the reservation needs to be done
> across architectures.

Only on 32bit. Ok hacking it into all 32bit architectures might be
difficult, but I assume it would be ok to rely on the architecture
maintainers for that and only enable it on some selected architectures
using Kconfig for now.

On 64bit vmalloc should be by default large enough so it could
be enabled for all 64bit architectures.

> Two, a big vmalloc chunk is not node aware,

vmalloc_node()

-Andi
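
The reservation arithmetic above, `end_pfn * sizeof(unsigned long)` plus an allowance for declared hotplug memory, can be sketched as follows. The function names are hypothetical, the 4 KB page size is an assumption, and the word size is passed in explicitly because `sizeof(unsigned long)` on the build host may differ from the 4 bytes of a 32bit target.

```c
#include <assert.h>

/* Page frames needed for a given amount of memory (assumes 4 KB pages). */
static unsigned long long bytes_to_pfns(unsigned long long bytes)
{
    return bytes / 4096ULL;
}

/*
 * Extra vmalloc reservation for an auxiliary per-page array:
 * one word per page frame, for memory present at boot (end_pfn)
 * plus memory the firmware says may be hot-added later.
 */
static unsigned long long extra_vmalloc_reserve(unsigned long long end_pfn,
                                                unsigned long long hotplug_pfns,
                                                unsigned long long word_bytes)
{
    assert(word_bytes > 0);
    return (end_pfn + hotplug_pfns) * word_bytes;
}
```

For a 32bit machine with 16GB of RAM this comes to 16 MB of additional reservation, and a further 4GB hotplug area adds 4 MB on top of that.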

2008-02-22 12:19:41

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Andi Kleen wrote:
> Balbir Singh wrote:
>> Andi Kleen wrote:
>>>> 1. We could create something similar to mem_map, we would need to handle 4
>>> 4? At least x86 mainline only has two ways now. flatmem and vmemmap.
>>>
>>>> different ways of creating mem_map.
>>> Well it would be only a single way to create the "aux memory controller
>>> map" (or however it will be called). Basically just a call to single
>>> function from a few different places.
>>>
>>>> 2. On x86 with 64 GB ram,
>>> First i386 with 64GB just doesn't work, at least not with default 3:1
>>> split. Just calculate it yourself how much of the lowmem area is left
>>> after the 64GB mem_map is allocated. Typical rule of thumb is that 16GB
>>> is the realistic limit for 32bit x86 kernels. Worrying about
>>> anything more does not make much sense.
>>>
>> I understand what you say Andi, but nothing in the kernel stops us from
>> supporting 64GB.
>
> Well in practice it just won't work at least at default page offset.
>
>> Should a framework like memory controller make an assumption
>> that not more than 16GB will be configured on an x86 box?
>
> It doesn't need to. Just increase __VMALLOC_RESERVE by the
> respective amount (end_pfn * sizeof(unsigned long))
>
> Then 64GB still won't work in practice, but at least you made no such
> assumption in theory @)
>
> Also there is the issue of memory hotplug. In theory later
> memory hotplugs could fill up vmalloc. Luckily x86 BIOS
> are supposed to declare how much they plan to hot add memory later
> using the SRAT memory hotplug area (in fact the old non sparsemem
> hotadd implementation even relied on that). It would
> be possible to adjust __VMALLOC_RESERVE at boot even for that. I suspect
> this issue could be also just ignored at first; it is unlikely
> to be serious.
>

My concern with all the points you mentioned is that this solution might need to
change again, depending on the factors you've mentioned. vmalloc() is good and
straightforward, but it has these dependencies which could call for another
rewrite of the code.

>
>>>> if we decided to use vmalloc space, we would need 64
>>>> MB of vmalloc'ed memory
>>> Yes and if you increase mem_map you need exactly the same space
>>> in lowmem too. So increasing the vmalloc reservation for this is
>>> equivalent. Just make sure you use highmem backed vmalloc.
>>>
>> I see two problems with using vmalloc. One, the reservation needs to be done
>> across architectures.
>
> Only on 32bit. Ok hacking it into all 32bit architectures might be
> difficult, but I assume it would be ok to rely on the architecture
> maintainers for that and only enable it on some selected architectures
> using Kconfig for now.
>

Yes, but that's not such a good idea

> On 64bit vmalloc should be by default large enough so it could
> be enabled for all 64bit architectures.
>
>> Two, a big vmalloc chunk is not node aware,
>
> vmalloc_node()
>

vmalloc_node() would need to work much the same way as mem_map does. I am
tempted to try the mem_map and radix tree approaches. I think KAMEZAWA is
already working on this and has a first draft of the radix tree changes ready.

> -Andi


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-02-22 12:59:53

by Andi Kleen

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

On Fri, Feb 22, 2008 at 05:44:47PM +0530, Balbir Singh wrote:
>
> My concern with all the points you mentioned is that this solution might need to
> change again,

No why would it need to change again?

> depending on the factors you've mentioned. vmalloc() is good and
> straightforward, but it has these dependencies which could call for another
> rewrite of the code.

The hotplug change would not need a rewrite of anything, just
some additional code in the SRAT parser to increase __VMALLOC_RESERVE for
each hotplug region. It's likely <= 3 additional lines.

>
> >
> >>>> if we decided to use vmalloc space, we would need 64
> >>>> MB of vmalloc'ed memory
> >>> Yes and if you increase mem_map you need exactly the same space
> >>> in lowmem too. So increasing the vmalloc reservation for this is
> >>> equivalent. Just make sure you use highmem backed vmalloc.
> >>>
> >> I see two problems with using vmalloc. One, the reservation needs to be done
> >> across architectures.
> >
> > Only on 32bit. Ok hacking it into all 32bit architectures might be
> > difficult, but I assume it would be ok to rely on the architecture
> > maintainers for that and only enable it on some selected architectures
> > using Kconfig for now.
> >
>
> Yes, but that's not such a good idea

Waiting for the maintainers? Why not?

I assume the memory controller would be primarily used on larger
systems anyway, and except for i386 these should be mostly 64bit
these days.

> > On 64bit vmalloc should be by default large enough so it could
> > be enabled for all 64bit architectures.
> >
> >> Two, a big vmalloc chunk is not node aware,
> >
> > vmalloc_node()
> >
>
> vmalloc_node() would need to work much the same way as mem_map does. I am

would? It already is implemented and works just fine AFAIK.

I don't understand the rest of your point.

-Andi

2008-02-22 15:52:17

by Balbir Singh

Subject: Re: [PATCH] Document huge memory/cache overhead of memory controller in Kconfig

Andi Kleen wrote:
> On Fri, Feb 22, 2008 at 05:44:47PM +0530, Balbir Singh wrote:
>> My concern with all the points you mentioned is that this solution might need to
>> change again,
>
> No why would it need to change again?
>
>> depending on the factors you've mentioned. vmalloc() is good and
>> straightforward, but it has these dependencies which could call for another
>> rewrite of the code.
>
> The hotplug change would not need a rewrite of anything, just
> some additional code in the SRAT parser to increase __VMALLOC_RESERVE for
> each hotplug region. It's likely <= 3 additional lines.
>

Yes, but those are hotplug changes only for i386/x86-64.

>>>>>> if we decided to use vmalloc space, we would need 64
>>>>>> MB of vmalloc'ed memory
>>>>> Yes and if you increase mem_map you need exactly the same space
>>>>> in lowmem too. So increasing the vmalloc reservation for this is
>>>>> equivalent. Just make sure you use highmem backed vmalloc.
>>>>>
>>>> I see two problems with using vmalloc. One, the reservation needs to be done
>>>> across architectures.
>>> Only on 32bit. Ok hacking it into all 32bit architectures might be
>>> difficult, but I assume it would be ok to rely on the architecture
>>> maintainers for that and only enable it on some selected architectures
>>> using Kconfig for now.
>>>
>> Yes, but that's not such a good idea
>
> Waiting for the maintainers? Why not?

It limits the platforms the code can run on. A feature independent of the
architecture should, if possible, not depend on architecture-specific support.

>
> I assume the memory controller would be primarily used on larger
> systems anyways and except for i386 these should be mostly 64bit
> these days anyways.
>
>>> On 64bit vmalloc should be by default large enough so it could
>>> be enabled for all 64bit architectures.
>>>
>>>> Two, a big vmalloc chunk is not node aware,
>>> vmalloc_node()
>>>
>> vmalloc_node() would need to work much the same way as mem_map does. I am
>
> would? It already is implemented and works just fine AFAIK.
>
> I don't understand the rest of your point.
>

Oh! I guess it's the extra "I am". The point I was trying to make was that we
would need to split up the cgroup map the same way as the per-node mem_map.

> -Andi


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
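
Splitting the map per node, as described above, might look like the following userspace sketch. It is illustrative only: `calloc()` stands in for a node-local allocation such as `vmalloc_node()`, and `struct node_map` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-node slice of the auxiliary map. */
struct node_map {
    unsigned long start_pfn; /* first page frame on this node */
    unsigned long nr_pfns;   /* number of page frames on this node */
    void **owner;            /* one slot per page frame */
};

/*
 * Allocate each node's slice separately. In the kernel this is where a
 * node-local allocator would be called with the node id; calloc() is
 * only a stand-in here. Returns 0 on success, -1 on failure.
 */
static int node_maps_init(struct node_map *nodes, int nr_nodes)
{
    int nid;

    assert(nr_nodes > 0);
    for (nid = 0; nid < nr_nodes; nid++) {
        nodes[nid].owner = calloc(nodes[nid].nr_pfns, sizeof(void *));
        if (!nodes[nid].owner) {
            /* Undo the slices allocated so far. */
            while (nid-- > 0) {
                free(nodes[nid].owner);
                nodes[nid].owner = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* Find the slot for pfn, or NULL if no node covers it. */
static void **node_maps_slot(struct node_map *nodes, int nr_nodes,
                             unsigned long pfn)
{
    int nid;

    for (nid = 0; nid < nr_nodes; nid++) {
        struct node_map *n = &nodes[nid];

        if (n->owner && pfn >= n->start_pfn && pfn - n->start_pfn < n->nr_pfns)
            return &n->owner[pfn - n->start_pfn];
    }
    return NULL;
}
```

Each slice is indexed relative to its node's first pfn, so the data for a page can sit on the same node as the page itself. The linear search over nodes is a simplification for the sketch.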